author | Gustavo André dos Santos Lopes <cataphract@php.net> | 2012-05-31 12:11:44 +0200 |
---|---|---|
committer | Gustavo André dos Santos Lopes <cataphract@php.net> | 2012-06-04 22:25:07 +0200 |
commit | f5b421621d89c3e87c498ee228c421e67719fbc6 (patch) | |
tree | 54af302ed83bb74e738daf8c93df31affdacb1b1 | |
parent | c22a29b57639178581210ec377ea4e9909f828c9 (diff) | |
download | php-git-f5b421621d89c3e87c498ee228c421e67719fbc6.tar.gz |
BreakIterator and RuleBasedBreakIterator added
This commit adds wrappers for the classes BreakIterator and
RuleBasedBreakIterator. The C++ ICU classes are described here:
<http://icu-project.org/apiref/icu4c/classBreakIterator.html>
<http://icu-project.org/apiref/icu4c/classRuleBasedBreakIterator.html>
Additionally, a tutorial is available at:
<http://userguide.icu-project.org/boundaryanalysis>
This implementation wraps UTF-8 text in a UText. The text is
iterated without any copying or conversion to UTF-16. There is
also no validation that the input is actually UTF-8; where there
are malformed sequences, the UText will simply yield U+FFFD.
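
To illustrate the substitution described above, here is a minimal sketch in plain C++. It is not ICU's UText code: `decode_lossy` is a hypothetical helper, and how many bytes ICU groups under a single U+FFFD is an assumption that this commit does not state. The sketch reads code points straight out of the byte buffer, never copies or rewrites the input, and reports U+FFFD for each byte that does not begin a well-formed sequence.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of ICU or ext/intl. Decodes UTF-8 by reading
// the buffer in place and yields U+FFFD for every byte that does not begin a
// well-formed sequence (one replacement per offending byte is an assumption).
std::vector<char32_t> decode_lossy(const std::string &s)
{
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        int extra;        // continuation bytes expected after the lead byte
        char32_t cp;      // code point being assembled
        char32_t lowest;  // smallest code point allowed for this length

        if (b < 0x80)                { extra = 0; cp = b;        lowest = 0; }
        else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; lowest = 0x80; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; lowest = 0x800; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; lowest = 0x10000; }
        else { out.push_back(0xFFFD); ++i; continue; }  // stray byte

        // The whole sequence must fit in the buffer.
        bool ok = (i + extra < s.size());
        for (int k = 1; ok && k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, surrogates and values past U+10FFFF.
        if (ok && (cp < lowest || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))) {
            ok = false;
        }

        if (ok) { out.push_back(cp); i += extra + 1; }
        else    { out.push_back(0xFFFD); ++i; }
    }
    return out;
}
```

Calling `decode_lossy` on the bytes `a`, `0xFF`, `z` reports three code points with U+FFFD in the middle, while the input string itself still holds the original `0xFF` byte.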
The class BreakIterator cannot be instantiated directly (has a
private constructor). It provides the interface exposed by the ICU
abstract class with the same name. The PHP class is not abstract
because we may use it to wrap native subclasses of BreakIterator
that we don't know how to wrap. This class includes methods to
move the iterator position to the beginning (first()), to the
end (last()), forward (next()), backwards (previous()), to the
boundary preceding a certain position (preceding()) and following
a certain position (following()) and to obtain the current position
(current()). next() can also be used to advance or recede an
arbitrary number of positions.
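
The navigation methods listed above can be pictured with a toy model. This is a sketch under assumptions, not the ICU implementation: `ToyBreakIterator` is a hypothetical class that stores precomputed byte offsets, whereas ICU derives boundaries from rules on demand. What happens to the position when DONE is returned is also simplified here (the position is left unchanged).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Toy stand-in for a break iterator. The boundaries are byte offsets in
// ascending order and the vector must not be empty. All names are
// hypothetical and only mirror the method names mentioned in the text.
class ToyBreakIterator {
public:
    static constexpr int32_t DONE = -1;

    explicit ToyBreakIterator(std::vector<int32_t> boundaries)
        : b_(std::move(boundaries)), idx_(0) {}

    int32_t first()         { idx_ = 0; return b_[idx_]; }
    int32_t last()          { idx_ = b_.size() - 1; return b_[idx_]; }
    int32_t current() const { return b_[idx_]; }

    // next(n): n > 0 advances n boundaries, n < 0 recedes, n == 0 stays put.
    int32_t next(int32_t n = 1) {
        long target = static_cast<long>(idx_) + n;
        if (target < 0 || target >= static_cast<long>(b_.size())) {
            return DONE;  // position is left unchanged in this model
        }
        idx_ = static_cast<std::size_t>(target);
        return b_[idx_];
    }

    int32_t previous() { return next(-1); }

    // First boundary strictly after the given offset.
    int32_t following(int32_t offset) {
        auto it = std::upper_bound(b_.begin(), b_.end(), offset);
        if (it == b_.end()) return DONE;
        idx_ = static_cast<std::size_t>(it - b_.begin());
        return *it;
    }

    // Last boundary strictly before the given offset.
    int32_t preceding(int32_t offset) {
        auto it = std::lower_bound(b_.begin(), b_.end(), offset);
        if (it == b_.begin()) return DONE;
        idx_ = static_cast<std::size_t>(it - b_.begin()) - 1;
        return b_[idx_];
    }

private:
    std::vector<int32_t> b_;  // boundary offsets, ascending
    std::size_t idx_;         // index of the current boundary in b_
};
```

With boundaries at offsets 0, 5, 6 and 11, `first()` gives 0, `next()` gives 5, `next(2)` jumps to 11, and one more `next()` reports DONE while `current()` still reports 11.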
BreakIterator also exposes other native methods:
getAvailableLocales(), getLocale() and factory methods to build
several predefined types of BreakIterators: createWordInstance()
for word boundaries, createCharacterInstance() for locale
dependent notions of "characters", createSentenceInstance() for
sentences, createLineInstance() for line breaks and
createTitleInstance() for title casing breaks. These factories
currently return RuleBasedBreakIterators where the names of the rule
sets are found in the ICU data, observing the passed locale (although
the locale is taken into consideration, there are very few exceptions
to the root rules).
The clone and compare_object PHP object handlers are also
implemented, though the comparison does not yield meaningful results
when used with >, <, >= and <=.
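
The reason ordering comparisons are not meaningful can be seen from the shape of the handler in this patch, which returns 0 for equal iterators and 1 otherwise. The following sketch reproduces only that shape in plain C++; `Iter` and its fields are hypothetical stand-ins for the wrapped iterator state.

```cpp
#include <string>

// Hypothetical stand-in for the state that equality looks at.
struct Iter {
    std::string rules;
    int position;
};

// Same shape as the compare handler in the patch: 0 when the two objects are
// equal, 1 otherwise. The result never goes negative, so it cannot express
// "less than".
int compare_objects(const Iter &a, const Iter &b)
{
    bool equal = (a.rules == b.rules) && (a.position == b.position);
    return equal ? 0 : 1;
}
```

For two unequal objects, `compare_objects(a, b)` and `compare_objects(b, a)` both return 1, so each object would count as "greater" than the other. Only equality and inequality carry information.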
Note that BreakIterator is an iterator only in the sense of the
first 'Iterator' in 'IteratorIterator', i.e., it does not
implement the Iterator interface. The reason is that there is
no sensible implementation for Iterator::key(). Using it for
an ordinal of the current boundary is not feasible because
we are allowed to move to any boundary at any time. If we were
to determine the current ordinal when last() is called we'd
have to traverse the whole input text to find out how many
breaks there were before. Therefore, BreakIterator implements
only Traversable. It can be wrapped in an IteratorIterator,
but the usual warnings apply.
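
The cost argument against using an ordinal as Iterator::key() can be sketched as follows. This is an illustration only: `ordinal_of` and the `next_after` callback are hypothetical, and the callback stands in for an iterator that can only find the next boundary from a given position.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical helper: returns the zero-based ordinal of the boundary at
// offset `target`, which is assumed to be a boundary. `next_after(pos)` must
// return the first boundary after `pos`, or -1 when there is none. The only
// way to obtain the ordinal is to walk every boundary from offset 0.
int32_t ordinal_of(int32_t target,
                   const std::function<int32_t(int32_t)> &next_after)
{
    int32_t ordinal = 0;
    for (int32_t pos = 0; pos != -1 && pos < target; pos = next_after(pos)) {
        ++ordinal;  // one step per boundary that lies before the target
    }
    return ordinal;
}
```

After a jump such as last(), answering "which boundary number is this?" would therefore take time proportional to the number of boundaries in the text, which is why no key is provided.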
Finally, I added a convenience method to BreakIterator:
getPartsIterator(). This provides an IntlIterator, backed
by the BreakIterator PHP object (i.e. moving the pointer or
changing the text in BreakIterator affects the iterator
and also moving the iterator affects the backing BreakIterator),
which allows traversing the text between each boundary.
This iterator uses the original text to retrieve the text
between two positions, not the code points returned by the
wrapping UText. Therefore, if the text includes invalid code
unit sequences, these invalid sequences will be in the output
of this iterator, not U+FFFD code points.
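
As a sketch of why invalid bytes survive in the parts, the slicing can be modelled as cutting the original byte string at consecutive boundary offsets. `parts_between` is a hypothetical helper and the boundary offsets in the usage note are assumed, not computed by ICU; the patch itself copies bytes out of the stored text in `_breakiterator_parts_move_forward`.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: returns the byte ranges of `text` that lie between
// each pair of consecutive boundary offsets. Bytes are copied verbatim, so
// malformed UTF-8 in the input appears unchanged in the output.
std::vector<std::string> parts_between(const std::string &text,
                                       const std::vector<int32_t> &boundaries)
{
    std::vector<std::string> parts;
    for (std::size_t i = 0; i + 1 < boundaries.size(); ++i) {
        int32_t cur = boundaries[i];
        int32_t next = boundaries[i + 1];
        parts.push_back(text.substr(cur, next - cur));
    }
    return parts;
}
```

For the seven bytes `ab`, space, `0xFF`, space, `cd` with assumed boundaries at 0, 2, 3, 4, 5 and 7, the third part is the single raw byte `0xFF` rather than the three-byte UTF-8 encoding of U+FFFD.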
The class RuleBasedBreakIterator exposes a constructor that allows
building an iterator from arbitrary compiled or non-compiled
rules. The form of these rules is described in the tutorial linked
above. The rest of the methods allow retrieving the rules
(getRules() and getBinaryRules()), a hash code of the rule set
(hashCode()) and the rule statuses (getRuleStatus() and
getRuleStatusVec()).
Because the RuleBasedBreakIterator constructor may return parse
errors, I reuse the UParseError-to-text function that was in the
transliterator files. Therefore, I moved that function to
intl_error.c.
common_enum.cpp was also changed, mainly to expose previously
static functions. This avoided code duplication when implementing
the BreakIterator iterator and the IntlIterator returned by
BreakIterator::getPartsIterator().
-rw-r--r-- | ext/intl/breakiterator/breakiterator_class.cpp | 342 | ||||
-rw-r--r-- | ext/intl/breakiterator/breakiterator_class.h | 71 | ||||
-rw-r--r-- | ext/intl/breakiterator/breakiterator_iterators.cpp | 218 | ||||
-rw-r--r-- | ext/intl/breakiterator/breakiterator_iterators.h | 39 | ||||
-rw-r--r-- | ext/intl/breakiterator/breakiterator_methods.cpp | 442 | ||||
-rw-r--r-- | ext/intl/breakiterator/breakiterator_methods.h | 64 | ||||
-rw-r--r-- | ext/intl/breakiterator/rulebasedbreakiterator_methods.cpp | 227 | ||||
-rw-r--r-- | ext/intl/breakiterator/rulebasedbreakiterator_methods.h | 34 | ||||
-rw-r--r-- | ext/intl/common/common_enum.cpp | 49 | ||||
-rw-r--r-- | ext/intl/common/common_enum.h | 38 | ||||
-rwxr-xr-x | ext/intl/config.m4 | 4 | ||||
-rwxr-xr-x | ext/intl/config.w32 | 7 | ||||
-rwxr-xr-x | ext/intl/intl_error.c | 78 | ||||
-rwxr-xr-x | ext/intl/intl_error.h | 5 | ||||
-rwxr-xr-x | ext/intl/php_intl.c | 5 | ||||
-rw-r--r-- | ext/intl/transliterator/transliterator.c | 79 | ||||
-rw-r--r-- | ext/intl/transliterator/transliterator_methods.c | 2 |
17 files changed, 1583 insertions, 121 deletions
diff --git a/ext/intl/breakiterator/breakiterator_class.cpp b/ext/intl/breakiterator/breakiterator_class.cpp new file mode 100644 index 0000000000..47e5fb52df --- /dev/null +++ b/ext/intl/breakiterator/breakiterator_class.cpp @@ -0,0 +1,342 @@ +/* + +----------------------------------------------------------------------+ + | PHP Version 5 | + +----------------------------------------------------------------------+ + | This source file is subject to version 3.01 of the PHP license, | + | that is bundled with this package in the file LICENSE, and is | + | available through the world-wide-web at the following url: | + | http://www.php.net/license/3_01.txt | + | If you did not receive a copy of the PHP license and are unable to | + | obtain it through the world-wide-web, please send a note to | + | license@php.net so we can mail you a copy immediately. | + +----------------------------------------------------------------------+ + | Authors: Gustavo Lopes <cataphract@php.net> | + +----------------------------------------------------------------------+ +*/ + +#ifdef HAVE_CONFIG_H +#include "config.h" +#endif + +#include <unicode/brkiter.h> +#include <unicode/rbbi.h> + +#include "breakiterator_iterators.h" + +#include <typeinfo> + +extern "C" { +#define USE_BREAKITERATOR_POINTER 1 +#include "breakiterator_class.h" +#include "breakiterator_methods.h" +#include "rulebasedbreakiterator_methods.h" +#include <zend_exceptions.h> +#include <zend_interfaces.h> +#include <assert.h> +} + +/* {{{ Global variables */ +zend_class_entry *BreakIterator_ce_ptr; +zend_class_entry *RuleBasedBreakIterator_ce_ptr; +zend_object_handlers BreakIterator_handlers; +/* }}} */ + +U_CFUNC void breakiterator_object_create(zval *object, + BreakIterator *biter TSRMLS_DC) +{ + UClassID classId = biter->getDynamicClassID(); + zend_class_entry *ce; + + if (classId == RuleBasedBreakIterator::getStaticClassID()) { + ce = RuleBasedBreakIterator_ce_ptr; + } else { + ce = BreakIterator_ce_ptr; + } + + 
object_init_ex(object, ce); + breakiterator_object_construct(object, biter TSRMLS_CC); +} + +U_CFUNC void breakiterator_object_construct(zval *object, + BreakIterator *biter TSRMLS_DC) +{ + BreakIterator_object *bio; + + BREAKITER_METHOD_FETCH_OBJECT_NO_CHECK; //populate to from object + assert(bio->biter == NULL); + bio->biter = biter; +} + +/* {{{ compare handler for BreakIterator */ +static int BreakIterator_compare_objects(zval *object1, + zval *object2 TSRMLS_DC) +{ + BreakIterator_object *bio1, + *bio2; + + bio1 = (BreakIterator_object*)zend_object_store_get_object(object1 TSRMLS_CC); + bio2 = (BreakIterator_object*)zend_object_store_get_object(object2 TSRMLS_CC); + + if (bio1->biter == NULL || bio2->biter == NULL) { + return bio1->biter == bio2->biter ? 0 : 1; + } + + return *bio1->biter == *bio2->biter ? 0 : 1; +} +/* }}} */ + +/* {{{ clone handler for BreakIterator */ +static zend_object_value BreakIterator_clone_obj(zval *object TSRMLS_DC) +{ + BreakIterator_object *bio_orig, + *bio_new; + zend_object_value ret_val; + + bio_orig = (BreakIterator_object*)zend_object_store_get_object(object TSRMLS_CC); + intl_errors_reset(INTL_DATA_ERROR_P(bio_orig) TSRMLS_CC); + + ret_val = BreakIterator_ce_ptr->create_object(Z_OBJCE_P(object) TSRMLS_CC); + bio_new = (BreakIterator_object*)zend_object_store_get_object_by_handle( + ret_val.handle TSRMLS_CC); + + zend_objects_clone_members(&bio_new->zo, ret_val, + &bio_orig->zo, Z_OBJ_HANDLE_P(object) TSRMLS_CC); + + if (bio_orig->biter != NULL) { + BreakIterator *new_biter; + + new_biter = bio_orig->biter->clone(); + if (!new_biter) { + char *err_msg; + intl_errors_set_code(BREAKITER_ERROR_P(bio_orig), + U_MEMORY_ALLOCATION_ERROR TSRMLS_CC); + intl_errors_set_custom_msg(BREAKITER_ERROR_P(bio_orig), + "Could not clone BreakIterator", 0 TSRMLS_CC); + err_msg = intl_error_get_message(BREAKITER_ERROR_P(bio_orig) TSRMLS_CC); + zend_throw_exception(NULL, err_msg, 0 TSRMLS_CC); + efree(err_msg); + } else { + bio_new->biter = 
new_biter; + bio_new->text = bio_orig->text; + if (bio_new->text) { + zval_add_ref(&bio_new->text); + } + } + } else { + zend_throw_exception(NULL, "Cannot clone unconstructed BreakIterator", 0 TSRMLS_CC); + } + + return ret_val; +} +/* }}} */ + +/* {{{ get_debug_info handler for BreakIterator */ +static HashTable *BreakIterator_get_debug_info(zval *object, int *is_temp TSRMLS_DC) +{ + zval zv = zval_used_for_init; + BreakIterator_object *bio; + const BreakIterator *biter; + + *is_temp = 1; + + array_init_size(&zv, 8); + + bio = (BreakIterator_object*)zend_object_store_get_object(object TSRMLS_CC); + biter = bio->biter; + + if (biter == NULL) { + add_assoc_bool_ex(&zv, "valid", sizeof("valid"), 0); + return Z_ARRVAL(zv); + } + add_assoc_bool_ex(&zv, "valid", sizeof("valid"), 1); + + if (bio->text == NULL) { + add_assoc_null_ex(&zv, "text", sizeof("text")); + } else { + zval_add_ref(&bio->text); + add_assoc_zval_ex(&zv, "text", sizeof("text"), bio->text); + } + + add_assoc_string_ex(&zv, "type", sizeof("type"), + const_cast<char*>(typeid(*biter).name()), 1); + + return Z_ARRVAL(zv); +} +/* }}} */ + +/* {{{ void breakiterator_object_init(BreakIterator_object* to) + * Initialize internals of BreakIterator_object not specific to zend standard objects. 
+ */ +static void breakiterator_object_init(BreakIterator_object *bio TSRMLS_DC) +{ + intl_error_init(BREAKITER_ERROR_P(bio) TSRMLS_CC); + bio->biter = NULL; + bio->text = NULL; +} +/* }}} */ + +/* {{{ BreakIterator_objects_dtor */ +static void BreakIterator_objects_dtor(void *object, + zend_object_handle handle TSRMLS_DC) +{ + zend_objects_destroy_object((zend_object*)object, handle TSRMLS_CC); +} +/* }}} */ + +/* {{{ BreakIterator_objects_free */ +static void BreakIterator_objects_free(zend_object *object TSRMLS_DC) +{ + BreakIterator_object* bio = (BreakIterator_object*) object; + + if (bio->text) { + zval_ptr_dtor(&bio->text); + } + if (bio->biter) { + delete bio->biter; + bio->biter = NULL; + } + intl_error_reset(BREAKITER_ERROR_P(bio) TSRMLS_CC); + + zend_object_std_dtor(&bio->zo TSRMLS_CC); + + efree(bio); +} +/* }}} */ + +/* {{{ BreakIterator_object_create */ +static zend_object_value BreakIterator_object_create(zend_class_entry *ce TSRMLS_DC) +{ + zend_object_value retval; + BreakIterator_object* intern; + + intern = (BreakIterator_object*)ecalloc(1, sizeof(BreakIterator_object)); + + zend_object_std_init(&intern->zo, ce TSRMLS_CC); +#if PHP_VERSION_ID < 50399 + zend_hash_copy(intern->zo.properties, &(ce->default_properties), + (copy_ctor_func_t) zval_add_ref, NULL, sizeof(zval*)); +#else + object_properties_init((zend_object*) intern, ce); +#endif + breakiterator_object_init(intern TSRMLS_CC); + + retval.handle = zend_objects_store_put( + intern, + BreakIterator_objects_dtor, + (zend_objects_free_object_storage_t) BreakIterator_objects_free, + NULL TSRMLS_CC); + + retval.handlers = &BreakIterator_handlers; + + return retval; +} +/* }}} */ + +/* {{{ BreakIterator/RuleBasedBreakIterator methods arguments info */ + +ZEND_BEGIN_ARG_INFO_EX(ainfo_biter_void, 0, 0, 0) +ZEND_END_ARG_INFO() + +ZEND_BEGIN_ARG_INFO_EX(ainfo_biter_locale, 0, 0, 0) + ZEND_ARG_INFO(0, "locale") +ZEND_END_ARG_INFO() + +ZEND_BEGIN_ARG_INFO_EX(ainfo_biter_setText, 0, 0, 1) + 
ZEND_ARG_INFO(0, "text") +ZEND_END_ARG_INFO() + +ZEND_BEGIN_ARG_INFO_EX(ainfo_biter_next, 0, 0, 0) + ZEND_ARG_INFO(0, "offset") +ZEND_END_ARG_INFO() + +ZEND_BEGIN_ARG_INFO_EX(ainfo_biter_offset, 0, 0, 1) + ZEND_ARG_INFO(0, "offset") +ZEND_END_ARG_INFO() + +ZEND_BEGIN_ARG_INFO_EX(ainfo_biter_get_locale, 0, 0, 1) + ZEND_ARG_INFO(0, "locale_type") +ZEND_END_ARG_INFO() + +ZEND_BEGIN_ARG_INFO_EX(ainfo_rbbi___construct, 0, 0, 1) + ZEND_ARG_INFO(0, "rules") + ZEND_ARG_INFO(0, "areCompiled") +ZEND_END_ARG_INFO() + +/* }}} */ + +/* {{{ BreakIterator_class_functions + * Every 'BreakIterator' class method has an entry in this table + */ +static const zend_function_entry BreakIterator_class_functions[] = { + PHP_ME(BreakIterator, __construct, ainfo_biter_void, ZEND_ACC_PRIVATE) + PHP_ME_MAPPING(createWordInstance, breakiter_create_word_instance, ainfo_biter_locale, ZEND_ACC_STATIC | ZEND_ACC_PUBLIC) + PHP_ME_MAPPING(createLineInstance, breakiter_create_line_instance, ainfo_biter_locale, ZEND_ACC_STATIC | ZEND_ACC_PUBLIC) + PHP_ME_MAPPING(createCharacterInstance, breakiter_create_character_instance, ainfo_biter_locale, ZEND_ACC_STATIC | ZEND_ACC_PUBLIC) + PHP_ME_MAPPING(createSentenceInstance, breakiter_create_sentence_instance, ainfo_biter_locale, ZEND_ACC_STATIC | ZEND_ACC_PUBLIC) + PHP_ME_MAPPING(createTitleInstance, breakiter_create_title_instance, ainfo_biter_locale, ZEND_ACC_STATIC | ZEND_ACC_PUBLIC) + PHP_ME_MAPPING(getAvailableLocales, breakiter_get_available_locales, ainfo_biter_void, ZEND_ACC_STATIC | ZEND_ACC_PUBLIC) + PHP_ME_MAPPING(getText, breakiter_get_text, ainfo_biter_void, ZEND_ACC_PUBLIC) + PHP_ME_MAPPING(setText, breakiter_set_text, ainfo_biter_setText, ZEND_ACC_PUBLIC) + PHP_ME_MAPPING(first, breakiter_first, ainfo_biter_void, ZEND_ACC_PUBLIC) + PHP_ME_MAPPING(last, breakiter_last, ainfo_biter_void, ZEND_ACC_PUBLIC) + PHP_ME_MAPPING(previous, breakiter_previous, ainfo_biter_void, ZEND_ACC_PUBLIC) + PHP_ME_MAPPING(next, breakiter_next, ainfo_biter_next, 
ZEND_ACC_PUBLIC) + PHP_ME_MAPPING(current, breakiter_current, ainfo_biter_void, ZEND_ACC_PUBLIC) + PHP_ME_MAPPING(following, breakiter_following, ainfo_biter_offset, ZEND_ACC_PUBLIC) + PHP_ME_MAPPING(preceding, breakiter_preceding, ainfo_biter_offset, ZEND_ACC_PUBLIC) + PHP_ME_MAPPING(isBoundary, breakiter_is_boundary, ainfo_biter_offset, ZEND_ACC_PUBLIC) + PHP_ME_MAPPING(getLocale, breakiter_get_locale, ainfo_biter_void, ZEND_ACC_PUBLIC) + PHP_ME_MAPPING(getPartsIterator, breakiter_get_parts_iterator, ainfo_biter_void, ZEND_ACC_PUBLIC) + + PHP_ME_MAPPING(getErrorCode, breakiter_get_error_code, ainfo_biter_void, ZEND_ACC_PUBLIC) + PHP_ME_MAPPING(getErrorMessage, breakiter_get_error_message, ainfo_biter_void, ZEND_ACC_PUBLIC) + PHP_FE_END +}; +/* }}} */ + +/* {{{ RuleBasedBreakIterator_class_functions + */ +static const zend_function_entry RuleBasedBreakIterator_class_functions[] = { + PHP_ME(RuleBasedBreakIterator, __construct, ainfo_rbbi___construct, ZEND_ACC_PUBLIC) + PHP_ME_MAPPING(hashCode, rbbi_hash_code, ainfo_biter_void, ZEND_ACC_PUBLIC) + PHP_ME_MAPPING(getRules, rbbi_get_rules, ainfo_biter_void, ZEND_ACC_PUBLIC) + PHP_ME_MAPPING(getRuleStatus, rbbi_get_rule_status, ainfo_biter_void, ZEND_ACC_PUBLIC) + PHP_ME_MAPPING(getRuleStatusVec, rbbi_get_rule_status_vec, ainfo_biter_void, ZEND_ACC_PUBLIC) + PHP_ME_MAPPING(getBinaryRules, rbbi_get_binary_rules, ainfo_biter_void, ZEND_ACC_PUBLIC) + PHP_FE_END +}; +/* }}} */ + + +/* {{{ breakiterator_register_BreakIterator_class + * Initialize 'BreakIterator' class + */ +void breakiterator_register_BreakIterator_class(TSRMLS_D) +{ + zend_class_entry ce; + + /* Create and register 'BreakIterator' class. 
*/ + INIT_CLASS_ENTRY(ce, "BreakIterator", BreakIterator_class_functions); + ce.create_object = BreakIterator_object_create; + ce.get_iterator = _breakiterator_get_iterator; + BreakIterator_ce_ptr = zend_register_internal_class(&ce TSRMLS_CC); + + memcpy( &BreakIterator_handlers, zend_get_std_object_handlers(), + sizeof BreakIterator_handlers); + BreakIterator_handlers.compare_objects = BreakIterator_compare_objects; + BreakIterator_handlers.clone_obj = BreakIterator_clone_obj; + BreakIterator_handlers.get_debug_info = BreakIterator_get_debug_info; + + zend_class_implements(BreakIterator_ce_ptr TSRMLS_CC, 1, + zend_ce_traversable); + + zend_declare_class_constant_long(BreakIterator_ce_ptr, + "DONE", sizeof("DONE") - 1, BreakIterator::DONE TSRMLS_CC ); + + /* Create and register 'RuleBasedBreakIterator' class. */ + INIT_CLASS_ENTRY(ce, "RuleBasedBreakIterator", + RuleBasedBreakIterator_class_functions); + RuleBasedBreakIterator_ce_ptr = zend_register_internal_class_ex(&ce, + BreakIterator_ce_ptr, NULL TSRMLS_CC); +} +/* }}} */ diff --git a/ext/intl/breakiterator/breakiterator_class.h b/ext/intl/breakiterator/breakiterator_class.h new file mode 100644 index 0000000000..a387266283 --- /dev/null +++ b/ext/intl/breakiterator/breakiterator_class.h @@ -0,0 +1,71 @@ +/* + +----------------------------------------------------------------------+ + | PHP Version 5 | + +----------------------------------------------------------------------+ + | This source file is subject to version 3.01 of the PHP license, | + | that is bundled with this package in the file LICENSE, and is | + | available through the world-wide-web at the following url: | + | http://www.php.net/license/3_01.txt | + | If you did not receive a copy of the PHP license and are unable to | + | obtain it through the world-wide-web, please send a note to | + | license@php.net so we can mail you a copy immediately. 
| + +----------------------------------------------------------------------+ + | Authors: Gustavo Lopes <cataphract@php.net> | + +----------------------------------------------------------------------+ + */ + +#ifndef BREAKITERATOR_CLASS_H +#define BREAKITERATOR_CLASS_H + +//redefinition of inline in PHP headers causes problems, so include this before +#include <math.h> + +#include <php.h> +#include "../intl_error.h" +#include "../intl_data.h" + +#ifndef USE_BREAKITERATOR_POINTER +typedef void BreakIterator; +#endif + +typedef struct { + zend_object zo; + + // error handling + intl_error err; + + // ICU break iterator + BreakIterator* biter; + + // current text + zval *text; +} BreakIterator_object; + +#define BREAKITER_ERROR(bio) (bio)->err +#define BREAKITER_ERROR_P(bio) &(BREAKITER_ERROR(bio)) + +#define BREAKITER_ERROR_CODE(bio) INTL_ERROR_CODE(BREAKITER_ERROR(bio)) +#define BREAKITER_ERROR_CODE_P(bio) &(INTL_ERROR_CODE(BREAKITER_ERROR(bio))) + +#define BREAKITER_METHOD_INIT_VARS INTL_METHOD_INIT_VARS(BreakIterator, bio) +#define BREAKITER_METHOD_FETCH_OBJECT_NO_CHECK INTL_METHOD_FETCH_OBJECT(BreakIterator, bio) +#define BREAKITER_METHOD_FETCH_OBJECT \ + BREAKITER_METHOD_FETCH_OBJECT_NO_CHECK; \ + if (bio->biter == NULL) \ + { \ + intl_errors_set(&bio->err, U_ILLEGAL_ARGUMENT_ERROR, "Found unconstructed BreakIterator", 0 TSRMLS_CC); \ + RETURN_FALSE; \ + } + +void breakiterator_object_create(zval *object, BreakIterator *break_iter TSRMLS_DC); + +void breakiterator_object_construct(zval *object, BreakIterator *break_iter TSRMLS_DC); + +void breakiterator_register_BreakIterator_class(TSRMLS_D); + +extern zend_class_entry *BreakIterator_ce_ptr, + *RuleBasedBreakIterator_ce_ptr; + +extern zend_object_handlers BreakIterator_handlers; + +#endif /* #ifndef BREAKITERATOR_CLASS_H */ diff --git a/ext/intl/breakiterator/breakiterator_iterators.cpp b/ext/intl/breakiterator/breakiterator_iterators.cpp new file mode 100644 index 0000000000..4a0cf1da80 --- /dev/null +++ 
b/ext/intl/breakiterator/breakiterator_iterators.cpp @@ -0,0 +1,218 @@ +/* + +----------------------------------------------------------------------+ + | PHP Version 5 | + +----------------------------------------------------------------------+ + | This source file is subject to version 3.01 of the PHP license, | + | that is bundled with this package in the file LICENSE, and is | + | available through the world-wide-web at the following url: | + | http://www.php.net/license/3_01.txt | + | If you did not receive a copy of the PHP license and are unable to | + | obtain it through the world-wide-web, please send a note to | + | license@php.net so we can mail you a copy immediately. | + +----------------------------------------------------------------------+ + | Authors: Gustavo Lopes <cataphract@php.net> | + +----------------------------------------------------------------------+ +*/ + +#ifdef HAVE_CONFIG_H +#include "config.h" +#endif + +#include "breakiterator_iterators.h" + +extern "C" { +#define USE_BREAKITERATOR_POINTER +#include "breakiterator_class.h" +#include "../intl_convert.h" +#include "../locale/locale.h" +#include <zend_exceptions.h> +} + +/* BreakIterator's iterator */ + +inline BreakIterator *_breakiter_prolog(zend_object_iterator *iter TSRMLS_DC) +{ + BreakIterator_object *bio; + bio = (BreakIterator_object*)zend_object_store_get_object( + (const zval*)iter->data TSRMLS_CC); + intl_errors_reset(BREAKITER_ERROR_P(bio) TSRMLS_CC); + if (bio->biter == NULL) { + intl_errors_set(BREAKITER_ERROR_P(bio), U_INVALID_STATE_ERROR, + "The BreakIterator object backing the PHP iterator is not " + "properly constructed", 0 TSRMLS_CC); + } + return bio->biter; +} + +static void _breakiterator_destroy_it(zend_object_iterator *iter TSRMLS_DC) +{ + zval_ptr_dtor((zval**)&iter->data); +} + +static void _breakiterator_move_forward(zend_object_iterator *iter TSRMLS_DC) +{ + BreakIterator *biter = _breakiter_prolog(iter TSRMLS_CC); + zoi_with_current *zoi_iter = 
(zoi_with_current*)iter; + + iter->funcs->invalidate_current(iter TSRMLS_CC); + + if (biter == NULL) { + return; + } + + int32_t pos = biter->next(); + if (pos != BreakIterator::DONE) { + MAKE_STD_ZVAL(zoi_iter->current); + ZVAL_LONG(zoi_iter->current, (long)pos); + } //else we've reached the end of the enum, nothing more is required +} + +static void _breakiterator_rewind(zend_object_iterator *iter TSRMLS_DC) +{ + BreakIterator *biter = _breakiter_prolog(iter TSRMLS_CC); + zoi_with_current *zoi_iter = (zoi_with_current*)iter; + + int32_t pos = biter->first(); + MAKE_STD_ZVAL(zoi_iter->current); + ZVAL_LONG(zoi_iter->current, (long)pos); +} + +static zend_object_iterator_funcs breakiterator_iterator_funcs = { + zoi_with_current_dtor, + zoi_with_current_valid, + zoi_with_current_get_current_data, + NULL, + _breakiterator_move_forward, + _breakiterator_rewind, + zoi_with_current_invalidate_current +}; + +U_CFUNC zend_object_iterator *_breakiterator_get_iterator( + zend_class_entry *ce, zval *object, int by_ref TSRMLS_DC) +{ + BreakIterator_object *bio; + if (by_ref) { + zend_throw_exception(NULL, + "Iteration by reference is not supported", 0 TSRMLS_CC); + return NULL; + } + + bio = (BreakIterator_object*)zend_object_store_get_object(object TSRMLS_CC); + BreakIterator *biter = bio->biter; + + if (biter == NULL) { + zend_throw_exception(NULL, + "The BreakIterator is not properly constructed", 0 TSRMLS_CC); + return NULL; + } + + zoi_with_current *zoi_iter = + static_cast<zoi_with_current*>(emalloc(sizeof *zoi_iter)); + zoi_iter->zoi.data = static_cast<void*>(object); + zoi_iter->zoi.funcs = &breakiterator_iterator_funcs; + zoi_iter->zoi.index = 0; + zoi_iter->destroy_it = _breakiterator_destroy_it; + zoi_iter->wrapping_obj = NULL; /* not used; object is in zoi.data */ + zoi_iter->current = NULL; + + zval_add_ref(&object); + + return reinterpret_cast<zend_object_iterator *>(zoi_iter); +} + +/* BreakIterator parts iterator */ + +typedef struct zoi_break_iter_parts { + 
zoi_with_current zoi_cur; + BreakIterator_object *bio; /* so we don't have to fetch it all the time */ +} zoi_break_iter_parts; + +static void _breakiterator_parts_destroy_it(zend_object_iterator *iter TSRMLS_DC) +{ + zval_ptr_dtor(reinterpret_cast<zval**>(&iter->data)); +} + +static void _breakiterator_parts_move_forward(zend_object_iterator *iter TSRMLS_DC) +{ + zoi_break_iter_parts *zoi_bit = (zoi_break_iter_parts*)iter; + BreakIterator_object *bio = zoi_bit->bio; + + iter->funcs->invalidate_current(iter TSRMLS_CC); + + int32_t cur, + next; + + cur = bio->biter->current(); + if (cur == BreakIterator::DONE) { + return; + } + next = bio->biter->next(); + if (next == BreakIterator::DONE) { + return; + } + + const char *s = Z_STRVAL_P(bio->text); + int32_t slen = Z_STRLEN_P(bio->text), + len; + char *res; + + if (next == BreakIterator::DONE) { + next = slen; + } + assert(next <= slen && next >= cur); + len = next - cur; + res = static_cast<char*>(emalloc(len + 1)); + + memcpy(res, &s[cur], len); + res[len] = '\0'; + + MAKE_STD_ZVAL(zoi_bit->zoi_cur.current); + ZVAL_STRINGL(zoi_bit->zoi_cur.current, res, len, 0); +} + +static void _breakiterator_parts_rewind(zend_object_iterator *iter TSRMLS_DC) +{ + zoi_break_iter_parts *zoi_bit = (zoi_break_iter_parts*)iter; + BreakIterator_object *bio = zoi_bit->bio; + + if (zoi_bit->zoi_cur.current) { + iter->funcs->invalidate_current(iter TSRMLS_CC); + } + + bio->biter->first(); + + iter->funcs->move_forward(iter TSRMLS_CC); +} + +static zend_object_iterator_funcs breakiterator_parts_it_funcs = { + zoi_with_current_dtor, + zoi_with_current_valid, + zoi_with_current_get_current_data, + NULL, + _breakiterator_parts_move_forward, + _breakiterator_parts_rewind, + zoi_with_current_invalidate_current +}; + +void IntlIterator_from_BreakIterator_parts(zval *break_iter_zv, + zval *object TSRMLS_DC) +{ + IntlIterator_object *ii; + + zval_add_ref(&break_iter_zv); + + object_init_ex(object, IntlIterator_ce_ptr); + ii = 
(IntlIterator_object*)zend_object_store_get_object(object TSRMLS_CC); + + ii->iterator = (zend_object_iterator*)emalloc(sizeof(zoi_break_iter_parts)); + ii->iterator->data = break_iter_zv; + ii->iterator->funcs = &breakiterator_parts_it_funcs; + ii->iterator->index = 0; + ((zoi_with_current*)ii->iterator)->destroy_it = _breakiterator_parts_destroy_it; + ((zoi_with_current*)ii->iterator)->wrapping_obj = object; + ((zoi_with_current*)ii->iterator)->current = NULL; + + ((zoi_break_iter_parts*)ii->iterator)->bio = (BreakIterator_object*) + zend_object_store_get_object(break_iter_zv TSRMLS_CC); + assert(((zoi_break_iter_parts*)ii->iterator)->bio->biter != NULL); +} diff --git a/ext/intl/breakiterator/breakiterator_iterators.h b/ext/intl/breakiterator/breakiterator_iterators.h new file mode 100644 index 0000000000..4ef5a2f4ef --- /dev/null +++ b/ext/intl/breakiterator/breakiterator_iterators.h @@ -0,0 +1,39 @@ +/* + +----------------------------------------------------------------------+ + | PHP Version 5 | + +----------------------------------------------------------------------+ + | This source file is subject to version 3.01 of the PHP license, | + | that is bundled with this package in the file LICENSE, and is | + | available through the world-wide-web at the following url: | + | http://www.php.net/license/3_01.txt | + | If you did not receive a copy of the PHP license and are unable to | + | obtain it through the world-wide-web, please send a note to | + | license@php.net so we can mail you a copy immediately. 
| + +----------------------------------------------------------------------+ + | Authors: Gustavo Lopes <cataphract@php.net> | + +----------------------------------------------------------------------+ +*/ +#ifndef INTL_BREAKITERATOR_ITERATORS_H +#define INTL_BREAKITERATOR_ITERATORS_H + +#ifndef __cplusplus +#error Header for C++ only +#endif + +#include <unicode/brkiter.h> +#include <unicode/umachine.h> + +#include "../common/common_enum.h" + +extern "C" { +#include <math.h> +#include <php.h> +} + +void IntlIterator_from_BreakIterator_parts(zval *break_iter_zv, + zval *object TSRMLS_DC); + +U_CFUNC zend_object_iterator *_breakiterator_get_iterator( + zend_class_entry *ce, zval *object, int by_ref TSRMLS_DC); + +#endif
\ No newline at end of file diff --git a/ext/intl/breakiterator/breakiterator_methods.cpp b/ext/intl/breakiterator/breakiterator_methods.cpp new file mode 100644 index 0000000000..4aca6ef23f --- /dev/null +++ b/ext/intl/breakiterator/breakiterator_methods.cpp @@ -0,0 +1,442 @@ +/* + +----------------------------------------------------------------------+ + | PHP Version 5 | + +----------------------------------------------------------------------+ + | This source file is subject to version 3.01 of the PHP license, | + | that is bundled with this package in the file LICENSE, and is | + | available through the world-wide-web at the following url: | + | http://www.php.net/license/3_01.txt | + | If you did not receive a copy of the PHP license and are unable to | + | obtain it through the world-wide-web, please send a note to | + | license@php.net so we can mail you a copy immediately. | + +----------------------------------------------------------------------+ + | Authors: Gustavo Lopes <cataphract@php.net> | + +----------------------------------------------------------------------+ +*/ + +#ifdef HAVE_CONFIG_H +#include "config.h" +#endif + +#include <unicode/brkiter.h> + +#include "breakiterator_iterators.h" + +extern "C" { +#define USE_BREAKITERATOR_POINTER 1 +#include "breakiterator_class.h" +#include "../locale/locale.h" +#include <zend_exceptions.h> +} + +U_CFUNC PHP_METHOD(BreakIterator, __construct) +{ + zend_throw_exception( NULL, + "An object of this type cannot be created with the new operator", + 0 TSRMLS_CC ); +} + +static void _breakiter_factory(const char *func_name, + BreakIterator *(*func)(const Locale&, UErrorCode&), + INTERNAL_FUNCTION_PARAMETERS) +{ + BreakIterator *biter; + const char *locale_str = NULL; + int dummy; + char *msg; + UErrorCode status = UErrorCode(); + intl_error_reset(NULL TSRMLS_CC); + + if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s!", + &locale_str, &dummy) == FAILURE) { + spprintf(&msg, NULL, "%s: bad arguments", 
func_name); + intl_error_set(NULL, U_ILLEGAL_ARGUMENT_ERROR, msg, 1 TSRMLS_CC); + efree(msg); + RETURN_NULL(); + } + + if (locale_str == NULL) { + locale_str = intl_locale_get_default(TSRMLS_C); + } + + biter = func(Locale::createFromName(locale_str), status); + intl_error_set_code(NULL, status TSRMLS_CC); + if (U_FAILURE(status)) { + spprintf(&msg, NULL, "%s: error creating BreakIterator", + func_name); + intl_error_set_custom_msg(NULL, msg, 1 TSRMLS_CC); + efree(msg); + RETURN_NULL(); + } + + breakiterator_object_create(return_value, biter TSRMLS_CC); +} + +U_CFUNC PHP_FUNCTION(breakiter_create_word_instance) +{ + _breakiter_factory("breakiter_create_word_instance", + &BreakIterator::createWordInstance, + INTERNAL_FUNCTION_PARAM_PASSTHRU); +} + +U_CFUNC PHP_FUNCTION(breakiter_create_line_instance) +{ + _breakiter_factory("breakiter_create_line_instance", + &BreakIterator::createLineInstance, + INTERNAL_FUNCTION_PARAM_PASSTHRU); +} + +U_CFUNC PHP_FUNCTION(breakiter_create_character_instance) +{ + _breakiter_factory("breakiter_create_character_instance", + &BreakIterator::createCharacterInstance, + INTERNAL_FUNCTION_PARAM_PASSTHRU); +} + +U_CFUNC PHP_FUNCTION(breakiter_create_sentence_instance) +{ + _breakiter_factory("breakiter_create_sentence_instance", + &BreakIterator::createSentenceInstance, + INTERNAL_FUNCTION_PARAM_PASSTHRU); +} + +U_CFUNC PHP_FUNCTION(breakiter_create_title_instance) +{ + _breakiter_factory("breakiter_create_title_instance", + &BreakIterator::createTitleInstance, + INTERNAL_FUNCTION_PARAM_PASSTHRU); +} + +U_CFUNC PHP_FUNCTION(breakiter_get_available_locales) +{ + intl_error_reset(NULL TSRMLS_CC); + + if (zend_parse_parameters_none() == FAILURE) { + intl_error_set(NULL, U_ILLEGAL_ARGUMENT_ERROR, + "breakiter_get_available_locales: bad arguments", 0 TSRMLS_CC); + RETURN_FALSE; + } + + const Locale *locales; + int32_t count; + + locales = BreakIterator::getAvailableLocales(count); + array_init_size(return_value, (uint)count); + for (int i = 0; 
i < count; i++) { + Locale locale = locales[i]; + add_next_index_string(return_value, locale.getName(), 1); + } +} + +U_CFUNC PHP_FUNCTION(breakiter_get_text) +{ + BREAKITER_METHOD_INIT_VARS; + + if (zend_parse_method_parameters(ZEND_NUM_ARGS() TSRMLS_CC, getThis(), "O", + &object, BreakIterator_ce_ptr) == FAILURE) { + intl_error_set(NULL, U_ILLEGAL_ARGUMENT_ERROR, + "breakiter_get_text: bad arguments", 0 TSRMLS_CC); + RETURN_FALSE; + } + + BREAKITER_METHOD_FETCH_OBJECT; + + if (bio->text == NULL) { + RETURN_NULL(); + } else { + RETURN_ZVAL(bio->text, 1, 0); + } +} + +U_CFUNC PHP_FUNCTION(breakiter_set_text) +{ + char *text; + int text_len; + UText *ut = NULL; + zval **textzv; + BREAKITER_METHOD_INIT_VARS; + + if (zend_parse_method_parameters(ZEND_NUM_ARGS() TSRMLS_CC, getThis(), "Os", + &object, BreakIterator_ce_ptr, &text, &text_len) == FAILURE) { + intl_error_set(NULL, U_ILLEGAL_ARGUMENT_ERROR, + "breakiter_set_text: bad arguments", 0 TSRMLS_CC); + RETURN_FALSE; + } + + int res = zend_get_parameters_ex(1, &textzv); + assert(res == SUCCESS); + + BREAKITER_METHOD_FETCH_OBJECT; + + /* assert it's safe to use text and text_len because zpp changes the + * arguments in the stack */ + assert(text == Z_STRVAL_PP(textzv)); + + ut = utext_openUTF8(ut, text, text_len, BREAKITER_ERROR_CODE_P(bio)); + INTL_CTOR_CHECK_STATUS(bio, "breakiter_set_text: error opening UText"); + + bio->biter->setText(ut, BREAKITER_ERROR_CODE(bio)); + utext_close(ut); /* ICU shallow clones the UText */ + INTL_CTOR_CHECK_STATUS(bio, "breakiter_set_text: error calling " + "BreakIterator::setText()"); + + /* When ICU clones the UText, it does not copy the buffer, so we have to + * keep the string buffer around by holding a reference to its zval. 
This + * also allows a faster implementation of getText() */ + if (bio->text != NULL) { + zval_ptr_dtor(&bio->text); + } + bio->text = *textzv; + zval_add_ref(&bio->text); + + RETURN_TRUE; +} + +static void _breakiter_no_args_ret_int32( + const char *func_name, + int32_t (BreakIterator::*func)(), + INTERNAL_FUNCTION_PARAMETERS) +{ + char *msg; + BREAKITER_METHOD_INIT_VARS; + + if (zend_parse_method_parameters(ZEND_NUM_ARGS() TSRMLS_CC, getThis(), "O", + &object, BreakIterator_ce_ptr) == FAILURE) { + spprintf(&msg, NULL, "%s: bad arguments", func_name); + intl_error_set(NULL, U_ILLEGAL_ARGUMENT_ERROR, msg, 1 TSRMLS_CC); + efree(msg); + RETURN_FALSE; + } + + BREAKITER_METHOD_FETCH_OBJECT; + + int32_t res = (bio->biter->*func)(); + + RETURN_LONG((long)res); +} + +static void _breakiter_int32_ret_int32( + const char *func_name, + int32_t (BreakIterator::*func)(int32_t), + INTERNAL_FUNCTION_PARAMETERS) +{ + char *msg; + long arg; + BREAKITER_METHOD_INIT_VARS; + + if (zend_parse_method_parameters(ZEND_NUM_ARGS() TSRMLS_CC, getThis(), "Ol", + &object, BreakIterator_ce_ptr, &arg) == FAILURE) { + spprintf(&msg, NULL, "%s: bad arguments", func_name); + intl_error_set(NULL, U_ILLEGAL_ARGUMENT_ERROR, msg, 1 TSRMLS_CC); + efree(msg); + RETURN_FALSE; + } + + BREAKITER_METHOD_FETCH_OBJECT; + + if (arg < INT32_MIN || arg > INT32_MAX) { + spprintf(&msg, NULL, "%s: offset argument is outside bounds of " + "a 32-bit wide integer", func_name); + intl_error_set(NULL, U_ILLEGAL_ARGUMENT_ERROR, msg, 1 TSRMLS_CC); + efree(msg); + RETURN_FALSE; + } + + int32_t res = (bio->biter->*func)((int32_t)arg); + + RETURN_LONG((long)res); +} + +U_CFUNC PHP_FUNCTION(breakiter_first) +{ + _breakiter_no_args_ret_int32("breakiter_first", + &BreakIterator::first, + INTERNAL_FUNCTION_PARAM_PASSTHRU); +} + +U_CFUNC PHP_FUNCTION(breakiter_last) +{ + _breakiter_no_args_ret_int32("breakiter_last", + &BreakIterator::last, + INTERNAL_FUNCTION_PARAM_PASSTHRU); +} + +U_CFUNC PHP_FUNCTION(breakiter_previous) +{ +
_breakiter_no_args_ret_int32("breakiter_previous", + &BreakIterator::previous, + INTERNAL_FUNCTION_PARAM_PASSTHRU); +} + +U_CFUNC PHP_FUNCTION(breakiter_next) +{ + bool no_arg_version = false; + + if (ZEND_NUM_ARGS() == 0) { + no_arg_version = true; + } else if (ZEND_NUM_ARGS() == 1) { + zval **arg; + int res = zend_get_parameters_ex(1, &arg); + assert(res == SUCCESS); + if (Z_TYPE_PP(arg) == IS_NULL) { + no_arg_version = true; + ht = 0; /* pretend we don't have any argument */ + } else { + no_arg_version = false; + } + } + + if (no_arg_version) { + _breakiter_no_args_ret_int32("breakiter_next", + &BreakIterator::next, + INTERNAL_FUNCTION_PARAM_PASSTHRU); + } else { + _breakiter_int32_ret_int32("breakiter_next", + &BreakIterator::next, + INTERNAL_FUNCTION_PARAM_PASSTHRU); + } +} + +U_CFUNC PHP_FUNCTION(breakiter_current) +{ + BREAKITER_METHOD_INIT_VARS; + + if (zend_parse_method_parameters(ZEND_NUM_ARGS() TSRMLS_CC, getThis(), "O", + &object, BreakIterator_ce_ptr) == FAILURE) { + intl_error_set(NULL, U_ILLEGAL_ARGUMENT_ERROR, + "breakiter_current: bad arguments", 0 TSRMLS_CC); + RETURN_FALSE; + } + + BREAKITER_METHOD_FETCH_OBJECT; + + int32_t res = bio->biter->current(); + + RETURN_LONG((long)res); +} + +U_CFUNC PHP_FUNCTION(breakiter_following) +{ + _breakiter_int32_ret_int32("breakiter_following", + &BreakIterator::following, + INTERNAL_FUNCTION_PARAM_PASSTHRU); +} + +U_CFUNC PHP_FUNCTION(breakiter_preceding) +{ + _breakiter_int32_ret_int32("breakiter_preceding", + &BreakIterator::preceding, + INTERNAL_FUNCTION_PARAM_PASSTHRU); +} + +U_CFUNC PHP_FUNCTION(breakiter_is_boundary) +{ + long offset; + BREAKITER_METHOD_INIT_VARS; + + if (zend_parse_method_parameters(ZEND_NUM_ARGS() TSRMLS_CC, getThis(), "Ol", + &object, BreakIterator_ce_ptr, &offset) == FAILURE) { + intl_error_set(NULL, U_ILLEGAL_ARGUMENT_ERROR, + "breakiter_is_boundary: bad arguments", 0 TSRMLS_CC); + RETURN_FALSE; + } + + if (offset < INT32_MIN || offset > INT32_MAX) { + intl_error_set(NULL, 
U_ILLEGAL_ARGUMENT_ERROR, + "breakiter_is_boundary: offset argument is outside bounds of " + "a 32-bit wide integer", 0 TSRMLS_CC); + RETURN_FALSE; + } + + BREAKITER_METHOD_FETCH_OBJECT; + + UBool res = bio->biter->isBoundary((int32_t)offset); + + RETURN_BOOL((long)res); +} + +U_CFUNC PHP_FUNCTION(breakiter_get_locale) +{ + long locale_type; + BREAKITER_METHOD_INIT_VARS; + + if (zend_parse_method_parameters(ZEND_NUM_ARGS() TSRMLS_CC, getThis(), + "Ol", &object, BreakIterator_ce_ptr, &locale_type) == FAILURE) { + intl_error_set(NULL, U_ILLEGAL_ARGUMENT_ERROR, + "breakiter_get_locale: bad arguments", 0 TSRMLS_CC); + RETURN_FALSE; + } + + if (locale_type != ULOC_ACTUAL_LOCALE && locale_type != ULOC_VALID_LOCALE) { + intl_error_set(NULL, U_ILLEGAL_ARGUMENT_ERROR, + "breakiter_get_locale: invalid locale type", 0 TSRMLS_CC); + RETURN_FALSE; + } + + BREAKITER_METHOD_FETCH_OBJECT; + + Locale locale = bio->biter->getLocale((ULocDataLocaleType)locale_type, + BREAKITER_ERROR_CODE(bio)); + INTL_METHOD_CHECK_STATUS(bio, + "breakiter_get_locale: Call to ICU method has failed"); + + RETURN_STRING(locale.getName(), 1); +} + +U_CFUNC PHP_FUNCTION(breakiter_get_parts_iterator) +{ + BREAKITER_METHOD_INIT_VARS; + + if (zend_parse_method_parameters(ZEND_NUM_ARGS() TSRMLS_CC, getThis(), "O", + &object, BreakIterator_ce_ptr) == FAILURE) { + intl_error_set(NULL, U_ILLEGAL_ARGUMENT_ERROR, + "breakiter_get_parts_iterator: bad arguments", 0 TSRMLS_CC); + RETURN_FALSE; + } + + BREAKITER_METHOD_FETCH_OBJECT; + + IntlIterator_from_BreakIterator_parts(object, return_value TSRMLS_CC); +} + +U_CFUNC PHP_FUNCTION(breakiter_get_error_code) +{ + BREAKITER_METHOD_INIT_VARS; + + if (zend_parse_method_parameters(ZEND_NUM_ARGS() TSRMLS_CC, getThis(), "O", + &object, BreakIterator_ce_ptr) == FAILURE) { + intl_error_set(NULL, U_ILLEGAL_ARGUMENT_ERROR, + "breakiter_get_error_code: bad arguments", 0 TSRMLS_CC); + RETURN_FALSE; + } + + /* Fetch the object (without resetting its last error code ). 
*/ + bio = (BreakIterator_object*)zend_object_store_get_object(object TSRMLS_CC); + if (bio == NULL) + RETURN_FALSE; + + RETURN_LONG((long)BREAKITER_ERROR_CODE(bio)); +} + +U_CFUNC PHP_FUNCTION(breakiter_get_error_message) +{ + const char* message = NULL; + BREAKITER_METHOD_INIT_VARS; + + if (zend_parse_method_parameters(ZEND_NUM_ARGS() TSRMLS_CC, getThis(), "O", + &object, BreakIterator_ce_ptr) == FAILURE) { + intl_error_set( NULL, U_ILLEGAL_ARGUMENT_ERROR, + "breakiter_get_error_message: bad arguments", 0 TSRMLS_CC ); + RETURN_FALSE; + } + + + /* Fetch the object (without resetting its last error code ). */ + bio = (BreakIterator_object*)zend_object_store_get_object(object TSRMLS_CC); + if (bio == NULL) + RETURN_FALSE; + + /* Return last error message. */ + message = intl_error_get_message(BREAKITER_ERROR_P(bio) TSRMLS_CC); + RETURN_STRING(message, 0); +} diff --git a/ext/intl/breakiterator/breakiterator_methods.h b/ext/intl/breakiterator/breakiterator_methods.h new file mode 100644 index 0000000000..42a6f3a1b3 --- /dev/null +++ b/ext/intl/breakiterator/breakiterator_methods.h @@ -0,0 +1,64 @@ +/* + +----------------------------------------------------------------------+ + | PHP Version 5 | + +----------------------------------------------------------------------+ + | This source file is subject to version 3.01 of the PHP license, | + | that is bundled with this package in the file LICENSE, and is | + | available through the world-wide-web at the following url: | + | http://www.php.net/license/3_01.txt | + | If you did not receive a copy of the PHP license and are unable to | + | obtain it through the world-wide-web, please send a note to | + | license@php.net so we can mail you a copy immediately. 
| + +----------------------------------------------------------------------+ + | Authors: Gustavo Lopes <cataphract@php.net> | + +----------------------------------------------------------------------+ + */ + +#ifndef BREAKITERATOR_METHODS_H +#define BREAKITERATOR_METHODS_H + +#include <php.h> + +PHP_METHOD(BreakIterator, __construct); + +PHP_FUNCTION(breakiter_create_word_instance); + +PHP_FUNCTION(breakiter_create_line_instance); + +PHP_FUNCTION(breakiter_create_character_instance); + +PHP_FUNCTION(breakiter_create_sentence_instance); + +PHP_FUNCTION(breakiter_create_title_instance); + +PHP_FUNCTION(breakiter_get_available_locales); + +PHP_FUNCTION(breakiter_get_text); + +PHP_FUNCTION(breakiter_set_text); + +PHP_FUNCTION(breakiter_first); + +PHP_FUNCTION(breakiter_last); + +PHP_FUNCTION(breakiter_previous); + +PHP_FUNCTION(breakiter_next); + +PHP_FUNCTION(breakiter_current); + +PHP_FUNCTION(breakiter_following); + +PHP_FUNCTION(breakiter_preceding); + +PHP_FUNCTION(breakiter_is_boundary); + +PHP_FUNCTION(breakiter_get_locale); + +PHP_FUNCTION(breakiter_get_parts_iterator); + +PHP_FUNCTION(breakiter_get_error_code); + +PHP_FUNCTION(breakiter_get_error_message); + +#endif
\ No newline at end of file diff --git a/ext/intl/breakiterator/rulebasedbreakiterator_methods.cpp b/ext/intl/breakiterator/rulebasedbreakiterator_methods.cpp new file mode 100644 index 0000000000..5ccef90707 --- /dev/null +++ b/ext/intl/breakiterator/rulebasedbreakiterator_methods.cpp @@ -0,0 +1,227 @@ +/* + +----------------------------------------------------------------------+ + | PHP Version 5 | + +----------------------------------------------------------------------+ + | This source file is subject to version 3.01 of the PHP license, | + | that is bundled with this package in the file LICENSE, and is | + | available through the world-wide-web at the following url: | + | http://www.php.net/license/3_01.txt | + | If you did not receive a copy of the PHP license and are unable to | + | obtain it through the world-wide-web, please send a note to | + | license@php.net so we can mail you a copy immediately. | + +----------------------------------------------------------------------+ + | Authors: Gustavo Lopes <cataphract@php.net> | + +----------------------------------------------------------------------+ + */ + +#include <unicode/rbbi.h> + +extern "C" { +#define USE_BREAKITERATOR_POINTER 1 +#include "breakiterator_class.h" +#include <zend_exceptions.h> +#include <limits.h> +} + +#include "../intl_convertcpp.h" + +static inline RuleBasedBreakIterator *fetch_rbbi(BreakIterator_object *bio) { + return (RuleBasedBreakIterator*)bio->biter; +} + +static void _php_intlgregcal_constructor_body(INTERNAL_FUNCTION_PARAMETERS) +{ + zval *object = getThis(); + char *rules; + int rules_len; + zend_bool compiled = 0; + UErrorCode status = U_ZERO_ERROR; + intl_error_reset(NULL TSRMLS_CC); + + if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s|b", + &rules, &rules_len, &compiled) == FAILURE) { + intl_error_set(NULL, U_ILLEGAL_ARGUMENT_ERROR, + "rbbi_create_instance: bad arguments", 0 TSRMLS_CC); + RETURN_NULL(); + } + + // instantiation of ICU object + RuleBasedBreakIterator 
*rbbi; + + if (!compiled) { + UnicodeString rulesStr; + UParseError parseError = UParseError(); + if (intl_stringFromChar(rulesStr, rules, rules_len, &status) + == FAILURE) { + intl_error_set(NULL, U_ILLEGAL_ARGUMENT_ERROR, + "rbbi_create_instance: rules were not a valid UTF-8 string", + 0 TSRMLS_CC); + RETURN_NULL(); + } + + rbbi = new RuleBasedBreakIterator(rulesStr, parseError, status); + intl_error_set_code(NULL, status TSRMLS_CC); + if (U_FAILURE(status)) { + char *msg; + smart_str parse_error_str; + parse_error_str = intl_parse_error_to_string(&parseError); + spprintf(&msg, 0, "rbbi_create_instance: unable to create " + "RuleBasedBreakIterator from rules (%s)", parse_error_str.c); + smart_str_free(&parse_error_str); + intl_error_set_custom_msg(NULL, msg, 1 TSRMLS_CC); + efree(msg); + RETURN_NULL(); + } + } else { // compiled + rbbi = new RuleBasedBreakIterator((uint8_t*)rules, rules_len, status); + if (U_FAILURE(status)) { + intl_error_set(NULL, status, "rbbi_create_instance: unable to " + "create instance from compiled rules", 0 TSRMLS_CC); + RETURN_NULL(); + } + } + + breakiterator_object_create(return_value, rbbi TSRMLS_CC); +} + +U_CFUNC PHP_METHOD(RuleBasedBreakIterator, __construct) +{ + zval orig_this = *getThis(); + + return_value = getThis(); + //changes this to IS_NULL (without first destroying) if there's an error + _php_intlgregcal_constructor_body(INTERNAL_FUNCTION_PARAM_PASSTHRU); + + if (Z_TYPE_P(return_value) == IS_NULL) { + zend_object_store_ctor_failed(&orig_this TSRMLS_CC); + zval_dtor(&orig_this); + } +} + +U_CFUNC PHP_FUNCTION(rbbi_hash_code) +{ + BREAKITER_METHOD_INIT_VARS; + + if (zend_parse_method_parameters(ZEND_NUM_ARGS() TSRMLS_CC, getThis(), "O", + &object, BreakIterator_ce_ptr) == FAILURE) { + intl_error_set(NULL, U_ILLEGAL_ARGUMENT_ERROR, + "rbbi_hash_code: bad arguments", 0 TSRMLS_CC); + RETURN_FALSE; + } + + BREAKITER_METHOD_FETCH_OBJECT; + + RETURN_LONG(fetch_rbbi(bio)->hashCode()); +} + +U_CFUNC PHP_FUNCTION(rbbi_get_rules)
+{ + BREAKITER_METHOD_INIT_VARS; + + if (zend_parse_method_parameters(ZEND_NUM_ARGS() TSRMLS_CC, getThis(), "O", + &object, BreakIterator_ce_ptr) == FAILURE) { + intl_error_set(NULL, U_ILLEGAL_ARGUMENT_ERROR, + "rbbi_get_rules: bad arguments", 0 TSRMLS_CC); + RETURN_FALSE; + } + + BREAKITER_METHOD_FETCH_OBJECT; + + const UnicodeString rules = fetch_rbbi(bio)->getRules(); + + Z_TYPE_P(return_value) = IS_STRING; + if (intl_charFromString(rules, &Z_STRVAL_P(return_value), + &Z_STRLEN_P(return_value), BREAKITER_ERROR_CODE_P(bio)) == FAILURE) + { + intl_errors_set(BREAKITER_ERROR_P(bio), BREAKITER_ERROR_CODE(bio), + "rbbi_get_rules: Error converting result to UTF-8 string", + 0 TSRMLS_CC); + RETURN_FALSE; + } +} + +U_CFUNC PHP_FUNCTION(rbbi_get_rule_status) +{ + BREAKITER_METHOD_INIT_VARS; + + if (zend_parse_method_parameters(ZEND_NUM_ARGS() TSRMLS_CC, getThis(), "O", + &object, BreakIterator_ce_ptr) == FAILURE) { + intl_error_set(NULL, U_ILLEGAL_ARGUMENT_ERROR, + "rbbi_get_rule_status: bad arguments", 0 TSRMLS_CC); + RETURN_FALSE; + } + + BREAKITER_METHOD_FETCH_OBJECT; + + RETURN_LONG(fetch_rbbi(bio)->getRuleStatus()); +} + +U_CFUNC PHP_FUNCTION(rbbi_get_rule_status_vec) +{ + BREAKITER_METHOD_INIT_VARS; + + if (zend_parse_method_parameters(ZEND_NUM_ARGS() TSRMLS_CC, getThis(), "O", + &object, BreakIterator_ce_ptr) == FAILURE) { + intl_error_set(NULL, U_ILLEGAL_ARGUMENT_ERROR, + "rbbi_get_rule_status_vec: bad arguments", 0 TSRMLS_CC); + RETURN_FALSE; + } + + BREAKITER_METHOD_FETCH_OBJECT; + + int32_t num_rules = fetch_rbbi(bio)->getRuleStatusVec(NULL, 0, + BREAKITER_ERROR_CODE(bio)); + if (BREAKITER_ERROR_CODE(bio) == U_BUFFER_OVERFLOW_ERROR) { + BREAKITER_ERROR_CODE(bio) = U_ZERO_ERROR; + } else { + // should not happen + INTL_METHOD_CHECK_STATUS(bio, "rbbi_get_rule_status_vec: failed " + "determining the number of status values"); + } + int32_t *rules = new int32_t[num_rules]; + num_rules = fetch_rbbi(bio)->getRuleStatusVec(rules, num_rules, +
BREAKITER_ERROR_CODE(bio)); + if (U_FAILURE(BREAKITER_ERROR_CODE(bio))) { + delete[] rules; + intl_errors_set(BREAKITER_ERROR_P(bio), BREAKITER_ERROR_CODE(bio), + "rbbi_get_rule_status_vec: failed obtaining the status values", + 0 TSRMLS_CC); + RETURN_FALSE; + } + + array_init_size(return_value, num_rules); + for (int32_t i = 0; i < num_rules; i++) { + add_next_index_long(return_value, rules[i]); + } + delete[] rules; +} + +U_CFUNC PHP_FUNCTION(rbbi_get_binary_rules) +{ + BREAKITER_METHOD_INIT_VARS; + + if (zend_parse_method_parameters(ZEND_NUM_ARGS() TSRMLS_CC, getThis(), "O", + &object, BreakIterator_ce_ptr) == FAILURE) { + intl_error_set(NULL, U_ILLEGAL_ARGUMENT_ERROR, + "rbbi_get_binary_rules: bad arguments", 0 TSRMLS_CC); + RETURN_FALSE; + } + + BREAKITER_METHOD_FETCH_OBJECT; + + uint32_t rules_len; + const uint8_t *rules = fetch_rbbi(bio)->getBinaryRules(rules_len); + + if (rules_len > INT_MAX - 1) { + intl_errors_set(BREAKITER_ERROR_P(bio), BREAKITER_ERROR_CODE(bio), + "rbbi_get_binary_rules: the rules are too large", + 0 TSRMLS_CC); + RETURN_FALSE; + } + + char *ret_rules = static_cast<char*>(emalloc(rules_len + 1)); + memcpy(ret_rules, rules, rules_len); + ret_rules[rules_len] = '\0'; + + RETURN_STRINGL(ret_rules, rules_len, 0); +} diff --git a/ext/intl/breakiterator/rulebasedbreakiterator_methods.h b/ext/intl/breakiterator/rulebasedbreakiterator_methods.h new file mode 100644 index 0000000000..b645e0c1cc --- /dev/null +++ b/ext/intl/breakiterator/rulebasedbreakiterator_methods.h @@ -0,0 +1,34 @@ +/* + +----------------------------------------------------------------------+ + | PHP Version 5 | + +----------------------------------------------------------------------+ + | This source file is subject to version 3.01 of the PHP license, | + | that is bundled with this package in the file LICENSE, and is | + | available through the world-wide-web at the following url: | + | http://www.php.net/license/3_01.txt | + | If you did not receive a copy of the PHP 
license and are unable to | + | obtain it through the world-wide-web, please send a note to | + | license@php.net so we can mail you a copy immediately. | + +----------------------------------------------------------------------+ + | Authors: Gustavo Lopes <cataphract@php.net> | + +----------------------------------------------------------------------+ + */ + +#ifndef RULEBASEDBREAKITERATOR_METHODS_H +#define RULEBASEDBREAKITERATOR_METHODS_H + +#include <php.h> + +PHP_METHOD(RuleBasedBreakIterator, __construct); + +PHP_FUNCTION(rbbi_hash_code); + +PHP_FUNCTION(rbbi_get_rules); + +PHP_FUNCTION(rbbi_get_rule_status); + +PHP_FUNCTION(rbbi_get_rule_status_vec); + +PHP_FUNCTION(rbbi_get_binary_rules); + +#endif
\ No newline at end of file diff --git a/ext/intl/common/common_enum.cpp b/ext/intl/common/common_enum.cpp index a0e346061a..6dfacd7e3a 100644 --- a/ext/intl/common/common_enum.cpp +++ b/ext/intl/common/common_enum.cpp @@ -26,45 +26,14 @@ #include "common_enum.h" extern "C" { -#include "intl_error.h" -#include "intl_data.h" #include <zend_interfaces.h> #include <zend_exceptions.h> } -static zend_class_entry *IntlIterator_ce_ptr; +zend_class_entry *IntlIterator_ce_ptr; static zend_object_handlers IntlIterator_handlers; -typedef struct { - zend_object zo; - intl_error err; - zend_object_iterator *iterator; -} IntlIterator_object; - -#define INTLITERATOR_ERROR(ii) (ii)->err -#define INTLITERATOR_ERROR_P(ii) &(INTLITERATOR_ERROR(ii)) - -#define INTLITERATOR_ERROR_CODE(ii) INTL_ERROR_CODE(INTLITERATOR_ERROR(ii)) -#define INTLITERATOR_ERROR_CODE_P(ii) &(INTL_ERROR_CODE(INTLITERATOR_ERROR(ii))) - -#define INTLITERATOR_METHOD_INIT_VARS INTL_METHOD_INIT_VARS(IntlIterator, ii) -#define INTLITERATOR_METHOD_FETCH_OBJECT_NO_CHECK INTL_METHOD_FETCH_OBJECT(IntlIterator, ii) -#define INTLITERATOR_METHOD_FETCH_OBJECT\ - object = getThis(); \ - INTLITERATOR_METHOD_FETCH_OBJECT_NO_CHECK; \ - if (ii->iterator == NULL) { \ - intl_errors_set(&ii->err, U_ILLEGAL_ARGUMENT_ERROR, "Found unconstructed IntlIterator", 0 TSRMLS_CC); \ - RETURN_FALSE; \ - } - -typedef struct { - zend_object_iterator zoi; - zval *current; - zval *wrapping_obj; - void (*destroy_free_it)(zend_object_iterator *iterator TSRMLS_DC); -} zoi_with_current; - -static void zoi_with_current_dtor(zend_object_iterator *iter TSRMLS_DC) +void zoi_with_current_dtor(zend_object_iterator *iter TSRMLS_DC) { zoi_with_current *zoiwc = (zoi_with_current*)iter; @@ -84,22 +53,22 @@ static void zoi_with_current_dtor(zend_object_iterator *iter TSRMLS_DC) * function being called by the iterator wrapper destructor function and * not finding the memory of this iterator allocated anymore. 
*/ iter->funcs->invalidate_current(iter TSRMLS_CC); - zoiwc->destroy_free_it(iter TSRMLS_CC); + zoiwc->destroy_it(iter TSRMLS_CC); efree(iter); } } -static int zoi_with_current_valid(zend_object_iterator *iter TSRMLS_DC) +U_CFUNC int zoi_with_current_valid(zend_object_iterator *iter TSRMLS_DC) { return ((zoi_with_current*)iter)->current != NULL ? SUCCESS : FAILURE; } -static void zoi_with_current_get_current_data(zend_object_iterator *iter, zval ***data TSRMLS_DC) +U_CFUNC void zoi_with_current_get_current_data(zend_object_iterator *iter, zval ***data TSRMLS_DC) { *data = &((zoi_with_current*)iter)->current; } -static void zoi_with_current_invalidate_current(zend_object_iterator *iter TSRMLS_DC) +U_CFUNC void zoi_with_current_invalidate_current(zend_object_iterator *iter TSRMLS_DC) { zoi_with_current *zoi_iter = (zoi_with_current*)iter; if (zoi_iter->current) { @@ -155,7 +124,7 @@ static void string_enum_rewind(zend_object_iterator *iter TSRMLS_DC) } } -static void string_enum_destroy_free_it(zend_object_iterator *iter TSRMLS_DC) +static void string_enum_destroy_it(zend_object_iterator *iter TSRMLS_DC) { delete (StringEnumeration*)iter->data; } @@ -179,7 +148,7 @@ U_CFUNC void IntlIterator_from_StringEnumeration(StringEnumeration *se, zval *ob ii->iterator->data = (void*)se; ii->iterator->funcs = &string_enum_object_iterator_funcs; ii->iterator->index = 0; - ((zoi_with_current*)ii->iterator)->destroy_free_it = string_enum_destroy_free_it; + ((zoi_with_current*)ii->iterator)->destroy_it = string_enum_destroy_it; ((zoi_with_current*)ii->iterator)->wrapping_obj = object; ((zoi_with_current*)ii->iterator)->current = NULL; } @@ -331,7 +300,7 @@ static PHP_METHOD(IntlIterator, rewind) if (ii->iterator->funcs->rewind) { ii->iterator->funcs->rewind(ii->iterator TSRMLS_CC); } else { - intl_error_set(NULL, U_UNSUPPORTED_ERROR, + intl_errors_set(INTLITERATOR_ERROR_P(ii), U_UNSUPPORTED_ERROR, "IntlIterator::rewind: rewind not supported", 0 TSRMLS_CC); } } diff --git 
a/ext/intl/common/common_enum.h b/ext/intl/common/common_enum.h index f3c8bfcead..bcd9f44796 100644 --- a/ext/intl/common/common_enum.h +++ b/ext/intl/common/common_enum.h @@ -25,10 +25,48 @@ extern "C" { #include <math.h> #endif #include <php.h> +#include "../intl_error.h" +#include "../intl_data.h" #ifdef __cplusplus } #endif +#define INTLITERATOR_ERROR(ii) (ii)->err +#define INTLITERATOR_ERROR_P(ii) &(INTLITERATOR_ERROR(ii)) + +#define INTLITERATOR_ERROR_CODE(ii) INTL_ERROR_CODE(INTLITERATOR_ERROR(ii)) +#define INTLITERATOR_ERROR_CODE_P(ii) &(INTL_ERROR_CODE(INTLITERATOR_ERROR(ii))) + +#define INTLITERATOR_METHOD_INIT_VARS INTL_METHOD_INIT_VARS(IntlIterator, ii) +#define INTLITERATOR_METHOD_FETCH_OBJECT_NO_CHECK INTL_METHOD_FETCH_OBJECT(IntlIterator, ii) +#define INTLITERATOR_METHOD_FETCH_OBJECT\ + object = getThis(); \ + INTLITERATOR_METHOD_FETCH_OBJECT_NO_CHECK; \ + if (ii->iterator == NULL) { \ + intl_errors_set(&ii->err, U_ILLEGAL_ARGUMENT_ERROR, "Found unconstructed IntlIterator", 0 TSRMLS_CC); \ + RETURN_FALSE; \ + } + +typedef struct { + zend_object zo; + intl_error err; + zend_object_iterator *iterator; +} IntlIterator_object; + +typedef struct { + zend_object_iterator zoi; + zval *current; + zval *wrapping_obj; + void (*destroy_it)(zend_object_iterator *iterator TSRMLS_DC); +} zoi_with_current; + +extern zend_class_entry *IntlIterator_ce_ptr; + +U_CFUNC void zoi_with_current_dtor(zend_object_iterator *iter TSRMLS_DC); +U_CFUNC int zoi_with_current_valid(zend_object_iterator *iter TSRMLS_DC); +U_CFUNC void zoi_with_current_get_current_data(zend_object_iterator *iter, zval ***data TSRMLS_DC); +U_CFUNC void zoi_with_current_invalidate_current(zend_object_iterator *iter TSRMLS_DC); + #ifdef __cplusplus U_CFUNC void IntlIterator_from_StringEnumeration(StringEnumeration *se, zval *object TSRMLS_DC); #endif diff --git a/ext/intl/config.m4 b/ext/intl/config.m4 index 431deeb7d2..227368334d 100755 --- a/ext/intl/config.m4 +++ b/ext/intl/config.m4 @@ -75,6 +75,10 
@@ if test "$PHP_INTL" != "no"; then calendar/calendar_class.cpp \ calendar/calendar_methods.cpp \ calendar/gregoriancalendar_methods.cpp \ + breakiterator/breakiterator_class.cpp \ + breakiterator/breakiterator_iterators.cpp \ + breakiterator/breakiterator_methods.cpp \ + breakiterator/rulebasedbreakiterator_methods.cpp \ idn/idn.c \ $icu_spoof_src, $ext_shared,,$ICU_INCS -Wno-write-strings) PHP_ADD_BUILD_DIR($ext_builddir/collator) diff --git a/ext/intl/config.w32 b/ext/intl/config.w32 index 735749ab43..6b7d15d56d 100755 --- a/ext/intl/config.w32 +++ b/ext/intl/config.w32 @@ -102,6 +102,13 @@ if (PHP_INTL != "no") { gregoriancalendar_methods.cpp \ calendar_class.cpp", "intl"); + + ADD_SOURCES(configure_module_dirname + "/breakiterator", "\ + breakiterator_class.cpp \ + breakiterator_methods.cpp \ + breakiterator_iterators.cpp \ + rulebasedbreakiterator_methods.cpp", + "intl"); ADD_FLAG("LIBS_INTL", "icudt.lib icuin.lib icuio.lib icule.lib iculx.lib"); AC_DEFINE("HAVE_INTL", 1, "Internationalization support enabled"); diff --git a/ext/intl/intl_error.c b/ext/intl/intl_error.c index 2c7066b081..99b1c6001c 100755 --- a/ext/intl/intl_error.c +++ b/ext/intl/intl_error.c @@ -25,6 +25,7 @@ #include "php_intl.h" #include "intl_error.h" +#include "intl_convert.h" ZEND_EXTERN_MODULE_GLOBALS( intl ) @@ -242,7 +243,82 @@ void intl_register_IntlException_class( TSRMLS_D ) default_exception_ce, NULL TSRMLS_CC ); IntlException_ce_ptr->create_object = default_exception_ce->create_object; } -/* }}} */ + +smart_str intl_parse_error_to_string( UParseError* pe ) +{ + smart_str ret = {0}; + char *buf; + int u8len; + UErrorCode status; + int any = 0; + + assert( pe != NULL ); + + smart_str_appends( &ret, "parse error " ); + if( pe->line > 0 ) + { + smart_str_appends( &ret, "on line " ); + smart_str_append_long( &ret, (long ) pe->line ); + any = 1; + } + if( pe->offset >= 0 ) { + if( any ) + smart_str_appends( &ret, ", " ); + else + smart_str_appends( &ret, "at " ); + + 
smart_str_appends( &ret, "offset " ); + smart_str_append_long( &ret, (long ) pe->offset ); + any = 1; + } + + if (pe->preContext[0] != 0 ) { + if( any ) + smart_str_appends( &ret, ", " ); + + smart_str_appends( &ret, "after \"" ); + intl_convert_utf16_to_utf8( &buf, &u8len, pe->preContext, -1, &status ); + if( U_FAILURE( status ) ) + { + smart_str_appends( &ret, "(could not convert parser error pre-context to UTF-8)" ); + } + else { + smart_str_appendl( &ret, buf, u8len ); + efree( buf ); + } + smart_str_appends( &ret, "\"" ); + any = 1; + } + + if( pe->postContext[0] != 0 ) + { + if( any ) + smart_str_appends( &ret, ", " ); + + smart_str_appends( &ret, "before or at \"" ); + intl_convert_utf16_to_utf8( &buf, &u8len, pe->postContext, -1, &status ); + if( U_FAILURE( status ) ) + { + smart_str_appends( &ret, "(could not convert parser error post-context to UTF-8)" ); + } + else + { + smart_str_appendl( &ret, buf, u8len ); + efree( buf ); + } + smart_str_appends( &ret, "\"" ); + any = 1; + } + + if( !any ) + { + smart_str_free( &ret ); + smart_str_appends( &ret, "no parse error" ); + } + + smart_str_0( &ret ); + return ret; +} /* * Local variables: diff --git a/ext/intl/intl_error.h b/ext/intl/intl_error.h index b5000a15de..4d8eb79327 100755 --- a/ext/intl/intl_error.h +++ b/ext/intl/intl_error.h @@ -20,6 +20,8 @@ #define INTL_ERROR_H #include <unicode/utypes.h> +#include <unicode/parseerr.h> +#include <ext/standard/php_smart_str.h> #define INTL_ERROR_CODE(e) (e).code @@ -44,6 +46,9 @@ void intl_errors_set_custom_msg( intl_error* err, char* msg, int copyMsg void intl_errors_set_code( intl_error* err, UErrorCode err_code TSRMLS_DC ); void intl_errors_set( intl_error* err, UErrorCode code, char* msg, int copyMsg TSRMLS_DC ); +// Other error helpers +smart_str intl_parse_error_to_string( UParseError* pe ); + // exported to be called on extension MINIT void intl_register_IntlException_class( TSRMLS_D ); diff --git a/ext/intl/php_intl.c b/ext/intl/php_intl.c index 
59272db712..5d8aa6be95 100755 --- a/ext/intl/php_intl.c +++ b/ext/intl/php_intl.c @@ -78,6 +78,8 @@ #include "calendar/calendar_methods.h" #include "calendar/gregoriancalendar_methods.h" +#include "breakiterator/breakiterator_class.h" + #include "idn/idn.h" #if U_ICU_VERSION_MAJOR_NUM > 3 && U_ICU_VERSION_MINOR_NUM >=2 @@ -958,6 +960,9 @@ PHP_MINIT_FUNCTION( intl ) /* Register 'IntlIterator' PHP class */ intl_register_IntlIterator_class( TSRMLS_C ); + /* Register 'BreakIterator' class */ + breakiterator_register_BreakIterator_class( TSRMLS_C ); + /* Global error handling. */ intl_error_init( NULL TSRMLS_CC ); diff --git a/ext/intl/transliterator/transliterator.c b/ext/intl/transliterator/transliterator.c index 75c9eaabda..8ee49e1e51 100644 --- a/ext/intl/transliterator/transliterator.c +++ b/ext/intl/transliterator/transliterator.c @@ -49,85 +49,6 @@ void transliterator_register_constants( INIT_FUNC_ARGS ) } /* }}} */ -/* {{{ transliterator_parse_error_to_string - * Transforms parse errors in strings. 
- */ -smart_str transliterator_parse_error_to_string( UParseError* pe ) -{ - smart_str ret = {0}; - char *buf; - int u8len; - UErrorCode status; - int any = 0; - - assert( pe != NULL ); - - smart_str_appends( &ret, "parse error " ); - if( pe->line > 0 ) - { - smart_str_appends( &ret, "on line " ); - smart_str_append_long( &ret, (long ) pe->line ); - any = 1; - } - if( pe->offset >= 0 ) { - if( any ) - smart_str_appends( &ret, ", " ); - else - smart_str_appends( &ret, "at " ); - - smart_str_appends( &ret, "offset " ); - smart_str_append_long( &ret, (long ) pe->offset ); - any = 1; - } - - if (pe->preContext[0] != 0 ) { - if( any ) - smart_str_appends( &ret, ", " ); - - smart_str_appends( &ret, "after \"" ); - intl_convert_utf16_to_utf8( &buf, &u8len, pe->preContext, -1, &status ); - if( U_FAILURE( status ) ) - { - smart_str_appends( &ret, "(could not convert parser error pre-context to UTF-8)" ); - } - else { - smart_str_appendl( &ret, buf, u8len ); - efree( buf ); - } - smart_str_appends( &ret, "\"" ); - any = 1; - } - - if( pe->postContext[0] != 0 ) - { - if( any ) - smart_str_appends( &ret, ", " ); - - smart_str_appends( &ret, "before or at \"" ); - intl_convert_utf16_to_utf8( &buf, &u8len, pe->postContext, -1, &status ); - if( U_FAILURE( status ) ) - { - smart_str_appends( &ret, "(could not convert parser error post-context to UTF-8)" ); - } - else - { - smart_str_appendl( &ret, buf, u8len ); - efree( buf ); - } - smart_str_appends( &ret, "\"" ); - any = 1; - } - - if( !any ) - { - smart_str_free( &ret ); - smart_str_appends( &ret, "no parse error" ); - } - - smart_str_0( &ret ); - return ret; -} - /* * Local variables: * tab-width: 4 diff --git a/ext/intl/transliterator/transliterator_methods.c b/ext/intl/transliterator/transliterator_methods.c index d0cfb9790d..1aa39c54b9 100644 --- a/ext/intl/transliterator/transliterator_methods.c +++ b/ext/intl/transliterator/transliterator_methods.c @@ -183,7 +183,7 @@ PHP_FUNCTION( transliterator_create_from_rules ) { 
char *msg = NULL; smart_str parse_error_str; - parse_error_str = transliterator_parse_error_to_string( &parse_error ); + parse_error_str = intl_parse_error_to_string( &parse_error ); spprintf( &msg, 0, "transliterator_create_from_rules: unable to " "create ICU transliterator from rules (%s)", parse_error_str.c ); smart_str_free( &parse_error_str ); |
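
For readers who want to try the message format produced by `intl_parse_error_to_string()` outside of PHP, the following is a minimal standalone sketch of the same assembly logic. It rests on several assumptions that are not part of the patch: `ParseErrorInfo` and `describe_parse_error` are hypothetical names, `ParseErrorInfo` is a simplified stand-in for ICU's `UParseError` whose context fields already hold UTF-8, and `std::string` replaces `smart_str`. The UTF-16 to UTF-8 conversion and its failure branch are therefore left out.

```cpp
#include <string>

// Hypothetical stand-in for ICU's UParseError. The context fields are
// assumed to be UTF-8 already, so no conversion step is modelled here.
struct ParseErrorInfo {
    int line;                  // values <= 0 are treated as "not available"
    int offset;                // values < 0 are treated as "not available"
    std::string pre_context;   // text preceding the error, may be empty
    std::string post_context;  // text at or after the error, may be empty
};

// Builds a message in the same shape as the patch's helper: each available
// piece of information is appended, separated by ", ".
std::string describe_parse_error(const ParseErrorInfo &pe)
{
    std::string ret = "parse error ";
    bool any = false;

    if (pe.line > 0) {
        ret += "on line " + std::to_string(pe.line);
        any = true;
    }
    if (pe.offset >= 0) {
        // "at" is only used when the offset is the first piece reported
        ret += any ? ", " : "at ";
        ret += "offset " + std::to_string(pe.offset);
        any = true;
    }
    if (!pe.pre_context.empty()) {
        if (any) ret += ", ";
        ret += "after \"" + pe.pre_context + "\"";
        any = true;
    }
    if (!pe.post_context.empty()) {
        if (any) ret += ", ";
        ret += "before or at \"" + pe.post_context + "\"";
        any = true;
    }
    if (!any) {
        // Nothing was available: discard the prefix entirely
        ret = "no parse error";
    }
    return ret;
}
```

With only an offset of 3 available, this sketch yields `parse error at offset 3`; with no information at all it yields `no parse error`, mirroring the `!any` branch of the C helper.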