author | Nikita Popov <nikita.ppv@gmail.com> | 2017-07-27 22:48:00 +0200 |
---|---|---|
committer | Nikita Popov <nikita.ppv@gmail.com> | 2017-07-28 12:32:50 +0200 |
commit | 582a65b06f3de125887cab02d5c561168fcf94bc (patch) | |
tree | 8e1420959ee8f8216227cbc2f15e2fef5ac6d569 /ext/mbstring/php_unicode.h | |
parent | 9ac7c1e71d956ddac63b042be6ad8b105e584c10 (diff) | |
Implement full case mapping
Implement full case mapping according to SpecialCasing.txt, and
full case folding according to the status F entries of
CaseFolding.txt. There are a number of caveats:
* Only language-agnostic and unconditional full case mapping
is implemented. The only language-agnostic conditional case
mapping rule relates to Greek sigma in final position
(Final_Sigma). Handling it correctly requires arbitrary
lookahead and lookbehind, which would need larger changes to
how case mapping is implemented. This is a possible future
extension.
* The only language-specific handling that is implemented is
for Turkish dotted/undotted Is, if the ISO-8859-9 encoding
is used. This matches the previous behavior and ensures that
no codepoints outside the encoding's repertoire are
produced. A future extension would be to also handle the
Turkish mappings specified by SpecialCasing.txt based on
the mbfl internal language.
* Full case folding is implemented, but case-insensitive mb_*
operations continue to use simple case folding. The reason is
that full case folding of the haystack string may change the
position at which a match occurred, and that position would
then have to be mapped back to the original string.
* mb_convert_case() exposes both the full and the simple case
mapping / folding, where full is the default. The constants
are:
  * MB_CASE_LOWER (used by mb_strtolower)
  * MB_CASE_UPPER (used by mb_strtoupper)
  * MB_CASE_TITLE
  * MB_CASE_FOLD
  * MB_CASE_LOWER_SIMPLE
  * MB_CASE_UPPER_SIMPLE
  * MB_CASE_TITLE_SIMPLE
  * MB_CASE_FOLD_SIMPLE (used by case-insensitive operations)
Diffstat (limited to 'ext/mbstring/php_unicode.h')
-rw-r--r-- | ext/mbstring/php_unicode.h | 13 |
1 files changed, 9 insertions, 4 deletions
```diff
diff --git a/ext/mbstring/php_unicode.h b/ext/mbstring/php_unicode.h
index 51978e37d7..6808939ca6 100644
--- a/ext/mbstring/php_unicode.h
+++ b/ext/mbstring/php_unicode.h
@@ -85,10 +85,15 @@ MBSTRING_API char *php_unicode_convert_case(
 int case_mode, const char *srcstr, size_t srclen, size_t *retlen,
 const mbfl_encoding *src_encoding);
 
-#define PHP_UNICODE_CASE_UPPER 0
-#define PHP_UNICODE_CASE_LOWER 1
-#define PHP_UNICODE_CASE_TITLE 2
-#define PHP_UNICODE_CASE_FOLD 3
+#define PHP_UNICODE_CASE_UPPER 0
+#define PHP_UNICODE_CASE_LOWER 1
+#define PHP_UNICODE_CASE_TITLE 2
+#define PHP_UNICODE_CASE_FOLD 3
+#define PHP_UNICODE_CASE_UPPER_SIMPLE 4
+#define PHP_UNICODE_CASE_LOWER_SIMPLE 5
+#define PHP_UNICODE_CASE_TITLE_SIMPLE 6
+#define PHP_UNICODE_CASE_FOLD_SIMPLE 7
+#define PHP_UNICODE_CASE_MODE_MAX 7
 
 /* Optimize the common ASCII case for lower/upper */
```