author: Nikita Popov <nikita.ppv@gmail.com> 2017-07-27 22:48:00 +0200
committer: Nikita Popov <nikita.ppv@gmail.com> 2017-07-28 12:32:50 +0200
commit: 582a65b06f3de125887cab02d5c561168fcf94bc
tree: 8e1420959ee8f8216227cbc2f15e2fef5ac6d569 /ext/mbstring/php_unicode.h
parent: 9ac7c1e71d956ddac63b042be6ad8b105e584c10
Implement full case mapping
Implement full case mapping according to SpecialCasing.txt and also full case folding according to CaseFolding.txt (F).

There are a number of caveats:

 * Only language-agnostic and unconditional full case mapping is implemented. The only language-agnostic conditional case mapping rule relates to Greek sigma in final position (Final_Sigma). Correctly handling this requires both arbitrary lookahead and lookbehind, which would require some larger changes to how the case mapping is implemented. This is a possible future extension.

 * The only language-specific handling that is implemented is for Turkish dotted/undotted Is, if the ISO-8859-9 encoding is used. This matches the previous behavior and makes sure that no codepoints unsupported by the encoding are produced. A future extension would be to also handle the Turkish mappings specified by SpecialCasing.txt based on the mbfl internal language.

 * Full case folding is implemented, but case-insensitive mb_* operations continue to use simple case folding. The reason is that full case folding of the haystack string may change the position at which a match occurs. This position would have to be mapped back into the position in the original string.

 * mb_convert_case() exposes both the full and the simple case mapping / folding, where full is the default. The constants are:

   * MB_CASE_LOWER (used by mb_strtolower)
   * MB_CASE_UPPER (used by mb_strtoupper)
   * MB_CASE_TITLE
   * MB_CASE_FOLD
   * MB_CASE_LOWER_SIMPLE
   * MB_CASE_UPPER_SIMPLE
   * MB_CASE_TITLE_SIMPLE
   * MB_CASE_FOLD_SIMPLE (used by case-insensitive operations)
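The position caveat above can be illustrated with a small, self-contained sketch. It is not the mbstring implementation: `fold_codepoint` and `fold_string` are hypothetical helpers that work on arrays of code points and only know ASCII letters plus U+00DF (ß), which CaseFolding.txt folds to "ss" under status F and leaves unchanged under simple folding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration, not part of mbstring.
 * Folds one code point into out (room for 2 code points) and returns
 * how many code points were written. Only ASCII letters and U+00DF
 * are handled; everything else is copied unchanged. */
static size_t fold_codepoint(uint32_t cp, int full, uint32_t *out)
{
    if (cp >= 'A' && cp <= 'Z') {
        out[0] = cp + ('a' - 'A');
        return 1;
    }
    if (full && cp == 0x00DF) {
        /* CaseFolding.txt, status F: U+00DF folds to "ss". */
        out[0] = 's';
        out[1] = 's';
        return 2;
    }
    out[0] = cp; /* simple folding leaves U+00DF as it is */
    return 1;
}

/* Folds n code points from in to out and returns the folded length.
 * out must have room for 2 * n code points. If src_index < n and
 * folded_index is non-NULL, *folded_index receives the position in
 * out at which the folding of in[src_index] starts. */
static size_t fold_string(const uint32_t *in, size_t n, int full,
                          uint32_t *out, size_t src_index,
                          size_t *folded_index)
{
    size_t len = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (i == src_index && folded_index != NULL) {
            *folded_index = len;
        }
        len += fold_codepoint(in[i], full, out + len);
    }
    return len;
}
```

Folding the six code points of "Straße" with `full` set yields seven code points, and the final "e" moves from index 5 to index 6. With simple folding both the length and the index stay the same, which is why a match position found in a fully folded haystack cannot be reported directly against the original string.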
Diffstat (limited to 'ext/mbstring/php_unicode.h')
-rw-r--r-- ext/mbstring/php_unicode.h 13
1 files changed, 9 insertions, 4 deletions
diff --git a/ext/mbstring/php_unicode.h b/ext/mbstring/php_unicode.h
index 51978e37d7..6808939ca6 100644
--- a/ext/mbstring/php_unicode.h
+++ b/ext/mbstring/php_unicode.h
@@ -85,10 +85,15 @@ MBSTRING_API char *php_unicode_convert_case(
int case_mode, const char *srcstr, size_t srclen, size_t *retlen,
const mbfl_encoding *src_encoding);
-#define PHP_UNICODE_CASE_UPPER 0
-#define PHP_UNICODE_CASE_LOWER 1
-#define PHP_UNICODE_CASE_TITLE 2
-#define PHP_UNICODE_CASE_FOLD 3
+#define PHP_UNICODE_CASE_UPPER 0
+#define PHP_UNICODE_CASE_LOWER 1
+#define PHP_UNICODE_CASE_TITLE 2
+#define PHP_UNICODE_CASE_FOLD 3
+#define PHP_UNICODE_CASE_UPPER_SIMPLE 4
+#define PHP_UNICODE_CASE_LOWER_SIMPLE 5
+#define PHP_UNICODE_CASE_TITLE_SIMPLE 6
+#define PHP_UNICODE_CASE_FOLD_SIMPLE 7
+#define PHP_UNICODE_CASE_MODE_MAX 7
/* Optimize the common ASCII case for lower/upper */