path: root/ext/mbstring/php_unicode.c
Commit message    Author    Age    Files    Lines
* Fixed bug #76319    Nikita Popov    2018-05-25    1    -1/+14
|   While at it, also make sure that mbstring case conversion takes into account the specified substitution character and substitution mode.
* year++    Xinchen Hui    2018-01-02    1    -1/+1
|
* fix c89 compat    Anatol Belski    2017-07-28    1    -2/+2
|
* Fixed bug #65544 and #71298    Nikita Popov    2017-07-28    1    -20/+12
|
* Implement full case mapping    Nikita Popov    2017-07-28    1    -18/+139
|   Implement full case mapping according to SpecialCasing.txt and also full case folding according to CaseFolding.txt (F). There are a number of caveats:
|
|    * Only language-agnostic and unconditional full case mapping is implemented. The only language-agnostic conditional case mapping rule relates to Greek sigma in final position (Final_Sigma). Correctly handling this requires both arbitrary lookahead and lookbehind, which would require some larger changes to how the case mapping is implemented. This is a possible future extension.
|    * The only language-specific handling that is implemented is for Turkish dotted/undotted Is, if the ISO-8859-9 encoding is used. This matches the previous behavior and makes sure that no codepoints not supported by the encoding are produced. A future extension would be to also handle the Turkish mappings specified by SpecialCasing.txt based on the mbfl internal language.
|    * Full case folding is implemented, but case-insensitive mb_* operations continue to use simple case folding. The reason is that full case folding of the haystack string may change the position at which a match occurred. This would have to be mapped back into the position in the original string.
|    * mb_convert_case() exposes both the full and the simple case mapping / folding, where full is the default. The constants are:
|       * MB_CASE_LOWER (used by mb_strtolower)
|       * MB_CASE_UPPER (used by mb_strtoupper)
|       * MB_CASE_TITLE
|       * MB_CASE_FOLD
|       * MB_CASE_LOWER_SIMPLE
|       * MB_CASE_UPPER_SIMPLE
|       * MB_CASE_TITLE_SIMPLE
|       * MB_CASE_FOLD_SIMPLE (used by case-insensitive operations)
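
To illustrate the difference between simple and full case mapping described in this commit message, here is a minimal sketch in C. It is not the code from php_unicode.c: the function names `simple_upper` and `full_upper` and the one-entry "table" are hypothetical. The only Unicode fact relied on is that U+00DF (ß) has no simple uppercase mapping, while SpecialCasing.txt maps it to "SS".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not the mbstring API. */

/* Simple mapping: exactly one code point in, one code point out. */
static uint32_t simple_upper(uint32_t cp) {
    if (cp >= 'a' && cp <= 'z') {
        return cp - 0x20;          /* ASCII letters */
    }
    return cp;                     /* U+00DF has no simple uppercase mapping */
}

/* Full mapping: writes up to 3 code points to out and returns the count. */
static size_t full_upper(uint32_t cp, uint32_t out[3]) {
    if (cp == 0x00DF) {            /* sharp s expands to "SS" (SpecialCasing.txt) */
        out[0] = 'S';
        out[1] = 'S';
        return 2;
    }
    out[0] = simple_upper(cp);     /* everything else: fall back to simple mapping */
    return 1;
}
```

Because the full mapping can change the number of code points, the caller has to provide room for the expansion, which is also why offsets found in a fully case-mapped string do not line up with the original.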
* Use case-folding for case insensitive comparisons    Nikita Popov    2017-07-28    1    -0/+24
|   Instead of using lowercasing.
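
A small sketch of why folding rather than lowercasing matters for comparisons. The toy functions below are assumptions for illustration and cover only ASCII and Greek sigma. The Unicode facts used: capital sigma U+03A3 lowercases to U+03C3, final sigma U+03C2 is already lowercase, and CaseFolding.txt folds U+03C2 to U+03C3.

```c
#include <stdint.h>

/* Toy simple lowercasing: ASCII plus capital sigma; all else unchanged. */
static uint32_t toy_lower(uint32_t cp) {
    if (cp >= 'A' && cp <= 'Z') return cp + 0x20;
    if (cp == 0x03A3) return 0x03C3;   /* capital sigma -> small sigma */
    return cp;                         /* final sigma U+03C2 stays as it is */
}

/* Toy simple case folding: like lowercasing, but final sigma folds too. */
static uint32_t toy_fold(uint32_t cp) {
    if (cp == 0x03C2) return 0x03C3;   /* final sigma -> small sigma */
    return toy_lower(cp);
}

/* Compare two code points after applying the given mapping to both. */
static int equal_by(uint32_t (*map)(uint32_t), uint32_t a, uint32_t b) {
    return map(a) == map(b);
}
```

With these definitions, `equal_by(toy_lower, 0x03C2, 0x03A3)` is false while `equal_by(toy_fold, 0x03C2, 0x03A3)` is true, so only folding treats the two sigma forms as case-insensitively equal.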
* Use MPH for case maps    Nikita Popov    2017-07-28    1    -32/+30
|   Instead of performing a binary search, use a hashtable to store the case maps. In particular a minimal perfect hash construction is used, which does not require collision resolution (but does use an auxiliary table for the hash perturbation).
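
The idea of a minimal perfect hash with an auxiliary perturbation table can be sketched as follows. This is a generic hash-and-displace construction, not the generated tables or hash function used by mbstring; the names `mph_hash`, `mph_build`, `mph_lookup`, the mixing constants, and the six-entry lowercase map are all assumptions chosen for illustration. The table is built at startup here, whereas a real implementation would more likely precompute it.

```c
#include <stdint.h>
#include <string.h>

#define N_KEYS    6            /* slot count equals key count: "minimal" */
#define N_BUCKETS 3            /* size of the auxiliary seed table */
#define MAX_SEED  100000u
#define NOT_FOUND 0xFFFFFFFFu

/* Toy uppercase -> lowercase map. */
static const uint32_t keys[N_KEYS]   = {0x41, 0x42, 0xC0, 0x391, 0x410, 0x10400};
static const uint32_t values[N_KEYS] = {0x61, 0x62, 0xE0, 0x3B1, 0x430, 0x10428};

static uint32_t seeds[N_BUCKETS];                 /* per-bucket hash perturbation */
static uint32_t slot_key[N_KEYS], slot_val[N_KEYS];

/* Seeded integer mixer; seed 0 selects the bucket, seed >= 1 selects the slot. */
static uint32_t mph_hash(uint32_t seed, uint32_t key) {
    uint32_t h = key + seed * 0x9E3779B9u;
    h ^= h >> 16; h *= 0x85EBCA6Bu;
    h ^= h >> 13; h *= 0xC2B2AE35u;
    h ^= h >> 16;
    return h;
}

/* For each bucket, search for a seed that sends its keys to free, distinct slots. */
static int mph_build(void) {
    int used[N_KEYS] = {0};
    for (uint32_t b = 0; b < N_BUCKETS; b++) {
        uint32_t seed;
        int trial[N_KEYS];
        for (seed = 1; seed < MAX_SEED; seed++) {
            int ok = 1;
            memcpy(trial, used, sizeof used);
            for (int i = 0; i < N_KEYS && ok; i++) {
                if (mph_hash(0, keys[i]) % N_BUCKETS != b) continue;
                uint32_t s = mph_hash(seed, keys[i]) % N_KEYS;
                if (trial[s]) ok = 0; else trial[s] = 1;
            }
            if (ok) break;
        }
        if (seed == MAX_SEED) return -1;          /* no suitable seed found */
        seeds[b] = seed;
        memcpy(used, trial, sizeof used);
        for (int i = 0; i < N_KEYS; i++) {
            if (mph_hash(0, keys[i]) % N_BUCKETS != b) continue;
            uint32_t s = mph_hash(seed, keys[i]) % N_KEYS;
            slot_key[s] = keys[i];
            slot_val[s] = values[i];
        }
    }
    return 0;
}

/* Lookup: two hash evaluations and one key comparison, no probing. */
static uint32_t mph_lookup(uint32_t key) {
    uint32_t seed = seeds[mph_hash(0, key) % N_BUCKETS];
    uint32_t s = mph_hash(seed, key) % N_KEYS;
    return slot_key[s] == key ? slot_val[s] : NOT_FOUND;
}
```

After a successful `mph_build()`, every key owns exactly one slot, so `mph_lookup` never needs collision resolution; a code point that is not in the map lands on some other key's slot and is rejected by the final comparison.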
* Change layout of case mapping table    Nikita Popov    2017-07-23    1    -87/+27
|   Previously the case mapping table was segregated by the type of the character (upper, lower, title) and always stored the other two variants (key, other1, other2). Now the table is segregated by the target type (key, other). As only very few characters have more than one target this only slightly increases the size of the table.
|
|   The advantage of this layout is that we only need to perform a single table lookup in the case table. Previously, depending on the case that was hit, either one lookup in the property table, or two lookups in the property table and one lookup in the case table were required.
|
|   This changes the layout from libunicode in the OpenLDAP project -- however, the last commit there was over 10 years ago, so I don't see value in keeping this in sync.
* Merge branch 'PHP-7.2'    Nikita Popov    2017-07-23    1    -14/+15
|\
| * Another fix for bug #69267    Nikita Popov    2017-07-23    1    -1/+1
| |   mb_strtoupper() was converting lowercase characters into titlecase characters, instead of uppercase characters. Luckily there are only very few characters with a distinct titlecase representation, so this mostly worked out okay...
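
One of the few characters where the distinction is visible is U+01C6 (the small "dz with caron" digraph), whose uppercase form is U+01C4 and whose titlecase form is U+01C5. The sketch below only illustrates that distinction; the struct and function names are hypothetical and unrelated to the actual mbstring tables.

```c
#include <stdint.h>

/* Hypothetical single-entry case table for illustration. */
typedef struct {
    uint32_t lower, upper, title;
} toy_case_entry;

static const toy_case_entry DZ_CARON = {0x01C6, 0x01C4, 0x01C5};

/* Uppercasing must pick the "upper" column ... */
static uint32_t toy_upper(uint32_t cp) {
    return cp == DZ_CARON.lower ? DZ_CARON.upper : cp;
}

/* ... while titlecasing picks the "title" column. */
static uint32_t toy_title(uint32_t cp) {
    return cp == DZ_CARON.lower ? DZ_CARON.title : cp;
}
```

For most letters the two columns hold the same value, which is why returning the titlecase form from an uppercase conversion went largely unnoticed.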
| * Partial fix for bug #69267    Nikita Popov    2017-07-23    1    -13/+14
| |   This pulls in 60a25c72ba389f53b0621ca250bc99f3b295d43f from the OpenLDAP project.
* | Directly use encodings instead of no_encoding in libmbfl    Nikita Popov    2017-07-20    1    -5/+5
| |   In particular strings now store encoding rather than the no_encoding. I've also pruned out libmbfl APIs that existed in two forms, one using no_encoding and the other using encoding. We were not actually using any of the former.
* | Reduce number of encoding conversions in case conversion    Nikita Popov    2017-07-20    1    -48/+85
| |   Don't indirect through UCS4BE, instead directly work on wchars using a custom filter. This replaces the pipeline
| |
| |       utf8 -> wchar -> ucs4be -> wchar -case-> wchar -> ucs4be -> wchar -> utf8
| |
| |   with
| |
| |       utf8 -> wchar -case-> wchar -> utf8
* | Optimize php_unicode_tolower/upper for ASCII    Nikita Popov    2017-07-20    1    -39/+22
| |
* | Directly accept encoding in php_unicode_convert_case()    Nikita Popov    2017-07-19    1    -11/+4
| |   As a side-effect mb_strtolower() and mb_strtoupper() now correctly handle a NULL encoding parameter by using the internal encoding. This is what caused the two test changes.
* | Optimize php_unicode_is_prop()    Nikita Popov    2017-07-19    1    -14/+20
| |   Do not try to extract the properties from a bitmask. Instead make the function variadic and pass all properties individually. Also add a php_unicode_is_prop1() function to check only a single property.
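
The shape of such a variadic property check can be sketched as below. This is an assumption-laden illustration, not the php_unicode_is_prop() implementation: the `toy_` names, the three ASCII-only properties, and the use of -1 as the list terminator are all hypothetical.

```c
#include <stdarg.h>

/* Hypothetical property identifiers for illustration. */
enum { TOY_PROP_UPPER, TOY_PROP_LOWER, TOY_PROP_DIGIT };

/* Single-property check (ASCII only in this toy version). */
static int toy_is_prop1(unsigned long cp, int prop) {
    switch (prop) {
    case TOY_PROP_UPPER: return cp >= 'A' && cp <= 'Z';
    case TOY_PROP_LOWER: return cp >= 'a' && cp <= 'z';
    case TOY_PROP_DIGIT: return cp >= '0' && cp <= '9';
    default:             return 0;
    }
}

/* Variadic check: properties are passed one by one, terminated by -1
 * (the terminator is an assumption). Returns 1 if any property matches. */
static int toy_is_prop(unsigned long cp, ...) {
    va_list va;
    int prop, result = 0;

    va_start(va, cp);
    while ((prop = va_arg(va, int)) != -1) {
        if (toy_is_prop1(cp, prop)) {
            result = 1;
            break;
        }
    }
    va_end(va);
    return result;
}
```

A call such as `toy_is_prop('x', TOY_PROP_UPPER, TOY_PROP_LOWER, -1)` checks each listed property in turn, so no bitmask has to be decoded, and callers that need only one property can use `toy_is_prop1` directly.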
* | Avoid unnecessary encoding lookups in mbstring    Nikita Popov    2017-07-19    1    -10/+15
|/   Extract part of php_mb_convert_encoding that does the actual work and use it whenever we already know the encoding.
* Update copyright headers to 2017    Sammy Kaye Powers    2017-01-02    1    -1/+1
|
* Merge branch 'PHP-5.6' into PHP-7.0    Lior Kaplan    2016-01-01    1    -1/+1
|\   * PHP-5.6: Happy new year (Update copyright to 2016)
| * Happy new year (Update copyright to 2016)    Lior Kaplan    2016-01-01    1    -1/+1
| |
| * bump year    Xinchen Hui    2015-01-15    1    -1/+1
| |
* | bump year    Xinchen Hui    2015-01-15    1    -1/+1
| |
* | trailing whitespace removal    Stanislav Malyshev    2015-01-10    1    -9/+9
| |
* | first shot remove TSRMLS_* things    Anatol Belski    2014-12-13    1    -11/+11
| |
* | s/PHP 5/PHP 7/    Johannes Schlüter    2014-09-19    1    -1/+1
|/
* Bump year    Xinchen Hui    2014-01-03    1    -1/+1
|
* Happy New Year    Xinchen Hui    2013-01-01    1    -1/+1
|
* - Year++    Felipe Pena    2012-01-01    1    -1/+1
|
* - Year++    Felipe Pena    2011-01-01    1    -1/+1
|
* sed -i "s#1997-2009#1997-2010#g" **/*.c **/*.h **/*.php    Sebastian Bergmann    2010-01-03    1    -1/+1
|
* MFH: Bump copyright year, 3 of 3.    Sebastian Bergmann    2008-12-31    1    -1/+1
|
* Fixed bug #46626 (mb_convert_case does not handle apostrophe correctly)    Ilia Alshanetsky    2008-11-24    1    -1/+1
|
* - MFH: Fixed warnings.    Moriyoshi Koizumi    2008-07-24    1    -2/+2
|
* fixed #43998 Two error messages returned for incorrect encoding for mb_strto[upper|lower]    Rui Hirokawa    2008-02-16    1    -0/+5
|
* MFH: Bump copyright year, 2 of 2.    Sebastian Bergmann    2007-12-31    1    -1/+1
|
* MFH: fixed bug #29955 invalid case conversion in iso-8859-9.    Rui Hirokawa    2007-09-04    1    -4/+2
|
* MFH: Bump year.    Sebastian Bergmann    2007-01-01    1    -1/+1
|
* bump year and license version    foobar    2006-01-01    1    -3/+3
|
* MFH: fixed #29955 mb_strtoupper() / lower() broken with Turkish encoding..    Rui Hirokawa    2005-12-23    1    -8/+39
|
* - Bump up year    foobar    2005-08-03    1    -1/+1
|
* - A belated happy holidays and PHP 5    Andi Gutmans    2004-01-08    1    -2/+2
|
* updating license information in the headers.    James Cox    2003-06-10    1    -3/+3
|
* Bump year.    Sebastian Bergmann    2002-12-31    1    -1/+1
|
* Reverted the changes because the problem was elsewhere.    Moriyoshi Koizumi    2002-12-02    1    -1/+0
|
* Fixing build on Win32    Frank M. Kromann    2002-12-02    1    -0/+1
|   MBREGEX is disabled for now. 5 mbre_* functions are undefined on Win32
* MFB (made mbstring compile on windows again).    Edin Kadribasic    2002-11-13    1    -5/+5
|
* Fixed mb_convert_case() / mb_strtolower() / mb_strtoupper() to work in 64bit systems    Moriyoshi Koizumi    2002-11-11    1    -17/+31
|
* Modified mb_convert_case() to handle cased characters properly when MB_CASE_TITLE is specified.    Moriyoshi Koizumi    2002-10-23    1    -3/+18
|
* Fix warnings    Zeev Suraski    2002-10-01    1    -1/+1
|
* (PHP mb_convert_case) Add function that will convert the case of a string    Wez Furlong    2002-09-26    1    -0/+284
    Respecting its encoding (or the internal encoding).