author | Karl Williamson <public@khwilliamson.com> | 2013-05-18 08:25:16 -0600 |
---|---|---|
committer | Karl Williamson <public@khwilliamson.com> | 2013-05-20 11:01:52 -0600 |
commit | 1ca267a56acf698557ec1deec44af651acc88696 (patch) | |
tree | 007f9d40b63b92728a095d5defdad663e239289c /pod | |
parent | 519101418837cf0edacb710b2b38b42dad6e47c1 (diff) | |
download | perl-1ca267a56acf698557ec1deec44af651acc88696.tar.gz |
Fix multi-char fold edge case
use locale;
fc("\N{LATIN CAPITAL LETTER SHARP S}")
eq fc("\N{LATIN SMALL LETTER LONG S}") x 2
should return true, as the SHARP S folds to two 's's in a row, and the
LONG S is an antique variant of 's', and folds to s. Until this commit,
the expression was false.
Similarly, the following should match, but didn't until this commit:
"\N{LATIN SMALL LETTER SHARP S}" =~ /\N{LATIN SMALL LETTER LONG S}{2}/iaa
The reason these didn't work properly is that in both cases the actual
fold to 's' is disallowed: in the first case because of locale, and in
the second because of /aa. The code wasn't smart enough to realize
that the matches themselves were still legal.
The fix is to special-case these so that the fold of sharp s (both
capital and small) is two LONG S's under /aa, as is the fold of the
capital sharp s under locale. The latter is user-visible, and the
documentation of fc() now points that out. I believe this is such an
edge case that it needs no mention in perldelta.
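The folds discussed in this message can be checked with a short script. This is a minimal sketch, not part of the commit: the helper name code_points and all variable names are made up for illustration. The first half relies only on the standard Unicode folds (U+1E9E to "ss", U+017F to "s"). The second half prints whatever fc() returns within the scope of "use locale"; that result depends on the Perl version and on the locale in effect, so no output is claimed for it here.

```perl
use strict;
use warnings;
use feature 'fc';    # fc() is available from Perl 5.16 on

my $capital_sharp_s = "\x{1E9E}";    # LATIN CAPITAL LETTER SHARP S
my $long_s          = "\x{17F}";     # LATIN SMALL LETTER LONG S

# Hypothetical helper: render a string as a list of code points,
# so nothing wide has to be printed directly.
sub code_points {
    my ($str) = @_;
    return join ' ', map { sprintf 'U+%04X', ord } split //, $str;
}

# Outside the scope of "use locale", full Unicode folding applies.
my $fold_sharp  = fc($capital_sharp_s);    # "ss"
my $fold_long_s = fc($long_s);             # "s"
my $same_plain  = $fold_sharp eq $fold_long_s x 2;

print "plain  fc(U+1E9E): ", code_points($fold_sharp),  "\n";
print "plain  fc(U+017F): ", code_points($fold_long_s), "\n";
print "plain  equal:      ", ($same_plain ? "yes" : "no"), "\n";

# Within the scope of "use locale", the result depends on the locale
# in effect and on the Perl version, so it is only printed here.
my ($locale_sharp, $same_locale);
{
    use locale;
    $locale_sharp = fc($capital_sharp_s);
    $same_locale  = $locale_sharp eq fc($long_s) x 2;
}

print "locale fc(U+1E9E): ", code_points($locale_sharp), "\n";
print "locale equal:      ", ($same_locale ? "yes" : "no"), "\n";
```

Note that the repetition count goes on the right of the x operator: fc(...) x 2 repeats the folded string twice, whereas 2 x fc(...) would repeat the string "2" instead.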
Diffstat (limited to 'pod')
-rw-r--r-- | pod/perlfunc.pod | 12 |
1 files changed, 10 insertions, 2 deletions
diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index 676644f732..08b9df9e82 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -2174,8 +2174,16 @@ Case Charts available at L<http://www.unicode.org/charts/case/>.
 
 If EXPR is omitted, uses C<$_>.
 
-This function behaves the same way under various pragma, such as in a locale,
-as L</lc> does.
+This function behaves the same way under various pragma, such as within
+S<C<"use feature 'unicode_strings">>, as L</lc> does, with the single
+exception of C<fc> of LATIN CAPITAL LETTER SHARP S (U+1E9E) within the
+scope of S<C<use locale>>. The foldcase of this character would
+normally be C<"ss">, but as explained in the L</lc> section, case
+changes that cross the 255/256 boundary are problematic under locales,
+and are hence prohibited. Therefore, this function under locale returns
+instead the string C<"\x{17F}\x{17F}">, which is the LATIN SMALL LETTER
+LONG S. Since that character itself folds to C<"s">, the string of two
+of them together should be equivalent to a single U+1E9E when foldcased.
 
 While the Unicode Standard defines two additional forms of casefolding,
 one for Turkic languages and one that never maps one character into multiple