summaryrefslogtreecommitdiff
path: root/pod
diff options
context:
space:
mode:
authorKarl Williamson <public@khwilliamson.com>2013-05-18 08:25:16 -0600
committerKarl Williamson <public@khwilliamson.com>2013-05-20 11:01:52 -0600
commit1ca267a56acf698557ec1deec44af651acc88696 (patch)
tree007f9d40b63b92728a095d5defdad663e239289c /pod
parent519101418837cf0edacb710b2b38b42dad6e47c1 (diff)
downloadperl-1ca267a56acf698557ec1deec44af651acc88696.tar.gz
Fix multi-char fold edge case
use locale; fc("\N{LATIN CAPITAL LETTER SHARP S}") eq 2 x fc("\N{LATIN SMALL LETTER LONG S}") should return true, as the SHARP S folds to two 's's in a row, and the LONG S is an antique variant of 's', and folds to s. Until this commit, the expression was false. Similarly, the following should match, but didn't until this commit: "\N{LATIN SMALL LETTER SHARP S}" =~ /\N{LATIN SMALL LETTER LONG S}{2}/iaa The reason these didn't work properly is that in both cases the actual fold to 's' is disallowed. In the first case because of locale; and in the second because of /aa. And the code wasn't smart enough to realize that these were legal. The fix is to special case these so that the fold of sharp s (both capital and small) is two LONG S's under /aa; as is the fold of the capital sharp s under locale. The latter is user-visible, and the documentation of fc() now points that out. I believe this is such an edge case that no mention of it need be done in perldelta.
Diffstat (limited to 'pod')
-rw-r--r--pod/perlfunc.pod12
1 files changed, 10 insertions, 2 deletions
diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index 676644f732..08b9df9e82 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -2174,8 +2174,16 @@ Case Charts available at L<http://www.unicode.org/charts/case/>.
If EXPR is omitted, uses C<$_>.
-This function behaves the same way under various pragma, such as in a locale,
-as L</lc> does.
+This function behaves the same way under various pragma, such as within
+S<C<"use feature 'unicode_strings">>, as L</lc> does, with the single
+exception of C<fc> of LATIN CAPITAL LETTER SHARP S (U+1E9E) within the
+scope of S<C<use locale>>. The foldcase of this character would
+normally be C<"ss">, but as explained in the L</lc> section, case
+changes that cross the 255/256 boundary are problematic under locales,
+and are hence prohibited. Therefore, this function under locale returns
+instead the string C<"\x{17F}\x{17F}">, which is the LATIN SMALL LETTER
+LONG S. Since that character itself folds to C<"s">, the string of two
+of them together should be equivalent to a single U+1E9E when foldcased.
While the Unicode Standard defines two additional forms of casefolding,
one for Turkic languages and one that never maps one character into multiple