Fix locale caseless matching and utf8

As explained in the doc changes of this patch, under /l, caseless matching of code points less than 256 now uses locale rules regardless of the utf8ness of the pattern or string. It now matches the behavior of things like \w in this regard.
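
A minimal sketch of the behavior described above, not taken from the patch or its tests. It assumes a perl new enough to accept the explicit /l and /u modifiers, and it pins LC_CTYPE to the "C" locale so the outcome does not depend on the environment. The variable names and the yn() helper are made up for illustration. In the "C" locale only ASCII letters are expected to fold, but the point to check is that the byte and utf8 forms of the same character agree under /l.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Assumption: pin LC_CTYPE to "C" so /l consults a known single-byte locale.
setlocale(LC_CTYPE, "C");

# Hypothetical helper: turn a match result into a printable word.
sub yn { return $_[0] ? "match" : "no match" }

# The same character (0xE9), once as native bytes and once upgraded to utf8.
my $bytes = "\xE9";
my $utf8  = "\xE9";
utf8::upgrade($utf8);

# Below 256, /l should use locale rules whatever the utf8ness of the string.
my $l_bytes = $bytes =~ /\xC9/il;
my $l_utf8  = $utf8  =~ /\xC9/il;

# ASCII letters fold in any locale.
my $ascii = "a" =~ /A/il;

# A fold that crosses the 255/256 boundary is disallowed under /l ...
my $cross_l = "\xFF" =~ /\x{178}/il;
# ... but is an ordinary Unicode fold under /u.
my $cross_u = "\xFF" =~ /\x{178}/iu;

print "0xE9 bytes vs 0xC9 under /il: ", yn($l_bytes), "\n";
print "0xE9 utf8  vs 0xC9 under /il: ", yn($l_utf8),  "\n";
print "a vs A under /il:             ", yn($ascii),   "\n";
print "0xFF vs 0x178 under /il:      ", yn($cross_l), "\n";
print "0xFF vs 0x178 under /iu:      ", yn($cross_u), "\n";
```

Some perls may emit a locale warning about the wide character in the /il match against 0x178; that does not affect the result.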
author: Karl Williamson <public@khwilliamson.com> 2011-02-19 10:20:50 -0700
committer: Karl Williamson <public@khwilliamson.com> 2011-02-19 11:47:42 -0700
commit: 17580e7a366d68c68fa37fe63c284c1d83b245fe (patch)
tree: 083c8313722786593172d0a6ff14a4767c9d520c /pod/perlre.pod
parent: 41ce0a5ea4efb0067c8416f074bb757a70d2faa1 (diff)
download: perl-17580e7a366d68c68fa37fe63c284c1d83b245fe.tar.gz
1 files changed, 11 insertions, 3 deletions
diff --git a/pod/perlre.pod b/pod/perlre.pod
index 56e42f8088..31e881703c 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -52,7 +52,8 @@ X<regular expression, case-insensitive>
 Do case-insensitive pattern matching.
 
 If C<use locale> is in effect, the case map is taken from the current
-locale.  See L<perllocale>.
+locale for code points less than 256, and from Unicode rules for larger
+code points.  See L<perllocale>.
 
 =item x
 X</x>
@@ -655,8 +656,15 @@ locale, and can differ from one match to another if there is an
 intervening call of the
 L<setlocale() function|perllocale/The setlocale function>.
 This modifier is automatically set if the regular expression is compiled
-within the scope of a C<"use locale"> pragma.  Results are not
-well-defined when using this and matching against a utf8-encoded string.
+within the scope of a C<"use locale"> pragma.
+Perl only allows single-byte locales.  This means that code points above
+255 are treated as Unicode no matter what locale is in effect.
+Under Unicode rules, there are a few case-insensitive matches that cross the
+255/256 boundary.  These are disallowed.  For example,
+0xFF does not caselessly match the character at 0x178, LATIN CAPITAL
+LETTER Y WITH DIAERESIS, because 0xFF may not be LATIN SMALL LETTER Y
+in the current locale, and Perl has no way of knowing if that character
+even exists in the locale, much less what code point it is.
 
 C<"u"> means to use Unicode semantics when pattern matching.  It is
 automatically set if the regular expression is encoded in utf8, or is