pp.c: Changing case of utf8 strings under locale uses locale for < 255

As proposed on p5p and approved, this changes the functions uc(), lc(), ucfirst(), and lcfirst() to respect locale for code points < 256 and to use Unicode semantics for those above 255. This gives better, but not perfect, results, as noted in the changed pods, and brings these functions into line with how regular expression pattern matching already works.
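Below is a minimal Perl sketch of the split described above. It assumes the POSIX "C" locale, whose toupper() leaves bytes above 0x7F alone, and a perl that includes this change. The helper names `uc_under_locale`, `uc_unicode` and `codepoints` are hypothetical. U+0101 forces the UTF8 flag on the whole string. Under `use locale`, U+00E9 (below 256) follows the locale and stays unchanged, while U+0101 (above 255) follows Unicode semantics and becomes U+0100.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Assumption: pin LC_CTYPE to the POSIX "C" locale so the result does not
# depend on the caller's environment.
setlocale(LC_CTYPE, "C") or die "cannot set LC_CTYPE to C\n";

# Hypothetical helpers: the same uc() call with and without `use locale`.
sub uc_under_locale { use locale; return uc $_[0]; }
sub uc_unicode      { return uc $_[0]; }

# Render a string as a list of code points, e.g. "U+00E9 U+0101".
sub codepoints { join ' ', map { sprintf('U+%04X', ord($_)) } split //, $_[0] }

# U+00E9 (LATIN SMALL LETTER E WITH ACUTE) is below 256;
# U+0101 (LATIN SMALL LETTER A WITH MACRON) is above 255.
my $s = "\x{E9}\x{101}";

print "locale:  ", codepoints(uc_under_locale($s)), "\n";
print "unicode: ", codepoints(uc_unicode($s)), "\n";
```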
author: Karl Williamson <public@khwilliamson.com> 2011-12-13 22:01:46 -0700
committer: Karl Williamson <public@khwilliamson.com> 2011-12-15 16:26:00 -0700
commit: 094a2f8c3da82fac9e0698c2daeb7e94d0ae765a
tree: 377042bb7ad310d7b0b1cf66079e80e5d8743e0a /pod/perlfunc.pod
parent: 81c6c7ce308a6bd705e6d8343eb996df5a938aa5
1 files changed, 16 insertions, 4 deletions
diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index 18f8d37d19..f4f92bf8b4 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -2964,13 +2964,25 @@ respectively.
 
 =back
 
-=item Otherwise, If EXPR has the UTF8 flag set
+=item Otherwise, if C<use locale> is in effect
 
-Unicode semantics are used for the case change.
+Respects current LC_CTYPE locale for code points < 256; and uses Unicode
+semantics for the remaining code points (this last can only happen if
+the UTF8 flag is also set).  See L<perllocale>.
 
-=item Otherwise, if C<use locale> is in effect
+A deficiency in this is that case changes that cross the 255/256
+boundary are not well-defined.  For example, the lower case of LATIN CAPITAL
+LETTER SHARP S (U+1E9E) in Unicode semantics is U+00DF (on ASCII
+platforms).   But under C<use locale>, the lower case of U+1E9E is
+itself, because 0xDF may not be LATIN SMALL LETTER SHARP S in the
+current locale, and Perl has no way of knowing if that character even
+exists in the locale, much less what code point it is.  Perl returns
+the input character unchanged, for all instances (and there aren't
+many) where the 255/256 boundary would otherwise be crossed.
 
-Respects current LC_CTYPE locale.  See L<perllocale>.
+=item Otherwise, If EXPR has the UTF8 flag set
+
+Unicode semantics are used for the case change.
 
 =item Otherwise, if C<use feature 'unicode_strings'> is in effect:
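The boundary-crossing case in the new pod text can be sketched in Perl as well. This again assumes the POSIX "C" locale, and the helper names are hypothetical. With Unicode semantics, U+1E9E lowercases to U+00DF. Under `use locale`, the input comes back unchanged because the result would cross the 255/256 boundary. Current perlfunc documents that later perls handle UTF-8 locales differently and may emit a locale warning here. That is why the sketch pins LC_CTYPE to "C" rather than relying on the environment.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Assumption: a non-UTF-8 locale ("C") so the locale code path is exercised.
setlocale(LC_CTYPE, "C") or die "cannot set LC_CTYPE to C\n";

# Hypothetical helpers: the same lc() call with and without `use locale`.
sub lc_under_locale { use locale; return lc $_[0]; }
sub lc_unicode      { return lc $_[0]; }

my $capital_sharp_s = "\x{1E9E}";    # LATIN CAPITAL LETTER SHARP S

# Unicode semantics: the lower case is U+00DF, which lies below 256.
printf "unicode: U+%04X\n", ord lc_unicode($capital_sharp_s);

# use locale: crossing into 0..255 is refused, so U+1E9E is returned as-is.
printf "locale:  U+%04X\n", ord lc_under_locale($capital_sharp_s);
```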