author | Karl Williamson <public@khwilliamson.com> | 2011-12-13 22:01:46 -0700 |
---|---|---|
committer | Karl Williamson <public@khwilliamson.com> | 2011-12-15 16:26:00 -0700 |
commit | 094a2f8c3da82fac9e0698c2daeb7e94d0ae765a (patch) | |
tree | 377042bb7ad310d7b0b1cf66079e80e5d8743e0a /pod/perlfunc.pod | |
parent | 81c6c7ce308a6bd705e6d8343eb996df5a938aa5 (diff) | |
download | perl-094a2f8c3da82fac9e0698c2daeb7e94d0ae765a.tar.gz |
pp.c: Changing case of utf8 strings under locale uses locale for < 256
As proposed on p5p and approved, this changes the functions uc(), lc(),
ucfirst(), and lcfirst() to respect locale for code points < 256, and to
use Unicode semantics for those above 255. This gives better, but
not perfect, results, as noted in the changed pods, and brings these
functions into line with how regular expression pattern matching already
works.
Diffstat (limited to 'pod/perlfunc.pod')
-rw-r--r-- | pod/perlfunc.pod | 20 |
1 files changed, 16 insertions, 4 deletions
diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index 18f8d37d19..f4f92bf8b4 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -2964,13 +2964,25 @@ respectively.
 
 =back
 
-=item Otherwise, If EXPR has the UTF8 flag set
+=item Otherwise, if C<use locale> is in effect
 
-Unicode semantics are used for the case change.
+Respects current LC_CTYPE locale for code points < 256; and uses Unicode
+semantics for the remaining code points (this last can only happen if
+the UTF8 flag is also set). See L<perllocale>.
 
-=item Otherwise, if C<use locale> is in effect
+A deficiency in this is that case changes that cross the 255/256
+boundary are not well-defined. For example, the lower case of LATIN CAPITAL
+LETTER SHARP S (U+1E9E) in Unicode semantics is U+00DF (on ASCII
+platforms). But under C<use locale>, the lower case of U+1E9E is
+itself, because 0xDF may not be LATIN SMALL LETTER SHARP S in the
+current locale, and Perl has no way of knowing if that character even
+exists in the locale, much less what code point it is. Perl returns
+the input character unchanged, for all instances (and there aren't
+many) where the 255/256 boundary would otherwise be crossed.
 
-Respects current LC_CTYPE locale. See L<perllocale>.
+=item Otherwise, If EXPR has the UTF8 flag set
+
+Unicode semantics are used for the case change.
 
 =item Otherwise, if C<use feature 'unicode_strings'> is in effect:
 
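The boundary case described in the changed pod can be tried with a small script. This is only a sketch, not part of the commit: the helper names `lc_unicode` and `lc_under_locale` are hypothetical, and the script assumes an ASCII platform. It lower-cases U+1E9E once with plain Unicode semantics and once with `use locale` in effect. The second result depends on the Perl version and the current LC_CTYPE locale, so no expected value is given for it.

```perl
use strict;
use warnings;

# Hypothetical helper: lower-case with plain Unicode semantics
# (no "use locale" in scope).
sub lc_unicode {
    my ($str) = @_;
    return lc $str;
}

# Hypothetical helper: lower-case with "use locale" in effect.
# The pragma is lexically scoped, so it applies only inside this sub.
sub lc_under_locale {
    my ($str) = @_;
    use locale;
    return lc $str;
}

my $cap_sharp_s = "\x{1E9E}";    # LATIN CAPITAL LETTER SHARP S

# Unicode semantics cross the 255/256 boundary and yield U+00DF.
printf "Unicode semantics: U+%04X\n", ord(lc_unicode($cap_sharp_s));

# Under "use locale" the outcome depends on the Perl version and the
# current LC_CTYPE locale, so no expected value is claimed here.
printf "use locale:        U+%04X\n", ord(lc_under_locale($cap_sharp_s));
```

On a Perl that behaves as the pod text above describes, the second line would report U+1E9E, the input character unchanged. Some Perl versions may also emit a warning for this call when `use warnings` is on.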