author | Karl Williamson <public@khwilliamson.com> | 2012-01-16 15:18:05 -0700 |
---|---|---|
committer | Karl Williamson <public@khwilliamson.com> | 2012-01-21 10:02:54 -0700 |
commit | dc4bfc4bc1a2fde867e51d484c29733e80a99e7b (patch) | |
tree | 12d809a4516b24e23f60f443d365b42b333918ce /pod/perllocale.pod | |
parent | 82ad65bb0613be64ca286f6db04d305b7f037509 (diff) | |
perllocale: Add caveat on UTF-8 locales
It turns out that the C library may not handle UTF-8 locales properly,
and the docs should mention that instead of blindly encouraging their
use.
Diffstat (limited to 'pod/perllocale.pod')
-rw-r--r-- | pod/perllocale.pod | 15 |
1 files changed, 11 insertions, 4 deletions
diff --git a/pod/perllocale.pod b/pod/perllocale.pod
index d97448e2a8..5b4e508a3d 100644
--- a/pod/perllocale.pod
+++ b/pod/perllocale.pod
@@ -1032,10 +1032,17 @@ implemented in version 5.8 and later. See L<perluniintro>.
 Perl tries to work with both Unicode and locales--but of course, there are problems.
 
 Perl does not handle multi-byte locales, such as have been used for various
-Asian languages, such as Big5 or Shift JIS. However, the increasingly common
-multi-byte UTF-8 locales, if properly implemented, tend to work
-reasonably well in Perl, simply because both they and Perl store
-characters that take up multiple bytes the same way.
+Asian languages, such as Big5 or Shift JIS. However, the increasingly
+common multi-byte UTF-8 locales, if properly implemented, may work
+reasonably well (depending on your C library implementation) in this
+form of the locale pragma, simply because both
+they and Perl store characters that take up multiple bytes the same way.
+However, some, if not most, C library implementations may not process
+the characters in the upper half of the Latin-1 range (128 - 255)
+properly under LC_CTYPE. To see if a character is a particular type
+under a locale, Perl uses the functions like C<isalnum()>. Your C
+library may not work for UTF-8 locales with those functions, instead
+only working under the newer wide library functions like C<iswalnum()>.
 
 Perl generally takes the tack to use locale rules on code points that can fit
 in a single byte, and Unicode rules for those that can't (though this wasn't
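The caveat in this patch contrasts the byte-oriented `isalnum()` with the wide `iswalnum()`. The following is a minimal C sketch of that contrast, not code from the Perl source. The helper names `byte_is_alnum`, `wide_is_alnum`, and `report` are hypothetical, and the sketch assumes the code point of interest is U+00E9, whose UTF-8 encoding is the two bytes `0xC3 0xA9`. What it prints depends entirely on the C library and on which locales are installed.

```c
#include <ctype.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Byte-oriented check: what isalnum() says about a single byte
 * under the current LC_CTYPE. */
int byte_is_alnum(unsigned char c)
{
    return isalnum(c) != 0;
}

/* Wide check: decode one multi-byte character with mbrtowc() under the
 * current LC_CTYPE, then ask iswalnum(). Returns -1 if decoding fails. */
int wide_is_alnum(const char *bytes, size_t len)
{
    mbstate_t state;
    wchar_t wc;

    memset(&state, 0, sizeof state);
    size_t n = mbrtowc(&wc, bytes, len, &state);
    if (n == (size_t)-1 || n == (size_t)-2)
        return -1;
    return iswalnum((wint_t)wc) != 0;
}

/* Hypothetical helper: switch LC_CTYPE to locale_name and print both
 * answers for U+00E9 (Latin-1 byte 0xE9, UTF-8 bytes 0xC3 0xA9). */
void report(const char *locale_name)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL) {
        printf("%s: locale not available\n", locale_name);
        return;
    }
    printf("%s: isalnum(0xE9)=%d iswalnum(U+00E9)=%d\n",
           locale_name,
           byte_is_alnum(0xE9),
           wide_is_alnum("\xC3\xA9", 2));
}
```

Calling `report("en_US.UTF-8")` from a `main()` would show both results side by side (the locale name is an assumption and may not exist on a given system). On a C library that behaves as the caveat describes, the two numbers may disagree, which is the situation the new documentation text warns about.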