perllocale: Add caveat on UTF-8 locales

It turns out that the C library may not handle UTF-8 locales properly, and the docs should mention that instead of blindly encouraging their use.
author: Karl Williamson <public@khwilliamson.com> 2012-01-16 15:18:05 -0700
committer: Karl Williamson <public@khwilliamson.com> 2012-01-21 10:02:54 -0700
commit: dc4bfc4bc1a2fde867e51d484c29733e80a99e7b (patch)
tree: 12d809a4516b24e23f60f443d365b42b333918ce
parent: 82ad65bb0613be64ca286f6db04d305b7f037509 (diff)
download: perl-dc4bfc4bc1a2fde867e51d484c29733e80a99e7b.tar.gz
1 files changed, 11 insertions, 4 deletions
diff --git a/pod/perllocale.pod b/pod/perllocale.pod
index d97448e2a8..5b4e508a3d 100644
--- a/pod/perllocale.pod
+++ b/pod/perllocale.pod
@@ -1032,10 +1032,17 @@ implemented in version 5.8 and later.  See L<perluniintro>.  Perl tries to
 work with both Unicode and locales--but of course, there are problems.
 
 Perl does not handle multi-byte locales, such as have been used for various
-Asian languages, such as Big5 or Shift JIS.  However, the increasingly common
-multi-byte UTF-8 locales, if properly implemented, tend to work
-reasonably well in Perl, simply because both they and Perl store
-characters that take up multiple bytes the same way.
+Asian languages, such as Big5 or Shift JIS.  However, the increasingly
+common multi-byte UTF-8 locales, if properly implemented, may work
+reasonably well (depending on your C library implementation) in this
+form of the locale pragma, simply because both
+they and Perl store characters that take up multiple bytes the same way.
+Note, though, that some, if not most, C library implementations may not
+process the characters in the upper half of the Latin-1 range (128 - 255)
+properly under LC_CTYPE.  To determine whether a character is of a given
+type under a locale, Perl uses functions such as C<isalnum()>.  Your C
+library may not support UTF-8 locales in those functions, handling them
+correctly only in the newer wide-character functions such as C<iswalnum()>.
 
 Perl generally takes the tack to use locale rules on code points that can fit
 in a single byte, and Unicode rules for those that can't (though this wasn't