author | Karl Williamson <khw@cpan.org> | 2014-11-12 10:32:47 -0700 |
---|---|---|
committer | Karl Williamson <khw@cpan.org> | 2014-11-14 13:38:18 -0700 |
commit | 8c6180a91de91a1194f427fc639694f43a903a78 (patch) | |
tree | 7ce9757ab2d8bbe46a96472d3e8c1b5278d442c3 /pod/perllocale.pod | |
parent | 278b2b56f3fef134f4e266a3f0d4f50760cdaba2 (diff) | |
download | perl-8c6180a91de91a1194f427fc639694f43a903a78.tar.gz |
Reinstate "Raise warnings for poorly supported locales"
This reverts commit 1244bd171b8d1fd4b6179e537f7b95c38bd8f099,
thus reinstating commit 3d3a881c1b0eb9c855d257a2eea1f72666e30fbc.
Diffstat (limited to 'pod/perllocale.pod')
-rw-r--r-- | pod/perllocale.pod | 21 |
1 files changed, 19 insertions, 2 deletions
diff --git a/pod/perllocale.pod b/pod/perllocale.pod
index c693ecb2aa..d083c09d2f 100644
--- a/pod/perllocale.pod
+++ b/pod/perllocale.pod
@@ -863,12 +863,18 @@ example, if you move from the "C" locale to a 7-bit ISO 646 one,
 you may find--possibly to your surprise--that C<"|"> moves from the
 C<POSIX::ispunct()> class to C<POSIX::isalpha()>.
 Unfortunately, this creates big problems for regular expressions. "|" still
-means alternation even though it matches C<\w>.
+means alternation even though it matches C<\w>. Starting in v5.22, a
+warning will be raised when such a locale is switched into. More
+details are given several paragraphs further down.
 
 Starting in v5.20, Perl supports UTF-8 locales for C<LC_CTYPE>, but
 otherwise Perl only supports single-byte locales, such as the ISO 8859
 series. This means that wide character locales, for example for Asian
-languages, are not supported. The UTF-8 locale support is actually a
+languages, are not well-supported. (If the platform has the capability
+for Perl to detect such a locale, starting in Perl v5.22,
+L<Perl will warn, default enabled|warnings/Category Hierarchy>,
+using the C<locale> warning category, whenever such a locale is switched
+into.) The UTF-8 locale support is actually a
 superset of POSIX locales, because it is really full Unicode behavior
 as if no locale were in effect at all (except for tainting; see
 L</SECURITY>). POSIX locales, even UTF-8 ones,
@@ -890,6 +896,17 @@ C<\n> for example, always mean the platform's native one.
 This means, for example, that C<\N> in regular expressions (every
 character but new-line) works on the platform character set.
 
+Starting in v5.22, Perl will by default warn when switching into a
+locale that redefines any ASCII printable character (plus C<\t> and
+C<\n>) into a different class than expected. This is unlikely to
+happen on modern locales, but can happen with the ISO 646 and other
+7-bit locales that are essentially obsolete. Things may still work,
+depending on what features of Perl are used by the program. For
+example, in the example from above where C<"|"> becomes a C<\w>, and
+there are no regular expressions where this matters, the program may
+still work properly. The warning lists all the characters that
+it can determine could be adversely affected.
+
 B<Note:> A broken or malicious C<LC_CTYPE> locale definition may result
 in clearly ineligible characters being considered to be alphanumeric by
 your application. For strict matching of (mundane) ASCII letters and
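The kind of check this documentation describes (noticing when a locale puts an ASCII printable character, C<\t>, or C<\n> into an unexpected class) can be roughly approximated from user code. The sketch below is not the implementation in the Perl core: it only compares the C<\w> class, whereas the real warning covers more classes, and the helper name reclassified_chars is hypothetical. It assumes the "C" locale is available, which POSIX requires.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Hypothetical helper (not part of Perl): return the ASCII printable
# characters, plus \t and \n, whose \w membership under the current
# LC_CTYPE locale differs from plain ASCII rules.
sub reclassified_chars {
    my @changed;
    for my $ord (9, 10, 32 .. 126) {
        my $ch = chr $ord;

        # Expected classification, spelled out so it never depends on locale.
        my $expected = ($ch =~ /\A[A-Za-z0-9_]\z/) ? 1 : 0;

        # Actual classification under the locale currently in effect.
        my $actual;
        {
            use locale;
            $actual = ($ch =~ /\A\w\z/) ? 1 : 0;
        }

        push @changed, $ch if $actual != $expected;
    }
    return @changed;
}

# Switch LC_CTYPE explicitly; a 7-bit ISO 646 locale name could be
# substituted here on a platform that still ships one.
setlocale(LC_CTYPE, "C");

my @changed = reclassified_chars();
print @changed
    ? "reclassified: @changed\n"
    : "no ASCII characters reclassified\n";
```

In the "C" locale the list is expected to be empty. On a Perl that has the C<locale> warning category (v5.22 and later, per the text above), a program that knowingly runs in such a locale could disable the warning lexically with `no warnings 'locale';`.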