author | Karl Williamson <khw@cpan.org> | 2018-03-02 12:13:55 -0700 |
---|---|---|
committer | Karl Williamson <khw@cpan.org> | 2018-03-02 12:37:16 -0700 |
commit | 578a6a873a320fe64743b060dbd467f1865d205c (patch) | |
tree | 6eb0156a51297f8bde040c9ad938125ff670af7b | |
parent | d8255c827dc80db97c8439ea38afc130902a7c1e (diff) | |
download | perl-578a6a873a320fe64743b060dbd467f1865d205c.tar.gz |
Reword warning for deviations from UTF-8 locales
Some locales are UTF-8, but not exactly what Perl is expecting. Revise
the message raised in this circumstance.
Originally I thought these were violations of Unicode, but based on
feedback from Craig Berry, I came to realize that these are legitimate
interpretations of the Unicode standard. But perl persists with its own
interpretation that differs from these, hence the warning.
-rw-r--r-- | README.hpux | 9 |
-rw-r--r-- | locale.c | 5 |
-rw-r--r-- | pod/perldelta.pod | 8 |
-rw-r--r-- | pod/perldiag.pod | 47 |
-rw-r--r-- | t/porting/known_pod_issues.dat | 1 |
5 files changed, 42 insertions, 28 deletions
diff --git a/README.hpux b/README.hpux
index ce000dd887..e1857e08dc 100644
--- a/README.hpux
+++ b/README.hpux
@@ -563,15 +563,6 @@ questions about 64-bit numbers when Configure asks you, you may get a
 configuration that cannot be compiled, or that does not function as
 expected.
 
-=head2 Locales on HP-UX
-
-HP-UX installs the locale C<univ.utf8> and C<en_US.utf8> on all systems.
-Up to and including HP-UX 11.23, this local is defective in that it
-does not thinks that the characters C<< $ + < = > ^ ` | >> and C<~> are
-punctuation, which they are according to the Unicode standards.
-
-This appears to be fixed on HP-UX 11.31.
-
 =head2 Oracle on HP-UX
 
 Using perl to connect to Oracle databases through DBI and DBD::Oracle
diff --git a/locale.c b/locale.c
--- a/locale.c
+++ b/locale.c
@@ -1656,10 +1656,9 @@ S_new_ctype(pTHX_ const char *newctype)
         if (UNLIKELY(bad_count) && PL_in_utf8_CTYPE_locale) {
             PL_warn_locale = Perl_newSVpvf(aTHX_
                  "Locale '%s' contains (at least) the following characters"
-                 " which have\nnon-standard meanings: %s\nThe Perl program"
-                 " will use the standard meanings",
+                 " which have\nunexpected meanings: %s\nThe Perl program"
+                 " will use the expected meanings",
                  newctype, bad_chars_list);
-
         }
         else {
             PL_warn_locale = Perl_newSVpvf(aTHX_
diff --git a/pod/perldelta.pod b/pod/perldelta.pod
index fb56b110e9..58781e37ee 100644
--- a/pod/perldelta.pod
+++ b/pod/perldelta.pod
@@ -227,6 +227,14 @@ allow entering the I<first> argument of an operator that takes a fixed number
 of arguments, since this is a case that will not cause stack corruption.
 [perl #132854]
 
+=item *
+
+The warning added in 5.27.8 concerning UTF-8 locale compatibility was
+misleading. The new wording and explanation are at
+L<perldiag/Locale '%s' contains (at least) the following characters which
+have unexepected meanings: %s The Perl program will use the exepected
+meanings>
+
 =back
 
 =head1 Utility Changes
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index c24be8a334..3abc301f7a 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -3371,28 +3371,43 @@ said library was compiled against. Reinstalling the XS module will
 likely fix this error.
 
 =item Locale '%s' contains (at least) the following characters which
-have non-standard meanings: %s The Perl program will use the standard
+have unexepected meanings: %s The Perl program will use the exepected
 meanings
 
 (W locale) You are using the named UTF-8 locale. UTF-8 locales are
-expected to adhere to the Unicode standard. This message arises when
-perl found some anomalies in the locale, and is notifying you that there
-are potential problems.
-
-The most common cause of this warning is that, contrary to the claims,
-Unicode is not completely locale insensitive. Turkish and some related
-languages have two types of C<"I"> characters. One is dotted in both
-upper- and lowercase, and the other is dotless in both cases. Unicode
-allows a locale to use either these rules, or the rules used in all
-other instances, where there is only one type of C<"I">, which is
-dotless in the uppercase, and dotted in the lower. The perl core does
-not (yet) handle the Turkish case, and this warns you of that. Instead,
+expected to have very particular behavior, which most do. This message
+arises when perl found some departures from the expectations, and is
+notifying you that the expected behavior overrides these differences.
+In some cases the differences are caused by the locale definition being
+defective, but the most common causes of this warning are when there are
+ambiguities and conflicts in following the Standard, and the locale has
+chosen an approach that differs from Perl's.
+
+One of these is because that, contrary to the claims, Unicode is not
+completely locale insensitive. Turkish and some related languages have
+two types of C<"I"> characters. One is dotted in both upper- and
+lowercase, and the other is dotless in both cases. Unicode allows a
+locale to use either the Turkish rules, or the rules used in all other
+instances, where there is only one type of C<"I">, which is dotless in
+the uppercase, and dotted in the lower. The perl core does not (yet)
+handle the Turkish case, and this message warns you of that. Instead,
 the L<Unicode::Casing> module allows you to mostly implement the Turkish
 casing rules.
 
-But there are other locales which are defective in not following the
-Unicode standard, and this message is raised if one of these is
-detected.
+The other common cause is for the characters
+
+    $ + < = > ^ ` | ~
+
+These are probematic. The C standard says that these should be
+considered punctuation in the C locale (and the POSIX standard defers to
+the C standard), and Unicode is generally considered a superset of the C
+locale. But Unicode has added an extra category, "Symbol", and
+classifies these particular characters as being symbols. Most UTF-8
+locales have them treated as punctuation, so that L<ispunct(2)> returns
+non-zero for them. But a few locales have it return 0. Perl takes the
+first approach, not using C<ispunct()> at all (see L<Note [5] in
+perlrecharclass|perlrecharclass/[5]>), and this message is raised to
+notify you that you are getting Perl's approach, not the locale's.
 
 =item Locale '%s' may not work well.%s
 
diff --git a/t/porting/known_pod_issues.dat b/t/porting/known_pod_issues.dat
index 78e0ec659d..5856f805f3 100644
--- a/t/porting/known_pod_issues.dat
+++ b/t/porting/known_pod_issues.dat
@@ -147,6 +147,7 @@ ioctl(2)
 IPC::Run
 IPC::Shareable
 IPC::Signal
+ispunct(2)
 kill(3)
 langinfo(3)
 LaTeX::Encode