author: Karl Williamson <khw@cpan.org> 2018-03-02 12:13:55 -0700
committer: Karl Williamson <khw@cpan.org> 2018-03-02 12:37:16 -0700
commit: 578a6a873a320fe64743b060dbd467f1865d205c
tree: 6eb0156a51297f8bde040c9ad938125ff670af7b
parent: d8255c827dc80db97c8439ea38afc130902a7c1e
Reword warning for deviations from UTF-8 locales
Some locales are UTF-8, but not exactly what Perl is expecting. Revise the message raised in this circumstance. Originally I thought these were violations of Unicode, but based on feedback from Craig Berry, I came to realize that these are legitimate interpretations of the Unicode standard. But perl persists with its own interpretation that differs from these, hence the warning.
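To see which of the nine characters discussed in the revised perldiag entry a given locale treats as punctuation, you can query the C library directly. This is a minimal sketch for illustration only. Perl's own check in `S_new_ctype()` is not reproduced here, and the names `SYMBOL_CHARS` and `nonpunct_symbols()` are hypothetical. The function records each character for which `ispunct()` returns 0 under the current `LC_CTYPE` locale.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* The nine ASCII characters that Unicode classifies as Symbols
 * (general categories Sm, Sc, Sk) but that the C standard treats
 * as punctuation in the C locale. */
static const char SYMBOL_CHARS[] = "$+<=>^`|~";

/* Hypothetical helper: copy into `out` every character of SYMBOL_CHARS
 * for which ispunct() returns 0 under the current LC_CTYPE locale.
 * `out` must hold at least sizeof SYMBOL_CHARS bytes; it is always
 * NUL-terminated.  Returns the number of characters copied. */
static size_t nonpunct_symbols(char *out)
{
    size_t n = 0;
    for (const char *p = SYMBOL_CHARS; *p != '\0'; p++) {
        /* ctype functions require a value representable as
         * unsigned char (or EOF), hence the cast. */
        if (!ispunct((unsigned char)*p))
            out[n++] = *p;
    }
    out[n] = '\0';
    return n;
}
```

To check the environment's locale, call `setlocale(LC_CTYPE, "")` first. In the C locale, the C standard requires `ispunct()` to be true for every printing character that is neither a space nor alphanumeric, so the function should find none there. A non-empty result under a UTF-8 locale suggests that the locale follows Unicode's Symbol classification for those characters.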
-rw-r--r-- README.hpux 9
-rw-r--r-- locale.c 5
-rw-r--r-- pod/perldelta.pod 8
-rw-r--r-- pod/perldiag.pod 47
-rw-r--r-- t/porting/known_pod_issues.dat 1
5 files changed, 42 insertions, 28 deletions
diff --git a/README.hpux b/README.hpux
index ce000dd887..e1857e08dc 100644
--- a/README.hpux
+++ b/README.hpux
@@ -563,15 +563,6 @@ questions about 64-bit numbers when Configure asks you, you may get a
configuration that cannot be compiled, or that does not function as
expected.
-=head2 Locales on HP-UX
-
-HP-UX installs the locale C<univ.utf8> and C<en_US.utf8> on all systems.
-Up to and including HP-UX 11.23, this local is defective in that it
-does not thinks that the characters C<< $ + < = > ^ ` | >> and C<~> are
-punctuation, which they are according to the Unicode standards.
-
-This appears to be fixed on HP-UX 11.31.
-
=head2 Oracle on HP-UX
Using perl to connect to Oracle databases through DBI and DBD::Oracle
diff --git a/locale.c b/locale.c
index ead73e5554..d6d91ea2b9 100644
--- a/locale.c
+++ b/locale.c
@@ -1656,10 +1656,9 @@ S_new_ctype(pTHX_ const char *newctype)
if (UNLIKELY(bad_count) && PL_in_utf8_CTYPE_locale) {
PL_warn_locale = Perl_newSVpvf(aTHX_
"Locale '%s' contains (at least) the following characters"
- " which have\nnon-standard meanings: %s\nThe Perl program"
- " will use the standard meanings",
+ " which have\nunexpected meanings: %s\nThe Perl program"
+ " will use the expected meanings",
newctype, bad_chars_list);
-
}
else {
PL_warn_locale = Perl_newSVpvf(aTHX_
diff --git a/pod/perldelta.pod b/pod/perldelta.pod
index fb56b110e9..58781e37ee 100644
--- a/pod/perldelta.pod
+++ b/pod/perldelta.pod
@@ -227,6 +227,14 @@ allow entering the I<first> argument of an operator that takes a fixed
number of arguments, since this is a case that will not cause stack
corruption. [perl #132854]
+=item *
+
+The warning added in 5.27.8 concerning UTF-8 locale compatibility was
+misleading. The new wording and explanation are at
+L<perldiag/Locale '%s' contains (at least) the following characters which
+have unexpected meanings: %s The Perl program will use the expected
+meanings>
+
=back
=head1 Utility Changes
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index c24be8a334..3abc301f7a 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -3371,28 +3371,43 @@ said library was compiled against. Reinstalling the XS module will
likely fix this error.
=item Locale '%s' contains (at least) the following characters which
-have non-standard meanings: %s The Perl program will use the standard
+have unexpected meanings: %s The Perl program will use the expected
meanings
(W locale) You are using the named UTF-8 locale. UTF-8 locales are
-expected to adhere to the Unicode standard. This message arises when
-perl found some anomalies in the locale, and is notifying you that there
-are potential problems.
-
-The most common cause of this warning is that, contrary to the claims,
-Unicode is not completely locale insensitive. Turkish and some related
-languages have two types of C<"I"> characters. One is dotted in both
-upper- and lowercase, and the other is dotless in both cases. Unicode
-allows a locale to use either these rules, or the rules used in all
-other instances, where there is only one type of C<"I">, which is
-dotless in the uppercase, and dotted in the lower. The perl core does
-not (yet) handle the Turkish case, and this warns you of that. Instead,
+expected to have very particular behavior, which most do. This message
+arises when perl found some departures from the expectations, and is
+notifying you that the expected behavior overrides these differences.
+In some cases the differences are caused by the locale definition being
+defective, but the most common causes of this warning are when there are
+ambiguities and conflicts in following the Standard, and the locale has
+chosen an approach that differs from Perl's.
+
+One of these is that, contrary to the claims, Unicode is not
+completely locale insensitive. Turkish and some related languages have
+two types of C<"I"> characters. One is dotted in both upper- and
+lowercase, and the other is dotless in both cases. Unicode allows a
+locale to use either the Turkish rules, or the rules used in all other
+instances, where there is only one type of C<"I">, which is dotless in
+the uppercase, and dotted in the lower. The perl core does not (yet)
+handle the Turkish case, and this message warns you of that. Instead,
the L<Unicode::Casing> module allows you to mostly implement the Turkish
casing rules.
-But there are other locales which are defective in not following the
-Unicode standard, and this message is raised if one of these is
-detected.
+The other common cause involves the characters
+
+ $ + < = > ^ ` | ~
+
+These are problematic. The C standard says that these should be
+considered punctuation in the C locale (and the POSIX standard defers to
+the C standard), and Unicode is generally considered a superset of the C
+locale. But Unicode has added an extra category, "Symbol", and
+classifies these particular characters as being symbols. Most UTF-8
+locales have them treated as punctuation, so that L<ispunct(2)> returns
+non-zero for them. But a few locales have it return 0. Perl takes the
+first approach, not using C<ispunct()> at all (see L<Note [5] in
+perlrecharclass|perlrecharclass/[5]>), and this message is raised to
+notify you that you are getting Perl's approach, not the locale's.
=item Locale '%s' may not work well.%s
diff --git a/t/porting/known_pod_issues.dat b/t/porting/known_pod_issues.dat
index 78e0ec659d..5856f805f3 100644
--- a/t/porting/known_pod_issues.dat
+++ b/t/porting/known_pod_issues.dat
@@ -147,6 +147,7 @@ ioctl(2)
IPC::Run
IPC::Shareable
IPC::Signal
+ispunct(2)
kill(3)
langinfo(3)
LaTeX::Encode
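The new warning text in the locale.c hunk is an ordinary printf-style format with two `%s` conversions: the locale name and the list of offending characters. In this standalone sketch, plain `snprintf()` stands in for `Perl_newSVpvf()`. The function name `format_utf8_ctype_warning()` is hypothetical and is not part of perl.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper using the same format string that the locale.c
 * hunk passes to Perl_newSVpvf().  Returns snprintf()'s result, which
 * is the length the full message needs (excluding the NUL), or a
 * negative value on an encoding error. */
static int format_utf8_ctype_warning(char *buf, size_t len,
                                     const char *locale_name,
                                     const char *bad_chars)
{
    return snprintf(buf, len,
        "Locale '%s' contains (at least) the following characters"
        " which have\nunexpected meanings: %s\nThe Perl program"
        " will use the expected meanings",
        locale_name, bad_chars);
}
```

Because `snprintf()` reports the length the complete message needs, a caller can detect truncation by comparing the result against the buffer size. Perl sidesteps the issue because it formats into an SV that grows as needed.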