author | Father Chrysostomos <sprout@cpan.org> | 2011-02-13 23:10:59 -0800 |
---|---|---|
committer | Father Chrysostomos <sprout@cpan.org> | 2011-02-13 23:10:59 -0800 |
commit | f9eb106c1084642aeabc4c43b4e436fd24217bd2 (patch) | |
tree | a968c5406ea206890102f25c32935ddf318996fa /pod/perldiag.pod | |
parent | 5ff1373f1bf4b408f583891f0fe2c8a61674e0a4 (diff) | |
download | perl-f9eb106c1084642aeabc4c43b4e436fd24217bd2.tar.gz |
perldiag: Move the \p{} entry
Diffstat (limited to 'pod/perldiag.pod')
-rw-r--r-- | pod/perldiag.pod | 48 |
1 files changed, 24 insertions, 24 deletions
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index a9cb3f1bdd..56089d036a 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -3332,30 +3332,6 @@ package-specific handler. That name might have a meaning to Perl itself
 some day, even though it doesn't yet. Perhaps you should use a
 mixed-case attribute name, instead. See L<attributes>.
 
-=item \p{} uses Unicode rules, not locale rules
-
-(W) You compiled a regular expression that contained a Unicode property
-match (C<\p> or C<\P>), but the regular expression is also being told to
-use the run-time locale, not Unicode. Instead, use a POSIX character
-class, which should know about the locale's rules.
-(See L<perlrecharclass/POSIX Character Classes>.)
-
-Even if the run-time locale is ISO 8859-1 (Latin1), which is a subset of
-Unicode, some properties will give results that are not valid for that
-subset.
-
-Here are a couple of examples to help you see what's going on. If the
-locale is ISO 8859-7, the character at code point 0xD7 is the "GREEK
-CAPITAL LETTER CHI". But in Unicode that code point means the
-"MULTIPLICATION SIGN" instead, and C<\p> always uses the Unicode
-meaning. That means that C<\p{Alpha}> won't match, but C<[[:alpha:]]>
-should. Only in the Latin1 locale are all the characters in the same
-positions as they are in Unicode. But, even here, some properties give
-incorrect results. An example is C<\p{Changes_When_Uppercased}> which
-is true for "LATIN SMALL LETTER Y WITH DIAERESIS", but since the upper
-case of that character is not in Latin1, in that locale it doesn't
-change when upper cased.
-
 =item pack/unpack repeat count overflow
 
 (F) You can't specify a repeat count so large that it overflows your
@@ -3850,6 +3826,30 @@ declared or defined with a different function prototype.
 (F) You've omitted the closing parenthesis in a function prototype
 definition.
 
+=item \p{} uses Unicode rules, not locale rules
+
+(W) You compiled a regular expression that contained a Unicode property
+match (C<\p> or C<\P>), but the regular expression is also being told to
+use the run-time locale, not Unicode. Instead, use a POSIX character
+class, which should know about the locale's rules.
+(See L<perlrecharclass/POSIX Character Classes>.)
+
+Even if the run-time locale is ISO 8859-1 (Latin1), which is a subset of
+Unicode, some properties will give results that are not valid for that
+subset.
+
+Here are a couple of examples to help you see what's going on. If the
+locale is ISO 8859-7, the character at code point 0xD7 is the "GREEK
+CAPITAL LETTER CHI". But in Unicode that code point means the
+"MULTIPLICATION SIGN" instead, and C<\p> always uses the Unicode
+meaning. That means that C<\p{Alpha}> won't match, but C<[[:alpha:]]>
+should. Only in the Latin1 locale are all the characters in the same
+positions as they are in Unicode. But, even here, some properties give
+incorrect results. An example is C<\p{Changes_When_Uppercased}> which
+is true for "LATIN SMALL LETTER Y WITH DIAERESIS", but since the upper
+case of that character is not in Latin1, in that locale it doesn't
+change when upper cased.
+
 =item Quantifier follows nothing in regex; marked by <-- HERE in m/%s/
 
 (F) You started a regular expression with a quantifier. Backslash it if you
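
The code-point facts cited in the moved entry can be checked independently of Perl. The following is a minimal sketch in Python, not part of the commit and not a reproduction of Perl's regex or locale behaviour. It assumes Python's standard `iso8859_7` codec and `unicodedata` module, and all variable names are illustrative. It shows that byte 0xD7 is a letter when read as ISO 8859-7 but a symbol when read as a Unicode code point, and that the uppercase form of U+00FF lies outside the Latin1 range.

```python
import unicodedata

byte = bytes([0xD7])

# Read as ISO 8859-7, the byte decodes to a Greek letter.
greek = byte.decode("iso8859_7")
# Read directly as a Unicode code point, the same value is a symbol.
uni = chr(0xD7)

greek_name = unicodedata.name(greek)
uni_name = unicodedata.name(uni)
print(greek_name, greek.isalpha())
print(uni_name, uni.isalpha())

# U+00FF is in Latin1, but its Unicode uppercase form is not.
y = chr(0xFF)
upper = y.upper()
upper_fits_latin1 = ord(upper) <= 0xFF
print(unicodedata.name(y), "->", unicodedata.name(upper), upper_fits_latin1)
```

This mirrors the entry's point that a property test based on Unicode code points gives a different answer from one based on the locale's own interpretation of the byte.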