author    Karl Williamson <public@khwilliamson.com>  2011-09-28 09:36:25 -0600
committer Karl Williamson <public@khwilliamson.com>  2011-10-01 09:30:40 -0600
commit    94b42e47713770173606e6b7686a6ca5b74b41cc
tree      544ce6bd26cc8467f591aea28110a5d969dc2df7 /pod
parent    45bb2768cee5570e1fb15c763f1585fd2010f130
More documenting that \p{} defined only for <= U+10FFFF
Diffstat (limited to 'pod')
-rw-r--r-- pod/perldiag.pod         10
-rw-r--r-- pod/perlrecharclass.pod  11
-rw-r--r-- pod/perlunicode.pod       8
3 files changed, 29 insertions, 0 deletions
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index 7e0cdd9caf..131fbb5d1b 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -1389,6 +1389,16 @@ will not match, because the code point is not in Unicode. But
will match.
+This may be counterintuitive at times, as both these fail:
+
+ chr(0x110000) =~ /\p{ASCII_Hex_Digit=True}/ # Fails.
+ chr(0x110000) =~ /\p{ASCII_Hex_Digit=False}/ # Also fails!
+
+and both these succeed:
+
+ chr(0x110000) =~ /\P{ASCII_Hex_Digit=True}/ # Succeeds.
+ chr(0x110000) =~ /\P{ASCII_Hex_Digit=False}/ # Also succeeds!
+
=item %s: Command not found
(A) You've accidentally run your script through B<csh> instead of Perl.
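The diagnostic above is triggered by code points beyond 0x10FFFF. A minimal Perl sketch, assuming you want to find such characters before doing any \p{} matching: it compares ord() values directly, so no Unicode property is consulted, and it formats each offender in the same 0x%X style the message uses. The helper name non_unicode_code_points is hypothetical and not part of Perl.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Perl): return the code points in $str
# that lie above the Unicode maximum of 0x10FFFF, formatted as "0x%X".
sub non_unicode_code_points {
    my ($str) = @_;
    return map  { sprintf('0x%X', $_) }   # format each offender in hex
           grep { $_ > 0x10FFFF }         # keep only non-Unicode code points
           map  { ord($_) }               # character -> code point
           split //, $str;                # string -> list of characters
}

my $str = "A" . chr(0x110000) . "B" . chr(0x1FFFFF);
print join(', ', non_unicode_code_points($str)), "\n";   # prints "0x110000, 0x1FFFFF"
```

Because the check is a plain numeric comparison, it behaves the same regardless of how a given Perl release treats \p{} on non-Unicode code points.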
diff --git a/pod/perlrecharclass.pod b/pod/perlrecharclass.pod
index fef220fc11..3a105798a0 100644
--- a/pod/perlrecharclass.pod
+++ b/pod/perlrecharclass.pod
@@ -370,6 +370,17 @@ which notes all forms that have C</i> differences.
It is also possible to define your own properties. This is discussed in
L<perlunicode/User-Defined Character Properties>.
+Unicode properties are defined (surprise!) only on Unicode code points.
+A warning is raised, all C<\p{}> matches fail, and all C<\P{}> matches
+succeed on non-Unicode code points (those above the legal Unicode
+maximum of 0x10FFFF). This can be somewhat surprising:
+
+ chr(0x110000) =~ /\p{ASCII_Hex_Digit=True}/ # Fails.
+ chr(0x110000) =~ /\p{ASCII_Hex_Digit=False}/ # Also fails!
+
+Even though these two matches might be thought of as complements, they
+are so only on Unicode code points.
+
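Since \p{ASCII_Hex_Digit=True} and \p{ASCII_Hex_Digit=False} are complements only on Unicode code points, code that handles arbitrary strings may want to ask the range question first. Below is a minimal sketch under that assumption; the helper ascii_hex_status and its three return strings are illustrative, not a Perl API.

```perl
use strict;
use warnings;

# Hypothetical helper: classify one character against ASCII_Hex_Digit.
# Returns 'true' or 'false' for Unicode code points, and 'undefined' for
# code points above 0x10FFFF, where the property has no defined value.
sub ascii_hex_status {
    my ($char) = @_;
    # Check the range first so \p{} is never applied to a non-Unicode code point.
    return 'undefined' if ord($char) > 0x10FFFF;
    return $char =~ /\p{ASCII_Hex_Digit}/ ? 'true' : 'false';
}

for my $cp (0x41, 0x47, 0x110000) {
    printf "U+%04X: %s\n", $cp, ascii_hex_status(chr $cp);
}
# prints "U+0041: true", "U+0047: false", "U+110000: undefined"
```

Returning a separate 'undefined' answer keeps callers from mistaking a non-Unicode code point for an ordinary non-match.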
=head4 Examples
"a" =~ /\w/ # Match, "a" is a 'word' character.
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index f00b110082..2d0a671dbf 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -366,6 +366,14 @@ of which under C</i> matching match C<PosixAlpha>.
numerals, come in both upper and lower case so they are C<Cased>, but aren't considered
letters, so they aren't C<Cased_Letter>s.)
+The result is undefined if you try to match a non-Unicode code point
+(that is, one above 0x10FFFF) against a Unicode property. Currently, a
+warning is raised, C<\p{}> matches fail, and C<\P{}> matches succeed.
+This can be counterintuitive, as both these fail:
+
+ chr(0x110000) =~ /\p{ASCII_Hex_Digit=True}/ # Fails.
+ chr(0x110000) =~ /\p{ASCII_Hex_Digit=False}/ # Also fails!
+
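Because the result for non-Unicode code points is undefined, and the current behavior is only what this release happens to do, one option is to remove such code points before matching. The sketch below assumes that substituting U+FFFD REPLACEMENT CHARACTER is acceptable for the application; the helper replace_non_unicode is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every code point above 0x10FFFF with
# U+FFFD REPLACEMENT CHARACTER so later \p{} matches only ever see
# Unicode code points. Using U+FFFD is an application choice, not
# something Perl requires.
sub replace_non_unicode {
    my ($str) = @_;
    return join '',
        map { ord($_) > 0x10FFFF ? "\x{FFFD}" : $_ }
        split //, $str;
}

my $clean = replace_non_unicode("1" . chr(0x110000) . "f");
my @hex   = $clean =~ /\p{ASCII_Hex_Digit}/g;   # matches "1" and "f"
print scalar(@hex), " ASCII hex digits\n";      # prints "2 ASCII hex digits"
```

After the substitution every property match is well defined, at the cost of losing the original non-Unicode values.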
=head3 B<General_Category>
Every Unicode character is assigned a general category, which is the "most