More documenting that \p{} defined only for <= U+10FFFF

author: Karl Williamson <public@khwilliamson.com> 2011-09-28 09:36:25 -0600
committer: Karl Williamson <public@khwilliamson.com> 2011-10-01 09:30:40 -0600
commit: 94b42e47713770173606e6b7686a6ca5b74b41cc (patch)
tree: 544ce6bd26cc8467f591aea28110a5d969dc2df7 /pod
parent: 45bb2768cee5570e1fb15c763f1585fd2010f130 (diff)
download: perl-94b42e47713770173606e6b7686a6ca5b74b41cc.tar.gz
3 files changed, 29 insertions, 0 deletions
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index 7e0cdd9caf..131fbb5d1b 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -1389,6 +1389,16 @@ will not match, because the code point is not in Unicode.  But
 
 will match.
 
+This may be counterintuitive at times, as both these fail:
+
+ chr(0x110000) =~ /\p{ASCII_Hex_Digit=True}/     # Fails.
+ chr(0x110000) =~ /\p{ASCII_Hex_Digit=False}/    # Also fails!
+
+and both these succeed:
+
+ chr(0x110000) =~ /\P{ASCII_Hex_Digit=True}/     # Succeeds.
+ chr(0x110000) =~ /\P{ASCII_Hex_Digit=False}/    # Also succeeds!
+
 =item %s: Command not found
 
 (A) You've accidentally run your script through B<csh> instead of Perl.
diff --git a/pod/perlrecharclass.pod b/pod/perlrecharclass.pod
index fef220fc11..3a105798a0 100644
--- a/pod/perlrecharclass.pod
+++ b/pod/perlrecharclass.pod
@@ -370,6 +370,17 @@ which notes all forms that have C</i> differences.
 It is also possible to define your own properties. This is discussed in
 L<perlunicode/User-Defined Character Properties>.
 
+Unicode properties are defined (surprise!) only on Unicode code points.
+A warning is raised and all matches fail on non-Unicode code points
+(those above the legal Unicode maximum of 0x10FFFF).  This can be
+somewhat surprising, as both of these fail:
+
+ chr(0x110000) =~ /\p{ASCII_Hex_Digit=True}/     # Fails.
+ chr(0x110000) =~ /\p{ASCII_Hex_Digit=False}/    # Also fails!
+
+Even though these two matches might be thought of as complements, they
+are so only on Unicode code points.
+
 =head4 Examples
 
  "a"  =~  /\w/      # Match, "a" is a 'word' character.
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index f00b110082..2d0a671dbf 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -366,6 +366,14 @@ of which under C</i> matching match C<PosixAlpha>.
 numerals, come in both upper and lower case so they are C<Cased>, but aren't considered
 letters, so they aren't C<Cased_Letter>s.)
 
+The result is undefined if you try to match a non-Unicode code point
+(that is, one above 0x10FFFF) against a Unicode property.  Currently, a
+warning is raised, and the match will fail.  In some cases, this is
+counterintuitive, as both these fail:
+
+ chr(0x110000) =~ /\p{ASCII_Hex_Digit=True}/     # Fails.
+ chr(0x110000) =~ /\p{ASCII_Hex_Digit=False}/    # Fails!
+
 =head3 B<General_Category>
 
 Every Unicode character is assigned a general category, which is the "most
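
The behaviour documented by this patch can be checked with a small script. What follows is only a sketch, not part of the commit: `probe` is a hypothetical helper name, and the property is written inside `/.../` so that each test is a complete match expression. On a perl of this commit's vintage, the row for U+110000 should show both `\p{}` forms failing and both `\P{}` forms succeeding. Later perl releases are understood to treat above-Unicode code points differently, so no expected output is shown for that row.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the patch): reports how one code point fares
# against a binary property, its negated value, and the \P{} complements.
sub probe {
    my ($cp) = @_;
    my $ch = chr $cp;

    # Silence the warning raised when a non-Unicode code point is matched
    # against a Unicode property; we are provoking it on purpose.
    no warnings 'utf8';

    return {
        p_true  => ( $ch =~ /\p{ASCII_Hex_Digit=True}/  ? 1 : 0 ),
        p_false => ( $ch =~ /\p{ASCII_Hex_Digit=False}/ ? 1 : 0 ),
        P_true  => ( $ch =~ /\P{ASCII_Hex_Digit=True}/  ? 1 : 0 ),
        P_false => ( $ch =~ /\P{ASCII_Hex_Digit=False}/ ? 1 : 0 ),
    };
}

# "A" is a hex digit, "G" is not, and 0x110000 is above the Unicode maximum.
for my $cp ( 0x41, 0x47, 0x110000 ) {
    my $r = probe($cp);
    printf "U+%04X: %s\n", $cp,
        join( ", ",
        map { sprintf "%s=%d", $_, $r->{$_} } qw(p_true p_false P_true P_false) );
}
```

For the two Unicode code points, the `=True` and `=False` columns are exact complements, which is the expectation the patch warns does not carry over to code points above 0x10FFFF.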