| Field | Value | Date |
|---|---|---|
| author | Karl Williamson <public@khwilliamson.com> | 2011-09-28 09:36:25 -0600 |
| committer | Karl Williamson <public@khwilliamson.com> | 2011-10-01 09:30:40 -0600 |
| commit | 94b42e47713770173606e6b7686a6ca5b74b41cc | |
| tree | 544ce6bd26cc8467f591aea28110a5d969dc2df7 /pod | |
| parent | 45bb2768cee5570e1fb15c763f1585fd2010f130 | |
| download | perl-94b42e47713770173606e6b7686a6ca5b74b41cc.tar.gz | |
More documenting that \p{} is defined only for <= U+10FFFF
Diffstat (limited to 'pod')

| Mode | File | Lines changed |
|---|---|---|
| -rw-r--r-- | pod/perldiag.pod | 10 |
| -rw-r--r-- | pod/perlrecharclass.pod | 11 |
| -rw-r--r-- | pod/perlunicode.pod | 8 |

3 files changed, 29 insertions, 0 deletions
```diff
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index 7e0cdd9caf..131fbb5d1b 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -1389,6 +1389,16 @@ will not match, because the code point is not in Unicode. But
 
 will match.
 
+This may be counterintuitive at times, as both these fail:
+
+    chr(0x110000) =~ \p{ASCII_Hex_Digit=True} # Fails.
+    chr(0x110000) =~ \p{ASCII_Hex_Digit=False} # Also fails!
+
+and both these succeed:
+
+    chr(0x110000) =~ \P{ASCII_Hex_Digit=True} # Succeeds.
+    chr(0x110000) =~ \P{ASCII_Hex_Digit=False} # Also succeeds!
+
 =item %s: Command not found
 
 (A) You've accidentally run your script through B<csh> instead of Perl.
diff --git a/pod/perlrecharclass.pod b/pod/perlrecharclass.pod
index fef220fc11..3a105798a0 100644
--- a/pod/perlrecharclass.pod
+++ b/pod/perlrecharclass.pod
@@ -370,6 +370,17 @@ which notes all forms that have C</i> differences.
 It is also possible to define your own properties. This is discussed in
 L<perlunicode/User-Defined Character Properties>.
 
+Unicode properties are defined (surprise!) only on Unicode code points.
+A warning is raised and all matches fail on non-Unicode code points
+(those above the legal Unicode maximum of 0x10FFFF). This can be
+somewhat surprising,
+
+    chr(0x110000) =~ \p{ASCII_Hex_Digit=True} # Fails.
+    chr(0x110000) =~ \p{ASCII_Hex_Digit=False} # Also fails!
+
+Even though these two matches might be thought of as complements, they
+are so only on Unicode code points.
+
 =head4 Examples
 
     "a" =~ /\w/ # Match, "a" is a 'word' character.
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index f00b110082..2d0a671dbf 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -366,6 +366,14 @@ of which under C</i> matching match C<PosixAlpha>.
 numerals, come in both upper and lower case so they are C<Cased>, but aren't
 considered letters, so they aren't C<Cased_Letter>s.)
 
+The result is undefined if you try to match a non-Unicode code point
+(that is, one above 0x10FFFF) against a Unicode property. Currently, a
+warning is raised, and the match will fail. In some cases, this is
+counterintuitive, as both these fail:
+
+    chr(0x110000) =~ \p{ASCII_Hex_Digit=True} # Fails.
+    chr(0x110000) =~ \p{ASCII_Hex_Digit=False} # Fails!
+
 =head3 B<General_Category>
 
 Every Unicode character is assigned a general category, which is the "most
```
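All three hunks document one rule: on a code point above 0x10FFFF, `\p{ASCII_Hex_Digit=True}` and `\p{ASCII_Hex_Digit=False}` both fail, while `\P{...}` succeeds. Because the perlunicode hunk calls the result undefined and describes the failure only as current behavior, the Perl sketch below models the documented rule with plain subroutines instead of running the regex engine, whose behavior could differ in other perl versions. The subroutine names `ascii_hex_digit`, `prop_match` and `negated_prop_match` are hypothetical, only `ASCII_Hex_Digit` is modelled, and the warning perl raises is left out.

```perl
use strict;
use warnings;

# Hypothetical model of the rule documented in this commit; it does NOT
# call perl's regex engine. Only ASCII_Hex_Digit is modelled, and the
# warning perl raises for non-Unicode code points is omitted.
use constant MAX_UNICODE => 0x10FFFF;    # legal Unicode maximum

# ASCII_Hex_Digit is true exactly for 0-9, A-F and a-f.
sub ascii_hex_digit {
    my ($cp) = @_;
    my $is_hex = ($cp >= 0x30 && $cp <= 0x39)     # 0-9
              || ($cp >= 0x41 && $cp <= 0x46)     # A-F
              || ($cp >= 0x61 && $cp <= 0x66);    # a-f
    return $is_hex ? 1 : 0;
}

# Models \p{ASCII_Hex_Digit=True} ($want true) or =False ($want false).
# The property is defined only on Unicode code points, so anything above
# MAX_UNICODE fails, whichever value is asked for.
sub prop_match {
    my ($cp, $want) = @_;
    return 0 if $cp > MAX_UNICODE;
    return ascii_hex_digit($cp) == ($want ? 1 : 0) ? 1 : 0;
}

# Models \P{...} as the complement of the corresponding \p{...},
# so it succeeds on every non-Unicode code point.
sub negated_prop_match {
    my ($cp, $want) = @_;
    return prop_match($cp, $want) ? 0 : 1;
}

for my $cp (0x41, 0x47, 0x110000) {    # "A", "G", first non-Unicode
    printf "U+%04X: \\p{True}=%d \\p{False}=%d \\P{True}=%d \\P{False}=%d\n",
        $cp,
        prop_match($cp, 1),         prop_match($cp, 0),
        negated_prop_match($cp, 1), negated_prop_match($cp, 0);
}
# Prints:
# U+0041: \p{True}=1 \p{False}=0 \P{True}=0 \P{False}=1
# U+0047: \p{True}=0 \p{False}=1 \P{True}=1 \P{False}=0
# U+110000: \p{True}=0 \p{False}=0 \P{True}=1 \P{False}=1
```

In this model `\P{}` is the complement of `\p{}` everywhere, but the `=True` and `=False` forms of `\p{}` are each guarded by the same range check, so they are complements only up to 0x10FFFF, which is the point the perlrecharclass hunk makes.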