author: Jarkko Hietaniemi <jhi@iki.fi> 2001-09-29 20:15:32 +0000
committer: Jarkko Hietaniemi <jhi@iki.fi> 2001-09-29 20:15:32 +0000
commit: a1cc1cb15ece75062cc270a98e84bc57301a80f0
tree: 854490affaf7d4fea19dec65b1d328f89557d92a /pod/perlunicode.pod
parent: ab13f0c73a71a1ea41c4bdcd1f78f8b903cc458c
Explain a bit the new more flexible \p\P syntax.
p4raw-id: //depot/perl@12270
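As a rough illustration of the syntax this patch documents, the sketch below tries several C<\p> and C<\P> spellings against sample characters. The helper name `try_match` and the sample characters are assumptions made for illustration; they are not part of the commit. The pattern is compiled at run time inside `eval`, so a spelling that a given perl does not accept is reported rather than aborting the script.

```perl
use strict;
use warnings;

# Sample characters (chosen for illustration, not taken from the commit).
my $upper = "A";           # LATIN CAPITAL LETTER A, general category Lu
my $mark  = "\x{0301}";    # COMBINING ACUTE ACCENT, a mark (M)
my $tamil = "\x{0B85}";    # TAMIL LETTER A, inside the Tamil block

# Hypothetical helper: match $char against $pattern, compiling the pattern
# at run time so that an unsupported spelling is reported, not fatal.
sub try_match {
    my ($char, $pattern) = @_;
    my $result = eval { $char =~ /$pattern/ ? 'match' : 'no match' };
    return defined $result ? $result : 'not accepted by this perl';
}

my @cases = (
    [ $upper, '\p{Lu}' ],           # two-letter class, braces required
    [ $mark,  '\pM' ],              # one-letter class, braces omitted
    [ $tamil, '\p{InTamil}' ],      # block name, strict spelling
    [ $tamil, '\p{ in tamil }' ],   # loose spelling described in the patch
    [ $upper, '\p{^InTamil}' ],     # caret negation
    [ $upper, '\P{InTamil}' ],      # should agree with the caret form
);

printf "%-18s %s\n", $_->[1], try_match(@$_) for @cases;
```

The last two cases are expected to give the same answer for any character, since C<\p{^...}> is described as a negated C<\p{...}>. Whether the loose spelling is accepted may depend on the perl version in use.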
Diffstat (limited to 'pod/perlunicode.pod')
-rw-r--r-- pod/perlunicode.pod 15
1 files changed, 10 insertions, 5 deletions
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index f27173cded..4864909e35 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -168,11 +168,16 @@ match property) constructs. For instance, C<\p{Lu}> matches any
character with the Unicode uppercase property, while C<\p{M}> matches
any mark character. Single letter properties may omit the brackets,
so that can be written C<\pM> also. Many predefined character classes
-are available, such as C<\p{IsMirrored}> and C<\p{InTibetan}>. The
-recommended names of the C<In> classes are the official Unicode script
-and block names but with all non-alphanumeric characters removed, for
-example the block name C<"Latin-1 Supplement"> becomes
-C<\p{InLatin1Supplement}>.
+are available, such as C<\p{IsMirrored}> and C<\p{InTibetan}>.
+The recommended naming convention for the C<In> classes is to use the
+official Unicode script and block names, but with all non-alphanumeric
+characters removed; for example, the block name C<"Latin-1 Supplement">
+becomes C<\p{InLatin1Supplement}>. Perl will ignore the case of
+letters, and any space or dash can be a space, dash, or underbar, or be
+missing altogether, so C<\p{ in latin 1 supplement }> will work, too.
+You can also negate both C<\p{}> and C<\P{}> by introducing a caret
+(^) between the first curly and the property name: C<\p{^InTamil}> is
+equal to C<\P{InTamil}>.
Here is the list as of Unicode 3.1.0 (the two-letter classes) and
as defined by Perl (the one-letter classes) (in Unicode materials