author | Jarkko Hietaniemi <jhi@iki.fi> | 2001-09-29 20:15:32 +0000 |
---|---|---|
committer | Jarkko Hietaniemi <jhi@iki.fi> | 2001-09-29 20:15:32 +0000 |
commit | a1cc1cb15ece75062cc270a98e84bc57301a80f0 (patch) | |
tree | 854490affaf7d4fea19dec65b1d328f89557d92a /pod/perlunicode.pod | |
parent | ab13f0c73a71a1ea41c4bdcd1f78f8b903cc458c (diff) | |
download | perl-a1cc1cb15ece75062cc270a98e84bc57301a80f0.tar.gz |
Explain a bit the new more flexible \p\P syntax.
p4raw-id: //depot/perl@12270
Diffstat (limited to 'pod/perlunicode.pod')
-rw-r--r-- | pod/perlunicode.pod | 15 |
1 files changed, 10 insertions, 5 deletions
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index f27173cded..4864909e35 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -168,11 +168,16 @@ match property) constructs. For instance, C<\p{Lu}> matches any character
 with the Unicode uppercase property, while C<\p{M}> matches any mark
 character. Single letter properties may omit the brackets, so that
 can be written C<\pM> also. Many predefined character classes
-are available, such as C<\p{IsMirrored}> and C<\p{InTibetan}>. The
-recommended names of the C<In> classes are the official Unicode script
-and block names but with all non-alphanumeric characters removed, for
-example the block name C<"Latin-1 Supplement"> becomes
-C<\p{InLatin1Supplement}>.
+are available, such as C<\p{IsMirrored}> and C<\p{InTibetan}>.
+The recommended naming convention of the C<In> classes are the
+official Unicode script and block names, but with all non-alphanumeric
+characters removed, for example the block name C<"Latin-1 Supplement">
+becomes C<\p{InLatin1Supplement}>. Perl will ignore the case of
+letters, and any space or dash can be a space, dash, underbar, or be
+missing altogether, so C<\p{ in latin 1 supplement }> will work, too.
+You can also negate both C<\p{}> and C<\P{}> by introducing a caret
+(^) between the first curly and the property name: C<\p{^InTamil}> is
+equal to C<\P{Tamil}>.
 
 Here is the list as of Unicode 3.1.0 (the two-letter classes) and
 as defined by Perl (the one-letter classes) (in Unicode materials
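
The syntax described in the added paragraph can be tried with a small script. This is only a sketch and not part of the commit: the helper C<matches> and the sample characters are chosen here for illustration, and it assumes a Perl recent enough to know the C<InTamil> block name. It also assumes the intended counterpart of C<\p{^InTamil}> is C<\P{InTamil}>.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the patch): 1 if $char matches $re, else 0.
sub matches {
    my ($char, $re) = @_;
    return $char =~ $re ? 1 : 0;
}

my $tamil_a = "\x{0B85}";   # TAMIL LETTER A, inside the Tamil block
my $acute   = "\x{0301}";   # COMBINING ACUTE ACCENT, a mark character

my @checks = (
    [ 'A is \p{Lu}',            matches('A',      qr/\p{Lu}/)       ],
    [ 'a is \p{Lu}',            matches('a',      qr/\p{Lu}/)       ],
    [ 'acute is \p{M}',         matches($acute,   qr/\p{M}/)        ],
    [ 'acute is \pM',           matches($acute,   qr/\pM/)          ],  # brackets omitted
    [ 'tamil is \p{InTamil}',   matches($tamil_a, qr/\p{InTamil}/)  ],
    [ 'A is \p{^InTamil}',      matches('A',      qr/\p{^InTamil}/) ],  # caret negates
    [ 'A is \P{InTamil}',       matches('A',      qr/\P{InTamil}/)  ],  # same meaning
    [ 'tamil is \p{^InTamil}',  matches($tamil_a, qr/\p{^InTamil}/) ],
);

# Print one line per check: the description and 1 (match) or 0 (no match).
printf "%-24s %d\n", @$_ for @checks;
```

The caret form and the capital C<\P> form give the same answer for every character tried here, which is the equivalence the paragraph describes.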