author | Jarkko Hietaniemi <jhi@iki.fi> | 2001-03-06 23:12:38 +0000 |
---|---|---|
committer | Jarkko Hietaniemi <jhi@iki.fi> | 2001-03-06 23:12:38 +0000 |
commit | 8692993118c7bb6d29743e2f564d11368140b2bf (patch) | |
tree | 94b1ab82692ac1f943fd43e81f26a651f9f2dc74 | |
parent | 6d958cacfa2c47a9b3e4d7a29ec6bd3ffa8ff2f9 (diff) | |
download | perl-8692993118c7bb6d29743e2f564d11368140b2bf.tar.gz |
The perlretut was still talking about the old \p and \P
definitions.
p4raw-id: //depot/perl@9061
-rw-r--r-- | pod/perlretut.pod | 35 |
1 files changed, 19 insertions, 16 deletions
diff --git a/pod/perlretut.pod b/pod/perlretut.pod
index a77b87e125..ad62873725 100644
--- a/pod/perlretut.pod
+++ b/pod/perlretut.pod
@@ -1720,29 +1720,32 @@ characters,
     $x =~ /^\p{IsLower}/; # doesn't match, lowercase char class
     $x =~ /^\P{IsLower}/; # matches, char class sans lowercase
 
-If a C<name> is just one letter, the braces can be dropped. For
-instance, C<\pM> is the character class of Unicode 'marks'. Here is
-the association between some Perl named classes and the traditional
-Unicode classes:
+Here is the association between some Perl named classes and the
+traditional Unicode classes:
 
-    Perl class name  Unicode class name
+    Perl class name  Unicode class name or regular expression
 
-    IsAlpha          Lu, Ll, or Lo
-    IsAlnum          Lu, Ll, Lo, or Nd
+    IsAlpha          ^[LM]
+    IsAlnum          ^[LMN]
     IsASCII          $code le 127
-    IsCntrl          C
+    IsCntrl          ^C
+    IsBlank          ^Z[^lp] or $code eq "0009"
     IsDigit          Nd
-    IsGraph          [^C] and $code ne "0020"
+    IsGraph          ^([LMNPS]|Co)
     IsLower          Ll
-    IsPrint          [^C]
-    IsPunct          P
-    IsSpace          Z, or ($code lt "0020" and chr(hex $code) is a \s)
-    IsUpper          Lu
-    IsWord           Lu, Ll, Lo, Nd or $code eq "005F"
+    IsPrint          ^([LMNPS]|Co|Zs)
+    IsPunct          ^P
+    IsSpace          ^Z or ($code =~ /^(0009|000A|000B|000C|000D)$/
+    IsSpacePerl      ^Z or ($code =~ /^(0009|000A|000C|000D)$/
+    IsUpper          ^L[ut]
+    IsWord           ^[LMN] or $code eq "005F"
     IsXDigit         $code =~ /^00(3[0-9]|[46][1-6])$/
 
-For a full list of Perl class names, consult the mktables.PL program
-in the lib/perl5/5.6.0/unicode directory.
+You can also use the official Unicode class names with the C<\p> and
+C<\P>, like C<\p{L}> for Unicode 'letters', or C<\p{Lu}> for uppercase
+letters, or C<\P{Nd}> for non-digits. If a C<name> is just one
+letter, the braces can be dropped. For instance, C<\pM> is the
+character class of Unicode 'marks'.
 
 C<\X> is an abbreviation for a character class sequence that includes
 the Unicode 'combining character sequences'. A 'combining character