author | Jarkko Hietaniemi <jhi@iki.fi> | 2001-06-11 17:55:47 +0000
---|---|---
committer | Jarkko Hietaniemi <jhi@iki.fi> | 2001-06-11 17:55:47 +0000
commit | 3229381570cd559d546c04f62dc3e86718ceccd8 |
tree | de91f8808ec4f2646f2b485c97fb4e4d90fd94af /pod/perlretut.pod |
parent | fd71b04b8dfd287735727332d78eb1f4dd10bbbf |
Move the full \p\P lists to perlunicode.
p4raw-id: //depot/perl@10520
Diffstat (limited to 'pod/perlretut.pod')
-rw-r--r-- | pod/perlretut.pod | 71 |
1 files changed, 5 insertions, 66 deletions
```diff
diff --git a/pod/perlretut.pod b/pod/perlretut.pod
index 2960950065..77e99429aa 100644
--- a/pod/perlretut.pod
+++ b/pod/perlretut.pod
@@ -1746,72 +1746,11 @@ C<\P>, like C<\p{L}> for Unicode 'letters', or C<\p{Lu}> for uppercase
 letters, or C<\P{Nd}> for non-digits. If a C<name> is just one
 letter, the braces can be dropped. For instance, C<\pM> is the
 character class of Unicode 'marks', for example accent marks.
-Here is the list as of Unicode 3.1.0 (the two-letter classes) and
-Perl 5.8.0 (the one-letter classes):
-
-    L           Letter
-    Lu          Letter, Uppercase
-    Ll          Letter, Lowercase
-    Lt          Letter, Titlecase
-    Lm          Letter, Modifier
-    Lo          Letter, Other
-    M           Mark
-    Mn          Mark, Non-Spacing
-    Mc          Mark, Spacing Combining
-    Me          Mark, Enclosing
-    N           Number
-    Nd          Number, Decimal Digit
-    Nl          Number, Letter
-    No          Number, Other
-    P           Punctuation
-    Pc          Punctuation, Connector
-    Pd          Punctuation, Dash
-    Ps          Punctuation, Open
-    Pe          Punctuation, Close
-    Pi          Punctuation, Initial quote
-                (may behave like Ps or Pe depending on usage)
-    Pf          Punctuation, Final quote
-                (may behave like Ps or Pe depending on usage)
-    Po          Punctuation, Other
-    S           Symbol
-    Sm          Symbol, Math
-    Sc          Symbol, Currency
-    Sk          Symbol, Modifier
-    So          Symbol, Other
-    Z           Separator
-    Zs          Separator, Space
-    Zl          Separator, Line
-    Zp          Separator, Paragraph
-    C           Other
-    Cc          Other, Control
-    Cf          Other, Format
-    Cs          Other, Surrogate
-    Co          Other, Private Use
-    Cn          Other, Not Assigned (Unicode defines no Cn characters)
-
-Additionally, because scripts differ in their directionality
-(for example Hebrew is written right to left), all characters
-have their directionality defined:
-
-    BidiL       Left-to-Right
-    BidiLRE     Left-to-Right Embedding
-    BidiLRO     Left-to-Right Override
-    BidiR       Right-to-Left
-    BidiAL      Right-to-Left Arabic
-    BidiRLE     Right-to-Left Embedding
-    BidiRLO     Right-to-Left Override
-    BidiPDF     Pop Directional Format
-    BidiEN      European Number
-    BidiES      European Number Separator
-    BidiET      European Number Terminator
-    BidiAN      Arabic Number
-    BidiCS      Common Number Separator
-    BidiNSM     Non-Spacing Mark
-    BidiBN      Boundary Neutral
-    BidiB       Paragraph Separator
-    BidiS       Segment Separator
-    BidiWS      Whitespace
-    BidiON      Other Neutrals
+For the full list see L<perlunicode>.
+
+The Unicode has also been separated into blocks of charaters which you
+can test with C<\p{InBlock}> and C<\P{InBlock}>, for example C<\p{InGreek}>
+and C<\P{InKatakana}. For the full list see L<perlunicode>.
 
 For the the full and latest information see the latest Unicode standard.
 
```
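The property classes this patch keeps in perlretut (`\p{L}`, `\p{Lu}`, `\P{Nd}`, `\pM`) and the block classes it introduces (`\p{InGreek}`, `\P{InKatakana}`) can be tried out directly. The following is a minimal sketch, assuming a reasonably modern Perl 5; the helper `classes_of`, the class labels, and the sample characters are hypothetical choices made for illustration and are not part of perlretut. Characters are written as `\x{...}` code points so the script itself stays ASCII.

```perl
use strict;
use warnings;

# Hypothetical labels paired with the regex classes discussed in the patch.
# Each pattern is anchored so it must match exactly one whole character.
my @classes = (
    [ 'Lu (uppercase letter)'    => qr/\A\p{Lu}\z/ ],
    [ 'L (any letter)'           => qr/\A\p{L}\z/ ],
    [ 'M (mark, braces dropped)' => qr/\A\pM\z/ ],
    [ 'not Nd (non-digit)'       => qr/\A\P{Nd}\z/ ],
    [ 'InGreek (block)'          => qr/\A\p{InGreek}\z/ ],
    [ 'not InKatakana (block)'   => qr/\A\P{InKatakana}\z/ ],
);

# Hypothetical sample characters, given as code points.
my %samples = (
    'LATIN CAPITAL LETTER A'     => "\x{41}",
    'COMBINING ACUTE ACCENT'     => "\x{301}",
    'GREEK CAPITAL LETTER OMEGA' => "\x{3A9}",
    'KATAKANA LETTER KA'         => "\x{30AB}",
    'ARABIC-INDIC DIGIT THREE'   => "\x{663}",
);

# Return the labels of every class in @classes that matches $char,
# in the order the classes are listed above.
sub classes_of {
    my ($char) = @_;
    return map { $_->[0] } grep { $char =~ $_->[1] } @classes;
}

# Print one line per sample with the classes it belongs to.
for my $name (sort keys %samples) {
    my @hits = classes_of($samples{$name});
    printf "%-27s %s\n", $name, join(', ', @hits) || '(none)';
}
```

Each sample matches a different subset of the classes: U+30AB KATAKANA LETTER KA is a letter (`\p{L}`) but fails `\P{InKatakana}`, while U+0663 ARABIC-INDIC DIGIT THREE is a decimal digit and so fails `\P{Nd}`.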