author | Karl Williamson <khw@khw-desktop.(none)> | 2009-12-24 22:54:58 -0700 |
---|---|---|
committer | Abigail <abigail@abigail.be> | 2009-12-25 10:07:41 +0100 |
commit | e1b711dac329baf9cf4ea3e4628e6c713e24b342 (patch) | |
tree | b12ce1b41c2d6c0582296ddad541efd2ae3f71e2 /pod/perlretut.pod | |
parent | 27bca3226281a592aed848b7e68ea50f27381dac (diff) | |
download | perl-e1b711dac329baf9cf4ea3e4628e6c713e24b342.tar.gz |
Update .pods
Signed-off-by: Abigail <abigail@abigail.be>
Diffstat (limited to 'pod/perlretut.pod')
-rw-r--r-- | pod/perlretut.pod | 29 |
1 files changed, 19 insertions, 10 deletions
diff --git a/pod/perlretut.pod b/pod/perlretut.pod
index 6c5c2e92a2..b9be6e6e51 100644
--- a/pod/perlretut.pod
+++ b/pod/perlretut.pod
@@ -1945,20 +1945,29 @@ you would use the script name, for example C<\p{Latin}>, C<\p{Greek}>,
 or C<\P{Katakana}>. Other sets are the Unicode blocks, the names
 of which begin with "In". One such block is dedicated to mathematical
 operators, and its pattern formula is <C\p{InMathematicalOperators>}>.
-For the full list see L<perlunicode>.
+For the full list see L<perluniprops>.
+
+What we have described so far is the single form of the C<\p{...}> character
+classes. There is also a compound form which you may run into. These
+look like C<\p{name=value}> or C<\p{name:value}> (the equals sign and colon
+can be used interchangeably). These are more general than the single form,
+and in fact most of the single forms are just Perl-defined shortcuts for common
+compound forms. For example, the script examples in the previous paragraph
+could be written equivalently as C<\p{Script=Latin}>, C<\p{Script:Greek}>, and
+C<\P{script=katakana}> (case is irrelevant between the C<{}> braces). You may
+never have to use the compound forms, but sometimes it is necessary, and their
+use can make your code easier to understand.
 
 C<\X> is an abbreviation for a character class that comprises
-the Unicode I<combining character sequences>. A combining character
-sequence is a base character followed by any number of diacritics, i.e.,
-signs like accents used to indicate different sounds of a letter. Using
-the Unicode full names, e.g., S<C<A + COMBINING RING>> is a combining
-character sequence with base character C<A> and combining character
-S<C<COMBINING RING>>, which translates in Danish to A with the circle
-atop it, as in the word Angstrom. C<\X> is equivalent to C<\PM\pM*}>,
-i.e., a non-mark followed by one or more marks.
+a Unicode I<extended grapheme cluster>. This represents a "logical character",
+what appears to be a single character, but may be represented internally by more
+than one. As an example, using the Unicode full names, e.g., S<C<A + COMBINING
+RING>> is a grapheme cluster with base character C<A> and combining character
+S<C<COMBINING RING>>, which translates in Danish to A with the circle atop it,
+as in the word Angstrom.
 
 For the full and latest information about Unicode see the latest
-Unicode standard, or the Unicode Consortium's website http://www.unicode.org/
+Unicode standard, or the Unicode Consortium's website L<http://www.unicode.org>
 
 As if all those classes weren't enough, Perl also defines POSIX style
 character classes. These have the form C<[:name:]>, with C<name> the
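
The behaviour described in the added lines can be checked with a short script. This is a minimal sketch and not part of the commit: the variable names are illustrative, and it assumes a Perl recent enough (5.12 or later) to support the compound C<\p{name=value}> form and extended grapheme clusters in C<\X>.

```perl
use strict;
use warnings;

# Greek small letters alpha and beta, written with code point escapes.
my $greek = "\x{3B1}\x{3B2}";

# The single form and the two compound spellings are expected to agree.
my @forms = (
    qr/^\p{Greek}+$/,          # single form
    qr/^\p{Script=Greek}+$/,   # compound form with '='
    qr/^\p{script:greek}+$/,   # compound form with ':' and different case
);
for my $re (@forms) {
    print(($greek =~ $re ? "match" : "no match"), "\n");
}

# "A" followed by U+030A COMBINING RING ABOVE:
# two code points that \X treats as one logical character.
my $a_ring   = "A\x{30A}";
my @clusters = $a_ring =~ /\X/g;
printf "%d code points, %d grapheme cluster(s)\n",
    length($a_ring), scalar @clusters;
```

All three patterns should report a match, and the last line should show that the two code points form a single grapheme cluster.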