summaryrefslogtreecommitdiff
path: root/pod/perlretut.pod
diff options
context:
space:
mode:
authorKarl Williamson <khw@khw-desktop.(none)>2009-12-24 22:54:58 -0700
committerAbigail <abigail@abigail.be>2009-12-25 10:07:41 +0100
commite1b711dac329baf9cf4ea3e4628e6c713e24b342 (patch)
treeb12ce1b41c2d6c0582296ddad541efd2ae3f71e2 /pod/perlretut.pod
parent27bca3226281a592aed848b7e68ea50f27381dac (diff)
downloadperl-e1b711dac329baf9cf4ea3e4628e6c713e24b342.tar.gz
Update .pods
Signed-off-by: Abigail <abigail@abigail.be>
Diffstat (limited to 'pod/perlretut.pod')
-rw-r--r--pod/perlretut.pod29
1 files changed, 19 insertions, 10 deletions
diff --git a/pod/perlretut.pod b/pod/perlretut.pod
index 6c5c2e92a2..b9be6e6e51 100644
--- a/pod/perlretut.pod
+++ b/pod/perlretut.pod
@@ -1945,20 +1945,29 @@ you would use the script name, for example C<\p{Latin}>, C<\p{Greek}>,
or C<\P{Katakana}>. Other sets are the Unicode blocks, the names
of which begin with "In". One such block is dedicated to mathematical
operators, and its pattern formula is <C\p{InMathematicalOperators>}>.
-For the full list see L<perlunicode>.
+For the full list see L<perluniprops>.
+
+What we have described so far is the single form of the C<\p{...}> character
+classes. There is also a compound form which you may run into. These
+look like C<\p{name=value}> or C<\p{name:value}> (the equals sign and colon
+can be used interchangeably). These are more general than the single form,
+and in fact most of the single forms are just Perl-defined shortcuts for common
+compound forms. For example, the script examples in the previous paragraph
+could be written equivalently as C<\p{Script=Latin}>, C<\p{Script:Greek}>, and
+C<\P{script=katakana}> (case is irrelevant between the C<{}> braces). You may
+never have to use the compound forms, but sometimes it is necessary, and their
+use can make your code easier to understand.
C<\X> is an abbreviation for a character class that comprises
-the Unicode I<combining character sequences>. A combining character
-sequence is a base character followed by any number of diacritics, i.e.,
-signs like accents used to indicate different sounds of a letter. Using
-the Unicode full names, e.g., S<C<A + COMBINING RING>> is a combining
-character sequence with base character C<A> and combining character
-S<C<COMBINING RING>>, which translates in Danish to A with the circle
-atop it, as in the word Angstrom. C<\X> is equivalent to C<\PM\pM*}>,
-i.e., a non-mark followed by one or more marks.
+a Unicode I<extended grapheme cluster>. This represents a "logical character",
+what appears to be a single character, but may be represented internally by more
+than one. As an example, using the Unicode full names, e.g., S<C<A + COMBINING
+RING>> is a grapheme cluster with base character C<A> and combining character
+S<C<COMBINING RING>>, which translates in Danish to A with the circle atop it,
+as in the word Angstrom.
For the full and latest information about Unicode see the latest
-Unicode standard, or the Unicode Consortium's website http://www.unicode.org/
+Unicode standard, or the Unicode Consortium's website L<http://www.unicode.org>
As if all those classes weren't enough, Perl also defines POSIX style
character classes. These have the form C<[:name:]>, with C<name> the