Update .pods

Signed-off-by: Abigail <abigail@abigail.be>
author: Karl Williamson <khw@khw-desktop.(none)> 2009-12-24 22:54:58 -0700
committer: Abigail <abigail@abigail.be> 2009-12-25 10:07:41 +0100
commit: e1b711dac329baf9cf4ea3e4628e6c713e24b342 (patch)
tree: b12ce1b41c2d6c0582296ddad541efd2ae3f71e2 /pod/perlretut.pod
parent: 27bca3226281a592aed848b7e68ea50f27381dac (diff)
download: perl-e1b711dac329baf9cf4ea3e4628e6c713e24b342.tar.gz
1 files changed, 19 insertions, 10 deletions
diff --git a/pod/perlretut.pod b/pod/perlretut.pod
index 6c5c2e92a2..b9be6e6e51 100644
--- a/pod/perlretut.pod
+++ b/pod/perlretut.pod
@@ -1945,20 +1945,29 @@ you would use the script name, for example C<\p{Latin}>, C<\p{Greek}>,
 or C<\P{Katakana}>. Other sets are the Unicode blocks, the names
 of which begin with "In". One such block is dedicated to mathematical
 operators, and its pattern formula is C<\p{InMathematicalOperators}>.
-For the full list see L<perlunicode>.
+For the full list see L<perluniprops>.
+
+What we have described so far is the single form of the C<\p{...}> character
+classes.  There is also a compound form which you may run into.  These
+look like C<\p{name=value}> or C<\p{name:value}> (the equals sign and colon
+can be used interchangeably).  These are more general than the single form,
+and in fact most of the single forms are just Perl-defined shortcuts for common
+compound forms.  For example, the script examples in the previous paragraph
+could be written equivalently as C<\p{Script=Latin}>, C<\p{Script:Greek}>, and
+C<\P{script=katakana}> (case is irrelevant between the C<{}> braces).  You may
+never have to use the compound forms, but sometimes they are necessary, and their
+use can make your code easier to understand.
 
 C<\X> is an abbreviation for a character class that comprises
-the Unicode I<combining character sequences>.  A combining character
-sequence is a base character followed by any number of diacritics, i.e.,
-signs like accents used to indicate different sounds of a letter. Using
-the Unicode full names, e.g., S<C<A + COMBINING RING>> is a combining
-character sequence with base character C<A> and combining character
-S<C<COMBINING RING>>, which translates in Danish to A with the circle
-atop it, as in the word Angstrom.  C<\X> is equivalent to C<\PM\pM*}>,
-i.e., a non-mark followed by one or more marks.
+a Unicode I<extended grapheme cluster>.  This represents a "logical character",
+what appears to be a single character, but may be represented internally by more
+than one.  As an example, using the Unicode full names, S<C<A + COMBINING
+RING>> is a grapheme cluster with base character C<A> and combining character
+S<C<COMBINING RING>>, which translates in Danish to A with the circle atop it,
+as in the word Angstrom.
 
 For the full and latest information about Unicode see the latest
-Unicode standard, or the Unicode Consortium's website http://www.unicode.org/
+Unicode standard, or the Unicode Consortium's website L<http://www.unicode.org>
 
 As if all those classes weren't enough, Perl also defines POSIX style
 character classes.  These have the form C<[:name:]>, with C<name> the
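
The behaviour documented by this patch can be checked with a short script. The following is a minimal sketch, not part of the commit: the variable names are illustrative, and it assumes a perl recent enough to treat \X as an extended grapheme cluster (5.12 or later). It compares the single and compound forms of a script property on one Greek letter, then counts code points versus grapheme clusters for A followed by U+030A, whose full Unicode name is COMBINING RING ABOVE.

```perl
use strict;
use warnings;

# One Greek letter, written as an escape so the source file stays ASCII.
my $alpha = "\x{03B1}";    # GREEK SMALL LETTER ALPHA

# Single form, and two spellings of the compound form. The equals sign and
# the colon are interchangeable, and case inside the braces is ignored.
my $single   = $alpha =~ /^\p{Greek}$/        ? 1 : 0;
my $compound = $alpha =~ /^\p{Script=Greek}$/ ? 1 : 0;
my $colon    = $alpha =~ /^\p{script:greek}$/ ? 1 : 0;

# \P{...} is the negation: a Latin letter is not Katakana.
my $negated = "a" =~ /^\P{Script=Katakana}$/ ? 1 : 0;

# A base character followed by a combining mark: two code points that
# \X treats as one logical character.
my $a_ring        = "A\x{030A}";    # A + COMBINING RING ABOVE
my $code_points   = length $a_ring;
my @clusters      = $a_ring =~ /(\X)/g;
my $cluster_count = scalar @clusters;

print "single=$single compound=$compound colon=$colon negated=$negated\n";
print "code points=$code_points clusters=$cluster_count\n";
```

Only one character per property is tested here, so the script shows that the forms agree on that character, not that they are equivalent in general.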