author | Karl Williamson <khw@khw-desktop.(none)> | 2009-12-21 11:44:35 -0700 |
---|---|---|
committer | Rafael Garcia-Suarez <rgs@consttype.org> | 2009-12-22 11:44:37 +0100 |
commit | 0111a78fcc993bdfaa4b46112924c3a9751ecfa5 (patch) | |
tree | f9dc23978c71cd47fd18e36fff0613f8673b58e1 /pod/perlrebackslash.pod | |
parent | c3c0aa283b73660f84ae7e190dcbbd607facb512 (diff) | |
download | perl-0111a78fcc993bdfaa4b46112924c3a9751ecfa5.tar.gz |
Fix up pods for \X
Diffstat (limited to 'pod/perlrebackslash.pod')
-rw-r--r-- | pod/perlrebackslash.pod | 18 |
1 files changed, 7 insertions, 11 deletions
diff --git a/pod/perlrebackslash.pod b/pod/perlrebackslash.pod
index 40f73fcbc1..e8ffcf16d0 100644
--- a/pod/perlrebackslash.pod
+++ b/pod/perlrebackslash.pod
@@ -100,7 +100,7 @@ quoted constructs>.
  \w                Character class for word characters.
  \W                Character class for non-word characters.
  \x{}, \x00        Hexadecimal escape sequence.
- \X                Extended Unicode "combining character sequence".
+ \X                Unicode "extended grapheme cluster".
  \z                End of string.
  \Z                End of string.
 
@@ -507,18 +507,14 @@ metacharacter, and suggests C<\R> as the notation.
 
 =item \X
 
-This matches an extended Unicode I<combining character sequence>, and
-is equivalent to C<< (?>\PM\pM*) >>. C<\PM> matches any character that is
-not considered a Unicode mark character, while C<\pM> matches any character
-that is considered a Unicode mark character; so C<\X> matches any non
-mark character followed by zero or more mark characters. Mark characters
-include (but are not restricted to) I<combining characters> and
-I<vowel signs>.
+This matches a Unicode I<extended grapheme cluster>.
 
 C<\X> matches quite well what normal (non-Unicode-programmer) usage
-would consider a single character: for example a base character
-(the C<\PM> above), for example a letter, followed by zero or more
-diacritics, which are I<combining characters> (the C<\pM*> above).
+would consider a single character. As an example, consider a G with some sort
+of accent mark over it (a diacritic). There is no such single character in
+Unicode, but something like one can be constructed by using a G followed by a
+Unicode combining accent, and would be displayed by Unicode-aware software as
+if it were a single character.
 
 Mnemonic: eI<X>tended Unicode character.
 
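The "G followed by a combining accent" example in the new text can be sketched in Perl as below. This is an illustration only and not part of the commit: the helper names `count_code_points` and `count_clusters` are hypothetical, and U+0301 (COMBINING ACUTE ACCENT) is an assumed choice of accent, since the pod text does not name one.

```perl
use strict;
use warnings;

# Hypothetical helper: number of code points in a string.
sub count_code_points {
    my ($s) = @_;
    my $n = () = $s =~ /./gs;    # /s so that "." also matches newline
    return $n;
}

# Hypothetical helper: number of \X matches, i.e. what a reader
# would usually perceive as individual characters.
sub count_clusters {
    my ($s) = @_;
    my $n = () = $s =~ /\X/g;
    return $n;
}

# "G" + U+0301 COMBINING ACUTE ACCENT (assumed accent), then "o".
my $str = "G\x{301}o";

printf "code points: %d\n", count_code_points($str);
printf "clusters:    %d\n", count_clusters($str);
```

For this particular string the removed definition, C<< (?>\PM\pM*) >>, and the extended grapheme cluster definition should agree, because the only mark present is a combining accent attached to a base letter.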