author: Karl Williamson <public@khwilliamson.com> 2010-09-12 21:33:12 -0600
committer: Father Chrysostomos <sprout@cpan.org> 2010-09-25 00:47:02 -0700
commit: fb121860c2407cd1d1566d63a95a5220fa93d8e4
tree: cc61893dd3ffe9966e079addeaa538172e2290e9 /pod/perlrebackslash.pod
parent: 8ebef31d4feab4b7c35ff0eb427632a67b1abdd9
Teach Perl about Unicode named character sequences
mktables is changed to process the Unicode named sequence file. charnames.pm is changed to cache the looked-up values in utf8. A new function, string_vianame(), is created that can handle named sequences, as the interface for vianame() cannot. The subroutine lookup_name() is slightly refactored to do almost all of the common work for \N{} and the vianame routines. It now understands named sequences as created by mktables. Tests and documentation are added. In the randomized testing section, half of the tests use vianame() and half use string_vianame().
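
The behavior described in the commit message can be sketched as follows. This is an illustration only, not code from the commit; it assumes a perl recent enough to include this change (5.14 or later, as far as I know) and uses LATIN CAPITAL LETTER A WITH MACRON AND GRAVE, which Unicode's NamedSequences.txt defines as U+0100 followed by U+0300. The variable names are made up for the example.

```perl
use strict;
use warnings;
use charnames ':full';   # load the Unicode names so \N{NAME} and the lookups work

# A named sequence (assumed from NamedSequences.txt): U+0100 followed by U+0300.
my $name = 'LATIN CAPITAL LETTER A WITH MACRON AND GRAVE';

# string_vianame() returns a string, so it can hold a multi-character sequence.
my $seq = charnames::string_vianame($name);

# It also accepts the name of a single character.
my $single = charnames::string_vianame('LATIN CAPITAL LETTER A');

# In a pattern, \N{...} naming a sequence matches those characters "as is".
my $matched =
    "\x{100}\x{300}" =~ /^\N{LATIN CAPITAL LETTER A WITH MACRON AND GRAVE}$/ ? 1 : 0;

# Print the code points rather than the characters to avoid wide-character output.
printf "%s => %s\n", $name,
    join ' ', map { sprintf 'U+%04X', ord } split //, $seq;
```

Here string_vianame() is used instead of vianame() because the commit message states that the vianame() interface cannot handle named sequences.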
Diffstat (limited to 'pod/perlrebackslash.pod')
-rw-r--r-- pod/perlrebackslash.pod 24
1 files changed, 13 insertions, 11 deletions
diff --git a/pod/perlrebackslash.pod b/pod/perlrebackslash.pod
index eb51d94305..a9257c7d82 100644
--- a/pod/perlrebackslash.pod
+++ b/pod/perlrebackslash.pod
@@ -85,7 +85,7 @@ as C<Not in [].>
\L Lowercase till \E. Not in [].
\n (Logical) newline character.
\N Any character but newline. Experimental. Not in [].
- \N{} Named or numbered (Unicode) character.
+ \N{} Named or numbered (Unicode) character or sequence.
\o{} Octal escape sequence.
\p{}, \pP Character with the given Unicode property.
\P{}, \PP Character without the given Unicode property.
@@ -165,14 +165,15 @@ Mnemonic: I<c>ontrol character.
$str =~ /\cK/; # Matches if $str contains a vertical tab (control-K).
-=head3 Named or numbered characters
+=head3 Named or numbered characters and character sequences
Unicode characters have a Unicode name and numeric ordinal value. Use the
C<\N{}> construct to specify a character by either of these values.
+Certain sequences of characters also have names.
-To specify by name, the name of the character goes between the curly braces.
-In this case, you have to C<use charnames> to load the Unicode names of the
-characters, otherwise Perl will complain.
+To specify by name, the name of the character or character sequence goes
+between the curly braces. In this case, you have to C<use charnames> to
+load the Unicode names of the characters, otherwise Perl will complain.
To specify a character by Unicode code point, use the form
C<\N{U+I<wide hex character>}>, where I<wide hex character> is a number in
@@ -183,8 +184,8 @@ C<LATIN CAPITAL LETTER A>, and you will rarely see it written without the two
leading zeros. C<\N{U+0041}> means "A" even on EBCDIC machines (where the
ordinal value of "A" is not 0x41).
-It is even possible to give your own names to characters, and even to short
-sequences of characters. For details, see L<charnames>.
+It is even possible to give your own names to characters and character
+sequences. For details, see L<charnames>.
(There is an expanded internal form that you may see in debug output:
C<\N{U+I<wide hex character>.I<wide hex character>...}>.
@@ -194,9 +195,9 @@ form only, subject to change, and you should not try to use it yourself.)
Mnemonic: I<N>amed character.
-Note that a character that is expressed as a named or numbered character is
-considered as a character without special meaning by the regex engine, and will
-match "as is".
+Note that a character or character sequence that is expressed as a named
+or numbered character is considered as a character without special
+meaning by the regex engine, and will match "as is".
=head4 Example
@@ -572,7 +573,8 @@ identical to the C<.> metasymbol, except under the C</s> flag, which changes
the meaning of C<.>, but not C<\N>.
Note that C<\N{...}> can mean a
-L<named or numbered character|/Named or numbered characters>.
+L<named or numbered character
+|/Named or numbered characters and character sequences>.
Mnemonic: Complement of I<\n>.