author | Karl Williamson <public@khwilliamson.com> | 2010-09-12 21:33:12 -0600 |
---|---|---|
committer | Father Chrysostomos <sprout@cpan.org> | 2010-09-25 00:47:02 -0700 |
commit | fb121860c2407cd1d1566d63a95a5220fa93d8e4 (patch) | |
tree | cc61893dd3ffe9966e079addeaa538172e2290e9 /pod | |
parent | 8ebef31d4feab4b7c35ff0eb427632a67b1abdd9 (diff) | |
download | perl-fb121860c2407cd1d1566d63a95a5220fa93d8e4.tar.gz |
Teach Perl about Unicode named character sequences
mktables is changed to process the Unicode named sequence file.
charnames.pm is changed to cache the looked-up values in utf8. A new
function, string_vianame(), is created that can handle named sequences, as
the interface for vianame() cannot. The subroutine lookup_name() is
slightly refactored to do almost all of the common work for \N{} and the
vianame routines. It now understands named sequences as created by
mktables.
Tests and documentation are added. In the randomized testing section,
half use vianame() and half string_vianame().
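
As a rough illustration of the difference described above, the following sketch contrasts the two run-time lookups. It assumes a Perl that already ships `charnames::string_vianame()` (this commit or later). The name `KATAKANA LETTER AINU P` is a named sequence from Unicode's NamedSequences.txt, picked here only as an example; it is not taken from the commit itself.

```perl
use strict;
use warnings;
use charnames ':full';

# vianame() returns one code point, so it fits single-character names only.
my $cp = charnames::vianame('LATIN CAPITAL LETTER A');
printf "vianame:        U+%04X\n", $cp;

# string_vianame() returns a string, so the result may hold several
# code points, as it does for a named sequence.
my $seq = charnames::string_vianame('KATAKANA LETTER AINU P');
my @cps = map { sprintf('U+%04X', ord($_)) } split //, $seq;
printf "string_vianame: %s\n", join(' ', @cps);
```

Returning a string rather than a number is what lets the new function cover both cases: a single character is simply a string of length one.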
Diffstat (limited to 'pod')
-rw-r--r-- | pod/perldelta.pod | 15 |
-rw-r--r-- | pod/perlop.pod | 4 |
-rw-r--r-- | pod/perlre.pod | 8 |
-rw-r--r-- | pod/perlrebackslash.pod | 24 |
-rw-r--r-- | pod/perlreref.pod | 2 |
-rw-r--r-- | pod/perluniintro.pod | 2 |
6 files changed, 36 insertions, 19 deletions
diff --git a/pod/perldelta.pod b/pod/perldelta.pod
index abbca524c3..b21f2534b3 100644
--- a/pod/perldelta.pod
+++ b/pod/perldelta.pod
@@ -53,6 +53,21 @@ The C<"d"> modifier is used in the scope of C<use locale> to compile the
 regular expression as if it were not in that scope.
 See L<perlre/(?dlupimsx-imsx)>.
+=head2 C<\N{...}> now handles Unicode named character sequences
+
+Unicode has a number of named character sequences, in which particular sequences
+of code points are given names. C<\N{...}> now recognizes these.
+See L<charnames>.
+
+=head2 New function C<charnames::string_vianame()>
+
+This function is a run-time version of C<\N{...}>, returning the string
+of characters whose Unicode name is its parameter. It can handle
+Unicode named character sequences, whereas the pre-existing
+C<charnames::vianame()> cannot, as the latter returns a single code
+point.
+See L<charnames>.
+
 =head1 Security
 XXX Any security-related notices go here. In particular, any security
diff --git a/pod/perlop.pod b/pod/perlop.pod
index dc5118c300..d5ca94262a 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -1029,7 +1029,7 @@ X<\o{}>
  \e escape (ESC)
  \x{263a} [1,8] hex char (example: SMILEY)
  \x1b [2,8] restricted range hex char (example: ESC)
- \N{name} [3] named Unicode character
+ \N{name} [3] named Unicode character or character sequence
  \N{U+263D} [4,8] Unicode character (example: FIRST QUARTER MOON)
  \c[ [5] control char (example: chr(27))
  \o{23072} [6,8] octal char (example: SMILEY)
@@ -1073,7 +1073,7 @@ For example:
 =item [3]
-The result is the Unicode character given by I<name>.
+The result is the Unicode character or character sequence given by I<name>.
 See L<charnames>.
 =item [4]
diff --git a/pod/perlre.pod b/pod/perlre.pod
index b9216c156c..88089ee1d7 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -231,7 +231,7 @@ also work:
  \e escape (think troff) (ESC)
  \cK control char (example: VT)
  \x{}, \x00 character whose ordinal is the given hexadecimal number
- \N{name} named Unicode character
+ \N{name} named Unicode character or character sequence
  \N{U+263D} Unicode character (example: FIRST QUARTER MOON)
  \o{}, \000 character whose ordinal is the given octal number
  \l lowercase next char (think vi)
@@ -316,9 +316,9 @@ See L</Extended Patterns> below for details.
 =item [7]
 Note that C<\N> has two meanings. When of the form C<\N{NAME}>, it matches the
-character whose name is C<NAME>; and similarly when of the form
-C<\N{U+I<wide hex char>}>, it matches the character whose Unicode ordinal is
-I<wide hex char>. Otherwise it matches any character but C<\n>.
+character or character sequence whose name is C<NAME>; and similarly
+when of the form C<\N{U+I<hex>}>, it matches the character whose Unicode
+code point is I<hex>. Otherwise it matches any character but C<\n>.
 =back
diff --git a/pod/perlrebackslash.pod b/pod/perlrebackslash.pod
index eb51d94305..a9257c7d82 100644
--- a/pod/perlrebackslash.pod
+++ b/pod/perlrebackslash.pod
@@ -85,7 +85,7 @@ as C<Not in [].>
  \L Lowercase till \E. Not in [].
  \n (Logical) newline character.
  \N Any character but newline. Experimental. Not in [].
- \N{} Named or numbered (Unicode) character.
+ \N{} Named or numbered (Unicode) character or sequence.
  \o{} Octal escape sequence.
  \p{}, \pP Character with the given Unicode property.
  \P{}, \PP Character without the given Unicode property.
@@ -165,14 +165,15 @@ Mnemonic: I<c>ontrol character.
  $str =~ /\cK/; # Matches if $str contains a vertical tab (control-K).
-=head3 Named or numbered characters
+=head3 Named or numbered characters and character sequences
 Unicode characters have a Unicode name and numeric ordinal value.
 Use the C<\N{}> construct to specify a character by either of these values.
+Certain sequences of characters also have names.
-To specify by name, the name of the character goes between the curly braces.
-In this case, you have to C<use charnames> to load the Unicode names of the
-characters, otherwise Perl will complain.
+To specify by name, the name of the character or character sequence goes
+between the curly braces. In this case, you have to C<use charnames> to
+load the Unicode names of the characters, otherwise Perl will complain.
 To specify a character by Unicode code point, use the form
 C<\N{U+I<wide hex character>}>, where I<wide hex character> is a number in
@@ -183,8 +184,8 @@ C<LATIN CAPITAL LETTER A>, and you will rarely see it written without the
 two leading zeros. C<\N{U+0041}> means "A" even on EBCDIC machines (where
 the ordinal value of "A" is not 0x41).
-It is even possible to give your own names to characters, and even to short
-sequences of characters. For details, see L<charnames>.
+It is even possible to give your own names to characters and character
+sequences. For details, see L<charnames>.
 (There is an expanded internal form that you may see in debug output:
 C<\N{U+I<wide hex character>.I<wide hex character>...}>.
@@ -194,9 +195,9 @@ form only, subject to change, and you should not try to use it yourself.)
 Mnemonic: I<N>amed character.
-Note that a character that is expressed as a named or numbered character is
-considered as a character without special meaning by the regex engine, and will
-match "as is".
+Note that a character or character sequence that is expressed as a named
+or numbered character is considered as a character without special
+meaning by the regex engine, and will match "as is".
 =head4 Example
@@ -572,7 +573,8 @@ identical to the C<.> metasymbol, except under the C</s> flag, which changes the
 meaning of C<.>, but not C<\N>.
 Note that C<\N{...}> can mean a
-L<named or numbered character|/Named or numbered characters>.
+L<named or numbered character
+|/Named or numbered characters and character sequences>.
 Mnemonic: Complement of I<\n>.
diff --git a/pod/perlreref.pod b/pod/perlreref.pod
index 01d57cc4e6..6e028ee9c3 100644
--- a/pod/perlreref.pod
+++ b/pod/perlreref.pod
@@ -94,7 +94,7 @@ These work as in normal strings.
  \x7f Char whose ordinal is the 2 hex digits, max \xFF
  \x{263a} Char whose ordinal is the hex number, unrestricted
  \cx Control-x
- \N{name} A named Unicode character
+ \N{name} A named Unicode character or character sequence
  \N{U+263D} A Unicode character by hex ordinal
  \l Lowercase next character
diff --git a/pod/perluniintro.pod b/pod/perluniintro.pod
index 54ce2f0a1c..f0b2be5a40 100644
--- a/pod/perluniintro.pod
+++ b/pod/perluniintro.pod
@@ -248,7 +248,7 @@ characters:
 Note that both C<\x{...}> and C<\N{...}> are compile-time string
 constants: you cannot use variables in them. if you want similar
-run-time functionality, use C<chr()> and C<charnames::vianame()>.
+run-time functionality, use C<chr()> and C<charnames::string_vianame()>.
 If you want to force the result to Unicode characters, use the special
 C<"U0"> prefix. It consumes no arguments but causes the following bytes
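
The documentation changes above say that `\N{...}` now accepts a named character sequence both in strings and in patterns. A minimal sketch of that compile-time usage, assuming a Perl that includes this change and again borrowing `KATAKANA LETTER AINU P` from NamedSequences.txt purely as an illustrative name:

```perl
use strict;
use warnings;
use charnames ':full';

# In a double-quoted string, the name expands to the whole sequence,
# so the resulting string is longer than the single \N{} suggests.
my $str = "<\N{KATAKANA LETTER AINU P}>";
printf "length: %d\n", length $str;

# In a pattern (outside a bracketed character class), the sequence is
# matched literally, without special meaning to the regex engine.
my $matched = $str =~ /<\N{KATAKANA LETTER AINU P}>/ ? 1 : 0;
print "matched: $matched\n";
```

Because `\N{...}` is resolved at compile time, the name has to be a literal; for names held in variables, `charnames::string_vianame()` is the run-time counterpart described in the perldelta entry.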