Diffstat (limited to 'pod')
-rw-r--r--  pod/perl.pod       1
-rw-r--r--  pod/perlfunc.pod  33
-rw-r--r--  pod/perlhist.pod   1
-rw-r--r--  pod/perlop.pod    20
-rw-r--r--  pod/perlre.pod     5
5 files changed, 42 insertions, 18 deletions
diff --git a/pod/perl.pod b/pod/perl.pod
index 0b9e9fa680..4895bb2711 100644
--- a/pod/perl.pod
+++ b/pod/perl.pod
@@ -254,7 +254,6 @@ Perl developers, please write to <F<perl-thanks@perl.org>>.

 =head1 FILES

- "/tmp/perl-e$$"   temporary file for -e commands
  "@INC"            locations of perl libraries

 =head1 SEE ALSO
diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index ec80259d4b..9cab569aa6 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -555,7 +555,9 @@ restrictions may be relaxed, but this is not a portable assumption.
 =item chr

 Returns the character represented by that NUMBER in the character set.
-For example, C<chr(65)> is C<"A"> in ASCII. For the reverse, use L</ord>.
+For example, C<chr(65)> is C<"A"> in either ASCII or Unicode, and
+chr(0x263a) is a Unicode smiley face (but only within the scope of a
+C<use utf8>). For the reverse, use L</ord>.

 If NUMBER is omitted, uses C<$_>.

@@ -1945,7 +1947,7 @@ C<redo> work.

 Returns an lowercased version of EXPR. This is the internal function
 implementing the C<\L> escape in double-quoted strings.
-Respects current C<LC_CTYPE> locale if C<use locale> in force. See L<perllocale>.
+Respects current LC_CTYPE locale if C<use locale> in force. See L<perllocale>.

 If EXPR is omitted, uses C<$_>.

@@ -1955,7 +1957,7 @@ If EXPR is omitted, uses C<$_>.

 Returns the value of EXPR with the first character lowercased. This is
 the internal function implementing the C<\l> escape in double-quoted strings.
-Respects current C<LC_CTYPE> locale if C<use locale> in force. See L<perllocale>.
+Respects current LC_CTYPE locale if C<use locale> in force. See L<perllocale>.

 If EXPR is omitted, uses C<$_>.

@@ -1963,7 +1965,7 @@ If EXPR is omitted, uses C<$_>.

 =item length

-Returns the length in bytes of the value of EXPR. If EXPR is
+Returns the length in characters of the value of EXPR. If EXPR is
 omitted, returns length of C<$_>.

 =item link OLDFILE,NEWFILE
@@ -2382,7 +2384,7 @@ DIRHANDLEs have their own namespace separate from FILEHANDLEs.

 =item ord

-Returns the numeric ascii value of the first character of EXPR. If
+Returns the numeric (ASCII or Unicode) value of the first character of EXPR. If
 EXPR is omitted, uses C<$_>. For the reverse, see L</chr>.

 =item pack TEMPLATE,LIST
@@ -2400,7 +2402,7 @@ follows:
     H   A hex string (high nybble first).

     c   A signed char value.
-    C   An unsigned char value.
+    C   An unsigned char value. Only does bytes. See U for Unicode.

     s   A signed short value.
     S   An unsigned short value.
@@ -2433,6 +2435,8 @@ follows:
     P   A pointer to a structure (fixed-length string).

     u   A uuencoded string.
+    U   A Unicode character number. Encodes to UTF-8 internally.
+        Works even if C<use utf8> is not in effect.

     w   A BER compressed integer. Its bytes represent an unsigned
         integer in base 128, most significant digit first, with as
@@ -2470,10 +2474,12 @@ C<unpack("f", pack("f", $foo)>) will not in general equal C<$foo>).

 Examples:

-    $foo = pack("cccc",65,66,67,68);
+    $foo = pack("CCCC",65,66,67,68);
     # foo eq "ABCD"
-    $foo = pack("c4",65,66,67,68);
+    $foo = pack("C4",65,66,67,68);
     # same thing
+    $foo = pack("U4",0x24b6,0x24b7,0x24b8,0x24b9);
+    # same thing with Unicode circled letters

     $foo = pack("ccxxcc",65,66,67,68);
     # foo eq "AB\0\0CD"
@@ -2905,13 +2911,13 @@ will automatically return the value of the last expression evaluated.)

 In list context, returns a list value consisting of the elements
 of LIST in the opposite order. In scalar context, concatenates the
-elements of LIST, and returns a string value consisting of those bytes,
-but in the opposite order.
+elements of LIST, and returns a string value with all the characters
+in the opposite order.

     print reverse <>;           # line tac, last line first

     undef $/;                   # for efficiency of <>
-    print scalar reverse <>;    # byte tac, last line tsrif
+    print scalar reverse <>;    # character tac, last line tsrif

 This operator is also handy for inverting a hash, although there are some
 caveats.
 If a value is duplicated in the original hash, only one of those
@@ -4070,6 +4076,8 @@ otherwise.
 Returns an uppercased version of EXPR. This is the internal function
 implementing the C<\U> escape in double-quoted strings.
 Respects current LC_CTYPE locale if C<use locale> in force. See L<perllocale>.
+Under Unicode (C<use utf8>) it uses the standard Unicode uppercase mappings. (It
+does not attempt to do titlecase mapping on initial letters. See C<ucfirst()> for that.)

 If EXPR is omitted, uses C<$_>.

@@ -4077,7 +4085,8 @@ If EXPR is omitted, uses C<$_>.

 =item ucfirst

-Returns the value of EXPR with the first character uppercased. This is
+Returns the value of EXPR with the first character
+in uppercase (titlecase in Unicode). This is
 the internal function implementing the C<\u> escape in double-quoted strings.
 Respects current LC_CTYPE locale if C<use locale> in force. See L<perllocale>.

diff --git a/pod/perlhist.pod b/pod/perlhist.pod
index 9ed8b6f52e..95a354fd51 100644
--- a/pod/perlhist.pod
+++ b/pod/perlhist.pod
@@ -302,6 +302,7 @@ the strings?).
  Graham     5.005_03    1998-

  Sarathy    5.005_50    1998-Jul-26    The 5.006 development track.
+            5.005_51    1998-Aug-10

 =head2 SELECTED RELEASE SIZES

diff --git a/pod/perlop.pod b/pod/perlop.pod
index c7209fac28..35f9e5f4f8 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -636,7 +636,7 @@ next line. This allows you to write:

 For constructs that do interpolation, variables beginning with "C<$>"
 or "C<@>" are interpolated, as are the following sequences. Within
-a transliteration, the first ten of these sequences may be used.
+a transliteration, the first eleven of these sequences may be used.

     \t          tab             (HT, TAB)
     \n          newline         (NL)
@@ -645,8 +645,9 @@ a transliteration, the first ten of these sequences may be used.
     \b          backspace       (BS)
     \a          alarm (bell)    (BEL)
     \e          escape          (ESC)
-    \033        octal char
-    \x1b        hex char
+    \033        octal char      (ESC)
+    \x1b        hex char        (ESC)
+    \x{263a}    wide hex char   (SMILEY)
     \c[         control char

     \l          lowercase next char
@@ -1138,9 +1139,9 @@ to occur. Here are two common cases:

     1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;

-=item tr/SEARCHLIST/REPLACEMENTLIST/cds
+=item tr/SEARCHLIST/REPLACEMENTLIST/cdsUC

-=item y/SEARCHLIST/REPLACEMENTLIST/cds
+=item y/SEARCHLIST/REPLACEMENTLIST/cdsUC

 Transliterates all occurrences of the characters found in the search list
 with the corresponding character in the replacement list. It returns
@@ -1160,6 +1161,8 @@ Options:
     c   Complement the SEARCHLIST.
     d   Delete found but unreplaced characters.
     s   Squash duplicate replaced characters.
+    U   Translate to/from UTF-8.
+    C   Translate to/from 8-bit char (octet).

 If the C</c> modifier is specified, the SEARCHLIST character set is
 complemented. If the C</d> modifier is specified, any characters specified
@@ -1177,6 +1180,10 @@ enough. If the REPLACEMENTLIST is empty, the SEARCHLIST is replicated.
 This latter is useful for counting characters in a class or for
 squashing character sequences in a class.

+The first C</U> or C</C> modifier applies to the left side of the translation.
+The second one applies to the right side. If present, these modifiers override
+the current utf8 state.
+
 Examples:

     $ARGV[1] =~ tr/A-Z/a-z/;    # canonicalize to lower case
@@ -1196,6 +1203,9 @@ Examples:

     tr [\200-\377] [\000-\177]; # delete 8th bit

+    tr/\0-\xFF//CU;             # translate Latin-1 to Unicode
+    tr/\0-\x{FF}//UC;           # translate Unicode to Latin-1
+
 If multiple transliterations are given for a character, only the first one is used:

     tr/AAA/XYZ/
diff --git a/pod/perlre.pod b/pod/perlre.pod
index 382ba65242..1b49ba4e7b 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -142,6 +142,7 @@ also work:
     \e          escape (think troff)  (ESC)
     \033        octal char (think of a PDP-11)
     \x1B        hex char
+    \x{263a}    wide hex char         (Unicode SMILEY)
     \c[         control char
     \l          lowercase next char (think vi)
     \u          uppercase next char (think vi)
@@ -166,6 +167,10 @@ In addition, Perl defines the following:
     \S  Match a non-whitespace character
     \d  Match a digit character
     \D  Match a non-digit character
+    \pP Match P, named property. Use \p{Prop} for longer names.
+    \PP Match non-P
+    \X  Match eXtended Unicode "combining character sequence", \pM\pm*
+    \C  Match a single C char (octet) even under utf8.

 A C<\w> matches a single alphanumeric character, not a whole word.
 To match a word you'd need to say C<\w+>. If C<use locale> is in
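
Not part of the patch: a minimal sketch of the character semantics that the perlfunc hunks above document for chr, ord, length, pack "U" and scalar reverse. It assumes a current Perl (5.8 or later), where these functions handle code points above 255 without C<use utf8>; the 5.005_51 behaviour described in the patch may differ. The variable names are illustrative only, and the tr///UC modifiers are left out because they are specific to this development snapshot.

```perl
use strict;
use warnings;

# chr/ord round trip for a code point above 255 (the smiley from the patch).
my $smiley = chr(0x263a);
printf "ord:    0x%04x\n", ord($smiley);      # ord:    0x263a
printf "length: %d\n",     length($smiley);   # length: 1 (characters, not bytes)

# "\x{263a}" is the wide hex escape listed in the perlop/perlre hunks.
print "escape matches chr\n" if "\x{263a}" eq $smiley;

# pack "U4" builds a four-character string from four code points.
my $circled = pack("U4", 0x24b6, 0x24b7, 0x24b8, 0x24b9);
printf "circled length: %d\n", length($circled);   # circled length: 4

# Scalar reverse works on characters, so whole code points swap places.
my @reversed = map { sprintf("0x%04x", ord($_)) }
               split //, scalar reverse($circled);
print "reversed: @reversed\n";   # reversed: 0x24b9 0x24b8 0x24b7 0x24b6
```

The sketch prints code points as hex numbers rather than the characters themselves, which avoids "Wide character" warnings on an output handle that has no encoding layer.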