Diffstat (limited to 'pod')
-rw-r--r-- pod/perl.pod      1
-rw-r--r-- pod/perlfunc.pod 33
-rw-r--r-- pod/perlhist.pod  1
-rw-r--r-- pod/perlop.pod   20
-rw-r--r-- pod/perlre.pod    5
5 files changed, 42 insertions, 18 deletions
diff --git a/pod/perl.pod b/pod/perl.pod
index 0b9e9fa680..4895bb2711 100644
--- a/pod/perl.pod
+++ b/pod/perl.pod
@@ -254,7 +254,6 @@ Perl developers, please write to <F<perl-thanks@perl.org>>.
=head1 FILES
- "/tmp/perl-e$$" temporary file for -e commands
"@INC" locations of perl libraries
=head1 SEE ALSO
diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index ec80259d4b..9cab569aa6 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -555,7 +555,9 @@ restrictions may be relaxed, but this is not a portable assumption.
=item chr
Returns the character represented by that NUMBER in the character set.
-For example, C<chr(65)> is C<"A"> in ASCII. For the reverse, use L</ord>.
+For example, C<chr(65)> is C<"A"> in either ASCII or Unicode, and
+chr(0x263a) is a Unicode smiley face (but only within the scope of a
+C<use utf8>). For the reverse, use L</ord>.
If NUMBER is omitted, uses C<$_>.
@@ -1945,7 +1947,7 @@ C<redo> work.
Returns an lowercased version of EXPR. This is the internal function
implementing the C<\L> escape in double-quoted strings.
-Respects current C<LC_CTYPE> locale if C<use locale> in force. See L<perllocale>.
+Respects current LC_CTYPE locale if C<use locale> in force. See L<perllocale>.
If EXPR is omitted, uses C<$_>.
@@ -1955,7 +1957,7 @@ If EXPR is omitted, uses C<$_>.
Returns the value of EXPR with the first character lowercased. This is
the internal function implementing the C<\l> escape in double-quoted strings.
-Respects current C<LC_CTYPE> locale if C<use locale> in force. See L<perllocale>.
+Respects current LC_CTYPE locale if C<use locale> in force. See L<perllocale>.
If EXPR is omitted, uses C<$_>.
@@ -1963,7 +1965,7 @@ If EXPR is omitted, uses C<$_>.
=item length
-Returns the length in bytes of the value of EXPR. If EXPR is
+Returns the length in characters of the value of EXPR. If EXPR is
omitted, returns length of C<$_>.
=item link OLDFILE,NEWFILE
@@ -2382,7 +2384,7 @@ DIRHANDLEs have their own namespace separate from FILEHANDLEs.
=item ord
-Returns the numeric ascii value of the first character of EXPR. If
+Returns the numeric (ASCII or Unicode) value of the first character of EXPR. If
EXPR is omitted, uses C<$_>. For the reverse, see L</chr>.
=item pack TEMPLATE,LIST
@@ -2400,7 +2402,7 @@ follows:
H A hex string (high nybble first).
c A signed char value.
- C An unsigned char value.
+ C An unsigned char value. Only does bytes. See U for Unicode.
s A signed short value.
S An unsigned short value.
@@ -2433,6 +2435,8 @@ follows:
P A pointer to a structure (fixed-length string).
u A uuencoded string.
+ U A Unicode character number. Encodes to UTF-8 internally.
+ Works even if C<use utf8> is not in effect.
w A BER compressed integer. Its bytes represent an unsigned
integer in base 128, most significant digit first, with as
@@ -2470,10 +2474,12 @@ C<unpack("f", pack("f", $foo)>) will not in general equal C<$foo>).
Examples:
- $foo = pack("cccc",65,66,67,68);
+ $foo = pack("CCCC",65,66,67,68);
# foo eq "ABCD"
- $foo = pack("c4",65,66,67,68);
+ $foo = pack("C4",65,66,67,68);
# same thing
+ $foo = pack("U4",0x24b6,0x24b7,0x24b8,0x24b9);
+ # same thing with Unicode circled letters
$foo = pack("ccxxcc",65,66,67,68);
# foo eq "AB\0\0CD"
@@ -2905,13 +2911,13 @@ will automatically return the value of the last expression evaluated.)
In list context, returns a list value consisting of the elements
of LIST in the opposite order. In scalar context, concatenates the
-elements of LIST, and returns a string value consisting of those bytes,
-but in the opposite order.
+elements of LIST, and returns a string value with all the characters
+in the opposite order.
print reverse <>; # line tac, last line first
undef $/; # for efficiency of <>
- print scalar reverse <>; # byte tac, last line tsrif
+ print scalar reverse <>; # character tac, last line tsrif
This operator is also handy for inverting a hash, although there are some
caveats. If a value is duplicated in the original hash, only one of those
@@ -4070,6 +4076,8 @@ otherwise.
Returns an uppercased version of EXPR. This is the internal function
implementing the C<\U> escape in double-quoted strings.
Respects current LC_CTYPE locale if C<use locale> in force. See L<perllocale>.
+Under Unicode (C<use utf8>) it uses the standard Unicode uppercase mappings. (It
+does not attempt to do titlecase mapping on initial letters. See C<ucfirst()> for that.)
If EXPR is omitted, uses C<$_>.
@@ -4077,7 +4085,8 @@ If EXPR is omitted, uses C<$_>.
=item ucfirst
-Returns the value of EXPR with the first character uppercased. This is
+Returns the value of EXPR with the first character
+in uppercase (titlecase in Unicode). This is
the internal function implementing the C<\u> escape in double-quoted strings.
Respects current LC_CTYPE locale if C<use locale> in force. See L<perllocale>.
diff --git a/pod/perlhist.pod b/pod/perlhist.pod
index 9ed8b6f52e..95a354fd51 100644
--- a/pod/perlhist.pod
+++ b/pod/perlhist.pod
@@ -302,6 +302,7 @@ the strings?).
Graham 5.005_03 1998-
Sarathy 5.005_50 1998-Jul-26 The 5.006 development track.
+ 5.005_51 1998-Aug-10
=head2 SELECTED RELEASE SIZES
diff --git a/pod/perlop.pod b/pod/perlop.pod
index c7209fac28..35f9e5f4f8 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -636,7 +636,7 @@ next line. This allows you to write:
For constructs that do interpolation, variables beginning with "C<$>"
or "C<@>" are interpolated, as are the following sequences. Within
-a transliteration, the first ten of these sequences may be used.
+a transliteration, the first eleven of these sequences may be used.
\t tab (HT, TAB)
\n newline (NL)
@@ -645,8 +645,9 @@ a transliteration, the first ten of these sequences may be used.
\b backspace (BS)
\a alarm (bell) (BEL)
\e escape (ESC)
- \033 octal char
- \x1b hex char
+ \033 octal char (ESC)
+ \x1b hex char (ESC)
+ \x{263a} wide hex char (SMILEY)
\c[ control char
\l lowercase next char
@@ -1138,9 +1139,9 @@ to occur. Here are two common cases:
1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
-=item tr/SEARCHLIST/REPLACEMENTLIST/cds
+=item tr/SEARCHLIST/REPLACEMENTLIST/cdsUC
-=item y/SEARCHLIST/REPLACEMENTLIST/cds
+=item y/SEARCHLIST/REPLACEMENTLIST/cdsUC
Transliterates all occurrences of the characters found in the search list
with the corresponding character in the replacement list. It returns
@@ -1160,6 +1161,8 @@ Options:
c Complement the SEARCHLIST.
d Delete found but unreplaced characters.
s Squash duplicate replaced characters.
+ U Translate to/from UTF-8.
+ C Translate to/from 8-bit char (octet).
If the C</c> modifier is specified, the SEARCHLIST character set is
complemented. If the C</d> modifier is specified, any characters specified
@@ -1177,6 +1180,10 @@ enough. If the REPLACEMENTLIST is empty, the SEARCHLIST is replicated.
This latter is useful for counting characters in a class or for
squashing character sequences in a class.
+The first C</U> or C</C> modifier applies to the left side of the translation.
+The second one applies to the right side. If present, these modifiers override
+the current utf8 state.
+
Examples:
$ARGV[1] =~ tr/A-Z/a-z/; # canonicalize to lower case
@@ -1196,6 +1203,9 @@ Examples:
tr [\200-\377]
[\000-\177]; # delete 8th bit
+ tr/\0-\xFF//CU; # translate Latin-1 to Unicode
+ tr/\0-\x{FF}//UC; # translate Unicode to Latin-1
+
If multiple transliterations are given for a character, only the first one is used:
tr/AAA/XYZ/
diff --git a/pod/perlre.pod b/pod/perlre.pod
index 382ba65242..1b49ba4e7b 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -142,6 +142,7 @@ also work:
\e escape (think troff) (ESC)
\033 octal char (think of a PDP-11)
\x1B hex char
+ \x{263a} wide hex char (Unicode SMILEY)
\c[ control char
\l lowercase next char (think vi)
\u uppercase next char (think vi)
@@ -166,6 +167,10 @@ In addition, Perl defines the following:
\S Match a non-whitespace character
\d Match a digit character
\D Match a non-digit character
+ \pP Match P, named property. Use \p{Prop} for longer names.
+ \PP Match non-P
+ \X Match eXtended Unicode "combining character sequence", \pM\pm*
+ \C Match a single C char (octet) even under utf8.
A C<\w> matches a single alphanumeric character, not a whole
word. To match a word you'd need to say C<\w+>. If C<use locale> is in