path: root/pod/perlop.pod
author Karl Williamson <khw@cpan.org> 2019-02-18 17:57:11 -0700
committer Karl Williamson <khw@cpan.org> 2019-03-04 11:10:48 -0700
commit 966b4e4752e107a969ce19fbdbdb819547d41137
tree 111ea3e4ab1a950210a3d8cb710cd19d61020a18 /pod/perlop.pod
parent 0a142f463c08e1bf0466cee9a0f896e3d11e7dbf
perlop: Improve documentation for (mostly) tr///
This adds examples and clarifications
Diffstat (limited to 'pod/perlop.pod')
-rw-r--r-- pod/perlop.pod 159
1 files changed, 104 insertions, 55 deletions
diff --git a/pod/perlop.pod b/pod/perlop.pod
index af695b678f..dd658bf5fb 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -2211,6 +2211,10 @@ Examples:
s/([^ ]*) *([^ ]*)/$2 $1/; # reverse 1st two fields
+ $foo !~ s/A/a/g; # Lowercase all A's in $foo; return
+ # 0 if any were found and changed;
+ # otherwise return 1
+
Note the use of C<$> instead of C<\> in the field-reversing example. Unlike
B<sed>, we use the \<I<digit>> form only in the left hand side.
Anywhere else it's $<I<digit>>.
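To make the return value of the new C<$foo !~ s/A/a/g> example concrete, here is a small sketch (the strings C<"ABBA"> and C<"xyz"> are just sample data). C<!~> yields the logical negation of what C<s///g> returns, so it is false when at least one substitution happened and true when none did.

```perl
use strict;
use warnings;

# $foo !~ s/A/a/g returns the logical negation of the substitution count.
my $foo = "ABBA";
my $nothing_changed = ($foo !~ s/A/a/g);    # two A's replaced: false
print "foo is now '$foo'\n";                # foo is now 'aBBa'
print "nothing changed: ", ($nothing_changed ? "yes" : "no"), "\n";

my $bar = "xyz";
my $bar_untouched = ($bar !~ s/A/a/g);      # no A's to replace: true
print "bar untouched: ", ($bar_untouched ? "yes" : "no"), "\n";
```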
@@ -2405,10 +2409,14 @@ X<tr> X<y> X<transliterate> X</c> X</d> X</s>
=item C<y/I<SEARCHLIST>/I<REPLACEMENTLIST>/cdsr>
-Transliterates all occurrences of the characters found in the search list
-with the corresponding character in the replacement list. It returns
-the number of characters replaced or deleted. If no string is
-specified via the C<=~> or C<!~> operator, the C<$_> string is transliterated.
+Transliterates all occurrences of the characters found (or not found
+if the C</c> modifier is specified) in the search list with the
+positionally corresponding character in the replacement list, possibly
+deleting some, depending on the modifiers specified. It returns the
+number of characters replaced or deleted. If no string is specified via
+the C<=~> or C<!~> operator, the C<$_> string is transliterated.
+
+For B<sed> devotees, C<y> is provided as a synonym for C<tr>.
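A minimal sketch of the basics just described, using made-up sample strings: C<tr> modifies its target in place and returns how many characters it matched, falls back to C<$_> when no C<=~> is given, and C<y> behaves identically.

```perl
use strict;
use warnings;

my $s = "hello world";
my $n = ($s =~ tr/lo/01/);       # l -> 0, o -> 1, in place
print "$s ($n changed)\n";       # he001 w1r0d (5 changed)

$_ = "banana";
my $count = tr/a//;              # no =~, so $_ is used; just counts the a's
print "$count\n";                # 3

(my $t = "abc") =~ y/abc/xyz/;   # y is a synonym for tr
print "$t\n";                    # xyz
```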
If the C</r> (non-destructive) option is present, a new copy of the string
is made and its characters transliterated, and this copy is returned no
@@ -2428,20 +2436,18 @@ Otherwise, a character range may be specified with a hyphen, so
C<tr/A-J/0-9/> does the same replacement as
C<tr/ACEGIBDFHJ/0246813579/>.
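The claimed equivalence is easy to check on sample input. This sketch applies both forms to copies of the same string; because the spelled-out lists pair up the same characters (A with 0, C with 2, B with 1, and so on), the results match.

```perl
use strict;
use warnings;

my $s = "JIHGFEDCBA";
(my $x = $s) =~ tr/A-J/0-9/;                  # range form
(my $y = $s) =~ tr/ACEGIBDFHJ/0246813579/;    # same pairs, listed out of order
print "$x $y\n";                              # 9876543210 9876543210
```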
-For B<sed> devotees, C<y> is provided as a synonym for C<tr>.
-
If the I<SEARCHLIST> is delimited by bracketing quotes, the
I<REPLACEMENTLIST> must have its own pair of quotes, which may or may
not be bracketing quotes; for example, C<tr[aeiouy][yuoiea]> or
C<tr(+\-*/)/ABCD/>.
-Characters may be literals or (if the delimiters aren't single quotes)
+Characters may be literals, or (if the delimiters aren't single quotes)
any of the escape sequences accepted in double-quoted strings. But
there is never any variable interpolation, so C<"$"> and C<"@"> are
-treated as literals. A hyphen at the beginning or end, or preceded by a
-backslash is considered a literal. Escape sequence details are in L<the
-table near the beginning of this section|/Quote and Quote-like
-Operators>.
+always treated as literals. A hyphen at the beginning or end, or
+preceded by a backslash, is also always considered a literal. Escape
+sequence details are in L<the table near the beginning of this
+section|/Quote and Quote-like Operators>.
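A short sketch of these literal rules, with invented sample strings: C<$> and C<@> are not interpolated, a trailing hyphen is not a range, and double-quote escapes such as C<\t> and C<\n> still work because the delimiters are not single quotes.

```perl
use strict;
use warnings;

# No interpolation: '$' and '@' in the lists are literal characters.
# The trailing '-' is a literal hyphen, not part of a range.
my $str = 'cost: $5 @ store-front';
(my $t = $str) =~ tr/$@-/SA_/;
print "$t\n";                         # cost: S5 A store_front

# Double-quote escapes are still recognized.
(my $u = "a\tb\nc") =~ tr/\t\n/  /;
print "$u\n";                         # a b c
```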
Note that C<tr> does B<not> do regular expression character classes such as
C<\d> or C<\pL>. The C<tr> operator is not equivalent to the C<L<tr(1)>>
@@ -2480,85 +2486,128 @@ range's end points are expressed as C<\N{...}>
removes from C<$string> all the platform's characters which are
equivalent to any of Unicode U+0020, U+0021, ... U+007D, U+007E. This
is a portable range, and has the same effect on every platform it is
-run on. It turns out that in this example, these are the ASCII
+run on. In this example, these are the ASCII
printable characters. So after this is run, C<$string> has only
controls and characters which have no ASCII equivalents.
But, even for portable ranges, it is not generally obvious what is
-included without having to look things up. A sound principle is to use
-only ranges that both begin from and end at either ASCII alphabetics of
-equal case (C<b-e>, C<B-E>), or digits (C<1-4>). Anything else is
-unclear (and unportable unless C<\N{...}> is used). If in doubt, spell
-out the character sets in full.
+included without having to look things up in the manual. A sound
+principle is to use only ranges that both begin from, and end at, either
+ASCII alphabetics of equal case (C<b-e>, C<B-E>), or digits (C<1-4>).
+Anything else is unclear (and unportable unless C<\N{...}> is used). If
+in doubt, spell out the character sets in full.
Options:
c Complement the SEARCHLIST.
d Delete found but unreplaced characters.
- s Squash duplicate replaced characters.
r Return the modified string and leave the original string
untouched.
+ s Squash duplicate replaced characters.
-If the C</c> modifier is specified, the I<SEARCHLIST> character set
-is complemented. So for example these two are equivalent (the exact
-maximum number will depend on your platform):
-
- tr/\x00-\xfd/ABCD/c
- tr/\xfe-\x{7fffffff}/ABCD/
+If the C</d> modifier is specified, any characters specified by
+I<SEARCHLIST> not found in I<REPLACEMENTLIST> are deleted. (Note that
+this is slightly more flexible than the behavior of some B<tr> programs,
+which delete anything they find in the I<SEARCHLIST>, period.)
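As a sketch of C</d> on sample data: characters in I<SEARCHLIST> with a partner in I<REPLACEMENTLIST> are replaced, the rest are deleted, and the return value counts both. Without C</d>, the final replacement character would be replicated instead.

```perl
use strict;
use warnings;

my $s = "abcdef";
my $n = ($s =~ tr/a-e/AB/d);        # a->A, b->B; c, d, e have no partner
print "$s $n\n";                    # ABf 5

(my $t = "abcdef") =~ tr/a-e/AB/;   # without /d, B is replicated instead
print "$t\n";                       # ABBBBf
```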
-If the C</d> modifier is specified, any characters
-specified by I<SEARCHLIST> not found in I<REPLACEMENTLIST> are deleted.
-(Note that this is slightly more flexible than the behavior of some
-B<tr> programs, which delete anything they find in the I<SEARCHLIST>,
-period.)
+If the C</s> modifier is specified, sequences of characters, all in a
+row, that were transliterated to the same character are squashed down to
+a single instance of that character.
-If the C</s> modifier is specified, runs of the same character in the
-result, where each those characters were substituted by the
-transliteration, are squashed down to a single instance of the character.
+ my $a = "aaaba";
+ $a =~ tr/a/a/s; # $a now is "aba"
If the C</d> modifier is used, the I<REPLACEMENTLIST> is always interpreted
exactly as specified. Otherwise, if the I<REPLACEMENTLIST> is shorter
-than the I<SEARCHLIST>, the final character is replicated till it is long
-enough. If the I<REPLACEMENTLIST> is empty, the I<SEARCHLIST> is replicated.
-This latter is useful for counting characters in a class or for
-squashing character sequences in a class. For example, each of these pairs
-are equivalent:
+than the I<SEARCHLIST>, the final character, if any, is replicated until
+it is long enough. There won't be a final character if and only if the
+I<REPLACEMENTLIST> is empty, in which case I<REPLACEMENTLIST> is
+copied from I<SEARCHLIST>. An empty I<REPLACEMENTLIST> is useful
+for counting characters in a class, or for squashing character sequences
+in a class. In each of these pairs, the two forms are equivalent:
tr/abcd// tr/abcd/abcd/
tr/abcd/AB/ tr/abcd/ABBB/
tr/abcd//d s/[abcd]//g
tr/abcd/AB/d (tr/ab/AB/ + s/[cd]//g) - but run together
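The pairs in this table can be checked side by side. The sketch below runs each form on a copy of one arbitrary sample string and notes the shared result; the last pair uses two separate statements, which matches the table's note that C<tr> does the same work in a single pass.

```perl
use strict;
use warnings;

my $orig = "abcdabcdxyz";

# tr/abcd// vs tr/abcd/abcd/: both leave the string alone and count matches
my ($p, $q) = ($orig, $orig);
my $n1 = ($p =~ tr/abcd//);
my $n2 = ($q =~ tr/abcd/abcd/);          # both 8; $p and $q unchanged

# tr/abcd/AB/ vs tr/abcd/ABBB/: the last replacement char is replicated
(my $r1 = $orig) =~ tr/abcd/AB/;
(my $r2 = $orig) =~ tr/abcd/ABBB/;       # both "ABBBABBBxyz"

# tr/abcd//d vs s/[abcd]//g: both delete every a, b, c and d
(my $d1 = $orig) =~ tr/abcd//d;
(my $d2 = $orig) =~ s/[abcd]//g;         # both "xyz"

# tr/abcd/AB/d: a->A, b->B, and c, d (no counterpart) are deleted
(my $e1 = $orig) =~ tr/abcd/AB/d;
(my $e2 = $orig) =~ tr/ab/AB/;
$e2 =~ s/[cd]//g;                        # both "ABABxyz"
```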
+If the C</c> modifier is specified, the characters to be transliterated
+are the ones NOT in I<SEARCHLIST>; that is, I<SEARCHLIST> is
+complemented. If C</d> and/or C</s> are also specified, they apply to
+the complemented I<SEARCHLIST>. Recall that if I<REPLACEMENTLIST> is
+empty (except under C</d>), a copy of I<SEARCHLIST> is used instead.
+That copy is made after complementing under C</c>. I<SEARCHLIST> is
+sorted by code point order after complementing, and any
+I<REPLACEMENTLIST> is applied to that sorted result. This means that
+under C</c>, the order of the characters specified in I<SEARCHLIST> is
+irrelevant. Because code point order differs between ASCII and EBCDIC
+platforms, this can lead to different results on EBCDIC systems if
+I<REPLACEMENTLIST> contains more than one character, hence it is
+generally non-portable to use C</c> with such a I<REPLACEMENTLIST>.
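A sketch of C</c> on sample strings. With an empty I<REPLACEMENTLIST> the complemented list is copied, so the string is only counted; a one-character list replaces every non-member. The last part assumes an ASCII platform string and shows the sorted-complement rule: C<\x00> is the lowest code point outside C<a-z>, so it takes the first replacement character, and every later code point takes the replicated final one, regardless of how I<SEARCHLIST> is ordered.

```perl
use strict;
use warnings;

my $s = "Hi, there!";
my $non_lower = ($s =~ tr/a-z//c);     # counts H , space ! ; $s unchanged
(my $u = $s) =~ tr/a-z/_/c;            # each non-lowercase char -> '_'
(my $v = $s) =~ tr/a-z/_/cs;           # ... and squash runs of '_'
print "$non_lower $u $v\n";            # 4 _i__there_ _i_there_

# Multi-character REPLACEMENTLIST: positions refer to the sorted complement.
(my $m  = "\x00a\x01b!") =~ tr/a-z/XY/c;
(my $m2 = "\x00a\x01b!") =~ tr/zyxwvutsrqponmlkjihgfedcba/XY/c;
# $m and $m2 are both "XaYbY": the SEARCHLIST order made no difference
```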
+
+Another way of describing the operation is this:
+If C</c> is specified, the I<SEARCHLIST> is sorted by code point order,
+then complemented. If I<REPLACEMENTLIST> is empty and C</d> is not
+specified, I<REPLACEMENTLIST> is replaced by a copy of I<SEARCHLIST> (as
+modified under C</c>), and these potentially modified lists are used as
+the basis for what follows. Any character in the target string that
+isn't in I<SEARCHLIST> is passed through unchanged. Every other
+character in the target string is replaced by the character in
+I<REPLACEMENTLIST> that positionally corresponds to its mate in
+I<SEARCHLIST>, except that under C</s>, within any run of consecutive
+characters that all translate to the same character, the 2nd and
+following characters are squeezed out. If I<SEARCHLIST> is longer than
+I<REPLACEMENTLIST>, characters in the target string that match a
+character in I<SEARCHLIST> that doesn't have a correspondence in
+I<REPLACEMENTLIST> are either deleted from the target string if C</d> is
+specified, or replaced by the final character in I<REPLACEMENTLIST> if
+C</d> isn't specified.
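The step-by-step description above can be modelled in plain Perl. The sketch below uses C<model_tr>, a hypothetical helper written only for illustration (it is not part of Perl and not how C<tr> is implemented internally). It handles literal lists only, with no ranges, escapes, or C</c>, and returns the new string plus the count C<tr> would report.

```perl
use strict;
use warnings;

# model_tr: hypothetical illustration of the description above, for
# literal lists only (no ranges, no escapes, no /c).
sub model_tr {
    my ($str, $search, $repl, %opt) = @_;
    my @s = split //, $search;
    my @r = split //, $repl;
    @r = @s if !@r && !$opt{d};    # empty REPLACEMENTLIST: copy SEARCHLIST

    my %map;                       # char => replacement, or undef to delete
    for my $i (0 .. $#s) {
        next if exists $map{ $s[$i] };                  # first mapping wins
        if    ($i <= $#r) { $map{ $s[$i] } = $r[$i] }   # positional partner
        elsif ($opt{d})   { $map{ $s[$i] } = undef }    # no partner: delete
        else              { $map{ $s[$i] } = $r[-1] }   # replicate final char
    }

    my ($out, $count, $prev) = ('', 0, undef);
    for my $c (split //, $str) {
        if (!exists $map{$c}) {    # not in SEARCHLIST: pass through
            $out .= $c;
            $prev = undef;         # a passed-through char ends any /s run
            next;
        }
        $count++;
        my $to = $map{$c};
        next unless defined $to;                              # deleted (/d)
        next if $opt{s} && defined $prev && $prev eq $to;     # squashed (/s)
        $out .= $to;
        $prev = $to;
    }
    return ($out, $count);
}

my ($got, $n) = model_tr("abcdef", "abcde", "AB", d => 1);
print "$got $n\n";                 # ABf 5
```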
+
Some examples:
- $ARGV[1] =~ tr/A-Z/a-z/; # canonicalize to lower case ASCII
+ $ARGV[1] =~ tr/A-Z/a-z/; # canonicalize to lower case ASCII
+
+ $cnt = tr/*/*/; # count the stars in $_
+ $cnt = tr/*//; # same thing
+
+ $cnt = $sky =~ tr/*/*/; # count the stars in $sky
+ $cnt = $sky =~ tr/*//; # same thing
- $cnt = tr/*/*/; # count the stars in $_
+ $cnt = $sky =~ tr/*//c; # count all the non-stars in $sky
+ $cnt = $sky =~ tr/*/*/c; # same, but transliterate each non-star
+ # into a star, leaving the already-stars
+ # alone. Afterwards, everything in $sky
+ # is a star.
- $cnt = $sky =~ tr/*/*/; # count the stars in $sky
+ $cnt = tr/0-9//; # count the ASCII digits in $_
- $cnt = tr/0-9//; # count the digits in $_
+ tr/a-zA-Z//s; # bookkeeper -> bokeper
+ tr/o/o/s; # bookkeeper -> bokkeeper
+ tr/oe/oe/s; # bookkeeper -> bokkeper
+ tr/oe//s; # bookkeeper -> bokkeper
+ tr/oe/o/s; # bookkeeper -> bokkopor
- tr/a-zA-Z//s; # bookkeeper -> bokeper
+ ($HOST = $host) =~ tr/a-z/A-Z/;
+ $HOST = $host =~ tr/a-z/A-Z/r; # same thing
- ($HOST = $host) =~ tr/a-z/A-Z/;
- $HOST = $host =~ tr/a-z/A-Z/r; # same thing
+ $HOST = $host =~ tr/a-z/A-Z/r # chained with s///r
+ =~ s/:/ -p/r;
- $HOST = $host =~ tr/a-z/A-Z/r # chained with s///r
- =~ s/:/ -p/r;
+ tr/a-zA-Z/ /cs; # change non-alphas to single space
- tr/a-zA-Z/ /cs; # change non-alphas to single space
+ @stripped = map tr/a-zA-Z/ /csr, @original;
+ # /r with map
- @stripped = map tr/a-zA-Z/ /csr, @original;
- # /r with map
+ tr [\200-\377]
+ [\000-\177]; # wickedly delete 8th bit
- tr [\200-\377]
- [\000-\177]; # wickedly delete 8th bit
+ $foo !~ tr/A/a/; # transliterate all the A's in $foo to 'a',
+ # return 0 if any were found and changed.
+ # Otherwise return 1
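Several of the examples above can be run directly on sample data. This sketch uses invented inputs (C<"*.*..**">, C<"BANANA">, C<"example.com:8080">, and two short strings for the C<map>); the C</r> forms assume Perl 5.14 or later.

```perl
use strict;
use warnings;

# Squashing variants, each applied to its own copy of "bookkeeper".
my @w = ("bookkeeper") x 5;
$w[0] =~ tr/a-zA-Z//s;
$w[1] =~ tr/o/o/s;
$w[2] =~ tr/oe/oe/s;
$w[3] =~ tr/oe//s;
$w[4] =~ tr/oe/o/s;
print "@w\n";    # bokeper bokkeeper bokkeper bokkeper bokkopor

# Counting, then the /c form that also rewrites the non-stars.
my $sky    = "*.*..**";
my $stars  = ($sky =~ tr/*//);      # 4; $sky unchanged
my $others = ($sky =~ tr/*/*/c);    # 3; $sky is now "*******"

# !~ negates the count.
my $foo  = "BANANA";
my $none = ($foo !~ tr/A/a/);       # false; $foo is now "BaNaNa"

# Non-destructive /r, chained with s///r, and inside map.
my $host = "example.com:8080";
my $HOST = $host =~ tr/a-z/A-Z/r =~ s/:/ -p/r;    # "EXAMPLE.COM -p8080"
my @original = ("Hello, world!", "a--b");
my @stripped = map tr/a-zA-Z/ /csr, @original;    # ("Hello world ", "a b")
```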
If multiple transliterations are given for a character, only the
first one is used:
- tr/AAA/XYZ/
+ tr/AAA/XYZ/
will transliterate any A to X.
@@ -2567,10 +2616,10 @@ the I<SEARCHLIST> nor the I<REPLACEMENTLIST> are subjected to double quote
interpolation. That means that if you want to use variables, you
must use an C<eval()>:
- eval "tr/$oldlist/$newlist/";
- die $@ if $@;
+ eval "tr/$oldlist/$newlist/";
+ die $@ if $@;
- eval "tr/$oldlist/$newlist/, 1" or die $@;
+ eval "tr/$oldlist/$newlist/, 1" or die $@;
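A sketch of the C<eval> approach applied to a named variable rather than C<$_>, with sample lists standing in for run-time data. The second half assumes you want the lists taken literally: C<\Q...\E> (C<quotemeta>) backslashes non-word characters, so a C<-> is not treated as a range and a C</> does not end the list early.

```perl
use strict;
use warnings;

my ($oldlist, $newlist) = ('abc', 'xyz');   # lists known only at run time
my $s = "aabbcc";
eval "\$s =~ tr/$oldlist/$newlist/; 1" or die $@;
print "$s\n";                                # xxyyzz

# quotemeta the lists so '-' and '/' in the data are literal characters.
my ($old2, $new2) = ('a-c', '/_x');
my $t = "a-b-c";
eval "\$t =~ tr/\Q$old2\E/\Q$new2\E/; 1" or die $@;
print "$t\n";                                # /_b_x
```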
=item C<< <<I<EOF> >>
X<here-doc> X<heredoc> X<here-document> X<<< << >>>