path: root/pod/perlop.pod
author Gurusamy Sarathy <gsar@cpan.org> 1999-05-24 07:24:11 +0000
committer Gurusamy Sarathy <gsar@cpan.org> 1999-05-24 07:24:11 +0000
commit 19799a22062ef658e4ac543ea06fa9193323512a (patch)
tree ae9ae04d1351eb1dbbc2ea3cfe207cf056e56371 /pod/perlop.pod
parent d92eb7b0e84a41728b3fbb642691f159dbe28882 (diff)
download perl-19799a22062ef658e4ac543ea06fa9193323512a.tar.gz
major pod update from Tom Christiansen
p4raw-id: //depot/perl@3460
Diffstat (limited to 'pod/perlop.pod')
-rw-r--r-- pod/perlop.pod 829
1 files changed, 425 insertions, 404 deletions
diff --git a/pod/perlop.pod b/pod/perlop.pod
index 106b9a9a87..0f8117ced9 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -5,11 +5,11 @@ perlop - Perl operators and precedence
=head1 SYNOPSIS
Perl operators have the following associativity and precedence,
-listed from highest precedence to lowest. Note that all operators
-borrowed from C keep the same precedence relationship with each other,
-even where C's precedence is slightly screwy. (This makes learning
-Perl easier for C folks.) With very few exceptions, these all
-operate on scalar values only, not array values.
+listed from highest precedence to lowest. Operators borrowed from
+C keep the same precedence relationship with each other, even where
+C's precedence is slightly screwy. (This makes learning Perl easier
+for C folks.) With very few exceptions, these all operate on scalar
+values only, not array values.
left terms and list operators (leftward)
left ->
@@ -64,11 +64,11 @@ For example, in
@ary = (1, 3, sort 4, 2);
print @ary; # prints 1324
-the commas on the right of the sort are evaluated before the sort, but
-the commas on the left are evaluated after. In other words, list
-operators tend to gobble up all the arguments that follow them, and
+the commas on the right of the sort are evaluated before the sort,
+but the commas on the left are evaluated after. In other words,
+list operators tend to gobble up all arguments that follow, and
then act like a simple TERM with regard to the preceding expression.
-Note that you have to be careful with parentheses:
+Be careful with parentheses:
# These evaluate exit before doing the print:
print($foo, exit); # Obviously not what you want.
@@ -95,16 +95,18 @@ as well as L<"I/O Operators">.
=head2 The Arrow Operator
-Just as in C and C++, "C<-E<gt>>" is an infix dereference operator. If the
-right side is either a C<[...]> or C<{...}> subscript, then the left side
-must be either a hard or symbolic reference to an array or hash (or
-a location capable of holding a hard reference, if it's an lvalue (assignable)).
-See L<perlref>.
+"C<-E<gt>>" is an infix dereference operator, just as it is in C
+and C++. If the right side is either a C<[...]>, C<{...}>, or a
+C<(...)> subscript, then the left side must be either a hard or
+symbolic reference to an array, a hash, or a subroutine respectively.
+(Or technically speaking, a location capable of holding a hard
+reference, if it's an array or hash reference being used for
+assignment.) See L<perlreftut> and L<perlref>.
-Otherwise, the right side is a method name or a simple scalar variable
-containing the method name, and the left side must either be an object
-(a blessed reference) or a class name (that is, a package name).
-See L<perlobj>.
+Otherwise, the right side is a method name or a simple scalar
+variable containing either the method name or a subroutine reference,
+and the left side must be either an object (a blessed reference)
+or a class name (that is, a package name). See L<perlobj>.
=head2 Auto-increment and Auto-decrement
@@ -129,7 +131,7 @@ The auto-decrement operator is not magical.
=head2 Exponentiation
-Binary "**" is the exponentiation operator. Note that it binds even more
+Binary "**" is the exponentiation operator. It binds even more
tightly than unary minus, so -2**4 is -(2**4), not (-2)**4. (This is
implemented using C's pow(3) function, which actually works on doubles
internally.)
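A minimal sketch of the precedence rule described here, for illustration only (the variable names are hypothetical and not part of the patch):

```perl
use strict;
use warnings;

# ** binds more tightly than unary minus.
my $tight  = -2**4;      # parsed as -(2**4)
my $forced = (-2)**4;    # parentheses force the negation first

print "$tight $forced\n";   # prints "-16 16"
```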
@@ -155,10 +157,10 @@ syntactically for separating a function name from a parenthesized expression
that would otherwise be interpreted as the complete list of function
arguments. (See examples above under L<Terms and List Operators (Leftward)>.)
-Unary "\" creates a reference to whatever follows it. See L<perlref>.
-Do not confuse this behavior with the behavior of backslash within a
-string, although both forms do convey the notion of protecting the next
-thing from interpretation.
+Unary "\" creates a reference to whatever follows it. See L<perlreftut>
+and L<perlref>. Do not confuse this behavior with the behavior of
+backslash within a string, although both forms do convey the notion
+of protecting the next thing from interpolation.
=head2 Binding Operators
@@ -384,23 +386,26 @@ of B<sed>, B<awk>, and various editors. Each ".." operator maintains its
own boolean state. It is false as long as its left operand is false.
Once the left operand is true, the range operator stays true until the
right operand is true, I<AFTER> which the range operator becomes false
-again. (It doesn't become false till the next time the range operator is
+again. It doesn't become false till the next time the range operator is
evaluated. It can test the right operand and become false on the same
evaluation it became true (as in B<awk>), but it still returns true once.
-If you don't want it to test the right operand till the next evaluation
-(as in B<sed>), use three dots ("...") instead of two.) The right
-operand is not evaluated while the operator is in the "false" state, and
-the left operand is not evaluated while the operator is in the "true"
-state. The precedence is a little lower than || and &&. The value
-returned is either the empty string for false, or a sequence number
-(beginning with 1) for true. The sequence number is reset for each range
-encountered. The final sequence number in a range has the string "E0"
-appended to it, which doesn't affect its numeric value, but gives you
-something to search for if you want to exclude the endpoint. You can
-exclude the beginning point by waiting for the sequence number to be
-greater than 1. If either operand of scalar ".." is a constant expression,
-that operand is implicitly compared to the C<$.> variable, the current
-line number. Examples:
+If you don't want it to test the right operand till the next
+evaluation, as in B<sed>, just use three dots ("...") instead of
+two. In all other regards, "..." behaves just like ".." does.
+
+The right operand is not evaluated while the operator is in the
+"false" state, and the left operand is not evaluated while the
+operator is in the "true" state. The precedence is a little lower
+than || and &&. The value returned is either the empty string for
+false, or a sequence number (beginning with 1) for true. The
+sequence number is reset for each range encountered. The final
+sequence number in a range has the string "E0" appended to it, which
+doesn't affect its numeric value, but gives you something to search
+for if you want to exclude the endpoint. You can exclude the
+beginning point by waiting for the sequence number to be greater
+than 1. If either operand of scalar ".." is a constant expression,
+that operand is implicitly compared to the C<$.> variable, the
+current line number. Examples:
As a scalar operator:
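For instance, the sequence numbers and the trailing "E0" can be observed with a small sketch like the following. It is not part of the patch; the data and variable names are made up, and both operands are deliberately non-constant so that no comparison against C<$.> takes place:

```perl
use strict;
use warnings;

my @lines = ('intro', 'BEGIN', 'body 1', 'body 2', 'END', 'outro');
my @seen;
for my $line (@lines) {
    # Scalar context: ".." acts as the flip-flop operator here.
    my $seq = ($line eq 'BEGIN') .. ($line eq 'END');
    push @seen, "$line=$seq";
}
print join(', ', @seen), "\n";
# prints: intro=, BEGIN=1, body 1=2, body 2=3, END=4E0, outro=
```

Testing C<$seq> against C</E0$/> would exclude the endpoint, and testing for a value greater than 1 would exclude the starting line.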
@@ -429,7 +434,7 @@ can say
@alphabet = ('A' .. 'Z');
-to get all the letters of the alphabet, or
+to get all normal letters of the alphabet, or
$hexdigit = (0 .. 9, 'a' .. 'f')[$num & 15];
@@ -464,8 +469,6 @@ legal lvalues (meaning that you can assign to them):
($a_or_b ? $a : $b) = $c;
-This is not necessarily guaranteed to contribute to the readability of your program.
-
Because this operator produces an assignable result, using assignments
without parentheses will get you in trouble. For example, this:
@@ -479,6 +482,10 @@ Rather than this:
($a % 2) ? ($a += 10) : ($a += 2)
+That should probably be written more simply as:
+
+ $a += ($a % 2) ? 10 : 2;
+
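The difference between the unparenthesized form and the simpler rewrite can be seen in a short sketch (hypothetical variables C<$x> and C<$y> stand in for C<$a>; this is an illustration, not text from the patch):

```perl
use strict;
use warnings;

my $x = 1;
# Without parentheses this parses as (($x % 2) ? ($x += 10) : $x) += 2.
$x % 2 ? $x += 10 : $x += 2;
my $surprise = $x;          # odd input: 1 + 10, then another + 2

my $y = 1;
$y += ($y % 2) ? 10 : 2;    # the simpler form adds exactly one amount

print "$surprise $y\n";     # prints "13 11"
```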
=head2 Assignment Operators
"=" is the ordinary assignment operator.
@@ -500,7 +507,7 @@ The following are recognized:
.= %= ^=
x=
-Note that while these are grouped by family, they all have the precedence
+Although these are grouped by family, they all have the precedence
of assignment.
Unlike in C, the assignment operator produces a valid lvalue. Modifying
@@ -573,14 +580,14 @@ probably avoid using this for assignment, only for control flow.
($a = $b) or $c; # really means this
$a = $b || $c; # better written this way
-However, when it's a list context assignment and you're trying to use
+However, when it's a list-context assignment and you're trying to use
"||" for control flow, you probably need "or" so that the assignment
takes higher precedence.
@info = stat($file) || die; # oops, scalar sense of stat!
@info = stat($file) or die; # better, now @info gets its due
-Then again, you could always use parentheses.
+Then again, you could always use parentheses.
Binary "xor" returns the exclusive-OR of the two surrounding expressions.
It cannot short circuit, of course.
@@ -602,7 +609,7 @@ operators are typed: $, @, %, and &.)
=item (TYPE)
-Type casting operator.
+Type-casting operator.
=back
@@ -627,17 +634,17 @@ the same character fore and aft, but the 4 sorts of brackets
s{}{} Substitution yes (unless '' is delimiter)
tr{}{} Transliteration no (but see below)
-Note that there can be whitespace between the operator and the quoting
+There can be whitespace between the operator and the quoting
characters, except when C<#> is being used as the quoting character.
-C<q#foo#> is parsed as being the string C<foo>, while C<q #foo#> is the
-operator C<q> followed by a comment. Its argument will be taken from the
-next line. This allows you to write:
+C<q#foo#> is parsed as the string C<foo>, while C<q #foo#> is the
+operator C<q> followed by a comment. Its argument will be taken
+from the next line. This allows you to write:
s {foo} # Replace foo
{bar} # with bar.
-For constructs that do interpolation, variables beginning with "C<$>"
-or "C<@>" are interpolated, as are the following sequences. Within
+For constructs that do interpolate, variables beginning with "C<$>"
+or "C<@>" are interpolated, as are the following escape sequences. Within
a transliteration, the first eleven of these sequences may be used.
\t tab (HT, TAB)
@@ -650,7 +657,7 @@ a transliteration, the first eleven of these sequences may be used.
\033 octal char (ESC)
\x1b hex char (ESC)
\x{263a} wide hex char (SMILEY)
- \c[ control char
+ \c[ control char (ESC)
\l lowercase next char
\u uppercase next char
@@ -664,7 +671,7 @@ and C<\U> is taken from the current locale. See L<perllocale>.
All systems use the virtual C<"\n"> to represent a line terminator,
called a "newline". There is no such thing as an unvarying, physical
-newline character. It is an illusion that the operating system,
+newline character. It is only an illusion that the operating system,
device drivers, C libraries, and Perl all conspire to preserve. Not all
systems read C<"\r"> as ASCII CR and C<"\n"> as ASCII LF. For example,
on a Mac, these are reversed, and on systems without a line terminator,
@@ -687,28 +694,17 @@ interpolated, so that regular expressions may be incorporated into the
pattern from the variables. If this is not what you want, use C<\Q> to
interpolate a variable literally.
-Apart from the above, there are no multiple levels of interpolation. In
-particular, contrary to the expectations of shell programmers, back-quotes
-do I<NOT> interpolate within double quotes, nor do single quotes impede
-evaluation of variables when used within double quotes.
+Apart from the behavior described above, Perl does not expand
+multiple levels of interpolation. In particular, contrary to the
+expectations of shell programmers, back-quotes do I<NOT> interpolate
+within double quotes, nor do single quotes impede evaluation of
+variables when used within double quotes.
=head2 Regexp Quote-Like Operators
Here are the quote-like operators that apply to pattern
matching and related activities.
-Most of this section is related to use of regular expressions from Perl.
-Such a use may be considered from two points of view: Perl handles a
-a string and a "pattern" to RE (regular expression) engine to match,
-RE engine finds (or does not find) the match, and Perl uses the findings
-of RE engine for its operation, possibly asking the engine for other matches.
-
-RE engine has no idea what Perl is going to do with what it finds,
-similarly, the rest of Perl has no idea what a particular regular expression
-means to RE engine. This creates a clean separation, and in this section
-we discuss matching from Perl point of view only. The other point of
-view may be found in L<perlre>.
-
=over 8
=item ?PATTERN?
@@ -727,21 +723,22 @@ patterns local to the current package are reset.
reset if eof; # clear ?? status for next file
}
-This usage is vaguely deprecated, and may be removed in some future
-version of Perl.
+This usage is vaguely deprecated, which means it just might possibly
+be removed in some distant future version of Perl, perhaps somewhere
+around the year 2168.
=item m/PATTERN/cgimosx
=item /PATTERN/cgimosx
Searches a string for a pattern match, and in scalar context returns
-true (1) or false (''). If no string is specified via the C<=~> or
-C<!~> operator, the $_ string is searched. (The string specified with
-C<=~> need not be an lvalue--it may be the result of an expression
-evaluation, but remember the C<=~> binds rather tightly.) See also
-L<perlre>.
-See L<perllocale> for discussion of additional considerations that apply
-when C<use locale> is in effect.
+true if it succeeds, false if it fails. If no string is specified
+via the C<=~> or C<!~> operator, the $_ string is searched. (The
+string specified with C<=~> need not be an lvalue--it may be the
+result of an expression evaluation, but remember the C<=~> binds
+rather tightly.) See also L<perlre>. See L<perllocale> for
+discussion of additional considerations that apply when C<use locale>
+is in effect.
Options are:
@@ -755,11 +752,10 @@ Options are:
If "/" is the delimiter then the initial C<m> is optional. With the C<m>
you can use any pair of non-alphanumeric, non-whitespace characters
-as delimiters. This is particularly useful for matching Unix path names
-that contain "/", to avoid LTS (leaning toothpick syndrome). If "?" is
+as delimiters. This is particularly useful for matching path names
+that contain "/", to avoid LTS (leaning toothpick syndrome). If "?" is
the delimiter, then the match-only-once rule of C<?PATTERN?> applies.
-If "'" is the delimiter, no variable interpolation is performed on the
-PATTERN.
+If "'" is the delimiter, no interpolation is performed on the PATTERN.
PATTERN may contain variables, which will be interpolated (and the
pattern recompiled) every time the pattern search is evaluated, except
@@ -770,12 +766,12 @@ the trailing delimiter. This avoids expensive run-time recompilations,
and is useful when the value you are interpolating won't change over
the life of the script. However, mentioning C</o> constitutes a promise
that you won't change the variables in the pattern. If you change them,
-Perl won't even notice.
+Perl won't even notice. See also L<qr//>.
If the PATTERN evaluates to the empty string, the last
I<successfully> matched regular expression is used instead.
-If the C</g> option is not used, C<m//> in a list context returns a
+If the C</g> option is not used, C<m//> in list context returns a
list consisting of the subexpressions matched by the parentheses in the
pattern, i.e., (C<$1>, C<$2>, C<$3>...). (Note that here C<$1> etc. are
also set, and that this differs from Perl 4's behavior.) When there are
@@ -805,15 +801,16 @@ remainder of the line, and assigns those three fields to $F1, $F2, and
$Etc. The conditional is true if any variables were assigned, i.e., if
the pattern matched.
-The C</g> modifier specifies global pattern matching--that is, matching
-as many times as possible within the string. How it behaves depends on
-the context. In list context, it returns a list of all the
-substrings matched by all the parentheses in the regular expression.
-If there are no parentheses, it returns a list of all the matched
-strings, as if there were parentheses around the whole pattern.
+The C</g> modifier specifies global pattern matching--that is,
+matching as many times as possible within the string. How it behaves
+depends on the context. In list context, it returns a list of the
+substrings matched by any capturing parentheses in the regular
+expression. If there are no parentheses, it returns a list of all
+the matched strings, as if there were parentheses around the whole
+pattern.
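As a rough illustration of the list-context behavior just described (the sample string and variable names are invented for this sketch):

```perl
use strict;
use warnings;

my $text = 'x=1, y=22, z=333';

# No parentheses: one list element per whole match.
my @whole = $text =~ /\d+/g;

# Two capture groups: the list alternates between the first and second
# capture of each match, which is convenient for loading a hash.
my %pairs = $text =~ /(\w)=(\d+)/g;

print "@whole\n";                                                  # prints "1 22 333"
print join(',', map { "$_=$pairs{$_}" } sort keys %pairs), "\n";   # prints "x=1,y=22,z=333"
```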
In scalar context, each execution of C<m//g> finds the next match,
-returning TRUE if it matches, and FALSE if there is no further match.
+returning true if it matches, and false if there is no further match.
The position after the last match can be read or set using the pos()
function; see L<perlfunc/pos>. A failed match normally resets the
search position to the beginning of the string, but you can avoid that
@@ -823,8 +820,8 @@ string also resets the search position.
You can intermix C<m//g> matches with C<m/\G.../g>, where C<\G> is a
zero-width assertion that matches the exact position where the previous
C<m//g>, if any, left off. The C<\G> assertion is not supported without
-the C</g> modifier; currently, without C</g>, C<\G> behaves just like
-C<\A>, but that's accidental and may change in the future.
+the C</g> modifier. (Currently, without C</g>, C<\G> behaves just like
+C<\A>, but that's accidental and may change in the future.)
Examples:
@@ -832,12 +829,10 @@ Examples:
($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);
# scalar context
- {
- local $/ = "";
- while (defined($paragraph = <>)) {
- while ($paragraph =~ /[a-z]['")]*[.!?]+['")]*\s/g) {
- $sentences++;
- }
+ $/ = ""; $* = 1; # $* deprecated in modern perls
+ while (defined($paragraph = <>)) {
+ while ($paragraph =~ /[a-z]['")]*[.!?]+['")]*\s/g) {
+ $sentences++;
}
}
print "$sentences\n";
@@ -893,7 +888,7 @@ Here is the output (split into several lines):
=item C<'STRING'>
-A single-quoted, literal string. A backslash represents a backslash
+A single-quoted, literal string. A backslash represents a backslash
unless followed by the delimiter or another backslash, in which case
the delimiter or backslash is interpolated.
@@ -909,15 +904,16 @@ A double-quoted, interpolated string.
$_ .= qq
(*** The previous line contains the naughty word "$1".\n)
- if /(tcl|rexx|python)/; # :-)
+ if /\b(tcl|java|python)\b/i; # :-)
$baz = "\n"; # a one-character string
=item qr/STRING/imosx
-Quote-as-a-regular-expression operator. I<STRING> is interpolated the
-same way as I<PATTERN> in C<m/PATTERN/>. If "'" is used as the
-delimiter, no variable interpolation is done. Returns a Perl value
-which may be used instead of the corresponding C</STRING/imosx> expression.
+This operator quotes--and compiles--its I<STRING> as a regular
+expression. I<STRING> is interpolated the same way as I<PATTERN>
+in C<m/PATTERN/>. If "'" is used as the delimiter, no interpolation
+is done. Returns a Perl value which may be used instead of the
+corresponding C</STRING/imosx> expression.
For example,
@@ -936,7 +932,7 @@ The result may be used as a subpattern in a match:
$string =~ /$re/; # or this way
Since Perl may compile the pattern at the moment of execution of the qr()
-operator, using qr() may have speed advantages in I<some> situations,
+operator, using qr() may have speed advantages in some situations,
notably if the result of qr() is used standalone:
sub match {
@@ -951,11 +947,11 @@ notably if the result of qr() is used standalone:
} @_;
}
-Precompilation of the pattern into an internal representation at the
-moment of qr() avoids a need to recompile the pattern every time a
-match C</$pat/> is attempted. (Note that Perl has many other
-internal optimizations, but none would be triggered in the above
-example if we did not use qr() operator.)
+Precompilation of the pattern into an internal representation at
+the moment of qr() avoids a need to recompile the pattern every
+time a match C</$pat/> is attempted. (Perl has many other internal
+optimizations, but none would be triggered in the above example if
+we did not use the qr() operator.)
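A small sketch of both uses of a compiled pattern, standalone and as a subpattern, might look like this (C<$range> and the sample strings are hypothetical; no timing claims are made here):

```perl
use strict;
use warnings;

# Compile the pattern once.
my $range = qr/(\d+)-(\d+)/;

# Used standalone on the right-hand side of =~ ...
my @standalone = ('pages 10-20' =~ $range);

# ... or interpolated as a subpattern of a larger match.
my @embedded = ('from 3-4 to' =~ /from\s+$range\s+to/);

print "@standalone | @embedded\n";   # prints "10 20 | 3 4"
```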
Options are:
@@ -1012,7 +1008,7 @@ double-quote interpolation, passing it on to the shell instead:
$perl_info = qx(ps $$); # that's Perl's $$
$shell_info = qx'ps $$'; # that's the new shell's $$
-Note that how the string gets evaluated is entirely subject to the command
+How that string gets evaluated is entirely subject to the command
interpreter on your system. On most platforms, you will have to protect
shell metacharacters if you want them treated literally. This is in
practice difficult to do, as it's unclear how to escape which characters.
@@ -1064,10 +1060,10 @@ Some frequently seen examples:
use POSIX qw( setlocale localeconv )
@EXPORT = qw( foo bar baz );
-A common mistake is to try to separate the words with comma or to put
-comments into a multi-line C<qw>-string. For this reason the C<-w>
-switch produce warnings if the STRING contains the "," or the "#"
-character.
+A common mistake is to try to separate the words with commas or to
+put comments into a multi-line C<qw>-string. For this reason, the
+B<-w> switch (that is, the C<$^W> variable) produces warnings if
+the STRING contains the "," or the "#" character.
=item s/PATTERN/REPLACEMENT/egimosx
@@ -1080,7 +1076,7 @@ variable is searched and modified. (The string specified with C<=~> must
be a scalar variable, an array element, a hash element, or an assignment
to one of those, i.e., an lvalue.)
-If the delimiter chosen is a single quote, no variable interpolation is
+If the delimiter chosen is a single quote, no interpolation is
done on either the PATTERN or the REPLACEMENT. Otherwise, if the
PATTERN contains a $ that looks like a variable rather than an
end-of-string test, the variable will be interpolated into the pattern
@@ -1163,16 +1159,14 @@ B<sed>, we use the \E<lt>I<digit>E<gt> form in only the left hand side.
Anywhere else it's $E<lt>I<digit>E<gt>.
Occasionally, you can't use just a C</g> to get all the changes
-to occur. Here are two common cases:
+to occur that you might want. Here are two common cases:
# put commas in the right places in an integer
- 1 while s/(.*\d)(\d\d\d)/$1,$2/g; # perl4
- 1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g; # perl5
+ 1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g;
# expand tabs to 8-column spacing
1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
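To see why the comma example needs the C<1 while> loop, here is a sketch applied to a hypothetical variable C<$number> rather than C<$_>:

```perl
use strict;
use warnings;

my $number = '1234567';
# For a single run of digits, each pass inserts only the rightmost
# missing comma, because (?!\d) requires the three digits to end the
# run.  The loop repeats the substitution until it stops matching.
1 while $number =~ s/(\d)(\d\d\d)(?!\d)/$1,$2/g;

print "$number\n";   # prints "1,234,567"
```

The first pass yields "1234,567" and the second "1,234,567"; the third finds nothing to change and ends the loop.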
-
=item tr/SEARCHLIST/REPLACEMENTLIST/cdsUC
=item y/SEARCHLIST/REPLACEMENTLIST/cdsUC
@@ -1206,14 +1200,14 @@ Options:
U Translate to/from UTF-8.
C Translate to/from 8-bit char (octet).
-If the C</c> modifier is specified, the SEARCHLIST character set is
-complemented. If the C</d> modifier is specified, any characters specified
-by SEARCHLIST not found in REPLACEMENTLIST are deleted. (Note
-that this is slightly more flexible than the behavior of some B<tr>
-programs, which delete anything they find in the SEARCHLIST, period.)
-If the C</s> modifier is specified, sequences of characters that were
-transliterated to the same character are squashed down to a single instance of the
-character.
+If the C</c> modifier is specified, the SEARCHLIST character set
+is complemented. If the C</d> modifier is specified, any characters
+specified by SEARCHLIST not found in REPLACEMENTLIST are deleted.
+(Note that this is slightly more flexible than the behavior of some
+B<tr> programs, which delete anything they find in the SEARCHLIST,
+period.) If the C</s> modifier is specified, sequences of characters
+that were transliterated to the same character are squashed down
+to a single instance of the character.
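The effect of these modifiers can be sketched as follows (sample strings and variable names are invented for illustration):

```perl
use strict;
use warnings;

my $letters = 'Hello, World! 123';
# /c complements the SEARCHLIST and /d deletes: with an empty
# REPLACEMENTLIST, everything that is not an ASCII letter is removed.
$letters =~ tr/a-zA-Z//cd;

my $squeezed = 'bookkeeper';
# /s with an empty REPLACEMENTLIST maps each letter to itself and
# squashes runs of the same letter.
$squeezed =~ tr/a-zA-Z//s;

print "$letters $squeezed\n";   # prints "HelloWorld bokeper"
```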
If the C</d> modifier is used, the REPLACEMENTLIST is always interpreted
exactly as specified. Otherwise, if the REPLACEMENTLIST is shorter
@@ -1245,19 +1239,20 @@ Examples:
tr [\200-\377]
[\000-\177]; # delete 8th bit
- tr/\0-\xFF//CU; # translate Latin-1 to Unicode
- tr/\0-\x{FF}//UC; # translate Unicode to Latin-1
+ tr/\0-\xFF//CU; # change Latin-1 to Unicode
+ tr/\0-\x{FF}//UC; # change Unicode to Latin-1
-If multiple transliterations are given for a character, only the first one is used:
+If multiple transliterations are given for a character, only the
+first one is used:
tr/AAA/XYZ/
will transliterate any A to X.
-Note that because the transliteration table is built at compile time, neither
+Because the transliteration table is built at compile time, neither
the SEARCHLIST nor the REPLACEMENTLIST are subjected to double quote
-interpolation. That means that if you want to use variables, you must use
-an eval():
+interpolation. That means that if you want to use variables, you
+must use an eval():
eval "tr/$oldlist/$newlist/";
die $@ if $@;
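A self-contained version of that idiom, with made-up values for C<$oldlist> and C<$newlist>, could look like this:

```perl
use strict;
use warnings;

# The names follow the text above; the values are assumptions.
my ($oldlist, $newlist) = ('a-c', 'A-C');

$_ = 'abcdef';
# The variables are interpolated into the string first; only then is
# the resulting tr/a-c/A-C/ compiled and applied to $_.
eval "tr/$oldlist/$newlist/";
die $@ if $@;

my $result = $_;
print "$result\n";   # prints "ABCdef"
```

Because the string is compiled as code, the lists should come from a trusted source.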
@@ -1268,52 +1263,52 @@ an eval():
=head2 Gory details of parsing quoted constructs
-When presented with something which may have several different
-interpretations, Perl uses the principle B<DWIM> (expanded to Do What I Mean
-- not what I wrote) to pick up the most probable interpretation of the
-source. This strategy is so successful that Perl users usually do not
-suspect ambivalence of what they write. However, time to time Perl's ideas
-differ from what the author meant.
-
-The target of this section is to clarify the Perl's way of interpreting
-quoted constructs. The most frequent reason one may have to want to know the
-details discussed in this section is hairy regular expressions. However, the
-first steps of parsing are the same for all Perl quoting operators, so here
-they are discussed together.
-
-The most important detail of Perl parsing rules is the first one
-discussed below; when processing a quoted construct, Perl I<first>
-finds the end of the construct, then it interprets the contents of the
-construct. If you understand this rule, you may skip the rest of this
-section on the first reading. The other rules would
-contradict user's expectations much less frequently than the first one.
-
-Some of the passes discussed below are performed concurrently, but as
-far as results are the same, we consider them one-by-one. For different
-quoting constructs Perl performs different number of passes, from
-one to five, but they are always performed in the same order.
+When presented with something that might have several different
+interpretations, Perl uses the B<DWIM> (that's "Do What I Mean")
+principle to pick the most probable interpretation. This strategy
+is so successful that Perl programmers often do not suspect the
+ambivalence of what they write. But from time to time, Perl's
+notions differ substantially from what the author honestly meant.
+
+This section hopes to clarify how Perl handles quoted constructs.
+Although the most common reason to learn this is to unravel labyrinthine
+regular expressions, because the initial steps of parsing are the
+same for all quoting operators, they are all discussed together.
+
+The most important Perl parsing rule is the first one discussed
+below: when processing a quoted construct, Perl first finds the end
+of that construct, then interprets its contents. If you understand
+this rule, you may skip the rest of this section on the first
+reading. The other rules are likely to contradict the user's
+expectations much less frequently than this first one.
+
+Some passes discussed below are performed concurrently, but because
+their results are the same, we consider them individually. For different
+quoting constructs, Perl performs different numbers of passes, from
+one to five, but these passes are always performed in the same order.
=over
=item Finding the end
-First pass is finding the end of the quoted construct, be it
-a multichar delimiter
-C<"\nEOF\n"> of C<<<EOF> construct, C</> which terminates C<qq/> construct,
-C<]> which terminates C<qq[> construct, or C<E<gt>> which terminates a
-fileglob started with C<<>.
+The first pass is finding the end of the quoted construct, whether
+it be a multicharacter delimiter C<"\nEOF\n"> in the C<<<EOF>
+construct, a C</> that terminates a C<qq//> construct, a C<]> that
+terminates a C<qq[]> construct, or a C<E<gt>> that terminates a
+fileglob started with C<E<lt>>.
-When searching for one-char non-matching delimiter, such as C</>, combinations
-C<\\> and C<\/> are skipped. When searching for one-char matching delimiter,
-such as C<]>, combinations C<\\>, C<\]> and C<\[> are skipped, and
-nested C<[>, C<]> are skipped as well. When searching for multichar delimiter
-no skipping is performed.
+When searching for single-character non-pairing delimiters, such
+as C</>, combinations of C<\\> and C<\/> are skipped. However,
+when searching for a single-character pairing delimiter like C<[>,
+combinations of C<\\>, C<\]>, and C<\[> are all skipped, and nested
+C<[>, C<]> are skipped as well. When searching for multicharacter
+delimiters, nothing is skipped.
-For constructs with 3-part delimiters (C<s///> etc.) the search is
-repeated once more.
+For constructs with three-part delimiters (C<s///>, C<y///>, and
+C<tr///>), the search is repeated once more.
-During this search no attention is paid to the semantic of the construct,
-thus:
+During this search no attention is paid to the semantics of the construct.
+Thus:
"$hash{"$foo/$bar"}"
@@ -1323,30 +1318,28 @@ or:
bar # NOT a comment, this slash / terminated m//!
/x
-do not form legal quoted expressions, the quoted part ends on the first C<">
-and C</>, and the rest happens to be a syntax error. Note that since the slash
-which terminated C<m//> was followed by a C<SPACE>, the above is not C<m//x>,
-but rather C<m//> with no 'x' switch. So the embedded C<#> is interpreted
-as a literal C<#>.
+do not form legal quoted expressions. The quoted part ends on the
+first C<"> and C</>, and the rest happens to be a syntax error.
+Because the slash that terminated C<m//> was followed by a C<SPACE>,
+the example above is not C<m//x>, but rather C<m//> with no C</x>
+modifier. So the embedded C<#> is interpreted as a literal C<#>.
=item Removal of backslashes before delimiters
-During the second pass the text between the starting delimiter and
-the ending delimiter is copied to a safe location, and the C<\> is
-removed from combinations consisting of C<\> and delimiter(s) (both starting
-and ending delimiter if they differ).
-
-The removal does not happen for multi-char delimiters.
-
-Note that the combination C<\\> is left as it was!
+During the second pass, text between the starting and ending
+delimiters is copied to a safe location, and the C<\> is removed
+from combinations consisting of C<\> and delimiter--or delimiters,
+meaning both starting and ending delimiters, should these differ.
+This removal does not happen for multi-character delimiters.
+Note that the combination C<\\> is left intact, just as it was.
-Starting from this step no information about the delimiter(s) is used in the
-parsing.
+Starting from this step no information about the delimiters is
+used in parsing.
=item Interpolation
-Next step is interpolation in the obtained delimiter-independent text.
-There are four different cases.
+The next step is interpolation in the text obtained, which is now
+delimiter-independent. There are four different cases.
=over
@@ -1360,44 +1353,40 @@ The only interpolation is removal of C<\> from pairs C<\\>.
=item C<"">, C<``>, C<qq//>, C<qx//>, C<<file*globE<gt>>
-C<\Q>, C<\U>, C<\u>, C<\L>, C<\l> (possibly paired with C<\E>) are converted
-to corresponding Perl constructs, thus C<"$foo\Qbaz$bar"> is converted to :
-
- $foo . (quotemeta("baz" . $bar));
-
-Other combinations of C<\> with following chars are substituted with
-appropriate expansions.
+C<\Q>, C<\U>, C<\u>, C<\L>, C<\l> (possibly paired with C<\E>) are
+converted to corresponding Perl constructs. Thus, C<"$foo\Qbaz$bar">
+is converted to C<$foo . (quotemeta("baz" . $bar))> internally.
+The other combinations are replaced with appropriate expansions.
-Let it be stressed that I<whatever is between C<\Q> and C<\E>> is interpolated
-in the usual way. Say, C<"\Q\\E"> has no C<\E> inside: it has C<\Q>, C<\\>,
-and C<E>, thus the result is the same as for C<"\\\\E">. Generally speaking,
-having backslashes between C<\Q> and C<\E> may lead to counterintuitive
-results. So, C<"\Q\t\E"> is converted to:
-
- quotemeta("\t")
-
-which is the same as C<"\\\t"> (since TAB is not alphanumerical). Note also
-that:
+Let it be stressed that I<whatever falls between C<\Q> and C<\E>>
+is interpolated in the usual way. Something like C<"\Q\\E"> has
+no C<\E> inside. Instead, it has C<\Q>, C<\\>, and C<E>, so the
+result is the same as for C<"\\\\E">. As a general rule, backslashes
+between C<\Q> and C<\E> may lead to counterintuitive results. So,
+C<"\Q\t\E"> is converted to C<quotemeta("\t")>, which is the same
+as C<"\\\t"> (since TAB is not alphanumeric). Note also that:
$str = '\t';
return "\Q$str";
may be closer to the conjectural I<intention> of the writer of C<"\Q\t\E">.
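The difference between the two spellings can be checked with a short sketch (the variable names here are illustrative assumptions, not taken from the documentation):

```perl
use strict;
use warnings;

# "\t" is expanded to a real TAB first, and quotemeta then escapes it.
my $tab_quoted = "\Q\t\E";

# Here the variable holds two characters, a backslash and a "t",
# and quotemeta escapes the backslash instead.
my $str        = '\t';
my $lit_quoted = "\Q$str";

print length($tab_quoted), "\n";   # 2 (backslash, TAB)
print "$lit_quoted\n";             # \\t
```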
-Interpolated scalars and arrays are internally converted to the C<join> and
-C<.> Perl operations, thus C<"$foo >>> '@arr'"> becomes:
+Interpolated scalars and arrays are converted internally to the C<join> and
+C<.> catenation operations. Thus, C<"$foo XXX '@arr'"> becomes:
- $foo . " >>> '" . (join $", @arr) . "'";
+ $foo . " XXX '" . (join $", @arr) . "'";
-All the operations in the above are performed simultaneously left-to-right.
+All operations above are performed simultaneously, left to right.
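A small sketch, using made-up sample values, suggests that the interpolated string and the hand-written C<join>/C<.> form agree:

```perl
use strict;
use warnings;

my $foo = "head";
my @arr = (1, 2, 3);

# What the programmer writes...
my $interpolated = "$foo XXX '@arr'";

# ...and roughly what it is converted to internally.
my $by_hand = $foo . " XXX '" . join($", @arr) . "'";

print "$interpolated\n";                                  # head XXX '1 2 3'
print $interpolated eq $by_hand ? "same\n" : "differ\n";  # same
```

The default value of C<$"> is a single space, which is why the array elements come out space-separated.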
-Since the result of "\Q STRING \E" has all the metacharacters quoted
-there is no way to insert a literal C<$> or C<@> inside a C<\Q\E> pair: if
-protected by C<\> C<$> will be quoted to became "\\\$", if not, it is
-interpreted as starting an interpolated scalar.
+Because the result of C<"\Q STRING \E"> has all metacharacters
+quoted, there is no way to insert a literal C<$> or C<@> inside a
+C<\Q\E> pair. If protected by C<\>, C<$> will be quoted to become
+C<"\\\$">; if not, it is interpreted as the start of an interpolated
+scalar.
-Note also that the interpolating code needs to make a decision on where the
-interpolated scalar ends. For instance, whether C<"a $b -E<gt> {c}"> means:
+Note also that the interpolation code needs to make a decision on
+where the interpolated scalar ends. For instance, whether
+C<"a $b -E<gt> {c}"> really means:
"a " . $b . " -> {c}";
@@ -1405,99 +1394,108 @@ or:
"a " . $b -> {c};
-I<Most of the time> the decision is to take the longest possible text which
-does not include spaces between components and contains matching
-braces/brackets. Since the outcome may be determined by I<voting> based
-on heuristic estimators, the result I<is not strictly predictable>, but
-is usually correct for the ambiguous cases.
+Most of the time, the decision is to take the longest possible text
+that does not include spaces between components and that contains
+matching braces or brackets. Because the outcome may be determined
+by voting based on heuristic estimators, the result is not strictly
+predictable. Fortunately, it's usually correct for ambiguous cases.
=item C<?RE?>, C</RE/>, C<m/RE/>, C<s/RE/foo/>,
-Processing of C<\Q>, C<\U>, C<\u>, C<\L>, C<\l> and interpolation happens
-(almost) as with C<qq//> constructs, but I<the substitution of C<\> followed by
-RE-special chars (including C<\>) is not performed>! Moreover,
-inside C<(?{BLOCK})>, C<(?# comment )>, and C<#>-comment of
-C<//x>-regular expressions no processing is performed at all.
-This is the first step where presence of the C<//x> switch is relevant.
-
-Interpolation has several quirks: C<$|>, C<$(> and C<$)> are not interpolated, and
-constructs C<$var[SOMETHING]> are I<voted> (by several different estimators)
-to be an array element or C<$var> followed by a RE alternative. This is
-the place where the notation C<${arr[$bar]}> comes handy: C</${arr[0-9]}/>
-is interpreted as an array element C<-9>, not as a regular expression from
-variable C<$arr> followed by a digit, which is the interpretation of
-C</$arr[0-9]/>. Since voting among different estimators may be performed,
-the result I<is not predictable>.
-
-It is on this step that C<\1> is converted to C<$1> in the replacement
-text of C<s///>.
-
-Note that absence of processing of C<\\> creates specific restrictions on the
-post-processed text: if the delimiter is C</>, one cannot get the combination
-C<\/> into the result of this step: C</> will finish the regular expression,
-C<\/> will be stripped to C</> on the previous step, and C<\\/> will be left
-as is. Since C</> is equivalent to C<\/> inside a regular expression, this
-does not matter unless the delimiter is a special character for the RE engine,
-as in C<s*foo*bar*>, C<m[foo]>, or C<?foo?>, or an alphanumeric char, as in:
+Processing of C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, and interpolation
+happens (almost) as with C<qq//> constructs, but the substitution
+of C<\> followed by RE-special chars (including C<\>) is not
+performed. Moreover, inside C<(?{BLOCK})>, C<(?# comment )>, and
+a C<#>-comment in a C<//x>-regular expression, no processing is
+performed whatsoever. This is the first step at which the presence
+of the C<//x> modifier is relevant.
+
+Interpolation has several quirks: C<$|>, C<$(>, and C<$)> are not
+interpolated, and constructs C<$var[SOMETHING]> are voted (by several
+different estimators) to be either an array element or C<$var>
+followed by an RE alternative. This is where the notation
+C<${arr[$bar]}> comes in handy: C</${arr[0-9]}/> is interpreted as
+array element C<-9>, not as a regular expression from the variable
+C<$arr> followed by a digit, which would be the interpretation of
+C</$arr[0-9]/>. Since voting among different estimators may occur,
+the result is not predictable.
+
+It is at this step that C<\1> is begrudgingly converted to C<$1> in
+the replacement text of C<s///> to correct the incorrigible
+I<sed> hackers who haven't picked up the saner idiom yet. A warning
+is emitted if the B<-w> command-line flag (that is, the C<$^W> variable)
+was set.
+
+The lack of processing of C<\\> creates specific restrictions on
+the post-processed text. If the delimiter is C</>, one cannot get
+the combination C<\/> into the result of this step. C</> will
+finish the regular expression, C<\/> will be stripped to C</> on
+the previous step, and C<\\/> will be left as is. Because C</> is
+equivalent to C<\/> inside a regular expression, this does not
+matter unless the delimiter happens to be a character special to the
+RE engine, such as in C<s*foo*bar*>, C<m[foo]>, or C<?foo?>; or an
+alphanumeric char, as in:
m m ^ a \s* b mmx;
-In the above RE, which is intentionally obfuscated for illustration, the
+In the RE above, which is intentionally obfuscated for illustration, the
delimiter is C<m>, the modifier is C<mx>, and after backslash-removal the
-RE is the same as for C<m/ ^ a s* b /mx>).
+RE is the same as for C<m/ ^ a \s* b /mx>. There's more than one
+reason you're encouraged to restrict your delimiters to non-alphanumeric,
+non-whitespace choices.
=back
-This step is the last one for all the constructs except regular expressions,
+This step is the last one for all constructs except regular expressions,
which are processed further.
=item Interpolation of regular expressions
-All the previous steps were performed during the compilation of Perl code,
-this one happens in run time (though it may be optimized to be calculated
-at compile time if appropriate). After all the preprocessing performed
-above (and possibly after evaluation if catenation, joining, up/down-casing
-and C<quotemeta()>ing are involved) the resulting I<string> is passed to RE
-engine for compilation.
-
-Whatever happens in the RE engine is better be discussed in L<perlre>,
-but for the sake of continuity let us do it here.
-
-This is another step where presence of the C<//x> switch is relevant.
-The RE engine scans the string left-to-right, and converts it to a finite
-automaton.
-
-Backslashed chars are either substituted by corresponding literal
-strings (as with C<\{>), or generate special nodes of the finite automaton
-(as with C<\b>). Characters which are special to the RE engine (such as
-C<|>) generate corresponding nodes or groups of nodes. C<(?#...)>
-comments are ignored. All the rest is either converted to literal strings
-to match, or is ignored (as is whitespace and C<#>-style comments if
-C<//x> is present).
-
-Note that the parsing of the construct C<[...]> is performed using
-rather different rules than for the rest of the regular expression.
-The terminator of this construct is found using the same rules as for
-finding a terminator of a C<{}>-delimited construct, the only exception
-being that C<]> immediately following C<[> is considered as if preceded
-by a backslash. Similarly, the terminator of C<(?{...})> is found using
-the same rules as for finding a terminator of a C<{}>-delimited construct.
-
-It is possible to inspect both the string given to RE engine, and the
-resulting finite automaton. See arguments C<debug>/C<debugcolor>
-of C<use L<re>> directive, and/or B<-Dr> option of Perl in
-L<perlrun/Switches>.
+Previous steps were performed during the compilation of Perl code,
+but this one happens at run time--although it may be optimized to
+be calculated at compile time if appropriate. After preprocessing
+described above, and possibly after evaluation if catenation,
+joining, casing translation, or metaquoting are involved, the
+resulting I<string> is passed to the RE engine for compilation.
+
+Whatever happens in the RE engine might be better discussed in L<perlre>,
+but for the sake of continuity, we shall do so here.
+
+This is another step where the presence of the C<//x> modifier is
+relevant. The RE engine scans the string from left to right and
+converts it to a finite automaton.
+
+Backslashed characters are either replaced with corresponding
+literal strings (as with C<\{>), or else they generate special nodes
+in the finite automaton (as with C<\b>). Characters special to the
+RE engine (such as C<|>) generate corresponding nodes or groups of
+nodes. C<(?#...)> comments are ignored. All the rest is either
+converted to literal strings to match, or else is ignored (as are
+whitespace and C<#>-style comments if C<//x> is present).
+
+Parsing of the bracketed character class construct, C<[...]>, is
+rather different than the rule used for the rest of the pattern.
+The terminator of this construct is found using the same rules as
+for finding the terminator of a C<{}>-delimited construct, the only
+exception being that C<]> immediately following C<[> is treated as
+though preceded by a backslash. Similarly, the terminator of
+C<(?{...})> is found using the same rules as for finding the
+terminator of a C<{}>-delimited construct.
+
+It is possible to inspect both the string given to the RE engine and the
+resulting finite automaton. See the arguments C<debug>/C<debugcolor>
+in the C<use L<re>> pragma, as well as Perl's B<-Dr> command-line
+switch documented in L<perlrun/Switches>.
=item Optimization of regular expressions
This step is listed for completeness only. Since it does not change
semantics, details of this step are not documented and are subject
-to change. This step is performed over the finite automaton generated
-during the previous pass.
+to change without notice. This step is performed over the finite
+automaton that was generated during the previous pass.
-However, in older versions of Perl C<L<split>> used to silently
-optimize C</^/> to mean C</^/m>. This behaviour, though present
-in current versions of Perl, may be deprecated in future.
+It is at this stage that C<split()> silently optimizes C</^/> to
+mean C</^/m>.
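The effect is easy to observe. In this sketch (the sample string is an assumption chosen for illustration), C<split> breaks the text at every line start even though no C</m> modifier is given:

```perl
use strict;
use warnings;

# Without the implicit /m, /^/ could only match at the very start of
# the string and the text would come back as a single element.
my @lines = split /^/, "a\nb\nc\n";

print scalar(@lines), "\n";   # 3
print $lines[1];              # b (with its trailing newline)
```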
=back
@@ -1506,39 +1504,40 @@ in current versions of Perl, may be deprecated in future.
There are several I/O operators you should know about.
A string enclosed by backticks (grave accents) first undergoes
-variable substitution just like a double quoted string. It is then
-interpreted as a command, and the output of that command is the value
-of the pseudo-literal, like in a shell. In scalar context, a single
-string consisting of all the output is returned. In list context,
-a list of values is returned, one for each line of output. (You can
-set C<$/> to use a different line terminator.) The command is executed
+double-quote interpolation. It is then interpreted as an external
+command, and the output of that command is the value of the
+pseudo-literal, like in a shell. In scalar context, a single
+string consisting of all output is returned. In list context, a
+list of values is returned, one per line of output. (You can set
+C<$/> to use a different line terminator.) The command is executed
each time the pseudo-literal is evaluated. The status value of the
command is returned in C<$?> (see L<perlvar> for the interpretation
of C<$?>). Unlike in B<csh>, no translation is done on the return
data--newlines remain newlines. Unlike in any of the shells, single
quotes do not hide variable names in the command from interpretation.
-To pass a $ through to the shell you need to hide it with a backslash.
-The generalized form of backticks is C<qx//>. (Because backticks
-always undergo shell expansion as well, see L<perlsec> for
-security concerns.)
-
-In a scalar context, evaluating a filehandle in angle brackets yields the
-next line from that file (newline, if any, included), or C<undef> at
-end-of-file. When C<$/> is set to C<undef> (i.e. file slurp mode),
-and the file is empty, it returns C<''> the first time, followed by
-C<undef> subsequently.
-
-Ordinarily you must assign the returned value to a variable, but there is one
-situation where an automatic assignment happens. I<If and ONLY if> the
-input symbol is the only thing inside the conditional of a C<while> or
-C<for(;;)> loop, the value is automatically assigned to the variable
-C<$_>. In these loop constructs, the assigned value (whether assignment
-is automatic or explicit) is then tested to see if it is defined.
-The defined test avoids problems where line has a string value
-that would be treated as false by perl e.g. "" or "0" with no trailing
-newline. (This may seem like an odd thing to you, but you'll use the
-construct in almost every Perl script you write.) Anyway, the following
-lines are equivalent to each other:
+To pass a literal dollar-sign through to the shell you need to hide
+it with a backslash. The generalized form of backticks is C<qx//>.
+(Because backticks always undergo shell expansion as well, see
+L<perlsec> for security concerns.)
+
+In scalar context, evaluating a filehandle in angle brackets yields
+the next line from that file (the newline, if any, included), or
+C<undef> at end-of-file or on error. When C<$/> is set to C<undef>
+(sometimes known as file-slurp mode) and the file is empty, it
+returns C<''> the first time, followed by C<undef> subsequently.
+
+Ordinarily you must assign the returned value to a variable, but
+there is one situation where an automatic assignment happens. If
+and only if the input symbol is the only thing inside the conditional
+of a C<while> statement (even if disguised as a C<for(;;)> loop),
+the value is automatically assigned to the global variable $_,
+destroying whatever was there previously. (This may seem like an
+odd thing to you, but you'll use the construct in almost every Perl
+script you write.) The $_ variable is not implicitly localized.
+You'll have to put a C<local $_;> before the loop if you want that
+to happen.
+
+The following lines are equivalent:
while (defined($_ = <STDIN>)) { print; }
while ($_ = <STDIN>) { print; }
@@ -1548,34 +1547,40 @@ lines are equivalent to each other:
print while ($_ = <STDIN>);
print while <STDIN>;
-and this also behaves similarly, but avoids the use of $_ :
+This also behaves similarly, but avoids $_ :
while (my $line = <STDIN>) { print $line }
-If you really mean such values to terminate the loop they should be
-tested for explicitly:
+In these loop constructs, the assigned value (whether assignment
+is automatic or explicit) is then tested to see whether it is
+defined. The defined test avoids problems where a line has a string
+value that would be treated as false by Perl, for example a "" or
+a "0" with no trailing newline. If you really mean for such values
+to terminate the loop, they should be tested for explicitly:
while (($_ = <STDIN>) ne '0') { ... }
while (<STDIN>) { last unless $_; ... }
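The implicit C<defined> test can be seen with a final line of C<"0"> that lacks a newline. This is only a sketch: it reads from an in-memory filehandle, which assumes a perl recent enough to support opening a scalar reference (5.8 or later), and the data is invented for the example.

```perl
use strict;
use warnings;

# The last "line" is a bare "0", which is false but defined.
my $data = "first\n0";
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";

my @seen;
while (my $line = <$fh>) {    # treated as defined(my $line = <$fh>)
    push @seen, $line;
}
close $fh;

print scalar(@seen), "\n";    # 2
```

Had the loop tested plain truth instead, the final C<"0"> would have ended it one line early.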
-In other boolean contexts, C<E<lt>I<filehandle>E<gt>> without explicit C<defined>
-test or comparison will solicit a warning if C<-w> is in effect.
+In other boolean contexts, C<E<lt>I<filehandle>E<gt>> without an
+explicit C<defined> test or comparison elicits a warning if the B<-w>
+command-line switch (the C<$^W> variable) is in effect.
The filehandles STDIN, STDOUT, and STDERR are predefined. (The
-filehandles C<stdin>, C<stdout>, and C<stderr> will also work except in
-packages, where they would be interpreted as local identifiers rather
-than global.) Additional filehandles may be created with the open()
-function. See L<perlfunc/open> for details on this.
+filehandles C<stdin>, C<stdout>, and C<stderr> will also work except
+in packages, where they would be interpreted as local identifiers
+rather than global.) Additional filehandles may be created with
+the open() function, amongst others. See L<perlopentut> and
+L<perlfunc/open> for details on this.
-If a E<lt>FILEHANDLEE<gt> is used in a context that is looking for a list, a
-list consisting of all the input lines is returned, one line per list
-element. It's easy to make a I<LARGE> data space this way, so use with
-care.
+If a E<lt>FILEHANDLEE<gt> is used in a context that is looking for
+a list, a list comprising all input lines is returned, one line per
+list element. It's easy to grow to a rather large data space this
+way, so use with care.
-E<lt>FILEHANDLEE<gt> may also be spelt readline(FILEHANDLE). See
-L<perlfunc/readline>.
+E<lt>FILEHANDLEE<gt> may also be spelled C<readline(*FILEHANDLE)>.
+See L<perlfunc/readline>.
-The null filehandle E<lt>E<gt> is special and can be used to emulate the
+The null filehandle E<lt>E<gt> is special: it can be used to emulate the
behavior of B<sed> and B<awk>. Input from E<lt>E<gt> comes either from
standard input, or from each file listed on the command line. Here's
how it works: the first time E<lt>E<gt> is evaluated, the @ARGV array is
@@ -1597,16 +1602,17 @@ is equivalent to the following Perl-like pseudo code:
}
}
-except that it isn't so cumbersome to say, and will actually work. It
-really does shift array @ARGV and put the current filename into variable
-$ARGV. It also uses filehandle I<ARGV> internally--E<lt>E<gt> is just a
-synonym for E<lt>ARGVE<gt>, which is magical. (The pseudo code above
-doesn't work because it treats E<lt>ARGVE<gt> as non-magical.)
+except that it isn't so cumbersome to say, and will actually work.
+It really does shift the @ARGV array and put the current filename
+into the $ARGV variable. It also uses filehandle I<ARGV>
+internally--E<lt>E<gt> is just a synonym for E<lt>ARGVE<gt>, which
+is magical. (The pseudo code above doesn't work because it treats
+E<lt>ARGVE<gt> as non-magical.)
You can modify @ARGV before the first E<lt>E<gt> as long as the array ends up
containing the list of filenames you really want. Line numbers (C<$.>)
-continue as if the input were one big happy file. (But see example
-under C<eof> for how to reset line numbers on each file.)
+continue as though the input were one big happy file. See the example
+in L<perlfunc/eof> for how to reset line numbers on each file.
If you want to set @ARGV to your own list of files, go right ahead.
This sets @ARGV to all plain text files if no @ARGV was given:
@@ -1634,12 +1640,13 @@ Getopts modules or put a loop on the front like this:
}
The E<lt>E<gt> symbol will return C<undef> for end-of-file only once.
-If you call it again after this it will assume you are processing another
-@ARGV list, and if you haven't set @ARGV, will input from STDIN.
+If you call it again after this, it will assume you are processing another
+@ARGV list, and if you haven't set @ARGV, will read input from STDIN.
-If the string inside the angle brackets is a reference to a scalar
-variable (e.g., E<lt>$fooE<gt>), then that variable contains the name of the
-filehandle to input from, or its typeglob, or a reference to the same. For example:
+If what the angle brackets contain is a simple scalar variable (e.g.,
+E<lt>$fooE<gt>), then that variable contains the name of the
+filehandle to input from, or its typeglob, or a reference to the
+same. For example:
$fh = \*STDIN;
$line = <$fh>;
@@ -1648,9 +1655,9 @@ If what's within the angle brackets is neither a filehandle nor a simple
scalar variable containing a filehandle name, typeglob, or typeglob
reference, it is interpreted as a filename pattern to be globbed, and
either a list of filenames or the next filename in the list is returned,
-depending on context. This distinction is determined on syntactic
-grounds alone. That means C<E<lt>$xE<gt>> is always a readline from
-an indirect handle, but C<E<lt>$hash{key}E<gt>> is always a glob.
+depending on context. This distinction is determined on syntactic
+grounds alone. That means C<E<lt>$xE<gt>> is always a readline() from
+an indirect handle, but C<E<lt>$hash{key}E<gt>> is always a glob().
That's because $x is a simple scalar variable, but C<$hash{key}> is
not--it's a hash element.
@@ -1660,7 +1667,7 @@ in the previous paragraph. (In older versions of Perl, programmers
would insert curly brackets to force interpretation as a filename glob:
C<E<lt>${foo}E<gt>>. These days, it's considered cleaner to call the
internal function directly as C<glob($foo)>, which is probably the right
-way to have done it in the first place.) Example:
+way to have done it in the first place.) For example:
while (<*.c>) {
chmod 0644, $_;
@@ -1674,27 +1681,31 @@ is equivalent to
chmod 0644, $_;
}
-In fact, it's currently implemented that way. (Which means it will not
-work on filenames with spaces in them unless you have csh(1) on your
-machine.) Of course, the shortest way to do the above is:
+In fact, it's currently implemented that way, but this is expected
+to be made completely internal in the near future. (Which means
+it will not work on filenames with spaces in them unless you have
+csh(1) on your machine.) Of course, the shortest way to do the
+above is:
chmod 0644, <*.c>;
-Because globbing invokes a shell, it's often faster to call readdir() yourself
-and do your own grep() on the filenames. Furthermore, due to its current
-implementation of using a shell, the glob() routine may get "Arg list too
-long" errors (unless you've installed tcsh(1L) as F</bin/csh>).
-
-A glob evaluates its (embedded) argument only when it is starting a new
-list. All values must be read before it will start over. In a list
-context this isn't important, because you automatically get them all
-anyway. In scalar context, however, the operator returns the next value
-each time it is called, or a C<undef> value if you've just run out. As
-for filehandles an automatic C<defined> is generated when the glob
-occurs in the test part of a C<while> or C<for> - because legal glob returns
-(e.g. a file called F<0>) would otherwise terminate the loop.
-Again, C<undef> is returned only once. So if you're expecting a single value
-from a glob, it is much better to say
+Because globbing currently invokes a shell, it's often faster to
+call readdir() yourself and do your own grep() on the filenames.
+Furthermore, due to its current implementation of using a shell,
+the glob() routine may get "Arg list too long" errors (unless you've
+installed tcsh(1L) as F</bin/csh> or hacked your F<config.sh>).
+
+A (file)glob evaluates its (embedded) argument only when it is
+starting a new list. All values must be read before it will start
+over. In list context, this isn't important because you automatically
+get them all anyway. However, in scalar context the operator returns
+the next value each time it's called, or C<undef> when the list has
+run out. As with filehandle reads, an automatic C<defined> is
+generated when the glob occurs in the test part of a C<while>,
+because legal glob returns (e.g. a file called F<0>) would otherwise
+terminate the loop. Again, C<undef> is returned only once. So if
+you're expecting a single value from a glob, it is much better to
+say
($file) = <blurch*>;
@@ -1703,7 +1714,7 @@ than
$file = <blurch*>;
because the latter will alternate between returning a filename and
-returning FALSE.
+returning false.
If you're trying to do variable interpolation, it's definitely better
to use the glob() function, because the older notation can cause people
@@ -1715,10 +1726,10 @@ to become confused with the indirect filehandle notation.
=head2 Constant Folding
Like C, Perl does a certain amount of expression evaluation at
-compile time, whenever it determines that all arguments to an
+compile time whenever it determines that all arguments to an
operator are static and have no side effects. In particular, string
concatenation happens at compile time between literals that don't do
-variable substitution. Backslash interpretation also happens at
+variable substitution. Backslash interpolation also happens at
compile time. You can say
'Now is the time for all' . "\n" .
@@ -1731,20 +1742,20 @@ you say
if (-s $file > 5 + 100 * 2**16) { }
}
-the compiler will precompute the number that
-expression represents so that the interpreter
-won't have to.
+the compiler will precompute the number which that expression
+represents so that the interpreter won't have to.
=head2 Bitwise String Operators
Bitstrings of any size may be manipulated by the bitwise operators
(C<~ | & ^>).
-If the operands to a binary bitwise op are strings of different sizes,
-B<|> and B<^> ops will act as if the shorter operand had additional
-zero bits on the right, while the B<&> op will act as if the longer
-operand were truncated to the length of the shorter. Note that the
-granularity for such extension or truncation is one or more I<bytes>.
+If the operands to a binary bitwise op are strings of different
+sizes, B<|> and B<^> ops act as though the shorter operand had
+additional zero bits on the right, while the B<&> op acts as though
+the longer operand were truncated to the length of the shorter.
+The granularity for such extension or truncation is one or more
+bytes.
# ASCII-based examples
print "j p \n" ^ " a h"; # prints "JAPH\n"
@@ -1752,9 +1763,9 @@ granularity for such extension or truncation is one or more I<bytes>.
print "japh\nJunk" & '_____'; # prints "JAPH\n";
print 'p N$' ^ " E<H\n"; # prints "Perl\n";
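The padding and truncation rules can also be seen with operands of unequal length. The following is a minimal ASCII-based sketch; it assumes the C<bitwise> feature of much later perls is not enabled, since that feature changes how these operators treat strings.

```perl
use strict;
use warnings;

# "|" pads the shorter operand with zero bits on the right,
# so only the first character is changed.
my $or  = "JAPH" | " ";

# "&" truncates the longer operand to the length of the shorter.
my $and = "JAPH" & "_";

print "$or\n";              # jAPH
print length($and), "\n";   # 1
```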
-If you are intending to manipulate bitstrings, you should be certain that
+If you are intending to manipulate bitstrings, be certain that
you're supplying bitstrings: If an operand is a number, that will imply
-a B<numeric> bitwise operation. You may explicitly show which type of
+a B<numeric> bitwise operation. You may explicitly show which type of
operation you intend by using C<""> or C<0+>, as in the examples below.
$foo = 150 | 105 ; # yields 255 (0x96 | 0x69 is 0xFF)
@@ -1770,33 +1781,39 @@ in a bit vector.
=head2 Integer Arithmetic
-By default Perl assumes that it must do most of its arithmetic in
+By default, Perl assumes that it must do most of its arithmetic in
floating point. But by saying
use integer;
you may tell the compiler that it's okay to use integer operations
-from here to the end of the enclosing BLOCK. An inner BLOCK may
-countermand this by saying
+(if it feels like it) from here to the end of the enclosing BLOCK.
+An inner BLOCK may countermand this by saying
no integer;
-which lasts until the end of that BLOCK.
-
-The bitwise operators ("&", "|", "^", "~", "<<", and ">>") always
-produce integral results. (But see also L<Bitwise String Operators>.)
-However, C<use integer> still has meaning
-for them. By default, their results are interpreted as unsigned
-integers. However, if C<use integer> is in effect, their results are
-interpreted as signed integers. For example, C<~0> usually evaluates
-to a large integral value. However, C<use integer; ~0> is -1 on twos-complement machines.
+which lasts until the end of that BLOCK. Note that this doesn't
+mean everything is only an integer, merely that Perl may use integer
+operations if it is so inclined. For example, even under C<use
+integer>, if you take C<sqrt(2)>, you'll still get C<1.4142135623731>
+or so.
+
+Used on numbers, the bitwise operators ("&", "|", "^", "~", "<<",
+and ">>") always produce integral results. (But see also L<Bitwise
+String Operators>.) However, C<use integer> still has meaning for
+them. By default, their results are interpreted as unsigned integers, but
+if C<use integer> is in effect, their results are interpreted
+as signed integers. For example, C<~0> usually evaluates to a large
+integral value. However, C<use integer; ~0> is C<-1> on twos-complement
+machines.
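A short sketch of the scoping and of the signed interpretation follows. The variable names are illustrative, and the C<-1> result assumes a twos-complement machine, as noted above.

```perl
use strict;
use warnings;

my $unsigned = ~0;          # large positive value by default
my $float    = 10 / 3;      # ordinary floating-point division

my ($signed, $quotient);
{
    use integer;            # in effect until the end of this block
    $signed   = ~0;         # -1 on twos-complement machines
    $quotient = 10 / 3;     # 3
}

print "$signed $quotient\n";                 # -1 3
print $unsigned > 0 ? "positive\n" : "?\n";  # positive
```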
=head2 Floating-point Arithmetic
While C<use integer> provides integer-only arithmetic, there is no
-similar ways to provide rounding or truncation at a certain number of
-decimal places. For rounding to a certain number of digits, sprintf()
-or printf() is usually the easiest route.
+analogous mechanism to provide automatic rounding or truncation to a
+certain number of decimal places. For rounding to a certain number
+of digits, sprintf() or printf() is usually the easiest route.
+See L<perlfaq4>.
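For instance, rounding for display might look like the sketch below (the value is an arbitrary sample). Note that the result of sprintf() is a string, and that the stored number itself is left unchanged.

```perl
use strict;
use warnings;

my $value   = 3.14159;
my $rounded = sprintf("%.2f", $value);   # round to two decimal places

print "$rounded\n";   # 3.14
```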
Floating-point numbers are only approximations to what a mathematician
would call real numbers. There are infinitely more reals than floats,
@@ -1820,10 +1837,10 @@ this topic.
}
The POSIX module (part of the standard perl distribution) implements
-ceil(), floor(), and a number of other mathematical and trigonometric
-functions. The Math::Complex module (part of the standard perl
-distribution) defines a number of mathematical functions that can also
-work on real numbers. Math::Complex not as efficient as POSIX, but
+ceil(), floor(), and other mathematical and trigonometric functions.
+The Math::Complex module (part of the standard perl distribution)
+defines mathematical functions that work on both the reals and the
+imaginary numbers. Math::Complex is not as efficient as POSIX, but
POSIX can't work with complex numbers.
Rounding in financial applications can have serious implications, and
@@ -1835,13 +1852,17 @@ need yourself.
=head2 Bigger Numbers
The standard Math::BigInt and Math::BigFloat modules provide
-variable precision arithmetic and overloaded operators.
-At the cost of some space and considerable speed, they
-avoid the normal pitfalls associated with limited-precision
-representations.
+variable-precision arithmetic and overloaded operators, although
+they're currently pretty slow. At the cost of some space and
+considerable speed, they avoid the normal pitfalls associated with
+limited-precision representations.
use Math::BigInt;
$x = Math::BigInt->new('123456789123456789');
print $x * $x;
# prints +15241578780673678515622620750190521
+
+The non-standard modules SSLeay::BN and Math::Pari provide
+equivalent functionality (and much more) with substantial
+performance savings.