author: Yves Orton <demerphq@gmail.com> 2023-02-10 08:20:14 +0100
committer: Yves Orton <demerphq@gmail.com> 2023-02-12 22:10:05 +0800
commit: 2a51adc4c2afa95102ece84fb829fa1ec01aee0e
tree: 62399d689d79cf708ac027ab6c746838917c72ab /pod/perlvar.pod
parent: 09fbd16e8a0e26ef4ac3790a6f67ec8dbdaa8720
perlvar.pod - add "Scoping Rules of Regex Variables" section
This section is used to document the majority of the regex variables, and previous language referring to them as "dynamically scoped" has been changed or simplified and a link to the new section provided to centralize the explanation.

Strictly speaking $1 is globally scoped, but the data it accesses is dynamically scoped such that successful matches behave as though they localize a regex match state variable. (Maybe one day we will actually have such a variable exposed to the user.)

This patch adds links to the relevant docs in perlsyn and perlvar to various places where it seemed appropriate, and also cleans up the wording for most cases to be similar or identical across all uses. It also cleans up a bit of related language in nearby paragraphs where it seemed to improve the readability of the docs.

It also replaces the older, somewhat confusing example code for understanding the behavior and documents that "goto LABEL" does not play nicely with the dynamic scoping.

This fixes GitHub Issue #899.
Diffstat (limited to 'pod/perlvar.pod')
-rw-r--r-- pod/perlvar.pod 205
1 files changed, 128 insertions, 77 deletions
diff --git a/pod/perlvar.pod b/pod/perlvar.pod
index 86ea6fc9cc..489f577894 100644
--- a/pod/perlvar.pod
+++ b/pod/perlvar.pod
@@ -888,49 +888,77 @@ command or referenced as a file.
=head2 Variables related to regular expressions
Most of the special variables related to regular expressions are side
-effects. Perl sets these variables when it has a successful match, so
-you should check the match result before using them. For instance:
+effects. Perl sets these variables when it has completed a match
+successfully, so you should check the match result before using them.
+For instance:
     if( /P(A)TT(ER)N/ ) {
         print "I found $1 and $2\n";
     }
-These variables are read-only and dynamically-scoped, unless we note
-otherwise.
+These variables are read-only and behave similarly to a dynamically
+scoped variable, with only a few exceptions which are explicitly
+documented as behaving otherwise. See the following section for more
+details.
-The dynamic nature of the regular expression variables means that
-their value is limited to the block that they are in, as demonstrated
-by this bit of code:
+=head3 Scoping Rules of Regex Variables
- my $outer = 'Wallace and Grommit';
- my $inner = 'Mutt and Jeff';
+Regular expression variables allow the programmer to access the state of
+the most recent I<successful> regex match in the current dynamic scope.
- my $pattern = qr/(\S+) and (\S+)/;
+The variables themselves are global and unscoped, but the data they
+access is scoped similarly to dynamically scoped variables, in that
+every successful match behaves as though it localizes a global state
+object to the current block or file scope.
+(See L<perlsyn/"Compound Statements"> for more details on dynamic
+scoping and the C<local> keyword.)
- sub show_n { print "\$1 is $1; \$2 is $2\n" }
+A I<successful match> includes any successful match performed by the
+search and replace operator C<s///> as well as those performed by the
+C<m//> operator.
- {
- OUTER:
- show_n() if $outer =~ m/$pattern/;
+Consider the following code:
- INNER: {
- show_n() if $inner =~ m/$pattern/;
- }
+    my @state;
+    sub matchit {
+        push @state, $1;            # pushes "baz"
+        my $str = shift;
+        $str =~ /(zat)/;            # matches "zat"
+        push @state, $1;            # pushes "zat"
+    }
- show_n();
+    {
+        my $str = "foo bar baz blorp zat";
+        $str =~ /(foo)/;            # matches "foo"
+        push @state, $1;            # pushes "foo"
+        {
+            $str =~ /(pizza)/;      # does NOT match
+            push @state, $1;        # pushes "foo"
+            $str =~ /(bar)/;        # matches "bar"
+            push @state, $1;        # pushes "bar"
+            $str =~ /(baz)/;        # matches "baz"
+            matchit($str);          # see above
+            push @state, $1;        # pushes "baz"
+        }
+        $str =~ s/noodles/rice/;    # does NOT match
+        push @state, $1;            # pushes "foo"
+        $str =~ s/(blorp)/zwoop/;   # matches "blorp"
+        push @state, $1;            # pushes "blorp"
     }
+    # the following prints "foo, foo, bar, baz, zat, baz, foo, blorp"
+    print join ", ", @state;
-The output shows that while in the C<OUTER> block, the values of C<$1>
-and C<$2> are from the match against C<$outer>. Inside the C<INNER>
-block, the values of C<$1> and C<$2> are from the match against
-C<$inner>, but only until the end of the block (i.e. the dynamic
-scope). After the C<INNER> block completes, the values of C<$1> and
-C<$2> return to the values for the match against C<$outer> even though
-we have not made another match:
+Notice that each successful match in the same scope overrides the
+match context of the previous successful match, but that unsuccessful
+matches do not. Also note that in an inner nested scope the previous
+state from an outer dynamic scope persists until it has been overridden
+by another successful match, and that when the inner nested scope
+exits, whatever match context was in effect before the inner
+successful match is restored.
- $1 is Wallace; $2 is Grommit
- $1 is Mutt; $2 is Jeff
- $1 is Wallace; $2 is Grommit
+It is a known issue that C<goto LABEL> may interact poorly with the
+dynamically scoped match context. This may not be fixable, and is
+considered to be one of many good reasons to avoid C<goto LABEL>.
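The scoping rules described in this hunk can be exercised with a smaller script than the one in the patch. This is only a sketch: the names `$str`, `@seen` and `$copy` are illustrative and do not appear in perlvar.

```perl
use strict;
use warnings;

my $str = "alpha beta";
my @seen;

$str =~ /(alpha)/;            # successful match in the outer scope
push @seen, $1;               # "alpha"
{
    push @seen, $1;           # still "alpha": the outer state is visible here
    $str =~ /(beta)/;         # successful match inside the nested block
    push @seen, $1;           # "beta"
    $str =~ /(gamma)/;        # fails, so the match state is left untouched
    push @seen, $1;           # still "beta"
}
push @seen, $1;               # "alpha" again: the inner state was discarded

# A successful s/// counts as a successful match as well.
(my $copy = $str) =~ s/(be)ta/$1/;
push @seen, $1;               # "be"

print join(", ", @seen), "\n";
```

The failed match against `gamma` and the block exit are the two cases most likely to surprise: neither resets `$1` to undef, they simply leave or restore an earlier match state.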
=head3 Performance issues
@@ -984,14 +1012,14 @@ find uses of these problematic match variables in your code.
X<$1> X<$2> X<$3> X<$I<digits>>
Contains the subpattern from the corresponding set of capturing
-parentheses from the last successful pattern match, not counting patterns
-matched in nested blocks that have been exited already.
+parentheses from the last successful pattern match in the current
+dynamic scope. (See L</Scoping Rules of Regex Variables>.)
Note there is a distinction between a capture buffer which matches
the empty string a capture buffer which is optional. Eg, C<(x?)> and
C<(x)?> The latter may be undef, the former not.
-These variables are read-only and dynamically-scoped.
+These variables are read-only.
Mnemonic: like \digits.
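The difference between `(x?)` and `(x)?` mentioned in this hunk is easy to check directly. A minimal sketch, with `$inside` and `$outside` as illustrative names only:

```perl
use strict;
use warnings;

# (x?) always takes part in the match, so $1 is defined (possibly empty).
my $inside  = "y" =~ /(x?)y/ ? $1 : "no match";

# (x)? may be skipped entirely, which leaves $1 undefined.
my $outside = "y" =~ /(x)?y/ ? $1 : "no match";

printf "inside:  %s\n", defined $inside  ? "'$inside'"  : "undef";
printf "outside: %s\n", defined $outside ? "'$outside'" : "undef";
```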
@@ -1031,14 +1059,13 @@ This variable was added in 5.25.7
=item $&
X<$&> X<$MATCH>
-The string matched by the last successful pattern match (not counting
-any matches hidden within a BLOCK or C<eval()> enclosed by the current
-BLOCK).
+The string matched by the last successful pattern match.
+(See L</Scoping Rules of Regex Variables>.)
See L</Performance issues> above for the serious performance implications
of using this variable (even once) in your code.
-This variable is read-only and dynamically-scoped.
+This variable is read-only, and its value is dynamically scoped.
Mnemonic: like C<&> in some editors.
@@ -1057,7 +1084,7 @@ C<${^MATCH}> does the same thing as C<$MATCH>.
This variable was added in Perl v5.10.0.
-This variable is read-only and dynamically-scoped.
+This variable is read-only, and its value is dynamically scoped.
=item $PREMATCH
@@ -1065,13 +1092,12 @@ This variable is read-only and dynamically-scoped.
X<$`> X<$PREMATCH> X<${^PREMATCH}>
The string preceding whatever was matched by the last successful
-pattern match, not counting any matches hidden within a BLOCK or C<eval>
-enclosed by the current BLOCK.
+pattern match. (See L</Scoping Rules of Regex Variables>.)
See L</Performance issues> above for the serious performance implications
of using this variable (even once) in your code.
-This variable is read-only and dynamically-scoped.
+This variable is read-only, and its value is dynamically scoped.
Mnemonic: C<`> often precedes a quoted string.
@@ -1090,7 +1116,7 @@ C<${^PREMATCH}> does the same thing as C<$PREMATCH>.
This variable was added in Perl v5.10.0.
-This variable is read-only and dynamically-scoped.
+This variable is read-only, and its value is dynamically scoped.
=item $POSTMATCH
@@ -1098,8 +1124,7 @@ This variable is read-only and dynamically-scoped.
X<$'> X<$POSTMATCH> X<${^POSTMATCH}> X<@->
The string following whatever was matched by the last successful
-pattern match (not counting any matches hidden within a BLOCK or C<eval()>
-enclosed by the current BLOCK). Example:
+pattern match. (See L</Scoping Rules of Regex Variables>.) Example:
     local $_ = 'abcdefghi';
     /def/;
@@ -1108,7 +1133,7 @@ enclosed by the current BLOCK). Example:
See L</Performance issues> above for the serious performance implications
of using this variable (even once) in your code.
-This variable is read-only and dynamically-scoped.
+This variable is read-only, and its value is dynamically scoped.
Mnemonic: C<'> often follows a quoted string.
@@ -1127,7 +1152,7 @@ C<${^POSTMATCH}> does the same thing as C<$POSTMATCH>.
This variable was added in Perl v5.10.0.
-This variable is read-only and dynamically-scoped.
+This variable is read-only, and its value is dynamically scoped.
=item $LAST_PAREN_MATCH
@@ -1135,7 +1160,8 @@ This variable is read-only and dynamically-scoped.
X<$+> X<$LAST_PAREN_MATCH>
The text matched by the highest used capture group of the last
-successful search pattern. It is logically equivalent to the highest
+successful search pattern. (See L</Scoping Rules of Regex Variables>.)
+It is logically equivalent to the highest
numbered capture variable (C<$1>, C<$2>, ...) which has a defined value.
This is useful if you don't know which one of a set of alternative patterns
@@ -1143,7 +1169,7 @@ matched. For example:
     /Version: (.*)|Revision: (.*)/ && ($rev = $+);
-This variable is read-only and dynamically-scoped.
+This variable is read-only, and its value is dynamically scoped.
Mnemonic: be positive and forward looking.
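As a rough illustration of the `$+` example in this hunk, the following sketch runs the same pattern over two inputs. The input strings and the `@revs` array are made up for the demonstration.

```perl
use strict;
use warnings;

my @revs;
for my $line ("Version: 1.2", "Revision: 42") {
    # Only one alternative matches, so only one of $1 and $2 is defined;
    # $+ holds the text of whichever group actually captured.
    if ($line =~ /Version: (.*)|Revision: (.*)/) {
        my $rev = $+;
        push @revs, $rev;
    }
}
print join(", ", @revs), "\n";
```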
@@ -1153,8 +1179,11 @@ Mnemonic: be positive and forward looking.
X<$^N> X<$LAST_SUBMATCH_RESULT>
The text matched by the used group most-recently closed (i.e. the group
-with the rightmost closing parenthesis) of the last successful search
-pattern. This is subtly different from C<$+>. For example in
+with the rightmost closing parenthesis) of the last successful match.
+(See L</Scoping Rules of Regex Variables>.)
+
+
+This is subtly different from C<$+>. For example in
"ab" =~ /^((.)(.))$/
@@ -1173,6 +1202,8 @@ recently matched. For example, to effectively capture text to a variable
Setting and then using C<$var> in this way relieves you from having to
worry about exactly which numbered set of parentheses they are.
+This variable is read-only, and its value is dynamically scoped.
+
This variable was added in Perl v5.8.0.
Mnemonic: the (possibly) Nested parenthesis that most recently closed.
@@ -1183,15 +1214,24 @@ Mnemonic: the (possibly) Nested parenthesis that most recently closed.
X<@+> X<@LAST_MATCH_END>
This array holds the offsets of the ends of the last successful
-submatches in the currently active dynamic scope. C<$+[0]> is
-the offset into the string of the end of the entire match. This
-is the same value as what the C<pos> function returns when called
-on the variable that was matched against. The I<n>th element
-of this array holds the offset of the I<n>th submatch, so
-C<$+[1]> is the offset past where C<$1> ends, C<$+[2]> the offset
-past where C<$2> ends, and so on. You can use C<$#+> to determine
-how many subgroups were in the last successful match. See the
-examples given for the C<@-> variable.
+match and any matching capture buffers that the pattern contains.
+(See L</Scoping Rules of Regex Variables>.)
+
+The number of elements it contains will be one more than the number
+of capture buffers in the pattern, regardless of which capture buffers
+actually matched. You can use this to determine how many capture
+buffers there are in the pattern. (As opposed to C<@-> which may
+have fewer elements.)
+
+C<$+[0]> is the offset into the string of the end of the entire match.
+This is the same value as what the C<pos> function returns when called
+on the variable that was matched against. The I<n>th element of this
+array holds the offset of the I<n>th submatch, so C<$+[1]> is the offset
+past where C<$1> ends, C<$+[2]> the offset past where C<$2> ends, and so
+on. You can use C<$#+> to determine how many subgroups were in the last
+successful match. See the examples given for the C<@-> variable.
+
+This variable is read-only, and its value is dynamically scoped.
This variable was added in Perl v5.6.0.
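The element counts of `@+` and `@-` described in this hunk differ when a trailing group takes no part in the match. A small sketch, with hypothetical variable names:

```perl
use strict;
use warnings;

my $text = "a";
$text =~ /(a)(b)?/ or die "expected a match";

# Group 2 exists in the pattern but took no part in the match.
my $groups_in_pattern  = $#+;   # counts every group in the pattern
my $last_matched_group = $#-;   # stops at the last group that matched
my ($start, $end)      = ($-[0], $+[0]);

print "groups in pattern:  $groups_in_pattern\n";
print "last matched group: $last_matched_group\n";
print "whole match spans offsets $start to $end\n";
```

Here `$#+` is 2 and `$#-` is 1, so `@+` has three elements while `@-` has two.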
@@ -1204,7 +1244,7 @@ X<%+> X<%LAST_PAREN_MATCH> X<%{^CAPTURE}>
Similar to C<@+>, the C<%+> hash allows access to the named capture
buffers, should they exist, in the last successful match in the
-currently active dynamic scope.
+currently active dynamic scope. (See L</Scoping Rules of Regex Variables>.)
For example, C<$+{foo}> is equivalent to C<$1> after the following match:
@@ -1228,27 +1268,33 @@ surprising.
This variable was added in Perl v5.10.0. The C<%{^CAPTURE}> alias was
added in 5.25.7.
-This variable is read-only and dynamically-scoped.
+This variable is read-only, and its value is dynamically scoped.
=item @LAST_MATCH_START
=item @-
X<@-> X<@LAST_MATCH_START>
+This array holds the offsets of the beginnings of the last successful
+match and any capture buffers it contains.
+(See L</Scoping Rules of Regex Variables>.)
+
+The number of elements it contains will be one more than the number
+of the highest capture buffer (also called a subgroup) that actually
+matched something. (As opposed to C<@+> which may have more elements.)
+
C<$-[0]> is the offset of the start of the last successful match.
C<$-[I<n>]> is the offset of the start of the substring matched by
I<n>-th subpattern, or undef if the subpattern did not match.
-Thus, after a match against C<$_>, C<$&> coincides with C<substr $_, $-[0],
-$+[0] - $-[0]>. Similarly, $I<n> coincides with C<substr $_, $-[n],
-$+[n] - $-[n]> if C<$-[n]> is defined, and $+ coincides with
-C<substr $_, $-[$#-], $+[$#-] - $-[$#-]>. One can use C<$#-> to find the
-last matched subgroup in the last successful match. Contrast with
-C<$#+>, the number of subgroups in the regular expression. Compare
-with C<@+>.
+Thus, after a match against C<$_>, C<$&> coincides with
+C<substr $_, $-[0], $+[0] - $-[0]>. Similarly, C<$I<n>> coincides
+with C<substr $_, $-[n], $+[n] - $-[n]> if C<$-[n]> is defined, and
+C<$+> coincides with C<substr $_, $-[$#-], $+[$#-] - $-[$#-]>.
+One can use C<$#-> to find the last matched subgroup in the last
+successful match. Contrast with C<$#+>, the number of subgroups
+in the regular expression.
-This array holds the offsets of the beginnings of the last
-successful submatches in the currently active dynamic scope.
C<$-[0]> is the offset into the string of the beginning of the
entire match. The I<n>th element of this array holds the offset
of the I<n>th submatch, so C<$-[1]> is the offset where C<$1>
@@ -1272,6 +1318,8 @@ After a match against some variable C<$var>:
=back
+This variable is read-only, and its value is dynamically scoped.
+
This variable was added in Perl v5.6.0.
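The `substr` identities given for `@-` and `@+` can be verified with a short script. This is a sketch under the assumption of a simple two-group pattern; `$var`, `$whole`, `$first` and `$second` are illustrative names.

```perl
use strict;
use warnings;

my $var = "The quick brown fox";
$var =~ /(quick) (brown)/ or die "expected a match";

# Each substring is rebuilt purely from the start and end offsets.
my $whole  = substr($var, $-[0], $+[0] - $-[0]);   # same text as $&
my $first  = substr($var, $-[1], $+[1] - $-[1]);   # same text as $1
my $second = substr($var, $-[2], $+[2] - $-[2]);   # same text as $2

print "$whole | $first | $second\n";
```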
=item %{^CAPTURE_ALL}
@@ -1280,12 +1328,12 @@ X<%{^CAPTURE_ALL}>
=item %-
X<%->
-Similar to C<%+>, this variable allows access to the named capture groups
-in the last successful match in the currently active dynamic scope. To
-each capture group name found in the regular expression, it associates a
-reference to an array containing the list of values captured by all
-buffers with that name (should there be several of them), in the order
-where they appear.
+Similar to C<%+>, this variable allows access to the named capture
+groups in the last successful match in the currently active dynamic
+scope. (See L</Scoping Rules of Regex Variables>.) To each capture group
+name found in the regular expression, it associates a reference to an
+array containing the list of values captured by all buffers with that
+name (should there be several of them), in the order where they appear.
Here's an example:
@@ -1319,12 +1367,12 @@ B<Note:> C<%-> and C<%+> are tied views into a common internal hash
associated with the last successful regular expression. Therefore mixing
iterative access to them via C<each> may have unpredictable results.
Likewise, if the last successful match changes, then the results may be
-surprising.
+surprising. See L</Scoping Rules of Regex Variables>.
This variable was added in Perl v5.10.0. The C<%{^CAPTURE_ALL}> alias was
added in 5.25.7.
-This variable is read-only and dynamically-scoped.
+This variable is read-only, and its value is dynamically scoped.
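To see how `%-` differs from `%+` when a name is reused, consider this sketch. The group name `x` and the variables `$leftmost` and `@all` are chosen only for illustration.

```perl
use strict;
use warnings;

"ab" =~ /(?<x>a)(?<x>b)/ or die "expected a match";

my $leftmost = $+{x};        # %+ yields the leftmost defined group named x
my @all      = @{ $-{x} };   # %- yields every group named x, in order

print "leftmost: $leftmost\n";
print "all:      @all\n";
```

Copying the values out straight after the match, as done here, avoids depending on the tied hashes once a later successful match has replaced the match state.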
=item $LAST_REGEXP_CODE_RESULT
@@ -1332,7 +1380,10 @@ This variable is read-only and dynamically-scoped.
X<$^R> X<$LAST_REGEXP_CODE_RESULT>
The result of evaluation of the last successful C<(?{ code })>
-regular expression assertion (see L<perlre>). May be written to.
+regular expression assertion (see L<perlre>).
+
+This variable may be written to, and its value is scoped normally,
+unlike most other regex variables.
This variable was added in Perl 5.005.