author Aaron Crane <arc@cpan.org> 2016-11-19 13:28:02 +0000
committer Aaron Crane <arc@cpan.org> 2016-11-19 13:28:02 +0000
commit 2db681f78770646c1a96a319010ae10d7c6b1295 (patch)
tree aa1a90b0b8459484563c260a719e75358f80ddd2
parent 147e38468b8279e26a0ca11e4efd8492016f2702 (diff)
parent ff8bb4687895e07f822f5227d573c967aa0a4524 (diff)
download perl-2db681f78770646c1a96a319010ae10d7c6b1295.tar.gz
Merge branch 'perlre-tidy' into blead
This branch makes assorted cleanups to pod/perlre.pod. In particular, it no longer claims that long-established, stable regex constructs like (?:pat) might stop working in the future.
-rw-r--r-- pod/perldiag.pod 2
-rw-r--r-- pod/perlre.pod 50
2 files changed, 33 insertions, 19 deletions
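The perlre hunk for C<split> contrasts C<(?:a|b|c)> with C<(a|b|c)>. A minimal illustrative sketch of that difference (not from the commit itself), assuming a made-up input string and adding C<\s*> around the delimiter (not in the documentation's pattern) so the fields come out without surrounding spaces:

```perl
use strict;
use warnings;

# Invented sample input for illustration.
my $str = "x a y b z";

# Non-capturing group: split returns only the text between delimiters.
my @plain = split /\s*\b(?:a|b|c)\b\s*/, $str;

# Capturing group: split also returns each captured delimiter as a field.
my @with_delims = split /\s*\b(a|b|c)\b\s*/, $str;

print join("|", @plain), "\n";          # x|y|z
print join("|", @with_delims), "\n";    # x|a|y|b|z
```

Both patterns match the same delimiters; only the capturing version adds them to the returned list.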
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index 89ad1474d1..c0a717ccc1 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -7214,7 +7214,7 @@ front of your variable.
(F) Lookbehind is allowed only for subexpressions whose length is fixed and
known at compile time. For positive lookbehind, you can use the C<\K>
regex construct as a way to get the equivalent functionality. See
-L<perlre/(?<=pattern) \K>.
+L<(?<=pattern) and \K in perlre|perlre/\K>.
There are non-obvious Unicode rules under C</i> that can match variably,
but which you might not think could. For example, the substring C<"ss">
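The perldiag entry points to C<\K> as a substitute for positive lookbehind when the lookbehind cannot be fixed-width. A small illustrative sketch, with an invented input string: everything matched before C<\K> is left out of C<$&> and out of the text that a substitution replaces.

```perl
use strict;
use warnings;

# Invented sample input for illustration.
my $s = "price: 42, cost: 7";

# A lookbehind like (?<=\w+: ) is rejected at compile time because \w+
# has no fixed length; \K keeps the "\w+: " part out of the match instead.
my @nums;
while ($s =~ /\w+: \K\d+/g) {
    push @nums, $&;    # only the digits after \K
}
print "@nums\n";       # 42 7

# In a substitution, only the text after \K is replaced.
(my $t = $s) =~ s/cost: \K\d+/8/;
print "$t\n";          # price: 42, cost: 8
```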
diff --git a/pod/perlre.pod b/pod/perlre.pod
index 0e3928cab3..6f0c5e2a33 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -1056,12 +1056,6 @@ pair of parentheses with a question mark as the first thing within
the parentheses. The character after the question mark indicates
the extension.
-The stability of these extensions varies widely. Some have been
-part of the core language for many years. Others are experimental
-and may change without warning or be completely removed. Check
-the documentation on an individual feature to verify its current
-status.
-
A question mark was chosen for this and for the minimal-matching
construct because 1) question marks are rare in older regular
expressions, and 2) whenever you see one, you should stop and
@@ -1089,7 +1083,8 @@ One or more embedded pattern-match modifiers, to be turned on (or
turned off, if preceded by C<"-">) for the remainder of the pattern or
the remainder of the enclosing pattern group (if any).
-This is particularly useful for dynamic patterns, such as those read in from a
+This is particularly useful for dynamically-generated patterns,
+such as those read in from a
configuration file, taken from an argument, or specified in a table
somewhere. Consider the case where some patterns want to be
case-sensitive and some do not: The case-insensitive ones merely need to
@@ -1148,11 +1143,13 @@ C<"()">, but doesn't make backreferences as C<"()"> does. So
@fields = split(/\b(?:a|b|c)\b/)
-is like
+matches the same field delimiters as
@fields = split(/\b(a|b|c)\b/)
-but doesn't spit out extra fields. It's also cheaper not to capture
+but doesn't spit out the delimiters themselves as extra fields (even though
+that's the behaviour of L<perlfunc/split> when its pattern contains capturing
+groups). It's also cheaper not to capture
characters if you don't need to.
Any letters between C<"?"> and C<":"> act as flags modifiers as with
@@ -1237,8 +1234,8 @@ in the same order, in each of the alternations:
Not doing so may lead to surprises:
"12" =~ /(?| (?<a> \d+ ) | (?<b> \D+))/x;
- say $+ {a}; # Prints '12'
- say $+ {b}; # *Also* prints '12'.
+ say $+{a}; # Prints '12'
+ say $+{b}; # *Also* prints '12'.
The problem here is that both the group named C<< a >> and the group
named C<< b >> are aliases for the group belonging to C<< $1 >>.
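The branch-reset surprise in this hunk can be reproduced directly. This sketch reuses the documentation's C<"12"> example and adds a second, invented pattern (the name C<tok> is hypothetical) that uses the same name in the same position in every alternative:

```perl
use strict;
use warnings;

# Different names per alternative: (?<a>) and (?<b>) both alias group 1,
# so each name reports whatever group 1 captured.
"12" =~ /(?| (?<a> \d+ ) | (?<b> \D+ ))/x;
my ($got_a, $got_b) = ($+{a}, $+{b});
print "a=$got_a b=$got_b\n";    # a=12 b=12

# Same name in the same position in every alternative: no ambiguity.
"ab" =~ /(?| (?<tok> \d+ ) | (?<tok> \D+ ))/x;
my $tok = $+{tok};
print "tok=$tok\n";             # tok=ab
```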
@@ -1273,7 +1270,9 @@ will not do what you want. That's because the C<(?!foo)> is just saying that
the next thing cannot be "foo"--and it's not, it's a "bar", so "foobar" will
match. Use lookbehind instead (see below).
-=item C<(?<=pattern)> C<\K>
+=item C<(?<=pattern)>
+
+=item C<\K>
X<(?<=)> X<look-behind, positive> X<lookbehind, positive> X<\K>
A zero-width positive lookbehind assertion. For example, C</(?<=\t)\w+/>
@@ -1307,9 +1306,9 @@ only for fixed-width lookbehind.
=back
-=item C<(?'NAME'pattern)>
-
=item C<< (?<NAME>pattern) >>
+
+=item C<(?'NAME'pattern)>
X<< (?<NAME>) >> X<(?'NAME')> X<named capture> X<capture>
A named capture group. Identical in every respect to normal capturing
@@ -1677,23 +1676,28 @@ Here's a summary of the possible predicates:
=item C<(1)> C<(2)> ...
Checks if the numbered capturing group has matched something.
+Full syntax: C<< (?(1)then|else) >>
=item C<(E<lt>I<NAME>E<gt>)> C<('I<NAME>')>
Checks if a group with the given name has matched something.
+Full syntax: C<< (?(<name>)then|else) >>
=item C<(?=...)> C<(?!...)> C<(?<=...)> C<(?<!...)>
Checks whether the pattern matches (or does not match, for the C<"!">
variants).
+Full syntax: C<< (?(?=lookahead)then|else) >>
=item C<(?{ I<CODE> })>
Treats the return value of the code block as the condition.
+Full syntax: C<< (?(?{ code })then|else) >>
=item C<(R)>
Checks if the expression has been evaluated inside of recursion.
+Full syntax: C<< (?(R)then|else) >>
=item C<(R1)> C<(R2)> ...
@@ -1704,18 +1708,22 @@ inside of the n-th capture group. This check is the regex equivalent of
In other words, it does not check the full recursion stack.
+Full syntax: C<< (?(R1)then|else) >>
+
=item C<(R&I<NAME>)>
Similar to C<(R1)>, this predicate checks to see if we're executing
directly inside of the leftmost group with a given name (this is the same
logic used by C<(?&I<NAME>)> to disambiguate). It does not check the full
stack, but only the name of the innermost active recursion.
+Full syntax: C<< (?(R&name)then|else) >>
=item C<(DEFINE)>
In this case, the yes-pattern is never directly executed, and no
no-pattern is allowed. Similar in spirit to C<(?{0})> but more efficient.
See below for details.
+Full syntax: C<< (?(DEFINE)definitions...) >>
=back
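The "full syntax" forms added in this hunk can be tried out with a short sketch. The phone-number and date formats below are invented for illustration; the first pattern uses the C<(?(1)then)> form (the else branch is optional), and the second uses C<(?(DEFINE)...)> to hold named subpatterns that are only reached through C<(?&NAME)>:

```perl
use strict;
use warnings;

# (?(1)\)) : require a closing ")" only if group 1 (an opening "(") matched.
my $phone = qr/^(\()?\d{3}(?(1)\))-\d{4}\z/;
my @ok = grep { $_ =~ $phone } ("(555)-1234", "555-1234", "(555-1234", "555)-1234");
print "@ok\n";    # (555)-1234 555-1234

# (?(DEFINE)...) is never executed directly; its groups are called by name.
my $date = qr{
    ^ (?&year) - (?&month) \z
    (?(DEFINE)
        (?<year>  \d{4} )
        (?<month> 0[1-9] | 1[0-2] )
    )
}x;
print "2016-11: ", ("2016-11" =~ $date ? "match" : "no match"), "\n";
print "2016-13: ", ("2016-13" =~ $date ? "match" : "no match"), "\n";
```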
@@ -1881,6 +1889,9 @@ to inside of one of these constructs. The following equivalences apply:
See L<perlrecharclass/Extended Bracketed Character Classes>.
+Note that this feature is currently L<experimental|perlpolicy/experimental>;
+using it yields a warning in the C<experimental::regex_sets> category.
+
=back
=head2 Backtracking
@@ -1893,8 +1904,8 @@ see L</Combining RE Pieces>.
A fundamental feature of regular expression matching involves the
notion called I<backtracking>, which is currently used (when needed)
-by all regular non-possessive expression quantifiers, namely C<"*">, C<"*?">, C<"+">,
-C<"+?">, C<{n,m}>, and C<{n,m}?>. Backtracking is often optimized
+by all regular non-possessive expression quantifiers, namely C<*>, C<*?>, C<+>,
+C<+?>, C<{n,m}>, and C<{n,m}?>. Backtracking is often optimized
internally, but the general principle outlined here is valid.
For a regular expression to match, the I<entire> regular expression must
@@ -2115,7 +2126,10 @@ C<(*MARK:NAME)> verb below for more details.
B<NOTE:> C<$REGERROR> and C<$REGMARK> are not magic variables like C<$1>
and most other regex-related variables. They are not local to a scope, nor
readonly, but instead are volatile package variables similar to C<$AUTOLOAD>.
-Use C<local> to localize changes to them to a specific scope if necessary.
+They are set in the package containing the code that I<executed> the regex
+(rather than the one that compiled it, where those differ). If necessary, you
+can use C<local> to localize changes to these variables to a specific scope
+before executing a regex.
If a pattern does not contain a special backtracking verb that allows an
argument, then C<$REGERROR> and C<$REGMARK> are not touched at all.
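A minimal sketch of C<$REGMARK> as a package variable, run from package C<main>. The branch names and the helper C<mark_of> are hypothetical; the helper shows C<local> confining a regex's change to C<$REGMARK> to one scope, as the revised text suggests:

```perl
use strict;
use warnings;

# $REGMARK is an ordinary package variable (here $main::REGMARK),
# so under strict it needs an "our" declaration.
our $REGMARK;

my %branch_for;
for my $s (qw(x y z)) {
    if ($s =~ /(?:x(*MARK:ex)|y(*MARK:why)|z(*MARK:zed))/) {
        # On success, $REGMARK holds the name of the most recently
        # executed (*MARK:NAME) that took part in the match.
        $branch_for{$s} = $REGMARK;
    }
}
print "$_ => $branch_for{$_}\n" for sort keys %branch_for;

# Hypothetical helper: the match inside changes $REGMARK, but "local"
# restores the caller's value when the sub returns.
sub mark_of {
    my ($s) = @_;
    local $REGMARK;
    return $s =~ /(?:x(*MARK:ex)|y(*MARK:why))/ ? $REGMARK : undef;
}

$REGMARK = "before";
my $m = mark_of("y");
print "mark_of(y) = $m; caller still sees $REGMARK\n";
```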
@@ -2130,7 +2144,7 @@ argument, then C<$REGERROR> and C<$REGMARK> are not touched at all.
X<(*PRUNE)> X<(*PRUNE:NAME)>
This zero-width pattern prunes the backtracking tree at the current point
-when backtracked into on failure. Consider the pattern C<I<A> (*PRUNE) I<B>>,
+when backtracked into on failure. Consider the pattern C</I<A> (*PRUNE) I<B>/>,
where I<A> and I<B> are complex patterns. Until the C<(*PRUNE)> verb is reached,
I<A> may backtrack as necessary to match. Once it is reached, matching
continues in I<B>, which may also backtrack as necessary; however, should B