author: Yves Orton <demerphq@gmail.com> 2023-01-08 15:49:04 +0100
committer: Yves Orton <demerphq@gmail.com> 2023-01-19 18:44:49 +0800
commit: c224bbd5d135fe48f49b4cc25f10a4977d695145
tree: 5909b6fd666bb025496824a3f8c67715643164a8 /pod/perlre.pod
parent: 09b3a407e87f128d7aecd14f9c8d75dcff9aaaf8
regcomp.c - add optimistic eval (*{ ... }) and (**{ ... })
This adds (*{ ... }) and (**{ ... }) as equivalents to (?{ ... }) and (??{ ... }). The only difference is that the star variants are "optimistic" and are defined to never disable optimisations. This is especially relevant now that use of (?{ ... }) prevents important optimisations anywhere in the pattern, instead of following the older and inconsistent rules where it only affected the parts that contained the EVAL. It is also very useful for injecting debugging-style expressions into the pattern to understand what the regex engine is actually doing. The older (?{ ... }) variants would change the regex engine's behavior, meaning this was not as effective a tool as it could have been. Similarly, it is now possible to test that a given regex optimisation works correctly using (*{ ... }), which was not possible with (?{ ... }).
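As a rough illustration of the description above, the following sketch uses the optimistic form both to capture values in a successful match and as a debugging counter in a failing one. It assumes a perl that includes this change (5.37.7 or later, requested here as v5.38); the variable names and test strings are made up for the example, and the counts for the failing match are deliberately not stated because they may differ between perl releases.

```perl
use v5.38;    # assumption: (*{ ... }) needs perl 5.37.7 or later

# Successful match: code blocks on the accepting path run left to right,
# so the optimistic form can record values just like (?{ ... }).
my ($color, $animal);
"the red fox" =~ /the (\S+)(*{ $color = $^N }) (\S+)(*{ $animal = $^N })/i;
say "color = $color, animal = $animal";

# Failing match: count how often each form runs. The counts are left
# unstated on purpose, since they may vary from release to release.
my ($classic, $optimistic) = (0, 0);
my $subject = "aaaaaaaaab";
$subject =~ /a+(?{ $classic++ })c/;       # classic form
$subject =~ /a+(*{ $optimistic++ })c/;    # optimistic form
say "classic ran $classic times, optimistic ran $optimistic times";
```

Comparing the two counters on a given perl gives a hint of how much work the optimisations save when the classic block is not present to disable them.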
Diffstat (limited to 'pod/perlre.pod')
-rw-r--r-- pod/perlre.pod 60
1 files changed, 51 insertions, 9 deletions
diff --git a/pod/perlre.pod b/pod/perlre.pod
index ef00abbe4b..30e3fe212f 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -1990,6 +1990,18 @@ keep track of the number of nested parentheses. For example:
/the (\S+)(?{ $color = $^N }) (\S+)(?{ $animal = $^N })/i;
print "color = $color, animal = $animal\n";
+The use of this construct disables some optimisations globally in the
+pattern, and the pattern may execute much slower as a consequence.
+Use a C<*> instead of the C<?> to create an optimistic form of
+this construct. C<(*{ ... })> should not disable any optimisations.
+
+=item C<(*{ I<code> })>
+X<(*{})> X<regex, optimistic code>
+
+This is *exactly* the same as C<(?{ I<code> })> with the exception
+that it does not disable B<any> optimisations at all in the regex engine.
+How often it is executed may vary from perl release to perl release.
+In a failing match it may not even be executed at all.
=item C<(??{ I<code> })>
X<(??{})>
@@ -2047,6 +2059,20 @@ consuming any input string will also result in a fatal error. The depth
at which that happens is compiled into perl, so it can be changed with a
custom build.
+The use of this construct disables some optimisations globally in the pattern,
+and the pattern may execute much slower as a consequence. Use a C<*> instead
+of the C<?> to create an optimistic form of this construct: C<(**{...})>
+may be used as a replacement and should not disable any optimisations, but is
+likely to be even more volatile from perl version to perl version than
+C<(??{...})> is.
+
+=item C<(**{ I<code> })>
+X<(**{})> X<regex, postponed optimistic>
+
+This is exactly the same as C<(??{ I<code> })> however it does not disable
+B<any> optimisations. It is even more likely to change from version to version
+of perl. In a failing match it may not even be executed at all.
+
=item C<(?I<PARNO>)> C<(?-I<PARNO>)> C<(?+I<PARNO>)> C<(?R)> C<(?0)>
X<(?PARNO)> X<(?1)> X<(?R)> X<(?0)> X<(?-1)> X<(?+1)> X<(?-PARNO)> X<(?+PARNO)>
X<regex, recursive> X<regexp, recursive> X<regular expression, recursive>
@@ -2201,7 +2227,15 @@ Full syntax: C<< (?(?=I<lookahead>)I<then>|I<else>) >>
=item C<(?{ I<CODE> })>
Treats the return value of the code block as the condition.
-Full syntax: C<< (?(?{ I<code> })I<then>|I<else>) >>
+Full syntax: C<< (?(?{ I<CODE> })I<then>|I<else>) >>
+
+Note that use of this construct may globally affect the performance
+of the pattern. Consider using C<(*{ I<CODE> })> instead.
+
+=item C<(*{ I<CODE> })>
+
+Treats the return value of the code block as the condition.
+Full syntax: C<< (?(*{ I<CODE> })I<then>|I<else>) >>
=item C<(R)>
@@ -3293,14 +3327,15 @@ part of this regular expression needs to be converted explicitly
=head2 Embedded Code Execution Frequency
-The exact rules for how often C<(??{})> and C<(?{})> are executed in a pattern
-are unspecified. In the case of a successful match you can assume that
-they DWIM and will be executed in left to right order the appropriate
-number of times in the accepting path of the pattern as would any other
-meta-pattern. How non-accepting pathways and match failures affect the
-number of times a pattern is executed is specifically unspecified and
-may vary depending on what optimizations can be applied to the pattern
-and is likely to change from version to version.
+The exact rules for how often C<(?{})> and C<(??{})> are executed in a pattern
+are unspecified, as are their even less well defined equivalents C<(*{})> and
+C<(**{})>. In the case of a successful match you can assume that they DWIM and
+will be executed in left to right order the appropriate number of times in the
+accepting path of the pattern as would any other meta-pattern. How
+non-accepting pathways and match failures affect the number of times a
+pattern is executed is specifically unspecified and may vary depending on
+what optimizations can be applied to the pattern and is likely to change
+from version to version.
For instance in
@@ -3326,6 +3361,13 @@ example:
will output "o" twice.
+For historical and consistency reasons the use of normal code blocks
+anywhere in a pattern will disable certain optimisations. As of 5.37.7
+you can use an "optimistic" code block, C<(*{ ... })> or C<(**{ ... })>,
+if you do *not* wish to disable these optimisations. This may result
+in code blocks being called less often than they would have been had
+they not been optimistic.
+
=head2 PCRE/Python Support
As of Perl 5.10.0, Perl supports several Python/PCRE-specific extensions