author | Yves Orton <demerphq@gmail.com> | 2023-01-08 15:49:04 +0100 |
---|---|---|
committer | Yves Orton <demerphq@gmail.com> | 2023-01-19 18:44:49 +0800 |
commit | c224bbd5d135fe48f49b4cc25f10a4977d695145 (patch) | |
tree | 5909b6fd666bb025496824a3f8c67715643164a8 /pod/perlre.pod | |
parent | 09b3a407e87f128d7aecd14f9c8d75dcff9aaaf8 (diff) | |
download | perl-c224bbd5d135fe48f49b4cc25f10a4977d695145.tar.gz |
regcomp.c - add optimistic eval (*{ ... }) and (**{ ... })
This adds (*{ ... }) and (**{ ... }) as equivalents to (?{ ... }) and
(??{ ... }). The only difference is that the star variants are
"optimistic" and are defined to never disable optimisations. This is
especially relevant now that use of (?{ ... }) prevents important
optimisations anywhere in the pattern, instead of the older and inconsistent
rules where it only affected the parts that contained the EVAL.

It is also very useful for injecting debugging-style expressions into the
pattern to understand what the regex engine is actually doing. The older
style (?{ ... }) variants would change the regex engine's behavior, meaning
this was not as effective a tool as it could have been.

Similarly, it is now possible to test that a given regex optimisation
works correctly using (*{ ... }), which was not possible with (?{ ... }).
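
The debugging use case described in the message can be sketched roughly as follows. This is only an illustration, assuming a perl that already parses the star forms (the patched documentation says 5.37.7 or later). The subject string, the pattern, and the names `@classic_trace` and `@optimistic_trace` are made up for this sketch and are not part of the patch. No call counts are given as expected output, because how often either kind of block runs is explicitly unspecified.

```perl
use strict;
use warnings;

my $subject = "aaabcdeeeee";
my (@classic_trace, @optimistic_trace);

# Classic form: per the commit message, its presence disables some
# optimisations for the whole pattern.
my $classic_ok = $subject =~ /
    a (?{ push @classic_trace, "a" })
    b (?{ push @classic_trace, "b" })
    cde
/x;

# Optimistic form: same blocks, but the engine keeps its optimisations,
# so the blocks may be called less often than in the classic form.
my $optimistic_ok = $subject =~ /
    a (*{ push @optimistic_trace, "a" })
    b (*{ push @optimistic_trace, "b" })
    cde
/x;

printf "classic:    matched=%s calls=%s\n",
    ($classic_ok ? "yes" : "no"), join("", @classic_trace);
printf "optimistic: matched=%s calls=%s\n",
    ($optimistic_ok ? "yes" : "no"), join("", @optimistic_trace);
```

Comparing the two traces on a given perl build gives a rough picture of which attempts the optimiser skipped. The exact traces should not be relied on across releases.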
Diffstat (limited to 'pod/perlre.pod')
-rw-r--r-- | pod/perlre.pod | 60 |
1 files changed, 51 insertions, 9 deletions
diff --git a/pod/perlre.pod b/pod/perlre.pod
index ef00abbe4b..30e3fe212f 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -1990,6 +1990,18 @@ keep track of the number of nested parentheses. For example:
   /the (\S+)(?{ $color = $^N }) (\S+)(?{ $animal = $^N })/i;
   print "color = $color, animal = $animal\n";
 
+The use of this construct disables some optimisations globally in the
+pattern, and the pattern may execute much slower as a consequence.
+Use a C<*> instead of the C<?> block to create an optimistic form of
+this construct. C<(*{ ... })> should not disable any optimisations.
+
+=item C<(*{ I<code> })>
+X<(*{})> X<regex, optimistic code>
+
+This is *exactly* the same as C<(?{ I<code> })> with the exception
+that it does not disable B<any> optimisations at all in the regex engine.
+How often it is executed may vary from perl release to perl release.
+In a failing match it may not even be executed at all.
 =item C<(??{ I<code> })>
 X<(??{})>
 
@@ -2047,6 +2059,20 @@ consuming any input string will also result in a fatal error. The depth
 at which that happens is compiled into perl, so it can be changed with a
 custom build.
 
+The use of this construct disables some optimisations globally in the pattern,
+and the pattern may execute much slower as a consequence. Use a C<*> instead
+of the C<?> to create an optimistic form of this construct: C<(**{...})>
+maybe used as a replacement and should not disable any optimisations, but is
+likely to be even more volatile from perl version to perl version than
+C<(??{...})> is.
+
+=item C<(**{ I<code> })>
+X<(**{})> X<regex, postponed optimistic>
+
+This is exactly the same as C<(??{ I<code> })> however it does not disable
+B<any> optimisations. It is even more likely to change from version to version
+of perl. In a failing match it may not even be executed at all.
+
 =item C<(?I<PARNO>)> C<(?-I<PARNO>)> C<(?+I<PARNO>)> C<(?R)> C<(?0)>
 X<(?PARNO)> X<(?1)> X<(?R)> X<(?0)> X<(?-1)> X<(?+1)> X<(?-PARNO)> X<(?+PARNO)>
 X<regex, recursive> X<regexp, recursive> X<regular expression, recursive>
@@ -2201,7 +2227,15 @@ Full syntax: C<< (?(?=I<lookahead>)I<then>|I<else>) >>
 =item C<(?{ I<CODE> })>
 
 Treats the return value of the code block as the condition.
-Full syntax: C<< (?(?{ I<code> })I<then>|I<else>) >>
+Full syntax: C<< (?(?{ I<CODE> })I<then>|I<else>) >>
+
+Note use of this construct may globally affect the performance
+of the pattern. Consider using C<(*{ I<CODE> })>
+
+=item C<(*{ I<CODE> })>
+
+Treats the return value of the code block as the condition.
+Full syntax: C<< (?(*{ I<CODE> })I<then>|I<else>) >>
 
 =item C<(R)>
 
@@ -3293,14 +3327,15 @@ part of this regular expression needs to be converted explicitly
 
 =head2 Embedded Code Execution Frequency
 
-The exact rules for how often C<(??{})> and C<(?{})> are executed in a pattern
-are unspecified. In the case of a successful match you can assume that
-they DWIM and will be executed in left to right order the appropriate
-number of times in the accepting path of the pattern as would any other
-meta-pattern. How non-accepting pathways and match failures affect the
-number of times a pattern is executed is specifically unspecified and
-may vary depending on what optimizations can be applied to the pattern
-and is likely to change from version to version.
+The exact rules for how often C<(?{})> and C<(??{})> are executed in a pattern
+are unspecified, as are their even less well defined equivalents C<(*{})> and
+C<(**{})>. In the case of a successful match you can assume that they DWIM and
+will be executed in left to right order the appropriate number of times in the
+accepting path of the pattern as would any other meta-pattern. How non-
+accepting pathways and match failures affect the number of times a pattern is
+executed is specifically unspecified and may vary depending on what
+optimizations can be applied to the pattern and is likely to change from
+version to version.
 
 For instance in
 
@@ -3326,6 +3361,13 @@ example:
 
 will output "o" twice.
 
+For historical and consistency reasons the use of normal code blocks
+anywhere in a pattern will disable certain optimisations. As of 5.37.7
+you can use an "optimistic" codeblock, C<(*{ ... })> or C<(**{ ... })>
+if you do *not* wish to disable these optimisations. This may result
+in code blocks being called less often than might have been had they
+not been optimistic.
+
 =head2 PCRE/Python Support
 
 As of Perl 5.10.0, Perl supports several Python/PCRE-specific extensions
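
The conditional form documented in the third hunk, `(?(*{ CODE })then|else)`, might be used along these lines. This is a hedged sketch, assuming a perl that includes this patch. The `$want_digits` switch and the `matches_mode` helper are hypothetical names invented for the example and do not appear in perlre.

```perl
use strict;
use warnings;

# Hypothetical switch read by the condition block (not part of the patch).
our $want_digits = 0;

# Returns 1 when $text is all digits (switch on) or all lowercase
# ASCII letters (switch off), and 0 otherwise.
sub matches_mode {
    my ($text, $digits) = @_;
    local $want_digits = $digits;
    return $text =~ /\A(?(*{ $want_digits })\d+|[a-z]+)\z/ ? 1 : 0;
}

for my $case (["12345", 1], ["12345", 0], ["abc", 0], ["abc", 1]) {
    my ($text, $digits) = @$case;
    printf "%-5s want_digits=%d matched=%d\n",
        $text, $digits, matches_mode($text, $digits);
}
```

The condition block here has no side effects, so it does not matter if the engine calls it fewer times than the `(?{ ... })` form would. Code that counts calls or mutates state inside an optimistic block should not expect stable behaviour across perl releases.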