regcomp.c - remove (**{ ... }) from the regex engine

Dave M pointed out that this idea was flawed, and after some testing I have come to agree with him. This removes it. It was only available in 5.37.8, so no deprecation cycle is involved.

The point of (**{ ... }) was to have a postponed eval that does not disable optimizations. But some of those optimizations have to be disabled: if they are not, we do not match correctly, because the optimizations make assumptions about the pattern that can be incorrect depending on what pattern the codeblock returns.

The original idea was proposed because (?{ ... }) was treated as though it was (??{ ... }) and disabled many optimizations, when in fact it doesn't interact with optimizations at all. When I added (*{ ... }) as the optimistic version of (?{ ... }), I used "completeness" as the justification for also adding (**{ ... }), even though it does not make sense to do so.
author: Yves Orton <demerphq@gmail.com> 2023-02-06 11:03:06 +0100
committer: Yves Orton <demerphq@gmail.com> 2023-02-08 13:03:01 +0800
commit: 5f5c35d3ce139755d02fa07de26229cd08d3c8cd (patch)
tree: 2b2d1fb07166852b9fde43dd3cc2c927c88971ae
parent: 4202141d20ddfa0501f385cf923860bcf7511398 (diff)
download: perl-5f5c35d3ce139755d02fa07de26229cd08d3c8cd.tar.gz
7 files changed, 41 insertions, 64 deletions
diff --git a/pod/perl5378delta.pod b/pod/perl5378delta.pod
index 980ec09175..ce4f8d3eef 100644
--- a/pod/perl5378delta.pod
+++ b/pod/perl5378delta.pod
@@ -28,6 +28,9 @@ included in a pattern, so that patterns which are O(N) in normal use become
 O(N*N) with a C<(?{ ... })> pattern in them. Switching to C<(*{ ... })> means
 the pattern will stay O(N).
 
+B<NOTE> the C<(**{ ... })> was removed in 5.37.9 as it didn't quite work out
+as planned.
+
 =head1 Modules and Pragmata
 
 =head2 Updated Modules and Pragmata
diff --git a/pod/perldelta.pod b/pod/perldelta.pod
index 788ada2996..7a81735089 100644
--- a/pod/perldelta.pod
+++ b/pod/perldelta.pod
@@ -37,13 +37,13 @@ L</Selected Bug Fixes> section.
 
 =head1 Incompatible Changes
 
-XXX For a release on a stable branch, this section aspires to be:
+=head2 (**{ ... }) removed from the regex engine.
 
-    There are no changes intentionally incompatible with 5.XXX.XXX
-    If any exist, they are bugs, and we request that you submit a
-    report.  See L</Reporting Bugs> below.
-
-[ List each incompatible change as a =head2 entry ]
+This feature was released as part of 5.37.8, after some use and
+discussion it was seen as more problematic than understood at first
+and has been removed in 5.37.9. It was only ever present in a single
+dev release and has never been released as part of a production perl,
+thus no deprecation cycle has been performed.
 
 =head1 Deprecations
 
diff --git a/pod/perlre.pod b/pod/perlre.pod
index 30e3fe212f..912061209e 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -2060,18 +2060,7 @@ at which that happens is compiled into perl, so it can be changed with a
 custom build.
 
 The use of this construct disables some optimisations globally in the pattern,
-and the pattern may execute much slower as a consequence. Use a C<*> instead
-of the C<?> to create an optimistic form of this construct: C<(**{...})>
-maybe used as a replacement and should not disable any optimisations, but is
-likely to be even more volatile from perl version to perl version than
-C<(??{...})> is.
-
-=item C<(**{ I<code> })>
-X<(**{})> X<regex, postponed optimistic>
-
-This is exactly the same as C<(??{ I<code> })> however it does not disable
-B<any> optimisations. It is even more likely to change from version to version
-of perl. In a failing match it may not even be executed at all.
+and the pattern may execute much slower as a consequence.
 
 =item C<(?I<PARNO>)> C<(?-I<PARNO>)> C<(?+I<PARNO>)> C<(?R)> C<(?0)>
 X<(?PARNO)> X<(?1)> X<(?R)> X<(?0)> X<(?-1)> X<(?+1)> X<(?-PARNO)> X<(?+PARNO)>
@@ -3328,8 +3317,8 @@ part of this regular expression needs to be converted explicitly
 =head2 Embedded Code Execution Frequency
 
 The exact rules for how often C<(?{})> and C<(??{})> are executed in a pattern
-are unspecified, as are their even less well defined equivalents C<(*{})> and
-C<(**{})>. In the case of a successful match you can assume that they DWIM and
+are unspecified, and this is even more true of C<(*{})>.
+In the case of a successful match you can assume that they DWIM and
 will be executed in left to right order the appropriate number of times in the
 accepting path of the pattern as would any other meta-pattern. How non-
 accepting pathways and match failures affect the number of times a pattern is
@@ -3363,10 +3352,10 @@ will output "o" twice.
 
 For historical and consistency reasons the use of normal code blocks
 anywhere in a pattern will disable certain optimisations. As of 5.37.7
-you can use an "optimistic" codeblock, C<(*{ ... })> or C<(**{ ... })>
-if you do *not* wish to disable these optimisations. This may result
-in code blocks being called less often than might have been had they
-not been optimistic.
+you can use an "optimistic" codeblock, C<(*{ ... })> as a replacement
+for C<(?{ ... })>, if you do *not* wish to disable these optimisations.
+This may result in the code block being called less often than it might
+have been had they not been optimistic.
 
 =head2 PCRE/Python Support
 
diff --git a/regcomp.c b/regcomp.c
index 82d89b6161..7eebcaca95 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -3008,15 +3008,10 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp, U32 depth)
             goto parse_rest;
         }
         else if ( *RExC_parse == '*') { /* (*VERB:ARG), (*construct:...) */
-            if (RExC_parse[1] == '{') {
+            if (RExC_parse[1] == '{') { /* (*{ ... }) optimistic EVAL */
                 fake_eval = '{';
                 goto handle_qmark;
             }
-            else
-            if ( RExC_parse[1] == '*' && RExC_parse[2] == '{' ) {
-                fake_eval = '?';
-                goto handle_qmark;
-            }
 
             char *start_verb = RExC_parse + 1;
             STRLEN verb_len;
diff --git a/regexec.c b/regexec.c
index 92154c555b..bcab2ad7d5 100644
--- a/regexec.c
+++ b/regexec.c
@@ -8237,7 +8237,7 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog)
                 PL_op = NULL;
 
                 re_sv = NULL;
-                if (logical == 0) {       /*   (?{})/   */
+                if (logical == 0) {       /* /(?{ ... })/ and /(*{ ... })/ */
                     SV *replsv = save_scalar(PL_replgv);
                     sv_setsv(replsv, ret); /* $^R */
                     SvSETMAGIC(replsv);
@@ -8246,7 +8246,7 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog)
                     sw = cBOOL(SvTRUE_NN(ret));
                     logical = 0;
                 }
-                else {                   /*  /(??{})  */
+                else {                   /*  /(??{ ... })  */
                     /*  if its overloaded, let the regex compiler handle
                      *  it; otherwise extract regex, or stringify  */
                     if (SvGMAGICAL(ret))
@@ -8289,7 +8289,7 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog)
                 }
             }
 
-                /* only /(??{})/  from now on */
+                /* only /(??{ ... })/  from now on */
                 logical = 0;
                 {
                     /* extract RE object from returned value; compiling if
diff --git a/t/re/pat_re_eval.t b/t/re/pat_re_eval.t
index 96ee8a4888..a7351751b7 100644
--- a/t/re/pat_re_eval.t
+++ b/t/re/pat_re_eval.t
@@ -24,7 +24,8 @@ BEGIN {
 
 our @global;
 
-plan tests => 551;  # Update this when adding/deleting tests.
+
+plan tests => 527;  # Update this when adding/deleting tests.
 
 run_tests() unless caller;
 
@@ -142,8 +143,8 @@ sub run_tests {
         # Test if $^N and $+ work in (*{ }) (optimistic eval)
         our @ctl_n = ();
         our @plus = ();
-        our $nested_tags;
-        $nested_tags = qr{
+        my $nested_tags = qr{
+          (?<nested_tags>
             <
                 ((\w)+)
                 (*{
@@ -151,8 +152,9 @@ sub run_tests {
                        push @plus, (defined $+ ? $+ : "undef");
                 })
             >
-            (**{$nested_tags})*
+            (?&nested_tags)*
             </\s* \w+ \s*>
+          )
         }x;
 
         # note the results of this may change from perl to perl as different optimisations
@@ -163,23 +165,11 @@ sub run_tests {
         for my $test (
             # Test structure:
             #  [ Expected result, Regex, Expected value(s) of $^N, Expected value(s) of $+, "note" ]
-            [ 1, qr#^$nested_tags$#, "bla blubb bla", "a b a" ],
+            [ 1, qr#^$nested_tags$#, "bla blubb <bla><blubb></blubb></bla>", "a b a" ],
             [ 1, qr#^($nested_tags)$#, "bla blubb <bla><blubb></blubb></bla>", "a b a" ],
-            [ 1, qr#^(|)$nested_tags$#, "bla blubb bla", "a b a" ],
-            [ 1, qr#^(?:|)$nested_tags$#, "bla blubb bla", "a b a" ],
+            [ 1, qr#^(|)$nested_tags$#, "bla blubb <bla><blubb></blubb></bla>", "a b a" ],
+            [ 1, qr#^(?:|)$nested_tags$#, "bla blubb <bla><blubb></blubb></bla>", "a b a" ],
             [ 1, qr#^<(bl|bla)>$nested_tags<(/\1)>$#, "blubb /bla", "b /bla" ],
-            [ 1, qr#(**{"(|)"})$nested_tags$#, "bla blubb bla", "a b a" ],
-            [ 1, qr#^(**{"(bla|)"})$nested_tags$#, "bla blubb bla", "a b a" ],
-            [ 1, qr#^(**{"(|)"})(**{$nested_tags})$#, "bla blubb undef", "a b undef" ],
-            [ 1, qr#^(**{"(?:|)"})$nested_tags$#, "bla blubb bla", "a b a" ],
-            [ 1, qr#^((**{"(?:bla|)"}))((**{$nested_tags}))$#, "bla blubb <bla><blubb></blubb></bla>", "a b <bla><blubb></blubb></bla>" ],
-            [ 1, qr#^((**{"(?!)?"}))((**{$nested_tags}))$#, "bla blubb <bla><blubb></blubb></bla>", "a b <bla><blubb></blubb></bla>" ],
-            [ 1, qr#^((**{"(?:|<(/?bla)>)"}))((**{$nested_tags}))\1$#, "bla blubb <bla><blubb></blubb></bla>", "a b <bla><blubb></blubb></bla>" ],
-            [ 0, qr#^((**{"(?!)"}))?((**{$nested_tags}))(?!)$#,
-                 "bla blubb undef",
-                 "a b undef",
-                 "this test is expected to fail if CURLYX optimisations are disabled"],
-
         ) { #"#silence vim highlighting
             $c++;
             @ctl_n = ();
@@ -187,14 +177,14 @@ sub run_tests {
             my $match = (("<bla><blubb></blubb></bla>" =~ $test->[1]) ? 1 : 0);
             push @ctl_n, (defined $^N ? $^N : "undef");
             push @plus, (defined $+ ? $+ : "undef");
-            ok($test->[0] == $match, "match $c");
+            ok($test->[0] == $match, "(*{ ... }) match $c");
             if ($test->[0] != $match) {
               # unset @ctl_n and @plus
               @ctl_n = @plus = ();
             }
             my $note = $test->[4] ? " - $test->[4]" : "";
-            is("@ctl_n", $test->[2], "ctl_n $c$note");
-            is("@plus", $test->[3], "plus $c$note");
+            is("@ctl_n", $test->[2], "(*{ ... }) ctl_n $c$note");
+            is("@plus", $test->[3], "(*{ ... }) plus $c$note");
         }
     }
 
diff --git a/toke.c b/toke.c
index 107d889857..14808e3731 100644
--- a/toke.c
+++ b/toke.c
@@ -3076,7 +3076,7 @@ Perl_get_and_check_backslash_N_name(pTHX_ const char* s,
     stops on:
         @ and $ where it appears to be a var, but not for $ as tail anchor
         \l \L \u \U \Q \E
-        (?{  or  (??{ or (*{ or (**{
+        (?{  or  (??{ or (*{
 
   In transliterations:
     characters are VERY literal, except for - not at the start or end
@@ -3636,7 +3636,7 @@ S_scan_const(pTHX_ char *start)
         }
             /* skip for regexp comments /(?#comment)/, except for the last
              * char, which will be done separately.  Stop on (?{..}) and
-             * friends (??{ ... }) or (*{ ... }) or (**{ ... }) */
+             * friends (??{ ... }) or (*{ ... }) */
         else if (*s == '(' && PL_lex_inpat && (s[1] == '?' || s[1] == '*') && !in_charclass) {
             if (s[1] == '?' && s[2] == '#') {
                 if (s_is_utf8) {
@@ -3653,13 +3653,13 @@ S_scan_const(pTHX_ char *start)
                     *d++ = *s++;
                 }
             }
-            else if (!PL_lex_casemods
-                     && ( (s[1] == '?' && ( s[2] == '{' /* This should match regcomp.c */
-                           || (s[2] == '?' && s[3] == '{'))) || /* (?{ ... }) (??{ ... }) */
-                          (s[1] == '*' && ( s[2] == '{'
-                           || (s[2] == '*' && s[3] == '{'))) )  /* (*{ ... }) (**{ ... }) */
-                 )
-            {
+            else
+            if (!PL_lex_casemods &&
+                /* The following should match regcomp.c */
+                ((s[1] == '?' && (s[2] == '{'                        /* (?{ ... })  */
+                              || (s[2] == '?' && s[3] == '{'))) ||   /* (??{ ... }) */
+                 (s[1] == '*' && (s[2] == '{' )))                    /* (*{ ... })  */
+            ){
                 break;
             }
         }
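
For readers following the toke.c hunk above, the simplified lookahead can be read as a standalone predicate. The sketch below is illustrative only: `starts_code_block` is a hypothetical helper name, not a function in the perl source, and it assumes a NUL-terminated input buffer. It leaves out the `PL_lex_inpat`, `PL_lex_casemods` and `in_charclass` checks that the real condition also applies.

```c
#include <stdbool.h>

/* Hypothetical helper (not part of perl): returns true when s points at
 * the opening of one of the code-block constructs still recognised after
 * this commit: (?{ ... }), (??{ ... }) or (*{ ... }).
 * Assumes s is NUL-terminated; each character is only examined after the
 * previous one has matched, so the reads never run past the terminator. */
static bool starts_code_block(const char *s)
{
    if (s[0] != '(')
        return false;
    if (s[1] == '?')                       /* (?{ ... }) or (??{ ... }) */
        return s[2] == '{' || (s[2] == '?' && s[3] == '{');
    if (s[1] == '*')                       /* (*{ ... }) only */
        return s[2] == '{';
    return false;                          /* (**{ is no longer special */
}
```

With the `(**{` branch gone, a sequence such as `(**{` falls through to ordinary `(*VERB` handling in regcomp.c rather than being treated as an eval.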