diff options
author | David Mitchell <davem@iabyn.com> | 2015-02-10 12:17:51 +0000 |
---|---|---|
committer | David Mitchell <davem@iabyn.com> | 2015-02-10 12:37:10 +0000 |
commit | 0fa70a06a98fc8fa9840d4dbaa31fc2d3b28b99b (patch) | |
tree | 26a06d1b96e1d6e7bfc5a2df9ddf663a5672f4da /t/re/speed.t | |
parent | 5904c5c0c031fad19e61f5d279d624a91a196e02 (diff) | |
download | perl-0fa70a06a98fc8fa9840d4dbaa31fc2d3b28b99b.tar.gz |
simpify and speed up /.*.../ handling
See RT ##123743.
A pattern that starts /.*/ has a fake MBOL or SBOL flag added, along
with PREGf_IMPLICIT.
The idea is that, with /.*.../s, if the NFA don't match when started at
pos 0, then it's not going to match if started at any other position
either; while /.*.../ won't match at any other start position up until
the next \n.
However, the branch in regexec() that implemented this was a bit a mess
(like much in the perl core, it had gradually accreted), and caused
intuit-enabled /.*.../ and /.*...patterns to go quadratic.
The branch looked roughly like:
if (anchored) {
if (regtry(s)) goto success;
if (can_intuit) {
while (s < end) {
s = intuit(s+1);
if (!s) goto fail;
if (regtry(s)) goto success;
}
}
else {
while (s < end) {
s = skip_to_next_newline(s);
if (regtry(s)) goto success;
}
}
}
The problem is that in the presence of a .* at the start of the pattern,
intuit() will always return either NULL on failure, or the start position,
rather than any later position. So the can_intuit branch above calls
regtry() on every character position.
This commit fixes this by changing the structure of the code to be like
this, where it only tries things on newline boundaries:
if (anchored) {
if (regtry(s)) goto success;
while (1) {
s = skip_to_next_newline(s);
if (can_intuit) {
s = intuit(s+1);
if (!s) goto fail;
}
if (regtry(s)) goto success;
}
}
This makes the code a lot simpler, and mostly avoids quadratic behaviour
(you can still get it with a string consisting mainly of newlines).
Diffstat (limited to 't/re/speed.t')
-rw-r--r-- | t/re/speed.t | 14 |
1 files changed, 13 insertions, 1 deletions
diff --git a/t/re/speed.t b/t/re/speed.t index f8d6723657..5afc0516c0 100644 --- a/t/re/speed.t +++ b/t/re/speed.t @@ -23,7 +23,7 @@ BEGIN { skip_all_without_unicode_tables(); } -plan tests => 9; # Update this when adding/deleting tests. +plan tests => 17; # Update this when adding/deleting tests. use strict; use warnings; @@ -98,6 +98,18 @@ sub run_tests { $s =~ /^XX\d{1,10}cde/ for 1..100; pass("abs anchored float string should fail quickly"); + # if /.*.../ fails to be optimised well (PREGf_IMPLICIT), + # things tend to go quadratic (RT #123743) + + $s = ('0' x 200_000) . '::: 0c'; + ok ($s !~ /.*:::\s*ab/, 'PREGf_IMPLICIT'); + ok ($s !~ /.*:::\s*ab/i, 'PREGf_IMPLICIT/i'); + ok ($s !~ /.*:::\s*ab/m, 'PREGf_IMPLICIT/m'); + ok ($s !~ /.*:::\s*ab/mi, 'PREGf_IMPLICIT/mi'); + ok ($s !~ /.*:::\s*ab/s, 'PREGf_IMPLICIT/s'); + ok ($s !~ /.*:::\s*ab/si, 'PREGf_IMPLICIT/si'); + ok ($s !~ /.*:::\s*ab/ms, 'PREGf_IMPLICIT/ms'); + ok ($s !~ /.*:::\s*ab/msi,'PREGf_IMPLICIT/msi'); } } # End of sub run_tests |