author | Karl Williamson <public@khwilliamson.com> | 2011-10-16 11:43:08 -0600 |
---|---|---|
committer | Karl Williamson <public@khwilliamson.com> | 2011-10-17 17:04:28 -0600 |
commit | e067297c376fbbb5a0dc8428c65d922f11e1f4c6 (patch) | |
tree | 593b25c52b8899ff520f4842e536a6c0baba17ab /t | |
parent | 8a90a8fee1032a1bdee2a164f8265ff160fe22f0 (diff) | |
download | perl-e067297c376fbbb5a0dc8428c65d922f11e1f4c6.tar.gz |
regexec.c: Stop looking for match sooner
This is a partial reversion of commit
7c1b9f38fcbfdb3a9e1766e02bcb991d1a5452d9
which went unnecessarily far in fixing the problem.
After studying the situation some more, I see more clearly what was
going on. The point is that if you have only 2 characters left in the
string, but the pattern requires 3 to work, it's guaranteed to fail, so
trying is pointless, unnecessary work. So don't begin a match trial
at a position when there are fewer than the minimum number of characters
necessary. That is what the code before that commit did. However it
neglected the fact that it is possible for a single character to match
multiple ones, so there is not a 1:1 ratio. This new commit assumes the
worst possible ratio to calculate how far into a string is the furthest
a successful match could start. In most cases this will still look
further than necessary, but it is much better than always going up to the
final character, as the previous patch did.
The maximum ratio is guaranteed by Unicode to be 3:1, but when the
target isn't in UTF-8, the max is 2:1, determined simply by inspection
of the defined folds. In fact, the single case where it isn't 1:1
currently doesn't come up here, because regcomp.c guarantees that that
match doesn't generate one of these EXACTFish nodes. However, I expect
that to change for 5.16, and so am preparing for that case by making it
2:1.
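
The cutoff calculation described above can be sketched roughly as follows. This is only an illustration, not the code in regexec.c: the function name `furthest_start` and its parameters are hypothetical, and lengths are assumed to be counted in characters rather than bytes. The 3:1 and 2:1 ratios are the ones given in the message above.

```c
#include <assert.h>

/* Hypothetical helper, not taken from regexec.c.
 *
 * target_chars:      characters remaining in the target string
 * min_pattern_chars: minimum number of characters the pattern needs
 * target_is_utf8:    nonzero if the target is UTF-8 encoded
 *
 * Returns the furthest character offset at which a match trial is still
 * worth starting, or -1 if no match is possible at any offset. */
static long furthest_start(long target_chars, long min_pattern_chars,
                           int target_is_utf8)
{
    /* Worst case: one target character matches this many pattern
     * characters (3 for UTF-8 targets, 2 otherwise, per the message). */
    long ratio = target_is_utf8 ? 3 : 2;
    long min_target_chars;

    assert(target_chars >= 0 && min_pattern_chars >= 0);

    /* Fewest target characters that could satisfy the pattern:
     * ceil(min_pattern_chars / ratio). */
    min_target_chars = (min_pattern_chars + ratio - 1) / ratio;

    if (min_target_chars > target_chars)
        return -1;      /* too few characters left anywhere */

    return target_chars - min_target_chars;
}
```

Under these assumptions, the first test added by this commit works out as follows: the pattern `ffiffl` needs at least 6 characters, and the target `abcdef\x{FB03}\x{FB04}` has 8. A 1:1 cutoff would stop trying after offset 2 and miss the match, which begins at offset 6. With the 3:1 ratio, `furthest_start(8, 6, 1)` gives 6, so the trial at the ligatures is still attempted.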
Diffstat (limited to 't')
-rw-r--r-- | t/re/re_tests | 5 |
1 files changed, 5 insertions, 0 deletions
diff --git a/t/re/re_tests b/t/re/re_tests
index 9b65f5532b..7b303c8755 100644
--- a/t/re/re_tests
+++ b/t/re/re_tests
@@ -1546,4 +1546,9 @@ abc\N{def - c - \\N{NAME} must be resolved by the lexer
 /fi/i \x{FB01}\x{FB00} y $& \x{FB01}
 /fi/i \x{FB00}\x{FB01} y $& \x{FB01}
 
+# These test that doesn't cut-off matching too soon in the string for
+# multi-char folds
+/ffiffl/i abcdef\x{FB03}\x{FB04} y $& \x{FB03}\x{FB04}
+/\xdf\xdf/ui abcdefssss y $& ssss
+
 # vim: softtabstop=0 noexpandtab