summaryrefslogtreecommitdiff
path: root/t
diff options
context:
space:
mode:
authorKarl Williamson <public@khwilliamson.com>2011-10-16 11:43:08 -0600
committerKarl Williamson <public@khwilliamson.com>2011-10-17 17:04:28 -0600
commite067297c376fbbb5a0dc8428c65d922f11e1f4c6 (patch)
tree593b25c52b8899ff520f4842e536a6c0baba17ab /t
parent8a90a8fee1032a1bdee2a164f8265ff160fe22f0 (diff)
downloadperl-e067297c376fbbb5a0dc8428c65d922f11e1f4c6.tar.gz
regexec.c: Stop looking for match sooner
This is a partial reversion of commit 7c1b9f38fcbfdb3a9e1766e02bcb991d1a5452d9 which went unnecessarily far in fixing the problem. After studying the situation some more, I see more clearly what was going on. The point is that if you have only 2 characters left in the string, but the pattern requires 3 to work, it's guaranteed to fail, so pointless, and unnecessary work, to try. So don't being a match trial at a position when there are fewer than the minimum number of characters necessary. That is what the code before that commit did. However it neglected the fact that it is possible for a single character to match multiple ones, so there is not a 1:1 ratio. This new commit assumes the worst possible ratio to calculate how far into a string is the furthest a successful match could start. This is going to in most cases still look too far, but it is much better than always going up to the final character, as the previous patch did. The maximum ratio is guaranteed by Unicode to be 3:1, but when the target isn't in UTF-8, the max is 2:1, determined simply by inspection of the defined folds. And actually, currently, the single case where it isn't 1:1 doesn't come up here, because regcomp.c guarantees that that match doesn't generate one of these EXACTFish nodes. However, I expect that to change for 5.16, and so am preparing for that case by making it 2:1.
Diffstat (limited to 't')
-rw-r--r--t/re/re_tests5
1 files changed, 5 insertions, 0 deletions
diff --git a/t/re/re_tests b/t/re/re_tests
index 9b65f5532b..7b303c8755 100644
--- a/t/re/re_tests
+++ b/t/re/re_tests
@@ -1546,4 +1546,9 @@ abc\N{def - c - \\N{NAME} must be resolved by the lexer
/fi/i \x{FB01}\x{FB00} y $& \x{FB01}
/fi/i \x{FB00}\x{FB01} y $& \x{FB01}
+# These test that doesn't cut-off matching too soon in the string for
+# multi-char folds
+/ffiffl/i abcdef\x{FB03}\x{FB04} y $& \x{FB03}\x{FB04}
+/\xdf\xdf/ui abcdefssss y $& ssss
+
# vim: softtabstop=0 noexpandtab