author: Karl Williamson <khw@cpan.org> 2014-09-22 13:59:39 -0600
committer: Karl Williamson <khw@cpan.org> 2014-09-29 13:07:07 -0600
commit: b35552de5cea8eb47ccb046284ecb9a099430255 (patch)
tree: 9cddbc9de1b38404bcf6fdae9e65f46b5a5d3e79 /ext/re
parent: dea37815c59831b7e586fa51968348fbb8009e1a (diff)
Tighten uses of regex synthetic start class
A synthetic start class (SSC) is generated by the regular expression pattern compiler to consolidate everything that can match at the position where a pattern match can begin. For example, qr/a?bfoo/ requires the match to begin with either an 'a' or a 'b'. There are no other possibilities. We can set things up to quickly scan for either of these in the target string, and only when one of them is found do we need to look for 'foo'.

There is an overhead associated with using SSCs. If the number of possibilities that the SSC excludes is relatively small, it can be counter-productive to use them. This patch creates a crude sieve to decide whether to use an SSC or not. If the SSC doesn't exclude at least half the "likely" possibilities, it is discarded. This patch is a starting point, and can be refined if necessary as we gain experience.

See thread beginning with http://nntp.perl.org/group/perl.perl5.porters/212644

In many patterns, no SSC is generated; and with the advent of tries, SSCs have become less important, so whatever we do is not terribly critical.
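
The sieve described above can be illustrated with a small sketch. This is not the code from the patch, which lives in Perl's C regex compiler and is not shown in this diff; it is a Python illustration under stated assumptions. In particular, the commit message does not define the "likely" possibilities, so the sketch assumes they are the 128 ASCII code points, and the names `LIKELY` and `ssc_is_worth_keeping` are hypothetical.

```python
# Hypothetical illustration of the "crude sieve"; NOT the code in Perl's regcomp.c.
# Assumption: the "likely" start characters are the 128 ASCII code points.
LIKELY = frozenset(range(128))


def ssc_is_worth_keeping(ssc_code_points, likely=LIKELY):
    """Return True if the SSC excludes at least half of the likely code points."""
    excluded = sum(1 for cp in likely if cp not in ssc_code_points)
    return 2 * excluded >= len(likely)


# qr/a?bfoo/ can only start with 'a' or 'b', so the SSC excludes nearly everything.
narrow = {ord("a"), ord("b")}

# The non-UTF-8 part of the start class that this diff removes from the expected
# output for ^(\S{1,9}):\s*(\d+)$ : \x00-\x08, \x0E-\x1F, \x21-\xFF.
wide = set(range(0x00, 0x09)) | set(range(0x0E, 0x20)) | set(range(0x21, 0x100))

print(ssc_is_worth_keeping(narrow))
print(ssc_is_worth_keeping(wide))
```

Under this assumption the narrow class is kept, while the wide class excludes only six of the 128 code points and is discarded, which is consistent with the `stclass` lines being dropped from the expected output in the test change below.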
Diffstat (limited to 'ext/re')
-rw-r--r-- ext/re/t/regop.t 5
1 files changed, 1 insertions, 4 deletions
diff --git a/ext/re/t/regop.t b/ext/re/t/regop.t
index 6397d4e5c3..60e4c02e0d 100644
--- a/ext/re/t/regop.t
+++ b/ext/re/t/regop.t
@@ -261,7 +261,6 @@ Offsets: [3]
Freeing REx: "[q]"
---
#Compiling REx "^(\S{1,9}):\s*(\d+)$"
-#synthetic stclass "ANYOF[\x{00}-\x{06}\a\b\x{0E}-\x{1F}\x{21}-\x{FF}][{utf8}0100-167F 1681-1FFF 200B-2027 202A-202E 2030-205E 2060-2FFF 3001-INFINITY]".
#Final program:
# 1: SBOL (2)
# 2: OPEN1 (4)
@@ -277,11 +276,9 @@ Freeing REx: "[q]"
# 17: CLOSE2 (19)
# 19: EOL (20)
# 20: END (0)
-#floating ":" at 1..9 (checking floating) stclass ANYOF[\x{00}-\x{06}\a\b\x{0E}-\x{1F}\x{21}-\x{FF}][{utf8}0100-167F 1681-1FFF 200B-2027 202A-202E 2030-205E 2060-2FFF 3001-INFINITY] anchored(SBOL) minlen 3
#Freeing REx: "^(\S{1,9}):\s*(\d+)$"
-floating ":" at 1..9 (checking floating) stclass ANYOF[\x{00}-\x{06}\a\b\x{0E}-\x{1F}\x{21}-\x{FF}][{utf8}0100-167F 1681-1FFF 200B-2027 202A-202E 2030-205E 2060-2FFF 3001-INFINITY] anchored(SBOL) minlen 3
%MATCHED%
-synthetic stclass
+Freeing REx: "^(\S{1,9}):\s*(\d+)$"
---
#Compiling REx "(?(DEFINE)(?<foo>foo))(?(DEFINE)(?<bar>(?&foo)bar))(?(DEFINE"...
#Got 532 bytes for offset annotations.