author Karl Williamson <khw@cpan.org> 2018-12-04 09:58:13 -0700
committer Karl Williamson <khw@cpan.org> 2018-12-07 21:12:16 -0700
commit 8a100c918ec81926c0536594df8ee1fcccb171da
tree 61553beb7d50f69a1c65e5295f7cf7fc079097cd
parent 127a194773690138ef2e74691af748a925a2f47a
regcomp.c: Allow more EXACTFish nodes to be trieable
The previous two commits fixed bugs where it was possible during optimization to join two EXACTFish nodes together, and the result would not work properly with LATIN SMALL LETTER SHARP S. But in doing so, those commits prevented all non-UTF-8 EXACTFU nodes that begin or end with [Ss] from being trieable. This commit changes things so that the only non-trieable nodes are the ones that, when joined, have the sequence [Ss][Ss] in them.

To do so, I created three new node types that indicate whether the node begins with [Ss], ends with it, or both. These avoid having to examine the node contents at joining time to determine this. And since there are plenty of node types available, this seemed the best choice. But other options would be available should we run out of node types. Examining the first and final characters of a node is not expensive, for example.
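As a rough illustration of the idea described above, the following C sketch shows how a node type that records "begins with [Ss]" and "ends with [Ss]" lets a join check be decided from the types alone. Everything here is hypothetical: the `sketch_` enum, its values, and the helper functions are illustrative stand-ins, not the definitions generated from regcomp.sym nor the logic in regcomp.c. The sketch only considers the boundary between two nodes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the node types named in this commit.
 * The real opcode values are generated from regcomp.sym. */
typedef enum {
    SKETCH_EXACTFU,        /* neither begins nor ends with [Ss] */
    SKETCH_EXACTFS_B_U,    /* begins with [Ss] */
    SKETCH_EXACTFS_E_U,    /* ends with [Ss] */
    SKETCH_EXACTFS_BE_U    /* begins and ends with [Ss] */
} sketch_node_type;

static bool sketch_is_s(char c) {
    return c == 's' || c == 'S';
}

/* Pick a node type by looking only at the first and final characters
 * of a non-UTF-8 string (assumed behaviour, for illustration). */
static sketch_node_type sketch_classify(const char *str, size_t len) {
    if (len == 0) return SKETCH_EXACTFU;
    bool begins = sketch_is_s(str[0]);
    bool ends = sketch_is_s(str[len - 1]);
    if (begins && ends) return SKETCH_EXACTFS_BE_U;
    if (begins) return SKETCH_EXACTFS_B_U;
    if (ends) return SKETCH_EXACTFS_E_U;
    return SKETCH_EXACTFU;
}

static bool sketch_begins_with_s(sketch_node_type t) {
    return t == SKETCH_EXACTFS_B_U || t == SKETCH_EXACTFS_BE_U;
}

static bool sketch_ends_with_s(sketch_node_type t) {
    return t == SKETCH_EXACTFS_E_U || t == SKETCH_EXACTFS_BE_U;
}

/* True when placing `right` directly after `left` would put the
 * sequence [Ss][Ss] across the join boundary. Decided from the node
 * types alone, without re-reading the node strings. */
static bool sketch_join_creates_ss(sketch_node_type left,
                                   sketch_node_type right) {
    return sketch_ends_with_s(left) && sketch_begins_with_s(right);
}
```

Under these assumptions, a node for "pass" followed by a node for "stop" would be flagged, because the first ends with [Ss] and the second begins with it, while "stop" followed by "pass" would not.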
Diffstat (limited to 'regcomp.sym')
-rw-r--r-- regcomp.sym 6
1 files changed, 6 insertions, 0 deletions
diff --git a/regcomp.sym b/regcomp.sym
index dffc03b1a0..ddf5ba886f 100644
--- a/regcomp.sym
+++ b/regcomp.sym
@@ -117,6 +117,12 @@ EXACTFU_ONLY8 EXACT, str ; Like EXACTFU, but only UTF-8 encoded targets
# One could add EXACTFAA8 and and something that has the same effect for /l,
# but these would be extremely uncommon
+# If we ran out of node types, these could be replaced by some other method,
+# such as instead examining the first and final characters of nodes.
+EXACTFS_B_U EXACT, str ; EXACTFU but begins with [Ss]; (string not UTF-8; compile-time only).
+EXACTFS_E_U EXACT, str ; EXACTFU but ends with [Ss]; (string not UTF-8; compile-time only).
+EXACTFS_BE_U EXACT, str ; EXACTFU but begins and ends with [Ss]; (string not UTF-8; compile-time only).
+
#*Do nothing types
NOTHING NOTHING, no ; Match empty string.