author | Karl Williamson <khw@cpan.org> | 2018-12-04 09:58:13 -0700 |
---|---|---|
committer | Karl Williamson <khw@cpan.org> | 2018-12-07 21:12:16 -0700 |
commit | 8a100c918ec81926c0536594df8ee1fcccb171da (patch) | |
tree | 61553beb7d50f69a1c65e5295f7cf7fc079097cd /regcomp.sym | |
parent | 127a194773690138ef2e74691af748a925a2f47a (diff) | |
download | perl-8a100c918ec81926c0536594df8ee1fcccb171da.tar.gz |
regcomp.c: Allow more EXACTFish nodes to be trieable
The previous two commits fixed bugs where it would be possible during
optimization to join two EXACTFish nodes together, and the result would
not work properly with LATIN SMALL LETTER SHARP S. But by doing so,
the commits prevented all non-UTF-8 EXACTFU nodes that begin or end with
[Ss] from being trieable.
This commit changes things so that the only ones that are
non-trieable are the ones that, when joined, have the sequence [Ss][Ss]
in them. To do so, I created three new node types that indicate if the
node begins with [Ss], ends with [Ss], or both. These preclude having
to examine the node contents at joining to determine this. And since
there are plenty of node types available, it seemed the best choice.
But other options would be available should we run out of nodes.
Examining the first and final characters of a node is not expensive, for
example.
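The alternative mentioned above, examining the first and final characters of a node, could look roughly like the following. This is a minimal sketch and not code from regcomp.c: the `ss_edge` flags and the helpers `classify_edges()` and `join_creates_ss()` are hypothetical names invented for illustration, and the sketch only considers an [Ss][Ss] pair formed across the boundary between two non-UTF-8 strings.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags standing in for the "begins with [Ss]" and
 * "ends with [Ss]" information; these are not perl's real node types. */
enum ss_edge {
    SS_NONE  = 0,
    SS_BEGIN = 1 << 0,  /* first character is 's' or 'S' */
    SS_END   = 1 << 1   /* final character is 's' or 'S' */
};

static bool is_s(char c)
{
    return c == 's' || c == 'S';
}

/* Classify a non-UTF-8 string by looking only at its two edge characters. */
static int classify_edges(const char *str, size_t len)
{
    int flags = SS_NONE;

    if (len == 0)
        return flags;
    if (is_s(str[0]))
        flags |= SS_BEGIN;
    if (is_s(str[len - 1]))
        flags |= SS_END;
    return flags;
}

/* Joining left + right forms [Ss][Ss] at the boundary only when the
 * left string ends with [Ss] and the right string begins with [Ss]. */
static bool join_creates_ss(int left_flags, int right_flags)
{
    return (left_flags & SS_END) && (right_flags & SS_BEGIN);
}
```

With these helpers, "as" followed by "st" would be flagged, since the joined text contains "ss", while "as" followed by "ta" would not. Recording the same information in the node type, as this commit does, lets the check happen without touching the string contents.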
Diffstat (limited to 'regcomp.sym')
-rw-r--r-- | regcomp.sym | 6 |
1 files changed, 6 insertions, 0 deletions
diff --git a/regcomp.sym b/regcomp.sym
index dffc03b1a0..ddf5ba886f 100644
--- a/regcomp.sym
+++ b/regcomp.sym
@@ -117,6 +117,12 @@ EXACTFU_ONLY8 EXACT, str ; Like EXACTFU, but only UTF-8 encoded targets
 # One could add EXACTFAA8 and and something that has the same effect for /l,
 # but these would be extremely uncommon
 
+# If we ran out of node types, these could be replaced by some other method,
+# such as instead examining the first and final characters of nodes.
+EXACTFS_B_U EXACT, str ; EXACTFU but begins with [Ss]; (string not UTF-8; compile-time only).
+EXACTFS_E_U EXACT, str ; EXACTFU but ends with [Ss]; (string not UTF-8; compile-time only).
+EXACTFS_BE_U EXACT, str ; EXACTFU but begins and ends with [Ss]; (string not UTF-8; compile-time only).
+
 #*Do nothing types
 
 NOTHING NOTHING, no ; Match empty string.