path: root/regcomp.sym
author: Karl Williamson <khw@cpan.org> 2018-12-19 11:21:28 -0700
committer: Karl Williamson <khw@cpan.org> 2018-12-26 12:50:37 -0700
commit: 0ea669f4e37ccfbcd5ad708ca625ec17bf22e5b3
tree: 8c8766a5d7fd7dc7cde698caa828e97d16a272c0 /regcomp.sym
parent: 627a7895564679975632d9b637b27e9c09d3d985
Collapse regnode EXACTFU_SS into EXACTFUP

EXACTFUP was created by the previous commit to handle a problematic case in which not all the code points in an EXACTFU node are /i foldable at compile time. Doing so will allow a future commit to use the pre-folded EXACTFU nodes (done in a prior commit), saving execution time for the common case. The only problematic code point is the MICRO SIGN. Most patterns don't use this character.

EXACTFU_SS is problematic in a different way. It contains the sequence 'ss' which is folded to by LATIN SMALL LETTER SHARP S, but everything in it can be pre-folded (unless it also contains a MICRO SIGN). The reason this is problematic is that it is the only non-UTF-8 node where the length in folding can change. To process it at runtime, the more general fold equivalence function is used that is capable of handling length disparities, but is slower than the functions otherwise used for non-UTF-8.

What I've chosen to do for now is to make a single node type for all the problematic cases (which at this time means just the two aforementioned ones). If we didn't do this, we'd have to add a third node type for patterns that contain both 'ss' and MICRO. Or artificially split the pattern so the two never were in the same node, but we can't do that because it can cause bugs in handling multi-character folds. If more special handling is found to be needed, there'd be a combinatorial explosion of additional node types to handle all possible combinations.

What this effectively means is that the slower, more general foldEQ function is used for portions of patterns containing the MICRO sign when the pattern isn't in UTF-8, even though there is no inherent reason to do so for non-UTF-8 strings that don't also contain the 'ss' sequence.

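The two problem cases named in the message can be reproduced with any full Unicode case-folding implementation. The sketch below uses Python's str.casefold() purely as a stand-in for /iu folding; it is not code from the Perl regex engine, and the helper fold_changes_length is a hypothetical name introduced for illustration. The comment on the MICRO SIGN case reflects an assumption about why it cannot be pre-folded: its fold lies outside the range a non-UTF-8 pattern can store.

```python
# Illustration only: str.casefold() applies full Unicode case folding and is
# used here as a stand-in for /iu folding. Nothing below is Perl source.

SHARP_S = "\u00DF"      # LATIN SMALL LETTER SHARP S
MICRO_SIGN = "\u00B5"   # MICRO SIGN
GREEK_MU = "\u03BC"     # GREEK SMALL LETTER MU


def fold_changes_length(s: str) -> bool:
    """Hypothetical helper: True when folding changes the code point count."""
    return len(s.casefold()) != len(s)


# Case 1 ('ss'): SHARP S folds to the two-character sequence 'ss', so a
# one-character target can match a two-character pattern. This is the
# length disparity that requires the more general fold-equivalence routine.
assert SHARP_S.casefold() == "ss"
assert fold_changes_length(SHARP_S)

# Case 2 (MICRO SIGN): the character itself fits in a single byte, but its
# fold is GREEK SMALL LETTER MU, which is above 0xFF. The length does not
# change. Assumption: this out-of-range fold is what prevents a non-UTF-8
# pattern from being pre-folded at compile time.
assert MICRO_SIGN.casefold() == GREEK_MU
assert ord(MICRO_SIGN) <= 0xFF < ord(GREEK_MU)
assert not fold_changes_length(MICRO_SIGN)
```

Only the first case involves a change in length, which is consistent with the message's remark that MICRO SIGN alone has no inherent need for the slower comparison.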
Diffstat (limited to 'regcomp.sym')
-rw-r--r-- regcomp.sym 1
1 files changed, 0 insertions, 1 deletions
diff --git a/regcomp.sym b/regcomp.sym
index bdbe059cc5..8033a138d2 100644
--- a/regcomp.sym
+++ b/regcomp.sym
@@ -107,7 +107,6 @@ EXACTFAA EXACT, str ; Match this string using /iaa rules (w/len) (stri
# End of important relative ordering.
-EXACTFU_SS EXACT, str ; Match this string using /iu rules (w/len); (string not UTF-8, only portions guaranteed to be folded; folded length > unfolded).
EXACTFUP EXACT, str ; Match this string using /iu rules (w/len); (string not UTF-8, not guaranteed to be folded; and its Problematic).
# In order for a non-UTF-8 EXACTFAA to think the pattern is pre-folded when
# matching a UTF-8 target string, there would have to be something like an