author: Karl Williamson <khw@cpan.org> 2018-12-23 13:10:09 -0700
committer: Karl Williamson <khw@cpan.org> 2018-12-26 12:50:37 -0700
commit: aa419ff31a1e359d67cd44223a599ef9f276ca12
tree: 23c69948685975b5e2a5e8d5490b41dbe82405ba
parent: 53362e8571b73a5375a41d96a612bb1ff62d5bbf
regexec.c: Most /iaa nodes are now pre-folded
So we don't have to re-fold them at match time. Previous commits caused all EXACTFAA nodes to be pre-folded, and regexec.c now has the infrastructure to take advantage of this, including for non-UTF-8 patterns. This commit makes regexec.c do so.

The only EXACTFAA nodes not treated as pre-folded are those where the pattern is not UTF-8 but the target string is. The reason is that the MICRO SIGN folds to something representable only in UTF-8. If both pattern and target are non-UTF-8, the pattern is effectively folded; if the pattern is UTF-8, the MICRO SIGN gets folded to the proper character.

For non-UTF-8 /iaa nodes to always count as fully folded, there would need to be a separate node type for those that contain the MICRO SIGN, and only that type would not be considered folded when the target is UTF-8. I don't think it's worth it, as the only gain would be in matching a non-UTF-8 /iaa node against a UTF-8 target string, and I suspect /iaa will mostly be used with non-UTF-8 target strings. Comments have been added to point this out in case someone thinks it should be implemented.
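The MICRO SIGN case can be illustrated with a small standalone sketch. This is not code from regcomp.c or regexec.c: the names `fold_cp`, `prefold_latin1`, and `counts_as_prefolded` are hypothetical, and the fold table is a simplified simple-case-fold for code points up to 0xFF only. It shows that every Latin-1 byte except U+00B5 has a fold that fits back in one byte, and that the pattern can therefore be trusted as folded unless it is non-UTF-8 and the target is UTF-8.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MICRO_SIGN     0xB5u    /* U+00B5 MICRO SIGN */
#define GREEK_SMALL_MU 0x03BCu  /* U+03BC, the case fold of MICRO SIGN */

/* Simplified simple case fold for a code point in the range 0..0xFF.
 * Hypothetical helper; real Perl uses generated fold tables. */
static uint32_t fold_cp(uint32_t cp)
{
    if (cp >= 'A' && cp <= 'Z')
        return cp + 0x20;                /* ASCII upper to lower */
    if (cp >= 0xC0 && cp <= 0xDE && cp != 0xD7)
        return cp + 0x20;                /* Latin-1 upper to lower; 0xD7 is the multiplication sign */
    if (cp == MICRO_SIGN)
        return GREEK_SMALL_MU;           /* fold lies outside Latin-1 */
    return cp;
}

/* Pre-fold a non-UTF-8 (Latin-1) pattern in place.
 * Returns true if every byte now holds its folded form, false if some
 * byte (in this sketch, only MICRO SIGN) had to be left unfolded because
 * its fold cannot be stored in a single byte. */
static bool prefold_latin1(unsigned char *s, size_t len)
{
    bool fully_folded = true;
    for (size_t i = 0; i < len; i++) {
        uint32_t folded = fold_cp(s[i]);
        if (folded > 0xFF)
            fully_folded = false;        /* leave the byte as it is */
        else
            s[i] = (unsigned char) folded;
    }
    return fully_folded;
}

/* The rule described in the commit message, as a predicate: the pattern
 * is treated as pre-folded unless it is non-UTF-8 and the target is UTF-8. */
static bool counts_as_prefolded(bool pattern_is_utf8, bool target_is_utf8)
{
    return pattern_is_utf8 || !target_is_utf8;
}
```

With a non-UTF-8 target, an unfolded 0xB5 in the pattern can only ever meet another 0xB5, so skipping the fold is harmless. A UTF-8 target may contain U+03BC itself, which is why that one combination still needs folding at match time.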
Diffstat (limited to 'regcomp.sym')
-rw-r--r-- regcomp.sym 7
1 files changed, 7 insertions, 0 deletions
diff --git a/regcomp.sym b/regcomp.sym
index 235305dbc9..ab9943def4 100644
--- a/regcomp.sym
+++ b/regcomp.sym
@@ -108,6 +108,13 @@ EXACTFAA EXACT, str ; Match this string using /iaa rules (w/len) (stri
# End of important relative ordering.
EXACTFU_SS EXACT, str ; Match this string using /iu rules (w/len); (string not UTF-8, only portions guaranteed to be folded; folded length > unfolded).
+
+# In order for a non-UTF-8 EXACTFAA to think the pattern is pre-folded when
+# matching a UTF-8 target string, there would have to be something like an
+# EXACTFAA_MICRO which would not be considered pre-folded for UTF-8 targets,
+# since the fold of the MICRO SIGN would not be done, and would be
+# representable in the UTF-8 target string.
+
EXACTFLU8 EXACT, str ; Like EXACTFU, but use /il, UTF-8, folded, and everything in it is above 255.
EXACTFAA_NO_TRIE EXACT, str ; Match this string using /iaa rules (w/len) (string not UTF-8, not guaranteed to be folded, not currently trie-able).