author | Karl Williamson <khw@cpan.org> | 2018-12-23 13:10:09 -0700 |
---|---|---|
committer | Karl Williamson <khw@cpan.org> | 2018-12-26 12:50:37 -0700 |
commit | aa419ff31a1e359d67cd44223a599ef9f276ca12 (patch) | |
tree | 23c69948685975b5e2a5e8d5490b41dbe82405ba /regcomp.sym | |
parent | 53362e8571b73a5375a41d96a612bb1ff62d5bbf (diff) | |
download | perl-aa419ff31a1e359d67cd44223a599ef9f276ca12.tar.gz |
regexec.c: Most /iaa nodes are now pre-folded
So, we don't have to re-fold them.

Previous commits have caused any EXACTFAA nodes to be pre-folded, and we
now have the infrastructure in regexec.c to take advantage of this,
including in non-UTF-8 patterns. This commit changes regexec.c to do so.
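
To illustrate what pre-folding buys, here is a minimal sketch in C, not the code in regexec.c. It assumes a simplified Latin-1 fold, and every name in it (`sketch_fold_latin1`, `sketch_match_refold`, `sketch_match_prefolded`) is hypothetical. When the pattern was folded once at compile time, the matcher only has to fold the target.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative simple fold for Latin-1 bytes. This is a stand-in for the
 * sketch, not Perl's real fold table. */
static unsigned char sketch_fold_latin1(unsigned char c)
{
    if (c >= 'A' && c <= 'Z')
        return (unsigned char)(c + 0x20);
    if (c >= 0xC0 && c <= 0xDE && c != 0xD7)  /* 0xD7 is MULTIPLICATION SIGN */
        return (unsigned char)(c + 0x20);
    return c;
}

/* Pattern not known to be folded: fold both sides on every comparison. */
static bool sketch_match_refold(const unsigned char *pat,
                                const unsigned char *target, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (sketch_fold_latin1(pat[i]) != sketch_fold_latin1(target[i]))
            return false;
    return true;
}

/* Pattern folded once at compile time: only the target needs folding. */
static bool sketch_match_prefolded(const unsigned char *folded_pat,
                                   const unsigned char *target, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (folded_pat[i] != sketch_fold_latin1(target[i]))
            return false;
    return true;
}
```

Both functions give the same answer for a folded pattern. The second one simply skips one fold per byte compared.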

The only EXACTFAA nodes not considered pre-folded are those where the
pattern is not UTF-8 but the target string is. The reason is that the
MICRO SIGN folds to a character representable only in UTF-8. If both
pattern and target are non-UTF-8, the node is effectively folded, and if
the pattern is UTF-8, the MICRO SIGN gets folded to the proper character.
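
The rule above can be sketched as a small predicate. This is an illustration under stated assumptions, not the test regexec.c performs: the helper names are hypothetical, and the fold function covers only ASCII letters and the MICRO SIGN (U+00B5), whose fold is GREEK SMALL LETTER MU (U+03BC).

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MICRO_SIGN     0xB5u   /* U+00B5, a single byte when not UTF-8 */
#define SKETCH_GREEK_SMALL_MU 0x3BCu  /* U+03BC, the fold of MICRO SIGN */

/* Deliberately tiny fold: ASCII letters plus the one problem character. */
static uint32_t sketch_fold_cp(uint32_t cp)
{
    if (cp == SKETCH_MICRO_SIGN)
        return SKETCH_GREEK_SMALL_MU;
    if (cp >= 'A' && cp <= 'Z')
        return cp + 0x20;
    return cp;
}

/* A non-UTF-8 string can only hold code points 0x00..0xFF. */
static bool sketch_fits_in_non_utf8(uint32_t cp)
{
    return cp <= 0xFF;
}

/* The node counts as pre-folded unless the pattern is non-UTF-8 and the
 * target is UTF-8, the one case where an unfolded MICRO SIGN matters. */
static bool sketch_exactfaa_is_prefolded(bool pattern_is_utf8,
                                         bool target_is_utf8)
{
    return pattern_is_utf8 || !target_is_utf8;
}
```

The fold of the MICRO SIGN fails `sketch_fits_in_non_utf8`, which is why a non-UTF-8 pattern cannot store it and must be treated as unfolded against a UTF-8 target.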

In order for non-UTF-8 /iaa nodes to always be fully folded, there would
need to be a separate node type for ones that contain the MICRO SIGN, and
then only that one wouldn't be considered folded when the target is
UTF-8. I don't think it's worth it, as the only gain would be in
matching a non-UTF-8 /iaa node against a UTF-8 target string. I suspect
/iaa will be used mostly with non-UTF-8 target strings. Comments have been
added to point this out in case someone thinks it should be implemented.
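
For anyone weighing that alternative, the following is a rough sketch of what the unimplemented design might look like. Everything here is hypothetical: the enum, the function names, and the assumption that the non-UTF-8 pattern is already folded apart from any MICRO SIGN byte (0xB5). It is not part of this commit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node kinds; only EXACTFAA exists in regcomp.sym today. */
enum sketch_node {
    SKETCH_EXACTFAA,        /* non-UTF-8 pattern with no MICRO SIGN */
    SKETCH_EXACTFAA_MICRO   /* non-UTF-8 pattern containing a MICRO SIGN */
};

/* Compile time: pick the node kind by scanning for the byte 0xB5. */
static enum sketch_node sketch_pick_node(const unsigned char *pat, size_t len)
{
    return memchr(pat, 0xB5, len) ? SKETCH_EXACTFAA_MICRO : SKETCH_EXACTFAA;
}

/* Match time: only the MICRO variant against a UTF-8 target is unfolded. */
static bool sketch_node_is_prefolded(enum sketch_node node,
                                     bool target_is_utf8)
{
    return !(node == SKETCH_EXACTFAA_MICRO && target_is_utf8);
}
```

The cost is an extra node type to carry through the regex compiler and matcher, for a gain that applies only to non-UTF-8 patterns matched against UTF-8 targets.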
Diffstat (limited to 'regcomp.sym')
-rw-r--r-- | regcomp.sym | 7 |
1 files changed, 7 insertions, 0 deletions
diff --git a/regcomp.sym b/regcomp.sym
index 235305dbc9..ab9943def4 100644
--- a/regcomp.sym
+++ b/regcomp.sym
@@ -108,6 +108,13 @@ EXACTFAA EXACT, str ; Match this string using /iaa rules (w/len) (stri
 # End of important relative ordering.
 EXACTFU_SS EXACT, str ; Match this string using /iu rules (w/len); (string not UTF-8, only portions guaranteed to be folded; folded length > unfolded).
+
+# In order for a non-UTF-8 EXACTFAA to think the pattern is pre-folded when
+# matching a UTF-8 target string, there would have to be something like an
+# EXACTFAA_MICRO which would not be considered pre-folded for UTF-8 targets,
+# since the fold of the MICRO SIGN would not be done, and would be
+# representable in the UTF-8 target string.
+
 EXACTFLU8 EXACT, str ; Like EXACTFU, but use /il, UTF-8, folded, and everything in it is above 255.
 EXACTFAA_NO_TRIE EXACT, str ; Match this string using /iaa rules (w/len) (string not UTF-8, not guaranteed to be folded, not currently trie-able).