author | Karl Williamson <khw@cpan.org> | 2022-05-31 06:44:20 -0600 |
---|---|---|
committer | Karl Williamson <khw@cpan.org> | 2022-05-31 06:52:33 -0600 |
commit | 8fcaedaae577eaca064a33debe61acb344a48a33 (patch) | |
tree | 67295faf1c4e526476b6b35e605896afb361d805 /regcomp.sym | |
parent | d28f3d9926a411980f5a4cdc1cd98635331536da (diff) | |
download | perl-8fcaedaae577eaca064a33debe61acb344a48a33.tar.gz |
regcomp.sym: Comment why no ANYOFRs node exists
Diffstat (limited to 'regcomp.sym')
-rw-r--r-- | regcomp.sym | 16 |
1 files changed, 14 insertions, 2 deletions
diff --git a/regcomp.sym b/regcomp.sym
index bdf6e47551..f4060a378d 100644
--- a/regcomp.sym
+++ b/regcomp.sym
@@ -95,8 +95,20 @@ ANYOFHr ANYOF, sv 1 S ; Like ANYOFH, but the flags field contains pa
 ANYOFHs ANYOF, sv 1 S ; Like ANYOFHb, but has a string field that gives the leading matchable UTF-8 bytes; flags field is len
 ANYOFR ANYOFR, packed 1 S ; Matches any character in the range given by its packed args: upper 12 bits is the max delta from the base lower 20; the flags field contains the lowest matchable UTF-8 start byte
 ANYOFRb ANYOFR, packed 1 S ; Like ANYOFR, but all matches share the same UTF-8 start byte, given in the flags field
-# There is no ANYOFRr because khw doesn't think there are likely to be real-world cases where such a large range is used.
-
+# There is no ANYOFRr because khw doesn't think there are likely to be
+# real-world cases where such a large range is used.
+#
+# And khw doesn't believe an ANYOFRs (which would behave like ANYOFHs) is
+# actually worth it. On two-byte UTF-8, the first byte alone is all we need,
+# and ANYOFR already does that. And we don't consider non-Unicode code points
+# or EBCDIC for performance decisions. If we had it, we would be comparing the
+# strings, and if they are equal convert to UV and then test to see if it is in
+# the range. The fast DFA we now use to do the conversion is slower than
+# comparing the strings, but not by much, and negligible in 2 or 3 byte
+# operations. (We don't have to compare the final byte as it has to be
+# different or else this wouldn't be a range.) So we might as well displense
+# with the comparisons that ANYOFRs would do, and go directly to do the
+# conversion .
 ANYOFM ANYOFM, byte 1 S ; Like ANYOF, but matches an invariant byte as determined by the mask and arg
 NANYOFM ANYOFM, byte 1 S ; complement of ANYOFM
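
The ANYOFR description in the diff above says the node's packed argument holds the base code point in its lower 20 bits and the maximum delta from that base in its upper 12 bits. The following is a minimal sketch of how such a value could be packed and tested once the input has been converted to a code point. It assumes a 32-bit argument; the names `pack_range`, `in_packed_range`, `BASE_BITS`, and `BASE_MASK` are hypothetical and chosen for illustration, not taken from the Perl source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names. The layout follows the regcomp.sym comment:
 * lower 20 bits = base code point, upper 12 bits = max delta. */
#define BASE_BITS 20u
#define BASE_MASK ((UINT32_C(1) << BASE_BITS) - 1)

/* Pack a range [base, base + delta] into one 32-bit argument.
 * The caller must keep delta within 12 bits (0..0xFFF). */
static uint32_t pack_range(uint32_t base, uint32_t delta)
{
    return (delta << BASE_BITS) | (base & BASE_MASK);
}

/* Test whether code point cp lies in the packed range. */
static bool in_packed_range(uint32_t packed, uint32_t cp)
{
    uint32_t base  = packed & BASE_MASK;
    uint32_t delta = packed >> BASE_BITS;

    /* Unsigned subtraction wraps to a huge value when cp < base,
     * so a single comparison covers both ends of the range. */
    return (cp - base) <= delta;
}
```

With this layout, `in_packed_range(pack_range(0x100, 0x10), 0x110)` is true and the same call with `0x111` or `0xFF` is false. The check is one subtraction and one comparison after conversion, which fits the commit's argument that an extra leading-byte string comparison in front of it would save little.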