From 8fcaedaae577eaca064a33debe61acb344a48a33 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Tue, 31 May 2022 06:44:20 -0600
Subject: regcomp.sym: Comment why no ANYOFRs node exists

---
 regcomp.sym | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

(limited to 'regcomp.sym')

diff --git a/regcomp.sym b/regcomp.sym
index bdf6e47551..f4060a378d 100644
--- a/regcomp.sym
+++ b/regcomp.sym
@@ -95,8 +95,20 @@ ANYOFHr ANYOF, sv 1 S ; Like ANYOFH, but the flags field contains pa
 ANYOFHs ANYOF, sv 1 S ; Like ANYOFHb, but has a string field that gives the leading matchable UTF-8 bytes; flags field is len
 ANYOFR ANYOFR, packed 1 S ; Matches any character in the range given by its packed args: upper 12 bits is the max delta from the base lower 20; the flags field contains the lowest matchable UTF-8 start byte
 ANYOFRb ANYOFR, packed 1 S ; Like ANYOFR, but all matches share the same UTF-8 start byte, given in the flags field
-# There is no ANYOFRr because khw doesn't think there are likely to be real-world cases where such a large range is used.
-
+# There is no ANYOFRr because khw doesn't think there are likely to be
+# real-world cases where such a large range is used.
+#
+# And khw doesn't believe an ANYOFRs (which would behave like ANYOFHs) is
+# actually worth it. On two-byte UTF-8, the first byte alone is all we need,
+# and ANYOFR already does that. And we don't consider non-Unicode code points
+# or EBCDIC for performance decisions. If we had it, we would be comparing the
+# strings, and if they are equal convert to UV and then test to see if it is in
+# the range. The fast DFA we now use to do the conversion is slower than
+# comparing the strings, but not by much, and negligible in 2 or 3 byte
+# operations. (We don't have to compare the final byte as it has to be
+# different or else this wouldn't be a range.) So we might as well dispense
+# with the comparisons that ANYOFRs would do, and go directly to do the
+# conversion.
 ANYOFM ANYOFM, byte 1 S ; Like ANYOF, but matches an invariant byte as determined by the mask and arg
 NANYOFM ANYOFM, byte 1 S ; complement of ANYOFM
-- 
cgit v1.2.1
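
Reviewer note: the ANYOFR description in the hunk above says the packed argument keeps the range base in its lower 20 bits and the maximum delta from that base in its upper 12 bits. The following is a minimal C sketch of how such a value could be packed and tested once a character has been converted to a code point. The names `BASE_BITS`, `BASE_MASK`, `pack_range`, and `in_packed_range` are hypothetical and are not taken from the Perl source; only the 20/12 bit split comes from the comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; only the 20/12 bit split comes from regcomp.sym. */
#define BASE_BITS 20u
#define BASE_MASK ((UINT32_C(1) << BASE_BITS) - 1)

/* Pack a range [base, base + delta] into one 32-bit argument:
 * lower 20 bits hold the base, upper 12 bits hold the max delta. */
static uint32_t pack_range(uint32_t base, uint32_t delta)
{
    return (delta << BASE_BITS) | (base & BASE_MASK);
}

/* Test whether code point cp lies in the packed range.
 * The unsigned subtraction wraps to a very large value when cp < base,
 * so a single comparison covers both ends of the range. */
static bool in_packed_range(uint32_t packed, uint32_t cp)
{
    uint32_t base  = packed & BASE_MASK;
    uint32_t delta = packed >> BASE_BITS;
    return (uint32_t)(cp - base) <= delta;
}
```

For example, under these assumptions the range U+0400..U+042F would be packed with `pack_range(0x400, 0x2F)`, and `in_packed_range` would accept 0x400 and 0x42F while rejecting 0x3FF and 0x430. Since the range test is this cheap after conversion, the sketch is consistent with the comment's argument that an extra string-prefix comparison before conversion would save little.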