author | Karl Williamson <khw@cpan.org> | 2019-09-19 16:03:04 -0600 |
---|---|---|
committer | Karl Williamson <khw@cpan.org> | 2019-11-17 21:20:07 -0700 |
commit | 13fcf6522466471a1b1c5fc2d760dd5367fd8940 (patch) | |
tree | f6272af8bf0e7308ab5792219978ab12d75d78d3 /regcomp.sym | |
parent | d913538e4f136a14760fb7c73de064901acfc25b (diff) | |
download | perl-13fcf6522466471a1b1c5fc2d760dd5367fd8940.tar.gz |
Add ANYOFR regnode
This matches a single range of code points. It is both faster and
smaller than other ANYOF-type nodes, requiring, after set-up, a single
subtraction and conditional branch.
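
The "single subtraction and conditional branch" can be illustrated with the usual unsigned range-check idiom. This is a minimal sketch, not the code in the Perl core: the `range_node` type and both function names are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical illustration type; NOT the Perl core's regnode layout. */
typedef struct {
    uint32_t base;   /* lowest code point in the range */
    uint32_t delta;  /* highest code point minus base */
} range_node;

/* Set-up: done once, when the node is built. */
static range_node range_node_init(uint32_t lo, uint32_t hi)
{
    assert(lo <= hi);
    range_node n = { lo, hi - lo };
    return n;
}

/* Match: one unsigned subtraction and one comparison.  If cp is below
 * base, the subtraction wraps around to a very large value, so the same
 * comparison rejects it and no separate lower-bound test is needed. */
static bool range_node_matches(const range_node *n, uint32_t cp)
{
    return (cp - n->base) <= n->delta;
}
```

For `[ij]`, `range_node_init('i', 'j')` yields base 0x69 and delta 1, so only `i` and `j` pass the test.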
The vast majority of Unicode properties match a single range (though
most of the properties likely to be used in real-world applications have
more than a single range). But things like [ij] are a single range, and
those are quite commonly encountered. This new regnode matches them more
efficiently than a bitmap would, and doesn't require the space for one
either.
The flags field is used to store the minimum matchable start byte for
UTF-8 strings, and is ignored for non-UTF-8 targets. As with the similar
mechanism in ANYOFH nodes, this allows many impossible matches to be
weeded out quickly, without having to convert the UTF-8 to its
corresponding code point.
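
The start-byte test works because, in UTF-8, a larger code point never has a smaller first byte. The sketch below assumes standard UTF-8 on an ASCII platform (Perl's extended UTF-8 and UTF-EBCDIC are not covered), and `utf8_start_byte` and `quick_reject` are hypothetical names, not functions from the Perl source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* First byte of the standard UTF-8 encoding of cp.
 * Assumption: cp is at most 0x10FFFF. */
static uint8_t utf8_start_byte(uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80)    return (uint8_t)cp;                   /* 1 byte  */
    if (cp < 0x800)   return (uint8_t)(0xC0 | (cp >> 6));   /* 2 bytes */
    if (cp < 0x10000) return (uint8_t)(0xE0 | (cp >> 12));  /* 3 bytes */
    return (uint8_t)(0xF0 | (cp >> 18));                    /* 4 bytes */
}

/* True if the character starting at s cannot match, judged from its
 * first byte alone.  min_start_byte plays the role of the flags field:
 * the start byte of the lowest code point in the range. */
static bool quick_reject(const uint8_t *s, uint8_t min_start_byte)
{
    return s[0] < min_start_byte;
}
```

A `false` result only means the character survived the cheap test; the full range check on the decoded code point is still required.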
This regnode packs the 32-bit argument with 20 bits for the minimum code
point the node matches, and 12 bits for the maximum delta above that
minimum. If a range does not fit within these limits, it simply won't
compile to this regnode, instead going to one of the ANYOFH flavors.
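
One way to picture the packing is sketched here, following the layout in the regcomp.sym comment (delta in the upper 12 bits, base in the lower 20). The macro and function names are hypothetical and are not the identifiers used in the Perl source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BASE_BITS 20
#define BASE_MASK ((UINT32_C(1) << BASE_BITS) - 1)  /* 0xFFFFF */
#define DELTA_MAX ((UINT32_C(1) << 12) - 1)         /* 0xFFF   */

/* Pack [lo, hi] into *arg.  Returns false when the range does not fit,
 * in which case the caller would have to choose another node type. */
static bool pack_range(uint32_t lo, uint32_t hi, uint32_t *arg)
{
    assert(lo <= hi);
    if (lo > BASE_MASK || hi - lo > DELTA_MAX)
        return false;
    *arg = ((hi - lo) << BASE_BITS) | lo;
    return true;
}

/* Recover the two fields from a packed argument. */
static uint32_t range_base(uint32_t arg)  { return arg & BASE_MASK; }
static uint32_t range_delta(uint32_t arg) { return arg >> BASE_BITS; }
```

Under this layout a base of 0x100000, the first code point of plane 16, does not fit in 20 bits, so `pack_range` reports failure for it.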
ANYOFR is sufficient to match all of Unicode except for the final
(private use) 65K plane.
Diffstat (limited to 'regcomp.sym')
-rw-r--r-- | regcomp.sym | 1 |
1 files changed, 1 insertions, 0 deletions
diff --git a/regcomp.sym b/regcomp.sym
index 4ea160e6db..b664fc8f07 100644
--- a/regcomp.sym
+++ b/regcomp.sym
@@ -82,6 +82,7 @@ ANYOFPOSIXL ANYOF, sv charclass_posixl S ; Like ANYOFL, but matches [[:p
 ANYOFH ANYOF, sv 1 S ; Like ANYOF, but only has "High" matches, none in the bitmap; the flags field contains the lowest matchable UTF-8 start byte
 ANYOFHb ANYOF, sv 1 S ; Like ANYOFH, but all matches share the same UTF-8 start byte, given in the flags field
 ANYOFHr ANYOF, sv 1 S ; Like ANYOFH, but the flags field contains packed bounds for all matchable UTF-8 start bytes.
+ANYOFR ANYOFR, packed 1 S ; Matches any character in the range given by its packed args: upper 12 bits is the max delta from the base lower 20; the flags field contains the lowest matchable UTF-8 start byte
 ANYOFM ANYOFM byte 1 S ; Like ANYOF, but matches an invariant byte as determined by the mask and arg
 NANYOFM ANYOFM byte 1 S ; complement of ANYOFM