author | Karl Williamson <khw@cpan.org> | 2019-09-25 10:12:32 -0600 |
---|---|---|
committer | Karl Williamson <khw@cpan.org> | 2019-09-29 11:46:26 -0600 |
commit | ae06e581c6e9944620eed4980fe89a3749886ed0 (patch) | |
tree | 44d0e3585bccc343af3dab4ceb693667aafe1b90 /regcomp.sym | |
parent | 3ae8ec479bc65ef004bd856d90b82106186771d9 (diff) | |
download | perl-ae06e581c6e9944620eed4980fe89a3749886ed0.tar.gz |
Add regnode LEXACT, for long strings
This commit adds a new regnode for strings that don't fit in a regular
one, and adds a structure for that regnode to use. Actually using them
is deferred to the next commit.
This new regnode structure is needed because the previous structure only
allows for an 8-bit length field, so a string can be at most 255 bytes.
This commit puts the length instead in a new field, the same place
single-argument regnodes put their argument. Hence a long string node
carries an extra 32 bits of overhead, but at no string length is this
node ever bigger than the combination of the smaller nodes it replaces.
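The size claim above can be checked with a small sketch. The struct and function names below are hypothetical and are not copied from Perl's headers; the sketch assumes a 4-byte node header, string storage padded to a multiple of 4 bytes, and that the long node is only used for strings longer than 255 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts for illustration only. */
struct short_string_node {
    uint8_t  str_len;    /* 8-bit length: at most 255 bytes */
    uint8_t  type;
    uint16_t next_off;
    char     string[1];  /* string starts right after the 4-byte header */
};

struct long_string_node {
    uint8_t  flags;      /* unused in the long form */
    uint8_t  type;
    uint16_t next_off;
    uint32_t str_len;    /* sits where a single-argument node keeps its argument */
    char     string[1];  /* still starts on a 4-byte boundary */
};

#define SHORT_MAX 255u

/* Assumption: string storage is rounded up to a multiple of 4 bytes. */
static size_t pad4(size_t n) { return (n + 3u) & ~(size_t)3u; }

/* Bytes needed when the string is split across short nodes. */
size_t short_nodes_bytes(size_t len) {
    size_t total = 0;
    while (len > 0) {
        size_t chunk = len < SHORT_MAX ? len : SHORT_MAX;
        total += offsetof(struct short_string_node, string) + pad4(chunk);
        len -= chunk;
    }
    return total;
}

/* Bytes needed when the whole string lives in one long node. */
size_t long_node_bytes(size_t len) {
    return offsetof(struct long_string_node, string) + pad4(len);
}
```

Under these assumptions, any string that needs at least two short nodes already pays 8 bytes of headers, which is what the single long node pays, so `long_node_bytes(len)` never exceeds `short_nodes_bytes(len)` for `len` above 255.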
I also considered simply combining the original 8 bit length field
(which is now unused) with the first byte of the string field to get a
16 bit length, and have the actual string be offset by 1. But I
rejected that because it would mean the string would usually not be
aligned, slowing down memory accesses.
This new LEXACT regnode can hold as much as 1024 regular EXACT nodes can,
using 4K fewer overhead bytes to do so. That means it can handle
strings of up to about 262000 bytes. The comments give ideas for
expanding that should it become necessary or desirable.
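The arithmetic behind those figures can be sketched as follows. The constants are assumptions for illustration (a 4-byte header per regular node, an 8-byte header for the long node, 255 string bytes per regular node), and the function names are hypothetical.

```c
#include <stddef.h>

enum {
    SHORT_HEADER_BYTES = 4,   /* assumed header size of a regular string node */
    LONG_HEADER_BYTES  = 8,   /* assumed header size including the 32-bit length */
    SHORT_MAX_STRING   = 255  /* limit imposed by the 8-bit length field */
};

/* Header bytes saved by replacing `nodes` regular nodes with one long node. */
size_t overhead_saved(size_t nodes) {
    return nodes * SHORT_HEADER_BYTES - LONG_HEADER_BYTES;
}

/* String bytes that `nodes` regular nodes can hold in total. */
size_t short_nodes_capacity(size_t nodes) {
    return nodes * SHORT_MAX_STRING;
}
```

With these assumptions, `overhead_saved(1024)` is 4088 bytes, close to 4K, and `short_nodes_capacity(1024)` is 261120 bytes, in the same range as the rounded figure quoted above.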
Besides the space advantage, any hardware acceleration in memcmp
can be done in much bigger chunks, and otherwise the memcmp inner loop
(often written in assembly) will run many more times in a row, and our
outer loop that calls it will run correspondingly fewer times.
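A minimal sketch of that difference, assuming a matcher that compares a pattern against text with memcmp. The functions are hypothetical and are not the regex engine's actual matching code; `calls` only counts how often the outer loop invokes memcmp.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK_MAX 255u  /* most bytes a regular string node can hold */

/* One memcmp call per 255-byte piece, as with a chain of regular nodes. */
int match_in_chunks(const char *pat, const char *text, size_t len, size_t *calls) {
    *calls = 0;
    while (len > 0) {
        size_t chunk = len < CHUNK_MAX ? len : CHUNK_MAX;
        (*calls)++;
        if (memcmp(pat, text, chunk) != 0)
            return 0;           /* mismatch in this piece */
        pat  += chunk;
        text += chunk;
        len  -= chunk;
    }
    return 1;
}

/* A single memcmp call over the whole string, as with one long node. */
int match_whole(const char *pat, const char *text, size_t len, size_t *calls) {
    *calls = 1;
    return memcmp(pat, text, len) == 0;
}
```

For a 1000-byte string, the chunked version calls memcmp four times (255 + 255 + 255 + 235 bytes), while the whole-string version calls it once and leaves the rest to memcmp's inner loop.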
Diffstat (limited to 'regcomp.sym')
-rw-r--r-- | regcomp.sym | 4 |
1 files changed, 4 insertions, 0 deletions
    diff --git a/regcomp.sym b/regcomp.sym
    index 8a2fb240f1..5512c0f8a1 100644
    --- a/regcomp.sym
    +++ b/regcomp.sym
    @@ -117,6 +117,10 @@ BRANCH BRANCH, node 0 V ; Match this alternative, or the next...
     # NOTE: the relative ordering of these types is important do not change it
     
     EXACT EXACT, str ; Match this string (flags field is the length).
    +
    +#* In a long string node, the U32 argument is the length, and is
    +#* immediately followed by the string.
    +LEXACT EXACT, len:str 1; Match this long string (preceded by length; flags unused).
     EXACTL EXACT, str ; Like EXACT, but /l is in effect (used so locale-related warnings can be checked for).
     EXACTF EXACT, str ; Like EXACT, but match using /id rules; (string not UTF-8, not guaranteed to be folded).
     EXACTFL EXACT, str ; Like EXACT, but match using /il rules; (string not likely to be folded).