author Karl Williamson <khw@cpan.org> 2019-09-25 10:12:32 -0600
committer Karl Williamson <khw@cpan.org> 2019-09-29 11:46:26 -0600
commit ae06e581c6e9944620eed4980fe89a3749886ed0
tree 44d0e3585bccc343af3dab4ceb693667aafe1b90
parent 3ae8ec479bc65ef004bd856d90b82106186771d9
Add regnode LEXACT, for long strings

This commit adds a new regnode for strings that don't fit in a regular one, and adds a structure for that regnode to use. Actually using them is deferred to the next commit.

This new regnode structure is needed because the previous structure only allows for an 8-bit length field, so a string can be at most 255 bytes. This commit instead puts the length in a new field, in the same place where single-argument regnodes put their argument. Hence a long string node carries an extra 32 bits of overhead, but at no string length is this node ever bigger than the combination of the smaller nodes it replaces.

I also considered simply combining the original 8-bit length field (which is now unused) with the first byte of the string field to get a 16-bit length, with the actual string offset by 1. But I rejected that because it would mean the string would usually not be aligned, slowing down memory accesses.

This new LEXACT regnode can hold as much as 1024 regular EXACT ones, using 4K fewer overhead bytes to do so. That means it can handle strings of roughly 261,000 bytes (1024 x 255 = 261,120). The comments give ideas for expanding that should it become necessary or desirable.

Besides the space advantage, any hardware acceleration in memcmp can be done in much bigger chunks, and otherwise the memcmp inner loop (often written in assembly) will run many more times in a row, and our outer loop that calls it, correspondingly fewer.
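
The layout and overhead reasoning in the message can be sketched in C as follows. This is only an illustration under stated assumptions: the `sketch_*` struct, field, and function names are hypothetical stand-ins, not the definitions in regcomp.h, and the size arithmetic ignores any alignment padding after the string. It compares the bytes needed by a chain of short nodes against one long node, and counts how many memcmp calls a chunked comparison makes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical maximum string length of a short node (8-bit length). */
#define SKETCH_EXACT_MAX 255u

/* Hypothetical short string node: the 8-bit field holds the length. */
struct sketch_exact {
    uint8_t  str_len;    /* string length, at most 255 */
    uint8_t  type;
    uint16_t next_off;
    char     string[1];  /* string bytes start here */
};

/* Hypothetical long string node: the length lives in a 32-bit slot,
 * so the string still starts on a 32-bit boundary. */
struct sketch_lexact {
    uint8_t  flags;      /* unused */
    uint8_t  type;
    uint16_t next_off;
    uint32_t str_len;    /* string length */
    char     string[1];  /* string bytes start here */
};

/* Bytes needed to store len string bytes as a chain of short nodes. */
size_t sketch_exact_chain_bytes(size_t len)
{
    size_t nodes = (len + SKETCH_EXACT_MAX - 1) / SKETCH_EXACT_MAX;
    return nodes * offsetof(struct sketch_exact, string) + len;
}

/* Bytes needed to store len string bytes in one long node. */
size_t sketch_lexact_bytes(size_t len)
{
    return offsetof(struct sketch_lexact, string) + len;
}

/* Long-node style match: a single memcmp over the whole string. */
int sketch_match_long(const char *pat, uint32_t pat_len,
                      const char *input, size_t input_len)
{
    assert(pat != NULL && input != NULL);
    return input_len >= pat_len && memcmp(pat, input, pat_len) == 0;
}

/* Chain-style match: one memcmp per chunk of at most 255 bytes.
 * Stores the number of memcmp calls made in *calls. */
int sketch_match_chunked(const char *pat, size_t pat_len,
                         const char *input, size_t input_len,
                         size_t *calls)
{
    size_t done = 0;

    *calls = 0;
    if (input_len < pat_len)
        return 0;
    while (done < pat_len) {
        size_t chunk = pat_len - done;
        if (chunk > SKETCH_EXACT_MAX)
            chunk = SKETCH_EXACT_MAX;
        (*calls)++;
        if (memcmp(pat + done, input + done, chunk) != 0)
            return 0;
        done += chunk;
    }
    return 1;
}
```

Under these assumptions, a 256-byte string costs the same either way (two 4-byte headers versus one 8-byte header), and for 1024 x 255 bytes the single long node saves 4088 header bytes, which is roughly the 4K quoted in the message.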
Diffstat (limited to 'regcomp.sym')
-rw-r--r-- regcomp.sym 4
1 files changed, 4 insertions, 0 deletions
diff --git a/regcomp.sym b/regcomp.sym
index 8a2fb240f1..5512c0f8a1 100644
--- a/regcomp.sym
+++ b/regcomp.sym
@@ -117,6 +117,10 @@ BRANCH BRANCH, node 0 V ; Match this alternative, or the next...
# NOTE: the relative ordering of these types is important do not change it
EXACT EXACT, str ; Match this string (flags field is the length).
+
+#* In a long string node, the U32 argument is the length, and is
+#* immediately followed by the string.
+LEXACT EXACT, len:str 1; Match this long string (preceded by length; flags unused).
EXACTL EXACT, str ; Like EXACT, but /l is in effect (used so locale-related warnings can be checked for).
EXACTF EXACT, str ; Like EXACT, but match using /id rules; (string not UTF-8, not guaranteed to be folded).
EXACTFL EXACT, str ; Like EXACT, but match using /il rules; (string not likely to be folded).