author: Karl Williamson <public@khwilliamson.com> 2014-01-12 23:39:43 -0700
committer: Karl Williamson <public@khwilliamson.com> 2014-01-22 11:45:56 -0700
commit: 710680787cad21825395c0224606ac1535624c52
tree: c4f06a3bbc9eaa17894e69081b84b73c4baea6f0
parent: beab9ebe349dffa8fc22a2912b83f62d2365e594
Use bit instead of node for regex SSC
The flag bits in regular expression ANYOF nodes are perennially in short supply. However, there are still plenty of regex nodes possible. So one solution to needing to pass more information is to create a node that encapsulates what is needed. That is what commit 9aa1e39f96ac28f6ce5d814d9a1eccf1464aba4a did to tell regexec.c that a particular ANYOF node is for the synthetic start class (SSC).

However, this solution introduces other issues. If you have to express two things, then you need a regnode for A, a regnode for B, a regnode for both A and B, and another regnode for neither A nor B. With three things, you need 8 regnodes to express all possible combinations. This becomes unwieldy to write code for. The number of combinations goes way down if some of them are mutually exclusive. At the time of that commit, I thought that an SSC need not ever warn if matching against an above-Unicode code point. I was wrong, and that has been corrected earlier in the 5.19 series.

But it finally came to me how to tell regexec that an ANYOF node is for the SSC without taking up a flag bit and without requiring a regnode type. The 'next_off' field in a regnode tells the engine the offset in the regex program to the node it's supposed to go to after processing this one. Since the SSC stands alone, its 'next_off' field is unused, and we can put anything we want in it. That, however, is not true of other ANYOF regnodes. But it turns out that there are certain values that will never be legitimate in the 'next_off' field in these, and so this commit uses one of those to signal that this ANYOF node is an SSC.

Regnodes come in various sizes, and the offset is in terms of how many of the smallest ones there are to the next node to look at. Since ANYOF nodes are large, the offset is always > 1, and so this commit uses 1 to indicate an SSC.
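The idea in the message above can be illustrated outside of Perl with a minimal sketch. Everything named `toy_*` or `TOY_*` here is hypothetical and is not Perl's real `struct regnode` or its macros; the sketch only assumes, as the message states, that an ANYOF-like node spans more than one unit, so a legitimate offset to the next node is never 1. It also shows the combination count that motivated avoiding a dedicated node type.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified node header; NOT Perl's real struct regnode. */
typedef struct {
    uint8_t  flags;     /* scarce flag bits, left untouched by the marker */
    uint8_t  type;      /* node type, e.g. TOY_ANYOF */
    uint16_t next_off;  /* offset to the next node, in smallest-node units */
} toy_node;

enum { TOY_EXACT = 1, TOY_ANYOF = 2 };

/* Assumption for this sketch: a real TOY_ANYOF node is several units long,
 * so a genuine next_off is either 0 or greater than 1, and 1 is free to
 * serve as a marker. */
#define TOY_SSC_MARKER 1

/* Mark a node as the synthetic start class by reusing the unused offset. */
static void toy_set_synthetic(toy_node *n)
{
    n->type = TOY_ANYOF;
    n->next_off = TOY_SSC_MARKER;
}

/* A node is synthetic only if it is an ANYOF node carrying the marker. */
static int toy_is_synthetic(const toy_node *n)
{
    return n->type == TOY_ANYOF && n->next_off == TOY_SSC_MARKER;
}

/* Number of distinct node types needed to encode every combination of
 * n independent yes/no properties purely by node type: 2 to the power n. */
static unsigned toy_node_types_needed(unsigned n)
{
    return 1u << n;
}

/* Small self-check exercising the helpers above. */
static void toy_self_check(void)
{
    toy_node ssc = {0, 0, 0};
    toy_node ordinary = {0, TOY_ANYOF, 11};  /* 11 is an arbitrary offset > 1 */
    toy_node exact = {0, TOY_EXACT, 1};      /* offset 1, but not ANYOF */

    toy_set_synthetic(&ssc);
    assert(toy_is_synthetic(&ssc));
    assert(!toy_is_synthetic(&ordinary));
    assert(!toy_is_synthetic(&exact));
    assert(toy_node_types_needed(2) == 4);
    assert(toy_node_types_needed(3) == 8);
}
```

Calling `toy_self_check()` from a test harness exercises all three cases. The marker costs neither a flag bit nor a new node type, which is the trade-off the message describes; it only remains valid for as long as the "offset is never 1" assumption holds for the node type in question.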
Diffstat (limited to 'regcomp.h')
-rw-r--r-- regcomp.h 20
1 files changed, 16 insertions, 4 deletions
diff --git a/regcomp.h b/regcomp.h
index 3db3c156c3..8bca5d233e 100644
--- a/regcomp.h
+++ b/regcomp.h
@@ -204,6 +204,19 @@ struct regnode_ssc {
SV* invlist; /* list of code points matched */
};
+/* We take advantage of 'next_off' not otherwise being used in the SSC by
+ * actually using it: by setting it to 1. This allows us to test and
+ * distinguish between an SSC and other ANYOF node types, as 'next_off' cannot
+ * otherwise be 1, because it is the offset to the next regnode expressed in
+ * units of regnodes. Since an ANYOF node contains extra fields, it adds up
+ * to 12 regnode units on 32-bit systems (hence the minimum this can be (if
+ * not 0) is 11 there). Even if things get tightly packed on a 64-bit system,
+ * it still would be more than 1. */
+#define set_ANYOF_SYNTHETIC(n) STMT_START{ OP(n) = ANYOF; \
+ NEXT_OFF(n) = 1; \
+ } STMT_END
+#define is_ANYOF_SYNTHETIC(n) (OP(n) == ANYOF && NEXT_OFF(n) == 1)
+
/* XXX fix this description.
Impose a limit of REG_INFTY on various pattern matching operations
to limit stack growth and to avoid "infinite" recursions.
@@ -322,10 +335,9 @@ struct regnode_ssc {
* performance penalty. Better would be to split it off into a separate node,
* which actually would improve performance a bit by allowing regexec.c to test
* for a UTF-8 character being above 255 without having to call a function nor
- * calculate its code point value. However, this solution might need to have a
- * second node type, ANYOF_SYNTHETIC_ABOVE_LATIN1_ALL. Several flags are not
- * used in synthetic start class (SSC) nodes, so could be shared should new
- * flags be needed for SSCs. */
+ * calculate its code point value. Several flags are not used in synthetic
+ * start class (SSC) nodes, so could be shared should new flags be needed for
+ * SSCs. */
/* regexec.c is expecting this to be in the low bit */
#define ANYOF_INVERT 0x01