author Karl Williamson <public@khwilliamson.com> 2014-01-12 23:39:43 -0700
committer Karl Williamson <public@khwilliamson.com> 2014-01-22 11:45:56 -0700
commit 710680787cad21825395c0224606ac1535624c52
tree c4f06a3bbc9eaa17894e69081b84b73c4baea6f0 /regexec.c
parent beab9ebe349dffa8fc22a2912b83f62d2365e594
Use bit instead of node for regex SSC
The flag bits in regular expression ANYOF nodes are perennially in short supply. However, there are still plenty of regex nodes possible. So one solution to needing to pass more information is to create a node that encapsulates what is needed. That is what commit 9aa1e39f96ac28f6ce5d814d9a1eccf1464aba4a did to tell regexec.c that a particular ANYOF node is for the synthetic start class (SSC).

However, this solution introduces other issues. If you have to express two things, then you need a regnode for A, a regnode for B, a regnode for both A and B, and another regnode for neither A nor B. With three things, you need 8 regnodes to express all possible combinations. This becomes unwieldy to write code for. The number of combinations goes way down if some of them are mutually exclusive. At the time of that commit, I thought that an SSC need not ever warn if matching against an above-Unicode code point. I was wrong, and that has been corrected earlier in the 5.19 series.

But it finally came to me how to tell regexec that an ANYOF node is for the SSC without taking up a flag bit and without requiring a regnode type. The 'next_off' field in a regnode tells the engine the offset in the regex program to the node it's supposed to go to after processing this one. Since the SSC stands alone, its 'next_off' field is unused, and we can put anything we want in it. That, however, is not true of other ANYOF regnodes. But it turns out that there are certain values that will never be legitimate in the 'next_off' field in these, and so this commit uses one of those to signal that this ANYOF node is an SSC. regnodes come in various sizes, and the offset is measured in units of the smallest regnode size. Since ANYOF nodes are large, the offset is always > 1, and so this commit uses 1 to indicate an SSC.
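
The sentinel idea described above can be sketched in isolation as follows. This is a minimal illustration, not Perl's actual code: the `toy_regnode` layout, the `toy_` helper names, and the node size of 11 units are all assumptions made for the example. The real test is the `is_ANYOF_SYNTHETIC()` macro used in the diff, whose definition is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; NOT Perl's real regnode layout. */
enum toy_op { TOY_EXACT, TOY_ANYOF };

struct toy_regnode {
    uint8_t  flags;     /* scarce per-node flag bits, left untouched here */
    uint8_t  type;      /* opcode, one of enum toy_op */
    uint16_t next_off;  /* offset to the next node, in smallest-node units */
};

enum {
    /* Assumed size: an ANYOF node spans several smallest-size units,
     * so a genuine next_off for one is always greater than 1. */
    TOY_ANYOF_UNITS = 11,
    /* The otherwise-impossible offset value reused as the SSC marker. */
    TOY_SSC_MARK = 1
};

/* Mark a stand-alone ANYOF node as the synthetic start class.
 * Safe only because the SSC never needs a real next_off. */
static void toy_mark_synthetic(struct toy_regnode *n)
{
    assert(n->type == TOY_ANYOF);
    n->next_off = TOY_SSC_MARK;
}

/* True only for an ANYOF node carrying the sentinel offset. */
static bool toy_is_synthetic(const struct toy_regnode *n)
{
    return n->type == TOY_ANYOF && n->next_off == TOY_SSC_MARK;
}
```

An ordinary ANYOF node in this sketch would carry `next_off >= TOY_ANYOF_UNITS`, so it can never be mistaken for the SSC, and neither a flag bit nor an extra opcode is consumed.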
Diffstat (limited to 'regexec.c')
-rw-r--r-- regexec.c 3
1 files changed, 1 insertions, 2 deletions
diff --git a/regexec.c b/regexec.c
index 4a5771d56d..b8b5b992e6 100644
--- a/regexec.c
+++ b/regexec.c
@@ -1497,7 +1497,6 @@ S_find_byclass(pTHX_ regexp * prog, const regnode *c, char *s,
/* We know what class it must start with. */
switch (OP(c)) {
case ANYOF:
- case ANYOF_SYNTHETIC:
if (utf8_target) {
REXEC_FBC_UTF8_CLASS_SCAN(
reginclass(prog, c, (U8*)s, (U8*) strend, utf8_target));
@@ -7536,7 +7535,7 @@ S_reginclass(pTHX_ regexp * const prog, const regnode * const n, const U8* const
|| (utf8_target
&& (c >=256
|| (! (flags & ANYOF_LOCALE))
- || OP(n) == ANYOF_SYNTHETIC))))
+ || is_ANYOF_SYNTHETIC(n)))))
{
SV * const sw = core_regclass_swash(prog, n, TRUE, 0);
if (sw) {