author | Jeremy Evans <code@jeremyevans.net> | 2023-01-27 11:08:49 -0800 |
---|---|---|
committer | Jeremy Evans <code@jeremyevans.net> | 2023-01-30 08:51:12 -0800 |
commit | eccfc978fd6f65332eb70c9a46fbb4d5110bbe0a (patch) | |
tree | c739a7f9659ec7d73ea3eabda184e321aa4aeaf8 /benchmark/vm_bigarray.yml | |
parent | 3f54d09a5b8b6e4fd734abc8911e170d5967b5b0 (diff) | |
download | ruby-eccfc978fd6f65332eb70c9a46fbb4d5110bbe0a.tar.gz |
Fix parsing of regexps that toggle extended mode on/off inside regexp
This was broken in ec3542229b29ec93062e9d90e877ea29d3c19472. That commit
didn't handle cases where extended mode was turned on/off inside the
regexp. There are two ways to turn extended mode on/off:
```
/(?-x:#y)#z
/x =~ '#y'

/(?-x)#y(?x)#z
/x =~ '#y'
```
These can be nested inside the same regexp:
```
/(?-x:(?x)#x
(?-x)#y)#z
/x =~ '#y'
```
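As a rough check of the three forms above, the sketch below compiles each source with `Regexp::EXTENDED` and reports what it matches in `'#y'`. The helper name `matched_text` is an assumption made for illustration; it is not part of this commit.

```ruby
# Hypothetical helper: compile `source` in extended mode and return the text
# it matches in '#y' (nil if there is no match).
def matched_text(source)
  match = Regexp.new(source, Regexp::EXTENDED).match('#y')
  match && match[0]
end

p matched_text("(?-x:#y)#z\n")              # group form
p matched_text("(?-x)#y(?x)#z\n")           # isolated form
p matched_text("(?-x:(?x)#x\n(?-x)#y)#z\n") # nested form
p matched_text("#y")                        # no toggle: the whole source is a comment
```

On a Ruby that includes this fix, the first three calls should print `"#y"`, because `#y` sits in a section where extended mode is off, while the last should print `""`, because the entire source is read as a comment.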
As you can probably imagine, this makes handling these regexps
somewhat complex. Due to the nesting inside portions of regexps,
the unescape_nonascii function needs to be recursive. In
recursive mode, it needs to track both opening and closing
parentheses, similar to how it already tracked opening and
closing brackets for character classes.
When scanning the regexp and coming to `(?` not followed by `#`,
scan for options, and use `x` and `-` to determine whether to
turn on or off extended mode. For `:`, indicating that only the
current regexp section should have the extended mode
switched, recurse with the extended mode set or unset. For `)`,
indicating that the remainder of the regexp (or the current regexp
portion if already recursing) should turn extended mode on or off,
just change the extended mode flag and keep scanning.
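The scanning approach described above can be sketched in Ruby. This is only an illustration under stated assumptions, not the C code changed by this commit: the names `OPTION_GROUP`, `scan_comments`, and `extended_comments` are hypothetical, character classes are assumed not to nest, and the sketch only collects the comments that extended mode would skip.

```ruby
# Hypothetical sketch, not Ruby's C implementation.
# Matches "?opts:" or "?opts)" right after an opening parenthesis.
OPTION_GROUP = /\G\?([adimux]*)(?:-([adimux]*))?([:)])/

# Scans src from pos, appending each extended-mode comment to found.
# Returns the index just past the ")" that closes a recursed section.
def scan_comments(src, extended, pos, nested, found)
  depth = 0          # unclosed plain "(" seen at this recursion level
  in_class = false   # inside a [...] character class
  while pos < src.length
    ch = src[pos]
    pos += 1
    if ch == "\\"
      pos += 1                                # skip the escaped character
    elsif in_class
      in_class = false if ch == "]"
    elsif ch == "["
      in_class = true
    elsif ch == "#" && extended
      stop = src.index("\n", pos) || src.length
      found << src[(pos - 1)...stop]          # comment runs to end of line
      pos = stop
    elsif ch == ")"
      return pos if nested && depth.zero?     # end of the recursed section
      depth -= 1
    elsif ch == "("
      if src[pos, 2] == "?#"                  # (?#...) comment group
        pos = (src.index(")", pos) || src.length - 1) + 1
      elsif (m = OPTION_GROUP.match(src, pos))
        on, off, terminator = m.captures
        mode = extended
        mode = true if on.include?("x")
        mode = false if off&.include?("x")
        pos = m.end(0)
        if terminator == ")"
          extended = mode                     # applies to the rest of this section
        else
          pos = scan_comments(src, mode, pos, true, found)
        end
      else
        depth += 1                            # plain or other kind of group
      end
    end
  end
  pos
end

def extended_comments(src, extended: true)
  found = []
  scan_comments(src, extended, 0, false, found)
  found
end

p extended_comments("(?-x:(?x)#x\n(?-x)#y)#z\n")
```

For the nested example from this message, the sketch should report `#x` and `#z` as comments and leave `#y` alone, since `#y` appears where extended mode has been switched off.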
While testing this, I noticed that `a`, `d`, and `u` are accepted
as options, in addition to `i`, `m`, and `x`, but I can't see
where those options are documented. I'm not sure whether or not
handling `a`, `d`, and `u` as options is a bug.
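The observation about option letters can be reproduced with a small check like the one below. It is a sketch only: the helper name `option_accepted?` is hypothetical, and the check shows whether the regexp engine compiles each inline option, not whether that behavior is intended.

```ruby
# Hypothetical helper: true if "(?<letter>)" compiles as an inline option.
def option_accepted?(letter)
  Regexp.new("(?#{letter})y")
  true
rescue RegexpError
  false
end

%w[a d u i m x].each do |letter|
  puts "(?#{letter}) accepted: #{option_accepted?(letter)}"
end
```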
Fixes [Bug #19379]