author     Karl Williamson <khw@cpan.org>  2019-11-14 15:26:53 -0700
committer  Karl Williamson <khw@cpan.org>  2019-11-16 11:12:14 -0700
commit     4e4df0522939c9283b72776a4779e87429296b52
tree       d583b36015e879134747b2e49215145e07621dac /Porting
parent     42d7c9105488cf20f4e3f4c3d535a4ee170ab849
Revamp finding splittable places in /i full node
Commits 3ae8ec479bc65ef004bd856d90b82106186771d9 and cc1ed6368d665290794d7c24d1dbeb42466e256a didn't actually work. Tests in pat_advanced.t would have failed, except that optimizations made to the regex engine in the meantime meant the tests were no longer testing what they originally did. I believe that this finally gets it right for non-/l.

The problem is that when an EXACTFish node becomes full, you don't want to split it in the middle of a multi-char fold. To use a fairly familiar example, we can't split between the two characters of 'ss', as that sequence matches LATIN SMALL LETTER SHARP S. The way the regex engine currently works, it can't see beyond the current node; it would see one 's' or the other, but not the sequence.

So the code backs off one character and checks whether it can split there. If not, it repeats until it either finds such a place or reaches the beginning of the node. If the entire node is all 's'es, for example, there's no good place to split, so it gives up and takes all of them.

One thing I hadn't realized before is that, with three-character folds, you can't split if the current position is the first of the three, and you also can't split if it is the second of the three.
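The back-off search described above can be sketched roughly as follows. This is a simplified illustration, not the code in regcomp.c: it assumes the node's text is already case-folded, lowercase ASCII, and it uses a small hypothetical table of ASCII stand-ins for multi-char folds ("ss" for U+00DF LATIN SMALL LETTER SHARP S, and "ff", "fi", "fl", "ffi", "ffl" for the ligatures U+FB00 through U+FB04). The names `straddles` and `split_point` are hypothetical. Because no fold in the table is longer than three characters, only a fold beginning at the last kept character, or at the one before it (the "second of the three" case), can cross a candidate split point.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical ASCII stand-ins for multi-char folds (max length 3). */
static const char *const folds[] = { "ss", "ff", "fi", "fl", "ffi", "ffl" };
#define NFOLDS (sizeof folds / sizeof folds[0])

/* Return 1 if some fold sequence in s[0..len) starts before 'split'
 * and ends after it, i.e. splitting there would break the fold. */
static int straddles(const char *s, size_t len, size_t split)
{
    /* back == 1: fold starts at the last kept char;
     * back == 2: the last kept char is the second of a 3-char fold. */
    for (size_t back = 1; back <= 2 && back <= split; back++) {
        size_t start = split - back;
        for (size_t k = 0; k < NFOLDS; k++) {
            size_t flen = strlen(folds[k]);
            if (flen > back && start + flen <= len
                && memcmp(s + start, folds[k], flen) == 0)
                return 1;
        }
    }
    return 0;
}

/* Number of chars of s[0..len) to keep in a node holding at most max. */
static size_t split_point(const char *s, size_t len, size_t max)
{
    assert(max > 0);
    if (len <= max)
        return len;                /* everything fits; no split needed */
    for (size_t split = max; split > 0; split--)
        if (!straddles(s, len, split))
            return split;          /* found a safe place to split */
    return max;                    /* no safe place: take all of them */
}
```

For example, with room for four characters, "abcssx" is split after "abc" so that "ss" stays together, while "ssssss" has no safe split point, so the sketch falls back to filling the node.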
Diffstat (limited to 'Porting')
0 files changed, 0 insertions, 0 deletions