author | Karl Williamson <public@khwilliamson.com> | 2010-11-07 15:25:31 -0700 |
---|---|---|
committer | Father Chrysostomos <sprout@cpan.org> | 2010-11-07 21:42:42 -0800 |
commit | 2726813d9af5d50f1451663cd931317e7172da50 | |
tree | 12ffa4ce7951e688df59ceceb9a061ab67d606de /perl.c | |
parent | a85c03da46d77cd5b9f4e0ba809245cf000962ad | |
regexec.c: Don't give up on fold matching early
As noted in the code comments, "a" =~ /[A]/i does not currently match as it
should. regcomp.c knows about the ASCII characters and corrects for them, but
not always; for example, "a" =~ /\p{Upper}/i is not corrected. This patch
catches all of those cases.
It works by computing the list of all characters that (singly) fold to the
same character as the one being matched, and then checking each of those
against the class. In the current Unicode standard, this list has at most
3 entries.
I believe that a better long-term solution is to do this at compile time
rather than at execution time, by generating a closure of everything
matched. But this can't be done now, because the data structure would
need to be extensively revamped to list all non-byte characters, and
user-defined \p{} matches are not known at compile time.
This patch also doesn't handle multi-character folds; there is a separate
ticket for those.
Diffstat (limited to 'perl.c')
-rw-r--r-- | perl.c | 2 |
1 file changed, 2 insertions, 0 deletions
```diff
@@ -1003,6 +1003,7 @@ perl_destruct(pTHXx)
     SvREFCNT_dec(PL_utf8_tofold);
     SvREFCNT_dec(PL_utf8_idstart);
     SvREFCNT_dec(PL_utf8_idcont);
+    SvREFCNT_dec(PL_utf8_foldclosures);
     PL_utf8_alnum = NULL;
     PL_utf8_ascii = NULL;
     PL_utf8_alpha = NULL;
@@ -1022,6 +1023,7 @@ perl_destruct(pTHXx)
     PL_utf8_tofold = NULL;
     PL_utf8_idstart = NULL;
     PL_utf8_idcont = NULL;
+    PL_utf8_foldclosures = NULL;
 
     if (!specialWARN(PL_compiling.cop_warnings))
         PerlMemShared_free(PL_compiling.cop_warnings);
```