author | Karl Williamson <khw@cpan.org> | 2016-02-09 11:50:04 -0700 |
---|---|---|
committer | Karl Williamson <khw@cpan.org> | 2016-02-09 23:30:54 -0700 |
commit | 46d34d0e1e7de87f74f8b2df4b32f291baf21dbb (patch) | |
tree | d1c55a71af5488e197ca31951e49f480adb1c325 /t/lib/warnings/regcomp | |
parent | d8fd4ea0c782a6d356681b28eb35e215d74e4ccd (diff) | |
download | perl-46d34d0e1e7de87f74f8b2df4b32f291baf21dbb.tar.gz |
PATCH: [perl #8904] Revamp [:posix:] parsing
A problem with bracketed character classes, qr/[foo]/, is that they have
very little structure, so almost anything is legal, and typos
just silently compile into something unintended. One of the
possible components is a posix character class. There are 14 of them,
and they have a very restricted structure, which is easy to get slightly
wrong, so that instead of the intended posix class being compiled,
something else is silently created. This commit causes the regex
compiler to look for slightly misspelled posix character classes and to
raise a warning when one is found. It does not change the results of the
compilation.
To do this, it introduces fuzzy parsing into the regex compiler, using
the Damerau-Levenshtein algorithm to find out how many single character
edits it would take to transform the input into one of the 14 classes.
If it is 1 or 2 off, it considers the input to have been intended to be
that class and raises the warning. If more edits would be needed, it
remains silent.
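The distance check described above can be sketched roughly as follows. This is only an illustration, not the code in regcomp.c: it uses the restricted (optimal string alignment) variant of Damerau-Levenshtein, and the names `osa_distance`, `closest_posix_name`, `posix_names`, and `MAX_NAME_LEN` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAME_LEN 15   /* hypothetical bound on candidate length */

/* The 14 POSIX class names accepted inside [: :] */
static const char *const posix_names[] = {
    "alnum", "alpha", "ascii", "blank", "cntrl", "digit", "graph",
    "lower", "print", "punct", "space", "upper", "word",  "xdigit"
};
#define N_POSIX_NAMES (sizeof posix_names / sizeof posix_names[0])

static size_t min3(size_t a, size_t b, size_t c)
{
    size_t m = a < b ? a : b;
    return m < c ? m : c;
}

/* Restricted Damerau-Levenshtein distance: counts insertions, deletions,
 * substitutions, and transpositions of two adjacent characters. */
static size_t osa_distance(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t d[MAX_NAME_LEN + 1][MAX_NAME_LEN + 1];
    size_t i, j;

    assert(la <= MAX_NAME_LEN && lb <= MAX_NAME_LEN);

    for (i = 0; i <= la; i++) d[i][0] = i;   /* delete all of a's prefix */
    for (j = 0; j <= lb; j++) d[0][j] = j;   /* insert all of b's prefix */

    for (i = 1; i <= la; i++) {
        for (j = 1; j <= lb; j++) {
            size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            d[i][j] = min3(d[i - 1][j] + 1,          /* deletion */
                           d[i][j - 1] + 1,          /* insertion */
                           d[i - 1][j - 1] + cost);  /* substitution */
            if (i > 1 && j > 1
                && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]) {
                size_t t = d[i - 2][j - 2] + 1;      /* transposition */
                if (t < d[i][j]) d[i][j] = t;
            }
        }
    }
    return d[la][lb];
}

/* Returns the nearest class name if it is within max_edits of candidate,
 * else NULL. On success *dist_out (if non-NULL) receives the distance. */
static const char *closest_posix_name(const char *candidate,
                                      size_t max_edits, size_t *dist_out)
{
    const char *best = NULL;
    size_t best_dist = max_edits + 1;
    size_t k;

    if (strlen(candidate) > MAX_NAME_LEN) return NULL;

    for (k = 0; k < N_POSIX_NAMES; k++) {
        size_t dist = osa_distance(candidate, posix_names[k]);
        if (dist < best_dist) {
            best_dist = dist;
            best = posix_names[k];
        }
    }
    if (best && dist_out) *dist_out = best_dist;
    return best;
}
```

With a threshold of 2, `closest_posix_name("dgit", 2, &d)` yields "digit" with `d == 1`. A caller following the rule above would warn when the distance is 1 or 2 and treat 0 as a correctly spelled name.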
This is a heuristic, and someone could have made enough typos that this
concludes no class was intended when one was. Conversely it could raise a
warning when no class was intended, though warnings only happen when the
input very closely resembles one of the 14 legal posix classes.
The algorithm can be tweaked if experience indicates it should. But the
bottom line is that many more cases of unintended results will now be
warned about.
Things like having blanks in the construct and having the '^' before the
colon are recognized as being intended posix classes (given that the
actual names are close to one of the 14), and raise warnings. Again
this commit does not change what gets compiled. This found a bug in
autodoc.pl which was fixed a few commits ago.
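One way to picture the tolerant recognition of blanks and a misplaced '^' is a small normalization pass over the text between the brackets. The sketch below is an assumption for illustration only and does not reproduce the logic in regcomp.c; `struct posix_guess`, `skip_blanks`, and `normalize_posix_candidate` are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of what a tolerant scan found. */
struct posix_guess {
    int  had_blanks;       /* blanks appeared somewhere in the construct */
    int  caret_misplaced;  /* '^' came before the opening ':' */
    int  negated;          /* a '^' was seen in either position */
    char name[16];         /* the extracted class name, e.g. "digit" */
};

/* Advance past spaces and tabs; return 1 if any were skipped. */
static int skip_blanks(const char **pp)
{
    int seen = 0;
    while (**pp == ' ' || **pp == '\t') {
        seen = 1;
        (*pp)++;
    }
    return seen;
}

/* 'inner' is the text between '[' and ']', such as ":digit:", ": digit :"
 * or "^:alpha:". Returns 1 if it has the overall shape of a posix class
 * once blanks and a misplaced '^' are tolerated, else 0. */
static int normalize_posix_candidate(const char *inner,
                                     struct posix_guess *out)
{
    const char *p = inner;
    size_t n = 0;

    memset(out, 0, sizeof *out);

    out->had_blanks |= skip_blanks(&p);
    if (*p == '^') {                    /* wrong place: before the colon */
        out->caret_misplaced = 1;
        out->negated = 1;
        p++;
    }
    out->had_blanks |= skip_blanks(&p);
    if (*p != ':') return 0;            /* no opening colon */
    p++;
    out->had_blanks |= skip_blanks(&p);
    if (*p == '^') {                    /* legal place: after the colon */
        out->negated = 1;
        p++;
    }
    out->had_blanks |= skip_blanks(&p);

    while (isalpha((unsigned char)*p)) {
        if (n + 1 >= sizeof out->name) return 0;   /* name too long */
        out->name[n++] = *p++;
    }
    out->name[n] = '\0';

    out->had_blanks |= skip_blanks(&p);
    if (*p != ':') return 0;            /* no terminating colon */
    p++;
    out->had_blanks |= skip_blanks(&p);

    return n > 0 && *p == '\0';
}
```

A caller could then compare `name` against the 14 legal names and warn whenever `had_blanks` or `caret_misplaced` is set, leaving what actually gets compiled untouched.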
The [. .] and [= =] POSIX constructs cause perl to croak that they are
unimplemented. This commit improves the parsing of these two, and fixes
some false positives. See
http://nntp.perl.org/group/perl.perl5.porters/230975
The new code combines two functions in regcomp.c into one new one.
Diffstat (limited to 't/lib/warnings/regcomp')
-rw-r--r-- | t/lib/warnings/regcomp | 23 |
1 files changed, 22 insertions, 1 deletions
diff --git a/t/lib/warnings/regcomp b/t/lib/warnings/regcomp
index 044e02f2a6..af8c06bf5c 100644
--- a/t/lib/warnings/regcomp
+++ b/t/lib/warnings/regcomp
@@ -1,6 +1,6 @@
 regcomp.c These tests have been moved to t/re/reg_mesg.t
 except for those that explicitly test line numbers
- and those that don't have a <-- HERE in them.
+ and those that don't have a <-- HERE in them, and those that die plus have warnings
 
 __END__
 use warnings 'regexp';
@@ -52,3 +52,24 @@ no warnings 'utf8';
 qr/abc[fi[.00./i;
 EXPECT
 Unmatched [ in regex; marked by <-- HERE in m/abc[ <-- HERE fi[.00./ at - line 4.
+########
+# NAME perl qr/(?[[[:word]]])/ XXX Why is 'syntax' lc?
+# OPTION fatal
+qr/(?[[[:word]]])/;
+EXPECT
+Assuming NOT a POSIX class since there is no terminating ':' in regex; marked by <-- HERE in m/(?[[[:word <-- HERE ]]])/ at - line 2.
+syntax error in (?[...]) in regex m/(?[[[:word]]])/ at - line 2.
+########
+# NAME qr/(?[ [[:digit: ])/
+# OPTION fatal
+qr/(?[[[:digit: ])/;
+EXPECT
+Assuming NOT a POSIX class since no blanks are allowed in one in regex; marked by <-- HERE in m/(?[[[:digit: ] <-- HERE )/ at - line 2.
+syntax error in (?[...]) in regex m/(?[[[:digit: ])/ at - line 2.
+########
+# NAME qr/(?[ [:digit: ])/
+# OPTION fatal
+qr/(?[[:digit: ])/
+EXPECT
+Assuming NOT a POSIX class since no blanks are allowed in one in regex; marked by <-- HERE in m/(?[[:digit: ] <-- HERE )/ at - line 2.
+syntax error in (?[...]) in regex m/(?[[:digit: ])/ at - line 2.