author | Karl Williamson <khw@cpan.org> | 2020-02-15 17:39:00 -0700 |
---|---|---|
committer | Karl Williamson <khw@cpan.org> | 2020-02-19 22:09:48 -0700 |
commit | 4829f32decd128e6a122bd8ce35fe944bd87f104 (patch) | |
tree | b75fb15b5cc5c3321468801f5a12d212ab6f36de /lib | |
parent | 809471154ccc09f339d9a2841b7c32c4aa9c40a2 (diff) | |
Restrict features in wildcards
The algorithm for dealing with Unicode property wildcards is to wrap the
user-supplied pattern with /miaa. We don't want the user to be able to
override the /m and /aa parts. Modifiers that can only be given on a qr//
or similar operator (like /gc) can't be included in constructs like
(?gc). These normally incur a warning that they are ignored, but
the texts of those warnings are misleading when using wildcards, so I
chose to just make them illegal. Of course that could be changed to
having custom useful warning texts, but I didn't think it was worth it.
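The wrapping described above can be emulated in plain Perl, as a rough sketch only. This is not the code in the regex compiler: the helper name match_property_values and the short list of candidate names are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: emulate wrapping a user-supplied wildcard subpattern
# in /miaa and matching it against a list of candidate property-value names.
sub match_property_values {
    my ($subpattern, @candidates) = @_;

    # The fixed modifiers are applied around the user's text. Nothing here
    # stops an embedded (?-m) or (?u) from overriding them again, which is
    # the gap this commit closes in the real implementation.
    my $re = qr/(?miaa:$subpattern)/;

    return grep { $_ =~ $re } @candidates;
}

# Illustrative subset of General_Category value names (not the full list).
my @gc_values = qw(Space_Separator Line_Separator Lowercase_Letter Zs Ll);

# /i makes the match case-insensitive, so 'separator$' finds both names.
my @hits = match_property_values('separator$', @gc_values);
print join(", ", @hits), "\n";    # prints "Space_Separator, Line_Separator"
```

Because the candidates are short, single-line names, /m has no practical effect here; it is included only to mirror the modifiers named in the message.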
I also chose to forbid recursion through nested \p{}, just from fear
that it might lead to issues down the road, and it really isn't useful
for this limited universe of strings to match against. Because
wildcards currently can't handle '}' inside them, only the single letter
\p,\P are valid anyway.
Similarly, I forbid the '*' quantifier to make it harder for the
constructed subpattern to take a very long time before it decides to
halt. Again, using it would be overkill for the universe of possible
match strings.
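A minimal sketch of the kind of pre-check these restrictions imply, written as a standalone Perl function. The name wildcard_restriction, the reason strings, and the exact rule set are assumptions for illustration; the checks actually performed by the regex compiler may differ. This sketch does not handle the (?^...) form and does not distinguish a literal '*' inside a bracketed character class.

```perl
use strict;
use warnings;

# Hypothetical pre-check: returns a reason string if the wildcard
# subpattern should be rejected, or undef if it passes these checks.
sub wildcard_restriction {
    my ($subpattern) = @_;

    # An unescaped \p or \P would nest a property lookup inside the wildcard.
    return 'nested \p or \P'
        if $subpattern =~ /(?<!\\)(?:\\\\)*\\[pP]/;

    # An unescaped '*' is treated as the forbidden quantifier.
    return 'the * quantifier'
        if $subpattern =~ /(?<!\\)(?:\\\\)*\*/;

    # Inspect every (?flags), (?flags-flags) and (?flags:...) group.
    while ($subpattern =~ /\(\?([a-z]*)(?:-([a-z]*))?[:)]/g) {
        my ($on, $off) = ($1, $2 // '');

        # Letters such as g or c are only valid on the operator itself.
        return 'modifier not valid inside (?...)'
            if "$on$off" =~ /[^adlupimnsx]/;

        # Turning off /m would undo part of the /miaa wrapper.
        return 'would override /m' if $off =~ /m/;

        # Keep only the character-set letters; anything but 'aa' changes /aa.
        (my $charset = $on) =~ tr/adlu//cd;
        return 'would override /aa'
            if length $charset && $charset ne 'aa';
    }

    return;
}

for my $pat ('(?aa)s', '(?-s)s', '(?gc)s', '(?-m)s', 's*', '\pL') {
    my $why = wildcard_restriction($pat);
    printf "%-8s %s\n", $pat, defined $why ? "rejected: $why" : "accepted";
}
```

The first two patterns in the loop are the ones exercised by the tests this commit adds to mktables, and the sketch accepts both.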
Diffstat (limited to 'lib')
-rw-r--r-- | lib/unicore/mktables | 2 |
-rw-r--r-- | lib/unicore/uni_keywords.pl | 2 |
2 files changed, 3 insertions, 1 deletions
diff --git a/lib/unicore/mktables b/lib/unicore/mktables
index 1820ad3a30..498a94d9f1 100644
--- a/lib/unicore/mktables
+++ b/lib/unicore/mktables
@@ -19251,6 +19251,8 @@ Error('\p{InKana}'); # 'Kana' is not a block so InKana shouldn't compile
 # Test_WB()
 Test_WB("$breakable 0020 $breakable 0020 $breakable 0308 $breakable");
 Test_LB("$nobreak 200B $nobreak 0020 $nobreak 0020 $breakable 2060 $breakable");
+Expect(1, ord(" "), '\p{gc=:(?aa)s:}', ""); # /aa is valid
+Expect(1, ord(" "), '\p{gc=:(?-s)s:}', ""); # /-s is valid
 EOF_CODE
 
 # Sort these so get results in same order on different runs of this
diff --git a/lib/unicore/uni_keywords.pl b/lib/unicore/uni_keywords.pl
index e76cf63899..f8ee5e2e6c 100644
--- a/lib/unicore/uni_keywords.pl
+++ b/lib/unicore/uni_keywords.pl
@@ -1295,7 +1295,7 @@
 # baba9dfc133e3cb770a89aaf0973b1341fa61c2da6c176baf6428898b3b568d8 lib/unicore/extracted/DLineBreak.txt
 # 6d4a8c945dd7db83ed617cbb7d937de7f4ecf016ff22970d846e996a7c9a2a5d lib/unicore/extracted/DNumType.txt
 # 5b7c14380d5cceeaffcfbc18db1ed936391d2af2d51f5a41f1a17b692c77e59b lib/unicore/extracted/DNumValues.txt
-# 93f508a690aa8949f213d50b573710f0b4a4e843c17283938035ecf19e0220e2 lib/unicore/mktables
+# a3f3caba903e4d39b6c7aaa7ea4d3a739e745b010ad51cf0e05f34ffa0ac2c04 lib/unicore/mktables
 # 50b85a67451145545a65cea370dab8d3444fbfe07e9c34cef560c5b7da9d3eef lib/unicore/version
 # 2680b9254eb236c5c090f11b149605043e8c8433661b96efc4a42fb4709342a5 regen/charset_translations.pl
 # 6bbad21de0848e0236b02f34f5fa0edd3cdae9ba8173cc9469a5513936b9e728 regen/mk_PL_charclass.pl