author Karl Williamson <khw@cpan.org> 2020-02-15 17:39:00 -0700
committer Karl Williamson <khw@cpan.org> 2020-02-19 22:09:48 -0700
commit 4829f32decd128e6a122bd8ce35fe944bd87f104 (patch)
tree b75fb15b5cc5c3321468801f5a12d212ab6f36de /pod/perlunicode.pod
parent 809471154ccc09f339d9a2841b7c32c4aa9c40a2 (diff)
Restrict features in wildcards

The algorithm for dealing with Unicode property wildcards is to wrap the user-supplied pattern with /miaa. We don't want the user to be able to override the /m and /aa parts.

Modifiers that are only specifiable as a modifier in a qr or similar op (like /gc) can't be included in things like (?gc). These normally incur a warning that they are ignored, but the texts of those warnings are misleading when using wildcards, so I chose to just make them illegal. Of course that could be changed to having custom useful warning texts, but I didn't think it was worth it.

I also chose to forbid recursion of using nested \p{}, just from fear that it might lead to issues down the road, and it really isn't useful for this limited universe of strings to match against. Because wildcards currently can't handle '}' inside them, only the single letter \p,\P are valid anyway.

Similarly, I forbid the '*' quantifier to make it harder for the constructed subpattern to take forever to make any progress and decide to halt. Again, using it would be overkill on the universe of possible match strings.

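
As a rough illustration of the restrictions described above, here is a minimal sketch, not part of the commit. It assumes a perl new enough to include this change (5.32 is assumed here). The first subpattern is the numeric_value example from the perlunicode wildcard documentation; the `$probe` string is a hypothetical test case that uses the `*` quantifier. The script prints whatever the running perl decides rather than assuming a result.

```perl
use v5.32;      # assumed minimum version; also enables strict
use warnings;
no warnings 'experimental::uniprop_wildcards';

# Wildcard subpattern from the perlunicode documentation: matches any
# character whose Numeric_Value is 0 through 5.
my $zero_to_five = qr!\p{numeric_value=/\A[0-5]\z/}!;

for my $ch ('3', '7') {
    printf "%s: %s\n", $ch, ($ch =~ $zero_to_five ? 'match' : 'no match');
}

# Hypothetical probe: string-eval a subpattern that uses '*', so that a
# pattern-compilation error is trapped instead of aborting the script.
my $probe = q{ qr!\p{numeric_value=/\A[0-5]*\z/}! };
my $ok    = defined eval $probe;
say $ok ? "'*' was accepted" : "'*' was rejected: $@";
```

The string eval is a deliberate choice: a literal qr// containing a forbidden construct could fail while the script itself is being compiled, which a block eval would not trap.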
Diffstat (limited to 'pod/perlunicode.pod')
-rw-r--r-- pod/perlunicode.pod 19
1 files changed, 16 insertions, 3 deletions
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index 761b0db85b..212f3e91c3 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -1027,14 +1027,27 @@ C<$/>.
No modifiers may follow the final delimiter. Instead, use
L<perlre/(?adlupimnsx-imnsx)> and/or
L<perlre/(?adluimnsx-imnsx:pattern)> to specify modifiers.
+However, certain modifiers are illegal in your wildcard subpattern.
+The only character set modifier specifiable is C</aa>;
+any other character set, and C<-m>, and C<p>, and C<s> are all illegal.
+Specifying modifiers like C<qr/.../gc> that aren't legal in the
+C<(?...)> notation normally raise a warning, but with wildcard
+subpatterns, their use is an error. The C<m> modifier is ineffective;
+everything that matches will be a single line.
+
+By default, your pattern is matched case-insensitively, as if C</i> had
+been specified. You can change this by saying C<(?-i)> in your pattern.
+
+There are also certain operations that are illegal. You can't nest
+C<\p{...}> and C<\P{...}> calls within a wildcard subpattern, and C<\G>
+doesn't make sense, so is also prohibited.
+
+And the C<*> quantifier (or its equivalent C<(0,}>) is illegal.
This feature is not available when the left-hand side is prefixed by
C<Is_>, nor for any form that is marked as "Discouraged" in
L<perluniprops/Discouraged>.
-By default, your pattern is matched case-insensitively, as if C</i> had
-been specified. You can change this by saying C<(?-i)> in your pattern.
-
This experimental feature has been added to begin to implement
L<https://www.unicode.org/reports/tr18/#Wildcard_Properties>. Using it
will raise a (default-on) warning in the