author Karl Williamson <khw@cpan.org> 2022-02-24 11:14:55 -0700
committer Karl Williamson <khw@cpan.org> 2022-03-03 14:18:59 -0700
commit 43843d94b0de72c0a7e6bd61101cb1c4b4622817
tree 2574db43e3ec403c5183f125d290535202e42144 /pod/perlrecharclass.pod
parent 3c9bbd85c943afea781e3cccff4d8cfda2321ada
perlrecharclass: Update regex sets pod
The pod didn't reflect restrictions on what can go into such a class.
Diffstat (limited to 'pod/perlrecharclass.pod')
-rw-r--r-- pod/perlrecharclass.pod 35
1 files changed, 9 insertions, 26 deletions
diff --git a/pod/perlrecharclass.pod b/pod/perlrecharclass.pod
index 0089c5ab6f..a2df2dfbeb 100644
--- a/pod/perlrecharclass.pod
+++ b/pod/perlrecharclass.pod
@@ -1181,46 +1181,29 @@ closing C<])> characters.
Just as in all regular expressions, the pattern can be built up by
including variables that are interpolated at regex compilation time.
-But its best to compile each sub-component.
+But currently each such sub-component should be an already-compiled
+extended bracketed character class.
my $thai_or_lao = qr/(?[ \p{Thai} + \p{Lao} ])/;
- my $lower = qr/(?[ \p{Lower} + \p{Digit} ])/;
+ ...
+ qr/(?[ \p{Digit} & $thai_or_lao ])/;
-When these are embedded in another pattern, what they match does not
-change, regardless of parenthesization or what modifiers are in effect
-in that outer pattern. If you fail to compile the subcomponents, you
-can get some nasty surprises. For example:
+If you interpolate something else, the pattern may still compile (or it
+may die), but if it compiles, it very well may not behave as you would
+expect:
my $thai_or_lao = '\p{Thai} + \p{Lao}';
- ...
qr/(?[ \p{Digit} & $thai_or_lao ])/;
compiles to
qr/(?[ \p{Digit} & \p{Thai} + \p{Lao} ])/;
-But this does not have the effect that someone reading the source code
+This does not have the effect that someone reading the source code
would likely expect, as the intersection applies just to C<\p{Thai}>,
-excluding the Laotian. Its best to compile the subcomponents, but you
-could also parenthesize the component pieces:
-
- my $thai_or_lao = '( \p{Thai} + \p{Lao} )';
-
-But any modifiers will still apply to all the components:
-
- my $lower = '\p{Lower} + \p{Digit}';
- qr/(?[ \p{Greek} & $lower ])/i;
-
-matches upper case things. So just, compile the subcomponents, as
-illustrated above.
+excluding the Laotian.
Due to the way that Perl parses things, your parentheses and brackets
may need to be balanced, even including comments. If you run into any
examples, please submit them to L<https://github.com/Perl/perl5/issues>,
so that we can have a concrete example for this man page.
-
-We may change it so that things that remain legal uses in normal bracketed
-character classes might become illegal within this experimental
-construct. One proposal, for example, is to forbid adjacent uses of the
-same character, as in C<(?[ [aa] ])>. The motivation for such a change
-is that this usage is likely a typo, as the second "a" adds nothing.
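A minimal Perl sketch of the usage the updated text recommends: every interpolated sub-component is itself an already-compiled extended bracketed character class. It assumes Perl 5.18 or later, where `(?[ ])` exists. Before 5.36 the construct is experimental, so the sketch silences that warning category. The variable `$thai_or_lao_digit` and the sample characters are illustrative choices and do not come from the patch.

```perl
#!/usr/bin/perl
use 5.018;      # (?[ ]) requires Perl 5.18+; this also enables strict
use warnings;
# (?[ ]) was experimental before Perl 5.36; silence that warning there.
no warnings 'experimental::regex_sets';

# Compile the sub-component itself as an extended bracketed class,
# so it behaves as a single operand when interpolated below.
my $thai_or_lao = qr/(?[ \p{Thai} + \p{Lao} ])/;

# The intersection now applies to the whole Thai-or-Lao set.
my $thai_or_lao_digit = qr/(?[ \p{Digit} & $thai_or_lao ])/;

# Sample characters, written as escapes to avoid source-encoding issues.
my %samples = (
    'THAI DIGIT THREE' => "\x{0E53}",
    'LAO DIGIT THREE'  => "\x{0ED3}",
    'DIGIT THREE'      => "3",
    'LAO LETTER KO'    => "\x{0E81}",
);

for my $name (sort keys %samples) {
    # Anchor the match so exactly one character must belong to the class.
    my $matches = $samples{$name} =~ /\A$thai_or_lao_digit\z/ ? 'yes' : 'no';
    print "$name: $matches\n";
}
```

Run as-is, it should report a match only for the Thai and Lao digits. Interpolating the plain string `'\p{Thai} + \p{Lao}'` instead would, as the text above explains, parse as `(\p{Digit} & \p{Thai}) + \p{Lao}` because `&` binds more tightly than `+`, so a Lao letter such as U+0E81 would match as well, assuming the pattern compiles at all.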