author Karl Williamson <public@khwilliamson.com> 2013-01-30 12:39:51 -0700
committer Karl Williamson <public@khwilliamson.com> 2013-02-03 21:41:28 -0700
commit f4f5fe57a6038d0f7d0c320bc9204a232dbd874f
tree 60b5872e796cef0f1054d611c6b469547dc8fd84
parent ce86dcfa1d3e3c441454845d5f995a06e543a739
Incorporate code review feedback for (?[])
Thanks to Hugo van der Sanden for reviewing this new code.
Diffstat (limited to 'pod/perlre.pod')
-rw-r--r-- pod/perlre.pod 107
1 files changed, 72 insertions, 35 deletions
diff --git a/pod/perlre.pod b/pod/perlre.pod
index 98fd36edbf..414ba387ff 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -1736,15 +1736,6 @@ to inside of one of these constructs. The following equivalences apply:
=item C<(?[ ])>
X<set operations>
-This is an experimental feature present starting in 5.18, but is subject
-to change as we gain field experience with it. Any attempt to use it
-will raise a warning, unless disabled via
-
- no warnings "experimental::regex_sets";
-
-Comments on this feature are welcome; send email to
-C<perl5-porters@perl.org>.
-
This is a fancy bracketed character class that can be used for more
readable and less error-prone classes, and to perform set operations,
such as intersection. An example is
@@ -1752,7 +1743,17 @@ such as intersection. An example is
/(?[ \p{Thai} & \p{Digit} ])/
This will match all the digit characters that are in the Thai script.
-We can extend this by
+
+This is an experimental feature available starting in 5.18, but is
+subject to change as we gain field experience with it. Any attempt to
+use it will raise a warning, unless disabled via
+
+ no warnings "experimental::regex_sets";
+
+Comments on this feature are welcome; send email to
+C<perl5-porters@perl.org>.
+
+We can extend the example above:
/(?[ ( \p{Thai} + \p{Lao} ) & \p{Digit} ])/
@@ -1780,7 +1781,12 @@ There is one unary operator:
All the binary operators left associate, and are of equal precedence.
The unary operator right associates, and has higher precedence. Use
-parentheses to override the default associations.
+parentheses to override the default associations. Some feedback we've
+received indicates a desire for intersection to have higher precedence
+than union. This is something that feedback from the field may cause us
+to change in future releases; you may want to parenthesize copiously to
+avoid such changes affecting your code, until this feature is no longer
+considered experimental.
The main restriction is that everything is a metacharacter. Thus,
you cannot refer to single characters by doing something like this:
@@ -1793,7 +1799,7 @@ it in brackets:
/(?[ [a] + [b] ])/
(This is the same thing as C<[ab]>.) You could also have said the
-equivalent
+equivalent:
/(?[[ a b ]])/
@@ -1801,16 +1807,21 @@ equivalent
C<\N{ }>, etc.)
This last example shows the use of this construct to specify an ordinary
-bracketed character class without set operations. Note the white space
-within it. To specify a matchable white space character, you can escape
-it with a backslash, like:
+bracketed character class without additional set operations. Note the
+white space within it; L</E<sol>x> is turned on even within bracketed
+character classes, except you can't have comments inside them. Hence,
+
+ (?[ [#] ])
+
+matches the literal character "#". To specify a literal white space character,
+you can escape it with a backslash, like:
/(?[ [ a e i o u \ ] ])/
This matches the English vowels plus the SPACE character.
All the other escapes accepted by normal bracketed character classes are
-accepted here as well; but unlike the normal ones, unrecognized escapes are
-fatal errors here.
+accepted here as well; but unrecognized escapes that generate warnings
+in normal classes are fatal errors here.
All warnings from these class elements are fatal, as well as some
practices that don't currently warn. For example you cannot say
@@ -1822,28 +1833,50 @@ zero to make two). These restrictions are to lower the incidence of
typos causing the class to not match what you thought it would.
The final difference between regular bracketed character classes and
-these, is that it is not possible to get the latter to match a
+these, is that it is not possible to get these to match a
multi-character fold. Thus,
/(?[ [\xDF] ])/iu
does not match the string C<ss>.
-You don't have to enclose Posix class names inside double brackets. The
-following works
+You don't have to enclose Posix class names inside double brackets,
+hence both of the following work:
/(?[ [:word:] - [:lower:] ])/
+ /(?[ [[:word:]] - [[:lower:]] ])/
-C<< (?[ ]) >> is a compile-time construct. Any attempt to use something
-which isn't knowable until run-time is a fatal error. Thus, this
-construct cannot be used within the scope of C<use locale> (or the
-L</C<E<sol>l>> regex modifier). Any L<user-defined
-property|perlunicode/"User-Defined Character Properties"> used must be
-already defined by the time the regular expression is compiled; but note
-that this construct can be used to avoid defining such properties.
+The Posix character classes, including things like C<\w> and C<\D>,
+respect the L</E<sol>a (and E<sol>aa)> modifiers.
-A regular expression using this construct that otherwise would compile
-using L</C<E<sol>d>> rules will instead use L</C<E<sol>u>>.
+C<< (?[ ]) >> is a regex-compile-time construct. Any attempt to use
+something which isn't knowable at the time the containing regular
+expression is compiled is a fatal error. In practice, this means
+just three limitations:
+
+=over 4
+
+=item 1
+
+This construct cannot be used within the scope of
+C<use locale> (or the L</C<E<sol>l>> regex modifier).
+
+=item 2
+
+Any
+L<user-defined property|perlunicode/"User-Defined Character Properties">
+used must be already defined by the time the regular expression is
+compiled (but note that this construct can be used instead of such
+properties).
+
+=item 3
+
+A regular expression that otherwise would compile
+using L</C<E<sol>d>> rules, and which uses this construct will instead
+use L</C<E<sol>u>>. Thus this construct tells Perl that you don't want
+L</E<sol>d> rules for the entire regular expression containing it.
+
+=back
The L</C<E<sol>x>> processing within this class is an extended form.
Besides the characters that are considered white space in normal C</x>
@@ -1860,13 +1893,17 @@ construct. There must not be any space between any of the characters
that form the initial C<(?[>. Nor may there be space between the
closing C<])> characters.
-Due to the way that Perl parses things, your parentheses and brackets
-may need to be balanced, even including comments.
-Since this experimental, we may change this so that other legal uses of
-normal bracketed character classes might become illegal. One proposal,
-for example, is to forbid adjacent uses of the same character, as in
-C<[aa]>. This is likely a typo, as the second "a" adds nothing.
+Due to the way that Perl parses things, your parentheses and brackets
+may need to be balanced, even including comments. If you run into any
+examples, please send them to C<perlbug@perl.org>, so that we can have a
+concrete example for this man page.
+
+We may change it so that things that remain legal uses in normal bracketed
+character classes might become illegal within this experimental
+construct. One proposal, for example, is to forbid adjacent uses of the
+same character, as in C<(?[ [aa] ])>. The motivation for such a change
+is that this usage is likely a typo, as the second "a" adds nothing.
=back
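
The following is not part of the commit. It is a minimal sketch that exercises the examples documented in the patched text, assuming Perl 5.18 or later. The variable names and the choice of test characters are illustrative assumptions; the patterns themselves are taken from the documentation above, with C<\A> and C<\z> anchors added so that each check concerns exactly one character.

```perl
use v5.18;
use strict;
use warnings;
no warnings "experimental::regex_sets";    # needed on 5.18 through 5.34

# Patterns from the patched documentation, anchored for single-character tests.
my $thai_digit        = qr/\A(?[ \p{Thai} & \p{Digit} ])\z/;
my $thai_or_lao_digit = qr/\A(?[ ( \p{Thai} + \p{Lao} ) & \p{Digit} ])\z/;
my $vowel_or_space    = qr/\A(?[ [ a e i o u \ ] ])\z/;
my $hash_sign         = qr/\A(?[ [#] ])\z/;
my $word_not_lower    = qr/\A(?[ [:word:] - [:lower:] ])\z/;
my $sharp_s           = qr/\A(?[ [\xDF] ])\z/iu;

# Each entry: [ description, test string, pattern ]
my @checks = (
    [ "THAI DIGIT THREE in Thai digits",        "\x{0E53}", $thai_digit ],
    [ "LAO DIGIT THREE in Thai digits",         "\x{0ED3}", $thai_digit ],
    [ "LAO DIGIT THREE in Thai or Lao digits",  "\x{0ED3}", $thai_or_lao_digit ],
    [ "Thai letter KO KAI in Thai digits",      "\x{0E01}", $thai_digit ],
    [ "space in vowels plus SPACE",             " ",        $vowel_or_space ],
    [ "'b' in vowels plus SPACE",               "b",        $vowel_or_space ],
    [ "'#' in [#]",                             "#",        $hash_sign ],
    [ "'A' in word minus lower",                "A",        $word_not_lower ],
    [ "'a' in word minus lower",                "a",        $word_not_lower ],
    [ "'ss' against [\\xDF] under /iu",         "ss",       $sharp_s ],
);

for my $check (@checks) {
    my ($what, $string, $pattern) = @$check;
    printf "%-42s %s\n", $what,
        $string =~ $pattern ? "matches" : "does not match";
}
```

Every set expression here is fully parenthesized where operators are mixed, following the patch's advice to parenthesize copiously while operator precedence may still change.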