author: Father Chrysostomos <sprout@cpan.org> 2011-02-20 14:17:19 -0800
committer: Father Chrysostomos <sprout@cpan.org> 2011-02-20 14:17:39 -0800
commit: b6fa137b2c9fea23677e2533d7f563bb9f49db0a
tree: 8d2be20b1f1393617c50bb359f4bfaeb4cd5ff49 (/pod)
parent: d7a0d798cfc5c243cb147b827ff253e49f9bd57f
Update docs for postfix /dual flags
Diffstat (limited to 'pod')
-rw-r--r--  pod/perldiag.pod    9
-rw-r--r--  pod/perlop.pod     23
-rw-r--r--  pod/perlre.pod    177
3 files changed, 107 insertions, 102 deletions
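The following sketch is not part of the commit; it is an illustration of the suffix forms that the updated documentation below describes. It assumes perl 5.14 or later on an ASCII platform, and the variable names are hypothetical. It deliberately avoids C<use v5.14>, since that would enable the C<unicode_strings> feature and change the default from C</d> to C</u>.

```perl
use strict;
use warnings;

# Illustrative sample characters (names are hypothetical, not from the patch).
my $e_acute = "\xE9";       # LATIN SMALL LETTER E WITH ACUTE, not UTF-8 encoded
my $arabic3 = "\x{663}";    # ARABIC-INDIC DIGIT THREE, forces a UTF-8 string
my $kelvin  = "\x{212A}";   # KELVIN SIGN

# Each entry records whether the match succeeded (1) or failed (0).
my %r = (
    # \w on a non-UTF-8 string: native rules under /d, Unicode under /u,
    # ASCII-only under /a.
    w_d   => ($e_acute =~ /\w/d   ? 1 : 0),
    w_u   => ($e_acute =~ /\w/u   ? 1 : 0),
    w_a   => ($e_acute =~ /\w/a   ? 1 : 0),

    # \d on a UTF-8 string: /d switches to Unicode rules, /a stays ASCII-only.
    d_d   => ($arabic3 =~ /\d/d   ? 1 : 0),
    d_a   => ($arabic3 =~ /\d/a   ? 1 : 0),

    # Case-insensitive matching: /a keeps Unicode folding, /aa forbids
    # ASCII characters from matching non-ASCII ones.
    k_ai  => ($kelvin  =~ /k/ai   ? 1 : 0),
    k_aai => ($kelvin  =~ /k/aai  ? 1 : 0),
);

printf "%-6s %d\n", $_, $r{$_} for sort keys %r;
```

On such a setup one would expect C</u> to be the only variant where C<\w> matches the non-UTF-8 e-acute, C</a> to reject the Arabic-Indic digit, and C</aa> to stop "k" from matching the Kelvin sign.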
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index 2d9a9ac541..442106450f 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -2008,11 +2008,10 @@ Further error messages would likely be uninformative.
(D syntax)
You had a word that isn't a regex modifier immediately following a
-pattern without an intervening space or you used one of the regex
-modifiers ("a", "d", "l", and "u") that in 5.14 are disallowed as
-suffixes. In that case, use the infix form, like C</(?a:...)/>. In the
-other case, add white space between the pattern and following word.
-As an example of the latter, the two constructs:
+pattern without an intervening space. If you are trying to use the C</le>
+flags on a substitution, use C</el> instead. Otherwise, add white space
+between the pattern and following word to eliminate the warning. As an
+example of the latter, the two constructs:
$a =~ m/$foo/sand $bar
$a =~ m/$foo/s and $bar
diff --git a/pod/perlop.pod b/pod/perlop.pod
index 79f608bf20..3b7bc0df94 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -1267,7 +1267,7 @@ matching and related activities.
=over 8
-=item qr/STRING/msixpo
+=item qr/STRING/msixpodual
X<qr> X</i> X</m> X</o> X</s> X</x> X</p>
This operator quotes (and possibly compiles) its I<STRING> as a regular
@@ -1327,28 +1327,29 @@ Options (specified by the following modifiers) are:
p When matching preserve a copy of the matched string so
that ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be defined.
o Compile pattern only once.
+ l Use the locale
+ u Use Unicode semantics
+ a Use ASCII for \d, \s, \w
+ d Use Unicode or native charset, as in 5.12 and earlier
If a precompiled pattern is embedded in a larger pattern then the effect
of 'msixp' will be propagated appropriately. The effect of the 'o'
modifier has is not propagated, being restricted to those patterns
explicitly using it.
-Several other modifiers to control the character set semantics were
-added for 5.14 that, unlike the ones listed above, may not be used
-after the final pattern delimiter, but only following a C<"(?"> inside
-the regular expression. (It is planned in 5.16 to make them usable in
-the suffix position.) These are B<C<"a">>, B<C<"d">>, B<C<"l">>, and
-B<C<"u">>. They are documented in L<perlre/Extended Patterns>.
+The last four modifiers listed above, added in Perl 5.14,
+control the character set semantics. They are documented in
+L<perlre/Modifiers>.
See L<perlre> for additional information on valid syntax for STRING, and
for a detailed look at the semantics of regular expressions.
-=item m/PATTERN/msixpogc
+=item m/PATTERN/msixpodualgc
X<m> X<operator, match>
X<regexp, options> X<regexp> X<regex, options> X<regex>
X</m> X</s> X</i> X</x> X</p> X</o> X</g> X</c>
-=item /PATTERN/msixpogc
+=item /PATTERN/msixpodualgc
Searches a string for a pattern match, and in scalar context returns
true if it succeeds, false if it fails. If no string is specified
@@ -1384,7 +1385,7 @@ the trailing delimiter. This avoids expensive run-time recompilations,
and is useful when the value you are interpolating won't change over
the life of the script. However, mentioning C</o> constitutes a promise
that you won't change the variables in the pattern. If you change them,
-Perl won't even notice. See also L<"qr/STRING/msixpo">.
+Perl won't even notice. See also L<"qr/STRING/msixpodual">.
=item The empty pattern //
@@ -1561,7 +1562,7 @@ but the resulting C<?PATTERN?> syntax is deprecated, will warn on
usage and may be removed from a future stable release of Perl without
further notice.
-=item s/PATTERN/REPLACEMENT/msixpogcer
+=item s/PATTERN/REPLACEMENT/msixpodualgcer
X<substitute> X<substitution> X<replace> X<regexp, replace>
X<regexp, substitute> X</m> X</s> X</i> X</x> X</p> X</o> X</g> X</c> X</e> X</r>
diff --git a/pod/perlre.pod b/pod/perlre.pod
index 31e881703c..0f5b436e08 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -75,19 +75,23 @@ rather than the regex itself. See
L<perlretut/"Using regular expressions in Perl"> for further explanation
of the g and c modifiers.
+=item a, d, l and u
+X</a> X</d> X</l> X</u>
+
+These modifiers, new in 5.14, affect which character-set semantics
+(Unicode, ASCII, etc.) are used, as described below.
+
=back
These are usually written as "the C</x> modifier", even though the delimiter
-in question might not really be a slash. The modifiers C</imsx>
+in question might not really be a slash. The modifiers C</imsxadlup>
may also be embedded within the regular expression itself using
-the C<(?...)> construct. Also are new (in 5.14) character set semantics
-modifiers B<C<<"a">>, B<C<"d">>, B<C<"l">> and B<C<"u">>, which, in 5.14
-only, must be used embedded in the regular expression, and not after the
-trailing delimiter. All this is discussed below in
-L</Extended Patterns>.
-X</a> X</d> X</l> X</u>
+the C<(?...)> construct.
+
+The C</x>, C</l>, C</u>, C</a> and C</d> modifiers need a little more
+explanation.
-The C</x> modifier itself needs a little more explanation. It tells
+C</x> tells
the regular expression parser to ignore most whitespace that is neither
backslashed nor within a character class. You can use this to break up
your regular expression into (slightly) more readable parts. The C<#>
@@ -114,6 +118,81 @@ in C<\p{...}> there can be spaces that follow the Unicode rules, for which see
L<perluniprops/Properties accessible through \p{} and \P{}>.
X</x>
+C</l> means to use a locale (see L<perllocale>) when pattern matching.
+The locale used will be the one in effect at the time of execution of
+the pattern match. This may not be the same as the compilation-time
+locale, and can differ from one match to another if there is an
+intervening call of the
+L<setlocale() function|perllocale/The setlocale function>.
+This modifier is automatically set if the regular expression is compiled
+within the scope of a C<"use locale"> pragma.
+Perl only allows single-byte locales. This means that code points above
+255 are treated as Unicode no matter what locale is in effect.
+Under Unicode rules, there are a few case-insensitive matches that cross the
+255/256 boundary. These are disallowed. For example,
+0xFF does not caselessly match the character at 0x178, LATIN CAPITAL
+LETTER Y WITH DIAERESIS, because 0xFF may not be LATIN SMALL LETTER Y
+in the current locale, and Perl has no way of knowing if that character
+even exists in the locale, much less what code point it is.
+X</l>
+
+C</u> means to use Unicode semantics when pattern matching. It is
+automatically set if the regular expression is encoded in utf8, or is
+compiled within the scope of a
+L<C<"use feature 'unicode_strings'">|feature> pragma (and isn't also in
+the scope of L<C<"use locale">|locale> nor L<C<"use bytes">|bytes>
+pragmas). On ASCII platforms, the code points between 128 and 255 take on their
+Latin-1 (ISO-8859-1) meanings (which are the same as Unicode's), whereas
+in strict ASCII their meanings are undefined. Thus the platform
+effectively becomes a Unicode platform. The ASCII characters remain as
+ASCII characters (since ASCII is a subset of Latin-1 and Unicode). For
+example, when this option is not on, on a non-utf8 string, C<"\w">
+matches precisely C<[A-Za-z0-9_]>. When the option is on, it matches
+not just those, but all the Latin-1 word characters (such as an "n" with
+a tilde). On EBCDIC platforms, which already are equivalent to Latin-1,
+this modifier changes behavior only when the C<"/i"> modifier is also
+specified, and affects only two characters, giving them full Unicode
+semantics: the C<MICRO SIGN> will match the Greek capital and
+small letters C<MU>; otherwise not; and the C<LATIN CAPITAL LETTER SHARP
+S> will match any of C<SS>, C<Ss>, C<sS>, and C<ss>, otherwise not.
+(This last case is buggy, however.)
+X</u>
+
+C</a> is the same as C</u>, except that C<\d>, C<\s>, C<\w>, and the
+Posix character classes are restricted to matching in the ASCII range
+only. That is, with this modifier, C<\d> always means precisely the
+digits C<"0"> to C<"9">; C<\s> means the five characters C<[ \f\n\r\t]>;
+C<\w> means the 63 characters C<[A-Za-z0-9_]>; and likewise, all the
+Posix classes such as C<[[:print:]]> match only the appropriate
+ASCII-range characters. As you would expect, this modifier causes, for
+example, C<\D> to mean the same thing as C<[^0-9]>; in fact, all
+non-ASCII characters match C<\D>, C<\S>, and C<\W>. C<\b> still means
+to match at the boundary between C<\w> and C<\W>, using the C<"a">
+definitions of them (similarly for C<\B>). Otherwise, C<"a"> behaves
+like the C<"u"> modifier, in that case-insensitive matching uses Unicode
+semantics; for example, "k" will match the Unicode C<\N{KELVIN SIGN}>
+under C</i> matching, and code points in the Latin1 range, above ASCII
+will have Unicode semantics when it comes to case-insensitive matching.
+But writing two "a"'s in a row will increase its effect, causing the
+Kelvin sign and all other non-ASCII characters to not match any ASCII
+character under C</i> matching.
+X</a>
+
+C</d> means to use the traditional Perl pattern-matching behavior.
+This is dualistic (hence the name C</d>, which also could stand for
+"depends"). When this is in effect, Perl matches according to the
+platform's native character set rules unless there is something that
+indicates to use Unicode rules. If either the target string or the
+pattern itself is encoded in UTF-8, Unicode rules are used. Also, if
+the pattern contains Unicode-only features, such as code points above
+255, C<\p()> Unicode properties or C<\N{}> Unicode names, Unicode rules
+will be used. It is automatically selected by default if the regular
+expression is compiled neither within the scope of a C<"use locale">
+pragma nor a C<"use feature 'unicode_strings'"> pragma.
+This behavior causes a number of glitches, see
+L<perlunicode/The "Unicode Bug">.
+X</d>
+
=head2 Regular Expressions
=head3 Metacharacters
@@ -646,86 +725,12 @@ after the C<"?"> is a shorthand equivalent to C<d-imsx>. Flags (except
C<"d">) may follow the caret to override it.
But a minus sign is not legal with it.
-Also, starting in Perl 5.14, are modifiers C<"a">, C<"d">, C<"l">, and
-C<"u">, which for 5.14 may not be used as suffix modifiers.
-
-C<"l"> means to use a locale (see L<perllocale>) when pattern matching.
-The locale used will be the one in effect at the time of execution of
-the pattern match. This may not be the same as the compilation-time
-locale, and can differ from one match to another if there is an
-intervening call of the
-L<setlocale() function|perllocale/The setlocale function>.
-This modifier is automatically set if the regular expression is compiled
-within the scope of a C<"use locale"> pragma.
-Perl only allows single-byte locales. This means that code points above
-255 are treated as Unicode no matter what locale is in effect.
-Under Unicode rules, there are a few case-insensitive matches that cross the
-boundary 255/256 boundary. These are disallowed. For example,
-0xFF does not caselessly match the character at 0x178, LATIN CAPITAL
-LETTER Y WITH DIAERESIS, because 0xFF may not be LATIN SMALL LETTER Y
-in the current locale, and Perl has no way of knowing if that character
-even exists in the locale, much less what code point it is.
-
-C<"u"> means to use Unicode semantics when pattern matching. It is
-automatically set if the regular expression is encoded in utf8, or is
-compiled within the scope of a
-L<C<"use feature 'unicode_strings">|feature> pragma (and isn't also in
-the scope of L<C<"use locale">|locale> nor L<C<"use bytes">|bytes>
-pragmas. On ASCII platforms, the code points between 128 and 255 take on their
-Latin-1 (ISO-8859-1) meanings (which are the same as Unicode's), whereas
-in strict ASCII their meanings are undefined. Thus the platform
-effectively becomes a Unicode platform. The ASCII characters remain as
-ASCII characters (since ASCII is a subset of Latin-1 and Unicode). For
-example, when this option is not on, on a non-utf8 string, C<"\w">
-matches precisely C<[A-Za-z0-9_]>. When the option is on, it matches
-not just those, but all the Latin-1 word characters (such as an "n" with
-a tilde). On EBCDIC platforms, which already are equivalent to Latin-1,
-this modifier changes behavior only when the C<"/i"> modifier is also
-specified, and affects only two characters, giving them full Unicode
-semantics: the C<MICRO SIGN> will match the Greek capital and
-small letters C<MU>; otherwise not; and the C<LATIN CAPITAL LETTER SHARP
-S> will match any of C<SS>, C<Ss>, C<sS>, and C<ss>, otherwise not.
-(This last case is buggy, however.)
-
-C<"a"> is the same as C<"u">, except that C<\d>, C<\s>, C<\w>, and the
-Posix character classes are restricted to matching in the ASCII range
-only. That is, with this modifier, C<\d> always means precisely the
-digits C<"0"> to C<"9">; C<\s> means the five characters C<[ \f\n\r\t]>;
-C<\w> means the 63 characters C<[A-Za-z0-9_]>; and likewise, all the
-Posix classes such as C<[[:print:]]> match only the appropriate
-ASCII-range characters. As you would expect, this modifier causes, for
-example, C<\D> to mean the same thing as C<[^0-9]>; in fact, all
-non-ASCII characters match C<\D>, C<\S>, and C<\W>. C<\b> still means
-to match at the boundary between C<\w> and C<\W>, using the C<"a">
-definitions of them (similarly for C<\B>). Otherwise, C<"a"> behaves
-like the C<"u"> modifier, in that case-insensitive matching uses Unicode
-semantics; for example, "k" will match the Unicode C<\N{KELVIN SIGN}>
-under C</i> matching, and code points in the Latin1 range, above ASCII
-will have Unicode semantics when it comes to case-insensitive matching.
-But writing two in "a"'s in a row will increase its effect, causing the
-Kelvin sign and all other non-ASCII characters to not match any ASCII
-character under C</i> matching.
-
-C<"d"> means to use the traditional Perl pattern matching behavior.
-This is dualistic (hence the name C<"d">, which also could stand for
-"depends"). When this is in effect, Perl matches according to the
-platform's native character set rules unless there is something that
-indicates to use Unicode rules. If either the target string or the
-pattern itself is encoded in UTF-8, Unicode rules are used. Also, if
-the pattern contains Unicode-only features, such as code points above
-255, C<\p()> Unicode properties or C<\N{}> Unicode names, Unicode rules
-will be used. It is automatically selected by default if the regular
-expression is compiled neither within the scope of a C<"use locale">
-pragma nor a <C<"use feature 'unicode_strings"> pragma.
-This behavior causes a number of glitches, see
-L<perlunicode/The "Unicode Bug">.
-
Note that the C<a>, C<d>, C<l>, C<p>, and C<u> modifiers are special in
that they can only be enabled, not disabled, and the C<a>, C<d>, C<l>, and
C<u> modifiers are mutually exclusive: specifying one de-specifies the
others, and a maximum of one may appear in the construct. Thus, for
-example, C<(?-p)>, C<(?-d:...)>, and C<(?dl:...)> will warn when
-compiled under C<use warnings>.
+example, C<(?-p)> will warn when compiled under C<use warnings>;
+C<(?-d:...)> and C<(?dl:...)> are fatal errors.
Note also that the C<p> modifier is special in that its presence
anywhere in a pattern has a global effect.
@@ -1018,7 +1023,7 @@ For reasons of security, this construct is forbidden if the regular
expression involves run-time interpolation of variables, unless the
perilous C<use re 'eval'> pragma has been used (see L<re>), or the
variables contain results of C<qr//> operator (see
-L<perlop/"qr/STRINGE<sol>msixpo">).
+L<perlop/"qr/STRINGE<sol>msixpodual">).
This restriction is due to the wide-spread and remarkably convenient
custom of using run-time determined strings as patterns. For example:
@@ -1090,7 +1095,7 @@ For reasons of security, this construct is forbidden if the regular
expression involves run-time interpolation of variables, unless the
perilous C<use re 'eval'> pragma has been used (see L<re>), or the
variables contain results of C<qr//> operator (see
-L<perlop/"qrE<sol>STRINGE<sol>msixpo">).
+L<perlop/"qrE<sol>STRINGE<sol>msixpodual">).
In perl 5.12.x and earlier, because the regex engine was not re-entrant,
delayed code could not safely invoke the regex engine either directly with