author Karl Williamson <khw@cpan.org> 2015-10-21 17:04:20 -0600
committer Karl Williamson <khw@cpan.org> 2015-10-30 11:18:14 -0600
commit 7711f97842bc713f668a0686e9cb44322fe53f8c
tree ca09fed59113d0c29ed7dae2725ea4c32f564045
parent a82f4918f5debccfb7e9a7047d2c2e558df538cd
perlre: Nits
This mostly adds C<> formatting, but there are a few updates, clarifications, and grammar-type fixes.
-rw-r--r-- pod/perlre.pod 274
1 files changed, 146 insertions, 128 deletions
diff --git a/pod/perlre.pod b/pod/perlre.pod
index 2a4516cdb5..e45e4442a7 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -13,7 +13,7 @@ introduction is available in L<perlretut>.
For reference on how regular expressions are used in matching
operations, plus various examples of the same, see discussions of
-C<m//>, C<s///>, C<qr//> and C<??> in L<perlop/"Regexp Quote-Like
+C<m//>, C<s///>, C<qr//> and C<"??"> in L<perlop/"Regexp Quote-Like
Operators">.
New in v5.22, L<C<use re 'strict'>|re/'strict' mode> applies stricter
@@ -32,40 +32,41 @@ L<perlop/"Gory details of parsing quoted constructs">.
=over 4
-=item m
+=item B<C<m>>
X</m> X<regex, multiline> X<regexp, multiline> X<regular expression, multiline>
-Treat string as multiple lines. That is, change "^" and "$" from matching
+Treat the string as multiple lines. That is, change C<"^"> and C<"$"> from matching
the start of the string's first line and the end of its last line to
matching the start and end of each line within the string.
-=item s
+=item B<C<s>>
X</s> X<regex, single-line> X<regexp, single-line>
X<regular expression, single-line>
-Treat string as single line. That is, change "." to match any character
+Treat the string as single line. That is, change C<"."> to match any character
whatsoever, even a newline, which normally it would not match.
-Used together, as C</ms>, they let the "." match any character whatsoever,
-while still allowing "^" and "$" to match, respectively, just after
+Used together, as C</ms>, they let the C<"."> match any character whatsoever,
+while still allowing C<"^"> and C<"$"> to match, respectively, just after
and just before newlines within the string.
-=item i
+=item B<C<i>>
X</i> X<regex, case-insensitive> X<regexp, case-insensitive>
X<regular expression, case-insensitive>
-Do case-insensitive pattern matching.
+Do case-insensitive pattern matching. For example, "A" will match "a"
+under C</i>.
If locale matching rules are in effect, the case map is taken from the
current
locale for code points less than 255, and from Unicode rules for larger
code points. However, matches that would cross the Unicode
-rules/non-Unicode rules boundary (ords 255/256) will not succeed. See
-L<perllocale>.
+rules/non-Unicode rules boundary (ords 255/256) will not succeed, unless
+the locale is a UTF-8 one. See L<perllocale>.
-There are a number of Unicode characters that match multiple characters
-under C</i>. For example, C<LATIN SMALL LIGATURE FI>
-should match the sequence C<fi>. Perl is not
+There are a number of Unicode characters that match a sequence of
+multiple characters under C</i>. For example,
+C<LATIN SMALL LIGATURE FI> should match the sequence C<fi>. Perl is not
currently able to do this when the multiple characters are in the pattern and
are split between groupings, or when one or more are quantified. Thus
@@ -84,30 +85,30 @@ inverted, which otherwise could be highly confusing. See
L<perlrecharclass/Bracketed Character Classes>, and
L<perlrecharclass/Negation>.
-=item x
+=item B<C<x>>
X</x>
Extend your pattern's legibility by permitting whitespace and comments.
Details in L</"/x">
-=item p
+=item B<C<p>>
X</p> X<regex, preserve> X<regexp, preserve>
-Preserve the string matched such that ${^PREMATCH}, ${^MATCH}, and
-${^POSTMATCH} are available for use after matching.
+Preserve the string matched such that C<${^PREMATCH}>, C<${^MATCH}>, and
+C<${^POSTMATCH}> are available for use after matching.
In Perl 5.20 and higher this is ignored. Due to a new copy-on-write
-mechanism, ${^PREMATCH}, ${^MATCH}, and ${^POSTMATCH} will be available
+mechanism, C<${^PREMATCH}>, C<${^MATCH}>, and C<${^POSTMATCH}> will be available
after the match regardless of the modifier.
-=item a, d, l and u
+=item B<C<a>>, B<C<d>>, B<C<l>>, and B<C<u>>
X</a> X</d> X</l> X</u>
These modifiers, all new in 5.14, affect which character-set rules
(Unicode, etc.) are used, as described below in
L</Character set modifiers>.
-=item n
+=item B<C<n>>
X</n> X<regex, non-capture> X<regexp, non-capture>
X<regular expression, non-capture>
@@ -169,7 +170,7 @@ C</x> tells
the regular expression parser to ignore most whitespace that is neither
backslashed nor within a bracketed character class. You can use this to
break up your regular expression into (slightly) more readable parts.
-Also, the C<#> character is treated as a metacharacter introducing a
+Also, the C<"#"> character is treated as a metacharacter introducing a
comment that runs up to the pattern's closing delimiter, or to the end
of the current line if the pattern extends onto the next line. Hence,
this is very much like an ordinary Perl code comment. (You can include
@@ -177,7 +178,7 @@ the closing delimiter within the comment only if you precede it with a
backslash, so be careful!)
Use of C</x> means that if you want real
-whitespace or C<#> characters in the pattern (outside a bracketed character
+whitespace or C<"#"> characters in the pattern (outside a bracketed character
class, which is unaffected by C</x>), then you'll either have to
escape them (using backslashes or C<\Q...\E>) or encode them using octal,
hex, or C<\N{}> escapes.
@@ -203,8 +204,8 @@ a C<\Q...\E> stays unaffected by C</x>. And note that C</x> doesn't affect
space interpretation within a single multi-character construct. For
example in C<\x{...}>, regardless of the C</x> modifier, there can be no
spaces. Same for a L<quantifier|/Quantifiers> such as C<{3}> or
-C<{5,}>. Similarly, C<(?:...)> can't have a space between the C<(>,
-C<?>, and C<:>. Within any delimiters for such a
+C<{5,}>. Similarly, C<(?:...)> can't have a space between the C<"{">,
+C<"?">, and C<":">. Within any delimiters for such a
construct, allowed spaces are not affected by C</x>, and depend on the
construct. For example, C<\x{...}> can't have spaces because hexadecimal
numbers don't have spaces in them. But, Unicode properties can have spaces, so
@@ -267,7 +268,7 @@ regular expressions compiled within the scope of various pragmas,
and we recommend that in general, you use those pragmas instead of
specifying these modifiers explicitly. For one thing, the modifiers
affect only pattern matching, and do not extend to even any replacement
-done, whereas using the pragmas give consistent results for all
+done, whereas using the pragmas gives consistent results for all
appropriate operations within their scopes. For example,
s/foo/\Ubar/il
@@ -302,9 +303,11 @@ the same as the compilation-time locale, and can differ from one match
to another if there is an intervening call of the
L<setlocale() function|perllocale/The setlocale function>.
-The only non-single-byte locale Perl supports is (starting in v5.20)
-UTF-8. This means that code points above 255 are treated as Unicode no
-matter what locale is in effect (since UTF-8 implies Unicode).
+Prior to v5.20, Perl did not support multi-byte locales. Starting then,
+UTF-8 locales are supported. No other multi byte locales are ever
+likely to be supported. However, in all locales, one can have code
+points above 255 and these will always be treated as Unicode no matter
+what locale is in effect.
Under Unicode rules, there are a few case-insensitive matches that cross
the 255/256 boundary. Except for UTF-8 locales in Perls v5.20 and
@@ -425,7 +428,8 @@ This modifier is automatically selected by default when none of the
others are, so yet another name for it is "Default".
Because of the unexpected behaviors associated with this modifier, you
-probably should only use it to maintain weird backward compatibilities.
+probably should only explicitly use it to maintain weird backward
+compatibilities.
=head4 /a (and /aa)
@@ -468,8 +472,8 @@ points in the Latin1 range, above ASCII will have Unicode rules when it
comes to case-insensitive matching.
To forbid ASCII/non-ASCII matches (like "k" with C<\N{KELVIN SIGN}>),
-specify the "a" twice, for example C</aai> or C</aia>. (The first
-occurrence of "a" restricts the C<\d>, etc., and the second occurrence
+specify the C<"a"> twice, for example C</aai> or C</aia>. (The first
+occurrence of C<"a"> restricts the C<\d>, etc., and the second occurrence
adds the C</i> restrictions.) But, note that code points outside the
ASCII range will use Unicode rules for C</i> matching, so the modifier
doesn't really restrict things to just ASCII; it just forbids the
@@ -559,20 +563,20 @@ X<\> X<^> X<.> X<$> X<|> X<(> X<()> X<[> X<[]>
() Grouping
[] Bracketed Character class
-By default, the "^" character is guaranteed to match only the
-beginning of the string, the "$" character only the end (or before the
+By default, the C<"^"> character is guaranteed to match only the
+beginning of the string, the C<"$"> character only the end (or before the
newline at the end), and Perl does certain optimizations with the
assumption that the string contains only one line. Embedded newlines
-will not be matched by "^" or "$". You may, however, wish to treat a
-string as a multi-line buffer, such that the "^" will match after any
+will not be matched by C<"^"> or C<"$">. You may, however, wish to treat a
+string as a multi-line buffer, such that the C<"^"> will match after any
newline within the string (except if the newline is the last character in
-the string), and "$" will match before any newline. At the
+the string), and C<"$"> will match before any newline. At the
cost of a little more overhead, you can do this by using the /m modifier
on the pattern match operator. (Older programs did this by setting C<$*>,
but this option was removed in perl 5.10.)
X<^> X<$> X</m>
-To simplify multi-line substitutions, the "." character never matches a
+To simplify multi-line substitutions, the C<"."> character never matches a
newline unless you use the C</s> modifier, which in effect tells Perl to pretend
the string is a single line--even if it isn't.
X<.> X</s>
@@ -598,8 +602,8 @@ or enclosing them within square brackets (C<"[{]">). This change will
allow for future syntax extensions (like making the lower bound of a
quantifier optional), and better error checking of quantifiers.)
-The "*" quantifier is equivalent to C<{0,}>, the "+"
-quantifier to C<{1,}>, and the "?" quantifier to C<{0,1}>. n and m are limited
+The C<"*"> quantifier is equivalent to C<{0,}>, the C<"+">
+quantifier to C<{1,}>, and the C<"?"> quantifier to C<{0,1}>. I<n> and I<m> are limited
to non-negative integral values less than a preset limit defined when perl is built.
This is usually 32766 on the most common platforms. The actual limit can
be seen in the error message generated by code such as this:
@@ -609,7 +613,7 @@ be seen in the error message generated by code such as this:
By default, a quantified subpattern is "greedy", that is, it will match as
many times as possible (given a particular starting location) while still
allowing the rest of the pattern to match. If you want it to match the
-minimum number of times possible, follow the quantifier with a "?". Note
+minimum number of times possible, follow the quantifier with a C<"?">. Note
that the meanings don't change, just the "greediness":
X<metacharacter> X<greedy> X<greediness>
X<?> X<*?> X<+?> X<??> X<{n}?> X<{n,}?> X<{n,m}?>
@@ -798,9 +802,9 @@ of it (in either order), counting the imaginary characters off the
beginning and end of the string as matching a C<\W>. (Within
character classes C<\b> represents backspace rather than a word
boundary, just as it normally does in any double-quoted string.)
-The C<\A> and C<\Z> are just like "^" and "$", except that they
+The C<\A> and C<\Z> are just like C<"^"> and C<"$">, except that they
won't match multiple times when the C</m> modifier is used, while
-"^" and "$" will match at every internal line boundary. To match
+C<"^"> and C<"$"> will match at every internal line boundary. To match
the actual end of the string and not ignore an optional trailing
newline, use C<\z>.
X<\b> X<\A> X<\Z> X<\z> X</m>
@@ -1008,7 +1012,7 @@ C<${^MATCH}> and C<${^POSTMATCH}>, which are equivalent to C<$`>, C<$&>
and C<$'>, B<except> that they are only guaranteed to be defined after a
successful match that was executed with the C</p> (preserve) modifier.
The use of these variables incurs no global performance penalty, unlike
-their punctuation char equivalents, however at the trade-off that you
+their punctuation character equivalents, however at the trade-off that you
have to tell perl when you want to use them. As of Perl 5.20, these three
variables are equivalent to C<$`>, C<$&> and C<$'>, and C</p> is ignored.
X</p> X<p modifier>
@@ -1018,7 +1022,8 @@ X</p> X<p modifier>
Backslashed metacharacters in Perl are alphanumeric, such as C<\b>,
C<\w>, C<\n>. Unlike some other regular expression languages, there
are no backslashed symbols that aren't alphanumeric. So anything
-that looks like \\, \(, \), \[, \], \{, or \} is always
+that looks like C<\\>, C<\(>, C<\)>, C<\[>, C<\]>, C<\{>, or C<\}> is
+always
interpreted as a literal character, not a metacharacter. This was
once used in a common idiom to disable or quote the special meanings
of regular expression metacharacters in a string that you want to
@@ -1027,9 +1032,9 @@ use for a pattern. Simply quote all non-"word" characters:
$pattern =~ s/(\W)/\\$1/g;
(If C<use locale> is set, then this depends on the current locale.)
-Today it is more common to use the quotemeta() function or the C<\Q>
-metaquoting escape sequence to disable all metacharacters' special
-meanings like this:
+Today it is more common to use the C<L<quotemeta()|perlfunc/quotemeta>>
+function or the C<\Q> metaquoting escape sequence to disable all
+metacharacters' special meanings like this:
/$unquoted\Q$quoted\E$unquoted/
@@ -1068,8 +1073,8 @@ X<(?#)>
A comment. The text is ignored.
Note that Perl closes
-the comment as soon as it sees a C<)>, so there is no way to put a literal
-C<)> in the comment. The pattern's closing delimiter must be escaped by
+the comment as soon as it sees a C<")">, so there is no way to put a literal
+C<")"> in the comment. The pattern's closing delimiter must be escaped by
a backslash if it appears in the comment.
See L</E<sol>x> for another way to have comments in patterns.
@@ -1080,7 +1085,7 @@ See L</E<sol>x> for another way to have comments in patterns.
X<(?)> X<(?^)>
One or more embedded pattern-match modifiers, to be turned on (or
-turned off, if preceded by C<->) for the remainder of the pattern or
+turned off, if preceded by C<"-">) for the remainder of the pattern or
the remainder of the enclosing pattern group (if any).
This is particularly useful for dynamic patterns, such as those read in from a
@@ -1107,7 +1112,7 @@ modifier outside this group.
These modifiers do not carry over into named subpatterns called in the
enclosing group. In other words, a pattern such as C<((?i)(?&NAME))> does not
-change the case-sensitivity of the "NAME" pattern.
+change the case-sensitivity of the C<"NAME"> pattern.
Any of these modifiers can be set to apply globally to all regular
expressions compiled within the scope of a C<use re>. See
@@ -1138,7 +1143,7 @@ X<(?:)>
X<(?^:)>
This is for clustering, not capturing; it groups subexpressions like
-"()", but doesn't make backreferences as "()" does. So
+C<"()">, but doesn't make backreferences as C<"()"> does. So
@fields = split(/\b(?:a|b|c)\b/)
@@ -1149,7 +1154,7 @@ is like
but doesn't spit out extra fields. It's also cheaper not to capture
characters if you don't need to.
-Any letters between C<?> and C<:> act as flags modifiers as with
+Any letters between C<"?"> and C<":"> act as flags modifiers as with
C<(?adluimnsx-imnsx)>. For example,
/(?s-i:more.*than).*million/i
@@ -1158,7 +1163,7 @@ is equivalent to the more verbose
/(?:(?s-i)more.*than).*million/i
-Note that any C<(...)> constructs enclosed within this one will still
+Note that any C<()> constructs enclosed within this one will still
capture unless the C</n> modifier is in effect.
Starting in Perl 5.14, a C<"^"> (caret or circumflex accent) immediately
@@ -1314,7 +1319,7 @@ after a successful match via C<%+> or C<%->. See L<perlvar>
for more details on the C<%+> and C<%-> hashes.
If multiple distinct capture groups have the same name then the
-$+{NAME} will refer to the leftmost defined group in the match.
+C<$+{NAME}> will refer to the leftmost defined group in the match.
The forms C<(?'NAME'pattern)> and C<< (?<NAME>pattern) >> are equivalent.
@@ -1325,10 +1330,10 @@ pattern
/(x)(?<foo>y)(z)/
-$+{foo} will be the same as $2, and $3 will contain 'z' instead of
+C<$+{I<foo>}> will be the same as C<$2>, and C<$3> will contain 'z' instead of
the opposite which is what a .NET regex hacker might expect.
-Currently NAME is restricted to simple identifiers only.
+Currently I<NAME> is restricted to simple identifiers only.
In other words, it must match C</^[_A-Za-z][_A-Za-z0-9]*\z/> or
its Unicode extension (see L<utf8>),
though it isn't extended by the locale (see L<perllocale>).
@@ -1453,7 +1458,7 @@ L<"Backtracking">). For example,
will initially increment C<$cnt> up to 8; then during backtracking, its
value will be unwound back to 4, which is the value assigned to C<$res>.
-At the end of the regex execution, $cnt will be wound back to its initial
+At the end of the regex execution, C<$cnt> will be wound back to its initial
value of 0.
This assertion may be used as the condition in a
@@ -1510,7 +1515,7 @@ etc., to refer to the enclosing pattern's capture groups.) Thus, although
('a' x 100)=~/(??{'(.)' x 100})/
-I<will> match, it will I<not> set $1 on exit.
+I<will> match, it will I<not> set C<$1> on exit.
The following pattern matches a parenthesized group:
@@ -1563,7 +1568,7 @@ Note that the counting for relative recursion differs from that of
relative backreferences, in that with recursion unclosed groups B<are>
included.
-The following pattern matches a function foo() which may contain
+The following pattern matches a function C<foo()> which may contain
balanced parentheses as the argument.
$re = qr{ ( # paren group 1 (full function)
@@ -1613,7 +1618,7 @@ B<Note> that this pattern does not behave the same way as the equivalent
PCRE or Python construct of the same form. In Perl you can backtrack into
a recursed group, in PCRE and Python the recursed into group is treated
as atomic. Also, modifiers are resolved at compile time, so constructs
-like (?i:(?1)) or (?:(?i)(?1)) do not affect how the sub-pattern will
+like C<(?i:(?1))> or C<(?:(?i)(?1))> do not affect how the sub-pattern will
be processed.
=item C<(?&NAME)>
@@ -1639,42 +1644,57 @@ Conditional expression. Matches C<yes-pattern> if C<condition> yields
a true value, matches C<no-pattern> otherwise. A missing pattern always
matches.
-C<(condition)> should be one of: 1) an integer in
-parentheses (which is valid if the corresponding pair of parentheses
-matched); 2) a look-ahead/look-behind/evaluate zero-width assertion; 3) a
-name in angle brackets or single quotes (which is valid if a group
-with the given name matched); or 4) the special symbol (R) (true when
-evaluated inside of recursion or eval). Additionally the R may be
+C<(condition)> should be one of:
+
+=over 4
+
+=item an integer in parentheses
+
+(which is valid if the corresponding pair of parentheses
+matched);
+
+=item a look-ahead/look-behind/evaluate zero-width assertion;
+
+=item a name in angle brackets or single quotes
+
+(which is valid if a group with the given name matched);
+
+=item the special symbol C<(R)>
+
+(true when evaluated inside of recursion or eval). Additionally the
+C<R> may be
followed by a number, (which will be true when evaluated when recursing
inside of the appropriate group), or by C<&NAME>, in which case it will
be true only when evaluated during recursion in the named group.
+=back
+
Here's a summary of the possible predicates:
=over 4
-=item (1) (2) ...
+=item C<(1)> C<(2)> ...
Checks if the numbered capturing group has matched something.
-=item (<NAME>) ('NAME')
+=item C<(E<lt>I<NAME>E<gt>)> C<('I<NAME>')>
Checks if a group with the given name has matched something.
-=item (?=...) (?!...) (?<=...) (?<!...)
+=item C<(?=...)> C<(?!...)> C<(?<=...)> C<(?<!...)>
-Checks whether the pattern matches (or does not match, for the '!'
+Checks whether the pattern matches (or does not match, for the C<"!">
variants).
-=item (?{ CODE })
+=item C<(?{ I<CODE> })>
Treats the return value of the code block as the condition.
-=item (R)
+=item C<(R)>
Checks if the expression has been evaluated inside of recursion.
-=item (R1) (R2) ...
+=item C<(R1)> C<(R2)> ...
Checks if the expression has been evaluated while executing directly
inside of the n-th capture group. This check is the regex equivalent of
@@ -1683,14 +1703,14 @@ inside of the n-th capture group. This check is the regex equivalent of
In other words, it does not check the full recursion stack.
-=item (R&NAME)
+=item C<(R&I<NAME>)>
Similar to C<(R1)>, this predicate checks to see if we're executing
directly inside of the leftmost group with a given name (this is the same
-logic used by C<(?&NAME)> to disambiguate). It does not check the full
+logic used by C<(?&I<NAME>)> to disambiguate). It does not check the full
stack, but only the name of the innermost active recursion.
-=item (DEFINE)
+=item C<(DEFINE)>
In this case, the yes-pattern is never directly executed, and no
no-pattern is allowed. Similar in spirit to C<(?{0})> but more efficient.
@@ -1825,7 +1845,7 @@ This was only 4 times slower on a string with 1000000 C<a>s.
The "grab all you can, and do not give anything back" semantic is desirable
in many situations where on the first sight a simple C<()*> looks like
the correct solution. Suppose we parse text with comments being delimited
-by C<#> followed by some optional (horizontal) whitespace. Contrary to
+by C<"#"> followed by some optional (horizontal) whitespace. Contrary to
its appearance, C<#[ \t]*> I<is not> the correct subexpression to match
the comment delimiter, because it may "give up" some whitespace if
the remainder of the pattern can be made to match that way. The correct
@@ -1834,7 +1854,7 @@ answer is either one of these:
(?>#[ \t]*)
#[ \t]*(?![ \t])
-For example, to grab non-empty comments into $1, one should use either
+For example, to grab non-empty comments into C<$1>, one should use either
one of these:
/ (?> \# [ \t]* ) ( .+ ) /x;
@@ -1864,8 +1884,8 @@ See L<perlrecharclass/Extended Bracketed Character Classes>.
=head2 Special Backtracking Control Verbs
-These special patterns are generally of the form C<(*VERB:ARG)>. Unless
-otherwise stated the ARG argument is optional; in some cases, it is
+These special patterns are generally of the form C<(*I<VERB>:I<ARG>)>. Unless
+otherwise stated the I<ARG> argument is optional; in some cases, it is
mandatory.
Any pattern containing a special backtracking verb that allows an argument
@@ -1873,9 +1893,9 @@ has the special behaviour that when executed it sets the current package's
C<$REGERROR> and C<$REGMARK> variables. When doing so the following
rules apply:
-On failure, the C<$REGERROR> variable will be set to the ARG value of the
+On failure, the C<$REGERROR> variable will be set to the I<ARG> value of the
verb pattern, if the verb was involved in the failure of the match. If the
-ARG part of the pattern was omitted, then C<$REGERROR> will be set to the
+I<ARG> part of the pattern was omitted, then C<$REGERROR> will be set to the
name of the last C<(*MARK:NAME)> pattern executed, or to TRUE if there was
none. Also, the C<$REGMARK> variable will be set to FALSE.
@@ -1902,10 +1922,10 @@ argument, then C<$REGERROR> and C<$REGMARK> are not touched at all.
X<(*PRUNE)> X<(*PRUNE:NAME)>
This zero-width pattern prunes the backtracking tree at the current point
-when backtracked into on failure. Consider the pattern C<A (*PRUNE) B>,
-where A and B are complex patterns. Until the C<(*PRUNE)> verb is reached,
-A may backtrack as necessary to match. Once it is reached, matching
-continues in B, which may also backtrack as necessary; however, should B
+when backtracked into on failure. Consider the pattern C<I<A> (*PRUNE) I<B>>,
+where I<A> and I<B> are complex patterns. Until the C<(*PRUNE)> verb is reached,
+I<A> may backtrack as necessary to match. Once it is reached, matching
+continues in I<B>, which may also backtrack as necessary; however, should B
not match, then no further backtracking will take place, and the pattern
will fail outright at the current starting position.
@@ -1964,7 +1984,7 @@ C<(*MARK:NAME)> was encountered while matching, then it is that position
which is used as the "skip point". If no C<(*MARK)> of that name was
encountered, then the C<(*SKIP)> operator has no effect. When used
without a name the "skip point" is where the match point was when
-executing the (*SKIP) pattern.
+executing the C<(*SKIP)> pattern.
Compare the following to the examples in C<(*PRUNE)>; note the string
is twice as long:
@@ -1989,7 +2009,7 @@ This zero-width pattern can be used to mark the point reached in a string
when a certain part of the pattern has been successfully matched. This
mark may be given a name. A later C<(*SKIP)> pattern will then skip
forward to that point if backtracked into on failure. Any number of
-C<(*MARK)> patterns are allowed, and the NAME portion may be duplicated.
+C<(*MARK)> patterns are allowed, and the I<NAME> portion may be duplicated.
In addition to interacting with the C<(*SKIP)> pattern, C<(*MARK:NAME)>
can be used to "label" a pattern branch, so that after matching, the
@@ -2025,7 +2045,7 @@ The two branches of a C<(?(condition)yes-pattern|no-pattern)> do not
count as an alternation, as far as C<(*THEN)> is concerned.
Its name comes from the observation that this operation combined with the
-alternation operator (C<|>) can be used to create what is essentially a
+alternation operator (C<"|">) can be used to create what is essentially a
pattern-based if/then/else block:
( COND (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ )
@@ -2047,8 +2067,8 @@ is not the same as
/ ( A (*PRUNE) B | C ) /
-as after matching the A but failing on the B the C<(*THEN)> verb will
-backtrack and try C; but the C<(*PRUNE)> verb will simply fail.
+as after matching the I<A> but failing on the I<B> the C<(*THEN)> verb will
+backtrack and try I<C>; but the C<(*PRUNE)> verb will simply fail.
=item C<(*COMMIT)> C<(*COMMIT:args)>
X<(*COMMIT)>
@@ -2077,8 +2097,8 @@ X<(*FAIL)> X<(*F)>
This pattern matches nothing and always fails. It can be used to force the
engine to backtrack. It is equivalent to C<(?!)>, but easier to read. In
fact, C<(?!)> gets optimised into C<(*FAIL)> internally. You can provide
-an argument so that if the match fails because of this FAIL directive
-the argument can be obtained from $REGERROR.
+an argument so that if the match fails because of this C<FAIL> directive
+the argument can be obtained from C<$REGERROR>.
It is probably useful only when combined with C<(?{})> or C<(??{})>.
@@ -2101,8 +2121,8 @@ will match, and C<$1> will be C<AB> and C<$2> will be C<B>, C<$3> will not
be set. If another branch in the inner parentheses was matched, such as in the
string 'ACDE', then the C<D> and C<E> would have to be matched as well.
-You can provide an argument, which will be available in the var $REGMARK
-after the match completes.
+You can provide an argument, which will be available in the var
+C<$REGMARK> after the match completes.
=back
@@ -2118,8 +2138,8 @@ see L<Combining RE Pieces>.
A fundamental feature of regular expression matching involves the
notion called I<backtracking>, which is currently used (when needed)
-by all regular non-possessive expression quantifiers, namely C<*>, C<*?>, C<+>,
-C<+?>, C<{n,m}>, and C<{n,m}?>. Backtracking is often optimized
+by all regular non-possessive expression quantifiers, namely C<"*">, C<"*?">, C<"+">,
+C<"+?">, C<{n,m}>, and C<{n,m}?>. Backtracking is often optimized
internally, but the general principle outlined here is valid.
For a regular expression to match, the I<entire> regular expression must
@@ -2138,8 +2158,8 @@ word following "foo" in the string "Food is on the foo table.":
When the match runs, the first part of the regular expression (C<\b(foo)>)
finds a possible match right at the beginning of the string, and loads up
-$1 with "Foo". However, as soon as the matching engine sees that there's
-no whitespace following the "Foo" that it had saved in $1, it realizes its
+C<$1> with "Foo". However, as soon as the matching engine sees that there's
+no whitespace following the "Foo" that it had saved in C<$1>, it realizes its
mistake and starts over again one character after where it had the
tentative match. This time it goes all the way until the next occurrence
of "foo". The complete regular expression matches this time, and you get
@@ -2254,7 +2274,7 @@ You might have expected test 3 to fail because it seems to a more
general purpose version of test 1. The important difference between
them is that test 3 contains a quantifier (C<\D*>) and so can use
backtracking, whereas test 1 will not. What's happening is
-that you've asked "Is it true that at the start of $x, following 0 or more
+that you've asked "Is it true that at the start of C<$x>, following 0 or more
non-digits, you have something that's not 123?" If the pattern matcher had
let C<\D*> expand to "ABC", this would have caused the whole pattern to
fail.
@@ -2271,7 +2291,7 @@ time. Now there's indeed something following "AB" that is not
"123". It's "C123", which suffices.
We can deal with this by using both an assertion and a negation.
-We'll say that the first part in $1 must be followed both by a digit
+We'll say that the first part in C<$1> must be followed both by a digit
and by something that's not "123". Remember that the look-aheads
are zero-width expressions--they only look, but don't consume any
of the string in their match. So rewriting this way produces what
@@ -2299,10 +2319,10 @@ take a painfully long time to run:
'aaaaaaaaaaaa' =~ /((a{0,5}){0,5})*[c]/
-And if you used C<*>'s in the internal groups instead of limiting them
+And if you used C<"*">'s in the internal groups instead of limiting them
to 0 through 5 matches, then it would take forever--or until you ran
out of stack space. Moreover, these internal optimizations are not
-always applicable. For example, if you put C<{0,5}> instead of C<*>
+always applicable. For example, if you put C<{0,5}> instead of C<"*">
on the external group, no current optimization is applicable, and the
match takes a long time to finish.
@@ -2324,8 +2344,8 @@ routines, here are the pattern-matching rules not described above.
Any single character matches itself, unless it is a I<metacharacter>
with a special meaning described here or above. You can cause
characters that normally function as metacharacters to be interpreted
-literally by prefixing them with a "\" (e.g., "\." matches a ".", not any
-character; "\\" matches a "\"). This escape mechanism is also required
+literally by prefixing them with a C<"\"> (e.g., C<"\."> matches a C<".">, not any
+character; "\\" matches a C<"\">). This escape mechanism is also required
for the character used as the pattern delimiter.
A series of characters matches that series of characters in the target
@@ -2334,19 +2354,19 @@ string.
You can specify a character class, by enclosing a list of characters
in C<[]>, which will match any character from the list. If the
-first character after the "[" is "^", the class matches any character not
-in the list. Within a list, the "-" character specifies a
+first character after the C<"["> is C<"^">, the class matches any character not
+in the list. Within a list, the C<"-"> character specifies a
range, so that C<a-z> represents all characters between "a" and "z",
-inclusive. If you want either "-" or "]" itself to be a member of a
-class, put it at the start of the list (possibly after a "^"), or
-escape it with a backslash. "-" is also taken literally when it is
-at the end of the list, just before the closing "]". (The
+inclusive. If you want either C<"-"> or C<"]"> itself to be a member of a
+class, put it at the start of the list (possibly after a C<"^">), or
+escape it with a backslash. C<"-"> is also taken literally when it is
+at the end of the list, just before the closing C<"]">. (The
following all specify the same class of three characters: C<[-az]>,
C<[az-]>, and C<[a\-z]>. All are different from C<[a-z]>, which
specifies a class containing twenty-six characters, even on EBCDIC-based
character sets.) Also, if you try to use the character
classes C<\w>, C<\W>, C<\s>, C<\S>, C<\d>, or C<\D> as endpoints of
-a range, the "-" is understood literally.
+a range, the C<"-"> is understood literally.
Note also that the whole range idea is rather unportable between
character sets, except for four situations that Perl handles specially.
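As an illustration of the hyphen rules described above, here is a minimal sketch. The helper name C<in_class> is hypothetical and exists only for this example; it reports whether every character of a string belongs to a given bracketed class.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $str consists solely of characters
# from the bracketed class $class (passed as a string like '[-az]').
sub in_class {
    my ($class, $str) = @_;
    return $str =~ /\A$class+\z/ ? 1 : 0;
}

# Three spellings of the same three-character class: "-", "a", "z".
for my $class ('[-az]', '[az-]', '[a\-z]') {
    printf "%-7s a-z:%d m:%d\n", $class,
        in_class($class, 'a-z'),    # hyphen is a member of the class
        in_class($class, 'm');      # "m" is not
}

# A real range: the lowercase letters "a" through "z".
printf "%-7s a-z:%d m:%d\n", '[a-z]',
    in_class('[a-z]', 'a-z'),       # "-" is not in the range
    in_class('[a-z]', 'm');         # "m" is
```

The first three classes accept the string "a-z" and reject "m"; the range C<[a-z]> does the opposite.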
@@ -2375,15 +2395,15 @@ used in C: "\n" matches a newline, "\t" a tab, "\r" a carriage return,
of three octal digits, matches the character whose coded character set value
is I<nnn>. Similarly, \xI<nn>, where I<nn> are hexadecimal digits,
matches the character whose ordinal is I<nn>. The expression \cI<x>
-matches the character control-I<x>. Finally, the "." metacharacter
+matches the character control-I<x>. Finally, the C<"."> metacharacter
matches any character except "\n" (unless you use C</s>).
-You can specify a series of alternatives for a pattern using "|" to
+You can specify a series of alternatives for a pattern using C<"|"> to
separate them, so that C<fee|fie|foe> will match any of "fee", "fie",
or "foe" in the target string (as would C<f(e|i|o)e>). The
first alternative includes everything from the last pattern delimiter
-("(", "(?:", etc. or the beginning of the pattern) up to the first "|", and
-the last alternative contains everything from the last "|" to the next
+(C<"(">, "(?:", etc. or the beginning of the pattern) up to the first C<"|">, and
+the last alternative contains everything from the last C<"|"> to the next
closing pattern delimiter. That's why it's common practice to include
alternatives in parentheses: to minimize confusion about where they
start and end.
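The effect of where an alternative starts and ends is easiest to see with anchors. The following sketch uses two illustrative variables, C<$loose> and C<$tight>, which are not part of any API. In the first pattern the C<"^"> belongs only to the first alternative and the C<"$"> only to the last; in the second, the grouping makes both anchors apply to every alternative.

```perl
use strict;
use warnings;

# Alternatives here are "^fee", "fie", and "foe$".
my $loose = qr/^fee|fie|foe$/;

# Grouping limits the alternation, so both anchors apply to each word.
my $tight = qr/^(?:fee|fie|foe)$/;

for my $word (qw(fee feeble fiery foe)) {
    printf "%-7s loose:%s tight:%s\n", $word,
        ($word =~ $loose ? 'yes' : 'no'),
        ($word =~ $tight ? 'yes' : 'no');
}
```

Here "feeble" and "fiery" match only the ungrouped pattern, because C<^fee> and C<fie> each succeed on their own without needing the rest of the string to fit.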
@@ -2396,7 +2416,7 @@ part will match, as that is the first alternative tried, and it successfully
matches the target string. (This might not seem important, but it is
important when you are capturing matched text using parentheses.)
-Also remember that "|" is interpreted as a literal within square brackets,
+Also remember that C<"|"> is interpreted as a literal within square brackets,
so if you write C<[fee|fie|foe]> you're really only matching C<[feio|]>.
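A short sketch of the difference, with variable names chosen only for this example:

```perl
use strict;
use warnings;

# Inside brackets this is a class of single characters: f, e, i, o and "|".
my $class = qr/\A[fee|fie|foe]\z/;

# Outside brackets the same text is a three-way alternation.
my $alt = qr/\A(?:fee|fie|foe)\z/;

for my $str ('fee', 'f', '|') {
    printf "%-4s class:%s alt:%s\n", $str,
        ($str =~ $class ? 'yes' : 'no'),
        ($str =~ $alt   ? 'yes' : 'no');
}
```

The class matches exactly one character, including a literal "|", whereas the alternation matches only the whole words.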
Within a pattern, you may designate subpatterns for later reference
@@ -2410,7 +2430,7 @@ match "0x1234 0x4321", but not "0x1234 01234", because subpattern
1 matched "0x", even though the rule C<0|0x> could potentially match
the leading 0 in the second number.
-=head2 Warning on \1 Instead of $1
+=head2 Warning on C<\1> Instead of C<$1>
Some people get too used to writing things like:
@@ -2451,7 +2471,7 @@ loops using regular expressions, with something as innocuous as:
The C<o?> matches at the beginning of C<'foo'>, and since the position
in the string is not moved by the match, C<o?> would match again and again
-because of the C<*> quantifier. Another common way to create a similar cycle
+because of the C<"*"> quantifier. Another common way to create a similar cycle
is with the looping modifier C<//g>:
@matches = ( 'foo' =~ m{ o? }xg );
@@ -2460,7 +2480,7 @@ or
print "match: <$&>\n" while 'foo' =~ m{ o? }xg;
-or the loop implied by split().
+or the loop implied by C<split()>.
However, long experience has shown that many programming tasks may
be significantly simplified by using repeated subexpressions that
@@ -2472,7 +2492,7 @@ may match zero-length substrings. Here's a simple example being:
Thus Perl allows such constructs, by I<forcefully breaking
the infinite loop>. The rules for this are different for lower-level
loops given by the greedy quantifiers C<*+{}>, and for higher-level
-ones like the C</g> modifier or split() operator.
+ones like the C</g> modifier or C<split()> operator.
The lower-level loops are I<interrupted> (that is, the loop is
broken) when Perl detects that a repeated expression matched a
@@ -2507,7 +2527,7 @@ prints
Notice that "hello" is only printed once, as when Perl sees that the sixth
iteration of the outermost C<(?:)*> matches a zero-length string, it stops
-the C<*>.
+the C<"*">.
The higher-level loops preserve an additional state between iterations:
whether the last match was zero-length. To break the loop, the following
@@ -2530,7 +2550,7 @@ Similarly, for repeated C<m/()/g> the second-best match is the match at the
position one notch further in the string.
The additional state of being I<matched with zero-length> is associated with
-the matched string, and is reset by each assignment to pos().
+the matched string, and is reset by each assignment to C<pos()>.
Zero-length matches at the end of the previous match are ignored
during C<split>.
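The higher-level rules can be observed directly. This sketch reuses the C<'foo'> example from above, so nothing beyond core Perl is assumed. It collects every match of C<o?> under C</g> and then splits the same string on an empty pattern.

```perl
use strict;
use warnings;

# List-context //g returns every match; the loop terminates because a
# zero-length match may not repeat at the position where one just occurred.
my @matches = ('foo' =~ m{ o? }xg);
print scalar(@matches), " matches: ",
      join(',', map { "<$_>" } @matches), "\n";

# split on a pattern that matches the empty string yields one field
# per character instead of looping forever.
my @chars = split //, 'foo';
print scalar(@chars), " fields: ", join(',', @chars), "\n";
```

The match list begins with an empty match before the "f" and ends with an empty match at the end of the string, with one match for each "o" in between.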
@@ -2681,7 +2701,7 @@ expressions, i.e., those without any runtime variable interpolations.
As documented in L<overload>, this conversion will work only over
literal parts of regular expressions. For C<\Y|$re\Y|> the variable
part of this regular expression needs to be converted explicitly
-(but only if the special meaning of C<\Y|> should be enabled inside $re):
+(but only if the special meaning of C<\Y|> should be enabled inside C<$re>):
use customre;
$re = <>;
@@ -2748,8 +2768,6 @@ Subroutine call to a named capture group. Equivalent to C<< (?&NAME) >>.
=head1 BUGS
-Many regular expression constructs don't work on EBCDIC platforms.
-
There are a number of issues with regard to case-insensitive matching
in Unicode rules. See C<i> under L</Modifiers> above.