Diffstat (limited to 'pod/perlre.pod')
-rw-r--r-- pod/perlre.pod 53
1 files changed, 38 insertions, 15 deletions
diff --git a/pod/perlre.pod b/pod/perlre.pod
index 373e1ca84e..68ce4b9bf7 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -34,6 +34,13 @@ line anywhere within the string,
Treat string as single line. That is, change "." to match any character
whatsoever, even a newline, which it normally would not match.
+The /s and /m modifiers both override the C<$*> setting. That is, no matter
+what C<$*> contains, /s (without /m) will force "^" to match only at the
+beginning of the string and "$" to match only at the end (or just before a
+newline at the end) of the string. Together, as /ms, they let the "." match
+any character whatsoever, while yet allowing "^" and "$" to match,
+respectively, just after and just before newlines within the string.
+
=item x
Extend your pattern's legibility by permitting whitespace and comments.
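
Note on the hunk above: the interaction of /s and /m that the added paragraph describes can be sketched as follows. This is a minimal illustration, not part of the patch; the sample string and variable names are made up for the example, and it does not set C<$*> at all.

```perl
use strict;
use warnings;

# Hypothetical sample text with an embedded newline and a trailing newline.
my $text = "first line\nsecond line\n";

# /s without /m: "^" matches only at the beginning of the string.
my @s_starts = $text =~ /^(\w+)/sg;

# /m: "^" also matches just after an embedded newline.
my @m_starts = $text =~ /^(\w+)/mg;

# /s without /m: "." crosses newlines and "$" anchors at the end of the string.
my ($s_dot) = $text =~ /^(.*)$/s;

# /ms together: "$" may match just before an embedded newline ...
my ($ms_line) = $text =~ /^(.*?)$/ms;

# ... while "." is still free to cross the newline.
my ($ms_span) = $text =~ /^(first.*second) line$/ms;

print "s starts: @s_starts\n";
print "m starts: @m_starts\n";
```

The non-greedy C<.*?> in the /ms case is only there to show that "$" can now succeed before the first newline; a greedy C<.*> would run to the end of the string instead.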
@@ -139,7 +146,7 @@ also work:
\Q quote (disable) regexp metacharacters till \E
If C<use locale> is in effect, the case map used by C<\l>, C<\L>, C<\u>
-and <\U> is taken from the current locale. See L<perllocale>.
+and C<\U> is taken from the current locale. See L<perllocale>.
In addition, Perl defines the following:
@@ -238,7 +245,7 @@ non-alphanumeric characters:
$pattern =~ s/(\W)/\\$1/g;
Now it is much more common to see either the quotemeta() function or
-the \Q escape sequence used to disable the metacharacters special
+the C<\Q> escape sequence used to disable all metacharacters' special
meanings like this:
/$unquoted\Q$quoted\E$unquoted/
@@ -278,16 +285,17 @@ matches a word followed by a tab, without including the tab in C<$&>.
A zero-width negative lookahead assertion. For example C</foo(?!bar)/>
matches any occurrence of "foo" that isn't followed by "bar". Note
however that lookahead and lookbehind are NOT the same thing. You cannot
-use this for lookbehind: C</(?!foo)bar/> will not find an occurrence of
-"bar" that is preceded by something which is not "foo". That's because
-the C<(?!foo)> is just saying that the next thing cannot be "foo"--and
-it's not, it's a "bar", so "foobar" will match. You would have to do
-something like C</(?!foo)...bar/> for that. We say "like" because there's
-the case of your "bar" not having three characters before it. You could
-cover that this way: C</(?:(?!foo)...|^..?)bar/>. Sometimes it's still
-easier just to say:
+use this for lookbehind.
+
+If you are looking for a "bar" which isn't preceded by a "foo", C</(?!foo)bar/>
+will not do what you want. That's because the C<(?!foo)> is just saying that
+the next thing cannot be "foo"--and it's not, it's a "bar", so "foobar" will
+match. You would have to do something like C</(?!foo)...bar/> for that. We
+say "like" because there's the case of your "bar" not having three characters
+before it. You could cover that this way: C</(?:(?!foo)...|^.{0,2})bar/>.
+Sometimes it's still easier just to say:
- if (/foo/ && $` =~ /bar$/)
+ if (/bar/ && $` !~ /foo$/)
For lookbehind see below.
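
Note on the hunk above: the three approaches in the rewritten paragraph can be compared side by side. This is a rough sketch under assumed inputs; the word list and variable names are hypothetical and chosen only to exercise the short-prefix cases.

```perl
use strict;
use warnings;

# Hypothetical test words: only "foobar" has "bar" preceded by "foo".
my @words = ("foobar", "bazbar", "bar", "xbar");

# Lookahead in the wrong place: (?!foo) is checked where "bar" starts,
# so it always succeeds there and "foobar" is still accepted.
my @naive = grep { /(?!foo)bar/ } @words;

# The padded form from the text: three non-"foo" characters before "bar",
# or at most two characters between the start of the string and "bar".
my @padded = grep { /(?:(?!foo)...|^.{0,2})bar/ } @words;

# The two-step test from the text: find "bar", then inspect the prematch.
# Only the first "bar" in each string is examined this way.
my @two_step = grep { /bar/ && $` !~ /foo$/ } @words;

print "naive:    @naive\n";
print "padded:   @padded\n";
print "two-step: @two_step\n";
```

On these inputs the padded pattern and the two-step test agree; they are not claimed to be equivalent for every string, for example one that contains "bar" more than once.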
@@ -387,7 +395,7 @@ Say,
matches a chunk of non-parentheses, possibly included in parentheses
themselves.
-=item C<(?imsx)>
+=item C<(?imstx)>
One or more embedded pattern-match modifiers. This is particularly
useful for patterns that are specified in a table somewhere, some of
@@ -653,9 +661,18 @@ first alternative includes everything from the last pattern delimiter
the last alternative contains everything from the last "|" to the next
pattern delimiter. For this reason, it's common practice to include
alternatives in parentheses, to minimize confusion about where they
-start and end. Note however that "|" is interpreted as a literal with
-square brackets, so if you write C<[fee|fie|foe]> you're really only
-matching C<[feio|]>.
+start and end.
+
+Note that alternatives are tried from left to right, so the first
+alternative found for which the entire expression matches is the one that
+is chosen. This means that alternatives are not necessarily greedy. For
+example: when matching C<foo|foot> against "barefoot", only the "foo"
+part will match, as that is the first alternative tried, and it successfully
+matches the target string. (This might not seem important, but it is
+important when you are capturing matched text using parentheses.)
+
+Also note that "|" is interpreted as a literal within square brackets,
+so if you write C<[fee|fie|foe]> you're really only matching C<[feio|]>.
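
Note on the two added paragraphs: the left-to-right choice of alternatives, and the literal "|" inside square brackets, can be checked with a short sketch. The variable names and the extra anchored pattern are illustrative assumptions, not part of the patch.

```perl
use strict;
use warnings;

my $target = "barefoot";

# "foo" is listed first and lets the whole pattern succeed, so it is chosen.
my ($first) = $target =~ /(foo|foot)/;

# Listing the longer alternative first changes what is captured.
my ($longer) = $target =~ /(foot|foo)/;

# With a trailing "$", "foo" alone cannot complete the match,
# so the engine falls back to the second alternative.
my ($anchored) = $target =~ /(foo|foot)$/;

# Inside a character class "|" is just another character to match.
my @class_hits = "a|b" =~ /[fee|fie|foe]/g;

print "first=$first longer=$longer anchored=$anchored\n";
print "class hits: @class_hits\n";
```

The anchored case is the reason the text says "for which the entire expression matches": an earlier alternative wins only if the rest of the pattern can also succeed after it.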
Within a pattern, you may designate subpatterns for later reference by
enclosing them in parentheses, and you may refer back to the I<n>th
@@ -695,4 +712,10 @@ different things on the I<left> side of the C<s///>.
=head2 SEE ALSO
+L<perlop/"Regexp Quote-Like Operators">.
+
+L<perlfunc/pos>.
+
+L<perllocale>.
+
"Mastering Regular Expressions" (see L<perlbook>) by Jeffrey Friedl.