author Yves Orton <demerphq@gmail.com> 2007-01-10 21:33:39 +0100
committer Rafael Garcia-Suarez <rgarciasuarez@gmail.com> 2007-01-11 14:47:01 +0000
commit ee9b8eaedac8053a01cc9281ada34dd182a8f7d0
tree 129df8e187e17fb664051b0f7f9a1b55d46fcecd /pod
parent b4390064818aaae08b8f53f740ea62f7dd8517a1
Add Regexp::Keep \K functionality to regex engine as well as add \v and \V, cleanup and more docs for regatom()
Message-ID: <9b18b3110701101133i46dc5fd0p1476a0f1dd1e9c5a@mail.gmail.com>
(plus POD nits by Merijn and myself)

p4raw-id: //depot/perl@29756
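A minimal sketch of the substitution rewrite described in the perl595delta hunk, assuming perl 5.10.0 or later (the first stable release that includes \K). The sample string is made up for illustration. Both idioms should produce the same result; the \K form simply no longer needs a capture buffer to copy "foo" back into the replacement.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;    # assumption: \K needs perl 5.10.0 or later

# Hypothetical input string, chosen only for illustration.
my $text = "foobar foobaz foobar";

# Old idiom: capture the prefix and write it back in the replacement.
(my $captured = $text) =~ s/(foo)bar/$1/g;

# New idiom: \K keeps "foo" out of $&, so only "bar" is replaced.
(my $kept = $text) =~ s/foo\Kbar//g;

print "$captured\n";    # foo foobaz foo
print "$kept\n";        # foo foobaz foo
```

Note that "foobaz" is left alone by both forms, because neither pattern matches unless "bar" directly follows "foo".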
Diffstat (limited to 'pod')
-rw-r--r-- pod/perl595delta.pod 19
-rw-r--r-- pod/perlre.pod 49
2 files changed, 60 insertions, 8 deletions
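The perlre hunk says \K "effectively provides variable length look-behind". The sketch here, again assuming perl 5.10.0 or later and using made-up sample words, contrasts the prefix a+\K with the look-behind (?<=a+). The \K pattern matches and leaves only "b" in $& (with $-[0] pointing just past the kept prefix), while the look-behind form is rejected when the pattern is compiled because its width is unbounded.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;    # assumption: \K needs perl 5.10.0 or later

# Hypothetical sample words; the prefix "a+" has no fixed width.
my %start;    # word => offset at which $& begins
for my $word (qw(aaab ab xb)) {
    if ($word =~ /a+\Kb/) {
        # $& holds only the text matched after \K; $-[0] is its offset.
        $start{$word} = $-[0];
        printf "%s: \$& is '%s', starting at offset %d\n", $word, $&, $-[0];
    }
    else {
        print "$word: no match\n";
    }
}

# An unbounded look-behind is a fatal error when the pattern is compiled.
# The pattern is interpolated so that compilation happens at run time,
# where eval can catch the error.
my $lookbehind = '(?<=a+)b';
my $compiled   = eval { my $re = qr/$lookbehind/; 1 };
print $compiled ? "look-behind compiled\n" : "look-behind rejected: $@";
```

Run under perl 5.10 or later, "aaab" reports offset 3, "ab" reports offset 1, and "xb" does not match.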
diff --git a/pod/perl595delta.pod b/pod/perl595delta.pod
index 0497d55781..47fbaf51a1 100644
--- a/pod/perl595delta.pod
+++ b/pod/perl595delta.pod
@@ -107,9 +107,9 @@ would expect. This is considered a feature. :-) (Yves Orton)
=item Possessive Quantifiers
-Perl now supports the "possessive quantifier" syntax of the "atomic match"
+Perl now supports the "possessive quantifier" syntax of the "atomic match"
pattern. Basically a possessive quantifier matches as much as it can and never
-gives any back. Thus it can be used to control backtracking. The syntax is
+gives any back. Thus it can be used to control backtracking. The syntax is
similar to non-greedy matching, except instead of using a '?' as the modifier
the '+' is used. Thus C<?+>, C<*+>, C<++>, C<{min,max}+> are now legal
quantifiers. (Yves Orton)
@@ -129,6 +129,21 @@ that contain backreferences. (Yves Orton)
=back
+=item Regexp::Keep internalized
+
+The functionality of Jeff Pinyan's module Regexp::Keep has been added to
+the core. You can now use in regular expressions the special escape C<\K>
+as a way to do something like floating length positive lookbehind. It is
+also useful in substitutions like:
+
+ s/(foo)bar/$1/g
+
+that can now be converted to
+
+ s/foo\Kbar//g
+
+which is much more efficient.
+
=head2 The C<_> prototype
A new prototype character has been added. C<_> is equivalent to C<$> (it
diff --git a/pod/perlre.pod b/pod/perlre.pod
index 6c2049628c..7133a02c96 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -255,6 +255,9 @@ X<word> X<whitespace>
\N{name} Named unicode character, or unicode escape
\x12 Hexadecimal escape sequence
\x{1234} Long hexadecimal escape sequence
+ \K Keep the stuff left of the \K, don't include it in $&
+ \v Shortcut for (*PRUNE)
+ \V Shortcut for (*SKIP)
A C<\w> matches a single alphanumeric character (an alphabetic
character, or a decimal digit) or C<_>, not a whole word. Use C<\w+>
@@ -690,6 +693,17 @@ is equivalent to the more verbose
/(?:(?s-i)more.*than).*million/i
+=item Look-Around Assertions
+X<look-around assertion> X<lookaround assertion> X<look-around> X<lookaround>
+
+Look-around assertions are zero width patterns which match a specific
+pattern without including it in C<$&>. Positive assertions match when
+their subpattern matches, negative assertions match when their subpattern
+fails. Look-behind matches text up to the current match position,
+look-ahead matches text following the current match position.
+
+=over 4
+
=item C<(?=pattern)>
X<(?=)> X<look-ahead, positive> X<lookahead, positive>
@@ -716,13 +730,30 @@ Sometimes it's still easier just to say:
For look-behind see below.
-=item C<(?<=pattern)>
-X<(?<=)> X<look-behind, positive> X<lookbehind, positive>
+=item C<(?<=pattern)> C<\K>
+X<(?<=)> X<look-behind, positive> X<lookbehind, positive> X<\K>
A zero-width positive look-behind assertion. For example, C</(?<=\t)\w+/>
matches a word that follows a tab, without including the tab in C<$&>.
Works only for fixed-width look-behind.
+There is a special form of this construct, called C<\K>, which causes the
+regex engine to "keep" everything it had matched prior to the C<\K> and
+not include it in C<$&>. This effectively provides variable length
+look-behind. The use of C<\K> inside of another look-around assertion
+is allowed, but the behaviour is currently not well defined.
+
+For various reasons C<\K> may be significantly more efficient than the
+equivalent C<< (?<=...) >> construct, and it is especially useful in
+situations where you want to efficiently remove something following
+something else in a string. For instance
+
+ s/(foo)bar/$1/g;
+
+can be rewritten as the much more efficient
+
+ s/foo\Kbar//g;
+
=item C<(?<!pattern)>
X<(?<!)> X<look-behind, negative> X<lookbehind, negative>
@@ -730,6 +761,8 @@ A zero-width negative look-behind assertion. For example C</(?<!bar)foo/>
matches any occurrence of "foo" that does not follow "bar". Works
only for fixed-width look-behind.
+=back
+
=item C<(?'NAME'pattern)>
=item C<< (?<NAME>pattern) >>
@@ -761,7 +794,7 @@ its Unicode extension (see L<utf8>),
though it isn't extended by the locale (see L<perllocale>).
B<NOTE:> In order to make things easier for programmers with experience
-with the Python or PCRE regex engines the pattern C<< (?P<NAME>pattern) >>
+with the Python or PCRE regex engines the pattern C<< (?PE<lt>NAMEE<gt>pattern) >>
may be used instead of C<< (?<NAME>pattern) >>; however this form does not
support the use of single quotes as a delimiter for the name. This is
only available in Perl 5.10 or later.
@@ -1251,7 +1284,7 @@ argument, then C<$REGERROR> and C<$REGMARK> are not touched at all.
=over 4
=item C<(*PRUNE)> C<(*PRUNE:NAME)>
-X<(*PRUNE)> X<(*PRUNE:NAME)>
+X<(*PRUNE)> X<(*PRUNE:NAME)> X<\v>
This zero-width pattern prunes the backtracking tree at the current point
when backtracked into on failure. Consider the pattern C<A (*PRUNE) B>,
@@ -1261,6 +1294,8 @@ continues in B, which may also backtrack as necessary; however, should B
not match, then no further backtracking will take place, and the pattern
will fail outright at the current starting position.
+As a shortcut, C<\v> is exactly equivalent to C<(*PRUNE)>.
+
The following example counts all the possible matching strings in a
pattern (without actually matching any of them).
@@ -1312,6 +1347,8 @@ of this pattern. This effectively means that the regex engine "skips" forward
to this position on failure and tries to match again, (assuming that
there is sufficient room to match).
+As a shortcut, C<\V> is exactly equivalent to C<(*SKIP)>.
+
The name of the C<(*SKIP:NAME)> pattern has special significance. If a
C<(*MARK:NAME)> was encountered while matching, then it is that position
which is used as the "skip point". If no C<(*MARK)> of that name was
@@ -2008,7 +2045,7 @@ Perl specific syntax, the following are legal in Perl 5.10:
=over 4
-=item C<< (?P<NAME>pattern) >>
+=item C<< (?PE<lt>NAMEE<gt>pattern) >>
Define a named capture buffer. Equivalent to C<< (?<NAME>pattern) >>.
@@ -2020,7 +2057,7 @@ Backreference to a named capture buffer. Equivalent to C<< \g{NAME} >>.
Subroutine call to a named capture buffer. Equivalent to C<< (?&NAME) >>.
-=back 4
+=back
=head1 BUGS