author | Yves Orton <demerphq@gmail.com> | 2007-01-10 21:33:39 +0100 |
---|---|---|
committer | Rafael Garcia-Suarez <rgarciasuarez@gmail.com> | 2007-01-11 14:47:01 +0000 |
commit | ee9b8eaedac8053a01cc9281ada34dd182a8f7d0 | |
tree | 129df8e187e17fb664051b0f7f9a1b55d46fcecd /pod | |
parent | b4390064818aaae08b8f53f740ea62f7dd8517a1 | |
Add Regexp::Keep \K functionality to regex engine as well as add \v and \V, cleanup and more docs for regatom()
Message-ID: <9b18b3110701101133i46dc5fd0p1476a0f1dd1e9c5a@mail.gmail.com>
(plus POD nits by Merijn and myself)
p4raw-id: //depot/perl@29756
Diffstat (limited to 'pod')
-rw-r--r-- | pod/perl595delta.pod | 19 |
-rw-r--r-- | pod/perlre.pod | 49 |
2 files changed, 60 insertions, 8 deletions
diff --git a/pod/perl595delta.pod b/pod/perl595delta.pod
index 0497d55781..47fbaf51a1 100644
--- a/pod/perl595delta.pod
+++ b/pod/perl595delta.pod
@@ -107,9 +107,9 @@ would expect. This is considered a feature. :-) (Yves Orton)
 
 =item Possessive Quantifiers
 
-Perl now supports the "possessive quantifier" syntax of the "atomic match" 
+Perl now supports the "possessive quantifier" syntax of the "atomic match"
 pattern. Basically a possessive quantifier matches as much as it can and never
-gives any back. Thus it can be used to control backtracking. The syntax is 
+gives any back. Thus it can be used to control backtracking. The syntax is
 similar to non-greedy matching, except instead of using a '?' as the modifier
 the '+' is used. Thus C<?+>, C<*+>, C<++>, C<{min,max}+> are now legal
 quantifiers. (Yves Orton)
@@ -129,6 +129,21 @@ that contain backreferences. (Yves Orton)
 
 =back
 
+=item Regexp::Keep internalized
+
+The functionality of Jeff Pinyan's module Regexp::Keep has been added to
+the core. You can now use in regular expressions the special escape C<\K>
+as a way to do something like floating length positive lookbehind. It is
+also useful in substitutions like:
+
+    s/(foo)bar/$1/g
+
+that can now be converted to
+
+    s/foo\Kbar//g
+
+which is much more efficient.
+
 =head2 The C<_> prototype
 
 A new prototype character has been added. C<_> is equivalent to C<$> (it
diff --git a/pod/perlre.pod b/pod/perlre.pod
index 6c2049628c..7133a02c96 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -255,6 +255,9 @@ X<word> X<whitespace>
     \N{name} Named unicode character, or unicode escape
     \x12     Hexadecimal escape sequence
     \x{1234} Long hexadecimal escape sequence
+    \K       Keep the stuff left of the \K, don't include it in $&
+    \v       Shortcut for (*PRUNE)
+    \V       Shortcut for (*SKIP)
 
 A C<\w> matches a single alphanumeric character (an alphabetic
 character, or a decimal digit) or C<_>, not a whole word. Use C<\w+>
@@ -690,6 +693,17 @@ is equivalent to the more verbose
 
     /(?:(?s-i)more.*than).*million/i
 
+=item Look-Around Assertions
+X<look-around assertion> X<lookaround assertion> X<look-around> X<lookaround>
+
+Look-around assertions are zero width patterns which match a specific
+pattern without including it in C<$&>. Positive assertions match when
+their subpattern matches, negative assertions match when their subpattern
+fails. Look-behind matches text up to the current match position,
+look-ahead matches text following the current match position.
+
+=over 4
+
 =item C<(?=pattern)>
 X<(?=)> X<look-ahead, positive> X<lookahead, positive>
 
@@ -716,13 +730,30 @@ Sometimes it's still easier just to say:
 
 For look-behind see below.
 
-=item C<(?<=pattern)>
-X<(?<=)> X<look-behind, positive> X<lookbehind, positive>
+=item C<(?<=pattern)> C<\K>
+X<(?<=)> X<look-behind, positive> X<lookbehind, positive> X<\K>
 
 A zero-width positive look-behind assertion. For example, C</(?<=\t)\w+/>
 matches a word that follows a tab, without including the tab in C<$&>.
 Works only for fixed-width look-behind.
 
+There is a special form of this construct, called C<\K>, which causes the
+regex engine to "keep" everything it had matched prior to the C<\K> and
+not include it in C<$&>. This effectively provides variable length
+look-behind. The use of C<\K> inside of another look-around assertion
+is allowed, but the behaviour is currently not well defined.
+
+For various reasons C<\K> may be signifigantly more efficient than the
+equivalent C<< (?<=...) >> construct, and it is especially useful in
+situations where you want to efficiently remove something following
+something else in a string. For instance
+
+    s/(foo)bar/$1/g;
+
+can be rewritten as the much more efficient
+
+    s/foo\Kbar//g;
+
 =item C<(?<!pattern)>
 X<(?<!)> X<look-behind, negative> X<lookbehind, negative>
 
@@ -730,6 +761,8 @@ A zero-width negative look-behind assertion.
 For example C</(?<!bar)foo/> matches any occurrence of "foo" that does
 not follow "bar". Works only for fixed-width look-behind.
 
+=back
+
 =item C<(?'NAME'pattern)>
 
 =item C<< (?<NAME>pattern) >>
@@ -761,7 +794,7 @@ its Unicode extension (see L<utf8>),
 though it isn't extended by the locale (see L<perllocale>).
 
 B<NOTE:> In order to make things easier for programmers with experience
-with the Python or PCRE regex engines the pattern C<< (?P<NAME>pattern) >>
+with the Python or PCRE regex engines the pattern C<< (?PE<lt>NAMEE<gt>pattern) >>
 maybe be used instead of C<< (?<NAME>pattern) >>; however this form does not
 support the use of single quotes as a delimiter for the name. This is
 only available in Perl 5.10 or later.
@@ -1251,7 +1284,7 @@ argument, then C<$REGERROR> and C<$REGMARK> are not touched at all.
 =over 4
 
 =item C<(*PRUNE)> C<(*PRUNE:NAME)>
-X<(*PRUNE)> X<(*PRUNE:NAME)>
+X<(*PRUNE)> X<(*PRUNE:NAME)> X<\v>
 
 This zero-width pattern prunes the backtracking tree at the current point
 when backtracked into on failure. Consider the pattern C<A (*PRUNE) B>,
@@ -1261,6 +1294,8 @@ continues in B, which may also backtrack as necessary; however, should B
 not match, then no further backtracking will take place, and the pattern
 will fail outright at the current starting position.
 
+As a shortcut, X<\v> is exactly equivalent to C<(*PRUNE)>.
+
 The following example counts all the possible matching strings in a
 pattern (without actually matching any of them).
 
@@ -1312,6 +1347,8 @@ of this pattern. This effectively means that the regex engine "skips" forward
 to this position on failure and tries to match again, (assuming that
 there is sufficient room to match).
 
+As a shortcut X<\V> is exactly equivalent to C<(*SKIP)>.
+
 The name of the C<(*SKIP:NAME)> pattern has special significance. If a
 C<(*MARK:NAME)> was encountered while matching, then it is that position
 which is used as the "skip point". If no C<(*MARK)> of that name was
@@ -2008,7 +2045,7 @@ Perl specific syntax, the following are legal in Perl 5.10:
 
 =over 4
 
-=item C<< (?P<NAME>pattern) >>
+=item C<< (?PE<lt>NAMEE<gt>pattern) >>
 
 Define a named capture buffer. Equivalent to C<< (?<NAME>pattern) >>.
 
@@ -2020,7 +2057,7 @@ Backreference to a named capture buffer. Equivalent to C<< \g{NAME} >>.
 
 Subroutine call to a named capture buffer. Equivalent to C<< (?&NAME) >>.
 
-=back 4
+=back
 
 =head1 BUGS
 
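What "matches as much as it can and never gives any back" means in practice can be seen by comparing a greedy, a possessive and an atomic-group version of the same pattern. This is a minimal sketch assuming perl 5.10 or later; the string "aaa" is an arbitrary example, and the atomic group (?>...) is included only as the equivalent older spelling.

```perl
use strict;
use warnings;

# Greedy a+ first takes all three "a"s, then gives one back so the
# trailing literal "a" can match.
my $greedy = "aaa" =~ /^a+a$/ ? 1 : 0;

# Possessive a++ keeps every "a" it took, so the trailing "a" never matches.
my $possessive = "aaa" =~ /^a++a$/ ? 1 : 0;

# The atomic group (?>a+) behaves the same way as a++.
my $atomic = "aaa" =~ /^(?>a+)a$/ ? 1 : 0;

print "greedy=$greedy possessive=$possessive atomic=$atomic\n";
```

The practical benefit is failing fast: when a possessively quantified run cannot lead to an overall match, the engine does not spend time retrying shorter runs.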
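The substitution equivalence given in both the delta entry and perlre can be checked directly. This is a minimal sketch assuming perl 5.10 or later, where \K is available in core; the input string is invented for illustration, and the efficiency claim is not measured here.

```perl
use strict;
use warnings;

# Hypothetical input: some "foo"s are followed by "bar", one is not.
my $orig = "foobar foobaz foobar";

# Capture-and-restore form: "foo" is consumed, then put back via $1.
(my $with_capture = $orig) =~ s/(foo)bar/$1/g;

# \K form: "foo" must still match, but it is kept out of $&,
# so only "bar" is replaced (by the empty string).
(my $with_keep = $orig) =~ s/foo\Kbar//g;

print "$with_capture\n$with_keep\n";
```

Both variables end up as "foo foobaz foo"; the \K form avoids copying "foo" into $1 and writing it back.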
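The four forms grouped under the new "Look-Around Assertions" heading can be exercised side by side. This is a sketch assuming perl 5.10 or later; the sample strings are invented, except that the tab case mirrors the /(?<=\t)\w+/ example from perlre.

```perl
use strict;
use warnings;

# Positive look-ahead (?=...): a word followed by a colon; the colon is not consumed.
my ($key) = "name: perl" =~ /(\w+)(?=:)/;

# Negative look-ahead (?!...): "foo" not followed by "bar".
my @ahead = grep { /foo(?!bar)/ } qw(foobar foobaz);

# Positive look-behind (?<=...): a word after a tab; the tab is not part of $&.
my $after_tab = "col1\tcol2" =~ /(?<=\t)\w+/ ? $& : undef;

# Negative look-behind (?<!...): "foo" not preceded by "bar".
my @behind = grep { /(?<!bar)foo/ } qw(barfoo bazfoo);

print "$key | @ahead | $after_tab | @behind\n";
```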
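The claim that \K "effectively provides variable length look-behind" matters most when the prefix has no fixed width, such as a key of arbitrary length before an equals sign. A plain look-behind like (?<=\w+=) is not accepted for this, because look-behind works only for fixed widths (newer perls accept only bounded-length look-behind, and \w+ is unbounded). This is a minimal sketch assuming perl 5.10 or later; the key=value input format is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical key=value input; keys have no fixed length.
my $line = "timeout=30 retries=5";

# The key and "=" must match, but everything before \K is left out of $&,
# so in list context each /g match returns just the digits.
my @values = $line =~ /\w+=\K\d+/g;

# A scalar match shows the same effect through $& and @-:
# $-[0] is the offset where the kept-out prefix ends.
my ($value, $start);
if ($line =~ /retries=\K\d+/) {
    ($value, $start) = ($&, $-[0]);
}

print "@values | $value at $start\n";
```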
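The Python/PCRE spellings referenced in perlre can be mixed with the native ones. This is a sketch assuming perl 5.10 or later; the date string and the group names year, month and pair are invented for illustration. Named groups end up in %+ whichever syntax declared them.

```perl
use strict;
use warnings;

my ($year, $month);

# (?P<NAME>pattern) is the Python/PCRE spelling of (?<NAME>pattern);
# both styles can appear in the same pattern.
if ("2007-01-11" =~ /^(?P<year>\d{4})-(?<month>\d\d)-\d\d$/) {
    ($year, $month) = ($+{year}, $+{month});
}

# (?P=NAME) is the Python-style spelling of the backreference \g{NAME}.
my $doubled = "abab" =~ /^(?P<pair>ab)(?P=pair)$/ ? 1 : 0;

print "$year $month $doubled\n";
```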
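The counting examples in the (*PRUNE) and (*SKIP) entries can be reproduced with a (?{ }) code block and (*FAIL). The \v and \V shortcuts added in this commit did not survive into the released perl 5.10.0, where \v and \V instead match vertical whitespace and its complement, so this sketch spells the verbs out in full. It assumes perl 5.10 or later, and the expected counts follow the examples documented in perlre.

```perl
use strict;
use warnings;

# Package variable so the (?{ }) code blocks can update it under strict.
our $count;

# Baseline: (*FAIL) forces a backtrack after every candidate, so the engine
# enumerates every way a+b? can match at every start position.
$count = 0;
'aaab' =~ /a+b?(?{ $count++ })(*FAIL)/;
my $all = $count;       # aaab aaa aa a aab aa a ab a

# (*PRUNE): backtracking into it abandons the current start position,
# so each start position contributes only its first match.
$count = 0;
'aaab' =~ /a+b?(*PRUNE)(?{ $count++ })(*FAIL)/;
my $pruned = $count;    # aaab aab ab

# (*SKIP): on backtrack, the next attempt starts where (*SKIP) was reached,
# so text already consumed by a+b? is never retried.
$count = 0;
'aaabaaab' =~ /a+b?(*SKIP)(?{ $count++ })(*FAIL)/;
my $skipped = $count;   # aaab aaab

print "all=$all pruned=$pruned skipped=$skipped\n";
```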