 pod/perlfaq6.pod | 15 ++++++++++-----
 pod/perlop.pod   | 18 ++++++++++++++----
 2 files changed, 24 insertions(+), 9 deletions(-)
diff --git a/pod/perlfaq6.pod b/pod/perlfaq6.pod
index 29136abd96..4ab4d4cc98 100644
--- a/pod/perlfaq6.pod
+++ b/pod/perlfaq6.pod
@@ -527,11 +527,16 @@ variable is no longer "expensive" the way the other two are.
 
 =head2 What good is C<\G> in a regular expression?
 
-The notation C<\G> is used in a match or substitution in conjunction the
-C</g> modifier (and ignored if there's no C</g>) to anchor the regular
-expression to the point just past where the last match occurred, i.e. the
-pos() point. A failed match resets the position of C<\G> unless the
-C</c> modifier is in effect.
+The notation C<\G> is used in a match or substitution in conjunction with
+the C</g> modifier to anchor the regular expression to the point just past
+where the last match occurred, i.e. the pos() point. A failed match resets
+the position of C<\G> unless the C</c> modifier is in effect. C<\G> can be
+used in a match without the C</g> modifier; it acts the same (i.e. still
+anchors at the pos() point) but of course only matches once and does not
+update pos(), as non-C</g> expressions never do. C<\G> in an expression
+applied to a target string that has never been matched against a C</g>
+expression before or has had its pos() reset is functionally equivalent to
+C<\A>, which matches at the beginning of the string.
 
 For example, suppose you had a line of text quoted in standard mail
 and Usenet notation, (that is, with leading C<< > >> characters), and
diff --git a/pod/perlop.pod b/pod/perlop.pod
index b317bdec9c..945d4f3c5f 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -851,9 +851,11 @@ string also resets the search position.
 
 You can intermix C<m//g> matches with C<m/\G.../g>, where C<\G> is a
 zero-width assertion that matches the exact position where the previous
-C<m//g>, if any, left off. The C<\G> assertion is not supported without
-the C</g> modifier. (Currently, without C</g>, C<\G> behaves just like
-C<\A>, but that's accidental and may change in the future.)
+C<m//g>, if any, left off. Without the C</g> modifier, the C<\G> assertion
+still anchors at pos(), but the match is of course only attempted once.
+Using C<\G> without C</g> on a target string that has not previously had a
+C</g> match applied to it is the same as using the C<\A> assertion to match
+the beginning of the string.
 
 Examples:
 
@@ -861,7 +863,7 @@ Examples:
     ($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);
 
     # scalar context
-    $/ = ""; $* = 1; # $* deprecated in modern perls
+    $/ = "";
     while (defined($paragraph = <>)) {
         while ($paragraph =~ /[a-z]['")]*[.!?]+['")]*\s/g) {
             $sentences++;
@@ -879,6 +881,7 @@ Examples:
         print "3: '";
         print $1 while /(p)/gc; print "', pos=", pos, "\n";
     }
+    print "Final: '$1', pos=",pos,"\n" if /\G(.)/;
 
 The last example should print:
 
@@ -888,6 +891,13 @@ The last example should print:
     1: '', pos=7
     2: 'q', pos=8
     3: '', pos=8
+    Final: 'q', pos=8
+
+Notice that the final match matched C<q> instead of C<p>, which a match
+without the C<\G> anchor would have done. Also note that the final match
+did not update C<pos> -- C<pos> is only updated on a C</g> match. If the
+final match did indeed match C<p>, it's a good bet that you're running an
+older (pre-5.6.0) Perl.
 
 A useful idiom for C<lex>-like scanners is C</\G.../gc>. You can combine several
 regexps like this to process a string part-by-part,
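The behaviour the new text documents can be exercised with a short Perl script. This is a sketch assuming Perl 5.6.0 or later (the patch itself notes that older perls behave differently); the sample string is borrowed from perlop's example, and the variable names are illustrative only. It shows C<\G> without C</g> acting like C<\A> on a fresh string, anchoring at pos() after a C</gc> match without moving pos(), and acting like C<\A> again once pos() is reset.

```perl
use strict;
use warnings;

# Sample string borrowed from perlop's example; names below are illustrative.
my $s = "ppooqppqq";

# pos($s) is undef here, so \G without /g anchors at offset 0, like \A.
my ($at_start) = $s =~ /\G(.)/;
print "fresh string: '$at_start', pos=",
    (defined pos($s) ? pos($s) : 'undef'), "\n";

# A scalar-context /gc match advances pos() past "oo", to offset 4.
$s =~ /oo/gc or die "expected to find 'oo'";

# \G without /g now anchors at offset 4 and matches 'q', but leaves pos() alone.
my ($at_pos) = $s =~ /\G(.)/;
print "after /gc: '$at_pos', pos=", pos($s), "\n";

# Without \G, a non-/g match ignores pos() and finds the first 'p' at offset 0.
my ($unanchored) = $s =~ /(p)/;
print "no \\G: '$unanchored', pos=", pos($s), "\n";

# Resetting pos() makes \G equivalent to \A again.
pos($s) = undef;
my ($after_reset) = $s =~ /\G(.)/;
print "after reset: '$after_reset'\n";
```

On a modern perl this should print `fresh string: 'p', pos=undef`, `after /gc: 'q', pos=4`, `no \G: 'p', pos=4` and `after reset: 'p'`. The C</gc> match is deliberately run in scalar context, since that is the iterative form that advances pos(); the non-C</g> matches use list context only so the capture is returned directly.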