author    Gurusamy Sarathy <gsar@engin.umich.edu>  1997-01-13 15:13:12 -0500
committer Chip Salzenberg <chip@atlantic.net>  1997-01-16 07:24:00 +1200
commit    b2a07c1c241ec86f010fc0ea3bfa54c8ec28be90 (patch)
tree      1ea5611946258421a00cdb1d497d22d833919f53
parent    7c36043de26da560a0f7eb04f36d232762c0092c (diff)
Document use of pos() and /\G/
Subject: Re: resetting pos broken in _20

On Mon, 13 Jan 1997 12:49:24 EST, Ilya Zakharevich wrote:
>Gurusamy Sarathy writes:
>> What's wrong with saying
>> C<pos $foo = length $foo> after /g fails, to get the behavior
>> you want?
>
>Since this has different semantics. You need to get `pos' before each
>match, and reset it after each failing match.
>
>  /=/g; /;/g; /=/g; /;/g;
>
>may give you non-monotoneous movement of `pos' over the string, which
>is a bad thing.

Ahh, of course.

>But I still do not understand what you mean by "having pos at
>end". The bug was that position is reset at failing match, probably
>you have some other case in mind?

Never mind, I was missing the possibility of chaining //g matches with
the \G escape :-(

>I did not realize that pos was available at perl 4.?, bug-for-bug
>compatibility may be a reason if this was so for so many years...

The bug fix seems to make a lot sense (to me) now. \G was essentially
useless without the new "incompatiblity", eh?

Here's a pod update that documents current behavior in all the places
I could think of.

 - Sarathy.
   gsar@engin.umich.edu

p5p-msgid: <199701132013.PAA26606@aatma.engin.umich.edu>
-rw-r--r-- pod/perlfunc.pod  4
-rw-r--r-- pod/perlnews.pod 13
-rw-r--r-- pod/perlop.pod   29
-rw-r--r-- pod/perlre.pod    5
-rw-r--r-- pod/perltrap.pod 20
5 files changed, 67 insertions, 4 deletions
diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index c1cd67d8ba..65bba93bbb 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -2132,7 +2132,9 @@ like shift().
Returns the offset of where the last C<m//g> search left off for the variable
in question ($_ is used when the variable is not specified). May be
-modified to change that offset.
+modified to change that offset. Such modification will also influence
+the C<\G> zero-width assertion in regular expressions. See L<perlre> and
+L<perlop>.
=item print FILEHANDLE LIST
diff --git a/pod/perlnews.pod b/pod/perlnews.pod
index e6d1225a76..3ddb1e07c2 100644
--- a/pod/perlnews.pod
+++ b/pod/perlnews.pod
@@ -23,7 +23,8 @@ file in the distribution for details.
There is a new Configure question that asks if you want to maintain
binary compatibility with Perl 5.003. If you choose binary
compatibility, you do not have to recompile your extensions, but you
-might have symbol conflicts if you embed Perl in another application.
+might have symbol conflicts if you embed Perl in another application,
+just as in the 5.003 release.
=head2 New Opcode Module and Revised Safe Module
@@ -186,6 +187,16 @@ function whose prototype you want to retrieve.
Functions documented in the Camel to default to $_ now in
fact do, and all those that do are so documented in L<perlfunc>.
+=head2 C<m//g> does not trigger a pos() reset on failure
+
+The C<m//g> match iteration construct used to reset the iteration
+when it failed to match (so that the next C<m//g> match would start at
+the beginning of the string). You now have to explicitly do a
+C<pos $str = 0;> to reset the "last match" position, or modify the
+string in some way. This change makes it practical to chain C<m//g>
+matches together in conjunction with ordinary matches using the C<\G>
+zero-width assertion. See L<perlop> and L<perlre>.
+
=back
=head2 New Built-in Methods
diff --git a/pod/perlop.pod b/pod/perlop.pod
index a8f34c0e57..dd3aeab663 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -695,7 +695,10 @@ In a scalar context, C<m//g> iterates through the string, returning TRUE
each time it matches, and FALSE when it eventually runs out of
matches. (In other words, it remembers where it left off last time and
restarts the search at that point. You can actually find the current
-match position of a string using the pos() function--see L<perlfunc>.)
+match position of a string or set it using the pos() function--see
+L<perlfunc/pos>.) Note that you can use this feature to stack C<m//g>
+matches or intermix C<m//g> matches with C<m/\G.../>.
+
If you modify the string in any way, the match position is reset to the
beginning. Examples:
@@ -711,6 +714,30 @@ beginning. Examples:
}
print "$sentences\n";
+ # using m//g with \G
+ $_ = "ppooqppq";
+ while ($i++ < 2) {
+ print "1: '";
+ print $1 while /(o)/g; print "', pos=", pos, "\n";
+ print "2: '";
+ print $1 if /\G(q)/; print "', pos=", pos, "\n";
+ print "3: '";
+ print $1 while /(p)/g; print "', pos=", pos, "\n";
+ }
+
+The last example should print:
+
+ 1: 'oo', pos=4
+ 2: 'q', pos=4
+ 3: 'pp', pos=7
+ 1: '', pos=7
+ 2: 'q', pos=7
+ 3: '', pos=7
+
+Note how C<m//g> matches change the value reported by C<pos()>, but the
+non-global match doesn't.
+
+
=item q/STRING/
=item C<'STRING'>
diff --git a/pod/perlre.pod b/pod/perlre.pod
index 12f9f51016..a4c0a7d9de 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -174,7 +174,10 @@ represents backspace rather than a word boundary.) The C<\A> and C<\Z> are
just like "^" and "$" except that they won't match multiple times when the
C</m> modifier is used, while "^" and "$" will match at every internal line
boundary. To match the actual end of the string, not ignoring newline,
-you can use C<\Z(?!\n)>.
+you can use C<\Z(?!\n)>. The C<\G> assertion can be used to mix global
+matches (using C<m//g>) and non-global ones, as described in L<perlop>.
+The actual location where C<\G> will match can also be influenced
+by using C<pos()> as an lvalue. See L<perlfunc/pos>.
When the bracketing construct C<( ... )> is used, \E<lt>digitE<gt> matches the
digit'th substring. Outside of the pattern, always use "$" instead of "\"
diff --git a/pod/perltrap.pod b/pod/perltrap.pod
index b8247a4208..4b56dd23d8 100644
--- a/pod/perltrap.pod
+++ b/pod/perltrap.pod
@@ -1108,6 +1108,26 @@ repeatedly, like C</x/> or C<m!x!>.
# perl5 prints: perl5
+=item * Regular Expression
+
+Under perl4 and up to version 5.003, a failed C<m//g> match used to
+reset the internal iterator, so that subsequent C<m//g> match attempts
+began from the beginning of the string. In perl version 5.004 and later,
+failed C<m//g> matches do not reset the iterator position (which can be
+found using the C<pos()> function--see L<perlfunc/pos>).
+
+ $test = "foop";
+ for (1..3) {
+ print $1 while ($test =~ /(o)/g);
+ # pos $test = 0; # to get old behavior
+ }
+
+ # perl4 prints: oooooo
+ # perl5.004 prints: oo
+
+You may always reset the iterator yourself as shown in the commented line
+to get the old behavior.
+
=back
=head2 Subroutine, Signal, Sorting Traps
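
The perlfunc and perlre hunks above say that assigning to C<pos()> changes where C<\G> anchors. A minimal sketch of that interaction follows. It is not part of the patch: the variable names are hypothetical, the sample string is borrowed from the perlop hunk, and the script deliberately avoids depending on what pos() holds after a failed C<m//g>, since that is the behavior this commit is about.

```perl
use strict;
use warnings;

# Sample string borrowed from the perlop example: indices p0 p1 o2 o3 q4 p5 p6 q7
my $str = "ppooqppq";

# Assigning to pos() moves the point where \G anchors.
pos($str) = 4;
my ($at_four) = $str =~ /\G(.)/;      # list context, no /g: returns the capture
print "char at pos 4: $at_four\n";    # prints "char at pos 4: q"

# \G is a real anchor: with pos() at 0 the pattern cannot skip ahead to a 'q'.
pos($str) = 0;
my $anchored_miss = ($str =~ /\Gq/) ? 0 : 1;
print "anchored miss: $anchored_miss\n";   # prints "anchored miss: 1"

# A successful scalar-context m//g starting at \G advances pos() past the match.
pos($str) = 2;
if ($str =~ /\G(o+)/g) {
    print "matched '$1', pos=", pos($str), "\n";   # prints "matched 'oo', pos=4"
}
```

Each step sets pos() explicitly before matching, so the sketch should behave the same whether or not a failed match resets the position.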