diff options
author | Yves Orton <demerphq@gmail.com> | 2007-02-13 23:04:54 +0100 |
---|---|---|
committer | H.Merijn Brand <h.m.brand@xs4all.nl> | 2007-02-14 07:54:59 +0000 |
commit | 87e95b7ff3268cc4f947098ed09d244372b3af0d (patch) | |
tree | 558694850b849d198a69d4f0111e9706388c2d9f /pod | |
parent | ee03edd30dac4a70bf2d56895e9bf4dd1243abc8 (diff) | |
download | perl-87e95b7ff3268cc4f947098ed09d244372b3af0d.tar.gz |
Re: [PATCH] Document that m//k works
Message-ID: <9b18b3110702131304q370f3530j463c1a59c5ac1dfe@mail.gmail.com>
p4raw-id: //depot/perl@30278
Diffstat (limited to 'pod')
-rw-r--r-- | pod/perlop.pod | 390 | ||||
-rw-r--r-- | pod/perlre.pod | 28 | ||||
-rw-r--r-- | pod/perlvar.pod | 6 |
3 files changed, 215 insertions, 209 deletions
diff --git a/pod/perlop.pod b/pod/perlop.pod index 18851f04da..52ecddc4e8 100644 --- a/pod/perlop.pod +++ b/pod/perlop.pod @@ -1049,33 +1049,77 @@ matching and related activities. =over 8 -=item ?PATTERN? -X<?> +=item qr/STRING/msixpo +X<qr> X</i> X</m> X</o> X</s> X</x> -This is just like the C</pattern/> search, except that it matches only -once between calls to the reset() operator. This is a useful -optimization when you want to see only the first occurrence of -something in each file of a set of files, for instance. Only C<??> -patterns local to the current package are reset. +This operator quotes (and possibly compiles) its I<STRING> as a regular +expression. I<STRING> is interpolated the same way as I<PATTERN> +in C<m/PATTERN/>. If "'" is used as the delimiter, no interpolation +is done. Returns a Perl value which may be used instead of the +corresponding C</STRING/imosx> expression. - while (<>) { - if (?^$?) { - # blank line between header and body - } - } continue { - reset if eof; # clear ?? status for next file +For example, + + $rex = qr/my.STRING/is; + s/$rex/foo/; + +is equivalent to + + s/my.STRING/foo/is; + +The result may be used as a subpattern in a match: + + $re = qr/$pattern/; + $string =~ /foo${re}bar/; # can be interpolated in other patterns + $string =~ $re; # or used standalone + $string =~ /$re/; # or this way + +Since Perl may compile the pattern at the moment of execution of qr() +operator, using qr() may have speed advantages in some situations, +notably if the result of qr() is used standalone: + + sub match { + my $patterns = shift; + my @compiled = map qr/$_/i, @$patterns; + grep { + my $success = 0; + foreach my $pat (@compiled) { + $success = 1, last if /$pat/; + } + $success; + } @_; } -This usage is vaguely deprecated, which means it just might possibly -be removed in some distant future version of Perl, perhaps somewhere -around the year 2168. +Precompilation of the pattern into an internal representation at +the moment of qr() avoids a need to recompile the pattern every +time a match C</$pat/> is attempted. (Perl has many other internal +optimizations, but none would be triggered in the above example if +we did not use qr() operator.) + +Options are: + + m Treat string as multiple lines. + s Treat string as single line. (Make . match a newline) + i Do case-insensitive pattern matching. + x Use extended regular expressions. + p When matching preserve a copy of the matched string so + that ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be defined. + o Compile pattern only once. + +If a precompiled pattern is embedded in a larger pattern then the effect +of 'msixp' will be propagated appropriately. The effect of the 'o' +modifier has is not propagated, being restricted to those patterns +explicitly using it. + +See L<perlre> for additional information on valid syntax for STRING, and +for a detailed look at the semantics of regular expressions. -=item m/PATTERN/cgimosxk +=item m/PATTERN/msixpogc X<m> X<operator, match> X<regexp, options> X<regexp> X<regex, options> X<regex> X</c> X</i> X</m> X</o> X</s> X</x> -=item /PATTERN/cgimosxk +=item /PATTERN/msixpogc Searches a string for a pattern match, and in scalar context returns true if it succeeds, false if it fails. If no string is specified @@ -1086,17 +1130,11 @@ rather tightly.) See also L<perlre>. See L<perllocale> for discussion of additional considerations that apply when C<use locale> is in effect. -Options are: +Options are as described in qr// in addition to the following match +process modifiers - i Do case-insensitive pattern matching. - m Treat string as multiple lines. - s Treat string as single line. - x Use extended regular expressions. g Match globally, i.e., find all occurrences. c Do not reset search position on a failed match when /g is in effect. - o Compile pattern only once. - k Keep a copy of the matched string so that ${^MATCH} and friends - will be defined. If "/" is the delimiter then the initial C<m> is optional. With the C<m> you can use any pair of non-alphanumeric, non-whitespace characters @@ -1256,6 +1294,137 @@ Here is the output (split into several lines): lowercase lowercase line-noise lowercase lowercase line-noise MiXeD line-noise. That's all! +=item ?PATTERN? +X<?> + +This is just like the C</pattern/> search, except that it matches only +once between calls to the reset() operator. This is a useful +optimization when you want to see only the first occurrence of +something in each file of a set of files, for instance. Only C<??> +patterns local to the current package are reset. + + while (<>) { + if (?^$?) { + # blank line between header and body + } + } continue { + reset if eof; # clear ?? status for next file + } + +This usage is vaguely deprecated, which means it just might possibly +be removed in some distant future version of Perl, perhaps somewhere +around the year 2168. + +=item s/PATTERN/REPLACEMENT/msixpogce +X<substitute> X<substitution> X<replace> X<regexp, replace> +X<regexp, substitute> X</e> X</g> X</i> X</m> X</o> X</s> X</x> + +Searches a string for a pattern, and if found, replaces that pattern +with the replacement text and returns the number of substitutions +made. Otherwise it returns false (specifically, the empty string). + +If no string is specified via the C<=~> or C<!~> operator, the C<$_> +variable is searched and modified. (The string specified with C<=~> must +be scalar variable, an array element, a hash element, or an assignment +to one of those, i.e., an lvalue.) + +If the delimiter chosen is a single quote, no interpolation is +done on either the PATTERN or the REPLACEMENT. Otherwise, if the +PATTERN contains a $ that looks like a variable rather than an +end-of-string test, the variable will be interpolated into the pattern +at run-time. If you want the pattern compiled only once the first time +the variable is interpolated, use the C</o> option. If the pattern +evaluates to the empty string, the last successfully executed regular +expression is used instead. See L<perlre> for further explanation on these. +See L<perllocale> for discussion of additional considerations that apply +when C<use locale> is in effect. + +Options are as with m// with the addition of the following replacement +specific options: + + e Evaluate the right side as an expression. + ee Evaluate the right side as a string then eval the result + +Any non-alphanumeric, non-whitespace delimiter may replace the +slashes. If single quotes are used, no interpretation is done on the +replacement string (the C</e> modifier overrides this, however). Unlike +Perl 4, Perl 5 treats backticks as normal delimiters; the replacement +text is not evaluated as a command. If the +PATTERN is delimited by bracketing quotes, the REPLACEMENT has its own +pair of quotes, which may or may not be bracketing quotes, e.g., +C<s(foo)(bar)> or C<< s<foo>/bar/ >>. A C</e> will cause the +replacement portion to be treated as a full-fledged Perl expression +and evaluated right then and there. It is, however, syntax checked at +compile-time. A second C<e> modifier will cause the replacement portion +to be C<eval>ed before being run as a Perl expression. + +Examples: + + s/\bgreen\b/mauve/g; # don't change wintergreen + + $path =~ s|/usr/bin|/usr/local/bin|; + + s/Login: $foo/Login: $bar/; # run-time pattern + + ($foo = $bar) =~ s/this/that/; # copy first, then change + + $count = ($paragraph =~ s/Mister\b/Mr./g); # get change-count + + $_ = 'abc123xyz'; + s/\d+/$&*2/e; # yields 'abc246xyz' + s/\d+/sprintf("%5d",$&)/e; # yields 'abc 246xyz' + s/\w/$& x 2/eg; # yields 'aabbcc 224466xxyyzz' + + s/%(.)/$percent{$1}/g; # change percent escapes; no /e + s/%(.)/$percent{$1} || $&/ge; # expr now, so /e + s/^=(\w+)/pod($1)/ge; # use function call + + # expand variables in $_, but dynamics only, using + # symbolic dereferencing + s/\$(\w+)/${$1}/g; + + # Add one to the value of any numbers in the string + s/(\d+)/1 + $1/eg; + + # This will expand any embedded scalar variable + # (including lexicals) in $_ : First $1 is interpolated + # to the variable name, and then evaluated + s/(\$\w+)/$1/eeg; + + # Delete (most) C comments. + $program =~ s { + /\* # Match the opening delimiter. + .*? # Match a minimal number of characters. + \*/ # Match the closing delimiter. + } []gsx; + + s/^\s*(.*?)\s*$/$1/; # trim whitespace in $_, expensively + + for ($variable) { # trim whitespace in $variable, cheap + s/^\s+//; + s/\s+$//; + } + + s/([^ ]*) *([^ ]*)/$2 $1/; # reverse 1st two fields + +Note the use of $ instead of \ in the last example. Unlike +B<sed>, we use the \<I<digit>> form in only the left hand side. +Anywhere else it's $<I<digit>>. + +Occasionally, you can't use just a C</g> to get all the changes +to occur that you might want. Here are two common cases: + + # put commas in the right places in an integer + 1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g; + + # expand tabs to 8-column spacing + 1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e; + +=back + +=head2 Quote-Like Operators +X<operator, quote-like> + =item q/STRING/ X<q> X<quote, single> X<'> X<''> @@ -1281,64 +1450,6 @@ A double-quoted, interpolated string. if /\b(tcl|java|python)\b/i; # :-) $baz = "\n"; # a one-character string -=item qr/STRING/imosx -X<qr> X</i> X</m> X</o> X</s> X</x> - -This operator quotes (and possibly compiles) its I<STRING> as a regular -expression. I<STRING> is interpolated the same way as I<PATTERN> -in C<m/PATTERN/>. If "'" is used as the delimiter, no interpolation -is done. Returns a Perl value which may be used instead of the -corresponding C</STRING/imosx> expression. - -For example, - - $rex = qr/my.STRING/is; - s/$rex/foo/; - -is equivalent to - - s/my.STRING/foo/is; - -The result may be used as a subpattern in a match: - - $re = qr/$pattern/; - $string =~ /foo${re}bar/; # can be interpolated in other patterns - $string =~ $re; # or used standalone - $string =~ /$re/; # or this way - -Since Perl may compile the pattern at the moment of execution of qr() -operator, using qr() may have speed advantages in some situations, -notably if the result of qr() is used standalone: - - sub match { - my $patterns = shift; - my @compiled = map qr/$_/i, @$patterns; - grep { - my $success = 0; - foreach my $pat (@compiled) { - $success = 1, last if /$pat/; - } - $success; - } @_; - } - -Precompilation of the pattern into an internal representation at -the moment of qr() avoids a need to recompile the pattern every -time a match C</$pat/> is attempted. (Perl has many other internal -optimizations, but none would be triggered in the above example if -we did not use qr() operator.) - -Options are: - - i Do case-insensitive pattern matching. - m Treat string as multiple lines. - o Compile pattern only once. - s Treat string as single line. - x Use extended regular expressions. - -See L<perlre> for additional information on valid syntax for STRING, and -for a detailed look at the semantics of regular expressions. - =item qx/STRING/ X<qx> X<`> X<``> X<backtick> @@ -1459,117 +1570,6 @@ put comments into a multi-line C<qw>-string. For this reason, the C<use warnings> pragma and the B<-w> switch (that is, the C<$^W> variable) produces warnings if the STRING contains the "," or the "#" character. -=item s/PATTERN/REPLACEMENT/egimosxk -X<substitute> X<substitution> X<replace> X<regexp, replace> -X<regexp, substitute> X</e> X</g> X</i> X</m> X</o> X</s> X</x> - -Searches a string for a pattern, and if found, replaces that pattern -with the replacement text and returns the number of substitutions -made. Otherwise it returns false (specifically, the empty string). - -If no string is specified via the C<=~> or C<!~> operator, the C<$_> -variable is searched and modified. (The string specified with C<=~> must -be scalar variable, an array element, a hash element, or an assignment -to one of those, i.e., an lvalue.) - -If the delimiter chosen is a single quote, no interpolation is -done on either the PATTERN or the REPLACEMENT. Otherwise, if the -PATTERN contains a $ that looks like a variable rather than an -end-of-string test, the variable will be interpolated into the pattern -at run-time. If you want the pattern compiled only once the first time -the variable is interpolated, use the C</o> option. If the pattern -evaluates to the empty string, the last successfully executed regular -expression is used instead. See L<perlre> for further explanation on these. -See L<perllocale> for discussion of additional considerations that apply -when C<use locale> is in effect. - -Options are: - - i Do case-insensitive pattern matching. - m Treat string as multiple lines. - s Treat string as single line. - x Use extended regular expressions. - g Replace globally, i.e., all occurrences. - o Compile pattern only once. - k Keep a copy of the original string so ${^MATCH} and friends - will be defined. - e Evaluate the right side as an expression. - - -Any non-alphanumeric, non-whitespace delimiter may replace the -slashes. If single quotes are used, no interpretation is done on the -replacement string (the C</e> modifier overrides this, however). Unlike -Perl 4, Perl 5 treats backticks as normal delimiters; the replacement -text is not evaluated as a command. If the -PATTERN is delimited by bracketing quotes, the REPLACEMENT has its own -pair of quotes, which may or may not be bracketing quotes, e.g., -C<s(foo)(bar)> or C<< s<foo>/bar/ >>. A C</e> will cause the -replacement portion to be treated as a full-fledged Perl expression -and evaluated right then and there. It is, however, syntax checked at -compile-time. A second C<e> modifier will cause the replacement portion -to be C<eval>ed before being run as a Perl expression. - -Examples: - - s/\bgreen\b/mauve/g; # don't change wintergreen - - $path =~ s|/usr/bin|/usr/local/bin|; - - s/Login: $foo/Login: $bar/; # run-time pattern - - ($foo = $bar) =~ s/this/that/; # copy first, then change - - $count = ($paragraph =~ s/Mister\b/Mr./g); # get change-count - - $_ = 'abc123xyz'; - s/\d+/$&*2/e; # yields 'abc246xyz' - s/\d+/sprintf("%5d",$&)/e; # yields 'abc 246xyz' - s/\w/$& x 2/eg; # yields 'aabbcc 224466xxyyzz' - - s/%(.)/$percent{$1}/g; # change percent escapes; no /e - s/%(.)/$percent{$1} || $&/ge; # expr now, so /e - s/^=(\w+)/pod($1)/ge; # use function call - - # expand variables in $_, but dynamics only, using - # symbolic dereferencing - s/\$(\w+)/${$1}/g; - - # Add one to the value of any numbers in the string - s/(\d+)/1 + $1/eg; - - # This will expand any embedded scalar variable - # (including lexicals) in $_ : First $1 is interpolated - # to the variable name, and then evaluated - s/(\$\w+)/$1/eeg; - - # Delete (most) C comments. - $program =~ s { - /\* # Match the opening delimiter. - .*? # Match a minimal number of characters. - \*/ # Match the closing delimiter. - } []gsx; - - s/^\s*(.*?)\s*$/$1/; # trim whitespace in $_, expensively - - for ($variable) { # trim whitespace in $variable, cheap - s/^\s+//; - s/\s+$//; - } - - s/([^ ]*) *([^ ]*)/$2 $1/; # reverse 1st two fields - -Note the use of $ instead of \ in the last example. Unlike -B<sed>, we use the \<I<digit>> form in only the left hand side. -Anywhere else it's $<I<digit>>. - -Occasionally, you can't use just a C</g> to get all the changes -to occur that you might want. Here are two common cases: - - # put commas in the right places in an integer - 1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g; - - # expand tabs to 8-column spacing - 1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e; =item tr/SEARCHLIST/REPLACEMENTLIST/cds X<tr> X<y> X<transliterate> X</c> X</d> X</s> diff --git a/pod/perlre.pod b/pod/perlre.pod index aa861ae46e..99cba6889e 100644 --- a/pod/perlre.pod +++ b/pod/perlre.pod @@ -27,15 +27,6 @@ L<perlop/"Gory details of parsing quoted constructs">. =over 4 -=item i -X</i> X<regex, case-insensitive> X<regexp, case-insensitive> -X<regular expression, case-insensitive> - -Do case-insensitive pattern matching. - -If C<use locale> is in effect, the case map is taken from the current -locale. See L<perllocale>. - =item m X</m> X<regex, multiline> X<regexp, multiline> X<regular expression, multiline> @@ -54,11 +45,26 @@ Used together, as /ms, they let the "." match any character whatsoever, while still allowing "^" and "$" to match, respectively, just after and just before newlines within the string. +=item i +X</i> X<regex, case-insensitive> X<regexp, case-insensitive> +X<regular expression, case-insensitive> + +Do case-insensitive pattern matching. + +If C<use locale> is in effect, the case map is taken from the current +locale. See L<perllocale>. + =item x X</x> Extend your pattern's legibility by permitting whitespace and comments. +=item p +X</p> X<regex, preserve> X<regexp, preserve> + +Preserve the string matched such that ${^PREMATCH}, {$^MATCH}, and +${^POSTMATCH} are available for use after matching. + =back These are usually written as "the C</x> modifier", even though the delimiter @@ -593,11 +599,11 @@ X<$&> X<$`> X<$'> As a workaround for this problem, Perl 5.10 introduces C<${^PREMATCH}>, C<${^MATCH}> and C<${^POSTMATCH}>, which are equivalent to C<$`>, C<$&> and C<$'>, B<except> that they are only guaranteed to be defined after a -successful match that was executed with the C</k> (keep-copy) modifier. +successful match that was executed with the C</p> (preserve) modifier. The use of these variables incurs no global performance penalty, unlike their punctuation char equivalents, however at the trade-off that you have to tell perl when you want to use them. -X</k> X<k modifier> +X</p> X<p modifier> Backslashed metacharacters in Perl are alphanumeric, such as C<\b>, C<\w>, C<\n>. Unlike some other regular expression languages, there diff --git a/pod/perlvar.pod b/pod/perlvar.pod index b4db654178..fc738a0903 100644 --- a/pod/perlvar.pod +++ b/pod/perlvar.pod @@ -234,7 +234,7 @@ X<${^MATCH}> This is similar to C<$&> (C<$POSTMATCH>) except that it does not incur the performance penalty associated with that variable, and is only guaranteed to return a defined value when the pattern was compiled or executed with -the C</k> modifier. +the C</p> modifier. =item $PREMATCH @@ -257,7 +257,7 @@ X<${^PREMATCH}> This is similar to C<$`> ($PREMATCH) except that it does not incur the performance penalty associated with that variable, and is only guaranteed to return a defined value when the pattern was compiled or executed with -the C</k> modifier. +the C</p> modifier. =item $POSTMATCH @@ -286,7 +286,7 @@ X<${^POSTMATCH}> This is similar to C<$'> (C<$POSTMATCH>) except that it does not incur the performance penalty associated with that variable, and is only guaranteed to return a defined value when the pattern was compiled or executed with -the C</k> modifier. +the C</p> modifier. =item $LAST_PAREN_MATCH |