Diffstat (limited to 'pod/perlfaq6.pod')
-rw-r--r-- pod/perlfaq6.pod 39
1 files changed, 21 insertions, 18 deletions
diff --git a/pod/perlfaq6.pod b/pod/perlfaq6.pod
index 4ab4d4cc98..a032d49a70 100644
--- a/pod/perlfaq6.pod
+++ b/pod/perlfaq6.pod
@@ -8,8 +8,9 @@ This section is surprisingly small because the rest of the FAQ is
littered with answers involving regular expressions. For example,
decoding a URL and checking whether something is a number are handled
with regular expressions, but those answers are found elsewhere in
-this document (in the section on Data and the Networking one on
-networking, to be precise).
+this document (in L<perlfaq9>: ``How do I decode or create those %-encodings
+on the web'' and L<perlfaq4>: ``How do I determine whether a scalar is
+a number/whole/integer/float'', to be precise).
=head2 How can I hope to use regular expressions without creating illegible and unmaintainable code?
@@ -175,7 +176,7 @@ appear within a certain time.
$file->waitfor('/second line\n/');
print $file->getline;
-=head2 How do I substitute case insensitively on the LHS, but preserving case on the RHS?
+=head2 How do I substitute case insensitively on the LHS while preserving case on the RHS?
Here's a lovely Perlish solution by Larry Rosler. It exploits
properties of bitwise xor on ASCII strings.
@@ -280,10 +281,11 @@ Without the \Q, the regex would also spuriously match "di".
=head2 What is C</o> really for?
Using a variable in a regular expression match forces a re-evaluation
-(and perhaps recompilation) each time through. The C</o> modifier
-locks in the regex the first time it's used. This always happens in a
-constant regular expression, and in fact, the pattern was compiled
-into the internal format at the same time your entire program was.
+(and perhaps recompilation) each time the regular expression is
+encountered. The C</o> modifier locks in the regex the first time
+it's used. This always happens in a constant regular expression, and
+in fact, the pattern was compiled into the internal format at the same
+time your entire program was.
Use of C</o> is irrelevant unless variable interpolation is used in
the pattern, and if so, the regex engine will neither know nor care
@@ -367,8 +369,8 @@ A slight modification also removes C++ comments:
=head2 Can I use Perl regular expressions to match balanced text?
Although Perl regular expressions are more powerful than "mathematical"
-regular expressions, because they feature conveniences like backreferences
-(C<\1> and its ilk), they still aren't powerful enough -- with
+regular expressions because they feature conveniences like backreferences
+(C<\1> and its ilk), they still aren't powerful enough--with
the possible exception of bizarre and experimental features in the
development-track releases of Perl. You still need to use non-regex
techniques to parse balanced text, such as the text enclosed between
@@ -379,7 +381,7 @@ and possibly nested single chars, like C<`> and C<'>, C<{> and C<}>,
or C<(> and C<)> can be found in
http://www.perl.com/CPAN/authors/id/TOMC/scripts/pull_quotes.gz .
-The C::Scan module from CPAN contains such subs for internal usage,
+The C::Scan module from CPAN contains such subs for internal use,
but they are undocumented.
=head2 What does it mean that regexes are greedy? How can I get around it?
@@ -450,7 +452,8 @@ regular expression:
print "$count $line";
}
-If you want these output in a sorted order, see the section on Hashes.
+If you want these output in a sorted order, see L<perlfaq4>: ``How do I
+sort a hash (optionally by value instead of key)?''.
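As a rough illustration of the sorted output mentioned above, here is a minimal sketch. The hash name C<%count> and its sample data are assumptions made up for this example; they are not taken from the FAQ's own code.

```perl
use strict;
use warnings;

# Hypothetical data: how many times each line was seen.
my %count = (apple => 3, pear => 1, fig => 2);

# Sort the keys by descending count, breaking ties alphabetically.
my @sorted = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;

# Print in the same "$count $line" shape as the loop above.
print "$count{$_} $_\n" for @sorted;
```

Swapping C<$a> and C<$b> in the numeric comparison would give ascending order instead.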
=head2 How can I do approximate matching?
@@ -487,7 +490,7 @@ approach, one which makes use of the new C<qr//> operator:
=head2 Why don't word-boundary searches with C<\b> work for me?
-Two common misconceptions are that C<\b> is a synonym for C<\s+>, and
+Two common misconceptions are that C<\b> is a synonym for C<\s+> and
that it's the edge between whitespace characters and non-whitespace
characters. Neither is correct. C<\b> is the place between a C<\w>
character and a C<\W> character (that is, C<\b> is the edge of a
@@ -514,11 +517,11 @@ not "this" or "island".
=head2 Why does using $&, $`, or $' slow my program down?
-Because once Perl sees that you need one of these variables anywhere in
-the program, it has to provide them on each and every pattern match.
+Once Perl sees that you need one of these variables anywhere in
+the program, it provides them on each and every pattern match.
The same mechanism that handles these provides for the use of $1, $2,
etc., so you pay the same price for each regex that contains capturing
-parentheses. But if you never use $&, etc., in your script, then regexes
+parentheses. If you never use $&, etc., in your script, then regexes
I<without> capturing parentheses won't be penalized. So avoid $&, $',
and $` if you can, but if you can't, once you've used them at all, use
them at will because you've already paid the price. Remember that some
@@ -589,7 +592,7 @@ Of course, that could have been written as
}
}
-But then you lose the vertical alignment of the regular expressions.
+but then you lose the vertical alignment of the regular expressions.
=head2 Are Perl regexes DFAs or NFAs? Are they POSIX compliant?
@@ -670,12 +673,12 @@ Well, if it's really a pattern, then just use
chomp($pattern = <STDIN>);
if ($line =~ /$pattern/) { }
-Or, since you have no guarantee that your user entered
+Alternatively, since you have no guarantee that your user entered
a valid regular expression, trap the exception this way:
if (eval { $line =~ /$pattern/ }) { }
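To make the eval approach concrete, here is a small sketch that tells apart "matched", "did not match", and "the pattern was not a valid regex". The helper name C<safe_match> is hypothetical and introduced only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 on a match, 0 on no match,
# and undef if $pattern fails to compile as a regular expression.
sub safe_match {
    my ($line, $pattern) = @_;
    # An invalid pattern dies when the regex is compiled at run time;
    # the eval block traps that and yields undef.
    my $matched = eval { $line =~ /$pattern/ ? 1 : 0 };
    return $matched;
}

for my $pattern ('wor', 'xyz', '(') {
    my $result = safe_match('hello world', $pattern);
    print "$pattern: ", defined $result ? $result : 'invalid pattern', "\n";
}
```

One caveat with this sketch: an empty C<$pattern> does not mean "match anything" in Perl (it reuses the last successful pattern), so callers may want to check for that case separately.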
-But if all you really want to search for a string, not a pattern,
+If all you really want is to search for a string, not a pattern,
then you should either use the index() function, which is made for
string searching, or if you can't be disabused of using a pattern
match on a non-pattern, then be sure to use C<\Q>...C<\E>, documented