author | Tom Christiansen <tchrist@perl.com> | 1999-01-07 16:05:02 -0700 |
---|---|---|
committer | Jarkko Hietaniemi <jhi@iki.fi> | 1999-01-08 11:51:52 +0000 |
commit | 65acb1b1d672587d3a0d073613a475584830e38e (patch) | |
tree | fcb09719fada1c9453493712a798b889dd89b086 /pod/perlfaq6.pod | |
parent | ae83f3772b2dd371e676035c6714025e89d7e08f (diff) | |
FAQ jumbo patch from tchrist.
Message-Id: <199901080605.XAA20229@jhereg.perl.com>
To: pumpkings@jhereg.perl.com
Subject: newest version of perlfaq.pod
Date: Thu, 7 Jan 1999 23:05:02 -0700
Message-Id: <199901080605.XAA20231@jhereg.perl.com>
From: Tom Christiansen <tchrist@jhereg.perl.com>
To: pumpkings@jhereg.perl.com
Subject: newest version of perlfaq1.pod
Date: Thu, 7 Jan 1999 23:05:02 -0700
Message-Id: <199901080605.XAA20233@jhereg.perl.com>
From: Tom Christiansen <tchrist@jhereg.perl.com>
To: pumpkings@jhereg.perl.com
Subject: newest version of perlfaq2.pod
Date: Thu, 7 Jan 1999 23:05:02 -0700
Message-Id: <199901080605.XAA20235@jhereg.perl.com>
From: Tom Christiansen <tchrist@jhereg.perl.com>
To: pumpkings@jhereg.perl.com
Subject: newest version of perlfaq3.pod
Date: Thu, 7 Jan 1999 23:05:02 -0700
Message-Id: <199901080605.XAA20237@jhereg.perl.com>
From: Tom Christiansen <tchrist@jhereg.perl.com>
To: pumpkings@jhereg.perl.com
Subject: newest version of perlfaq4.pod
Date: Thu, 7 Jan 1999 23:05:02 -0700
Message-Id: <199901080605.XAA20239@jhereg.perl.com>
From: Tom Christiansen <tchrist@jhereg.perl.com>
To: pumpkings@jhereg.perl.com
Subject: newest version of perlfaq5.pod
Date: Thu, 7 Jan 1999 23:05:02 -0700
Message-Id: <199901080605.XAA20241@jhereg.perl.com>
From: Tom Christiansen <tchrist@jhereg.perl.com>
To: pumpkings@jhereg.perl.com
Subject: newest version of perlfaq6.pod
Date: Thu, 7 Jan 1999 23:05:02 -0700
Message-Id: <199901080605.XAA20243@jhereg.perl.com>
From: Tom Christiansen <tchrist@jhereg.perl.com>
To: pumpkings@jhereg.perl.com
Subject: newest version of perlfaq7.pod
Date: Thu, 7 Jan 1999 23:05:03 -0700
Message-Id: <199901080605.XAA20245@jhereg.perl.com>
From: Tom Christiansen <tchrist@jhereg.perl.com>
To: pumpkings@jhereg.perl.com
Subject: newest version of perlfaq8.pod
Date: Thu, 7 Jan 1999 23:05:03 -0700
Message-Id: <199901080605.XAA20257@jhereg.perl.com>
From: Tom Christiansen <tchrist@jhereg.perl.com>
To: pumpkings@jhereg.perl.com
Subject: newest version of perlfaq9.pod
Date: Thu, 7 Jan 1999 23:05:03 -0700
p4raw-id: //depot/cfgperl@2588
Diffstat (limited to 'pod/perlfaq6.pod')
-rw-r--r-- | pod/perlfaq6.pod | 127 |
1 files changed, 67 insertions, 60 deletions
diff --git a/pod/perlfaq6.pod b/pod/perlfaq6.pod
index 488a27c83a..834fd89aa1 100644
--- a/pod/perlfaq6.pod
+++ b/pod/perlfaq6.pod
@@ -1,6 +1,6 @@
 =head1 NAME

-perlfaq6 - Regexps ($Revision: 1.22 $, $Date: 1998/07/16 14:01:07 $)
+perlfaq6 - Regexps ($Revision: 1.25 $, $Date: 1999/01/08 04:50:47 $)

 =head1 DESCRIPTION

@@ -128,7 +128,7 @@ L<perlop>):

 If you wanted text and not lines, you would use

-    perl -0777 -pe 'print "$1\n" while /START(.*?)END/gs' file1 file2 ...
+    perl -0777 -ne 'print "$1\n" while /START(.*?)END/gs' file1 file2 ...

 But if you want nested occurrences of C<START> through C<END>, you'll
 run up against the problem described in the question in this section
@@ -387,48 +387,31 @@ See the module String::Approx available from CPAN.

 =head2 How do I efficiently match many regular expressions at once?

-The following is super-inefficient:
+The following is extremely inefficient:

-    while (<FH>) {
-        foreach $pat (@patterns) {
-            if ( /$pat/ ) {
-                # do something
-            }
-        }
-    }
-
-Instead, you either need to use one of the experimental Regexp extension
-modules from CPAN (which might well be overkill for your purposes),
-or else put together something like this, inspired from a routine
-in Jeffrey Friedl's book:
-
-    sub _bm_build {
-        my $condition = shift;
-        my @regexp = @_; # this MUST not be local(); need my()
-        my $expr = join $condition => map { "m/\$regexp[$_]/o" } (0..$#regexp);
-        my $match_func = eval "sub { $expr }";
-        die if $@; # propagate $@; this shouldn't happen!
-        return $match_func;
-    }
-
-    sub bm_and { _bm_build('&&', @_) }
-    sub bm_or { _bm_build('||', @_) }
-
-    $f1 = bm_and qw{
-        xterm
-        (?i)window
-    };
-
-    $f2 = bm_or qw{
-        \b[Ff]ree\b
-        \bBSD\B
-        (?i)sys(tem)?\s*[V5]\b
-    };
-
-    # feed me /etc/termcap, prolly
-    while ( <> ) {
-        print "1: $_" if &$f1;
-        print "2: $_" if &$f2;
+    # slow but obvious way
+    @popstates = qw(CO ON MI WI MN);
+    while (defined($line = <>)) {
+        for $state (@popstates) {
+            if ($line =~ /\b$state\b/i) {
+                print $line;
+                last;
+            }
+        }
+    }
+
+That's because Perl has to recompile all those patterns for each of
+the lines of the file. As of the 5.005 release, there's a much better
+approach, one which makes use of the new C<qr//> operator:
+
+    # use spiffy new qr// operator, with /i flag even
+    use 5.005;
+    @popstates = qw(CO ON MI WI MN);
+    @poppats = map { qr/\b$_\b/i } @popstates;
+    while (defined($line = <>)) {
+        for $patobj (@poppats) {
+            print $line if $line =~ /$patobj/;
+        }
     }

 =head2 Why don't word-boundary searches with C<\b> work for me?
@@ -460,22 +443,24 @@ not "this" or "island".

 =head2 Why does using $&, $`, or $' slow my program down?

-Because once Perl sees that you need one of these variables anywhere
-in the program, it has to provide them on each and every pattern
-match. The same mechanism that handles these provides for the use of
-$1, $2, etc., so you pay the same price for each regexp that contains
-capturing parentheses. But if you never use $&, etc., in your script,
-then regexps I<without> capturing parentheses won't be penalized. So
-avoid $&, $', and $` if you can, but if you can't (and some algorithms
-really appreciate them), once you've used them once, use them at will,
-because you've already paid the price.
+Because once Perl sees that you need one of these variables anywhere in
+the program, it has to provide them on each and every pattern match.
+The same mechanism that handles these provides for the use of $1, $2,
+etc., so you pay the same price for each regexp that contains capturing
+parentheses. But if you never use $&, etc., in your script, then regexps
+I<without> capturing parentheses won't be penalized. So avoid $&, $',
+and $` if you can, but if you can't, once you've used them at all, use
+them at will because you've already paid the price. Remember that some
+algorithms really appreciate them. As of the 5.005 release. the $&
+variable is no longer "expensive" the way the other two are.

 =head2 What good is C<\G> in a regular expression?

 The notation C<\G> is used in a match or substitution in conjunction the
 C</g> modifier (and ignored if there's no C</g>) to anchor the regular
 expression to the point just past where the last match occurred, i.e. the
-pos() point.
+pos() point. A failed match resets the position of C<\G> unless the
+C</c> modifier is in effect.

 For example, suppose you had a line of text quoted in standard mail
 and Usenet notation, (that is, with leading C<E<gt>> characters), and
@@ -596,20 +581,41 @@ Or like this:

 Or like this:

-    die "sorry, Perl doesn't (yet) have Martian support )-:\n";
-
-In addition, a sample program which converts half-width to full-width
-katakana (in Shift-JIS or EUC encoding) is available from CPAN as
-
-=for Tom make it so
+    die "sorry, Perl doesn't (yet) have Martian support )-:\n";

 There are many double- (and multi-) byte encodings commonly used these
 days. Some versions of these have 1-, 2-, 3-, and 4-byte characters,
 all mixed.

+=head2 How do I match a pattern that is supplied by the user?
+
+Well, if it's really a pattern, then just use
+
+    chomp($pattern = <STDIN>);
+    if ($line =~ /$pattern/) { }
+
+Or, since you have no guarantee that your user entered
+a valid regular expression, trap the exception this way:
+
+    if (eval { $line =~ /$pattern/ }) { }
+
+But if all you really want to search for a string, not a pattern,
+then you should either use the index() function, which is made for
+string searching, or if you can't be disabused of using a pattern
+match on a non-pattern, then be sure to use C<\Q>...C<\E>, documented
+in L<perlre>.
+
+    $pattern = <STDIN>;
+
+    open (FILE, $input) or die "Couldn't open input $input: $!; aborting";
+    while (<FILE>) {
+        print if /\Q$pattern\E/;
+    }
+    close FILE;
+
 =head1 AUTHOR AND COPYRIGHT

-Copyright (c) 1997, 1998 Tom Christiansen and Nathan Torkington.
+Copyright (c) 1997-1999 Tom Christiansen and Nathan Torkington.
 All rights reserved.

 When included as part of the Standard Version of Perl, or as part of
@@ -624,3 +630,4 @@ are hereby placed into the public domain. You are permitted and
 encouraged to use this code in your own programs for fun
 or for profit as you see fit. A simple comment in the code giving
 credit would be courteous but is not required.
+