diff options
author | Ilya Zakharevich <ilya@math.berkeley.edu> | 1999-09-09 14:35:37 -0400 |
---|---|---|
committer | Gurusamy Sarathy <gsar@cpan.org> | 1999-10-11 17:57:31 +0000 |
commit | 35a734be319e73f3caacab9f423c3ca51b60342b (patch) | |
tree | 9059d26523b6598f2413e470b9798b042c5e6474 /pod/perlre.pod | |
parent | abef537ad8d3230927d259211d4c0d673d28d7b7 (diff) | |
download | perl-35a734be319e73f3caacab9f423c3ca51b60342b.tar.gz |
slightly edited variant of suggested patch
Message-ID: <19990909183537.A28682@monk.mps.ohio-state.edu>
Subject: [PATCH 5.005_58] How RExen match?
p4raw-id: //depot/perl@4344
Diffstat (limited to 'pod/perlre.pod')
-rw-r--r-- | pod/perlre.pod | 106 |
1 files changed, 106 insertions, 0 deletions
diff --git a/pod/perlre.pod b/pod/perlre.pod index 4bc042d9b3..9a06305629 100644 --- a/pod/perlre.pod +++ b/pod/perlre.pod @@ -717,6 +717,11 @@ themselves. =head2 Backtracking +NOTE: This section presents an abstract approximation of regular +expression behavior. For a more rigorous (and complicated) view of +the rules involved in selecting a match among possible alternatives, +see L<Combining pieces together>. + A fundamental feature of regular expression matching involves the notion called I<backtracking>, which is currently used (when needed) by all regular expression quantifiers, namely C<*>, C<*?>, C<+>, @@ -1085,6 +1090,107 @@ the matched string, and is reset by each assignment to pos(). Zero-length matches at the end of the previous match are ignored during C<split>. +=head2 Combining pieces together + +Each of the elementary pieces of regular expressions which were described +before (such as C<ab> or C<\Z>) could match at most one substring +at the given position of the input string. However, in a typical regular +expression these elementary pieces are combined into more complicated +patterns using combining operators C<ST>, C<S|T>, C<S*> etc +(in these examples C<S> and C<T> are regular subexpressions). + +Such combinations can include alternatives, leading to a problem of choice: +if we match a regular expression C<a|ab> against C<"abc">, will it match +substring C<"a"> or C<"ab">? One way to describe which substring is +actually matched is the concept of backtracking (see L<"Backtracking">). +However, this description is too low-level and makes you think +in terms of a particular implementation. + +Another description starts with notions of "better"/"worse". All the +substrings which may be matched by the given regular expression can be +sorted from the "best" match to the "worst" match, and it is the "best" +match which is chosen. This substitutes the question of "what is chosen?" +by the question of "which matches are better, and which are worse?". + +Again, for elementary pieces there is no such question, since at most +one match at a given position is possible. This section describes the +notion of better/worse for combining operators. In the description +below C<S> and C<T> are regular subexpressions. + +=over + +=item C<ST> + +Consider two possible matches, C<AB> and C<A'B'>, C<A> and C<A'> are +substrings which can be matched by C<S>, C<B> and C<B'> are substrings +which can be matched by C<T>. + +If C<A> is better match for C<S> than C<A'>, C<AB> is a better +match than C<A'B'>. + +If C<A> and C<A'> coincide: C<AB> is a better match than C<AB'> if +C<B> is better match for C<T> than C<B'>. + +=item C<S|T> + +When C<S> can match, it is a better match than when only C<T> can match. + +Ordering of two matches for C<S> is the same as for C<S>. Similar for +two matches for C<T>. + +=item C<S{REPEAT_COUNT}> + +Matches as C<SSS...S> (repeated as many times as necessary). + +=item C<S{min,max}> + +Matches as C<S{max}|S{max-1}|...|S{min+1}|S{min}>. + +=item C<S{min,max}?> + +Matches as C<S{min}|S{min+1}|...|S{max-1}|S{max}>. + +=item C<S?>, C<S*>, C<S+> + +Same as C<S{0,1}>, C<S{0,BIG_NUMBER}>, C<S{1,BIG_NUMBER}> respectively. + +=item C<S??>, C<S*?>, C<S+?> + +Same as C<S{0,1}?>, C<S{0,BIG_NUMBER}?>, C<S{1,BIG_NUMBER}?> respectively. + +=item C<(?E<gt>S)> + +Matches the best match for C<S> and only that. + +=item C<(?=S)>, C<(?<=S)> + +Only the best match for C<S> is considered. (This is important only if +C<S> has capturing parentheses, and backreferences are used somewhere +else in the whole regular expression.) + +=item C<(?!S)>, C<(?<!S)> + +For this grouping operator there is no need to describe the ordering, since +only whether or not C<S> can match is important. + +=item C<(?p{ EXPR })> + +The ordering is the same as for the regular expression which is +the result of EXPR. + +=item C<(?(condition)yes-pattern|no-pattern)> + +Recall that which of C<yes-pattern> or C<no-pattern> actually matches is +already determined. The ordering of the matches is the same as for the +chosen subexpression. + +=back + +The above recipes describe the ordering of matches I<at a given position>. +One more rule is needed to understand how a match is determined for the +whole regular expression: a match at an earlier position is always better +than a match at a later position. + =head2 Creating custom RE engines Overloaded constants (see L<overload>) provide a simple way to extend |