From 35a734be319e73f3caacab9f423c3ca51b60342b Mon Sep 17 00:00:00 2001 From: Ilya Zakharevich Date: Thu, 9 Sep 1999 14:35:37 -0400 Subject: slightly edited variant of suggested patch Message-ID: <19990909183537.A28682@monk.mps.ohio-state.edu> Subject: [PATCH 5.005_58] How RExen match? p4raw-id: //depot/perl@4344 --- pod/perlre.pod | 106 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 106 insertions(+) diff --git a/pod/perlre.pod b/pod/perlre.pod index 4bc042d9b3..9a06305629 100644 --- a/pod/perlre.pod +++ b/pod/perlre.pod @@ -717,6 +717,11 @@ themselves. =head2 Backtracking +NOTE: This section presents an abstract approximation of regular +expression behavior. For a more rigorous (and complicated) view of +the rules involved in selecting a match among possible alternatives, +see L. + A fundamental feature of regular expression matching involves the notion called I, which is currently used (when needed) by all regular expression quantifiers, namely C<*>, C<*?>, C<+>, @@ -1085,6 +1090,107 @@ the matched string, and is reset by each assignment to pos(). Zero-length matches at the end of the previous match are ignored during C. +=head2 Combining pieces together + +Each of the elementary pieces of regular expressions which were described +before (such as C or C<\Z>) could match at most one substring +at the given position of the input string. However, in a typical regular +expression these elementary pieces are combined into more complicated +patterns using combining operators C, C, C etc +(in these examples C and C are regular subexpressions). + +Such combinations can include alternatives, leading to a problem of choice: +if we match a regular expression C against C<"abc">, will it match +substring C<"a"> or C<"ab">? One way to describe which substring is +actually matched is the concept of backtracking (see L<"Backtracking">). +However, this description is too low-level and makes you think +in terms of a particular implementation. + +Another description starts with notions of "better"/"worse". All the +substrings which may be matched by the given regular expression can be +sorted from the "best" match to the "worst" match, and it is the "best" +match which is chosen. This substitutes the question of "what is chosen?" +by the question of "which matches are better, and which are worse?". + +Again, for elementary pieces there is no such question, since at most +one match at a given position is possible. This section describes the +notion of better/worse for combining operators. In the description +below C and C are regular subexpressions. + +=over + +=item C + +Consider two possible matches, C and C, C and C are +substrings which can be matched by C, C and C are substrings +which can be matched by C. + +If C is better match for C than C, C is a better +match than C. + +If C and C coincide: C is a better match than C if +C is better match for C than C. + +=item C + +When C can match, it is a better match than when only C can match. + +Ordering of two matches for C is the same as for C. Similar for +two matches for C. + +=item C + +Matches as C (repeated as many times as necessary). + +=item C + +Matches as C. + +=item C + +Matches as C. + +=item C, C, C + +Same as C, C, C respectively. + +=item C, C, C + +Same as C, C, C respectively. + +=item C<(?ES)> + +Matches the best match for C and only that. + +=item C<(?=S)>, C<(?<=S)> + +Only the best match for C is considered. (This is important only if +C has capturing parentheses, and backreferences are used somewhere +else in the whole regular expression.) + +=item C<(?!S)>, C<(? + +For this grouping operator there is no need to describe the ordering, since +only whether or not C can match is important. + +=item C<(?p{ EXPR })> + +The ordering is the same as for the regular expression which is +the result of EXPR. + +=item C<(?(condition)yes-pattern|no-pattern)> + +Recall that which of C or C actually matches is +already determined. The ordering of the matches is the same as for the +chosen subexpression. + +=back + +The above recipes describe the ordering of matches I. +One more rule is needed to understand how a match is determined for the +whole regular expression: a match at an earlier position is always better +than a match at a later position. + =head2 Creating custom RE engines Overloaded constants (see L) provide a simple way to extend -- cgit v1.2.1