author | ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> | 2009-10-18 19:50:34 +0000 |
---|---|---|
committer | ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> | 2009-10-18 19:50:34 +0000 |
commit | fce3f37a66924340f6a8dc1e536e5ead40f4c2e4 (patch) | |
tree | 95323ddfd71ab35c1d83da94bfaf030cabf5a242 | |
parent | bf3e4896d728bafe2515dd61bf484e0b84e84c89 (diff) | |
download | pcre-fce3f37a66924340f6a8dc1e536e5ead40f4c2e4.tar.gz |
Document more clearly capturing behaviour for recursion and subroutines.
git-svn-id: svn://vcs.exim.org/pcre/code/trunk@464 2f5784b3-3f2a-0410-8824-cb99058d5e15
-rw-r--r-- | configure.ac | 4 |
-rw-r--r-- | doc/pcrepattern.3 | 43 |
-rw-r--r-- | testdata/testinput11 | 33 |
-rw-r--r-- | testdata/testinput2 | 13 |
-rw-r--r-- | testdata/testinput7 | 14 |
-rw-r--r-- | testdata/testoutput11 | 65 |
-rw-r--r-- | testdata/testoutput2 | 24 |
-rw-r--r-- | testdata/testoutput7 | 24 |
8 files changed, 195 insertions, 25 deletions
diff --git a/configure.ac b/configure.ac
index 2254be6..612810b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -8,8 +8,8 @@ dnl empty.
 
 m4_define(pcre_major, [8])
 m4_define(pcre_minor, [00])
-m4_define(pcre_prerelease, [-RC2])
-m4_define(pcre_date, [2009-10-17])
+m4_define(pcre_prerelease, [])
+m4_define(pcre_date, [2009-10-19])
 
 # Libtool shared library interface versions (current:revision:age)
 m4_define(libpcre_version, [0:1:0])
diff --git a/doc/pcrepattern.3 b/doc/pcrepattern.3
index 460a6f8..dda0b8e 100644
--- a/doc/pcrepattern.3
+++ b/doc/pcrepattern.3
@@ -2074,10 +2074,9 @@ the match runs for a very long time indeed because there are so many different
 ways the + and * repeats can carve up the subject, and all have to be tested
 before failure can be reported.
 .P
-At the end of a match, the values set for any capturing subpatterns are those
-from the outermost level of the recursion at which the subpattern value is set.
-If you want to obtain intermediate values, a callout function can be used (see
-below and the
+At the end of a match, the values of capturing parentheses are those from
+the outermost level. If you want to obtain intermediate values, a callout
+function can be used (see below and the
 .\" HREF
 \fBpcrecallout\fP
 .\"
@@ -2085,18 +2084,15 @@ documentation). If the pattern above is matched against
 .sp
   (ab(cd)ef)
 .sp
-the value for the capturing parentheses is "ef", which is the last value taken
-on at the top level. If additional parentheses are added, giving
-.sp
-  \e( ( ( [^()]++ | (?R) )* ) \e)
-     ^                     ^
-     ^                     ^
-.sp
-the string they capture is "ab(cd)ef", the contents of the top level
-parentheses. If there are more than 15 capturing parentheses in a pattern, PCRE
-has to obtain extra memory to store data during a recursion, which it does by
-using \fBpcre_malloc\fP, freeing it via \fBpcre_free\fP afterwards. If no
-memory can be obtained, the match fails with the PCRE_ERROR_NOMEMORY error.
+the value for the inner capturing parentheses (numbered 2) is "ef", which is
+the last value taken on at the top level. If a capturing subpattern is not
+matched at the top level, its final value is unset, even if it is (temporarily)
+set at a deeper level.
+.P
+If there are more than 15 capturing parentheses in a pattern, PCRE has to
+obtain extra memory to store data during a recursion, which it does by using
+\fBpcre_malloc\fP, freeing it via \fBpcre_free\fP afterwards. If no memory can
+be obtained, the match fails with the PCRE_ERROR_NOMEMORY error.
 .P
 Do not confuse the (?R) item with the condition (R), which tests for recursion.
 Consider this pattern, which matches text in angle brackets, allowing for
@@ -2207,10 +2203,11 @@ matches "sense and sensibility" and "response and responsibility", but not
 is used, it does match "sense and responsibility" as well as the other two
 strings. Another example is given in the discussion of DEFINE above.
 .P
-Like recursive subpatterns, a "subroutine" call is always treated as an atomic
+Like recursive subpatterns, a subroutine call is always treated as an atomic
 group. That is, once it has matched some of the subject string, it is never
 re-entered, even if it contains untried alternatives and there is a subsequent
-matching failure.
+matching failure. Any capturing parentheses that are set during the subroutine
+call revert to their previous values afterwards.
 .P
 When a subpattern is used as a subroutine, processing options such as
 case-independence are fixed when the subpattern is defined. They cannot be
@@ -2294,10 +2291,10 @@ a backtracking algorithm. With the exception of (*FAIL), which behaves like a
 failing negative assertion, they cause an error if encountered by
 \fBpcre_dfa_exec()\fP.
 .P
-If any of these verbs are used in an assertion subpattern, their effect is
-confined to that subpattern; it does not extend to the surrounding pattern.
-Note that assertion subpatterns are processed as anchored at the point where
-they are tested.
+If any of these verbs are used in an assertion or subroutine subpattern
+(including recursive subpatterns), their effect is confined to that subpattern;
+it does not extend to the surrounding pattern. Note that such subpatterns are
+processed as anchored at the point where they are tested.
 .P
 The new verbs make use of what was previously invalid syntax: an opening
 parenthesis followed by an asterisk. In Perl, they are generally of the form
@@ -2418,6 +2415,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 04 October 2009
+Last updated: 18 October 2009
 Copyright (c) 1997-2009 University of Cambridge.
 .fi
diff --git a/testdata/testinput11 b/testdata/testinput11
index 936bdb1..ad896fd 100644
--- a/testdata/testinput11
+++ b/testdata/testinput11
@@ -303,4 +303,37 @@
     ** Failers
     b\"11111
 
+/(?:(?1)|B)(A(*F)|C)/
+    ABCD
+    CCD
+    ** Failers
+    CAD
+
+/^(?:(?1)|B)(A(*F)|C)/
+    CCD
+    BCD
+    ** Failers
+    ABCD
+    CAD
+    BAD
+
+/(?:(?1)|B)(A(*ACCEPT)XX|C)D/
+    AAD
+    ACD
+    BAD
+    BCD
+    BAX
+    ** Failers
+    ACX
+    ABC
+
+/(?(DEFINE)(A))B(?1)C/
+    BAC
+
+/(?(DEFINE)((A)\2))B(?1)C/
+    BAAC
+
+/(?<pn> \( ( [^()]++ | (?&pn) )* \) )/x
+    (ab(cd)ef)
+
 /-- End of testinput11 --/
diff --git a/testdata/testinput2 b/testdata/testinput2
index 850242e..57188e4 100644
--- a/testdata/testinput2
+++ b/testdata/testinput2
@@ -3147,4 +3147,17 @@ a random value. /Ix
     xxxxabcde\P
     xxxxabcde\P\P
 
+/-- This is not in the Perl 5.10 test because Perl seems currently to be broken
+    and not behaving as specified in that it *does* bumpalong after hitting
+    (*COMMIT). --/
+
+/(?1)(A(*COMMIT)|B)D/
+    ABD
+    XABD
+    BAD
+    ABXABD
+    ** Failers
+    ABX
+    BAXBAD
+
 /-- End of testinput2 --/
diff --git a/testdata/testinput7 b/testdata/testinput7
index 2b42d10..710d9ee 100644
--- a/testdata/testinput7
+++ b/testdata/testinput7
@@ -4528,4 +4528,18 @@
     xxxxabcde\P
     xxxxabcde\P\P
 
+/(?:(?1)|B)(A(*F)|C)/
+    ABCD
+    CCD
+    ** Failers
+    CAD
+
+/^(?:(?1)|B)(A(*F)|C)/
+    CCD
+    BCD
+    ** Failers
+    ABCD
+    CAD
+    BAD
+
 /-- End of testinput7 --/
diff --git a/testdata/testoutput11 b/testdata/testoutput11
index 734339a..e901e0b 100644
--- a/testdata/testoutput11
+++ b/testdata/testoutput11
@@ -647,4 +647,69 @@ No match
     b\"11111
 No match
 
+/(?:(?1)|B)(A(*F)|C)/
+    ABCD
+ 0: BC
+ 1: C
+    CCD
+ 0: CC
+ 1: C
+    ** Failers
+No match
+    CAD
+No match
+
+/^(?:(?1)|B)(A(*F)|C)/
+    CCD
+ 0: CC
+ 1: C
+    BCD
+ 0: BC
+ 1: C
+    ** Failers
+No match
+    ABCD
+No match
+    CAD
+No match
+    BAD
+No match
+
+/(?:(?1)|B)(A(*ACCEPT)XX|C)D/
+    AAD
+ 0: AA
+ 1: A
+    ACD
+ 0: ACD
+ 1: C
+    BAD
+ 0: BA
+ 1: A
+    BCD
+ 0: BCD
+ 1: C
+    BAX
+ 0: BA
+ 1: A
+    ** Failers
+No match
+    ACX
+No match
+    ABC
+No match
+
+/(?(DEFINE)(A))B(?1)C/
+    BAC
+ 0: BAC
+
+/(?(DEFINE)((A)\2))B(?1)C/
+    BAAC
+ 0: BAAC
+
+/(?<pn> \( ( [^()]++ | (?&pn) )* \) )/x
+    (ab(cd)ef)
+ 0: (ab(cd)ef)
+ 1: (ab(cd)ef)
+ 2: ef
+
 /-- End of testinput11 --/
diff --git a/testdata/testoutput2 b/testdata/testoutput2
index 646478e..2a90af4 100644
--- a/testdata/testoutput2
+++ b/testdata/testoutput2
@@ -10407,4 +10407,28 @@ Partial match: abca
     xxxxabcde\P\P
 Partial match: abcde
 
+/-- This is not in the Perl 5.10 test because Perl seems currently to be broken
+    and not behaving as specified in that it *does* bumpalong after hitting
+    (*COMMIT). --/
+
+/(?1)(A(*COMMIT)|B)D/
+    ABD
+ 0: ABD
+ 1: B
+    XABD
+ 0: ABD
+ 1: B
+    BAD
+ 0: BAD
+ 1: A
+    ABXABD
+ 0: ABD
+ 1: B
+    ** Failers
+No match
+    ABX
+No match
+    BAXBAD
+No match
+
 /-- End of testinput2 --/
diff --git a/testdata/testoutput7 b/testdata/testoutput7
index c1b4fd0..c6c9df4 100644
--- a/testdata/testoutput7
+++ b/testdata/testoutput7
@@ -7560,4 +7560,28 @@ Partial match: abc1
     xxxxabcde\P\P
 Partial match: abcde
 
+/(?:(?1)|B)(A(*F)|C)/
+    ABCD
+ 0: BC
+    CCD
+ 0: CC
+    ** Failers
+No match
+    CAD
+No match
+
+/^(?:(?1)|B)(A(*F)|C)/
+    CCD
+ 0: CC
+    BCD
+ 0: BC
+    ** Failers
+No match
+    ABCD
+No match
+    CAD
+No match
+    BAD
+No match
+
 /-- End of testinput7 --/
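
The capture rule described in the doc/pcrepattern.3 hunk, and exercised by the last pattern added to testdata/testinput11, can be illustrated with a small hand-written model. This is only a sketch under stated assumptions: it is not PCRE code, it does not use the PCRE API, and the names `call_pn` and `search` are hypothetical. It hard-codes the one pattern `(?<pn> \( ( [^()]++ | (?&pn) )* \) )` and gives each nested call a scratch copy of the capture table, which is one simple way to mimic "captures set during the call revert afterwards".

```python
from typing import Dict, Optional


def call_pn(subject: str, pos: int, captures: Dict[int, str]) -> Optional[int]:
    # Toy model of group 1 in:  (?<pn> \( ( [^()]++ | (?&pn) )* \) )
    # Returns the end offset of the match starting at pos, or None.
    # `captures` holds the group values visible at the *current* level.
    if pos >= len(subject) or subject[pos] != "(":
        return None
    i = pos + 1
    while i < len(subject) and subject[i] != ")":
        if subject[i] == "(":
            # (?&pn): the nested call works on a scratch copy, so anything
            # it captures is discarded when the call returns.
            end = call_pn(subject, i, dict(captures))
            if end is None:
                return None
            # At this level, group 2 matched the whole nested call.
            captures[2] = subject[i:end]
            i = end
        else:
            # [^()]++ : a possessive run of non-parenthesis characters.
            j = i
            while j < len(subject) and subject[j] not in "()":
                j += 1
            captures[2] = subject[i:j]
            i = j
    if i >= len(subject):
        return None  # no closing parenthesis
    captures[1] = subject[pos:i + 1]
    return i + 1


def search(subject: str) -> Optional[Dict[int, str]]:
    # Unanchored search: try each start offset with a fresh capture table.
    for start in range(len(subject)):
        captures: Dict[int, str] = {}
        end = call_pn(subject, start, captures)
        if end is not None:
            return {0: subject[start:end], **captures}
    return None


if __name__ == "__main__":
    for number, value in sorted(search("(ab(cd)ef)").items()):
        print(f"{number:2d}: {value}")
```

For the subject `(ab(cd)ef)` the model reports the same three values as the expected output added to testdata/testoutput11: group 2 ends up as "ef", the last value set at the top level, and the "cd" captured inside the nested call is not visible afterwards. For `()` group 2 never matches at the top level, so the model leaves it unset.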