author    ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15>  2009-10-18 19:50:34 +0000
committer ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15>  2009-10-18 19:50:34 +0000
commit    fce3f37a66924340f6a8dc1e536e5ead40f4c2e4
tree      95323ddfd71ab35c1d83da94bfaf030cabf5a242
parent    bf3e4896d728bafe2515dd61bf484e0b84e84c89
Document more clearly capturing behaviour for recursion and subroutines.
git-svn-id: svn://vcs.exim.org/pcre/code/trunk@464 2f5784b3-3f2a-0410-8824-cb99058d5e15
-rw-r--r--  configure.ac             4
-rw-r--r--  doc/pcrepattern.3       43
-rw-r--r--  testdata/testinput11    33
-rw-r--r--  testdata/testinput2     13
-rw-r--r--  testdata/testinput7     14
-rw-r--r--  testdata/testoutput11   65
-rw-r--r--  testdata/testoutput2    24
-rw-r--r--  testdata/testoutput7    24
8 files changed, 195 insertions, 25 deletions
diff --git a/configure.ac b/configure.ac
index 2254be6..612810b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -8,8 +8,8 @@ dnl empty.
m4_define(pcre_major, [8])
m4_define(pcre_minor, [00])
-m4_define(pcre_prerelease, [-RC2])
-m4_define(pcre_date, [2009-10-17])
+m4_define(pcre_prerelease, [])
+m4_define(pcre_date, [2009-10-19])
# Libtool shared library interface versions (current:revision:age)
m4_define(libpcre_version, [0:1:0])
diff --git a/doc/pcrepattern.3 b/doc/pcrepattern.3
index 460a6f8..dda0b8e 100644
--- a/doc/pcrepattern.3
+++ b/doc/pcrepattern.3
@@ -2074,10 +2074,9 @@ the match runs for a very long time indeed because there are so many different
ways the + and * repeats can carve up the subject, and all have to be tested
before failure can be reported.
.P
-At the end of a match, the values set for any capturing subpatterns are those
-from the outermost level of the recursion at which the subpattern value is set.
-If you want to obtain intermediate values, a callout function can be used (see
-below and the
+At the end of a match, the values of capturing parentheses are those from
+the outermost level. If you want to obtain intermediate values, a callout
+function can be used (see below and the
.\" HREF
\fBpcrecallout\fP
.\"
@@ -2085,18 +2084,15 @@ documentation). If the pattern above is matched against
.sp
(ab(cd)ef)
.sp
-the value for the capturing parentheses is "ef", which is the last value taken
-on at the top level. If additional parentheses are added, giving
-.sp
- \e( ( ( [^()]++ | (?R) )* ) \e)
- ^ ^
- ^ ^
-.sp
-the string they capture is "ab(cd)ef", the contents of the top level
-parentheses. If there are more than 15 capturing parentheses in a pattern, PCRE
-has to obtain extra memory to store data during a recursion, which it does by
-using \fBpcre_malloc\fP, freeing it via \fBpcre_free\fP afterwards. If no
-memory can be obtained, the match fails with the PCRE_ERROR_NOMEMORY error.
+the value for the inner capturing parentheses (numbered 2) is "ef", which is
+the last value taken on at the top level. If a capturing subpattern is not
+matched at the top level, its final value is unset, even if it is (temporarily)
+set at a deeper level.
+.P
+If there are more than 15 capturing parentheses in a pattern, PCRE has to
+obtain extra memory to store data during a recursion, which it does by using
+\fBpcre_malloc\fP, freeing it via \fBpcre_free\fP afterwards. If no memory can
+be obtained, the match fails with the PCRE_ERROR_NOMEMORY error.
.P
Do not confuse the (?R) item with the condition (R), which tests for recursion.
Consider this pattern, which matches text in angle brackets, allowing for
@@ -2207,10 +2203,11 @@ matches "sense and sensibility" and "response and responsibility", but not
is used, it does match "sense and responsibility" as well as the other two
strings. Another example is given in the discussion of DEFINE above.
.P
-Like recursive subpatterns, a "subroutine" call is always treated as an atomic
+Like recursive subpatterns, a subroutine call is always treated as an atomic
group. That is, once it has matched some of the subject string, it is never
re-entered, even if it contains untried alternatives and there is a subsequent
-matching failure.
+matching failure. Any capturing parentheses that are set during the subroutine
+call revert to their previous values afterwards.
.P
When a subpattern is used as a subroutine, processing options such as
case-independence are fixed when the subpattern is defined. They cannot be
@@ -2294,10 +2291,10 @@ a backtracking algorithm. With the exception of (*FAIL), which behaves like a
failing negative assertion, they cause an error if encountered by
\fBpcre_dfa_exec()\fP.
.P
-If any of these verbs are used in an assertion subpattern, their effect is
-confined to that subpattern; it does not extend to the surrounding pattern.
-Note that assertion subpatterns are processed as anchored at the point where
-they are tested.
+If any of these verbs are used in an assertion or subroutine subpattern
+(including recursive subpatterns), their effect is confined to that subpattern;
+it does not extend to the surrounding pattern. Note that such subpatterns are
+processed as anchored at the point where they are tested.
.P
The new verbs make use of what was previously invalid syntax: an opening
parenthesis followed by an asterisk. In Perl, they are generally of the form
@@ -2418,6 +2415,6 @@ Cambridge CB2 3QH, England.
.rs
.sp
.nf
-Last updated: 04 October 2009
+Last updated: 18 October 2009
Copyright (c) 1997-2009 University of Cambridge.
.fi
diff --git a/testdata/testinput11 b/testdata/testinput11
index 936bdb1..ad896fd 100644
--- a/testdata/testinput11
+++ b/testdata/testinput11
@@ -303,4 +303,37 @@
** Failers
b\"11111
+/(?:(?1)|B)(A(*F)|C)/
+ ABCD
+ CCD
+ ** Failers
+ CAD
+
+/^(?:(?1)|B)(A(*F)|C)/
+ CCD
+ BCD
+ ** Failers
+ ABCD
+ CAD
+ BAD
+
+/(?:(?1)|B)(A(*ACCEPT)XX|C)D/
+ AAD
+ ACD
+ BAD
+ BCD
+ BAX
+ ** Failers
+ ACX
+ ABC
+
+/(?(DEFINE)(A))B(?1)C/
+ BAC
+
+/(?(DEFINE)((A)\2))B(?1)C/
+ BAAC
+
+/(?<pn> \( ( [^()]++ | (?&pn) )* \) )/x
+ (ab(cd)ef)
+
/-- End of testinput11 --/
diff --git a/testdata/testinput2 b/testdata/testinput2
index 850242e..57188e4 100644
--- a/testdata/testinput2
+++ b/testdata/testinput2
@@ -3147,4 +3147,17 @@ a random value. /Ix
xxxxabcde\P
xxxxabcde\P\P
+/-- This is not in the Perl 5.10 test because Perl seems currently to be broken
+ and not behaving as specified in that it *does* bumpalong after hitting
+ (*COMMIT). --/
+
+/(?1)(A(*COMMIT)|B)D/
+ ABD
+ XABD
+ BAD
+ ABXABD
+ ** Failers
+ ABX
+ BAXBAD
+
/-- End of testinput2 --/
diff --git a/testdata/testinput7 b/testdata/testinput7
index 2b42d10..710d9ee 100644
--- a/testdata/testinput7
+++ b/testdata/testinput7
@@ -4528,4 +4528,18 @@
xxxxabcde\P
xxxxabcde\P\P
+/(?:(?1)|B)(A(*F)|C)/
+ ABCD
+ CCD
+ ** Failers
+ CAD
+
+/^(?:(?1)|B)(A(*F)|C)/
+ CCD
+ BCD
+ ** Failers
+ ABCD
+ CAD
+ BAD
+
/-- End of testinput7 --/
diff --git a/testdata/testoutput11 b/testdata/testoutput11
index 734339a..e901e0b 100644
--- a/testdata/testoutput11
+++ b/testdata/testoutput11
@@ -647,4 +647,69 @@ No match
b\"11111
No match
+/(?:(?1)|B)(A(*F)|C)/
+ ABCD
+ 0: BC
+ 1: C
+ CCD
+ 0: CC
+ 1: C
+ ** Failers
+No match
+ CAD
+No match
+
+/^(?:(?1)|B)(A(*F)|C)/
+ CCD
+ 0: CC
+ 1: C
+ BCD
+ 0: BC
+ 1: C
+ ** Failers
+No match
+ ABCD
+No match
+ CAD
+No match
+ BAD
+No match
+
+/(?:(?1)|B)(A(*ACCEPT)XX|C)D/
+ AAD
+ 0: AA
+ 1: A
+ ACD
+ 0: ACD
+ 1: C
+ BAD
+ 0: BA
+ 1: A
+ BCD
+ 0: BCD
+ 1: C
+ BAX
+ 0: BA
+ 1: A
+ ** Failers
+No match
+ ACX
+No match
+ ABC
+No match
+
+/(?(DEFINE)(A))B(?1)C/
+ BAC
+ 0: BAC
+
+/(?(DEFINE)((A)\2))B(?1)C/
+ BAAC
+ 0: BAAC
+
+/(?<pn> \( ( [^()]++ | (?&pn) )* \) )/x
+ (ab(cd)ef)
+ 0: (ab(cd)ef)
+ 1: (ab(cd)ef)
+ 2: ef
+
/-- End of testinput11 --/
diff --git a/testdata/testoutput2 b/testdata/testoutput2
index 646478e..2a90af4 100644
--- a/testdata/testoutput2
+++ b/testdata/testoutput2
@@ -10407,4 +10407,28 @@ Partial match: abca
xxxxabcde\P\P
Partial match: abcde
+/-- This is not in the Perl 5.10 test because Perl seems currently to be broken
+ and not behaving as specified in that it *does* bumpalong after hitting
+ (*COMMIT). --/
+
+/(?1)(A(*COMMIT)|B)D/
+ ABD
+ 0: ABD
+ 1: B
+ XABD
+ 0: ABD
+ 1: B
+ BAD
+ 0: BAD
+ 1: A
+ ABXABD
+ 0: ABD
+ 1: B
+ ** Failers
+No match
+ ABX
+No match
+ BAXBAD
+No match
+
/-- End of testinput2 --/
diff --git a/testdata/testoutput7 b/testdata/testoutput7
index c1b4fd0..c6c9df4 100644
--- a/testdata/testoutput7
+++ b/testdata/testoutput7
@@ -7560,4 +7560,28 @@ Partial match: abc1
xxxxabcde\P\P
Partial match: abcde
+/(?:(?1)|B)(A(*F)|C)/
+ ABCD
+ 0: BC
+ CCD
+ 0: CC
+ ** Failers
+No match
+ CAD
+No match
+
+/^(?:(?1)|B)(A(*F)|C)/
+ CCD
+ 0: CC
+ BCD
+ 0: BC
+ ** Failers
+No match
+ ABCD
+No match
+ CAD
+No match
+ BAD
+No match
+
/-- End of testinput7 --/
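
Note on the doc/pcrepattern.3 change: the capturing rule it documents (final values come from the outermost level, and anything set inside a recursion is discarded) can be illustrated with a small hand-written matcher. This is only a sketch that simulates the pattern \( ( [^()]++ | (?R) )* \) in plain Python. It does not call PCRE, it tries the match at offset 0 only, and it does no backtracking. The names match_group and final_captures are hypothetical, and group 1 here stands for the inner capturing parentheses of that pattern.

```python
def match_group(subject, pos, captures):
    r"""Simulate  \( ( [^()]++ | (?R) )* \)  starting at offset pos.

    Returns the end offset of the match, or None on failure. `captures`
    maps group number -> captured text at the current recursion level.
    This is an illustration of the documented rule, not PCRE itself.
    """
    if pos >= len(subject) or subject[pos] != "(":
        return None
    pos += 1
    while True:
        # Alternative 1: [^()]++ (a run of non-parenthesis characters).
        start = pos
        while pos < len(subject) and subject[pos] not in "()":
            pos += 1
        if pos > start:
            captures[1] = subject[start:pos]
            continue
        # Alternative 2: (?R). Whatever the recursion captured is thrown
        # away afterwards, so restore the values saved before the call.
        saved = dict(captures)
        end = match_group(subject, pos, captures)
        captures.clear()
        captures.update(saved)
        if end is None:
            break
        # At this level, group 1 wraps the whole recursive call.
        captures[1] = subject[pos:end]
        pos = end
    if pos < len(subject) and subject[pos] == ")":
        return pos + 1
    return None


def final_captures(subject):
    """Return (whole match, final value of group 1), or None if no match."""
    captures = {}
    end = match_group(subject, 0, captures)
    if end is None:
        return None
    return subject[:end], captures.get(1)


if __name__ == "__main__":
    print(final_captures("(ab(cd)ef)"))
    print(final_captures("(ab(cd))"))
```

For (ab(cd)ef) the sketch leaves the inner group holding "ef", which agrees with the value recorded for the inner parentheses in testoutput11. For (ab(cd)) it holds "(cd)" rather than "cd", because the "cd" value was set one level down and reverted when the recursion returned.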