author ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> 2018-07-21 14:34:51 +0000
committer ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> 2018-07-21 14:34:51 +0000
commit 1ad8a5e6add80b53753a4b78589ff41fc58dad18
tree 0f98c3150bddb2e2e6e617bfacd337369cd23061 /doc
parent 8d3008803ffa16d46260e7e1ad087f62e1ca0a28
Allow :NAME on (*ACCEPT), (*FAIL), and (*COMMIT) and fix bug with (*MARK)
followed by (*ACCEPT) in an assertion. More small updates to perltest.sh.

git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@968 6239d852-aaf2-0410-a92c-79f79f948069
Diffstat (limited to 'doc')
-rw-r--r--  doc/html/pcre2pattern.html   98
-rw-r--r--  doc/html/pcre2syntax.html    14
-rw-r--r--  doc/html/pcre2test.html      11
-rw-r--r--  doc/pcre2.txt               249
-rw-r--r--  doc/pcre2pattern.3           96
-rw-r--r--  doc/pcre2syntax.3            17
-rw-r--r--  doc/pcre2test.1              13
-rw-r--r--  doc/pcre2test.txt            12
8 files changed, 274 insertions, 236 deletions
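
The naming rules that this commit documents can be sketched with a small toy model. The Python below is an illustration under stated assumptions, not PCRE2 code: it calls no PCRE2 API, and the names `expand`, `name_passed_back`, and `skip_target` are hypothetical helpers invented for this sketch. A matching path is modelled as a list of (verb, name) pairs in the order the matcher passed them. The model covers only the three rules stated in the updated text: (*ACCEPT:NAME) and (*FAIL:NAME) behave like (*MARK:NAME) followed by the bare verb, the last name on the path is passed back to the caller, and (*SKIP:NAME) sees only names set by (*MARK).

```python
# Toy model only: this is NOT PCRE2 and does not call any PCRE2 function.
# A "matching path" is a list of (verb, name) pairs in the order they were
# passed; name is None when the verb was written without :NAME.

# Verbs whose :NAME is remembered for passing back to the caller.
NAME_SETTING_VERBS = {"MARK", "COMMIT", "PRUNE", "THEN"}


def expand(path):
    """Rewrite (*ACCEPT:NAME)/(*FAIL:NAME) as (*MARK:NAME) plus the bare verb."""
    out = []
    for verb, name in path:
        if verb in ("ACCEPT", "FAIL") and name is not None:
            out.append(("MARK", name))
            out.append((verb, None))
        else:
            out.append((verb, name))
    return out


def name_passed_back(path):
    """Return the last name set on the path, or None if no name was set."""
    last = None
    for verb, name in expand(path):
        if verb in NAME_SETTING_VERBS and name is not None:
            last = name
    return last


def skip_target(path, wanted):
    """Return the index (in the expanded path) of the most recent
    (*MARK:wanted), or None, meaning the (*SKIP:wanted) is ignored.
    Names set by COMMIT, PRUNE, or THEN are deliberately not considered."""
    expanded = expand(path)
    for i in range(len(expanded) - 1, -1, -1):
        if expanded[i] == ("MARK", wanted):
            return i
    return None


# Example path: (*MARK:A) ... (*COMMIT:B)
example = [("MARK", "A"), ("COMMIT", "B")]
print(name_passed_back(example))   # B
print(skip_target(example, "B"))   # None
print(skip_target(example, "A"))   # 0
```

In this model (*COMMIT:B) overrides the name passed back but stays invisible to a later (*SKIP:B), which is the distinction the new (*COMMIT:NAME) paragraph draws against (*MARK:NAME)(*COMMIT). The model says nothing about subject positions, atomic groups, or assertions.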
diff --git a/doc/html/pcre2pattern.html b/doc/html/pcre2pattern.html
index 4d68862..17d94f0 100644
--- a/doc/html/pcre2pattern.html
+++ b/doc/html/pcre2pattern.html
@@ -3122,17 +3122,16 @@ in the
documentation.
</P>
<P>
-Experiments with Perl suggest that it too has similar optimizations, sometimes
-leading to anomalous results.
+Experiments with Perl suggest that it too has similar optimizations, and like
+PCRE2, turning them off can change the result of a match.
</P>
<br><b>
Verbs that act immediately
</b><br>
<P>
-The following verbs act as soon as they are encountered. They may not be
-followed by a name.
+The following verbs act as soon as they are encountered.
<pre>
- (*ACCEPT)
+ (*ACCEPT) or (*ACCEPT:NAME)
</pre>
This verb causes the match to end successfully, skipping the remainder of the
pattern. However, when it is inside a subpattern that is called as a
@@ -3149,19 +3148,23 @@ example:
This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is captured by
the outer parentheses.
<pre>
- (*FAIL) or (*F)
+ (*FAIL) or (*FAIL:NAME)
</pre>
-This verb causes a matching failure, forcing backtracking to occur. It is
-equivalent to (?!) but easier to read. The Perl documentation notes that it is
-probably useful only when combined with (?{}) or (??{}). Those are, of course,
-Perl features that are not present in PCRE2. The nearest equivalent is the
-callout feature, as for example in this pattern:
+This verb causes a matching failure, forcing backtracking to occur. It may be
+abbreviated to (*F). It is equivalent to (?!) but easier to read. The Perl
+documentation notes that it is probably useful only when combined with (?{}) or
+(??{}). Those are, of course, Perl features that are not present in PCRE2. The
+nearest equivalent is the callout feature, as for example in this pattern:
<pre>
a+(?C)(*FAIL)
</pre>
A match with the string "aaaa" always fails, but the callout is taken before
each backtrack happens (in this example, 10 times).
</P>
+<P>
+(*ACCEPT:NAME) and (*FAIL:NAME) behave exactly the same as
+(*MARK:NAME)(*ACCEPT) and (*MARK:NAME)(*FAIL), respectively.
+</P>
<br><b>
Recording which path was taken
</b><br>
@@ -3186,9 +3189,9 @@ assertions and atomic groups. (There are differences in those cases when
(*MARK) is used in conjunction with (*SKIP) as described below.)
</P>
<P>
-As well as (*MARK), the (*PRUNE) and (*THEN) verbs may have associated NAME
-arguments. Whichever is last on the matching path is passed back. See below for
-more details of these other verbs.
+As well as (*MARK), the (*COMMIT), (*PRUNE) and (*THEN) verbs may have
+associated NAME arguments. Whichever is last on the matching path is passed
+back. See below for more details of these other verbs.
</P>
<P>
Here is an example of <b>pcre2test</b> output, where the "mark" modifier
@@ -3250,22 +3253,25 @@ reaches them. The behaviour described below is what happens when the verb is
not in a subroutine or an assertion. Subsequent sections cover these special
cases.
<pre>
- (*COMMIT)
+ (*COMMIT) or (*COMMIT:NAME)
</pre>
-This verb, which may not be followed by a name, causes the whole match to fail
-outright if there is a later matching failure that causes backtracking to reach
-it. Even if the pattern is unanchored, no further attempts to find a match by
-advancing the starting point take place. If (*COMMIT) is the only backtracking
-verb that is encountered, once it has been passed <b>pcre2_match()</b> is
-committed to finding a match at the current starting point, or not at all. For
-example:
+This verb causes the whole match to fail outright if there is a later matching
+failure that causes backtracking to reach it. Even if the pattern is
+unanchored, no further attempts to find a match by advancing the starting point
+take place. If (*COMMIT) is the only backtracking verb that is encountered,
+once it has been passed <b>pcre2_match()</b> is committed to finding a match at
+the current starting point, or not at all. For example:
<pre>
a+(*COMMIT)b
</pre>
This matches "xxaab" but not "aacaab". It can be thought of as a kind of
-dynamic anchor, or "I've started, so I must finish." The name of the most
-recently passed (*MARK) in the path is passed back when (*COMMIT) forces a
-match failure.
+dynamic anchor, or "I've started, so I must finish."
+</P>
+<P>
+The behaviour of (*COMMIT:NAME) is not the same as (*MARK:NAME)(*COMMIT). It is
+like (*MARK:NAME) in that the name is remembered for passing back to the
+caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
+ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).
</P>
<P>
If there is more than one backtracking verb in a pattern, a different one that
@@ -3309,7 +3315,7 @@ as (*COMMIT).
The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE). It is
like (*MARK:NAME) in that the name is remembered for passing back to the
caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
-ignoring those set by (*PRUNE) or (*THEN).
+ignoring those set by (*COMMIT), (*PRUNE) or (*THEN).
<pre>
(*SKIP)
</pre>
@@ -3317,7 +3323,7 @@ This verb, when given without a name, is like (*PRUNE), except that if the
pattern is unanchored, the "bumpalong" advance is not to the next character,
but to the position in the subject where (*SKIP) was encountered. (*SKIP)
signifies that whatever text was matched leading up to it cannot be part of a
-successful match. Consider:
+successful match if there is a later mismatch. Consider:
<pre>
a+(*SKIP)b
</pre>
@@ -3364,7 +3370,7 @@ the second branch of the pattern.
</P>
<P>
Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores
-names that are set by (*PRUNE:NAME) or (*THEN:NAME).
+names that are set by (*COMMIT:NAME), (*PRUNE:NAME) or (*THEN:NAME).
<pre>
(*THEN) or (*THEN:NAME)
</pre>
@@ -3383,10 +3389,10 @@ more alternatives, so there is a backtrack to whatever came before the entire
group. If (*THEN) is not inside an alternation, it acts like (*PRUNE).
</P>
<P>
-The behaviour of (*THEN:NAME) is the not the same as (*MARK:NAME)(*THEN).
-It is like (*MARK:NAME) in that the name is remembered for passing back to the
+The behaviour of (*THEN:NAME) is not the same as (*MARK:NAME)(*THEN). It is
+like (*MARK:NAME) in that the name is remembered for passing back to the
caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
-ignoring those set by (*PRUNE) and (*THEN).
+ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).
</P>
<P>
A subpattern that does not contain a | character is just a part of the
@@ -3461,13 +3467,14 @@ onto (*COMMIT).
Backtracking verbs in repeated groups
</b><br>
<P>
-PCRE2 differs from Perl in its handling of backtracking verbs in repeated
-groups. For example, consider:
+PCRE2 sometimes differs from Perl in its handling of backtracking verbs in
+repeated groups. For example, consider:
<pre>
/(a(*COMMIT)b)+ac/
</pre>
-If the subject is "abac", Perl matches, but PCRE2 fails because the (*COMMIT)
-in the second repeat of the group acts.
+If the subject is "abac", Perl matches unless its optimizations are disabled,
+but PCRE2 always fails because the (*COMMIT) in the second repeat of the group
+acts.
<a name="btassert"></a></P>
<br><b>
Backtracking verbs in assertions
@@ -3480,9 +3487,10 @@ subpattern.
</P>
<P>
(*ACCEPT) in a standalone positive assertion causes the assertion to succeed
-without any further processing; captured strings are retained. In a standalone
-negative assertion, (*ACCEPT) causes the assertion to fail without any further
-processing; captured substrings are discarded.
+without any further processing; captured strings and a (*MARK) name (if set)
+are retained. In a standalone negative assertion, (*ACCEPT) causes the
+assertion to fail without any further processing; captured substrings and any
+(*MARK) name are discarded.
</P>
<P>
If the assertion is a condition, (*ACCEPT) causes the condition to be true for
@@ -3515,16 +3523,16 @@ Backtracking verbs in subroutines
</b><br>
<P>
These behaviours occur whether or not the subpattern is called recursively.
-Perl's treatment of subroutines is different in some cases.
-</P>
-<P>
-(*FAIL) in a subpattern called as a subroutine has its normal effect: it forces
-an immediate backtrack.
</P>
<P>
(*ACCEPT) in a subpattern called as a subroutine causes the subroutine match to
succeed without any further processing. Matching then continues after the
-subroutine call.
+subroutine call. Perl documents this behaviour. Perl's treatment of the other
+verbs in subroutines is different in some cases.
+</P>
+<P>
+(*FAIL) in a subpattern called as a subroutine has its normal effect: it forces
+an immediate backtrack.
</P>
<P>
(*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine cause
@@ -3551,7 +3559,7 @@ Cambridge, England.
</P>
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 16 July 2018
+Last updated: 20 July 2018
<br>
Copyright &copy; 1997-2018 University of Cambridge.
<br>
diff --git a/doc/html/pcre2syntax.html b/doc/html/pcre2syntax.html
index dee937e..63b8fb7 100644
--- a/doc/html/pcre2syntax.html
+++ b/doc/html/pcre2syntax.html
@@ -569,7 +569,11 @@ condition if the relevant named group exists.
</P>
<br><a name="SEC23" href="#TOC1">BACKTRACKING CONTROL</a><br>
<P>
-The following act immediately they are reached:
+All backtracking control verbs may be in the form (*VERB:NAME). For (*MARK) the
+name is mandatory, for the others it is optional. (*SKIP) changes its behaviour
+if :NAME is present. The others just set a name for passing back to the caller,
+but this is not a name that (*SKIP) can see. The following act immediately they
+are reached:
<pre>
(*ACCEPT) force successful match
(*FAIL) force backtrack; synonym (*F)
@@ -582,13 +586,13 @@ pattern is not anchored.
<pre>
(*COMMIT) overall failure, no advance of starting point
(*PRUNE) advance to next starting character
- (*PRUNE:NAME) equivalent to (*MARK:NAME)(*PRUNE)
(*SKIP) advance to current matching position
(*SKIP:NAME) advance to position corresponding to an earlier
(*MARK:NAME); if not found, the (*SKIP) is ignored
(*THEN) local failure, backtrack to next alternation
- (*THEN:NAME) equivalent to (*MARK:NAME)(*THEN)
-</PRE>
+</pre>
+The effect of one of these verbs in a group called as a subroutine is confined
+to the subroutine call.
</P>
<br><a name="SEC24" href="#TOC1">CALLOUTS</a><br>
<P>
@@ -617,7 +621,7 @@ Cambridge, England.
</P>
<br><a name="SEC27" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 07 July 2018
+Last updated: 21 July 2018
<br>
Copyright &copy; 1997-2018 University of Cambridge.
<br>
diff --git a/doc/html/pcre2test.html b/doc/html/pcre2test.html
index 58adefd..a28db98 100644
--- a/doc/html/pcre2test.html
+++ b/doc/html/pcre2test.html
@@ -410,10 +410,11 @@ patterns. Modifiers on a pattern can change these settings.
The appearance of this line causes all subsequent modifier settings to be
checked for compatibility with the <b>perltest.sh</b> script, which is used to
confirm that Perl gives the same results as PCRE2. Also, apart from comment
-lines, none of the other command lines are permitted, because they and many
-of the modifiers are specific to <b>pcre2test</b>, and should not be used in
-test files that are also processed by <b>perltest.sh</b>. The <b>#perltest</b>
-command helps detect tests that are accidentally put in the wrong file.
+lines, #pattern commands, and #subject commands that set or unset "mark", no
+command lines are permitted, because they and many of the modifiers are
+specific to <b>pcre2test</b>, and should not be used in test files that are also
+processed by <b>perltest.sh</b>. The <b>#perltest</b> command helps detect tests
+that are accidentally put in the wrong file.
<pre>
#pop [&#60;modifiers&#62;]
#popcopy [&#60;modifiers&#62;]
@@ -2003,7 +2004,7 @@ Cambridge, England.
</P>
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 16 July 2018
+Last updated: 21 July 2018
<br>
Copyright &copy; 1997-2018 University of Cambridge.
<br>
diff --git a/doc/pcre2.txt b/doc/pcre2.txt
index b6caf06..abd5560 100644
--- a/doc/pcre2.txt
+++ b/doc/pcre2.txt
@@ -8601,44 +8601,46 @@ BACKTRACKING CONTROL
in the pcre2api documentation.
Experiments with Perl suggest that it too has similar optimizations,
- sometimes leading to anomalous results.
+ and like PCRE2, turning them off can change the result of a match.
Verbs that act immediately
- The following verbs act as soon as they are encountered. They may not
- be followed by a name.
+ The following verbs act as soon as they are encountered.
- (*ACCEPT)
+ (*ACCEPT) or (*ACCEPT:NAME)
- This verb causes the match to end successfully, skipping the remainder
- of the pattern. However, when it is inside a subpattern that is called
- as a subroutine, only that subpattern is ended successfully. Matching
+ This verb causes the match to end successfully, skipping the remainder
+ of the pattern. However, when it is inside a subpattern that is called
+ as a subroutine, only that subpattern is ended successfully. Matching
then continues at the outer level. If (*ACCEPT) in triggered in a posi-
- tive assertion, the assertion succeeds; in a negative assertion, the
+ tive assertion, the assertion succeeds; in a negative assertion, the
assertion fails.
- If (*ACCEPT) is inside capturing parentheses, the data so far is cap-
+ If (*ACCEPT) is inside capturing parentheses, the data so far is cap-
tured. For example:
A((?:A|B(*ACCEPT)|C)D)
- This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is cap-
+ This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is cap-
tured by the outer parentheses.
- (*FAIL) or (*F)
+ (*FAIL) or (*FAIL:NAME)
- This verb causes a matching failure, forcing backtracking to occur. It
- is equivalent to (?!) but easier to read. The Perl documentation notes
- that it is probably useful only when combined with (?{}) or (??{}).
- Those are, of course, Perl features that are not present in PCRE2. The
- nearest equivalent is the callout feature, as for example in this pat-
- tern:
+ This verb causes a matching failure, forcing backtracking to occur. It
+ may be abbreviated to (*F). It is equivalent to (?!) but easier to
+ read. The Perl documentation notes that it is probably useful only when
+ combined with (?{}) or (??{}). Those are, of course, Perl features that
+ are not present in PCRE2. The nearest equivalent is the callout fea-
+ ture, as for example in this pattern:
a+(?C)(*FAIL)
- A match with the string "aaaa" always fails, but the callout is taken
+ A match with the string "aaaa" always fails, but the callout is taken
before each backtrack happens (in this example, 10 times).
+ (*ACCEPT:NAME) and (*FAIL:NAME) behave exactly the same as
+ (*MARK:NAME)(*ACCEPT) and (*MARK:NAME)(*FAIL), respectively.
+
Recording which path was taken
There is one verb whose main purpose is to track how a match was
@@ -8659,9 +8661,9 @@ BACKTRACKING CONTROL
cases when (*MARK) is used in conjunction with (*SKIP) as described
below.)
- As well as (*MARK), the (*PRUNE) and (*THEN) verbs may have associated
- NAME arguments. Whichever is last on the matching path is passed back.
- See below for more details of these other verbs.
+ As well as (*MARK), the (*COMMIT), (*PRUNE) and (*THEN) verbs may have
+ associated NAME arguments. Whichever is last on the matching path is
+ passed back. See below for more details of these other verbs.
Here is an example of pcre2test output, where the "mark" modifier
requests the retrieval and outputting of (*MARK) data:
@@ -8717,22 +8719,26 @@ BACKTRACKING CONTROL
when the verb is not in a subroutine or an assertion. Subsequent sec-
tions cover these special cases.
- (*COMMIT)
+ (*COMMIT) or (*COMMIT:NAME)
- This verb, which may not be followed by a name, causes the whole match
- to fail outright if there is a later matching failure that causes back-
- tracking to reach it. Even if the pattern is unanchored, no further
- attempts to find a match by advancing the starting point take place. If
- (*COMMIT) is the only backtracking verb that is encountered, once it
- has been passed pcre2_match() is committed to finding a match at the
- current starting point, or not at all. For example:
+ This verb causes the whole match to fail outright if there is a later
+ matching failure that causes backtracking to reach it. Even if the pat-
+ tern is unanchored, no further attempts to find a match by advancing
+ the starting point take place. If (*COMMIT) is the only backtracking
+ verb that is encountered, once it has been passed pcre2_match() is com-
+ mitted to finding a match at the current starting point, or not at all.
+ For example:
a+(*COMMIT)b
This matches "xxaab" but not "aacaab". It can be thought of as a kind
- of dynamic anchor, or "I've started, so I must finish." The name of the
- most recently passed (*MARK) in the path is passed back when (*COMMIT)
- forces a match failure.
+ of dynamic anchor, or "I've started, so I must finish."
+
+ The behaviour of (*COMMIT:NAME) is not the same as (*MARK:NAME)(*COM-
+ MIT). It is like (*MARK:NAME) in that the name is remembered for pass-
+ ing back to the caller. However, (*SKIP:NAME) searches only for names
+ set with (*MARK), ignoring those set by (*COMMIT), (*PRUNE) and
+ (*THEN).
If there is more than one backtracking verb in a pattern, a different
one that follows (*COMMIT) may be triggered first, so merely passing
@@ -8776,7 +8782,7 @@ BACKTRACKING CONTROL
The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE).
It is like (*MARK:NAME) in that the name is remembered for passing back
to the caller. However, (*SKIP:NAME) searches only for names set with
- (*MARK), ignoring those set by (*PRUNE) or (*THEN).
+ (*MARK), ignoring those set by (*COMMIT), (*PRUNE) or (*THEN).
(*SKIP)
@@ -8784,29 +8790,30 @@ BACKTRACKING CONTROL
the pattern is unanchored, the "bumpalong" advance is not to the next
character, but to the position in the subject where (*SKIP) was encoun-
tered. (*SKIP) signifies that whatever text was matched leading up to
- it cannot be part of a successful match. Consider:
+ it cannot be part of a successful match if there is a later mismatch.
+ Consider:
a+(*SKIP)b
- If the subject is "aaaac...", after the first match attempt fails
- (starting at the first character in the string), the starting point
+ If the subject is "aaaac...", after the first match attempt fails
+ (starting at the first character in the string), the starting point
skips on to start the next attempt at "c". Note that a possessive quan-
- tifer does not have the same effect as this example; although it would
- suppress backtracking during the first match attempt, the second
- attempt would start at the second character instead of skipping on to
+ tifer does not have the same effect as this example; although it would
+ suppress backtracking during the first match attempt, the second
+ attempt would start at the second character instead of skipping on to
"c".
(*SKIP:NAME)
- When (*SKIP) has an associated name, its behaviour is modified. When
- such a (*SKIP) is triggered, the previous path through the pattern is
- searched for the most recent (*MARK) that has the same name. If one is
- found, the "bumpalong" advance is to the subject position that corre-
- sponds to that (*MARK) instead of to where (*SKIP) was encountered. If
+ When (*SKIP) has an associated name, its behaviour is modified. When
+ such a (*SKIP) is triggered, the previous path through the pattern is
+ searched for the most recent (*MARK) that has the same name. If one is
+ found, the "bumpalong" advance is to the subject position that corre-
+ sponds to that (*MARK) instead of to where (*SKIP) was encountered. If
no (*MARK) with a matching name is found, the (*SKIP) is ignored.
- The search for a (*MARK) name uses the normal backtracking mechanism,
- which means that it does not see (*MARK) settings that are inside
+ The search for a (*MARK) name uses the normal backtracking mechanism,
+ which means that it does not see (*MARK) settings that are inside
atomic groups or assertions, because they are never re-entered by back-
tracking. Compare the following pcre2test examples:
@@ -8820,18 +8827,19 @@ BACKTRACKING CONTROL
0: b
1: b
- In the first example, the (*MARK) setting is in an atomic group, so it
+ In the first example, the (*MARK) setting is in an atomic group, so it
is not seen when (*SKIP:X) triggers, causing the (*SKIP) to be ignored.
- This allows the second branch of the pattern to be tried at the first
- character position. In the second example, the (*MARK) setting is not
- in an atomic group. This allows (*SKIP:X) to find the (*MARK) when it
+ This allows the second branch of the pattern to be tried at the first
+ character position. In the second example, the (*MARK) setting is not
+ in an atomic group. This allows (*SKIP:X) to find the (*MARK) when it
backtracks, and this causes a new matching attempt to start at the sec-
- ond character. This time, the (*MARK) is never seen because "a" does
+ ond character. This time, the (*MARK) is never seen because "a" does
not match "b", so the matcher immediately jumps to the second branch of
the pattern.
- Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It
- ignores names that are set by (*PRUNE:NAME) or (*THEN:NAME).
+ Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It
+ ignores names that are set by (*COMMIT:NAME), (*PRUNE:NAME) or
+ (*THEN:NAME).
(*THEN) or (*THEN:NAME)
@@ -8850,87 +8858,87 @@ BACKTRACKING CONTROL
track to whatever came before the entire group. If (*THEN) is not
inside an alternation, it acts like (*PRUNE).
- The behaviour of (*THEN:NAME) is the not the same as
- (*MARK:NAME)(*THEN). It is like (*MARK:NAME) in that the name is
- remembered for passing back to the caller. However, (*SKIP:NAME)
- searches only for names set with (*MARK), ignoring those set by
- (*PRUNE) and (*THEN).
-
- A subpattern that does not contain a | character is just a part of the
- enclosing alternative; it is not a nested alternation with only one
- alternative. The effect of (*THEN) extends beyond such a subpattern to
- the enclosing alternative. Consider this pattern, where A, B, etc. are
- complex pattern fragments that do not contain any | characters at this
+ The behaviour of (*THEN:NAME) is not the same as (*MARK:NAME)(*THEN).
+ It is like (*MARK:NAME) in that the name is remembered for passing back
+ to the caller. However, (*SKIP:NAME) searches only for names set with
+ (*MARK), ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).
+
+ A subpattern that does not contain a | character is just a part of the
+ enclosing alternative; it is not a nested alternation with only one
+ alternative. The effect of (*THEN) extends beyond such a subpattern to
+ the enclosing alternative. Consider this pattern, where A, B, etc. are
+ complex pattern fragments that do not contain any | characters at this
level:
A (B(*THEN)C) | D
- If A and B are matched, but there is a failure in C, matching does not
+ If A and B are matched, but there is a failure in C, matching does not
backtrack into A; instead it moves to the next alternative, that is, D.
- However, if the subpattern containing (*THEN) is given an alternative,
+ However, if the subpattern containing (*THEN) is given an alternative,
it behaves differently:
A (B(*THEN)C | (*FAIL)) | D
- The effect of (*THEN) is now confined to the inner subpattern. After a
+ The effect of (*THEN) is now confined to the inner subpattern. After a
failure in C, matching moves to (*FAIL), which causes the whole subpat-
- tern to fail because there are no more alternatives to try. In this
+ tern to fail because there are no more alternatives to try. In this
case, matching does now backtrack into A.
- Note that a conditional subpattern is not considered as having two
- alternatives, because only one is ever used. In other words, the |
+ Note that a conditional subpattern is not considered as having two
+ alternatives, because only one is ever used. In other words, the |
character in a conditional subpattern has a different meaning. Ignoring
white space, consider:
^.*? (?(?=a) a | b(*THEN)c )
- If the subject is "ba", this pattern does not match. Because .*? is
- ungreedy, it initially matches zero characters. The condition (?=a)
- then fails, the character "b" is matched, but "c" is not. At this
- point, matching does not backtrack to .*? as might perhaps be expected
- from the presence of the | character. The conditional subpattern is
+ If the subject is "ba", this pattern does not match. Because .*? is
+ ungreedy, it initially matches zero characters. The condition (?=a)
+ then fails, the character "b" is matched, but "c" is not. At this
+ point, matching does not backtrack to .*? as might perhaps be expected
+ from the presence of the | character. The conditional subpattern is
part of the single alternative that comprises the whole pattern, and so
- the match fails. (If there was a backtrack into .*?, allowing it to
+ the match fails. (If there was a backtrack into .*?, allowing it to
match "b", the match would succeed.)
- The verbs just described provide four different "strengths" of control
+ The verbs just described provide four different "strengths" of control
when subsequent matching fails. (*THEN) is the weakest, carrying on the
- match at the next alternative. (*PRUNE) comes next, failing the match
- at the current starting position, but allowing an advance to the next
- character (for an unanchored pattern). (*SKIP) is similar, except that
+ match at the next alternative. (*PRUNE) comes next, failing the match
+ at the current starting position, but allowing an advance to the next
+ character (for an unanchored pattern). (*SKIP) is similar, except that
the advance may be more than one character. (*COMMIT) is the strongest,
causing the entire match to fail.
More than one backtracking verb
- If more than one backtracking verb is present in a pattern, the one
- that is backtracked onto first acts. For example, consider this pat-
+ If more than one backtracking verb is present in a pattern, the one
+ that is backtracked onto first acts. For example, consider this pat-
tern, where A, B, etc. are complex pattern fragments:
(A(*COMMIT)B(*THEN)C|ABD)
- If A matches but B fails, the backtrack to (*COMMIT) causes the entire
+ If A matches but B fails, the backtrack to (*COMMIT) causes the entire
match to fail. However, if A and B match, but C fails, the backtrack to
- (*THEN) causes the next alternative (ABD) to be tried. This behaviour
- is consistent, but is not always the same as Perl's. It means that if
- two or more backtracking verbs appear in succession, all the the last
+ (*THEN) causes the next alternative (ABD) to be tried. This behaviour
+ is consistent, but is not always the same as Perl's. It means that if
+ two or more backtracking verbs appear in succession, all the the last
of them has no effect. Consider this example:
...(*COMMIT)(*PRUNE)...
If there is a matching failure to the right, backtracking onto (*PRUNE)
- causes it to be triggered, and its action is taken. There can never be
+ causes it to be triggered, and its action is taken. There can never be
a backtrack onto (*COMMIT).
Backtracking verbs in repeated groups
- PCRE2 differs from Perl in its handling of backtracking verbs in
- repeated groups. For example, consider:
+ PCRE2 sometimes differs from Perl in its handling of backtracking verbs
+ in repeated groups. For example, consider:
/(a(*COMMIT)b)+ac/
- If the subject is "abac", Perl matches, but PCRE2 fails because the
- (*COMMIT) in the second repeat of the group acts.
+ If the subject is "abac", Perl matches unless its optimizations are
+ disabled, but PCRE2 always fails because the (*COMMIT) in the second
+ repeat of the group acts.
Backtracking verbs in assertions
@@ -8940,44 +8948,46 @@ BACKTRACKING CONTROL
in a conditional subpattern.
(*ACCEPT) in a standalone positive assertion causes the assertion to
- succeed without any further processing; captured strings are retained.
- In a standalone negative assertion, (*ACCEPT) causes the assertion to
- fail without any further processing; captured substrings are discarded.
+ succeed without any further processing; captured strings and a (*MARK)
+ name (if set) are retained. In a standalone negative assertion,
+ (*ACCEPT) causes the assertion to fail without any further processing;
+ captured substrings and any (*MARK) name are discarded.
- If the assertion is a condition, (*ACCEPT) causes the condition to be
- true for a positive assertion and false for a negative one; captured
+ If the assertion is a condition, (*ACCEPT) causes the condition to be
+ true for a positive assertion and false for a negative one; captured
substrings are retained in both cases.
The remaining verbs act only when a later failure causes a backtrack to
- reach them. This means that their effect is confined to the assertion,
+ reach them. This means that their effect is confined to the assertion,
because lookaround assertions are atomic. A backtrack that occurs after
an assertion is complete does not jump back into the assertion. Note in
- particular that a (*MARK) name that is set in an assertion is not
+ particular that a (*MARK) name that is set in an assertion is not
"seen" by an instance of (*SKIP:NAME) latter in the pattern.
- The effect of (*THEN) is not allowed to escape beyond an assertion. If
- there are no more branches to try, (*THEN) causes a positive assertion
+ The effect of (*THEN) is not allowed to escape beyond an assertion. If
+ there are no more branches to try, (*THEN) causes a positive assertion
to be false, and a negative assertion to be true.
- The other backtracking verbs are not treated specially if they appear
- in a standalone positive assertion. In a conditional positive asser-
+ The other backtracking verbs are not treated specially if they appear
+ in a standalone positive assertion. In a conditional positive asser-
tion, backtracking (from within the assertion) into (*COMMIT), (*SKIP),
- or (*PRUNE) causes the condition to be false. However, for both stand-
+ or (*PRUNE) causes the condition to be false. However, for both stand-
alone and conditional negative assertions, backtracking into (*COMMIT),
(*SKIP), or (*PRUNE) causes the assertion to be true, without consider-
ing any further alternative branches.
Backtracking verbs in subroutines
- These behaviours occur whether or not the subpattern is called recur-
- sively. Perl's treatment of subroutines is different in some cases.
-
- (*FAIL) in a subpattern called as a subroutine has its normal effect:
- it forces an immediate backtrack.
+ These behaviours occur whether or not the subpattern is called recur-
+ sively.
(*ACCEPT) in a subpattern called as a subroutine causes the subroutine
match to succeed without any further processing. Matching then contin-
- ues after the subroutine call.
+ ues after the subroutine call. Perl documents this behaviour. Perl's
+ treatment of the other verbs in subroutines is different in some cases.
+
+ (*FAIL) in a subpattern called as a subroutine has its normal effect:
+ it forces an immediate backtrack.
(*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine
cause the subroutine match to fail.
@@ -9002,7 +9012,7 @@ AUTHOR
REVISION
- Last updated: 16 July 2018
+ Last updated: 20 July 2018
Copyright (c) 1997-2018 University of Cambridge.
------------------------------------------------------------------------------
@@ -10226,7 +10236,11 @@ CONDITIONAL PATTERNS
BACKTRACKING CONTROL
- The following act immediately they are reached:
+ All backtracking control verbs may be in the form (*VERB:NAME). For
+ (*MARK) the name is mandatory, for the others it is optional. (*SKIP)
+ changes its behaviour if :NAME is present. The others just set a name
+ for passing back to the caller, but this is not a name that (*SKIP) can
+ see. The following act immediately they are reached:
(*ACCEPT) force successful match
(*FAIL) force backtrack; synonym (*F)
@@ -10239,12 +10253,13 @@ BACKTRACKING CONTROL
(*COMMIT) overall failure, no advance of starting point
(*PRUNE) advance to next starting character
- (*PRUNE:NAME) equivalent to (*MARK:NAME)(*PRUNE)
(*SKIP) advance to current matching position
(*SKIP:NAME) advance to position corresponding to an earlier
(*MARK:NAME); if not found, the (*SKIP) is ignored
(*THEN) local failure, backtrack to next alternation
- (*THEN:NAME) equivalent to (*MARK:NAME)(*THEN)
+
+ The effect of one of these verbs in a group called as a subroutine is
+ confined to the subroutine call.
CALLOUTS
@@ -10254,14 +10269,14 @@ CALLOUTS
(?C"text") callout with string data
The allowed string delimiters are ` ' " ^ % # $ (which are the same for
- the start and the end), and the starting delimiter { matched with the
- ending delimiter }. To encode the ending delimiter within the string,
+ the start and the end), and the starting delimiter { matched with the
+ ending delimiter }. To encode the ending delimiter within the string,
double it.
SEE ALSO
- pcre2pattern(3), pcre2api(3), pcre2callout(3), pcre2matching(3),
+ pcre2pattern(3), pcre2api(3), pcre2callout(3), pcre2matching(3),
pcre2(3).
@@ -10274,7 +10289,7 @@ AUTHOR
REVISION
- Last updated: 07 July 2018
+ Last updated: 21 July 2018
Copyright (c) 1997-2018 University of Cambridge.
------------------------------------------------------------------------------
diff --git a/doc/pcre2pattern.3 b/doc/pcre2pattern.3
index bcd2279..056cad5 100644
--- a/doc/pcre2pattern.3
+++ b/doc/pcre2pattern.3
@@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "16 July 2018" "PCRE2 10.32"
+.TH PCRE2PATTERN 3 "20 July 2018" "PCRE2 10.32"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -3154,17 +3154,16 @@ in the
.\"
documentation.
.P
-Experiments with Perl suggest that it too has similar optimizations, sometimes
-leading to anomalous results.
+Experiments with Perl suggest that it too has similar optimizations, and like
+PCRE2, turning them off can change the result of a match.
.
.
.SS "Verbs that act immediately"
.rs
.sp
-The following verbs act as soon as they are encountered. They may not be
-followed by a name.
+The following verbs act as soon as they are encountered.
.sp
- (*ACCEPT)
+ (*ACCEPT) or (*ACCEPT:NAME)
.sp
This verb causes the match to end successfully, skipping the remainder of the
pattern. However, when it is inside a subpattern that is called as a
@@ -3180,18 +3179,21 @@ example:
This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is captured by
the outer parentheses.
.sp
- (*FAIL) or (*F)
+ (*FAIL) or (*FAIL:NAME)
.sp
-This verb causes a matching failure, forcing backtracking to occur. It is
-equivalent to (?!) but easier to read. The Perl documentation notes that it is
-probably useful only when combined with (?{}) or (??{}). Those are, of course,
-Perl features that are not present in PCRE2. The nearest equivalent is the
-callout feature, as for example in this pattern:
+This verb causes a matching failure, forcing backtracking to occur. It may be
+abbreviated to (*F). It is equivalent to (?!) but easier to read. The Perl
+documentation notes that it is probably useful only when combined with (?{}) or
+(??{}). Those are, of course, Perl features that are not present in PCRE2. The
+nearest equivalent is the callout feature, as for example in this pattern:
.sp
a+(?C)(*FAIL)
.sp
A match with the string "aaaa" always fails, but the callout is taken before
each backtrack happens (in this example, 10 times).
+.P
+(*ACCEPT:NAME) and (*FAIL:NAME) behave exactly the same as
+(*MARK:NAME)(*ACCEPT) and (*MARK:NAME)(*FAIL), respectively.
.
.
.SS "Recording which path was taken"
@@ -3220,9 +3222,9 @@ documentation. This applies to all instances of (*MARK), including those inside
assertions and atomic groups. (There are differences in those cases when
(*MARK) is used in conjunction with (*SKIP) as described below.)
.P
-As well as (*MARK), the (*PRUNE) and (*THEN) verbs may have associated NAME
-arguments. Whichever is last on the matching path is passed back. See below for
-more details of these other verbs.
+As well as (*MARK), the (*COMMIT), (*PRUNE) and (*THEN) verbs may have
+associated NAME arguments. Whichever is last on the matching path is passed
+back. See below for more details of these other verbs.
.P
Here is an example of \fBpcre2test\fP output, where the "mark" modifier
requests the retrieval and outputting of (*MARK) data:
@@ -3282,22 +3284,24 @@ reaches them. The behaviour described below is what happens when the verb is
not in a subroutine or an assertion. Subsequent sections cover these special
cases.
.sp
- (*COMMIT)
+ (*COMMIT) or (*COMMIT:NAME)
.sp
-This verb, which may not be followed by a name, causes the whole match to fail
-outright if there is a later matching failure that causes backtracking to reach
-it. Even if the pattern is unanchored, no further attempts to find a match by
-advancing the starting point take place. If (*COMMIT) is the only backtracking
-verb that is encountered, once it has been passed \fBpcre2_match()\fP is
-committed to finding a match at the current starting point, or not at all. For
-example:
+This verb causes the whole match to fail outright if there is a later matching
+failure that causes backtracking to reach it. Even if the pattern is
+unanchored, no further attempts to find a match by advancing the starting point
+take place. If (*COMMIT) is the only backtracking verb that is encountered,
+once it has been passed \fBpcre2_match()\fP is committed to finding a match at
+the current starting point, or not at all. For example:
.sp
a+(*COMMIT)b
.sp
This matches "xxaab" but not "aacaab". It can be thought of as a kind of
-dynamic anchor, or "I've started, so I must finish." The name of the most
-recently passed (*MARK) in the path is passed back when (*COMMIT) forces a
-match failure.
+dynamic anchor, or "I've started, so I must finish."
+.P
+The behaviour of (*COMMIT:NAME) is not the same as (*MARK:NAME)(*COMMIT). It is
+like (*MARK:NAME) in that the name is remembered for passing back to the
+caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
+ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).
.P
If there is more than one backtracking verb in a pattern, a different one that
follows (*COMMIT) may be triggered first, so merely passing (*COMMIT) during a
@@ -3338,7 +3342,7 @@ as (*COMMIT).
The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE). It is
like (*MARK:NAME) in that the name is remembered for passing back to the
caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
-ignoring those set by (*PRUNE) or (*THEN).
+ignoring those set by (*COMMIT), (*PRUNE) or (*THEN).
.sp
(*SKIP)
.sp
@@ -3346,7 +3350,7 @@ This verb, when given without a name, is like (*PRUNE), except that if the
pattern is unanchored, the "bumpalong" advance is not to the next character,
but to the position in the subject where (*SKIP) was encountered. (*SKIP)
signifies that whatever text was matched leading up to it cannot be part of a
-successful match. Consider:
+successful match if there is a later mismatch. Consider:
.sp
a+(*SKIP)b
.sp
@@ -3391,7 +3395,7 @@ never seen because "a" does not match "b", so the matcher immediately jumps to
the second branch of the pattern.
.P
Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores
-names that are set by (*PRUNE:NAME) or (*THEN:NAME).
+names that are set by (*COMMIT:NAME), (*PRUNE:NAME) or (*THEN:NAME).
.sp
(*THEN) or (*THEN:NAME)
.sp
@@ -3409,10 +3413,10 @@ succeeds and BAR fails, COND3 is tried. If subsequently BAZ fails, there are no
more alternatives, so there is a backtrack to whatever came before the entire
group. If (*THEN) is not inside an alternation, it acts like (*PRUNE).
.P
-The behaviour of (*THEN:NAME) is the not the same as (*MARK:NAME)(*THEN).
-It is like (*MARK:NAME) in that the name is remembered for passing back to the
+The behaviour of (*THEN:NAME) is not the same as (*MARK:NAME)(*THEN). It is
+like (*MARK:NAME) in that the name is remembered for passing back to the
caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
-ignoring those set by (*PRUNE) and (*THEN).
+ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).
.P
A subpattern that does not contain a | character is just a part of the
enclosing alternative; it is not a nested alternation with only one
@@ -3485,13 +3489,14 @@ onto (*COMMIT).
.SS "Backtracking verbs in repeated groups"
.rs
.sp
-PCRE2 differs from Perl in its handling of backtracking verbs in repeated
-groups. For example, consider:
+PCRE2 sometimes differs from Perl in its handling of backtracking verbs in
+repeated groups. For example, consider:
.sp
/(a(*COMMIT)b)+ac/
.sp
-If the subject is "abac", Perl matches, but PCRE2 fails because the (*COMMIT)
-in the second repeat of the group acts.
+If the subject is "abac", Perl matches unless its optimizations are disabled,
+but PCRE2 always fails because the (*COMMIT) in the second repeat of the group
+acts.
.
.
.\" HTML <a name="btassert"></a>
@@ -3504,9 +3509,10 @@ not the assertion is standalone or acting as the condition in a conditional
subpattern.
.P
(*ACCEPT) in a standalone positive assertion causes the assertion to succeed
-without any further processing; captured strings are retained. In a standalone
-negative assertion, (*ACCEPT) causes the assertion to fail without any further
-processing; captured substrings are discarded.
+without any further processing; captured strings and a (*MARK) name (if set)
+are retained. In a standalone negative assertion, (*ACCEPT) causes the
+assertion to fail without any further processing; captured substrings and any
+(*MARK) name are discarded.
.P
If the assertion is a condition, (*ACCEPT) causes the condition to be true for
a positive assertion and false for a negative one; captured substrings are
@@ -3536,14 +3542,14 @@ the assertion to be true, without considering any further alternative branches.
.rs
.sp
These behaviours occur whether or not the subpattern is called recursively.
-Perl's treatment of subroutines is different in some cases.
-.P
-(*FAIL) in a subpattern called as a subroutine has its normal effect: it forces
-an immediate backtrack.
.P
(*ACCEPT) in a subpattern called as a subroutine causes the subroutine match to
succeed without any further processing. Matching then continues after the
-subroutine call.
+subroutine call. Perl documents this behaviour. Perl's treatment of the other
+verbs in subroutines is different in some cases.
+.P
+(*FAIL) in a subpattern called as a subroutine has its normal effect: it forces
+an immediate backtrack.
.P
(*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine cause
the subroutine match to fail.
@@ -3574,6 +3580,6 @@ Cambridge, England.
.rs
.sp
.nf
-Last updated: 16 July 2018
+Last updated: 20 July 2018
Copyright (c) 1997-2018 University of Cambridge.
.fi
diff --git a/doc/pcre2syntax.3 b/doc/pcre2syntax.3
index 7e29beb..fa32ad6 100644
--- a/doc/pcre2syntax.3
+++ b/doc/pcre2syntax.3
@@ -1,4 +1,4 @@
-.TH PCRE2SYNTAX 3 "07 July 2018" "PCRE2 10.32"
+.TH PCRE2SYNTAX 3 "21 July 2018" "PCRE2 10.32"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
@@ -410,8 +410,6 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
(?>...) atomic, non-capturing group
.
.
-.
-.
.SH "COMMENT"
.rs
.sp
@@ -552,7 +550,11 @@ condition if the relevant named group exists.
.SH "BACKTRACKING CONTROL"
.rs
.sp
-The following act immediately they are reached:
+All backtracking control verbs may be in the form (*VERB:NAME). For (*MARK) the
+name is mandatory, for the others it is optional. (*SKIP) changes its behaviour
+if :NAME is present. The others just set a name for passing back to the caller,
+but this is not a name that (*SKIP) can see. The following act immediately they
+are reached:
.sp
(*ACCEPT) force successful match
(*FAIL) force backtrack; synonym (*F)
@@ -565,12 +567,13 @@ pattern is not anchored.
.sp
(*COMMIT) overall failure, no advance of starting point
(*PRUNE) advance to next starting character
- (*PRUNE:NAME) equivalent to (*MARK:NAME)(*PRUNE)
(*SKIP) advance to current matching position
(*SKIP:NAME) advance to position corresponding to an earlier
(*MARK:NAME); if not found, the (*SKIP) is ignored
(*THEN) local failure, backtrack to next alternation
- (*THEN:NAME) equivalent to (*MARK:NAME)(*THEN)
+.sp
+The effect of one of these verbs in a group called as a subroutine is confined
+to the subroutine call.
.
.
.SH "CALLOUTS"
@@ -606,6 +609,6 @@ Cambridge, England.
.rs
.sp
.nf
-Last updated: 07 July 2018
+Last updated: 21 July 2018
Copyright (c) 1997-2018 University of Cambridge.
.fi
diff --git a/doc/pcre2test.1 b/doc/pcre2test.1
index edf87df..06b3dc8 100644
--- a/doc/pcre2test.1
+++ b/doc/pcre2test.1
@@ -1,4 +1,4 @@
-.TH PCRE2TEST 1 "16 July 2018" "PCRE 10.32"
+.TH PCRE2TEST 1 "21 July 2018" "PCRE 10.32"
.SH NAME
pcre2test - a program for testing Perl-compatible regular expressions.
.SH SYNOPSIS
@@ -360,10 +360,11 @@ patterns. Modifiers on a pattern can change these settings.
The appearance of this line causes all subsequent modifier settings to be
checked for compatibility with the \fBperltest.sh\fP script, which is used to
confirm that Perl gives the same results as PCRE2. Also, apart from comment
-lines, none of the other command lines are permitted, because they and many
-of the modifiers are specific to \fBpcre2test\fP, and should not be used in
-test files that are also processed by \fBperltest.sh\fP. The \fB#perltest\fP
-command helps detect tests that are accidentally put in the wrong file.
+lines, #pattern commands, and #subject commands that set or unset "mark", no
+command lines are permitted, because they and many of the modifiers are
+specific to \fBpcre2test\fP, and should not be used in test files that are also
+processed by \fBperltest.sh\fP. The \fB#perltest\fP command helps detect tests
+that are accidentally put in the wrong file.
.sp
#pop [<modifiers>]
#popcopy [<modifiers>]
@@ -1981,6 +1982,6 @@ Cambridge, England.
.rs
.sp
.nf
-Last updated: 16 July 2018
+Last updated: 21 July 2018
Copyright (c) 1997-2018 University of Cambridge.
.fi
diff --git a/doc/pcre2test.txt b/doc/pcre2test.txt
index 5b23d6a..44727a7 100644
--- a/doc/pcre2test.txt
+++ b/doc/pcre2test.txt
@@ -344,11 +344,11 @@ COMMAND LINES
The appearance of this line causes all subsequent modifier settings to
be checked for compatibility with the perltest.sh script, which is used
to confirm that Perl gives the same results as PCRE2. Also, apart from
- comment lines, none of the other command lines are permitted, because
- they and many of the modifiers are specific to pcre2test, and should
- not be used in test files that are also processed by perltest.sh. The
- #perltest command helps detect tests that are accidentally put in the
- wrong file.
+ comment lines, #pattern commands, and #subject commands that set or
+ unset "mark", no command lines are permitted, because they and many of
+ the modifiers are specific to pcre2test, and should not be used in test
+ files that are also processed by perltest.sh. The #perltest command
+ helps detect tests that are accidentally put in the wrong file.
#pop [<modifiers>]
#popcopy [<modifiers>]
@@ -1818,5 +1818,5 @@ AUTHOR
REVISION
- Last updated: 16 July 2018
+ Last updated: 21 July 2018
Copyright (c) 1997-2018 University of Cambridge.