author ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2011-08-02 11:00:40 +0000
committer ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2011-08-02 11:00:40 +0000
commit 9c65843dde6af3b331acdf8518a6020df32f45af
tree f4938ee9a3d4ca4b7282f86370a5a39875a3a562 /doc/html/pcrepattern.html
parent 2c1db477501a36945e05bc50a1d563c96c4e13f4
Documentation and general text tidies in preparation for test release.
git-svn-id: svn://vcs.exim.org/pcre/code/trunk@654 2f5784b3-3f2a-0410-8824-cb99058d5e15
Diffstat (limited to 'doc/html/pcrepattern.html')
-rw-r--r-- doc/html/pcrepattern.html 59
1 files changed, 47 insertions, 12 deletions
diff --git a/doc/html/pcrepattern.html b/doc/html/pcrepattern.html
index b1fa6e0..6ddf3ef 100644
--- a/doc/html/pcrepattern.html
+++ b/doc/html/pcrepattern.html
@@ -245,7 +245,11 @@ Perl, $ and @ cause variable interpolation. Note the following examples:
\Qabc\E\$\Qxyz\E abc$xyz abc$xyz
</pre>
The \Q...\E sequence is recognized both inside and outside character classes.
-An isolated \E that is not preceded by \Q is ignored.
+An isolated \E that is not preceded by \Q is ignored. If \Q is not followed
+by \E later in the pattern, the literal interpretation continues to the end of
+the pattern (that is, \E is assumed at the end). If the isolated \Q is inside
+a character class, this causes an error, because the character class is not
+terminated.
<a name="digitsafterbackslash"></a></P>
<br><b>
Non-printing characters
@@ -752,6 +756,10 @@ preceding character. None of them have codepoints less than 256, so in
non-UTF-8 mode \X matches any one character.
</P>
<P>
+Note that recent versions of Perl have changed \X to match what Unicode calls
+an "extended grapheme cluster", which has a more complicated definition.
+</P>
+<P>
Matching characters by Unicode property is not fast, because PCRE has to search
a structure that contains data for over fifteen thousand characters. That is
why the traditional escape sequences such as \d and \w do not use Unicode
@@ -1405,7 +1413,7 @@ items:
an escape such as \d or \pL that matches a single character
a character class
a back reference (see next section)
- a parenthesized subpattern (unless it is an assertion)
+ a parenthesized subpattern (including assertions)
a recursive or "subroutine" call to a subpattern
</pre>
The general repetition quantifier specifies a minimum and maximum number of
@@ -1796,12 +1804,32 @@ that look behind it. An assertion subpattern is matched in the normal way,
except that it does not cause the current matching position to be changed.
</P>
<P>
-Assertion subpatterns are not capturing subpatterns, and may not be repeated,
-because it makes no sense to assert the same thing several times. If any kind
-of assertion contains capturing subpatterns within it, these are counted for
-the purposes of numbering the capturing subpatterns in the whole pattern.
-However, substring capturing is carried out only for positive assertions,
-because it does not make sense for negative assertions.
+Assertion subpatterns are not capturing subpatterns. If such an assertion
+contains capturing subpatterns within it, these are counted for the purposes of
+numbering the capturing subpatterns in the whole pattern. However, substring
+capturing is carried out only for positive assertions, because it does not make
+sense for negative assertions.
+</P>
+<P>
+For compatibility with Perl, assertion subpatterns may be repeated; though
+it makes no sense to assert the same thing several times, the side effect of
+capturing parentheses may occasionally be useful. In practice, there are only three
+cases:
+<br>
+<br>
+(1) If the quantifier is {0}, the assertion is never obeyed during matching.
+However, it may contain internal capturing parenthesized groups that are called
+from elsewhere via the
+<a href="#subpatternsassubroutines">subroutine mechanism.</a>
+<br>
+<br>
+(2) If the quantifier is {0,n} where n is greater than zero, it is treated as if it
+were {0,1}. At run time, the rest of the pattern match is tried with and
+without the assertion, the order depending on the greediness of the quantifier.
+<br>
+<br>
+(3) If the minimum repetition is greater than zero, the quantifier is ignored.
+The assertion is obeyed just once when encountered during matching.
</P>
<br><b>
Lookahead assertions
@@ -2445,8 +2473,10 @@ failing negative assertion, they cause an error if encountered by
<P>
If any of these verbs are used in an assertion or subroutine subpattern
(including recursive subpatterns), their effect is confined to that subpattern;
-it does not extend to the surrounding pattern. Note that such subpatterns are
-processed as anchored at the point where they are tested.
+it does not extend to the surrounding pattern, with one exception: a *MARK that
+is encountered in a positive assertion <i>is</i> passed back (compare capturing
+parentheses in assertions). Note that such subpatterns are processed as
+anchored at the point where they are tested.
</P>
<P>
The new verbs make use of what was previously invalid syntax: an opening
@@ -2536,6 +2566,11 @@ of obtaining this information than putting each alternative in its own
capturing parentheses.
</P>
<P>
+If (*MARK) is encountered in a positive assertion, its name is recorded and
+passed back if it is the last-encountered. This does not happen for negative
+assertions.
+</P>
+<P>
A name may also be returned after a failed match if the final path through the
pattern involves (*MARK). However, unless (*MARK) is used in conjunction with
(*COMMIT), this is unlikely to happen for an unanchored pattern because, as the
@@ -2705,9 +2740,9 @@ Cambridge CB2 3QH, England.
</P>
<br><a name="SEC28" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 21 November 2010
+Last updated: 24 July 2011
<br>
-Copyright &copy; 1997-2010 University of Cambridge.
+Copyright &copy; 1997-2011 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE index page</a>.