Documentation and general text tidies in preparation for test release.

git-svn-id: svn://vcs.exim.org/pcre/code/trunk@654 2f5784b3-3f2a-0410-8824-cb99058d5e15
author: ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2011-08-02 11:00:40 +0000
committer: ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2011-08-02 11:00:40 +0000
commit: 9c65843dde6af3b331acdf8518a6020df32f45af (patch)
tree: f4938ee9a3d4ca4b7282f86370a5a39875a3a562 /doc/html/pcrepattern.html
parent: 2c1db477501a36945e05bc50a1d563c96c4e13f4 (diff)
download: pcre-9c65843dde6af3b331acdf8518a6020df32f45af.tar.gz
1 files changed, 47 insertions, 12 deletions
diff --git a/doc/html/pcrepattern.html b/doc/html/pcrepattern.html
index b1fa6e0..6ddf3ef 100644
--- a/doc/html/pcrepattern.html
+++ b/doc/html/pcrepattern.html
@@ -245,7 +245,11 @@ Perl, $ and @ cause variable interpolation. Note the following examples:
   \Qabc\E\$\Qxyz\E   abc$xyz        abc$xyz
 </pre>
 The \Q...\E sequence is recognized both inside and outside character classes.
-An isolated \E that is not preceded by \Q is ignored.
+An isolated \E that is not preceded by \Q is ignored. If \Q is not followed
+by \E later in the pattern, the literal interpretation continues to the end of
+the pattern (that is, \E is assumed at the end). If the isolated \Q is inside
+a character class, this causes an error, because the character class is not
+terminated.
 <a name="digitsafterbackslash"></a></P>
 <br><b>
 Non-printing characters
@@ -752,6 +756,10 @@ preceding character. None of them have codepoints less than 256, so in
 non-UTF-8 mode \X matches any one character.
 </P>
 <P>
+Note that recent versions of Perl have changed \X to match what Unicode calls
+an "extended grapheme cluster", which has a more complicated definition.
+</P>
+<P>
 Matching characters by Unicode property is not fast, because PCRE has to search
 a structure that contains data for over fifteen thousand characters. That is
 why the traditional escape sequences such as \d and \w do not use Unicode
@@ -1405,7 +1413,7 @@ items:
   an escape such as \d or \pL that matches a single character
   a character class
   a back reference (see next section)
-  a parenthesized subpattern (unless it is an assertion)
+  a parenthesized subpattern (including assertions)
   a recursive or "subroutine" call to a subpattern
 </pre>
 The general repetition quantifier specifies a minimum and maximum number of
@@ -1796,12 +1804,32 @@ that look behind it. An assertion subpattern is matched in the normal way,
 except that it does not cause the current matching position to be changed.
 </P>
 <P>
-Assertion subpatterns are not capturing subpatterns, and may not be repeated,
-because it makes no sense to assert the same thing several times. If any kind
-of assertion contains capturing subpatterns within it, these are counted for
-the purposes of numbering the capturing subpatterns in the whole pattern.
-However, substring capturing is carried out only for positive assertions,
-because it does not make sense for negative assertions.
+Assertion subpatterns are not capturing subpatterns. If such an assertion
+contains capturing subpatterns within it, these are counted for the purposes of
+numbering the capturing subpatterns in the whole pattern. However, substring
+capturing is carried out only for positive assertions, because it does not make
+sense for negative assertions.
+</P>
+<P>
+For compatibility with Perl, assertion subpatterns may be repeated; though
+it makes no sense to assert the same thing several times, the side effect of
+capturing parentheses may occasionally be useful. In practice, there are only
+three cases:
+<br>
+<br>
+(1) If the quantifier is {0}, the assertion is never obeyed during matching.
+However, it may contain internal capturing parenthesized groups that are called
+from elsewhere via the
+<a href="#subpatternsassubroutines">subroutine mechanism.</a>
+<br>
+<br>
+(2) If the quantifier is {0,n} where n is greater than zero, it is treated as if it
+were {0,1}. At run time, the rest of the pattern match is tried with and
+without the assertion, the order depending on the greediness of the quantifier.
+<br>
+<br>
+(3) If the minimum repetition is greater than zero, the quantifier is ignored.
+The assertion is obeyed just once when encountered during matching.
 </P>
 <br><b>
 Lookahead assertions
@@ -2445,8 +2473,10 @@ failing negative assertion, they cause an error if encountered by
 <P>
 If any of these verbs are used in an assertion or subroutine subpattern
 (including recursive subpatterns), their effect is confined to that subpattern;
-it does not extend to the surrounding pattern. Note that such subpatterns are
-processed as anchored at the point where they are tested.
+it does not extend to the surrounding pattern, with one exception: a *MARK that
+is encountered in a positive assertion <i>is</i> passed back (compare capturing
+parentheses in assertions). Note that such subpatterns are processed as
+anchored at the point where they are tested.
 </P>
 <P>
 The new verbs make use of what was previously invalid syntax: an opening
@@ -2536,6 +2566,11 @@ of obtaining this information than putting each alternative in its own
 capturing parentheses.
 </P>
 <P>
+If (*MARK) is encountered in a positive assertion, its name is recorded and
+passed back if it is the last-encountered. This does not happen for negative
+assertions.
+</P>
+<P>
 A name may also be returned after a failed match if the final path through the
 pattern involves (*MARK). However, unless (*MARK) is used in conjunction with
 (*COMMIT), this is unlikely to happen for an unanchored pattern because, as the
@@ -2705,9 +2740,9 @@ Cambridge CB2 3QH, England.
 </P>
 <br><a name="SEC28" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 21 November 2010
+Last updated: 24 July 2011
 <br>
-Copyright &copy; 1997-2010 University of Cambridge.
+Copyright &copy; 1997-2011 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE index page</a>.
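
The assertion hunk in this diff says that capturing groups inside an assertion are counted for numbering, but that substrings are captured only for positive assertions. The sketch below illustrates that behaviour. It is an assumption-laden illustration: it uses Python's built-in re module rather than PCRE itself, and the helper name lookahead_captures is hypothetical. Python's engine happens to treat captures in lookaheads the same way for these simple patterns; it does not demonstrate the quantified-assertion or (*MARK) changes, which Python's re does not provide.

```python
import re

def lookahead_captures(pattern, subject):
    """Return the tuple of captured groups, or None if there is no match.

    Hypothetical helper for illustration; uses Python's re, not PCRE.
    """
    m = re.match(pattern, subject)
    return m.groups() if m else None

# Positive lookahead: the group inside the assertion is captured,
# even though the assertion itself consumes no characters.
positive = lookahead_captures(r"(?=(\d+))\d", "123")

# Negative lookahead: the group is still counted (it is group 1),
# but it is left unset because the assertion succeeds only when
# its contents fail to match.
negative = lookahead_captures(r"(?!(\d+))\w", "abc")

# The group inside the negative assertion is still numbered.
group_count = re.compile(r"(?!(\d+))\w").groups

print(positive)
print(negative)
print(group_count)
```

In both patterns the assertion leaves the matching position unchanged, so the character after it (\d or \w) is matched at the start of the subject.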