Diffstat (limited to 'doc/html/pcre2pattern.html')
-rw-r--r-- doc/html/pcre2pattern.html 67
1 files changed, 48 insertions, 19 deletions
diff --git a/doc/html/pcre2pattern.html b/doc/html/pcre2pattern.html
index 0daddaf..c88e931 100644
--- a/doc/html/pcre2pattern.html
+++ b/doc/html/pcre2pattern.html
@@ -669,8 +669,8 @@ This is an example of an "atomic group", details of which are given
This particular group matches either the two-character sequence CR followed by
LF, or one of the single characters LF (linefeed, U+000A), VT (vertical tab,
U+000B), FF (form feed, U+000C), CR (carriage return, U+000D), or NEL (next
-line, U+0085). The two-character sequence is treated as a single unit that
-cannot be split.
+line, U+0085). Because this is an atomic group, the two-character sequence is
+treated as a single unit that cannot be split.
</P>
<P>
In other modes, two additional characters whose codepoints are greater than 255
@@ -1186,6 +1186,16 @@ when the <i>startoffset</i> argument of <b>pcre2_match()</b> is non-zero. The
PCRE2_DOLLAR_ENDONLY option is ignored if PCRE2_MULTILINE is set.
</P>
<P>
+When the newline convention (see
+<a href="#newlines">"Newline conventions"</a>
+below) recognizes the two-character sequence CRLF as a newline, this is
+preferred, even if the single characters CR and LF are also recognized as
+newlines. For example, if the newline convention is "any", a multiline mode
+circumflex matches before "xyz" in the string "abc\r\nxyz" rather than after
+CR, even though CR on its own is a valid newline. (It also matches at the very
+start of the string, of course.)
+</P>
+<P>
Note that the sequences \A, \Z, and \z can be used to match the start and
end of the subject in both modes, and if all branches of a pattern start with
\A it is always anchored, whether or not PCRE2_MULTILINE is set.
@@ -1236,7 +1246,7 @@ with \C in UTF-8 or UTF-16 mode means that the rest of the string may start
with a malformed UTF character. This has undefined results, because PCRE2
assumes that it is matching character by character in a valid UTF string (by
default it checks the subject string's validity at the start of processing
-unless the PCRE2_NO_UTF_CHECK option is used).
+unless the PCRE2_NO_UTF_CHECK option is used).
</P>
<P>
An application can lock out the use of \C by setting the
@@ -1247,9 +1257,9 @@ build PCRE2 with the use of \C permanently disabled.
PCRE2 does not allow \C to appear in lookbehind assertions
<a href="#lookbehind">(described below)</a>
in a UTF mode, because this would make it impossible to calculate the length of
-the lookbehind. Neither the alternative matching function
-<b>pcre2_dfa_match()</b> not the JIT optimizer support \C in a UTF mode. The
-former gives a match-time error; the latter fails to optimize and so the match
+the lookbehind. Neither the alternative matching function
+<b>pcre2_dfa_match()</b> not the JIT optimizer support \C in a UTF mode. The
+former gives a match-time error; the latter fails to optimize and so the match
is always run using the interpreter.
</P>
<P>
@@ -1341,11 +1351,11 @@ example [\000-\037]. Ranges can include any characters that are valid for the
current mode.
</P>
<P>
-There is a special case in EBCDIC environments for ranges whose end points are
-both specified as literal letters in the same case. For compatibility with
-Perl, EBCDIC code points within the range that are not letters are omitted. For
-example, [h-k] matches only four characters, even though the codes for h and k
-are 0x88 and 0x92, a range of 11 code points. However, if the range is
+There is a special case in EBCDIC environments for ranges whose end points are
+both specified as literal letters in the same case. For compatibility with
+Perl, EBCDIC code points within the range that are not letters are omitted. For
+example, [h-k] matches only four characters, even though the codes for h and k
+are 0x88 and 0x92, a range of 11 code points. However, if the range is
specified numerically, for example, [\x88-\x92] or [h-\x92], all code points
are included.
</P>
@@ -1672,6 +1682,10 @@ first one in the pattern with the given number. The following pattern matches
<pre>
/(?|(abc)|(def))(?1)/
</pre>
+A relative reference such as (?-1) is no different: it is just a convenient way
+of computing an absolute group number.
+</P>
+<P>
If a
<a href="#conditions">condition test</a>
for a subpattern's having matched refers to a non-unique number, the test is
@@ -2512,7 +2526,7 @@ For example:
(?(VERSION&#62;=10.4)yes|no)
</pre>
This pattern matches "yes" if the PCRE2 version is greater or equal to 10.4, or
-"no" otherwise. The fractional part of the version number may not contain more
+"no" otherwise. The fractional part of the version number may not contain more
than two digits.
</P>
<br><b>
@@ -2626,6 +2640,21 @@ parentheses preceding the recursion. In other words, a negative number counts
capturing parentheses leftwards from the point at which it is encountered.
</P>
<P>
+Be aware however, that if
+<a href="#dupsubpatternnumber">duplicate subpattern numbers</a>
+are in use, relative references refer to the earliest subpattern with the
+appropriate number. Consider, for example:
+<pre>
+ (?|(a)|(b)) (c) (?-2)
+</pre>
+The first two capturing groups (a) and (b) are both numbered 1, and group (c)
+is number 2. When the reference (?-2) is encountered, the second most recently
+opened parentheses has the number 1, but it is the first such group (the (a)
+group) to which the recursion refers. This would be the same if an absolute
+reference (?1) was used. In other words, relative references are just a
+shorthand for computing a group number.
+</P>
+<P>
It is also possible to refer to subsequently opened parentheses, by writing
references such as (?+2). However, these cannot be recursive because the
reference is not inside the parentheses that are referenced. They are always
@@ -2929,13 +2958,13 @@ depending on whether or not a name is present.
</P>
<P>
By default, for compatibility with Perl, a name is any sequence of characters
-that does not include a closing parenthesis. The name is not processed in
+that does not include a closing parenthesis. The name is not processed in
any way, and it is not possible to include a closing parenthesis in the name.
-However, if the PCRE2_ALT_VERBNAMES option is set, normal backslash processing
-is applied to verb names and only an unescaped closing parenthesis terminates
-the name. A closing parenthesis can be included in a name either as \) or
-between \Q and \E. If the PCRE2_EXTENDED option is set, unescaped whitespace
-in verb names is skipped and #-comments are recognized, exactly as in the rest
+However, if the PCRE2_ALT_VERBNAMES option is set, normal backslash processing
+is applied to verb names and only an unescaped closing parenthesis terminates
+the name. A closing parenthesis can be included in a name either as \) or
+between \Q and \E. If the PCRE2_EXTENDED option is set, unescaped whitespace
+in verb names is skipped and #-comments are recognized, exactly as in the rest
of the pattern.
</P>
<P>
@@ -3359,7 +3388,7 @@ Cambridge, England.
</P>
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 01 November 2015
+Last updated: 13 November 2015
<br>
Copyright &copy; 1997-2015 University of Cambridge.
<br>