Add (*CR) etc.

git-svn-id: svn://vcs.exim.org/pcre/code/trunk@227 2f5784b3-3f2a-0410-8824-cb99058d5e15
author: ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2007-08-21 15:00:15 +0000
committer: ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2007-08-21 15:00:15 +0000
commit: 273487b8386264c012ab681035d19c93b4309ed3 (patch)
tree: 8706cad6dc3ba3dcfa9971357c00a725e8029244 /doc/html/pcrepattern.html
parent: c6a88bf880d462c62e00d8d7c3eeeaad60ebab49 (diff)
download: pcre-273487b8386264c012ab681035d19c93b4309ed3.tar.gz
1 file changed, 85 insertions, 52 deletions
diff --git a/doc/html/pcrepattern.html b/doc/html/pcrepattern.html
index 45d8181..d9847d5 100644
--- a/doc/html/pcrepattern.html
+++ b/doc/html/pcrepattern.html
@@ -14,31 +14,32 @@ man page, in case the conversion went wrong.
 <br>
 <ul>
 <li><a name="TOC1" href="#SEC1">PCRE REGULAR EXPRESSION DETAILS</a>
-<li><a name="TOC2" href="#SEC2">CHARACTERS AND METACHARACTERS</a>
-<li><a name="TOC3" href="#SEC3">BACKSLASH</a>
-<li><a name="TOC4" href="#SEC4">CIRCUMFLEX AND DOLLAR</a>
-<li><a name="TOC5" href="#SEC5">FULL STOP (PERIOD, DOT)</a>
-<li><a name="TOC6" href="#SEC6">MATCHING A SINGLE BYTE</a>
-<li><a name="TOC7" href="#SEC7">SQUARE BRACKETS AND CHARACTER CLASSES</a>
-<li><a name="TOC8" href="#SEC8">POSIX CHARACTER CLASSES</a>
-<li><a name="TOC9" href="#SEC9">VERTICAL BAR</a>
-<li><a name="TOC10" href="#SEC10">INTERNAL OPTION SETTING</a>
-<li><a name="TOC11" href="#SEC11">SUBPATTERNS</a>
-<li><a name="TOC12" href="#SEC12">DUPLICATE SUBPATTERN NUMBERS</a>
-<li><a name="TOC13" href="#SEC13">NAMED SUBPATTERNS</a>
-<li><a name="TOC14" href="#SEC14">REPETITION</a>
-<li><a name="TOC15" href="#SEC15">ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS</a>
-<li><a name="TOC16" href="#SEC16">BACK REFERENCES</a>
-<li><a name="TOC17" href="#SEC17">ASSERTIONS</a>
-<li><a name="TOC18" href="#SEC18">CONDITIONAL SUBPATTERNS</a>
-<li><a name="TOC19" href="#SEC19">COMMENTS</a>
-<li><a name="TOC20" href="#SEC20">RECURSIVE PATTERNS</a>
-<li><a name="TOC21" href="#SEC21">SUBPATTERNS AS SUBROUTINES</a>
-<li><a name="TOC22" href="#SEC22">CALLOUTS</a>
-<li><a name="TOC23" href="#SEC23">BACTRACKING CONTROL</a>
-<li><a name="TOC24" href="#SEC24">SEE ALSO</a>
-<li><a name="TOC25" href="#SEC25">AUTHOR</a>
-<li><a name="TOC26" href="#SEC26">REVISION</a>
+<li><a name="TOC2" href="#SEC2">NEWLINE CONVENTIONS</a>
+<li><a name="TOC3" href="#SEC3">CHARACTERS AND METACHARACTERS</a>
+<li><a name="TOC4" href="#SEC4">BACKSLASH</a>
+<li><a name="TOC5" href="#SEC5">CIRCUMFLEX AND DOLLAR</a>
+<li><a name="TOC6" href="#SEC6">FULL STOP (PERIOD, DOT)</a>
+<li><a name="TOC7" href="#SEC7">MATCHING A SINGLE BYTE</a>
+<li><a name="TOC8" href="#SEC8">SQUARE BRACKETS AND CHARACTER CLASSES</a>
+<li><a name="TOC9" href="#SEC9">POSIX CHARACTER CLASSES</a>
+<li><a name="TOC10" href="#SEC10">VERTICAL BAR</a>
+<li><a name="TOC11" href="#SEC11">INTERNAL OPTION SETTING</a>
+<li><a name="TOC12" href="#SEC12">SUBPATTERNS</a>
+<li><a name="TOC13" href="#SEC13">DUPLICATE SUBPATTERN NUMBERS</a>
+<li><a name="TOC14" href="#SEC14">NAMED SUBPATTERNS</a>
+<li><a name="TOC15" href="#SEC15">REPETITION</a>
+<li><a name="TOC16" href="#SEC16">ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS</a>
+<li><a name="TOC17" href="#SEC17">BACK REFERENCES</a>
+<li><a name="TOC18" href="#SEC18">ASSERTIONS</a>
+<li><a name="TOC19" href="#SEC19">CONDITIONAL SUBPATTERNS</a>
+<li><a name="TOC20" href="#SEC20">COMMENTS</a>
+<li><a name="TOC21" href="#SEC21">RECURSIVE PATTERNS</a>
+<li><a name="TOC22" href="#SEC22">SUBPATTERNS AS SUBROUTINES</a>
+<li><a name="TOC23" href="#SEC23">CALLOUTS</a>
+<li><a name="TOC24" href="#SEC24">BACTRACKING CONTROL</a>
+<li><a name="TOC25" href="#SEC25">SEE ALSO</a>
+<li><a name="TOC26" href="#SEC26">AUTHOR</a>
+<li><a name="TOC27" href="#SEC27">REVISION</a>
 </ul>
 <br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION DETAILS</a><br>
 <P>
@@ -74,7 +75,39 @@ discussed in the
 <a href="pcrematching.html"><b>pcrematching</b></a>
 page.
 </P>
-<br><a name="SEC2" href="#TOC1">CHARACTERS AND METACHARACTERS</a><br>
+<br><a name="SEC2" href="#TOC1">NEWLINE CONVENTIONS</a><br>
+<P>
+PCRE supports five different conventions for indicating line breaks in
+strings: a single CR (carriage return) character, a single LF (linefeed)
+character, the two-character sequence CRLF, any of the three preceding, or any
+Unicode newline sequence. The
+<a href="pcreapi.html"><b>pcreapi</b></a>
+page has
+<a href="pcreapi.html#newlines">further discussion</a>
+about newlines, and shows how to set the newline convention in the
+<i>options</i> arguments for the compiling and matching functions.
+</P>
+<P>
+It is also possible to specify a newline convention by starting a pattern
+string with one of the following five sequences:
+<pre>
+  (*CR)        carriage return
+  (*LF)        linefeed
+  (*CRLF)      carriage return, followed by linefeed
+  (*ANYCRLF)   any of the three above
+  (*ANY)       all Unicode newline sequences
+</pre>
+These override the default and the options given to <b>pcre_compile()</b>. For
+example, on a Unix system where LF is the default newline sequence, the pattern
+<pre>
+  (*CR)a.b
+</pre>
+changes the convention to CR. That pattern matches "a\nb" because LF is no
+longer a newline. Note that these special settings, which are not
+Perl-compatible, are recognized only at the very start of a pattern, and that
+they must be in upper case.
+</P>
+<br><a name="SEC3" href="#TOC1">CHARACTERS AND METACHARACTERS</a><br>
 <P>
 A regular expression is a pattern that is matched against a subject string from
 left to right. Most characters stand for themselves in a pattern, and match the
@@ -131,7 +164,7 @@ a character class the only metacharacters are:
 </pre>
 The following sections describe the use of each of the metacharacters.
 </P>
-<br><a name="SEC3" href="#TOC1">BACKSLASH</a><br>
+<br><a name="SEC4" href="#TOC1">BACKSLASH</a><br>
 <P>
 The backslash character has several uses. Firstly, if it is followed by a
 non-alphanumeric character, it takes away any special meaning that character
@@ -180,7 +213,7 @@ represents:
   \cx       "control-x", where x is any character
   \e        escape (hex 1B)
   \f        formfeed (hex 0C)
-  \n        newline (hex 0A)
+  \n        linefeed (hex 0A)
   \r        carriage return (hex 0D)
   \t        tab (hex 09)
   \ddd      character with octal code ddd, or backreference
@@ -675,7 +708,7 @@ If all the alternatives of a pattern begin with \G, the expression is anchored
 to the starting match position, and the "anchored" flag is set in the compiled
 regular expression.
 </P>
-<br><a name="SEC4" href="#TOC1">CIRCUMFLEX AND DOLLAR</a><br>
+<br><a name="SEC5" href="#TOC1">CIRCUMFLEX AND DOLLAR</a><br>
 <P>
 Outside a character class, in the default matching mode, the circumflex
 character is an assertion that is true only if the current matching point is
@@ -729,7 +762,7 @@ Note that the sequences \A, \Z, and \z can be used to match the start and
 end of the subject in both modes, and if all branches of a pattern start with
 \A it is always anchored, whether or not PCRE_MULTILINE is set.
 </P>
-<br><a name="SEC5" href="#TOC1">FULL STOP (PERIOD, DOT)</a><br>
+<br><a name="SEC6" href="#TOC1">FULL STOP (PERIOD, DOT)</a><br>
 <P>
 Outside a character class, a dot in the pattern matches any one character in
 the subject string except (by default) a character that signifies the end of a
@@ -754,7 +787,7 @@ The handling of dot is entirely independent of the handling of circumflex and
 dollar, the only relationship being that they both involve newlines. Dot has no
 special meaning in a character class.
 </P>
-<br><a name="SEC6" href="#TOC1">MATCHING A SINGLE BYTE</a><br>
+<br><a name="SEC7" href="#TOC1">MATCHING A SINGLE BYTE</a><br>
 <P>
 Outside a character class, the escape sequence \C matches any one byte, both
 in and out of UTF-8 mode. Unlike a dot, it always matches any line-ending
@@ -769,7 +802,7 @@ PCRE does not allow \C to appear in lookbehind assertions
 because in UTF-8 mode this would make it impossible to calculate the length of
 the lookbehind.
 <a name="characterclass"></a></P>
-<br><a name="SEC7" href="#TOC1">SQUARE BRACKETS AND CHARACTER CLASSES</a><br>
+<br><a name="SEC8" href="#TOC1">SQUARE BRACKETS AND CHARACTER CLASSES</a><br>
 <P>
 An opening square bracket introduces a character class, terminated by a closing
 square bracket. A closing square bracket on its own is not special. If a
@@ -864,7 +897,7 @@ introducing a POSIX class name - see the next section), and the terminating
 closing square bracket. However, escaping other non-alphanumeric characters
 does no harm.
 </P>
-<br><a name="SEC8" href="#TOC1">POSIX CHARACTER CLASSES</a><br>
+<br><a name="SEC9" href="#TOC1">POSIX CHARACTER CLASSES</a><br>
 <P>
 Perl supports the POSIX notation for character classes. This uses names
 enclosed by [: and :] within the enclosing square brackets. PCRE also supports
@@ -910,7 +943,7 @@ supported, and an error is given if they are encountered.
 In UTF-8 mode, characters with values greater than 128 do not match any of
 the POSIX character classes.
 </P>
-<br><a name="SEC9" href="#TOC1">VERTICAL BAR</a><br>
+<br><a name="SEC10" href="#TOC1">VERTICAL BAR</a><br>
 <P>
 Vertical bar characters are used to separate alternative patterns. For example,
 the pattern
@@ -925,7 +958,7 @@ that succeeds is used. If the alternatives are within a subpattern
 "succeeds" means matching the rest of the main pattern as well as the
 alternative in the subpattern.
 </P>
-<br><a name="SEC10" href="#TOC1">INTERNAL OPTION SETTING</a><br>
+<br><a name="SEC11" href="#TOC1">INTERNAL OPTION SETTING</a><br>
 <P>
 The settings of the PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, and
 PCRE_EXTENDED options can be changed from within the pattern by a sequence of
@@ -973,7 +1006,7 @@ The PCRE-specific options PCRE_DUPNAMES, PCRE_UNGREEDY, and PCRE_EXTRA can be
 changed in the same way as the Perl-compatible options by using the characters
 J, U and X respectively.
 <a name="subpattern"></a></P>
-<br><a name="SEC11" href="#TOC1">SUBPATTERNS</a><br>
+<br><a name="SEC12" href="#TOC1">SUBPATTERNS</a><br>
 <P>
 Subpatterns are delimited by parentheses (round brackets), which can be nested.
 Turning part of a pattern into a subpattern does two things:
@@ -1027,7 +1060,7 @@ from left to right, and options are not reset until the end of the subpattern
 is reached, an option setting in one branch does affect subsequent branches, so
 the above patterns match "SUNDAY" as well as "Saturday".
 </P>
-<br><a name="SEC12" href="#TOC1">DUPLICATE SUBPATTERN NUMBERS</a><br>
+<br><a name="SEC13" href="#TOC1">DUPLICATE SUBPATTERN NUMBERS</a><br>
 <P>
 Perl 5.10 introduced a feature whereby each alternative in a subpattern uses
 the same numbers for its capturing parentheses. Such a subpattern starts with
@@ -1058,7 +1091,7 @@ the first one in the pattern with the given number.
 An alternative approach to using this "branch reset" feature is to use
 duplicate named subpatterns, as described in the next section.
 </P>
-<br><a name="SEC13" href="#TOC1">NAMED SUBPATTERNS</a><br>
+<br><a name="SEC14" href="#TOC1">NAMED SUBPATTERNS</a><br>
 <P>
 Identifying capturing parentheses by number is simple, but it can be very hard
 to keep track of the numbers in complicated regular expressions. Furthermore,
@@ -1113,7 +1146,7 @@ details of the interfaces for handling named subpatterns, see the
 <a href="pcreapi.html"><b>pcreapi</b></a>
 documentation.
 </P>
-<br><a name="SEC14" href="#TOC1">REPETITION</a><br>
+<br><a name="SEC15" href="#TOC1">REPETITION</a><br>
 <P>
 Repetition is specified by quantifiers, which can follow any of the following
 items:
@@ -1264,7 +1297,7 @@ example, after
 </pre>
 matches "aba" the value of the second captured substring is "b".
 <a name="atomicgroup"></a></P>
-<br><a name="SEC15" href="#TOC1">ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS</a><br>
+<br><a name="SEC16" href="#TOC1">ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS</a><br>
 <P>
 With both maximizing ("greedy") and minimizing ("ungreedy" or "lazy")
 repetition, failure of what follows normally causes the repeated item to be
@@ -1368,7 +1401,7 @@ an atomic group, like this:
 </pre>
 sequences of non-digits cannot be broken, and failure happens quickly.
 <a name="backreferences"></a></P>
-<br><a name="SEC16" href="#TOC1">BACK REFERENCES</a><br>
+<br><a name="SEC17" href="#TOC1">BACK REFERENCES</a><br>
 <P>
 Outside a character class, a backslash followed by a digit greater than 0 (and
 possibly further digits) is a back reference to a capturing subpattern earlier
@@ -1482,7 +1515,7 @@ that the first iteration does not need to match the back reference. This can be
 done using alternation, as in the example above, or by a quantifier with a
 minimum of zero.
 <a name="bigassertions"></a></P>
-<br><a name="SEC17" href="#TOC1">ASSERTIONS</a><br>
+<br><a name="SEC18" href="#TOC1">ASSERTIONS</a><br>
 <P>
 An assertion is a test on the characters following or preceding the current
 matching point that does not actually consume any characters. The simple
@@ -1642,7 +1675,7 @@ preceded by "foo", while
 is another pattern that matches "foo" preceded by three digits and any three
 characters that are not "999".
 <a name="conditions"></a></P>
-<br><a name="SEC18" href="#TOC1">CONDITIONAL SUBPATTERNS</a><br>
+<br><a name="SEC19" href="#TOC1">CONDITIONAL SUBPATTERNS</a><br>
 <P>
 It is possible to cause the matching process to obey a subpattern
 conditionally or to choose between two alternative subpatterns, depending on
@@ -1780,7 +1813,7 @@ subject is matched against the first alternative; otherwise it is matched
 against the second. This pattern matches strings in one of the two forms
 dd-aaa-dd or dd-dd-dd, where aaa are letters and dd are digits.
 <a name="comments"></a></P>
-<br><a name="SEC19" href="#TOC1">COMMENTS</a><br>
+<br><a name="SEC20" href="#TOC1">COMMENTS</a><br>
 <P>
 The sequence (?# marks the start of a comment that continues up to the next
 closing parenthesis. Nested parentheses are not permitted. The characters
@@ -1791,7 +1824,7 @@ If the PCRE_EXTENDED option is set, an unescaped # character outside a
 character class introduces a comment that continues to immediately after the
 next newline in the pattern.
 <a name="recursion"></a></P>
-<br><a name="SEC20" href="#TOC1">RECURSIVE PATTERNS</a><br>
+<br><a name="SEC21" href="#TOC1">RECURSIVE PATTERNS</a><br>
 <P>
 Consider the problem of matching a string in parentheses, allowing for
 unlimited nested parentheses. Without the use of recursion, the best that can
@@ -1921,7 +1954,7 @@ In this pattern, (?(R) is the start of a conditional subpattern, with two
 different alternatives for the recursive and non-recursive cases. The (?R) item
 is the actual recursive call.
 <a name="subpatternsassubroutines"></a></P>
-<br><a name="SEC21" href="#TOC1">SUBPATTERNS AS SUBROUTINES</a><br>
+<br><a name="SEC22" href="#TOC1">SUBPATTERNS AS SUBROUTINES</a><br>
 <P>
 If the syntax for a recursive subpattern reference (either by number or by
 name) is used outside the parentheses to which it refers, it operates like a
@@ -1961,7 +1994,7 @@ changed for different calls. For example, consider this pattern:
 It matches "abcabc". It does not match "abcABC" because the change of
 processing option does not affect the called subpattern.
 </P>
-<br><a name="SEC22" href="#TOC1">CALLOUTS</a><br>
+<br><a name="SEC23" href="#TOC1">CALLOUTS</a><br>
 <P>
 Perl has a feature whereby using the sequence (?{...}) causes arbitrary Perl
 code to be obeyed in the middle of matching a regular expression. This makes it
@@ -1996,7 +2029,7 @@ description of the interface to the callout function is given in the
 <a href="pcrecallout.html"><b>pcrecallout</b></a>
 documentation.
 </P>
-<br><a name="SEC23" href="#TOC1">BACTRACKING CONTROL</a><br>
+<br><a name="SEC24" href="#TOC1">BACTRACKING CONTROL</a><br>
 <P>
 Perl 5.10 introduced a number of "Special Backtracking Control Verbs", which
 are described in the Perl documentation as "experimental and subject to change
@@ -2111,11 +2144,11 @@ the end of the group if FOO succeeds); on failure the matcher skips to the
 second alternative and tries COND2, without backtracking into COND1. If (*THEN)
 is used outside of any alternation, it acts exactly like (*PRUNE).
 </P>
-<br><a name="SEC24" href="#TOC1">SEE ALSO</a><br>
+<br><a name="SEC25" href="#TOC1">SEE ALSO</a><br>
 <P>
 <b>pcreapi</b>(3), <b>pcrecallout</b>(3), <b>pcrematching</b>(3), <b>pcre</b>(3).
 </P>
-<br><a name="SEC25" href="#TOC1">AUTHOR</a><br>
+<br><a name="SEC26" href="#TOC1">AUTHOR</a><br>
 <P>
 Philip Hazel
 <br>
@@ -2124,9 +2157,9 @@ University Computing Service
 Cambridge CB2 3QH, England.
 <br>
 </P>
-<br><a name="SEC26" href="#TOC1">REVISION</a><br>
+<br><a name="SEC27" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 09 August 2007
+Last updated: 21 August 2007
 <br>
 Copyright &copy; 1997-2007 University of Cambridge.
 <br>
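
The NEWLINE CONVENTIONS text added by this patch says that a setting such as (*CR) is recognized only at the very start of a pattern and only in upper case. The following is a minimal Python sketch of that rule, not PCRE's actual implementation. The names NEWLINE_SETTINGS and split_newline_setting are hypothetical, and the "(*LF)" default is an assumption that mirrors the Unix example in the text.

```python
import re

# Leading settings listed in the patched documentation, mapped to the
# line-ending sequences each one selects. (*ANY) is left as None because
# enumerating every Unicode newline sequence is outside this sketch.
NEWLINE_SETTINGS = {
    "(*CR)": ["\r"],
    "(*LF)": ["\n"],
    "(*CRLF)": ["\r\n"],
    "(*ANYCRLF)": ["\r\n", "\r", "\n"],
    "(*ANY)": None,
}


def split_newline_setting(pattern, default="(*LF)"):
    """Return (setting, rest_of_pattern).

    Hypothetical helper: a setting is honoured only when it is the first
    thing in the pattern and is written in upper case; otherwise the
    assumed default is returned and the pattern is left untouched.
    """
    for setting in NEWLINE_SETTINGS:
        if pattern.startswith(setting):
            return setting, pattern[len(setting):]
    return default, pattern


# The documentation's example: (*CR)a.b switches the convention to CR.
setting, rest = split_newline_setting("(*CR)a.b")

# Python's re module has no (*CR) syntax, so the dot is replaced by [^\r]
# as a stand-in for "any character except the CR newline". This naive
# substitution is only valid for a simple pattern like this one.
stand_in = rest.replace(".", "[^\\r]")
matched = re.fullmatch(stand_in, "a\nb") is not None
```

With these assumptions, `setting` is "(*CR)", `rest` is "a.b", and `matched` is true, which agrees with the statement that the pattern matches "a\nb" once LF is no longer a newline. A lower-case "(*cr)" or a setting that appears later in the pattern is not treated as a setting by the helper and stays in the pattern text.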