Refactored auto-possessification code.

git-svn-id: svn://vcs.exim.org/pcre/code/trunk@1363 2f5784b3-3f2a-0410-8824-cb99058d5e15
author: ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2013-10-01 16:54:40 +0000
committer: ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2013-10-01 16:54:40 +0000
commit: 5f42224005b7d9a503903e3342ec7ada75590b07 (patch)
tree: cd216c1c4ce213cc37bb9440077dc878abc54580 /doc
parent: f312a9a8397f6f52dc3ef3db4e3589dec11b3f73 (diff)
5 files changed, 49 insertions, 15 deletions
diff --git a/doc/pcre_compile.3 b/doc/pcre_compile.3
index 21d3e8c..d293515 100644
--- a/doc/pcre_compile.3
+++ b/doc/pcre_compile.3
@@ -1,4 +1,4 @@
-.TH PCRE_COMPILE 3 "24 June 2012" "PCRE 8.30"
+.TH PCRE_COMPILE 3 "01 October 2013" "PCRE 8.34"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .SH SYNOPSIS
@@ -51,6 +51,7 @@ The option bits are:
   PCRE_FIRSTLINE          Force matching to be before newline
   PCRE_JAVASCRIPT_COMPAT  JavaScript compatibility
   PCRE_MULTILINE          ^ and $ match newlines within data
+  PCRE_NEVER_UTF          Lock out UTF, e.g. via (*UTF)
   PCRE_NEWLINE_ANY        Recognize any Unicode newline sequence
   PCRE_NEWLINE_ANYCRLF    Recognize CR, LF, and CRLF as newline
                             sequences
@@ -59,6 +60,8 @@ The option bits are:
   PCRE_NEWLINE_LF         Set LF as the newline sequence
   PCRE_NO_AUTO_CAPTURE    Disable numbered capturing paren-
                             theses (named ones available)
+  PCRE_NO_AUTO_POSSESSIFY Disable auto-possessification
+  PCRE_NO_START_OPTIMIZE  Disable match-time start optimizations 
   PCRE_NO_UTF16_CHECK     Do not check the pattern for UTF-16
                             validity (only relevant if
                             PCRE_UTF16 is set)
diff --git a/doc/pcre_compile2.3 b/doc/pcre_compile2.3
index 3d86dc6..32e1211 100644
--- a/doc/pcre_compile2.3
+++ b/doc/pcre_compile2.3
@@ -1,4 +1,4 @@
-.TH PCRE_COMPILE2 3 "24 June 2012" "PCRE 8.30"
+.TH PCRE_COMPILE2 3 "01 October 2013" "PCRE 8.34"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .SH SYNOPSIS
@@ -56,6 +56,7 @@ The option bits are:
   PCRE_FIRSTLINE          Force matching to be before newline
   PCRE_JAVASCRIPT_COMPAT  JavaScript compatibility
   PCRE_MULTILINE          ^ and $ match newlines within data
+  PCRE_NEVER_UTF          Lock out UTF, e.g. via (*UTF)
   PCRE_NEWLINE_ANY        Recognize any Unicode newline sequence
   PCRE_NEWLINE_ANYCRLF    Recognize CR, LF, and CRLF as newline
                             sequences
@@ -64,6 +65,8 @@ The option bits are:
   PCRE_NEWLINE_LF         Set LF as the newline sequence
   PCRE_NO_AUTO_CAPTURE    Disable numbered capturing paren-
                             theses (named ones available)
+  PCRE_NO_AUTO_POSSESSIFY Disable auto-possessification
+  PCRE_NO_START_OPTIMIZE  Disable match-time start optimizations 
   PCRE_NO_UTF16_CHECK     Do not check the pattern for UTF-16
                             validity (only relevant if
                             PCRE_UTF16 is set)
diff --git a/doc/pcreapi.3 b/doc/pcreapi.3
index 8752a19..90c7787 100644
--- a/doc/pcreapi.3
+++ b/doc/pcreapi.3
@@ -1,4 +1,4 @@
-.TH PCREAPI 3 "03 September 2013" "PCRE 8.34"
+.TH PCREAPI 3 "01 October 2013" "PCRE 8.34"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .sp
@@ -795,6 +795,12 @@ were followed by ?: but named parentheses can still be used for capturing (and
 they acquire numbers in the usual way). There is no equivalent of this option
 in Perl.
 .sp
+  PCRE_NO_AUTO_POSSESSIFY
+.sp
+If this option is set, it disables "auto-possessification". This is an 
+optimization that, for example, turns a+b into a++b in order to avoid
+backtracks into a+ that can never be successful.
+.sp
   PCRE_NO_START_OPTIMIZE
 .sp
 This is an option that acts at matching time; that is, it is really an option
@@ -860,10 +866,10 @@ page. If an invalid UTF-8 sequence is found, \fBpcre_compile()\fP returns an
 error. If you already know that your pattern is valid, and you want to skip
 this check for performance reasons, you can set the PCRE_NO_UTF8_CHECK option.
 When it is set, the effect of passing an invalid UTF-8 string as a pattern is
-undefined. It may cause your program to crash. Note that this option can also
-be passed to \fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP, to suppress the
-validity checking of subject strings only. If the same string is being matched
-many times, the option can be safely set for the second and subsequent
+undefined. It may cause your program to crash or loop. Note that this option
+can also be passed to \fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP, to suppress
+the validity checking of subject strings only. If the same string is being
+matched many times, the option can be safely set for the second and subsequent
 matchings to improve performance.
 .
 .
@@ -1931,7 +1937,7 @@ all the matches in a single subject string. However, you should be sure that
 the value of \fIstartoffset\fP points to the start of a character (or the end
 of the subject). When PCRE_NO_UTF8_CHECK is set, the effect of passing an
 invalid string as a subject or an invalid value of \fIstartoffset\fP is
-undefined. Your program may crash.
+undefined. Your program may crash or loop.
 .sp
   PCRE_PARTIAL_HARD
   PCRE_PARTIAL_SOFT
@@ -2773,6 +2779,14 @@ matching string is given first. If there were too many matches to fit into
 \fIovector\fP, the yield of the function is zero, and the vector is filled with
 the longest matches. Unlike \fBpcre_exec()\fP, \fBpcre_dfa_exec()\fP can use
 the entire \fIovector\fP for returning matched strings.
+
+NOTE: PCRE's "auto-possessification" optimization usually applies to character
+repeats at the end of a pattern (as well as internally). For example, the
+pattern "a\ed+" is compiled as if it were "a\ed++" because there is no point
+even considering the possibility of backtracking into the repeated digits. For
+DFA matching, this means that only one possible match is found. If you really
+do want multiple matches in such cases, either use an ungreedy repeat
+("a\ed+?") or set the PCRE_NO_AUTO_POSSESSIFY option when compiling.
 .
 .
 .SS "Error returns from \fBpcre_dfa_exec()\fP"
@@ -2849,6 +2863,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 03 September 2013
+Last updated: 01 October 2013
 Copyright (c) 1997-2013 University of Cambridge.
 .fi
diff --git a/doc/pcrematching.3 b/doc/pcrematching.3
index a9977d5..ea92cc9 100644
--- a/doc/pcrematching.3
+++ b/doc/pcrematching.3
@@ -1,4 +1,4 @@
-.TH PCREMATCHING 3 "08 January 2012" "PCRE 8.30"
+.TH PCREMATCHING 3 "01 October 2013" "PCRE 8.34"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .SH "PCRE MATCHING ALGORITHMS"
@@ -106,6 +106,14 @@ the three strings "caterpillar", "cater", and "cat" that start at the fifth
 character of the subject. The algorithm does not automatically move on to find
 matches that start at later positions.
 .P
+PCRE's "auto-possessification" optimization usually applies to character
+repeats at the end of a pattern (as well as internally). For example, the
+pattern "a\ed+" is compiled as if it were "a\ed++" because there is no point
+even considering the possibility of backtracking into the repeated digits. For
+DFA matching, this means that only one possible match is found. If you really
+do want multiple matches in such cases, either use an ungreedy repeat
+("a\ed+?") or set the PCRE_NO_AUTO_POSSESSIFY option when compiling.
+.P
 There are a number of features of PCRE regular expressions that are not
 supported by the alternative matching algorithm. They are as follows:
 .P
@@ -201,6 +209,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 08 January 2012
+Last updated: 01 October 2013
 Copyright (c) 1997-2012 University of Cambridge.
 .fi
diff --git a/doc/pcretest.1 b/doc/pcretest.1
index 19982d0..48c317c 100644
--- a/doc/pcretest.1
+++ b/doc/pcretest.1
@@ -1,4 +1,4 @@
-.TH PCRETEST 1 "27 August 2013" "PCRE 8.34"
+.TH PCRETEST 1 "01 October 2013" "PCRE 8.34"
 .SH NAME
 pcretest - a program for testing Perl-compatible regular expressions.
 .SH SYNOPSIS
@@ -155,6 +155,10 @@ Output the size of each compiled pattern after it has been compiled. This is
 equivalent to adding \fB/M\fP to each regular expression. The size is given in
 bytes for both libraries.
 .TP 10
+\fB-O\fP
+Behave as if each pattern has the \fB/O\fP modifier, that is disable 
+auto-possessification for all patterns. 
+.TP 10
 \fB-o\fP \fIosize\fP
 Set the number of elements in the output vector that is used when calling
 \fBpcre[16|32]_exec()\fP or \fBpcre[16|32]_dfa_exec()\fP to be \fIosize\fP. The
@@ -324,6 +328,7 @@ sections.
   \fB/M\fP              show compiled memory size
   \fB/m\fP              set PCRE_MULTILINE
   \fB/N\fP              set PCRE_NO_AUTO_CAPTURE
+  \fB/O\fP              set PCRE_NO_AUTO_POSSESSIFY 
   \fB/P\fP              use the POSIX wrapper
   \fB/S\fP              study the pattern after compilation
   \fB/s\fP              set PCRE_DOTALL
@@ -380,6 +385,7 @@ options that do not correspond to anything in Perl:
   \fB/f\fP              PCRE_FIRSTLINE
   \fB/J\fP              PCRE_DUPNAMES
   \fB/N\fP              PCRE_NO_AUTO_CAPTURE
+  \fB/O\fP              PCRE_NO_AUTO_POSSESSIFY 
   \fB/U\fP              PCRE_UNGREEDY
   \fB/W\fP              PCRE_UCP
   \fB/X\fP              PCRE_EXTRA
@@ -512,8 +518,8 @@ expression has been compiled, and the results used when the expression is
 matched. There are a number of qualifying characters that may follow \fB/S\fP.
 They may appear in any order.
 .P
-If \fBS\fP is followed by an exclamation mark, \fBpcre[16|32]_study()\fP is called
-with the PCRE_STUDY_EXTRA_NEEDED option, causing it always to return a
+If \fB/S\fP is followed by an exclamation mark, \fBpcre[16|32]_study()\fP is
+called with the PCRE_STUDY_EXTRA_NEEDED option, causing it always to return a
 \fBpcre_extra\fP block, even when studying discovers no useful information.
 .P
 If \fB/S\fP is followed by a second S character, it suppresses studying, even
@@ -1098,6 +1104,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 27 August 2013
+Last updated: 01 October 2013
 Copyright (c) 1997-2013 University of Cambridge.
 .fi
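
Note on the pcreapi.3 hunk above: it describes auto-possessification as turning a+b into a++b "in order to avoid backtracks into a+ that can never be successful". The following is a minimal sketch of that reasoning, not PCRE's implementation. The function `match_a_plus_b` and its `possessive` flag are hypothetical names invented for this illustration; it hard-codes the single pattern a+b and counts how often the trailing b is tested.

```python
def match_a_plus_b(subject, possessive):
    """Match the fixed pattern a+b at the start of subject.

    Returns (match_length or None, number of times 'b' was tested).
    Hypothetical illustration only; this is not how PCRE is implemented.
    """
    n = 0
    while n < len(subject) and subject[n] == "a":
        n += 1                      # greedy: take every leading 'a'
    if n == 0:
        return None, 0              # a+ needs at least one 'a'
    tries = 0
    for kept in range(n, 0, -1):    # give back one 'a' at a time
        tries += 1
        if subject[kept:kept + 1] == "b":
            return kept + 1, tries
        if possessive:
            break                   # a++ never backtracks into the run
    return None, tries


if __name__ == "__main__":
    for text in ("aaab", "aaaac"):
        print(text, match_a_plus_b(text, False), match_a_plus_b(text, True))
```

Both variants return the same match result for every subject, because any character given back by a+ is an "a" and so can never match the following "b". The only difference is the wasted work: on "aaaac" the greedy variant tests "b" four times before failing, the possessive one once.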