author: ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2013-10-01 16:54:40 +0000
committer: ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2013-10-01 16:54:40 +0000
commit: 5f42224005b7d9a503903e3342ec7ada75590b07
tree: cd216c1c4ce213cc37bb9440077dc878abc54580 /doc
parent: f312a9a8397f6f52dc3ef3db4e3589dec11b3f73
Refactored auto-possessification code.
git-svn-id: svn://vcs.exim.org/pcre/code/trunk@1363 2f5784b3-3f2a-0410-8824-cb99058d5e15
Diffstat (limited to 'doc')
-rw-r--r-- doc/pcre_compile.3  |  5
-rw-r--r-- doc/pcre_compile2.3 |  5
-rw-r--r-- doc/pcreapi.3       | 28
-rw-r--r-- doc/pcrematching.3  | 12
-rw-r--r-- doc/pcretest.1      | 14
5 files changed, 49 insertions, 15 deletions
diff --git a/doc/pcre_compile.3 b/doc/pcre_compile.3
index 21d3e8c..d293515 100644
--- a/doc/pcre_compile.3
+++ b/doc/pcre_compile.3
@@ -1,4 +1,4 @@
-.TH PCRE_COMPILE 3 "24 June 2012" "PCRE 8.30"
+.TH PCRE_COMPILE 3 "01 October 2013" "PCRE 8.34"
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
@@ -51,6 +51,7 @@ The option bits are:
  PCRE_FIRSTLINE           Force matching to be before newline
  PCRE_JAVASCRIPT_COMPAT   JavaScript compatibility
  PCRE_MULTILINE           ^ and $ match newlines within data
+ PCRE_NEVER_UTF           Lock out UTF, e.g. via (*UTF)
  PCRE_NEWLINE_ANY         Recognize any Unicode newline sequence
  PCRE_NEWLINE_ANYCRLF     Recognize CR, LF, and CRLF as newline
                             sequences
@@ -59,6 +60,8 @@ The option bits are:
  PCRE_NEWLINE_LF          Set LF as the newline sequence
  PCRE_NO_AUTO_CAPTURE     Disable numbered capturing parentheses
                             (named ones available)
+ PCRE_NO_AUTO_POSSESSIFY  Disable auto-possessification
+ PCRE_NO_START_OPTIMIZE   Disable match-time start optimizations
  PCRE_NO_UTF16_CHECK      Do not check the pattern for UTF-16
                             validity (only relevant if
                             PCRE_UTF16 is set)
diff --git a/doc/pcre_compile2.3 b/doc/pcre_compile2.3
index 3d86dc6..32e1211 100644
--- a/doc/pcre_compile2.3
+++ b/doc/pcre_compile2.3
@@ -1,4 +1,4 @@
-.TH PCRE_COMPILE2 3 "24 June 2012" "PCRE 8.30"
+.TH PCRE_COMPILE2 3 "01 October 2013" "PCRE 8.34"
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
@@ -56,6 +56,7 @@ The option bits are:
  PCRE_FIRSTLINE           Force matching to be before newline
  PCRE_JAVASCRIPT_COMPAT   JavaScript compatibility
  PCRE_MULTILINE           ^ and $ match newlines within data
+ PCRE_NEVER_UTF           Lock out UTF, e.g. via (*UTF)
  PCRE_NEWLINE_ANY         Recognize any Unicode newline sequence
  PCRE_NEWLINE_ANYCRLF     Recognize CR, LF, and CRLF as newline
                             sequences
@@ -64,6 +65,8 @@ The option bits are:
  PCRE_NEWLINE_LF          Set LF as the newline sequence
  PCRE_NO_AUTO_CAPTURE     Disable numbered capturing parentheses
                             (named ones available)
+ PCRE_NO_AUTO_POSSESSIFY  Disable auto-possessification
+ PCRE_NO_START_OPTIMIZE   Disable match-time start optimizations
  PCRE_NO_UTF16_CHECK      Do not check the pattern for UTF-16
                             validity (only relevant if
                             PCRE_UTF16 is set)
diff --git a/doc/pcreapi.3 b/doc/pcreapi.3
index 8752a19..90c7787 100644
--- a/doc/pcreapi.3
+++ b/doc/pcreapi.3
@@ -1,4 +1,4 @@
-.TH PCREAPI 3 "03 September 2013" "PCRE 8.34"
+.TH PCREAPI 3 "01 October 2013" "PCRE 8.34"
.SH NAME
PCRE - Perl-compatible regular expressions
.sp
@@ -795,6 +795,12 @@ were followed by ?: but named parentheses can still be used for capturing (and
they acquire numbers in the usual way). There is no equivalent of this option
in Perl.
.sp
+ PCRE_NO_AUTO_POSSESSIFY
+.sp
+If this option is set, it disables "auto-possessification". This is an
+optimization that, for example, turns a+b into a++b in order to avoid
+backtracks into a+ that can never be successful.
+.sp
PCRE_NO_START_OPTIMIZE
.sp
This is an option that acts at matching time; that is, it is really an option
@@ -860,10 +866,10 @@ page. If an invalid UTF-8 sequence is found, \fBpcre_compile()\fP returns an
error. If you already know that your pattern is valid, and you want to skip
this check for performance reasons, you can set the PCRE_NO_UTF8_CHECK option.
When it is set, the effect of passing an invalid UTF-8 string as a pattern is
-undefined. It may cause your program to crash. Note that this option can also
-be passed to \fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP, to suppress the
-validity checking of subject strings only. If the same string is being matched
-many times, the option can be safely set for the second and subsequent
+undefined. It may cause your program to crash or loop. Note that this option
+can also be passed to \fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP, to suppress
+the validity checking of subject strings only. If the same string is being
+matched many times, the option can be safely set for the second and subsequent
matchings to improve performance.
.
.
@@ -1931,7 +1937,7 @@ all the matches in a single subject string. However, you should be sure that
the value of \fIstartoffset\fP points to the start of a character (or the end
of the subject). When PCRE_NO_UTF8_CHECK is set, the effect of passing an
invalid string as a subject or an invalid value of \fIstartoffset\fP is
-undefined. Your program may crash.
+undefined. Your program may crash or loop.
.sp
PCRE_PARTIAL_HARD
PCRE_PARTIAL_SOFT
@@ -2773,6 +2779,14 @@ matching string is given first. If there were too many matches to fit into
\fIovector\fP, the yield of the function is zero, and the vector is filled with
the longest matches. Unlike \fBpcre_exec()\fP, \fBpcre_dfa_exec()\fP can use
the entire \fIovector\fP for returning matched strings.
+
+NOTE: PCRE's "auto-possessification" optimization usually applies to character
+repeats at the end of a pattern (as well as internally). For example, the
+pattern "a\ed+" is compiled as if it were "a\ed++" because there is no point
+even considering the possibility of backtracking into the repeated digits. For
+DFA matching, this means that only one possible match is found. If you really
+do want multiple matches in such cases, either use an ungreedy repeat
+("a\ed+?") or set the PCRE_NO_AUTO_POSSESSIFY option when compiling.
.
.
.SS "Error returns from \fBpcre_dfa_exec()\fP"
@@ -2849,6 +2863,6 @@ Cambridge CB2 3QH, England.
.rs
.sp
.nf
-Last updated: 03 September 2013
+Last updated: 01 October 2013
Copyright (c) 1997-2013 University of Cambridge.
.fi
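
Reviewer note (not part of the commit): the pcreapi.3 hunk above says that auto-possessification turns a+b into a++b to avoid backtracks that can never succeed. The following is a minimal Python sketch of that idea, assuming a toy pattern of the form X+Y with two literal characters. It is not PCRE's implementation; the function name `match_repeat_then_char` and the `tries` counter are hypothetical and exist only for illustration.

```python
def match_repeat_then_char(subject, rep, nxt, possessive):
    """Match the toy pattern rep+nxt at the start of subject.

    Returns (match_end, tries): match_end is the end offset of the match
    (or None), and tries counts how many positions were tested for the
    character that must follow the repeat.
    """
    # Greedy phase: consume as many `rep` characters as possible.
    n = 0
    while n < len(subject) and subject[n] == rep:
        n += 1
    if n == 0:
        return None, 0  # rep+ needs at least one character

    tries = 0
    # A possessive repeat never gives characters back, so only the longest
    # run is tried. A plain greedy repeat backs off one character at a time.
    shortest = n if possessive else 1
    for k in range(n, shortest - 1, -1):
        tries += 1
        if k < len(subject) and subject[k] == nxt:
            return k + 1, tries
    return None, tries


if __name__ == "__main__":
    # 'b' can never match inside a run of 'a', so both forms agree on the
    # result and the possessive form simply does less work on failure.
    print(match_repeat_then_char("aaaac", "a", "b", possessive=False))
    print(match_repeat_then_char("aaaac", "a", "b", possessive=True))
```

In this sketch the two forms return the same match whenever `rep` and `nxt` differ, which is the case the man page example describes. If the following character could also match the repeated one (for example `rep == nxt`), the possessive form would change the result, so the rewrite is only safe when that overlap is impossible.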
diff --git a/doc/pcrematching.3 b/doc/pcrematching.3
index a9977d5..ea92cc9 100644
--- a/doc/pcrematching.3
+++ b/doc/pcrematching.3
@@ -1,4 +1,4 @@
-.TH PCREMATCHING 3 "08 January 2012" "PCRE 8.30"
+.TH PCREMATCHING 3 "01 October 2013" "PCRE 8.34"
.SH NAME
PCRE - Perl-compatible regular expressions
.SH "PCRE MATCHING ALGORITHMS"
@@ -106,6 +106,14 @@ the three strings "caterpillar", "cater", and "cat" that start at the fifth
character of the subject. The algorithm does not automatically move on to find
matches that start at later positions.
.P
+PCRE's "auto-possessification" optimization usually applies to character
+repeats at the end of a pattern (as well as internally). For example, the
+pattern "a\ed+" is compiled as if it were "a\ed++" because there is no point
+even considering the possibility of backtracking into the repeated digits. For
+DFA matching, this means that only one possible match is found. If you really
+do want multiple matches in such cases, either use an ungreedy repeat
+("a\ed+?") or set the PCRE_NO_AUTO_POSSESSIFY option when compiling.
+.P
There are a number of features of PCRE regular expressions that are not
supported by the alternative matching algorithm. They are as follows:
.P
@@ -201,6 +209,6 @@ Cambridge CB2 3QH, England.
.rs
.sp
.nf
-Last updated: 08 January 2012
+Last updated: 01 October 2013
Copyright (c) 1997-2012 University of Cambridge.
.fi
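
Reviewer note (not part of the commit): the paragraph added to pcrematching.3 above says that, for DFA matching, auto-possessifying "a\d+" means only one possible match is found. The Python sketch below models that effect for this one pattern, assuming matches are reported longest first as the man page describes for the alternative algorithm. It does not call PCRE; `dfa_style_matches` is a hypothetical helper, and the `possessive` flag stands in for the default behaviour versus PCRE_NO_AUTO_POSSESSIFY.

```python
DIGITS = "0123456789"


def dfa_style_matches(subject, possessive):
    r"""List the matches of the toy pattern a\d+ at offset 0, longest first.

    With possessive=False every prefix "a" + one or more digits is a
    match, as an all-matches algorithm would report. With
    possessive=True the digit run cannot be shortened, so only the
    longest match remains.
    """
    if not subject.startswith("a"):
        return []
    end = 1
    while end < len(subject) and subject[end] in DIGITS:
        end += 1
    if end == 1:
        return []  # \d+ needs at least one digit
    if possessive:
        return [subject[:end]]
    return [subject[:k] for k in range(end, 1, -1)]


if __name__ == "__main__":
    print(dfa_style_matches("a123", possessive=False))
    print(dfa_style_matches("a123", possessive=True))
```

Under these assumptions the non-possessive call lists "a123", "a12", and "a1", while the possessive call lists only "a123", which mirrors the difference the documentation warns about.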
diff --git a/doc/pcretest.1 b/doc/pcretest.1
index 19982d0..48c317c 100644
--- a/doc/pcretest.1
+++ b/doc/pcretest.1
@@ -1,4 +1,4 @@
-.TH PCRETEST 1 "27 August 2013" "PCRE 8.34"
+.TH PCRETEST 1 "01 October 2013" "PCRE 8.34"
.SH NAME
pcretest - a program for testing Perl-compatible regular expressions.
.SH SYNOPSIS
@@ -155,6 +155,10 @@ Output the size of each compiled pattern after it has been compiled. This is
equivalent to adding \fB/M\fP to each regular expression. The size is given in
bytes for both libraries.
.TP 10
+\fB-O\fP
+Behave as if each pattern has the \fB/O\fP modifier, that is, disable
+auto-possessification for all patterns.
+.TP 10
\fB-o\fP \fIosize\fP
Set the number of elements in the output vector that is used when calling
\fBpcre[16|32]_exec()\fP or \fBpcre[16|32]_dfa_exec()\fP to be \fIosize\fP. The
@@ -324,6 +328,7 @@ sections.
\fB/M\fP show compiled memory size
\fB/m\fP set PCRE_MULTILINE
\fB/N\fP set PCRE_NO_AUTO_CAPTURE
+ \fB/O\fP set PCRE_NO_AUTO_POSSESSIFY
\fB/P\fP use the POSIX wrapper
\fB/S\fP study the pattern after compilation
\fB/s\fP set PCRE_DOTALL
@@ -380,6 +385,7 @@ options that do not correspond to anything in Perl:
\fB/f\fP PCRE_FIRSTLINE
\fB/J\fP PCRE_DUPNAMES
\fB/N\fP PCRE_NO_AUTO_CAPTURE
+ \fB/O\fP PCRE_NO_AUTO_POSSESSIFY
\fB/U\fP PCRE_UNGREEDY
\fB/W\fP PCRE_UCP
\fB/X\fP PCRE_EXTRA
@@ -512,8 +518,8 @@ expression has been compiled, and the results used when the expression is
matched. There are a number of qualifying characters that may follow \fB/S\fP.
They may appear in any order.
.P
-If \fBS\fP is followed by an exclamation mark, \fBpcre[16|32]_study()\fP is called
-with the PCRE_STUDY_EXTRA_NEEDED option, causing it always to return a
+If \fB/S\fP is followed by an exclamation mark, \fBpcre[16|32]_study()\fP is
+called with the PCRE_STUDY_EXTRA_NEEDED option, causing it always to return a
\fBpcre_extra\fP block, even when studying discovers no useful information.
.P
If \fB/S\fP is followed by a second S character, it suppresses studying, even
@@ -1098,6 +1104,6 @@ Cambridge CB2 3QH, England.
.rs
.sp
.nf
-Last updated: 27 August 2013
+Last updated: 01 October 2013
Copyright (c) 1997-2013 University of Cambridge.
.fi