author     ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15>  2010-11-21 18:45:10 +0000
committer  ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15>  2010-11-21 18:45:10 +0000
commit     6bd55b864efee297700d9aa850c31a98a1fdee85
tree       ec9526d1cd09d61adf3f6c808fcaff7fd8ac0eb5
parent     4fd19debe11ba11f06ab5eb633cd3010567679e9
Added support for (*NO_START_OPT)
git-svn-id: svn://vcs.exim.org/pcre/code/trunk@576 2f5784b3-3f2a-0410-8824-cb99058d5e15
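
Reviewer's sketch (not part of the commit): the change has two halves. pcre_compile() recognizes a leading (*NO_START_OPT) item and sets an option bit, and the matching functions then honour that bit whether it came from the compiled pattern or from the match-time options. The self-contained C below illustrates both halves under stated assumptions. The names sketch_scan_prefix, sketch_start_opt_enabled and SKETCH_NO_START_OPTIMIZE are hypothetical and exist only for this illustration. The flag value copies the PCRE_NO_START_OPTIMIZE bit shown in pcre.h.in, and the scanner compares the whole 15-character item in one call rather than reproducing the exact control flow of pcre_compile.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag for this sketch; the value mirrors the
   PCRE_NO_START_OPTIMIZE bit that appears in pcre.h.in. */
#define SKETCH_NO_START_OPTIMIZE 0x04000000u

/* Scan leading "(*NO_START_OPT)" items at the start of a pattern.
   Returns the number of pattern characters to skip and ORs the flag
   into *options for each item found. Assumption: only this one item
   is handled; the real compiler also handles (*UTF8), (*UCP) and the
   newline-setting items in the same loop. */
static size_t sketch_scan_prefix(const char *pattern, unsigned *options)
{
    static const char item[] = "(*NO_START_OPT)";
    const size_t item_len = sizeof(item) - 1;   /* 15 characters */
    size_t skip = 0;

    assert(pattern != NULL && options != NULL);

    /* strncmp stops at the terminating NUL, so a short pattern is safe. */
    while (strncmp(pattern + skip, item, item_len) == 0) {
        skip += item_len;
        *options |= SKETCH_NO_START_OPTIMIZE;
    }
    return skip;
}

/* Decide whether the start-up optimizations may run. The flag disables
   them if it is set in either the compile-time options remembered with
   the pattern or the options passed at match time. */
static int sketch_start_opt_enabled(unsigned compile_options,
                                    unsigned exec_options)
{
    return ((compile_options | exec_options) & SKETCH_NO_START_OPTIMIZE) == 0;
}
```

Calling sketch_scan_prefix on "(*NO_START_OPT)xyz" yields a skip of 15, which is consistent with the callout offsets that begin at +15 in the new test output for that pattern. Because the two option words are ORed together, a flag set at compile time cannot be cleared at match time, which matches the note added to pcreapi.3.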
-rw-r--r--  ChangeLog              7
-rw-r--r--  doc/pcreapi.3         23
-rw-r--r--  doc/pcrepattern.3      5
-rw-r--r--  doc/pcresyntax.3       5
-rw-r--r--  doc/pcretest.1         3
-rw-r--r--  pcre.h.in              2
-rw-r--r--  pcre_compile.c         2
-rw-r--r--  pcre_dfa_exec.c        6
-rw-r--r--  pcre_exec.c            5
-rw-r--r--  pcre_internal.h       40
-rw-r--r--  pcretest.c             4
-rw-r--r--  testdata/testinput2    6
-rw-r--r--  testdata/testinput7    3
-rw-r--r--  testdata/testoutput2  24
-rw-r--r--  testdata/testoutput7  12
15 files changed, 115 insertions, 32 deletions
diff --git a/ChangeLog b/ChangeLog
index 47dd4a3..b18a0d5 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -113,6 +113,13 @@ Version 8.11 10-Oct-2010
compile-time error is now given if \c is not followed by an ASCII
character, that is, a byte less than 128. (In EBCDIC mode, the code is
different, and any byte value is allowed.)
+
+20. Recognize (*NO_START_OPT) at the start of a pattern to set the PCRE_NO_
+ START_OPTIMIZE option, which is now allowed at compile time - but just
+ passed through to pcre_exec() or pcre_dfa_exec(). This makes it available
+ to pcregrep and other applications that have no direct access to PCRE
+ options. The new /Y option in pcretest sets this option when calling
+ pcre_compile().
Version 8.10 25-Jun-2010
diff --git a/doc/pcreapi.3 b/doc/pcreapi.3
index 6e555d3..702005b 100644
--- a/doc/pcreapi.3
+++ b/doc/pcreapi.3
@@ -428,8 +428,9 @@ within the pattern (see the detailed description in the
documentation). For those options that can be different in different parts of
the pattern, the contents of the \fIoptions\fP argument specifies their
settings at the start of compilation and execution. The PCRE_ANCHORED,
-PCRE_BSR_\fIxxx\fP, and PCRE_NEWLINE_\fIxxx\fP options can be set at the time
-of matching as well as at compile time.
+PCRE_BSR_\fIxxx\fP, PCRE_NEWLINE_\fIxxx\fP, PCRE_NO_UTF8_CHECK, and
+PCRE_NO_START_OPT options can be set at the time of matching as well as at
+compile time.
.P
If \fIerrptr\fP is NULL, \fBpcre_compile()\fP returns NULL immediately.
Otherwise, if compilation of a pattern fails, \fBpcre_compile()\fP returns
@@ -658,6 +659,17 @@ were followed by ?: but named parentheses can still be used for capturing (and
they acquire numbers in the usual way). There is no equivalent of this option
in Perl.
.sp
+ NO_START_OPTIMIZE
+.sp
+This is an option that acts at matching time; that is, it is really an option
+for \fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP. If it is set at compile time,
+it is remembered with the compiled pattern and assumed at matching time. For
+details see the discussion of PCRE_NO_START_OPTIMIZE
+.\" HTML <a href="#execoptions">
+.\" </a>
+below.
+.\"
+.sp
PCRE_UCP
.sp
This option changes the way PCRE processes \eB, \eb, \eD, \ed, \eS, \es, \eW,
@@ -1487,7 +1499,10 @@ a pre-scan of the subject that takes place before the pattern is run.
The PCRE_NO_START_OPTIMIZE option disables the start-up optimizations, possibly
causing performance to suffer, but ensuring that in cases where the result is
"no match", the callouts do occur, and that items such as (*COMMIT) and (*MARK)
-are considered at every possible starting position in the subject string.
+are considered at every possible starting position in the subject string. If
+PCRE_NO_START_OPTIMIZE is set at compile time, it cannot be unset at matching
+time.
+.P
Setting PCRE_NO_START_OPTIMIZE can change the outcome of a matching operation.
Consider the pattern
.sp
@@ -2252,6 +2267,6 @@ Cambridge CB2 3QH, England.
.rs
.sp
.nf
-Last updated: 13 November 2010
+Last updated: 21 November 2010
Copyright (c) 1997-2010 University of Cambridge.
.fi
diff --git a/doc/pcrepattern.3 b/doc/pcrepattern.3
index 8bec417..049afe6 100644
--- a/doc/pcrepattern.3
+++ b/doc/pcrepattern.3
@@ -52,6 +52,11 @@ such as \ed and \ew to use Unicode properties to determine character types,
instead of recognizing only characters with codes less than 128 via a lookup
table.
.P
+If a pattern starts with (*NO_START_OPT), it has the same effect as setting the
+PCRE_NO_START_OPTIMIZE option either at compile or matching time. There are
+also some more of these special sequences that are concerned with the handling
+of newlines; they are described below.
+.P
The remainder of this document discusses the patterns that are supported by
PCRE when its main matching function, \fBpcre_exec()\fP, is used.
From release 6.0, PCRE offers a second matching function,
diff --git a/doc/pcresyntax.3 b/doc/pcresyntax.3
index a6d9091..692fdde 100644
--- a/doc/pcresyntax.3
+++ b/doc/pcresyntax.3
@@ -24,7 +24,7 @@ syntax.
.rs
.sp
\ea alarm, that is, the BEL character (hex 07)
- \ecx "control-x", where x is any character
+ \ecx "control-x", where x is any ASCII character
\ee escape (hex 1B)
\ef formfeed (hex 0C)
\en newline (hex 0A)
@@ -336,6 +336,7 @@ but some of them use Unicode properties if PCRE_UCP is set. You can use
The following are recognized only at the start of a pattern or after one of the
newline-setting options with similar syntax:
.sp
+ (*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE)
(*UTF8) set UTF-8 mode (PCRE_UTF8)
(*UCP) set PCRE_UCP (use Unicode properties for \ed etc)
.
@@ -473,6 +474,6 @@ Cambridge CB2 3QH, England.
.rs
.sp
.nf
-Last updated: 12 May 2010
+Last updated: 21 November 2010
Copyright (c) 1997-2010 University of Cambridge.
.fi
diff --git a/doc/pcretest.1 b/doc/pcretest.1
index b11ec66..6c37d3d 100644
--- a/doc/pcretest.1
+++ b/doc/pcretest.1
@@ -179,6 +179,7 @@ options that do not correspond to anything in Perl:
\fB/U\fP PCRE_UNGREEDY
\fB/W\fP PCRE_UCP
\fB/X\fP PCRE_EXTRA
+ \fB/Y\fP PCRE_NO_START_OPTIMIZE
\fB/<JS>\fP PCRE_JAVASCRIPT_COMPAT
\fB/<cr>\fP PCRE_NEWLINE_CR
\fB/<lf>\fP PCRE_NEWLINE_LF
@@ -778,6 +779,6 @@ Cambridge CB2 3QH, England.
.rs
.sp
.nf
-Last updated: 07 November 2010
+Last updated: 21 November 2010
Copyright (c) 1997-2010 University of Cambridge.
.fi
diff --git a/pcre.h.in b/pcre.h.in
index fda971c..eb10d8f 100644
--- a/pcre.h.in
+++ b/pcre.h.in
@@ -129,7 +129,7 @@ compile-time only bits for runtime options, or vice versa. */
#define PCRE_BSR_ANYCRLF 0x00800000 /* Compile, exec, DFA exec */
#define PCRE_BSR_UNICODE 0x01000000 /* Compile, exec, DFA exec */
#define PCRE_JAVASCRIPT_COMPAT 0x02000000 /* Compile */
-#define PCRE_NO_START_OPTIMIZE 0x04000000 /* Exec, DFA exec */
+#define PCRE_NO_START_OPTIMIZE 0x04000000 /* Compile, exec, DFA exec */
#define PCRE_NO_START_OPTIMISE 0x04000000 /* Synonym */
#define PCRE_PARTIAL_HARD 0x08000000 /* Exec, DFA exec */
#define PCRE_NOTEMPTY_ATSTART 0x10000000 /* Exec, DFA exec */
diff --git a/pcre_compile.c b/pcre_compile.c
index c269eaa..5cb069a 100644
--- a/pcre_compile.c
+++ b/pcre_compile.c
@@ -6859,6 +6859,8 @@ while (ptr[skipatstart] == CHAR_LEFT_PARENTHESIS &&
{ skipatstart += 7; options |= PCRE_UTF8; continue; }
else if (strncmp((char *)(ptr+skipatstart+2), STRING_UCP_RIGHTPAR, 4) == 0)
{ skipatstart += 6; options |= PCRE_UCP; continue; }
+ else if (strncmp((char *)(ptr+skipatstart+2), STRING_NO_START_OPT_RIGHTPAR, 13) == 0)
+ { skipatstart += 15; options |= PCRE_NO_START_OPTIMIZE; continue; }
if (strncmp((char *)(ptr+skipatstart+2), STRING_CR_RIGHTPAR, 3) == 0)
{ skipatstart += 5; newnl = PCRE_NEWLINE_CR; }
diff --git a/pcre_dfa_exec.c b/pcre_dfa_exec.c
index e4c635b..5c0bcb3 100644
--- a/pcre_dfa_exec.c
+++ b/pcre_dfa_exec.c
@@ -3057,9 +3057,11 @@ for (;;)
/* There are some optimizations that avoid running the match if a known
starting point is not found. However, there is an option that disables
- these, for testing and for ensuring that all callouts do actually occur. */
+ these, for testing and for ensuring that all callouts do actually occur.
+ The option can be set in the regex by (*NO_START_OPT) or passed in
+ match-time options. */
- if ((options & PCRE_NO_START_OPTIMIZE) == 0)
+ if (((options | re->options) & PCRE_NO_START_OPTIMIZE) == 0)
{
/* Advance to a known first byte. */
diff --git a/pcre_exec.c b/pcre_exec.c
index 08443a8..96cf50b 100644
--- a/pcre_exec.c
+++ b/pcre_exec.c
@@ -5936,9 +5936,10 @@ for(;;)
/* There are some optimizations that avoid running the match if a known
starting point is not found, or if a known later character is not present.
However, there is an option that disables these, for testing and for ensuring
- that all callouts do actually occur. */
+ that all callouts do actually occur. The option can be set in the regex by
+ (*NO_START_OPT) or passed in match-time options. */
- if ((options & PCRE_NO_START_OPTIMIZE) == 0)
+ if (((options | re->options) & PCRE_NO_START_OPTIMIZE) == 0)
{
/* Advance to a unique first byte if there is one. */
diff --git a/pcre_internal.h b/pcre_internal.h
index dcf7223..ae67e0f 100644
--- a/pcre_internal.h
+++ b/pcre_internal.h
@@ -615,7 +615,7 @@ time, run time, or study time, respectively. */
PCRE_DOTALL|PCRE_DOLLAR_ENDONLY|PCRE_EXTRA|PCRE_UNGREEDY|PCRE_UTF8| \
PCRE_NO_AUTO_CAPTURE|PCRE_NO_UTF8_CHECK|PCRE_AUTO_CALLOUT|PCRE_FIRSTLINE| \
PCRE_DUPNAMES|PCRE_NEWLINE_BITS|PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE| \
- PCRE_JAVASCRIPT_COMPAT|PCRE_UCP)
+ PCRE_JAVASCRIPT_COMPAT|PCRE_UCP|PCRE_NO_START_OPTIMIZE)
#define PUBLIC_EXEC_OPTIONS \
(PCRE_ANCHORED|PCRE_NOTBOL|PCRE_NOTEOL|PCRE_NOTEMPTY|PCRE_NOTEMPTY_ATSTART| \
@@ -932,15 +932,16 @@ so that PCRE works on both ASCII and EBCDIC platforms, in non-UTF-mode only. */
#define STRING_DEFINE "DEFINE"
-#define STRING_CR_RIGHTPAR "CR)"
-#define STRING_LF_RIGHTPAR "LF)"
-#define STRING_CRLF_RIGHTPAR "CRLF)"
-#define STRING_ANY_RIGHTPAR "ANY)"
-#define STRING_ANYCRLF_RIGHTPAR "ANYCRLF)"
-#define STRING_BSR_ANYCRLF_RIGHTPAR "BSR_ANYCRLF)"
-#define STRING_BSR_UNICODE_RIGHTPAR "BSR_UNICODE)"
-#define STRING_UTF8_RIGHTPAR "UTF8)"
-#define STRING_UCP_RIGHTPAR "UCP)"
+#define STRING_CR_RIGHTPAR "CR)"
+#define STRING_LF_RIGHTPAR "LF)"
+#define STRING_CRLF_RIGHTPAR "CRLF)"
+#define STRING_ANY_RIGHTPAR "ANY)"
+#define STRING_ANYCRLF_RIGHTPAR "ANYCRLF)"
+#define STRING_BSR_ANYCRLF_RIGHTPAR "BSR_ANYCRLF)"
+#define STRING_BSR_UNICODE_RIGHTPAR "BSR_UNICODE)"
+#define STRING_UTF8_RIGHTPAR "UTF8)"
+#define STRING_UCP_RIGHTPAR "UCP)"
+#define STRING_NO_START_OPT_RIGHTPAR "NO_START_OPT)"
#else /* SUPPORT_UTF8 */
@@ -1186,15 +1187,16 @@ only. */
#define STRING_DEFINE STR_D STR_E STR_F STR_I STR_N STR_E
-#define STRING_CR_RIGHTPAR STR_C STR_R STR_RIGHT_PARENTHESIS
-#define STRING_LF_RIGHTPAR STR_L STR_F STR_RIGHT_PARENTHESIS
-#define STRING_CRLF_RIGHTPAR STR_C STR_R STR_L STR_F STR_RIGHT_PARENTHESIS
-#define STRING_ANY_RIGHTPAR STR_A STR_N STR_Y STR_RIGHT_PARENTHESIS
-#define STRING_ANYCRLF_RIGHTPAR STR_A STR_N STR_Y STR_C STR_R STR_L STR_F STR_RIGHT_PARENTHESIS
-#define STRING_BSR_ANYCRLF_RIGHTPAR STR_B STR_S STR_R STR_UNDERSCORE STR_A STR_N STR_Y STR_C STR_R STR_L STR_F STR_RIGHT_PARENTHESIS
-#define STRING_BSR_UNICODE_RIGHTPAR STR_B STR_S STR_R STR_UNDERSCORE STR_U STR_N STR_I STR_C STR_O STR_D STR_E STR_RIGHT_PARENTHESIS
-#define STRING_UTF8_RIGHTPAR STR_U STR_T STR_F STR_8 STR_RIGHT_PARENTHESIS
-#define STRING_UCP_RIGHTPAR STR_U STR_C STR_P STR_RIGHT_PARENTHESIS
+#define STRING_CR_RIGHTPAR STR_C STR_R STR_RIGHT_PARENTHESIS
+#define STRING_LF_RIGHTPAR STR_L STR_F STR_RIGHT_PARENTHESIS
+#define STRING_CRLF_RIGHTPAR STR_C STR_R STR_L STR_F STR_RIGHT_PARENTHESIS
+#define STRING_ANY_RIGHTPAR STR_A STR_N STR_Y STR_RIGHT_PARENTHESIS
+#define STRING_ANYCRLF_RIGHTPAR STR_A STR_N STR_Y STR_C STR_R STR_L STR_F STR_RIGHT_PARENTHESIS
+#define STRING_BSR_ANYCRLF_RIGHTPAR STR_B STR_S STR_R STR_UNDERSCORE STR_A STR_N STR_Y STR_C STR_R STR_L STR_F STR_RIGHT_PARENTHESIS
+#define STRING_BSR_UNICODE_RIGHTPAR STR_B STR_S STR_R STR_UNDERSCORE STR_U STR_N STR_I STR_C STR_O STR_D STR_E STR_RIGHT_PARENTHESIS
+#define STRING_UTF8_RIGHTPAR STR_U STR_T STR_F STR_8 STR_RIGHT_PARENTHESIS
+#define STRING_UCP_RIGHTPAR STR_U STR_C STR_P STR_RIGHT_PARENTHESIS
+#define STRING_NO_START_OPT_RIGHTPAR STR_N STR_O STR_UNDERSCORE STR_S STR_T STR_A STR_R STR_T STR_UNDERSCORE STR_O STR_P STR_T STR_RIGHT_PARENTHESIS
#endif /* SUPPORT_UTF8 */
diff --git a/pcretest.c b/pcretest.c
index 8802bdd..d0aa8b1 100644
--- a/pcretest.c
+++ b/pcretest.c
@@ -1588,6 +1588,7 @@ while (!done)
case 'U': options |= PCRE_UNGREEDY; break;
case 'W': options |= PCRE_UCP; break;
case 'X': options |= PCRE_EXTRA; break;
+ case 'Y': options |= PCRE_NO_START_OPTIMISE; break;
case 'Z': debug_lengths = 0; break;
case '8': options |= PCRE_UTF8; use_utf8 = 1; break;
case '?': options |= PCRE_NO_UTF8_CHECK; break;
@@ -1924,7 +1925,7 @@ while (!done)
if (do_flip) all_options = byteflip(all_options, sizeof(all_options));
if (get_options == 0) fprintf(outfile, "No options\n");
- else fprintf(outfile, "Options:%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
+ else fprintf(outfile, "Options:%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
((get_options & PCRE_ANCHORED) != 0)? " anchored" : "",
((get_options & PCRE_CASELESS) != 0)? " caseless" : "",
((get_options & PCRE_EXTENDED) != 0)? " extended" : "",
@@ -1940,6 +1941,7 @@ while (!done)
((get_options & PCRE_UTF8) != 0)? " utf8" : "",
((get_options & PCRE_UCP) != 0)? " ucp" : "",
((get_options & PCRE_NO_UTF8_CHECK) != 0)? " no_utf8_check" : "",
+ ((get_options & PCRE_NO_START_OPTIMIZE) != 0)? " no_start_optimize" : "",
((get_options & PCRE_DUPNAMES) != 0)? " dupnames" : "");
if (jchanged) fprintf(outfile, "Duplicate name status changes\n");
diff --git a/testdata/testinput2 b/testdata/testinput2
index 6528138..8ac500e 100644
--- a/testdata/testinput2
+++ b/testdata/testinput2
@@ -2584,6 +2584,12 @@ a random value. /Ix
abc\Y
abcxypqr
abcxypqr\Y
+
+/(*NO_START_OPT)xyz/C
+ abcxyz
+
+/xyz/CY
+ abcxyz
/^"((?(?=[a])[^"])|b)*"$/C
"ab"
diff --git a/testdata/testinput7 b/testdata/testinput7
index 012af7f..6674887 100644
--- a/testdata/testinput7
+++ b/testdata/testinput7
@@ -4411,6 +4411,9 @@
abc\Y
abcxypqr
abcxypqr\Y
+
+/(*NO_START_OPT)xyz/C
+ abcxyz
/(?C)ab/
ab
diff --git a/testdata/testoutput2 b/testdata/testoutput2
index 9bbc784..5c599a4 100644
--- a/testdata/testoutput2
+++ b/testdata/testoutput2
@@ -9294,6 +9294,30 @@ No match
+0 ^ x
+0 ^ x
No match
+
+/(*NO_START_OPT)xyz/C
+ abcxyz
+--->abcxyz
++15 ^ x
++15 ^ x
++15 ^ x
++15 ^ x
++16 ^^ y
++17 ^ ^ z
++18 ^ ^
+ 0: xyz
+
+/xyz/CY
+ abcxyz
+--->abcxyz
+ +0 ^ x
+ +0 ^ x
+ +0 ^ x
+ +0 ^ x
+ +1 ^^ y
+ +2 ^ ^ z
+ +3 ^ ^
+ 0: xyz
/^"((?(?=[a])[^"])|b)*"$/C
"ab"
diff --git a/testdata/testoutput7 b/testdata/testoutput7
index add4dfb..123ba82 100644
--- a/testdata/testoutput7
+++ b/testdata/testoutput7
@@ -7319,6 +7319,18 @@ No match
+0 ^ x
+0 ^ x
No match
+
+/(*NO_START_OPT)xyz/C
+ abcxyz
+--->abcxyz
++15 ^ x
++15 ^ x
++15 ^ x
++15 ^ x
++16 ^^ y
++17 ^ ^ z
++18 ^ ^
+ 0: xyz
/(?C)ab/
ab