author nigel <nigel@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2007-02-24 21:39:05 +0000
committer nigel <nigel@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2007-02-24 21:39:05 +0000
commit 8413b86222848f277386e72706ca548a37dbc6ca
tree aa68b52aa527385811d5e4af091c59609cc8fa03
parent 4864ac99ba4c4395fd8dc157ec734e228c780eb4
Load pcre-2.06 into code/trunk.
git-svn-id: svn://vcs.exim.org/pcre/code/trunk@35 2f5784b3-3f2a-0410-8824-cb99058d5e15
-rw-r--r-- ChangeLog   |  23
-rw-r--r-- README      | 143
-rw-r--r-- dftables.c  |   2
-rw-r--r-- internal.h  |   2
-rw-r--r-- pcre.3      |  68
-rw-r--r-- pcre.c      |  13
-rw-r--r-- pcre.h      |   2
-rw-r--r-- pcreposix.c |   2
-rw-r--r-- pcretest.c  |  69
-rw-r--r-- pgrep.c     |   2
-rw-r--r-- testinput1  |  42
-rw-r--r-- testinput2  |  44
-rw-r--r-- testoutput1 |  61
-rw-r--r-- testoutput2 | 140
-rw-r--r-- testoutput3 |   2
-rw-r--r-- testoutput4 |   2
16 files changed, 527 insertions, 90 deletions
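
The behavioural point of this commit, that a start offset is not the same as passing a shortened subject, can be sketched in a few lines of C. The block below is a toy stand-in, not PCRE code: it hard-wires the pattern \Biss\B, and the names is_word, not_word_boundary, toy_exec, count_with_offset and count_with_pointer are all hypothetical. It assumes ASCII word characters (letters, digits, underscore) and only mimics how the /g and /G loops in pcretest advance through the subject.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Toy illustration only: this is NOT PCRE code. The "pattern" \Biss\B is
   hard-wired so that the effect of a start offset can be seen in isolation. */

/* Assumption: a word character is an ASCII letter, digit, or underscore. */
static int is_word(unsigned char c)
{
  return isalnum(c) || c == '_';
}

/* Returns 1 if pos is NOT a word boundary in subject[0..length), i.e. the
   characters on both sides are of the same kind. The positions before the
   first and after the last character are treated as non-word. */
static int not_word_boundary(const char *subject, int length, int pos)
{
  int prev, next;
  assert(pos >= 0 && pos <= length);
  prev = pos > 0 && is_word((unsigned char)subject[pos - 1]);
  next = pos < length && is_word((unsigned char)subject[pos]);
  return prev == next;
}

/* Hypothetical matcher for \Biss\B. Scanning begins at start_offset, but the
   whole subject stays visible, so the leading \B can look behind the offset.
   Returns the match start or -1; *end receives the offset past the match. */
static int toy_exec(const char *subject, int length, int start_offset, int *end)
{
  int i;
  for (i = start_offset; i + 3 <= length; i++)
  {
    if (memcmp(subject + i, "iss", 3) == 0 &&
        not_word_boundary(subject, length, i) &&
        not_word_boundary(subject, length, i + 3))
    {
      *end = i + 3;
      return i;
    }
  }
  return -1;
}

/* In the style of /g: same subject every time, with a growing start offset. */
static int count_with_offset(const char *subject)
{
  int length = (int)strlen(subject);
  int offset = 0, end = 0, n = 0;
  while (toy_exec(subject, length, offset, &end) >= 0)
  {
    n++;
    offset = end;
  }
  return n;
}

/* In the style of /G: advance the pointer and shorten the length, so the
   matcher can no longer see what preceded the new starting point. */
static int count_with_pointer(const char *subject)
{
  int length = (int)strlen(subject);
  int end = 0, n = 0;
  while (toy_exec(subject, length, 0, &end) >= 0)
  {
    n++;
    subject += end;
    length -= end;
  }
  return n;
}
```

For the subject "Mississippi", count_with_offset finds two matches and count_with_pointer finds one, which lines up with the /\Biss\B/g+ and /\Biss\B/G+ entries added to testoutput2. The second "iss" is lost in the pointer variant because the shortened subject begins at what looks like a word boundary, so the leading \B cannot match there.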
diff --git a/ChangeLog b/ChangeLog
index 2259f87..d5ac469 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -2,6 +2,29 @@ ChangeLog for PCRE
------------------
+Version 2.06 09-Jun-99
+----------------------
+
+1. Change pcretest's output for amount of store used to show just the code
+space, because the remainder (the data block) varies in size between 32-bit and
+64-bit systems.
+
+2. Added an extra argument to pcre_exec() to supply an offset in the subject to
+start matching at. This allows lookbehinds to work when searching for multiple
+occurrences in a string.
+
+3. Added additional options to pcretest for testing multiple occurrences:
+
+ /+ outputs the rest of the string that follows a match
+ /g loops for multiple occurrences, using the new startoffset argument
+ /G loops for multiple occurrences by passing an incremented pointer
+
+4. PCRE wasn't doing the "first character" optimization for patterns starting
+with \b or \B, though it was doing it for other lookbehind assertions. That is,
+it wasn't noticing that a match for a pattern such as /\bxyz/ has to start with
+the letter 'x'. On long subject strings, this gives a significant speed-up.
+
+
Version 2.05 21-Apr-99
----------------------
diff --git a/README b/README
index 2db0070..190e75f 100644
--- a/README
+++ b/README
@@ -16,8 +16,23 @@ README file for PCRE (Perl-compatible regular expressions)
* possible to pass over a pointer to character tables built in the current *
* locale by pcre_maketables(). To use the default tables, this new argument *
* should be passed as NULL. *
+* *
+* IMPORTANT FOR THOSE UPGRADING FROM VERSION 2.05 *
+* *
+* Yet another (and again I hope this really is the last) change has been made *
+* to the API for the pcre_exec() function. An additional argument has been *
+* added to make it possible to start the match other than at the start of the *
+* subject string. This is important if there are lookbehinds. The new man *
+* page has the details, but if you just want to convert existing programs, *
+* all you need to do is to stick in a new fifth argument to pcre_exec(), with *
+* a value of zero. For example, change *
+* *
+* pcre_exec(pattern, extra, subject, length, options, ovec, ovecsize) *
+* to *
+* pcre_exec(pattern, extra, subject, length, 0, options, ovec, ovecsize) *
*******************************************************************************
+
The distribution should contain the following files:
ChangeLog log of changes to the code
@@ -45,7 +60,7 @@ The distribution should contain the following files:
testinput2 test data for error messages and non-Perl things
testinput3 test data, compatible with Perl 5.005
testinput4 test data for locale-specific tests
- testoutput1 test results corresponding to testinput
+ testoutput1 test results corresponding to testinput1
testoutput2 test results corresponding to testinput2
testoutput3 test results corresponding to testinput3
testoutput4 test results corresponding to testinput4
@@ -112,19 +127,20 @@ Character tables
PCRE uses four tables for manipulating and identifying characters. The final
argument of the pcre_compile() function is a pointer to a block of memory
-containing the concatenated tables. A call to pcre_maketables() is used to
-generate a set of tables in the current locale. However, if the final argument
-is passed as NULL, a set of default tables that is built into the binary is
-used.
+containing the concatenated tables. A call to pcre_maketables() can be used to
+generate a set of tables in the current locale. If the final argument for
+pcre_compile() is passed as NULL, a set of default tables that is built into
+the binary is used.
The source file called chartables.c contains the default set of tables. This is
not supplied in the distribution, but is built by the program dftables
(compiled from dftables.c), which uses the ANSI C character handling functions
such as isalnum(), isalpha(), isupper(), islower(), etc. to build the table
-sources. This means that the default C locale set your system will control the
-contents of the tables. You can change the default tables by editing
-chartables.c and then re-building PCRE. If you do this, you should probably
-also edit Makefile to ensure that the file doesn't ever get re-generated.
+sources. This means that the default C locale which is set for your system will
+control the contents of these default tables. You can change the default tables
+by editing chartables.c and then re-building PCRE. If you do this, you should
+probably also edit Makefile to ensure that the file doesn't ever get
+re-generated.
The first two 256-byte tables provide lower casing and case flipping functions,
respectively. The next table consists of three 32-byte bit maps which identify
@@ -178,9 +194,9 @@ example,
/abc/\
-then a backslash is added to the end of the pattern. This provides a way of
-testing the error condition that arises if a pattern finishes with a backslash,
-because
+then a backslash is added to the end of the pattern. This is done to provide a
+way of testing the error condition that arises if a pattern finishes with a
+backslash, because
/abc\/
@@ -188,42 +204,63 @@ is interpreted as the first line of a pattern that starts with "abc/", causing
pcretest to read the next line as a continuation of the regular expression.
The pattern may be followed by i, m, s, or x to set the PCRE_CASELESS,
-PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively. These
-options have the same effect as they do in Perl.
+PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively. For
+example:
+
+ /caseless/i
+
+These modifier letters have the same effect as they do in Perl. There are
+others which set PCRE options that do not correspond to anything in Perl: /A,
+/E, and /X set PCRE_ANCHORED, PCRE_DOLLAR_ENDONLY, and PCRE_EXTRA respectively.
+
+Searching for all possible matches within each subject string can be requested
+by the /g or /G modifier. The /g modifier behaves similarly to the way it does
+in Perl. After finding a match, PCRE is called again to search the remainder of
+the subject string. The difference between /g and /G is that the former uses
+the start_offset argument to pcre_exec() to start searching at a new point
+within the entire string, whereas the latter passes over a shortened substring.
+This makes a difference to the matching process if the pattern begins with a
+lookbehind assertion (including \b or \B).
-There are also some upper case options that do not match Perl options: /A, /E,
-and /X set PCRE_ANCHORED, PCRE_DOLLAR_ENDONLY, and PCRE_EXTRA respectively.
+There are a number of other modifiers for controlling the way pcretest
+operates.
-The /L option must be followed directly by the name of a locale, for example,
+The /+ modifier requests that as well as outputting the substring that matched
+the entire pattern, pcretest should in addition output the remainder of the
+subject string. This is useful for tests where the subject contains multiple
+copies of the same substring.
+
+The /L modifier must be followed directly by the name of a locale, for example,
/pattern/Lfr
-For this reason, it must be the last option letter. The given locale is set,
+For this reason, it must be the last modifier letter. The given locale is set,
pcre_maketables() is called to build a set of character tables for the locale,
and this is then passed to pcre_compile() when compiling the regular
-expression. Without an /L option, NULL is passed as the tables pointer; that
+expression. Without an /L modifier, NULL is passed as the tables pointer; that
is, /L applies only to the expression on which it appears.
-The /I option requests that pcretest output information about the compiled
+The /I modifier requests that pcretest output information about the compiled
expression (whether it is anchored, has a fixed first character, and so on). It
does this by calling pcre_info() after compiling an expression, and outputting
the information it gets back. If the pattern is studied, the results of that
are also output.
-The /D option is a PCRE debugging feature, which also assumes /I. It causes the
-internal form of compiled regular expressions to be output after compilation.
+The /D modifier is a PCRE debugging feature, which also assumes /I. It causes
+the internal form of compiled regular expressions to be output after
+compilation.
-The /S option causes pcre_study() to be called after the expression has been
+The /S modifier causes pcre_study() to be called after the expression has been
compiled, and the results used when the expression is matched.
-The /M option causes information about the size of memory block used to hold
+The /M modifier causes information about the size of memory block used to hold
the compiled pattern to be output.
-Finally, the /P option causes pcretest to call PCRE via the POSIX wrapper API
-rather than its native API. When this is done, all other options except /i and
-/m are ignored. REG_ICASE is set if /i is present, and REG_NEWLINE is set if /m
-is present. The wrapper functions force PCRE_DOLLAR_ENDONLY always, and
-PCRE_DOTALL unless REG_NEWLINE is set.
+Finally, the /P modifier causes pcretest to call PCRE via the POSIX wrapper API
+rather than its native API. When this is done, all other modifiers except /i,
+/m, and /+ are ignored. REG_ICASE is set if /i is present, and REG_NEWLINE is
+set if /m is present. The wrapper functions force PCRE_DOLLAR_ENDONLY always,
+and PCRE_DOTALL unless REG_NEWLINE is set.
Before each data line is passed to pcre_exec(), leading and trailing whitespace
is removed, and it is then scanned for \ escapes. The following are recognized:
@@ -263,16 +300,38 @@ pcre_exec() returns, starting with number 0 for the string that matched the
whole pattern. Here is an example of an interactive pcretest run.
$ pcretest
- Testing Perl-Compatible Regular Expressions
- PCRE version 0.90 08-Sep-1997
+ PCRE version 2.06 08-Jun-1999
re> /^abc(\d+)/
data> abc123
- 0: abc123
- 1: 123
+ 0: abc123
+ 1: 123
data> xyz
No match
+If the strings contain any non-printing characters, they are output as \x
+escapes. If the pattern has the /+ modifier, then the output for substring 0 is
+followed by the rest of the subject string, identified by "0+" like this:
+
+ re> /cat/+
+ data> cataract
+ 0: cat
+ 0+ aract
+
+If the pattern has the /g or /G modifier, the results of successive matching
+attempts are output in sequence, like this:
+
+ re> /\Bi(\w\w)/g
+ data> Mississippi
+ 0: iss
+ 1: ss
+ 0: iss
+ 1: ss
+ 0: ipp
+ 1: pp
+
+"No match" is output only if the first match attempt fails.
+
If any of \C, \G, or \L are present in a data line that is successfully
matched, the substrings extracted by the convenience functions are output with
C, G, or L after the string number instead of a colon. This is in addition to
@@ -313,21 +372,21 @@ The perltest program
The perltest program tests Perl's regular expressions; it has the same
specification as pcretest, and so can be given identical input, except that
-input patterns can be followed only by Perl's lower case options. The contents
-of testinput1 and testinput3 meet this condition.
+input patterns can be followed only by Perl's lower case modifiers. The
+contents of testinput1 and testinput3 meet this condition.
The data lines are processed as Perl strings, so if they contain $ or @
characters, these have to be escaped. For this reason, all such characters in
-the testinput file are escaped so that it can be used for perltest as well as
-for pcretest, and the special upper case options such as /A that pcretest
-recognizes are not used in this file. The output should be identical, apart
-from the initial identifying banner.
+testinput1 and testinput3 are escaped so that they can be used for perltest as
+well as for pcretest, and the special upper case modifiers such as /A that
+pcretest recognizes are not used in these files. The output should be
+identical, apart from the initial identifying banner.
-The testinput2 and testinput4 files are not suitable for feeding to Perltest,
-since they do make use of the special upper case options and escapes that
+The testinput2 and testinput4 files are not suitable for feeding to perltest,
+since they do make use of the special upper case modifiers and escapes that
pcretest uses to test some features of PCRE. The first of these files also
contains malformed regular expressions, in order to check that PCRE diagnoses
them correctly.
Philip Hazel <ph10@cam.ac.uk>
-April 1999
+June 1999
diff --git a/dftables.c b/dftables.c
index 729049f..7b336e6 100644
--- a/dftables.c
+++ b/dftables.c
@@ -59,7 +59,7 @@ printf(
"/*************************************************\n"
"* Perl-Compatible Regular Expressions *\n"
"*************************************************/\n\n"
- "/* This file is automatically written by the makechartables auxiliary \n"
+ "/* This file is automatically written by the dftables auxiliary \n"
"program. If you edit it by hand, you might like to edit the Makefile to \n"
"prevent its ever being regenerated.\n\n"
"This file is #included in the compilation of pcre.c to build the default\n"
diff --git a/internal.h b/internal.h
index 2b28ac1..e162c96 100644
--- a/internal.h
+++ b/internal.h
@@ -3,7 +3,7 @@
*************************************************/
-#define PCRE_VERSION "2.05 21-Apr-1999"
+#define PCRE_VERSION "2.06 21-Jun-1999"
/* This is a library of functions to support regular expressions whose syntax
diff --git a/pcre.3 b/pcre.3
index ec356e1..63358cc 100644
--- a/pcre.3
+++ b/pcre.3
@@ -20,9 +20,9 @@ pcre - Perl-compatible regular expressions.
.br
.B int pcre_exec(const pcre *\fIcode\fR, "const pcre_extra *\fIextra\fR,"
.ti +5n
-.B "const char *\fIsubject\fR," int \fIlength\fR, int \fIoptions\fR,
+.B "const char *\fIsubject\fR," int \fIlength\fR, int \fIstartoffset\fR,
.ti +5n
-.B int *\fIovector\fR, int \fIovecsize\fR);
+.B int \fIoptions\fR, int *\fIovector\fR, int \fIovecsize\fR);
.PP
.br
.B int pcre_copy_substring(const char *\fIsubject\fR, int *\fIovector\fR,
@@ -249,7 +249,7 @@ treated as letters), the following code could be used:
The tables are built in memory that is obtained via \fBpcre_malloc\fR. The
pointer that is passed to \fBpcre_compile\fR is saved with the compiled
pattern, and the same tables are used via this pointer by \fBpcre_study()\fR
-and \fBpcre_match()\fR. Thus for any single pattern, compilation, studying and
+and \fBpcre_exec()\fR. Thus for any single pattern, compilation, studying and
matching all happen in the same locale, but different patterns can be compiled
in different locales. It is the caller's responsibility to ensure that the
memory containing the tables remains available for as long as it is needed.
@@ -293,9 +293,6 @@ pre-compiled pattern, which is passed in the \fIcode\fR argument. If the
pattern has been studied, the result of the study should be passed in the
\fIextra\fR argument. Otherwise this must be NULL.
-The subject string is passed as a pointer in \fIsubject\fR and a length in
-\fIlength\fR. Unlike the pattern string, it may contain binary zero characters.
-
The PCRE_ANCHORED option can be passed in the \fIoptions\fR argument, whose
unused bits must be zero. However, if a pattern was compiled with
PCRE_ANCHORED, or turned out to be anchored by virtue of its contents, it
@@ -316,6 +313,34 @@ should not match it nor (except in multiline mode) a newline immediately before
it. Setting this without PCRE_MULTILINE (at compile time) causes dollar never
to match.
+The subject string is passed as a pointer in \fIsubject\fR, a length in
+\fIlength\fR, and a starting offset in \fIstartoffset\fR. Unlike the pattern
+string, it may contain binary zero characters. When the starting offset is
+zero, the search for a match starts at the beginning of the subject, and this
+is by far the most common case.
+
+A non-zero starting offset is useful when searching for another match in the
+same subject by calling \fBpcre_exec()\fR again after a previous success.
+Setting \fIstartoffset\fR differs from just passing over a shortened string and
+setting PCRE_NOTBOL in the case of a pattern that begins with any kind of
+lookbehind. For example, consider the pattern
+
+ \\Biss\\B
+
+which finds occurrences of "iss" in the middle of words. (\\B matches only if
+the current position in the subject is not a word boundary.) When applied to
+the string "Mississippi" the first call to \fBpcre_exec()\fR finds the first
+occurrence. If \fBpcre_exec()\fR is called again with just the remainder of the
+subject, namely "issippi", it does not match, because \\B is always false at the
+start of the subject, which is deemed to be a word boundary. However, if
+\fBpcre_exec()\fR is passed the entire string again, but with \fIstartoffset\fR
+set to 4, it finds the second occurrence of "iss" because it is able to look
+behind the starting point to discover that it is preceded by a letter.
+
+If a non-zero starting offset is passed when the pattern is anchored, one
+attempt to match at the given offset is tried. This can only succeed if the
+pattern does not require the match to be at the start of the subject.
+
In general, a pattern matches a certain portion of the subject, and in
addition, further substrings from the subject may be picked out by parts of the
pattern. Following the usage in Jeffrey Friedl's book, this is called
@@ -730,16 +755,19 @@ first or last character matches \\w, respectively.
The \\A, \\Z, and \\z assertions differ from the traditional circumflex and
dollar (described below) in that they only ever match at the very start and end
of the subject string, whatever options are set. They are not affected by the
-PCRE_NOTBOL or PCRE_NOTEOL options. The difference between \\Z and \\z is that
-\\Z matches before a newline that is the last character of the string as well
-as at the end of the string, whereas \\z matches only at the end.
+PCRE_NOTBOL or PCRE_NOTEOL options. If the \fIstartoffset\fR argument of
+\fBpcre_exec()\fR is non-zero, \\A can never match. The difference between \\Z
+and \\z is that \\Z matches before a newline that is the last character of the
+string as well as at the end of the string, whereas \\z matches only at the
+end.
.SH CIRCUMFLEX AND DOLLAR
Outside a character class, in the default matching mode, the circumflex
character is an assertion which is true only if the current matching point is
-at the start of the subject string. Inside a character class, circumflex has an
-entirely different meaning (see below).
+at the start of the subject string. If the \fIstartoffset\fR argument of
+\fBpcre_exec()\fR is non-zero, circumflex can never match. Inside a character
+class, circumflex has an entirely different meaning (see below).
Circumflex need not be the first character of the pattern if a number of
alternatives are involved, but it should be the first thing in each alternative
@@ -766,8 +794,10 @@ after and immediately before an internal "\\n" character, respectively, in
addition to matching at the start and end of the subject string. For example,
the pattern /^abc$/ matches the subject string "def\\nabc" in multiline mode,
but not otherwise. Consequently, patterns that are anchored in single line mode
-because all branches start with "^" are not anchored in multiline mode. The
-PCRE_DOLLAR_ENDONLY option is ignored if PCRE_MULTILINE is set.
+because all branches start with "^" are not anchored in multiline mode, and a
+match for circumflex is possible when the \fIstartoffset\fR argument of
+\fBpcre_exec()\fR is non-zero. The PCRE_DOLLAR_ENDONLY option is ignored if
+PCRE_MULTILINE is set.
Note that the sequences \\A, \\Z, and \\z can be used to match the start and
end of the subject in both modes, and if all branches of a pattern start with
@@ -1219,11 +1249,11 @@ matches an occurrence of "baz" that is preceded by "bar" which in turn is not
preceded by "foo".
Assertion subpatterns are not capturing subpatterns, and may not be repeated,
-because it makes no sense to assert the same thing several times. If an
-assertion contains capturing subpatterns within it, these are always counted
-for the purposes of numbering the capturing subpatterns in the whole pattern.
-Substring capturing is carried out for positive assertions, but it does not
-make sense for negative assertions.
+because it makes no sense to assert the same thing several times. If any kind
+of assertion contains capturing subpatterns within it, these are counted for
+the purposes of numbering the capturing subpatterns in the whole pattern.
+However, substring capturing is carried out only for positive assertions,
+because it does not make sense for negative assertions.
Assertions count towards the maximum of 200 parenthesized subpatterns.
@@ -1390,4 +1420,6 @@ Cambridge CB2 3QG, England.
.br
Phone: +44 1223 334714
+Last updated: 10 June 1999
+.br
Copyright (c) 1997-1999 University of Cambridge.
diff --git a/pcre.c b/pcre.c
index dd5852d..58adcef 100644
--- a/pcre.c
+++ b/pcre.c
@@ -1790,6 +1790,11 @@ for (;;)
code += 2;
break;
+ case OP_WORD_BOUNDARY:
+ case OP_NOT_WORD_BOUNDARY:
+ code++;
+ break;
+
case OP_ASSERT_NOT:
case OP_ASSERTBACK:
case OP_ASSERTBACK_NOT:
@@ -4113,6 +4118,7 @@ Arguments:
external_extra points to "hints" from pcre_study() or is NULL
subject points to the subject string
length length of subject string (may contain binary zeros)
+ start_offset where to start in the subject string
options option bits
offsets points to a vector of ints to be filled in with offsets
offsetcount the number of elements in the vector
@@ -4125,14 +4131,15 @@ Returns: > 0 => success; value is the number of elements filled in
int
pcre_exec(const pcre *external_re, const pcre_extra *external_extra,
- const char *subject, int length, int options, int *offsets, int offsetcount)
+ const char *subject, int length, int start_offset, int options, int *offsets,
+ int offsetcount)
{
int resetcount, ocount;
int first_char = -1;
int ims = 0;
match_data match_block;
const uschar *start_bits = NULL;
-const uschar *start_match = (const uschar *)subject;
+const uschar *start_match = (const uschar *)subject + start_offset;
const uschar *end_subject;
const real_pcre *re = (const real_pcre *)external_re;
const real_pcre_extra *extra = (const real_pcre_extra *)external_extra;
@@ -4224,7 +4231,7 @@ if (!anchored)
start_bits = extra->start_bits;
}
-/* Loop for unanchored matches; for anchored regexps the loop runs just once. */
+/* Loop for unanchored matches; for anchored regexs the loop runs just once. */
do
{
diff --git a/pcre.h b/pcre.h
index 27204b6..148fd3b 100644
--- a/pcre.h
+++ b/pcre.h
@@ -59,7 +59,7 @@ extern pcre *pcre_compile(const char *, int, const char **, int *,
const unsigned char *);
extern int pcre_copy_substring(const char *, int *, int, int, char *, int);
extern int pcre_exec(const pcre *, const pcre_extra *, const char *,
- int, int, int *, int);
+ int, int, int, int *, int);
extern int pcre_get_substring(const char *, int *, int, int, const char **);
extern int pcre_get_substring_list(const char *, int *, int, const char ***);
extern int pcre_info(const pcre *, int *, int *);
diff --git a/pcreposix.c b/pcreposix.c
index b370701..9672be4 100644
--- a/pcreposix.c
+++ b/pcreposix.c
@@ -223,7 +223,7 @@ if ((eflags & REG_NOTEOL) != 0) options |= PCRE_NOTEOL;
preg->re_erroffset = (size_t)(-1); /* Only has meaning after compile */
-rc = pcre_exec(preg->re_pcre, NULL, string, (int)strlen(string), options,
+rc = pcre_exec(preg->re_pcre, NULL, string, (int)strlen(string), 0, options,
(int *)pmatch, nmatch * 2);
if (rc == 0) return 0; /* All pmatch were filled in */
diff --git a/pcretest.c b/pcretest.c
index da736a2..537fb34 100644
--- a/pcretest.c
+++ b/pcretest.c
@@ -275,8 +275,8 @@ compiled re. */
static void *new_malloc(size_t size)
{
if (log_store)
- fprintf(outfile, "Memory allocation request: %d (code space %d)\n",
- (int)size, (int)size - offsetof(real_pcre, code[0]));
+ fprintf(outfile, "Memory allocation (code space): %d\n",
+ (int)((int)size - offsetof(real_pcre, code[0])));
return malloc(size);
}
@@ -372,7 +372,10 @@ while (!done)
unsigned const char *tables = NULL;
int do_study = 0;
int do_debug = debug;
+ int do_G = 0;
+ int do_g = 0;
int do_showinfo = showinfo;
+ int do_showrest = 0;
int do_posix = 0;
int erroroffset, len, delimiter;
@@ -444,14 +447,17 @@ while (!done)
{
switch (*pp++)
{
+ case 'g': do_g = 1; break;
case 'i': options |= PCRE_CASELESS; break;
case 'm': options |= PCRE_MULTILINE; break;
case 's': options |= PCRE_DOTALL; break;
case 'x': options |= PCRE_EXTENDED; break;
+ case '+': do_showrest = 1; break;
case 'A': options |= PCRE_ANCHORED; break;
case 'D': do_debug = do_showinfo = 1; break;
case 'E': options |= PCRE_DOLLAR_ENDONLY; break;
+ case 'G': do_G = 1; break;
case 'I': do_showinfo = 1; break;
case 'M': log_store = 1; break;
case 'P': do_posix = 1; break;
@@ -661,16 +667,18 @@ while (!done)
for (;;)
{
unsigned char *q;
+ unsigned char *bptr = dbuffer;
int count, c;
int copystrings = 0;
int getstrings = 0;
int getlist = 0;
+ int start_offset = 0;
int offsets[45];
int size_offsets = sizeof(offsets)/sizeof(int);
options = 0;
- if (infile == stdin) printf(" data> ");
+ if (infile == stdin) printf("data> ");
if (fgets((char *)buffer, sizeof(buffer), infile) == NULL)
{
done = 1;
@@ -769,8 +777,8 @@ while (!done)
if ((options & PCRE_NOTBOL) != 0) eflags |= REG_NOTBOL;
if ((options & PCRE_NOTEOL) != 0) eflags |= REG_NOTEOL;
- rc = regexec(&preg, (char *)dbuffer, sizeof(pmatch)/sizeof(regmatch_t),
- pmatch, eflags);
+ rc = regexec(&preg, (unsigned char *)bptr,
+ sizeof(pmatch)/sizeof(regmatch_t), pmatch, eflags);
if (rc != 0)
{
@@ -788,14 +796,20 @@ while (!done)
pchars(dbuffer + pmatch[i].rm_so,
pmatch[i].rm_eo - pmatch[i].rm_so);
fprintf(outfile, "\n");
+ if (i == 0 && do_showrest)
+ {
+ fprintf(outfile, " 0+ ");
+ pchars(dbuffer + pmatch[i].rm_eo, len - pmatch[i].rm_eo);
+ fprintf(outfile, "\n");
+ }
}
}
}
}
- /* Handle matching via the native interface */
+ /* Handle matching via the native interface - repeats for /g and /G */
- else
+ else for (;;)
{
if (timeit)
{
@@ -803,16 +817,16 @@ while (!done)
clock_t time_taken;
clock_t start_time = clock();
for (i = 0; i < LOOPREPEAT; i++)
- count = pcre_exec(re, extra, (char *)dbuffer, len, options, offsets,
- size_offsets);
+ count = pcre_exec(re, extra, (char *)bptr, len,
+ (do_g? start_offset : 0), options, offsets, size_offsets);
time_taken = clock() - start_time;
fprintf(outfile, "Execute time %.3f milliseconds\n",
((double)time_taken * 1000.0)/
((double)LOOPREPEAT * (double)CLOCKS_PER_SEC));
}
- count = pcre_exec(re, extra, (char *)dbuffer, len, options, offsets,
- size_offsets);
+ count = pcre_exec(re, extra, (char *)bptr, len,
+ (do_g? start_offset : 0), options, offsets, size_offsets);
if (count == 0)
{
@@ -830,8 +844,18 @@ while (!done)
else
{
fprintf(outfile, "%2d: ", i/2);
- pchars(dbuffer + offsets[i], offsets[i+1] - offsets[i]);
+ pchars(bptr + offsets[i], offsets[i+1] - offsets[i]);
fprintf(outfile, "\n");
+ if (i == 0)
+ {
+ start_offset = offsets[1];
+ if (do_showrest)
+ {
+ fprintf(outfile, " 0+ ");
+ pchars(bptr + offsets[i+1], len - offsets[i+1]);
+ fprintf(outfile, "\n");
+ }
+ }
}
}
@@ -840,7 +864,7 @@ while (!done)
if ((copystrings & (1 << i)) != 0)
{
char buffer[16];
- int rc = pcre_copy_substring((char *)dbuffer, offsets, count,
+ int rc = pcre_copy_substring((char *)bptr, offsets, count,
i, buffer, sizeof(buffer));
if (rc < 0)
fprintf(outfile, "copy substring %d failed %d\n", i, rc);
@@ -854,7 +878,7 @@ while (!done)
if ((getstrings & (1 << i)) != 0)
{
const char *substring;
- int rc = pcre_get_substring((char *)dbuffer, offsets, count,
+ int rc = pcre_get_substring((char *)bptr, offsets, count,
i, &substring);
if (rc < 0)
fprintf(outfile, "get substring %d failed %d\n", i, rc);
@@ -869,7 +893,7 @@ while (!done)
if (getlist)
{
const char **stringlist;
- int rc = pcre_get_substring_list((char *)dbuffer, offsets, count,
+ int rc = pcre_get_substring_list((char *)bptr, offsets, count,
&stringlist);
if (rc < 0)
fprintf(outfile, "get substring list failed %d\n", rc);
@@ -886,8 +910,19 @@ while (!done)
}
else
{
- if (count == -1) fprintf(outfile, "No match\n");
- else fprintf(outfile, "Error %d\n", count);
+ if (start_offset == 0)
+ {
+ if (count == -1) fprintf(outfile, "No match\n");
+ else fprintf(outfile, "Error %d\n", count);
+ }
+ start_offset = -1;
+ }
+
+ if ((!do_g && !do_G) || start_offset <= 0) break;
+ if (do_G)
+ {
+ bptr += start_offset;
+ len -= start_offset;
}
}
}
diff --git a/pgrep.c b/pgrep.c
index b410836..1cf5289 100644
--- a/pgrep.c
+++ b/pgrep.c
@@ -74,7 +74,7 @@ while (fgets(buffer, sizeof(buffer), in) != NULL)
if (length > 0 && buffer[length-1] == '\n') buffer[--length] = 0;
linenumber++;
- match = pcre_exec(pattern, hints, buffer, length, 0, offsets, 99) >= 0;
+ match = pcre_exec(pattern, hints, buffer, length, 0, 0, offsets, 99) >= 0;
if (match && whole_lines && offsets[1] != length) match = FALSE;
if (match != invert)
diff --git a/testinput1 b/testinput1
index 2d0116c..511a706 100644
--- a/testinput1
+++ b/testinput1
@@ -1813,4 +1813,46 @@
*** Failers
abcde\nBar
+/^.*B/
+ **** Failers
+ abc\nB
+
+/(?s)^.*B/
+ abc\nB
+
+/(?m)^.*B/
+ abc\nB
+
+/(?ms)^.*B/
+ abc\nB
+
+/(?ms)^B/
+ abc\nB
+
+/(?s)B$/
+ B\n
+
+/^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/
+ 123456654321
+
+/^\d\d\d\d\d\d\d\d\d\d\d\d/
+ 123456654321
+
+/^[\d][\d][\d][\d][\d][\d][\d][\d][\d][\d][\d][\d]/
+ 123456654321
+
+/^[abc]{12}/
+ abcabcabcabc
+
+/^[a-c]{12}/
+ abcabcabcabc
+
+/^(a|b|c){12}/
+ abcabcabcabc
+
+/^[abcdefghijklmnopqrstuvwxy0123456789]/
+ n
+ *** Failers
+ z
+
/ End of test input /
diff --git a/testinput2 b/testinput2
index 39a7560..2046605 100644
--- a/testinput2
+++ b/testinput2
@@ -442,4 +442,48 @@
/(?s:.*X|^B)/D
+/\Biss\B/+
+ Mississippi
+
+/\Biss\B/+P
+ Mississippi
+
+/iss/G+
+ Mississippi
+
+/\Biss\B/G+
+ Mississippi
+
+/\Biss\B/g+
+ Mississippi
+ *** Failers
+ Mississippi\A
+
+/(?<=[Ms])iss/g+
+ Mississippi
+
+/(?<=[Ms])iss/G+
+ Mississippi
+
+/^iss/g+
+ ississippi
+
+/.*iss/g+
+ abciss\nxyzisspqr
+
+/.i./+g
+ Mississippi
+ Mississippi\A
+ Missouri river
+ Missouri river\A
+
+/^.is/+g
+ Mississippi
+
+/^ab\n/g+
+ ab\nab\ncd
+
+/^ab\n/mg+
+ ab\nab\ncd
+
/ End of test input /
diff --git a/testoutput1 b/testoutput1
index bfe8862..ce809e3 100644
--- a/testoutput1
+++ b/testoutput1
@@ -1,4 +1,4 @@
-PCRE version 2.05 21-Apr-1999
+PCRE version 2.06 21-Jun-1999
/the quick brown fox/
the quick brown fox
@@ -2767,5 +2767,64 @@ No match
abcde\nBar
No match
+/^.*B/
+ **** Failers
+No match
+ abc\nB
+No match
+
+/(?s)^.*B/
+ abc\nB
+ 0: abc\x0aB
+
+/(?m)^.*B/
+ abc\nB
+ 0: B
+
+/(?ms)^.*B/
+ abc\nB
+ 0: abc\x0aB
+
+/(?ms)^B/
+ abc\nB
+ 0: B
+
+/(?s)B$/
+ B\n
+ 0: B
+
+/^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/
+ 123456654321
+ 0: 123456654321
+
+/^\d\d\d\d\d\d\d\d\d\d\d\d/
+ 123456654321
+ 0: 123456654321
+
+/^[\d][\d][\d][\d][\d][\d][\d][\d][\d][\d][\d][\d]/
+ 123456654321
+ 0: 123456654321
+
+/^[abc]{12}/
+ abcabcabcabc
+ 0: abcabcabcabc
+
+/^[a-c]{12}/
+ abcabcabcabc
+ 0: abcabcabcabc
+
+/^(a|b|c){12}/
+ abcabcabcabc
+ 0: abcabcabcabc
+ 1: c
+
+/^[abcdefghijklmnopqrstuvwxy0123456789]/
+ n
+ 0: n
+ *** Failers
+No match
+ z
+No match
+
/ End of test input /
diff --git a/testoutput2 b/testoutput2
index 09148ff..91a09cd 100644
--- a/testoutput2
+++ b/testoutput2
@@ -1,4 +1,4 @@
-PCRE version 2.05 21-Apr-1999
+PCRE version 2.06 21-Jun-1999
/(a)b|/
Identifying subpattern count = 1
@@ -981,7 +981,7 @@ No first char
/word ((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+
)((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+
)?)?)?)?)?)?)?)?)?otherword/M
-Memory allocation request: 441 (code space 428)
+Memory allocation (code space): 428
Identifying subpattern count = 8
No options
First char = 'w'
@@ -1081,6 +1081,142 @@ Identifying subpattern count = 0
No options
First char at start or follows \n
+/\Biss\B/+
+Identifying subpattern count = 0
+No options
+First char = 'i'
+ Mississippi
+ 0: iss
+ 0+ issippi
+
+/\Biss\B/+P
+ Mississippi
+ 0: iss
+ 0+ issippi
+
+/iss/G+
+Identifying subpattern count = 0
+No options
+First char = 'i'
+ Mississippi
+ 0: iss
+ 0+ issippi
+ 0: iss
+ 0+ ippi
+
+/\Biss\B/G+
+Identifying subpattern count = 0
+No options
+First char = 'i'
+ Mississippi
+ 0: iss
+ 0+ issippi
+
+/\Biss\B/g+
+Identifying subpattern count = 0
+No options
+First char = 'i'
+ Mississippi
+ 0: iss
+ 0+ issippi
+ 0: iss
+ 0+ ippi
+ *** Failers
+No match
+ Mississippi\A
+No match
+
+/(?<=[Ms])iss/g+
+Identifying subpattern count = 0
+No options
+First char = 'i'
+ Mississippi
+ 0: iss
+ 0+ issippi
+ 0: iss
+ 0+ ippi
+
+/(?<=[Ms])iss/G+
+Identifying subpattern count = 0
+No options
+First char = 'i'
+ Mississippi
+ 0: iss
+ 0+ issippi
+
+/^iss/g+
+Identifying subpattern count = 0
+Options: anchored
+No first char
+ ississippi
+ 0: iss
+ 0+ issippi
+
+/.*iss/g+
+Identifying subpattern count = 0
+No options
+First char at start or follows \n
+ abciss\nxyzisspqr
+ 0: abciss
+ 0+ \x0axyzisspqr
+ 0: xyziss
+ 0+ pqr
+
+/.i./+g
+Identifying subpattern count = 0
+No options
+No first char
+ Mississippi
+ 0: Mis
+ 0+ sissippi
+ 0: sis
+ 0+ sippi
+ 0: sip
+ 0+ pi
+ Mississippi\A
+ 0: Mis
+ 0+ sissippi
+ 0: sis
+ 0+ sippi
+ 0: sip
+ 0+ pi
+ Missouri river
+ 0: Mis
+ 0+ souri river
+ 0: ri
+ 0+ river
+ 0: riv
+ 0+ er
+ Missouri river\A
+ 0: Mis
+ 0+ souri river
+
+/^.is/+g
+Identifying subpattern count = 0
+Options: anchored
+No first char
+ Mississippi
+ 0: Mis
+ 0+ sissippi
+
+/^ab\n/g+
+Identifying subpattern count = 0
+Options: anchored
+No first char
+ ab\nab\ncd
+ 0: ab\x0a
+ 0+ ab\x0acd
+
+/^ab\n/mg+
+Identifying subpattern count = 0
+Options: multiline
+First char at start or follows \n
+ ab\nab\ncd
+ 0: ab\x0a
+ 0+ ab\x0acd
+ 0: ab\x0a
+ 0+ cd
+
/ End of test input /
Identifying subpattern count = 0
No options
diff --git a/testoutput3 b/testoutput3
index 6d597cd..31c79c0 100644
--- a/testoutput3
+++ b/testoutput3
@@ -1,4 +1,4 @@
-PCRE version 2.05 21-Apr-1999
+PCRE version 2.06 21-Jun-1999
/(?<!bar)foo/
foo
diff --git a/testoutput4 b/testoutput4
index 0e156c4..36bceb6 100644
--- a/testoutput4
+++ b/testoutput4
@@ -1,4 +1,4 @@
-PCRE version 2.05 21-Apr-1999
+PCRE version 2.06 21-Jun-1999
/^[\w]+/
*** Failers