author    nigel <nigel@2f5784b3-3f2a-0410-8824-cb99058d5e15>  2007-02-24 21:38:53 +0000
committer nigel <nigel@2f5784b3-3f2a-0410-8824-cb99058d5e15>  2007-02-24 21:38:53 +0000
commit    7703eae0f55edaff9f482fa8d23a6910d5d18577
tree      83aa003e890adb9ef5e1968d02febf0256cf61ac
parent    0c8732c8583c7e31476c0ec1c0ac92cc7e5f8bc0
Load pcre-2.03 into code/trunk.
git-svn-id: svn://vcs.exim.org/pcre/code/trunk@29 2f5784b3-3f2a-0410-8824-cb99058d5e15
-rw-r--r-- ChangeLog       16
-rw-r--r-- LICENCE         32
-rw-r--r-- Makefile         5
-rw-r--r-- README          60
-rw-r--r-- dftables.c       4
-rw-r--r-- get.c          189
-rw-r--r-- internal.h       6
-rw-r--r-- maketables.c     4
-rw-r--r-- pcre.3         145
-rw-r--r-- pcre.c           4
-rw-r--r-- pcre.h           8
-rw-r--r-- pcreposix.c      4
-rw-r--r-- pcretest.c      86
-rw-r--r-- study.c          4
-rw-r--r-- testinput       26
-rw-r--r-- testinput2      46
-rw-r--r-- testinput3       6
-rw-r--r-- testoutput      48
-rw-r--r-- testoutput2    146
-rw-r--r-- testoutput3     14
-rw-r--r-- testoutput4      2
21 files changed, 778 insertions, 77 deletions
diff --git a/ChangeLog b/ChangeLog
index 435b90a..a057a22 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -2,6 +2,22 @@ ChangeLog for PCRE
------------------
+Version 2.03 02-Feb-99
+----------------------
+
+1. Fixed typo and small mistake in man page.
+
+2. Added 4th condition (GPL supersedes) and created separate LICENCE file.
+
+3. Updated pcretest so that patterns such as /abc\/def/ work like they do in
+Perl, that is the internal \ allows the delimiter to be included in the
+pattern. Locked out the use of \ as a delimiter. If \ immediately follows
+the final delimiter, add \ to the end of the pattern (to test the error).
+
+4. Added the convenience functions for extracting substrings after a successful
+match. Updated pcretest to make it able to test these functions.
+
+
Version 2.02 14-Jan-99
----------------------
diff --git a/LICENCE b/LICENCE
new file mode 100644
index 0000000..246515a
--- /dev/null
+++ b/LICENCE
@@ -0,0 +1,32 @@
+PCRE LICENCE
+------------
+
+PCRE is a library of functions to support regular expressions whose syntax
+and semantics are as close as possible to those of the Perl 5 language.
+
+Written by: Philip Hazel <ph10@cam.ac.uk>
+
+University of Cambridge Computing Service,
+Cambridge, England. Phone: +44 1223 334714.
+
+Copyright (c) 1997-1999 University of Cambridge
+
+Permission is granted to anyone to use this software for any purpose on any
+computer system, and to redistribute it freely, subject to the following
+restrictions:
+
+1. This software is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+
+2. The origin of this software must not be misrepresented, either by
+ explicit claim or by omission.
+
+3. Altered versions must be plainly marked as such, and must not be
+ misrepresented as being the original software.
+
+4. If PCRE is embedded in any software that is released under the GNU
+ General Purpose Licence (GPL), then the terms of that licence shall
+ supersede any condition above with which it is incompatible.
+
+End
diff --git a/Makefile b/Makefile
index afa6316..2da3012 100644
--- a/Makefile
+++ b/Makefile
@@ -19,7 +19,7 @@ RANLIB = @true
##########################################################################
-OBJ = maketables.o study.o pcre.o
+OBJ = maketables.o get.o study.o pcre.o
all: libpcre.a libpcreposix.a pcretest pgrep
@@ -48,6 +48,9 @@ pcreposix.o: pcreposix.c pcreposix.h internal.h pcre.h Makefile
maketables.o: maketables.c pcre.h internal.h Makefile
$(CC) -c $(CFLAGS) maketables.c
+get.o: get.c pcre.h internal.h Makefile
+ $(CC) -c $(CFLAGS) get.c
+
study.o: study.c pcre.h internal.h Makefile
$(CC) -c $(CFLAGS) study.c
diff --git a/README b/README
index e169e46..29fc714 100644
--- a/README
+++ b/README
@@ -21,6 +21,7 @@ README file for PCRE (Perl-compatible regular expressions)
The distribution should contain the following files:
ChangeLog log of changes to the code
+ LICENCE conditions for the use of PCRE
Makefile for building PCRE
README this file
RunTest a shell script for running tests
@@ -28,6 +29,7 @@ The distribution should contain the following files:
pcre.3 man page for the functions
pcreposix.3 man page for the POSIX wrapper API
dftables.c auxiliary program for building chartables.c
+ get.c )
maketables.c )
study.c ) source of
pcre.c ) the functions
@@ -69,8 +71,9 @@ additional features of release 5.005, which is why it is kept separate from the
main test input, which needs only Perl 5.004. In the long run, when 5.005 is
widespread, these two test files may get amalgamated.
-The second set of tests check pcre_info(), pcre_study(), error detection and
-run-time flags that are specific to PCRE, as well as the POSIX wrapper API.
+The second set of tests check pcre_info(), pcre_study(), pcre_copy_substring(),
+pcre_get_substring(), pcre_get_substring_list(), error detection and run-time
+flags that are specific to PCRE, as well as the POSIX wrapper API.
The fourth set of tests checks pcre_maketables(), the facility for building a
set of character tables for a specific locale and using them instead of the
@@ -157,13 +160,36 @@ The program handles any number of sets of input on a single input file. Each
set starts with a regular expression, and continues with any number of data
lines to be matched against the pattern. An empty line signals the end of the
set. The regular expressions are given enclosed in any non-alphameric
-delimiters, for example
+delimiters other than backslash, for example
/(a|bc)x+yz/
-and may be followed by i, m, s, or x to set the PCRE_CASELESS, PCRE_MULTILINE,
-PCRE_DOTALL, or PCRE_EXTENDED options, respectively. These options have the
-same effect as they do in Perl.
+White space before the initial delimiter is ignored. A regular expression may
+be continued over several input lines, in which case the newline characters are
+included within it. See the testinput files for many examples. It is possible
+to include the delimiter within the pattern by escaping it, for example
+
+ /abc\/def/
+
+If you do so, the escape and the delimiter form part of the pattern, but since
+delimiters are always non-alphameric, this does not affect its interpretation.
+If the terminating delimiter is immediately followed by a backslash, for
+example,
+
+ /abc/\
+
+then a backslash is added to the end of the pattern. This provides a way of
+testing the error condition that arises if a pattern finishes with a backslash,
+because
+
+ /abc\/
+
+is interpreted as the first line of a pattern that starts with "abc/", causing
+pcretest to read the next line as a continuation of the regular expression.
+
+The pattern may be followed by i, m, s, or x to set the PCRE_CASELESS,
+PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively. These
+options have the same effect as they do in Perl.
There are also some upper case options that do not match Perl options: /A, /E,
and /X set PCRE_ANCHORED, PCRE_DOLLAR_ENDONLY, and PCRE_EXTRA respectively.
@@ -196,9 +222,6 @@ rather than its native API. When this is done, all other options except /i and
is present. The wrapper functions force PCRE_DOLLAR_ENDONLY always, and
PCRE_DOTALL unless REG_NEWLINE is set.
-A regular expression can extend over several lines of input; the newlines are
-included in it. See the testinput files for many examples.
-
Before each data line is passed to pcre_exec(), leading and trailing whitespace
is removed, and it is then scanned for \ escapes. The following are recognized:
@@ -215,6 +238,11 @@ is removed, and it is then scanned for \ escapes. The following are recognized:
\A pass the PCRE_ANCHORED option to pcre_exec()
\B pass the PCRE_NOTBOL option to pcre_exec()
+ \Cdd call pcre_copy_substring() for substring dd after a successful match
+ (any decimal number less than 32)
+ \Gdd call pcre_get_substring() for substring dd after a successful match
+ (any decimal number less than 32)
+ \L call pcre_get_substring_list() after a successful match
\Odd set the size of the output vector passed to pcre_exec() to dd
(any number of decimal digits)
\Z pass the PCRE_NOTEOL option to pcre_exec()
@@ -227,7 +255,7 @@ If /P was present on the regex, causing the POSIX wrapper API to be used, only
\B, and \Z have any effect, causing REG_NOTBOL and REG_NOTEOL to be passed to
regexec() respectively.
-When a match succeeds, pcretest outputs the list of identified substrings that
+When a match succeeds, pcretest outputs the list of captured substrings that
pcre_exec() returns, starting with number 0 for the string that matched the
whole pattern. Here is an example of an interactive pcretest run.
@@ -242,6 +270,12 @@ whole pattern. Here is an example of an interactive pcretest run.
data> xyz
No match
+If any of \C, \G, or \L are present in a data line that is successfully
+matched, the substrings extracted by the convenience functions are output with
+C, G, or L after the string number instead of a colon. This is in addition to
+the normal full list. The string length (that is, the return from the
+extraction function) is given in parentheses after each string for \C and \G.
+
Note that while patterns can be continued over several lines (a plain ">"
prompt is used for continuations), data lines may not. However newlines can be
included in data by means of the \n escape.
@@ -260,10 +294,10 @@ compilation.
If the option -s is given to pcretest, it outputs the size of each compiled
pattern after it has been compiled.
-If the -t option is given, each compile, study, and match is run 10000 times
+If the -t option is given, each compile, study, and match is run 20000 times
while being timed, and the resulting time per compile or match is output in
milliseconds. Do not set -t with -s, because you will then get the size output
-10000 times and the timing will be distorted. If you want to change the number
+20000 times and the timing will be distorted. If you want to change the number
of repetitions used for timing, edit the definition of LOOPREPEAT at the top of
pcretest.c
@@ -291,4 +325,4 @@ contains malformed regular expressions, in order to check that PCRE diagnoses
them correctly.
Philip Hazel <ph10@cam.ac.uk>
-January 1999
+February 1999
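The delimiter rules described in the README can be modelled in a few lines of C. The following is a minimal standalone sketch, not pcretest's own code: `find_delimiter()` and `extract_pattern()` are hypothetical helpers that follow the same scanning loop as the pcretest.c change in this commit, but an unterminated pattern simply returns -1 instead of causing continuation lines to be read.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the terminating delimiter, or
   NULL if the line ends first. A backslash followed by any character skips
   that character, so an escaped delimiter does not end the pattern. */
static const char *find_delimiter(const char *p, char delimiter)
{
  while (*p != 0)
    {
    if (*p == '\\' && p[1] != 0) p++;     /* skip the escaped character */
    else if (*p == delimiter) return p;
    p++;
    }
  return NULL;
}

/* Hypothetical helper: copy the text between the delimiters of line into
   out. If a backslash immediately follows the final delimiter, append a
   backslash to the pattern (the "/abc/\" rule). Returns 0 on success, or -1
   for a bad delimiter, an unterminated pattern, or a too-small buffer. */
static int extract_pattern(const char *line, char *out, size_t outsize)
{
  char delimiter;
  const char *start, *end;
  size_t len;

  assert(line != NULL && out != NULL);
  while (*line == ' ' || *line == '\t') line++;  /* leading space ignored */
  delimiter = *line++;
  if (delimiter == 0 || isalnum((unsigned char)delimiter) || delimiter == '\\')
    return -1;

  start = line;
  end = find_delimiter(start, delimiter);
  if (end == NULL) return -1;                    /* would need continuation */

  len = (size_t)(end - start);
  if (len + 2 > outsize) return -1;              /* room for "\" and zero */
  memcpy(out, start, len);
  if (end[1] == '\\') out[len++] = '\\';
  out[len] = 0;
  return 0;
}
```

With these helpers, the input line `/abc\/def/i` gives the pattern `abc\/def` (the trailing `i` is left for option processing), `/abc/\` gives `abc\`, and `/abc\/` returns -1 because the escaped slash leaves the pattern unterminated.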
diff --git a/dftables.c b/dftables.c
index 3e5d592..729049f 100644
--- a/dftables.c
+++ b/dftables.c
@@ -24,6 +24,10 @@ restrictions:
3. Altered versions must be plainly marked as such, and must not be
misrepresented as being the original software.
+
+4. If PCRE is embedded in any software that is released under the GNU
+ General Purpose Licence (GPL), then the terms of that licence shall
+ supersede any condition above with which it is incompatible.
-----------------------------------------------------------------------------
See the file Tech.Notes for some information on the internals.
diff --git a/get.c b/get.c
new file mode 100644
index 0000000..035668e
--- /dev/null
+++ b/get.c
@@ -0,0 +1,189 @@
+/*************************************************
+* Perl-Compatible Regular Expressions *
+*************************************************/
+
+/*
+This is a library of functions to support regular expressions whose syntax
+and semantics are as close as possible to those of the Perl 5 language. See
+the file Tech.Notes for some information on the internals.
+
+Written by: Philip Hazel <ph10@cam.ac.uk>
+
+ Copyright (c) 1997-1999 University of Cambridge
+
+-----------------------------------------------------------------------------
+Permission is granted to anyone to use this software for any purpose on any
+computer system, and to redistribute it freely, subject to the following
+restrictions:
+
+1. This software is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+
+2. The origin of this software must not be misrepresented, either by
+ explicit claim or by omission.
+
+3. Altered versions must be plainly marked as such, and must not be
+ misrepresented as being the original software.
+
+4. If PCRE is embedded in any software that is released under the GNU
+ General Purpose Licence (GPL), then the terms of that licence shall
+ supersede any condition above with which it is incompatible.
+-----------------------------------------------------------------------------
+*/
+
+/* This module contains some convenience functions for extracting substrings
+from the subject string after a regex match has succeeded. The original idea
+for these functions came from Scott Wimer <scottw@cgibuilder.com>. */
+
+
+/* Include the internals header, which itself includes Standard C headers plus
+the external pcre header. */
+
+#include "internal.h"
+
+
+
+/*************************************************
+* Copy captured string to given buffer *
+*************************************************/
+
+/* This function copies a single captured substring into a given buffer.
+Note that we use memcpy() rather than strncpy() in case there are binary zeros
+in the string.
+
+Arguments:
+ subject the subject string that was matched
+ ovector pointer to the offsets table
+ stringcount the number of substrings that were captured
+ (i.e. the yield of the pcre_exec call, unless
+ that was zero, in which case it should be 1/3
+ of the offset table size)
+ stringnumber the number of the required substring
+ buffer where to put the substring
+ size the size of the buffer
+
+Returns: if successful:
+ the length of the copied string, not including the zero
+ that is put on the end; can be zero
+ if not successful:
+ PCRE_ERROR_NOMEMORY (-6) buffer too small
+ PCRE_ERROR_NOSUBSTRING (-7) no such captured substring
+*/
+
+int
+pcre_copy_substring(const char *subject, int *ovector, int stringcount,
+ int stringnumber, char *buffer, int size)
+{
+int yield;
+if (stringnumber < 0 || stringnumber >= stringcount)
+ return PCRE_ERROR_NOSUBSTRING;
+stringnumber *= 2;
+yield = ovector[stringnumber+1] - ovector[stringnumber];
+if (size < yield + 1) return PCRE_ERROR_NOMEMORY;
+memcpy(buffer, subject + ovector[stringnumber], yield);
+buffer[yield] = 0;
+return yield;
+}
+
+
+
+/*************************************************
+* Copy all captured strings to new store *
+*************************************************/
+
+/* This function gets one chunk of store and builds a list of pointers and all
+of the captured substrings in it. A NULL pointer is put on the end of the list.
+
+Arguments:
+ subject the subject string that was matched
+ ovector pointer to the offsets table
+ stringcount the number of substrings that were captured
+ (i.e. the yield of the pcre_exec call, unless
+ that was zero, in which case it should be 1/3
+ of the offset table size)
+ listptr set to point to the list of pointers
+
+Returns: if successful: 0
+ if not successful:
+ PCRE_ERROR_NOMEMORY (-6) failed to get store
+*/
+
+int
+pcre_get_substring_list(const char *subject, int *ovector, int stringcount,
+ const char ***listptr)
+{
+int i;
+int size = sizeof(char *);
+int double_count = stringcount * 2;
+char **stringlist;
+char *p;
+
+for (i = 0; i < double_count; i += 2)
+ size += sizeof(char *) + ovector[i+1] - ovector[i] + 1;
+
+stringlist = (char **)(pcre_malloc)(size);
+if (stringlist == NULL) return PCRE_ERROR_NOMEMORY;
+
+*listptr = (const char **)stringlist;
+p = (char *)(stringlist + stringcount + 1);
+
+for (i = 0; i < double_count; i += 2)
+ {
+ int len = ovector[i+1] - ovector[i];
+ memcpy(p, subject + ovector[i], len);
+ *stringlist++ = p;
+ p += len;
+ *p++ = 0;
+ }
+
+*stringlist = NULL;
+return 0;
+}
+
+
+
+/*************************************************
+* Copy captured string to new store *
+*************************************************/
+
+/* This function copies a single captured substring into a piece of new
+store.
+
+Arguments:
+ subject the subject string that was matched
+ ovector pointer to the offsets table
+ stringcount the number of substrings that were captured
+ (i.e. the yield of the pcre_exec call, unless
+ that was zero, in which case it should be 1/3
+ of the offset table size)
+ stringnumber the number of the required substring
+ stringptr where to put a pointer to the substring
+
+Returns: if successful:
+ the length of the string, not including the zero that
+ is put on the end; can be zero
+ if not successful:
+ PCRE_ERROR_NOMEMORY (-6) failed to get store
+ PCRE_ERROR_NOSUBSTRING (-7) substring not present
+*/
+
+int
+pcre_get_substring(const char *subject, int *ovector, int stringcount,
+ int stringnumber, const char **stringptr)
+{
+int yield;
+char *substring;
+if (stringnumber < 0 || stringnumber >= stringcount)
+ return PCRE_ERROR_NOSUBSTRING;
+stringnumber *= 2;
+yield = ovector[stringnumber+1] - ovector[stringnumber];
+substring = (char *)(pcre_malloc)(yield + 1);
+if (substring == NULL) return PCRE_ERROR_NOMEMORY;
+memcpy(substring, subject + ovector[stringnumber], yield);
+substring[yield] = 0;
+*stringptr = substring;
+return yield;
+}
+
+/* End of get.c */
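To make the ovector bookkeeping in get.c concrete, here is a self-contained sketch that re-creates the logic of pcre_copy_substring() against a hand-built offsets vector, so it compiles without linking PCRE. The vector models matching "abc" against (a|(z))(bc), the example used in pcre.3, for which pcre_exec() would return 4. The names `copy_substring_sketch()`, `stringcount_for()`, `EX_ERROR_*` and `example_*` are hypothetical; the error values -6 and -7 mirror PCRE_ERROR_NOMEMORY and PCRE_ERROR_NOSUBSTRING from pcre.h.

```c
#include <assert.h>
#include <string.h>

#define EX_ERROR_NOMEMORY    (-6)   /* same value as PCRE_ERROR_NOMEMORY */
#define EX_ERROR_NOSUBSTRING (-7)   /* same value as PCRE_ERROR_NOSUBSTRING */

/* stringcount for the get functions: pcre_exec()'s yield when positive, or
   one third of the ovector size when it returned 0 (vector too small). */
static int stringcount_for(int exec_rc, int ovecsize)
{
  return exec_rc > 0 ? exec_rc : ovecsize / 3;
}

/* Mirrors pcre_copy_substring(): substring n occupies ovector[2n] and
   ovector[2n+1]; memcpy() is used so binary zeros survive, and a zero is
   appended. Unlike get.c, an unset substring (negative offsets) is checked
   explicitly instead of relying on the -1/-1 pair giving a zero length. */
static int copy_substring_sketch(const char *subject, const int *ovector,
  int stringcount, int stringnumber, char *buffer, int size)
{
  int start, len;
  assert(subject != NULL && ovector != NULL);
  if (stringnumber < 0 || stringnumber >= stringcount)
    return EX_ERROR_NOSUBSTRING;
  start = ovector[2*stringnumber];
  len = ovector[2*stringnumber + 1] - start;
  if (start < 0) { start = 0; len = 0; }   /* unset: yield an empty string */
  if (size < len + 1) return EX_ERROR_NOMEMORY;
  memcpy(buffer, subject + start, (size_t)len);
  buffer[len] = 0;
  return len;
}

/* "abc" matched against (a|(z))(bc): $0 = "abc", $1 = "a", $2 unset,
   $3 = "bc". */
static const char example_subject[] = "abc";
static const int example_ovector[] = { 0, 3, 0, 1, -1, -1, 1, 3 };
```

Copying substring 3 into an 8-byte buffer yields "bc" with a return of 2, substring 2 yields "" with a return of 0, substring 4 gives -7, and a 3-byte buffer is too small for substring 0 ("abc" plus its zero), giving -6.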
diff --git a/internal.h b/internal.h
index 713e6c5..1b9ffe2 100644
--- a/internal.h
+++ b/internal.h
@@ -3,7 +3,7 @@
*************************************************/
-#define PCRE_VERSION "2.02 14-Jan-1999"
+#define PCRE_VERSION "2.03 12-Feb-1999"
/* This is a library of functions to support regular expressions whose syntax
@@ -28,6 +28,10 @@ restrictions:
3. Altered versions must be plainly marked as such, and must not be
misrepresented as being the original software.
+
+4. If PCRE is embedded in any software that is released under the GNU
+ General Purpose Licence (GPL), then the terms of that licence shall
+ supersede any condition above with which it is incompatible.
-----------------------------------------------------------------------------
*/
diff --git a/maketables.c b/maketables.c
index 01943d3..1b76455 100644
--- a/maketables.c
+++ b/maketables.c
@@ -24,6 +24,10 @@ restrictions:
3. Altered versions must be plainly marked as such, and must not be
misrepresented as being the original software.
+
+4. If PCRE is embedded in any software that is released under the GNU
+ General Purpose Licence (GPL), then the terms of that licence shall
+ supersede any condition above with which it is incompatible.
-----------------------------------------------------------------------------
See the file Tech.Notes for some information on the internals.
diff --git a/pcre.3 b/pcre.3
index 098813e..927a6ad 100644
--- a/pcre.3
+++ b/pcre.3
@@ -13,9 +13,6 @@ pcre - Perl-compatible regular expressions.
.B const unsigned char *\fItableptr\fR);
.PP
.br
-.B const unsigned char *pcre_maketables(void);
-.PP
-.br
.B pcre_extra *pcre_study(const pcre *\fIcode\fR, int \fIoptions\fR,
.ti +5n
.B const char **\fIerrptr\fR);
@@ -28,6 +25,28 @@ pcre - Perl-compatible regular expressions.
.B int *\fIovector\fR, int \fIovecsize\fR);
.PP
.br
+.B int pcre_copy_substring(const char *\fIsubject\fR, int *\fIovector\fR,
+.ti +5n
+.B int \fIstringcount\fR, int \fIstringnumber\fR, char *\fIbuffer\fR,
+.ti +5n
+.B int \fIbuffersize\fR);
+.PP
+.br
+.B int pcre_get_substring(const char *\fIsubject\fR, int *\fIovector\fR,
+.ti +5n
+.B int \fIstringcount\fR, int \fIstringnumber\fR,
+.ti +5n
+.B const char **\fIstringptr\fR);
+.PP
+.br
+.B int pcre_get_substring_list(const char *\fIsubject\fR,
+.ti +5n
+.B int *\fIovector\fR, int \fIstringcount\fR, "const char ***\fIlistptr\fR);"
+.PP
+.br
+.B const unsigned char *pcre_maketables(void);
+.PP
+.br
.B int pcre_info(const pcre *\fIcode\fR, int *\fIoptptr\fR, int
.B *\fIfirstcharptr\fR);
.PP
@@ -51,10 +70,13 @@ PCRE has its own native API, which is described in this man page. There is also
a set of wrapper functions that correspond to the POSIX API. See
\fBpcreposix (3)\fR.
-The three functions \fBpcre_compile()\fR, \fBpcre_study()\fR, and
-\fBpcre_exec()\fR are used for compiling and matching regular expressions. The
-function \fBpcre_maketables()\fR is used (optionally) to build a set of
-character tables in the current locale for passing to \fBpcre_compile()\fR.
+The functions \fBpcre_compile()\fR, \fBpcre_study()\fR, and \fBpcre_exec()\fR
+are used for compiling and matching regular expressions, while
+\fBpcre_copy_substring()\fR, \fBpcre_get_substring()\fR, and
+\fBpcre_get_substring_list()\fR are convenience functions for extracting
+captured substrings from a matched subject string. The function
+\fBpcre_maketables()\fR is used (optionally) to build a set of character tables
+in the current locale for passing to \fBpcre_compile()\fR.
The function \fBpcre_info()\fR is used to find out information about a compiled
pattern, while the function \fBpcre_version()\fR returns a pointer to a string
@@ -233,6 +255,27 @@ in different locales. It is the caller's responsibility to ensure that the
memory containing the tables remains available for as long as it is needed.
+.SH INFORMATION ABOUT A PATTERN
+The \fBpcre_info()\fR function returns information about a compiled pattern.
+Its yield is the number of capturing subpatterns, or one of the following
+negative numbers:
+
+ PCRE_ERROR_NULL the argument \fIcode\fR was NULL
+ PCRE_ERROR_BADMAGIC the "magic number" was not found
+
+If the \fIoptptr\fR argument is not NULL, a copy of the options with which the
+pattern was compiled is placed in the integer it points to.
+
+If the \fIfirstcharptr\fR argument is not NULL, it is used to pass back
+information about the first character of any matched string. If there is a
+fixed first character, e.g. from a pattern such as (cat|cow|coyote), then it is
+returned in the integer pointed to by \fIfirstcharptr\fR. Otherwise, if the
+pattern was compiled with the PCRE_MULTILINE option, and every branch started
+with "^", then -1 is returned, indicating that the pattern will match at the
+start of a subject string or after any "\\n" within the string. Otherwise -2 is
+returned.
+
+
.SH MATCHING A PATTERN
The function \fBpcre_exec()\fR is called to match a subject string against a
pre-compiled pattern, which is passed in the \fIcode\fR argument. If the
@@ -290,6 +333,9 @@ is the number of pairs that have been set. If there are no capturing
subpatterns, the return value from a successful match is 1, indicating that
just the first pair of offsets has been set.
+Some convenience functions are provided for extracting the captured substrings
+as separate strings. These are described in the following section.
+
It is possible for an capturing subpattern number \fIn+1\fR to match some
part of the subject when subpattern \fIn\fR has not been used at all. For
example, if the string "abc" is matched against the pattern (a|(z))(bc)
@@ -350,25 +396,62 @@ call via \fBpcre_malloc()\fR fails, this error is given. The memory is freed at
the end of matching.
-.SH INFORMATION ABOUT A PATTERN
-The \fBpcre_info()\fR function returns information about a compiled pattern.
-Its yield is the number of capturing subpatterns, or one of the following
-negative numbers:
+.SH EXTRACTING CAPTURED SUBSTRINGS
+Captured substrings can be accessed directly by using the offsets returned by
+\fBpcre_exec()\fR in \fIovector\fR. For convenience, the functions
+\fBpcre_copy_substring()\fR, \fBpcre_get_substring()\fR, and
+\fBpcre_get_substring_list()\fR are provided for extracting captured substrings
+as new, separate, zero-terminated strings. A substring that contains a binary
+zero is correctly extracted and has a further zero added on the end, but the
+result does not, of course, function as a C string.
+
+The first three arguments are the same for all three functions: \fIsubject\fR
+is the subject string which has just been successfully matched, \fIovector\fR
+is a pointer to the vector of integer offsets that was passed to
+\fBpcre_exec()\fR, and \fIstringcount\fR is the number of substrings that
+were captured by the match, including the substring that matched the entire
+regular expression. This is the value returned by \fBpcre_exec\fR if it
+is greater than zero. If \fBpcre_exec()\fR returned zero, indicating that it
+ran out of space in \fIovector\fR, then the value passed as
+\fIstringcount\fR should be the size of the vector divided by three.
+
+The functions \fBpcre_copy_substring()\fR and \fBpcre_get_substring()\fR
+extract a single substring, whose number is given as \fIstringnumber\fR. A
+value of zero extracts the substring that matched the entire pattern, while
+higher values extract the captured substrings. For \fBpcre_copy_substring()\fR,
+the string is placed in \fIbuffer\fR, whose length is given by
+\fIbuffersize\fR, while for \fBpcre_get_substring()\fR a new block of store is
+obtained via \fBpcre_malloc\fR, and its address is returned via
+\fIstringptr\fR. The yield of the function is the length of the string, not
+including the terminating zero, or one of
- PCRE_ERROR_NULL the argument \fIcode\fR was NULL
- PCRE_ERROR_BADMAGIC the "magic number" was not found
+ PCRE_ERROR_NOMEMORY (-6)
-If the \fIoptptr\fR argument is not NULL, a copy of the options with which the
-pattern was compiled is placed in the integer it points to.
+The buffer was too small for \fBpcre_copy_substring()\fR, or the attempt to get
+memory failed for \fBpcre_get_substring()\fR.
+
+ PCRE_ERROR_NOSUBSTRING (-7)
+
+There is no substring whose number is \fIstringnumber\fR.
+
+The \fBpcre_get_substring_list()\fR function extracts all available substrings
+and builds a list of pointers to them. All this is done in a single block of
+memory which is obtained via \fBpcre_malloc\fR. The address of the memory block
+is returned via \fIlistptr\fR, which is also the start of the list of string
+pointers. The end of the list is marked by a NULL pointer. The yield of the
+function is zero if all went well, or
+
+ PCRE_ERROR_NOMEMORY (-6)
+
+if the attempt to get the memory block failed.
+
+When any of these functions encounter a substring that is unset, which can
+happen when capturing subpattern number \fIn+1\fR matches some part of the
+subject, but subpattern \fIn\fR has not been used at all, they return an empty
+string. This can be distinguished from a genuine zero-length substring by
+inspecting the appropriate offset in \fIovector\fR, which is negative for unset
+substrings.
-If the \fIfirstcharptr\fR argument is not NULL, is is used to pass back
-information about the first character of any matched string. If there is a
-fixed first character, e.g. from a pattern such as (cat|cow|coyote), then it is
-returned in the integer pointed to by \fIfirstcharptr\fR. Otherwise, if the
-pattern was compiled with the PCRE_MULTILINE option, and every branch started
-with "^", then -1 is returned, indicating that the pattern will match at the
-start of a subject string or after any "\\n" within the string. Otherwise -2 is
-returned.
.SH LIMITATIONS
@@ -723,11 +806,15 @@ The minus (hyphen) character can be used to specify a range of characters in a
character class. For example, [d-m] matches any letter between d and m,
inclusive. If a minus character is required in a class, it must be escaped with
a backslash or appear in a position where it cannot be interpreted as
-indicating a range, typically as the first or last character in the class. It
-is not possible to have the character "]" as the end character of a range,
-since a sequence such as [w-] is interpreted as a class of two characters. The
-octal or hexadecimal representation of "]" can, however, be used to end a
-range.
+indicating a range, typically as the first or last character in the class.
+
+It is not possible to have the literal character "]" as the end character of a
+range. A pattern such as [W-]46] is interpreted as a class of two characters
+("W" and "-") followed by a literal string "46]", so it would match "W46]" or
+"-46]". However, if the "]" is escaped with a backslash it is interpreted as
+the end of the range, so [W-\\]46] is interpreted as a single class containing a
+range followed by two separate characters. The octal or hexadecimal
+representation of "]" can also be used to end a range.
Ranges operate in ASCII collating sequence. They can also be used for
characters specified numerically, for example [\\000-\\037]. If a range that
@@ -1156,7 +1243,7 @@ of characters that an identical standalone pattern would match, if anchored at
the current point in the subject string.
Once-only subpatterns are not capturing subpatterns. Simple cases such as the
-above example can be though of as a maximizing repeat that must swallow
+above example can be thought of as a maximizing repeat that must swallow
everything it can. So, while both \\d+ and \\d+? are prepared to adjust the
number of digits they match in order to make the rest of the pattern match,
(?>\\d+) can only match an entire sequence of digits.
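The man page says pcre_get_substring_list() puts the pointer list and all the strings in one block, and that an unset substring comes back as an empty string. The sketch below re-creates that layout with plain malloc(), assuming the default pcre_malloc (an application may replace it), and shows the ovector test for telling an unset substring from an empty one. `build_list_sketch()` and `substring_is_set()` are hypothetical names, not PCRE functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same single-block layout as pcre_get_substring_list():
   [ptr 0]...[ptr n-1][NULL]["str0\0"]["str1\0"]...
   Returns 0, or -6 (the value of PCRE_ERROR_NOMEMORY) if malloc() fails. */
static int build_list_sketch(const char *subject, const int *ovector,
  int stringcount, const char ***listptr)
{
  size_t size = sizeof(char *);                /* the NULL terminator */
  char **stringlist;
  char *p;
  int i;

  assert(subject != NULL && ovector != NULL);
  for (i = 0; i < stringcount; i++)
    {
    int len = ovector[2*i] < 0 ? 0 : ovector[2*i+1] - ovector[2*i];
    size += sizeof(char *) + (size_t)len + 1;  /* pointer + text + zero */
    }

  stringlist = malloc(size);
  if (stringlist == NULL) return -6;
  *listptr = (const char **)stringlist;
  p = (char *)(stringlist + stringcount + 1);  /* text follows pointers */

  for (i = 0; i < stringcount; i++)
    {
    int len = ovector[2*i] < 0 ? 0 : ovector[2*i+1] - ovector[2*i];
    if (len > 0) memcpy(p, subject + ovector[2*i], (size_t)len);
    stringlist[i] = p;
    p += len;
    *p++ = 0;
    }
  stringlist[stringcount] = NULL;
  return 0;
}

/* Unset and empty substrings both come back as ""; only ovector can tell
   them apart, because its offsets are negative for an unset substring. */
static int substring_is_set(const int *ovector, int stringnumber)
{
  return ovector[2*stringnumber] >= 0;
}
```

Because the pointers and the strings share one block, a single `free(list)` (or `pcre_free()` for a list returned by the real function) releases everything; the individual strings must not be freed separately.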
diff --git a/pcre.c b/pcre.c
index 320b8e2..8da2134 100644
--- a/pcre.c
+++ b/pcre.c
@@ -25,6 +25,10 @@ restrictions:
3. Altered versions must be plainly marked as such, and must not be
misrepresented as being the original software.
+
+4. If PCRE is embedded in any software that is released under the GNU
+ General Purpose Licence (GPL), then the terms of that licence shall
+ supersede any condition above with which it is incompatible.
-----------------------------------------------------------------------------
*/
diff --git a/pcre.h b/pcre.h
index 5224e25..27204b6 100644
--- a/pcre.h
+++ b/pcre.h
@@ -32,7 +32,7 @@ extern "C" {
#define PCRE_NOTEOL 0x0100
#define PCRE_UNGREEDY 0x0200
-/* Exec-time error codes */
+/* Exec-time and get-time error codes */
#define PCRE_ERROR_NOMATCH (-1)
#define PCRE_ERROR_NULL (-2)
@@ -40,6 +40,7 @@ extern "C" {
#define PCRE_ERROR_BADMAGIC (-4)
#define PCRE_ERROR_UNKNOWN_NODE (-5)
#define PCRE_ERROR_NOMEMORY (-6)
+#define PCRE_ERROR_NOSUBSTRING (-7)
/* Types */
@@ -56,10 +57,13 @@ extern void (*pcre_free)(void *);
extern pcre *pcre_compile(const char *, int, const char **, int *,
const unsigned char *);
+extern int pcre_copy_substring(const char *, int *, int, int, char *, int);
extern int pcre_exec(const pcre *, const pcre_extra *, const char *,
int, int, int *, int);
-extern unsigned const char *pcre_maketables(void);
+extern int pcre_get_substring(const char *, int *, int, int, const char **);
+extern int pcre_get_substring_list(const char *, int *, int, const char ***);
extern int pcre_info(const pcre *, int *, int *);
+extern unsigned const char *pcre_maketables(void);
extern pcre_extra *pcre_study(const pcre *, int, const char **);
extern const char *pcre_version(void);
diff --git a/pcreposix.c b/pcreposix.c
index 4470676..b370701 100644
--- a/pcreposix.c
+++ b/pcreposix.c
@@ -28,6 +28,10 @@ restrictions:
3. Altered versions must be plainly marked as such, and must not be
misrepresented as being the original software.
+
+4. If PCRE is embedded in any software that is released under the GNU
+ General Purpose Licence (GPL), then the terms of that licence shall
+ supersede any condition above with which it is incompatible.
-----------------------------------------------------------------------------
*/
diff --git a/pcretest.c b/pcretest.c
index 9aaf981..4b58895 100644
--- a/pcretest.c
+++ b/pcretest.c
@@ -385,9 +385,9 @@ while (!done)
delimiter = *p++;
- if (isalnum(delimiter))
+ if (isalnum(delimiter) || delimiter == '\\')
{
- fprintf(outfile, "** Delimiter must not be alphameric\n");
+ fprintf(outfile, "** Delimiter must not be alphameric or \\\n");
goto SKIP_DATA;
}
@@ -395,7 +395,12 @@ while (!done)
for(;;)
{
- while (*pp != 0 && *pp != delimiter) pp++;
+ while (*pp != 0)
+ {
+ if (*pp == '\\' && pp[1] != 0) pp++;
+ else if (*pp == delimiter) break;
+ pp++;
+ }
if (*pp != 0) break;
len = sizeof(buffer) - (pp - buffer);
@@ -415,6 +420,12 @@ while (!done)
if (infile != stdin) fprintf(outfile, "%s", (char *)pp);
}
+ /* If the first character after the delimiter is backslash, make
+ the pattern end with backslash. This is purely to provide a way
+ of testing for the error message when a pattern ends with backslash. */
+
+ if (pp[1] == '\\') *pp++ = '\\';
+
/* Terminate the pattern at the delimiter */
*pp++ = 0;
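The new scanning loop treats a backslash as escaping the character that follows it, so an escaped delimiter such as \/ remains part of the pattern. The self-contained sketch below follows the same rule. The name sketch_find_delimiter is hypothetical, and the function is not part of pcretest.

```c
/* Return a pointer to the first unescaped 'delimiter' in 'pp', or to the
   terminating zero if there is none. A backslash skips the character after
   it, unless the backslash is the last character of the string. */
const char *sketch_find_delimiter(const char *pp, char delimiter)
{
  while (*pp != 0)
    {
    if (*pp == '\\' && pp[1] != 0) pp++;   /* step over the escaped char */
    else if (*pp == delimiter) break;
    pp++;
    }
  return pp;
}
```

For the input abc\/def/i, the scan stops at the second /. This leaves abc\/def as the pattern and i as the modifiers, which is how /\d\d\/\d\d\/\d\d\d\d/ in testinput is read. If the character after the closing delimiter is a backslash, pcretest overwrites the delimiter with that backslash. As a result, /abc/\ is compiled as abc\ and produces the "\ at end of pattern" error shown in testoutput2.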
@@ -644,6 +655,9 @@ while (!done)
{
unsigned char *q;
int count, c;
+ int copystrings = 0;
+ int getstrings = 0;
+ int getlist = 0;
int offsets[45];
int size_offsets = sizeof(offsets)/sizeof(int);
@@ -709,6 +723,20 @@ while (!done)
options |= PCRE_NOTBOL;
continue;
+ case 'C':
+ while(isdigit(*p)) n = n * 10 + *p++ - '0';
+ copystrings |= 1 << n;
+ continue;
+
+ case 'G':
+ while(isdigit(*p)) n = n * 10 + *p++ - '0';
+ getstrings |= 1 << n;
+ continue;
+
+ case 'L':
+ getlist = 1;
+ continue;
+
case 'O':
while(isdigit(*p)) n = n * 10 + *p++ - '0';
if (n <= (int)(sizeof(offsets)/sizeof(int))) size_offsets = n;
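Each \Cn or \Gn escape sets bit n of an int mask, and the printing loops later test bits 0 to 31. The sketch below reproduces the digit parsing under two changes of its own. First, it uses an unsigned long mask. Second, it rejects numbers above 31. In the code above, an n of 31 or more would make 1 << n undefined for a 32-bit signed int, and the range check is an addition that the original code does not make. The name sketch_record_group is hypothetical.

```c
#include <ctype.h>

/* Read the decimal digits at *pp (no digits gives 0 in this sketch),
   advance *pp past all of them, and set bit n of *mask. Returns n, or -1
   without touching *mask if n is larger than 31. */
int sketch_record_group(const char **pp, unsigned long *mask)
{
  int n = 0;
  while (isdigit((unsigned char)**pp))
    {
    if (n <= 31) n = n * 10 + (**pp - '0');   /* stop growing once too big */
    (*pp)++;
    }
  if (n > 31) return -1;
  *mask |= 1UL << n;          /* unsigned long has at least 32 bits */
  return n;
}
```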
@@ -788,8 +816,7 @@ while (!done)
if (count >= 0)
{
int i;
- count *= 2;
- for (i = 0; i < count; i += 2)
+ for (i = 0; i < count * 2; i += 2)
{
if (offsets[i] < 0)
fprintf(outfile, "%2d: <unset>\n", i/2);
@@ -800,6 +827,55 @@ while (!done)
fprintf(outfile, "\n");
}
}
+
+ for (i = 0; i < 32; i++)
+ {
+ if ((copystrings & (1 << i)) != 0)
+ {
+ char buffer[16];
+ int rc = pcre_copy_substring((char *)dbuffer, offsets, count,
+ i, buffer, sizeof(buffer));
+ if (rc < 0)
+ fprintf(outfile, "copy substring %d failed %d\n", i, rc);
+ else
+ fprintf(outfile, "%2dC %s (%d)\n", i, buffer, rc);
+ }
+ }
+
+ for (i = 0; i < 32; i++)
+ {
+ if ((getstrings & (1 << i)) != 0)
+ {
+ const char *substring;
+ int rc = pcre_get_substring((char *)dbuffer, offsets, count,
+ i, &substring);
+ if (rc < 0)
+ fprintf(outfile, "get substring %d failed %d\n", i, rc);
+ else
+ {
+ fprintf(outfile, "%2dG %s (%d)\n", i, substring, rc);
+ free((void *)substring);
+ }
+ }
+ }
+
+ if (getlist)
+ {
+ const char **stringlist;
+ int rc = pcre_get_substring_list((char *)dbuffer, offsets, count,
+ &stringlist);
+ if (rc < 0)
+ fprintf(outfile, "get substring list failed %d\n", rc);
+ else
+ {
+ for (i = 0; i < count; i++)
+ fprintf(outfile, "%2dL %s\n", i, stringlist[i]);
+ if (stringlist[i] != NULL)
+ fprintf(outfile, "string list not terminated by NULL\n");
+ free((void *)stringlist);
+ }
+ }
+
}
else
{
diff --git a/study.c b/study.c
index 40f489b..284833b 100644
--- a/study.c
+++ b/study.c
@@ -25,6 +25,10 @@ restrictions:
3. Altered versions must be plainly marked as such, and must not be
misrepresented as being the original software.
+
+4. If PCRE is embedded in any software that is released under the GNU
+ General Purpose Licence (GPL), then the terms of that licence shall
+ supersede any condition above with which it is incompatible.
-----------------------------------------------------------------------------
*/
diff --git a/testinput b/testinput
index 43f2914..ffd2bb4 100644
--- a/testinput
+++ b/testinput
@@ -1655,5 +1655,31 @@
ABC445
*** Failers
ABC123
+
+/^[W-]46]/
+ W46]789
+ -46]789
+ *** Failers
+ Wall
+ Zebra
+ 42
+ [abcd]
+ ]abcd[
+/^[W-\]46]/
+ W46]789
+ Wall
+ Zebra
+ Xylophone
+ 42
+ [abcd]
+ ]abcd[
+ \\backslash
+ *** Failers
+ -46]789
+ well
+
+/\d\d\/\d\d\/\d\d\d\d/
+ 01/01/2000
+
/ End of test input /
diff --git a/testinput2 b/testinput2
index 360e48b..f9b4427 100644
--- a/testinput2
+++ b/testinput2
@@ -28,8 +28,6 @@
*** Failers
def\nabc
-/abc\/
-
/ab\gdef/X
/(?X)ab\gdef/X
@@ -156,8 +154,6 @@
abc
abc\n
-/abc\/P
-
/(abc)\2/P
/(abc\1)/P
@@ -285,7 +281,7 @@
/(?<=ab(c|de)f)g/
-/The next two are in testinput2 because they have variable length branches/
+/The next three are in testinput2 because they have variable length branches/
/(?<=bullock|donkey)-cart/
the bullock-cart
@@ -296,6 +292,10 @@
/(?<=ab(?i)x|y|z)/
+/(?>.*)(?<=(abcd)|(xyz))/
+ alphabetabcd
+ endingxyz
+
/(?<=ab(?i)x(?-i)y|(?i)z|b)ZZ/
abxyZZ
abXyZZ
@@ -338,8 +338,6 @@
/(*)b/
-/a\/
-
/abc)/
/(abc/
@@ -364,8 +362,6 @@
/(*)b/i
-/a\/i
-
/abc)/i
/(abc/i
@@ -398,4 +394,36 @@
/a{37,17}/
+/abc/\
+
+/abc/\P
+
+/abc/\i
+
+/(a)bc(d)/
+ abcd
+ abcd\C2
+ abcd\C5
+
+/(.{20})/
+ abcdefghijklmnopqrstuvwxyz
+ abcdefghijklmnopqrstuvwxyz\C1
+ abcdefghijklmnopqrstuvwxyz\G1
+
+/(.{15})/
+ abcdefghijklmnopqrstuvwxyz
+ abcdefghijklmnopqrstuvwxyz\C1\G1
+
+/(.{16})/
+ abcdefghijklmnopqrstuvwxyz
+ abcdefghijklmnopqrstuvwxyz\C1\G1\L
+
+/^(a|(bc))de(f)/
+ adef\G1\G2\G3\G4\L
+ bcdef\G1\G2\G3\G4\L
+ adefghijk\C0
+
+/^abc\00def/
+ abc\00def\L\C0
+
/ End of test input /
diff --git a/testinput3 b/testinput3
index c27c780..e4b356c 100644
--- a/testinput3
+++ b/testinput3
@@ -1625,4 +1625,10 @@
the.quick.brown.fox_
the.quick.brown.fox+
+/(?>.*)(?<=(abcd|wxyz))/
+ alphabetabcd
+ endingwxyz
+ *** Failers
+ a rather long string that doesn't end with one of them
+
/ End of test input /
diff --git a/testoutput b/testoutput
index 37bf728..408588b 100644
--- a/testoutput
+++ b/testoutput
@@ -1,4 +1,4 @@
-PCRE version 2.02 14-Jan-1999
+PCRE version 2.03 12-Feb-1999
/the quick brown fox/
the quick brown fox
@@ -2484,6 +2484,52 @@ No match
No match
ABC123
No match
+
+/^[W-]46]/
+ W46]789
+ 0: W46]
+ -46]789
+ 0: -46]
+ *** Failers
+No match
+ Wall
+No match
+ Zebra
+No match
+ 42
+No match
+ [abcd]
+No match
+ ]abcd[
+No match
+/^[W-\]46]/
+ W46]789
+ 0: W
+ Wall
+ 0: W
+ Zebra
+ 0: Z
+ Xylophone
+ 0: X
+ 42
+ 0: 4
+ [abcd]
+ 0: [
+ ]abcd[
+ 0: ]
+ \\backslash
+ 0: \
+ *** Failers
+No match
+ -46]789
+No match
+ well
+No match
+
+/\d\d\/\d\d\/\d\d\d\d/
+ 01/01/2000
+ 0: 01/01/2000
+
/ End of test input /
diff --git a/testoutput2 b/testoutput2
index 34dad57..6416c79 100644
--- a/testoutput2
+++ b/testoutput2
@@ -1,4 +1,4 @@
-PCRE version 2.02 14-Jan-1999
+PCRE version 2.03 12-Feb-1999
/(a)b|/
Identifying subpattern count = 1
@@ -68,9 +68,6 @@ No match
def\nabc
No match
-/abc\/
-Failed: \ at end of pattern at offset 4
-
/ab\gdef/X
Failed: unrecognized character follows \ at offset 3
@@ -362,9 +359,6 @@ No match: POSIX code 17: match failed
abc\n
0: abc
-/abc\/P
-Failed: POSIX code 9: bad escape sequence at offset 4
-
/(abc)\2/P
Failed: POSIX code 15: bad back reference at offset 7
@@ -658,7 +652,7 @@ Failed: lookbehind assertion is not fixed length at offset 12
/(?<=ab(c|de)f)g/
Failed: lookbehind assertion is not fixed length at offset 13
-/The next two are in testinput2 because they have variable length branches/
+/The next three are in testinput2 because they have variable length branches/
Identifying subpattern count = 0
No options
First char = 'T'
@@ -683,6 +677,18 @@ Identifying subpattern count = 0
No options
No first char
+/(?>.*)(?<=(abcd)|(xyz))/
+Identifying subpattern count = 2
+Options: anchored
+No first char
+ alphabetabcd
+ 0: alphabetabcd
+ 1: abcd
+ endingxyz
+ 0: endingxyz
+ 1: <unset>
+ 2: xyz
+
/(?<=ab(?i)x(?-i)y|(?i)z|b)ZZ/
Identifying subpattern count = 0
No options
@@ -770,9 +776,6 @@ Failed: nothing to repeat at offset 0
/(*)b/
Failed: nothing to repeat at offset 1
-/a\/
-Failed: \ at end of pattern at offset 2
-
/abc)/
Failed: unmatched parentheses at offset 3
@@ -809,9 +812,6 @@ Failed: nothing to repeat at offset 0
/(*)b/i
Failed: nothing to repeat at offset 1
-/a\/i
-Failed: \ at end of pattern at offset 2
-
/abc)/i
Failed: unmatched parentheses at offset 3
@@ -860,6 +860,124 @@ Failed: lookbehind assertion is not fixed length at offset 6
/a{37,17}/
Failed: numbers out of order in {} quantifier at offset 7
+/abc/\
+Failed: \ at end of pattern at offset 4
+
+/abc/\P
+Failed: POSIX code 9: bad escape sequence at offset 4
+
+/abc/\i
+Failed: \ at end of pattern at offset 4
+
+/(a)bc(d)/
+Identifying subpattern count = 2
+No options
+First char = 'a'
+ abcd
+ 0: abcd
+ 1: a
+ 2: d
+ abcd\C2
+ 0: abcd
+ 1: a
+ 2: d
+ 2C d (1)
+ abcd\C5
+ 0: abcd
+ 1: a
+ 2: d
+copy substring 5 failed -7
+
+/(.{20})/
+Identifying subpattern count = 1
+No options
+No first char
+ abcdefghijklmnopqrstuvwxyz
+ 0: abcdefghijklmnopqrst
+ 1: abcdefghijklmnopqrst
+ abcdefghijklmnopqrstuvwxyz\C1
+ 0: abcdefghijklmnopqrst
+ 1: abcdefghijklmnopqrst
+copy substring 1 failed -6
+ abcdefghijklmnopqrstuvwxyz\G1
+ 0: abcdefghijklmnopqrst
+ 1: abcdefghijklmnopqrst
+ 1G abcdefghijklmnopqrst (20)
+
+/(.{15})/
+Identifying subpattern count = 1
+No options
+No first char
+ abcdefghijklmnopqrstuvwxyz
+ 0: abcdefghijklmno
+ 1: abcdefghijklmno
+ abcdefghijklmnopqrstuvwxyz\C1\G1
+ 0: abcdefghijklmno
+ 1: abcdefghijklmno
+ 1C abcdefghijklmno (15)
+ 1G abcdefghijklmno (15)
+
+/(.{16})/
+Identifying subpattern count = 1
+No options
+No first char
+ abcdefghijklmnopqrstuvwxyz
+ 0: abcdefghijklmnop
+ 1: abcdefghijklmnop
+ abcdefghijklmnopqrstuvwxyz\C1\G1\L
+ 0: abcdefghijklmnop
+ 1: abcdefghijklmnop
+copy substring 1 failed -6
+ 1G abcdefghijklmnop (16)
+ 0L abcdefghijklmnop
+ 1L abcdefghijklmnop
+
+/^(a|(bc))de(f)/
+Identifying subpattern count = 3
+Options: anchored
+No first char
+ adef\G1\G2\G3\G4\L
+ 0: adef
+ 1: a
+ 2: <unset>
+ 3: f
+ 1G a (1)
+ 2G (0)
+ 3G f (1)
+get substring 4 failed -7
+ 0L adef
+ 1L a
+ 2L
+ 3L f
+ bcdef\G1\G2\G3\G4\L
+ 0: bcdef
+ 1: bc
+ 2: bc
+ 3: f
+ 1G bc (2)
+ 2G bc (2)
+ 3G f (1)
+get substring 4 failed -7
+ 0L bcdef
+ 1L bc
+ 2L bc
+ 3L f
+ adefghijk\C0
+ 0: adef
+ 1: a
+ 2: <unset>
+ 3: f
+ 0C adef (4)
+
+/^abc\00def/
+Identifying subpattern count = 0
+Options: anchored
+No first char
+ abc\00def\L\C0
+ 0: abc\x00def
+ 0C abc (7)
+ 0L abc
+
/ End of test input /
Identifying subpattern count = 0
No options
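In testoutput2, the \L escape prints one line per captured string. For /^(a|(bc))de(f)/, the unset group 2 appears as an empty string. pcretest also checks that the list ends with a NULL pointer, and it releases the list with a single free(). One way to meet all three properties is to place the pointer array and the strings in one malloc() block, and the sketch below does that under this assumption. The name sketch_get_substring_list is hypothetical, and the code is not taken from get.c.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical name; -6 is the code testoutput2 shows for lack of room. */
#define SKETCH_ERROR_NOMEMORY (-6)

/* Build a NULL-terminated array of copies of all 'stringcount' captured
   strings. Pointers and characters share one malloc() block, so a single
   free() releases everything. Returns 0 or a negative error code. */
int sketch_get_substring_list(const char *subject, const int *ovector,
                              int stringcount, const char ***listptr)
{
  size_t size = sizeof(char *);          /* room for the NULL terminator */
  char **list;
  char *p;
  int i;

  for (i = 0; i < stringcount; i++)
    {
    int len = ovector[2*i] < 0 ? 0 : ovector[2*i+1] - ovector[2*i];
    size += sizeof(char *) + (size_t)len + 1;
    }

  list = malloc(size);
  if (list == NULL) return SKETCH_ERROR_NOMEMORY;

  p = (char *)(list + stringcount + 1);  /* strings follow the pointers */
  for (i = 0; i < stringcount; i++)
    {
    int len = ovector[2*i] < 0 ? 0 : ovector[2*i+1] - ovector[2*i];
    if (len > 0) memcpy(p, subject + ovector[2*i], (size_t)len);
    p[len] = 0;                          /* unset groups become "" */
    list[i] = p;
    p += len + 1;
    }
  list[stringcount] = NULL;
  *listptr = (const char **)list;
  return 0;
}
```

Returning 0 on success is an assumption of this sketch; pcretest only tests whether the result is negative.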
diff --git a/testoutput3 b/testoutput3
index 18a07ef..0be0787 100644
--- a/testoutput3
+++ b/testoutput3
@@ -1,4 +1,4 @@
-PCRE version 2.02 14-Jan-1999
+PCRE version 2.03 12-Feb-1999
/(?<!bar)foo/
foo
@@ -2806,5 +2806,17 @@ No match
the.quick.brown.fox+
No match
+/(?>.*)(?<=(abcd|wxyz))/
+ alphabetabcd
+ 0: alphabetabcd
+ 1: abcd
+ endingwxyz
+ 0: endingwxyz
+ 1: wxyz
+ *** Failers
+No match
+ a rather long string that doesn't end with one of them
+No match
+
/ End of test input /
diff --git a/testoutput4 b/testoutput4
index c72a1f3..fde2a46 100644
--- a/testoutput4
+++ b/testoutput4
@@ -1,4 +1,4 @@
-PCRE version 2.02 14-Jan-1999
+PCRE version 2.03 12-Feb-1999
/^[\w]+/
*** Failers