10 files changed, 1760 insertions, 342 deletions
diff --git a/ext/pcre/pcrelib/doc/Tech.Notes b/ext/pcre/pcrelib/doc/Tech.Notes
index 7b96e5b60e..f5ca280115 100644
--- a/ext/pcre/pcrelib/doc/Tech.Notes
+++ b/ext/pcre/pcrelib/doc/Tech.Notes
@@ -135,7 +135,7 @@ end of each byte.
 Back references
 ---------------
 
-OP_REF is followed by a single byte containing the reference number.
+OP_REF is followed by two bytes containing the reference number.
 
 
 Repeating character classes and back references
@@ -163,11 +163,21 @@ Brackets and alternation
 
 A pair of non-capturing (round) brackets is wrapped round each expression at
 compile time, so alternation always happens in the context of brackets.
+
 Non-capturing brackets use the opcode OP_BRA, while capturing brackets use
 OP_BRA+1, OP_BRA+2, etc. [Note for North Americans: "bracket" to some English
 speakers, including myself, can be round, square, curly, or pointy. Hence this
 usage.]
 
+Originally PCRE was limited to 99 capturing brackets (so as not to use up all
+the opcodes). From release 3.5, there is no limit. The first ones, up to
+EXTRACT_BASIC_MAX, are handled with separate opcodes, as above. If there are
+more, the opcode is set to EXTRACT_BASIC_MAX+1, and the
+first operation in the bracket is OP_BRANUMBER, followed by a 2-byte bracket
+number. This opcode is ignored while matching, but is fished out when handling
+the bracket itself. (They could have all been done like this, but I was making
+minimal changes.)
+
 A bracket opcode is followed by two bytes which give the offset to the next
 alternative OP_ALT or, if there aren't any branches, to the matching KET
 opcode. Each OP_ALT is followed by two bytes giving the offset to the next one,
@@ -191,8 +201,8 @@ appropriate.
 A subpattern with a bounded maximum repetition is replicated in a nested
 fashion up to the maximum number of times, with BRAZERO or BRAMINZERO before
 each replication after the minimum, so that, for example, (abc){2,5} is
-compiled as (abc)(abc)((abc)((abc)(abc)?)?)?. The 200-bracket limit does not
-apply to these internally generated brackets.
+compiled as (abc)(abc)((abc)((abc)(abc)?)?)?. The 99 and 200 bracket limits do
+not apply to these internally generated brackets.
 
 
 Assertions
@@ -220,7 +230,7 @@ Conditional subpatterns
 
 These are like other subpatterns, but they start with the opcode OP_COND. If
 the condition is a back reference, this is stored at the start of the
-subpattern using the opcode OP_CREF followed by one byte containing the
+subpattern using the opcode OP_CREF followed by two bytes containing the
 reference number. Otherwise, a conditional subpattern will always start with
 one of the assertions.
 
@@ -240,4 +250,4 @@ the compiled data.
 
 
 Philip Hazel
-August 2000
+August 2001
diff --git a/ext/pcre/pcrelib/doc/pcre.3 b/ext/pcre/pcrelib/doc/pcre.3
index fc204453c7..738f76b4a9 100644
--- a/ext/pcre/pcrelib/doc/pcre.3
+++ b/ext/pcre/pcrelib/doc/pcre.3
@@ -92,7 +92,9 @@ contain the major and minor release numbers for the library. Applications can
 use these to include support for different releases.
 
 The functions \fBpcre_compile()\fR, \fBpcre_study()\fR, and \fBpcre_exec()\fR
-are used for compiling and matching regular expressions.
+are used for compiling and matching regular expressions. A sample program that
+demonstrates the simplest way of using them is given in the file
+\fIpcredemo.c\fR. The last section of this man page describes how to run it.
 
 The functions \fBpcre_copy_substring()\fR, \fBpcre_get_substring()\fR, and
 \fBpcre_get_substring_list()\fR are convenience functions for extracting
@@ -129,18 +131,22 @@ the same compiled pattern can safely be used by several threads at once.
 The function \fBpcre_compile()\fR is called to compile a pattern into an
 internal form. The pattern is a C string terminated by a binary zero, and
 is passed in the argument \fIpattern\fR. A pointer to a single block of memory
-that is obtained via \fBpcre_malloc\fR is returned. This contains the
-compiled code and related data. The \fBpcre\fR type is defined for this for
-convenience, but in fact \fBpcre\fR is just a typedef for \fBvoid\fR, since the
-contents of the block are not externally defined. It is up to the caller to
-free the memory when it is no longer required.
-.PP
+that is obtained via \fBpcre_malloc\fR is returned. This contains the compiled
+code and related data. The \fBpcre\fR type is defined for the returned block;
+this is a typedef for a structure whose contents are not externally defined. It
+is up to the caller to free the memory when it is no longer required.
+
+Although the compiled code of a PCRE regex is relocatable, that is, it does not
+depend on memory location, the complete \fBpcre\fR data block is not
+fully relocatable, because it contains a copy of the \fItableptr\fR argument,
+which is an address (see below).
+
 The size of a compiled pattern is roughly proportional to the length of the
 pattern string, except that each character class (other than those containing
 just a single character, negated or not) requires 33 bytes, and repeat
 quantifiers with a minimum greater than one or a bounded maximum cause the
 relevant portions of the compiled pattern to be replicated.
-.PP
+
 The \fIoptions\fR argument contains independent bits that affect the
 compilation. It should be zero if no options are required. Some of the options,
 in particular, those that are compatible with Perl, can also be set and unset
@@ -149,19 +155,31 @@ below). For these options, the contents of the \fIoptions\fR argument specifies
 their initial settings at the start of compilation and execution. The
 PCRE_ANCHORED option can be set at the time of matching as well as at compile
 time.
-.PP
+
 If \fIerrptr\fR is NULL, \fBpcre_compile()\fR returns NULL immediately.
 Otherwise, if compilation of a pattern fails, \fBpcre_compile()\fR returns
 NULL, and sets the variable pointed to by \fIerrptr\fR to point to a textual
 error message. The offset from the start of the pattern to the character where
 the error was discovered is placed in the variable pointed to by
 \fIerroffset\fR, which must not be NULL. If it is, an immediate error is given.
-.PP
+
 If the final argument, \fItableptr\fR, is NULL, PCRE uses a default set of
 character tables which are built when it is compiled, using the default C
 locale. Otherwise, \fItableptr\fR must be the result of a call to
 \fBpcre_maketables()\fR. See the section on locale support below.
-.PP
+
+This code fragment shows a typical straightforward call to \fBpcre_compile()\fR:
+
+  pcre *re;
+  const char *error;
+  int erroffset;
+  re = pcre_compile(
+    "^A.*Z",          /* the pattern */
+    0,                /* default options */
+    &error,           /* for error message */
+    &erroffset,       /* for error offset */
+    NULL);            /* use default character tables */
+
 The following option bits are defined in the header file:
 
   PCRE_ANCHORED
@@ -248,10 +266,10 @@ Details of exactly what it entails are given below.
 When a pattern is going to be used several times, it is worth spending more
 time analyzing it in order to speed up the time taken for matching. The
 function \fBpcre_study()\fR takes a pointer to a compiled pattern as its first
-argument, and returns a pointer to a \fBpcre_extra\fR block (another \fBvoid\fR
-typedef) containing additional information about the pattern; this can be
-passed to \fBpcre_exec()\fR. If no additional information is available, NULL
-is returned.
+argument, and returns a pointer to a \fBpcre_extra\fR block (another typedef
+for a structure with hidden contents) containing additional information about
+the pattern; this can be passed to \fBpcre_exec()\fR. If no additional
+information is available, NULL is returned.
 
 The second argument contains option bits. At present, no options are defined
 for \fBpcre_study()\fR, and this argument should always be zero.
@@ -260,6 +278,14 @@ The third argument for \fBpcre_study()\fR is a pointer to an error message. If
 studying succeeds (even if no data is returned), the variable it points to is
 set to NULL. Otherwise it points to a textual error message.
 
+This is a typical call to \fBpcre_study()\fR:
+
+  pcre_extra *pe;
+  pe = pcre_study(
+    re,             /* result of pcre_compile() */
+    0,              /* no options exist */
+    &error);        /* set to NULL or points to a message */
+
 At present, studying a pattern is useful only for non-anchored patterns that do
 not have a single fixed starting character. A bitmap of possible starting
 characters is created.
@@ -309,13 +335,24 @@ the following negative numbers:
   PCRE_ERROR_BADMAGIC   the "magic number" was not found
   PCRE_ERROR_BADOPTION  the value of \fIwhat\fR was invalid
 
+Here is a typical call of \fBpcre_fullinfo()\fR, to obtain the length of the
+compiled pattern:
+
+  int rc;
+  size_t length;
+  rc = pcre_fullinfo(
+    re,               /* result of pcre_compile() */
+    pe,               /* result of pcre_study(), or NULL */
+    PCRE_INFO_SIZE,   /* what is required */
+    &length);         /* where to put the data */
+
 The possible values for the third argument are defined in \fBpcre.h\fR, and are
 as follows:
 
   PCRE_INFO_OPTIONS
 
 Return a copy of the options with which the pattern was compiled. The fourth
-argument should point to au \fBunsigned long int\fR variable. These option bits
+argument should point to an \fBunsigned long int\fR variable. These option bits
 are those specified in the call to \fBpcre_compile()\fR, modified by any
 top-level option settings within the pattern itself, and with the PCRE_ANCHORED
 bit forcibly set if the form of the pattern implies that it can match only at
@@ -396,6 +433,20 @@ pre-compiled pattern, which is passed in the \fIcode\fR argument. If the
 pattern has been studied, the result of the study should be passed in the
 \fIextra\fR argument. Otherwise this must be NULL.
 
+Here is an example of a simple call to \fBpcre_exec()\fR:
+
+  int rc;
+  int ovector[30];
+  rc = pcre_exec(
+    re,             /* result of pcre_compile() */
+    NULL,           /* we didn't study the pattern */
+    "some string",  /* the subject string */
+    11,             /* the length of the subject string */
+    0,              /* start at offset 0 in the subject */
+    0,              /* default options */
+    ovector,        /* vector for substring information */
+    30);            /* number of elements in the vector */
+
 The PCRE_ANCHORED option can be passed in the \fIoptions\fR argument, whose
 unused bits must be zero. However, if a pattern was compiled with
 PCRE_ANCHORED, or turned out to be anchored by virtue of its contents, it
@@ -437,9 +488,9 @@ below) and trying an ordinary match again.
 
 The subject string is passed as a pointer in \fIsubject\fR, a length in
 \fIlength\fR, and a starting offset in \fIstartoffset\fR. Unlike the pattern
-string, it may contain binary zero characters. When the starting offset is
-zero, the search for a match starts at the beginning of the subject, and this
-is by far the most common case.
+string, the subject may contain binary zero characters. When the starting
+offset is zero, the search for a match starts at the beginning of the subject,
+and this is by far the most common case.
 
 A non-zero starting offset is useful when searching for another match in the
 same subject by calling \fBpcre_exec()\fR again after a previous success.
@@ -626,8 +677,9 @@ There are some size limitations in PCRE but it is hoped that they will never in
 practice be relevant.
 The maximum length of a compiled pattern is 65539 (sic) bytes.
 All values in repeating quantifiers must be less than 65536.
-The maximum number of capturing subpatterns is 99.
-The maximum number of all parenthesized subpatterns, including capturing
+The maximum number of capturing subpatterns is 65535.
+There is no limit to the number of non-capturing subpatterns, but the maximum
+depth of nesting of all kinds of parenthesized subpattern, including capturing
 subpatterns, assertions, and other types of subpattern, is 200.
 
 The maximum length of a subject string is the largest positive number that an
@@ -949,7 +1001,7 @@ PCRE_MULTILINE is set.
 
 Note that the sequences \\A, \\Z, and \\z can be used to match the start and
 end of the subject in both modes, and if all branches of a pattern start with
-\\A is it always anchored, whether PCRE_MULTILINE is set or not.
+\\A it is always anchored, whether PCRE_MULTILINE is set or not.
 
 
 .SH FULL STOP (PERIOD, DOT)
@@ -1053,7 +1105,7 @@ negation, which is indicated by a ^ character after the colon. For example,
 
   [12[:^digit:]]
 
-matches "1", "2", or any non-digit. PCRE (and Perl) also recogize the POSIX
+matches "1", "2", or any non-digit. PCRE (and Perl) also recognize the POSIX
 syntax [.ch.] and [=ch=] where "ch" is a "collating element", but these are not
 supported, and an error is given if they are encountered.
 
@@ -1151,7 +1203,7 @@ For example, if the string "the red king" is matched against the pattern
   the ((red|white) (king|queen))
 
 the captured substrings are "red king", "red", and "king", and are numbered 1,
-2, and 3.
+2, and 3, respectively.
 
 The fact that plain parentheses fulfil two functions is not always helpful.
 There are often times when a grouping subpattern is required without a
@@ -1792,6 +1844,137 @@ The following UTF-8 features of Perl 5.6 are not implemented:
 
 2. The use of Unicode tables and properties and escapes \\p, \\P, and \\X.
 
+
+.SH SAMPLE PROGRAM
+The code below is a simple, complete demonstration program, to get you started
+with using PCRE. This code is also supplied in the file \fIpcredemo.c\fR in the
+PCRE distribution.
+
+The program compiles the regular expression that is its first argument, and
+matches it against the subject string in its second argument. No options are
+set, and default character tables are used. If matching succeeds, the program
+outputs the portion of the subject that matched, together with the contents of
+any captured substrings.
+
+On a Unix system that has PCRE installed in \fI/usr/local\fR, you can compile
+the demonstration program using a command like this:
+
+  gcc -o pcredemo pcredemo.c -I/usr/local/include -L/usr/local/lib -lpcre
+
+Then you can run simple tests like this:
+
+  ./pcredemo 'cat|dog' 'the cat sat on the mat'
+
+Note that there is a much more comprehensive test program, called
+\fBpcretest\fR, which supports many more facilities for testing regular
+expressions. The \fBpcredemo\fR program is provided as a simple coding example.
+
+On some operating systems (e.g. Solaris) you may get an error like this when
+you try to run \fBpcredemo\fR:
+
+  ld.so.1: a.out: fatal: libpcre.so.0: open failed: No such file or directory
+
+This is caused by the way shared library support works on those systems. You
+need to add
+
+  -R/usr/local/lib
+
+to the compile command to get round this problem. Here's the code:
+
+  #include <stdio.h>
+  #include <string.h>
+  #include <pcre.h>
+
+  #define OVECCOUNT 30    /* should be a multiple of 3 */
+
+  int main(int argc, char **argv)
+  {
+  pcre *re;
+  const char *error;
+  int erroffset;
+  int ovector[OVECCOUNT];
+  int rc, i;
+
+  if (argc != 3)
+    {
+    printf("Two arguments required: a regex and a "
+      "subject string\\n");
+    return 1;
+    }
+
+  /* Compile the regular expression in the first argument */
+
+  re = pcre_compile(
+    argv[1],     /* the pattern */
+    0,           /* default options */
+    &error,      /* for error message */
+    &erroffset,  /* for error offset */
+    NULL);       /* use default character tables */
+
+  /* Compilation failed: print the error message and exit */
+
+  if (re == NULL)
+    {
+    printf("PCRE compilation failed at offset %d: %s\\n",
+      erroffset, error);
+    return 1;
+    }
+
+  /* Compilation succeeded: match the subject in the second
+     argument */
+
+  rc = pcre_exec(
+    re,          /* the compiled pattern */
+    NULL,        /* we didn't study the pattern */
+    argv[2],     /* the subject string */
+    (int)strlen(argv[2]), /* the length of the subject */
+    0,           /* start at offset 0 in the subject */
+    0,           /* default options */
+    ovector,     /* vector for substring information */
+    OVECCOUNT);  /* number of elements in the vector */
+
+  /* Matching failed: handle error cases */
+
+  if (rc < 0)
+    {
+    switch(rc)
+      {
+      case PCRE_ERROR_NOMATCH: printf("No match\\n"); break;
+      /*
+      Handle other special cases if you like
+      */
+      default: printf("Matching error %d\\n", rc); break;
+      }
+    return 1;
+    }
+
+  /* Match succeeded */
+
+  printf("Match succeeded\\n");
+
+  /* The output vector wasn't big enough */
+
+  if (rc == 0)
+    {
+    rc = OVECCOUNT/3;
+    printf("ovector only has room for %d captured "
+      "substrings\\n", rc - 1);
+    }
+
+  /* Show substrings stored in the output vector */
+
+  for (i = 0; i < rc; i++)
+    {
+    char *substring_start = argv[2] + ovector[2*i];
+    int substring_length = ovector[2*i+1] - ovector[2*i];
+    printf("%2d: %.*s\\n", i, substring_length,
+      substring_start);
+    }
+
+  return 0;
+  }
+
+
 .SH AUTHOR
 Philip Hazel <ph10@cam.ac.uk>
 .br
@@ -1803,8 +1986,6 @@ Cambridge CB2 3QG, England.
 .br
 Phone: +44 1223 334714
 
-Last updated: 28 August 2000,
-.br
-  the 250th anniversary of the death of J.S. Bach.
+Last updated: 15 August 2001
 .br
-Copyright (c) 1997-2000 University of Cambridge.
+Copyright (c) 1997-2001 University of Cambridge.
diff --git a/ext/pcre/pcrelib/doc/pcre.html b/ext/pcre/pcrelib/doc/pcre.html
index 01770975e2..3e9eb36b68 100644
--- a/ext/pcre/pcrelib/doc/pcre.html
+++ b/ext/pcre/pcrelib/doc/pcre.html
@@ -38,7 +38,8 @@ conversion went wrong.
 <LI><A NAME="TOC28" HREF="#SEC28">RECURSIVE PATTERNS</A>
 <LI><A NAME="TOC29" HREF="#SEC29">PERFORMANCE</A>
 <LI><A NAME="TOC30" HREF="#SEC30">UTF-8 SUPPORT</A>
-<LI><A NAME="TOC31" HREF="#SEC31">AUTHOR</A>
+<LI><A NAME="TOC31" HREF="#SEC31">SAMPLE PROGRAM</A>
+<LI><A NAME="TOC32" HREF="#SEC32">AUTHOR</A>
 </UL>
 <LI><A NAME="SEC1" HREF="#TOC1">NAME</A>
 <P>
@@ -126,7 +127,9 @@ use these to include support for different releases.
 </P>
 <P>
 The functions <B>pcre_compile()</B>, <B>pcre_study()</B>, and <B>pcre_exec()</B>
-are used for compiling and matching regular expressions.
+are used for compiling and matching regular expressions. A sample program that
+demonstrates the simplest way of using them is given in the file
+<I>pcredemo.c</I>. The last section of this man page describes how to run it.
 </P>
 <P>
 The functions <B>pcre_copy_substring()</B>, <B>pcre_get_substring()</B>, and
@@ -168,11 +171,16 @@ the same compiled pattern can safely be used by several threads at once.
 The function <B>pcre_compile()</B> is called to compile a pattern into an
 internal form. The pattern is a C string terminated by a binary zero, and
 is passed in the argument <I>pattern</I>. A pointer to a single block of memory
-that is obtained via <B>pcre_malloc</B> is returned. This contains the
-compiled code and related data. The <B>pcre</B> type is defined for this for
-convenience, but in fact <B>pcre</B> is just a typedef for <B>void</B>, since the
-contents of the block are not externally defined. It is up to the caller to
-free the memory when it is no longer required.
+that is obtained via <B>pcre_malloc</B> is returned. This contains the compiled
+code and related data. The <B>pcre</B> type is defined for the returned block;
+this is a typedef for a structure whose contents are not externally defined. It
+is up to the caller to free the memory when it is no longer required.
+</P>
+<P>
+Although the compiled code of a PCRE regex is relocatable, that is, it does not
+depend on memory location, the complete <B>pcre</B> data block is not
+fully relocatable, because it contains a copy of the <I>tableptr</I> argument,
+which is an address (see below).
 </P>
 <P>
 The size of a compiled pattern is roughly proportional to the length of the
@@ -206,6 +214,22 @@ locale. Otherwise, <I>tableptr</I> must be the result of a call to
 <B>pcre_maketables()</B>. See the section on locale support below.
 </P>
 <P>
+This code fragment shows a typical straightforward call to <B>pcre_compile()</B>:
+</P>
+<P>
+<PRE>
+  pcre *re;
+  const char *error;
+  int erroffset;
+  re = pcre_compile(
+    "^A.*Z",          /* the pattern */
+    0,                /* default options */
+    &error,           /* for error message */
+    &erroffset,       /* for error offset */
+    NULL);            /* use default character tables */
+</PRE>
+</P>
+<P>
 The following option bits are defined in the header file:
 </P>
 <P>
@@ -329,10 +353,10 @@ Details of exactly what it entails are given below.
 When a pattern is going to be used several times, it is worth spending more
 time analyzing it in order to speed up the time taken for matching. The
 function <B>pcre_study()</B> takes a pointer to a compiled pattern as its first
-argument, and returns a pointer to a <B>pcre_extra</B> block (another <B>void</B>
-typedef) containing additional information about the pattern; this can be
-passed to <B>pcre_exec()</B>. If no additional information is available, NULL
-is returned.
+argument, and returns a pointer to a <B>pcre_extra</B> block (another typedef
+for a structure with hidden contents) containing additional information about
+the pattern; this can be passed to <B>pcre_exec()</B>. If no additional
+information is available, NULL is returned.
 </P>
 <P>
 The second argument contains option bits. At present, no options are defined
@@ -344,6 +368,18 @@ studying succeeds (even if no data is returned), the variable it points to is
 set to NULL. Otherwise it points to a textual error message.
 </P>
 <P>
+This is a typical call to <B>pcre_study()</B>:
+</P>
+<P>
+<PRE>
+  pcre_extra *pe;
+  pe = pcre_study(
+    re,             /* result of pcre_compile() */
+    0,              /* no options exist */
+    &error);        /* set to NULL or points to a message */
+</PRE>
+</P>
+<P>
 At present, studying a pattern is useful only for non-anchored patterns that do
 not have a single fixed starting character. A bitmap of possible starting
 characters is created.
@@ -403,6 +439,21 @@ the following negative numbers:
 </PRE>
 </P>
 <P>
+Here is a typical call of <B>pcre_fullinfo()</B>, to obtain the length of the
+compiled pattern:
+</P>
+<P>
+<PRE>
+  int rc;
+  size_t length;
+  rc = pcre_fullinfo(
+    re,               /* result of pcre_compile() */
+    pe,               /* result of pcre_study(), or NULL */
+    PCRE_INFO_SIZE,   /* what is required */
+    &length);         /* where to put the data */
+</PRE>
+</P>
+<P>
 The possible values for the third argument are defined in <B>pcre.h</B>, and are
 as follows:
 </P>
@@ -413,7 +464,7 @@ as follows:
 </P>
 <P>
 Return a copy of the options with which the pattern was compiled. The fourth
-argument should point to au <B>unsigned long int</B> variable. These option bits
+argument should point to an <B>unsigned long int</B> variable. These option bits
 are those specified in the call to <B>pcre_compile()</B>, modified by any
 top-level option settings within the pattern itself, and with the PCRE_ANCHORED
 bit forcibly set if the form of the pattern implies that it can match only at
@@ -528,6 +579,24 @@ pattern has been studied, the result of the study should be passed in the
 <I>extra</I> argument. Otherwise this must be NULL.
 </P>
 <P>
+Here is an example of a simple call to <B>pcre_exec()</B>:
+</P>
+<P>
+<PRE>
+  int rc;
+  int ovector[30];
+  rc = pcre_exec(
+    re,             /* result of pcre_compile() */
+    NULL,           /* we didn't study the pattern */
+    "some string",  /* the subject string */
+    11,             /* the length of the subject string */
+    0,              /* start at offset 0 in the subject */
+    0,              /* default options */
+    ovector,        /* vector for substring information */
+    30);            /* number of elements in the vector */
+</PRE>
+</P>
+<P>
 The PCRE_ANCHORED option can be passed in the <I>options</I> argument, whose
 unused bits must be zero. However, if a pattern was compiled with
 PCRE_ANCHORED, or turned out to be anchored by virtue of its contents, it
@@ -588,9 +657,9 @@ below) and trying an ordinary match again.
 <P>
 The subject string is passed as a pointer in <I>subject</I>, a length in
 <I>length</I>, and a starting offset in <I>startoffset</I>. Unlike the pattern
-string, it may contain binary zero characters. When the starting offset is
-zero, the search for a match starts at the beginning of the subject, and this
-is by far the most common case.
+string, the subject may contain binary zero characters. When the starting
+offset is zero, the search for a match starts at the beginning of the subject,
+and this is by far the most common case.
 </P>
 <P>
 A non-zero starting offset is useful when searching for another match in the
@@ -833,8 +902,9 @@ There are some size limitations in PCRE but it is hoped that they will never in
 practice be relevant.
 The maximum length of a compiled pattern is 65539 (sic) bytes.
 All values in repeating quantifiers must be less than 65536.
-The maximum number of capturing subpatterns is 99.
-The maximum number of all parenthesized subpatterns, including capturing
+The maximum number of capturing subpatterns is 65535.
+There is no limit to the number of non-capturing subpatterns, but the maximum
+depth of nesting of all kinds of parenthesized subpattern, including capturing
 subpatterns, assertions, and other types of subpattern, is 200.
 </P>
 <P>
@@ -1225,7 +1295,7 @@ PCRE_MULTILINE is set.
 <P>
 Note that the sequences \A, \Z, and \z can be used to match the start and
 end of the subject in both modes, and if all branches of a pattern start with
-\A is it always anchored, whether PCRE_MULTILINE is set or not.
+\A it is always anchored, whether PCRE_MULTILINE is set or not.
 </P>
 <LI><A NAME="SEC16" HREF="#TOC1">FULL STOP (PERIOD, DOT)</A>
 <P>
@@ -1350,7 +1420,7 @@ negation, which is indicated by a ^ character after the colon. For example,
 </PRE>
 </P>
 <P>
-matches "1", "2", or any non-digit. PCRE (and Perl) also recogize the POSIX
+matches "1", "2", or any non-digit. PCRE (and Perl) also recognize the POSIX
 syntax [.ch.] and [=ch=] where "ch" is a "collating element", but these are not
 supported, and an error is given if they are encountered.
 </P>
@@ -1482,7 +1552,7 @@ For example, if the string "the red king" is matched against the pattern
 </P>
 <P>
 the captured substrings are "red king", "red", and "king", and are numbered 1,
-2, and 3.
+2, and 3, respectively.
 </P>
 <P>
 The fact that plain parentheses fulfil two functions is not always helpful.
@@ -2375,7 +2445,213 @@ The following UTF-8 features of Perl 5.6 are not implemented:
 <P>
 2. The use of Unicode tables and properties and escapes \p, \P, and \X.
 </P>
-<LI><A NAME="SEC31" HREF="#TOC1">AUTHOR</A>
+<LI><A NAME="SEC31" HREF="#TOC1">SAMPLE PROGRAM</A>
+<P>
+The code below is a simple, complete demonstration program, to get you started
+with using PCRE. This code is also supplied in the file <I>pcredemo.c</I> in the
+PCRE distribution.
+</P>
+<P>
+The program compiles the regular expression that is its first argument, and
+matches it against the subject string in its second argument. No options are
+set, and default character tables are used. If matching succeeds, the program
+outputs the portion of the subject that matched, together with the contents of
+any captured substrings.
+</P>
+<P>
+On a Unix system that has PCRE installed in <I>/usr/local</I>, you can compile
+the demonstration program using a command like this:
+</P>
+<P>
+<PRE>
+  gcc -o pcredemo pcredemo.c -I/usr/local/include -L/usr/local/lib -lpcre
+</PRE>
+</P>
+<P>
+Then you can run simple tests like this:
+</P>
+<P>
+<PRE>
+  ./pcredemo 'cat|dog' 'the cat sat on the mat'
+</PRE>
+</P>
+<P>
+Note that there is a much more comprehensive test program, called
+<B>pcretest</B>, which supports many more facilities for testing regular
+expressions. The <B>pcredemo</B> program is provided as a simple coding example.
+</P>
+<P>
+On some operating systems (e.g. Solaris) you may get an error like this when
+you try to run <B>pcredemo</B>:
+</P>
+<P>
+<PRE>
+  ld.so.1: a.out: fatal: libpcre.so.0: open failed: No such file or directory
+</PRE>
+</P>
+<P>
+This is caused by the way shared library support works on those systems. You
+need to add
+</P>
+<P>
+<PRE>
+  -R/usr/local/lib
+</PRE>
+</P>
+<P>
+to the compile command to get round this problem. Here's the code:
+</P>
+<P>
+<PRE>
+  #include &#60;stdio.h&#62;
+  #include &#60;string.h&#62;
+  #include &#60;pcre.h&#62;
+</PRE>
+</P>
+<P>
+<PRE>
+  #define OVECCOUNT 30    /* should be a multiple of 3 */
+</PRE>
+</P>
+<P>
+<PRE>
+  int main(int argc, char **argv)
+  {
+  pcre *re;
+  const char *error;
+  int erroffset;
+  int ovector[OVECCOUNT];
+  int rc, i;
+</PRE>
+</P>
+<P>
+<PRE>
+  if (argc != 3)
+    {
+    printf("Two arguments required: a regex and a "
+      "subject string\n");
+    return 1;
+    }
+</PRE>
+</P>
+<P>
+<PRE>
+  /* Compile the regular expression in the first argument */
+</PRE>
+</P>
+<P>
+<PRE>
+  re = pcre_compile(
+    argv[1],     /* the pattern */
+    0,           /* default options */
+    &error,      /* for error message */
+    &erroffset,  /* for error offset */
+    NULL);       /* use default character tables */
+</PRE>
+</P>
+<P>
+<PRE>
+  /* Compilation failed: print the error message and exit */
+</PRE>
+</P>
+<P>
+<PRE>
+  if (re == NULL)
+    {
+    printf("PCRE compilation failed at offset %d: %s\n",
+      erroffset, error);
+    return 1;
+    }
+</PRE>
+</P>
+<P>
+<PRE>
+  /* Compilation succeeded: match the subject in the second
+     argument */
+</PRE>
+</P>
+<P>
+<PRE>
+  rc = pcre_exec(
+    re,          /* the compiled pattern */
+    NULL,        /* we didn't study the pattern */
+    argv[2],     /* the subject string */
+    (int)strlen(argv[2]), /* the length of the subject */
+    0,           /* start at offset 0 in the subject */
+    0,           /* default options */
+    ovector,     /* vector for substring information */
+    OVECCOUNT);  /* number of elements in the vector */
+</PRE>
+</P>
+<P>
+<PRE>
+  /* Matching failed: handle error cases */
+</PRE>
+</P>
+<P>
+<PRE>
+  if (rc &#60; 0)
+    {
+    switch(rc)
+      {
+      case PCRE_ERROR_NOMATCH: printf("No match\n"); break;
+      /*
+      Handle other special cases if you like
+      */
+      default: printf("Matching error %d\n", rc); break;
+      }
+    return 1;
+    }
+</PRE>
+</P>
+<P>
+<PRE>
+  /* Match succeeded */
+</PRE>
+</P>
+<P>
+<PRE>
+  printf("Match succeeded\n");
+</PRE>
+</P>
+<P>
+<PRE>
+  /* The output vector wasn't big enough */
+</PRE>
+</P>
+<P>
+<PRE>
+  if (rc == 0)
+    {
+    rc = OVECCOUNT/3;
+    printf("ovector only has room for %d captured "
+      "substrings\n", rc - 1);
+    }
+</PRE>
+</P>
+<P>
+<PRE>
+  /* Show substrings stored in the output vector */
+</PRE>
+</P>
+<P>
+<PRE>
+  for (i = 0; i &#60; rc; i++)
+    {
+    char *substring_start = argv[2] + ovector[2*i];
+    int substring_length = ovector[2*i+1] - ovector[2*i];
+    printf("%2d: %.*s\n", i, substring_length,
+      substring_start);
+    }
+</PRE>
+</P>
+<P>
+<PRE>
+  return 0;
+  }
+</PRE>
+</P>
+<LI><A NAME="SEC32" HREF="#TOC1">AUTHOR</A>
 <P>
 Philip Hazel &#60;ph10@cam.ac.uk&#62;
 <BR>
@@ -2388,10 +2664,6 @@ Cambridge CB2 3QG, England.
 Phone: +44 1223 334714
 </P>
 <P>
-Last updated: 28 August 2000,
-<BR>
-<PRE>
-  the 250th anniversary of the death of J.S. Bach.
+Last updated: 15 August 2001
 <BR>
-</PRE>
-Copyright (c) 1997-2000 University of Cambridge.
+Copyright (c) 1997-2001 University of Cambridge.
diff --git a/ext/pcre/pcrelib/doc/pcre.txt b/ext/pcre/pcrelib/doc/pcre.txt
index 1db4b537b7..95f148f3de 100644
--- a/ext/pcre/pcrelib/doc/pcre.txt
+++ b/ext/pcre/pcrelib/doc/pcre.txt
@@ -74,7 +74,10 @@ DESCRIPTION
      releases.
 
      The functions pcre_compile(), pcre_study(), and  pcre_exec()
-     are used for compiling and matching regular expressions.
+     are  used  for compiling and matching regular expressions. A
+     sample program that demonstrates the simplest way  of  using
+     them  is  given  in the file pcredemo.c. The last section of
+     this man page describes how to run it.
 
      The functions  pcre_copy_substring(),  pcre_get_substring(),
      and  pcre_get_substring_list() are convenience functions for
@@ -104,19 +107,10 @@ DESCRIPTION
 
 
 MULTI-THREADING
-     The  PCRE  functions  can   be   used   in   multi-threading
-
-
-
-
-
-SunOS 5.8                 Last change:                          2
-
-
-
-     applications,  with  the  proviso that the memory management
-     functions pointed to by pcre_malloc and pcre_free are shared
-     by all threads.
+     The PCRE functions can be used in  multi-threading  applica-
+     tions, with the proviso that the memory management functions
+     pointed to by pcre_malloc and pcre_free are  shared  by  all
+     threads.
 
      The compiled form of a regular  expression  is  not  altered
      during  matching, so the same compiled pattern can safely be
@@ -130,11 +124,16 @@ COMPILING A PATTERN
      by a binary zero, and is passed in the argument  pattern.  A
      pointer  to  a  single  block of memory that is obtained via
      pcre_malloc is returned. This contains the compiled code and
-     related data. The pcre type is defined for this for conveni-
-     ence, but in fact pcre is just a typedef for void, since the
-     contents  of  the block are not externally defined. It is up
-     to the caller to free  the  memory  when  it  is  no  longer
-     required.
+     related  data.  The  pcre  type  is defined for the returned
+     block; this is a typedef for a structure whose contents  are
+     not  externally  defined. It is up to the caller to free the
+     memory when it is no longer required.
+
+     Although the compiled code of a PCRE regex  is  relocatable,
+     that is, it does not depend on memory location, the complete
+     pcre data block is not fully relocatable,  because  it  con-
+     tains  a  copy of the tableptr argument, which is an address
+     (see below).
 
      The size of a compiled pattern is  roughly  proportional  to
      the length of the pattern string, except that each character
@@ -169,6 +168,19 @@ COMPILING A PATTERN
      must  be  the result of a call to pcre_maketables(). See the
      section on locale support below.
 
+     This code fragment shows a typical straightforward  call  to
+     pcre_compile():
+
+       pcre *re;
+       const char *error;
+       int erroffset;
+       re = pcre_compile(
+         "^A.*Z",          /* the pattern */
+         0,                /* default options */
+         &error,           /* for error message */
+         &erroffset,       /* for error offset */
+         NULL);            /* use default character tables */
+
      The following option bits are defined in the header file:
 
        PCRE_ANCHORED
@@ -271,12 +283,12 @@ STUDYING A PATTERN
      When a pattern is going to be  used  several  times,  it  is
      worth  spending  more time analyzing it in order to speed up
      the time taken for matching. The function pcre_study() takes
-
      a  pointer  to a compiled pattern as its first argument, and
-     returns a  pointer  to  a  pcre_extra  block  (another  void
-     typedef)  containing  additional  information about the pat-
-     tern; this can be passed to pcre_exec().  If  no  additional
-     information is available, NULL is returned.
+     returns a pointer to a pcre_extra block (another typedef for
+     a  structure  with  hidden  contents)  containing additional
+     information  about  the  pattern;  this  can  be  passed  to
+     pcre_exec(). If no additional information is available, NULL
+     is returned.
 
      The second argument contains option  bits.  At  present,  no
      options  are  defined  for  pcre_study(),  and this argument
@@ -287,6 +299,14 @@ STUDYING A PATTERN
      the variable it points to  is  set  to  NULL.  Otherwise  it
      points to a textual error message.
 
+     This is a typical call to pcre_study():
+
+       pcre_extra *pe;
+       pe = pcre_study(
+         re,             /* result of pcre_compile() */
+         0,              /* no options exist */
+         &error);        /* set to NULL or points to a message */
+
      At present, studying a  pattern  is  useful  only  for  non-
      anchored  patterns  that do not have a single fixed starting
      character. A  bitmap  of  possible  starting  characters  is
@@ -347,13 +367,24 @@ INFORMATION ABOUT A PATTERN
        PCRE_ERROR_BADMAGIC   the "magic number" was not found
        PCRE_ERROR_BADOPTION  the value of what was invalid
 
+     Here is a typical call of  pcre_fullinfo(),  to  obtain  the
+     length of the compiled pattern:
+
+       int rc;
+       unsigned long int length;
+       rc = pcre_fullinfo(
+         re,               /* result of pcre_compile() */
+         pe,               /* result of pcre_study(), or NULL */
+         PCRE_INFO_SIZE,   /* what is required */
+         &length);         /* where to put the data */
+
      The possible values for the third argument  are  defined  in
      pcre.h, and are as follows:
 
        PCRE_INFO_OPTIONS
 
      Return a copy of the options with which the pattern was com-
-     piled.  The fourth argument should point to au unsigned long
+     piled.  The fourth argument should point to an unsigned long
      int variable. These option bits are those specified  in  the
      call  to  pcre_compile(),  modified  by any top-level option
      settings  within  the   pattern   itself,   and   with   the
@@ -375,9 +406,9 @@ INFORMATION ABOUT A PATTERN
 
        PCRE_INFO_BACKREFMAX
 
-     Return the number of  the  highest  back  reference  in  the
-     pattern.  The  fourth  argument should point to an int vari-
-     able. Zero is returned if there are no back references.
+     Return the number of the highest back reference in the  pat-
+     tern.  The  fourth argument should point to an int variable.
+     Zero is returned if there are no back references.
 
        PCRE_INFO_FIRSTCHAR
 
@@ -440,11 +471,34 @@ INFORMATION ABOUT A PATTERN
 
 MATCHING A PATTERN
      The function pcre_exec() is called to match a subject string
+
+
+
+
+
+SunOS 5.8                 Last change:                          9
+
+
+
      against  a pre-compiled pattern, which is passed in the code
      argument. If the pattern has been studied, the result of the
      study should be passed in the extra argument. Otherwise this
      must be NULL.
 
+     Here is an example of a simple call to pcre_exec():
+
+       int rc;
+       int ovector[30];
+       rc = pcre_exec(
+         re,             /* result of pcre_compile() */
+         NULL,           /* we didn't study the pattern */
+         "some string",  /* the subject string */
+         11,             /* the length of the subject string */
+         0,              /* start at offset 0 in the subject */
+         0,              /* default options */
+         ovector,        /* vector for substring information */
+         30);            /* number of elements in the vector */
+
      The PCRE_ANCHORED option can be passed in the options  argu-
      ment,  whose unused bits must be zero. However, if a pattern
      was  compiled  with  PCRE_ANCHORED,  or  turned  out  to  be
@@ -495,10 +549,10 @@ MATCHING A PATTERN
 
      The subject string is passed as  a  pointer  in  subject,  a
      length  in  length,  and  a  starting offset in startoffset.
-     Unlike the pattern string, it may contain binary zero  char-
-     acters.  When  the starting offset is zero, the search for a
-     match starts at the beginning of the subject, and this is by
-     far the most common case.
+     Unlike the pattern string, the subject  may  contain  binary
+     zero  characters.  When  the  starting  offset  is zero, the
+     search for a match starts at the beginning of  the  subject,
+     and this is by far the most common case.
 
      A non-zero starting offset  is  useful  when  searching  for
      another  match  in  the  same subject by calling pcre_exec()
@@ -634,17 +688,9 @@ MATCHING A PATTERN
 
 
 
+
 EXTRACTING CAPTURED SUBSTRINGS
      Captured substrings can be accessed directly  by  using  the
-
-
-
-
-
-SunOS 5.8                 Last change:                         12
-
-
-
      offsets returned by pcre_exec() in ovector. For convenience,
      the functions  pcre_copy_substring(),  pcre_get_substring(),
      and  pcre_get_substring_list()  are  provided for extracting
@@ -722,10 +768,12 @@ LIMITATIONS
      There are some size limitations in PCRE but it is hoped that
      they will never in practice be relevant.  The maximum length
      of a compiled pattern is 65539 (sic) bytes.  All  values  in
-     repeating  quantifiers must be less than 65536.  The maximum
-     number of capturing subpatterns is 99.  The  maximum  number
-     of  all  parenthesized subpatterns, including capturing sub-
-     patterns, assertions, and other types of subpattern, is 200.
+     repeating  quantifiers  must be less than 65536.  The  max-
+     imum number of capturing subpatterns is 65535.  There is  no
+     limit  to  the  number of non-capturing subpatterns, but the
+     maximum depth of nesting of all kinds of parenthesized  sub-
+     pattern,  including  capturing  subpatterns, assertions, and
+     other types of subpattern, is 200.
 
      The maximum length of a subject string is the largest  posi-
      tive number that an integer variable can hold. However, PCRE
@@ -901,6 +949,7 @@ BACKSLASH
      The backslash character has several uses. Firstly, if it  is
      followed  by  a  non-alphameric character, it takes away any
      special  meaning  that  character  may  have.  This  use  of
+
      backslash  as  an  escape  character applies both inside and
      outside character classes.
 
@@ -1061,7 +1110,6 @@ CIRCUMFLEX AND DOLLAR
      Outside a character class, in the default matching mode, the
      circumflex  character  is an assertion which is true only if
      the current matching point is at the start  of  the  subject
-
      string.  If  the startoffset argument of pcre_exec() is non-
      zero, circumflex can never match. Inside a character  class,
      circumflex has an entirely different meaning (see below).
@@ -1105,7 +1153,7 @@ CIRCUMFLEX AND DOLLAR
 
      Note that the sequences \A, \Z, and \z can be used to  match
      the  start  and end of the subject in both modes, and if all
-     branches of a pattern start with \A is it  always  anchored,
+     branches of a pattern start with \A it is  always  anchored,
      whether PCRE_MULTILINE is set or not.
 
 
@@ -1114,7 +1162,6 @@ FULL STOP (PERIOD, DOT)
      Outside a character class, a dot in the pattern matches  any
      one character in the subject, including a non-printing char-
      acter, but not (by default)  newline.   If  the  PCRE_DOTALL
-
      option  is set, dots match newlines as well. The handling of
      dot is entirely independent of the  handling  of  circumflex
      and  dollar,  the  only  relationship  being  that they both
@@ -1233,7 +1280,7 @@ POSIX CHARACTER CLASSES
        [12[:^digit:]]
 
      matches "1", "2", or any non-digit.  PCRE  (and  Perl)  also
-     recogize  the POSIX syntax [.ch.] and [=ch=] where "ch" is a
+     recognize the POSIX syntax [.ch.] and [=ch=] where "ch" is a
      "collating element", but these are  not  supported,  and  an
      error is given if they are encountered.
 
@@ -1352,7 +1399,7 @@ SUBPATTERNS
        the ((red|white) (king|queen))
 
      the captured substrings are "red king", "red",  and  "king",
-     and are numbered 1, 2, and 3.
+     and are numbered 1, 2, and 3, respectively.
 
      The fact that plain parentheses fulfil two functions is  not
      always  helpful.  There are often times when a grouping sub-
@@ -1423,7 +1470,6 @@ REPETITION
      one that does not match the syntax of a quantifier, is taken
      as  a literal character. For example, {,6} is not a quantif-
      ier, but a literal string of four characters.
-
      The quantifier {0} is permitted, causing the  expression  to
      behave  as  if the previous item and the quantifier were not
      present.
@@ -1528,6 +1574,14 @@ REPETITION
 BACK REFERENCES
      Outside a character class, a backslash followed by  a  digit
      greater  than  0  (and  possibly  further  digits) is a back
+
+
+
+
+SunOS 5.8                 Last change:                         30
+
+
+
      reference to a capturing subpattern  earlier  (i.e.  to  its
      left)  in  the  pattern,  provided there have been that many
      previous capturing left parentheses.
@@ -1583,12 +1637,11 @@ BACK REFERENCES
 
      matches any number of "a"s and also "aba", "ababbaa" etc. At
      each iteration of the subpattern, the back reference matches
-     the  character  string   corresponding   to   the   previous
-     iteration.  In  order  for this to work, the pattern must be
-     such that the first iteration does not  need  to  match  the
-     back  reference.  This  can be done using alternation, as in
-     the example above, or by a  quantifier  with  a  minimum  of
-     zero.
+     the character string corresponding to  the  previous  itera-
+     tion.  In  order  for this to work, the pattern must be such
+     that the first iteration does not need  to  match  the  back
+     reference.  This  can  be  done using alternation, as in the
+     example above, or by a quantifier with a minimum of zero.
 
 
 
@@ -1741,9 +1794,9 @@ ONCE-ONLY SUBPATTERNS
 
      This kind of parenthesis "locks up" the  part of the pattern
      it  contains once it has matched, and a failure further into
-     the  pattern  is  prevented  from  backtracking   into   it.
-     Backtracking  past  it  to previous items, however, works as
-     normal.
+     the pattern is prevented from backtracking  into  it.  Back-
+     tracking  past  it to previous items, however, works as nor-
+     mal.
 
      An alternative description is that a subpattern of this type
      matches  the  string  of  characters that an identical stan-
@@ -2051,8 +2104,8 @@ UTF-8 SUPPORT
      Running with PCRE_UTF8 set causes these changes in  the  way
      PCRE works:
 
-     1. In a pattern, the escape sequence \x{...}, where the con-
-     tents  of  the  braces is a string of hexadecimal digits, is
+     1. In a pattern, the  escape  sequence  \x{...},  where  the
+     contents of the braces is a string of hexadecimal digits, is
      interpreted as a UTF-8 character whose code  number  is  the
      given   hexadecimal  number,  for  example:  \x{1234}.  This
      inserts from one to six  literal  bytes  into  the  pattern,
@@ -2106,6 +2159,7 @@ UTF-8 SUPPORT
 
      The following UTF-8 features of  Perl  5.6  are  not  imple-
      mented:
+
      1. The escape sequence \C to match a single byte.
 
      2. The use of Unicode tables and properties and escapes  \p,
@@ -2113,6 +2167,143 @@ UTF-8 SUPPORT
 
 
 
+SAMPLE PROGRAM
+     The code below is a simple, complete demonstration  program,
+     to  get  you started with using PCRE. This code is also sup-
+     plied in the file pcredemo.c in the PCRE distribution.
+
+     The program compiles the  regular  expression  that  is  its
+     first argument, and matches it against the subject string in
+     its second argument. No options are set, and default charac-
+     ter  tables are used. If matching succeeds, the program out-
+     puts the portion of the subject that matched, together  with
+     the contents of any captured substrings.
+
+     On a Unix system that has PCRE installed in /usr/local,  you
+     can  compile  the demonstration program using a command like
+     this:
+
+       gcc -o pcredemo pcredemo.c -I/usr/local/include \
+         -L/usr/local/lib -lpcre
+
+     Then you can run simple tests like this:
+
+       ./pcredemo 'cat|dog' 'the cat sat on the mat'
+
+     Note that there is a much more comprehensive  test  program,
+     called  pcretest,  which  supports  many more facilities for
+     testing regular expressions. The pcredemo  program  is  pro-
+     vided as a simple coding example.
+
+     On some operating systems (e.g.  Solaris)  you  may  get  an
+     error like this when you try to run pcredemo:
+
+       ld.so.1: a.out: fatal: libpcre.so.0: open failed: No  such
+     file or directory
+
+     This is caused by the way shared library  support  works  on
+     those systems. You need to add
+
+       -R/usr/local/lib
+
+     to the compile command to get round this problem. Here's the
+     code:
+
+       #include <stdio.h>
+       #include <string.h>
+       #include <pcre.h>
+
+       #define OVECCOUNT 30    /* should be a multiple of 3 */
+
+       int main(int argc, char **argv)
+       {
+       pcre *re;
+       const char *error;
+       int erroffset;
+       int ovector[OVECCOUNT];
+       int rc, i;
+
+       if (argc != 3)
+         {
+         printf("Two arguments required: a regex and a "
+           "subject string\n");
+         return 1;
+         }
+
+       /* Compile the regular expression in the first argument */
+
+       re = pcre_compile(
+         argv[1],     /* the pattern */
+         0,           /* default options */
+         &error,      /* for error message */
+         &erroffset,  /* for error offset */
+         NULL);       /* use default character tables */
+
+       /* Compilation failed: print the error message and exit */
+
+       if (re == NULL)
+         {
+         printf("PCRE compilation failed at offset %d: %s\n",
+           erroffset, error);
+         return 1;
+         }
+
+       /* Compilation succeeded: match the subject in the second
+          argument */
+
+       rc = pcre_exec(
+         re,          /* the compiled pattern */
+         NULL,        /* we didn't study the pattern */
+         argv[2],     /* the subject string */
+         (int)strlen(argv[2]), /* the length of the subject */
+         0,           /* start at offset 0 in the subject */
+         0,           /* default options */
+         ovector,     /* vector for substring information */
+         OVECCOUNT);  /* number of elements in the vector */
+
+       /* Matching failed: handle error cases */
+
+       if (rc < 0)
+         {
+         switch(rc)
+           {
+           case PCRE_ERROR_NOMATCH: printf("No match\n"); break;
+           /*
+           Handle other special cases if you like
+           */
+           default: printf("Matching error %d\n", rc); break;
+           }
+         return 1;
+         }
+
+       /* Match succeeded */
+
+       printf("Match succeeded\n");
+
+       /* The output vector wasn't big enough */
+
+       if (rc == 0)
+         {
+         rc = OVECCOUNT/3;
+         printf("ovector only has room for %d captured "
+           "substrings\n", rc - 1);
+         }
+
+       /* Show substrings stored in the output vector */
+
+       for (i = 0; i < rc; i++)
+         {
+         char *substring_start = argv[2] + ovector[2*i];
+         int substring_length = ovector[2*i+1] - ovector[2*i];
+         printf("%2d: %.*s\n", i, substring_length,
+           substring_start);
+         }
+
+       return 0;
+       }
+
+
+
 AUTHOR
      Philip Hazel <ph10@cam.ac.uk>
      University Computing Service,
@@ -2120,6 +2311,5 @@ AUTHOR
      Cambridge CB2 3QG, England.
      Phone: +44 1223 334714
 
-     Last updated: 28 August 2000,
-       the 250th anniversary of the death of J.S. Bach.
-     Copyright (c) 1997-2000 University of Cambridge.
+     Last updated: 15 August 2001
+     Copyright (c) 1997-2001 University of Cambridge.
diff --git a/ext/pcre/pcrelib/doc/pcregrep.1 b/ext/pcre/pcrelib/doc/pcregrep.1
index 41b9051037..5d3151e867 100644
--- a/ext/pcre/pcrelib/doc/pcregrep.1
+++ b/ext/pcre/pcrelib/doc/pcregrep.1
@@ -2,7 +2,7 @@
 .SH NAME
 pcregrep - a grep with Perl-compatible regular expressions.
 .SH SYNOPSIS
-.B pcregrep [-Vchilnsvx] pattern [file] ...
+.B pcregrep [-Vcfhilnrsvx] pattern [file] ...
 
 
 .SH DESCRIPTION
@@ -32,6 +32,12 @@ Do not print individual lines; instead just print a count of the number of
 lines that would otherwise have been printed. If several files are given, a
 count is printed for each of them.
 .TP
+\fB-f\fIfilename\fR
+Read patterns from the file, one per line, and match all patterns against each
+line. There is a maximum of 100 patterns. Trailing white space is removed, and
+blank lines are ignored. An empty file contains no patterns and therefore
+matches nothing.
+.TP
 \fB-h\fR
 Suppress printing of filenames when searching multiple files.
 .TP
@@ -46,6 +52,10 @@ once, on a separate line.
 \fB-n\fR
 Precede each line by its line number in the file.
 .TP
+\fB-r\fR
+If any file is a directory, recursively scan the files it contains. Without
+\fB-r\fR a directory is scanned as a normal file.
+.TP
 \fB-s\fR
 Work silently, that is, display nothing except error messages.
 The exit status indicates whether any matches were found.
@@ -72,5 +82,7 @@ for syntax errors or inacessible files (even if matches were found).
 
 .SH AUTHOR
 Philip Hazel <ph10@cam.ac.uk>
+
+Last updated: 15 August 2001
 .br
-Copyright (c) 1997-2000 University of Cambridge.
+Copyright (c) 1997-2001 University of Cambridge.
diff --git a/ext/pcre/pcrelib/doc/pcregrep.html b/ext/pcre/pcrelib/doc/pcregrep.html
index 77da7c426c..7bc210c65a 100644
--- a/ext/pcre/pcrelib/doc/pcregrep.html
+++ b/ext/pcre/pcrelib/doc/pcregrep.html
@@ -22,7 +22,7 @@ pcregrep - a grep with Perl-compatible regular expressions.
 </P>
 <LI><A NAME="SEC2" HREF="#TOC1">SYNOPSIS</A>
 <P>
-<B>pcregrep [-Vchilnsvx] pattern [file] ...</B>
+<B>pcregrep [-Vcfhilnrsvx] pattern [file] ...</B>
 </P>
 <LI><A NAME="SEC3" HREF="#TOC1">DESCRIPTION</A>
 <P>
@@ -55,6 +55,13 @@ lines that would otherwise have been printed. If several files are given, a
 count is printed for each of them.
 </P>
 <P>
+<B>-f</B><I>filename</I>
+Read patterns from the file, one per line, and match all patterns against each
+line. There is a maximum of 100 patterns. Trailing white space is removed, and
+blank lines are ignored. An empty file contains no patterns and therefore
+matches nothing.
+</P>
+<P>
 <B>-h</B>
 Suppress printing of filenames when searching multiple files.
 </P>
@@ -73,6 +80,11 @@ once, on a separate line.
 Precede each line by its line number in the file.
 </P>
 <P>
+<B>-r</B>
+If any file is a directory, recursively scan the files it contains. Without
+<B>-r</B> a directory is scanned as a normal file.
+</P>
+<P>
 <B>-s</B>
 Work silently, that is, display nothing except error messages.
 The exit status indicates whether any matches were found.
@@ -101,5 +113,8 @@ for syntax errors or inacessible files (even if matches were found).
 <LI><A NAME="SEC7" HREF="#TOC1">AUTHOR</A>
 <P>
 Philip Hazel &#60;ph10@cam.ac.uk&#62;
+</P>
+<P>
+Last updated: 15 August 2001
 <BR>
-Copyright (c) 1997-2000 University of Cambridge.
+Copyright (c) 1997-2001 University of Cambridge.
diff --git a/ext/pcre/pcrelib/doc/pcregrep.txt b/ext/pcre/pcrelib/doc/pcregrep.txt
index 3483f9e158..1600228402 100644
--- a/ext/pcre/pcrelib/doc/pcregrep.txt
+++ b/ext/pcre/pcrelib/doc/pcregrep.txt
@@ -4,7 +4,7 @@ NAME
 
 
 SYNOPSIS
-     pcregrep [-Vchilnsvx] pattern [file] ...
+     pcregrep [-Vcfhilnrsvx] pattern [file] ...
 
 
 
@@ -37,6 +37,14 @@ OPTIONS
                wise have  been  printed.  If  several  files  are
                given, a count is printed for each of them.
 
+     -ffilename
+               Read patterns from the file,  one  per  line,  and
+               match  all  patterns against each line. There is a
+               maximum of 100 patterns. Trailing white  space  is
+               removed,  and  blank  lines  are ignored. An empty
+               file contains no patterns  and  therefore  matches
+               nothing.
+
      -h        Suppress printing of filenames when searching mul-
                tiple files.
 
@@ -44,12 +52,17 @@ OPTIONS
                parisons.
 
      -l        Instead of printing lines  from  the  files,  just
+
                print the names of the files containing lines that
                would have been printed. Each file name is printed
                once, on a separate line.
 
      -n        Precede each line by its line number in the file.
 
+     -r        If any file is a directory, recursively  scan  the
+               files  it  contains.  Without  -r  a  directory is
+               scanned as a normal file.
+
      -s        Work silently, that  is,  display  nothing  except
                error messages.  The exit status indicates whether
                any matches were found.
@@ -83,5 +96,6 @@ DIAGNOSTICS
 
 AUTHOR
      Philip Hazel <ph10@cam.ac.uk>
-     Copyright (c) 1997-2000 University of Cambridge.
 
+     Last updated: 15 August 2001
+     Copyright (c) 1997-2001 University of Cambridge.
diff --git a/ext/pcre/pcrelib/doc/pcretest.1 b/ext/pcre/pcrelib/doc/pcretest.1
new file mode 100644
index 0000000000..b2e25560d7
--- /dev/null
+++ b/ext/pcre/pcrelib/doc/pcretest.1
@@ -0,0 +1,282 @@
+.TH PCRETEST 1
+.SH NAME
+pcretest - a program for testing Perl-compatible regular expressions.
+.SH SYNOPSIS
+.B pcretest "[-d] [-i] [-m] [-o osize] [-p] [-t] [source] [destination]"
+
+\fBpcretest\fR was written as a test program for the PCRE regular expression
+library itself, but it can also be used for experimenting with regular
+expressions. This man page describes the features of the test program; for
+details of the regular expressions themselves, see the \fBpcre\fR man page.
+
+.SH OPTIONS
+.TP 10
+\fB-d\fR
+Behave as if each regex had the \fB/D\fR modifier (see below); the internal
+form is output after compilation.
+.TP 10
+\fB-i\fR
+Behave as if each regex had the \fB/I\fR modifier; information about the
+compiled pattern is given after compilation.
+.TP 10
+\fB-m\fR
+Output the size of each compiled pattern after it has been compiled. This is
+equivalent to adding /M to each regular expression. For compatibility with
+earlier versions of pcretest, \fB-s\fR is a synonym for \fB-m\fR.
+.TP 10
+\fB-o\fR \fIosize\fR
+Set the number of elements in the output vector that is used when calling PCRE
+to be \fIosize\fR. The default value is 45, which is enough for 14 capturing
+subexpressions. The vector size can be changed for individual matching calls by
+including \\O in the data line (see below).
+.TP 10
+\fB-p\fR
+Behave as if each regex has the \fB/P\fR modifier; the POSIX wrapper API is used
+to call PCRE. None of the other options has any effect when \fB-p\fR is set.
+.TP 10
+\fB-t\fR
+Run each compile, study, and match 20000 times with a timer, and output
+resulting time per compile or match (in milliseconds). Do not set \fB-t\fR with
+\fB-m\fR, because you will then get the size output 20000 times and the timing
+will be distorted.
+
+
+.SH DESCRIPTION
+
+If \fBpcretest\fR is given two filename arguments, it reads from the first and
+writes to the second. If it is given only one filename argument, it reads from
+that file and writes to stdout. Otherwise, it reads from stdin and writes to
+stdout, and prompts for each line of input, using "re>" to prompt for regular
+expressions, and "data>" to prompt for data lines.
+
+The program handles any number of sets of input on a single input file. Each
+set starts with a regular expression, and continues with any number of data
+lines to be matched against the pattern. An empty line signals the end of the
+data lines, at which point a new regular expression is read. The regular
+expressions are given enclosed in any non-alphameric delimiters other than
+backslash, for example
+
+  /(a|bc)x+yz/
+
+White space before the initial delimiter is ignored. A regular expression may
+be continued over several input lines, in which case the newline characters are
+included within it. It is possible to include the delimiter within the pattern
+by escaping it, for example
+
+  /abc\\/def/
+
+If you do so, the escape and the delimiter form part of the pattern, but since
+delimiters are always non-alphameric, this does not affect its interpretation.
+If the terminating delimiter is immediately followed by a backslash, for
+example,
+
+  /abc/\\
+
+then a backslash is added to the end of the pattern. This is done to provide a
+way of testing the error condition that arises if a pattern finishes with a
+backslash, because
+
+  /abc\\/
+
+is interpreted as the first line of a pattern that starts with "abc/", causing
+pcretest to read the next line as a continuation of the regular expression.
+
+
+.SH PATTERN MODIFIERS
+
+The pattern may be followed by \fBi\fR, \fBm\fR, \fBs\fR, or \fBx\fR to set the
+PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options,
+respectively. For example:
+
+  /caseless/i
+
+These modifier letters have the same effect as they do in Perl. There are
+others which set PCRE options that do not correspond to anything in Perl:
+\fB/A\fR, \fB/E\fR, and \fB/X\fR set PCRE_ANCHORED, PCRE_DOLLAR_ENDONLY, and
+PCRE_EXTRA respectively.
+
+Searching for all possible matches within each subject string can be requested
+by the \fB/g\fR or \fB/G\fR modifier. After finding a match, PCRE is called
+again to search the remainder of the subject string. The difference between
+\fB/g\fR and \fB/G\fR is that the former uses the \fIstartoffset\fR argument to
+\fBpcre_exec()\fR to start searching at a new point within the entire string
+(which is in effect what Perl does), whereas the latter passes over a shortened
+substring. This makes a difference to the matching process if the pattern
+begins with a lookbehind assertion (including \\b or \\B).
+
+If any call to \fBpcre_exec()\fR in a \fB/g\fR or \fB/G\fR sequence matches an
+empty string, the next call is done with the PCRE_NOTEMPTY and PCRE_ANCHORED
+flags set in order to search for another, non-empty, match at the same point.
+If this second match fails, the start offset is advanced by one, and the normal
+match is retried. This imitates the way Perl handles such cases when using the
+\fB/g\fR modifier or the \fBsplit()\fR function.
+
+There are a number of other modifiers for controlling the way \fBpcretest\fR
+operates.
+
+The \fB/+\fR modifier requests that as well as outputting the substring that
+matched the entire pattern, pcretest should in addition output the remainder of
+the subject string. This is useful for tests where the subject contains
+multiple copies of the same substring.
+
+The \fB/L\fR modifier must be followed directly by the name of a locale, for
+example,
+
+  /pattern/Lfr
+
+For this reason, it must be the last modifier letter. The given locale is set,
+\fBpcre_maketables()\fR is called to build a set of character tables for the
+locale, and this is then passed to \fBpcre_compile()\fR when compiling the
+regular expression. Without an \fB/L\fR modifier, NULL is passed as the tables
+pointer; that is, \fB/L\fR applies only to the expression on which it appears.
+
+The \fB/I\fR modifier requests that \fBpcretest\fR output information about the
+compiled expression (whether it is anchored, has a fixed first character, and
+so on). It does this by calling \fBpcre_fullinfo()\fR after compiling an
+expression, and outputting the information it gets back. If the pattern is
+studied, the results of that are also output.
+
+The \fB/D\fR modifier is a PCRE debugging feature, which also assumes \fB/I\fR.
+It causes the internal form of compiled regular expressions to be output after
+compilation.
+
+The \fB/S\fR modifier causes \fBpcre_study()\fR to be called after the
+expression has been compiled, and the results used when the expression is
+matched.
+
+The \fB/M\fR modifier causes the size of the memory block used to hold the
+compiled pattern to be output.
+
+The \fB/P\fR modifier causes \fBpcretest\fR to call PCRE via the POSIX wrapper
+API rather than its native API. When this is done, all other modifiers except
+\fB/i\fR, \fB/m\fR, and \fB/+\fR are ignored. REG_ICASE is set if \fB/i\fR is
+present, and REG_NEWLINE is set if \fB/m\fR is present. The wrapper functions
+force PCRE_DOLLAR_ENDONLY always, and PCRE_DOTALL unless REG_NEWLINE is set.
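The POSIX-style flags named above can be tried out with a small C sketch. It is only an illustration under stated assumptions: it uses the system's <regex.h> rather than PCRE's own wrapper header, and posix_match() is a hypothetical helper, not something pcretest provides.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile 'pattern' with the given regcomp() flags and
   run it once against 'subject' with the given regexec() flags.
   Returns 1 on a match, 0 on no match, -1 if the pattern does not compile. */
int posix_match(const char *pattern, const char *subject,
                int cflags, int eflags)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && subject != NULL);

    if (regcomp(&re, pattern, cflags) != 0)
        return -1;

    /* No substring offsets are requested, so nmatch is 0 and pmatch is NULL. */
    rc = regexec(&re, subject, 0, NULL, eflags);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With this helper, REG_ICASE lets "^abc" match "ABC", and REG_NEWLINE lets "^def" match the second line of "abc\ndef", which roughly corresponds to what /i and /m select when /P is in force.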
+
+The \fB/8\fR modifier causes \fBpcretest\fR to call PCRE with the PCRE_UTF8
+option set. This turns on the (currently incomplete) support for UTF-8
+character handling in PCRE, provided that it was compiled with this support
+enabled. This modifier also causes any non-printing characters in output
+strings to be printed using the \\x{hh...} notation if they are valid UTF-8
+sequences.
+
+
+.SH DATA LINES
+
+Before each data line is passed to \fBpcre_exec()\fR, leading and trailing
+whitespace is removed, and it is then scanned for \\ escapes. The following are
+recognized:
+
+  \\a         alarm (= BEL)
+  \\b         backspace
+  \\e         escape
+  \\f         formfeed
+  \\n         newline
+  \\r         carriage return
+  \\t         tab
+  \\v         vertical tab
+  \\nnn       octal character (up to 3 octal digits)
+  \\xhh       hexadecimal character (up to 2 hex digits)
+  \\x{hh...}  hexadecimal UTF-8 character
+
+  \\A         pass the PCRE_ANCHORED option to \fBpcre_exec()\fR
+  \\B         pass the PCRE_NOTBOL option to \fBpcre_exec()\fR
+  \\Cdd       call pcre_copy_substring() for substring dd
+                after a successful match (any decimal number
+                less than 32)
+  \\Gdd       call pcre_get_substring() for substring dd
+                after a successful match (any decimal number
+                less than 32)
+  \\L         call pcre_get_substringlist() after a
+                successful match
+  \\N         pass the PCRE_NOTEMPTY option to \fBpcre_exec()\fR
+  \\Odd       set the size of the output vector passed to
+                \fBpcre_exec()\fR to dd (any number of decimal
+                digits)
+  \\Z         pass the PCRE_NOTEOL option to \fBpcre_exec()\fR
+
+When \\O is used, the value given may be higher or lower than the size set by
+the \fB-o\fR option (or defaulted to 45); \\O applies only to the call of
+\fBpcre_exec()\fR for the line in which it appears.
+
+A backslash followed by any other character just escapes that character. If the
+very last character is a backslash, it is ignored. This gives a way of passing
+an empty line as data, since a real empty line terminates the data input.
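The character escapes listed above could be decoded along the following lines. This is a minimal sketch, not the pcretest source: decode_data_line() is a hypothetical name, ASCII is assumed for the escape character, whitespace stripping is left out, and the option letters (such as \A or \O) and the \x{hh...} form are deliberately not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: decode a subset of the data-line escapes from 'in'
   into 'out' and return the number of bytes written. 'out' must be at
   least as long as 'in'. */
size_t decode_data_line(const char *in, unsigned char *out)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);

    while (*in != '\0') {
        int c = (unsigned char)*in++;
        int i;

        if (c != '\\') {                 /* ordinary character */
            out[n++] = (unsigned char)c;
            continue;
        }
        if (*in == '\0')                 /* a trailing backslash is ignored */
            break;

        c = (unsigned char)*in++;
        switch (c) {
        case 'a': c = '\a'; break;
        case 'b': c = '\b'; break;
        case 'e': c = 27;   break;       /* ESC, assuming ASCII */
        case 'f': c = '\f'; break;
        case 'n': c = '\n'; break;
        case 'r': c = '\r'; break;
        case 't': c = '\t'; break;
        case 'v': c = '\v'; break;
        case 'x':                        /* up to 2 hex digits */
            c = 0;
            for (i = 0; i < 2 && isxdigit((unsigned char)*in); i++, in++) {
                int d = tolower((unsigned char)*in);
                c = c * 16 + (isdigit(d) ? d - '0' : d - 'a' + 10);
            }
            break;
        case '0': case '1': case '2': case '3':
        case '4': case '5': case '6': case '7':
            c -= '0';                    /* up to 3 octal digits in total */
            for (i = 0; i < 2 && *in >= '0' && *in <= '7'; i++, in++)
                c = c * 8 + (*in - '0');
            break;
        default:                         /* anything else stands for itself */
            break;
        }
        out[n++] = (unsigned char)c;
    }
    return n;
}
```

Note that a data line consisting of a single backslash decodes to zero bytes, which is how an empty subject can be supplied.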
+
+If \fB/P\fR was present on the regex, causing the POSIX wrapper API to be used,
+only \fB\\B\fR and \fB\\Z\fR have any effect, causing REG_NOTBOL and REG_NOTEOL
+to be passed to \fBregexec()\fR respectively.
+
+The use of \\x{hh...} to represent UTF-8 characters is not dependent on the use
+of the \fB/8\fR modifier on the pattern. It is recognized always. There may be
+any number of hexadecimal digits inside the braces. The result is from one to
+six bytes, encoded according to the UTF-8 rules.
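The one-to-six-byte encoding mentioned above follows the original UTF-8 scheme, which can be sketched in C as shown here. The function name utf8_encode() is an assumption made for this illustration and is not taken from the pcretest source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encode 'cp' using the original 1-to-6-byte UTF-8
   scheme. Writes the bytes to 'buf' (at least 6 bytes long) and returns
   how many were written, or 0 if the value needs more than 31 bits. */
size_t utf8_encode(unsigned long cp, unsigned char *buf)
{
    /* Largest value that fits in 1, 2, ... 6 bytes. */
    static const unsigned long limit[6] =
        { 0x7fUL, 0x7ffUL, 0xffffUL, 0x1fffffUL, 0x3ffffffUL, 0x7fffffffUL };
    /* Marker bits carried by the leading byte for each length. */
    static const unsigned char lead[6] =
        { 0x00, 0xc0, 0xe0, 0xf0, 0xf8, 0xfc };
    size_t n, i;

    assert(buf != NULL);

    for (n = 0; n < 6; n++)
        if (cp <= limit[n])
            break;
    if (n == 6)
        return 0;

    /* Fill the continuation bytes from the end, six bits at a time. */
    for (i = n; i > 0; i--) {
        buf[i] = (unsigned char)(0x80 | (cp & 0x3f));
        cp >>= 6;
    }
    buf[0] = (unsigned char)(lead[n] | cp);
    return n + 1;
}
```

For example, the value 0x20ac encodes to the three bytes e2 82 ac, and 0x7fffffff needs all six bytes.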
+
+
+.SH OUTPUT FROM PCRETEST
+
+When a match succeeds, pcretest outputs the list of captured substrings that
+\fBpcre_exec()\fR returns, starting with number 0 for the string that matched
+the whole pattern. Here is an example of an interactive pcretest run.
+
+  $ pcretest
+  PCRE version 2.06 08-Jun-1999
+
+    re> /^abc(\\d+)/
+  data> abc123
+   0: abc123
+   1: 123
+  data> xyz
+  No match
+
+If the strings contain any non-printing characters, they are output as \\xhh
+escapes, or as \\x{...} escapes if the \fB/8\fR modifier was present on the
+pattern. If the pattern has the \fB/+\fR modifier, then the output for
+substring 0 is followed by the rest of the subject string, identified by
+"0+" like this:
+
+    re> /cat/+
+  data> cataract
+   0: cat
+   0+ aract
+
+If the pattern has the \fB/g\fR or \fB/G\fR modifier, the results of successive
+matching attempts are output in sequence, like this:
+
+    re> /\\Bi(\\w\\w)/g
+  data> Mississippi
+   0: iss
+   1: ss
+   0: iss
+   1: ss
+   0: ipp
+   1: pp
+
+"No match" is output only if the first match attempt fails.
+
+If any of the sequences \fB\\C\fR, \fB\\G\fR, or \fB\\L\fR are present in a
+data line that is successfully matched, the substrings extracted by the
+convenience functions are output with C, G, or L after the string number
+instead of a colon. This is in addition to the normal full list. The string
+length (that is, the return from the extraction function) is given in
+parentheses after each string for \fB\\C\fR and \fB\\G\fR.
+
+Note that while patterns can be continued over several lines (a plain ">"
+prompt is used for continuations), data lines may not. However, newlines can be
+included in data by means of the \\n escape.
+
+
+.SH AUTHOR
+Philip Hazel <ph10@cam.ac.uk>
+.br
+University Computing Service,
+.br
+New Museums Site,
+.br
+Cambridge CB2 3QG, England.
+.br
+Phone: +44 1223 334714
+
+Last updated: 15 August 2001
+.br
+Copyright (c) 1997-2001 University of Cambridge.
diff --git a/ext/pcre/pcrelib/doc/pcretest.html b/ext/pcre/pcrelib/doc/pcretest.html
new file mode 100644
index 0000000000..918e6dec2b
--- /dev/null
+++ b/ext/pcre/pcrelib/doc/pcretest.html
@@ -0,0 +1,369 @@
+<HTML>
+<HEAD>
+<TITLE>pcretest specification</TITLE>
+</HEAD>
+<body bgcolor="#FFFFFF" text="#00005A">
+<H1>pcretest specification</H1>
+This HTML document has been generated automatically from the original man page.
+If there is any nonsense in it, please consult the man page in case the
+conversion went wrong.
+<UL>
+<LI><A NAME="TOC1" HREF="#SEC1">NAME</A>
+<LI><A NAME="TOC2" HREF="#SEC2">SYNOPSIS</A>
+<LI><A NAME="TOC3" HREF="#SEC3">OPTIONS</A>
+<LI><A NAME="TOC4" HREF="#SEC4">DESCRIPTION</A>
+<LI><A NAME="TOC5" HREF="#SEC5">PATTERN MODIFIERS</A>
+<LI><A NAME="TOC6" HREF="#SEC6">DATA LINES</A>
+<LI><A NAME="TOC7" HREF="#SEC7">OUTPUT FROM PCRETEST</A>
+<LI><A NAME="TOC8" HREF="#SEC8">AUTHOR</A>
+</UL>
+<LI><A NAME="SEC1" HREF="#TOC1">NAME</A>
+<P>
+pcretest - a program for testing Perl-compatible regular expressions.
+</P>
+<LI><A NAME="SEC2" HREF="#TOC1">SYNOPSIS</A>
+<P>
+<B>pcretest [-d] [-i] [-m] [-o osize] [-p] [-t] [source] [destination]</B>
+</P>
+<P>
+<B>pcretest</B> was written as a test program for the PCRE regular expression
+library itself, but it can also be used for experimenting with regular
+expressions. This man page describes the features of the test program; for
+details of the regular expressions themselves, see the <B>pcre</B> man page.
+</P>
+<LI><A NAME="SEC3" HREF="#TOC1">OPTIONS</A>
+<P>
+<B>-d</B>
+Behave as if each regex had the <B>/D</B> modifier (see below); the internal
+form is output after compilation.
+</P>
+<P>
+<B>-i</B>
+Behave as if each regex had the <B>/I</B> modifier; information about the
+compiled pattern is given after compilation.
+</P>
+<P>
+<B>-m</B>
+Output the size of each compiled pattern after it has been compiled. This is
+equivalent to adding /M to each regular expression. For compatibility with
+earlier versions of pcretest, <B>-s</B> is a synonym for <B>-m</B>.
+</P>
+<P>
+<B>-o</B> <I>osize</I>
+Set the number of elements in the output vector that is used when calling PCRE
+to be <I>osize</I>. The default value is 45, which is enough for 14 capturing
+subexpressions. The vector size can be changed for individual matching calls by
+including \O in the data line (see below).
+</P>
+<P>
+<B>-p</B>
+Behave as if each regex had the <B>/P</B> modifier; the POSIX wrapper API is
+used to call PCRE. None of the other options has any effect when <B>-p</B> is set.
+</P>
+<P>
+<B>-t</B>
+Run each compile, study, and match 20000 times with a timer, and output the
+resulting time per compile or match (in milliseconds). Do not set <B>-t</B> with
+<B>-m</B>, because you will then get the size output 20000 times and the timing
+will be distorted.
+</P>
+<LI><A NAME="SEC4" HREF="#TOC1">DESCRIPTION</A>
+<P>
+If <B>pcretest</B> is given two filename arguments, it reads from the first and
+writes to the second. If it is given only one filename argument, it reads from
+that file and writes to stdout. Otherwise, it reads from stdin and writes to
+stdout, and prompts for each line of input, using "re&#62;" to prompt for regular
+expressions, and "data&#62;" to prompt for data lines.
+</P>
+<P>
+The program handles any number of sets of input on a single input file. Each
+set starts with a regular expression, and continues with any number of data
+lines to be matched against the pattern. An empty line signals the end of the
+data lines, at which point a new regular expression is read. The regular
+expressions are given enclosed in any non-alphameric delimiters other than
+backslash, for example
+</P>
+<P>
+<PRE>
+  /(a|bc)x+yz/
+</PRE>
+</P>
+<P>
+White space before the initial delimiter is ignored. A regular expression may
+be continued over several input lines, in which case the newline characters are
+included within it. It is possible to include the delimiter within the pattern
+by escaping it, for example
+</P>
+<P>
+<PRE>
+  /abc\/def/
+</PRE>
+</P>
+<P>
+If you do so, the escape and the delimiter form part of the pattern, but since
+delimiters are always non-alphameric, this does not affect its interpretation.
+If the terminating delimiter is immediately followed by a backslash, for
+example,
+</P>
+<P>
+<PRE>
+  /abc/\
+</PRE>
+</P>
+<P>
+then a backslash is added to the end of the pattern. This is done to provide a
+way of testing the error condition that arises if a pattern finishes with a
+backslash, because
+</P>
+<P>
+<PRE>
+  /abc\/
+</PRE>
+</P>
+<P>
+is interpreted as the first line of a pattern that starts with "abc/", causing
+pcretest to read the next line as a continuation of the regular expression.
+</P>
+<LI><A NAME="SEC5" HREF="#TOC1">PATTERN MODIFIERS</A>
+<P>
+The pattern may be followed by <B>i</B>, <B>m</B>, <B>s</B>, or <B>x</B> to set the
+PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options,
+respectively. For example:
+</P>
+<P>
+<PRE>
+  /caseless/i
+</PRE>
+</P>
+<P>
+These modifier letters have the same effect as they do in Perl. There are
+others which set PCRE options that do not correspond to anything in Perl:
+<B>/A</B>, <B>/E</B>, and <B>/X</B> set PCRE_ANCHORED, PCRE_DOLLAR_ENDONLY, and
+PCRE_EXTRA respectively.
+</P>
+<P>
+Searching for all possible matches within each subject string can be requested
+by the <B>/g</B> or <B>/G</B> modifier. After finding a match, PCRE is called
+again to search the remainder of the subject string. The difference between
+<B>/g</B> and <B>/G</B> is that the former uses the <I>startoffset</I> argument to
+<B>pcre_exec()</B> to start searching at a new point within the entire string
+(which is in effect what Perl does), whereas the latter passes over a shortened
+substring. This makes a difference to the matching process if the pattern
+begins with a lookbehind assertion (including \b or \B).
+</P>
+<P>
+If any call to <B>pcre_exec()</B> in a <B>/g</B> or <B>/G</B> sequence matches an
+empty string, the next call is done with the PCRE_NOTEMPTY and PCRE_ANCHORED
+flags set in order to search for another, non-empty, match at the same point.
+If this second match fails, the start offset is advanced by one, and the normal
+match is retried. This imitates the way Perl handles such cases when using the
+<B>/g</B> modifier or the <B>split()</B> function.
+</P>
+<P>
+There are a number of other modifiers for controlling the way <B>pcretest</B>
+operates.
+</P>
+<P>
+The <B>/+</B> modifier requests that as well as outputting the substring that
+matched the entire pattern, pcretest should in addition output the remainder of
+the subject string. This is useful for tests where the subject contains
+multiple copies of the same substring.
+</P>
+<P>
+The <B>/L</B> modifier must be followed directly by the name of a locale, for
+example,
+</P>
+<P>
+<PRE>
+  /pattern/Lfr
+</PRE>
+</P>
+<P>
+For this reason, it must be the last modifier letter. The given locale is set,
+<B>pcre_maketables()</B> is called to build a set of character tables for the
+locale, and this is then passed to <B>pcre_compile()</B> when compiling the
+regular expression. Without an <B>/L</B> modifier, NULL is passed as the tables
+pointer; that is, <B>/L</B> applies only to the expression on which it appears.
+</P>
+<P>
+The <B>/I</B> modifier requests that <B>pcretest</B> output information about the
+compiled expression (whether it is anchored, has a fixed first character, and
+so on). It does this by calling <B>pcre_fullinfo()</B> after compiling an
+expression, and outputting the information it gets back. If the pattern is
+studied, the results of that are also output.
+</P>
+<P>
+The <B>/D</B> modifier is a PCRE debugging feature, which also assumes <B>/I</B>.
+It causes the internal form of compiled regular expressions to be output after
+compilation.
+</P>
+<P>
+The <B>/S</B> modifier causes <B>pcre_study()</B> to be called after the
+expression has been compiled, and the results used when the expression is
+matched.
+</P>
+<P>
+The <B>/M</B> modifier causes the size of the memory block used to hold the
+compiled pattern to be output.
+</P>
+<P>
+The <B>/P</B> modifier causes <B>pcretest</B> to call PCRE via the POSIX wrapper
+API rather than its native API. When this is done, all other modifiers except
+<B>/i</B>, <B>/m</B>, and <B>/+</B> are ignored. REG_ICASE is set if <B>/i</B> is
+present, and REG_NEWLINE is set if <B>/m</B> is present. The wrapper functions
+force PCRE_DOLLAR_ENDONLY always, and PCRE_DOTALL unless REG_NEWLINE is set.
+</P>
+<P>
+The <B>/8</B> modifier causes <B>pcretest</B> to call PCRE with the PCRE_UTF8
+option set. This turns on the (currently incomplete) support for UTF-8
+character handling in PCRE, provided that it was compiled with this support
+enabled. This modifier also causes any non-printing characters in output
+strings to be printed using the \x{hh...} notation if they are valid UTF-8
+sequences.
+</P>
+<LI><A NAME="SEC6" HREF="#TOC1">DATA LINES</A>
+<P>
+Before each data line is passed to <B>pcre_exec()</B>, leading and trailing
+whitespace is removed, and it is then scanned for \ escapes. The following are
+recognized:
+</P>
+<P>
+<PRE>
+  \a         alarm (= BEL)
+  \b         backspace
+  \e         escape
+  \f         formfeed
+  \n         newline
+  \r         carriage return
+  \t         tab
+  \v         vertical tab
+  \nnn       octal character (up to 3 octal digits)
+  \xhh       hexadecimal character (up to 2 hex digits)
+  \x{hh...}  hexadecimal UTF-8 character
+</PRE>
+</P>
+<P>
+<PRE>
+  \A         pass the PCRE_ANCHORED option to <B>pcre_exec()</B>
+  \B         pass the PCRE_NOTBOL option to <B>pcre_exec()</B>
+  \Cdd       call pcre_copy_substring() for substring dd
+                after a successful match (any decimal number
+                less than 32)
+  \Gdd       call pcre_get_substring() for substring dd
+                after a successful match (any decimal number
+                less than 32)
+  \L         call pcre_get_substringlist() after a
+                successful match
+  \N         pass the PCRE_NOTEMPTY option to <B>pcre_exec()</B>
+  \Odd       set the size of the output vector passed to
+                <B>pcre_exec()</B> to dd (any number of decimal
+                digits)
+  \Z         pass the PCRE_NOTEOL option to <B>pcre_exec()</B>
+</PRE>
+</P>
+<P>
+When \O is used, the value given may be higher or lower than the size set by
+the <B>-o</B> option (or defaulted to 45); \O applies only to the call of
+<B>pcre_exec()</B> for the line in which it appears.
+</P>
+<P>
+A backslash followed by any other character just escapes that character. If the
+very last character is a backslash, it is ignored. This gives a way of passing
+an empty line as data, since a real empty line terminates the data input.
+</P>
+<P>
+If <B>/P</B> was present on the regex, causing the POSIX wrapper API to be used,
+only <B>\B</B> and <B>\Z</B> have any effect, causing REG_NOTBOL and REG_NOTEOL
+to be passed to <B>regexec()</B> respectively.
+</P>
+<P>
+The use of \x{hh...} to represent UTF-8 characters is not dependent on the use
+of the <B>/8</B> modifier on the pattern. It is recognized always. There may be
+any number of hexadecimal digits inside the braces. The result is from one to
+six bytes, encoded according to the UTF-8 rules.
+</P>
+<LI><A NAME="SEC7" HREF="#TOC1">OUTPUT FROM PCRETEST</A>
+<P>
+When a match succeeds, pcretest outputs the list of captured substrings that
+<B>pcre_exec()</B> returns, starting with number 0 for the string that matched
+the whole pattern. Here is an example of an interactive pcretest run.
+</P>
+<P>
+<PRE>
+  $ pcretest
+  PCRE version 2.06 08-Jun-1999
+</PRE>
+</P>
+<P>
+<PRE>
+    re&#62; /^abc(\d+)/
+  data&#62; abc123
+   0: abc123
+   1: 123
+  data&#62; xyz
+  No match
+</PRE>
+</P>
+<P>
+If the strings contain any non-printing characters, they are output as \xhh
+escapes, or as \x{...} escapes if the <B>/8</B> modifier was present on the
+pattern. If the pattern has the <B>/+</B> modifier, then the output for
+substring 0 is followed by the rest of the subject string, identified by
+"0+" like this:
+</P>
+<P>
+<PRE>
+    re&#62; /cat/+
+  data&#62; cataract
+   0: cat
+   0+ aract
+</PRE>
+</P>
+<P>
+If the pattern has the <B>/g</B> or <B>/G</B> modifier, the results of successive
+matching attempts are output in sequence, like this:
+</P>
+<P>
+<PRE>
+    re&#62; /\Bi(\w\w)/g
+  data&#62; Mississippi
+   0: iss
+   1: ss
+   0: iss
+   1: ss
+   0: ipp
+   1: pp
+</PRE>
+</P>
+<P>
+"No match" is output only if the first match attempt fails.
+</P>
+<P>
+If any of the sequences <B>\C</B>, <B>\G</B>, or <B>\L</B> are present in a
+data line that is successfully matched, the substrings extracted by the
+convenience functions are output with C, G, or L after the string number
+instead of a colon. This is in addition to the normal full list. The string
+length (that is, the return from the extraction function) is given in
+parentheses after each string for <B>\C</B> and <B>\G</B>.
+</P>
+<P>
+Note that while patterns can be continued over several lines (a plain "&#62;"
+prompt is used for continuations), data lines may not. However, newlines can be
+included in data by means of the \n escape.
+</P>
+<LI><A NAME="SEC8" HREF="#TOC1">AUTHOR</A>
+<P>
+Philip Hazel &#60;ph10@cam.ac.uk&#62;
+<BR>
+University Computing Service,
+<BR>
+New Museums Site,
+<BR>
+Cambridge CB2 3QG, England.
+<BR>
+Phone: +44 1223 334714
+</P>
+<P>
+Last updated: 15 August 2001
+<BR>
+Copyright (c) 1997-2001 University of Cambridge.
diff --git a/ext/pcre/pcrelib/doc/pcretest.txt b/ext/pcre/pcrelib/doc/pcretest.txt
index add2979f14..0e13b6c6c5 100644
--- a/ext/pcre/pcrelib/doc/pcretest.txt
+++ b/ext/pcre/pcrelib/doc/pcretest.txt
@@ -1,246 +1,319 @@
-The pcretest program
---------------------
+NAME
+     pcretest - a program  for  testing  Perl-compatible  regular
+     expressions.
 
-This program is intended for testing PCRE, but it can also be used for
-experimenting with regular expressions.
 
-If it is given two filename arguments, it reads from the first and writes to
-the second. If it is given only one filename argument, it reads from that file
-and writes to stdout. Otherwise, it reads from stdin and writes to stdout, and
-prompts for each line of input, using "re>" to prompt for regular expressions,
-and "data>" to prompt for data lines.
 
-The program handles any number of sets of input on a single input file. Each
-set starts with a regular expression, and continues with any number of data
-lines to be matched against the pattern. An empty line signals the end of the
-data lines, at which point a new regular expression is read. The regular
-expressions are given enclosed in any non-alphameric delimiters other than
-backslash, for example
+SYNOPSIS
+     pcretest [-d] [-i] [-m] [-o osize] [-p] [-t] [source]  [des-
+     tination]
 
-  /(a|bc)x+yz/
+     pcretest was written as a test program for the PCRE  regular
+     expression  library  itself,  but  it  can  also be used for
+     experimenting  with  regular  expressions.  This  man   page
+     describes  the  features of the test program; for details of
+     the regular expressions themselves, see the pcre man page.
 
-White space before the initial delimiter is ignored. A regular expression may
-be continued over several input lines, in which case the newline characters are
-included within it. See the test input files in the testdata directory for many
-examples. It is possible to include the delimiter within the pattern by
-escaping it, for example
 
-  /abc\/def/
 
-If you do so, the escape and the delimiter form part of the pattern, but since
-delimiters are always non-alphameric, this does not affect its interpretation.
-If the terminating delimiter is immediately followed by a backslash, for
-example,
+OPTIONS
+     -d        Behave as if each regex had the /D  modifier  (see
+               below); the internal form is output after compila-
+               tion.
 
-  /abc/\
+     -i        Behave as if  each  regex  had  the  /I  modifier;
+               information  about  the  compiled pattern is given
+               after compilation.
 
-then a backslash is added to the end of the pattern. This is done to provide a
-way of testing the error condition that arises if a pattern finishes with a
-backslash, because
+     -m        Output the size of each compiled pattern after  it
+               has been compiled. This is equivalent to adding /M
+               to each regular expression. For compatibility with
+               earlier  versions of pcretest, -s is a synonym for
+               -m.
 
-  /abc\/
+     -o osize  Set the number of elements in  the  output  vector
+               that  is  used  when calling PCRE to be osize. The
+               default value is 45, which is enough for  14  cap-
+               turing  subexpressions.  The  vector  size  can be
+               changed for individual matching calls by including
+               \O in the data line (see below).
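The relation between the default of 45 and the figure of 14 capturing subexpressions can be written out as a one-line calculation. The sketch below assumes, as is my understanding of the pcre_exec() interface, that each substring accounts for three elements of the output vector and that the whole match takes the first slot; max_captures() is a hypothetical name.

```c
#include <assert.h>

/* Hypothetical helper: how many capturing subexpressions an output vector
   of 'osize' elements can report, assuming three elements per substring
   and one substring slot reserved for the whole match. Returns -1 if not
   even the whole match fits. */
int max_captures(int osize)
{
    assert(osize >= 0);
    return osize / 3 - 1;
}
```

Under that assumption max_captures(45) gives 14, which agrees with the default described for -o.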
 
-is interpreted as the first line of a pattern that starts with "abc/", causing
-pcretest to read the next line as a continuation of the regular expression.
+     -p        Behave as if each regex had the /P  modifier;  the
+               POSIX wrapper API is used to call  PCRE.  None  of
+               the other options has any effect when -p is set.
 
+     -t        Run each compile, study,  and  match  20000  times
+               with a timer, and output the  resulting  time  per
+               compile or match (in milliseconds). Do not set  -t
+               with -m, because you will then get the size output
+               20000 times and the timing will be distorted.
 
-PATTERN MODIFIERS
------------------
 
-The pattern may be followed by i, m, s, or x to set the PCRE_CASELESS,
-PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively. For
-example:
 
-  /caseless/i
+DESCRIPTION
+     If pcretest is given two filename arguments, it  reads  from
+     the  first and writes to the second. If it is given only one
+
+
 
-These modifier letters have the same effect as they do in Perl. There are
-others which set PCRE options that do not correspond to anything in Perl: /A,
-/E, and /X set PCRE_ANCHORED, PCRE_DOLLAR_ENDONLY, and PCRE_EXTRA respectively.
 
-Searching for all possible matches within each subject string can be requested
-by the /g or /G modifier. After finding a match, PCRE is called again to search
-the remainder of the subject string. The difference between /g and /G is that
-the former uses the startoffset argument to pcre_exec() to start searching at
-a new point within the entire string (which is in effect what Perl does),
-whereas the latter passes over a shortened substring. This makes a difference
-to the matching process if the pattern begins with a lookbehind assertion
-(including \b or \B).
+SunOS 5.8                 Last change:                          1
 
-If any call to pcre_exec() in a /g or /G sequence matches an empty string, the
-next call is done with the PCRE_NOTEMPTY and PCRE_ANCHORED flags set in order
-to search for another, non-empty, match at the same point. If this second match
-fails, the start offset is advanced by one, and the normal match is retried.
-This imitates the way Perl handles such cases when using the /g modifier or the
-split() function.
 
-There are a number of other modifiers for controlling the way pcretest
-operates.
 
-The /+ modifier requests that as well as outputting the substring that matched
-the entire pattern, pcretest should in addition output the remainder of the
-subject string. This is useful for tests where the subject contains multiple
-copies of the same substring.
+     filename argument, it reads from that  file  and  writes  to
+     stdout. Otherwise, it reads from stdin and writes to stdout,
+     and prompts for each line of input, using  "re>"  to  prompt
+     for  regular  expressions,  and  "data>"  to prompt for data
+     lines.
 
-The /L modifier must be followed directly by the name of a locale, for example,
+     The program handles any number of sets of input on a  single
+     input  file.  Each set starts with a regular expression, and
+     continues with any  number  of  data  lines  to  be  matched
+     against  the  pattern.  An empty line signals the end of the
+     data lines, at which point a new regular expression is read.
+     The  regular  expressions  are  given  enclosed  in any non-
+     alphameric delimiters other than backslash, for example
 
-  /pattern/Lfr
+       /(a|bc)x+yz/
 
-For this reason, it must be the last modifier letter. The given locale is set,
-pcre_maketables() is called to build a set of character tables for the locale,
-and this is then passed to pcre_compile() when compiling the regular
-expression. Without an /L modifier, NULL is passed as the tables pointer; that
-is, /L applies only to the expression on which it appears.
+     White space before the initial delimiter is ignored. A regu-
+     lar expression may be continued over several input lines, in
+     which case the newline characters are included within it. It
+     is  possible  to include the delimiter within the pattern by
+     escaping it, for example
 
-The /I modifier requests that pcretest output information about the compiled
-expression (whether it is anchored, has a fixed first character, and so on). It
-does this by calling pcre_fullinfo() after compiling an expression, and
-outputting the information it gets back. If the pattern is studied, the results
-of that are also output.
+       /abc\/def/
 
-The /D modifier is a PCRE debugging feature, which also assumes /I. It causes
-the internal form of compiled regular expressions to be output after
-compilation.
+     If you do so, the escape and the delimiter form part of  the
+     pattern,  but  since  delimiters  are always non-alphameric,
+     this does not affect its interpretation.  If the terminating
+     delimiter  is immediately followed by a backslash, for exam-
+     ple,
 
-The /S modifier causes pcre_study() to be called after the expression has been
-compiled, and the results used when the expression is matched.
+       /abc/\
 
-The /M modifier causes the size of memory block used to hold the compiled
-pattern to be output.
+     then a backslash is added to the end of the pattern. This is
+     done  to  provide  a way of testing the error condition that
+     arises if a pattern finishes with a backslash, because
 
-The /P modifier causes pcretest to call PCRE via the POSIX wrapper API rather
-than its native API. When this is done, all other modifiers except /i, /m, and
-/+ are ignored. REG_ICASE is set if /i is present, and REG_NEWLINE is set if /m
-is present. The wrapper functions force PCRE_DOLLAR_ENDONLY always, and
-PCRE_DOTALL unless REG_NEWLINE is set.
+       /abc\/
+
+     is interpreted as the first line of a  pattern  that  starts
+     with  "abc/",  causing  pcretest  to read the next line as a
+     continuation of the regular expression.
+
+
+
+PATTERN MODIFIERS
+     The pattern may be followed by i, m, s,  or  x  to  set  the
+     PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED
+     options, respectively. For example:
+
+       /caseless/i
+
+     These modifier letters have the same effect as  they  do  in
+     Perl.  There  are  others which set PCRE options that do not
+     correspond  to  anything  in  Perl:   /A,  /E,  and  /X  set
+     PCRE_ANCHORED,  PCRE_DOLLAR_ENDONLY,  and PCRE_EXTRA respec-
+     tively.
+
+     Searching for  all  possible  matches  within  each  subject
+     string  can  be  requested  by  the /g or /G modifier. After
+     finding  a  match,  PCRE  is  called  again  to  search  the
+     remainder  of  the subject string. The difference between /g
+     and /G is that the former uses the startoffset  argument  to
+     pcre_exec()  to  start  searching  at a new point within the
+     entire string (which is in effect what Perl  does),  whereas
+     the  latter  passes over a shortened substring. This makes a
+     difference to the matching process  if  the  pattern  begins
+     with a lookbehind assertion (including \b or \B).
+
+     If any call to pcre_exec() in a /g or /G sequence matches an
+     empty  string,  the next call is done with the PCRE_NOTEMPTY
+     and PCRE_ANCHORED flags set in order to search for  another,
+     non-empty,  match  at  the same point.  If this second match
+     fails, the start offset is advanced by one, and  the  normal
+     match  is  retried.  This imitates the way Perl handles such
+     cases when using the /g modifier or the split() function.
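The retry rule described above can be sketched in C. This is an illustration only: toy_exec() is a hypothetical stand-in that matches the fixed pattern a*, not pcre_exec(), and the TOY_ option bits are made-up names that mirror PCRE_ANCHORED and PCRE_NOTEMPTY.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Made-up option bits standing in for PCRE_ANCHORED and PCRE_NOTEMPTY. */
#define TOY_ANCHORED 0x1
#define TOY_NOTEMPTY 0x2

/* Toy matcher for the fixed pattern a* (NOT pcre_exec()). Searches from
   'start'; on success stores the match bounds and returns 1, else 0. */
int toy_exec(const char *subject, size_t len, size_t start, int options,
             size_t *mstart, size_t *mend)
{
    size_t pos;

    for (pos = start; pos <= len; pos++) {
        size_t end = pos;
        while (end < len && subject[end] == 'a')
            end++;
        if (end > pos || !(options & TOY_NOTEMPTY)) {
            *mstart = pos;
            *mend = end;
            return 1;
        }
        if (options & TOY_ANCHORED)      /* only the first position counts */
            break;
    }
    return 0;
}

/* Collect up to 'max' matches as (start, end) pairs; returns the count. */
size_t toy_match_all(const char *subject, size_t out[][2], size_t max)
{
    size_t len, start = 0, count = 0;
    int options = 0;

    assert(subject != NULL && out != NULL);
    len = strlen(subject);

    while (start <= len && count < max) {
        size_t ms, me;

        if (toy_exec(subject, len, start, options, &ms, &me)) {
            out[count][0] = ms;
            out[count][1] = me;
            count++;
            start = me;
            /* After an empty match, retry at the same point for a
               non-empty, anchored match. */
            options = (ms == me) ? (TOY_ANCHORED | TOY_NOTEMPTY) : 0;
        } else if (options != 0) {
            /* The retry failed: advance by one and match normally again. */
            start++;
            options = 0;
        } else {
            break;                       /* a normal match failed: done */
        }
    }
    return count;
}
```

Under these assumptions, running toy_match_all() on the subject "baac" reports four matches: an empty string at offset 0, "aa" at offsets 1 to 3, and empty strings at offsets 3 and 4.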
+
+     There are a number of other modifiers  for  controlling  the
+     way pcretest operates.
+
+     The /+ modifier requests that as well as outputting the sub-
+     string  that  matched the entire pattern, pcretest should in
+     addition output the remainder of the subject string. This is
+     useful  for tests where the subject contains multiple copies
+     of the same substring.
+
+     The /L modifier must be followed directly by the name  of  a
+     locale, for example,
+
+       /pattern/Lfr
+
+     For this reason, it must be the last  modifier  letter.  The
+     given  locale is set, pcre_maketables() is called to build a
+     set of character tables for the locale,  and  this  is  then
+     passed  to pcre_compile() when compiling the regular expres-
+     sion. Without an /L modifier, NULL is passed as  the  tables
+     pointer; that is, /L applies only to the expression on which
+     it appears.
+
+     The /I modifier requests that  pcretest  output  information
+     about the compiled expression (whether it is anchored, has a
+     fixed first character, and so on). It does this  by  calling
+     pcre_fullinfo()  after  compiling an expression, and output-
+     ting the information it gets back. If the  pattern  is  stu-
+     died, the results of that are also output.
+
+     The /D modifier is a  PCRE  debugging  feature,  which  also
+     assumes /I.  It causes the internal form of compiled regular
+     expressions to be output after compilation.
+
+     The /S modifier causes pcre_study() to be called  after  the
+     expression  has been compiled, and the results used when the
+     expression is matched.
+
+     The /M modifier causes the size of the memory block used  to
+     hold the compiled pattern to be output.
+
+     The /P modifier causes pcretest to call PCRE via  the  POSIX
+     wrapper  API  rather than its native API. When this is done,
+     all other modifiers except  /i,  /m,  and  /+  are  ignored.
+     REG_ICASE is set if /i is present, and REG_NEWLINE is set if
+     /m    is    present.    The    wrapper    functions    force
+     PCRE_DOLLAR_ENDONLY    always,    and   PCRE_DOTALL   unless
+     REG_NEWLINE is set.
+
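The flag mapping described for /P can be sketched as below. This is only an illustration, not pcretest's own code: the helper names posix_cflags() and posix_match() are hypothetical, and the sketch includes the system <regex.h> as a stand-in, on the assumption that PCRE's POSIX wrapper exposes the same regcomp()/regexec() interface. The system matcher does not understand Perl syntax, and it does not force PCRE_DOLLAR_ENDONLY or PCRE_DOTALL the way the wrapper is described as doing.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Translate pcretest-style modifier letters into POSIX compile flags.
   Only /i and /m are honoured, mirroring the text above; every other
   letter is ignored. Hypothetical helper name. */
int posix_cflags(const char *modifiers)
{
    int flags = 0;
    if (strchr(modifiers, 'i') != NULL) flags |= REG_ICASE;
    if (strchr(modifiers, 'm') != NULL) flags |= REG_NEWLINE;
    return flags;
}

/* Compile and run one pattern through the POSIX interface.
   Returns 1 on a match, 0 on no match, -1 if compilation fails. */
int posix_match(const char *pattern, const char *modifiers,
                const char *subject)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, posix_cflags(modifiers)) != 0)
        return -1;
    rc = regexec(&re, subject, 0, NULL, 0);   /* no substrings requested */
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

For instance, posix_match("abc", "i", "xABCx") is expected to report a match, while the same call with an empty modifier string should not.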
+     The /8 modifier  causes  pcretest  to  call  PCRE  with  the
+     PCRE_UTF8  option  set.  This turns on the (currently incom-
+     plete) support for UTF-8 character handling  in  PCRE,  pro-
+     vided  that  it was compiled with this support enabled. This
+     modifier also causes any non-printing characters  in  output
+     strings  to  be printed using the \x{hh...} notation if they
+     are valid UTF-8 sequences.
 
-The /8 modifier causes pcretest to call PCRE with the PCRE_UTF8 option set.
-This turns on the (currently incomplete) support for UTF-8 character handling
-in PCRE, provided that it was compiled with this support enabled. This modifier
-also causes any non-printing characters in output strings to be printed using
-the \x{hh...} notation if they are valid UTF-8 sequences.
 
 
 DATA LINES
-----------
-
-Before each data line is passed to pcre_exec(), leading and trailing whitespace
-is removed, and it is then scanned for \ escapes. The following are recognized:
-
-  \a         alarm (= BEL)
-  \b         backspace
-  \e         escape
-  \f         formfeed
-  \n         newline
-  \r         carriage return
-  \t         tab
-  \v         vertical tab
-  \nnn       octal character (up to 3 octal digits)
-  \xhh       hexadecimal character (up to 2 hex digits)
-  \x{hh...}  hexadecimal UTF-8 character
-
-  \A         pass the PCRE_ANCHORED option to pcre_exec()
-  \B         pass the PCRE_NOTBOL option to pcre_exec()
-  \Cdd       call pcre_copy_substring() for substring dd after a successful
-               match (any decimal number less than 32)
-  \Gdd       call pcre_get_substring() for substring dd after a successful
-               match (any decimal number less than 32)
-  \L         call pcre_get_substringlist() after a successful match
-  \N         pass the PCRE_NOTEMPTY option to pcre_exec()
-  \Odd       set the size of the output vector passed to pcre_exec() to dd
-               (any number of decimal digits)
-  \Z         pass the PCRE_NOTEOL option to pcre_exec()
-
-A backslash followed by anything else just escapes the anything else. If the
-very last character is a backslash, it is ignored. This gives a way of passing
-an empty line as data, since a real empty line terminates the data input.
-
-If /P was present on the regex, causing the POSIX wrapper API to be used, only
-\B, and \Z have any effect, causing REG_NOTBOL and REG_NOTEOL to be passed to
-regexec() respectively.
-
-The use of \x{hh...} to represent UTF-8 characters is not dependent on the use
-of the /8 modifier on the pattern. It is recognized always. There may be any
-number of hexadecimal digits inside the braces. The result is from one to six
-bytes, encoded according to the UTF-8 rules.
+     Before each data line is passed to pcre_exec(), leading  and
+     trailing whitespace is removed, and it is then scanned for \
+     escapes. The following are recognized:
+
+       \a         alarm (= BEL)
+       \b         backspace
+       \e         escape
+       \f         formfeed
+       \n         newline
+       \r         carriage return
+       \t         tab
+       \v         vertical tab
+       \nnn       octal character (up to 3 octal digits)
+       \xhh       hexadecimal character (up to 2 hex digits)
+       \x{hh...}  hexadecimal UTF-8 character
+
+       \A         pass the PCRE_ANCHORED option to pcre_exec()
+       \B         pass the PCRE_NOTBOL option to pcre_exec()
+       \Cdd       call pcre_copy_substring() for substring dd
+                     after a successful match (any decimal number
+                     less than 32)
+       \Gdd       call pcre_get_substring() for substring dd
+                     after a successful match (any decimal number
+                     less than 32)
+       \L         call pcre_get_substringlist() after a
+                     successful match
+       \N         pass the PCRE_NOTEMPTY option to pcre_exec()
+       \Odd       set the size of the output vector passed to
+                     pcre_exec() to dd (any number of decimal
+                     digits)
+       \Z         pass the PCRE_NOTEOL option to pcre_exec()
+
+     When \O is used, the size it gives may be  higher  or  lower
+     than  the  size  set  by the -O option (or the default, 45);
+     \O applies only to the call of pcre_exec() for the  line  in
+     which it appears.
+
+     A backslash followed by anything else just escapes the  any-
+     thing else. If the very last character is a backslash, it is
+     ignored. This gives a way of passing an empty line as  data,
+     since a real empty line terminates the data input.
+
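The character escapes listed above can be decoded with a small loop such as the following. It is a minimal sketch under stated assumptions, not the code pcretest uses: decode_data_line() is a hypothetical name, the option escapes (\A, \B, \C, \G, \L, \N, \O, \Z) and \x{hh...} are deliberately left out, whitespace trimming is not done, and \e is taken to be ASCII ESC (27).

```c
#include <ctype.h>
#include <stddef.h>

/* Decode the character escapes of a data line from src into dst and
   return the decoded length. dst needs room for strlen(src) bytes,
   because decoding never makes the text longer. */
size_t decode_data_line(const char *src, unsigned char *dst)
{
    size_t n = 0;

    while (*src != '\0') {
        int c = (unsigned char)*src++;
        if (c == '\\') {
            int i, v;
            if (*src == '\0') break;      /* trailing backslash is ignored */
            c = (unsigned char)*src++;
            switch (c) {
            case 'a': c = '\a'; break;
            case 'b': c = '\b'; break;
            case 'e': c = 27;   break;    /* assumes ASCII ESC */
            case 'f': c = '\f'; break;
            case 'n': c = '\n'; break;
            case 'r': c = '\r'; break;
            case 't': c = '\t'; break;
            case 'v': c = '\v'; break;
            case 'x':                     /* up to 2 hex digits */
                v = 0;
                for (i = 0; i < 2 && isxdigit((unsigned char)*src); i++, src++) {
                    int d = tolower((unsigned char)*src);
                    v = v * 16 + (isdigit(d) ? d - '0' : d - 'a' + 10);
                }
                c = v;
                break;
            default:
                if (c >= '0' && c <= '7') {   /* up to 3 octal digits */
                    v = c - '0';
                    for (i = 1; i < 3 && *src >= '0' && *src <= '7'; i++, src++)
                        v = v * 8 + (*src - '0');
                    c = v & 0xff;             /* keep a single byte */
                }
                /* any other character simply stands for itself */
                break;
            }
        }
        dst[n++] = (unsigned char)c;
    }
    return n;
}
```

A data line consisting of a single backslash decodes to length 0, which is how an empty subject can be passed.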
+     If /P was present on the regex, causing  the  POSIX  wrapper
+     API  to  be  used,  only \B and \Z have any effect, causing
+     REG_NOTBOL and REG_NOTEOL to be passed to regexec()  respec-
+     tively.
+
+     The use of \x{hh...} to represent UTF-8  characters  is  not
+     dependent  on  the use of the /8 modifier on the pattern. It
+     is recognized always. There may be any number of hexadecimal
+     digits  inside  the  braces.  The  result is from one to six
+     bytes, encoded according to the UTF-8 rules.
+
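The "one to six bytes" rule can be illustrated with a short encoder. This is a sketch only: encode_utf8() is a hypothetical helper, not a PCRE or pcretest function, and it assumes the original UTF-8 scheme in which values up to 0x7fffffff are representable.

```c
#include <assert.h>
#include <stddef.h>

/* Encode the value cp (0 .. 0x7fffffff) into buf using the 1-to-6-byte
   UTF-8 scheme. buf must have room for 6 bytes. Returns the number of
   bytes written. */
int encode_utf8(unsigned long cp, unsigned char *buf)
{
    /* Largest value representable in 1, 2, ... 6 bytes. */
    static const unsigned long limit[6] = {
        0x7fUL, 0x7ffUL, 0xffffUL, 0x1fffffUL, 0x3ffffffUL, 0x7fffffffUL
    };
    /* Marker bits of the first byte for each encoded length. */
    static const unsigned char lead[6] = {
        0x00, 0xc0, 0xe0, 0xf0, 0xf8, 0xfc
    };
    int extra, i;

    assert(buf != NULL);
    assert(cp <= 0x7fffffffUL);

    /* Find how many continuation bytes are needed. */
    for (extra = 0; extra < 5; extra++)
        if (cp <= limit[extra]) break;

    /* Fill continuation bytes from the end, six bits at a time. */
    for (i = extra; i > 0; i--) {
        buf[i] = (unsigned char)(0x80 | (cp & 0x3f));
        cp >>= 6;
    }
    buf[0] = (unsigned char)(lead[extra] | cp);
    return extra + 1;
}
```

Under this scheme a data-line escape such as \x{e9} would become the two bytes 0xc3 0xa9.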
 
 
 OUTPUT FROM PCRETEST
---------------------
-
-When a match succeeds, pcretest outputs the list of captured substrings that
-pcre_exec() returns, starting with number 0 for the string that matched the
-whole pattern. Here is an example of an interactive pcretest run.
-
-  $ pcretest
-  PCRE version 2.06 08-Jun-1999
-
-    re> /^abc(\d+)/
-  data> abc123
-   0: abc123
-   1: 123
-  data> xyz
-  No match
-
-If the strings contain any non-printing characters, they are output as \0x
-escapes, or as \x{...} escapes if the /8 modifier was present on the pattern.
-If the pattern has the /+ modifier, then the output for substring 0 is followed
-by the the rest of the subject string, identified by "0+" like this:
-
-    re> /cat/+
-  data> cataract
-   0: cat
-   0+ aract
-
-If the pattern has the /g or /G modifier, the results of successive matching
-attempts are output in sequence, like this:
-
-    re> /\Bi(\w\w)/g
-  data> Mississippi
-   0: iss
-   1: ss
-   0: iss
-   1: ss
-   0: ipp
-   1: pp
-
-"No match" is output only if the first match attempt fails.
-
-If any of \C, \G, or \L are present in a data line that is successfully
-matched, the substrings extracted by the convenience functions are output with
-C, G, or L after the string number instead of a colon. This is in addition to
-the normal full list. The string length (that is, the return from the
-extraction function) is given in parentheses after each string for \C and \G.
-
-Note that while patterns can be continued over several lines (a plain ">"
-prompt is used for continuations), data lines may not. However newlines can be
-included in data by means of the \n escape.
-
-
-COMMAND LINE OPTIONS
---------------------
-
-If the -p option is given to pcretest, it is equivalent to adding /P to each
-regular expression: the POSIX wrapper API is used to call PCRE. None of the
-following flags has any effect in this case.
-
-If the option -d is given to pcretest, it is equivalent to adding /D to each
-regular expression: the internal form is output after compilation.
-
-If the option -i is given to pcretest, it is equivalent to adding /I to each
-regular expression: information about the compiled pattern is given after
-compilation.
-
-If the option -m is given to pcretest, it outputs the size of each compiled
-pattern after it has been compiled. It is equivalent to adding /M to each
-regular expression. For compatibility with earlier versions of pcretest, -s is
-a synonym for -m.
-
-If the -t option is given, each compile, study, and match is run 20000 times
-while being timed, and the resulting time per compile or match is output in
-milliseconds. Do not set -t with -m, because you will then get the size output
-20000 times and the timing will be distorted. If you want to change the number
-of repetitions used for timing, edit the definition of LOOPREPEAT at the top of
-pcretest.c
-
-Philip Hazel <ph10@cam.ac.uk>
-August 2000
+     When a match succeeds, pcretest outputs the list of captured
+     substrings  that pcre_exec() returns, starting with number 0
+     for the string that matched the whole pattern.  Here  is  an
+     example of an interactive pcretest run.
+
+       $ pcretest
+       PCRE version 2.06 08-Jun-1999
+
+         re> /^abc(\d+)/
+       data> abc123
+        0: abc123
+        1: 123
+       data> xyz
+       No match
+
+     If the strings contain any non-printing characters, they are
+     output  as  \xhh escapes,  or  as  \x{...} escapes if the /8
+     modifier was present on the pattern. If the pattern has  the
+     /+  modifier, then the output for substring 0 is followed by
+     the rest of the  subject  string,  identified by "0+",  like
+     this:
+
+         re> /cat/+
+       data> cataract
+        0: cat
+        0+ aract
+
+     If the pattern has the /g or /G  modifier,  the  results  of
+     successive  matching  attempts  are output in sequence, like
+     this:
+
+         re> /\Bi(\w\w)/g
+       data> Mississippi
+        0: iss
+        1: ss
+        0: iss
+        1: ss
+        0: ipp
+        1: pp
+
+     "No match" is output only if the first match attempt fails.
+
+     If any of the sequences \C, \G, or \L are present in a  data
+     line  that is successfully matched, the substrings extracted
+     by the convenience functions are output  with  C,  G,  or  L
+     after the string number instead of a colon. This is in addi-
+     tion to the normal full list. The string  length  (that  is,
+     the  return  from  the  extraction  function)  is  given  in
+     parentheses after each string for \C and \G.
+
+     Note that while patterns can be continued over several lines
+     (a  plain  ">" prompt is used for continuations), data lines
+     may not. However, newlines can be included in data by  means
+     of the \n escape.
+
+
+
+AUTHOR
+     Philip Hazel <ph10@cam.ac.uk>
+     University Computing Service,
+     New Museums Site,
+     Cambridge CB2 3QG, England.
+     Phone: +44 1223 334714
+
+     Last updated: 15 August 2001
+     Copyright (c) 1997-2001 University of Cambridge.