Documentation update

git-svn-id: svn://vcs.exim.org/pcre/code/trunk@456 2f5784b3-3f2a-0410-8824-cb99058d5e15
author: ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2009-10-02 08:53:31 +0000
committer: ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2009-10-02 08:53:31 +0000
commit: 80d5ddd534157a15013c2314a4236bdd8ac0b72f
tree: e9cc83556582cde2b0adf4e2a4d98ae75c35aecf
parent: 5c3493df2827cb70ddc42899df2f9ee30f5e7a7b
13 files changed, 274 insertions, 191 deletions
diff --git a/HACKING b/HACKING
index 1f30d4c..623fe5b 100644
--- a/HACKING
+++ b/HACKING
@@ -67,22 +67,22 @@ many tests of the mode that might slow it down. So I re-factored the compiling
 functions to work this way. This got rid of about 600 lines of source. It
 should make future maintenance and development easier. As this was such a major 
 change, I never released 6.8, instead upping the number to 7.0 (other quite 
-major changes are also present in the 7.0 release).
+major changes were also present in the 7.0 release).
 
-A side effect of this work is that the previous limit of 200 on the nesting
+A side effect of this work was that the previous limit of 200 on the nesting
 depth of parentheses was removed. However, there is a downside: pcre_compile()
 runs more slowly than before (30% or more, depending on the pattern) because it
-is doing a full analysis of the pattern. My hope is that this is not a big
-issue.
+is doing a full analysis of the pattern. My hope was that this would not be a
+big issue, and in the event, nobody has commented on it.
 
 Traditional matching function
 -----------------------------
 
 The "traditional", and original, matching function is called pcre_exec(), and 
 it implements an NFA algorithm, similar to the original Henry Spencer algorithm 
-and the way that Perl works. Not surprising, since it is intended to be as 
-compatible with Perl as possible. This is the function most users of PCRE will 
-use most of the time.
+and the way that Perl works. This is not surprising, since it is intended to be
+as compatible with Perl as possible. This is the function most users of PCRE
+will use most of the time.
 
 Supplementary matching function
 -------------------------------
@@ -119,6 +119,7 @@ quantifiers) are always just two bytes long.
 
 A list of the opcodes follows:
 
+
 Opcodes with no following data
 ------------------------------
 
@@ -150,12 +151,12 @@ These items are all just one byte long
   OP_EXTUNI              match an extended Unicode character 
   OP_ANYNL               match any Unicode newline sequence 
   
-  OP_ACCEPT              )
-  OP_COMMIT              ) 
-  OP_FAIL                ) These are Perl 5.10's "backtracking     
-  OP_PRUNE               ) control verbs".                         
-  OP_SKIP                )
-  OP_THEN                )
+  OP_ACCEPT              ) These are Perl 5.10's "backtracking    
+  OP_COMMIT              ) control verbs". If OP_ACCEPT is inside
+  OP_FAIL                ) capturing parentheses, it may be preceded 
+  OP_PRUNE               ) by one or more OP_CLOSE, followed by a 2-byte 
+  OP_SKIP                ) number, indicating which parentheses must be
+  OP_THEN                ) closed.
   
 
 Repeating single characters
@@ -415,4 +416,4 @@ at compile time, and so does not cause anything to be put into the compiled
 data.
 
 Philip Hazel
-April 2008
+October 2009
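
The HACKING hunk above says that OP_ACCEPT may be preceded by one or more OP_CLOSE items, each followed by a 2-byte number. A minimal sketch of reading such a prefix might look like the following. The opcode values, the function name, and the most-significant-byte-first order of the 2-byte number are assumptions made for illustration; the real definitions are in PCRE's internal headers and are not shown in this commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode values, for illustration only. */
enum { DEMO_OP_CLOSE = 0x70, DEMO_OP_ACCEPT = 0x71 };

/* Reads zero or more (OP_CLOSE, high byte, low byte) triples followed by
   OP_ACCEPT. Stores the parenthesis numbers in groups[] and returns how many
   were found, or -1 if the sequence is truncated, too long for groups[], or
   does not end in OP_ACCEPT. */
int demo_read_accept_prefix(const unsigned char *code, size_t len,
                            int *groups, int max_groups)
{
    size_t i = 0;
    int count = 0;

    assert(code != NULL && groups != NULL);

    while (i < len && code[i] == DEMO_OP_CLOSE) {
        /* Both bytes of the number must be inside the buffer. */
        if (i + 2 >= len || count >= max_groups)
            return -1;
        groups[count++] = (code[i + 1] << 8) | code[i + 2];
        i += 3;
    }

    if (i >= len || code[i] != DEMO_OP_ACCEPT)
        return -1;
    return count;
}
```

The function only walks the prefix; it does not act on the numbers it collects, which in the real matcher indicate which parentheses must be closed.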
diff --git a/doc/pcre.3 b/doc/pcre.3
index 430fbd5..3d6409a 100644
--- a/doc/pcre.3
+++ b/doc/pcre.3
@@ -6,21 +6,20 @@ PCRE - Perl-compatible regular expressions
 .sp
 The PCRE library is a set of functions that implement regular expression
 pattern matching using the same syntax and semantics as Perl, with just a few
-differences. Certain features that appeared in Python and PCRE before they
-appeared in Perl are also available using the Python syntax. There is also some
-support for certain .NET and Oniguruma syntax items, and there is an option for
-requesting some minor changes that give better JavaScript compatibility.
+differences. Some features that appeared in Python and PCRE before they
+appeared in Perl are also available using the Python syntax, there is some
+support for one or two .NET and Oniguruma syntax items, and there is an option
+for requesting some minor changes that give better JavaScript compatibility.
 .P
-The current implementation of PCRE (release 8.xx) corresponds approximately
-with Perl 5.10, including support for UTF-8 encoded strings and Unicode general
-category properties. However, UTF-8 and Unicode support has to be explicitly
-enabled; it is not the default. The Unicode tables correspond to Unicode
-release 5.1.
+The current implementation of PCRE corresponds approximately with Perl 5.10,
+including support for UTF-8 encoded strings and Unicode general category
+properties. However, UTF-8 and Unicode support has to be explicitly enabled; it
+is not the default. The Unicode tables correspond to Unicode release 5.1.
 .P
 In addition to the Perl-compatible matching function, PCRE contains an
-alternative matching function that matches the same compiled patterns in a
-different way. In certain circumstances, the alternative function has some
-advantages. For a discussion of the two matching algorithms, see the
+alternative function that matches the same compiled patterns in a different
+way. In certain circumstances, the alternative function has some advantages.
+For a discussion of the two matching algorithms, see the
 .\" HREF
 \fBpcrematching\fP
 .\"
@@ -66,7 +65,8 @@ available. The features themselves are described in the
 \fBpcrebuild\fP
 .\"
 page. Documentation about building PCRE for various operating systems can be
-found in the \fBREADME\fP file in the source distribution.
+found in the \fBREADME\fP and \fBNON-UNIX-USE\fP files in the source
+distribution.
 .P
 The library contains a number of undocumented internal functions and data
 tables that are used by more than one of the exported external functions, but
@@ -100,12 +100,12 @@ of searching. The sections are as follows:
 .\" JOIN
   pcrepattern       syntax and semantics of supported
                       regular expressions
-  pcresyntax        quick syntax reference
   pcreperform       discussion of performance issues
   pcreposix         the POSIX-compatible C API
   pcreprecompile    details of saving and re-using precompiled patterns
   pcresample        discussion of the pcredemo program
   pcrestack         discussion of stack usage
+  pcresyntax        quick syntax reference
   pcretest          description of the \fBpcretest\fP testing command
 .sp
 In addition, in the "man" and HTML formats, there is a short page for each
@@ -148,8 +148,8 @@ issues, see the
 .\"
 documentation.
 .
-.\" HTML <a name="utf8support"></a>
 .
+.\" HTML <a name="utf8support"></a>
 .
 .SH "UTF-8 AND UNICODE PROPERTY SUPPORT"
 .rs
@@ -167,7 +167,7 @@ the code, and, in addition, you must call
 with the PCRE_UTF8 option flag, or the pattern must start with the sequence
 (*UTF8). When either of these is the case, both the pattern and any subject
 strings that are matched against it are treated as UTF-8 strings instead of
-just strings of bytes.
+strings of 1-byte characters.
 .P
 If you compile PCRE with UTF-8 support, but do not use it at run time, the
 library will be a bit bigger, but the additional run time overhead is limited
@@ -187,6 +187,7 @@ documentation. Only the short names for properties are supported. For example,
 Furthermore, in Perl, many properties may optionally be prefixed by "Is", for
 compatibility with Perl 5.6. PCRE does not support this.
 .
+.
 .\" HTML <a name="utf8strings"></a>
 .
 .SS "Validity of UTF-8 strings"
@@ -292,6 +293,6 @@ two digits 10, at the domain cam.ac.uk.
 .rs
 .sp
 .nf
-Last updated: 01 September 2009
+Last updated: 28 September 2009
 Copyright (c) 1997-2009 University of Cambridge.
 .fi
diff --git a/doc/pcre_compile.3 b/doc/pcre_compile.3
index 48f92f7..e64288a 100644
--- a/doc/pcre_compile.3
+++ b/doc/pcre_compile.3
@@ -52,11 +52,11 @@ The option bits are:
   PCRE_NEWLINE_LF         Set LF as the newline sequence
   PCRE_NO_AUTO_CAPTURE    Disable numbered capturing paren-
                             theses (named ones available)
-  PCRE_UNGREEDY           Invert greediness of quantifiers
-  PCRE_UTF8               Run in UTF-8 mode
   PCRE_NO_UTF8_CHECK      Do not check the pattern for UTF-8
                             validity (only relevant if
                             PCRE_UTF8 is set)
+  PCRE_UNGREEDY           Invert greediness of quantifiers
+  PCRE_UTF8               Run in UTF-8 mode
 .sp
 PCRE must be built with UTF-8 support in order to use PCRE_UTF8 and
 PCRE_NO_UTF8_CHECK.
diff --git a/doc/pcre_compile2.3 b/doc/pcre_compile2.3
index 1e71aff..84dbf19 100644
--- a/doc/pcre_compile2.3
+++ b/doc/pcre_compile2.3
@@ -34,29 +34,33 @@ argument. The arguments are:
 .sp
 The option bits are:
 .sp
-  PCRE_ANCHORED         Force pattern anchoring
-  PCRE_AUTO_CALLOUT     Compile automatic callouts
-  PCRE_CASELESS         Do caseless matching
-  PCRE_DOLLAR_ENDONLY   $ not to match newline at end
-  PCRE_DOTALL           . matches anything including NL
-  PCRE_DUPNAMES         Allow duplicate names for subpatterns
-  PCRE_EXTENDED         Ignore whitespace and # comments
-  PCRE_EXTRA            PCRE extra features
-                          (not much use currently)
-  PCRE_FIRSTLINE        Force matching to be before newline
-  PCRE_MULTILINE        ^ and $ match newlines within data
-  PCRE_NEWLINE_ANY      Recognize any Unicode newline sequence
-  PCRE_NEWLINE_ANYCRLF  Recognize CR, LF, and CRLF as newline sequences
-  PCRE_NEWLINE_CR       Set CR as the newline sequence
-  PCRE_NEWLINE_CRLF     Set CRLF as the newline sequence
-  PCRE_NEWLINE_LF       Set LF as the newline sequence
-  PCRE_NO_AUTO_CAPTURE  Disable numbered capturing paren-
-                          theses (named ones available)
-  PCRE_UNGREEDY         Invert greediness of quantifiers
-  PCRE_UTF8             Run in UTF-8 mode
-  PCRE_NO_UTF8_CHECK    Do not check the pattern for UTF-8
-                          validity (only relevant if
-                          PCRE_UTF8 is set)
+  PCRE_ANCHORED           Force pattern anchoring
+  PCRE_AUTO_CALLOUT       Compile automatic callouts
+  PCRE_BSR_ANYCRLF        \eR matches only CR, LF, or CRLF
+  PCRE_BSR_UNICODE        \eR matches all Unicode line endings
+  PCRE_CASELESS           Do caseless matching
+  PCRE_DOLLAR_ENDONLY     $ not to match newline at end
+  PCRE_DOTALL             . matches anything including NL
+  PCRE_DUPNAMES           Allow duplicate names for subpatterns
+  PCRE_EXTENDED           Ignore whitespace and # comments
+  PCRE_EXTRA              PCRE extra features
+                            (not much use currently)
+  PCRE_FIRSTLINE          Force matching to be before newline
+  PCRE_JAVASCRIPT_COMPAT  JavaScript compatibility
+  PCRE_MULTILINE          ^ and $ match newlines within data
+  PCRE_NEWLINE_ANY        Recognize any Unicode newline sequence
+  PCRE_NEWLINE_ANYCRLF    Recognize CR, LF, and CRLF as newline 
+                            sequences
+  PCRE_NEWLINE_CR         Set CR as the newline sequence
+  PCRE_NEWLINE_CRLF       Set CRLF as the newline sequence
+  PCRE_NEWLINE_LF         Set LF as the newline sequence
+  PCRE_NO_AUTO_CAPTURE    Disable numbered capturing paren-
+                            theses (named ones available)
+  PCRE_NO_UTF8_CHECK      Do not check the pattern for UTF-8
+                            validity (only relevant if
+                            PCRE_UTF8 is set)
+  PCRE_UNGREEDY           Invert greediness of quantifiers
+  PCRE_UTF8               Run in UTF-8 mode
 .sp
 PCRE must be built with UTF-8 support in order to use PCRE_UTF8 and
 PCRE_NO_UTF8_CHECK.
diff --git a/doc/pcreapi.3 b/doc/pcreapi.3
index c406f60..6b586e8 100644
--- a/doc/pcreapi.3
+++ b/doc/pcreapi.3
@@ -395,7 +395,9 @@ avoiding the use of the stack.
 Either of the functions \fBpcre_compile()\fP or \fBpcre_compile2()\fP can be
 called to compile a pattern into an internal form. The only difference between
 the two interfaces is that \fBpcre_compile2()\fP has an additional argument,
-\fIerrorcodeptr\fP, via which a numerical error code can be returned.
+\fIerrorcodeptr\fP, via which a numerical error code can be returned. To avoid 
+too much repetition, we refer just to \fBpcre_compile()\fP below, but the 
+information applies equally to \fBpcre_compile2()\fP.
 .P
 The pattern is a C string terminated by a binary zero, and is passed in the
 \fIpattern\fP argument. A pointer to a single block of memory that is obtained
@@ -412,23 +414,23 @@ argument, which is an address (see below).
 The \fIoptions\fP argument contains various bit settings that affect the
 compilation. It should be zero if no options are required. The available
 options are described below. Some of them (in particular, those that are
-compatible with Perl, but also some others) can also be set and unset from
+compatible with Perl, but some others as well) can also be set and unset from
 within the pattern (see the detailed description in the
 .\" HREF
 \fBpcrepattern\fP
 .\"
 documentation). For those options that can be different in different parts of
-the pattern, the contents of the \fIoptions\fP argument specifies their initial
-settings at the start of compilation and execution. The PCRE_ANCHORED and
-PCRE_NEWLINE_\fIxxx\fP options can be set at the time of matching as well as at
-compile time.
+the pattern, the contents of the \fIoptions\fP argument specify their
+settings at the start of compilation and execution. The PCRE_ANCHORED, 
+PCRE_BSR_\fIxxx\fP, and PCRE_NEWLINE_\fIxxx\fP options can be set at the time
+of matching as well as at compile time.
 .P
 If \fIerrptr\fP is NULL, \fBpcre_compile()\fP returns NULL immediately.
 Otherwise, if compilation of a pattern fails, \fBpcre_compile()\fP returns
 NULL, and sets the variable pointed to by \fIerrptr\fP to point to a textual
 error message. This is a static string that is part of the library. You must
 not try to free it. The byte offset from the start of the pattern to the
-character that was being processes when the error was discovered is placed in
+character that was being processed when the error was discovered is placed in
 the variable pointed to by \fIerroffset\fP, which must not be NULL. If it is,
 an immediate error is given. Some errors are not detected until checks are
 carried out when the whole pattern has been scanned; in this case the offset is
@@ -984,7 +986,7 @@ is -1.
 .sp
 If the pattern was studied and a minimum length for matching subject strings
 was computed, its value is returned. Otherwise the returned value is -1. The
-value is a number of characters, not bytes (there may be a difference in UTF-8
+value is a number of characters, not bytes (this may be relevant in UTF-8
 mode). The fourth argument should point to an \fBint\fP variable. A
 non-negative value is a lower bound to the length of any matching string. There
 may not be any strings of that length that do actually match, but every string
@@ -1209,7 +1211,7 @@ the block by setting the other fields and their corresponding flag bits.
 The \fImatch_limit\fP field provides a means of preventing PCRE from using up a
 vast amount of resources when running patterns that are not going to match,
 but which have a very large number of possibilities in their search trees. The
-classic example is the use of nested unlimited repeats.
+classic example is a pattern that uses nested unlimited repeats.
 .P
 Internally, PCRE uses a function called \fBmatch()\fP which it calls repeatedly
 (sometimes recursively). The limit set by \fImatch_limit\fP is imposed on the
@@ -1508,7 +1510,7 @@ the \fIovector\fP is not big enough to remember the related substrings, PCRE
 has to get additional memory for use during matching. Thus it is usually
 advisable to supply an \fIovector\fP.
 .P
-The \fBpcre_info()\fP function can be used to find out how many capturing
+The \fBpcre_fullinfo()\fP function can be used to find out how many capturing
 subpatterns there are in a compiled pattern. The smallest size for
 \fIovector\fP that will allow for \fIn\fP captured substrings, in addition to
 the offsets of the substring matched by the whole pattern, is (\fIn\fP+1)*3.
@@ -2043,6 +2045,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 26 September 2009
+Last updated: 29 September 2009
 Copyright (c) 1997-2009 University of Cambridge.
 .fi
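
The pcreapi.3 hunk above gives (n+1)*3 as the smallest ovector size that allows for n captured substrings plus the whole-pattern match. As a small worked sketch of that arithmetic (the helper name is hypothetical; in a real program n would presumably come from pcre_fullinfo() with PCRE_INFO_CAPTURECOUNT):

```c
#include <assert.h>

/* Smallest ovector length, in ints, for a pattern with capture_count
   capturing subpatterns: (n + 1) * 3, as stated in the documentation. */
int demo_ovector_size(int capture_count)
{
    assert(capture_count >= 0);
    return (capture_count + 1) * 3;
}
```

For a pattern with two capturing subpatterns this gives 9, and for a pattern with none it gives 3.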
diff --git a/doc/pcrebuild.3 b/doc/pcrebuild.3
index 4801263..dd970dc 100644
--- a/doc/pcrebuild.3
+++ b/doc/pcrebuild.3
@@ -1,6 +1,8 @@
 .TH PCREBUILD 3
 .SH NAME
 PCRE - Perl-compatible regular expressions
+.
+.
 .SH "PCRE BUILD-TIME OPTIONS"
 .rs
 .sp
@@ -29,6 +31,7 @@ The following sections include descriptions of options whose names begin with
 --enable and --disable always come in pairs, so the complementary option always
 exists as well, but as it specifies the default, it is not described.
 .
+.
 .SH "C++ SUPPORT"
 .rs
 .sp
@@ -40,6 +43,7 @@ for PCRE. You can disable this by adding
 .sp
 to the \fBconfigure\fP command.
 .
+.
 .SH "UTF-8 SUPPORT"
 .rs
 .sp
@@ -50,7 +54,7 @@ To build PCRE with support for UTF-8 Unicode character strings, add
 to the \fBconfigure\fP command. Of itself, this does not make PCRE treat
 strings as UTF-8. As well as compiling PCRE with this option, you also have
 to set the PCRE_UTF8 option when you call the \fBpcre_compile()\fP
-function.
+or \fBpcre_compile2()\fP functions.
 .P
 If you set --enable-utf8 when compiling in an EBCDIC environment, PCRE expects
 its input to be either ASCII or UTF-8 (depending on the runtime option). It is
@@ -58,6 +62,7 @@ not possible to support both EBCDIC and UTF-8 codes in the same version of the
 library. Consequently, --enable-utf8 and --enable-ebcdic are mutually
 exclusive.
 .
+.
 .SH "UNICODE CHARACTER PROPERTY SUPPORT"
 .rs
 .sp
@@ -80,6 +85,7 @@ supported. Details are given in the
 .\"
 documentation.
 .
+.
 .SH "CODE VALUE OF NEWLINE"
 .rs
 .sp
@@ -112,6 +118,7 @@ Whatever line ending convention is selected when PCRE is built can be
 overridden when the library functions are called. At build time it is
 conventional to use the standard for your operating system.
 .
+.
 .SH "WHAT \eR MATCHES"
 .rs
 .sp
@@ -124,6 +131,7 @@ the default is changed so that \eR matches only CR, LF, or CRLF. Whatever is
 selected when PCRE is built can be overridden when the library functions are
 called.
 .
+.
 .SH "BUILDING SHARED AND STATIC LIBRARIES"
 .rs
 .sp
@@ -135,6 +143,7 @@ Unix libraries by default. You can suppress one of these by adding one of
 .sp
 to the \fBconfigure\fP command, as required.
 .
+.
 .SH "POSIX MALLOC USAGE"
 .rs
 .sp
@@ -154,6 +163,7 @@ such as
 .sp
 to the \fBconfigure\fP command.
 .
+.
 .SH "HANDLING VERY LARGE PATTERNS"
 .rs
 .sp
@@ -162,8 +172,8 @@ another (for example, from an opening parenthesis to an alternation
 metacharacter). By default, two-byte values are used for these offsets, leading
 to a maximum size for a compiled pattern of around 64K. This is sufficient to
 handle all but the most gigantic patterns. Nevertheless, some people do want to
-process enormous patterns, so it is possible to compile PCRE to use three-byte
-or four-byte offsets by adding a setting such as
+process truly enormous patterns, so it is possible to compile PCRE to use
+three-byte or four-byte offsets by adding a setting such as
 .sp
   --with-link-size=3
 .sp
@@ -171,6 +181,7 @@ to the \fBconfigure\fP command. The value given must be 2, 3, or 4. Using
 longer offsets slows down the operation of PCRE because it has to load
 additional bytes when handling them.
 .
+.
 .SH "AVOIDING EXCESSIVE STACK USAGE"
 .rs
 .sp
@@ -194,7 +205,7 @@ to the \fBconfigure\fP command. With this configuration, PCRE will use the
 \fBpcre_stack_malloc\fP and \fBpcre_stack_free\fP variables to call memory
 management functions. By default these point to \fBmalloc()\fP and
 \fBfree()\fP, but you can replace the pointers so that your own functions are
-used.
+used instead.
 .P
 Separate functions are provided rather than using \fBpcre_malloc\fP and
 \fBpcre_free\fP because the usage is very predictable: the block sizes
@@ -202,7 +213,8 @@ requested are always the same, and the blocks are always freed in reverse
 order. A calling program might be able to implement optimized functions that
 perform better than \fBmalloc()\fP and \fBfree()\fP. PCRE runs noticeably more
 slowly when built in this way. This option affects only the \fBpcre_exec()\fP
-function; it is not relevant for the the \fBpcre_dfa_exec()\fP function.
+function; it is not relevant for \fBpcre_dfa_exec()\fP.
+.
 .
 .SH "LIMITING PCRE RESOURCE USAGE"
 .rs
@@ -235,6 +247,7 @@ constraints. However, you can set a lower limit by adding, for example,
 .sp
 to the \fBconfigure\fP command. This value can also be overridden at run time.
 .
+.
 .SH "CREATING CHARACTER TABLES AT BUILD TIME"
 .rs
 .sp
@@ -253,6 +266,7 @@ compiling, because \fBdftables\fP is run on the local host. If you need to
 create alternative tables when cross compiling, you will have to do so "by
 hand".)
 .
+.
 .SH "USING EBCDIC CODE"
 .rs
 .sp
@@ -268,6 +282,7 @@ to the \fBconfigure\fP command. This setting implies
 an EBCDIC environment (for example, an IBM mainframe operating system). The
 --enable-ebcdic option is incompatible with --enable-utf8.
 .
+.
 .SH "PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT"
 .rs
 .sp
@@ -282,6 +297,7 @@ to the \fBconfigure\fP command. These options naturally require that the
 relevant libraries are installed on your system. Configuration will fail if
 they are not.
 .
+.
 .SH "PCRETEST OPTION FOR LIBREADLINE SUPPORT"
 .rs
 .sp
@@ -292,7 +308,7 @@ If you add
 to the \fBconfigure\fP command, \fBpcretest\fP is linked with the
 \fBlibreadline\fP library, and when its input is from a terminal, it reads it
 using the \fBreadline()\fP function. This provides line-editing and history
-facilities. Note that \fBlibreadline\fP is GPL-licenced, so if you distribute a
+facilities. Note that \fBlibreadline\fP is GPL-licensed, so if you distribute a
 binary of \fBpcretest\fP linked in this way, there may be licensing issues.
 .P
 Setting this option causes the \fB-lreadline\fP option to be added to the
@@ -334,6 +350,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 06 September 2009
+Last updated: 29 September 2009
 Copyright (c) 1997-2009 University of Cambridge.
 .fi
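
The "around 64K" figure in the pcrebuild.3 hunk above follows from the width of the offsets. As a rough sketch of the arithmetic only (the helper name is hypothetical, and the library may well impose a lower practical limit than the raw range, particularly for the larger link sizes):

```c
#include <assert.h>

/* Number of distinct values an offset of link_size bytes can hold.
   This is an upper bound derived from the field width, not a statement
   of the limit PCRE actually enforces. */
unsigned long long demo_offset_range(int link_size)
{
    assert(link_size >= 2 && link_size <= 4);
    return 1ULL << (8 * link_size);
}
```

With --with-link-size=2 the range is 65536 (64K); with --with-link-size=3 it is 16777216 (16M).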
diff --git a/doc/pcrecallout.3 b/doc/pcrecallout.3
index abdbaed..ad8a211 100644
--- a/doc/pcrecallout.3
+++ b/doc/pcrecallout.3
@@ -19,9 +19,10 @@ For example, this pattern has two callout points:
 .sp
   (?C1)abc(?C2)def
 .sp
-If the PCRE_AUTO_CALLOUT option bit is set when \fBpcre_compile()\fP is called,
-PCRE automatically inserts callouts, all with number 255, before each item in
-the pattern. For example, if PCRE_AUTO_CALLOUT is used with the pattern
+If the PCRE_AUTO_CALLOUT option bit is set when \fBpcre_compile()\fP or 
+\fBpcre_compile2()\fP is called, PCRE automatically inserts callouts, all with
+number 255, before each item in the pattern. For example, if PCRE_AUTO_CALLOUT
+is used with the pattern
 .sp
   A(\ed{2}|--)
 .sp
@@ -54,6 +55,11 @@ string is "abyz", the lack of "d" means that matching doesn't ever start, and
 the callout is never reached. However, with "abyd", though the result is still
 no match, the callout is obeyed.
 .P
+If the pattern is studied, PCRE knows the minimum length of a matching string,
+and will immediately give a "no match" return without actually running a match
+if the subject is not long enough, or, for unanchored patterns, if it has
+been scanned far enough.
+.P
 You can disable these optimizations by passing the PCRE_NO_START_OPTIMIZE
 option to \fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP. This slows down the
 matching process, but does ensure that callouts such as the example above are
@@ -155,7 +161,7 @@ The external callout function returns an integer to PCRE. If the value is zero,
 matching proceeds as normal. If the value is greater than zero, matching fails
 at the current point, but the testing of other matching possibilities goes
 ahead, just as if a lookahead assertion had failed. If the value is less than
-zero, the match is abandoned, and \fBpcre_exec()\fP (or \fBpcre_dfa_exec()\fP)
+zero, the match is abandoned, and \fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP
 returns the negative value.
 .P
 Negative values should normally be chosen from the set of PCRE_ERROR_xxx
@@ -178,6 +184,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 15 March 2009
+Last updated: 29 September 2009
 Copyright (c) 1997-2009 University of Cambridge.
 .fi
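
The pcrecallout.3 text above describes three outcomes for the integer that a callout function returns. A minimal sketch of that three-way rule, using hypothetical names that are not part of the PCRE API:

```c
#include <assert.h>

/* Hypothetical labels for the three documented outcomes. */
enum demo_callout_action {
    DEMO_CONTINUE,   /* zero: matching proceeds as normal */
    DEMO_FAIL_HERE,  /* positive: fail at this point, try other possibilities */
    DEMO_ABANDON     /* negative: abandon the match, return the value */
};

/* Classifies a callout return value as the documentation describes. */
enum demo_callout_action demo_classify_callout_return(int rc)
{
    if (rc == 0)
        return DEMO_CONTINUE;
    if (rc > 0)
        return DEMO_FAIL_HERE;
    return DEMO_ABANDON;
}
```

In the abandon case the negative value itself becomes the return value of pcre_exec() or pcre_dfa_exec(), which is why the documentation suggests choosing it from the PCRE_ERROR_xxx set.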
diff --git a/doc/pcrecompat.3 b/doc/pcrecompat.3
index f32b071..2028c52 100644
--- a/doc/pcrecompat.3
+++ b/doc/pcrecompat.3
@@ -5,9 +5,8 @@ PCRE - Perl-compatible regular expressions
 .rs
 .sp
 This document describes the differences in the ways that PCRE and Perl handle
-regular expressions. The differences described here are mainly with respect to
-Perl 5.8, though PCRE versions 7.0 and later contain some features that are
-in Perl 5.10.
+regular expressions. The differences described here are with respect to Perl
+5.10.
 .P
 1. PCRE has only a subset of Perl's UTF-8 and Unicode support. Details of what
 it does have are given in the
@@ -86,7 +85,7 @@ section on recursion differences from Perl
 .\"
 in the
 .\" HREF
-\fBpcrecompat\fP
+\fBpcrepattern\fP
 .\"
 page.
 .P
@@ -98,14 +97,30 @@ the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE it is set to "b".
 (*COMMIT), (*PRUNE), (*SKIP), and (*THEN), but only in the forms without an
 argument. PCRE does not support (*MARK).
 .P
-12. PCRE provides some extensions to the Perl regular expression facilities.
-Perl 5.10 will include new features that are not in earlier versions, some of
-which (such as named parentheses) have been in PCRE for some time. This list is
-with respect to Perl 5.10:
+12. PCRE's handling of duplicate subpattern numbers and duplicate subpattern 
+names is not as general as Perl's. This is a consequence of the fact that PCRE 
+works internally just with numbers, using an external table to translate 
+between numbers and names. The following are some specific differences:
+.sp
+(a) After matching a pattern such as (?|(?<a>A)|(?<b>B)) where the two capturing 
+parentheses have the same number but different names, it is not possible to 
+distinguish which parentheses matched, because both names map to capturing
+subpattern number 1.
+.sp
+(b) A condition test for a subpattern with a name that is duplicated gives
+unpredictable results. For example, when the pattern
+(?:(?<a>A)|(?<a>B))(?('a')...|...) is compiled (the PCRE_DUPNAMES option is
+required), the condition test (?('a') is set to test whether subpattern 1 has
+matched, ignoring subpattern 2, even though it has the same name.
+.P
+13. PCRE provides some extensions to the Perl regular expression facilities.
+Perl 5.10 includes new features that are not in earlier versions of Perl, some
+of which (such as named parentheses) have been in PCRE for some time. This list
+is with respect to Perl 5.10:
 .sp
-(a) Although lookbehind assertions must match fixed length strings, each
-alternative branch of a lookbehind assertion can match a different length of
-string. Perl requires them all to have the same length.
+(a) Although lookbehind assertions in PCRE must match fixed length strings,
+each alternative branch of a lookbehind assertion can match a different length
+of string. Perl requires them all to have the same length.
 .sp
 (b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the $
 meta-character matches only at the very end of the string.
@@ -155,6 +170,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 18 September 2009
+Last updated: 29 September 2009
 Copyright (c) 1997-2009 University of Cambridge.
 .fi
diff --git a/doc/pcrematching.3 b/doc/pcrematching.3
index a3b8363..2e2abd9 100644
--- a/doc/pcrematching.3
+++ b/doc/pcrematching.3
@@ -74,13 +74,17 @@ this is a kind of "DFA algorithm", though it is not implemented as a
 traditional finite state machine (it keeps multiple states active
 simultaneously).
 .P
+Although the general principle of this matching algorithm is that it scans the 
+subject string only once, without backtracking, there is one exception: when a 
+lookaround assertion is encountered, the characters following or preceding the 
+current point have to be independently inspected.
+.P
 The scan continues until either the end of the subject is reached, or there are
 no more unterminated paths. At this point, terminated paths represent the
 different matching possibilities (if there are none, the match has failed).
 Thus, if there is more than one possible match, this algorithm finds all of
-them, and in particular, it finds the longest. In PCRE, there is an option to
-stop the algorithm after the first match (which is necessarily the shortest)
-has been found.
+them, and in particular, it finds the longest. There is an option to stop the
+algorithm after the first match (which is necessarily the shortest) is found.
 .P
 Note that all the matches that are found start at the same point in the
 subject. If the pattern
@@ -92,11 +96,6 @@ the three strings "cat", "cater", and "caterpillar" that start at the fourth
 character of the subject. The algorithm does not automatically move on to find
 matches that start at later positions.
 .P
-Although the general principle of this matching algorithm is that it scans the 
-subject string only once, without backtracking, there is one exception: when a 
-lookbehind assertion is encountered, the preceding characters have to be
-re-inspected.
-.P
 There are a number of features of PCRE regular expressions that are not
 supported by the alternative matching algorithm. They are as follows:
 .P
@@ -152,7 +151,12 @@ callouts.
 2. Because the alternative algorithm scans the subject string just once, and
 never needs to backtrack, it is possible to pass very long subject strings to
 the matching function in several pieces, checking for partial matching each
-time.
+time. The
+.\" HREF                                                                
+\fBpcrepartial\fP
+.\"                                                           
+documentation gives details of partial matching.
+.
 .
 .SH "DISADVANTAGES OF THE ALTERNATIVE ALGORITHM"
 .rs
@@ -183,6 +187,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 05 September 2009
+Last updated: 29 September 2009
 Copyright (c) 1997-2009 University of Cambridge.
 .fi
diff --git a/doc/pcrepartial.3 b/doc/pcrepartial.3
index 0e9cc47..05487e1 100644
--- a/doc/pcrepartial.3
+++ b/doc/pcrepartial.3
@@ -32,10 +32,13 @@ whether or not a partial match is preferred to an alternative complete match,
 though the details differ between the two matching functions. If both options
 are set, PCRE_PARTIAL_HARD takes precedence.
 .P
-Setting a partial matching option disables one of PCRE's optimizations. PCRE
+Setting a partial matching option disables two of PCRE's optimizations. PCRE
 remembers the last literal byte in a pattern, and abandons matching immediately
 if such a byte is not present in the subject string. This optimization cannot
-be used for a subject string that might match only partially.
+be used for a subject string that might match only partially. If the pattern 
+was studied, PCRE knows the minimum length of a matching string, and does not 
+bother to run the matching function on shorter strings. This optimization is 
+also disabled for partial matching.
 .
 .
 .SH "PARTIAL MATCHING USING pcre_exec()"
@@ -53,7 +56,7 @@ instead of PCRE_ERROR_NOMATCH. If there are at least two slots in the offsets
 vector, the first of them is set to the offset of the earliest character that
 was inspected when the partial match was found. For convenience, the second
 offset points to the end of the string so that a substring can easily be
-extracted.
+identified.
 .P
 For the majority of patterns, the first offset identifies the start of the
 partially matched string. However, for patterns that contain lookbehind
@@ -358,6 +361,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 05 September 2009
+Last updated: 29 September 2009
 Copyright (c) 1997-2009 University of Cambridge.
 .fi
diff --git a/doc/pcrepattern.3 b/doc/pcrepattern.3
index 0b26453..34a5373 100644
--- a/doc/pcrepattern.3
+++ b/doc/pcrepattern.3
@@ -21,10 +21,10 @@ published by O'Reilly, covers regular expressions in great detail. This
 description of PCRE's regular expressions is intended as reference material.
 .P
 The original operation of PCRE was on strings of one-byte characters. However,
-there is now also support for UTF-8 character strings. To use this, you must
-build PCRE to include UTF-8 support, and then call \fBpcre_compile()\fP with
-the PCRE_UTF8 option. There is also a special sequence that can be given at the
-start of a pattern:
+there is now also support for UTF-8 character strings. To use this, 
+PCRE must be built to include UTF-8 support, and you must call
+\fBpcre_compile()\fP or \fBpcre_compile2()\fP with the PCRE_UTF8 option. There
+is also a special sequence that can be given at the start of a pattern:
 .sp
   (*UTF8)
 .sp
@@ -83,8 +83,9 @@ string with one of the following five sequences:
   (*ANYCRLF)   any of the three above
   (*ANY)       all Unicode newline sequences
 .sp
-These override the default and the options given to \fBpcre_compile()\fP. For
-example, on a Unix system where LF is the default newline sequence, the pattern
+These override the default and the options given to \fBpcre_compile()\fP or 
+\fBpcre_compile2()\fP. For example, on a Unix system where LF is the default
+newline sequence, the pattern
 .sp
   (*CR)a.b
 .sp
@@ -206,9 +207,8 @@ The \eQ...\eE sequence is recognized both inside and outside character classes.
 A second use of backslash provides a way of encoding non-printing characters
 in patterns in a visible manner. There is no restriction on the appearance of
 non-printing characters, apart from the binary zero that terminates a pattern,
-but when a pattern is being prepared by text editing, it is usually easier to
-use one of the following escape sequences than the binary character it
-represents:
+but when a pattern is being prepared by text editing, it is often easier to use
+one of the following escape sequences than the binary character it represents:
 .sp
   \ea        alarm, that is, the BEL character (hex 07)
   \ecx       "control-x", where x is any character
@@ -468,12 +468,13 @@ one of the following sequences:
   (*BSR_ANYCRLF)   CR, LF, or CRLF only
   (*BSR_UNICODE)   any Unicode newline sequence
 .sp
-These override the default and the options given to \fBpcre_compile()\fP, but
-they can be overridden by options given to \fBpcre_exec()\fP. Note that these
-special settings, which are not Perl-compatible, are recognized only at the
-very start of a pattern, and that they must be in upper case. If more than one
-of them is present, the last one is used. They can be combined with a change of
-newline convention, for example, a pattern can start with:
+These override the default and the options given to \fBpcre_compile()\fP or 
+\fBpcre_compile2()\fP, but they can be overridden by options given to
+\fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP. Note that these special settings,
+which are not Perl-compatible, are recognized only at the very start of a
+pattern, and that they must be in upper case. If more than one of them is
+present, the last one is used. They can be combined with a change of newline
+convention, for example, a pattern can start with:
 .sp
   (*ANY)(*BSR_ANYCRLF)
 .sp
@@ -740,7 +741,10 @@ different meaning, namely the backspace character, inside a character class).
 A word boundary is a position in the subject string where the current character
 and the previous character do not both match \ew or \eW (i.e. one matches
 \ew and the other matches \eW), or the start or end of the string if the
-first or last character matches \ew, respectively.
+first or last character matches \ew, respectively. Neither PCRE nor Perl has a 
+separate "start of word" or "end of word" metasequence. However, whatever 
+follows \eb normally determines which it is. For example, the fragment 
+\eba matches "a" at the start of a word.
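The point about \eb can be tried with a small sketch. It uses Python's standard `re` module as a stand-in, on the assumption that its \b behaves like PCRE's for plain ASCII text; the helper names `word_starts` and `word_ends` are hypothetical, and the letter passed in is assumed to be a word character.

```python
import re


def word_starts(text, letter):
    """Offsets where `letter` begins a word (the boundary precedes it)."""
    return [m.start() for m in re.finditer(r"\b" + re.escape(letter), text)]


def word_ends(text, letter):
    """Offsets where `letter` ends a word (the boundary follows it)."""
    return [m.start() for m in re.finditer(re.escape(letter) + r"\b", text)]


print(word_starts("a banana and apple", "a"))  # [0, 9, 13]
print(word_ends("a banana and apple", "a"))    # [0, 7]
```

Placing the letter after the boundary selects word starts, and placing it before the boundary selects word ends, which is why no separate metasequence is needed.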
 .P
 The \eA, \eZ, and \ez assertions differ from the traditional circumflex and
 dollar (described in the next section) in that they only ever match at the very
@@ -872,14 +876,15 @@ the lookbehind.
 .rs
 .sp
 An opening square bracket introduces a character class, terminated by a closing
-square bracket. A closing square bracket on its own is not special. If a
-closing square bracket is required as a member of the class, it should be the
-first data character in the class (after an initial circumflex, if present) or
-escaped with a backslash.
+square bracket. A closing square bracket on its own is not special by default. 
+However, if the PCRE_JAVASCRIPT_COMPAT option is set, a lone closing square 
+bracket causes a compile-time error. If a closing square bracket is required as
+a member of the class, it should be the first data character in the class
+(after an initial circumflex, if present) or escaped with a backslash.
 .P
 A character class matches a single character in the subject. In UTF-8 mode, the
-character may occupy more than one byte. A matched character must be in the set
-of characters defined by the class, unless the first character in the class
+character may be more than one byte long. A matched character must be in the
+set of characters defined by the class, unless the first character in the class
 definition is a circumflex, in which case the subject character must not be in
 the set defined by the class. If a circumflex is actually required as a member
 of the class, ensure it is not the first character, or escape it with a
@@ -889,7 +894,7 @@ For example, the character class [aeiou] matches any lower case vowel, while
 [^aeiou] matches any character that is not a lower case vowel. Note that a
 circumflex is just a convenient notation for specifying the characters that
 are in the class by enumerating those that are not. A class that starts with a
-circumflex is not an assertion: it still consumes a character from the subject
+circumflex is not an assertion; it still consumes a character from the subject
 string, and therefore it fails if the current pointer is at the end of the
 string.
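A short sketch of the difference between a negated class and an assertion, using Python's standard `re` module as a stand-in (an assumption: for this simple case it behaves as the text describes for PCRE). The helper name `span_or_none` is hypothetical.

```python
import re


def span_or_none(pattern, subject):
    """Return the (start, end) span of the first match, or None."""
    m = re.search(pattern, subject)
    return m.span() if m else None


# A negated class must consume a character, so it fails at the end of the string.
print(span_or_none(r"q[^u]", "Iraq"))      # None
print(span_or_none(r"q[^u]", "Iraq is"))   # (3, 5): "q" plus the space
# A negative lookahead is an assertion and consumes nothing.
print(span_or_none(r"q(?!u)", "Iraq"))     # (3, 4)
```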
 .P
@@ -903,9 +908,9 @@ caseful version would. In UTF-8 mode, PCRE always understands the concept of
 case for characters whose values are less than 128, so caseless matching is
 always possible. For characters with higher values, the concept of case is
 supported if PCRE is compiled with Unicode property support, but not otherwise.
-If you want to use caseless matching for characters 128 and above, you must
-ensure that PCRE is compiled with Unicode property support as well as with
-UTF-8 support.
+If you want to use caseless matching in UTF-8 mode for characters 128 and above,
+you must ensure that PCRE is compiled with Unicode property support as well as
+with UTF-8 support.
 .P
 Characters that might indicate line breaks are never treated in any special way
 when matching character classes, whatever line-ending sequence is in use, and
@@ -1132,6 +1137,7 @@ is reached, an option setting in one branch does affect subsequent branches, so
 the above patterns match "SUNDAY" as well as "Saturday".
 .
 .
+.\" HTML <a name="dupsubpatternnumber"></a>
 .SH "DUPLICATE SUBPATTERN NUMBERS"
 .rs
 .sp
@@ -1157,10 +1163,20 @@ stored.
   / ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
   # 1            2         2  3        2     3     4
 .sp
-A backreference or a recursive call to a numbered subpattern always refers to
-the first one in the pattern with the given number.
+A backreference to a numbered subpattern uses the most recent value that is set 
+for that number by any subpattern. The following pattern matches "abcabc" or
+"defdef":
+.sp
+  /(?|(abc)|(def))\1/
+.sp
+In contrast, a recursive or "subroutine" call to a numbered subpattern always
+refers to the first one in the pattern with the given number. The following 
+pattern matches "abcabc" or "defabc":
+.sp
+  /(?|(abc)|(def))(?1)/
+.sp
 .P
-An alternative approach to using this "branch reset" feature is to use
+An alternative approach to using the "branch reset" feature is to use
 duplicate named subpatterns, as described in the next section.
 .
 .
@@ -1247,6 +1263,7 @@ items:
   a character class
   a back reference (see next section)
   a parenthesized subpattern (unless it is an assertion)
+  a recursive or "subroutine" call to a subpattern 
 .sp
 The general repetition quantifier specifies a minimum and maximum number of
 permitted matches, by giving the two numbers in curly brackets (braces),
@@ -1568,16 +1585,19 @@ after the reference.
 .P
 There may be more than one back reference to the same subpattern. If a
 subpattern has not actually been used in a particular match, any back
-references to it always fail. For example, the pattern
+references to it always fail by default. For example, the pattern
 .sp
   (a|(bc))\e2
 .sp
-always fails if it starts to match "a" rather than "bc". Because there may be
-many capturing parentheses in a pattern, all digits following the backslash are
-taken as part of a potential back reference number. If the pattern continues
-with a digit character, some delimiter must be used to terminate the back
-reference. If the PCRE_EXTENDED option is set, this can be whitespace.
-Otherwise an empty comment (see
+always fails if it starts to match "a" rather than "bc". However, if the 
+PCRE_JAVASCRIPT_COMPAT option is set at compile time, a back reference to an 
+unset value matches an empty string.
+.P
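The default behaviour for a back reference to an unset group can be sketched with Python's standard `re` module, which is assumed here to behave like PCRE's default in this respect. It has no equivalent of PCRE_JAVASCRIPT_COMPAT, so only the default case is shown; the helper name `matches` is hypothetical.

```python
import re

# Group 2 is set only when the "bc" alternative is taken.
PATTERN = re.compile(r"(a|(bc))\2")


def matches(subject):
    """True if PATTERN matches anywhere in the subject."""
    return PATTERN.search(subject) is not None


print(matches("bcbc"))  # True: group 2 is "bc" and \2 repeats it
print(matches("aa"))    # False: group 2 is unset, so \2 fails
```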
+Because there may be many capturing parentheses in a pattern, all digits
+following a backslash are taken as part of a potential back reference number.
+If the pattern continues with a digit character, some delimiter must be used to
+terminate the back reference. If the PCRE_EXTENDED option is set, this can be
+whitespace. Otherwise, the \eg{ syntax or an empty comment (see
 .\" HTML <a href="#comments">
 .\" </a>
 "Comments"
@@ -1650,6 +1670,8 @@ lookbehind assertion is needed to achieve the other effect.
 If you want to force a matching failure at some point in a pattern, the most
 convenient way to do it is with (?!) because an empty string always matches, so
 an assertion that requires there not to be an empty string must always fail.
+The Perl 5.10 backtracking control verb (*FAIL) or (*F) is essentially a
+synonym for (?!).
 .
 .
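A minimal sketch of forcing a failure with (?!), using Python's standard `re` module as a stand-in. That module accepts the empty negative lookahead but is not assumed to support the (*FAIL) verb, so only (?!) is shown; the helper name `first_match` is hypothetical.

```python
import re


def first_match(pattern, subject):
    """Return the text of the first match, or None if there is none."""
    m = re.search(pattern, subject)
    return m.group() if m else None


# Every attempt that reaches (?!) fails, so the first pattern never matches.
print(first_match(r"a(?!)", "aaa"))    # None
# The failure is local to its branch: the second alternative can still match.
print(first_match(r"a(?!)|b", "ab"))   # b
```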
 .\" HTML <a name="lookbehind"></a>
@@ -1716,8 +1738,8 @@ Recursion,
 however, is not supported.
 .P
 Possessive quantifiers can be used in conjunction with lookbehind assertions to
-specify efficient matching at the end of the subject string. Consider a simple
-pattern such as
+specify efficient matching of fixed-length strings at the end of subject
+strings. Consider a simple pattern such as
 .sp
   abcd$
 .sp
@@ -1781,8 +1803,8 @@ characters that are not "999".
 .sp
 It is possible to cause the matching process to obey a subpattern
 conditionally or to choose between two alternative subpatterns, depending on
-the result of an assertion, or whether a previous capturing subpattern matched
-or not. The two possible forms of conditional subpattern are
+the result of an assertion, or whether a specific capturing subpattern has 
+already been matched. The two possible forms of conditional subpattern are:
 .sp
   (?(condition)yes-pattern)
   (?(condition)yes-pattern|no-pattern)
@@ -1798,12 +1820,20 @@ recursion, a pseudo-condition called DEFINE, and assertions.
 .rs
 .sp
 If the text between the parentheses consists of a sequence of digits, the
-condition is true if the capturing subpattern of that number has previously
-matched. An alternative notation is to precede the digits with a plus or minus
-sign. In this case, the subpattern number is relative rather than absolute.
-The most recently opened parentheses can be referenced by (?(-1), the next most
-recent by (?(-2), and so on. In looping constructs it can also make sense to
-refer to subsequent groups with constructs such as (?(+2).
+condition is true if a capturing subpattern of that number has previously
+matched. If there is more than one capturing subpattern with the same number 
+(see the earlier 
+.\"
+.\" HTML <a href="#dupsubpatternnumber">
+.\" </a>
+section about duplicate subpattern numbers),
+.\"
+the condition is true if any of them have been set. An alternative notation is
+to precede the digits with a plus or minus sign. In this case, the subpattern
+number is relative rather than absolute. The most recently opened parentheses
+can be referenced by (?(-1), the next most recent by (?(-2), and so on. In
+looping constructs it can also make sense to refer to subsequent groups with
+constructs such as (?(+2).
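A condition on an absolute group number can be tried with Python's standard `re` module, used here as a stand-in. That module is not assumed to support relative forms such as (?(-1) or duplicate group numbers, so the sketch covers only the simple numbered case. The pattern and the helper name `balanced_or_bare` are illustrative choices.

```python
import re

# Group 1 is an optional opening parenthesis. The condition (?(1)...) then
# requires a closing parenthesis only if group 1 was set.
PATTERN = re.compile(r"(\()?[^()]+(?(1)\))")


def balanced_or_bare(subject):
    """True if the whole subject is either "text" or "(text)"."""
    return PATTERN.fullmatch(subject) is not None


for s in ["(abc)", "abc", "(abc", "abc)"]:
    print(s, balanced_or_bare(s))
```

The first two subjects match and the last two do not, because the closing parenthesis is demanded exactly when the opening one was matched.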
 .P
 Consider the following pattern, which contains non-significant white space to
 make it more readable (assume the PCRE_EXTENDED option) and to divide it into
@@ -1855,7 +1885,7 @@ letter R, for example:
 .sp
   (?(R3)...) or (?(R&name)...)
 .sp
-the condition is true if the most recent recursion is into the subpattern whose
+the condition is true if the most recent recursion is into a subpattern whose
 number or name is given. This condition does not check the entire recursion
 stack.
 .P
@@ -1887,11 +1917,9 @@ written like this (ignore whitespace and line breaks):
 The first part of the pattern is a DEFINE group inside which another group
 named "byte" is defined. This matches an individual component of an IPv4
 address (a number less than 256). When matching takes place, this part of the
-pattern is skipped because DEFINE acts like a false condition.
-.P
-The rest of the pattern uses references to the named group to match the four
-dot-separated components of an IPv4 address, insisting on a word boundary at
-each end.
+pattern is skipped because DEFINE acts like a false condition. The rest of the
+pattern uses references to the named group to match the four dot-separated
+components of an IPv4 address, insisting on a word boundary at each end.
 .
 .SS "Assertion conditions"
 .rs
@@ -1963,23 +1991,24 @@ a recursive call of the entire regular expression.
 This PCRE pattern solves the nested parentheses problem (assume the
 PCRE_EXTENDED option is set so that white space is ignored):
 .sp
-  \e( ( (?>[^()]+) | (?R) )* \e)
+  \e( ( [^()]++ | (?R) )* \e)
 .sp
 First it matches an opening parenthesis. Then it matches any number of
 substrings which can either be a sequence of non-parentheses, or a recursive
 match of the pattern itself (that is, a correctly parenthesized substring).
-Finally there is a closing parenthesis.
+Finally there is a closing parenthesis. Note the use of a possessive quantifier 
+to avoid backtracking into sequences of non-parentheses.
 .P
 If this were part of a larger pattern, you would not want to recurse the entire
 pattern, so instead you could use this:
 .sp
-  ( \e( ( (?>[^()]+) | (?1) )* \e) )
+  ( \e( ( [^()]++ | (?1) )* \e) )
 .sp
 We have put the pattern into parentheses, and caused the recursion to refer to
 them instead of the whole pattern.
 .P
 In a larger pattern, keeping track of parenthesis numbers can be tricky. This
-is made easier by the use of relative references. (A Perl 5.10 feature.)
+is made easier by the use of relative references (a Perl 5.10 feature).
 Instead of (?1) in the pattern above you can write (?-2) to refer to the second
 most recently opened parentheses preceding the recursion. In other words, a
 negative number counts capturing parentheses leftwards from the point at which
@@ -1998,19 +2027,19 @@ An alternative approach is to use named parentheses instead. The Perl syntax
 for this is (?&name); PCRE's earlier syntax (?P>name) is also supported. We
 could rewrite the above example as follows:
 .sp
-  (?<pn> \e( ( (?>[^()]+) | (?&pn) )* \e) )
+  (?<pn> \e( ( [^()]++ | (?&pn) )* \e) )
 .sp
 If there is more than one subpattern with the same name, the earliest one is
 used.
 .P
 This particular example pattern that we have been looking at contains nested
-unlimited repeats, and so the use of atomic grouping for matching strings of
-non-parentheses is important when applying the pattern to strings that do not
-match. For example, when this pattern is applied to
+unlimited repeats, and so the use of a possessive quantifier for matching
+strings of non-parentheses is important when applying the pattern to strings
+that do not match. For example, when this pattern is applied to
 .sp
   (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()
 .sp
-it yields "no match" quickly. However, if atomic grouping is not used,
+it yields "no match" quickly. However, if a possessive quantifier is not used,
 the match runs for a very long time indeed because there are so many different
 ways the + and * repeats can carve up the subject, and all have to be tested
 before failure can be reported.
@@ -2029,7 +2058,7 @@ documentation). If the pattern above is matched against
 the value for the capturing parentheses is "ef", which is the last value taken
 on at the top level. If additional parentheses are added, giving
 .sp
-  \e( ( ( (?>[^()]+) | (?R) )* ) \e)
+  \e( ( ( [^()]++ | (?R) )* ) \e)
      ^                        ^
      ^                        ^
 .sp
@@ -2113,6 +2142,13 @@ the use of the possessive quantifier *+ to avoid backtracking into sequences of
 non-word characters. Without this, PCRE takes a great deal longer (ten times or
 more) to match typical phrases, and Perl takes so long that you think it has
 gone into a loop.
+.P
+\fBWARNING\fP: The palindrome-matching patterns above work only if the subject
+string does not start with a palindrome that is shorter than the entire string.
+For example, although "abcba" is correctly matched, if the subject is "ababa",
+PCRE finds the palindrome "aba" at the start, then fails at top level because
+the end of the string does not follow. Once again, it cannot jump back into the
+recursion to try other alternatives, so the entire match fails.
 .
 .
 .\" HTML <a name="subpatternsassubroutines"></a>
@@ -2248,8 +2284,8 @@ The following verbs act as soon as they are encountered:
 .sp
 This verb causes the match to end successfully, skipping the remainder of the
 pattern. When inside a recursion, only the innermost pattern is ended
-immediately. If the (*ACCEPT) is inside capturing parentheses, the data so far
-is captured. (This feature was added to PCRE at release 8.00.) For example:
+immediately. If (*ACCEPT) is inside capturing parentheses, the data so far is
+captured. (This feature was added to PCRE at release 8.00.) For example:
 .sp
   A((?:A|B(*ACCEPT)|C)D)
 .sp
@@ -2280,7 +2316,7 @@ The verbs differ in exactly what kind of failure occurs.
 .sp
 This verb causes the whole match to fail outright if the rest of the pattern
 does not match. Even if the pattern is unanchored, no further attempts to find
-a match by advancing the start point take place. Once (*COMMIT) has been
+a match by advancing the starting point take place. Once (*COMMIT) has been
 passed, \fBpcre_exec()\fP is committed to finding a match at the current
 starting point, or not at all. For example:
 .sp
@@ -2312,7 +2348,7 @@ was matched leading up to it cannot be part of a successful match. Consider:
 If the subject is "aaaac...", after the first match attempt fails (starting at
 the first character in the string), the starting point skips on to start the
 next attempt at "c". Note that a possessive quantifier does not have the same
-effect in this example; although it would suppress backtracking during the
+effect as this example; although it would suppress backtracking during the
 first match attempt, the second attempt would start at the second character
 instead of skipping on to "c".
 .sp
@@ -2334,7 +2370,8 @@ is used outside of any alternation, it acts exactly like (*PRUNE).
 .SH "SEE ALSO"
 .rs
 .sp
-\fBpcreapi\fP(3), \fBpcrecallout\fP(3), \fBpcrematching\fP(3), \fBpcre\fP(3).
+\fBpcreapi\fP(3), \fBpcrecallout\fP(3), \fBpcrematching\fP(3), 
+\fBpcresyntax\fP(3), \fBpcre\fP(3).
 .
 .
 .SH AUTHOR
@@ -2351,6 +2388,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 22 September 2009
+Last updated: 30 September 2009
 Copyright (c) 1997-2009 University of Cambridge.
 .fi
diff --git a/doc/pcresample.3 b/doc/pcresample.3
index 48941c5..f7eefda 100644
--- a/doc/pcresample.3
+++ b/doc/pcresample.3
@@ -25,8 +25,8 @@ string. The logic is a little bit tricky because of the possibility of matching
 an empty string. Comments in the code explain what is going on.
 .P
 If PCRE is installed in the standard include and library directories for your
-system, you should be able to compile the demonstration program using this
-command:
+operating system, you should be able to compile the demonstration program using
+this command:
 .sp
   gcc -o pcredemo pcredemo.c -lpcre
 .sp
@@ -87,6 +87,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 01 September 2009
+Last updated: 30 September 2009
 Copyright (c) 1997-2009 University of Cambridge.
 .fi
diff --git a/doc/perltest.txt b/doc/perltest.txt
index ca02690..fbbc10e 100644
--- a/doc/perltest.txt
+++ b/doc/perltest.txt
@@ -1,7 +1,7 @@
 The perltest program
 --------------------
 
-The perltest program tests Perl's regular expressions; it has the same
+The perltest.pl script tests Perl's regular expressions; it has the same
 specification as pcretest, and so can be given identical input, except that
 input patterns can be followed only by Perl's lower case modifiers and /+ (as
 used by pcretest), which is recognized and handled by the program.
@@ -14,20 +14,14 @@ modifiers such as /A that pcretest recognizes, and its special data line
 escapes, are not used in these files. The output should be identical, apart
 from the initial identifying banner.
 
-The perltest script can also test UTF-8 features. It works as is for Perl 5.8
-or higher. It recognizes the special modifier /8 that pcretest uses to invoke
-UTF-8 functionality. The testinput4 file can be fed to perltest to run
-compatible UTF-8 tests.
+The perltest.pl script can also test UTF-8 features. It recognizes the special
+modifier /8 that pcretest uses to invoke UTF-8 functionality. The testinput4
+file can be fed to perltest to run compatible UTF-8 tests.
 
-For Perl 5.6, perltest won't work unmodified for the UTF-8 tests. You need to
-uncomment the "use utf8" lines that it contains. It is best to do this on a
-copy of the script, because for non-UTF-8 tests, these lines should remain
-commented out.
-
-The other testinput files are not suitable for feeding to perltest, since they
-make use of the special upper case modifiers and escapes that pcretest uses to
-test some features of PCRE. Some of these files also contains malformed regular
-expressions, in order to check that PCRE diagnoses them correctly.
+The other testinput files are not suitable for feeding to perltest.pl, since
+they make use of the special upper case modifiers and escapes that pcretest
+uses to test some features of PCRE. Some of these files also contain malformed
+regular expressions, in order to check that PCRE diagnoses them correctly.
 
 Philip Hazel
-September 2004
+September 2009