Added a pcresyntax man page; tidied some others.

git-svn-id: svn://vcs.exim.org/pcre/code/trunk@208 2f5784b3-3f2a-0410-8824-cb99058d5e15
author: ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2007-08-06 15:23:29 +0000
committer: ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2007-08-06 15:23:29 +0000
commit: 7a27d7cb191012cfba8d5e2b43d96bbc47d43c8b
tree: 125cef490f6bc14f778719247a2f3373e1d0dcd8
parent: c686e88e16cd4dfec241981367ab8c35c9a148f6
17 files changed, 1496 insertions, 384 deletions
diff --git a/ChangeLog b/ChangeLog
index 3737357..4166e75 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -64,26 +64,26 @@ Version 7.3 05-Jul-07
     dynamic way, which I have now done. The artificial limitation on group
     length has been removed - we now have only the limit on the total length of
     the compiled pattern, which depends on the LINK_SIZE setting.
-    
-10. Fixed a bug in the documentation for get/copy named substring when 
-    duplicate names are permitted. If none of the named substrings are set, the 
-    functions return PCRE_ERROR_NOSUBSTRING (7); the doc said they returned an 
-    empty string. 
-    
-11. Because Perl interprets \Q...\E at a high level, and ignores orphan \E 
-    instances, patterns such as [\Q\E] or [\E] or even [^\E] cause an error, 
-    because the ] is interpreted as the first data character and the 
-    terminating ] is not found. PCRE has been made compatible with Perl in this 
-    regard. Previously, it interpreted [\Q\E] as an empty class, and [\E] could 
-    cause memory overwriting. 
-    
+
+10. Fixed a bug in the documentation for get/copy named substring when
+    duplicate names are permitted. If none of the named substrings are set, the
+    functions return PCRE_ERROR_NOSUBSTRING (7); the doc said they returned an
+    empty string.
+
+11. Because Perl interprets \Q...\E at a high level, and ignores orphan \E
+    instances, patterns such as [\Q\E] or [\E] or even [^\E] cause an error,
+    because the ] is interpreted as the first data character and the
+    terminating ] is not found. PCRE has been made compatible with Perl in this
+    regard. Previously, it interpreted [\Q\E] as an empty class, and [\E] could
+    cause memory overwriting.
+
 10. Like Perl, PCRE automatically breaks an unlimited repeat after an empty
     string has been matched (to stop an infinite loop). It was not recognizing
-    a conditional subpattern that could match an empty string if that 
+    a conditional subpattern that could match an empty string if that
     subpattern was within another subpattern. For example, it looped when
-    trying to match  (((?(1)X|))*)  but it was OK with  ((?(1)X|)*)  where the 
+    trying to match  (((?(1)X|))*)  but it was OK with  ((?(1)X|)*)  where the
     condition was not nested. This bug has been fixed.
-    
+
 12. A pattern like \X?\d or \P{L}?\d in non-UTF-8 mode could cause a backtrack
     past the start of the subject in the presence of bytes with the top bit
     set, for example "\x8aBCD".
diff --git a/PrepareRelease b/PrepareRelease
index cdf98bd..def850a 100755
--- a/PrepareRelease
+++ b/PrepareRelease
@@ -46,7 +46,7 @@ End
 
 echo "Making pcre.txt"
 for file in pcre pcrebuild pcrematching pcreapi pcrecallout pcrecompat \
-            pcrepattern pcrepartial pcreprecompile \
+            pcrepattern pcresyntax pcrepartial pcreprecompile \
             pcreperform pcreposix pcrecpp pcresample pcrestack ; do
   echo "  Processing $file.3"
   nroff -c -man $file.3 >$file.rawtxt
diff --git a/doc/html/index.html b/doc/html/index.html
index 36ae372..8a7174e 100644
--- a/doc/html/index.html
+++ b/doc/html/index.html
@@ -63,6 +63,9 @@ The HTML documentation for PCRE comprises the following pages:
 <tr><td><a href="pcrestack.html">pcrestack</a></td>
     <td>&nbsp;&nbsp;Discussion of PCRE's stack usage</td></tr>
 
+<tr><td><a href="pcresyntax.html">pcresyntax</a></td>
+    <td>&nbsp;&nbsp;Syntax quick-reference summary</td></tr>
+
 <tr><td><a href="pcretest.html">pcretest</a></td>
     <td>&nbsp;&nbsp;The <b>pcretest</b> command for testing PCRE</td></tr>
 </table>
diff --git a/doc/html/pcre.html b/doc/html/pcre.html
index 662dfba..23f0c16 100644
--- a/doc/html/pcre.html
+++ b/doc/html/pcre.html
@@ -58,7 +58,9 @@ supported by PCRE are given in separate documents. See the
 <a href="pcrepattern.html"><b>pcrepattern</b></a>
 and
 <a href="pcrecompat.html"><b>pcrecompat</b></a>
-pages.
+pages. There is a syntax summary in the
+<a href="pcresyntax.html"><b>pcresyntax</b></a>
+page.
 </P>
 <P>
 Some features of PCRE can be included, excluded, or changed when the library is
@@ -98,6 +100,7 @@ follows:
   pcrematching      discussion of the two matching algorithms
   pcrepartial       details of the partial matching facility
   pcrepattern       syntax and semantics of supported regular expressions
+  pcresyntax        quick syntax reference
   pcreperform       discussion of performance issues
   pcreposix         the POSIX-compatible C API
   pcreprecompile    details of saving and re-using precompiled patterns
@@ -124,21 +127,13 @@ documentation for details). In these cases the limit is substantially larger.
 However, the speed of execution is slower.
 </P>
 <P>
-All values in repeating quantifiers must be less than 65536. The maximum
-compiled length of subpattern with an explicit repeat count is 30000 bytes. The
-maximum number of capturing subpatterns is 65535.
+All values in repeating quantifiers must be less than 65536.
 </P>
 <P>
 There is no limit to the number of parenthesized subpatterns, but there can be
 no more than 65535 capturing subpatterns.
 </P>
 <P>
-If a non-capturing subpattern with an unlimited repetition quantifier can match
-an empty string, there is a limit of 1000 on the number of times it can be
-repeated while not matching an empty string - if it does match an empty
-string, the loop is immediately broken.
-</P>
-<P>
 The maximum length of name for a named subpattern is 32 characters, and the
 maximum number of named subpatterns is 10000.
 </P>
@@ -264,7 +259,7 @@ two digits 10, at the domain cam.ac.uk.
 </P>
 <br><a name="SEC6" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 30 July 2007
+Last updated: 06 August 2007
 <br>
 Copyright &copy; 1997-2007 University of Cambridge.
 <br>
diff --git a/doc/html/pcreapi.html b/doc/html/pcreapi.html
index 3e5491b..ee7c6fb 100644
--- a/doc/html/pcreapi.html
+++ b/doc/html/pcreapi.html
@@ -1653,13 +1653,17 @@ are not required to be unique. Normally, patterns with duplicate names are such
 that in any one match, only one of the named subpatterns participates. An
 example is shown in the
 <a href="pcrepattern.html"><b>pcrepattern</b></a>
-documentation. When duplicates are present, <b>pcre_copy_named_substring()</b>
-and <b>pcre_get_named_substring()</b> return the first substring corresponding
-to the given name that is set. If none are set, an empty string is returned.
-The <b>pcre_get_stringnumber()</b> function returns one of the numbers that are
-associated with the name, but it is not defined which it is.
-<br>
-<br>
+documentation.
+</P>
+<P>
+When duplicates are present, <b>pcre_copy_named_substring()</b> and
+<b>pcre_get_named_substring()</b> return the first substring corresponding to
+the given name that is set. If none are set, PCRE_ERROR_NOSUBSTRING (-7) is
+returned; no data is returned. The <b>pcre_get_stringnumber()</b> function
+returns one of the numbers that are associated with the name, but it is not
+defined which it is.
+</P>
+<P>
 If you want to get full details of all captured substrings for a given name,
 you must use the <b>pcre_get_stringtable_entries()</b> function. The first
 argument is the compiled pattern, and the second is the name. The third and
diff --git a/doc/html/pcrepattern.html b/doc/html/pcrepattern.html
index a5ce66d..b8bf127 100644
--- a/doc/html/pcrepattern.html
+++ b/doc/html/pcrepattern.html
@@ -41,12 +41,14 @@ man page, in case the conversion went wrong.
 </ul>
 <br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION DETAILS</a><br>
 <P>
-The syntax and semantics of the regular expressions supported by PCRE are
-described below. Regular expressions are also described in the Perl
-documentation and in a number of books, some of which have copious examples.
-Jeffrey Friedl's "Mastering Regular Expressions", published by O'Reilly, covers
-regular expressions in great detail. This description of PCRE's regular
-expressions is intended as reference material.
+The syntax and semantics of the regular expressions that are supported by PCRE
+are described in detail below. There is a quick-reference syntax summary in the
+<a href="pcresyntax.html"><b>pcresyntax</b></a>
+page. Perl's regular expressions are described in its own documentation, and
+regular expressions in general are covered in a number of books, some of which
+have copious examples. Jeffrey Friedl's "Mastering Regular Expressions",
+published by O'Reilly, covers regular expressions in great detail. This
+description of PCRE's regular expressions is intended as reference material.
 </P>
 <P>
 The original operation of PCRE was on strings of one-byte characters. However,
@@ -255,9 +257,9 @@ meanings
 Absolute and relative back references
 </b><br>
 <P>
-The sequence \g followed by a positive or negative number, optionally enclosed
-in braces, is an absolute or relative back reference. A named back reference
-can be coded as \g{name}. Back references are discussed
+The sequence \g followed by an unsigned or a negative number, optionally
+enclosed in braces, is an absolute or relative back reference. A named back
+reference can be coded as \g{name}. Back references are discussed
 <a href="#backreferences">later,</a>
 following the discussion of
 <a href="#subpattern">parenthesized subpatterns.</a>
@@ -1303,6 +1305,11 @@ previous example can be rewritten as
 <pre>
   \d++foo
 </pre>
+Note that a possessive quantifier can be used with an entire group, for
+example:
+<pre>
+  (abc|xyz){2,3}+
+</pre>
 Possessive quantifiers are always greedy; the setting of the PCRE_UNGREEDY
 option is ignored. They are a convenient notation for the simpler forms of
 atomic group. However, there is no difference in the meaning of a possessive
@@ -1377,16 +1384,17 @@ subpattern is possible using named parentheses (see below).
 <P>
 Another way of avoiding the ambiguity inherent in the use of digits following a
 backslash is to use the \g escape sequence, which is a feature introduced in
-Perl 5.10. This escape must be followed by a positive or a negative number,
-optionally enclosed in braces. These examples are all identical:
+Perl 5.10. This escape must be followed by an unsigned number or a negative
+number, optionally enclosed in braces. These examples are all identical:
 <pre>
   (ring), \1
   (ring), \g1
   (ring), \g{1}
 </pre>
-A positive number specifies an absolute reference without the ambiguity that is
-present in the older syntax. It is also useful when literal digits follow the
-reference. A negative number is a relative reference. Consider this example:
+An unsigned number specifies an absolute reference without the ambiguity that
+is present in the older syntax. It is also useful when literal digits follow
+the reference. A negative number is a relative reference. Consider this
+example:
 <pre>
   (abc(def)ghi)\g{-1}
 </pre>
@@ -1990,7 +1998,7 @@ Cambridge CB2 3QH, England.
 </P>
 <br><a name="SEC25" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 19 June 2007
+Last updated: 06 August 2007
 <br>
 Copyright &copy; 1997-2007 University of Cambridge.
 <br>
diff --git a/doc/html/pcresyntax.html b/doc/html/pcresyntax.html
new file mode 100644
index 0000000..f6a22ce
--- /dev/null
+++ b/doc/html/pcresyntax.html
@@ -0,0 +1,407 @@
+<html>
+<head>
+<title>pcresyntax specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcresyntax man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<ul>
+<li><a name="TOC1" href="#SEC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a>
+<li><a name="TOC2" href="#SEC2">QUOTING</a>
+<li><a name="TOC3" href="#SEC3">CHARACTERS</a>
+<li><a name="TOC4" href="#SEC4">CHARACTER TYPES</a>
+<li><a name="TOC5" href="#SEC5">GENERAL CATEGORY PROPERTY CODES FOR \p and \P</a>
+<li><a name="TOC6" href="#SEC6">SCRIPT NAMES FOR \p AND \P</a>
+<li><a name="TOC7" href="#SEC7">CHARACTER CLASSES</a>
+<li><a name="TOC8" href="#SEC8">QUANTIFIERS</a>
+<li><a name="TOC9" href="#SEC9">ANCHORS AND SIMPLE ASSERTIONS</a>
+<li><a name="TOC10" href="#SEC10">MATCH POINT RESET</a>
+<li><a name="TOC11" href="#SEC11">ALTERNATION</a>
+<li><a name="TOC12" href="#SEC12">CAPTURING</a>
+<li><a name="TOC13" href="#SEC13">ATOMIC GROUPS</a>
+<li><a name="TOC14" href="#SEC14">COMMENT</a>
+<li><a name="TOC15" href="#SEC15">OPTION SETTING</a>
+<li><a name="TOC16" href="#SEC16">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
+<li><a name="TOC17" href="#SEC17">BACKREFERENCES</a>
+<li><a name="TOC18" href="#SEC18">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
+<li><a name="TOC19" href="#SEC19">CONDITIONAL PATTERNS</a>
+<li><a name="TOC20" href="#SEC20">CALLOUTS</a>
+<li><a name="TOC21" href="#SEC21">SEE ALSO</a>
+<li><a name="TOC22" href="#SEC22">AUTHOR</a>
+<li><a name="TOC23" href="#SEC23">REVISION</a>
+</ul>
+<br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
+<P>
+The full syntax and semantics of the regular expressions that are supported by
+PCRE are described in the
+<a href="pcrepattern.html"><b>pcrepattern</b></a>
+documentation. This document contains just a quick-reference summary of the
+syntax.
+</P>
+<br><a name="SEC2" href="#TOC1">QUOTING</a><br>
+<P>
+<pre>
+  \x         where x is non-alphanumeric is a literal x
+  \Q...\E    treat enclosed characters as literal
+</PRE>
+</P>
+<br><a name="SEC3" href="#TOC1">CHARACTERS</a><br>
+<P>
+<pre>
+  \a         alarm, that is, the BEL character (hex 07)
+  \cx        "control-x", where x is any character
+  \e         escape (hex 1B)
+  \f         formfeed (hex 0C)
+  \n         newline (hex 0A)
+  \r         carriage return (hex 0D)
+  \t         tab (hex 09)
+  \ddd       character with octal code ddd, or backreference
+  \xhh       character with hex code hh
+  \x{hhh..}  character with hex code hhh..
+</PRE>
+</P>
+<br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br>
+<P>
+<pre>
+  .          any character except newline;
+               in dotall mode, any character whatsoever
+  \C         one byte, even in UTF-8 mode (best avoided)
+  \d         a decimal digit
+  \D         a character that is not a decimal digit
+  \h         a horizontal whitespace character
+  \H         a character that is not a horizontal whitespace character
+  \p{<i>xx</i>}     a character with the <i>xx</i> property
+  \P{<i>xx</i>}     a character without the <i>xx</i> property
+  \R         a newline sequence
+  \s         a whitespace character
+  \S         a character that is not a whitespace character
+  \v         a vertical whitespace character
+  \V         a character that is not a vertical whitespace character
+  \w         a "word" character
+  \W         a "non-word" character
+  \X         an extended Unicode sequence
+</pre>
+In PCRE, \d, \D, \s, \S, \w, and \W recognize only ASCII characters.
+</P>
+<br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTY CODES FOR \p and \P</a><br>
+<P>
+<pre>
+  C          Other
+  Cc         Control
+  Cf         Format
+  Cn         Unassigned
+  Co         Private use
+  Cs         Surrogate
+
+  L          Letter
+  Ll         Lower case letter
+  Lm         Modifier letter
+  Lo         Other letter
+  Lt         Title case letter
+  Lu         Upper case letter
+  L&         Ll, Lu, or Lt
+
+  M          Mark
+  Mc         Spacing mark
+  Me         Enclosing mark
+  Mn         Non-spacing mark
+
+  N          Number
+  Nd         Decimal number
+  Nl         Letter number
+  No         Other number
+
+  P          Punctuation
+  Pc         Connector punctuation
+  Pd         Dash punctuation
+  Pe         Close punctuation
+  Pf         Final punctuation
+  Pi         Initial punctuation
+  Po         Other punctuation
+  Ps         Open punctuation
+
+  S          Symbol
+  Sc         Currency symbol
+  Sk         Modifier symbol
+  Sm         Mathematical symbol
+  So         Other symbol
+
+  Z          Separator
+  Zl         Line separator
+  Zp         Paragraph separator
+  Zs         Space separator
+</PRE>
+</P>
+<br><a name="SEC6" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br>
+<P>
+Arabic,
+Armenian,
+Balinese,
+Bengali,
+Bopomofo,
+Braille,
+Buginese,
+Buhid,
+Canadian_Aboriginal,
+Cherokee,
+Common,
+Coptic,
+Cuneiform,
+Cypriot,
+Cyrillic,
+Deseret,
+Devanagari,
+Ethiopic,
+Georgian,
+Glagolitic,
+Gothic,
+Greek,
+Gujarati,
+Gurmukhi,
+Han,
+Hangul,
+Hanunoo,
+Hebrew,
+Hiragana,
+Inherited,
+Kannada,
+Katakana,
+Kharoshthi,
+Khmer,
+Lao,
+Latin,
+Limbu,
+Linear_B,
+Malayalam,
+Mongolian,
+Myanmar,
+New_Tai_Lue,
+Nko,
+Ogham,
+Old_Italic,
+Old_Persian,
+Oriya,
+Osmanya,
+Phags_Pa,
+Phoenician,
+Runic,
+Shavian,
+Sinhala,
+Syloti_Nagri,
+Syriac,
+Tagalog,
+Tagbanwa,
+Tai_Le,
+Tamil,
+Telugu,
+Thaana,
+Thai,
+Tibetan,
+Tifinagh,
+Ugaritic,
+Yi.
+</P>
+<br><a name="SEC7" href="#TOC1">CHARACTER CLASSES</a><br>
+<P>
+<pre>
+  [...]       positive character class
+  [^...]      negative character class
+  [x-y]       range (can be used for hex characters)
+  [[:xxx:]]   positive POSIX named set
+  [[:^xxx:]]  negative POSIX named set
+
+  alnum       alphanumeric
+  alpha       alphabetic
+  ascii       0-127
+  blank       space or tab
+  cntrl       control character
+  digit       decimal digit
+  graph       printing, excluding space
+  lower       lower case letter
+  print       printing, including space
+  punct       printing, excluding alphanumeric
+  space       whitespace
+  upper       upper case letter
+  word        same as \w
+  xdigit      hexadecimal digit
+</pre>
+In PCRE, POSIX character set names recognize only ASCII characters. You can use
+\Q...\E inside a character class.
+</P>
+<br><a name="SEC8" href="#TOC1">QUANTIFIERS</a><br>
+<P>
+<pre>
+  ?           0 or 1, greedy
+  ?+          0 or 1, possessive
+  ??          0 or 1, lazy
+  *           0 or more, greedy
+  *+          0 or more, possessive
+  *?          0 or more, lazy
+  +           1 or more, greedy
+  ++          1 or more, possessive
+  +?          1 or more, lazy
+  {n}         exactly n
+  {n,m}       at least n, no more than m, greedy
+  {n,m}+      at least n, no more than m, possessive
+  {n,m}?      at least n, no more than m, lazy
+  {n,}        n or more, greedy
+  {n,}+       n or more, possessive
+  {n,}?       n or more, lazy
+</PRE>
+</P>
+<br><a name="SEC9" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
+<P>
+<pre>
+  \b          word boundary
+  \B          not a word boundary
+  ^           start of subject
+               also after internal newline in multiline mode
+  \A          start of subject
+  $           end of subject
+               also before newline at end of subject
+               also before internal newline in multiline mode
+  \Z          end of subject
+               also before newline at end of subject
+  \z          end of subject
+  \G          first matching position in subject
+</PRE>
+</P>
+<br><a name="SEC10" href="#TOC1">MATCH POINT RESET</a><br>
+<P>
+<pre>
+  \K          reset start of match
+</PRE>
+</P>
+<br><a name="SEC11" href="#TOC1">ALTERNATION</a><br>
+<P>
+<pre>
+  expr|expr|expr...
+</PRE>
+</P>
+<br><a name="SEC12" href="#TOC1">CAPTURING</a><br>
+<P>
+<pre>
+  (...)          capturing group
+  (?&#60;name&#62;...)   named capturing group (Perl)
+  (?'name'...)   named capturing group (Perl)
+  (?P&#60;name&#62;...)  named capturing group (Python)
+  (?:...)        non-capturing group
+  (?|...)        non-capturing group; reset group numbers for
+                  capturing groups in each alternative
+</PRE>
+</P>
+<br><a name="SEC13" href="#TOC1">ATOMIC GROUPS</a><br>
+<P>
+<pre>
+  (?&#62;...)        atomic, non-capturing group
+</PRE>
+</P>
+<br><a name="SEC14" href="#TOC1">COMMENT</a><br>
+<P>
+<pre>
+  (?#....)       comment (not nestable)
+</PRE>
+</P>
+<br><a name="SEC15" href="#TOC1">OPTION SETTING</a><br>
+<P>
+<pre>
+  (?i)           caseless
+  (?J)           allow duplicate names
+  (?m)           multiline
+  (?s)           single line (dotall)
+  (?U)           default ungreedy (lazy)
+  (?x)           extended (ignore white space)
+  (?-...)        unset option(s)
+</PRE>
+</P>
+<br><a name="SEC16" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
+<P>
+<pre>
+  (?=...)        positive look ahead
+  (?!...)        negative look ahead
+  (?&#60;=...)       positive look behind
+  (?&#60;!...)       negative look behind
+</pre>
+Each top-level branch of a look behind must be of a fixed length.
+</P>
+<br><a name="SEC17" href="#TOC1">BACKREFERENCES</a><br>
+<P>
+<pre>
+  \n             reference by number (can be ambiguous)
+  \gn            reference by number
+  \g{n}          reference by number
+  \g{-n}         relative reference by number
+  \k&#60;name&#62;       reference by name (Perl)
+  \k'name'       reference by name (Perl)
+  \g{name}       reference by name (Perl)
+  \k{name}       reference by name (.NET)
+  (?P=name)      reference by name (Python)
+</PRE>
+</P>
+<br><a name="SEC18" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
+<P>
+<pre>
+  (?R)           recurse whole pattern
+  (?n)           call subpattern by absolute number
+  (?+n)          call subpattern by relative number
+  (?-n)          call subpattern by relative number
+  (?&name)       call subpattern by name (Perl)
+  (?P&#62;name)      call subpattern by name (Python)
+</PRE>
+</P>
+<br><a name="SEC19" href="#TOC1">CONDITIONAL PATTERNS</a><br>
+<P>
+<pre>
+  (?(condition)yes-pattern)
+  (?(condition)yes-pattern|no-pattern)
+
+  (?(n)...       absolute reference condition
+  (?(+n)...      relative reference condition
+  (?(-n)...      relative reference condition
+  (?(&#60;name&#62;)...  named reference condition (Perl)
+  (?('name')...  named reference condition (Perl)
+  (?(name)...    named reference condition (PCRE)
+  (?(R)...       overall recursion condition
+  (?(Rn)...      specific group recursion condition
+  (?(R&name)...  specific recursion condition
+  (?(DEFINE)...  define subpattern for reference
+  (?(assert)...  assertion condition
+</PRE>
+</P>
+<br><a name="SEC20" href="#TOC1">CALLOUTS</a><br>
+<P>
+<pre>
+  (?C)      callout
+  (?Cn)     callout with data n
+</PRE>
+</P>
+<br><a name="SEC21" href="#TOC1">SEE ALSO</a><br>
+<P>
+<b>pcrepattern</b>(3), <b>pcreapi</b>(3), <b>pcrecallout</b>(3),
+<b>pcrematching</b>(3), <b>pcre</b>(3).
+</P>
+<br><a name="SEC22" href="#TOC1">AUTHOR</a><br>
+<P>
+Philip Hazel
+<br>
+University Computing Service
+<br>
+Cambridge CB2 3QH, England.
+<br>
+</P>
+<br><a name="SEC23" href="#TOC1">REVISION</a><br>
+<P>
+Last updated: 06 August 2007
+<br>
+Copyright &copy; 1997-2007 University of Cambridge.
+<br>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/index.html.src b/doc/index.html.src
index b032434..888471f 100644
--- a/doc/index.html.src
+++ b/doc/index.html.src
@@ -63,6 +63,9 @@ The HTML documentation for PCRE comprises the following pages:
 <tr><td><a href="pcrestack.html">pcrestack</a></td>
     <td>&nbsp;&nbsp;Discussion of PCRE's stack usage</td></tr>
 
+<tr><td><a href="pcresyntax.html">pcresyntax</a></td>
+    <td>&nbsp;&nbsp;Syntax quick-reference summary</td></tr>
+
 <tr><td><a href="pcretest.html">pcretest</a></td>
     <td>&nbsp;&nbsp;The <b>pcretest</b> command for testing PCRE</td></tr>
 </table>
diff --git a/doc/pcre.3 b/doc/pcre.3
index c48b18f..b29ef78 100644
--- a/doc/pcre.3
+++ b/doc/pcre.3
@@ -47,7 +47,11 @@ and
 .\" HREF
 \fBpcrecompat\fR
 .\"
-pages.
+pages. There is a syntax summary in the
+.\" HREF
+\fBpcresyntax\fR
+.\"
+page.
 .P
 Some features of PCRE can be included, excluded, or changed when the library is
 built. The
@@ -93,6 +97,7 @@ follows:
 .\" JOIN
   pcrepattern       syntax and semantics of supported
                       regular expressions
+  pcresyntax        quick syntax reference
   pcreperform       discussion of performance issues
   pcreposix         the POSIX-compatible C API
   pcreprecompile    details of saving and re-using precompiled patterns
@@ -121,18 +126,11 @@ distribution and the
 documentation for details). In these cases the limit is substantially larger.
 However, the speed of execution is slower.
 .P
-All values in repeating quantifiers must be less than 65536. The maximum
-compiled length of subpattern with an explicit repeat count is 30000 bytes. The
-maximum number of capturing subpatterns is 65535.
+All values in repeating quantifiers must be less than 65536.
 .P
 There is no limit to the number of parenthesized subpatterns, but there can be
 no more than 65535 capturing subpatterns.
 .P
-If a non-capturing subpattern with an unlimited repetition quantifier can match
-an empty string, there is a limit of 1000 on the number of times it can be
-repeated while not matching an empty string - if it does match an empty
-string, the loop is immediately broken.
-.P
 The maximum length of name for a named subpattern is 32 characters, and the
 maximum number of named subpatterns is 10000.
 .P
@@ -256,6 +254,6 @@ two digits 10, at the domain cam.ac.uk.
 .rs
 .sp
 .nf
-Last updated: 30 July 2007
+Last updated: 06 August 2007
 Copyright (c) 1997-2007 University of Cambridge.
 .fi
diff --git a/doc/pcre.txt b/doc/pcre.txt
index 45ac6a8..4f4cf96 100644
--- a/doc/pcre.txt
+++ b/doc/pcre.txt
@@ -45,30 +45,31 @@ INTRODUCTION
 
        Details of exactly which Perl regular expression features are  and  are
        not supported by PCRE are given in separate documents. See the pcrepat-
-       tern and pcrecompat pages.
+       tern and pcrecompat pages. There is a syntax summary in the  pcresyntax
+       page.
 
-       Some features of PCRE can be included, excluded, or  changed  when  the
-       library  is  built.  The pcre_config() function makes it possible for a
-       client to discover which features are  available.  The  features  them-
-       selves  are described in the pcrebuild page. Documentation about build-
-       ing PCRE for various operating systems can be found in the README  file
+       Some  features  of  PCRE can be included, excluded, or changed when the
+       library is built. The pcre_config() function makes it  possible  for  a
+       client  to  discover  which  features are available. The features them-
+       selves are described in the pcrebuild page. Documentation about  build-
+       ing  PCRE for various operating systems can be found in the README file
        in the source distribution.
 
-       The  library  contains  a number of undocumented internal functions and
-       data tables that are used by more than one  of  the  exported  external
-       functions,  but  which  are  not  intended for use by external callers.
-       Their names all begin with "_pcre_", which hopefully will  not  provoke
+       The library contains a number of undocumented  internal  functions  and
+       data  tables  that  are  used by more than one of the exported external
+       functions, but which are not intended  for  use  by  external  callers.
+       Their  names  all begin with "_pcre_", which hopefully will not provoke
        any name clashes. In some environments, it is possible to control which
-       external symbols are exported when a shared library is  built,  and  in
+       external  symbols  are  exported when a shared library is built, and in
        these cases the undocumented symbols are not exported.
 
 
 USER DOCUMENTATION
 
-       The  user  documentation  for PCRE comprises a number of different sec-
-       tions. In the "man" format, each of these is a separate "man page".  In
-       the  HTML  format, each is a separate page, linked from the index page.
-       In the plain text format, all the sections are concatenated,  for  ease
+       The user documentation for PCRE comprises a number  of  different  sec-
+       tions.  In the "man" format, each of these is a separate "man page". In
+       the HTML format, each is a separate page, linked from the  index  page.
+       In  the  plain text format, all the sections are concatenated, for ease
        of searching. The sections are as follows:
 
          pcre              this document
@@ -83,6 +84,7 @@ USER DOCUMENTATION
          pcrepartial       details of the partial matching facility
          pcrepattern       syntax and semantics of supported
                              regular expressions
+         pcresyntax        quick syntax reference
          pcreperform       discussion of performance issues
          pcreposix         the POSIX-compatible C API
          pcreprecompile    details of saving and re-using precompiled patterns
@@ -90,35 +92,28 @@ USER DOCUMENTATION
          pcrestack         discussion of stack usage
          pcretest          description of the pcretest testing command
 
-       In addition, in the "man" and HTML formats, there is a short  page  for
+       In  addition,  in the "man" and HTML formats, there is a short page for
        each C library function, listing its arguments and results.
 
 
 LIMITATIONS
 
-       There  are some size limitations in PCRE but it is hoped that they will
+       There are some size limitations in PCRE but it is hoped that they  will
        never in practice be relevant.
 
-       The maximum length of a compiled pattern is 65539 (sic) bytes  if  PCRE
+       The  maximum  length of a compiled pattern is 65539 (sic) bytes if PCRE
        is compiled with the default internal linkage size of 2. If you want to
-       process regular expressions that are truly enormous,  you  can  compile
-       PCRE  with  an  internal linkage size of 3 or 4 (see the README file in
-       the source distribution and the pcrebuild documentation  for  details).
-       In  these  cases the limit is substantially larger.  However, the speed
+       process  regular  expressions  that are truly enormous, you can compile
+       PCRE with an internal linkage size of 3 or 4 (see the  README  file  in
+       the  source  distribution and the pcrebuild documentation for details).
+       In these cases the limit is substantially larger.  However,  the  speed
        of execution is slower.
 
-       All values in repeating quantifiers must be less than 65536. The  maxi-
-       mum  compiled  length  of  subpattern  with an explicit repeat count is
-       30000 bytes. The maximum number of capturing subpatterns is 65535.
+       All values in repeating quantifiers must be less than 65536.
 
        There is no limit to the number of parenthesized subpatterns, but there
        can be no more than 65535 capturing subpatterns.
 
-       If  a  non-capturing subpattern with an unlimited repetition quantifier
-       can match an empty string, there is a limit of 1000 on  the  number  of
-       times  it  can  be  repeated while not matching an empty string - if it
-       does match an empty string, the loop is immediately broken.
-
        The maximum length of name for a named subpattern is 32 characters, and
        the maximum number of named subpatterns is 10000.
 
@@ -231,7 +226,7 @@ AUTHOR
 
 REVISION
 
-       Last updated: 30 July 2007
+       Last updated: 06 August 2007
        Copyright (c) 1997-2007 University of Cambridge.
 ------------------------------------------------------------------------------
 
@@ -2212,12 +2207,14 @@ DUPLICATE SUBPATTERN NAMES
        subpatterns  are  not  required  to  be unique. Normally, patterns with
        duplicate names are such that in any one match, only one of  the  named
        subpatterns  participates. An example is shown in the pcrepattern docu-
-       mentation. When duplicates are present, pcre_copy_named_substring() and
+       mentation.
+
+       When   duplicates   are   present,   pcre_copy_named_substring()    and
        pcre_get_named_substring()  return the first substring corresponding to
-       the given name that is set.  If  none  are  set,  an  empty  string  is
-       returned.  The pcre_get_stringnumber() function returns one of the num-
-       bers that are associated with the name, but it is not defined which  it
-       is.
+       the given name that is set. If  none  are  set,  PCRE_ERROR_NOSUBSTRING
+       (-7)  is  returned;  no  data  is returned. The pcre_get_stringnumber()
+       function returns one of the numbers that are associated with the  name,
+       but it is not defined which it is.
 
        If  you want to get full details of all captured substrings for a given
        name, you must use  the  pcre_get_stringtable_entries()  function.  The
@@ -2732,12 +2729,14 @@ NAME
 
 PCRE REGULAR EXPRESSION DETAILS
 
-       The  syntax  and semantics of the regular expressions supported by PCRE
-       are described below. Regular expressions are also described in the Perl
-       documentation  and  in  a  number  of books, some of which have copious
-       examples.  Jeffrey Friedl's "Mastering Regular Expressions",  published
-       by  O'Reilly, covers regular expressions in great detail. This descrip-
-       tion of PCRE's regular expressions is intended as reference material.
+       The  syntax and semantics of the regular expressions that are supported
+       by PCRE are described in detail below. There is a quick-reference  syn-
+       tax  summary  in  the  pcresyntax  page. Perl's regular expressions are
+       described in its own documentation, and regular expressions in  general
+       are  covered in a number of books, some of which have copious examples.
+       Jeffrey  Friedl's  "Mastering  Regular   Expressions",   published   by
+       O'Reilly,  covers regular expressions in great detail. This description
+       of PCRE's regular expressions is intended as reference material.
 
        The original operation of PCRE was on strings of  one-byte  characters.
        However,  there is now also support for UTF-8 character strings. To use
@@ -2939,10 +2938,10 @@ BACKSLASH
 
    Absolute and relative back references
 
-       The  sequence  \g followed by a positive or negative number, optionally
-       enclosed in braces, is an absolute or relative back reference. A  named
-       back  reference can be coded as \g{name}. Back references are discussed
-       later, following the discussion of parenthesized subpatterns.
+       The  sequence  \g followed by an unsigned or a negative number, option-
+       ally enclosed in braces, is an absolute or relative back  reference.  A
+       named back reference can be coded as \g{name}. Back references are dis-
+       cussed later, following the discussion of parenthesized subpatterns.
 
    Generic character types
 
@@ -3878,121 +3877,126 @@ ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS
 
          \d++foo
 
-       Possessive  quantifiers  are  always  greedy;  the   setting   of   the
+       Note that a possessive quantifier can be used with an entire group, for
+       example:
+
+         (abc|xyz){2,3}+
+
+       Possessive   quantifiers   are   always  greedy;  the  setting  of  the
        PCRE_UNGREEDY option is ignored. They are a convenient notation for the
-       simpler forms of atomic group. However, there is no difference  in  the
-       meaning  of  a  possessive  quantifier and the equivalent atomic group,
-       though there may be a performance  difference;  possessive  quantifiers
+       simpler  forms  of atomic group. However, there is no difference in the
+       meaning of a possessive quantifier and  the  equivalent  atomic  group,
+       though  there  may  be a performance difference; possessive quantifiers
        should be slightly faster.
 
-       The  possessive  quantifier syntax is an extension to the Perl 5.8 syn-
-       tax.  Jeffrey Friedl originated the idea (and the name)  in  the  first
+       The possessive quantifier syntax is an extension to the Perl  5.8  syn-
+       tax.   Jeffrey  Friedl  originated the idea (and the name) in the first
        edition of his book. Mike McCloskey liked it, so implemented it when he
-       built Sun's Java package, and PCRE copied it from there. It  ultimately
+       built  Sun's Java package, and PCRE copied it from there. It ultimately
        found its way into Perl at release 5.10.
 
        PCRE has an optimization that automatically "possessifies" certain sim-
-       ple pattern constructs. For example, the sequence  A+B  is  treated  as
-       A++B  because  there is no point in backtracking into a sequence of A's
+       ple  pattern  constructs.  For  example, the sequence A+B is treated as
+       A++B because there is no point in backtracking into a sequence  of  A's
        when B must follow.
 
-       When a pattern contains an unlimited repeat inside  a  subpattern  that
-       can  itself  be  repeated  an  unlimited number of times, the use of an
-       atomic group is the only way to avoid some  failing  matches  taking  a
+       When  a  pattern  contains an unlimited repeat inside a subpattern that
+       can itself be repeated an unlimited number of  times,  the  use  of  an
+       atomic  group  is  the  only way to avoid some failing matches taking a
        very long time indeed. The pattern
 
          (\D+|<\d+>)*[!?]
 
-       matches  an  unlimited number of substrings that either consist of non-
-       digits, or digits enclosed in <>, followed by either ! or  ?.  When  it
+       matches an unlimited number of substrings that either consist  of  non-
+       digits,  or  digits  enclosed in <>, followed by either ! or ?. When it
        matches, it runs quickly. However, if it is applied to
 
          aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
-       it  takes  a  long  time  before reporting failure. This is because the
-       string can be divided between the internal \D+ repeat and the  external
-       *  repeat  in  a  large  number of ways, and all have to be tried. (The
-       example uses [!?] rather than a single character at  the  end,  because
-       both  PCRE  and  Perl have an optimization that allows for fast failure
-       when a single character is used. They remember the last single  charac-
-       ter  that  is required for a match, and fail early if it is not present
-       in the string.) If the pattern is changed so that  it  uses  an  atomic
+       it takes a long time before reporting  failure.  This  is  because  the
+       string  can be divided between the internal \D+ repeat and the external
+       * repeat in a large number of ways, and all  have  to  be  tried.  (The
+       example  uses  [!?]  rather than a single character at the end, because
+       both PCRE and Perl have an optimization that allows  for  fast  failure
+       when  a single character is used. They remember the last single charac-
+       ter that is required for a match, and fail early if it is  not  present
+       in  the  string.)  If  the pattern is changed so that it uses an atomic
        group, like this:
 
          ((?>\D+)|<\d+>)*[!?]
 
-       sequences  of non-digits cannot be broken, and failure happens quickly.
+       sequences of non-digits cannot be broken, and failure happens  quickly.
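
The fast-failure behaviour described above can be sketched in Python, used here only as a stand-in for PCRE. Because atomic groups are not available in every version of Python's re module, this sketch swaps in a well-known emulation: a lookahead that captures the run, followed by a back reference to it, so (?=(\D+))\2 plays the role of (?>\D+). The names ATOMIC_LIKE and try_match are hypothetical.

```python
import re

# Emulation (an assumption of this sketch, not PCRE syntax):
# (?=(\D+))\2 behaves like the atomic group (?>\D+), because the
# lookahead is never re-entered once it has succeeded.
ATOMIC_LIKE = re.compile(r"((?=(\D+))\2|<\d+>)*[!?]")

def try_match(subject):
    """Return the text matched at the start of subject, or None."""
    m = ATOMIC_LIKE.match(subject)
    return m.group(0) if m else None

# The run of non-digits cannot be split up, so this fails at once
# instead of trying every way of dividing the string.
print(try_match("a" * 52))

# A subject that does end in ! still matches.
print(try_match("<123>!"))
```

The non-atomic pattern is deliberately not run here, since on this subject it may take a very long time to fail.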
 
 
 BACK REFERENCES
 
        Outside a character class, a backslash followed by a digit greater than
        0 (and possibly further digits) is a back reference to a capturing sub-
-       pattern earlier (that is, to its left) in the pattern,  provided  there
+       pattern  earlier  (that is, to its left) in the pattern, provided there
        have been that many previous capturing left parentheses.
 
        However, if the decimal number following the backslash is less than 10,
-       it is always taken as a back reference, and causes  an  error  only  if
-       there  are  not that many capturing left parentheses in the entire pat-
-       tern. In other words, the parentheses that are referenced need  not  be
-       to  the left of the reference for numbers less than 10. A "forward back
-       reference" of this type can make sense when a  repetition  is  involved
-       and  the  subpattern to the right has participated in an earlier itera-
+       it  is  always  taken  as a back reference, and causes an error only if
+       there are not that many capturing left parentheses in the  entire  pat-
+       tern.  In  other words, the parentheses that are referenced need not be
+       to the left of the reference for numbers less than 10. A "forward  back
+       reference"  of  this  type can make sense when a repetition is involved
+       and the subpattern to the right has participated in an  earlier  itera-
        tion.
 
-       It is not possible to have a numerical "forward back  reference"  to  a
-       subpattern  whose  number  is  10  or  more using this syntax because a
-       sequence such as \50 is interpreted as a character  defined  in  octal.
+       It  is  not  possible to have a numerical "forward back reference" to a
+       subpattern whose number is 10 or  more  using  this  syntax  because  a
+       sequence  such  as  \50 is interpreted as a character defined in octal.
        See the subsection entitled "Non-printing characters" above for further
-       details of the handling of digits following a backslash.  There  is  no
-       such  problem  when named parentheses are used. A back reference to any
+       details  of  the  handling of digits following a backslash. There is no
+       such problem when named parentheses are used. A back reference  to  any
        subpattern is possible using named parentheses (see below).
 
-       Another way of avoiding the ambiguity inherent in  the  use  of  digits
+       Another  way  of  avoiding  the ambiguity inherent in the use of digits
        following a backslash is to use the \g escape sequence, which is a fea-
-       ture introduced in Perl 5.10. This escape must be followed by  a  posi-
-       tive  or  a negative number, optionally enclosed in braces. These exam-
-       ples are all identical:
+       ture  introduced  in  Perl  5.10.  This  escape  must be followed by an
+       unsigned number or a negative number, optionally  enclosed  in  braces.
+       These examples are all identical:
 
          (ring), \1
          (ring), \g1
          (ring), \g{1}
 
-       A positive number specifies an absolute reference without the ambiguity
-       that  is  present  in  the older syntax. It is also useful when literal
+       An  unsigned number specifies an absolute reference without the ambigu-
+       ity that is present in the older syntax. It is also useful when literal
        digits follow the reference. A negative number is a relative reference.
        Consider this example:
 
          (abc(def)ghi)\g{-1}
 
        The sequence \g{-1} is a reference to the most recently started captur-
-       ing subpattern before \g, that is, is it equivalent to  \2.  Similarly,
+       ing  subpattern  before \g, that is, it is equivalent to \2. Similarly,
        \g{-2} would be equivalent to \1. The use of relative references can be
-       helpful in long patterns, and also in  patterns  that  are  created  by
+       helpful  in  long  patterns,  and  also in patterns that are created by
        joining together fragments that contain references within themselves.
 
-       A  back  reference matches whatever actually matched the capturing sub-
-       pattern in the current subject string, rather  than  anything  matching
+       A back reference matches whatever actually matched the  capturing  sub-
+       pattern  in  the  current subject string, rather than anything matching
        the subpattern itself (see "Subpatterns as subroutines" below for a way
        of doing that). So the pattern
 
          (sens|respons)e and \1ibility
 
-       matches "sense and sensibility" and "response and responsibility",  but
-       not  "sense and responsibility". If caseful matching is in force at the
-       time of the back reference, the case of letters is relevant. For  exam-
+       matches  "sense and sensibility" and "response and responsibility", but
+       not "sense and responsibility". If caseful matching is in force at  the
+       time  of the back reference, the case of letters is relevant. For exam-
        ple,
 
          ((?i)rah)\s+\1
 
-       matches  "rah  rah"  and  "RAH RAH", but not "RAH rah", even though the
+       matches "rah rah" and "RAH RAH", but not "RAH  rah",  even  though  the
        original capturing subpattern is matched caselessly.
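
As a rough illustration of the first example, the following sketch uses Python's re module, which shares the \1 back reference syntax; it is a stand-in and not PCRE itself. The names BACKREF and matches are hypothetical.

```python
import re

# Python's re module is used only as a stand-in for PCRE here.
BACKREF = re.compile(r"(sens|respons)e and \1ibility")

def matches(subject):
    """Return True if the whole subject matches the pattern."""
    return BACKREF.fullmatch(subject) is not None

# \1 must repeat the text that group 1 actually captured, not just
# anything that the subpattern could have matched.
print(matches("sense and sensibility"))
print(matches("response and responsibility"))
print(matches("sense and responsibility"))
```

The caseless example is left out of the sketch, because an inline (?i) in the middle of a pattern is not handled the same way by Python.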
 
-       There are several different ways of writing back  references  to  named
-       subpatterns.  The  .NET syntax \k{name} and the Perl syntax \k<name> or
-       \k'name' are supported, as is the Python syntax (?P=name). Perl  5.10's
+       There  are  several  different ways of writing back references to named
+       subpatterns. The .NET syntax \k{name} and the Perl syntax  \k<name>  or
+       \k'name'  are supported, as is the Python syntax (?P=name). Perl 5.10's
        unified back reference syntax, in which \g can be used for both numeric
-       and named references, is also supported. We  could  rewrite  the  above
+       and  named  references,  is  also supported. We could rewrite the above
        example in any of the following ways:
 
          (?<p1>(?i)rah)\s+\k<p1>
@@ -4000,57 +4004,57 @@ BACK REFERENCES
          (?P<p1>(?i)rah)\s+(?P=p1)
          (?<p1>(?i)rah)\s+\g{p1}
 
-       A  subpattern  that  is  referenced  by  name may appear in the pattern
+       A subpattern that is referenced by  name  may  appear  in  the  pattern
        before or after the reference.
 
-       There may be more than one back reference to the same subpattern. If  a
-       subpattern  has  not actually been used in a particular match, any back
+       There  may be more than one back reference to the same subpattern. If a
+       subpattern has not actually been used in a particular match,  any  back
        references to it always fail. For example, the pattern
 
          (a|(bc))\2
 
-       always fails if it starts to match "a" rather than "bc". Because  there
-       may  be  many  capturing parentheses in a pattern, all digits following
-       the backslash are taken as part of a potential back  reference  number.
+       always  fails if it starts to match "a" rather than "bc". Because there
+       may be many capturing parentheses in a pattern,  all  digits  following
+       the  backslash  are taken as part of a potential back reference number.
        If the pattern continues with a digit character, some delimiter must be
-       used to terminate the back reference. If the  PCRE_EXTENDED  option  is
-       set,  this  can  be  whitespace.  Otherwise an empty comment (see "Com-
+       used  to  terminate  the back reference. If the PCRE_EXTENDED option is
+       set, this can be whitespace.  Otherwise an  empty  comment  (see  "Com-
        ments" below) can be used.
 
-       A back reference that occurs inside the parentheses to which it  refers
-       fails  when  the subpattern is first used, so, for example, (a\1) never
-       matches.  However, such references can be useful inside  repeated  sub-
+       A  back reference that occurs inside the parentheses to which it refers
+       fails when the subpattern is first used, so, for example,  (a\1)  never
+       matches.   However,  such references can be useful inside repeated sub-
        patterns. For example, the pattern
 
          (a|b\1)+
 
        matches any number of "a"s and also "aba", "ababbaa" etc. At each iter-
-       ation of the subpattern,  the  back  reference  matches  the  character
-       string  corresponding  to  the previous iteration. In order for this to
-       work, the pattern must be such that the first iteration does  not  need
-       to  match the back reference. This can be done using alternation, as in
+       ation  of  the  subpattern,  the  back  reference matches the character
+       string corresponding to the previous iteration. In order  for  this  to
+       work,  the  pattern must be such that the first iteration does not need
+       to match the back reference. This can be done using alternation, as  in
        the example above, or by a quantifier with a minimum of zero.
 
 
 ASSERTIONS
 
-       An assertion is a test on the characters  following  or  preceding  the
-       current  matching  point that does not actually consume any characters.
-       The simple assertions coded as \b, \B, \A, \G, \Z,  \z,  ^  and  $  are
+       An  assertion  is  a  test on the characters following or preceding the
+       current matching point that does not actually consume  any  characters.
+       The  simple  assertions  coded  as  \b, \B, \A, \G, \Z, \z, ^ and $ are
        described above.
 
-       More  complicated  assertions  are  coded as subpatterns. There are two
-       kinds: those that look ahead of the current  position  in  the  subject
-       string,  and  those  that  look  behind  it. An assertion subpattern is
-       matched in the normal way, except that it does not  cause  the  current
+       More complicated assertions are coded as  subpatterns.  There  are  two
+       kinds:  those  that  look  ahead of the current position in the subject
+       string, and those that look  behind  it.  An  assertion  subpattern  is
+       matched  in  the  normal way, except that it does not cause the current
        matching position to be changed.
 
-       Assertion  subpatterns  are  not  capturing subpatterns, and may not be
-       repeated, because it makes no sense to assert the  same  thing  several
-       times.  If  any kind of assertion contains capturing subpatterns within
-       it, these are counted for the purposes of numbering the capturing  sub-
+       Assertion subpatterns are not capturing subpatterns,  and  may  not  be
+       repeated,  because  it  makes no sense to assert the same thing several
+       times. If any kind of assertion contains capturing  subpatterns  within
+       it,  these are counted for the purposes of numbering the capturing sub-
        patterns in the whole pattern.  However, substring capturing is carried
-       out only for positive assertions, because it does not  make  sense  for
+       out  only  for  positive assertions, because it does not make sense for
        negative assertions.
 
    Lookahead assertions
@@ -4060,37 +4064,37 @@ ASSERTIONS
 
          \w+(?=;)
 
-       matches a word followed by a semicolon, but does not include the  semi-
+       matches  a word followed by a semicolon, but does not include the semi-
        colon in the match, and
 
          foo(?!bar)
 
-       matches  any  occurrence  of  "foo" that is not followed by "bar". Note
+       matches any occurrence of "foo" that is not  followed  by  "bar".  Note
        that the apparently similar pattern
 
          (?!foo)bar
 
-       does not find an occurrence of "bar"  that  is  preceded  by  something
-       other  than "foo"; it finds any occurrence of "bar" whatsoever, because
+       does  not  find  an  occurrence  of "bar" that is preceded by something
+       other than "foo"; it finds any occurrence of "bar" whatsoever,  because
        the assertion (?!foo) is always true when the next three characters are
        "bar". A lookbehind assertion is needed to achieve the other effect.
 
        If you want to force a matching failure at some point in a pattern, the
-       most convenient way to do it is  with  (?!)  because  an  empty  string
-       always  matches, so an assertion that requires there not to be an empty
+       most  convenient  way  to  do  it  is with (?!) because an empty string
+       always matches, so an assertion that requires there not to be an  empty
        string must always fail.
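
The lookahead examples above can be tried with Python's re module, which accepts the same assertion syntax; this is a sketch under that assumption, not a test of PCRE itself. The helper name found is hypothetical, and the lookbehind form is included only to show the contrast mentioned above.

```python
import re

def found(pattern, subject):
    """Return the first matched text, or None if there is no match."""
    m = re.search(pattern, subject)
    return m.group(0) if m else None

# The semicolon is required but is not part of the match.
print(found(r"\w+(?=;)", "word;"))

# "foo" is found only when "bar" does not follow it.
print(found(r"foo(?!bar)", "foobar"))
print(found(r"foo(?!bar)", "foobaz"))

# (?!foo)bar finds any "bar"; a lookbehind gives the other effect.
print(found(r"(?!foo)bar", "foobar"))
print(found(r"(?<!foo)bar", "foobar"))

# (?!) can never succeed, so it forces a failure.
print(found(r"a(?!)", "a"))
```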
 
    Lookbehind assertions
 
-       Lookbehind assertions start with (?<= for positive assertions and  (?<!
+       Lookbehind  assertions start with (?<= for positive assertions and (?<!
        for negative assertions. For example,
 
          (?<!foo)bar
 
-       does  find  an  occurrence  of "bar" that is not preceded by "foo". The
-       contents of a lookbehind assertion are restricted  such  that  all  the
+       does find an occurrence of "bar" that is not  preceded  by  "foo".  The
+       contents  of  a  lookbehind  assertion are restricted such that all the
        strings it matches must have a fixed length. However, if there are sev-
-       eral top-level alternatives, they do not all  have  to  have  the  same
+       eral  top-level  alternatives,  they  do  not all have to have the same
        fixed length. Thus
 
          (?<=bullock|donkey)
@@ -4099,59 +4103,59 @@ ASSERTIONS
 
          (?<!dogs?|cats?)
 
-       causes  an  error at compile time. Branches that match different length
-       strings are permitted only at the top level of a lookbehind  assertion.
-       This  is  an  extension  compared  with  Perl (at least for 5.8), which
-       requires all branches to match the same length of string. An  assertion
+       causes an error at compile time. Branches that match  different  length
+       strings  are permitted only at the top level of a lookbehind assertion.
+       This is an extension compared with  Perl  (at  least  for  5.8),  which
+       requires  all branches to match the same length of string. An assertion
        such as
 
          (?<=ab(c|de))
 
-       is  not  permitted,  because  its single top-level branch can match two
-       different lengths, but it is acceptable if rewritten to  use  two  top-
+       is not permitted, because its single top-level  branch  can  match  two
+       different  lengths,  but  it is acceptable if rewritten to use two top-
        level branches:
 
          (?<=abc|abde)
 
        In some cases, the Perl 5.10 escape sequence \K (see above) can be used
-       instead of a lookbehind assertion; this is not restricted to  a  fixed-
+       instead  of  a lookbehind assertion; this is not restricted to a fixed-
        length.
 
-       The  implementation  of lookbehind assertions is, for each alternative,
-       to temporarily move the current position back by the fixed  length  and
+       The implementation of lookbehind assertions is, for  each  alternative,
+       to  temporarily  move the current position back by the fixed length and
        then try to match. If there are insufficient characters before the cur-
        rent position, the assertion fails.
 
        PCRE does not allow the \C escape (which matches a single byte in UTF-8
-       mode)  to appear in lookbehind assertions, because it makes it impossi-
-       ble to calculate the length of the lookbehind. The \X and  \R  escapes,
+       mode) to appear in lookbehind assertions, because it makes it  impossi-
+       ble  to  calculate the length of the lookbehind. The \X and \R escapes,
        which can match different numbers of bytes, are also not permitted.
 
-       Possessive  quantifiers  can  be  used  in  conjunction with lookbehind
-       assertions to specify efficient matching at  the  end  of  the  subject
+       Possessive quantifiers can  be  used  in  conjunction  with  lookbehind
+       assertions  to  specify  efficient  matching  at the end of the subject
        string. Consider a simple pattern such as
 
          abcd$
 
-       when  applied  to  a  long string that does not match. Because matching
+       when applied to a long string that does  not  match.  Because  matching
        proceeds from left to right, PCRE will look for each "a" in the subject
-       and  then  see  if what follows matches the rest of the pattern. If the
+       and then see if what follows matches the rest of the  pattern.  If  the
        pattern is specified as
 
          ^.*abcd$
 
-       the initial .* matches the entire string at first, but when this  fails
+       the  initial .* matches the entire string at first, but when this fails
        (because there is no following "a"), it backtracks to match all but the
-       last character, then all but the last two characters, and so  on.  Once
-       again  the search for "a" covers the entire string, from right to left,
+       last  character,  then all but the last two characters, and so on. Once
+       again the search for "a" covers the entire string, from right to  left,
        so we are no better off. However, if the pattern is written as
 
          ^.*+(?<=abcd)
 
-       there can be no backtracking for the .*+ item; it can  match  only  the
-       entire  string.  The subsequent lookbehind assertion does a single test
-       on the last four characters. If it fails, the match fails  immediately.
-       For  long  strings, this approach makes a significant difference to the
+       there  can  be  no backtracking for the .*+ item; it can match only the
+       entire string. The subsequent lookbehind assertion does a  single  test
+       on  the last four characters. If it fails, the match fails immediately.
+       For long strings, this approach makes a significant difference  to  the
        processing time.
 
    Using multiple assertions
@@ -4160,18 +4164,18 @@ ASSERTIONS
 
          (?<=\d{3})(?<!999)foo
 
-       matches "foo" preceded by three digits that are not "999". Notice  that
-       each  of  the  assertions is applied independently at the same point in
-       the subject string. First there is a  check  that  the  previous  three
-       characters  are  all  digits,  and  then there is a check that the same
+       matches  "foo" preceded by three digits that are not "999". Notice that
+       each of the assertions is applied independently at the  same  point  in
+       the  subject  string.  First  there  is a check that the previous three
+       characters are all digits, and then there is  a  check  that  the  same
        three characters are not "999".  This pattern does not match "foo" pre-
-       ceded  by  six  characters,  the first of which are digits and the last
-       three of which are not "999". For example, it  doesn't  match  "123abc-
+       ceded by six characters, the first of which are  digits  and  the  last
+       three  of  which  are not "999". For example, it doesn't match "123abc-
        foo". A pattern to do that is
 
          (?<=\d{3}...)(?<!999)foo
 
-       This  time  the  first assertion looks at the preceding six characters,
+       This time the first assertion looks at the  preceding  six  characters,
        checking that the first three are digits, and then the second assertion
        checks that the preceding three characters are not "999".
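
The difference between the two patterns can be checked with a short sketch in Python's re module, assumed here to treat these fixed-length lookbehinds as described above. The names SAME_THREE and SIX_BACK are hypothetical.

```python
import re

# Both assertions look at the same three characters before "foo".
SAME_THREE = re.compile(r"(?<=\d{3})(?<!999)foo")

# The first assertion looks six characters back; the second still
# looks at the three characters immediately before "foo".
SIX_BACK = re.compile(r"(?<=\d{3}...)(?<!999)foo")

for subject in ("123foo", "999foo", "123abcfoo", "123999foo"):
    print(subject,
          SAME_THREE.search(subject) is not None,
          SIX_BACK.search(subject) is not None)
```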
 
@@ -4179,79 +4183,79 @@ ASSERTIONS
 
          (?<=(?<!foo)bar)baz
 
-       matches  an occurrence of "baz" that is preceded by "bar" which in turn
+       matches an occurrence of "baz" that is preceded by "bar" which in  turn
        is not preceded by "foo", while
 
          (?<=\d{3}(?!999)...)foo
 
-       is another pattern that matches "foo" preceded by three digits and  any
+       is  another pattern that matches "foo" preceded by three digits and any
        three characters that are not "999".
 
 
 CONDITIONAL SUBPATTERNS
 
-       It  is possible to cause the matching process to obey a subpattern con-
-       ditionally or to choose between two alternative subpatterns,  depending
-       on  the result of an assertion, or whether a previous capturing subpat-
-       tern matched or not. The two possible forms of  conditional  subpattern
+       It is possible to cause the matching process to obey a subpattern  con-
+       ditionally  or to choose between two alternative subpatterns, depending
+       on the result of an assertion, or whether a previous capturing  subpat-
+       tern  matched  or not. The two possible forms of conditional subpattern
        are
 
          (?(condition)yes-pattern)
          (?(condition)yes-pattern|no-pattern)
 
-       If  the  condition is satisfied, the yes-pattern is used; otherwise the
-       no-pattern (if present) is used. If there are more  than  two  alterna-
+       If the condition is satisfied, the yes-pattern is used;  otherwise  the
+       no-pattern  (if  present)  is used. If there are more than two alterna-
        tives in the subpattern, a compile-time error occurs.
 
-       There  are  four  kinds of condition: references to subpatterns, refer-
+       There are four kinds of condition: references  to  subpatterns,  refer-
        ences to recursion, a pseudo-condition called DEFINE, and assertions.
 
    Checking for a used subpattern by number
 
-       If the text between the parentheses consists of a sequence  of  digits,
-       the  condition  is  true if the capturing subpattern of that number has
-       previously matched. An alternative notation is to  precede  the  digits
+       If  the  text between the parentheses consists of a sequence of digits,
+       the condition is true if the capturing subpattern of  that  number  has
+       previously  matched.  An  alternative notation is to precede the digits
        with a plus or minus sign. In this case, the subpattern number is rela-
        tive rather than absolute.  The most recently opened parentheses can be
-       referenced  by  (?(-1),  the  next most recent by (?(-2), and so on. In
+       referenced by (?(-1), the next most recent by (?(-2),  and  so  on.  In
        looping constructs it can also make sense to refer to subsequent groups
        with constructs such as (?(+2).
 
-       Consider  the  following  pattern, which contains non-significant white
+       Consider the following pattern, which  contains  non-significant  white
        space to make it more readable (assume the PCRE_EXTENDED option) and to
        divide it into three parts for ease of discussion:
 
          ( \( )?    [^()]+    (?(1) \) )
 
-       The  first  part  matches  an optional opening parenthesis, and if that
+       The first part matches an optional opening  parenthesis,  and  if  that
        character is present, sets it as the first captured substring. The sec-
-       ond  part  matches one or more characters that are not parentheses. The
+       ond part matches one or more characters that are not  parentheses.  The
        third part is a conditional subpattern that tests whether the first set
        of parentheses matched or not. If they did, that is, if the subject started
        with an opening parenthesis, the condition is true, and so the yes-pat-
-       tern  is  executed  and  a  closing parenthesis is required. Otherwise,
-       since no-pattern is not present, the  subpattern  matches  nothing.  In
-       other  words,  this  pattern  matches  a  sequence  of non-parentheses,
+       tern is executed and a  closing  parenthesis  is  required.  Otherwise,
+       since  no-pattern  is  not  present, the subpattern matches nothing. In
+       other words,  this  pattern  matches  a  sequence  of  non-parentheses,
        optionally enclosed in parentheses.
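
A minimal sketch of this pattern, using Python's re module as a stand-in: its re.VERBOSE flag is assumed to play the role of PCRE_EXTENDED, and it accepts the same (?(1)...) condition. The names CONDITIONAL and whole_match are hypothetical.

```python
import re

# re.VERBOSE ignores the white space, as PCRE_EXTENDED is assumed to.
CONDITIONAL = re.compile(r"( \( )?    [^()]+    (?(1) \) )", re.VERBOSE)

def whole_match(subject):
    """Return True if the entire subject matches the pattern."""
    return CONDITIONAL.fullmatch(subject) is not None

# A closing parenthesis is required only if an opening one matched.
print(whole_match("abc"))
print(whole_match("(abc)"))
print(whole_match("(abc"))
print(whole_match("abc)"))
```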
 
-       If you were embedding this pattern in a larger one,  you  could  use  a
+       If  you  were  embedding  this pattern in a larger one, you could use a
        relative reference:
 
          ...other stuff... ( \( )?    [^()]+    (?(-1) \) ) ...
 
-       This  makes  the  fragment independent of the parentheses in the larger
+       This makes the fragment independent of the parentheses  in  the  larger
        pattern.
 
    Checking for a used subpattern by name
 
-       Perl uses the syntax (?(<name>)...) or (?('name')...)  to  test  for  a
-       used  subpattern  by  name.  For compatibility with earlier versions of
-       PCRE, which had this facility before Perl, the syntax  (?(name)...)  is
-       also  recognized. However, there is a possible ambiguity with this syn-
-       tax, because subpattern names may  consist  entirely  of  digits.  PCRE
-       looks  first for a named subpattern; if it cannot find one and the name
-       consists entirely of digits, PCRE looks for a subpattern of  that  num-
-       ber,  which must be greater than zero. Using subpattern names that con-
+       Perl  uses  the  syntax  (?(<name>)...) or (?('name')...) to test for a
+       used subpattern by name. For compatibility  with  earlier  versions  of
+       PCRE,  which  had this facility before Perl, the syntax (?(name)...) is
+       also recognized. However, there is a possible ambiguity with this  syn-
+       tax,  because  subpattern  names  may  consist entirely of digits. PCRE
+       looks first for a named subpattern; if it cannot find one and the  name
+       consists  entirely  of digits, PCRE looks for a subpattern of that num-
+       ber, which must be greater than zero. Using subpattern names that  con-
        sist entirely of digits is not recommended.
 
        Rewriting the above example to use a named subpattern gives this:
@@ -4262,85 +4266,85 @@ CONDITIONAL SUBPATTERNS
    Checking for pattern recursion
 
        If the condition is the string (R), and there is no subpattern with the
-       name  R, the condition is true if a recursive call to the whole pattern
+       name R, the condition is true if a recursive call to the whole  pattern
        or any subpattern has been made. If digits or a name preceded by amper-
        sand follow the letter R, for example:
 
          (?(R3)...) or (?(R&name)...)
 
-       the  condition is true if the most recent recursion is into the subpat-
-       tern whose number or name is given. This condition does not  check  the
+       the condition is true if the most recent recursion is into the  subpat-
+       tern  whose  number or name is given. This condition does not check the
        entire recursion stack.
 
-       At  "top  level", all these recursion test conditions are false. Recur-
+       At "top level", all these recursion test conditions are  false.  Recur-
        sive patterns are described below.
 
    Defining subpatterns for use by reference only
 
-       If the condition is the string (DEFINE), and  there  is  no  subpattern
-       with  the  name  DEFINE,  the  condition is always false. In this case,
-       there may be only one alternative  in  the  subpattern.  It  is  always
-       skipped  if  control  reaches  this  point  in the pattern; the idea of
-       DEFINE is that it can be used to define "subroutines" that can be  ref-
-       erenced  from elsewhere. (The use of "subroutines" is described below.)
-       For example, a pattern to match an IPv4 address could be  written  like
+       If  the  condition  is  the string (DEFINE), and there is no subpattern
+       with the name DEFINE, the condition is  always  false.  In  this  case,
+       there  may  be  only  one  alternative  in the subpattern. It is always
+       skipped if control reaches this point  in  the  pattern;  the  idea  of
+       DEFINE  is that it can be used to define "subroutines" that can be ref-
+       erenced from elsewhere. (The use of "subroutines" is described  below.)
+       For  example,  a pattern to match an IPv4 address could be written like
        this (ignore whitespace and line breaks):
 
          (?(DEFINE) (?<byte> 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) )
          \b (?&byte) (\.(?&byte)){3} \b
 
-       The  first part of the pattern is a DEFINE group inside which a another
-       group named "byte" is defined. This matches an individual component  of
-       an  IPv4  address  (a number less than 256). When matching takes place,
-       this part of the pattern is skipped because DEFINE acts  like  a  false
+       The first part of the pattern is a DEFINE group inside which another
+       group  named "byte" is defined. This matches an individual component of
+       an IPv4 address (a number less than 256). When  matching  takes  place,
+       this  part  of  the pattern is skipped because DEFINE acts like a false
        condition.
 
        The rest of the pattern uses references to the named group to match the
-       four dot-separated components of an IPv4 address, insisting on  a  word
+       four  dot-separated  components of an IPv4 address, insisting on a word
        boundary at each end.
 
    Assertion conditions
 
-       If  the  condition  is  not  in any of the above formats, it must be an
-       assertion.  This may be a positive or negative lookahead or  lookbehind
-       assertion.  Consider  this  pattern,  again  containing non-significant
+       If the condition is not in any of the above  formats,  it  must  be  an
+       assertion.   This may be a positive or negative lookahead or lookbehind
+       assertion. Consider  this  pattern,  again  containing  non-significant
        white space, and with the two alternatives on the second line:
 
          (?(?=[^a-z]*[a-z])
          \d{2}-[a-z]{3}-\d{2}  |  \d{2}-\d{2}-\d{2} )
 
-       The condition  is  a  positive  lookahead  assertion  that  matches  an
-       optional  sequence of non-letters followed by a letter. In other words,
-       it tests for the presence of at least one letter in the subject.  If  a
-       letter  is found, the subject is matched against the first alternative;
-       otherwise it is  matched  against  the  second.  This  pattern  matches
-       strings  in  one  of the two forms dd-aaa-dd or dd-dd-dd, where aaa are
+       The  condition  is  a  positive  lookahead  assertion  that  matches an
+       optional sequence of non-letters followed by a letter. In other  words,
+       it  tests  for the presence of at least one letter in the subject. If a
+       letter is found, the subject is matched against the first  alternative;
+       otherwise  it  is  matched  against  the  second.  This pattern matches
+       strings in one of the two forms dd-aaa-dd or dd-dd-dd,  where  aaa  are
        letters and dd are digits.
 
 
 COMMENTS
 
-       The sequence (?# marks the start of a comment that continues up to  the
-       next  closing  parenthesis.  Nested  parentheses are not permitted. The
-       characters that make up a comment play no part in the pattern  matching
+       The  sequence (?# marks the start of a comment that continues up to the
+       next closing parenthesis. Nested parentheses  are  not  permitted.  The
+       characters  that make up a comment play no part in the pattern matching
        at all.
 
-       If  the PCRE_EXTENDED option is set, an unescaped # character outside a
-       character class introduces a  comment  that  continues  to  immediately
+       If the PCRE_EXTENDED option is set, an unescaped # character outside  a
+       character  class  introduces  a  comment  that continues to immediately
        after the next newline in the pattern.
 
 
 RECURSIVE PATTERNS
 
-       Consider  the problem of matching a string in parentheses, allowing for
-       unlimited nested parentheses. Without the use of  recursion,  the  best
-       that  can  be  done  is  to use a pattern that matches up to some fixed
-       depth of nesting. It is not possible to  handle  an  arbitrary  nesting
+       Consider the problem of matching a string in parentheses, allowing  for
+       unlimited  nested  parentheses.  Without the use of recursion, the best
+       that can be done is to use a pattern that  matches  up  to  some  fixed
+       depth  of  nesting.  It  is not possible to handle an arbitrary nesting
        depth.
 
        For some time, Perl has provided a facility that allows regular expres-
-       sions to recurse (amongst other things). It does this by  interpolating
-       Perl  code in the expression at run time, and the code can refer to the
+       sions  to recurse (amongst other things). It does this by interpolating
+       Perl code in the expression at run time, and the code can refer to  the
        expression itself. A Perl pattern using code interpolation to solve the
        parentheses problem can be created like this:
 
@@ -4350,117 +4354,117 @@ RECURSIVE PATTERNS
        refers recursively to the pattern in which it appears.
 
        Obviously, PCRE cannot support the interpolation of Perl code. Instead,
-       it  supports  special  syntax  for recursion of the entire pattern, and
-       also for individual subpattern recursion.  After  its  introduction  in
-       PCRE  and  Python,  this  kind of recursion was introduced into Perl at
+       it supports special syntax for recursion of  the  entire  pattern,  and
+       also  for  individual  subpattern  recursion. After its introduction in
+       PCRE and Python, this kind of recursion was  introduced  into  Perl  at
        release 5.10.
 
-       A special item that consists of (? followed by a  number  greater  than
+       A  special  item  that consists of (? followed by a number greater than
        zero and a closing parenthesis is a recursive call of the subpattern of
-       the given number, provided that it occurs inside that  subpattern.  (If
-       not,  it  is  a  "subroutine" call, which is described in the next sec-
-       tion.) The special item (?R) or (?0) is a recursive call of the  entire
+       the  given  number, provided that it occurs inside that subpattern. (If
+       not, it is a "subroutine" call, which is described  in  the  next  sec-
+       tion.)  The special item (?R) or (?0) is a recursive call of the entire
        regular expression.
 
-       In  PCRE (like Python, but unlike Perl), a recursive subpattern call is
+       In PCRE (like Python, but unlike Perl), a recursive subpattern call  is
        always treated as an atomic group. That is, once it has matched some of
        the subject string, it is never re-entered, even if it contains untried
        alternatives and there is a subsequent matching failure.
 
-       This PCRE pattern solves the nested  parentheses  problem  (assume  the
+       This  PCRE  pattern  solves  the nested parentheses problem (assume the
        PCRE_EXTENDED option is set so that white space is ignored):
 
          \( ( (?>[^()]+) | (?R) )* \)
 
-       First  it matches an opening parenthesis. Then it matches any number of
-       substrings which can either be a  sequence  of  non-parentheses,  or  a
-       recursive  match  of the pattern itself (that is, a correctly parenthe-
+       First it matches an opening parenthesis. Then it matches any number  of
+       substrings  which  can  either  be  a sequence of non-parentheses, or a
+       recursive match of the pattern itself (that is, a  correctly  parenthe-
        sized substring).  Finally there is a closing parenthesis.
 
-       If this were part of a larger pattern, you would not  want  to  recurse
+       If  this  were  part of a larger pattern, you would not want to recurse
        the entire pattern, so instead you could use this:
 
          ( \( ( (?>[^()]+) | (?1) )* \) )
 
-       We  have  put the pattern into parentheses, and caused the recursion to
+       We have put the pattern into parentheses, and caused the  recursion  to
        refer to them instead of the whole pattern.
 
-       In a larger pattern,  keeping  track  of  parenthesis  numbers  can  be
-       tricky.  This is made easier by the use of relative references. (A Perl
-       5.10 feature.)  Instead of (?1) in the  pattern  above  you  can  write
+       In  a  larger  pattern,  keeping  track  of  parenthesis numbers can be
+       tricky. This is made easier by the use of relative references. (A  Perl
+       5.10  feature.)   Instead  of  (?1)  in the pattern above you can write
        (?-2) to refer to the second most recently opened parentheses preceding
-       the recursion. In other  words,  a  negative  number  counts  capturing
+       the  recursion.  In  other  words,  a  negative number counts capturing
        parentheses leftwards from the point at which it is encountered.
 
-       It  is  also  possible  to refer to subsequently opened parentheses, by
-       writing references such as (?+2). However, these  cannot  be  recursive
-       because  the  reference  is  not inside the parentheses that are refer-
-       enced. They are always "subroutine" calls, as  described  in  the  next
+       It is also possible to refer to  subsequently  opened  parentheses,  by
+       writing  references  such  as (?+2). However, these cannot be recursive
+       because the reference is not inside the  parentheses  that  are  refer-
+       enced.  They  are  always  "subroutine" calls, as described in the next
        section.
 
-       An  alternative  approach is to use named parentheses instead. The Perl
-       syntax for this is (?&name); PCRE's earlier syntax  (?P>name)  is  also
+       An alternative approach is to use named parentheses instead.  The  Perl
+       syntax  for  this  is (?&name); PCRE's earlier syntax (?P>name) is also
        supported. We could rewrite the above example as follows:
 
          (?<pn> \( ( (?>[^()]+) | (?&pn) )* \) )
 
-       If  there  is more than one subpattern with the same name, the earliest
+       If there is more than one subpattern with the same name,  the  earliest
        one is used.
 
-       This particular example pattern that we have been looking  at  contains
-       nested  unlimited repeats, and so the use of atomic grouping for match-
-       ing strings of non-parentheses is important when applying  the  pattern
+       This  particular  example pattern that we have been looking at contains
+       nested unlimited repeats, and so the use of atomic grouping for  match-
+       ing  strings  of non-parentheses is important when applying the pattern
        to strings that do not match. For example, when this pattern is applied
        to
 
          (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()
 
-       it yields "no match" quickly. However, if atomic grouping is not  used,
-       the  match  runs  for a very long time indeed because there are so many
-       different ways the + and * repeats can carve up the  subject,  and  all
+       it  yields "no match" quickly. However, if atomic grouping is not used,
+       the match runs for a very long time indeed because there  are  so  many
+       different  ways  the  + and * repeats can carve up the subject, and all
        have to be tested before failure can be reported.
 
        At the end of a match, the values set for any capturing subpatterns are
        those from the outermost level of the recursion at which the subpattern
-       value  is  set.   If  you want to obtain intermediate values, a callout
-       function can be used (see below and the pcrecallout documentation).  If
+       value is set.  If you want to obtain  intermediate  values,  a  callout
+       function  can be used (see below and the pcrecallout documentation). If
        the pattern above is matched against
 
          (ab(cd)ef)
 
-       the  value  for  the  capturing  parentheses is "ef", which is the last
-       value taken on at the top level. If additional parentheses  are  added,
+       the value for the capturing parentheses is  "ef",  which  is  the  last
+       value  taken  on at the top level. If additional parentheses are added,
        giving
 
          \( ( ( (?>[^()]+) | (?R) )* ) \)
             ^                        ^
             ^                        ^
 
-       the  string  they  capture is "ab(cd)ef", the contents of the top level
-       parentheses. If there are more than 15 capturing parentheses in a  pat-
+       the string they capture is "ab(cd)ef", the contents of  the  top  level
+       parentheses.  If there are more than 15 capturing parentheses in a pat-
        tern, PCRE has to obtain extra memory to store data during a recursion,
-       which it does by using pcre_malloc, freeing  it  via  pcre_free  after-
-       wards.  If  no  memory  can  be  obtained,  the  match  fails  with the
+       which  it  does  by  using pcre_malloc, freeing it via pcre_free after-
+       wards. If  no  memory  can  be  obtained,  the  match  fails  with  the
        PCRE_ERROR_NOMEMORY error.
 
-       Do not confuse the (?R) item with the condition (R),  which  tests  for
-       recursion.   Consider  this pattern, which matches text in angle brack-
-       ets, allowing for arbitrary nesting. Only digits are allowed in  nested
-       brackets  (that is, when recursing), whereas any characters are permit-
+       Do  not  confuse  the (?R) item with the condition (R), which tests for
+       recursion.  Consider this pattern, which matches text in  angle  brack-
+       ets,  allowing for arbitrary nesting. Only digits are allowed in nested
+       brackets (that is, when recursing), whereas any characters are  permit-
        ted at the outer level.
 
          < (?: (?(R) \d++  | [^<>]*+) | (?R)) * >
 
-       In this pattern, (?(R) is the start of a conditional  subpattern,  with
-       two  different  alternatives for the recursive and non-recursive cases.
+       In  this  pattern, (?(R) is the start of a conditional subpattern, with
+       two different alternatives for the recursive and  non-recursive  cases.
        The (?R) item is the actual recursive call.
 
 
 SUBPATTERNS AS SUBROUTINES
 
        If the syntax for a recursive subpattern reference (either by number or
-       by  name)  is used outside the parentheses to which it refers, it oper-
-       ates like a subroutine in a programming language. The "called"  subpat-
+       by name) is used outside the parentheses to which it refers,  it  oper-
+       ates  like a subroutine in a programming language. The "called" subpat-
        tern may be defined before or after the reference. A numbered reference
        can be absolute or relative, as in these examples:
 
@@ -4472,61 +4476,61 @@ SUBPATTERNS AS SUBROUTINES
 
          (sens|respons)e and \1ibility
 
-       matches "sense and sensibility" and "response and responsibility",  but
+       matches  "sense and sensibility" and "response and responsibility", but
        not "sense and responsibility". If instead the pattern
 
          (sens|respons)e and (?1)ibility
 
-       is  used, it does match "sense and responsibility" as well as the other
-       two strings. Another example is  given  in  the  discussion  of  DEFINE
+       is used, it does match "sense and responsibility" as well as the  other
+       two  strings.  Another  example  is  given  in the discussion of DEFINE
        above.
 
        Like recursive subpatterns, a "subroutine" call is always treated as an
-       atomic group. That is, once it has matched some of the subject  string,
-       it  is  never  re-entered, even if it contains untried alternatives and
+       atomic  group. That is, once it has matched some of the subject string,
+       it is never re-entered, even if it contains  untried  alternatives  and
        there is a subsequent matching failure.
 
-       When a subpattern is used as a subroutine, processing options  such  as
+       When  a  subpattern is used as a subroutine, processing options such as
        case-independence are fixed when the subpattern is defined. They cannot
        be changed for different calls. For example, consider this pattern:
 
          (abc)(?i:(?-1))
 
-       It matches "abcabc". It does not match "abcABC" because the  change  of
+       It  matches  "abcabc". It does not match "abcABC" because the change of
        processing option does not affect the called subpattern.
 
 
 CALLOUTS
 
        Perl has a feature whereby using the sequence (?{...}) causes arbitrary
-       Perl code to be obeyed in the middle of matching a regular  expression.
+       Perl  code to be obeyed in the middle of matching a regular expression.
        This makes it possible, amongst other things, to extract different sub-
        strings that match the same pair of parentheses when there is a repeti-
        tion.
 
        PCRE provides a similar feature, but of course it cannot obey arbitrary
        Perl code. The feature is called "callout". The caller of PCRE provides
-       an  external function by putting its entry point in the global variable
-       pcre_callout.  By default, this variable contains NULL, which  disables
+       an external function by putting its entry point in the global  variable
+       pcre_callout.   By default, this variable contains NULL, which disables
        all calling out.
 
-       Within  a  regular  expression,  (?C) indicates the points at which the
-       external function is to be called. If you want  to  identify  different
-       callout  points, you can put a number less than 256 after the letter C.
-       The default value is zero.  For example, this pattern has  two  callout
+       Within a regular expression, (?C) indicates the  points  at  which  the
+       external  function  is  to be called. If you want to identify different
+       callout points, you can put a number less than 256 after the letter  C.
+       The  default  value is zero.  For example, this pattern has two callout
        points:
 
          (?C1)abc(?C2)def
 
        If the PCRE_AUTO_CALLOUT flag is passed to pcre_compile(), callouts are
-       automatically installed before each item in the pattern. They  are  all
+       automatically  installed  before each item in the pattern. They are all
        numbered 255.
 
        During matching, when PCRE reaches a callout point (and pcre_callout is
-       set), the external function is called. It is provided with  the  number
-       of  the callout, the position in the pattern, and, optionally, one item
-       of data originally supplied by the caller of pcre_exec().  The  callout
-       function  may cause matching to proceed, to backtrack, or to fail alto-
+       set),  the  external function is called. It is provided with the number
+       of the callout, the position in the pattern, and, optionally, one  item
+       of  data  originally supplied by the caller of pcre_exec(). The callout
+       function may cause matching to proceed, to backtrack, or to fail  alto-
        gether. A complete description of the interface to the callout function
        is given in the pcrecallout documentation.
 
@@ -4545,7 +4549,306 @@ AUTHOR
 
 REVISION
 
-       Last updated: 19 June 2007
+       Last updated: 06 August 2007
+       Copyright (c) 1997-2007 University of Cambridge.
+------------------------------------------------------------------------------
+
+
+PCRESYNTAX(3)                                                    PCRESYNTAX(3)
+
+
+NAME
+       PCRE - Perl-compatible regular expressions
+
+
+PCRE REGULAR EXPRESSION SYNTAX SUMMARY
+
+       The  full syntax and semantics of the regular expressions that are sup-
+       ported by PCRE are described in  the  pcrepattern  documentation.  This
+       document contains just a quick-reference summary of the syntax.
+
+
+QUOTING
+
+         \x         where x is non-alphanumeric is a literal x
+         \Q...\E    treat enclosed characters as literal
+
+
+CHARACTERS
+
+         \a         alarm, that is, the BEL character (hex 07)
+         \cx        "control-x", where x is any character
+         \e         escape (hex 1B)
+         \f         formfeed (hex 0C)
+         \n         newline (hex 0A)
+         \r         carriage return (hex 0D)
+         \t         tab (hex 09)
+         \ddd       character with octal code ddd, or backreference
+         \xhh       character with hex code hh
+         \x{hhh..}  character with hex code hhh..
+
+
+CHARACTER TYPES
+
+         .          any character except newline;
+                      in dotall mode, any character whatsoever
+         \C         one byte, even in UTF-8 mode (best avoided)
+         \d         a decimal digit
+         \D         a character that is not a decimal digit
+         \h         a horizontal whitespace character
+         \H         a character that is not a horizontal whitespace character
+         \p{xx}     a character with the xx property
+         \P{xx}     a character without the xx property
+         \R         a newline sequence
+         \s         a whitespace character
+         \S         a character that is not a whitespace character
+         \v         a vertical whitespace character
+         \V         a character that is not a vertical whitespace character
+         \w         a "word" character
+         \W         a "non-word" character
+         \X         an extended Unicode sequence
+
+       In PCRE, \d, \D, \s, \S, \w, and \W recognize only ASCII characters.
+
+
+GENERAL CATEGORY PROPERTY CODES FOR \p AND \P
+
+         C          Other
+         Cc         Control
+         Cf         Format
+         Cn         Unassigned
+         Co         Private use
+         Cs         Surrogate
+
+         L          Letter
+         Ll         Lower case letter
+         Lm         Modifier letter
+         Lo         Other letter
+         Lt         Title case letter
+         Lu         Upper case letter
+         L&         Ll, Lu, or Lt
+
+         M          Mark
+         Mc         Spacing mark
+         Me         Enclosing mark
+         Mn         Non-spacing mark
+
+         N          Number
+         Nd         Decimal number
+         Nl         Letter number
+         No         Other number
+
+         P          Punctuation
+         Pc         Connector punctuation
+         Pd         Dash punctuation
+         Pe         Close punctuation
+         Pf         Final punctuation
+         Pi         Initial punctuation
+         Po         Other punctuation
+         Ps         Open punctuation
+
+         S          Symbol
+         Sc         Currency symbol
+         Sk         Modifier symbol
+         Sm         Mathematical symbol
+         So         Other symbol
+
+         Z          Separator
+         Zl         Line separator
+         Zp         Paragraph separator
+         Zs         Space separator
+
+
+SCRIPT NAMES FOR \p AND \P
+
+       Arabic,  Armenian,  Balinese,  Bengali,  Bopomofo,  Braille,  Buginese,
+       Buhid,  Canadian_Aboriginal,  Cherokee,  Common,   Coptic,   Cuneiform,
+       Cypriot, Cyrillic, Deseret, Devanagari, Ethiopic, Georgian, Glagolitic,
+       Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew,  Hira-
+       gana,  Inherited,  Kannada,  Katakana,  Kharoshthi,  Khmer, Lao, Latin,
+       Limbu,  Linear_B,  Malayalam,  Mongolian,  Myanmar,  New_Tai_Lue,  Nko,
+       Ogham,  Old_Italic,  Old_Persian, Oriya, Osmanya, Phags_Pa, Phoenician,
+       Runic,  Shavian,  Sinhala,  Syloti_Nagri,  Syriac,  Tagalog,  Tagbanwa,
+       Tai_Le, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Yi.
+
+
+CHARACTER CLASSES
+
+         [...]       positive character class
+         [^...]      negative character class
+         [x-y]       range (can be used for hex characters)
+         [[:xxx:]]   positive POSIX named set
+         [[:^xxx:]]  negative POSIX named set
+
+         alnum       alphanumeric
+         alpha       alphabetic
+         ascii       0-127
+         blank       space or tab
+         cntrl       control character
+         digit       decimal digit
+         graph       printing, excluding space
+         lower       lower case letter
+         print       printing, including space
+         punct       printing, excluding alphanumeric
+         space       whitespace
+         upper       upper case letter
+         word        same as \w
+         xdigit      hexadecimal digit
+
+       In PCRE, POSIX character set names recognize only ASCII characters. You
+       can use \Q...\E inside a character class.
+
+
+QUANTIFIERS
+
+         ?           0 or 1, greedy
+         ?+          0 or 1, possessive
+         ??          0 or 1, lazy
+         *           0 or more, greedy
+         *+          0 or more, possessive
+         *?          0 or more, lazy
+         +           1 or more, greedy
+         ++          1 or more, possessive
+         +?          1 or more, lazy
+         {n}         exactly n
+         {n,m}       at least n, no more than m, greedy
+         {n,m}+      at least n, no more than m, possessive
+         {n,m}?      at least n, no more than m, lazy
+         {n,}        n or more, greedy
+         {n,}+       n or more, possessive
+         {n,}?       n or more, lazy
+
+
+ANCHORS AND SIMPLE ASSERTIONS
+
+         \b          word boundary
+         \B          not a word boundary
+         ^           start of subject
+                      also after internal newline in multiline mode
+         \A          start of subject
+         $           end of subject
+                      also before newline at end of subject
+                      also before internal newline in multiline mode
+         \Z          end of subject
+                      also before newline at end of subject
+         \z          end of subject
+         \G          first matching position in subject
+
+
+MATCH POINT RESET
+
+         \K          reset start of match
+
+
+ALTERNATION
+
+         expr|expr|expr...
+
+
+CAPTURING
+
+         (...)          capturing group
+         (?<name>...)   named capturing group (Perl)
+         (?'name'...)   named capturing group (Perl)
+         (?P<name>...)  named capturing group (Python)
+         (?:...)        non-capturing group
+         (?|...)        non-capturing group; reset group numbers for
+                         capturing groups in each alternative
+
+
+ATOMIC GROUPS
+
+         (?>...)        atomic, non-capturing group
+
+
+COMMENT
+
+         (?#....)       comment (not nestable)
+
+
+OPTION SETTING
+
+         (?i)           caseless
+         (?J)           allow duplicate names
+         (?m)           multiline
+         (?s)           single line (dotall)
+         (?U)           default ungreedy (lazy)
+         (?x)           extended (ignore white space)
+         (?-...)        unset option(s)
+
+
+LOOKAHEAD AND LOOKBEHIND ASSERTIONS
+
+         (?=...)        positive look ahead
+         (?!...)        negative look ahead
+         (?<=...)       positive look behind
+         (?<!...)       negative look behind
+
+       Each top-level branch of a look behind must be of a fixed length.
+
+
+BACKREFERENCES
+
+         \n             reference by number (can be ambiguous)
+         \gn            reference by number
+         \g{n}          reference by number
+         \g{-n}         relative reference by number
+         \k<name>       reference by name (Perl)
+         \k'name'       reference by name (Perl)
+         \g{name}       reference by name (Perl)
+         \k{name}       reference by name (.NET)
+         (?P=name)      reference by name (Python)
+
+
+SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)
+
+         (?R)           recurse whole pattern
+         (?n)           call subpattern by absolute number
+         (?+n)          call subpattern by relative number
+         (?-n)          call subpattern by relative number
+         (?&name)       call subpattern by name (Perl)
+         (?P>name)      call subpattern by name (Python)
+
+
+CONDITIONAL PATTERNS
+
+         (?(condition)yes-pattern)
+         (?(condition)yes-pattern|no-pattern)
+
+         (?(n)...       absolute reference condition
+         (?(+n)...      relative reference condition
+         (?(-n)...      relative reference condition
+         (?(<name>)...  named reference condition (Perl)
+         (?('name')...  named reference condition (Perl)
+         (?(name)...    named reference condition (PCRE)
+         (?(R)...       overall recursion condition
+         (?(Rn)...      specific group recursion condition
+         (?(R&name)...  specific recursion condition
+         (?(DEFINE)...  define subpattern for reference
+         (?(assert)...  assertion condition
+
+
+CALLOUTS
+
+         (?C)      callout
+         (?Cn)     callout with data n
+
+
+SEE ALSO
+
+       pcrepattern(3), pcreapi(3), pcrecallout(3), pcrematching(3), pcre(3).
+
+
+AUTHOR
+
+       Philip Hazel
+       University Computing Service
+       Cambridge CB2 3QH, England.
+
+
+REVISION
+
+       Last updated: 06 August 2007
        Copyright (c) 1997-2007 University of Cambridge.
 ------------------------------------------------------------------------------
 
diff --git a/doc/pcreapi.3 b/doc/pcreapi.3
index 5a2444f..69d6584 100644
--- a/doc/pcreapi.3
+++ b/doc/pcreapi.3
@@ -1635,7 +1635,7 @@ example is shown in the
 .\" HREF
 \fBpcrepattern\fP
 .\"
-documentation. 
+documentation.
 .P
 When duplicates are present, \fBpcre_copy_named_substring()\fP and
 \fBpcre_get_named_substring()\fP return the first substring corresponding to
diff --git a/doc/pcrepattern.3 b/doc/pcrepattern.3
index 4b7a909..e22505c 100644
--- a/doc/pcrepattern.3
+++ b/doc/pcrepattern.3
@@ -4,12 +4,16 @@ PCRE - Perl-compatible regular expressions
 .SH "PCRE REGULAR EXPRESSION DETAILS"
 .rs
 .sp
-The syntax and semantics of the regular expressions supported by PCRE are
-described below. Regular expressions are also described in the Perl
-documentation and in a number of books, some of which have copious examples.
-Jeffrey Friedl's "Mastering Regular Expressions", published by O'Reilly, covers
-regular expressions in great detail. This description of PCRE's regular
-expressions is intended as reference material.
+The syntax and semantics of the regular expressions that are supported by PCRE
+are described in detail below. There is a quick-reference syntax summary in the
+.\" HREF
+\fBpcresyntax\fP
+.\"
+page. Perl's regular expressions are described in its own documentation, and
+regular expressions in general are covered in a number of books, some of which
+have copious examples. Jeffrey Friedl's "Mastering Regular Expressions",
+published by O'Reilly, covers regular expressions in great detail. This
+description of PCRE's regular expressions is intended as reference material.
 .P
 The original operation of PCRE was on strings of one-byte characters. However,
 there is now also support for UTF-8 character strings. To use this, you must
@@ -240,9 +244,9 @@ meanings
 .SS "Absolute and relative back references"
 .rs
 .sp
-The sequence \eg followed by a positive or negative number, optionally enclosed
-in braces, is an absolute or relative back reference. A named back reference
-can be coded as \eg{name}. Back references are discussed
+The sequence \eg followed by an unsigned or a negative number, optionally
+enclosed in braces, is an absolute or relative back reference. A named back
+reference can be coded as \eg{name}. Back references are discussed
 .\" HTML <a href="#backreferences">
 .\" </a>
 later,
@@ -1290,6 +1294,11 @@ previous example can be rewritten as
 .sp
   \ed++foo
 .sp
+Note that a possessive quantifier can be used with an entire group, for
+example:
+.sp
+  (abc|xyz){2,3}+
+.sp
 Possessive quantifiers are always greedy; the setting of the PCRE_UNGREEDY
 option is ignored. They are a convenient notation for the simpler forms of
 atomic group. However, there is no difference in the meaning of a possessive
@@ -1364,16 +1373,17 @@ subpattern is possible using named parentheses (see below).
 .P
 Another way of avoiding the ambiguity inherent in the use of digits following a
 backslash is to use the \eg escape sequence, which is a feature introduced in
-Perl 5.10. This escape must be followed by a positive or a negative number,
-optionally enclosed in braces. These examples are all identical:
+Perl 5.10. This escape must be followed by an unsigned number or a negative
+number, optionally enclosed in braces. These examples are all identical:
 .sp
   (ring), \e1
   (ring), \eg1
   (ring), \eg{1}
 .sp
-A positive number specifies an absolute reference without the ambiguity that is
-present in the older syntax. It is also useful when literal digits follow the
-reference. A negative number is a relative reference. Consider this example:
+An unsigned number specifies an absolute reference without the ambiguity that
+is present in the older syntax. It is also useful when literal digits follow
+the reference. A negative number is a relative reference. Consider this
+example:
 .sp
   (abc(def)ghi)\eg{-1}
 .sp
@@ -1976,6 +1986,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 19 June 2007
+Last updated: 06 August 2007
 Copyright (c) 1997-2007 University of Cambridge.
 .fi
diff --git a/doc/pcresyntax.3 b/doc/pcresyntax.3
new file mode 100644
index 0000000..7e3461c
--- /dev/null
+++ b/doc/pcresyntax.3
@@ -0,0 +1,381 @@
+.TH PCRESYNTAX 3
+.SH NAME
+PCRE - Perl-compatible regular expressions
+.SH "PCRE REGULAR EXPRESSION SYNTAX SUMMARY"
+.rs
+.sp
+The full syntax and semantics of the regular expressions that are supported by
+PCRE are described in the
+.\" HREF
+\fBpcrepattern\fP
+.\"
+documentation. This document contains just a quick-reference summary of the
+syntax.
+.
+.
+.SH "QUOTING"
+.rs
+.sp
+  \ex         where x is non-alphanumeric is a literal x
+  \eQ...\eE    treat enclosed characters as literal
+.
+.
+.SH "CHARACTERS"
+.rs
+.sp
+  \ea         alarm, that is, the BEL character (hex 07)
+  \ecx        "control-x", where x is any character
+  \ee         escape (hex 1B)
+  \ef         formfeed (hex 0C)
+  \en         newline (hex 0A)
+  \er         carriage return (hex 0D)
+  \et         tab (hex 09)
+  \eddd       character with octal code ddd, or backreference
+  \exhh       character with hex code hh
+  \ex{hhh..}  character with hex code hhh..
+.
+.
+.SH "CHARACTER TYPES"
+.rs
+.sp
+  .          any character except newline;
+               in dotall mode, any character whatsoever
+  \eC         one byte, even in UTF-8 mode (best avoided)
+  \ed         a decimal digit
+  \eD         a character that is not a decimal digit
+  \eh         a horizontal whitespace character
+  \eH         a character that is not a horizontal whitespace character
+  \ep{\fIxx\fP}     a character with the \fIxx\fP property
+  \eP{\fIxx\fP}     a character without the \fIxx\fP property
+  \eR         a newline sequence
+  \es         a whitespace character
+  \eS         a character that is not a whitespace character
+  \ev         a vertical whitespace character
+  \eV         a character that is not a vertical whitespace character
+  \ew         a "word" character
+  \eW         a "non-word" character
+  \eX         an extended Unicode sequence
+.sp
+In PCRE, \ed, \eD, \es, \eS, \ew, and \eW recognize only ASCII characters.
+.
+.
+.SH "GENERAL CATEGORY PROPERTY CODES FOR \ep and \eP"
+.rs
+.sp
+  C          Other
+  Cc         Control
+  Cf         Format
+  Cn         Unassigned
+  Co         Private use
+  Cs         Surrogate
+.sp
+  L          Letter
+  Ll         Lower case letter
+  Lm         Modifier letter
+  Lo         Other letter
+  Lt         Title case letter
+  Lu         Upper case letter
+  L&         Ll, Lu, or Lt
+.sp
+  M          Mark
+  Mc         Spacing mark
+  Me         Enclosing mark
+  Mn         Non-spacing mark
+.sp
+  N          Number
+  Nd         Decimal number
+  Nl         Letter number
+  No         Other number
+.sp
+  P          Punctuation
+  Pc         Connector punctuation
+  Pd         Dash punctuation
+  Pe         Close punctuation
+  Pf         Final punctuation
+  Pi         Initial punctuation
+  Po         Other punctuation
+  Ps         Open punctuation
+.sp
+  S          Symbol
+  Sc         Currency symbol
+  Sk         Modifier symbol
+  Sm         Mathematical symbol
+  So         Other symbol
+.sp
+  Z          Separator
+  Zl         Line separator
+  Zp         Paragraph separator
+  Zs         Space separator
+.
+.
+.SH "SCRIPT NAMES FOR \ep AND \eP"
+.rs
+.sp
+Arabic,
+Armenian,
+Balinese,
+Bengali,
+Bopomofo,
+Braille,
+Buginese,
+Buhid,
+Canadian_Aboriginal,
+Cherokee,
+Common,
+Coptic,
+Cuneiform,
+Cypriot,
+Cyrillic,
+Deseret,
+Devanagari,
+Ethiopic,
+Georgian,
+Glagolitic,
+Gothic,
+Greek,
+Gujarati,
+Gurmukhi,
+Han,
+Hangul,
+Hanunoo,
+Hebrew,
+Hiragana,
+Inherited,
+Kannada,
+Katakana,
+Kharoshthi,
+Khmer,
+Lao,
+Latin,
+Limbu,
+Linear_B,
+Malayalam,
+Mongolian,
+Myanmar,
+New_Tai_Lue,
+Nko,
+Ogham,
+Old_Italic,
+Old_Persian,
+Oriya,
+Osmanya,
+Phags_Pa,
+Phoenician,
+Runic,
+Shavian,
+Sinhala,
+Syloti_Nagri,
+Syriac,
+Tagalog,
+Tagbanwa,
+Tai_Le,
+Tamil,
+Telugu,
+Thaana,
+Thai,
+Tibetan,
+Tifinagh,
+Ugaritic,
+Yi.
+.
+.
+.SH "CHARACTER CLASSES"
+.rs
+.sp
+  [...]       positive character class
+  [^...]      negative character class
+  [x-y]       range (can be used for hex characters)
+  [[:xxx:]]   positive POSIX named set
+  [[:^xxx:]]  negative POSIX named set
+.sp
+  alnum       alphanumeric
+  alpha       alphabetic
+  ascii       0-127
+  blank       space or tab
+  cntrl       control character
+  digit       decimal digit
+  graph       printing, excluding space
+  lower       lower case letter
+  print       printing, including space
+  punct       printing, excluding alphanumeric
+  space       whitespace
+  upper       upper case letter
+  word        same as \ew
+  xdigit      hexadecimal digit
+.sp
+In PCRE, POSIX character set names recognize only ASCII characters. You can use
+\eQ...\eE inside a character class.
+.
+.
+.SH "QUANTIFIERS"
+.rs
+.sp
+  ?           0 or 1, greedy
+  ?+          0 or 1, possessive
+  ??          0 or 1, lazy
+  *           0 or more, greedy
+  *+          0 or more, possessive
+  *?          0 or more, lazy
+  +           1 or more, greedy
+  ++          1 or more, possessive
+  +?          1 or more, lazy
+  {n}         exactly n
+  {n,m}       at least n, no more than m, greedy
+  {n,m}+      at least n, no more than m, possessive
+  {n,m}?      at least n, no more than m, lazy
+  {n,}        n or more, greedy
+  {n,}+       n or more, possessive
+  {n,}?       n or more, lazy
+.
+.
+.SH "ANCHORS AND SIMPLE ASSERTIONS"
+.rs
+.sp
+  \eb          word boundary
+  \eB          not a word boundary
+  ^           start of subject
+               also after internal newline in multiline mode
+  \eA          start of subject
+  $           end of subject
+               also before newline at end of subject
+               also before internal newline in multiline mode
+  \eZ          end of subject
+               also before newline at end of subject
+  \ez          end of subject
+  \eG          first matching position in subject
+.
+.
+.SH "MATCH POINT RESET"
+.rs
+.sp
+  \eK          reset start of match
+.
+.
+.SH "ALTERNATION"
+.rs
+.sp
+  expr|expr|expr...
+.
+.
+.SH "CAPTURING"
+.rs
+.sp
+  (...)          capturing group
+  (?<name>...)   named capturing group (Perl)
+  (?'name'...)   named capturing group (Perl)
+  (?P<name>...)  named capturing group (Python)
+  (?:...)        non-capturing group
+  (?|...)        non-capturing group; reset group numbers for
+                  capturing groups in each alternative
+.
+.
+.SH "ATOMIC GROUPS"
+.rs
+.sp
+  (?>...)        atomic, non-capturing group
+.
+.
+.
+.
+.SH "COMMENT"
+.rs
+.sp
+  (?#....)       comment (not nestable)
+.
+.
+.SH "OPTION SETTING"
+.rs
+.sp
+  (?i)           caseless
+  (?J)           allow duplicate names
+  (?m)           multiline
+  (?s)           single line (dotall)
+  (?U)           default ungreedy (lazy)
+  (?x)           extended (ignore white space)
+  (?-...)        unset option(s)
+.
+.
+.SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS"
+.rs
+.sp
+  (?=...)        positive look ahead
+  (?!...)        negative look ahead
+  (?<=...)       positive look behind
+  (?<!...)       negative look behind
+.sp
+Each top-level branch of a look behind must be of a fixed length.
+.SH "BACKREFERENCES"
+.rs
+.sp
+  \en             reference by number (can be ambiguous)
+  \egn            reference by number
+  \eg{n}          reference by number
+  \eg{-n}         relative reference by number
+  \ek<name>       reference by name (Perl)
+  \ek'name'       reference by name (Perl)
+  \eg{name}       reference by name (Perl)
+  \ek{name}       reference by name (.NET)
+  (?P=name)      reference by name (Python)
+.
+.
+.SH "SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)"
+.rs
+.sp
+  (?R)           recurse whole pattern
+  (?n)           call subpattern by absolute number
+  (?+n)          call subpattern by relative number
+  (?-n)          call subpattern by relative number
+  (?&name)       call subpattern by name (Perl)
+  (?P>name)      call subpattern by name (Python)
+.
+.
+.SH "CONDITIONAL PATTERNS"
+.rs
+.sp
+  (?(condition)yes-pattern)
+  (?(condition)yes-pattern|no-pattern)
+.sp
+  (?(n)...       absolute reference condition
+  (?(+n)...      relative reference condition
+  (?(-n)...      relative reference condition
+  (?(<name>)...  named reference condition (Perl)
+  (?('name')...  named reference condition (Perl)
+  (?(name)...    named reference condition (PCRE)
+  (?(R)...       overall recursion condition
+  (?(Rn)...      specific group recursion condition
+  (?(R&name)...  specific recursion condition
+  (?(DEFINE)...  define subpattern for reference
+  (?(assert)...  assertion condition
+.
+.
+.SH "CALLOUTS"
+.rs
+.sp
+  (?C)      callout
+  (?Cn)     callout with data n
+.
+.
+.SH "SEE ALSO"
+.rs
+.sp
+\fBpcrepattern\fP(3), \fBpcreapi\fP(3), \fBpcrecallout\fP(3),
+\fBpcrematching\fP(3), \fBpcre\fP(3).
+.
+.
+.SH AUTHOR
+.rs
+.sp
+.nf
+Philip Hazel
+University Computing Service
+Cambridge CB2 3QH, England.
+.fi
+.
+.
+.SH REVISION
+.rs
+.sp
+.nf
+Last updated: 06 August 2007
+Copyright (c) 1997-2007 University of Cambridge.
+.fi
diff --git a/pcre_compile.c b/pcre_compile.c
index 4508ef9..7804015 100644
--- a/pcre_compile.c
+++ b/pcre_compile.c
@@ -2454,23 +2454,23 @@ for (;; ptr++)
       }
 
     /* If the first character is '^', set the negation flag and skip it. Also,
-    if the first few characters (either before or after ^) are \Q\E or \E we 
+    if the first few characters (either before or after ^) are \Q\E or \E we
     skip them too. This makes for compatibility with Perl. */
-    
+
     negate_class = FALSE;
     for (;;)
       {
       c = *(++ptr);
       if (c == '\\')
         {
-        if (ptr[1] == 'E') ptr++; 
+        if (ptr[1] == 'E') ptr++;
           else if (strncmp((const char *)ptr+1, "Q\\E", 3) == 0) ptr += 3;
-            else break; 
+            else break;
         }
       else if (!negate_class && c == '^')
         negate_class = TRUE;
       else break;
-      } 
+      }
 
     /* Keep a count of chars with values < 256 so that we can optimize the case
     of just a single character (as long as it's < 256). However, For higher
@@ -3075,7 +3075,7 @@ for (;; ptr++)
       *errorcodeptr = ERR6;
       goto FAILED;
       }
-      
+
     /* If class_charcount is 1, we saw precisely one character whose value is
     less than 256. In non-UTF-8 mode we can always optimize. In UTF-8 mode, we
     can optimize the negative case only if there were no characters >= 128
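
The class-prefix loop in the pcre_compile.c hunk above can be tried in isolation. The following is a minimal sketch, assuming a plain NUL-terminated pattern; skip_class_prefix() is a hypothetical helper written for illustration and is not a PCRE function. It mirrors the loop's logic: skip one leading ^ and any \E or \Q\E sequences on either side of it, then stop at the first real class character.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of PCRE. On entry ptr points at the '[' that
   opens a class. Returns a pointer to the first real class character and
   sets *negate to 1 if a leading '^' was seen. */
static const char *skip_class_prefix(const char *ptr, int *negate)
{
  assert(ptr != NULL && *ptr == '[');
  *negate = 0;
  for (;;)
    {
    char c = *(++ptr);
    if (c == '\\')
      {
      /* An orphan \E is skipped; so is an empty \Q\E pair. */
      if (ptr[1] == 'E') ptr++;
        else if (strncmp(ptr + 1, "Q\\E", 3) == 0) ptr += 3;
          else break;
      }
    else if (!*negate && c == '^')
      *negate = 1;               /* only the first '^' negates the class */
    else break;
    }
  return ptr;
}
```

For the patterns [\Q\E], [\E] and [^\E] the returned pointer is at the ], which is why, as ChangeLog item 11 describes, the ] is then taken as the first data character rather than as the end of the class.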
diff --git a/pcre_exec.c b/pcre_exec.c
index 8bc28a5..c380583 100644
--- a/pcre_exec.c
+++ b/pcre_exec.c
@@ -609,7 +609,7 @@ for (;;)
       eptr >= md->end_subject &&
       eptr > mstart)
     md->hitend = TRUE;
-    
+
   switch(op)
     {
     /* Handle a capturing bracket. If there is space in the offset vector, save
diff --git a/pcre_internal.h b/pcre_internal.h
index 1c5c323..20e0116 100644
--- a/pcre_internal.h
+++ b/pcre_internal.h
@@ -358,8 +358,8 @@ capturing parenthesis numbers in back references. */
 
 /* When UTF-8 encoding is being used, a character is no longer just a single
 byte. The macros for character handling generate simple sequences when used in
-byte-mode, and more complicated ones for UTF-8 characters. BACKCHAR should 
-never be called in byte mode. To make sure it can never even appear when UTF-8 
+byte-mode, and more complicated ones for UTF-8 characters. BACKCHAR should
+never be called in byte mode. To make sure it can never even appear when UTF-8
 support is omitted, we don't even define it. */
 
 #ifndef SUPPORT_UTF8
@@ -461,7 +461,7 @@ if there are extra bytes. This is called when we know we are in UTF-8 mode. */
     }
 
 /* If the pointer is not at the start of a character, move it back until
-it is. This is called only in UTF-8 mode - we don't put a test within the macro 
+it is. This is called only in UTF-8 mode - we don't put a test within the macro
 because almost all calls are already within a block of UTF-8 only code. */
 
 #define BACKCHAR(eptr) while((*eptr & 0xc0) == 0x80) eptr--
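
As an illustration of what BACKCHAR does, here is a minimal sketch that assumes the subject is valid UTF-8. The macro is copied from the line above; char_start() is a hypothetical wrapper added only for this example and does not exist in PCRE.

```c
#include <assert.h>
#include <stddef.h>

/* Copied from the diff: UTF-8 continuation bytes have the form 10xxxxxx, so
   step back while the top two bits of the current byte are 10. */
#define BACKCHAR(eptr) while((*eptr & 0xc0) == 0x80) eptr--

/* Hypothetical helper, not part of PCRE: return the offset of the first byte
   of the character that contains byte `offset` of a valid UTF-8 subject. */
static size_t char_start(const unsigned char *subject, size_t offset)
{
  const unsigned char *p;
  assert(subject != NULL);
  p = subject + offset;
  BACKCHAR(p);                   /* no-op if p is already on a lead byte */
  return (size_t)(p - subject);
}
```

With the five-byte subject "a\xe2\x82\xacz" (the letter a, a three-byte character, the letter z), offsets 1, 2 and 3 all map back to offset 1, while offsets 0 and 4 are returned unchanged. Like the macro itself, the sketch contains no check for UTF-8 mode, so it should only be used on data known to be UTF-8.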
diff --git a/pcre_newline.c b/pcre_newline.c
index 9a8998c..db02a8c 100644
--- a/pcre_newline.c
+++ b/pcre_newline.c
@@ -135,7 +135,7 @@ if (utf8)
   GETCHAR(c, ptr);
   }
 else c = *ptr;
-#else   /* no UTF-8 support */ 
+#else   /* no UTF-8 support */
 c = *ptr;
 #endif  /* SUPPORT_UTF8 */