Allow fixed-length subroutine calls in lookbehinds.

git-svn-id: svn://vcs.exim.org/pcre/code/trunk@454 2f5784b3-3f2a-0410-8824-cb99058d5e15
author: ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2009-09-22 09:42:11 +0000
committer: ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2009-09-22 09:42:11 +0000
commit: 13ec83b84a6939e47ebabc1836caec7d94836896 (patch)
tree: 4590c85bd69ba6b50d8a741a3469a023edfc03fc
parent: 20dd865c5c8f10036cda34b9870351b702399c08 (diff)
13 files changed, 577 insertions, 317 deletions
diff --git a/ChangeLog b/ChangeLog
index 5bdf7fe..dbc57de 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -131,6 +131,10 @@ Version 8.00 ??-???-??
 23. The tests have been re-organized, adding tests 11 and 12, to make it 
     possible to check the Perl 5.10 features against Perl 5.10.
     
+24. Perl 5.10 allows subroutine calls in lookbehinds, as long as the subroutine
+    pattern matches a fixed length string. PCRE did not allow this; now it 
+    does. Neither allows recursion. 
+    
 
 Version 7.9 11-Apr-09
 ---------------------
diff --git a/doc/html/pcreapi.html b/doc/html/pcreapi.html
index 1d1ca12..cac98d9 100644
--- a/doc/html/pcreapi.html
+++ b/doc/html/pcreapi.html
@@ -434,9 +434,12 @@ If <i>errptr</i> is NULL, <b>pcre_compile()</b> returns NULL immediately.
 Otherwise, if compilation of a pattern fails, <b>pcre_compile()</b> returns
 NULL, and sets the variable pointed to by <i>errptr</i> to point to a textual
 error message. This is a static string that is part of the library. You must
-not try to free it. The offset from the start of the pattern to the character
-where the error was discovered is placed in the variable pointed to by
-<i>erroffset</i>, which must not be NULL. If it is, an immediate error is given.
+not try to free it. The byte offset from the start of the pattern to the
+character that was being processed when the error was discovered is placed in
+the variable pointed to by <i>erroffset</i>, which must not be NULL. If it is,
+an immediate error is given. Some errors are not detected until checks are
+carried out when the whole pattern has been scanned; in this case the offset is
+set to the end of the pattern.
 </P>
 <P>
 If <b>pcre_compile2()</b> is used instead of <b>pcre_compile()</b>, and the
@@ -2018,7 +2021,7 @@ Cambridge CB2 3QH, England.
 </P>
 <br><a name="SEC22" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 11 September 2009
+Last updated: 22 September 2009
 <br>
 Copyright &copy; 1997-2009 University of Cambridge.
 <br>
diff --git a/doc/html/pcrepattern.html b/doc/html/pcrepattern.html
index e02a686..b2e0dd5 100644
--- a/doc/html/pcrepattern.html
+++ b/doc/html/pcrepattern.html
@@ -334,7 +334,9 @@ a number enclosed either in angle brackets or single quotes, is an alternative
 syntax for referencing a subpattern as a "subroutine". Details are discussed
 <a href="#onigurumasubroutines">later.</a>
 Note that \g{...} (Perl syntax) and \g&#60;...&#62; (Oniguruma syntax) are <i>not</i>
-synonymous. The former is a back reference; the latter is a subroutine call.
+synonymous. The former is a back reference; the latter is a 
+<a href="#subpatternsassubroutines">subroutine</a>
+call.
 </P>
 <br><b>
 Generic character types
@@ -1662,20 +1664,21 @@ is permitted, but
 </pre>
 causes an error at compile time. Branches that match different length strings
 are permitted only at the top level of a lookbehind assertion. This is an
-extension compared with Perl (at least for 5.8), which requires all branches to
+extension compared with Perl (5.8 and 5.10), which requires all branches to
 match the same length of string. An assertion such as
 <pre>
   (?&#60;=ab(c|de))
 </pre>
 is not permitted, because its single top-level branch can match two different
-lengths, but it is acceptable if rewritten to use two top-level branches:
+lengths, but it is acceptable to PCRE if rewritten to use two top-level
+branches:
 <pre>
   (?&#60;=abc|abde)
 </pre>
 In some cases, the Perl 5.10 escape sequence \K
 <a href="#resetmatchstart">(see above)</a>
-can be used instead of a lookbehind assertion; this is not restricted to a
-fixed-length.
+can be used instead of a lookbehind assertion to get round the fixed-length 
+restriction.
 </P>
 <P>
 The implementation of lookbehind assertions is, for each alternative, to
@@ -1690,6 +1693,13 @@ the length of the lookbehind. The \X and \R escapes, which can match
 different numbers of bytes, are also not permitted.
 </P>
 <P>
+<a href="#subpatternsassubroutines">"Subroutine"</a>
+calls (see below) such as (?2) or (?&X) are permitted in lookbehinds, as long
+as the subpattern matches a fixed-length string. 
+<a href="#recursion">Recursion,</a>
+however, is not supported.
+</P>
+<P>
 Possessive quantifiers can be used in conjunction with lookbehind assertions to
 specify efficient matching at the end of the subject string. Consider a simple
 pattern such as
@@ -1841,8 +1851,9 @@ number or name is given. This condition does not check the entire recursion
 stack.
 </P>
 <P>
-At "top level", all these recursion test conditions are false. Recursive
-patterns are described below.
+At "top level", all these recursion test conditions are false. 
+<a href="#recursion">Recursive patterns</a>
+are described below.
 </P>
 <br><b>
 Defining subpatterns for use by reference only
@@ -1852,7 +1863,8 @@ If the condition is the string (DEFINE), and there is no subpattern with the
 name DEFINE, the condition is always false. In this case, there may be only one
 alternative in the subpattern. It is always skipped if control reaches this
 point in the pattern; the idea of DEFINE is that it can be used to define
-"subroutines" that can be referenced from elsewhere. (The use of "subroutines"
+"subroutines" that can be referenced from elsewhere. (The use of 
+<a href="#subpatternsassubroutines">"subroutines"</a>
 is described below.) For example, a pattern to match an IPv4 address could be
 written like this (ignore whitespace and line breaks):
 <pre>
@@ -1927,7 +1939,8 @@ this kind of recursion was subsequently introduced into Perl at release 5.10.
 <P>
 A special item that consists of (? followed by a number greater than zero and a
 closing parenthesis is a recursive call of the subpattern of the given number,
-provided that it occurs inside that subpattern. (If not, it is a "subroutine"
+provided that it occurs inside that subpattern. (If not, it is a 
+<a href="#subpatternsassubroutines">"subroutine"</a>
 call, which is described in the next section.) The special item (?R) or (?0) is
 a recursive call of the entire regular expression.
 </P>
@@ -1963,7 +1976,8 @@ it is encountered.
 It is also possible to refer to subsequently opened parentheses, by writing
 references such as (?+2). However, these cannot be recursive because the
 reference is not inside the parentheses that are referenced. They are always
-"subroutine" calls, as described in the next section.
+<a href="#subpatternsassubroutines">"subroutine"</a>
+calls, as described in the next section.
 </P>
 <P>
 An alternative approach is to use named parentheses instead. The Perl syntax
@@ -2318,7 +2332,7 @@ Cambridge CB2 3QH, England.
 </P>
 <br><a name="SEC28" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 18 September 2009
+Last updated: 22 September 2009
 <br>
 Copyright &copy; 1997-2009 University of Cambridge.
 <br>
diff --git a/doc/pcre.txt b/doc/pcre.txt
index ab98276..f6140e6 100644
--- a/doc/pcre.txt
+++ b/doc/pcre.txt
@@ -1156,10 +1156,12 @@ COMPILING A PATTERN
        if  compilation  of  a  pattern fails, pcre_compile() returns NULL, and
        sets the variable pointed to by errptr to point to a textual error mes-
        sage. This is a static string that is part of the library. You must not
-       try to free it. The offset from the start of the pattern to the charac-
-       ter where the error was discovered is placed in the variable pointed to
-       by erroffset, which must not be NULL. If it is, an immediate  error  is
-       given.
+       try to free it. The byte offset from the start of the  pattern  to  the
+       character  that  was  being  processed when the error was discovered is
+       placed in the variable pointed to by erroffset, which must not be NULL.
+       If  it  is,  an  immediate error is given. Some errors are not detected
+       until checks are carried out when the whole pattern has  been  scanned;
+       in this case the offset is set to the end of the pattern.
 
        If  pcre_compile2()  is  used instead of pcre_compile(), and the error-
        codeptr argument is not NULL, a non-zero error code number is  returned
@@ -2666,7 +2668,7 @@ AUTHOR
 
 REVISION
 
-       Last updated: 11 September 2009
+       Last updated: 22 September 2009
        Copyright (c) 1997-2009 University of Cambridge.
 ------------------------------------------------------------------------------
  
@@ -4483,57 +4485,60 @@ ASSERTIONS
 
        causes  an  error at compile time. Branches that match different length
        strings are permitted only at the top level of a lookbehind  assertion.
-       This  is  an  extension  compared  with  Perl (at least for 5.8), which
-       requires all branches to match the same length of string. An  assertion
-       such as
+       This  is an extension compared with Perl (5.8 and 5.10), which requires
+       all branches to match the same length of string. An assertion such as
 
          (?<=ab(c|de))
 
-       is  not  permitted,  because  its single top-level branch can match two
-       different lengths, but it is acceptable if rewritten to  use  two  top-
-       level branches:
+       is not permitted, because its single top-level  branch  can  match  two
+       different lengths, but it is acceptable to PCRE if rewritten to use two
+       top-level branches:
 
          (?<=abc|abde)
 
        In some cases, the Perl 5.10 escape sequence \K (see above) can be used
-       instead of a lookbehind assertion; this is not restricted to  a  fixed-
-       length.
+       instead  of  a  lookbehind  assertion  to  get  round  the fixed-length
+       restriction.
 
-       The  implementation  of lookbehind assertions is, for each alternative,
-       to temporarily move the current position back by the fixed  length  and
+       The implementation of lookbehind assertions is, for  each  alternative,
+       to  temporarily  move the current position back by the fixed length and
        then try to match. If there are insufficient characters before the cur-
        rent position, the assertion fails.
 
        PCRE does not allow the \C escape (which matches a single byte in UTF-8
-       mode)  to appear in lookbehind assertions, because it makes it impossi-
-       ble to calculate the length of the lookbehind. The \X and  \R  escapes,
+       mode) to appear in lookbehind assertions, because it makes it  impossi-
+       ble  to  calculate the length of the lookbehind. The \X and \R escapes,
        which can match different numbers of bytes, are also not permitted.
 
-       Possessive  quantifiers  can  be  used  in  conjunction with lookbehind
-       assertions to specify efficient matching at  the  end  of  the  subject
+       "Subroutine" calls (see below) such as (?2) or (?&X) are  permitted  in
+       lookbehinds,  as  long as the subpattern matches a fixed-length string.
+       Recursion, however, is not supported.
+
+       Possessive quantifiers can  be  used  in  conjunction  with  lookbehind
+       assertions  to  specify  efficient  matching  at the end of the subject
        string. Consider a simple pattern such as
 
          abcd$
 
-       when  applied  to  a  long string that does not match. Because matching
+       when applied to a long string that does  not  match.  Because  matching
        proceeds from left to right, PCRE will look for each "a" in the subject
-       and  then  see  if what follows matches the rest of the pattern. If the
+       and then see if what follows matches the rest of the  pattern.  If  the
        pattern is specified as
 
          ^.*abcd$
 
-       the initial .* matches the entire string at first, but when this  fails
+       the  initial .* matches the entire string at first, but when this fails
        (because there is no following "a"), it backtracks to match all but the
-       last character, then all but the last two characters, and so  on.  Once
-       again  the search for "a" covers the entire string, from right to left,
+       last  character,  then all but the last two characters, and so on. Once
+       again the search for "a" covers the entire string, from right to  left,
        so we are no better off. However, if the pattern is written as
 
          ^.*+(?<=abcd)
 
-       there can be no backtracking for the .*+ item; it can  match  only  the
-       entire  string.  The subsequent lookbehind assertion does a single test
-       on the last four characters. If it fails, the match fails  immediately.
-       For  long  strings, this approach makes a significant difference to the
+       there  can  be  no backtracking for the .*+ item; it can match only the
+       entire string. The subsequent lookbehind assertion does a  single  test
+       on  the last four characters. If it fails, the match fails immediately.
+       For long strings, this approach makes a significant difference  to  the
        processing time.
 
    Using multiple assertions
@@ -4542,18 +4547,18 @@ ASSERTIONS
 
          (?<=\d{3})(?<!999)foo
 
-       matches "foo" preceded by three digits that are not "999". Notice  that
-       each  of  the  assertions is applied independently at the same point in
-       the subject string. First there is a  check  that  the  previous  three
-       characters  are  all  digits,  and  then there is a check that the same
+       matches  "foo" preceded by three digits that are not "999". Notice that
+       each of the assertions is applied independently at the  same  point  in
+       the  subject  string.  First  there  is a check that the previous three
+       characters are all digits, and then there is  a  check  that  the  same
        three characters are not "999".  This pattern does not match "foo" pre-
-       ceded  by  six  characters,  the first of which are digits and the last
-       three of which are not "999". For example, it  doesn't  match  "123abc-
+       ceded by six characters, the first of which are  digits  and  the  last
+       three  of  which  are not "999". For example, it doesn't match "123abc-
        foo". A pattern to do that is
 
          (?<=\d{3}...)(?<!999)foo
 
-       This  time  the  first assertion looks at the preceding six characters,
+       This time the first assertion looks at the  preceding  six  characters,
        checking that the first three are digits, and then the second assertion
        checks that the preceding three characters are not "999".
 
@@ -4561,79 +4566,79 @@ ASSERTIONS
 
          (?<=(?<!foo)bar)baz
 
-       matches  an occurrence of "baz" that is preceded by "bar" which in turn
+       matches an occurrence of "baz" that is preceded by "bar" which in  turn
        is not preceded by "foo", while
 
          (?<=\d{3}(?!999)...)foo
 
-       is another pattern that matches "foo" preceded by three digits and  any
+       is  another pattern that matches "foo" preceded by three digits and any
        three characters that are not "999".
 
 
 CONDITIONAL SUBPATTERNS
 
-       It  is possible to cause the matching process to obey a subpattern con-
-       ditionally or to choose between two alternative subpatterns,  depending
-       on  the result of an assertion, or whether a previous capturing subpat-
-       tern matched or not. The two possible forms of  conditional  subpattern
+       It is possible to cause the matching process to obey a subpattern  con-
+       ditionally  or to choose between two alternative subpatterns, depending
+       on the result of an assertion, or whether a previous capturing  subpat-
+       tern  matched  or not. The two possible forms of conditional subpattern
        are
 
          (?(condition)yes-pattern)
          (?(condition)yes-pattern|no-pattern)
 
-       If  the  condition is satisfied, the yes-pattern is used; otherwise the
-       no-pattern (if present) is used. If there are more  than  two  alterna-
+       If the condition is satisfied, the yes-pattern is used;  otherwise  the
+       no-pattern  (if  present)  is used. If there are more than two alterna-
        tives in the subpattern, a compile-time error occurs.
 
-       There  are  four  kinds of condition: references to subpatterns, refer-
+       There are four kinds of condition: references  to  subpatterns,  refer-
        ences to recursion, a pseudo-condition called DEFINE, and assertions.
 
    Checking for a used subpattern by number
 
-       If the text between the parentheses consists of a sequence  of  digits,
-       the  condition  is  true if the capturing subpattern of that number has
-       previously matched. An alternative notation is to  precede  the  digits
+       If  the  text between the parentheses consists of a sequence of digits,
+       the condition is true if the capturing subpattern of  that  number  has
+       previously  matched.  An  alternative notation is to precede the digits
        with a plus or minus sign. In this case, the subpattern number is rela-
        tive rather than absolute.  The most recently opened parentheses can be
-       referenced  by  (?(-1),  the  next most recent by (?(-2), and so on. In
+       referenced by (?(-1), the next most recent by (?(-2),  and  so  on.  In
        looping constructs it can also make sense to refer to subsequent groups
        with constructs such as (?(+2).
 
-       Consider  the  following  pattern, which contains non-significant white
+       Consider the following pattern, which  contains  non-significant  white
        space to make it more readable (assume the PCRE_EXTENDED option) and to
        divide it into three parts for ease of discussion:
 
          ( \( )?    [^()]+    (?(1) \) )
 
-       The  first  part  matches  an optional opening parenthesis, and if that
+       The first part matches an optional opening  parenthesis,  and  if  that
        character is present, sets it as the first captured substring. The sec-
-       ond  part  matches one or more characters that are not parentheses. The
+       ond part matches one or more characters that are not  parentheses.  The
        third part is a conditional subpattern that tests whether the first set
        of parentheses matched or not. If they did, that is, if subject started
        with an opening parenthesis, the condition is true, and so the yes-pat-
-       tern  is  executed  and  a  closing parenthesis is required. Otherwise,
-       since no-pattern is not present, the  subpattern  matches  nothing.  In
-       other  words,  this  pattern  matches  a  sequence  of non-parentheses,
+       tern is executed and a  closing  parenthesis  is  required.  Otherwise,
+       since  no-pattern  is  not  present, the subpattern matches nothing. In
+       other words,  this  pattern  matches  a  sequence  of  non-parentheses,
        optionally enclosed in parentheses.
 
-       If you were embedding this pattern in a larger one,  you  could  use  a
+       If  you  were  embedding  this pattern in a larger one, you could use a
        relative reference:
 
          ...other stuff... ( \( )?    [^()]+    (?(-1) \) ) ...
 
-       This  makes  the  fragment independent of the parentheses in the larger
+       This makes the fragment independent of the parentheses  in  the  larger
        pattern.
 
    Checking for a used subpattern by name
 
-       Perl uses the syntax (?(<name>)...) or (?('name')...)  to  test  for  a
-       used  subpattern  by  name.  For compatibility with earlier versions of
-       PCRE, which had this facility before Perl, the syntax  (?(name)...)  is
-       also  recognized. However, there is a possible ambiguity with this syn-
-       tax, because subpattern names may  consist  entirely  of  digits.  PCRE
-       looks  first for a named subpattern; if it cannot find one and the name
-       consists entirely of digits, PCRE looks for a subpattern of  that  num-
-       ber,  which must be greater than zero. Using subpattern names that con-
+       Perl  uses  the  syntax  (?(<name>)...) or (?('name')...) to test for a
+       used subpattern by name. For compatibility  with  earlier  versions  of
+       PCRE,  which  had this facility before Perl, the syntax (?(name)...) is
+       also recognized. However, there is a possible ambiguity with this  syn-
+       tax,  because  subpattern  names  may  consist entirely of digits. PCRE
+       looks first for a named subpattern; if it cannot find one and the  name
+       consists  entirely  of digits, PCRE looks for a subpattern of that num-
+       ber, which must be greater than zero. Using subpattern names that  con-
        sist entirely of digits is not recommended.
 
        Rewriting the above example to use a named subpattern gives this:
@@ -4644,85 +4649,85 @@ CONDITIONAL SUBPATTERNS
    Checking for pattern recursion
 
        If the condition is the string (R), and there is no subpattern with the
-       name  R, the condition is true if a recursive call to the whole pattern
+       name R, the condition is true if a recursive call to the whole  pattern
        or any subpattern has been made. If digits or a name preceded by amper-
        sand follow the letter R, for example:
 
          (?(R3)...) or (?(R&name)...)
 
-       the  condition is true if the most recent recursion is into the subpat-
-       tern whose number or name is given. This condition does not  check  the
+       the condition is true if the most recent recursion is into the  subpat-
+       tern  whose  number or name is given. This condition does not check the
        entire recursion stack.
 
-       At  "top  level", all these recursion test conditions are false. Recur-
+       At "top level", all these recursion test conditions are false.   Recur-
        sive patterns are described below.
 
    Defining subpatterns for use by reference only
 
-       If the condition is the string (DEFINE), and  there  is  no  subpattern
-       with  the  name  DEFINE,  the  condition is always false. In this case,
-       there may be only one alternative  in  the  subpattern.  It  is  always
-       skipped  if  control  reaches  this  point  in the pattern; the idea of
-       DEFINE is that it can be used to define "subroutines" that can be  ref-
-       erenced  from elsewhere. (The use of "subroutines" is described below.)
-       For example, a pattern to match an IPv4 address could be  written  like
+       If  the  condition  is  the string (DEFINE), and there is no subpattern
+       with the name DEFINE, the condition is  always  false.  In  this  case,
+       there  may  be  only  one  alternative  in the subpattern. It is always
+       skipped if control reaches this point  in  the  pattern;  the  idea  of
+       DEFINE  is that it can be used to define "subroutines" that can be ref-
+       erenced from elsewhere. (The use of "subroutines" is described  below.)
+       For  example,  a pattern to match an IPv4 address could be written like
        this (ignore whitespace and line breaks):
 
          (?(DEFINE) (?<byte> 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) )
          \b (?&byte) (\.(?&byte)){3} \b
 
-       The  first part of the pattern is a DEFINE group inside which a another
-       group named "byte" is defined. This matches an individual component  of
-       an  IPv4  address  (a number less than 256). When matching takes place,
-       this part of the pattern is skipped because DEFINE acts  like  a  false
+       The first part of the pattern is a DEFINE group inside  which  another
+       group  named "byte" is defined. This matches an individual component of
+       an IPv4 address (a number less than 256). When  matching  takes  place,
+       this  part  of  the pattern is skipped because DEFINE acts like a false
        condition.
 
        The rest of the pattern uses references to the named group to match the
-       four dot-separated components of an IPv4 address, insisting on  a  word
+       four  dot-separated  components of an IPv4 address, insisting on a word
        boundary at each end.
 
    Assertion conditions
 
-       If  the  condition  is  not  in any of the above formats, it must be an
-       assertion.  This may be a positive or negative lookahead or  lookbehind
-       assertion.  Consider  this  pattern,  again  containing non-significant
+       If the condition is not in any of the above  formats,  it  must  be  an
+       assertion.   This may be a positive or negative lookahead or lookbehind
+       assertion. Consider  this  pattern,  again  containing  non-significant
        white space, and with the two alternatives on the second line:
 
          (?(?=[^a-z]*[a-z])
          \d{2}-[a-z]{3}-\d{2}  |  \d{2}-\d{2}-\d{2} )
 
-       The condition  is  a  positive  lookahead  assertion  that  matches  an
-       optional  sequence of non-letters followed by a letter. In other words,
-       it tests for the presence of at least one letter in the subject.  If  a
-       letter  is found, the subject is matched against the first alternative;
-       otherwise it is  matched  against  the  second.  This  pattern  matches
-       strings  in  one  of the two forms dd-aaa-dd or dd-dd-dd, where aaa are
+       The  condition  is  a  positive  lookahead  assertion  that  matches an
+       optional sequence of non-letters followed by a letter. In other  words,
+       it  tests  for the presence of at least one letter in the subject. If a
+       letter is found, the subject is matched against the first  alternative;
+       otherwise  it  is  matched  against  the  second.  This pattern matches
+       strings in one of the two forms dd-aaa-dd or dd-dd-dd,  where  aaa  are
        letters and dd are digits.
 
 
 COMMENTS
 
-       The sequence (?# marks the start of a comment that continues up to  the
-       next  closing  parenthesis.  Nested  parentheses are not permitted. The
-       characters that make up a comment play no part in the pattern  matching
+       The  sequence (?# marks the start of a comment that continues up to the
+       next closing parenthesis. Nested parentheses  are  not  permitted.  The
+       characters  that make up a comment play no part in the pattern matching
        at all.
 
-       If  the PCRE_EXTENDED option is set, an unescaped # character outside a
-       character class introduces a  comment  that  continues  to  immediately
+       If the PCRE_EXTENDED option is set, an unescaped # character outside  a
+       character  class  introduces  a  comment  that continues to immediately
        after the next newline in the pattern.
 
 
 RECURSIVE PATTERNS
 
-       Consider  the problem of matching a string in parentheses, allowing for
-       unlimited nested parentheses. Without the use of  recursion,  the  best
-       that  can  be  done  is  to use a pattern that matches up to some fixed
-       depth of nesting. It is not possible to  handle  an  arbitrary  nesting
+       Consider the problem of matching a string in parentheses, allowing  for
+       unlimited  nested  parentheses.  Without the use of recursion, the best
+       that can be done is to use a pattern that  matches  up  to  some  fixed
+       depth  of  nesting.  It  is not possible to handle an arbitrary nesting
        depth.
 
        For some time, Perl has provided a facility that allows regular expres-
-       sions to recurse (amongst other things). It does this by  interpolating
-       Perl  code in the expression at run time, and the code can refer to the
+       sions  to recurse (amongst other things). It does this by interpolating
+       Perl code in the expression at run time, and the code can refer to  the
        expression itself. A Perl pattern using code interpolation to solve the
        parentheses problem can be created like this:
 
@@ -4732,178 +4737,178 @@ RECURSIVE PATTERNS
        refers recursively to the pattern in which it appears.
 
        Obviously, PCRE cannot support the interpolation of Perl code. Instead,
-       it  supports  special  syntax  for recursion of the entire pattern, and
-       also for individual subpattern recursion.  After  its  introduction  in
-       PCRE  and  Python,  this  kind of recursion was subsequently introduced
+       it supports special syntax for recursion of  the  entire  pattern,  and
+       also  for  individual  subpattern  recursion. After its introduction in
+       PCRE and Python, this kind of  recursion  was  subsequently  introduced
        into Perl at release 5.10.
 
-       A special item that consists of (? followed by a  number  greater  than
+       A  special  item  that consists of (? followed by a number greater than
        zero and a closing parenthesis is a recursive call of the subpattern of
-       the given number, provided that it occurs inside that  subpattern.  (If
-       not,  it  is  a  "subroutine" call, which is described in the next sec-
-       tion.) The special item (?R) or (?0) is a recursive call of the  entire
+       the  given  number, provided that it occurs inside that subpattern. (If
+       not, it is a "subroutine" call, which is described  in  the  next  sec-
+       tion.)  The special item (?R) or (?0) is a recursive call of the entire
        regular expression.
 
-       This  PCRE  pattern  solves  the nested parentheses problem (assume the
+       This PCRE pattern solves the nested  parentheses  problem  (assume  the
        PCRE_EXTENDED option is set so that white space is ignored):
 
          \( ( (?>[^()]+) | (?R) )* \)
 
-       First it matches an opening parenthesis. Then it matches any number  of
-       substrings  which  can  either  be  a sequence of non-parentheses, or a
-       recursive match of the pattern itself (that is, a  correctly  parenthe-
+       First  it matches an opening parenthesis. Then it matches any number of
+       substrings which can either be a  sequence  of  non-parentheses,  or  a
+       recursive  match  of the pattern itself (that is, a correctly parenthe-
        sized substring).  Finally there is a closing parenthesis.
 
-       If  this  were  part of a larger pattern, you would not want to recurse
+       If this were part of a larger pattern, you would not  want  to  recurse
        the entire pattern, so instead you could use this:
 
          ( \( ( (?>[^()]+) | (?1) )* \) )
 
-       We have put the pattern into parentheses, and caused the  recursion  to
+       We  have  put the pattern into parentheses, and caused the recursion to
        refer to them instead of the whole pattern.
 
-       In  a  larger  pattern,  keeping  track  of  parenthesis numbers can be
-       tricky. This is made easier by the use of relative references. (A  Perl
-       5.10  feature.)   Instead  of  (?1)  in the pattern above you can write
+       In a larger pattern,  keeping  track  of  parenthesis  numbers  can  be
+       tricky.  This is made easier by the use of relative references. (A Perl
+       5.10 feature.)  Instead of (?1) in the  pattern  above  you  can  write
        (?-2) to refer to the second most recently opened parentheses preceding
-       the  recursion.  In  other  words,  a  negative number counts capturing
+       the recursion. In other  words,  a  negative  number  counts  capturing
        parentheses leftwards from the point at which it is encountered.
 
-       It is also possible to refer to  subsequently  opened  parentheses,  by
-       writing  references  such  as (?+2). However, these cannot be recursive
-       because the reference is not inside the  parentheses  that  are  refer-
-       enced.  They  are  always  "subroutine" calls, as described in the next
+       It  is  also  possible  to refer to subsequently opened parentheses, by
+       writing references such as (?+2). However, these  cannot  be  recursive
+       because  the  reference  is  not inside the parentheses that are refer-
+       enced. They are always "subroutine" calls, as  described  in  the  next
        section.
 
-       An alternative approach is to use named parentheses instead.  The  Perl
-       syntax  for  this  is (?&name); PCRE's earlier syntax (?P>name) is also
+       An  alternative  approach is to use named parentheses instead. The Perl
+       syntax for this is (?&name); PCRE's earlier syntax  (?P>name)  is  also
        supported. We could rewrite the above example as follows:
 
          (?<pn> \( ( (?>[^()]+) | (?&pn) )* \) )
 
-       If there is more than one subpattern with the same name,  the  earliest
+       If  there  is more than one subpattern with the same name, the earliest
        one is used.
 
-       This  particular  example pattern that we have been looking at contains
-       nested unlimited repeats, and so the use of atomic grouping for  match-
-       ing  strings  of non-parentheses is important when applying the pattern
+       This particular example pattern that we have been looking  at  contains
+       nested  unlimited repeats, and so the use of atomic grouping for match-
+       ing strings of non-parentheses is important when applying  the  pattern
        to strings that do not match. For example, when this pattern is applied
        to
 
          (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()
 
-       it  yields "no match" quickly. However, if atomic grouping is not used,
-       the match runs for a very long time indeed because there  are  so  many
-       different  ways  the  + and * repeats can carve up the subject, and all
+       it yields "no match" quickly. However, if atomic grouping is not  used,
+       the  match  runs  for a very long time indeed because there are so many
+       different ways the + and * repeats can carve up the  subject,  and  all
        have to be tested before failure can be reported.
 
        At the end of a match, the values set for any capturing subpatterns are
        those from the outermost level of the recursion at which the subpattern
-       value is set.  If you want to obtain  intermediate  values,  a  callout
-       function  can be used (see below and the pcrecallout documentation). If
+       value  is  set.   If  you want to obtain intermediate values, a callout
+       function can be used (see below and the pcrecallout documentation).  If
        the pattern above is matched against
 
          (ab(cd)ef)
 
-       the value for the capturing parentheses is  "ef",  which  is  the  last
-       value  taken  on at the top level. If additional parentheses are added,
+       the  value  for  the  capturing  parentheses is "ef", which is the last
+       value taken on at the top level. If additional parentheses  are  added,
        giving
 
          \( ( ( (?>[^()]+) | (?R) )* ) \)
             ^                        ^
             ^                        ^
 
-       the string they capture is "ab(cd)ef", the contents of  the  top  level
-       parentheses.  If there are more than 15 capturing parentheses in a pat-
+       the  string  they  capture is "ab(cd)ef", the contents of the top level
+       parentheses. If there are more than 15 capturing parentheses in a  pat-
        tern, PCRE has to obtain extra memory to store data during a recursion,
-       which  it  does  by  using pcre_malloc, freeing it via pcre_free after-
-       wards. If  no  memory  can  be  obtained,  the  match  fails  with  the
+       which it does by using pcre_malloc, freeing  it  via  pcre_free  after-
+       wards.  If  no  memory  can  be  obtained,  the  match  fails  with the
        PCRE_ERROR_NOMEMORY error.
 
-       Do  not  confuse  the (?R) item with the condition (R), which tests for
-       recursion.  Consider this pattern, which matches text in  angle  brack-
-       ets,  allowing for arbitrary nesting. Only digits are allowed in nested
-       brackets (that is, when recursing), whereas any characters are  permit-
+       Do not confuse the (?R) item with the condition (R),  which  tests  for
+       recursion.   Consider  this pattern, which matches text in angle brack-
+       ets, allowing for arbitrary nesting. Only digits are allowed in  nested
+       brackets  (that is, when recursing), whereas any characters are permit-
        ted at the outer level.
 
          < (?: (?(R) \d++  | [^<>]*+) | (?R)) * >
 
-       In  this  pattern, (?(R) is the start of a conditional subpattern, with
-       two different alternatives for the recursive and  non-recursive  cases.
+       In this pattern, (?(R) is the start of a conditional  subpattern,  with
+       two  different  alternatives for the recursive and non-recursive cases.
        The (?R) item is the actual recursive call.
 
    Recursion difference from Perl
 
-       In  PCRE (like Python, but unlike Perl), a recursive subpattern call is
+       In PCRE (like Python, but unlike Perl), a recursive subpattern call  is
        always treated as an atomic group. That is, once it has matched some of
        the subject string, it is never re-entered, even if it contains untried
-       alternatives and there is a subsequent matching failure.  This  can  be
-       illustrated  by the following pattern, which purports to match a palin-
-       dromic string that contains an odd number of characters  (for  example,
+       alternatives  and  there  is a subsequent matching failure. This can be
+       illustrated by the following pattern, which purports to match a  palin-
+       dromic  string  that contains an odd number of characters (for example,
        "a", "aba", "abcba", "abcdcba"):
 
          ^(.|(.)(?1)\2)$
 
        The idea is that it either matches a single character, or two identical
-       characters surrounding a sub-palindrome. In Perl, this  pattern  works;
-       in  PCRE  it  does  not if the subject string is longer than three characters.
+       characters  surrounding  a sub-palindrome. In Perl, this pattern works;
+       in PCRE it does not if the subject string is longer than three  characters.
        Consider the subject string "abcba":
 
-       At the top level, the first character is matched, but as it is  not  at
+       At  the  top level, the first character is matched, but as it is not at
        the end of the string, the first alternative fails; the second alterna-
        tive is taken and the recursion kicks in. The recursive call to subpat-
-       tern  1  successfully  matches the next character ("b"). (Note that the
+       tern 1 successfully matches the next character ("b").  (Note  that  the
        beginning and end of line tests are not part of the recursion).
 
-       Back at the top level, the next character ("c") is compared  with  what
-       subpattern  2 matched, which was "a". This fails. Because the recursion
-       is treated as an atomic group, there are now  no  backtracking  points,
-       and  so  the  entire  match fails. (Perl is able, at this point, to re-
-       enter the recursion and try the second alternative.)  However,  if  the
+       Back  at  the top level, the next character ("c") is compared with what
+       subpattern 2 matched, which was "a". This fails. Because the  recursion
+       is  treated  as  an atomic group, there are now no backtracking points,
+       and so the entire match fails. (Perl is able, at  this  point,  to  re-
+       enter  the  recursion  and try the second alternative.) However, if the
        pattern is written with the alternatives in the other order, things are
        different:
 
          ^((.)(?1)\2|.)$
 
-       This time, the recursing alternative is tried first, and  continues  to
-       recurse  until  it runs out of characters, at which point the recursion
-       fails. But this time we do have  another  alternative  to  try  at  the
-       higher  level.  That  is  the  big difference: in the previous case the
+       This  time,  the recursing alternative is tried first, and continues to
+       recurse until it runs out of characters, at which point  the  recursion
+       fails.  But  this  time  we  do  have another alternative to try at the
+       higher level. That is the big difference:  in  the  previous  case  the
        remaining alternative is at a deeper recursion level, which PCRE cannot
        use.
 
        To change the pattern so that it matches all palindromic strings, not just
-       those with an odd number of characters, it is tempting  to  change  the
+       those  with  an  odd number of characters, it is tempting to change the
        pattern to this:
 
          ^((.)(?1)\2|.?)$
 
-       Again,  this  works  in Perl, but not in PCRE, and for the same reason.
-       When a deeper recursion has matched a single character,  it  cannot  be
-       entered  again  in  order  to match an empty string. The solution is to
-       separate the two cases, and write out the odd and even cases as  alter-
+       Again, this works in Perl, but not in PCRE, and for  the  same  reason.
+       When  a  deeper  recursion has matched a single character, it cannot be
+       entered again in order to match an empty string.  The  solution  is  to
+       separate  the two cases, and write out the odd and even cases as alter-
        natives at the higher level:
 
          ^(?:((.)(?1)\2|)|((.)(?3)\4|.))
 
-       If  you  want  to match typical palindromic phrases, the pattern has to
+       If you want to match typical palindromic phrases, the  pattern  has  to
        ignore all non-word characters, which can be done like this:
 
          ^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$
 
        If run with the PCRE_CASELESS option, this pattern matches phrases such
        as "A man, a plan, a canal: Panama!" and it works well in both PCRE and
-       Perl. Note the use of the possessive quantifier *+ to avoid  backtrack-
-       ing  into  sequences of non-word characters. Without this, PCRE takes a
-       great deal longer (ten times or more) to  match  typical  phrases,  and
+       Perl.  Note the use of the possessive quantifier *+ to avoid backtrack-
+       ing into sequences of non-word characters. Without this, PCRE  takes  a
+       great  deal  longer  (ten  times or more) to match typical phrases, and
        Perl takes so long that you think it has gone into a loop.
 
 
 SUBPATTERNS AS SUBROUTINES
 
        If the syntax for a recursive subpattern reference (either by number or
-       by name) is used outside the parentheses to which it refers,  it  oper-
-       ates  like a subroutine in a programming language. The "called" subpat-
+       by  name)  is used outside the parentheses to which it refers, it oper-
+       ates like a subroutine in a programming language. The "called"  subpat-
        tern may be defined before or after the reference. A numbered reference
        can be absolute or relative, as in these examples:
 
@@ -4915,110 +4920,110 @@ SUBPATTERNS AS SUBROUTINES
 
          (sens|respons)e and \1ibility
 
-       matches  "sense and sensibility" and "response and responsibility", but
+       matches "sense and sensibility" and "response and responsibility",  but
        not "sense and responsibility". If instead the pattern
 
          (sens|respons)e and (?1)ibility
 
-       is used, it does match "sense and responsibility" as well as the  other
-       two  strings.  Another  example  is  given  in the discussion of DEFINE
+       is  used, it does match "sense and responsibility" as well as the other
+       two strings. Another example is  given  in  the  discussion  of  DEFINE
        above.
 
        Like recursive subpatterns, a "subroutine" call is always treated as an
-       atomic  group. That is, once it has matched some of the subject string,
-       it is never re-entered, even if it contains  untried  alternatives  and
+       atomic group. That is, once it has matched some of the subject  string,
+       it  is  never  re-entered, even if it contains untried alternatives and
        there is a subsequent matching failure.
 
-       When  a  subpattern is used as a subroutine, processing options such as
+       When a subpattern is used as a subroutine, processing options  such  as
        case-independence are fixed when the subpattern is defined. They cannot
        be changed for different calls. For example, consider this pattern:
 
          (abc)(?i:(?-1))
 
-       It  matches  "abcabc". It does not match "abcABC" because the change of
+       It matches "abcabc". It does not match "abcABC" because the  change  of
        processing option does not affect the called subpattern.
 
 
 ONIGURUMA SUBROUTINE SYNTAX
 
-       For compatibility with Oniguruma, the non-Perl syntax \g followed by  a
+       For  compatibility with Oniguruma, the non-Perl syntax \g followed by a
        name or a number enclosed either in angle brackets or single quotes, is
-       an alternative syntax for referencing a  subpattern  as  a  subroutine,
-       possibly  recursively. Here are two of the examples used above, rewrit-
+       an  alternative  syntax  for  referencing a subpattern as a subroutine,
+       possibly recursively. Here are two of the examples used above,  rewrit-
        ten using this syntax:
 
          (?<pn> \( ( (?>[^()]+) | \g<pn> )* \) )
          (sens|respons)e and \g'1'ibility
 
-       PCRE supports an extension to Oniguruma: if a number is preceded  by  a
+       PCRE  supports  an extension to Oniguruma: if a number is preceded by a
        plus or a minus sign it is taken as a relative reference. For example:
 
          (abc)(?i:\g<-1>)
 
-       Note  that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are not
-       synonymous. The former is a back reference; the latter is a  subroutine
+       Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are  not
+       synonymous.  The former is a back reference; the latter is a subroutine
        call.
 
 
 CALLOUTS
 
        Perl has a feature whereby using the sequence (?{...}) causes arbitrary
-       Perl code to be obeyed in the middle of matching a regular  expression.
+       Perl  code to be obeyed in the middle of matching a regular expression.
        This makes it possible, amongst other things, to extract different sub-
        strings that match the same pair of parentheses when there is a repeti-
        tion.
 
        PCRE provides a similar feature, but of course it cannot obey arbitrary
        Perl code. The feature is called "callout". The caller of PCRE provides
-       an  external function by putting its entry point in the global variable
-       pcre_callout.  By default, this variable contains NULL, which  disables
+       an external function by putting its entry point in the global  variable
+       pcre_callout.   By default, this variable contains NULL, which disables
        all calling out.
 
-       Within  a  regular  expression,  (?C) indicates the points at which the
-       external function is to be called. If you want  to  identify  different
-       callout  points, you can put a number less than 256 after the letter C.
-       The default value is zero.  For example, this pattern has  two  callout
+       Within a regular expression, (?C) indicates the  points  at  which  the
+       external  function  is  to be called. If you want to identify different
+       callout points, you can put a number less than 256 after the letter  C.
+       The  default  value is zero.  For example, this pattern has two callout
        points:
 
          (?C1)abc(?C2)def
 
        If the PCRE_AUTO_CALLOUT flag is passed to pcre_compile(), callouts are
-       automatically installed before each item in the pattern. They  are  all
+       automatically  installed  before each item in the pattern. They are all
        numbered 255.
 
        During matching, when PCRE reaches a callout point (and pcre_callout is
-       set), the external function is called. It is provided with  the  number
-       of  the callout, the position in the pattern, and, optionally, one item
-       of data originally supplied by the caller of pcre_exec().  The  callout
-       function  may cause matching to proceed, to backtrack, or to fail alto-
+       set),  the  external function is called. It is provided with the number
+       of the callout, the position in the pattern, and, optionally, one  item
+       of  data  originally supplied by the caller of pcre_exec(). The callout
+       function may cause matching to proceed, to backtrack, or to fail  alto-
        gether. A complete description of the interface to the callout function
        is given in the pcrecallout documentation.
 
 
 BACKTRACKING CONTROL
 
-       Perl  5.10 introduced a number of "Special Backtracking Control Verbs",
+       Perl 5.10 introduced a number of "Special Backtracking Control  Verbs",
        which are described in the Perl documentation as "experimental and sub-
-       ject  to  change or removal in a future version of Perl". It goes on to
-       say: "Their usage in production code should be noted to avoid  problems
+       ject to change or removal in a future version of Perl". It goes  on  to
+       say:  "Their usage in production code should be noted to avoid problems
        during upgrades." The same remarks apply to the PCRE features described
        in this section.
 
-       Since these verbs are specifically related  to  backtracking,  most  of
-       them  can  be  used  only  when  the  pattern  is  to  be matched using
+       Since  these  verbs  are  specifically related to backtracking, most of
+       them can be  used  only  when  the  pattern  is  to  be  matched  using
        pcre_exec(), which uses a backtracking algorithm. With the exception of
        (*FAIL), which behaves like a failing negative assertion, they cause an
        error if encountered by pcre_dfa_exec().
 
        If any of these verbs are used in an assertion subpattern, their effect
-       is  confined  to that subpattern; it does not extend to the surrounding
-       pattern.  Note that assertion subpatterns are processed as anchored  at
+       is confined to that subpattern; it does not extend to  the  surrounding
+       pattern.   Note that assertion subpatterns are processed as anchored at
        the point where they are tested.
 
-       The  new verbs make use of what was previously invalid syntax: an open-
+       The new verbs make use of what was previously invalid syntax: an  open-
        ing parenthesis followed by an asterisk. In Perl, they are generally of
        the form (*VERB:ARG) but PCRE does not support the use of arguments, so
-       its general form is just (*VERB). Any number of these verbs  may  occur
+       its  general  form is just (*VERB). Any number of these verbs may occur
        in a pattern. There are two kinds:
 
    Verbs that act immediately
@@ -5027,94 +5032,94 @@ BACKTRACKING CONTROL
 
           (*ACCEPT)
 
-       This  verb causes the match to end successfully, skipping the remainder
-       of the pattern. When inside a recursion, only the innermost pattern  is
-       ended  immediately.  If  the (*ACCEPT) is inside capturing parentheses,
+       This verb causes the match to end successfully, skipping the  remainder
+       of  the pattern. When inside a recursion, only the innermost pattern is
+       ended immediately. If the (*ACCEPT) is  inside  capturing  parentheses,
        the data so far is captured. (This feature was added to PCRE at release
        8.00.) For example:
 
          A((?:A|B(*ACCEPT)|C)D)
 
-       This  matches  "AB", "AAD", or "ACD"; when it matches "AB", "B" is cap-
+       This matches "AB", "AAD", or "ACD"; when it matches "AB", "B"  is  cap-
        tured by the outer parentheses.
 
          (*FAIL) or (*F)
 
-       This verb causes the match to fail, forcing backtracking to  occur.  It
-       is  equivalent to (?!) but easier to read. The Perl documentation notes
-       that it is probably useful only when combined  with  (?{})  or  (??{}).
-       Those  are,  of course, Perl features that are not present in PCRE. The
-       nearest equivalent is the callout feature, as for example in this  pat-
+       This  verb  causes the match to fail, forcing backtracking to occur. It
+       is equivalent to (?!) but easier to read. The Perl documentation  notes
+       that  it  is  probably  useful only when combined with (?{}) or (??{}).
+       Those are, of course, Perl features that are not present in  PCRE.  The
+       nearest  equivalent is the callout feature, as for example in this pat-
        tern:
 
          a+(?C)(*FAIL)
 
-       A  match  with the string "aaaa" always fails, but the callout is taken
+       A match with the string "aaaa" always fails, but the callout  is  taken
        before each backtrack happens (in this example, 10 times).
 
    Verbs that act after backtracking
 
        The following verbs do nothing when they are encountered. Matching con-
-       tinues  with what follows, but if there is no subsequent match, a fail-
-       ure is forced.  The verbs  differ  in  exactly  what  kind  of  failure
+       tinues with what follows, but if there is no subsequent match, a  fail-
+       ure  is  forced.   The  verbs  differ  in  exactly what kind of failure
        occurs.
 
          (*COMMIT)
 
-       This  verb  causes  the whole match to fail outright if the rest of the
-       pattern does not match. Even if the pattern is unanchored,  no  further
-       attempts  to find a match by advancing the start point take place. Once
-       (*COMMIT) has been passed, pcre_exec() is committed to finding a  match
+       This verb causes the whole match to fail outright if the  rest  of  the
+       pattern  does  not match. Even if the pattern is unanchored, no further
+       attempts to find a match by advancing the start point take place.  Once
+       (*COMMIT)  has been passed, pcre_exec() is committed to finding a match
        at the current starting point, or not at all. For example:
 
          a+(*COMMIT)b
 
-       This  matches  "xxaab" but not "aacaab". It can be thought of as a kind
+       This matches "xxaab" but not "aacaab". It can be thought of as  a  kind
        of dynamic anchor, or "I've started, so I must finish."
 
          (*PRUNE)
 
-       This verb causes the match to fail at the current position if the  rest
+       This  verb causes the match to fail at the current position if the rest
        of the pattern does not match. If the pattern is unanchored, the normal
-       "bumpalong" advance to the next starting character then happens.  Back-
-       tracking  can  occur as usual to the left of (*PRUNE), or when matching
-       to the right of (*PRUNE), but if there is no match to the right,  back-
-       tracking  cannot  cross (*PRUNE).  In simple cases, the use of (*PRUNE)
+       "bumpalong"  advance to the next starting character then happens. Back-
+       tracking can occur as usual to the left of (*PRUNE), or  when  matching
+       to  the right of (*PRUNE), but if there is no match to the right, back-
+       tracking cannot cross (*PRUNE).  In simple cases, the use  of  (*PRUNE)
        is just an alternative to an atomic group or possessive quantifier, but
-       there  are  some uses of (*PRUNE) that cannot be expressed in any other
+       there are some uses of (*PRUNE) that cannot be expressed in  any  other
        way.
 
          (*SKIP)
 
-       This verb is like (*PRUNE), except that if the pattern  is  unanchored,
-       the  "bumpalong" advance is not to the next character, but to the posi-
-       tion in the subject where (*SKIP) was  encountered.  (*SKIP)  signifies
-       that  whatever  text  was  matched leading up to it cannot be part of a
+       This  verb  is like (*PRUNE), except that if the pattern is unanchored,
+       the "bumpalong" advance is not to the next character, but to the  posi-
+       tion  in  the  subject where (*SKIP) was encountered. (*SKIP) signifies
+       that whatever text was matched leading up to it cannot  be  part  of  a
        successful match. Consider:
 
          a+(*SKIP)b
 
-       If the subject is "aaaac...",  after  the  first  match  attempt  fails
-       (starting  at  the  first  character in the string), the starting point
+       If  the  subject  is  "aaaac...",  after  the first match attempt fails
+       (starting at the first character in the  string),  the  starting  point
        skips on to start the next attempt at "c". Note that a possessive quan-
-       tifier  does not have the same effect in this example; although it would
-       suppress backtracking  during  the  first  match  attempt,  the  second
-       attempt  would  start at the second character instead of skipping on to
+       tifier does not have the same effect in this example; although it  would
+       suppress  backtracking  during  the  first  match  attempt,  the second
+       attempt would start at the second character instead of skipping  on  to
        "c".
 
          (*THEN)
 
        This verb causes a skip to the next alternation if the rest of the pat-
        tern does not match. That is, it cancels pending backtracking, but only
-       within the current alternation. Its name  comes  from  the  observation
+       within  the  current  alternation.  Its name comes from the observation
        that it can be used for a pattern-based if-then-else block:
 
          ( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ...
 
-       If  the COND1 pattern matches, FOO is tried (and possibly further items
-       after the end of the group if FOO succeeds);  on  failure  the  matcher
-       skips  to  the second alternative and tries COND2, without backtracking
-       into COND1. If (*THEN) is used outside  of  any  alternation,  it  acts
+       If the COND1 pattern matches, FOO is tried (and possibly further  items
+       after  the  end  of  the group if FOO succeeds); on failure the matcher
+       skips to the second alternative and tries COND2,  without  backtracking
+       into  COND1.  If  (*THEN)  is  used outside of any alternation, it acts
        exactly like (*PRUNE).
 
 
@@ -5132,7 +5137,7 @@ AUTHOR
 
 REVISION
 
-       Last updated: 18 September 2009
+       Last updated: 22 September 2009
        Copyright (c) 1997-2009 University of Cambridge.
 ------------------------------------------------------------------------------
  
diff --git a/doc/pcreapi.3 b/doc/pcreapi.3
index 9175820..cf4a013 100644
--- a/doc/pcreapi.3
+++ b/doc/pcreapi.3
@@ -427,9 +427,12 @@ If \fIerrptr\fP is NULL, \fBpcre_compile()\fP returns NULL immediately.
 Otherwise, if compilation of a pattern fails, \fBpcre_compile()\fP returns
 NULL, and sets the variable pointed to by \fIerrptr\fP to point to a textual
 error message. This is a static string that is part of the library. You must
-not try to free it. The offset from the start of the pattern to the character
-where the error was discovered is placed in the variable pointed to by
-\fIerroffset\fP, which must not be NULL. If it is, an immediate error is given.
+not try to free it. The byte offset from the start of the pattern to the
+character that was being processed when the error was discovered is placed in
+the variable pointed to by \fIerroffset\fP, which must not be NULL. If it is,
+an immediate error is given. Some errors are not detected until checks are
+carried out when the whole pattern has been scanned; in this case the offset is
+set to the end of the pattern.
 .P
 If \fBpcre_compile2()\fP is used instead of \fBpcre_compile()\fP, and the
 \fIerrorcodeptr\fP argument is not NULL, a non-zero error code number is
@@ -2020,6 +2023,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 11 September 2009
+Last updated: 22 September 2009
 Copyright (c) 1997-2009 University of Cambridge.
 .fi
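Editorial sketch, not part of the commit: the recursive pattern
\( ( (?>[^()]+) | (?R) )* \) discussed in the doc/pcre.txt hunks above can be
read as a recursive function. The C code below is a hand-written illustration
that does not use the PCRE library, and the name match_parens is hypothetical.
The inner loop never gives characters back, which mirrors the atomic group
(?>[^()]+), and the self-call stands in for (?R).

```c
#include <stddef.h>

/* Hypothetical helper, not a PCRE function.
 * Tries to match the equivalent of  \( ( (?>[^()]+) | (?R) )* \)
 * anchored at s. Returns the number of characters matched, or -1. */
static long match_parens(const char *s)
{
    const char *p = s;

    if (*p != '(')
        return -1;                      /* \( */
    p++;

    for (;;) {
        if (*p != '\0' && *p != '(' && *p != ')') {
            /* (?>[^()]+): take the whole run and never give any back */
            while (*p != '\0' && *p != '(' && *p != ')')
                p++;
        } else if (*p == '(') {
            long n = match_parens(p);   /* (?R): recurse on a nested group */
            if (n < 0)
                return -1;
            p += n;
        } else {
            break;                      /* ')' or end of string ends the loop */
        }
    }

    if (*p != ')')
        return -1;                      /* \) */
    return (long)(p - s) + 1;
}
```

Under these assumptions, match_parens("(ab(cd)ef)") returns 10, and
match_parens("(aaaa()") returns -1 after a single left-to-right pass, which is
the quick "no match" behaviour the text attributes to atomic grouping.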
diff --git a/doc/pcrepattern.3 b/doc/pcrepattern.3
index 21e3f71..0b26453 100644
--- a/doc/pcrepattern.3
+++ b/doc/pcrepattern.3
@@ -333,7 +333,12 @@ syntax for referencing a subpattern as a "subroutine". Details are discussed
 later.
 .\"
 Note that \eg{...} (Perl syntax) and \eg<...> (Oniguruma syntax) are \fInot\fP
-synonymous. The former is a back reference; the latter is a subroutine call.
+synonymous. The former is a back reference; the latter is a 
+.\" HTML <a href="#subpatternsassubroutines">
+.\" </a>
+subroutine
+.\"
+call.
 .
 .
 .SS "Generic character types"
@@ -1669,13 +1674,14 @@ is permitted, but
 .sp
 causes an error at compile time. Branches that match different length strings
 are permitted only at the top level of a lookbehind assertion. This is an
-extension compared with Perl (at least for 5.8), which requires all branches to
+extension compared with Perl (5.8 and 5.10), which requires all branches to
 match the same length of string. An assertion such as
 .sp
   (?<=ab(c|de))
 .sp
 is not permitted, because its single top-level branch can match two different
-lengths, but it is acceptable if rewritten to use two top-level branches:
+lengths, but it is acceptable to PCRE if rewritten to use two top-level
+branches:
 .sp
   (?<=abc|abde)
 .sp
@@ -1684,8 +1690,8 @@ In some cases, the Perl 5.10 escape sequence \eK
 .\" </a>
 (see above)
 .\"
-can be used instead of a lookbehind assertion; this is not restricted to a
-fixed-length.
+can be used instead of a lookbehind assertion to get round the fixed-length 
+restriction.
 .P
 The implementation of lookbehind assertions is, for each alternative, to
 temporarily move the current position back by the fixed length and then try to
@@ -1697,6 +1703,18 @@ to appear in lookbehind assertions, because it makes it impossible to calculate
 the length of the lookbehind. The \eX and \eR escapes, which can match
 different numbers of bytes, are also not permitted.
 .P
+.\" HTML <a href="#subpatternsassubroutines">
+.\" </a>
+"Subroutine"
+.\"
+calls (see below) such as (?2) or (?&X) are permitted in lookbehinds, as long
+as the subpattern matches a fixed-length string. 
+.\" HTML <a href="#recursion">
+.\" </a>
+Recursion,
+.\"
+however, is not supported.
+.P
 Possessive quantifiers can be used in conjunction with lookbehind assertions to
 specify efficient matching at the end of the subject string. Consider a simple
 pattern such as
@@ -1841,8 +1859,12 @@ the condition is true if the most recent recursion is into the subpattern whose
 number or name is given. This condition does not check the entire recursion
 stack.
 .P
-At "top level", all these recursion test conditions are false. Recursive
-patterns are described below.
+At "top level", all these recursion test conditions are false. 
+.\" HTML <a href="#recursion">
+.\" </a>
+Recursive patterns
+.\"
+are described below.
 .
 .SS "Defining subpatterns for use by reference only"
 .rs
@@ -1851,7 +1873,11 @@ If the condition is the string (DEFINE), and there is no subpattern with the
 name DEFINE, the condition is always false. In this case, there may be only one
 alternative in the subpattern. It is always skipped if control reaches this
 point in the pattern; the idea of DEFINE is that it can be used to define
-"subroutines" that can be referenced from elsewhere. (The use of "subroutines"
+"subroutines" that can be referenced from elsewhere. (The use of 
+.\" HTML <a href="#subpatternsassubroutines">
+.\" </a>
+"subroutines"
+.\"
 is described below.) For example, a pattern to match an IPv4 address could be
 written like this (ignore whitespace and line breaks):
 .sp
@@ -1926,7 +1952,11 @@ this kind of recursion was subsequently introduced into Perl at release 5.10.
 .P
 A special item that consists of (? followed by a number greater than zero and a
 closing parenthesis is a recursive call of the subpattern of the given number,
-provided that it occurs inside that subpattern. (If not, it is a "subroutine"
+provided that it occurs inside that subpattern. (If not, it is a 
+.\" HTML <a href="#subpatternsassubroutines">
+.\" </a>
+"subroutine"
+.\"
 call, which is described in the next section.) The special item (?R) or (?0) is
 a recursive call of the entire regular expression.
 .P
@@ -1958,7 +1988,11 @@ it is encountered.
 It is also possible to refer to subsequently opened parentheses, by writing
 references such as (?+2). However, these cannot be recursive because the
 reference is not inside the parentheses that are referenced. They are always
-"subroutine" calls, as described in the next section.
+.\" HTML <a href="#subpatternsassubroutines">
+.\" </a>
+"subroutine"
+.\"
+calls, as described in the next section.
 .P
 An alternative approach is to use named parentheses instead. The Perl syntax
 for this is (?&name); PCRE's earlier syntax (?P>name) is also supported. We
@@ -2317,6 +2351,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 18 September 2009
+Last updated: 22 September 2009
 Copyright (c) 1997-2009 University of Cambridge.
 .fi
diff --git a/maint/README b/maint/README
index 5168936..aa904d6 100644
--- a/maint/README
+++ b/maint/README
@@ -36,6 +36,8 @@ MultiStage2.py   A Python script that generates the file pcre_ucd.c from three
                  The generated file contains the tables for a 2-stage lookup
                  of Unicode properties.  
 
+README           This file.
+
 Unicode.tables   The files in this directory, DerivedGeneralCategory.txt, 
                  Scripts.txt and UnicodeData.txt, were downloaded from the
                  Unicode web site. They contain information about Unicode
@@ -62,16 +64,15 @@ Updating to a new Unicode release
 ---------------------------------
 
 When there is a new release of Unicode, the files in Unicode.tables must be
-refreshed from the web site. If the new version of Unicode adds new character 
+refreshed from the web site. If the new version of Unicode adds new character
 scripts, the source file ucp.h and both the MultiStage2.py and the
-GenerateUtt.py scripts must be edited to add the new names. Then the
-MultiStage2.py script can then be run to generate a new version of pcre_ucd.c
-and the GenerateUtt.py can be run to generate the tricky tables for inclusion
-in pcre_tables.c.
+GenerateUtt.py scripts must be edited to add the new names. Then MultiStage2.py
+can be run to generate a new version of pcre_ucd.c, and GenerateUtt.py can be
+run to generate the tricky tables for inclusion in pcre_tables.c.
 
-The ucptest program can then be compiled and used to check that the new tables
-in pcre_ucd.c work properly, using the data files in ucptestdata to check a
-number of test characters.
+The ucptest program can be compiled and used to check that the new tables in
+pcre_ucd.c work properly, using the data files in ucptestdata to check a number
+of test characters.
 
 
 Preparing for a PCRE release
@@ -80,8 +81,7 @@ Preparing for a PCRE release
 This section contains a checklist of things that I consult before building a
 distribution for a new release.
 
-. Ensure that the version number and version date are correct in configure.ac,
-  ChangeLog, and NEWS.
+. Ensure that the version number and version date are correct in configure.ac.
   
 . If new build options have been added, ensure that they are added to the CMake
   files as well as to the autoconf files. 
@@ -91,9 +91,11 @@ distribution for a new release.
 . Compile and test with many different config options, and combinations of
   options. The maint/ManyConfigTests script now encapsulates this testing.
 
-. Run perltest.pl on the test data for tests 1 and 4. The output should match
-  the PCRE test output, apart from the version identification at the top. The
-  other tests are not Perl-compatible (they use various special PCRE options).
+. Run perltest.pl on the test data for tests 1, 4, 6, and 11. The first two can 
+  be run with Perl 5.8 or 5.10; the last two require Perl 5.10. The output
+  should match the PCRE test output, apart from the version identification at
+  the start of each test. The other tests are not Perl-compatible (they use
+  various PCRE-specific features or options).
 
 . Test with valgrind by running "RunTest valgrind". There is also "RunGrepTest
   valgrind", though that takes quite a long time.
@@ -116,9 +118,9 @@ distribution for a new release.
   used" warnings for the modules in which there is no call to memmove(). These
   can be ignored.
 
-. Documentation: check AUTHORS, COPYING, ChangeLog (check date), INSTALL,
-  LICENCE, NEWS (check date), NON-UNIX-USE, and README. Many of these won't
-  need changing, but over the long term things do change.
+. Documentation: check AUTHORS, COPYING, ChangeLog (check version and date), 
+  INSTALL, LICENCE, NEWS (check version and date), NON-UNIX-USE, and README.
+  Many of these won't need changing, but over the long term things do change.
 
 . Man pages: Check all man pages for \ not followed by e or f or " because
   that indicates a markup error.
@@ -138,7 +140,7 @@ spaces). Then run "make distcheck" to create the tarballs and the zipball.
 Double-check with "svn status", then create an SVN tagged copy:
 
   svn copy svn://vcs.exim.org/pcre/code/trunk \
-           svn://vcs.exim.org/pcre/code/tags/pcre-7.x 
+           svn://vcs.exim.org/pcre/code/tags/pcre-8.xx 
 
 Don't forget to update Freshmeat when the new release is out, and to tell
 webmaster@pcre.org and the mailing list.
@@ -166,7 +168,7 @@ others are relatively new.
     to have little effect, and maybe makes things worse.
 
   * "Ends with literal string" - note that a single character doesn't gain much
-    over the existing "required byte" (reqbyte) feature that just saves one
+    over the existing "required byte" (reqbyte) feature that just remembers one
     byte.
 
   * These probably need to go in study():
@@ -176,9 +178,14 @@ others are relatively new.
     o A required byte from alternatives - not just the last char, but an
       earlier one if common to all alternatives.
 
-    o Minimum length of subject needed.
+    o Minimum length of subject needed (see also next . bullet).
 
     o Friedl contains other ideas.
+    
+. There was a request for a way of finding the minimum subject length that can
+  match a given pattern. (If this were available, it could be usefully added
+  to study() - see above.) This is easy for simple cases, but I haven't figured 
+  out how to handle recursion.   
 
 . If Perl gets to a consistent state over the settings of capturing sub-
   patterns inside repeats, see if we can match it. One example of the
@@ -213,10 +220,10 @@ others are relatively new.
 
   * Option to use NUL as a line terminator in subject strings. This could now
     be done relatively easily since the extension to support LF, CR, and CRLF.
-    If this is done, a suitable option for pcregrep is also required.
+    If it is done, a suitable option for pcregrep is also required.
 
 . Option to provide the pattern with a length instead of with a NUL terminator.
-  This probably affects quite a few places in the code.
+  This affects quite a few places in the code and is not trivial.
 
 . Catch SIGSEGV for stack overflows?
 
@@ -231,7 +238,7 @@ others are relatively new.
   preceded by a blank line, instead of adding it to every matched line, and (b)
   support --outputfile=name.
 
-. Consider making UTF-8 and UCP the default for PCRE n.0 for some n > 7.
+. Consider making UTF-8 and UCP the default for PCRE n.0 for some n > 8.
 
 . Add a user pointer to pcre_malloc/free functions -- some option would be
   needed to retain backward compatibility.
@@ -268,9 +275,37 @@ others are relatively new.
 . Callouts with arguments: (?Cn:ARG) for instance.
 
 . A user is going to supply a patch to generalize the API for user-specific 
-  memory allocation so that it is more flexible in threaded environments.
+  memory allocation so that it is more flexible in threaded environments. This
+  was promised a long time ago, and never appeared...
+  
+. Write a function that generates random matching strings for a compiled regex.
+
+. Write a wrapper to maintain a structure with specified runtime parameters, 
+  such as recurse limit, and pass these to PCRE each time it is called. Also 
+  maybe malloc and free. A user sent a prototype.
+  
+. Pcregrep: an option to specify the output line separator, either as a string 
+  or select from a fixed list. This is not dead easy, because at the moment it 
+  outputs whatever is in the input file.
+  
+. Improve the code for duplicate checking in pcre_dfa_exec(). An incomplete, 
+  non-thread-safe patch showed that this can help performance for patterns 
+  where there are many alternatives. However, a simple thread-safe 
+  implementation that I tried made things worse in many simple cases, so this 
+  is not an obviously good thing.
+  
+. Make the longest lookbehind available via pcre_fullinfo(). This is not 
+  straightforward because lookbehinds can be nested inside lookbehinds. This 
+  case will have to be identified, and the amounts added. This should then give 
+  the maximum possible lookbehind length. The reason for wanting this is to 
+  help when implementing multi-segment matching using pcre_exec() with partial
+  matching and overlapping segments.
+  
+. PCRE cannot at present distinguish between subpatterns with different names,
+  but the same number (created by the use of ?|). In order to do so, a way of 
+  remembering *which* subpattern numbered n matched is needed. Bugzilla #760.
 
 Philip Hazel
 Email local part: ph10
 Email domain: cam.ac.uk
-Last updated: 26 August 2008
+Last updated: 20 September 2009
diff --git a/pcre_compile.c b/pcre_compile.c
index 69fa428..56c97cb 100644
--- a/pcre_compile.c
+++ b/pcre_compile.c
@@ -1331,23 +1331,34 @@ for (;;)
 
 
 /*************************************************
-*        Find the fixed length of a pattern      *
+*        Find the fixed length of a branch       *
 *************************************************/
 
-/* Scan a pattern and compute the fixed length of subject that will match it,
+/* Scan a branch and compute the fixed length of subject that will match it,
 if the length is fixed. This is needed for dealing with backward assertions.
-In UTF8 mode, the result is in characters rather than bytes.
+In UTF8 mode, the result is in characters rather than bytes. The branch is 
+temporarily terminated with OP_END when this function is called.
+
+This function is called when a backward assertion is encountered, so that if it 
+fails, the error message can point to the correct place in the pattern. 
+However, we cannot do this when the assertion contains subroutine calls,
+because they can be forward references. We solve this by remembering this case 
+and doing the check at the end; a flag specifies which mode we are running in.
 
 Arguments:
   code     points to the start of the pattern (the bracket)
   options  the compiling options
+  atend    TRUE if called when the pattern is complete 
+  cd       the "compile data" structure 
 
-Returns:   the fixed length, or -1 if there is no fixed length,
+Returns:   the fixed length, 
+             or -1 if there is no fixed length,
              or -2 if \C was encountered
+             or -3 if an OP_RECURSE item was encountered and atend is FALSE
 */
 
 static int
-find_fixedlength(uschar *code, int options)
+find_fixedlength(uschar *code, int options, BOOL atend, compile_data *cd)
 {
 int length = -1;
 
@@ -1360,6 +1371,7 @@ branch, check the length against that of the other branches. */
 for (;;)
   {
   int d;
+  uschar *ce, *cs;
   register int op = *cc;
   switch (op)
     {
@@ -1367,7 +1379,7 @@ for (;;)
     case OP_BRA:
     case OP_ONCE:
     case OP_COND:
-    d = find_fixedlength(cc + ((op == OP_CBRA)? 2:0), options);
+    d = find_fixedlength(cc + ((op == OP_CBRA)? 2:0), options, atend, cd);
     if (d < 0) return d;
     branchlength += d;
     do cc += GET(cc, 1); while (*cc == OP_ALT);
@@ -1389,6 +1401,21 @@ for (;;)
     cc += 1 + LINK_SIZE;
     branchlength = 0;
     break;
+    
+    /* A true recursion implies not fixed length, but a subroutine call may
+    be OK. If the subroutine is a forward reference, we can't deal with
+    it until the end of the pattern, so return -3. */
+    
+    case OP_RECURSE:
+    if (!atend) return -3;
+    cs = ce = (uschar *)cd->start_code + GET(cc, 1);  /* Start subpattern */
+    do ce += GET(ce, 1); while (*ce == OP_ALT);       /* End subpattern */
+    if (cc > cs && cc < ce) return -1;                /* Recursion */
+    d = find_fixedlength(cs + 2, options, atend, cd);
+    if (d < 0) return d; 
+    branchlength += d;
+    cc += 1 + LINK_SIZE;
+    break;   
 
     /* Skip over assertive subpatterns */
 
@@ -1518,16 +1545,17 @@ for (;;)
 
 
 /*************************************************
-*    Scan compiled regex for numbered bracket    *
+*    Scan compiled regex for specific bracket    *
 *************************************************/
 
 /* This little function scans through a compiled pattern until it finds a
-capturing bracket with the given number.
+capturing bracket with the given number, or, if the number is negative, an
+instance of OP_REVERSE for a lookbehind.
 
 Arguments:
   code        points to start of expression
   utf8        TRUE in UTF-8 mode
-  number      the required bracket number
+  number      the required bracket number or negative to find a lookbehind
 
 Returns:      pointer to the opcode for the bracket, or NULL if not found
 */
@@ -1545,6 +1573,14 @@ for (;;)
   the table is zero; the actual length is stored in the compiled code. */
 
   if (c == OP_XCLASS) code += GET(code, 1);
+  
+  /* Handle lookbehind */
+  
+  else if (c == OP_REVERSE)
+    {
+    if (number < 0) return (uschar *)code; 
+    code += _pcre_OP_lengths[c];
+    }
 
   /* Handle capturing bracket */
 
@@ -5813,21 +5849,29 @@ for (;;)
 
     /* If lookbehind, check that this branch matches a fixed-length string, and
     put the length into the OP_REVERSE item. Temporarily mark the end of the
-    branch with OP_END. */
+    branch with OP_END. If the branch contains OP_RECURSE, the result is -3 
+    because there may be forward references that we can't check here. Set a
+    flag to cause another lookbehind check at the end. Why not do it all at the 
+    end? Because common, erroneous checks are picked up here and the offset of 
+    the problem can be shown. */
 
     if (lookbehind)
       {
       int fixed_length;
       *code = OP_END;
-      fixed_length = find_fixedlength(last_branch, options);
+      fixed_length = find_fixedlength(last_branch, options, FALSE, cd);
       DPRINTF(("fixed length = %d\n", fixed_length));
-      if (fixed_length < 0)
+      if (fixed_length == -3)
+        {
+        cd->check_lookbehind = TRUE; 
+        }   
+      else if (fixed_length < 0)
         {
         *errorcodeptr = (fixed_length == -2)? ERR36 : ERR25;
         *ptrptr = ptr;
         return FALSE;
         }
-      PUT(reverse_count, 0, fixed_length);
+      else { PUT(reverse_count, 0, fixed_length); }
       }
     }
 
@@ -6230,9 +6274,7 @@ int length = 1;  /* For final END opcode */
 int firstbyte, reqbyte, newline;
 int errorcode = 0;
 int skipatstart = 0;
-#ifdef SUPPORT_UTF8
-BOOL utf8;
-#endif
+BOOL utf8 = (options & PCRE_UTF8) != 0;
 size_t size;
 uschar *code;
 const uschar *codestart;
@@ -6329,7 +6371,6 @@ while (ptr[skipatstart] == CHAR_LEFT_PARENTHESIS &&
 /* Can't support UTF8 unless PCRE has been compiled to include the code. */
 
 #ifdef SUPPORT_UTF8
-utf8 = (options & PCRE_UTF8) != 0;
 if (utf8 && (options & PCRE_NO_UTF8_CHECK) == 0 &&
      (*erroroffset = _pcre_valid_utf8((uschar *)pattern, -1)) >= 0)
   {
@@ -6337,7 +6378,7 @@ if (utf8 && (options & PCRE_NO_UTF8_CHECK) == 0 &&
   goto PCRE_EARLY_ERROR_RETURN2;
   }
 #else
-if ((options & PCRE_UTF8) != 0)
+if (utf8)
   {
   errorcode = ERR32;
   goto PCRE_EARLY_ERROR_RETURN;
@@ -6501,6 +6542,7 @@ cd->start_code = codestart;
 cd->hwm = cworkspace;
 cd->req_varyopt = 0;
 cd->had_accept = FALSE;
+cd->check_lookbehind = FALSE;
 cd->open_caps = NULL;
 
 /* Set up a starting, non-extracting bracket, then compile the expression. On
@@ -6540,7 +6582,7 @@ while (errorcode == 0 && cd->hwm > cworkspace)
   cd->hwm -= LINK_SIZE;
   offset = GET(cd->hwm, 0);
   recno = GET(codestart, offset);
-  groupptr = find_bracket(codestart, (re->options & PCRE_UTF8) != 0, recno);
+  groupptr = find_bracket(codestart, utf8, recno);
   if (groupptr == NULL) errorcode = ERR53;
     else PUT(((uschar *)codestart), offset, groupptr - codestart);
   }
@@ -6550,6 +6592,47 @@ subpattern. */
 
 if (errorcode == 0 && re->top_backref > re->top_bracket) errorcode = ERR15;
 
+/* If there were any lookbehind assertions that contained OP_RECURSE 
+(recursions or subroutine calls), a flag is set for them to be checked here,
+because they may contain forward references. Actual recursions can't be fixed
+length, but subroutine calls can. It is done like this so that those without
+OP_RECURSE that are not fixed length get a diagnostic with a useful offset. The
+exceptional ones forgo this. We scan the pattern to check that they are fixed
+length, and set their lengths. */
+
+if (cd->check_lookbehind)
+  {
+  uschar *cc = (uschar *)codestart;
+   
+  /* Loop, searching for OP_REVERSE items, and process those that do not have 
+  their length set. (Actually, it will also re-process any that have a length 
+  of zero, but that is a pathological case, and it does no harm.) When we find 
+  one, we temporarily terminate the branch it is in while we scan it. */
+   
+  for (cc = (uschar *)find_bracket(codestart, utf8, -1);
+       cc != NULL;
+       cc = (uschar *)find_bracket(cc, utf8, -1))
+    { 
+    if (GET(cc, 1) == 0)
+      { 
+      int fixed_length; 
+      uschar *be = cc - 1 - LINK_SIZE + GET(cc, -LINK_SIZE);
+      int end_op = *be; 
+      *be = OP_END;
+      fixed_length = find_fixedlength(cc, re->options, TRUE, cd);
+      *be = end_op;
+      DPRINTF(("fixed length = %d\n", fixed_length));
+      if (fixed_length < 0)
+        {
+        errorcode = (fixed_length == -2)? ERR36 : ERR25;
+        break;   
+        }
+      PUT(cc, 1, fixed_length);
+      }
+    cc += 1 + LINK_SIZE;
+    }  
+  } 
+
 /* Failed to compile, or error while post-processing */
 
 if (errorcode != 0)
diff --git a/pcre_internal.h b/pcre_internal.h
index f64c809..db76647 100644
--- a/pcre_internal.h
+++ b/pcre_internal.h
@@ -1556,6 +1556,7 @@ typedef struct compile_data {
   int  external_flags;          /* External flag bits to be set */
   int  req_varyopt;             /* "After variable item" flag for reqbyte */
   BOOL had_accept;              /* (*ACCEPT) encountered */
+  BOOL check_lookbehind;        /* Lookbehinds need later checking */ 
   int  nltype;                  /* Newline type */
   int  nllen;                   /* Newline string length */
   uschar nl[4];                 /* Newline string when fixed length */
diff --git a/testdata/testinput11 b/testdata/testinput11
index 6f1203b..79be8db 100644
--- a/testdata/testinput11
+++ b/testdata/testinput11
@@ -270,4 +270,17 @@
     rhubarb
     the quick brown fox  
 
+/(a)(?<=b(?1))/
+    baz
+    ** Failers
+    caz  
+    
+/(?<=b(?1))(a)/
+    zbaaz
+    ** Failers
+    aaa  
+    
+/(?<X>a)(?<=b(?&X))/
+    baz
+
 /-- End of testinput11 --/
diff --git a/testdata/testinput2 b/testdata/testinput2
index fbc7671..5561e82 100644
--- a/testdata/testinput2
+++ b/testdata/testinput2
@@ -2845,4 +2845,20 @@ a random value. /Ix
 /^X(?7)(a)(?|(b)|(q)(r)(s))(c)(d)(Y)/
     XYabcdY
 
+/(?<=b(?1)|zzz)(a)/
+    xbaax
+    xzzzax 
+
+/(a)(?<=b\1)/
+
+/(a)(?<=b+(?1))/
+
+/(a+)(?<=b(?1))/
+
+/(a(?<=b(?1)))/
+
+/(?<=b(?1))xyz/
+
+/(?<=b(?1))xyz(b+)pqrstuvew/
+
 /-- End of testinput2 --/
diff --git a/testdata/testoutput11 b/testdata/testoutput11
index 9a421aa..ab6599b 100644
--- a/testdata/testoutput11
+++ b/testdata/testoutput11
@@ -577,4 +577,27 @@ No match
     the quick brown fox  
 No match
 
+/(a)(?<=b(?1))/
+    baz
+ 0: a
+ 1: a
+    ** Failers
+No match
+    caz  
+No match
+    
+/(?<=b(?1))(a)/
+    zbaaz
+ 0: a
+ 1: a
+    ** Failers
+No match
+    aaa  
+No match
+    
+/(?<X>a)(?<=b(?&X))/
+    baz
+ 0: a
+ 1: a
+
 /-- End of testinput11 --/
diff --git a/testdata/testoutput2 b/testdata/testoutput2
index fc83759..ab6a7c4 100644
--- a/testdata/testoutput2
+++ b/testdata/testoutput2
@@ -9812,4 +9812,30 @@ No match
  6: d
  7: Y
 
+/(?<=b(?1)|zzz)(a)/
+    xbaax
+ 0: a
+ 1: a
+    xzzzax 
+ 0: a
+ 1: a
+
+/(a)(?<=b\1)/
+Failed: lookbehind assertion is not fixed length at offset 10
+
+/(a)(?<=b+(?1))/
+Failed: lookbehind assertion is not fixed length at offset 13
+
+/(a+)(?<=b(?1))/
+Failed: lookbehind assertion is not fixed length at offset 14
+
+/(a(?<=b(?1)))/
+Failed: lookbehind assertion is not fixed length at offset 13
+
+/(?<=b(?1))xyz/
+Failed: reference to non-existent subpattern at offset 8
+
+/(?<=b(?1))xyz(b+)pqrstuvew/
+Failed: lookbehind assertion is not fixed length at offset 26
+
 /-- End of testinput2 --/
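
The two-pass lookbehind check in the pcre_compile.c hunks above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not PCRE code: the types item, group_def and toy_pattern, the functions toy_fixedlength() and toy_check_lookbehind(), and the sample tables are all hypothetical names invented for the illustration. The return values -1 and -3 mirror the ones documented for find_fixedlength(). Recursion is detected here with a bitmask of the groups currently being expanded, which is a simplification; the patch itself compares compiled-code pointers to see whether the call lies inside the called group.

```c
#include <assert.h>
#include <stddef.h>

/* Result codes, mirroring the patch: -1 = not fixed length,
-3 = subroutine call seen before the pattern is complete. */
enum { FL_VARIABLE = -1, FL_DEFERRED = -3 };

/* Hypothetical toy items: one literal character, a variable-length
repeat such as a+, or a subroutine call to a numbered group. */
typedef enum { ITEM_LITERAL, ITEM_REPEAT, ITEM_CALL } item_kind;
typedef struct { item_kind kind; int group; } item;
typedef struct { const item *items; size_t count; } group_def;
typedef struct { const group_def *groups; size_t ngroups; } toy_pattern;

/* Compute the fixed length of a run of items. "atend" plays the role of
the new atend argument; "active" is a bitmask of groups being expanded. */
static int
toy_fixedlength(const toy_pattern *pat, const item *items, size_t count,
  int atend, unsigned active)
{
int length = 0;
size_t i;
assert(pat != NULL && pat->ngroups <= 31);
for (i = 0; i < count; i++)
  {
  int d, g;
  switch (items[i].kind)
    {
    case ITEM_LITERAL:
    length += 1;
    break;

    case ITEM_REPEAT:
    return FL_VARIABLE;

    case ITEM_CALL:
    if (!atend) return FL_DEFERRED;          /* May be a forward reference */
    g = items[i].group;
    if (g < 0 || (size_t)g >= pat->ngroups) return FL_VARIABLE;
    if ((active & (1u << g)) != 0) return FL_VARIABLE;   /* Recursion */
    d = toy_fixedlength(pat, pat->groups[g].items, pat->groups[g].count,
      atend, active | (1u << g));
    if (d < 0) return d;
    length += d;
    break;
    }
  }
return length;
}

/* First pass while "compiling"; second pass only if a call was deferred. */
static int
toy_check_lookbehind(const toy_pattern *pat, const item *items, size_t count)
{
int d = toy_fixedlength(pat, items, count, 0, 0u);
if (d == FL_DEFERRED) d = toy_fixedlength(pat, items, count, 1, 0u);
return d;
}

/* Sample data: group 0 models (a), group 1 models (a+), and group 2
models a group that calls itself. */
static const item toy_group_a[] = { { ITEM_LITERAL, 0 } };
static const item toy_group_aplus[] = { { ITEM_REPEAT, 0 } };
static const item toy_group_rec[] = { { ITEM_LITERAL, 0 }, { ITEM_CALL, 2 } };
static const group_def toy_groups[] =
  { { toy_group_a, 1 }, { toy_group_aplus, 1 }, { toy_group_rec, 2 } };
static const toy_pattern toy_pat = { toy_groups, 3 };

/* Lookbehind bodies: a literal b followed by a call to group 0, 1 or 2. */
static const item toy_lb_fixed[] = { { ITEM_LITERAL, 0 }, { ITEM_CALL, 0 } };
static const item toy_lb_variable[] = { { ITEM_LITERAL, 0 }, { ITEM_CALL, 1 } };
static const item toy_lb_recursive[] = { { ITEM_LITERAL, 0 }, { ITEM_CALL, 2 } };
```

In this model, toy_check_lookbehind(&toy_pat, toy_lb_fixed, 2) gives a length of 2, in the same way that the test pattern /(a)(?<=b(?1))/ is accepted. The other two sample bodies are rejected only on the second pass, which corresponds to the cases in testoutput2 where the error offset is reported at the end of the pattern rather than at the lookbehind.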