9 files changed, 262 insertions, 116 deletions
diff --git a/doc/html/index.html b/doc/html/index.html
index 75361fd..fc93ed0 100644
--- a/doc/html/index.html
+++ b/doc/html/index.html
@@ -1,10 +1,10 @@
 <html>
-<!-- This is a manually maintained file that is the root of the HTML version of
-     the PCRE documentation. When the HTML documents are built from the man
-     page versions, the entire doc/html directory is emptied, this file is then
-     copied into doc/html/index.html, and the remaining files therein are
+<!-- This is a manually maintained file that is the root of the HTML version of 
+     the PCRE documentation. When the HTML documents are built from the man 
+     page versions, the entire doc/html directory is emptied, this file is then 
+     copied into doc/html/index.html, and the remaining files therein are 
      created by the 132html script.
--->
+-->      
 <head>
 <title>PCRE specification</title>
 </head>
@@ -83,11 +83,11 @@ The HTML documentation for PCRE comprises the following pages:
 </table>
 
 <p>
-There are also individual pages that summarize the interface for each function
+There are also individual pages that summarize the interface for each function 
 in the library:
 </p>
 
-<table>
+<table>    
 
 <tr><td><a href="pcre_assign_jit_stack.html">pcre_assign_jit_stack</a></td>
     <td>&nbsp;&nbsp;Assign stack for JIT matching</td></tr>
@@ -150,7 +150,7 @@ in the library:
 
 <tr><td><a href="pcre_maketables.html">pcre_maketables</a></td>
     <td>&nbsp;&nbsp;Build character tables in current locale</td></tr>
-
+    
 <tr><td><a href="pcre_refcount.html">pcre_refcount</a></td>
     <td>&nbsp;&nbsp;Maintain reference count in compiled pattern</td></tr>
 
diff --git a/doc/html/pcreapi.html b/doc/html/pcreapi.html
index cd90766..9ddae5b 100644
--- a/doc/html/pcreapi.html
+++ b/doc/html/pcreapi.html
@@ -649,6 +649,23 @@ character). Thus, the pattern AB]CD becomes illegal when this option is set.
 string (by default this causes the current matching alternative to fail). A
 pattern such as (\1)(a) succeeds when this option is set (assuming it can find
 an "a" in the subject), whereas it fails by default, for Perl compatibility.
+</P>
+<P>
+(3) \U matches an upper case "U" character; by default \U causes a compile 
+time error (Perl uses \U to upper case subsequent characters).
+</P>
+<P>
+(4) \u matches a lower case "u" character unless it is followed by four 
+hexadecimal digits, in which case the hexadecimal number defines the code point 
+to match. By default, \u causes a compile time error (Perl uses it to upper 
+case the following character).
+</P>
+<P>
+(5) \x matches a lower case "x" character unless it is followed by two 
+hexadecimal digits, in which case the hexadecimal number defines the code point 
+to match. By default, as in Perl, a hexadecimal number is always expected after 
+\x, but it may have zero, one, or two digits (so, for example, \xz matches a 
+binary zero character followed by z).
 <pre>
   PCRE_MULTILINE
 </pre>
@@ -1127,6 +1144,12 @@ particular pattern. See the
 <a href="pcrejit.html"><b>pcrejit</b></a>
 documentation for details of what can and cannot be handled.
 <pre>
+  PCRE_INFO_JITSIZE
+</pre>
+If the pattern was successfully studied with the PCRE_STUDY_JIT_COMPILE option,
+return the size of the JIT compiled code, otherwise return zero. The fourth
+argument should point to a <b>size_t</b> variable.
+<pre>
   PCRE_INFO_LASTLITERAL
 </pre>
 Return the value of the rightmost literal byte that must exist in any matched
@@ -1235,10 +1258,13 @@ For such patterns, the PCRE_ANCHORED bit is set in the options returned by
 <pre>
   PCRE_INFO_SIZE
 </pre>
-Return the size of the compiled pattern, that is, the value that was passed as
-the argument to <b>pcre_malloc()</b> when PCRE was getting memory in which to
-place the compiled data. The fourth argument should point to a <b>size_t</b>
-variable.
+Return the size of the compiled pattern. The fourth argument should point to a
+<b>size_t</b> variable. This value does not include the size of the <b>pcre</b>
+structure that is returned by <b>pcre_compile()</b>. The value that is passed as
+the argument to <b>pcre_malloc()</b> when <b>pcre_compile()</b> is getting memory
+in which to place the compiled data is the value returned by this option plus
+the size of the <b>pcre</b> structure. Studying a compiled pattern, with or
+without JIT, does not alter the value returned by this option.
 <pre>
   PCRE_INFO_STUDYSIZE
 </pre>
@@ -2486,7 +2512,7 @@ Cambridge CB2 3QH, England.
 </P>
 <br><a name="SEC24" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 23 September 2011
+Last updated: 02 December 2011
 <br>
 Copyright &copy; 1997-2011 University of Cambridge.
 <br>
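The escape rules added to pcreapi.html above (items (3) to (5) for PCRE_JAVASCRIPT_COMPAT) can be modelled in a few lines of C. This is a minimal sketch, not PCRE's own parser. The functions `hexval`, `read_hex_exact`, and `parse_escape` are hypothetical. The sketch handles only \x, \u, and \U, and it deliberately does not model the \x{...} form.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Read exactly n hex digits from s; return their value, or -1 if fewer are present. */
static long read_hex_exact(const char *s, int n) {
    long v = 0;
    for (int i = 0; i < n; i++) {
        int d = hexval(s[i]);
        if (d < 0) return -1;   /* stops at the first non-digit, including '\0' */
        v = v * 16 + d;
    }
    return v;
}

/*
 * Hypothetical model of the documented rules. s points at the character after
 * the backslash. On success, stores the code point in *cp and returns how many
 * characters after the backslash were consumed. Returns -1 for a compile-time
 * error, or for a form this sketch does not model.
 */
static int parse_escape(const char *s, int js_compat, long *cp) {
    assert(s != NULL && cp != NULL);
    if (s[0] == 'x' && js_compat) {
        long v = read_hex_exact(s + 1, 2);
        if (v < 0) { *cp = 'x'; return 1; }   /* not two hex digits: literal "x" */
        *cp = v;
        return 3;
    }
    if (s[0] == 'x') {
        if (s[1] == '{') return -1;           /* \x{...} is not modelled here */
        long v = 0;                           /* default: zero to two hex digits */
        int n = 0;
        while (n < 2 && hexval(s[1 + n]) >= 0) {
            v = v * 16 + hexval(s[1 + n]);
            n++;
        }
        *cp = v;
        return 1 + n;
    }
    if (s[0] == 'u' || s[0] == 'U') {
        if (!js_compat) return -1;            /* compile-time error by default */
        if (s[0] == 'U') { *cp = 'U'; return 1; }
        long v = read_hex_exact(s + 1, 4);
        if (v < 0) { *cp = 'u'; return 1; }   /* not four hex digits: literal "u" */
        *cp = v;
        return 5;
    }
    return -1;                                /* other escapes are not modelled */
}
```

Under these rules, \xz yields code point 0 followed by a literal z by default. In JavaScript mode, the same input yields a literal x followed by z.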
diff --git a/doc/html/pcrecallout.html b/doc/html/pcrecallout.html
index 40d5fa2..e94ffec 100644
--- a/doc/html/pcrecallout.html
+++ b/doc/html/pcrecallout.html
@@ -189,9 +189,10 @@ same callout number. However, they are set for all callouts.
 <P>
 The <i>mark</i> field is present from version 2 of the <i>pcre_callout</i>
 structure. In callouts from <b>pcre_exec()</b> it contains a pointer to the
-zero-terminated name of the most recently passed (*MARK) item in the match, or
-NULL if there are no (*MARK)s in the current matching path. In callouts from
-<b>pcre_dfa_exec()</b> this field always contains NULL.
+zero-terminated name of the most recently passed (*MARK), (*PRUNE), or (*THEN)
+item in the match, or NULL if no such items have been passed. Instances of 
+(*PRUNE) or (*THEN) without a name do not obliterate a previous (*MARK). In
+callouts from <b>pcre_dfa_exec()</b> this field always contains NULL.
 </P>
 <br><a name="SEC4" href="#TOC1">RETURN VALUES</a><br>
 <P>
@@ -219,7 +220,7 @@ Cambridge CB2 3QH, England.
 </P>
 <br><a name="SEC6" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 26 August 2011
+Last updated: 30 November 2011
 <br>
 Copyright &copy; 1997-2011 University of Cambridge.
 <br>
diff --git a/doc/html/pcrecompat.html b/doc/html/pcrecompat.html
index 69d9d1d..9a09318 100644
--- a/doc/html/pcrecompat.html
+++ b/doc/html/pcrecompat.html
@@ -53,7 +53,8 @@ represent a binary zero.
 own, matching a non-newline character, is supported.) In fact these are
 implemented by Perl's general string-handling and are not part of its pattern
 matching engine. If any of these are encountered by PCRE, an error is
-generated.
+generated by default. However, if the PCRE_JAVASCRIPT_COMPAT option is set, 
+\U and \u are interpreted as JavaScript interprets them.
 </P>
 <P>
 6. The Perl escape sequences \p, \P, and \X are supported only if PCRE is
@@ -202,7 +203,7 @@ Cambridge CB2 3QH, England.
 REVISION
 </b><br>
 <P>
-Last updated: 09 October 2011
+Last updated: 14 November 2011
 <br>
 Copyright &copy; 1997-2011 University of Cambridge.
 <br>
diff --git a/doc/html/pcrejit.html b/doc/html/pcrejit.html
index c257d0d..c5b2a48 100644
--- a/doc/html/pcrejit.html
+++ b/doc/html/pcrejit.html
@@ -20,10 +20,11 @@ man page, in case the conversion went wrong.
 <li><a name="TOC5" href="#SEC5">RETURN VALUES FROM JIT EXECUTION</a>
 <li><a name="TOC6" href="#SEC6">SAVING AND RESTORING COMPILED PATTERNS</a>
 <li><a name="TOC7" href="#SEC7">CONTROLLING THE JIT STACK</a>
-<li><a name="TOC8" href="#SEC8">EXAMPLE CODE</a>
-<li><a name="TOC9" href="#SEC9">SEE ALSO</a>
-<li><a name="TOC10" href="#SEC10">AUTHOR</a>
-<li><a name="TOC11" href="#SEC11">REVISION</a>
+<li><a name="TOC8" href="#SEC8">JIT STACK FAQ</a>
+<li><a name="TOC9" href="#SEC9">EXAMPLE CODE</a>
+<li><a name="TOC10" href="#SEC10">SEE ALSO</a>
+<li><a name="TOC11" href="#SEC11">AUTHOR</a>
+<li><a name="TOC12" href="#SEC12">REVISION</a>
 </ul>
 <br><a name="SEC1" href="#TOC1">PCRE JUST-IN-TIME COMPILER SUPPORT</a><br>
 <P>
@@ -57,11 +58,17 @@ fully tested. If --enable-jit is set on an unsupported platform, compilation
 fails.
 </P>
 <P>
-A program can tell if JIT support is available by calling <b>pcre_config()</b>
-with the PCRE_CONFIG_JIT option. The result is 1 when JIT is available, and 0
-otherwise. However, a simple program does not need to check this in order to
-use JIT. The API is implemented in a way that falls back to the ordinary PCRE
-code if JIT is not available.
+A program that is linked with PCRE 8.20 or later can tell if JIT support is
+available by calling <b>pcre_config()</b> with the PCRE_CONFIG_JIT option. The
+result is 1 when JIT is available, and 0 otherwise. However, a simple program
+does not need to check this in order to use JIT. The API is implemented in a
+way that falls back to the ordinary PCRE code if JIT is not available.
+</P>
+<P>
+If your program may sometimes be linked with versions of PCRE that are older
+than 8.20, but you want to use JIT when it is available, you can test
+the values of PCRE_MAJOR and PCRE_MINOR, or the existence of a JIT macro such
+as PCRE_CONFIG_JIT, for compile-time control of your code. 
 </P>
 <br><a name="SEC3" href="#TOC1">SIMPLE USE OF JIT</a><br>
 <P>
@@ -75,6 +82,21 @@ You have to do two things to make use of the JIT support in the simplest way:
       no longer needed instead of just freeing it yourself. This
       ensures that any JIT data is also freed.
 </pre>
+For a program that may be linked with pre-8.20 versions of PCRE, you can insert
+<pre>
+  #ifndef PCRE_STUDY_JIT_COMPILE
+  #define PCRE_STUDY_JIT_COMPILE 0
+  #endif
+</pre>
+so that no option is passed to <b>pcre_study()</b>, and then use something like 
+this to free the study data:
+<pre>
+  #ifdef PCRE_CONFIG_JIT
+      pcre_free_study(study_ptr);
+  #else
+      pcre_free(study_ptr);
+  #endif
+</pre>
 In some circumstances you may need to call additional functions. These are
 described in the section entitled
 <a href="#stackcontrol">"Controlling the JIT stack"</a>
@@ -116,12 +138,8 @@ supported.
 <P>
 The unsupported pattern items are:
 <pre>
-  \C            match a single byte; not supported in UTF-8 mode
+  \C             match a single byte; not supported in UTF-8 mode
   (?Cn)          callouts
-  (?(&#60;name&#62;)...  conditional test on setting of a named subpattern
-  (?(R)...       conditional test on whole pattern recursion
-  (?(Rn)...      conditional test on recursion, by number
-  (?(R&name)...  conditional test on recursion, by name
   (*COMMIT)      )
   (*MARK)        )
   (*PRUNE)       ) the backtracking control verbs
@@ -167,7 +185,10 @@ When the compiled JIT code runs, it needs a block of memory to use as a stack.
 By default, it uses 32K on the machine stack. However, some large or
 complicated patterns need more than this. The error PCRE_ERROR_JIT_STACKLIMIT
 is given when there is not enough stack. Three functions are provided for
-managing blocks of memory for use as JIT stacks.
+managing blocks of memory for use as JIT stacks. There is further discussion
+about the use of JIT stacks in the section entitled
+<a href="#stackfaq">"JIT stack FAQ"</a>
+below. 
 </P>
 <P>
 The <b>pcre_jit_stack_alloc()</b> function creates a JIT stack. Its arguments
@@ -234,8 +255,86 @@ All the functions described in this section do nothing if JIT is not available,
 and <b>pcre_assign_jit_stack()</b> does nothing unless the <b>extra</b> argument
 is non-NULL and points to a <b>pcre_extra</b> block that is the result of a
 successful study with PCRE_STUDY_JIT_COMPILE.
+<a name="stackfaq"></a></P>
+<br><a name="SEC8" href="#TOC1">JIT STACK FAQ</a><br>
+<P>
+(1) Why do we need JIT stacks?
+<br>
+<br>
+PCRE (and JIT) is a recursive, depth-first engine, so it needs a stack where
+the local data of the current node is pushed before checking its child nodes.
+On some platforms, allocating real machine stack is difficult. For example, on
+PowerPC the stack chain needs to be updated every time the stack is extended.
+Although this is possible, the time overhead of the updates decreases
+performance. So we do the recursion in memory.
+</P>
+<P>
+(2) Why don't we simply allocate blocks of memory with <b>malloc()</b>? 
+<br>
+<br>
+Modern operating systems have a nice feature: they can reserve an address space
+instead of allocating memory. We can safely allocate memory pages inside this
+address space, so the stack can grow without moving memory data (this is
+important because of pointers). Thus we can reserve 1M of address space and use
+only a single memory page (usually 4K) if that is enough, while still being
+able to grow up to 1M at any time if needed.
+</P>
+<P>
+(3) Who "owns" a JIT stack? 
+<br>
+<br>
+The owner of the stack is the user program, not the JIT studied pattern or
+anything else. The user program must ensure that if a stack is used by
+<b>pcre_exec()</b> (that is, it is assigned to the pattern currently running),
+that stack is not used by any other threads (to avoid overwriting the same
+memory area). The best practice for multithreaded programs is to allocate a
+stack for each thread, and return this stack through the JIT callback function.
+</P>
+<P>
+(4) When should a JIT stack be freed?
+<br>
+<br>
+You can free a JIT stack at any time, as long as it will not be used by
+<b>pcre_exec()</b> again. When you assign the stack to a pattern, only a pointer
+is set. There is no reference counting or any other magic. You can free the
+patterns and stacks in any order, anytime. Just <i>do not</i> call
+<b>pcre_exec()</b> with a pattern pointing to an already freed stack, as that
+will cause a segmentation fault. (Also, do not free a stack currently used by
+<b>pcre_exec()</b> in another thread.) You can also replace the stack for a
+pattern at any time. You can even free the previous stack before assigning a
+replacement.
+</P>
+<P>
+(5) Should I allocate/free a stack every time before/after calling
+<b>pcre_exec()</b>?
+<br>
+<br>
+No, because this is too costly in terms of resources. However, you could
+implement a scheme that releases the stack if it has not been used for, say,
+two minutes. The JIT callback can help to achieve this without keeping a list
+of the currently JIT-studied patterns.
+</P>
+<P>
+(6) OK, the stack is for long-term memory allocation. But what happens if a
+pattern causes stack overflow with a stack of 1M? Is that 1M kept until the
+stack is freed?
+<br>
+<br>
+Especially on embedded systems, it might be a good idea to release
+memory sometimes without freeing the stack. There is no API for this at the
+moment. Probably a function that returns the amount of memory currently
+allocated for a stack, and another that allows releasing memory (shrinking the
+stack), would be a good idea if someone needs this.
+</P>
+<P>
+(7) This is too much of a headache. Isn't there any better solution for JIT
+stack handling? 
+<br>
+<br>
+No, thanks to Windows. If POSIX threads were used everywhere, we could throw
+out this complicated API.
 </P>
-<br><a name="SEC8" href="#TOC1">EXAMPLE CODE</a><br>
+<br><a name="SEC9" href="#TOC1">EXAMPLE CODE</a><br>
 <P>
 This is a single-threaded example that specifies a JIT stack without using a
 callback.
@@ -260,22 +359,22 @@ callback.
 
 </PRE>
 </P>
-<br><a name="SEC9" href="#TOC1">SEE ALSO</a><br>
+<br><a name="SEC10" href="#TOC1">SEE ALSO</a><br>
 <P>
 <b>pcreapi</b>(3)
 </P>
-<br><a name="SEC10" href="#TOC1">AUTHOR</a><br>
+<br><a name="SEC11" href="#TOC1">AUTHOR</a><br>
 <P>
-Philip Hazel
+Philip Hazel (FAQ by Zoltan Herczeg)
 <br>
 University Computing Service
 <br>
 Cambridge CB2 3QH, England.
 <br>
 </P>
-<br><a name="SEC11" href="#TOC1">REVISION</a><br>
+<br><a name="SEC12" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 19 October 2011
+Last updated: 26 November 2011
 <br>
 Copyright &copy; 1997-2011 University of Cambridge.
 <br>
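The JIT stack FAQ added to pcrejit.html above explains the reserve-then-commit idea: reserve a large range of address space once, then make pages usable only as the stack grows, so the base address never moves. The sketch below illustrates that general technique with POSIX `mmap()` and `mprotect()`. It is not PCRE's implementation, and it assumes a POSIX system that provides MAP_ANONYMOUS. The `reserved_stack` type and the `rs_*` functions are hypothetical.

```c
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical growable stack built on a single address-space reservation. */
typedef struct {
    unsigned char *base;   /* start of the reserved range; never moves */
    size_t reserved;       /* bytes of address space reserved */
    size_t committed;      /* bytes currently readable and writable */
} reserved_stack;

/* Reserve `reserve` bytes of address space without making any of it usable. */
static int rs_init(reserved_stack *rs, size_t reserve) {
    void *p = mmap(NULL, reserve, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    rs->base = p;
    rs->reserved = reserve;
    rs->committed = 0;
    return 0;
}

/* Make at least `need` bytes usable (rounded up to whole pages). */
static int rs_grow(reserved_stack *rs, size_t need) {
    assert(rs != NULL && rs->base != NULL);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t want = (need + page - 1) / page * page;
    if (want > rs->reserved) return -1;   /* would exceed the reservation */
    if (want <= rs->committed) return 0;  /* already usable */
    if (mprotect(rs->base, want, PROT_READ | PROT_WRITE) != 0) return -1;
    rs->committed = want;
    return 0;
}

/* Release the whole reservation, committed or not. */
static void rs_free(reserved_stack *rs) {
    munmap(rs->base, rs->reserved);
    rs->base = NULL;
    rs->reserved = rs->committed = 0;
}
```

Because the reservation is made only once, `rs.base` stays the same as the committed region grows. The FAQ identifies this property as important for pointers into the stack.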
diff --git a/doc/html/pcrelimits.html b/doc/html/pcrelimits.html
index 4dc28f7..2cab81f 100644
--- a/doc/html/pcrelimits.html
+++ b/doc/html/pcrelimits.html
@@ -37,6 +37,12 @@ There is no limit to the number of parenthesized subpatterns, but there can be
 no more than 65535 capturing subpatterns.
 </P>
 <P>
+There is a limit of around 200,000 to the number of forward references to
+subsequent subpatterns. Repeated forward references with fixed upper limits,
+for example, (?2){0,100} when subpattern number 2 is to the right, are included
+in the count. There is no limit to the number of backward references.
+</P>
+<P>
 The maximum length of name for a named subpattern is 32 characters, and the
 maximum number of named subpatterns is 10000.
 </P>
@@ -65,7 +71,7 @@ Cambridge CB2 3QH, England.
 REVISION
 </b><br>
 <P>
-Last updated: 24 August 2011
+Last updated: 30 November 2011
 <br>
 Copyright &copy; 1997-2011 University of Cambridge.
 <br>
diff --git a/doc/html/pcrematching.html b/doc/html/pcrematching.html
index 3d1acf6..ad17c98 100644
--- a/doc/html/pcrematching.html
+++ b/doc/html/pcrematching.html
@@ -164,9 +164,9 @@ always 1, and the value of the <i>capture_last</i> field is always -1.
 </P>
 <P>
 7. The \C escape sequence, which (in the standard algorithm) matches a single
-byte, even in UTF-8 mode, is not supported because the alternative algorithm
-moves through the subject string one character at a time, for all active paths
-through the tree.
+byte, even in UTF-8 mode, is not supported by the alternative algorithm in
+UTF-8 mode, because that algorithm moves through the subject string one
+character at a time, for all active paths through the tree.
 </P>
 <P>
 8. Except for (*FAIL), the backtracking control verbs such as (*PRUNE) are not
@@ -220,7 +220,7 @@ Cambridge CB2 3QH, England.
 </P>
 <br><a name="SEC8" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 17 November 2010
+Last updated: 19 November 2011
 <br>
 Copyright &copy; 1997-2010 University of Cambridge.
 <br>
diff --git a/doc/html/pcrepattern.html b/doc/html/pcrepattern.html
index 349c98c..3efb367 100644
--- a/doc/html/pcrepattern.html
+++ b/doc/html/pcrepattern.html
@@ -268,7 +268,8 @@ one of the following escape sequences than the binary character it represents:
   \t        tab (hex 09)
   \ddd      character with octal code ddd, or back reference
   \xhh      character with hex code hh
-  \x{hhh..} character with hex code hhh..
+  \x{hhh..} character with hex code hhh.. (non-JavaScript mode)
+  \uhhhh    character with hex code hhhh (JavaScript mode only) 
 </pre>
 The precise effect of \cx is as follows: if x is a lower case letter, it
 is converted to upper case. Then bit 6 of the character (hex 40) is inverted.
@@ -280,12 +281,12 @@ values are valid. A lower case letter is converted to upper case, and then the
 0xc0 bits are flipped.)
 </P>
 <P>
-After \x, from zero to two hexadecimal digits are read (letters can be in
-upper or lower case). Any number of hexadecimal digits may appear between \x{
-and }, but the value of the character code must be less than 256 in non-UTF-8
-mode, and less than 2**31 in UTF-8 mode. That is, the maximum value in
-hexadecimal is 7FFFFFFF. Note that this is bigger than the largest Unicode code
-point, which is 10FFFF.
+By default, after \x, from zero to two hexadecimal digits are read (letters
+can be in upper or lower case). Any number of hexadecimal digits may appear
+between \x{ and }, but the value of the character code must be less than 256
+in non-UTF-8 mode, and less than 2**31 in UTF-8 mode. That is, the maximum
+value in hexadecimal is 7FFFFFFF. Note that this is bigger than the largest
+Unicode code point, which is 10FFFF.
 </P>
 <P>
 If characters other than hexadecimal digits appear between \x{ and }, or if
@@ -294,9 +295,17 @@ initial \x will be interpreted as a basic hexadecimal escape, with no
 following digits, giving a character whose value is zero.
 </P>
 <P>
+If the PCRE_JAVASCRIPT_COMPAT option is set, the interpretation of \x is 
+as just described only when it is followed by two hexadecimal digits. 
+Otherwise, it matches a literal "x" character. In JavaScript mode, support for
+code points greater than 255 is provided by \u, which must be followed by 
+four hexadecimal digits; otherwise it matches a literal "u" character.
+</P>
+<P>
 Characters whose value is less than 256 can be defined by either of the two
-syntaxes for \x. There is no difference in the way they are handled. For
-example, \xdc is exactly the same as \x{dc}.
+syntaxes for \x (or by \u in JavaScript mode). There is no difference in the
+way they are handled. For example, \xdc is exactly the same as \x{dc} (or 
+\u00dc in JavaScript mode).
 </P>
 <P>
 After \0 up to two further octal digits are read. If there are fewer than two
@@ -338,12 +347,25 @@ zero, because no more than three octal digits are ever read.
 </P>
 <P>
 All the sequences that define a single character value can be used both inside
-and outside character classes. In addition, inside a character class, the
-sequence \b is interpreted as the backspace character (hex 08). The sequences
-\B, \N, \R, and \X are not special inside a character class. Like any other
-unrecognized escape sequences, they are treated as the literal characters "B",
-"N", "R", and "X" by default, but cause an error if the PCRE_EXTRA option is
-set. Outside a character class, these sequences have different meanings.
+and outside character classes. In addition, inside a character class, \b is
+interpreted as the backspace character (hex 08).
+</P>
+<P>
+\N is not allowed in a character class. \B, \R, and \X are not special
+inside a character class. Like other unrecognized escape sequences, they are
+treated as the literal characters "B", "R", and "X" by default, but cause an
+error if the PCRE_EXTRA option is set. Outside a character class, these
+sequences have different meanings.
+</P>
+<br><b>
+Unsupported escape sequences
+</b><br>
+<P>
+In Perl, the sequences \l, \L, \u, and \U are recognized by its string
+handler and used to modify the case of following characters. By default, PCRE
+does not support these escape sequences. However, if the PCRE_JAVASCRIPT_COMPAT
+option is set, \U matches a "U" character, and \u can be used to define a
+character by code point, as described in the previous section.
 </P>
 <br><b>
 Absolute and relative back references
@@ -389,7 +411,8 @@ Another use of backslash is for specifying generic character types:
 There is also the single sequence \N, which matches a non-newline character.
 This is the same as
 <a href="#fullstopdot">the "." metacharacter</a>
-when PCRE_DOTALL is not set.
+when PCRE_DOTALL is not set. Perl also uses \N to match characters by name; 
+PCRE does not support this.
 </P>
 <P>
 Each pair of lower and upper case escape sequences partitions the complete set
@@ -963,7 +986,8 @@ special meaning in a character class.
 <P>
 The escape sequence \N behaves like a dot, except that it is not affected by
 the PCRE_DOTALL option. In other words, it matches any character except one
-that signifies the end of a line.
+that signifies the end of a line. Perl also uses \N to match characters by
+name; PCRE does not support this.
 </P>
 <br><a name="SEC7" href="#TOC1">MATCHING A SINGLE BYTE</a><br>
 <P>
@@ -979,8 +1003,8 @@ processing unless the PCRE_NO_UTF8_CHECK option is used).
 </P>
 <P>
 PCRE does not allow \C to appear in lookbehind assertions
-<a href="#lookbehind">(described below),</a>
-because in UTF-8 mode this would make it impossible to calculate the length of
+<a href="#lookbehind">(described below)</a>
+in UTF-8 mode, because this would make it impossible to calculate the length of
 the lookbehind.
 </P>
 <P>
@@ -1926,10 +1950,10 @@ match. If there are insufficient characters before the current position, the
 assertion fails.
 </P>
 <P>
-PCRE does not allow the \C escape (which matches a single byte in UTF-8 mode)
-to appear in lookbehind assertions, because it makes it impossible to calculate
-the length of the lookbehind. The \X and \R escapes, which can match
-different numbers of bytes, are also not permitted.
+In UTF-8 mode, PCRE does not allow the \C escape (which matches a single byte,
+even in UTF-8 mode) to appear in lookbehind assertions, because it makes it
+impossible to calculate the length of the lookbehind. The \X and \R escapes,
+which can match different numbers of bytes, are also not permitted.
 </P>
 <P>
 <a href="#subpatternsassubroutines">"Subroutine"</a>
@@ -2511,10 +2535,11 @@ failing negative assertion, they cause an error if encountered by
 If any of these verbs are used in an assertion or in a subpattern that is
 called as a subroutine (whether or not recursively), their effect is confined
 to that subpattern; it does not extend to the surrounding pattern, with one
-exception: a *MARK that is encountered in a positive assertion <i>is</i> passed
-back (compare capturing parentheses in assertions). Note that such subpatterns
-are processed as anchored at the point where they are tested. Note also that
-Perl's treatment of subroutines is different in some cases.
+exception: the name from a (*MARK), (*PRUNE), or (*THEN) that is encountered in
+a successful positive assertion <i>is</i> passed back when a match succeeds
+(compare capturing parentheses in assertions). Note that such subpatterns are
+processed as anchored at the point where they are tested. Note also that Perl's
+treatment of subroutines is different in some cases.
 </P>
 <P>
 The new verbs make use of what was previously invalid syntax: an opening
@@ -2536,6 +2561,10 @@ the start-of-match optimizations by setting the PCRE_NO_START_OPTIMIZE option
 when calling <b>pcre_compile()</b> or <b>pcre_exec()</b>, or by starting the
 pattern with (*NO_START_OPT).
 </P>
+<P>
+Experiments with Perl suggest that it too has similar optimizations, sometimes 
+leading to anomalous results.
+</P>
 <br><b>
 Verbs that act immediately
 </b><br>
@@ -2583,17 +2612,17 @@ A name is always required with this verb. There may be as many instances of
 (*MARK) as you like in a pattern, and their names do not have to be unique.
 </P>
 <P>
-When a match succeeds, the name of the last-encountered (*MARK) is passed back
-to the caller via the <i>pcre_extra</i> data structure, as described in the
+When a match succeeds, the name of the last-encountered (*MARK) on the matching 
+path is passed back to the caller via the <i>pcre_extra</i> data structure, as
+described in the
 <a href="pcreapi.html#extradata">section on <i>pcre_extra</i></a>
 in the
 <a href="pcreapi.html"><b>pcreapi</b></a>
-documentation. No data is returned for a partial match. Here is an example of
-<b>pcretest</b> output, where the /K modifier requests the retrieval and
-outputting of (*MARK) data:
+documentation. Here is an example of <b>pcretest</b> output, where the /K
+modifier requests the retrieval and outputting of (*MARK) data:
 <pre>
-  /X(*MARK:A)Y|X(*MARK:B)Z/K
-  XY
+    re&#62; /X(*MARK:A)Y|X(*MARK:B)Z/K
+  data&#62; XY
    0: XY
   MK: A
   XZ
@@ -2611,32 +2640,17 @@ passed back if it is the last-encountered. This does not happen for negative
 assertions.
 </P>
 <P>
-A name may also be returned after a failed match if the final path through the
-pattern involves (*MARK). However, unless (*MARK) used in conjunction with
-(*COMMIT), this is unlikely to happen for an unanchored pattern because, as the
-starting point for matching is advanced, the final check is often with an empty
-string, causing a failure before (*MARK) is reached. For example:
+After a partial match or a failed match, the name of the last encountered
+(*MARK) in the entire match process is returned. For example:
 <pre>
-  /X(*MARK:A)Y|X(*MARK:B)Z/K
-  XP
-  No match
-</pre>
-There are three potential starting points for this match (starting with X,
-starting with P, and with an empty string). If the pattern is anchored, the
-result is different:
-<pre>
-  /^X(*MARK:A)Y|^X(*MARK:B)Z/K
-  XP
+    re&#62; /X(*MARK:A)Y|X(*MARK:B)Z/K
+  data&#62; XP
   No match, mark = B
 </pre>
-PCRE's start-of-match optimizations can also interfere with this. For example,
-if, as a result of a call to <b>pcre_study()</b>, it knows the minimum
-subject length for a match, a shorter subject will not be scanned at all.
-</P>
-<P>
-Note that similar anomalies (though different in detail) exist in Perl, no
-doubt for the same reasons. The use of (*MARK) data after a failed match of an
-unanchored pattern is not recommended, unless (*COMMIT) is involved.
+Note that in this unanchored example the mark is retained from the match
+attempt that started at the letter "X". Subsequent match attempts starting at 
+"P" and then with an empty string do not get as far as the (*MARK) item, but 
+nevertheless do not reset it.
 </P>
 <br><b>
 Verbs that act after backtracking
@@ -2675,8 +2689,8 @@ Note that (*COMMIT) at the start of a pattern is not the same as an anchor,
 unless PCRE's start-of-match optimizations are turned off, as shown in this
 <b>pcretest</b> example:
 <pre>
-  /(*COMMIT)abc/
-  xyzabc
+    re&#62; /(*COMMIT)abc/
+  data&#62; xyzabc
    0: abc
   xyzabc\Y
   No match
@@ -2697,10 +2711,8 @@ reached, or when matching to the right of (*PRUNE), but if there is no match to
 the right, backtracking cannot cross (*PRUNE). In simple cases, the use of
 (*PRUNE) is just an alternative to an atomic group or possessive quantifier,
 but there are some uses of (*PRUNE) that cannot be expressed in any other way.
-The behaviour of (*PRUNE:NAME) is the same as (*MARK:NAME)(*PRUNE) when the
-match fails completely; the name is passed back if this is the final attempt.
-(*PRUNE:NAME) does not pass back a name if the match succeeds. In an anchored
-pattern (*PRUNE) has the same effect as (*COMMIT).
+The behaviour of (*PRUNE:NAME) is the same as (*MARK:NAME)(*PRUNE). In an
+anchored pattern (*PRUNE) has the same effect as (*COMMIT).
 <pre>
   (*SKIP)
 </pre>
@@ -2726,8 +2738,7 @@ following pattern fails to match, the previous path through the pattern is
 searched for the most recent (*MARK) that has the same name. If one is found,
 the "bumpalong" advance is to the subject position that corresponds to that
 (*MARK) instead of to where (*SKIP) was encountered. If no (*MARK) with a
-matching name is found, normal "bumpalong" of one character happens (that is,
-the (*SKIP) is ignored).
+matching name is found, the (*SKIP) is ignored.
 <pre>
   (*THEN) or (*THEN:NAME)
 </pre>
@@ -2741,9 +2752,8 @@ be used for a pattern-based if-then-else block:
 If the COND1 pattern matches, FOO is tried (and possibly further items after
 the end of the group if FOO succeeds); on failure, the matcher skips to the
 second alternative and tries COND2, without backtracking into COND1. The
-behaviour of (*THEN:NAME) is exactly the same as (*MARK:NAME)(*THEN) if the
-overall match fails. If (*THEN) is not inside an alternation, it acts like
-(*PRUNE).
+behaviour of (*THEN:NAME) is exactly the same as (*MARK:NAME)(*THEN).
+If (*THEN) is not inside an alternation, it acts like (*PRUNE).
 </P>
 <P>
 Note that a subpattern that does not contain a | character is just a part of
@@ -2819,7 +2829,7 @@ Cambridge CB2 3QH, England.
 </P>
 <br><a name="SEC28" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 19 October 2011
+Last updated: 29 November 2011
 <br>
 Copyright &copy; 1997-2011 University of Cambridge.
 <br>
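The pcrepattern.html context above describes how \cx is computed: if x is a lower case letter, it is converted to upper case, and then bit 6 of the character (hex 40) is inverted. The following is a minimal sketch of that rule for ASCII input only. The function `ctrl_escape` is hypothetical, not part of PCRE.

```c
#include <assert.h>

/*
 * Sketch of the \cx rule for ASCII input: upper-case x if it is a lower-case
 * letter, then invert bit 6 (hex 40). Non-ASCII values are outside this sketch.
 */
static int ctrl_escape(int x) {
    assert(x >= 0 && x < 128);
    if (x >= 'a' && x <= 'z') x -= 'a' - 'A';   /* locale-independent upper-casing */
    return x ^ 0x40;
}
```

For example, this rule gives 0x1A for \cz, 0x3B for \c{, and 0x7B for \c;.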
diff --git a/doc/html/pcretest.html b/doc/html/pcretest.html
index 40b970d..80d224b 100644
--- a/doc/html/pcretest.html
+++ b/doc/html/pcretest.html
@@ -364,7 +364,10 @@ which it appears.
 </P>
 <P>
 The <b>/M</b> modifier causes the size of memory block used to hold the compiled
-pattern to be output.
+pattern to be output. This does not include the size of the <b>pcre</b> block; 
+it is just the actual compiled data. If the pattern is successfully studied
+with the PCRE_STUDY_JIT_COMPILE option, the size of the JIT compiled code is 
+also output.
 </P>
 <P>
 If the <b>/S</b> modifier appears once, it causes <b>pcre_study()</b> to be
@@ -856,7 +859,7 @@ Cambridge CB2 3QH, England.
 </P>
 <br><a name="SEC15" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 26 August 2011
+Last updated: 02 December 2011
 <br>
 Copyright &copy; 1997-2011 University of Cambridge.
 <br>