1 files changed, 56 insertions, 14 deletions
diff --git a/doc/pcreapi.3 b/doc/pcreapi.3
index 0149f50..4c7d43c 100644
--- a/doc/pcreapi.3
+++ b/doc/pcreapi.3
@@ -99,6 +99,12 @@ PCRE - Perl-compatible regular expressions
 .B void (*pcre_free)(void *);
 .PP
 .br
+.B void *(*pcre_stack_malloc)(size_t);
+.PP
+.br
+.B void (*pcre_stack_free)(void *);
+.PP
+.br
 .B int (*pcre_callout)(pcre_callout_block *);
 
 .SH PCRE API
@@ -147,6 +153,16 @@ respectively. PCRE calls the memory management functions via these variables,
 so a calling program can replace them if it wishes to intercept the calls. This
 should be done before calling any PCRE functions.
 
+The global variables \fBpcre_stack_malloc\fR and \fBpcre_stack_free\fR are also
+indirections to memory management functions. These special functions are used
+only when PCRE is compiled to use the heap for remembering data, instead of
+recursive function calls. This is a non-standard way of building PCRE, for use
+in environments that have limited stacks. Because of the greater use of memory
+management, PCRE runs more slowly when built this way. Separate functions are
+provided so that special-purpose external code can be used for this case. When
+used, these functions are always called in a stack-like manner (last obtained,
+first freed), and always for memory blocks of the same size.
+
 The global variable \fBpcre_callout\fR initially contains NULL. It can be set
 by the caller to a "callout" function, which PCRE will then call at specified
 points during a matching operation. Details are given in the \fBpcrecallout\fR
@@ -156,9 +172,9 @@ documentation.
 .rs
 .sp
 The PCRE functions can be used in multi-threading applications, with the
-proviso that the memory management functions pointed to by \fBpcre_malloc\fR
-and \fBpcre_free\fR, and the callout function pointed to by \fBpcre_callout\fR,
-are shared by all threads.
+proviso that the memory management functions pointed to by \fBpcre_malloc\fR,
+\fBpcre_free\fR, \fBpcre_stack_malloc\fR, and \fBpcre_stack_free\fR, and the
+callout function pointed to by \fBpcre_callout\fR, are shared by all threads.
 
 The compiled form of a regular expression is not altered during matching, so
 the same compiled pattern can safely be used by several threads at once.
@@ -210,6 +226,15 @@ The output is an integer that gives the default limit for the number of
 internal matching function calls in a \fBpcre_exec()\fR execution. Further
 details are given with \fBpcre_exec()\fR below.
 
+  PCRE_CONFIG_STACKRECURSE
+
+The output is an integer that is set to one if internal recursion is
+implemented by recursive function calls that use the stack to remember their
+state. This is the usual way that PCRE is compiled. The output is zero if PCRE
+was compiled to use blocks of data on the heap instead of recursive function
+calls. In this case, \fBpcre_stack_malloc\fR and \fBpcre_stack_free\fR are
+called to manage memory blocks on the heap, thus avoiding the use of the stack.
+
 .SH COMPILING A PATTERN
 .rs
 .sp
@@ -711,12 +736,21 @@ or turned out to be anchored by virtue of its contents, it cannot be made
 unachored at matching time.
 
 When PCRE_UTF8 was set at compile time, the validity of the subject as a UTF-8
-string is automatically checked. If an invalid UTF-8 sequence of bytes is
-found, \fBpcre_exec()\fR returns the error PCRE_ERROR_BADUTF8. If you already
-know that your subject is valid, and you want to skip this check for
-performance reasons, you can set the PCRE_NO_UTF8_CHECK option when calling
-\fBpcre_exec()\fR. When this option is set, the effect of passing an invalid
-UTF-8 string as a subject is undefined. It may cause your program to crash.
+string is automatically checked, and the value of \fIstartoffset\fR is also
+checked to ensure that it points to the start of a UTF-8 character. If an
+invalid UTF-8 sequence of bytes is found, \fBpcre_exec()\fR returns the error
+PCRE_ERROR_BADUTF8. If \fIstartoffset\fR contains an invalid value,
+PCRE_ERROR_BADUTF8_OFFSET is returned.
+
+If you already know that your subject is valid, and you want to skip these
+checks for performance reasons, you can set the PCRE_NO_UTF8_CHECK option when
+calling \fBpcre_exec()\fR. You might want to do this for the second and
+subsequent calls to \fBpcre_exec()\fR if you are making repeated calls to find
+all the matches in a single subject string. However, you should be sure that
+the value of \fIstartoffset\fR points to the start of a UTF-8 character. When
+PCRE_NO_UTF8_CHECK is set, the effect of passing an invalid UTF-8 string as a
+subject, or a value of \fIstartoffset\fR that does not point to the start of a
+UTF-8 character, is undefined. Your program may crash.
 
 There are also three further options that can be set only at matching time:
 
@@ -753,14 +787,17 @@ PCRE_NOTEMPTY set, and then if that fails by advancing the starting offset (see
 below) and trying an ordinary match again.
 
 The subject string is passed to \fBpcre_exec()\fR as a pointer in
-\fIsubject\fR, a length in \fIlength\fR, and a starting offset in
+\fIsubject\fR, a length in \fIlength\fR, and a starting byte offset in
 \fIstartoffset\fR. Unlike the pattern string, the subject may contain binary
 zero bytes. When the starting offset is zero, the search for a match starts at
 the beginning of the subject, and this is by far the most common case.
 
 If the pattern was compiled with the PCRE_UTF8 option, the subject must be a
-sequence of bytes that is a valid UTF-8 string. If an invalid UTF-8 string is
-passed, PCRE's behaviour is not defined.
+sequence of bytes that is a valid UTF-8 string, and the starting offset must
+point to the beginning of a UTF-8 character. If an invalid UTF-8 string or
+offset is passed, an error (either PCRE_ERROR_BADUTF8 or
+PCRE_ERROR_BADUTF8_OFFSET) is returned, unless the option PCRE_NO_UTF8_CHECK is
+set, in which case PCRE's behaviour is not defined.
 
 A non-zero starting offset is useful when searching for another match in the
 same subject by calling \fBpcre_exec()\fR again after a previous success.
@@ -892,10 +929,15 @@ This error is never generated by \fBpcre_exec()\fR itself. It is provided for
 use by callout functions that want to yield a distinctive error code. See the
 \fBpcrecallout\fR documentation for details.
 
-  PCRE_ERROR_BADUTF8       (-10)
+  PCRE_ERROR_BADUTF8        (-10)
 
 A string that contains an invalid UTF-8 byte sequence was passed as a subject.
 
+  PCRE_ERROR_BADUTF8_OFFSET (-11)
+
+The UTF-8 byte sequence that was passed as a subject was valid, but the value
+of \fIstartoffset\fR did not point to the beginning of a UTF-8 character.
+
 .SH EXTRACTING CAPTURED SUBSTRINGS BY NUMBER
 .rs
 .sp
@@ -1035,6 +1077,6 @@ then call \fIpcre_copy_substring()\fR or \fIpcre_get_substring()\fR, as
 appropriate.
 
 .in 0
-Last updated: 20 August 2003
+Last updated: 09 December 2003
 .br
 Copyright (c) 1997-2003 University of Cambridge.
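
Note on the startoffset rule documented in this patch: when PCRE_NO_UTF8_CHECK is set, the caller is responsible for making sure that startoffset points to the start of a UTF-8 character. The sketch below shows one way a caller might test that before calling pcre_exec(). The helper name is_utf8_char_start is hypothetical and is not part of the PCRE API, and treating an offset equal to the subject length as acceptable is an assumption of this sketch. It relies only on the UTF-8 property that continuation bytes have the bit pattern 10xxxxxx; it does not validate the subject as a whole.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of PCRE): returns 1 if `offset` is a
 * plausible UTF-8 character boundary in `subject`, and 0 otherwise.
 * It only checks that the byte at `offset` is not a continuation byte
 * (10xxxxxx); it does not check that the whole subject is valid UTF-8.
 */
int is_utf8_char_start(const unsigned char *subject, size_t length,
                       size_t offset)
{
    if (offset > length)
        return 0;               /* beyond the end of the subject */
    if (offset == length)
        return 1;               /* assumption: the end counts as a boundary */
    return (subject[offset] & 0xC0) != 0x80;
}
```

A caller that finds all matches in one subject could run this check on each new starting offset before passing PCRE_NO_UTF8_CHECK on the second and subsequent calls.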