Diffstat (limited to 'doc/pcreapi.3')
-rw-r--r-- doc/pcreapi.3 70
1 files changed, 56 insertions, 14 deletions
diff --git a/doc/pcreapi.3 b/doc/pcreapi.3
index 0149f50..4c7d43c 100644
--- a/doc/pcreapi.3
+++ b/doc/pcreapi.3
@@ -99,6 +99,12 @@ PCRE - Perl-compatible regular expressions
.B void (*pcre_free)(void *);
.PP
.br
+.B void *(*pcre_stack_malloc)(size_t);
+.PP
+.br
+.B void (*pcre_stack_free)(void *);
+.PP
+.br
.B int (*pcre_callout)(pcre_callout_block *);
.SH PCRE API
@@ -147,6 +153,16 @@ respectively. PCRE calls the memory management functions via these variables,
so a calling program can replace them if it wishes to intercept the calls. This
should be done before calling any PCRE functions.
+The global variables \fBpcre_stack_malloc\fR and \fBpcre_stack_free\fR are also
+indirections to memory management functions. These special functions are used
+only when PCRE is compiled to use the heap for remembering data, instead of
+recursive function calls. This is a non-standard way of building PCRE, for use
+in environments that have limited stacks. Because of the greater use of memory
+management, it runs more slowly. Separate functions are provided so that
+special-purpose external code can be used for this case. When used, these
+functions are always called in a stack-like manner (last obtained, first
+freed), and always for memory blocks of the same size.
+
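As a rough illustration of the kind of special-purpose code the paragraph above has in mind, the following C sketch shows an allocator that relies on the two stated guarantees: blocks are released in last-obtained, first-freed order, and every request has the same size. The names `lifo_malloc`, `lifo_free`, and `POOL_MAX_BLOCKS` are hypothetical and not part of PCRE. The sketch keeps its state in file-scope variables, so it assumes a single thread, and it is not taken from the PCRE sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical limit on nesting depth; chosen only for illustration. */
#define POOL_MAX_BLOCKS 64

static void  *pool_blocks[POOL_MAX_BLOCKS]; /* cached blocks, indexed by depth */
static size_t pool_block_size = 0;          /* size seen on the first request  */
static size_t pool_depth      = 0;          /* blocks currently handed out     */
static size_t pool_allocated  = 0;          /* blocks obtained from malloc()   */

/* Same signature as pcre_stack_malloc: void *(*)(size_t). */
void *lifo_malloc(size_t size)
{
    if (size == 0)
        return NULL;
    if (pool_block_size == 0)
        pool_block_size = size;             /* remember the one block size */

    /* Relies on the documented guarantee that all requests are equal-sized. */
    if (size != pool_block_size || pool_depth == POOL_MAX_BLOCKS)
        return NULL;

    /* Grow the cache only when no previously freed block is available. */
    if (pool_depth == pool_allocated) {
        void *p = malloc(size);
        if (p == NULL)
            return NULL;
        pool_blocks[pool_allocated++] = p;
    }
    return pool_blocks[pool_depth++];
}

/* Same signature as pcre_stack_free: void (*)(void *). */
void lifo_free(void *block)
{
    /* Last obtained, first freed: the block must be the top of the stack. */
    assert(pool_depth > 0 && block == pool_blocks[pool_depth - 1]);
    (void)block;
    pool_depth--;                           /* keep the block cached for reuse */
}
```

In a program linked against a PCRE build that uses the heap in this way, the two functions would presumably be installed with `pcre_stack_malloc = lifo_malloc;` and `pcre_stack_free = lifo_free;` before any other PCRE function is called. Because freed blocks stay cached, repeated matches avoid most calls to the general-purpose allocator, at the cost of never returning the memory.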
The global variable \fBpcre_callout\fR initially contains NULL. It can be set
by the caller to a "callout" function, which PCRE will then call at specified
points during a matching operation. Details are given in the \fBpcrecallout\fR
@@ -156,9 +172,9 @@ documentation.
.rs
.sp
The PCRE functions can be used in multi-threading applications, with the
-proviso that the memory management functions pointed to by \fBpcre_malloc\fR
-and \fBpcre_free\fR, and the callout function pointed to by \fBpcre_callout\fR,
-are shared by all threads.
+proviso that the memory management functions pointed to by \fBpcre_malloc\fR,
+\fBpcre_free\fR, \fBpcre_stack_malloc\fR, and \fBpcre_stack_free\fR, and the
+callout function pointed to by \fBpcre_callout\fR, are shared by all threads.
The compiled form of a regular expression is not altered during matching, so
the same compiled pattern can safely be used by several threads at once.
@@ -210,6 +226,15 @@ The output is an integer that gives the default limit for the number of
internal matching function calls in a \fBpcre_exec()\fR execution. Further
details are given with \fBpcre_exec()\fR below.
+ PCRE_CONFIG_STACKRECURSE
+
+The output is an integer that is set to one if internal recursion is
+implemented by recursive function calls that use the stack to remember their
+state. This is the usual way that PCRE is compiled. The output is zero if PCRE
+was compiled to use blocks of data on the heap instead of recursive function
+calls. In this case, \fBpcre_stack_malloc\fR and \fBpcre_stack_free\fR are
+called to manage memory blocks on the heap, thus avoiding the use of the stack.
+
.SH COMPILING A PATTERN
.rs
.sp
@@ -711,12 +736,21 @@ or turned out to be anchored by virtue of its contents, it cannot be made
unanchored at matching time.
When PCRE_UTF8 was set at compile time, the validity of the subject as a UTF-8
-string is automatically checked. If an invalid UTF-8 sequence of bytes is
-found, \fBpcre_exec()\fR returns the error PCRE_ERROR_BADUTF8. If you already
-know that your subject is valid, and you want to skip this check for
-performance reasons, you can set the PCRE_NO_UTF8_CHECK option when calling
-\fBpcre_exec()\fR. When this option is set, the effect of passing an invalid
-UTF-8 string as a subject is undefined. It may cause your program to crash.
+string is automatically checked, and the value of \fIstartoffset\fR is also
+checked to ensure that it points to the start of a UTF-8 character. If an
+invalid UTF-8 sequence of bytes is found, \fBpcre_exec()\fR returns the error
+PCRE_ERROR_BADUTF8. If \fIstartoffset\fR contains an invalid value,
+PCRE_ERROR_BADUTF8_OFFSET is returned.
+
+If you already know that your subject is valid, and you want to skip these
+checks for performance reasons, you can set the PCRE_NO_UTF8_CHECK option when
+calling \fBpcre_exec()\fR. You might want to do this for the second and
+subsequent calls to \fBpcre_exec()\fR if you are making repeated calls to find
+all the matches in a single subject string. However, you should be sure that
+the value of \fIstartoffset\fR points to the start of a UTF-8 character. When
+PCRE_NO_UTF8_CHECK is set, the effect of passing an invalid UTF-8 string as a
+subject, or a value of \fIstartoffset\fR that does not point to the start of a
+UTF-8 character, is undefined. Your program may crash.
There are also three further options that can be set only at matching time:
@@ -753,14 +787,17 @@ PCRE_NOTEMPTY set, and then if that fails by advancing the starting offset (see
below) and trying an ordinary match again.
The subject string is passed to \fBpcre_exec()\fR as a pointer in
-\fIsubject\fR, a length in \fIlength\fR, and a starting offset in
+\fIsubject\fR, a length in \fIlength\fR, and a starting byte offset in
\fIstartoffset\fR. Unlike the pattern string, the subject may contain binary
zero bytes. When the starting offset is zero, the search for a match starts at
the beginning of the subject, and this is by far the most common case.
If the pattern was compiled with the PCRE_UTF8 option, the subject must be a
-sequence of bytes that is a valid UTF-8 string. If an invalid UTF-8 string is
-passed, PCRE's behaviour is not defined.
+sequence of bytes that is a valid UTF-8 string, and the starting offset must
+point to the beginning of a UTF-8 character. If an invalid UTF-8 string or
+offset is passed, an error (either PCRE_ERROR_BADUTF8 or
+PCRE_ERROR_BADUTF8_OFFSET) is returned, unless the option PCRE_NO_UTF8_CHECK is
+set, in which case PCRE's behaviour is not defined.
A non-zero starting offset is useful when searching for another match in the
same subject by calling \fBpcre_exec()\fR again after a previous success.
@@ -892,10 +929,15 @@ This error is never generated by \fBpcre_exec()\fR itself. It is provided for
use by callout functions that want to yield a distinctive error code. See the
\fBpcrecallout\fR documentation for details.
- PCRE_ERROR_BADUTF8 (-10)
+ PCRE_ERROR_BADUTF8 (-10)
A string that contains an invalid UTF-8 byte sequence was passed as a subject.
+ PCRE_ERROR_BADUTF8_OFFSET (-11)
+
+The UTF-8 byte sequence that was passed as a subject was valid, but the value
+of \fIstartoffset\fR did not point to the beginning of a UTF-8 character.
+
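The requirement that \fIstartoffset\fR point to the start of a UTF-8 character can be checked by the caller before PCRE_NO_UTF8_CHECK is set. The C sketch below shows one way to do that, and to step forward by one whole character when retrying after an empty match. The helper names `utf8_is_char_start` and `utf8_next_char_start` are hypothetical and not part of the PCRE API. The sketch assumes the subject itself is already known to be valid UTF-8, and it treats the end of the subject as an acceptable offset, which is an assumption of this example rather than something the text above states.

```c
#include <stddef.h>

/*
 * Return 1 if 'offset' lies on a character boundary of 'subject', else 0.
 * In valid UTF-8, continuation bytes have the bit pattern 10xxxxxx, so any
 * byte that does not match that pattern begins a character.
 */
int utf8_is_char_start(const char *subject, int length, int offset)
{
    if (offset < 0 || offset > length)
        return 0;
    if (offset == length)
        return 1;   /* end of subject: assumed acceptable in this sketch */
    return (((unsigned char)subject[offset]) & 0xC0) != 0x80;
}

/*
 * Return the byte offset of the character that follows the one starting at
 * 'offset', or 'length' if there is none. Skips over continuation bytes.
 */
int utf8_next_char_start(const char *subject, int length, int offset)
{
    if (offset >= length)
        return length;
    offset++;
    while (offset < length &&
           (((unsigned char)subject[offset]) & 0xC0) == 0x80)
        offset++;
    return offset;
}
```

A caller making repeated calls to find all matches might validate the subject once on the first call, then use helpers like these to keep every later starting offset on a character boundary before passing PCRE_NO_UTF8_CHECK.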
.SH EXTRACTING CAPTURED SUBSTRINGS BY NUMBER
.rs
.sp
@@ -1035,6 +1077,6 @@ then call \fIpcre_copy_substring()\fR or \fIpcre_get_substring()\fR, as
appropriate.
.in 0
-Last updated: 20 August 2003
+Last updated: 09 December 2003
.br
Copyright (c) 1997-2003 University of Cambridge.