Diffstat (limited to 'doc/pcreapi.3')
-rw-r--r-- doc/pcreapi.3 26
1 files changed, 25 insertions, 1 deletions
diff --git a/doc/pcreapi.3 b/doc/pcreapi.3
index fbd3d5d..0149f50 100644
--- a/doc/pcreapi.3
+++ b/doc/pcreapi.3
@@ -371,6 +371,18 @@ in the main
.\"
page.
+ PCRE_NO_UTF8_CHECK
+
+When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is
+automatically checked. If an invalid UTF-8 sequence of bytes is found,
+\fBpcre_compile()\fR returns an error. If you already know that your pattern is
+valid, and you want to skip this check for performance reasons, you can set the
+PCRE_NO_UTF8_CHECK option. When it is set, the effect of passing an invalid
+UTF-8 string as a pattern is undefined. It may cause your program to crash.
+Note that there is a similar option for suppressing the checking of subject
+strings passed to \fBpcre_exec()\fR.
+
+
.SH STUDYING A PATTERN
.rs
.sp
@@ -698,6 +710,14 @@ first matching position. However, if a pattern was compiled with PCRE_ANCHORED,
or turned out to be anchored by virtue of its contents, it cannot be made
unachored at matching time.
+When PCRE_UTF8 was set at compile time, the validity of the subject as a UTF-8
+string is automatically checked. If an invalid UTF-8 sequence of bytes is
+found, \fBpcre_exec()\fR returns the error PCRE_ERROR_BADUTF8. If you already
+know that your subject is valid, and you want to skip this check for
+performance reasons, you can set the PCRE_NO_UTF8_CHECK option when calling
+\fBpcre_exec()\fR. When this option is set, the effect of passing an invalid
+UTF-8 string as a subject is undefined. It may cause your program to crash.
+
There are also three further options that can be set only at matching time:
PCRE_NOTBOL
@@ -872,6 +892,10 @@ This error is never generated by \fBpcre_exec()\fR itself. It is provided for
use by callout functions that want to yield a distinctive error code. See the
\fBpcrecallout\fR documentation for details.
+ PCRE_ERROR_BADUTF8 (-10)
+
+A string that contains an invalid UTF-8 byte sequence was passed as a subject.
+
.SH EXTRACTING CAPTURED SUBSTRINGS BY NUMBER
.rs
.sp
@@ -1011,6 +1035,6 @@ then call \fIpcre_copy_substring()\fR or \fIpcre_get_substring()\fR, as
appropriate.
.in 0
-Last updated: 03 February 2003
+Last updated: 20 August 2003
.br
Copyright (c) 1997-2003 University of Cambridge.
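
Note on usage: the text added above says PCRE_NO_UTF8_CHECK is only safe when the caller already knows the pattern or subject is valid UTF-8. One way to establish that is to validate a string once and then reuse it across many calls. The following is a minimal sketch of such a check, not PCRE's own validation code. The helper name is_valid_utf8 is hypothetical, and the sketch assumes that strict RFC 3629 well-formedness (no overlong forms, no surrogates, nothing above U+10FFFF) is at least as strict as the check PCRE performs.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the PCRE API.
 * Returns 1 if the first len bytes of s are well-formed UTF-8
 * (RFC 3629 rules), and 0 otherwise. */
static int is_valid_utf8(const char *s, size_t len)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t i = 0;

    while (i < len) {
        unsigned char c = p[i];
        size_t n;           /* continuation bytes expected after c */
        size_t k;
        unsigned long cp;   /* decoded code point */
        unsigned long min;  /* smallest code point allowed for this length */

        if (c < 0x80) {     /* plain ASCII byte */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            n = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            n = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            n = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;       /* stray continuation byte or invalid lead byte */
        }

        if (len - i - 1 < n)
            return 0;       /* sequence is truncated */

        for (k = 1; k <= n; k++) {
            if ((p[i + k] & 0xC0) != 0x80)
                return 0;   /* not a continuation byte */
            cp = (cp << 6) | (unsigned long)(p[i + k] & 0x3F);
        }

        if (cp < min)
            return 0;       /* overlong encoding */
        if (cp > 0x10FFFF)
            return 0;       /* beyond the Unicode range */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;       /* UTF-16 surrogate */

        i += n + 1;
    }
    return 1;
}
```

If the helper returns 1 for a subject, the caller could then include PCRE_NO_UTF8_CHECK in the options passed to pcre_exec() for repeated matches against that same subject. If it returns 0, the option should be left out so that pcre_exec() reports PCRE_ERROR_BADUTF8 instead of exhibiting undefined behaviour.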