1 files changed, 47 insertions, 11 deletions
diff --git a/doc/pcre.txt b/doc/pcre.txt
index 1ec5f2c..ad6f3b2 100644
--- a/doc/pcre.txt
+++ b/doc/pcre.txt
@@ -118,10 +118,19 @@ UTF-8 SUPPORT
      The following comments apply when PCRE is running  in  UTF-8
      mode:
 
-     1. PCRE assumes that the strings it is given  contain  valid
-     UTF-8  codes. It does not diagnose invalid UTF-8 strings. If
-     you pass invalid UTF-8 strings  to  PCRE,  the  results  are
-     undefined.
+     1. When you set the PCRE_UTF8 flag, the  strings  passed  as
+     patterns  and  subjects are checked for validity on entry to
+     the relevant  functions.  If  an  invalid  UTF-8  string  is
+     passed,  an  error  return is given. In some situations, you
+     may already know that your strings are valid, and  therefore
+     want  to  skip these checks in order to improve performance.
+     If you set the PCRE_NO_UTF8_CHECK flag at compile time or at
+     run  time,  PCRE  assumes  that the pattern or subject it is
+     given (respectively) contains only  valid  UTF-8  codes.  In
+     this  case, it does not diagnose an invalid UTF-8 string. If
+     you  pass   an   invalid   UTF-8   string   to   PCRE   when
+     PCRE_NO_UTF8_CHECK  is  set, the results are undefined. Your
+     program may crash.
 
      2. In a pattern, the escape sequence \x{...}, where the con-
      tents  of  the  braces is a string of hexadecimal digits, is
@@ -164,7 +173,7 @@ AUTHOR
      Cambridge CB2 3QG, England.
      Phone: +44 1223 334714
 
-Last updated: 04 February 2003
+Last updated: 20 August 2003
 Copyright (c) 1997-2003 University of Cambridge.
 -----------------------------------------------------------------------------
 
@@ -654,6 +663,20 @@ COMPILING A PATTERN
      option  changes  the behaviour of PCRE are given in the sec-
      tion on UTF-8 support in the main pcre page.
 
+       PCRE_NO_UTF8_CHECK
+
+     When PCRE_UTF8 is set, the validity  of  the  pattern  as  a
+     UTF-8  string  is automatically checked. If an invalid UTF-8
+     sequence of bytes is found, pcre_compile() returns an error.
+     If you already know that your pattern is valid, and you want
+     to skip this check for performance reasons, you can set  the
+     PCRE_NO_UTF8_CHECK  option.  When  it  is set, the effect of
+     passing an invalid UTF-8 string as a pattern  is  undefined.
+     It  may  cause  your program to crash.  Note that there is a
+     similar option  for  suppressing  the  checking  of  subject
+     strings passed to pcre_exec().
+
+
 
 STUDYING A PATTERN
 
@@ -747,7 +770,6 @@ INFORMATION ABOUT A PATTERN
      compiled pattern. It replaces the obsolete pcre_info() func-
      tion, which is nevertheless retained for backwards compabil-
      ity (and is documented below).
-
      The first argument for pcre_fullinfo() is a pointer  to  the
      compiled  pattern.  The  second  argument  is  the result of
      pcre_study(), or NULL if the pattern was  not  studied.  The
@@ -1014,6 +1036,16 @@ MATCHING A PATTERN
      turned out to be anchored by virtue of its contents, it can-
      not be made unachored at matching time.
 
+     When PCRE_UTF8 was set at compile time, the validity of  the
+     subject  as  a  UTF-8 string is automatically checked. If an
+     invalid  UTF-8  sequence  of  bytes  is  found,  pcre_exec()
+     returns  the  error  PCRE_ERROR_BADUTF8. If you already know
+     that your subject is valid, and you want to skip this  check
+     for  performance reasons, you can set the PCRE_NO_UTF8_CHECK
+     option when calling pcre_exec(). When this  option  is  set,
+     the  effect  of passing an invalid UTF-8 string as a subject
+     is undefined. It may cause your program to crash.
+
      There are also three further options that can be set only at
      matching time:
 
@@ -1103,7 +1135,6 @@ MATCHING A PATTERN
      used for a fragment of a pattern that picks out a substring.
      PCRE supports several other kinds of  parenthesized  subpat-
      tern that do not cause substrings to be captured.
-
      Captured substrings are returned to the caller via a  vector
      of  integer  offsets whose address is passed in ovector. The
      number of elements in the vector is passed in ovecsize.  The
@@ -1219,6 +1250,11 @@ MATCHING A PATTERN
      distinctive error code. See  the  pcrecallout  documentation
      for details.
 
+       PCRE_ERROR_BADUTF8       (-10)
+
+     A string that contains an invalid UTF-8  byte  sequence  was
+     passed as a subject.
+
 
 EXTRACTING CAPTURED SUBSTRINGS BY NUMBER
 
@@ -1255,7 +1291,6 @@ EXTRACTING CAPTURED SUBSTRINGS BY NUMBER
      returned zero, indicating that it ran out of space in  ovec-
      tor,  the  value passed as stringcount should be the size of
      the vector divided by three.
-
      The functions pcre_copy_substring() and pcre_get_substring()
      extract a single substring, whose number is given as string-
      number. A value of zero extracts the substring that  matched
@@ -1352,7 +1387,7 @@ EXTRACTING CAPTURED SUBSTRINGS BY NAME
      succeeds,    they   then   call   pcre_copy_substring()   or
      pcre_get_substring(), as appropriate.
 
-Last updated: 03 February 2003
+Last updated: 20 August 2003
 Copyright (c) 1997-2003 University of Cambridge.
 -----------------------------------------------------------------------------
 
@@ -1420,8 +1455,9 @@ PCRE CALLOUTS
      The current_position field contains the  offset  within  the
      subject of the current match pointer.
 
-     The capture_top field contains the  number  of  the  highest
-     captured substring so far.
+     The capture_top field contains one more than the  number  of
+     the  highest  numbered captured substring so far. If no sub-
+     strings have been captured, the value of capture_top is one.
 
      The capture_last field  contains  the  number  of  the  most
      recently captured substring.
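
The new text says a caller who already knows a string is valid UTF-8 can set PCRE_NO_UTF8_CHECK to skip PCRE's own check. One way to obtain that knowledge is to validate each string once, when it enters the program. The sketch below is a minimal illustration under stated assumptions: is_valid_utf8() is a hypothetical helper, not a PCRE function, and it applies the RFC 3629 rules, which are not necessarily the rules PCRE's internal check uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of PCRE. Returns 1 if the first len bytes
   of s are well-formed UTF-8 in the RFC 3629 sense, 0 otherwise. */
static int is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL || len == 0);

    while (i < len) {
        unsigned char c = s[i];
        size_t extra;       /* continuation bytes expected after the lead byte */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point legal for this length */
        size_t k;

        if (c < 0x80) {     /* single-byte (ASCII) character */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;       /* stray continuation byte or invalid lead byte */
        }

        if (extra > len - i - 1)
            return 0;       /* sequence runs past the end of the string */

        for (k = 1; k <= extra; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0;   /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min)
            return 0;       /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;       /* UTF-16 surrogate */
        if (cp > 0x10FFFF)
            return 0;       /* beyond the Unicode range */

        i += extra + 1;
    }
    return 1;
}
```

A caller that has run such a check on a subject it matches many times could then pass PCRE_NO_UTF8_CHECK to pcre_exec(). When in doubt, leaving the option unset is the safer choice, because pcre_exec() then returns PCRE_ERROR_BADUTF8 instead of exhibiting undefined behaviour.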