author: ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2008-08-24 16:25:20 +0000
committer: ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2008-08-24 16:25:20 +0000
commit: 41687a877dcdcc63d1d53b0648600018ceecb2eb
tree: 97f2b12f82e5d334ff80311cbe59d98fefd81c33
parent: 4f2129b124fcdc66251bb1cf3b7ec4486d9c50d7
Make it clearer that ovector values are byte offsets, not character counts.
git-svn-id: svn://vcs.exim.org/pcre/code/trunk@368 2f5784b3-3f2a-0410-8824-cb99058d5e15
-rw-r--r-- ChangeLog 7
-rw-r--r-- doc/pcreapi.3 49
2 files changed, 32 insertions, 24 deletions
diff --git a/ChangeLog b/ChangeLog
index 4a0124a..a491cb1 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,7 +1,7 @@
ChangeLog for PCRE
------------------
-Version 8.0 02 Jul-08
+Version 7.8 25-Aug-08
---------------------
1. Replaced UCP searching code with optimized version as implemented for Ad
@@ -65,6 +65,11 @@ Version 8.0 02 Jul-08
15. Lazy qualifiers were not working in some cases in UTF-8 mode. For example,
/^[^d]*?$/8 failed to match "abc".
+
+16. Added a missing copyright notice to pcrecpp_internal.h.
+
+17. Make it more clear in the documentation that values returned from
+ pcre_exec() in ovector are byte offsets, not character counts.
Version 7.7 07-May-08
diff --git a/doc/pcreapi.3 b/doc/pcreapi.3
index f68d0ed..30c3d23 100644
--- a/doc/pcreapi.3
+++ b/doc/pcreapi.3
@@ -1371,11 +1371,11 @@ documentation.
.rs
.sp
The subject string is passed to \fBpcre_exec()\fP as a pointer in
-\fIsubject\fP, a length in \fIlength\fP, and a starting byte offset in
-\fIstartoffset\fP. In UTF-8 mode, the byte offset must point to the start of a
-UTF-8 character. Unlike the pattern string, the subject may contain binary zero
-bytes. When the starting offset is zero, the search for a match starts at the
-beginning of the subject, and this is by far the most common case.
+\fIsubject\fP, a length (in bytes) in \fIlength\fP, and a starting byte offset
+in \fIstartoffset\fP. In UTF-8 mode, the byte offset must point to the start of
+a UTF-8 character. Unlike the pattern string, the subject may contain binary
+zero bytes. When the starting offset is zero, the search for a match starts at
+the beginning of the subject, and this is by far the most common case.
.P
A non-zero starting offset is useful when searching for another match in the
same subject by calling \fBpcre_exec()\fP again after a previous success.
@@ -1409,38 +1409,41 @@ pattern. Following the usage in Jeffrey Friedl's book, this is called
a fragment of a pattern that picks out a substring. PCRE supports several other
kinds of parenthesized subpattern that do not cause substrings to be captured.
.P
-Captured substrings are returned to the caller via a vector of integer offsets
-whose address is passed in \fIovector\fP. The number of elements in the vector
-is passed in \fIovecsize\fP, which must be a non-negative number. \fBNote\fP:
-this argument is NOT the size of \fIovector\fP in bytes.
+Captured substrings are returned to the caller via a vector of integers whose
+address is passed in \fIovector\fP. The number of elements in the vector is
+passed in \fIovecsize\fP, which must be a non-negative number. \fBNote\fP: this
+argument is NOT the size of \fIovector\fP in bytes.
.P
The first two-thirds of the vector is used to pass back captured substrings,
each substring using a pair of integers. The remaining third of the vector is
used as workspace by \fBpcre_exec()\fP while matching capturing subpatterns,
-and is not available for passing back information. The length passed in
+and is not available for passing back information. The number passed in
\fIovecsize\fP should always be a multiple of three. If it is not, it is
rounded down.
.P
When a match is successful, information about captured substrings is returned
in pairs of integers, starting at the beginning of \fIovector\fP, and
-continuing up to two-thirds of its length at the most. The first element of a
-pair is set to the offset of the first character in a substring, and the second
-is set to the offset of the first character after the end of a substring. The
-first pair, \fIovector[0]\fP and \fIovector[1]\fP, identify the portion of the
-subject string matched by the entire pattern. The next pair is used for the
-first capturing subpattern, and so on. The value returned by \fBpcre_exec()\fP
-is one more than the highest numbered pair that has been set. For example, if
-two substrings have been captured, the returned value is 3. If there are no
-capturing subpatterns, the return value from a successful match is 1,
-indicating that just the first pair of offsets has been set.
+continuing up to two-thirds of its length at the most. The first element of
+each pair is set to the byte offset of the first character in a substring, and
+the second is set to the byte offset of the first character after the end of a
+substring. \fBNote\fP: these values are always byte offsets, even in UTF-8
+mode. They are not character counts.
+.P
+The first pair of integers, \fIovector[0]\fP and \fIovector[1]\fP, identify the
+portion of the subject string matched by the entire pattern. The next pair is
+used for the first capturing subpattern, and so on. The value returned by
+\fBpcre_exec()\fP is one more than the highest numbered pair that has been set.
+For example, if two substrings have been captured, the returned value is 3. If
+there are no capturing subpatterns, the return value from a successful match is
+1, indicating that just the first pair of offsets has been set.
.P
If a capturing subpattern is matched repeatedly, it is the last portion of the
string that it matched that is returned.
.P
If the vector is too small to hold all the captured substring offsets, it is
used as far as possible (up to two-thirds of its length), and the function
-returns a value of zero. In particular, if the substring offsets are not of
-interest, \fBpcre_exec()\fP may be called with \fIovector\fP passed as NULL and
+returns a value of zero. If the substring offsets are not of interest,
+\fBpcre_exec()\fP may be called with \fIovector\fP passed as NULL and
\fIovecsize\fP as zero. However, if the pattern contains back references and
the \fIovector\fP is not big enough to remember the related substrings, PCRE
has to get additional memory for use during matching. Thus it is usually
@@ -1975,6 +1978,6 @@ Cambridge CB2 3QH, England.
.rs
.sp
.nf
-Last updated: 12 April 2008
+Last updated: 24 August 2008
Copyright (c) 1997-2008 University of Cambridge.
.fi
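
The distinction this commit documents (ovector values are byte offsets, not character counts) matters whenever a UTF-8 subject contains multi-byte characters before the match. The following is a minimal sketch of how a caller might convert a byte offset into a character count. The function utf8_chars_before() is a hypothetical helper, not part of the PCRE API, and it assumes the subject is valid UTF-8 and that the offset lies on a character boundary, which the documentation above says is the case for offsets returned in ovector.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of PCRE): return the number of UTF-8
   characters that precede byte_offset in subject.
   Assumptions: subject is valid UTF-8 and byte_offset lies on a character
   boundary, as the byte offsets stored in ovector do. */
int utf8_chars_before(const char *subject, int byte_offset)
{
    int count = 0;
    int i;

    assert(subject != NULL && byte_offset >= 0);

    for (i = 0; i < byte_offset; i++) {
        /* Continuation bytes look like 10xxxxxx; any other byte starts
           a new character, so only those are counted. */
        if (((unsigned char)subject[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

As an illustration, in the subject "caf\xc3\xa9 au lait" the substring "au" starts at byte offset 6 and ends at byte offset 8, because the accented character occupies two bytes. Passing those offsets to the helper gives character positions 5 and 7. For a subject that is pure ASCII the two numberings coincide.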