author ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2012-01-14 16:45:24 +0000
committer ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2012-01-14 16:45:24 +0000
commit 2f5a8f10bf39f753de5036739c0b56b874c71f9a
tree d944f27c3c839153771d8de1dd199bfb1951b8d8
parent e64ccef119356d70a1782b07b6ac5f0be0c902e8
Documentation minor edits.
git-svn-id: svn://vcs.exim.org/pcre/code/trunk@873 2f5784b3-3f2a-0410-8824-cb99058d5e15
-rw-r--r-- ChangeLog 3
-rw-r--r-- README 19
-rw-r--r-- doc/pcrebuild.3 11
-rw-r--r-- doc/pcretest.1 14
4 files changed, 28 insertions, 19 deletions
diff --git a/ChangeLog b/ChangeLog
index c58ab53..dc2527e 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -32,6 +32,9 @@ Version 8.30
8. Ovector size of 2 is also supported by JIT based pcre_exec (the ovector size
rounding is not applied in this particular case).
+
+9. The invalid Unicode surrogate codepoints U+D800 to U+DFFF are now rejected
+ if they appear, or are escaped, in patterns.
Version 8.21 12-Dec-2011
diff --git a/README b/README
index 1a72ead..0832924 100644
--- a/README
+++ b/README
@@ -195,14 +195,17 @@ library. They are also documented in the pcrebuild man page.
the 8-bit library, or UTF-16 Unicode character strings in the 16-bit library,
you must add --enable-utf to the "configure" command. Without it, the code
for handling UTF-8 and UTF-16 is not included in the relevant library. Even
- when --enable-utf included, the use of UTF encoding still has to be enabled
- by an option at run time. When PCRE is compiled with this option, its input
- can only either be ASCII or UTF-8/16, even when running on EBCDIC platforms.
- It is not possible to use both --enable-utf and --enable-ebcdic at the same
- time.
-
-. The option --enable-utf8 is retained for backwards compatibility with earlier
- releases that did not support 16-bit character strings. It is synonymous with
+ when --enable-utf is included, the use of a UTF encoding still has to be
+ enabled by an option at run time. When PCRE is compiled with this option, its
+ input can only either be ASCII or UTF-8/16, even when running on EBCDIC
+ platforms. It is not possible to use both --enable-utf and --enable-ebcdic at
+ the same time.
+
+. There are no separate options for enabling UTF-8 and UTF-16 independently
+ because that would allow ridiculous settings such as requesting UTF-16
+ support while building only the 8-bit library. However, the option
+ --enable-utf8 is retained for backwards compatibility with earlier releases
+ that did not support 16-bit character strings. It is synonymous with
--enable-utf. It is not possible to configure one library with UTF support
and the other without in the same configuration.
diff --git a/doc/pcrebuild.3 b/doc/pcrebuild.3
index 88ca1b2..11efdc2 100644
--- a/doc/pcrebuild.3
+++ b/doc/pcrebuild.3
@@ -85,11 +85,14 @@ To build PCRE with support for UTF Unicode character strings, add
.sp
--enable-utf
.sp
-to the \fBconfigure\fP command. This setting applies to both libraries, adding
+to the \fBconfigure\fP command. This setting applies to both libraries, adding
support for UTF-8 to the 8-bit library and support for UTF-16 to the 16-bit
-library. It is not possible to build one library with UTF support and the other
-without in the same configuration. (For backwards compatibility, --enable-utf8
-is a synonym of --enable-utf.)
+library. There are no separate options for enabling UTF-8 and UTF-16
+independently because that would allow ridiculous settings such as requesting
+UTF-16 support while building only the 8-bit library. It is not possible to
+build one library with UTF support and the other without in the same
+configuration. (For backwards compatibility, --enable-utf8 is a synonym of
+--enable-utf.)
.P
Of itself, this setting does not make PCRE treat strings as UTF-8 or UTF-16. As
well as compiling PCRE with this option, you also have have to set the
diff --git a/doc/pcretest.1 b/doc/pcretest.1
index 0997220..1be87c1 100644
--- a/doc/pcretest.1
+++ b/doc/pcretest.1
@@ -549,12 +549,12 @@ The use of \ex{hh...} is not dependent on the use of the \fB/8\fP modifier on
the pattern. It is recognized always. There may be any number of hexadecimal
digits inside the braces; invalid values provoke error messages.
.P
-Note that \exhh specifies one byte in UTF-8 mode; this makes it possible to
-construct invalid UTF-8 sequences for testing purposes. On the other hand,
-\ex{hh} is interpreted as a UTF-8 character in UTF-8 mode, generating more than
-one byte if the value is greater than 127. When testing the 8-bit library not
-in UTF-8 mode, \ex{hh} generates one byte for values less than 256, and causes
-an error for greater values.
+Note that \exhh specifies one byte rather than one character in UTF-8 mode;
+this makes it possible to construct invalid UTF-8 sequences for testing
+purposes. On the other hand, \ex{hh} is interpreted as a UTF-8 character in
+UTF-8 mode, generating more than one byte if the value is greater than 127.
+When testing the 8-bit library not in UTF-8 mode, \ex{hh} generates one byte
+for values less than 256, and causes an error for greater values.
.P
In UTF-16 mode, all 4-digit \ex{hhhh} values are accepted. This makes it
possible to construct invalid UTF-16 sequences for testing purposes.
@@ -936,6 +936,6 @@ Cambridge CB2 3QH, England.
.rs
.sp
.nf
-Last updated: 13 January 2012
+Last updated: 14 January 2012
Copyright (c) 1997-2012 University of Cambridge.
.fi
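
The pcretest.1 hunk above notes that \x{hh} generates more than one byte in UTF-8 mode when the value is greater than 127, and the ChangeLog hunk notes that the surrogate codepoints U+D800 to U+DFFF are now rejected in patterns. The following is a minimal sketch of both rules in C. It is not PCRE's or pcretest's actual code: the function name sketch_encode_utf8 is hypothetical, and the upper limit of U+10FFFF is an assumption taken from the modern UTF-8 definition rather than from this commit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration only; not part of PCRE or pcretest.
 * Encodes one code point as UTF-8 into out[] and returns the byte count.
 * Returns 0 for surrogates (U+D800..U+DFFF) and for values above U+10FFFF
 * (the latter limit is an assumption, not something stated in the commit). */
size_t sketch_encode_utf8(uint32_t cp, unsigned char out[4])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* invalid surrogate codepoint */
    if (cp > 0x10FFFF)
        return 0;                       /* outside the assumed valid range */

    if (cp < 0x80) {                    /* values up to 127: one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                 /* three bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* four bytes */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

Under these assumptions, a value such as 0xE9 yields two bytes, whereas a raw \xhh escape as described in the pcretest.1 text would contribute the single byte 0xE9, which is how an invalid UTF-8 sequence can be constructed for testing.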