author | ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> | 2012-01-14 16:45:24 +0000 |
---|---|---|
committer | ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> | 2012-01-14 16:45:24 +0000 |
commit | 2f5a8f10bf39f753de5036739c0b56b874c71f9a (patch) | |
tree | d944f27c3c839153771d8de1dd199bfb1951b8d8 | |
parent | e64ccef119356d70a1782b07b6ac5f0be0c902e8 (diff) | |
download | pcre-2f5a8f10bf39f753de5036739c0b56b874c71f9a.tar.gz |
Documentation minor edits.
git-svn-id: svn://vcs.exim.org/pcre/code/trunk@873 2f5784b3-3f2a-0410-8824-cb99058d5e15
-rw-r--r-- | ChangeLog | 3 |
-rw-r--r-- | README | 19 |
-rw-r--r-- | doc/pcrebuild.3 | 11 |
-rw-r--r-- | doc/pcretest.1 | 14 |
4 files changed, 28 insertions, 19 deletions
@@ -32,6 +32,9 @@ Version 8.30
 
 8. Ovector size of 2 is also supported by JIT based pcre_exec (the ovector size
    rounding is not applied in this particular case).
+
+9. The invalid Unicode surrogate codepoints U+D800 to U+DFFF are now rejected
+   if they appear, or are escaped, in patterns.
 
 
 Version 8.21 12-Dec-2011
@@ -195,14 +195,17 @@ library. They are also documented in the pcrebuild man page.
   the 8-bit library, or UTF-16 Unicode character strings in the 16-bit library,
   you must add --enable-utf to the "configure" command. Without it, the code
   for handling UTF-8 and UTF-16 is not included in the relevant library. Even
-  when --enable-utf included, the use of UTF encoding still has to be enabled
-  by an option at run time. When PCRE is compiled with this option, its input
-  can only either be ASCII or UTF-8/16, even when running on EBCDIC platforms.
-  It is not possible to use both --enable-utf and --enable-ebcdic at the same
-  time.
-
-. The option --enable-utf8 is retained for backwards compatibility with earlier
-  releases that did not support 16-bit character strings. It is synonymous with
+  when --enable-utf is included, the use of a UTF encoding still has to be
+  enabled by an option at run time. When PCRE is compiled with this option, its
+  input can only either be ASCII or UTF-8/16, even when running on EBCDIC
+  platforms. It is not possible to use both --enable-utf and --enable-ebcdic at
+  the same time.
+
+. There are no separate options for enabling UTF-8 and UTF-16 independently
+  because that would allow ridiculous settings such as requesting UTF-16
+  support while building only the 8-bit library. However, the option
+  --enable-utf8 is retained for backwards compatibility with earlier releases
+  that did not support 16-bit character strings. It is synonymous with
   --enable-utf. It is not possible to configure one library with UTF support
   and the other without in the same configuration.
 
diff --git a/doc/pcrebuild.3 b/doc/pcrebuild.3
index 88ca1b2..11efdc2 100644
--- a/doc/pcrebuild.3
+++ b/doc/pcrebuild.3
@@ -85,11 +85,14 @@ To build PCRE with support for UTF Unicode character strings, add
 .sp
   --enable-utf
 .sp
-to the \fBconfigure\fP command. This setting applies to both libraries, adding
+to the \fBconfigure\fP command. This setting applies to both libraries, adding
 support for UTF-8 to the 8-bit library and support for UTF-16 to the 16-bit
-library. It is not possible to build one library with UTF support and the other
-without in the same configuration. (For backwards compatibility, --enable-utf8
-is a synonym of --enable-utf.)
+library. There are no separate options for enabling UTF-8 and UTF-16
+independently because that would allow ridiculous settings such as requesting
+UTF-16 support while building only the 8-bit library. It is not possible to
+build one library with UTF support and the other without in the same
+configuration. (For backwards compatibility, --enable-utf8 is a synonym of
+--enable-utf.)
 .P
 Of itself, this setting does not make PCRE treat strings as UTF-8 or UTF-16. As
 well as compiling PCRE with this option, you also have have to set the
diff --git a/doc/pcretest.1 b/doc/pcretest.1
index 0997220..1be87c1 100644
--- a/doc/pcretest.1
+++ b/doc/pcretest.1
@@ -549,12 +549,12 @@ The use of \ex{hh...} is not dependent on the use of the \fB/8\fP modifier on
 the pattern. It is recognized always. There may be any number of hexadecimal
 digits inside the braces; invalid values provoke error messages.
 .P
-Note that \exhh specifies one byte in UTF-8 mode; this makes it possible to
-construct invalid UTF-8 sequences for testing purposes. On the other hand,
-\ex{hh} is interpreted as a UTF-8 character in UTF-8 mode, generating more than
-one byte if the value is greater than 127. When testing the 8-bit library not
-in UTF-8 mode, \ex{hh} generates one byte for values less than 256, and causes
-an error for greater values.
+Note that \exhh specifies one byte rather than one character in UTF-8 mode;
+this makes it possible to construct invalid UTF-8 sequences for testing
+purposes. On the other hand, \ex{hh} is interpreted as a UTF-8 character in
+UTF-8 mode, generating more than one byte if the value is greater than 127.
+When testing the 8-bit library not in UTF-8 mode, \ex{hh} generates one byte
+for values less than 256, and causes an error for greater values.
 .P
 In UTF-16 mode, all 4-digit \ex{hhhh} values are accepted. This makes it
 possible to construct invalid UTF-16 sequences for testing purposes.
@@ -936,6 +936,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 13 January 2012
+Last updated: 14 January 2012
 Copyright (c) 1997-2012 University of Cambridge.
 .fi
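
Note on the UTF-8 byte counts and the surrogate range mentioned in this commit: the pcretest.1 text says a \ex{hh} value above 127 produces more than one byte in UTF-8 mode, and the ChangeLog entry names U+D800 to U+DFFF as invalid. The C sketch below illustrates both points using the standard UTF-8 encoding. It is not taken from PCRE or pcretest. The function names is_surrogate and utf8_encode are hypothetical, and rejecting surrogates and values above U+10FFFF here follows RFC 3629 rather than any documented pcretest behaviour for subject data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if cp is in the surrogate range
   U+D800..U+DFFF, which the ChangeLog entry describes as invalid. */
int is_surrogate(uint32_t cp)
{
    return cp >= 0xD800 && cp <= 0xDFFF;
}

/* Hypothetical helper: encode cp as standard UTF-8 into out, which must
   have room for 4 bytes. Returns the number of bytes written, or 0 if cp
   is a surrogate or lies above U+10FFFF (an assumption of this sketch). */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (is_surrogate(cp) || cp > 0x10FFFF)
        return 0;
    if (cp < 0x80) {                      /* values up to 127: one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* two bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* three bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));   /* four bytes */
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

With this sketch, utf8_encode(0x41, buf) writes a single byte, utf8_encode(0xE9, buf) writes the two bytes C3 A9, and utf8_encode(0xD800, buf) returns 0 because the value is a surrogate.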