author ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> 2015-01-26 14:21:45 +0000
committer ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> 2015-01-26 14:21:45 +0000
commit 1434a44f884ad6637bcd6ae1876dd997b78135d0 (patch)
tree 755d211f4ea8f74b50a12f9a1d9795c30cda73e2 /README
parent 61cb4c76713910670d84bc0e3bb9dad8c2661b37 (diff)
Documentation clarifications.
git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@186 6239d852-aaf2-0410-a92c-79f79f948069
Diffstat (limited to 'README')
-rw-r--r-- README 16
1 file changed, 10 insertions, 6 deletions
diff --git a/README b/README
index 71d6f72..508fd1e 100644
--- a/README
+++ b/README
@@ -179,20 +179,24 @@ library. They are also documented in the pcre2build man page.
. If you do not want to make use of the support for UTF-8 Unicode character
strings in the 8-bit library, UTF-16 Unicode character strings in the 16-bit
- library, and UTF-32 Unicode character strings in the 32-bit library, you can
+ library, or UTF-32 Unicode character strings in the 32-bit library, you can
add --disable-unicode to the "configure" command. This reduces the size of
the libraries. It is not possible to configure one library with Unicode
support, and another without, in the same configuration.
When Unicode support is available, the use of a UTF encoding still has to be
- enabled by an option at run time. When PCRE2 is compiled with Unicode
- support, its input can only either be ASCII or UTF-8/16/32, even when running
- on EBCDIC platforms. It is not possible to use both --enable-unicode and
- --enable-ebcdic at the same time.
+ enabled by setting the PCRE2_UTF option at run time or starting a pattern
+ with (*UTF). When PCRE2 is compiled with Unicode support, its input can only
+ either be ASCII or UTF-8/16/32, even when running on EBCDIC platforms. It is
+ not possible to use both --enable-unicode and --enable-ebcdic at the same
+ time.
As well as supporting UTF strings, Unicode support includes support for the
\P, \p, and \X sequences that recognize Unicode character properties.
However, only the basic two-letter properties such as Lu are supported.
+ Escape sequences such as \d and \w in patterns do not by default make use of
+ Unicode properties, but can be made to do so by setting the PCRE2_UCP option
+ or starting a pattern with (*UCP).
. You can build PCRE2 to recognize either CR or LF or the sequence CRLF, or any
of the preceding, or any of the Unicode newline sequences, as indicating the
@@ -825,4 +829,4 @@ The distribution should contain the files listed below.
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
-Last updated: 20 January 2015
+Last updated: 26 January 2015