author     ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069>  2014-11-03 18:27:56 +0000
committer  ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069>  2014-11-03 18:27:56 +0000
commit     c920d7941295fd6c46fffed9aa9e2e0de0b18570
tree       b6f7098df730074a124b8d339d923ef273aa8822 /doc/pcre2.3
parent     dbde828d9540f7373f1f5d9fbf17880b9e045f7d
Make --enable-unicode the default.
git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@132 6239d852-aaf2-0410-a92c-79f79f948069
Diffstat (limited to 'doc/pcre2.3')
-rw-r--r-- doc/pcre2.3 | 25
1 files changed, 13 insertions, 12 deletions
diff --git a/doc/pcre2.3 b/doc/pcre2.3
index 8a31f5d..2c585d6 100644
--- a/doc/pcre2.3
+++ b/doc/pcre2.3
@@ -1,4 +1,4 @@
-.TH PCRE2 3 "28 September 2014" "PCRE2 10.00"
+.TH PCRE2 3 "03 November 2014" "PCRE2 10.00"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH INTRODUCTION
@@ -17,9 +17,10 @@ code units, which means that up to three separate libraries may be installed.
The original work to extend PCRE to 16-bit and 32-bit code units was done by
Zoltan Herczeg and Christian Persch, respectively. In all three cases, strings
can be interpreted either as one character per code unit, or as UTF-encoded
-Unicode, with support for Unicode general category properties. Unicode is
-optional at build time, and must be enabled explicitly at run time. The version
-of Unicode in use can be discovered by running
+Unicode, with support for Unicode general category properties. Unicode support
+is optional at build time (but is the default); however, processing strings as
+UTF code units must be enabled explicitly at run time. The version of Unicode
+in use can be discovered by running
.sp
pcre2test -C
.P
@@ -91,13 +92,13 @@ not exported.
.sp
If you are using PCRE2 in a non-UTF application that permits users to supply
arbitrary patterns for compilation, you should be aware of a feature that
-allows users to turn on UTF support from within a pattern, provided that PCRE2
-was built with Unicode support. For example, an 8-bit pattern that begins with
-"(*UTF)" turns on UTF-8 mode, which interprets patterns and subjects as strings
-of UTF-8 code units instead of individual 8-bit characters. This causes both
-the pattern and any data against which it is matched to be checked for UTF-8
-validity. If the data string is very long, such a check might use sufficiently
-many resources as to cause your application to lose performance.
+allows users to turn on UTF support from within a pattern. For example, an
+8-bit pattern that begins with "(*UTF)" turns on UTF-8 mode, which interprets
+patterns and subjects as strings of UTF-8 code units instead of individual
+8-bit characters. This causes both the pattern and any data against which it is
+matched to be checked for UTF-8 validity. If the data string is very long, such
+a check might use sufficiently many resources as to cause your application to
+lose performance.
.P
One way of guarding against this possibility is to use the
\fBpcre2_pattern_info()\fP function to check the compiled pattern's options for
@@ -175,6 +176,6 @@ use my two initials, followed by the two digits 10, at the domain cam.ac.uk.
.rs
.sp
.nf
-Last updated: 28 September 2014
+Last updated: 03 November 2014
Copyright (c) 1997-2014 University of Cambridge.
.fi