author ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> 2014-11-03 18:27:56 +0000
committer ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> 2014-11-03 18:27:56 +0000
commit c920d7941295fd6c46fffed9aa9e2e0de0b18570
tree b6f7098df730074a124b8d339d923ef273aa8822 /doc/html/pcre2.html
parent dbde828d9540f7373f1f5d9fbf17880b9e045f7d
Make --enable-unicode the default.
git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@132 6239d852-aaf2-0410-a92c-79f79f948069
Diffstat (limited to 'doc/html/pcre2.html')
-rw-r--r-- doc/html/pcre2.html 23
1 files changed, 12 insertions, 11 deletions
diff --git a/doc/html/pcre2.html b/doc/html/pcre2.html
index a94bd1a..4eac819 100644
--- a/doc/html/pcre2.html
+++ b/doc/html/pcre2.html
@@ -35,9 +35,10 @@ code units, which means that up to three separate libraries may be installed.
The original work to extend PCRE to 16-bit and 32-bit code units was done by
Zoltan Herczeg and Christian Persch, respectively. In all three cases, strings
can be interpreted either as one character per code unit, or as UTF-encoded
-Unicode, with support for Unicode general category properties. Unicode is
-optional at build time, and must be enabled explicitly at run time. The version
-of Unicode in use can be discovered by running
+Unicode, with support for Unicode general category properties. Unicode support
+is optional at build time (but is the default); however, processing strings as
+UTF code units must be enabled explicitly at run time. The version of Unicode
+in use can be discovered by running
<pre>
pcre2test -C
</PRE>
@@ -95,13 +96,13 @@ not exported.
<P>
If you are using PCRE2 in a non-UTF application that permits users to supply
arbitrary patterns for compilation, you should be aware of a feature that
-allows users to turn on UTF support from within a pattern, provided that PCRE2
-was built with Unicode support. For example, an 8-bit pattern that begins with
-"(*UTF)" turns on UTF-8 mode, which interprets patterns and subjects as strings
-of UTF-8 code units instead of individual 8-bit characters. This causes both
-the pattern and any data against which it is matched to be checked for UTF-8
-validity. If the data string is very long, such a check might use sufficiently
-many resources as to cause your application to lose performance.
+allows users to turn on UTF support from within a pattern. For example, an
+8-bit pattern that begins with "(*UTF)" turns on UTF-8 mode, which interprets
+patterns and subjects as strings of UTF-8 code units instead of individual
+8-bit characters. This causes both the pattern and any data against which it is
+matched to be checked for UTF-8 validity. If the data string is very long, such
+a check might use sufficiently many resources as to cause your application to
+lose performance.
</P>
<P>
One way of guarding against this possibility is to use the
@@ -173,7 +174,7 @@ use my two initials, followed by the two digits 10, at the domain cam.ac.uk.
</P>
<br><a name="SEC5" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 28 September 2014
+Last updated: 03 November 2014
<br>
Copyright &copy; 1997-2014 University of Cambridge.
<br>
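The second hunk notes that once UTF-8 mode is on (for example via a leading "(*UTF)"), both the pattern and the subject are checked for UTF-8 validity, and that this can be expensive for very long subjects. As a rough illustration of why, here is a minimal C sketch of such a check following the RFC 3629 rules. The function name `utf8_is_valid` is hypothetical; this is not PCRE2's internal validator, only a demonstration that the check has to inspect every code unit, so its cost grows linearly with the subject length.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical illustration of a UTF-8 validity check (RFC 3629 rules).
   NOT PCRE2's own validator: it only shows that every byte of the input
   is inspected once, so the cost is proportional to the input length. */
bool utf8_is_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t extra;          /* number of continuation bytes expected */
        unsigned long cp;      /* code point being decoded */

        if (c < 0x80) {        /* ASCII: one code unit, always valid */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1; cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2; cp = c & 0x0F;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3; cp = c & 0x07;
        } else {
            return false;      /* stray continuation byte or 0xF8..0xFF */
        }

        if (len - i <= extra)  /* sequence truncated by end of input */
            return false;

        for (size_t k = 1; k <= extra; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80)
                return false;  /* expected a 10xxxxxx continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        /* Reject overlong encodings, UTF-16 surrogates and values > U+10FFFF. */
        if ((extra == 1 && cp < 0x80) ||
            (extra == 2 && cp < 0x800) ||
            (extra == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) ||
            cp > 0x10FFFF)
            return false;

        i += extra + 1;
    }
    return true;
}
```

An application that accepts arbitrary user patterns but never intends to process UTF data avoids paying this per-subject cost only if it prevents the pattern from switching UTF mode on, which is the guard the surrounding documentation goes on to describe.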