author | ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> | 2014-11-03 18:27:56 +0000 |
---|---|---|
committer | ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> | 2014-11-03 18:27:56 +0000 |
commit | c920d7941295fd6c46fffed9aa9e2e0de0b18570 | |
tree | b6f7098df730074a124b8d339d923ef273aa8822 /doc/pcre2.3 | |
parent | dbde828d9540f7373f1f5d9fbf17880b9e045f7d | |
Make --enable-unicode the default.
git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@132 6239d852-aaf2-0410-a92c-79f79f948069
Diffstat (limited to 'doc/pcre2.3')
-rw-r--r-- | doc/pcre2.3 | 25 |
1 files changed, 13 insertions, 12 deletions
diff --git a/doc/pcre2.3 b/doc/pcre2.3
index 8a31f5d..2c585d6 100644
--- a/doc/pcre2.3
+++ b/doc/pcre2.3
@@ -1,4 +1,4 @@
-.TH PCRE2 3 "28 September 2014" "PCRE2 10.00"
+.TH PCRE2 3 "03 November 2014" "PCRE2 10.00"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH INTRODUCTION
@@ -17,9 +17,10 @@ code units, which means that up to three separate libraries may be installed.
 The original work to extend PCRE to 16-bit and 32-bit code units was done by
 Zoltan Herczeg and Christian Persch, respectively. In all three cases, strings
 can be interpreted either as one character per code unit, or as UTF-encoded
-Unicode, with support for Unicode general category properties. Unicode is
-optional at build time, and must be enabled explicitly at run time. The version
-of Unicode in use can be discovered by running
+Unicode, with support for Unicode general category properties. Unicode support
+is optional at build time (but is the default); however, processing strings as
+UTF code units must be enabled explicitly at run time. The version of Unicode
+in use can be discovered by running
 .sp
   pcre2test -C
 .P
@@ -91,13 +92,13 @@ not exported.
 .sp
 If you are using PCRE2 in a non-UTF application that permits users to supply
 arbitrary patterns for compilation, you should be aware of a feature that
-allows users to turn on UTF support from within a pattern, provided that PCRE2
-was built with Unicode support. For example, an 8-bit pattern that begins with
-"(*UTF)" turns on UTF-8 mode, which interprets patterns and subjects as strings
-of UTF-8 code units instead of individual 8-bit characters. This causes both
-the pattern and any data against which it is matched to be checked for UTF-8
-validity. If the data string is very long, such a check might use sufficiently
-many resources as to cause your application to lose performance.
+allows users to turn on UTF support from within a pattern. For example, an
+8-bit pattern that begins with "(*UTF)" turns on UTF-8 mode, which interprets
+patterns and subjects as strings of UTF-8 code units instead of individual
+8-bit characters. This causes both the pattern and any data against which it is
+matched to be checked for UTF-8 validity. If the data string is very long, such
+a check might use sufficiently many resources as to cause your application to
+lose performance.
 .P
 One way of guarding against this possibility is to use the
 \fBpcre2_pattern_info()\fP function to check the compiled pattern's options for
@@ -175,6 +176,6 @@ use my two initials, followed by the two digits 10, at the domain cam.ac.uk.
 .rs
 .sp
 .nf
-Last updated: 28 September 2014
+Last updated: 03 November 2014
 Copyright (c) 1997-2014 University of Cambridge.
 .fi