author ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> 2017-03-17 16:55:58 +0000
committer ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> 2017-03-17 16:55:58 +0000
commit ce69d550b09d9ecb4e153b6b9588257da33961fe
tree 28aacc632101ef9007aa87f4bbe064f059019f1f /README
parent 021f9123553364f824fa678ff441695361a22493
Documentation update.
git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@684 6239d852-aaf2-0410-a92c-79f79f948069
Diffstat (limited to 'README')
-rw-r--r-- README 91
1 files changed, 45 insertions, 46 deletions
diff --git a/README b/README
index ff22647..cf5dd7e 100644
--- a/README
+++ b/README
@@ -15,8 +15,8 @@ subscribe or manage your subscription here:
https://lists.exim.org/mailman/listinfo/pcre-dev
-Please read the NEWS file if you are upgrading from a previous release.
-The contents of this README file are:
+Please read the NEWS file if you are upgrading from a previous release. The
+contents of this README file are:
The PCRE2 APIs
Documentation for PCRE2
@@ -44,8 +44,8 @@ wrappers.
The distribution does contain a set of C wrapper functions for the 8-bit
library that are based on the POSIX regular expression API (see the pcre2posix
-man page). These can be found in a library called libpcre2-posix. Note that this
-just provides a POSIX calling interface to PCRE2; the regular expressions
+man page). These can be found in a library called libpcre2-posix. Note that
+this just provides a POSIX calling interface to PCRE2; the regular expressions
themselves still follow Perl syntax and semantics. The POSIX API is restricted,
and does not give full access to all of PCRE2's facilities.
@@ -95,10 +95,9 @@ PCRE2 documentation is supplied in two other forms:
Building PCRE2 on non-Unix-like systems
---------------------------------------
-For a non-Unix-like system, please read the comments in the file
-NON-AUTOTOOLS-BUILD, though if your system supports the use of "configure" and
-"make" you may be able to build PCRE2 using autotools in the same way as for
-many Unix-like systems.
+For a non-Unix-like system, please read the file NON-AUTOTOOLS-BUILD, though if
+your system supports the use of "configure" and "make" you may be able to build
+PCRE2 using autotools in the same way as for many Unix-like systems.
PCRE2 can also be configured using CMake, which can be run in various ways
(command line, GUI, etc). This creates Makefiles, solution files, etc. The file
@@ -174,19 +173,19 @@ library. They are also documented in the pcre2build man page.
architectures. If you try to enable it on an unsupported architecture, there
will be a compile time error.
-. If you do not want to make use of the support for UTF-8 Unicode character
- strings in the 8-bit library, UTF-16 Unicode character strings in the 16-bit
- library, or UTF-32 Unicode character strings in the 32-bit library, you can
- add --disable-unicode to the "configure" command. This reduces the size of
- the libraries. It is not possible to configure one library with Unicode
- support, and another without, in the same configuration.
+. If you do not want to make use of the default support for UTF-8 Unicode
+ character strings in the 8-bit library, UTF-16 Unicode character strings in
+ the 16-bit library, or UTF-32 Unicode character strings in the 32-bit
+ library, you can add --disable-unicode to the "configure" command. This
+ reduces the size of the libraries. It is not possible to configure one
+ library with Unicode support, and another without, in the same configuration.
+ It is also not possible to use --enable-ebcdic (see below) with Unicode
+ support, so if this option is set, you must also use --disable-unicode.
When Unicode support is available, the use of a UTF encoding still has to be
enabled by setting the PCRE2_UTF option at run time or starting a pattern
with (*UTF). When PCRE2 is compiled with Unicode support, its input can only
- either be ASCII or UTF-8/16/32, even when running on EBCDIC platforms. It is
- not possible to use both --enable-unicode and --enable-ebcdic at the same
- time.
+ either be ASCII or UTF-8/16/32, even when running on EBCDIC platforms.
As well as supporting UTF strings, Unicode support includes support for the
\P, \p, and \X sequences that recognize Unicode character properties.
@@ -232,18 +231,18 @@ library. They are also documented in the pcre2build man page.
--with-match-limit=500000
on the "configure" command. This is just the default; individual calls to
- pcre2_match() can supply their own value. There is more discussion on the
- pcre2api man page.
+ pcre2_match() can supply their own value. There is more discussion in the
+ pcre2api man page (search for pcre2_set_match_limit).
-. There is a separate counter that limits the depth of recursive function calls
- during a matching process. This also has a default of ten million, which is
- essentially "unlimited". You can change the default by setting, for example,
+. There is a separate counter that limits the depth of nested backtracking
+ during a matching process, which in turn limits the amount of memory that is
+ used. This also has a default of ten million, which is essentially
+ "unlimited". You can change the default by setting, for example,
- --with-match-limit-recursion=500000
+ --with-match-limit-depth=5000
- Recursive function calls use up the runtime stack; running out of stack can
- cause programs to crash in strange ways. There is a discussion about stack
- sizes in the pcre2stack man page.
+ There is more discussion in the pcre2api man page (search for
+ pcre2_set_depth_limit).
. In the 8-bit library, the default maximum compiled pattern size is around
64K bytes. You can increase this by adding --with-link-size=3 to the
@@ -254,20 +253,6 @@ library. They are also documented in the pcre2build man page.
performance in the 8-bit and 16-bit libraries. In the 32-bit library, the
link size setting is ignored, as 4-byte offsets are always used.
-. You can build PCRE2 so that its internal match() function that is called from
- pcre2_match() does not call itself recursively. Instead, it uses memory
- blocks obtained from the heap to save data that would otherwise be saved on
- the stack. To build PCRE2 like this, use
-
- --disable-stack-for-recursion
-
- on the "configure" command. PCRE2 runs more slowly in this mode, but it may
- be necessary in environments with limited stack sizes. This applies only to
- the normal execution of the pcre2_match() function; if JIT support is being
- successfully used, it is not relevant. Equally, it does not apply to
- pcre2_dfa_match(), which does not use deeply nested recursion. There is a
- discussion about stack sizes in the pcre2stack man page.
-
. For speed, PCRE2 uses four tables for manipulating and identifying characters
whose code point values are less than 256. By default, it uses a set of
tables for ASCII encoding that is part of the distribution. If you specify
@@ -389,6 +374,13 @@ library. They are also documented in the pcre2build man page.
string. Otherwise, it is assumed to be a file name, and the contents of the
file are the test string.
+. Releases before 10.30 could be compiled with --disable-stack-for-recursion,
+ which caused pcre2_match() to use individual blocks on the heap for
+ backtracking instead of recursive function calls (which use the stack). This
+ is now obsolete since pcre2_match() was refactored always to use the heap (in
+ a much more efficient way than before). This option is retained for backwards
+ compatibility, but has no effect other than to output a warning.
+
The "configure" script builds the following files for the basic C library:
. Makefile the makefile that builds the library
@@ -662,25 +654,32 @@ Unicode support is enabled.
Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in
16-bit and 32-bit modes. These are tests that generate different output in
8-bit mode. Each pair are for general cases and Unicode support, respectively.
+
Test 13 checks the handling of non-UTF characters greater than 255 by
pcre2_dfa_match() in 16-bit and 32-bit modes.
-Test 14 contains a number of tests that must not be run with JIT. They check,
+Test 14 contains some special UTF and UCP tests that give different output for
+the different widths.
+
+Test 15 contains a number of tests that must not be run with JIT. They check,
among other non-JIT things, the match-limiting features of the interpretive
matcher.
-Test 15 is run only when JIT support is not available. It checks that an
+Test 16 is run only when JIT support is not available. It checks that an
attempt to use JIT has the expected behaviour.
-Test 16 is run only when JIT support is available. It checks JIT complete and
+Test 17 is run only when JIT support is available. It checks JIT complete and
partial modes, match-limiting under JIT, and other JIT-specific features.
-Tests 17 and 18 are run only in 8-bit mode. They check the POSIX interface to
+Tests 18 and 19 are run only in 8-bit mode. They check the POSIX interface to
the 8-bit library, without and with Unicode support, respectively.
-Test 19 checks the serialization functions by writing a set of compiled
+Test 20 checks the serialization functions by writing a set of compiled
patterns to a file, and then reloading and checking them.
+Tests 21 and 22 test \C support when the use of \C is not locked out, without
+and with UTF support, respectively. Test 23 tests \C when it is locked out.
+
Character tables
----------------
@@ -866,4 +865,4 @@ The distribution should contain the files listed below.
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
-Last updated: 01 November 2016
+Last updated: 17 March 2017
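
For reference, the "configure" options touched by this change could be combined as in the following sketch. The option names are taken from the README text in the diff; the numeric values are only examples, and running from the top of an unpacked PCRE2 source tree is an assumption.

```shell
# Lower the default match limit and the default backtracking depth limit.
# The README states that both defaults are otherwise ten million.
./configure --with-match-limit=500000 --with-match-limit-depth=5000

# EBCDIC support cannot be combined with Unicode support, so Unicode
# must be disabled explicitly when --enable-ebcdic is used.
./configure --enable-ebcdic --disable-unicode
```

These only set build-time defaults; as the README notes, individual calls to pcre2_match() can still supply their own limits.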