author | ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> | 2017-03-17 16:55:58 +0000 |
---|---|---|
committer | ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> | 2017-03-17 16:55:58 +0000 |
commit | ce69d550b09d9ecb4e153b6b9588257da33961fe (patch) | |
tree | 28aacc632101ef9007aa87f4bbe064f059019f1f /README | |
parent | 021f9123553364f824fa678ff441695361a22493 (diff) | |
download | pcre2-ce69d550b09d9ecb4e153b6b9588257da33961fe.tar.gz |
Documentation update.
git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@684 6239d852-aaf2-0410-a92c-79f79f948069
Diffstat (limited to 'README')
-rw-r--r-- | README | 91 |
1 files changed, 45 insertions, 46 deletions
@@ -15,8 +15,8 @@ subscribe or manage your subscription here:
 
    https://lists.exim.org/mailman/listinfo/pcre-dev
 
-Please read the NEWS file if you are upgrading from a previous release.
-The contents of this README file are:
+Please read the NEWS file if you are upgrading from a previous release. The
+contents of this README file are:
 
   The PCRE2 APIs
   Documentation for PCRE2
@@ -44,8 +44,8 @@ wrappers.
 
 The distribution does contain a set of C wrapper functions for the 8-bit
 library that are based on the POSIX regular expression API (see the pcre2posix
-man page). These can be found in a library called libpcre2-posix. Note that this
-just provides a POSIX calling interface to PCRE2; the regular expressions
+man page). These can be found in a library called libpcre2-posix. Note that
+this just provides a POSIX calling interface to PCRE2; the regular expressions
 themselves still follow Perl syntax and semantics. The POSIX API is restricted,
 and does not give full access to all of PCRE2's facilities.
 
@@ -95,10 +95,9 @@ PCRE2 documentation is supplied in two other forms:
 Building PCRE2 on non-Unix-like systems
 ---------------------------------------
 
-For a non-Unix-like system, please read the comments in the file
-NON-AUTOTOOLS-BUILD, though if your system supports the use of "configure" and
-"make" you may be able to build PCRE2 using autotools in the same way as for
-many Unix-like systems.
+For a non-Unix-like system, please read the file NON-AUTOTOOLS-BUILD, though if
+your system supports the use of "configure" and "make" you may be able to build
+PCRE2 using autotools in the same way as for many Unix-like systems.
 
 PCRE2 can also be configured using CMake, which can be run in various ways
 (command line, GUI, etc). This creates Makefiles, solution files, etc. The file
@@ -174,19 +173,19 @@ library. They are also documented in the pcre2build man page.
   architectures. If you try to enable it on an unsupported architecture, there
   will be a compile time error.
 
-. If you do not want to make use of the support for UTF-8 Unicode character
-  strings in the 8-bit library, UTF-16 Unicode character strings in the 16-bit
-  library, or UTF-32 Unicode character strings in the 32-bit library, you can
-  add --disable-unicode to the "configure" command. This reduces the size of
-  the libraries. It is not possible to configure one library with Unicode
-  support, and another without, in the same configuration.
+. If you do not want to make use of the default support for UTF-8 Unicode
+  character strings in the 8-bit library, UTF-16 Unicode character strings in
+  the 16-bit library, or UTF-32 Unicode character strings in the 32-bit
+  library, you can add --disable-unicode to the "configure" command. This
+  reduces the size of the libraries. It is not possible to configure one
+  library with Unicode support, and another without, in the same configuration.
+  It is also not possible to use --enable-ebcdic (see below) with Unicode
+  support, so if this option is set, you must also use --disable-unicode.
 
   When Unicode support is available, the use of a UTF encoding still has to be
   enabled by setting the PCRE2_UTF option at run time or starting a pattern
   with (*UTF). When PCRE2 is compiled with Unicode support, its input can only
-  either be ASCII or UTF-8/16/32, even when running on EBCDIC platforms. It is
-  not possible to use both --enable-unicode and --enable-ebcdic at the same
-  time.
+  either be ASCII or UTF-8/16/32, even when running on EBCDIC platforms.
 
   As well as supporting UTF strings, Unicode support includes support for the
   \P, \p, and \X sequences that recognize Unicode character properties.
@@ -232,18 +231,18 @@ library. They are also documented in the pcre2build man page.
     --with-match-limit=500000
 
   on the "configure" command. This is just the default; individual calls to
-  pcre2_match() can supply their own value. There is more discussion on the
-  pcre2api man page.
+  pcre2_match() can supply their own value. There is more discussion in the
+  pcre2api man page (search for pcre2_set_match_limit).
 
-. There is a separate counter that limits the depth of recursive function calls
-  during a matching process. This also has a default of ten million, which is
-  essentially "unlimited". You can change the default by setting, for example,
+. There is a separate counter that limits the depth of nested backtracking
+  during a matching process, which in turn limits the amount of memory that is
+  used. This also has a default of ten million, which is essentially
+  "unlimited". You can change the default by setting, for example,
 
-    --with-match-limit-recursion=500000
+    --with-match-limit-depth=5000
 
-  Recursive function calls use up the runtime stack; running out of stack can
-  cause programs to crash in strange ways. There is a discussion about stack
-  sizes in the pcre2stack man page.
+  There is more discussion in the pcre2api man page (search for
+  pcre2_set_depth_limit).
 
 . In the 8-bit library, the default maximum compiled pattern size is around
   64K bytes. You can increase this by adding --with-link-size=3 to the
@@ -254,20 +253,6 @@ library. They are also documented in the pcre2build man page.
   performance in the 8-bit and 16-bit libraries. In the 32-bit library, the
   link size setting is ignored, as 4-byte offsets are always used.
 
-. You can build PCRE2 so that its internal match() function that is called from
-  pcre2_match() does not call itself recursively. Instead, it uses memory
-  blocks obtained from the heap to save data that would otherwise be saved on
-  the stack. To build PCRE2 like this, use
-
-    --disable-stack-for-recursion
-
-  on the "configure" command. PCRE2 runs more slowly in this mode, but it may
-  be necessary in environments with limited stack sizes. This applies only to
-  the normal execution of the pcre2_match() function; if JIT support is being
-  successfully used, it is not relevant. Equally, it does not apply to
-  pcre2_dfa_match(), which does not use deeply nested recursion. There is a
-  discussion about stack sizes in the pcre2stack man page.
-
 . For speed, PCRE2 uses four tables for manipulating and identifying characters
   whose code point values are less than 256. By default, it uses a set of
   tables for ASCII encoding that is part of the distribution. If you specify
@@ -389,6 +374,13 @@ library. They are also documented in the pcre2build man page.
   string. Otherwise, it is assumed to be a file name, and the contents of the
   file are the test string.
 
+. Releases before 10.30 could be compiled with --disable-stack-for-recursion,
+  which caused pcre2_match() to use individual blocks on the heap for
+  backtracking instead of recursive function calls (which use the stack). This
+  is now obsolete since pcre2_match() was refactored always to use the heap (in
+  a much more efficient way than before). This option is retained for backwards
+  compatibility, but has no effect other than to output a warning.
+
 The "configure" script builds the following files for the basic C library:
 
 . Makefile the makefile that builds the library
@@ -662,25 +654,32 @@ Unicode support is enabled.
 Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in
 16-bit and 32-bit modes. These are tests that generate different output in
 8-bit mode. Each pair are for general cases and Unicode support, respectively.
+
 Test 13 checks the handling of non-UTF characters greater than 255 by
 pcre2_dfa_match() in 16-bit and 32-bit modes.
 
-Test 14 contains a number of tests that must not be run with JIT. They check,
+Test 14 contains some special UTF and UCP tests that give different output for
+the different widths.
+
+Test 15 contains a number of tests that must not be run with JIT. They check,
 among other non-JIT things, the match-limiting features of the intepretive
 matcher.
 
-Test 15 is run only when JIT support is not available. It checks that an
+Test 16 is run only when JIT support is not available. It checks that an
 attempt to use JIT has the expected behaviour.
 
-Test 16 is run only when JIT support is available. It checks JIT complete and
+Test 17 is run only when JIT support is available. It checks JIT complete and
 partial modes, match-limiting under JIT, and other JIT-specific features.
 
-Tests 17 and 18 are run only in 8-bit mode. They check the POSIX interface to
+Tests 18 and 19 are run only in 8-bit mode. They check the POSIX interface to
 the 8-bit library, without and with Unicode support, respectively.
 
-Test 19 checks the serialization functions by writing a set of compiled
+Test 20 checks the serialization functions by writing a set of compiled
 patterns to a file, and then reloading and checking them.
 
+Tests 21 and 22 test \C support when the use of \C is not locked out, without
+and with UTF support, respectively. Test 23 tests \C when it is locked out.
+
 
 Character tables
 ----------------
@@ -866,4 +865,4 @@ The distribution should contain the files listed below.
 Philip Hazel
 Email local part: ph10
 Email domain: cam.ac.uk
-Last updated: 01 November 2016
+Last updated: 17 March 2017