author ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2011-12-30 19:32:50 +0000
committer ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2011-12-30 19:32:50 +0000
commit 3e69b6dc685eead4b0dfc35191a26457c74f598a
tree 3cf48781f7a952f8468cd6cb4c98cce89c955b29 /README
parent d42d890c2e36e9749c36b180c1cc16b8f6ec0682
16-bit update of non-man documentation files and the PrepareRelease script.
git-svn-id: svn://vcs.exim.org/pcre/code/trunk@840 2f5784b3-3f2a-0410-8824-cb99058d5e15
Diffstat (limited to 'README')
-rw-r--r-- README 236
1 files changed, 134 insertions, 102 deletions
diff --git a/README b/README
index 9dc25d7..f75cfa0 100644
--- a/README
+++ b/README
@@ -34,16 +34,19 @@ The contents of this README file are:
The PCRE APIs
-------------
-PCRE is written in C, and it has its own API. The distribution also includes a
-set of C++ wrapper functions (see the pcrecpp man page for details), courtesy
-of Google Inc.
-
-In addition, there is a set of C wrapper functions that are based on the POSIX
-regular expression API (see the pcreposix man page). These end up in the
-library called libpcreposix. Note that this just provides a POSIX calling
-interface to PCRE; the regular expressions themselves still follow Perl syntax
-and semantics. The POSIX API is restricted, and does not give full access to
-all of PCRE's facilities.
+PCRE is written in C, and it has its own API. There are two sets of functions,
+one for the 8-bit library, which processes strings of bytes, and one for the
+16-bit library, which processes strings of 16-bit values. The distribution also
+includes a set of C++ wrapper functions (see the pcrecpp man page for details),
+courtesy of Google Inc., which can be used to call the 8-bit PCRE library from
+C++.
+
+In addition, there is a set of C wrapper functions (again, just for the 8-bit
+library) that are based on the POSIX regular expression API (see the pcreposix
+man page). These end up in the library called libpcreposix. Note that this just
+provides a POSIX calling interface to PCRE; the regular expressions themselves
+still follow Perl syntax and semantics. The POSIX API is restricted, and does
+not give full access to all of PCRE's facilities.
The header file for the POSIX-style functions is called pcreposix.h. The
official POSIX name is regex.h, but I did not want to risk possible problems
@@ -143,9 +146,9 @@ the usual methods of changing standard defaults are available. For example:
CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local
-specifies that the C compiler should be run with the flags '-O2 -Wall' instead
-of the default, and that "make install" should install PCRE under /opt/local
-instead of the default /usr/local.
+This command specifies that the C compiler should be run with the flags '-O2
+-Wall' instead of the default, and that "make install" should install PCRE
+under /opt/local instead of the default /usr/local.
If you want to build in a different directory, just run "configure" with that
directory as current. For example, suppose you have unpacked the PCRE source
@@ -168,11 +171,16 @@ library. They are also documented in the pcrebuild man page.
--disable-static
(See also "Shared libraries on Unix-like systems" below.)
+
+. By default, only the 8-bit library is built. If you add --enable-pcre16 to
+ the "configure" command, the 16-bit library is also built. If you want only
+ the 16-bit library, use "./configure --enable-pcre16 --disable-pcre8".
-. If you want to suppress the building of the C++ wrapper library, you can add
- --disable-cpp to the "configure" command. Otherwise, when "configure" is run,
- it will try to find a C++ compiler and C++ header files, and if it succeeds,
- it will try to build the C++ wrapper.
+. If you are building the 8-bit library and want to suppress the building of
+ the C++ wrapper library, you can add --disable-cpp to the "configure"
+ command. Otherwise, when "configure" is run without --disable-pcre8, it will
+ try to find a C++ compiler and C++ header files, and if it succeeds, it will
+ try to build the C++ wrapper.
. If you want to include support for just-in-time compiling, which can give
large performance improvements on certain platforms, add --enable-jit to the
@@ -184,19 +192,26 @@ library. They are also documented in the pcrebuild man page.
you add --disable-pcregrep-jit to the "configure" command.
. If you want to make use of the support for UTF-8 Unicode character strings in
- PCRE, you must add --enable-utf8 to the "configure" command. Without it, the
- code for handling UTF-8 is not included in the library. Even when included,
- it still has to be enabled by an option at run time. When PCRE is compiled
- with this option, its input can only either be ASCII or UTF-8, even when
- running on EBCDIC platforms. It is not possible to use both --enable-utf8 and
- --enable-ebcdic at the same time.
-
-. If, in addition to support for UTF-8 character strings, you want to include
- support for the \P, \p, and \X sequences that recognize Unicode character
- properties, you must add --enable-unicode-properties to the "configure"
- command. This adds about 30K to the size of the library (in the form of a
- property table); only the basic two-letter properties such as Lu are
- supported.
+ the 8-bit library, or UTF-16 Unicode character strings in the 16-bit library,
+ you must add --enable-utf to the "configure" command. Without it, the code
+ for handling UTF-8 and UTF-16 is not included in the relevant library. Even
+ when --enable-utf is included, the use of UTF encoding still has to be
+ enabled by an option at run time. When PCRE is compiled with this option,
+ its input can be only ASCII or UTF-8/16, even when running on EBCDIC
+ platforms. It is not possible to use both --enable-utf and --enable-ebcdic
+ at the same time.
+
+. The option --enable-utf8 is retained for backwards compatibility with earlier
+ releases that did not support 16-bit character strings. It is synonymous with
+ --enable-utf. It is not possible to configure one library with UTF support
+ and the other without in the same configuration.
+
+. If, in addition to support for UTF-8/16 character strings, you want to
+ include support for the \P, \p, and \X sequences that recognize Unicode
+ character properties, you must add --enable-unicode-properties to the
+ "configure" command. This adds about 30K to the size of the library (in the
+ form of a property table); only the basic two-letter properties such as Lu
+ are supported.
. You can build PCRE to recognize either CR or LF or the sequence CRLF or any
of the preceding, or any of the Unicode newline sequences as indicating the
@@ -249,10 +264,11 @@ library. They are also documented in the pcrebuild man page.
sizes in the pcrestack man page.
. The default maximum compiled pattern size is around 64K. You can increase
- this by adding --with-link-size=3 to the "configure" command. You can
- increase it even more by setting --with-link-size=4, but this is unlikely
- ever to be necessary. Increasing the internal link size will reduce
- performance.
+ this by adding --with-link-size=3 to the "configure" command. In the 8-bit
+ library, PCRE then uses three bytes instead of two for offsets to different
+ parts of the compiled pattern. In the 16-bit library, --with-link-size=3 is
+ the same as --with-link-size=4, which (in both libraries) uses four-byte
+ offsets. Increasing the internal link size reduces performance.
. You can build PCRE so that its internal match() function that is called from
pcre_exec() does not call itself recursively. Instead, it uses memory blocks
@@ -287,10 +303,12 @@ library. They are also documented in the pcrebuild man page.
This automatically implies --enable-rebuild-chartables (see above). However,
when PCRE is built this way, it always operates in EBCDIC. It cannot support
- both EBCDIC and UTF-8.
+ both EBCDIC and UTF-8/16.
-. It is possible to compile pcregrep to use libz and/or libbz2, in order to
- read .gz and .bz2 files (respectively), by specifying one or both of
+. The pcregrep program currently supports only 8-bit data files, and so
+ requires the 8-bit PCRE library. It is possible to compile pcregrep to use
+ libz and/or libbz2, in order to read .gz and .bz2 files (respectively), by
+ specifying one or both of
--enable-pcregrep-libz
--enable-pcregrep-libbz2
@@ -333,6 +351,7 @@ The "configure" script builds the following files for the basic C library:
. pcre-config script that shows the building settings such as CFLAGS
that were set for "configure"
. libpcre.pc ) data for the pkg-config command
+. libpcre16.pc )
. libpcreposix.pc )
. libtool script that builds shared and/or static libraries
. RunTest script for running tests on the basic C library
@@ -343,7 +362,8 @@ names config.h.generic and pcre.h.generic. These are provided for those who
have to build PCRE without using "configure" or CMake. If you use "configure"
or CMake, the .generic versions are not used.
-If a C++ compiler is found, the following files are also built:
+When building the 8-bit library, if a C++ compiler is found, the following
+files are also built:
. libpcrecpp.pc data for the pkg-config command
. pcrecpparg.h header file for calling PCRE via the C++ wrapper
@@ -353,13 +373,16 @@ The "configure" script also creates config.status, which is an executable
script that can be run to recreate the configuration, and config.log, which
contains compiler output from tests that "configure" runs.
-Once "configure" has run, you can run "make". It builds two libraries, called
-libpcre and libpcreposix, a test program called pcretest, and the pcregrep
-command. If a C++ compiler was found on your system, and you did not disable it
-with --disable-cpp, "make" also builds the C++ wrapper library, which is called
-libpcrecpp, and some test programs called pcrecpp_unittest,
-pcre_scanner_unittest, and pcre_stringpiece_unittest. If you enabled JIT
-support with --enable-jit, a test program called pcre_jit_test is also built.
+Once "configure" has run, you can run "make". This builds either or both of the
+libraries libpcre and libpcre16, and a test program called pcretest. If you
+enabled JIT support with --enable-jit, a test program called pcre_jit_test is
+built as well.
+
+If the 8-bit library is built, libpcreposix and the pcregrep command are also
+built. If, in addition, a C++ compiler was found on your system and you did
+not disable it with --disable-cpp, "make" builds the C++ wrapper library, which
+is called libpcrecpp, as well as some test programs called pcrecpp_unittest,
+pcre_scanner_unittest, and pcre_stringpiece_unittest.
The command "make check" runs all the appropriate tests. Details of the PCRE
tests are given below in a separate section of this document.
@@ -370,15 +393,17 @@ system. The following are installed (file names are all relative to the
Commands (bin):
pcretest
- pcregrep
+ pcregrep (if 8-bit support is enabled)
pcre-config
Libraries (lib):
- libpcre
- libpcreposix
- libpcrecpp (if C++ support is enabled)
+ libpcre16 (if 16-bit support is enabled)
+ libpcre (if 8-bit support is enabled)
+ libpcreposix (if 8-bit support is enabled)
+ libpcrecpp (if 8-bit and C++ support are enabled)
Configuration information (lib/pkgconfig):
+ libpcre16.pc
libpcre.pc
libpcreposix.pc
libpcrecpp.pc (if C++ support is enabled)
@@ -558,8 +583,8 @@ The RunTest script runs the pcretest test program (which is documented in its
own man page) on each of the relevant testinput files in the testdata
directory, and compares the output with the contents of the corresponding
testoutput files. Some tests are relevant only when certain build-time options
-were selected. For example, the tests for UTF-8 support are run only if
---enable-utf8 was used. RunTest outputs a comment when it skips a test.
+were selected. For example, the tests for UTF-8/16 support are run only if
+--enable-utf was used. RunTest outputs a comment when it skips a test.
Many of the tests that are not skipped are run up to three times. The second
run forces pcre_study() to be called for all patterns except for a few in some
@@ -567,17 +592,22 @@ tests that are marked "never study" (see the pcretest program for how this is
done). If JIT support is available, the non-DFA tests are run a third time,
this time with a forced pcre_study() with the PCRE_STUDY_JIT_COMPILE option.
-RunTest uses a file called testtry to hold the main output from pcretest
-(testsavedregex is also used as a working file). To run pcretest on just one of
-the test files, give its number as an argument to RunTest, for example:
+When both 8-bit and 16-bit support are enabled, the entire set of tests is run
+twice, once for each library. If you want to run just one set of tests, call
+RunTest with either the -8 or -16 option.
- RunTest 2
+RunTest uses a file called testtry to hold the main output from pcretest
+(testsavedregex is also used as a working file). To run pcretest on one or
+more specific test files, give their numbers as arguments to RunTest, for
+example:
+ RunTest 2 7 11
+
The first test file can be fed directly into the perltest.pl script to check
that Perl gives the same results. The only difference you should see is in the
first few lines, where the Perl version is given instead of the PCRE version.
-The second set of tests check pcre_fullinfo(), pcre_info(), pcre_study(),
+The second set of tests checks pcre_fullinfo(), pcre_study(),
pcre_copy_substring(), pcre_get_substring(), pcre_get_substring_list(), error
detection, and run-time flags that are specific to PCRE, as well as the POSIX
wrapper API. It also uses the debugging flags to check some of the internals of
@@ -612,36 +642,29 @@ RunTest.bat. The version of RunTest.bat included with PCRE 7.4 and above uses
Windows versions of test 2. More info on using RunTest.bat is included in the
document entitled NON-UNIX-USE.]
-The fourth test checks the UTF-8 support. This file can be also fed directly to
-the perltest.pl script, provided you are running Perl 5.8 or higher.
-
-The fifth test checks error handling with UTF-8 encoding, and internal UTF-8
-features of PCRE that are not relevant to Perl.
-
-The sixth test (which is Perl-5.10 compatible) checks the support for Unicode
-character properties. This file can be also fed directly to the perltest.pl
-script, provided you are running Perl 5.10 or higher.
+The fourth test checks the UTF-8/16 support, and the fifth test checks error
+handling and internal UTF features of PCRE that are not relevant to Perl. The
+sixth and seventh tests do the same for Unicode character property support.
-The seventh, eighth, and ninth tests check the pcre_dfa_exec() alternative
-matching function, in non-UTF-8 mode, UTF-8 mode, and UTF-8 mode with Unicode
-property support, respectively.
+The eighth, ninth, and tenth tests check the pcre_dfa_exec() alternative
+matching function, in non-UTF-8/16 mode, UTF-8/16 mode, and UTF-8/16 mode with
+Unicode property support, respectively.
-The tenth test checks some internal offsets and code size features; it is run
-only when the default "link size" of 2 is set (in other cases the sizes
+The eleventh test checks some internal offsets and code size features; it is
+run only when the default "link size" of 2 is set (in other cases the sizes
change) and when Unicode property support is enabled.
-The eleventh and twelfth tests check out features that are new in Perl 5.10,
-without and with UTF-8 support, respectively. This file can be also fed
-directly to the perltest.pl script, provided you are running Perl 5.10 or
-higher.
+The twelfth test is run only when JIT support is available, and the thirteenth
+test is run only when JIT support is not available. They test some JIT-specific
+features such as information output from pcretest about JIT compilation.
-The thirteenth test checks a number internals and non-Perl features concerned
-with Unicode property support.
+The fourteenth, fifteenth, and sixteenth tests are run only in 8-bit mode, and
+the seventeenth, eighteenth, and nineteenth tests are run only in 16-bit mode.
+These are tests that generate different output in the two modes. They are for
+general cases, UTF-8/16 support, and Unicode property support, respectively.
-The fourteenth test is run only when JIT support is available, and the
-fifteenth test is run only when JIT support is not available. They test some
-JIT-specific features such as information output from pcretest about JIT
-compilation.
+The twentieth test is run only in 16-bit mode. It tests some specific 16-bit
+features of the DFA matching engine.
Character tables
@@ -701,7 +724,9 @@ will cause PCRE to malfunction.
File manifest
-------------
-The distribution should contain the following files:
+The distribution should contain the files listed below. Where a file name is
+given as pcre[16]_xxx it means that there are two files, one with the name
+pcre_xxx and the other with the name pcre16_xxx.
(A) Source files of the PCRE library functions and their headers:
@@ -710,31 +735,36 @@ The distribution should contain the following files:
pcre_chartables.c.dist a default set of character tables that assume ASCII
coding; used, unless --enable-rebuild-chartables is
- specified, by copying to pcre_chartables.c
+ specified, by copying to pcre[16]_chartables.c
pcreposix.c )
- pcre_byte_order.c )
- pcre_compile.c )
- pcre_config.c )
- pcre_dfa_exec.c )
- pcre_exec.c )
- pcre_fullinfo.c )
- pcre_get.c ) sources for the functions in the library,
- pcre_globals.c ) and some internal functions that they use
- pcre_info.c )
- pcre_jit_compile.c )
- pcre_maketables.c )
- pcre_newline.c )
+ pcre[16]_byte_order.c )
+ pcre[16]_compile.c )
+ pcre[16]_config.c )
+ pcre[16]_dfa_exec.c )
+ pcre[16]_exec.c )
+ pcre[16]_fullinfo.c )
+ pcre[16]_get.c ) sources for the functions in the library,
+ pcre[16]_globals.c ) and some internal functions that they use
+ pcre[16]_jit_compile.c )
+ pcre[16]_maketables.c )
+ pcre[16]_newline.c )
+ pcre[16]_refcount.c )
+ pcre[16]_string_utils.c )
+ pcre[16]_study.c )
+ pcre[16]_tables.c )
+ pcre[16]_ucd.c )
+ pcre[16]_version.c )
+ pcre[16]_xclass.c )
pcre_ord2utf8.c )
- pcre_refcount.c )
- pcre_study.c )
- pcre_tables.c )
- pcre_ucd.c )
pcre_valid_utf8.c )
- pcre_version.c )
- pcre_xclass.c )
- pcre_printint.src ) debugging function that is #included in pcretest,
+ pcre16_ord2utf16.c )
+ pcre16_utf16_utils.c )
+ pcre16_valid_utf16.c )
+
+ pcre[16]_printint.c ) debugging function that is used by pcretest,
) and can also be #included in pcre_compile()
+
pcre.h.in template for pcre.h when built by "configure"
pcreposix.h header for the external POSIX wrapper API
pcre_internal.h header for internal use
@@ -796,6 +826,7 @@ The distribution should contain the following files:
doc/pcretest.txt plain text documentation of test program
doc/perltest.txt plain text documentation of Perl test program
install-sh a shell script for installing files
+ libpcre16.pc.in template for libpcre16.pc for pkg-config
libpcre.pc.in template for libpcre.pc for pkg-config
libpcreposix.pc.in template for libpcreposix.pc for pkg-config
libpcrecpp.pc.in template for libpcrecpp.pc for pkg-config
@@ -812,6 +843,7 @@ The distribution should contain the following files:
testdata/testinput* test data for main library tests
testdata/testoutput* expected test results
testdata/grep* input and output for pcregrep tests
+ testdata/* other supporting test files
(D) Auxiliary files for cmake support
@@ -842,4 +874,4 @@ The distribution should contain the following files:
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
-Last updated: 06 September 2011
+Last updated: 30 December 2011