author chpe <chpe@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2012-10-16 15:53:30 +0000
committer chpe <chpe@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2012-10-16 15:53:30 +0000
commit 62c2f93fe63ee94ff2692091a42a7d594f5d4fe3
tree 3d1739b24c57943c20fa880eed55ab341db96a81 /README
parent 3f6d05379ea067a3b4f4a61e4be268ee8c37e7a6
pcre32: Add 32-bit library
Create libpcre32 that operates on 32-bit characters (UTF-32). This turned out to be surprisingly simple after the UTF-16 support was introduced; mostly just extra ifdefs and adjusting and adding some tests.

git-svn-id: svn://vcs.exim.org/pcre/code/trunk@1055 2f5784b3-3f2a-0410-8824-cb99058d5e15
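
As a rough illustration of how the build options touched by this change combine, the shell sketch below assembles a "configure" argument list from the code unit widths that are wanted. The function name pcre_configure_args is hypothetical and not part of PCRE; only the flags --enable-pcre16, --enable-pcre32 and --disable-pcre8 come from the README text in this diff.

```shell
#!/bin/sh
# Hypothetical helper (not part of PCRE): print the "configure" flags
# needed to build the requested libraries. Arguments are code unit
# widths: 8, 16 and/or 32.
pcre_configure_args() {
    if [ "$#" -eq 0 ]; then
        echo "usage: pcre_configure_args WIDTH..." >&2
        return 1
    fi
    want8=no
    flags=""
    for width in "$@"; do
        case "$width" in
            8)  want8=yes ;;
            16) flags="$flags --enable-pcre16" ;;
            32) flags="$flags --enable-pcre32" ;;
            *)  echo "unknown width: $width" >&2; return 1 ;;
        esac
    done
    # The 8-bit library is built by default, so it has to be switched
    # off explicitly when it is not wanted.
    if [ "$want8" = no ]; then
        flags="$flags --disable-pcre8"
    fi
    # Strip the leading space left by the concatenation above.
    echo "${flags# }"
}

# Possible use from the top of the source tree (word splitting of the
# unquoted substitution is intended here):
#   ./configure $(pcre_configure_args 16 32)
```

Asking for widths 16 and 32 without 8 yields the two enable flags followed by --disable-pcre8; asking for 8 alone yields no flags, because the 8-bit library is the default.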
Diffstat (limited to 'README')
-rw-r--r-- README 134
1 files changed, 75 insertions, 59 deletions
diff --git a/README b/README
index 0d3cffc..a65cf9e 100644
--- a/README
+++ b/README
@@ -35,9 +35,10 @@ The contents of this README file are:
The PCRE APIs
-------------
-PCRE is written in C, and it has its own API. There are two sets of functions,
-one for the 8-bit library, which processes strings of bytes, and one for the
-16-bit library, which processes strings of 16-bit values. The distribution also
+PCRE is written in C, and it has its own API. There are three sets of functions,
+one for the 8-bit library, which processes strings of bytes, one for the
+16-bit library, which processes strings of 16-bit values, and one for the 32-bit
+library, which processes strings of 32-bit values. The distribution also
includes a set of C++ wrapper functions (see the pcrecpp man page for details),
courtesy of Google Inc., which can be used to call the 8-bit PCRE library from
C++.
@@ -183,8 +184,10 @@ library. They are also documented in the pcrebuild man page.
(See also "Shared libraries on Unix-like systems" below.)
. By default, only the 8-bit library is built. If you add --enable-pcre16 to
- the "configure" command, the 16-bit library is also built. If you want only
- the 16-bit library, use "./configure --enable-pcre16 --disable-pcre8".
+ the "configure" command, the 16-bit library is also built. If you add
+ --enable-pcre32 to the "configure" command, the 32-bit library is also built.
+ If you want only the 16-bit or 32-bit library, also add --disable-pcre8 to
+ disable building the 8-bit library.
. If you are building the 8-bit library and want to suppress the building of
the C++ wrapper library, you can add --disable-cpp to the "configure"
@@ -203,23 +206,24 @@ library. They are also documented in the pcrebuild man page.
. If you want to make use of the support for UTF-8 Unicode character strings in
the 8-bit library, or UTF-16 Unicode character strings in the 16-bit library,
- you must add --enable-utf to the "configure" command. Without it, the code
- for handling UTF-8 and UTF-16 is not included in the relevant library. Even
+ or UTF-32 Unicode character strings in the 32-bit library, you must add
+ --enable-utf to the "configure" command. Without it, the code for handling
+ UTF-8, UTF-16 and UTF-32 is not included in the relevant library. Even
when --enable-utf is included, the use of a UTF encoding still has to be
enabled by an option at run time. When PCRE is compiled with this option, its
- input can only either be ASCII or UTF-8/16, even when running on EBCDIC
+ input can only either be ASCII or UTF-8/16/32, even when running on EBCDIC
platforms. It is not possible to use both --enable-utf and --enable-ebcdic at
the same time.
-. There are no separate options for enabling UTF-8 and UTF-16 independently
- because that would allow ridiculous settings such as requesting UTF-16
- support while building only the 8-bit library. However, the option
+. There are no separate options for enabling UTF-8, UTF-16 and UTF-32
+ independently because that would allow ridiculous settings such as requesting
+ UTF-16 support while building only the 8-bit library. However, the option
--enable-utf8 is retained for backwards compatibility with earlier releases
- that did not support 16-bit character strings. It is synonymous with
+ that did not support 16-bit or 32-bit character strings. It is synonymous with
--enable-utf. It is not possible to configure one library with UTF support
and the other without in the same configuration.
-. If, in addition to support for UTF-8/16 character strings, you want to
+. If, in addition to support for UTF-8/16/32 character strings, you want to
include support for the \P, \p, and \X sequences that recognize Unicode
character properties, you must add --enable-unicode-properties to the
"configure" command. This adds about 30K to the size of the library (in the
@@ -281,7 +285,8 @@ library. They are also documented in the pcrebuild man page.
library, PCRE then uses three bytes instead of two for offsets to different
parts of the compiled pattern. In the 16-bit library, --with-link-size=3 is
the same as --with-link-size=4, which (in both libraries) uses four-byte
- offsets. Increasing the internal link size reduces performance.
+ offsets. Increasing the internal link size reduces performance. In the 32-bit
+ library, the only supported link size is 4.
. You can build PCRE so that its internal match() function that is called from
pcre_exec() does not call itself recursively. Instead, it uses memory blocks
@@ -316,7 +321,7 @@ library. They are also documented in the pcrebuild man page.
This automatically implies --enable-rebuild-chartables (see above). However,
when PCRE is built this way, it always operates in EBCDIC. It cannot support
- both EBCDIC and UTF-8/16. There is a second option, --enable-ebcdic-nl25,
+ both EBCDIC and UTF-8/16/32. There is a second option, --enable-ebcdic-nl25,
which specifies that the code value for the EBCDIC NL character is 0x25
instead of the default 0x15.
@@ -368,6 +373,7 @@ The "configure" script builds the following files for the basic C library:
that were set for "configure"
. libpcre.pc ) data for the pkg-config command
. libpcre16.pc )
+. libpcre32.pc )
. libpcreposix.pc )
. libtool script that builds shared and/or static libraries
@@ -387,8 +393,8 @@ The "configure" script also creates config.status, which is an executable
script that can be run to recreate the configuration, and config.log, which
contains compiler output from tests that "configure" runs.
-Once "configure" has run, you can run "make". This builds either or both of the
-libraries libpcre and libpcre16, and a test program called pcretest. If you
+Once "configure" has run, you can run "make". This builds the libraries
+libpcre, libpcre16 and/or libpcre32, and a test program called pcretest. If you
enabled JIT support with --enable-jit, a test program called pcre_jit_test is
built as well.
@@ -412,12 +418,14 @@ system. The following are installed (file names are all relative to the
Libraries (lib):
libpcre16 (if 16-bit support is enabled)
+ libpcre32 (if 32-bit support is enabled)
libpcre (if 8-bit support is enabled)
libpcreposix (if 8-bit support is enabled)
libpcrecpp (if 8-bit and C++ support is enabled)
Configuration information (lib/pkgconfig):
libpcre16.pc
+ libpcre32.pc
libpcre.pc
libpcreposix.pc
libpcrecpp.pc (if C++ support is enabled)
@@ -598,7 +606,7 @@ The RunTest script runs the pcretest test program (which is documented in its
own man page) on each of the relevant testinput files in the testdata
directory, and compares the output with the contents of the corresponding
testoutput files. Some tests are relevant only when certain build-time options
-were selected. For example, the tests for UTF-8/16 support are run only if
+were selected. For example, the tests for UTF-8/16/32 support are run only if
--enable-utf was used. RunTest outputs a comment when it skips a test.
Many of the tests that are not skipped are run up to three times. The second
@@ -607,9 +615,9 @@ tests that are marked "never study" (see the pcretest program for how this is
done). If JIT support is available, the non-DFA tests are run a third time,
this time with a forced pcre_study() with the PCRE_STUDY_JIT_COMPILE option.
-When both 8-bit and 16-bit support is enabled, the entire set of tests is run
-twice, once for each library. If you want to run just one set of tests, call
-RunTest with either the -8 or -16 option.
+The entire set of tests is run once for each of the 8-bit, 16-bit and 32-bit
+libraries that are enabled. If you want to run just one set of tests, call
+RunTest with the -8, -16 or -32 option.
RunTest uses a file called testtry to hold the main output from pcretest.
Other files whose names begin with "test" are used as working files in some
@@ -660,13 +668,13 @@ RunTest.bat. The version of RunTest.bat included with PCRE 7.4 and above uses
Windows versions of test 2. More info on using RunTest.bat is included in the
document entitled NON-UNIX-USE.]
-The fourth and fifth tests check the UTF-8/16 support and error handling and
+The fourth and fifth tests check the UTF-8/16/32 support and error handling and
internal UTF features of PCRE that are not relevant to Perl, respectively. The
sixth and seventh tests do the same for Unicode character properties support.
The eighth, ninth, and tenth tests check the pcre_dfa_exec() alternative
-matching function, in non-UTF-8/16 mode, UTF-8/16 mode, and UTF-8/16 mode with
-Unicode property support, respectively.
+matching function, in non-UTF-8/16/32 mode, UTF-8/16/32 mode, and UTF-8/16/32
+mode with Unicode property support, respectively.
The eleventh test checks some internal offsets and code size features; it is
run only when the default "link size" of 2 is set (in other cases the sizes
@@ -677,16 +685,21 @@ test is run only when JIT support is not available. They test some JIT-specific
features such as information output from pcretest about JIT compilation.
The fourteenth, fifteenth, and sixteenth tests are run only in 8-bit mode, and
-the seventeenth, eighteenth, and nineteenth tests are run only in 16-bit mode.
+the seventeenth, eighteenth, and nineteenth tests are run only in 16/32-bit mode.
These are tests that generate different output in the two modes. They are for
-general cases, UTF-8/16 support, and Unicode property support, respectively.
+general cases, UTF-8/16/32 support, and Unicode property support, respectively.
-The twentieth test is run only in 16-bit mode. It tests some specific 16-bit
-features of the DFA matching engine.
+The twentieth test is run only in 16/32-bit mode. It tests some specific
+16/32-bit features of the DFA matching engine.
-The twenty-first and twenty-second tests are run only in 16-bit mode, when the
-link size is set to 2. They test reloading pre-compiled patterns.
+The twenty-first and twenty-second tests are run only in 16/32-bit mode, when the
+link size is set to 2 for the 16-bit library. They test reloading pre-compiled patterns.
+The twenty-third and twenty-fourth tests are run only in 16-bit mode. They are for
+general cases, and UTF-16 support, respectively.
+
+The twenty-fifth and twenty-sixth tests are run only in 32-bit mode. They are for
+general cases, and UTF-32 support, respectively.
Character tables
----------------
@@ -746,8 +759,8 @@ File manifest
-------------
The distribution should contain the files listed below. Where a file name is
-given as pcre[16]_xxx it means that there are two files, one with the name
-pcre_xxx and the other with the name pcre16_xxx.
+given as pcre[16|32]_xxx it means that there are three files, one with the name
+pcre_xxx, one with the name pcre16_xxx, and a third with the name pcre32_xxx.
(A) Source files of the PCRE library functions and their headers:
@@ -758,33 +771,35 @@ pcre_xxx and the other with the name pcre16_xxx.
coding; used, unless --enable-rebuild-chartables is
specified, by copying to pcre[16]_chartables.c
- pcreposix.c )
- pcre[16]_byte_order.c )
- pcre[16]_compile.c )
- pcre[16]_config.c )
- pcre[16]_dfa_exec.c )
- pcre[16]_exec.c )
- pcre[16]_fullinfo.c )
- pcre[16]_get.c ) sources for the functions in the library,
- pcre[16]_globals.c ) and some internal functions that they use
- pcre[16]_jit_compile.c )
- pcre[16]_maketables.c )
- pcre[16]_newline.c )
- pcre[16]_refcount.c )
- pcre[16]_string_utils.c )
- pcre[16]_study.c )
- pcre[16]_tables.c )
- pcre[16]_ucd.c )
- pcre[16]_version.c )
- pcre[16]_xclass.c )
- pcre_ord2utf8.c )
- pcre_valid_utf8.c )
- pcre16_ord2utf16.c )
- pcre16_utf16_utils.c )
- pcre16_valid_utf16.c )
-
- pcre[16]_printint.c ) debugging function that is used by pcretest,
- ) and can also be #included in pcre_compile()
+ pcreposix.c )
+ pcre[16|32]_byte_order.c )
+ pcre[16|32]_compile.c )
+ pcre[16|32]_config.c )
+ pcre[16|32]_dfa_exec.c )
+ pcre[16|32]_exec.c )
+ pcre[16|32]_fullinfo.c )
+ pcre[16|32]_get.c ) sources for the functions in the library,
+ pcre[16|32]_globals.c ) and some internal functions that they use
+ pcre[16|32]_jit_compile.c )
+ pcre[16|32]_maketables.c )
+ pcre[16|32]_newline.c )
+ pcre[16|32]_refcount.c )
+ pcre[16|32]_string_utils.c )
+ pcre[16|32]_study.c )
+ pcre[16|32]_tables.c )
+ pcre[16|32]_ucd.c )
+ pcre[16|32]_version.c )
+ pcre[16|32]_xclass.c )
+ pcre_ord2utf8.c )
+ pcre_valid_utf8.c )
+ pcre16_ord2utf16.c )
+ pcre16_utf16_utils.c )
+ pcre16_valid_utf16.c )
+ pcre32_utf32_utils.c )
+ pcre32_valid_utf32.c )
+
+ pcre[16|32]_printint.c ) debugging function that is used by pcretest,
+ ) and can also be #included in pcre_compile()
pcre.h.in template for pcre.h when built by "configure"
pcreposix.h header for the external POSIX wrapper API
@@ -849,6 +864,7 @@ pcre_xxx and the other with the name pcre16_xxx.
doc/perltest.txt plain text documentation of Perl test program
install-sh a shell script for installing files
libpcre16.pc.in template for libpcre16.pc for pkg-config
+ libpcre32.pc.in template for libpcre32.pc for pkg-config
libpcre.pc.in template for libpcre.pc for pkg-config
libpcreposix.pc.in template for libpcreposix.pc for pkg-config
libpcrecpp.pc.in template for libpcrecpp.pc for pkg-config
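
The pcre[16|32]_xxx shorthand used in the file manifest can be expanded mechanically. The following shell sketch shows one way to do that; expand_manifest_name is a hypothetical helper written for illustration and is not shipped with PCRE.

```shell
#!/bin/sh
# Hypothetical helper (not part of PCRE): expand a manifest entry such
# as "pcre[16|32]_compile.c" into the three file names it stands for.
expand_manifest_name() {
    case "$1" in
        'pcre[16|32]_'*)
            # Remove the literal shorthand prefix, keeping the suffix.
            rest=${1#'pcre[16|32]_'}
            printf '%s\n' "pcre_$rest" "pcre16_$rest" "pcre32_$rest"
            ;;
        *)
            # Entries without the bracket shorthand name a single file.
            printf '%s\n' "$1"
            ;;
    esac
}
```

Called with 'pcre[16|32]_compile.c' (quoted, so the shell does not treat the brackets as a glob), it prints one file name per line for the 8-bit, 16-bit and 32-bit variants. Entries such as pcre_ord2utf8.c are passed through unchanged.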