Diffstat (limited to 'srclib/pcre/README')
-rw-r--r-- srclib/pcre/README 274
1 files changed, 197 insertions, 77 deletions
diff --git a/srclib/pcre/README b/srclib/pcre/README
index 7557374791..fc5397ecce 100644
--- a/srclib/pcre/README
+++ b/srclib/pcre/README
@@ -16,6 +16,33 @@ regex.h, but I didn't want to risk possible problems with existing files of
that name by distributing it that way. To use it with an existing program that
uses the POSIX API, it will have to be renamed or pointed at by a link.
+If you are using the POSIX interface to PCRE and there is already a POSIX regex
+library installed on your system, you must take care when linking programs to
+ensure that they link with PCRE's libpcreposix library. Otherwise they may pick
+up the "real" POSIX functions of the same name.
+
+
+Documentation for PCRE
+----------------------
+
+If you install PCRE in the normal way, you will end up with an installed set of
+man pages whose names all start with "pcre". The one that is called "pcre"
+lists all the others. In addition to these man pages, the PCRE documentation is
+supplied in two other forms; however, as there is no standard place to install
+them, they are left in the doc directory of the unpacked source distribution.
+These forms are:
+
+ 1. Files called doc/pcre.txt, doc/pcregrep.txt, and doc/pcretest.txt. The
+ first of these is a concatenation of the text forms of all the section 3
+ man pages except those that summarize individual functions. The other two
+ are the text forms of the section 1 man pages for the pcregrep and
+ pcretest commands. Text forms are provided for ease of scanning with text
+ editors or similar tools.
+
+ 2. A subdirectory called doc/html contains all the documentation in HTML
+ form, hyperlinked in various ways, and rooted in a file called
+ doc/index.html.
+
Contributions by users of PCRE
------------------------------
@@ -30,17 +57,18 @@ Windows systems (I myself do not use Windows). Some are complete in themselves;
others are pointers to URLs containing relevant files.
-Building PCRE on a Unix system
-------------------------------
+Building PCRE on a Unix-like system
+-----------------------------------
-To build PCRE on a Unix system, first run the "configure" command from the PCRE
-distribution directory, with your current directory set to the directory where
-you want the files to be created. This command is a standard GNU "autoconf"
-configuration script, for which generic instructions are supplied in INSTALL.
+To build PCRE on a Unix-like system, first run the "configure" command from the
+PCRE distribution directory, with your current directory set to the directory
+where you want the files to be created. This command is a standard GNU
+"autoconf" configuration script, for which generic instructions are supplied in
+INSTALL.
Most commonly, people build PCRE within its own distribution directory, and in
this case, on many systems, just running "./configure" is sufficient, but the
-usual methods of changing standard defaults are available. For example,
+usual methods of changing standard defaults are available. For example:
CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local
@@ -55,18 +83,71 @@ into /source/pcre/pcre-xxx, but you want to build it in /build/pcre/pcre-xxx:
cd /build/pcre/pcre-xxx
/source/pcre/pcre-xxx/configure
-If you want to make use of the experimential, incomplete support for UTF-8
-character strings in PCRE, you must add --enable-utf8 to the "configure"
-command. Without it, the code for handling UTF-8 is not included in the
-library. (Even when included, it still has to be enabled by an option at run
-time.)
+There are some optional features that can be included or omitted from the PCRE
+library. You can read more about them in the pcrebuild man page.
-The "configure" script builds five files:
+. If you want to make use of the support for UTF-8 character strings in PCRE,
+ you must add --enable-utf8 to the "configure" command. Without it, the code
+ for handling UTF-8 is not included in the library. (Even when included, it
+ still has to be enabled by an option at run time.)
-. libtool is a script that builds shared and/or static libraries
+. If, in addition to support for UTF-8 character strings, you want to include
+ support for the \P, \p, and \X sequences that recognize Unicode character
+ properties, you must add --enable-unicode-properties to the "configure"
+ command. This adds about 90K to the size of the library (in the form of a
+ property table); only the basic two-letter properties such as Lu are
+ supported.
+
+. You can build PCRE to recognize CR or NL as the newline character, instead
+ of whatever your compiler uses for "\n", by adding --newline-is-cr or
+ --newline-is-nl to the "configure" command, respectively. Only do this if you
+ really understand what you are doing. On traditional Unix-like systems, the
+ newline character is NL.
+
+. When called via the POSIX interface, PCRE uses malloc() to get additional
+ storage for processing capturing parentheses if there are more than 10 of
+ them. You can increase this threshold by setting, for example,
+
+ --with-posix-malloc-threshold=20
+
+ on the "configure" command.
+
+. PCRE has a counter which can be set to limit the amount of resources it uses.
+ If the limit is exceeded during a match, the match fails. The default is ten
+ million. You can change the default by setting, for example,
+
+ --with-match-limit=500000
+
+ on the "configure" command. This is just the default; individual calls to
+ pcre_exec() can supply their own value. There is discussion on the pcreapi
+ man page.
+
+. The default maximum compiled pattern size is around 64K. You can increase
+ this by adding --with-link-size=3 to the "configure" command. You can
+ increase it even more by setting --with-link-size=4, but this is unlikely
+ ever to be necessary. If you build PCRE with an increased link size, test 2
+ (and 5 if you are using UTF-8) will fail. Part of the output of these tests
+ is a representation of the compiled pattern, and this changes with the link
+ size.
+
+. You can build PCRE so that its match() function does not call itself
+ recursively. Instead, it uses blocks of data from the heap via special
+ functions pcre_stack_malloc() and pcre_stack_free() to save data that would
+ otherwise be saved on the stack. To build PCRE like this, use
+
+ --disable-stack-for-recursion
+
+ on the "configure" command. PCRE runs more slowly in this mode, but it may be
+ necessary in environments with limited stack sizes.
+
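As an illustration of how the options above combine, here is a minimal sketch that assembles a "configure" command line and prints it for review instead of running it. The function name pcre_configure_cmd and its arguments are hypothetical; only the option names come from the text above. The sketch assumes that Unicode property support is only wanted together with UTF-8 support, which is one reading of the description above.

```shell
#!/bin/sh
# Hypothetical helper, not part of PCRE.
# Usage: pcre_configure_cmd UTF8 UCP MATCH_LIMIT
#   UTF8, UCP    "yes" to request the feature, anything else to omit it
#   MATCH_LIMIT  default match limit, or an empty string to keep the default
pcre_configure_cmd() {
    utf8=$1
    ucp=$2
    limit=$3
    cmd="./configure"
    if [ "$utf8" = yes ]; then
        cmd="$cmd --enable-utf8"
        # Assumption: property support is requested only alongside UTF-8.
        if [ "$ucp" = yes ]; then
            cmd="$cmd --enable-unicode-properties"
        fi
    fi
    if [ -n "$limit" ]; then
        cmd="$cmd --with-match-limit=$limit"
    fi
    # Print the command so that it can be checked before it is run.
    echo "$cmd"
}

pcre_configure_cmd yes yes 500000
```

The printed line can be copied to the shell prompt once it looks right. Remember that UTF-8 handling still has to be enabled by an option at run time.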
+The "configure" script builds seven files:
+
+. pcre.h is built by copying pcre.in and making substitutions
. Makefile is built by copying Makefile.in and making substitutions.
. config.h is built by copying config.in and making substitutions.
. pcre-config is built by copying pcre-config.in and making substitutions.
+. libpcre.pc is data for the pkg-config command, built from libpcre.pc.in
+. libtool is a script that builds shared and/or static libraries
. RunTest is a script for running tests
Once "configure" has run, you can run "make". It builds two libraries called
@@ -75,30 +156,36 @@ command. You can use "make install" to copy these, the public header files
pcre.h and pcreposix.h, and the man pages to appropriate live directories on
your system, in the normal way.
+
+Retrieving configuration information on Unix-like systems
+---------------------------------------------------------
+
Running "make install" also installs the command pcre-config, which can be used
to recall information about the PCRE configuration and installation. For
-example,
+example:
pcre-config --version
prints the version number, and
- pcre-config --libs
+ pcre-config --libs
outputs information about where the library is installed. This command can be
included in makefiles for programs that use PCRE, saving the programmer from
having to remember too many details.
-There is one esoteric feature that is controlled by "configure". It concerns
-the character value used for "newline", and is something that you probably do
-not want to change on a Unix system. The default is to use whatever value your
-compiler gives to '\n'. By using --enable-newline-is-cr or
---enable-newline-is-lf you can force the value to be CR (13) or LF (10) if you
-really want to.
+The pkg-config command is another system for saving and retrieving information
+about installed libraries. Instead of separate commands for each library, a
+single command is used. For example:
+ pkg-config --cflags pcre
-Shared libraries on Unix systems
---------------------------------
+The data is held in *.pc files that are installed in a directory called
+pkgconfig.
+
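The two lookup methods described above might be combined in a build script along these lines. This is a sketch under several assumptions: the function name pcre_flags is hypothetical, the pkg-config module name defaults to "libpcre" (taken from the libpcre.pc file name, whereas the example above uses "pcre"), and pcre-config is assumed to accept --cflags as well as the --libs option shown above.

```shell
#!/bin/sh
# Hypothetical helper, not part of PCRE.
# Usage: pcre_flags --cflags | --libs
pcre_flags() {
    # Assumption: the module is named after the installed libpcre.pc file.
    module=${PCRE_PC_MODULE:-libpcre}
    if command -v pkg-config >/dev/null 2>&1 &&
       pkg-config --exists "$module" 2>/dev/null; then
        pkg-config "$1" "$module"
    elif command -v pcre-config >/dev/null 2>&1; then
        # Fall back to the script installed by "make install".
        pcre-config "$1"
    else
        echo "pcre_flags: no PCRE configuration data found" >&2
        return 1
    fi
}

# Example use (myprog.c is a placeholder file name):
#   cc $(pcre_flags --cflags) -o myprog myprog.c $(pcre_flags --libs)
```

Trying pkg-config first and pcre-config second is an arbitrary choice for this sketch; either order works if both are installed consistently.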
+
+Shared libraries on Unix-like systems
+-------------------------------------
The default distribution builds PCRE as two shared libraries and two static
libraries, as long as the operating system supports shared libraries. Shared
@@ -115,7 +202,7 @@ installed themselves. However, the versions left in the source directory still
use the uninstalled libraries.
To build PCRE using static libraries only you must use --disable-shared when
-configuring it. For example
+configuring it. For example:
./configure --prefix=/usr/gnu --disable-shared
@@ -123,12 +210,28 @@ Then run "make" in the usual way. Similarly, you can use --disable-static to
build only shared libraries.
+Cross-compiling on a Unix-like system
+-------------------------------------
+
+You can specify CC and CFLAGS in the normal way to the "configure" command, in
+order to cross-compile PCRE for some other host. However, during the building
+process, the dftables.c source file is compiled *and run* on the local host, in
+order to generate the default character tables (the chartables.c file). It
+therefore needs to be compiled with the local compiler, not the cross compiler.
+You can do this by specifying CC_FOR_BUILD (and if necessary CFLAGS_FOR_BUILD)
+when calling the "configure" command. If they are not specified, they default
+to the values of CC and CFLAGS.
+
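A cross-compiling setup as described above could be sketched as follows. The function name cross_configure_cmd and the compiler name arm-linux-gnueabi-gcc are placeholders chosen for illustration; substitute the names used by your own toolchain. The sketch only prints the command, and it assumes that the compiler names contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper, not part of PCRE.
# Usage: cross_configure_cmd CROSS_COMPILER LOCAL_COMPILER
cross_configure_cmd() {
    # CC builds the library for the other host. CC_FOR_BUILD builds
    # dftables, which has to run on the local host during the build.
    echo "CC=$1 CC_FOR_BUILD=$2 ./configure"
}

cross_configure_cmd arm-linux-gnueabi-gcc cc
```

If the local compiler needs special flags, CFLAGS_FOR_BUILD can be added to the command in the same way.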
+
Building on non-Unix systems
----------------------------
-For a non-Unix system, read the comments in the file NON-UNIX-USE. PCRE has
-been compiled on Windows systems and on Macintoshes, but I don't know the
-details because I don't use those systems. It should be straightforward to
+For a non-Unix system, read the comments in the file NON-UNIX-USE, though if
+the system supports the use of "configure" and "make" you may be able to build
+PCRE in the same way as for Unix systems.
+
+PCRE has been compiled on Windows systems and on Macintoshes, but I don't know
+the details because I don't use those systems. It should be straightforward to
build PCRE on any system that has a Standard C compiler, because it uses only
Standard C functions.
@@ -138,22 +241,20 @@ Testing PCRE
To test PCRE on a Unix system, run the RunTest script that is created by the
configuring process. (This can also be run by "make runtest", "make check", or
-"make test".) For other systems, see the instruction in NON-UNIX-USE.
+"make test".) For other systems, see the instructions in NON-UNIX-USE.
-The script runs the pcretest test program (which is documented in the doc
-directory) on each of the testinput files (in the testdata directory) in turn,
+The script runs the pcretest test program (which is documented in its own man
+page) on each of the testinput files (in the testdata directory) in turn,
and compares the output with the contents of the corresponding testoutput file.
-A file called testtry is used to hold the output from pcretest. To run pcretest
-on just one of the test files, give its number as an argument to RunTest, for
-example:
+A file called testtry is used to hold the main output from pcretest
+(testsavedregex is also used as a working file). To run pcretest on just one of
+the test files, give its number as an argument to RunTest, for example:
- RunTest 3
+ RunTest 2
-The first and third test files can also be fed directly into the perltest
-script to check that Perl gives the same results. The third file requires the
-additional features of release 5.005, which is why it is kept separate from the
-main test input, which needs only Perl 5.004. In the long run, when 5.005 (or
-higher) is widespread, these two test files may get amalgamated.
+The first file can also be fed directly into the perltest script to check that
+Perl gives the same results. The only difference you should see is in the first
+few lines, where the Perl version is given instead of the PCRE version.
The second set of tests check pcre_fullinfo(), pcre_info(), pcre_study(),
pcre_copy_substring(), pcre_get_substring(), pcre_get_substring_list(), error
@@ -171,34 +272,42 @@ listed for checking. Where the comparison test output contains [\x00-\x7f] the
test will contain [\x00-\xff], and similarly in some other cases. This is not a
bug in PCRE.
-The fourth set of tests checks pcre_maketables(), the facility for building a
+The third set of tests checks pcre_maketables(), the facility for building a
set of character tables for a specific locale and using them instead of the
-default tables. The tests make use of the "fr" (French) locale. Before running
-the test, the script checks for the presence of this locale by running the
-"locale" command. If that command fails, or if it doesn't include "fr" in the
-list of available locales, the fourth test cannot be run, and a comment is
-output to say why. If running this test produces instances of the error
+default tables. The tests make use of the "fr_FR" (French) locale. Before
+running the test, the script checks for the presence of this locale by running
+the "locale" command. If that command fails, or if it doesn't include "fr_FR"
+in the list of available locales, the third test cannot be run, and a comment
+is output to say why. If running this test produces instances of the error
- ** Failed to set locale "fr"
+ ** Failed to set locale "fr_FR"
in the comparison output, it means that locale is not available on your system,
despite being listed by "locale". This does not mean that PCRE is broken.
-The fifth test checks the experimental, incomplete UTF-8 support. It is not run
-automatically unless PCRE is built with UTF-8 support. This file can be fed
-directly to the perltest8 script, which requires Perl 5.6 or higher. The sixth
-file tests internal UTF-8 features of PCRE that are not relevant to Perl.
+The fourth test checks the UTF-8 support. It is not run automatically unless
+PCRE is built with UTF-8 support. To do this you must set --enable-utf8 when
+running "configure". This file can also be fed directly to the perltest script,
+provided you are running Perl 5.8 or higher. (For Perl 5.6, a small patch,
+commented in the script, can be used.)
+
+The fifth test checks error handling with UTF-8 encoding, and internal UTF-8
+features of PCRE that are not relevant to Perl.
+
+The sixth and final test checks the support for Unicode character properties.
+It is not run automatically unless PCRE is built with Unicode property support.
+To do this you must set --enable-unicode-properties when running "configure".
Character tables
----------------
-PCRE uses four tables for manipulating and identifying characters. The final
-argument of the pcre_compile() function is a pointer to a block of memory
-containing the concatenated tables. A call to pcre_maketables() can be used to
-generate a set of tables in the current locale. If the final argument for
-pcre_compile() is passed as NULL, a set of default tables that is built into
-the binary is used.
+PCRE uses four tables for manipulating and identifying characters whose values
+are less than 256. The final argument of the pcre_compile() function is a
+pointer to a block of memory containing the concatenated tables. A call to
+pcre_maketables() can be used to generate a set of tables in the current
+locale. If the final argument for pcre_compile() is passed as NULL, a set of
+default tables that is built into the binary is used.
The source file called chartables.c contains the default set of tables. This is
not supplied in the distribution, but is built by the program dftables
@@ -238,11 +347,20 @@ The distribution should contain the following files:
headers:
dftables.c auxiliary program for building chartables.c
+
get.c )
maketables.c )
- study.c ) source of
- pcre.c ) the functions
+ study.c ) source of the functions
+ pcre.c ) in the library
pcreposix.c )
+ printint.c )
+
+ ucp.c )
+ ucp.h ) source for the code that is used for
+ ucpinternal.h ) Unicode property handling
+ ucptable.c )
+ ucptypetable.c )
+
pcre.in "source" for the header for the external API; pcre.h
is built from this by "configure"
pcreposix.h header for the external POSIX wrapper API
@@ -266,31 +384,27 @@ The distribution should contain the following files:
configure a configuring shell script (built by autoconf)
configure.in the autoconf input used to build configure
doc/Tech.Notes notes on the encoding
- doc/pcre.3 man page source for the PCRE functions
- doc/pcre.html HTML version
- doc/pcre.txt plain text version
- doc/pcreposix.3 man page source for the POSIX wrapper API
- doc/pcreposix.html HTML version
- doc/pcreposix.txt plain text version
- doc/pcretest.txt documentation of test program
- doc/perltest.txt documentation of Perl test program
- doc/pcregrep.1 man page source for the pcregrep utility
- doc/pcregrep.html HTML version
- doc/pcregrep.txt plain text version
+ doc/*.3 man page sources for the PCRE functions
+ doc/*.1 man page sources for pcregrep and pcretest
+ doc/html/* HTML documentation
+ doc/pcre.txt plain text version of the man pages
+ doc/pcretest.txt plain text documentation of test program
+ doc/perltest.txt plain text documentation of Perl test program
install-sh a shell script for installing files
+ libpcre.pc.in "source" for libpcre.pc for pkg-config
ltmain.sh file used to build a libtool script
+ mkinstalldirs script for making install directories
pcretest.c comprehensive test program
pcredemo.c simple demonstration of coding calls to PCRE
perltest Perl test program
- perltest8 Perl test program for UTF-8 tests
pcregrep.c source of a grep utility that uses PCRE
pcre-config.in source of script which retains PCRE information
- testdata/testinput1 test data, compatible with Perl 5.004 and 5.005
+ testdata/testinput1 test data, compatible with Perl
testdata/testinput2 test data for error messages and non-Perl things
- testdata/testinput3 test data, compatible with Perl 5.005
- testdata/testinput4 test data for locale-specific tests
- testdata/testinput5 test data for UTF-8 tests compatible with Perl 5.6
- testdata/testinput6 test data for other UTF-8 tests
+ testdata/testinput3 test data for locale-specific tests
+ testdata/testinput4 test data for UTF-8 tests compatible with Perl
+ testdata/testinput5 test data for other UTF-8 tests
+ testdata/testinput6 test data for Unicode property support tests
testdata/testoutput1 test results corresponding to testinput1
testdata/testoutput2 test results corresponding to testinput2
testdata/testoutput3 test results corresponding to testinput3
@@ -301,7 +415,13 @@ The distribution should contain the following files:
(C) Auxiliary files for Win32 DLL
dll.mk
+ libpcre.def
+ libpcreposix.def
pcre.def
+(D) Auxiliary file for VPASCAL
+
+ makevp.bat
+
Philip Hazel <ph10@cam.ac.uk>
-August 2001
+September 2004