diff options
Diffstat (limited to 'ext/pcre/pcrelib/README')
-rw-r--r-- | ext/pcre/pcrelib/README | 489 |
1 files changed, 0 insertions, 489 deletions
diff --git a/ext/pcre/pcrelib/README b/ext/pcre/pcrelib/README deleted file mode 100644 index f8d63bbd34..0000000000 --- a/ext/pcre/pcrelib/README +++ /dev/null @@ -1,489 +0,0 @@ -README file for PCRE (Perl-compatible regular expression library) ------------------------------------------------------------------ - -The latest release of PCRE is always available from - - ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.tar.gz - -Please read the NEWS file if you are upgrading from a previous release. - - -The PCRE APIs -------------- - -PCRE is written in C, and it has its own API. The distribution now includes a -set of C++ wrapper functions, courtesy of Google Inc. (see the pcrecpp man page -for details). - -Also included are a set of C wrapper functions that are based on the POSIX -API. These end up in the library called libpcreposix. Note that this just -provides a POSIX calling interface to PCRE: the regular expressions themselves -still follow Perl syntax and semantics. The header file for the POSIX-style -functions is called pcreposix.h. The official POSIX name is regex.h, but I -didn't want to risk possible problems with existing files of that name by -distributing it that way. To use it with an existing program that uses the -POSIX API, it will have to be renamed or pointed at by a link. - -If you are using the POSIX interface to PCRE and there is already a POSIX regex -library installed on your system, you must take care when linking programs to -ensure that they link with PCRE's libpcreposix library. Otherwise they may pick -up the "real" POSIX functions of the same name. - - -Documentation for PCRE ----------------------- - -If you install PCRE in the normal way, you will end up with an installed set of -man pages whose names all start with "pcre". The one that is called "pcre" -lists all the others. In addition to these man pages, the PCRE documentation is -supplied in two other forms; however, as there is no standard place to install -them, they are left in the doc directory of the unpacked source distribution. -These forms are: - - 1. Files called doc/pcre.txt, doc/pcregrep.txt, and doc/pcretest.txt. The - first of these is a concatenation of the text forms of all the section 3 - man pages except those that summarize individual functions. The other two - are the text forms of the section 1 man pages for the pcregrep and - pcretest commands. Text forms are provided for ease of scanning with text - editors or similar tools. - - 2. A subdirectory called doc/html contains all the documentation in HTML - form, hyperlinked in various ways, and rooted in a file called - doc/index.html. - - -Contributions by users of PCRE ------------------------------- - -You can find contributions from PCRE users in the directory - - ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/Contrib - -where there is also a README file giving brief descriptions of what they are. -Several of them provide support for compiling PCRE on various flavours of -Windows systems (I myself do not use Windows). Some are complete in themselves; -others are pointers to URLs containing relevant files. - - -Building PCRE on a Unix-like system ------------------------------------ - -To build PCRE on a Unix-like system, first run the "configure" command from the -PCRE distribution directory, with your current directory set to the directory -where you want the files to be created. This command is a standard GNU -"autoconf" configuration script, for which generic instructions are supplied in -INSTALL. - -Most commonly, people build PCRE within its own distribution directory, and in -this case, on many systems, just running "./configure" is sufficient, but the -usual methods of changing standard defaults are available. For example: - -CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local - -specifies that the C compiler should be run with the flags '-O2 -Wall' instead -of the default, and that "make install" should install PCRE under /opt/local -instead of the default /usr/local. - -If you want to build in a different directory, just run "configure" with that -directory as current. For example, suppose you have unpacked the PCRE source -into /source/pcre/pcre-xxx, but you want to build it in /build/pcre/pcre-xxx: - -cd /build/pcre/pcre-xxx -/source/pcre/pcre-xxx/configure - -There are some optional features that can be included or omitted from the PCRE -library. You can read more about them in the pcrebuild man page. - -. If you want to make use of the support for UTF-8 character strings in PCRE, - you must add --enable-utf8 to the "configure" command. Without it, the code - for handling UTF-8 is not included in the library. (Even when included, it - still has to be enabled by an option at run time.) - -. If, in addition to support for UTF-8 character strings, you want to include - support for the \P, \p, and \X sequences that recognize Unicode character - properties, you must add --enable-unicode-properties to the "configure" - command. This adds about 90K to the size of the library (in the form of a - property table); only the basic two-letter properties such as Lu are - supported. - -. You can build PCRE to recognized CR or NL as the newline character, instead - of whatever your compiler uses for "\n", by adding --newline-is-cr or - --newline-is-nl to the "configure" command, respectively. Only do this if you - really understand what you are doing. On traditional Unix-like systems, the - newline character is NL. - -. When called via the POSIX interface, PCRE uses malloc() to get additional - storage for processing capturing parentheses if there are more than 10 of - them. You can increase this threshold by setting, for example, - - --with-posix-malloc-threshold=20 - - on the "configure" command. - -. PCRE has a counter that can be set to limit the amount of resources it uses. - If the limit is exceeded during a match, the match fails. The default is ten - million. You can change the default by setting, for example, - - --with-match-limit=500000 - - on the "configure" command. This is just the default; individual calls to - pcre_exec() can supply their own value. There is discussion on the pcreapi - man page. - -. The default maximum compiled pattern size is around 64K. You can increase - this by adding --with-link-size=3 to the "configure" command. You can - increase it even more by setting --with-link-size=4, but this is unlikely - ever to be necessary. If you build PCRE with an increased link size, test 2 - (and 5 if you are using UTF-8) will fail. Part of the output of these tests - is a representation of the compiled pattern, and this changes with the link - size. - -. You can build PCRE so that its internal match() function that is called from - pcre_exec() does not call itself recursively. Instead, it uses blocks of data - from the heap via special functions pcre_stack_malloc() and pcre_stack_free() - to save data that would otherwise be saved on the stack. To build PCRE like - this, use - - --disable-stack-for-recursion - - on the "configure" command. PCRE runs more slowly in this mode, but it may be - necessary in environments with limited stack sizes. This applies only to the - pcre_exec() function; it does not apply to pcre_dfa_exec(), which does not - use deeply nested recursion. - -The "configure" script builds eight files for the basic C library: - -. pcre.h is the header file for C programs that call PCRE -. Makefile is the makefile that builds the library -. config.h contains build-time configuration options for the library -. pcre-config is a script that shows the settings of "configure" options -. libpcre.pc is data for the pkg-config command -. libtool is a script that builds shared and/or static libraries -. RunTest is a script for running tests on the library -. RunGrepTest is a script for running tests on the pcregrep command - -In addition, if a C++ compiler is found, the following are also built: - -. pcrecpp.h is the header file for programs that call PCRE via the C++ wrapper -. pcre_stringpiece.h is the header for the C++ "stringpiece" functions - -The "configure" script also creates config.status, which is an executable -script that can be run to recreate the configuration, and config.log, which -contains compiler output from tests that "configure" runs. - -Once "configure" has run, you can run "make". It builds two libraries, called -libpcre and libpcreposix, a test program called pcretest, and the pcregrep -command. If a C++ compiler was found on your system, it also builds the C++ -wrapper library, which is called libpcrecpp, and some test programs called -pcrecpp_unittest, pcre_scanner_unittest, and pcre_stringpiece_unittest. - -The command "make test" runs all the appropriate tests. Details of the PCRE -tests are given in a separate section of this document, below. - -You can use "make install" to copy the libraries, the public header files -pcre.h, pcreposix.h, pcrecpp.h, and pcre_stringpiece.h (the last two only if -the C++ wrapper was built), and the man pages to appropriate live directories -on your system, in the normal way. - -If you want to remove PCRE from your system, you can run "make uninstall". -This removes all the files that "make install" installed. However, it does not -remove any directories, because these are often shared with other programs. - - -Retrieving configuration information on Unix-like systems ---------------------------------------------------------- - -Running "make install" also installs the command pcre-config, which can be used -to recall information about the PCRE configuration and installation. For -example: - - pcre-config --version - -prints the version number, and - - pcre-config --libs - -outputs information about where the library is installed. This command can be -included in makefiles for programs that use PCRE, saving the programmer from -having to remember too many details. - -The pkg-config command is another system for saving and retrieving information -about installed libraries. Instead of separate commands for each library, a -single command is used. For example: - - pkg-config --cflags pcre - -The data is held in *.pc files that are installed in a directory called -pkgconfig. - - -Shared libraries on Unix-like systems -------------------------------------- - -The default distribution builds PCRE as shared libraries and static libraries, -as long as the operating system supports shared libraries. Shared library -support relies on the "libtool" script which is built as part of the -"configure" process. - -The libtool script is used to compile and link both shared and static -libraries. They are placed in a subdirectory called .libs when they are newly -built. The programs pcretest and pcregrep are built to use these uninstalled -libraries (by means of wrapper scripts in the case of shared libraries). When -you use "make install" to install shared libraries, pcregrep and pcretest are -automatically re-built to use the newly installed shared libraries before being -installed themselves. However, the versions left in the source directory still -use the uninstalled libraries. - -To build PCRE using static libraries only you must use --disable-shared when -configuring it. For example: - -./configure --prefix=/usr/gnu --disable-shared - -Then run "make" in the usual way. Similarly, you can use --disable-static to -build only shared libraries. - - -Cross-compiling on a Unix-like system -------------------------------------- - -You can specify CC and CFLAGS in the normal way to the "configure" command, in -order to cross-compile PCRE for some other host. However, during the building -process, the dftables.c source file is compiled *and run* on the local host, in -order to generate the default character tables (the chartables.c file). It -therefore needs to be compiled with the local compiler, not the cross compiler. -You can do this by specifying CC_FOR_BUILD (and if necessary CFLAGS_FOR_BUILD; -there are also CXX_FOR_BUILD and CXXFLAGS_FOR_BUILD for the C++ wrapper) -when calling the "configure" command. If they are not specified, they default -to the values of CC and CFLAGS. - - -Building on non-Unix systems ----------------------------- - -For a non-Unix system, read the comments in the file NON-UNIX-USE, though if -the system supports the use of "configure" and "make" you may be able to build -PCRE in the same way as for Unix systems. - -PCRE has been compiled on Windows systems and on Macintoshes, but I don't know -the details because I don't use those systems. It should be straightforward to -build PCRE on any system that has a Standard C compiler, because it uses only -Standard C functions. - - -Testing PCRE ------------- - -To test PCRE on a Unix system, run the RunTest script that is created by the -configuring process. There is also a script called RunGrepTest that tests the -options of the pcregrep command. If the C++ wrapper library is build, three -test programs called pcrecpp_unittest, pcre_scanner_unittest, and -pcre_stringpiece_unittest are provided. - -Both the scripts and all the program tests are run if you obey "make runtest", -"make check", or "make test". For other systems, see the instructions in -NON-UNIX-USE. - -The RunTest script runs the pcretest test program (which is documented in its -own man page) on each of the testinput files (in the testdata directory) in -turn, and compares the output with the contents of the corresponding testoutput -file. A file called testtry is used to hold the main output from pcretest -(testsavedregex is also used as a working file). To run pcretest on just one of -the test files, give its number as an argument to RunTest, for example: - - RunTest 2 - -The first file can also be fed directly into the perltest script to check that -Perl gives the same results. The only difference you should see is in the first -few lines, where the Perl version is given instead of the PCRE version. - -The second set of tests check pcre_fullinfo(), pcre_info(), pcre_study(), -pcre_copy_substring(), pcre_get_substring(), pcre_get_substring_list(), error -detection, and run-time flags that are specific to PCRE, as well as the POSIX -wrapper API. It also uses the debugging flag to check some of the internals of -pcre_compile(). - -If you build PCRE with a locale setting that is not the standard C locale, the -character tables may be different (see next paragraph). In some cases, this may -cause failures in the second set of tests. For example, in a locale where the -isprint() function yields TRUE for characters in the range 128-255, the use of -[:isascii:] inside a character class defines a different set of characters, and -this shows up in this test as a difference in the compiled code, which is being -listed for checking. Where the comparison test output contains [\x00-\x7f] the -test will contain [\x00-\xff], and similarly in some other cases. This is not a -bug in PCRE. - -The third set of tests checks pcre_maketables(), the facility for building a -set of character tables for a specific locale and using them instead of the -default tables. The tests make use of the "fr_FR" (French) locale. Before -running the test, the script checks for the presence of this locale by running -the "locale" command. If that command fails, or if it doesn't include "fr_FR" -in the list of available locales, the third test cannot be run, and a comment -is output to say why. If running this test produces instances of the error - - ** Failed to set locale "fr_FR" - -in the comparison output, it means that locale is not available on your system, -despite being listed by "locale". This does not mean that PCRE is broken. - -The fourth test checks the UTF-8 support. It is not run automatically unless -PCRE is built with UTF-8 support. To do this you must set --enable-utf8 when -running "configure". This file can be also fed directly to the perltest script, -provided you are running Perl 5.8 or higher. (For Perl 5.6, a small patch, -commented in the script, can be be used.) - -The fifth test checks error handling with UTF-8 encoding, and internal UTF-8 -features of PCRE that are not relevant to Perl. - -The sixth and test checks the support for Unicode character properties. It it -not run automatically unless PCRE is built with Unicode property support. To to -this you must set --enable-unicode-properties when running "configure". - -The seventh, eighth, and ninth tests check the pcre_dfa_exec() alternative -matching function, in non-UTF-8 mode, UTF-8 mode, and UTF-8 mode with Unicode -property support, respectively. The eighth and ninth tests are not run -automatically unless PCRE is build with the relevant support. - - -Character tables ----------------- - -PCRE uses four tables for manipulating and identifying characters whose values -are less than 256. The final argument of the pcre_compile() function is a -pointer to a block of memory containing the concatenated tables. A call to -pcre_maketables() can be used to generate a set of tables in the current -locale. If the final argument for pcre_compile() is passed as NULL, a set of -default tables that is built into the binary is used. - -The source file called chartables.c contains the default set of tables. This is -not supplied in the distribution, but is built by the program dftables -(compiled from dftables.c), which uses the ANSI C character handling functions -such as isalnum(), isalpha(), isupper(), islower(), etc. to build the table -sources. This means that the default C locale which is set for your system will -control the contents of these default tables. You can change the default tables -by editing chartables.c and then re-building PCRE. If you do this, you should -probably also edit Makefile to ensure that the file doesn't ever get -re-generated. - -The first two 256-byte tables provide lower casing and case flipping functions, -respectively. The next table consists of three 32-byte bit maps which identify -digits, "word" characters, and white space, respectively. These are used when -building 32-byte bit maps that represent character classes. - -The final 256-byte table has bits indicating various character types, as -follows: - - 1 white space character - 2 letter - 4 decimal digit - 8 hexadecimal digit - 16 alphanumeric or '_' - 128 regular expression metacharacter or binary zero - -You should not alter the set of characters that contain the 128 bit, as that -will cause PCRE to malfunction. - - -Manifest --------- - -The distribution should contain the following files: - -(A) The actual source files of the PCRE library functions and their - headers: - - dftables.c auxiliary program for building chartables.c - - pcreposix.c ) - pcre_compile.c ) - pcre_config.c ) - pcre_dfa_exec.c ) - pcre_exec.c ) - pcre_fullinfo.c ) - pcre_get.c ) sources for the functions in the library, - pcre_globals.c ) and some internal functions that they use - pcre_info.c ) - pcre_maketables.c ) - pcre_ord2utf8.c ) - pcre_printint.c ) - pcre_study.c ) - pcre_tables.c ) - pcre_try_flipped.c ) - pcre_ucp_findchar.c ) - pcre_valid_utf8.c ) - pcre_version.c ) - pcre_xclass.c ) - - ucp_findchar.c ) - ucp.h ) source for the code that is used for - ucpinternal.h ) Unicode property handling - ucptable.c ) - ucptypetable.c ) - - pcre.in "source" for the header for the external API; pcre.h - is built from this by "configure" - pcreposix.h header for the external POSIX wrapper API - pcre_internal.h header for internal use - config.in template for config.h, which is built by configure - - pcrecpp.h.in "source" for the header file for the C++ wrapper - pcrecpp.cc ) - pcre_scanner.cc ) source for the C++ wrapper library - - pcre_stringpiece.h.in "source" for pcre_stringpiece.h, the header for the - C++ stringpiece functions - pcre_stringpiece.cc source for the C++ stringpiece functions - -(B) Auxiliary files: - - AUTHORS information about the author of PCRE - ChangeLog log of changes to the code - INSTALL generic installation instructions - LICENCE conditions for the use of PCRE - COPYING the same, using GNU's standard name - Makefile.in template for Unix Makefile, which is built by configure - NEWS important changes in this release - NON-UNIX-USE notes on building PCRE on non-Unix systems - README this file - RunTest.in template for a Unix shell script for running tests - RunGrepTest.in template for a Unix shell script for pcregrep tests - config.guess ) files used by libtool, - config.sub ) used only when building a shared library - configure a configuring shell script (built by autoconf) - configure.in the autoconf input used to build configure - doc/Tech.Notes notes on the encoding - doc/*.3 man page sources for the PCRE functions - doc/*.1 man page sources for pcregrep and pcretest - doc/html/* HTML documentation - doc/pcre.txt plain text version of the man pages - doc/pcretest.txt plain text documentation of test program - doc/perltest.txt plain text documentation of Perl test program - install-sh a shell script for installing files - libpcre.pc.in "source" for libpcre.pc for pkg-config - ltmain.sh file used to build a libtool script - mkinstalldirs script for making install directories - pcretest.c comprehensive test program - pcredemo.c simple demonstration of coding calls to PCRE - perltest Perl test program - pcregrep.c source of a grep utility that uses PCRE - pcre-config.in source of script which retains PCRE information - pcrecpp_unittest.c ) - pcre_scanner_unittest.c ) test programs for the C++ wrapper - pcre_stringpiece_unittest.c ) - testdata/testinput* test data for main library tests - testdata/testoutput* expected test results - testdata/grep* input and output for pcregrep tests - -(C) Auxiliary files for Win32 DLL - - libpcre.def - libpcreposix.def - pcre.def - -(D) Auxiliary file for VPASCAL - - makevp.bat - -Philip Hazel -Email local part: ph10 -Email domain: cam.ac.uk -June 2005 |