field | value | date |
---|---|---|
author | nigel <nigel@2f5784b3-3f2a-0410-8824-cb99058d5e15> | 2007-02-24 21:39:17 +0000 |
committer | nigel <nigel@2f5784b3-3f2a-0410-8824-cb99058d5e15> | 2007-02-24 21:39:17 +0000 |
commit | 1622a3e7058dec7de74889c69595693ac0c64187 (patch) | |
tree | 871101eb39b4c1611359f849f7e8b1647a291e30 /README | |
parent | b72ae7c414f315e8915948fbea7b391a490fa946 (diff) | |
Load pcre-2.08a into code/trunk.
git-svn-id: svn://vcs.exim.org/pcre/code/trunk@41 2f5784b3-3f2a-0410-8824-cb99058d5e15
Diffstat (limited to 'README')
-rw-r--r-- | README | 463 |
1 files changed, 131 insertions, 332 deletions
@@ -1,109 +1,75 @@ -README file for PCRE (Perl-compatible regular expressions) ----------------------------------------------------------- - -******************************************************************************* -* IMPORTANT FOR THOSE UPGRADING FROM VERSIONS BEFORE 2.00 * -* * -* Please note that there has been a change in the API such that a larger * -* ovector is required at matching time, to provide some additional workspace. * -* The new man page has details. This change was necessary in order to support * -* some of the new functionality in Perl 5.005. * -* * -* IMPORTANT FOR THOSE UPGRADING FROM VERSION 2.00 * -* * -* Another (I hope this is the last!) change has been made to the API for the * -* pcre_compile() function. An additional argument has been added to make it * -* possible to pass over a pointer to character tables built in the current * -* locale by pcre_maketables(). To use the default tables, this new arguement * -* should be passed as NULL. * -* * -* IMPORTANT FOR THOSE UPGRADING FROM VERSION 2.05 * -* * -* Yet another (and again I hope this really is the last) change has been made * -* to the API for the pcre_exec() function. An additional argument has been * -* added to make it possible to start the match other than at the start of the * -* subject string. This is important if there are lookbehinds. The new man * -* page has the details, but you just want to convert existing programs, all * -* you need to do is to stick in a new fifth argument to pcre_exec(), with a * -* value of zero. 
For example, change * -* * -* pcre_exec(pattern, extra, subject, length, options, ovec, ovecsize) * -* to * -* pcre_exec(pattern, extra, subject, length, 0, options, ovec, ovecsize) * -******************************************************************************* +README file for PCRE (Perl-compatible regular expression library) +----------------------------------------------------------------- +Please read the NEWS file if you are upgrading from a previous release. -The distribution should contain the following files: - ChangeLog log of changes to the code - LICENCE conditions for the use of PCRE - Makefile for building PCRE in Unix systems - README this file - RunTest a Unix shell script for running tests - Tech.Notes notes on the encoding - pcre.3 man page source for the functions - pcre.3.txt plain text version - pcre.3.html HTML version - pcreposix.3 man page source for the POSIX wrapper API - pcreposix.3.txt plain text version - pcreposix.3.HTML HTML version - dftables.c auxiliary program for building chartables.c - get.c ) - maketables.c ) - study.c ) source of - pcre.c ) the functions - pcreposix.c ) - pcre.h header for the external API - pcreposix.h header for the external POSIX wrapper API - internal.h header for internal use - pcretest.c test program - pgrep.1 man page source for pgrep - pgrep.1.txt plain text version - pgrep.1.HTML HTML version - pgrep.c source of a grep utility that uses PCRE - perltest Perl test program - testinput1 test data, compatible with Perl 5.004 and 5.005 - testinput2 test data for error messages and non-Perl things - testinput3 test data, compatible with Perl 5.005 - testinput4 test data for locale-specific tests - testoutput1 test results corresponding to testinput1 - testoutput2 test results corresponding to testinput2 - testoutput3 test results corresponding to testinput3 - testoutput4 test results corresponding to testinput4 - dll.mk for Win32 DLL - pcre.def ditto - -To build PCRE on a Unix system, first edit Makefile 
for your system. It is a -fairly simple make file, and there are some comments near the top, after the -text "On a Unix system". Then run "make". It builds two libraries called +Building PCRE on a Unix system +------------------------------ + +To build PCRE on a Unix system, run the "configure" command in the PCRE +distribution directory. This is a standard GNU "autoconf" configuration script, +for which generic instructions are supplied in INSTALL. On many systems just +running "./configure" is sufficient, but the usual methods of changing standard +defaults are available. For example + +CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local + +specifies that the C compiler should be run with the flags '-O2 -Wall' instead +of the default, and that "make install" should install PCRE under /opt/local +instead of the default /usr/local. The "configure" script builds two files: + +. Makefile is built by copying Makefile.in and making certain substitutions. +. config.h is built by copying config.in and making certain substitutions. + +Once "configure" has run, you can run "make". It builds two libraries called libpcre.a and libpcreposix.a, a test program called pcretest, and the pgrep command. You can use "make install" to copy these, and the public header file -pcre.h, to appropriate live directories on your system. These installation -directories are defined at the top of the Makefile, and you should edit them if -necessary. - -For a non-Unix system, read the comments at the top of Makefile, which give -some hints on what needs to be done. PCRE has been compiled on Windows systems -and on Macintoshes, but I don't know the details as I don't use those systems. -It should be straightforward to build PCRE on any system that has a Standard C -compiler. - -Some help in building a Win32 DLL of PCRE in GnuWin32 environments was -contributed by Paul.Sokolovsky@technologist.com. 
These environments are -Mingw32 (http://www.xraylith.wisc.edu/~khan/software/gnu-win32/) and -CygWin (http://sourceware.cygnus.com/cygwin/). Paul comments: - - For CygWin, set CFLAGS=-mno-cygwin, and do 'make dll'. You'll get - pcre.dll (containing pcreposix also), libpcre.dll.a, and dynamically - linked pgrep and pcretest. If you have /bin/sh, run RunTest (three - main test go ok, locale not supported). - -To test PCRE, run the RunTest script in the pcre directory. This can also be -run by "make runtest". It runs the pcretest test program (which is documented -below) on each of the testinput files in turn, and compares the output with the -contents of the corresponding testoutput file. A file called testtry is used to -hold the output from pcretest. To run pcretest on just one of the test files, -give its number as an argument to RunTest, for example: +pcre.h, to appropriate live directories on your system, in the normal way. + + +Shared libraries on Unix systems +-------------------------------- + +The default distribution builds static libraries. It is also possible to build +PCRE as two shared libraries. This support is new and experimental and may not +work on all systems. It relies on the "libtool" scripts - these are distributed +with PCRE. To build PCRE using shared libraries you must use --enable-shared +when configuring it. For example + +./configure --prefix=/usr/gnu --enable-shared + +Then run "make" in the usual way. It should build a "libtool" script and use +this to compile and link shared libraries, which are placed in a subdirectory +called .libs. The programs pcretest and pgrep are built to use these +uninstalled libraries by means of wrapper scripts. When you use "make install" +to install shared libraries, pgrep is automatically re-built to use the newly +installed library before it itself is installed. + + +Building on non-Unix systems +---------------------------- + +For a non-Unix system, read the comments in the file NON-UNIX-USE. 
PCRE has +been compiled on Windows systems and on Macintoshes, but I don't know the +details because I don't use those systems. It should be straightforward to +build PCRE on any system that has a Standard C compiler, because it uses only +Standard C functions. + + +Testing PCRE +------------ + +To test PCRE on a Unix system, run the RunTest script in the pcre directory. +(This can also be run by "make runtest" or "make check".) For other systems, +see the instruction in NON-UNIX-USE. + +The script runs the pcretest test program (which is documented in +doc/pcretest.txt) on each of the testinput files (in the testdata directory) in +turn, and compares the output with the contents of the corresponding testoutput +file. A file called testtry is used to hold the output from pcretest. To run +pcretest on just one of the test files, give its number as an argument to +RunTest, for example: RunTest 3 @@ -179,238 +145,71 @@ You should not alter the set of characters that contain the 128 bit, as that will cause PCRE to malfunction. -The pcretest program --------------------- +Manifest +-------- -This program is intended for testing PCRE, but it can also be used for -experimenting with regular expressions. - -If it is given two filename arguments, it reads from the first and writes to -the second. If it is given only one filename argument, it reads from that file -and writes to stdout. Otherwise, it reads from stdin and writes to stdout, and -prompts for each line of input. - -The program handles any number of sets of input on a single input file. Each -set starts with a regular expression, and continues with any number of data -lines to be matched against the pattern. An empty line signals the end of the -set. The regular expressions are given enclosed in any non-alphameric -delimiters other than backslash, for example - - /(a|bc)x+yz/ - -White space before the initial delimiter is ignored. 
A regular expression may -be continued over several input lines, in which case the newline characters are -included within it. See the testinput files for many examples. It is possible -to include the delimiter within the pattern by escaping it, for example +The distribution should contain the following files: - /abc\/def/ - -If you do so, the escape and the delimiter form part of the pattern, but since -delimiters are always non-alphameric, this does not affect its interpretation. -If the terminating delimiter is immediately followed by a backslash, for -example, - - /abc/\ - -then a backslash is added to the end of the pattern. This is done to provide a -way of testing the error condition that arises if a pattern finishes with a -backslash, because - - /abc\/ - -is interpreted as the first line of a pattern that starts with "abc/", causing -pcretest to read the next line as a continuation of the regular expression. - -The pattern may be followed by i, m, s, or x to set the PCRE_CASELESS, -PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively. For -example: - - /caseless/i - -These modifier letters have the same effect as they do in Perl. There are -others which set PCRE options that do not correspond to anything in Perl: /A, -/E, and /X set PCRE_ANCHORED, PCRE_DOLLAR_ENDONLY, and PCRE_EXTRA respectively. - -Searching for all possible matches within each subject string can be requested -by the /g or /G modifier. After finding a match, PCRE is called again to search -the remainder of the subject string. The difference between /g and /G is that -the former uses the startoffset argument to pcre_exec() to start searching at -a new point within the entire string (which is in effect what Perl does), -whereas the latter passes over a shortened substring. This makes a difference -to the matching process if the pattern begins with a lookbehind assertion -(including \b or \B). 
- -If any call to pcre_exec() in a /g or /G sequence matches an empty string, the -next call is done with the PCRE_NOTEMPTY flag set so that it cannot match an -empty string again. This imitates the way Perl handles such cases when using -the /g modifier or the split() function. - -There are a number of other modifiers for controlling the way pcretest -operates. - -The /+ modifier requests that as well as outputting the substring that matched -the entire pattern, pcretest should in addition output the remainder of the -subject string. This is useful for tests where the subject contains multiple -copies of the same substring. - -The /L modifier must be followed directly by the name of a locale, for example, - - /pattern/Lfr - -For this reason, it must be the last modifier letter. The given locale is set, -pcre_maketables() is called to build a set of character tables for the locale, -and this is then passed to pcre_compile() when compiling the regular -expression. Without an /L modifier, NULL is passed as the tables pointer; that -is, /L applies only to the expression on which it appears. - -The /I modifier requests that pcretest output information about the compiled -expression (whether it is anchored, has a fixed first character, and so on). It -does this by calling pcre_info() after compiling an expression, and outputting -the information it gets back. If the pattern is studied, the results of that -are also output. - -The /D modifier is a PCRE debugging feature, which also assumes /I. It causes -the internal form of compiled regular expressions to be output after -compilation. - -The /S modifier causes pcre_study() to be called after the expression has been -compiled, and the results used when the expression is matched. - -The /M modifier causes the size of memory block used to hold the compiled -pattern to be output. - -Finally, the /P modifier causes pcretest to call PCRE via the POSIX wrapper API -rather than its native API. 
When this is done, all other modifiers except /i, -/m, and /+ are ignored. REG_ICASE is set if /i is present, and REG_NEWLINE is -set if /m is present. The wrapper functions force PCRE_DOLLAR_ENDONLY always, -and PCRE_DOTALL unless REG_NEWLINE is set. - -Before each data line is passed to pcre_exec(), leading and trailing whitespace -is removed, and it is then scanned for \ escapes. The following are recognized: - - \a alarm (= BEL) - \b backspace - \e escape - \f formfeed - \n newline - \r carriage return - \t tab - \v vertical tab - \nnn octal character (up to 3 octal digits) - \xhh hexadecimal character (up to 2 hex digits) - - \A pass the PCRE_ANCHORED option to pcre_exec() - \B pass the PCRE_NOTBOL option to pcre_exec() - \Cdd call pcre_copy_substring() for substring dd after a successful match - (any decimal number less than 32) - \Gdd call pcre_get_substring() for substring dd after a successful match - (any decimal number less than 32) - \L call pcre_get_substringlist() after a successful match - \N pass the PCRE_NOTEMPTY option to pcre_exec() - \Odd set the size of the output vector passed to pcre_exec() to dd - (any number of decimal digits) - \Z pass the PCRE_NOTEOL option to pcre_exec() - -A backslash followed by anything else just escapes the anything else. If the -very last character is a backslash, it is ignored. This gives a way of passing -an empty line as data, since a real empty line terminates the data input. - -If /P was present on the regex, causing the POSIX wrapper API to be used, only -\B, and \Z have any effect, causing REG_NOTBOL and REG_NOTEOL to be passed to -regexec() respectively. - -When a match succeeds, pcretest outputs the list of captured substrings that -pcre_exec() returns, starting with number 0 for the string that matched the -whole pattern. Here is an example of an interactive pcretest run. 
- - $ pcretest - PCRE version 2.06 08-Jun-1999 - - re> /^abc(\d+)/ - data> abc123 - 0: abc123 - 1: 123 - data> xyz - No match - -If the strings contain any non-printing characters, they are output as \0x -escapes. If the pattern has the /+ modifier, then the output for substring 0 is -followed by the the rest of the subject string, identified by "0+" like this: - - re> /cat/+ - data> cataract - 0: cat - 0+ aract - -If the pattern has the /g or /G modifier, the results of successive matching -attempts are output in sequence, like this: - - re> /\Bi(\w\w)/g - data> Mississippi - 0: iss - 1: ss - 0: iss - 1: ss - 0: ipp - 1: pp - -"No match" is output only if the first match attempt fails. - -If any of \C, \G, or \L are present in a data line that is successfully -matched, the substrings extracted by the convenience functions are output with -C, G, or L after the string number instead of a colon. This is in addition to -the normal full list. The string length (that is, the return from the -extraction function) is given in parentheses after each string for \C and \G. - -Note that while patterns can be continued over several lines (a plain ">" -prompt is used for continuations), data lines may not. However newlines can be -included in data by means of the \n escape. - -If the -p option is given to pcretest, it is equivalent to adding /P to each -regular expression: the POSIX wrapper API is used to call PCRE. None of the -following flags has any effect in this case. - -If the option -d is given to pcretest, it is equivalent to adding /D to each -regular expression: the internal form is output after compilation. - -If the option -i is given to pcretest, it is equivalent to adding /I to each -regular expression: information about the compiled pattern is given after -compilation. - -If the option -m is given to pcretest, it outputs the size of each compiled -pattern after it has been compiled. It is equivalent to adding /M to each -regular expression. 
For compatibility with earlier versions of pcretest, -s is -a synonym for -m. - -If the -t option is given, each compile, study, and match is run 20000 times -while being timed, and the resulting time per compile or match is output in -milliseconds. Do not set -t with -s, because you will then get the size output -20000 times and the timing will be distorted. If you want to change the number -of repetitions used for timing, edit the definition of LOOPREPEAT at the top of -pcretest.c - - - -The perltest program --------------------- - -The perltest program tests Perl's regular expressions; it has the same -specification as pcretest, and so can be given identical input, except that -input patterns can be followed only by Perl's lower case modifiers. The -contents of testinput1 and testinput3 meet this condition. - -The data lines are processed as Perl double-quoted strings, so if they contain -" \ $ or @ characters, these have to be escaped. For this reason, all such -characters in testinput1 and testinput3 are escaped so that they can be used -for perltest as well as for pcretest, and the special upper case modifiers such -as /A that pcretest recognizes are not used in these files. The output should -be identical, apart from the initial identifying banner. - -The testinput2 and testinput4 files are not suitable for feeding to perltest, -since they do make use of the special upper case modifiers and escapes that -pcretest uses to test some features of PCRE. The first of these files also -contains malformed regular expressions, in order to check that PCRE diagnoses -them correctly. 
+(A) The actual source files of the PCRE library functions and their + headers: + + dftables.c auxiliary program for building chartables.c + get.c ) + maketables.c ) + study.c ) source of + pcre.c ) the functions + pcreposix.c ) + pcre.h header for the external API + pcreposix.h header for the external POSIX wrapper API + internal.h header for internal use + config.in template for config.h, which is built by configure + +(B) Auxiliary files: + + AUTHORS information about the author of PCRE + ChangeLog log of changes to the code + INSTALL generic installation instructions + LICENCE conditions for the use of PCRE + Makefile.in template for Unix Makefile, which is built by configure + NEWS important changes in this release + NON-UNIX-USE notes on building PCRE on non-Unix systems + README this file + RunTest a Unix shell script for running tests + config.guess ) files used by libtool, + config.sub ) used only when building a shared library + configure a configuring shell script (built by autoconf) + configure.in the autoconf input used to build configure + doc/Tech.Notes notes on the encoding + doc/pcre.3 man page source for the PCRE functions + doc/pcre.html HTML version + doc/pcre.txt plain text version + doc/pcreposix.3 man page source for the POSIX wrapper API + doc/pcreposix.html HTML version + doc/pcreposix.txt plain text version + doc/pcretest.txt documentation of test program + doc/perltest.txt documentation of Perl test program + doc/pgrep.1 man page source for the pgrep utility + doc/pgrep.html HTML version + doc/pgrep.txt plain text version + install-sh a shell script for installing files + ltconfig ) files used to build "libtool", + ltmain.sh ) used only when building a shared library + pcretest.c test program + perltest Perl test program + pgrep.c source of a grep utility that uses PCRE + testdata/testinput1 test data, compatible with Perl 5.004 and 5.005 + testdata/testinput2 test data for error messages and non-Perl things + testdata/testinput3 test 
data, compatible with Perl 5.005 + testdata/testinput4 test data for locale-specific tests + testdata/testoutput1 test results corresponding to testinput1 + testdata/testoutput2 test results corresponding to testinput2 + testdata/testoutput3 test results corresponding to testinput3 + testdata/testoutput4 test results corresponding to testinput4 + +(C) Auxiliary files for Win32 DLL + + dll.mk + pcre.def Philip Hazel <ph10@cam.ac.uk> -July 1999 +January 2000 |
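
The build sequence that the new README text describes (configure, make, test, install) can be sketched as a small dry-run helper. This is only an illustration under stated assumptions: `pcre_build_plan` is a hypothetical function name, not part of the PCRE distribution, and it only prints the commands so they can be reviewed before being run by hand in the PCRE distribution directory. The `--prefix` and `--enable-shared` options and the `make check` target are the ones named in the README text above.

```shell
#!/bin/sh
# Sketch only: pcre_build_plan is a hypothetical helper, not part of PCRE.
# It prints the command sequence described in the README; it runs nothing.
pcre_build_plan() {
    prefix=${1:-/usr/local}   # README: the default install prefix is /usr/local
    shared=${2:-no}           # "yes" adds --enable-shared (described as experimental)

    opts="--prefix=$prefix"
    if [ "$shared" = yes ]; then
        opts="$opts --enable-shared"
    fi

    echo "./configure $opts"  # builds Makefile and config.h from their templates
    echo "make"               # builds libpcre, libpcreposix, pcretest and pgrep
    echo "make check"         # runs the RunTest script
    echo "make install"       # copies the libraries and pcre.h under the prefix
}

# Example: the shared-library configuration used in the README.
pcre_build_plan /usr/gnu yes
```

Printing the plan rather than executing it keeps the sketch safe to run anywhere; piping the output to `sh` inside an unpacked PCRE source tree would run the steps for real.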