summaryrefslogtreecommitdiff
path: root/README
diff options
context:
space:
mode:
authornigel <nigel@2f5784b3-3f2a-0410-8824-cb99058d5e15>2007-02-24 21:39:17 +0000
committernigel <nigel@2f5784b3-3f2a-0410-8824-cb99058d5e15>2007-02-24 21:39:17 +0000
commit1622a3e7058dec7de74889c69595693ac0c64187 (patch)
tree871101eb39b4c1611359f849f7e8b1647a291e30 /README
parentb72ae7c414f315e8915948fbea7b391a490fa946 (diff)
downloadpcre-1622a3e7058dec7de74889c69595693ac0c64187.tar.gz
Load pcre-2.08a into code/trunk.
git-svn-id: svn://vcs.exim.org/pcre/code/trunk@41 2f5784b3-3f2a-0410-8824-cb99058d5e15
Diffstat (limited to 'README')
-rw-r--r--README463
1 files changed, 131 insertions, 332 deletions
diff --git a/README b/README
index c9696ba..aa49877 100644
--- a/README
+++ b/README
@@ -1,109 +1,75 @@
-README file for PCRE (Perl-compatible regular expressions)
-----------------------------------------------------------
-
-*******************************************************************************
-* IMPORTANT FOR THOSE UPGRADING FROM VERSIONS BEFORE 2.00 *
-* *
-* Please note that there has been a change in the API such that a larger *
-* ovector is required at matching time, to provide some additional workspace. *
-* The new man page has details. This change was necessary in order to support *
-* some of the new functionality in Perl 5.005. *
-* *
-* IMPORTANT FOR THOSE UPGRADING FROM VERSION 2.00 *
-* *
-* Another (I hope this is the last!) change has been made to the API for the *
-* pcre_compile() function. An additional argument has been added to make it *
-* possible to pass over a pointer to character tables built in the current *
-* locale by pcre_maketables(). To use the default tables, this new arguement *
-* should be passed as NULL. *
-* *
-* IMPORTANT FOR THOSE UPGRADING FROM VERSION 2.05 *
-* *
-* Yet another (and again I hope this really is the last) change has been made *
-* to the API for the pcre_exec() function. An additional argument has been *
-* added to make it possible to start the match other than at the start of the *
-* subject string. This is important if there are lookbehinds. The new man *
-* page has the details, but you just want to convert existing programs, all *
-* you need to do is to stick in a new fifth argument to pcre_exec(), with a *
-* value of zero. For example, change *
-* *
-* pcre_exec(pattern, extra, subject, length, options, ovec, ovecsize) *
-* to *
-* pcre_exec(pattern, extra, subject, length, 0, options, ovec, ovecsize) *
-*******************************************************************************
+README file for PCRE (Perl-compatible regular expression library)
+-----------------------------------------------------------------
+Please read the NEWS file if you are upgrading from a previous release.
-The distribution should contain the following files:
- ChangeLog log of changes to the code
- LICENCE conditions for the use of PCRE
- Makefile for building PCRE in Unix systems
- README this file
- RunTest a Unix shell script for running tests
- Tech.Notes notes on the encoding
- pcre.3 man page source for the functions
- pcre.3.txt plain text version
- pcre.3.html HTML version
- pcreposix.3 man page source for the POSIX wrapper API
- pcreposix.3.txt plain text version
- pcreposix.3.HTML HTML version
- dftables.c auxiliary program for building chartables.c
- get.c )
- maketables.c )
- study.c ) source of
- pcre.c ) the functions
- pcreposix.c )
- pcre.h header for the external API
- pcreposix.h header for the external POSIX wrapper API
- internal.h header for internal use
- pcretest.c test program
- pgrep.1 man page source for pgrep
- pgrep.1.txt plain text version
- pgrep.1.HTML HTML version
- pgrep.c source of a grep utility that uses PCRE
- perltest Perl test program
- testinput1 test data, compatible with Perl 5.004 and 5.005
- testinput2 test data for error messages and non-Perl things
- testinput3 test data, compatible with Perl 5.005
- testinput4 test data for locale-specific tests
- testoutput1 test results corresponding to testinput1
- testoutput2 test results corresponding to testinput2
- testoutput3 test results corresponding to testinput3
- testoutput4 test results corresponding to testinput4
- dll.mk for Win32 DLL
- pcre.def ditto
-
-To build PCRE on a Unix system, first edit Makefile for your system. It is a
-fairly simple make file, and there are some comments near the top, after the
-text "On a Unix system". Then run "make". It builds two libraries called
+Building PCRE on a Unix system
+------------------------------
+
+To build PCRE on a Unix system, run the "configure" command in the PCRE
+distribution directory. This is a standard GNU "autoconf" configuration script,
+for which generic instructions are supplied in INSTALL. On many systems just
+running "./configure" is sufficient, but the usual methods of changing standard
+defaults are available. For example
+
+CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local
+
+specifies that the C compiler should be run with the flags '-O2 -Wall' instead
+of the default, and that "make install" should install PCRE under /opt/local
+instead of the default /usr/local. The "configure" script builds two files:
+
+. Makefile is built by copying Makefile.in and making certain substitutions.
+. config.h is built by copying config.in and making certain substitutions.
+
+Once "configure" has run, you can run "make". It builds two libraries called
libpcre.a and libpcreposix.a, a test program called pcretest, and the pgrep
command. You can use "make install" to copy these, and the public header file
-pcre.h, to appropriate live directories on your system. These installation
-directories are defined at the top of the Makefile, and you should edit them if
-necessary.
-
-For a non-Unix system, read the comments at the top of Makefile, which give
-some hints on what needs to be done. PCRE has been compiled on Windows systems
-and on Macintoshes, but I don't know the details as I don't use those systems.
-It should be straightforward to build PCRE on any system that has a Standard C
-compiler.
-
-Some help in building a Win32 DLL of PCRE in GnuWin32 environments was
-contributed by Paul.Sokolovsky@technologist.com. These environments are
-Mingw32 (http://www.xraylith.wisc.edu/~khan/software/gnu-win32/) and
-CygWin (http://sourceware.cygnus.com/cygwin/). Paul comments:
-
- For CygWin, set CFLAGS=-mno-cygwin, and do 'make dll'. You'll get
- pcre.dll (containing pcreposix also), libpcre.dll.a, and dynamically
- linked pgrep and pcretest. If you have /bin/sh, run RunTest (three
- main test go ok, locale not supported).
-
-To test PCRE, run the RunTest script in the pcre directory. This can also be
-run by "make runtest". It runs the pcretest test program (which is documented
-below) on each of the testinput files in turn, and compares the output with the
-contents of the corresponding testoutput file. A file called testtry is used to
-hold the output from pcretest. To run pcretest on just one of the test files,
-give its number as an argument to RunTest, for example:
+pcre.h, to appropriate live directories on your system, in the normal way.
+
+
+Shared libraries on Unix systems
+--------------------------------
+
+The default distribution builds static libraries. It is also possible to build
+PCRE as two shared libraries. This support is new and experimental and may not
+work on all systems. It relies on the "libtool" scripts - these are distributed
+with PCRE. To build PCRE using shared libraries you must use --enable-shared
+when configuring it. For example
+
+./configure --prefix=/usr/gnu --enable-shared
+
+Then run "make" in the usual way. It should build a "libtool" script and use
+this to compile and link shared libraries, which are placed in a subdirectory
+called .libs. The programs pcretest and pgrep are built to use these
+uninstalled libraries by means of wrapper scripts. When you use "make install"
+to install shared libraries, pgrep is automatically re-built to use the newly
+installed library before it itself is installed.
+
+
+Building on non-Unix systems
+----------------------------
+
+For a non-Unix system, read the comments in the file NON-UNIX-USE. PCRE has
+been compiled on Windows systems and on Macintoshes, but I don't know the
+details because I don't use those systems. It should be straightforward to
+build PCRE on any system that has a Standard C compiler, because it uses only
+Standard C functions.
+
+
+Testing PCRE
+------------
+
+To test PCRE on a Unix system, run the RunTest script in the pcre directory.
+(This can also be run by "make runtest" or "make check".) For other systems,
+see the instruction in NON-UNIX-USE.
+
+The script runs the pcretest test program (which is documented in
+doc/pcretest.txt) on each of the testinput files (in the testdata directory) in
+turn, and compares the output with the contents of the corresponding testoutput
+file. A file called testtry is used to hold the output from pcretest. To run
+pcretest on just one of the test files, give its number as an argument to
+RunTest, for example:
RunTest 3
@@ -179,238 +145,71 @@ You should not alter the set of characters that contain the 128 bit, as that
will cause PCRE to malfunction.
-The pcretest program
---------------------
+Manifest
+--------
-This program is intended for testing PCRE, but it can also be used for
-experimenting with regular expressions.
-
-If it is given two filename arguments, it reads from the first and writes to
-the second. If it is given only one filename argument, it reads from that file
-and writes to stdout. Otherwise, it reads from stdin and writes to stdout, and
-prompts for each line of input.
-
-The program handles any number of sets of input on a single input file. Each
-set starts with a regular expression, and continues with any number of data
-lines to be matched against the pattern. An empty line signals the end of the
-set. The regular expressions are given enclosed in any non-alphameric
-delimiters other than backslash, for example
-
- /(a|bc)x+yz/
-
-White space before the initial delimiter is ignored. A regular expression may
-be continued over several input lines, in which case the newline characters are
-included within it. See the testinput files for many examples. It is possible
-to include the delimiter within the pattern by escaping it, for example
+The distribution should contain the following files:
- /abc\/def/
-
-If you do so, the escape and the delimiter form part of the pattern, but since
-delimiters are always non-alphameric, this does not affect its interpretation.
-If the terminating delimiter is immediately followed by a backslash, for
-example,
-
- /abc/\
-
-then a backslash is added to the end of the pattern. This is done to provide a
-way of testing the error condition that arises if a pattern finishes with a
-backslash, because
-
- /abc\/
-
-is interpreted as the first line of a pattern that starts with "abc/", causing
-pcretest to read the next line as a continuation of the regular expression.
-
-The pattern may be followed by i, m, s, or x to set the PCRE_CASELESS,
-PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively. For
-example:
-
- /caseless/i
-
-These modifier letters have the same effect as they do in Perl. There are
-others which set PCRE options that do not correspond to anything in Perl: /A,
-/E, and /X set PCRE_ANCHORED, PCRE_DOLLAR_ENDONLY, and PCRE_EXTRA respectively.
-
-Searching for all possible matches within each subject string can be requested
-by the /g or /G modifier. After finding a match, PCRE is called again to search
-the remainder of the subject string. The difference between /g and /G is that
-the former uses the startoffset argument to pcre_exec() to start searching at
-a new point within the entire string (which is in effect what Perl does),
-whereas the latter passes over a shortened substring. This makes a difference
-to the matching process if the pattern begins with a lookbehind assertion
-(including \b or \B).
-
-If any call to pcre_exec() in a /g or /G sequence matches an empty string, the
-next call is done with the PCRE_NOTEMPTY flag set so that it cannot match an
-empty string again. This imitates the way Perl handles such cases when using
-the /g modifier or the split() function.
-
-There are a number of other modifiers for controlling the way pcretest
-operates.
-
-The /+ modifier requests that as well as outputting the substring that matched
-the entire pattern, pcretest should in addition output the remainder of the
-subject string. This is useful for tests where the subject contains multiple
-copies of the same substring.
-
-The /L modifier must be followed directly by the name of a locale, for example,
-
- /pattern/Lfr
-
-For this reason, it must be the last modifier letter. The given locale is set,
-pcre_maketables() is called to build a set of character tables for the locale,
-and this is then passed to pcre_compile() when compiling the regular
-expression. Without an /L modifier, NULL is passed as the tables pointer; that
-is, /L applies only to the expression on which it appears.
-
-The /I modifier requests that pcretest output information about the compiled
-expression (whether it is anchored, has a fixed first character, and so on). It
-does this by calling pcre_info() after compiling an expression, and outputting
-the information it gets back. If the pattern is studied, the results of that
-are also output.
-
-The /D modifier is a PCRE debugging feature, which also assumes /I. It causes
-the internal form of compiled regular expressions to be output after
-compilation.
-
-The /S modifier causes pcre_study() to be called after the expression has been
-compiled, and the results used when the expression is matched.
-
-The /M modifier causes the size of memory block used to hold the compiled
-pattern to be output.
-
-Finally, the /P modifier causes pcretest to call PCRE via the POSIX wrapper API
-rather than its native API. When this is done, all other modifiers except /i,
-/m, and /+ are ignored. REG_ICASE is set if /i is present, and REG_NEWLINE is
-set if /m is present. The wrapper functions force PCRE_DOLLAR_ENDONLY always,
-and PCRE_DOTALL unless REG_NEWLINE is set.
-
-Before each data line is passed to pcre_exec(), leading and trailing whitespace
-is removed, and it is then scanned for \ escapes. The following are recognized:
-
- \a alarm (= BEL)
- \b backspace
- \e escape
- \f formfeed
- \n newline
- \r carriage return
- \t tab
- \v vertical tab
- \nnn octal character (up to 3 octal digits)
- \xhh hexadecimal character (up to 2 hex digits)
-
- \A pass the PCRE_ANCHORED option to pcre_exec()
- \B pass the PCRE_NOTBOL option to pcre_exec()
- \Cdd call pcre_copy_substring() for substring dd after a successful match
- (any decimal number less than 32)
- \Gdd call pcre_get_substring() for substring dd after a successful match
- (any decimal number less than 32)
- \L call pcre_get_substringlist() after a successful match
- \N pass the PCRE_NOTEMPTY option to pcre_exec()
- \Odd set the size of the output vector passed to pcre_exec() to dd
- (any number of decimal digits)
- \Z pass the PCRE_NOTEOL option to pcre_exec()
-
-A backslash followed by anything else just escapes the anything else. If the
-very last character is a backslash, it is ignored. This gives a way of passing
-an empty line as data, since a real empty line terminates the data input.
-
-If /P was present on the regex, causing the POSIX wrapper API to be used, only
-\B, and \Z have any effect, causing REG_NOTBOL and REG_NOTEOL to be passed to
-regexec() respectively.
-
-When a match succeeds, pcretest outputs the list of captured substrings that
-pcre_exec() returns, starting with number 0 for the string that matched the
-whole pattern. Here is an example of an interactive pcretest run.
-
- $ pcretest
- PCRE version 2.06 08-Jun-1999
-
- re> /^abc(\d+)/
- data> abc123
- 0: abc123
- 1: 123
- data> xyz
- No match
-
-If the strings contain any non-printing characters, they are output as \0x
-escapes. If the pattern has the /+ modifier, then the output for substring 0 is
-followed by the the rest of the subject string, identified by "0+" like this:
-
- re> /cat/+
- data> cataract
- 0: cat
- 0+ aract
-
-If the pattern has the /g or /G modifier, the results of successive matching
-attempts are output in sequence, like this:
-
- re> /\Bi(\w\w)/g
- data> Mississippi
- 0: iss
- 1: ss
- 0: iss
- 1: ss
- 0: ipp
- 1: pp
-
-"No match" is output only if the first match attempt fails.
-
-If any of \C, \G, or \L are present in a data line that is successfully
-matched, the substrings extracted by the convenience functions are output with
-C, G, or L after the string number instead of a colon. This is in addition to
-the normal full list. The string length (that is, the return from the
-extraction function) is given in parentheses after each string for \C and \G.
-
-Note that while patterns can be continued over several lines (a plain ">"
-prompt is used for continuations), data lines may not. However newlines can be
-included in data by means of the \n escape.
-
-If the -p option is given to pcretest, it is equivalent to adding /P to each
-regular expression: the POSIX wrapper API is used to call PCRE. None of the
-following flags has any effect in this case.
-
-If the option -d is given to pcretest, it is equivalent to adding /D to each
-regular expression: the internal form is output after compilation.
-
-If the option -i is given to pcretest, it is equivalent to adding /I to each
-regular expression: information about the compiled pattern is given after
-compilation.
-
-If the option -m is given to pcretest, it outputs the size of each compiled
-pattern after it has been compiled. It is equivalent to adding /M to each
-regular expression. For compatibility with earlier versions of pcretest, -s is
-a synonym for -m.
-
-If the -t option is given, each compile, study, and match is run 20000 times
-while being timed, and the resulting time per compile or match is output in
-milliseconds. Do not set -t with -s, because you will then get the size output
-20000 times and the timing will be distorted. If you want to change the number
-of repetitions used for timing, edit the definition of LOOPREPEAT at the top of
-pcretest.c
-
-
-
-The perltest program
---------------------
-
-The perltest program tests Perl's regular expressions; it has the same
-specification as pcretest, and so can be given identical input, except that
-input patterns can be followed only by Perl's lower case modifiers. The
-contents of testinput1 and testinput3 meet this condition.
-
-The data lines are processed as Perl double-quoted strings, so if they contain
-" \ $ or @ characters, these have to be escaped. For this reason, all such
-characters in testinput1 and testinput3 are escaped so that they can be used
-for perltest as well as for pcretest, and the special upper case modifiers such
-as /A that pcretest recognizes are not used in these files. The output should
-be identical, apart from the initial identifying banner.
-
-The testinput2 and testinput4 files are not suitable for feeding to perltest,
-since they do make use of the special upper case modifiers and escapes that
-pcretest uses to test some features of PCRE. The first of these files also
-contains malformed regular expressions, in order to check that PCRE diagnoses
-them correctly.
+(A) The actual source files of the PCRE library functions and their
+ headers:
+
+ dftables.c auxiliary program for building chartables.c
+ get.c )
+ maketables.c )
+ study.c ) source of
+ pcre.c ) the functions
+ pcreposix.c )
+ pcre.h header for the external API
+ pcreposix.h header for the external POSIX wrapper API
+ internal.h header for internal use
+ config.in template for config.h, which is built by configure
+
+(B) Auxiliary files:
+
+ AUTHORS information about the author of PCRE
+ ChangeLog log of changes to the code
+ INSTALL generic installation instructions
+ LICENCE conditions for the use of PCRE
+ Makefile.in template for Unix Makefile, which is built by configure
+ NEWS important changes in this release
+ NON-UNIX-USE notes on building PCRE on non-Unix systems
+ README this file
+ RunTest a Unix shell script for running tests
+ config.guess ) files used by libtool,
+ config.sub ) used only when building a shared library
+ configure a configuring shell script (built by autoconf)
+ configure.in the autoconf input used to build configure
+ doc/Tech.Notes notes on the encoding
+ doc/pcre.3 man page source for the PCRE functions
+ doc/pcre.html HTML version
+ doc/pcre.txt plain text version
+ doc/pcreposix.3 man page source for the POSIX wrapper API
+ doc/pcreposix.html HTML version
+ doc/pcreposix.txt plain text version
+ doc/pcretest.txt documentation of test program
+ doc/perltest.txt documentation of Perl test program
+ doc/pgrep.1 man page source for the pgrep utility
+ doc/pgrep.html HTML version
+ doc/pgrep.txt plain text version
+ install-sh a shell script for installing files
+ ltconfig ) files used to build "libtool",
+ ltmain.sh ) used only when building a shared library
+ pcretest.c test program
+ perltest Perl test program
+ pgrep.c source of a grep utility that uses PCRE
+ testdata/testinput1 test data, compatible with Perl 5.004 and 5.005
+ testdata/testinput2 test data for error messages and non-Perl things
+ testdata/testinput3 test data, compatible with Perl 5.005
+ testdata/testinput4 test data for locale-specific tests
+ testdata/testoutput1 test results corresponding to testinput1
+ testdata/testoutput2 test results corresponding to testinput2
+ testdata/testoutput3 test results corresponding to testinput3
+ testdata/testoutput4 test results corresponding to testinput4
+
+(C) Auxiliary files for Win32 DLL
+
+ dll.mk
+ pcre.def
Philip Hazel <ph10@cam.ac.uk>
-July 1999
+January 2000