author nigel <nigel@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2007-02-24 21:38:53 +0000
committer nigel <nigel@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2007-02-24 21:38:53 +0000
commit 7703eae0f55edaff9f482fa8d23a6910d5d18577
tree 83aa003e890adb9ef5e1968d02febf0256cf61ac /README
parent 0c8732c8583c7e31476c0ec1c0ac92cc7e5f8bc0
Load pcre-2.03 into code/trunk.
git-svn-id: svn://vcs.exim.org/pcre/code/trunk@29 2f5784b3-3f2a-0410-8824-cb99058d5e15
Diffstat (limited to 'README')
-rw-r--r-- README 60
1 files changed, 47 insertions, 13 deletions
diff --git a/README b/README
index e169e46..29fc714 100644
--- a/README
+++ b/README
@@ -21,6 +21,7 @@ README file for PCRE (Perl-compatible regular expressions)
The distribution should contain the following files:
ChangeLog log of changes to the code
+ LICENCE conditions for the use of PCRE
Makefile for building PCRE
README this file
RunTest a shell script for running tests
@@ -28,6 +29,7 @@ The distribution should contain the following files:
pcre.3 man page for the functions
pcreposix.3 man page for the POSIX wrapper API
dftables.c auxiliary program for building chartables.c
+ get.c )
maketables.c )
study.c ) source of
pcre.c ) the functions
@@ -69,8 +71,9 @@ additional features of release 5.005, which is why it is kept separate from the
main test input, which needs only Perl 5.004. In the long run, when 5.005 is
widespread, these two test files may get amalgamated.
-The second set of tests check pcre_info(), pcre_study(), error detection and
-run-time flags that are specific to PCRE, as well as the POSIX wrapper API.
+The second set of tests checks pcre_info(), pcre_study(), pcre_copy_substring(),
+pcre_get_substring(), pcre_get_substring_list(), error detection and run-time
+flags that are specific to PCRE, as well as the POSIX wrapper API.
The fourth set of tests checks pcre_maketables(), the facility for building a
set of character tables for a specific locale and using them instead of the
@@ -157,13 +160,36 @@ The program handles any number of sets of input on a single input file. Each
set starts with a regular expression, and continues with any number of data
lines to be matched against the pattern. An empty line signals the end of the
set. The regular expressions are given enclosed in any non-alphameric
-delimiters, for example
+delimiters other than backslash, for example
/(a|bc)x+yz/
-and may be followed by i, m, s, or x to set the PCRE_CASELESS, PCRE_MULTILINE,
-PCRE_DOTALL, or PCRE_EXTENDED options, respectively. These options have the
-same effect as they do in Perl.
+White space before the initial delimiter is ignored. A regular expression may
+be continued over several input lines, in which case the newline characters are
+included within it. See the testinput files for many examples. It is possible
+to include the delimiter within the pattern by escaping it, for example
+
+ /abc\/def/
+
+If you do so, the escape and the delimiter form part of the pattern, but since
+delimiters are always non-alphameric, this does not affect its interpretation.
+If the terminating delimiter is immediately followed by a backslash, for
+example,
+
+ /abc/\
+
+then a backslash is added to the end of the pattern. This provides a way of
+testing the error condition that arises if a pattern finishes with a backslash,
+because
+
+ /abc\/
+
+is interpreted as the first line of a pattern that starts with "abc/", causing
+pcretest to read the next line as a continuation of the regular expression.
+
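The delimiter rules above can be sketched in C roughly as follows. This is an illustration only, not pcretest's own code: the function parse_pattern_line() and its status codes are hypothetical, and the sketch looks at a single input line, returning PARSE_INCOMPLETE where pcretest would instead read the next line as a continuation of the pattern.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes for this sketch. */
enum parse_status { PARSE_OK, PARSE_BAD_DELIMITER, PARSE_INCOMPLETE, PARSE_TOO_LONG };

/* Extract the pattern from one input line such as "  /abc\/def/i".
   On PARSE_OK, pattern holds the text between the delimiters and *options
   points at whatever follows the closing delimiter (the option letters). */
static enum parse_status parse_pattern_line(const char *line, char *pattern,
                                            size_t patsize, const char **options)
{
    const char *p = line;
    size_t len = 0;
    char delim;

    if (patsize == 0) return PARSE_TOO_LONG;
    while (isspace((unsigned char)*p)) p++;          /* white space before the delimiter is ignored */
    delim = *p++;
    if (delim == '\0' || isalnum((unsigned char)delim) || delim == '\\')
        return PARSE_BAD_DELIMITER;                   /* must be non-alphameric and not a backslash */

    while (*p != delim) {
        if (*p == '\0') return PARSE_INCOMPLETE;      /* pattern continues on the next line */
        if (*p == '\\' && p[1] != '\0') {             /* escape: keep the backslash and the next char */
            if (len + 2 >= patsize) return PARSE_TOO_LONG;
            pattern[len++] = *p++;
        } else if (len + 1 >= patsize) {
            return PARSE_TOO_LONG;
        }
        pattern[len++] = *p++;
    }
    p++;                                              /* step over the closing delimiter */
    if (*p == '\\') {                                 /* "/abc/\" appends a backslash to the pattern */
        if (len + 1 >= patsize) return PARSE_TOO_LONG;
        pattern[len++] = '\\';
        p++;
    }
    pattern[len] = '\0';
    *options = p;
    return PARSE_OK;
}
```

With this sketch, the line /abc\/def/ yields the pattern abc\/def, /abc/\ yields abc\, and /abc\/ reports an incomplete pattern.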
+The pattern may be followed by i, m, s, or x to set the PCRE_CASELESS,
+PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively. These
+options have the same effect as they do in Perl.
There are also some upper case options that do not match Perl options: /A, /E,
and /X set PCRE_ANCHORED, PCRE_DOLLAR_ENDONLY, and PCRE_EXTRA respectively.
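A minimal sketch of how the option letters could be turned into a bit mask. The OPT_* names and their values are placeholders invented for this example, not the PCRE_* constants from pcre.h, and only the seven letters described above are recognized; pcretest accepts further letters (such as /P) that this sketch does not cover.

```c
/* Placeholder bit values for this sketch only; pcre.h defines the real PCRE_* constants. */
#define OPT_CASELESS       0x01  /* i: PCRE_CASELESS */
#define OPT_MULTILINE      0x02  /* m: PCRE_MULTILINE */
#define OPT_DOTALL         0x04  /* s: PCRE_DOTALL */
#define OPT_EXTENDED       0x08  /* x: PCRE_EXTENDED */
#define OPT_ANCHORED       0x10  /* A: PCRE_ANCHORED */
#define OPT_DOLLAR_ENDONLY 0x20  /* E: PCRE_DOLLAR_ENDONLY */
#define OPT_EXTRA          0x40  /* X: PCRE_EXTRA */

/* Convert the letters that follow the closing delimiter into an option mask.
   Spaces and newlines are skipped; any other unknown letter yields -1. */
static int options_from_letters(const char *letters)
{
    int mask = 0;

    for (; *letters != '\0'; letters++) {
        switch (*letters) {
        case 'i': mask |= OPT_CASELESS;       break;
        case 'm': mask |= OPT_MULTILINE;      break;
        case 's': mask |= OPT_DOTALL;         break;
        case 'x': mask |= OPT_EXTENDED;       break;
        case 'A': mask |= OPT_ANCHORED;       break;
        case 'E': mask |= OPT_DOLLAR_ENDONLY; break;
        case 'X': mask |= OPT_EXTRA;          break;
        case ' ': case '\n':                  break;
        default:  return -1;
        }
    }
    return mask;
}
```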
@@ -196,9 +222,6 @@ rather than its native API. When this is done, all other options except /i and
is present. The wrapper functions force PCRE_DOLLAR_ENDONLY always, and
PCRE_DOTALL unless REG_NEWLINE is set.
-A regular expression can extend over several lines of input; the newlines are
-included in it. See the testinput files for many examples.
-
Before each data line is passed to pcre_exec(), leading and trailing whitespace
is removed, and it is then scanned for \ escapes. The following are recognized:
@@ -215,6 +238,11 @@ is removed, and it is then scanned for \ escapes. The following are recognized:
\A pass the PCRE_ANCHORED option to pcre_exec()
\B pass the PCRE_NOTBOL option to pcre_exec()
+ \Cdd call pcre_copy_substring() for substring dd after a successful match
+ (any decimal number less than 32)
+ \Gdd call pcre_get_substring() for substring dd after a successful match
+ (any decimal number less than 32)
+ \L call pcre_get_substring_list() after a successful match
\Odd set the size of the output vector passed to pcre_exec() to dd
(any number of decimal digits)
\Z pass the PCRE_NOTEOL option to pcre_exec()
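The data-line preprocessing can be sketched as below, assuming a hypothetical struct data_line and DATA_* flag bits standing in for the real PCRE options. It trims white space and handles \A, \B, \Z, \n and \Odd only; every other escape, including \C, \G and \L, is copied through unchanged in this sketch rather than interpreted as pcretest would.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical flags standing in for PCRE_ANCHORED, PCRE_NOTBOL and PCRE_NOTEOL. */
#define DATA_ANCHORED 0x1
#define DATA_NOTBOL   0x2
#define DATA_NOTEOL   0x4

struct data_line {
    char subject[256];   /* subject string after escape processing */
    int  flags;          /* DATA_* bits requested by \A, \B and \Z */
    int  ovector_size;   /* value given by \Odd, or -1 if absent */
};

/* Trim leading and trailing white space, then process a few of the escapes. */
static void scan_data_line(const char *in, struct data_line *out)
{
    const char *end = in + strlen(in);
    size_t n = 0;

    while (isspace((unsigned char)*in)) in++;
    while (end > in && isspace((unsigned char)end[-1])) end--;
    out->flags = 0;
    out->ovector_size = -1;

    while (in < end && n + 1 < sizeof out->subject) {
        if (*in != '\\' || in + 1 >= end) { out->subject[n++] = *in++; continue; }
        in++;                                        /* skip the backslash */
        switch (*in++) {
        case 'A': out->flags |= DATA_ANCHORED; break;
        case 'B': out->flags |= DATA_NOTBOL;   break;
        case 'Z': out->flags |= DATA_NOTEOL;   break;
        case 'n': out->subject[n++] = '\n';    break;
        case 'O':                                    /* \Odd: any number of decimal digits */
            out->ovector_size = 0;
            while (in < end && isdigit((unsigned char)*in))
                out->ovector_size = out->ovector_size * 10 + (*in++ - '0');
            break;
        default:                                     /* not handled here: keep it literally */
            out->subject[n++] = '\\';
            if (n + 1 < sizeof out->subject) out->subject[n++] = in[-1];
            break;
        }
    }
    out->subject[n] = '\0';
}
```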
@@ -227,7 +255,7 @@ If /P was present on the regex, causing the POSIX wrapper API to be used, only
\B, and \Z have any effect, causing REG_NOTBOL and REG_NOTEOL to be passed to
regexec() respectively.
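The effect of REG_NOTBOL and REG_NOTEOL can be seen with any POSIX regex implementation. The sketch below links against the system's <regex.h>, not pcreposix, and posix_matches() is a hypothetical helper; it assumes the PCRE wrapper gives these two flags the same meaning, which the text above implies but this example does not itself test.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if pattern matches subject under the given regexec() eflags,
   0 if it does not, and -1 if the pattern fails to compile. */
static int posix_matches(const char *pattern, const char *subject, int eflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, subject, 0, NULL, eflags);
    regfree(&re);
    return rc == 0;
}
```

With REG_NOTBOL, "^abc" no longer matches "abc", because the start of the subject is not treated as the beginning of a line; REG_NOTEOL does the same for "$" at the end.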
-When a match succeeds, pcretest outputs the list of identified substrings that
+When a match succeeds, pcretest outputs the list of captured substrings that
pcre_exec() returns, starting with number 0 for the string that matched the
whole pattern. Here is an example of an interactive pcretest run.
@@ -242,6 +270,12 @@ whole pattern. Here is an example of an interactive pcretest run.
data> xyz
No match
+If any of \C, \G, or \L are present in a data line that is successfully
+matched, the substrings extracted by the convenience functions are output with
+C, G, or L after the string number instead of a colon. This is in addition to
+the normal full list. The string length (that is, the return from the
+extraction function) is given in parentheses after each string for \C and \G.
+
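pcre_exec() reports each captured substring as a pair of start and end offsets in the output vector, and pcre_copy_substring() copies the corresponding text out of the subject into a caller's buffer. The following is a hypothetical re-creation of that idea for illustration, not the code in get.c: copy_substring_sketch(), its error codes, and the exact spacing produced by format_copy_line() are all assumptions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical error codes for this sketch (pcre.h defines its own). */
#define SKETCH_ERROR_NOSUBSTRING (-1)
#define SKETCH_ERROR_NOSPACE     (-2)

/* Copy captured substring `number` into buffer. ovector holds start/end offset
   pairs as filled in by pcre_exec(); count is pcre_exec()'s positive return.
   Returns the substring length, or a negative error code. */
static int copy_substring_sketch(const char *subject, const int *ovector, int count,
                                 int number, char *buffer, int size)
{
    int start, len;

    if (number < 0 || number >= count) return SKETCH_ERROR_NOSUBSTRING;
    start = ovector[2 * number];
    len = (start < 0) ? 0 : ovector[2 * number + 1] - start;  /* negative start: treat as unset */
    if (len + 1 > size) return SKETCH_ERROR_NOSPACE;
    if (len > 0) memcpy(buffer, subject + start, (size_t)len);
    buffer[len] = '\0';
    return len;
}

/* Format one line of \C output: the string number, 'C' in place of the colon,
   the text, and its length in parentheses. The exact spacing is assumed. */
static void format_copy_line(char *out, size_t outsize, int number,
                             const char *text, int len)
{
    snprintf(out, outsize, "%2dC %s (%d)", number, text, len);
}
```

For the subject "xxabcyy" with offsets {2, 5, 3, 4}, substring 0 is "abc" (length 3) and substring 1 is "b" (length 1).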
Note that while patterns can be continued over several lines (a plain ">"
prompt is used for continuations), data lines may not. However newlines can be
included in data by means of the \n escape.
@@ -260,10 +294,10 @@ compilation.
If the option -s is given to pcretest, it outputs the size of each compiled
pattern after it has been compiled.
-If the -t option is given, each compile, study, and match is run 10000 times
+If the -t option is given, each compile, study, and match is run 20000 times
while being timed, and the resulting time per compile or match is output in
milliseconds. Do not set -t with -s, because you will then get the size output
-10000 times and the timing will be distorted. If you want to change the number
+20000 times and the timing will be distorted. If you want to change the number
of repetitions used for timing, edit the definition of LOOPREPEAT at the top of
pcretest.c.
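A sketch of how a time per call in milliseconds can be derived from LOOPREPEAT repetitions, using the standard C clock() function. The helpers time_per_call_ms() and count_call() are hypothetical, and this is not necessarily how pcretest.c itself measures time.

```c
#include <time.h>

#define LOOPREPEAT 20000   /* repetition count quoted above for -t */

/* Run op LOOPREPEAT times and return the average CPU time per call in
   milliseconds, as measured by clock(). */
static double time_per_call_ms(void (*op)(void *), void *arg)
{
    clock_t start = clock();
    for (int i = 0; i < LOOPREPEAT; i++)
        op(arg);
    return ((double)(clock() - start) * 1000.0 / CLOCKS_PER_SEC) / LOOPREPEAT;
}

/* Trivial operation used to exercise the timer: counts its calls. */
static void count_call(void *arg)
{
    ++*(long *)arg;
}
```

Dividing by LOOPREPEAT is why any per-iteration output, such as the size printed by -s, distorts the figure: its cost is included in every repetition.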
@@ -291,4 +325,4 @@ contains malformed regular expressions, in order to check that PCRE diagnoses
them correctly.
Philip Hazel <ph10@cam.ac.uk>
-January 1999
+February 1999