author nigel <nigel@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2007-02-24 21:40:03 +0000
committer nigel <nigel@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2007-02-24 21:40:03 +0000
commit c8cb607ab7e12e185e86a8b23d413b7f9536f24c
tree e1c3675d531d498d2a84490908e187a249456d2c /doc/pcretest.txt
parent e27c89c9227398c6feee3ca0748827fd064154cd
Load pcre-4.0 into code/trunk.
git-svn-id: svn://vcs.exim.org/pcre/code/trunk@63 2f5784b3-3f2a-0410-8824-cb99058d5e15
Diffstat (limited to 'doc/pcretest.txt')
-rw-r--r-- doc/pcretest.txt 159
1 files changed, 111 insertions, 48 deletions
diff --git a/doc/pcretest.txt b/doc/pcretest.txt
index 0e13b6c..80585af 100644
--- a/doc/pcretest.txt
+++ b/doc/pcretest.txt
@@ -3,20 +3,26 @@ NAME
expressions.
-
SYNOPSIS
pcretest [-d] [-i] [-m] [-o osize] [-p] [-t] [source] [des-
tination]
pcretest was written as a test program for the PCRE regular
expression library itself, but it can also be used for
- experimenting with regular expressions. This man page
+ experimenting with regular expressions. This document
describes the features of the test program; for details of
- the regular expressions themselves, see the pcre man page.
-
+ the regular expressions themselves, see the pcrepattern
+ documentation. For details of PCRE and its options, see the
+ pcreapi documentation.
OPTIONS
+
+
+ -C Output the version number of the PCRE library, and
+ all available information about the optional
+ features that are included, and then exit.
+
-d Behave as if each regex had the /D modifier (see
below); the internal form is output after compila-
tion.
@@ -42,25 +48,17 @@ OPTIONS
wrapper API is used to call PCRE. None of the
other options has any effect when -p is set.
- -t Run each compile, study, and match 20000 times
- with a timer, and output resulting time per com-
- pile or match (in milliseconds). Do not set -t
- with -m, because you will then get the size output
- 20000 times and the timing will be distorted.
-
+ -t Run each compile, study, and match many times with
+ a timer, and output resulting time per compile or
+ match (in milliseconds). Do not set -t with -m,
+ because you will then get the size output on every
+ iteration and the timing will be distorted.
DESCRIPTION
+
If pcretest is given two filename arguments, it reads from
the first and writes to the second. If it is given only one
-
-
-
-
-SunOS 5.8 Last change: 1
-
-
-
filename argument, it reads from that file and writes to
stdout. Otherwise, it reads from stdin and writes to stdout,
and prompts for each line of input, using "re>" to prompt
@@ -70,10 +68,18 @@ SunOS 5.8 Last change: 1
The program handles any number of sets of input on a single
input file. Each set starts with a regular expression, and
continues with any number of data lines to be matched
- against the pattern. An empty line signals the end of the
- data lines, at which point a new regular expression is read.
- The regular expressions are given enclosed in any non-
- alphameric delimiters other than backslash, for example
+ against the pattern.
+
+ Each line is matched separately and independently. If you
+ want to do multiple-line matches, you have to use the \n
+ escape sequence in a single line of input to encode the
+ newline characters. The maximum length of a data line is
+ 30,000 characters.
+
+ An empty line signals the end of the data lines, at which
+ point a new regular expression is read. The regular expres-
+ sions are given enclosed in any non-alphameric delimiters
+ other than backslash, for example
/(a|bc)x+yz/
@@ -104,8 +110,8 @@ SunOS 5.8 Last change: 1
continuation of the regular expression.
-
PATTERN MODIFIERS
+
The pattern may be followed by i, m, s, or x to set the
PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED
options, respectively. For example:
@@ -165,9 +171,11 @@ PATTERN MODIFIERS
pcre_fullinfo() after compiling an expression, and output-
ting the information it gets back. If the pattern is stu-
died, the results of that are also output.
+
The /D modifier is a PCRE debugging feature, which also
assumes /I. It causes the internal form of compiled regular
- expressions to be output after compilation.
+ expressions to be output after compilation. If the pattern
+ was studied, the information returned is also output.
The /S modifier causes pcre_study() to be called after the
expression has been compiled, and the results used when the
@@ -185,19 +193,49 @@ PATTERN MODIFIERS
REG_NEWLINE is set.
The /8 modifier causes pcretest to call PCRE with the
- PCRE_UTF8 option set. This turns on the (currently incom-
- plete) support for UTF-8 character handling in PCRE, pro-
- vided that it was compiled with this support enabled. This
- modifier also causes any non-printing characters in output
- strings to be printed using the \x{hh...} notation if they
- are valid UTF-8 sequences.
+ PCRE_UTF8 option set. This turns on support for UTF-8 char-
+ acter handling in PCRE, provided that it was compiled with
+ this support enabled. This modifier also causes any non-
+ printing characters in output strings to be printed using
+ the \x{hh...} notation if they are valid UTF-8 sequences.
+
+
+CALLOUTS
+
+ If the pattern contains any callout requests, pcretest's
+ callout function will be called. By default, it displays the
+ callout number, and the start and current positions in the
+ text at the callout time. For example, the output
+
+ --->pqrabcdef
+   0    ^  ^
+
+ indicates that callout number 0 occurred for a match attempt
+ starting at the fourth character of the subject string, when
+ the pointer was at the seventh character. The callout func-
+ tion returns zero (carry on matching) by default.
+
+ Inserting callouts may be helpful when using pcretest to
+ check complicated regular expressions. For further informa-
+ tion about callouts, see the pcrecallout documentation.
+ For testing the PCRE library, additional control of callout
+ behaviour is available via escape sequences in the data, as
+ described in the following section. In particular, it is
+ possible to pass in a number as callout data (the default is
+ zero). If the callout function receives a non-zero number,
+ it returns that value instead of zero.
DATA LINES
+
Before each data line is passed to pcre_exec(), leading and
trailing whitespace is removed, and it is then scanned for \
- escapes. The following are recognized:
+ escapes. Some of these are pretty esoteric features,
+ intended for checking out some of the more complicated
+ features of PCRE. If you are just testing "ordinary" regular
+ expressions, you probably don't need any of these. The fol-
+ lowing escapes are recognized:
\a alarm (= BEL)
\b backspace
@@ -209,25 +247,52 @@ DATA LINES
\v vertical tab
\nnn octal character (up to 3 octal digits)
\xhh hexadecimal character (up to 2 hex digits)
- \x{hh...} hexadecimal UTF-8 character
-
+ \x{hh...} hexadecimal character, any number of digits
+ in UTF-8 mode
\A pass the PCRE_ANCHORED option to pcre_exec()
\B pass the PCRE_NOTBOL option to pcre_exec()
\Cdd call pcre_copy_substring() for substring dd
- after a successful match (any decimal number
- less than 32)
+ after a successful match (any decimal number
+ less than 32)
+ \Cname call pcre_copy_named_substring() for substring
+ "name" after a successful match (name terminated
+ by next non-alphanumeric character)
+ \C+ show the current captured substrings at callout
+ time
+ \C- do not supply a callout function
+ \C!n return 1 instead of 0 when callout number n is
+ reached
+ \C!n!m return 1 instead of 0 when callout number n is
+ reached for the mth time
+ \C*n pass the number n (may be negative) as callout
+ data
\Gdd call pcre_get_substring() for substring dd
-
- after a successful match (any decimal number
- less than 32)
+ after a successful match (any decimal number
+ less than 32)
+ \Gname call pcre_get_named_substring() for substring
+ "name" after a successful match (name termin-
+ ated by next non-alphanumeric character)
\L call pcre_get_substringlist() after a
- successful match
+ successful match
+ \M discover the minimum MATCH_LIMIT setting
\N pass the PCRE_NOTEMPTY option to pcre_exec()
\Odd set the size of the output vector passed to
- pcre_exec() to dd (any number of decimal
- digits)
+ pcre_exec() to dd (any number of decimal
+ digits)
\Z pass the PCRE_NOTEOL option to pcre_exec()
+ If \M is present, pcretest calls pcre_exec() several times,
+ with different values in the match_limit field of the
+ pcre_extra data structure, until it finds the minimum number
+ that is needed for pcre_exec() to complete. This number is a
+ measure of the amount of recursion and backtracking that
+ takes place, and checking it out can be instructive. For
+ most simple matches, the number is quite small, but for pat-
+ terns with very large numbers of matching possibilities, it
+ can become large very quickly with increasing length of sub-
+ ject string.
+
When \O is used, it may be higher or lower than the size set
by the -o option (or defaulted to 45); \O applies only to
the call of pcre_exec() for the line in which it appears.
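
The \M search described above can be illustrated with a small, self-contained sketch. This is not pcretest's code and does not call PCRE: toy_match() is a hypothetical backtracking matcher for the single pattern (a+)+b, its call counter stands in for the match_limit field, and the doubling-then-bisecting search is an assumed strategy (the text only says that pcre_exec() is called several times with different values).

```python
class LimitExceeded(Exception):
    """Raised when the toy matcher uses more calls than the limit allows."""


def toy_match(subject, limit):
    """Hypothetical backtracking matcher for (a+)+b, anchored at offset 0.

    Every call to step() counts against `limit`, standing in for the
    match_limit field of pcre_extra. This is an illustration only.
    """
    calls = 0

    def step(pos):
        nonlocal calls
        calls += 1
        if calls > limit:
            raise LimitExceeded
        # Find how far a run of 'a' extends from pos.
        end = pos
        while end < len(subject) and subject[end] == "a":
            end += 1
        # Try each length for this a+ group, longest first.
        for stop in range(end, pos, -1):
            if stop < len(subject) and subject[stop] == "b":
                return True
            if step(stop):  # try another a+ group after this one
                return True
        return False

    return step(0)


def minimum_limit(subject):
    """Find the smallest limit at which toy_match() runs to completion."""

    def completes(limit):
        try:
            toy_match(subject, limit)
            return True
        except LimitExceeded:
            return False

    high = 1
    while not completes(high):  # double until the matcher completes
        high *= 2
    low = high // 2             # known to be too small (0 never completes)
    while high - low > 1:       # bisect between failing and completing limits
        mid = (low + high) // 2
        if completes(mid):
            high = mid
        else:
            low = mid
    return high


for subject in ("aaab", "aaaa", "aaaaaaaa"):
    print(subject, minimum_limit(subject))
```

In this toy, a subject that matches at once needs a limit of 1, while a failing subject of n "a" characters needs 2 to the power n calls, which mirrors the remark that the number can grow very quickly with the length of the subject string.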
@@ -249,15 +314,15 @@ DATA LINES
bytes, encoded according to the UTF-8 rules.
-
OUTPUT FROM PCRETEST
+
When a match succeeds, pcretest outputs the list of captured
substrings that pcre_exec() returns, starting with number 0
for the string that matched the whole pattern. Here is an
example of an interactive pcretest run.
$ pcretest
- PCRE version 2.06 08-Jun-1999
+ PCRE version 4.00 08-Jan-2003
re> /^abc(\d+)/
data> abc123
@@ -307,13 +372,11 @@ OUTPUT FROM PCRETEST
of the \n escape.
-
AUTHOR
+
Philip Hazel <ph10@cam.ac.uk>
University Computing Service,
- New Museums Site,
Cambridge CB2 3QG, England.
- Phone: +44 1223 334714
- Last updated: 15 August 2001
- Copyright (c) 1997-2001 University of Cambridge.
+Last updated: 03 February 2003
+Copyright (c) 1997-2003 University of Cambridge.
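
The default callout display described in the CALLOUTS section added by this patch can be imitated with a short sketch. format_callout() is a hypothetical helper, not part of pcretest; the column layout (a four-character "--->" prefix with the callout number right-aligned beneath it) is inferred from the single example in the text and is an assumption. Offsets are zero-based, so the fourth and seventh characters are offsets 3 and 6.

```python
def format_callout(number, subject, start, current):
    """Return the two display lines for one callout (assumed layout).

    start and current are zero-based offsets into subject, with
    start <= current. A caret is placed under each of the two positions.
    """
    prefix = "--->"
    first = prefix + subject
    marks = [" "] * (current + 1)
    marks[start] = "^"
    marks[current] = "^"
    # Right-align the callout number in three columns under the prefix.
    second = f"{number:>3}".ljust(len(prefix)) + "".join(marks)
    return first + "\n" + second


print(format_callout(0, "pqrabcdef", 3, 6))
```

The printed result reproduces the example for callout number 0 with a match attempt starting at the fourth character and the pointer at the seventh.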