Diffstat (limited to 'doc/pcretest.1')
-rw-r--r-- doc/pcretest.1 | 152
1 files changed, 113 insertions, 39 deletions
diff --git a/doc/pcretest.1 b/doc/pcretest.1
index b2e2556..76daaf3 100644
--- a/doc/pcretest.1
+++ b/doc/pcretest.1
@@ -6,10 +6,24 @@ pcretest - a program for testing Perl-compatible regular expressions.
\fBpcretest\fR was written as a test program for the PCRE regular expression
library itself, but it can also be used for experimenting with regular
-expressions. This man page describes the features of the test program; for
-details of the regular expressions themselves, see the \fBpcre\fR man page.
+expressions. This document describes the features of the test program; for
+details of the regular expressions themselves, see the
+.\" HREF
+\fBpcrepattern\fR
+.\"
+documentation. For details of PCRE and its options, see the
+.\" HREF
+\fBpcreapi\fR
+.\"
+documentation.
.SH OPTIONS
+.rs
+.sp
+.TP 10
+\fB-C\fR
+Output the version number of the PCRE library, and all available information
+about the optional features that are included, and then exit.
.TP 10
\fB-d\fR
Behave as if each regex had the \fB/D\fR modifier (see below); the internal
@@ -35,14 +49,14 @@ Behave as if each regex has \fB/P\fR modifier; the POSIX wrapper API is used
to call PCRE. None of the other options has any effect when \fB-p\fR is set.
.TP 10
\fB-t\fR
-Run each compile, study, and match 20000 times with a timer, and output
+Run each compile, study, and match many times with a timer, and output
resulting time per compile or match (in milliseconds). Do not set \fB-t\fR with
\fB-m\fR, because you will then get the size output a large number of times and the timing
will be distorted.
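The -t mode can be pictured as a loop that repeats one operation a fixed number of times and divides the elapsed processor time by the count. The sketch below assumes the standard C clock() function is an adequate timer; work() is a hypothetical stand-in for a single compile, study, or match call, and time_per_call_ms() is not pcretest's own code.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Written by work() so the compiler cannot discard the loop. */
static volatile unsigned long sink;

/* Hypothetical stand-in for one compile, study, or match call. */
static void work(void)
{
    unsigned long acc = 0;
    for (unsigned long i = 0; i < 1000; i++)
        acc += i * 7u;
    sink = acc;
}

/* Call fn() count times and return the mean CPU time per call in
   milliseconds, measured with the standard clock() function. */
static double time_per_call_ms(void (*fn)(void), int count)
{
    assert(fn != NULL && count > 0);
    clock_t start = clock();
    for (int i = 0; i < count; i++)
        fn();
    clock_t end = clock();
    return (double)(end - start) * 1000.0 / CLOCKS_PER_SEC / count;
}
```

Any printing done inside the timed loop would be charged to the operation being measured, which is the distortion the text warns about when -t and -m are combined.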
-
.SH DESCRIPTION
-
+.rs
+.sp
If \fBpcretest\fR is given two filename arguments, it reads from the first and
writes to the second. If it is given only one filename argument, it reads from
that file and writes to stdout. Otherwise, it reads from stdin and writes to
@@ -51,10 +65,16 @@ expressions, and "data>" to prompt for data lines.
The program handles any number of sets of input on a single input file. Each
set starts with a regular expression, and continues with any number of data
-lines to be matched against the pattern. An empty line signals the end of the
-data lines, at which point a new regular expression is read. The regular
-expressions are given enclosed in any non-alphameric delimiters other than
-backslash, for example
+lines to be matched against the pattern.
+
+Each line is matched separately and independently. If you want to do
+multiple-line matches, you have to use the \\n escape sequence in a single line
+of input to encode the newline characters. The maximum length of a data line is
+30,000 characters.
+
+An empty line signals the end of the data lines, at which point a new regular
+expression is read. The regular expressions are given enclosed in any
+non-alphameric delimiters other than backslash, for example
/(a|bc)x+yz/
@@ -81,9 +101,9 @@ backslash, because
is interpreted as the first line of a pattern that starts with "abc/", causing
pcretest to read the next line as a continuation of the regular expression.
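A minimal sketch of splitting a pattern line into pattern text and trailing modifiers. It assumes, consistent with the text above, that the first character is the delimiter, that the delimiter may be neither alphanumeric nor a backslash, and that a backslash protects the character after it, which is why "/abc\/" never closes. The function split_pattern() and its result codes are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

enum split_result { SPLIT_OK, SPLIT_BAD_DELIMITER, SPLIT_NEEDS_MORE };

/* Split "/pattern/mods" into pattern text and modifier string. Both
   output buffers must hold at least strlen(line) + 1 bytes. */
static enum split_result split_pattern(const char *line, char *pattern,
                                       char *mods)
{
    assert(line != NULL && pattern != NULL && mods != NULL);
    unsigned char delim = (unsigned char)line[0];
    if (delim == '\0' || isalnum(delim) || delim == '\\')
        return SPLIT_BAD_DELIMITER;

    const char *p = line + 1;
    size_t n = 0;
    while (*p != '\0' && (unsigned char)*p != delim) {
        if (*p == '\\' && p[1] != '\0')
            pattern[n++] = *p++;   /* keep the backslash for the compiler */
        pattern[n++] = *p++;       /* copy the (possibly escaped) character */
    }
    pattern[n] = '\0';
    if (*p == '\0')
        return SPLIT_NEEDS_MORE;   /* no closing delimiter on this line */

    strcpy(mods, p + 1);           /* everything after the closing delimiter */
    return SPLIT_OK;
}
```

A SPLIT_NEEDS_MORE result corresponds to the case where the next input line is read as a continuation of the pattern.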
-
.SH PATTERN MODIFIERS
-
+.rs
+.sp
The pattern may be followed by \fBi\fR, \fBm\fR, \fBs\fR, or \fBx\fR to set the
PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options,
respectively. For example:
@@ -138,7 +158,8 @@ studied, the results of that are also output.
The \fB/D\fR modifier is a PCRE debugging feature, which also assumes \fB/I\fR.
It causes the internal form of compiled regular expressions to be output after
-compilation.
+compilation. If the pattern was studied, the information returned is also
+output.
The \fB/S\fR modifier causes \fBpcre_study()\fR to be called after the
expression has been compiled, and the results used when the expression is
@@ -154,17 +175,47 @@ present, and REG_NEWLINE is set if \fB/m\fR is present. The wrapper functions
force PCRE_DOLLAR_ENDONLY always, and PCRE_DOTALL unless REG_NEWLINE is set.
The \fB/8\fR modifier causes \fBpcretest\fR to call PCRE with the PCRE_UTF8
-option set. This turns on the (currently incomplete) support for UTF-8
-character handling in PCRE, provided that it was compiled with this support
-enabled. This modifier also causes any non-printing characters in output
-strings to be printed using the \\x{hh...} notation if they are valid UTF-8
-sequences.
-
+option set. This turns on support for UTF-8 character handling in PCRE,
+provided that it was compiled with this support enabled. This modifier also
+causes any non-printing characters in output strings to be printed using the
+\\x{hh...} notation if they are valid UTF-8 sequences.
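The single-letter modifiers above amount to OR-ing option bits together before the pattern is compiled. The sketch covers only i, m, s, x and 8. The OPT_* names are hypothetical local stand-ins for PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, PCRE_EXTENDED and PCRE_UTF8, whose real values come from pcre.h. Letters the sketch does not know are reported as an error rather than handled the way pcretest handles them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the PCRE_* option bits defined in pcre.h. */
enum {
    OPT_CASELESS  = 1 << 0,  /* i */
    OPT_MULTILINE = 1 << 1,  /* m */
    OPT_DOTALL    = 1 << 2,  /* s */
    OPT_EXTENDED  = 1 << 3,  /* x */
    OPT_UTF8      = 1 << 4   /* 8 */
};

/* Translate a modifier string such as "im8" into an option mask.
   Returns -1 for a letter this sketch does not handle. */
static int modifiers_to_options(const char *mods)
{
    assert(mods != NULL);
    int options = 0;
    for (; *mods != '\0'; mods++) {
        switch (*mods) {
        case 'i': options |= OPT_CASELESS;  break;
        case 'm': options |= OPT_MULTILINE; break;
        case 's': options |= OPT_DOTALL;    break;
        case 'x': options |= OPT_EXTENDED;  break;
        case '8': options |= OPT_UTF8;      break;
        case ' ': case '\n': break;         /* this sketch ignores whitespace */
        default:  return -1;
        }
    }
    return options;
}
```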
+
+.SH CALLOUTS
+.rs
+.sp
+If the pattern contains any callout requests, \fBpcretest\fR's callout function
+will be called. By default, it displays the callout number, and the start and
+current positions in the text at the callout time. For example, the output
+
+  --->pqrabcdef
+    0    ^  ^
+
+indicates that callout number 0 occurred for a match attempt starting at the
+fourth character of the subject string, when the pointer was at the seventh
+character. The callout function returns zero (carry on matching) by default.
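The display format can be reproduced in a few lines of C. This sketch assumes a fixed six-column prefix ("  --->" on the subject line, a right-aligned number plus a space on the marker line) so that the carets line up under the subject characters. format_callout() and LINE_MAX_LEN are hypothetical names, not pcretest internals.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LINE_MAX_LEN 256   /* hypothetical fixed buffer size */

/* Build the two display lines for one callout. start and current are
   0-based offsets into subject, with start <= current <= strlen(subject). */
static void format_callout(int number, const char *subject, int start,
                           int current, char line1[LINE_MAX_LEN],
                           char line2[LINE_MAX_LEN])
{
    assert(0 <= start && start <= current && current <= (int)strlen(subject));

    /* Line 1: a six-column prefix, then the subject text. */
    snprintf(line1, LINE_MAX_LEN, "  --->%s", subject);

    /* Line 2: the callout number right-aligned in the same six columns,
       so column 6 sits under the first subject character. */
    int pos = snprintf(line2, LINE_MAX_LEN, "%5d ", number);

    for (int i = 0; i < start && pos < LINE_MAX_LEN - 3; i++)
        line2[pos++] = ' ';
    line2[pos++] = '^';                     /* where this match attempt began */

    if (current > start) {
        for (int i = start + 1; i < current && pos < LINE_MAX_LEN - 2; i++)
            line2[pos++] = ' ';
        line2[pos++] = '^';                 /* the current position */
    }
    line2[pos] = '\0';
}
```

Calling format_callout(0, "pqrabcdef", 3, 6, l1, l2) places the first caret under the "a" and the second under the "d", as in the example above. When the two positions coincide, only one caret is printed.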
+
+Inserting callouts may be helpful when using \fBpcretest\fR to check
+complicated regular expressions. For further information about callouts, see
+the
+.\" HREF
+\fBpcrecallout\fR
+.\"
+documentation.
+
+For testing the PCRE library, additional control of callout behaviour is
+available via escape sequences in the data, as described in the following
+section. In particular, it is possible to pass in a number as callout data (the
+default is zero). If the callout function receives a non-zero number, it
+returns that value instead of zero.
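The return-value rules above, together with the \C!n, \C!n!m and \C*n data escapes listed in the following section, can be modelled as a small decision function. This is only a sketch: struct callout_control and callout_return() are hypothetical. Two choices are this sketch's own assumptions: non-zero callout data takes precedence over the \C! counters, and 1 is returned on the m-th and every later occurrence.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-match state set from the data-line escapes. */
struct callout_control {
    int fail_id;     /* \C!n: callout number that should return 1 (-1 = none) */
    int fail_count;  /* \C!n!m: occurrence at which to start returning 1 */
    int seen;        /* times callout fail_id has been reached so far */
    int data;        /* \C*n: number passed as callout data (default 0) */
};

/* Decide what the callout function returns for callout `number`. */
static int callout_return(struct callout_control *c, int number)
{
    assert(c != NULL);
    if (c->data != 0)
        return c->data;                   /* non-zero data is returned as-is */
    if (number == c->fail_id && ++c->seen >= c->fail_count)
        return 1;                         /* requested occurrence reached */
    return 0;                             /* default: carry on matching */
}
```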
.SH DATA LINES
-
+.rs
+.sp
Before each data line is passed to \fBpcre_exec()\fR, leading and trailing
-whitespace is removed, and it is then scanned for \\ escapes. The following are
+whitespace is removed, and it is then scanned for \\ escapes. Some of these are
+pretty esoteric features, intended for checking out some of the more
+complicated features of PCRE. If you are just testing "ordinary" regular
+expressions, you probably don't need any of these. The following escapes are
recognized:
\\a alarm (= BEL)
@@ -177,24 +228,49 @@ recognized:
\\v vertical tab
\\nnn octal character (up to 3 octal digits)
\\xhh hexadecimal character (up to 2 hex digits)
- \\x{hh...} hexadecimal UTF-8 character
-
+ \\x{hh...} hexadecimal character, any number of digits
+ in UTF-8 mode
\\A pass the PCRE_ANCHORED option to \fBpcre_exec()\fR
\\B pass the PCRE_NOTBOL option to \fBpcre_exec()\fR
\\Cdd call pcre_copy_substring() for substring dd
- after a successful match (any decimal number
- less than 32)
+ after a successful match (any decimal number
+ less than 32)
+ \\Cname call pcre_copy_named_substring() for substring
+ "name" after a successful match (name termin-
+ ated by next non-alphanumeric character)
+ \\C+ show the current captured substrings at callout
+ time
+ \\C- do not supply a callout function
+ \\C!n return 1 instead of 0 when callout number n is
+ reached
+ \\C!n!m return 1 instead of 0 when callout number n is
+ reached for the mth time
+ \\C*n pass the number n (may be negative) as callout
+ data
\\Gdd call pcre_get_substring() for substring dd
- after a successful match (any decimal number
- less than 32)
+ after a successful match (any decimal number
+ less than 32)
+ \\Gname call pcre_get_named_substring() for substring
+ "name" after a successful match (name termin-
+ ated by next non-alphanumeric character)
\\L call pcre_get_substringlist() after a
- successful match
+ successful match
+ \\M discover the minimum MATCH_LIMIT setting
\\N pass the PCRE_NOTEMPTY option to \fBpcre_exec()\fR
\\Odd set the size of the output vector passed to
- \fBpcre_exec()\fR to dd (any number of decimal
- digits)
+ \fBpcre_exec()\fR to dd (any number of decimal
+ digits)
\\Z pass the PCRE_NOTEOL option to \fBpcre_exec()\fR
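The trimming and escape scanning described above can be sketched as follows. The sketch handles only a few of the listed escapes (\a, \n, \v, octal \nnn and \xhh) and copies any other escaped character literally. The option-passing and substring escapes such as \A or \C are deliberately left out, and decode_data_line() is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Decode a data line into out, which must hold strlen(in) + 1 bytes.
   Returns the number of bytes written (the result may contain NULs). */
static size_t decode_data_line(const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    /* Remove leading and trailing whitespace. */
    while (isspace((unsigned char)*in)) in++;
    size_t len = strlen(in);
    while (len > 0 && isspace((unsigned char)in[len - 1])) len--;

    const char *p = in, *end = in + len;
    size_t n = 0;
    while (p < end) {
        if (*p != '\\' || p + 1 >= end) { out[n++] = *p++; continue; }
        p++;                                    /* skip the backslash */
        if (*p >= '0' && *p <= '7') {           /* \nnn: up to 3 octal digits */
            int v = 0;
            for (int i = 0; i < 3 && p < end && *p >= '0' && *p <= '7'; i++)
                v = v * 8 + (*p++ - '0');
            out[n++] = (char)(v & 0xFF);        /* values above 255 truncated */
        } else if (*p == 'x') {                 /* \xhh: up to 2 hex digits */
            int v = 0;
            p++;
            for (int i = 0; i < 2 && p < end && isxdigit((unsigned char)*p); i++, p++)
                v = v * 16 + (isdigit((unsigned char)*p)
                                  ? *p - '0'
                                  : tolower((unsigned char)*p) - 'a' + 10);
            out[n++] = (char)v;
        } else {
            switch (*p) {
            case 'a': out[n++] = '\a'; break;   /* alarm (BEL) */
            case 'n': out[n++] = '\n'; break;   /* newline */
            case 'v': out[n++] = '\v'; break;   /* vertical tab */
            default:  out[n++] = *p;   break;   /* not handled: copy literally */
            }
            p++;
        }
    }
    out[n] = '\0';
    return n;
}
```

Because each input character yields at most one output byte, an output buffer the size of the input is always large enough.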
+If \\M is present, \fBpcretest\fR calls \fBpcre_exec()\fR several times, with
+different values in the \fImatch_limit\fR field of the \fBpcre_extra\fR data
+structure, until it finds the minimum number that is needed for
+\fBpcre_exec()\fR to complete. This number is a measure of the amount of
+recursion and backtracking that takes place, and checking it out can be
+instructive. For most simple matches, the number is quite small, but for
+patterns with very large numbers of matching possibilities, it can become large
+very quickly with increasing length of subject string.
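The \M search can be sketched under one assumption: the outcome is monotone in the limit, so if a run completes with some limit, it also completes with any larger one. The function pointer type run_with_limit stands in for a pcre_exec() call made with match_limit set to the probe value, and toy_run() is a hypothetical demonstration predicate. pcretest's own sequence of probes may differ. Here the limit is doubled until a run completes, and a binary search then narrows the gap.

```c
#include <assert.h>
#include <stddef.h>

/* Returns nonzero if the run completes when limited to `limit` steps. */
typedef int (*run_with_limit)(long limit, void *ctx);

/* Smallest limit for which run() completes. Assumes a limit of 0 is never
   enough and that some finite limit is. */
static long find_min_limit(run_with_limit run, void *ctx)
{
    assert(run != NULL);
    long fail = 0;                 /* a limit known to be too small */
    long ok = 1;
    while (!run(ok, ctx)) {        /* double until some limit is enough */
        fail = ok;
        ok *= 2;
    }
    while (ok - fail > 1) {        /* binary search: fail too small, ok enough */
        long mid = fail + (ok - fail) / 2;
        if (run(mid, ctx)) ok = mid; else fail = mid;
    }
    return ok;
}

/* Hypothetical toy run: completes once the limit reaches *(long *)ctx. */
static int toy_run(long limit, void *ctx)
{
    return limit >= *(const long *)ctx;
}
```

With the toy requirement set to 37, the probes are 1, 2, 4, 8, 16, 32, 64 and then 48, 40, 36, 38, 37, so twelve runs are made before 37 is reported.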
+
When \\O is used, it may be higher or lower than the size set by the \fB-O\fR
option (or defaulted to 45); \\O applies only to the call of \fBpcre_exec()\fR
for the line in which it appears.
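According to the pcreapi description of pcre_exec(), the first two-thirds of the output vector hold captured substring offsets as start/end pairs. The final third is used as workspace, and a size that is not a multiple of three is rounded down. Under that rule, the number of substrings that a given \O (or -O) size can report works out to a third of the size. The helper below, whose name is hypothetical, spells out the arithmetic.

```c
#include <assert.h>

/* Number of substrings (including substring 0, the whole match) whose
   offsets fit in an output vector of ovecsize ints. */
static int max_reported_substrings(int ovecsize)
{
    assert(ovecsize >= 0);
    int usable = (ovecsize / 3) * 2;  /* round down to a multiple of 3, keep 2/3 */
    return usable / 2;                /* each substring uses a start/end pair */
}
```

With the default size of 45 it returns 15: the whole match plus up to 14 captured substrings.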
@@ -212,15 +288,15 @@ of the \fB/8\fR modifier on the pattern. It is recognized always. There may be
any number of hexadecimal digits inside the braces. The result is from one to
six bytes, encoded according to the UTF-8 rules.
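The one-to-six-byte result described above follows the original UTF-8 scheme. Under that scheme, a value of up to 31 bits becomes a lead byte plus up to five continuation bytes of the form 10xxxxxx. The sketch below is a minimal encoder under that assumption; utf8_encode() is a hypothetical name, not a PCRE function. Current UTF-8 (RFC 3629) stops at four bytes and U+10FFFF, so the five- and six-byte cases are here only to match the behaviour the text describes.

```c
#include <assert.h>
#include <stddef.h>

/* Encode cp (at most 0x7FFFFFFF) into buf using the original UTF-8 rules,
   which allow one to six bytes. Returns the number of bytes written. */
static size_t utf8_encode(unsigned long cp, unsigned char buf[6])
{
    /* Largest value representable with 1..6 bytes. */
    static const unsigned long limits[6] =
        { 0x7FUL, 0x7FFUL, 0xFFFFUL, 0x1FFFFFUL, 0x3FFFFFFUL, 0x7FFFFFFFUL };
    /* Lead-byte marker bits for 1..6-byte sequences. */
    static const unsigned char lead[6] = { 0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC };

    assert(cp <= 0x7FFFFFFFUL);
    size_t len = 1;
    while (cp > limits[len - 1])
        len++;

    /* Fill continuation bytes from the end, six bits each. */
    for (size_t i = len - 1; i > 0; i--) {
        buf[i] = (unsigned char)(0x80 | (cp & 0x3F));
        cp >>= 6;
    }
    buf[0] = (unsigned char)(lead[len - 1] | cp);  /* remaining high bits */
    return len;
}
```

For example, \x{20ac} in a data line would become the three bytes E2 82 AC under this scheme.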
-
.SH OUTPUT FROM PCRETEST
-
+.rs
+.sp
When a match succeeds, pcretest outputs the list of captured substrings that
\fBpcre_exec()\fR returns, starting with number 0 for the string that matched
the whole pattern. Here is an example of an interactive pcretest run.
$ pcretest
- PCRE version 2.06 08-Jun-1999
+ PCRE version 4.00 08-Jan-2003
re> /^abc(\\d+)/
data> abc123
@@ -265,18 +341,16 @@ Note that while patterns can be continued over several lines (a plain ">"
prompt is used for continuations), data lines may not. However newlines can be
included in data by means of the \\n escape.
-
.SH AUTHOR
+.rs
+.sp
Philip Hazel <ph10@cam.ac.uk>
.br
University Computing Service,
.br
-New Museums Site,
-.br
Cambridge CB2 3QG, England.
-.br
-Phone: +44 1223 334714
-Last updated: 15 August 2001
+.in 0
+Last updated: 03 February 2003
.br
-Copyright (c) 1997-2001 University of Cambridge.
+Copyright (c) 1997-2003 University of Cambridge.