Diffstat (limited to 'doc/pcretest.1')
-rw-r--r-- | doc/pcretest.1 | 118 |
1 files changed, 104 insertions, 14 deletions
diff --git a/doc/pcretest.1 b/doc/pcretest.1
index 0c06cb7..336abcf 100644
--- a/doc/pcretest.1
+++ b/doc/pcretest.1
@@ -4,7 +4,7 @@ pcretest - a program for testing Perl-compatible regular expressions.
 .SH SYNOPSIS
 .rs
 .sp
-.B pcretest "[-C] [-d] [-i] [-m] [-o osize] [-p] [-t] [source]"
+.B pcretest "[-C] [-d] [-dfa] [-i] [-m] [-o osize] [-p] [-t] [source]"
 .ti +5n
 .B "[destination]"
 .P
@@ -31,11 +31,16 @@ Output the version number of the PCRE library, and all available information
 about the optional features that are included, and then exit.
 .TP 10
 \fB-d\fP
-Behave as if each regex had the \fB/D\fP (debug) modifier; the internal
+Behave as if each regex has the \fB/D\fP (debug) modifier; the internal
 form is output after compilation.
 .TP 10
+\fB-dfa\fP
+Behave as if each data line contains the \eD escape sequence; this causes the
+alternative matching function, \fBpcre_dfa_exec()\fP, to be used instead of the
+standard \fBpcre_exec()\fP function (more detail is given below).
+.TP 10
 \fB-i\fP
-Behave as if each regex had the \fB/I\fP modifier; information about the
+Behave as if each regex has the \fB/I\fP modifier; information about the
 compiled pattern is given after compilation.
 .TP 10
 \fB-m\fP
@@ -50,8 +55,9 @@ for 14 capturing subexpressions. The vector size can be changed for individual
 matching calls by including \eO in the data line (see below).
 .TP 10
 \fB-p\fP
-Behave as if each regex has \fB/P\fP modifier; the POSIX wrapper API is used
-to call PCRE. None of the other options has any effect when \fB-p\fP is set.
+Behave as if each regex has the \fB/P\fP modifier; the POSIX wrapper API is
+used to call PCRE. None of the other options has any effect when \fB-p\fP is
+set.
 .TP 10
 \fB-t\fP
 Run each compile, study, and match many times with a timer, and output
@@ -131,6 +137,7 @@ not correspond to anything in Perl:
   \fB/A\fP    PCRE_ANCHORED
   \fB/C\fP    PCRE_AUTO_CALLOUT
   \fB/E\fP    PCRE_DOLLAR_ENDONLY
+  \fB/f\fP    PCRE_FIRSTLINE
   \fB/N\fP    PCRE_NO_AUTO_CAPTURE
   \fB/U\fP    PCRE_UNGREEDY
   \fB/X\fP    PCRE_EXTRA
@@ -257,6 +264,8 @@ recognized:
 .\" JOIN
   \eC*n      pass the number n (may be negative) as callout data; this is
                used as the callout return value
+  \eD        use the \fBpcre_dfa_exec()\fP match function
+  \eF        only shortest match for \fBpcre_dfa_exec()\fP
 .\" JOIN
   \eGdd      call pcre_get_substring() for substring dd
                after a successful match (number less than 32)
@@ -272,7 +281,10 @@ recognized:
 .\" JOIN
   \eOdd      set the size of the output vector passed to
                \fBpcre_exec()\fP to dd (any number of digits)
+.\" JOIN
   \eP        pass the PCRE_PARTIAL option to \fBpcre_exec()\fP
+               or \fBpcre_dfa_exec()\fP
+  \eR        pass the PCRE_DFA_RESTART option to \fBpcre_dfa_exec()\fP
   \eS        output details of memory get/free calls during matching
   \eZ        pass the PCRE_NOTEOL option to \fBpcre_exec()\fP
 .\" JOIN
@@ -308,15 +320,38 @@ any number of hexadecimal digits inside the braces. The result is from one to
 six bytes, encoded according to the UTF-8 rules.
 .
 .
-.SH "OUTPUT FROM PCRETEST"
+.SH "THE ALTERNATIVE MATCHING FUNCTION"
 .rs
 .sp
+By default, \fBpcretest\fP uses the standard PCRE matching function,
+\fBpcre_exec()\fP to match each data line. From release 6.0, PCRE supports an
+alternative matching function, \fBpcre_dfa_test()\fP, which operates in a
+different way, and has some restrictions. The differences between the two
+functions are described in the
+.\" HREF
+\fBpcrematching\fP
+.\"
+documentation.
+.P
+If a data line contains the \eD escape sequence, or if the command line
+contains the \fB-dfa\fP option, the alternative matching function is called.
+This function finds all possible matches at a given point. If, however, the \eF
+escape sequence is present in the data line, it stops after the first match is
+found. This is always the shortest possible match.
+.
+.
+.SH "DEFAULT OUTPUT FROM PCRETEST"
+.rs
+.sp
+This section describes the output when the normal matching function,
+\fBpcre_exec()\fP, is being used.
+.P
 When a match succeeds, pcretest outputs the list of captured substrings that
 \fBpcre_exec()\fP returns, starting with number 0 for the string that matched
 the whole pattern. Otherwise, it outputs "No match" or "Partial match"
 when \fBpcre_exec()\fP returns PCRE_ERROR_NOMATCH or PCRE_ERROR_PARTIAL,
 respectively, and otherwise the PCRE negative error number. Here is an example
-of an interactive pcretest run.
+of an interactive \fBpcretest\fP run.
 .sp
   $ pcretest
   PCRE version 5.00 07-Sep-2004
@@ -365,13 +400,68 @@ prompt is used for continuations), data lines may not. However newlines can be
 included in data by means of the \en escape.
 .
 .
+.SH "OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION"
+.rs
+.sp
+When the alternative matching function, \fBpcre_dfa_exec()\fP, is used (by
+means of the \eD escape sequence or the \fB-dfa\fP command line option), the
+output consists of a list of all the matches that start at the first point in
+the subject where there is at least one match. For example:
+.sp
+    re> /(tang|tangerine|tan)/
+  data> yellow tangerine\eD
+   0: tangerine
+   1: tang
+   2: tan
+.sp
+(Using the normal matching function on this data finds only "tang".) The
+longest matching string is always given first (and numbered zero).
+.P
+If \fB/g\P is present on the pattern, the search for further matches resumes
+at the end of the longest match. For example:
+.sp
+    re> /(tang|tangerine|tan)/g
+  data> yellow tangerine and tangy sultana\eD
+   0: tangerine
+   1: tang
+   2: tan
+   0: tang
+   1: tan
+   0: tan
+.sp
+Since the matching function does not support substring capture, the escape
+sequences that are concerned with captured substrings are not relevant.
+.
+.
+.SH "RESTARTING AFTER A PARTIAL MATCH"
+.rs
+.sp
+When the alternative matching function has given the PCRE_ERROR_PARTIAL return,
+indicating that the subject partially matched the pattern, you can restart the
+match with additional subject data by means of the \eR escape sequence. For
+example:
+.sp
+    re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
+  data> 23ja\eP\eD
+  Partial match: 23ja
+  data> n05\eR\eD
+   0: n05
+.sp
+For further information about partial matching, see the
+.\" HREF
+\fBpcrepartial\fP
+.\"
+documentation.
+.
+.
 .SH CALLOUTS
 .rs
 .sp
 If the pattern contains any callout requests, \fBpcretest\fP's callout function
-is called during matching. By default, it displays the callout number, the
-start and current positions in the text at the callout time, and the next
-pattern item to be tested. For example, the output
+is called during matching. This works with both matching functions. By default,
+the called function displays the callout number, the start and current
+positions in the text at the callout time, and the next pattern item to be
+tested. For example, the output
 .sp
   --->pqrabcdef
     0 ^ ^ \ed
@@ -396,7 +486,7 @@ example:
    0: E*
 .sp
 The callout function in \fBpcretest\fP returns zero (carry on matching) by
-default, but you can use an \eC item in a data line (as described above) to
+default, but you can use a \eC item in a data line (as described above) to
 change this.
 .P
 Inserting callouts can be helpful when using \fBpcretest\fP to check
@@ -471,13 +561,13 @@ result is undefined.
 .SH AUTHOR
 .rs
 .sp
-Philip Hazel <ph10@cam.ac.uk>
+Philip Hazel
 .br
 University Computing Service,
 .br
 Cambridge CB2 3QG, England.
 .P
 .in 0
-Last updated: 10 September 2004
+Last updated: 28 February 2005
 .br
-Copyright (c) 1997-2004 University of Cambridge.
+Copyright (c) 1997-2005 University of Cambridge.
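
Note on the output ordering described in the added "OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION" section: the sketch below is a plain-Python illustration of that documented behaviour, not PCRE code. It assumes the pattern is a simple list of literal alternatives, and the helper names `matches_at_first_point` and `global_matches` are hypothetical. It reports every alternative that matches at the first position where at least one matches, longest first, and for the /g case resumes at the end of the longest match.

```python
def matches_at_first_point(alternatives, subject, start=0):
    """Return (position, matches) for the first position at or after
    `start` where at least one literal alternative matches; matches are
    ordered longest first. Returns None when nothing matches."""
    for pos in range(start, len(subject)):
        found = sorted(
            {alt for alt in alternatives if alt and subject.startswith(alt, pos)},
            key=len,
            reverse=True,
        )
        if found:
            return pos, found
    return None


def global_matches(alternatives, subject):
    """Mimic the documented /g behaviour: after each group of matches,
    resume searching at the end of the longest match in that group."""
    groups = []
    pos = 0
    while True:
        hit = matches_at_first_point(alternatives, subject, pos)
        if hit is None:
            break
        start, found = hit
        groups.append(found)
        pos = start + len(found[0])  # found[0] is the longest match
    return groups


if __name__ == "__main__":
    alts = ["tang", "tangerine", "tan"]
    for group in global_matches(alts, "yellow tangerine and tangy sultana"):
        for number, text in enumerate(group):
            print(f"{number:2d}: {text}")
```

Run on the subject line from the man page example, this prints the same three groups as the documented pcretest output. A real test would pass \eD on the data line (or use -dfa) and let pcre_dfa_exec() do the matching.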