Diffstat (limited to 'doc/pcretest.1')
-rw-r--r-- | doc/pcretest.1 | 63 |
1 files changed, 36 insertions, 27 deletions
diff --git a/doc/pcretest.1 b/doc/pcretest.1
index e7e90e6..8c439c7 100644
--- a/doc/pcretest.1
+++ b/doc/pcretest.1
@@ -165,9 +165,11 @@ effect as they do in Perl. For example:
 .sp
   /caseless/i
 .sp
-The following table shows additional modifiers for setting PCRE options that do
-not correspond to anything in Perl:
+The following table shows additional modifiers for setting PCRE compile-time
+options that do not correspond to anything in Perl:
 .sp
+  \fB/8\fP              PCRE_UTF8
+  \fB/?\fP              PCRE_NO_UTF8_CHECK
   \fB/A\fP              PCRE_ANCHORED
   \fB/C\fP              PCRE_AUTO_CALLOUT
   \fB/E\fP              PCRE_DOLLAR_ENDONLY
@@ -175,6 +177,7 @@ not correspond to anything in Perl:
   \fB/J\fP              PCRE_DUPNAMES
   \fB/N\fP              PCRE_NO_AUTO_CAPTURE
   \fB/U\fP              PCRE_UNGREEDY
+  \fB/W\fP              PCRE_UCP
   \fB/X\fP              PCRE_EXTRA
   \fB/<JS>\fP           PCRE_JAVASCRIPT_COMPAT
   \fB/<cr>\fP           PCRE_NEWLINE_CR
@@ -185,17 +188,20 @@ not correspond to anything in Perl:
   \fB/<bsr_anycrlf>\fP  PCRE_BSR_ANYCRLF
   \fB/<bsr_unicode>\fP  PCRE_BSR_UNICODE
 .sp
-Those specifying line ending sequences are literal strings as shown, but the
-letters can be in either case. This example sets multiline matching with CRLF
-as the line ending sequence:
+The modifiers that are enclosed in angle brackets are literal strings as shown,
+including the angle brackets, but the letters can be in either case. This
+example sets multiline matching with CRLF as the line ending sequence:
 .sp
   /^abc/m<crlf>
 .sp
-Details of the meanings of these PCRE options are given in the
+As well as turning on the PCRE_UTF8 option, the \fB/8\fP modifier also causes
+any non-printing characters in output strings to be printed using the
+\ex{hh...} notation if they are valid UTF-8 sequences. Full details of the PCRE
+options are given in the
 .\" HREF
 \fBpcreapi\fP
 .\"
-documentation.
+documentation.
 .
 .
 .SS "Finding all matches in a string"
@@ -224,16 +230,6 @@ such cases when using the \fB/g\fP modifier or the \fBsplit()\fP function.
 There are yet more modifiers for controlling the way \fBpcretest\fP operates.
 .P
-The \fB/8\fP modifier causes \fBpcretest\fP to call PCRE with the PCRE_UTF8
-option set. This turns on support for UTF-8 character handling in PCRE,
-provided that it was compiled with this support enabled. This modifier also
-causes any non-printing characters in output strings to be printed using the
-\ex{hh...} notation if they are valid UTF-8 sequences.
-.P
-If the \fB/?\fP modifier is used with \fB/8\fP, it causes \fBpcretest\fP to
-call \fBpcre_compile()\fP with the PCRE_NO_UTF8_CHECK option, to suppress the
-checking of the string for UTF-8 validity.
-.P
 The \fB/+\fP modifier requests that as well as outputting the substring that
 matched the entire pattern, pcretest should in addition output the remainder of
 the subject string. This is useful for tests where the subject contains
@@ -286,17 +282,30 @@ pointer; that is, \fB/L\fP applies only to the expression on which it appears.
 The \fB/M\fP modifier causes the size of memory block used to hold the compiled
 pattern to be output.
 .P
-The \fB/P\fP modifier causes \fBpcretest\fP to call PCRE via the POSIX wrapper
-API rather than its native API. When this is done, all other modifiers except
-\fB/i\fP, \fB/m\fP, and \fB/+\fP are ignored. REG_ICASE is set if \fB/i\fP is
-present, and REG_NEWLINE is set if \fB/m\fP is present. The wrapper functions
-force PCRE_DOLLAR_ENDONLY always, and PCRE_DOTALL unless REG_NEWLINE is set.
-.P
 The \fB/S\fP modifier causes \fBpcre_study()\fP to be called after the
 expression has been compiled, and the results used when the expression is
 matched.
 .
 .
+.SS "Using the POSIX wrapper API"
+.rs
+.sp
+The \fB/P\fP modifier causes \fBpcretest\fP to call PCRE via the POSIX wrapper
+API rather than its native API. When \fB/P\fP is set, the following modifiers
+set options for the \fBregcomp()\fP function:
+.sp
+  /i    REG_ICASE
+  /m    REG_NEWLINE
+  /N    REG_NOSUB
+  /s    REG_DOTALL     )
+  /U    REG_UNGREEDY   ) These options are not part of
+  /W    REG_UCP        )   the POSIX standard
+  /8    REG_UTF8       )
+.sp
+The \fB/+\fP modifier works as described above. All other modifiers are
+ignored.
+.
+.
 .SH "DATA LINES"
 .rs
 .sp
@@ -434,9 +443,9 @@ by the \fB-O\fP command line option (or defaulted to 45); \eO applies only to
 the call of \fBpcre_exec()\fP for the line in which it appears.
 .P
 If the \fB/P\fP modifier was present on the pattern, causing the POSIX wrapper
-API to be used, the only option-setting sequences that have any effect are \eB
-and \eZ, causing REG_NOTBOL and REG_NOTEOL, respectively, to be passed to
-\fBregexec()\fP.
+API to be used, the only option-setting sequences that have any effect are \eB,
+\eN, and \eZ, causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively,
+to be passed to \fBregexec()\fP.
 .P
 The use of \ex{hh...} to represent UTF-8 characters is not dependent on the use
 of the \fB/8\fP modifier on the pattern. It is recognized always. There may be
@@ -741,6 +750,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 26 March 2010
+Last updated: 16 May 2010
 Copyright (c) 1997-2010 University of Cambridge.
 .fi
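
Illustrative note, not part of the commit: the new "Using the POSIX wrapper API" section and the revised DATA LINES paragraph in the diff above amount to two small lookup tables. The Python sketch below models them only as a reading aid. The names REGCOMP_OPTIONS, REGEXEC_OPTIONS, regcomp_options and regexec_options are hypothetical and do not exist in pcretest or PCRE, and the sketch assumes modifiers are given as single characters (angle-bracket modifiers such as <crlf> are out of scope).

```python
# Hypothetical helpers that model the tables in the diff above.
# They are not part of pcretest; they only restate the documented mapping.

# Pattern modifiers that set regcomp() options when /P is present.
REGCOMP_OPTIONS = {
    "i": "REG_ICASE",
    "m": "REG_NEWLINE",
    "N": "REG_NOSUB",
    "s": "REG_DOTALL",    # not part of the POSIX standard
    "U": "REG_UNGREEDY",  # not part of the POSIX standard
    "W": "REG_UCP",       # not part of the POSIX standard
    "8": "REG_UTF8",      # not part of the POSIX standard
}

# Data-line escapes (\B, \N, \Z) that set regexec() options when /P is present.
REGEXEC_OPTIONS = {
    "B": "REG_NOTBOL",
    "N": "REG_NOTEMPTY",
    "Z": "REG_NOTEOL",
}


def regcomp_options(modifiers):
    """Return regcomp() option names for a string of single-letter modifiers.

    Letters without a table entry (including P itself and +) add no option,
    mirroring the statement that all other modifiers are ignored.
    """
    return [REGCOMP_OPTIONS[m] for m in modifiers if m in REGCOMP_OPTIONS]


def regexec_options(escapes):
    """Return regexec() option names for a string of data-line escape letters."""
    return [REGEXEC_OPTIONS[e] for e in escapes if e in REGEXEC_OPTIONS]


if __name__ == "__main__":
    print(regcomp_options("Pim8"))  # /P together with /i, /m and /8
    print(regexec_options("BZ"))    # \B and \Z on a data line
```

Keeping the two tables separate reflects the split in the man page: pattern modifiers affect the regcomp() call, while data-line escapes affect the regexec() call, so the letter N means REG_NOSUB in one table and REG_NOTEMPTY in the other.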