summaryrefslogtreecommitdiff
path: root/doc/pcre2grep.1
diff options
context:
space:
mode:
Diffstat (limited to 'doc/pcre2grep.1')
-rw-r--r--doc/pcre2grep.1234
1 files changed, 134 insertions, 100 deletions
diff --git a/doc/pcre2grep.1 b/doc/pcre2grep.1
index 137117a..b03082b 100644
--- a/doc/pcre2grep.1
+++ b/doc/pcre2grep.1
@@ -1,4 +1,4 @@
-.TH PCRE2GREP 1 "25 January 2020" "PCRE2 10.35"
+.TH PCRE2GREP 1 "04 October 2020" "PCRE2 10.36"
.SH NAME
pcre2grep - a grep with Perl-compatible regular expressions.
.SH SYNOPSIS
@@ -79,8 +79,8 @@ matching substrings, or if \fB--only-matching\fP, \fB--file-offsets\fP, or
(either shown literally, or as an offset), scanning resumes immediately
following the match, so that further matches on the same line can be found. If
there are multiple patterns, they are all tried on the remainder of the line,
-but patterns that follow the one that matched are not tried on the earlier part
-of the line.
+but patterns that follow the one that matched are not tried on the earlier
+matched part of the line.
.P
This behaviour means that the order in which multiple patterns are specified
can affect the output when one of the above options is used. This is no longer
@@ -115,11 +115,10 @@ ignored.
.rs
.sp
By default, a file that contains a binary zero byte within the first 1024 bytes
-is identified as a binary file, and is processed specially. (GNU grep
-identifies binary files in this manner.) However, if the newline type is
-specified as NUL, that is, the line terminator is a binary zero, the test for
-a binary file is not applied. See the \fB--binary-files\fP option for a means
-of changing the way binary files are handled.
+is identified as a binary file, and is processed specially. However, if the
+newline type is specified as NUL, that is, the line terminator is a binary
+zero, the test for a binary file is not applied. See the \fB--binary-files\fP
+option for a means of changing the way binary files are handled.
.
.
.SH "BINARY ZEROS IN PATTERNS"
@@ -383,8 +382,8 @@ Ignore upper/lower case distinctions during comparisons.
.TP
\fB--include\fP=\fIpattern\fP
If any \fB--include\fP patterns are specified, the only files that are
-processed are those that match one of the patterns (and do not match an
-\fB--exclude\fP pattern). This option does not affect directories, but it
+processed are those whose names match one of the patterns and do not match an
+\fB--exclude\fP pattern. This option does not affect directories, but it
applies to all files, whether listed on the command line, obtained from
\fB--file-list\fP, or by scanning a directory. The pattern is a PCRE2 regular
expression, and is matched against the final component of the file name, not
@@ -401,8 +400,8 @@ may be given any number of times; all the files are read.
.TP
\fB--include-dir\fP=\fIpattern\fP
If any \fB--include-dir\fP patterns are specified, the only directories that
-are processed are those that match one of the patterns (and do not match an
-\fB--exclude-dir\fP pattern). This applies to all directories, whether listed
+are processed are those whose names match one of the patterns and do not match
+an \fB--exclude-dir\fP pattern. This applies to all directories, whether listed
on the command line, obtained from \fB--file-list\fP, or by scanning a parent
directory. The pattern is a PCRE2 regular expression, and is matched against
the final component of the directory name, not the entire path. The \fB-F\fP,
@@ -423,8 +422,9 @@ a separate line. Searching normally stops as soon as a matching line is found
in a file. However, if the \fB-c\fP (count) option is also used, matching
continues in order to obtain the correct count, and those files that have at
least one match are listed along with their counts. Using this option with
-\fB-c\fP is a way of suppressing the listing of files with no matches. This
-opeion overrides any previous \fB-H\fP, \fB-h\fP, or \fB-L\fP options.
+\fB-c\fP is a way of suppressing the listing of files with no matches that
+occurs with \fB-c\fP on its own. This option overrides any previous \fB-H\fP,
+\fB-h\fP, or \fB-L\fP options.
.TP
\fB--label\fP=\fIname\fP
This option supplies a name to be used for the standard input when file names
@@ -435,8 +435,8 @@ short form for this option.
When this option is given, non-compressed input is read and processed line by
line, and the output is flushed after each write. By default, input is read in
large chunks, unless \fBpcre2grep\fP can determine that it is reading from a
-terminal (which is currently possible only in Unix-like environments or
-Windows). Output to terminal is normally automatically flushed by the operating
+terminal, which is currently possible only in Unix-like environments or
+Windows. Output to terminal is normally automatically flushed by the operating
system. This option can be useful when the input or output is attached to a
pipe and you do not want \fBpcre2grep\fP to buffer up large amounts of data.
However, its use will affect performance, and the \fB-M\fP (multiline) option
@@ -459,6 +459,45 @@ the value in the \fBLC_ALL\fP or \fBLC_CTYPE\fP environment variables. If no
locale is specified, the PCRE2 library's default (usually the "C" locale) is
used. There is no short form for this option.
.TP
+\fB-M\fP, \fB--multiline\fP
+Allow patterns to match more than one line. When this option is set, the PCRE2
+library is called in "multiline" mode. This allows a matched string to extend
+past the end of a line and continue on one or more subsequent lines. Patterns
+used with \fB-M\fP may usefully contain literal newline characters and internal
+occurrences of ^ and $ characters. The output for a successful match may
+consist of more than one line. The first line is the line in which the match
+started, and the last line is the line in which the match ended. If the matched
+string ends with a newline sequence, the output ends at the end of that line.
+If \fB-v\fP is set, none of the lines in a multi-line match are output. Once a
+match has been handled, scanning restarts at the beginning of the line after
+the one in which the match ended.
+.sp
+The newline sequence that separates multiple lines must be matched as part of
+the pattern. For example, to find the phrase "regular expression" in a file
+where "regular" might be at the end of a line and "expression" at the start of
+the next line, you could use this command:
+.sp
+ pcre2grep -M 'regular\es+expression' <file>
+.sp
+The \es escape sequence matches any white space character, including newlines,
+and is followed by + so as to match trailing white space on the first line as
+well as possibly handling a two-character newline sequence.
+.sp
+There is a limit to the number of lines that can be matched, imposed by the way
+that \fBpcre2grep\fP buffers the input file as it scans it. With a sufficiently
+large processing buffer, this should not be a problem, but the \fB-M\fP option
+does not work when input is read line by line (see \fB--line-buffered\fP.)
+.TP
+\fB-m\fP \fInumber\fP, \fB--max-count\fP=\fInumber\fP
+Stop processing after finding \fInumber\fP matching lines, or non-matching
+lines if \fB-v\fP is also set. Any trailing context lines are output after the
+final match. In multiline mode, each multiline match counts as just one line
+for this purpose. If this limit is reached when reading the standard input from
+a regular file, the file is left positioned just after the last matching line.
+If \fB-c\fP is also set, the count that is output is never greater than
+\fInumber\fP. This option has no effect if used with \fB-L\fP, \fB-l\fP, or
+\fB-q\fP, or when just checking for a match in a binary file.
+.TP
\fB--match-limit\fP=\fInumber\fP
Processing some regular expression patterns may take a very long time to search
for all possible matching strings. Others may require a very large amount of
@@ -493,35 +532,6 @@ This limits the expansion of the processing buffer, whose initial size can be
set by \fB--buffer-size\fP. The maximum buffer size is silently forced to be no
smaller than the starting buffer size.
.TP
-\fB-M\fP, \fB--multiline\fP
-Allow patterns to match more than one line. When this option is set, the PCRE2
-library is called in "multiline" mode. This allows a matched string to extend
-past the end of a line and continue on one or more subsequent lines. Patterns
-used with \fB-M\fP may usefully contain literal newline characters and internal
-occurrences of ^ and $ characters. The output for a successful match may
-consist of more than one line. The first line is the line in which the match
-started, and the last line is the line in which the match ended. If the matched
-string ends with a newline sequence, the output ends at the end of that line.
-If \fB-v\fP is set, none of the lines in a multi-line match are output. Once a
-match has been handled, scanning restarts at the beginning of the line after
-the one in which the match ended.
-.sp
-The newline sequence that separates multiple lines must be matched as part of
-the pattern. For example, to find the phrase "regular expression" in a file
-where "regular" might be at the end of a line and "expression" at the start of
-the next line, you could use this command:
-.sp
- pcre2grep -M 'regular\es+expression' <file>
-.sp
-The \es escape sequence matches any white space character, including newlines,
-and is followed by + so as to match trailing white space on the first line as
-well as possibly handling a two-character newline sequence.
-.sp
-There is a limit to the number of lines that can be matched, imposed by the way
-that \fBpcre2grep\fP buffers the input file as it scans it. With a sufficiently
-large processing buffer, this should not be a problem, but the \fB-M\fP option
-does not work when input is read line by line (see \fB--line-buffered\fP.)
-.TP
\fB-N\fP \fInewline-type\fP, \fB--newline\fP=\fInewline-type\fP
Six different conventions for indicating the ends of lines in scanned files are
supported. For example:
@@ -565,27 +575,36 @@ use of JIT at run time. It is provided for testing and working round problems.
It should never be needed in normal use.
.TP
\fB-O\fP \fItext\fP, \fB--output\fP=\fItext\fP
-When there is a match, instead of outputting the whole line that matched,
-output just the given text, followed by an operating-system standard newline.
-The \fB--newline\fP option has no effect on this option, which is mutually
-exclusive with \fB--only-matching\fP, \fB--file-offsets\fP, and
-\fB--line-offsets\fP. Escape sequences starting with a dollar character may be
-used to insert the contents of the matched part of the line and/or captured
-substrings into the text.
-.sp
-$<digits> or ${<digits>} is replaced by the captured
-substring of the given decimal number; zero substitutes the whole match. If
-the number is greater than the number of capturing substrings, or if the
-capture is unset, the replacement is empty.
+When there is a match, instead of outputting the line that matched, output just
+the text specified in this option, followed by an operating-system standard
+newline. In this mode, no context is shown. That is, the \fB-A\fP, \fB-B\fP,
+and \fB-C\fP options are ignored. The \fB--newline\fP option has no effect on
+this option, which is mutually exclusive with \fB--only-matching\fP,
+\fB--file-offsets\fP, and \fB--line-offsets\fP. However, like
+\fB--only-matching\fP, if there is more than one match in a line, each of them
+causes a line of output.
+.sp
+Escape sequences starting with a dollar character may be used to insert the
+contents of the matched part of the line and/or captured substrings into the
+text.
+.sp
+$<digits> or ${<digits>} is replaced by the captured substring of the given
+decimal number; zero substitutes the whole match. If the number is greater than
+the number of capturing substrings, or if the capture is unset, the replacement
+is empty.
.sp
$a is replaced by bell; $b by backspace; $e by escape; $f by form feed; $n by
newline; $r by carriage return; $t by tab; $v by vertical tab.
.sp
-$o<digits> is replaced by the character represented by the given octal
-number; up to three digits are processed.
+$o<digits> or $o{<digits>} is replaced by the character whose code point is the
+given octal number. In the first form, up to three octal digits are processed.
+When more digits are needed in Unicode mode to specify a wide character, the
+second form must be used.
.sp
-$x<digits> is replaced by the character represented by the given hexadecimal
-number; up to two digits are processed.
+$x<digits> or $x{<digits>} is replaced by the character represented by the
+given hexadecimal number. In the first form, up to two hexadecimal digits are
+processed. When more digits are needed in Unicode mode to specify a wide
+character, the second form must be used.
.sp
Any other character is substituted by itself. In particular, $$ is replaced by
a single dollar.
@@ -644,7 +663,8 @@ immediate end-of-file. This option is a shorthand for setting the \fB-d\fP
option to "recurse".
.TP
\fB--recursion-limit\fP=\fInumber\fP
-See \fB--match-limit\fP above.
+This is an obsolete synonym for \fB--depth-limit\fP. See \fB--match-limit\fP
+above for details.
.TP
\fB-s\fP, \fB--no-messages\fP
Suppress error messages about non-existent or unreadable files. Such files are
@@ -665,14 +685,17 @@ total would always be zero.
\fB-u\fP, \fB--utf\fP
Operate in UTF-8 mode. This option is available only if PCRE2 has been compiled
with UTF-8 support. All patterns (including those for any \fB--exclude\fP and
-\fB--include\fP options) and all subject lines that are scanned must be valid
-strings of UTF-8 characters.
+\fB--include\fP options) and all lines that are scanned must be valid strings
+of UTF-8 characters. If an invalid UTF-8 string is encountered, an error
+occurs.
.TP
\fB-U\fP, \fB--utf-allow-invalid\fP
As \fB--utf\fP, but in addition subject lines may contain invalid UTF-8 code
-unit sequences. These can never form part of any pattern match. This facility
-allows valid UTF-8 strings to be sought in executable or other binary files.
-For more details about matching in non-valid UTF-8 strings, see the
+unit sequences. These can never form part of any pattern match. Patterns
+themselves, however, must still be valid UTF-8 strings. This facility allows
+valid UTF-8 strings to be sought within arbitrary byte sequences in executable
+or other binary files. For more details about matching in non-valid UTF-8
+strings, see the
.\" HREF
\fBpcre2unicode\fP(3)
.\"
@@ -685,7 +708,9 @@ ignored.
.TP
\fB-v\fP, \fB--invert-match\fP
Invert the sense of the match, so that lines which do \fInot\fP match any of
-the patterns are the ones that are found.
+the patterns are the ones that are found. When this option is set, options such
+as \fB--only-matching\fP and \fB--output\fP, which specify parts of a match
+that are to be output, are ignored.
.TP
\fB-w\fP, \fB--word-regex\fP, \fB--word-regexp\fP
Force the patterns only to match "words". That is, there must be a word
@@ -812,12 +837,36 @@ documentation for details). Numbered callouts are ignored by \fBpcre2grep\fP;
only callouts with string arguments are useful.
.
.
+.SS "Echoing a specific string"
+.rs
+.sp
+Starting the callout string with a pipe character invokes an echoing facility
+that avoids calling an external program or script. This facility is always
+available, provided that callouts were not completely disabled when
+\fBpcre2grep\fP was built. The rest of the callout string is processed as a
+zero-terminated string, which means it should not contain any internal binary
+zeros. It is written to the output, having first been passed through the same
+escape processing as text from the \fB--output\fP (\fB-O\fP) option (see
+above). However, $0 cannot be used to insert a matched substring because the
+match is still in progress. Instead, the single character '0' is inserted. Any
+syntax errors in the string (for example, a dollar not followed by another
+character) causes the callout to be ignored. No terminator is added to the
+output string, so if you want a newline, you must include it explicitly using
+the escape $n. For example:
+.sp
+ pcre2grep '(.)(..(.))(?C"|[$1] [$2] [$3]$n")' <some file>
+.sp
+Matching continues normally after the string is output. If you want to see only
+the callout output but not any output from an actual match, you should end the
+pattern with (*FAIL).
+.
+.
.SS "Calling external programs or scripts"
.rs
.sp
This facility can be independently disabled when \fBpcre2grep\fP is built. It
is supported for Windows, where a call to \fB_spawnvp()\fP is used, for VMS,
-where \fBlib$spawn()\fP is used, and for any other Unix-like environment where
+where \fBlib$spawn()\fP is used, and for any Unix-like environment where
\fBfork()\fP and \fBexecv()\fP are available.
.P
If the callout string does not start with a pipe (vertical bar) character, it
@@ -828,13 +877,11 @@ arguments:
executable_name|arg1|arg2|...
.sp
Any substring (including the executable name) may contain escape sequences
-started by a dollar character: $<digits> or ${<digits>} is replaced by the
-captured substring of the given decimal number, which must be greater than
-zero. If the number is greater than the number of capturing substrings, or if
-the capture is unset, the replacement is empty.
-.P
-Any other character is substituted by itself. In particular, $$ is replaced by
-a single dollar and $| is replaced by a pipe character. Here is an example:
+started by a dollar character. These are the same as for the \fB--output\fP
+(\fB-O\fP) option documented above, except that $0 cannot insert the matched
+string because the match is still in progress. Instead, the character '0'
+is inserted. If you need a literal dollar or pipe character in any
+substring, use $$ or $| respectively. Here is an example:
.sp
echo -e "abcde\en12345" | pcre2grep \e
'(?x)(.)(..(.))
@@ -847,28 +894,14 @@ a single dollar and $| is replaced by a pipe character. Here is an example:
Arg1: [1] [234] [4] Arg2: |1| ()
12345
.sp
-The parameters for the system call that is used to run the
-program or script are zero-terminated strings. This means that binary zero
-characters in the callout argument will cause premature termination of their
-substrings, and therefore should not be present. Any syntax errors in the
-string (for example, a dollar not followed by another character) cause the
-callout to be ignored. If running the program fails for any reason (including
-the non-existence of the executable), a local matching failure occurs and the
-matcher backtracks in the normal way.
-.
-.
-.SS "Echoing a specific string"
-.rs
-.sp
-This facility is always available, provided that callouts were not completely
-disabled when \fBpcre2grep\fP was built. If the callout string starts with a
-pipe (vertical bar) character, the rest of the string is written to the output,
-having been passed through the same escape processing as text from the --output
-option. This provides a simple echoing facility that avoids calling an external
-program or script. No terminator is added to the string, so if you want a
-newline, you must include it explicitly. Matching continues normally after the
-string is output. If you want to see only the callout output but not any output
-from an actual match, you should end the relevant pattern with (*FAIL).
+The parameters for the system call that is used to run the program or script
+are zero-terminated strings. This means that binary zero characters in the
+callout argument will cause premature termination of their substrings, and
+therefore should not be present. Any syntax errors in the string (for example,
+a dollar not followed by another character) causes the callout to be ignored.
+If running the program fails for any reason (including the non-existence of the
+executable), a local matching failure occurs and the matcher backtracks in the
+normal way.
.
.
.SH "MATCHING ERRORS"
@@ -904,7 +937,8 @@ because VMS does not distinguish between exit(0) and exit(1).
.SH "SEE ALSO"
.rs
.sp
-\fBpcre2pattern\fP(3), \fBpcre2syntax\fP(3), \fBpcre2callout\fP(3).
+\fBpcre2pattern\fP(3), \fBpcre2syntax\fP(3), \fBpcre2callout\fP(3),
+\fBpcre2unicode\fP(3).
.
.
.SH AUTHOR
@@ -921,6 +955,6 @@ Cambridge, England.
.rs
.sp
.nf
-Last updated: 25 January 2020
+Last updated: 04 October 2020
Copyright (c) 1997-2020 University of Cambridge.
.fi