Diffstat (limited to 'doc/pcregrep.1')
-rw-r--r-- doc/pcregrep.1 312
1 files changed, 216 insertions, 96 deletions
diff --git a/doc/pcregrep.1 b/doc/pcregrep.1
index f1244e4..1dfe310 100644
--- a/doc/pcregrep.1
+++ b/doc/pcregrep.1
@@ -2,7 +2,7 @@
.SH NAME
pcregrep - a grep with Perl-compatible regular expressions.
.SH SYNOPSIS
-.B pcregrep [options] [long options] [pattern] [file1 file2 ...]
+.B pcregrep [options] [long options] [pattern] [path1 path2 ...]
.
.SH DESCRIPTION
.rs
@@ -16,8 +16,22 @@ patterns that are compatible with the regular expressions of Perl 5. See
for a full description of syntax and semantics of the regular expressions that
PCRE supports.
.P
-A pattern must be specified on the command line unless the \fB-f\fP option is
-used (see below).
+Patterns, whether supplied on the command line or in a separate file, are given
+without delimiters. For example:
+.sp
+ pcregrep Thursday /etc/motd
+.sp
+If you attempt to use delimiters (for example, by surrounding a pattern with
+slashes, as is common in Perl scripts), they are interpreted as part of the
+pattern. Quotes can of course be used on the command line because they are
+interpreted by the shell, and indeed they are required if a pattern contains
+white space or shell metacharacters.
+.P
+The first argument that follows any option settings is treated as the single
+pattern to be matched when neither \fB-e\fP nor \fB-f\fP is present.
+Conversely, when one or both of these options are used to specify patterns, all
+arguments are treated as path names. At least one of \fB-e\fP, \fB-f\fP, or an
+argument pattern must be provided.
.P
If no files are specified, \fBpcregrep\fP reads the standard input. The
standard input can also be referenced by a name consisting of a single hyphen.
@@ -26,45 +40,97 @@ For example:
pcregrep some-pattern /file1 - /file3
.sp
By default, each line that matches the pattern is copied to the standard
-output, and if there is more than one file, the file name is printed before
-each line of output. However, there are options that can change how
+output, and if there is more than one file, the file name is output at the
+start of each line. However, there are options that can change how
\fBpcregrep\fP behaves. In particular, the \fB-M\fP option makes it possible to
search for patterns that span line boundaries.
.P
Patterns are limited to 8K or BUFSIZ characters, whichever is the greater.
BUFSIZ is defined in \fB<stdio.h>\fP.
+.P
+If the \fBLC_ALL\fP or \fBLC_CTYPE\fP environment variable is set,
+\fBpcregrep\fP uses the value to set a locale when calling the PCRE library.
+The \fB--locale\fP option can be used to override this.
.
.SH OPTIONS
.rs
.TP 10
\fB--\fP
This terminate the list of options. It is useful if the next item on the
-command line starts with a hyphen, but is not an option.
+command line starts with a hyphen but is not an option. This allows for the
+processing of patterns and filenames that start with hyphens.
.TP
-\fB-A\fP \fInumber\fP
-Print \fInumber\fP lines of context after each matching line. If file names
-and/or line numbers are being printed, a hyphen separator is used instead of a
-colon for the context lines. A line containing "--" is printed between each
+\fB-A\fP \fInumber\fP, \fB--after-context=\fP\fInumber\fP
+Output \fInumber\fP lines of context after each matching line. If filenames
+and/or line numbers are being output, a hyphen separator is used instead of a
+colon for the context lines. A line containing "--" is output between each
group of lines, unless they are in fact contiguous in the input file. The value
of \fInumber\fP is expected to be relatively small. However, \fBpcregrep\fP
-guarantees to have up to 8K of following text available for context printing.
+guarantees to have up to 8K of following text available for context output.
.TP
-\fB-B\fP \fInumber\fP
-Print \fInumber\fP lines of context before each matching line. If file names
-and/or line numbers are being printed, a hyphen separator is used instead of a
-colon for the context lines. A line containing "--" is printed between each
+\fB-B\fP \fInumber\fP, \fB--before-context=\fP\fInumber\fP
+Output \fInumber\fP lines of context before each matching line. If filenames
+and/or line numbers are being output, a hyphen separator is used instead of a
+colon for the context lines. A line containing "--" is output between each
group of lines, unless they are in fact contiguous in the input file. The value
of \fInumber\fP is expected to be relatively small. However, \fBpcregrep\fP
-guarantees to have up to 8K of preceding text available for context printing.
+guarantees to have up to 8K of preceding text available for context output.
.TP
-\fB-C\fP \fInumber\fP
-Print \fInumber\fP lines of context both before and after each matching line.
+\fB-C\fP \fInumber\fP, \fB--context=\fP\fInumber\fP
+Output \fInumber\fP lines of context both before and after each matching line.
This is equivalent to setting both \fB-A\fP and \fB-B\fP to the same value.
.TP
-\fB-c\fP
-Do not print individual lines; instead just print a count of the number of
-lines that would otherwise have been printed. If several files are given, a
-count is printed for each of them.
+\fB-c\fP, \fB--count\fP
+Do not output individual lines; instead just output a count of the number of
+lines that would otherwise have been output. If several files are given, a
+count is output for each of them. In this mode, the \fB-A\fP, \fB-B\fP, and
+\fB-C\fP options are ignored.
+.TP
+\fB--colour\fP, \fB--color\fP
+If this option is given without any data, it is equivalent to "--colour=auto".
+If data is required, it must be given in the same shell item, separated by an
+equals sign.
+.TP
+\fB--colour=\fP\fIvalue\fP, \fB--color=\fP\fIvalue\fP
+This option specifies under what circumstances the part of a line that matched
+a pattern should be coloured in the output. The value may be "never" (the
+default), "always", or "auto". In the last case, colouring happens only if
+the standard output is connected to a terminal. The colour can be specified by
+setting the environment variable PCREGREP_COLOUR or PCREGREP_COLOR. The value
+of this variable should be a string of two numbers, separated by a semicolon.
+They are copied directly into the control string for setting colour on a
+terminal, so it is your responsibility to ensure that they make sense. If
+neither of the environment variables is set, the default is "1;31", which gives
+red.
+.TP
+\fB-D\fP \fIaction\fP, \fB--devices=\fP\fIaction\fP
+If an input path is not a regular file or a directory, "action" specifies how
+it is to be processed. Valid values are "read" (the default) or "skip"
+(silently skip the path).
+.TP
+\fB-d\fP \fIaction\fP, \fB--directories=\fP\fIaction\fP
+If an input path is a directory, "action" specifies how it is to be processed.
+Valid values are "read" (the default), "recurse" (equivalent to the \fB-r\fP
+option), or "skip" (silently skip the path). In the default case, directories
+are read as if they were ordinary files. In some operating systems the effect
+of reading a directory like this is an immediate end-of-file.
+.TP
+\fB-e\fP \fIpattern\fP, \fB--regex=\fP\fIpattern\fP, \fB--regexp=\fP\fIpattern\fP
+Specify a pattern to be matched. This option can
+be used multiple times in order to specify several patterns. It can also be
+used as a way of specifying a single pattern that starts with a hyphen. When
+\fB-e\fP is used, no argument pattern is taken from the command line; all
+arguments are treated as file names. There is an overall maximum of 100
+patterns. They are applied to each line in the order in which they are defined
+until one matches (or fails to match if \fB-v\fP is used). If \fB-f\fP is used
+with \fB-e\fP, the command line patterns are matched first, followed by the
+patterns from the file, independent of the order in which these options are
+specified. Note that multiple use of \fB-e\fP is not the same as a single
+pattern with alternatives. For example, X|Y finds the first character in a line
+that is X or Y, whereas if the two patterns are given separately,
+\fBpcregrep\fP finds X if it is present, even if it follows Y in the line. It
+finds Y only if there is no X in the line. This really matters only if you are
+using \fB-o\fP to show the portion of the line that matched.
.TP
\fB--exclude\fP=\fIpattern\fP
When \fBpcregrep\fP is searching the files in a directory as a consequence of
@@ -73,43 +139,74 @@ are excluded. The pattern is a PCRE regular expression. If a file name matches
both \fB--include\fP and \fB--exclude\fP, it is excluded. There is no short
form for this option.
.TP
-\fB-f\fP\fIfilename\fP
-Read a number of patterns from the file, one per line, and match all of them
-against each line of input. A line is output if any of the patterns match it.
-When \fB-f\fP is used, no pattern is taken from the command line; all arguments
-are treated as file names. There is a maximum of 100 patterns. Trailing white
-space is removed, and blank lines are ignored. An empty file contains no
-patterns and therefore matches nothing.
+\fB-F\fP, \fB--fixed-strings\fP
+Interpret each pattern as a list of fixed strings, separated by newlines,
+instead of as a regular expression. The \fB-w\fP (match as a word) and \fB-x\fP
+(match whole line) options can be used with \fB-F\fP. They apply to each of the
+fixed strings. A line is selected if any of the fixed strings are found in it
+(subject to \fB-w\fP or \fB-x\fP, if present).
+.TP
+\fB-f\fP \fIfilename\fP, \fB--file=\fP\fIfilename\fP
+Read a number of patterns from the file, one per line, and match them against
+each line of input. A data line is output if any of the patterns match it. The
+filename can be given as "-" to refer to the standard input. When \fB-f\fP is
+used, patterns specified on the command line using \fB-e\fP may also be
+present; they are tested before the file's patterns. However, no other pattern
+is taken from the command line; all arguments are treated as file names. There
+is an overall maximum of 100 patterns. Trailing white space is removed from
+each line, and blank lines are ignored. An empty file contains no patterns and
+therefore matches nothing.
.TP
-\fB-h\fP
-Suppress printing of filenames when searching multiple files.
+\fB-H\fP, \fB--with-filename\fP
+Force the inclusion of the filename at the start of output lines when searching
+a single file. By default, the filename is not shown in this case. For matching
+lines, the filename is followed by a colon and a space; for context lines, a
+hyphen separator is used. If a line number is also being output, it follows the
+file name without a space.
.TP
-\fB-i\fP
+\fB-h\fP, \fB--no-filename\fP
+Suppress the output of filenames when searching multiple files. By default,
+filenames are shown when multiple files are searched. For matching lines, the
+filename is followed by a colon and a space; for context lines, a hyphen
+separator is used. If a line number is also being output, it follows the file
+name without a space.
+.TP
+\fB--help\fP
+Output a brief help message and exit.
+.TP
+\fB-i\fP, \fB--ignore-case\fP
Ignore upper/lower case distinctions during comparisons.
.TP
\fB--include\fP=\fIpattern\fP
When \fBpcregrep\fP is searching the files in a directory as a consequence of
-the \fB-r\fP (recursive search) option, only files whose names match the
+the \fB-r\fP (recursive search) option, only those files whose names match the
pattern are included. The pattern is a PCRE regular expression. If a file name
matches both \fB--include\fP and \fB--exclude\fP, it is excluded. There is no
short form for this option.
.TP
-\fB-L\fP
-Instead of printing lines from the files, just print the names of the files
-that do not contain any lines that would have been printed. Each file name is
-printed once, on a separate line.
+\fB-L\fP, \fB--files-without-match\fP
+Instead of outputting lines from the files, just output the names of the files
+that do not contain any lines that would have been output. Each file name is
+output once, on a separate line.
.TP
-\fB-l\fP
-Instead of printing lines from the files, just print the names of the files
-containing lines that would have been printed. Each file name is printed
-once, on a separate line.
+\fB-l\fP, \fB--files-with-matches\fP
+Instead of outputting lines from the files, just output the names of the files
+containing lines that would have been output. Each file name is output
+once, on a separate line. Searching stops as soon as a matching line is found
+in a file.
.TP
\fB--label\fP=\fIname\fP
This option supplies a name to be used for the standard input when file names
-are being printed. If not supplied, "(standard input)" is used. There is no
+are being output. If not supplied, "(standard input)" is used. There is no
short form for this option.
.TP
-\fB-M\fP
+\fB--locale\fP=\fIlocale-name\fP
+This option specifies a locale to be used for pattern matching. It overrides
+the value in the \fBLC_ALL\fP or \fBLC_CTYPE\fP environment variables. If no
+locale is specified, the PCRE library's default (usually the "C" locale) is
+used. There is no short form for this option.
+.TP
+\fB-M\fP, \fB--multiline\fP
Allow patterns to match more than one line. When this option is given, patterns
may usefully contain literal newline characters and internal occurrences of ^
and $ characters. The output for any one match may consist of more than one
@@ -121,74 +218,74 @@ that \fBpcregrep\fP buffers the input file as it scans it. However,
the previous 8K characters (or all the previous characters, if fewer than 8K)
are guaranteed to be available for lookbehind assertions.
.TP
-\fB-n\fP
-Precede each line by its line number in the file.
+\fB-n\fP, \fB--line-number\fP
+Precede each output line by its line number in the file, followed by a colon
+and a space for matching lines or a hyphen and a space for context lines. If
+the filename is also being output, it precedes the line number.
.TP
-\fB-q\fP
-Work quietly, that is, display nothing except error messages.
-The exit status indicates whether or not any matches were found.
+\fB-o\fP, \fB--only-matching\fP
+Show only the part of the line that matched a pattern. In this mode, no
+context is shown. That is, the \fB-A\fP, \fB-B\fP, and \fB-C\fP options are
+ignored.
.TP
-\fB-r\fP
+\fB-q\fP, \fB--quiet\fP
+Work quietly, that is, display nothing except error messages. The exit
+status indicates whether or not any matches were found.
+.TP
+\fB-r\fP, \fB--recursive\fP
If any given path is a directory, recursively scan the files it contains,
-taking note of any \fB--include\fP and \fB--exclude\fP settings. Without
-\fB-r\fP a directory is scanned as a normal file.
+taking note of any \fB--include\fP and \fB--exclude\fP settings. By default, a
+directory is read as a normal file; in some operating systems this gives an
+immediate end-of-file. This option is a shorthand for setting the \fB-d\fP
+option to "recurse".
.TP
-\fB-s\fP
+\fB-s\fP, \fB--no-messages\fP
Suppress error messages about non-existent or unreadable files. Such files are
quietly skipped. However, the return code is still 2, even if matches were
found in other files.
.TP
-\fB-u\fP
+\fB-u\fP, \fB--utf-8\fP
Operate in UTF-8 mode. This option is available only if PCRE has been compiled
-with UTF-8 support. Both the pattern and each subject line must be valid
-strings of UTF-8 characters.
+with UTF-8 support. Both patterns and subject lines must be valid strings of
+UTF-8 characters.
.TP
-\fB-V\fP
+\fB-V\fP, \fB--version\fP
Write the version numbers of \fBpcregrep\fP and the PCRE library that is being
used to the standard error stream.
.TP
-\fB-v\fP
-Invert the sense of the match, so that lines which do \fInot\fP match the
-pattern are the ones that are found.
+\fB-v\fP, \fB--invert-match\fP
+Invert the sense of the match, so that lines which do \fInot\fP match any of
+the patterns are the ones that are found.
.TP
-\fB-w\fP
-Force the pattern to match only whole words. This is equivalent to having \eb
+\fB-w\fP, \fB--word-regex\fP, \fB--word-regexp\fP
+Force the patterns to match only whole words. This is equivalent to having \eb
at the start and end of the pattern.
.TP
-\fB-x\fP
-Force the pattern to be anchored (it must start matching at the beginning of
-the line) and in addition, require it to match the entire line. This is
+\fB-x\fP, \fB--line-regex\fP, \fB--line-regexp\fP
+Force the patterns to be anchored (each must start matching at the beginning of
+a line) and in addition, require them to match entire lines. This is
equivalent to having ^ and $ characters at the start and end of each
-alternative branch in the regular expression.
+alternative branch in every pattern.
+.
.
-.SH "LONG OPTIONS"
+.SH "ENVIRONMENT VARIABLES"
.rs
.sp
-Long forms of all the options are available, as in GNU grep. They are shown in
-the following table:
+The environment variables \fBLC_ALL\fP and \fBLC_CTYPE\fP are examined, in that
+order, for a locale. The first one that is set is used. This can be overridden
+by the \fB--locale\fP option. If no locale is set, the PCRE library's default
+(usually the "C" locale) is used.
+.
+.
+.SH "OPTIONS COMPATIBILITY"
+.rs
.sp
- -A --after-context
- -B --before-context
- -C --context
- -c --count
- --exclude (no short form)
- -f --file
- -h --no-filename
- --help (no short form)
- -i --ignore-case
- --include (no short form)
- -L --files-without-match
- -l --files-with-matches
- --label (no short form)
- -n --line-number
- -r --recursive
- -q --quiet
- -s --no-messages
- -u --utf-8
- -V --version
- -v --invert-match
- -x --line-regex
- -x --line-regexp
+The majority of short and long forms of \fBpcregrep\fP's options are the same
+as in the GNU \fBgrep\fP program. Any long option of the form
+\fB--xxx-regexp\fP (GNU terminology) is also available as \fB--xxx-regex\fP
+(PCRE terminology). However, the \fB--locale\fP, \fB-M\fP, \fB--multiline\fP,
+\fB-u\fP, and \fB--utf-8\fP options are specific to \fBpcregrep\fP.
+.
.
.SH "OPTIONS WITH DATA"
.rs
@@ -201,20 +298,43 @@ command line item. For example:
-f /some/file
.sp
If a long form option is used, the data may appear in the same command line
-item, separated by an = character, or it may appear in the next command line
-item. For example:
+item, separated by an equals character, or (with one exception) it may appear
+in the next command line item. For example:
.sp
--file=/some/file
--file /some/file
.sp
+Note, however, that if you want to supply a file name beginning with ~ as data
+in a shell command, and have the shell expand ~ to a home directory, you must
+separate the file name from the option, because the shell does not treat ~
+specially unless it is at the start of an item.
+.P
+The exception to the above is the \fB--colour\fP (or \fB--color\fP) option,
+for which the data is optional. If this option does have data, it must be given
+in the first form, using an equals character. Otherwise it will be assumed that
+it has no data.
+.
+.
+.SH MATCHING ERRORS
+.rs
+.sp
+It is possible to supply a regular expression that takes a very long time to
+fail to match certain lines. Such patterns normally involve nested indefinite
+repeats, for example: (a+)*\ed when matched against a line of a's with no final
+digit. The PCRE matching function has a resource limit that causes it to abort
+in these circumstances. If this happens, \fBpcregrep\fP outputs an error
+message and the line that caused the problem to the standard error stream. If
+there are more than 20 such errors, \fBpcregrep\fP gives up.
+.
.
.SH DIAGNOSTICS
.rs
.sp
Exit status is 0 if any matches were found, 1 if no matches were found, and 2
for syntax errors and non-existent or inacessible files (even if matches were
-found in other files). Using the \fB-s\fP option to suppress error messages
-about inaccessble files does not affect the return code.
+found in other files) or too many matching errors. Using the \fB-s\fP option to
+suppress error messages about inaccessible files does not affect the return
+code.
.
.
.SH AUTHOR
@@ -227,6 +347,6 @@ University Computing Service
Cambridge CB2 3QG, England.
.P
.in 0
-Last updated: 16 May 2005
+Last updated: 23 January 2006
.br
-Copyright (c) 1997-2005 University of Cambridge.
+Copyright (c) 1997-2006 University of Cambridge.
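
Note on the -e text added in this change: the documented difference between several -e patterns and a single pattern with alternatives can be sketched as follows. This is only an illustration using Python's re module, not PCRE or pcregrep itself, and the two helper names are hypothetical. It assumes that each pattern reports its leftmost match, as described for -o.

```python
import re


def first_match_alternation(line, alternatives):
    """Mimic one pattern such as X|Y: the leftmost match of any branch wins."""
    m = re.search("|".join(alternatives), line)
    return m.group(0) if m else None


def first_match_separate(line, patterns):
    """Mimic several -e patterns: try each in order; the first that matches wins."""
    for pattern in patterns:
        m = re.search(pattern, line)
        if m:
            return m.group(0)
    return None


line = "a Y then an X"
print(first_match_alternation(line, ["X", "Y"]))  # Y (leftmost X-or-Y in the line)
print(first_match_separate(line, ["X", "Y"]))     # X (first pattern that matches)
```

With separate patterns, Y is reported only when X does not occur anywhere in the line, which matches the wording of the new -e paragraph. The difference is visible only when the matched portion is shown, as with -o.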