From b3f42a32920b20ae71988bc1d06a7148e0211925 Mon Sep 17 00:00:00 2001
From: ph10
Date: Sat, 25 Jan 2020 15:50:44 +0000
Subject: Ensure a newline after the final line in a file is output by pcre2grep.

git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@1211 6239d852-aaf2-0410-a92c-79f79f948069
---
 doc/html/pcre2grep.html | 84 ++++++++++++++++++++++++++++++-------------------
 1 file changed, 52 insertions(+), 32 deletions(-)
 (limited to 'doc/html')

diff --git a/doc/html/pcre2grep.html b/doc/html/pcre2grep.html
index f5b72f3..abbafa1 100644
--- a/doc/html/pcre2grep.html
+++ b/doc/html/pcre2grep.html
@@ -148,7 +148,7 @@ ignored.
 By default, a file that contains a binary zero byte within the first 1024 bytes is identified as a binary file, and is processed specially. (GNU grep identifies binary files in this manner.) However, if the newline type is
-specified as "nul", that is, the line terminator is a binary zero, the test for
+specified as NUL, that is, the line terminator is a binary zero, the test for
 a binary file is not applied. See the --binary-files option for a means of changing the way binary files are handled.
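
The binary-file test described in the hunk above can be sketched in a few lines. This is a minimal Python illustration only, not pcre2grep's C implementation; the function name `looks_binary` and its parameters are hypothetical.

```python
def looks_binary(data: bytes, newline_type: str = "LF", probe: int = 1024) -> bool:
    """Return True if `data` would be treated as binary under the rule above.

    Assumption: `data` holds the start of the file and `newline_type` is the
    value given to -N / --newline (compared case-insensitively).
    """
    # When the line terminator itself is a binary zero, the test is not applied.
    if newline_type.upper() == "NUL":
        return False
    # Otherwise a zero byte within the first `probe` bytes marks the file as binary.
    return b"\x00" in data[:probe]
```

A zero byte that first appears after the probe window does not trigger the test in this sketch, which matches the "within the first 1024 bytes" wording.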

@@ -601,25 +601,32 @@ does not work when input is read line by line (see \fP--line-buffered\fP.)

 -N newline-type, --newline=newline-type
-The PCRE2 library supports five different conventions for indicating
-the ends of lines. They are the single-character sequences CR (carriage return)
-and LF (linefeed), the two-character sequence CRLF, an "anycrlf" convention,
-which recognizes any of the preceding three types, and an "any" convention, in
-which any Unicode line ending sequence is assumed to end a line. The Unicode
-sequences are the three just mentioned, plus VT (vertical tab, U+000B), FF
-(form feed, U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and
-PS (paragraph separator, U+2029).
+Six different conventions for indicating the ends of lines in scanned files are
+supported. For example:
+

+  pcre2grep -N CRLF 'some pattern' <file>
+
+The newline type may be specified in upper, lower, or mixed case. If the
+newline type is NUL, lines are separated by binary zero characters. The other
+types are the single-character sequences CR (carriage return) and LF
+(linefeed), the two-character sequence CRLF, an "anycrlf" type, which
+recognizes any of the preceding three types, and an "any" type, for which any
+Unicode line ending sequence is assumed to end a line. The Unicode sequences
+are the three just mentioned, plus VT (vertical tab, U+000B), FF (form feed,
+U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS
+(paragraph separator, U+2029).

 When the PCRE2 library is built, a default line-ending sequence is specified. This is normally the standard sequence for the operating system. Unless otherwise specified by this option, pcre2grep uses the library's default.
-The possible values for this option are CR, LF, CRLF, ANYCRLF, or ANY. This
-makes it possible to use pcre2grep to scan files that have come from
-other environments without having to modify their line endings. If the data
-that is being scanned does not agree with the convention set by this option,
-pcre2grep may behave in strange ways. Note that this option does not
-apply to files specified by the -f, --exclude-from, or
+
+
+This option makes it possible to use pcre2grep to scan files that have
+come from other environments without having to modify their line endings. If
+the data that is being scanned does not agree with the convention set by this
+option, pcre2grep may behave in strange ways. Note that this option does
+not apply to files specified by the -f, --exclude-from, or
 --include-from options, which are expected to use the operating system's standard newline sequence.
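
The six newline types listed in the hunk above can be illustrated by a small line splitter. This is a sketch under stated assumptions, not how pcre2grep scans files: it works on already-decoded text, and the names `NEWLINE_PATTERNS` and `split_lines` are hypothetical.

```python
import re

# One regular expression per newline type named in the text above.
NEWLINE_PATTERNS = {
    "CR": r"\r",
    "LF": r"\n",
    "CRLF": r"\r\n",
    "NUL": r"\x00",
    "ANYCRLF": r"\r\n|\r|\n",
    # CR, LF, CRLF plus VT, FF, NEL, LS and PS.
    "ANY": r"\r\n|[\n\r\x0b\x0c\x85\u2028\u2029]",
}


def split_lines(text: str, newline_type: str = "LF") -> list[str]:
    """Split `text` into lines under the given newline type.

    The type is matched case-insensitively, as the option text describes.
    """
    pattern = NEWLINE_PATTERNS[newline_type.upper()]
    parts = re.split(pattern, text)
    # A terminator at the very end leaves an empty trailing piece; drop it.
    if parts and parts[-1] == "":
        parts.pop()
    return parts
```

Scanning CRLF data with the LF type leaves a stray carriage return on each line in this sketch, which is one concrete way a mismatch between the data and the option can produce surprising matches.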

@@ -640,12 +647,14 @@ use of JIT at run time. It is provided for testing and working round problems. It should never be needed in normal use.

--O text, --output=text
+-O text, --output=text
 When there is a match, instead of outputting the whole line that matched,
-output just the given text. This option is mutually exclusive with
---only-matching, --file-offsets, and --line-offsets. Escape
-sequences starting with a dollar character may be used to insert the contents
-of the matched part of the line and/or captured substrings into the text.
+output just the given text, followed by an operating-system standard newline.
+The --newline option has no effect on this option, which is mutually
+exclusive with --only-matching, --file-offsets, and
+--line-offsets. Escape sequences starting with a dollar character may be
+used to insert the contents of the matched part of the line and/or captured
+substrings into the text.

$<digits> or ${<digits>} is replaced by the captured @@ -807,16 +816,27 @@ by the --locale option. If no locale is set, the PCRE2 library's default
NEWLINES

 The -N (--newline) option allows pcre2grep to scan files with
-different newline conventions from the default. Any parts of the input files
-that are written to the standard output are copied identically, with whatever
-newline sequences they have in the input. However, the setting of this option
-affects only the way scanned files are processed. It does not affect the
-interpretation of files specified by the -f, --file-list,
---exclude-from, or --include-from options, nor does it affect the
-way in which pcre2grep writes informational messages to the standard
-error and output streams. For these it uses the string "\n" to indicate
-newlines, relying on the C I/O library to convert this to an appropriate
-sequence.
+newline conventions that differ from the default. This option affects only the
+way scanned files are processed. It does not affect the interpretation of files
+specified by the -f, --file-list, --exclude-from, or
+--include-from options.
+

+

+Any parts of the scanned input files that are written to the standard output
+are copied with whatever newline sequences they have in the input. However, if
+the final line of a file is output, and it does not end with a newline
+sequence, a newline sequence is added. If the newline setting is CR, LF, CRLF
+or NUL, that line ending is output; for the other settings (ANYCRLF or ANY) a
+single NL is used.
+

+

+The newline setting does not affect the way in which pcre2grep writes
+newlines in informational messages to the standard output and error streams.
+Under Windows, the standard output is set to be binary, so that "\r\n" at the
+ends of output lines that are copied from the input is not converted to
+"\r\r\n" by the C I/O library. This means that any messages written to the
+standard output must end with "\r\n". For all other operating systems, and
+for all messages to the standard error stream, "\n" is used.


OPTIONS COMPATIBILITY
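
The final-line rule added to the NEWLINES section in the hunk above can be sketched as follows. This is an illustrative Python model on decoded text, not the pcre2grep source; the helper name `complete_final_line` is hypothetical, and reading "a single NL" as `"\n"` is an assumption.

```python
# Terminators for the four fixed newline settings.
FIXED = {"CR": "\r", "LF": "\n", "CRLF": "\r\n", "NUL": "\0"}
ANYCRLF_ENDINGS = ("\r\n", "\r", "\n")
# ANYCRLF endings plus VT, FF, NEL, LS and PS.
ANY_ENDINGS = ANYCRLF_ENDINGS + ("\x0b", "\x0c", "\x85", "\u2028", "\u2029")


def complete_final_line(line: str, newline_type: str) -> str:
    """Return `line` with a newline sequence appended if it lacks one."""
    kind = newline_type.upper()
    if kind in FIXED:
        # CR, LF, CRLF or NUL: that same line ending is what gets added.
        endings, filler = (FIXED[kind],), FIXED[kind]
    elif kind == "ANYCRLF":
        endings, filler = ANYCRLF_ENDINGS, "\n"  # assumed meaning of "NL"
    elif kind == "ANY":
        endings, filler = ANY_ENDINGS, "\n"  # assumed meaning of "NL"
    else:
        raise ValueError(f"unknown newline type: {newline_type!r}")
    return line if line.endswith(endings) else line + filler
```

Under the CR setting a trailing `"\n"` does not count as a line ending in this model, so a carriage return is still appended after it.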

@@ -992,9 +1012,9 @@ Cambridge, England.


REVISION

-Last updated: 15 June 2019
+Last updated: 25 January 2020
-Copyright © 1997-2019 University of Cambridge.
+Copyright © 1997-2020 University of Cambridge.

 Return to the PCRE2 index page.
-- 
cgit v1.2.1