author | Paul Eggert <eggert@cs.ucla.edu> | 2023-04-29 23:41:14 -0700 |
---|---|---|
committer | Paul Eggert <eggert@cs.ucla.edu> | 2023-04-29 23:42:07 -0700 |
commit | 8d3afeebcc2bdf2e8fd4ed1c5256e54be95f36a1 (patch) | |
tree | f39e858d7d3324eccf24566df1d13c8fdc7ae8e6 | |
parent | c3259803fe255fb55f2cfcdf4cf5bd94ae3befdd (diff) | |
download | grep-8d3afeebcc2bdf2e8fd4ed1c5256e54be95f36a1.tar.gz |
doc: improve doc for -P '\d'
This follows up to Carlo Marcelo Arenas Belón’s email
<https://lists.gnu.org/r/grep-devel/2023-04/msg00017.html>
that proposed changing the code too. These patches change
only the documentation since we’re so near a release.
* NEWS: Be less optimistic about the fix for -P '\d',
and warn that behavior is likely to change again.
* doc/grep.texi (grep Programs): Be less specific about -P \d
behavior, since it’s still in flux. Warn about mismatching
Unicode versions, or disagreements about obscure constructs.
-rw-r--r-- | NEWS | 14 |
-rw-r--r-- | doc/grep.texi | 13 |
2 files changed, 13 insertions, 14 deletions
@@ -4,11 +4,12 @@ GNU grep NEWS -*- outline -*-

 ** Bug fixes

-  With -P, patterns like [\d] now work again. The fix relies on PCRE2
-  support for the PCRE2_EXTRA_ASCII_BSD flag planned for PCRE2 10.43.
-  With PCRE2 version 10.42 or earlier, behavior reverts to that of
-  grep 3.8, in that patterns like \w and \b use ASCII rather than
-  Unicode interpretations.
+  With -P, patterns like [\d] now work again. Fixing this has caused
+  grep to revert to the behavior of grep 3.8, in that patterns like \w
+  and \b go back to using ASCII rather than Unicode interpretations.
+  However, future versions of GNU grep and/or PCRE2 are likely to fix
+  this and change the behavior of \w and \b back to Unicode again,
+  without breaking [\d] as 3.10 did.
   [bug introduced in grep 3.10]

   grep no longer fails on files dated after the year 2038,
@@ -25,7 +26,8 @@ GNU grep NEWS -*- outline -*-

   previous versions of grep wouldn't respect the user provided settings for
   PCRE_CFLAGS and PCRE_LIBS when building if a libpcre2-8 pkg-config module
-  found in the system.
+  was found.
+

 * Noteworthy changes in release 3.10 (2023-03-22) [stable]

diff --git a/doc/grep.texi b/doc/grep.texi
index ce6d6dc0..ff31d5d2 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -1154,18 +1154,15 @@ For documentation, refer to @url{https://www.pcre.org/}, with these caveats:
 @samp{\d} matches only the ten ASCII digits
 (and @samp{\D} matches the complement), regardless of locale.
 Use @samp{\p@{Nd@}} to also match non-ASCII digits.
-
-When @command{grep} is built with PCRE2 10.42 and earlier,
-@samp{\d} and @samp{\D} ignore in-regexp directives like @samp{(?aD)}
-and work like @samp{[0-9]} and @samp{[^0-9]} respectively.
-However, later versions of PCRE2 likely will fix this,
-and the plan is for @command{grep} to respect those directives if possible.
+(The behavior of @samp{\d} and @samp{\D} is unspecified after
+in-regexp directives like @samp{(?aD)}.)

 @item
 Although PCRE tracks the syntax and semantics of Perl's regular
-expressions, the match is not always exact, partly because Perl
+expressions, the match is not always exact. For example, Perl
 evolves and a Perl installation may predate or postdate the PCRE2
-installation on the same host.
+installation on the same host, or their Unicode versions may differ,
+or Perl and PCRE2 may disagree about an obscure construct.

 @item
 By default, @command{grep} applies each regexp to a line at a time,
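
The distinction the documentation draws between an ASCII-only \d and the Unicode decimal-digit class \p{Nd} can be illustrated outside grep. The sketch below is only an analogy: it uses Python's re and unicodedata modules, not PCRE2 or grep, and the function names and sample string are made up for illustration. It assumes nothing about how any particular grep or PCRE2 release behaves.

```python
import re
import unicodedata


def ascii_digits(text):
    """Return characters matched by an ASCII-only \\d, i.e. [0-9]."""
    # re.ASCII restricts \d to the ten ASCII digits.
    return re.findall(r"\d", text, flags=re.ASCII)


def unicode_digits(text):
    """Return characters in Unicode general category Nd (decimal digits)."""
    # Python's re has no \p{Nd}, so the category is checked directly.
    return [ch for ch in text if unicodedata.category(ch) == "Nd"]


# Hypothetical sample: ASCII "42" followed by ARABIC-INDIC DIGITS THREE and FOUR.
sample = "room 42, \u0663\u0664"

print(ascii_digits(sample))    # only the ASCII digits
print(unicode_digits(sample))  # ASCII digits plus the Arabic-Indic digits
```

The first function corresponds to the documented grep -P behavior for \d; the second corresponds to what \p{Nd} is meant to select.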