From 8d3afeebcc2bdf2e8fd4ed1c5256e54be95f36a1 Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Sat, 29 Apr 2023 23:41:14 -0700
Subject: doc: improve doc for -P '\d'
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This follows up to Carlo Marcelo Arenas Belón’s email that proposed
changing the code too. These patches change only the documentation
since we’re so near a release.
* NEWS: Be less optimistic about the fix for -P '\d', and warn
that behavior is likely to change again.
* doc/grep.texi (grep Programs): Be less specific about -P \d
behavior, since it’s still in flux. Warn about mismatching
Unicode versions, or disagreements about obscure constructs.
---
 NEWS          | 14 ++++++++------
 doc/grep.texi | 13 +++++--------
 2 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/NEWS b/NEWS
index c15764ce..995d14ef 100644
--- a/NEWS
+++ b/NEWS
@@ -4,11 +4,12 @@ GNU grep NEWS -*- outline -*-
 
 ** Bug fixes
 
-  With -P, patterns like [\d] now work again. The fix relies on PCRE2
-  support for the PCRE2_EXTRA_ASCII_BSD flag planned for PCRE2 10.43.
-  With PCRE2 version 10.42 or earlier, behavior reverts to that of
-  grep 3.8, in that patterns like \w and \b use ASCII rather than
-  Unicode interpretations.
+  With -P, patterns like [\d] now work again. Fixing this has caused
+  grep to revert to the behavior of grep 3.8, in that patterns like \w
+  and \b go back to using ASCII rather than Unicode interpretations.
+  However, future versions of GNU grep and/or PCRE2 are likely to fix
+  this and change the behavior of \w and \b back to Unicode again,
+  without breaking [\d] as 3.10 did.
   [bug introduced in grep 3.10]
 
   grep no longer fails on files dated after the year 2038,
@@ -25,7 +26,8 @@ GNU grep NEWS -*- outline -*-
 
   previous versions of grep wouldn't respect the user provided settings for
   PCRE_CFLAGS and PCRE_LIBS when building if a libpcre2-8 pkg-config module
-  found in the system.
+  was found.
+
 
 * Noteworthy changes in release 3.10 (2023-03-22) [stable]
 
diff --git a/doc/grep.texi b/doc/grep.texi
index ce6d6dc0..ff31d5d2 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -1154,18 +1154,15 @@ For documentation, refer to @url{https://www.pcre.org/}, with these caveats:
 @samp{\d} matches only the ten ASCII digits
 (and @samp{\D} matches the complement), regardless of locale.
 Use @samp{\p@{Nd@}} to also match non-ASCII digits.
-
-When @command{grep} is built with PCRE2 10.42 and earlier,
-@samp{\d} and @samp{\D} ignore in-regexp directives like @samp{(?aD)}
-and work like @samp{[0-9]} and @samp{[^0-9]} respectively.
-However, later versions of PCRE2 likely will fix this,
-and the plan is for @command{grep} to respect those directives if possible.
+(The behavior of @samp{\d} and @samp{\D} is unspecified after
+in-regexp directives like @samp{(?aD)}.)
 
 @item
 Although PCRE tracks the syntax and semantics of Perl's regular
-expressions, the match is not always exact, partly because Perl
+expressions, the match is not always exact. For example, Perl
 evolves and a Perl installation may predate or postdate the PCRE2
-installation on the same host.
+installation on the same host, or their Unicode versions may differ,
+or Perl and PCRE2 may disagree about an obscure construct.
 
 @item
 By default, @command{grep} applies each regexp to a line at a time,
-- 
cgit v1.2.1