author Paul Eggert <eggert@cs.ucla.edu> 2023-04-29 23:41:14 -0700
committer Paul Eggert <eggert@cs.ucla.edu> 2023-04-29 23:42:07 -0700
commit 8d3afeebcc2bdf2e8fd4ed1c5256e54be95f36a1
tree f39e858d7d3324eccf24566df1d13c8fdc7ae8e6
parent c3259803fe255fb55f2cfcdf4cf5bd94ae3befdd
doc: improve doc for -P '\d'
This follows up to Carlo Marcelo Arenas Belón’s email
<https://lists.gnu.org/r/grep-devel/2023-04/msg00017.html>
that proposed changing the code too. These patches change only
the documentation since we’re so near a release.
* NEWS: Be less optimistic about the fix for -P '\d', and warn
that behavior is likely to change again.
* doc/grep.texi (grep Programs): Be less specific about -P \d
behavior, since it’s still in flux. Warn about mismatching
Unicode versions, or disagreements about obscure constructs.
-rw-r--r-- NEWS 14
-rw-r--r-- doc/grep.texi 13
2 files changed, 13 insertions, 14 deletions
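
As a rough illustration of the ASCII versus Unicode distinction that the patch documents, the sketch below uses Python's `re` module as a stand-in. It is not PCRE2, and it says nothing about what any particular grep build does with -P. The helper name `matches` and the sample characters are assumptions chosen for the example.

```python
import re

# Sample characters (chosen for illustration only).
ASCII_SEVEN = "7"
ARABIC_INDIC_THREE = "\u0663"   # U+0663, a non-ASCII decimal digit (category Nd)
E_ACUTE = "\u00e9"              # U+00E9, a non-ASCII letter


def matches(pattern: str, ch: str, ascii_only: bool) -> bool:
    """Return True if `pattern` matches the whole one-character string `ch`.

    With ascii_only=True the pattern is compiled with re.ASCII, so escapes
    such as \\d and \\w use their ASCII-only meaning. Otherwise Python's
    default Unicode meaning for str patterns applies.
    """
    flags = re.ASCII if ascii_only else 0
    return re.fullmatch(pattern, ch, flags) is not None


if __name__ == "__main__":
    for label, ch in [("7", ASCII_SEVEN),
                      ("U+0663", ARABIC_INDIC_THREE),
                      ("U+00E9", E_ACUTE)]:
        for pattern in (r"\d", r"\w", r"[0-9]"):
            print(label, pattern,
                  "unicode:", matches(pattern, ch, ascii_only=False),
                  "ascii:", matches(pattern, ch, ascii_only=True))
```

In this stand-in, the ASCII interpretation makes `\d` behave like `[0-9]`, which is the behavior the revised grep.texi text describes for `\d`. The Unicode interpretation of `\w` is the one NEWS says may return in a future release.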
diff --git a/NEWS b/NEWS
index c15764ce..995d14ef 100644
--- a/NEWS
+++ b/NEWS
@@ -4,11 +4,12 @@ GNU grep NEWS -*- outline -*-
** Bug fixes
- With -P, patterns like [\d] now work again. The fix relies on PCRE2
- support for the PCRE2_EXTRA_ASCII_BSD flag planned for PCRE2 10.43.
- With PCRE2 version 10.42 or earlier, behavior reverts to that of
- grep 3.8, in that patterns like \w and \b use ASCII rather than
- Unicode interpretations.
+ With -P, patterns like [\d] now work again. Fixing this has caused
+ grep to revert to the behavior of grep 3.8, in that patterns like \w
+ and \b go back to using ASCII rather than Unicode interpretations.
+ However, future versions of GNU grep and/or PCRE2 are likely to fix
+ this and change the behavior of \w and \b back to Unicode again,
+ without breaking [\d] as 3.10 did.
[bug introduced in grep 3.10]
grep no longer fails on files dated after the year 2038,
@@ -25,7 +26,8 @@ GNU grep NEWS -*- outline -*-
previous versions of grep wouldn't respect the user provided settings for
PCRE_CFLAGS and PCRE_LIBS when building if a libpcre2-8 pkg-config module
- found in the system.
+ was found.
+
* Noteworthy changes in release 3.10 (2023-03-22) [stable]
diff --git a/doc/grep.texi b/doc/grep.texi
index ce6d6dc0..ff31d5d2 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -1154,18 +1154,15 @@ For documentation, refer to @url{https://www.pcre.org/}, with these caveats:
@samp{\d} matches only the ten ASCII digits
(and @samp{\D} matches the complement), regardless of locale.
Use @samp{\p@{Nd@}} to also match non-ASCII digits.
-
-When @command{grep} is built with PCRE2 10.42 and earlier,
-@samp{\d} and @samp{\D} ignore in-regexp directives like @samp{(?aD)}
-and work like @samp{[0-9]} and @samp{[^0-9]} respectively.
-However, later versions of PCRE2 likely will fix this,
-and the plan is for @command{grep} to respect those directives if possible.
+(The behavior of @samp{\d} and @samp{\D} is unspecified after
+in-regexp directives like @samp{(?aD)}.)
@item
Although PCRE tracks the syntax and semantics of Perl's regular
-expressions, the match is not always exact, partly because Perl
+expressions, the match is not always exact. For example, Perl
evolves and a Perl installation may predate or postdate the PCRE2
-installation on the same host.
+installation on the same host, or their Unicode versions may differ,
+or Perl and PCRE2 may disagree about an obscure construct.
@item
By default, @command{grep} applies each regexp to a line at a time,