This improves runtime checking for integer overflow when compiling
with gcc -fsanitize=undefined and the like. It also avoids
the need for some integer casts, which can be error-prone.
* bootstrap.conf (gnulib_modules): Add idx.
* src/dfasearch.c (struct dfa_comp, kwsmusts)
(possible_backrefs_in_pattern, regex_compile, GEAcompile)
(EGexecute):
* src/grep.c (struct patloc, patlocs_allocated, patlocs_used)
(n_patterns, update_patterns, pattern_file_name, poison_len)
(asan_poison, fwrite_errno, compile_fp_t, execute_fp_t)
(buf_has_encoding_errors, buf_has_nulls, file_must_have_nulls)
(bufalloc, pagesize, all_zeros, fillbuf, nlscan)
(print_line_head, print_line_middle, print_line_tail, grepbuf)
(grep, contains_encoding_error, fgrep_icase_available)
(fgrep_icase_charlen, fgrep_to_grep_pattern, try_fgrep_pattern)
(main):
* src/kwsearch.c (struct kwsearch, Fcompile, Fexecute):
* src/kwset.c (struct trie, struct kwset, kwsalloc, kwsincr)
(kwswords, treefails, memchr_kwset, acexec_trans, kwsexec)
(treedelta, kwsprep, bm_delta2_search, bmexec_trans, bmexec)
(acexec):
* src/kwset.h (struct kwsmatch):
* src/pcresearch.c (Pcompile, Pexecute):
* src/search.h (mb_clen):
* src/searchutils.c (kwsinit, mb_goback, wordchars_count)
(wordchars_size, wordchar_next, wordchar_prev):
Prefer idx_t to size_t or ptrdiff_t for nonnegative sizes,
and prefer ptrdiff_t to size_t for sizes plus error values.
* src/grep.c (uword_size): New constant, used for signed
size calculations.
(totalnl, add_count, totalcc, print_offset, print_line_head, grep):
Prefer intmax_t to uintmax_t for wide integer calculations.
(fgrep_icase_charlen): Prefer ptrdiff_t to int for size offsets.
* src/grep.h: Include idx.h.
* src/search.h (imbrlen): New function, like mbrlen except
with idx_t and ptrdiff_t.
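
mbrlen reports its two failure cases as the huge unsigned values (size_t) -1 and (size_t) -2. A signed wrapper lets callers test `len < 0` instead. The sketch below shows one way to write such a wrapper. The name imbrlen_sketch is hypothetical, grep's actual imbrlen in src/search.h may differ, and the idx_t typedef is a stand-in that assumes gnulib's idx.h defines idx_t as ptrdiff_t.

```c
#include <stddef.h>
#include <wchar.h>

/* Stand-in for gnulib's idx.h (assumption: idx_t is ptrdiff_t).  */
typedef ptrdiff_t idx_t;

/* Hypothetical sketch of an mbrlen wrapper using signed types.
   Return the length of the character at S[0..N) (0 for a null
   character), -2 if the bytes form an incomplete character,
   or -1 on an encoding error.  */
static ptrdiff_t
imbrlen_sketch (char const *s, idx_t n, mbstate_t *mbs)
{
  size_t len = mbrlen (s, n, mbs);
  if (len == (size_t) -1)
    return -1;
  if (len == (size_t) -2)
    return -2;
  return len;   /* at most N, so it fits in ptrdiff_t */
}
```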
* src/searchutils.c (mb_goback): When scanning backward through
UTF-8, check the length implied by the putative byte 1 before
bothering to invoke mb_clen. This length check also lets us use
mbrlen directly rather than calling mb_clen, which would
eventually defer to mbrlen anyway.
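
A lead byte alone says how long a UTF-8 sequence must be, so a backward scan can reject a candidate cheaply before decoding anything. The sketch below illustrates that length check. Both function names are hypothetical and this is not grep's mb_goback. It checks only lengths; a full decoder such as mbrlen would still have to validate the continuation bytes.

```c
#include <stddef.h>

/* Hypothetical helper: the sequence length implied by lead byte B,
   or 0 if B cannot start a UTF-8 character.  */
static int
utf8_lead_length (unsigned char b)
{
  if (b < 0x80)
    return 1;
  if (b < 0xC2)       /* continuation bytes 80..BF, overlong leads C0, C1 */
    return 0;
  if (b < 0xE0)
    return 2;
  if (b < 0xF0)
    return 3;
  if (b < 0xF5)
    return 4;
  return 0;           /* F5..FF never occur in UTF-8 */
}

/* Hypothetical sketch: return the offset of the putative start of the
   character containing BUF[CUR], looking back at most 3 bytes.  If no
   plausible lead byte's implied length reaches CUR, return CUR itself,
   so the byte would be treated as a one-byte encoding error.  */
static ptrdiff_t
utf8_char_start (unsigned char const *buf, ptrdiff_t cur)
{
  for (ptrdiff_t k = 0; k < 4 && k <= cur; k++)
    {
      unsigned char b = buf[cur - k];
      if ((b & 0xC0) != 0x80)   /* first non-continuation byte */
        return utf8_lead_length (b) > k ? cur - k : cur;
    }
  return cur;
}
```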
* src/searchutils.c (mb_goback): Set *MBCLEN only in
non-UTF-8 encodings, since that’s the only time it’s needed,
and this lets us see more clearly that the UTF-8 clen value
is not useful to the caller.
* src/searchutils.c (wordchar_prev): Tweak performance by using a
value already in a local variable rather than consulting a table.
* src/searchutils.c (mb_goback): Improve the comment to better
describe this confusing function, and remove an unnecessary
test of cur vs end.
* src/kwset.c (struct kwset.maxd): Remove. All uses removed.
This helps move the code away from unsigned types.
* src/grep.c (buf_has_encoding_errors, contains_encoding_error):
* src/searchutils.c (mb_goback):
Compare to MB_LEN_MAX, not to (size_t) -2. This is a bit safer
anyway, as grep relies on MB_LEN_MAX limits elsewhere.
* src/search.h (mb_clen): Compare to -2 before converting to size_t.
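
A successful mbrlen result is at most MB_CUR_MAX, which never exceeds MB_LEN_MAX. A single unsigned comparison against MB_LEN_MAX therefore rejects both (size_t) -1 and (size_t) -2. The sketch below shows that comparison. The helper names mbrlen_ok and count_chars are hypothetical, and treating each bad byte as a one-byte unit is an assumption of the sketch.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: true if an mbrlen result denotes a complete
   (possibly null) character rather than an error indicator.  */
static bool
mbrlen_ok (size_t len)
{
  return len <= MB_LEN_MAX;
}

/* Count the characters in S[0..N), treating each byte of an invalid
   or truncated sequence as a one-byte unit.  */
static ptrdiff_t
count_chars (char const *s, ptrdiff_t n)
{
  mbstate_t mbs = { 0 };
  ptrdiff_t count = 0;
  for (ptrdiff_t i = 0; i < n; count++)
    {
      size_t len = mbrlen (s + i, n - i, &mbs);
      if (mbrlen_ok (len))
        i += len ? len : 1;            /* a null character takes one byte */
      else
        {
          i++;                         /* skip one byte of the bad sequence */
          memset (&mbs, 0, sizeof mbs);  /* reset the conversion state */
        }
    }
  return count;
}
```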
Inspired by bug#50129 even though this is a different bug.
* src/grep.c (main): For ‘-f -’, use clearerr (stdin) after
reading, so that ‘grep -f - -f -’ reads stdin twice even
when stdin is a tty. Also, for ‘-f FILE’, report any
I/O error when closing FILE.
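
In C, a stream's end-of-file indicator is sticky. Once it is set, further getc calls return EOF even if a terminal user could supply more input. Calling clearerr after draining the stream resets that indicator, so a second read can proceed. The sketch below shows the idea. The name drain_and_reset is hypothetical, and this is not grep's code.

```c
#include <stdio.h>

/* Hypothetical sketch: consume FP to end of file, then clear its
   end-of-file and error indicators so that a later read of the same
   stream is attempted again instead of failing at once.  */
static long
drain_and_reset (FILE *fp)
{
  long n = 0;
  while (getc (fp) != EOF)
    n++;
  clearerr (fp);
  return n;
}
```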
Problem reported by Alex Murray (bug#50093).
* src/grep.c (hash_pattern): Use a nonzero initial value.
* NEWS: Mention this (see bug#49996).
* doc/Makefile.am (egrep.1 fgrep.1): Remove. All uses removed.
* doc/grep.in.1, doc/grep.texi (grep Programs):
Remove documentation for egrep, fgrep.
* doc/grep.texi (Usage): Add FAQ for egrep and fgrep.
* src/Makefile.am (shell_does_substrings): Substitute for ${0##*/},
not for ${0%/\*} (which was not being used anyway).
* src/egrep.sh: Issue an obsolescence warning.
* tests/fedora: Use "grep -F" instead of "fgrep" in diagnostics,
as this tests "grep -F" not "fgrep".
* src/dfasearch.c (EGexecute): Remove a label and goto.
This also makes the machine code a bit shorter, on x86-64 gcc.
* src/grep.c (fillbuf): Simplify movement of saved data.
* src/grep.c (ALIGN_TO): When converting pointers to unsigned
integers, convert to uintptr_t not size_t, as size_t in theory
might be too narrow.
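
The sketch below rounds a pointer up to an alignment boundary. It computes the remainder through uintptr_t, which is wide enough for a converted object pointer where it exists, and then adds the padding to the original pointer. The name align_up is hypothetical, and this is not grep's ALIGN_TO macro.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: round P up to a multiple of ALIGNMENT (> 0).
   The pointer is converted to uintptr_t rather than size_t, since
   size_t could in principle be narrower than a pointer.  */
static char *
align_up (char *p, size_t alignment)
{
  uintptr_t rem = (uintptr_t) p % alignment;
  return rem ? p + (alignment - rem) : p;
}
```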
* src/grep.c (usage): Document --group-separator
and --no-group-separator.
* src/dfasearch.c (regex_compile): Parenthesize arith-OR vs
ternary, to placate clang-10.
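
In C, ?: binds more loosely than |, so an unparenthesized `a | b ? x : y` means `(a | b) ? x : y`. Writing the parentheses makes the intended grouping explicit, which is what clang's -Wparentheses diagnostics ask for. The sketch below contrasts the two possible groupings. The function names and values are hypothetical, and it does not claim which grouping grep's regex_compile needed.

```c
/* Hypothetical illustration: the two ways to group '|' with '?:'.  */
static int
pick_grouped (int a, int b)
{
  return (a | b) ? 10 : 20;      /* what 'a | b ? 10 : 20' means */
}

static int
pick_alternative (int a, int b)
{
  return a | (b ? 10 : 20);      /* the other reading, for contrast */
}
```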
* NEWS (Change in behavior): Mention this.
* src/grep.c (main): Warn about each use of obsolete
--unix-byte-offsets (-u).
* doc/grep.in.1 (-u): Remove its documentation.
* src/grep.c (hash_pattern): Switch from PJW to DJB2, to avoid an
O(N) to O(N^2) performance regression due to hash collisions with
patterns such as those from 'seq 500000 | tr 0-9 A-J'.
Reported by Frank Heckenbach in https://bugs.gnu.org/44754
* NEWS (Bug fixes): Mention it.
* tests/hash-collision-perf: New file.
* tests/Makefile.am (TESTS): Add it.
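
DJB2 is Daniel J. Bernstein's string hash. It applies the recurrence h = h * 33 + c to each byte and is classically seeded with 5381. The sketch below shows that recurrence over a pattern of N bytes. The name djb2_hash is hypothetical. grep's hash_pattern may differ in details such as its seed; the "Use a nonzero initial value" commit in this log concerns that choice.

```c
#include <stddef.h>

/* Hypothetical sketch of the DJB2 hash over PAT[0..N).  */
static size_t
djb2_hash (char const *pat, size_t n)
{
  size_t h = 5381;                        /* classic nonzero seed */
  for (size_t i = 0; i < n; i++)
    h = h * 33 + (unsigned char) pat[i];
  return h;
}
```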
* gnulib: Update submodule to latest.
* src/grep.c (printf_errno): Reflect gnulib's renaming: change
_GL_ATTRIBUTE_FORMAT_PRINTF to
_GL_ATTRIBUTE_FORMAT_PRINTF_STANDARD.
* NEWS: Mention this.
* doc/grep.in.1:
Remove GREP_OPTIONS documentation.
* doc/grep.texi (Environment Variables):
Move GREP_OPTIONS stuff into a “no longer implemented” paragraph.
* src/grep.c (prepend_args, prepend_default_options): Remove.
(main): Do not look at GREP_OPTIONS.
* tests/Makefile.am (TESTS_ENVIRONMENT):
* tests/init.cfg (vars_): Remove GREP_OPTIONS.
* src/dfasearch.c (regex_compile): New parameter. All callers changed.
(GEAcompile): Move setting syntax for regex into regex_compile()
function. This addresses a performance problem exposed by extreme
regular expressions, as described in https://bugs.gnu.org/43862 .
Without this, it could be tedious to determine which input
file evokes a PCRE-execution-time failure.
* src/pcresearch.c (Pexecute): When failing, include the
error-provoking file name in the diagnostic.
* src/grep.c (input_filename): Make extern, since used above.
* src/search.h (input_filename): Declare.
* tests/filename-lineno.pl: Test for this.
($no_pcre): Factor out.
* NEWS (Bug fixes): Mention this.
* src/kwsearch.c (Fexecute):
Assume C99 to put declarations nearer uses.
* src/kwset.c (bmexec): Omit unnecessary test.
* src/kwset.h (struct kwsmatch): Make OFFSET and SIZE individual
elements, not arrays of size 1 (a revenant of an earlier API).
All uses changed.
* src/kwsearch.c (Fcompile, Fexecute): Remove code that has been
unused since commit 016e590a8198009bce0e1078f6d4c7e037e2df3c.
This suppresses a false alarm '"grep.c", line 720: warning:
initializer will be sign-extended: -1'.
* src/grep.c (uword_max): New static constant.
(initialize_unibyte_mask): Use it.
Fix more bugs recently uncovered by Norihiro Tanaka (Bug#43577).
* NEWS: Mention new bug report.
* src/grep.c (ok_fold): New static var.
(setup_ok_fold): New function.
(fgrep_icase_charlen): Reject single-byte characters
if they match some multibyte characters when ignoring case.
This part of the patch is partly derived from
<https://bugs.gnu.org/43577#14>, which means it is:
Co-authored-by: Norihiro Tanaka <noritnk@kcn.ne.jp>
(main): Call setup_ok_fold if ok_fold might be needed.
* src/searchutils.c (kwsinit): With the grep.c changes,
this code can now revert to classic 7th Edition Unix style;
aborting would be wrong.
* tests/turkish-eyes: Add tests for these bugs.
* src/grep.c (main): Do not double-increment n_patterns.
update_patterns now increments n_patterns itself; incrementing it
again would give an incorrect count that hurts performance heuristics later.
Grep resorts to using the regex engine when the precision of either
-o or --color is required, or when the pattern is not supported by
our DFA engine (e.g., backref). Otherwise, grep would perform regex
compilation solely to check the syntax. This change makes grep skip
that compilation in the common case for which it is unnecessary.
The compilation we are avoiding is quite costly, consuming O(N^2)
RSS for N regular expressions.
* src/dfasearch.c (GEAcompile): Add new argument, and avoid unneeded
compilation of regex.
* src/grep.c (compile_fp_t): Update prototype.
(main): Update caller.
* src/kwsearch.c (Fcompile): Update caller and add new argument.
* src/pcresearch.c (Pcompile): Add new argument.
* src/search.h (GEAcompile, Fcompile, Pcompile): Update prototype.
* src/grep.c (try_fgrep_pattern): With -G, pass \) through to
GEAcompile so that it can complain. This fixes an unexpected
change in behavior from grep 3.4 and earlier.
* tests/filename-lineno.pl: Add tests for this sort of thing.
* src/grep.c (try_fgrep_pattern): Tweak previous change
by using mempcpy.
The same applied to many other backslash-escaped bytes, not just
metacharacters. The switch to rawmemchr in v3.4-almost-10-g9393b97
made some parts of the code require the usually-guaranteed newline
sentinel at the end of each pattern. Before, some consumers used a
(correct) pattern length and did not care that try_fgrep_pattern could
transform a pattern (with sentinel) like "\\.\n" to "..\n", thus
violating that assumption.
* src/grep.c (try_fgrep_pattern): Preserve the invariant
that each regexp is newline-terminated.
* tests/backslash-dot: New file. Test for this.
* tests/Makefile.am (TESTS): Add it.
* NEWS: Mention this.
* bootstrap.conf (gnulib_modules): Remove 'quote'.
* src/grep.c: Do not include quote.h.
(grep, grepdirent, grepdesc): Put the three unusual diagnostics
into the same "grep: FOO: message" form that grep uses elsewhere.
* tests/binary-file-matches, tests/in-eq-out-infloop:
Adjust tests to match new diagnostic format.
* src/grep.c (grep): Lower-case the "B" in "Binary file... matches"
diagnostic that we now emit to stderr. This avoids the following
when running "make syntax-check":
maint.mk: found capitalized error message
make: *** [maint.mk:469: sc_error_message_uppercase] Error 1
* NEWS, doc/grep.texi: Mention this change (Bug#29668).
* src/grep.c (grep): Send "Binary file FOO matches" to stderr
instead of stdout.
* tests/encoding-error, tests/invalid-multibyte-infloop:
* tests/null-byte, tests/pcre-count, tests/surrogate-pair:
* tests/symlink, tests/unibyte-binary:
Adjust tests to match new behavior. In all cases this
simplifies the tests, which is a good sign.
Problem reported by Jason Franklin (Bug#33552).
* NEWS: Mention this.
* src/grep.c (grep): Do not output "Binary file FOO matches" if -I.
* tests/encoding-error: Add test for this bug.
* src/pcresearch.c (jit_exec) [PCRE_EXTRA_MATCH_LIMIT_RECURSION]:
When growing the match_limit_recursion limit, do not use the old
value if ! (flags & PCRE_EXTRA_MATCH_LIMIT_RECURSION), as it is
uninitialized in that case.
Problem reported by Thomas Deutschmann (Bug#29446#23).
* src/pcresearch.c (Pexecute): Diagnose PCRE_ERROR_RECURSIONLIMIT.
* tests/pcre-jitstack: Treat recursion limit overflow like stack
overflow.
Problem reported by Mayo Fark (Bug#43225).
* src/searchutils.c (wordchar_prev): In a UTF-8 locale, do not
assume that an encoding-error byte cannot be part of a word
constituent, as this assumption is incorrect for the last byte
of a multibyte word constituent.
* tests/word-delim-multibyte: Add a test for the bug.
* bootstrap.conf (gnulib_modules): Add rawmemchr.
* src/dfasearch.c (GEAcompile, EGexecute):
* src/grep.c (update_patterns, prpending, prtext):
* src/kwsearch.c (Fcompile, Fexecute):
* src/pcresearch.c (Pcompile, Pexecute):
Simplify (and presumably speed up a little) by using rawmemchr
with a sentinel, instead of using memchr.
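
rawmemchr (a GNU extension that gnulib provides) searches without a length limit, so it is safe only when the target byte is guaranteed to occur. A newline sentinel after each pattern provides that guarantee. The sketch below uses a hypothetical portable stand-in, find_sentinel, and a hypothetical caller, count_patterns. It assumes the buffer ends in '\n'.

```c
#include <stddef.h>

/* Hypothetical stand-in for rawmemchr: return a pointer to the first
   C at or after S.  The caller must guarantee that C occurs, so the
   loop needs no length check.  */
static char *
find_sentinel (char const *s, char c)
{
  while (*s != c)
    s++;
  return (char *) s;
}

/* Count the newline-terminated patterns in BUF[0..SIZE).
   Requires SIZE > 0 and BUF[SIZE - 1] == '\n'.  */
static ptrdiff_t
count_patterns (char const *buf, ptrdiff_t size)
{
  ptrdiff_t n = 0;
  for (char const *p = buf, *lim = buf + size; p < lim; n++)
    p = find_sentinel (p, '\n') + 1;
  return n;
}
```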
* src/grep.c (pattern_file_name): Make first argument
origin-0, not origin-1, as this simplifies both caller and
callee. All uses changed.
* src/dfasearch.c (regex_compile): "" suffices; we don’t need "\0".
No need to initialize pat_lineno.
Do not pass two copies of the same regexp to the
regular-expression engine. Although the engines should
perform nearly as well even with the copies, in practice they do not.
Problem reported by Luca Borzacchiello (Bug#43040).
* bootstrap.conf (gnulib_modules): Add hash.
* src/grep.c: Include stdint.h, for SIZE_WIDTH.
Include hash.h.
(struct patloc, patloc, patlocs_allocated, patlocs_used):
Rename from struct FL_pair, fl_pair, n_fl_pair_slots, n_pattern_files,
respectively, since the data type is no longer a pair.
All uses changed.
(struct patloc): New member FILELINE. The lineno member is now
ptrdiff_t since nowadays we prefer signed types.
(pattern_array, pattern_table): New static vars.
(count_nl_bytes, fl_add): Remove; no longer used.
(hash_pattern, compare_patterns, update_patterns): New functions.
update_patterns does what fl_add used to do, plus remove dups.
(pattern_file_name): Adjust to change from fl_pair to patloc.
(main): Move some variables to inner blocks for clarity.
Maintain the pattern_table hash of all patterns.
Update pattern_array to match keys, and use update_patterns
instead of fl_add to remove duplicate keys.
* tests/filename-lineno.pl (invalid-re-2-files)
(invalid-re-2-files2, invalid-re-2e): Ensure regexps are unique in
tests so that dups aren’t removed in diagnostics.
(invalid-re-line-numbers): New test.
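
Removing duplicate patterns with a hash table means keeping the first occurrence of each pattern and dropping later copies. The sketch below shows one way to do this. grep uses gnulib's hash module; here a tiny fixed-capacity open-addressing table stands in for it. All names are hypothetical, and the capacity limit is an assumption of the sketch. Patterns are compacted in place, and each pattern stays newline-terminated.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { TABLE_SIZE = 64 };   /* must exceed the number of distinct patterns */

struct pat { char const *s; size_t len; };
static struct pat table[TABLE_SIZE];

static size_t
hash_bytes (char const *s, size_t len)
{
  size_t h = 5381;
  for (size_t i = 0; i < len; i++)
    h = h * 33 + (unsigned char) s[i];
  return h;
}

/* Record S[0..LEN) unless an equal pattern is already present.
   Return true if S was new.  Uses linear probing.  */
static bool
insert_if_absent (char const *s, size_t len)
{
  for (size_t i = hash_bytes (s, len) % TABLE_SIZE; ; i = (i + 1) % TABLE_SIZE)
    {
      if (!table[i].s)
        {
          table[i].s = s;
          table[i].len = len;
          return true;
        }
      if (table[i].len == len && memcmp (table[i].s, s, len) == 0)
        return false;
    }
}

/* Compact the newline-terminated patterns in BUF[0..SIZE) in place,
   keeping the first occurrence of each; return the new size.
   Requires BUF[SIZE - 1] == '\n' when SIZE > 0.  */
static size_t
dedup_patterns (char *buf, size_t size)
{
  size_t out = 0;
  for (size_t in = 0; in < size; )
    {
      char *nl = memchr (buf + in, '\n', size - in);
      size_t len = nl - (buf + in);
      /* Move first, so the table only ever points into the kept prefix.  */
      memmove (buf + out, buf + in, len + 1);
      if (insert_if_absent (buf + out, len))
        out += len + 1;
      in += len + 1;
    }
  return out;
}
```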
Problems reported by Antonio Diaz Diaz in:
https://bugs.gnu.org/28105#29
* NEWS, doc/grep.texi (Exit Status), src/grep.c (usage):
Adjust documentation accordingly.
* src/grep.c (grepdesc, main): Go back to old behavior.
* tests/skip-read: Adjust tests accordingly.
Problem reported by Duncan Moore (Bug#37212).
* src/grep.c (usage): Fix incorrect statement about --exclude
and directories. Standardize on “that match GLOB” instead
of “matching GLOB”.
Run "make update-copyright" and then...
* gnulib: Update to latest with copyright year adjusted.
* tests/init.sh: Sync with gnulib to pick up copyright year.
* bootstrap: Likewise.
* doc/grep.in.1: Use "-" in copyright year ranges, not \en.
* src/dfasearch.c (possible_backrefs_in_pattern): Remove a
duplicate "a", insert a "be" and a comma, and reformat.
This fixes some bugs in the previous commit,
and should finish the fix for Bug#33249.
* NEWS: Mention fix for Bug#33249.
* src/dfasearch.c (possible_backrefs_in_pattern, regex_compile)
(GEAcompile): In new code, prefer ptrdiff_t to size_t when either
will do, since ptrdiff_t has better error checking. At some point
we should adjust the old code too.
(possible_backrefs_in_pattern): Rename from
find_backref_in_pattern. New arg BS_SAFE. All uses changed.
Fix false negative if a multibyte character ends in a single
'\\' byte, followed by the two bytes '\\', '1'.
(regex_compile): Simplify.
(GEAcompile): Avoid quadratic behavior when reallocating growing
buffers. Fix a couple of bugs in copying pattern data involving
backreferences. Fix another bug in copying pattern metadata
involving backreferences, by removing the need to copy it.
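
If a buffer is reallocated to exactly the size needed on every append, appends can take quadratic time in total, because the data may be copied again on each append. Growing the allocation by a constant factor makes appends amortized linear. The sketch below grows by 1.5x; that factor is an assumption of the sketch. The struct and function names are hypothetical, this is not grep's GEAcompile code, and integer overflow checks are omitted. gnulib's xalloc module offers helpers such as x2nrealloc for the same pattern.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct growbuf { char *data; ptrdiff_t size, alloc; };

/* Hypothetical sketch: append SRC[0..N) to B, growing the allocation
   geometrically.  Return 0 on success, -1 if memory is exhausted.
   Overflow of the size computations is not checked here.  */
static int
growbuf_append (struct growbuf *b, char const *src, ptrdiff_t n)
{
  if (b->alloc - b->size < n)
    {
      ptrdiff_t need = b->size + n;
      ptrdiff_t alloc = b->alloc + b->alloc / 2;   /* grow by about 1.5x */
      if (alloc < need)
        alloc = need;
      char *p = realloc (b->data, alloc);
      if (!p)
        return -1;
      b->data = p;
      b->alloc = alloc;
    }
  memcpy (b->data + b->size, src, n);
  b->size += n;
  return 0;
}
```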