| Commit message (Collapse) | Author | Age | Files | Lines |
|
Reported by Norihiro Tanaka in: http://bugs.gnu.org/18454#38
* src/grep.c (SEEK_DATA): Default to SEEK_SET if not defined.
(SEEK_HOLE): Move to top level, and default it to SEEK_SET.
(file_textbin): Adjust to new default.
(fillbuf): Don't bother with SEEK_DATA if it defaults to SEEK_SET.
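
The fallback described above can be sketched as follows. This is a minimal illustration and not the text of src/grep.c; the helper name have_seek_data is hypothetical and only shows how code can detect that SEEK_DATA fell back to SEEK_SET.

```c
#include <stdbool.h>
#include <stdio.h>   /* SEEK_SET */
#include <unistd.h>  /* may define SEEK_DATA and SEEK_HOLE */

/* Default both flags to SEEK_SET on platforms that lack them.  */
#ifndef SEEK_DATA
# define SEEK_DATA SEEK_SET
#endif
#ifndef SEEK_HOLE
# define SEEK_HOLE SEEK_SET
#endif

/* Hypothetical helper: true only if SEEK_DATA is a real flag,
   distinct from the SEEK_SET fallback.  */
static bool
have_seek_data (void)
{
  return SEEK_DATA != SEEK_SET;
}
```

With this default, a caller such as fillbuf can skip the hole-seeking path entirely when have_seek_data () is false.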
|
Take advantage of the relaxed rules for treating non-text bytes in
binary data, by efficiently skipping past holes on platforms
supporting lseek's SEEK_DATA flag.
On one test on a circa-2008 Sun Fire V40z running Solaris 11.2,
'grep x' took 0.009 real-time seconds to scan a holey file of size
9,223,372,036,854,775,802 bytes, for a nominal scan rate of 1 ZB/s.
grep 2.20's scan rate on this platform was 843 MB/s, so this is a
speedup by a factor of 1.2 trillion. The speedup factor is not
as great on GNU/Linux hosts, due to what appear to be SEEK_DATA
inefficiencies, but presumably this will be cleared up in time.
* NEWS: Document this.
* src/grep.c, src/grep.h (eolbyte): Now char, not unsigned char.
This is for compatibility with the rest of the code.
The old (performance?) reasons for 'unsigned char' are now moot.
* src/grep.c (skip_nuls, skip_empty_lines, seek_data_failed):
New static vars.
(totalnl): Move up, since it's about input, not output, and
fillbuf now uses it.
(add_count): Move up, since fillbuf now uses it.
(all_zeros): New function.
(fillbuf): Use SEEK_DATA to skip past holes efficiently,
on systems that support this.
(grep, main): Set the new static vars.
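
A rough sketch of the two ideas named above, assuming a platform where lseek may accept SEEK_DATA. The body of all_zeros and the helper next_data_offset are illustrative guesses, not the code committed to src/grep.c.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE  /* glibc exposes SEEK_DATA only with this */
#endif
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Return true if the SIZE bytes starting at BUF are all zero.  */
static bool
all_zeros (char const *buf, size_t size)
{
  for (size_t i = 0; i < size; i++)
    if (buf[i])
      return false;
  return true;
}

/* Hypothetical helper: return the offset of the next data at or
   after POS in FD, leaving the file positioned there.  If SEEK_DATA
   is unavailable or the seek fails, return POS unchanged so that the
   caller falls back to reading normally.  */
static off_t
next_data_offset (int fd, off_t pos)
{
#ifdef SEEK_DATA
  off_t data = lseek (fd, pos, SEEK_DATA);
  if (0 <= data)
    return data;
#endif
  (void) fd;
  return pos;
}
```

A reader loop might call next_data_offset only after all_zeros reports that the buffer just read held nothing but NUL bytes, which is the case where a hole is likely.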
|
* src/grep.c, src/grep.h (enum textbin): Move to grep.h.
(input_textbin, validated_boundary): New vars.
* src/grep.c (grepbuf, grep): Initialize them.
* src/pcresearch.c (Pexecute): Do a multiline search
when the input is known to be free of encoding errors.
Quickly discard bytes that are obviously encoding errors.
Quickly match empty strings.
|
* src/pcresearch.c (jit_stack): No longer static.
|
* NEWS, doc/grep.texi (File and Directory Selection):
Document this change.
* src/grep.c (zap_nuls): New function.
(grep): Use it.
* tests/null-byte: Relax to allow new behavior.
|
This avoids a problem when using grep -z in a Windows-1252 locale.
Plus, it lets 'grep -z' run a bit faster.
* NEWS: Document this.
* src/grep.c (buffer_textbin): Don't look for '\200' if -z.
* tests/pcre-z: Test for new behavior.
|
* src/grep.c (enum textbin): New enum.
(textbin_is_binary): New function.
(buffer_textbin, file_textbin, grep): Use them, for clarity.
|
* src/pcresearch.c (NSUB): New top-level constant, replacing
'nsub' within Pexecute.
(Pcompile, Pexecute): Use it.
(Pexecute): Don't assume sub[1] is zero after a PCRE_ERROR_BADUTF8
match failure.
* tests/pcre-invalid-utf8-input: Test for this bug.
|
* src/pcresearch.c (Pcompile): Do not assume that
PCRE_STUDY_JIT_COMPILE is defined.
(empty_match): Define on all platforms.
|
* src/grep.c (fgrep_to_grep_pattern): Use mb_clen here, too.
|
* cfg.mk (_gl_TS_unmarked_extern_functions): New var,
to bypass the tight_scope false alarms on mb_clen and to_uchar.
|
* src/grep.c (buffer_textbin, contains_encoding_error):
Use mb_clen for speed.
(buffer_textbin): Bypass mb_clen in unibyte locales.
(main): Always initialize the cache, since it's sometimes used in
unibyte locales now. Initialize it before contains_encoding_error
might be called.
* src/search.h (SEARCH_INLINE): New macro.
(mbclen_cache): Now extern decl.
(mb_clen): New inline function.
* src/searchutils.c (SEARCH_INLINE, SYSTEM_INLINE): Define.
(mbclen_cache): Now extern.
(build_mbclen_cache): Put 1 into the cache when mbrlen returns 0.
(mb_goback): Use mb_clen for speed, and rely on it returning nonzero.
* src/system.h (SYSTEM_INLINE): New macro.
(to_uchar): Use it.
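
The cache-based length lookup can be sketched as below. This is an approximation under stated assumptions: the element type of the cache, the (size_t) -2 fallback test, and the mb_clen signature are illustrative and may differ from src/search.h.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Per-byte cache of mbrlen results for one-byte inputs.  */
static size_t mbclen_cache[UCHAR_MAX + 1];

static void
build_mbclen_cache (void)
{
  for (int i = 0; i <= UCHAR_MAX; i++)
    {
      char c = i;
      mbstate_t mbs;
      memset (&mbs, 0, sizeof mbs);
      size_t len = mbrlen (&c, 1, &mbs);
      /* mbrlen returns 0 for the null byte; store 1 instead so that
         callers always advance by at least one byte.  */
      mbclen_cache[i] = len ? len : 1;
    }
}

/* Return the length of the character starting at S, where S < END.
   Single-byte cases come straight from the cache; only a byte that
   starts an incomplete sequence ((size_t) -2) needs a real mbrlen.  */
static size_t
mb_clen (char const *s, char const *end, mbstate_t *mbs)
{
  size_t len = mbclen_cache[(unsigned char) *s];
  return len == (size_t) -2 ? mbrlen (s, end - s, mbs) : len;
}
```

Because the cache never holds 0, a loop of the form `p += mb_clen (p, end, &mbs)` cannot stall on a null byte, which is what mb_goback relies on.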
|
glibc has a bug where mbrlen and mbrtowc mishandle length-0 inputs.
Working around it in gnulib slows grep down, so disable the tests for it
and make sure grep works even if the bug is present.
* bootstrap.conf (avoided_gnulib_modules): Add mbrtowc-tests.
* configure.ac (gl_cv_func_mbrtowc_empty_input): Assume yes.
* src/searchutils.c (mb_next_wc): Don't invoke mbrtowc on empty input.
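
A minimal sketch of the guard, assuming mb_next_wc takes a pointer range and returns WEOF when no character can be decoded; the exact body in src/searchutils.c may differ.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Return the wide character at CUR, or WEOF if the range CUR..END is
   empty or does not start with a complete, valid character.  */
static wint_t
mb_next_wc (char const *cur, char const *end)
{
  /* Never hand mbrtowc a length of 0: a buggy libc may mishandle it.  */
  if (cur == end)
    return WEOF;

  wchar_t wc;
  mbstate_t mbs;
  memset (&mbs, 0, sizeof mbs);
  size_t n = mbrtowc (&wc, cur, end - cur, &mbs);
  return n == (size_t) -1 || n == (size_t) -2 ? WEOF : (wint_t) wc;
}
```

Checking for the empty range in the caller keeps the result well defined whether or not the C library has the length-0 bug.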
|
* NEWS:
* doc/grep.texi (File and Directory Selection):
Document this.
* src/grep.c (buffer_encoding, buffer_textbin): New functions.
(file_textbin): Rename from file_is_binary. Now returns 3-way value.
All callers changed.
(file_textbin, grep): Check the input more carefully for text vs
binary data.
(contains_encoding_error): Remove; use replaced by buffer_encoding.
* tests/backref-multibyte-slow:
* tests/high-bit-range:
* tests/invalid-multibyte-infloop:
Use -a, since the input is now considered to be binary.
* tests/invalid-multibyte-infloop: Add a check for new behavior.
|
* src/grep.c (show_version, suppress_errors, only_matching)
(align_tabs, match_icase, match_words, match_lines, errseen)
(write_error_seen, is_device_mode, usable_st_size)
(file_is_binary, skipped_file, reset, fillbuf, out_quiet)
(out_line, out_byte, count_matches, no_filenames, line_buffered)
(done_on_match, exit_on_match, print_line_head, prline, grep)
(grepdirent, grepfile, grepdesc, grep_command_line_arg)
(get_nondigit_option, main): Use bool for boolean.
(print_line_head, prline): Use char for byte.
* src/grep.h: Include <stdbool.h>, and adjust decls to match
changes in grep.c.
|
* src/pcresearch.c (empty_match): New var.
(Pcompile): Set it.
(Pexecute): Use it.
|
* src/grep.c (do_execute): Remove. Caller now uses 'execute'.
* src/pcresearch.c (Pexecute): Improve comment about this.
|
* src/pcresearch.c (Pcompile):
libpcre supports only unibyte and UTF-8 locales,
so report an error and exit if used in other locales.
* NEWS: Mention this.
* tests/euc-mb: Test this.
|
* NEWS (Changes in behavior): Move the note about GREP_OPTIONS
from the 2.20 section into the section for the upcoming release.
|
* NEWS:
* doc/grep.in.1 (ENVIRONMENT_VARIABLES):
* doc/grep.texi (Environment Variables):
Document that GREP_OPTIONS is obsolescent now.
* src/grep.c (main): Warn if GREP_OPTIONS is used.
* tests/r-dot, tests/skip-device: Don't use GREP_OPTIONS.
|
* README (KNOWN BUGS):
* doc/grep.in.1:
* doc/grep.texi (Reporting Bugs): Document this.
|
* tests/pcre-invalid-utf8-input: Add a test for that.
|
* src/pcresearch.c (Pexecute): Use PCRE_NOTEOL when matching
initial substrings of a line.
|
* tests/triple-backref: New file.
* tests/Makefile.am (TESTS): Add it.
(XFAIL_TESTS): List it as a known, always-failing test.
Based on the bug report from Paul Eggert:
https://sourceware.org/bugzilla/show_bug.cgi?id=17356
|
* Makefile.am (EXTRA_DIST): Add .mailmap.
|
* src/pcresearch.c (Pexecute): Don't assume that a pcre_exec
that returns PCRE_ERROR_NOMATCH leaves its sub argument alone.
This assumption is false for libpcre-3 version 8.31-2ubuntu2.
|
Problem reported by Santiago Vila in: http://bugs.gnu.org/18266
* NEWS: Mention this.
* src/pcresearch.c (Pexecute): Treat UTF-8 encoding errors
as non-matching data, instead of exiting 'grep'.
* tests/pcre-infloop: grep now exits with status 1, not 2.
* tests/pcre-invalid-utf8-input: grep now exits with status 0, not 2.
|
undossify_input bug reported by Vincent Lefevre in:
http://bugs.gnu.org/18269
* src/dosbuf.c (undossify_input): Return size_t, not int.
* src/grep.c (fillbuf): Work portably even if safe_read returns a
value greater than SSIZE_MAX, e.g., if there's an I/O error.
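
The portability point can be illustrated with a small sketch. Here read_or_error is a hypothetical, simplified stand-in for gnulib's safe_read (no EINTR retry), which reports failure as (size_t) -1; fill is likewise a hypothetical caller, not grep's fillbuf.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for safe_read: return the number of bytes
   read, or (size_t) -1 on error.  */
static size_t
read_or_error (int fd, void *buf, size_t count)
{
  ssize_t n = read (fd, buf, count);
  return n < 0 ? (size_t) -1 : (size_t) n;
}

/* Read up to SIZE bytes into BUF.  On success store the count in
   *FILLED and return true; on error return false and leave *FILLED
   alone.  The result stays in a size_t and is compared against the
   error value, rather than being converted to a signed type where a
   value above SSIZE_MAX would not be representable.  */
static bool
fill (int fd, char *buf, size_t size, size_t *filled)
{
  size_t n = read_or_error (fd, buf, size);
  if (n == (size_t) -1)
    return false;
  *filled = n;
  return true;
}
```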
|
Reported by Benno Schulenberg in: http://bugs.gnu.org/18185
* doc/grep.texi (Environment Variables): Document LANGUAGE.
|
Reported by Benno Schulenberg in: http://bugs.gnu.org/18184
* doc/grep.texi: Avoid @code in favor of @env, or of nothing at all.
|
Problem reported by Hugues Andreux in: http://bugs.gnu.org/17763
* doc/grep.texi (File and Directory Selection): Be more careful
about documenting the interaction between recursive searching,
--include, --exclude, and --exclude-dir.
|
* cfg.mk (sc_long_lines): New rule, from coreutils; exempt tests/*
* src/grep.c (usage): Tweak -F wording to shorten a line.
Correct grammar in a comment.
Split the --exclude-from=... description to fit within 80 columns.
Use emit_bug_reporting_address, eliminating another long line.
* src/dfa.c: Split long lines. No semantic change.
* doc/grep.texi: Likewise.
* tests/include-exclude: Split a long line.
* tests/backref: Split long lines.
* tests/empty: Likewise.
* tests/fmbtest: Likewise.
|
* HACKING: Update from coreutils.
|
* Makefile.am (THANKS): New rule.
* THANKS.in: New file.
* THANKS: Remove. Now it's generated from the combination of
THANKS.in and git logs.
* .mailmap: New file.
* cfg.mk (sc_THANKS_in_duplicates): New syntax-check rule, from
coreutils.
* .gitignore: Add THANKS.
* thanks-gen: New file, from coreutils.
|
Problem reported by Nathan Weeks in: http://bugs.gnu.org/17856
* src/grep.c (Ecompile): Also specify RE_UNMATCHED_RIGHT_PAREN_ORD.
* doc/grep.texi (Fundamental Structure), NEWS: Document this.
* tests/ere.tests: Add a couple of tests for this.
* tests/spencer1.tests: Fix exit status.
|
This allows the use of --enable-gcc-warnings on Gentoo and Ubuntu.
See: http://bugs.gnu.org/17793
* configure.ac (WERROR_CFLAGS): Avoid -Wstack-protector.
This can be worked around, but the cure is worse than the disease.
|
|
This led to problems, such as the prompt "mv: try to overwrite
'egrep', overriding mode 0555 (r-xr-xr-x)? " during a build.
It can be worked around, but the cure is worse than the disease;
making output files read-only is more trouble than it's worth.
* doc/Makefile.am (grep.1, egrep.1, fgrep.1):
* lib/Makefile.am (colorize.c):
* src/Makefile.am (egrep fgrep):
Don't make output files read-only. Prefer separate commands to
'&&' when either will do.
|
* grep.spec: Remove; obsolete and evidently not used.
|
* bootstrap.conf (gnulib_modules): Add fdl.
* doc/fdl.texi: Remove, as this now comes from gnulib.
* doc/.gitignore: Update to match current sources.
|
* src/Makefile.am (egrep fgrep): chmod a=rx generated files,
and remove $@-t before attempting to redirect to it, in case it
is read-only.
|
* lib/Makefile.am (colorize.c): Don't redirect directly to target, $@.
Otherwise, we could create a corrupt colorize.c file with a
timestamp that indicates it is up to date.
Also, make the generated file read-only.
|
* src/dfa.c (enlist): Undo part of previous change that doesn't
look correct and doesn't help performance much anyway.
|
Problem reported by Norihiro Tanaka in: http://bugs.gnu.org/17700
* NEWS: Document this.
* bootstrap.conf (gnulib_modules): Add strstr.
* src/dfa.c (istrstr): Remove.
(enlist): Use strstr instead. Wait until we need memory before
allocating it; this can save an unnecessary allocate and free.
|
* NEWS: Add header line for next release.
* .prev-version: Record previous version.
* cfg.mk (old_NEWS_hash): Auto-update.
|
* NEWS: Record release date.
|
With --max-count=N (-m N), grep is supposed to stop reading input
after it has found the Nth match. However, a recent context-related
change made it so grep would always read to end of file.
* src/grep.c (prtext): Don't let a negative "out_after" value
make "pending" line count negative.
* tests/max-count-overread: New test, for this.
* tests/Makefile.am (TESTS): Add it.
* NEWS (Bug fixes): Mention it.
* THANKS: Add names of two recent bug reporters.
This bug was introduced by commit v2.18-139-g5122195.
Reported by Marc Aldorasi in http://bugs.gnu.org/17640.
|
Commit v2.19-10-gc32ff67 mistakenly made this change:
-realloc_trans_if_necessary (d, 1);
+realloc_trans_if_necessary (d, 0);
which led to a heap buffer overflow.
* src/dfa.c (dfaexec): Allocate space for one state, as before.
|
grep -E 'a(b$|c$)' would mistakenly match "aa".
* src/dfa.c (dfamust): When resetting 'is' in OR, also reset
'begline' and 'endline' of 'must'.
* NEWS (Bug fixes): Mention it.
This bug was introduced via commit v2.18-85-g2c94326.
Reported by Péter Radics in <http://bugs.gnu.org/17617>.
|
build_state_zero doesn't need the struct dfa to be initialized,
so remove the initialization and simplify.
* src/dfa.c (build_state_zero): Remove.
(dfaexec): Call realloc_trans_if_necessary and build_state directly.
|