Commit message | Author | Age | Files | Lines
...
* grep: port to platforms lacking SEEK_DATA | Paul Eggert | 2014-09-17 | 1 | -6/+11
    Reported by Norihiro Tanaka in: http://bugs.gnu.org/18454#38
    * src/grep.c (SEEK_DATA): Default to SEEK_SET if not defined. (SEEK_HOLE): Move to top level, and default it to SEEK_SET. (file_textbin): Adjust to new default. (fillbuf): Don't bother with SEEK_DATA if it defaults to SEEK_SET.
* grep: skip past holes efficiently | Paul Eggert | 2014-09-17 | 3 | -19/+62
    Take advantage of the relaxed rules for treating non-text bytes in binary data, by efficiently skipping past holes on platforms supporting lseek's SEEK_DATA flag. On one test on a circa-2008 Sun Fire V40z running Solaris 11.2, 'grep x' took 0.009 real-time seconds to scan a holey file of size 9,223,372,036,854,775,802 bytes, for a nominal scan rate of 1 ZB/s. grep 2.20's scan rate on this platform was 843 MB/s, so this is a speedup by a factor of 1.2 trillion. The speedup factor is not as great on GNU/Linux hosts, due to what appear to be SEEK_DATA inefficiencies, but presumably this will be cleared up in time.
    * NEWS: Document this.
    * src/grep.c, src/grep.h (eolbyte): Now char, not unsigned char. This is for compatibility with the rest of the code. The old (performance?) reasons for 'unsigned char' are now moot.
    * src/grep.c (skip_nuls, skip_empty_lines, seek_data_failed): New static vars. (totalnl): Move up, since it's about input, not output, and fillbuf now uses it. (add_count): Move up, since fillbuf now uses it. (all_zeros): New function. (fillbuf): Use SEEK_DATA to skip past holes efficiently, on systems that support this. (grep, main): Set the new static vars.
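The two ideas in the entry above (detecting an all-NUL buffer, and asking the kernel where the next data region starts) can be sketched as follows. This is an illustration only, not the code in src/grep.c: the names buffer_is_all_zeros and next_data_offset are hypothetical, and the fallback of SEEK_DATA to SEEK_SET simply mirrors the default described in the log.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE            /* exposes SEEK_DATA on glibc */
#endif
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback as described in the log: treat SEEK_DATA as SEEK_SET
   on platforms that do not define it.  */
#ifndef SEEK_DATA
# define SEEK_DATA SEEK_SET
#endif

/* Return true if all SIZE bytes at BUF are NUL (hypothetical helper).  */
static bool
buffer_is_all_zeros (char const *buf, size_t size)
{
  for (size_t i = 0; i < size; i++)
    if (buf[i] != 0)
      return false;
  return true;
}

/* Return the offset of the first data byte at or after CUR in FD,
   or -1 if hole skipping is unavailable or lseek fails (for example,
   when there is no data after CUR).  Hypothetical helper.  */
static off_t
next_data_offset (int fd, off_t cur)
{
  assert (0 <= cur);
  if (SEEK_DATA == SEEK_SET)
    return -1;                  /* platform lacks SEEK_DATA */
  return lseek (fd, cur, SEEK_DATA);
}
```

A reader loop would presumably call next_data_offset only after seeing an all-NUL buffer, and fall back to ordinary reads whenever it returns -1.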
* grep: improve -P performance in typical cases | Paul Eggert | 2014-09-17 | 3 | -31/+130
    * src/grep.c, src/grep.h (enum textbin): Move to grep.h. (input_textbin, validated_boundary): New vars.
    * src/grep.c (grepbuf, grep): Initialize them.
    * src/pcresearch.c (Pexecute): Do a multiline search when the input is known to be free of encoding errors. Quickly discard bytes that are obviously encoding errors. Quickly match empty strings.
* grep: minor -P speedup with jit_stack | Paul Eggert | 2014-09-17 | 1 | -4/+2
    * src/pcresearch.c (jit_stack): No longer static.
* grep: non-text bytes in binary data may be treated as line ends | Paul Eggert | 2014-09-17 | 4 | -3/+34
    * NEWS, doc/grep.texi (File and Directory Selection): Document this change.
    * src/grep.c (zap_nuls): New function. (grep): Use it.
    * tests/null-byte: Relax to allow new behavior.
* grep: -z no longer considers '\200' to be binary data | Paul Eggert | 2014-09-17 | 3 | -9/+9
    This avoids a problem when using grep -z in a Windows-1252 locale. Plus, it lets 'grep -z' run a bit faster.
    * NEWS: Document this.
    * src/grep.c (buffer_textbin): Don't look for '\200' if -z.
    * tests/pcre-z: Test for new behavior.
* grep: refactor binary-vs-unknown-vs-text flags for clarity | Paul Eggert | 2014-09-17 | 1 | -31/+55
    * src/grep.c (enum textbin): New enum. (textbin_is_binary): New function. (buffer_textbin, file_textbin, grep): Use them, for clarity.
* grep: fix -P speedup bug with empty match | Paul Eggert | 2014-09-16 | 2 | -13/+24
    * src/pcresearch.c (NSUB): New top-level constant, replacing 'nsub' within Pexecute. (Pcompile, Pexecute): Use it. (Pexecute): Don't assume sub[1] is zero after a PCRE_ERROR_BADUTF8 match failure.
    * tests/pcre-invalid-utf8-input: Test for this bug.
* grep: port -P speedup to hosts lacking PCRE_STUDY_JIT_COMPILE | Paul Eggert | 2014-09-16 | 1 | -7/+7
    * src/pcresearch.c (Pcompile): Do not assume that PCRE_STUDY_JIT_COMPILE is defined. (empty_match): Define on all platforms.
* grep: use mbclen cache in one more place | Paul Eggert | 2014-09-16 | 1 | -2/+1
    * src/grep.c (fgrep_to_grep_pattern): Use mb_clen here, too.
* grep: avoid false alarms for mb_clen and to_uchar | Paul Eggert | 2014-09-16 | 1 | -0/+4
    * cfg.mk (_gl_TS_unmarked_extern_functions): New var, to bypass the tight_scope false alarms on mb_clen and to_uchar.
* grep: use mbclen cache more effectively | Paul Eggert | 2014-09-16 | 4 | -30/+62
    * src/grep.c (buffer_textbin, contains_encoding_error): Use mb_clen for speed. (buffer_textbin): Bypass mb_clen in unibyte locales. (main): Always initialize the cache, since it's sometimes used in unibyte locales now. Initialize it before contains_encoding_error might be called.
    * src/search.h (SEARCH_INLINE): New macro. (mbclen_cache): Now extern decl. (mb_clen): New inline function.
    * src/searchutils.c (SEARCH_INLINE, SYSTEM_INLINE): Define. (mbclen_cache): Now extern. (build_mbclen_cache): Put 1 into the cache when mbrlen returns 0. (mb_goback): Use mb_len for speed, and rely on it returning nonzero.
    * src/system.h (SYSTEM_INLINE): New macro. (to_uchar): Use it.
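A minimal sketch of the caching idea in the entry above, assuming a per-byte table filled once from mbrlen. It is not the code in src/searchutils.c: the names len_cache, build_len_cache, and char_len are hypothetical stand-ins. The table records, for each possible first byte, the character length mbrlen reports for that byte alone, storing 1 where mbrlen returns 0 so that callers can rely on a nonzero result.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical cache: first byte -> 1 (complete single-byte character),
   -1 (invalid byte), or -2 (start of a longer multibyte character).  */
static signed char len_cache[UCHAR_MAX + 1];

/* Fill the cache for the current locale.  Call after setlocale.  */
static void
build_len_cache (void)
{
  for (int i = 0; i <= UCHAR_MAX; i++)
    {
      char c = (char) i;
      mbstate_t mbs;
      memset (&mbs, 0, sizeof mbs);
      size_t n = mbrlen (&c, 1, &mbs);
      if (n == (size_t) -1)
        len_cache[i] = -1;
      else if (n == (size_t) -2)
        len_cache[i] = -2;
      else
        len_cache[i] = 1;       /* mbrlen returned 1, or 0 for NUL */
    }
}

/* Length of the character starting at S (at most N bytes), using the
   cache when the first byte alone decides the answer.  */
static size_t
char_len (char const *s, size_t n, mbstate_t *mbs)
{
  assert (0 < n);
  signed char cached = len_cache[(unsigned char) *s];
  if (cached == -2)
    return mbrlen (s, n, mbs);  /* multibyte lead byte: ask libc */
  return (size_t) cached;       /* 1, or (size_t) -1 if invalid */
}
```

In a unibyte locale every lookup is answered from the table, which is presumably why the log notes that the cache must always be initialized.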
* grep: improve performance for older glibc | Paul Eggert | 2014-09-16 | 3 | -1/+8
    glibc has a bug where mbrlen and mbrtowc mishandle length-0 inputs. Working around it in gnulib slows grep down, so disable the tests for it and make sure grep works even if the bug is present.
    * bootstrap.conf (avoided_gnulib_modules): Add mbrtowc-tests.
    * configure.ac (gl_cv_func_mbrtowc_empty_input): Assume yes.
    * src/searchutils.c (mb_next_wc): Don't invoke mbrtowc on empty input.
* grep: treat a file as binary if its prefix contains encoding errors | Paul Eggert | 2014-09-16 | 6 | -45/+106
    * NEWS:
    * doc/grep.texi (File and Directory Selection): Document this.
    * src/grep.c (buffer_encoding, buffer_textbin): New functions. (file_textbin): Rename from file_is_binary. Now returns 3-way value. All callers changed. (file_textbin, grep): Check the input more carefully for text vs binary data. (contains_encoding_error): Remove; use replaced by buffer_encoding.
    * tests/backref-multibyte-slow:
    * tests/high-bit-range:
    * tests/invalid-multibyte-infloop: Use -a, since the input is now considered to be binary.
    * tests/invalid-multibyte-infloop: Add a check for new behavior.
* grep: use bool for boolean in grep.c | Paul Eggert | 2014-09-16 | 2 | -116/+124
    * src/grep.c (show_version, suppress_errors, only_matching) (align_tabs, match_icase, match_words, match_lines, errseen) (write_error_seen, is_device_mode, usable_st_size) (file_is_binary, skipped_file, reset, fillbuf, out_quiet) (out_line, out_byte, count_matches, no_filenames, line_buffered) (done_on_match, exit_on_match, print_line_head, prline, grep) (grepdirent, grepfile, grepdesc, grep_command_line_arg) (get_nondigit_option, main): Use bool for boolean. (print_line_head, prline): Use char for byte.
    * src/grep.h: Include <stdbool.h>, and adjust decls to match changes in grep.c.
* grep: speed up -P on files containing many multibyte errors | Paul Eggert | 2014-09-16 | 1 | -8/+18
    * src/pcresearch.c (empty_match): New var. (Pcompile): Set it. (Pexecute): Use it.
* grep: remove/refactor unnecessary code about line splitting | Paul Eggert | 2014-09-16 | 2 | -46/+6
    * src/grep.c (do_execute): Remove. Caller now uses 'execute'.
    * src/pcresearch.c (Pexecute): Improve comment about this.
* grep: diagnose -P in non-UTF-8 multibyte locale | Paul Eggert | 2014-09-12 | 3 | -2/+13
    * src/pcresearch.c (Pcompile): libpcre supports only unibyte and UTF-8 locales, so report an error and exit if used in other locales.
    * NEWS: Mention this.
    * tests/euc-mb: Test this.
* doc: move NEWS note about GREP_OPTIONS into proper section | Jim Meyering | 2014-09-12 | 1 | -3/+5
    * NEWS (Changes in behavior): Move the note about GREP_OPTIONS from the 2.20 section into the section for the upcoming release.
* grep: make GREP_OPTIONS obsolescent | Paul Eggert | 2014-09-12 | 6 | -33/+18
    * NEWS:
    * doc/grep.in.1 (ENVIRONMENT_VARIABLES):
    * doc/grep.texi (Environment Variables): Document that GREP_OPTIONS is obsolescent now.
    * src/grep.c (main): Warn if GREP_OPTIONS is used.
    * tests/r-dot, tests/skip-device: Don't use GREP_OPTIONS.
* doc: bug tracker has moved to debbugs.gnu.org | Paul Eggert | 2014-09-11 | 3 | -9/+9
    * README (KNOWN BUGS):
    * doc/grep.in.1:
    * doc/grep.texi (Reporting Bugs): Document this.
* grep: fix false matches with -P '...$' and invalid UTF-8 | Paul Eggert | 2014-09-11 | 1 | -1/+4
    * tests/pcre-invalid-utf8-input: Add a test for that.
* grep: fix false matches with -P '...$' and invalid UTF-8 | Paul Eggert | 2014-09-11 | 1 | -1/+2
    * src/pcresearch.c (Pexecute): Use PCRE_NOTEOL when matching initial substrings of a line.
* tests: add expect-to-fail test for a glibc regexp bug | Jim Meyering | 2014-09-10 | 2 | -1/+37
    * tests/triple-backref: New file.
    * tests/Makefile.am (TESTS): Add it. (XFAIL_TESTS): List it as a known, always-failing test.
    Based on the bug report from Paul Eggert: https://sourceware.org/bugzilla/show_bug.cgi?id=17356
* maint: avoid distcheck failure | Jim Meyering | 2014-09-10 | 1 | -0/+1
    * Makefile.am (EXTRA_DIST): Add .mailmap.
* grep: port recent fix to older pcre version | Paul Eggert | 2014-09-10 | 1 | -2/+4
    * src/pcresearch.c (Pexecute): Don't assume that a pcre_exec that returns PCRE_ERROR_NOMATCH leaves its sub argument alone. This assumption is false for libpcre-3 version 8.31-2ubuntu2.
* grep: -P now treats invalid UTF-8 input as non-matching | Paul Eggert | 2014-09-09 | 4 | -44/+33
    Problem reported by Santiago Vila in: http://bugs.gnu.org/18266
    * NEWS: Mention this.
    * src/pcresearch.c (Pexecute): Treat UTF-8 encoding errors as non-matching data, instead of exiting 'grep'.
    * tests/pcre-infloop: grep now exits with status 1, not 2.
    * tests/pcre-invalid-utf8-input: grep now exits with status 0, not 2.
* grep: fix integer-width bugs in undossify_input etc. | Paul Eggert | 2014-08-14 | 2 | -8/+8
    undossify_input bug reported by Vincent Lefevre in: http://bugs.gnu.org/18269
    * src/dosbuf.c (undossify_input): Return size_t, not int.
    * src/grep.c (fillbuf): Work portably even if safe_read returns a value greater than SSIZE_MAX, e.g., if there's an I/O error.
* doc: document LANGUAGE | Paul Eggert | 2014-08-03 | 1 | -3/+13
    Reported by Benno Schulenberg in: http://bugs.gnu.org/18185
    * doc/grep.texi (Environment Variables): Document LANGUAGE.
* doc: prefer @env to @code | Paul Eggert | 2014-08-03 | 1 | -16/+16
    Reported by Benno Schulenberg in: http://bugs.gnu.org/18184
    * doc/grep.texi: Avoid @code in favor of @env, or of nothing at all.
* doc: Document -r vs --exclude more carefully. | Paul Eggert | 2014-07-11 | 1 | -9/+11
    Problem reported by Hugues Andreux in: http://bugs.gnu.org/17763
    * doc/grep.texi (File and Directory Selection): Be more careful about documenting the interaction between recursive searching, --include, --exclude, and --exclude-dir.
* maint: split long lines, and enforce the 80-column limit | Jim Meyering | 2014-06-27 | 10 | -54/+89
    * cfg.mk (sc_long_lines): New rule, from coreutils; exempt tests/*
    * src/grep.c (usage): Tweak -F wording to shorten a line. Correct grammar in a comment. Split the --exclude-file=... description to fit within 80 columns. Use emit_bug_reporting_address, eliminating another long line.
    * src/dfa.c: Split long lines. No semantic change.
    * doc/grep.texi: Likewise.
    * tests/include-exclude: Split a long line.
    * tests/backref: Split long lines.
    * tests/empty: Likewise.
    * tests/fmbtest: Likewise.
* doc: update HACKING | Jim Meyering | 2014-06-27 | 1 | -16/+39
    * HACKING: Update from coreutils.
* maint: generate distributed THANKS from VC'd THANKS.in | Jim Meyering | 2014-06-27 | 7 | -108/+163
    * Makefile.am (THANKS): New rule.
    * THANKS.in: New file.
    * THANKS: Remove. Now it's generated from the combination of THANKS.in and git logs.
    * .mailmap: New file.
    * cfg.mk (sc_THANKS_in_duplicates): New syntax-check rule, from coreutils.
    * .gitignore: Add THANKS.
    * thanks-gen: New file, from coreutils.
* grep: with -E, unmatched ')' matches itself | Paul Eggert | 2014-06-27 | 5 | -2/+12
    Problem reported by Nathan Weeks in: http://bugs.gnu.org/17856
    * src/grep.c (Ecompile): Also specify RE_UNMATCHED_RIGHT_PAREN_ORD.
    * doc/grep.texi (Fundamental Structure), NEWS: Document this.
    * tests/ere.tests: Add a couple of tests for this.
    * tests/spencer1.tests: Fix exit status.
* build: avoid -Wstack-protector | Paul Eggert | 2014-06-17 | 1 | -0/+1
    This allows the use of --enable-gcc-warnings on Gentoo and Ubuntu. See: http://bugs.gnu.org/17793
    * configure.ac (WERROR_CFLAGS): Avoid -Wstack-protector. This can be worked around, but the cure is worse than the disease.
* build: don't make output files read-only | Paul Eggert | 2014-06-17 | 3 | -11/+7
    This led to problems, such as the prompt "mv: try to overwrite 'egrep', overriding mode 0555 (r-xr-xr-x)? " during a build. It can be worked around, but the cure is worse than the disease; making output files read-only is more trouble than it's worth.
    * doc/Makefile.am (grep.1, egrep.1, fgrep.1):
    * lib/Makefile.am (colorize.c):
    * src/Makefile.am (egrep fgrep): Don't make output files read-only. Prefer separate commands to '&&' when either will do.
* maint: remove grep.spec | Paul Eggert | 2014-06-08 | 1 | -79/+0
    * grep.spec: Remove; obsolete and evidently not used.
* doc: use gnulib fdl module | Paul Eggert | 2014-06-07 | 3 | -507/+2
    * bootstrap.conf (gnulib_modules): Add fdl.
    * doc/fdl.texi: Remove, as this now comes from gnulib.
    * doc/.gitignore: Update to match current sources.
* build: improve rule to generate egrep+fgrep scripts | Jim Meyering | 2014-06-06 | 1 | -10/+11
    * src/Makefile.am (egrep fgrep): chmod a=rx generated files, and remove $@-t before attempting to redirect to it, in case it is read-only.
* build: don't redirect directly to $@ | Jim Meyering | 2014-06-06 | 1 | -1/+4
    * lib/Makefile.am (colorize.c): Don't redirect directly to target, $@. Otherwise, we could create a corrupt colorize.c file with a timestamp that indicates it is up to date. Also, make the generated file read-only.
* grep: undo part of previous change | Paul Eggert | 2014-06-05 | 1 | -3/+6
    * src/dfa.c (enlist): Undo part of previous change that doesn't look correct and doesn't help performance much anyway.
* grep: use system strstr if available and fast | Paul Eggert | 2014-06-05 | 3 | -21/+10
    Problem reported by Norihiro Tanaka in: http://bugs.gnu.org/17700
    * NEWS: Document this.
    * bootstrap.conf (gnulib_modules): Add strstr.
    * src/dfa.c (istrstr): Remove. (enlist): Use strstr instead. Wait until we need memory before allocating it; this can save an unnecessary allocate and free.
* build: update gnulib submodule to latest | Paul Eggert | 2014-06-05 | 1 | -0/+0
* maint: post-release administrivia | Jim Meyering | 2014-06-03 | 3 | -2/+5
    * NEWS: Add header line for next release.
    * .prev-version: Record previous version.
    * cfg.mk (old_NEWS_hash): Auto-update.
* version 2.20 (tag: v2.20) | Jim Meyering | 2014-06-03 | 1 | -1/+1
    * NEWS: Record release date.
* grep: fix --max-count=N (-m N) to stop reading after Nth match | Jim Meyering | 2014-05-30 | 5 | -1/+24
    With --max-count=N (-m N), grep is supposed to stop reading input after it has found the Nth match. However, a recent context-related change made it so grep would always read to end of file.
    * src/grep.c (prtext): Don't let a negative "out_after" value make "pending" line count negative.
    * tests/max-count-overread: New test, for this.
    * tests/Makefile.am (TESTS): Add it.
    * NEWS (Bug fixes): Mention it.
    * THANKS: Add names of two recent bug reporters.
    This bug was introduced by commit v2.18-139-g5122195. Reported by Marc Aldorasi in http://bugs.gnu.org/17640.
* dfa: fix off-by-one under-allocation from recent change | Jim Meyering | 2014-05-29 | 1 | -1/+1
    Commit v2.19-10-gc32ff67 mistakenly made this change:
      -realloc_trans_if_necessary (d, 1);
      +realloc_trans_if_necessary (d, 0);
    which led to a heap buffer overflow.
    * src/dfa.c (dfaexec): Allocate space for one state, as before.
* dfa: fix bug with regex containing multiple begin/end-line constraints | Norihiro Tanaka | 2014-05-28 | 4 | -4/+46
    grep -E 'a(b$|c$)' would mistakenly match "aa".
    * src/dfa.c (dfamust): When resetting 'is' in OR, also reset 'begline' and 'endline' of 'must'.
    * NEWS (Bug fixes): Mention it.
    This bug was introduced via commit v2.18-85-g2c94326. Reported by Péter Radics in <http://bugs.gnu.org/17617>.
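The expected behavior for the pattern in the entry above can be checked independently of grep with the POSIX regcomp/regexec interface. The sketch below does not exercise src/dfa.c at all; it only illustrates what a correct extended-regex matcher should report, and ere_matches is a hypothetical helper name.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if LINE matches the POSIX extended regex PATTERN.
   Hypothetical helper; a pattern that fails to compile counts as
   "no match" to keep the sketch short.  */
static bool
ere_matches (char const *pattern, char const *line)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  bool matched = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return matched;
}
```

With the pattern a(b$|c$), a correct matcher accepts lines ending in "ab" or "ac" and rejects "aa", which is the line the buggy must-string computation wrongly let through.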
* dfa: simplify building initial state | Norihiro Tanaka | 2014-05-26 | 1 | -17/+4
    build_state_zero doesn't need the struct dfa to be initialized, so remove the initialization and simplify.
    * src/dfa.c (build_state_zero): Remove. (dfaexec): Call realloc_trans_if_necessary and build_state directly.