Diffstat (limited to 'ChangeLog')
-rw-r--r-- | ChangeLog | 9075 |
1 files changed, 9075 insertions, 0 deletions
diff --git a/ChangeLog b/ChangeLog
new file mode 100644
index 0000000..6500478
--- /dev/null
+++ b/ChangeLog
@@ -0,0 +1,9075 @@
+2016-04-21  Jim Meyering  <meyering@fb.com>
+
+	version 2.25
+	* NEWS: Record release date.
+
+2016-04-19  Paul Eggert  <eggert@cs.ucla.edu>
+
+	dfa: remove dependency on btowc
+	MirOS BSD btowc is a macro that (when GCC is being used) hardcodes
+	btowc (0x80) == WEOF regardless of locale, which contradicts
+	future POSIX in the C locale. Instead of bothering to develop a
+	Gnulib workaround for the btowc incompatibility, use mbrtowc,
+	which we are using elsewhere and fixing anyway, and are caching so
+	it is fast here. Problem reported by Nelson H. F. Beebe via Jim
+	Meyering in: http://bugs.gnu.org/23269#14
+	* bootstrap.conf (gnulib_modules): Remove btowc.
+	* src/dfa.c (struct dfa): Remove mbrtowc_cache member, replacing with ...
+	(mbrtowc_cache): ... this new static var. All uses changed.
+	(dfambcache): Remove; now done by setsyntax. Call removed.
+	(is_valid_unibyte_character): Remove.
+	(IS_WORD_CONSTITUENT): Remove this macro, replacing it with ...
+	(unibyte_word_constituent): ... this new function. It uses
+	mbrtowc_cache rather than btowc.
+	(dfasyntax): Initialize mbrtowc_cache before using it.
+
+2016-04-10  Paul Eggert  <eggert@cs.ucla.edu>
+
+	grep: minor doc tweaks inspired by Debian
+	Problem reported by Santiago Ruano Rincón in: http://bugs.gnu.org/22911
+	* doc/grep.in.1:
+	* doc/grep.texi (Matching Control, grep Programs)
+	(Regular Expressions):
+	Document -e, -f, and PCRE more carefully.
+
+2016-04-10  Jim Meyering  <meyering@fb.com>
+
+	maint: remove unused mbtoupper function
+	* src/searchutils.c (mbtoupper): Remove now-unused function.
+	Also remove inclusion of <assert.h>, since this change removed
+	the final use of assert.
+	* src/search.h (mbtoupper): Remove declaration.
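The 2016-04-19 dfa entry replaces btowc with a per-locale table of mbrtowc results. The sketch below illustrates that idea in C. It is not grep's dfa.c: the names mbrtowc_cache and unibyte_word_constituent come from the entry, init_mbrtowc_cache is hypothetical, and the sketch assumes the table is rebuilt after every setlocale call.

```c
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>
#include <wchar.h>

/* mbrtowc_cache[b] is the wide character for the single byte B in the
   current locale, or WEOF if B by itself is not a valid character.  */
static wint_t mbrtowc_cache[UCHAR_MAX + 1];

/* Hypothetical initializer; call it again after any setlocale call.  */
static void
init_mbrtowc_cache (void)
{
  for (int i = 0; i <= UCHAR_MAX; i++)
    {
      char c = (char) i;
      wchar_t wc;
      mbstate_t st;
      memset (&st, 0, sizeof st);
      /* 0 (the null byte) or 1 means C is a complete character;
         (size_t) -1 and (size_t) -2 mean it is not.  */
      size_t n = mbrtowc (&wc, &c, 1, &st);
      mbrtowc_cache[i] = n <= 1 ? (wint_t) wc : WEOF;
    }
}

/* True if byte C is a complete character that is alphanumeric or '_'.
   The validity test consults only the cached mbrtowc results, so it
   cannot disagree with mbrtowc the way a btowc macro can.  */
static bool
unibyte_word_constituent (unsigned char c)
{
  return mbrtowc_cache[c] != WEOF && (isalnum (c) || c == '_');
}
```

Typical use is to call init_mbrtowc_cache () once after setlocale (LC_ALL, ""), then classify bytes with unibyte_word_constituent.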
+
+2016-04-10  Paul Eggert  <eggert@cs.ucla.edu>
+
+	grep: in C locale, all bytes are valid characters
+	This works around glibc bug 19932:
+	https://sourceware.org/bugzilla/show_bug.cgi?id=19932
+	The actual bug fix was the update to the current version of Gnulib.
+	grep problem reported by Björn Jacke in: http://bugs.gnu.org/23234
+	* NEWS: Mention this.
+	* doc/grep.texi (File and Directory Selection): Crossref to LC_*
+	section. Suggest why -a or LC_ALL=C might be useful.
+	(Environment Variables): Mention 'locale -a'.
+	Say that LC_CTYPE also specifies encoding, and that every
+	byte is a valid character in the C or POSIX locale.
+	* tests/c-locale: New test.
+	* tests/Makefile.am (TESTS): Add it.
+
+	build: update gnulib submodule to latest
+
+2016-04-05  Paul Eggert  <eggert@cs.ucla.edu>
+
+	Give another example of binary file processing
+	Problem reported by Shlomi Fish
+	* doc/grep.texi (File and Directory Selection):
+	Document that 'q$' might match 'q' followed by a NUL
+	if --binary-files=binary is in effect.
+
+2016-04-03  Paul Eggert  <eggert@cs.ucla.edu>
+
+	tests: test egrep/fgrep help only if our grep
+	Problem reported by Christian Weisgerber in: http://bugs.gnu.org/23146
+	* tests/Makefile.am (TESTS_ENVIRONMENT):
+	Test egrep and fgrep only if they use our grep.
+
+2016-03-29  Jim Meyering  <meyering@fb.com>
+
+	tests: remove spurious test of egrep
+	* tests/reversed-range-endpoints: Do not test egrep here.
+	There is already a test of grep -E.
+	Prompted by http://bugs.gnu.org/23146
+
+2016-03-23  Paul Eggert  <eggert@cs.ucla.edu>
+
+	grep: -Pz no longer misdiagnoses [^a]
+	Problem reported by Michael Jess.
+	* NEWS: Document this.
+	* src/pcresearch.c (Pcompile): Do not diagnose [^ when [ is unescaped.
+	* tests/pcre: Test for the bug.
+
+2016-03-22  Jim Meyering  <meyering@fb.com>
+
+	maint: move new 'Improvements' blurb into proper section
+	* NEWS (Improvements): Move this new section from within the block
+	for the already-released 2.24 into the proper "next-release" block.
+	Also, retain the 2-blank-line separator between blocks.
+
+2016-03-18  Jim Meyering  <meyering@fb.com>
+
+	maint: avoid spurious "binary file ... matches" in generated THANKS
+	* Makefile.am (THANKS): Don't apply grep to a stream containing
+	NUL bytes. Sync this rule from the one in coreutils: it was missing
+	some improvements.
+	Reported by Bailes Magio in http://bugs.gnu.org/22899
+
+2016-03-18  Paul Eggert  <eggert@cs.ucla.edu>
+
+	grep: -oz now outputs null bytes, not newlines
+	* NEWS: Document this.
+	* doc/grep.texi (Other Options): Clarify that -z affects output
+	as well as input data.
+	* src/grep.c (print_line_middle): Output eolbyte, not newline, if -o.
+	* tests/null-byte: Test -o too.
+	* tests/pcre-context: Adjust test to match new behavior.
+
+2016-03-17  Paul Eggert  <eggert@cs.ucla.edu>
+
+	grep: use errno consistently in write diagnostics
+	Feature request and initial version reported by Assaf Gordon in:
+	http://bugs.gnu.org/23031
+	* NEWS: Document this.
+	* src/grep.c: Include <stdarg.h>.
+	(stdout_errno): New static var.
+	(write_error_seen): Remove; superseded by stdout_errno.
+	All uses changed.
+	(putchar_errno, fputs_errno, printf_errno, fwrite_errno)
+	(fflush_errno): New static functions.
+	(print_filename, print_sep, print_offset, print_line_head)
+	(print_line_middle, print_line_tail, prline, prtext, grep)
+	(grepdesc): Use them.
+	* tests/write-error-msg: New file.
+	* tests/Makefile.am (TESTS): Add it.
+
+2016-03-10  Jim Meyering  <meyering@fb.com>
+
+	maint: post-release administrivia
+	* NEWS: Add header line for next release.
+	* .prev-version: Record previous version.
+	* cfg.mk (old_NEWS_hash): Auto-update.
+
+	version 2.24
+	* NEWS: Record release date.
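The 2016-03-17 entry replaces a write_error_seen flag with stdout_errno, so the eventual "write error" diagnostic can cite the errno captured when a write actually failed. Here is a minimal sketch of that pattern with plain stdio. The wrapper names mirror those listed in the entry, but their bodies and the hypothetical check_stdout helper are illustrative, not grep's source; exit status 2 is assumed as the "trouble" status.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* errno from the most recent failed write to stdout, or 0 if none.  */
static int stdout_errno;

static void
putchar_errno (int c)
{
  if (putchar (c) < 0)
    stdout_errno = errno;
}

static void
fputs_errno (char const *s)
{
  if (fputs (s, stdout) < 0)
    stdout_errno = errno;
}

static void
printf_errno (char const *format, ...)
{
  va_list ap;
  va_start (ap, format);
  if (vprintf (format, ap) < 0)
    stdout_errno = errno;
  va_end (ap);
}

static void
fwrite_errno (void const *ptr, size_t size, size_t nmemb)
{
  if (fwrite (ptr, size, nmemb, stdout) != nmemb)
    stdout_errno = errno;
}

static void
fflush_errno (void)
{
  if (fflush (stdout) != 0)
    stdout_errno = errno;
}

/* Hypothetical: diagnose a saved write error using the errno captured
   at failure time, not whatever value errno happens to hold now.  */
static void
check_stdout (void)
{
  fflush_errno ();
  if (stdout_errno)
    {
      fprintf (stderr, "write error: %s\n", strerror (stdout_errno));
      exit (2);
    }
}
```

Because every output path stores errno at the point of failure, later library calls that clobber errno cannot corrupt the message.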
+
+2016-02-28  Jim Meyering  <meyering@fb.com>
+
+	maint: add dist-check.mk
+	This file augments "make distcheck" rules.
+	* dist-check.mk: New file, from coreutils via gzip.
+	* Makefile.am (EXTRA_DIST): Add it.
+	* cfg.mk: Include it.
+
+2016-02-21  Paul Eggert  <eggert@cs.ucla.edu>
+
+	grep: -Pz is incompatible with ^ and $
+	Problem reported by Sergei Trofimovich in: http://bugs.gnu.org/22655
+	* NEWS: Document this.
+	* src/pcresearch.c (Pcompile): Warn with -Pz and anchors.
+	* tests/pcre: Test new behavior.
+
+2016-02-21  Jim Meyering  <meyering@fb.com>
+
+	tests: test cleanup
+	* tests/z-anchor-newline: Remove test artifact that would write
+	to /t/x.
+
+2016-02-20  Jim Meyering  <meyering@fb.com>
+
+	grep -z: avoid erroneous match with regexp anchor and \n in text
+	* src/dfasearch.c (EGexecute): Clear the newline_anchor bit when
+	eolbyte is not '\n'.
+	* tests/z-anchor-newline: New file.
+	* tests/Makefile.am (TESTS): Add it.
+	* NEWS (Bug fixes): Describe it.
+	Originally reported by Ulrich Mueller in
+	https://bugs.gentoo.org/show_bug.cgi?id=574662
+	Reported to us by Sergei Trofimovich as http://debbugs.gnu.org/22655
+
+	tests: convert "cmd && fail=1" to "returns_ 1 cmd || fail=1"
+	The latter is robust, while the former can silently ignore
+	failure due to signals.
+	* cfg.mk (sc_prohibit_and_fail_1): New rule, copied from coreutils.
+	* tests/long-pattern-perf: Perform the above substitution.
+	* tests/mb-non-UTF8-performance: Likewise.
+	* tests/help-version: Merge from coreutils.
+
+2016-02-09  Jim Meyering  <meyering@fb.com>
+
+	maint: add a check-very-expensive target
+	* Makefile.am (check-very-expensive): New convenience rule,
+	currently merely equivalent to check-expensive.
+
+2016-02-04  Jim Meyering  <meyering@fb.com>
+
+	maint: post-release administrivia
+	* NEWS: Add header line for next release.
+	* .prev-version: Record previous version.
+	* cfg.mk (old_NEWS_hash): Auto-update.
+
+	version 2.23
+	* NEWS: Record release date.
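The 2016-02-20 entry fixes grep -z by clearing the compiled regex's newline_anchor bit when the end-of-line byte is not '\n'. The sketch below shows what that bit controls. It assumes glibc's GNU regex interface (re_set_syntax, re_compile_pattern, re_search, and the newline_anchor field of struct re_pattern_buffer, visible with _GNU_SOURCE). search_bol is a hypothetical helper, not grep code.

```c
#define _GNU_SOURCE
#include <regex.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: offset of the first match of PATTERN (POSIX basic
   syntax) in TEXT, -1 if none, -2 on error.  If ANCHOR_AT_NEWLINE is
   false, '^' and '$' match only at the ends of TEXT, which is what
   grep -z needs when a NUL-terminated record contains '\n'.  */
static int
search_bol (char const *pattern, char const *text, bool anchor_at_newline)
{
  struct re_pattern_buffer buf;
  memset (&buf, 0, sizeof buf);  /* no fastmap, no translate table */
  re_set_syntax (RE_SYNTAX_POSIX_BASIC);
  if (re_compile_pattern (pattern, strlen (pattern), &buf) != NULL)
    return -2;

  /* re_compile_pattern turns newline_anchor on; override it here.  */
  buf.newline_anchor = anchor_at_newline;

  int len = (int) strlen (text);
  int result = re_search (&buf, text, len, 0, len, NULL);
  regfree (&buf);
  return result;
}
```

With the bit set, "^b" finds the 'b' after the embedded newline in "a\nb". With it cleared, the record has no interior line starts, so there is no match.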
+
+2016-02-02  Jim Meyering  <meyering@fb.com>
+
+	gnulib: update to latest
+	Update for this "make distcheck"-fixing change:
+	> verify-tests: also remove stray test-verify.Tpo
+
+2016-02-01  Jim Meyering  <meyering@fb.com>
+
+	tests/null-byte: test another code path
+	* tests/null-byte: Also exercise the case in which there is
+	a match in the block along with the NUL byte.
+
+2016-01-31  Paul Eggert  <eggert@cs.ucla.edu>
+
+	Omit excess "Binary file ... matches"
+	Problem reported in: http://bugs.gnu.org/22461
+	* src/grep.c (grep): Don't report "Binary file ... matches"
+	merely because the file contained both matches and binary data.
+	Insist that the binary data contained a match.
+	* tests/null-byte: Add a test for this.
+
+2016-01-28  Jim Meyering  <meyering@fb.com>
+
+	gnulib: update to latest
+
+2016-01-23  Jim Meyering  <meyering@fb.com>
+
+	gnulib: update to latest
+
+	maint: fix typo in NEWS: s/a/an/
+
+2016-01-15  Paul Eggert  <eggert@cs.ucla.edu>
+
+	grep: -x now supersedes -w more consistently
+	* NEWS, doc/grep.texi (Matching Control): Mention this.
+	* src/dfasearch.c (EGexecute):
+	* src/pcresearch.c (Pcompile):
+	Don't get confused by -w if -x is also present.
+	* src/pcresearch.c (Pcompile): Remove misleading comment about
+	non-UTF-8 multibyte locales, as PCRE doesn't support them.
+	Calculate buffer sizes more carefully; the old method
+	allocated a buffer slightly too big, seemingly due to luck.
+	* tests/backref-word, tests/pcre: Add tests for this bug.
+
+	tests: omit update-copyright-tests
+	This test does not check how 'grep' itself operates, so it is
+	out of place for grep's 'make check'. Problem reported by Sam Razavi in:
+	http://bugs.gnu.org/22376
+	* bootstrap.conf (avoided_gnulib_modules): Add update-copyright-tests.
+
+2016-01-11  Jim Meyering  <meyering@fb.com>
+
+	tests: do use "yes" but via an AWK replacement
+	Also, use sed Nq in place of head -N
+	* tests/init.cfg (yes): Define.
+	Thanks to Paul Eggert for this definition.
+	* tests/max-count-overread: Revert to using "yes".
+	* tests/mb-non-UTF8-performance: Likewise, and use
+	"sed Nq" in place of head -N.
+
+2016-01-11  Paul Eggert  <eggert@cs.ucla.edu>
+
+	* tests/pcre-count: Don't assume the page size is 32kB.
+
+2016-01-08  Paul Eggert  <eggert@cs.ucla.edu>
+
+	tests: port to other POSIXish platforms
+	I tested this on Solaris 10 and AIX 7.1.
+	* tests/max-count-overread:
+	* tests/mb-non-UTF8-performance:
+	Don't assume 'yes' exists, as 'yes' is not in POSIX.
+	* tests/mb-non-UTF8-performance:
+	Don't rely on 'head -1000', as that option syntax is not POSIX.
+	* tests/pcre-count: Don't rely on "printf '\x0'".
+	* tests/unibyte-binary: Don't assume \200 is an encoding error
+	in every unibyte locale.
+
+2016-01-08  Jim Meyering  <meyering@fb.com>
+
+	tests: fix encoding-error test failure due to use of printf '\xHH'
+	* tests/encoding-error: Don't rely on printf having support for \xHH
+	hexadecimal. That is not portable. Use \OOO octal, instead.
+
+	maint: fix typo in NEWS: s/a/an/
+
+2016-01-07  Jim Meyering  <meyering@fb.com>
+
+	mb-non-UTF8-performance: avoid FP test failure on fast hardware
+	* tests/mb-non-UTF8-performance: Don't use a fixed size.
+	Otherwise, on a fast system, the fixed-size unibyte test
+	would complete in a nominal 0 ms, which might well be
+	smaller than 1/30 of the multibyte duration, provoking
+	a false positive test failure. Instead, increase the
+	size of the input until we obtain a unibyte duration of
+	at least 10ms.
+
+2016-01-07  Paul Eggert  <eggert@cs.ucla.edu>
+
+	doc: mention unibyte encoding fix
+	* NEWS: Document recent fix for encoding errors in unibyte locales.
+
+	grep: improve unibyte -P performance
+	This is a followon to the recent changes prompted by Bug#20526.
+	In <http://bugs.gnu.org/bug=20526#86> Norihiro Tanaka pointed out
+	that grep mistakenly assumed that unibyte locales cannot have
+	encoding errors. Here, the mistake hurt performance significantly.
+	On Fedora 23 x86-64 in the C locale, this patch improved grep's
+	performance by a factor of 7 when run as "grep -P 'z.*a'" on the
+	output of "yes $(printf '\200\n') | head -n 1000000000".
+	* src/pcresearch.c (multibyte_locale) [HAVE_LIBPCRE]: New static var.
+	(Pcompile): Set it.
+	(Pexecute): Use it to avoid the need to call
+	buf_has_encoding_errors in unibyte locales.
+
+2016-01-06  Paul Eggert  <eggert@cs.ucla.edu>
+
+	Improve on fix for Bug#22181
+	* src/pcresearch.c (Pexecute): Update subject when skipping past
+	easily-determined encoding errors, as this is faster than letting
+	pcre_exec skip them. On my platform this improves performance
+	4.7x on a benchmark created via "yes $(printf '\200\200\200\200
+	\200\200\200\200\200\200\200\200\200\200\200\200\200\200\200\200x\n')
+	| head -n 1000000 >j; grep -oP y j" in a UTF-8 locale. Rework
+	code that deals with PCRE_ERROR_BADUTF8 return, to avoid an
+	incorrect (albeit currently harmless) 'bol = false' assignment.
+
+	grep: restore -P optimization (followup fix)
+	* src/search.h (EGexecute, Fexecute, Pexecute):
+	Change decls to match new implementations.
+	I forgot to add this file to the previous commit.
+
+	grep: restore -P PCRE_NO_UTF8_CHECK optimization
+	On my platform in the en_US.utf8 locale, this makes 'grep -P "z.*a" k'
+	220x faster, where k is created by the shell command:
+	yes 'abcdefg hijklmn opqrstu vwxyz' | head -n 10000000 >k
+	* src/dfasearch.c (EGexecute):
+	* src/grep.c (execute_fp_t):
+	* src/kwsearch.c (Fexecute):
+	* src/pcresearch.c (Pexecute):
+	First arg is now char *, not char const *, since Pexecute now
+	temporarily modifies this argument.
+	* src/grep.c, src/grep.h (buf_has_encoding_errors): Now extern.
+	* src/pcresearch.c (Pexecute): Use it. If the input is free of
+	encoding errors, use a multiline search and the PCRE_NO_UTF8_CHECK
+	option, as this is typically way faster. This restores an
+	optimization that was removed with the recent changes for binary
+	file detection.
+
+2016-01-05  Paul Eggert  <eggert@cs.ucla.edu>
+
+	Fix calculation of unibyte_mask
+	* src/grep.c (initialize_unibyte_mask): The old method worked for
+	UTF-8 and other typical encodings, but did not work for weird
+	encodings, e.g., one where all bytes other than 0x7f and 0x80 are
+	unibyte characters.
+
+2016-01-01  Paul Eggert  <eggert@cs.ucla.edu>
+
+	grep: fix bug with invalid unibyte sequence
+	This was introduced by the recent binary-data-detection changes.
+	Problem reported by Norihiro Tanaka in: http://bugs.gnu.org/20526#86
+	* src/grep.c (HIBYTE, easy_encoding, init_easy_encoding): Remove,
+	replacing with ...
+	(uword_max, unibyte_mask, initialize_unibyte_mask): ... this new
+	constant, static var, and function. All uses changed. The
+	unibyte_mask var generalizes the old local var hibyte_mask, which
+	worked only for encodings where every byte with 0x80 turned off is
+	a single-byte character.
+	(buf_has_encoding_errors): Return false immediately if
+	unibyte_mask is zero, not whether the current encoding is unibyte.
+	The old test was incorrect in unibyte locales in which some bytes
+	were encoding errors.
+	* tests/pcre-z: Require UTF-8 locale, since the grep -z . test now
+	needs this. Use printf \0 rather than tr. Port the 'grep -z .'
+	test to platforms where the C locale says '\200' is an encoding
+	error. Use cmp rather than compare, as the file is binary and
+	so non-GNU diff might not work.
+	* tests/unibyte-binary: New file.
+	* tests/Makefile.am (TESTS): Add it.
+
+2016-01-01  Jim Meyering  <meyering@fb.com>
+
+	maint: update copyright year, bootstrap, init.sh
+	Run "make update-copyright" and then...
+
+	* gnulib: Update to latest.
+	* tests/init.sh: Update from gnulib.
+	* bootstrap: Likewise.
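The 2016-01-05 and 2016-01-01 entries describe unibyte_mask, a byte mask with one key property: any byte that shares no bit with it is certainly a complete single-byte character. That property lets a buffer be screened a word at a time. The sketch below shows one way to compute and use such a mask. It is not the code in src/grep.c; the is_single_byte table stands in for grep's locale data, and compute_unibyte_mask and buf_is_easy are hypothetical names.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* For every byte value I >= 1 that is not a single-byte character,
   ensure MASK shares at least one bit with I.  When I is not yet
   covered, add I's most significant bit.  Byte 0 is assumed to be a
   single-byte character.  */
static unsigned char
compute_unibyte_mask (bool const is_single_byte[UCHAR_MAX + 1])
{
  unsigned char mask = 0;
  int ms1b = 1;
  for (int i = 1; i <= UCHAR_MAX; i++)
    if (!is_single_byte[i] && !(mask & i))
      {
        while (ms1b * 2 <= i)
          ms1b *= 2;
        mask |= ms1b;
      }
  return mask;
}

/* True if no byte of BUF[0..LEN) shares a bit with MASK, so every byte is
   a valid single-byte character.  False means "do the slow check".  */
static bool
buf_is_easy (char const *buf, size_t len, unsigned char mask)
{
  /* Replicate MASK into every byte of a word, e.g. 0x8080...80.  */
  unsigned long word_mask = ULONG_MAX / UCHAR_MAX * mask;
  size_t i = 0;
  for (; i + sizeof word_mask <= len; i += sizeof word_mask)
    {
      unsigned long w;
      memcpy (&w, buf + i, sizeof w);  /* avoids alignment problems */
      if (w & word_mask)
        return false;
    }
  for (; i < len; i++)
    if ((unsigned char) buf[i] & mask)
      return false;
  return true;
}
```

For a UTF-8-like table the mask is 0x80. For the weird encoding cited in the 2016-01-05 entry, where only 0x7f and 0x80 are not single-byte characters, the loop yields 0xc0. A fixed 0x80 mask would wrongly treat 0x7f as safe.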
+
+2015-12-31  Paul Eggert  <eggert@cs.ucla.edu>
+
+	doc: clarify text vs binary match output
+	* NEWS:
+	* doc/grep.texi (File and Directory Selection):
+	Make it clearer that grep can now output matching text before
+	reporting a binary match. Problem reported by Norihiro Tanaka in:
+	http://bugs.gnu.org/20526#83
+
+	doc: minor clarifications
+	* doc/grep.in.1, doc/grep.texi: Minor clarifications suggested by
+	Debian documentation patches. Problem reported by Santiago Ruano
+	Rincón in: http://bugs.gnu.org/18651
+
+	grep: fix -l --line-buffered bug
+	Problem reported by Louis Sautier in: http://bugs.gnu.org/18750
+	* NEWS: Document this.
+	* src/grep.c (grep, grepdesc): If --line-buffered, flush
+	stdout after outputting newline (or null byte, if applicable).
+
+2015-12-30  Paul Eggert  <eggert@cs.ucla.edu>
+
+	grep: remove duplicate init
+	* src/grep.c (print_line_middle): Remove duplicate initialization.
+
+	grep: report line-buffered write error right away
+	* src/grep.c (prline): When line buffered, if there is a write
+	error, report it immediately rather than waiting until the next
+	line of output.
+
+	grep: -c should keep counting after binary data
+	Problem and fix reported by Jaroslav Škarvada, and test case
+	reported by Norihiro Tanaka, in: http://bugs.gnu.org/22028
+	* NEWS: Document this.
+	* src/grep.c (grep): Don't stop counting merely because nulls seen.
+	* tests/pcre-count: New file.
+	* tests/Makefile.am (TESTS): Add it.
+
+	dfa: port to tinycc
+	* src/dfa.c (add_utf8_anychar): Put 'const' after type.
+	Problem reported by Aharon Robbins in:
+	http://bugs.gnu.org/22260
+
+	grep: be less picky about encoding errors
+	This fixes a longstanding problem introduced in grep 2.21,
+	which is overly picky about binary files.
+	* NEWS:
+	* doc/grep.texi (File and Directory Selection): Document this.
+	* src/grep.c (input_textbin, textbin_is_binary, buffer_textbin)
+	(file_textbin):
+	Remove. All uses removed.
+	(encoding_error_output): New static var.
+	(buf_has_encoding_errors, buf_has_nulls, file_must_have_nulls):
+	New functions, which reuse bits
+	and pieces of the removed functions.
+	(lastout, print_line_head, print_line_middle, print_line_tail, prline)
+	(prpending, prtext, grepbuf):
+	Avoid use of const, now that we have
+	functions that require modifying a sentinel.
+	(print_line_head): New arg LEN. All uses changed.
+	(print_line_head, print_line_tail):
+	Return indicator whether the output line was printed.
+	All uses changed.
+	(print_line_middle): Exit early on encoding error.
+	(grep): Use new method for determining whether file is binary.
+	* src/grep.h (enum textbin, TEXTBIN_BINARY, TEXTBIN_UNKNOWN)
+	(TEXTBIN_TEXT, input_textbin): Remove decls. All uses removed.
+	* src/pcresearch.c (Pexecute): Remove multiline optimization,
+	since the main program no longer checks for encoding errors on input.
+	* tests/encoding-error: New file.
+	* tests/Makefile.am (TESTS): Add it.
+
+2015-12-29  Jim Meyering  <meyering@fb.com>
+
+	maint: correct (make sorted) order of test file names
+	* tests/Makefile.am (TESTS): Insert new test name in sorted order.
+
+2015-12-28  Paul Eggert  <eggert@cs.ucla.edu>
+
+	grep: --exclude matches trailing parts of args
+	Problem reported by Vincent Lefevre in:
+	http://bugs.gnu.org/22144
+	* NEWS:
+	* doc/grep.texi (File and Directory Selection): Document this.
+	* src/grep.c (excluded_patterns, excluded_directory_patterns):
+	Now 2-element arrays, with one element for subfiles and another
+	for command-line args. All uses changed. This implements the change.
+	(exclude_options): New function.
+	* tests/include-exclude: Test the change.
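The 2015-12-28 entry makes --exclude patterns match trailing parts of command-line arguments rather than only whole arguments. Below is a simplified sketch of that matching rule using POSIX fnmatch. The helper excluded_arg is hypothetical; grep's real code goes through the excluded_patterns machinery named in the entry, not this loop.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if PATTERN matches ARG itself or any trailing
   part of ARG that begins just after a '/'.  For "src/grep.c" the
   candidates tried are "src/grep.c" and then "grep.c".  */
static bool
excluded_arg (char const *pattern, char const *arg)
{
  for (char const *p = arg;;)
    {
      if (fnmatch (pattern, p, 0) == 0)
        return true;
      char const *slash = strchr (p, '/');
      if (!slash)
        return false;
      p = slash + 1;
    }
}
```

Trailing parts start only at component boundaries, so "rep.c" does not exclude "src/grep.c" even though the string ends with those characters.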
+
+2015-12-18  Jim Meyering  <meyering@fb.com>
+
+	grep -oP: don't infloop when processing invalid UTF8 preceding a match
+	* src/pcresearch.c (Pexecute): When advancing SUBJECT past an
+	encoding error, don't blindly set P to that new value, since we
+	will soon compute SEARCH_OFFSET = P - SUBJECT, and mistakenly
+	making that difference too small would allow us to match some
+	previously-processed text, resulting in an infinite loop.
+	* NEWS (Bug fixes): Mention it.
+	* THANKS.in: Add Christian's name and email address.
+	* tests/pcre-invalid-utf8-infloop: New file.
+	* tests/Makefile.am (TESTS): Add it.
+	Reported by Christian Boltz in http://debbugs.gnu.org/22181
+	Introduced by commit v2.21-37-g14f8e48.
+
+2015-11-04  Jim Meyering  <meyering@fb.com>
+
+	tests: mark performance-related tests as expensive
+	These performance-related tests are slightly failure prone due to
+	varying system load during the two runs.
+	Marking these tests as "expensive" makes it so they are no longer run
+	via "make check". You can still run them via "make check-expensive".
+	This makes them less likely to be run by regular users.
+	* tests/long-pattern-perf: Use expensive_.
+	* tests/mb-non-UTF8-performance: Likewise.
+	Reported by Jaroslav Skarvada in http://debbugs.gnu.org/21826
+	and by Andreas Schwab in http://debbugs.gnu.org/21812.
+
+2015-11-01  Jim Meyering  <meyering@fb.com>
+
+	maint: post-release administrivia
+	* NEWS: Add header line for next release.
+	* .prev-version: Record previous version.
+	* cfg.mk (old_NEWS_hash): Auto-update.
+
+	version 2.22
+	* NEWS: Record release date.
+
+	tests: pcre-jitstack: upon failure, retry with no stack size limit
+	* tests/pcre-jitstack: Don't let an example that provokes inordinate
+	stack space use cause a test failure. Thanks to reports from and
+	analysis by Bruce Dubbs; see http://debbugs.gnu.org/21755
+
+2015-10-27  Jim Meyering  <meyering@fb.com>
+
+	maint: update THANKS.in
+	* THANKS.in: Add name+email of those who found and reported
+	the bug that made grep -E '^x|x$' match any "x".
+
+2015-10-25  Zev Weiss  <zev@bewilderbeest.net>
+
+	dfa: plug a memory leak in dfamust
+	* src/dfa.c (dfamust): Ensure MP is freed, by refraining
+	from returning early when, at "done:" *RESULT is NULL.
+
+2015-10-25  Jim Meyering  <meyering@fb.com>
+
+	gnulib: update to latest
+	* gnulib: Pull in one more portability fix:
+	stdalign: port to Sun C 5.9
+
+2015-10-24  Jim Meyering  <meyering@fb.com>
+
+	gnulib: update to latest, for portability fixes
+	* gnulib: Pull in changes like these:
+	fts: port to C11 alignof
+	stdalign: work around pre-4.9 GCC x86 bug
+
+	maint: NEWS: correct/amend
+	* NEWS: Move the long-regexp-performance-improvement from
+	"Bug fixes" to "Improvements." Say more and include an example.
+	The -Fw degradation was introduced in commit v2.18-125-g94555dd
+
+	tests: avoid spurious failure on OpenBSD 5.8
+	* tests/fedora: Don't rely on "diff - FILE" reading from stdin.
+	Reported privately by Nelson Beebe.
+
+2015-10-17  Jim Meyering  <meyering@fb.com>
+
+	gnulib: update to latest; also bootstrap and tests/init.sh
+	* bootstrap: Update from gnulib.
+	* tests/init.sh: Likewise.
+	* gnulib: Update submodule to latest.
+
+	build: avoid spurious bootstrap failure involving pkg.m4
+	Running ./bootstrap could fail mistakenly at the very end in
+	its attempt to obtain a copy of pkg.m4. It would search only
+	$(aclocal --print-ac-dir) and some other directories, but not
+	those listed in $(aclocal --print-ac-dir)/dirlist.
+	* bootstrap.conf (bootstrap_post_import_hook): Also search the
+	directories named in $(aclocal --print-ac-dir)/dirlist when that
+	file exists with nonzero size.
+
+2015-10-16  Paul Eggert  <eggert@cs.ucla.edu>
+
+	maint: add news item
+	* NEWS: Document grep -Fw speedup.
+
+	grep: simplify previous change
+	* src/grep.c (main): Simplify recently-changed grep -Fw test.
+
+2015-10-16  Norihiro Tanaka  <noritnk@kcn.ne.jp>
+
+	grep: use grep matcher for grep -Fw when unibyte
+	In single byte locales with grep -Fw, prefer the grep matcher to the
+	kwset matcher, as the former uses KWset and a DFA, whereas the latter
+	calls kwsexec many times until it matches a word.
+	* src/grep.c (main): Change pattern for fgrep into grep for grep -Fw in
+	single byte locales.
+
+2015-10-16  Paul Eggert  <eggert@cs.ucla.edu>
+
+	grep: use memchr/memrchr
+	* src/kwsearch.c (Fexecute): Prefer memchr and memrchr to doing it
+	by hand.
+
+2015-10-16  Norihiro Tanaka  <noritnk@kcn.ne.jp>
+
+	grep: improve performance of grep -Fw
+	* src/kwsearch.c (Fexecute): grep -Fw examined whether the previous
+	character is a word character after matching from the head of the
+	buffer. It is extremely slow. Now, if grep found a potential match,
+	it looks for the previous newline, and examines from there.
+
+2015-10-13  Jim Meyering  <meyering@fb.com>
+
+	maint: use single quote rather than UTF-8 multi-byte version
+	* tests/backref-alt: Translate unnecessary non-ASCII in comment.
+
+2015-10-13  Paul Eggert  <eggert@cs.ucla.edu>
+
+	dfa: make the executable a bit smaller
+	* src/dfa.c (dfamust): Hoist MB_CUR_MAX calculation out of loops.
+
+2015-10-13  Norihiro Tanaka  <noritnk@kcn.ne.jp>
+
+	dfa: fix bug in alternate of sub-patterns that differ only in constraints
+	Fix a bug where a line incorrectly matches alternates of sub-patterns
+	that differ only in the constraints, e.g., the ERE '^a|a$'.
+	Reported by Greg Boyd in: http://debbugs.gnu.org/21670
+	* src/dfa.c (dfamust): For a pattern with constraints, check that it is
+	matched including the constraints, to judge whether it is exact.
+
+	dfa: fix off-by-one error
+	* src/dfa.c (dfamust): Fix off-by-one error in computing 'must' length,
+	which caused the 'must' to be too short. See:
+	http://bugs.gnu.org/21670#28
+
+2015-10-12  Jim Meyering  <meyering@fb.com>
+
+	doc: NEWS: mention a bug fix
+	* NEWS (Bug fixes): Describe it.
+	This bug was introduced by commit v2.18-85-g2c94326
+	and fixed by commit v2.21-51-g256a4b4.
+
+2015-10-11  Paul Eggert  <eggert@cs.ucla.edu>
+
+	tests: add test case for Bug#21670
+	* tests/options: Add test #4 to catch Bug#21670.
+	Also, do not overescape # in shell strings.
+
+2015-09-19  Paul Eggert  <eggert@cs.ucla.edu>
+
+	Add test for pop_fail_stack bug
+	Problem reported by Hanno Böck in: http://bugs.gnu.org/21513
+	If you use --with-included-regex the bug fix is in gnulib, here:
+	http://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=5513b40999149090987a0341c018d05d3eea1272
+	If you use glibc, the bug fix has not been installed yet.
+	* tests/Makefile.am (XFAIL_TESTS): Add backref-alt if system matcher.
+	(TESTS): Add backref-alt.
+	* tests/backref-alt: New file.
+	* tests/triple-backref: Remove unused var.
+	Don't skip if tested with glibc, as Makefile.am now handles this.
+
+	build: update gnulib submodule to latest
+
+2015-08-19  Norihiro Tanaka  <noritnk@kcn.ne.jp>
+
+	grep: avoid use of uninitialized variable
+	EGexecute would use "backref" uninitialized.
+	While that could have no bearing on correctness, it could
+	impact performance, via an unnecessary use of regexp.
+	* src/dfasearch.c (EGexecute): Initialize backref.
+	Reported as http://debbugs.gnu.org/21273
+	Introduced by commit v2.21-55-gea0ebaa.
+
+2015-08-12  Norihiro Tanaka  <noritnk@kcn.ne.jp>
+
+	grep: remove fgrep code for case insensitive match
+	The fgrep matcher is no longer called in case insensitive matching,
+	so remove the code to support it.
+	* src/kwsearch.c (mb_case_map_apply): Remove function.
+	(Fexecute): Remove now-unused code.
+
+2015-08-12  Paul Eggert  <eggert@cs.ucla.edu>
+
+	dfa: optimize [x-x]
+	* src/dfa.c (parse_bracket_exp): Treat [x-x] as if it were [x].
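The 2015-10-16 -Fw entries change how the character before a match is checked. Instead of decoding from the start of the buffer, grep looks for the previous newline, and the memchr/memrchr entry covers the corresponding search. The sketch below shows that check under stated assumptions: glibc, because memrchr is a GNU extension, and a hypothetical word_starts_at helper. It is not Fexecute's actual logic.

```c
#define _GNU_SOURCE  /* for memrchr */
#include <stdbool.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: true if the character just before MATCH, on
   MATCH's line within BUF, is not a word constituent (alnum or '_').  */
static bool
word_starts_at (char const *buf, char const *match)
{
  /* A line start is a known character boundary, usually much closer
     than BUF itself, so decode forward from there.  */
  char const *nl = memrchr (buf, '\n', match - buf);
  char const *p = nl ? nl + 1 : buf;
  if (p == match)
    return true;

  mbstate_t st;
  memset (&st, 0, sizeof st);
  char const *prev = p;
  while (p < match)
    {
      size_t len = mbrlen (p, match - p, &st);
      if (len == (size_t) -1 || len == (size_t) -2)
        {
          /* Encoding error or truncated character: step one byte.  */
          memset (&st, 0, sizeof st);
          len = 1;
        }
      else if (len == 0)
        len = 1;  /* a null byte */
      prev = p;
      p += len;
    }

  /* PREV now starts the last character before MATCH; classify it.  */
  wchar_t wc;
  memset (&st, 0, sizeof st);
  size_t n = mbrtowc (&wc, prev, match - prev, &st);
  if (n == (size_t) -1 || n == (size_t) -2)
    return true;  /* not a valid character, hence not a word character */
  return !(iswalnum (wc) || wc == L'_');
}
```

The cost of each check is now bounded by the length of the current line rather than by the distance from the start of the buffer.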
+	This also pacifies GCC, which otherwise complains about wc2
+	being set but not used.
+
+2015-08-12  Norihiro Tanaka  <noritnk@kcn.ne.jp>
+
+	dfa: remove unused multibyte support
+	Now regex should be used for range, collating element, equivalence class
+	in non POSIX locales. So remove code to support these features.
+	* dfa.c (struct mb_char_classes): Remove members ch_classes,
+	nch_classes, ranges, nranges, equivs, nequivs, coll_elems, ncoll_elems.
+	All uses removed.
+	(match_mb_charset): Remove function.
+
+2015-08-01  Jim Meyering  <meyering@fb.com>
+
+	tests: mb-non-UTF8-performance: use new function
+	* tests/mb-non-UTF8-performance: Rewrite to use
+	the user-time measuring function in init.cfg.
+
+	tests: long-pattern-perf: measure user time, not elapsed
+	Measuring user time makes this test less prone to false
+	positive failure, and also lets us use a tighter bound.
+	* tests/long-pattern-perf: Measure elapsed user time rather than
+	wall-clock time, to permit a tighter bound on the ratio of
+	N-to-10N timings. Suggested by Giuseppe Ottaviano.
+	Also, use regexps built from mostly 5-digit numbers, so that the 10:1
+	ratio applies to lines of "seq" output as well as to total bytes.
+
+	tests: new function to measure elapsed user time
+	* tests/init.cfg (user_time_): New function.
+
+2015-07-25  Norihiro Tanaka  <noritnk@kcn.ne.jp>
+
+	dfa: remove word delimiter support for multibyte locales
+	DFA supports word delimiter expressions, but it does not behave
+	correctly for multibyte locales. Even if it were to be fixed,
+	the DFA matcher's performance would be no better than that of regex.
+	Thus, this change removes DFA support for word delimiter expressions
+	in multibyte locales.
+
+	* src/dfa.c (dfa_supported): Return false also when a pattern uses any
+	word delimiter expression in a multibyte locale.
+
+2015-07-25  Norihiro Tanaka  <noritnk@kcn.ne.jp>
+
+	dfa: avoid execution for a pattern including an unsupported expression
+	If a pattern includes a construct unsupported by the DFA matcher,
+	the DFA search would fail in most cases. Make dfaexec immediately
+	return for any such pattern.
+
+	* src/dfa.c (struct dfa_state) [has_backref, has_mbcset]: Remove members
+	and all uses.
+	(dfaexec_main): Remove 'backref' parameter. Update callers.
+	(dfaexec_noop): New function.
+	(dfa_supported): New function.
+	(dfassbuild): Remove now-unused code.
+	(dfacomp): When a pattern uses a DFA-unsupported construct, do not
+	waste time performing any further analysis.
+
+2015-07-19  Norihiro Tanaka  <noritnk@kcn.ne.jp>
+
+	dfa: DEBUG: print detail of DFA states
+	When compiled with -DDEBUG, grep outputs tokens etc.
+	With this change, also print DFA states and transitions.
+	This change is very useful when debugging those.
+
+	* src/dfa.c (prtok) [DEBUG]: Change `%c' to `%02x' in printf format.
+	(state_index) [DEBUG]: Print detail of new state.
+	(dfastate) [DEBUG]: Print detail of DFA states.
+	Reported as http://debbugs.gnu.org/18707
+
+2015-07-18  Norihiro Tanaka  <noritnk@kcn.ne.jp>
+
+	tests: sjis-mb: accept two more locales
+	* tests/sjis-mb: Accept the ja_JP.SJIS and ja_JP.PCK locales
+	as well as ja_JP.SHIFT_JIS, so this test is less likely to
+	be skipped unnecessarily. Reported as http://bugs.gnu.org/18983
+
+2015-07-18  Jim Meyering  <meyering@fb.com>
+
+	tests: add a test for the performance fix
+	* tests/long-pattern-perf: New file.
+	* tests/Makefile.am (TESTS): Add it.
+
+2015-07-18  Norihiro Tanaka  <noritnk@kcn.ne.jp>
+
+	dfa: speed up handling of long pattern
+	DFA tries to find a long sequence of characters that must appear
+	in any matching line. However, when a pattern is long (length N),
+	it is very slow, because it makes O(N^2) strstr calls.
+	This change reduces that to O(N) by processing each sequence of
+	adjacent "regular" characters as a group.
+
+	Compare the run times of this command before and after this change:
+	(on an i7-4770S CPU @ 3.10GHz using rawhide (~fedora 22) and compiled
+	with gcc 6.0.0 20150627)
+	: | env time -f %e grep -f <(seq -s '' 9999)
+	Before: 0.85
+	After: 0.02
+
+	* src/dfa.c (dfamust): Process each string of concatenated normal
+	characters as a unit.
+	* NEWS (Improvement): Mention it.
+	Prompted by a bug report and patch by Ivan Yanikov
+	in http://bugs.gnu.org/15191#5
+
+2015-07-17  Jim Meyering  <meyering@fb.com>
+
+	tests: fix mis-applied patch.
+	* tests/include-exclude: I applied "|sort" to the wrong creation
+	of "out", and didn't push the same patch that I'd tested.
+
+	tests: avoid FS-dependent false-positive failure
+	* tests/include-exclude: Sort file name list, so that this test
+	is not sensitive to the order in which those names are returned
+	via readdir. I noticed the failure on a Fedora 21 system using ext4.
+	Also fix a typo: s/framework_failure+/framework_failure_/
+
+2015-07-13  Paul Eggert  <eggert@cs.ucla.edu>
+
+	grep: fix bug with --exclude-dir and command line
+	Reported by Aron Griffis in: http://bugs.gnu.org/21027
+	* NEWS: Document this.
+	* src/grep.c (grepdirent): Don't check whether the file is skipped
+	when on the command line, as that's the caller's responsibility.
+	(main): Anchor the exclude patterns.
+	* tests/include-exclude: Adjust test case to match fixed behavior.
+	Add some more test cases.
+
+	tests: fix $? typo in null-byte
+	* tests/null-byte: Don't assume $? survives an invocation of 'test'.
+
+2015-07-05  Jim Meyering  <meyering@fb.com>
+
+	maint: dfa: use unsigned types where appropriate
+	* src/dfa.c (case_folded_counterparts): Return unsigned int, not int.
+	Change type of two locals to unsigned int, to reflect that their
+	values are never negative.
+	(parse_bracket_exp): Adjust type of result at each use, as well
+	as that of related index variables.
+ +2015-07-04 Norihiro Tanaka <noritnk@kcn.ne.jp> + + dfa: build struct dfamust on demand + If we won't use KWset, do not build a "struct dfamust". + Now it is built only when needed. + * src/dfa.c (struct dfa) [musts]: Remove member. + (dfacomp): Don't build dfamust here. + (dfamustfree): New function to free a struct dfamust. + (dfamust): Make it a global function, and make it return a pointer + to a malloc'd struct dfamust. + (dfamusts): Remove it. + * src/dfa.h (struct dfamust) [next]: Remove member. + In the implementation preceding this patch, there was + never more than one of these in a given "struct dfa". + (dfamustfree, dfamust): Add prototypes. + (dfamusts): Remove prototype. + (dfaalloc): Declare with _GL_ATTRIBUTE_MALLOC. + To make that symbol usable there, move the inclusion + of "xalloc.h" from dfa.c to this file, dfa.h. + * src/dfasearch.c (kwsmusts): Adapt to use the new interface. + Update the comments to reflect reality. + This addresses http://bugs.gnu.org/17715 + +2015-07-04 Paul Eggert <eggert@cs.ucla.edu> + + grep: use recent gnulib syntax bits + * src/grep.c (Gcompile, Ecompile): Use plain RE_SYNTAX_GREP + and RE_SYNTAX_EGREP, now that we assume a recent-enough gnulib. + + maint: ignore gendocs_template_min + * doc/.gitignore: Add '/gendocs_template_min'. + + build: update gnulib submodule to latest + + dfa: '.' and '[^x]' now consistently match newline + * src/dfa.c (parse_bracket_exp, lex, add_utf8_anychar) + (match_anychar): RE_DOT_NEWLINE and RE_HAT_LISTS_NOT_NEWLINE + are about LF, not about eolbyte. This patch does not affect + 'grep', but may affect other users of dfa.c. + + grep: -z '[^x]' now consistently matches newline + Problem reported by Norihiro Tanaka in: http://bugs.gnu.org/20974#19 + * NEWS: Document this. + * src/grep.c (Gcompile, Ecompile): Clear RE_HAT_LISTS_NOT_NEWLINE. + * tests/utf8-bracket: Test this. + +2015-07-03 Paul Eggert <eggert@cs.ucla.edu> + + grep: -z '.' 
now consistently matches newline + Problem reported by Balazs Kezes in: http://bugs.gnu.org/20974 + * NEWS: Document this. + * tests/utf8-bracket: New file, to test for this bug. + * src/grep.c (Gcompile, Ecompile): Also specify RE_DOT_NEWLINE. + * tests/Makefile.am (TESTS): Add it. + + grep: simplify print_line_middle slightly + * src/grep.c (print_line_middle): Simplify. + + grep: don't mishandle left context in -P + http://bugs.gnu.org/20957 + * src/pcresearch.c (jit_exec): New arg SEARCH_OFFSET. + Caller changed. + (Pexecute): Pass the left context to pcre_exec, so that PCRE + regular-expression matching can see it. + * tests/pcre-context: New file, to test for this bug. + * tests/Makefile.am (TESTS): Add it. + +2015-06-28 Jim Meyering <meyering@fb.com> + + tests/case-fold-backref: factor test + +2015-06-26 Paul Eggert <eggert@cs.ucla.edu> + + grep: don't hang on command-line fifo if -D skip + * NEWS: Document this. + * src/grep.c (skip_devices): + New function, with code taken from grepdirent. + (grepdirent): Use it. Avoid an unnecessary initialization. + (grepfile): If skipping devices, open files with O_NONBLOCK. + Throw in O_NOCTTY while we're at it. + (grepdesc): Skip devices here, too. Not only does this fix the + bug, it fixes an unlikely race condition if some other process + renames a device between fstatat and openat. + * tests/skip-device: Add a test for this bug. + + grep: minor tweaks + * src/grep.c (main): Change recently-added static vars to be + constants, which makes them sharable. Prefer 'return' to 'exit' + when returning/exiting from 'main'. Move decl closer to first use + and rename local from 'ok' (which was confusing) to 'status'. + Prefer named constant STDOUT_FILENO to unnamed constant 1. + +2015-06-26 Jim Meyering <meyering@fb.com> + + maint: unify three argv-processing calls + * src/grep.c (main): Unify three calls to grep_commandline_arg. 
+ + maint: alphabetize anonymous enum member names + +2015-05-30 Paul Eggert <eggert@cs.ucla.edu> + + test: tighten tests for bracket exprs + * tests/posix-bracket: Test '[a-a[.-.]--]'. + Also, test that failures are with status 1 + (nonmatching data), not status 2 (invalid expressions). + +2015-04-26 Jim Meyering <meyering@fb.com> + + maint: update bootstrap from gnulib + * bootstrap: Update from gnulib. + + maint: reword a diagnostic not to trigger leading capital check + * src/pcresearch.c: Reword diagnostic to avoid "make syntax-check" + failure. + + maint: sort test names in tests/Makefile.am and add syntax-check rule + * cfg.mk (sc_sorted_tests): New rule. + * tests/Makefile.am (TESTS): Alphabetize. + +2015-04-25 Norihiro Tanaka <noritnk@kcn.ne.jp> + + dfa: make find_pred return NULL for an invalid predicate + This could never happen when invoked via grep, but could have triggered + a bug if dfa.c's find_pred function were invoked by some other program. + * src/dfa.c (find_pred): Return NULL for an invalid predicate. + * tests/invalid-char-class: New file to test for this. + * tests/Makefile.am (TESTS): Add that new file name to the list. + This addresses http://debbugs.gnu.org/18631 + +2015-04-06 Paul Eggert <eggert@cs.ucla.edu> + + build: improve pkg-config doc and error handling + Error-handling improvement suggested by Mike Frysinger in: + http://bugs.gnu.org/16757#29 + * NEWS: Document pkg-config changes. + * README-prereq: pkg-config is now a prereq when building from + repository. + * m4/pcre.m4 (gl_FUNC_PCRE): Report an error if pcre is explicitly + requested but not available. Defer to user-supplied PCRE_CFLAGS + and PCRE_LIBS. + + build: remove typo and don't bother with /usr/include/pcre + Problem reported by Holger Bruenjes. + * m4/pcre.m4: Remove test for /usr/include/libpng (a typo). + Come to think of it, don't bother worrying about + /usr/include/pcre, as hosts with that problem can use pkg-config + or configure with CFLAGS by hand. 
+ + build: use pkg-config (if available) to configure libpcre + Problem reported by Mike Frysinger in: http://bugs.gnu.org/16757 + * bootstrap.conf (bootstrap_post_import_hook): + Copy pkg-config's pkg.m4. + * configure.ac: Invoke PKG_PROG_PKG_CONFIG. + * m4/pcre.m4 (gl_FUNC_PCRE): Rewrite to use pkg-config if + available, and to test that pcre_compile can be linked to. + * src/Makefile.am (AM_CFLAGS): Add PCRE_CFLAGS. + (grep_LDADD): Add PCRE_LIBS. + * src/pcresearch.c: Simply include <pcre.h> if HAVE_LIBPCRE, + since 'configure' arranges for the appropriate -I option now. + +2015-03-11 Paul Eggert <eggert@cs.ucla.edu> + + grep: output "." file name in diagnostic + This is bug C as reported by David Grayson in: + http://bugs.gnu.org/16444#18 + This bug occurs only in obscure circumstances, and I didn't see + how to write a reasonable test case for it. + * src/grep.c (filename_prefix_len): Remove, replacing with ... + (omit_dot_slash): New static var. All uses of the former replaced + with uses of the latter. + (grepdirent): Don't add 2 if the filename is just ".". + + egrep, fgrep: just use what's in PATH + * src/egrep.sh: Don't monkey with PATH; just use whatever 'grep' + is in the path. This is simpler, and lets the user specify + default options with a script for only grep, with no need for + egrep and fgrep scripts. + Fixes: bug#19998 + + doc: give a script wrapper example + * doc/grep.texi (Environment Variables): Give an example of a + wrapper script, as an alternative to using GREP_OPTIONS. + Fixes: bug#19998 + + doc: clarify how -a matches + * doc/grep.in.1, doc/grep.texi (File and Directory Selection): + Give an example of how non-text bytes affect pattern matching in + binary files. + Fixes: bug#20080 + +2015-02-23 Paul Eggert <eggert@cs.ucla.edu> + + Cover the non-INSTALL case + * README: Mention what to do if there is no INSTALL file. 
+ Fixes: bug#19928 + +2015-02-11 Jim Meyering <meyering@fb.com> + + maint: use ASAN-poisoning more carefully + The ASAN-poisoning instituted by commit v2.21-14-g1555185 was + incomplete, since the poisoned tail of the read buffer could well + be the target of a legitimate follow-on read. To accommodate that, + we must unpoison each such region just before beginning fillbuf's + read loop. + * src/grep.c [HAVE_ASAN] (asan_poison): Define. + (clear_asan_poison): Define. + (fillbuf): Clear before reading, since we are likely to read + into memory that was poisoned on the preceding iteration. + * tests/two-files: New file, to test for this. + * tests/Makefile.am (TESTS): Add it. + +2015-02-10 Paul Eggert <eggert@cs.ucla.edu> + + Grow the JIT stack if it becomes exhausted + Problem reported by Oliver Freyermuth in: http://bugs.gnu.org/19833 + * NEWS: Document the fix. + * tests/Makefile.am (TESTS): Add pcre-jitstack. + * tests/pcre-jitstack: New file. + * src/pcresearch.c (NSUB): Move decl earlier, since it's needed + earlier now. + (jit_stack_size) [PCRE_STUDY_JIT_COMPILE]: New static var. + (jit_exec): New function. + (Pcompile): Initialize jit_stack_size. + (Pexecute): Use new jit_exec function. Report a useful diagnostic + if the error is PCRE_ERROR_JIT_STACKLIMIT. + +2015-02-01 Jim Meyering <meyering@fb.com> + + maint: reference CVE-2015-1345 from NEWS + * NEWS: Mention the CVE that was addressed by v2.21-13-g83a95bd, + "grep -F: fix a heap buffer (read) overrun". + +2015-01-18 Jim Meyering <meyering@fb.com> + + maint: convert "goto" to "continue" and remove now-spurious label + * src/kwset.c (bmexec_trans): Using "goto big_advance" here is + equivalent to using "continue". Make that change and remove + the now-unused label. + +2015-01-10 Jim Meyering <meyering@fb.com> + + tests: add support for ASAN memory poisoning + This lets us reliably detect with ASAN some UMR bugs + that would otherwise be detectable only some of the time + with MSAN. 
+ Use __asan_poison_memory_region to mark the unused + portion of a read buffer as inaccessible. Then, with ASAN, + any attempt to access those bytes results in an ASAN abort. + * src/system.h: Include "ignore-value.h". + (__has_feature): Define. + (HAVE_ASAN): Define when address sanitizer is enabled. + [HAVE_ASAN]: Declare these two __asan_* symbols. + [!HAVE_ASAN] (__asan_poison_memory_region): Define stub. + [!HAVE_ASAN] (__asan_unpoison_memory_region): Likewise. + * src/grep.c: Use __asan_poison_memory_region. + +2015-01-09 Yuliy Pisetsky <ypisetsky@fb.com> + + grep -F: fix a heap buffer (read) overrun + grep's read buffer is often filled to its full size, except when + reading the final buffer of a file. In that case, the number of + bytes read may be far less than the size of the buffer. However, for + certain unusual pattern/text combinations, grep -F would mistakenly + examine bytes in that uninitialized region of memory when searching + for a match. With carefully chosen inputs, one can cause grep -F to + read beyond the end of that buffer altogether. This problem arose via + commit v2.18-90-g73893ff with the introduction of a more efficient + heuristic using what is now the memchr_kwset function. The use of + that function in bmexec_trans could leave TP much larger than EP, + and the subsequent call to bm_delta2_search would mistakenly access + beyond the end of the main input read buffer. + + * src/kwset.c (bmexec_trans): When TP reaches or exceeds EP, + do not call bm_delta2_search. + * tests/kwset-abuse: New file. + * tests/Makefile.am (TESTS): Add it. + * THANKS.in: Update. + * NEWS (Bug fixes): Mention it.
+ + Prior to this patch, this command would trigger a UMR: + + printf %0360db 0 | valgrind src/grep -F $(printf %019dXb 0) + + Use of uninitialised value of size 8 + at 0x4142BE: bmexec_trans (kwset.c:657) + by 0x4143CA: bmexec (kwset.c:678) + by 0x414973: kwsexec (kwset.c:848) + by 0x414DC4: Fexecute (kwsearch.c:128) + by 0x404E2E: grepbuf (grep.c:1238) + by 0x4054BF: grep (grep.c:1417) + by 0x405CEB: grepdesc (grep.c:1645) + by 0x405EC1: grep_command_line_arg (grep.c:1692) + by 0x4077D4: main (grep.c:2570) + + See the accompanying test for how to trigger the heap buffer overrun. + + Thanks to Nima Aghdaii for testing and finding numerous + ways to break early iterations of this patch. + +2015-01-08 Jim Meyering <meyering@fb.com> + + grep: avoid false-positive UMR + For some inputs, valgrind would report an uninitialized + memory read error, but it was harmless. + * src/grep.c (fillbuf): Initialize those trailing bytes. + +2015-01-01 Jim Meyering <meyering@fb.com> + + gnulib: update to latest + + maint: update copyright year ranges to include 2015 + Run "make update-copyright". Also, ... + * grep.texi: Update manually, converting each "--" to "-". + +2014-12-15 Paul Eggert <eggert@cs.ucla.edu> + + doc: document binary-data heuristic better + Problem reported by Martin Hoch in: http://bugs.gnu.org/19388 + * doc/grep.texi (File and Directory Selection): + Document what non-text bytes are. + (Usage): Fix cross reference. + +2014-12-12 Jim Meyering <meyering@fb.com> + + maint: fix a new "make syntax-check" failure + * tests/dfa-match-aux.c: s/can not/cannot/ + +2014-12-12 Norihiro Tanaka <noritnk@kcn.ne.jp> + + build: avoid build failure with --enable-gcc-warnings and no PCRE + * src/pcresearch.c [HAVE_LIBPCRE] (empty_match): Guard the declaration + of this PCRE-only variable. 
+ +2014-12-07 Paul Eggert <eggert@cs.ucla.edu> + + tests: port fmbtest to CentOS 6 and earlier + * tests/fmbtest: Port to platforms where the 'sed' pattern + '[^0-9]' does not match every non-digit character. Problem + reported by Norihiro Tanaka in: http://bugs.gnu.org/19293 + +2014-12-06 Norihiro Tanaka <noritnk@kcn.ne.jp> + + dfa: simplify dfaexec + * src/dfa.c (dfaexec): Simplify by rearrangement of IF conditions. + This commit induces no semantic change, and reverts part of commit + v2.5.4-144-gbafa134. + +2014-12-06 Norihiro Tanaka <noritnk@kcn.ne.jp> + + dfa: avoid invalid match or infinite loop in unused matching mode + Neither grep nor gawk uses this DFA code in its matching mode, + since each always calls dfacomp with a nonzero final argument. + However, when used in that mode, it had a bug: + After failing to match in matching mode, it should return NULL, + but instead would either report a false match or enter an + infinite loop. + + * src/dfa.c (dfaexec_main): After failing to match in matching mode, + return NULL, rather than transitioning to the next state. + * tests/dfa-match: Add a new test. + * tests/dfa-match-aux.c: Add a new program to exercise this + otherwise-unused part of dfa.c. + * tests/Makefile.am: Add a rule to build new test. + (check_PROGRAMS): Add dfa-match-aux. + (AM_CPPFLAGS): Add -I$(top_srcdir)/src. + (TESTS): Add dfa-match. + * cfg.mk (exclude_file_name_regexp--sc_bindtextdomain): + (exclude_file_name_regexp--sc_prohibit_atoi_atof): + Exempt the new test file from some syntax-check rules. + +2014-12-04 Santiago Ruano Rincón <santiago@debian.org> + + doc: document grep-2.11 change in behavior of -r, --recursive + * doc/grep.texi (--recursive, -r): Mention the new behavior + of recursively searching "." when there is no FILE argument. + * doc/grep.in.1: Likewise. + That change first appeared in grep-2.11, released on 2012-03-02.
+ +2014-11-24 Jim Meyering <meyering@fb.com> + + maint: correct for four Author: name misspellings + * .mailmap: Correct for misspelling in Norihiro Tanaka's given name + as listed in four commit Author: fields: s/Norihirio/Norihiro/ + +2014-11-23 Jim Meyering <meyering@fb.com> + + maint: post-release administrivia + * NEWS: Add header line for next release. + * .prev-version: Record previous version. + * cfg.mk (old_NEWS_hash): Auto-update. + + version 2.21 + * NEWS: Record release date. + +2014-11-21 Jim Meyering <meyering@fb.com> + + tests: sjis-mb: remove now-obsolete and failing sub-tests + * tests/sjis-mb: Commit v2.18-123-geb3292b changed how grep + handles patterns with encoding errors. These SJIS tests are + skipped so often that we didn't notice until now that there were + two tests of that changed behavior, and that on any system with + the ja_JP.SHIFT_JIS locale, they would always fail. Remove those + two tests, since this functionality is well tested separately, + via tests/prefix-of-multibyte. + +2014-11-20 Norihiro Tanaka <noritnk@kcn.ne.jp> + + grep -F could erroneously fail to match in non-UTF8 multibyte locales + This fixes a bug that can strike only when using a non-UTF8 multibyte + locale like ja_JP.SHIFT_JIS. + + Consider this example: it would mistakenly fail to match before + this patch: + + printf '\203AA\n'|LC_ALL=ja_JP.SHIFT_JIS src/grep -F A + + When searching for a single byte that happens to be the latter + byte of a multibyte character, and the target byte also follows + that multibyte character, grep -F would advance an internal pointer + by one byte too many, thus missing the target byte. A test case + for this bug is already included in tests/sjis-mb. + + * src/kwsearch.c (Fexecute): Skip one byte less after a match in the middle + of a multibyte character. Introduced by commit v2.18-119-gfb7d538.
+ +2014-11-17 Jim Meyering <meyering@fb.com> + + tests: big-match: disable OOM-provoking subtest + * tests/big-match: Our application of this regexp '^.*x\(\)\1' + to a file containing a single matching line of length 2GiB+2 + would cause inordinate memory consumption (over 100GB) via + regexec.c, but no leak. That would cause disruption on most + systems, so remove this subtest. Reported by Assaf Gordon. + +2014-11-16 Norihiro Tanaka <noritnk@kcn.ne.jp> + + dfa: avoid undefined behavior + * src/dfa.c (dfassbuild): Don't call memcpy with a second + argument of NULL, even when the size (3rd argument) is 0. + +2014-11-14 Jim Meyering <meyering@fb.com> + + gnulib: update to latest + +2014-11-14 Norihiro Tanaka <noritnk@kcn.ne.jp> + + grep -F -x -o PAT would print an extra newline for each match + * src/kwsearch.c (Fexecute): Correctly compute the length of a match + by subtracting 2 (not 1) when match_lines is set. With -x, we augment + the "line" by both prepending and appending an EOLBYTE to the search + pattern. Here, we must correct for that. However, to compensate, + when we are using -x (--line-regexp) and start_ptr is NULL, we have + to add 1 to the length so that we still print the trailing EOLBYTE. + Introduced by commit v2.18-85-g2c94326. + * tests/match-lines: Add a new test. + * tests/Makefile.am (TESTS): Add it. + * NEWS (Bug fixes): Mention it. + +2014-11-11 Paul Eggert <eggert@cs.ucla.edu> + + tests: port to Darwin + The 'sed' command 's/.//' does not delete all bytes in the C locale. + Problem reported by Nelson H. F. Beebe. + * tests/fmbtest: Don't assume that sed treats bytes with the + top bit set as valid characters in the C locale, as this is not + true for Darwin. Use the cs_CZ.UTF-8 locale instead, and + simplify the sed script. + + tests: fix recently-introduced stray output + * tests/init.cfg (require_pcre_): Remove stray debugging output. 
+ + build: port to GCC 4.6.4 + glibc 2.5 + On platforms this old, building with _FORTIFY_SOURCE equal to 2 + results in duplicate definitions of standard library functions. + Problem reported by Nelson H. F. Beebe. + * configure.ac (_FORTIFY_SOURCE): Sort after GNULIB_PORTCHECK. + By default, do not enable this unless GNULIB_PORTCHECK is defined. + This better matches the original intent, which as I recall was to + enable these extra checks only with --enable-gcc-warnings. + + tests: port to libpcre sans UTF-8 support + Problem reported by Nelson H. F. Beebe. + * tests/pcre-infloop, tests/pcre-invalid-utf8-input, tests/pcre-utf8: + Skip the test unless PCRE works in an en_US.UTF-8 locale. + +2014-11-09 Jim Meyering <meyering@fb.com> + + tests: do not fail when the zh_CN.UTF-8 locale is not installed + * tests/word-multibyte: This test would fail on a system with + no zh_CN.UTF-8 locale. Use it only if it is installed. + + tests: avoid hex_printf_ portability problems + * tests/init.cfg (hex_printf_): Spell out a-f and A-F, for + non-C locales, ensure that the input to sed is newline-terminated, + and quote the final octal format string. + Suggestions from Paul Eggert. + +2014-11-08 Jim Meyering <meyering@fb.com> + + tests: avoid a multibyte tr portability problem + * tests/init.cfg (tr): New wrapper function. + See comments for details. Reported by Norihiro Tanaka + in http://debbugs.gnu.org/18991 + + maint: remove spurious LC_ALL setting from one test + * tests/word-multibyte: Remove unnecessary setting of LC_ALL. + + tests: fix typo in previous change + * tests/init.cfg (hex_printf_): Fix typo s/A-f/A-F/. + For the record, I introduced that error, not Norihiro. + +2014-11-08 Norihiro Tanaka <noritnk@kcn.ne.jp> + + tests: avoid awk+printf+\xHH portability trap + * tests/init.cfg (hex_printf_): Rewrite in terms of printf and sed. 
+ Using awk's printf with \xHH in the format string was not portable + to the awk of Solaris 10, AIX 7 or HP-UX 11.23, as reported in + http://debbugs.gnu.org/18987. + * tests/word-multibyte: Use printf rather than hex_printf_, + and give the character we're printing a name: e_acute (rather + than A-grave), since that is used in other tests. + a trailing \n in the format string, adjust by removing it, and + instead invoking echo. + * tests/multibyte-white-space: Simply remove each trailing \n. + They were not needed. + +2014-11-07 Jim Meyering <meyering@fb.com> + + tests: avoid printf+\xHH portability trap + * tests/word-multibyte: Using the bourne shell's printf function + with strings like "\xHH\xHH" happens to work for most interactive + shells, but not for dash. That is not portable. Use our hex_printf_ + awk wrapper instead. Without this change, this test would fail on + a Debian system for which /bin/sh is configured to be "dash". + + maint: move helper function, hex_printf to init.cfg + * tests/init.cfg (hex_printf_): New function, from ... + * tests/multibyte-white-space: ... here. Reflect the + s/hex_print/hex_printf_/ renaming. + +2014-11-02 Paul Eggert <eggert@cs.ucla.edu> + + grep: port O_NOFOLLOW errno checking to NetBSD + Problem reported by Assaf Gordon in: http://bugs.gnu.org/18892 + * NEWS: Document it. + * src/grep.c (open_symlink_nofollow_error): + New function, which does the right thing on NetBSD. + (grepfile): Use it. + +2014-10-31 Jim Meyering <meyering@fb.com> + + build: generate man pages even when existing targets are read-only + * doc/Makefile.am (grep.1): Use mv -f to move temporary to target, + in case the target is read-only. Also, always make the generated + files read-only. + (egrep.1 fgrep.1): Likewise. 
+ This avoids a build failure reported by Eric Blake in + http://lists.gnu.org/archive/html/bug-grep/2014-10/msg00112.html + +2014-10-30 Jim Meyering <meyering@fb.com> + + tests: avoid false-positive failure due to some zh_CN.* locales + On some systems, and for some zh_CN.* locales (e.g., on OpenBSD 5.5), the + E-acute pair of bytes does not qualify as a word-constituent character. + * tests/word-multibyte: Use zh_CN.UTF-8, rather than "zh_CN". + Reported by Assaf Gordon and Bruce Dubbs in + http://debbugs.gnu.org/18892 + +2014-10-29 Jim Meyering <meyering@fb.com> + + gnulib: update to latest; bootstrap, too + * gnulib: Update to latest. + * bootstrap: Copy latest from gnulib. + +2014-10-28 Jim Meyering <meyering@fb.com> + + tests: make new test script executable + * tests/word-multibyte: Make this file executable. + +2014-10-28 Norihiro Tanaka <noritnk@kcn.ne.jp> + + dfa: make \w and \W work in multibyte locales + Reported by Jaroslav Skarvada in: http://bugs.gnu.org/18817 + Now \w and \W are supported not only in single-byte locales but also + in multibyte locales. + + * src/dfa.c (PUSH_LEX_STATE, POP_LEX_STATE): Move definitions "up", + so they are not within the function. + (lex): Make \w and \W work in a multibyte locale, the same way + we made \s and \S work. + * tests/word-multibyte: New test for this change. + * tests/Makefile.am: Add a rule to build new test. + * NEWS (Bug fixes): Mention it. + +2014-10-26 Norihiro Tanaka <noritnk@kcn.ne.jp> + + dfa: avoid false match in a non-UTF8 multibyte locale + This command should print nothing: + + printf '\263\244\263\244\n' \ + | LC_ALL=ja_JP.eucJP grep -E "$(printf '^x|\244\263')" + + Before this patch, it would print its sole input line. + * src/dfa.c (struct dfa): Add new members: min_trcount, + initstate_letter, initstate_others. + (dfaanalyze): Build states not only for the newline context but also for others. + (build_state): Don't release initial states. + (skip_remains_mb): Add a parameter.
+ Add a comment describing all parameters. + (dfaexec_main): When there are multiple start states, we are about + to transition from one state to another and the current byte is not + the first byte of a multibyte character, first advance past the + current multibyte character. + * tests/euc-mb: Add a new test. + * NEWS (Bug fixes): Mention it. + This addresses http://debbugs.gnu.org/18685 + +2014-10-25 Paul Eggert <eggert@cs.ucla.edu> + + tests: work around older libpcre bugs when testing -P and UTF-8 + * tests/pcre-invalid-utf8-input: Add require_timeout_ and + require_compiled_in_MB_support. Put a timeout of 3 seconds on + grep, to avoid having this test case loop forever with older + versions of libpcre, such as those found on RHEL 6.5. + Reported by Jim Meyering in: http://bugs.gnu.org/18806#34 + +2014-10-24 Norihiro Tanaka <noritnk@kcn.ne.jp> + + tests: add test for grep -P fix + * tests/pcre-o: New test for this change. + * tests/Makefile.am (TESTS): Add it. + +2014-10-24 Paul Eggert <eggert@cs.ucla.edu> + + grep: fix grep -P crash + Reported by Shlomi Fish in: http://bugs.gnu.org/18806 + Commit 9fa500407137f49f6edc3c6b4ee6c7096f0190c5 (2014-09-16) is a + hack that I put in to speed up 'grep -P'. Unfortunately, not only + is it a violation of modularity, it's also a bug magnet, as we have + found out with Bug#18738 and Bug#18806. Remove the optimization + instead of applying more bandaids. Perhaps we can think of a + better way of doing the optimization, or perhaps we can just live + with a slower grep -P (as -P is inherently slower anyway...). + * src/grep.c, src/grep.h (validated_boundary): + Remove. All uses removed. + * src/pcresearch.c (Pexecute): Do not worry about validated_boundary. + +2014-10-19 Norihiro Tanaka <noritnk@kcn.ne.jp> + + dfa: remove two erroneous clauses from a now-unused function + RE_DOT_NEWLINE and RE_DOT_NOT_NULL apply only to a dot that + matches any character. Do not consider them when matching + with a bracket expression.
+ + * src/dfa.c (match_mb_charset): Remove tests for RE_DOT_NEWLINE + and RE_DOT_NOT_NULL. + +2014-10-19 Norihiro Tanaka <noritnk@kcn.ne.jp> + + dfa: process all MBCSET constructs via glibc's matcher + The DFA matcher does not support collating symbols or equivalence + classes, so ensure that any MBCSET reference is handled by the glibc + matcher. dfa.c already handled this in one case, but not the other, + so that a command like "printf '\0' |src/grep -aE '^\s?$'" would + mistakenly end up using dfa.c's match_mb_charset function rather + than glibc's matcher. + + * src/dfa.c (dfaexec_main): Move that code into the + State_transition macro. This renders the match_mb_charset + unused by grep. + * tests/multibyte-white-space: Add a test to exercise the + just-rendered-inaccessible code path. + +2014-10-15 Norihiro Tanaka <noritnk@kcn.ne.jp> + + grep: initialize validation_boundary properly before use + * src/grep.c (main): Initialize validation_boundary before pre-searching + for an empty line. + +2014-10-15 Paul Eggert <eggert@cs.ucla.edu> + + grep: fix off-by-one bug in -P optimization + Reported by Norihiro Tanaka in: http://bugs.gnu.org/18738 + * src/pcresearch.c (Pexecute): Fix off-by-one bug with + validation_boundary. + * tests/init.cfg (envvar_check_fail): Catch off-by-one bug. + +2014-10-08 Norihiro Tanaka <noritnk@kcn.ne.jp> + + dfa: fix a theoretical bug + * src/dfa.c (dfaexec_main): After searching for a match from + the initial state, set the previous state, S1, to 0. + So far, we have found no case in which this fix makes a difference. + See http://debbugs.gnu.org/18645 + +2014-10-07 Paul Eggert <eggert@cs.ucla.edu> + + doc: modernize and simplify man page + * doc/grep.in.1 (Tx, Id): Remove. All uses removed. + (MTO, URL): New macros, used for email and URL. + Use them when appropriate. + In main text, omit chatty discussions of other implementations; + the full manual suffices for this sort of thing. 
+ + doc: clarify exit status + Reported by Santiago Ruano Rincón in: http://bugs.gnu.org/18651 + * doc/grep.in.1 (EXIT STATUS): + * doc/grep.texi (Exit Status): Clarify. + +2014-10-07 Norihiro Tanaka <noritnk@kcn.ne.jp> + + dfa: test for just-fixed bug + * tests/mb-dot-newline: New file. + * tests/Makefile.am (TESTS): Add it. + * NEWS (Bug fixes): Mention it. + Bisection suggests that the bug was introduced by + commit v2.18-123-geb3292b. Also see + http://debbugs.gnu.org/cgi/bugreport.cgi?msg=17;bug=18580 + +2014-10-05 Norihiro Tanaka <noritnk@kcn.ne.jp> + + dfa: factor out a new nontrivial block of duplicated code + * src/dfa.c (State_transition): New macro. + (dfaexec_main): Use it twice. + + dfa: check end of input buffer after transition in non-UTF8 multibyte locale + * src/dfa.c (dfaexec_main): Check for end of input buffer after each + transition in a non-UTF8 multibyte locale. + * tests/mb-non-UTF8-overrun: New test. + * tests/Makefile.am (TESTS): Add it. + * src/grep.c (main): With this fix, we no longer need the fourth + byte of "eolbytes". + +2014-10-04 Jim Meyering <meyering@fb.com> + + grep: avoid stack buffer read-underrun and overrun + Testing binaries built with -fsanitize=address caused aborts due + to stack underrun and overrun. + * src/grep.c (main): Allocate a larger buffer for eolbytes: + one byte before the beginning and one more after the end. + For details, see http://debbugs.gnu.org/18580#44. + +2014-10-04 Norihiro Tanaka <noritnk@kcn.ne.jp> + + grep: fix subscript error when testing whether empty lines match + src/grep.c (grep): When testing whether an empty line matches, + make the input buffer one byte longer, as dfaexec uses that + for a sentinel. + +2014-09-27 Paul Eggert <eggert@cs.ucla.edu> + + dfa: minor tweaks, mostly to remove __attribute__ ((noinline)) + That attribute isn't portable, and I found a way to get similar + performance with standard C features. + * NEWS: Document the recently-installed performance improvement. 
+ * src/dfa.c (struct dfa): New member dfaexec. + (dfaexec_main): Remove unnecessary 'const'. + (dfaexec_mb, dfaexec_sb): Remove __attribute__ ((noinline)); + no longer needed. + (dfaexec): Use new dfaexec member. + (dfainit, dfaoptimize, dfassbuild): Initialize it. + +2014-09-27 Norihiro Tanaka <noritnk@kcn.ne.jp> + + dfa: separate dfaexec function to help optimization by compiler + * src/dfa.c (dfaexec_main): Rename from dfaexec, add inline attribute. + (dfaexec_mb): New function. Run it when d->multibyte is true. For this + function, inlining must be avoided. + (dfaexec_sb): New function. Run it when d->multibyte is false. For this + function, inlining must be avoided. + (dfaexec): Call dfaexec_mb or dfaexec_sb according to d->multibyte. + +2014-09-27 Norihiro Tanaka <noritnk@kcn.ne.jp> + + dfa: speed-up at initial state + The DFA state is always 0 until a potential match has been found. So we improve + matching there by continuing to use the transition table. + + * src/dfa.c (skip_remains_mb): New function. + (dfaexec): Speed-up at initial state. + +2014-09-27 Paul Eggert <eggert@cs.ucla.edu> + + maint: generalize the -Wcast-align fix + * src/grep.c (CAST_ALIGNED): New macro. + (skip_easy_bytes): Use it. + +2014-09-27 Jim Meyering <meyering@fb.com> + + maint: suppress a false-positive -Wcast-align warning + Building with --enable-gcc-warnings and gcc-4.9.1 would provoke this: + grep.c:499:12: error: cast from 'const char *' to 'const uword *'\ + (aka 'const unsigned long *') increases required alignment from\ + 1 to 8 [-Werror,-Wcast-align] + for (s = (uword const *) p; ! (*s & hibyte_mask); s++) + ^~~~~~~~~~~~~~~~~ + * src/grep.c (skip_easy_bytes): Use a pragma to suppress + gcc's false-positive cast-alignment warning.
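The -Wcast-align entries above concern skip_easy_bytes, which, per the 2014-09-26 entry, scans ASCII-compatible input a whole machine word at a time. Each word is ANDed with hibyte_mask, a mask with the high bit of every byte set, so a single test rejects several ASCII bytes at once. Below is a minimal sketch of that technique under stated assumptions. The name skip_ascii_bytes and its (p, lim) interface are hypothetical, and CHAR_BIT is assumed to be 8. Unlike grep, which relies on a sentinel written past the end of its buffer, the sketch bounds-checks explicitly.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical word type for this sketch; grep's own 'uword' may differ.  */
typedef uintptr_t uword;

/* Return a pointer to the first byte in [P, LIM) whose high bit is set,
   or LIM if every byte is ASCII.  Illustrates the word-at-a-time idea
   behind grep's skip_easy_bytes; it is not grep's actual code.
   Assumes CHAR_BIT == 8.  */
static char const *
skip_ascii_bytes (char const *p, char const *lim)
{
  /* (uword) -1 / 0xff is 0x0101...01; times 0x80 gives 0x8080...80,
     the high bit of every byte in a word.  */
  uword const hibyte_mask = (uword) -1 / UCHAR_MAX * 0x80;

  /* Test sizeof (uword) bytes per iteration.  memcpy allows unaligned
     loads without the pointer cast that provokes -Wcast-align.  */
  while ((size_t) (lim - p) >= sizeof (uword))
    {
      uword w;
      memcpy (&w, p, sizeof w);
      if (w & hibyte_mask)
        break;                  /* some byte in this word is non-ASCII */
      p += sizeof w;
    }

  /* Pinpoint the non-ASCII byte inside that word, or scan the short tail.  */
  while (p < lim && ! ((unsigned char) *p & 0x80))
    p++;
  return p;
}
```

Because the mask has the same value in every byte, the test does not depend on byte order. Compilers typically turn a fixed-size memcpy like this into a single load, so avoiding the cast costs nothing at run time.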
+ +2014-09-26 Paul Eggert <eggert@cs.ucla.edu> + + grep: don't check extensively for invalid prefix bytes unless -P + Problem reported by Jim Meyering in: http://bugs.gnu.org/18454#56 + * src/grep.c (grep): After the first buffer is checked, leave the + file-type checker in TEXTBIN_UNKNOWN state only when -P is used. + Only the -P matcher has performance problems with checking binary + data that make it worthwhile to check every prefix input byte so + the -P matcher's TEXTBIN_UNKNOWN optimizations can come into play. + Other matchers can simply check the data directly, and using + TEXTBIN_UNKNOWN with them slows 'grep' down for no benefit. + + grep: scan for valid multibyte strings more quickly + Scan valid multibyte strings more quickly in the common case of + encodings that are upward compatible with ASCII, such as UTF-8. + You'd think there'd be a fast standard way to do this nowadays, + but nooooo.... + Problem reported by Jim Meyering in: http://bugs.gnu.org/18454#56 + * src/grep.c (HIBYTE): New constant. + (easy_encoding): New static var. + (init_easy_encoding, skip_easy_bytes): New functions. + (uword): New type. + (buffer_textbin): Skip easy bytes quickly. + Don't bother with mb_clen here, since skip_easy_bytes typically + captures the easy cases; just use mbrlen directly. + (buffer_textbin, file_textbin): First arg is no longer a const + pointer, since the byte past the end is now an overwritten sentinel. + (fillbuf): Make room for a uword after the buffer, for skip_easy_bytes. + (main): Call init_easy_encoding. + +2014-09-17 Paul Eggert <eggert@cs.ucla.edu> + + grep: speed up processing of holes before EOF on Solaris + * src/grep.c (fillbuf): If SEEK_DATA fails with errno == ENXIO, + skip over the hole at EOF. + + grep: port to platforms lacking SEEK_DATA + Reported by Norihiro Tanaka in: http://bugs.gnu.org/18454#38 + * src/grep.c (SEEK_DATA): Default to SEEK_SET if not defined. + (SEEK_HOLE): Move to top level, and default it to SEEK_SET. 
+ (file_textbin): Adjust to new default. + (fillbuf): Don't bother with SEEK_DATA if it defaults to SEEK_SET. + + grep: skip past holes efficiently + Take advantage of the relaxed rules for treating non-text bytes in + binary data, by efficiently skipping past holes on platforms + supporting lseek's SEEK_DATA flag. + On one test on a circa-2008 Sun Fire V40z running Solaris 11.2, + 'grep x' took 0.009 real-time seconds to scan a holey file of size + 9,223,372,036,854,775,802 bytes, for a nominal scan rate of 1 ZB/s. + grep 2.20's scan rate on this platform was 843 MB/s, so this is a + speedup by a factor of 1.2 trillion. The speedup factor is not + as great on GNU/Linux hosts, due to what appear to be SEEK_DATA + inefficiencies, but presumably this will be cleared up in time. + * NEWS: Document this. + * src/grep.c, src/grep.h (eolbyte): Now char, not unsigned char. + This is for compatibility with the rest of the code. + The old (performance?) reasons for 'unsigned char' are now moot. + * src/grep.c (skip_nuls, skip_empty_lines, seek_data_failed): + New static vars. + (totalnl): Move up, since it's about input, not output, and + fillbuf now uses it. + (add_count): Move up, since fillbuf now uses it. + (all_zeros): New function. + (fillbuf): Use SEEK_DATA to skip past holes efficiently, + on systems that support this. + (grep, main): Set the new static vars. + + grep: improve -P performance in typical cases + * src/grep.c, src/grep.h (enum textbin): Move to grep.h. + (input_textbin, validated_boundary): New vars. + * src/grep.c (grepbuf, grep): Initialize them. + * src/pcresearch.c (Pexecute): Do a multiline search + when the input is known to be free of encoding errors. + Quickly discard bytes that are obviously encoding errors. + Quickly match empty strings. + + grep: minor -P speedup with jit_stack + * src/pcresearch.c (jit_stack): No longer static. 
+ + grep: non-text bytes in binary data may be treated as line ends + * NEWS, doc/grep.texi (File and Directory Selection): + Document this change. + * src/grep.c (zap_nuls): New function. + (grep): Use it. + * tests/null-byte: Relax to allow new behavior. + + grep: -z no longer considers '\200' to be binary data + This avoids a problem when using grep -z in a Windows-1252 locale. + Plus, it lets 'grep -z' run a bit faster. + * NEWS: Document this. + * src/grep.c (buffer_textbin): Don't look for '\200' if -z. + * tests/pcre-z: Test for new behavior. + + grep: refactor binary-vs-unknown-vs-text flags for clarity + * src/grep.c (enum textbin): New enum. + (textbin_is_binary): New function. + (buffer_textbin, file_textbin, grep): Use them, for clarity. + +2014-09-16 Paul Eggert <eggert@cs.ucla.edu> + + grep: fix -P speedup bug with empty match + * src/pcresearch.c (NSUB): New top-level constant, replacing + 'nsub' within Pexecute. + (Pcompile, Pexecute): Use it. + (Pexecute): Don't assume sub[1] is zero after a PCRE_ERROR_BADUTF8 + match failure. + * tests/pcre-invalid-utf8-input: Test for this bug. + + grep: port -P speedup to hosts lacking PCRE_STUDY_JIT_COMPILE + * src/pcresearch.c (Pcompile): Do not assume that + PCRE_STUDY_JIT_COMPILE is defined. + (empty_match): Define on all platforms. + + grep: use mbclen cache in one more place + * src/grep.c (fgrep_to_grep_pattern): Use mb_clen here, too. + + grep: avoid false alarms for mb_clen and to_uchar + * cfg.mk (_gl_TS_unmarked_extern_functions): New var, + to bypass the tight_scope false alarms on mb_clen and to_uchar. + + grep: use mbclen cache more effectively + * src/grep.c (buffer_textbin, contains_encoding_error): + Use mb_clen for speed. + (buffer_textbin): Bypass mb_clen in unibyte locales. + (main): Always initialize the cache, since it's sometimes used in + unibyte locales now. Initialize it before contains_encoding_error + might be called. + * src/search.h (SEARCH_INLINE): New macro. 
+ (mbclen_cache): Now extern decl. + (mb_clen): New inline function. + * src/searchutils.c (SEARCH_INLINE, SYSTEM_INLINE): Define. + (mbclen_cache): Now extern. + (build_mbclen_cache): Put 1 into the cache when mbrlen returns 0. + (mb_goback): Use mb_len for speed, and rely on it returning nonzero. + * src/system.h (SYSTEM_INLINE): New macro. + (to_uchar): Use it. + + grep: improve performance for older glibc + glibc has a bug where mbrlen and mbrtowc mishandle length-0 inputs. + Working around it in gnulib slows grep down, so disable the tests for it + and make sure grep works even if the bug is present. + * bootstrap.conf (avoided_gnulib_modules): Add mbrtowc-tests. + * configure.ac (gl_cv_func_mbrtowc_empty_input): Assume yes. + * src/searchutils.c (mb_next_wc): Don't invoke mbrtowc on empty input. + + grep: treat a file as binary if its prefix contains encoding errors + * NEWS: + * doc/grep.texi (File and Directory Selection): + Document this. + * src/grep.c (buffer_encoding, buffer_textbin): New functions. + (file_textbin): Rename from file_is_binary. Now returns 3-way value. + All callers changed. + (file_textbin, grep): Check the input more carefully for text vs + binary data. + (contains_encoding_error): Remove; replaced by buffer_encoding. + * tests/backref-multibyte-slow: + * tests/high-bit-range: + * tests/invalid-multibyte-infloop: + Use -a, since the input is now considered to be binary. + * tests/invalid-multibyte-infloop: Add a check for new behavior.
+ + grep: use bool for boolean in grep.c + * src/grep.c (show_version, suppress_errors, only_matching) + (align_tabs, match_icase, match_words, match_lines, errseen) + (write_error_seen, is_device_mode, usable_st_size) + (file_is_binary, skipped_file, reset, fillbuf, out_quiet) + (out_line, out_byte, count_matches, no_filenames, line_buffered) + (done_on_match, exit_on_match, print_line_head, prline, grep) + (grepdirent, grepfile, grepdesc, grep_command_line_arg) + (get_nondigit_option, main): Use bool for boolean. + (print_line_head, prline): Use char for byte. + * src/grep.h: Include <stdbool.h>, and adjust decls to match + changes in grep.c. + + grep: speed up -P on files containing many multibyte errors + * src/pcresearch.c (empty_match): New var. + (Pcompile): Set it. + (Pexecute): Use it. + + grep: remove/refactor unnecessary code about line splitting + * src/grep.c (do_execute): Remove. Caller now uses 'execute'. + * src/pcresearch.c (Pexecute): Improve comment about this. + +2014-09-12 Paul Eggert <eggert@cs.ucla.edu> + + grep: diagnose -P in non-UTF-8 multibyte locale + * src/pcresearch.c (Pcompile): + libpcre supports only unibyte and UTF-8 locales, + so report an error and exit if used in other locales. + * NEWS: Mention this. + * tests/euc-mb: Test this. + +2014-09-12 Jim Meyering <meyering@fb.com> + + doc: move NEWS note about GREP_OPTIONS into proper section + * NEWS (Changes in behavior): Move the note about GREP_OPTIONS + from the 2.20 section into the section for the upcoming release. + +2014-09-12 Paul Eggert <eggert@cs.ucla.edu> + + grep: make GREP_OPTIONS obsolescent + * NEWS: + * doc/grep.in.1 (ENVIRONMENT_VARIABLES): + * doc/grep.texi (Environment Variables): + Document that GREP_OPTIONS is obsolescent now. + * src/grep.c (main): Warn if GREP_OPTIONS is used. + * tests/r-dot, tests/skip-device: Don't use GREP_OPTIONS. 
+ +2014-09-11 Paul Eggert <eggert@cs.ucla.edu> + + doc: bug tracker has moved to debbugs.gnu.org + * README (KNOWN BUGS): + * doc/grep.in.1: + * doc/grep.texi (Reporting Bugs): Document this. + + grep: fix false matches with -P '...$' and invalid UTF-8 + * tests/pcre-invalid-utf8-input: Add a test for that. + + grep: fix false matches with -P '...$' and invalid UTF-8 + * src/pcresearch.c (Pexecute): Use PCRE_NOTEOL when matching + initial substrings of a line. + +2014-09-10 Jim Meyering <meyering@fb.com> + + tests: add expect-to-fail test for a glibc regexp bug + * tests/triple-backref: New file. + * tests/Makefile.am (TESTS): Add it. + (XFAIL_TESTS): List it as a known, always-failing test. + Based on the bug report from Paul Eggert: + https://sourceware.org/bugzilla/show_bug.cgi?id=17356 + + maint: avoid distcheck failure + * Makefile.am (EXTRA_DIST): Add .mailmap. + +2014-09-10 Paul Eggert <eggert@cs.ucla.edu> + + grep: port recent fix to older pcre version + * src/pcresearch.c (Pexecute): Don't assume that a pcre_exec + that returns PCRE_ERROR_NOMATCH leaves its sub argument alone. + This assumption is false for libpcre-3 version 8.31-2ubuntu2. + +2014-09-09 Paul Eggert <eggert@cs.ucla.edu> + + grep: -P now treats invalid UTF-8 input as non-matching + Problem reported by Santiago Vila in: http://bugs.gnu.org/18266 + * NEWS: Mention this. + * src/pcresearch.c (Pexecute): Treat UTF-8 encoding errors + as non-matching data, instead of exiting 'grep'. + * tests/pcre-infloop: grep now exits with status 1, not 2. + * tests/pcre-invalid-utf8-input: grep now exits with status 0, not 2. + +2014-08-14 Paul Eggert <eggert@cs.ucla.edu> + + grep: fix integer-width bugs in undossify_input etc. + undossify_input bug reported by Vincent Lefevre in: + http://bugs.gnu.org/18269 + * src/dosbuf.c (undossify_input): Return size_t, not int. + * src/grep.c (fillbuf): Work portably even if safe_read returns a + value greater than SSIZE_MAX, e.g., if there's an I/O error. 
+ +2014-08-03 Paul Eggert <eggert@cs.ucla.edu> + + doc: document LANGUAGE + Reported by Benno Schulenberg in: http://bugs.gnu.org/18185 + * doc/grep.texi (Environment Variables): Document LANGUAGE. + + doc: prefer @env to @code + Reported by Benno Schulenberg in: http://bugs.gnu.org/18184 + * doc/grep.texi: Avoid @code in favor of @env, or of nothing at all. + +2014-07-11 Paul Eggert <eggert@cs.ucla.edu> + + doc: Document -r vs --exclude more carefully. + Problem reported by Hugues Andreux in: http://bugs.gnu.org/17763 + * doc/grep.texi (File and Directory Selection): Be more careful + about documenting the interaction between recursive searching, + --include, --exclude, and --exclude-dir. + +2014-06-27 Jim Meyering <meyering@fb.com> + + maint: split long lines, and enforce the 80-column limit + * cfg.mk (sc_long_lines): New rule, from coreutils; exempt tests/* + * src/grep.c (usage): Tweak -F wording to shorten a line. + Correct grammar in a comment. + Split the --exclude-file=... description to fit within 80 columns. + Use emit_bug_reporting_address, eliminating another long line. + * src/dfa.c: Split long lines. No semantic change. + * doc/grep.texi: Likewise. + * tests/include-exclude: Split a long line. + * tests/backref: Split long lines. + * tests/empty: Likewise. + * tests/fmbtest: Likewise. + + doc: update HACKING + * HACKING: Update from coreutils. + + maint: generate distributed THANKS from VC'd THANKS.in + * Makefile.am (THANKS): New rule. + * THANKS.in: New file. + * THANKS: Remove. Now it's generated from the combination of + THANKS.in and git logs. + * .mailmap: New file. + * cfg.mk (sc_THANKS_in_duplicates): New syntax-check rule, from + coreutils. + * .gitignore: Add THANKS. + * thanks-gen: New file, from coreutils. + +2014-06-27 Paul Eggert <eggert@cs.ucla.edu> + + grep: with -E, unmatched ')' matches itself + Problem reported by Nathan Weeks in: http://bugs.gnu.org/17856 + * src/grep.c (Ecompile): Also specify RE_UNMATCHED_RIGHT_PAREN_ORD. 
+ * doc/grep.texi (Fundamental Structure), NEWS: Document this. + * tests/ere.tests: Add a couple of tests for this. + * tests/spencer1.tests: Fix exit status. + +2014-06-17 Paul Eggert <eggert@cs.ucla.edu> + + build: avoid -Wstack-protector + This allows the use of --enable-gcc-warnings on Gentoo and Ubuntu. + See: http://bugs.gnu.org/17793 + * configure.ac (WERROR_CFLAGS): Avoid -Wstack-protector. + + This can be worked around, but the cure is worse than the disease. + +2014-06-17 Paul Eggert <eggert@cs.ucla.edu> + + build: don't make output files read-only + This led to problems, such as the prompt "mv: try to overwrite + 'egrep', overriding mode 0555 (r-xr-xr-x)? " during a build. + It can be worked around, but the cure is worse than the disease; + making output files read-only is more trouble than it's worth. + * doc/Makefile.am (grep.1, egrep.1, fgrep.1): + * lib/Makefile.am (colorize.c): + * src/Makefile.am (egrep fgrep): + Don't make output files read-only. Prefer separate commands to + '&&' when either will do. + +2014-06-08 Paul Eggert <eggert@cs.ucla.edu> + + maint: remove grep.spec + * grep.spec: Remove; obsolete and evidently not used. + +2014-06-07 Paul Eggert <eggert@cs.ucla.edu> + + doc: use gnulib fdl module + * bootstrap.conf (gnulib_modules): Add fdl. + * doc/fdl.texi: Remove, as this now comes from gnulib. + * doc/.gitignore: Update to match current sources. + +2014-06-06 Jim Meyering <meyering@fb.com> + + build: improve rule to generate egrep+fgrep scripts + * src/Makefile.am (egrep fgrep): chmod a=rx generated files, + and remove $@-t before attempting to redirect to it, in case it + is read-only. + + build: don't redirect directly to $@ + * lib/Makefile.am (colorize.c): Don't redirect directly to target, $@. + Otherwise, we could create a corrupt colorize.c file with a + timestamp that indicates it is up to date. + Also, make the generated file read-only. 
+ +2014-06-05 Paul Eggert <eggert@cs.ucla.edu> + + grep: undo part of previous change + * src/dfa.c (enlist): Undo part of previous change that doesn't + look correct and doesn't help performance much anyway. + + grep: use system strstr if available and fast + Problem reported by Norihiro Tanaka in: http://bugs.gnu.org/17700 + * NEWS: Document this. + * bootstrap.conf (gnulib_modules): Add strstr. + * src/dfa.c (istrstr): Remove. + (enlist): Use strstr instead. Wait until we need memory before + allocating it; this can save an unnecessary allocate and free. + + build: update gnulib submodule to latest + +2014-06-03 Jim Meyering <meyering@fb.com> + + maint: post-release administrivia + * NEWS: Add header line for next release. + * .prev-version: Record previous version. + * cfg.mk (old_NEWS_hash): Auto-update. + + version 2.20 + * NEWS: Record release date. + +2014-05-30 Jim Meyering <meyering@fb.com> + + grep: fix --max-count=N (-m N) to stop reading after Nth match + With --max-count=N (-m N), grep is supposed to stop reading input + after it has found the Nth match. However, a recent context- + related change made it so grep would always read to end of file. + * src/grep.c (prtext): Don't let a negative "out_after" value + make "pending" line count negative. + * tests/max-count-overread: New test, for this. + * tests/Makefile.am (TESTS): Add it. + * NEWS (Bug fixes): Mention it. + * THANKS: Add names of two recent bug reporters. + This bug was introduced by commit v2.18-139-g5122195. + Reported by Marc Aldorasi in http://bugs.gnu.org/17640. + +2014-05-29 Jim Meyering <meyering@fb.com> + + dfa: fix off-by-one under-allocation from recent change + Commit v2.19-10-gc32ff67 mistakenly made this change: + -realloc_trans_if_necessary (d, 1); + +realloc_trans_if_necessary (d, 0); + which led to a heap buffer overflow. + * src/dfa.c (dfaexec): Allocate space for one state, as before. 
+ +2014-05-28 Norihiro Tanaka <noritnk@kcn.ne.jp> + + dfa: fix bug with regex containing multiple begin/end-line constraints + grep -E 'a(b$|c$)' would mistakenly match "aa". + * src/dfa.c (dfamust): When resetting 'is' in OR, also reset + 'begline' and 'endline' of 'must'. + * NEWS (Bug fixes): Mention it. + This bug was introduced via commit v2.18-85-g2c94326. + Reported by Péter Radics in <http://bugs.gnu.org/17617>. + +2014-05-26 Norihiro Tanaka <noritnk@kcn.ne.jp> + + dfa: simplify building initial state + build_state_zero doesn't need the struct dfa to be initialized, + so remove the initialization and simplify. + * src/dfa.c (build_state_zero): Remove. + (dfaexec): Call realloc_trans_if_necessary and build_state directly. + + dfa: revert "grep: do not count newline before the start of buffer" + This reverts commit 5dc3af2806d21455b818be3f9da26c372e4a7f8d. + The previous change renders that commit unnecessary. + + dfa: do not clear the first state of a transition table + If the number of DFA states reaches 1024, build_state clears transition + tables to save memory. However, the initial state is always used, + so clearing it just wastes time. + * src/dfa.c (build_state): Do not clear the initial state's + transition and failure tables. + + grep: remove unnecessary argument + * src/grep.c (do_execute): Remove argument 'start_ptr'. It's always null. + All uses changed.
+ * lib/Makefile.am (nodist_libgreputils_a_SOURCES): New macro. + (libgreputils_a_SOURCES): Remove colorize.c. + (CLEANFILES): Add colorize.c + (colorize.c): New rule. + +2014-05-23 behoffski <behoffski@grouse.com.au> + + maint: uncapitalize first letter of two dfaerror message strings + * dfa.c (lex): Make two message strings consistent with all of + the others: do not capitalize the first letter of the first word. + +2014-05-23 Jim Meyering <meyering@fb.com> + + maint: revert "grep: port mb_next_wc to RHEL 6.5 x86-64" + This reverts commit v2.18-148-ga6ae68d. + Now that we have gnulib change v0.1-131-g2a045bc, "mbrlen, mbrtowc: + fix bug with empty input", this work-around is no longer needed. + + gnulib: update, for mbrlen/mbrtowc empty input bug fix + +2014-05-22 Jim Meyering <meyering@fb.com> + + maint: post-release administrivia + * NEWS: Add header line for next release. + * .prev-version: Record previous version. + * cfg.mk (old_NEWS_hash): Auto-update. + + version 2.19 + * NEWS: Record release date. + +2014-05-21 Jim Meyering <meyering@fb.com> + + maint: avoid new false-positive syntax-check failure + * cfg.mk (exclude_file_name_regexp--sc_prohibit_doubled_word): + Exempt new test file that contains legitimate use of "in in". + +2014-05-17 Norihiro Tanaka <noritnk@kcn.ne.jp> + + tests: add test case for newline-count fix + * tests/count-newline: New test. + * tests/Makefile.am (TESTS): Add it. + +2014-05-16 Norihiro Tanaka <noritnk@kcn.ne.jp> + + grep: do not count newline before the start of buffer + * src/dfa.c (build_state): When checking whether the previous + character was a newline, do not count any newline before the + start of the buffer. + +2014-05-15 Paul Eggert <eggert@cs.ucla.edu> + + grep: port mb_next_wc to RHEL 6.5 x86-64 + * src/searchutils.c (mb_next_wc): Work around glibc bug 16950; see: + https://sourceware.org/bugzilla/show_bug.cgi?id=16950 + This bug was masked in the other GNU/Linux tests I made. 
It was + exposed on RHEL 6.5 x86-64, where the compiler (GCC Red Hat 4.4.7-4) + happened to use temporaries in a different way. + Also see recent changes to the Gnulib documentation in this area: + http://lists.gnu.org/archive/html/bug-gnulib/2014-05/msg00013.html + + tests: port mb-non-UTF8-performance to RHEL 6.5 + * tests/mb-non-UTF8-performance (timeout): Use an integer, + as 'timeout 1.234' doesn't work in EUC locales. + +2014-05-12 Paul Eggert <eggert@cs.ucla.edu> + + egrep, fgrep: port to Solaris 10 /bin/sh + This old shell doesn't grok ${0%/*}; see: http://bugs.gnu.org/17471 + * src/Makefile.am (egrep fgrep): Don't assume the shell does substrings. + * src/egrep.sh (dir): New var, so that the substring calculation is + done only once (which is a small win even with newer shells), + and so that the calculation is easier to edit on older shells. + +2014-05-10 Jim Meyering <meyering@fb.com> + + maint: NEWS: adjust wording to reflect move + * NEWS (Improvements): Correct direction-relative wording, + now that the referent is below, not above. + + maint: NEWS: move "Improvements" to the top + * NEWS: Move the small "Improvements" section to precede + the longer "Bug fixes" one. + + gnulib: update submodule to latest, and bootstrap + * gnulib: Update submodule. + * bootstrap: Update from gnulib. + +2014-05-10 Paul Eggert <eggert@cs.ucla.edu> + + dfa: omit double includes + * src/dfa.c: Don't include stddef.h or stdbool.h, as dfa.h includes + them already, and it's the same module as we are. + Suggested by Aharon Robbins in: http://bugs.gnu.org/17458 + + dfa: fix bug with \< etc in multibyte locales + Problem reported by Stephane Chazelas in: http://bugs.gnu.org/16867 + * NEWS: Document the fix. + * src/dfa.c (dfaoptimize): Remove any superset if changing from + UTF-8 to unibyte, and if the pattern has no backreferences. + (dfassbuild): In multibyte locales, treat \< \> \b \B as + backreferences in the DFA, since the DFA relies on unibyte + tests to check them. 
+ (dfacomp): Optimize after building the superset, so that + dfassbuild can depend on d->multibyte. A downside is that + dfaoptimize must remove supersets that are likely slower than the + DFA after optimization, but that's been done in the + above-described change. + * tests/Makefile.am (XFAIL_TESTS): Remove word-delim-multibyte, + since the test works now. + + tests: add test case for -C 0 change + * tests/context-0: New test. + * tests/Makefile.am (TESTS): Add it. + + grep: -A 0, -B 0, -C 0 now output a separator + Problem reported by Dan Jacobson in: http://bugs.gnu.org/17380 + * NEWS: + * doc/grep.texi (Context Line Control): Document this. + * src/grep.c (prtext): Output a separator even if context is zero. + (main): Default context is now -1, not 0. + +2014-05-09 Paul Eggert <eggert@cs.ucla.edu> + + grep: minor improvements to retry-DFA-superset patch + * src/dfasearch.c (EGexecute): Avoid unnecessary test in a context + where memrchr cannot return a null pointer. + +2014-05-09 Norihiro Tanaka <noritnk@kcn.ne.jp> + + grep: retry DFA superset after matching multiple lines + * src/dfasearch.c (EGexecute): Without this patch, the code reverts + to KWset when the DFA superset matches multiple lines. + However, if the DFA superset matches multiple lines, it most likely + also matches a single line, and reverting to KWset means dfafast + won't work effectively. Change the code so that it retries the DFA + superset immediately after it matches multiple lines. On my platform + this improves the performance of "LC_ALL=C grep '\(ab\)cd\1d' k" from + 3.48 to 2.14 seconds realtime, where k contains the output of + "yes abcdabc | head -50000000". + + dfa: fix inconsistency in multibyte locales + * src/dfa.c (dfaexec): Use the same exit condition in multibyte + locales as in unibyte.
+ +2014-05-08 Jim Meyering <meyering@fb.com> + + maint: mark some breakless cases with /* fallthrough */ comment + * src/dfa.c (addtok_mb, dfaanalyze): Add comment so that it is + clear that the "break" statement is deliberately omitted. + +2014-05-08 Paul Eggert <eggert@cs.ucla.edu> + + dfa: assume C89 for CHAR_BIT + * src/dfa.c (CHARBITS): Remove. All uses replaced by CHAR_BIT. + (NOTCHAR): Now an enum, since it need not be a macro. + + dfa: don't assume unsigned int is exactly 32 bits wide + Sun C 5.12 (sparc) warns of the potential unportability. + * src/dfa.c (charclass_word): New type, for clarity. + All relevant uses of 'unsigned' changed. + (CHARCLASS_WORD_BITS): Rename from INTBITS. All uses changed. + Now an enum, since it needn't be a macro. + (CHARCLASS_WORD_MASK): New macro. + (CHARCLASS_WORDS): Rename from CHARCLASS_INTS. All uses changed. + (setbit, clrbit): Cast 1 to charclass_word, for clarity. + (notset, add_utf8_anychar, dfastats): + Don't assume unsigned int is exactly 32 bits wide. + (dfastate): Don't rely on implementation-defined conversion of + greater-than-INT_MAX unsigned to int. Change bit test to resemble + tstbit more. + + maint: fix indenting to pacify 'prohibit_tab_based_indentation' + * src/dfa.c: Use spaces and not tabs to indent some lines. + + grep: simplify and clarify invert-related code + * src/grep.c (out_invert, prtext): Use bool for booleans. + (prline): Remove unnecessary '!!' on a value that is always 0 or 1. + (prtext): Remove last arg NLINESP; use !out_invert instead. All uses + changed. Move decls to nearer uses, since we can assume C99 here. + Update 'outleft' and 'after_last_match' here; it's simpler. + (grepbuf): Compute return value by subtracting new from old 'outleft', + rather than by keeping a separate running total. Avoid code duplication + by arranging for prtext to be called from one place, not three. 
+ +2014-05-08 Norihiro Tanaka <noritnk@kcn.ne.jp> + + grep: improve performance of -v when combined with -L, -l or -q + Problem reported by Jörn Hees in: http://bugs.gnu.org/17427 + * src/grep.c (grepbuf, grep): When -v is combined with -L, -l, or -q, + don't read data unnecessarily after a non-match is found. + +2014-05-06 Paul Eggert <eggert@cs.ucla.edu> + + doc: mention performance changes + * NEWS: Discuss recent performance improvements and downgrades. + + dfa: clarify use of "if" + The phrase "Y is true if X" is logically equivalent to "X implies Y", + but often "X if and only if Y" was intended. + * src/dfa.c, src/dfa.h: Reword to avoid the incorrect use of "if". + + dfa: minor performance improvement for previous change + * src/dfa.c (struct dfa): New member 'fast'. Remove 'has_backref'. + All uses changed. + +2014-05-06 Norihiro Tanaka <noritnk@kcn.ne.jp> + + dfa: speed up 'dfaisfast' + * src/dfa.c (struct dfa): New member 'has_backref'. + (addtok_mb): Set it. + (dfaisfast): Use it. + +2014-05-05 Paul Eggert <eggert@cs.ucla.edu> + + grep: fix -w match next to a multibyte letter + * NEWS: Document this. + * src/dfasearch.c, src/kwsearch.c (WCHAR): Remove. + (wordchar): New static function. + * src/dfasearch.c (EGexecute): + * src/kwsearch.c (Fexecute): Use the new functions, so that the + code works correctly if a multibyte character adjacent to the + match has two or more bytes. + * src/search.h, src/searchutils.c (mb_prev_wc, mb_next_wc): + New functions. + * tests/word-delim-multibyte: Add a test for grep -w (which now + passes), and a test for \> (which still fails). The \< test also + still fails. + + grep: improve internal API for multibyte boundary + * src/search.h, src/searchutils.c (mb_goback): Rename from + is_mb_middle. Omit last arg. Return number of bytes to go back, + not just a boolean. All uses changed. + * src/dfasearch.c (EGexecute): + * src/kwsearch.c (Fexecute): Adjust to API change. 
+ * src/kwsearch.c (Fexecute): Eliminate common subexpression. + + grep: fix encoding-error incompatibilities among regex, DFA, KWset + This follows up to http://bugs.gnu.org/17376 and fixes a different + set of incompatibilities, namely between the regex matcher and the + other matchers, when the pattern contains encoding errors. + The GNU regex matcher is not consistent in this area: sometimes + an encoding error matches only itself, and sometimes it + matches part of a multibyte character. There is no documentation + for grep's behavior in this area and users don't seem to care, + and it's simpler to defer to the regex matcher for problematic + cases like these. + * NEWS: Document this. + * src/dfa.c (ctok): Remove. All uses removed. + (parse_bracket_exp, atom): Use BACKREF if a pattern contains + an encoding error, so that the matcher will revert to regex. + * src/dfasearch.c, src/grep.c, src/pcresearch.c, src/searchutils.c: + Don't include dfa.h, since search.h now does that for us. + * src/dfasearch.c (EGexecute): + * src/kwsearch.c (Fexecute): In a UTF-8 locale, there's no need to + worry about matching part of a multibyte character. + * src/grep.c (contains_encoding_error): New static function. + (main): Use it, so that grep -F is consistent with plain fgrep + when the pattern contains an encoding error. + * src/search.h: Include dfa.h, so that kwsearch.c can call using_utf8. + * src/searchutils.c (is_mb_middle): Remove UTF-8-specific code. + Callers now ensure that we are in a non-UTF-8 locale. + The code was clearly wrong, anyway. + * tests/fgrep-infloop, tests/invalid-multibyte-infloop: + * tests/prefix-of-multibyte: + Do not require that grep have a particular behavior for this test. + It's OK to match (exit status 0), not match (exit status 1), or + report an error (exit status 2), since the pattern contains an + encoding error and grep's behavior is not specified for such + patterns. Test only that KWset, DFA, and regex agree.
+ * tests/prefix-of-multibyte: Add tests for ABCABC and __..._ABCABC___. + +2014-05-04 Paul Eggert <eggert@cs.ucla.edu> + + dfa: minor simplification + * src/dfa.c (parse_bracket_exp): Use enum, not macro, and move var + to just the scope it's needed. + + grep: simplify and fix problems with KWset-DFA agreement patch + * src/dfa.c (dfambcache, parse_bracket_exp): Simplify. + (mbs_to_wchar, wctok, FETCH_WC, match_anychar, match_mb_charset) + (check_matching_with_multibyte_ops, transit_state_consume_1char) + (transit_state, dfaexec): Use wint_t, not wchar_t, so that + WEOF is treated correctly on platforms where WEOF is not a valid + wchar_t value. + (ctok, lex): Use int, not unsigned int, for characters, + so that EOF is treated more naturally. + (parse_bracket_exp): Use NOTCHAR to mark uninitialized char, since + FETCH_WC can now set the char to EOF. + (lex): Remove unnecessary test for EOF. + (parse_bracket_exp, atom): Swap then and else parts, to put + the small one first; this is more readable here. + * src/searchutils.c (is_mb_middle): Simplify. + + tests: improve coverage for prefix-of-multibyte + * tests/prefix-of-multibyte: Also test the regex version. + +2014-05-04 Norihiro Tanaka <noritnk@kcn.ne.jp> + + grep: make KWset and DFA agree about invalid sequences in patterns + See: http://bugs.gnu.org/17376 + * src/dfa.c (dfambcache): Don't cache invalid sequences, because they can't be + represented by wide characters. + (dfambcache, mbs_to_wchar): Return WEOF for invalid sequences. + (ctok): New global variable. + (parse_bracket_exp, atom, match_anychar, match_mb_charset): Don't allow WEOF. + (lex): Set 'ctok'. + * src/kwsearch.c (Fexecute): + * src/searchutils.c (is_mb_middle): Don't check here. + * tests/invalid-multibyte-infloop: Adjust to fixed behavior. + * tests/prefix-of-multibyte: Add test cases for this bug. 
+ +2014-05-03 Jim Meyering <meyering@fb.com> + + maint: make ChangeLog generation more robust + * Makefile.am (gen-ChangeLog): Sync changes from GNU coreutils, + to ensure exit status is propagated, and to support an optional + git-log-fix file. + +2014-05-03 Paul Eggert <eggert@cs.ucla.edu> + + grep: clarify EGexecute slightly + * src/dfasearch.c (EGexecute): Change if-then-else to !if-else-then. + +2014-05-03 Norihiro Tanaka <noritnk@kcn.ne.jp> + + grep: fix the bug in the previous patch. + * src/dfasearch.c (EGexecute): Do it. + +2014-04-30 Paul Eggert <eggert@cs.ucla.edu> + + grep: simplify EGexecute further + * src/dfa.c, src/dfa.h (dfasuperset): Arg is now a const pointer. + Now pure. + * src/dfasearch.c (EGexecute): Coalesce some duplicate code. + Don't worry about memrchr returning NULL when that's impossible. + +2014-04-30 Norihiro Tanaka <noritnk@kcn.ne.jp> + + grep: adjust timing back to kwset when dfaisfast is true + * src/dfasearch.c (EGexecute): If DFA fails after kwset succeeds, + the code doesn't return to kwset until it reaches the end of the buffer + or finds a match. Because of this, although some cases speed up, + others slow down. + + Adjust the heuristic for switching to the DFA, so that it + is more likely to switch at the right times. + +2014-04-30 Norihiro Tanaka <noritnk@kcn.ne.jp> + + grep: simplify superset + * src/dfa.h (dfahint): Remove decl. + (dfasuperset): New decl. + * src/dfa.c (dfahint): Remove. + (dfassbuild): Rename from dfasuperset. + (dfasuperset): New function. It returns the superset of D. + * src/dfasearch.c: Use dfasuperset instead of dfahint, and simplify. + + dfa: optimize memory allocation + * src/dfa.c (epsclosure): Get the value of 'visited' from the argument. + (dfaanalyze): Define and allocate variable 'visited'. + (dfastate): Use 'merge', not 'insert', to insert positions for + state 0 of the DFA.
+ +2014-04-29 Norihiro Tanaka <noritnk@kcn.ne.jp> + + kwset: improve performance by inlining tr + Without this change, older versions of GCC won't inline 'tr', and this + can hurt performance significantly. See: http://bugs.gnu.org/17229#64 + * src/kwset.c (tr): Make it inline. + +2014-04-27 Jim Meyering <meyering@fb.com> + + gnulib: update to latest + * gnulib: This fixes a bug whereby running bootstrap + would remove our build-aux/git-log-fix file. + +2014-04-27 Paul Eggert <eggert@cs.ucla.edu> + + kwset: improve performance by inlining more + Problem reported by Norihiro Tanaka in <http://bugs.gnu.org/17229#55>. + * src/kwset.c (bmexec_trans): Rename from bmexec, and make it inline. + (bmexec): New implementation, which calls bmexec_trans. This helps + GCC inline more aggressively with the default optimization, and + improves performance 25% with the reported benchmark on my host. + +2014-04-26 Paul Eggert <eggert@cs.ucla.edu> + + kwset: speed up by using memchr2 + Idea suggested by Eric Blake in: http://bugs.gnu.org/17229#43 + * bootstrap.conf (gnulib_modules): Add memchr2. + * src/kwset.c: Include stdint.h, for uintptr_t. Include memchr2.h. + (struct kwset): New members gc1, gc2, gc1help. + (tr): Move earlier, so it can be used earlier. + (kwsprep): Initialize struct kwset's new members. + (memchr_kwset): Rename from memchr_trans. Combine C and TRANS args into + new arg KWSET. All uses changed. Use memchr2 when appropriate. + (bmexec): Use new members instead of recomputing their values. + Increase advance_heuristic; it's just a guess, but memchr2 probably + makes it reasonable to increase it. + + kwset: improve performance when large Boyer-Moore key doesn't match + * src/kwset.c (bmexec): As a heuristic, prefer memchr to seeking + by delta1 only when the latter doesn't advance much. + + dfa: fix index bug in previous patch, and simplify + * src/dfa.c, src/dfa.h (dfaisfast): Arg is const pointer. 
+ * src/dfa.c (dfaisfast): Simplify, since supersets never contain BACKREF. + * src/dfa.h (dfaisfast): Declare to be pure. + * src/dfasearch.c (EGexecute): Fix typo that could cause buffer + read overrun when !dfafast. Hoist duplicate computation out + of an if's then and else parts. + +2014-04-26 Norihiro Tanaka <noritnk@kcn.ne.jp> + + grep: speed up cases where the DFA repeatedly fails after kwset succeeds + A DFA is typically much faster if it is unibyte and does not set BACKREF. + Skip kwset if the DFA is fast. For example: + + yes abcdabc | head -50000000 >k + env LC_ALL=C time -p src/grep -i 'abcd.bd' k + + This improved real time from 4.86 s to 1.34 s. + + * src/dfa.c, src/dfa.h (dfaisfast): New function. + * src/dfasearch.c (EGexecute): Use it. + +2014-04-24 Paul Eggert <eggert@cs.ucla.edu> + + dfa: fix recently-introduced memory leak + Problem reported by Aharon Robbins in: http://bugs.gnu.org/17341 + * src/dfa.c (dfasuperset): Call free after dfafree. + + misc: fix doc and test bugs re grep -z + Problem reported by Stephane Chazelas in: http://bugs.gnu.org/16871 + * doc/grep.texi (Usage): Remove incorrect example with -P. + * tests/pcre: Improve test so that it actually tests whether \s + matches a newline. + + dfa: minor simplification of dfaexec + * src/dfa.c (dfaexec): Streamline updating of returned values. + Don't bother to check d->multibyte before updating mbp. + Avoid duplicate p > end test. + +2014-04-24 Paul Eggert <eggert@cs.ucla.edu> + + dfa: simplify and be more consistent about MB_CUR_MAX + * src/dfa.c (struct dfa): New member 'multibyte', + replacing 'mb_cur_max'. All uses changed. Use this new member + consistently, instead of sometimes referring to MB_CUR_MAX directly. + + dfa: fix comment + * src/dfa.c (maybe_realloc): Fix comment to match behavior better.
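The 2014-04-26 "kwset: speed up by using memchr2" entry relies on Gnulib's memchr2, which scans a buffer for the first byte equal to either of two values. With -i, those two values can be the two case variants of a key byte, which presumably is what the new gc1 and gc2 members hold. The sketch below mimics that behavior with a portable byte loop. The names find_either and next_candidate are hypothetical. A production memchr2 would likely examine several bytes per step.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for Gnulib's memchr2: return a pointer to the
   first byte in S[0..N-1] equal to C1 or C2 (both converted to
   unsigned char), or NULL if there is none.  One byte per step.  */
static void const *
find_either (void const *s, int c1, int c2, size_t n)
{
  unsigned char const *p = s;
  unsigned char b1 = c1, b2 = c2;
  for (size_t i = 0; i < n; i++)
    if (p[i] == b1 || p[i] == b2)
      return p + i;
  return NULL;
}

/* For a case-insensitive search in a unibyte locale, return the next
   position in TEXT[0..N-1] that could hold the last byte of a match,
   given that the key's last byte is LAST.  GC1 and GC2 are the two
   case variants of that byte.  */
static char const *
next_candidate (char const *text, size_t n, unsigned char last)
{
  int gc1 = tolower (last);
  int gc2 = toupper (last);
  return find_either (text, gc1, gc2, n);
}
```

A single pass that looks for both case variants avoids running two separate memchr scans and taking the earlier hit.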
+ +2014-04-24 Norihiro Tanaka <noritnk@kcn.ne.jp> + + grep: skip the multibyte character boundary check upon reaching eolbyte + * src/dfa.c (dfaexec): Skip the multibyte character boundary check + upon reaching eolbyte. + +2014-04-24 Paul Eggert <eggert@cs.ucla.edu> + + dfa: fix incorrect comment that led to heap overrun + * dfa.c (maybe_realloc): Fix comment to match behavior. + + dfa: minor tuneup of dfamust memory savings patch + * src/dfa.c (allocmust): Use xmalloc, not xzalloc. + Initialize the must completely, so that the caller need not + invoke resetmust. All callers changed. + (dfamust): Omit asserts that aren't needed on typical machines + where dereferencing NULL dumps core. Don't leak memory if the + pattern contains a NUL byte. + +2014-04-24 Norihiro Tanaka <noritnk@kcn.ne.jp> + + grep: avoid wasting memory for large patterns in dfamust + * src/dfa.c (struct must): New member 'prev'. It points to the + previous must. + (allocmust): New function. + (freemust): New function. + (dfamust): Use them. + +2014-04-24 Jim Meyering <meyering@fb.com> + + grep: fix new heap write buffer overrun + * src/dfa.c (parse_bracket_exp): Fix off-by-one allocation error. + Exposed by running the tests with an ASAN-enabled binary (i.e., + created using gcc's -fsanitize=address option). Introduced by + commit v2.18-70-gd3d9612, "dfa: simplify range char allocation". + +2014-04-24 Paul Eggert <eggert@cs.ucla.edu> + + build: suppress unsafe-loop-optimizations warnings + I ran into one of these while trying out GCC 4.9.0's new + -fsanitize=undefined option. The warning told me that GCC didn't + do an unsafe optimization, but in 'grep' this is not typically a + symptom of a programming error. + * configure.ac (WERROR_CFLAGS): Suppress -Wunsafe-loop-optimizations. + +2014-04-23 Paul Eggert <eggert@cs.ucla.edu> + + dfa: fix memory leak reintroduced by previous patch + Reported by Norihiro Tanaka in <http://bugs.gnu.org/17328#16>.
+ * src/dfa.c (dfaexec): Allocate mb_match_lens and mb_follows only + if not already allocated. + (free_mbdata): Null out mb_match_lens to mark it as being freed. + +2014-04-23 Jim Meyering <meyering@fb.com> + + tests: use consistent spelling for locale name, en_US.UTF-8 + * tests/pcre-infloop: Spell locale name, en_US.UTF-8, consistently, + converting this one use from "en_US.utf8", which would provoke a + test failure on OS/X. + +2014-04-23 Paul Eggert <eggert@cs.ucla.edu> + + dfa: omit static variables that limited dfaexec to one struct dfa + Problem reported by Aharon Robbins in: http://bugs.gnu.org/17328 + * src/dfa.c (struct dfa): New member mbs. + mb_follows is now a position_set, not a pointer to one; + this simplifies memory allocation. All uses changed. + (mbs_to_wchar): Put DFA arg at the end, in place of the mbstate_t *arg, + since the DFA now contains an mbstate_t. All uses changed. + (mbs): Remove static variable. + (dfaexec): Remove static bool that attempted to optimize memory + allocation, as this wasn't correct for Gawk. Perhaps we can think + of a better way to optimize memory. + +2014-04-22 Paul Eggert <eggert@cs.ucla.edu> + + kwset: simplify and speed up Boyer-Moore unibyte -i in some cases + This improves the performance of, for example, + yes jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj | head -10000000 | grep -i jk + in a unibyte locale. + * src/kwset.c (memchr_trans): New function. + (bmexec): Use it. Simplify the code and remove some of the + confusing gotos and breaks and labels. Do not treat glibc memchr + as a special case; if non-glibc memchr is slow, that is lower + priority and I suppose we can try to work around the problem in + gnulib. + +2014-04-22 Norihiro Tanaka <noritnk@kcn.ne.jp> + + grep: speed-up by using memchr() in Boyer-Moore searching + memchr() of glibc is faster than seeking by delta1 on some platforms. + When there is no chance to match for a while, use it on them. 
+ * src/kwset.c (bmexec): Use memchr() in Boyer-Moore searching. + +2014-04-22 Paul Eggert <eggert@cs.ucla.edu> + + kwset: simplify Boyer-Moore with unibyte -i + This change doesn't significantly affect performance on my platform, + and should make the code easier to maintain. + * src/kwset.c (BM_DELTA2_SEARCH, LAST_SHIFT, TRANS): + Remove these macros, in favor of ... + (tr, bm_delta2_search): New functions. All uses changed. + The latter function is inline because this improves code size and + runtime CPU slightly on x86-64 with gcc -O2 (GCC 4.9.0). + (bmexec): Prefer tr when that's simpler. + +2014-04-22 Norihiro Tanaka <noritnk@kcn.ne.jp> + + grep: may also use Boyer-Moore algorithm for case-insensitive matching + * src/kwset.c (BM_DELTA2_SEARCH, LAST_SHIFT, TRANS): New macro. + (bmexec): Use character translation table. + (kwsexec): Call bmexec for case-insensitive matching. + (kwsprep): Change the `if' condition. + +2014-04-21 Paul Eggert <eggert@cs.ucla.edu> + + grep: -P now rejects invalid input sequences in UTF-8 locales + See <http://bugs.gnu.org/17245> and <http://bugs.exim.org/1468>. + * NEWS: Document this. + * src/pcresearch.c (Pexecute): Do not use PCRE_NO_UTF8_CHECK, + as this leads to undefined behavior when the input is not UTF-8. + * tests/pcre-infloop, tests/pcre-invalid-utf8-input: + Exit status is now 2, not 1, when grep -P is given invalid UTF-8 + data in a UTF-8 locale. + + dfa: minor improvements to previous patch + * src/dfa.c (dfamust): Use &=, not if-then. + * src/dfa.h (struct dfamust): + * src/dfasearch.c (begline, hwsmusts): Use bool for boolean. + * src/dfasearch.c (kwsmusts): + * src/kwsearch.c (Fcompile): Prefer decls after statements. + * src/dfasearch.c (kwsmusts): Avoid conditional branch. + * src/kwsearch.c (Fcompile): Unify the two calls to kwsincr. + +2014-04-21 Norihiro Tanaka <noritnk@kcn.ne.jp> + + grep: speed-up for exact matching with begline and endline constraints. 
+ dfamust turns on the flag when a state exactly matches the proposed one. + However, when the state has begline and/or endline constraints, it turns + the flag off. + + This patch makes it possible to match a state exactly, even if the state + has begline and/or endline constraints. If an exact string has either of + these constraints, the string with eolbyte added at its head and/or tail + is passed to kwsincr(). In addition, if it has a begline constraint, start + searching from just before the position in the text. + + * src/dfa.c (variable must): New members `begline' and `endline'. + (dfamust): Take begline and endline constraints into account. + * src/dfa.h (struct dfamust): New members `begline' and `endline'. + * src/dfasearch.c (kwsmusts): If an exact string has a begline constraint, + start searching from just before the position in the text. + (EGexecute): Same as above. + * src/kwsearch.c (Fexecute): Same as above. + +2014-04-20 Paul Eggert <eggert@cs.ucla.edu> + + dfa: fix bug that caused NUL to be mishandled in patterns + This bug was introduced in the early-2012 patches that fixed some + context-handling bugs. Bisecting found commit + d8951d3f4e1bbd564809aa8e713d8333bda2f802 (2012-02-05 18:00:43 +0100), + but it appears the underlying problem was introduced in commit + 8b47c4cf6556933f59226c234b0fe984f6c77dc7 (2012-01-03 11:22:09 +0100). + * NEWS: Mention bug fix. + * src/dfa.c (char_context): Consider NUL to be a newline only if -z. + * tests/Makefile.am (TESTS): Add null-byte. + * tests/null-byte: New file. + +2014-04-19 Jim Meyering <meyering@fb.com> + + build: reenable some compiler warning options + +2014-04-18 Paul Eggert <eggert@cs.ucla.edu> + + dfa: fix pointer type conversion bug + The code converted between size_t * and ptrdiff_t *, which wasn't + diagnosed by modern x86-64 GCC but isn't portable. Problem + reported by Norihiro Tanaka in <http://bugs.gnu.org/17136#31>. + * configure.ac (WERROR_CFLAGS): Don't add -Wno-pointer-sign.
+ We want GCC to diagnose pointer signedness problems, as they + violate the C standard and other compilers no doubt complain too. + * src/dfa.c (struct dfa): Change type of salloc to size_t. + (realloc_trans_if_necessary): Convert signed value to size_t before + passing its address to x2nrealloc. Changing the type of tralloc + to size_t might have led to problems elsewhere. + +2014-04-18 Jim Meyering <meyering@fb.com> + + maint: Revert "dfa: avoid new NULL dereference" + This reverts commit 5190041fe515743ef4545abf287d243bc025c701. + It was only a bug if one neglected to update to the latest gnulib. + With the newer x2nrealloc, there is no problem. + + dfa: avoid new NULL dereference + * src/dfa.c (dfa_charclass_index): Restore a "+ 1" mistakenly omitted + during recent improvements. Introduced in v2.18-66-g6a60fd5. + +2014-04-17 Paul Eggert <eggert@cs.ucla.edu> + + dfa: minor cleanup + * src/dfa.c (MAX): Remove; no longer used. + +2014-04-17 Norihiro Tanaka <noritnk@kcn.ne.jp> + + dfa: speed up by checking multibyte characters on demand + When dfaexec() runs in a non-UTF-8 locale, the length and wide-character + representation are computed for every character of a line of input. + However, if a match is found early in the line, the results for the + remaining characters are wasted. + + This patch checks multibyte characters on demand. It should work + faster for early matches, and reduces memory requirements. + + * src/dfa.c (struct dfa): Remove members mblen_buf, nmblen_buf, + inputwcs, ninputwcs. All uses removed. + (buf_begin, buf_end, prepare_wc_buf): Remove. All uses removed. + (SKIP_REMAINS_MB_IF_INITIAL_STATE): Remove. This is now expanded + when used. + (match_anychar, match_mb_charset, check_matching_with_multibyte_ops): + New arg wc, mbclen. Remove arg idx. All uses changed. + (transit_state_consume_1char): New arg wc. All uses changed. + (transit_state): New arg 'end'. All uses changed.
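The 2014-04-22 entries above change bmexec so that, when seeking by delta1 would not advance far, the search instead jumps with memchr to the next occurrence of the key's last byte. The sketch below combines that idea with a plain Horspool search (bad-character shift only). It is a simplified illustration, not grep's kwset code, which also has the Galil rule and translation tables. The names horspool_memchr and MEMCHR_THRESHOLD, and the threshold value of 4, are hypothetical.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tuning knob: when the bad-character shift is smaller
   than this, jump ahead with memchr instead.  */
enum { MEMCHR_THRESHOLD = 4 };

/* Return the offset of the first occurrence of KEY[0..M-1] in
   TEXT[0..N-1], or -1 if there is none.  */
static ptrdiff_t
horspool_memchr (char const *text, size_t n, char const *key, size_t m)
{
  if (m == 0)
    return 0;
  if (n < m)
    return -1;

  /* delta1[c]: how far the window may slide when its last byte is c.  */
  size_t delta1[UCHAR_MAX + 1];
  for (int c = 0; c <= UCHAR_MAX; c++)
    delta1[c] = m;
  for (size_t i = 0; i + 1 < m; i++)
    delta1[(unsigned char) key[i]] = m - 1 - i;

  unsigned char last = key[m - 1];
  size_t pos = m - 1;           /* text index aligned with key's last byte */
  while (pos < n)
    {
      unsigned char c = text[pos];
      if (c == last && memcmp (text + pos + 1 - m, key, m - 1) == 0)
        return (ptrdiff_t) (pos + 1 - m);

      size_t shift = delta1[c];
      if (c != last && shift < MEMCHR_THRESHOLD)
        {
          /* Every match ends with LAST, so jumping to the next
             occurrence of LAST cannot skip a match.  */
          char const *hit = memchr (text + pos + 1, last, n - pos - 1);
          if (!hit)
            return -1;
          pos = (size_t) (hit - text);
        }
      else
        pos += shift;
    }
  return -1;
}
```

The memchr jump is always safe. The threshold only decides when it is likely to be faster than a short delta1 step, which matches the "heuristic" wording in the 2014-04-26 entry.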
+ + +2014-04-17 Paul Eggert <eggert@cs.ucla.edu> + + dfa: trans reallocation microoptimization + * src/dfa.c (realloc_trans_if_necessary): + Help the compiler avoid unnecessary reloads. + + dfa: simplify dfamust initialization + * src/dfa.c (dfamust): Don't initialize musts twice. + Use zcalloc, not xmalloc followed by zeroing. + Make result a const pointer. + + dfa: simplify freelist + * src/dfa.c (freelist): Don't null out array while freeing its + pointers; the caller can do that if needed. + (resetmust): Null out zeroth entry of array. + + dfa: avoid duplicate strlen when allocating memory + * src/dfa.c (dfamust): Use xstrdup, not strlen (twice) + xmemdup. + + dfa: simplify memory allocation + * src/dfa.c (icatalloc, freelist, enlist, comsubs, addlists, inboth) + (dfamust): Don't worry about null arguments or results, + as memory allocators can no longer return null pointers. + (dfamust): Invoke malloc just once when building a concatenated string. + + dfa: simplify position set and element count allocation + * src/dfa.c (dfaanalyze): Allocate position set info all at one go, + and similarly for element count info. + + dfa: simplify multibyte_prop allocation + * src/dfa.c (struct dfa): Simplify by removing nmultibyte_prop; + it should always be the same as talloc. All uses changed. + + dfa: simplify range char allocation + * src/dfa.c (struct dfa): Simplify by allocating one array of ranges + rather than one for range starts and another for range ends. + All uses changed. + + dfa: simplify transition table allocation + * src/dfa.c (struct dfa): Remove member 'realtrans', as it can + be computed from 'trans'. All uses changed. + (realloc_trans_if_necessary): Move earlier, to avoid a forward decl. + Use x2nrealloc to compute new size, rather than doing it by hand, + which omits a check for unlikely overflow. + (realloc_trans_if_necessary, dfafree): Adjust to the fact that + d->trans now might be either NULL, or 1 + the pointer to free.
+ (build_state, build_state_zero): Use realloc_trans_if_necessary + instead of duplicating its code. + + dfa: better size-overflow check + * src/dfa.c (dfasuperset): Let xnmalloc do the multiplication, + to check for size arithmetic overflow better. + + dfa: avoid unnecessary work and other initialization + * src/dfa.c (dfaanalyze, dfainit): + Don't bother allocating when x2nrealloc will do it for us. + (dfastate): Allocate grps and labels on the stack, as their + size is known at compile time. + (build_state): Use xmalloc, not xnmalloc, since the multiplication + can be done at compile-time. + + dfa: clarify memory allocation and port to IRIX + This change was prompted by a porting problem: + IRIX defines its own MALLOC macro, which clashes with ours. + More generally, the MALLOC etc. macros are confusing, as they + look like functions but do not have C-function semantics. + A functional style makes the code easier to read, and though + it lengthens the code a bit here it'll make other + simplifications easier. + * src/dfa.c (XNMALLOC, XCALLOC, CALLOC, MALLOC, REALLOC): Remove. + All uses replaced by xnmalloc etc. + (REALLOC_IF_NECESSARY): Remove; all uses replaced by .... + (maybe_realloc): New function. + (copy, merge): Free and allocate rather than realloc, as we + needn't save the contents. + +2014-04-14 Jim Meyering <meyering@fb.com> + + tests: detect an infloop-inducing bug in grep -P (pcre-8.35) + * tests/pcre-infloop: New test. + * tests/Makefile.am (TESTS): Add it. + +2014-04-12 Paul Eggert <eggert@cs.ucla.edu> + + build: update gnulib submodule to latest + +2014-04-11 Paul Eggert <eggert@cs.ucla.edu> + + grep: improvements for the open-CSET patch + * src/dfa.c (dfamust): Simplify by removing some duplicate code. + Optimize patterns like [aaa] even when not case-folding. + Avoid an unnecessary copy of the charclass. 
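The 2014-04-17 "clarify memory allocation and port to IRIX" entry above replaces the REALLOC_IF_NECESSARY macro with a maybe_realloc function. The sketch below shows the general shape of such a grow-on-demand helper, with geometric growth and size-overflow checks. Its name, signature, and NULL-on-failure behavior are assumptions. The entries say grep builds on Gnulib's x2nrealloc, which reports allocation failure itself rather than returning NULL.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: ensure that the array PTR, which currently has
   room for *NALLOC items of ITEMSIZE bytes each (ITEMSIZE > 0), can
   hold index I.  Grow geometrically and check for size overflow.
   Return the possibly moved array, or NULL on failure, in which case
   PTR is still valid and *NALLOC is unchanged.  */
static void *
maybe_realloc_sketch (void *ptr, size_t i, size_t *nalloc, size_t itemsize)
{
  if (i < *nalloc)
    return ptr;                 /* already big enough: no work */

  size_t n = *nalloc ? *nalloc : 1;
  while (n <= i)
    {
      if (SIZE_MAX / 2 < n)
        return NULL;            /* doubling would overflow */
      n *= 2;
    }
  if (SIZE_MAX / itemsize < n)
    return NULL;                /* byte count would overflow */

  void *p = realloc (ptr, n * itemsize);
  if (!p)
    return NULL;
  *nalloc = n;
  return p;
}
```

Unlike a macro, the function evaluates its arguments once and has ordinary C semantics, which is the readability argument the entry makes.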
+ + +2014-04-11 Norihiro Tanaka <noritnk@kcn.ne.jp> + + grep: open CSET and transform into uppercase when MB_CUR_MAX == 1 + In unibyte locales with -i, kwset matching isn't helpful, because + dfamust doesn't extract the CSET entries. Fix dfamust so that it + does that, making it possible to extract a longer fixed string + from the tokens. + * src/dfa.c (dfamust): Open CSET and transform into uppercase + when MB_CUR_MAX == 1. + +2014-04-11 Paul Eggert <eggert@cs.ucla.edu> + + grep: cleanup for HAVE_DOS_FILE_CONTENTS issue + While cleaning up the empty-string fix, I noticed that one part of + the code worried about CRLF in pattern files whereas another part + did not. Fix this by using the same approach in both places, + and make the CRLF code more modular in the process. + * src/dosbuf.c (dos_binary, dos_unix_byte_offsets): New functions. + (undossify_input, dossified_pos): Do nothing if ! O_BINARY. + * src/grep.c: Always include dosbuf.c so that the code is + checked statically even on non-DOS hosts. + (dos_binary, dos_unix_byte_offsets): New decls. + (undossify_input): Declare unconditionally. + * src/grep.c (fillbuf, print_line_head, main): + * src/kwsearch.c (Fcompile): + Simplify by not worrying about HAVE_DOS_FILE_CONTENTS. + * src/grep.c (main): fopen with "rt" if O_TEXT; this is simpler + than worrying about HAVE_DOS_FILE_CONTENTS elsewhere. + * src/system.h (HAVE_DOS_FILE_CONTENTS): Remove. + + grep: cleanup for empty-string fix + * NEWS: Document it. + * src/dfasearch.c (GEAcompile): + * src/kwsearch.c (Fcompile): + Use C99-style decls to simplify. Avoid duplicate code. + * tests/empty-line: Add some more tests like this. + +2014-04-11 Norihiro Tanaka <noritnk@kcn.ne.jp> + + grep: no match for the empty string included in multiple patterns + * src/dfasearch.c (GEAcompile): Fix it. + * src/kwsearch.c (Fcompile): Fix it.
+ +2014-04-08 Paul Eggert <eggert@cs.ucla.edu> + + grep: remove bool_bf + The extra complexity of this microoptimization wasn't ever much help, + and currently it generated bigger code with gcc -O2 (x86-64). + * src/dfa.c (bool_bf): Remove. All uses replaced by plain 'bool', + without a bitfield. + +2014-04-08 Jim Meyering <meyering@fb.com> + + maint: avoid sc_po_check syntax-check failure (kwset.c) + * po/POTFILES.in: Remove kwset.c from this list, since it + no longer contains a translatable diagnostic. + +2014-04-08 Paul Eggert <eggert@cs.ucla.edu> + + grep: port better to hosts with nonstandard nl_langinfo + On some hosts, nl_langinfo returns strings other than "UTF-8" when + UTF-8 is used, and (worse) return "UTF-8" even if the encoding is + single-byte. Work around these problems by trying a sample + character instead. + * src/dfa.c, src/pcresearch.c, src/searchutils.c: + Don't include <langinfo.h>. + * src/dfa.c (using_utf8): Test for UTF-8 by trying a character + rather than by invoking nl_langinfo (CODESET); this is more + portable in practice, and removes a dependency on + HAVE_LANGINFO_CODESET. + * src/pcresearch.c: Include dfa.h, for using_utf8. + (Pcompile): Use using_utf8 rather than nl_langinfo. + +2014-04-07 Paul Eggert <eggert@cs.ucla.edu> + + grep: prefer bool in DFA internals + * src/dfa.c (bool_bf): New type. + (dfa_state): Use it, as this seems to generate slightly better + code with GCC. + (struct mb_char_classes, struct dfa, equal, case_fold, dfasyntax) + (laststart, parse_bracket_exp, lex, dfaparse, dfaanalyze, dfastate) + (match_mb_charset, dfamust): + Use bool for boolean. + (using_utf8) [!HAVE_LANGINFO_CODESET]: Tune. + (dfaanalyze): Prefer & to && and | to || on booleans; it's simpler here. + (dfastate): Simplify charclass nonzero testing. Redo has_mbcset + test so that the compiler's more likely to optimize it. 
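The 2014-04-08 entry changes using_utf8 to test for UTF-8 by decoding a sample character instead of calling nl_langinfo (CODESET). Below is a minimal sketch of such a probe. In UTF-8 the bytes C4 80 decode to U+0100, whereas in typical non-UTF-8 locales they do not. The function name is hypothetical. The sketch assumes that wchar_t values are Unicode code points in UTF-8 locales, as they are on glibc.

```c
#include <stdbool.h>
#include <wchar.h>

/* Hypothetical name.  Decide whether the current LC_CTYPE locale uses
   UTF-8 by decoding a sample character: in UTF-8 the two bytes C4 80
   encode U+0100 (LATIN CAPITAL LETTER A WITH MACRON).  */
static bool
using_utf8_sketch (void)
{
  wchar_t wc;
  mbstate_t mbs = { 0 };
  return mbrtowc (&wc, "\xc4\x80", 2, &mbs) == 2 && wc == 0x100;
}
```

A caller that asks often could cache the answer, as long as the locale does not change afterward.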
+ + +2014-04-07 Norihiro Tanaka <noritnk@kcn.ne.jp> + + grep: prefer regex to DFA for ANYCHAR in multibyte locales + * src/dfa.c (dfa_state): New member has_mbcset. + Rename backref to has_backref, and make it of type bool too. + All uses changed. + (state_index, dfastate): Initialize new member. + (dfaexec): Prefer regex to DFA for ANYCHAR in multibyte locales. + +2014-04-07 Paul Eggert <eggert@cs.ucla.edu> + + grep: remove trivial_case_ignore + This optimization is no longer needed, given the other + optimizations recently installed. Derived from a patch by + Norihiro Tanaka; see <http://bugs.gnu.org/17019>. + * bootstrap.conf (gnulib_modules): Remove assert-h. + * src/dfa.c (CASE_FOLDED_BUFSIZE): Move here from dfa.h. + Remove now-unnecessary static assert. + (case_folded_counterparts): Now static. + * src/dfa.h (CASE_FOLDED_BUFSIZE, case_folded_counterparts): + Remove decls; no longer public. + * src/dfasearch.c (kwsmusts): Use kwset even if MB_CUR_MAX > 1 + and case-insensitive. + * src/grep.c (MBRTOWC, WCRTOMB): Remove. + (fgrep_to_grep_pattern): Use mbrtowc, not MBRTOWC. + (trivial_case_ignore): Remove; this optimization is no longer needed. + All uses removed. + + grep: simplify memory allocation in kwset + * src/kwset.c: Include kwset.h first, to check its prereqs. + Include xalloc.h, for xmalloc. + (kwsalloc): Use xmalloc, not malloc, so that the caller need not + worry about memory allocation failure. + (kwsalloc, kwsincr, kwsprep): Do not worry about obstack_alloc + returning NULL, as that's not possible. + (kwsalloc, kwsincr, kwsprep, bmexec, cwexec, kwsexec, kwsfree): + Omit unnecessary conversion between struct kwset * and kwset_t. + (kwsincr, kwsprep): Return void since memory-allocation failure is + not possible now. All uses changed. + * src/kwset.h: Include <stddef.h>, for size_t, so that this + include file doesn't require other files to be included first. + + grep: minor cleanups for Galil speedups + * src/kwset.c: Update citations.
+ Include stdbool.h. + (kwsincr, kwsprep): Clarify by using C99 decls after statements. + (kwsprep): Clarify by using MIN. Avoid a couple of buffer copies + when !TRANS. + (bmexec): Use bool for boolean. Prefer "continue;" to ";". + +2014-04-07 Norihiro Tanaka <noritnk@kcn.ne.jp> + + grep: use the Galil rule for Boyer-Moore algorithm in KWSet + The Boyer-Moore algorithm is O(m*n), which means it may be much + slower than the DFA. Its Galil rule variant is O(n) and increases + efficiency in the typical case; it skips sections that are known + to match and does not compare more than once for a position in the text. + To use the Galil rule, look for the delta2 shift at each position + from the trie instead of the 'mind2' value. + * src/kwset.c (struct kwset): Replace member 'mind2' with 'shift'. + (kwsprep): Look for the delta2 shift. + (bmexec): Use it. + +2014-04-06 Paul Eggert <eggert@cs.ucla.edu> + + grep: cleanup DFA superset optimization + * src/dfa.c (dfa_charclass_index): New function, with body of + old dfa_charclass but with an extra parameter D. + (charclass_index): Reimplement in terms of dfa_charclass_index. + (dfahint): Clarify. + (dfasuperset): Do not assign to 'dfa' static variable. Instead, + use a local, and use the new dfa_charclass_index function. This + doesn't fix any bugs, but it's clearer. Initialize a few more + members, to simplify dfafree. Copy the charclasses with + just one memcpy call. Don't assign nonnull to D->superset until + it's known to be valid; that's simpler. + (dfafree, dfaalloc): Simplify based on dfasuperset initializations. + * src/dfa.h (dfahint): Add comment. + * src/dfasearch.c (EGexecute): Simplify use of memchr. + Simplify by using memrchr. Fix typo that could cause a buffer + read overrun. 
+ +2014-04-06 Norihiro Tanaka <noritnk@kcn.ne.jp> + + grep: optimization with the superset of DFA + The superset of a DFA is like the DFA, except that for speed + ANYCHAR, MBCSET and BACKREF are replaced by (CSET full bits) STAR, + and mb_cur_max is 1. For example, for 'a\(b\)c\1': + original: a b CAT c CAT BACKREF CAT + superset: a b CAT c CAT CSET STAR CAT (The CSET has all bits set.) + If a string matches a DFA, it matches the DFA's superset. + Using the superset to filter can dramatically improve performance, + over 200x in some cases. See <http://bugs.gnu.org/16966>. + * src/dfa.c (struct dfa): New member 'superset'. + (dfahint, dfasuperset): New functions. + (dfacomp): Create and analyze the superset. + (dfafree): Free only non-NULL items. + (dfaalloc): Initialize superset member. + (dfaoptimize): If the UTF-8 locale optimization succeeds, don't use + the superset. + * src/dfa.h (dfahint): New decl. + * src/dfasearch.c (EGexecute): Use dfahint. + +2014-04-06 Jim Meyering <meyering@fb.com> + + build: avoid OS X 10.8.5 build failure due to lack of static_assert + * bootstrap.conf (gnulib_modules): Add assert-h, to accommodate the + new use of static_assert on systems lacking support for that construct. + Without this change, compilation of dfa.c failed on OS X 10.8.5 with + gcc-4.9.0 20140324. We should be using gnulib's assert-h module, + regardless, for its nominal improved portability, since grep includes + assert.h and uses assert. + +2014-04-05 Norihiro Tanaka <noritnk@kcn.ne.jp> + + grep: fix performance bug with regex in line-by-line mode + * src/dfasearch.c (EGexecute): Match line-by-line with regex. + +2014-04-05 Paul Eggert <eggert@cs.ucla.edu> + + grep: minor improvements to previous patch + * src/dfa.c (MAX): New macro. + (match_anychar, match_mb_charset, transit_state_consume_1char): + Use it to simplify assignments. + (SKIP_REMAINS_MB_IF_INITIAL_STATE): Prefer != 0 for unsigned. + (free_mbdata): Omit an unnecessary 'free'.
+ +2014-04-05 Norihiro Tanaka <noritnk@kcn.ne.jp> + + grep: reuse multibyte DFA buffers in non-UTF8 locales + * src/dfa.c (struct dfa): New members 'mblen_buf', 'nmblen_buf', + 'inputwcs', 'ninputwcs', 'mb_follows' and 'mb_match_lens'. + (mblen_buf, inputwcs): Remove static vars. + (SKIP_REMAINS_MB_IF_INITIAL_STATE, match_anychar, match_mb_charset) + (transit_state_consume_1char, transit_state, prepare_wc_buf): + Use new members instead of global variables. + (check_matching_with_multibyte_ops): Use new members + instead of new allocation. + (dfaexec): Initialize new members. + (free_mbdata): Free new members. + +2014-04-05 Paul Eggert <eggert@penguin.cs.ucla.edu> + + grep: simplify dfa.c by having it not include mbsupport.h directly + * src/mbsupport.h: Remove. + * src/Makefile.am (noinst_HEADERS): Remove mbsupport.h. + * src/dfa.c, src/grep.c, src/search.h: Don't include mbsupport.h. + * src/dfa.c: Include wchar.h and wctype.h unconditionally, as + this simplifies the use of dfa.c in grep, and it does no harm + in gawk. + (setlocale, static_assert): Remove gawk-specific hacks, as + gawk now does these itself. + (struct dfa, dfambcache, mbs_to_wchar) + (is_valid_unibyte_character, setbit_wc, using_utf8, FETCH_WC) + (addtok_wc, add_utf8_anychar, atom, state_index, epsclosure) + (dfaanalyze, dfastate, prepare_wc_buf, dfaoptimize, dfafree, dfamust): + * src/dfasearch.c (EGexecute): + * src/grep.c (main): + * src/searchutils.c (mbtoupper): + Assume MBS_SUPPORT. + +2014-04-01 Norihiro Tanaka <noritnk@kcn.ne.jp> + + dfa: avoid rebuilding a previously built state + * src/dfa.c (dfaexec): Avoid rebuilding a previously built state. + +2014-03-28 Paul Eggert <eggert@cs.ucla.edu> + + dfa: improve port to freestanding DJGPP + Suggested by Aharon Robbins (Bug#17056). + * src/dfa.c (setlocale) [!LC_ALL]: Return NULL, not "C", + reverting part of a recent change. + (using_simple_locale): Return true if setlocale returns null.
+ +2014-03-28 Jim Meyering <meyering@fb.com> + + tests: placate "make syntax-check" re compare arg ordering + * tests/euc-mb: Reverse order of arguments to compare. + Be consistent in ordering compare arguments: expected followed + by actual. + +2014-03-28 Paul Eggert <eggert@cs.ucla.edu> + + dfa: avoid an indirection and port wint_t usage + * src/dfa.c (struct dfa): Put mbrtowc_cache directly into struct dfa + rather than having a pointer; this saves a malloc and an indirection. + All uses changed. + (dfambcache): Port to hosts where wint_t * can't be cast to wchar_t *. + +2014-03-28 Norihiro Tanaka <noritnk@kcn.ne.jp> + + grep: move mbrtowc_cache into a new member of struct dfa + When more than one struct dfa is used at the same time, their mbrtowc + caches may conflict. So, move mbrtowc_cache into a new member of + struct dfa, giving each DFA its own cache. + + * src/dfa.c (struct dfa): New member `mbrtowc_cache'. + (dfambcache): Rename from build_mbrtowc_cache. Add dependency on struct dfa. + (mbs_to_wchar): Add dependency on struct dfa. + (FETCH_WC): Use it. + (prepare_wc_buf): Use it. Add dependency on struct dfa. + (dfacomp): Call it. + (dfafree): Release it. + +2014-03-28 Paul Eggert <eggert@cs.ucla.edu> + + dfa: cache results of mbrtowc for speed + Idea suggested by Norihiro Tanaka in Bug#16842. + * src/dfa.c (mbrtowc_cache): New static var. + (build_mbrtowc_cache, mbs_to_wchar): New functions. + (FETCH_WC) [MBS_SUPPORT]: Speed up by using mbs_to_wchar + instead of mbrtowc and wctob. + (FETCH_WC) [!MBS_SUPPORT]: Rewrite in terms of old FETCH macro. + (FETCH): Remove; no longer used. + (lex): Simplify by avoiding the need for FETCH. + (prepare_wc_buf) [MBS_SUPPORT]: Speed up by using mbs_to_wchar. + Simplify the loop. + (dfacomp): Initialize the cache.
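The "cache results of mbrtowc for speed" entry above precomputes, for each byte value, whether that byte is a complete character by itself. Decoding can then skip mbrtowc in the common case. The sketch below illustrates the idea under stated assumptions:

- The names byte_to_wc, build_byte_cache, and fetch_wc are hypothetical.
- The encoding is assumed to be stateless.
- The cache must be rebuilt whenever the LC_CTYPE locale changes.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical cache: byte_to_wc[b] is the wide character for byte B
   when B is a complete character by itself, or WEOF otherwise.  */
static wint_t byte_to_wc[UCHAR_MAX + 1];

static void
build_byte_cache (void)
{
  for (int i = 0; i <= UCHAR_MAX; i++)
    {
      unsigned char uc = i;
      wchar_t wc;
      mbstate_t mbs = { 0 };
      /* 0 means a NUL byte and 1 a one-byte character; (size_t) -1 and
         (size_t) -2 mean the byte is not a character on its own.  */
      size_t len = mbrtowc (&wc, (char const *) &uc, 1, &mbs);
      byte_to_wc[i] = len <= 1 ? (wint_t) wc : WEOF;
    }
}

/* Decode the character at S[0..N-1] (N > 0) into *PWC and return its
   length in bytes.  Use the cache when possible and fall back on
   mbrtowc otherwise.  An encoding error yields WEOF and length 1.  */
static size_t
fetch_wc (wint_t *pwc, char const *s, size_t n, mbstate_t *mbs)
{
  wint_t cached = byte_to_wc[(unsigned char) s[0]];
  if (cached != WEOF)
    {
      *pwc = cached;
      return 1;
    }
  wchar_t wc;
  size_t len = mbrtowc (&wc, s, n, mbs);
  if (len == (size_t) -1 || len == (size_t) -2)
    {
      memset (mbs, 0, sizeof *mbs);   /* restart from the initial state */
      *pwc = WEOF;
      return 1;
    }
  *pwc = wc;
  return len ? len : 1;
}
```

Keeping the table inside each DFA object, as the later 2014-03-28 entries do, avoids conflicts when several DFAs are in use at once.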
+ +2014-03-27 Norihiro Tanaka <noritnk@kcn.ne.jp> + + grep: perform the kwset-helping DFA match in a narrower range + When kwsexec gives us the offset of a potential match, we compute + line begin/end and then run the DFA matcher to see if there really + is a match on that line. When the beginning of the line, BEG, is + not on a multibyte character boundary, advance BEG until it is on such + a boundary, before running the DFA search. + * src/dfasearch.c (EGexecute): As above. Add a comment. + * tests/euc-mb: Add a test case that exercises this code. + This addresses http://debbugs.gnu.org/17095. + +2014-03-26 Jim Meyering <meyering@fb.com> + + maint: fix "make dist" + * src/Makefile.am (egrep fgrep): Specify egrep.sh via + $(srcdir)/egrep.sh, so non-srcdir builds work once again. + +2014-03-26 Paul Eggert <eggert@penguin.cs.ucla.edu> + + dfa: improve port to freestanding DJGPP + * src/dfa.c (setlocale) [!LC_ALL]: Return "C", not NULL (Bug#17056). + (using_simple_locale): Store setlocale result in a ptr-to-const. + + egrep, fgrep: improve diagnostics from shell scripts + This should fix Bug#17098. + * src/Makefile.am (EXTRA_DIST): Add egrep.sh. + (egrep fgrep): Depend on egrep.sh and Makefile. + Build from new file egrep.sh, as this makes the build process + easier to follow. Arrange for $0 to look nicer in subgrep. + * src/egrep.sh: New file. + +2014-03-23 Paul Eggert <eggert@cs.ucla.edu> + + dfa: avoid undefined behavior + * src/dfa.c (FETCH_WC, addtok_wc): Don't rely on undefined behavior + when converting an out-of-range value to 'int'. + (FETCH_WC, prepare_wc_buf): Don't rely on conversion state after + mbrtowc returns a special value, as it's undefined for (size_t) -1. + (prepare_wc_buf): Simplify test for valid character. + + grep: fix and simplify grep -iF optimization + * src/grep.c (check_any_alphabets): Remove. + (fgrep_to_grep_pattern): Fix problems when mbrtowc returns -1 or -2. + Simplify a bit.
+ (main): Don't bother optimizing 'grep -iF PAT' when PAT contains no + alphabetics; it's so rare it's not worth the complexity. + +2014-03-23 Norihiro Tanaka <noritnk@kcn.ne.jp> + + grep: optimize fgrep by switching to the grep matcher + The fgrep matcher uses only the kwset engine, which is very slow for + case-insensitive matching in multibyte locales. + + So, if the matcher is fgrep, matching is case-insensitive, and the keys + include any alphabetic characters, switch to the grep matcher by + escaping the keys. Otherwise, if the keys include no alphabetic + characters, turn the match_icase flag off. + + I prepared the following input to measure the performance. + + yes $(printf '%078dm' 0)| head -1000000 | tr 0 a > in + A=`printf '\xef\xbc\xa1'` # FULLWIDTH LATIN CAPITAL LETTER A + + I ran three tests with this patch (best-of-5 trials): + + env LC_ALL=en_US.UTF-8 time -p src/fgrep -i "$A" in + real 8.54 user 7.13 sys 1.16 + + Back out that commit (temporarily), recompile, and rerun the experiment: + + env LC_ALL=en_US.UTF-8 time -p src/fgrep -i "$A" in + real 0.07 user 0.02 sys 0.05 + + * src/fgrep.c (Gcompile): New function. + * src/main.c (check_any_alphabets): New function. + (fgrep_to_grep_pattern): New function. + (main): Use them. + +2014-03-23 Paul Eggert <eggert@cs.ucla.edu> + + egrep, fgrep: go back to shell scripts + Although egrep's and fgrep's switch from shell scripts to + executables may have made sense in 2005, it complicated + maintenance and recently has caused subtle performance bugs. + Go back to the old way of doing things, as it's simpler and more + easily separated from the mainstream implementation. This should + be good enough nowadays, as POSIX has withdrawn egrep/fgrep and + portable applications should be using -E/-F anyway. + * po/POTFILES.in: Remove src/egrep.c, src/fgrep.c, src/main.c. + * src/Makefile.am (bin_PROGRAMS): Remove egrep, fgrep. + (bin_SCRIPTS): New macro.
+ (grep_SOURCES): Move searchutils.c, dfa.c, dfasearch.c, kwset.c, + kwsearch.c, pcresearch.c here from libgrep_a_SOURCES. + (egrep_SOURCES, fgrep_SOURCES, noinst_LIBRARIES, libgrep_a_SOURCES): + Remove. + (LDADD): Remove libgrep.a. + (egrep, fgrep): New rules. + (CLEANFILES): New macro. + * src/grep.c: Rename from src/main.c. + (usage, setmatcher, main): + Simplify, since there's now just one executable. + (Gcompile, Ecompile, Acompile, GAcompile, PAcompile, matchers): + Move here from the (removed) src/grep.c. + (compile_fp_t, execute_fp_t, struct matcher, matchers): + Move here from src/grep.h, as they no longer need to be public. + (struct matcher.name): Avoid one level of indirection/relocation. + (do_execute, main): Fix a performance bug when it was compiled + as 'fgrep', due to confusion about which matcher was which. + (main): Fix a performance bug with -P, likewise. + * src/grep.h (before_options, after_options): Remove. + * src/egrep.c, src/fgrep.c, src/grep.c: Remove. + + dfa: port to freestanding DJGPP (Bug#17056) + * src/dfa.c (setlocale) [!LC_ALL]: Define a dummy. + +2014-03-16 Jim Meyering <meyering@fb.com> + + tests: avoid false-positive failure on some AMD CPUs + * tests/mb-non-UTF8-performance: Avoid false-positive failure + when run on certain AMD processors. + +2014-03-10 Jim Meyering <meyering@fb.com> + + tests: make a performance-measuring test less system-sensitive + Andreas Schwab reported in http://debbugs.gnu.org/16941 + that this test would timeout and fail on m68k-suse-linux. + Rather than testing absolute duration with a limit tuned + to today's hardware, compare performance of grep with LC_ALL=C + against that same command using LC_ALL=ja_JP.eucJP. + * tests/init.cfg (require_hi_res_time_): New function. + * tests/mb-non-UTF8-performance: Rewrite to use it: + record absolute duration D of the first (normally much faster) + command, and set a timeout of 8*D for the command running in + an affected locale. 
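The last entry above replaces a fixed time limit with one scaled from a baseline measured on the same machine. The baseline D is the duration of the LC_ALL=C run, and the run in the affected locale must finish within 8*D. The C sketch below expresses only that decision rule and assumes durations in seconds. The helper names are hypothetical. The actual test, tests/mb-non-UTF8-performance, is a shell script that applies the limit as a timeout on the second command.

```c
#include <assert.h>
#include <stdbool.h>

/* How much slower than the LC_ALL=C baseline the run in the affected
   locale may be before the test counts it as a failure.  */
enum { SLOWDOWN_FACTOR = 8 };

/* Return the time limit derived from the baseline duration D.  */
static double
relative_timeout (double baseline_seconds)
{
  assert (0 <= baseline_seconds);
  return SLOWDOWN_FACTOR * baseline_seconds;
}

/* True if the slow-locale run finished within SLOWDOWN_FACTOR * D.  */
static bool
within_relative_limit (double baseline_seconds, double locale_seconds)
{
  return locale_seconds <= relative_timeout (baseline_seconds);
}
```

Because both measurements come from the same host, a slow machine such as the m68k system mentioned above raises the limit along with the baseline. That avoids a threshold tuned to today's hardware.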
+ +2014-03-09 Paul Eggert <eggert@cs.ucla.edu> + + maint: pacify 'make dist' + * src/dfa.c (parse_bracket_exp): Reindent with spaces. + * src/dfa.h (case_folded_counterparts): Prefix decl with 'extern'. + * src/main.c: Don't include assert.h. + +2014-03-07 Paul Eggert <eggert@cs.ucla.edu> + + fgrep: fix case-fold incompatibility with plain 'grep' + fgrep converted to lowercase, whereas the regex code converted + to uppercase. The resulting behaviors don't agree in offbeat + cases like Greek sigmas and Turkish Is. Fix this by changing + fgrep to agree with the regex code. + * src/kwsearch.c (Fcompile, Fexecute): + * src/searchutils.c (kwsinit, mbtoupper): + Convert to uppercase, not to lowercase, for compatibility with + plain 'grep'. + * src/search.h, src/searchutils.c (mbtoupper): + Rename from mbtolower, since it now converts to uppercase. + All uses changed. + * tests/case-fold-titlecase: Add tests for this. + + grep: fix case-fold mismatches between DFA and regex + The DFA code and the regex code didn't use the same semantics for + case-folding. The regex code says that the data char d matches + the pattern char p if uc (d) == uc (p). POSIX is unclear in this + area; the simplest fix for now is to change the DFA code to agree + with the regex code. See <http://bugs.gnu.org/16919>. + * src/dfa.c (static_assert): New macro, if not already defined. + (setbit_case_fold_c): Assume MB_CUR_MAX is 1 and that case_fold + is nonzero; all callers changed. + (setbit_case_fold_c, parse_bracket_exp, lex, atom): + Case-fold like the regex code does. + (lonesome_lower): New constant. + (case_folded_counterparts): New function. + (parse_bracket_exp): Prefer plain setbit when case-folding is + not needed. + * src/dfa.h (CASE_FOLDED_BUFSIZE): New constant. + (case_folded_counterparts): New function decl. + * src/main.c (trivial_case_ignore): Case-fold like the regex code does. + (main): Try to improve comment re trivial_case_ignore. 
+ * tests/case-fold-titlecase: Add lots more test cases. + +2014-03-06 Paul Eggert <eggert@cs.ucla.edu> + + build: update gnulib submodule to latest + + doc: do not overpromise --ignore-case's behavior + * NEWS: Omit vague statement about titlecase that could be + misinterpreted, and is more trouble than it's worth. + * doc/grep.texi: Add @documentencoding. Fix copyright range to + use an en dash, not a hyphen. + (Matching Control): Do not overpromise what --ignore-case will do. + Give examples of corner cases where the documentation does not + specify behavior. + +2014-03-05 Paul Eggert <eggert@cs.ucla.edu> + + maint: remove differences from gnulib regex code + These don't seem to be needed with GCC 4.8.2, and are making + maintenance harder. If we need to disable warnings with older + compilers, we can add pragmas to the gnulib versions. See + <http://bugs.gnu.org/16911#24>. + * gl/lib/regcomp.c.diff, gl/lib/regex_internal.c.diff: + * gl/lib/regex_internal.h.diff, gl/lib/regexec.c.diff: + Remove. + * cfg.mk (exclude_file_name_regexp--sc_prohibit_tab_based_indentation): + Don't mention gl/* files. + +2014-03-03 Paul Eggert <eggert@cs.ucla.edu> + + grep: fix comment + * src/main.c (trivial_case_ignore): Fix comment typo. + +2014-03-03 Norihiro Tanaka <noritnk@kcn.ne.jp> + + grep: avoid adding the same character to a bracket expression + * src/main.c (trivial_case_ignore): Add the uppercase or lowercase + counterpart to the new pattern only when it differs from the original + character. + +2014-03-02 Paul Eggert <eggert@cs.ucla.edu> + + grep: fix some unlikely bugs in trivial_case_ignore + * src/main.c (MBRTOWC, WCRTOMB): Reformat as per usual GNU style. + (trivial_case_ignore): Don't overrun buffer in the unusual case + when a character has both lowercase and uppercase counterparts. + Don't rely on undefined behavior when assigning out-of-range value + to an 'int'. Simplify by avoiding unnecessary buffer copies.
+ Work even with shift encodings, by using mbsinit to + disable the optimization if we are not in the initial state + when we replace B by [BCD]. + +2014-03-02 Norihiro Tanaka <noritnk@kcn.ne.jp> + + grep: revert removal of trivial_case_ignore + Revive trivial_case_ignore function in order to be able to use kwset. + + * src/main.c (MBRTOWC, WCRTOMB): New macros. + (trivial_case_ignore): New function. + (main): Use it. + +2014-03-02 Norihiro Tanaka <noritnk@kcn.ne.jp> + + grep: optimization of bracket expression for non-UTF8 locales + * src/dfa.c (addtok): Replace an MBCSET with a CSET even in + non-UTF8 locales, and even when it has individual characters. + +2014-03-01 Paul Eggert <eggert@cs.ucla.edu> + + doc: describe titlecase fix better + * NEWS: Document behavior on lowercase text too. + Suggested by Eric Blake in <http://bugs.gnu.org/16911#10>. + * doc/grep.texi (Matching Control): Specify behavior of -i + more precisely. + +2014-02-28 Paul Eggert <eggert@cs.ucla.edu> + + grep: minor tuning for mb_case_map_apply + * src/kwsearch.c (mb_case_map_apply): Avoid unnecessary widening of + size_t to intmax_t. Avoid unnecessary reinitialization of k. + + grep: avoid 'inline' when it doesn't matter + These days, compilers generally do just fine without advice from + users about 'inline', and there's little need for 'static inline', + just as there's little need for 'register'. + * src/dfa.c (to_uchar): + * src/dosbuf.c (guess_type, undossify_input, dossified_pos): + * src/main.c (undossify_input): + No longer inline. + * src/search.h (mb_case_map_apply): Move from here ... + * src/kwsearch.c (mb_case_map_apply): ... to here, and + make it no longer 'inline'. + + grep: fix bugs with -i and titlecase + * NEWS: Document this. + * src/dfa.c (setbit_wc): Simplify. + (setbit_c): Remove; no longer used. + (setbit_case_fold_c, parse_bracket_exp, atom): + Don't mishandle titlecase. For 'atom', this removes the need for + the refactoring of Bug#16729. 
+ (lex): Use the slower approach only for letters that have a + differing case. + * tests/case-fold-titlecase: New file. + * tests/Makefile.am (TESTS): Add it. + + grep: remove lint + * src/main.c (MBRTOWC, WCRTOMB): Remove no-longer-used macros. + +2014-02-28 Norihiro Tanaka <noritnk@kcn.ne.jp> + + grep: remove trivial_case_ignore + * src/main.c (trivial_case_ignore): Remove. + (main): Remove its use; this optimization is no longer needed. + + grep: don't match line-by-line for case-insensitive with grep and awk + * src/main.c (matcher): Move decl up. + (do_execute): With the grep or awk matchers, + no need to match line by line. + +2014-02-27 Jim Meyering <meyering@fb.com> + + maint: dfa: pass NULL, not 0, as 2nd arg to setlocale + * src/dfa.c (using_simple_locale): Use NULL, not 0. + +2014-02-27 Paul Eggert <eggert@cs.ucla.edu> + + * src/dfa.c (prednames): POSIX allows [[:xdigit:]] to match multibyte chars. + + * src/dfa.c (parse_bracket_exp): Parenthesize. + + grep: fix multiple bugs with bracket expressions + * NEWS: Document this. + * src/dfa.c (using_simple_locale): New function. + (parse_bracket_exp): Handle bracket expressions like [a-[.z.]] + correctly. Don't assume that dfaexec handles expressions like + [^a-z] correctly, as they can match multiple characters in some + locales. + * tests/posix-bracket: New file. + * tests/Makefile.am (TESTS): Add it. + +2014-02-25 Stephane Chazelas <stephane.chazelas@gmail.com> + + align grep -Pw with grep -w + For the -w option, with -P, we used to look for the pattern surrounded by + word boundaries. That's different from what grep -w does and what the + documentation describes. Now align with grep -w and the documentation by + using PCRE look-behind and look-ahead operators to match the pattern if + it is not surrounded by word constituents. + * src/pcresearch.c (Pcompile): Use (?<!\w)(?:...)(?!\w) rather than + \b(?:...)\b. + * NEWS (Bug fixes): Mention it. + * tests/pcre-w: New file. 
+ * tests/Makefile.am (TESTS): Add it. + This complements the fix for http://debbugs.gnu.org/16865 + +2014-02-24 Stephane Chazelas <stephane.chazelas@gmail.com> + + grep -P: fix it so backreferences now work with -w and -x + To implement -w and -x, we bracket the search term with parentheses. + However, that set of parentheses had the default semantics of + "capturing", i.e., creating a backreferenceable matched quantity. + Instead, use (?:...), to create a non-capturing group. + * src/pcresearch.c (Pcompile): Use (?:...) rather than (...). + * NEWS (Bug fixes): Mention it. + * tests/pcre-wx-backref: New file. + * tests/Makefile.am (TESTS): Add it. + This addresses http://debbugs.gnu.org/16865 + +2014-02-20 Jim Meyering <meyering@fb.com> + + maint: post-release administrivia + * NEWS: Add header line for next release. + * .prev-version: Record previous version. + * cfg.mk (old_NEWS_hash): Auto-update. + + version 2.18 + * NEWS: Record release date. + + tests: test for the non-UTF8 multi-byte performance regression + Test for the just-fixed performance regression. + With a 100-200x differential, it is reasonable to expect that + a very slow system will be able to complete the designated + task in a few seconds, while with the bug, even a very fast + system would exceed the timeout. + * tests/mb-non-UTF8-performance: New file. + * tests/Makefile.am (TESTS): Add it. + * tests/init.cfg (require_JP_EUC_locale_): New function. + + grep -i: avoid a performance regression in multibyte non-UTF8 locales + * src/main.c: Include dfa.h. + (trivial_case_ignore): Perform this optimization only for UTF8 locales. + This rectifies a 100-200x performance regression in non-UTF8 multi-byte + locales like ja_JP.eucJP. The regression was introduced by the 10x + UTF8/grep-i speedup, commit v2.16-4-g97318f5. + * NEWS (Bug fixes): Mention it. 
+ Reported by Norihiro Tanaka in http://debbugs.gnu.org/16232#50 + + maint: give dfa.c's using_utf8 function external scope + * src/dfa.c (using_utf8): Remove "static inline". + * src/dfa.h (using_utf8): Declare it. + * src/searchutils.c (is_mb_middle): Use using_utf8 rather than + rolling our own. + +2014-02-20 Paul Eggert <eggert@cs.ucla.edu> + + tests: test [^^-^] in unibyte locales + This is a bug in the current dfa.c, which was reintroduced by the + recent reversion from RRI. + * tests/unibyte-negated-circumflex: New file. + * tests/Makefile.am (TESTS): Add it. + * tests/init.cfg (require_unibyte_locale): New function. + + grep: fix bug with patterns like [^^-~] in unibyte locales + * NEWS: Document this. + * src/dfa.c (parse_bracket_exp): Escape patterns like [^^-~], or + Awk patterns like [\^-\]], so that they are not misinterpreted by + the system regex library. Check for system regex failure due to + memory exhaustion. + +2014-02-17 Jim Meyering <meyering@fb.com> + + maint: post-release administrivia + * NEWS: Add header line for next release. + * .prev-version: Record previous version. + * cfg.mk (old_NEWS_hash): Auto-update. + + version 2.17 + * NEWS: Record release date. + +2014-02-17 Paolo Bonzini <bonzini@gnu.org> + + revert "grep: DFA now uses rational ranges in unibyte locales" + The correct course of action for grep is to defer range interpretation + to regex, because otherwise you can get mismatches between regexes with + backreferences and those without. + + For example, [A-Z]. will use RRI but ([A-Z])\1 won't, with the confusing + result that the first regex won't match a superset of the language + described by the second regex. + + The source of the confusion is that, even though grep's dfa.c was changed + to use range checking instead of strcoll, that code is only invoked if + dfaexec is called with backref = NULL, and that never happens for grep! 
+ + In the end, all that's needed for RRI is compiling --with-included-regex, + and in that case the patch is almost a no-op. Almost, because there + are corner cases that aren't handled correctly (e.g. [a-[.e.]], or + regular expressions that include a NUL character), but this can be + handled separately. + + * NEWS: Revert paragraph introduced by commit v2.16-7-g1078b64. + * src/dfa.c (parse_bracket_exp): Revert back to regcomp/regexec. + +2014-02-16 Mike Frysinger <vapier@gentoo.org> + + maint: ignore configure.lineno + * .gitignore: Add configure.lineno. + +2014-02-11 Benno Schulenberg <bensberg@justemail.net> + + help: remove surplus newline + * src/main.c (usage): Remove inconsistent \n introduced by previous + patch. + +2014-02-10 Benno Schulenberg <bensberg@justemail.net> + + help: fix a line ending, and use the same word for similar things + * src/main.c (usage): Change a stray 'n' to a newline, and use + the word "display" for showing version info as for help text. + +2014-02-09 Norihiro Tanaka <noritnk@kcn.ne.jp> + + speed up mb-boundary-detection after each preliminary match + After each kwsexec or dfaexec match, we must determine whether + the tentative match falls in the middle of a multi-byte character. + That is what our is_mb_middle function does, but it was expensive, + even when most input consisted of single-byte characters. The main + cost was for each call to mbrlen. This change constructs and uses + a cache of the lengths returned by mbrlen for unibyte values. + The largest speed-up (3x to 7x, CPU-dependent) is when most + lines contain a match, yet few are printed, e.g., when using + grep -v common-pattern ... to filter out all but a few lines. + + * src/search.h (build_mbclen_cache): Declare it. + * src/main.c: Include "search.h". + [MBS_SUPPORT] (main): Call build_mbclen_cache in a multibyte locale. + * src/searchutils.c [HAVE_LANGINFO_CODESET]: Include <langinfo.h>. + (mbclen_cache): New global. + (build_mbclen_cache): New function. 
+ (is_mb_middle) [HAVE_LANGINFO_CODESET]: Use it. + * NEWS (Improvements): Mention it. + +2014-02-01 Jim Meyering <meyering@fb.com> + + maint: use to_uchar function rather than explicit casts + * src/system.h (to_uchar): Define function. + * src/kwsearch.c (Fexecute): Use to_uchar twice in place of casts. + * src/dfasearch.c (EGexecute): Likewise. + * src/main.c (prepend_args): Likewise. + * src/kwset.c (U): Define in terms of to_uchar. + * src/dfa.c (match_mb_charset): Use to_uchar, not an explicit cast. + +2014-01-27 Jim Meyering <meyering@fb.com> + + maint: remove vestiges of support for long-disabled --mmap option + This option was disabled in March of 2010, and began to elicit a + warning in January of 2012. Its time has come. + * doc/grep.in.1: Remove mention. + * doc/grep.texi: Likewise. + * src/main.c (GROUP_SEPARATOR_OPTION, usage, MMAP_OPTION) + (long_options, main): Remove all traces. + * tests/Makefile.am (check_PROGRAMS): Remove mention of ignore-mmap. + * tests/ignore-mmap: Remove file. + * NEWS (Maintenance): Mention it. + +2014-01-26 Jim Meyering <meyering@fb.com> + + maint: move two local variable declarations + * src/dfasearch.c (kwsmusts): Move one declaration down to the point + of definition. Move another into the sole scope where it is used. + +2014-01-26 Norihiro Tanaka <noritnk@kcn.ne.jp> + + dfasearch: skip kwset optimization when multi-byte+case-insensitive + Now that DFA searching works with multi-byte locales, the only remaining + reason to case-convert the searched input is the kwset optimization. + But multi-byte case-conversion is so expensive that it's not + worthwhile even to attempt that optimization. + + * src/dfasearch.c (kwsmusts): Skip this function in ignore-case mode + when the locale is multi-byte. + (EGexecute): Now that this code need not handle multi-byte case-ignoring + matches, remove the expensive copy/case-conversion code. 
+ With no case-converted buffer, there is no longer any need to call + mb_case_map_apply, so remove it and associated code. + (kwsincr_case): Remove function. Now, every use of this function + is equivalent to a use of kwsincr. Replace all uses. + * tests/turkish-eyes: Test all of -E, -F and -G. + +2014-01-25 Norihiro Tanaka <noritnk@kcn.ne.jp> + + dfa: remove GREP-ifdef'd code in favor of code used by gawk + For many years, gawk and grep have used different #ifdef'd bits of + code relating to how the DFA matcher matches multibyte characters. + Remove the GREP-specific code in favor of the code gawk uses. This + permits us to avoid still more cases in which grep must resort to + the expensive process of copying/case-converting each input line + before matching against a case-converted regexp. + * src/dfa.c (parse_bracket_exp, atom): As above. + +2014-01-25 Jim Meyering <meyering@fb.com> + + gnulib: update to latest + +2014-01-17 Paul Eggert <eggert@cs.ucla.edu> + + grep: DFA now uses rational ranges in unibyte locales + Problem reported by Aharon Robbins in <http://bugs.gnu.org/16481>. + * NEWS: + * doc/grep.texi (Environment Variables) + (Character Classes and Bracket Expressions): + Document this. + * src/dfa.c (parse_bracket_exp): Treat unibyte locales like multibyte. + +2014-01-17 Aharon Robbins <arnold@skeeve.com> + + grep: add undocumented '-X gawk' and '-X posixawk' options + See <http://bugs.gnu.org/16481>. + * src/grep.c (GAcompile, PAcompile): New functions. + (const): Use them. + +2014-01-10 Pádraig Brady <P@draigBrady.com> + + tests: remove superfluous uses of printf + * tests/turkish-eyes: Remove unnecessary uses of printf. 
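The 2014-01-17 entry above makes the DFA use Rational Range Interpretation (RRI) in unibyte locales; the 2014-02-17 entry later reverts it. Under RRI, a range such as [a-z] means exactly the bytes whose numeric codes lie between the endpoints, regardless of the locale's collation order. The sketch below assumes a 256-bit character class stored in `unsigned int` words and sets bits with `1U` shifts, in the spirit of the 2013-11-21 "1 << 31" fix. All names are hypothetical, and the code does not reproduce dfa.c.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

#define CHARCLASS_WORD_BITS (sizeof (unsigned int) * CHAR_BIT)
#define CHARCLASS_WORDS \
  ((UCHAR_MAX + 1 + CHARCLASS_WORD_BITS - 1) / CHARCLASS_WORD_BITS)

/* One bit per byte value; unsigned words so shifts never hit a sign bit.  */
typedef unsigned int charclass[CHARCLASS_WORDS];

static void
clear_charclass (charclass c)
{
  memset (c, 0, sizeof (charclass));
}

static void
setbit_sketch (unsigned char b, charclass c)
{
  c[b / CHARCLASS_WORD_BITS] |= 1U << (b % CHARCLASS_WORD_BITS);
}

/* Shift the word right and mask with 1, rather than shifting 1 left.  */
static bool
tstbit_sketch (unsigned char b, charclass const c)
{
  return (c[b / CHARCLASS_WORD_BITS] >> (b % CHARCLASS_WORD_BITS)) & 1;
}

/* RRI: [LO-HI] is every byte value from LO through HI inclusive.
   Reversed ranges are assumed to be rejected before this point.  */
static void
add_rational_range (unsigned char lo, unsigned char hi, charclass c)
{
  assert (lo <= hi);
  for (unsigned int b = lo; b <= hi; b++)
    setbit_sketch ((unsigned char) b, c);
}
```

With RRI, [a-z] matches '{' never and 'A' never, even in locales whose collation interleaves upper and lower case. The revert entry's point is that grep's backreference path goes through regex instead, so the two paths could disagree.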
+ +2014-01-09 Jim Meyering <meyering@fb.com> + + grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales + These days, nearly everyone uses a multibyte locale, and grep is often + used with the --ignore-case (-i) option, but that option imposes a very + high cost in order to handle some unusual cases in just a few multibyte + locales. This change gets most of the performance of using LC_ALL=C + without eliminating the ability to search for multibyte strings. + + With the following example, I see an 11x speed-up with a 2.3GHz i7: + Generate a 10M-line file, with each line consisting of 40 'j's: + + yes jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj | head -10000000 > k + + Time searching it for the simple/nonexistent string "foobar", + first with this patch (best-of-5 trials): + + LC_ALL=en_US.UTF-8 env time src/grep -i foobar k + 1.10 real 1.03 user 0.07 sys + + Back out that commit (temporarily), recompile, and rerun the experiment: + + git log -1 -p|patch -R -p1; make + LC_ALL=en_US.UTF-8 env time src/grep -i foobar k + 12.50 real 12.41 user 0.08 sys + + The trick is to realize that for some search strings, it is easy + to convert to an equivalent one that is handled much more efficiently. + E.g., convert this command: + + grep -i foobar k + + to this: + + grep '[fF][oO][oO][bB][aA][rR]' k + + That allows the matcher to search in buffer mode, rather than having to + extract/case-convert/search each line separately. Currently, we perform + this conversion only when search strings contain neither '\' nor '['. + See the comments for more detail. + + * src/main.c (trivial_case_ignore): New function. + (main): When possible, transform the regexp so we can drop the -i. + * tests/turkish-eyes: New file. + * tests/Makefile.am (TESTS): Use it. + * NEWS (Improvements): Mention it. + +2014-01-07 Paul Eggert <eggert@cs.ucla.edu> + + tests: port Solaris 10 /bin/sh patch back to GNU/Linux + Problem reported by Jim Meyering.
+ * tests/bre, tests/ere, tests/spencer1-locale: + Prefer re_shell, not re_shell_. + * tests/init.sh (re_shell): New var, which is exported instead of + re_shell_. + + Port to Solaris 10 /bin/sh. + Problem reported by Dagobert Michelsen in <http://bugs.gnu.org/16380>. + * tests/bre, tests/ere, tests/spencer1-locale: + Prefer re_shell_ to SHELL, if re_shell_ is set. + * tests/init.sh (re_shell_): Export if it's used. + +2014-01-01 Jim Meyering <meyering@fb.com> + + maint: post-release administrivia + * NEWS: Add header line for next release. + * .prev-version: Record previous version. + * cfg.mk (old_NEWS_hash): Auto-update. + + version 2.16 + * NEWS: Record release date. + + gnulib: update to latest, for maint.mk fix + + maint: update copyright dates for 2014 + Do that by running "make update-copyright". + + gnulib: update to latest + +2013-12-31 Jim Meyering <meyering@fb.com> + + pcre: use PCRE_NO_UTF8_CHECK properly + In order to obtain the behavior we want, i.e., to disable + error-on-invalid-UTF-in-input, apply this PCRE option in + pcre_exec, not when compiling. + * src/pcresearch.c (Pexecute): Use PCRE_NO_UTF8_CHECK here, ... + (Pcompile): ...rather than here. + * tests/pcre-invalid-utf8-input: Adjust test case to test for this. + +2013-12-26 Jim Meyering <meyering@fb.com> + + maint: fix inconsistent spacing in expression + * src/main.c (prline): Fix inconsistent spacing in expression: + s/ / /. + +2013-12-26 behoffski <behoffski@grouse.com.au> + + maint: fix a garbled comment + * src/dfa.c (XNMALLOC, etc.): Fix garbled comment wording. + +2013-12-23 Jim Meyering <meyering@fb.com> + + maint: fix/improve a comment + * src/main.c (prline): Replace untrue FIXME comment with one + telling how the hard-to-reach code can be exercised. + +2013-12-21 Santiago Ruano Rincón <santiago@debian.org> + + pcre: tell grep -P to relax its stance on invalid multibyte chars + Do not exit-2 for invalid UTF-8 characters. 
Just prior to this + change, this command would match no lines and fail like this: + $ printf 'j\x82\nj\n'|LC_ALL=en_US.UTF-8 grep -P j|cat -A; echo $? + grep: invalid UTF-8 byte sequence in input + 2 + After this change, the same command matches both lines, and succeeds: + jM-^B$ + j$ + 0 + * src/pcresearch.c (Pcompile): Use PCRE_NO_UTF8_CHECK, too, and + add a comment. + * tests/pcre-utf8: Add a test and a comment. + This change did not work with Debian unstable pcre-8.31-2 + or with some 8.33 and 8.34-based versions, but does work with + Fedora 20's 8.33 and with a built-from-latest source library. + Based on a patch by Santiago Ruano Rincón. + See http://bugs.gnu.org/15758/ + +2013-12-21 Jim Meyering <meyering@fb.com> + + tests: avoid false-positive failure due to exhausted memory + * tests/long-line-vs-2GiB-read: Don't declare the test "failed" + when running out of memory. In that case, skip it. + +2013-12-18 Jim Meyering <meyering@fb.com> + + maint: add comments and split some long lines + * src/main.c (do_execute): Add a comment. + Split some lines longer than 80 bytes. + + pcre: avoid a nominal leak + * src/pcresearch.c (Pcompile)[HAVE_LIBPCRE && !PCRE_STUDY_JIT_COMPILE]: + We would leak "re" if built with HAVE_LIBPCRE but without + PCRE_STUDY_JIT_COMPILE. Move the free out one level. + + maint: indent cpp directives to reflect nesting + * src/pcresearch.c: Insert spaces after a few "#", to indent + cpp directives to reflect their nesting. + + grep: handle lines longer than INT_MAX on more systems + When trying to exercise some long-line-handling code, I ran these + commands: + $ dd bs=1 seek=2G of=big < /dev/null; grep -l x big; echo $? + grep: big: Invalid argument + 2 + grep should not have issued that diagnostic, and it should + have exited with status 1, not 2. What happened? + grep read the 2GiB of NULs, doubled its buffer size, + copied the 2GiB into the new 4GiB buffer, and proceeded + to call "read" with a byte-count argument of 2^32.
On at least Darwin 12.5.0, that makes read fail with EINVAL. + The solution is to use gnulib's safe_read wrapper. + * src/main.c: Include "safe-read.h". + (fillbuf): Use safe_read, rather than bare read. The latter + cannot handle a read size of 2^32 on some systems. + * bootstrap.conf (gnulib_modules): Add safe-read. + * tests/long-line-vs-2GiB-read: New file. + * tests/Makefile.am (TESTS): Add it. + * NEWS (Bug fixes): Mention it. + +2013-11-25 Jim Meyering <meyering@fb.com> + + tests: port to non-GNU sed + * tests/multibyte-white-space (utf8_space_characters): The generation + of test inputs relied on GNU sed's interpretation of \<, but that is + not portable, and caused spurious test failures. Adjust the sed regexp + to work on all versions. + Reported by Karl Dubost in http://bugs.gnu.org/15953. + +2013-11-22 Jim Meyering <meyering@fb.com> + + maint: minor cleanup: xmalloc+strcpy -> xmemdup + * src/main.c (main): Replace an xmalloc+strcpy combination + with an equivalent use of xmemdup. + +2013-11-21 Jim Meyering <meyering@fb.com> + Paul Eggert <eggert@cs.ucla.edu> + + dfa: avoid undefined behavior of "1 << 31" + * src/dfa.c (charclass): Change type from "int" to "unsigned int". + (tstbit): Rather than shifting "1" left to form a mask, shift the + LHS bits to the right and use "1" as the mask. Also, return bool, rather + than "int". + (setbit, clrbit, dfastate): Don't shift "1" (aka (int)1) left by 31 bits. + Instead, use "1U" as the operand, to avoid undefined behavior. + Spotted by gcc's new -fsanitize=undefined. + +2013-11-02 Jim Meyering <meyering@fb.com> + + grep: fix regression with -P vs. invalid UTF-8 input + * src/pcresearch.c (Pexecute): Don't abort upon unexpected + PCRE-specific error code. Explicitly handle PCRE_ERROR_BADUTF8, + and change the default to print a diagnostic including the unhandled + integer PCRE error code and exit with status 2. + * tests/pcre-invalid-utf8-input: New file. + * tests/Makefile.am (TESTS): Add it.
+ * NEWS (Bug fixes): Mention it. + * THANKS: Update. + Reported by Dave Reisner in http://bugs.gnu.org/15758. + + grep: fix regression involving \s and \S + Commit v2.14-40-g01ec90b made \s and \S work with multi-byte + characters, but it caused any use like \s*, \s+, \s?, or \s{3} + to malfunction in a multi-byte locale. + * src/dfa.c (lex): Also reset laststart. + * tests/backslash-s-and-repetition-operators: New file. + * tests/Makefile.am (TESTS): Add it. + * NEWS (Bug fixes): Mention it. + * THANKS: Update. + Reported by Mirraz Mirraz in http://bugs.gnu.org/15773. + +2013-11-01 Jim Meyering <meyering@fb.com> + + maint: NEWS: document a release-related bug fix + * NEWS (Bug fixes): Add an entry for a fix pulled from gnulib. + +2013-10-26 Jim Meyering <meyering@fb.com> + + build: update gnulib submodule to latest + This pulls in a gnulib fix for maint.mk that ensures the procedure + described in README-release actually does what we want. Before this + change, that procedure resulted in a grep-2.15 tarball that would + lead to a grep binary whose --version-reported version number was + 2.14.51... rather than the expected 2.15. + + maint: avoid automake deprecation warning re ACLOCAL_AMFLAGS + * Makefile.am (ACLOCAL_AMFLAGS): Don't use this deprecated variable. + * configure.ac (AC_CONFIG_MACRO_DIRS): Use this instead. + (AUTOMAKE_OPTIONS): Require automake-1.12. + + maint: post-release administrivia + * NEWS: Add header line for next release. + * .prev-version: Record previous version. + * cfg.mk (old_NEWS_hash): Auto-update. + + version 2.15 + * NEWS: Record release date. + +2013-10-25 Paul Eggert <eggert@cs.ucla.edu> + + build: port to AIX + Problem reported by Pavel Kharitonov in <http://bugs.gnu.org/15690#68>. + * src/Makefile.am (LDADD): Add $(LIBTHREAD). + + build: avoid duplicate -funit-at-a-time etc.
options + * configure.ac (WERROR_CFLAGS): Don't add -fdiagnostics-show-option + and -funit-at-a-time, as Gnulib does that for us now, and we're + merely piling on duplicates. + +2013-10-24 Jim Meyering <meyering@fb.com> + + tests: port more tests to bourne shells with hex-challenged printf + * tests/pcre-utf8: Convert the hex \xHH literals for the euro symbol + to octal \OOO. + * tests/turkish-I: Likewise for "I with dot". + * tests/turkish-I-without-dot: Likewise for another Turkish I: U+0131. + + maint: clean up an ugly 'while' condition + * src/main.c (get_nondigit_option): Separate a slightly baroque + "while" expression into two separate statements, both inside the loop. + +2013-10-23 Jim Meyering <meyering@fb.com> + + tests: port to bourne shells whose printf doesn't grok hex + Use octal escapes, not hex, in printf(1) format strings, + and in one case, use $AWK's printf so we can continue + to use the table of hex values. + * tests/char-class-multibyte: Use printf octal escapes, not hex, + for portability to shells like dash and Solaris 10's /bin/sh. + * tests/backslash-s-vs-invalid-multitype: Likewise. + * tests/surrogate-pair: Likewise. + * tests/unibyte-bracket-expr: Count in decimal and convert to octal. + * tests/multibyte-white-space (hex_printf): New function. + Use it in place of printf so we can retain the table of hex digits + without hitting the limitation of some bourne shells. + Reported by Paul Eggert in http://bugs.gnu.org/15690#11 + +2013-10-21 Jim Meyering <meyering@fb.com> + + gnulib: update to latest + + maint: remove now-unused wcscoll module + * bootstrap.conf (gnulib_modules): Remove wcscoll; no longer used. + +2013-10-20 Paul Eggert <eggert@cs.ucla.edu> + + build: avoid chatter from Automake 1.14 + * configure.ac (AM_INIT_AUTOMAKE): Add subdir-objects.
+ + build: port shell pattern to Solaris 10 + * configure.ac: Don't use unquoted '^' in a pattern, as this + breaks 'configure' on Solaris 10, whose /bin/sh complains about it, + which causes 'configure' to exit even before it finds a decent shell. + Unix 7th edition shell accepted '^' as an alias for '|'. + + build: port to platforms that predefine _FORTIFY_SOURCE + Problem reported by Brenton Hoff (Bug#15663). + * configure.ac (_FORTIFY_SOURCE): Don't define if already defined. + This is what Emacs does. + +2013-10-20 Jim Meyering <meyering@fb.com> + + build: update gnulib submodule to latest + +2013-10-19 Jim Meyering <meyering@fb.com> + + tests: extend the multibyte-white-space test + * tests/multibyte-white-space (utf8_space_characters): Add more + single-byte whitespace characters. Align RHS hex values and + make the sed substitution less rigid, to accommodate. + Also, ensure that grep '\S' exits with status 1. + + maint: update bootstrap to latest from gnulib + * bootstrap: Update from gnulib. + + maint: fix typo in NEWS + * NEWS: Fix/improve example commands in most recent entry. + The LC_ALL envvar setting goes before grep, not before printf. + Don't reference src/ in the second example command, and do specify + the locale. + +2013-10-09 Jim Meyering <meyering@fb.com> + + tests: add a test for better coverage of some tricky code + * tests/spencer1.tests: Add a non-range bracket expression representing the + same regexp, to cover the alternate code path, the one that does not require + a regcomp/exec call to interpret the regexp. + +2013-10-01 Jim Meyering <meyering@fb.com> + + tests: ensure neither \s nor \S matches an invalid multibyte character + * tests/backslash-S-vs-invalid-multitype: New file. + Prompted by the bug report from Roman at + http://savannah.gnu.org/bugs/?40009 + * tests/Makefile.am (TESTS): Add it. + + dfa: fix \s and \S to work for multibyte + * src/dfa.c (lex): In multibyte mode, we can't treat \s and \S as we do + in single-byte mode. 
Map them to [[:space:]] and [^[:space:]] respectively, + to make the DFA matcher use the regex-matcher for this term. + * tests/multibyte-white-space: New file. Test for the bug. + * tests/Makefile.am (TESTS): Add it. + This bug was introduced with the addition of DFA support + for \s and \S in commit v2.5.4-112-gf979ca0. + +2013-09-30 Jim Meyering <meyering@fb.com> + + maint: change all references: s/POSIX\.2/POSIX/ + There is no longer any point in referring to POSIX.N. + POSIX is sufficient. + * doc/grep.in.1: As above. + * src/main.c (main): Likewise. + * tests/file: Likewise. + * tests/options: Likewise. + * ChangeLog: Likewise. + * NEWS: Likewise. + * cfg.mk: Update, to match changed NEWS. + Inspired by Glenn Golden's suggestion in http://bugs.gnu.org/15486 + +2013-09-22 Jim Meyering <meyering@fb.com> + + dfa: remove dead disjunct + * src/dfa.c (parse_bracket_exp): Remove dead disjunct. + At that point, we know MB_CUR_MAX <= 1, so the test, + MB_CUR_MAX > 1 && ... is always false. Remove the disjunct. + + maint: dfa: improve comments and formatting + * src/dfa.c (add_utf8_anychar): Correct wording/alignment of a comment. + (dfaexec): Add curly braces around multi-line while statement within + a "then" block. + (ANYCHAR): Clarify comment: "." does not match an invalid UTF8 character. + (parse_bracket_exp): Improve comment. + +2013-09-08 Jim Meyering <meyering@fb.com> + + dfa: appease a static analyzer, and save 95 stack bytes + * src/dfa.c (MAX_BRACKET_STRING_LEN): Rename from BRACKET_BUFFER_SIZE + and decrease from 128 to 32. + (parse_bracket_exp): Add one byte more than MAX_BRACKET_STRING_LEN + to the length of the "str" buffer, to avoid the appearance that we may store + the trailing NUL beyond the end of the buffer. A string of length 32 + or greater is rejected by earlier processing, so would never reach + this code.
Addresses http://bugs.gnu.org/15307 + +2013-09-01 Corinna Vinschen <vinschen@redhat.com> + + fix Cygwin UTF-16 surrogate-pair handling with -i + grep -i would segfault on systems using UTF-16-based wchar_t (Cygwin) + when converting an input string containing certain 4-byte UTF-8 + sequences to lower case. The conversions to wchar_t and back to + a UTF-8 multibyte string did not take surrogate pairs into account. + * src/searchutils.c (mbtolower) [__CYGWIN__]: Detect and handle + surrogate pairs when converting. + * NEWS (Bug fixes): Mention it. + * tests/surrogate-pair: New test. + * tests/Makefile.am (TESTS): Add it. + Reported by: Jim Burwell + +2013-08-19 Paul Eggert <eggert@cs.ucla.edu> + + doc: mention how to use the latest gnulib + * README-hacking: Steal some text from coreutils/README-hacking. + +2013-08-10 Jim Meyering <meyering@fb.com> + + build: update gnulib-related code + * gnulib: Update submodule to latest. + * bootstrap: Update from gnulib. + * gl/lib/regex_internal.h.diff: Update to reflect gnulib changes. + * bootstrap.conf: Partial sync from coreutils. + +2013-08-09 Jim Meyering <meyering@fb.com> + + tests: simplify and factor newest test + * tests/char-class-multibyte2: Simplify file names. + Factor out $e_acute, so that the grep argument representation + is ASCII (though the value is still UTF-8). + + doc: NEWS: mention the DFA segfault fix + * NEWS (Bug fixes): List the DFA segfault fix. + +2013-07-05 Paul Eggert <eggert@cs.ucla.edu> + + Redo comments and white space to better approach GNU style. + +2013-07-05 Paolo Bonzini <bonzini@gnu.org> + + tests: add testcase for previous change + * tests/Makefile.am (TESTS): Add char-class-multibyte2. + * tests/char-class-multibyte2: New file. + +2013-07-05 Mike Haertel <mike@ducky.net> + + dfa: fix multibyte character in brackets with repetition + Let FOO stand for any multibyte character (e.g. a CJK character) in the regexp.
+ It turns out the following much simpler regexp: + ([^.]*[FOO]){1,2} + is sufficient to cause the crash. + + In the first step of its parsing, DFA transforms the regexp from human + readable syntax into reverse-polish form. For regexps with repeat counts + of the form a{m,n}, it simply builds repeated copies of the representation + of a, with appropriate inserted CAT and QMARK operators. For the above + example, with a regexp of the form a{1,2}, it would build: + + <RPN representation for a> + <RPN representation for a> + QMARK + CAT + + When building repeated copies of RPN representations, additional + copies of the RPN representations are made by calling a function + copytoks() with arguments consisting of the start position and + length of the original copy. + + The problem is that the current code for copytoks() is simply + incorrect. It operates by calling addtok() for each individual + token in the source range being copied. But, in the particular + case that the token being added is MBCSET, addtok(): + + (1) incorrectly assumes that the character set being added + is the most recent one (addtok has no argument to indicate which cset is + being added, so it just uses the latest one) + + (2) attempts to do some token sequence expansion into more primitive + operators so things like [FOO] are matched efficiently. + + Both of these assumptions are incorrect in the case that addtok() + is being called from copytoks(): (1) is simply not true, and + (2) is redundant--the expansion has already been done on the token sequence + being copied, so there is no need to do the expansion again. + + The correct function to add exactly one token, without further expansion, + is addtok_mb(). So here is my proposed fix, which is that copytoks() + should never call addtok(), but instead directly call addtok_mb() + (which is what addtok() eventually calls). + + * src/dfa.c (copytoks): Rewrite using addtok_mb directly.
+ +2013-05-28 Jim Meyering <meyering@fb.com> + + maint: align backslashes consistently + * tests/Makefile.am: Most backslashes were aligned with TABs, + so adjust the few that used spaces to conform. + + grep -F: avoid an infinite loop with invalid multi-byte search string + * src/kwsearch.c (Fexecute): Avoid an infinite loop when processing + a fixed (-F) multibyte search string that is an invalid byte sequence + in the current locale and that matches the bytes of the input twice + on a line. Reported by Daisuke GOTO in + http://thread.gmane.org/gmane.comp.gnu.grep.bugs/4773 + * tests/invalid-multibyte-infloop: New test. + * tests/Makefile.am (TESTS): Add it. + * NEWS (Bug fixes): Mention it. + +2013-04-18 Paul Eggert <eggert@cs.ucla.edu> + + * cfg.mk (old_NEWS_hash): Update. + + doc: document EREs like a{,10} + Problem reported by Eric Blake in + <http://lists.gnu.org/archive/html/bug-grep/2013-04/msg00005.html>. + * NEWS: Document the bug fix. + * doc/grep.in.1: Restore documentation for this feature, but mention + that it is a GNU extension. + * doc/grep.texi (Fundamental Structure): Mention that this feature + is a GNU extension. + +2013-04-02 Paul Eggert <eggert@cs.ucla.edu> + + build: make dfa.c closer to Gawk's + * src/dfa.c: Include <stddef.h>, not <sys/types.h>. + stddef.h is smaller and is all we need and is portable nowadays. + Include <wchar.h> and <wctype.h> only if MBS_SUPPORT. + +2013-01-15 Paul Eggert <eggert@cs.ucla.edu> + + grep: make dfa.h standalone + Problem reported by Aharon Robbins in + <http://lists.gnu.org/archive/html/bug-grep/2013-01/msg00007.html>. + * src/dfa.c: Include dfa.h first, so that it's tested standalone. + No need to include <regex.h>, since we are in charge of dfa.h and + know that it includes <regex.h>. + * src/dfa.h: Include <regex.h> and <stddef.h>, so that it's standalone. 
+ +2013-01-11 Stefano Lattarini <stefano.lattarini@gmail.com> + + build: update gettext version to 0.18.2 + * configure.ac (AM_GNU_GETTEXT_VERSION): Update to 0.18.2. + This is necessary to make the gettext-provided m4 files use + AC_PROG_MKDIR_P rather than AM_PROG_MKDIR_P. The latter macro, + planned to disappear in Automake 1.14, has already been removed + in the development version of Automake, so, without this + change, grep fails to bootstrap with bleeding-edge Automake. + +2013-01-11 Paul Eggert <eggert@cs.ucla.edu> + + build: update gnulib submodule to latest + +2013-01-11 Stefano Lattarini <stefano.lattarini@gmail.com> + + build: remove redundant use of $(INCLUDES) + * lib/Makefile.am (INCLUDES): Remove. Automake automatically adds + $(srcdir) and $(top_builddir) to the C preprocessor search path. + INCLUDES is deprecated in Automake 1.13 (causing a runtime + warning), and will be removed in Automake 1.14. + +2013-01-04 Jim Meyering <jim@meyering.net> + + build: update gnulib submodule to latest + + maint: update all copyright year number ranges + Run "make update-copyright". + +2012-11-20 Paul Eggert <eggert@cs.ucla.edu> + + grep: normalize diagnostics + * src/pcresearch.c (Pcompile): Use similar format diagnostics + as elsewhere, and translate them. + +2012-11-19 Paul Eggert <eggert@cs.ucla.edu> + + grep: diagnose read errors from -f dir, porting to Solaris + Problem reported by Dennis Clarke for Solaris 10 in + <http://lists.gnu.org/archive/html/bug-grep/2012-11/msg00009.html>. + * src/main.c (main): For -f F, diagnose any read errors + encountered when reading F. + * tests/Makefile.am (XFAIL_TESTS): Remove grep-dir. + * tests/grep-dir: Don't assume that directories cannot be read + via fread, as POSIX allows this and it can happen on Solaris. + +2012-11-09 Paolo Bonzini <bonzini@gnu.org> + + pcre: add PCRE-JIT support for grep + * NEWS: Document new feature. + * src/pcresearch.c [PCRE_STUDY_JIT_COMPILE] (jit_stack): New.
+ [PCRE_STUDY_JIT_COMPILE] (Pcompile): JIT-compile the regular expression + and allocate a stack for it. Based on a patch from Zoltan Herczeg. + * THANKS: Add Zoltan to the list. + +2012-10-24 Paul Eggert <eggert@cs.ucla.edu> + + build: go back to AC_PROG_CC + * configure.ac: Go back to using AC_PROG_CC rather than AC_PROG_CC_STDC, + as the latter is obsolescent and the Autoconf bug involving the former + has been fixed. + +2012-10-24 Jim Meyering <jim@meyering.net> + + build: use AC_PROG_CC_STDC rather than AC_PROG_CC + * configure.ac: Use AC_PROG_CC_STDC rather than AC_PROG_CC, + to accommodate autoconf-2.69-37+. + + build: update gnulib submodule to latest + +2012-10-23 Eric Blake <eblake@redhat.com> + + build: default to --enable-gcc-warnings in a git tree + Anyone building from cloned sources can be assumed to have a new + enough environment, such that enabling gcc warnings by default will + be useful. Tarballs still default to no warnings, and the default + can still be overridden with --disable-gcc-warnings. + * configure.ac (gl_gcc_warnings): Set default based on environment. + +2012-10-03 Jim Meyering <meyering@redhat.com> + + maint: factor out STREQ definition + * src/main.c (STREQ): Remove definition. + * src/pcresearch.c: (STREQ): Likewise. + * src/system.h (STREQ): Define it here instead. + + maint: correct syntax-check failures; adjust NEWS + * tests/pcre-utf8: Reverse order of compare arguments. + Remove all copyright year numbers except 2012. + Use skip_ "diagnostic...", rather than a bare "exit 77". + * NEWS: Start with a concise description of the bug. + * src/pcresearch.c (STREQ): Define, so that we can... + (Pcompile): use STREQ, not strcmp. + +2012-10-03 Paolo Bonzini <bonzini@gnu.org> + + tests: include UTF-8 testcases for grep -P + * tests/Makefile.am (TESTS): Add pcre-utf8. + * tests/pcre-utf8: New file. 
+ +2012-10-03 Petr Pisar <ppisar@redhat.com> + + pcresearch: set UTF-8 flag correctly for UTF-8 locales + Otherwise, Unicode properties (\p{XXX}) do not work with characters + outside the 7-bit ASCII character set. + + * src/pcresearch.c (Pcompile): Look for UTF-8 locales and set PCRE_UTF8 + if one is found. + +2012-10-03 Jaroslav Škarvada <jskarvad@redhat.com> + + doc: fix a formatting bug in grep.1 template + * doc/grep.in.1: Insert .TP before the paragraph describing + --dereference-recursive (-R). + +2012-10-03 Jim Meyering <meyering@redhat.com> + + maint: placate gcc's -Wjump-misses-init warning + * src/kwsearch.c (Fexecute): Replace a "goto" and "return" with + a simple return statement, eliminating the label, since that was + the sole use. + * src/dfasearch.c (EGexecute): Likewise. + +2012-09-01 Jim Meyering <meyering@redhat.com> + + build: update gnulib submodule to latest + +2012-09-01 Eric Blake <eblake@redhat.com> + + build: work with new glibc when not optimizing + Starting with glibc 2.15, the system headers refuse to compile + unconditional use of FORTIFY_SOURCE if optimization is disabled + but -Werror is in effect. + + * configure.ac (FORTIFY_SOURCE): Make conditional. + +2012-08-19 Jim Meyering <meyering@redhat.com> + + maint: post-release administrivia + * NEWS: Add header line for next release. + * .prev-version: Record previous version. + * cfg.mk (old_NEWS_hash): Auto-update. + + version 2.14 + * NEWS: Record release date. + +2012-08-07 Jim Meyering <meyering@redhat.com> + + build: update gnulib and bootstrap + + tests: test for bug with -i and ^$ in a multi-byte locale + * tests/empty-line-mb: New file. + * tests/Makefile.am (TESTS): Add it. + + grep -i '^$' in a multi-byte locale could report a false match + * src/dfasearch.c (EGexecute): Do not match the sentinel "newline" + that is appended to each buffer. 
+ This bug may sound like a big deal (it certainly surprised me), but + realize that only the empty-line-matching regular expression '^$' + can trigger it, and then only when you add the unnecessary (and + arguably superfluous) -i, *and* run the command in a multi-byte + locale. Using a multi-byte locale for such a regular expression + is also pointless, and hurts performance. + * NEWS (Bug fixes): Mention it. + Reported by Alexander Katassonov <katasso@gmx.de> + +2012-08-06 Jim Meyering <meyering@redhat.com> + + tests: fix a skip diagnostic that mentioned the wrong locale + * tests/init.cfg (require_tr_utf8_locale_): s/en_US/tr_TR/ + +2012-08-02 Jim Meyering <meyering@redhat.com> + + tests: skip failing test on FS/system that lack SEEK_HOLE support + * tests/big-hole: Test for SEEK_HOLE support. If not available, + skip this test. Hence, this test is now skipped on linux-3.5.0 with + ext4 or tmpfs. The test runs (and passes) with at least btrfs, xfs, + or ocfs2. + * bootstrap.conf (gnulib_modules): Use the perl module. + +2012-07-30 Jim Meyering <meyering@redhat.com> + + maint: optimize long-line processing + * src/main.c (grep): Use memrchr rather than an open-coded loop, + reducing the cost of the replaced code by 50% when processing very + long lines. If there were a rawmemrchr function (analogous to glibc's + rawmemchr), then the performance improvement would be even greater. + +2012-07-27 Paul Eggert <eggert@cs.ucla.edu> + + maint: remove stat-size + * bootstrap.conf (gnulib_modules): Remove stat-size. + * src/main.c: Don't include stat-size.h; no longer needed. + + grep: don't falsely report compressed text files as binary + * NEWS: Document this. + * src/main.c (file_is_binary): Remove the heuristic based on + st_blocks, as it does not work for compressed file systems. + On Solaris, it'd be cheap to test whether the file system is known + to be uncompressed, which would allow the heuristic, but Solaris has + SEEK_HOLE so there's little point.
+ + grep: don't falsely report tiny text files as binary + * NEWS: Document this. + * src/main.c (file_is_binary): When we are already at apparent + EOF, skip the file-size check, as some servers use zero blocks + to store binary files. Reported by Martin Carroll in + <http://lists.gnu.org/archive/html/bug-grep/2012-07/msg00016.html>. + +2012-07-26 Paul Eggert <eggert@cs.ucla.edu> + + doc: document -r/-R in man page + * doc/grep.in.1: Document -r vs. -R. + +2012-07-21 Jim Meyering <meyering@redhat.com> + + tests: avoid false positive upon kernel OOM-kill + * tests/big-match (skip_diagnostic): Handle case of 139 (SIGKILL) + with no diagnostic. + + build: update gnulib and bootstrap + + maint: fix misspellings in old ChangeLog + * ChangeLog-2009: Fix typos. + +2012-07-19 Paul Eggert <eggert@cs.ucla.edu> + + grep: fix ptrdiff/size_t clash + Reported by Jaroslav Škarvada in <http://savannah.gnu.org/bugs/?36883>. + * src/dfasearch.c (EGexecute): Use size_t, not ptrdiff_t, for lengths. + Use regoff_t to store re_match's output, and test it before converting + it to size_t. + +2012-07-06 Jim Meyering <meyering@redhat.com> + + maint: correct log typo, to reflect in generated ChangeLog + * Makefile.am (gen-ChangeLog): Use --amend, now that we must + make our first log correction. + * build-aux/git-log-fix: New file. + +2012-07-04 Jim Meyering <meyering@redhat.com> + + maint: post-release administrivia + * NEWS: Add header line for next release. + * .prev-version: Record previous version. + * cfg.mk (old_NEWS_hash): Auto-update. + + version 2.13 + * NEWS: Record release date. + + build: update gnulib submodule, bootstrap, init.sh + +2012-06-17 Jim Meyering <meyering@redhat.com> + + tests: add another turkish-I-related test case + * tests/turkish-I-without-dot: Also exercise the case in which + the original string and the lower-case buffer have precisely + the same length (22 bytes here), yet internal offsets do differ. 
+ +2012-06-16 Jim Meyering <meyering@redhat.com> + + grep -i: work also when converting to lower-case inflates byte count + Commit v2.12-16-g7aa698d addressed the case in which the lower-case + representation of an input byte occupies fewer bytes than the original. + However, even with commit v2.12-20-g074842d, grep -i would still + misbehave when converting a character to lower-case increased its + byte count. The map-manipulation code assumed that the case conversion + could only shrink the byte count. With the consideration that it may + also inflate it, the deltas recorded in the map array must be signed, + and we must account for the one-to-two-or-more mapping when the + original-to-lower-case conversion causes the byte count to increase. + * src/searchutils.c (mbtolower): When a lower-case character occupies + more than one byte, set its remaining map slots to zero. Change the + type of the map to be signed, and compute the change in character + byte count as new_length - old_length. + * src/search.h: Include <stdint.h>, for decl of intmax_t. + (mb_case_map_apply): Adjust for signed increments: + each map entry is now signed. + (mb_len_map_t): Define type. Thanks to Paul Eggert for noticing + in review that using a bare "char" as the base type would be wrong on + systems for which it is an unsigned type (as with gcc's -funsigned-char). + * src/kwsearch.c (Fcompile, Fexecute): Likewise. + * src/dfasearch.c (kwsincr_case, EGexecute): Likewise. + * tests/turkish-I-without-dot: New test. Thanks to Paolo Bonzini + for the tip that in the tr_TR.utf8 locale, mapping "I" to lower case + increases the character's byte count. + * tests/Makefile.am (TESTS): Add it. + * tests/init.cfg (require_tr_utf8_locale_): New function. + * NEWS (Bug fixes): Expand the existing entry.
+ +2012-06-12 Paul Eggert <eggert@cs.ucla.edu> + + grep: handle -i when chars differ in length but line does not + * src/searchutils.c (mbtolower): Return the map back to the caller + if any input character's length differs from the corresponding output + character's, not merely if the total string length differs. + Problem reported by Johannes Meixner in + <http://lists.gnu.org/archive/html/bug-grep/2012-06/msg00029.html>. + +2012-06-07 Jim Meyering <meyering@redhat.com> + + tests: extend coverage of dfa.c's match_mb_charset + Add a test case to increase test coverage of part of dfa.c (the DFA + matcher used by grep and gawk). While thinking about removing the few + remaining uses of strncpy in dfa.c, I found that none of the existing + tests covered the 40+ lines of code at the end of match_mb_charset, + so constructed this test case to demonstrate that it's not dead code + * tests/dfa-coverage: New test, for improved coverage. + * tests/Makefile.am (TESTS): Add it. + +2012-06-05 Jim Meyering <meyering@redhat.com> + + build: fix a subtly twisted "make distcheck" failure + "make distcheck" would fail when, during a test build, + an attempt to overwrite the deliberately-write-protected + $(srcdir)/grep.pot file would fail. + * bootstrap.conf (bootstrap_epilogue): Don't let the existence of + a large sparse file in the build directory induce "make distcheck" + failure. The existence of a large sparse test file named 8T-or-so + would make po/Makefile.in.in's use of grep (to search for "GNU grep" + as an indication that this is a GNU package) exit 2 without generating + any output, which made the first xgettext use --package-name=grep, + while that same search for "GNU grep" would succeed when run + from a pristine from-tarball build, thus making the second + xgettext invocation use --package-name='GNU grep'. 
+ That mismatch: + -"Project-Id-Version: grep 2.12.18-1080\n" + +"Project-Id-Version: GNU grep 2.12.18-1080\n" + led to the attempt by Makefile.in.in's grep.pot-update rule to + overwrite ../../grep.pot in the read-only po/ source directory. + +2012-06-03 Jim Meyering <meyering@redhat.com> + + build: update gnulib submodule, bootstrap and init.sh + cfg.mk: Exempt dfa.c from the new no-strncpy test, for now. + +2012-06-02 Jim Meyering <meyering@redhat.com> + + grep: fix how -i works with a match containing the Turkish I-with-dot + Fix a long-standing problem in the way grep's -i interacts with + data whose byte count changes when we convert it to lower case. + For example, the UTF-8 Turkish I-with-dot (İ) occupies two bytes, + but its lower case analog, i, occupies just one byte. The code + converts both search string and the haystack data to lower case, + and then searches for the modified string in the modified buffer. + The trouble arose when using a lowercase buffer <offset,length> + pair to manipulate the original (longer) buffer. + + The solution is to change mbtolower to return additional information: + a malloc'd mapping vector. With that, the caller maps the lowercase- + relative <offset,length> to numbers that refer to the original buffer. + This mapping is used only when lengths actually differ, so the cost + in general should be small. + + * src/searchutils.c (mbtolower): Add the new map parameter. + * src/search.h (mb_case_map_apply): New function. + * src/kwsearch.c (Fexecute): Update mbtolower caller, and upon + success, apply the new map. + * src/dfasearch.c (EGexecute): Likewise. + * tests/Makefile.am (XFAIL_TESTS): Remove turkish-I from this list; + that test is no longer expected to fail. + * NEWS (Bug fixes): Mention it. 
+ Reported by Ilya Basin in + http://thread.gmane.org/gmane.comp.gnu.grep.bugs/3413 and later + by Strahinja Kustudic in http://savannah.gnu.org/bugs/?36567 + +2012-06-01 Paul Eggert <eggert@cs.ucla.edu> + + grep: remove unnecessary "what-if-signal?" code + * src/main.c (fillbuf): Don't worry about EINTR when closing -- + not possible, since we're not catching signals. + +2012-05-16 Paul Eggert <eggert@cs.ucla.edu> + + grep: avoid nominal integer overflow + * src/dfa.c (add_utf8_anychar): Avoid signed integer overflow. + Although this works on all platforms we know about, strictly + speaking the behavior is undefined, and Sun C 5.8 warns about it. + +2012-05-15 Jim Meyering <meyering@redhat.com> + + maint: avoid nit-picky syntax-check test failure; tweak big-hole test + * NEWS: Restore deleted newline in "old" NEWS, to fix a syntax-check + test failure. + * tests/big-hole: Use awk, rather than a shell loop: saves 3000 lines + of verbose shell output in the .log file. + +2012-05-15 Paul Eggert <eggert@cs.ucla.edu> + + grep: sparse files are now considered binary + * NEWS: Document this. + * doc/grep.texi (File and Directory Selection): Likewise. + * bootstrap.conf (gnulib_modules): Add stat-size. + * src/main.c: Include stat-size.h. + (usable_st_size): New function, mostly stolen from coreutils. + (fillbuf): Use it. + (file_is_binary): New function, which looks for holes too. + (grep): Use it. + * tests/Makefile.am (TESTS): Add big-hole. + * tests/big-hole: New file. + +2012-05-06 Paul Eggert <eggert@cs.ucla.edu> + + maint: quote 'like this' or "like this", not `like this' + See <http://lists.gnu.org/archive/html/bug-grep/2012-01/msg00125.html>. + * ChangeLog-2009, HACKING, NEWS, README-hacking, cfg.mk, configure.ac: + * lib/colorize-w32.c, m4/pcre.m4: + * src/Makefile.am, src/dfa.c, src/dosbuf.c, src/main.c: + * tests/backref, tests/help-version, tests/tests: + In commentary, quote 'like this' or "like this" rather than + `like this' or ``like this''. 
+ * cfg.mk (old_NEWS_hash): Update due to changed old NEWS. + * doc/grep.texi (General Output Control): Quote sample text + with @samp, not with `...'. + * src/main.c (usage): + * tests/help-version: Quote 'like this' rather than `like this' + in diagnostics. + + exclude: process exclude and include directives in order + Also, change exclude and include directives so that they apply to + command-line arguments too. This restores the pre-2.6 behavior, + and fixes a bug reported by Quentin Arce in + <http://lists.gnu.org/archive/html/bug-grep/2012-04/msg00056.html>. + * NEWS: Document this. + * src/main.c (included_patterns): Remove. All uses removed. + (skipped_file): New function. + (grepdirent): New arg command_line; all callers changed. This is + needed because non-command-line files can invoke fts_open, and + their directory entries need to be distinguished from top-level + directory entries. Move code into the new skipped_file function. + (grepdesc): Check whether a command-line argument should be skipped. + (main): --include and --exclude options now share excluded_patterns + rather than having separate variables included_patterns and + excluded_patterns. + * tests/include-exclude: Add a test to detect the fixed bug. + + build: update gnulib submodule to latest + +2012-04-30 Jim Meyering <meyering@redhat.com> + + cosmetic: binary operator goes *after* the newline, when split + * src/dfa.c (match_mb_charset): Join split lines. + (parse_bracket_exp): Move "||" from end of first split line + to the beginning of the continued line. + * src/dosbuf.c (dossified_pos): Likewise, but for "&&". + + grep: -K is not an option: remove it from list + The presence of "K" in the short-option string meant that + an erroneous "grep -K ..." would fail with a bare Usage/Try... + message, without the usual "invalid option -- 'K'". With this + removal, now grep prints the expected invalid option diagnostic. + * src/main.c (short_options): Remove "K". 
+ Reported by Петр Досычев in + http://thread.gmane.org/gmane.comp.gnu.grep.bugs/4488 + +2012-04-29 Paolo Bonzini <bonzini@gnu.org> + + dfa: small fixes to single-byte range computation + * src/dfa.c (parse_bracket_exp): Do not call regexec with an invalid + subject. Move declarations before all statements. + +2012-04-27 Paolo Bonzini <bonzini@gnu.org> + + dfa: do not use hard-locale + * bootstrap.conf (gnulib_modules): Remove hard-locale. + * src/dfa.c (hard_LC_COLLATE): Remove. + (dfaparse): Do not initialize it. + (parse_bracket_exp): Always go through system regex matcher to find + single byte characters matching a range. + + drop support for Makefile.boot + * Makefile.am: Do not distribute README-boot and Makefile.boot. + * NEWS: Mention this change. + * README-alpha: Do not mention README-boot and Makefile.boot. + * Makefile.boot: Remove. + * README-boot: Remove. + +2012-04-27 Aharon Robbins <arnold@skeeve.com> + + dfa: do not use strcoll to match multibyte characters in ranges + This does not affect the behavior of grep, which always defers + to glibc or gnulib when matching ranges. + * src/dfa.c (match_mb_charset): Compare wc directly to the range + endpoints. + + dfa: include stdbool.h explicitly + * src/dfa.c: Include stdbool.h explicitly + +2012-04-23 Jim Meyering <meyering@redhat.com> + + maint: post-release administrivia + * NEWS: Add header line for next release. + * .prev-version: Record previous version. + * cfg.mk (old_NEWS_hash): Auto-update. + + version 2.12 + * NEWS: Record release date. + + build: update gnulib submodule to latest + + tests: skip annoyingly long gnulib lock tests + * bootstrap.conf (avoided_gnulib_modules): Define. + (gnulib_tool_option_extras): Use it. 
+ +2012-04-22 Jim Meyering <meyering@redhat.com> + + tests: avoid spurious quote-mismatch failure on OS/X + * tests/in-eq-out-infloop: Simplify expected error output, eliminating + expected quotes altogether, thus avoiding spurious OS/X-specific + failure due to mismatch of multi-byte vs. single-byte quotes. + +2012-04-17 Jim Meyering <meyering@redhat.com> + + build: update gnulib submodule to latest + * bootstrap: Also update this file. + +2012-04-17 Jim Meyering <meyering@redhat.com> + + grep: fix --devices=ACTION (-D) so stdin is once again exempt + An oversight in the 2.11 changes made it so "echo x|grep x" would + fail for those who set GREP_OPTIONS=--devices=skip. + + * src/main.c (grepdesc): Ignore skip-related options when reading + from standard input. + * tests/skip-device: New file. Test for the above. + * tests/Makefile.am (TESTS): Add it. + * doc/grep.texi (File and Directory Selection): Clarify this point, + documenting the stdin exemption. + * NEWS (Bug fixes): Mention it, and add a few "[fixed in ...] notes. + Reported by Tino Keitel in http://bugs.debian.org/669084, + and forwarded to bug-grep by Aníbal Monsalve Salazar. + +2012-04-13 Jim Meyering <meyering@redhat.com> + + maint: dfa: correct bogus formatting + * src/dfa.c (transit_state, dfaexec): s/++ * VAR/++*VAR/ + + maint: dfa: add/improve comments + * src/dfa.c (transit_state_consume_1char): Note always-ignored + return value. + Fix typos: s/equivalent class/equivalence class/. + + maint: dfa: avoid unnecessary uses of strcpy/strncpy + * src/dfa.c (icatalloc): Use memcpy, not strcpy, given the length. + (dfamust): Combine MALLOC+strcpy into cleaner xmemdup. + (parse_bracket_exp): Likewise, but replace a use of strncpy. + + grep: handle symlinked directory loops as usual + * src/main.c (grepfile): Treat EMLINK just like ELOOP, for + systems like FreeBSD 9.0 on which we would otherwise report + "Too many links" rather than ignoring that type of failure. + E.g., "mkdir d; cd d; ln -s . 
a; grep -r ^" would print + grep: a: Too many links and would exit with status 2. + Now, it prints nothing and exits with status 1, as before. + Reported by Nelson H. F. Beebe. + + tests: avoid spurious failure of the symlink test + * tests/symlink: Ignore spurious "Binary file d matches" on + systems for which reading from a directory actually succeeds. + Reported by Bruno Haible and Nelson Beebe. + +2012-04-09 Jim Meyering <meyering@redhat.com> + + tests: avoid syntax-check failure: reverse compare arguments + * tests/repetition-overflow: Fix reversed compare arguments. + + build: update gnulib submodule to latest + +2012-03-18 Paul Eggert <eggert@cs.ucla.edu> + + grep: report overflow for ERE a{1000000000} + * NEWS: Document this. + * src/dfa.c (MIN): New macro. + (lex): Lexically analyze the repeat-count operator once, not + twice; the double-scan complicated the code and made it harder to + understand and fix. Adjust the repeat-count parsing so that it + better matches the behavior of the regex code, in three ways: + 1. Diagnose too-large repeat counts rather than treating them as + literal characters. 2. Use RE_INVALID_INTERVAL_ORD, not + RE_NO_BK_BRACES, to decide whether to treat invalid-syntax {...}s + as literals. 3. Use the same wording for {...}-related + diagnostics that the regex code uses. + * tests/bre.tests, tests/ere.tests, tests/repetition-overflow: + Adjust to match new behavior, and add a few tests. + * cfg.mk (exclude_file_name_regexp--sc_error_message_uppercase): + New macro, since the diagnostics start with uppercase letters. + +2012-03-14 Paul Eggert <eggert@cs.ucla.edu> + + grep: -r no longer follows symlinks; use fts + Change -r to follow only command-line symlinks, and by default to + read only devices named on the command line. This is a simple + way to get a more-useful behavior when searching random + directories; the idea is to use 'find' if you want something fancy. + -R acts as before and gets a new alias --dereference-recursive. 
+ The code now uses fts internally, so it is more robust and + faster with large hierarchies. + * .gitignore: Remove lib/savedir.c, lib/savedir.h. + * tests/symlink: New file + * Makefile.boot (LIB_OBJS_core): Remove isdir.o, savedir.o. + Perhaps other changes are needed too, but I'm not sure what + this makefile is for. + * NEWS: Document changes. + * doc/grep.texi (File and Directory Selection): Likewise. + * bootstrap.conf (gnulib_modules): Remove dirent, dirname, isdir, open. + Add fstatat, fts, openat-safer. + * lib/Makefile.am (libgreputils_a_SOURCES): Remove savedir.c, savedir.h. + * lib/savedir.c, lib/savedir.h: Remove. + * po/POTFILES.in: Add lib/openat-die.c. + * src/main.c: Include fcntl-safer.h, fts_.h. Don't include + isdir.h, savedir.h. + (struct stats, stats_base): Remove. + (long_options, usage, main): Add --dereference-recursive and + implement -r vs -R. + (filename_prefix_len, fts_options): New static vars. + (basic_fts_options, READ_COMMAND_LINE_DEVICES): New constants. + (devices): Now defaults to READ_COMMAND_LINE_DEVICES. + (reset, grep): Now takes just struct stat rather than file name and + struct stats. All callers changed. + (fillbuf): Now takes struct stat rather than struct stats. + All callers changed. + (grep): Don't worry about recursing too deeply; fts and grepdesc + handle this now. + (is_device_mode, grepdirent, grepdesc, grep_command_line_args): + New functions. + (grepfile): New args DIRDESC, FOLLOW, COMMAND_LINE. Remove struct stats + arg. All callers changed. Use openat_safer rather than open. + Use desc == STDIN_FILENO to tell whether we're reading "-". + Don't worry about EINTR when closing -- not possible, since we're + not catching signals. + * tests/Makefile.am (TESTS): Add symlink. + * tests/symlink: New file. + +2012-03-12 Paul Eggert <eggert@cs.ucla.edu> + + tests: port big-match to non-GNU dd + * tests/big-match: Don't assume GNU dd extension "bs=1M".
+ + tests: test for bug with -r --exclude-dir and no file operand + * tests/include-exclude: Test for the bug and fix. + +2012-03-12 Allan McRae <allan@archlinux.org> + + grep: fix segfault with -r --exclude-dir and no file operand + * src/main.c (grepdir): Don't invoke excluded_file_name on NULL. + * NEWS (Bug fixes): Mention it. + +2012-03-09 Jim Meyering <meyering@redhat.com> + + tests: exercise two recently-fixed bugs + * tests/repetition-overflow: New test for bugs fixed by commit + v2.10-82-gcbbc1a4. + * tests/Makefile.am (TESTS): Add it. + +2012-03-03 Jim Meyering <meyering@redhat.com> + + maint: use an optimal-for-grep xz compression setting + * cfg.mk (XZ_OPT): Use -6e (determined empirically, see comments). + This sacrifices a meager 60 bytes of compressed tarball size for a + 55-MiB decrease in the memory required during decompression. I.e., + using -9e would shave off only 60 bytes from the tar.xz file, yet + would force every decompression process to use 55 MiB more memory. + + build: update gnulib submodule to latest + +2012-03-02 Jim Meyering <meyering@redhat.com> + + maint: post-release administrivia + * NEWS: Add header line for next release. + * .prev-version: Record previous version. + * cfg.mk (old_NEWS_hash): Auto-update. + + version 2.11 + * NEWS: Record release date. + + tests: avoid failure when using Solaris 10's sed + * tests/reversed-range-endpoints: Use a simpler sed expression to + sanitize actual output, so it also works with Solaris 10's /bin/sed. + +2012-03-01 Jim Meyering <meyering@redhat.com> + + maint: manually correct formatting in dfa.c's cpp definitions + * src/dfa.c: Adjust formatting in cpp definitions. + + maint: indent dfa.c + * src/dfa.c: Filter through indent like this: + HOME=. indent -Tsize_t -l79 --leave-preprocessor-space \ + --dont-format-comments --no-tabs < dfa.c > k && mv k dfa.c + + doc: correct grep.1's descriptions of \w and \W (they omitted "_") + * doc/grep.in.1: Fix descriptions of \w and \W. 
+ They did not mention "_". + * doc/grep.texi (The Backslash Character and Special Expressions): + [\w, \W]: List the "_" before the char class, not after: [_[:alnum:]], + for readability and to be consistent with the man page. + +2012-03-01 Paul Eggert <eggert@cs.ucla.edu> + + maint: spelling fixes + + grep: fix integer-overflow issues in main program + * NEWS: Document this. + * bootstrap.conf (gnulib_modules): Add inttypes, xstrtoimax. + Remove xstrtoumax. + * src/main.c: Include <inttypes.h>, for INTMAX_MAX, PRIdMAX. + (context_length_arg, prtext, grepbuf, grep, grepfile) + (get_nondigit_option, main): + Use intmax_t, not int, for line counts. + (context_length_arg, main): Silently ceiling line counts + to maximum value, since there's no practical difference between + doing that and using infinite-precision arithmetic. + (out_before, out_after, pending): Now intmax_t, not int. + (max_count, outleft): Now intmax_t, not off_t. + (prepend_args, prepend_default_options, main): + Use size_t, not int, for sizes. + (prepend_default_options): Check for int and size_t overflow. + + grep: avoid mishandling of long lines + * src/pcresearch.c (Pexecute): Do not pass a line longer than + INT_MAX to pcre_exec, since its API does not permit that. + + grep: remove no-longer-used setrlimit code + This code has been unused and obsolescent ever since the regex + code stopped using the stack for large regular expressions. + * src/main.c [HAVE_SETRLIMIT]: Do not include <sys/time.h> or + <sys/resource.h>; no longer needed. + (set_rlimits): Remove. All callers changed. + + grep: fix some core dumps with long lines etc. + These problems mostly occur because the code attempts to stuff + sizes into int or into unsigned int; this doesn't work on most + 64-bit hosts and the errors can lead to core dumps. + * NEWS: Document this. + * src/dfa.c (token): Typedef to ptrdiff_t, since the enum's + range could be as small as -128 .. 127 on practical hosts.
+ (position.index): Now size_t, not unsigned int. + (leaf_set.elems): Now size_t *, not unsigned int *. + (dfa_state.hash, struct mb_char_classes.nchars, .nch_classes) + (.nranges, .nequivs, .ncoll_elems, struct dfa.cindex, .calloc, .tindex) + (.talloc, .depth, .nleaves, .nregexps, .nmultibyte_prop, .nmbcsets) + (.mbcsets_alloc): Now size_t, not int. + (dfa_state.first_end): Now token, not int. + (state_num): New type. + (struct mb_char_classes.cset): Now ptrdiff_t, not int. + (struct dfa.utf8_anychar_classes): Now token[5], not int[5]. + (struct dfa.sindex, .salloc, .tralloc): Now state_num, not int. + (struct dfa.trans, .realtrans, .fails): Now state_num **, not int **. + (struct dfa.newlines): Now state_num *, not int *. + (prtok): Don't assume 'token' is no wider than int. + (lexleft, parens, depth): Now size_t, not int. + (charclass_index, nsubtoks) + (parse_bracket_exp, addtok, copytoks, closure, insert, merge, delete) + (state_index, epsclosure, state_separate_contexts) + (dfaanalyze, dfastate, build_state, realloc_trans_if_necessary) + (transit_state_singlebyte, match_anychar, match_mb_charset) + (check_matching_with_multibyte_ops, transit_state_consume_1char) + (transit_state, dfaexec, free_mbdata, dfaoptimize, dfafree) + (freelist, enlist, addlists, inboth, dfamust): + Don't assume indexes fit in 'int'. + (lex): Avoid overflow in string-to-{hi,lo} conversions. + (dfaanalyze): Redo indexing so that it works with size_t values, + which cannot go negative. + * src/dfa.h (dfaexec): Count argument is now size_t *, not int *. + (dfastate): State numbers are now ptrdiff_t, not int. + * src/dfasearch.c: Include "intprops.h", for TYPE_MAXIMUM. + (kwset_exact_matches): Now size_t, not int. + (EGexecute): Don't assume indexes fit in 'int'. + Check for overflow before converting a ptrdiff_t to a regoff_t, + as regoff_t is narrower than ptrdiff_t in 64-bit glibc (contra POSIX).
+ Check for memory exhaustion in re_search rather than treating + it merely as failure to match; use xalloc_die () to report any error. + * src/kwset.c (struct trie.accepting): Now size_t, not unsigned int. + (struct kwset.words): Now ptrdiff_t, not int. + * src/kwset.h (struct kwsmatch.index): Now size_t, not int. + + tests: test for problems with long matches + The new test is expensive, so add a category of expensive tests, + which are normally not run, and put the new test in this new + category. The idea of having expensive tests is taken from coreutils. + * HACKING: Mention RUN_EXPENSIVE_TESTS and similar env vars. + * Makefile.am (check-expensive): New rule. + * tests/Makefile.am (TESTS): Add big-match. + * tests/init.cfg (expensive_): New function, from coreutils. + * tests/big-match: New file. + +2012-02-29 Paul Eggert <eggert@cs.ucla.edu> + + maint: use gnulib _Noreturn rather than __attribute__ ((noreturn)) + * src/grep.h (__attribute__): Remove. + * src/dfa.h (__attribute__): Likewise. + (dfaerror): Use noreturn rather than __attribute__ ((noreturn)). + * src/main.c (usage): Likewise. + +2012-02-26 Jim Meyering <meyering@redhat.com> + + build: update submodule, bootstrap, tests/init.sh from gnulib + * gl/lib/regcomp.c.diff: Adjust. + * bootstrap: Update from gnulib. + * tests/init.sh: Update from gnulib. + +2012-02-26 Paolo Bonzini <bonzini@gnu.org> + + dfa: merge calls to SUCCEEDS_IN_CONTEXT + * src/dfa.c (state_index): use a single call to SUCCEEDS_IN_CONTEXT. + + dfa: fix a subtle constraint encoding bug + * src/dfa.c (SUCCEEDS_IN_CONTEXT, PREV_NEWLINE_DEPENDENT, + PREV_LETTER_DEPENDENT): Rewrite to handle all 3*3=9 possible + combinations of previous and next character contexts. + (MATCHES_NEWLINE_CONTEXT, MATCHES_LETTER_CONTEXT): Remove. + (NO_CONSTRAINT, BEGLINE_CONSTRAINT, ENDLINE_CONSTRAINT, + BEGWORD_CONSTRAINT, ENDWORD_CONSTRAINT, LIMWORD_CONSTRAINT, + NOTLIMWORD_CONSTRAINT): Switch to new encoding. + * NEWS: Document resulting bugfix. 
+ * tests/spencer1.tests: Add regression test. + + dfa: do not use MATCHES_*_CONTEXT directly + * src/dfa.c (dfastate): Use SUCCEEDS_IN_CONTEXT. + + dfa: change meaning of a state context + * src/dfa.c (MATCHES_NEWLINE_CONTEXT, MATCHES_LETTER_CONTEXT): New. + (state_separate_contexts): Remove second argument. + (state_index): Do not mask away CTX_NONE. + (dfaanalyze): Adjust call to state_index and state_separate_contexts. + (dfastate): Adjust calls to state_index and state_separate_contexts. + +2012-02-13 Paul Eggert <eggert@cs.ucla.edu> + + tests: fix loop in epipe test + * tests/epipe: Don't loop forever if the bug is present. + Problem reported by Jaroslav Skarvada. + +2012-02-08 Paul Eggert <eggert@cs.ucla.edu> + + tests: work portably even if SIGPIPE is ignored + * tests/epipe: Don't rely on "trap - PIPE"; that's not portable. + Problem reported by Eric Blake in + <http://lists.gnu.org/archive/html/bug-grep/2012-02/msg00017.html>. + Also, use "ls -al" rather than "echo", in case "echo" is done by a + buggy shell that ignores write errors. And close grep's fd 3, as + a sanity check. + +2012-02-07 Paul Eggert <eggert@cs.ucla.edu> + + tests: work even if SIGPIPE is ignored + * tests/epipe: Do not infinite-loop if SIGPIPE is already ignored. + It could be that the invoker of 'make check' ignores SIGPIPE, + for example. + +2012-02-05 Jim Meyering <meyering@redhat.com> + + build: accommodate -Wshadow and -Werror=suggest-attribute=pure + * src/dfa.c (state_separate_contexts): Add _GL_ATTRIBUTE_PURE. + (dfaexec): Rename parameter, s/newline/allow_nl/, to avoid + shadowing the global. + +2012-02-05 Paolo Bonzini <bonzini@gnu.org> + + dfa: refactor common context computations + * src/dfa.c (CTX_ANY, charclass_context, state_separate_contexts): New. + (dfaanalyze): Use state_separate_contexts. + (dfastate): Use charclass_context and state_separate_contexts. Rename + prev_context to separate_contexts. 
+ + dfa: change newline/letter to a single context value + * src/dfa.c (MATCHES_NEWLINE_CONTEXT, MATCHES_LETTER_CONTEXT, + SUCCEEDS_IN_CONTEXT, ACCEPTS_IN_CONTEXT): Take a single context value + for prev and curr. + (struct dfa_state): Replace newline and letter with context. + (wchar_context): New. + (state_index): Replace newline and letter with context. Compare + context values in the state struct. Adjust calls to pass contexts. + (wants_newline): Replace with wanted_context. Adjust calls to pass + contexts. + (dfastate): Replace wants_newline and wants_letter with wanted_context. + Adjust calls to pass contexts. + (build_state): Adjust calls to pass contexts. + (match_anychar, match_mb_charset, transit_state): Use wchar_context. + Adjust calls to pass contexts. + +2012-02-05 Paolo Bonzini <bonzini@gnu.org> + + dfa: introduce contexts for the values in d->success + Also initialize all tables in a single place in dfasyntax. + + * src/dfa.c (CTX_NONE, CTX_LETTER, CTX_NEWLINE, char_context): New. + (sbit, letters, newline): New. + (dfasyntax): Fill them. + (dfastate): Remove letters, newline, initialized. + (build_state): Use CTX_* constants. + (dfaexec): Remove sbit and sbit_init. + +2012-02-05 Paolo Bonzini <bonzini@gnu.org> + + dfa: remove useless check + * src/dfa.c (state_index): There is nothing that is a newline *and* + a letter. Remove redundant call to SUCCEEDS_IN_CONTEXT. + +2012-01-22 Jim Meyering <meyering@redhat.com> + + build: update bootstrap from gnulib and adapt + * bootstrap: Update from gnulib. + * tests/init.sh: Update from gnulib. + * bootstrap.conf (bootstrap_epilogue): Remove now-unnecessary + snippet that edited gnulib-tests/gnulib.mk. + (gnulib_tool_option_extras): Add both --symlink and + --makefile-name=gnulib.mk. Remove use of $bt. + * lib/Makefile.am: Initialize numerous automake variables so that + generated code in gnulib.mk may use += to append to them.
+ + maint: convert `this' to 'this' quoting style in diagnostics + Now that gnulib's quote and quotearg modules use 'this' style, + change the few explicit uses in diagnostics to conform. + * src/egrep.c (after_options): Use 'this' style of quotes. + * src/fgrep.c (after_options): Likewise. + * src/grep.c (after_options): Likewise. + * src/main.c (usage): Likewise. + + build: update gnulib to latest; adjust quoting in tests + * gnulib: Update. + * tests/in-eq-out-infloop: Convert expected diagnostics to match + new quoting. + +2012-01-22 Paul Eggert <eggert@cs.ucla.edu> + + doc: document recent diagnostics-related changes + * NEWS: Document changes re diagnostics related to GREP_COLORS, + directory loops, -s, "write error". + + grep: be quiet about GREP_COLORS syntax + * src/main.c (struct color_cap): fct now returns void, + since there's no longer need to use what it returns. + (color_cap_mt_fct, color_cap_rv_fct, color_cap_ne_fct): Return void. + (parse_grep_colors): Do not output diagnostics and then exit with + status 0. Instead, ignore errors in GREP_COLORS. This is more + consistent with programs that (e.g.) ignore errors in termcap entries, + and it's more internally-consistent as some GREP_COLORS errors + were ignored but not others. + + grep: exit with nonzero status if directory loop + * src/main.c (grepdir): Exit with status 2 if a directory loop is + found, since the output might not be "right" (i.e., infinite...). + + grep: suppress read errors if -s + * src/main.c (reset, grep, grepfile): Do not report an input error + if -s is given. + + grep: don't say "write error" over and over + Problem reported by Travis Gummels in + <https://bugzilla.redhat.com/show_bug.cgi?id=741452>. + * src/main.c (write_error_seen): New static var. + (clean_up_stdout): New function. + (prline): Do not output 'write error' more than once; exit + after the first one. Use the same wording for the diagnostic + that close_stdout uses. 
+ (main): Clean up with clean_up_stdout, not close_stdout, so that + grep doesn't output multiple "write error" diagnostics. + * tests/Makefile.am (TESTS): Add epipe. + * tests/epipe: New file. + +2012-01-12 Paul Eggert <eggert@cs.ucla.edu> + + dfa: non-glibc word-constituent unibyte fix + * src/dfa.c (is_valid_unibyte_character): Fix typo that caused + this to incorrectly return 0 on unibyte non-glibc systems. + Problem reported by Aharon Robbins in + <http://lists.gnu.org/archive/html/bug-grep/2012-01/msg00084.html>. + +2012-01-04 Paul Eggert <eggert@cs.ucla.edu> + + doc: document empty pattern better + * doc/grep.texi (Top, Fundamental Structure, Usage): + Explain how grep deals with the empty pattern. + Problem spotted by Bernhard Voelker in + <http://lists.gnu.org/archive/html/bug-grep/2012-01/msg00050.html>. + + grep: with no args, search "." only if command-line -r + * NEWS: Document this. + * doc/grep.texi (Environment Variables, grep Programs): Likewise. + * src/main.c (usage): Likewise. + (main): Implement this. + (prepend_default_options): Return a count of prepended options. + * tests/r-dot: Test the above. + +2012-01-03 Jim Meyering <meyering@redhat.com> + + tests: adjust test to match code, now that --mmap writes to stderr + * tests/ignore-mmap: Separate stdout and stderr; test both. + + deprecate the --mmap option + * src/main.c (main): Deprecate the --mmap option: issue a warning + when it is used. + (usage): Change description. + * doc/grep.texi (Other Options): Document the new behavior. + * NEWS (Changes in behavior): Mention it. + +2012-01-03 Paolo Bonzini <bonzini@gnu.org> + + dfa: fix incorrect comment + * src/dfa.c (dfastate): Fix comment for newline. + + dfa: fix rebase conflict + * src/dfa.c (dfaanalyze): Fix reference to nalloc. + + dfa: automatically resize position_sets + * src/dfa.c (insert, copy, merge): Resize arrays here. + (dfaanalyze): Do not track number of allocated elements here. 
+ (dfastate): Allocate mbps with only one element. + + dfa: change position_set nelem to size_t + * src/dfa.c (REALLOC_IF_NECESSARY): Disable assertion, to avoid + warnings from -Wtype-limits. + (position_set): Change nelem to a size_t. + + dfa: move nalloc to position_set structure + * src/dfa.c (position_set): Add alloc. + (alloc_position_set): Initialize it. + (dfaanalyze): Use it instead of the nalloc array or nelem. + + dfa: remove dead assignment + * src/dfa.c (transit_state): transit_state_consume_1char will clear follows; + do not do this ourselves. + + dfa: introduce alloc_position_set + * src/dfa.c (alloc_position_set): New function, use it throughout. + + dfa: use a more compact data type for grps + * src/dfa.c (leaf_set): New. + (dfastate): Use the smaller type, leaf_set, for grps. Its prior type + contained an unused constraint field. + + dfa: use MALLOC/REALLOC always + * src/dfa.c (dfastate, enlist, dfamust): Use MALLOC and REALLOC. + + dfa: remove unnecessary braces + * src/dfa.c (dfastate): Remove unnecessary braces. + + dfa: x2nrealloc starting from a NULL pointer works + * src/dfa.c (parse_bracket_exp): Do not MALLOC mbcset parts the first time + they are encountered. Initialize chars_al correctly.
+ * lib/colorize-impl.c: Move them here as functions. + * lib/ms/colorize.h: Remove. + * src/Makefile.am (DEFAULT_HEADERS): Remove. + +2012-01-02 Paolo Bonzini <bonzini@gnu.org> + + colorize: use isatty module + * bootstrap.conf: Add isatty module. + * gnulib: Update to latest. + * lib/colorize.h: Remove argument from should_colorize. + * lib/ms/colorize.h: Likewise. + * lib/colorize-impl.c: Factor isatty call out of here... + * lib/ms/colorize-impl.c: ... and here... + * src/main.c: ... into here. + +2012-01-02 Jim Meyering <meyering@redhat.com> + + tests: avoid minor "make check" failure + * tests/r-dot: Make executable, to avoid triggering a failed + consistency test in "make check". + +2012-01-02 Paul Eggert <eggert@cs.ucla.edu> + + grep: -r with no args now searches "." + This is a patch I've been meaning to put in for years. + When I added support for "grep -r", I forgot to have "grep -r PAT" + search the working directory by default, instead of searching + standard input (which makes no sense, even if stdin is a directory). + This is not an upward compatible change, since "grep -r PAT <file" + will no longer search standard input, but that's OK; nobody should + be using "grep -r" that way anyway. + * NEWS: Document this. + * doc/grep.texi (File and Directory Selection, grep Programs, Usage): + Likewise. + * src/main.c (usage): Likewise. + (grepdir): If DIR is null, search the working directory, but do + not prepend "./" to the file names. + (main): If recursing and no operands are given, search ".". + * tests/Makefile.am (TESTS): Add r-dot. + * tests/r-dot: New file. + + grep: prefer fputs to printf, _ to gettext + * lib/colorize.h (print_end_colorize): + * lib/ms/colorize-impl.c (print_end_colorize): + Use fputs instead of printf. + * src/main.c (usage): Likewise. Use _ instead of gettext. + +2012-01-01 Paul Eggert <eggert@cs.ucla.edu> + + grep: check stdin like other files + * NEWS: Document this.
+ * src/main.c (grepfile): Revamp tests for input files so that + standard input is tested like other files. For example, report + an error if standard input equals standard output. + Prefer open+fstat to stat+open if possible, as open+fstat is + usually a bit faster and avoids a race condition. + * tests/in-eq-out-infloop: Add tests for cases like + 'grep pat <file >>file'. + +2012-01-01 Jim Meyering <meyering@redhat.com> + + maint: update all copyright year number ranges + Run "make update-copyright". + +2011-12-31 Paul Eggert <eggert@cs.ucla.edu> + + grep: lower-case function names + These names used to be macros, but they're functions now. + All callers changed. + * src/main.c (pr_sgr_start): Rename from PR_SGR_START. + (pr_sgr_end): Rename from PR_SGR_END. + (pr_sgr_start_if): Rename from PR_SGR_START_IF. + (pr_sgr_end_if): Rename from PR_SGR_END_IF. + + ms: move Microsoft-specific stuff to lib/ms + * cfg.mk (exclude_file_name_regexp--sc_prohibit_strcmp) + (exclude_file_name_regexp--sc_require_config_h) + (exclude_file_name_regexp--sc_require_config_h_first): + New rules. + * lib/colorize.c, lib/colorize.h, lib/colorize-impl.c: + * lib/ms/colorize.h, lib/ms/colorize-impl.c: New files. + * configure.ac (GREP_SRC_INCLUDES): New macro. + * lib/Makefile.am (libgreputils_a_SOURCES): Add colorize.[ch]. + (EXTRA_DIST): New macro. + * src/Makefile.am (DEFAULT_INCLUDES): New macro. + * src/main.c: Include colorize.h. + (PR_SGR_START, PR_SGR_END, PR_SGR_START_IF, PR_SGR_END_IF): + Now static functions, not macros. + (hstdout, norm_attr, w32_console_init, w32_sgr2attr) + (w32_clreol) [__MINGW32__]: Move to lib/ms/colorize-impl.c. + (pr_sgr_start, pr_sgr_end): Remove; callers changed to use new + print_start_colorize, print_end_colorize from colorize.h. + (init_colorize): Rename from w32_console_init and move to + colorize module; caller changed. + (should_colorize): Move to colorize module. 
+ + grep: do input==output check more like dir loop check + * src/main.c (grepfile): Just use SAME_INODE; don't bother + with SAME_REGULAR_FILE. This works better on properly-working + POSIX hosts, since it handles the case where the file is changing + as we grep it. It works worse on hosts that don't support st_ino + properly, but in practice this isn't that much of a problem here. + * src/system.h (same_file_attributes, SAME_REGULAR_FILE): + Remove; no longer needed. + + build: update gnulib submodule to latest + +2011-12-28 Paul Eggert <eggert@cs.ucla.edu> + + maint: remove now-unused/obsolete files + * README.DOS: Remove file. + * m4/djgpp.m4: Likewise. + * .gitignore: Remove reference to m4/djgpp.m4. + +2011-12-28 Jim Meyering <meyering@redhat.com> + + maint: distribute ChangeLog-2009 + * Makefile.am (EXTRA_DIST): Add ChangeLog-2009. + Spotted by Eli Zaretskii. + +2011-12-28 Jim Meyering <meyering@redhat.com> + + main.c: add some 'const' directives + * src/main.c (color_dict, fg_color, bg_color, cap): Declare const. + + No semantic change. + +2011-12-28 Jim Meyering <meyering@redhat.com> + + main.c: correct indentation and formatting style + * src/main.c: Correct many formatting inconsistencies. + No semantic change. + + avoid new syntax-check failures + * cfg.mk (old_NEWS_hash): Update, to accommodate old NEWS modification. + * src/main.c: Indent solely with spaces, never with TABs. + (should_colorize): Remove useless parens in #if directive. + +2011-12-28 Eli Zaretskii <eliz@gnu.org> + + Fix whitespace, indentation and documentation + * src/main.c (parse_grep_colors): Fix indentation. + (usage): Mention MS-Windows in help text for -U and -u options. + + update NEWS for MS-Windows changes + * NEWS: Mention MS-Windows related bugfixes and enhancements. + + Fix the test suite for MS-Windows. + * tests/include-exclude: Use --directories=skip, to avoid + gratuitous failures on systems that cannot grep directories. 
+ * tests/reversed-range-endpoints: Don't reject program names with + leading directories and drive letters. + * tests/warn-char-classes: Likewise. + + Support color highlighting on MS-Windows + * src/main.c (SGR_START, SGR_END, PR_SGR_FMT, PR_SGR_FMT_IF): Remove. + (PR_SGR_START, PR_SGR_START_IF): Replace with pr_sgr_start. + (PR_SGR_END, PR_SGR_END_IF): Replace with pr_sgr_end. + (pr_sgr_start, pr_sgr_end, should_colorize): New functions. + (w32_console_init, w32_sgr2attr, w32_clreol) [__MINGW32__]: New functions. + (main): Use should_colorize. Invoke w32_console_init. + +2011-12-24 Paul Eggert <eggert@cs.ucla.edu> + + don't ignore errors when reading a directory + grep no longer silently suppresses errors when reading a directory + as if it were a text file. For example, "grep x ." now reports a + read error on most systems; formerly, it ignored the error. + Problem reported as an aside by Bob Proulx (Bug#10355). + * NEWS: Document this. + * src/main.c (grep, grepfile): Implement this. Simplify the code + considerably. + * src/system.h (is_EISDIR): Remove; no longer needed. + + --include etc. now work on command-line args more consistently + --include and --exclude apply only to non-directories and + --exclude-dir applies only to directories. "-" (standard input) + is never excluded, since it is not a file name. + This bug was discovered while fixing a read-directory bug (Bug#10355). + * NEWS: Document this. + * src/main.c (main): Implement this. + * tests/include-exclude: Test for it. + +2011-12-24 Jim Meyering <meyering@redhat.com> + + build: update gnulib submodule to latest + +2011-12-12 Arnold D. Robbins <arnold@skeeve.com> + + doc: improve grep.texi + * doc/grep.texi: General editing for improved aesthetics. + Also fix a few problems. + +2011-12-12 Jim Meyering <meyering@redhat.com> + + build: use gnulib's iswctype wcscoll + * bootstrap.conf (gnulib_modules): Add iswctype and wcscoll. + * configure.ac: Remove explicit checks for those functions. 
+ * src/mbsupport.h (MBS_SUPPORT): Define to 1 if not already defined. + Remove the conditional, now that we're guaranteed by gnulib to have + wcscoll and iswctype. + Suggested by Alan Hourihane in http://savannah.gnu.org/bugs/?34930 + + disable the new input==output guard for additional options + * src/main.c (grepfile): Do not reject input == output also + when using a few other options. + * tests/in-eq-out-infloop: Test these new cases. + * NEWS (Bug fixes): Mention it + +2011-12-11 Nicolas Vigier <boklm@mars-attacks.org> + + do not reject "grep -qr . > out" + The recent fix to avoid an infinite disk-filling loop, commit 5e20a38a, + introduced a minor regression. If you use grep with -q and -r, and + redirect output to a file that will be traversed, then grep would + reject the command, even though it will generate no output. + In that case, there is no risk of an infinite loop. + * src/main.c (grepfile): Do not reject input == output when + using --quiet/--silent (-q). + Reported by J H Wilson in http://bugs.mageia.org/show_bug.cgi?id=3501 + forwarded by Nicolas Vigier to https://savannah.gnu.org/bugs/?34917 + +2011-11-29 Arnold Robbins <arnold@skeeve.com> + + dfa: do not call nl_langinfo in !MBS_SUPPORT mode + * src/dfa.c (using_utf8) [!MBS_SUPPORT]: Remove erroneous "defined" + in cpp test for MBS_SUPPORT. Since commit a163349d, MBS_SUPPORT is 0/1. + This error caused trouble only in the !MBS_SUPPORT case. + + dfa: avoid warning from deficient compiler in !MBS_SUPPORT mode + * src/dfa.c (setbit_wc) [!MBS_SUPPORT]: Add explicit "return false;" + after "abort ();", to avoid a warning from deficient compilers. + +2011-11-29 Jim Meyering <meyering@redhat.com> + + tests: use "compare exp out", not "compare out exp" + Likewise, when an empty file is expected, use "compare /dev/null out", + not "compare out /dev/null". I.e., specify the expected/desired contents + via the first file name. 
Prompted by a suggestion from Bruno Haible + in http://thread.gmane.org/gmane.comp.gnu.grep.bugs/4020/focus=29154 + + Run these commands: + + git grep -l -E 'compare [^ ]+ exp' \ + |xargs perl -pi -e 's/(compare) (\S+) (exp\S*)/$1 $3 $2/' + git grep -l -E 'compare [^ ]+ /dev/null' \ + |xargs perl -pi -e 's/(compare) (\S+) (\/dev\/null)/$1 $3 $2/' + +2011-11-29 Jim Meyering <meyering@redhat.com> + + build: update gnulib submodule to latest + +2011-11-28 Jim Meyering <meyering@redhat.com> + + build: accommodate -Werror=suggest-attribute=pure + Now that we're using the latest manywarnings module from gnulib, + accommodate gcc's -Werror=suggest-attribute=pure option by marking + suggested functions with gnulib-defined _GL_ATTRIBUTE_PURE. + * src/kwset.c (hasevery): Mark function with pure attribute. + (bmexec): Likewise. + * src/dfa.c (nsubtoks, istrstr, find_pred, dfamusts): Likewise. + * configure.ac: Disable (for lib/) options that seem not to be worth + the trouble: -Wunsuffixed-float-constants and -Wformat-nonliteral. + +2011-11-21 Bruno Haible <bruno@clisp.org> + + build: fix "make check" error on OSF/1 + * tests/Makefile.am (TESTS_ENVIRONMENT): Test the value of the variable + BASH_VERSION, not the literal ASH_VERSION. + +2011-11-21 Jim Meyering <meyering@redhat.com> + + portability: work consistently on *BSD systems + * src/dfa.c (is_valid_unibyte_character): Define. + (IS_WORD_CONSTITUENT): Use it here, to make grep work consistently + even on *BSD systems, which use different tables for ctype macros + like isalpha. http://thread.gmane.org/gmane.comp.gnu.grep.bugs/4022 + With help from Bruno Haible. + +2011-11-20 Jim Meyering <meyering@redhat.com> + + maint: consistently use NULL, not 0, when comparing pointers + * src/dfa.c (dfaanalyze): Compare trans[s] with NULL, not 0. + + maint: remove an avoidable #ifdef/#endif pair + * src/dfa.c (dfaanalyze): Remove avoidable #ifdef around "{". 
+ + tests: fix typo in last change + * tests/word-delim-multibyte: Use double quotes around $e_acute, + not single quotes. Spotted by Bruno Haible. + This and the preceding change do not resolve the XPASS failure + on OpenBSD 4.9 after all. See the explanation at + http://thread.gmane.org/gmane.comp.gnu.grep.bugs/4022 + + tests: avoid unwarranted test failure on *BSD-based systems + * tests/word-delim-multibyte (e_acute): Use a more portable + representation of e-acute. Reported by Bruno Haible. + +2011-11-19 Jim Meyering <meyering@redhat.com> + + maint: accommodate -Wdeclaration-after-statement, but only in dfa.c, + and because doing so does not impact readability/maintainability. + This is solely to accommodate gawk users who are stuck with ancient gcc. + This is no excuse to change any other code in grep. + * src/dfa.c (dfaoptimize, parse_bracket_exp): Move declaration + to precede first statement in block. + +2011-11-16 Jim Meyering <meyering@redhat.com> + + maint: post-release administrivia + * NEWS: Add header line for next release. + * .prev-version: Record previous version. + * cfg.mk (old_NEWS_hash): Auto-update. + + version 2.10 + * NEWS: Record release date. + + build: update gnulib submodule to latest + +2011-11-13 Jim Meyering <meyering@redhat.com> + + maint: update bootstrap and init.sh from gnulib + * tests/init.sh: Update from gnulib. + * bootstrap: Likewise. + +2011-11-12 Jim Meyering <meyering@redhat.com> + + build: update gnulib for exclude-test fixes + + tests: make our "export" replacement efficient with modern shells + * tests/Makefile.am (TESTS_ENVIRONMENT): Use a trivial and efficient + implementation with a shell that supports "export var=val". + Use the sed-invoking replacement only when necessary. + Improved by Stefano Lattarini. + + tests: make the replacement export function more robust + * tests/Makefile.am (sed_quote_value): Also quote single quotes. + Remove sed's -e options. Not needed. 
+ +2011-11-12 Bruno Haible <bruno@clisp.org> + + tests: fix test suite execution failure on OSF/1 5.1 + * tests/Makefile.am (TESTS_ENVIRONMENT): Use a shell function to + ensure that we use only the portable form of the 'export' shell + built-in. + + tests: don't assume that /bin/bash exists + * tests/fedora: Run using /bin/sh, not /bin/bash. + + tests: avoid unwarranted failures due to SATAN's timeout + * tests/init.cfg (require_timeout_): Also ensure that + timeout exits with its child's exit status. + + build: fix compilation error on MSVC 9 due to Pexecute() declaration + * src/pcresearch.c (WITHOUT_PCRE_NORETURN): Remove macro. + (Pexecute): Replace abort() call with code that does not trigger GCC + warnings. + + tests: fix high-bit-range test failure on OSF/1 5.1 + * tests/high-bit-range: Use octal escape instead of hexadecimal escape + sequence. + +2011-11-11 Jim Meyering <meyering@redhat.com> + + build: update gnulib for solaris test fix + +2011-11-10 Jim Meyering <meyering@redhat.com> + + build: update gnulib submodule to latest + + maint: adjust the URL that will appear in the generated announcement + * cfg.mk (url_dir_list): Use this http://ftp.gnu.org/gnu/$(PACKAGE) + for the first link listed in the generated announcement. + announce-gen now provides the faster mirror link automatically. + +2011-11-06 Jim Meyering <meyering@redhat.com> + + build: stop distributing gzip'd releases; xz is enough + * configure.ac (AM_INIT_AUTOMAKE): Add no-dist-gzip. + * NEWS (Build-related): Mention that we're dropping .tar.gz. + + build: update gnulib submodule to latest + +2011-10-14 Stefano Lattarini <stefano.lattarini@gmail.com> + + distcheck: ensure dist-hook fails if syntax-check fails + * Makefile.am (run-syntax-check): Fix logic, to ensure that + the recipe of this target returns a non-zero exit status if + "make syntax-check" fails.
+ +2011-10-12 Jim Meyering <meyering@redhat.com> + + build: update gnulib submodule to latest + This should fix a few portability problems, including one on HP-UX + and a test-float failure on PPC, reported by Andreas Metzler. + +2011-10-10 Stefano Lattarini <stefano.lattarini@gmail.com> + + gitignore: merge top-level and tests/ .gitignore files + * tests/.gitignore: Remove; what little remained of its + contents has been moved ... + * .gitignore: ... here. + + tests: tiny simplification in TESTS_ENVIRONMENT definition + * tests/Makefile.am (TESTS_ENVIRONMENT): Remove redundant use of + `export'. + +2011-10-10 Stefano Lattarini <stefano.lattarini@gmail.com> + + tests: support development version of automake too + This change implements a more correct and idiomatic use of the + features of the Automake-provided 'parallel-tests' harness. + Moreover, this change is required in order for the testsuite to + continue to work with the new testsuite harness that is planned + to be introduced in Automake 1.12 (which, as of the writing date, + is still under development and in late alpha state). + + * tests/Makefile.am (TESTS_ENVIRONMENT): The development version of + automake does not support setting the interpreter delegated to run + the test scripts in this variable; instead, use ... + (LOG_COMPILER): ... this variable. + * .gitignore: Ignore `.trs' files in directory `tests/'. + * build-aux/.gitignore: Ignore `test-driver' script. + +2011-10-03 Eli Zaretskii <eliz@gnu.org> + + dfa: don't mishandle high-bit bytes in a regexp with signed-char + This appears to arise only on systems for which "char" is signed. + * src/dfa.c (FETCH_WC, FETCH): Produce an unsigned value, rather + than a sign-extended one. Fixes a bug on MS-Windows with compiling + patterns that include characters with the 8th bit set. + (to_uchar): Define. From coreutils. + Reported by David Millis <tvtronix@yahoo.com>.
+ See http://thread.gmane.org/gmane.comp.gnu.grep.bugs/3893 + * NEWS (Bug fixes): Mention it. + +2011-09-16 Jim Meyering <meyering@redhat.com> + + maint: dfa: simplify multi-byte-related conditionals + * src/dfa.c (setbit_case_fold_c, parse_bracket_exp, lex): + (addtok_mb, dfaparse): Change each "MBS_SUPPORT && MB_CUR_MAX > 1" + test to just "MB_CUR_MAX > 1". + * src/dfasearch.c (kwsincr_case, EGexecute): Likewise. + * src/kwsearch.c (Fcompile, Fexecute): Likewise. + * src/searchutils.c (kwsinit): Likewise. + * src/dfa.c (parse_bracket_exp): Convert + "if (!MBS_SUPPORT || MB_CUR_MAX == 1)" to + "if (MB_CUR_MAX == 1)" and do this: + - assert(!MBS_SUPPORT || MB_CUR_MAX == 1); + + assert(MB_CUR_MAX == 1); + + maint: dfa: simplify several expressions + * src/dfa.c (dfainit): Set d->mb_cur_max unconditionally, now + that MB_CUR_MAX is always usable. With that, simplify all + "MBS_SUPPORT && d->mb_cur_max > 1" to simply "d->mb_cur_max > 1". + (dfastate, dfaexec, dfainit, dfafree): Simplify, removing each + now-unnecessary "MBS_SUPPORT &&". + + maint: dfa: avoid in-function "#if MBS_SUPPORT" tests + * src/dfa.c (setbit_case_fold_c): Remove "#if MBS_SUPPORT" in favor + of simple "if (MBS_SUPPORT ...". + (dfaexec, addtok): Likewise. + + maint: ensure that MB_CUR_MAX is defined even when !MBS_SUPPORT + * src/mbsupport.h [!MBS_SUPPORT] (MB_CUR_MAX): Define to 1. + + build: fix compilation failure when MBS_SUPPORT is 0 + * src/dfa.c (add_utf8_anychar): Always compile this function, + but when MBS_SUPPORT is 0, give it an empty body. + (prepare_wc_buf): Likewise. + [! MBS_SUPPORT] (setbit_wc): Define to always abort. + + maint: dfa: simplify dfaoptimize + * src/dfa.c (dfaoptimize): Simplify. + (dfacomp): Remove now-redundant "if (MBS_SUPPORT)" guard, + since dfaoptimize does nothing if !MBS_SUPPORT. + + maint: dfa: remove some #if MBS_SUPPORT guards + * src/dfa.c: Replace a few "#if MBS_SUPPORT" directives with + "if (MBS_SUPPORT)". Remove some altogether. 
+ + maint: dfa: convert #if-MBS_SUPPORT (dfastate) + * src/dfa.c (dfastate): Use regular "if", not #if MBS_SUPPORT. + + maint: dfa: convert #if-MBS_SUPPORT (dfastate) + * src/dfa.c (dfastate): Use regular "if", not #if MBS_SUPPORT. + + maint: dfa: convert #if-MBS_SUPPORT (state_index) + * src/dfa.c (state_index): Use regular "if", not #if MBS_SUPPORT. + + maint: dfa: convert #if-MBS_SUPPORT (dfaparse) + * src/dfa.c (dfaparse): Use regular "if", not #if MBS_SUPPORT. + + maint: dfa: convert #if-MBS_SUPPORT (copytoks) + * src/dfa.c (copytoks): Use regular "if", not #if MBS_SUPPORT. + + maint: dfa: convert #if-MBS_SUPPORT (lex) + * src/dfa.c (lex): Use regular "if", not #if MBS_SUPPORT. + + maint: dfa: convert #if-MBS_SUPPORT (parse_bracket_exp) + * src/dfa.c (parse_bracket_exp): Use regular "if", not #if MBS_SUPPORT. + + maint: dfa: convert #if-MBS_SUPPORT (parse_bracket_exp) + * src/dfa.c (parse_bracket_exp): Use regular "if", not #if MBS_SUPPORT. + + maint: dfa: convert #if-MBS_SUPPORT (parse_bracket_exp) + * src/dfa.c (parse_bracket_exp): Use regular "if", not #if MBS_SUPPORT. + + maint: dfa: convert #if-MBS_SUPPORT (dfaexec) + * src/dfa.c (dfaexec): Use regular "if", not #if MBS_SUPPORT. + + maint: dfa: convert #if-MBS_SUPPORT (dfaexec) + * src/dfa.c (dfaexec): Use regular "if", not #if MBS_SUPPORT. + Also add curly braces around multi-line if/else blocks. + + maint: dfa: remove #if-MBS_SUPPORT (free_mbdata) + * src/dfa.c (free_mbdata): Remove the #if guard altogether. + + maint: dfa: convert #if-MBS_SUPPORT (dfaoptimize, dfacomp) + * src/dfa.c (dfaoptimize, dfacomp): Use regular "if", + not #if MBS_SUPPORT. + + maint: dfa: convert #if-MBS_SUPPORT (dfafree) + * src/dfa.c (dfafree): Use regular "if", not #if MBS_SUPPORT. + + maint: dfa: convert #if-MBS_SUPPORT (parse_bracket_exp, part1) + * src/dfa.c (parse_bracket_exp): Remove in-function #if MBS_SUPPORT.
+ + maint: remove #if-MBS_SUPPORT declaration guards + * src/search.h: Don't bother to #if-out declarations. + + maint: convert #if-MBS_SUPPORT (EGexecute) + * src/dfasearch.c (EGexecute): Remove in-function #if MBS_SUPPORT. + + maint: convert #if-MBS_SUPPORT (kwsincr_case) + * src/dfasearch.c (kwsincr_case): Remove in-function #if MBS_SUPPORT. + Move decl's down. + + maint: convert #if-MBS_SUPPORT (Fcompile, etc.) + * src/kwsearch.c (Fcompile, Fexecute): Remove in-function #if MBS_SUPPORT. + (Fcompile): Rearrange some declarations. No semantic change. + + maint: convert #if-MBS_SUPPORT (kwsinit) + * src/searchutils.c (kwsinit): Remove in-function #if MBS_SUPPORT. + + maint: dfa: remove case-guarding #if-MBS_SUPPORT + * src/dfa.c [DEBUG] (prtok): Remove now-useless #if-MBS_SUPPORT. + +2011-09-15 Jim Meyering <meyering@redhat.com> + + maint: remove #if MBS_SUPPORT around member declaration + * src/dfa.c (dfastate): Don't #ifdef-out "mbps" position_set member. + + maint: dfa: remove #if MBS_SUPPORT around struct definition + * src/dfa.c (struct mb_char_classes): Don't #ifdef-out declarations. + + build: avoid compilation failure when building without PCRE support + * src/pcresearch.c [!HAVE_LIBPCRE] (WITHOUT_PCRE_NORETURN): Define + to _Noreturn, not obsoleted-by-gnulib _GL_ATTRIBUTE_NORETURN. + Reported by Eric Blake. + + tests: stop using skip_test_; use skip_ instead + * tests/init.cfg (skip_test_): Remove definition. Use the improved + skip_ function from init.sh, now that it has the same feature. + * tests/euc-mb: s/skip_test_/skip_/ + * tests/sjis-mb: Likewise. + * tests/fmbtest: Likewise. + + tests: skip tests that require MBS support + * tests/init.cfg (require_compiled_in_MB_support): New function. + * tests/char-class-multibyte: Use it here, since this test cannot + succeed without MBS support. + * tests/equiv-classes: Likewise. + * tests/euc-mb: Likewise. + * tests/fgrep-infloop: Likewise. + * tests/init.cfg: Likewise. 
+ * tests/prefix-of-multibyte: Likewise. + * tests/turkish-I: Likewise. + * tests/sjis-mb: Likewise. + + tests: make fmbtest explain (to stderr, not log) why it is skipped + * tests/fmbtest: Use skip_ and fail_ to give better diagnostics. + + maint: dfa: improve comments + * src/dfa.c (match_mb_charset, match_anychar): Improve comments. + +2011-09-14 Jim Meyering <meyering@redhat.com> + + build: update gnulib submodule to newer + + maint: correct indentation + * src/dfa.c (dfaexec): Reposition curly braces to match indentation style. + Remove useless comment. + + maint: move declaration "down" to inner scope where it is used + * src/dfa.c (dfaexec): Move decl of local down into scope where used. + +2011-09-07 Jim Meyering <meyering@redhat.com> + + doc: use "file name" consistently in grep's --help output + * src/main.c (usage): Use "file name", not "filename" in descriptions + of --with-filename (-H), --no-filename (-h) and --label=LABEL. + Suggested by Sequoia McDowell. + + bug: requires ru_RU.KOI8-R". [bug introduced in grep-2.9] + +2011-08-31 Matthew Burgess <matthew@linuxfromscratch.org> + + tests: remove debug code that would cp to /t + * tests/unibyte-bracket-expr: Remove debug artifact introduced + by 2011-06-02 commit de5f7000, "tests: exercise a uni-byte [...] + bug: requires ru_RU.KOI8-R". [bug introduced in grep-2.9] + +2011-08-20 Jim Meyering <meyering@redhat.com> + + build: use largefile module and update to latest gnulib + * configure.ac: Remove AC_SYS_LARGEFILE, subsumed by ... + * bootstrap.conf (gnulib_modules): ...this. Use largefile module. + * gnulib: Update to latest. + + maint: clean up and plug a leak-on-OOM + * src/dfa.c (icatalloc): Clean up; use xrealloc in place of malloc + and realloc; remove conditionals that are unnecessary, now that + failed allocation results in exit. + (enlist): Use xrealloc in place of realloc; remove conditional. + (comsubs): Avoid leak upon failed enlist call. + (dfamust): Use xmalloc in place of malloc. 
+ Remove conditionals, now that icpyalloc and icatalloc never return NULL. + + maint: use x2nrealloc, not xrealloc + * src/main.c (main): Use x2nrealloc, not xrealloc + +2011-07-24 Jim Meyering <meyering@redhat.com> + + tests: add a test to trigger the bug + * tests/Makefile.am (TESTS): Add it. + * tests/in-eq-out-infloop: Exercise the bug/fix. + + exit 2 (rather than infloop) when an input file is also on stdout + This avoids a potential "infinite" disk-filling loop. + Reported in http://savannah.gnu.org/patch/?5316 + and http://savannah.gnu.org/bugs/?17457. + * src/main.c: Include "quote.h". + (out_stat): New global. + (grepfile): Compare each regular file's dev/ino/etc. + with those from the file on stdout (if it too is regular). + (main): Set out_stat, if stdout is a regular file. + * src/system.h: Include "same-inode.h". + (same_file_attributes): Define. From diffutils. + (SAME_REGULAR_FILE): Define. + * bootstrap.conf (gnulib_modules): Use quote, not quotearg. + Use same-inode. + * NEWS (Bug fixes): Mention it. + +2011-07-15 Reuben Thomas <rrt@sc3d.org> + + doc: improve documentation of character classes in the man page + * doc/grep.in.1: Reword documentation of character classes. + +2011-07-12 Jim Meyering <meyering@redhat.com> + + dfa: remove unnecessary inclusion of verify.h + * src/dfa.c: Don't include "verify.h". + + dfa: simplify use of *ALLOC macros + * src/dfa.c (XNMALLOC, XCALLOC): Redefine without outer cast-to-(t *). + (CALLOC, MALLOC, REALLOC): Remove type "t" parameter and adjust callers. + + dfa: change semantics of REALLOC_IF_NECESSARY's 3rd parameter + * src/dfa.c (REALLOC_IF_NECESSARY): Change meaning of 3rd param, + from "maximum index" to 1 greater than that: the required number + of *P-sized elements. Note that only some of the uses of + REALLOC_IF_NECESSARY needed to be adjusted, the others had already + required an extra element. 
+ + dfa: rename REALLOC_IF_NECESSARY param/local for clarity + * src/dfa.c (REALLOC_IF_NECESSARY): Rename nalloc and new_nalloc + to n_alloc and new_n_alloc. + + dfa: prepare for a semantic change in REALLOC_IF_NECESSARY + * src/dfa.c (REALLOC_IF_NECESSARY): Remove "t" (type) parameter. + Use (*p) instead. Adjust all callers. + + dfa: add braces to REALLOC_IF_NECESSARY definition + * src/dfa.c (REALLOC_IF_NECESSARY): Add curly braces; use TABs + to right-indent. + +2011-06-28 Paolo Bonzini <bonzini@gnu.org> + + doc: improve documentation of character classes + * doc/grep.texi (Character classes): Mention explicitly when + examples refer to the C locale, explain better the general + meaning of character classes. + +2011-06-28 Jim Meyering <meyering@redhat.com> + + dfa: fix the root cause of the heap overrun + dfa's "insert" function was supposed to be maintaining the position + list sorted on *decreasing* index, but since the 2009-12-09 "Speed + up insert" commit, 62458291, it was using code that assumed the data + were sorted on *increasing* index. As such, sometimes it would no + longer merge constraints (not finding a match) and would append + entries that normally would have matched and been merged. Those + erroneous append operations resulted in the heap overrun fixed by + 2011-06-17 commit 0b91d692 by doubling the array size. + * src/dfa.c (insert): Fix the comparison. + (dfaanalyze): Now that that's fixed, revert commit 0b91d692, + allocating space for only d->nleaves entries, not double that. + As far as I can tell, this change has no effect other than + decreased memory usage, although it may improve performance + slightly, since the resulting list of positions is half as long + as it used to be. + +2011-06-28 Paolo Bonzini <bonzini@gnu.org> + + dfa: use memcpy to copy position_sets + * src/dfa.c (copy): Use memcpy. + + dfa: use copyset to copy charclasses + * src/dfa.c (add_utf8_anychar): Change memcpy to copyset. 
+ + gnulib: Update + Fixes mmap-anon.m4 conflict with fn_grep, reported by Rainer Orth. + +2011-06-21 Jim Meyering <meyering@redhat.com> + + maint: update bootstrap from gnulib + * bootstrap: Update to latest, so it no longer inserts empty lines + in .gitignore files. + * .gitignore: Let bootstrap move "!..." lines to end of file. + + post-release administrivia + * NEWS: Add header line for next release. + * .prev-version: Record previous version. + * cfg.mk (old_NEWS_hash): Auto-update. + + version 2.9 + * NEWS: Record release date. + + build: avoid a warning when building with --disable-perl-regexp... + and --enable-gcc-warnings. + * src/pcresearch.c (WITHOUT_PCRE_NORETURN): Define. + Remove the unreachable return statement. + Reported by Eric Blake. + + tests: ensure that each test script is executable + This adds a rule run at "make check" time to ensure that + test scripts are consistently executable. + This change is not required for "make check", but makes it easier + for people to run scripts manually, though that is discouraged because + doing so makes it easy to omit important variable settings that + are normally provided via TESTS_ENVIRONMENT. + This change also makes each of the existing TESTS executable. + * tests/Makefile.am (check_executable_TESTS): New rule. + (check): Depend on it. + * tests/{all_scripts}: chmod 755. + Prompted by a report from Eric Blake. + + maint: update bootstrap from gnulib + * bootstrap: Update from gnulib. + + maint: update po/POTFILES.in + * po/POTFILES.in: Remove dfasearch.c, now that it no longer + contains a translatable diagnostic. + + tests: include-exclude: avoid false positive failure on FreeBSD + * tests/include-exclude: Avoid false-positive failure due to + matching "a" in a directory on FreeBSD, when searching a directory + without "-r". Search for '^aaa$' rather than just 'a'. + Adjust test inputs and expected output files accordingly.
+ + dfa: remove some useless casts + * src/dfa.c (icatalloc): Change type of "old" parameter + from "char const *" to "char *". + Don't cast-away const on realloc argument. + Remove now-unnecessary const-discarding cast. + Don't (void)-cast strcpy result. + * src/dosbuf.c (undossify_input): Remove anachronistic + cast-to-"char *" of realloc argument. + + dfa: more heap-allocation-related overflow protection + * src/dfa.c (enlist): Use xnrealloc, not realloc. + Also, remove unnecessary cast-to-(char *). + (dfamust): Use xnmalloc, not malloc. Before, this code would + return upon malloc failure (xnmalloc exits upon failure), but + later, via the *ALLOC macros, it could already exit, so this + new potential exit point is nothing new. The same applies + to enlist, since it is called only through dfamust. + + tests: update init.sh; simplify TESTS_ENVIRONMENT + * tests/init.sh: Update from coreutils. + * tests/Makefile.am (TESTS_ENVIRONMENT): Remove shell_or_perl_ + function. Instead, just use $(SHELL), since grep has no test + that starts with #!/usr/bin/perl. + +2011-06-20 Jim Meyering <meyering@redhat.com> + + build: update gnulib submodule to latest + + build: avoid configure/gnulib-related errors + * bootstrap.conf: Remove now-unnecessary code to exclude + gettext/intl-related m4 tests. + +2011-06-19 Jim Meyering <meyering@redhat.com> + + maint: tighten up superfluous code + * src/main.c (parse_grep_colors): Use xstrdup in place of xmalloc, + a useless test, strlen, and strcpy. + +2011-06-19 Paul Eggert <eggert@cs.ucla.edu> + + dfa: avoid possibility of overflow + * src/dfa.c (REALLOC_IF_NECESSARY, CALLOC, MALLOC, REALLOC): + Use functions from xalloc.h to avoid overflow. + * src/dfasearch.c (GEAcompile): Use xnrealloc rather than realloc. + * src/pcresearch.c (Pcompile): Use xnmalloc, not xmalloc. 
+ +2011-06-17 Jim Meyering <meyering@redhat.com> + + build: update gnulib submodule to latest + + dfa: correct two uses of btowc + * src/dfa.c (setbit_c, setbit_case_fold_c): Compare the btowc + return value against WEOF, not EOF. Suggested by Eli Zaretskii. + On a system like MinGW with unsigned wint_t, comparing a btowc + return value against EOF (-1) would always be false. + + dfa: don't overrun a malloc'd buffer for certain regexps + * src/dfa.c (dfaanalyze): Allocate space for twice as many + positions as there are leaves. Before this change, for some + regular expressions, DFA analysis would have inserted far more + "positions" than dfa->nleaves (up to double). + Reported by Raymond Russell in http://savannah.gnu.org/bugs/?33547 + * tests/dfa-heap-overrun: Trigger the overrun. + * tests/Makefile.am (TESTS): Add it. + * NEWS (Bug fixes): Mention it. + +2011-06-08 Jim Meyering <meyering@redhat.com> + + tests: don't ignore sjis-mb test failure + I made changes that caused grep to segfault during "make check" -- + as seen in dmesg output -- yet no test failed(!), and there was no + trace of the segfault in the logs. + * tests/sjis-mb (test_grep_reject): Ensure that output is empty. + Don't ignore test failure. + +2011-06-07 Paolo Bonzini <bonzini@gnu.org> + + dfa: optimize wide characters in a bracket expression + * src/dfa.c (addtok): Compile characters to an alternation. Handle the + case when nothing else remains in the MBCSET. + + dfa: refactor to prepare for upcoming optimizations + * src/dfa.c (parse_bracket_exp): Move optimization of MBCSET from here... + (addtok): ... to here. + +2011-06-07 Paolo Bonzini <bonzini@gnu.org> + + dfa: correct handling of single-byte character ranges + This provides a better fix for the unibyte-bracket-expr and high-bit-range + testcases, and fixes the latent bug tested by bogus-wctob. + + * src/dfa.c (setbit_case_fold): Remove, replace with... + (setbit_wc, setbit_c, setbit_case_fold_c): ... these. 
+ (parse_bracket_exp): Use setbit_case_fold_c when iterating over + single-byte sequences. Use setbit_wc for multi-byte character sets, + and setbit_case_fold_c for single-byte character sets. + (lex): Use setbit_case_fold_c for single-byte character sets. + +2011-06-07 Paolo Bonzini <bonzini@gnu.org> + + tests: exercise latent bug in character ranges + * tests/bogus-wctob: New. + * Makefile.am (TESTS): Add it. + +2011-06-07 Jim Meyering <meyering@redhat.com> + + tests: exercise a uni-byte [...] bug: requires ru_RU.KOI8-R + * tests/unibyte-bracket-expr: New file. + * tests/Makefile.am (TESTS): Add it. + * init.cfg (require_ru_RU_koi8_r): New function. + + fix the [...] bug also for relatively unusual uni-byte encodings + * src/dfa.c (setbit_case_fold): Also handle uni-byte locales + like the one mentioned in the original report: see 2011-05-07 + commit d98338eb. Re-reported by Santiago Ruano Rincón. + Note that most uni-byte locales are not affected. + * NEWS (Bug fixes): Mention it. + + tests: use skip_test_, not skip_ + Use skip_test_, not skip_. The former prints its message both to + the log file and to FD 9 (redirected to tty via tests/Makefile.am), + while skip_ prints only to stderr, which goes to the log file. + * tests/init.cfg (skip_test_): New function. + Use skip_test_ in place of skip_ everywhere. + * tests/fmbtest: s/skip_/skip_test_/ + * tests/sjis-mb: Likewise. + * tests/euc-mb: Likewise. + + tests: fmbtest: factor + * tests/fmbtest: Factor out locale-name duplication. + + tests: fix skip-inducing typo in fmbtest + * tests/fmbtest: Fix locale name typo (s/cz_CZ/cs_CZ/) + that would cause this test to be skipped every time. + +2011-06-07 Paolo Bonzini <bonzini@gnu.org> + + gnulib: adjust included modules + * bootstrap.conf (gnulib_modules): Drop strtoul, rename wctype to + wctype-h. 
+ +2011-05-21 Jim Meyering <meyering@redhat.com> + + grep -P: don't abort upon exceeding PCRE's backtracking limit + * src/pcresearch.c (Pexecute): Handle PCRE_ERROR_MATCHLIMIT. + * tests/Makefile.am (XFAIL_TESTS): Remove pcre-abort. + * tests/pcre-abort: Expect failure, no output, and increase + the length of the input string, in case the backtracking limit + is ever raised. Adjust comment. + * NEWS (Bug fixes): Mention it. + + tests: show how to make grep -P abort + * tests/pcre-abort: New file. + Minimal testcase by Paolo Bonzini, derived from a report + by www.beaver@list.ru. + * tests/Makefile.am (TESTS): Add it. + (XFAIL_TESTS): Add it here, too, since this test always fails, for now. + + tests: fix oddities in pcre-z + * tests/pcre-z: Redirect stderr inside $(), not outside. + Remove double quotes around $REGEX (which is just 'a') within + double-quoted "$(...)". Split a long line. + + tests: factor out a new require_pcre_ function + * tests/init.cfg (require_pcre_): New function, factored out of... + * tests/pcre-z: ...here. Use the function. + * tests/pcre: Likewise. + + tests: clean up pcre + * tests/pcre: Skip (don't pass) the test when PCRE support is disabled. + Don't redirect so much to /dev/null, now that all test output goes to + pcre.log. Remove unnecessary braces and diagnostic about failing test. + +2011-05-13 Jim Meyering <meyering@redhat.com> + + post-release administrivia + * NEWS: Add header line for next release. + * .prev-version: Record previous version. + * cfg.mk (old_NEWS_hash): Auto-update. + + version 2.8 + * NEWS: Record release date. + + build: update gnulib, for fixed getcwd test + + build: update gnulib submodule to latest + + maint: remove syntax-checking sc_tight_scope rule + * src/Makefile.am (sc_tight_scope): Remove rule. + Now it's provided via gnulib's maint.mk. + * cfg.mk (sc_tight_scope): Likewise. 
+ +2011-05-08 Jim Meyering <meyering@redhat.com> + + maint: use consistent declaration syntax + * src/grep.h (matchers): Declare consistently, so the sc_tight_scope + rule detects this as an extern-marked variable. + +2011-05-07 Jim Meyering <meyering@redhat.com> + + maint: use gnulib's new readme-release module + * bootstrap.conf (gnulib_modules): Add readme-release. + (bootstrap_epilogue): Add the recommended perl one-liner. + * README-release: Remove file; it is now generated from gnulib. + * .gitignore: Add it. + * gnulib: Update submodule to latest. + + tests: exercise bug with 0x80..0xff in [...] + * tests/high-bit-range: New test, inspired by an example in the + report by Igor O. Ladygin: http://bugs.debian.org/624387, + via Santiago Ruano Rincón's http://savannah.gnu.org/bugs/?33198 + * tests/Makefile.am (TESTS): Add it. + + fix a bug whereby echo c|grep '[c]' would fail for any c in 0x80..0xff + * src/dfa.c (setbit_case_fold) [MBS_SUPPORT]: Set the bit also + when wctob returns EOF. + * NEWS (Bug fixes): Mention it. + +2011-05-02 Reuben Thomas <rrt@sc3d.org> + + doc: correct comment about mmap + * doc/grep.texi (Other Options) [--mmap]: This option is now + ignored, so using it can have no effect on performance. + +2011-05-02 Arnold D. Robbins <arnold@skeeve.com> + + build: move add_utf8_anychar into MBS ifdef + +2011-05-01 Arnold D. Robbins <arnold@skeeve.com> + + maint: remove GAWK ifndef; no longer needed + +2011-05-01 Jim Meyering <meyering@redhat.com> + + maint: remove now-unnecessary use of gnulib's strtol module + * bootstrap.conf (gnulib_modules): Remove now-obsolete "strtol". + +2011-04-29 Jim Meyering <meyering@redhat.com> + + maint: tweak README-release + * README-release: Add note to check the NixOS/Hydra autobuilder results. 
+ +2011-04-28 Jim Meyering <meyering@redhat.com> + + build: update gnulib submodule to latest + + maint: add the tight_scope syntax-checking rule + This ensures that the only externally scoped symbols are ones + that are explicitly marked as "extern" or white-listed like "main". + * src/Makefile.am (sc_tight_scope): New rule, copied from coreutils. + * cfg.mk (sc_tight_scope): Define, to hook to it from the top level. + + maint: mark some function declarations as extern + * src/search.h: Add "extern" keyword to each function declaration. + +2011-04-23 Jim Meyering <meyering@redhat.com> + + maint: fix doubled-word typos in comments + * src/dfa.c (SUCCEEDS_IN_CONTEXT): Remove doubled "a". + * src/dfa.c (BACKREF): s/it it/it is/ + +2011-04-09 Jim Meyering <meyering@redhat.com> + + maint: fix typos in comments: s/can not/cannot/ + * src/dfa.c (check_matching_with_multibyte_ops, dfastate): As above. + +2011-03-19 Jim Meyering <meyering@redhat.com> + + maint: stop using .x-sc_* files to list syntax-check exemptions + Instead, use the new mechanism with which you merely use a + variable (derived from the rule name) defined in cfg.mk to an ERE + matching the exempted file names. + * gnulib: Update to latest, to get maint.mk that implements this. + * .x-sc_bindtextdomain: Remove file. + * .x-sc_prohibit_tab_based_indentation: Likewise. + * .x-sc_prohibit_xalloc_without_use: Likewise. + * .x-sc_space_tab: Likewise. + * cfg.mk: Define variables to exempt the same files. + + build: correct my change of 2011-01-28 + Do not override original dist-hook rule. + * Makefile.am (run-syntax-check): Rename from overriding dist-hook. + (dist-hook): Depend on run-syntax-check. + +2011-02-27 Jim Meyering <meyering@redhat.com> + + maint: update from gnulib + * bootstrap: Update from gnulib. + * tests/init.sh: Likewise. + * gnulib: Update to latest. 
+ +2011-01-27 Jim Meyering <meyering@redhat.com> + + build: update gnulib submodule to latest + + build: run syntax-check rules as part of "make dist" + * Makefile.am (dist-hook): Depend on syntax-check. + Suggested by Reuben Thomas. + +2011-01-26 Jim Meyering <meyering@redhat.com> + + maint: remove unneeded #include directives + * lib/savedir.c: Don't include <stddef.h>. Not needed. + * src/dfa.c: Likewise. + +2011-01-22 Jim Meyering <meyering@redhat.com> + + build: avoid new syntax-check failures + * .x-sc_bindtextdomain: New file, used to avoid a spurious + failure from the new syntax-check rule. + * NEWS: Remove a trailing space. + +2011-01-19 Jim Meyering <meyering@redhat.com> + + tests: add a known-to-fail test + * tests/turkish-I: New test. + * tests/Makefile.am (TESTS): Add it. + (XFAIL_TESTS): Add here, too. + Reported by Ilya Basin. + + maint: sort test names in Makefile.am + * tests/Makefile.am (TESTS): Sort test names. + +2011-01-05 Jim Meyering <meyering@redhat.com> + + doc: remove erroneous "{,m}" item from grep man page + * doc/grep.in.1: Remove item describing bogus {,m} regex notation. + Reported by Fernando Basso. + +2011-01-03 Jim Meyering <meyering@redhat.com> + + maint: update copyright year ranges to include 2011 + Run "make update-copyright", so "make syntax-check" works in 2011. + + build: update gnulib submodule to latest + +2010-12-20 Paolo Bonzini <bonzini@gnu.org> + + main: fix exit status on xmalloc failures + * NEWS: Update. + * src/main.c (main): Set exit_failure. Reported by Guy Shaw. + + add comment above fn_grep + * configure.ac (fn_grep): Add comment suggested by Bruno Haible. + +2010-11-14 Paolo Bonzini <bonzini@gnu.org> + + grep: add include guards + * src/system.h: Add multiple inclusion guards. + * src/grep.h: Likewise. + + configure: fix M4 quotation + * configure.ac: Add extra brackets around [...] patterns. + + configure: remove dependency on grep that supports long lines and -e + * configure.ac (fn_grep): New. 
Set GREP and EGREP to it, replace + with newly-built grep before AC_OUTPUT. Reported by Florin Iucha + <http://savannah.gnu.org/bugs/?31646>. + +2010-11-04 Jim Meyering <meyering@redhat.com> + + build: update gnulib to latest + + tests: don't hard-code a 5-second timeout; that's not always enough + Instead, time the command in the C locale and use 10 times that + duration -- rounded up to whole seconds -- as the timeout when running + it in the UTF-8 locale. + * tests/backref-multibyte-slow: Compute a performance-relative timeout. + Reported by Gilles Espinasse, regarding an imac 400. For more details, + see http://thread.gmane.org/gmane.comp.gnu.grep.bugs/3360 + +2010-10-09 Jim Meyering <meyering@redhat.com> + + maint: describe policy on copyright year number ranges + * README: Mention coreutils' long-standing policy on use of M-N + ranges in copyright year lists. Requested by Richard Stallman. + +2010-10-04 Dmitry V. Levin <ldv@altlinux.org> + + build: compile gnulib without -Wcast-align to avoid warnings on ARM + * configure.ac (GNULIB_WARN_CFLAGS): Remove -Wcast-align. + +2010-09-30 Jim Meyering <meyering@redhat.com> + + maint: don't define a gpg_key_ID. now it's obtained automatically + * cfg.mk (gpg_key_ID): Remove definition. No longer needed. + +2010-09-23 Paolo Bonzini <bonzini@gnu.org> + + tests: add testcase for previous fix + * tests/inconsistent-ranges: New. + * tests/Makefile.am (TESTS): Add it. + +2010-09-23 Paolo Bonzini <bonzini@gnu.org> + + dfa: process range expressions consistently with system regex + The actual meaning of range expressions in glibc is not exactly strcoll, + which makes the behavior of grep hard to predict when compiled with the + system regex. Leave to the system regex matcher the decision of which + single-byte characters are matched by a range expression. 
+ + This partially reverts a change made in commit 0d38a8bb (which made + sense at the time, but not now that src/dfa.c is not doing multibyte + character set matching anymore). + + * src/dfa.c (in_coll_range): Remove. + (parse_bracket_exp): Use system regex to find which single-char + bytes match a range expression. + +2010-09-23 Bruno Haible <bruno@clisp.org> + + build: fix link error on systems that have libiconv but not libintl + * src/Makefile.am (LDADD): Add $(LIBICONV). + +2010-09-21 Jim Meyering <meyering@redhat.com> + + build: avoid compilation failure on the Hurd + * src/dfasearch.c (dfawarn): Rename enum symbols to use DW_ prefix, + so as not to collide with "GNU", which is defined by the Hurd. + Reported by Matthias Lanzinger in http://savannah.gnu.org/bugs/?31096 + +2010-09-20 Jim Meyering <meyering@redhat.com> + + maint: avoid obsolete gnulib modules + * bootstrap.conf (gnulib_modules): Don't use obsolete atexit module. + Use malloc-gnu and realloc-gnu -- malloc and realloc are obsolete. + + maint: update README-release + * README-release: Reflect changes in coreutils' version of this file. + +2010-09-20 Aharon Robbins <arnold@skeeve.com> + + dfa: fix compilation when not using MBS + * src/dfa.c (prepare_wc_buf) [!MBS_SUPPORT]: Do not compile this + function. + +2010-09-16 Jim Meyering <meyering@redhat.com> + + post-release administrivia + * NEWS: Add header line for next release. + * .prev-version: Record previous version. + * cfg.mk (old_NEWS_hash): Auto-update. + + version 2.7 + * NEWS: Record release date. + +2010-09-13 Paolo Bonzini <bonzini@gnu.org> + + tests: add equiv-classes + * configure.ac (USE_INCLUDED_REGEX): Add Automake conditional. + * tests/equiv-classes: New test. + * tests/Makefile.am (TESTS): Add it. + (XFAIL_TESTS) [USE_INCLUDED_REGEX]: Mark it as expected failure. 
+ +2010-09-13 Paolo Bonzini <bonzini@gnu.org> + + dfa: fall back to glibc matcher if a MBCSET is found + This patch enables full support of equivalence classes and multicharacter + collation symbols. It can also improve performance problems in some + cases for multibyte grep. Both of these changes however depend on the + glibc version installed in the system. + + For UTF-8 it will trigger only in the presence of MBCSET, e.g. [a-z]. + For other character sets all brackets and `.` as well will trigger it. + + * NEWS: Document this. + * src/dfa.c (dfaexec): Fall back to glibc for multibyte matches, + if possible. + +2010-09-13 Paolo Bonzini <bonzini@gnu.org> + + build: update gnulib submodule to latest + This is done to include commit "regex: Pass the system regex if its only + problem is 32-bit regoff_t". + + * gnulib: Update to e2b0e1a. + +2010-09-12 Jim Meyering <meyering@redhat.com> + + build: update gnulib submodule to latest + + tests: update init.sh from gnulib + * tests/init.sh: Update from gnulib. + +2010-09-08 Patrick Boyd <pboyd04@gmail.com> + + dfa: reduce stack usage + * src/dfa.c (dfaanalyze): Allocate GRPS and LABELS arrays from heap, + not on the stack. With this change, grep can now run in these UEFI + simulators: + http://sourceforge.net/apps/mediawiki/tianocore/index.php?title=EDK + http://sourceforge.net/apps/mediawiki/tianocore/index.php?title=EDK2 + +2010-09-08 Jim Meyering <meyering@redhat.com> + + tests/portability: avoid spurious failure with OpenBSD's /bin/sh + * tests/warn-char-classes: Don't use "set -x" here. It causes + a spurious test failure on openbsd 4.7 when using its /bin/sh, + since the command, /bin/sh -xc 'P=1 : 2> err' emits "P=1" into err. 
+ To enable set -x, run the test with "VERBOSE=yes", e.g., + make check -C tests TESTS=warn-char-classes VERBOSE=yes + +2010-09-07 Jim Meyering <meyering@redhat.com> + + build: update gnulib submodule to latest + +2010-09-03 Jim Meyering <meyering@redhat.com> + + tests: remove .sh suffix from remaining test scripts. + * tests/backref: Rename from backref.sh. + * tests/bre: Rename from bre.sh. + * tests/ere: Rename from ere.sh. + * tests/file: Rename from file.sh. + * tests/khadafy: Rename from khadafy.sh. + * tests/options: Rename from options.sh. + * tests/pcre: Rename from pcre.sh. + * tests/spencer1: Rename from spencer1.sh. + * tests/spencer2: Rename from spencer2.sh. + * tests/status: Rename from status.sh. + * tests/yesno: Rename from yesno.sh. + * tests/Makefile.am: Reflect renamings. + + tests: convert remaining tests to use init.sh + * tests/file.sh: Use init.sh. Use Exit, not exit. Use grep, not ${GREP}. + * tests/khadafy.sh: Likewise. + * tests/options.sh: Likewise. + * tests/spencer1.sh: Likewise. + * tests/spencer2.sh: Likewise. + * tests/status.sh: Likewise. + * tests/spencer1.awk: Use grep, not ${GREP}. + Don't ignore failure to generate intermediate shell script. + * tests/Makefile.am (CLEANFILES): Remove altogether, now that + all tests use init.sh. + (TESTS_ENVIRONMENT): Don't set GREP. It's no longer used. + + tests: remove warning.sh + * tests/warning.sh: Remove file. All it did was print a warning. + * tests/Makefile.am (TESTS): Remove warning.sh. + + tests: convert pcre.sh to use init.sh + * tests/pcre.sh: Use init.sh. Use Exit, not exit. Use grep, not ${GREP}. + + tests: convert bre.sh to use init.sh + * tests/bre.sh: Use init.sh. + Use Exit, not exit. + Use "$abs_top_srcdir/tests/", not "$srcdir/" to specify inputs. + Source generated bre.script, rather than invoking $SHELL. + * tests/ere.sh: Likewise. + * tests/bre.awk: Use grep, not ${GREP}. + * tests/ere.awk: Likewise. + * tests/Makefile.am (CLEANFILES): Remove bre.script and ere.script. 
+ + tests: convert to use init.sh + * tests/yesno.sh: Use init.sh. + Use Exit, not exit. + Use grep, not $GREP. + * tests/backref.sh: Likewise. + * tests/Makefile.am (CLEANFILES): Remove yesno.txt. + + build: update gnulib submodule to latest + + build: update build/test tools from gnulib + * bootstrap: Update from gnulib. + * tests/init.sh: Likewise. + +2010-09-01 Jim Meyering <meyering@redhat.com> + + maint: add lib/version-etc.c to the list in POTFILES.in + * po/POTFILES.in: Add lib/version-etc.c. + +2010-09-01 Jim Meyering <meyering@redhat.com> + + grep: diagnose and exit-2 for bogus REs like [:space:], [:digit:], etc. + When I make a mistake like this: + grep '[:lower:]' ... + be it in a script or on the command line, I want to know about + it as soon as possible. I don't want grep to print a mere warning + that it is interpreting this suspicious and almost guaranteed-wrong + regular expression as a set of just 6 bytes. And I certainly don't + want grep to silently do the wrong thing, even if that would be + officially standards-conforming. It's obvious that I intended + [[:lower:]], and I want my error to be diagnosed in a way that is + most likely to get my attention. Thus, with this change, grep now + prints a diagnostic and exits with status 2 the moment it + encounters an offending [:char_class:] construct. + + This changes the way grep works by default, rather than + putting this new behavior on an option. A new option + would seldom be used in scripts (not portable), and would + probably be used only rarely by those who need it the most. + This new functionality provides a valuable safety measure + and incurs truly negligible risk. + + For strict POSIX compliance, set POSIXLY_CORRECT in + your environment. That disables this new feature. 
+ + Revert the changes from commit 2cd3bcea, "grep: add + --warnings={always,never,auto}.", and then do the following: + + * src/dfasearch.c (dfawarn): Call getenv("POSIXLY_CORRECT") here; + Remove "warning: " from the diagnostic, now that it's more than + a warning, and exit with status 2. + * NEWS (New features): Describe the new semantics. + * tests/warn-char-classes: Adjust one test to accommodate this change. + * doc/grep.texi (Character Classes and Bracket Expressions): Document. + (Environment Variables): Cross-reference it. + Remove reference to obsolete getopt illegal vs. invalid difference. + Thanks to Paul Eggert for suggestions and an initial prod. + +2010-08-30 Jim Meyering <meyering@redhat.com> + + maint: use gnulib's standard --version-printing code + This includes author names and keeps the copyright year up to date. + * bootstrap.conf (gnulib_modules): Add propername and version-etc-fsf. + * src/main.c (AUTHORS): Define. + (main): Use version_etc, rather than hard-coding the copyright text. + Prompted by a patch from Paolo Bonzini. + +2010-08-27 Paolo Bonzini <bonzini@gnu.org> + + dfa: warn on [:space:] and similar + * src/dfa.c (parse_bracket_exp): Warn on regular expressions such as + [:space:]. + * src/dfa.h (dfawarn): New prototype. + * src/dfasearch.c (dfawarn): New. + * NEWS: Document. + + tests: add test for warnings + * tests/Makefile.am (TESTS): Add warn-char-class. + * tests/warn-char-class: New. + + grep: add --warnings={always,never,auto}. + * src/grep.h (no_warnings): New declaration. + * src/main.c (no_warnings): New. + (WARNINGS_OPTION): Add to enum. + (main): Add --warnings. Handle color_option == 2 together with it. + + tests: add failing test for grep from a directory + * tests/Makefile.am (TESTS, XFAIL_TESTS): Add grep-dir. + * tests/grep-dir: New. + + tests: add test for previous commit + * tests/Makefile.am (TESTS): Add grep-dev-null. + * tests/grep-dev-null: New. 
+ + search: fix "grep -Fif /dev/null" + * bootstrap.conf: Include gnulib module minmax. + * src/searchutils.c (mbtolower): Handle *N == 0 case. + * src/system.h: Include minmax.h from gnulib. + +2010-08-27 Adam Katz <savannah@kopis.com> + + Remove declaration after statement in dfa.c + * dfa.c (dfaexec): Declare saved_end at the beginning of the function. + +2010-08-13 Jim Meyering <meyering@redhat.com> + + make --include=FILE work once again + The semantics of excluded_file_name changed (when operating on + an "included" file name list). + * src/main.c (main): Adjust for changed semantics of excluded_file_name + simply by removing a negation. + * NEWS (Bug fixes): Mention this fix. + * tests/include-exclude: Add a test for this. + Reported by Joe Perches in http://savannah.gnu.org/bugs/?29876. + +2010-07-16 Paolo Bonzini <bonzini@gnu.org> + + doc: document \s and \S + * doc/grep.texi (The Backslash Character and Special Expressions): + Document \s and \S escapes. + +2010-05-29 Karl Berry <karl@gnu.org> + + doc: discuss matches that span two or more lines + * doc/grep.texi (Usage): Discuss matching across lines. + (Character Classes and Bracket Expressions) <[:space:]>: refer to it. + +2010-05-25 Jim Meyering <meyering@redhat.com> + + build: use latest gettext: 0.18 + * configure.ac: Use gettext-0.18. + * bootstrap.conf (gnulib_modules): Use gettext-h, not gettext, + since the latter drags in a dependency on gettext 0.18. + Suggested by Bruno Haible. + + maint: update helper scripts from gnulib + * tests/init.sh: Update from gnulib. + * bootstrap: Likewise. + + build: update gnulib submodule to latest + + maint: don't emit an extra newline in each of two diagnostics + * src/main.c (context_length_arg, grepdir): Remove a stray \n in + each of two diagnostics. + +2010-05-24 Bruno Haible <bruno@clisp.org> + + search: Avoid out-of-bounds access. + * src/dfasearch.c (EGexecute): Avoid access beyond end of buffer + that could happen if start != beg - buf.
+ +2010-05-23 Aharon Robbins <arnold@skeeve.com> + + dfa: fix signedness warnings + * src/dfa.c (dfaexec): Cast p when passing it to prepare_wc_buf. + +2010-05-09 Jim Meyering <meyering@redhat.com> + + tests: update init.sh + * tests/init.sh: Update from gnulib. + + tests: normalize init.sh-sourcing code + * tests/backref-multibyte-slow: Use one-line idiom. + * tests/backref-word: Likewise. + * tests/case-fold-backref: Likewise. + * tests/case-fold-backslash-w: Likewise. + * tests/case-fold-char-class: Likewise. + * tests/case-fold-char-range: Likewise. + * tests/case-fold-char-type: Likewise. + * tests/char-class-multibyte: Likewise. + * tests/dfaexec-multibyte: Likewise. + * tests/empty: Likewise. + * tests/euc-mb: Likewise. + * tests/fedora: Likewise. + * tests/fgrep-infloop: Likewise. + * tests/fmbtest: Likewise. + * tests/foad1: Likewise. + * tests/ignore-mmap: Likewise. + * tests/include-exclude: Likewise. + * tests/max-count-vs-context: Likewise. + * tests/pcre-z: Likewise. + * tests/prefix-of-multibyte: Likewise. + * tests/reversed-range-endpoints: Likewise. + * tests/sjis-mb: Likewise. + * tests/spencer1-locale: Likewise. + * tests/word-delim-multibyte: Likewise. + * tests/word-multi-file: Likewise. + + tests: update help-version + * tests/help-version: Update from coreutils. + +2010-05-06 Jim Meyering <meyering@redhat.com> + + tests: enable glibc's malloc-perturbing option + * tests/Makefile.am (MALLOC_PERTURB_): Define, in case it's not already + set in your environment. + (TESTS_ENVIRONMENT): Propagate MALLOC_PERTURB_ setting to test scripts. + +2010-05-06 Paolo Bonzini <bonzini@gnu.org> + + dfa: speed up [[:digit:]] and [[:xdigit:]] + There's no "multibyte pain" in these two classes, since POSIX + and ISO C99 mandate their contents. + + Time for "./grep -x '[[:digit:]]' /usr/share/dict/linux.words" + Before: 1.5s, after: 0.07s. (sed manages only 0.5s). + + * src/dfa.c (predicates): Declare struct dfa_ctype separately + from definition. Add sb_only. 
+ (find_pred): Return const struct dfa_ctype *. + (parse_bracket_exp): Return const struct dfa_ctype *. Do + not fill MBCSET for sb_only character types. + +2010-05-05 Jim Meyering <meyering@redhat.com> + + tests: readability: use awk rather than obfuscated sed + * tests/backref-multibyte-slow: Generate input using an awk for-loop + rather than expensive and harder-to-read sed pipes. + Remove stray "set -x" and "wc -l in". + + dfa: avoid segfault when processing an invalid multi-byte sequence + * src/dfa.c (dfaexec): Handle the cases in which mbrtowc returns + (size_t)-1 or (size_t)-2, rather than setting mblen_buf[i] to an + outrageously large value. + +2010-05-05 Paolo Bonzini <bonzini@gnu.org> + + grep: remove redundant syntax bit + * grep.c (Gcompile): Remove RE_HAT_LISTS_NOT_NEWLINE. + + tests: add test for newly-fixed performance problem + * tests/backref-multibyte-slow: New. + * tests/Makefile.am: Add it. + +2010-05-05 Paolo Bonzini <bonzini@gnu.org> + + dfa: convert to wide character line-by-line + This provides a nice speedup for -m in general, but especially + it avoids quadratic complexity in case we have to go to glibc. + + * NEWS: Document change. + * src/dfa.c (prepare_wc_buf): Extract out of dfaexec. Convert + only up to the next newline. + (dfaexec): Exit multibyte processing loop if past buf_end. + Call prepare_wc_buf again after processing a newline. + +2010-05-01 Jim Meyering <meyering@redhat.com> + + maint: remove useless #if HAVE_STDLIB_H + * src/mbsupport.h: Don't test HAVE_STDLIB_H. + +2010-04-20 Jim Meyering <meyering@redhat.com> + + dfa: don't #ifdef-out member declarations + * src/dfa.c (struct dfa): Remove "#if MBS_SUPPORT" guard that made + several member declarations conditional on this cpp definition. + (token): Likewise. + Reported by Anders Wallin. + + tests: ensure that the --mmap option is ignored + * tests/ignore-mmap: New file. + * tests/Makefile.am (TESTS): Add it. 
+ Reported by Jaroslav Škarvada in <http://savannah.gnu.org/bugs/?29614> + +2010-04-20 Paolo Bonzini <bonzini@gnu.org> + + dfa: honor RE_DOT_NEWLINE and RE_DOT_NOT_NULL in UTF-8 period optimization + * src/dfa.c (add_utf8_anychar): Check for RE_DOT_NEWLINE and + RE_DOT_NOT_NULL. + + grep: fix --mmap not being ignored + * NEWS: Document bugfix. + * main.c (main): Ignore MMAP_OPTION. + +2010-04-19 Jim Meyering <meyering@redhat.com> + + maint: avoid syntax-check failure due to indentation via TABs + * src/dfa.c (atom): Expand TABs in indentation. + + build: update gnulib submodule to latest + + maint: restrict scope of two globals to dfasearch.c + * src/dfasearch.c (patterns, pcount): Declare these file-scoped + globals to be static. + +2010-04-19 Paolo Bonzini <bonzini@gnu.org> + + dfa: optimize UTF-8 period + * NEWS: Document improvement. + * src/dfa.c (struct dfa): Add utf8_anychar_classes. + (add_utf8_anychar): New. + (atom): Simplify if/else nesting. Call add_utf8_anychar for ANYCHAR + in UTF-8 locales. + (dfaoptimize): Abort on ANYCHAR. + + dfa: drop ORTOP + * src/dfa.c (token, prtok, addtok_mb, nsubtoks, dfaanalyze, dfamust): + Remove ORTOP. + (regexp): Remove parameter, always add OR at the end, adjust callers. + (atom): Adjust caller. + (dfaparse): Adjust caller. Always add OR at the end. + + dfa: fix {0,0} + * NEWS: Document change. + * src/dfa.c (struct dfa): Remove "broken" field. + (lex): Do not set it. + (closure): On {0,0}, backup and lex another closure without + adding a CAT. + (dfabroken): Remove. + * src/dfa.h (dfabroken): Remove. + * tests/spencer1.tests: Add testcases for {m,n}. + + dfa: simplify dfainit + * src/dfa.c (dfainit): Use memset. + +2010-04-17 Jim Meyering <meyering@redhat.com> + + doc: fix a nit in HACKING + * HACKING: Correct size of .git/ dir: 9MB, not 30MB. + + tests: add an expected-to-fail test using \< in a multi-byte locale + * tests/word-delim-multibyte: New test. Currently failing. + * tests/Makefile.am (TESTS): Add it. 
+ (XFAIL_TESTS): Define, temporarily. + Reported by Jaroslav Škarvada in http://savannah.gnu.org/bugs/?29537. + +2010-04-16 Paolo Bonzini <bonzini@gnu.org> + + test: cover just-fixed bug + * tests/empty: Test -Fw too. + + grep: fix matching the empty string with grep -Fw + * NEWS: Document fix. + * src/kwsearch.c (Fexecute): The empty string is a valid match if it is + a whole word. + +2010-04-15 Jim Meyering <meyering@redhat.com> + + maint: update init.sh and HACKING + * HACKING: Sync from coreutils. + * tests/init.sh: Update from gnulib. + +2010-04-13 Jim Meyering <meyering@redhat.com> + + build: update gnulib submodule to latest; adapt + * COPYING: Remove empty line. + * README: Likewise. + * doc/fdl.texi: Likewise. + * tests/backref-word: Likewise. + +2010-04-11 Stefano Lattarini <stefano.lattarini@gmail.com> + + tests: accept the Debian timeout program + * tests/init.cfg: test timeout with `timeout 10s true' + +2010-04-08 Jim Meyering <meyering@redhat.com> + + dfa: convert "cannot happen" code/comment to use assert + * src/dfa.c (dfamust): There were numerous "cannot happen" comments, + some associated with "if (expr) goto done;". Replace each with an + equivalent "assert (!expr);". + + build: use gnulib's isblank module + * bootstrap.conf (gnulib_modules): Use gnulib's isblank module, + now that we rely on the function by that name. + + maint: undo TAB-conversion change to gl/lib/*.c.diff + This fixes a bootstrap failure due to the patches not applying. + * .x-sc_prohibit_tab_based_indentation: Add ^gl/lib/.*\.c\.diff$ + * gl/lib/regcomp.c.diff: Revert today's TAB->space change. + * gl/lib/regex_internal.c.diff: Likewise. + * gl/lib/regexec.c.diff: Likewise. + +2010-04-08 Arnold D. Robbins <arnold@skeeve.com> + + dfa: fix declaration of dfabroken in dfa.h + * dfa.h (dfabroken) [GAWK]: Fix declaration to match that in dfa.c. 
+ +2010-04-08 Jim Meyering <meyering@redhat.com> + + maint: add syntax-check rule to enforce the new no-leading-TABs policy + * cfg.mk (sc_prohibit_tab_based_indentation): New rule, from coreutils. + (sc_prohibit_emacs__indent_tabs_mode__setting): Likewise. + (old_NEWS_hash): Update. + * .x-sc_prohibit_tab_based_indentation: List exempt files. + +2010-04-08 Jim Meyering <meyering@redhat.com> + + convert all TABs to equivalent spaces in indentation + Using this file, + + cat > leading-blank.exempt <<\EOF + (?:^|\/)ChangeLog[^/]*$ + (?:^|\/)(?:GNU)?[Mm]akefile[^/]*$ + \.(?:am|mk)$ + EOF + + run this command to convert all non-conforming leading white + space to be all spaces: + + git ls-files \ + | pcregrep -vf leading-blank.exempt \ + | xargs pcregrep -l '^ *\t' \ + | xargs perl -MText::Tabs -ni -le \ + '$m=/^( *\t[ \t]*)(.*)/; print $m ? expand($1) . $2 : $_' + +2010-04-08 Jim Meyering <meyering@redhat.com> + + build: include cfg.mk in the distribution tarball + * Makefile.am (EXTRA_DIST): Add cfg.mk. + +2010-04-08 Jim Meyering <meyering@redhat.com> + + maint: Makefile.am tweak (no semantic change) + * Makefile.am (EXTRA_DIST): List one per line. Sort. + + build: include cfg.mk in the distribution tarball + * Makefile.am (EXTRA_DIST): Add cfg.mk. + +2010-04-08 Jim Meyering <meyering@redhat.com> + + dfa: move definition of __attribute__ back into dfa.h + * src/dfa.c (__attribute__): Move definition back to... + * src/dfa.h: ... this file. It is essential for non-gcc compilers. + Reported by Arnold Robbins. + +2010-04-07 Arnold D. Robbins <arnold@skeeve.com> + + dfa: move internals from dfa.h to dfa.c + * src/dfa.h: Move internals into dfa.c. + * src/dfa.c: The dfa internals are now totally local to this file. + (dfaalloc, dfamusts, dfabroken): New functions to access features. + * src/dfasearch.c (dfa): Change this global variable from struct to pointer. + Adapt to that change, and use new functions, dfamusts and dfaalloc. 
+ +2010-04-07 Jim Meyering <meyering@redhat.com> + + mbtolower: avoid potential NULL-dereference + * src/searchutils.c: Include <assert.h>. + (mbtolower): Assert that 0 < *n, to avoid possibility of NULL-deref. + Remove dead increment. + + maint: tell git to ignore more build products + * .gitignore: Also ignore results of "make ID" and "make tags". + + build: update gnulib submodule to latest + + tests: use init.sh consistently + * tests/euc-mb: Call "path_prepend_ ." on a line by itself, + and with a comment. This makes it so all of the srcdir/init.sh + lines are consistent, project-wide, and so that the addition of "." + to PATH for this test is properly documented. + * tests/sjis-mb: Likewise. + + maint: avoid new syntax-check failure, ... + ...now that the sole use of xmalloc no longer matches the + regular expression used by the syntax-check rule. + * .x-sc_prohibit_xalloc_without_use: Exempt src/kwset.c. + + grep: make kwset's obstack use xmalloc, not malloc + This insidious bug could make grep fail to diagnose a failed malloc, + and then proceed to dereference the resulting NULL pointer. + Note that this bug was unlikely ever to cause real trouble; without + the fix, grep would segfault upon OOM, now it exits with a diagnostic. + * src/kwset.c (malloc) [GREP]: Define without the "(s)" macro + parameter, so that unadorned uses of malloc are also mapped to xmalloc. + One such use is in the expansion of obstack_init. + Report and patch by Nelson H. F. Beebe, in + http://thread.gmane.org/gmane.comp.gnu.grep.bugs/2995 + + tests: improve help-version (sync from gzip's version) + * tests/help-version: Cross-check $VERSION and --version output. + * tests/Makefile.am (TESTS_ENVIRONMENT): Export VERSION=$(VERSION). + +2010-04-06 Jim Meyering <meyering@redhat.com> + + doc: update THANKS + * THANKS: Update. 
+ +2010-04-06 Aharon Robbins <arnold@skeeve.com> + + build: avoid conflict with WCHAR definition from Cygwin's <windows.h> + * src/dfa.h (enum token): Remove the definition from this file. + Replace with a declaration and typedef. Moved to ... + * src/dfa.c (enum token): ... here. + Reported by Corinna Vinschen. + +2010-04-06 Jim Meyering <meyering@redhat.com> + + doc: add HACKING + * HACKING: New file. Copied from coreutils, with s/coreutils/grep/ + and a few minor edits. + +2010-04-05 Jim Meyering <meyering@redhat.com> + + tests: pull fixed init.sh from gnulib + * tests/init.sh: Update from gnulib. + + maint: fix new argmatch-related syntax-check failures + * configure.ac (ARGMATCH_DIE): Use usage(EXIT_FAILURE), not exit(1). + * po/POTFILES.in: Add lib/argmatch.c. + + maint: update cfg.mk to work with gnulib's newer "make syntax-check" + * cfg.mk: Update to use new _sc_search_regexp interface. Run this: + perl -pi -e 's/\b_prohibit_regexp\b/_sc_search_regexp/;' + -e 's/\bmsg=/halt=/; s/\bre=/prohibit=/;' cfg.mk + and then adjust backslashes so they still line up. + + maint: update tests/init.sh from gnulib + This ensures that the explanation for any skipped or failed test + is printed on stderr, not buried in each .log file. + * tests/init.sh: Update from gnulib. + * tests/init.cfg (stderr_fileno_): Define to 9, to match the + literal 2>&9 in tests/Makefile.am + + build: update gnulib submodule to latest + +2010-04-04 Jim Meyering <meyering@redhat.com> + + maint: use argmatch, for better --directories=INVAL diagnostics + Before, you'd see this: + grep: unknown directories method + + Now, you'll see this: + grep: invalid argument `INVAL' for `--directories' + Valid arguments are: + - `read' + - `recurse' + - `skip' + Usage: src/grep [OPTION]... PATTERN [FILE]... + Try `src/grep --help' for more information. + + * bootstrap.conf: Add argmatch. + * configure.ac: Define ARGMATCH_DIE and ARGMATCH_DIE_DECL. + * src/main.c (directories_type): Define. 
+ (directories_args, directories_types): Define. + All of the above so we can... + (main): Use XARGMATCH. + (usage): Declare extern, now that argmatch calls it via ARGMATCH_DIE. + +2010-04-04 Jim Meyering <meyering@redhat.com> + + dfa.c: const correctness; and remove useless casts of realloc and malloc + * src/dfa.c (icatalloc, icpyalloc, istrstr, enlist): As above. + (inboth, dfamust, comsubs): Likewise. + + dfa.c: use a better (unsigned) type for an index: int->unsigned int + * src/dfa.c (dfaexec): Use "unsigned int" for a logically unsigned index. + + maint: style: use sizeof VAR, rather than sizeof TYPE, where possible + * src/dfa.c (copyset, zeroset): Prefer sizeof EXPR, over sizeof TYPE, + for improved readability/maintainability. + (equal, parse_bracket_exp, addtok_wc, dfaparse, dfaexec): Likewise. + +2010-04-02 Jim Meyering <meyering@redhat.com> + + dfa.c: use a better (unsigned) type for an index: int->size_t + * src/dfa.c (parse_bracket_exp): Use size_t as type of index, not int. + + maint: const-correctness + * src/dfa.c (tstbit, copyset, equal, charclass_index): Declare read-only + "charclass" parameters to be "const". No semantic change. + + maint: include <wchar.h> and <wctype.h> unconditionally + * src/main.c: Include <wchar.h> and <wctype.h> unconditionally. + Their presence/usefulness are assured by gnulib. + * src/dfa.c: Likewise. + * src/search.h: Likewise. + + maint: MBS_SUPPORT: define to 0/1, not undef/1 + Prepare to remove many of these #ifdefs. + * src/mbsupport.h (MBS_SUPPORT): Define to 0/1, not undef/1. + Change each "#ifdef MBS_SUPPORT" to "#if MBS_SUPPORT". Use this: + perl -pi -e 's/ifdef (MBS_SUPPORT)/if $1/' $(g grep -l ifdef.MBS_SUPPO) + * src/dfa.c: s/#ifdef MBS_SUPPORT/#if MBS_SUPPORT/ + * src/dfa.h: Likewise. + * src/dfasearch.c: Likewise. + * src/kwsearch.c: Likewise. + * src/main.c: Likewise. + * src/search.h: Likewise. + * src/searchutils.c: Likewise.
+ +2010-04-02 Jim Meyering <meyering@redhat.com> + + maint: use STREQ in place of strcmp + perl -pi -e 's/\bstrcmp *\((.*?)\) == 0/STREQ ($1)/' src/main.c + perl -pi -e 's/\bstrcmp *\((.*?)\) != 0/!STREQ ($1)/' src/main.c + + * src/dfa.c (STREQ): Define. + Use it instead of strcmp. + * src/main.c (STREQ): Likewise. + * cfg.mk (local-checks-to-skip): Remove sc_prohibit_strcmp, + to enable the strcmp-prohibition. + +2010-04-02 Jim Meyering <meyering@redhat.com> + + maint: enable the useless_cpp_parens syntax check + * cfg.mk (local-checks-to-skip): Remove sc_useless_cpp_parens. + * src/main.c (devices, fillbuf, exit_on_match): Remove useless parens. + (print_line_head, grepfile, set_limits, main): Likewise. + * src/vms_fab.h: Likewise. + * vms/config_vms.h: Likewise. + * src/mbsupport.h: Likewise. + + cleanup and improvement: parse command line arguments consistently + * src/main.c: Include c-ctype.h, for this: + (prepend_args): Use c_isspace, not ISSPACE. + This is important so that we parse arguments consistently, + and independently of the current locale. + * bootstrap.conf (gnulib_modules): Add c-ctype. + * src/system.h: Remove IS* definitions here, too. + * src/dfasearch.c (WCHAR): Use isalnum, not ISALNUM. + * src/kwsearch.c (WCHAR): Likewise. + * src/searchutils.c (kwsinit): Use tolower, not TOLOWER. + + cleanup: rely on gnulib's ctype.h functions; remove IS* macros and is_* + * src/dfa.c (setbit_case_fold, prednames): Use official names. + (IS_WORD_CONSTITUENT, lex): Likewise. + (ISALNUM, ISALPHA, ISCNTRL, ISDIGIT, ISGRAPH): Remove definitions. + (ISLOWER, ISPRINT, ISPUNCT, ISSPACE, ISUPPER, ISXDIGIT): Likewise. + (is_alnum, is_alpha, is_blank, is_cntrl, is_digit, is_graph): Likewise. + (is_lower, is_print, is_punct, is_space, is_upper, is_xdigit): Likewise. + (isgraph): Likewise. 
+ + build: update gnulib submodule to latest, and adjust + * src/main.c (parse_grep_colors): Adjust diagnostics not to trigger + the sc_error_message_period and sc_error_message_uppercase + syntax-check rules. + + maint: remove all VMS-related code + * configure.ac (AC_CONFIG_FILES): Remove vms/Makefile + * Makefile.am (SUBDIRS): Remove vms. + * src/Makefile.am (EXTRA_DIST): Remove vms_fab.c and vms_fab.h. + * src/vms_fab.c, src/vms_fab.h, vms/make.com: Remove files. + * vms/Makefile.am, vms/README, vms/config_vms.h: Likewise. + + post-release administrivia + * NEWS: Add header line for next release. + * .prev-version: Record previous version. + * cfg.mk (old_NEWS_hash): Auto-update. + + version 2.6.3 + * NEWS: Record release date. + +2010-04-02 Jim Meyering <meyering@redhat.com> + + grep: avoid used-undefined error with truncated multibyte input + * src/dfa.c (addtok_wc): Don't use buf[0] (it's undefined) when + wcrtomb returns <= 0. + + MBS_SUPPORT-removal: * src/dfa.c (dfastate): + +2010-04-01 Jim Meyering <meyering@redhat.com> + + maint: avoid unnecessary 2nd getenv("TERM") + * src/main.c (main): Don't call getenv("TERM") twice -- in the same + expression, even. + + tests: remove all unportable uses of echo + * src/main.c: Use printf rather than echo -ne in a comment. + * tests/fedora: Use printf (not echo) also in ok/fail functions. + * cfg.mk (sc_prohibit_echo_minus_en): New rule, to prohibit + any future introduction. + + tests: add explicit requirement for en_US.UTF-8 + * tests/char-class-multibyte: Use require_en_utf8_locale_, + rather than open-coding it. + * tests/prefix-of-multibyte: Require the locale explicitly. + * tests/fgrep-infloop: Likewise. + This fixes test failures that would arise on systems without + that particular locale. Reported by Ludovic Courtès. + + tests: new function, to require an en_US UTF8 locale + * tests/init.cfg (require_en_utf8_locale_): New function. 
+ + tests: use printf, not echo -n, echo -e, or any combination + * tests/fedora: Using printf is more portable. + + grep: remove unnecessary code + * src/main.c (print_line_middle): Now that we use RE_ICASE + (enabled in commit 70e23616, "dfa: rewrite handling of multibyte + case_fold lexing"), this case-conversion code is useless and wasteful. + Remove it. + + doc: fix typo: s/AM_V_AT/AM_V_at/ + * doc/Makefile.am (egrep.1 fgrep.1): The former has case consistent + with its sister variable, AM_V_GEN, but the latter is the one that + actually works. + + doc: generated files are best made read-only, ... + ...to minimize risk of accidentally modifying the generated file + rather than its template. These are tiny, so no risk, but it's + good to be consistent, so generated files are easier to spot. + * doc/Makefile.am (egrep.1 fgrep.1): When generating these files, + ensure that they too are created read-only. + + doc: generate grep.1 from template + * doc/Makefile.am (grep.1): New rule. + (CLEANFILES): Add grep.1 to the list. + * .gitignore: Add /doc/grep.1 + * doc/grep.in.1: Replace hard-coded "2.5.1-cvs" with @VERSION@. + Update copyright year list. + Omit the line-splitting \(co directive so that update-copyright + will perform future updates automatically. + Egmont Koblinger reported the outdated version string + and copyright year list in the man page: + http://savannah.gnu.org/bugs/?29390 + + doc: prepare to generate grep.1 + * doc/grep.1: Rename to... + * doc/grep.in.1: ...this. + +2010-03-31 Eric Blake <eblake@redhat.com> + + build: avoid another warning + Noticed on cygwin: + get-mb-cur-max.c: In function 'main': + get-mb-cur-max.c:27: error: unused parameter 'argc' [-Wunused-parameter] + + * tests/get-mb-cur-max.c (main): Use argc. + +2010-03-31 Paolo Bonzini <bonzini@gnu.org> + + tests: fix on systems with broken sh + * tests/Makefile.am (TESTS_ENVIRONMENT): Adjust coreutils remnants. + * tests/bre.sh: Invoke script with $SHELL if defined.
+ * tests/ere.sh: Likewise. + * tests/spencer1-locale: Likewise. + * tests/spencer1.sh: Likewise. + + tests: improve empty test + * tests/empty: Add more tests, note expected failure. + + tests: improve empty test with respect to locales + * tests/empty: Add tests for multiple locales. + + grep: fix grep -F against empty string + * src/searchutils.c (is_mb_middle): Do not return true for empty matches + when p == buf. + + tests: rename empty.sh to empty + * tests/empty.sh: Rename to... + * tests/empty: ... this. + * tests/Makefile.am (TESTS): Adjust. + + tests: convert empty.sh to new style + * tests/empty.sh: Convert to init.sh, add 10-second timeout. + + tests: use get-mb-cur-max in char-class-multibyte + * tests/char-class-multibyte: Use get-mb-cur-max to detect UTF-8 support. + Rewrite previous locale detection code as a grep test. + + tests: fix -Wformat failure + * tests/get-mb-cur-max (main): Cast MB_CUR_MAX to int. + +2010-03-30 Jim Meyering <meyering@redhat.com> + + doc: add a "Reply-To" to the suggested announcement mail header + * README-release: Add "Reply-To" with the list address, + to minimize risk of replies to the other announcement recipients. + Suggestion from Eric Blake. + +2010-03-29 Jim Meyering <meyering@redhat.com> + + build: avoid compiler warning when building test program + * tests/Makefile.am (AM_CPPFLAGS, AM_CFLAGS, AM_LDFLAGS): Define, + so that all the usual C compile-and-link machinery comes into play. + * tests/get-mb-cur-max.c: Include "progname.h". + Remove unnecessary inclusion of <ctype.h>. + Mike Frysinger reported the "implicit decl of set_program_name" warning. + + build: detect PCRE support also when <pcre/pcre.h> is the header + * m4/pcre.m4: Also check for <pcre/pcre.h>. + * src/pcresearch.c: Include <pcre/pcre.h>, if needed. + Guard inclusions with HAVE_PCRE_H and HAVE_PCRE_PCRE_H, not HAVE_LIBPCRE. + * NEWS (Bug fixes): Mention it. + Dmitry V. 
Levin reported that PCRE support was not detected + on systems with <pcre.h> not in the default include path. + + post-release administrivia + * NEWS: Add header line for next release. + * .prev-version: Record previous version. + * cfg.mk (old_NEWS_hash): Auto-update. + + version 2.6.2 + * NEWS: Record release date. + +2010-03-29 Eric Blake <eblake@redhat.com> + + build: avoid warnings on cygwin + * lib/savedir.c (isdir): Avoid shadowing a declaration. + * src/main.c (get_nondigit_option): Cast away const to avoid + compiler warning. + + maint: ignore new test executable + * .gitignore: Enhance. + +2010-03-29 Jim Meyering <meyering@redhat.com> + + doc: consolidate redundant-looking entries + * NEWS: Consolidate the two --include/exclude-related entries. + Suggested by Eric Blake. + +2010-03-29 Paolo Bonzini <bonzini@gnu.org> + + tests: use $(...) consistently + * tests/backref.sh: Use `...' instead of ``...'' in comments. + * tests/bre.awk: Use $(...) instead of `...`. + * tests/ere.awk: Use $(...) instead of `...`. + * tests/euc-mb: Use $(...) instead of `...`. + * tests/fmbtest: Use $(...) instead of `...`. + * tests/foad1: Use $(...) instead of `...`. + * tests/pcre-z: Use $(...) instead of `...`. Quote output of grep. + * tests/spencer1-locale.awk: Use $(...) instead of `...`. + * tests/spencer1.awk: Use $(...) instead of `...`. + * tests/yesno.sh: Use $(...) instead of `...`. + +2010-03-29 Jim Meyering <meyering@redhat.com> + + build: make doc/Makefile.am cleaner and more robust + * doc/Makefile.am (egrep.1 fgrep.1): Generate robustly, i.e., + do not redirect directly to $@. + Use $(AM_V_GEN). + Do not distribute intermediate files like fgrep.man and egrep.man. + Likewise, do not use them to generate their %.1 images. + Instead, generate the .1 files directly. + +2010-03-29 Paolo Bonzini <bonzini@gnu.org> + + tests: add program to detect locales + * tests/Makefile.am (check_PROGRAMS): Add get-mb-cur-max. + * tests/get-mb-cur-max.c: New. 
+ * tests/euc-mb: Use it. Fail if the former detection test fails. + * tests/sjis-mb: Use it. Fail if the former detection test fails. Expand + comments. + +2010-03-29 Paolo Bonzini <bonzini@gnu.org> + + tests: add tests for SJIS character sets + The attached test will be skipped unless (on a glibc system) you run + something like + + mkdir /usr/lib/locale/ja_JP.SHIFT_JIS + zcat /usr/share/i18n/charmaps/SHIFT_JIS.gz | \ + localedef \ + -f - \ + -i /usr/share/i18n/locales/ja_JP \ + /usr/lib/locale/ja_JP.SHIFT_JIS + + * tests/Makefile.am: Add sjis-mb. + * tests/sjis-mb: New. + +2010-03-29 Paolo Bonzini <bonzini@gnu.org> + + grep -F: fix a bug with SJIS character sets + Commit db9d6 would erroneously skip matches in SJIS character sets. In + this character set low bytes (i.e. ASCII bytes) are also valid second + bytes in a double-byte character, so you have to continue looking for + a match, even if you match in the middle of a double-byte character. + + * src/kwsearch.c: Ensure that beg is advanced by at least one byte, + but do not fail immediately after matching in the middle of a double-byte + character. + +2010-03-28 Bruno Haible <bruno@clisp.org> + + build: update after change in gnulib's lib-ignore module + * src/Makefile.am (AM_LDFLAGS): Define. Use gnulib's new + $(IGNORE_UNUSED_LIBRARIES_CFLAGS). + +2010-03-28 Jim Meyering <meyering@redhat.com> + + tests: disable new texinfo-acronym syntax-check from gnulib + * cfg.mk (local-checks-to-skip): Add new sc_texinfo_acronym, to skip it. + +2010-03-28 Norihiro Tanaka <noritnk@kcn.ne.jp> + + tests: exercise fix for improper match of incomplete MB char prefix + * tests/prefix-of-multibyte: New file. + * tests/Makefile.am (TESTS): Add it. + +2010-03-28 Jim Meyering <meyering@redhat.com> + + grep -F: fix a multi-byte erroneous-match-in-middle bug + Just as Perl prints nothing in this case, + printf '\357\274\241\n' | perl -CIO -lne '/\357/ and print' + + grep should also print nothing when used as follows. 
+ However, these would mistakenly match with grep prior to 2.6.2: + printf '\357\274\241\n' | LC_ALL=en_US.UTF-8 src/grep -F $'\357' + printf '\357\274\241\n' | LC_ALL=en_US.UTF-8 src/grep -F $'\357\274' + + * src/searchutils.c (is_mb_middle): New parameter: the length of the + match, in bytes, as determined by kwsexec. Use this to detect when + the nominal match found by kwsexec must be skipped because it is for + an incomplete multi-byte character that is a prefix of a character + in the input. + * src/dfasearch.c (EGexecute): Update caller. + * src/kwsearch.c (Fexecute): Likewise. + * src/search.h: Update prototype. + * NEWS (Bug fixes): Mention it. + Report and analysis by Norihiro Tanaka. + +2010-03-28 Norihiro Tanaka <noritnk@kcn.ne.jp> + + tests: add tests for the fgrep-infloop bug + * tests/init.cfg (require_timeout_): New function. + * tests/fgrep-infloop: New file. Test for the above fix. + * tests/Makefile.am (TESTS): Add it. + +2010-03-28 Jim Meyering <meyering@redhat.com> + + grep -F: avoid infinite loop when searching for incomplete MB character + Searching for an incomplete non-prefix of a multi-byte character + should find no match. + + Just as these print nothing, + printf '\357\274\241\357\274\241\n' \ + | perl -CIO -ne '/\241\357/ and print' + printf '\357\274\241\n' | perl -CIO -ne '/\274\241/ and print' + printf '\357\274\241\n' | perl -CIO -ne '/\241/ and print' + printf '\357\274\241\n' | perl -CIO -ne '/\274/ and print' + + These should also print nothing, but with grep-2.6 and grep-2.6.1, + they would infloop: + printf '\357\274\241\n' | LC_ALL=en_US.UTF-8 src/grep -F $'\241' + printf '\357\274\241\n' | LC_ALL=en_US.UTF-8 src/grep -F $'\274' + printf '\357\274\241\n' | LC_ALL=en_US.UTF-8 src/grep -F $'\274\241' + + * src/kwsearch.c (Fexecute): Don't infloop when searching for + an incomplete non-prefix part of a multi-byte character. + * NEWS (Bug fixes): Mention it. + Reported and diagnosed by Norihiro Tanaka. 
+ +2010-03-28 Jim Meyering <meyering@redhat.com> + + tests: rename: fmbtest.sh -> fmbtest + * tests/fmbtest.sh: Rename to ... + * tests/fmbtest: ...this, dropping the .sh suffix. + * tests/Makefile.am (TESTS): Reflect renaming. + + tests: convert fmbtest.sh to use init.sh + * tests/fmbtest.sh: Use init.sh and adapt accordingly: + Use "grep", not ${GREP}. Use Exit, not exit. + + tests: also exercise the --include + glob path + * tests/include-exclude: Exercise Javier's fix. + +2010-03-28 Javier Villavicencio <the_paya@gentoo.org> + + grep -r: fix --include with globs, too + The previous fix addressed only the non-glob case. + * src/main.c (main): Use add_exclude's EXCLUDE_WILDCARDS option, + to enable the use of fnmatch with --include=GLOB. + gnulib: Update to latest, for the fixed exclude.c. + +2010-03-28 Jim Meyering <meyering@redhat.com> + + grep -r: fix --include with non-globs + * lib/savedir.c (savedir): Fix logic error. Introduced by commit + bf3bd92c, "build: adapt to the newer exclude API we now get from gnulib" + * tests/include-exclude: Test for this bug by exercising --include, too. + * NEWS (Bug fixes): Mention it. + Reported by Philipp Kohlbecher in http://savannah.gnu.org/bugs/?29358 + +2010-03-27 Jim Meyering <meyering@redhat.com> + + kwset: correct comments; require non-NULL kwsmatch argument + * src/kwset.c (kwsexec): Correct comments. This function has been + returning an offset, not a pointer, for 9 years. + Do not test for kwsmatch == NULL. All callers pass non-NULL. + (cwexec): Likewise. + * src/kwset.h (kwsexec): Mark the 4th parameter, kwsmatch, as non-NULL. + Include "arg-nonnull.h". + + build: add -I$(top_builddir)/lib so we also find generated .h files + * src/Makefile.am (AM_CPPFLAGS): Rename from INCLUDES to avoid + warning from automake -Wall. + Add -I$(top_builddir)/lib, so we find generated .h files like + getopt.h in a non-srcdir build. 
+ + build: remove superfluous LOCALEDIR definition + * src/Makefile.am (INCLUDES): Remove unnecessary definition of + LOCALEDIR here. Now, it's defined via gnulib's configmake.h. + * src/system.h: Include "configmake.h" for its LOCALEDIR definition. + + grep: don't segfault upon use of --include or --exclude* options + * lib/savedir.c (isdir1): Fix fatal typo: deref "dir" argument, + not the global (initially-NULL) "path". Reported by Standish Parsley. + * tests/include-exclude: New file. + * tests/Makefile.am (TESTS): Add it. + * NEWS (Bug fixes): Mention it. + +2010-03-26 Jim Meyering <meyering@redhat.com> + + tests: rename: foad1.sh -> foad1 + * tests/foad1.sh: Rename to ... + * tests/foad1: ...this, dropping the .sh suffix. + * tests/Makefile.am (TESTS): Reflect renaming. + + tests: convert foad1.sh to use init.sh + This fixes a spurious test failure when "make check" is run with + certain envvars set, e.g., "make check GREP_COLOR=always" + * tests/foad1.sh: Use init.sh and adapt accordingly: + Use "grep", not ${GREP}. Test VERBOSE against "yes", not "1", + to be consistent with init.sh. + Use Exit, not exit. + Reported by Nelson H. F. Beebe. + + tests: insulate tests from envvar settings + * tests/init.cfg (vars_): Unset each envvar that can affect how + grep works. This protects only those tests that have been + converted to use init.sh. + +2010-03-25 Eric Blake <eblake@redhat.com> + + maint: ignore 'make dist pdf' droppings + * .gitignore: Add more exemptions. + +2010-03-25 Jim Meyering <meyering@redhat.com> + + tests: avoid spurious test failure due to lack of a French UTF8 locale + * tests/init.cfg: New file. If either $LOCALE_FR or $LOCALE_FR_UTF8 + is set to "none", reset it to the empty string. + Reported by Mike Frysinger and Sven Joachim. + * tests/Makefile.am (EXTRA_DIST): Add init.cfg. + + build: do not use pkg-config to test for PCRE support + * configure.ac: Do not use PKG_PROG_PKG_CONFIG or PKG_CHECK_MODULES. 
+ Do not modify CPPFLAGS; that belongs to those who invoke make. + Instead, use autoconf's AC_CHECK_HEADERS and AC_SEARCH_LIBS via the + new macro, gl_FUNC_PCRE, defined in... + * m4/pcre.m4 (gl_FUNC_PCRE): New macro, to handle pcre-related + configure-time tests. + * src/Makefile.am (grep_LDADD): Use LIB_PCRE, not PCRE_LIBS. + * src/pcresearch.c: Test HAVE_LIBPCRE via "#if", not "#ifdef". + All other cpp tests of this symbol used "#if". + Prompted by a suggestion from Bruno Haible. + * NEWS (Build-related): Mention this. + + doc: correct and amend NEWS entries for 2.6.1 + * NEWS (Bug fixes): Correct character ranges bug description. + Add an example from Dmitry V. Levin. + Add that the word-with-backref bug was introduced in 2.5.1. + * cfg.mk (old_NEWS_hash): Update to match. + + post-release administrivia + * NEWS: Add header line for next release. + * .prev-version: Record previous version. + * cfg.mk (old_NEWS_hash): Auto-update. + + version 2.6.1 + * NEWS: Record release date. + +2010-03-25 Tony Abou-Assaleh <taa@acm.org> + + tests: use awk's -v option more portably + * tests/spencer1-locale: Add a space between awk's "-v" option and + the following VAR=value string, to avoid test failure on Mac OS X. + +2010-03-25 Norihirio Tanaka <noritnk@kcn.ne.jp> + + dfa/grep: fix compilation with MBS_SUPPORT + * src/dfa.c (cur_mb_len): Initialize to 1 and always make it available. + (setbit_case_fold): Do not use wint_t in prototype if !MBS_SUPPORT. + (parse_bracket_exp): Fix compilation with !MBS_SUPPORT. + * src/kwsearch.c (kwsinit): Do not use mbtolower and MB_CUR_MAX + if !MBS_SUPPORT. + * src/searchutils.c (kwsinit): Do not refer to MB_CUR_MAX if !MBS_SUPPORT. + + * tests/char-class-multibyte: Skip if UTF-8 matching does not work. + * tests/fmbtest.sh: Likewise. 
+ +2010-03-25 Jim Meyering <meyering@redhat.com> + + build: avoid warnings about unnecessary use of "return" + * src/grep.c (Gcompile, Ecompile, Acompile): Do not "return X" + from a function returning void, not even when X itself is a + function returning void. This avoids warnings from Sun Studio 11 + reported by Dagobert Michelsen. + * src/egrep.c (Ecompile): Likewise. + +2010-03-25 Norihirio Tanaka <noritnk@kcn.ne.jp> + + grep: fix printing when -w is used and regex is needed for matching + * NEWS: Document bugfix. + * src/dfasearch.c (EGexecute): After assess_pattern_match len, is either + invalid or end-beg; jump to success. + * tests/Makefile.am (TESTS): Add new test. + * tests/backref-word: New. + +2010-03-25 Paolo Bonzini <bonzini@gnu.org> + + dfa: fix single byte character ranges + * src/dfa.c (in_coll_range): Fix ordering for second strcoll. Reported + by Dmitry V. Levin. + * tests/spencer1-locale.awk: Also test single-byte character sets. + * NEWS: Add a note about this bugfix. + * THANKS: Add Dmitry. + +2010-03-25 Norihirio Tanaka <noritnk@kcn.ne.jp> + + grep: reset state after truncated or invalid multibyte sequences + * src/searchutils.c (is_mb_middle): When treating an invalid sequence + or a truncated multibyte character as a single byte character, reset + mbstate + + grep: do lowercase conversion in print_line_middle only for single-byte case + * src/main.c (print_line_middle): Restrict match_icase code + to MB_CUR_MAX == 1. Adjust comments. + +2010-03-25 Jim Meyering <meyering@redhat.com> + + tests: provide framework_failure_ function + The shell function "framework_failure" was called in the unusual + event that some fundamental test set-up operation would fail. + However it was not defined. Define it, but with a trailing underscore + to impinge less on the test writer's name space. Adjust all uses. + * tests/init.sh (framework_failure_): New function. 
+ * tests/case-fold-backref: s/framework_failure/framework_failure_/ + * tests/case-fold-char-class: Likewise. + * tests/case-fold-char-range: Likewise. + * tests/case-fold-char-type: Likewise. + * tests/char-class-multibyte: Likewise. + * tests/dfaexec-multibyte: Likewise. + * tests/max-count-vs-context: Likewise. + * tests/word-multi-file: Likewise. + +2010-03-24 Jim Meyering <meyering@redhat.com> + + doc: tweak THANKS + * THANKS: Update Arnold's name and address, per request. + + portability: use gnulib's lseek wrapper + * bootstrap.conf (gnulib_modules): Use gnulib's lseek wrapper, + for improved portability. lseek does not fail with ESPIPE on + pipes on some systems. + + build: avoid link failure on Solaris 8 + * bootstrap.conf (gnulib_modules): Add wctob. + * NEWS (Portability): Mention this. + Reported by Dagobert Michelsen in <http://sv.gnu.org/bugs/?29325>. + +2010-03-24 Petr Písař <petr.pisar@atlas.cz> + + doc: translate new --help message + * src/main.c: Translate "after_options". + +2010-03-24 Jim Meyering <meyering@redhat.com> + + doc: NEWS make it clear that the bug was introduced in 2.6 + * NEWS: Clarify. + +2010-03-24 Paolo Bonzini <bonzini@gnu.org> + + tests: fix char-class-multibyte + * tests/char-class-multibyte: Make it pass. + +2010-03-23 Jim Meyering <meyering@redhat.com> + + build: avoid compilation failure when MBS_SUPPORT not defined + * src/dfa.c (setbit_case_fold) [!MBS_SUPPORT]: Fix curly brace mismatch. + +2010-03-23 Paolo Bonzini <bonzini@gnu.org> + + dfa: fix sigsegv on multibyte character classes + Reported by Jaroslav Škarvada <jskarvad@redhat.com>. This is + unfortunate. grep needs an automatic testcase generator. + + * NEWS: Document bug. + * THANKS: Mention reporter. + * src/dfa.c (set_bit_casefold): Change type of first argument for + self-documentation. + (parse_bracket_exp): Fix call. + * tests/Makefile.am: Add new testcase. + * tests/char-class-multibyte: New testcase. 
+ +2010-03-23 Jim Meyering <meyering@redhat.com> + + post-release administrivia + * NEWS: Add header line for next release. + * .prev-version: Record previous version. + * cfg.mk (old_NEWS_hash): Auto-update. + + version 2.6 + * NEWS: Record release date. + + build: avoid warnings: tell gcc and clang that dfaerror never returns + * src/dfa.h (__attribute__): Define. + (dfaerror): Declare with the "noreturn" attribute. + * src/dfasearch.c (dfaerror): Add an unreachable use of abort. + +2010-03-22 Eric Blake <eblake@redhat.com> + + build: fix cygwin build + Portions of gnulib depend on -lintl, and cygwin does not allow + lazy linking. + + * src/Makefile.am (LDADD): Include libraries in correct order. + +2010-03-22 Paolo Bonzini <bonzini@gnu.org> + + grep: remove --mmap + mmap is a bad idea for sequentially accessed files because it will cause + a page fault for every read page. Just consider it a failed experiment, + and ignore --mmap while accepting it for backwards compatibility. + + * configure.ac (AC_FUNC_MMAP): Remove. + * doc/grep.texi (Other options): Say --mmap is ignored. + * src/grep.c (mmap_option): Remove. + (long_options): Do not reference it. + (bufmapped, initial_bufoffset): Remove. + (reset, fillbuf): Remove HAVE_MMAP code. + (grepfile): Remove bufmapped reference. + (usage): Say --mmap is ignored. + +2010-03-22 Paolo Bonzini <bonzini@gnu.org> + + grep: rename files for intuitiveness + * Makefile.am (libgrep_a_SOURCES, grep_SOURCES, egrep_SOURCES, + fgrep_SOURCES): Adjust. + * grep.c: Rename to main.c. + * esearch.c: Rename to egrep.c. + * fsearch.c: Rename to fgrep.c. + * gsearch.c: Rename to grep.c. + + grep: kill GREP_PROGRAM/EGREP_PROGRAM/FGREP_PROGRAM + * NEWS: Document slight semantic change. + * TODO: #ifdefs are gone. + * po/POTFILES.in: Update. + * src/Makefile.am (grep_SOURCES, egrep_SOURCES, fgrep_SOURCES): Remove + grep.c/egrep.c/fgrep.c. + (noinst_LIBRARIES): Change libsearch.a to libgrep.a.
+ (libsearch_a_SOURCES): Rename to libgrep_a_SOURCES, add grep.c + (LDADD): Change libsearch.a to libgrep.a. + * src/esearch.c: Add before_options and after_options. + * src/fsearch.c: Likewise. + * src/gsearch.c: Likewise. + * src/grep.c (short_options, long_options): Remove GREP_PROGRAM + special-casing. + (usage): Use before_options and after_options, look at matchers. + (setmatcher): Merge with install_matcher. + (main): Call setmatcher (NULL) instead of install_matcher. + * src/grep.h (GREP_PROGRAM): Remove. + (before_options, after_options): Add. + + thank Eric Blake + * THANKS: Add Eric Blake, who reported the warning fixed by 774d0ee. + + grep: libify *search.c + * src/Makefile.am (libsearch_a_SOURCES): Add dfasearch.c, kwsearch.c, + pcresearch.c. + * src/esearch.c, src/fsearch.c, * src/gsearch.c: Only include search.h. + * src/dfasearch.c (GEAcompile, EGexecute): Export. + * src/kwsearch.c (Fcompile, Fexecute): Export. + * src/pcresearch.c (Pcompile, Pexecute): Export. + * src/search.h: Add new exported functions. + + grep: prepare for libification of *search.c + * src/dfasearch.c (Ecompile): Remove. + * src/esearch.c: Place it here... + * src/gsearch.c: ... and here. + + grep: split search.c + * po/POTFILES.in: Update. + * src/Makefile.am (grep_SOURCES, egrep_SOURCES, fgrep_SOURCES): Move + kwset.c and dfa.c to libsearch.a. Add searchutils.c there too. + * src/search.h, src/dfasearch.c, src/pcresearch.c, src/kwsearch.c, + src/searchutils.c: New files, split out of src/search.c. + * src/esearch.c, src/fsearch.c: Include the new files instead of search.c. + * src/gsearch.c: Likewise, plus move Gcompile/Acompile here. + + grep: remove one #ifdef + * search.c (GEAcompile) [EGREP_PROGRAM]: Use common code. Inline IF_BK. + +2010-03-22 Paolo Bonzini <bonzini@gnu.org> + + grep: eliminate {COMPILE,EXECUTE}_{RET,ARGS,FCT} + Modern compilers warn about type mismatches. + + * src/grep.c (do_execute): Write full declaration. 
+ * src/grep.h (COMPILE_RET, COMPILE_ARGS, COMPILE_FCT, EXECUTE_RET, + EXECUTE_ARGS, EXECUTE_FCT): Remove. + (compile_fp_t, execute_fp_t): Write full declaration. + * src/search.c (GEAcompile, Gcompile, Acompile, Ecompile, EGexecute, + Fcompile, Fexecute, Pcompile, Pexecute): Write full declaration. + +2010-03-22 Paolo Bonzini <bonzini@gnu.org> + + grep: make egrep/fgrep use struct matcher + * Makefile.am (grep_SOURCES): Add gsearch.c. + (EXTRA_DIST): Add search.c. + * esearch.c (matchers): New. + * fsearch.c (matchers): New. + * gsearch.c: New. + * search.c (matchers): Remove. + * grep.c: Always compile most !GREP_PROGRAM sections. + (main): Use first matcher if none is explicitly provided. Remove + "default" matcher. + * grep.h (struct matcher): Adjust comments. + + grep: change struct matcher termination + * src/grep.c (setmatcher): Look for NULL matchers[i].name. + * src/grep.h (struct matcher): Change name to pointer. Adjust comments. + * src/search.c (matchers): Terminate with three NULLs. + + grep: remove one #ifdef + * search.c (Ecompile): Always go through GEAcompile to use same code path + for both grep and egrep. + + grep: remove getpagesize.h + * src/getpagesize.h: Remove. + * src/Makefile.am (noinst_HEADERS): Remove getpagesize.h. + +2010-03-21 Jim Meyering <meyering@redhat.com> + + build: use the fcntl-h module, not "fcntl" + * bootstrap.conf (gnulib_modules): We might need fcntl.h somewhere, + but don't use the fcntl function. Reported by Bruno Haible. + + build: avoid link failure on systems using gnulib's fcntl but not open + * bootstrap.conf (gnulib_modules): Using gnulib's fcntl module + and including <fcntl.h>, but not also using gnulib's "open" module + would result in link failure due to references to rpl_open + on systems requiring the replacement (e.g., Cygwin and Darwin). + + build: avoid compilation failure on systems using rpl_open + This new build failure has arisen as a result of using gnulib's + "fcntl" module. 
Now that an inadequate "open" syscall is replaced + by gnulib's wrapper, it is essential to include <fcntl.h>. + * src/grep.c: Include <fcntl.h>. + This is required, for grepfile's use of open, at least on + Cygwin and Darwin. + + maint: use gnulib's fcntl module, just in case + * bootstrap.conf (gnulib_modules): Add fcntl. + Grep uses at least O_BINARY, which may be defined therein. + + maint: remove TYPE_* definitions from src/system.h + * src/system.h (TYPE_MAXIMUM, TYPE_MINIMUM, TYPE_SIGNED): Remove + definitions. They are provided by intprops.h. + * src/grep.c: Include "intprops.h" + * bootstrap.conf (gnulib_modules): Add intprops. + + maint: alphabetize #include directives + * src/grep.c: Alphabetize #include directives. + +2010-03-20 Jim Meyering <meyering@redhat.com> + + build: stop using gnulib's memmove module + * bootstrap.conf (gnulib_modules): Remove obsolete module: memmove + + build: reinstate gnulib's fcntl-h-tests + * bootstrap.conf (gnulib_tool_option_extras): Do not avoid + the fcntl-h-tests. I cannot reproduce the failure. + +2010-03-20 Eric Blake <eblake@redhat.com> + + build: allow compilation on cygwin + Gnulib is incompatible with -Wunused-macros. Additionally, + cygwin 1.7.1 coupled with --enable-gcc-warnings tripped on: + + grep.c: In function 'print_line_middle': + grep.c:805: error: array subscript has type 'char' [-Wchar-subscripts] + grep.c: In function 'main': + grep.c:1833: error: 'optarg' redeclared without dllimport attribute: previous dllimport ignored [-Wattributes] + grep.c:1834: error: 'optind' redeclared without dllimport attribute after being referenced with dll linkage + + * configure.ac (GNULIB_WARN_FLAGS): Disable -Wunused-macros. + * src/grep.c (print_line_middle): Use correct type to tolower. + (main): Drop useless redeclarations. + * .gitignore: Ignore more built files.
+ +2010-03-20 Jim Meyering <meyering@redhat.com> + + tests: ensure that all programs handle [b-a] consistently + * tests/reversed-range-endpoints: New test. + * tests/Makefile.am (TESTS): Add it. + +2010-03-20 Jim Meyering <meyering@redhat.com> + + build: update gnulib submodule to latest + This pulls in the latest regex module from gnulib, including a fix + to make it honor the RE_NO_EMPTY_RANGES syntax bit. + + tests: temporarily disable irrelevant-to-grep failing C++ fcntl-h-tests + * bootstrap.conf (gnulib_tool_option_extras): Temporarily add + --avoid=fcntl-h-tests, until the C++ part of that test is fixed. + +2010-03-20 Jim Meyering <meyering@redhat.com> + + reject reversed-endpoint ranges, with all regex variants + * src/search.c: Add RE_NO_EMPTY_RANGES to the syntax bits + in three places, so that all of grep, egrep, and grep -E reject + a range with reversed endpoints like '[b-a]'. This is required, + when using the latest version of gnulib's regex module, since it + now honors the RE_NO_EMPTY_RANGES flag, rather than acting as if + it were always set. + Based on a change by Matthew Burgess. + +2010-03-19 Jim Meyering <meyering@redhat.com> + + maint: correct macro parameter parentheses + * src/dfa.c (FETCH_WC, FETCH): Parenthesize macro parameters. + +2010-03-19 Paolo Bonzini <bonzini@gnu.org> + + tests: change help-version to per-program functions + * help-version: Change each *_args variable to a *_setup function. + + dfa: fix wchar_t/wint_t type mismatch + * src/dfa.c (FETCH_WC): Pass a local wchar_t variable to mbrtowc. + (FETCH): Rename temporary second argument to FETCH_WC. + (parse_bracket_exp): Always use FETCH_WC. + +2010-03-19 Jim Meyering <meyering@redhat.com> + + doc: add README-prereq, referenced from README-hacking + * README-prereq: New file. Cloned from coreutils, s/coreutils/grep/ + Reported by Tony Abou-Assaleh. 
+ +2010-03-19 Arnold Robbins <arnold@skeeve.com> + + maint: sync dfa comments from gawk + * src/dfa.h (struct dfa) [newlines]: Amend comment. + * src/dfa.c: Update copyright year list to include gawk's. + +2010-03-17 Jim Meyering <meyering@redhat.com> + + maint: remove obsolete "cvs-clean" make target + * Makefile.am (cvs-clean): Remove obsolete target. + +2010-03-17 Paolo Bonzini <bonzini@gnu.org> + + dfa: initialize struct mbcset using memset + * src/dfa.c (parse_bracket_exp): Use memset to initialize workmbc. + + dfa: spell out "unsigned int" + * dfa.c (setbit, tstbit, clrbit, setbit_case_fold, lex, dfaoptimize, + free_mbdata): Put "int" after unsigned. + * dfa.h (struct position, struct dfa): Likewise. + +2010-03-17 Paolo Bonzini <bonzini@gnu.org> + + dfa: optimize simple character sets under UTF-8 charsets + Only use a bitset when possible without involving MBCSET. Testcase: + yes 'the quick brown fox jumps over the lazy dog' | sed 100000q | \ + time grep -c [ABCDEFGHIJKLMNOPQRSTUVWXYZ,] + + Before: 51ms (best of three runs); after: 16ms(best of three runs). + + * src/dfa.c (parse_bracket_exp): For simple bracket expressions + under UTF-8, use a CSET. + +2010-03-17 Paolo Bonzini <bonzini@gnu.org> + + dfa: speed up handling of brackets + This patch has two sides. One is to fold the parsing of brackets in the + single- and multi-byte cases. The second is to leverage this change, + and use a bitset to test for single-byte characters in the charset. + Splitting the two would be very hard. + + Testcase: + yes 'the quick brown fox jumps over the lazy dog' | sed 100000q | \ + time grep -c [ABCDEFGHIJKLMNOPQRSTUVWXYZ,] + + Before: 59ms (best of three runs); after: 51ms (best of three runs). + Nice, but mostly providing infrastructure for the next patch. + + * src/dfa.c (setbit_case_fold): Try applying towlower/towupper. + (looking_at): Remove. + (FETCH_WC): New. + (fetch_wc): Merge into FETCH_WC [MBS_SUPPORT]. + (FETCH) [MBS_SUPPORT]: Call FETCH_WC. 
+ (prednames, find_pred, is_blank and other predicates): Move above, + remove K&R syntax support. + (parse_bracket_exp): New name of parse_bracket_exp_mb, rewritten to + include single-byte character set parsing of brackets. + (lex): Adjust for fetch_wc->FETCH_WC change, remove single-byte + character set parsing of brackets. + (match_mb_charset): Test against work_mbc->cset. + * src/dfa.h (struct mb_char_classes): Add cset. + +2010-03-17 Paolo Bonzini <bonzini@gnu.org> + + syntax-check: remove space-tab exception + * .x-sc_space_tab: Remove. + * src/dfa.c: Fix space-tab occurrence. + + THANKS: fix Jim Meyering's email address + * THANKS: Jim is now with Red Hat. + + dfa: add missing function + * src/dfa.c (using_utf8): New. + (addtok_wc, free_mbdata, dfaoptimize) [!MBS_SUPPORT]: Do not define. + (dfacomp) [!MBS_SUPPORT]: Do not call dfaoptimize. + + tests: fix typo + * fedora: Fix typo. + + tests: use Exit + * euc-mb: exit with "Exit 0". + + grep: remove more register keywords + * dosbuf.c: Remove register keywords. + * grep.c: Remove register keywords. + * kwset.c: Remove register keywords. + * search.c: Remove register keywords. + +2010-03-17 Paolo Bonzini <bonzini@gnu.org> + + dfa: run simple UTF-8 regexps as a single-byte character set + This provides a speedup whenever fgrep is "almost" sufficient but + not quite (e.g. grep ^abc). This affects test cases such as + https://savannah.gnu.org/bugs/?29117, which are already worked around + by the line-by-line matching patch c32c04; without that patch the + speedup can reach 1000x even on non-contrived testcases. + + * src/dfa.c (dfaoptimize): New. + (dfacomp): Call it. + +2010-03-17 Paolo Bonzini <bonzini@gnu.org> + + tests: fix syntax-check failures + * tests/case-fold-backref: Use "foo" instead of "the". + * tests/dfaexec-multibyte: Remove trailing blanks. 
+ +2010-03-17 Paolo Bonzini <bonzini@gnu.org> + + grep: remove check_multibyte_string, fix non-UTF8 missed match + Avoid computing ahead something that can be computed lazily as efficiently + (or more efficiently in the case of UTF-8, though this is left as TODO). + At the same time, "soften" the rejection condition for matching in the + middle of a multibyte sequence to fix bug 23814. + + Multibyte "grep -i" would still be very slow if it wasn't for the workaround + patch c32c042 (grep: match multibyte charsets line-by-line when using -i, + 2010-03-08). + + * NEWS: Document bugfix. + * src/search.c (check_multibyte_string): Rewrite as... + (is_mb_middle): ... this. + (EGexecute, Fexecute): Adjust. + * tests/Makefile.am (TESTS): Add euc-mb. + * tests/euc-mb: New testcase. + +2010-03-17 Paolo Bonzini <bonzini@gnu.org> + + dfa: cache MB_CUR_MAX for dfaexec + * src/dfa.c (state_index, dfaexec): Use d->mb_cur_max. + (dfainit): Initialize it. + (free_mbdata): New, extracted out of dfafree. + (dfafree): Use it. + + dfa: improve documentation of struct dfa + * src/dfa.h (struct dfa): Reword some comments. + + tests: factor name of output files into a variable + * tests/case-fold-backref, tests/case-fold-char-class, + tests/case-fold-char-range, tests/case-fold-char-type, + tests/dfaexec-multibyte: Use a variable for the output filename, + as it is common to the grep and compare invocations. + + tests: use different output files to simplify reading failed .log files + * tests/case-fold-backref, tests/case-fold-char-class, + tests/case-fold-char-range, tests/case-fold-char-type: Use a different + name for each output file from grep. + * tests/dfaexec-multibyte: Likewise, and merge some grep invocations. + + tests: add another grep -i testcase, from bug 16179 + * tests/case-fold-backref: New. + * tests/Makefile.am (TESTS): Add it. 
+ +2010-03-16 Paolo Bonzini <bonzini@gnu.org> + + dfa: rewrite handling of multibyte case_fold lexing + Let dfacomp do the folding to lowercase of multibyte input strings, + and remove it from grep.c. Input strings to kwset.c are still folded + outside kwset.c, so we still need to do mbtolower in search.c. + + * NEWS: Document bugfixes. + * .x-sc_cast_of_argument_to_free: Remove. + * src/dfa.c (wctok, addtok_wc): New. + (cur_mb_index, update_mb_len_index): Remove. + (FETCH): Do not call it. + (parse_bracket_exp_mb) [GREP]: Disable case-folding of ranges and + characters. + (addtok): Extract part to... + (addtok_mb): ... this new function. + (lex): Call fetch_wc in the main loop for MB_CUR_MAX > 1. Return WCHAR + for normal characters if MB_CUR_MAX > 1. + (atom): Handle WCHAR instead of treating multibyte characters specially. + Do case folding of multibyte characters here. + (dfacomp): Remove case_fold special casing. + * src/dfa.h (WCHAR): New. + * src/grep.c (mb_icase_keys): Remove. + (main): Do not call it. + * src/search.c (kwsinit): Init transition table only for MB_CUR_MAX == 1. + (mbtolower): New. + (kwsincr_case): New. + (kwsmusts): Call it instead of kwsincr. + (check_multibyte_string): Remove. + (check_multibyte_string_no_icase): Rename to check_multibyte_string. + (GEAcompile, EGexecute, Fcompile): Use mbtolower instead of the old + check_multibyte_string. + * tests/Makefile.am (TESTS): Add case-fold-backslash-w. + * tests/foad1.sh: Enable fixed tests. + * tests/case-fold-backslash-w: New. + +2010-03-16 Paolo Bonzini <bonzini@gnu.org> + + grep: match multibyte charsets line-by-line when using -i + The turtle combination -i + MB_CUR_MAX>1 requires case conversion ahead + of time. Avoid doing this repeatedly when many matches succeed. Together + with the previous changes, this fixes https://savannah.gnu.org/bugs/?29117 + and https://savannah.gnu.org/bugs/?14472. + + * NEWS: Document new speedup. + * src/grep.c (do_execute): New. + (grepbuf): Use it. 
+ +2010-03-15 Paolo Bonzini <bonzini@gnu.org> + + dfa: fix handling of ranges in multibyte character sets + * src/dfa.c (parse_bracket_exp_mb): Add separate ranges for + lowercase and uppercase endpoints if folding case. + * tests/Makefile.am (TESTS): Add case-fold-char-range. + * tests/case-fold-char-range: New. + + tests: add more UTF-8 test cases + * tests/Makefile.am (TESTS): Add spencer1-locale. + (EXTRA_DIST): Add spencer1-locale.awk. + * tests/spencer1-locale.awk: New. + * tests/spencer1-locale: New. + +2010-03-15 Jim Meyering <meyering@redhat.com> + + tests: complete the renaming fedora.sh -> fedora + * tests/Makefile.am (TESTS): Rename fedora.sh -> fedora here, too. + +2010-03-15 Jim Meyering <meyering@redhat.com> + + * tests/fedora.sh: Rename to... + * tests/fedora: ...this, to reflect new convention: + Use the lack of a suffix to indicate we've converted to the new + init.sh-using test framework. + + tests: adjust fedora.sh to handle traps more portably + +2010-03-15 Jim Meyering <meyering@redhat.com> + + tests: adjust fedora.sh to handle traps more portably + * tests/fedora.sh: Use "Exit", not "exit". + + tests: for each test, set an envvar to its name + * tests/Makefile.am (TESTS_ENVIRONMENT): Set GREP_TEST_NAME for + each test. This is used to help make the output of hundreds of + independent, often-parallel valgrind runs more manageable. + +2010-03-14 Jim Meyering <meyering@redhat.com> + + tests: clean up fedora.sh + * tests/fedora.sh: Use "grep", not ${GREP}. + Use init.sh. + Use timeout 10, not sleep 1 (three times). + The latter would always sleep for 3 seconds, and the test would + fail with a false positive on a slow system or with a heavily + instrumented (valgrind) executable. + +2010-03-12 Jim Meyering <meyering@redhat.com> + + build: avoid build failure with --enable-gcc-warnings + * src/dfa.c: Don't include <assert.h>, now that it is not used. + [DEBUG]: Remove #ifdef block. 
+ +2010-03-12 Paolo Bonzini <bonzini@gnu.org> + + syntax-check: enable space-tab + * cfg.mk (local-checks-to-skip): Enable space-tab. + * .x-sc_space_tab: Add exceptions. + * tests/status.sh: Fix occurrence. + + syntax-check: enable m4-quote-check + * cfg.mk (local-checks-to-skip): Enable m4-quote-check. + * configure.ac: Fix occurrence. + + syntax-check: enable makefile-TAB-only-indentation + * cfg.mk (local-checks-to-skip): Enable makefile-TAB-only-indentation. + * Makefile.am: Fix only occurrence. + + grep: fix error-message-uppercase + * cfg.mk (local-checks-to-skip): Enable error-message-uppercase. + * src/dfa.c (parse_bracket_exp_mb, lex, dfaparse): Fix occurrences. + * src/search.c (Pcompile, Pexecute): Fix occurrences. + + dfa, grep: cleanup if-before-free and cast-of-argument-to-free + * .x-sc_avoid_if_before_free: Remove. + * .x-sc_cast_of_alloca_return_value: Remove. + * .x-sc_cast_of_x_alloc_return_value: Remove. + * .x-sc_cast_of_argument_to_free: Temporarily add src/search.c. + * cfg.mk (local-checks-to-skip): Remove sc_cast_of_argument_to_free. + * src/dfa.c (ifree): Remove. + (dfamust, build_state, transit_state, dfafree): Do not do if-before-free, + do not cast free argument to ptr_t or char *. + (freelist): Call free instead of ifree. + * src/dfa.h (ptr_t): Remove. + +2010-03-12 Paolo Bonzini <bonzini@gnu.org> + + dfa: remove CRANGE dead code + The only use of CRANGE was removed by commit 193830d. In theory it is + more correct to do what CRANGE did, but in practice it seems like it did + not work. + + * src/dfa.h (token): Remove CRANGE. + * src/dfa.c (atom): Do not handle CRANGE. + (prtok): Likewise. + +2010-03-12 Paolo Bonzini <bonzini@gnu.org> + + dfa: get rid of x*alloc + * src/dfa.c: Include xalloc.h. + (xmalloc, xrealloc, xcalloc): Remove. + + grep: cleanup one const cast + * src/search.c (GEAcompile): Do not reuse motif when operating on the + (const) pattern, so we can make it non-const. Remove cast from free. 
+ + kwset/system: remove ptr_t + * src/kwset.h: Declare kwset using an incomplete struct type. + * src/system.h (ptr_t): Remove. + +2010-03-12 Jim Meyering <meyering@redhat.com> + + tests: add test cases for dfaexec bug + * tests/dfaexec-multibyte: New test. + * tests/Makefile.am (TESTS): Add it. + Reported by Paolo Bonzini in http://bugzilla.redhat.com/544407 + and http://bugzilla.redhat.com/544406 . + +2010-03-12 Jim Meyering <meyering@redhat.com> + + dfa: manually merge gawk's dfaexec + * src/dfa.c (dfaexec): Adjust API: return pointer, not offset, and + take an "end" pointer parameter, rather than integral "size". + Adjust comment accordingly. + (build_state): Maintain d->newlines. + (copytoks): Update multibyte_prop indices. + (SKIP_REMAINS_MB_IF_INITIAL_STATE): Update a cast. + Return NULL, rather than (size_t) -1. + (realloc_trans_if_necessary): Realloc d->newlines. + * src/dfa.h (struct dfa): New member, "newlines". + (struct dfa) [GAWK]: New member, "broken". + (dfaexec): Update prototype and copy the new comment from dfa.c. + + dfa: make search.c use the new dfaexec API + + * src/search.c: Adjust to new dfaexec API. + Now, dfaexec returns a pointer, not an integer, + and the third parameter is END, not buffer size. + * src/dfa.c (dfaexec): Rewrite the function's comment. + Don't just clobber *END. While doing that happens to be + fine for gawk's usage, in grep, *END usually points to the + first byte of the next buffer. Save the initial value, + and restore it just before returning. + * src/dfa.h (dfaexec): Update comment; include parameter names. + +2010-03-12 Jim Meyering <meyering@redhat.com> + + dfa: appease static analyzers + * src/dfa.c (transit_state_singlebyte): Call abort rather + than returning in a "can't happen" scenario. + This stops clang from emitting a false-positive report (I think it + was used-uninitialized) about a caller. 
+ +2010-03-11 Jim Meyering <meyering@redhat.com> + + dfa: do not accept [[:UPPER:]] or [[:LOWER:]] internally + * src/dfa.c (parse_bracket_exp_mb): Those class names are not + valid, and rejected elsewhere, so there is no point in allowing + upper or mixed-case versions here. + +2010-03-11 Jim Meyering <meyering@redhat.com> + + maint: remove a trailing space + * src/search.c (EXECUTE_FCT): Remove trailing space. + + maint: remove all uses of PARAMS + Remove most with this: + git grep -lw PARAMS |xargs perl -pi -e 's/\bPARAMS *\((.*)\);/$1;/' + Remove the remainder manually. + +2010-03-11 Jim Meyering <meyering@redhat.com> + + maint: remove all uses of PARAMS + * lib/savedir.h (PARAMS): Remove definitions manually. + Remove the remaining ones via this command: + git grep -l define.PARAMS |xargs perl -ni -e '/define PARAMS/ or print' + * src/dfa.h (PARAMS): Remove definitions. + * src/system.h (PARAMS): Likewise. + Remove most uses with this: + git grep -lw PARAMS |xargs perl -pi -e 's/\bPARAMS *\((.*)\);/$1;/' + Remove the remainder manually. + + maint: remove now-useless prototypes + * src/dfa.c: Remove the prototype of each static, non-recursive + function whose definition precedes first use. + + grep: plug an inconsequential leak + * src/grep.c (main): Plug a leak: free "keys". + + grep: avoid useless allocations for empty GREP_OPTIONS + * src/grep.c (prepend_default_options): Ignore GREP_OPTIONS + when it's empty, not just when it's undefined. + There are still relatively harmless leaks when GREP_OPTIONS + is set and non-empty. We'll address those, eventually. + +2010-03-09 Jim Meyering <meyering@redhat.com> + + build: record build-from-clone tool requirements + * bootstrap.conf (buildreq): This makes bootstrap fail with + a clear explanation of the problem. Otherwise, you'd get into + the build process and fail with something far more cryptic. + + dfa: remove a trailing blank + * src/dfa.c (dfaexec): No trailing blanks allowed. 
+ + dfa: sync a tiny change from gawk + * src/dfa.c (state_index) [MBS_SUPPORT]: Initialize .mbps.nelem member + unconditionally. Also initialize .mbps.elems. + + dfa: avoid a leak (work_mbc->chars) + * src/dfa.c (parse_bracket_exp_mb): Remove useless (and leaked) MALLOC. + + doc+bootstrap: document build-from-git-clone process + * bootstrap: Update from coreutils/gnulib. + * README-hacking: New file, nearly identical to the one in coreutils. + +2010-03-08 Paolo Bonzini <bonzini@gnu.org> + + more work on TODO + * TODO: More work on the first section. Use clearer section headers. + +2010-03-08 Reuben Thomas <rrt@sc3d.org> + + bring TODO up-to-date + * TODO: merge with TODO section of http://www.gnu.org/software/grep/devel.html + and remove done items. Some small bits of tidying also. + +2010-03-07 Paolo Bonzini <bonzini@gnu.org> + + simplify parsing of [a-z] + * src/dfa.c (in_coll_range): New. + (lex): Use it instead of regcomp/regexec. + + Small refactoring in src/dfa.c + * src/dfa.c (parse_bracket_exp_mb): Return MBCSET. + (lex): Assign return value of parse_bracket_exp_mb to lasttok, return it. + + use do...while(0) idiom + * dfa.c (FETCH): Wrap with do...while(0). + +2010-03-06 Paolo Bonzini <bonzini@gnu.org> + + extract common code from if/else + * dfa.c (dfaexec): Simplify logic for MB_CUR_MAX > 1 case. + + remove register variable hacks + * dfa.c (dfaexec): We can extract the address of a variable without fearing + performance problems, modern compilers know better. + + remove register keywords + * dfa.c (dfaexec): Modern compilers just ignore it. + + allow grep -Pz + * NEWS: Document grep -P improvements. + * src/search.c (Pcompile): Remove restriction on grep -Pz. + * tests/pcre-z: New. + * tests/Makefile.am (TESTS): Add pcre-z. + + fix cross-line matching in PCRE backend + * search.c (Pexecute): Split the buffer into lines and match each line + separately. + * tests/fedora.sh: Add regression testsuite.
+ + fix formatting of NEWS + * NEWS: fix formatting of 2.6 entries. + + fix a bug in handling of -i and character type + * dfa.c (parse_bracket_exp_mb): Convert [[:lower:]] and [[:upper:]] to + [[:alpha:]] when folding case. + * tests/case-fold-char-type: New file. Test for the bug. + * tests/Makefile.am (TESTS): Add it. + * NEWS (Bug fixes): Mention it. + + fix previous test case change + * tests/case-fold-char-class: Do not reset fail to 0 after first test. + +2010-03-06 Mike Frysinger <vapier@gentoo.org> + + grep(1) man page: touchup --label option + * doc/grep.1 (--label): Don't italicize ending period. Point to -H + option. + +2010-03-06 Paolo Bonzini <bonzini@gnu.org> + + augment case-fold-char-class test case + * tests/case-fold-char-class: Test matching lowercase against uppercase + as well as vice versa. + +2010-03-05 Reuben Thomas <rrt@sc3d.org> + + doc: improve the discussion of PCRE + * doc/grep.1: Add a sentence about Perl regular expressions, + and point to pcresyntax(3) and pcrepattern(3). + * doc/grep.texi: Likewise. + +2010-03-05 Jim Meyering <meyering@redhat.com> + + maint: dfa-sync: comment and dead-to-grep code: no semantic change + * src/dfa.c: Sync a comment and some #ifdef GAWK code. + + maint: dfa-sync: don't malloc zero + * src/dfa.c (dfacomp): Skip case_fold logic when length is zero. + This is probably "no semantic change", but does improve efficiency in + a degenerate case. + + maint: dfa-sync: use CALLOC rather than equiv. MALLOC+initialize-loop + * src/dfa.c (dfaanalyze): Sync from gawk. No semantic change. + + dfa.c: add support for \s and \S + * src/dfa.c (lex): Sync from gawk's dfa.c. + + maint: dfa-sync: add omitted array initializer + * src/dfa.c (prednames): Add a "0" to final initializer. + No semantic change. + + fix a bug in handling of -i and character classes + * dfa.c (parse_bracket_exp_mb): Sync one part of this function + from gawk's dfa.c, which was patched by Arnold D. Robbins. + * tests/case-fold-char-class: New file.
Test for the bug. + * tests/Makefile.am (TESTS): Add it. + (TESTS_ENVIRONMENT): Propagate LOCALE_FR and LOCALE_FR_UTF8 + definitions into tests. + * NEWS (Bug fixes): Mention it. + +2010-03-05 Paolo Bonzini <pbonzini@redhat.com> + + Fedora Grep regression test suite + * tests/Makefile.am (TESTS): Add fedora.sh. + (CLEANFILES): Add several new files. + * tests/fedora.sh: New file, originally by Lubomir Rintel but somewhat + rewritten to avoid bashisms. + +2010-03-05 Paolo Bonzini <bonzini@gnu.org> + + convert AUTHORS file to UTF-8 + * AUTHORS: Convert to UTF-8. + + eliminate invalid "ptr += (ptr2 - ptr1)" + * lib/savedir.c (savedir): new_name_space and name_space do not point into + the same object, so computing their difference is invalid. Similarly, + adding the difference to namep is invalid because namep and the result + point into different objects. Avoid this. + + fix for bug 21276 + * lib/savedir.c (isdir1): Use realloc instead of calloc. Remove + dead code. + (savedir): Do not leak name_space if allocation of new_name_space fails. + +2010-03-04 Jim Meyering <meyering@redhat.com> + + tests: add a test based on an example from Paolo Bonzini + * tests/word-multi-file: New test. + * tests/Makefile.am (TESTS): Add it. + + doc: document release procedure + * README-release: New file. + + build: update gnulib submodule to latest + +2010-02-22 Paolo Bonzini <bonzini@gnu.org> + + add --group-separator=FOO and --no-group-separator + * src/grep.c (group_separator): New. + (long_options): Add --group-separator=FOO and --no-group-separator. + (prtext): Print group_separator instead of SEP_STR_GROUP. Optionally + suppress the separator altogether. + (main): Handle GROUP_SEPARATOR_OPTION. + * doc/grep.texi (Context control): Document it. + * NEWS: Mention it. + * tests/yesno.sh: Add testcases. + +2010-02-21 Jim Meyering <meyering@redhat.com> + + tests: don't use "echo -n" + * tests/foad1.sh: Use printf, not echo -n. The latter is not portable. + Reported by Daniel Richman.
+ +2010-02-08 Jim Meyering <meyering@redhat.com> + + remove useless DJGPP-specific code + * src/grep.c (grepfile): Remove now-useless DJGPP-specific code. + Now, all S_IS* macros are guaranteed to be defined via gnulib. + +2010-02-07 Jim Meyering <meyering@redhat.com> + + tests: add help-version sanity tests from coreutils + * tests/help-version: New test, from coreutils. + * tests/Makefile.am (TESTS): Add it. + (TESTS_ENVIRONMENT) [built_programs]: Define it. + + tests: correct TESTS_ENVIRONMENT's PATH setting + * tests/Makefile.am (TESTS_ENVIRONMENT): Set PATH to start with + $(abs_top_builddir)/src, so that we test the programs we've just built. + + grep: use the correct exit status (2) upon write failure, not 1 + * src/grep.c (main): Initialize exit_failure to EXIT_TROUBLE. + * NEWS (Bug fixes): Mention this fix. + + maint: enable the prohibit_magic_number_exit syntax check + * cfg.mk (local-checks-to-skip): Remove sc_prohibit_magic_number_exit, + to enable that check. + * src/system.h (EXIT_TROUBLE): Define. + * src/grep.c: Use symbolic names, EXIT_SUCCESS, EXIT_FAILURE, and + EXIT_TROUBLE, not 0, 1, 2. + * src/search.c: Likewise. + * src/vms_fab.c (string): Likewise. + +2010-02-04 Jim Meyering <meyering@redhat.com> + + doc: adjust NEWS item + * NEWS: Correct a description. + +2010-02-03 Jim Meyering <meyering@redhat.com> + + tests: exercise surprising -m1 vs. --context behavior + * tests/max-count-vs-context: New test. Exercise the surprising, + but documented, behavior reported by Markus Jochim in + http://savannah.gnu.org/bugs/?28588. + * tests/Makefile.am (TESTS): Add it. + + tests: use init.sh from gnulib + * tests/init.sh: New file, from gnulib. + * tests/Makefile.am (EXTRA_DIST): Add it. + (TESTS_ENVIRONMENT): Add variables and features. + (VERBOSE): Define. + + maint: remove unused Makefile rule + * tests/Makefile.am (dist-hook): Remove rule. No longer needed. 
+ + maint: adjust formatting in tests/Makefile.am + * tests/Makefile.am (TESTS, CLEANFILES): Align and sort. + + build: avoid warnings in gnulib-supplied regex files + Now that we enable more warnings in lib/, we choose + to avoid some via patches applied by bootstrap, using + files in the gl/ hierarchy. Other, less-important + warnings are avoided simply by turning off the + -Wold-style-definition option and using a slightly + relaxed set of warnings $(GNULIB_WARN_CFLAGS) in lib/. + * gl/lib/regcomp.c.diff: Avoid warnings. + * gl/lib/regex_internal.c.diff: Likewise. + * gl/lib/regex_internal.h.diff: Likewise. + * gl/lib/regexec.c.diff: Likewise. + * configure.ac (GNULIB_PORTCHECK): Disable only -Wold-style-definition. + * lib/Makefile.am (AM_CFLAGS): Use $(GNULIB_WARN_CFLAGS) rather + than the slightly more strict $(WARN_CFLAGS). + + tests: adjust spencer #37 to pass with gnulib's regex code + * tests/spencer1.tests: Change #37 to expect an exit status of 2, not 1. + grep 'a[b-a]' reports "Invalid range end". + + maint: use regex from gnulib, rather than our bit-rotting one + * bootstrap.conf (gnulib_modules): Add regex. + * configure.ac: Don't use jm_INCLUDED_REGEX. + Update use of cache variable. + * lib/regex.c: Remove file. + * lib/regex.h: Likewise. + * m4/regex.m4: Likewise. + * POTFILES.in: Update to match. + + build: update gnulib submodule to latest + +2010-01-28 Jim Meyering <meyering@redhat.com> + + maint: update to latest gnulib; adjust cfg.mk + * gnulib: Update submodule to latest. + * cfg.mk (old_NEWS_hash): Update to reflect NEWS Copyright line change. + +2010-01-06 Jim Meyering <meyering@redhat.com> + + maint: avoid old jm_* macros + There were jm_* macros here, until very recently. + * cfg.mk (sc_prohibit_jm_in_m4): New rule, from coreutils. + + maint: remove decl.m4 + * m4/decl.m4: Remove unused file. + + maint: rely on gnulib's new isdir.h + * src/grep.c: Include "isdir.h". + * src/system.h: Remove declaration of isdir. 
+ + build: rename local to avoid shadowing global, dfa + * src/dfa.c (dfamust): Rename parameter: s/dfa/d/. + + build: avoid warning from -Wmissing-prototypes + * src/dfa.c (match_mb_charset): Declare to be static. + + build: avoid shadowing warning for "link" + * src/kwset.c (link): Define to kwset_link, to avoid shadowing + the function. + + build: avoid shadowing warning for unused "rs" + * src/dfa.c (transit_state): Remove dead stores; + move a declaration "down". + Ignore transit_state_consume_1char return value. + + build: avoid shadowing warnings + * src/dfa.c (match_mb_charset): Rename parameter: s/index/idx/. + (check_matching_with_multibyte_ops, match_anychar): Likewise. + + build: avoid warning about unused definition of N_ + * src/dfa.c (N_): Remove unused definition. + + build: avoid format-string warnings + * src/search.c (dfaerror): Use literal "%s" as format string. + (kwsmusts, GEAcompile): Likewise. + (Pcompile): Likewise. + + build: add configure-time --enable-gcc-warnings option; avoid warnings + * bootstrap.conf (gnulib_modules): Add "manywarnings" module. + * configure.ac: Add --enable-gcc-warnings, derived from code in bison. + * src/Makefile.am (AM_CFLAGS): Set to $(WARN_CFLAGS) $(WERROR_CFLAGS) + * lib/Makefile.am (AM_CFLAGS): Likewise, but append. + + build: remove now-useless -I../intl option + * src/Makefile.am (INCLUDES): Remove -I../intl, now that intl is gone. + + maint: avoid more warnings + * src/grep.c (MAX): Remove definition of unused macro. + (usage): Declare with __attribute__ ((noreturn)). + Split long strings into chunks of length < 509. + + fix a possible bug: remove errant semicolon + * src/grep.c (prline): Remove erroneous semicolon-after-if-expr. + + maint: avoid compilation warnings + * bootstrap.conf (gnulib_modules): Add ignore-value. + * src/search.c (check_multibyte_string_no_icase): A variant of + check_multibyte_string that does *not* convert case, and hence + does not modify its BUF parameter. 
+ (check_multibyte_string): Use xcalloc in place of xmalloc+memset. + Use ignore_value to ignore the return value from wcrtomb. This is + ok, since we know the input is a valid upper case wide character. + (Fexecute, EGexecute): Update callers of check_multibyte_string + to use both it and check_multibyte_string_no_icase. + + maint: avoid warnings about unused fwrite return value + * bootstrap.conf (gnulib_modules): Add unlocked-io. + * src/system.h: Include "unlocked-io.h". + + maint: remove {m4,lib}/.gitignore; they were undergoing too much churn + * .gitignore: Ignore all of m4/* except m4/djgpp.m4 + and all of lib/* except Makefile.am, savedir.c and savedir.h. + * m4/.gitignore: Remove file. + * lib/.gitignore: Remove file. + +2010-01-05 Jim Meyering <meyering@redhat.com> + + build: run gnulib's tests, too + * Makefile.am (SUBDIRS): Add gnulib-tests. + * gnulib-tests/Makefile.am: New file. + * bootstrap.conf (bootstrap_epilogue): New function, from coreutils. + (gnulib_tool_option_extras): Define. + * configure.ac: Add gnulib-tests/Makefile. + +2010-01-03 Jim Meyering <meyering@redhat.com> + + maint: record update-copyright options for this package + * cfg.mk: Next time, just run "make update-copyright". 
+ +2010-01-01 Jim Meyering <meyering@redhat.com> + + maint: update all FSF copyright year lists to include 2010 + Use this command: + git ls-files |grep -vE '^(\..*|COPYING|gnulib)$' |xargs \ + env UPDATE_COPYRIGHT_USE_INTERVALS=1 build-aux/update-copyright + +2009-12-23 Jim Meyering <meyering@redhat.com> + + fix multi-byte-locale read-beyond-end-of-buffer error + Avoid read-beyond-end-of-buffer errors, evoked by running this: + LC_ALL=en_US.UTF-8 valgrind src/grep -f <(printf 'a\nb\n') <(echo c) + + Conditional jump or move depends on uninitialised value(s) + at 0x78136D: __gconv_transform_utf8_internal (in /lib/libc-2.11.so) + by 0x7E7232: mbrtowc (in /lib/libc-2.11.so) + by 0x8055773: dfaexec (dfa.c:2816) + by 0x804D7B0: EGexecute (search.c:353) + by 0x804ACD8: grepbuf (grep.c:1036) + by 0x804B023: grep (grep.c:1156) + by 0x804B460: grepfile (grep.c:1287) + by 0x804CF0D: main (grep.c:2282) + + Conditional jump or move depends on uninitialised value(s) + at 0x7E7248: mbrtowc (in /lib/libc-2.11.so) + by 0x8055773: dfaexec (dfa.c:2816) + by 0x804D7B0: EGexecute (search.c:353) + by 0x804ACD8: grepbuf (grep.c:1036) + by 0x804B023: grep (grep.c:1156) + by 0x804B460: grepfile (grep.c:1287) + by 0x804CF0D: main (grep.c:2282) + + * src/dfa.c (dfaexec) [MBS_SUPPORT]: Do not access one byte beyond + end of buffer. + +2009-12-23 Jim Meyering <meyering@redhat.com> + + build: update gnulib submodule to latest + +2009-12-23 Paolo Bonzini <bonzini@gnu.org> + + Speed up insert. + Suggested by Johan Walles <johan.walles@gmail.com> (bug 23354). + + * src/dfa.c (insert): Use binary search. + +2009-12-23 Johan Walles <johan.walles@gmail.com> + + Decrease epsclosure memory usage + Fixes bug 23321. + + * src/dfa.c (epsclosure): Make visited an array of char. + +2009-12-22 Paolo Bonzini <bonzini@gnu.org> + + Make 'grep -1 -2' and 'grep -1v2' equivalent to grep -2 + Fixes bug 12128. 
+ + * src/grep.c (get_nondigit_option): Reset the buffer every time + a non-digit option is found or a new argument is started. + +2009-12-22 Paolo Bonzini <bonzini@gnu.org> + + Improve description of --label + Fixes bug 22681. + + * doc/grep.1 (--label): Use -H in the example, improve wording. + * doc/grep.texi (Output Line Prefix Control): Likewise. + +2009-12-22 Paolo Bonzini <bonzini@gnu.org> + + Avoid using an invalid memchr result. + Related to bug 13161. I cannot find a testcase, but it is better to be + defensive considering that these bugs were found in the past. + + * src/search.c (EGexecute, Fexecute): Check for memchr return values. + +2009-12-11 Jim Meyering <meyering@redhat.com> + + build: update gnulib submodule to latest + +2009-12-04 Jim Meyering <meyering@redhat.com> + + maint: enable prohibit_have_config_h check + * cfg.mk (local-checks-to-skip): Enable sc_prohibit_have_config_h. + * lib/regex.c: Remove useless cpp test of HAVE_CONFIG_H. + * lib/savedir.c: Likewise. + * src/grep.c: Likewise. + * src/kwset.c: Likewise. + * src/search.c: Likewise. + + maint: enable cast_of_x_alloc_return_value check + * cfg.mk (local-checks-to-skip): Enable sc_cast_of_x_alloc_return_value. + * .x-sc_cast_of_x_alloc_return_value: + * src/dfa.c (CALLOC, MALLOC, REALLOC): Remove casts. + * src/dosbuf.c (undossify_input): Likewise. + * src/grep.c (print_line_middle, prepend_default_options): Likewise. + + maint: enable cast_of_alloca_return_value check + * cfg.mk (local-checks-to-skip): Enable sc_cast_of_alloca_return_value. + * .x-sc_cast_of_alloca_return_value: New file. + +2009-12-04 Paolo Bonzini <bonzini@gnu.org> + + fix "grep -Ff" on CRLF-terminated files + * src/search.c (Fcompile) [HAVE_DOS_FILE_CONTENTS]: Recognize \r\n as + a line terminator. + + fix compilation with included regex + * Makefile.am (libgreputils_a_DEPENDENCIES): New.
+ + switch to pkg-config for PCRE detection + * configure.ac: use pkg-config to detect PCRE + * src/Makefile.am (grep_LDADD): link grep with PCRE_LIBS + +2009-12-04 Jim Meyering <meyering@redhat.com> + + maint: remove "missing" script + * missing: Remove now-unused file. + + maint: make .gitignore ignore more + * .gitignore: Ignore more. + + maint: enable useless-if-before-free check + * cfg.mk (local-checks-to-skip): Enable sc_avoid_if_before_free. + * .x-sc_avoid_if_before_free: New file. Exempt regex.c and dfa.c, + in case anyone ever tries to merge their contents with other versions. + * src/grep.c (print_line_middle, grepdir): Remove useless if-before-free. + * src/search.c (IF_BK, EXECUTE_FCT): Likewise. + + maint: enable po-check + * cfg.mk (local-checks-to-skip): Enable sc_po_check. + * po/POTFILES.in: Sort and update. + +2009-12-03 Paolo Bonzini <bonzini@gnu.org> + + update gnulib, fixing missing inclusion of stdbool.h + * gnulib: Update. + +2009-11-30 Jim Meyering <meyering@redhat.com> + + maint: enable two checks + * cfg.mk (local-checks-to-skip): Enable two: + sc_prohibit_xalloc_without_use sc_two_space_separator_in_usage + * src/grep.c (usage): Conform: use two spaces, not 1. + * src/kwset.c (malloc): Define as a function-macro so that the + syntax-check rule sees that we are indeed using xmalloc here. + + maint: enable makefile_path_separator check + * cfg.mk (local-checks-to-skip): Enable sc_makefile_path_separator_check, + now that the sole offender, an old po/Makefile.in.in, is gone. + + maint: remove now-generated file: po/Makefile.in.in + * po/Makefile.in.in: Remove file, now generated via bootstrap. + + maint: enable makefile @...@ check + * cfg.mk (local-checks-to-skip): Enable sc_makefile_check. + * lib/Makefile.am (libgreputils_a_LIBADD): Use $(...), rather than + anachronistic @...@ notation. + * src/Makefile.am (LDADD): Likewise. + * tests/Makefile.am (AWK): Remove definition. 
+
+	maint: enable trailing_blank check
+	* cfg.mk (local-checks-to-skip): Enable sc_trailing_blank.
+	* AUTHORS: Remove trailing blanks.
+	* COPYING: Likewise.
+	* README: Likewise.
+	* README-alpha: Likewise.
+	* README-boot: Likewise.
+	* THANKS: Likewise.
+	* TODO: Likewise.
+	* src/dfa.c: Likewise.
+	* src/mbsupport.h: Likewise.
+	* tests/backref.sh: Likewise.
+	* tests/file.sh: Likewise.
+	* tests/options.sh: Likewise.
+	* tests/tests: Likewise.
+	* vms/README: Likewise.
+	* vms/make.com: Likewise.
+
+	maint: enable unmarked_diagnostics check
+	* cfg.mk (local-checks-to-skip): Enable sc_unmarked_diagnostics
+	* src/grep.c (fillbuf): Mark a diagnostic for translation.
+	(reset): Likewise.
+
+	maint: enable require_config_h checks
+	* cfg.mk (local-checks-to-skip): Enable sc_require_config_h
+	and sc_require_config_h_first.
+	* src/dosbuf.c: Include <config.h>.
+	* src/vms_fab.c: Likewise.
+	* .x-sc_require_config_h: New file: list the exceptions.
+	* .x-sc_require_config_h_first: Likewise.
+
+	maint: use gnulib's progname module; enable set_program_name check
+	* bootstrap.conf (gnulib_modules): Add progname.
+	* src/grep.c: Include "progname.h".
+	(program_name): Remove declaration.
+	(main): Call set_program_name.
+	* cfg.mk (local-checks-to-skip): Add sc_program_name.
+
+	maint: enable "file system" check
+	* cfg.mk (local-checks-to-skip): Enable sc_file_system.
+	* lib/savedir.c (savedir): Tweak spelling. Remove trailing blanks.
+
+	maint: enable immutable_NEWS check
+	* NEWS: Move copyright to the bottom.
+	Use the format required by release-related tools.
+	* .prev-version: New file.
+	* cfg.mk (old_NEWS_hash): Define.
+	(local-checks-to-skip): Enable check: sc_immutable_NEWS.
+
+	maint: disable the many failing syntax-checks
+	* cfg.mk: New file.
+	(local-checks-to-skip): Define to the list of disabled rules.
+	Subsequent change-sets will enable them, one by one.
+
+	build: require automake-1.11, enable silent-rules, parallel tests, xz
+	* configure.ac (AM_INIT_AUTOMAKE): Create xz-compressed tarballs,
+	not bzip2-compressed ones. Enable automake's silent-rules,
+	parallel tests, and test PASS/FAIL coloring options.
+	Use AC_CONFIG_HEADERS, not AM_CONFIG_HEADER. Quote the argument.
+
+	build: use git-version-gen for inter-release version strings
+	* configure.ac (AC_INIT): Use git-version-gen.
+
+	build: add several build- and release-related gnulib modules
+	* bootstrap.conf (gnulib_modules): Add announce-gen update-copyright
+	do-release-commit-and-tag git-version-gen gnu-web-doc-update
+	gnupload maintainer-makefile useless-if-before-free
+
+	build: adapt to the newer closeout module from gnulib
+	* src/grep.c: Include "exitfail.h".
+	(main) [-q]: Set the global variable, exit_failure, rather than
+	calling the now-removed close_stdout_set_file_name function.
+
+	build: adapt to the newer exclude API we now get from gnulib
+	* src/grep.c (main): Adapt to newer exclude.c: add EXCLUDE_WILDCARDS as
+	the new "option" argument in calls to add_exclude and add_exclude_file.
+
+	build: get more lib/* files from gnulib, adjust savedir
+	* bootstrap.conf (gnulib_modules): Add the following:
+	closeout exclude hard-locale isdir strtoumax.
+	* lib/.gitignore, m4/.gitignore: Update.
+	* lib/closeout.c, lib/closeout.h: Remove.
+	* lib/exclude.c, lib/exclude.h: Remove.
+	* lib/hard-locale.c, lib/hard-locale.h: Remove.
+	* lib/strtoumax.c: Remove.
+	* lib/isdir.c: Remove.
+	* lib/Makefile.am: Remove here, too.
+	* lib/savedir.c: Adapt to new exclude module:
+	s/excluded_filename/excluded_file_name/ and remove 3rd argument.
+
+	build: update gnulib submodule to latest
+
+	maint: generate ChangeLog from git logs
+	* Makefile.am (dist-hook, gen-ChangeLog): New rules.
+	* bootstrap.conf (gnulib_modules): Add gitlog-to-changelog.
+	Ensure that ChangeLog exists.
+	* ChangeLog-2009: Rename from ChangeLog
+	* ChangeLog: Remove file.
+	* .gitignore: Add ChangeLog.
+
+	maint: list gnulib modules one per line
+	* bootstrap.conf (gnulib_modules): List them one per line.
+
+2009-11-29 Tony Abou-Assaleh <taa@acm.org>
+
+	Acknowledge new maintainers, update README-alpha
+	* AUTHORS: new maintainers added
+	* THANKS: same
+	* README-alpha: change CVS references to Git