Commit message | Author | Age | Files | Lines
* Update .gitmodules to use upstream:gnulib [baserock/v2.21] | Pedro Alvarez | 2015-10-25 | 1 | -1/+1
* version 2.21 [v2.21] | Jim Meyering | 2014-11-23 | 1 | -1/+1
  * NEWS: Record release date.
* tests: sjis-mb: remove now-obsolete and failing sub-tests | Jim Meyering | 2014-11-21 | 1 | -2/+0
  * tests/sjis-mb: Commit v2.18-123-geb3292b changed how grep handles patterns with encoding errors. These SJIS tests are skipped so often that we didn't notice until now that there were two tests of that changed behavior, and that on any system with the ja_JP.SHIFT_JIS locale, they would always fail. Remove those two tests, since this functionality is well tested separately, via tests/prefix-of-multibyte.
* grep -F could erroneously fail to match in non-UTF8 multibyte locales | Norihiro Tanaka | 2014-11-20 | 2 | -3/+21
  This fixes a bug that can strike only when using a non-UTF8 multibyte locale like ja_JP.SHIFT_JIS. Consider this example: before this patch, it would mistakenly fail to match:
    printf '\203AA\n'|LC_ALL=ja_JP.SHIFT_JIS src/grep -F A
  When searching for a single byte that happens to be the latter byte of a multibyte character, and the target byte also follows that multibyte character, grep -F would advance an internal pointer by one byte too many, thus missing the target byte.
  A test case for this bug is already included in tests/sjis-mb.
  * src/kwsearch.c (Fexecute): Skip one byte less after matching the middle of a multibyte character.
  Introduced by commit v2.18-119-gfb7d538.
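To illustrate the skipping logic described above, here is a minimal C sketch; it is not the kwsearch.c code. It assumes a simplified Shift_JIS model (lead bytes 0x81-0x9F and 0xE0-0xFC, every character at most two bytes, trail bytes not validated), and find_char_sjis is a hypothetical helper. When memchr lands on the trail byte of a two-byte character, the search resumes at the end of that character, exactly one byte past the false hit. Resuming one byte later would skip the second 'A' in the commit's example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified Shift_JIS lead-byte test (assumption: validation of
   trail bytes is omitted).  */
static bool
sjis_lead (unsigned char c)
{
  return (0x81 <= c && c <= 0x9F) || (0xE0 <= c && c <= 0xFC);
}

/* Hypothetical helper: return the offset of the first occurrence of
   byte B in BUF[0..LEN) that starts a character, or -1 if none.  */
static ptrdiff_t
find_char_sjis (char const *buf, size_t len, char b)
{
  assert (buf != NULL);
  char const *end = buf + len;
  char const *boundary = buf;   /* always a known character boundary */
  char const *p = buf;
  while ((p = memchr (p, b, end - p)) != NULL)
    {
      /* Walk character by character from BOUNDARY up to P.  */
      char const *q = boundary;
      while (q < p)
        q += (sjis_lead (*q) && q + 1 < end) ? 2 : 1;
      if (q == p)
        return p - buf;         /* P starts a character: a real match */
      /* P was the trail byte of the character ending at Q.  Resume at Q,
         i.e. exactly one byte past P; skipping further could miss B.  */
      boundary = q;
      p = q;
    }
  return -1;
}
```

For the bytes of the commit's example, { 0x83, 'A', 'A' }, the sketch reports offset 2.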
* tests: big-match: disable OOM-provoking subtest | Jim Meyering | 2014-11-17 | 1 | -1/+4
  * tests/big-match: Our application of this regexp '^.*x\(\)\1' to a file containing a single matching line of length 2GiB+2 would cause inordinate memory consumption (over 100GB) via regexec.c, but no leak. That would cause disruption on most systems, so remove this subtest.
  Reported by Assaf Gordon.
* dfa: avoid undefined behavior | Norihiro Tanaka | 2014-11-16 | 1 | -2/+5
  * src/dfa.c (dfassbuild): Don't call memcpy with a second argument of NULL, even when the size (3rd argument) is 0.
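A minimal sketch of the pattern this commit applies, using a hypothetical copy_bytes wrapper: the C standard requires memcpy's pointer arguments to be valid even when the size is 0, so passing NULL is undefined behavior, and the simplest fix is to skip the call when there is nothing to copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: copy N bytes from SRC to DST, but never call
   memcpy when N is 0, since memcpy (dst, NULL, 0) is undefined behavior
   even though it would copy nothing.  */
static void *
copy_bytes (void *dst, void const *src, size_t n)
{
  assert (n == 0 || (dst != NULL && src != NULL));
  if (n != 0)
    memcpy (dst, src, n);
  return dst;
}
```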
* gnulib: update to latest | Jim Meyering | 2014-11-14 | 1 | -0/+0
* grep -F -x -o PAT would print an extra newline for each match | Norihiro Tanaka | 2014-11-14 | 4 | -2/+45
  * src/kwsearch.c (Fexecute): Correctly compute the length of a match by subtracting 2 (not 1) when match_lines is set. With -x, we augment the "line" by both prepending and appending an EOLBYTE to the search pattern. Here, we must correct for that. However, to compensate, when we are using -x (--line-regexp) and start_ptr is NULL, we have to add 1 to the length so that we still print the trailing EOLBYTE.
  Introduced by commit v2.18-85-g2c94326.
  * tests/match-lines: Add a new test.
  * tests/Makefile.am (TESTS): Add it.
  * NEWS (Bug fixes): Mention it.
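The length arithmetic can be seen in a small sketch. It assumes, hypothetically, a buffer that starts with '\n' and whose last line ends with '\n', and it uses a naive byte search; whole_line_match is illustrative, not the Fexecute code. Searching for "\nPAT\n" finds the line, and the -o match length is the keyword length minus 2 because both added EOL bytes must be dropped. Subtracting only 1 keeps the trailing newline, which is the extra output this commit fixes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Naive byte-string search; memmem would do, but it is a GNU extension.  */
static char const *
find_bytes (char const *hay, size_t hlen, char const *needle, size_t nlen)
{
  for (size_t i = 0; i + nlen <= hlen; i++)
    if (memcmp (hay + i, needle, nlen) == 0)
      return hay + i;
  return NULL;
}

/* Hypothetical -x search: find PAT as a whole line in BUF by searching
   for "\nPAT\n".  BUF must start with '\n', and its last line must end
   with '\n'.  Returns the offset of the matched line (or -1) and stores
   the -o match length in *MATCH_LEN.  */
static ptrdiff_t
whole_line_match (char const *buf, size_t buflen, char const *pat,
                  size_t *match_len)
{
  assert (buflen > 0 && buf[0] == '\n');
  size_t plen = strlen (pat);
  size_t kwlen = plen + 2;          /* pattern plus two added EOL bytes */
  char *kw = malloc (kwlen);
  if (kw == NULL)
    return -1;
  kw[0] = '\n';
  memcpy (kw + 1, pat, plen);
  kw[kwlen - 1] = '\n';
  char const *hit = find_bytes (buf, buflen, kw, kwlen);
  free (kw);
  if (hit == NULL)
    return -1;
  *match_len = kwlen - 2;           /* subtracting only 1 would keep the
                                       trailing '\n' in the -o output */
  return hit + 1 - buf;             /* skip the leading EOL byte */
}
```

When the whole line is printed rather than just the match, the commit adds 1 back so that the trailing EOL byte is still output; the sketch covers only the -o case.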
* tests: port to Darwin | Paul Eggert | 2014-11-11 | 1 | -10/+11
  The 'sed' command 's/.//' does not delete all bytes in the C locale.
  Problem reported by Nelson H. F. Beebe.
  * tests/fmbtest: Don't assume that sed treats bytes with the top bit set as valid characters in the C locale, as this is not true for Darwin. Use the cs_CZ.UTF-8 locale instead, and simplify the sed script.
* tests: fix recently-introduced stray output | Paul Eggert | 2014-11-11 | 1 | -1/+0
  * tests/init.cfg (require_pcre_): Remove stray debugging output.
* build: port to GCC 4.6.4 + glibc 2.5 | Paul Eggert | 2014-11-11 | 1 | -3/+4
  On platforms this old, building with _FORTIFY_SOURCE equal to 2 results in duplicate definitions of standard library functions.
  Problem reported by Nelson H. F. Beebe.
  * configure.ac (_FORTIFY_SOURCE): Sort after GNULIB_PORTCHECK. By default, do not enable this unless GNULIB_PORTCHECK is defined. This better matches the original intent, which as I recall was to enable these extra checks only with --enable-gcc-warnings.
* tests: port to libpcre sans UTF-8 support | Paul Eggert | 2014-11-11 | 3 | -3/+3
  Problem reported by Nelson H. F. Beebe.
  * tests/pcre-infloop, tests/pcre-invalid-utf8-input, tests/pcre-utf8: Skip the test unless PCRE works in an en_US.UTF-8 locale.
* tests: do not fail when the zh_CN.UTF-8 locale is not installed | Jim Meyering | 2014-11-09 | 1 | -1/+9
  * tests/word-multibyte: This test would fail on a system with no zh_CN.UTF-8 locale. Use it only if it is installed.
* tests: avoid hex_printf_ portability problems | Jim Meyering | 2014-11-09 | 1 | -2/+4
  * tests/init.cfg (hex_printf_): Spell out a-f and A-F, for non-C locales, ensure that the input to sed is newline-terminated, and quote the final octal format string.
  Suggestions from Paul Eggert.
* tests: avoid a multibyte tr portability problem | Jim Meyering | 2014-11-08 | 1 | -0/+9
  * tests/init.cfg (tr): New wrapper function. See comments for details.
  Reported by Norihiro Tanaka in http://debbugs.gnu.org/18991
* maint: remove spurious LC_ALL setting from one test | Jim Meyering | 2014-11-08 | 1 | -2/+0
  * tests/word-multibyte: Remove unnecessary setting of LC_ALL.
* tests: fix typo in previous change | Jim Meyering | 2014-11-08 | 1 | -1/+1
  * tests/init.cfg (hex_printf_): Fix typo s/A-f/A-F/.
  For the record, I introduced that error, not Norihiro.
* tests: avoid awk+printf+\xHH portability trap | Norihiro Tanaka | 2014-11-08 | 3 | -7/+8
  * tests/init.cfg (hex_printf_): Rewrite in terms of printf and sed. Using awk's printf with \xHH in the format string was not portable to the awk of Solaris 10, AIX 7 or HP-UX 11.23, as reported in http://debbugs.gnu.org/18987.
  * tests/word-multibyte: Use printf rather than hex_printf_, and give the character we're printing a name: e_acute (rather than A-grave), since that is used in other tests.
  a trailing \n in the format string, adjust by removing it, and instead invoking echo.
  * tests/multibyte-white-space: Simply remove each trailing \n. They were not needed.
* tests: avoid printf+\xHH portability trap | Jim Meyering | 2014-11-07 | 1 | -1/+1
  * tests/word-multibyte: Using the Bourne shell's printf function with strings like "\xHH\xHH" happens to work for most interactive shells, but not for dash. That is not portable. Use our hex_printf_ awk wrapper instead. Without this change, this test would fail on a Debian system for which /bin/sh is configured to be "dash".
* maint: move helper function, hex_printf to init.cfg | Jim Meyering | 2014-11-07 | 2 | -10/+10
  * tests/init.cfg (hex_printf_): New function, from ...
  * tests/multibyte-white-space: ... here. Reflect the s/hex_print/hex_printf_/ renaming.
* grep: port O_NOFOLLOW errno checking to NetBSD | Paul Eggert | 2014-11-02 | 2 | -1/+19
  Problem reported by Assaf Gordon in: http://bugs.gnu.org/18892
  * NEWS: Document it.
  * src/grep.c (open_symlink_nofollow_error): New function, which does the right thing on NetBSD.
  (grepfile): Use it.
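A minimal sketch of what such a check might look like; it reuses the function name from the commit but is an illustration, not the committed code. POSIX specifies ELOOP when O_NOFOLLOW meets a symbolic link, while the FreeBSD and NetBSD open(2) manuals document EMLINK and EFTYPE respectively for the same case (treat those two as assumptions to verify on the target system). EFTYPE is not defined everywhere, hence the #ifdef.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Sketch: does ERR, set by a failed open (..., O_NOFOLLOW), mean that
   the file is a symbolic link?  ELOOP is what POSIX specifies; EMLINK
   (FreeBSD) and EFTYPE (NetBSD) are assumptions based on their manuals.  */
static bool
open_symlink_nofollow_error (int err)
{
  if (err == ELOOP || err == EMLINK)
    return true;
#ifdef EFTYPE
  if (err == EFTYPE)
    return true;
#endif
  return false;
}
```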
* build: generate man pages even when existing targets are read-only | Jim Meyering | 2014-10-31 | 1 | -4/+8
  * doc/Makefile.am (grep.1): Use mv -f to move temporary to target, in case the target is read-only. Also, always make the generated files read-only.
  (egrep.1 fgrep.1): Likewise.
  This avoids a build failure reported by Eric Blake in http://lists.gnu.org/archive/html/bug-grep/2014-10/msg00112.html
* tests: avoid false-positive failure due to some zh_CN.* locales | Jim Meyering | 2014-10-30 | 1 | -1/+1
  On some systems, and for some zh_CN.* locales (e.g., OpenBSD 5.5) the E-acute pair of bytes do not qualify as a word-constituent character.
  * tests/word-multibyte: Use zh_CN.UTF-8, rather than "zh_CN".
  Reported by Assaf Gordon and Bruce Dubbs in http://debbugs.gnu.org/18892
* gnulib: update to latest; bootstrap, too | Jim Meyering | 2014-10-29 | 2 | -7/+18
  * gnulib: Update to latest.
  * bootstrap: Copy latest from gnulib.
* tests: make new test script executable | Jim Meyering | 2014-10-28 | 1 | -0/+0
  * tests/word-multibyte: Make this file executable.
* dfa: make \w and \W work in multibyte locales | Norihiro Tanaka | 2014-10-28 | 4 | -21/+67
  Reported by Jaroslav Skarvada in: http://bugs.gnu.org/18817
  \w and \W are now supported not only in single-byte locales but also in multibyte locales.
  * src/dfa.c (PUSH_LEX_STATE, POP_LEX_STATE): Move definitions "up", so they are not within the function.
  (lex): Make \w and \W work in a multibyte locale, the same way we made \s and \S work.
  * tests/word-multibyte: New test for this change.
  * tests/Makefile.am: Add a rule to build new test.
  * NEWS (Bug fixes): Mention it.
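To show what "\w works in a multibyte locale" means at the character level, here is a hedged sketch. word_char_at is hypothetical, and per the commit dfa.c handles \w the same way it handles \s during lexing rather than calling a helper like this. The sketch decodes one character with mbrtowc and treats it as a word constituent if iswalnum accepts it or it is an underscore, so the answer depends on the current LC_CTYPE locale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: is the character starting at S (at most N bytes)
   a \w word constituent, i.e. alphanumeric or '_', in the current
   LC_CTYPE locale?  Stores the character's byte length in *LEN.  */
static bool
word_char_at (char const *s, size_t n, size_t *len)
{
  assert (s != NULL && n > 0);
  mbstate_t st;
  memset (&st, 0, sizeof st);
  wchar_t wc;
  size_t r = mbrtowc (&wc, s, n, &st);
  if (r == (size_t) -1 || r == (size_t) -2)
    {
      *len = 1;          /* invalid or truncated sequence: not a word char */
      return false;
    }
  *len = r == 0 ? 1 : r; /* r == 0 means S starts with a null character */
  return wc == L'_' || iswalnum (wc);
}
```

In the default "C" locale only ASCII letters, digits and '_' qualify; after setlocale (LC_ALL, "") in a UTF-8 locale, a two-byte character such as é is typically classified as alphanumeric as well.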
* dfa: avoid false match in a non-UTF8 multibyte locale | Norihiro Tanaka | 2014-10-26 | 3 | -18/+67
  This command should print nothing:
    printf '\263\244\263\244\n' \
      | LC_ALL=ja_JP.eucJP grep -E "$(printf '^x|\244\263')"
  Before this patch, it would print its sole input line.
  * src/dfa.c (struct dfa): Add new members: min_trcount, initstate_letter, initstate_others.
  (dfaanalyze): Build states not only for the newline context but also for the other contexts.
  (build_state): Don't release initial states.
  (skip_remains_mb): Add a parameter. Add a comment describing all parameters.
  (dfaexec_main): When there are multiple start states, we are about to transition from one state to another and the current byte is not the first byte of a multibyte character, first advance past the current multibyte character.
  * tests/euc-mb: Add a new test.
  * NEWS (Bug fixes): Mention it.
  This addresses http://debbugs.gnu.org/18685
* tests: work around older libpcre bugs when testing -P and UTF-8 | Paul Eggert | 2014-10-25 | 1 | -3/+5
  * tests/pcre-invalid-utf8-input: Add require_timeout_ and require_compiled_in_MB_support. Put a timeout of 3 seconds on grep, to avoid having this test case loop forever with older versions of libpcre, such as those found on RHEL 6.5.
  Reported by Jim Meyering in: http://bugs.gnu.org/18806#34
* tests: add test for grep -P fix | Norihiro Tanaka | 2014-10-24 | 2 | -0/+18
  * tests/pcre-o: New test for this change.
  * tests/Makefile.am (TESTS): Add it.
* grep: fix grep -P crash | Paul Eggert | 2014-10-24 | 3 | -30/+14
  Reported by Shlomi Fish in: http://bugs.gnu.org/18806
  Commit 9fa500407137f49f6edc3c6b4ee6c7096f0190c5 (2014-09-16) is a hack that I put in to speed up 'grep -P'. Unfortunately, not only is it a violation of modularity, it's also a bug magnet, as we have found out with Bug#18738 and Bug#18806. Remove the optimization instead of applying more bandaids. Perhaps we can think of a better way of doing the optimization, or perhaps we can just live with a slower grep -P (as -P is inherently slower anyway...).
  * src/grep.c, src/grep.h (validated_boundary): Remove. All uses removed.
  * src/pcresearch.c (Pexecute): Do not worry about validated_boundary.
* dfa: remove two erroneous clauses from a now-unused function | Norihiro Tanaka | 2014-10-19 | 1 | -11/+1
  RE_DOT_NEWLINE and RE_DOT_NOT_NULL apply only to a dot that matches any character. Do not consider them when matching with a bracket expression.
  * src/dfa.c (match_mb_charset): Remove tests for RE_DOT_NEWLINE and RE_DOT_NOT_NULL.
* dfa: process all MBCSET constructs via glibc's matcher | Norihiro Tanaka | 2014-10-19 | 2 | -10/+20
  The DFA matcher does not support collating symbols or equivalence classes, so ensure that any MBCSET reference is handled by the glibc matcher. dfa.c already handled this in one case, but not the other, so that a command like "printf '\0' |src/grep -aE '^\s?$'" would mistakenly end up using dfa.c's match_mb_charset function rather than glibc's matcher.
  * src/dfa.c (dfaexec_main): Move that code into the State_transition macro. This renders the match_mb_charset function unused by grep.
  * tests/multibyte-white-space: Add a test to exercise the just-rendered-inaccessible code path.
* grep: initialize validation_boundary properly before use | Norihiro Tanaka | 2014-10-15 | 1 | -0/+1
  * src/grep.c (main): Initialize validation_boundary before pre-searching for an empty line.
* grep: fix off-by-one bug in -P optimization | Paul Eggert | 2014-10-15 | 2 | -2/+6
  Reported by Norihiro Tanaka in: http://bugs.gnu.org/18738
  * src/pcresearch.c (Pexecute): Fix off-by-one bug with validation_boundary.
  * tests/init.cfg (envvar_check_fail): Catch off-by-one bug.
* dfa: fix a theoretical bug | Norihiro Tanaka | 2014-10-08 | 1 | -0/+1
  * src/dfa.c (dfaexec_main): After searching for a match from the initial state, set the previous state, S1, to 0. So far, we have found no case in which this fix makes a difference.
  See http://debbugs.gnu.org/18645
* doc: modernize and simplify man page | Paul Eggert | 2014-10-07 | 1 | -133/+35
  * doc/grep.in.1 (Tx, Id): Remove. All uses removed.
  (MTO, URL): New macros, used for email and URL. Use them when appropriate.
  In main text, omit chatty discussions of other implementations; the full manual suffices for this sort of thing.
* doc: clarify exit status | Paul Eggert | 2014-10-07 | 2 | -23/+10
  Reported by Santiago Ruano Rincón in: http://bugs.gnu.org/18651
  * doc/grep.in.1 (EXIT STATUS):
  * doc/grep.texi (Exit Status): Clarify.
* dfa: test for just-fixed bug | Norihiro Tanaka | 2014-10-07 | 3 | -0/+51
  * tests/mb-dot-newline: New file.
  * tests/Makefile.am (TESTS): Add it.
  * NEWS (Bug fixes): Mention it.
  Bisection suggests that the bug was introduced by commit v2.18-123-geb3292b.
  Also see http://debbugs.gnu.org/cgi/bugreport.cgi?msg=17;bug=18580
* dfa: factor out a new nontrivial block of duplicated code | Norihiro Tanaka | 2014-10-05 | 1 | -41/+32
  * src/dfa.c (State_transition): New macro.
  (dfaexec_main): Use it twice.
* dfa: check end of input buffer after transition in non-UTF8 multibyte locale | Norihiro Tanaka | 2014-10-05 | 4 | -2/+63
  * src/dfa.c (dfaexec_main): Check for end of input buffer after each transition in a non-UTF8 multibyte locale.
  * tests/mb-non-UTF8-overrun: New test.
  * tests/Makefile.am (TESTS): Add it.
  * src/grep.c (main): With this fix, we no longer need the fourth byte of "eolbytes".
* grep: avoid stack buffer read-underrun and overrun | Jim Meyering | 2014-10-04 | 1 | -2/+3
  Testing binaries built with -fsanitize=address caused aborts due to stack underrun and overrun.
  * src/grep.c (main): Allocate a larger buffer for eolbytes: one byte before the beginning and one more after the end. For details, see http://debbugs.gnu.org/18580#44.
* grep: fix subscript error when testing whether empty lines match | Norihiro Tanaka | 2014-10-04 | 1 | -1/+2
  * src/grep.c (grep): When testing whether an empty line matches, make the input buffer one byte longer, as dfaexec uses that for a sentinel.
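The sentinel idea can be sketched as follows; find_eol_sentinel is a hypothetical helper, not dfaexec. The caller guarantees one writable byte past the data, and temporarily storing the end-of-line byte there lets the inner loop drop its bounds check, which is why the buffer must be one byte longer than the data it holds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the offset of the first EOL byte in
   BUF[0..LEN), or LEN if there is none.  BUF must have one writable
   byte at BUF[LEN]; it is used as a sentinel and then restored.  */
static size_t
find_eol_sentinel (char *buf, size_t len, char eol)
{
  assert (buf != NULL);
  char saved = buf[len];
  buf[len] = eol;                 /* guarantees the loop terminates */
  char const *p = buf;
  while (*p != eol)               /* no bounds check needed */
    p++;
  buf[len] = saved;
  return p - buf;
}
```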
* dfa: minor tweaks, mostly to remove __attribute__ ((noinline)) | Paul Eggert | 2014-09-27 | 2 | -9/+26
  That attribute isn't portable, and I found a way to get similar performance with standard C features.
  * NEWS: Document the recently-installed performance improvement.
  * src/dfa.c (struct dfa): New member dfaexec.
  (dfaexec_main): Remove unnecessary 'const'.
  (dfaexec_mb, dfaexec_sb): Remove __attribute__ ((noinline)); no longer needed.
  (dfaexec): Use new dfaexec member.
  (dfainit, dfaoptimize, dfassbuild): Initialize it.
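The structure that this commit and the next one describe (a shared inline body, thin multibyte and single-byte wrappers, and a function pointer chosen once at initialization) can be sketched in C as below. All names are hypothetical and the loop body is only a stand-in for real matching. Because each wrapper passes a constant flag, a compiler that inlines exec_main can specialize it for each case, and storing the chosen wrapper in the struct means the flag is not re-tested on every call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical matcher; EXEC plays the role of the new struct member.  */
struct matcher
{
  bool multibyte;
  size_t (*exec) (struct matcher const *, char const *, size_t);
};

/* Shared body.  MULTIBYTE is a constant at each call site below, so an
   inlining compiler can drop the untaken branch in each variant.  The
   "work" (counting ASCII bytes vs. all bytes) is only a placeholder.  */
static inline size_t
exec_main (struct matcher const *m, char const *s, size_t n, bool multibyte)
{
  (void) m;
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (! (multibyte && (unsigned char) s[i] >= 0x80))
      count++;
  return count;
}

static size_t
exec_mb (struct matcher const *m, char const *s, size_t n)
{
  return exec_main (m, s, n, true);
}

static size_t
exec_sb (struct matcher const *m, char const *s, size_t n)
{
  return exec_main (m, s, n, false);
}

/* Choose the variant once, at initialization time.  */
static void
matcher_init (struct matcher *m, bool multibyte)
{
  assert (m != NULL);
  m->multibyte = multibyte;
  m->exec = multibyte ? exec_mb : exec_sb;
}
```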
* dfa: separate dfaexec function to help optimization by compiler | Norihiro Tanaka | 2014-09-27 | 1 | -6/+29
  * src/dfa.c (dfaexec_main): Rename from dfaexec, add inline attribute.
  (dfaexec_mb): New function. Run it when d->multibyte is true. This function must not be inlined.
  (dfaexec_sb): New function. Run it when d->multibyte is false. This function must not be inlined.
  (dfaexec): Call dfaexec_mb or dfaexec_sb according to d->multibyte.
* dfa: speed-up at initial state | Norihiro Tanaka | 2014-09-27 | 1 | -19/+35
  The DFA state remains 0 until a potential match has been found, so speed up matching in that state by continuing to use the transition table.
  * src/dfa.c (skip_remains_mb): New function.
  (dfaexec): Speed up matching at the initial state.
* maint: generalize the -Wcast-align fix | Paul Eggert | 2014-09-27 | 1 | -8/+16
  * src/grep.c (CAST_ALIGNED): New macro.
  (skip_easy_bytes): Use it.
* maint: suppress a false-positive -Wcast-align warning | Jim Meyering | 2014-09-27 | 1 | -0/+7
  Building with --enable-gcc-warnings and gcc-4.9.1 would provoke this:
    grep.c:499:12: error: cast from 'const char *' to 'const uword *'\
      (aka 'const unsigned long *') increases required alignment from\
      1 to 8 [-Werror,-Wcast-align]
      for (s = (uword const *) p; ! (*s & hibyte_mask); s++)
               ^~~~~~~~~~~~~~~~~
  * src/grep.c (skip_easy_bytes): Use a pragma to suppress gcc's false-positive cast-alignment warning.
* grep: don't check extensively for invalid prefix bytes unless -P | Paul Eggert | 2014-09-26 | 1 | -0/+2
  Problem reported by Jim Meyering in: http://bugs.gnu.org/18454#56
  * src/grep.c (grep): After the first buffer is checked, leave the file-type checker in TEXTBIN_UNKNOWN state only when -P is used. Only the -P matcher has performance problems with checking binary data that make it worthwhile to check every prefix input byte so the -P matcher's TEXTBIN_UNKNOWN optimizations can come into play. Other matchers can simply check the data directly, and using TEXTBIN_UNKNOWN with them slows 'grep' down for no benefit.
* grep: scan for valid multibyte strings more quickly | Paul Eggert | 2014-09-26 | 1 | -17/+76
  Scan valid multibyte strings more quickly in the common case of encodings that are upward compatible with ASCII, such as UTF-8. You'd think there'd be a fast standard way to do this nowadays, but nooooo....
  Problem reported by Jim Meyering in: http://bugs.gnu.org/18454#56
  * src/grep.c (HIBYTE): New constant.
  (easy_encoding): New static var.
  (init_easy_encoding, skip_easy_bytes): New functions.
  (uword): New type.
  (buffer_textbin): Skip easy bytes quickly. Don't bother with mb_clen here, since skip_easy_bytes typically captures the easy cases; just use mbrlen directly.
  (buffer_textbin, file_textbin): First arg is no longer a const pointer, since the byte past the end is now an overwritten sentinel.
  (fillbuf): Make room for a uword after the buffer, for skip_easy_bytes.
  (main): Call init_easy_encoding.
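A portable sketch of the word-at-a-time scan that skip_easy_bytes is named for, assuming an ASCII-compatible encoding (so every byte below 0x80 is a complete single-byte character). This is not the grep code: it reads each word with memcpy rather than casting the pointer, which sidesteps the -Wcast-align issue that later commits address, and it uses no sentinel. A word with no high bit set is skipped in one step; the first word that has one is rescanned byte by byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned long uword;   /* assumption: the commit's uword type */

/* Return a pointer to the first byte in [P, END) whose high bit is set,
   or END if there is none.  In an ASCII-compatible encoding every byte
   before the result is a valid single-byte character.  */
static char const *
skip_easy_bytes (char const *p, char const *end)
{
  assert (p <= end);
  uword hibits;
  memset (&hibits, 0x80, sizeof hibits);   /* 0x8080...80 */

  /* Word at a time; memcpy avoids an alignment-increasing pointer cast.  */
  while ((size_t) (end - p) >= sizeof (uword))
    {
      uword w;
      memcpy (&w, p, sizeof w);
      if (w & hibits)
        break;                  /* some byte in this word is non-ASCII */
      p += sizeof w;
    }

  /* Byte at a time for the tail, or to locate the offending byte.  */
  while (p < end && ! ((unsigned char) *p & 0x80))
    p++;
  return p;
}
```

For "hello, world" followed by the UTF-8 bytes of é, the sketch stops at offset 12, the lead byte of é.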
* grep: speed up processing of holes before EOF on Solaris | Paul Eggert | 2014-09-17 | 1 | -0/+5
  * src/grep.c (fillbuf): If SEEK_DATA fails with errno == ENXIO, skip over the hole at EOF.
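A sketch of the hole-skipping logic, using a hypothetical skip_hole helper rather than grep's fillbuf. SEEK_DATA is an extension (it originated on Solaris and is also supported on Linux, where glibc exposes it under _GNU_SOURCE). lseek with SEEK_DATA fails with ENXIO when no data lies at or after the given offset, so a file that ends in a hole can be skipped straight to EOF instead of being read as zeros. Note that lseek also moves the file offset.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE 1   /* assumption: needed for SEEK_DATA with glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the offset at which reading should resume
   from OFFSET, skipping a hole if SEEK_DATA is available.  Returns -1 if
   FD cannot be repositioned at all.  */
static off_t
skip_hole (int fd, off_t offset)
{
  assert (offset >= 0);
#ifdef SEEK_DATA
  off_t data = lseek (fd, offset, SEEK_DATA);
  if (0 <= data)
    return data;                      /* next data at or after OFFSET */
  if (errno == ENXIO)
    return lseek (fd, 0, SEEK_END);   /* only a hole (or EOF) remains */
#endif
  /* SEEK_DATA unavailable or failed for another reason: read from OFFSET.  */
  return lseek (fd, offset, SEEK_SET);
}
```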