| Commit message | Author | Age | Files | Lines |
| |
* NEWS: Record release date.
| |
* tests/sjis-mb: Commit v2.18-123-geb3292b changed how grep
handles patterns with encoding errors. These SJIS tests are
skipped so often that we didn't notice until now that there were
two tests of that changed behavior, and that on any system with
the ja_JP.SHIFT_JIS locale, they would always fail. Remove those
two tests, since this functionality is well tested separately,
via tests/prefix-of-multibyte.
| |
This fixes a bug that can strike only when using a non-UTF8 multibyte
locale like ja_JP.SHIFT_JIS.
Consider this example: it would mistakenly fail to match before
this patch:
printf '\203AA\n'|LC_ALL=ja_JP.SHIFT_JIS src/grep -F A
When searching for a single byte that happens to be the latter
byte of a multibyte character, and the target byte also follows
that multibyte character, grep -F would advance an internal pointer
by one byte too many, thus missing the target byte. A test case
for this bug is already included in tests/sjis-mb.
* src/kwsearch.c (Fexecute): Skip one byte less, after matched middle of a
multi-byte character. Introduced by commit v2.18-119-gfb7d538.
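The off-by-one described above can be illustrated with a minimal sketch. It
is not the code in src/kwsearch.c: the function names are hypothetical, and
the lead-byte ranges are a simplified model of Shift_JIS assumed for this
example only. The point is that after landing in the middle of a two-byte
character, the scan must resume at the very next character boundary.

```c
#include <stddef.h>

/* Simplified Shift_JIS model (an assumption of this sketch): a byte in
   0x81-0x9F or 0xE0-0xFC starts a two-byte character; any other byte
   is a character by itself.  */
static int
is_sjis_lead (unsigned char c)
{
  return (0x81 <= c && c <= 0x9F) || (0xE0 <= c && c <= 0xFC);
}

/* Return the offset of the first occurrence of TARGET that is a whole
   character in BUF[0..LEN-1], or -1 if there is none.  TARGET is
   assumed to be a single-byte character.  */
long
find_byte_on_boundary (const unsigned char *buf, size_t len,
                       unsigned char target)
{
  size_t i = 0;
  while (i < len)
    {
      if (is_sjis_lead (buf[i]) && i + 1 < len)
        {
          /* Skip exactly the two bytes of this character.  Skipping
             one byte more would step over a match that follows it
             immediately.  */
          i += 2;
          continue;
        }
      if (buf[i] == target)
        return (long) i;
      i++;
    }
  return -1;
}
```

With the input from the example above, the bytes \203 A A \n, the first A is
the trailing byte of a two-byte character, and the second A, at offset 2, is
the one that should match.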
| |
* tests/big-match: Our application of this regexp '^.*x\(\)\1'
to a file containing a single matching line of length 2GiB+2
would cause inordinate memory consumption (over 100GB) via
regexec.c, but no leak. That would cause disruption on most
systems, so remove this subtest. Reported by Assaf Gordon.
| |
* src/dfa.c (dfassbuild): Don't call memcpy with a second
argument of NULL, even when the size (3rd argument) is 0.
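A minimal sketch of the kind of guard this implies, assuming a hypothetical
helper name (copy_if_nonempty is not a function in src/dfa.c). ISO C, for
example section 7.24.1 of C11, requires valid pointer arguments to memcpy
even when the size is zero, so the call is simply skipped in that case.

```c
#include <stddef.h>
#include <string.h>

/* Copy N bytes from SRC to DST and return DST.  SRC may be NULL when
   N is 0; memcpy is not called at all in that case.  */
void *
copy_if_nonempty (void *dst, const void *src, size_t n)
{
  if (n != 0)
    memcpy (dst, src, n);
  return dst;
}
```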
| |
* src/kwsearch.c (Fexecute): Correctly compute the length of a match
by subtracting 2 (not 1) when match_lines is set. With -x, we augment
the "line" by both prepending and appending an EOLBYTE to the search
pattern. Here, we must correct for that. However, to compensate,
when we are using -x (--line-regexp) and start_ptr is NULL, we have
to add 1 to the length so that we still print the trailing EOLBYTE.
Introduced by commit v2.18-85-g2c94326.
* tests/match-lines: Add a new test.
* tests/Makefile.am (TESTS): Add it.
* NEWS (Bug fixes): Mention it.
| |
The 'sed' command 's/.//' does not delete all bytes in the C locale.
Problem reported by Nelson H. F. Beebe.
* tests/fmbtest: Don't assume that sed treats bytes with the
top bit set as valid characters in the C locale, as this is not
true for Darwin. Use the cs_CZ.UTF-8 locale instead, and
simplify the sed script.
| |
* tests/init.cfg (require_pcre_): Remove stray debugging output.
| |
On platforms this old, building with _FORTIFY_SOURCE equal to 2
results in duplicate definitions of standard library functions.
Problem reported by Nelson H. F. Beebe.
* configure.ac (_FORTIFY_SOURCE): Sort after GNULIB_PORTCHECK.
By default, do not enable this unless GNULIB_PORTCHECK is defined.
This better matches the original intent, which as I recall was to
enable these extra checks only with --enable-gcc-warnings.
|
| |
Problem reported by Nelson H. F. Beebe.
* tests/pcre-infloop, tests/pcre-invalid-utf8-input, tests/pcre-utf8:
Skip the test unless PCRE works in an en_US.UTF-8 locale.
| |
* tests/word-multibyte: This test would fail on a system with
no zh_CN.UTF-8 locale. Use it only if it is installed.
| |
* tests/init.cfg (hex_printf_): Spell out a-f and A-F for
non-C locales, ensure that the input to sed is newline-terminated,
and quote the final octal format string.
Suggestions from Paul Eggert.
| |
* tests/init.cfg (tr): New wrapper function.
See comments for details. Reported by Norihiro Tanaka
in http://debbugs.gnu.org/18991
| |
* tests/word-multibyte: Remove unnecessary setting of LC_ALL.
| |
* tests/init.cfg (hex_printf_): Fix typo s/A-f/A-F/.
For the record, I introduced that error, not Norihiro.
| |
* tests/init.cfg (hex_printf_): Rewrite in terms of printf and sed.
Using awk's printf with \xHH in the format string was not portable
to the awk of Solaris 10, AIX 7 or HP-UX 11.23, as reported in
http://debbugs.gnu.org/18987.
* tests/word-multibyte: Use printf rather than hex_printf_,
and give the character we're printing a name: e_acute (rather
than A-grave), since that is used in other tests.
a trailing \n in the format string, adjust by removing it, and
instead invoking echo.
* tests/multibyte-white-space: Simply remove each trailing \n.
They were not needed.
| |
* tests/word-multibyte: Using the Bourne shell's printf function
with strings like "\xHH\xHH" happens to work for most interactive
shells, but not for dash. That is not portable. Use our hex_printf_
awk wrapper instead. Without this change, this test would fail on
a Debian system for which /bin/sh is configured to be "dash".
| |
* tests/init.cfg (hex_printf_): New function, from ...
* tests/multibyte-white-space: ... here. Reflect the
s/hex_print/hex_printf_/ renaming.
| |
Problem reported by Assaf Gordon in: http://bugs.gnu.org/18892
* NEWS: Document it.
* src/grep.c (open_symlink_nofollow_error):
New function, which does the right thing on NetBSD.
(grepfile): Use it.
| |
* doc/Makefile.am (grep.1): Use mv -f to move temporary to target,
in case the target is read-only. Also, always make the generated
files read-only.
(egrep.1 fgrep.1): Likewise.
This avoids a build failure reported by Eric Blake in
http://lists.gnu.org/archive/html/bug-grep/2014-10/msg00112.html
| |
On some systems, and for some zh_CN.* locales (e.g., on OpenBSD 5.5), the
E-acute pair of bytes does not qualify as a word-constituent character.
* tests/word-multibyte: Use zh_CN.UTF-8, rather than "zh_CN".
Reported by Assaf Gordon and Bruce Dubbs in
http://debbugs.gnu.org/18892
| |
* gnulib: Update to latest.
* bootstrap: Copy latest from gnulib.
| |
* tests/word-multibyte: Make this file executable.
| |
Reported by Jaroslav Skarvada in: http://bugs.gnu.org/18817
Now, \w and \W are supported not only in single-byte locales but also
in multibyte locales.
* src/dfa.c (PUSH_LEX_STATE, POP_LEX_STATE): Move definitions "up",
so they are not within the function.
(lex): Make \w and \W work in a multibyte locale, the same way
we made \s and \S work.
* tests/word-multibyte: New test for this change.
* tests/Makefile.am: Add a rule to build new test.
* NEWS (Bug fixes): Mention it.
| |
This command should print nothing:
printf '\263\244\263\244\n' \
| LC_ALL=ja_JP.eucJP grep -E "$(printf '^x|\244\263')"
Before this patch, it would print its sole input line.
* src/dfa.c (struct dfa): Add new members: min_trcount,
initstate_letter, initstate_others.
(dfaanalyze): Build states not only with a newline context but also
with the other contexts.
(build_state): Don't release initial states.
(skip_remains_mb): Add a parameter.
Add a comment describing all parameters.
(dfaexec_main): When there are multiple start states, we are about
to transition from one state to another and the current byte is not
the first byte of a multibyte character, first advance past the
current multibyte character.
* tests/euc-mb: Add a new test.
* NEWS (Bug fixes): Mention it.
This addresses http://debbugs.gnu.org/18685
| |
* tests/pcre-invalid-utf8-input: Add require_timeout_ and
require_compiled_in_MB_support. Put a timeout of 3 seconds on
grep, to avoid having this test case loop forever with older
versions of libpcre, such as those found on RHEL 6.5.
Reported by Jim Meyering in: http://bugs.gnu.org/18806#34
| |
* tests/pcre-o: New test for this change.
* tests/Makefile.am (TESTS): Add it.
| |
Reported by Shlomi Fish in: http://bugs.gnu.org/18806
Commit 9fa500407137f49f6edc3c6b4ee6c7096f0190c5 (2014-09-16) is a
hack that I put in to speed up 'grep -P'. Unfortunately, not only
is it a violation of modularity, it's also a bug magnet, as we have
found out with Bug#18738 and Bug#18806. Remove the optimization
instead of applying more bandaids. Perhaps we can think of a
better way of doing the optimization, or perhaps we can just live
with a slower grep -P (as -P is inherently slower anyway...).
* src/grep.c, src/grep.h (validated_boundary):
Remove. All uses removed.
* src/pcresearch.c (Pexecute): Do not worry about validated_boundary.
| |
RE_DOT_NEWLINE and RE_DOT_NOT_NULL apply only to a dot that
matches any character. Do not consider them when matching
with a bracket expression.
* src/dfa.c (match_mb_charset): Remove tests for RE_DOT_NEWLINE
and RE_DOT_NOT_NULL.
| |
The DFA matcher does not support collating symbols or equivalence
classes, so ensure that any MBCSET reference is handled by the glibc
matcher. dfa.c already handled this in one case, but not the other,
so that a command like "printf '\0' |src/grep -aE '^\s?$'" would
mistakenly end up using dfa.c's match_mb_charset function rather
than glibc's matcher.
* src/dfa.c (dfaexec_main): Move that code into the
State_transition macro. This renders the match_mb_charset
function unused by grep.
* tests/multibyte-white-space: Add a test to exercise the
just-rendered-inaccessible code path.
| |
* src/grep.c (main): Initialize validation_boundary before pre-searching
for an empty line.
| |
Reported by Norihiro Tanaka in: http://bugs.gnu.org/18738
* src/pcresearch.c (Pexecute): Fix off-by-one bug with
validation_boundary.
* tests/init.cfg (envvar_check_fail): Catch off-by-one bug.
| |
* src/dfa.c (dfaexec_main): After searching for a match from
the initial state, set the previous state, S1, to 0.
So far, we have found no case in which this fix makes a difference.
See http://debbugs.gnu.org/18645
| |
* doc/grep.in.1 (Tx, Id): Remove. All uses removed.
(MTO, URL): New macros, used for email and URL.
Use them when appropriate.
In main text, omit chatty discussions of other implementations;
the full manual suffices for this sort of thing.
| |
Reported by Santiago Ruano Rincón in: http://bugs.gnu.org/18651
* doc/grep.in.1 (EXIT STATUS):
* doc/grep.texi (Exit Status): Clarify.
| |
* tests/mb-dot-newline: New file.
* tests/Makefile.am (TESTS): Add it.
* NEWS (Bug fixes): Mention it.
Bisection suggests that the bug was introduced by
commit v2.18-123-geb3292b. Also see
http://debbugs.gnu.org/cgi/bugreport.cgi?msg=17;bug=18580
| |
* src/dfa.c (State_transition): New macro.
(dfaexec_main): Use it twice.
|
| |
* src/dfa.c (dfaexec_main): Check for end of input buffer after each
transition in a non-UTF8 multibyte locale.
* tests/mb-non-UTF8-overrun: New test.
* tests/Makefile.am (TESTS): Add it.
* src/grep.c (main): With this fix, we no longer need the fourth
byte of "eolbytes".
| |
Testing binaries built with -fsanitize=address caused aborts due
to stack underrun and overrun.
* src/grep.c (main): Allocate a larger buffer for eolbytes:
one byte before the beginning and one more after the end.
For details, see http://debbugs.gnu.org/18580#44.
| |
* src/grep.c (grep): When testing whether an empty line matches,
make the input buffer one byte longer, as dfaexec uses that
for a sentinel.
| |
That attribute isn't portable, and I found a way to get similar
performance with standard C features.
* NEWS: Document the recently-installed performance improvement.
* src/dfa.c (struct dfa): New member dfaexec.
(dfaexec_main): Remove unnecessary 'const'.
(dfaexec_mb, dfaexec_sb): Remove __attribute__ ((noinline));
no longer needed.
(dfaexec): Use new dfaexec member.
(dfainit, dfaoptimize, dfassbuild): Initialize it.
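The idea of selecting the executor once through a struct member, rather than
relying on a compiler-specific noinline attribute, can be sketched as follows.
This is an illustration under stated assumptions, not the code in src/dfa.c:
toy_dfa, exec_sb, exec_mb and the other names are hypothetical, and the two
executors are trivial placeholders that only count units of input.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_dfa;
typedef size_t (*exec_fn) (const struct toy_dfa *, const char *, size_t);

struct toy_dfa
{
  bool multibyte;
  exec_fn exec;                 /* chosen once, at initialization */
};

/* Placeholder single-byte executor: every byte is one unit.  */
static size_t
exec_sb (const struct toy_dfa *d, const char *buf, size_t len)
{
  (void) d;
  (void) buf;
  return len;
}

/* Placeholder multibyte executor: count bytes that are not UTF-8
   continuation bytes.  */
static size_t
exec_mb (const struct toy_dfa *d, const char *buf, size_t len)
{
  size_t n = 0;
  (void) d;
  for (size_t i = 0; i < len; i++)
    n += ((unsigned char) buf[i] & 0xC0) != 0x80;
  return n;
}

void
toy_dfa_init (struct toy_dfa *d, bool multibyte)
{
  d->multibyte = multibyte;
  d->exec = multibyte ? exec_mb : exec_sb;
}

/* The public entry point only forwards through the member, so the
   choice between the two variants is made at initialization time.  */
size_t
toy_dfa_exec (const struct toy_dfa *d, const char *buf, size_t len)
{
  return d->exec (d, buf, len);
}
```

Calling through a pointer uses only standard C; whether it matches the
performance reported in this commit is not something the sketch can show.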
| |
* src/dfa.c (dfaexec_main): Rename from dfaexec, add inline attribute.
(dfaexec_mb): New function. Run it when d->multibyte is true. This
function must not be inlined.
(dfaexec_sb): New function. Run it when d->multibyte is false. This
function must not be inlined.
(dfaexec): Call dfaexec_mb or dfaexec_sb according to d->multibyte.
| |
The DFA state is always 0 until a potential match has been found, so
improve matching there by continuing to use the transition table.
* src/dfa.c (skip_remains_mb): New function.
(dfaexec): Speed up matching in the initial state.
| |
* src/grep.c (CAST_ALIGNED): New macro.
(skip_easy_bytes): Use it.
| |
Building with --enable-gcc-warnings and gcc-4.9.1 would provoke this:
grep.c:499:12: error: cast from 'const char *' to 'const uword *'\
(aka 'const unsigned long *') increases required alignment from\
1 to 8 [-Werror,-Wcast-align]
for (s = (uword const *) p; ! (*s & hibyte_mask); s++)
^~~~~~~~~~~~~~~~~
* src/grep.c (skip_easy_bytes): Use a pragma to suppress
gcc's false-positive cast-alignment warning.
| |
Problem reported by Jim Meyering in: http://bugs.gnu.org/18454#56
* src/grep.c (grep): After the first buffer is checked, leave the
file-type checker in TEXTBIN_UNKNOWN state only when -P is used.
Only the -P matcher has performance problems with checking binary
data that make it worthwhile to check every prefix input byte so
the -P matcher's TEXTBIN_UNKNOWN optimizations can come into play.
Other matchers can simply check the data directly, and using
TEXTBIN_UNKNOWN with them slows 'grep' down for no benefit.
| |
Scan valid multibyte strings more quickly in the common case of
encodings that are upward compatible with ASCII, such as UTF-8.
You'd think there'd be a fast standard way to do this nowadays,
but nooooo....
Problem reported by Jim Meyering in: http://bugs.gnu.org/18454#56
* src/grep.c (HIBYTE): New constant.
(easy_encoding): New static var.
(init_easy_encoding, skip_easy_bytes): New functions.
(uword): New type.
(buffer_textbin): Skip easy bytes quickly.
Don't bother with mb_clen here, since skip_easy_bytes typically
captures the easy cases; just use mbrlen directly.
(buffer_textbin, file_textbin): First arg is no longer a const
pointer, since the byte past the end is now an overwritten sentinel.
(fillbuf): Make room for a uword after the buffer, for skip_easy_bytes.
(main): Call init_easy_encoding.
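A rough sketch of scanning "easy" bytes a word at a time. It is not the
skip_easy_bytes function from src/grep.c: per the entries above, that one
casts the pointer to a uword and relies on an overwritten sentinel past the
end of the buffer, whereas this sketch swaps in memcpy loads and an explicit
length to avoid alignment and sentinel concerns. The names uword_t,
HIBYTE_MASK and skip_ascii_prefix are hypothetical, and 8-bit bytes are
assumed.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

typedef unsigned long uword_t;  /* hypothetical stand-in for a machine word */

/* A word with the top bit of every byte set, e.g. 0x8080808080808080
   when unsigned long is 64 bits wide.  */
#define HIBYTE_MASK ((uword_t) -1 / UCHAR_MAX * 0x80)

/* Return the number of leading bytes of BUF[0..LEN-1] that have the
   top bit clear.  */
size_t
skip_ascii_prefix (const char *buf, size_t len)
{
  size_t i = 0;

  /* Examine one word at a time.  memcpy sidesteps alignment
     requirements; compilers typically turn it into a single load.  */
  while (sizeof (uword_t) <= len - i)
    {
      uword_t w;
      memcpy (&w, buf + i, sizeof w);
      if (w & HIBYTE_MASK)
        break;
      i += sizeof w;
    }

  /* Finish byte by byte, also locating the exact byte within a word
     that had its top bit set.  */
  while (i < len && !((unsigned char) buf[i] & 0x80))
    i++;
  return i;
}
```

Only the bytes after the returned offset would then need the slower
multibyte validation with a function such as mbrlen.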
| |
* src/grep.c (fillbuf): If SEEK_DATA fails with errno == ENXIO,
skip over the hole at EOF.
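A minimal sketch of that ENXIO handling, assuming a platform where lseek
supports SEEK_DATA (for example Linux with glibc). The function name
next_data_or_eof is hypothetical and the code is not taken from fillbuf;
where SEEK_DATA is unavailable, the sketch falls back to treating the whole
file as data.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE            /* for SEEK_DATA on glibc */
#endif
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Position FD at the next data at or after OFF and return that offset.
   If only a hole remains before end of file, position FD at end of
   file and return that offset instead.  Return -1 on other errors.  */
off_t
next_data_or_eof (int fd, off_t off)
{
#ifdef SEEK_DATA
  off_t data = lseek (fd, off, SEEK_DATA);
  if (data >= 0)
    return data;
  if (errno == ENXIO)
    /* No data at or after OFF: skip over the trailing hole.  */
    return lseek (fd, 0, SEEK_END);
  return -1;
#else
  /* Assumption: with no hole support, everything counts as data.  */
  return lseek (fd, off, SEEK_SET);
#endif
}
```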