author: Norihiro Tanaka <noritnk@kcn.ne.jp> 2014-11-18 13:36:42 +0900
committer: Jim Meyering <meyering@fb.com> 2014-11-20 14:52:33 -0800
commit: ff7e6edab11d6de3bfa33d426ab0f66eb23fa35a (patch)
tree: 9939b921338c0bd9e5abfcb8b5ee70b362a5b678
parent: 83af04cef3139045f1a95a63b70b5935dbf857d8 (diff)
grep -F could erroneously fail to match in non-UTF8 multibyte locales

This fixes a bug that can strike only when using a non-UTF8
multibyte locale like ja_JP.SHIFT_JIS. Consider this example:
it would mistakenly fail to match before this patch:

  printf '\203AA\n'|LC_ALL=ja_JP.SHIFT_JIS src/grep -F A

When searching for a single byte that happens to be the latter byte
of a multibyte character, and the target byte also follows that
multibyte character, grep -F would advance an internal pointer by
one byte too many, thus missing the target byte.
A test case for this bug is already included in tests/sjis-mb.

* src/kwsearch.c (Fexecute): Skip one byte less, after matched
middle of a multi-byte character.
Introduced by commit v2.18-119-gfb7d538.
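
The off-by-one described above can be illustrated with a minimal, self-contained sketch in C. This is not the code from src/kwsearch.c: toy_search, toy_goback, and is_lead are hypothetical stand-ins for Fexecute, mb_goback, and the locale's character decoding, and the lead-byte ranges are a simplified approximation of Shift-JIS (every lead byte is assumed to start a 2-byte character). The adjust parameter is likewise an assumption of this sketch: 0 mimics the old "beg = mb_start", 1 mimics the patched "beg = mb_start - 1".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified assumption: these byte ranges start a 2-byte character. */
static int
is_lead (unsigned char c)
{
  return (0x81 <= c && c <= 0x9F) || (0xE0 <= c && c <= 0xFC);
}

/* Hypothetical stand-in for mb_goback.  Walk characters from *MB_START.
   If CUR points into the middle of a 2-byte character, set *MB_START
   just past that character and return 1.  Otherwise set *MB_START to
   CUR and return 0.  */
static int
toy_goback (char const **mb_start, char const *cur, char const *end)
{
  char const *p = *mb_start;
  while (p < cur)
    {
      size_t len = (is_lead ((unsigned char) *p) && p + 1 < end) ? 2 : 1;
      if (p + len > cur)
        {
          *mb_start = p + len;
          return 1;
        }
      p += len;
    }
  *mb_start = p;
  return 0;
}

/* Hypothetical stand-in for the Fexecute loop, searching for the single
   byte KEY.  Return the offset of the first match that starts on a
   character boundary, or -1 if there is none.  */
static long
toy_search (char const *buf, size_t size, char key, int adjust)
{
  assert (buf != NULL);
  char const *end = buf + size;
  char const *mb_start = buf;
  for (char const *beg = buf; beg < end; beg++)
    {
      char const *hit = memchr (beg, key, (size_t) (end - beg));
      if (!hit)
        return -1;
      if (toy_goback (&mb_start, hit, end) != 0)
        {
          /* The loop's beg++ runs after "continue", so ADJUST == 0
             resumes one byte past MB_START and can skip a match.  */
          beg = mb_start - adjust;
          continue;
        }
      return (long) (hit - buf);
    }
  return -1;
}
```

With the 4-byte input from the example above ("\203AA\n"), toy_search (buf, 4, 'A', 0) returns -1, because the search resumes at the newline, while toy_search (buf, 4, 'A', 1) returns 2, the offset of the second 'A'.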

-rw-r--r--  NEWS            7
-rw-r--r--  src/kwsearch.c  17
2 files changed, 21 insertions, 3 deletions
diff --git a/NEWS b/NEWS
index 15975288..c5202dbf 100644
--- a/NEWS
+++ b/NEWS
@@ -48,6 +48,13 @@ GNU grep NEWS -*- outline -*-
   of a multibyte character when using a '^'-anchored alternate in a pattern,
   leading it to print non-matching lines. [bug present since "the beginning"]
+  grep -F Y no longer fails to match in non-UTF8 multibyte locales like
+  Shift-JIS, when the input contains a 2-byte character, XY, followed by
+  the single-byte search pattern, Y. grep would find the first, middle-
+  of-multibyte matching "Y", and then mistakenly advance an internal
+  pointer one byte too far, skipping over the target "Y" just after that.
+  [bug introduced in grep-2.19]
+
   grep -E rejected unmatched ')', instead of treating it like '\)'.
   [bug present since "the beginning"]
diff --git a/src/kwsearch.c b/src/kwsearch.c
index aa965f62..1335a269 100644
--- a/src/kwsearch.c
+++ b/src/kwsearch.c
@@ -133,9 +133,20 @@ Fexecute (char const *buf, size_t size, size_t *match_size,
       if (!match_lines && MB_CUR_MAX > 1 && !using_utf8 ()
           && mb_goback (&mb_start, beg + offset, buf + size) != 0)
         {
-          /* The match was a part of multibyte character, advance at least
-             one byte to ensure no infinite loop happens. */
-          beg = mb_start;
+          /* We have matched a single byte that is not at the beginning of a
+             multibyte character. mb_goback has advanced MB_START past that
+             multibyte character. Now, we want to position BEG so that the
+             next kwsexec search starts there. Thus, to compensate for the
+             for-loop's BEG++, above, subtract one here. This code is
+             unusually hard to reach, and exceptionally, let's show how to
+             trigger it here:
+
+               printf '\203AA\n'|LC_ALL=ja_JP.SHIFT_JIS src/grep -F A
+
+             That assumes the named locale is installed.
+             Note that your system's shift-JIS locale may have a different
+             name, possibly including "sjis". */
+          beg = mb_start - 1;
           continue;
         }
       beg += offset;