author | Paul Eggert <eggert@cs.ucla.edu> | 2022-05-13 23:23:35 -0700 |
---|---|---|
committer | Paul Eggert <eggert@cs.ucla.edu> | 2022-05-13 23:25:43 -0700 |
commit | b19a10775e54f8ed17e3a8c08a72d261d8c26244 (patch) | |
tree | 3babf016d1cc498afd1cec9952b5eb56030122d9 /tests/test-dfa-match.sh | |
parent | efa0065f1682f53fb15ad427555ddedec6ec51eb (diff) | |
download | gnulib-b19a10775e54f8ed17e3a8c08a72d261d8c26244.tar.gz |
dfa: fix bug with ‘.’ and UTF-8 Hangul Syllables
This fixes a bug introduced in 2019-12-18T05:41:27Z!eggert@cs.ucla.edu,
an earlier patch that fixed dfa.c to not match invalid UTF-8.
Unfortunately that patch had a couple of typos when dfa.c is
matching against the regular expression ‘.’ (dot). One typo
caused dfa.c to incorrectly reject the valid UTF-8 sequences
(ED)(90-9F)(80-BF) corresponding to U+D400 through U+D7FF, which
are some Hangul Syllables and Hangul Jamo Extended-B. The other
typo caused dfa.c to incorrectly reject the valid sequences
(F4)(88-8F)(80-BF)(80-BF) which correspond to U+108000 through
U+10FFFF (Supplemental Private Use Area plane B).
* lib/dfa.c (utf8_classes): Fix typos.
* tests/test-dfa-match.sh: Test the fix.
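
The byte ranges quoted in the commit message can be checked against the code points they are said to cover by applying the standard UTF-8 bit layout to the first and last sequence of each range. The following is a minimal sketch in POSIX shell arithmetic; `decode3` and `decode4` are hypothetical helper names used only for illustration and are not part of gnulib or dfa.c.

```shell
#!/bin/sh
# Hypothetical helpers (not from gnulib): turn one UTF-8 sequence, given
# as hex byte values, into its code point using the standard bit layout.

# 3-byte form: 1110xxxx 10xxxxxx 10xxxxxx
decode3() {
  printf 'U+%04X\n' $(( (($1 & 0x0F) << 12) | (($2 & 0x3F) << 6) | ($3 & 0x3F) ))
}

# 4-byte form: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
decode4() {
  printf 'U+%04X\n' $(( (($1 & 0x07) << 18) | (($2 & 0x3F) << 12) | (($3 & 0x3F) << 6) | ($4 & 0x3F) ))
}

# Endpoints of (ED)(90-9F)(80-BF)
decode3 0xED 0x90 0x80   # prints U+D400
decode3 0xED 0x9F 0xBF   # prints U+D7FF

# Endpoints of (F4)(88-8F)(80-BF)(80-BF)
decode4 0xF4 0x88 0x80 0x80   # prints U+108000
decode4 0xF4 0x8F 0xBF 0xBF   # prints U+10FFFF
```

The helpers do no validation (they do not check lead or continuation byte patterns); they only confirm the arithmetic behind the ranges stated above.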
Diffstat (limited to 'tests/test-dfa-match.sh')
-rwxr-xr-x | tests/test-dfa-match.sh | 11 |
1 files changed, 11 insertions, 0 deletions
diff --git a/tests/test-dfa-match.sh b/tests/test-dfa-match.sh
index b23851b8c0..4561584c4c 100755
--- a/tests/test-dfa-match.sh
+++ b/tests/test-dfa-match.sh
@@ -42,4 +42,15 @@ in=$(printf "bb\nbb")
 $timeout_10 ${CHECKER} test-dfa-match-aux a "$in" 1 > out || fail=1
 compare /dev/null out || fail=1
 
+# If the platform supports U+00E9 LATIN SMALL LETTER E WITH ACUTE,
+# test U+D45C HANGUL SYLLABLE PYO.
+U_00E9=$(printf '\303\251\n')
+U_D45C=$(printf '\355\221\234\n')
+if testout=$(LC_ALL=en_US.UTF-8 $CHECKER test-dfa-match-aux '^.$' "$U_00E9") &&
+   test "$testout" = 2
+then
+  testout=$(LC_ALL=en_US.UTF-8 $CHECKER test-dfa-match-aux '^.$' "$U_D45C") &&
+  test "$testout" = 3 || fail=1
+fi
+
 Exit $fail
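
The octal escapes in the new test lines can be inspected directly to see which bytes end up in `U_00E9` and `U_D45C`. The sketch below is illustrative only: `hex_bytes` is a hypothetical helper, not part of the gnulib test suite, and it relies only on POSIX `printf`, `od`, `tr`, and `wc`.

```shell
#!/bin/sh
# Hypothetical helper (not from gnulib): print the bytes produced by a
# printf format string as lowercase hex, with no separators.
hex_bytes() {
  printf "$1" | od -An -tx1 | tr -d ' \n'
  echo
}

hex_bytes '\303\251'       # prints c3a9   (UTF-8 for U+00E9)
hex_bytes '\355\221\234'   # prints ed919c (UTF-8 for U+D45C)

# Command substitution strips trailing newlines, so the variable holds
# only the three bytes of the character itself.
U_D45C=$(printf '\355\221\234\n')
printf '%s' "$U_D45C" | wc -c | tr -d ' '   # prints 3
```

The sequence ED 91 9C falls inside the (ED)(90-9F)(80-BF) range named in the commit message, which is why this character exercises the fix. The values 2 and 3 that the test expects coincide with the byte lengths of the two characters; what `test-dfa-match-aux` actually prints is not shown in this diff, so that reading is an assumption.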