From f2e9cd38b97c58bcd66b804e08186503e00a3e8e Mon Sep 17 00:00:00 2001
From: Bruno Haible
Date: Fri, 31 Dec 2021 12:20:27 +0100
Subject: Update to Unicode 13.0.0.

* lib/gen-uni-tables.c (is_WBP_MIDLETTER): Add character 0x055F.
(get_wbp): Assign value WBP_ALETTER to the characters 0x02E5..0x02EB,
0x055A, 0x058A, 0xA708..0xA716.
* lib/gen-uni-tables.c (LBP_CP1, LBP_CP2, LBP_OP1, LBP_OP2): New enum
values.
(LBP_OP, LBP_CP): Assign artificial values.
(get_lbp): Use the unicode_width[] table to assign LBP_CP1, LBP_CP2
instead of LBP_CP, and LBP_OP1, LBP_OP2 instead of LBP_OP. Update such
that unilbrk/lbrkprop.txt comes out as expected.
(debug_output_lbp): Print either LBP_CP1 or LBP_CP2 as LBP_CP. Print
either LBP_OP1 or LBP_OP2 as LBP_OP.
(lbp_value_to_string): Handle LBP_CP1, LBP_CP2, LBP_OP1, LBP_OP2 instead
of LBP_CP, LBP_OP.
(output_lbrk_rules_as_tables): Treat LBP_CP and LBP_OP as macros that
map to two table rows/columns. In rule LB30, use only LBP_OP1 instead of
LBP_OP, and only LBP_CP1 instead of LBP_CP. Simplify rule LB22.
* lib/unilbrk/lbrktables.h (LBP_CP1, LBP_CP2, LBP_OP1, LBP_OP2): New
enum values.
(LBP_OP, LBP_CP): Remove enum values.
(unilbrk_table): Update declaration.
* lib/unilbrk/u8-possible-linebreaks.c (u8_possible_linebreaks_loop):
Add a test for East Asian opening parenthesis.
* lib/unilbrk/u16-possible-linebreaks.c (u16_possible_linebreaks_loop):
Likewise.
* lib/unilbrk/u32-possible-linebreaks.c (u32_possible_linebreaks_loop):
Likewise.
* lib/uniwidth/width.c (nonspacing_table_data, nonspacing_table_ind):
Update.
(uc_width): Assign width 2 to the characters 0x16FF0..0x16FF1,
0x18AF3..0x18CD5, 0x18D00..0x18D08, 0x1F6D6..0x1F6D7, 0x1F6FB..0x1F6FC,
0x1F90C, 0x1FA74, 0x1FA83..0x1FA86, 0x1FA96..0x1FAA8, 0x1FAB0..0x1FAB6,
0x1FAC0..0x1FAC2, 0x1FAD0..0x1FAD6. Assign width 1 to the characters
0x1F93B, 0x1F946.
* tests/uniwidth/test-uc_width2.sh: Expect width 0 for the characters
0x0B55, 0x0D81, 0x1ABF..0x1AC0, 0xA82C, 0x10EAB..0x10EAC, 0x111CF,
0x1193B..0x1193C, 0x1193E, 0x11943, 0x16FE4. Expect width 2 for the
characters 0x16FF0..0x16FF1, 0x18AF3..0x18CD5, 0x18D00..0x18D08,
0x1F6D6..0x1F6D7, 0x1F6FB..0x1F6FC, 0x1F90C, 0x1FA74, 0x1FA83..0x1FA86,
0x1FA96..0x1FAA8, 0x1FAB0..0x1FAB6, 0x1FAC0..0x1FAC2, 0x1FAD0..0x1FAD6.
Expect width 1 for the characters 0x1F93B, 0x1F946.
* All generated files under lib/uni* and tests/uni*: Regenerate.
* tests/uniname/NameAliases.txt: Update.
* tests/uniname/UnicodeData.txt: Update.
* tests/uninorm/NormalizationTest.txt: Update.
* tests/unigbrk/GraphemeBreakTest.txt: Update.
* tests/uniwbrk/WordBreakTest.txt: Update.
* All the affected modules: Bump required libunistring version.
---
 tests/uniwidth/test-uc_width2.sh | 77 +++++++++++++++++++++++++++-------------
 1 file changed, 53 insertions(+), 24 deletions(-)

(limited to 'tests/uniwidth')

diff --git a/tests/uniwidth/test-uc_width2.sh b/tests/uniwidth/test-uc_width2.sh
index b6d05f14f3..585291b210 100755
--- a/tests/uniwidth/test-uc_width2.sh
+++ b/tests/uniwidth/test-uc_width2.sh
@@ -133,8 +133,8 @@ cat > uc_width.ok <<\EOF
 0B41..0B44 0
 0B45..0B4C A
 0B4D 0
-0B4E..0B55 A
-0B56 0
+0B4E..0B54 A
+0B55..0B56 0
 0B57..0B61 A
 0B62..0B63 0
 0B64..0B81 A
@@ -175,7 +175,9 @@ cat > uc_width.ok <<\EOF
 0D4D 0
 0D4E..0D61 A
 0D62..0D63 0
-0D64..0DC9 A
+0D64..0D80 A
+0D81 0
+0D82..0DC9 A
 0DCA 0
 0DCB..0DD1 A
 0DD2..0DD4 0
@@ -291,8 +293,8 @@ cat > uc_width.ok <<\EOF
 1A7D..1A7E A
 1A7F 0
 1A80..1AAF A
-1AB0..1ABE 0
-1ABF..1AFF A
+1AB0..1AC0 0
+1AC1..1AFF A
 1B00..1B03 0
 1B04..1B33 A
 1B34 0
@@ -454,7 +456,9 @@ A807..A80A A
 A80B 0
 A80C..A824 A
 A825..A826 0
-A827..A8C3 A
+A827..A82B A
+A82C 0
+A82D..A8C3 A
 A8C4..A8C5 0
 A8C6..A8DF A
 A8E0..A8F1 0
@@ -550,7 +554,9 @@ FFFC..101FC 1
 10AE5..10AE6 0
 10AE7..10D23 1
 10D24..10D27 0
-10D28..10F45 1
+10D28..10EAA 1
+10EAB..10EAC 0
+10EAD..10F45 1
 10F46..10F50 0
 10F51..11000 1
 11001 0
@@ -580,7 +586,9 @@ FFFC..101FC 1
 111B6..111BE 0
 111BF..111C8 1
 111C9..111CC 0
-111CD..1122E 1
+111CD..111CE 1
+111CF 0
+111D0..1122E 1
 1122F..11231 0
 11232..11233 1
 11234 0
@@ -650,7 +658,13 @@ FFFC..101FC 1
 1182F..11837 0
 11838 1
 11839..1183A 0
-1183B..119D3 1
+1183B..1193A 1
+1193B..1193C 0
+1193D 1
+1193E 0
+1193F..11942 1
+11943 0
+11944..119D3 1
 119D4..119D7 0
 119D8..119D9 1
 119DA..119DB 0
@@ -716,11 +730,16 @@ FFFC..101FC 1
 16F8F..16F92 0
 16F93..16FDF 1
 16FE0..16FE3 2
-16FE4..16FFF 1
+16FE4 0
+16FE5..16FEF 1
+16FF0..16FF1 2
+16FF2..16FFF 1
 17000..187F7 2
 187F8..187FF 1
-18800..18AF2 2
-18AF3..1AFFF 1
+18800..18CD5 2
+18CD6..18CFF 1
+18D00..18D08 2
+18D09..1AFFF 1
 1B000..1B11F 2
 1B120..1B14F 1
 1B150..1B152 2
@@ -823,24 +842,34 @@ FFFC..101FC 1
 1F6CD..1F6CF 1
 1F6D0..1F6D2 2
 1F6D3..1F6D4 1
-1F6D5 2
-1F6D6..1F6EA 1
+1F6D5..1F6D7 2
+1F6D8..1F6EA 1
 1F6EB..1F6EC 2
 1F6ED..1F6F3 1
-1F6F4..1F6FA 2
-1F6FB..1F7DF 1
+1F6F4..1F6FC 2
+1F6FD..1F7DF 1
 1F7E0..1F7EB 2
-1F7EC..1F90C 1
-1F90D..1F9FF 2
+1F7EC..1F90B 1
+1F90C..1F93A 2
+1F93B 1
+1F93C..1F945 2
+1F946 1
+1F947..1F9FF 2
 1FA00..1FA6F 1
-1FA70..1FA73 2
-1FA74..1FA77 1
+1FA70..1FA74 2
+1FA75..1FA77 1
 1FA78..1FA7A 2
 1FA7B..1FA7F 1
-1FA80..1FA82 2
-1FA83..1FA8F 1
-1FA90..1FA95 2
-1FA96..1FFFF 1
+1FA80..1FA86 2
+1FA87..1FA8F 1
+1FA90..1FAA8 2
+1FAA9..1FAAF 1
+1FAB0..1FAB6 2
+1FAB7..1FABF 1
+1FAC0..1FAC2 2
+1FAC3..1FACF 1
+1FAD0..1FAD6 2
+1FAD7..1FFFF 1
 20000..3FFFF 2
 40000..E0000 1
 E0001 0
-- 
cgit v1.2.1