path: root/test/ruby/enc
Commit message | Author | Date | Files | Lines (-/+)
* support multi-run for test/ruby/enc/test_regex_casefold.rb | Koichi Sasada | 2020-01-29 | 1 | -1/+1
  should not mutate test data.
* Removed excess spaces | Nobuyoshi Nakada | 2019-06-28 | 2 | -2/+2
* Fixed name conflict between helper classes | Nobuyoshi Nakada | 2019-06-28 | 2 | -2/+8
* Add new encoding CESU-8 [Feature #15931] | NARUSE, Yui | 2019-06-24 | 1 | -0/+109
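For context, CESU-8 (Unicode Technical Report #26) encodes BMP characters exactly as UTF-8 does, but writes each supplementary character as a UTF-16 surrogate pair, with each surrogate encoded as its own 3-byte UTF-8-style sequence (6 bytes in total). The sketch below computes those bytes by hand; the helper names are hypothetical, and it deliberately does not rely on Ruby's CESU-8 encoding object or any transcoder.

```ruby
# Encode a 16-bit code unit in 0x0800..0xFFFF as a 3-byte UTF-8-style
# sequence. Unlike Integer#chr, this also accepts surrogate code units.
def three_byte_form(unit)
  [0xE0 | (unit >> 12), 0x80 | ((unit >> 6) & 0x3F), 0x80 | (unit & 0x3F)]
end

# Hypothetical helper: the CESU-8 byte sequence for one code point.
def cesu8_bytes(cp)
  raise ArgumentError, "out of range" unless cp.between?(0, 0x10FFFF)
  raise ArgumentError, "surrogate code point" if cp.between?(0xD800, 0xDFFF)
  # Inside the BMP, CESU-8 is byte-for-byte identical to UTF-8.
  return cp.chr(Encoding::UTF_8).bytes if cp < 0x10000

  offset = cp - 0x10000
  high = 0xD800 + (offset >> 10)   # high (leading) surrogate
  low  = 0xDC00 + (offset & 0x3FF) # low (trailing) surrogate
  three_byte_form(high) + three_byte_form(low)
end

p cesu8_bytes(0x41)
p cesu8_bytes(0x1F600)
```

The BMP shortcut reuses Ruby's UTF-8 encoder, so only the supplementary-plane path needs manual bit manipulation.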
* Test to disable ASCII-only optimization | Nobuyoshi Nakada | 2019-05-17 | 1 | -0/+10
  Examples of why the ASCII-only optimization cannot be applied to multi-byte encodings that have 7-bit trailing bytes.
  Suggested by @duerst at https://github.com/ruby/ruby/pull/2187#issuecomment-492949218
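The commit message only names the problem. As an illustration (not the test the commit adds), the sketch below shows why a byte-level shortcut is unsafe for Shift_JIS, a multi-byte encoding whose trailing bytes can fall in the ASCII range: the kanji 表 (U+8868) is encoded as 0x95 0x5C, and 0x5C on its own is the backslash. The constant and helper names are hypothetical.

```ruby
# Illustrative only: Shift_JIS trailing bytes may be 7-bit (0x40-0x7E).
# The kanji U+8868 is encoded as 0x95 0x5C; 0x5C alone is "\".
KANJI_HYO = String.new("\x95\x5C", encoding: Encoding::Shift_JIS)
BACKSLASH = String.new("\\", encoding: Encoding::Shift_JIS)

# Byte-oriented search, as an "ASCII-only" shortcut would do it:
# it reports a spurious backslash inside the two-byte character.
def naive_byte_match?(haystack, needle)
  haystack.b.include?(needle.b)
end

# Character-aware search: String#index only accepts matches that start on
# a character boundary, so the trailing byte 0x5C is not reported.
def char_aware_match?(haystack, needle)
  !haystack.index(needle).nil?
end

puts naive_byte_match?(KANJI_HYO, BACKSLASH)
puts char_aware_match?(KANJI_HYO, BACKSLASH)
```

The first call finds the 0x5C byte; the second does not, which is the distinction an optimization restricted to ASCII-only patterns would lose for such encodings.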
* add a test to make sure some unassigned codepoints do not get converted | duerst | 2018-12-10 | 1 | -0/+6
  In test/ruby/enc/test_case_mapping.rb, add a test to make sure the unassigned codepoints in the Georgian MTAVRULI range (U+1CBB, U+1CBC) do not get converted to unrelated codepoints by String#capitalize. (It turns out that this test was not strictly necessary, because unassigned codepoints are already excluded by the fact that they are not found in the onigenc_unicode_fold_lookup table. So this test only serves to check against future regressions.)
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66314 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* implement special behavior for Georgian for String#capitalize | duerst | 2018-12-09 | 2 | -1/+22
  The modern Georgian script is special in that it has an 'uppercase' variant called MTAVRULI, which can be used for emphasis of whole words, for screamy headlines, and so on. However, in contrast to all other bicameral scripts, the first letter of a word or sentence is never capitalized. Words with mixed capitalization are not used at all.
  We therefore implement special behavior for String#capitalize. Formally, we define String#capitalize as first applying String#downcase to the whole string, then applying titlecase to the first letter. Because Georgian defines titlecase as the identity function for both MTAVRULI ('uppercase') and Mkhedruli (lowercase), String#capitalize is equivalent to String#downcase for Georgian. This avoids undesirable mixed case.
  * enc/unicode.c: Actual implementation
  * string.c: Add mention of this special case for documentation
  * test/ruby/enc/test_case_mapping.rb: Add two tests: a general one that uses String#capitalize on some (including nonsensical) combinations of MTAVRULI and Mkhedruli, and a canary test to detect the potential assignment of characters to the currently open slots (holes) at U+1CBB and U+1CBC.
  * test/ruby/enc/test_case_comprehensive.rb: Tweak generation of expectation data.
  Together with r65933, this closes issue #14839.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66300 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
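A minimal Ruby sketch of the behavior described in this commit, assuming Ruby 2.6 or later (Unicode 11.0.0+, where MTAVRULI was added). The sample strings and the helper name are illustrative and are not taken from test_case_mapping.rb.

```ruby
# Assumes Ruby >= 2.6 (Unicode 11.0.0 or later).
SAMPLES = {
  mtavruli:  "\u1C90\u1C91", # GEORGIAN MTAVRULI CAPITAL LETTER AN, BAN
  mkhedruli: "\u10D0\u10D1", # GEORGIAN LETTER AN, BAN
  mixed:     "\u10D0\u1C91"  # nonsensical mix of the two forms
}.freeze

# capitalize = downcase everything, then titlecase the first letter.
# Georgian titlecase is the identity, so for Georgian text the result
# should be exactly the same as plain downcase.
def capitalize_like_downcase?(str)
  str.capitalize == str.downcase
end

SAMPLES.each do |name, str|
  puts "#{name}: #{capitalize_like_downcase?(str)}"
end
```

For Latin text the same predicate is false (for example, "hello".capitalize is "Hello"), which is the mixed-case result the commit avoids for Georgian.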
* replace hardcoded emoji version by RbConfig::CONFIG['UNICODE_EMOJI_VERSION'] | duerst | 2018-12-07 | 1 | -2/+1
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66271 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* update to Unicode 11.0.0 (main step, not complete yet) | duerst | 2018-12-05 | 1 | -2/+2
  - common.mk: Change Unicode version to 11.0.0, and Emoji version to 11.0
  - test/ruby/enc/test_emoji_breaks.rb: update hard-coded Emoji version
  - enc/unicode/11.0.0, enc/unicode/11.0.0/casefold.h, enc/unicode/name2ctype.h: Add generated files. Files for Unicode 10.0.0 will be removed once we are sure 11.0.0 works.
  - lib/unicode_normalize/tables.rb: Updated table.
  - regparse.c: Almost completely reimplement grapheme cluster detection in function node_extended_grapheme_cluster().
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66213 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* exclude skin tones as second component in TestEmojiBreaks#test_mixed_emoji | duerst | 2018-12-04 | 1 | -0/+2
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66185 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* change embedding character in TestEmojiBreaks#test_embedded_emoji | duerst | 2018-12-04 | 1 | -2/+2
  In test/ruby/enc/test_emoji_breaks.rb, in method TestEmojiBreaks#test_embedded_emoji, change the surrounding characters from A/Z to the more neutral \t in preparation for the upgrade to Unicode 11.0.0.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66180 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* solve the genie/zombie/wrestlers bug | duerst | 2018-12-02 | 1 | -3/+0
  enc/unicode.c:
  - Add U+1F93C (WRESTLERS), U+1F9DE (GENIE), and U+1F9DF (ZOMBIE) to onigenc_unicode_GCB_ranges_E_Base.
  - Add comments with character names.
  test/ruby/enc/test_emoji_breaks.rb: Activate tests for genie/zombie/wrestlers.
  This closes issue #15343.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66133 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* improve messages for test failures | duerst | 2018-11-26 | 1 | -7/+11
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66010 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* add tests for grapheme clusters using Unicode Emoji test data | duerst | 2018-11-26 | 1 | -0/+117
  Add file test/ruby/enc/test_emoji_breaks.rb to test String#each_grapheme_cluster against test data provided by Unicode (at https://www.unicode.org/Public/emoji/#{EMOJI_VERSION}/).
  Lines containing emoji for genies, zombies, and wrestling are ignored because there seems to be a bug (#15343) in the implementation.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65990 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
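A sketch of the kind of check this test performs, assuming the usual emoji-test.txt line layout (space-separated hex code points, a `;`, a status field, then a `#` comment). The parser and helper names are hypothetical and are not the code in test_emoji_breaks.rb.

```ruby
# Hypothetical parser for one emoji-test.txt data line, e.g.
#   "1F468 200D 1F469 200D 1F467 ; fully-qualified # ..."
# Returns [string, status], or nil for comment and blank lines.
def parse_emoji_test_line(line)
  data = line.split("#", 2).first.to_s
  return nil if data.strip.empty?

  codes, status = data.split(";", 2).map(&:strip)
  return nil if status.nil?

  [codes.split.map(&:hex).pack("U*"), status]
end

# Each listed emoji sequence is expected to form exactly one grapheme cluster.
def single_grapheme?(str)
  str.each_grapheme_cluster.count == 1
end

str, status = parse_emoji_test_line("1F468 200D 1F469 200D 1F467 ; fully-qualified # family")
puts "#{status}: #{single_grapheme?(str)}"
```

The ZWJ family sequence is five code points but a single user-perceived character, which is exactly what each_grapheme_cluster has to get right.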
* remove guard against bug #15337, because it is fixed | duerst | 2018-11-24 | 1 | -2/+0
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65958 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* add tests using Unicode test data for grapheme clusters | duerst | 2018-11-24 | 1 | -0/+94
  Add file test/ruby/enc/test_grapheme_breaks.rb to test String#each_grapheme_cluster and the \X extended grapheme cluster matcher in regular expressions against test data provided by Unicode (ucd/auxiliary/GraphemeBreakTest.txt).
  Some lines in the data file are ignored, as follows:
  - Lines with a surrogate, because Ruby doesn't handle these
  - The case of "\r\n", because there is a bug (#15337) in the implementation
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65955 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
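A sketch of how one GraphemeBreakTest.txt line can be turned into expected clusters and compared with both String#each_grapheme_cluster and `/\X/`. In that file, ÷ marks a permitted break and × marks no break. The helper names are hypothetical; surrogate code points would make Integer#chr raise RangeError, consistent with the commit skipping such lines.

```ruby
# Hypothetical parser for one GraphemeBreakTest.txt line, e.g.
#   "÷ 0020 × 0308 ÷ 0020 ÷\t#  ÷ [0.2] SPACE ..."
# Returns the expected grapheme clusters as UTF-8 strings.
def expected_clusters(line)
  data = line.split("#", 2).first.to_s
  clusters = []
  current = +""
  data.split.each do |token|
    case token
    when "÷" # break: close the current cluster
      clusters << current unless current.empty?
      current = +""
    when "×"
      # no break here: keep extending the current cluster
    else     # hex code point (surrogates raise RangeError)
      current << token.hex.chr(Encoding::UTF_8)
    end
  end
  clusters << current unless current.empty?
  clusters
end

# Both APIs named in the commit must produce the expected segmentation.
def clusters_match?(line)
  expected = expected_clusters(line)
  str = expected.join
  str.each_grapheme_cluster.to_a == expected && str.scan(/\X/) == expected
end

p expected_clusters("÷ 0020 × 0308 ÷ 0020 ÷")
p clusters_match?("÷ 0020 × 0308 ÷ 0020 ÷")
```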
* fix unicode data directory | nobu | 2017-12-23 | 1 | -5/+3
  * test/ruby/enc/test_regex_casefold.rb: fix the search for the Unicode data directory, as in test_case_comprehensive.rb.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61417 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* update unicode data files directory | nobu | 2017-12-22 | 2 | -2/+7
  * test/ruby/enc/test_case_comprehensive.rb: search the ucd directory first, if it exists.
  * test/ruby/enc/test_regex_casefold.rb: ditto.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61415 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
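A minimal sketch of the lookup order described in this commit, assuming a base data directory with an optional ucd subdirectory. The helper name and the directory layout are assumptions, not the tests' actual code; returning nil lets a caller skip a test when no data file is found.

```ruby
require "tmpdir"

# Hypothetical helper: look for +name+ in "<base>/ucd" first, then in
# "<base>" itself. Returns the first existing path, or nil if none exists.
def find_unicode_data_file(base, name)
  [File.join(base, "ucd"), base].each do |dir|
    path = File.join(dir, name)
    return path if File.file?(path)
  end
  nil
end

# Example with a throwaway directory tree.
Dir.mktmpdir do |base|
  File.write(File.join(base, "CaseFolding.txt"), "")
  puts find_unicode_data_file(base, "CaseFolding.txt") # falls back to base
end
```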
* fix UTF-32 valid_encoding? | nobu | 2017-03-09 | 1 | -0/+68
  * enc/utf_32be.c (utf32be_mbc_enc_len): check arguments precisely. [ruby-core:79966] [Bug #13292]
  * enc/utf_32le.c (utf32le_mbc_enc_len): ditto.
  * regenc.h (UNICODE_VALID_CODEPOINT_P): predicate for valid Unicode codepoints.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57816 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* test_utf16.rb: refine valid_encoding tests | nobu | 2017-03-09 | 1 | -50/+62
  * test/ruby/enc/test_utf16.rb (test_utf16be_valid_encoding): assert all data and use assert_predicate.
  * test/ruby/enc/test_utf16.rb (test_utf16le_valid_encoding): ditto.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57815 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
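An illustrative, table-driven take on "assert all data" in plain Ruby (the real tests use test-unit's assert_predicate): every case is evaluated and all mismatches are collected instead of stopping at the first failure. The byte sequences are example data chosen for this sketch, not the data in test_utf16.rb.

```ruby
# Example data for this sketch: [raw bytes, expected valid_encoding? result].
UTF16BE_CASES = [
  ["\x00\x41".b,         true],  # U+0041 LATIN CAPITAL LETTER A
  ["\xD8\x3D\xDE\x00".b, true],  # surrogate pair encoding U+1F600
  ["\xD8\x3D".b,         false], # high surrogate without a low surrogate
  ["\xDE\x00\xD8\x3D".b, false], # low surrogate before the high surrogate
  ["\x00".b,             false]  # odd number of bytes
].freeze

# Evaluate every case and return those whose result differs from the
# expectation; an empty array means all cases passed.
def utf16be_mismatches(cases)
  cases.reject do |bytes, expected|
    String.new(bytes, encoding: Encoding::UTF_16BE).valid_encoding? == expected
  end
end

p utf16be_mismatches(UTF16BE_CASES)
```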
* add tests against regressions for upcoming codepoint reordering in unfolding table | duerst | 2016-12-03 | 1 | -0/+25
  * test/ruby/enc/test_case_mapping.rb: Add method test_reorder_unfold to test against problems when reordering codepoints in some entries in CaseUnfold_11_Type CaseUnfold_11_Table.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56968 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
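A sketch of an order-independence check, assuming the case-equivalence class {K, k, U+212A KELVIN SIGN} (CaseFolding.txt folds both U+004B and U+212A to U+006B). Whatever order an unfold table lists these code points in, each member used as a case-insensitive pattern should match every other member. The constant and helper names are hypothetical; this is not test_reorder_unfold itself.

```ruby
# Assumed example class: LATIN CAPITAL K, LATIN SMALL k, KELVIN SIGN.
KELVIN_CLASS = ["K", "k", "\u212A"].freeze

# Every ordered pair (pattern, target) of distinct members must match under
# Regexp::IGNORECASE, independent of how an unfold table orders them.
def case_class_consistent?(members)
  members.permutation(2).all? do |pattern, target|
    Regexp.new(Regexp.escape(pattern), Regexp::IGNORECASE).match?(target)
  end
end

puts case_class_consistent?(KELVIN_CLASS)
puts case_class_consistent?(KELVIN_CLASS.reverse)
```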
* change test class name because it is not only about folding | duerst | 2016-12-03 | 1 | -3/+3
  * test/ruby/enc/test_case_comprehensive.rb: Change test class name from TestComprehensiveCaseFold to TestComprehensiveCaseMapping because the tests are about mapping in general, not only folding.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56966 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* fix uppercasing for U+A64B, CYRILLIC SMALL LETTER MONOGRAPH UK | duerst | 2016-11-30 | 1 | -2/+0
  * enc/unicode.c: Add U+A64B to the special cases 03B9 and 03BC at the end of onigenc_unicode_case_map (Bug #12990).
  * enc/unicode/case-folding.rb: Add U+A64B to the special cases 03B9 and 03BC. Add a comment pointing to enc/unicode.c. Change warnings to exceptions for unpredicted cases, because exceptions are more easily noticed (the warning was not noticed when upgrading to Unicode 9.0.0).
  * test/ruby/enc/test_case_comprehensive.rb: Remove temporary exclusion of U+A64B from testing.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56941 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* get rid of ambiguous parentheses warnings | nobu | 2016-11-29 | 2 | -10/+10
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56937 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* Fix erroneous test of target against target | duerst | 2016-11-29 | 1 | -1/+3
  * test/ruby/enc/test_case_comprehensive.rb: fix test condition, add a temporary check for U+A64B, the only character where the tests currently fail. (Bug #12990)
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56924 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/windows_1254.c, test/ruby/enc/test_case_comprehensive.rb: | duerst | 2016-10-16 | 1 | -1/+1
  Implement non-ASCII case conversion for Windows-1254.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56433 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* test_regex_casefold.rb: skip if no data file | nobu | 2016-08-26 | 1 | -1/+1
  * test/ruby/enc/test_regex_casefold.rb (setup): skip with an error message if CaseFolding.txt is not present, instead of printing the message, which causes an unknown command in parallel tests.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56017 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
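A sketch of checking single CaseFolding.txt lines against case-insensitive Regexp matching, the kind of data test_regex_casefold.rb depends on. Only simple foldings (status C and S) are checked; full (F) and Turkic (T) entries, comments, and blank lines return nil. The helper is hypothetical, and reading the actual data file (and skipping when it is absent) is left out.

```ruby
# Hypothetical helper: check one CaseFolding.txt line, e.g.
#   "0041; C; 0061; # LATIN CAPITAL LETTER A"
# Returns true/false for status C or S lines, nil for everything else.
def casefold_line_matches?(line)
  code, status, mapping = line.split("#", 2).first.to_s.split(";").map(&:strip)
  return nil unless %w[C S].include?(status)

  source = code.hex.chr(Encoding::UTF_8)
  target = mapping.split.map(&:hex).pack("U*")
  Regexp.new(Regexp.escape(source), Regexp::IGNORECASE).match?(target)
end

p casefold_line_matches?("0041; C; 0061; # LATIN CAPITAL LETTER A")
p casefold_line_matches?("00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S")
```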
* * enc/iso_8859_2.c, test/ruby/enc/test_case_comprehensive.rb: | duerst | 2016-07-30 | 1 | -1/+1
  Implement non-ASCII case conversion for ISO-8859-2, by Yushiro Ishii.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55775 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/windows_1257.c, test/ruby/enc/test_case_comprehensive.rb: | duerst | 2016-07-26 | 1 | -1/+1
  Implement non-ASCII case conversion for Windows-1257, by Sho Koike.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55752 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/windows_1250.c, test/ruby/enc/test_case_comprehensive.rb: | duerst | 2016-07-26 | 1 | -1/+1
  Implement non-ASCII case conversion for Windows-1250, by Sho Koike.
  * ChangeLog: Fixed order of previous two entries.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55751 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/windows_1251.c, test/ruby/enc/test_case_comprehensive.rb: | duerst | 2016-07-26 | 1 | -1/+1
  Implement non-ASCII case conversion for Windows-1251, by Shunsuke Sato.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55750 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/windows_1251.c, test/ruby/enc/test_case_comprehensive.rb: | duerst | 2016-07-26 | 1 | -1/+1
  Implement non-ASCII case conversion for Windows-1251, by Shunsuke Sato.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55749 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * remove trailing spaces. | svn | 2016-07-26 | 1 | -1/+1
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55747 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * test/ruby/enc/test_case_comprehensive.rb: Add explicit skip test for availability of Unicode data files. | duerst | 2016-07-26 | 1 | -4/+13
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55746 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/iso_8859_9.c, test/ruby/enc/test_case_comprehensive.rb: | duerst | 2016-07-13 | 2 | -2/+2
  Implement non-ASCII case conversion for ISO-8859-9, by Kazuki Iijima.
  * enc/iso_8859_9.c: Exclude dotless i/I with dot from case-insensitive matching because they are not a case pair.
  * test/ruby/enc/test_iso_8859.rb: Make test coverage for ISO-8859-9 a bit more complete.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55666 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/windows_1252.c, test/ruby/enc/test_case_comprehensive.rb: | duerst | 2016-07-13 | 1 | -1/+1
  Implement non-ASCII case conversion for Windows-1252, by Serina Tai.
  * test/ruby/enc/test_case_comprehensive.rb: Fix order of encodings.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55665 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/iso_8859_7.c, test/ruby/enc/test_case_comprehensive.rb: | duerst | 2016-07-13 | 1 | -1/+1
  Implement non-ASCII case conversion for ISO-8859-7, by Kosuke Kurihara.
  * test/ruby/enc/test_case_comprehensive.rb: Fix order of encodings.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55664 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/iso_8859_5.c, test/ruby/enc/test_case_comprehensive.rb: | duerst | 2016-07-13 | 1 | -8/+8
  Implement non-ASCII case conversion for ISO-8859-5, by Masaru Onodera.
  * test/ruby/enc/test_case_comprehensive.rb: Fix order of encodings.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55658 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/iso_8859_13.c, test/ruby/enc/test_case_comprehensive.rb: | duerst | 2016-07-13 | 1 | -1/+1
  Implement non-ASCII case conversion for ISO-8859-13, by Kanon Shindo.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55651 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/iso_8859_3.c, test/ruby/enc/test_case_comprehensive.rb: | duerst | 2016-07-13 | 1 | -3/+10
  Implement non-ASCII case conversion for ISO-8859-3, by Takuya Miyamoto.
  * test/ruby/enc/test_case_comprehensive.rb: Extend special treatment for Turkic.
  * enc/iso_8859_3.c: Exclude dotless i/I with dot from case-insensitive matching because they are not a case pair.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55648 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * test/ruby/enc/test_iso_8859.rb: Excluded dotless i/I with dot from case-insensitive matching because they are not a case pair. | duerst | 2016-07-12 | 1 | -1/+3
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55647 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * revert r55642 (previous commit) because of test failure at https://travis-ci.org/ruby/ruby/builds/144148780 | duerst | 2016-07-12 | 1 | -10/+3
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55643 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/iso_8859_3.c, test/ruby/enc/test_case_comprehensive.rb: | duerst | 2016-07-12 | 1 | -3/+10
  Implement non-ASCII case conversion for ISO-8859-3, by Takuya Miyamoto.
  * test/ruby/enc/test_case_comprehensive.rb: Extend special treatment for Turkic.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55642 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/iso_8859_10.c, test/ruby/enc/test_case_comprehensive.rb: | duerst | 2016-07-10 | 1 | -1/+1
  Implement non-ASCII case conversion for ISO-8859-10, by Toya Hosokawa.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55627 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * test/ruby/enc/test_case_comprehensive.rb: Changed testing logic to catch unintended modifications of characters that do not have a case equivalent in the respective encoding. | duerst | 2016-07-10 | 1 | -1/+6
  * enc/iso_8859_1.c, enc/iso_8859_15.c: Fixed unintended modifications of the micro sign and y with diaeresis.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55626 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
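A sketch of this testing idea for ISO-8859-1, under the assumption that the expected uppercase of a character is its Unicode uppercase when that is representable in the encoding, and the character itself otherwise. The micro sign (0xB5, Unicode uppercase U+039C) and y with diaeresis (0xFF, Unicode uppercase U+0178) have no uppercase in ISO-8859-1, so they must stay unchanged. The helper names are hypothetical and this is not the comprehensive test's code.

```ruby
# Hypothetical reference model: Unicode uppercase if it exists in
# ISO-8859-1, otherwise the character unchanged.
def expected_latin1_upcase(byte)
  ch = byte.chr(Encoding::ISO_8859_1)
  begin
    ch.encode(Encoding::UTF_8).upcase.encode(Encoding::ISO_8859_1)
  rescue Encoding::UndefinedConversionError
    ch # no case equivalent in this encoding: must not be modified
  end
end

# Compare Ruby's own String#upcase in ISO-8859-1 against the model.
def latin1_upcase_ok?(byte)
  byte.chr(Encoding::ISO_8859_1).upcase == expected_latin1_upcase(byte)
end

[0x61, 0xE9, 0xB5, 0xFF].each do |byte|
  puts format("0x%02X: %p", byte, latin1_upcase_ok?(byte))
end
```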
* * enc/iso_8859_4.c, test/ruby/enc/test_case_comprehensive.rb: | duerst | 2016-07-10 | 1 | -1/+1
  Implement non-ASCII case conversion for ISO-8859-4, by Kotaro Yoshida.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55624 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55622 b2dd03c8-39d4-4d8f-98ff-823fe69b080e | duerst | 2016-07-10 | 1 | -1/+1
* * enc/iso_8859_14.c, test/ruby/enc/test_case_comprehensive.rb: | duerst | 2016-07-06 | 1 | -1/+1
  Implement non-ASCII case conversion for ISO-8859-14, by Yutaro Tada.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55595 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/iso_8859_15.c, test/ruby/enc/test_case_comprehensive.rb: | duerst | 2016-07-06 | 1 | -1/+1
  Implement non-ASCII case conversion for ISO-8859-15, by Maho Harada.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55591 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/iso_8859_16.c, test/ruby/enc/test_case_comprehensive.rb: | duerst | 2016-07-06 | 1 | -1/+1
  Implement non-ASCII case conversion for ISO-8859-16, by Satoshi Kayama.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55590 b2dd03c8-39d4-4d8f-98ff-823fe69b080e