path: root/enc
Commit message | Author | Age | Files | Lines
* Change max byte length of UTF-8 to 4 bytes | duerst | 2017-05-30 | 1 | -1/+1
  In enc/utf_8.c, change maximum byte length of UTF-8 to 4 bytes (from 6) to conform to the definition of UTF-8. This closes issue #13590. (This is a retry of r58954, after issue #13590 has been addressed.)
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58965 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
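As a rough illustration of the 4-byte limit (this is not code from enc/utf_8.c), the sketch below computes how many bytes UTF-8 needs for a code point. `utf8_length` is a hypothetical helper named only for this example; the ranges follow the UTF-8 definition, which ends at U+10FFFF.

```ruby
# Hypothetical helper: number of bytes UTF-8 needs for a code point.
# UTF-8 stops at U+10FFFF, so no sequence is longer than 4 bytes.
# (Surrogates U+D800..U+DFFF are not singled out in this sketch.)
def utf8_length(codepoint)
  case codepoint
  when 0x0000..0x007F    then 1
  when 0x0080..0x07FF    then 2
  when 0x0800..0xFFFF    then 3
  when 0x10000..0x10FFFF then 4
  else raise ArgumentError, "not a Unicode code point: #{codepoint}"
  end
end

# Cross-check the helper against the bytes Ruby itself produces.
[0x41, 0xE9, 0x20AC, 0x1F600, 0x10FFFF].each do |cp|
  packed = [cp].pack("U").bytesize
  puts format("U+%04X: helper=%d, Ruby packs %d byte(s)", cp, utf8_length(cp), packed)
end
```

Values above U+10FFFF raise `ArgumentError` here, which mirrors the idea that 5- and 6-byte forms are outside UTF-8 as currently defined.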
* revert r58954 temporarily | duerst | 2017-05-29 | 1 | -1/+1
  Revert change to maximum of 4 bytes for UTF-8 characters at r58954 temporarily. This failed spec at https://travis-ci.org/ruby/ruby/builds/237086017, but it is totally unclear why.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58955 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* Change max byte length of UTF-8 to 4 bytes | duerst | 2017-05-29 | 1 | -1/+1
  In enc/utf_8.c, change maximum byte length of UTF-8 to 4 bytes (from 6) to conform to the definition of UTF-8. This closes issue #13590.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58954 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* delete enc/prelude.rb, because no longer needed | duerst | 2017-05-06 | 1 | -4/+0
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58579 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* clean autogenerated files | nobu | 2017-04-21 | 2 | -229/+2
  * enc/depend (clean, clean-srcs): fix path of name2ctype.h, and remove casefold.h too.
  * enc/jis/props.h: autogenerated file. [ruby-core:80823] [Bug #13493]
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58438 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/depend: remove Unicode versions | nobu | 2017-04-18 | 1 | -2/+0
  * enc/depend (enc/unicode.o): remove hardcoded Unicode versions. This object file must be compiled by toplevel make.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58386 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* fix ext/-test-/struct/ dependencies | normal | 2017-04-15 | 1 | -0/+2
  I started writing a template for auto-generation and let "tool/update-deps --fix" fill in the rest. Hopefully this fixes problems with some CI builds after r58359. Further changes to other ext/-test-/ files should probably add or update "depend" files, too.
  * ext/-test-/struct/depend: new file
  * enc/depend: auto-updated with unicode 9.0.0 headers (side-effect)
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58364 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc-unicode.rb: uniname2ctype_offset | nobu | 2017-03-23 | 1 | -786/+787
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58065 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* update name2ctype.h | nobu | 2017-03-23 | 1 | -806/+979
  * enc/unicode/9.0.0/name2ctype.h: update due to merger of Onigmo 6.0.0.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58064 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* ruby tool/update-deps --fix | shyouhei | 2017-03-22 | 1 | -2/+64
  Onigmo 6 (r57045) introduced a new onigmo.h header file, which is required from nearly everywhere. This commit adds the necessary dependencies.
  Note: ruby/oniguruma.h now includes onigmo.h, ruby/io.h includes oniguruma.h, ruby/encoding.h also includes oniguruma.h, and internal.h includes encoding.h.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58054 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* fix UTF-32 valid_encoding? | nobu | 2017-03-09 | 2 | -6/+30
  * enc/utf_32be.c (utf32be_mbc_enc_len): check arguments precisely. [ruby-core:79966] [Bug #13292]
  * enc/utf_32le.c (utf32le_mbc_enc_len): ditto.
  * regenc.h (UNICODE_VALID_CODEPOINT_P): predicate for valid Unicode codepoints.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57816 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
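The idea behind a "valid Unicode codepoint" predicate can be sketched in Ruby as follows. This is an assumption about what such a check involves (at most U+10FFFF and not a surrogate), not a transcription of the C macro; `valid_unicode_codepoint?` and `utf32be` are hypothetical names for this example.

```ruby
# Sketch: a 32-bit value is a usable Unicode code point only if it is
# at most U+10FFFF and is not a surrogate (U+D800..U+DFFF).
def valid_unicode_codepoint?(c)
  c >= 0 && c <= 0x10FFFF && !(0xD800..0xDFFF).cover?(c)
end

# Build a one-character UTF-32BE string from a raw 32-bit value.
def utf32be(c)
  [c].pack("N").force_encoding(Encoding::UTF_32BE)
end

# Print the sketch's verdict next to what String#valid_encoding? reports.
[0x41, 0x10FFFF, 0xD800, 0x110000].each do |c|
  puts format("0x%06X predicate=%s valid_encoding?=%s",
              c, valid_unicode_codepoint?(c), utf32be(c).valid_encoding?)
end
```

Comparing the two columns on a given Ruby build shows whether that build already checks UTF-32 input this precisely.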
* Merge Onigmo 6.0.0 | naruse | 2016-12-10 | 42 | -527/+593
  * https://github.com/k-takata/Onigmo/blob/Onigmo-6.0.0/HISTORY
  * fix for ruby 2.4: https://github.com/k-takata/Onigmo/pull/78
  * suppress warning: https://github.com/k-takata/Onigmo/pull/79
  * include/ruby/oniguruma.h: include onigmo.h.
  * template/encdb.h.tmpl: ignore duplicated definition of EUC-CN in enc/euc_kr.c. It is defined in enc/gb2312.c with CRuby macro.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57045 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* remove special processing for U+03B9/U+03BC/U+A64B | duerst | 2016-12-04 | 2 | -18/+5
  * enc/unicode.c: Remove special processing for U+03B9/U+03BC/U+A64B (GREEK SMALL LETTERs IOTA/MU, CYRILLIC SMALL LETTER MONOGRAPH UK) from onigenc_unicode_case_map and simplify code.
  * enc/unicode/case-folding.rb: Remove check for U+03B9/U+03BC/U+A64B.
  This and the previous few related commits make sure that we won't hit the equivalent of bug #12990 anymore for future updates of Unicode versions.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56976 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* Reorder codepoints in some entries of CaseUnfold_11_Table | duerst | 2016-12-04 | 2 | -8/+17
  * enc/unicode/case-folding.rb: Reorder codepoints so that the upper-case mapping comes first.
  * enc/unicode/9.0.0/casefold.h: Codepoints reordered, upper-case mapping flag added.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56975 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* Use offsetof macro and shrink table size | nobu | 2016-12-01 | 1 | -786/+786
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56952 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* constify CaseMappingSpecials | nobu | 2016-12-01 | 3 | -3/+3
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56951 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* Regexp supports Unicode 9.0.0's \X | naruse | 2016-11-30 | 1 | -1889/+3139
  * meta character \X matches Unicode 9.0.0 characters with some workarounds for UTR #51 Unicode Emoji, Version 4.0 emoji zwj sequences. [Feature #12831] [ruby-core:77586]

  The term "character" can have many meanings: bytes, codepoints, combined characters, and so on. "Grapheme cluster" is the highest-level of these terms and means a user-perceived character. Unicode Standard Annex #29 UNICODE TEXT SEGMENTATION specifies how to handle grapheme clusters (extended grapheme clusters).

  But some specs have not been updated to the current situation, because Unicode Emoji is being extended rapidly without a precise definition. This breaks the precondition of UAX #29, "Grapheme cluster boundaries can be easily tested by looking at immediately adjacent characters" (the sentence will be removed in the next version). Some of the details are described in Unicode Technical Report #51 UNICODE EMOJI, but they are not merged into UAX #29 yet.

  http://unicode.org/reports/tr29/
  http://unicode.org/reports/tr51/
  http://unicode.org/Public/emoji/4.0/
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56949 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
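The difference between codepoints and grapheme clusters described in this message can be seen with a small sketch. It uses only a combining accent, since the emoji ZWJ behaviour depends on the Unicode data a given Ruby build ships with; the variable names are illustrative.

```ruby
# "e" followed by U+0301 COMBINING ACUTE ACCENT, then "x": three code
# points, but a reader perceives two characters.
text = "e\u0301x"

codepoint_count = text.codepoints.length  # counts individual code points
clusters        = text.scan(/\X/)         # \X matches one extended grapheme cluster

puts "code points:       #{codepoint_count}"
puts "grapheme clusters: #{clusters.length}"
```

Here `\X` keeps the base letter and its combining mark together, which is the "user-perceived character" notion the message refers to.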
* fix uppercasing for U+A64B, CYRILLIC SMALL LETTER MONOGRAPH UK | duerst | 2016-11-30 | 2 | -6/+8
  * enc/unicode.c: Add U+A64B to the special cases 03B9 and 03BC at the end of onigenc_unicode_case_map (Bug #12990).
  * enc/unicode/case-folding.rb: Add U+A64B to the special cases 03B9 and 03BC. Add a comment pointing to enc/unicode.c. Change warnings to exceptions for unpredicted cases, because this would have been more easily noticed (the warning was not noticed when upgrading to Unicode 9.0.0).
  * test/ruby/enc/test_case_comprehensive.rb: Remove temporary exclusion of U+A64B from testing.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56941 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
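A quick way to observe the behaviour this fix targets is to upcase the three code points named above and compare against their capital forms from the Unicode character database. The sketch assumes a Ruby with Unicode case mapping for `String#upcase` (2.4 or later); the `pairs` table is written for this example.

```ruby
# lower-case code point => expected upper-case code point
pairs = {
  0x03B9 => 0x0399, # GREEK SMALL LETTER IOTA -> GREEK CAPITAL LETTER IOTA
  0x03BC => 0x039C, # GREEK SMALL LETTER MU   -> GREEK CAPITAL LETTER MU
  0xA64B => 0xA64A, # CYRILLIC SMALL LETTER MONOGRAPH UK -> CAPITAL LETTER MONOGRAPH UK
}

# Upcase each one-character string and record whether the result matches.
results = pairs.map do |lower, upper|
  got = [lower].pack("U").upcase.ord
  puts format("U+%04X upcases to U+%04X (expected U+%04X)", lower, got, upper)
  got == upper
end
```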
* * enc/windows_1254.c: Fix typo. Reported by k-takata at | duerst | 2016-10-29 | 1 | -1/+1
  https://github.com/k-takata/Onigmo/commit/ceb59cc. Thanks!
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56523 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* Update windows-1255 table | nobu | 2016-10-28 | 1 | -1/+2
  * enc/trans/windows-1255-tbl.rb: update mapping from 0xCA to U+05BA. [Feature #12877]
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56516 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
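The mapping mentioned here can be checked from Ruby by transcoding the single byte. This is a sketch that assumes the running Ruby includes the updated table; on a build without it, `encode` would be expected to raise a conversion error instead.

```ruby
# Byte 0xCA tagged as Windows-1255, then transcoded to UTF-8.
byte = "\xCA".b.force_encoding("Windows-1255")
utf8 = byte.encode("UTF-8")

puts format("0xCA in Windows-1255 -> U+%04X", utf8.ord)
```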
* enc/depend: downcase | nobu | 2016-10-28 | 1 | -1/+1
  * enc/depend: downcase table file names.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56515 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/depend: extract transcode_tblgen | nobu | 2016-10-28 | 1 | -1/+5
  * enc/depend: extract transcode_tblgen method calls for libraries loaded by dynamically generated names, in single_byte.trans.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56514 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* single_byte.trans: dead code | nobu | 2016-10-28 | 1 | -6/+3
  * enc/trans/single_byte.trans (transcode_tblgen_singlebyte): remove useless code. The returned value is not used.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56513 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/windows_1254.c, test/ruby/enc/test_case_comprehensive.rb: | duerst | 2016-10-16 | 1 | -6/+62
  Implement non-ASCII case conversion for Windows-1254.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56433 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * unicode/8.0.0/casefold.h, name2ctype.h, unicode/data/8.0.0: | duerst | 2016-09-07 | 2 | -40414/+0
  removing directories/files related to Unicode version 8.0.0
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56090 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * common.mk: Updated Unicode version to 9.0.0 [Feature #12513] | duerst | 2016-09-07 | 2 | -0/+42457
  * unicode/9.0.0/casefold.h, name2ctype.h, unicode/data/9.0.0: new directories/files for Unicode version 9.0.0
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56087 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* common.mk: separate unicode headers | nobu | 2016-08-16 | 2 | -0/+0
  * common.mk (UNICODE_HDR_DIR): separate unicode header files from unicode data files. [ruby-core:76879] [Bug #12677]
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55942 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* common.mk: UNICODE_HDR_DIR | nobu | 2016-08-16 | 1 | -2/+3
  * common.mk (UNICODE_HDR_DIR): directory for unicode headers.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55933 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* iso_8859_2.c: dedent [ci skip] | nobu | 2016-07-30 | 1 | -1/+1
  * enc/iso_8859_2.c: remove unnecessary indent.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55780 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/iso_8859_2.c, test/ruby/enc/test_case_comprehensive.rb: | duerst | 2016-07-30 | 1 | -2/+49
  Implement non-ASCII case conversion for ISO-8859-2, by Yushiro Ishii.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55775 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
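What "non-ASCII case conversion" means for a single-byte encoding can be sketched from the Ruby side: `String#upcase` works on an ISO-8859-2 string directly, without a round trip through UTF-8. The two letters below are chosen for this example and are not taken from the commit.

```ruby
# U+0105 (a with ogonek) and U+0161 (s with caron), transcoded to ISO-8859-2.
lower = "\u0105\u0161".encode("ISO-8859-2")
upper = lower.upcase  # case conversion happens in ISO-8859-2 itself

puts "encoding:    #{upper.encoding}"
puts "lower bytes: #{lower.bytes.map { |b| format('0x%02X', b) }.join(' ')}"
puts "upper bytes: #{upper.bytes.map { |b| format('0x%02X', b) }.join(' ')}"
puts "as UTF-8:    #{upper.encode('UTF-8')}"
```

The same pattern should apply to the other single-byte encodings listed in this log, assuming a Ruby that includes these commits.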
* * enc/windows_1253.c: Remove dead code found by Coverity Scan. | duerst | 2016-07-27 | 1 | -3/+1
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55763 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/windows_1257.c, test/ruby/enc/test_case_comprehensive.rb: | duerst | 2016-07-26 | 1 | -1/+55
  Implement non-ASCII case conversion for Windows-1257, by Sho Koike.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55752 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/windows_1250.c, test/ruby/enc/test_case_comprehensive.rb: | duerst | 2016-07-26 | 1 | -3/+52
  Implement non-ASCII case conversion for Windows-1250, by Sho Koike.
  * ChangeLog: Fixed order of previous two entries.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55751 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/windows_1251.c, test/ruby/enc/test_case_comprehensive.rb: | duerst | 2016-07-26 | 1 | -7/+67
  Implement non-ASCII case conversion for Windows-1251, by Shunsuke Sato.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55750 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/windows_1251.c, test/ruby/enc/test_case_comprehensive.rb: | duerst | 2016-07-26 | 1 | -1/+42
  Implement non-ASCII case conversion for Windows-1251, by Shunsuke Sato.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55749 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * regenc.h/c, include/ruby/oniguruma.h, enc/ascii.c, big5.c, cp949.c, | duerst | 2016-07-24 | 41 | -117/+1
  emacs_mule.c, euc_jp.c, euc_kr.c, euc_tw.c, gb18030.c, gbk.c, iso_8859_1|2|3|4|5|6|7|8|9|10|11|13|14|15|16.c, koi8_r.c, koi8_u.c, shift_jis.c, unicode.c, us_ascii.c, utf_16|32be|le.c, utf_8.c, windows_1250|51|52|53|54|57.c, windows_31j.c, unicode.c: Remove conditional compilation macro ONIG_CASE_MAPPING. [Feature #12386].
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55740 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* Move generated headers to unicode data directory | nobu | 2016-07-17 | 4 | -4/+22
  * common.mk, enc/depend (casefold.h, name2ctype.h): move to unicode data directory per version.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55701 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* common.mk: directory timestamps | nobu | 2016-07-15 | 1 | -2/+3
  * common.mk, enc/Makefile.in: moved timestamp files for directories under the specific directory, to get rid of match with files under the source directory.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55696 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* Revert r55693 because it broke building on all platforms (and had no ChangeLog). | usa | 2016-07-15 | 1 | -3/+2
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55694 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* common.mk: directory timestamps | nobu | 2016-07-15 | 1 | -2/+3
  * common.mk, enc/Makefile.in: moved timestamp files for directories under the specific directory, to get rid of match with files under the source directory.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55693 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/unicode: check Unicode versions | nobu | 2016-07-15 | 3 | -28/+44
  * enc/unicode/case-folding.rb, tool/enc-unicode.rb: check if Unicode versions are consistent with each other.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55687 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* common.mk: update enc/unicode/name2ctype.h | nobu | 2016-07-14 | 3 | -85779/+0
  * Makefile.in (enc/unicode/name2ctype.h): remove stale recipe, which did not support Unicode age properties.
  * common.mk (enc/unicode/name2ctype.h): update by --header option of tool/enc-unicode.rb. enc/unicode/name2ctype.kwd file has not been used.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55678 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* Fix file name in comment again | kazu | 2016-07-13 | 1 | -1/+1
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55670 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/iso_8859_9.c, test/ruby/enc/test_case_comprehensive.rb: | duerst | 2016-07-13 | 1 | -3/+57
  Implement non-ASCII case conversion for ISO-8859-9, by Kazuki Iijima.
  * enc/iso_8859_9.c: Exclude dotless i/I with dot from case-insensitive matching because they are not a case pair.
  * test/ruby/enc/test_iso_8859.rb: Make test coverage for ISO-8859-9 a bit more complete.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55666 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/windows_1252.c, test/ruby/enc/test_case_comprehensive.rb: | duerst | 2016-07-13 | 1 | -3/+51
  Implement non-ASCII case conversion for Windows-1252, by Serina Tai.
  * test/ruby/enc/test_case_comprehensive.rb: Fix order of encodings.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55665 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/iso_8859_7.c, test/ruby/enc/test_case_comprehensive.rb: | duerst | 2016-07-13 | 1 | -1/+57
  Implement non-ASCII case conversion for ISO-8859-7, by Kosuke Kurihara.
  * test/ruby/enc/test_case_comprehensive.rb: Fix order of encodings.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55664 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* Fix file names in comments | nobu | 2016-07-13 | 5 | -5/+5
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55661 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/iso_8859_1.c, enc/iso_8859_4.c: Avoid setting modification flag if | duerst | 2016-07-13 | 2 | -6/+3
  there is no modification.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55660 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/iso_8859_5.c, test/ruby/enc/test_case_comprehensive.rb: | duerst | 2016-07-13 | 1 | -1/+35
  Implement non-ASCII case conversion for ISO-8859-5, by Masaru Onodera.
  * test/ruby/enc/test_case_comprehensive.rb: Fix order of encodings.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55658 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/windows_1254.c: Adjust variable/macro names. | duerst | 2016-07-13 | 1 | -8/+8
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55654 b2dd03c8-39d4-4d8f-98ff-823fe69b080e