Name: icu
URL: http://site.icu-project.org/
Version: 4.6
License: MIT
Security Critical: yes

Description:
This directory contains the source code of ICU 4.6 for C/C++.

1. It was obtained with the following:

   $ svn export --native-eol LF http://source.icu-project.org/repos/icu/icu/tags/release-4-6 icu46

2. Platform header files for Linux, FreeBSD, OpenBSD, Android and Mac OS X:

   - Apply platform.patch in the patches directory.
     It applies the upstream patch to platform.h.in
     (see http://bugs.icu-project.org/trac/ticket/8248) and changes
     source/common/unicode/ptypes.h to refer to plinux.h and pmac.h
     generated below.
   - 'runConfigureICU Linux', 'runConfigureICU FreeBSD', and
     'runConfigureICU MacOSX' are run to generate
     source/common/unicode/platform.h.
   - On OpenBSD, source/common/unicode/platform.h is generated by the
     icu4c port in the ports directory, not by runConfigureICU.
     If the file has to be updated, run:
       cd /home/ports/textproc/icu4c && make configure
   - Rename the generated file to 'plinux.h', 'pfreebsd.h', 'popenbsd.h'
     or 'pmac.h', depending on the platform.
   - Apply patches/pmach.h.patch on Mac to pmac.h
   - On Android, pandroid.h was generated by copying plinux.h to
     pandroid.h and applying patches/pandroid.h.patch.
   - Apply the CL at https://codereview.chromium.org/15973007/ to plinux.h

3. The following directories were removed because they're not used by
   Chromium at the moment:

   as_is
   packaging
   source/extra
   source/sample
   source/layout
   source/layoutex

4. The word breaking for Chinese and Japanese was modified to use a word
   frequency list with the following patches and cjdict.txt.

   - patches/segmentation.patch : Adds dictionary (word-frequency)-based
     word breaking for CJK. (Korean is supported in the code, but it does
     not do anything because we don't have a Korean word list.)
   - source/data/brkitr/cjdict.txt : Chinese and Japanese word frequency
     list. See the file for the license/copyright notice.
   - source/data/brkitr/cc_edict.txt : The list of words derived from
     CC-Edict.
   - patches/brkitr.patch
     * word.txt : Chinese/Japanese segmentation rules,
       Hebrew-script-specific handling of U+0022, and splitting of FQDN
       into labels at '.'.
       For Hebrew, see http://unicode.org/cldr/track/ticket/3120
     * line.txt : Incorporated line_he and minor changes in CL, OP and ID
       definitions.
       For Hebrew, see http://unicode.org/cldr/track/ticket/4004
       For others, see
         http://unicode.org/cldr/track/ticket/3974
         http://unicode.org/cldr/track/ticket/4200
         http://unicode.org/cldr/track/ticket/
     * brklocal.mk : Build file changes to drop unnecessary brkitr rule
       files (e.g. word_ja.txt, line_he.txt)
   - android/brkitr.patch (to be applied for Android build only) :
     Reverts some of the Chinese/Japanese segmentation rule changes in
     patches/brkitr.patch to reduce binary size for Android.

   If you want to run ICU tests, you have to copy
   source/data/brkitr/cjdict.txt to
   source/test/testdata/cjdict-truncated.txt to pass the
   TestTrieWithValue test.

5. Converter changes : converters.patch

   - Include only what we really need. See
     source/data/mappings/ucmlocal.txt
   - Alias and mapping changes : source/data/mappings/convrtrs.txt
   - Changes several tables and adds six new tables, three of which are
     'fake' tables for ISO-2022-CN(-Ext).
   - ucnv2022.c is modified to use the 3 'fake' tables added above for
     ISO-2022-CN(-Ext).

6. Locale changes

   - patches/locale1.patch :
       Filipino, Amharic, and Swahili locales
       Exemplar character set changes for CJK + 9 Indian locales
       Minor fixes for Danish, , Turkish, and Korean.
   - patches/locale2.patch : The minimum locale data Chrome needs for 47
     languages Chrome is not localized to. Each locale data file has
     ExemplarCharacters, LocaleScript, layout, and the name of the
     locale's language in that language.
   - patches/locale3.patch : Locale build configuration files. They add
     reslocal.mk or {trns,sprep,rbnf,coll}local.mk files to
     source/data/{coll,curr,lang,locale,region,translit,zone,rbnf,sprep}.
   - In source/data/region, run the following command to get rid of
     numeric region display names we don't use (everything other than
     419).
       $ sed -i '/[0-35-9][0-9][0-9]{/ d' *.txt
   - android/patch_locale.sh (to be run for Android build only) : Makes
     changes to source/data/{curr,region,lang} to exclude these data
     except the language and script names of zh_Hans and zh_Hant.

7. Removal of unihan collation tables from data/coll/{zh,ja,ko}.txt

   - patches/unihan.patch : unihan collation tables are never used in
     Chrome/WebKit, but they take about 1MB in the uncompressed ICU data
     file in ICU 4.2.1.

8. Timezone data update

   - Grab the latest version of the following timezone data files and
     put them in source/data/misc.
       metaZones.txt
       timezoneTypes.txt
       windowsZones.txt
       zoneinfo64.txt
     As of Dec 2013, the latest version is 2013h and the above files are
     available at
       http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/2013h/44/

9. Transliterator customization

   - Add the following files taken from ICU 52 to source/data/translit
       {tr,el,az}_{Upper,Lower,Title}.txt
   - Also add css3transform.txt to the same directory
   - Put the following line in trnslocal.mk
       TRANSLIT_SOURCE=css3transform.txt

10. Build-related changes

   - patches/wpo.patch
   - patches/vscomp.patch
     (see http://bugs.icu-project.org/trac/ticket/8355 and
     http://bugs.icu-project.org/trac/ticket/8356 )
   - patches/rtti.patch : Makes RTTI work without exception handling on
     Windows (see http://bugs.icu-project.org/trac/ticket/8343)
   - patches/data.build.patch : Removes some data files we don't use to
     cut down the data size.
   - patches/data.build.win.patch : Windows-only data build patch. Adds a
     new target DATALIB to makedata.mak
   - patches/clang.patch : To build with Clang.
     (see http://bugs.icu-project.org/trac/ticket/8954
     Two other chunks in the patch have already been fixed in the ICU
     trunk.)
   - Add an empty file (stubdatabuilt.txt) to source/stubdata

11. Pre-built data libraries are checked in.
   Before building the data file on Linux, re-run 'runConfigureICU Linux'
   if it was run without data.build.patch in #10 above.
   Because we removed the layout and layoutex directories in step 3,
   'runConfigureICU Linux' will fail even with '--disable-layout'.
   A workaround is to keep a copy of our icu tree in a separate build
   directory and add back the directories we removed in step 3 before
   running 'runConfigureICU'.

   'make' will fail in the 1st pass. Copy source/data/in/coll/invuca.icu
   to {BUILD_DIR_ROOT}/data/out/build/icudt46l/coll and re-run 'make' in
   {BUILD_DIR_ROOT}/data.
   'make' will fail again when pkgdata looks for css3transform.res.
   Edit data/out/tmp/icudata.lst to replace 'css3transform.res' with
   'root.res' (see http://bugs.icu-project.org/trac/ticket/10570 ) and
   run 'make' again.

   - source/data/in/icudt46l.dat : Built on Linux with all the patches
     above applied. This file will be generated in
     {BUILD_DIR_ROOT}/data/out/tmp.
   - windows/icudt.dll : With icudt46l.dat in place, all the patches
     applied and header files moved (#11 below), generated by building
     the icudt_build project of build/icudt_build.sln on Windows.
     icudt46.dll is generated in bin/{Release,Debug}, copied to
     windows/icudt.dll and checked in.
     Note that we drop the version number ('46') from the dll name to
     avoid having to update our build scripts/configuration files every
     time ICU is upgraded to a new version.
   - {mac,linux}/icudt46l_dat.S : Built on Mac and Linux with all the
     patches above (except android/brkitr.patch) applied and checked in.
     This file will be generated in {BUILD_DIR_ROOT}/data/out/tmp.
     Alternatively, one can just generate icudt46l_dat.S on Linux and
     adapt the header portion to match the current header in
     mac/icudt46l_dat.S.
     The header is as follows (shown indented here; the file itself has
     no leading space on any line):

       .globl _icudt46_dat
       #ifdef U_HIDE_DATA_SYMBOL
       .private_extern _icudt46_dat
       #endif
       .data
       .const
       .align 4
       _icudt46_dat:

   - android/icudt46l_dat.S : Built on Linux with all the patches above
     and android/brkitr.patch applied and android/patch_locale.sh
     executed, and checked in.

12. Apply the fixes found with static analysis tools such as PSV and
    Coverity

   - patches/static.analysis.patch
   - upstream trunk/4.8 does not have this code any more.

13. Fix for msvs2010 applied:

--- D:/src/ent/src/third_party/icu/source/common/stringpiece.cpp (revision 78292)
+++ D:/src/ent/src/third_party/icu/source/common/stringpiece.cpp (working copy)
@@ -75,7 +75,7 @@
  * Visual Studios 9.0.
  * Cygwin with MSVC 9.0 also complains here about redefinition.
  */
-#if (!defined(_MSC_VER) || (_MSC_VER > 1500)) && !defined(CYGWINMSVC)
+#if (!defined(_MSC_VER) || (_MSC_VER > 1600)) && !defined(CYGWINMSVC)
 const int32_t StringPiece::npos;
 #endif

14. Fix for locales that don't use '.' as decimal separator:
    patches/nan.patch

   - upstream bug: http://bugs.icu-project.org/trac/ticket/8561
   - Handle other chars besides the dot. This is required because
     decNumber's parser expects the dot as a decimal separator.
   - Locales that don't use the dot were producing "NaN" values.

15. Fix a bug in the regex engine.

   - patches/regex.patch
   - upstream bug: http://bugs.icu-project.org/trac/ticket/8666
     (fixed in the upstream)

16. Apply the upstream patch for Korean search collator support
    (ICU 4.6.1).

   - patches/search_collation.patch
   - upstream bug: http://bugs.icu-project.org/trac/ticket/8290

17. Fix a use-of-uninitialized-memory bug in regular expression matching

   - patches/rematch.patch
   - upstream bug: http://bugs.icu-project.org/trac/ticket/8824

18. Make it compile with -Werror on gcc 4.6

   - patches/gcc46.patch (ToT upstream does not have this code any more).

19.
Fix four out-of-bounds memory access errors in common/uloc.c and
common/uresbund.c

   - patches/uloc.patch
   - upstream bugs:
     1. http://bugs.icu-project.org/trac/ticket/8984 (_canonicalize)
     2. http://bugs.icu-project.org/trac/ticket/9114 (_getKeywords)
     3. http://bugs.icu-project.org/trac/ticket/8812 (uresbund)
        http://bugs.icu-project.org/trac/ticket/8813 (uresbund)
     4. http://bugs.icu-project.org/trac/ticket/10250 (_getKeywords)

20. Fix a null pointer error in ubrk_setText in ubrk.cpp.

   - patches/ubrk.patch
   - upstream bug: http://bugs.icu-project.org/trac/ticket/9115

21. Fix a clang warning in rbbi.cpp by merging in an upstream change.

   - patches/changeset_30255.patch
   - upstream change: http://bugs.icu-project.org/trac/changeset/30255

22. Fix time zone handling and compilation on iOS.

   - patches/ios_timezone.patch
   - upstream bugs:
       http://bugs.icu-project.org/trac/ticket/9051
       http://bugs.icu-project.org/trac/ticket/8661

23. Fix a buffer overflow in utext

   - patches/utext.patch
   - upstream change: http://bugs.icu-project.org/trac/changeset/29356

24. Fix compilation errors on VS2012 and above.

   - patches/vs2012.patch

25. Fix a buffer overflow in UTF-16/32 detection.

   - patches/csetdet.patch
   - upstream bug: http://bugs.icu-project.org/trac/ticket/10318

26. Add BreakIterator::getRuleStatus

   - patches/breakiterator.patch
   - Copy and paste the BreakIterator::getRuleStatus API from ICU 52
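
Appendix: previewing two of the manual edits

Two of the steps above edit files by hand: the sed filter on the region
files in step 6, and the icudata.lst fix in step 11. The sketch below
runs the same kind of edit on made-up sample input so the effect can be
checked before touching a real tree. The function names, the sample
entries and the layout assumed for icudata.lst are all hypothetical;
only the sed expression for step 6 and the two .res file names come from
this README.

```shell
#!/bin/sh
# Sketch only: previews two manual edits from this README on made-up data.

# Step 6: drop numeric region display names. Same sed expression as in the
# README, but filtering stdin instead of editing *.txt in place.
filter_numeric_regions() {
    sed '/[0-35-9][0-9][0-9]{/ d'
}

# Step 11: list root.res in place of css3transform.res so pkgdata stops
# looking for the missing file. Filters stdin; the real target is
# data/out/tmp/icudata.lst.
patch_icudata_list() {
    sed 's/css3transform\.res/root.res/'
}

# Hypothetical sample input; real entries live in source/data/region/*.txt.
sample_regions() {
    printf '%s\n' \
        '        001{"World"}' \
        '        419{"Latin America"}' \
        '        US{"United States"}'
}

# Hypothetical sample input; the real list layout may differ.
sample_icudata_list() {
    printf '%s\n' 'translit/el.res' 'translit/css3transform.res'
}

sample_regions | filter_numeric_regions
sample_icudata_list | patch_icudata_list
```

Note that the step 6 expression skips every three-digit code starting
with 4, so it keeps 419 by keeping the whole 4xx range, not 419 alone.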