Revision history for Perl module Unicode::Collate.
0.76 Sun May 15 10:06:59 2011
- updated CJK/Pinyin.pm and CJK/Stroke.pm according to CLDR 1.9.1 using
type='pinyin' alt='short' and type='stroke' alt='short' respectively.
0.75 Sat May 7 21:07:38 2011
- supported ignore_level2 and rewrite.
- Added iglevel2.t and rewrite.t in t.
0.74 Mon Mar 21 19:07:38 2011
- removed sw (Swahili) collation according to CLDR 1.9.
(removed files: Collate/Locale/sw.pl and data/sw.txt)
- shifted primary weights of letters > Z for some languages.
(affected locales: da, fi, fo, kl, nb, nn, sv)
0.73 Sun Mar 6 13:24:22 2011
- DUCET is updated (for Unicode 6.0.0) as Collate/allkeys.txt.
! However no maint perl has supported Unicode 6.0.0 yet;
wait for 5.14, or try development version 5.13.7 or later.
! Please notice that allkeys.txt will be overwritten if you have had
other allkeys.txt already.
- The default UCA_Version is 22. Locale/*.pl and Korean.pm are updated.
- test: compare allkeys.txt's version with Base_Unicode_Version
in t/default.t.
0.72 Sat Jan 22 17:28:32 2011
- xs: fix mixing char* and U8*.
0.71 Tue Jan 18 22:29:44 2011
- t/loc_test.t should not fail without Unicode::Normalize.
0.70 Sun Jan 16 20:31:07 2011
- Now U::C::Locale->new will use the compiled DUCET via XS if available.
added some tests in t/loc_test.t.
0.69 Sat Jan 15 19:41:11 2011
- clarified about XSUB. revised INSTALL in README.
- xs: flag passed to utf8n_to_uvuni().
- doc and comments: [perl #81876] Fix typos by Peter J. Acklam.
0.68 Tue Nov 23 20:17:22 2010
- doc: clarified about (backwards => [ ]) and (backwards => undef).
- separated t/backwds.t from t/test.t.
- added cjk_b5.t, cjk_gb.t, cjk_ja.t, cjk_ko.t, cjk_py.t, cjk_st.t in t
for CJK/*.pm without Locale.pm.
0.67 Sun Nov 14 11:38:59 2010
- supported UCA_Version 22 for Unicode 6.0.0.
* 2B740..2B81D are new CJK unified ideographs.
* noncharacters (e.g. U+FFFF) should be overridable, not be ignored.
! DUCET is NOT updated, as no maint perl supports Unicode 6.0.0.
Thus the default UCA_Version is still 20.
- added t/nonchar.t.
- improved discontiguous contractions of 3 or more characters.
(e.g. 0FB2 0F71 0F80 and 0FB3 0F71 0F80)
- auxiliary: now 'mklocale' also copes with Korean.pm according to DUCET.
0.66 Sun Nov 7 10:47:30 2010
- U::C::Locale newly supports locale: ko.
- added Unicode::Collate::CJK::Korean for ko.
- added t/loc_ko.t.
- 12 compat. ideographs (e.g. U+FA0E) are treated as unified ideographs.
(though DUCET also does it, now Unicode::Collate does it without DUCET.)
- added t/compatui.t.
! Ideographs Ext.B (U+20000..U+2A6D6) can be overridden with UCA_Version 8.
This is a long-standing behavior from Unicode::Collate 0.11 to 0.63.
A wrong fix at 0.64 should be abandoned.
0.65 Wed Nov 3 13:10:20 2010
- U::C::Locale newly supports locale: zh and some of its variants.
(zh__big5han, zh__gb2312han, zh__pinyin, zh__stroke)
- added Unicode::Collate::CJK::Big5 for zh__big5han.
- added Unicode::Collate::CJK::GB2312 for zh__gb2312han.
- added Unicode::Collate::CJK::Pinyin for zh__pinyin.
- added Unicode::Collate::CJK::Stroke for zh__stroke.
- added loc_zh.t, loc_zhb5.t, loc_zhgb.t, loc_zhpy.t, loc_zhst.t in t.
0.64 Sun Oct 31 14:17:29 2010
- U::C::Locale newly supports locale: ja.
- added Unicode::Collate::CJK::JISX0208 for ja.
- added loc_ja.t, loc_jait.t, loc_japr.t in t.
- a subroutine specified in 'overrideCJK' or 'overrideHangul' is allowed
to return an integer or undef value.
- fix: Ideographs Ext.B (U+20000..U+2A6D6) are assigned in Unicode 3.1,
then 'overrideCJK' should not override them with UCA_Version 8.
!! sorry, this fix is based on a wrong idea. reverted at 0.66. !!
- separated t/overcjk0.t and t/overcjk1.t from t/override.t.
0.63 Sun Oct 10 22:13:21 2010
- supported suppress contractions (see 'suppress' in POD).
- internal for 'hangul_terminator' in getSortKey().
- U::C::Locale newly supports locales: be, bg, kk, mk, ru, sr.
- added loc_be.t, loc_bg.t, loc_cyrl.t, loc_kk.t, loc_mk.t, loc_ru.t,
loc_sr.t in t.
- added tailoring with U+0340 or U+0341 instead of U+0300 or U+0301.
(affected locales: hr, is, pl, se, to, wo)
0.62 Wed Oct 6 21:35:54 2010
- U::C::Locale newly supports locales: ar, hu, hy, se, to, uk.
- added loc_ar.t, loc_hu.t, loc_hy.t, loc_se.t, loc_to.t, loc_uk.t in t.
- Vietnamese (vi): added tailoring for U+0340 and U+0341.
0.61 Sat Oct 2 11:41:29 2010
- U::C::Locale newly supports locales: hr, ig, sq.
- added loc_hr.t, loc_ig.t, loc_sq.t in t.
- precomposites of e-dot-below, o-dot-below, o-tilde are tailored as well.
(affected locales: et, yo)
- Vietnamese (vi): added contractions for non-blocked decompositions
* base + dot-below + mark such as a\x{323}\x{306}, \x{1EA1}\x{306} etc.
* base + tone + horn such as o\x{309}\x{31B}, \x{1ECF}\x{31B} etc.
0.60 Thu Sep 23 21:37:36 2010
- bug fix: index() [and its friends including gmatch()] didn't remove
ignorable characters in the substring correctly.
Thanks for the bug report:
http://www.xray.mpe.mpg.de/mailing-lists/perl-unicode/2010-09/msg00014.html
- U::C::Locale newly supports locales: de__phonebook, nso, om, tn, vi.
- added loc_de.t, loc_deph.t, loc_nso.t, loc_om.t, loc_tn.t, loc_vi.t in t.
- precomposites of a-breve, a-circ, e-circ, o-circ are tailored as well.
(affected locales: ro, sk, sv)
0.59 Sun Sep 5 17:03:52 2010
- U::C::Locale newly supports locales: az, fil, ha, lt, mt, tr, wo, yo.
- added loc_az.t, loc_fil.t, loc_ha.t, loc_lt.t, loc_mt.t, loc_tr.t,
loc_wo.t, loc_yo.t in t.
- precomposites of a-uml, o-uml, and u-uml are tailored as well.
(affected locales: da, et, fi, fo, is, kl, nb, nn, sk, sv)
0.58 Sun Aug 29 19:56:50 2010
- U::C::Locale newly supports locales: af, cy, da, fo, haw, is, kl, sw.
- added loc_af.t, loc_cy.t, loc_da.t, loc_fo.t, loc_haw.t, loc_is.t,
loc_kl.t, loc_sw.t in t.
0.57 Sun Aug 22 22:39:58 2010
- U::C::Locale newly supports locales: ca, et, fi, lv, sk, sl.
- added loc_ca.t, loc_et.t, loc_fi.t, loc_lv.t, loc_sk.t, loc_sl.t in t.
0.56 Sun Aug 8 20:24:03 2010
- Unicode::Collate::Locale newly supports locales: eo, nb, ro, sv.
- added loc_eo.t, loc_es.t, loc_estr.t, loc_nb.t, loc_ro.t, loc_sv.t in t.
! renamed t/locale_{xy}.t to t/loc_{xy}.t (for safer 8.3 names)
(loc_cs.t, loc_fr.t, loc_nn.t, loc_pl.t, loc_test.t)
0.55 Sun Aug 1 21:21:23 2010
- incorporated Unicode::Collate::Locale with some changes. see:
http://www.xray.mpe.mpg.de/mailing-lists/perl-unicode/2004-03/msg00030.html
- supported locales: cs, es, es__traditional, fr, nn, pl.
! added t/locale*.t that uses DUCET.
(locale_cs.t, locale_fr.t, locale_nn.t, locale_pl.t, locale_test.t)
- data/*.txt and mklocale for preparation of Locale/*.pl from DUCET.
0.54 Sun Jul 25 21:37:04 2010
- Now UCA Revision 20 (based on Unicode 5.2.0).
- DUCET is also updated (for Unicode 5.2.0) as Collate/allkeys.txt,
which *is required* to test this module.
! Please notice that allkeys.txt will be overwritten if you have had
other allkeys.txt already.
- U+9FC4..U+9FCB and U+2A700..U+2B734 are new CJK unified ideographs.
- Many hangul jamo are assigned (affecting hangul_terminator).
! Now XSUB will be built by default. (XSUB needs a C compiler.)
To build pure perl, run disableXS before Makefile.PL.
! DUCET will be compiled when XS is used. Explicitly specifying
<table => 'allkeys.txt'> (or using another table) will prevent
this module from using the compiled DUCET.
! added t/default.t that uses DUCET.
0.53 Sun Feb 14 20:46:27 2010
- Now UCA Revision 18 (based on Unicode 5.1.0).
- DUCET is also updated (for Unicode 5.1.0) as Collate/allkeys.txt,
which is not required to test this module.
! Please notice that allkeys.txt will be overwritten if you have had
other allkeys.txt already.
- U+9FBC..U+9FC3 are new CJK unified ideographs.
0.52 Thu Oct 13 21:51:09 2005
- The Unicode::Collate->new method does not destroy user's $_ any longer.
(thanks to Jon Warbrick for bug report)
0.51 Sun May 29 20:21:19 2005
- Added the latest DUCET (for Unicode 4.1.0) as Collate/allkeys.txt,
which is not required to test this module.
! Please notice that allkeys.txt will be overwritten if you have had
other allkeys.txt already.
- Added INSTALL section in POD.
0.50 Sun May 8 20:26:39 2005
- Now UCA Revision 14 (based on Unicode 4.1.0).
- Some tests are modified.
- Added cjkrange.t, ignor.t, override.t in t.
- Added META.yml.
0.40 Sat Apr 24 06:54:40 2004
- Now a table file is searched in @INC.
0.33 Sat Dec 13 14:07:27 2003
- documentation improvement: in "entry", "overrideHangul", etc.
0.32 Wed Dec 3 23:38:18 2003
- A matching part from index(), match() etc. will include illegal
code points (as well as ignorable characters) following a grapheme.
- Contraction with illegal code point will be invalid.
- Added t/view.t.
- Added some tests in t/illegal.t.
- Separated t/altern.t and t/rearrang.t from t/test.t.
- modified XSUB internals.
0.31 Sun Nov 16 15:40:15 2003
- Illegal code points (surrogate and noncharacter; they are definitely
ignorable) will be distinguished from NULL ("\0");
but porting is not successful in the case of ((Pure Perl) and
(Perl 5.7.3 or before)). If perl 5.6.X is used, XSUB may help it
in place of broken CORE::unpack('U*') in older perl.
- added illegal.t and illegalp.t in t.
- added XSUB where some functions are implemented in XSUB.
Pure Perl is also supported.
0.30 Mon Oct 13 21:26:37 2003
- fix: Completely ignorable in table should be able to be overridden
by non-ignorable in entry.
- fix: Maximum length for contraction must not be shortened
by a shorter contraction following in table and/or entry.
- added t/normal.t.
- some doc fixes
0.29 Mon Oct 13 12:18:23 2003
- now UCA Version 11 (but no functionality is different from Version 9).
- supported hangul_terminator.
- fix: Base_Unicode_Version falsely returns Perl's Unicode version.
C4 in UTS #10 requires UTS's Unicode version.
- For variable weighting, 'variable' is recommended
and 'alternate' is deprecated.
- added version() method.
- added hangtype.t, trailwt.t, variable.t, and version.t in t.
0.28 Sat Sep 06 20:16:01 2003
- Fixed another inconsistency under (normalization => undef):
Non-contiguous contraction is always neglected.
- Fixed: according to S2.1 in UTS #10, a blocked combining character
should not be contracted. One test in t/test.t was wrong, then removed.
- Added t/contract.t.
- (normalization => "prenormalized") is able to be used.
0.27 Sun Aug 31 22:23:17 2003
some improvements:
- The maximum length of contracted CE was not checked (v0.22 to v0.26).
Collation of a large string including a first letter of a contraction
that is not a part of that contraction (say, 'c' of 'ca'
where 'ch' is defined) was too slow and inefficient.
- A form name for 'normalization', no longer restricted to
/^(?:NF)?K?[CD]\z/, will be allowed as long as
Unicode::Normalize::normalize() accepts it, since Unicode::Normalize
or UAX #15 may be changed/enhanced in future.
- When Hangul syllables are decomposed under <normalization => undef>,
contraction among jamo (LV, VT, LVT) derived from the same
Hangul syllable is allowed.
- Added t/hangul.t.
0.26 Sun Aug 03 22:23:17 2003
- fix: an expansion in which a CE is level 3 ignorable and others are not
was wrongly made level 3 ignorable as a whole entry.
(In DUCET, some precomposites in Musical Symbols are so)
0.25 Mon Jun 06 23:20:17 2003
- fix Makefile.PL.
- internal tweak (again): pack_U() and unpack_U().
0.24 Thu Apr 02 23:12:54 2003
- internal tweak for (?un)pack 'U'.
0.23 Wed Sep 04 19:25:20 2002
- fix: scalar match() no longer returns an lvalue substr ref.
- fix: "Ignorable after variable" should be made level 3 ignorable
even if alternate => 'blanked'.
- Now a grapheme may contain trailing level 2, level 3,
and completely ignorable characters.
0.22 Mon Sep 02 23:15:14 2002
- New File: t/index.t.
(The new t/test.t excludes tests for index.)
- tweak on index(). POSITION is supported.
- add match, gmatch, subst, gsubst methods.
- fix: ignorable after variable in 'shift'-variable weight.
0.21 Sat Aug 03 10:24:00 2002
- upgrade keys.txt and t/test.t for UCA Version 9.
0.20 Fri Jul 26 02:15:25 2002
- now UCA Version 9.
- U+FDD0..U+FDEF are new non-characters.
- fix: whitespace characters before @backwards etc. in a table file.
- now values for 'alternate', 'backwards', etc.,
which are explicitly specified via new(),
are preferred to those specified in a table file.
0.12 Sun May 05 09:43:10 2002
- add new methods, ->UCA_Version and ->Base_Unicode_Version.
- test fix: removed the needless requirement of Unicode::Normalize.
[reported by David Hand]
0.11 Fri May 03 02:28:10 2002
- fix: now derived collation elements can be used for Hangul Jamo
when their weights are not defined.
[reported by Andreas J. Koenig]
- fix: rearrangements had not worked.
- mentioned a problem with index() in BUGS.
- more documents, more tests.
- tag names for 'alternate' are case-insensitive (i.e. 'SHIFTed' etc.).
- The <undef> value for the keys "overrideCJK", "overrideHangul",
"rearrange" has a special behavior (different from default).
0.10 Tue Dec 11 23:26:42 2001
- now you are allowed to use no table file.
- fix: fetching CE with two or more combining characters.
0.09 Sun Nov 11 17:02:40:18 2001
- add the following methods: eq, ne, lt, le, gt, ge.
- relies on &Unicode::Normalize::getCombinClass()
in place of %Unicode::Normalize::Combin
(the hash is not defined in the XS version of Unicode::Normalize).
then you should install Unicode::Normalize 0.10 or later.
- now independent of Lingua::KO::Hangul::Util
(this module does decomposition of Hangul syllables for itself)
0.08 Mon Aug 20 22:40:18 2001
- add the index method.
0.07 Thu Aug 16 23:42:02 2001
- rename the module name to Unicode::Collate.
0.06 Thu Aug 16 23:18:36 2001
- add description of the getSortKey method.
0.05 Mon Aug 13 22:23:11 2001
- bug fix: concerning 4.2.1 in UTR #10.
- getSortKey returns a string, but not an arrayref.
0.04 Mon Aug 13 22:23:11 2001
- some bugs are fixed.
- some tailoring parameters are added.
0.03 Mon Aug 06 06:26:35 2001
- modify README
0.02 Sun Aug 05 20:20:01 2001
- some fix
0.01 Sun Jul 29 16:16:15 2001
- original version; created by h2xs 1.21
with options -A -X -n Sort::UCA