The *.txt files were copied from

	ftp://www.unicode.org/Public/UNIDATA

along with the subdirectories 'extracted' and 'auxiliary'.

The Unihan files were not included due to space considerations.  Also NOT
included were any *.html files and *Test.txt files.  It is possible to add the
Unihan files, and edit mktables (see instructions near its beginning) to look
at them.

The file 'version' should exist and contain a single line giving the Unicode
version, like:
5.2.0
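
A quick sanity check of the 'version' file can be sketched in POSIX shell.
This is only an illustration: the function name check_version_file is
hypothetical, and the pattern assumes versions always have the three-part
form shown in the example above.

```shell
#!/bin/sh
# Check that a 'version' file holds exactly one line that looks like a
# Unicode version.  Assumption: versions have the form MAJOR.MINOR.UPDATE,
# as in the example 5.2.0; adjust the pattern if yours differ.
check_version_file() {
    file=${1:-version}
    if [ ! -f "$file" ]; then
        echo "$file: no such file" >&2
        return 1
    fi
    # grep -c '' counts lines, including a final line with no newline.
    lines=$(grep -c '' "$file" || true)
    if [ "$lines" -ne 1 ]; then
        echo "$file: expected 1 line, found $lines" >&2
        return 1
    fi
    if ! grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$' "$file"; then
        echo "$file: contents do not look like a Unicode version" >&2
        return 1
    fi
}
```

Run from lib/unicore, 'check_version_file version' returns success only when
the file passes all three checks.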

To be 8.3 filesystem friendly, the names of some of the input files have been
changed from the values that are in the Unicode DB:

mv PropertyValueAliases.txt PropValueAliases.txt
mv NamedSequencesProv.txt NamedSqProv.txt
mv DerivedAge.txt DAge.txt
mv DerivedCoreProperties.txt DCoreProperties.txt
mv DerivedNormalizationProps.txt DNormalizationProps.txt
mv extracted/DerivedBidiClass.txt extracted/DBidiClass.txt
mv extracted/DerivedBinaryProperties.txt extracted/DBinaryProperties.txt
mv extracted/DerivedCombiningClass.txt extracted/DCombiningClass.txt
mv extracted/DerivedDecompositionType.txt extracted/DDecompositionType.txt
mv extracted/DerivedEastAsianWidth.txt extracted/DEastAsianWidth.txt
mv extracted/DerivedGeneralCategory.txt extracted/DGeneralCategory.txt
mv extracted/DerivedJoiningGroup.txt extracted/DJoinGroup.txt
mv extracted/DerivedJoiningType.txt extracted/DJoinType.txt
mv extracted/DerivedLineBreak.txt extracted/DLineBreak.txt
mv extracted/DerivedNumericType.txt extracted/DNumType.txt
mv extracted/DerivedNumericValues.txt extracted/DNumValues.txt

If you have the Unihan database (5.2 and above), you should also do the
following:

mv Unihan_DictionaryIndices.txt UnihanIndicesDictionary.txt
mv Unihan_DictionaryLikeData.txt UnihanDataDictionaryLike.txt
mv Unihan_IRGSources.txt UnihanIRGSources.txt
mv Unihan_NumericValues.txt UnihanNumericValues.txt
mv Unihan_OtherMappings.txt UnihanOtherMappings.txt
mv Unihan_RadicalStrokeCounts.txt UnihanRadicalStrokeCounts.txt
mv Unihan_Readings.txt UnihanReadings.txt
mv Unihan_Variants.txt UnihanVariants.txt

If you download everything, note that the commands above do not rename files
that mktables does not use, such as the test files.  Those files will not work
correctly as-is on 8.3 filesystems.
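
One way to spot problem names is sketched below in POSIX shell.  It assumes
that "8.3 friendly" means the names in a directory stay distinct, ignoring
case, once the base name is truncated to 8 characters and the extension to 3.
That reading is consistent with the renames listed above but is not stated
here, and the function name find_83_collisions is hypothetical.

```shell
#!/bin/sh
# Report files in one directory whose names collide once truncated to
# 8.3 form.  Assumption: a name is "8.3 friendly" if it remains unique
# after cutting the base to 8 characters and the extension to 3,
# compared case-insensitively.
find_83_collisions() {
    dir=${1:-.}
    for path in "$dir"/*; do
        [ -f "$path" ] || continue
        name=${path##*/}
        case $name in
            *.*) base=${name%.*}; ext=${name##*.} ;;
            *)   base=$name; ext= ;;
        esac
        # %.8s and %.3s truncate the strings; tr folds to upper case.
        short=$(printf '%.8s.%.3s' "$base" "$ext" | tr '[:lower:]' '[:upper:]')
        printf '%s\t%s\n' "$short" "$name"
    done | LC_ALL=C sort | awk -F '\t' '
        { count[$1]++; names[$1] = names[$1] " " $2 }
        END { for (k in count) if (count[k] > 1) print k ":" names[k] }'
}
```

Calling 'find_83_collisions extracted' before the renames would be expected to
list, for instance, DerivedBidiClass.txt and DerivedBinaryProperties.txt
together, since both truncate to the same short name.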

mktables is used to generate the tables used by the rest of Perl.  It will warn
you about any *.txt files in the directory substructure that it doesn't know
about.  You should remove any files it flags, or edit mktables to add them to
its lists to process.  You can run

    mktables -globlist

to have it try to process these tables generically.

If any files are added, deleted, or renamed, you must run

    mktables -makelist

to generate a new list of all the files.
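
To notice when the set of *.txt files has changed and a -makelist run is due,
one option is to compare the current file list against a saved snapshot.  The
following is a minimal sketch; the snapshot file and both function names are
hypothetical and are not part of mktables.

```shell
#!/bin/sh
# Print every *.txt file under a directory, one relative path per line,
# in a stable order.
list_txt_files() {
    (cd "${1:-.}" && find . -type f -name '*.txt' | LC_ALL=C sort)
}

# Succeed (return 0) if the current file set differs from the snapshot,
# or if no snapshot exists yet.  The snapshot is a plain file of our own
# making, not something mktables reads or writes.
txt_files_changed() {
    dir=$1
    snapshot=$2
    [ -f "$snapshot" ] || return 0
    if list_txt_files "$dir" | cmp -s - "$snapshot"; then
        return 1    # identical: nothing added, deleted, or renamed
    fi
    return 0
}
```

A caller could test 'txt_files_changed . "$snapshot"', run mktables -makelist
when it succeeds, and then refresh the snapshot with
'list_txt_files . > "$snapshot"'.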

FOR PUMPKINS

The files are inter-related.  If you take the latest UnicodeData.txt, for
example, but leave the older versions of other files, there can be subtle
problems.
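
A rough check for mixed versions can be sketched in POSIX shell.  It relies on
an assumption: that a data file's first line is a comment of the form
"# Name-X.Y.Z.txt".  Files without such a header (UnicodeData.txt is one) are
skipped, so a clean result is not proof of consistency.  The function name
check_file_versions is hypothetical.

```shell
#!/bin/sh
# Compare the version embedded in each data file's first line against
# the contents of 'version'.  Assumption: headers look like
# "# DerivedAge-5.2.0.txt"; files with no such header are skipped.
check_file_versions() {
    dir=${1:-.}
    want=$(head -n 1 "$dir/version") || return 1
    status=0
    for f in "$dir"/*.txt "$dir"/*/*.txt; do
        [ -f "$f" ] || continue
        found=$(head -n 1 "$f" |
            sed -n 's/^# .*-\([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\)\.txt.*$/\1/p')
        [ -n "$found" ] || continue    # no recognizable header
        if [ "$found" != "$want" ]; then
            echo "$f: header says $found, expected $want"
            status=1
        fi
    done
    return $status
}
```

Running 'check_file_versions .' in lib/unicore prints one line per mismatched
file and returns a non-zero status if any were found.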

When moving to a new version of Unicode, you need to update 'version' by hand:

	p4 edit version
	...

You should look in the Unicode release notes (which are probably towards the
bottom of http://www.unicode.org/reports/tr44/) to see if any properties have
newly been moved to be Obsolete, Deprecated, or Stabilized.  The full names for
these should be added to the respective lists near the beginning of mktables,
using an 'if' to add them for just this Unicode version going forward, so that
mktables can continue to be used for earlier Unicode versions. 

When putting out a new Perl release, consider whether any of the Deprecated
properties should be moved to Suppressed.

The *.pl files are generated from the *.txt files by the mktables script.
This is now done during the Perl build process, but if you want to try
the old manual way:
	
	cd lib/unicore
	p4 edit *.pl */*.pl */*/*.pl
	perl ./mktables -P ../../pod -T ../../t/re/uniprops.t -makelist
	p4 revert -a
	cd ../..
	perl Porting/manicheck
	
If any new (or deleted, unlikely but not impossible) *.pl files are indicated:

	cd lib/unicore
	p4 add ...
	p4 delete ...
	cd ../..
	p4 edit MANIFEST
	...

And finally:

	p4 submit

-- 
jhi@iki.fi; updated by nick@ccl4.org, public@khwilliamson.com