<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="Content-Language" content="en-us">
<meta name="GENERATOR" content="Microsoft FrontPage 4.0">
<meta name="ProgId" content="FrontPage.Editor.Document">
<meta name="keywords"
content="unicode, character properties, PropList, UCD">
<meta name="description" content="Describes PropList.txt">
<title>UCD: Extended Character Properties</title>
<link rel="stylesheet" type="text/css" href="http://www.unicode.org/unicode.css">
</head>
<body bgcolor="#ffffff">
<table width="100%" cellpadding="0" cellspacing="0" border="0">
<tr>
<td>
<table width="100%" border="0" cellpadding="0" cellspacing="0">
<tr>
<td class="icon"><a href="http://www.unicode.org"><img border="0"
src="http://www.unicode.org/webscripts/logo60s2.gif" align="middle"
alt="[Unicode]" width="34" height="33"></a> <a
class="bar" href="UnicodeCharacterDatabase.html">Unicode Character
Database</a></td>
</tr>
</table>
</td>
</tr>
<tr>
<td class="gray"> </td>
</tr>
</table>
<h1>Extended Character Properties</h1>
<table height="87" cellspacing="2" cellpadding="0" width="100%" border="1">
<tbody>
<tr>
<td valign="top" width="144">Revision</td>
<td valign="top">3.1.0</td>
</tr>
<tr>
<td valign="top" width="144">Authors</td>
<td valign="top">Mark Davis</td>
</tr>
<tr>
<td valign="top" width="144">Date</td>
<td valign="top">2001-02-28</td>
</tr>
<tr>
<td valign="top" width="144">This Version</td>
<td valign="top"><a
href="http://www.unicode.org/Public/3.1-Update/PropList-3.1.0.html">http://www.unicode.org/Public/3.1-Update/PropList-3.1.0.html</a></td>
</tr>
<tr>
<td valign="top" width="144">Previous Version</td>
<td valign="top">n/a</td>
</tr>
<tr>
<td valign="top" width="144">Latest Version</td>
<td valign="top"><a
href="http://www.unicode.org/Public/UNIDATA/PropList.html">http://www.unicode.org/Public/UNIDATA/PropList.html</a></td>
</tr>
</tbody>
</table>
<h3><i><br>
Summary</i></h3>
<blockquote>
<p><i>This document describes the format and content of the PropList.txt data
file in the Unicode Character Database (UCD).</i></p>
</blockquote>
<h3><i>Status</i></h3>
<blockquote>
<p><i>This file and the files described herein are part of the Unicode
Character Database and governed by the <a href="#UCD_Terms">UCD Terms of Use</a>
given below.</i></p>
<p><i>For general information on file formats and table formats, and the
implications of normative vs. informative properties, see
UnicodeCharacterDatabase.html.</i></p>
<p><i><b>Warning: </b>the information in this file does not completely
describe the use and interpretation of Unicode character properties and
behavior. It must be used in conjunction with the data in the other files in
the UCD, and relies on the notation and definitions supplied in <a
href="http://www.unicode.org/unicode/standard/versions/Unicode3.0.html">The
Unicode Standard</a>. All chapter references are to Version 3.1.0 of the
standard.</i></p>
</blockquote>
<hr width="50%">
<h2>Introduction</h2>
<p align="left">PropList.txt contains extended properties that supplement the
General Category property described in UnicodeData.html. Unlike the derived
properties, the properties in PropList.txt cannot be derived directly from
UnicodeData.txt or other data files of the UCD. These properties are listed in
the following table.</p>
<div align="center">
<center>
<table border="1" cellspacing="0" cellpadding="3" class="smallText">
<tr>
<th>Property Value</th>
<th>Status (N = normative, I = informative)</th>
<th>Definition and Usage</th>
</tr>
<tr>
<th valign="top">White_space</th>
<th valign="top">N</th>
<td valign="top">Space characters and those format control characters
(such as TAB, CR and LF) which should be treated by programming
languages as "white space" for the purpose of parsing
elements.
<p><b>Note:</b> ZERO WIDTH SPACE and ZERO WIDTH NO-BREAK SPACE are not
included, since their functions are restricted to line-break control.
Their names are unfortunately misleading in this respect.</p>
<p><b>Note: </b>There are other senses of "whitespace" that
encompass a different set of characters.</p>
</td>
</tr>
<tr>
<th valign="top">Bidi_Control</th>
<th valign="top">N</th>
<td valign="top">Those format control characters which have specific
functions in the Bidirectional Algorithm.</td>
</tr>
<tr>
<th valign="top">Join_Control</th>
<th valign="top">N</th>
<td valign="top">Those format control characters which have specific
functions for control of cursive joining and ligation.</td>
</tr>
<tr>
<th valign="top">Dash</th>
<th valign="top">I</th>
<td valign="top">Those punctuation characters explicitly called out as
dashes in the Unicode Standard, plus compatibility equivalents to those.
Most of these have the Pd General Category, but some have the Sm General
Category because of their use in mathematics.</td>
</tr>
<tr>
<th valign="top">Hyphen</th>
<th valign="top">I</th>
<td valign="top">Those dashes used to mark connections between pieces of
words, plus the Katakana middle dot. The Katakana middle dot functions
like a hyphen, but is shaped like a dot rather than a dash.</td>
</tr>
<tr>
<th valign="top">Quotation_Mark</th>
<th valign="top">I</th>
<td valign="top">Those punctuation characters that function as quotation
marks.</td>
</tr>
<tr>
<th valign="top">Terminal_Punctuation</th>
<th valign="top">I</th>
<td valign="top">Those punctuation characters that generally mark the end
of textual units.</td>
</tr>
<tr>
<th valign="top">Other_Math</th>
<th valign="top">I</th>
<td valign="top">Math characters that do not have the Sm General Category.</td>
</tr>
<tr>
<th valign="top">Hex_Digit</th>
<th valign="top">I</th>
<td valign="top">Characters commonly used for the representation of
hexadecimal numbers, plus their compatibility equivalents.</td>
</tr>
<tr>
<th valign="top">Other_Alphabetic</th>
<th valign="top">I</th>
<td valign="top">Alphabetic characters whose General Category is not in the
major class L (that is, not Lu, Ll, Lt, Lm, or Lo).</td>
</tr>
<tr>
<th valign="top">Ideographic</th>
<th valign="top">I</th>
<td valign="top">Characters considered to be CJKV (Chinese, Japanese,
Korean, and Vietnamese) ideographs.</td>
</tr>
<tr>
<th valign="top">Diacritic</th>
<th valign="top">I</th>
<td valign="top">Characters that linguistically modify the meaning of
another character to which they apply. Some diacritics are not combining
characters, and some combining characters are not diacritics.</td>
</tr>
<tr>
<th valign="top">Extender</th>
<th valign="top">I</th>
<td valign="top">Characters whose principal function is to extend the
value or shape of a preceding alphabetic character. Typical of these are
length and iteration marks.</td>
</tr>
<tr>
<th valign="top">Other_Lowercase</th>
<th valign="top">I</th>
<td valign="top">Lowercase characters that do not have the Ll General
Category.</td>
</tr>
<tr>
<th valign="top">Other_Uppercase</th>
<th valign="top">I</th>
<td valign="top">Uppercase characters that do not have the Lu General
Category.</td>
</tr>
<tr>
<th valign="top">Noncharacter_Code_Point</th>
<th valign="top">N</th>
<td valign="top">Code points that are explicitly defined as illegal for
the encoding of characters. See <a
href="http://www.unicode.org/unicode/reports/tr27/">Unicode 3.1</a> for
more information.</td>
</tr>
</table>
</center>
</div>
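<p align="left">As an illustration of how these properties might be consumed,
the following minimal Python sketch loads property assignments and tests a code
point against them. It assumes that each data line has the shape
<code>XXXX..YYYY ; Property_Name # comment</code> (a single code point or a
range, a semicolon, then the property name); that layout is an assumption made
for this example and should be checked against the data file itself. The names
<code>parse_proplist</code>, <code>has_property</code> and <code>SAMPLE</code>
are hypothetical, and the sample lines are a short illustrative excerpt rather
than the real data.</p>

```python
import re
from bisect import bisect_right

# Assumed line shape: "0009..000D ; White_space # comment" or "0020 ; White_space".
LINE_RE = re.compile(r"^([0-9A-Fa-f]{4,6})(?:\.\.([0-9A-Fa-f]{4,6}))?\s*;\s*(\w+)")


def parse_proplist(lines):
    """Return {property_name: sorted list of (first, last) code point ranges}."""
    props = {}
    for raw in lines:
        text = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not text:
            continue
        match = LINE_RE.match(text)
        if match is None:
            continue  # skip anything that does not fit the assumed shape
        first = int(match.group(1), 16)
        last = int(match.group(2), 16) if match.group(2) else first
        props.setdefault(match.group(3), []).append((first, last))
    for ranges in props.values():
        ranges.sort()
    return props


def has_property(props, name, code_point):
    """True if code_point falls in one of the ranges recorded for name."""
    ranges = props.get(name, [])
    # Locate the last range whose start is <= code_point.
    index = bisect_right(ranges, (code_point, 0x10FFFF)) - 1
    return index >= 0 and ranges[index][0] <= code_point <= ranges[index][1]


# Illustrative excerpt only; not a copy of the real data file.
SAMPLE = """\
# sample lines in the assumed format
0009..000D    ; White_space
0020          ; White_space
200C..200D    ; Join_Control
FDD0..FDEF    ; Noncharacter_Code_Point
""".splitlines()

props = parse_proplist(SAMPLE)
print(has_property(props, "White_space", 0x000A))  # LINE FEED: prints True
print(has_property(props, "White_space", 0x200B))  # ZERO WIDTH SPACE: prints False
```

<p align="left">Keeping each property as a sorted list of ranges allows a
binary search per lookup. A program that needs many lookups over a small
repertoire might prefer to expand the ranges into a set instead.</p>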
<h2><i><a name="UCD_Terms"><br>
UCD Terms of Use</a></i></h2>
<h3><i>Disclaimer</i></h3>
<blockquote>
<p><i>The Unicode Character Database is provided as is by Unicode, Inc. No
claims are made as to fitness for any particular purpose. No warranties of any
kind are expressed or implied. The recipient agrees to determine applicability
of information provided. If this file has been purchased on magnetic or
optical media from Unicode, Inc., the sole remedy for any claim will be
exchange of defective media within 90 days of receipt.</i></p>
<p><i>This disclaimer is applicable for all other data files accompanying the
Unicode Character Database, some of which have been compiled by the Unicode
Consortium, and some of which have been supplied by other sources.</i></p>
</blockquote>
<h3><i>Limitations on Rights to Redistribute This Data</i></h3>
<blockquote>
<p><i>Recipient is granted the right to make copies in any form for internal
distribution and to freely use the information supplied in the creation of
products supporting the Unicode<sup>TM</sup> Standard. The files in the
Unicode Character Database can be redistributed to third parties or other
organizations (whether for profit or not) as long as this notice and the
disclaimer notice are retained. Information can be extracted from these files
and used in documentation or programs, as long as there is an accompanying
notice indicating the source.</i></p>
</blockquote>
<hr width="50%">
<p align="center"><a href="http://www.unicode.org/unicode/copyright.html"><img
src="http://www.unicode.org/img/hb_home.gif" border="0" alt="Home" width="40"
height="49"><img src="http://www.unicode.org/img/hb_mid.gif" border="0"
alt="Terms of Use" width="152" height="49"><img
src="http://www.unicode.org/img/hb_mail.gif" border="0" alt="E-mail" width="46"
height="49"></a>
</body>
</html>