Diffstat (limited to 'lib/Unicode/UCD.pm')
-rw-r--r-- lib/Unicode/UCD.pm 61
1 files changed, 51 insertions, 10 deletions
diff --git a/lib/Unicode/UCD.pm b/lib/Unicode/UCD.pm
index 074284f5fb..a1f16a99ff 100644
--- a/lib/Unicode/UCD.pm
+++ b/lib/Unicode/UCD.pm
@@ -2252,20 +2252,56 @@ Devanagari, Gurmukhi, and Oriya scripts.
The Name_Alias property is of this form. But each scalar consists of two
components: 1) the name, and 2) the type of alias this is. They are
-separated by a colon and a space. In Unicode 6.0, there are two alias types:
-C<"correction">, which indicates that the name is a corrected form for the
-original name (which remains valid) for the same code point; and C<"control">,
-which adds a new name for a control character.
+separated by a colon and a space. In Unicode 6.1, there are several alias types:
+
+=over
+
+=item C<correction>
+
+indicates that the name is a corrected form for the
+original name (which remains valid) for the same code point.
+
+=item C<control>
+
+adds a new name for a control character.
+
+=item C<alternate>
+
+is an alternate name for a character.
+
+=item C<figment>
+
+is a name for a character that has been documented but was never in any
+actual standard.
+
+=item C<abbreviation>
+
+is a common abbreviation for a character.
+
+=back
+
+The lists are ordered (roughly) so the most preferred names come before less
+preferred ones.
For example,
- @aliases_ranges        @alias_maps
+ @aliases_ranges        @alias_maps
+    ...
+    0x009E        [ 'PRIVACY MESSAGE: control', 'PM: abbreviation' ]
+    0x009F        [ 'APPLICATION PROGRAM COMMAND: control',
+                    'APC: abbreviation'
+                  ]
+    0x00A0        'NBSP: abbreviation'
+    0x00A1        ""
+    0x00AD        'SHY: abbreviation'
+    0x00AE        ""
+    0x01A2        'LATIN CAPITAL LETTER GHA: correction'
+    0x01A3        'LATIN SMALL LETTER GHA: correction'
+    0x01A4        ""
     ...
-    0x01A2        LATIN CAPITAL LETTER GHA: correction
-    0x01A3        LATIN SMALL LETTER GHA: correction
-Unicode 6.1 will introduce other types, and some map entries will be lists of
-multiple name-alias pairs for a single code point.
+A map to the empty string means that there is no alias defined for the code
+point.
=item C<r>
@@ -2409,7 +2445,9 @@ the function L<charnames/charnames::viacode(code)>.
Note that for control characters (C<Gc=cc>), Unicode's data files have the
string "C<E<lt>controlE<gt>>", but the real name of each of these characters is the empty
-string. This function returns that real name, the empty string.
+string. This function returns that real name, the empty string. (There are
+names for these characters, but they are aliases, not the real name, and are
+contained in the C<Name_Alias> property.)
=item C<d>
@@ -3179,6 +3217,9 @@ To convert from new-style to old-style, follow this recipe:
gets the lower end of the range (0th element) and then looks up the old name
for its block using C<charblock>).
+Note that starting in Unicode 6.1, many of the block names have shorter
+synonyms. These are always given in the new style.
+
=head1 BUGS
Does not yet support EBCDIC platforms.
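
The new Name_Alias documentation in the first hunk says each map entry is either the empty string, a single "NAME: type" scalar, or a list of such scalars. A minimal sketch of how a caller might unpack those three shapes follows. The helper C<split_alias_entry> is hypothetical (it is not part of Unicode::UCD), and the sample data is copied from the documentation's example rather than fetched from the module, so the code does not depend on any particular Unicode version.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Unicode::UCD: split one Name_Alias map
# entry into [name, type] pairs.  An entry may be the empty string (no
# alias), a single "NAME: type" scalar, or a reference to an array of
# such scalars.
sub split_alias_entry {
    my ($entry) = @_;
    my @scalars = ref($entry) eq 'ARRAY' ? @$entry : ($entry);
    my @pairs;
    for my $scalar (@scalars) {
        next if $scalar eq "";    # empty string: no alias for this code point
        # Name and type are separated by a colon and a space.
        my ($name, $type) = split /: /, $scalar, 2;
        push @pairs, [ $name, $type ];
    }
    return @pairs;
}

# Sample entries copied from the documentation's example.
my %sample = (
    0x009E => [ 'PRIVACY MESSAGE: control', 'PM: abbreviation' ],
    0x00A0 => 'NBSP: abbreviation',
    0x00A1 => "",
    0x01A2 => 'LATIN CAPITAL LETTER GHA: correction',
);

for my $cp (sort { $a <=> $b } keys %sample) {
    my @pairs = split_alias_entry($sample{$cp});
    printf "U+%04X: %s\n", $cp,
        @pairs
        ? join(", ", map { "$_->[0] ($_->[1])" } @pairs)
        : "(no alias)";
}
```

Because the lists are ordered roughly by preference, the first pair returned for a code point would be the most preferred alias under this sketch.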