author Jarkko Hietaniemi <jhi@iki.fi> 2001-09-02 10:32:30 +0000
committer Jarkko Hietaniemi <jhi@iki.fi> 2001-09-02 10:32:30 +0000
commit 78bf21c2b5382395f2e75b313393c17f529af2e0
tree 68b7dce19ca1e54c363864957c07bcb67241fea1
parent 53c4c00cd908b83921217c52fa633bcfdd89f0fb
Slight doc tweaks for the module.
p4raw-id: //depot/perl@11824
Diffstat (limited to 'lib/Unicode')
-rw-r--r-- lib/Unicode/UCD.pm 40
1 files changed, 27 insertions, 13 deletions
diff --git a/lib/Unicode/UCD.pm b/lib/Unicode/UCD.pm
index 3ce4a95670..d4525ccf2b 100644
--- a/lib/Unicode/UCD.pm
+++ b/lib/Unicode/UCD.pm
@@ -114,8 +114,8 @@ If no match is found, a reference to an empty hash is returned.
The C<block> property is the same as as returned by charinfo(). It is
not defined in the Unicode Character Database proper (Chapter 4 of the
-Unicode 3.0 Standard) but instead in an auxiliary database (Chapter 14
-of TUS3). Similarly for the C<script> property.
+Unicode 3.0 Standard, aka TUS3) but instead in an auxiliary database
+(Chapter 14 of TUS3). Similarly for the C<script> property.
Note that you cannot do (de)composition and casing based solely on the
above C<decomposition> and C<lower>, C<upper>, C<title>, properties,
@@ -327,12 +327,14 @@ sub charinrange {
my $charblock = charblock("0x263a");
my $charblock = charblock("U+263a");
- my $ranges = charblock('Armenian');
+ my $range = charblock('Armenian');
-With a B<code point argument> charblock() returns the block the character
+With a B<code point argument> charblock() returns the I<block> the character
belongs to, e.g. C<Basic Latin>. Note that not all the character
positions within all blocks are defined.
+See also L</Blocks versus Scripts>.
+
If supplied with an argument that can't be a code point, charblock()
tries to do the opposite and interpret the argument as a character
block. The return value is a I<range>: an anonymous list that
@@ -388,11 +390,13 @@ sub charblock {
my $charscript = charscript(1234);
my $charscript = charscript("U+263a");
- my $ranges = charscript('Thai');
+ my $range = charscript('Thai');
-With a B<code point argument> charscript() returns the script the
+With a B<code point argument> charscript() returns the I<script> the
character belongs to, e.g. C<Latin>, C<Greek>, C<Han>.
+See also L</Blocks versus Scripts>.
+
If supplied with an argument that can't be a code point, charscript()
tries to do the opposite and interpret the argument as a character
script. The return value is a I<range>: an anonymous list that
@@ -452,6 +456,8 @@ sub charscript {
charblocks() returns a reference to a hash with the known block names
as the keys, and the code point ranges (see L</charblock>) as the values.
+See also L</Blocks versus Scripts>.
+
=cut
sub charblocks {
@@ -468,6 +474,8 @@ sub charblocks {
charscripts() returns a hash with the known script names as the keys,
and the code point ranges (see L</charscript>) as the values.
+See also L</Blocks versus Scripts>.
+
=cut
sub charscripts {
@@ -503,14 +511,18 @@ C<\p{InCyrillic}>, C<\P{InBasicLatin}>. Spaces and dashes ('-') are
removed from the names for the C<\p{In...}>, for example
C<LatinExtendedA> instead of C<Latin Extended-A>.
-There are a few cases where there exists both a script and a block by
-the same name, in these cases the block version has C<Block> appended:
-C<\p{InKatakana}> is the script, C<\p{InKatakanaBlock}> is the block.
+There are a few cases where there is both a script and a block by the
+same name, in these cases the block version has C<Block> appended to
+its name: C<\p{InKatakana}> is the script, C<\p{InKatakanaBlock}> is
+the block.
=head2 Code Point Arguments
-A <code point argument> is either a decimal or a hexadecimal scalar,
-or "U+" followed by hexadecimals.
+A <code point argument> is either a decimal or a hexadecimal scalar
+designating a Unicode character, or "U+" followed by hexadecimals
+designating a Unicode character. Note that Unicode is B<not> limited
+to 16 bits (the number of Unicode characters is open-ended, in theory
+unlimited): you may have more than 4 hexdigits.
=head2 charinrange
@@ -721,7 +733,8 @@ sub casespec {
Unicode::UCD::UnicodeVersion() returns the version of the Unicode
Character Database, in other words, the version of the Unicode
-standard the database implements.
+standard the database implements. The version is a string
+of numbers delimited by dots (C<'.'>).
=cut
@@ -742,7 +755,8 @@ sub UnicodeVersion {
The first use of charinfo() opens a read-only filehandle to the Unicode
Character Database (the database is included in the Perl distribution).
-The filehandle is then kept open for further queries.
+The filehandle is then kept open for further queries. In other words,
+if you are wondering where one of your filehandles went, that's where.
=head1 AUTHOR