Better document the difference between a block and a script.

p4raw-id: //depot/perl@11131
author: Jarkko Hietaniemi <jhi@iki.fi> 2001-07-03 23:02:02 +0000
committer: Jarkko Hietaniemi <jhi@iki.fi> 2001-07-03 23:02:02 +0000
commit: ad9cab3708f3a6aff28b5c1ca3a390c013235283 (patch)
tree: 080d5152748296c3c6decee34699748f96f3b5d3 /lib
parent: 16703a004678038faba1eda656251a1ad71e30db (diff)
download: perl-ad9cab3708f3a6aff28b5c1ca3a390c013235283.tar.gz
1 files changed, 24 insertions, 13 deletions
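The block/script distinction that this commit documents can be seen directly by asking Unicode::UCD about a few code points. The following is a minimal sketch. It assumes a Perl whose Unicode::UCD exports charblock() and charscript(). The spelling of the returned names has varied between releases: this commit's text describes uppercase script names such as HEBREW, while later releases return forms such as Latin. For that reason, any script-name comparison should ignore case. The list of code points is only an illustrative choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Unicode::UCD qw(charblock charscript);

# Illustrative code points: the Latin *script* spans several *blocks*
# (Basic Latin, Latin-1 Supplement, Latin Extended-A, Latin Extended-B),
# while the Basic Latin *block* also holds non-Latin-script characters
# such as the digit '0' (U+0030) and '!' (U+0021).
my @code_points = (0x41, 0x30, 0x21, 0xE9, 0x100, 0x180);

for my $cp (@code_points) {
    my $block  = charblock($cp);
    my $script = charscript($cp);
    # Depending on the release, charscript() may return undef or a name
    # such as "Common" for characters that belong to no specific script.
    printf "U+%04X  block: %-20s  script: %s\n",
        $cp, $block, defined $script ? $script : '(none)';
}
```

The output should show letters from four different blocks all reporting the Latin script, and the digit and punctuation mark sharing the Basic Latin block without being Latin script.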
diff --git a/lib/Unicode/UCD.pm b/lib/Unicode/UCD.pm
index 81a9aed348..4e310e7c1c 100644
--- a/lib/Unicode/UCD.pm
+++ b/lib/Unicode/UCD.pm
@@ -194,17 +194,7 @@ sub charblock {
     my $charscript = charscript(0x41);
 
 charscript() returns the script the character belongs to, e.g.
-C<Latin>, C<Greek>, C<Han>.  Note that not all the character positions
-within all scripts are defined.  
-
-The difference between a character block and a script is that script
-names are closer to the linguistic notion of a set of characters,
-while block is more of an artifact of the Unicode character numbering.
-For example the Latin B<script> is spread over several B<blocks>.
-
-Note also that the script names are all in uppercase, e.g. C<HEBREW>,
-while the block names are Capitalized and with intermixed spaces,
-e.g. C<Yi Syllables>.
+C<Latin>, C<Greek>, C<Han>.
 
 Unfortunately, currently (Perl 5.8.0) there is no regular expression
 notation for matching scripts as there is for blocks (C<\p{In...}>.
@@ -231,10 +221,31 @@ sub charscript {
     _search(\@SCRIPTS, 0, $#SCRIPTS, $code);
 }
 
+=head2 charblock versus charscript
+
+The difference between a character block and a script is that scripts
+are closer to the linguistic notion of a set of characters required to
+present languages, while block is more of an artifact of the Unicode
+character numbering.  For example the Latin B<script> is spread over
+several B<blocks>, such as C<Basic Latin>, C<Latin 1 Supplement>,
+C<Latin Extended-A>, and C<Latin Extended-B>.  On the other hand, the
+Latin script does not contain all the characters of the C<Basic Latin>
+block (also known as the ASCII): it includes only the letters, not for
+example the digits or the punctuation.
+
+For block see http://www.unicode.org/Public/UNIDATA/Blocks.txt
+
+For scripts see UTR #24: http://www.unicode.org/unicode/reports/tr24/
+
+Note also that the script names are all in uppercase, e.g. C<HEBREW>,
+while the block names are Capitalized and with intermixed spaces,
+e.g. C<Yi Syllables>.
+
 =head1 IMPLEMENTATION NOTE
 
-The first use of L<charinfo> opens a read-only filehandle to the Unicode
-Character Database.  The filehandle is kept open for further queries.
+The first use of charinfo() opens a read-only filehandle to the Unicode
+Character Database (the database is included in the Perl distribution).
+The filehandle is then kept open for further queries.
 
 =head1 AUTHOR
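The implementation note describes a lazily opened, cached filehandle. The first charinfo() call opens the database, and later calls reuse the same handle. The sketch below illustrates only that general pattern and is not Unicode::UCD's actual code. The helper names `_db_handle` and `lookup_name`, the `$open_count` counter, and the one-line temporary file standing in for the Unicode Character Database are all hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for the Unicode Character Database file.
my ($tmp_fh, $db_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "0041;LATIN CAPITAL LETTER A;Lu\n";
close $tmp_fh or die "close: $!";

my $db_fh;          # cached read-only handle, opened on first use
my $open_count = 0; # hypothetical counter, only to show the handle is reused

# Hypothetical helper: open the database once, then return the same handle.
sub _db_handle {
    unless (defined $db_fh) {
        open($db_fh, '<', $db_path) or die "Cannot open $db_path: $!";
        $open_count++;
    }
    return $db_fh;
}

# Hypothetical lookup: rewind the shared handle and scan for a code point.
sub lookup_name {
    my ($code) = @_;
    my $fh  = _db_handle();
    my $hex = sprintf '%04X', $code;
    seek($fh, 0, 0) or die "seek: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($cp, $name) = split /;/, $line;
        return $name if $cp eq $hex;
    }
    return;    # not found
}

my $name    = lookup_name(0x41);
my $missing = lookup_name(0x42);
print "U+0041: $name\n";
print defined $missing ? "U+0042: found\n" : "U+0042: not found\n";
print "database opened $open_count time(s)\n";
```

Keeping the handle open trades one open file descriptor for avoiding a reopen on every query. Rewinding with seek() lets each lookup start from the top of the shared handle.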