Unicode::UCD: Pod corrections, clarifications

author: Karl Williamson <khw@cpan.org> 2015-02-14 09:34:39 -0700
committer: Karl Williamson <khw@cpan.org> 2015-02-18 12:51:34 -0700
commit: 91e7847033aefac6cf5b500a5c72811c8d6b8fbc (patch)
tree: d2b1242b00e5f8ee11d717a89698f8e3bda751b7 /lib/Unicode/UCD.pm
parent: fc1bb3f2dcaaac9f305b27acb4800babdc8a06f3 (diff)
download: perl-91e7847033aefac6cf5b500a5c72811c8d6b8fbc.tar.gz
1 files changed, 35 insertions, 26 deletions
diff --git a/lib/Unicode/UCD.pm b/lib/Unicode/UCD.pm
index 7e2ccf6a09..5d3115dcfe 100644
--- a/lib/Unicode/UCD.pm
+++ b/lib/Unicode/UCD.pm
@@ -111,7 +111,8 @@ Character Database.
 
 Some of the functions are called with a I<code point argument>, which is either
 a decimal or a hexadecimal scalar designating a code point in the platform's
-native character set (extended to Unicode), or C<U+> followed by hexadecimals
+native character set (extended to Unicode), or a string containing C<U+>
+followed by hexadecimals
 designating a Unicode code point.  A leading 0 will force a hexadecimal
 interpretation, as will a hexadecimal digit that isn't a decimal digit.
 
@@ -120,7 +121,7 @@ Examples:
     223     # Decimal 223 in native character set
     0223    # Hexadecimal 223, native (= 547 decimal)
     0xDF    # Hexadecimal DF, native (= 223 decimal)
-    U+DF    # Hexadecimal DF, in Unicode's character set
+    'U+DF'  # Hexadecimal DF, in Unicode's character set
                               (= LATIN SMALL LETTER SHARP S)
 
 Note that the largest code point in Unicode is U+10FFFF.
@@ -284,30 +285,30 @@ As of Unicode 6.0, this is always empty.
 
 =item B<upper>
 
-is empty if there is no single code point uppercase mapping for I<code>
-(its uppercase mapping is itself);
-otherwise it is that mapping expressed as at least four hexdigits.
-(L</casespec()> should be used in addition to B<charinfo()>
-for case mappings when the calling program can cope with multiple code point
-mappings.)
+is, if non-empty, the uppercase mapping for I<code> expressed as at least four
+hexdigits.  This indicates that the full uppercase mapping is a single
+character, and is identical to the simple (single-character only) mapping.
+When this field is empty, it means that the simple uppercase mapping is
+I<code> itself; you'll need some other means (like
+L</casespec()>) to get the full mapping.
 
 =item B<lower>
 
-is empty if there is no single code point lowercase mapping for I<code>
-(its lowercase mapping is itself);
-otherwise it is that mapping expressed as at least four hexdigits.
-(L</casespec()> should be used in addition to B<charinfo()>
-for case mappings when the calling program can cope with multiple code point
-mappings.)
+is, if non-empty, the lowercase mapping for I<code> expressed as at least four
+hexdigits.  This indicates that the full lowercase mapping is a single
+character, and is identical to the simple (single-character only) mapping.
+When this field is empty, it means that the simple lowercase mapping is
+I<code> itself; you'll need some other means (like
+L</casespec()>) to get the full mapping.
 
 =item B<title>
 
-is empty if there is no single code point titlecase mapping for I<code>
-(its titlecase mapping is itself);
-otherwise it is that mapping expressed as at least four hexdigits.
-(L</casespec()> should be used in addition to B<charinfo()>
-for case mappings when the calling program can cope with multiple code point
-mappings.)
+is, if non-empty, the titlecase mapping for I<code> expressed as at least four
+hexdigits.  This indicates that the full titlecase mapping is a single
+character, and is identical to the simple (single-character only) mapping.
+When this field is empty, it means that the simple titlecase mapping is
+I<code> itself; you'll need some other means (like
+L</casespec()>) to get the full mapping.
 
 =item B<block>
 
@@ -2070,10 +2071,10 @@ are only a few dozen possible General Categories.
 
 You can use L</prop_values()> to find out if a given property is one which has
 a restricted set of values, and if so, what those values are.  But usually
-each value actually has several synonyms.  For example, in binary properties,
-I<truth> can be represented by any of the strings "Y", "Yes", "T", or "True";
-and the General Category "Punctuation" by that string, or "Punct", or simply
-"P".
+each value actually has several synonyms.  For example, in Unicode binary
+properties, I<truth> can be represented by any of the strings "Y", "Yes", "T",
+or "True"; and the General Category "Punctuation" by that string, or "Punct",
+or simply "P".
 
 Like property names, there is typically at least a short name for each such
 property-value, and a long name.  If you know any name of the property-value
@@ -2097,7 +2098,7 @@ C<undef>.
 
 If called with a property that doesn't have synonyms for its values, it
 returns the input value, possibly normalized with capitalization and
-underscores.
+underscores, but not necessarily checking that the input value is valid.
 
 For the block property, new-style block names are returned (see
 L</Old-style versus new-style block names>).
@@ -2890,6 +2891,14 @@ Use L</casefold()> for these.
 C<prop_invmap> does not know about any user-defined properties, and will
 return C<undef> if called with one of those.
 
+The returned values for the Perl extension properties, such as C<Any> and
+C<Greek>, are somewhat misleading.  The values are either C<"Y"> or C<"N">.
+All Unicode properties are bipartite, so you can actually use the C<"Y"> or
+C<"N"> in a Perl regular expression for these, like C<qr/\p{ID_Start=Y}/> or
+C<qr/\p{Upper=N}/>.  But the Perl extensions aren't specified this way, only
+like C<qr/\p{Any}/>, I<etc>.  You can't actually use the C<"Y"> and C<"N"> in
+them.
+
 =cut
 
 # User-defined properties could be handled with some changes to utf8_heavy.pl;
@@ -3794,7 +3803,7 @@ as C<Basic Latin>, C<Latin 1 Supplement>, C<Latin Extended-A>, and
 C<Latin Extended-B>.  On the other hand, the Latin script does not
 contain all the characters of the C<Basic Latin> block (also known as
 ASCII): it includes only the letters, and not, for example, the digits
-or the punctuation.
+nor the punctuation.
 
 For blocks see L<http://www.unicode.org/Public/UNIDATA/Blocks.txt>
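
The reworded B<upper>/B<lower>/B<title> entries can be illustrated with a short script. This is only a sketch, not part of the commit: it assumes an ASCII platform and a perl whose Unicode::UCD exports charinfo() and casespec(), and it uses U+00DF because that is the code point the patch's own example names. The variable names are illustrative.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charinfo casespec);

# The code point argument is passed as the string 'U+DF', matching the
# corrected example in the Pod (a bare U+DF is not valid Perl).
my $info = charinfo('U+DF');    # LATIN SMALL LETTER SHARP S

# charinfo() only carries the simple (single-character) mapping.  For
# U+00DF the field is empty, meaning the simple uppercase mapping is the
# code point itself.
print "charinfo upper: '$info->{upper}'\n";

# casespec() is one of the "other means" the Pod mentions for obtaining
# the full mapping; its fields are space-separated hex code points.
my $spec = casespec('U+DF');
print "casespec upper: '$spec->{upper}'\n";    # full mapping: 0053 0053 ("SS")
```

An empty B<upper> field is therefore not evidence that uppercasing leaves the character unchanged; here the full mapping expands to two code points.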