author Karl Williamson <public@khwilliamson.com> 2012-04-18 16:35:39 -0600
committer Karl Williamson <public@khwilliamson.com> 2012-04-26 11:58:56 -0600
commit 746afd533cc96b75c8a3c821291822f0c0ce7e2a (patch)
tree a55cc43c53ca279f4c31e2dfe497e7e923c8c306 /utf8.c
parent 99ee1dcd0469086e91a96e31a9b9ea27bb7f0c7e (diff)
utf8.c: Clarify and correct pod
Some of these were spotted by Hugo van der Sanden
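The first hunk of the patch clarifies that C<*retlen> is written only when C<retlen> isn't NULL. The following is a minimal sketch of that calling convention, not Perl's implementation: `decode_first_char` is a hypothetical name, it handles only 1 to 4 byte sequences, it does not reject overlong forms, and it mimics the documented "set retlen to -1 and return zero" behavior on malformed input.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT Perl's utf8n_to_uvuni: decode the first UTF-8
 * character of s, reading at most curlen bytes. As in the pod, *retlen is
 * written only when retlen is non-NULL. Overlong forms are not rejected. */
static uint32_t decode_first_char(const unsigned char *s, size_t curlen,
                                  size_t *retlen)
{
    size_t len;
    uint32_t cp;

    if (curlen == 0)
        goto malformed;

    if (s[0] < 0x80) {                    /* 0xxxxxxx: ASCII */
        len = 1; cp = s[0];
    } else if ((s[0] & 0xE0) == 0xC0) {   /* 110xxxxx: 2-byte start */
        len = 2; cp = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {   /* 1110xxxx: 3-byte start */
        len = 3; cp = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {   /* 11110xxx: 4-byte start */
        len = 4; cp = s[0] & 0x07;
    } else {
        goto malformed;   /* stray continuation byte or unsupported start byte */
    }

    if (len > curlen)
        goto malformed;   /* sequence would extend past curlen bytes */

    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            goto malformed;               /* expected 10xxxxxx */
        cp = (cp << 6) | (s[i] & 0x3F);   /* append 6 payload bits */
    }

    if (retlen != NULL)
        *retlen = len;    /* length in bytes of the decoded character */
    return cp;

malformed:
    if (retlen != NULL)
        *retlen = (size_t)-1;
    return 0;
}
```

Passing NULL for `retlen` is then simply a way to say "I only want the code point".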
Diffstat (limited to 'utf8.c')
-rw-r--r-- utf8.c 12
1 files changed, 6 insertions, 6 deletions
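The later hunks fix the Unicode maximum from 0x10FFF to 0x10FFFF. As an illustration of the problematic classes the pod names (surrogates, non-characters, above-Unicode, and very large code points above 0x7FFF_FFFF), here is a minimal sketch; `classify_code_point` and `enum cp_class` are hypothetical and are not part of Perl's API or its flag handling.

```c
#include <stdint.h>

/* Hypothetical classification, NOT Perl's API, of the code point
 * classes described in the utf8n_to_uvuni pod. */
enum cp_class {
    CP_REGULAR,
    CP_SURROGATE,     /* U+D800..U+DFFF */
    CP_NONCHAR,       /* U+FDD0..U+FDEF, plus U+xxFFFE/U+xxFFFF in each plane */
    CP_ABOVE_UNICODE, /* 0x110000..0x7FFFFFFF: above the 0x10FFFF maximum */
    CP_VERY_LARGE     /* above 0x7FFFFFFF; per the pod, their UTF-8 on ASCII
                         platforms begins with a 0xFE or 0xFF byte */
};

static enum cp_class classify_code_point(uint64_t cp)
{
    if (cp > 0x7FFFFFFF)
        return CP_VERY_LARGE;
    if (cp > 0x10FFFF)
        return CP_ABOVE_UNICODE;
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return CP_SURROGATE;
    /* (cp & 0xFFFE) == 0xFFFE matches low 16 bits of 0xFFFE or 0xFFFF */
    if ((cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE)
        return CP_NONCHAR;
    return CP_REGULAR;
}
```

With the old typo, 0x10FFF (an ordinary code point in plane 1) would have been documented as the maximum; under the corrected limit, 0x10FFFF is the last legal code point and 0x110000 is the first above-Unicode one.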
diff --git a/utf8.c b/utf8.c
index 0eaee4ee33..7ddd9c75af 100644
--- a/utf8.c
+++ b/utf8.c
@@ -505,10 +505,10 @@ Perl_is_utf8_string_loclen(const U8 *s, STRLEN len, const U8 **ep, STRLEN *el)
=for apidoc utf8n_to_uvuni
Bottom level UTF-8 decode routine.
-Returns the code point value of the first character in the string C<s>
-which is assumed to be in UTF-8 (or UTF-EBCDIC) encoding and no longer than
-C<curlen> bytes; C<retlen> will be set to the length, in bytes, of that
-character.
+Returns the code point value of the first character in the string C<s>,
+which is assumed to be in UTF-8 (or UTF-EBCDIC) encoding, and no longer than
+C<curlen> bytes; C<*retlen> (if C<retlen> isn't NULL) will be set to
+the length, in bytes, of that character.
The value of C<flags> determines the behavior when C<s> does not point to a
well-formed UTF-8 character. If C<flags> is 0, when a malformation is found,
@@ -531,7 +531,7 @@ the caller will raise a warning, and this function will silently just set
C<retlen> to C<-1> and return zero.
Certain code points are considered problematic. These are Unicode surrogates,
-Unicode non-characters, and code points above the Unicode maximum of 0x10FFF.
+Unicode non-characters, and code points above the Unicode maximum of 0x10FFFF.
By default these are considered regular code points, but certain situations
warrant special handling for them. If C<flags> contains
UTF8_DISALLOW_ILLEGAL_INTERCHANGE, all three classes are treated as
@@ -551,7 +551,7 @@ Very large code points (above 0x7FFF_FFFF) are considered more problematic than
the others that are above the Unicode legal maximum. There are several
reasons: they do not fit into a 32-bit word, are not representable on EBCDIC
platforms, and the original UTF-8 specification never went above
-this number (the current 0x10FFF limit was imposed later). The UTF-8 encoding
+this number (the current 0x10FFFF limit was imposed later). The UTF-8 encoding
on ASCII platforms for these large code points begins with a byte containing
0xFE or 0xFF. The UTF8_DISALLOW_FE_FF flag will cause them to be treated as
malformations, while allowing smaller above-Unicode code points. (Of course