author Karl Williamson <khw@cpan.org> 2014-04-23 13:51:48 -0600
committer Karl Williamson <khw@cpan.org> 2014-04-23 17:08:08 -0600
commit 75200dff8561a9c5d6eaa86a0ac75874bf13282b
tree 1b821bbc87435dd0afac2cd0ef1d981f7d53551b /utf8.c
parent 70d95cc994e515d77711476f6c853c3f1f1f1458
perlapi: Clarify NUL handling for 2 fcns; nits
The string input to these two functions must be NUL terminated when the length parameter is 0.
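A minimal sketch of why the length-0 convention needs this caveat. The helper `all_ascii` below is hypothetical and is not Perl's implementation; it only mimics the documented "if len is 0, use strlen(s)" behaviour in plain C, so it can be compiled without the Perl headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in, NOT the real is_ascii_string(): it copies only
 * the "len == 0 means compute the length with strlen(s)" convention. */
static bool
all_ascii(const unsigned char *s, size_t len)
{
    assert(s != NULL);

    /* strlen() stops at the first NUL byte, so this path requires a
     * terminating NUL and cannot see past an embedded one. */
    if (len == 0)
        len = strlen((const char *) s);

    for (size_t i = 0; i < len; i++) {
        if (s[i] > 0x7F)        /* byte outside the ASCII range */
            return false;
    }
    return true;
}
```

Given the four bytes `"ab\0\xE9"`, calling `all_ascii(buf, 0)` returns true, because `strlen` measures only the two bytes before the embedded NUL, while `all_ascii(buf, 4)` returns false because the explicit length covers the 0xE9 byte.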
Diffstat (limited to 'utf8.c')
-rw-r--r-- utf8.c 18
1 files changed, 10 insertions, 8 deletions
diff --git a/utf8.c b/utf8.c
index fa5b4a7323..dab538789a 100644
--- a/utf8.c
+++ b/utf8.c
@@ -57,7 +57,9 @@ or not the string is encoded in UTF-8 (or UTF-EBCDIC on EBCDIC machines). That
is, if they are invariant. On ASCII-ish machines, only ASCII characters
fit this definition, hence the function's name.
-If C<len> is 0, it will be calculated using C<strlen(s)>.
+If C<len> is 0, it will be calculated using C<strlen(s)>, (which means if you
+use this option, that C<s> can't have embedded C<NUL> characters and has to
+have a terminating C<NUL> byte).
See also L</is_utf8_string>(), L</is_utf8_string_loclen>(), and L</is_utf8_string_loc>().
@@ -401,9 +403,9 @@ Perl_is_utf8_char(const U8 *s)
Returns true if the first C<len> bytes of string C<s> form a valid
UTF-8 string, false otherwise. If C<len> is 0, it will be calculated
-using C<strlen(s)> (which means if you use this option, that C<s> has to have a
-terminating NUL byte). Note that all characters being ASCII constitute 'a
-valid UTF-8 string'.
+using C<strlen(s)> (which means if you use this option, that C<s> can't have
+embedded C<NUL> characters and has to have a terminating C<NUL> byte). Note
+that all characters being ASCII constitute 'a valid UTF-8 string'.
See also L</is_ascii_string>(), L</is_utf8_string_loclen>(), and L</is_utf8_string_loc>().
@@ -548,11 +550,11 @@ flags) malformation is found. If this flag is set, the routine assumes that
the caller will raise a warning, and this function will silently just set
C<retlen> to C<-1> (cast to C<STRLEN>) and return zero.
-Note that this API requires disambiguation between successful decoding a NUL
+Note that this API requires disambiguation between successful decoding a C<NUL>
character, and an error return (unless the UTF8_CHECK_ONLY flag is set), as
in both cases, 0 is returned. To disambiguate, upon a zero return, see if the
-first byte of C<s> is 0 as well. If so, the input was a NUL; if not, the input
-had an error.
+first byte of C<s> is 0 as well. If so, the input was a C<NUL>; if not, the
+input had an error.
Certain code points are considered problematic. These are Unicode surrogates,
Unicode non-characters, and code points above the Unicode maximum of 0x10FFFF.
@@ -1400,7 +1402,7 @@ UTF-8.
Returns a pointer to the newly-created string, and sets C<len> to
reflect the new length in bytes.
-A NUL character will be written after the end of the string.
+A C<NUL> character will be written after the end of the string.
If you want to convert to UTF-8 from encodings other than
the native (Latin1 or EBCDIC),