author: Karl Williamson <khw@cpan.org> 2018-08-04 13:19:58 -0600
committer: Karl Williamson <khw@cpan.org> 2018-08-05 05:52:48 -0600
commit: 00d976bbd170bbdc283618817b02b1b8f46bddd4 (patch)
tree: 2d22c0e7768b711e9a9f75a7729088871f0cecbf /utf8.c
parent: a4ee4fb50b9465c0ded09ff38f68516952970ec8 (diff)
perlapi: Fix up pod for utf8n_to_uvchr_error()
There are two return flags signalling that the input UTF-8 was malformed by being too short. This commit adds detail comparing and contrasting the meanings of the two.
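The distinction between the two flags can be illustrated with a small standalone sketch. This is not Perl's implementation and does not call utf8n_to_uvchr_error(); the names sketch_classify, sketch_expected_len, and the SKETCH_* constants are hypothetical, invented for illustration only. It assumes a simplified UTF-8 model that ignores overlongs, surrogates, and above-Unicode code points, and it reports a single result rather than a combination of flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch; these are NOT Perl's
 * UTF8_GOT_* constants, only stand-ins with a similar meaning. */
enum {
    SKETCH_OK               = 0,
    SKETCH_SHORT            = 1 << 0, /* ran out of bytes; valid as far as it went */
    SKETCH_NON_CONTINUATION = 1 << 1, /* a non-continuation byte inside the sequence */
    SKETCH_BAD_START        = 1 << 2  /* first byte cannot start a sequence */
};

/* Total sequence length implied by a start byte, or 0 if the byte
 * cannot start a sequence (simplified standard UTF-8 ranges). */
static size_t sketch_expected_len(unsigned char b)
{
    if (b < 0x80)               return 1;
    if (b >= 0xC2 && b <= 0xDF) return 2;
    if (b >= 0xE0 && b <= 0xEF) return 3;
    if (b >= 0xF0 && b <= 0xF4) return 4;
    return 0;
}

/* Classify the first character found in buf[0 .. curlen-1]. */
static int sketch_classify(const unsigned char *buf, size_t curlen)
{
    assert(buf != NULL);
    assert(curlen > 0);

    size_t expected = sketch_expected_len(buf[0]);
    if (expected == 0)
        return SKETCH_BAD_START;

    /* Examine only the bytes that are both needed and available. */
    size_t avail = curlen < expected ? curlen : expected;
    for (size_t i = 1; i < avail; i++) {
        if ((buf[i] & 0xC0) != 0x80)
            return SKETCH_NON_CONTINUATION; /* always an error */
    }

    /* Every available byte was acceptable; the only question left is
     * whether there were enough of them. */
    return curlen < expected ? SKETCH_SHORT : SKETCH_OK;
}
```

With the three bytes E2 82 AC (U+20AC), passing a curlen of 2 yields SKETCH_SHORT, because the prefix is valid but incomplete, whereas the bytes E2 41 yield SKETCH_NON_CONTINUATION regardless of how many bytes follow. In the short case the caller decides whether to wait for more input, pass a larger length, or treat the truncation as a real error.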
Diffstat (limited to 'utf8.c')
-rw-r--r-- utf8.c 31
1 files changed, 30 insertions, 1 deletions
diff --git a/utf8.c b/utf8.c
index 06b77689c0..cba1523aa6 100644
--- a/utf8.c
+++ b/utf8.c
@@ -1365,7 +1365,8 @@ C<UTF8_DISALLOW_NONCHAR> or the C<UTF8_WARN_NONCHAR> flags.
=item C<UTF8_GOT_NON_CONTINUATION>
The input sequence was malformed in that a non-continuation type byte was found
-in a position where only a continuation type one should be.
+in a position where only a continuation type one should be. See also
+L</C<UTF8_GOT_SHORT>>.
=item C<UTF8_GOT_OVERFLOW>
@@ -1378,6 +1379,34 @@ The input sequence was malformed in that C<curlen> is smaller than required for
a complete sequence. In other words, the input is for a partial character
sequence.
+
+C<UTF8_GOT_SHORT> and C<UTF8_GOT_NON_CONTINUATION> both indicate a too short
+sequence. The difference is that C<UTF8_GOT_NON_CONTINUATION> indicates always
+that there is an error, while C<UTF8_GOT_SHORT> means that an incomplete
+sequence was looked at. If no other flags are present, it means that the
+sequence was valid as far as it went. Depending on the application, this could
+mean one of three things:
+
+=over
+
+=item *
+
+The C<curlen> length parameter passed in was too small, and the function was
+prevented from examining all the necessary bytes.
+
+=item *
+
+The buffer being looked at is based on reading data, and the data received so
+far stopped in the middle of a character, so that the next read will
+read the remainder of this character. (It is up to the caller to deal with the
+split bytes somehow.)
+
+=item *
+
+This is a real error, and the partial sequence is all we're going to get.
+
+=back
+
=item C<UTF8_GOT_SUPER>
The input sequence was malformed in that it is for a non-Unicode code point;