field | value | date |
---|---|---|
author | Karl Williamson <khw@cpan.org> | 2018-08-04 13:19:58 -0600 |
committer | Karl Williamson <khw@cpan.org> | 2018-08-05 05:52:48 -0600 |
commit | 00d976bbd170bbdc283618817b02b1b8f46bddd4 | |
tree | 2d22c0e7768b711e9a9f75a7729088871f0cecbf /utf8.c | |
parent | a4ee4fb50b9465c0ded09ff38f68516952970ec8 | |
perlapi: Fix up pod for utf8n_to_uvchr_error()
There are two return flags, UTF8_GOT_SHORT and UTF8_GOT_NON_CONTINUATION,
signalling that the input UTF-8 was malformed by being too short. This
commit adds detail comparing and contrasting the meanings of the two.
Diffstat (limited to 'utf8.c')
-rw-r--r-- | utf8.c | 31 |
1 files changed, 30 insertions, 1 deletions
    @@ -1365,7 +1365,8 @@ C<UTF8_DISALLOW_NONCHAR> or the C<UTF8_WARN_NONCHAR> flags.
     =item C<UTF8_GOT_NON_CONTINUATION>
     
     The input sequence was malformed in that a non-continuation type byte was found
    -in a position where only a continuation type one should be.
    +in a position where only a continuation type one should be. See also
    +L</C<UTF8_GOT_SHORT>>.
     
     =item C<UTF8_GOT_OVERFLOW>
     
    @@ -1378,6 +1379,34 @@ The input sequence was malformed in that C<curlen> is smaller than
     required for a complete sequence. In other words, the input is for a partial
     character sequence.
     
    +
    +C<UTF8_GOT_SHORT> and C<UTF8_GOT_NON_CONTINUATION> both indicate a too short
    +sequence. The difference is that C<UTF8_GOT_NON_CONTINUATION> indicates always
    +that there is an error, while C<UTF8_GOT_SHORT> means that an incomplete
    +sequence was looked at. If no other flags are present, it means that the
    +sequence was valid as far as it went. Depending on the application, this could
    +mean one of three things:
    +
    +=over
    +
    +=item *
    +
    +The C<curlen> length parameter passed in was too small, and the function was
    +prevented from examining all the necessary bytes.
    +
    +=item *
    +
    +The buffer being looked at is based on reading data, and the data received so
    +far stopped in the middle of a character, so that the next read will
    +read the remainder of this character. (It is up to the caller to deal with the
    +split bytes somehow.)
    +
    +=item *
    +
    +This is a real error, and the partial sequence is all we're going to get.
    +
    +=back
    +
     =item C<UTF8_GOT_SUPER>
     
     The input sequence was malformed in that it is for a non-Unicode code point;