utf8n_to_uvchr(): Note multiple malformations

Some UTF-8 sequences can have multiple malformations. For example, a sequence can be the start of an overlong representation of a code point, and still be incomplete. Until this commit what was generally done was to stop looking when the first malformation was found. This was not correct behavior, as that malformation may be allowed, while another unallowed one went unnoticed. (But this did not actually create security holes, as those allowed malformations replaced the input with a REPLACEMENT CHARACTER.) This commit refactors the error handling of this function to set a flag and keep going if a malformation is found that doesn't preclude others. Then each is handled in a loop at the end, warning if warranted. The result is that there is a warning for each malformation for which warnings should be generated, and an error return is made if any one is disallowed.

Overflow doesn't happen except for very high code points, well above the Unicode range, and above fitting in 31 bits. Hence the latter 2 potential malformations are subsets of overflow, so only one warning is output--the most dire.

This will speed up the normal case slightly, as the test for overflow is pulled out of the loop, allowing the UV to overflow. Then a single test after the loop is done to see if there was overflow or not.
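
The flag-and-continue approach described above can be sketched roughly as follows. This is only an illustration, not the code in utf8.c: the function decode_collect(), the MALF_* flag names, and the restriction to standard 1- to 4-byte UTF-8 are all assumptions made for the example, and the overflow handling mentioned in the message is left out. Each malformation sets a bit instead of returning early, and a loop at the end decides whether any of the recorded bits is disallowed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag names for this sketch; not Perl's real constants. */
enum {
    MALF_SHORT    = 1 << 0,  /* input ends before the sequence is complete */
    MALF_NON_CONT = 1 << 1,  /* a continuation byte was expected, not found */
    MALF_OVERLONG = 1 << 2,  /* uses more bytes than the code point needs */
    MALF_BAD_LEAD = 1 << 3   /* first byte cannot start a sequence */
};

/* Decode one character from s[0..len-1].  Every malformation seen is
 * recorded in *found.  Returns 0 if all of them are in 'allowed',
 * -1 if at least one is not.  Any malformation yields U+FFFD in *cp. */
int decode_collect(const uint8_t *s, size_t len, unsigned allowed,
                   unsigned *found, uint32_t *cp)
{
    unsigned flags = 0;
    size_t expected = 1;
    uint32_t v = 0;
    uint8_t lead;
    size_t i;
    unsigned bit;
    int status = 0;

    assert(len > 0);
    lead = s[0];

    if (lead < 0x80)                { v = lead; }
    else if ((lead & 0xE0) == 0xC0) { expected = 2; v = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { expected = 3; v = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { expected = 4; v = lead & 0x07; }
    else                            { flags |= MALF_BAD_LEAD; }

    /* Accumulate continuation bytes; note a problem and stop reading,
     * but do not return yet. */
    for (i = 1; i < expected; i++) {
        if (i >= len)              { flags |= MALF_SHORT;    break; }
        if ((s[i] & 0xC0) != 0x80) { flags |= MALF_NON_CONT; break; }
        v = (v << 6) | (s[i] & 0x3F);
    }

    /* Overlongs are detectable from the first one or two bytes, so this
     * check is independent of whether the sequence was complete. */
    if (expected == 2 && lead < 0xC2) {
        flags |= MALF_OVERLONG;
    } else if (expected >= 3 && len > 1 && (s[1] & 0xC0) == 0x80) {
        if ((lead == 0xE0 && s[1] < 0xA0) || (lead == 0xF0 && s[1] < 0x90))
            flags |= MALF_OVERLONG;
    }

    /* Handle each recorded malformation in turn. */
    for (bit = 1; bit <= MALF_BAD_LEAD; bit <<= 1) {
        if (!(flags & bit))
            continue;
        /* A real implementation would emit a warning here if warranted. */
        if (!(allowed & bit))
            status = -1;
    }

    *found = flags;
    *cp = flags ? 0xFFFD : v;
    return status;
}
```

With the two-byte input E0 80, this sketch reports both MALF_SHORT and MALF_OVERLONG. A caller that allows only overlongs therefore still gets an error return, which is the case the old stop-at-first-malformation logic could miss.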
author: Karl Williamson <khw@cpan.org> 2016-10-05 19:09:02 -0600
committer: Karl Williamson <khw@cpan.org> 2016-10-13 11:18:12 -0600
commit: 2b5e7bc2e60b4c4b5d87aa66e066363d9dce7930 (patch)
tree: f634e7245187a35ca5b793e67350b3cb41f4377a /pod/perldiag.pod
parent: 1980a0f48b7a9b6e99cda0d5ae69cbb49da3cbf4 (diff)
download: perl-2b5e7bc2e60b4c4b5d87aa66e066363d9dce7930.tar.gz
1 files changed, 1 insertions, 1 deletions
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index d9f807c733..6b42a00dc5 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -3344,7 +3344,7 @@ Perhaps the function's author was trying to write a subroutine signature
 but didn't enable that feature first (C<use feature 'signatures'>),
 so the signature was instead interpreted as a bad prototype.
 
-=item Malformed UTF-8 character (%s)
+=item Malformed UTF-8 character%s
 
 (S utf8)(F) Perl detected a string that should be UTF-8, but didn't
 comply with UTF-8 encoding rules, or represents a code point whose