author: Karl Williamson <khw@cpan.org> 2016-10-05 19:09:02 -0600
committer: Karl Williamson <khw@cpan.org> 2016-10-13 11:18:12 -0600
commit: 2b5e7bc2e60b4c4b5d87aa66e066363d9dce7930
tree: f634e7245187a35ca5b793e67350b3cb41f4377a /pod/perldiag.pod
parent: 1980a0f48b7a9b6e99cda0d5ae69cbb49da3cbf4
utf8n_to_uvchr(): Note multiple malformations
Some UTF-8 sequences can have multiple malformations. For example, a sequence can be the start of an overlong representation of a code point, and still be incomplete. Until this commit what was generally done was to stop looking when the first malformation was found. This was not correct behavior, as that malformation may be allowed, while another unallowed one went unnoticed. (But this did not actually create security holes, as those allowed malformations replaced the input with a REPLACEMENT CHARACTER.)

This commit refactors the error handling of this function to set a flag and keep going if a malformation is found that doesn't preclude others. Then each is handled in a loop at the end, warning if warranted. The result is that there is a warning for each malformation for which warnings should be generated, and an error return is made if any one is disallowed.

Overflow doesn't happen except for very high code points, well above the Unicode range, and above fitting in 31 bits. Hence the latter 2 potential malformations are subsets of overflow, so only one warning is output--the most dire.

This will speed up the normal case slightly, as the test for overflow is pulled out of the loop, allowing the UV to overflow. Then a single test after the loop is done to see if there was overflow or not.
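
The flag-and-continue approach described above can be illustrated with a minimal sketch. This is not the code in utf8.c: the MALF_* flag names and the scan_sequence() and decode() helpers are hypothetical, overflow handling is omitted, and only sequences of up to 4 bytes are considered. It shows one pass that records every malformation it can find, followed by a loop that counts one warning per malformation and fails if any of them is not in the caller's allowed set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag names for illustration; not Perl's UTF8_* constants. */
enum {
    MALF_SHORT     = 1u << 0,  /* input ends before the sequence does    */
    MALF_NON_CONT  = 1u << 1,  /* a continuation byte was expected       */
    MALF_OVERLONG  = 1u << 2,  /* more bytes used than the value needs   */
    MALF_BAD_START = 1u << 3   /* first byte cannot start a sequence     */
};

/* Scan one sequence, recording every malformation found instead of
 * returning at the first one.  Returns the set of flags found. */
static unsigned scan_sequence(const unsigned char *s, size_t avail,
                              uint32_t *cp)
{
    /* Smallest code point that legitimately needs N bytes. */
    static const uint32_t min_for_len[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    unsigned found = 0;
    size_t expected, i;
    uint32_t v, vmax;

    assert(cp != NULL);
    *cp = 0;
    if (avail == 0) return MALF_SHORT;
    if (s[0] < 0x80) { *cp = s[0]; return 0; }

    if      ((s[0] & 0xE0) == 0xC0) { expected = 2; v = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { expected = 3; v = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { expected = 4; v = s[0] & 0x07; }
    else return MALF_BAD_START;

    for (i = 1; i < expected; i++) {
        if (i >= avail)              { found |= MALF_SHORT;    break; }
        if ((s[i] & 0xC0) != 0x80)   { found |= MALF_NON_CONT; break; }
        v = (v << 6) | (s[i] & 0x3F);
    }

    /* Keep looking: even a truncated sequence can already be overlong.
     * Fill the missing bytes with the largest possible payload; if the
     * result is still below the minimum, every completion is overlong. */
    vmax = v;
    for (; i < expected; i++) vmax = (vmax << 6) | 0x3F;
    if (vmax < min_for_len[expected]) found |= MALF_OVERLONG;

    *cp = v;
    return found;
}

/* Handle each recorded malformation in a loop at the end.  Returns 1 on
 * success, 0 if any malformation found is not in 'allowed'.  *warnings
 * receives the number of malformations that would each get a warning. */
static int decode(const unsigned char *s, size_t avail, unsigned allowed,
                  uint32_t *cp, int *warnings)
{
    unsigned found = scan_sequence(s, avail, cp);
    unsigned bit;
    int ok = 1;

    *warnings = 0;
    for (bit = 1; bit <= MALF_BAD_START; bit <<= 1) {
        if (!(found & bit)) continue;
        (*warnings)++;
        if (!(allowed & bit)) ok = 0;
    }
    if (found) *cp = 0xFFFD;  /* REPLACEMENT CHARACTER for malformed input */
    return ok;
}
```

With this sketch, the two-byte input E0 80 with only MALF_OVERLONG allowed is still rejected, because the truncation is recorded as well and is not allowed; stopping at the first malformation would have missed it.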
Diffstat (limited to 'pod/perldiag.pod')
-rw-r--r-- pod/perldiag.pod 2
1 files changed, 1 insertions, 1 deletions
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index d9f807c733..6b42a00dc5 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -3344,7 +3344,7 @@ Perhaps the function's author was trying to write a subroutine signature
 but didn't enable that feature first (C<use feature 'signatures'>),
 so the signature was instead interpreted as a bad prototype.
-=item Malformed UTF-8 character (%s)
+=item Malformed UTF-8 character%s
 (S utf8)(F) Perl detected a string that should be UTF-8, but didn't
 comply with UTF-8 encoding rules, or represents a code point whose