author | Karl Williamson <khw@cpan.org> | 2016-10-05 19:09:02 -0600 |
---|---|---|
committer | Karl Williamson <khw@cpan.org> | 2016-10-13 11:18:12 -0600 |
commit | 2b5e7bc2e60b4c4b5d87aa66e066363d9dce7930 (patch) | |
tree | f634e7245187a35ca5b793e67350b3cb41f4377a /pod/perldiag.pod | |
parent | 1980a0f48b7a9b6e99cda0d5ae69cbb49da3cbf4 (diff) | |
download | perl-2b5e7bc2e60b4c4b5d87aa66e066363d9dce7930.tar.gz |
utf8n_to_uvchr(): Note multiple malformations
Some UTF-8 sequences can have multiple malformations. For example, a
sequence can be the start of an overlong representation of a code point,
and still be incomplete. Until this commit, the function generally
stopped looking as soon as it found the first malformation. This was
incorrect, because that malformation might be allowed while another,
disallowed one went unnoticed. (This did not actually create security
holes, though, as those allowed malformations replaced the input with a
REPLACEMENT CHARACTER.) This commit refactors the error handling of
this function to set a flag and keep going if a malformation is found
that doesn't preclude others. Then each is handled in a loop at the
end, warning if warranted. The result is that there is a warning for
each malformation for which warnings should be generated, and an error
return is made if any one is disallowed.
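
The flag-and-continue approach described above can be sketched roughly as follows. This is a simplified illustration, not the code in utf8.c: the function check_utf8(), the MALF_* flag names, and the restriction to standard 1 to 4 byte sequences are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical flag names for this sketch; not Perl's real constants. */
enum {
    MALF_BAD_START = 1u << 0, /* first byte cannot start a sequence */
    MALF_NON_CONT  = 1u << 1, /* a continuation byte was expected but not found */
    MALF_SHORT     = 1u << 2, /* input ends before the sequence is complete */
    MALF_OVERLONG  = 1u << 3  /* sequence is (the start of) an overlong form */
};

static const char *malf_name(unsigned bit)
{
    switch (bit) {
    case MALF_BAD_START: return "unexpected start byte";
    case MALF_NON_CONT:  return "unexpected non-continuation byte";
    case MALF_SHORT:     return "too short";
    case MALF_OVERLONG:  return "overlong";
    default:             return "unknown";
    }
}

/* Returns 0 if the sequence is acceptable, -1 if any malformation not in
 * `allowed` was found.  Every malformation found is recorded in *found. */
int check_utf8(const unsigned char *s, size_t avail,
               unsigned allowed, unsigned *found)
{
    unsigned malf = 0;
    size_t expect = 0;

    if (avail == 0) {
        malf |= MALF_SHORT;
    } else {
        unsigned char b0 = s[0];

        if (b0 < 0x80)                     expect = 1;
        else if (b0 >= 0xC0 && b0 <= 0xDF) expect = 2;
        else if (b0 >= 0xE0 && b0 <= 0xEF) expect = 3;
        else if (b0 >= 0xF0 && b0 <= 0xF7) expect = 4;
        else malf |= MALF_BAD_START;       /* precludes the checks below */

        if (expect > 1) {
            size_t have = avail < expect ? avail : expect;

            /* Set a flag and keep going instead of returning early. */
            if (avail < expect)
                malf |= MALF_SHORT;

            for (size_t i = 1; i < have; i++) {
                if ((s[i] & 0xC0) != 0x80) {
                    malf |= MALF_NON_CONT;
                    break;
                }
            }

            /* Overlong forms are recognizable from the first byte or two. */
            if (b0 == 0xC0 || b0 == 0xC1) {
                malf |= MALF_OVERLONG;
            } else if (have >= 2 && (s[1] & 0xC0) == 0x80
                       && ((b0 == 0xE0 && s[1] < 0xA0)
                           || (b0 == 0xF0 && s[1] < 0x90))) {
                malf |= MALF_OVERLONG;
            }
        }
    }

    /* Handle every recorded malformation in one loop at the end.  This
     * sketch warns for each one found; the return value is an error if
     * any of them is not allowed. */
    int rc = 0;
    for (unsigned bit = MALF_BAD_START; bit <= MALF_OVERLONG; bit <<= 1) {
        if (!(malf & bit))
            continue;
        fprintf(stderr, "Malformed UTF-8 character: %s\n", malf_name(bit));
        if (!(allowed & bit))
            rc = -1;
    }

    *found = malf;
    return rc;
}
```

With this sketch, the two-byte input E0 80 is reported as both too short and overlong. Allowing only MALF_OVERLONG still yields an error return, because the too-short malformation is now noticed rather than masked by the first one found.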

Overflow can happen only for very high code points, well above the
Unicode range and too large to fit in 31 bits. Hence whenever overflow
occurs, those latter two potential malformations apply as well, so only
one warning is output: the most dire.

This speeds up the normal case slightly, because the overflow test is
pulled out of the loop and the UV is simply allowed to overflow. A
single test after the loop then determines whether overflow occurred.
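
As a rough illustration of hoisting the overflow test out of the accumulation loop, the sketch below lets a 64-bit accumulator wrap freely and decides afterwards whether bits were lost. The function accumulate_payload(), the helper bit_width(), and the bit-counting test itself are assumptions for this example; the commit's actual post-loop test in utf8.c may work differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Number of significant bits in v (0 for v == 0). */
static unsigned bit_width(uint64_t v)
{
    unsigned w = 0;
    while (v) {
        w++;
        v >>= 1;
    }
    return w;
}

/* Hypothetical helper: combine the payload bits of a start byte with the
 * low 6 bits of each of n continuation bytes.  The hot loop has no
 * overflow check; unsigned arithmetic simply wraps. */
uint64_t accumulate_payload(uint64_t start_bits, const unsigned char *cont,
                            size_t n, bool *overflowed)
{
    uint64_t uv = start_bits;

    for (size_t i = 0; i < n; i++)
        uv = (uv << 6) | (cont[i] & 0x3F);

    /* Post-loop test: count the significant payload bits.  Leading zero
     * payload (an overlong form) is skipped so it does not count toward
     * the width; in the normal case this skip loop runs zero times. */
    size_t i = 0;
    uint64_t lead = start_bits;
    while (lead == 0 && i < n)
        lead = cont[i++] & 0x3F;

    size_t bits = bit_width(lead) + 6 * (n - i);
    *overflowed = bits > 64;

    return uv;   /* meaningful only when *overflowed is false */
}
```

For the sequence E2 82 AC, the start byte contributes payload 0x02 and the two continuation bytes bring the result to 0x20AC with no overflow. A start payload of 1 followed by eleven continuation bytes needs 67 bits and is flagged.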
Diffstat (limited to 'pod/perldiag.pod')
-rw-r--r-- | pod/perldiag.pod | 2 |
1 files changed, 1 insertions, 1 deletions
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index d9f807c733..6b42a00dc5 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -3344,7 +3344,7 @@ Perhaps the function's author was trying to write a subroutine
 signature but didn't enable that feature first (C<use feature
 'signatures'>), so the signature was instead interpreted as a bad prototype.
 
-=item Malformed UTF-8 character (%s)
+=item Malformed UTF-8 character%s
 
 (S utf8)(F) Perl detected a string that should be UTF-8, but didn't
 comply with UTF-8 encoding rules, or represents a code point whose