author Karl Williamson <khw@cpan.org> 2016-12-14 11:38:42 -0700
committer Karl Williamson <khw@cpan.org> 2016-12-23 16:48:35 -0700
commit 5a48568dae7e81342fc2f8d0845423834f5c818f (patch)
tree 255923c76214f0142ba304aa6fe7295e118b7103 /utf8.c
parent d1f8d421df731c77beff3db92d27dc6ec28589f2 (diff)
Return REPLACEMENT for UTF-8 empty malformation
The previous commit no longer allows this so-called malformation under DEBUGGING builds, unless code is explicitly changed to request it (or already requests it explicitly, but there are no instances of this on CPAN). Prior to this commit, the function returned NUL if the malformation was explicitly allowed, and 0 if it wasn't. Most code won't treat these as different. Returning NUL basically makes nothing into something, which might be exploitable in some way by an attacker. The Unicode-accepted way of dealing with malformations is to replace them with the REPLACEMENT CHARACTER, and so this commit changes things to conform to this.
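As a rough illustration of the behavior described above (not Perl's actual code), here is a minimal standalone sketch of a decoder that reports U+FFFD for zero-length input instead of 0. The names `sketch_decode`, `SKETCH_REPLACEMENT`, `SKETCH_GOT_EMPTY`, and `SKETCH_UNSUPPORTED` are hypothetical and exist only for this example. Multi-byte decoding is deliberately left out, so the sketch flags any non-ASCII lead byte as unsupported.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration only; these are not Perl's API. */
#define SKETCH_REPLACEMENT 0xFFFDu /* U+FFFD REPLACEMENT CHARACTER */
#define SKETCH_GOT_EMPTY   0x1u    /* error bit: input had zero length */
#define SKETCH_UNSUPPORTED 0x2u    /* error bit: non-ASCII, not handled here */

/* Return the code point of the first character in buf[0..len).
 * Only ASCII is decoded, to keep the sketch short. Problems are
 * recorded in *errors and U+FFFD is returned in their place. */
static uint32_t
sketch_decode(const unsigned char *buf, size_t len, unsigned *errors)
{
    *errors = 0;

    if (len == 0) {
        /* Empty input: returning 0 here would be indistinguishable
         * from a genuine NUL character, so return U+FFFD instead. */
        *errors |= SKETCH_GOT_EMPTY;
        return SKETCH_REPLACEMENT;
    }

    if (buf[0] < 0x80) {
        return buf[0]; /* plain ASCII, including a real NUL byte */
    }

    /* Multi-byte sequences are out of scope for this sketch. */
    *errors |= SKETCH_UNSUPPORTED;
    return SKETCH_REPLACEMENT;
}
```

With this shape, a caller that ignores the error bits still cannot mistake an empty buffer for a one-byte buffer holding NUL: the first yields 0xFFFD, the second yields 0.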
Diffstat (limited to 'utf8.c')
-rw-r--r-- utf8.c 10
1 files changed, 4 insertions, 6 deletions
diff --git a/utf8.c b/utf8.c
index 231264871f..d34597bb56 100644
--- a/utf8.c
+++ b/utf8.c
@@ -875,10 +875,9 @@ is, when there is a shorter sequence that can express the same code point;
overlong sequences are expressly forbidden in the UTF-8 standard due to
potential security issues). Another malformation example is the first byte of
a character not being a legal first byte. See F<utf8.h> for the list of such
-flags. For allowed 0 length strings, this function returns 0; for allowed
-overlong sequences, the computed code point is returned; for all other allowed
-malformations, the Unicode REPLACEMENT CHARACTER is returned, as these have no
-determinable reasonable value.
+flags. For allowed overlong sequences, the computed code point is returned;
+for all other allowed malformations, the Unicode REPLACEMENT CHARACTER is
+returned.
The C<UTF8_CHECK_ONLY> flag overrides the behavior when a non-allowed (by other
flags) malformation is found. If this flag is set, the routine assumes that
@@ -1123,8 +1122,7 @@ Perl_utf8n_to_uvchr_error(pTHX_ const U8 *s,
if (UNLIKELY(curlen == 0)) {
possible_problems |= UTF8_GOT_EMPTY;
curlen = 0;
- uv = 0; /* XXX It could be argued that this should be
- UNICODE_REPLACEMENT? */
+ uv = UNICODE_REPLACEMENT;
goto ready_to_handle_errors;
}