author | Karl Williamson <khw@cpan.org> | 2016-12-14 11:38:42 -0700 |
---|---|---|
committer | Karl Williamson <khw@cpan.org> | 2016-12-23 16:48:35 -0700 |
commit | 5a48568dae7e81342fc2f8d0845423834f5c818f (patch) | |
tree | 255923c76214f0142ba304aa6fe7295e118b7103 /utf8.c | |
parent | d1f8d421df731c77beff3db92d27dc6ec28589f2 (diff) | |
Return REPLACEMENT for UTF-8 empty malformation
The previous commit no longer allows this so-called malformation under
DEBUGGING builds, except if code explicitly changes to request it (or
already explicitly does, but there are no instances of this in CPAN).
If it is explicitly allowed, prior to this commit it returned NUL. If
it wasn't allowed, it returned 0. Most code won't treat these as
different. Returning NUL essentially makes nothing into
something, which might be exploitable in some way by an attacker. The
Unicode accepted way of dealing with malformations is to replace them
with the REPLACEMENT CHARACTER, and so this commit changes things to
conform to this.
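
As a rough illustration of the behavior described above, here is a minimal standalone sketch of a strict UTF-8 decoder that returns the REPLACEMENT CHARACTER for empty input, so that an empty string cannot be confused with a real NUL byte. The names `sketch_decode_utf8` and `SKETCH_REPLACEMENT` are hypothetical and are not part of Perl's API. Unlike Perl_utf8n_to_uvchr_error, this sketch takes no flags and always rejects overlong sequences.

```c
#include <stddef.h>
#include <stdint.h>

/* U+FFFD REPLACEMENT CHARACTER (hypothetical name, not Perl's macro). */
#define SKETCH_REPLACEMENT 0xFFFDu

/* Hypothetical helper: decode the first code point of s[0..len).
 * Returns U+FFFD for empty or malformed input.  *consumed is set to the
 * number of bytes used: 0 for empty input, 1 for any other malformation. */
uint32_t sketch_decode_utf8(const unsigned char *s, size_t len,
                            size_t *consumed)
{
    size_t need, i;
    uint32_t cp, min;

    *consumed = 0;
    if (len == 0)
        return SKETCH_REPLACEMENT;      /* empty: do not invent a NUL */

    *consumed = 1;
    if (s[0] < 0x80)
        return s[0];                    /* ASCII, including a real NUL */

    if ((s[0] & 0xE0) == 0xC0)      { need = 2; cp = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { need = 3; cp = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { need = 4; cp = s[0] & 0x07; min = 0x10000; }
    else
        return SKETCH_REPLACEMENT;      /* not a legal first byte */

    if (len < need)
        return SKETCH_REPLACEMENT;      /* sequence is cut short */

    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return SKETCH_REPLACEMENT;  /* not a continuation byte */
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }

    /* Reject overlongs, surrogates, and values above U+10FFFF. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return SKETCH_REPLACEMENT;

    *consumed = need;
    return cp;
}
```

With this sketch, a zero-length buffer yields U+FFFD with nothing consumed, while a one-byte buffer holding `\0` yields 0 with one byte consumed, so callers can tell the two cases apart.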
Diffstat (limited to 'utf8.c')
-rw-r--r-- | utf8.c | 10 |
1 files changed, 4 insertions, 6 deletions
@@ -875,10 +875,9 @@ is, when there is a shorter sequence that can express the same code point;
 overlong sequences are expressly forbidden in the UTF-8 standard due to
 potential security issues). Another malformation example is the first byte of
 a character not being a legal first byte. See F<utf8.h> for the list of such
-flags. For allowed 0 length strings, this function returns 0; for allowed
-overlong sequences, the computed code point is returned; for all other allowed
-malformations, the Unicode REPLACEMENT CHARACTER is returned, as these have no
-determinable reasonable value.
+flags. For allowed overlong sequences, the computed code point is returned;
+for all other allowed malformations, the Unicode REPLACEMENT CHARACTER is
+returned.
 
 The C<UTF8_CHECK_ONLY> flag overrides the behavior when a non-allowed (by other
 flags) malformation is found. If this flag is set, the routine assumes that
@@ -1123,8 +1122,7 @@ Perl_utf8n_to_uvchr_error(pTHX_ const U8 *s,
     if (UNLIKELY(curlen == 0)) {
         possible_problems |= UTF8_GOT_EMPTY;
         curlen = 0;
-        uv = 0; /* XXX It could be argued that this should be
-                   UNICODE_REPLACEMENT? */
+        uv = UNICODE_REPLACEMENT;
         goto ready_to_handle_errors;
     }
 