author | Karl Williamson <khw@cpan.org> | 2018-07-17 13:57:54 -0600 |
---|---|---|
committer | Karl Williamson <khw@cpan.org> | 2018-07-17 14:11:51 -0600 |
commit | aa3c16bd709ef9b9c8c785af48f368e08f70c74b (patch) | |
tree | cfd770ae37a97b0b3a9ba4bc9a2685e5cbb44f49 | |
parent | 69352d88af71728e6c8538f66892224127e2167f (diff) | |
download | perl-aa3c16bd709ef9b9c8c785af48f368e08f70c74b.tar.gz |
Make utf8_to_uvchr() safer
This function is deprecated because the API doesn't allow it to
determine the end of the input string, so it can read off the far end.
But I just realized that, since many strings are NUL-terminated, we
can forbid it from reading past the next NUL, and hence make it safe in
many cases.
-rw-r--r-- | utf8.c | 21 |
1 files changed, 20 insertions, 1 deletions
@@ -6345,7 +6345,26 @@ Perl_utf8_to_uvchr(pTHX_ const U8 *s, STRLEN *retlen)
 {
     PERL_ARGS_ASSERT_UTF8_TO_UVCHR;
 
-    return utf8_to_uvchr_buf(s, s + UTF8_MAXBYTES, retlen);
+    /* This function is unsafe if malformed UTF-8 input is given it, which is
+     * why the function is deprecated.  If the first byte of the input
+     * indicates that there are more bytes remaining in the sequence that forms
+     * the character than there are in the input buffer, it can read past the
+     * end.  But we can make it safe if the input string happens to be
+     * NUL-terminated, as many strings in Perl are, by refusing to read past a
+     * NUL.  A NUL indicates the start of the next character anyway.  If the
+     * input isn't NUL-terminated, the function remains unsafe, as it always
+     * has been.
+     *
+     * An initial NUL has to be handled separately, but all ASCIIs can be
+     * handled the same way, speeding up this common case */
+
+    if (UTF8_IS_INVARIANT(*s)) {  /* Assumes 's' contains at least 1 byte */
+        return (UV) *s;
+    }
+
+    return utf8_to_uvchr_buf(s,
+                             s + strnlen((char *) s, UTF8_MAXBYTES),
+                             retlen);
 }
 
 /*
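
The bounding idea in this change can be illustrated outside of Perl. The following is only a minimal sketch, not the code in utf8.c: the `sketch_*` names and `SKETCH_MAXBYTES` are hypothetical, the limit of 4 assumes standard UTF-8 rather than Perl's own `UTF8_MAXBYTES`, and the lead-byte table ignores the extended sequences Perl accepts. It shows how `strnlen()` caps the readable span at the first NUL, so a lead byte that promises more bytes than the string holds can be detected instead of read past.

```c
#define _POSIX_C_SOURCE 200809L  /* strnlen() is POSIX, not ISO C */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for Perl's UTF8_MAXBYTES; standard UTF-8 needs at
 * most 4 bytes per character. */
#define SKETCH_MAXBYTES 4

/* Sequence length claimed by a lead byte; 0 if it cannot start a sequence. */
static size_t sketch_claimed_len(unsigned char lead)
{
    if (lead < 0x80)           return 1;  /* ASCII, including NUL */
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 0;                             /* continuation or invalid byte */
}

/* Bytes a decoder may examine: up to the first NUL, capped at
 * SKETCH_MAXBYTES.  Assumes 's' is NUL-terminated or has at least
 * SKETCH_MAXBYTES readable bytes. */
static size_t sketch_readable_len(const unsigned char *s)
{
    assert(s != NULL);
    return strnlen((const char *) s, SKETCH_MAXBYTES);
}

/* Returns 1 if the lead byte promises more bytes than exist before the NUL
 * (or cannot start a sequence at all), 0 otherwise. */
static int sketch_is_truncated(const unsigned char *s)
{
    size_t claimed;

    assert(s != NULL);

    /* ASCII is handled first, as in the commit: an initial NUL would
     * otherwise yield a readable length of 0. */
    if (s[0] < 0x80) {
        return 0;
    }

    claimed = sketch_claimed_len(s[0]);
    return claimed == 0 || claimed > sketch_readable_len(s);
}
```

For instance, the two-byte string `"\xE2\x82"` starts with a lead byte that claims three bytes, so `sketch_is_truncated()` reports it, while the complete sequence `"\xE2\x82\xAC"` passes. As with the patched function, none of this helps when the input is not NUL-terminated.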