Make utf8_to_uvchr() safer

This function is deprecated because its API doesn't allow it to determine the end of the input string, so it can read past the end of the buffer. But I just realized that, since many strings are NUL-terminated, we can forbid it from reading past the next NUL, and hence make it safe in many cases.
author: Karl Williamson <khw@cpan.org> 2018-07-17 13:57:54 -0600
committer: Karl Williamson <khw@cpan.org> 2018-07-17 14:11:51 -0600
commit: aa3c16bd709ef9b9c8c785af48f368e08f70c74b (patch)
tree: cfd770ae37a97b0b3a9ba4bc9a2685e5cbb44f49
parent: 69352d88af71728e6c8538f66892224127e2167f (diff)
download: perl-aa3c16bd709ef9b9c8c785af48f368e08f70c74b.tar.gz
1 files changed, 20 insertions, 1 deletions
diff --git a/utf8.c b/utf8.c
index dec8aa1252..51039aed4f 100644
--- a/utf8.c
+++ b/utf8.c
@@ -6345,7 +6345,26 @@ Perl_utf8_to_uvchr(pTHX_ const U8 *s, STRLEN *retlen)
 {
     PERL_ARGS_ASSERT_UTF8_TO_UVCHR;
 
-    return utf8_to_uvchr_buf(s, s + UTF8_MAXBYTES, retlen);
+    /* This function is unsafe if malformed UTF-8 input is given it, which is
+     * why the function is deprecated.  If the first byte of the input
+     * indicates that there are more bytes remaining in the sequence that forms
+     * the character than there are in the input buffer, it can read past the
+     * end.  But we can make it safe if the input string happens to be
+     * NUL-terminated, as many strings in Perl are, by refusing to read past a
+     * NUL.  A NUL indicates the start of the next character anyway.  If the
+     * input isn't NUL-terminated, the function remains unsafe, as it always
+     * has been.
+     *
+     * An initial NUL has to be handled separately, but all ASCIIs can be
+     * handled the same way, speeding up this common case */
+
+    if (UTF8_IS_INVARIANT(*s)) {  /* Assumes 's' contains at least 1 byte */
+        return (UV) *s;
+    }
+
+    return utf8_to_uvchr_buf(s,
+                             s + strnlen((char *) s, UTF8_MAXBYTES),
+                            retlen);
 }
 
 /*
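
The bounding idea in this patch can be tried outside the Perl source. The following is a minimal standalone sketch, not the Perl implementation: `DEMO_UTF8_MAXBYTES`, `demo_safe_span()`, `demo_claimed_len()` and `demo_sequence_fits()` are hypothetical names invented for illustration, the value 13 for the maximum is an assumption, and the length table covers only standard UTF-8 of up to 4 bytes on an ASCII platform. It shows how `strnlen()` caps the readable span at the first NUL, so a truncated multi-byte sequence in a NUL-terminated string is detected rather than read past.

```c
#define _POSIX_C_SOURCE 200809L  /* for strnlen() */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for Perl's UTF8_MAXBYTES; the value is an assumption. */
#define DEMO_UTF8_MAXBYTES 13

/* How many bytes a decoder may examine starting at s: at most
 * DEMO_UTF8_MAXBYTES, and never beyond the first NUL. */
static size_t demo_safe_span(const unsigned char *s)
{
    assert(s != NULL);
    return strnlen((const char *) s, DEMO_UTF8_MAXBYTES);
}

/* Length the start byte claims for its sequence (standard UTF-8 only).
 * Returns 0 for a continuation byte or an invalid start byte. */
static size_t demo_claimed_len(unsigned char first)
{
    if (first < 0x80)           return 1;
    if ((first & 0xE0) == 0xC0) return 2;
    if ((first & 0xF0) == 0xE0) return 3;
    if ((first & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Returns 1 if the sequence starting at s fits inside the bounded span,
 * 0 if it is truncated or does not begin with a valid start byte. */
static int demo_sequence_fits(const unsigned char *s)
{
    size_t claimed;

    /* ASCII bytes, including NUL, are complete one-byte characters, so
     * they are accepted before any length check (assumes at least 1 byte). */
    if (s[0] < 0x80) {
        return 1;
    }

    claimed = demo_claimed_len(s[0]);
    return claimed != 0 && claimed <= demo_safe_span(s);
}
```

For example, `demo_sequence_fits()` rejects the NUL-terminated input `"\xE2\x82"`, whose start byte claims three bytes while only two precede the NUL. As in the patch, nothing here helps when the input is not NUL-terminated.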