Dest buffer needs to be bigger for utf16_to_utf8()

These undocumented functions require the destination buffer to be pre-sized for the worst case. However, that size (previously listed as 3/2 * input) is wrong for EBCDIC. Correct the comments and the single use of these functions in core. These functions have no way to avoid overflowing the destination, which strikes me as wrong.
author: Karl Williamson <khw@cpan.org> 2017-08-10 15:52:35 -0600
committer: Karl Williamson <khw@cpan.org> 2017-11-08 20:21:44 -0700
commit: 624504c5a60da0880a7d1d6d3e66f65c68ba28ae (patch)
tree: 2e10bdcad3179576394c2f1e3fb2fd1327cac6f0 /utf8.c
parent: 63ab03b3966fa7dcc24a137305becdb56bbf4e5a (diff)
download: perl-624504c5a60da0880a7d1d6d3e66f65c68ba28ae.tar.gz
1 files changed, 12 insertions, 3 deletions
diff --git a/utf8.c b/utf8.c
index 6107348523..b731780fe4 100644
--- a/utf8.c
+++ b/utf8.c
@@ -2364,10 +2364,19 @@ Perl_bytes_to_utf8(pTHX_ const U8 *s, STRLEN *lenp)
 }
 
 /*
- * Convert native (big-endian) or reversed (little-endian) UTF-16 to UTF-8.
+ * Convert native (big-endian) UTF-16 to UTF-8.  For reversed (little-endian),
+ * use utf16_to_utf8_reversed().
  *
- * Destination must be pre-extended to 3/2 source.  Do not use in-place.
- * We optimize for native, for obvious reasons. */
+ * UTF-16 requires 2 bytes for every code point below 0x10000; otherwise 4 bytes.
+ * UTF-8 requires 1-3 bytes for every code point below 0x1000; otherwise 4 bytes.
+ * UTF-EBCDIC requires 1-4 bytes for every code point below 0x1000; otherwise 4-5 bytes.
+ *
+ * These functions don't check for overflow.  The worst case is every code
+ * point in the input is 2 bytes, and requires 4 bytes on output.  (If the code
+ * is never going to run in EBCDIC, it is 2 bytes requiring 3 on output.)  Therefore the
+ * destination must be pre-extended to 2 times the source length.
+ *
+ * Do not use in-place.  We optimize for native, for obvious reasons. */
 
 U8*
 Perl_utf16_to_utf8(pTHX_ U8* p, U8* d, I32 bytelen, I32 *newlen)
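
The sizing rule in the new comment can be illustrated with a small standalone sketch. This is not the code in utf8.c: the names worst_case_out_len() and utf16be_to_utf8_checked() are hypothetical, the encoder emits plain UTF-8 only (no UTF-EBCDIC), and the explicit destination-size check is an assumption about what an overflow-safe variant might look like.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the Perl API: worst-case output size for
 * `bytelen` bytes of UTF-16 input.  A 2-byte unit can need 3 bytes in UTF-8
 * (3/2 x source) or, per the comment in the diff, 4 bytes in UTF-EBCDIC
 * (2 x source).  Surrogate pairs (4 bytes in) expand by a smaller ratio. */
static size_t
worst_case_out_len(size_t bytelen, int is_ebcdic)
{
    assert(bytelen % 2 == 0);   /* UTF-16 input comes in 2-byte units */
    return is_ebcdic ? bytelen * 2 : bytelen / 2 * 3;
}

/* Hypothetical bounds-checked converter: big-endian UTF-16 to UTF-8.
 * Returns the number of bytes written to d, or (size_t)-1 if the input is
 * malformed or the destination (dsize bytes) is too small. */
static size_t
utf16be_to_utf8_checked(const uint8_t *p, size_t bytelen,
                        uint8_t *d, size_t dsize)
{
    size_t i = 0, out = 0;

    if (bytelen % 2)
        return (size_t)-1;

    while (i < bytelen) {
        uint32_t cp = ((uint32_t)p[i] << 8) | p[i + 1];
        size_t need;
        i += 2;

        if (cp >= 0xD800 && cp <= 0xDBFF) {         /* high surrogate */
            uint32_t lo;
            if (i + 2 > bytelen)
                return (size_t)-1;                  /* truncated pair */
            lo = ((uint32_t)p[i] << 8) | p[i + 1];
            if (lo < 0xDC00 || lo > 0xDFFF)
                return (size_t)-1;                  /* bad low surrogate */
            i += 2;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        }
        else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;                      /* lone low surrogate */
        }

        need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (need > dsize - out)
            return (size_t)-1;                      /* would overflow d */

        if (need == 1) {
            d[out++] = (uint8_t)cp;
        }
        else if (need == 2) {
            d[out++] = (uint8_t)(0xC0 | (cp >> 6));
            d[out++] = (uint8_t)(0x80 | (cp & 0x3F));
        }
        else if (need == 3) {
            d[out++] = (uint8_t)(0xE0 | (cp >> 12));
            d[out++] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
            d[out++] = (uint8_t)(0x80 | (cp & 0x3F));
        }
        else {
            d[out++] = (uint8_t)(0xF0 | (cp >> 18));
            d[out++] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
            d[out++] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
            d[out++] = (uint8_t)(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With two U+20AC code units as input (4 bytes), the ASCII-platform bound of 3/2 x source gives 6 bytes, which the sketch fills exactly; the 2 x bound recommended by the comment (8 bytes) leaves room for the UTF-EBCDIC case as well. A caller that cannot rule out EBCDIC would size the buffer with the larger figure.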