author | Karl Williamson <khw@cpan.org> | 2017-08-10 15:52:35 -0600 |
---|---|---|
committer | Karl Williamson <khw@cpan.org> | 2017-11-08 20:21:44 -0700 |
commit | 624504c5a60da0880a7d1d6d3e66f65c68ba28ae | |
tree | 2e10bdcad3179576394c2f1e3fb2fd1327cac6f0 /utf8.c | |
parent | 63ab03b3966fa7dcc24a137305becdb56bbf4e5a | |
Dest buffer needs to be bigger for utf16_to_utf8()
These undocumented functions require the destination buffer to be sized
for the worst case. However, that size (previously documented as
3/2 * input) is wrong for EBCDIC. Correct the comments and the single
use of these functions in core.
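To make the sizing rule concrete, here is a minimal sketch of a pre-scan that computes exactly how many UTF-8 bytes a big-endian UTF-16 buffer will need, assuming ordinary (non-EBCDIC) UTF-8 output. The name `utf16be_utf8_len` and its `size_t` interface are hypothetical and not part of perl. Each 2-byte UTF-16 unit yields at most 3 UTF-8 bytes and each 4-byte surrogate pair yields 4, so the result never exceeds 3/2 of the input, which is the bound the old comment relied on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of perl: returns the number of UTF-8 bytes
 * needed for the big-endian UTF-16 input p[0..bytelen), or (size_t)-1 if
 * the length is odd or a surrogate is unpaired. */
static size_t utf16be_utf8_len(const uint8_t *p, size_t bytelen)
{
    size_t i = 0, need = 0;

    if (bytelen % 2)
        return (size_t)-1;
    while (i < bytelen) {
        uint32_t uv = ((uint32_t)p[i] << 8) | p[i + 1];
        i += 2;
        if (uv >= 0xD800 && uv <= 0xDBFF) {        /* high surrogate */
            uint32_t lo;
            if (i + 2 > bytelen)
                return (size_t)-1;                 /* truncated pair */
            lo = ((uint32_t)p[i] << 8) | p[i + 1];
            if (lo < 0xDC00 || lo > 0xDFFF)
                return (size_t)-1;                 /* no low surrogate */
            i += 2;
            need += 4;                             /* 4 bytes in, 4 out */
        }
        else if (uv >= 0xDC00 && uv <= 0xDFFF)     /* stray low surrogate */
            return (size_t)-1;
        else if (uv < 0x80)
            need += 1;                             /* 2 bytes in, 1 out */
        else if (uv < 0x800)
            need += 2;                             /* 2 bytes in, 2 out */
        else
            need += 3;                             /* 2 in, 3 out: worst case */
    }
    /* For UTF-8 output the old 3/2 rule holds; UTF-EBCDIC is not modeled. */
    assert(need <= bytelen / 2 * 3);
    return need;
}
```

A caller could allocate exactly this many bytes instead of the worst case, at the cost of reading the input twice.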
These functions have no way to avoid overflowing the destination buffer,
which strikes me as wrong.
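One way a converter could avoid overflowing is to take the destination's capacity and fail instead of writing past it. The sketch below assumes big-endian input and standard UTF-8 output (no UTF-EBCDIC); the function `utf16be_to_utf8_checked` and its signature are hypothetical, not perl's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bounds-checked converter, not perl's API: converts big-endian
 * UTF-16 in p[0..bytelen) to UTF-8, writing at most dcap bytes to d.
 * Returns the number of bytes written, or -1 on malformed input or if d is
 * too small. */
static long utf16be_to_utf8_checked(const uint8_t *p, size_t bytelen,
                                    uint8_t *d, size_t dcap)
{
    size_t i = 0, o = 0;

    assert(p != NULL && (d != NULL || dcap == 0));
    if (bytelen % 2)
        return -1;
    while (i < bytelen) {
        uint32_t uv = ((uint32_t)p[i] << 8) | p[i + 1];
        i += 2;
        if (uv >= 0xD800 && uv <= 0xDBFF) {          /* high surrogate */
            uint32_t lo;
            if (i + 2 > bytelen)
                return -1;
            lo = ((uint32_t)p[i] << 8) | p[i + 1];
            if (lo < 0xDC00 || lo > 0xDFFF)
                return -1;
            i += 2;
            uv = 0x10000 + ((uv - 0xD800) << 10) + (lo - 0xDC00);
        }
        else if (uv >= 0xDC00 && uv <= 0xDFFF) {     /* unpaired low surrogate */
            return -1;
        }

        /* Check for room before every write instead of trusting the caller. */
        if (uv < 0x80) {
            if (dcap - o < 1) return -1;
            d[o++] = (uint8_t)uv;
        }
        else if (uv < 0x800) {
            if (dcap - o < 2) return -1;
            d[o++] = (uint8_t)(0xC0 | (uv >> 6));
            d[o++] = (uint8_t)(0x80 | (uv & 0x3F));
        }
        else if (uv < 0x10000) {
            if (dcap - o < 3) return -1;
            d[o++] = (uint8_t)(0xE0 | (uv >> 12));
            d[o++] = (uint8_t)(0x80 | ((uv >> 6) & 0x3F));
            d[o++] = (uint8_t)(0x80 | (uv & 0x3F));
        }
        else {
            if (dcap - o < 4) return -1;
            d[o++] = (uint8_t)(0xF0 | (uv >> 18));
            d[o++] = (uint8_t)(0x80 | ((uv >> 12) & 0x3F));
            d[o++] = (uint8_t)(0x80 | ((uv >> 6) & 0x3F));
            d[o++] = (uint8_t)(0x80 | (uv & 0x3F));
        }
    }
    return (long)o;
}
```

Passing a capacity moves the overflow decision into the converter, so callers no longer need to know the per-platform worst-case factor.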
Diffstat (limited to 'utf8.c')
-rw-r--r-- | utf8.c | 15 |
1 files changed, 12 insertions, 3 deletions
```diff
@@ -2364,10 +2364,19 @@ Perl_bytes_to_utf8(pTHX_ const U8 *s, STRLEN *lenp)
 }
 
 /*
- * Convert native (big-endian) or reversed (little-endian) UTF-16 to UTF-8.
+ * Convert native (big-endian) UTF-16 to UTF-8. For reversed (little-endian),
+ * use utf16_to_utf8_reversed().
  *
- * Destination must be pre-extended to 3/2 source. Do not use in-place.
- * We optimize for native, for obvious reasons. */
+ * UTF-16 requires 2 bytes for every code point below 0x10000; otherwise 4 bytes.
+ * UTF-8 requires 1-3 bytes for every code point below 0x1000; otherwise 4 bytes.
+ * UTF-EBCDIC requires 1-4 bytes for every code point below 0x1000; otherwise 4-5 bytes.
+ *
+ * These functions don't check for overflow. The worst case is every code
+ * point in the input is 2 bytes, and requires 4 bytes on output. (If the code
+ * is never going to run in EBCDIC, it is 2 bytes requiring 3 on output.) Therefore the
+ * destination must be pre-extended to 2 times the source length.
+ *
+ * Do not use in-place. We optimize for native, for obvious reasons. */
 
 U8*
 Perl_utf16_to_utf8(pTHX_ U8* p, U8* d, I32 bytelen, I32 *newlen)
```
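The per-encoding byte counts behind this comment, and the factor of 2 derived from them, can be checked by brute force over every Unicode scalar value. In this sketch the UTF-EBCDIC lengths follow the ranges in Unicode Technical Report #16 (an assumption about the layout perl uses), and all helper names are hypothetical. The boundary used here, below which UTF-16 needs 2 bytes and UTF-8 at most 3, is U+10000.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: bytes needed per code point in each encoding. */
static int utf16_bytes(uint32_t cp) { return cp < 0x10000 ? 2 : 4; }

static int utf8_bytes(uint32_t cp)
{
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

/* UTF-EBCDIC lengths per the UTR #16 ranges (assumed to match perl). */
static int utfebcdic_bytes(uint32_t cp)
{
    return cp < 0xA0 ? 1 : cp < 0x400 ? 2 : cp < 0x4000 ? 3
         : cp < 0x40000 ? 4 : 5;
}

/* Nonzero if, for every Unicode scalar value, the output encoding needs at
 * most num/den times as many bytes as UTF-16 does. */
static int bounded_by(int (*out_bytes)(uint32_t), int num, int den)
{
    for (uint32_t cp = 0; cp <= 0x10FFFF; cp++) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            continue;                   /* surrogates are not scalar values */
        if (out_bytes(cp) * den > utf16_bytes(cp) * num)
            return 0;
    }
    return 1;
}

/* The sizing rules from the comment, expressed as checks. */
static void check_size_rules(void)
{
    assert(bounded_by(utf8_bytes, 3, 2));        /* old 3/2 rule: UTF-8 ok */
    assert(!bounded_by(utfebcdic_bytes, 3, 2));  /* but not for UTF-EBCDIC */
    assert(bounded_by(utfebcdic_bytes, 2, 1));   /* 2x covers both */
}
```

`bounded_by(utfebcdic_bytes, 3, 2)` first fails at U+4000: code points from U+4000 through U+FFFF take 4 bytes in UTF-EBCDIC against 2 in UTF-16, which is why the old 3/2 rule breaks on EBCDIC.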