author Father Chrysostomos <sprout@cpan.org> 2011-09-29 08:48:38 -0700
committer Father Chrysostomos <sprout@cpan.org> 2011-10-06 13:01:10 -0700
commit c682ebef862f40c7b7ed8a6175ecb457b9981787
tree 1fd18653eeb152b22027bdae16c29d35e89022d0 /utf8.c
parent 204e6232679d0d412347fddd9e5bd0e529da73d5
mro.c: Correct utf8 and bytes concatenation
The previous commit introduced some code that concatenates a pv on to an sv and then does SvUTF8_on on the sv if the pv was utf8.

That can’t work if the sv was in Latin-1 (or single-byte) encoding and contained extra-ASCII characters. Nor can it work if bytes are appended to a utf8 sv. Both produce mangled utf8.

There is apparently no function apart from sv_catsv that handles this. So I’ve modified sv_catpvn_flags to handle this if passed the SV_CATUTF8 (concatenating a utf8 pv) or SV_CATBYTES (concatenating a byte pv) flag.

This avoids the overhead of creating a new sv (in fact, sv_catsv even copies its rhs in some cases, so that would mean creating two new svs). It might even be worthwhile to redefine sv_catsv in terms of this...
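The failure mode described above can be illustrated outside of perl with a small standalone C sketch. It is not the code from sv_catpvn_flags: the demo_str type and the demo_latin1_to_utf8 and demo_cat functions are hypothetical names invented for this illustration, and the src_is_utf8 argument only loosely plays the role of the SV_CATUTF8/SV_CATBYTES flags. The idea is that whichever side is in single-byte encoding has to be upgraded to utf8 before the bytes are joined, rather than joining first and flipping the flag afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a scalar: a byte buffer plus a flag saying
   whether those bytes are to be read as utf8 (assumption, not perl's SV). */
typedef struct {
    unsigned char *buf;
    size_t len;
    int is_utf8;
} demo_str;

/* Upgrade Latin-1 bytes to utf8: bytes below 0x80 are copied, every other
   byte becomes a two-byte sequence. Returns a malloc'd, NUL-terminated
   buffer and stores its length in *out_len; returns NULL on failure. */
static unsigned char *
demo_latin1_to_utf8(const unsigned char *s, size_t len, size_t *out_len)
{
    unsigned char *d = malloc(len * 2 + 1);
    size_t n = 0;
    size_t i;

    assert(s != NULL || len == 0);
    if (!d)
        return NULL;
    for (i = 0; i < len; i++) {
        if (s[i] < 0x80) {
            d[n++] = s[i];
        } else {
            d[n++] = (unsigned char)(0xC0 | (s[i] >> 6));
            d[n++] = (unsigned char)(0x80 | (s[i] & 0x3F));
        }
    }
    d[n] = '\0';
    *out_len = n;
    return d;
}

/* Append src to dst, upgrading whichever side is single-byte when the
   encodings differ. Returns 0 on success, -1 on allocation failure. */
static int
demo_cat(demo_str *dst, const unsigned char *src, size_t src_len,
         int src_is_utf8)
{
    unsigned char *tmp = NULL;
    unsigned char *grown;

    if (src_is_utf8 && !dst->is_utf8) {
        /* utf8 appended to bytes: upgrade the destination first. */
        size_t n;
        unsigned char *up = demo_latin1_to_utf8(dst->buf, dst->len, &n);
        if (!up)
            return -1;
        free(dst->buf);
        dst->buf = up;
        dst->len = n;
        dst->is_utf8 = 1;
    } else if (!src_is_utf8 && dst->is_utf8) {
        /* bytes appended to utf8: upgrade a copy of the source. */
        tmp = demo_latin1_to_utf8(src, src_len, &src_len);
        if (!tmp)
            return -1;
        src = tmp;
    }

    grown = realloc(dst->buf, dst->len + src_len + 1);
    if (!grown) {
        free(tmp);
        return -1;
    }
    if (src_len)
        memcpy(grown + dst->len, src, src_len);
    dst->len += src_len;
    grown[dst->len] = '\0';
    dst->buf = grown;
    free(tmp);
    return 0;
}
```

In this sketch, appending the utf8 bytes for U+20AC to a buffer holding the single Latin-1 byte 0xE9 first rewrites that byte as 0xC3 0xA9. Simply setting the utf8 flag on the joined bytes would instead leave a lone 0xE9 lead byte in front of the new character, which is the mangling the commit message describes.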
Diffstat (limited to 'utf8.c')
-rw-r--r-- utf8.c 3
1 files changed, 3 insertions, 0 deletions
diff --git a/utf8.c b/utf8.c
index 1773f2e34c..69ab6b977b 100644
--- a/utf8.c
+++ b/utf8.c
@@ -1091,6 +1091,9 @@ see sv_recode_to_utf8().
=cut
*/
+/* This logic is duplicated in sv_catpvn_flags, so any bug fixes will
+   likewise need duplication. */
+
U8*
Perl_bytes_to_utf8(pTHX_ const U8 *s, STRLEN *len)
{