author | Father Chrysostomos <sprout@cpan.org> | 2011-09-29 08:48:38 -0700 |
---|---|---|
committer | Father Chrysostomos <sprout@cpan.org> | 2011-10-06 13:01:10 -0700 |
commit | c682ebef862f40c7b7ed8a6175ecb457b9981787 (patch) | |
tree | 1fd18653eeb152b22027bdae16c29d35e89022d0 /utf8.c | |
parent | 204e6232679d0d412347fddd9e5bd0e529da73d5 (diff) | |
download | perl-c682ebef862f40c7b7ed8a6175ecb457b9981787.tar.gz |
mro.c: Correct utf8 and bytes concatenation
The previous commit introduced some code that concatenates a pv on to
an sv and then does SvUTF8_on on the sv if the pv was utf8.
That can’t work if the sv was in Latin-1 (or single-byte) encoding
and contained extra-ASCII characters. Nor can it work if bytes are
appended to a utf8 sv. Both produce mangled utf8.
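
The mangling is easy to reproduce outside of perl. The following is a minimal sketch in plain C, not perl's code: `looks_like_utf8` and `naive_concat` are hypothetical helpers, and the validity check is deliberately simplified (it only inspects lead and continuation bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified well-formedness check (hypothetical helper): verifies lead
   bytes and continuation bytes only, with no overlong or surrogate checks. */
static int looks_like_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        size_t extra;
        if (s[i] < 0x80)                extra = 0;
        else if ((s[i] & 0xE0) == 0xC0) extra = 1;
        else if ((s[i] & 0xF0) == 0xE0) extra = 2;
        else if ((s[i] & 0xF8) == 0xF0) extra = 3;
        else return 0;                      /* stray continuation or bad lead */
        if (extra > len - i - 1) return 0;  /* sequence is truncated */
        for (size_t k = 1; k <= extra; k++)
            if ((s[i + k] & 0xC0) != 0x80) return 0;
        i += extra + 1;
    }
    return 1;
}

/* Naive concatenation (hypothetical helper): copies the raw bytes of both
   operands without transcoding either one. Returns the combined length. */
static size_t naive_concat(unsigned char *out,
                           const unsigned char *a, size_t alen,
                           const unsigned char *b, size_t blen)
{
    assert(out && a && b);
    memcpy(out, a, alen);
    memcpy(out + alen, b, blen);
    return alen + blen;
}
```

Concatenating the Latin-1 byte 0xE9 with the UTF-8 sequence 0xC3 0xA9, in either order, gives a buffer that this check rejects, which is why turning the utf8 flag on afterwards cannot be correct.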
There is apparently no function apart from sv_catsv that handles
this. So I’ve modified sv_catpvn_flags to handle this if passed the
SV_CATUTF8 (concatenating a utf8 pv) or SV_CATBYTES (concatenating a
byte pv) flag.
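
The idea behind the two flags can be sketched in standalone C. This is an illustration under stated assumptions, not the code in sv.c: `Buf`, `buf_cat_flags`, `CAT_UTF8` and `CAT_BYTES` are hypothetical stand-ins for an sv, sv_catpvn_flags, SV_CATUTF8 and SV_CATBYTES, and the buffer has a fixed capacity to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for SV_CATUTF8 / SV_CATBYTES. */
enum { CAT_UTF8 = 1, CAT_BYTES = 2 };

/* Hypothetical stand-in for an sv: bytes plus a "contents are utf8" flag. */
typedef struct {
    unsigned char data[64];
    size_t len;
    int is_utf8;
} Buf;

/* Write one Latin-1 byte as UTF-8; returns the number of bytes written. */
static size_t put_upgraded(unsigned char *out, unsigned char c)
{
    if (c < 0x80) { out[0] = c; return 1; }
    out[0] = (unsigned char)(0xC0 | (c >> 6));
    out[1] = (unsigned char)(0x80 | (c & 0x3F));
    return 2;
}

/* Re-encode the whole buffer from Latin-1 to UTF-8. Returns -1 if it
   might not fit (conservative check), 0 on success. */
static int buf_upgrade(Buf *b)
{
    unsigned char tmp[sizeof b->data];
    size_t n = 0;
    for (size_t i = 0; i < b->len; i++) {
        if (n + 2 > sizeof tmp) return -1;
        n += put_upgraded(tmp + n, b->data[i]);
    }
    memcpy(b->data, tmp, n);
    b->len = n;
    b->is_utf8 = 1;
    return 0;
}

/* Append n bytes at p; flags says how those bytes are encoded. */
static int buf_cat_flags(Buf *b, const unsigned char *p, size_t n, int flags)
{
    assert(b && p);
    /* utf8 onto a byte buffer: upgrade the buffer first. */
    if ((flags & CAT_UTF8) && !b->is_utf8 && buf_upgrade(b) != 0)
        return -1;
    /* bytes onto a utf8 buffer: upgrade each incoming byte instead. */
    if ((flags & CAT_BYTES) && b->is_utf8) {
        for (size_t i = 0; i < n; i++) {
            if (b->len + 2 > sizeof b->data) return -1;
            b->len += put_upgraded(b->data + b->len, p[i]);
        }
        return 0;
    }
    /* Encodings already agree: plain byte copy. */
    if (n > sizeof b->data - b->len) return -1;
    memcpy(b->data + b->len, p, n);
    b->len += n;
    return 0;
}
```

In this sketch the conversion happens in place or during the copy, so no temporary buffer object has to be created for the right-hand side, which is the overhead the sv_catsv route would incur.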
This avoids the overhead of creating a new sv (in fact, sv_catsv
even copies its rhs in some cases, so that would mean creating two
new svs). It might even be worthwhile to redefine sv_catsv in terms
of this....
Diffstat (limited to 'utf8.c')
-rw-r--r-- | utf8.c | 3 |
1 files changed, 3 insertions, 0 deletions
@@ -1091,6 +1091,9 @@ see sv_recode_to_utf8().
 =cut
 */
 
+/* This logic is duplicated in sv_catpvn_flags, so any bug fixes will
+   likewise need duplication. */
+
 U8*
 Perl_bytes_to_utf8(pTHX_ const U8 *s, STRLEN *len)
 {