author     Father Chrysostomos <sprout@cpan.org>  2013-07-23 13:15:34 -0700
committer  Father Chrysostomos <sprout@cpan.org>  2013-08-25 12:22:40 -0700
commit     25fdce4a165b6305e760d4c8d94404ce055657a0
tree       7c3aa76b83b1518991bf23909ee072c55de29138 /embed.h
parent     428ccf1e2d78d72b07c5e959e967569a82ce07ba
Stop pos() from being confused by changing utf8ness
The value of pos() is stored as a byte offset.  If it is stored on a
tied variable or a reference (or glob), then the stringification could
change, resulting in pos() now pointing to a different character
offset or pointing to the middle of a character:

    $ ./perl -Ilib -le '$x = bless [], chr 256; pos $x=1; bless $x, a; print pos $x'
    2
    $ ./perl -Ilib -le '$x = bless [], chr 256; pos $x=1; bless $x, "\x{1000}"; print pos $x'
    Malformed UTF-8 character (unexpected end of string) in match position at -e line 1.
    0

So pos() should be stored as a character offset.

The regular expression engine expects byte offsets always, so allow it
to store bytes when possible (a pure non-magical string) but use
characters otherwise.

This does result in more complexity than I should like, but the
alternative (always storing a character offset) would slow down
regular expressions, which is a big no-no.
Diffstat (limited to 'embed.h')
-rw-r--r-- embed.h 1
1 files changed, 1 insertions, 0 deletions
diff --git a/embed.h b/embed.h
index 4c62a834a9..49700ca352 100644
--- a/embed.h
+++ b/embed.h
@@ -872,6 +872,7 @@
 #define regprop(a,b,c) Perl_regprop(aTHX_ a,b,c)
 #define report_uninit(a) Perl_report_uninit(aTHX_ a)
 #define sv_magicext_mglob(a) Perl_sv_magicext_mglob(aTHX_ a)
+#define sv_or_pv_pos_u2b(a,b,c,d) S_sv_or_pv_pos_u2b(aTHX_ a,b,c,d)
 #define validate_proto(a,b,c) Perl_validate_proto(aTHX_ a,b,c)
 #define vivify_defelem(a) Perl_vivify_defelem(aTHX_ a)
 #define yylex() Perl_yylex(aTHX)