summaryrefslogtreecommitdiff
path: root/t
diff options
context:
space:
mode:
authorFather Chrysostomos <sprout@cpan.org>2013-07-23 13:15:34 -0700
committerFather Chrysostomos <sprout@cpan.org>2013-08-25 12:22:40 -0700
commit25fdce4a165b6305e760d4c8d94404ce055657a0 (patch)
tree7c3aa76b83b1518991bf23909ee072c55de29138 /t
parent428ccf1e2d78d72b07c5e959e967569a82ce07ba (diff)
downloadperl-25fdce4a165b6305e760d4c8d94404ce055657a0.tar.gz
Stop pos() from being confused by changing utf8ness
The value of pos() is stored as a byte offset. If it is stored on a tied variable or a reference (or glob), then the stringification could change, resulting in pos() now pointing to a different character off- set or pointing to the middle of a character: $ ./perl -Ilib -le '$x = bless [], chr 256; pos $x=1; bless $x, a; print pos $x' 2 $ ./perl -Ilib -le '$x = bless [], chr 256; pos $x=1; bless $x, "\x{1000}"; print pos $x' Malformed UTF-8 character (unexpected end of string) in match position at -e line 1. 0 So pos() should be stored as a character offset. The regular expression engine expects byte offsets always, so allow it to store bytes when possible (a pure non-magical string) but use char- acters otherwise. This does result in more complexity than I should like, but the alter- native (always storing a character offset) would slow down regular expressions, which is a big no-no.
Diffstat (limited to 't')
-rw-r--r--t/op/pos.t33
1 files changed, 32 insertions, 1 deletions
diff --git a/t/op/pos.t b/t/op/pos.t
index b9691adc04..e9c863b5e4 100644
--- a/t/op/pos.t
+++ b/t/op/pos.t
@@ -6,7 +6,7 @@ BEGIN {
require './test.pl';
}
-plan tests => 22;
+plan tests => 28;
$x='banana';
$x=~/.a/g;
@@ -91,3 +91,34 @@ sub {
pos $h{n} = 1;
ok $_[3] =~ /\Ge/, '\G works with defelem scalars';
}->($h{k}, $h{l}, $h{m}, $h{n});
+
+$x = bless [], chr 256;
+pos $x=1;
+bless $x, a;
+is pos($x), 1, 'pos is not affected by reference stringification changing';
+{
+ my $w;
+ local $SIG{__WARN__} = sub { $w .= shift };
+ $x = bless [], chr 256;
+ pos $x=1;
+ bless $x, "\x{1000}";
+ is pos $x, 1,
+ 'pos unchanged after increasing size of chars in stringification';
+ is $w, undef, 'and no malformed utf8 warning';
+}
+$x = bless [], chr 256;
+$x =~ /.(?{
+ bless $x, a;
+ is pos($x), 1, 'pos unaffected by ref str changing (in re-eval)';
+})/;
+{
+ my $w;
+ local $SIG{__WARN__} = sub { $w .= shift };
+ $x = bless [], chr(256);
+ $x =~ /.(?{
+ bless $x, "\x{1000}";
+ is pos $x, 1,
+ 'pos unchanged in re-eval after increasing size of chars in str';
+ })/;
+ is $w, undef, 'and no malformed utf8 warning';
+}