diff options
author | Father Chrysostomos <sprout@cpan.org> | 2013-07-23 13:15:34 -0700 |
---|---|---|
committer | Father Chrysostomos <sprout@cpan.org> | 2013-08-25 12:22:40 -0700 |
commit | 25fdce4a165b6305e760d4c8d94404ce055657a0 (patch) | |
tree | 7c3aa76b83b1518991bf23909ee072c55de29138 /t/op/pos.t | |
parent | 428ccf1e2d78d72b07c5e959e967569a82ce07ba (diff) | |
download | perl-25fdce4a165b6305e760d4c8d94404ce055657a0.tar.gz |
Stop pos() from being confused by changing utf8ness
The value of pos() is stored as a byte offset. If it is stored on a
tied variable or a reference (or glob), then the stringification could
change, resulting in pos() now pointing to a different character off-
set or pointing to the middle of a character:
$ ./perl -Ilib -le '$x = bless [], chr 256; pos $x=1; bless $x, a; print pos $x'
2
$ ./perl -Ilib -le '$x = bless [], chr 256; pos $x=1; bless $x, "\x{1000}"; print pos $x'
Malformed UTF-8 character (unexpected end of string) in match position at -e line 1.
0
So pos() should be stored as a character offset.
The regular expression engine expects byte offsets always, so allow it
to store bytes when possible (a pure non-magical string) but use char-
acters otherwise.
This does result in more complexity than I should like, but the alter-
native (always storing a character offset) would slow down regular
expressions, which is a big no-no.
Diffstat (limited to 't/op/pos.t')
-rw-r--r-- | t/op/pos.t | 33 |
1 files changed, 32 insertions, 1 deletions
diff --git a/t/op/pos.t b/t/op/pos.t index b9691adc04..e9c863b5e4 100644 --- a/t/op/pos.t +++ b/t/op/pos.t @@ -6,7 +6,7 @@ BEGIN { require './test.pl'; } -plan tests => 22; +plan tests => 28; $x='banana'; $x=~/.a/g; @@ -91,3 +91,34 @@ sub { pos $h{n} = 1; ok $_[3] =~ /\Ge/, '\G works with defelem scalars'; }->($h{k}, $h{l}, $h{m}, $h{n}); + +$x = bless [], chr 256; +pos $x=1; +bless $x, a; +is pos($x), 1, 'pos is not affected by reference stringification changing'; +{ + my $w; + local $SIG{__WARN__} = sub { $w .= shift }; + $x = bless [], chr 256; + pos $x=1; + bless $x, "\x{1000}"; + is pos $x, 1, + 'pos unchanged after increasing size of chars in stringification'; + is $w, undef, 'and no malformed utf8 warning'; +} +$x = bless [], chr 256; +$x =~ /.(?{ + bless $x, a; + is pos($x), 1, 'pos unaffected by ref str changing (in re-eval)'; +})/; +{ + my $w; + local $SIG{__WARN__} = sub { $w .= shift }; + $x = bless [], chr(256); + $x =~ /.(?{ + bless $x, "\x{1000}"; + is pos $x, 1, + 'pos unchanged in re-eval after increasing size of chars in str'; + })/; + is $w, undef, 'and no malformed utf8 warning'; +} |