author | David Mitchell <davem@iabyn.com> | 2016-12-10 15:06:30 +0000 |
---|---|---|
committer | David Mitchell <davem@iabyn.com> | 2016-12-10 15:50:12 +0000 |
commit | 98d5e3efa825adce1bfa065a5deed791c30162ac (patch) | |
tree | e591c8784b775e2f3a8a1cafb0af37074d3c8e2a /t/re/reg_eval.t | |
parent | 88e94c8da78227f5d839a2b1aca58b5e4fd24364 (diff) | |
download | perl-98d5e3efa825adce1bfa065a5deed791c30162ac.tar.gz |
misaligned buffer with heredoc and /(?{...})/
RT #129199
When an re_eval like /(?{...})/ is tokenised, as well as tokenising the
individual elements of the code, the whole src string is returned as a
constant too, to enable the stringification of the regex to be calculated.
For example,
    /abc(?{$x})def/
is tokenised like
    MATCH '('
        CONST('abc')
        DO '{' '$' CONST('x') '}'
        ','
        CONST('(?{$x})')
        ','
        CONST('def'),
    ')'
If the code within the (?{...}) contains a heredoc (<<) and the PL_linestr
buffer happens to get reallocated, the pointer which points to the start
of the code string will get adjusted using the wrong buffer pointer.
Later, when the end of the code is reached and the whole code string '(?{$x})'
is copied to a new SV, garbage may get copied (or it may panic with a negative
length, out of memory, etc.). Note that this garbage will only be used for the
string representation of the regex, e.g.
    my $r = qr/abc(?{$x})def/;
    print "$r"; # garbage used here
    /xyz$r/; # garbage not used here
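The distinction above can be observed from Perl code. What follows is a minimal sketch, not part of the commit; the variable names $str and $matched and the sample string "xyzabcdef" are made up for illustration. On a perl without the bug, the stringified regex contains the code block's source text verbatim, while interpolating the qr// object into another pattern matches without relying on that text.

```perl
use strict;
use warnings;

my $x = 1;    # value returned by the code block; unused by the match itself

my $r = qr/abc(?{$x})def/;

# Stringifying the qr// object is where the saved source text of the
# code block, '(?{$x})', ends up being used.
my $str = "$r";
print "stringified: $str\n";

# Interpolating the qr// object into a larger pattern reuses the already
# compiled code block, so the saved text plays no part in the match.
my $matched = ("xyzabcdef" =~ /xyz$r/) ? 1 : 0;
print "matched: $matched\n";
```

The exact wrapper that perl puts around the stringified pattern varies between versions, so the sketch only relies on the '(?{$x})' substring being present.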
Diffstat (limited to 't/re/reg_eval.t')
-rw-r--r-- | t/re/reg_eval.t | 6 |
1 files changed, 6 insertions, 0 deletions
diff --git a/t/re/reg_eval.t b/t/re/reg_eval.t
index 09bc3d4c0a..b492178ec6 100644
--- a/t/re/reg_eval.t
+++ b/t/re/reg_eval.t
@@ -83,4 +83,10 @@ fresh_perl_is($preamble . <<'CODE', 'no match ::', {}, 'regex distillation 4');
 match("Jim Jones, 35 years old, secret wombat 007.");
 CODE
 
+# RT #129199: this is mainly for ASAN etc's benefit
+fresh_perl_is(<<'CODE', '', {}, "RT #129199:");
+/(?{<<""})/
+0
+CODE
+
 done_testing;