author     David Mitchell <davem@iabyn.com>  2011-07-23 21:29:02 +0100
committer  David Mitchell <davem@iabyn.com>  2012-06-13 13:25:48 +0100
commit     9da1dd8f41c98df713957a658aa8fefcdf57163c
tree       6e699deb4737149b04a9810c392ed0f7e082fb6b /t/run
parent     ecd2417115c94639d7dd7fb772dc8fc460b0ed57
make re_evals be seen by the toker/parser
This commit is a first step towards making the handling of /(?{...})/ more sane. But see the big proviso at the end.

Currently a pattern like /a(?{...})b/ is not interpreted by the lexer and parser; it is instead passed as-is to the regex compiler, which is responsible for extracting and compiling the embedded perl code. The only thing the quoted-string code in the lexer currently does is skip nested matched {}'s, in order to get to the end of the code block and resume looking for interpolated variables, \Q etc.

This commit makes the lexer smarter. Consider the following pattern:

    /FOO(?{BLOCK})BAR$var/

This is currently tokenised as

    op_match ( op_const["FOO(?{BLOCK})BAR"] , $ "var" )

With this commit, it is instead tokenised as:

    op_match (
        op_const["FOO"] ,
        DO { BLOCK ; } ,
        op_const["(?{BLOCK})"] ,
        op_const["BAR"] ,
        $ "var"
    )

This means that BLOCK is itself tokenised and parsed. We also insert a const into the stream containing the original source text of BLOCK, so that it is available for stringifying qr's etc.

Note that by giving the lexer/parser direct access to BLOCK, we can now handle things like

    /(?{"{"})/

This mechanism is similar to the way something like

    "abc $a[foo(q(]))] def"

is currently parsed: the double-quoted string handler in the lexer stops at $a[, the 'foo(q(]))' is treated as perl code, and at the end control is passed back to the string handler to handle the ' def'.

This commit includes a new error message:

    Sequence (?{...}) not terminated with ')'

since, when control is passed back to the quoted-string handler, it expects to find the ')' as the next char. This new error mostly replaces the old

    Sequence (?{...}) not terminated or not {}-balanced in regex

Big proviso: this commit updates toke.c to recognise the embedded code, but doesn't then do anything with it. The parser will pass both a compiled do block and a const for each embedded (?{..}), and Perl_pmruntime just throws away the do block and keeps the constant text, which is passed to the regex compiler. So currently each code block gets compiled twice (!), with two sets of warnings etc. The next stage will be to pass these do blocks to the regex compiler.

This commit is based on a patch I had originally worked up about 6 years ago, which has been sitting bit-rotting ever since.
Diffstat (limited to 't/run')
-rw-r--r--  t/run/fresh_perl.t  4
1 file changed, 1 insertion, 3 deletions
diff --git a/t/run/fresh_perl.t b/t/run/fresh_perl.t
index 9c76a64f46..e1ffc1b823 100644
--- a/t/run/fresh_perl.t
+++ b/t/run/fresh_perl.t
@@ -355,9 +355,7 @@ Sequence (?{...}) not terminated or not {}-balanced in regex; marked by <-- HERE
########
/(?{"{"}})/ # Check it outside of eval too
EXPECT
-Unmatched right curly bracket at (re_eval 1) line 1, at end of line
-syntax error at (re_eval 1) line 1, near ""{"}"
-Compilation failed in regexp at - line 1.
+Sequence (?{...}) not terminated with ')' at - line 1.
########
BEGIN { @ARGV = qw(a b c d e) }
BEGIN { print "argv <@ARGV>\nbegin <",shift,">\n" }