summaryrefslogtreecommitdiff
path: root/regcomp.h
diff options
context:
space:
mode:
authorDavid Mitchell <davem@iabyn.com>2011-08-25 11:41:49 +0100
committerDavid Mitchell <davem@iabyn.com>2012-06-13 13:25:49 +0100
commit68e2671bec1b01022978d5d5eb6eee8742396e13 (patch)
treeb717536b9af0d3627b948d49e60ecb113aa34d16 /regcomp.h
parent74529a43ce600615669683dcaf9e9521d374031c (diff)
downloadperl-68e2671bec1b01022978d5d5eb6eee8742396e13.tar.gz
Mostly complete fix for literal /(?{..})/ blocks
Change the way that code blocks in patterns are parsed and executed, especially as regards lexical and scoping behaviour. (Note that this fix only applies to literal code blocks appearing within patterns: run-time patterns, and literals within qr//, are still done the old broken way for now). This change means that for literal /(?{..})/ and /(??{..})/: * the code block is now fully parsed in the same pass as the surrounding code, which means that the compiler no longer just does a simplistic count of balancing {} to find the limits of the code block; i.e. stuff like /(?{ $x = "{" })/ now works (in the same way that subscripts in double quoted strings always have: "$a{'{'}" ) * Error and warning messages will now appear to emanate from the main body rather than an re_eval; e.g. the output from #!/usr/bin/perl /(?{ warn "boo" })/ has changed from boo at (re_eval 1) line 1. to boo at /tmp/p line 2. * scope and closures now behave as you might expect; for example for my $x (qw(a b c)) { "" =~ /(?{ print $x })/ } now prints "abc" rather than "" * with recursion, it now finds the lexical within the appropriate depth of pad: this code now prints "012" rather than "000": sub recurse { my ($n) = @_; return if $n > 2; "" =~ /^(?{print $n})/; recurse($n+1); } recurse(0); * an earlier fix that stopped 'my' declarations within code blocks causing crashes, required the accumulating of two SAVECOMPPADs on the stack for each iteration of the code block; this is no longer needed; * UNITCHECK blocks within literal code blocks are now run as part of the main body of code (run-time code blocks still trigger an immediate call to the UNITCHECK block though) This is all achieved by building upon the efforts of the commits which led up to this; those altered the parser to parse literal code blocks directly, but up until now those code blocks were discarded by Perl_pmruntime and the block re-compiled using the original re_eval mechanism. As of this commit, for the non-qr and non-runtime variants, those code blocks are no longer thrown away. Instead: * the LISTOP generated by the parser, which contains all the code blocks plus OP_CONSTs that collectively make up the literal pattern, is now stored in a new field in PMOPs, called op_code_list. For example in /A(?{BLOCK})C/, the listop stored in op_code_list looks like LIST PUSHMARK CONST['A'] NULL/special (aka a DO block) BLOCK CONST['(?{BLOCK})'] CONST['B'] * each of the code blocks has its last op set to null and is individually run through the peephole optimiser, so each one becomes a little self-contained block of code, rather than a list of blocks that run into each other; * then in re_op_compile(), we concatenate the list of CONSTs to produce a string to be compiled, but at the same time we note any DO blocks and note the start and end positions of the corresponding CONST['(?{BLOCK})']; * (if the current regex engine isn't the built-in perl one, then we just throw away the code blocks and pass the concatenated string to the engine) * then during regex compilation, whenever we encounter a '(?{', we see if it matches the index of one of the pre-compiled blocks, and if so, we store a pointer to that block in an 'l' data slot, and use the end index to skip over the text of the code body. Conversely, if the index doesn't match, then we know that it's a run-time pattern and (for now), compile it in the old way. * During execution, when an EVAL op is encountered, if data->what is 'l', then we just use the pad that was in effect when the pattern was called; i.e. we use the current pad slot of the currently executing CV that the pattern is embedded within.
Diffstat (limited to 'regcomp.h')
-rw-r--r--regcomp.h1
1 files changed, 1 insertions, 0 deletions
diff --git a/regcomp.h b/regcomp.h
index 0fdb0058b8..a9da0c97e1 100644
--- a/regcomp.h
+++ b/regcomp.h
@@ -535,6 +535,7 @@ END_EXTERN_C
* The character describes the function of the corresponding .data item:
* a - AV for paren_name_list under DEBUGGING
* f - start-class data for regstclass optimization
+ * l - start op for literal (?{EVAL}) item
* n - Root of op tree for (?{EVAL}) item
* o - Start op for (?{EVAL}) item
* p - Pad for (?{EVAL}) item