| field | value | timestamp |
|---|---|---|
| author | David Mitchell <davem@iabyn.com> | 2016-09-15 10:59:37 +0100 |
| committer | David Mitchell <davem@iabyn.com> | 2016-10-04 11:18:40 +0100 |
| commit | 5012eebe5586df96a1869edfedea1382aa254085 | |
| tree | 1ade02c4dd69a3204fb5db3a1b8588f6854c2946 /pp.c | |
| parent | 1c5665476f0d7250c7d93f82eab2b7cda1e6937f | |
make OP_SPLIT a PMOP, and eliminate OP_PUSHRE
Most ops that execute a regex, such as match and subst, are of type PMOP.
A PMOP allows the actual regex to be attached directly to that op, due
to its extra fields.
OP_SPLIT is different; it is just a plain LISTOP, but it always has an
OP_PUSHRE as its first child, which *is* a PMOP and which has the regex
attached.
At runtime, pp_pushre()'s only job is to push itself (i.e. the current
PL_op) onto the stack. Later pp_split() pops this to get access to the
regex it wants to execute.
This is a bit unpleasant, because we're pushing an OP* onto the stack,
which is supposed to be an array of SV*s. As a bit of a hack, on
DEBUGGING builds we push a PVLV with the PL_op address embedded instead,
but this still isn't very satisfactory.
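To make the old protocol concrete, here is a deliberately tiny C model of it. Everything in it is hypothetical: `ToySV`, `ToyOp`, `toy_stack`, `toy_pushre()` and `toy_split_old()` are stand-ins invented for illustration, not perl's real types or functions, and the DEBUGGING-build PVLV wrapper is left out. It only shows the shape of the trick: one op pushes a pointer to itself into a slot typed as an SV pointer, and the next op pops that slot and casts it back.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT perl's real SV/OP definitions. */
typedef struct toy_sv { int flags; } ToySV;
typedef struct toy_op { const char *regex; } ToyOp; /* the "pushre" op carries the regex */

/* A value stack whose slots are declared to hold ToySV pointers. */
static ToySV *toy_stack[16];
static size_t toy_sp = 0;

/* Old-style pushre: push the op itself, disguised as a ToySV pointer. */
static void toy_pushre(ToyOp *self)
{
    assert(toy_sp < sizeof toy_stack / sizeof toy_stack[0]);
    toy_stack[toy_sp++] = (ToySV *)self; /* an op pointer in an SV slot: the hack */
}

/* Old-style split: pop that slot and cast it back to reach the regex. */
static const char *toy_split_old(void)
{
    assert(toy_sp > 0);
    ToyOp *pushre = (ToyOp *)toy_stack[--toy_sp];
    return pushre->regex;
}
```

The cast round-trip is legal C, but nothing in the types stops other code from treating that slot as a genuine SV, which is the unpleasantness described above.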
Now that regexes are first-class SVs, we could push a REGEXP onto the
stack rather than PL_op. However, there is an optimisation of @array =
split which eliminates the assign and embeds the array's GV/padix directly
in the PUSHRE op. So split still needs access to that op. But the pushre
op will always be splitop->op_first anyway, so one possibility is to just
skip executing the pushre altogether, and make pp_split just directly
access op_first instead to get the regex and @array info.
But if we're doing that, then why not just go the whole hog and make
OP_SPLIT into a PMOP, and eliminate the OP_PUSHRE op entirely: with the
data that was spread across the two ops now combined into just the one
split op.
That is exactly what this commit does.
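For contrast, here is a similar toy model of the new arrangement, again using hypothetical names (`ToyOp`, `ToyPmop`, `ToyRegexp`, `toy_split_regex()`) rather than perl's own. The regex hangs directly off the split op, so the run-time function just casts the currently executing op to the PMOP-like type, much as the patched pp_split() does with `cPMOPx(PL_op)`. No extra op or stack slot is involved. The `assert()` calls echo the commit's replacement of panic messages with assertions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT perl's real OP/PMOP/REGEXP definitions. */
typedef struct toy_regexp { const char *pattern; } ToyRegexp;

typedef struct toy_op { int type; } ToyOp; /* fields common to every op */

typedef struct toy_pmop {
    ToyOp      base;  /* first member, so a ToyOp* can be cast back */
    ToyRegexp *regex; /* the regex is attached to the op itself */
} ToyPmop;

/* New-style split: the executing op *is* the PMOP, so the regex is
 * read straight off it; nothing is popped from a stack. */
static const ToyRegexp *toy_split_regex(const ToyOp *current_op)
{
    assert(current_op != NULL);
    const ToyPmop *pm = (const ToyPmop *)current_op;
    assert(pm->regex != NULL);
    return pm->regex;
}
```

Putting the base struct first is what makes the downcast valid in C, which mirrors how a PMOP extends a plain op with extra fields.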
For a simple compile-time pattern like split(/foo/, $s, 1), the optree
looks like:
before:

    <@> split[t2] lK
       </> pushre(/"foo"/) s/RTIME
       <0> padsv[$s:1,2] s
       <$> const(IV 1) s

after:

    </> split(/"foo"/)[t2] lK/RTIME
       <0> padsv[$s:1,2] s
       <$> const[IV 1] s

while for a run-time expression like split(/$pat/, $s, 1),
before:

    <@> split[t3] lK
       </> pushre() sK/RTIME
          <|> regcomp(other->8) sK
             <0> padsv[$pat:2,3] s
       <0> padsv[$s:1,3] s
       <$> const(IV 1) s

after:

    </> split()[t3] lK/RTIME
       <|> regcomp(other->8) sK
          <0> padsv[$pat:2,3] s
       <0> padsv[$s:1,3] s
       <$> const[IV 1] s

This makes the code faster and simpler.
At the same time, two new private flags have been added for OP_SPLIT,
OPpSPLIT_ASSIGN and OPpSPLIT_LEX, which make it explicit whether the
assign op has been optimised away and, if so, whether the array is
lexical.
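The run-time choice of destination array that these flags encode, as implemented in the pp_split() diff for this commit, can be restated as a small decision function. This is a sketch, not perl code: the bit values `TOY_SPLIT_ASSIGN` and `TOY_SPLIT_LEX`, the `toy_target` enum and `toy_split_target()` are hypothetical. The `stacked` argument stands in for the op's `OPf_STACKED` flag, which in the patch means the array was pushed onto the stack rather than recorded on the op.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values: perl's real OPpSPLIT_* constants come from
 * its own generated headers and are not reproduced here. */
enum {
    TOY_SPLIT_ASSIGN = 1 << 0, /* the "@a = ..." assign was optimised away */
    TOY_SPLIT_LEX    = 1 << 1  /* ...and @a is a lexical array */
};

enum toy_target {
    TARGET_NONE,    /* no optimised assign: results go back on the stack */
    TARGET_STACKED, /* the array itself was pushed onto the stack */
    TARGET_PAD,     /* lexical array fetched from the pad */
    TARGET_GV       /* package array fetched via its glob */
};

/* Mirrors the branch structure of the patched pp_split(). */
static enum toy_target toy_split_target(unsigned priv, bool stacked)
{
    if (!(priv & TOY_SPLIT_ASSIGN))
        return TARGET_NONE;
    if (stacked)
        return TARGET_STACKED;
    return (priv & TOY_SPLIT_LEX) ? TARGET_PAD : TARGET_GV;
}
```

In the patch itself, the lexical case reads the array directly from the pad, while the package case first obtains the glob (from the pad on threaded builds, from a GV pointer otherwise) and then calls GvAVn() on it.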
Also, deparsing of split has been improved, to the extent that
perl TEST -deparse op/split.t
now passes.
Also, a couple of panic messages in pp_split() have been replaced with
asserts().
Diffstat (limited to 'pp.c')
-rw-r--r-- | pp.c | 38 |
1 files changed, 17 insertions, 21 deletions
    @@ -5708,14 +5708,16 @@ PP(pp_reverse)
     PP(pp_split)
     {
         dSP; dTARG;
    -    AV *ary = PL_op->op_flags & OPf_STACKED ? (AV *)POPs : NULL;
    +    AV *ary = (   (PL_op->op_private & OPpSPLIT_ASSIGN)
    +               && (PL_op->op_flags & OPf_STACKED))
    +              ? (AV *)POPs : NULL;
         IV limit = POPi; /* note, negative is forever */
         SV * const sv = POPs;
         STRLEN len;
         const char *s = SvPV_const(sv, len);
         const bool do_utf8 = DO_UTF8(sv);
         const char *strend = s + len;
    -    PMOP *pm;
    +    PMOP *pm = cPMOPx(PL_op);
         REGEXP *rx;
         SV *dstr;
         const char *m;
    @@ -5736,33 +5738,26 @@ PP(pp_split)
         bool multiline = 0;
         MAGIC *mg = NULL;
     
    -#ifdef DEBUGGING
    -    Copy(&LvTARGOFF(POPs), &pm, 1, PMOP*);
    -#else
    -    pm = (PMOP*)POPs;
    -#endif
    -    if (!pm)
    -        DIE(aTHX_ "panic: pp_split, pm=%p, s=%p", pm, s);
         rx = PM_GETRE(pm);
     
         TAINT_IF(get_regex_charset(RX_EXTFLAGS(rx)) == REGEX_LOCALE_CHARSET &&
                  (RX_EXTFLAGS(rx) & (RXf_WHITE | RXf_SKIPWHITE)));
     
    +    if (PL_op->op_private & OPpSPLIT_ASSIGN) {
    +        if (!(PL_op->op_flags & OPf_STACKED)) {
    +            if (PL_op->op_private & OPpSPLIT_LEX)
    +                ary = (AV *)PAD_SVl(pm->op_pmreplrootu.op_pmtargetoff);
    +            else {
    +                GV *gv =
     #ifdef USE_ITHREADS
    -    if (pm->op_pmreplrootu.op_pmtargetoff) {
    -        ary = GvAVn(MUTABLE_GV(PAD_SVl(pm->op_pmreplrootu.op_pmtargetoff)));
    -        goto have_av;
    -    }
    +                    MUTABLE_GV(PAD_SVl(pm->op_pmreplrootu.op_pmtargetoff));
     #else
    -    if (pm->op_pmreplrootu.op_pmtargetgv) {
    -        ary = GvAVn(pm->op_pmreplrootu.op_pmtargetgv);
    -        goto have_av;
    -    }
    +                    pm->op_pmreplrootu.op_pmtargetgv;
     #endif
    -    else if (pm->op_targ)
    -        ary = (AV *)PAD_SVl(pm->op_targ);
    -    if (ary) {
    -        have_av:
    +                ary = GvAVn(gv);
    +            }
    +        }
    +
             realarray = 1;
             PUTBACK;
             av_extend(ary,0);
    @@ -5786,6 +5781,7 @@ PP(pp_split)
                 make_mortal = 0;
             }
         }
    +
         base = SP - PL_stack_base;
         orig = s;
         if (RX_EXTFLAGS(rx) & RXf_SKIPWHITE) {