Add OP_MULTICONCAT op

Allow multiple OP_CONCAT, OP_CONST ops, plus optionally an OP_SASSIGN or OP_STRINGIFY, to be combined into a single OP_MULTICONCAT op, which can make things a *lot* faster: 4x or more. In more detail: it will optimise into a single OP_MULTICONCAT, most expressions of the form LHS RHS where LHS is one of (empty) my $lexical = $lexical = $lexical .= expression = expression .= and RHS is one of (A . B . C . ...) where A,B,C etc are expressions and/or string constants "aAbBc..." where a,A,b,B etc are expressions and/or string constants sprintf "..%s..%s..", A,B,.. where the format is a constant string containing only '%s' and '%%' elements, and A,B, etc are scalar expressions (so only a fixed, compile-time-known number of args: no arrays or list context function calls etc) It doesn't optimise other forms, such as ($a . $b) . ($c. $d) ((($a .= $b) .= $c) .= $d); (although sub-parts of those expressions might be converted to an OP_MULTICONCAT). This is partly because it would be hard to maintain the correct ordering of tie or overload calls. The compiler uses heuristics to determine when to convert: in general, expressions involving a single OP_CONCAT aren't converted, unless some other saving can be made, for example if an OP_CONST can be eliminated, or in the presence of 'my $x = .. ' which OP_MULTICONCAT can apply OPpTARGET_MY to, but OP_CONST can't. The multiconcat op is of type UNOP_AUX, with the op_aux structure directly holding a pointer to a single constant char* string plus a list of segment lengths. So for "a=$a b=$b\n"; the constant string is "a= b=\n", and the segment lengths are (2,3,1). If the constant string has different non-utf8 and utf8 representations (such as "\x80") then both variants are pre-computed and stored in the aux struct, along with two sets of segment lengths. For all the above LHS types, any SASSIGN op is optimised away. For a LHS of '$lex=', '$lex.=' or 'my $lex=', the PADSV is optimised away too. For example where $a and $b are lexical vars, this statement: my $c = "a=$a, b=$b\n"; formerly compiled to const[PV "a="] s padsv[$a:1,3] s concat[t4] sK/2 const[PV ", b="] s concat[t5] sKS/2 padsv[$b:1,3] s concat[t6] sKS/2 const[PV "\n"] s concat[t7] sKS/2 padsv[$c:2,3] sRM*/LVINTRO sassign vKS/2 and now compiles to: padsv[$a:1,3] s padsv[$b:1,3] s multiconcat("a=, b=\n",2,4,1)[$c:2,3] vK/LVINTRO,TARGMY,STRINGIFY In terms of how much faster it is, this code: my $a = "the quick brown fox jumps over the lazy dog"; my $b = "to be, or not to be; sorry, what was the question again?"; for my $i (1..10_000_000) { my $c = "a=$a, b=$b\n"; } runs 2.7 times faster, and if you throw utf8 mixtures in it gets even better. This loop runs 4 times faster: my $s; my $a = "ab\x{100}cde"; my $b = "fghij"; my $c = "\x{101}klmn"; for my $i (1..10_000_000) { $s = "\x{100}wxyz"; $s .= "foo=$a bar=$b baz=$c"; } The main ways in which OP_MULTICONCAT gains its speed are: * any OP_CONSTs are eliminated, and the constant bits (already in the right encoding) are copied directly from the constant string attached to the op's aux structure. * It optimises away any SASSIGN op, and possibly a PADSV op on the LHS, in all cases; OP_CONCAT only did this in very limited circumstances. * Because it has a holistic view of the entire concatenation expression, it can do the whole thing in one efficient go, rather than creating and copying intermediate results. pp_multiconcat() goes to considerable efforts to avoid inefficiencies. For example it will only SvGROW() the target once, and to the exact size needed, no matter what mix of utf8 and non-utf8 appear on the LHS and RHS. It never allocates any temporary SVs except possibly in the case of tie or overloading. * It does all its own appending and utf8 handling rather than calling out to functions like sv_catsv(). * It's very good at handling the LHS appearing on the RHS; for example in $x = "abcd"; $x = "-$x-$x-"; It will do roughly the equivalent of the following (where targ is $x); SvPV_force(targ); SvGROW(targ, 11); p = SvPVX(targ); Move(p, p+1, 4, char); Copy("-", p, 1, char); Copy("-", p+5, 1, char); Copy(p+1, p+6, 4, char); Copy("-", p+10, 1, char); SvCUR(targ) = 11; p[11] = '\0'; Formerly, pp_concat would have used multiple PADTMPs or temporary SVs to handle situations like that. The code is quite big; both S_maybe_multiconcat() and pp_multiconcat() (the main compile-time and runtime parts of the implementation) are over 700 lines each. It turns out that when you combine multiple ops, the number of edge cases grows exponentially ;-)
author: David Mitchell <davem@iabyn.com> 2017-08-08 18:42:14 +0100
committer: David Mitchell <davem@iabyn.com> 2017-10-31 15:31:26 +0000
commit: e839e6ed99c6b25aee589f56bb58de2f8fa00f41 (patch)
tree: 30bab03fdd8e73c4cb6e5b2d33ab1f428693a3a8 /gv.c
parent: c0acf911f65b2badbd72efa28edb2d197639a51b (diff)
download: perl-e839e6ed99c6b25aee589f56bb58de2f8fa00f41.tar.gz
1 files changed, 6 insertions, 1 deletions
diff --git a/gv.c b/gv.c
index fed5b7c2ee..90e8fe0f1e 100644
--- a/gv.c
+++ b/gv.c
@@ -3476,7 +3476,12 @@ Perl_amagic_call(pTHX_ SV *left, SV *right, int method, int flags)
     SV* res;
     const bool oldcatch = CATCH_GET;
     I32 oldmark, nret;
-    U8 gimme = force_scalar ? G_SCALAR : GIMME_V;
+                /* for multiconcat, we may call overload several times,
+                 * with the context of individual concats being scalar,
+                 * regardless of the overall context of the multiconcat op
+                 */
+    U8 gimme = (force_scalar || PL_op->op_type == OP_MULTICONCAT)
+                    ? G_SCALAR : GIMME_V;
 
     CATCH_SET(TRUE);
     Zero(&myop, 1, BINOP);
author	David Mitchell <davem@iabyn.com>	2017-08-08 18:42:14 +0100
committer	David Mitchell <davem@iabyn.com>	2017-10-31 15:31:26 +0000
commit	e839e6ed99c6b25aee589f56bb58de2f8fa00f41 (patch)
tree	30bab03fdd8e73c4cb6e5b2d33ab1f428693a3a8 /gv.c
parent	c0acf911f65b2badbd72efa28edb2d197639a51b (diff)
download	perl-e839e6ed99c6b25aee589f56bb58de2f8fa00f41.tar.gz