path: root/dump.c
Commit message [Author, Age, Files, Lines]
* Simplify some LC_NUMERIC macros [Karl Williamson, 2018-01-30, files: 1, lines: -4/+6]
| These macros are marked as subject to change and are not documented externally. I don't know what I was thinking when I named some of them, but whatever it was no longer makes sense to me. Simplify them, and change so there is only one restore macro to remember.
* op_dump(): dump tr/// translation table [David Mitchell, 2018-01-19, files: 1, lines: -3/+35]
| Previously it just displayed its address. Also, when the table is in fact a swash, don't display its address on threaded builds, as it's actually just a padix.
* revert smartmatch to 5.27.6 behaviour [Zefram, 2017-12-29, files: 1, lines: -2/+2]
| The pumpking has determined that the CPAN breakage caused by changing smartmatch [perl #132594] is too great for the smartmatch changes to stay in for 5.28.
|
| This reverts most of the merge in commit da4e040f42421764ef069371d77c008e6b801f45. All core behaviour and documentation is reverted. The removal of use of smartmatch from a couple of tests (that aren't testing smartmatch) remains. Customisation of a couple of CPAN modules to make them portable across smartmatch types remains. A small bugfix in scope.c also remains.
* internally change "when" to "whereso" [Zefram, 2017-12-05, files: 1, lines: -1/+1]
| The names of ops, context types, functions, etc., all change in accordance with the change of keyword.
* use LOOP struct for entergiven op [Zefram, 2017-11-29, files: 1, lines: -1/+1]
| This will support the upcoming change to let loop control ops apply to "given" blocks.
* change OP_MULTICONCAT nargs from UV to SSize_t [David Mitchell, 2017-11-13, files: 1, lines: -6/+5]
| Change it from unsigned to signed since it makes the SP-adjusting code in pp_multiconcat easier without hitting undefined behaviour (RT #132390); and change its size from UV to SSize_t since it represents the number of args on the stack.
* rename op_aux field from 'size' to 'ssize' [David Mitchell, 2017-11-13, files: 1, lines: -3/+3]
| This part of the op_aux union was added for OP_MULTICONCAT; it's actually of type SSize_t, so rename it to ssize to better reflect that it's signed. This should make no functional difference.
* Add OP_MULTICONCAT op [David Mitchell, 2017-10-31, files: 1, lines: -0/+56]
| Allow multiple OP_CONCAT, OP_CONST ops, plus optionally an OP_SASSIGN or OP_STRINGIFY, to be combined into a single OP_MULTICONCAT op, which can make things a *lot* faster: 4x or more.
|
| In more detail: it will optimise into a single OP_MULTICONCAT most expressions of the form
|
|     LHS RHS
|
| where LHS is one of
|
|     (empty)
|     my $lexical =
|     $lexical =
|     $lexical .=
|     expression =
|     expression .=
|
| and RHS is one of
|
|     (A . B . C . ...)
|         where A,B,C etc are expressions and/or string constants
|     "aAbBc..."
|         where a,A,b,B etc are expressions and/or string constants
|     sprintf "..%s..%s..", A,B,..
|         where the format is a constant string containing only '%s' and '%%' elements, and A,B, etc are scalar expressions (so only a fixed, compile-time-known number of args: no arrays or list context function calls etc)
|
| It doesn't optimise other forms, such as
|
|     ($a . $b) . ($c. $d)
|     ((($a .= $b) .= $c) .= $d);
|
| (although sub-parts of those expressions might be converted to an OP_MULTICONCAT). This is partly because it would be hard to maintain the correct ordering of tie or overload calls.
|
| The compiler uses heuristics to determine when to convert: in general, expressions involving a single OP_CONCAT aren't converted, unless some other saving can be made, for example if an OP_CONST can be eliminated, or in the presence of 'my $x = .. ' which OP_MULTICONCAT can apply OPpTARGET_MY to, but OP_CONST can't.
|
| The multiconcat op is of type UNOP_AUX, with the op_aux structure directly holding a pointer to a single constant char* string plus a list of segment lengths. So for
|
|     "a=$a b=$b\n";
|
| the constant string is "a= b=\n", and the segment lengths are (2,3,1). If the constant string has different non-utf8 and utf8 representations (such as "\x80") then both variants are pre-computed and stored in the aux struct, along with two sets of segment lengths.
|
| For all the above LHS types, any SASSIGN op is optimised away. For a LHS of '$lex=', '$lex.=' or 'my $lex=', the PADSV is optimised away too.
|
| For example where $a and $b are lexical vars, this statement:
|
|     my $c = "a=$a, b=$b\n";
|
| formerly compiled to
|
|     const[PV "a="] s
|     padsv[$a:1,3] s
|     concat[t4] sK/2
|     const[PV ", b="] s
|     concat[t5] sKS/2
|     padsv[$b:1,3] s
|     concat[t6] sKS/2
|     const[PV "\n"] s
|     concat[t7] sKS/2
|     padsv[$c:2,3] sRM*/LVINTRO
|     sassign vKS/2
|
| and now compiles to:
|
|     padsv[$a:1,3] s
|     padsv[$b:1,3] s
|     multiconcat("a=, b=\n",2,4,1)[$c:2,3] vK/LVINTRO,TARGMY,STRINGIFY
|
| In terms of how much faster it is, this code:
|
|     my $a = "the quick brown fox jumps over the lazy dog";
|     my $b = "to be, or not to be; sorry, what was the question again?";
|
|     for my $i (1..10_000_000) {
|         my $c = "a=$a, b=$b\n";
|     }
|
| runs 2.7 times faster, and if you throw utf8 mixtures in it gets even better. This loop runs 4 times faster:
|
|     my $s;
|     my $a = "ab\x{100}cde";
|     my $b = "fghij";
|     my $c = "\x{101}klmn";
|
|     for my $i (1..10_000_000) {
|         $s = "\x{100}wxyz";
|         $s .= "foo=$a bar=$b baz=$c";
|     }
|
| The main ways in which OP_MULTICONCAT gains its speed are:
|
|   * any OP_CONSTs are eliminated, and the constant bits (already in the right encoding) are copied directly from the constant string attached to the op's aux structure.
|
|   * It optimises away any SASSIGN op, and possibly a PADSV op on the LHS, in all cases; OP_CONCAT only did this in very limited circumstances.
|
|   * Because it has a holistic view of the entire concatenation expression, it can do the whole thing in one efficient go, rather than creating and copying intermediate results. pp_multiconcat() goes to considerable efforts to avoid inefficiencies. For example it will only SvGROW() the target once, and to the exact size needed, no matter what mix of utf8 and non-utf8 appear on the LHS and RHS. It never allocates any temporary SVs except possibly in the case of tie or overloading.
|
|   * It does all its own appending and utf8 handling rather than calling out to functions like sv_catsv().
|
|   * It's very good at handling the LHS appearing on the RHS; for example in
|
|         $x = "abcd";
|         $x = "-$x-$x-";
|
|     It will do roughly the equivalent of the following (where targ is $x);
|
|         SvPV_force(targ);
|         SvGROW(targ, 11);
|         p = SvPVX(targ);
|         Move(p, p+1, 4, char);
|         Copy("-", p, 1, char);
|         Copy("-", p+5, 1, char);
|         Copy(p+1, p+6, 4, char);
|         Copy("-", p+10, 1, char);
|         SvCUR(targ) = 11;
|         p[11] = '\0';
|
|     Formerly, pp_concat would have used multiple PADTMPs or temporary SVs to handle situations like that.
|
| The code is quite big; both S_maybe_multiconcat() and pp_multiconcat() (the main compile-time and runtime parts of the implementation) are over 700 lines each. It turns out that when you combine multiple ops, the number of edge cases grows exponentially ;-)
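| The "one constant string plus a list of segment lengths" layout can be sketched in plain C. This is only an illustration of the idea and not perl's pp_multiconcat(): the function name multiconcat_sketch and its parameters are hypothetical, arguments are plain C strings rather than SVs, and utf8, ties and overloading are ignored. It computes the final length first so that the result is allocated exactly once.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of perl): interleave the constant segments
 * packed in `consts` with `nargs` argument strings.  seg_lens must hold
 * nargs + 1 entries; constant segment i is emitted before argument i, and
 * the last segment is emitted after the final argument. */
char *multiconcat_sketch(const char *consts, const size_t *seg_lens,
                         const char *const *args, size_t nargs)
{
    size_t total = 0;
    size_t i;
    char *out;
    char *p;

    /* Pass 1: work out the exact result length so we allocate only once. */
    for (i = 0; i <= nargs; i++)
        total += seg_lens[i];
    for (i = 0; i < nargs; i++)
        total += strlen(args[i]);

    out = malloc(total + 1);
    if (!out)
        return NULL;

    /* Pass 2: copy constant segment, then argument, alternately. */
    p = out;
    for (i = 0; i <= nargs; i++) {
        memcpy(p, consts, seg_lens[i]);
        p += seg_lens[i];
        consts += seg_lens[i];
        if (i < nargs) {
            size_t len = strlen(args[i]);
            memcpy(p, args[i], len);
            p += len;
        }
    }
    *p = '\0';
    return out;
}
```

| With the constant string "a= b=\n" and segment lengths (2,3,1) from the commit message, two arguments are slotted in after the first and second segments. The caller frees the result.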
* fix typo in comment [Lukas Mai, 2017-08-29, files: 1, lines: -1/+1]
|
* S_opdump_indent(): avoid shift overflow [David Mitchell, 2017-08-17, files: 1, lines: -1/+4]
| RT #131912
|
| The (1 << i) is harmless for large i, but triggers an 'undefined-behavior' error in clang. So work around it.
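| One way to avoid an out-of-range shift is to test the bit position against the width of the type first. The sketch below is a generic illustration under that assumption; the helper name bit_is_set is hypothetical and the actual workaround in S_opdump_indent() may look different.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: report whether bit `i` of `mask` is set.  Positions
 * at or beyond the width of the type are treated as "not set" instead of
 * being shifted, because shifting by >= the type width is undefined
 * behaviour in C. */
int bit_is_set(uint64_t mask, unsigned i)
{
    if (i >= sizeof(mask) * CHAR_BIT)
        return 0;
    return (int)((mask >> i) & 1u);
}
```

| Callers can then pass any indentation level without tripping a sanitizer.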
* sv_dump(): display regex LEN and LV-as-RX regexp [David Mitchell, 2017-08-04, files: 1, lines: -1/+6]
| When the len field of a REGEXP isn't usurped, display it (it used to always be skipped for REGEXPs). When it's usurped by a PVLV to point to a 'struct regexp', display it as a pointer.
* add PL_sv_zero [David Mitchell, 2017-07-27, files: 1, lines: -1/+14]
| It's like PL_sv_no, except that its string value is "0" rather than "". It can be used, for example, where a pp function wants to push a zero return value on the stack. The next commit will start to use it.
|
| Also update the SvIMMORTAL() to be more efficient: it now checks whether the SV's address is in a range rather than individually checking against &PL_sv_undef, &PL_sv_no etc.
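| The range test can be sketched as follows, assuming the immortal values are stored next to each other in a single array. The names sv_sketch, immortals and is_immortal are hypothetical stand-ins, and the slot order is illustrative rather than perl's real layout.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { int flags; } sv_sketch;   /* stand-in for perl's SV */

/* Assumption: the immortals live adjacently in one array, which is what
 * makes a single range test possible.  Four slots, e.g. yes/undef/no/zero. */
sv_sketch immortals[4];

/* One range comparison instead of one equality test per immortal.
 * Addresses are compared as integers so that pointers to unrelated
 * objects can be tested without undefined behaviour. */
int is_immortal(const sv_sketch *sv)
{
    uintptr_t p  = (uintptr_t)sv;
    uintptr_t lo = (uintptr_t)&immortals[0];
    return p >= lo && p < lo + sizeof(immortals);
}
```

| Adding another immortal then only changes the array size, not the test.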
* PL_curstackinfo->si_stack_hwm: gently restore [David Mitchell, 2017-07-16, files: 1, lines: -1/+2]
| RT #131732
|
| With v5.27.1-66-g87058c3, I introduced a DEBUGGING-only mechanism in the runops loop for checking whether an op extended the stack by as many slots as values it returned on the stack.
|
| It did this by setting a high-water-mark just before calling each pp function, and checking its result on return. It saved and restored the old value of PL_curstackinfo->si_stack_hwm whenever it entered or left a runops loop or did a JMPENV_PUSH / JMPENV_POP.
|
| However, the restoring could restore to an old value that was smaller than the current value, leading to false-positive stack-extend panics. So only restore if the old value was larger.
|
| In particular this was causing false positives in DBI.
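| The "only restore if the old value was larger" rule amounts to a one-line guard. A minimal sketch, with the high-water mark modelled as a plain index and the names stackinfo_sketch and restore_hwm invented for illustration:

```c
#include <stddef.h>

/* Hypothetical stand-in for PL_curstackinfo, with the high-water mark
 * modelled as a stack index rather than a pointer. */
typedef struct { ptrdiff_t hwm; } stackinfo_sketch;

/* Restore a previously saved high-water mark, but never lower the current
 * one: lowering it is what produced the false-positive panics. */
void restore_hwm(stackinfo_sketch *si, ptrdiff_t saved)
{
    if (saved > si->hwm)
        si->hwm = saved;
}
```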
* fix #ifdef directives with extra tokens [Lukas Mai, 2017-06-24, files: 1, lines: -3/+3]
|
* add PL_curstackinfo->si_stack_hwm [David Mitchell, 2017-06-24, files: 1, lines: -1/+18]
| On debugging builds only, add a mechanism for checking pp function calls for insufficient stack extending. It works by:
|
|   * make the runops loop set a high-water-mark (HWM) variable equal to PL_stack_sp just before calling each pp function;
|
|   * make EXTEND() etc update this HWM;
|
|   * on return from the pp function, panic if PL_stack_sp is > HWM.
|
| This detects whether pp functions are pushing more items onto the stack than they are requesting space for.
|
| There's a possibility of false positives if the code is doing weird stuff like direct manipulation of stacks via PL_curstack, SWITCHSTACK() etc.
|
| It's also possible that one pp function "knows" that a previous pp function will have already grown the stack enough. Currently the only place in core that seems to do this is pp_enteriter, which allocates 1 stack slot so that pp_iter doesn't have to check each time it returns &PL_sv_yes/no. To accommodate this, the new macro EXTEND_SKIP() has been added, that tells perl that it's safely skipping an EXTEND() here.
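| The three steps above can be modelled on a toy stack. This is a simplified sketch, not the perl implementation: stack_sketch, extend_sketch, run_checked and the two pp functions are hypothetical, the stack is a fixed array of ints, and a return value of 0 stands in for the panic.

```c
#include <stddef.h>

#define STACK_MAX 16

typedef struct {
    int       slots[STACK_MAX];
    ptrdiff_t sp;    /* index of the top item; -1 when empty */
    ptrdiff_t hwm;   /* highest index that has been requested so far */
} stack_sketch;

typedef void (*pp_func)(stack_sketch *);

/* Analogue of EXTEND(): request room for n more items and record it. */
void extend_sketch(stack_sketch *st, ptrdiff_t n)
{
    if (st->sp + n > st->hwm)
        st->hwm = st->sp + n;
}

/* Analogue of one runops iteration: set the mark, call the pp function,
 * then check.  Returns 1 if the function stayed within what it requested,
 * 0 where the real mechanism would panic. */
int run_checked(stack_sketch *st, pp_func fn)
{
    st->hwm = st->sp;
    fn(st);
    return st->sp <= st->hwm;
}

/* Well-behaved: asks for one slot, pushes one item. */
void pp_good(stack_sketch *st)
{
    extend_sketch(st, 1);
    st->slots[++st->sp] = 1;
}

/* Buggy: pushes without asking for space first. */
void pp_bad(stack_sketch *st)
{
    st->slots[++st->sp] = 2;
}
```

| The check costs one comparison per op, which is why it is confined to debugging builds in the commit above.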
* PERL_GLOBAL_STRUCT_PRIVATE: dump.c:op_class_names [David Mitchell, 2017-03-17, files: 1, lines: -1/+1]
| t/porting/libperl.t under -DPERL_GLOBAL_STRUCT_PRIVATE doesn't like non-const static data structures.
* S_do_op_dump_bar(): don't print TRANS op_pv field [David Mitchell, 2017-02-27, files: 1, lines: -7/+8]
| My recent commit v5.25.9-32-gabd07ec made dump.c display the op_pv string of OP_NEXT, OP_TRANS etc ops. However, for OP_TRANS/OP_TRANSR, the string is basically a 256-byte potentially non-null-terminated array. This was causing a buffer read overrun and garbage to be displayed.
|
| The simple solution is to only display the address but not contents for a trans op. OP_NEXT etc labels continue to be displayed.
* HvTOTALKEYS() takes a HV* as argument [Steffen Mueller, 2017-02-03, files: 1, lines: -1/+1]
| Incidentally, it currently works on SV *'s as well because there's an explicit cast after an SvANY. Let's not rely on that. This commit also removes a pointless const in a cast. Again. It takes an HV * as argument. Let's only change that if we have a strong reason to.
* in dump_sub() handle CV ref used as GV [Zefram, 2017-01-28, files: 1, lines: -15/+21]
| dump_sub() can receive a CV ref where it's expecting a GV. Make it handle that cleanly. Fixes [perl #129126].
* Perl_sv_dump(): allow a null-pointer argument [Aaron Crane, 2017-01-24, files: 1, lines: -3/+1]
| Since the recursive case already handles null pointers, and this function is specifically aimed at debugging, it seems sensible to handle a null pointer at the top level too.
* S_do_pmop_dump_bar() reduce scope of ch variable [David Mitchell, 2017-01-24, files: 1, lines: -7/+3]
| trivial bit of tidy-up
* handle op_pv better in op_clear() and op_dump() [David Mitchell, 2017-01-24, files: 1, lines: -1/+25]
| In op_clear(), the ops with labels stored in the op_pv field (OP_NEXT etc) fall-through to the OP_TRANS/OP_TRANSR code, which determines whether to free op_pv based on the OPpTRANS_FROM_UTF|OPpTRANS_TO_UTF flags, which are only valid for OP_TRANS/OP_TRANSR. At the moment the fall-through fields don't use either of those private bits, but in case this changes in future, only check those flag bits for trans ops.
|
| At the same time, enhance op_dump() to display the OP_PV field of such ops.
|
| Also, fix a leak I introduced in the recently-added S_gv_display() function.
* dump.c: handle GV being really a ref to a CV [David Mitchell, 2017-01-23, files: 1, lines: -29/+36]
| RT #129285
|
| These days a 'GV' can actually just be a ref to a CV when the only thing that would be stored in the glob is a CV.
|
| Update S_do_op_dump_bar() to handle this. Formerly it would trigger an assert on a non-threaded build.
|
| In fact, incorporate the fixed logic into a static function, S_gv_display(), that is shared by both S_do_op_dump_bar() and Perl_debop(); so both perl -Dx and perl -Dt get the benefit.
|
| Also for the -Dx case, make it display the raw address of the GV too.
* reindent OP_AELEMFAST block in S_do_op_dump_bar() [David Mitchell, 2017-01-23, files: 1, lines: -13/+13]
| after previous commit removed an enclosing 'if' block. Whitespace-only change
* op_dump(): no OPf_SPECIAL on AELEMFAST,GVSV,GV [David Mitchell, 2017-01-23, files: 1, lines: -2/+0]
| Between 5.14 and 5.16 pp_aelemfast changed from using OPf_SPECIAL to using op type to distinguish between a lexical or glob arg, but op_dump() hadn't been updated to reflect this.
|
| Also, GVSV and GV never used the OPf_SPECIAL flag, so testing for it with those ops was wrong (but currently harmless).
* fix some more bizarre indention in dump.c [David Mitchell, 2017-01-23, files: 1, lines: -27/+28]
| (whitespace-only change)
|
| Not mentioning any names to protect the guilty, but about 3 years ago some code was committed to dump.c that had just bizarre indentation; for example, this if (foo) bar being more like if (foo) bar (and this is nothing to do with tab expansion).
|
| This commit fixes up the most glaring issues.
* S_do_op_dump_bar(): fix some weird indentation [David Mitchell, 2017-01-21, files: 1, lines: -19/+22]
| whitespace-only change
* revamp the op_dump() output format [David Mitchell, 2017-01-21, files: 1, lines: -87/+230]
| This is mainly used for low-level debugging these days (higher level stuff like Concise having since been created), e.g. calling op_dump() from within a debugger or running with -Dx.
|
| Make it display more info, and use an ASCII-art tree to show the structure.
|
| The main changes are:
|
|   * added 'ASCII-art' tree structure;
|   * it now displays each op's class and address;
|   * for op_next etc links, it now displays the type and address of the linked-to op in addition to its sequence number;
|   * the following ops now have their op_other field displayed, like op_and etc already do:
|         andassign argdefelem dor dorassign entergiven entertry enterwhen once orassign regcomp substcont
|   * enteriter now has its op_redo etc fields displayed, like enterloop already does;
|
| Here is a sample before and after of perl -Dx -e'($x+$y) * $z'
|
| Before:
|
|     {
|     1   TYPE = leave  ===> NULL
|         TARG = 1
|         FLAGS = (VOID,KIDS,PARENS,SLABBED)
|         PRIVATE = (REFC)
|         REFCNT = 1
|         {
|     2       TYPE = enter  ===> 3
|             FLAGS = (UNKNOWN,SLABBED,MORESIB)
|         }
|         {
|     3       TYPE = nextstate  ===> 4
|             FLAGS = (VOID,SLABBED,MORESIB)
|             LINE = 1
|             PACKAGE = "main"
|             SEQ = 4294967246
|         }
|         {
|     5       TYPE = multiply  ===> 1
|             TARG = 5
|             FLAGS = (VOID,KIDS,SLABBED)
|             PRIVATE = (0x2)
|             {
|     6           TYPE = add  ===> 7
|                 TARG = 3
|                 FLAGS = (SCALAR,KIDS,PARENS,SLABBED,MORESIB)
|                 PRIVATE = (0x2)
|                 {
|     8               TYPE = null  ===> (9)
|                       (was rv2sv)
|                     FLAGS = (SCALAR,KIDS,SLABBED,MORESIB)
|                     PRIVATE = (0x1)
|                     {
|     4                   TYPE = gvsv  ===> 9
|                         FLAGS = (SCALAR,SLABBED)
|                         PADIX = 1
|                     }
|                 }
|                 {
|     10              TYPE = null  ===> (6)
|                       (was rv2sv)
|                     FLAGS = (SCALAR,KIDS,SLABBED)
|                     PRIVATE = (0x1)
|                     {
|     9                   TYPE = gvsv  ===> 6
|                         FLAGS = (SCALAR,SLABBED)
|                         PADIX = 2
|                     }
|                 }
|             }
|             {
|     11          TYPE = null  ===> (5)
|                   (was rv2sv)
|                 FLAGS = (SCALAR,KIDS,SLABBED)
|                 PRIVATE = (0x1)
|                 {
|     7               TYPE = gvsv  ===> 5
|                     FLAGS = (SCALAR,SLABBED)
|                     PADIX = 4
|                 }
|             }
|         }
|     }
|
| After:
|
|     1    leave LISTOP(0xdecb38) ===> [0x0]
|          TARG = 1
|          FLAGS = (VOID,KIDS,PARENS,SLABBED)
|          PRIVATE = (REFC)
|          REFCNT = 1
|          |
|     2    +--enter OP(0xdecb00) ===> 3 [nextstate 0xdecb80]
|          |   FLAGS = (UNKNOWN,SLABBED,MORESIB)
|          |
|     3    +--nextstate COP(0xdecb80) ===> 4 [gvsv 0xdeb3b8]
|          |   FLAGS = (VOID,SLABBED,MORESIB)
|          |   LINE = 1
|          |   PACKAGE = "main"
|          |   SEQ = 4294967246
|          |
|     5    +--multiply BINOP(0xdecbe0) ===> 1 [leave 0xdecb38]
|              TARG = 5
|              FLAGS = (VOID,KIDS,SLABBED)
|              PRIVATE = (0x2)
|              |
|     6        +--add BINOP(0xdeb2b0) ===> 7 [gvsv 0xdeb270]
|              |   TARG = 3
|              |   FLAGS = (SCALAR,KIDS,PARENS,SLABBED,MORESIB)
|              |   PRIVATE = (0x2)
|              |   |
|     8        |   +--null (ex-rv2sv) UNOP(0xdeb378) ===> 9 [gvsv 0xdeb338]
|              |   |   FLAGS = (SCALAR,KIDS,SLABBED,MORESIB)
|              |   |   PRIVATE = (0x1)
|              |   |   |
|     4        |   |   +--gvsv PADOP(0xdeb3b8) ===> 9 [gvsv 0xdeb338]
|              |   |       FLAGS = (SCALAR,SLABBED)
|              |   |       PADIX = 1
|              |   |
|     10       |   +--null (ex-rv2sv) UNOP(0xdeb2f8) ===> 6 [add 0xdeb2b0]
|              |       FLAGS = (SCALAR,KIDS,SLABBED)
|              |       PRIVATE = (0x1)
|              |       |
|     9        |       +--gvsv PADOP(0xdeb338) ===> 6 [add 0xdeb2b0]
|              |           FLAGS = (SCALAR,SLABBED)
|              |           PADIX = 2
|              |
|     11       +--null (ex-rv2sv) UNOP(0xdeb220) ===> 5 [multiply 0xdecbe0]
|                  FLAGS = (SCALAR,KIDS,SLABBED)
|                  PRIVATE = (0x1)
|                  |
|     7            +--gvsv PADOP(0xdeb270) ===> 5 [multiply 0xdecbe0]
|                      FLAGS = (SCALAR,SLABBED)
|                      PADIX = 4
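| A tree of this general shape can be drawn by carrying a bitmask that records which ancestor levels still have siblings to come, so that a vertical bar is printed in those columns. The sketch below is a much-reduced illustration and not the code in dump.c: node_sketch, append_sketch and dump_tree_sketch are hypothetical names, only node names are printed, and it assumes the tree is shallower than the bit width of unsigned long.

```c
#include <stdio.h>
#include <string.h>

typedef struct node_sketch {
    const char *name;
    const struct node_sketch *first_child;
    const struct node_sketch *next_sibling;
} node_sketch;

/* Append s to the NUL-terminated string in buf, truncating if needed. */
static void append_sketch(char *buf, size_t size, const char *s)
{
    size_t used = strlen(buf);
    if (used + 1 < size)
        snprintf(buf + used, size - used, "%s", s);
}

/* Append node n and its subtree to buf.  Bit i of `bars` is set when the
 * ancestor at depth i has further siblings, so column i needs a '|'.
 * Assumes depth stays below the number of bits in unsigned long. */
void dump_tree_sketch(const node_sketch *n, unsigned depth,
                      unsigned long bars, char *buf, size_t size)
{
    unsigned level;
    const node_sketch *kid;

    for (level = 1; level < depth; level++)
        append_sketch(buf, size, ((bars >> level) & 1UL) ? "|  " : "   ");
    if (depth > 0)
        append_sketch(buf, size, "+--");
    append_sketch(buf, size, n->name);
    append_sketch(buf, size, "\n");

    /* Children of a node that has a later sibling keep its bar going. */
    if (depth > 0 && n->next_sibling)
        bars |= 1UL << depth;
    for (kid = n->first_child; kid; kid = kid->next_sibling)
        dump_tree_sketch(kid, depth + 1, bars, buf, size);
}
```

| Called with depth 0, an empty mask and an empty buffer on a tree for ($x+$y) * $z, this draws "add" with a bar continuing down to its sibling and the two operands nested beneath it.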
* add Perl_op_class(o) API function [David Mitchell, 2017-01-21, files: 1, lines: -0/+148]
| Given an op, this function determines what type of struct it has been allocated as. Returns one of the OPclass enums, such as OPclass_LISTOP.
|
| Originally this was a static function in B.xs, but it has wider applicability; indeed several XS modules on CPAN have cut and pasted it.
|
| It adds the OPclass enum to op.h. In B.xs there was a similar enum, but with names like OPc_LISTOP. I've renamed them to OPclass_LISTOP etc. so as not to clash with the cut+paste code already on CPAN.
* Change white space to avoid C++ deprecation warning [Karl Williamson, 2016-11-18, files: 1, lines: -143/+168]
| C++11 requires space between the end of a string literal and a macro, so that a feature can unambiguously be added to the language. Starting in g++ 6.2, the compiler emits a warning when there isn't a space (presumably so that future versions can support C++11).
|
| Unfortunately there are many such instances in the perl core. This commit fixes those, including those in ext/, but individual commits will be used for the other modules, those in dist/ and cpan/.
|
| This commit also inserts space at the end of a macro before a string literal, even though that is not deprecated, and removes useless "" literals following a macro (instead of inserting a blank). The result is easier to read, making the macro stand out, and be clearer as to the intention.
|
| Code and modules included with the Perl core need to be compilable using C++. This is so that perl can be embedded in C++ programs. (Actually, only the hdr files need to be so compilable, but it would be hard to test that just the hdrs are compilable.) So we need to accommodate changes to the C++ language.
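| The same spacing rule applies to the standard format macros from <inttypes.h>, which makes for a small self-contained illustration. The function format_u64 is a hypothetical example and is not taken from the perl source, which uses its own format macros in the same way.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical example: format a 64-bit count into buf.  Note the spaces
 * around PRIu64.  Written as "%"PRIu64" items", a C++11 compiler would try
 * to read PRIu64 as a user-defined literal suffix on the preceding string
 * literal; with the spaces the macro is unambiguous in both C and C++. */
int format_u64(char *buf, size_t size, uint64_t v)
{
    return snprintf(buf, size, "%" PRIu64 " items", v);
}
```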
* eliminate OPpRUNTIME private PMOP flag [David Mitchell, 2016-11-14, files: 1, lines: -3/+2]
| This flag was added in 5.004 and even then it didn't seem to be used for anything. It gets set and unset in various places, but is never tested. I'm not even sure what it was intended for.
* eliminate SVpbm_VALID flag [David Mitchell, 2016-11-12, files: 1, lines: -8/+2]
| This flag is set on an SV to indicate that it has PERL_MAGIC_bm (fast Boyer-Moore) magic attached. Instead just directly check whether it has such magic.
|
| This frees up the 0x40000000 bit for anything except AVs and HVs.
* Only test SvTAIL when SvVALID [David Mitchell, 2016-11-12, files: 1, lines: -2/+5]
| Only use the SvTAIL() macro when we've already confirmed that the SV is SvVALID() - this is in preparation for removing the SVpbm_TAIL flag in the next commit.
* Eliminate SVrepl_EVAL and SvEVALED() [David Mitchell, 2016-11-12, files: 1, lines: -4/+2]
| This flag is only used to indicate that the SV holding the text of the replacement part of a s/// has seen at least one /e.
|
| Instead, set the IVX field in the SV to a true value. (We already set the NVX field on that SV to indicate a multi-src-line substitution).
|
| This is to reduce the number of odd special cases for the SVpbm_VALID flag.
* op_dump() - remove extra indentation from PMOP [David Mitchell, 2016-11-12, files: 1, lines: -7/+1]
| When dumping a PMOP, it displays the PMOP-specific fields with an extra set of braces and level of indentation, e.g.
|
|     {
|         TYPE = match ===> 1
|         FLAGS = (VOID,SLABBED)
|         PRIVATE = (RTIME)
|         {
|             PMf_PRE /abc/ (RUNTIME)
|             PMFLAGS = (SCANFIRST,ALL)
|         }
|     }
|
| This is visually confusing, because child ops are shown in the same way. This commit removes the extra indentation:
|
|     {
|         TYPE = match ===> 1
|         FLAGS = (VOID,SLABBED)
|         PRIVATE = (RTIME)
|         PMf_PRE /abc/ (RUNTIME)
|         PMFLAGS = (SCANFIRST,ALL)
|     }
* dump.c: don't display an ARRAY's ARYLEN field [David Mitchell, 2016-11-12, files: 1, lines: -2/+0]
| Originally xav_arylen was an AV field and was displayed by sv_dump. In 2005, this field was removed, and replaced by PERL_MAGIC_arylen_p magic when needed.
|
| A side effect of this is that sv_dump on a magical AV adds PERL_MAGIC_arylen_p magic to the AV, which is undesirable.
|
| This commit just omits displaying 'ARYLEN =' altogether. Any arylen magic will already be displayed as part of dumping the AV, so it's redundant.
* -DsR : display unTEMPed temps with "t" not "T" [David Mitchell, 2016-10-26, files: 1, lines: -5/+8]
| With the -R debugging flag, SVs are displayed with a reference count (if > 1), and with a T if the SV is referenced from the temps stack. E.g.
|
|     $ perl -DstR -e'@a = map $_,"a", "b"'
|     ...
|     *  <T>PV("a"\0)  <T>PV("b"\0)
|
| This commit enhances this to use both "t" and "T":
|
|     t: SV is referenced from PL_tmps_stack, but SvTEMP() not set
|     T: SV is referenced from PL_tmps_stack, and in addition, SvTEMP() is set
|
| (The other permutation, SvTEMP() set but not in PL_tmps_stack, is illegal).
|
| This commit changes
* dump.c: use new SvPVCLEAR and constant string friendly macros [Yves Orton, 2016-10-19, files: 1, lines: -4/+4]
|
* make OP_SPLIT a PMOP, and eliminate OP_PUSHRE [David Mitchell, 2016-10-04, files: 1, lines: -4/+11]
| Most ops that execute a regex, such as match and subst, are of type PMOP. A PMOP allows the actual regex to be attached directly to that op, due to its extra fields.
|
| OP_SPLIT is different; it is just a plain LISTOP, but it always has an OP_PUSHRE as its first child, which *is* a PMOP and which has the regex attached.
|
| At runtime, pp_pushre()'s only job is to push itself (i.e. the current PL_op) onto the stack. Later pp_split() pops this to get access to the regex it wants to execute.
|
| This is a bit unpleasant, because we're pushing an OP* onto the stack, which is supposed to be an array of SV*'s. As a bit of a hack, on DEBUGGING builds we push a PVLV with the PL_op address embedded instead, but this still isn't very satisfactory.
|
| Now that regexes are first-class SVs, we could push a REGEXP onto the stack rather than PL_op. However, there is an optimisation of @array = split which eliminates the assign and embeds the array's GV/padix directly in the PUSHRE op. So split still needs access to that op. But the pushre op will always be splitop->op_first anyway, so one possibility is to just skip executing the pushre altogether, and make pp_split just directly access op_first instead to get the regex and @array info.
|
| But if we're doing that, then why not just go the full hog and make OP_SPLIT into a PMOP, and eliminate the OP_PUSHRE op entirely: with the data that was spread across the two ops now combined into just the one split op.
|
| That is exactly what this commit does.
|
| For a simple compile-time pattern like split(/foo/, $s, 1), the optree looks like:
|
|     before:
|         <@> split[t2] lK
|            </> pushre(/"foo"/) s/RTIME
|            <0> padsv[$s:1,2] s
|            <$> const(IV 1) s
|
|     after:
|         </> split(/"foo"/)[t2] lK/RTIME
|            <0> padsv[$s:1,2] s
|            <$> const[IV 1] s
|
| while for a run-time expression like split(/$pat/, $s, 1),
|
|     before:
|         <@> split[t3] lK
|            </> pushre() sK/RTIME
|               <|> regcomp(other->8) sK
|                  <0> padsv[$pat:2,3] s
|            <0> padsv[$s:1,3] s
|            <$> const(IV 1)s
|
|     after:
|         </> split()[t3] lK/RTIME
|            <|> regcomp(other->8) sK
|               <0> padsv[$pat:2,3] s
|            <0> padsv[$s:1,3] s
|            <$> const[IV 1] s
|
| This makes the code faster and simpler.
|
| At the same time, two new private flags have been added for OP_SPLIT - OPpSPLIT_ASSIGN and OPpSPLIT_LEX - which make it explicit that the assign op has been optimised away, and if so, whether the array is lexical.
|
| Also, deparsing of split has been improved, to the extent that
|
|     perl TEST -deparse op/split.t
|
| now passes.
|
| Also, a couple of panic messages in pp_split() have been replaced with asserts().
* do_sv_dump(): handle CvSTART() as slab address [David Mitchell, 2016-09-05, files: 1, lines: -1/+6]
| If a CV is CvSLABBED(), then CvSTART() points to the op slab rather than a start op. Make Perl_do_sv_dump() display this more informatively.
* dump.c: dump physical, not logical, AVs [David Mitchell, 2016-08-10, files: 1, lines: -6/+8]
| Perl_do_sv_dump() (as used by Devel::Peek) dumped a logical AV - i.e. if it was tied, it called tie methods to get its size and to get its elements.
|
| Instead, dump the physical fields in the AV - e.g. a tied AV will likely have a FILL of -1 and no elements.
* add OP_ARGELEM, OP_ARGDEFELEM, OP_ARGCHECK ops [David Mitchell, 2016-08-03, files: 1, lines: -0/+1]
| Currently subroutine signature parsing emits many small discrete ops to implement arg handling. This commit replaces them with a couple of ops per signature element, plus an initial signature check op.
|
| These new ops are added to the OP tree during parsing, so will be visible to hooks called up to and including peephole optimisation. It is intended soon that the peephole optimiser will take these per-element ops, and replace them with a single OP_SIGNATURE op which handles the whole signature in a single go. So normally these ops won't actually get executed much. But adding these intermediate-level ops gives three advantages:
|
|   1) it allows the parser to efficiently generate subtrees containing individual signature elements, which can't be done if only OP_SIGNATURE or discrete ops are available;
|   2) prior to optimisation, it provides a simple and straightforward representation of the signature;
|   3) hooks can mess with the signature OP subtree in ways that make it no longer possible to optimise into an OP_SIGNATURE, but which can still be executed, deparsed etc (if less efficiently).
|
| This code:
|
|     use feature "signatures";
|     sub f($a, $, $b = 1, @c) {$a}
|
| under 'perl -MO=Concise,f' now gives:
|
|     d  <1> leavesub[1 ref] K/REFC,1 ->(end)
|     -     <@> lineseq KP ->d
|     1        <;> nextstate(main 84 foo:6) v:%,469762048 ->2
|     2        <+> argcheck(3,1,@) v ->3
|     3        <;> nextstate(main 81 foo:6) v:%,469762048 ->4
|     4        <+> argelem(0)[$a:81,84] v/SV ->5
|     5        <;> nextstate(main 82 foo:6) v:%,469762048 ->6
|     8        <+> argelem(2)[$b:82,84] vKS/SV ->9
|     6           <|> argdefelem(other->7)[2] sK ->8
|     7              <$> const(IV 1) s ->8
|     9        <;> nextstate(main 83 foo:6) v:%,469762048 ->a
|     a        <+> argelem(3)[@c:83,84] v/AV ->b
|     -        <;> ex-nextstate(main 84 foo:6) v:%,469762048 ->b
|     b        <;> nextstate(main 84 foo:6) v:%,469762048 ->c
|     c        <0> padsv[$a:81,84] s ->d
|
| The argcheck(3,1,@) op knows the number of positional params (3), the number of optional params (1), and whether it has an array / hash slurpy element at the end. This op is responsible for checking that @_ contains the right number of args.
|
| A simple argelem(0)[$a] op does the equivalent of 'my $a = $_[0]'. Similarly, argelem(3)[@c] is equivalent to 'my @c = @_[3..$#_]'. If it has a child, it gets its arg from the stack rather than using $_[N]. Currently the only used child is the logop argdefelem.
|
| argdefelem(other->7)[2] is equivalent to '@_ > 2 ? $_[2] : other'.
|
| [ These ops currently assume that the lexical var being introduced is undef/empty and non-magical etc. This is an incorrect assumption and is fixed in a few commits' time ]
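| The checks that argcheck and argdefelem perform can be modelled in a few lines of C. This is a hedged sketch of the logic only: argcheck_sketch, argcheck_ok and arg_or_default are hypothetical names, arguments are plain ints rather than SVs, and the rule that a hash slurpy needs an even number of leftover arguments is an assumption that is not stated in the commit message.

```c
#include <stddef.h>

/* Hypothetical model of the data carried by argcheck(params, opt, slurpy). */
typedef struct {
    size_t params;      /* number of positional parameters */
    size_t opt_params;  /* how many of those are optional */
    char   slurpy;      /* '@', '%' or 0 for none */
} argcheck_sketch;

/* Return 1 if a call with nargs arguments satisfies the signature. */
int argcheck_ok(const argcheck_sketch *sig, size_t nargs)
{
    size_t required = sig->params - sig->opt_params;

    if (nargs < required)
        return 0;                       /* too few arguments */
    if (nargs > sig->params) {
        if (!sig->slurpy)
            return 0;                   /* too many, nothing to absorb them */
        /* Assumption: a hash slurpy wants name/value pairs. */
        if (sig->slurpy == '%' && (nargs - sig->params) % 2 != 0)
            return 0;
    }
    return 1;
}

/* Model of argdefelem: '@_ > idx ? $_[idx] : default'. */
int arg_or_default(const int *argv, size_t nargs, size_t idx, int dflt)
{
    return nargs > idx ? argv[idx] : dflt;
}
```

| For the signature ($a, $, $b = 1, @c) this corresponds to argcheck(3,1,@): two arguments are the minimum, and any surplus goes to @c.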
* Change scalar(%hash) to be the same as 0+keys(%hash) [Yves Orton, 2016-06-22, files: 1, lines: -9/+2]
| This subject has a long history; see [perl #114576] for more discussion.
|
| https://rt.perl.org/Public/Bug/Display.html?id=114576
|
| There are a variety of reasons we want to change the return signature of scalar(%hash).
|
| One is that it leaks implementation details about our associative array structure. Another is that it requires us to keep track of the used buckets in the hash, which we use for no other purpose but for scalar(%hash). Another is that it is just odd. Almost nothing needs to know these values. Perhaps debugging, but we have several much better functions for introspecting the internals of a hash.
|
| By changing the return signature we can remove all the logic related to maintaining and updating xhv_fill_lazy. This should make hot code paths a little faster, and maybe save some memory for traversed hashes.
|
| In order to provide some form of backwards compatibility we add three new functions to the Hash::Util namespace: bucket_ratio(), num_buckets() and used_buckets(). These functions are actually implemented in universal.c, and thus always available even if Hash::Util is not loaded. This simplifies testing. At the same time Hash::Util contains backwards compatible code so that the new functions are available from it should they be needed in older perls.
|
| There are many tests in t/op/hash.t that are more or less obsolete after this patch as they test that xhv_fill_lazy is correctly set in various situations. However since we have a backwards compat layer we can just switch them to use bucket_ratio(%hash) instead of scalar(%hash) and keep the tests, just in case they are actually testing something not tested elsewhere.
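| The trade-off can be seen on a toy chained hash: the key count is a field that is already maintained, whereas the number of used buckets can be recomputed by a walk when somebody actually asks for it, so nothing extra has to be tracked on every insert and delete. The types and functions below (entry_sketch, hash_sketch, scalar_hash_sketch, used_buckets_sketch) are hypothetical and much simpler than perl's HV.

```c
#include <stddef.h>

typedef struct entry_sketch {
    struct entry_sketch *next;   /* next entry in the same bucket chain */
} entry_sketch;

typedef struct {
    entry_sketch **buckets;
    size_t nbuckets;
    size_t nkeys;                /* maintained on insert/delete anyway */
} hash_sketch;

/* New-style scalar(%hash): just the key count, O(1). */
size_t scalar_hash_sketch(const hash_sketch *h)
{
    return h->nkeys;
}

/* Introspection on demand: count non-empty buckets by walking the array,
 * instead of keeping a running "fill" counter up to date. */
size_t used_buckets_sketch(const hash_sketch *h)
{
    size_t used = 0;
    size_t i;

    for (i = 0; i < h->nbuckets; i++)
        if (h->buckets[i])
            used++;
    return used;
}
```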
* Dump empty-string ENAMEs as empty strings [Father Chrysostomos, 2016-05-15, files: 1, lines: -1/+1]
| They were coming out as ‘(null)’, which is incorrect and confusing.
* rename and function-ise dtrace macros [David Mitchell, 2016-03-18, files: 1, lines: -1/+1]
| This commit:
|
| 1. Renames the various dtrace probe macros into a consistent and self-documenting pattern, e.g.
|
|     ENTRY_PROBE  => PERL_DTRACE_PROBE_ENTRY
|     RETURN_PROBE => PERL_DTRACE_PROBE_RETURN
|
| Since they're supposed to be defined only under PERL_CORE, this shouldn't break anything that's not being naughty.
|
| 2. Implement the main body of these macros using a real function.
|
| They were formerly defined along the lines of
|
|     if (PERL_SUB_ENTRY_ENABLED())
|         PERL_SUB_ENTRY(...);
|
| The PERL_SUB_ENTRY() part is a macro generated by the dtrace system, which for example on linux expands to a large bunch of assembly directives. Replace the direct macro with a function wrapper, e.g.
|
|     if (PERL_SUB_ENTRY_ENABLED())
|         Perl_dtrace_probe_call(aTHX_ cv, TRUE);
|
| This reduces to once the number of times the macro is expanded.
|
| The new functions also take simpler args and then process the values they need using intermediate temporary vars to avoid huge macro expansions.
|
| For example
|
|     ENTRY_PROBE(CvNAMED(cv)
|                     ? HEK_KEY(CvNAME_HEK(cv))
|                     : GvENAME(CvGV(cv)),
|                 CopFILE((const COP *)CvSTART(cv)),
|                 CopLINE((const COP *)CvSTART(cv)),
|                 CopSTASHPV((const COP *)CvSTART(cv)));
|
| is now
|
|     PERL_DTRACE_PROBE_ENTRY(cv);
|
| This reduces the executable size by 1K on -O2 -Dusedtrace builds, and by 45K on -DDEBUGGING -Dusedtrace builds.
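| The pattern of a thin macro that guards a call into one real function can be sketched without any dtrace dependency. Everything named here (cv_sketch, probes_enabled, probe_call_sketch, PROBE_ENTRY_SKETCH) is hypothetical; a global flag stands in for the generated PERL_SUB_ENTRY_ENABLED() test and a counter stands in for firing the probe.

```c
#include <stddef.h>

typedef struct { const char *name; } cv_sketch;   /* stand-in for a CV */

int probes_enabled = 0;              /* stand-in for the *_ENABLED() test */
int probes_fired = 0;
const char *last_probe_name = NULL;

/* The argument unpacking lives in one function body, so the bulky part is
 * compiled once rather than expanded at every call site. */
void probe_call_sketch(const cv_sketch *cv, int is_call)
{
    const char *name = cv->name ? cv->name : "(unknown)";

    (void)is_call;                   /* a real probe would pick entry/return */
    last_probe_name = name;
    probes_fired++;
}

/* Call sites only carry a cheap test and a function call. */
#define PROBE_ENTRY_SKETCH(cv) \
    do { if (probes_enabled) probe_call_sketch((cv), 1); } while (0)
```

| Each call site now costs a flag test and a call, and the caller passes only the CV instead of four pre-computed arguments.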
* document args of Perl_do_sv_dump() [David Mitchell, 2016-03-02, files: 1, lines: -0/+10]
|
* Add missing break in switch. [Jarkko Hietaniemi, 2016-02-07, files: 1, lines: -0/+1]
| Coverity CID 135145: Missing break in switch (MISSING_BREAK)
* Perl_runops_debug(): do FREETMPS [David Mitchell, 2016-02-03, files: 1, lines: -0/+4]
| In runops_debug() wrap the optional printing of next op, arg stack etc with ENTER/SAVETMPS, FREETMPS/LEAVE - so that temporaries created by the dump output are promptly freed, and thus don't alter the tmps stack.
|
| (I'm trying to debug some tmps stack corruption, and running with -Dst made the problem go away).
* Add STATICs to S_ functions. [Jarkko Hietaniemi, 2016-01-29, files: 1, lines: -1/+1]
|
* Various pods: Add C<> around many typed-as-is things [Karl Williamson, 2015-09-03, files: 1, lines: -23/+23]
| Removes 'the' in front of parameter names in some instances.