path: root/op.h
Commit history (most recent first). Each entry shows the commit subject, author, date, and diffstat (files changed, -deleted/+added lines).

* Fix a bunch of repeated-word typos  (Dagfinn Ilmari Mannsåker, 2020-05-22; 1 file, -1/+1)
  Mostly in comments and docs, but some in diagnostic messages and one
  case of 'or die die'.

* Remove spurious double spaces before open braces in core C code  (Dagfinn Ilmari Mannsåker, 2020-04-13; 1 file, -1/+1)

* op.h: remove double space in struct op_argcheck_aux declaration  (Dagfinn Ilmari Mannsåker, 2020-04-13; 1 file, -1/+1)

* make freed op re-use closer to O(1)  (Tony Cook, 2020-03-02; 1 file, -1/+2)
  Previously, freed ops were stored as one singly linked list, and a failed
  search for a free op to re-use could potentially search that entire list,
  making freed-op lookups O(number of freed ops). Since the number of freed
  ops is roughly proportional to program size, the total cost of freed-op
  handling was roughly O((program size)**2). This was bad.

  This change makes opslab_freed into an array of linked list heads, one
  per op size. Since in a practical sense the number of op sizes should
  remain small, and insertion is amortized O(1), this makes freed-op
  management now roughly O(program size).

  Fixes #17555.

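  As a rough illustration of the data structure described above (a minimal
  sketch with illustrative names, not the core code itself): opslab_freed
  becomes an array of singly-linked list heads indexed by op size in
  pointer-sized units, so reusing a freed op is an array lookup plus a pop.

      #include <stddef.h>

      /* freed_lists[i] heads the chain of freed ops whose size is i words */
      typedef struct free_op_demo {
          struct free_op_demo *next;
      } FREE_OP_DEMO;

      static FREE_OP_DEMO *
      pop_freed_op(FREE_OP_DEMO **freed_lists, size_t n_lists,
                   size_t size_in_words)
      {
          FREE_OP_DEMO *o;
          if (size_in_words >= n_lists)
              return NULL;                           /* no list for this size */
          o = freed_lists[size_in_words];
          if (o)
              freed_lists[size_in_words] = o->next;  /* O(1) unlink */
          return o;
      }
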
* Restrict features in wildcards  (Karl Williamson, 2020-02-19; 1 file, -0/+4)
  The algorithm for dealing with Unicode property wildcards is to wrap the
  user-supplied pattern with /miaa. We don't want the user to be able to
  override the /m and /aa parts.

  Modifiers that are only specifiable as a modifier in a qr or similar op
  (like /gc) can't be included in things like (?gc). These normally incur a
  warning that they are ignored, but the texts of those warnings are
  misleading when using wildcards, so I chose to just make them illegal. Of
  course that could be changed to having custom useful warning texts, but I
  didn't think it was worth it.

  I also chose to forbid recursion via nested \p{}, just from fear that it
  might lead to issues down the road, and it really isn't useful for this
  limited universe of strings to match against. Because wildcards currently
  can't handle '}' inside them, only the single-letter \p, \P forms are
  valid anyway.

  Similarly, I forbid the '*' quantifier, to make it harder for the
  constructed subpattern to take forever to make any progress and decide to
  halt. Again, using it would be overkill on the universe of possible match
  strings.

* op.h: Move some flag bits down  (Karl Williamson, 2020-02-19; 1 file, -14/+14)
  This is in preparation for adding a new flag bit at the end in a future
  commit. It could have been added in the unused space that the first of
  these was moved to, but the new one is less important/used, so I thought
  it best to come last.

  The reason to use unused space is to preserve binary compatibility with
  the bits, and we don't care about that at this point in the development
  cycle.

* Reimplement tr/// without swashes  (Karl Williamson, 2019-11-06; 1 file, -1/+1)
  This large commit removes the last use of swashes from core. It replaces
  swashes by inversion maps. This data structure is already in use for some
  Unicode properties, such as case changing.

  The inversion map data structure leads to straightforward implementation
  code, so I collapsed the two doop.c routines do_trans_complex_utf8() and
  do_trans_simple_utf8() into one. A few conditionals could be avoided in
  the loop if this function were split so that one version didn't have to
  test for, e.g., squashing, but I suspect these are in the noise in the
  loop, which has to deal with UTF-8 conversions.

  This should be faster than the previous implementation anyway. I measured
  the differences some releases back, and inversion maps were faster than
  the equivalent swash for up to 512 or 1024 different ranges. These
  numbers are unlikely to be exceeded in tr/// except possibly in
  machine-generated ones.

  Inversion maps are capable of handling both UTF-8 and non-UTF-8 cases,
  but I left in the existing non-UTF-8 implementation, which uses tables,
  because I suspect it is faster. This means that there is extra code,
  purely for runtime performance. An inversion map is always created from
  the input, and then if the table implementation is to be used, the table
  is easily derived from the map.

  Prior to this commit, the table implementation was used in certain edge
  cases involving code points above 255. Those cases are now handled by the
  inversion map implementation, because it would have taken extra code to
  detect them, and I didn't think it was worth it. That could be changed if
  I am wrong.

  Creating an inversion map for all inputs essentially normalizes them, and
  then the same logic is usable for all. This fixes some false negatives in
  the previous implementation. It also allows for detecting if the actual
  transliteration can be done in place. Previously, the code mostly punted
  on that detection for the UTF-8 case. This also allows for accurate
  counting of the lengths of the two sides, fixing some longstanding TODO
  warning tests.

  A new flag is created, OPpTRANS_CAN_FORCE_UTF8, when the tr/// has a
  below-256 character resolving to one that requires UTF-8. If this isn't
  set, the code knows that a non-UTF-8 input won't become UTF-8 in the
  process, and so can take shortcuts. The bit representing this flag is the
  same as OPpTRANS_FROM_UTF, which is no longer used. That name is left in
  so that the dozen-ish modules on CPAN that refer to it can still compile.
  AFAICT none of them actually use the flag, as well they shouldn't, since
  it is private to the core.

  Inversion maps are ideally suited for tr/// implementations. An issue
  with them in general is that for some pathological data, they can become
  fragmented, requiring more space than you would expect to represent the
  underlying data. However, the typical tr/// would not have this issue,
  requiring only very short inversion maps to represent; in some cases
  shorter than the table implementation. Inversion maps are also easier to
  deparse than swashes. A deparse TODO was also fixed by this commit, and
  the code to deparse UTF-8 inputs is simplified.

  One could implement specialized data structures for specific types of
  inputs. For example, a common tr/// form is a single range, like
  tr/A-Z/a-z/. That could be implemented without a table and be quite fast.
  An intermediate step would be to use the inversion map implementation
  always when the transliteration is a single range, and then special-case
  length=1 maps at execution time.

  Thanks to Nicholas Rochemagne for his help on B.

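  For readers unfamiliar with the data structure: a minimal sketch of the
  inversion-map lookup idea referred to above (illustrative only; the
  core's inversion map code differs in detail). 'starts' holds the first
  code point of each range and 'maps' holds what that first code point maps
  to, so a code point cp falling in range i translates to
  maps[i] + (cp - starts[i]).

      #include <stddef.h>

      /* return the index of the range containing cp, i.e. the largest i
       * such that starts[i] <= cp, or -1 if cp is below the first range */
      static ptrdiff_t
      invmap_search(const unsigned *starts, size_t n, unsigned cp)
      {
          size_t lo = 0, hi = n;
          while (lo < hi) {
              size_t mid = lo + (hi - lo) / 2;
              if (starts[mid] <= cp)
                  lo = mid + 1;
              else
                  hi = mid;
          }
          return (ptrdiff_t)lo - 1;
      }
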
* op.h: Add synonyms for some tr/// values  (Karl Williamson, 2019-11-06; 1 file, -0/+3)

* Change names of some OPpTRANS flags  (Karl Williamson, 2019-11-06; 1 file, -2/+3)
  These two flags will shortly become obsolete, replaced by ones with
  different meanings. This commit makes the new names the normal ones, and
  makes the old names synonyms, so that code that refers to them can still
  compile.

* doop.c: Change out-of-bounds value  (Karl Williamson, 2019-11-06; 1 file, -0/+1)
  This currently uses 0xfeedface as a marker for something that isn't a
  legal value. But that could in fact become legal at some point. This
  defines a value TR_OOB that is guaranteed not to become legal.

* op.c, doop.c: Use mnemonics instead of numeric values  (Karl Williamson, 2019-11-06; 1 file, -0/+5)
  For legibility and maintainability.

* Change macro name in tr/// code  (Karl Williamson, 2019-11-06; 1 file, -0/+3)
  This makes it more mnemonic. Also add an explanation in toke.c.

* op.h: Remove obsolete #define  (Karl Williamson, 2019-11-03; 1 file, -4/+0)
  This is no longer used.

* On OP_READLINE, OPf_SPECIAL is set for <<>>, clear for <>.  (Nicholas Clark, 2019-11-02; 1 file, -0/+1)

* Remove indentation of no-longer #ifdef-guarded #defines  (Dagfinn Ilmari Mannsåker, 2019-10-17; 1 file, -7/+7)
  Commit 0f9a6232f0af0895807ddd0afae2d5512aa91bf9 removed the #ifdef
  PERL_OP_PARENT, but left the #define directives indented.

* Signatures: change param count from IV to UV  (David Mitchell, 2019-09-23; 1 file, -2/+2)
  For some reason I was storing the counts of sub signature parameters and
  optional parameters as signed ints. Since these can never be negative,
  change them to UV instead.

* OP_ARGCHECK: use custom aux struct  (David Mitchell, 2019-09-23; 1 file, -0/+8)
  This op is of class OP_UNOP_AUX. Ops of this class have an op_aux pointer
  which typically points to a variable-length malloced array of IVs, UVs,
  etc. However, in the specific case of OP_ARGCHECK the data stored in the
  aux struct is fixed. So this commit casts the aux pointer to a struct
  containing the relevant fields (number of parameters etc.), rather than
  referring to them as aux[0], aux[1] etc. This makes the code more
  readable.

  Should be no functional changes.

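  The struct described here has roughly the following shape (a sketch, not
  a verbatim copy of op.h; UV is stood in for by a plain typedef so the
  snippet is self-contained, and the _sketch suffix marks the names as
  illustrative):

      typedef unsigned long UV;   /* stand-in for perl's UV type */

      /* fixed-layout aux data for OP_ARGCHECK, instead of aux[0..2] */
      struct op_argcheck_aux_sketch {
          UV   params;        /* number of positional parameters */
          UV   opt_params;    /* number of optional positional parameters */
          char slurpy;        /* presence of a slurpy: '\0', '@' or '%' */
      };
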
* Un-revert "[MERGE] add+use si_cxsubix field"  (David Mitchell, 2019-09-23; 1 file, -1/+1)
  Original merge commit: v5.31.3-198-gd2cd363728
  Reverted by:           v5.31.4-0-g20ef288c53

  The commit following this commit fixes the breakage, which means the
  revert can be undone.

* Revert "[MERGE] add+use PL_curstackinfo->si_cxsubix field"v5.31.4Max Maischein2019-09-201-1/+1
| | | | | | | | | | | | This reverts commit d2cd363728088adada85312725ac9d96c29659be, reversing changes made to 068b48acd4bdf9e7c69b87f4ba838bdff035053c. This change breaks installing Test::Deep: ... not ok 37 - Test 'isa eq' completed ok 38 - Test 'isa eq' no premature diagnostication ...
* add Perl_gimme_V() static inline fn for GIMME_V  (David Mitchell, 2019-09-19; 1 file, -1/+1)
  This function makes use of PL_curstackinfo->si_cxsubix to avoid the
  overhead of a call to block_gimme() when the context of the op is
  unknown.

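  A sketch of the logic this describes, written against perl's internal
  API (assumes perl.h; treat the body as an approximation rather than the
  verbatim core function): return the op's own compile-time context if it
  has one, otherwise use the cached sub/eval context index instead of
  walking the context stack.

      /* G_* context values match the OPf_WANT_* encoding in modern perls */
      static U8
      gimme_V_sketch(pTHX)
      {
          U8 gimme = PL_op->op_flags & OPf_WANT;   /* void/scalar/list? */
          if (gimme)
              return gimme;                        /* op knows its context */
          if (PL_curstackinfo->si_cxsubix < 0)
              return G_VOID;                       /* not inside any sub/eval */
          return cxstack[PL_curstackinfo->si_cxsubix].blk_gimme & G_WANT;
      }
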
* Mark BHK macros as unorthodox  (Karl Williamson, 2019-09-02; 1 file, -5/+5)
  This is because they take a preprocessor token as an argument.

* OPSLOT: replace opslot_next with opslot_size  (David Mitchell, 2019-08-05; 1 file, -2/+1)
  Currently, each allocated opslot has a pointer to the opslot that was
  allocated immediately above it. Replace this with a U16 opslot_size field
  giving the size of the opslot. The next opslot can then be found by
  adding slot->opslot_size * sizeof(void*) to slot.

  This saves space.

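  Written out as a macro sketch (the macro name is an illustrative
  assumption), the replacement arithmetic is:

      /* next (higher-addressed) slot within the same slab */
      #define OpSLOT_NEXT_SKETCH(slot) \
          ((OPSLOT *)((char *)(slot) + (slot)->opslot_size * sizeof(void *)))
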
* struct opslot: document a field better  (David Mitchell, 2019-08-05; 1 file, -1/+1)

* opslabs: change opslab_first to opslab_free_space  (David Mitchell, 2019-08-05; 1 file, -1/+3)
  Currently an OPSLAB maintains a pointer to the lowest allocated OPSLOT
  within the slab (slots are allocated downwards). Replace this pointer
  with a U16 indicating how many pointer-sized words are free below the
  lowest allocated slot.

* OPSLAB: always have opslab_size field  (David Mitchell, 2019-08-05; 1 file, -1/+2)
  Currently this struct only has the opslab_size field on debugging builds.
  Change it so that this field is always present. This will make it easier
  to not need a fake partial OPSLOT at the end of the slab with a NULL
  opslot_next field, which will in turn simplify converting opslot_next
  into a U16 size field shortly.

* make opslot_slab an offset in current slab  (David Mitchell, 2019-08-05; 1 file, -3/+8)
  Each OPSLOT allocated within an OPSLAB contains a pointer, opslot_slab,
  which points back to the first (head) slab of the slab chain (i.e. not
  necessarily to the slab which the op is contained in).

  This commit changes the pointer to be a 16-bit offset from the start of
  the current slab, and adds a pointer at the start of each slab which
  points back to the head slab. The mapping from an op to the head slab is
  now a two-step process: use the op's slot's opslot_offset field to find
  the start of the current slab, then use that slab's new opslab_head
  pointer to find the head slab.

  The advantage of this is that it reduces the storage per op. (It probably
  doesn't make any practical difference yet, due to alignment issues, but
  that will be sorted shortly in this branch.)

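  The two-step mapping described above, sketched as macros (the names are
  illustrative assumptions rather than the exact op.h definitions):

      /* step 1: from a slot's 16-bit offset back to the slab containing it */
      #define OpSLAB_OF_SLOT_SKETCH(slot) \
          ((OPSLAB *)((char *)(slot) - (slot)->opslot_offset * sizeof(void *)))

      /* step 2: from that slab to the head slab of the chain */
      #define OpSLAB_HEAD_SKETCH(slot) \
          (OpSLAB_OF_SLOT_SKETCH(slot)->opslab_head)
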
* Don't use PL_check[op_type] to check for filetest ops to stack  (Dagfinn Ilmari Mannsåker, 2019-05-27; 1 file, -0/+2)
  This breaks hooking the filetest ops' check function by modules like
  bareword::filehandles. Instead use the OP_IS_FILETEST() macro to check
  for filetest ops.

  Also add an OP_IS_STAT() macro for when we want to check for (l)stat as
  well as the filetest ops.

  c.f. https://rt.cpan.org/Ticket/Display.html?id=127073

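  The two macros mentioned look roughly like this (a sketch of the
  generated definitions; the filetest ops occupy a contiguous range of op
  numbers, so a range check suffices):

      #define OP_IS_FILETEST_SKETCH(type) \
          ((type) >= OP_FTRREAD && (type) <= OP_FTBINARY)

      #define OP_IS_STAT_SKETCH(type) \
          (OP_IS_FILETEST_SKETCH(type) \
           || (type) == OP_STAT || (type) == OP_LSTAT)
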
* Eliminate opASSIGN macro usage from core  (David Mitchell, 2019-02-05; 1 file, -0/+5)
  This macro is defined as (PL_op->op_flags & OPf_STACKED) and indicates,
  for ops which support it, that the mutator-variant of the op is present
  (e.g. $x += 1).

  This macro was mainly used as an arg for the old-style overloading macros
  (tryAMAGICbin()) which were eliminated several years ago. This commit
  removes its vestigial usage, and instead tests OPf_STACKED directly at
  each location, along with adding a comment about the significance of the
  flag. This removes one item of obfuscation from the overloading code.

  There is one potentially functional change in this commit:
  Perl_try_amagic_bin() was sometimes testing for OPf_STACKED without first
  checking that it had been called with the AMGf_assign flag (which
  indicates that this op supports a mutator variant). With this commit, it
  now checks first, so this is theoretically a bug fix. In practice that
  section of code was never reached without AMGf_assign always being set
  anyway.

* PERL_OP_PARENT is always defined, stop testing for it  (Tony Cook, 2019-01-25; 1 file, -21/+2)
  PERL_OP_PARENT is the new reality; leaving the pre-processor checks in is
  more confusing than anything else.

  I left the test in perl.c for consistency with the other checks in that
  code.

* rmv/de-dup static const char array "strings"  (Daniel Dragan, 2018-03-07; 1 file, -7/+7)
  MSVC, due to a bug, doesn't merge identicals between .o'es or discard
  these vars and their contents.

  MEM_WRAP_CHECK_2 has never been used outside of core according to a cpan
  grep. MEM_WRAP_CHECK_2 was removed on the "have PERL_MALLOC_WRAP" branch
  in commit fabdb6c0879 "pre-likely cleanup" without explanation, probably
  because it was unused. But MEM_WRAP_CHECK_2 was still left on the "no
  PERL_MALLOC_WRAP" branch, so remove it from the "no" side for tidiness,
  since it was a mistake to leave it there if it was removed from the "yes"
  side of the #ifdef.

  Add a MEM_WRAP_CHECK_s API; the letter "s" means the argument is a string
  or static. This lets us get rid of the "%s" argument passed to
  Perl_croak_nocontext at a couple of call sites, since we fully control
  the next and only argument and it's guaranteed to be a string literal.
  This allows the linker to merge two "Out of memory during array extend"
  C strings. Also change the two op.h messages into macros which become
  string literals at their call sites, instead of the "read char * from a
  global char **" which was going on before.

  VC 2003 32b perl527.dll section sizes:

      before:
        .text   name  DE503 virtual size
        .rdata  name  4B621 virtual size
      after:
        .text   name  DE503 virtual size
        .rdata  name  4B5D1 virtual size

* op.h: remove spurious # define indent  (David Mitchell, 2018-02-27; 1 file, -1/+1)
  Whitespace-only change.

* Deprecate above \xFF in bitwise string ops  (Karl Williamson, 2018-01-19; 1 file, -0/+4)
  This is already a fatal error for operations whose outcome depends on
  them, but in things like "abc" & "def\x{100}" the wide character doesn't
  actually need to participate in the AND, and so perl doesn't.

  As a result of the discussion in the thread beginning with
  http://nntp.perl.org/group/perl.perl5.porters/244884, it was decided to
  deprecate these ones too.

* tr///: simplify $utf8 =~ tr/nonutf8/nonutf8/  (David Mitchell, 2018-01-19; 1 file, -11/+3)
  The run-time code to handle a non-utf8 tr/// against a utf8 string is
  complex, with many variants of similar code repeated depending on the
  presence of the /s and /c flags. Simplify them all into a single code
  block by changing how the translation table is stored.

  Formerly, the tr struct contained possibly two tables: the basic 0-255
  slot one, plus, in the presence of /c, a second one to map the implicit
  search range (\x{100}...) against any residual replacement chars not
  consumed by the first table.

  This commit merges the two tables into a single unified whole. For
  example,

      tr/\x00-\xfe/abcd/c

  is equivalent to

      tr/\xff-\x{7fffffff}/abcd/

  which generates a 259-entry translation table consisting of:

      0x00  => -1
      0x01  => -1
      ...
      0xfe  => -1
      0xff  => a
      0x100 => b
      0x101 => c
      0x102 => d

  In addition we store:

  1) the size of the translation table (0x103 in the example above);

  2) an extra 'wildcard' entry stored 1 slot beyond the main table, which
     specifies the action for any code points outside the range of the
     table (i.e. chars 0x103..0x7fffffff). This can be either:

     a) a character, when the last replacement char is repeated;
     b) -1 when /c isn't in effect;
     c) -2 when /d is in effect;
     d) -3 identity: when the replacement list is empty but not /d.

     In the example above, this would be

      0x103 => d

  The addition of -3 as a valid slot value is new.

  This makes the main runtime code for the utf8 string with non-utf8 tr///
  case look like, at its core:

      size = tbl->size;
      mapped_ch = tbl->map[ch >= size ? size : ch];

  which then processes mapped_ch based on whether it's >= 0, or -1/-2/-3.

  This is a lot simpler than the old scheme, and should generally be faster
  too.

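  A sketch of the resulting lookup and dispatch (the names given to the
  -1/-2/-3 markers below are illustrative, not the core's actual
  identifiers):

      enum { TR_UNMAPPED = -1, TR_DELETE = -2, TR_IDENTITY = -3 };

      /* returns the code point to emit, or -1 if the char is to be deleted */
      static long
      tr_map_sketch(const short *map, long size, long ch)
      {
          long mapped = map[ch >= size ? size : ch]; /* slot 'size' = wildcard */
          if (mapped >= 0)
              return mapped;        /* translate */
          if (mapped == TR_UNMAPPED)
              return ch;            /* not in the search list: unchanged */
          if (mapped == TR_IDENTITY)
              return ch;            /* empty replacement list, no /d:
                                       unchanged, but counts as matched */
          return -1;                /* TR_DELETE: /d in effect, drop the char */
      }
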
* tr///c: handle len(replacement charlist) > 32767  (David Mitchell, 2018-01-19; 1 file, -1/+1)
  RT #132608

  In the non-utf8 case, the /c (complement) flag to tr adds an implied
  \x{100}-\x{7fffffff} range to the search charlist. If the replacement
  list contains more chars than are paired with the 0-255 part of the
  search list, then the excess chars are stored in an extended part of the
  table. The excess char count was being stored as a short, which caused
  problems if the replacement list contained more than 32767 excess chars:
  either substituting the wrong char, or substituting for a char located up
  to 0xffff bytes in memory before the real translation table. So change it
  to SSize_t.

  Note that this is only a problem when the search and replacement
  charlists are non-utf8, the replacement list contains around 0x8000+
  entries, and where the string being translated is utf8 with at least one
  codepoint >= U+8000.

* add two structs for OP_TRANS  (David Mitchell, 2018-01-19; 1 file, -0/+17)
  Originally, the op_pv of an OP_TRANS op pointed to a 256-slot array of
  shorts, which contained the translations. However, in the presence of
  tr///c, extra information needs to be stored to handle utf8 strings. The
  256-slot array was extended, with slot 0x100 holding a length, and slots
  0x101 onwards holding some extra chars.

  This has made things a bit messy, so this commit adds two structs, one
  being an array of 256 shorts, and the other being the same but with some
  extra fields. So for example tbl->[0x100] has been replaced with
  tbl->excess_len.

  This commit should make no functional difference, but will allow us
  shortly to fix a bug by changing the type of the excess_len field from
  short to something bigger, for example.

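  The shape of the two structs being described, as a sketch (the struct
  names and any field other than excess_len are illustrative assumptions;
  the commit text only specifies the layout idea):

      /* plain table: tr/// with no excess data needed */
      typedef struct {
          short map[256];
      } trans_map_sketch;

      /* extended table for tr///c against utf8 strings: the old magic
       * slots 0x100 and 0x101+ become named fields */
      typedef struct {
          short map[256];
          short excess_len;   /* was slot 0x100: number of excess chars */
          short excess[1];    /* was slots 0x101+: variable-length tail */
      } trans_map_ex_sketch;
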
* revert smartmatch to 5.27.6 behaviour  (Zefram, 2017-12-29; 1 file, -0/+3)
  The pumpking has determined that the CPAN breakage caused by changing
  smartmatch [perl #132594] is too great for the smartmatch changes to stay
  in for 5.28. This reverts most of the merge in commit
  da4e040f42421764ef069371d77c008e6b801f45. All core behaviour and
  documentation is reverted. The removal of use of smartmatch from a couple
  of tests (that aren't testing smartmatch) remains. Customisation of a
  couple of CPAN modules to make them portable across smartmatch types
  remains. A small bugfix in scope.c also remains.

* remove useless "default" mechanism  (Zefram, 2017-11-28; 1 file, -2/+0)

* drop op flag for implicit smartmatch  (Zefram, 2017-11-22; 1 file, -1/+0)
  OPf_SPECIAL on a smartmatch op used to indicate that it was an implicit
  smartmatch in a "when" construct. "when" no longer implies smartmatch, so
  drop the comment about this flag and the special handling in B::Deparse.

* rename op_aux field from 'size' to 'ssize'  (David Mitchell, 2017-11-13; 1 file, -1/+1)
  This part of the op_aux union was added for OP_MULTICONCAT; it's actually
  of type SSize_t, so rename it to ssize to better reflect that it's
  signed. This should make no functional difference.

* Add OP_MULTICONCAT op  (David Mitchell, 2017-10-31; 1 file, -0/+2)
  Allow multiple OP_CONCAT, OP_CONST ops, plus optionally an OP_SASSIGN or
  OP_STRINGIFY, to be combined into a single OP_MULTICONCAT op, which can
  make things a *lot* faster: 4x or more.

  In more detail: it will optimise into a single OP_MULTICONCAT most
  expressions of the form

      LHS RHS

  where LHS is one of

      (empty)
      my $lexical =
      $lexical    =
      $lexical   .=
      expression  =
      expression .=

  and RHS is one of

      (A . B . C . ...)    where A, B, C etc are expressions and/or
                           string constants

      "aAbBc..."           where a, A, b, B etc are expressions and/or
                           string constants

      sprintf "..%s..%s..", A, B, ...
                           where the format is a constant string containing
                           only '%s' and '%%' elements, and A, B, etc are
                           scalar expressions (so only a fixed,
                           compile-time-known number of args: no arrays or
                           list-context function calls etc)

  It doesn't optimise other forms, such as

      ($a . $b) . ($c . $d)
      ((($a .= $b) .= $c) .= $d);

  (although sub-parts of those expressions might be converted to an
  OP_MULTICONCAT). This is partly because it would be hard to maintain the
  correct ordering of tie or overload calls.

  The compiler uses heuristics to determine when to convert: in general,
  expressions involving a single OP_CONCAT aren't converted, unless some
  other saving can be made, for example if an OP_CONST can be eliminated,
  or in the presence of 'my $x = .. ' which OP_MULTICONCAT can apply
  OPpTARGET_MY to, but OP_CONST can't.

  The multiconcat op is of type UNOP_AUX, with the op_aux structure
  directly holding a pointer to a single constant char* string plus a list
  of segment lengths. So for

      "a=$a b=$b\n";

  the constant string is "a= b=\n", and the segment lengths are (2,3,1). If
  the constant string has different non-utf8 and utf8 representations (such
  as "\x80") then both variants are pre-computed and stored in the aux
  struct, along with two sets of segment lengths.

  For all the above LHS types, any SASSIGN op is optimised away. For a LHS
  of '$lex=', '$lex.=' or 'my $lex=', the PADSV is optimised away too.

  For example where $a and $b are lexical vars, this statement:

      my $c = "a=$a, b=$b\n";

  formerly compiled to

      const[PV "a="] s
      padsv[$a:1,3] s
      concat[t4] sK/2
      const[PV ", b="] s
      concat[t5] sKS/2
      padsv[$b:1,3] s
      concat[t6] sKS/2
      const[PV "\n"] s
      concat[t7] sKS/2
      padsv[$c:2,3] sRM*/LVINTRO
      sassign vKS/2

  and now compiles to:

      padsv[$a:1,3] s
      padsv[$b:1,3] s
      multiconcat("a=, b=\n",2,4,1)[$c:2,3] vK/LVINTRO,TARGMY,STRINGIFY

  In terms of how much faster it is, this code:

      my $a = "the quick brown fox jumps over the lazy dog";
      my $b = "to be, or not to be; sorry, what was the question again?";

      for my $i (1..10_000_000) {
          my $c = "a=$a, b=$b\n";
      }

  runs 2.7 times faster, and if you throw utf8 mixtures in it gets even
  better. This loop runs 4 times faster:

      my $s;
      my $a = "ab\x{100}cde";
      my $b = "fghij";
      my $c = "\x{101}klmn";

      for my $i (1..10_000_000) {
          $s = "\x{100}wxyz";
          $s .= "foo=$a bar=$b baz=$c";
      }

  The main ways in which OP_MULTICONCAT gains its speed are:

  * Any OP_CONSTs are eliminated, and the constant bits (already in the
    right encoding) are copied directly from the constant string attached
    to the op's aux structure.

  * It optimises away any SASSIGN op, and possibly a PADSV op on the LHS,
    in all cases; OP_CONCAT only did this in very limited circumstances.

  * Because it has a holistic view of the entire concatenation expression,
    it can do the whole thing in one efficient go, rather than creating and
    copying intermediate results. pp_multiconcat() goes to considerable
    efforts to avoid inefficiencies. For example it will only SvGROW() the
    target once, and to the exact size needed, no matter what mix of utf8
    and non-utf8 appear on the LHS and RHS. It never allocates any
    temporary SVs except possibly in the case of tie or overloading.

  * It does all its own appending and utf8 handling rather than calling out
    to functions like sv_catsv().

  * It's very good at handling the LHS appearing on the RHS; for example in

        $x = "abcd";
        $x = "-$x-$x-";

    it will do roughly the equivalent of the following (where targ is $x):

        SvPV_force(targ);
        SvGROW(targ, 11);
        p = SvPVX(targ);
        Move(p, p+1, 4, char);
        Copy("-", p, 1, char);
        Copy("-", p+5, 1, char);
        Copy(p+1, p+6, 4, char);
        Copy("-", p+10, 1, char);
        SvCUR(targ) = 11;
        p[11] = '\0';

    Formerly, pp_concat would have used multiple PADTMPs or temporary SVs
    to handle situations like that.

  The code is quite big; both S_maybe_multiconcat() and pp_multiconcat()
  (the main compile-time and runtime parts of the implementation) are over
  700 lines each. It turns out that when you combine multiple ops, the
  number of edge cases grows exponentially ;-)

* Fatalize the use of code points above 0xFF for bitwise operators.  (Abigail, 2017-06-07; 1 file, -3/+2)
  This commit removes quite a number of tests, mostly from t/op/bop.t,
  which test the behaviour of such code points in combination with bitwise
  operators. Since it's now fatal, the tests are no longer useful.

* Improve handling of pattern compilation errors  (Karl Williamson, 2017-02-14; 1 file, -0/+4)
  Perl tries to continue parsing in the face of errors for the convenience
  of the person running the script, so as to batch up as many errors as
  possible and cut down the number of runs. Some errors will, however, have
  a cascading effect, resulting in the parser getting confused as to the
  intent. Perl currently aborts parsing if 10 errors accumulate.

  However, some things are reparsed as compilation continues, in particular
  tr///, s///, and qr//. The code that reparses has an expectation of basic
  sanity in what it is looking at, and so reparsing with known errors can
  lead to segfaults. Recent commits have tightened this up to avoid
  reparsing, or to substitute valid stuff before reparsing. This all works,
  as the code won't execute until all the errors get fixed.

  Commit f065e1e68bf6a5541c8ceba8c9fcc6e18f51a32b changed things so that if
  there is an error in parsing a pattern, the whole compilation is
  immediately aborted. Since then, I realized it would be relatively simple
  to instead skip compilation of that particular pattern, but continue on
  with the parsing of the program as a whole, up to the maximum number of
  allowed errors. And again the program will refuse to execute after
  compilation if there were any errors.

  This commit implements that, the benefit being that we don't try to
  reparse a pattern that failed the original parse, but can go on to find
  errors elsewhere in the program.

* OP_CLASS() docs - mention op_class() too  (David Mitchell, 2017-01-24; 1 file, -1/+4)

* add Perl_op_class(o) API function  (David Mitchell, 2017-01-21; 1 file, -0/+18)
  Given an op, this function determines what type of struct it has been
  allocated as. Returns one of the OPclass enums, such as OPclass_LISTOP.

  Originally this was a static function in B.xs, but it has wider
  applicability; indeed several XS modules on CPAN have cut and pasted it.
  It adds the OPclass enum to op.h. In B.xs there was a similar enum, but
  with names like OPc_LISTOP. I've renamed them to OPclass_LISTOP etc. so
  as not to clash with the cut+paste code already on CPAN.

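  The enum this adds looks roughly as follows (a sketch; consult op.h for
  the authoritative list, and the _sketch suffix marks it as illustrative).
  op_class() returns one of these values so callers can tell which struct
  an op was allocated as:

      typedef enum {
          OPclass_NULL,       /* the NULL op */
          OPclass_BASEOP,
          OPclass_UNOP,
          OPclass_BINOP,
          OPclass_LOGOP,
          OPclass_LISTOP,
          OPclass_PMOP,
          OPclass_SVOP,
          OPclass_PADOP,
          OPclass_PVOP,
          OPclass_LOOP,
          OPclass_COP,
          OPclass_METHOP,
          OPclass_UNOP_AUX
      } OPclass_sketch;

      /* typical use (inside perl/XS code):
       *     if (op_class(o) == OPclass_LISTOP)
       *         ... safe to treat o as a LISTOP ...
       */
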
* String bitwise operators will not accept code points > 0xFF in 5.28  (Abigail, 2017-01-16; 1 file, -1/+2)

* op.h: add parens around macro expansion  (Lukas Mai, 2016-11-14; 1 file, -1/+1)

* make OP_SPLIT a PMOP, and eliminate OP_PUSHRE  (David Mitchell, 2016-10-04; 1 file, -6/+2)
  Most ops that execute a regex, such as match and subst, are of type PMOP.
  A PMOP allows the actual regex to be attached directly to that op, due to
  its extra fields. OP_SPLIT is different; it is just a plain LISTOP, but
  it always has an OP_PUSHRE as its first child, which *is* a PMOP and
  which has the regex attached.

  At runtime, pp_pushre()'s only job is to push itself (i.e. the current
  PL_op) onto the stack. Later pp_split() pops this to get access to the
  regex it wants to execute. This is a bit unpleasant, because we're
  pushing an OP* onto the stack, which is supposed to be an array of SV*'s.
  As a bit of a hack, on DEBUGGING builds we push a PVLV with the PL_op
  address embedded instead, but this still isn't very satisfactory.

  Now that regexes are first-class SVs, we could push a REGEXP onto the
  stack rather than PL_op. However, there is an optimisation of
  @array = split which eliminates the assign and embeds the array's
  GV/padix directly in the PUSHRE op. So split still needs access to that
  op. But the pushre op will always be splitop->op_first anyway, so one
  possibility is to just skip executing the pushre altogether, and make
  pp_split just directly access op_first instead to get the regex and
  @array info.

  But if we're doing that, then why not just go the full hog and make
  OP_SPLIT into a PMOP, and eliminate the OP_PUSHRE op entirely: with the
  data that was spread across the two ops now combined into just the one
  split op. That is exactly what this commit does.

  For a simple compile-time pattern like split(/foo/, $s, 1), the optree
  looks like:

  before:

      <@> split[t2] lK
          </> pushre(/"foo"/) s/RTIME
          <0> padsv[$s:1,2] s
          <$> const(IV 1) s

  after:

      </> split(/"foo"/)[t2] lK/RTIME
          <0> padsv[$s:1,2] s
          <$> const[IV 1] s

  while for a run-time expression like split(/$pat/, $s, 1),

  before:

      <@> split[t3] lK
          </> pushre() sK/RTIME
              <|> regcomp(other->8) sK
                  <0> padsv[$pat:2,3] s
          <0> padsv[$s:1,3] s
          <$> const(IV 1) s

  after:

      </> split()[t3] lK/RTIME
          <|> regcomp(other->8) sK
              <0> padsv[$pat:2,3] s
          <0> padsv[$s:1,3] s
          <$> const[IV 1] s

  This makes the code faster and simpler.

  At the same time, two new private flags have been added for OP_SPLIT -
  OPpSPLIT_ASSIGN and OPpSPLIT_LEX - which make it explicit that the assign
  op has been optimised away, and if so, whether the array is lexical.

  Also, deparsing of split has been improved, to the extent that

      perl TEST -deparse op/split.t

  now passes.

  Also, a couple of panic messages in pp_split() have been replaced with
  asserts().

* Improve code comments for some ctx stuff  (David Mitchell, 2016-03-30; 1 file, -1/+2)
  * in pp_return(), some comments were out of date about how
    leave_adjust_stacks() is called;
  * add a comment to all the functions that pp_return() tail-calls, to the
    effect that they can be tail-called;
  * make it clearer when/why OPf_SPECIAL is set on OP_LEAVE;
  * CXt_LOOP_PLAIN can be a while loop as well as a plain block.

* convert CX_PUSHEVAL/POPEVAL to inline fns  (David Mitchell, 2016-02-03; 1 file, -1/+1)
  Replace CX_PUSHEVAL() with cx_pusheval() etc. No functional changes.

* rename PUSHBLOCK,PUSHSUB etc to CX_PUSHBLOCK etc  (David Mitchell, 2016-02-03; 1 file, -1/+1)
  Earlier, all the POPFOO macros were renamed to CX_POPFOO to reflect the
  changed API (like POPBLOCK no longer decrementing cxstack_ix). Now rename
  the PUSH ones for consistency.