summaryrefslogtreecommitdiff
path: root/op.h
Commit message (Collapse)AuthorAgeFilesLines
* op.h: Corrent comment about entersub stricturesFather Chrysostomos2013-06-031-1/+1
| | | | Flag 2 on entersub is HINT_STRICT_REFS, not _SUBS.
* Eliminate PL_reg_state.re_reparsing, part 1David Mitchell2013-04-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PL_reg_state.re_reparsing is a hacky flag used to allow runtime code blocks to be included in patterns. Basically, since code blocks are now handled by the perl parser within literal patterns, runtime patterns are handled by taking the (assembled at runtime) pattern, and feeding it back through the parser via the equivalent of eval q{qr'the_pattern'}, so that run-time (?{..})'s appear to be literal code blocks. When this happens, the global flag PL_reg_state.re_reparsing is set, which modifies lexing and parsing in minor ways (such as whether \\ is stripped). Now, I'm in the slow process of trying to eliminate global regex state (i.e. gradually removing the fields of PL_reg_state), and also a change which will be coming a few commits ahead requires the info which this flag indicates to linger for longer (currently it is cleared immediately after the call to scan_str(). For those two reasons, this commit adds a new mechanism to indicate this: a new flag to eval_sv(), G_RE_REPARSING (which sets OPpEVAL_RE_REPARSING in the entereval op), which sets the EVAL_RE_REPARSING bit in PL_in_eval. Its still a yukky global flag hack, but its a *different* global flag hack now. For this commit, we add the new flag(s) but keep the old PL_reg_state.re_reparsing flag and assert that the two mechanisms always match. The next commit will remove re_reparsing.
* rework split() special case interaction with regex engineYves Orton2013-03-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch resolves several issues at once. The parts are sufficiently interconnected that it is hard to break it down into smaller commits. The tickets open for these issues are: RT #94490 - split and constant folding RT #116086 - split "\x20" doesn't work as documented It additionally corrects some issues with cached regexes that were exposed by the split changes (and applied to them). It effectively reverts 5255171e6cd0accee6f76ea2980e32b3b5b8e171 and cccd1425414e6518c1fc8b7bcaccfb119320c513. Prior to this patch the special RXf_SKIPWHITE behavior of split(" ", $thing) was only available if Perl could resolve the first argument to split at compile time, meaning under various arcane situations. This manifested as oddities like my $delim = $cond ? " " : qr/\s+/; split $delim, $string; and split $cond ? " ", qr/\s+/, $string not behaving the same as: ($cond ? split(" ", $string) : split(/\s+/, $string)) which isn't very convenient. This patch changes this by adding a new flag to the op_pmflags, PMf_SPLIT which enables pp_regcomp() to know whether it was called as part of split, which allows the RXf_SPLIT to be passed into run time regex compilation. We also preserve the original flags so pattern caching works properly, by adding a new property to the regexp structure, "compflags", and related macros for accessing it. We preserve the original flags passed into the compilation process, so we can compare when we are trying to decide if we need to recompile. Note that this essentially the opposite fix from the one applied originally to fix #94490 in 5255171e6cd0accee6f76ea2980e32b3b5b8e171. The reverted patch was meant to make: split( 0 || " ", $thing ) #1 consistent with my $x=0; split( $x || " ", $thing ) #2 and not with split( " ", $thing ) #3 This was reverted because it broke C<split("\x{20}", $thing)>, and because one might argue that is not that #1 does the wrong thing, but rather that the behavior of #2 that is wrong. In other words we might expect that all three should behave the same as #3, and that instead of "fixing" the behavior of #1 to be like #2, we should really fix the behavior of #2 to behave like #3. (Which is what we did.) Also, it doesn't make sense to move the special case detection logic further from the regex engine. We really want the regex engine to decide this stuff itself, otherwise split " ", ... wouldn't work properly with an alternate engine. (Imagine we add a special regexp meta pattern that behaves the same as " " does in a split /.../. For instance we might make split /(*SPLITWHITE)/ trigger the same behavior as split " ". The other major change as result of this patch is it effectively reverts commit cccd1425414e6518c1fc8b7bcaccfb119320c513, which was intended to get rid of RXf_SPLIT and RXf_SKIPWHITE, which and free up bits in the regex flags structure. But we dont want to get rid of these vars, and it turns out that RXf_SEEN_LOOKBEHIND is used only in the same situation as the new RXf_MODIFIES_VARS. So I have renamed RXf_SEEN_LOOKBEHIND to RXf_NO_INPLACE_SUBST, and then instead of using two vars we use only the one. Which in turn allows RXf_SPLIT and RXf_SKIPWHITE to have their bits back.
* make m?$pat? match only once under ithreadsDavid Mitchell2013-01-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [perl #115080] m?...? is only supposed to match once, until reset. Normally this is done by setting the PMf_USED flag on the PMOP. Under ithreads we can't modify ops, so instead we indicate by setting the regex's SV to readonly. (This is a bit of a hack: the flag should be associated with the PMOP, not the regex). This breaks with run-time regexes when the pattern gets recompiled; for example: for my $c (qw(a b c)) { print "matched $c\n" if $c =~ m?^$c$?; } outputs matched a on unthreaded, but matched a matched b matched c on threaded. The re_eval jumbo fix made this more noticeable by sometimes recompiling even when the pattern text hasn't changed (to make closures work ok). The quick fix is to propagate the readonlyness of the old re to the new re. (The proper fix would be to store the flag state in a pad slot associated with the PMOP). Needless to say, I've gone for the quick fix.
* fix warning in PmopSTASH_set()David Mitchell2012-12-101-1/+1
| | | | | | | In the threaded version of PmopSTASH_set(), the assigned value is a PADOFFSET, not a pointer; so use 0 rather than NULL for the default value. This keeps clang happy.
* Stop renamed packages from making reset() crashFather Chrysostomos2012-12-051-34/+11
| | | | | | | | | | | | | | | | | | This only affected threaded builds. I think the comments in the added test explain well enough what was happening. The solution is to store a stashpad offset in the pmop, instead of the name of the stash. This is similar to what was done with cop stashes in d4d03940c58a. Not only does this fix the crash, but it also makes compilation faster and saves memory (no separate malloc for every m?pat?). I had to move Safefree(PL_stashpad) later on in perl_destruct, because freeing a pmop causes the PL_stashpad to be accessed, and pmops can be freed during sv_clean_all. Its previous location was not a problem for cops, as PL_stashpad[cop->cop_stashoff] is only accessed when PL_curcop==that_cop and Perl code is running, not when cops are freed.
* Don’t share TARGs between recursive opsFather Chrysostomos2012-11-271-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | I had to change the definition of IS_PADCONST to account for the SVf_IsCOW flag. Previously, anything marked READONLY would be consid- ered a pad constant, to be shared by pads of different recursion lev- els. Some of those READONLY things were not actually read-only, as they were copy-on-write scalars, which are never read-only. So I changed the definition of IS_PADCONST in e3918bb703c to accept COWs as well as read-only scalars, since I was removing the READONLY flag from COWs. With the new copy-on-write scheme, it is easy for a TARG to turn into a COW. If that happens and then the same subroutine calls itself recursively for the first time after that, pad_push will see that this is a pad ‘constant’ and allow the next recursion level to share it. If pp_concat calls itself recursively, the recursive call can modify the scalar the outer call is in the middle of using, causing the return value to be doubled up (‘tmptmp’) in the test case added here. Since pad constants are marked PADTMP (I would like to change that eventually), there is no way to distinguish them from TARGs when the are COWs, except for the fact that pad constants that are COWs are always shared hash keys (SvLEN==0).
* SVf_IsCOWFather Chrysostomos2012-11-141-1/+1
| | | | | | | | | | | | | | | As discussed in ticket #114820, instead of using READONLY+FAKE to mark a copy-on-write string, we should make it a separate flag. There are many modules in CPAN (and 1 in core, Compress::Raw::Zlib) that assume that SvREADONLY means read-only. Only one CPAN module, POSIX::pselect will definitely be broken by this. Others may need to be tweaked. But I believe this is for the better. It causes all tests except ext/Devel-Peek/t/Peek.t (which needs a tiny tweak still) to pass under PERL_OLD_COPY_ON_WRITE, which is a prereq- uisite for any new COW scheme that creates COWs under the same cir- cumstances.
* padrange: handle @_ directlyDavid Mitchell2012-11-101-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | In a construct like my ($x,$y) = @_ the pushmark/padsv/padsv is already optimised into a single padrange op. This commit makes the OPf_SPECIAL flag on the padrange op indicate that in addition, @_ should be pushed onto the stack, skipping an additional pushmark/gv[*_]/rv2sv combination. So in total (including the earlier padrange work), the above construct goes from being 3 <0> pushmark s 4 <$> gv(*_) s 5 <1> rv2av[t3] lK/1 6 <0> pushmark sRM*/128 7 <0> padsv[$x:1,2] lRM*/LVINTRO 8 <0> padsv[$y:1,2] lRM*/LVINTRO 9 <2> aassign[t4] vKS to 3 <0> padrange[$x:1,2; $y:1,2] l*/LVINTRO,2 ->4 4 <2> aassign[t4] vKS
* add SAVEt_CLEARPADRANGEDavid Mitchell2012-11-101-1/+2
| | | | | Add a new save type that does the equivalent of multiple SAVEt_CLEARSV's for a given target range. This makes the new padange op more efficient.
* add padrange opDavid Mitchell2012-11-101-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This single op can, in some circumstances, replace the sequence of a pushmark followed by one or more padsv/padav/padhv ops, and possibly a trailing 'list' op, but only where the targs of the pad ops form a continuous range. This is generally more efficient, but is particularly so in the case of void-context my declarations, such as: my ($a,@b); Formerly this would be executed as the following set of ops: pushmark pushes a new mark padsv[$a] pushes $a, does a SAVEt_CLEARSV padav[@b] pushes all the flattened elements (i.e. none) of @a, does a SAVEt_CLEARSV list pops the mark, and pops all stack elements except the last nextstate pops the remaining stack element It's now: padrange[$a..@b] does two SAVEt_CLEARSV's nextstate nothing needing doing to the stack Note that in the case above, this commit changes user-visible behaviour in pathological cases; in particular, it has always been possible to modify a lexical var *before* the my is executed, using goto or closure tricks. So in principle someone could tie an array, then could notice that FETCH is no longer being called, e.g. f(); my ($s, @a); # this no longer triggers two FETCHES sub f { tie @a, ...; push @a, 1,2; } But I think we can live with that. Note also that having a padrange operator will allow us shortly to have a corresponding SAVEt_CLEARPADRANGE save type, that will replace multiple individual SAVEt_CLEARSV's.
* simplify GIMME_VDavid Mitchell2012-11-041-4/+2
| | | | | | Since OPf_WANT_VOID == G_VOID etc, we can substantially simplify the OP_GIMME macro (which is what implements GIMME_V). This saves 588 bytes in the perl executable on my -O2 system.
* Re-enable static op allocation with obslabReini Urban2012-10-251-2/+5
| | | | | | obslab and the removal of the op_latefree logic, which allowed static ops, removed support for the compiler modules, which allocates ops statically. Add an op_static flag to replace the old latefree(d) op_free logic.
* Remove PMf_MAYBE_CONSTFather Chrysostomos2012-10-111-3/+0
| | | | It was added in ce862d02d but has never been used.
* [perl #94490] const fold should not trigger special split " "Father Chrysostomos2012-09-221-1/+1
| | | | | | | | | | | The easiest way to fix this was to move the special handling out of the regexp engine. Now a flag is set on the split op itself for this case. A real regexp is still created, as that is the most convenient way to propagate locale settings, and it prevents the need to rework pp_split to handle a null regexp. This also means that custom regexp plugins no longer need to handle split specially (which they all do currently).
* Correct typo in flag nameFather Chrysostomos2012-08-251-1/+1
|
* Banish boolkeysFather Chrysostomos2012-08-251-0/+2
| | | | | | | | | | | | | | Since 6ea72b3a1, rv2hv and padhv have had the ability to return boo- leans in scalar context, instead of bucket stats, if flagged the right way. sub { %hash || ... } is optimised to take advantage of this. If the || is in unknown context at compile time, the %hash is flagged as being maybe a true boolean. When flagged that way, it returns a bool- ean if block_gimme() returns G_VOID. If rv2hv and padhv can already do this, then we don’t need the boolkeys op any more. We can just flag the rv2hv to return a boolean. In all the cases where boolkeys was used, we know at compile time that it is true boolean context, so we add a new flag for that.
* Optimise %hash in sub { %hash || ... }Father Chrysostomos2012-08-251-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In %hash || $foo, the %hash is in scalar context, so it has to iterate through the buckets to produce statistics on bucket usage. If the || is in void context, the value returned by hash is only ever used as a boolean (as || doesn’t have to return it). We already opti- mise it by adding a boolkeys op when it is known at compile time that || will be in void context. In sub { %hash || $foo } it is not known at compile time that it will be in void context, so it wasn’t optimised. This commit optimises it by flagging the %hash at compile time as being possibly in ‘true boolean’ context. When that flag is set, the rv2hv and padhv ops call block_gimme() to see whether || is in void context. This speeds things up signficantly. Here is what I got after optimis- ing rv2hv but before doing padhv: $ time ./miniperl -e '%hash = 1..10000; sub { %hash || 1 }->() for 1..100000' real 0m0.179s user 0m0.101s sys 0m0.005s $ time ./miniperl -e 'my %hash = 1..10000; sub { %hash || 1 }->() for 1..100000' real 0m5.446s user 0m2.419s sys 0m0.015s (That example is slightly misleading because of the closure, but the closure version takes 1 sec. when optimised.)
* assert_(...)Father Chrysostomos2012-08-051-5/+1
| | | | | | | | This new macro expands to ‘assert(...),’ (with a trailing comma) under debugging builds; the empty string otherwise. It allows for the removal of some #ifdef DEBUGGINGs, which could not be avoided otherwise.
* Remove op_latefree(d)Father Chrysostomos2012-07-141-15/+2
| | | | | | | This was an early attempt to fix leaking of ops after syntax errors, disabled because it was deemed to fragile. The new slab allocator (8be227a) has solved this problem another way, so latefree(d) no longer serves any purpose.
* Eliminate PL_OP_SLAB_ALLOCFather Chrysostomos2012-07-121-1/+1
| | | | | | | | | | | | This commit eliminates the old slab allocator. It had bugs in it, in that ops would not be cleaned up properly after syntax errors. So why not fix it? Well, the new slab allocator *is* the old one fixed. Now that this is gone, we don’t have to worry as much about ops leak- ing when errors occur, because it won’t happen any more. Recent commits eliminated the only reason to hang on to it: PERL_DEBUG_READONLY_OPS required it.
* PERL_DEBUG_READONLY_OPS with the new allocatorFather Chrysostomos2012-07-121-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | I want to eliminate the old slab allocator (PL_OP_SLAB_ALLOC), but this useful debugging tool needs to be rewritten for the new one first. This is slightly better than under PL_OP_SLAB_ALLOC, in that CVs cre- ated after the main CV starts running will get read-only ops, too. It is when a CV finishes compiling and relinquishes ownership of the slab that the slab is made read-only, because at that point it should not be used again for allocation. BEGIN blocks are exempt, as they are processed before the Slab_to_ro call in newATTRSUB. The Slab_to_ro call must come at the very end, after LEAVE_SCOPE, because otherwise the ops freed via the stack (the SAVEFREEOP calls near the top of newATTRSUB) will make the slab writa- ble again. At that point, the BEGIN block has already been run and its slab freed. Maybe slabs belonging to BEGIN blocks can be made read-only later. Under PERL_DEBUG_READONLY_OPS, op slabs have two extra fields to record the size and readonliness of each slab. (Only the first slab in a CV’s slab chain uses the readonly flag, since it is conceptually simpler to treat them all as one unit.) Without recording this infor- mation manually, things become unbearably slow, the tests taking hours and hours instead of minutes.
* Record folded constants in the op treeFather Chrysostomos2012-07-041-0/+1
|
* CV-based slab allocation for opsFather Chrysostomos2012-06-291-10/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This addresses bugs #111462 and #112312 and part of #107000. When a longjmp occurs during lexing, parsing or compilation, any ops in C autos that are not referenced anywhere are leaked. This commit introduces op slabs that are attached to the currently- compiling CV. New ops are allocated on the slab. When an error occurs and the CV is freed, any ops remaining are freed. This is based on Nick Ing-Simmons’ old experimental op slab implemen- tation, but it had to be rewritten to work this way. The old slab allocator has a pointer before each op that points to a reference count stored at the beginning of the slab. Freed ops are never reused. When the last op on a slab is freed, the slab itself is freed. When a slab fills up, a new one is created. To allow iteration through the slab to free everything, I had to have two pointers; one points to the next item (op slot); the other points to the slab, for accessing the reference count. Ops come in different sizes, so adding sizeof(OP) to a pointer won’t work. The old slab allocator puts the ops at the end of the slab first, the idea being that the leaves are allocated first, so the order will be cache-friendly as a result. I have preserved that order for a dif- ferent reason: We don’t need to store the size of the slab (slabs vary in size; see below) if we can simply follow pointers to find the last op. I tried eliminating reference counts altogether, by having all ops implicitly attached to PL_compcv when allocated and freed when the CV is freed. That also allowed op_free to skip FreeOp altogether, free- ing ops faster. But that doesn’t work in those cases where ops need to survive beyond their CVs; e.g., re-evals. The CV also has to have a reference count on the slab. Sometimes the first op created is immediately freed. If the reference count of the slab reaches 0, then it will be freed with the CV still point- ing to it. CVs use the new CVf_SLABBED flag to indicate that the CV has a refer- ence count on the slab. When this flag is set, the slab is accessible via CvSTART when CvROOT is not set, or by subtracting two pointers (2*sizeof(I32 *)) from CvROOT when it is set. I decided to sneak the slab into CvSTART during compilation, because enlarging the xpvcv struct by another pointer would make all CVs larger, even though this patch only benefits few (programs using string eval). When the CVf_SLABBED flag is set, the CV takes responsibility for freeing the slab. If CvROOT is not set when the CV is freed or undeffed, it is assumed that a compilation error has occurred, so the op slab is traversed and all the ops are freed. Under normal circumstances, the CV forgets about its slab (decrement- ing the reference count) when the root is attached. So the slab ref- erence counting that happens when ops are freed takes care of free- ing the slab. In some cases, the CV is told to forget about the slab (cv_forget_slab) precisely so that the ops can survive after the CV is done away with. Forgetting the slab when the root is attached is not strictly neces- sary, but avoids potential problems with CvROOT being written over. There is code all over the place, both in core and on CPAN, that does things with CvROOT, so forgetting the slab makes things more robust and avoids potential problems. Since the CV takes ownership of its slab when flagged, that flag is never copied when a CV is cloned, as one CV could free a slab that another CV still points to, since forced freeing of ops ignores the reference count (but asserts that it looks right). To avoid slab fragmentation, freed ops are marked as freed and attached to the slab’s freed chain (an idea stolen from DBM::Deep). Those freed ops are reused when possible. I did consider not reusing freed ops, but realised that would result in significantly higher mem- ory using for programs with large ‘if (DEBUG) {...}’ blocks. SAVEFREEOP was slightly problematic. Sometimes it can cause an op to be freed after its CV. If the CV has forcibly freed the ops on its slab and the slab itself, then we will be fiddling with a freed slab. Making SAVEFREEOP a no-op won’t help, as sometimes an op can be savefreed when there is no compilation error, so the op would never be freed. It holds a reference count on the slab, so the whole slab would leak. So SAVEFREEOP now sets a special flag on the op (->op_savefree). The forced freeing of ops after a compilation error won’t free any ops thus marked. Since many pieces of code create tiny subroutines consisting of only a few ops, and since a huge slab would be quite a bit of baggage for those to carry around, the first slab is always very small. To avoid allocating too many slabs for a single CV, each subsequent slab is twice the size of the previous. Smartmatch expects to be able to allocate an op at run time, run it, and then throw it away. For that to work the op is simply mallocked when PL_compcv has’t been set up. So all slab-allocated ops are marked as such (->op_slabbed), to distinguish them from mallocked ops. All of this is kept under lock and key via #ifdef PERL_CORE, as it should be completely transparent. If it isn’t transparent, I would consider that a bug. I have left the old slab allocator (PL_OP_SLAB_ALLOC) in place, as it is used by PERL_DEBUG_READONLY_OPS, which I am not about to rewrite. :-) Concerning the change from A to X for slab allocation functions: Many times in the past, A has been used for functions that were not intended to be public but were used for public macros. Since PL_OP_SLAB_ALLOC is rarely used, it didn’t make sense for Perl_Slab_* to be API functions, since they were rarely actually available. To avoid propagating this mistake further, they are now X.
* Flag ops that are on the savestackFather Chrysostomos2012-06-291-2/+4
| | | | | | | | | This is to allow future commits to free dangling ops after errors. If an op is on the savestack, then it is going to be freed by scope.c, and op_free must not be called on it by anyone else. So we flag such ops new.
* add PMf_USE_RE_EVAL flagDavid Mitchell2012-06-131-1/+2
| | | | | | This isn't actually used in the pm_flags field of a PMOP, but is used in the pm_flags arg of Perl_re_op_compile() to indicate that this pattern should be compiled with "use re 'eval'" in scope
* add PMf_IS_QR flagDavid Mitchell2012-06-131-1/+6
| | | | | | | | | | | This indicates that a particular PMOP is in fact OP_QR. We should of course be able to tell this from op_type, but the regex-compiling API only gets passed op_flags. This then allows us to fix a bug where we were deciding during compilation whether to hang on to the code_blocks based on whether the PMOP was PMf_HAS_CV rather than PMf_IS_QR; the latter implies the former, but not the other way round.
* add PMf_CODELIST_PRIVATE flagDavid Mitchell2012-06-131-1/+5
| | | | | | | | | | | | | | | This indicates that the op_code_list field in a PMOP is "private"; that is, it points to a list of DO blocks that we don't own, and shouldn't free, and whose pad may not match ours. This will allow us to use the op_code_list field in the runtime case of literal code, e.g. /$runtime(?{...})/ and qr/$runtime(?{...})/. Here, at compile-time, we need to make the pre-compiled (?{..}) blocks available to pp_regcomp, but the list containing those blocks is also the list that is executed in the lead-up to executing pp_regcomp (while skipping the DO blocks), so the code is already embedded, and doesn't need freeing. Furthermore, in the qr// case, the code blocks are actually within a different sub (an anon one) than the PMOP, so the pads won't match.
* make qr/(?{})/ behave with closuresDavid Mitchell2012-06-131-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With this commit, qr// with a literal (compile-time) code block will Do the Right Thing as regards closures and the scope of lexical vars; in particular, the following now creates three regexes that match 1, 2 and 3: for my $i (0..2) { push @r, qr/^(??{$i})$/; } "1" =~ $r[1]; # matches Previously, $i would be evaluated as undef in all 3 patterns. This is achieved by wrapping the compilation of the pattern within a new anonymous CV, which is then attached to the pattern. At run-time pp_qr() clones the CV as well as copying the REGEXP; and when the code block is executed, it does so using the pad of the cloned CV. Which makes everything come out all right in the wash. The CV is stored in a new field of the REGEXP, called qr_anoncv. Note that run-time qr//s are still not fixed, e.g. qr/$foo(?{...})/; nor is it yet fixed where the qr// is embedded within another pattern: continuing with the code example from above, my $i = 99; "1" =~ $r[1]; # bare qr: matches: correct! "X99" =~ /X$r[1]/; # embedded qr: matches: whoops, it's still seeing the wrong $i
* Mostly complete fix for literal /(?{..})/ blocksDavid Mitchell2012-06-131-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change the way that code blocks in patterns are parsed and executed, especially as regards lexical and scoping behaviour. (Note that this fix only applies to literal code blocks appearing within patterns: run-time patterns, and literals within qr//, are still done the old broken way for now). This change means that for literal /(?{..})/ and /(??{..})/: * the code block is now fully parsed in the same pass as the surrounding code, which means that the compiler no longer just does a simplistic count of balancing {} to find the limits of the code block; i.e. stuff like /(?{ $x = "{" })/ now works (in the same way that subscripts in double quoted strings always have: "$a{'{'}" ) * Error and warning messages will now appear to emanate from the main body rather than an re_eval; e.g. the output from #!/usr/bin/perl /(?{ warn "boo" })/ has changed from boo at (re_eval 1) line 1. to boo at /tmp/p line 2. * scope and closures now behave as you might expect; for example for my $x (qw(a b c)) { "" =~ /(?{ print $x })/ } now prints "abc" rather than "" * with recursion, it now finds the lexical within the appropriate depth of pad: this code now prints "012" rather than "000": sub recurse { my ($n) = @_; return if $n > 2; "" =~ /^(?{print $n})/; recurse($n+1); } recurse(0); * an earlier fix that stopped 'my' declarations within code blocks causing crashes, required the accumulating of two SAVECOMPPADs on the stack for each iteration of the code block; this is no longer needed; * UNITCHECK blocks within literal code blocks are now run as part of the main body of code (run-time code blocks still trigger an immediate call to the UNITCHECK block though) This is all achieved by building upon the efforts of the commits which led up to this; those altered the parser to parse literal code blocks directly, but up until now those code blocks were discarded by Perl_pmruntime and the block re-compiled using the original re_eval mechanism. As of this commit, for the non-qr and non-runtime variants, those code blocks are no longer thrown away. Instead: * the LISTOP generated by the parser, which contains all the code blocks plus OP_CONSTs that collectively make up the literal pattern, is now stored in a new field in PMOPs, called op_code_list. For example in /A(?{BLOCK})C/, the listop stored in op_code_list looks like LIST PUSHMARK CONST['A'] NULL/special (aka a DO block) BLOCK CONST['(?{BLOCK})'] CONST['B'] * each of the code blocks has its last op set to null and is individually run through the peephole optimiser, so each one becomes a little self-contained block of code, rather than a list of blocks that run into each other; * then in re_op_compile(), we concatenate the list of CONSTs to produce a string to be compiled, but at the same time we note any DO blocks and note the start and end positions of the corresponding CONST['(?{BLOCK})']; * (if the current regex engine isn't the built-in perl one, then we just throw away the code blocks and pass the concatenated string to the engine) * then during regex compilation, whenever we encounter a '(?{', we see if it matches the index of one of the pre-compiled blocks, and if so, we store a pointer to that block in an 'l' data slot, and use the end index to skip over the text of the code body. Conversely, if the index doesn't match, then we know that it's a run-time pattern and (for now), compile it in the old way. * During execution, when an EVAL op is encountered, if data->what is 'l', then we just use the pad that was in effect when the pattern was called; i.e. we use the current pad slot of the currently executing CV that the pattern is embedded within.
* update the editor hints for spaces, not tabsRicardo Signes2012-05-291-2/+2
| | | | | This updates the editor hints in our files for Emacs and vim to request that tabs be inserted as spaces.
* Remove OPpCONST_WARNINGFather Chrysostomos2012-05-211-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | This was added to op.h in commit 599cee73: commit 599cee73f2261c5e09cde7ceba3f9a896989e117 Author: Paul Marquess <paul.marquess@btinternet.com> Date: Wed Jul 29 10:28:45 1998 +0100 lexical warnings; tweaks to places that didn't apply correctly Message-Id: <9807290828.AA26286@claudius.bfsec.bt.co.uk> Subject: lexical warnings patch for 5.005_50 p4raw-id: //depot/perl@1773 dump.c was modified to dump in, in this commit: commit bf91b999b25fa75a3ef7a327742929592a2e7e9c Author: Simon Cozens <simon@netthink.co.uk> Date: Sun May 13 21:20:36 2001 +0100 Op private flags Message-ID: <20010513202036.A21896@netthink.co.uk> p4raw-id: //depot/perl@10117 But is apparently completely unused anywhere. And I want that bit.
* Correct comment typo in op.hFather Chrysostomos2012-05-211-1/+1
|
* Fix VMS build broken by d1718a7cf5Father Chrysostomos2012-04-051-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This comment in perl.c: /* Note: 20,40,80 used for NATIVE_HINTS */ (added by a0ed51b3 [Here are the long-expected Unicode/UTF-8 mod- ifications.]), has apparently always been wrong. The values in vms/vmsish.h end with 7 zeroes in hex, and are only shifted down to one zero when assigned to cop->op_private in op.c:newSTATEOP. In PL_hints they never have the values indicated in perl.h. So those are actually free bits. It’s the high versions in vmsish.h that are not free. 20 (actually 0x20000000) hasn’t been used for anything since commit 744a34f9085 (Urk -- undo previous removal of vmsish 'exit' change), which change I don’t really understand. In any case, the comment was never updated. This comment in op.h: /* NOTE: OP_NEXTSTATE and OP_DBSTATE (i.e. COPs) carry lower * bits of PL_hints in op_private */ was added by d41ff1b8ad98 (introduce $^U), which was later reverted. op_private does not carry the lower bits of PL_hints, but, rather, certain higher bits of PL_hints, shifted to fit (the NATIVE_HINTS cited above). Due to misleading comments, I ended up breaking the VMS build in com- mit d1718a7cf5, by using its bits for something else. This commit moves the bits around a bit to avoid the clash, and modi- fies the comments to reflect reality.
* Label UTF8 cleanupBrian Fraser2012-03-251-0/+3
| | | | | This meant changing LABEL's definition in perly.y, so most of this commit is actually from the regened files.
* add wrap_op_checker() API functionZefram2012-02-111-0/+16
| | | | | This function provides a convenient and thread-safe way for modules to hook op checking.
* [perl #77388] Make stacked -t workFather Chrysostomos2012-01-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Up till now, -t was popping too much off the stack when stacked with other filetest operators. Since the special use of _ doesn’t apply to -t, we cannot simply have it use _ when stacked, but instead we pass the argument down from the previous op. To facilitate this, the whole stacked mechanism has to change. As before, in an expression like -r -w -x, -x and -w are flagged as ‘stacking’ ops (followed by another filetest), and -w and -r are flagged as stacked (preceded by another filetest). Stacking filetest ops no longer return a false value to the next op when a test fails, and stacked ops no longer check the truth of the value on the stack to determine whether to return early (if it’s false). The argument to the first filetest is now passed from one op to another. This is similar to the mechanism that overloaded objects were already using. Now it applies to any argument. Since it could be false, we cannot rely on the boolean value of the stack item. So, stacking ops, when they return false, now traverse the ->op_next pointers and find the op after the last stacked op. That op is returned to the runloop. This short-circuiting is proba- bly faster than calling every subsequent op (a separate function call for each). Filetest ops other than -t continue to use the last stat buffer when stacked, so the argument on the stack is ignored. But if the op is preceded by nothing other than -t (where preceded means on the right, since the ops are evaluated right-to-left), it *does* use the argument on the stack, since -t has not set the last stat buffer. The new OPpFT_AFTER_t flag indicates that a stacked op is preceded by nothing other than -t. In ‘-e -t foo’, the -e gets the flag, but not in ‘-e -t -r foo’, because -r will have saved the stat buffer, so -e can just use that.
* Stop tell($glob_copy) from clearing PL_last_in_gvFather Chrysostomos2011-12-171-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This bug is a side effect of rv2gv’s starting to return an incoercible mortal copy of a coercible glob in 5.14: $ perl5.12.4 -le 'open FH, "t/test.pl"; $fh=*FH; tell $fh; print tell' 0 $ perl5.14.0 -le 'open FH, "t/test.pl"; $fh=*FH; tell $fh; print tell' -1 In the first case, tell without arguments is returning the position of the filehandle. In the second case, tell with an explicit argument that happens to be a coercible glob (tell has an implicit rv2gv, so tell $fh is actu- ally tell *$fh) sets PL_last_in_gv to a mortal copy thereof, which is freed at the end of the statement, setting PL_last_in_gv to null. So there is no ‘last used’ handle by the time we get to the tell without arguments. This commit adds a new rv2gv flag that tells it not to copy the glob. By doing it unconditionally on the kidop, this allows tell(*$fh) to work the same way. Let’s hope nobody does tell(*{*$fh}), which will unset PL_last_in_gv because the inner * returns a mortal copy. This whole area is really icky. PL_last_in_gv should be refcounted, but that would cause handles to leak out of scope, breaking programs that rely on the auto-closing ‘feature’.
* [perl #91514] Use correct error msg for defaultFather Chrysostomos2011-12-161-1/+2
| | | | | | | | This commit not only mentions default (as opposed to when) in the error message about it being outside a topicalizer, but also normalises those error messages, making them consistent with continue and other loop controls. It also makes the perldiag entry for when actually match the error message.
* Optimise substr assignment in void contextFather Chrysostomos2011-11-261-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In void context we can optimise substr($foo, $bar, $baz) = $replacement; to something like substr($foo, $bar, $baz, $replacement); except that the execution order must be preserved. So what we actu- ally do is substr($replacement, $foo, $bar, $baz); with a flag to indicate that the replacement comes first. This means we can also optimise assignment to two-argument substr the same way. Although optimisations are not supposed to change behaviour, this one does. • It stops substr assignment from calling get-magic twice, which means the optimisation makes things less buggy than usual. • It causes the uninitialized warning (for an undefined first argu- ment) to mention the substr operator, as it did before the previous commit, rather than the assignment operator. I think that sort of detail is minor enough. I had to make the warning about clobbering references apply whenever substr does a replacement, and not only when used as an lvalue. So four-argument substr now emits that warning. I would consider that a bug fix, too. Also, if the numeric arguments to four-argument substr and the replacement string are undefined, the order of the uninitialized warn- ings is slightly different, but is consistent regardless of whether the optimisation is in effect. I believe this will make 95% of substr assignments run faster. So there is less incentive to use what I consider the less readable form (the four-argument form, which is not self-documenting). Since I like naïve benchmarks, here are Before and After: $ time ./miniperl -le 'do{$x="hello"; substr ($x,0,0) = 34;0}for 1..1000000' real 0m2.391s user 0m2.381s sys 0m0.005s $ time ./miniperl -le 'do{$x="hello"; substr ($x,0,0) = 34;0}for 1..1000000' real 0m0.936s user 0m0.927s sys 0m0.005s
* [perl #80628] __SUB__Father Chrysostomos2011-11-221-1/+1
| | | | | After much alternation, altercation and alteration, __SUB__ is finally here.
* Add evalbytes functionFather Chrysostomos2011-11-061-0/+2
| | | | | | | | | | | This function evaluates its argument as a byte string, regardless of the internal encoding. It croaks if the string contains characters outside the byte range. Hence evalbytes(" use utf8; '\xc4\x80' ") will return "\x{100}", even if the original string had the UTF8 flag on, and evalbytes(" '\xc4\x80' ") will return "\xc4\x80". This has the side effect of fixing the deparsing of CORE::break under ‘use feature’ when there is an override.
* eval STRING UTF8 cleanup.Brian Fraser2011-11-061-0/+1
| | | | | (modified by the committer only to apply when the unicode_eval feature is enabled)
* Fix CORE::globFather Chrysostomos2011-10-261-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | This commit makes CORE::glob bypassing glob overrides. A side effect of the fix is that, with the default glob implementa- tion, undefining *CORE::GLOBAL::glob no longer results in an ‘unde- fined subroutine’ error. Another side effect is that compilation of a glob op no longer assumes that the loading of File::Glob will create the *CORE::GLOB::glob type- glob. ‘++$INC{"File/Glob.pm"}; sub File::Glob::csh_glob; eval '<*>';’ used to crash. This is accomplished using a mechanism similar to lock() and threads::shared. There is a new PL_globhook interpreter varia- ble that pp_glob calls when there is no override present. Thus, File::Glob (which is supposed to be transparent, as it *is* the built-in implementation) no longer interferes with the user mechanism for overriding glob. This removes one tier from the five or so hacks that constitute glob’s implementation, and which work together to make it one of the buggiest and most inconsistent areas of Perl.
* Oust cv_ckproto_lenFather Chrysostomos2011-10-061-1/+1
| | | | | | | | | | It is no longer used in core (having been superseded by cv_ckproto_len_flags), is unused on CPAN, and is not part of the API. The cv_ckproto ‘public’ macro is modified to use the _flags version. I put ‘public’ in quotes because, even before this commit, cv_ckproto was using a non-exported function, and hence could never have worked on a strict linker (or whatever you call it).
* Groundwork to allow cops and pmops to store the UTF8 flagBrian Fraser2011-10-061-7/+18
| | | | | | | | | With threaded builds, cop.h and op.h get an extra member in their structs, to save the UTF-8ness of the stash's name. *STASH_set() checks for the flag, stores it through *STASH_flags(), and *STASH() uses the latter to fetch the correct scalar.
* remove index offsetting ($[)Zefram2011-09-091-1/+0
| | | | | | $[ remains as a variable. It no longer has compile-time magic. At runtime, it always reads as zero, accepts a write of zero, but dies on writing any other value.
* Use OPpDEREF for lvalue sub, such that the flags contains the deref type, ↵Gerard Goossen2011-09-011-1/+1
| | | | | | | instead of deriving it from the opchain. Also contains a test where using the opchain to determine the deref type fails.
* Reassign op_private flags of OP_ENTERSUB such that bits 32 and 64 can be ↵Gerard Goossen2011-09-011-3/+3
| | | | used by OPpDEREF
* Add OPpCOREARGS_SCALARMOD flagFather Chrysostomos2011-08-261-0/+1
| | | | | pp_coreargs will use this to distinguish between the \$ and \[$@%*] prototypes.