| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 1830b3d9c8 introduced a flaw where XopENTRY calls
Perl_custom_op_xop twice to retrieve the same XOP *. This is inefficient
and causes extra machine code. Since I found no CPAN or upstream=blead
usage of Perl_custom_op_xop, and its previous docs say it isn't 100%
public, it is being converted to a macro.
Most usage of Perl_custom_op_xop is to conditionally fetch a member of the
XOP struct, which was previously implemented by XopENTRY. Move the XopENTRY
logic and picking defaults to an expanded version of Perl_custom_op_xop.
The union allows Perl_custom_op_get_field to return its result in 1
register, since the union is similar to a void * or IV, but with the
machine code overhead of casting, if any, being done in the callee
(Perl_custom_op_get_field), not the caller. Perl_custom_op_get_field can
also return the XOP * without looking inside it to implement
Perl_custom_op_xop.
XopENTRYCUSTOM is a wrapper around Perl_custom_op_get_field with
XopENTRY-like usage.
XopENTRY is used by the OP_* macros, which are heavily used (but rarely
called, since custom ops are rare) by Perl lang warnings system. The
vararg warning arguments are usually evaluted no matter if the warning
will be printed to STDERR or not. Since some people like to ignore warnings
or run no strict; and warnings branches are frequent in pp_*, it is
beneficial to make the OP_* macros smaller in machine code. The design
of Perl_custom_op_get_field supports these goals.
This commit does not pass judgement on Ben Morrow's unclear public or
private API designation of Perl_custom_op_xop, and whether
Perl_custom_op_xop should deprecated and removed from public API. It was
trivial to leave a form of Perl_custom_op_xop in the new design.
XOPe enums are identical to XOPf constants so no conversion has to be
done between the field selector parameter and the field flag to test
in machine code.
ASSUME and NOT_REACHED are being introduced. The closest to the 2
previously was "assert(0)". Perl has not used ASSUME or CC specific
versions of it before. Clang, GCC >= 4.5, and Visual C are supported. For
completeness, ARMCC's __promise was added, but Perl is not known to have
any support for ARMCC by this commiter.
This patch is part of perl #115032.
|
|
|
|
|
|
|
|
|
| |
by removing the hint from the exit op itself and just having pp_exit
look in the cop hint hash, where it is already stored (as a result of
having been in %^H at compile time).
&CORE:: subs intentionally lack a nextstate op (cop) so they can see
the hints in the caller’s nextstate op.
|
|
|
|
|
|
|
| |
This commit makes them behave like exit and die without the ampersand
by moving the OPpHUSH_VMSISH hint from exit/die op to the current
statement (nextstate/cop) instead. &CORE:: subs intentionally lack a
nextstate op, so they can see the hints in the caller’s nextstate op.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
and reword the warning slightly.
See <20131027204944.20489.qmail@lists-nntp.develooper.com>.
To avoid getting a warning about scalar context for ‘delete %a[1,2]’,
which dies anyway, I stopped scalar context from being applied to
delete’s argument. Scalar context is not meaningful here anyway, and
the context is not really scalar.
This also means that ‘delete sort’ no longer produces a warning about
scalar context before dying, so I added a test for that.
|
|
|
|
|
|
|
|
| |
If a bare block is the last thing in an lvalue sub, OP_LEAVELOOP needs
to propagate lvalue context and handle returned arrays properly, just
as OP_LEAVE has done since yesterday.
This is a follow-up to 2ec7f6f24289. This came up in ticket #119797.
|
|
|
|
| |
This flag will be used by the next commit.
|
|
|
|
|
|
|
| |
Now that we have op->op_folded, we don’t need OPpCONST_FOLDED any
more. In removing it, I modified B::Concise to output op_folded the
way OPpCONST_FOLDED was output before, since it can be helpful to have
it when reading op dumps.
|
|
|
|
|
|
|
| |
Entersub ops can turn into rv2cv ops at compile time. It is important
that possible flag conflicts are handled properly. Since it took me
a while to figure out that there were no bugs here, here are my own
notes, cleaned up and nicely formatted.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This resolves tickets #28380 and #114024.
Commit 95a31aad5 did something similar to this for the new %hash{...}
syntax. This commit extends it to @ slices and combines the two
code paths.
The heuristics in toke.c can easily produce false positives. So the
op is flagged as being a candidate for the warning. Then when op.c
has the op tree available, it examines it to see whether the heuristic
may have been a false positive.
This avoids bugs with qw "foo bar baz" and sub calls triggering
the warning.
The source code is no longer available for the warning, so we recon-
struct it from the op tree, skipping the subscript if it is anything
other than a const op.
This means that @hash{$foo} comes out as @hash{...} and @hash{foo} as
@hash{"foo"}. It also meeans that @hash{"]"} is displayed correctly
instead of as @hash{"].
Commit 95a31aad5 also modified the heuristic for %hash{...} to exempt
qw altogether. But it did not exempt it if it was preceded by a tab.
So this commit rectifies that.
This commit also improves the false positive detection by exempting
any ops returning lists that can get past toke.c’s heuristic. I went
through the entire list of ops, but I may have missed some.
Also, @ slices on the lhs of = are exempt, as they change the context
and are hence actually useful.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of warning in the lexer, flag the op and then warn in op.c,
when the op tree is available, so we don’t end up warning for actual
lists or for sub calls.
Also, only warn in scalar context, as in list context $hash{$scalar}
and %hash{$scalar} do different things.
In op.c we no longer have easy access to the source code, so recon-
struct the hash/array access based on the op tree. This means
%hash{foo} becomes %hash{"foo"}. We only reconstruct constant keys,
so %hash{++$x} becomes %hash{...}. This also corrects erroneous
dumps, like %hash{"} for %hash{"}"}.
Instead of triggering the warning solely based on the op tree, we
still keep the heuristic in toke.c, so that common workarounds for
that warning (e.g., {q<key>} and {("key")}) continue to work.
The heuristic in toke.c is tweaked to avoid warning for qw().
In a future commit I plan to extend this to the existing @array[0] and
@hash{key} warnings, to avoid false positives.
|
|
|
|
|
|
| |
Te various different forms of goto (and dump) take different branches
through this big function. Document which branches handle which variants.
Also document the use of OPf_SPECIAL in OP_DUMP.
|
|
|
|
|
|
|
|
|
|
| |
Make the array interface 64-bit safe by using SSize_t instead of I32
for array indices.
This is based on a patch by Chip Salzenberg.
This completes what the previous commit began when it changed
av_extend.
|
|
|
|
|
|
|
|
|
| |
Add a new member, op_folded, to BASEOP. It is replacement for
OPpCONST_FOLDED (which can only be set on OP_CONST). At the moment
OPpCONST_FOLDED remains, as it is exposed in B (e.g. B::Concise relies
on it).
Signed-off-by: Niels Thykier <niels@thykier.net>
|
|
|
|
|
|
|
|
|
|
|
| |
See ticket #118055 for all the detail. On systems where IV is bigger
than a pointer, the slab allocator messes things up because it only
provides pointer alignment. If pmops have an IV field, we cannot
allocate them via slab on such systems. Pmops actually don’t need
an IV, just a PADOFFSET. So we can change them and remove the
workaround.
This is obviously not suitable for maint.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
At compile time, if split occurs on the right-hand side of an assign-
ment to a list of scalars, if the limit argument is a constant con-
taining the number 0 then it is modified in place to hold one more
than the number of scalars.
This means ‘constants’ can change their values, if they happen to be
in the wrong place at the wrong time:
$ ./perl -Ilib -le 'use constant NULL => 0; ($a,$b,$c) = split //, $foo, NULL; print NULL'
4
I considered checking the reference count on the SV, but since XS code
could create its own const ops with weak references to the same cons-
tants elsewhere, the safest way to avoid modifying someone else’s SV
is to mark the split op in ck_split so we know the SV belongs to that
split op alone.
Also, to be on the safe side, turn off the read-only flag before modi-
fying the SV, instead of relying on the special case for compile time
in sv_force_normal.
|
|
|
|
| |
Flag 2 on entersub is HINT_STRICT_REFS, not _SUBS.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PL_reg_state.re_reparsing is a hacky flag used to allow runtime
code blocks to be included in patterns. Basically, since code blocks
are now handled by the perl parser within literal patterns, runtime
patterns are handled by taking the (assembled at runtime) pattern,
and feeding it back through the parser via the equivalent of
eval q{qr'the_pattern'},
so that run-time (?{..})'s appear to be literal code blocks.
When this happens, the global flag PL_reg_state.re_reparsing is set,
which modifies lexing and parsing in minor ways (such as whether \\ is
stripped).
Now, I'm in the slow process of trying to eliminate global regex state
(i.e. gradually removing the fields of PL_reg_state), and also a change
which will be coming a few commits ahead requires the info which this flag
indicates to linger for longer (currently it is cleared immediately after
the call to scan_str().
For those two reasons, this commit adds a new mechanism to indicate this:
a new flag to eval_sv(), G_RE_REPARSING (which sets OPpEVAL_RE_REPARSING
in the entereval op), which sets the EVAL_RE_REPARSING bit in PL_in_eval.
Its still a yukky global flag hack, but its a *different* global flag hack
now.
For this commit, we add the new flag(s) but keep the old
PL_reg_state.re_reparsing flag and assert that the two mechanisms always
match. The next commit will remove re_reparsing.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch resolves several issues at once. The parts are
sufficiently interconnected that it is hard to break it down
into smaller commits. The tickets open for these issues are:
RT #94490 - split and constant folding
RT #116086 - split "\x20" doesn't work as documented
It additionally corrects some issues with cached regexes that
were exposed by the split changes (and applied to them).
It effectively reverts 5255171e6cd0accee6f76ea2980e32b3b5b8e171
and cccd1425414e6518c1fc8b7bcaccfb119320c513.
Prior to this patch the special RXf_SKIPWHITE behavior of
split(" ", $thing)
was only available if Perl could resolve the first argument to
split at compile time, meaning under various arcane situations.
This manifested as oddities like
my $delim = $cond ? " " : qr/\s+/;
split $delim, $string;
and
split $cond ? " ", qr/\s+/, $string
not behaving the same as:
($cond ? split(" ", $string) : split(/\s+/, $string))
which isn't very convenient.
This patch changes this by adding a new flag to the op_pmflags,
PMf_SPLIT which enables pp_regcomp() to know whether it was called
as part of split, which allows the RXf_SPLIT to be passed into run
time regex compilation. We also preserve the original flags so
pattern caching works properly, by adding a new property to the
regexp structure, "compflags", and related macros for accessing it.
We preserve the original flags passed into the compilation process,
so we can compare when we are trying to decide if we need to
recompile.
Note that this essentially the opposite fix from the one applied
originally to fix #94490 in 5255171e6cd0accee6f76ea2980e32b3b5b8e171.
The reverted patch was meant to make:
split( 0 || " ", $thing ) #1
consistent with
my $x=0; split( $x || " ", $thing ) #2
and not with
split( " ", $thing ) #3
This was reverted because it broke C<split("\x{20}", $thing)>, and
because one might argue that is not that #1 does the wrong thing,
but rather that the behavior of #2 that is wrong. In other words
we might expect that all three should behave the same as #3, and
that instead of "fixing" the behavior of #1 to be like #2, we should
really fix the behavior of #2 to behave like #3. (Which is what we did.)
Also, it doesn't make sense to move the special case detection logic
further from the regex engine. We really want the regex engine to decide
this stuff itself, otherwise split " ", ... wouldn't work properly with
an alternate engine. (Imagine we add a special regexp meta pattern that behaves
the same as " " does in a split /.../. For instance we might make
split /(*SPLITWHITE)/ trigger the same behavior as split " ".
The other major change as result of this patch is it effectively
reverts commit cccd1425414e6518c1fc8b7bcaccfb119320c513, which
was intended to get rid of RXf_SPLIT and RXf_SKIPWHITE, which
and free up bits in the regex flags structure.
But we dont want to get rid of these vars, and it turns out that
RXf_SEEN_LOOKBEHIND is used only in the same situation as the new
RXf_MODIFIES_VARS. So I have renamed RXf_SEEN_LOOKBEHIND to
RXf_NO_INPLACE_SUBST, and then instead of using two vars we use
only the one. Which in turn allows RXf_SPLIT and RXf_SKIPWHITE to
have their bits back.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[perl #115080]
m?...? is only supposed to match once, until reset. Normally this is done
by setting the PMf_USED flag on the PMOP. Under ithreads we can't modify
ops, so instead we indicate by setting the regex's SV to readonly. (This
is a bit of a hack: the flag should be associated with the PMOP, not the
regex).
This breaks with run-time regexes when the pattern gets recompiled; for
example:
for my $c (qw(a b c)) {
print "matched $c\n" if $c =~ m?^$c$?;
}
outputs
matched a
on unthreaded, but
matched a
matched b
matched c
on threaded.
The re_eval jumbo fix made this more noticeable by sometimes recompiling
even when the pattern text hasn't changed (to make closures work ok).
The quick fix is to propagate the readonlyness of the old re to the new
re. (The proper fix would be to store the flag state in a pad slot
associated with the PMOP).
Needless to say, I've gone for the quick fix.
|
|
|
|
|
|
|
| |
In the threaded version of PmopSTASH_set(), the assigned value is a
PADOFFSET, not a pointer; so use 0 rather than NULL for the default value.
This keeps clang happy.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This only affected threaded builds. I think the comments in the added
test explain well enough what was happening.
The solution is to store a stashpad offset in the pmop, instead of the
name of the stash. This is similar to what was done with cop stashes
in d4d03940c58a.
Not only does this fix the crash, but it also makes compilation faster
and saves memory (no separate malloc for every m?pat?).
I had to move Safefree(PL_stashpad) later on in perl_destruct, because
freeing a pmop causes the PL_stashpad to be accessed, and pmops can be
freed during sv_clean_all. Its previous location was not a problem
for cops, as PL_stashpad[cop->cop_stashoff] is only accessed when
PL_curcop==that_cop and Perl code is running, not when cops are freed.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I had to change the definition of IS_PADCONST to account for the
SVf_IsCOW flag. Previously, anything marked READONLY would be consid-
ered a pad constant, to be shared by pads of different recursion lev-
els. Some of those READONLY things were not actually read-only, as
they were copy-on-write scalars, which are never read-only. So I
changed the definition of IS_PADCONST in e3918bb703c to accept COWs
as well as read-only scalars, since I was removing the READONLY flag
from COWs.
With the new copy-on-write scheme, it is easy for a TARG to turn into
a COW. If that happens and then the same subroutine calls itself
recursively for the first time after that, pad_push will see that this
is a pad ‘constant’ and allow the next recursion level to share it.
If pp_concat calls itself recursively, the recursive call can modify
the scalar the outer call is in the middle of using, causing the
return value to be doubled up (‘tmptmp’) in the test case added here.
Since pad constants are marked PADTMP (I would like to change that
eventually), there is no way to distinguish them from TARGs when the
are COWs, except for the fact that pad constants that are COWs are
always shared hash keys (SvLEN==0).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As discussed in ticket #114820, instead of using READONLY+FAKE to mark
a copy-on-write string, we should make it a separate flag.
There are many modules in CPAN (and 1 in core, Compress::Raw::Zlib)
that assume that SvREADONLY means read-only. Only one CPAN module,
POSIX::pselect will definitely be broken by this. Others may need to
be tweaked. But I believe this is for the better.
It causes all tests except ext/Devel-Peek/t/Peek.t (which needs a tiny
tweak still) to pass under PERL_OLD_COPY_ON_WRITE, which is a prereq-
uisite for any new COW scheme that creates COWs under the same cir-
cumstances.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In a construct like
my ($x,$y) = @_
the pushmark/padsv/padsv is already optimised into a single padrange
op. This commit makes the OPf_SPECIAL flag on the padrange op indicate
that in addition, @_ should be pushed onto the stack, skipping an
additional pushmark/gv[*_]/rv2sv combination.
So in total (including the earlier padrange work), the above construct
goes from being
3 <0> pushmark s
4 <$> gv(*_) s
5 <1> rv2av[t3] lK/1
6 <0> pushmark sRM*/128
7 <0> padsv[$x:1,2] lRM*/LVINTRO
8 <0> padsv[$y:1,2] lRM*/LVINTRO
9 <2> aassign[t4] vKS
to
3 <0> padrange[$x:1,2; $y:1,2] l*/LVINTRO,2 ->4
4 <2> aassign[t4] vKS
|
|
|
|
|
| |
Add a new save type that does the equivalent of multiple SAVEt_CLEARSV's
for a given target range. This makes the new padange op more efficient.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This single op can, in some circumstances, replace the sequence of a
pushmark followed by one or more padsv/padav/padhv ops, and possibly
a trailing 'list' op, but only where the targs of the pad ops form
a continuous range.
This is generally more efficient, but is particularly so in the case
of void-context my declarations, such as:
my ($a,@b);
Formerly this would be executed as the following set of ops:
pushmark pushes a new mark
padsv[$a] pushes $a, does a SAVEt_CLEARSV
padav[@b] pushes all the flattened elements (i.e. none) of @a,
does a SAVEt_CLEARSV
list pops the mark, and pops all stack elements except the last
nextstate pops the remaining stack element
It's now:
padrange[$a..@b] does two SAVEt_CLEARSV's
nextstate nothing needing doing to the stack
Note that in the case above, this commit changes user-visible behaviour in
pathological cases; in particular, it has always been possible to modify a
lexical var *before* the my is executed, using goto or closure tricks.
So in principle someone could tie an array, then could notice that FETCH
is no longer being called, e.g.
f();
my ($s, @a); # this no longer triggers two FETCHES
sub f {
tie @a, ...;
push @a, 1,2;
}
But I think we can live with that.
Note also that having a padrange operator will allow us shortly to have
a corresponding SAVEt_CLEARPADRANGE save type, that will replace multiple
individual SAVEt_CLEARSV's.
|
|
|
|
|
|
| |
Since OPf_WANT_VOID == G_VOID etc, we can substantially simplify
the OP_GIMME macro (which is what implements GIMME_V).
This saves 588 bytes in the perl executable on my -O2 system.
|
|
|
|
|
|
| |
obslab and the removal of the op_latefree logic, which allowed static
ops, removed support for the compiler modules, which allocates ops statically.
Add an op_static flag to replace the old latefree(d) op_free logic.
|
|
|
|
| |
It was added in ce862d02d but has never been used.
|
|
|
|
|
|
|
|
|
|
|
| |
The easiest way to fix this was to move the special handling out of
the regexp engine. Now a flag is set on the split op itself for
this case. A real regexp is still created, as that is the most
convenient way to propagate locale settings, and it prevents the
need to rework pp_split to handle a null regexp.
This also means that custom regexp plugins no longer need to handle
split specially (which they all do currently).
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since 6ea72b3a1, rv2hv and padhv have had the ability to return boo-
leans in scalar context, instead of bucket stats, if flagged the right
way. sub { %hash || ... } is optimised to take advantage of this. If
the || is in unknown context at compile time, the %hash is flagged as
being maybe a true boolean. When flagged that way, it returns a bool-
ean if block_gimme() returns G_VOID.
If rv2hv and padhv can already do this, then we don’t need the
boolkeys op any more. We can just flag the rv2hv to return a boolean.
In all the cases where boolkeys was used, we know at compile time that
it is true boolean context, so we add a new flag for that.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In %hash || $foo, the %hash is in scalar context, so it has to iterate
through the buckets to produce statistics on bucket usage.
If the || is in void context, the value returned by hash is only ever
used as a boolean (as || doesn’t have to return it). We already opti-
mise it by adding a boolkeys op when it is known at compile time that
|| will be in void context.
In sub { %hash || $foo } it is not known at compile time that it will
be in void context, so it wasn’t optimised.
This commit optimises it by flagging the %hash at compile time as
being possibly in ‘true boolean’ context. When that flag is set,
the rv2hv and padhv ops call block_gimme() to see whether || is in
void context.
This speeds things up signficantly. Here is what I got after optimis-
ing rv2hv but before doing padhv:
$ time ./miniperl -e '%hash = 1..10000; sub { %hash || 1 }->() for 1..100000'
real 0m0.179s
user 0m0.101s
sys 0m0.005s
$ time ./miniperl -e 'my %hash = 1..10000; sub { %hash || 1 }->() for 1..100000'
real 0m5.446s
user 0m2.419s
sys 0m0.015s
(That example is slightly misleading because of the closure, but the
closure version takes 1 sec. when optimised.)
|
|
|
|
|
|
|
|
| |
This new macro expands to ‘assert(...),’ (with a trailing comma) under
debugging builds; the empty string otherwise.
It allows for the removal of some #ifdef DEBUGGINGs, which could not be
avoided otherwise.
|
|
|
|
|
|
|
| |
This was an early attempt to fix leaking of ops after syntax errors,
disabled because it was deemed to fragile. The new slab allocator
(8be227a) has solved this problem another way, so latefree(d) no
longer serves any purpose.
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit eliminates the old slab allocator. It had bugs in it, in
that ops would not be cleaned up properly after syntax errors. So why
not fix it? Well, the new slab allocator *is* the old one fixed.
Now that this is gone, we don’t have to worry as much about ops leak-
ing when errors occur, because it won’t happen any more.
Recent commits eliminated the only reason to hang on to it:
PERL_DEBUG_READONLY_OPS required it.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I want to eliminate the old slab allocator (PL_OP_SLAB_ALLOC),
but this useful debugging tool needs to be rewritten for the new
one first.
This is slightly better than under PL_OP_SLAB_ALLOC, in that CVs cre-
ated after the main CV starts running will get read-only ops, too. It
is when a CV finishes compiling and relinquishes ownership of the slab
that the slab is made read-only, because at that point it should not
be used again for allocation.
BEGIN blocks are exempt, as they are processed before the Slab_to_ro
call in newATTRSUB. The Slab_to_ro call must come at the very end,
after LEAVE_SCOPE, because otherwise the ops freed via the stack (the
SAVEFREEOP calls near the top of newATTRSUB) will make the slab writa-
ble again. At that point, the BEGIN block has already been run and
its slab freed. Maybe slabs belonging to BEGIN blocks can be made
read-only later.
Under PERL_DEBUG_READONLY_OPS, op slabs have two extra fields to
record the size and readonliness of each slab. (Only the first slab
in a CV’s slab chain uses the readonly flag, since it is conceptually
simpler to treat them all as one unit.) Without recording this infor-
mation manually, things become unbearably slow, the tests taking hours
and hours instead of minutes.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This addresses bugs #111462 and #112312 and part of #107000.
When a longjmp occurs during lexing, parsing or compilation, any ops
in C autos that are not referenced anywhere are leaked.
This commit introduces op slabs that are attached to the currently-
compiling CV. New ops are allocated on the slab. When an error
occurs and the CV is freed, any ops remaining are freed.
This is based on Nick Ing-Simmons’ old experimental op slab implemen-
tation, but it had to be rewritten to work this way.
The old slab allocator has a pointer before each op that points to a
reference count stored at the beginning of the slab. Freed ops are
never reused. When the last op on a slab is freed, the slab itself is
freed. When a slab fills up, a new one is created.
To allow iteration through the slab to free everything, I had to have
two pointers; one points to the next item (op slot); the other points
to the slab, for accessing the reference count. Ops come in different
sizes, so adding sizeof(OP) to a pointer won’t work.
The old slab allocator puts the ops at the end of the slab first, the
idea being that the leaves are allocated first, so the order will be
cache-friendly as a result. I have preserved that order for a dif-
ferent reason: We don’t need to store the size of the slab (slabs
vary in size; see below) if we can simply follow pointers to find
the last op.
I tried eliminating reference counts altogether, by having all ops
implicitly attached to PL_compcv when allocated and freed when the CV
is freed. That also allowed op_free to skip FreeOp altogether, free-
ing ops faster. But that doesn’t work in those cases where ops need
to survive beyond their CVs; e.g., re-evals.
The CV also has to have a reference count on the slab. Sometimes the
first op created is immediately freed. If the reference count of
the slab reaches 0, then it will be freed with the CV still point-
ing to it.
CVs use the new CVf_SLABBED flag to indicate that the CV has a refer-
ence count on the slab. When this flag is set, the slab is accessible
via CvSTART when CvROOT is not set, or by subtracting two pointers
(2*sizeof(I32 *)) from CvROOT when it is set. I decided to sneak the
slab into CvSTART during compilation, because enlarging the xpvcv
struct by another pointer would make all CVs larger, even though this
patch only benefits few (programs using string eval).
When the CVf_SLABBED flag is set, the CV takes responsibility for
freeing the slab. If CvROOT is not set when the CV is freed or
undeffed, it is assumed that a compilation error has occurred, so the
op slab is traversed and all the ops are freed.
Under normal circumstances, the CV forgets about its slab (decrement-
ing the reference count) when the root is attached. So the slab ref-
erence counting that happens when ops are freed takes care of free-
ing the slab. In some cases, the CV is told to forget about the slab
(cv_forget_slab) precisely so that the ops can survive after the CV is
done away with.
Forgetting the slab when the root is attached is not strictly neces-
sary, but avoids potential problems with CvROOT being written over.
There is code all over the place, both in core and on CPAN, that does
things with CvROOT, so forgetting the slab makes things more robust
and avoids potential problems.
Since the CV takes ownership of its slab when flagged, that flag is
never copied when a CV is cloned, as one CV could free a slab that
another CV still points to, since forced freeing of ops ignores the
reference count (but asserts that it looks right).
To avoid slab fragmentation, freed ops are marked as freed and
attached to the slab’s freed chain (an idea stolen from DBM::Deep).
Those freed ops are reused when possible. I did consider not reusing
freed ops, but realised that would result in significantly higher mem-
ory using for programs with large ‘if (DEBUG) {...}’ blocks.
SAVEFREEOP was slightly problematic. Sometimes it can cause an op to
be freed after its CV. If the CV has forcibly freed the ops on its
slab and the slab itself, then we will be fiddling with a freed slab.
Making SAVEFREEOP a no-op won’t help, as sometimes an op can be
savefreed when there is no compilation error, so the op would never
be freed. It holds a reference count on the slab, so the whole
slab would leak. So SAVEFREEOP now sets a special flag on the op
(->op_savefree). The forced freeing of ops after a compilation error
won’t free any ops thus marked.
Since many pieces of code create tiny subroutines consisting of only
a few ops, and since a huge slab would be quite a bit of baggage for
those to carry around, the first slab is always very small. To avoid
allocating too many slabs for a single CV, each subsequent slab is
twice the size of the previous.
Smartmatch expects to be able to allocate an op at run time, run it,
and then throw it away. For that to work the op is simply mallocked
when PL_compcv has’t been set up. So all slab-allocated ops are
marked as such (->op_slabbed), to distinguish them from mallocked ops.
All of this is kept under lock and key via #ifdef PERL_CORE, as it
should be completely transparent. If it isn’t transparent, I would
consider that a bug.
I have left the old slab allocator (PL_OP_SLAB_ALLOC) in place, as
it is used by PERL_DEBUG_READONLY_OPS, which I am not about to
rewrite. :-)
Concerning the change from A to X for slab allocation functions:
Many times in the past, A has been used for functions that were
not intended to be public but were used for public macros. Since
PL_OP_SLAB_ALLOC is rarely used, it didn’t make sense for Perl_Slab_*
to be API functions, since they were rarely actually available. To
avoid propagating this mistake further, they are now X.
|
|
|
|
|
|
|
|
|
| |
This is to allow future commits to free dangling ops after errors.
If an op is on the savestack, then it is going to be freed by scope.c,
and op_free must not be called on it by anyone else.
So we flag such ops new.
|
|
|
|
|
|
| |
This isn't actually used in the pm_flags field of a PMOP, but is
used in the pm_flags arg of Perl_re_op_compile() to indicate
that this pattern should be compiled with "use re 'eval'" in scope
|
|
|
|
|
|
|
|
|
|
|
| |
This indicates that a particular PMOP is in fact OP_QR. We should of
course be able to tell this from op_type, but the regex-compiling API
only gets passed op_flags.
This then allows us to fix a bug where we were deciding during compilation
whether to hang on to the code_blocks based on whether the PMOP was
PMf_HAS_CV rather than PMf_IS_QR; the latter implies the former, but not
the other way round.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This indicates that the op_code_list field in a PMOP is "private";
that is, it points to a list of DO blocks that we don't own, and
shouldn't free, and whose pad may not match ours.
This will allow us to use the op_code_list field in the runtime case of
literal code, e.g. /$runtime(?{...})/ and qr/$runtime(?{...})/. Here, at
compile-time, we need to make the pre-compiled (?{..}) blocks available to
pp_regcomp, but the list containing those blocks is also the list that is
executed in the lead-up to executing pp_regcomp (while skipping the DO
blocks), so the code is already embedded, and doesn't need freeing.
Furthermore, in the qr// case, the code blocks are actually within a
different sub (an anon one) than the PMOP, so the pads won't match.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this commit, qr// with a literal (compile-time) code block
will Do the Right Thing as regards closures and the scope of lexical vars;
in particular, the following now creates three regexes that match 1, 2
and 3:
for my $i (0..2) {
push @r, qr/^(??{$i})$/;
}
"1" =~ $r[1]; # matches
Previously, $i would be evaluated as undef in all 3 patterns.
This is achieved by wrapping the compilation of the pattern within a
new anonymous CV, which is then attached to the pattern. At run-time
pp_qr() clones the CV as well as copying the REGEXP; and when the code
block is executed, it does so using the pad of the cloned CV.
Which makes everything come out all right in the wash.
The CV is stored in a new field of the REGEXP, called qr_anoncv.
Note that run-time qr//s are still not fixed, e.g. qr/$foo(?{...})/;
nor is it yet fixed where the qr// is embedded within another pattern:
continuing with the code example from above,
my $i = 99;
"1" =~ $r[1]; # bare qr: matches: correct!
"X99" =~ /X$r[1]/; # embedded qr: matches: whoops, it's still
seeing the wrong $i
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change the way that code blocks in patterns are parsed and executed,
especially as regards lexical and scoping behaviour.
(Note that this fix only applies to literal code blocks appearing within
patterns: run-time patterns, and literals within qr//, are still done the
old broken way for now).
This change means that for literal /(?{..})/ and /(??{..})/:
* the code block is now fully parsed in the same pass as the surrounding
code, which means that the compiler no longer just does a simplistic
count of balancing {} to find the limits of the code block;
i.e. stuff like /(?{ $x = "{" })/ now works (in the same way
that subscripts in double quoted strings always have: "$a{'{'}" )
* Error and warning messages will now appear to emanate from the main body
rather than an re_eval; e.g. the output from
#!/usr/bin/perl
/(?{ warn "boo" })/
has changed from
boo at (re_eval 1) line 1.
to
boo at /tmp/p line 2.
* scope and closures now behave as you might expect; for example
for my $x (qw(a b c)) { "" =~ /(?{ print $x })/ }
now prints "abc" rather than ""
* with recursion, it now finds the lexical within the appropriate depth
of pad: this code now prints "012" rather than "000":
sub recurse {
my ($n) = @_;
return if $n > 2;
"" =~ /^(?{print $n})/;
recurse($n+1);
}
recurse(0);
* an earlier fix that stopped 'my' declarations within code blocks causing
crashes, required the accumulating of two SAVECOMPPADs on the stack for
each iteration of the code block; this is no longer needed;
* UNITCHECK blocks within literal code blocks are now run as part of the
main body of code (run-time code blocks still trigger an immediate
call to the UNITCHECK block though)
This is all achieved by building upon the efforts of the commits which led
up to this; those altered the parser to parse literal code blocks
directly, but up until now those code blocks were discarded by
Perl_pmruntime and the block re-compiled using the original re_eval
mechanism. As of this commit, for the non-qr and non-runtime variants,
those code blocks are no longer thrown away. Instead:
* the LISTOP generated by the parser, which contains all the code
blocks plus OP_CONSTs that collectively make up the literal pattern,
is now stored in a new field in PMOPs, called op_code_list. For example
in /A(?{BLOCK})C/, the listop stored in op_code_list looks like
LIST
PUSHMARK
CONST['A']
NULL/special (aka a DO block)
BLOCK
CONST['(?{BLOCK})']
CONST['B']
* each of the code blocks has its last op set to null and is individually
run through the peephole optimiser, so each one becomes a little
self-contained block of code, rather than a list of blocks that run into
each other;
* then in re_op_compile(), we concatenate the list of CONSTs to produce a
string to be compiled, but at the same time we note any DO blocks and
note the start and end positions of the corresponding CONST['(?{BLOCK})'];
* (if the current regex engine isn't the built-in perl one, then we just
throw away the code blocks and pass the concatenated string to the engine)
* then during regex compilation, whenever we encounter a '(?{', we see if
it matches the index of one of the pre-compiled blocks, and if so, we
store a pointer to that block in an 'l' data slot, and use the end index
to skip over the text of the code body. Conversely, if the index doesn't
match, then we know that it's a run-time pattern and (for now), compile
it in the old way.
* During execution, when an EVAL op is encountered, if data->what is 'l',
then we just use the pad that was in effect when the pattern was called;
i.e. we use the current pad slot of the currently executing CV that the
pattern is embedded within.
|
|
|
|
|
| |
This updates the editor hints in our files for Emacs and vim to request
that tabs be inserted as spaces.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was added to op.h in commit 599cee73:
commit 599cee73f2261c5e09cde7ceba3f9a896989e117
Author: Paul Marquess <paul.marquess@btinternet.com>
Date: Wed Jul 29 10:28:45 1998 +0100
lexical warnings; tweaks to places that didn't apply correctly
Message-Id: <9807290828.AA26286@claudius.bfsec.bt.co.uk>
Subject: lexical warnings patch for 5.005_50
p4raw-id: //depot/perl@1773
dump.c was modified to dump in, in this commit:
commit bf91b999b25fa75a3ef7a327742929592a2e7e9c
Author: Simon Cozens <simon@netthink.co.uk>
Date: Sun May 13 21:20:36 2001 +0100
Op private flags
Message-ID: <20010513202036.A21896@netthink.co.uk>
p4raw-id: //depot/perl@10117
But is apparently completely unused anywhere. And I want that bit.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This comment in perl.c:
/* Note: 20,40,80 used for NATIVE_HINTS */
(added by a0ed51b3 [Here are the long-expected Unicode/UTF-8 mod-
ifications.]), has apparently always been wrong. The values in
vms/vmsish.h end with 7 zeroes in hex, and are only shifted down to
one zero when assigned to cop->op_private in op.c:newSTATEOP. In
PL_hints they never have the values indicated in perl.h. So those
are actually free bits. It’s the high versions in vmsish.h that
are not free.
20 (actually 0x20000000) hasn’t been used for anything since commit
744a34f9085 (Urk -- undo previous removal of vmsish 'exit' change),
which change I don’t really understand. In any case, the comment was
never updated.
This comment in op.h:
/* NOTE: OP_NEXTSTATE and OP_DBSTATE (i.e. COPs) carry lower
* bits of PL_hints in op_private */
was added by d41ff1b8ad98 (introduce $^U), which was later reverted.
op_private does not carry the lower bits of PL_hints, but, rather,
certain higher bits of PL_hints, shifted to fit (the NATIVE_HINTS
cited above).
Due to misleading comments, I ended up breaking the VMS build in com-
mit d1718a7cf5, by using its bits for something else.
This commit moves the bits around a bit to avoid the clash, and modi-
fies the comments to reflect reality.
|
|
|
|
|
| |
This meant changing LABEL's definition in perly.y, so most of this
commit is actually from the regened files.
|