| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Flag 2 on entersub is HINT_STRICT_REFS, not _SUBS.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PL_reg_state.re_reparsing is a hacky flag used to allow runtime
code blocks to be included in patterns. Basically, since code blocks
are now handled by the perl parser within literal patterns, runtime
patterns are handled by taking the (assembled at runtime) pattern,
and feeding it back through the parser via the equivalent of
eval q{qr'the_pattern'},
so that run-time (?{..})'s appear to be literal code blocks.
When this happens, the global flag PL_reg_state.re_reparsing is set,
which modifies lexing and parsing in minor ways (such as whether \\ is
stripped).
Now, I'm in the slow process of trying to eliminate global regex state
(i.e. gradually removing the fields of PL_reg_state), and also a change
which will be coming a few commits ahead requires the info which this flag
indicates to linger for longer (currently it is cleared immediately after
the call to scan_str().
For those two reasons, this commit adds a new mechanism to indicate this:
a new flag to eval_sv(), G_RE_REPARSING (which sets OPpEVAL_RE_REPARSING
in the entereval op), which sets the EVAL_RE_REPARSING bit in PL_in_eval.
Its still a yukky global flag hack, but its a *different* global flag hack
now.
For this commit, we add the new flag(s) but keep the old
PL_reg_state.re_reparsing flag and assert that the two mechanisms always
match. The next commit will remove re_reparsing.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch resolves several issues at once. The parts are
sufficiently interconnected that it is hard to break it down
into smaller commits. The tickets open for these issues are:
RT #94490 - split and constant folding
RT #116086 - split "\x20" doesn't work as documented
It additionally corrects some issues with cached regexes that
were exposed by the split changes (and applied to them).
It effectively reverts 5255171e6cd0accee6f76ea2980e32b3b5b8e171
and cccd1425414e6518c1fc8b7bcaccfb119320c513.
Prior to this patch the special RXf_SKIPWHITE behavior of
split(" ", $thing)
was only available if Perl could resolve the first argument to
split at compile time, meaning under various arcane situations.
This manifested as oddities like
my $delim = $cond ? " " : qr/\s+/;
split $delim, $string;
and
split $cond ? " ", qr/\s+/, $string
not behaving the same as:
($cond ? split(" ", $string) : split(/\s+/, $string))
which isn't very convenient.
This patch changes this by adding a new flag to the op_pmflags,
PMf_SPLIT which enables pp_regcomp() to know whether it was called
as part of split, which allows the RXf_SPLIT to be passed into run
time regex compilation. We also preserve the original flags so
pattern caching works properly, by adding a new property to the
regexp structure, "compflags", and related macros for accessing it.
We preserve the original flags passed into the compilation process,
so we can compare when we are trying to decide if we need to
recompile.
Note that this essentially the opposite fix from the one applied
originally to fix #94490 in 5255171e6cd0accee6f76ea2980e32b3b5b8e171.
The reverted patch was meant to make:
split( 0 || " ", $thing ) #1
consistent with
my $x=0; split( $x || " ", $thing ) #2
and not with
split( " ", $thing ) #3
This was reverted because it broke C<split("\x{20}", $thing)>, and
because one might argue that is not that #1 does the wrong thing,
but rather that the behavior of #2 that is wrong. In other words
we might expect that all three should behave the same as #3, and
that instead of "fixing" the behavior of #1 to be like #2, we should
really fix the behavior of #2 to behave like #3. (Which is what we did.)
Also, it doesn't make sense to move the special case detection logic
further from the regex engine. We really want the regex engine to decide
this stuff itself, otherwise split " ", ... wouldn't work properly with
an alternate engine. (Imagine we add a special regexp meta pattern that behaves
the same as " " does in a split /.../. For instance we might make
split /(*SPLITWHITE)/ trigger the same behavior as split " ".
The other major change as result of this patch is it effectively
reverts commit cccd1425414e6518c1fc8b7bcaccfb119320c513, which
was intended to get rid of RXf_SPLIT and RXf_SKIPWHITE, which
and free up bits in the regex flags structure.
But we dont want to get rid of these vars, and it turns out that
RXf_SEEN_LOOKBEHIND is used only in the same situation as the new
RXf_MODIFIES_VARS. So I have renamed RXf_SEEN_LOOKBEHIND to
RXf_NO_INPLACE_SUBST, and then instead of using two vars we use
only the one. Which in turn allows RXf_SPLIT and RXf_SKIPWHITE to
have their bits back.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[perl #115080]
m?...? is only supposed to match once, until reset. Normally this is done
by setting the PMf_USED flag on the PMOP. Under ithreads we can't modify
ops, so instead we indicate by setting the regex's SV to readonly. (This
is a bit of a hack: the flag should be associated with the PMOP, not the
regex).
This breaks with run-time regexes when the pattern gets recompiled; for
example:
for my $c (qw(a b c)) {
print "matched $c\n" if $c =~ m?^$c$?;
}
outputs
matched a
on unthreaded, but
matched a
matched b
matched c
on threaded.
The re_eval jumbo fix made this more noticeable by sometimes recompiling
even when the pattern text hasn't changed (to make closures work ok).
The quick fix is to propagate the readonlyness of the old re to the new
re. (The proper fix would be to store the flag state in a pad slot
associated with the PMOP).
Needless to say, I've gone for the quick fix.
|
|
|
|
|
|
|
| |
In the threaded version of PmopSTASH_set(), the assigned value is a
PADOFFSET, not a pointer; so use 0 rather than NULL for the default value.
This keeps clang happy.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This only affected threaded builds. I think the comments in the added
test explain well enough what was happening.
The solution is to store a stashpad offset in the pmop, instead of the
name of the stash. This is similar to what was done with cop stashes
in d4d03940c58a.
Not only does this fix the crash, but it also makes compilation faster
and saves memory (no separate malloc for every m?pat?).
I had to move Safefree(PL_stashpad) later on in perl_destruct, because
freeing a pmop causes the PL_stashpad to be accessed, and pmops can be
freed during sv_clean_all. Its previous location was not a problem
for cops, as PL_stashpad[cop->cop_stashoff] is only accessed when
PL_curcop==that_cop and Perl code is running, not when cops are freed.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I had to change the definition of IS_PADCONST to account for the
SVf_IsCOW flag. Previously, anything marked READONLY would be consid-
ered a pad constant, to be shared by pads of different recursion lev-
els. Some of those READONLY things were not actually read-only, as
they were copy-on-write scalars, which are never read-only. So I
changed the definition of IS_PADCONST in e3918bb703c to accept COWs
as well as read-only scalars, since I was removing the READONLY flag
from COWs.
With the new copy-on-write scheme, it is easy for a TARG to turn into
a COW. If that happens and then the same subroutine calls itself
recursively for the first time after that, pad_push will see that this
is a pad ‘constant’ and allow the next recursion level to share it.
If pp_concat calls itself recursively, the recursive call can modify
the scalar the outer call is in the middle of using, causing the
return value to be doubled up (‘tmptmp’) in the test case added here.
Since pad constants are marked PADTMP (I would like to change that
eventually), there is no way to distinguish them from TARGs when the
are COWs, except for the fact that pad constants that are COWs are
always shared hash keys (SvLEN==0).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As discussed in ticket #114820, instead of using READONLY+FAKE to mark
a copy-on-write string, we should make it a separate flag.
There are many modules in CPAN (and 1 in core, Compress::Raw::Zlib)
that assume that SvREADONLY means read-only. Only one CPAN module,
POSIX::pselect will definitely be broken by this. Others may need to
be tweaked. But I believe this is for the better.
It causes all tests except ext/Devel-Peek/t/Peek.t (which needs a tiny
tweak still) to pass under PERL_OLD_COPY_ON_WRITE, which is a prereq-
uisite for any new COW scheme that creates COWs under the same cir-
cumstances.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In a construct like
my ($x,$y) = @_
the pushmark/padsv/padsv is already optimised into a single padrange
op. This commit makes the OPf_SPECIAL flag on the padrange op indicate
that in addition, @_ should be pushed onto the stack, skipping an
additional pushmark/gv[*_]/rv2sv combination.
So in total (including the earlier padrange work), the above construct
goes from being
3 <0> pushmark s
4 <$> gv(*_) s
5 <1> rv2av[t3] lK/1
6 <0> pushmark sRM*/128
7 <0> padsv[$x:1,2] lRM*/LVINTRO
8 <0> padsv[$y:1,2] lRM*/LVINTRO
9 <2> aassign[t4] vKS
to
3 <0> padrange[$x:1,2; $y:1,2] l*/LVINTRO,2 ->4
4 <2> aassign[t4] vKS
|
|
|
|
|
| |
Add a new save type that does the equivalent of multiple SAVEt_CLEARSV's
for a given target range. This makes the new padange op more efficient.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This single op can, in some circumstances, replace the sequence of a
pushmark followed by one or more padsv/padav/padhv ops, and possibly
a trailing 'list' op, but only where the targs of the pad ops form
a continuous range.
This is generally more efficient, but is particularly so in the case
of void-context my declarations, such as:
my ($a,@b);
Formerly this would be executed as the following set of ops:
pushmark pushes a new mark
padsv[$a] pushes $a, does a SAVEt_CLEARSV
padav[@b] pushes all the flattened elements (i.e. none) of @a,
does a SAVEt_CLEARSV
list pops the mark, and pops all stack elements except the last
nextstate pops the remaining stack element
It's now:
padrange[$a..@b] does two SAVEt_CLEARSV's
nextstate nothing needing doing to the stack
Note that in the case above, this commit changes user-visible behaviour in
pathological cases; in particular, it has always been possible to modify a
lexical var *before* the my is executed, using goto or closure tricks.
So in principle someone could tie an array, then could notice that FETCH
is no longer being called, e.g.
f();
my ($s, @a); # this no longer triggers two FETCHES
sub f {
tie @a, ...;
push @a, 1,2;
}
But I think we can live with that.
Note also that having a padrange operator will allow us shortly to have
a corresponding SAVEt_CLEARPADRANGE save type, that will replace multiple
individual SAVEt_CLEARSV's.
|
|
|
|
|
|
| |
Since OPf_WANT_VOID == G_VOID etc, we can substantially simplify
the OP_GIMME macro (which is what implements GIMME_V).
This saves 588 bytes in the perl executable on my -O2 system.
|
|
|
|
|
|
| |
obslab and the removal of the op_latefree logic, which allowed static
ops, removed support for the compiler modules, which allocates ops statically.
Add an op_static flag to replace the old latefree(d) op_free logic.
|
|
|
|
| |
It was added in ce862d02d but has never been used.
|
|
|
|
|
|
|
|
|
|
|
| |
The easiest way to fix this was to move the special handling out of
the regexp engine. Now a flag is set on the split op itself for
this case. A real regexp is still created, as that is the most
convenient way to propagate locale settings, and it prevents the
need to rework pp_split to handle a null regexp.
This also means that custom regexp plugins no longer need to handle
split specially (which they all do currently).
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since 6ea72b3a1, rv2hv and padhv have had the ability to return boo-
leans in scalar context, instead of bucket stats, if flagged the right
way. sub { %hash || ... } is optimised to take advantage of this. If
the || is in unknown context at compile time, the %hash is flagged as
being maybe a true boolean. When flagged that way, it returns a bool-
ean if block_gimme() returns G_VOID.
If rv2hv and padhv can already do this, then we don’t need the
boolkeys op any more. We can just flag the rv2hv to return a boolean.
In all the cases where boolkeys was used, we know at compile time that
it is true boolean context, so we add a new flag for that.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In %hash || $foo, the %hash is in scalar context, so it has to iterate
through the buckets to produce statistics on bucket usage.
If the || is in void context, the value returned by hash is only ever
used as a boolean (as || doesn’t have to return it). We already opti-
mise it by adding a boolkeys op when it is known at compile time that
|| will be in void context.
In sub { %hash || $foo } it is not known at compile time that it will
be in void context, so it wasn’t optimised.
This commit optimises it by flagging the %hash at compile time as
being possibly in ‘true boolean’ context. When that flag is set,
the rv2hv and padhv ops call block_gimme() to see whether || is in
void context.
This speeds things up signficantly. Here is what I got after optimis-
ing rv2hv but before doing padhv:
$ time ./miniperl -e '%hash = 1..10000; sub { %hash || 1 }->() for 1..100000'
real 0m0.179s
user 0m0.101s
sys 0m0.005s
$ time ./miniperl -e 'my %hash = 1..10000; sub { %hash || 1 }->() for 1..100000'
real 0m5.446s
user 0m2.419s
sys 0m0.015s
(That example is slightly misleading because of the closure, but the
closure version takes 1 sec. when optimised.)
|
|
|
|
|
|
|
|
| |
This new macro expands to ‘assert(...),’ (with a trailing comma) under
debugging builds; the empty string otherwise.
It allows for the removal of some #ifdef DEBUGGINGs, which could not be
avoided otherwise.
|
|
|
|
|
|
|
| |
This was an early attempt to fix leaking of ops after syntax errors,
disabled because it was deemed to fragile. The new slab allocator
(8be227a) has solved this problem another way, so latefree(d) no
longer serves any purpose.
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit eliminates the old slab allocator. It had bugs in it, in
that ops would not be cleaned up properly after syntax errors. So why
not fix it? Well, the new slab allocator *is* the old one fixed.
Now that this is gone, we don’t have to worry as much about ops leak-
ing when errors occur, because it won’t happen any more.
Recent commits eliminated the only reason to hang on to it:
PERL_DEBUG_READONLY_OPS required it.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I want to eliminate the old slab allocator (PL_OP_SLAB_ALLOC),
but this useful debugging tool needs to be rewritten for the new
one first.
This is slightly better than under PL_OP_SLAB_ALLOC, in that CVs cre-
ated after the main CV starts running will get read-only ops, too. It
is when a CV finishes compiling and relinquishes ownership of the slab
that the slab is made read-only, because at that point it should not
be used again for allocation.
BEGIN blocks are exempt, as they are processed before the Slab_to_ro
call in newATTRSUB. The Slab_to_ro call must come at the very end,
after LEAVE_SCOPE, because otherwise the ops freed via the stack (the
SAVEFREEOP calls near the top of newATTRSUB) will make the slab writa-
ble again. At that point, the BEGIN block has already been run and
its slab freed. Maybe slabs belonging to BEGIN blocks can be made
read-only later.
Under PERL_DEBUG_READONLY_OPS, op slabs have two extra fields to
record the size and readonliness of each slab. (Only the first slab
in a CV’s slab chain uses the readonly flag, since it is conceptually
simpler to treat them all as one unit.) Without recording this infor-
mation manually, things become unbearably slow, the tests taking hours
and hours instead of minutes.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This addresses bugs #111462 and #112312 and part of #107000.
When a longjmp occurs during lexing, parsing or compilation, any ops
in C autos that are not referenced anywhere are leaked.
This commit introduces op slabs that are attached to the currently-
compiling CV. New ops are allocated on the slab. When an error
occurs and the CV is freed, any ops remaining are freed.
This is based on Nick Ing-Simmons’ old experimental op slab implemen-
tation, but it had to be rewritten to work this way.
The old slab allocator has a pointer before each op that points to a
reference count stored at the beginning of the slab. Freed ops are
never reused. When the last op on a slab is freed, the slab itself is
freed. When a slab fills up, a new one is created.
To allow iteration through the slab to free everything, I had to have
two pointers; one points to the next item (op slot); the other points
to the slab, for accessing the reference count. Ops come in different
sizes, so adding sizeof(OP) to a pointer won’t work.
The old slab allocator puts the ops at the end of the slab first, the
idea being that the leaves are allocated first, so the order will be
cache-friendly as a result. I have preserved that order for a dif-
ferent reason: We don’t need to store the size of the slab (slabs
vary in size; see below) if we can simply follow pointers to find
the last op.
I tried eliminating reference counts altogether, by having all ops
implicitly attached to PL_compcv when allocated and freed when the CV
is freed. That also allowed op_free to skip FreeOp altogether, free-
ing ops faster. But that doesn’t work in those cases where ops need
to survive beyond their CVs; e.g., re-evals.
The CV also has to have a reference count on the slab. Sometimes the
first op created is immediately freed. If the reference count of
the slab reaches 0, then it will be freed with the CV still point-
ing to it.
CVs use the new CVf_SLABBED flag to indicate that the CV has a refer-
ence count on the slab. When this flag is set, the slab is accessible
via CvSTART when CvROOT is not set, or by subtracting two pointers
(2*sizeof(I32 *)) from CvROOT when it is set. I decided to sneak the
slab into CvSTART during compilation, because enlarging the xpvcv
struct by another pointer would make all CVs larger, even though this
patch only benefits few (programs using string eval).
When the CVf_SLABBED flag is set, the CV takes responsibility for
freeing the slab. If CvROOT is not set when the CV is freed or
undeffed, it is assumed that a compilation error has occurred, so the
op slab is traversed and all the ops are freed.
Under normal circumstances, the CV forgets about its slab (decrement-
ing the reference count) when the root is attached. So the slab ref-
erence counting that happens when ops are freed takes care of free-
ing the slab. In some cases, the CV is told to forget about the slab
(cv_forget_slab) precisely so that the ops can survive after the CV is
done away with.
Forgetting the slab when the root is attached is not strictly neces-
sary, but avoids potential problems with CvROOT being written over.
There is code all over the place, both in core and on CPAN, that does
things with CvROOT, so forgetting the slab makes things more robust
and avoids potential problems.
Since the CV takes ownership of its slab when flagged, that flag is
never copied when a CV is cloned, as one CV could free a slab that
another CV still points to, since forced freeing of ops ignores the
reference count (but asserts that it looks right).
To avoid slab fragmentation, freed ops are marked as freed and
attached to the slab’s freed chain (an idea stolen from DBM::Deep).
Those freed ops are reused when possible. I did consider not reusing
freed ops, but realised that would result in significantly higher mem-
ory using for programs with large ‘if (DEBUG) {...}’ blocks.
SAVEFREEOP was slightly problematic. Sometimes it can cause an op to
be freed after its CV. If the CV has forcibly freed the ops on its
slab and the slab itself, then we will be fiddling with a freed slab.
Making SAVEFREEOP a no-op won’t help, as sometimes an op can be
savefreed when there is no compilation error, so the op would never
be freed. It holds a reference count on the slab, so the whole
slab would leak. So SAVEFREEOP now sets a special flag on the op
(->op_savefree). The forced freeing of ops after a compilation error
won’t free any ops thus marked.
Since many pieces of code create tiny subroutines consisting of only
a few ops, and since a huge slab would be quite a bit of baggage for
those to carry around, the first slab is always very small. To avoid
allocating too many slabs for a single CV, each subsequent slab is
twice the size of the previous.
Smartmatch expects to be able to allocate an op at run time, run it,
and then throw it away. For that to work the op is simply mallocked
when PL_compcv has’t been set up. So all slab-allocated ops are
marked as such (->op_slabbed), to distinguish them from mallocked ops.
All of this is kept under lock and key via #ifdef PERL_CORE, as it
should be completely transparent. If it isn’t transparent, I would
consider that a bug.
I have left the old slab allocator (PL_OP_SLAB_ALLOC) in place, as
it is used by PERL_DEBUG_READONLY_OPS, which I am not about to
rewrite. :-)
Concerning the change from A to X for slab allocation functions:
Many times in the past, A has been used for functions that were
not intended to be public but were used for public macros. Since
PL_OP_SLAB_ALLOC is rarely used, it didn’t make sense for Perl_Slab_*
to be API functions, since they were rarely actually available. To
avoid propagating this mistake further, they are now X.
|
|
|
|
|
|
|
|
|
| |
This is to allow future commits to free dangling ops after errors.
If an op is on the savestack, then it is going to be freed by scope.c,
and op_free must not be called on it by anyone else.
So we flag such ops new.
|
|
|
|
|
|
| |
This isn't actually used in the pm_flags field of a PMOP, but is
used in the pm_flags arg of Perl_re_op_compile() to indicate
that this pattern should be compiled with "use re 'eval'" in scope
|
|
|
|
|
|
|
|
|
|
|
| |
This indicates that a particular PMOP is in fact OP_QR. We should of
course be able to tell this from op_type, but the regex-compiling API
only gets passed op_flags.
This then allows us to fix a bug where we were deciding during compilation
whether to hang on to the code_blocks based on whether the PMOP was
PMf_HAS_CV rather than PMf_IS_QR; the latter implies the former, but not
the other way round.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This indicates that the op_code_list field in a PMOP is "private";
that is, it points to a list of DO blocks that we don't own, and
shouldn't free, and whose pad may not match ours.
This will allow us to use the op_code_list field in the runtime case of
literal code, e.g. /$runtime(?{...})/ and qr/$runtime(?{...})/. Here, at
compile-time, we need to make the pre-compiled (?{..}) blocks available to
pp_regcomp, but the list containing those blocks is also the list that is
executed in the lead-up to executing pp_regcomp (while skipping the DO
blocks), so the code is already embedded, and doesn't need freeing.
Furthermore, in the qr// case, the code blocks are actually within a
different sub (an anon one) than the PMOP, so the pads won't match.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this commit, qr// with a literal (compile-time) code block
will Do the Right Thing as regards closures and the scope of lexical vars;
in particular, the following now creates three regexes that match 1, 2
and 3:
for my $i (0..2) {
push @r, qr/^(??{$i})$/;
}
"1" =~ $r[1]; # matches
Previously, $i would be evaluated as undef in all 3 patterns.
This is achieved by wrapping the compilation of the pattern within a
new anonymous CV, which is then attached to the pattern. At run-time
pp_qr() clones the CV as well as copying the REGEXP; and when the code
block is executed, it does so using the pad of the cloned CV.
Which makes everything come out all right in the wash.
The CV is stored in a new field of the REGEXP, called qr_anoncv.
Note that run-time qr//s are still not fixed, e.g. qr/$foo(?{...})/;
nor is it yet fixed where the qr// is embedded within another pattern:
continuing with the code example from above,
my $i = 99;
"1" =~ $r[1]; # bare qr: matches: correct!
"X99" =~ /X$r[1]/; # embedded qr: matches: whoops, it's still
seeing the wrong $i
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change the way that code blocks in patterns are parsed and executed,
especially as regards lexical and scoping behaviour.
(Note that this fix only applies to literal code blocks appearing within
patterns: run-time patterns, and literals within qr//, are still done the
old broken way for now).
This change means that for literal /(?{..})/ and /(??{..})/:
* the code block is now fully parsed in the same pass as the surrounding
code, which means that the compiler no longer just does a simplistic
count of balancing {} to find the limits of the code block;
i.e. stuff like /(?{ $x = "{" })/ now works (in the same way
that subscripts in double quoted strings always have: "$a{'{'}" )
* Error and warning messages will now appear to emanate from the main body
rather than an re_eval; e.g. the output from
#!/usr/bin/perl
/(?{ warn "boo" })/
has changed from
boo at (re_eval 1) line 1.
to
boo at /tmp/p line 2.
* scope and closures now behave as you might expect; for example
for my $x (qw(a b c)) { "" =~ /(?{ print $x })/ }
now prints "abc" rather than ""
* with recursion, it now finds the lexical within the appropriate depth
of pad: this code now prints "012" rather than "000":
sub recurse {
my ($n) = @_;
return if $n > 2;
"" =~ /^(?{print $n})/;
recurse($n+1);
}
recurse(0);
* an earlier fix that stopped 'my' declarations within code blocks causing
crashes, required the accumulating of two SAVECOMPPADs on the stack for
each iteration of the code block; this is no longer needed;
* UNITCHECK blocks within literal code blocks are now run as part of the
main body of code (run-time code blocks still trigger an immediate
call to the UNITCHECK block though)
This is all achieved by building upon the efforts of the commits which led
up to this; those altered the parser to parse literal code blocks
directly, but up until now those code blocks were discarded by
Perl_pmruntime and the block re-compiled using the original re_eval
mechanism. As of this commit, for the non-qr and non-runtime variants,
those code blocks are no longer thrown away. Instead:
* the LISTOP generated by the parser, which contains all the code
blocks plus OP_CONSTs that collectively make up the literal pattern,
is now stored in a new field in PMOPs, called op_code_list. For example
in /A(?{BLOCK})C/, the listop stored in op_code_list looks like
LIST
PUSHMARK
CONST['A']
NULL/special (aka a DO block)
BLOCK
CONST['(?{BLOCK})']
CONST['B']
* each of the code blocks has its last op set to null and is individually
run through the peephole optimiser, so each one becomes a little
self-contained block of code, rather than a list of blocks that run into
each other;
* then in re_op_compile(), we concatenate the list of CONSTs to produce a
string to be compiled, but at the same time we note any DO blocks and
note the start and end positions of the corresponding CONST['(?{BLOCK})'];
* (if the current regex engine isn't the built-in perl one, then we just
throw away the code blocks and pass the concatenated string to the engine)
* then during regex compilation, whenever we encounter a '(?{', we see if
it matches the index of one of the pre-compiled blocks, and if so, we
store a pointer to that block in an 'l' data slot, and use the end index
to skip over the text of the code body. Conversely, if the index doesn't
match, then we know that it's a run-time pattern and (for now), compile
it in the old way.
* During execution, when an EVAL op is encountered, if data->what is 'l',
then we just use the pad that was in effect when the pattern was called;
i.e. we use the current pad slot of the currently executing CV that the
pattern is embedded within.
|
|
|
|
|
| |
This updates the editor hints in our files for Emacs and vim to request
that tabs be inserted as spaces.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was added to op.h in commit 599cee73:
commit 599cee73f2261c5e09cde7ceba3f9a896989e117
Author: Paul Marquess <paul.marquess@btinternet.com>
Date: Wed Jul 29 10:28:45 1998 +0100
lexical warnings; tweaks to places that didn't apply correctly
Message-Id: <9807290828.AA26286@claudius.bfsec.bt.co.uk>
Subject: lexical warnings patch for 5.005_50
p4raw-id: //depot/perl@1773
dump.c was modified to dump in, in this commit:
commit bf91b999b25fa75a3ef7a327742929592a2e7e9c
Author: Simon Cozens <simon@netthink.co.uk>
Date: Sun May 13 21:20:36 2001 +0100
Op private flags
Message-ID: <20010513202036.A21896@netthink.co.uk>
p4raw-id: //depot/perl@10117
But is apparently completely unused anywhere. And I want that bit.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This comment in perl.c:
/* Note: 20,40,80 used for NATIVE_HINTS */
(added by a0ed51b3 [Here are the long-expected Unicode/UTF-8 mod-
ifications.]), has apparently always been wrong. The values in
vms/vmsish.h end with 7 zeroes in hex, and are only shifted down to
one zero when assigned to cop->op_private in op.c:newSTATEOP. In
PL_hints they never have the values indicated in perl.h. So those
are actually free bits. It’s the high versions in vmsish.h that
are not free.
20 (actually 0x20000000) hasn’t been used for anything since commit
744a34f9085 (Urk -- undo previous removal of vmsish 'exit' change),
which change I don’t really understand. In any case, the comment was
never updated.
This comment in op.h:
/* NOTE: OP_NEXTSTATE and OP_DBSTATE (i.e. COPs) carry lower
* bits of PL_hints in op_private */
was added by d41ff1b8ad98 (introduce $^U), which was later reverted.
op_private does not carry the lower bits of PL_hints, but, rather,
certain higher bits of PL_hints, shifted to fit (the NATIVE_HINTS
cited above).
Due to misleading comments, I ended up breaking the VMS build in com-
mit d1718a7cf5, by using its bits for something else.
This commit moves the bits around a bit to avoid the clash, and modi-
fies the comments to reflect reality.
|
|
|
|
|
| |
This meant changing LABEL's definition in perly.y, so most of this
commit is actually from the regened files.
|
|
|
|
|
| |
This function provides a convenient and thread-safe way for modules to
hook op checking.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Up till now, -t was popping too much off the stack when stacked with
other filetest operators.
Since the special use of _ doesn’t apply to -t, we cannot simply have
it use _ when stacked, but instead we pass the argument down from the
previous op.
To facilitate this, the whole stacked mechanism has to change.
As before, in an expression like -r -w -x, -x and -w are flagged
as ‘stacking’ ops (followed by another filetest), and -w and -r are
flagged as stacked (preceded by another filetest).
Stacking filetest ops no longer return a false value to the next op
when a test fails, and stacked ops no longer check the truth of the
value on the stack to determine whether to return early (if it’s
false).
The argument to the first filetest is now passed from one op to
another. This is similar to the mechanism that overloaded objects
were already using. Now it applies to any argument.
Since it could be false, we cannot rely on the boolean value of the
stack item. So, stacking ops, when they return false, now traverse
the ->op_next pointers and find the op after the last stacked op.
That op is returned to the runloop. This short-circuiting is proba-
bly faster than calling every subsequent op (a separate function call
for each).
Filetest ops other than -t continue to use the last stat buffer when
stacked, so the argument on the stack is ignored.
But if the op is preceded by nothing other than -t (where preceded
means on the right, since the ops are evaluated right-to-left), it
*does* use the argument on the stack, since -t has not set the last
stat buffer.
The new OPpFT_AFTER_t flag indicates that a stacked op is preceded by
nothing other than -t.
In ‘-e -t foo’, the -e gets the flag, but not in ‘-e -t -r foo’,
because -r will have saved the stat buffer, so -e can just use that.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This bug is a side effect of rv2gv’s starting to return an incoercible
mortal copy of a coercible glob in 5.14:
$ perl5.12.4 -le 'open FH, "t/test.pl"; $fh=*FH; tell $fh; print tell'
0
$ perl5.14.0 -le 'open FH, "t/test.pl"; $fh=*FH; tell $fh; print tell'
-1
In the first case, tell without arguments is returning the position of
the filehandle.
In the second case, tell with an explicit argument that happens to
be a coercible glob (tell has an implicit rv2gv, so tell $fh is actu-
ally tell *$fh) sets PL_last_in_gv to a mortal copy thereof, which is
freed at the end of the statement, setting PL_last_in_gv to null. So
there is no ‘last used’ handle by the time we get to the tell without
arguments.
This commit adds a new rv2gv flag that tells it not to copy the glob.
By doing it unconditionally on the kidop, this allows tell(*$fh) to
work the same way.
Let’s hope nobody does tell(*{*$fh}), which will unset PL_last_in_gv
because the inner * returns a mortal copy.
This whole area is really icky. PL_last_in_gv should be refcounted,
but that would cause handles to leak out of scope, breaking programs
that rely on the auto-closing ‘feature’.
|
|
|
|
|
|
|
|
| |
This commit not only mentions default (as opposed to when)
in the error message about it being outside a topicalizer, but
also normalises those error messages, making them consistent with
continue and other loop controls. It also makes the perldiag
entry for when actually match the error message.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In void context we can optimise
substr($foo, $bar, $baz) = $replacement;
to something like
substr($foo, $bar, $baz, $replacement);
except that the execution order must be preserved. So what we actu-
ally do is
substr($replacement, $foo, $bar, $baz);
with a flag to indicate that the replacement comes first. This means
we can also optimise assignment to two-argument substr the same way.
Although optimisations are not supposed to change behaviour,
this one does.
• It stops substr assignment from calling get-magic twice, which means
the optimisation makes things less buggy than usual.
• It causes the uninitialized warning (for an undefined first argu-
ment) to mention the substr operator, as it did before the previous
commit, rather than the assignment operator. I think that sort of
detail is minor enough.
I had to make the warning about clobbering references apply whenever
substr does a replacement, and not only when used as an lvalue. So
four-argument substr now emits that warning. I would consider that a
bug fix, too.
Also, if the numeric arguments to four-argument substr and the
replacement string are undefined, the order of the uninitialized warn-
ings is slightly different, but is consistent regardless of whether
the optimisation is in effect.
I believe this will make 95% of substr assignments run faster. So
there is less incentive to use what I consider the less readable form
(the four-argument form, which is not self-documenting).
Since I like naïve benchmarks, here are Before and After:
$ time ./miniperl -le 'do{$x="hello"; substr ($x,0,0) = 34;0}for 1..1000000'
real 0m2.391s
user 0m2.381s
sys 0m0.005s
$ time ./miniperl -le 'do{$x="hello"; substr ($x,0,0) = 34;0}for 1..1000000'
real 0m0.936s
user 0m0.927s
sys 0m0.005s
|
|
|
|
|
| |
After much alternation, altercation and alteration, __SUB__ is
finally here.
|
|
|
|
|
|
|
|
|
|
|
| |
This function evaluates its argument as a byte string, regardless of
the internal encoding. It croaks if the string contains characters
outside the byte range. Hence evalbytes(" use utf8; '\xc4\x80' ")
will return "\x{100}", even if the original string had the UTF8 flag
on, and evalbytes(" '\xc4\x80' ") will return "\xc4\x80".
This has the side effect of fixing the deparsing of CORE::break under
‘use feature’ when there is an override.
|
|
|
|
|
| |
(modified by the committer only to apply when the unicode_eval
feature is enabled)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit makes CORE::glob bypassing glob overrides.
A side effect of the fix is that, with the default glob implementa-
tion, undefining *CORE::GLOBAL::glob no longer results in an ‘unde-
fined subroutine’ error.
Another side effect is that compilation of a glob op no longer assumes
that the loading of File::Glob will create the *CORE::GLOB::glob type-
glob. ‘++$INC{"File/Glob.pm"}; sub File::Glob::csh_glob; eval '<*>';’
used to crash.
This is accomplished using a mechanism similar to lock() and
threads::shared. There is a new PL_globhook interpreter varia-
ble that pp_glob calls when there is no override present. Thus,
File::Glob (which is supposed to be transparent, as it *is* the
built-in implementation) no longer interferes with the user mechanism
for overriding glob.
This removes one tier from the five or so hacks that constitute glob’s
implementation, and which work together to make it one of the buggiest
and most inconsistent areas of Perl.
|
|
|
|
|
|
|
|
|
|
| |
It is no longer used in core (having been superseded by
cv_ckproto_len_flags), is unused on CPAN, and is not part of the API.
The cv_ckproto ‘public’ macro is modified to use the _flags version.
I put ‘public’ in quotes because, even before this commit, cv_ckproto
was using a non-exported function, and hence could never have worked
on a strict linker (or whatever you call it).
|
|
|
|
|
|
|
|
|
| |
With threaded builds, cop.h and op.h get an extra member in their
structs, to save the UTF-8ness of the stash's name.
*STASH_set() checks for the flag, stores it through
*STASH_flags(), and *STASH() uses the latter to fetch the
correct scalar.
|
|
|
|
|
|
| |
$[ remains as a variable. It no longer has compile-time magic.
At runtime, it always reads as zero, accepts a write of zero, but dies
on writing any other value.
|
|
|
|
|
|
|
| |
instead of deriving it from the opchain.
Also contains a test where using the opchain to determine the deref
type fails.
|
|
|
|
| |
used by OPpDEREF
|
|
|
|
|
| |
pp_coreargs will use this to distinguish between the \$ and \[$@%*]
prototypes.
|