| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
Building with -Accflags="-Wformat -Werror=format-security"
triggers format string warnings from gcc.
As gcc can't tell that all the strings are constant here,
explicitly pass separate format strings to make it happy.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PERL_ASYNC_CHECK() was added to Perl_leave_scope() as part of commit
f410a2119920dd04, which moved signal dispatch from the runloop to
control flow ops, to mitigate nearly all of the speed cost of safe
signals.
The assumption was that scope exit was a safe place to dispatch signals.
However, this is not true, as parts of the regex engine call
leave_scope(), the regex engine stores some state in per-interpreter
variables, and code called within signal handlers can change these
values.
Hence remove the call to PERL_ASYNC_CHECK() from Perl_leave_scope(), and
add it explicitly in the various OPs which were relying on their call to
leave_scope() to dispatch any pending signals. Also add a
PERL_ASYNC_CHECK() to the exit of the runloop, which ensures signals
still dispatch from S_sortcv() and S_sortcv_stacked(), as well as
addressing one of the concerns in the commit message of
f410a2119920dd04:
Subtle bugs might remain - there might be constructions that enter
the runloop (where signals used to be dispatched) but don't contain
any PERL_ASYNC_CHECK() calls themselves.
Finally, move the PERL_ASYNC_CHECK(); added by that commit to pp_goto to
the end of the function, to be consistent with the positioning of all
other PERL_ASYNC_CHECK() calls - at the beginning or end of OP
functions, hence just before the return to or just after the call from
the runloop, and hence effectively at the same point as the previous
location of PERL_ASYNC_CHECK() in the runloop.
|
|
|
|
|
| |
The are lots of places where local vars aren't used when compiled
with NO_TAINT_SUPPORT.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(See RT #113928)
Formerly, __SUB__ within a code block within a qr// returned
a pointer to the "hidden" anon CV that implements the qr// closure. Since
this was never designed to be called directly, it would SEGV if you tried.
The easiest way to make this safe is to skip any CXt_SUB frames that
are marked as CXp_SUB_RE: i.e. skip any subs that are there just to
execute code blocks. For a qr//, this means that we return the sub which
the pattern match is embedded in.
Also, document the behaviour of __SUB__ within code blocks as being
subject to change. It could be argued for example that in these cases it
should return undef. But with the 5.18.0 release a month or two away, just
make it safe for now, and revisit the semantics later if necessary.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(See RT #113928)
In code like
sub foo { /A(?{ bar; caller(); }B/; }
the regex /A(?{B})C/ is, from a scope point of view, supposed to
be compiled and executed as:
/A/ && do { B } && /C/;
i.e. the code block in B is part of the same sub as the code surrounding
the regex. Thus the result of caller() above should see the caller as
whoever called foo.
Due to an implementation detail, we actually push a hidden extra
sub CX before calling the pattern. This detail was leaking when caller()
was used. Fux it so that it ignores this extra context frame.
Conversely, for a qr//, that *is* supposed to be seen as an extra level
of anonymous sub, so add tests to ensure that is so.
i.e.
$r = qr/...(?{code}).../
/...$r.../
is supposed to behave like
$r = sub { code };
$r->();
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Indeed. The Perl_ck_warner() call in die_unwind() used to happen
before unwinding, so would be affected by the lexical warning state
at the die() site. Now it happens after unwinding, so takes the
lexical warning state at the catching site. I don't have a clear
idea of which behaviour is more correct. t/op/die_keeperr.t, which
was introduced as part of my exception handling changes, is actually
testing for the catching-site criterion, but that's not asserting
that the criterion should be that. The documentation speaks of "no
warnings 'misc'", but doesn't say which lexical scope matters.
Assuming we want to revert this change, the easy fix is to move the
conditional Perl_ck_warner() back to before unwinding. A more
difficult way would be to determine the disposition of the warning
before unwinding and then warn in the required manner after
unwinding. I see no compelling reason to warn after unwinding
rather than before, so just moving the warning code should be fine.
Note from the committer: This patch was supplied by Zefram in
https://rt.perl.org/rt3/Ticket/Display.html?id=113794#txn-1204749
with a note that some extra work was required for
ext/XS-APItest/t/call.t before the job was done. Ricardo Signes
applied this patch and followed Zefram's lead in patching
ext/XS-APItest/t/call.t without being 100% certain that this was
what was meant. This commit was then submitted for review.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are two issues fixed here.
First, when a pattern has a run-time code-block included, such as
$code = '(?{...})'
/foo$code/
the mechanism used to parse those run-time blocks: of feeding the
resultant pattern into a call to eval_sv() with the string
qr'foo(?{...})'
and then extracting out any resulting opcode trees from the returned
qr object -- suffered from the re-parsed qr'..' also being subject to
overload:constant qr processing, which could result in Bad Things
happening.
Since we now have the PL_parser->lex_re_reparsing flag in scope throughout
the parsing of the pattern, this is easy to detect and avoid.
The second issue is a mechanism to avoid recursion when getting false
positives in S_has_runtime_code() for code like '[(?{})]'.
For patterns like this, we would suspect that the pattern may have code
(even though it doesn't), so feed it into qr'...' and reparse, and
again it looks like runtime code, so feed it in, rinse and repeat.
The thing to stop recursion was when we saw a qr with a single OP_CONST
string, we assumed it couldn't have any run-time component, and thus no
run-time code blocks.
However, this broke qr/foo/ in the presence of overload::constant qr
overloading, which could convert foo into a string containing code blocks.
The fix for this is to change the recursion-avoidance mechanism (in a way
which also turns out to be simpler too). Basically, when we fake up a
qr'...' and eval it, we turn off any 'use re eval' in scope: its not
needed, since we know the .... will be a constant string without any
overloading. Then we use the lack of 'use re eval' in scope to
skip calling S_has_runtime_code() and just assume that the code has no
run-time patterns (if it has, then eventually the regex parser will
rightly complain about 'Eval-group not allowed at runtime').
This commit also adds some fairly comprehensive tests for this.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PL_reg_state.re_reparsing is a hacky flag used to allow runtime
code blocks to be included in patterns. Basically, since code blocks
are now handled by the perl parser within literal patterns, runtime
patterns are handled by taking the (assembled at runtime) pattern,
and feeding it back through the parser via the equivalent of
eval q{qr'the_pattern'},
so that run-time (?{..})'s appear to be literal code blocks.
When this happens, the global flag PL_reg_state.re_reparsing is set,
which modifies lexing and parsing in minor ways (such as whether \\ is
stripped).
Now, I'm in the slow process of trying to eliminate global regex state
(i.e. gradually removing the fields of PL_reg_state), and also a change
which will be coming a few commits ahead requires the info which this flag
indicates to linger for longer (currently it is cleared immediately after
the call to scan_str().
For those two reasons, this commit adds a new mechanism to indicate this:
a new flag to eval_sv(), G_RE_REPARSING (which sets OPpEVAL_RE_REPARSING
in the entereval op), which sets the EVAL_RE_REPARSING bit in PL_in_eval.
Its still a yukky global flag hack, but its a *different* global flag hack
now.
For this commit, we add the new flag(s) but keep the old
PL_reg_state.re_reparsing flag and assert that the two mechanisms always
match. The next commit will remove re_reparsing.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch resolves several issues at once. The parts are
sufficiently interconnected that it is hard to break it down
into smaller commits. The tickets open for these issues are:
RT #94490 - split and constant folding
RT #116086 - split "\x20" doesn't work as documented
It additionally corrects some issues with cached regexes that
were exposed by the split changes (and applied to them).
It effectively reverts 5255171e6cd0accee6f76ea2980e32b3b5b8e171
and cccd1425414e6518c1fc8b7bcaccfb119320c513.
Prior to this patch the special RXf_SKIPWHITE behavior of
split(" ", $thing)
was only available if Perl could resolve the first argument to
split at compile time, meaning under various arcane situations.
This manifested as oddities like
my $delim = $cond ? " " : qr/\s+/;
split $delim, $string;
and
split $cond ? " ", qr/\s+/, $string
not behaving the same as:
($cond ? split(" ", $string) : split(/\s+/, $string))
which isn't very convenient.
This patch changes this by adding a new flag to the op_pmflags,
PMf_SPLIT which enables pp_regcomp() to know whether it was called
as part of split, which allows the RXf_SPLIT to be passed into run
time regex compilation. We also preserve the original flags so
pattern caching works properly, by adding a new property to the
regexp structure, "compflags", and related macros for accessing it.
We preserve the original flags passed into the compilation process,
so we can compare when we are trying to decide if we need to
recompile.
Note that this essentially the opposite fix from the one applied
originally to fix #94490 in 5255171e6cd0accee6f76ea2980e32b3b5b8e171.
The reverted patch was meant to make:
split( 0 || " ", $thing ) #1
consistent with
my $x=0; split( $x || " ", $thing ) #2
and not with
split( " ", $thing ) #3
This was reverted because it broke C<split("\x{20}", $thing)>, and
because one might argue that is not that #1 does the wrong thing,
but rather that the behavior of #2 that is wrong. In other words
we might expect that all three should behave the same as #3, and
that instead of "fixing" the behavior of #1 to be like #2, we should
really fix the behavior of #2 to behave like #3. (Which is what we did.)
Also, it doesn't make sense to move the special case detection logic
further from the regex engine. We really want the regex engine to decide
this stuff itself, otherwise split " ", ... wouldn't work properly with
an alternate engine. (Imagine we add a special regexp meta pattern that behaves
the same as " " does in a split /.../. For instance we might make
split /(*SPLITWHITE)/ trigger the same behavior as split " ".
The other major change as result of this patch is it effectively
reverts commit cccd1425414e6518c1fc8b7bcaccfb119320c513, which
was intended to get rid of RXf_SPLIT and RXf_SKIPWHITE, which
and free up bits in the regex flags structure.
But we dont want to get rid of these vars, and it turns out that
RXf_SEEN_LOOKBEHIND is used only in the same situation as the new
RXf_MODIFIES_VARS. So I have renamed RXf_SEEN_LOOKBEHIND to
RXf_NO_INPLACE_SUBST, and then instead of using two vars we use
only the one. Which in turn allows RXf_SPLIT and RXf_SKIPWHITE to
have their bits back.
|
|
|
|
|
|
| |
Makes use of the fact that the exception case is both rare and okay to
be penalized (a tiny bit) instead of the common case doing an extra
branch.
|
|
|
|
| |
another.
|
|
|
|
| |
A compiler on one of our smoke platforms wants empty braces here.
|
|
|
|
|
| |
PERL_UNUSED_DECL doesn't do anything under g++, so doing this silences
some g++ warnings.
|
|
|
|
|
| |
Changing these slightly got rid of the warnings like:
toke.c:9168: warning: format not a string literal and no format arguments
|
|
|
|
|
|
| |
Deciding whether this is goto-label or goto-sub can only correctly
happen after get-magic has been invoked, as get-magic can cause the
argument to begin or cease to be a subroutine reference.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was discussed in ticket #114820.
This new copy-on-write mechanism stores a reference count for the
PV inside the PV itself, at the very end. (I was using SvEND+1
at first, but parts of the regexp engine expect to be able to do
SvCUR_set(sv,0), which causes the wrong byte of the string to be used
as the reference count.) Only 256 SVs can share the same PV this way.
Also, only strings with allocated space after the trailing null can
be used for copy-on-write.
Much of the code is shared with PERL_OLD_COPY_ON_WRITE. The restric-
tion against doing copy-on-write with magical variables has hence been
inherited, though it is not necessary. A future commit will take
care of that.
I had to modify _core_swash_init to handle $@ differently. The exist-
ing mechanism of copying $@ to a new scalar and back again was very
fragile. With copy-on-write, $@ =~ s/// can cause pp_subst’s string
pointers to become stale. So now we remove the scalar from *@ and
allow the utf8-table-loading code to autovivify a new one. Then we
restore the untouched $@ afterwards if all goes well.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove a large amount of machine code (~4KB for me) from funcs that use
ERRSV making Perl faster and smaller by preventing multiple evaluation.
ERRSV is a macro that contains GvSVn which eventually conditionally calls
Perl_gv_add_by_type. If a SvTRUE or any other multiple evaluation macro
is used on ERRSV, the expansion will, in asm have dozens of calls to
Perl_gv_add_by_type one for each test/deref of the SV in SvTRUE. A less
severe problem exists when multiple funcs (sv_set*) in a row call, each
with ERRSV as an arg. Its recalculated then, Perl_gv_add_by_type and all.
I think ERRSV macro got the func call in commit f5fa9033b8, Perl RT #70862.
Prior to that commit it would be pure derefs I think. Saving the SV* is
still better than looking into interp->gv->gp to get the SV * after each
func call.
I received no responses to
http://www.nntp.perl.org/group/perl.perl5.porters/2012/11/msg195724.html
explaining when the SV is replaced in PL_errgv, so took a conservative
view and assumed callbacks (with Perl stack/ENTER/LEAVE/eval_*/call_*)
can change it. I also assume ERRSV will never return null, this allows a
more efficiently version of SvTRUE to be used.
In Perl_newATTRSUB_flags a wasteful copy to C stack operation with the
string was removed, and a croak_notcontext to remove push instructions to
the stack. I was not sure about the interaction between ERRSV and message
sv, I didn't change it to a more efficient (instruction wise, speed, idk)
format string combining of the not safe string and ERRSV in the croak call.
If such an optimization is done, a compiler potentially will put the not
safe string on the first, unconditionally, then check PL_in_eval, and
then jump to the croak call site, or eval ERRSV, push the SV on the C stack
then push the format string "%"SVf"%s". The C stack allocated const char
array came from commit e1ec3a884f .
In Perl_eval_pv, croak_on_error was checked first to not eval ERRSV unless
necessery. I was not sure about the side effects of using a more efficient
croak_sv instead of Perl_croak (null chars, utf8, etc) so I left a comment.
nocontext used to save an push instruction on implicit sys perl.
In S_doeval, don't open a new block to avoid large whitespace changes.
The NULL assignment should optimize away unless accidental usage of errsv
in the future happens through a code change. There might be a bug here from
commit ecad31f018 since previous a char * was derefed to check for null
char, but ERRSV will never be null, so "Unknown error\n" branch will never
be taken.
For pp_sys.c, in pp_die a new block was opened to not eval ERRSV if
"well-formed exception supplied". The else if else if else blocks all used
ERRSV, so a "SV * errsv = NULL;" and a eval in the conditional with comma
op thing wouldn't work (maybe it would, see toke.c comments later in this
message). pp_warn, I have no comments.
In S_compile_runtime_code, a croak_sv question comes up same as in
Perl_eval_pv.
In S_new_constant, a eval in the conditional is done to avoid evaling
ERRSV if PL_in_eval short circuits. Same thing in Perl_yyerror_pvn.
Perl__core_swash_init I have no comments.
In the future, a SvEMPTYSTRING macro should be considered (not fully
thought out by me) to replace the SvTRUEs with something smaller and
faster when dealing with ERRSV. _nomg is another thing to think about.
In S_init_main_stash there is an opportunity to prevent an extra ERRSV
between "sv_grow(ERRSV, 240);" and "CLEAR_ERRSV();" that was too complicated
for me to optimize.
before perl517.dll
.text 0xc2f77
.rdata 0x212dc
.data 0x3948
after perl517.dll
.text 0xc20d7
.rdata 0x212dc
.data 0x3948
Numbers are from VC 2003 x86 32 bit.
|
|
|
|
|
|
| |
When invoking the debugger recursively, pp_dbstate needs to push a new
pad (like pp_entersub) so that DB::DB doesn’t stomp on the lexical
variables belonging to the outer call.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
# New Ticket Created by "Eric Brine"
# Please include the string: [perl #115710]
# in the subject line of all future correspondence about this issue.
# <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=115710 >
This is a bug report for perl from ikegami@adaelis.com,
generated with the help of perlbug 1.39 running under perl 5.14.2.
-----------------------------------------------------------------
[Please describe your issue here]
Attached patch silences two build warnings on systems where ivsize >
ptrsize.
They are safe to ignore, a side-effect of a function with a polymorphic
interface.
cv = find_runcv_where(FIND_RUNCV_level_eq, iv, NULL);
cv = find_runcv_where(FIND_RUNCV_padid_eq, PTR2IV(p), NULL); // p is a
PADNAMELIST*
[Please do not change anything below this line]
-----------------------------------------------------------------
|
|
|
|
|
| |
We croak if CxTYPE(cx) == CXt_EVAL before reaching the code
in question.
|
| |
|
|
|
|
|
|
|
|
|
| |
I found these 2 strlens while stepping through the interp while running a
script and both came from a pp_require. UNIVERSAL::can was not modified
since it is more rarely called than pp_require. A better more through
investigation of version obj comparison and upgrading will need to be done
in the future (new funcs needed for the derived/upg_version idiom, remove
the upg_version since it was changed to always be a ver obj, etc).
|
|
|
|
|
| |
It is a little tricky, as we have to hang on to @_ while unwinding the
effects of local @_.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
By defining NO_TAINT_SUPPORT, all the various checks that perl does for
tainting become no-ops. It's not an entirely complete change: it doesn't
attempt to remove the taint-related interpreter variables, but instead
virtually eliminates access to it.
Why, you ask? Because it appears to speed up perl's run-time
significantly by avoiding various "are we running under taint" checks
and the like.
This change is not in a state to go into blead yet. The actual way I
implemented it might raise some (valid) objections. Basically, I
replaced all uses of the global taint variables (but not PL_taint_warn!)
with an extra layer of get/set macros (TAINT_get/TAINTING_get).
Furthermore, the change is not complete:
- PL_taint_warn would likely deserve the same treatment.
- Obviously, tests fail. We have tests for -t/-T
- Right now, I added a Perl warn() on startup when -t/-T are detected
but the perl was not compiled support it. It might be argued that it
should be silently ignored! Needs some thinking.
- Code quality concerns - needs review.
- Configure support required.
- Needs thinking: How does this tie in with CPAN XS modules that use
PL_taint and friends? It's easy to backport the new macros via PPPort,
but that doesn't magically change all code out there. Might be
harmless, though, because whenever you're running under
NO_TAINT_SUPPORT, any check of PL_taint/etc is going to come up false.
Thus, the only CPAN code that SHOULD be adversely affected is code
that changes taint state.
|
|
|
|
| |
This leak was caused by v5.17.4-125-gf7ee53b.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was leaking:
$ ./miniperl -Xe 'warn $$; while(1){eval "ok 8"};'
1915 at -e line 1.
^C
This was not:
$ ./miniperl -Xe 'warn $$; while(1){eval "sub {ok 8}"};'
1916 at -e line 1.
^C
The sub is successfully taking care of its ops when it is freed. The
eval is not.
I made the mistake of having the CV relinquish ownership of the op
slab after an eval syntax error. That’s precisely the situation in
which the ops are likely to leak, and for which the slab allocator was
designed. Duh.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since the xpvlv and regexp structs conflict, we have to find somewhere
else to put the regexp struct.
I was going to sneak it in SvPVX, allocating a buffer large
enough to fit the regexp struct followed by the string, and have
SvPVX - sizeof(regexp) point to the struct. But that would make all
regexp flag-checking macros fatter, and those are used in hot code.
So I came up with another method. Regexp stringification is not
speed-critical. So we can move the regexp stringification out of
re->sv_u and put it in the regexp struct. Then the regexp struct
itself can be pointed to by re->sv_u. So SVt_REGEXPs will have
re->sv_any and re->sv_u pointing to the same spot. PVLVs can then
have sv->sv_any point to the xpvlv body as usual, but have sv->sv_u
point to a regexp struct. All regexp member access can go through
sv_u instead of sv_any, which will be no slower than before.
Regular expressions will no longer be SvPOK, so we give sv_2pv spec-
ial logic for regexps. We don’t need to make the regexp struct
larger, as SvLEN is currently always 0 iff mother_re is set. So we
can replace the SvLEN field with the pv.
SvFAKE is never used without SvPOK or SvSCREAM also set. So we can
use that to identify regexps.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I added pad IDs so that a pad could record which pad it closes over,
to avoid problems with closures closing over the wrong pad, resulting
in crashes or bizarre copies. These pad IDs were shared between
clones of the same pad.
In commit 9ef8d56, for efficiency I made clones of the same closure
share the same pad name list.
It has just occurred to be that each padlist containing the same pad
name list also has the same pad ID, so we can just use the pad name
list itself as the ID.
This makes padlists 32 bits smaller and eliminates PL_pad_generation
from the interpreter struct.
|
|
|
|
|
|
| |
Setting $_ to a copy-on-write scalar in an @INC filter causes the
parser to modify every other scalar sharing the same string buffer.
It needs to be forced to a regular scalar before the parser sees it.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
According to the comments about how taint works above pp_subst in
pp_hot.c, the return value of s/// should not be tainted based on
the taintedness of the replacement. That makes sense, because the
replacement does not affect how many iterations there were. (The
return value is the number of iterations).
It only applies, however, to the cases where the ‘constant replace-
ment’ optimisation applies.
That means /e taints its return value:
$ perl5.16.0 -MDevel::Peek -Te '$_ = "abcd"; $x = s//$^X/; Dump $x'
SV = PVMG(0x822ff4) at 0x824dc0
REFCNT = 1
FLAGS = (pIOK)
IV = 1
NV = 0
PV = 0
$ perl5.16.0 -MDevel::Peek -Te '$_ = "abcd"; $x = s//$^X/e; Dump $x'
SV = PVMG(0x823010) at 0x824dc0
REFCNT = 1
FLAGS = (GMG,SMG,pIOK)
IV = 1
NV = 0
PV = 0
MAGIC = 0x201940
MG_VIRTUAL = &PL_vtbl_taint
MG_TYPE = PERL_MAGIC_taint(t)
MG_LEN = 1
The number pushed on to the stack was becoming tainted due to the set-
ting of PL_tainted. PL_tainted is assigned to and the return value
explicitly tainted if appropriate shortly after the mPUSHi (which
implies sv_setiv, which taints when PL_tainted is true), so setting
PL_tainted to 0 just before the mPUSHi is safe.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Following on from a recent thread I've put together a patch to expand
the error message when a module can't be loaded. With this patch,
instead of:
Can't locate Stuff/Of/Dreams.pm in @INC (@INC contains: ...)
You get:
Can't locate Stuff/Of/Dreams.pm in @INC (you may need to install the Stuff::Of::Dreams module) (@INC contains: ...)
[The committer tweaked the error message,
based on a suggestion by Tony Cook. See
<https://rt.perl.org/rt3/Ticket/Display.html?id=115048#txn-1157750>.]
|
|
|
|
|
| |
This is a follow-up to 432d4561c48, which fixed *DB::DB without
&DB::DB, but not &DB::DB without body.
|
|
|
|
|
|
|
|
|
| |
According to the documentation, reset() with no argument resets pat-
terns. But reset "" and reset "\0foo" were also resetting patterns.
While I was at it, I fixed embedded nulls, too, though it’s not likely
anyone is using this. I could not fix the bug within the existing API
for sv_reset, so I created a new function and left the old one with
the old behaviour. Call me pear-annoyed.
|
| |
|
|
|
|
|
| |
This was added in f3aa04c29a, but stopped being relevant in
d5ec2987912.
|
|
|
|
|
|
|
| |
In commit 7e4f04509c6 I forgot about caller. This commit makes the
value returned by (caller $n)[9] assignable to ${^WARNING_BITS} to
produce exactly the same warnings settings, including warnings con-
trolled by $^W.
|
|
|
|
|
|
|
|
|
|
|
|
| |
The code had been buggily attempting to overwrite just-freed memory since
PERL_POISON was added by commit 94010e71b67db040 in June 2005. However, no
regression test exercised this code path until recently.
Also fix the offset in the array of UVs used by PERL_OLD_COPY_ON_WRITE to
store RX_SAVED_COPY(). It now uses p[2]. Previously it had used p[1],
directly conflicting with the use of p[1] to store RX_NPARENS().
The code is too intertwined to meaningfully do these as separate commits.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 6502e08109cd003b2cdf39bc94ef35e52203240b introduced copying just
the part of the regex string that were needed; but piggy-backing on that
commit was a temporary change I made that I forgot to undo, which - it
turns out - causes SEGVs and similar when the replacement part of a
substitution dies.
This commits reverts that change.
Spotted as
Bleadperl v5.17.3-255-g6502e08 breaks GAAS/URI-1.60.tar.gz
(not assigned an RT ticket number yet)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(no functional changes).
1. Remove some dead code from pp_split; it's protected by an assert
that it could never be called.
2. Simplify the flags settings for the call to CALLREGEXEC() in
pp_substcont: on subsequent matches we always set REXEC_NOT_FIRST,
which forces the regex engine not to copy anyway, so passing the
REXEC_COPY_STR is pointless, as is the conditional code to set it.
3. (whitespace change): split a conditional expression over 2 lines
for easier reading.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a pattern matches, and that pattern contains captures (or $`, $&, $'
or /p are present), a copy is made of the whole original string, so
that $1 et al continue to hold the correct value even if the original
string is subsequently modified. This can have severe performance
penalties; for example, this code causes a 1Mb buffer to be allocated,
copied and freed a million times:
$&;
$x = 'x' x 1_000_000;
1 while $x =~ /(.)/g;
This commit changes this so that, where possible, only the needed
substring of the original string is copied: in the above case, only a
1-byte buffer is copied each time. Also, it now reuses or reallocs the
buffer, rather than freeing and mallocing each time.
Now that PL_sawampersand is a 3-bit flag indicating separately whether
$`, $& and $' have been seen, they each contribute only their own
individual penalty; which ones have been seen will limit the extent to
which we can avoid copying the whole buffer.
Note that the above code *without* the $& is not currently slow, but only
because the copying is artificially disabled to avoid the performance hit.
The next but one commit will remove that hack, meaning that it will still
be fast, but will now be correct in the presence of a modified original
string.
We achieve this by by adding suboffset and subcoffset fields to the
existing subbeg and sublen fields of a regex, to indicate how many bytes
and characters have been skipped from the logical start of the string till
the physical start of the buffer. To avoid copying stuff at the end, we
just reduce sublen. For example, in this:
"abcdefgh" =~ /(c)d/
subbeg points to a malloced buffer containing "c\0"; sublen == 1,
and suboffset == 2 (as does subcoffset).
while if $& has been seen,
subbeg points to a malloced buffer containing "cd\0"; sublen == 2,
and suboffset == 2.
If in addition $' has been seen, then
subbeg points to a malloced buffer containing "cdefgh\0"; sublen == 6,
and suboffset == 2.
The regex engine won't do this by default; there are two new flag bits,
REXEC_COPY_SKIP_PRE and REXEC_COPY_SKIP_POST, which in conjunction with
REXEC_COPY_STR, request that the engine skip the start or end of the
buffer (it will still copy in the presence of the relevant $`, $&, $',
/p).
Only pp_match has been enhanced to use these extra flags; substitution
can't easily benefit, since the usual action of s///g is to copy the
whole string first time round, then perform subsequent matching iterations
against the copy, without further copying. So you still need to copy most
of the buffer.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
String eval appends "\n;" to the string before evaluating it.
(caller $n)[6], which returns the text of the eval, was giving the
modified string, rather than the original.
In fact, it was returning the actual string buffer that the parser
uses. This commit changes it to create a new mortal SV from that
string buffer, but without the last two characters.
It unfortunately breaks this JAPH:
eval'BEGIN{${\(caller 2)[6]}=~y< !"$()+\-145=ACHMT^acfhinrsty{}>
<nlrhta"o Pe e,\nkrcrJ uthspeia">}say if+chr(1) -int"145"!=${^MATCH}'
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CVs close over their outer CVs. So, when you write:
my $x = 52;
sub foo {
sub bar {
sub baz {
$x
}
}
}
baz’s CvOUTSIDE pointer points to bar, bar’s CvOUTSIDE points to foo,
and foo’s to the main cv.
When the inner reference to $x is looked up, the CvOUTSIDE chain is
followed, and each sub’s pad is looked at to see if it has an $x.
(This happens at compile time.)
It can happen that bar is undefined and then redefined:
undef &bar;
eval 'sub bar { my $x = 34 }';
After this, baz will still refer to the main cv’s $x (52), but, if baz
had ‘eval '$x'’ instead of just $x, it would see the new bar’s $x.
(It’s not really a new bar, as its refaddr is the same, but it has a
new body.)
This particular case is harmless, and is obscure enough that we could
define it any way we want, and it could still be considered correct.
The real problem happens when CVs are cloned.
When a CV is cloned, its name pad already contains the offsets into
the parent pad where the values are to be found. If the outer CV
has been undefined and redefined, those pad offsets can be com-
pletely bogus.
Normally, a CV cannot be cloned except when its outer CV is running.
And the outer CV cannot have been undefined without also throwing
away the op that would have cloned the prototype.
But formats can be cloned when the outer CV is not running. So it
is possible for cloned formats to close over bogus entries in a new
parent pad.
In this example, \$x gives us an array ref. It shows ARRAY(0xbaff1ed)
instead of SCALAR(0xdeafbee):
sub foo {
my $x;
format =
@
($x,warn \$x)[0]
.
}
undef &foo;
eval 'sub foo { my @x; write }';
foo
__END__
And if the offset that the format’s pad closes over is beyond the end
of the parent’s new pad, we can even get a crash, as in this case:
eval
'sub foo {' .
'{my ($a,$b,$c,$d,$e,$f,$g,$h,$i,$j,$k,$l,$m,$n,$o,$p,$q,$r,$s,$t,$u)}'x999
. q|
my $x;
format =
@
($x,warn \$x)[0]
.
}
|;
undef &foo;
eval 'sub foo { my @x; my $x = 34; write }';
foo();
__END__
So now, instead of using CvROOT to identify clones of
CvOUTSIDE(format), we use the padlist ID instead. Padlists don’t
actually have an ID, so we give them one. Any time a sub is cloned,
the new padlist gets the same ID as the old. The format needs to
remember what its outer sub’s padlist ID was, so we put that in the
padlist struct, too.
|
|
|
|
|
| |
Much code relies on the fact that PADLIST is typedeffed as AV.
PADLIST should be treated as a distinct type.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This removes most register declarations in C code (and accompanying
documentation) in the Perl core. Retained are those in the ext
directory, Configure, and those that are associated with assembly
language.
See:
http://stackoverflow.com/questions/314994/whats-a-good-example-of-register-variable-usage-in-c
which says, in part:
There is no good example of register usage when using modern compilers
(read: last 10+ years) because it almost never does any good and can do
some bad. When you use register, you are telling the compiler "I know
how to optimize my code better than you do" which is almost never the
case. One of three things can happen when you use register:
The compiler ignores it, this is most likely. In this case the only
harm is that you cannot take the address of the variable in the
code.
The compiler honors your request and as a result the code runs slower.
The compiler honors your request and the code runs faster, this is the least likely scenario.
Even if one compiler produces better code when you use register, there
is no reason to believe another will do the same. If you have some
critical code that the compiler is not optimizing well enough your best
bet is probably to use assembler for that part anyway but of course do
the appropriate profiling to verify the generated code is really a
problem first.
|
|
|
|
|
|
|
|
| |
Commit c127bd3aaa5c5 made XS DB::DB subs work. Before that,
pp_dbstate assumed DB::DB was written it perl. It adjusts CvDEPTH
when calling the XSUB, which serves no purpose. It was presumably
just copied from the pure-Perl-calling code. pp_entersub does-
n’t do this.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we translate path names of required modules into Unix format,
we haven't (recently) been using the optional second argument to
the translation routines,[1] an argument that supplies a buffer for
the translation. That causes them to use a static buffer. Which
means that if two or more different threads are doing a require
operation at the same time, they will be blindly sharing the same
buffer.
So allocate buffers as we need them and make them mortal so they
will go away at the next state transition.
[1] Use of an automatic variable for the buffer was removed way
back in 46fc3d4c69a0ad.
|
|
|
|
|
|
|
|
| |
This commit makes given() alias $_ to the argument, using a slot in
the lexical pad if a lexical $_ is in scope, or $'_ otherwise.
This makes it work very similarly to foreach, and eliminates the
problem of List::Util functions not working inside given().
|
|
|
|
|
|
|
|
|
| |
In pp_require, the error reporting code treats file names ending /\.p?h\z/
specially. The detection code for this, as refactored in 2010 by commit
686c4ca09cf9d6ae, could read one or two bytes before the start of the
filename for filenames less than 3 bytes long. (Note this cannot happen with
module names given to use or require, as appending ".pm" will always make the
filename at least 3 bytes long.)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These functions have been allowing arbitrary expressions, but would
treat anything that did not resolve to a const op as the empty string.
Not only were arguments swallowed up without warning, but constant
folding could change the behaviour. Computed labels are allowed for
goto, and there is no reason to disallow them for these other ops.
This can also come in handy for certain types of code generators.
In the process of modifying pp functions to accept arbitrary labels,
I noticed that the label and loop-popping code was identical in three
functions, so I moved it out into a separate static function, to make
the changes easier.
I also had to reorder newLOOPEX significantly, because code under the
goto branch needed to a apply to last, and vice versa. Using multiple
gotos to switch between the branches created too much of a mess.
I also eliminated the use of SP from pp_last, to avoid copying the
value back and forth between SP and PL_stack_sp.
|