| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch sets the utf8 flag on $! if the error string passes utf8
validity tests and has some bytes with the upper bit set. (If none
have that bit set, is an ASCII string, and whether or not it is UTF-8 is
irrelevant.) This is a heuristic that could fail, but as the reference
in the comments points out this is unlikely.
One can reasonably assume that a UTF-8 locale will return a UTF-8
result. So another approach would be to look at that (but we wouldn't
want to turn the flag on for a purely ASCII string anyway, as that could
change the semantics from existing behavior by making the string follow
Unicode rules, whereas it didn't necessarily before.) To do this, we
could keep track of the utf8ness of the LC_MESSAGES locale. But until
the heuristic in this patch is shown to not be good enough, I don't see
the need to do this extra work.
|
|
|
|
|
|
| |
Perl_magic_methcall is not public API, so there is no
need to add another function and we can just change
function's arguments.
|
|
|
|
|
|
|
|
|
| |
Can be used when it's known that method name has no
package part - just method name.
With flag set SV with precomputed hash value is used
and pp_method_named is called instead of pp_method.
Method lookup is faster.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The code dealt rather inconsistently with uids and gids. Some
places assumed that they could be safely stored in UVs, others
in IVs, others in ints; All of them should've been using the
macros from config.h instead. Similarly, code that created
SVs or pushed values into the stack was also making incorrect
assumptions -- As a point of reference, only pp_stat did the
right thing:
#if Uid_t_size > IVSIZE
mPUSHn(PL_statcache.st_uid);
#else
# if Uid_t_sign <= 0
mPUSHi(PL_statcache.st_uid);
# else
mPUSHu(PL_statcache.st_uid);
# endif
#endif
The other places were potential bugs, and some were even causing
warnings in some unusual OSs, like haiku or qnx.
This commit ammends the situation by introducing four new macros,
SvUID(), sv_setuid(), SvGID(), and sv_setgid(), and using them
where needed.
|
|
|
|
|
| |
Using SvREFCNT_dec_NN in a couple of places eliminates needless
null checks.
|
|
|
|
|
| |
This causes each deprecated function to have a prominent note to that
effect in its API documentation.
|
|
|
|
| |
I found re-formatting this multi-line 'if' to be easier to understand
|
|
|
|
|
| |
The are lots of places where local vars aren't used when compiled
with NO_TAINT_SUPPORT.
|
|
|
|
|
| |
pod/perldelta.pod: document reversion of ENV{foo} = undef behaviour in delta
t/op/magic.t: add a test for ENV{foo} = undef
|
|
|
|
|
|
|
|
|
|
|
| |
SvGROW unconditionally derefs SvANY to check SvLEN. A crash occurs if the
sv is SVt_NULL. Also mg_get uses SvMAGIC which also has the same problem.
Rather than having people finding these properties out by trial and error,
document them. There is no sense in adding type checks since these 2 calls
have been had sv type restrictions since probably day 1 and for
performance reason. Anyone who hit the restrictions would have either
fixed their code immediately, or abandoned using XS. I observed someone
abandoning XS in the field over these undocumented restrictions.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove a large amount of machine code (~4KB for me) from funcs that use
ERRSV making Perl faster and smaller by preventing multiple evaluation.
ERRSV is a macro that contains GvSVn which eventually conditionally calls
Perl_gv_add_by_type. If a SvTRUE or any other multiple evaluation macro
is used on ERRSV, the expansion will, in asm have dozens of calls to
Perl_gv_add_by_type one for each test/deref of the SV in SvTRUE. A less
severe problem exists when multiple funcs (sv_set*) in a row call, each
with ERRSV as an arg. Its recalculated then, Perl_gv_add_by_type and all.
I think ERRSV macro got the func call in commit f5fa9033b8, Perl RT #70862.
Prior to that commit it would be pure derefs I think. Saving the SV* is
still better than looking into interp->gv->gp to get the SV * after each
func call.
I received no responses to
http://www.nntp.perl.org/group/perl.perl5.porters/2012/11/msg195724.html
explaining when the SV is replaced in PL_errgv, so took a conservative
view and assumed callbacks (with Perl stack/ENTER/LEAVE/eval_*/call_*)
can change it. I also assume ERRSV will never return null, this allows a
more efficiently version of SvTRUE to be used.
In Perl_newATTRSUB_flags a wasteful copy to C stack operation with the
string was removed, and a croak_notcontext to remove push instructions to
the stack. I was not sure about the interaction between ERRSV and message
sv, I didn't change it to a more efficient (instruction wise, speed, idk)
format string combining of the not safe string and ERRSV in the croak call.
If such an optimization is done, a compiler potentially will put the not
safe string on the first, unconditionally, then check PL_in_eval, and
then jump to the croak call site, or eval ERRSV, push the SV on the C stack
then push the format string "%"SVf"%s". The C stack allocated const char
array came from commit e1ec3a884f .
In Perl_eval_pv, croak_on_error was checked first to not eval ERRSV unless
necessery. I was not sure about the side effects of using a more efficient
croak_sv instead of Perl_croak (null chars, utf8, etc) so I left a comment.
nocontext used to save an push instruction on implicit sys perl.
In S_doeval, don't open a new block to avoid large whitespace changes.
The NULL assignment should optimize away unless accidental usage of errsv
in the future happens through a code change. There might be a bug here from
commit ecad31f018 since previous a char * was derefed to check for null
char, but ERRSV will never be null, so "Unknown error\n" branch will never
be taken.
For pp_sys.c, in pp_die a new block was opened to not eval ERRSV if
"well-formed exception supplied". The else if else if else blocks all used
ERRSV, so a "SV * errsv = NULL;" and a eval in the conditional with comma
op thing wouldn't work (maybe it would, see toke.c comments later in this
message). pp_warn, I have no comments.
In S_compile_runtime_code, a croak_sv question comes up same as in
Perl_eval_pv.
In S_new_constant, a eval in the conditional is done to avoid evaling
ERRSV if PL_in_eval short circuits. Same thing in Perl_yyerror_pvn.
Perl__core_swash_init I have no comments.
In the future, a SvEMPTYSTRING macro should be considered (not fully
thought out by me) to replace the SvTRUEs with something smaller and
faster when dealing with ERRSV. _nomg is another thing to think about.
In S_init_main_stash there is an opportunity to prevent an extra ERRSV
between "sv_grow(ERRSV, 240);" and "CLEAR_ERRSV();" that was too complicated
for me to optimize.
before perl517.dll
.text 0xc2f77
.rdata 0x212dc
.data 0x3948
after perl517.dll
.text 0xc20d7
.rdata 0x212dc
.data 0x3948
Numbers are from VC 2003 x86 32 bit.
|
|
|
|
|
|
|
|
|
|
|
| |
Remove the context/pTHX from Perl_croak_no_modify and Perl_croak_xs_usage.
For croak_no_modify, it now has no parameters (and always has been
no return), and on some compilers will now be optimized to a conditional
jump. For Perl_croak_xs_usage one push asm opcode is removed at the caller.
For both funcs, their footprint in their callers (which probably are hot
code) is smaller, which means a tiny bit more room in the cache. My text
section went from 0xC1A2F to 0xC198F after apply this. Also see
http://www.nntp.perl.org/group/perl.perl5.porters/2012/11/msg195233.html .
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
By defining NO_TAINT_SUPPORT, all the various checks that perl does for
tainting become no-ops. It's not an entirely complete change: it doesn't
attempt to remove the taint-related interpreter variables, but instead
virtually eliminates access to it.
Why, you ask? Because it appears to speed up perl's run-time
significantly by avoiding various "are we running under taint" checks
and the like.
This change is not in a state to go into blead yet. The actual way I
implemented it might raise some (valid) objections. Basically, I
replaced all uses of the global taint variables (but not PL_taint_warn!)
with an extra layer of get/set macros (TAINT_get/TAINTING_get).
Furthermore, the change is not complete:
- PL_taint_warn would likely deserve the same treatment.
- Obviously, tests fail. We have tests for -t/-T
- Right now, I added a Perl warn() on startup when -t/-T are detected
but the perl was not compiled support it. It might be argued that it
should be silently ignored! Needs some thinking.
- Code quality concerns - needs review.
- Configure support required.
- Needs thinking: How does this tie in with CPAN XS modules that use
PL_taint and friends? It's easy to backport the new macros via PPPort,
but that doesn't magically change all code out there. Might be
harmless, though, because whenever you're running under
NO_TAINT_SUPPORT, any check of PL_taint/etc is going to come up false.
Thus, the only CPAN code that SHOULD be adversely affected is code
that changes taint state.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was added to make SvREADONLY_off safe. (I think read-only is
turned off during magic so the magic scalar itself can be set without
the sv_set* functions getting upset.) Since SvREADONLY doesn’t mean
read-only for COWs, we don’t actually need to do sv_force_normal, but
can simply skip SvREADONLY_off for COWs.
By leaving it to sv_set* functions to do sv_force_normal, we avoid
having to copy the string buffer if it is just going to be thrown away
anyway. S_save_magic can’t know whether the scalar will actually be
overwritten, so it has to copy the buffer.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Assigning to a substr lvalue scalar was invoking overload too
many times if the target was a UTF8 string and the assigned sub-
string was not.
Since sv_insert_flags itself stringifies the scalar, the easiest
way to fix this is to force the target to a PV before doing any-
thing to it.
|
|
|
|
|
| |
This was added in commit 9bf12eaf4 to fix a crash, but it is not
necessary any more, due to changes elsewhere.
|
|
|
|
|
|
|
|
|
| |
It is not possible to know how to interpret the returned length
without accessing the UTF8 flag, which is not reliable until
the SV has been stringified, which requires get-magic. So length
magic has not made senses since utf8 support was added. I have
removed all uses of length magic from the core, so this is now
dead code.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
By checking it before, it can end up treating a UTF8 string as bytes
when calculating offsets if the UTF8 flag is not turned on until
the target is stringified. This can happen with overloading and
typeglobs.
This is a regression from 5.14. 5.14 itself was buggy, too, but one
would have to modify the target after creating the substr lvalue but
before assigning to it; and that because of another bug fixed by
83f78d1a27, which was cancelling out this one.
package o {
use overload '""' => sub { $_[0][0] }
}
my $refee = bless ["\x{100}a"], o::;
my $substr = \substr $refee, -2;
$$substr = "b";
warn $refee;
That prints:
Wide character in warn at - line 7.
Āb at - line 7.
In 5.14 it prints:
b at - line 7.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This touches on issues raised in tickets #114410 and #114690.
If the UTF8ness of an overloaded string changes with each call, it
will make magic_setpos panic if it tries to stringify the SV multiple
times. We have to avoid any sv-specific utf8 length functions when
it comes to overloading. And we should do the same thing for gmagic,
too, to avoid creating caches that will shortly be invalidated.
The test class is very closely based on code written by Nicholas Clark
in a response to #114410.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
mg_length returns the number of bytes if a scalar has length magic,
but the number of characters otherwise.
sv_len_utf8 used to assume that mg_length would return bytes. The
first mistake was added in commit b76347f2eb, which assumed that
mg_length would return characters. But it was #ifdeffed out until
commit ffc61ed20e.
Later, commit 5636d518683 met sv_len_utf8’s assumptions by making
mg_length return the length in characters, without accounting for
sv_len, which also used mg_length.
So we ended up with a buggy sv_len that would return a character
count for scalars with get- but not length-magic, and a byte count
otherwise.
In the previous commit, I fixed sv_len not to use mg_length at all. I
plan shortly to remove any use of mg_length (the one remaining use is
in sv_len_utf8, which is currently not called on magical values).
The reason for removing all calls to mg_length is that the returned
length cannot be converted to characters without access to the PV as
well, which requires get-magic. So length magic on scalars makes no
sense since the advent of utf8.
This commit restore mg_length to its old behaviour and lists it as
deprecated. This is mostly cosmetic, as there are no CPAN users. But
it is in the API, and I don’t know whether we can easily remove it.
|
| |
|
|
|
|
|
|
|
| |
This was brought up in ticket #96672.
This variable gives access to the last-read filehandle that Perl uses
when it appends ", <STDIN> line 1" to a warning or error message.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a pattern matches, and that pattern contains captures (or $`, $&, $'
or /p are present), a copy is made of the whole original string, so
that $1 et al continue to hold the correct value even if the original
string is subsequently modified. This can have severe performance
penalties; for example, this code causes a 1Mb buffer to be allocated,
copied and freed a million times:
$&;
$x = 'x' x 1_000_000;
1 while $x =~ /(.)/g;
This commit changes this so that, where possible, only the needed
substring of the original string is copied: in the above case, only a
1-byte buffer is copied each time. Also, it now reuses or reallocs the
buffer, rather than freeing and mallocing each time.
Now that PL_sawampersand is a 3-bit flag indicating separately whether
$`, $& and $' have been seen, they each contribute only their own
individual penalty; which ones have been seen will limit the extent to
which we can avoid copying the whole buffer.
Note that the above code *without* the $& is not currently slow, but only
because the copying is artificially disabled to avoid the performance hit.
The next but one commit will remove that hack, meaning that it will still
be fast, but will now be correct in the presence of a modified original
string.
We achieve this by by adding suboffset and subcoffset fields to the
existing subbeg and sublen fields of a regex, to indicate how many bytes
and characters have been skipped from the logical start of the string till
the physical start of the buffer. To avoid copying stuff at the end, we
just reduce sublen. For example, in this:
"abcdefgh" =~ /(c)d/
subbeg points to a malloced buffer containing "c\0"; sublen == 1,
and suboffset == 2 (as does subcoffset).
while if $& has been seen,
subbeg points to a malloced buffer containing "cd\0"; sublen == 2,
and suboffset == 2.
If in addition $' has been seen, then
subbeg points to a malloced buffer containing "cdefgh\0"; sublen == 6,
and suboffset == 2.
The regex engine won't do this by default; there are two new flag bits,
REXEC_COPY_SKIP_PRE and REXEC_COPY_SKIP_POST, which in conjunction with
REXEC_COPY_STR, request that the engine skip the start or end of the
buffer (it will still copy in the presence of the relevant $`, $&, $',
/p).
Only pp_match has been enhanced to use these extra flags; substitution
can't easily benefit, since the usual action of s///g is to copy the
whole string first time round, then perform subsequent matching iterations
against the copy, without further copying. So you still need to copy most
of the buffer.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently the handling of getting the value, length etc of ${^PREMATCH}
etc is identical to that of $` etc.
Handle them separately, by adding RX_BUFF_IDX_CARET_PREMATCH etc
constants to the existing RX_BUFF_IDX_PREMATCH set.
This allows, when retrieving them, to always return undef if the current
match didn't use //p. Previously the result depended on stuff such
as whether the (non-//p) pattern included captures or not.
The documentation for ${^PREMATCH} etc states that it's only guaranteed to
return a defined value when the last pattern was //p.
As well as making things more consistent, this is a necessary
prerequisite for the following commit, which may not always copy the
whole string during a non-//p match.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The debugger implements breakpoints by setting/clearing OPf_SPECIAL on
OP_DBSTATE ops. This means that it is writing to the optree at runtime,
and it falls foul of the enforced read-only OP slabs when debugging with
-DPERL_DEBUG_READONLY_OPS
Avoid this by removing static from Slab_to_rw(), and using it and Slab_to_ro()
in Perl_magic_setdbline() to temporarily make the slab re-write whilst
changing the breakpoint flag.
With this all tests pass with -DPERL_DEBUG_READONLY_OPS (on this system)
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
| |
If a scalar is gmagical, then the string buffer could change without
the utf8 pos cache being updated.
So it should respond to get-magic, not just set-magic. Actually add-
ing get-magic to the utf8 magic vtable would cause all scalars with
this magic to be flagged gmagical. Instead, in magic_get, we can call
magic_setutf8.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This removes most register declarations in C code (and accompanying
documentation) in the Perl core. Retained are those in the ext
directory, Configure, and those that are associated with assembly
language.
See:
http://stackoverflow.com/questions/314994/whats-a-good-example-of-register-variable-usage-in-c
which says, in part:
There is no good example of register usage when using modern compilers
(read: last 10+ years) because it almost never does any good and can do
some bad. When you use register, you are telling the compiler "I know
how to optimize my code better than you do" which is almost never the
case. One of three things can happen when you use register:
The compiler ignores it, this is most likely. In this case the only
harm is that you cannot take the address of the variable in the
code.
The compiler honors your request and as a result the code runs slower.
The compiler honors your request and the code runs faster, this is the least likely scenario.
Even if one compiler produces better code when you use register, there
is no reason to believe another will do the same. If you have some
critical code that the compiler is not optimizing well enough your best
bet is probably to use assembler for that part anyway but of course do
the appropriate profiling to verify the generated code is really a
problem first.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These are only used for storing a string and an IV.
Making them into full-blown SVt_PVFMs is overkill.
FmLINES was only being used on these three scalars. So make it use
the SvIVX field. struct xpvfm no longer needs an xfm_lines member,
because SVt_PVFMs no longer use it.
This also causes a TODO test in taint.t to start passing, but I do
not fully understand why. But at least that’s progress. :-)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A substitution forces its target to a string upon successful substitu-
tion, even if the substitution did nothing:
$ ./perl -Ilib -le '$a = *f; $a =~ s/f/f/; print ref \$a'
SCALAR
Notice that $a is no longer a glob after s///.
But vstrings are different:
$ ./perl -Ilib -le '$a = v102; $a =~ s/f/f/; print ref \$a'
VSTRING
I fixed this in 5.16 (1e6bda93) for those cases where the vstring ends
up with a value that doesn’t correspond to the actual string:
$ ./perl -Ilib -le '$a = v102; $a =~ s/f/o/; print ref \$a'
SCALAR
It works through vstring set-magic, that does the check and removes
the magic if it doesn’t match.
I did it that way because I couldn’t think of any other way to fix
bug #29070, and I didn’t realise at the time that I hadn’t fixed
all the bugs.
By making SvTHINKFIRST true on a vstring, we force it through
sv_force_normal before any in-place string operations. We can also
make sv_force_normal handle vstrings as well. This fixes all the lin-
gering-vstring-magic bugs in just two lines, making the vstring set-
magic (which is also slow) redundant. It also allows the special case
in sv_setsv_flags to be removed.
Or at least that was what I had hoped.
It turns out that pp_subst, twists and turns in tortuous ways, and
needs special treatment for things like this.
And do_trans functions wasn’t checking SvTHINKFIRST when arguably it
should have.
I tweaked sv_2pv{utf8,byte} to avoid copying magic variables that do
not need copying.
|
|
|
|
|
| |
(turning off other OK flags), make them byte strings; if wide characters can't
be downgraded to bytes, leave the string utf8 and issue a warning.
|
|
|
|
| |
These were missed in commit 1203306491d341ed, which renamed ptr to key.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In restore_magic(), which is called after any magic processing, all of
the public OK flags have been shifted into the private OK flags. Thus
the lack of an appropriate public OK flags was used to trigger both get
magic and required conversions. This scheme did not cover ROK, however,
so all properly written code had to make sure mg_get was called the right
number of times anyway. Meanwhile the private OK flags gained a second
purpose of marking converted but non-authoritative values (e.g. the IV
conversion of an NV), and the inadequate flag shift mechanic broke this
in some cases.
This patch removes the shift mechanic for magic flags, thus exposing (and
fixing) some improper usage of magic SVs in which mg_get() was not called
correctly. It also has the side effect of making magic get functions
specifically set their SVs to undef if that is desired, as the new behavior
of empty get functions is to leave the value unchanged. This is a feature,
as now get magic that does not modify its value, e.g. tainting, does not
have to be special cased.
The changes to cpan/ here are only temporary, for development only, to
keep blead working until upstream applies them (or something like them).
Thanks to Rik and Father C for review input.
|
|
|
|
| |
that was introduced by some guy named Chip in 1997 (e3c19b7bc9)
|
| |
|
|
|
|
|
| |
since signals can arrive at any point, clearing $@ isn't a safe
thing to do
|
| |
|
|
|
|
| |
This fixes RT #75596.
|
|
|
|
| |
I’m running out of synonyms for ‘remove’.
|
|
|
|
|
| |
This updates the editor hints in our files for Emacs and vim to request
that tabs be inserted as spaces.
|
|
|
|
|
| |
C++ requires #include directives to be at file scope, but we've
been lazy and haven't been doing that.
|
|
|
|
| |
Now that ‘A’ magic is gone, nothing is using this function.
|
|
|
|
|
|
|
|
|
|
|
|
| |
$^N is a magical variable, like $1 and $2, with the usual ‘sv’
magic. So it is handled by Perl_magic_get and Perl_magic_set. But
Perl_magic_set didn’t have a case for it, so it simply ignored it and
did nothing, like a tied variable with an empty STORE method.
Now assigning to $^N has the same affect as assigned to the numbered
variable to which it corresponds. If there is no corresponding cap-
ture from the last match, or in the absence of regexp plugins, it
croaks with ‘Modification of a read-only value’.
|