summaryrefslogtreecommitdiff
path: root/dump.c
Commit message (Collapse)AuthorAgeFilesLines
* dump.c: Make less ASCII-centric:Karl Williamson2013-08-291-63/+14
| | | | This has the added advantage of being clearer as to what is going on.
* Stop pos() from being confused by changing utf8nessFather Chrysostomos2013-08-251-0/+3
| | | | | | | | | | | | | | | | | | | | | | | The value of pos() is stored as a byte offset. If it is stored on a tied variable or a reference (or glob), then the stringification could change, resulting in pos() now pointing to a different character off- set or pointing to the middle of a character: $ ./perl -Ilib -le '$x = bless [], chr 256; pos $x=1; bless $x, a; print pos $x' 2 $ ./perl -Ilib -le '$x = bless [], chr 256; pos $x=1; bless $x, "\x{1000}"; print pos $x' Malformed UTF-8 character (unexpected end of string) in match position at -e line 1. 0 So pos() should be stored as a character offset. The regular expression engine expects byte offsets always, so allow it to store bytes when possible (a pure non-magical string) but use char- acters otherwise. This does result in more complexity than I should like, but the alter- native (always storing a character offset) would slow down regular expressions, which is a big no-no.
* Use SSize_t for arraysFather Chrysostomos2013-08-251-1/+1
| | | | | | | | | | Make the array interface 64-bit safe by using SSize_t instead of I32 for array indices. This is based on a patch by Chip Salzenberg. This completes what the previous commit began when it changed av_extend.
* Use SSize_t for tmps stack offsetsFather Chrysostomos2013-08-251-2/+2
| | | | | | | | | | | | | | | This is a partial fix for #119161. On 64-bit platforms, I32 is too small to hold offsets into a stack that can grow larger than I32_MAX. What happens is the offsets can wrap so we end up referencing and modifying elements with negative indices, corrupting memory, and causing crashes. With this commit, ()=1..1000000000000 stops crashing immediately. Instead, it gobbles up all your memory first, and then, if your com- puter still survives, crashes. The second crash happesn bcause of a similar bug with the argument stack, which the next commit will take care of.
* dump.c: Dump contents of regexps’ mother_re fieldFather Chrysostomos2013-08-091-0/+3
| | | | | This can make debugging easier if one needs to see the reference count of the parent regular expression.
* dump.c: White-space onlyKarl Williamson2013-08-011-6/+8
| | | | Indent and wrap lines to correspond with newly formed block
* Extend sv_dump() to dump SVt_INVLISTKarl Williamson2013-08-011-0/+7
| | | | | | | | | This changes the previously unused _invlist_dump() function to be called from sv_dump() to dump inversion list scalars. The format for regular SVt_PVs doesn't give human-friendly output for these. Since these lists are currently not visible outside the Perl core, the format is documented only in comments in the function itself.
* Skip trailing constants when searching padsFather Chrysostomos2013-07-301-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Under ithreads, constants and GVs are stored in the pad. When names are looked up up in a pad, the search begins at the end and works its way toward the beginning, so that an $x declared later masks one declared earlier. If there are many constants at the end of the pad, which can happen for generated code such as lib/unicore/TestProp.pl (which has about 100,000 lines and over 500,000 pad entries for constants at the end of the file scope’s pad), it can take a long time to search through them all. Before commit 325e1816, constants used &PL_sv_undef ‘names’. Since that is the default value for array elements (when viewed directly through AvARRAY, rather than av_fetch), the pad allocation code did not even bother storing the ‘name’ for these. So the name pad (aka padnamelist) was not extended, leaving just 10 entries or so in the case of lib/unicore/TestProp.pl. Commit 325e1816 make pad constants have &PL_sv_no names, so the name pad would be implicitly extended as a result of storing &PL_sv_no, causing a huge slowdown in t/re/uniprops.t (which runs lib/unicore/TestProp.pl) under threaded builds. Now, normally the name pad *does* get extended to match the pad, in pad_tidy, but that is skipped for string eval (and required file scope, of course). Hence, wrapping the contents of lib/unicore/TestProp.pl in a sub or adding ‘my $x’ to end of it will cause the same slowdown before 325e1816. lib/unicore/TestProp.pl just happened to be written (ok, generated) in such a way that it ended up with a small name pad. This commit fixes things to make them as fast as before by recording the index of the last named variable in the pad. Anything following that is disregarded in pad lookup and search begins with the last named variable. (This actually does make things faster before for subs with many trailing constants in the pad.) This is not a complete fix. Adding ‘my $x’ to the end of a large file like lib/unicore/TestProp.pl will make it just as slow again. Ultimately we need another algorithm, such as a binary search.
* more op_folded support: B, dumpReini Urban2013-07-191-0/+1
| | | | also add more B::OP accessors for the missing bitfields
* Reinstate "Create SVt_INVLIST"Karl Williamson2013-07-161-4/+3
| | | | | | | | | | | | | | | | | This reverts commit 49cf1d6641a6dfd301302f616e4f25595dcc65d4, which reverted e045dbedc7da04e20cc8cfccec8a2e3ccc62cc8b, thus reinstating the latter commit. It turns out that the error being chased down was not due to this commit. Its original message was: This reshuffles the svtype enum to remove the dummy slot created in a previous commit, and add the new SVt_INVLIST type in its proper order. It still is unused, but since it is an extension of SVt_PV, it must be greater than that type's enum value. Since it can't be upgraded to any other PV type, it comes right after SVt_PV. Affected tables in the core are updated.
* Revert "Create SVt_INVLIST"Karl Williamson2013-07-041-3/+4
| | | | | | | | This reverts commit e045dbedc7da04e20cc8cfccec8a2e3ccc62cc8b. Blead won't compile with address sanitizer; commit 7cb47964955167736b2923b008cc8023a9b035e8 reverting an earlier commit failed to fix that. I'm trying two more reversions to get it back working. This is one of those
* Create SVt_INVLISTKarl Williamson2013-07-031-4/+3
| | | | | | | | | | This reshuffles the svtype enum to remove the dummy slot created in a previous commit, and add the new SVt_INVLIST type in its proper order. It still is unused, but since it is an extension of SVt_PV, it must be greater than that type's enum value. Since it can't be upgraded to any other PV type, it comes right after SVt_PV. Affected tables in the core are updated.
* -DPERL_TRACE_OPS to produce reports on executed OP countsSteffen Mueller2013-07-021-0/+3
| | | | | | | | | This produces a report on the number of OPs of a given type that were executed at the end of a program run. This can be useful in multiple ways. One, it can help determine hotspots for optimization (yes, I know execution count is not equal execution time). It can also help with determining whether a given change to perl has had the desired effect on deterministic programs.
* dump.c: Dump PV fields of SVt_PVIOsFather Chrysostomos2013-06-221-1/+2
| | | | Yes, these can hold PVs when source filters are involved.
* Stop split from mangling constantsFather Chrysostomos2013-06-221-0/+2
| | | | | | | | | | | | | | | | | | | | | | | At compile time, if split occurs on the right-hand side of an assign- ment to a list of scalars, if the limit argument is a constant con- taining the number 0 then it is modified in place to hold one more than the number of scalars. This means ‘constants’ can change their values, if they happen to be in the wrong place at the wrong time: $ ./perl -Ilib -le 'use constant NULL => 0; ($a,$b,$c) = split //, $foo, NULL; print NULL' 4 I considered checking the reference count on the SV, but since XS code could create its own const ops with weak references to the same cons- tants elsewhere, the safest way to avoid modifying someone else’s SV is to mark the split op in ck_split so we know the SV belongs to that split op alone. Also, to be on the safe side, turn off the read-only flag before modi- fying the SV, instead of relying on the special case for compile time in sv_force_normal.
* Remove BmRARE and BmPREVIOUSFather Chrysostomos2013-06-211-2/+0
| | | | | These were only used by the study code, which was disabled in 5.16.0 and removed shortly thereafter.
* Cache HvFILL() for larger hashes, and update on insertion/deletion.Nicholas Clark2013-05-291-1/+24
| | | | | | This avoids HvFILL() being O(n) for large n on large hashes, but also avoids storing the value of HvFILL() in smaller hashes (ie a memory overhead on every single object built using a hash.)
* Remove core references to SVt_BINDKarl Williamson2013-05-181-4/+4
| | | | | | This scalar type was unused. This is the first step in using the slot freed up for another purpose. The slot is now occupied by a temporary placeholder named SVt_DUMMY.
* Remove PERL_ASYNC_CHECK() from Perl_leave_scope().Nicholas Clark2013-05-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PERL_ASYNC_CHECK() was added to Perl_leave_scope() as part of commit f410a2119920dd04, which moved signal dispatch from the runloop to control flow ops, to mitigate nearly all of the speed cost of safe signals. The assumption was that scope exit was a safe place to dispatch signals. However, this is not true, as parts of the regex engine call leave_scope(), the regex engine stores some state in per-interpreter variables, and code called within signal handlers can change these values. Hence remove the call to PERL_ASYNC_CHECK() from Perl_leave_scope(), and add it explicitly in the various OPs which were relying on their call to leave_scope() to dispatch any pending signals. Also add a PERL_ASYNC_CHECK() to the exit of the runloop, which ensures signals still dispatch from S_sortcv() and S_sortcv_stacked(), as well as addressing one of the concerns in the commit message of f410a2119920dd04: Subtle bugs might remain - there might be constructions that enter the runloop (where signals used to be dispatched) but don't contain any PERL_ASYNC_CHECK() calls themselves. Finally, move the PERL_ASYNC_CHECK(); added by that commit to pp_goto to the end of the function, to be consistent with the positioning of all other PERL_ASYNC_CHECK() calls - at the beginning or end of OP functions, hence just before the return to or just after the call from the runloop, and hence effectively at the same point as the previous location of PERL_ASYNC_CHECK() in the runloop.
* dump.c: avoid compiler warning under -DmadDavid Mitchell2013-05-091-1/+1
| | | | | | | | | this fix was already applied to the non-MAD branch; apply it to the similar MAD / xml-dump code. dump.c:2663:57: warning: comparison of constant 85 with expression of type 'svtype' is always false [-Wtautological-constant-out-of-range-compare] else if (sv == (const SV *)0x55555555 || SvTYPE(sv) == 'U') {
* Make it possible to disable and control hash key traversal randomizationYves Orton2013-05-071-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adds support for PERL_PERTURB_KEYS environment variable, which in turn allows one to control the level of randomization applied to keys() and friends. When PERL_PERTURB_KEYS is 0 we will not randomize key order at all. The chance that keys() changes due to an insert will be the same as in previous perls, basically only when the bucket size is changed. When PERL_PERTURB_KEYS is 1 we will randomize keys in a non repeatedable way. The chance that keys() changes due to an insert will be very high. This is the most secure and default mode. When PERL_PERTURB_KEYS is 2 we will randomize keys in a repeatedable way. Repititive runs of the same program should produce the same output every time. The chance that keys changes due to an insert will be very high. This patch also makes PERL_HASH_SEED imply a non-default PERL_PERTURB_KEYS setting. Setting PERL_HASH_SEED=0 (exactly one 0) implies PERL_PERTURB_KEYS=0 (hash key randomization disabled), settng PERL_HASH_SEED to any other value, implies PERL_PERTURB_KEYS=2 (deterministic/repeatable hash key randomization). Specifying PERL_PERTURB_KEYS explicitly to a different level overrides this behavior. Includes changes to allow one to compile out various aspects of the patch. One can compile such that PERL_PERTURB_KEYS is not respected, or can compile without hash key traversal randomization at all. Note that support for these modes is incomplete, and currently a few tests will fail. Also includes a new subroutine in Hash::Util::hash_traversal_mask() which can be used to ensure a given hash produces a predictable key order (assuming the same hash seed is in effect). This sub acts as a getter and a setter. NOTE - this patch lacks tests, but I lack tuits to get them done quickly, so I am pushing this with the hope that others can add them afterwards.
* prevent SEGV from buffer read overrun, and refactor away duplicated codeYves Orton2013-03-271-14/+11
| | | | | | | The split patch introduced a buffer read overrun error in sv_dump() when stringifying empty strings. This bug was always existant but was probably never triggered because we almost always have at least one extflags set, so it never got an empty buffer to show. Not so with the new compflags. :-(
* rework split() special case interaction with regex engineYves Orton2013-03-271-3/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch resolves several issues at once. The parts are sufficiently interconnected that it is hard to break it down into smaller commits. The tickets open for these issues are: RT #94490 - split and constant folding RT #116086 - split "\x20" doesn't work as documented It additionally corrects some issues with cached regexes that were exposed by the split changes (and applied to them). It effectively reverts 5255171e6cd0accee6f76ea2980e32b3b5b8e171 and cccd1425414e6518c1fc8b7bcaccfb119320c513. Prior to this patch the special RXf_SKIPWHITE behavior of split(" ", $thing) was only available if Perl could resolve the first argument to split at compile time, meaning under various arcane situations. This manifested as oddities like my $delim = $cond ? " " : qr/\s+/; split $delim, $string; and split $cond ? " ", qr/\s+/, $string not behaving the same as: ($cond ? split(" ", $string) : split(/\s+/, $string)) which isn't very convenient. This patch changes this by adding a new flag to the op_pmflags, PMf_SPLIT which enables pp_regcomp() to know whether it was called as part of split, which allows the RXf_SPLIT to be passed into run time regex compilation. We also preserve the original flags so pattern caching works properly, by adding a new property to the regexp structure, "compflags", and related macros for accessing it. We preserve the original flags passed into the compilation process, so we can compare when we are trying to decide if we need to recompile. Note that this essentially the opposite fix from the one applied originally to fix #94490 in 5255171e6cd0accee6f76ea2980e32b3b5b8e171. The reverted patch was meant to make: split( 0 || " ", $thing ) #1 consistent with my $x=0; split( $x || " ", $thing ) #2 and not with split( " ", $thing ) #3 This was reverted because it broke C<split("\x{20}", $thing)>, and because one might argue that is not that #1 does the wrong thing, but rather that the behavior of #2 that is wrong. In other words we might expect that all three should behave the same as #3, and that instead of "fixing" the behavior of #1 to be like #2, we should really fix the behavior of #2 to behave like #3. (Which is what we did.) Also, it doesn't make sense to move the special case detection logic further from the regex engine. We really want the regex engine to decide this stuff itself, otherwise split " ", ... wouldn't work properly with an alternate engine. (Imagine we add a special regexp meta pattern that behaves the same as " " does in a split /.../. For instance we might make split /(*SPLITWHITE)/ trigger the same behavior as split " ". The other major change as result of this patch is it effectively reverts commit cccd1425414e6518c1fc8b7bcaccfb119320c513, which was intended to get rid of RXf_SPLIT and RXf_SKIPWHITE, which and free up bits in the regex flags structure. But we dont want to get rid of these vars, and it turns out that RXf_SEEN_LOOKBEHIND is used only in the same situation as the new RXf_MODIFIES_VARS. So I have renamed RXf_SEEN_LOOKBEHIND to RXf_NO_INPLACE_SUBST, and then instead of using two vars we use only the one. Which in turn allows RXf_SPLIT and RXf_SKIPWHITE to have their bits back.
* improve how Devel::Peek::Dump handles iterator informationYves Orton2013-03-241-2/+10
| | | | | | | | * If the hash is not OOK omit any iterator status information instead of showing -1/NULL * If the hash is OOK then add the RAND value from the iterator and if the LASTRAND is not the same show it too * Tweak tests to test the above.
* Revert "fix Peek.t to work with NEW COW"David Mitchell2013-03-231-39/+28
| | | | | | | This reverts commit 2b656fcc48f28912136698c28b3bd916c42d74f8. I accidentally included the changes I was reviewing from a patch of Reini's
* fix Peek.t to work with NEW COWDavid Mitchell2013-03-231-28/+39
|
* Silence a warning under clang/asanYves Orton2012-12-151-1/+2
| | | | | | | | | | This should silence the following warning: dump.c:459:57: warning: comparison of constant 85 with expression of type 'svtype' is always false [-Wtautological-constant-out-of-range-compare] The warning is a false positive, this code is /meant/ to detect conditions that should not happen.
* Copy keys for aassign in lvalue subFather Chrysostomos2012-12-111-2/+6
| | | | | | | | | Checking LVRET (which pp_aassign does, as of a few commits ago) has no effect if OPpMAYBE_LVSUB is not set on the op. This com- mit changes op.c:op_lvalue_flags to set this flag on aassign ops. This makes sub:lvalue{%h=($x,$x)} behave correctly if the return values of the sub are assigned to ($x is unmodfied).
* Switch dump.c over to using SvREFCNT_dec_NNFather Chrysostomos2012-12-091-14/+14
| | | | | No uses of SvREFCNT_dec in dump.c are ever passed null SVs, so don’t check for that.
* silence some clang warningsDavid Mitchell2012-12-041-1/+0
| | | | | | | | principally, proto.h was sometimes being included twice - once before a fn decl, and once after - giving rise to a 'decl after def' warning. Also, S_croak_memory_wrap was declared static in every source file, but not used in some. So selectively disable the unused-function warning.
* New COW mechanismFather Chrysostomos2012-11-271-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | This was discussed in ticket #114820. This new copy-on-write mechanism stores a reference count for the PV inside the PV itself, at the very end. (I was using SvEND+1 at first, but parts of the regexp engine expect to be able to do SvCUR_set(sv,0), which causes the wrong byte of the string to be used as the reference count.) Only 256 SVs can share the same PV this way. Also, only strings with allocated space after the trailing null can be used for copy-on-write. Much of the code is shared with PERL_OLD_COPY_ON_WRITE. The restric- tion against doing copy-on-write with magical variables has hence been inherited, though it is not necessary. A future commit will take care of that. I had to modify _core_swash_init to handle $@ differently. The exist- ing mechanism of copying $@ to a new scalar and back again was very fragile. With copy-on-write, $@ =~ s/// can cause pp_subst’s string pointers to become stale. So now we remove the scalar from *@ and allow the utf8-table-loading code to autovivify a new one. Then we restore the untouched $@ afterwards if all goes well.
* Hash Function Change - Murmur hash and true per process hash seedYves Orton2012-11-171-10/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch does the following: *) Introduces multiple new hash functions to choose from at build time. This includes Murmur-32, SDBM, DJB2, SipHash, SuperFast, and One-at-a-time. Currently this is handled by muning hv.h. Configure support hopefully to follow. *) Changes the default hash to Murmur hash which is faster than the old default One-at-a-time. *) Rips out the old HvREHASH mechanism and replaces it with a per-process random hash seed. *) Changes the old PL_hash_seed from an interpreter value to a global variable. This means it does not have to be copied during interpreter setup or cloning. *) Changes the format of the PERL_HASH_SEED variable to a hex string so that hash seeds longer than fit in an integer are possible. *) Changes the return of Hash::Util::hash_seed() from a number to a string. This is to accomodate hash functions which have more bits than can be fit in an integer. *) Adds new functions to Hash::Util to improve introspection of hashes -) hash_value() - returns an integer hash value for a given string. -) bucket_info() - returns basic hash bucket utilization info -) bucket_stats() - returns more hash bucket utilization info -) bucket_array() - which keys are in which buckets in a hash More details on the new hash functions can be found below: Murmur Hash: (v3) from google, see http://code.google.com/p/smhasher/wiki/MurmurHash3 Superfast Hash: From Paul Hsieh. http://www.azillionmonkeys.com/qed/hash.html DJB2: a hash function from Daniel Bernstein http://www.cse.yorku.ca/~oz/hash.html SDBM: a hash function sdbm. http://www.cse.yorku.ca/~oz/hash.html SipHash: by Jean-Philippe Aumasson and Daniel J. Bernstein. https://www.131002.net/siphash/ They have all be converted into Perl's ugly macro format. I have not done any rigorous testing to make sure this conversion is correct. They seem to function as expected however. All of them use the random hash seed. You can force the use of a given function by defining one of PERL_HASH_FUNC_MURMUR PERL_HASH_FUNC_SUPERFAST PERL_HASH_FUNC_DJB2 PERL_HASH_FUNC_SDBM PERL_HASH_FUNC_ONE_AT_A_TIME Setting the environment variable PERL_HASH_SEED_DEBUG to 1 will make perl output the current seed (changed to hex) and the hash function it has been built with. Setting the environment variable PERL_HASH_SEED to a hex value will cause that value to be used at the seed. Any missing bits of the seed will be set to 0. The bits are filled in from left to right, not the traditional right to left so setting it to FE results in a seed value of "FE000000" not "000000FE". Note that we do the hash seed initialization in perl_construct(). Doing it via perl_alloc() (via init_tls) causes problems under threaded builds as the buffers used for reentrant srand48 functions are not allocated. See also the p5p mail "Hash improvements blocker: portable random code that doesnt depend on a functional interpreter", Message-ID: <CANgJU+X+wNayjsNOpKRqYHnEy_+B9UH_2irRA5O3ZmcYGAAZFQ@mail.gmail.com>
* dump.c: Fix non-mad threaded build errorFather Chrysostomos2012-11-141-6/+5
| | | | | | | This was introduced by 75a6ad4aa3. C preprocessors treat the ‘aTHX_ foo’ in MACRO(aTHX_ foo) as a sin- gle argument.
* SVf_IsCOWFather Chrysostomos2012-11-141-0/+1
| | | | | | | | | | | | | | | As discussed in ticket #114820, instead of using READONLY+FAKE to mark a copy-on-write string, we should make it a separate flag. There are many modules in CPAN (and 1 in core, Compress::Raw::Zlib) that assume that SvREADONLY means read-only. Only one CPAN module, POSIX::pselect will definitely be broken by this. Others may need to be tweaked. But I believe this is for the better. It causes all tests except ext/Devel-Peek/t/Peek.t (which needs a tiny tweak still) to pass under PERL_OLD_COPY_ON_WRITE, which is a prereq- uisite for any new COW scheme that creates COWs under the same cir- cumstances.
* Combine duplicate dump codeReini Urban2012-11-141-293/+134
| | | | | Combine op_flags and op_private dumper for MAD (xml) and -Dx dump. Add some previously missing flags.
* add padrange opDavid Mitchell2012-11-101-14/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This single op can, in some circumstances, replace the sequence of a pushmark followed by one or more padsv/padav/padhv ops, and possibly a trailing 'list' op, but only where the targs of the pad ops form a continuous range. This is generally more efficient, but is particularly so in the case of void-context my declarations, such as: my ($a,@b); Formerly this would be executed as the following set of ops: pushmark pushes a new mark padsv[$a] pushes $a, does a SAVEt_CLEARSV padav[@b] pushes all the flattened elements (i.e. none) of @a, does a SAVEt_CLEARSV list pops the mark, and pops all stack elements except the last nextstate pops the remaining stack element It's now: padrange[$a..@b] does two SAVEt_CLEARSV's nextstate nothing needing doing to the stack Note that in the case above, this commit changes user-visible behaviour in pathological cases; in particular, it has always been possible to modify a lexical var *before* the my is executed, using goto or closure tricks. So in principle someone could tie an array, then could notice that FETCH is no longer being called, e.g. f(); my ($s, @a); # this no longer triggers two FETCHES sub f { tie @a, ...; push @a, 1,2; } But I think we can live with that. Note also that having a padrange operator will allow us shortly to have a corresponding SAVEt_CLEARPADRANGE save type, that will replace multiple individual SAVEt_CLEARSV's.
* Add C define to remove taint support from perlSteffen Mueller2012-11-051-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By defining NO_TAINT_SUPPORT, all the various checks that perl does for tainting become no-ops. It's not an entirely complete change: it doesn't attempt to remove the taint-related interpreter variables, but instead virtually eliminates access to it. Why, you ask? Because it appears to speed up perl's run-time significantly by avoiding various "are we running under taint" checks and the like. This change is not in a state to go into blead yet. The actual way I implemented it might raise some (valid) objections. Basically, I replaced all uses of the global taint variables (but not PL_taint_warn!) with an extra layer of get/set macros (TAINT_get/TAINTING_get). Furthermore, the change is not complete: - PL_taint_warn would likely deserve the same treatment. - Obviously, tests fail. We have tests for -t/-T - Right now, I added a Perl warn() on startup when -t/-T are detected but the perl was not compiled support it. It might be argued that it should be silently ignored! Needs some thinking. - Code quality concerns - needs review. - Configure support required. - Needs thinking: How does this tie in with CPAN XS modules that use PL_taint and friends? It's easy to backport the new macros via PPPort, but that doesn't magically change all code out there. Might be harmless, though, because whenever you're running under NO_TAINT_SUPPORT, any check of PL_taint/etc is going to come up false. Thus, the only CPAN code that SHOULD be adversely affected is code that changes taint state.
* Allow regexp-to-pvlv assignmentFather Chrysostomos2012-10-301-6/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | Since the xpvlv and regexp structs conflict, we have to find somewhere else to put the regexp struct. I was going to sneak it in SvPVX, allocating a buffer large enough to fit the regexp struct followed by the string, and have SvPVX - sizeof(regexp) point to the struct. But that would make all regexp flag-checking macros fatter, and those are used in hot code. So I came up with another method. Regexp stringification is not speed-critical. So we can move the regexp stringification out of re->sv_u and put it in the regexp struct. Then the regexp struct itself can be pointed to by re->sv_u. So SVt_REGEXPs will have re->sv_any and re->sv_u pointing to the same spot. PVLVs can then have sv->sv_any point to the xpvlv body as usual, but have sv->sv_u point to a regexp struct. All regexp member access can go through sv_u instead of sv_any, which will be no slower than before. Regular expressions will no longer be SvPOK, so we give sv_2pv spec- ial logic for regexps. We don’t need to make the regexp struct larger, as SvLEN is currently always 0 iff mother_re is set. So we can replace the SvLEN field with the pv. SvFAKE is never used without SvPOK or SvSCREAM also set. So we can use that to identify regexps.
* Define RXf_SPLIT and RXf_SKIPWHITE as 0Father Chrysostomos2012-10-111-4/+0
| | | | | | | | They are on longer used in core, and we need room for more flags. The only CPAN modules that use them check whether RXf_SPLIT is set (which no longer happens) before setting RXf_SKIPWHITE (which is ignored).
* dump.c: Dump CvNAME_HEKFather Chrysostomos2012-09-151-1/+4
|
* Don't copy all of the match string bufferDavid Mitchell2012-09-081-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a pattern matches, and that pattern contains captures (or $`, $&, $' or /p are present), a copy is made of the whole original string, so that $1 et al continue to hold the correct value even if the original string is subsequently modified. This can have severe performance penalties; for example, this code causes a 1Mb buffer to be allocated, copied and freed a million times: $&; $x = 'x' x 1_000_000; 1 while $x =~ /(.)/g; This commit changes this so that, where possible, only the needed substring of the original string is copied: in the above case, only a 1-byte buffer is copied each time. Also, it now reuses or reallocs the buffer, rather than freeing and mallocing each time. Now that PL_sawampersand is a 3-bit flag indicating separately whether $`, $& and $' have been seen, they each contribute only their own individual penalty; which ones have been seen will limit the extent to which we can avoid copying the whole buffer. Note that the above code *without* the $& is not currently slow, but only because the copying is artificially disabled to avoid the performance hit. The next but one commit will remove that hack, meaning that it will still be fast, but will now be correct in the presence of a modified original string. We achieve this by by adding suboffset and subcoffset fields to the existing subbeg and sublen fields of a regex, to indicate how many bytes and characters have been skipped from the logical start of the string till the physical start of the buffer. To avoid copying stuff at the end, we just reduce sublen. For example, in this: "abcdefgh" =~ /(c)d/ subbeg points to a malloced buffer containing "c\0"; sublen == 1, and suboffset == 2 (as does subcoffset). while if $& has been seen, subbeg points to a malloced buffer containing "cd\0"; sublen == 2, and suboffset == 2. If in addition $' has been seen, then subbeg points to a malloced buffer containing "cdefgh\0"; sublen == 6, and suboffset == 2. The regex engine won't do this by default; there are two new flag bits, REXEC_COPY_SKIP_PRE and REXEC_COPY_SKIP_POST, which in conjunction with REXEC_COPY_STR, request that the engine skip the start or end of the buffer (it will still copy in the presence of the relevant $`, $&, $', /p). Only pp_match has been enhanced to use these extra flags; substitution can't easily benefit, since the usual action of s///g is to copy the whole string first time round, then perform subsequent matching iterations against the copy, without further copying. So you still need to copy most of the buffer.
* "op-entry" DTrace probeShawn M Moore2012-08-281-0/+2
|
* Correct typo in flag nameFather Chrysostomos2012-08-251-2/+2
|
* Banish boolkeysFather Chrysostomos2012-08-251-2/+4
| | | | | | | | | | | | | | Since 6ea72b3a1, rv2hv and padhv have had the ability to return boo- leans in scalar context, instead of bucket stats, if flagged the right way. sub { %hash || ... } is optimised to take advantage of this. If the || is in unknown context at compile time, the %hash is flagged as being maybe a true boolean. When flagged that way, it returns a bool- ean if block_gimme() returns G_VOID. If rv2hv and padhv can already do this, then we don’t need the boolkeys op any more. We can just flag the rv2hv to return a boolean. In all the cases where boolkeys was used, we know at compile time that it is true boolean context, so we add a new flag for that.
* Optimise %hash in sub { %hash || ... }Father Chrysostomos2012-08-251-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In %hash || $foo, the %hash is in scalar context, so it has to iterate through the buckets to produce statistics on bucket usage. If the || is in void context, the value returned by hash is only ever used as a boolean (as || doesn’t have to return it). We already opti- mise it by adding a boolkeys op when it is known at compile time that || will be in void context. In sub { %hash || $foo } it is not known at compile time that it will be in void context, so it wasn’t optimised. This commit optimises it by flagging the %hash at compile time as being possibly in ‘true boolean’ context. When that flag is set, the rv2hv and padhv ops call block_gimme() to see whether || is in void context. This speeds things up signficantly. Here is what I got after optimis- ing rv2hv but before doing padhv: $ time ./miniperl -e '%hash = 1..10000; sub { %hash || 1 }->() for 1..100000' real 0m0.179s user 0m0.101s sys 0m0.005s $ time ./miniperl -e 'my %hash = 1..10000; sub { %hash || 1 }->() for 1..100000' real 0m5.446s user 0m2.419s sys 0m0.015s (That example is slightly misleading because of the closure, but the closure version takes 1 sec. when optimised.)
* Use FooBAR convention for new pad macrosFather Chrysostomos2012-08-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After a while, I realised that it can be confusing for PAD_ARRAY and PAD_MAX to take a pad argument, but for PAD_SV to take a number and PAD_SET_CUR a padlist. I was copying the HEK_KEY convention, which was probably a bad idea. This is what we use elsewhere: TypeMACRO ----===== AvMAX CopFILE PmopSTASH StashHANDLER OpslabREFCNT_dec Furthermore, heks are not part of the API, so what convention they use is not so important. So these: PADNAMELIST_* PADLIST_* PADNAME_* PAD_* are now: Padnamelist* Padlist* Padname* Pad*
* Stop padlists from being AVsFather Chrysostomos2012-08-211-1/+1
| | | | | | | | | | | | | | | | | | | | | In order to fix a bug, I need to add new fields to padlists. But I cannot easily do that as long as they are AVs. So I have created a new padlist struct. This not only allows me to extend the padlist struct with new members as necessary, but also saves memory, as we now have a three-pointer struct where before we had a whole SV head (3-4 pointers) + XPVAV (5 pointers). This will unfortunately break half of CPAN, but the pad API docs clearly say this: NOTE: this function is experimental and may change or be removed without notice. This would have broken B::Debug, but a patch sent upstream has already been integrated into blead with commit 9d2d23d981.
* Use PADLIST in more placesFather Chrysostomos2012-08-211-1/+1
| | | | | Much code relies on the fact that PADLIST is typedeffed as AV. PADLIST should be treated as a distinct type.
* Add a depth field to formatsFather Chrysostomos2012-08-051-2/+1
| | | | | Instead of lengthening the struct, we can reuse SvCUR, which is cur- rently unused.
* Disallow setting SvPV on formatsFather Chrysostomos2012-08-051-1/+1
| | | | | | | Setting a the PV on a format is meaningless, as of the previ- ous commit. This frees up SvCUR for other uses.