summaryrefslogtreecommitdiff
path: root/dump.c
Commit message (Collapse)AuthorAgeFilesLines
* Cache HvFILL() for larger hashes, and update on insertion/deletion.Nicholas Clark2013-05-291-1/+24
| | | | | | This avoids HvFILL() being O(n) for large n on large hashes, but also avoids storing the value of HvFILL() in smaller hashes (ie a memory overhead on every single object built using a hash.)
* Remove core references to SVt_BINDKarl Williamson2013-05-181-4/+4
| | | | | | This scalar type was unused. This is the first step in using the slot freed up for another purpose. The slot is now occupied by a temporary placeholder named SVt_DUMMY.
* Remove PERL_ASYNC_CHECK() from Perl_leave_scope().Nicholas Clark2013-05-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PERL_ASYNC_CHECK() was added to Perl_leave_scope() as part of commit f410a2119920dd04, which moved signal dispatch from the runloop to control flow ops, to mitigate nearly all of the speed cost of safe signals. The assumption was that scope exit was a safe place to dispatch signals. However, this is not true, as parts of the regex engine call leave_scope(), the regex engine stores some state in per-interpreter variables, and code called within signal handlers can change these values. Hence remove the call to PERL_ASYNC_CHECK() from Perl_leave_scope(), and add it explicitly in the various OPs which were relying on their call to leave_scope() to dispatch any pending signals. Also add a PERL_ASYNC_CHECK() to the exit of the runloop, which ensures signals still dispatch from S_sortcv() and S_sortcv_stacked(), as well as addressing one of the concerns in the commit message of f410a2119920dd04: Subtle bugs might remain - there might be constructions that enter the runloop (where signals used to be dispatched) but don't contain any PERL_ASYNC_CHECK() calls themselves. Finally, move the PERL_ASYNC_CHECK(); added by that commit to pp_goto to the end of the function, to be consistent with the positioning of all other PERL_ASYNC_CHECK() calls - at the beginning or end of OP functions, hence just before the return to or just after the call from the runloop, and hence effectively at the same point as the previous location of PERL_ASYNC_CHECK() in the runloop.
* dump.c: avoid compiler warning under -DmadDavid Mitchell2013-05-091-1/+1
| | | | | | | | | this fix was already applied to the non-MAD branch; apply it to the similar MAD / xml-dump code. dump.c:2663:57: warning: comparison of constant 85 with expression of type 'svtype' is always false [-Wtautological-constant-out-of-range-compare] else if (sv == (const SV *)0x55555555 || SvTYPE(sv) == 'U') {
* Make it possible to disable and control hash key traversal randomizationYves Orton2013-05-071-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adds support for PERL_PERTURB_KEYS environment variable, which in turn allows one to control the level of randomization applied to keys() and friends. When PERL_PERTURB_KEYS is 0 we will not randomize key order at all. The chance that keys() changes due to an insert will be the same as in previous perls, basically only when the bucket size is changed. When PERL_PERTURB_KEYS is 1 we will randomize keys in a non repeatedable way. The chance that keys() changes due to an insert will be very high. This is the most secure and default mode. When PERL_PERTURB_KEYS is 2 we will randomize keys in a repeatedable way. Repititive runs of the same program should produce the same output every time. The chance that keys changes due to an insert will be very high. This patch also makes PERL_HASH_SEED imply a non-default PERL_PERTURB_KEYS setting. Setting PERL_HASH_SEED=0 (exactly one 0) implies PERL_PERTURB_KEYS=0 (hash key randomization disabled), settng PERL_HASH_SEED to any other value, implies PERL_PERTURB_KEYS=2 (deterministic/repeatable hash key randomization). Specifying PERL_PERTURB_KEYS explicitly to a different level overrides this behavior. Includes changes to allow one to compile out various aspects of the patch. One can compile such that PERL_PERTURB_KEYS is not respected, or can compile without hash key traversal randomization at all. Note that support for these modes is incomplete, and currently a few tests will fail. Also includes a new subroutine in Hash::Util::hash_traversal_mask() which can be used to ensure a given hash produces a predictable key order (assuming the same hash seed is in effect). This sub acts as a getter and a setter. NOTE - this patch lacks tests, but I lack tuits to get them done quickly, so I am pushing this with the hope that others can add them afterwards.
* prevent SEGV from buffer read overrun, and refactor away duplicated codeYves Orton2013-03-271-14/+11
| | | | | | | The split patch introduced a buffer read overrun error in sv_dump() when stringifying empty strings. This bug was always existant but was probably never triggered because we almost always have at least one extflags set, so it never got an empty buffer to show. Not so with the new compflags. :-(
* rework split() special case interaction with regex engineYves Orton2013-03-271-3/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch resolves several issues at once. The parts are sufficiently interconnected that it is hard to break it down into smaller commits. The tickets open for these issues are: RT #94490 - split and constant folding RT #116086 - split "\x20" doesn't work as documented It additionally corrects some issues with cached regexes that were exposed by the split changes (and applied to them). It effectively reverts 5255171e6cd0accee6f76ea2980e32b3b5b8e171 and cccd1425414e6518c1fc8b7bcaccfb119320c513. Prior to this patch the special RXf_SKIPWHITE behavior of split(" ", $thing) was only available if Perl could resolve the first argument to split at compile time, meaning under various arcane situations. This manifested as oddities like my $delim = $cond ? " " : qr/\s+/; split $delim, $string; and split $cond ? " ", qr/\s+/, $string not behaving the same as: ($cond ? split(" ", $string) : split(/\s+/, $string)) which isn't very convenient. This patch changes this by adding a new flag to the op_pmflags, PMf_SPLIT which enables pp_regcomp() to know whether it was called as part of split, which allows the RXf_SPLIT to be passed into run time regex compilation. We also preserve the original flags so pattern caching works properly, by adding a new property to the regexp structure, "compflags", and related macros for accessing it. We preserve the original flags passed into the compilation process, so we can compare when we are trying to decide if we need to recompile. Note that this essentially the opposite fix from the one applied originally to fix #94490 in 5255171e6cd0accee6f76ea2980e32b3b5b8e171. The reverted patch was meant to make: split( 0 || " ", $thing ) #1 consistent with my $x=0; split( $x || " ", $thing ) #2 and not with split( " ", $thing ) #3 This was reverted because it broke C<split("\x{20}", $thing)>, and because one might argue that is not that #1 does the wrong thing, but rather that the behavior of #2 that is wrong. In other words we might expect that all three should behave the same as #3, and that instead of "fixing" the behavior of #1 to be like #2, we should really fix the behavior of #2 to behave like #3. (Which is what we did.) Also, it doesn't make sense to move the special case detection logic further from the regex engine. We really want the regex engine to decide this stuff itself, otherwise split " ", ... wouldn't work properly with an alternate engine. (Imagine we add a special regexp meta pattern that behaves the same as " " does in a split /.../. For instance we might make split /(*SPLITWHITE)/ trigger the same behavior as split " ". The other major change as result of this patch is it effectively reverts commit cccd1425414e6518c1fc8b7bcaccfb119320c513, which was intended to get rid of RXf_SPLIT and RXf_SKIPWHITE, which and free up bits in the regex flags structure. But we dont want to get rid of these vars, and it turns out that RXf_SEEN_LOOKBEHIND is used only in the same situation as the new RXf_MODIFIES_VARS. So I have renamed RXf_SEEN_LOOKBEHIND to RXf_NO_INPLACE_SUBST, and then instead of using two vars we use only the one. Which in turn allows RXf_SPLIT and RXf_SKIPWHITE to have their bits back.
* improve how Devel::Peek::Dump handles iterator informationYves Orton2013-03-241-2/+10
| | | | | | | | * If the hash is not OOK omit any iterator status information instead of showing -1/NULL * If the hash is OOK then add the RAND value from the iterator and if the LASTRAND is not the same show it too * Tweak tests to test the above.
* Revert "fix Peek.t to work with NEW COW"David Mitchell2013-03-231-39/+28
| | | | | | | This reverts commit 2b656fcc48f28912136698c28b3bd916c42d74f8. I accidentally included the changes I was reviewing from a patch of Reini's
* fix Peek.t to work with NEW COWDavid Mitchell2013-03-231-28/+39
|
* Silence a warning under clang/asanYves Orton2012-12-151-1/+2
| | | | | | | | | | This should silence the following warning: dump.c:459:57: warning: comparison of constant 85 with expression of type 'svtype' is always false [-Wtautological-constant-out-of-range-compare] The warning is a false positive, this code is /meant/ to detect conditions that should not happen.
* Copy keys for aassign in lvalue subFather Chrysostomos2012-12-111-2/+6
| | | | | | | | | Checking LVRET (which pp_aassign does, as of a few commits ago) has no effect if OPpMAYBE_LVSUB is not set on the op. This com- mit changes op.c:op_lvalue_flags to set this flag on aassign ops. This makes sub:lvalue{%h=($x,$x)} behave correctly if the return values of the sub are assigned to ($x is unmodfied).
* Switch dump.c over to using SvREFCNT_dec_NNFather Chrysostomos2012-12-091-14/+14
| | | | | No uses of SvREFCNT_dec in dump.c are ever passed null SVs, so don’t check for that.
* silence some clang warningsDavid Mitchell2012-12-041-1/+0
| | | | | | | | principally, proto.h was sometimes being included twice - once before a fn decl, and once after - giving rise to a 'decl after def' warning. Also, S_croak_memory_wrap was declared static in every source file, but not used in some. So selectively disable the unused-function warning.
* New COW mechanismFather Chrysostomos2012-11-271-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | This was discussed in ticket #114820. This new copy-on-write mechanism stores a reference count for the PV inside the PV itself, at the very end. (I was using SvEND+1 at first, but parts of the regexp engine expect to be able to do SvCUR_set(sv,0), which causes the wrong byte of the string to be used as the reference count.) Only 256 SVs can share the same PV this way. Also, only strings with allocated space after the trailing null can be used for copy-on-write. Much of the code is shared with PERL_OLD_COPY_ON_WRITE. The restric- tion against doing copy-on-write with magical variables has hence been inherited, though it is not necessary. A future commit will take care of that. I had to modify _core_swash_init to handle $@ differently. The exist- ing mechanism of copying $@ to a new scalar and back again was very fragile. With copy-on-write, $@ =~ s/// can cause pp_subst’s string pointers to become stale. So now we remove the scalar from *@ and allow the utf8-table-loading code to autovivify a new one. Then we restore the untouched $@ afterwards if all goes well.
* Hash Function Change - Murmur hash and true per process hash seedYves Orton2012-11-171-10/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch does the following: *) Introduces multiple new hash functions to choose from at build time. This includes Murmur-32, SDBM, DJB2, SipHash, SuperFast, and One-at-a-time. Currently this is handled by muning hv.h. Configure support hopefully to follow. *) Changes the default hash to Murmur hash which is faster than the old default One-at-a-time. *) Rips out the old HvREHASH mechanism and replaces it with a per-process random hash seed. *) Changes the old PL_hash_seed from an interpreter value to a global variable. This means it does not have to be copied during interpreter setup or cloning. *) Changes the format of the PERL_HASH_SEED variable to a hex string so that hash seeds longer than fit in an integer are possible. *) Changes the return of Hash::Util::hash_seed() from a number to a string. This is to accomodate hash functions which have more bits than can be fit in an integer. *) Adds new functions to Hash::Util to improve introspection of hashes -) hash_value() - returns an integer hash value for a given string. -) bucket_info() - returns basic hash bucket utilization info -) bucket_stats() - returns more hash bucket utilization info -) bucket_array() - which keys are in which buckets in a hash More details on the new hash functions can be found below: Murmur Hash: (v3) from google, see http://code.google.com/p/smhasher/wiki/MurmurHash3 Superfast Hash: From Paul Hsieh. http://www.azillionmonkeys.com/qed/hash.html DJB2: a hash function from Daniel Bernstein http://www.cse.yorku.ca/~oz/hash.html SDBM: a hash function sdbm. http://www.cse.yorku.ca/~oz/hash.html SipHash: by Jean-Philippe Aumasson and Daniel J. Bernstein. https://www.131002.net/siphash/ They have all be converted into Perl's ugly macro format. I have not done any rigorous testing to make sure this conversion is correct. They seem to function as expected however. All of them use the random hash seed. You can force the use of a given function by defining one of PERL_HASH_FUNC_MURMUR PERL_HASH_FUNC_SUPERFAST PERL_HASH_FUNC_DJB2 PERL_HASH_FUNC_SDBM PERL_HASH_FUNC_ONE_AT_A_TIME Setting the environment variable PERL_HASH_SEED_DEBUG to 1 will make perl output the current seed (changed to hex) and the hash function it has been built with. Setting the environment variable PERL_HASH_SEED to a hex value will cause that value to be used at the seed. Any missing bits of the seed will be set to 0. The bits are filled in from left to right, not the traditional right to left so setting it to FE results in a seed value of "FE000000" not "000000FE". Note that we do the hash seed initialization in perl_construct(). Doing it via perl_alloc() (via init_tls) causes problems under threaded builds as the buffers used for reentrant srand48 functions are not allocated. See also the p5p mail "Hash improvements blocker: portable random code that doesnt depend on a functional interpreter", Message-ID: <CANgJU+X+wNayjsNOpKRqYHnEy_+B9UH_2irRA5O3ZmcYGAAZFQ@mail.gmail.com>
* dump.c: Fix non-mad threaded build errorFather Chrysostomos2012-11-141-6/+5
| | | | | | | This was introduced by 75a6ad4aa3. C preprocessors treat the ‘aTHX_ foo’ in MACRO(aTHX_ foo) as a sin- gle argument.
* SVf_IsCOWFather Chrysostomos2012-11-141-0/+1
| | | | | | | | | | | | | | | As discussed in ticket #114820, instead of using READONLY+FAKE to mark a copy-on-write string, we should make it a separate flag. There are many modules in CPAN (and 1 in core, Compress::Raw::Zlib) that assume that SvREADONLY means read-only. Only one CPAN module, POSIX::pselect will definitely be broken by this. Others may need to be tweaked. But I believe this is for the better. It causes all tests except ext/Devel-Peek/t/Peek.t (which needs a tiny tweak still) to pass under PERL_OLD_COPY_ON_WRITE, which is a prereq- uisite for any new COW scheme that creates COWs under the same cir- cumstances.
* Combine duplicate dump codeReini Urban2012-11-141-293/+134
| | | | | Combine op_flags and op_private dumper for MAD (xml) and -Dx dump. Add some previously missing flags.
* add padrange opDavid Mitchell2012-11-101-14/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This single op can, in some circumstances, replace the sequence of a pushmark followed by one or more padsv/padav/padhv ops, and possibly a trailing 'list' op, but only where the targs of the pad ops form a continuous range. This is generally more efficient, but is particularly so in the case of void-context my declarations, such as: my ($a,@b); Formerly this would be executed as the following set of ops: pushmark pushes a new mark padsv[$a] pushes $a, does a SAVEt_CLEARSV padav[@b] pushes all the flattened elements (i.e. none) of @a, does a SAVEt_CLEARSV list pops the mark, and pops all stack elements except the last nextstate pops the remaining stack element It's now: padrange[$a..@b] does two SAVEt_CLEARSV's nextstate nothing needing doing to the stack Note that in the case above, this commit changes user-visible behaviour in pathological cases; in particular, it has always been possible to modify a lexical var *before* the my is executed, using goto or closure tricks. So in principle someone could tie an array, then could notice that FETCH is no longer being called, e.g. f(); my ($s, @a); # this no longer triggers two FETCHES sub f { tie @a, ...; push @a, 1,2; } But I think we can live with that. Note also that having a padrange operator will allow us shortly to have a corresponding SAVEt_CLEARPADRANGE save type, that will replace multiple individual SAVEt_CLEARSV's.
* Add C define to remove taint support from perlSteffen Mueller2012-11-051-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By defining NO_TAINT_SUPPORT, all the various checks that perl does for tainting become no-ops. It's not an entirely complete change: it doesn't attempt to remove the taint-related interpreter variables, but instead virtually eliminates access to it. Why, you ask? Because it appears to speed up perl's run-time significantly by avoiding various "are we running under taint" checks and the like. This change is not in a state to go into blead yet. The actual way I implemented it might raise some (valid) objections. Basically, I replaced all uses of the global taint variables (but not PL_taint_warn!) with an extra layer of get/set macros (TAINT_get/TAINTING_get). Furthermore, the change is not complete: - PL_taint_warn would likely deserve the same treatment. - Obviously, tests fail. We have tests for -t/-T - Right now, I added a Perl warn() on startup when -t/-T are detected but the perl was not compiled support it. It might be argued that it should be silently ignored! Needs some thinking. - Code quality concerns - needs review. - Configure support required. - Needs thinking: How does this tie in with CPAN XS modules that use PL_taint and friends? It's easy to backport the new macros via PPPort, but that doesn't magically change all code out there. Might be harmless, though, because whenever you're running under NO_TAINT_SUPPORT, any check of PL_taint/etc is going to come up false. Thus, the only CPAN code that SHOULD be adversely affected is code that changes taint state.
* Allow regexp-to-pvlv assignmentFather Chrysostomos2012-10-301-6/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | Since the xpvlv and regexp structs conflict, we have to find somewhere else to put the regexp struct. I was going to sneak it in SvPVX, allocating a buffer large enough to fit the regexp struct followed by the string, and have SvPVX - sizeof(regexp) point to the struct. But that would make all regexp flag-checking macros fatter, and those are used in hot code. So I came up with another method. Regexp stringification is not speed-critical. So we can move the regexp stringification out of re->sv_u and put it in the regexp struct. Then the regexp struct itself can be pointed to by re->sv_u. So SVt_REGEXPs will have re->sv_any and re->sv_u pointing to the same spot. PVLVs can then have sv->sv_any point to the xpvlv body as usual, but have sv->sv_u point to a regexp struct. All regexp member access can go through sv_u instead of sv_any, which will be no slower than before. Regular expressions will no longer be SvPOK, so we give sv_2pv spec- ial logic for regexps. We don’t need to make the regexp struct larger, as SvLEN is currently always 0 iff mother_re is set. So we can replace the SvLEN field with the pv. SvFAKE is never used without SvPOK or SvSCREAM also set. So we can use that to identify regexps.
* Define RXf_SPLIT and RXf_SKIPWHITE as 0Father Chrysostomos2012-10-111-4/+0
| | | | | | | | They are on longer used in core, and we need room for more flags. The only CPAN modules that use them check whether RXf_SPLIT is set (which no longer happens) before setting RXf_SKIPWHITE (which is ignored).
* dump.c: Dump CvNAME_HEKFather Chrysostomos2012-09-151-1/+4
|
* Don't copy all of the match string bufferDavid Mitchell2012-09-081-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a pattern matches, and that pattern contains captures (or $`, $&, $' or /p are present), a copy is made of the whole original string, so that $1 et al continue to hold the correct value even if the original string is subsequently modified. This can have severe performance penalties; for example, this code causes a 1Mb buffer to be allocated, copied and freed a million times: $&; $x = 'x' x 1_000_000; 1 while $x =~ /(.)/g; This commit changes this so that, where possible, only the needed substring of the original string is copied: in the above case, only a 1-byte buffer is copied each time. Also, it now reuses or reallocs the buffer, rather than freeing and mallocing each time. Now that PL_sawampersand is a 3-bit flag indicating separately whether $`, $& and $' have been seen, they each contribute only their own individual penalty; which ones have been seen will limit the extent to which we can avoid copying the whole buffer. Note that the above code *without* the $& is not currently slow, but only because the copying is artificially disabled to avoid the performance hit. The next but one commit will remove that hack, meaning that it will still be fast, but will now be correct in the presence of a modified original string. We achieve this by by adding suboffset and subcoffset fields to the existing subbeg and sublen fields of a regex, to indicate how many bytes and characters have been skipped from the logical start of the string till the physical start of the buffer. To avoid copying stuff at the end, we just reduce sublen. For example, in this: "abcdefgh" =~ /(c)d/ subbeg points to a malloced buffer containing "c\0"; sublen == 1, and suboffset == 2 (as does subcoffset). while if $& has been seen, subbeg points to a malloced buffer containing "cd\0"; sublen == 2, and suboffset == 2. If in addition $' has been seen, then subbeg points to a malloced buffer containing "cdefgh\0"; sublen == 6, and suboffset == 2. The regex engine won't do this by default; there are two new flag bits, REXEC_COPY_SKIP_PRE and REXEC_COPY_SKIP_POST, which in conjunction with REXEC_COPY_STR, request that the engine skip the start or end of the buffer (it will still copy in the presence of the relevant $`, $&, $', /p). Only pp_match has been enhanced to use these extra flags; substitution can't easily benefit, since the usual action of s///g is to copy the whole string first time round, then perform subsequent matching iterations against the copy, without further copying. So you still need to copy most of the buffer.
* "op-entry" DTrace probeShawn M Moore2012-08-281-0/+2
|
* Correct typo in flag nameFather Chrysostomos2012-08-251-2/+2
|
* Banish boolkeysFather Chrysostomos2012-08-251-2/+4
| | | | | | | | | | | | | | Since 6ea72b3a1, rv2hv and padhv have had the ability to return boo- leans in scalar context, instead of bucket stats, if flagged the right way. sub { %hash || ... } is optimised to take advantage of this. If the || is in unknown context at compile time, the %hash is flagged as being maybe a true boolean. When flagged that way, it returns a bool- ean if block_gimme() returns G_VOID. If rv2hv and padhv can already do this, then we don’t need the boolkeys op any more. We can just flag the rv2hv to return a boolean. In all the cases where boolkeys was used, we know at compile time that it is true boolean context, so we add a new flag for that.
* Optimise %hash in sub { %hash || ... }Father Chrysostomos2012-08-251-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In %hash || $foo, the %hash is in scalar context, so it has to iterate through the buckets to produce statistics on bucket usage. If the || is in void context, the value returned by hash is only ever used as a boolean (as || doesn’t have to return it). We already opti- mise it by adding a boolkeys op when it is known at compile time that || will be in void context. In sub { %hash || $foo } it is not known at compile time that it will be in void context, so it wasn’t optimised. This commit optimises it by flagging the %hash at compile time as being possibly in ‘true boolean’ context. When that flag is set, the rv2hv and padhv ops call block_gimme() to see whether || is in void context. This speeds things up signficantly. Here is what I got after optimis- ing rv2hv but before doing padhv: $ time ./miniperl -e '%hash = 1..10000; sub { %hash || 1 }->() for 1..100000' real 0m0.179s user 0m0.101s sys 0m0.005s $ time ./miniperl -e 'my %hash = 1..10000; sub { %hash || 1 }->() for 1..100000' real 0m5.446s user 0m2.419s sys 0m0.015s (That example is slightly misleading because of the closure, but the closure version takes 1 sec. when optimised.)
* Use FooBAR convention for new pad macrosFather Chrysostomos2012-08-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After a while, I realised that it can be confusing for PAD_ARRAY and PAD_MAX to take a pad argument, but for PAD_SV to take a number and PAD_SET_CUR a padlist. I was copying the HEK_KEY convention, which was probably a bad idea. This is what we use elsewhere: TypeMACRO ----===== AvMAX CopFILE PmopSTASH StashHANDLER OpslabREFCNT_dec Furthermore, heks are not part of the API, so what convention they use is not so important. So these: PADNAMELIST_* PADLIST_* PADNAME_* PAD_* are now: Padnamelist* Padlist* Padname* Pad*
* Stop padlists from being AVsFather Chrysostomos2012-08-211-1/+1
| | | | | | | | | | | | | | | | | | | | | In order to fix a bug, I need to add new fields to padlists. But I cannot easily do that as long as they are AVs. So I have created a new padlist struct. This not only allows me to extend the padlist struct with new members as necessary, but also saves memory, as we now have a three-pointer struct where before we had a whole SV head (3-4 pointers) + XPVAV (5 pointers). This will unfortunately break half of CPAN, but the pad API docs clearly say this: NOTE: this function is experimental and may change or be removed without notice. This would have broken B::Debug, but a patch sent upstream has already been integrated into blead with commit 9d2d23d981.
* Use PADLIST in more placesFather Chrysostomos2012-08-211-1/+1
| | | | | Much code relies on the fact that PADLIST is typedeffed as AV. PADLIST should be treated as a distinct type.
* Add a depth field to formatsFather Chrysostomos2012-08-051-2/+1
| | | | | Instead of lengthening the struct, we can reuse SvCUR, which is cur- rently unused.
* Disallow setting SvPV on formatsFather Chrysostomos2012-08-051-1/+1
| | | | | | | Setting a the PV on a format is meaningless, as of the previ- ous commit. This frees up SvCUR for other uses.
* Make PL_(top|body|form)target PVIVsFather Chrysostomos2012-08-051-2/+0
| | | | | | | | | | | | | These are only used for storing a string and an IV. Making them into full-blown SVt_PVFMs is overkill. FmLINES was only being used on these three scalars. So make it use the SvIVX field. struct xpvfm no longer needs an xfm_lines member, because SVt_PVFMs no longer use it. This also causes a TODO test in taint.t to start passing, but I do not fully understand why. But at least that’s progress. :-)
* dump.c: Dump op->op_s(labbed|avefree)Father Chrysostomos2012-07-141-1/+3
|
* Remove op_latefree(d)Father Chrysostomos2012-07-141-7/+1
| | | | | | | This was an early attempt to fix leaking of ops after syntax errors, disabled because it was deemed to fragile. The new slab allocator (8be227a) has solved this problem another way, so latefree(d) no longer serves any purpose.
* Record folded constants in the op treeFather Chrysostomos2012-07-041-0/+3
|
* dump.c: Dump CVf_SLABBEDFather Chrysostomos2012-06-291-0/+1
|
* Teach dump.c about CVf_HASEVALFather Chrysostomos2012-06-201-0/+1
|
* eliminate RExC_seen_evals and RExC_rx->seen_evalsDavid Mitchell2012-06-131-2/+0
| | | | | these were used as part of the old "use re 'eval'" security mechanism used by the now-eliminated PL_reginterp_cnt
* add PMf_IS_QR flagDavid Mitchell2012-06-131-1/+2
| | | | | | | | | | | This indicates that a particular PMOP is in fact OP_QR. We should of course be able to tell this from op_type, but the regex-compiling API only gets passed op_flags. This then allows us to fix a bug where we were deciding during compilation whether to hang on to the code_blocks based on whether the PMOP was PMf_HAS_CV rather than PMf_IS_QR; the latter implies the former, but not the other way round.
* fix dumping of PMf_CODELIST_PRIVATE flagDavid Mitchell2012-06-131-1/+1
|
* add PMf_CODELIST_PRIVATE flagDavid Mitchell2012-06-131-2/+8
| | | | | | | | | | | | | | | This indicates that the op_code_list field in a PMOP is "private"; that is, it points to a list of DO blocks that we don't own, and shouldn't free, and whose pad may not match ours. This will allow us to use the op_code_list field in the runtime case of literal code, e.g. /$runtime(?{...})/ and qr/$runtime(?{...})/. Here, at compile-time, we need to make the pre-compiled (?{..}) blocks available to pp_regcomp, but the list containing those blocks is also the list that is executed in the lead-up to executing pp_regcomp (while skipping the DO blocks), so the code is already embedded, and doesn't need freeing. Furthermore, in the qr// case, the code blocks are actually within a different sub (an anon one) than the PMOP, so the pads won't match.
* make qr/(?{})/ behave with closuresDavid Mitchell2012-06-131-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With this commit, qr// with a literal (compile-time) code block will Do the Right Thing as regards closures and the scope of lexical vars; in particular, the following now creates three regexes that match 1, 2 and 3: for my $i (0..2) { push @r, qr/^(??{$i})$/; } "1" =~ $r[1]; # matches Previously, $i would be evaluated as undef in all 3 patterns. This is achieved by wrapping the compilation of the pattern within a new anonymous CV, which is then attached to the pattern. At run-time pp_qr() clones the CV as well as copying the REGEXP; and when the code block is executed, it does so using the pad of the cloned CV. Which makes everything come out all right in the wash. The CV is stored in a new field of the REGEXP, called qr_anoncv. Note that run-time qr//s are still not fixed, e.g. qr/$foo(?{...})/; nor is it yet fixed where the qr// is embedded within another pattern: continuing with the code example from above, my $i = 99; "1" =~ $r[1]; # bare qr: matches: correct! "X99" =~ /X$r[1]/; # embedded qr: matches: whoops, it's still seeing the wrong $i
* Mostly complete fix for literal /(?{..})/ blocksDavid Mitchell2012-06-131-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change the way that code blocks in patterns are parsed and executed, especially as regards lexical and scoping behaviour. (Note that this fix only applies to literal code blocks appearing within patterns: run-time patterns, and literals within qr//, are still done the old broken way for now). This change means that for literal /(?{..})/ and /(??{..})/: * the code block is now fully parsed in the same pass as the surrounding code, which means that the compiler no longer just does a simplistic count of balancing {} to find the limits of the code block; i.e. stuff like /(?{ $x = "{" })/ now works (in the same way that subscripts in double quoted strings always have: "$a{'{'}" ) * Error and warning messages will now appear to emanate from the main body rather than an re_eval; e.g. the output from #!/usr/bin/perl /(?{ warn "boo" })/ has changed from boo at (re_eval 1) line 1. to boo at /tmp/p line 2. * scope and closures now behave as you might expect; for example for my $x (qw(a b c)) { "" =~ /(?{ print $x })/ } now prints "abc" rather than "" * with recursion, it now finds the lexical within the appropriate depth of pad: this code now prints "012" rather than "000": sub recurse { my ($n) = @_; return if $n > 2; "" =~ /^(?{print $n})/; recurse($n+1); } recurse(0); * an earlier fix that stopped 'my' declarations within code blocks causing crashes, required the accumulating of two SAVECOMPPADs on the stack for each iteration of the code block; this is no longer needed; * UNITCHECK blocks within literal code blocks are now run as part of the main body of code (run-time code blocks still trigger an immediate call to the UNITCHECK block though) This is all achieved by building upon the efforts of the commits which led up to this; those altered the parser to parse literal code blocks directly, but up until now those code blocks were discarded by Perl_pmruntime and the block re-compiled using the original re_eval mechanism. As of this commit, for the non-qr and non-runtime variants, those code blocks are no longer thrown away. Instead: * the LISTOP generated by the parser, which contains all the code blocks plus OP_CONSTs that collectively make up the literal pattern, is now stored in a new field in PMOPs, called op_code_list. For example in /A(?{BLOCK})C/, the listop stored in op_code_list looks like LIST PUSHMARK CONST['A'] NULL/special (aka a DO block) BLOCK CONST['(?{BLOCK})'] CONST['B'] * each of the code blocks has its last op set to null and is individually run through the peephole optimiser, so each one becomes a little self-contained block of code, rather than a list of blocks that run into each other; * then in re_op_compile(), we concatenate the list of CONSTs to produce a string to be compiled, but at the same time we note any DO blocks and note the start and end positions of the corresponding CONST['(?{BLOCK})']; * (if the current regex engine isn't the built-in perl one, then we just throw away the code blocks and pass the concatenated string to the engine) * then during regex compilation, whenever we encounter a '(?{', we see if it matches the index of one of the pre-compiled blocks, and if so, we store a pointer to that block in an 'l' data slot, and use the end index to skip over the text of the code body. Conversely, if the index doesn't match, then we know that it's a run-time pattern and (for now), compile it in the old way. * During execution, when an EVAL op is encountered, if data->what is 'l', then we just use the pad that was in effect when the pattern was called; i.e. we use the current pad slot of the currently executing CV that the pattern is embedded within.
* update the editor hints for spaces, not tabsRicardo Signes2012-05-291-2/+2
| | | | | This updates the editor hints in our files for Emacs and vim to request that tabs be inserted as spaces.
* Remove OPpCONST_WARNINGFather Chrysostomos2012-05-211-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | This was added to op.h in commit 599cee73: commit 599cee73f2261c5e09cde7ceba3f9a896989e117 Author: Paul Marquess <paul.marquess@btinternet.com> Date: Wed Jul 29 10:28:45 1998 +0100 lexical warnings; tweaks to places that didn't apply correctly Message-Id: <9807290828.AA26286@claudius.bfsec.bt.co.uk> Subject: lexical warnings patch for 5.005_50 p4raw-id: //depot/perl@1773 dump.c was modified to dump in, in this commit: commit bf91b999b25fa75a3ef7a327742929592a2e7e9c Author: Simon Cozens <simon@netthink.co.uk> Date: Sun May 13 21:20:36 2001 +0100 Op private flags Message-ID: <20010513202036.A21896@netthink.co.uk> p4raw-id: //depot/perl@10117 But is apparently completely unused anywhere. And I want that bit.
* Use the new utf8 to code point functionsKarl Williamson2012-03-191-2/+2
| | | | | These functions should be used in preference to the old ones which can read beyond the end of the input string.
* Adjust substr offsets when using, not when creating, lvalueFather Chrysostomos2011-12-041-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When substr() occurs in potential lvalue context, the offsets are adjusted to the current string (negative being converted to positive, lengths reaching beyond the end of the string being shortened, etc.) as soon as the special lvalue to be returned is created. When that lvalue is assigned to, the original scalar is stringified once more. That implementation results in two bugs: 1) Fetch is called twice in a simple substr() assignment (except in void context, due to the special optimisation of commit 24fcb59fc). 2) These two calls are not equivalent: $SIG{__WARN__} = sub { warn "w ",shift}; sub myprint { print @_; $_[0] = 1 } print substr("", 2); myprint substr("", 2); The second one dies. The first one only warns. That’s mean. The error is also wrong, sometimes, if the original string is going to get longer before the substr lvalue is actually used. The behaviour of \substr($str, -1) if $str changes length is com- pletely undocumented. Before 5.10, it was documented as being unreli- able and subject to change. What this commit does is make the lvalue returned by substr remember the original arguments and only adjust the offsets when the assign- ment happens. This means that the following now prints z, instead of xyz (which is actually what I would expect): $str = "a"; $substr = \substr($str,-1); $str = "xyz"; print $substr;