path: root/dump.c
Commit message | Author | Age | Files | Lines
* Hash Function Change - Murmur hash and true per process hash seed | Yves Orton | 2012-11-17 | 1 | -10/+7
|
| This patch does the following:
|
| *) Introduces multiple new hash functions to choose from at build
|    time. This includes Murmur-32, SDBM, DJB2, SipHash, SuperFast, and
|    One-at-a-time. Currently this is handled by munging hv.h. Configure
|    support hopefully to follow.
|
| *) Changes the default hash to Murmur hash which is faster than the
|    old default One-at-a-time.
|
| *) Rips out the old HvREHASH mechanism and replaces it with a
|    per-process random hash seed.
|
| *) Changes the old PL_hash_seed from an interpreter value to a
|    global variable. This means it does not have to be copied during
|    interpreter setup or cloning.
|
| *) Changes the format of the PERL_HASH_SEED variable to a hex
|    string so that hash seeds longer than fit in an integer are possible.
|
| *) Changes the return of Hash::Util::hash_seed() from a number to a
|    string. This is to accommodate hash functions which have more bits
|    than can be fit in an integer.
|
| *) Adds new functions to Hash::Util to improve introspection of hashes
|
|    -) hash_value() - returns an integer hash value for a given string.
|    -) bucket_info() - returns basic hash bucket utilization info
|    -) bucket_stats() - returns more hash bucket utilization info
|    -) bucket_array() - which keys are in which buckets in a hash
|
| More details on the new hash functions can be found below:
|
| Murmur Hash: (v3) from google, see
|     http://code.google.com/p/smhasher/wiki/MurmurHash3
|
| Superfast Hash: From Paul Hsieh.
|     http://www.azillionmonkeys.com/qed/hash.html
|
| DJB2: a hash function from Daniel Bernstein
|     http://www.cse.yorku.ca/~oz/hash.html
|
| SDBM: a hash function sdbm.
|     http://www.cse.yorku.ca/~oz/hash.html
|
| SipHash: by Jean-Philippe Aumasson and Daniel J. Bernstein.
|     https://www.131002.net/siphash/
|
| They have all been converted into Perl's ugly macro format.
|
| I have not done any rigorous testing to make sure this conversion
| is correct. They seem to function as expected however.
|
| All of them use the random hash seed.
|
| You can force the use of a given function by defining one of
|
|     PERL_HASH_FUNC_MURMUR
|     PERL_HASH_FUNC_SUPERFAST
|     PERL_HASH_FUNC_DJB2
|     PERL_HASH_FUNC_SDBM
|     PERL_HASH_FUNC_ONE_AT_A_TIME
|
| Setting the environment variable PERL_HASH_SEED_DEBUG to 1 will make
| perl output the current seed (changed to hex) and the hash function
| it has been built with.
|
| Setting the environment variable PERL_HASH_SEED to a hex value will
| cause that value to be used as the seed. Any missing bits of the seed
| will be set to 0. The bits are filled in from left to right, not
| the traditional right to left, so setting it to FE results in a seed
| value of "FE000000" not "000000FE".
|
| Note that we do the hash seed initialization in perl_construct().
| Doing it via perl_alloc() (via init_tls) causes problems under
| threaded builds as the buffers used for reentrant srand48 functions
| are not allocated. See also the p5p mail "Hash improvements blocker:
| portable random code that doesnt depend on a functional interpreter",
| Message-ID:
| <CANgJU+X+wNayjsNOpKRqYHnEy_+B9UH_2irRA5O3ZmcYGAAZFQ@mail.gmail.com>
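
The left-to-right filling of PERL_HASH_SEED described above can be sketched as follows. This is only an illustration under stated assumptions, not the code from the commit: the names `hex_digit_value` and `seed_from_hex` are hypothetical, and the seed is modelled as a plain byte buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not one. */
static int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical sketch: fill seed[0..seed_len-1] from a hex string,
 * most significant nibble first. Digits that are missing leave the
 * remaining bits zero; surplus digits are ignored.
 * Returns the number of hex digits consumed. */
static size_t seed_from_hex(const char *hex, unsigned char *seed, size_t seed_len)
{
    size_t used = 0;

    memset(seed, 0, seed_len);
    while (used < seed_len * 2) {
        int v = hex_digit_value(hex[used]);
        if (v < 0)
            break;                      /* also stops at the terminating NUL */
        if (used % 2 == 0)
            seed[used / 2] = (unsigned char)(v << 4);   /* high nibble */
        else
            seed[used / 2] |= (unsigned char)v;         /* low nibble */
        used++;
    }
    return used;
}
```

With a four-byte seed, the input "FE" fills only the first byte, which corresponds to the "FE000000" case in the message.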
* dump.c: Fix non-mad threaded build error | Father Chrysostomos | 2012-11-14 | 1 | -6/+5
|
| This was introduced by 75a6ad4aa3.
|
| C preprocessors treat the ‘aTHX_ foo’ in MACRO(aTHX_ foo) as a
| single argument.
* SVf_IsCOW | Father Chrysostomos | 2012-11-14 | 1 | -0/+1
|
| As discussed in ticket #114820, instead of using READONLY+FAKE to
| mark a copy-on-write string, we should make it a separate flag.
|
| There are many modules in CPAN (and 1 in core, Compress::Raw::Zlib)
| that assume that SvREADONLY means read-only. Only one CPAN module,
| POSIX::pselect will definitely be broken by this. Others may need
| to be tweaked. But I believe this is for the better.
|
| It causes all tests except ext/Devel-Peek/t/Peek.t (which needs a
| tiny tweak still) to pass under PERL_OLD_COPY_ON_WRITE, which is a
| prerequisite for any new COW scheme that creates COWs under the same
| circumstances.
* Combine duplicate dump code | Reini Urban | 2012-11-14 | 1 | -293/+134
|
| Combine op_flags and op_private dumper for MAD (xml) and -Dx dump.
| Add some previously missing flags.
* add padrange op | David Mitchell | 2012-11-10 | 1 | -14/+44
|
| This single op can, in some circumstances, replace the sequence of a
| pushmark followed by one or more padsv/padav/padhv ops, and possibly
| a trailing 'list' op, but only where the targs of the pad ops form
| a continuous range.
|
| This is generally more efficient, but is particularly so in the case
| of void-context my declarations, such as:
|
|     my ($a,@b);
|
| Formerly this would be executed as the following set of ops:
|
|     pushmark    pushes a new mark
|     padsv[$a]   pushes $a, does a SAVEt_CLEARSV
|     padav[@b]   pushes all the flattened elements (i.e. none) of @b,
|                 does a SAVEt_CLEARSV
|     list        pops the mark, and pops all stack elements except the last
|     nextstate   pops the remaining stack element
|
| It's now:
|
|     padrange[$a..@b]  does two SAVEt_CLEARSV's
|     nextstate         nothing needing doing to the stack
|
| Note that in the case above, this commit changes user-visible
| behaviour in pathological cases; in particular, it has always been
| possible to modify a lexical var *before* the my is executed, using
| goto or closure tricks. So in principle someone could tie an array,
| then could notice that FETCH is no longer being called, e.g.
|
|     f();
|     my ($s, @a); # this no longer triggers two FETCHES
|     sub f {
|         tie @a, ...;
|         push @a, 1,2;
|     }
|
| But I think we can live with that.
|
| Note also that having a padrange operator will allow us shortly to
| have a corresponding SAVEt_CLEARPADRANGE save type, that will replace
| multiple individual SAVEt_CLEARSV's.
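
The precondition "the targs of the pad ops form a continuous range" can be illustrated with a small check. This is a sketch only; `padrange_run` is a hypothetical name, and the real optimiser inspects ops in the op tree rather than an array of offsets.

```c
#include <stddef.h>

/* Hypothetical sketch: given the pad offsets (targs) of consecutive
 * padsv/padav/padhv ops, return how many leading entries form a
 * continuous range (each targ exactly one more than the previous).
 * Only such a run could be folded into a single range op. */
static size_t padrange_run(const unsigned long *targs, size_t n)
{
    size_t i;

    if (n == 0)
        return 0;
    for (i = 1; i < n; i++) {
        if (targs[i] != targs[i - 1] + 1)
            break;                      /* gap: the run ends here */
    }
    return i;
}
```

For offsets 3, 4, 5, 9 the run covers the first three entries, so only those would be candidates for one combined op.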
* Add C define to remove taint support from perl | Steffen Mueller | 2012-11-05 | 1 | -2/+2
|
| By defining NO_TAINT_SUPPORT, all the various checks that perl does
| for tainting become no-ops. It's not an entirely complete change: it
| doesn't attempt to remove the taint-related interpreter variables,
| but instead virtually eliminates access to them.
|
| Why, you ask? Because it appears to speed up perl's run-time
| significantly by avoiding various "are we running under taint" checks
| and the like.
|
| This change is not in a state to go into blead yet. The actual way I
| implemented it might raise some (valid) objections. Basically, I
| replaced all uses of the global taint variables (but not
| PL_taint_warn!) with an extra layer of get/set macros
| (TAINT_get/TAINTING_get).
|
| Furthermore, the change is not complete:
|
| - PL_taint_warn would likely deserve the same treatment.
| - Obviously, tests fail. We have tests for -t/-T
| - Right now, I added a Perl warn() on startup when -t/-T are detected
|   but the perl was not compiled to support it. It might be argued that
|   it should be silently ignored! Needs some thinking.
| - Code quality concerns - needs review.
| - Configure support required.
| - Needs thinking: How does this tie in with CPAN XS modules that use
|   PL_taint and friends? It's easy to backport the new macros via PPPort,
|   but that doesn't magically change all code out there. Might be
|   harmless, though, because whenever you're running under
|   NO_TAINT_SUPPORT, any check of PL_taint/etc is going to come up
|   false. Thus, the only CPAN code that SHOULD be adversely affected is
|   code that changes taint state.
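
The "extra layer of get/set macros" idea can be sketched like this. It is an assumption-laden illustration, not the patch itself: `sketch_tainting` and `must_check_taint` are hypothetical, and the `TAINTING_set` shape shown here is assumed for symmetry with the `TAINTING_get` macro named in the message.

```c
#include <stdbool.h>

#ifdef NO_TAINT_SUPPORT
   /* Taint support compiled out: reads are a constant, writes vanish,
    * so the compiler can drop every branch guarded by the getter. */
#  define TAINTING_get      false
#  define TAINTING_set(v)   ((void)(v))
#else
   /* Hypothetical stand-in for the interpreter's taint-mode variable. */
   static bool sketch_tainting = false;
#  define TAINTING_get      (sketch_tainting)
#  define TAINTING_set(v)   (sketch_tainting = (v))
#endif

/* Callers go through the macro layer and never touch the variable. */
static bool must_check_taint(void)
{
    return TAINTING_get;
}
```

Because code only ever uses the macros, building with -DNO_TAINT_SUPPORT turns `must_check_taint()` into a function that always returns false without any other source change.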
* Allow regexp-to-pvlv assignment | Father Chrysostomos | 2012-10-30 | 1 | -6/+15
|
| Since the xpvlv and regexp structs conflict, we have to find somewhere
| else to put the regexp struct.
|
| I was going to sneak it in SvPVX, allocating a buffer large enough to
| fit the regexp struct followed by the string, and have SvPVX -
| sizeof(regexp) point to the struct. But that would make all regexp
| flag-checking macros fatter, and those are used in hot code.
|
| So I came up with another method. Regexp stringification is not
| speed-critical. So we can move the regexp stringification out of
| re->sv_u and put it in the regexp struct. Then the regexp struct
| itself can be pointed to by re->sv_u. So SVt_REGEXPs will have
| re->sv_any and re->sv_u pointing to the same spot. PVLVs can then
| have sv->sv_any point to the xpvlv body as usual, but have sv->sv_u
| point to a regexp struct. All regexp member access can go through
| sv_u instead of sv_any, which will be no slower than before.
|
| Regular expressions will no longer be SvPOK, so we give sv_2pv special
| logic for regexps. We don’t need to make the regexp struct larger, as
| SvLEN is currently always 0 iff mother_re is set. So we can replace
| the SvLEN field with the pv.
|
| SvFAKE is never used without SvPOK or SvSCREAM also set. So we can
| use that to identify regexps.
* Define RXf_SPLIT and RXf_SKIPWHITE as 0 | Father Chrysostomos | 2012-10-11 | 1 | -4/+0
|
| They are no longer used in core, and we need room for more flags.
|
| The only CPAN modules that use them check whether RXf_SPLIT is set
| (which no longer happens) before setting RXf_SKIPWHITE (which is
| ignored).
* dump.c: Dump CvNAME_HEK | Father Chrysostomos | 2012-09-15 | 1 | -1/+4
|
* Don't copy all of the match string buffer | David Mitchell | 2012-09-08 | 1 | -0/+4
|
| When a pattern matches, and that pattern contains captures (or $`, $&, $'
| or /p are present), a copy is made of the whole original string, so
| that $1 et al continue to hold the correct value even if the original
| string is subsequently modified. This can have severe performance
| penalties; for example, this code causes a 1Mb buffer to be allocated,
| copied and freed a million times:
|
|     $&;
|     $x = 'x' x 1_000_000;
|     1 while $x =~ /(.)/g;
|
| This commit changes this so that, where possible, only the needed
| substring of the original string is copied: in the above case, only a
| 1-byte buffer is copied each time. Also, it now reuses or reallocs the
| buffer, rather than freeing and mallocing each time.
|
| Now that PL_sawampersand is a 3-bit flag indicating separately whether
| $`, $& and $' have been seen, they each contribute only their own
| individual penalty; which ones have been seen will limit the extent to
| which we can avoid copying the whole buffer.
|
| Note that the above code *without* the $& is not currently slow, but
| only because the copying is artificially disabled to avoid the
| performance hit. The next but one commit will remove that hack,
| meaning that it will still be fast, but will now be correct in the
| presence of a modified original string.
|
| We achieve this by adding suboffset and subcoffset fields to the
| existing subbeg and sublen fields of a regex, to indicate how many
| bytes and characters have been skipped from the logical start of the
| string till the physical start of the buffer. To avoid copying stuff
| at the end, we just reduce sublen. For example, in this:
|
|     "abcdefgh" =~ /(c)d/
|
| subbeg points to a malloced buffer containing "c\0"; sublen == 1, and
| suboffset == 2 (as does subcoffset).
|
| while if $& has been seen,
|
| subbeg points to a malloced buffer containing "cd\0"; sublen == 2, and
| suboffset == 2.
|
| If in addition $' has been seen, then
|
| subbeg points to a malloced buffer containing "cdefgh\0"; sublen == 6,
| and suboffset == 2.
|
| The regex engine won't do this by default; there are two new flag
| bits, REXEC_COPY_SKIP_PRE and REXEC_COPY_SKIP_POST, which in
| conjunction with REXEC_COPY_STR, request that the engine skip the
| start or end of the buffer (it will still copy in the presence of the
| relevant $`, $&, $', /p).
|
| Only pp_match has been enhanced to use these extra flags; substitution
| can't easily benefit, since the usual action of s///g is to copy the
| whole string first time round, then perform subsequent matching
| iterations against the copy, without further copying. So you still
| need to copy most of the buffer.
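
The three suboffset/sublen cases in the example above can be reproduced with a small calculation. This is a sketch under assumptions: the `SAW_*` constants, `copy_window` and `needed_window` are hypothetical names, only a single capture group is modelled, and offsets are in bytes (so subcoffset is ignored).

```c
#include <stddef.h>

/* Hypothetical flag bits standing in for "has this variable been seen". */
#define SAW_PREMATCH   1u   /* $` */
#define SAW_MATCH      2u   /* $& */
#define SAW_POSTMATCH  4u   /* $' */

typedef struct {
    size_t suboffset;   /* bytes skipped before the copied buffer */
    size_t sublen;      /* bytes actually copied */
} copy_window;

/* Hypothetical sketch: smallest window of the original string that
 * still covers one capture group plus whichever of $`, $&, $' are live. */
static copy_window needed_window(size_t str_len,
                                 size_t match_start, size_t match_end,
                                 size_t cap_start, size_t cap_end,
                                 unsigned saw)
{
    size_t lo = cap_start, hi = cap_end;
    copy_window w;

    if (saw & SAW_MATCH) {              /* $& needs the whole match */
        if (match_start < lo) lo = match_start;
        if (match_end > hi)   hi = match_end;
    }
    if (saw & SAW_PREMATCH) {           /* $` needs everything before it */
        lo = 0;
        if (match_start > hi) hi = match_start;
    }
    if (saw & SAW_POSTMATCH) {          /* $' needs everything after it */
        if (match_end < lo) lo = match_end;
        hi = str_len;
    }
    w.suboffset = lo;
    w.sublen = hi - lo;
    return w;
}
```

For "abcdefgh" =~ /(c)d/ the match spans bytes 2..4 and the capture 2..3, so the window is (2, 1) with nothing seen, (2, 2) once $& is seen, and (2, 6) once $' is seen as well.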
* "op-entry" DTrace probe | Shawn M Moore | 2012-08-28 | 1 | -0/+2
|
* Correct typo in flag name | Father Chrysostomos | 2012-08-25 | 1 | -2/+2
|
* Banish boolkeys | Father Chrysostomos | 2012-08-25 | 1 | -2/+4
|
| Since 6ea72b3a1, rv2hv and padhv have had the ability to return
| booleans in scalar context, instead of bucket stats, if flagged the
| right way.
|
| sub { %hash || ... } is optimised to take advantage of this. If the
| || is in unknown context at compile time, the %hash is flagged as
| being maybe a true boolean. When flagged that way, it returns a
| boolean if block_gimme() returns G_VOID.
|
| If rv2hv and padhv can already do this, then we don’t need the
| boolkeys op any more. We can just flag the rv2hv to return a boolean.
| In all the cases where boolkeys was used, we know at compile time
| that it is true boolean context, so we add a new flag for that.
* Optimise %hash in sub { %hash || ... } | Father Chrysostomos | 2012-08-25 | 1 | -0/+4
|
| In %hash || $foo, the %hash is in scalar context, so it has to
| iterate through the buckets to produce statistics on bucket usage.
|
| If the || is in void context, the value returned by %hash is only
| ever used as a boolean (as || doesn’t have to return it). We already
| optimise it by adding a boolkeys op when it is known at compile time
| that || will be in void context.
|
| In sub { %hash || $foo } it is not known at compile time that it
| will be in void context, so it wasn’t optimised.
|
| This commit optimises it by flagging the %hash at compile time as
| being possibly in ‘true boolean’ context. When that flag is set, the
| rv2hv and padhv ops call block_gimme() to see whether || is in void
| context.
|
| This speeds things up significantly. Here is what I got after
| optimising rv2hv but before doing padhv:
|
| $ time ./miniperl -e '%hash = 1..10000; sub { %hash || 1 }->() for 1..100000'
|
| real    0m0.179s
| user    0m0.101s
| sys     0m0.005s
|
| $ time ./miniperl -e 'my %hash = 1..10000; sub { %hash || 1 }->() for 1..100000'
|
| real    0m5.446s
| user    0m2.419s
| sys     0m0.015s
|
| (That example is slightly misleading because of the closure, but the
| closure version takes 1 sec. when optimised.)
* Use FooBAR convention for new pad macros | Father Chrysostomos | 2012-08-22 | 1 | -1/+1
|
| After a while, I realised that it can be confusing for PAD_ARRAY and
| PAD_MAX to take a pad argument, but for PAD_SV to take a number and
| PAD_SET_CUR a padlist.
|
| I was copying the HEK_KEY convention, which was probably a bad idea.
| This is what we use elsewhere:
|
|     TypeMACRO
|     ----=====
|     AvMAX
|     CopFILE
|     PmopSTASH
|     StashHANDLER
|     OpslabREFCNT_dec
|
| Furthermore, heks are not part of the API, so what convention they
| use is not so important.
|
| So these:
|
|     PADNAMELIST_*
|     PADLIST_*
|     PADNAME_*
|     PAD_*
|
| are now:
|
|     Padnamelist*
|     Padlist*
|     Padname*
|     Pad*
* Stop padlists from being AVs | Father Chrysostomos | 2012-08-21 | 1 | -1/+1
|
| In order to fix a bug, I need to add new fields to padlists. But I
| cannot easily do that as long as they are AVs.
|
| So I have created a new padlist struct.
|
| This not only allows me to extend the padlist struct with new members
| as necessary, but also saves memory, as we now have a three-pointer
| struct where before we had a whole SV head (3-4 pointers) + XPVAV (5
| pointers).
|
| This will unfortunately break half of CPAN, but the pad API docs
| clearly say this:
|
|     NOTE: this function is experimental and may change or be
|     removed without notice.
|
| This would have broken B::Debug, but a patch sent upstream has already
| been integrated into blead with commit 9d2d23d981.
* Use PADLIST in more places | Father Chrysostomos | 2012-08-21 | 1 | -1/+1
|
| Much code relies on the fact that PADLIST is typedeffed as AV.
| PADLIST should be treated as a distinct type.
* Add a depth field to formats | Father Chrysostomos | 2012-08-05 | 1 | -2/+1
|
| Instead of lengthening the struct, we can reuse SvCUR, which is
| currently unused.
* Disallow setting SvPV on formats | Father Chrysostomos | 2012-08-05 | 1 | -1/+1
|
| Setting the PV on a format is meaningless, as of the previous
| commit. This frees up SvCUR for other uses.
* Make PL_(top|body|form)target PVIVs | Father Chrysostomos | 2012-08-05 | 1 | -2/+0
|
| These are only used for storing a string and an IV.
|
| Making them into full-blown SVt_PVFMs is overkill.
|
| FmLINES was only being used on these three scalars. So make it use
| the SvIVX field. struct xpvfm no longer needs an xfm_lines member,
| because SVt_PVFMs no longer use it.
|
| This also causes a TODO test in taint.t to start passing, but I do
| not fully understand why. But at least that’s progress. :-)
* dump.c: Dump op->op_s(labbed|avefree) | Father Chrysostomos | 2012-07-14 | 1 | -1/+3
|
* Remove op_latefree(d) | Father Chrysostomos | 2012-07-14 | 1 | -7/+1
|
| This was an early attempt to fix leaking of ops after syntax errors,
| disabled because it was deemed too fragile. The new slab allocator
| (8be227a) has solved this problem another way, so latefree(d) no
| longer serves any purpose.
* Record folded constants in the op tree | Father Chrysostomos | 2012-07-04 | 1 | -0/+3
|
* dump.c: Dump CVf_SLABBED | Father Chrysostomos | 2012-06-29 | 1 | -0/+1
|
* Teach dump.c about CVf_HASEVAL | Father Chrysostomos | 2012-06-20 | 1 | -0/+1
|
* eliminate RExC_seen_evals and RExC_rx->seen_evals | David Mitchell | 2012-06-13 | 1 | -2/+0
|
| these were used as part of the old "use re 'eval'" security
| mechanism used by the now-eliminated PL_reginterp_cnt
* add PMf_IS_QR flag | David Mitchell | 2012-06-13 | 1 | -1/+2
|
| This indicates that a particular PMOP is in fact OP_QR. We should of
| course be able to tell this from op_type, but the regex-compiling API
| only gets passed op_flags.
|
| This then allows us to fix a bug where we were deciding during
| compilation whether to hang on to the code_blocks based on whether the
| PMOP was PMf_HAS_CV rather than PMf_IS_QR; the latter implies the
| former, but not the other way round.
* fix dumping of PMf_CODELIST_PRIVATE flag | David Mitchell | 2012-06-13 | 1 | -1/+1
|
* add PMf_CODELIST_PRIVATE flag | David Mitchell | 2012-06-13 | 1 | -2/+8
|
| This indicates that the op_code_list field in a PMOP is "private";
| that is, it points to a list of DO blocks that we don't own, and
| shouldn't free, and whose pad may not match ours.
|
| This will allow us to use the op_code_list field in the runtime case
| of literal code, e.g. /$runtime(?{...})/ and qr/$runtime(?{...})/.
| Here, at compile-time, we need to make the pre-compiled (?{..}) blocks
| available to pp_regcomp, but the list containing those blocks is also
| the list that is executed in the lead-up to executing pp_regcomp
| (while skipping the DO blocks), so the code is already embedded, and
| doesn't need freeing. Furthermore, in the qr// case, the code blocks
| are actually within a different sub (an anon one) than the PMOP, so
| the pads won't match.
* make qr/(?{})/ behave with closures | David Mitchell | 2012-06-13 | 1 | -0/+3
|
| With this commit, qr// with a literal (compile-time) code block
| will Do the Right Thing as regards closures and the scope of lexical
| vars; in particular, the following now creates three regexes that
| match 0, 1 and 2:
|
|     for my $i (0..2) {
|         push @r, qr/^(??{$i})$/;
|     }
|     "1" =~ $r[1]; # matches
|
| Previously, $i would be evaluated as undef in all 3 patterns.
|
| This is achieved by wrapping the compilation of the pattern within a
| new anonymous CV, which is then attached to the pattern. At run-time
| pp_qr() clones the CV as well as copying the REGEXP; and when the code
| block is executed, it does so using the pad of the cloned CV.
| Which makes everything come out all right in the wash.
|
| The CV is stored in a new field of the REGEXP, called qr_anoncv.
|
| Note that run-time qr//s are still not fixed, e.g. qr/$foo(?{...})/;
| nor is it yet fixed where the qr// is embedded within another pattern:
| continuing with the code example from above,
|
|     my $i = 99;
|     "1"   =~ $r[1];    # bare qr:     matches: correct!
|     "X99" =~ /X$r[1]/; # embedded qr: matches: whoops, it's still
|                        # seeing the wrong $i
* Mostly complete fix for literal /(?{..})/ blocks | David Mitchell | 2012-06-13 | 1 | -0/+4
|
| Change the way that code blocks in patterns are parsed and executed,
| especially as regards lexical and scoping behaviour.
|
| (Note that this fix only applies to literal code blocks appearing
| within patterns: run-time patterns, and literals within qr//, are
| still done the old broken way for now).
|
| This change means that for literal /(?{..})/ and /(??{..})/:
|
| * the code block is now fully parsed in the same pass as the
|   surrounding code, which means that the compiler no longer just does
|   a simplistic count of balancing {} to find the limits of the code
|   block; i.e. stuff like /(?{ $x = "{" })/ now works (in the same way
|   that subscripts in double quoted strings always have: "$a{'{'}" )
|
| * Error and warning messages will now appear to emanate from the main
|   body rather than an re_eval; e.g. the output from
|
|       #!/usr/bin/perl
|       /(?{ warn "boo" })/
|
|   has changed from
|
|       boo at (re_eval 1) line 1.
|
|   to
|
|       boo at /tmp/p line 2.
|
| * scope and closures now behave as you might expect; for example
|
|       for my $x (qw(a b c)) { "" =~ /(?{ print $x })/ }
|
|   now prints "abc" rather than ""
|
| * with recursion, it now finds the lexical within the appropriate
|   depth of pad: this code now prints "012" rather than "000":
|
|       sub recurse {
|           my ($n) = @_;
|           return if $n > 2;
|           "" =~ /^(?{print $n})/;
|           recurse($n+1);
|       }
|       recurse(0);
|
| * an earlier fix that stopped 'my' declarations within code blocks
|   causing crashes, required the accumulating of two SAVECOMPPADs on
|   the stack for each iteration of the code block; this is no longer
|   needed;
|
| * UNITCHECK blocks within literal code blocks are now run as part of
|   the main body of code (run-time code blocks still trigger an
|   immediate call to the UNITCHECK block though)
|
| This is all achieved by building upon the efforts of the commits which
| led up to this; those altered the parser to parse literal code blocks
| directly, but up until now those code blocks were discarded by
| Perl_pmruntime and the block re-compiled using the original re_eval
| mechanism. As of this commit, for the non-qr and non-runtime variants,
| those code blocks are no longer thrown away. Instead:
|
| * the LISTOP generated by the parser, which contains all the code
|   blocks plus OP_CONSTs that collectively make up the literal pattern,
|   is now stored in a new field in PMOPs, called op_code_list. For
|   example in /A(?{BLOCK})C/, the listop stored in op_code_list looks
|   like
|
|       LIST
|           PUSHMARK
|           CONST['A']
|           NULL/special (aka a DO block)
|               BLOCK
|           CONST['(?{BLOCK})']
|           CONST['C']
|
| * each of the code blocks has its last op set to null and is
|   individually run through the peephole optimiser, so each one becomes
|   a little self-contained block of code, rather than a list of blocks
|   that run into each other;
|
| * then in re_op_compile(), we concatenate the list of CONSTs to
|   produce a string to be compiled, but at the same time we note any
|   DO blocks and note the start and end positions of the corresponding
|   CONST['(?{BLOCK})'];
|
| * (if the current regex engine isn't the built-in perl one, then we
|   just throw away the code blocks and pass the concatenated string to
|   the engine)
|
| * then during regex compilation, whenever we encounter a '(?{', we
|   see if it matches the index of one of the pre-compiled blocks, and
|   if so, we store a pointer to that block in an 'l' data slot, and use
|   the end index to skip over the text of the code body. Conversely, if
|   the index doesn't match, then we know that it's a run-time pattern
|   and (for now), compile it in the old way.
|
| * During execution, when an EVAL op is encountered, if data->what is
|   'l', then we just use the pad that was in effect when the pattern
|   was called; i.e. we use the current pad slot of the currently
|   executing CV that the pattern is embedded within.
* update the editor hints for spaces, not tabs | Ricardo Signes | 2012-05-29 | 1 | -2/+2
|
| This updates the editor hints in our files for Emacs and vim to
| request that tabs be inserted as spaces.
* Remove OPpCONST_WARNING | Father Chrysostomos | 2012-05-21 | 1 | -4/+1
|
| This was added to op.h in commit 599cee73:
|
| commit 599cee73f2261c5e09cde7ceba3f9a896989e117
| Author: Paul Marquess <paul.marquess@btinternet.com>
| Date:   Wed Jul 29 10:28:45 1998 +0100
|
|     lexical warnings; tweaks to places that didn't apply correctly
|
|     Message-Id: <9807290828.AA26286@claudius.bfsec.bt.co.uk>
|     Subject: lexical warnings patch for 5.005_50
|
|     p4raw-id: //depot/perl@1773
|
| dump.c was modified to dump it, in this commit:
|
| commit bf91b999b25fa75a3ef7a327742929592a2e7e9c
| Author: Simon Cozens <simon@netthink.co.uk>
| Date:   Sun May 13 21:20:36 2001 +0100
|
|     Op private flags
|
|     Message-ID: <20010513202036.A21896@netthink.co.uk>
|
|     p4raw-id: //depot/perl@10117
|
| But is apparently completely unused anywhere. And I want that bit.
* Use the new utf8 to code point functions | Karl Williamson | 2012-03-19 | 1 | -2/+2
|
| These functions should be used in preference to the old ones which
| can read beyond the end of the input string.
* Adjust substr offsets when using, not when creating, lvalue | Father Chrysostomos | 2011-12-04 | 1 | -0/+1
|
| When substr() occurs in potential lvalue context, the offsets are
| adjusted to the current string (negative being converted to positive,
| lengths reaching beyond the end of the string being shortened, etc.)
| as soon as the special lvalue to be returned is created.
|
| When that lvalue is assigned to, the original scalar is stringified
| once more.
|
| That implementation results in two bugs:
|
| 1) Fetch is called twice in a simple substr() assignment (except in
|    void context, due to the special optimisation of commit 24fcb59fc).
|
| 2) These two calls are not equivalent:
|
|     $SIG{__WARN__} = sub { warn "w ",shift};
|     sub myprint { print @_; $_[0] = 1 }
|     print substr("", 2);
|     myprint substr("", 2);
|
| The second one dies. The first one only warns. That’s mean. The
| error is also wrong, sometimes, if the original string is going to
| get longer before the substr lvalue is actually used.
|
| The behaviour of \substr($str, -1) if $str changes length is
| completely undocumented. Before 5.10, it was documented as being
| unreliable and subject to change.
|
| What this commit does is make the lvalue returned by substr remember
| the original arguments and only adjust the offsets when the
| assignment happens.
|
| This means that the following now prints z, instead of xyz (which is
| actually what I would expect):
|
|     $str = "a";
|     $substr = \substr($str,-1);
|     $str = "xyz";
|     print $substr;
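
Remembering the original arguments and resolving them only when the lvalue is used can be sketched as below. This is an illustration under assumptions, not the perl implementation: `sk_substr_args`, `sk_span` and `sk_resolve` are hypothetical, and the offset rules are simplified (any start outside the string is simply reported as a failure).

```c
#include <stddef.h>

/* Hypothetical record of substr()'s arguments exactly as written. */
typedef struct {
    long off;       /* negative counts back from the end of the string */
    long len;       /* negative leaves that many characters off the end */
    int  has_len;   /* 0 when no length argument was given */
} sk_substr_args;

typedef struct {
    int    ok;      /* 0 if the start falls outside the string */
    size_t start;
    size_t len;
} sk_span;

/* Resolve the remembered arguments against the string's length at the
 * moment of use, rather than at the moment the lvalue was created. */
static sk_span sk_resolve(sk_substr_args a, size_t cur_len)
{
    sk_span s = { 0, 0, 0 };
    long n = (long)cur_len;
    long start = a.off < 0 ? n + a.off : a.off;
    long end;

    if (start < 0 || start > n)
        return s;                       /* simplified error case */

    if (!a.has_len)
        end = n;
    else if (a.len < 0)
        end = n + a.len;
    else
        end = (a.len > n - start) ? n : start + a.len;
    if (end < start)
        end = start;

    s.ok = 1;
    s.start = (size_t)start;
    s.len = (size_t)(end - start);
    return s;
}
```

Resolving offset -1 against a one-character string gives the span (0, 1); resolving the same remembered arguments after the string has grown to three characters gives (2, 1), which is the "z" of the last example.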
* narrower localisation of PL_compcv around eval | Zefram | 2011-11-19 | 1 | -1/+1
|
| PL_compcv used to be localised around the entire string eval process,
| and hence at runtime of the evaled code would refer to the evaled
| code rather than code of a surrounding compilation. This interfered
| with the ability of string-evaled code in a BEGIN block to affect the
| surrounding compilation, in a similar way to the localisation of $^H
| and %^H that was fixed in f45b078d20. Similar to the fix there, this
| change moves the localisation of PL_compcv inside the new evalcomp
| scope. A couple of things were relying on PL_compcv to find the
| running code when in a string-eval scope; they now need to find it
| from cx->blk_eval.cv, which was already being populated.
* in op_dump() / -Dx, replace "DONE" with "NULL" | David Mitchell | 2011-10-17 | 1 | -1/+1
|
| When displaying op_next, it currently shows a null value as "DONE",
| which while meaningful on a completely compiled tree, is confusing on
| a partially-built tree, where multiple ops may have an op_next of null.
* simplify op_dump() / -Dx sequencing | David Mitchell | 2011-10-17 | 1 | -106/+17
|
| Currently, whenever we dump an op tree, we first call sequence(),
| which walks the tree, creating address => sequence# mappings in
| PL_op_sequence. Then when individual ops or op-next fields are
| displayed, the sequence is looked up.
|
| Instead, do away with the initial walk, and just map addresses on
| request. This simplifies the code.
|
| As a deliberate side-effect, it no longer assigns a seq# of zero to
| null ops. This makes it easier to work out what's going on when you
| call op_dump() during a debugging session with partially constructed
| op-trees. It also removes the ambiguity in "====> 0" as to whether
| op_next is NULL or just points to an op_null.
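
Mapping addresses to sequence numbers on request, instead of by an up-front tree walk, can be sketched as follows. The table layout and the names `seq_table`, `seq_for` and `SEQ_MAX` are hypothetical; perl stores the mapping in PL_op_sequence, whereas a fixed array is used here only to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>

#define SEQ_MAX 64                      /* arbitrary capacity for the sketch */

typedef struct {
    const void *addr[SEQ_MAX];          /* addr[i] has sequence number i+1 */
    size_t count;
} seq_table;

/* Return the sequence number for addr, assigning the next unused
 * number the first time an address is seen. Only a NULL pointer maps
 * to 0, so a displayed 0 is never confused with a real op. */
static unsigned seq_for(seq_table *t, const void *addr)
{
    size_t i;

    if (addr == NULL)
        return 0;
    for (i = 0; i < t->count; i++) {
        if (t->addr[i] == addr)
            return (unsigned)(i + 1);   /* already numbered */
    }
    assert(t->count < SEQ_MAX);         /* sketch: no growth strategy */
    t->addr[t->count++] = addr;
    return (unsigned)t->count;
}
```

Numbers are handed out in the order addresses are first requested, so a partially built tree can be dumped without any preparatory pass.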
* Resolve XS AUTOLOAD-prototype conflict | Father Chrysostomos | 2011-10-09 | 1 | -3/+8
|
| Did you know that a subroutine’s prototype can be modified with s///?
|
| Don’t look:
|
|     *AUTOLOAD = *Internals'SvREFCNT;
|     my $f = "Just another ";
|     eval{main->$f};
|     print prototype AUTOLOAD;
|     $f =~ s/Just another /Perl hacker,\n/;
|     print prototype AUTOLOAD;
|
| You did look, didn’t you? You must admit that’s creepy.
|
| The problem goes back to this:
|
| commit adb5a9ae91a0bed93d396bb0abda99831f9e2e6f
| Author: Doug MacEachern <dougm@covalent.net>
| Date:   Sat Jan 6 01:30:05 2001 -0800
|
|     [patch] xsub AUTOLOAD fix/optimization
|     Message-ID: <Pine.LNX.4.10.10101060924280.24460-100000@mojo.covalent.net>
|
|     Allow AUTOLOAD to be an xsub and allow such xsubs to avoid use of $AUTOLOAD.
|
|     p4raw-id: //depot/perl@8362
|
| which includes this:
|
| +    if (CvXSUB(cv)) {
| +        /* rather than lookup/init $AUTOLOAD here
| +         * only to have the XSUB do another lookup for $AUTOLOAD
| +         * and split that value on the last '::',
| +         * pass along the same data via some unused fields in the CV
| +         */
| +        CvSTASH(cv) = stash;
| +        SvPVX(cv) = (char *)name; /* cast to loose constness warning */
| +        SvCUR(cv) = len;
| +        return gv;
| +    }
|
| That ‘unused’ field is not unused. It’s where the prototype is
| stored. So, not only is it clobbering the prototype, it’s also
| leaking it by assigning over the top of SvPVX. Furthermore, it’s
| blindly assigning someone else’s string, which could be freed before
| it’s even used.
|
| Since it has been documented for a long time that SvPVX contains the
| name of the AUTOLOADed sub, and since the use of SvPVX for prototypes
| is documented nowhere, we have to preserve the former. So this commit
| makes the prototype and the sub name share the same buffer, in a
| manner resembling that which CvFILE used before I changed it with
| bad4ae38.
|
| There are two new internal macros, CvPROTO and CvPROTOLEN for
| retrieving the prototype.
* make SVs_PADTMP and SVs_PADSTALE share a bit | David Mitchell | 2011-10-07 | 1 | -2/+4
|
| SVs_PADSTALE is only meaningful with SVs_PADMY, while
| SVs_PADTMP is only meaningful with !SVs_PADMY,
| so let them share the same flag bit.
|
| Note that this doesn't yet free a bit in SvFLAGS, as the two
| bits are also used for SVpad_STATE, SVpad_TYPED.
|
| (This is a follow-on to 62bb6514085e5eddc42b4fdaf3713ccdb7f1da85.)
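
How two flags can share one bit when a third flag disambiguates them can be sketched like this. The `SK_*` constants and their bit values are hypothetical and do not reflect perl's actual SvFLAGS layout.

```c
/* Hypothetical bit values, chosen only for the sketch. */
#define SK_PADMY     0x1u
#define SK_PADTMP    0x2u
#define SK_PADSTALE  SK_PADTMP          /* deliberately the same bit */

/* The shared bit means "pad temporary" only when PADMY is clear. */
static int sk_is_padtmp(unsigned flags)
{
    return (flags & (SK_PADMY | SK_PADTMP)) == SK_PADTMP;
}

/* The shared bit means "stale lexical" only when PADMY is set. */
static int sk_is_padstale(unsigned flags)
{
    return (flags & (SK_PADMY | SK_PADSTALE)) == (SK_PADMY | SK_PADSTALE);
}
```

Every test of the shared bit has to check the disambiguating flag too, which is why the dumper needs to consult both before printing a flag name.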
* remove index offsetting ($[) | Zefram | 2011-09-09 | 1 | -3/+0
|
| $[ remains as a variable. It no longer has compile-time magic.
| At runtime, it always reads as zero, accepts a write of zero, but dies
| on writing any other value.
* [perl #97088] Prevent double get-magic in various cases | Gerard Goossen | 2011-08-24 | 1 | -4/+0
|
| This patch prevents get-magic from executing twice during
| autovivification when the op doing the autovivification is not
| directly nested inside the dereferencing op.
|
| This can happen in cases like this:
|
|     ${ (), $a } = 1;
|
| Previously (as of 5.13.something), the outer op was marked with the
| OPpDEREFed flag, which indicated that get-magic had already been
| called by the vivifying op (calling get-magic during vivification is
| inevitable):
|
| $ perl5.14.0 -MO=Concise -e '${ $a } = 1'
| 8  <@> leave[1 ref] vKP/REFC ->(end)
| 1     <0> enter ->2
| 2     <;> nextstate(main 2 -e:1) v:{ ->3
| 7     <2> sassign vKS/2 ->8
| 3        <$> const[IV 1] s ->4
| 6        <1> rv2sv sKRM*/DREFed,1 ->7     <-- right here
| -           <@> scope sK ->6
| -              <0> ex-nextstate v ->4
| 5              <1> rv2sv sKM/DREFSV,1 ->6
| 4                 <#> gv[*a] s ->5
| -e syntax OK
|
| But in the ${()...} example above, there is a list op in the way that
| prevents the flag from being set inside the peephole optimizer. It’s
| not even possible to set it correctly in all cases, as in this
| example, which would need it both set and not set depending on which
| branch of the ternary operator is executed:
|
|     ${ $x ? delete $a[0] : $a[0] } = 1
|
| Instead of setting the OPpDEREFed flag, we now make a non-magic copy
| of the SV in vivify_ref (the first time get-magic is executed).
* [perl #96126] Allocate CvFILE more simply | Father Chrysostomos | 2011-08-17 | 1 | -0/+1
|
| See the thread starting at:
| http://www.nntp.perl.org/group/perl.perl5.porters/2011/07/msg175161.html
|
| Instead of assuming that only Perl subs have mallocked CvFILEs and
| only under threads, resulting in various hackery to borrow parts of
| the SvPVX buffer where that assumption proves wrong, we can simply
| add another flag (DYNFILE) to indicate whether CvFILE is mallocked,
| instead of trying to use the ISXSUB flag for two purposes.
|
| This simplifies the code greatly, eliminating bug #96126 in the
| process (which had to do with sv_dup not knowing about the hackery
| that this commit removes).
|
| I removed that comment from cv_ckproto_len about CONSTSUBs doubling
| up the buffer field, as it is no longer relevant. But I still left
| the code as it is, since it’s better to do an explicit length check.
* Remove OPpENTERSUB_NOMOD. | Gerard Goossen | 2011-08-15 | 1 | -3/+0
|
| OPpENTERSUB_NOMOD was always set in combination with OPf_WANT_VOID
| which is now used to not propagate the lvalue context, making
| OPpENTERSUB_NOMOD redundant.
* [perl #85026] Iterate hashes by hand during do_sv_dump | Ton Hospel | 2011-06-11 | 1 | -12/+24
|
| A further note: while debugging this issue it was annoying that
| Devel::Peek::Dump doesn't actually dump the HASH elements when an
| iterator is active. Also added is a patch that does the iteration to
| dump the HASH contents by iterating over it by hand (not disturbing
| any active iterator). With that it also doesn't activate hash magic
| during iteration, which I think is a feature.
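
Iterating "by hand" means walking the bucket array directly instead of using the hash's own iterator, so an iteration already in progress is left alone. The sketch below uses a hypothetical chained-bucket layout (`sk_entry`, `sk_hash`) that only loosely resembles perl's HV, and `sk_keys_by_hand` is an invented name.

```c
#include <stddef.h>

typedef struct sk_entry {
    struct sk_entry *next;              /* next entry in the same bucket */
    const char *key;
} sk_entry;

typedef struct {
    sk_entry **buckets;
    size_t nbuckets;
    size_t iter_bucket;                 /* state of an active iterator */
    sk_entry *iter_entry;
} sk_hash;

/* Collect up to max keys by walking every bucket chain directly.
 * The hash is taken as const, so the iterator state cannot change.
 * Returns the total number of keys found (which may exceed max). */
static size_t sk_keys_by_hand(const sk_hash *h, const char **out, size_t max)
{
    size_t n = 0, b;

    for (b = 0; b < h->nbuckets; b++) {
        const sk_entry *e;
        for (e = h->buckets[b]; e != NULL; e = e->next) {
            if (n < max)
                out[n] = e->key;
            n++;
        }
    }
    return n;
}
```

Taking the hash by const pointer makes the "not disturbing any active iterator" property visible in the function signature.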
* Revert "Perl_do_sv_dump: alert when skipping elements" | Father Chrysostomos | 2011-06-11 | 1 | -27/+22
|
| This reverts commit 002beaef76a1595af2e39ffd4cd55c595bd6c271.
|
| I am about to apply the manual-iteration patch from ticket #85026.
| It conflicts with 002beaef, but it also renders 002beaef unnecessary.
* Generate magic_names in dump.c using mg_vtable.pl. | Nicholas Clark | 2011-06-11 | 1 | -43/+1
|
* Provide the names of the magic vtables in PL_magic_vtable_names[]. | Nicholas Clark | 2011-06-11 | 1 | -33/+5
|
| As it's a 1 to 1 mapping with the vtables in PL_magic_vtables[],
| refactor Perl_do_magic_dump() to index into it directly to find the
| name for an arbitrary mg_virtual, avoiding a long switch statement.
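
Replacing a switch with a direct index into a parallel name table can be sketched as follows. The arrays here are stand-ins: `sk_vtable`, `sk_vtables` and `sk_vtable_names` are hypothetical, and the three names are merely illustrative entries rather than the generated table.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { int unused; } sk_vtable;   /* stand-in for a magic vtable */

/* Two parallel arrays: entry i of the names describes entry i of the
 * vtables, so a pointer into the first array yields an index into the
 * second. */
static const sk_vtable sk_vtables[] = { { 0 }, { 0 }, { 0 } };
static const char *const sk_vtable_names[] = { "sv", "env", "envelem" };

#define SK_NVTABLES (sizeof sk_vtables / sizeof sk_vtables[0])

/* Return the name for v if it points into sk_vtables, else NULL.
 * The range test goes through uintptr_t because v may be unrelated
 * to the array. */
static const char *sk_vtable_name(const sktable_placeholder_unused_guard);
```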
* Replace references to PL_vtbl_{bm,fm} in the code with PL_vtbl_regexp. | Nicholas Clark | 2011-06-11 | 1 | -2/+0
|
| Also, in Perl_sv_magic() merge the case for PERL_MAGIC_dbfile with
| the others that return a NULL vtable.
* Abolish PL_vtbl_sig. It's been all 0s since it was added in 5.0 alpha 2. | Nicholas Clark | 2011-06-11 | 1 | -1/+0
|
| Magic with a NULL vtable is equivalent to magic with a vtable of all
| 0s. On CPAN, only Apache::Peek's code for 5.005 is referencing it.