delta/perl.git - github.com: perl/perl5.git

	Commit message	Author	Age	Files	Lines
*	[perl #117855] Store CopFILEGV in a pad under ithreads	Father Chrysostomos	2013-08-05	1	-0/+3
	This saves having to allocate a separate string buffer for every cop (control op; every statement has one). Under non-threaded builds, every cop has a pointer to the GV for that source file, namely *{"_<filename"}. Under threaded builds, the name of the GV used to be stored instead. Now we store an offset into the per-interpreter PL_filegvpad, which points to the GV. This makes no significant speed difference, but it reduces memory usage.
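A minimal C sketch of the idea. It makes no claim about perl's real data structures: `FileTable` stands in for PL_filegvpad, `Cop` for a cop, and plain strings for the `*{"_<filename"}` GVs. All of these names are hypothetical. Each distinct file is stored once, and every cop keeps only a small integer offset into the table instead of its own copy of the name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for PL_filegvpad: one slot per distinct source file.
 * Each string plays the role of a *{"_<filename"} GV. */
typedef struct {
    char  **names;
    size_t  count;
    size_t  cap;
} FileTable;

/* Hypothetical cop: it records an offset into the table, not its own string. */
typedef struct {
    size_t   file_off;
    unsigned line;
} Cop;

/* Return the table offset for `name`, adding the name once if it is new.
 * Returns (size_t)-1 if memory cannot be allocated. */
static size_t file_table_intern(FileTable *t, const char *name) {
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->names[i], name) == 0)
            return i;                     /* shared by every cop in that file */

    if (t->count == t->cap) {
        size_t ncap = t->cap ? t->cap * 2 : 8;
        char **grown = realloc(t->names, ncap * sizeof *grown);
        if (!grown)
            return (size_t)-1;
        t->names = grown;
        t->cap = ncap;
    }
    size_t len = strlen(name) + 1;
    char *copy = malloc(len);
    if (!copy)
        return (size_t)-1;
    memcpy(copy, name, len);
    t->names[t->count] = copy;
    return t->count++;
}

/* Follow a cop's offset back to its file name. */
static const char *cop_file(const FileTable *t, const Cop *c) {
    assert(c->file_off < t->count);
    return t->names[c->file_off];
}

static void file_table_free(FileTable *t) {
    for (size_t i = 0; i < t->count; i++)
        free(t->names[i]);
    free(t->names);
    t->names = NULL;
    t->count = t->cap = 0;
}
```

The linear search only keeps the sketch short. It is not meant to reflect how perl locates the GV.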
*	bump version to v5.19.3	Aristotle Pagaltzis	2013-07-22	1	-1/+1
*	-DPERL_TRACE_OPS to produce reports on executed OP counts	Steffen Mueller	2013-07-02	1	-0/+9
	This produces a report, at the end of a program run, on the number of OPs of each type that were executed. This can be useful in multiple ways. First, it can help determine hotspots for optimization (yes, I know execution count is not equal to execution time). It can also help determine whether a given change to perl has had the desired effect on deterministic programs.
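A rough sketch of how per-op-type counting can work. It is written as a toy runloop, not perl's real one. The op types, `op_counts` and `report_op_counts()` are hypothetical names. The point is that tracing costs one array increment per executed op, and the report is printed once at the end.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical op types; a real interpreter would have hundreds. */
enum OpType { OP_NULL, OP_CONST, OP_ADD, OP_PRINT, OP_MAX_TYPE };
static const char *const op_names[OP_MAX_TYPE] = { "null", "const", "add", "print" };

/* A toy op: its type plus the next op to run. */
typedef struct Op {
    enum OpType      type;
    const struct Op *next;
} Op;

/* One counter per op type. */
static unsigned long op_counts[OP_MAX_TYPE];

static void run_ops(const Op *op) {
    for (; op != NULL; op = op->next) {
        assert(op->type < OP_MAX_TYPE);
        op_counts[op->type]++;       /* the only extra work when tracing is on */
        /* ... a real runloop would execute the op here ... */
    }
}

/* Print the non-zero counters, as an end-of-run report would. */
static void report_op_counts(FILE *out) {
    for (int t = 0; t < OP_MAX_TYPE; t++)
        if (op_counts[t])
            fprintf(out, "%-8s %lu\n", op_names[t], op_counts[t]);
}
```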
*	SV_CONST(name) and PL_sv_consts	Ruslan Zakirov	2013-06-30	1	-0/+2
	SV_CONST(XXX) returns an SV* that contains the string "XXX". The SVs are built on demand and stored in the interpreter's structure for re-use, and all of them have a precomputed hash value. They are created on demand because we don't want 35 SVs created at compile time or cloned during thread creation.
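A sketch of the build-on-first-use pattern described above. All names are hypothetical (`ConstId`, `Interp`, `sv_const()`), and a plain C struct stands in for an SV. One-at-a-time is used as the example hash only because it is short. Nothing here claims that SV_CONST uses it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constant ids and their text. */
enum ConstId { CONST_EXISTS, CONST_DELETE, CONST_FETCH, CONST_COUNT };
static const char *const const_text[CONST_COUNT] = { "EXISTS", "DELETE", "FETCH" };

typedef struct {
    const char *pv;
    size_t      len;
    uint32_t    hash;   /* computed once, when the entry is built */
} ConstStr;

/* Hypothetical interpreter struct: one cache slot per constant, NULL until used. */
typedef struct {
    ConstStr *consts[CONST_COUNT];
    ConstStr  storage[CONST_COUNT];
} Interp;

/* One-at-a-time hash, used here only as an example hash function. */
static uint32_t oaat_hash(const char *s, size_t len) {
    uint32_t h = 0;
    for (size_t i = 0; i < len; i++) {
        h += (unsigned char)s[i];
        h += h << 10;
        h ^= h >> 6;
    }
    h += h << 3;
    h ^= h >> 11;
    h += h << 15;
    return h;
}

/* Build the entry on first request, then return the cached one. */
static const ConstStr *sv_const(Interp *interp, enum ConstId id) {
    assert(id < CONST_COUNT);
    if (interp->consts[id] == NULL) {
        ConstStr *c = &interp->storage[id];
        c->pv   = const_text[id];
        c->len  = strlen(c->pv);
        c->hash = oaat_hash(c->pv, c->len);
        interp->consts[id] = c;
    }
    return interp->consts[id];
}
```

Because nothing is built until it is asked for, constants that a program never uses cost nothing at start-up and nothing to clone.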
*	bump version to v5.19.2	David Golden	2013-06-20	1	-1/+1
*	better comment the remaining PL_ regex vars	David Mitchell	2013-06-02	1	-3/+4
*	eliminate PL_regdummy	David Mitchell	2013-06-02	1	-2/+0
	This global (per-interpreter) var is just used during regex compilation as a placeholder to point RExC_emit at during the first (non-emitting) pass, to indicate not to emit anything. There's no need for it to be a global var: just add it as an extra field in the RExC_state_t struct instead.
*	eliminate PL_reg_state	David Mitchell	2013-06-02	1	-2/+0
	This is a struct that holds all the global state of the current regex match. The previous set of commits have gradually removed all the fields of this struct (by making things local rather than global state). Since the struct is now empty, the PL_reg_state var can be removed, along with the SAVEt_RE_STATE save type which was used to save and restore those fields on recursive re-entry to the regex engine.
*	make PL_reg_curpm global	David Mitchell	2013-06-02	1	-0/+3
	Currently PL_reg_curpm is actually #defined to a field within PL_reg_state; promote it into a fully autonomous perl-interpreter variable. PL_reg_curpm points to a fake PMOP to which PL_curpm is temporarily pointed, and off which we can hang the current regex, so that this works:
	    "a" =~ /^(.)(?{ print $1 })/ # prints 'a'
	It turns out that it doesn't need to be saved and restored when we recursively enter the regex engine; that is already handled by saving and restoring which regex is currently attached to PL_reg_curpm. So we just need a single global (per-interpreter) placeholder. Since we're shortly going to get rid of PL_reg_state, we need to move it out of that struct.
*	bump version to 5.19.1	Ricardo Signes	2013-05-20	1	-1/+1
*	bump version to 5.19.0	Ricardo Signes	2013-05-18	1	-1/+1
*	Make it possible to disable and control hash key traversal randomization	Yves Orton	2013-05-07	1	-0/+5
	Adds support for the PERL_PERTURB_KEYS environment variable, which in turn allows one to control the level of randomization applied to keys() and friends.
	When PERL_PERTURB_KEYS is 0 we will not randomize key order at all. The chance that keys() changes due to an insert will be the same as in previous perls, basically only when the bucket size is changed.
	When PERL_PERTURB_KEYS is 1 we will randomize keys in a non-repeatable way. The chance that keys() changes due to an insert will be very high. This is the most secure and default mode.
	When PERL_PERTURB_KEYS is 2 we will randomize keys in a repeatable way. Repeated runs of the same program should produce the same output every time. The chance that keys() changes due to an insert will be very high.
	This patch also makes PERL_HASH_SEED imply a non-default PERL_PERTURB_KEYS setting. Setting PERL_HASH_SEED=0 (exactly one 0) implies PERL_PERTURB_KEYS=0 (hash key randomization disabled); setting PERL_HASH_SEED to any other value implies PERL_PERTURB_KEYS=2 (deterministic/repeatable hash key randomization). Specifying PERL_PERTURB_KEYS explicitly to a different level overrides this behavior.
	Includes changes to allow one to compile out various aspects of the patch. One can compile such that PERL_PERTURB_KEYS is not respected, or can compile without hash key traversal randomization at all. Note that support for these modes is incomplete, and currently a few tests will fail.
	Also includes a new subroutine, Hash::Util::hash_traversal_mask(), which can be used to ensure a given hash produces a predictable key order (assuming the same hash seed is in effect). This sub acts as a getter and a setter.
	NOTE - this patch lacks tests, but I lack tuits to get them done quickly, so I am pushing this with the hope that others can add them afterwards.
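The precedence rules above can be summarized as a small decision function. This is a sketch: `effective_perturb_keys()` is a hypothetical name. It recognizes only the digits 0, 1 and 2 for PERL_PERTURB_KEYS and ignores any other spellings that perl itself may accept.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The three levels described in the commit message. */
enum PerturbKeys {
    PERTURB_NONE          = 0,  /* no randomization of key order */
    PERTURB_RANDOM        = 1,  /* non-repeatable; the default */
    PERTURB_DETERMINISTIC = 2   /* randomized but repeatable */
};

/* Hypothetical helper: combine the two environment values (NULL = unset)
 * into one effective level. Only "0", "1" and "2" are recognized here. */
static enum PerturbKeys effective_perturb_keys(const char *perturb_env,
                                               const char *seed_env) {
    if (perturb_env != NULL && perturb_env[0] >= '0' && perturb_env[0] <= '2'
            && perturb_env[1] == '\0')
        return (enum PerturbKeys)(perturb_env[0] - '0');  /* explicit level wins */

    if (seed_env != NULL)
        return strcmp(seed_env, "0") == 0
            ? PERTURB_NONE            /* PERL_HASH_SEED=0, exactly one 0 */
            : PERTURB_DETERMINISTIC;  /* any other seed value */

    return PERTURB_RANDOM;            /* neither variable set */
}

/* Read the actual environment variables. */
static enum PerturbKeys perturb_keys_from_env(void) {
    enum PerturbKeys level = effective_perturb_keys(getenv("PERL_PERTURB_KEYS"),
                                                    getenv("PERL_HASH_SEED"));
    assert(level <= PERTURB_DETERMINISTIC);
    return level;
}
```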
*	Re-order intrpvar.h to minimise holes in the interpreter struct.	Nicholas Clark	2013-03-20	1	-20/+23
	Commit 19bc2726ec6be805 created 32 bytes of holes (on LP64 systems).
*	Harden hashes against hash seed discovery by randomizing hash iteration	Yves Orton	2013-03-19	1	-0/+1
	Adds:
	S_ptr_hash() - A new static function in hv.c which can be used to hash a pointer or integer.
	PL_hash_rand_bits - A new interpreter variable used as a cheap provider of "semi-random" state for use by the hash infrastructure.
	xpvhv_aux.xhv_rand - Used as a mask which is xored against xpvhv_aux.riter during iteration to randomize the order in which the actual buckets are visited.
	PL_hash_rand_bits is initialized at interpreter start from the random hash seed, and then modified by "mixing in" the result of ptr_hash() on the bucket array pointer in the hv (HvARRAY(hv)) every time hv_auxinit() allocates a new iterator structure.
	The net result is that every hash has its own iteration order, which should make it much more difficult to determine what the current hash seed is.
	This required some tests to be restructured, as they tested for something that was not necessarily true: we never guaranteed that two hashes with the same keys would produce the same key order; we merely promised that using keys(), values(), or each() on the same hash, without any insertions in between, would produce the same order of visiting the key/values.
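This sketch shows why xoring the iterator with a mask still visits every bucket. Assume the table size is a power of two (an assumption of this sketch) and `mask < nbuckets`. Then `i ^ mask` is a permutation of `0..nbuckets-1`. `bucket_at()` is a hypothetical helper, not hv.c code.

```c
#include <assert.h>
#include <stddef.h>

/* Bucket visited at iteration step `riter` for a table of `nbuckets`
 * buckets, scrambled by per-hash random bits (the role xhv_rand plays).
 * Assumes nbuckets is a power of two, so masking keeps the result in range
 * and riter -> riter ^ mask is a permutation of 0..nbuckets-1. */
static size_t bucket_at(size_t riter, size_t nbuckets, size_t rand_bits) {
    assert(nbuckets != 0 && (nbuckets & (nbuckets - 1)) == 0);
    assert(riter < nbuckets);
    size_t mask = rand_bits & (nbuckets - 1);
    return riter ^ mask;
}
```

Calling `bucket_at(i, 8, bits)` for `i` from 0 to 7 yields each of the eight buckets exactly once. The order depends on `bits`, which is why two hashes with the same keys can iterate differently.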
*	reorder intrpvar.h	David Mitchell	2013-03-09	1	-109/+117
	Move more of the more commonly-used PL_ variables towards the front of the file (and thus to the top of the interpreter struct on MULTIPLICITY builds). This helps ensure that "hot" variables are clustered together on the same small number of cache lines, and also that the machine code to load them will have shorter offsets, which on some architectures may be achieved with shorter instructions. The "hotness" has been determined purely by my subjective judgement rather than any profiling. It's still open for the latter to be done. (Only simple shunting of whole lines has been done; no changes have been made to individual lines.)
*	Prepare PL_sv_objcount removal	Steffen Mueller	2013-03-06	1	-1/+3
	This used to keep track of all objects. At least by now, that is for no particularly good reason. Just because it could avoid a bit of work during global destruction if no objects remained. Let's do less work at run-time instead. The interpreter global will remain for one deprecation cycle.
*	Use native-size integers for some global counters	Steffen Mueller	2013-02-27	1	-3/+3
	It may be unlikely that a Perl program will hit 2 billion SVs, but by the time that 5.18 is ancient history, it's looking a lot more likely. This makes two global counters use native-size ints. I'm preserving signedness just for hysterical raisins: It might be deliberate.
*	Rename PL_interp_size_5_16_0 to PL_interp_size_5_18_0.	Nicholas Clark	2013-02-19	1	-2/+2
*	Re-order intrpvar.h to minimise holes in the interpreter struct.	Nicholas Clark	2013-02-19	1	-4/+6
	Holes were created by commit f59909ab8dad6ceb (April 2012) which removed PL_reginterp_cnt, commit 7dc8663964c66a69 (Nov 2012) which removed PL_rehash_seed_set, and commit 8936b48a49448f4e (Dec 2012) which removed PL_glob_index. There is still an unavoidable U16 sized hole on the default threaded configuration on x86_64. (U8 if PERL_SAWAMPERSAND is defined).
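The following illustrates where such holes come from, using two hypothetical structs with the same fields in different orders. On a typical LP64 ABI, the first struct gets padding after each one-byte field so that the following pointer stays aligned. The second struct groups the small fields at the end. Exact sizes depend on the ABI, so the sketch computes the padding rather than hard-coding it. Tools such as pahole can report the holes in a real build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interpreter fields in an arbitrary order: each 1-byte field
 * is followed by padding so that the next pointer is aligned. */
struct interp_unordered {
    uint8_t  flag_a;
    void    *ptr_a;
    uint8_t  flag_b;
    void    *ptr_b;
    uint16_t small;
};

/* The same fields, widest first, so the small ones share one aligned slot. */
struct interp_ordered {
    void    *ptr_a;
    void    *ptr_b;
    uint16_t small;
    uint8_t  flag_a;
    uint8_t  flag_b;
};

/* Bytes actually used by the fields, independent of their order. */
#define FIELD_BYTES (2 * sizeof(void *) + sizeof(uint16_t) + 2 * sizeof(uint8_t))

/* Padding (holes plus tail padding) the compiler added to each layout. */
static size_t padding_unordered(void) { return sizeof(struct interp_unordered) - FIELD_BYTES; }
static size_t padding_ordered(void)   { return sizeof(struct interp_ordered) - FIELD_BYTES; }

/* Reordering should never need more padding for these fields. */
static void check_layout(void) {
    assert(padding_ordered() <= padding_unordered());
}
```

On x86_64 Linux, for example, the first struct is 40 bytes and the second is 24, although both hold the same 20 bytes of fields.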
*	regex: Add pseudo-Posix class: 'cased'	Karl Williamson	2012-12-31	1	-2/+0
	/[[:upper:]]/i and /[[:lower:]]/i should match the Unicode property \p{Cased}. This commit introduces a pseudo-Posix class, internally named 'cased', to represent this. This class isn't specifiable by the user, except through using either /[[:upper:]]/i or /[[:lower:]]/i. Debug output will say ':cased:'. The regex parsing of either :lower: or :upper: will change them into :cased:, where already existing logic can handle this, just like any other class. This commit fixes the regression introduced in 3018b823898645e44b8c37c70ac5c6302b031381, and also the fact that these have never worked under 'use locale'. The next commit will un-TODO the tests for these things.
*	handy.h: Add full complement of isIDCONT() macros	Karl Williamson	2012-12-23	1	-0/+1
	This also changes isIDCONT_utf8() to use the Perl definition, which excludes any \W characters (the Unicode definition includes a few of these). Tests are also added. These macros remain undocumented for now.
*	Use an array for some inversion lists	Karl Williamson	2012-12-22	1	-6/+2
	Previous commits have placed some inversion list pointers into arrays. This commit extends that to another group of inversion lists.
*	Use an array for some inversion lists	Karl Williamson	2012-12-22	1	-29/+2
	An earlier commit placed some inversion list pointers into an array. This commit extends that to another group of inversion lists.
*	Use array for some inversion lists	Karl Williamson	2012-12-22	1	-8/+1
	This patch creates an array pointing to the inversion lists that cover the Latin-1 ranges for Posix character classes, and uses it instead of the individual variables previously referred to.
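For readers unfamiliar with inversion lists, here is a minimal membership test, plus an array of lists indexed by class number in the spirit of these commits. The lists here cover ASCII only. The names (`invlist_contains()`, `posix_lists`, `CC_ALPHA`, ...) are hypothetical, and perl's real lists and index constants differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* An inversion list is a sorted array of code points at which set
 * membership flips. Elements at even indices start ranges that are in the
 * set; elements at odd indices start ranges that are not. */
static int invlist_contains(const uint32_t *list, size_t len, uint32_t cp) {
    size_t lo = 0, hi = len;
    while (lo < hi) {                  /* count the elements <= cp */
        size_t mid = lo + (hi - lo) / 2;
        if (list[mid] <= cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo % 2 == 1;                /* odd count: last boundary opened a range */
}

/* Hypothetical class numbers and ASCII-only lists, gathered in one array so
 * code can index by class instead of naming a separate variable per class. */
enum { CC_ALPHA, CC_DIGIT, CC_COUNT };
typedef struct { const uint32_t *list; size_t len; } InvList;

static const uint32_t ascii_alpha[] = { 0x41, 0x5B, 0x61, 0x7B };  /* A-Z, a-z */
static const uint32_t ascii_digit[] = { 0x30, 0x3A };              /* 0-9 */

static const InvList posix_lists[CC_COUNT] = {
    [CC_ALPHA] = { ascii_alpha, sizeof ascii_alpha / sizeof ascii_alpha[0] },
    [CC_DIGIT] = { ascii_digit, sizeof ascii_digit / sizeof ascii_digit[0] },
};

static int is_posix_class(int classnum, uint32_t cp) {
    assert(classnum >= 0 && classnum < CC_COUNT);
    return invlist_contains(posix_lists[classnum].list, posix_lists[classnum].len, cp);
}
```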
*	intrpvar.h: Place some swash pointers in an array	Karl Williamson	2012-12-22	1	-9/+1
*	intrpvar.h: #include handy.h	Karl Williamson	2012-12-22	1	-0/+2
	This will allow some mnemonics to be used in future commits.
*	regexec.c: More efficient Korean \X processing	Karl Williamson	2012-12-16	1	-1/+0
	This refactors the code slightly that checks for Korean precomposed syllables in \X. It eliminates the PL_ variable formerly used to keep track of things.
*	Zap PL_glob_index	Father Chrysostomos	2012-12-09	1	-2/+0
	As of the previous commit, nothing is using it.
*	Add functions for getting ctype ALNUMC	Karl Williamson	2012-12-09	1	-0/+1
	We think this is meant to stand for C's alphanumeric, that is, what is matched by POSIX [:alnum:]. There were no functions or dedicated swash available for accessing it. Future commits will want to use these.
*	intrpvar.h: Add comment	Karl Williamson	2012-12-09	1	-1/+1
*	intrpvar.h: Use #define instead of hard-coded number	Karl Williamson	2012-12-09	1	-1/+1
	Otherwise it is a mystery why the number 12 is being used.
*	Disable PL_sawampersand	Father Chrysostomos	2012-11-27	1	-0/+2
	PL_sawampersand actually causes bugs (e.g., perl #4289), because the behaviour changes. eval '$&' after a match will produce different results depending on whether $& was seen before the match. Using copy-on-write for the pre-match copy (preceding patches do that) alleviates the slowdown caused by mentioning $&. The copy doesn’t happen unless the string is modified after the match. It’s now a post-match copy. So we no longer need to do things differently depending on whether $& has been seen. PL_sawampersand is now #defined to be equal to what it would be if every program began with $',$&,$`. I left the PL_sawampersand code in place, in case this commit proves immature. Running Configure with -Accflags=PERL_SAWAMPERSAND will reënable the PL_sawampersand mechanism.
*	Remove 3 unused interpreter variables	Karl Williamson	2012-11-26	1	-3/+0
	These variables have been unused in the Perl core since commit 4c88d5e0740d796bf5064336d280bba72897f385. The variables are undocumented. The only real use of any of these I found in CPAN is at https://metacpan.org/source/ABERGMAN/Devel-GC-Helper-0.25/Helper.xs#L1 The uses there appear to be in a list of known Perl variables. Since the module was published, more than a few new variables have been added, making this code obsolete anyway.
*	Hash Function Change - Murmur hash and true per process hash seed	Yves Orton	2012-11-17	1	-5/+1
	This patch does the following:
	*) Introduces multiple new hash functions to choose from at build time. This includes Murmur-32, SDBM, DJB2, SipHash, SuperFast, and One-at-a-time. Currently this is handled by munging hv.h. Configure support hopefully to follow.
	*) Changes the default hash to Murmur hash, which is faster than the old default One-at-a-time.
	*) Rips out the old HvREHASH mechanism and replaces it with a per-process random hash seed.
	*) Changes the old PL_hash_seed from an interpreter value to a global variable. This means it does not have to be copied during interpreter setup or cloning.
	*) Changes the format of the PERL_HASH_SEED variable to a hex string so that hash seeds longer than fit in an integer are possible.
	*) Changes the return of Hash::Util::hash_seed() from a number to a string. This is to accommodate hash functions which have more bits than can be fit in an integer.
	*) Adds new functions to Hash::Util to improve introspection of hashes:
	    -) hash_value() - returns an integer hash value for a given string.
	    -) bucket_info() - returns basic hash bucket utilization info
	    -) bucket_stats() - returns more hash bucket utilization info
	    -) bucket_array() - which keys are in which buckets in a hash
	More details on the new hash functions can be found below:
	    Murmur Hash: (v3) from google, see http://code.google.com/p/smhasher/wiki/MurmurHash3
	    Superfast Hash: From Paul Hsieh. http://www.azillionmonkeys.com/qed/hash.html
	    DJB2: a hash function from Daniel Bernstein http://www.cse.yorku.ca/~oz/hash.html
	    SDBM: a hash function sdbm. http://www.cse.yorku.ca/~oz/hash.html
	    SipHash: by Jean-Philippe Aumasson and Daniel J. Bernstein. https://www.131002.net/siphash/
	They have all been converted into Perl's ugly macro format.
	I have not done any rigorous testing to make sure this conversion is correct. They seem to function as expected, however. All of them use the random hash seed.
	You can force the use of a given function by defining one of
	    PERL_HASH_FUNC_MURMUR
	    PERL_HASH_FUNC_SUPERFAST
	    PERL_HASH_FUNC_DJB2
	    PERL_HASH_FUNC_SDBM
	    PERL_HASH_FUNC_ONE_AT_A_TIME
	Setting the environment variable PERL_HASH_SEED_DEBUG to 1 will make perl output the current seed (changed to hex) and the hash function it has been built with.
	Setting the environment variable PERL_HASH_SEED to a hex value will cause that value to be used as the seed. Any missing bits of the seed will be set to 0. The bits are filled in from left to right, not the traditional right to left, so setting it to FE results in a seed value of "FE000000", not "000000FE".
	Note that we do the hash seed initialization in perl_construct(). Doing it via perl_alloc() (via init_tls) causes problems under threaded builds, as the buffers used for reentrant srand48 functions are not allocated. See also the p5p mail "Hash improvements blocker: portable random code that doesnt depend on a functional interpreter", Message-ID: <CANgJU+X+wNayjsNOpKRqYHnEy_+B9UH_2irRA5O3ZmcYGAAZFQ@mail.gmail.com>
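The left-to-right seed filling described above can be sketched as follows. The 4-byte seed size and the function names are assumptions for illustration. The real seed length depends on the hash function perl was built with, and this sketch accepts only bare hex digits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SEED_BYTES 4  /* assumed seed size, for illustration only */

/* Value of one hex digit, or -1 if `c` is not a hex digit. */
static int hex_digit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Fill `seed` from `hex` left to right: the first digit becomes the high
 * nibble of seed[0], the second its low nibble, and so on. Bytes with no
 * corresponding digits stay 0; digits beyond the seed size are ignored.
 * Returns 0 on success, -1 on a non-hex character. */
static int parse_seed(const char *hex, unsigned char seed[SEED_BYTES]) {
    assert(hex != NULL);
    memset(seed, 0, SEED_BYTES);
    for (size_t i = 0; hex[i] != '\0' && i < 2 * SEED_BYTES; i++) {
        int v = hex_digit(hex[i]);
        if (v < 0)
            return -1;
        seed[i / 2] |= (unsigned char)(i % 2 == 0 ? v << 4 : v);
    }
    return 0;
}
```

With this scheme, `parse_seed("FE", seed)` leaves the bytes FE 00 00 00, matching the "FE000000" example, rather than 00 00 00 FE.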
*	Validate above-Latin1 characters in \N{} aliases	Karl Williamson	2012-11-11	1	-0/+2
	This completes the process of allowing users to define their own aliases for \N{} in any language they choose. Names have some validation applied so that they can't, for example, begin with something that is a digit in some Unicode script. Tests and documentation are included in this patch. The loop in toke.c that does the validation for user-supplied translators is revamped, and the messages that are output when there is an error are fixed to work with UTF-8.
*	Used pad name lists for pad ids	Father Chrysostomos	2012-10-16	1	-1/+0
	I added pad IDs so that a pad could record which pad it closes over, to avoid problems with closures closing over the wrong pad, resulting in crashes or bizarre copies. These pad IDs were shared between clones of the same pad. In commit 9ef8d56, for efficiency I made clones of the same closure share the same pad name list. It has just occurred to me that each padlist containing the same pad name list also has the same pad ID, so we can just use the pad name list itself as the ID. This makes padlists 32 bits smaller and eliminates PL_pad_generation from the interpreter struct.
*	PATCH: [perl #89774] multi-char fold + its fold in char class	Karl Williamson	2012-10-14	1	-0/+1
	The design for handling characters that fold to multiple characters when the former are encountered in a bracketed character class is defective. The ticket reads, "If a bracketed character class includes a character that has a multi-char fold, and it also includes the first character of that fold, the multi-char fold will never be matched; just the first character of the fold.". Thus, in the class /[\0-\xff]/i, \xDF will never be matched, because its fold is 'ss', the first character of which, 's', is also in the class. The reason the design is defective is that it doesn't allow for backtracking and trying the other options. This commit solves this by effectively rewriting the above to be / (?: \xdf | [\0-\xde\xe0-\xff] ) /xi. And so the backtracking gets handled automatically by the regex engine.
*	Eliminate the vestigial comment "magical thingies" from intrpvar.h	Nicholas Clark	2012-09-20	1	-1/+0
	The original comment "magical thingies" was added to perl.c by commit 8ebc5c0145d2e355 in Jan 1997. It is above code which frees 4 interpreter-global SVs in perl_destruct. The comment "magical thingies" was in intrpvar.h since the file was created by commit 49f531dad558d800 on 29 Nov 1997. At that time, it was followed by a block of 13 relevant interpreter global variables. However, by commit d4cce5f1785350c2 (30 Nov 1997) all bar two were now in other places, mostly in thrdvar.h. With the abolition of PL_formfeed, the comment now annotates just one "magical" thingy, PL_basetime, which isn't even one of the SVs freed at the analogous location in perl.c. Hence the comment adds no value.
*	Get rid of PL_formfeed.	Enache Adrian	2012-09-20	1	-2/+0
	$^L is neither a magical variable, nor a normal one (like $;) but it's just a little bit special :) This patch removes PL_formfeed - IMHO, an extra gv_fetchpv per page when using formats isn't going to cause a sensible speed regression. I suppose that removing the intrpvar.h hunk from the patch is enough to keep binary compatibility - unless someone used PL_formfeed from an XS module. [with regen.pl run as noted by the author, and an additional change to perl.c to remove the reference to PL_formfeed added soon after this patch was sent]
*	Use macro not swash for utf8 quotemeta	Karl Williamson	2012-09-13	1	-1/+0
	The rules for determining whether an above-Latin1 code point needs quoting are now saved in a macro generated from a trie by regen/regcharclass.pl, and these are now used by pp.c to test these cases. This allows removal of a wrapper subroutine, and also there is no need for dynamic loading at run-time into a swash. This macro is about as big as I'm comfortable compiling in, but it saves the building of a hash that can grow over time, and removes a subroutine and interpreter variables. Indeed, performance benchmarks show that it is about the same speed as a hash, but it does not require having to load the rules in from disk the first time it is used.
*	regexec.c: Use new macros instead of swashes	Karl Williamson	2012-09-13	1	-7/+0
	A previous commit has caused macros to be generated that will match Unicode code points of interest to the \X algorithm. This patch uses them. This speeds up modern Korean processing by 15%. Together with recent previous commits, the throughput of modern Korean under \X has more than doubled, and is now comparable to other languages (which have themselves improved by 35%).
*	PL_sawampersand: use 3 bit flags rather than bool	David Mitchell	2012-09-08	1	-1/+1
	Set a separate flag for each of $`, $& and $'. It still works fine in boolean context. This will allow us to have more refined control over what parts of a match string to copy (we currently copy the whole string).
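This sketch shows the kind of refined control these flags allow. Given which of $`, $& and $' have been seen, it computes the smallest contiguous span of the matched string that still needs copying. The flag names and `span_to_copy()` are hypothetical. The commit itself only prepares for this, so the function below is an illustration, not perl's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag names: one bit per match variable. */
#define SAW_PREMATCH   0x1u  /* $` */
#define SAW_MATCH      0x2u  /* $& */
#define SAW_POSTMATCH  0x4u  /* $' */

typedef struct {
    size_t start;
    size_t len;
} Span;

/* Smallest contiguous part of a string of length `slen` that covers every
 * variable recorded in `saw`, for a match occupying [mstart, mend). */
static Span span_to_copy(unsigned saw, size_t slen, size_t mstart, size_t mend) {
    Span s = { 0, 0 };
    assert(mstart <= mend && mend <= slen);
    if (!saw)                      /* nonzero flags still test true as a boolean */
        return s;
    size_t from = (saw & SAW_PREMATCH) ? 0 : (saw & SAW_MATCH) ? mstart : mend;
    size_t to   = (saw & SAW_POSTMATCH) ? slen : (saw & SAW_MATCH) ? mend : mstart;
    s.start = from;
    s.len = to - from;
    return s;
}
```

If only $` and $' have been seen, the span still covers the match between them, because the sketch copies one contiguous region.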
*	Refactor \X regex handling to avoid a typical case table lookup	Karl Williamson	2012-08-28	1	-1/+1
	Prior to this commit 98.4% of Unicode code points that went through \X had to be looked up to see if they begin a grapheme cluster; then looked up again to find that they didn't require special handling. This commit refactors things so only one look-up is required for those 98.4%. It changes the table generated by mktables to accomplish this, and hence the name of it, and references to it are changed to correspond.
*	Prepare for Unicode 6.2	Karl Williamson	2012-08-26	1	-1/+2
	This changes code to be able to handle Unicode 6.2, while continuing to handle all previous releases. The major change was a new definition of \X, which adds a property to its calculation. Unfortunately \X is hard-coded into regexec.c, and so has to be revised whenever there is a change of this magnitude in Unicode, which fortunately isn't all that often. I refactored the code in mktables to make it easier next time there is a change like this one.
*	Comment out unused function	Karl Williamson	2012-08-25	1	-1/+0
	In looking at \X handling, I noticed that this function, which is intended for use in it, actually isn't used. This function may someday be useful, so I'm leaving the source in.
*	Use new types for comppad and comppad_name	Father Chrysostomos	2012-08-21	1	-2/+2
	I know that a few times I’ve looked at perl source files to find out what type to use in ‘<type> foo = PL_whatever’. So I am changing intrpvar.h as well as the api docs.
*	Fix format closure bug with redefined outer sub	Father Chrysostomos	2012-08-21	1	-0/+1
	CVs close over their outer CVs. So, when you write:
	    my $x = 52;
	    sub foo {
	      sub bar {
	        sub baz {
	          $x
	        }
	      }
	    }
	baz’s CvOUTSIDE pointer points to bar, bar’s CvOUTSIDE points to foo, and foo’s to the main cv. When the inner reference to $x is looked up, the CvOUTSIDE chain is followed, and each sub’s pad is looked at to see if it has an $x. (This happens at compile time.)
	It can happen that bar is undefined and then redefined:
	    undef &bar;
	    eval 'sub bar { my $x = 34 }';
	After this, baz will still refer to the main cv’s $x (52), but, if baz had ‘eval '$x'’ instead of just $x, it would see the new bar’s $x. (It’s not really a new bar, as its refaddr is the same, but it has a new body.)
	This particular case is harmless, and is obscure enough that we could define it any way we want, and it could still be considered correct.
	The real problem happens when CVs are cloned. When a CV is cloned, its name pad already contains the offsets into the parent pad where the values are to be found. If the outer CV has been undefined and redefined, those pad offsets can be completely bogus.
	Normally, a CV cannot be cloned except when its outer CV is running. And the outer CV cannot have been undefined without also throwing away the op that would have cloned the prototype.
	But formats can be cloned when the outer CV is not running. So it is possible for cloned formats to close over bogus entries in a new parent pad.
	In this example, \$x gives us an array ref. It shows ARRAY(0xbaff1ed) instead of SCALAR(0xdeafbee):
	    sub foo {
	        my $x;
	        format =
	    @
	    ($x,warn \$x)[0]
	    .
	    }
	    undef &foo;
	    eval 'sub foo { my @x; write }';
	    foo
	    __END__
	And if the offset that the format’s pad closes over is beyond the end of the parent’s new pad, we can even get a crash, as in this case:
	    eval
	     'sub foo {' .
	      '{my ($a,$b,$c,$d,$e,$f,$g,$h,$i,$j,$k,$l,$m,$n,$o,$p,$q,$r,$s,$t,$u)}'x999
	     . q|
	        my $x;
	        format =
	    @
	    ($x,warn \$x)[0]
	    .
	     }
	     |;
	    undef &foo;
	    eval 'sub foo { my @x; my $x = 34; write }';
	    foo();
	    __END__
	So now, instead of using CvROOT to identify clones of CvOUTSIDE(format), we use the padlist ID instead. Padlists don’t actually have an ID, so we give them one. Any time a sub is cloned, the new padlist gets the same ID as the old. The format needs to remember what its outer sub’s padlist ID was, so we put that in the padlist struct, too.
*	regcomp.c: Fix multi-char fold bug	Karl Williamson	2012-08-02	1	-0/+2
	Input text to be matched under /i is placed in EXACTFish nodes. The current limit on such text is 255 bytes per node. Even if we raised that limit, it will always be finite. If the input text is longer than this, it is split across 2 or more nodes. A problem occurs when that split occurs within a potential multi-character fold. For example, if the final character that fits in a node is 'f', and the next character is 'i', it should be matchable by LATIN SMALL LIGATURE FI, but because Perl isn't structured to find multi-char folds that cross node boundaries, we will miss it. The solution presented here isn't optimal. What we do is try to prevent all EXACTFish nodes from ending in a character that could be at the beginning or middle of a multi-char fold. That prevents the problem. But in actuality, the problem only occurs if the input text is actually a multi-char fold, which happens much less frequently. For example, we try to not end a full node with an 'f', but the problem doesn't actually occur unless the adjacent following node begins with an 'i' (or one of the other characters that 'f' participates in). That is, this patch splits when it doesn't need to. At the point of execution for this patch, we only know that the final character that fits in the node is that 'f'. The next character remains unparsed, and could be in any number of forms: a literal 'i', or a hex, octal, or named character constant, or it may need to be decoded (from 'use encoding'). So look-ahead is not really viable. So finding if a real multi-character fold is involved would have to be done later in the process, when we have full knowledge of the nodes, at the places where join_exact() is now called, and would require inserting a new node(s) in the middle of existing ones. This solution seems reasonable instead.
It does not yet address named character constants (\N{}) which currently bypass the code added here.
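The splitting rule can be sketched in simplified, ASCII-only form: when a full node would end on a character that can begin or sit inside a multi-char fold, back off by one or more characters. Treating only 'f' and 's' as such characters is an assumption for illustration, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed, ASCII-only subset of characters that can begin or sit inside a
 * multi-char fold: 'f' (as in "ff", "fi", "fl") and 's' (as in "ss"). */
static int may_be_inside_multifold(unsigned char c) {
    return c == 'f' || c == 's';
}

/* How many bytes of text[0..len) to put in the next node, given at most
 * `max` bytes per node. A full node is shortened so that it does not end
 * on such a character; like the commit, this splits even when the next
 * character would not actually complete a fold. */
static size_t next_node_len(const char *text, size_t len, size_t max) {
    assert(max > 0);
    if (len <= max)
        return len;                    /* everything fits: no split needed */
    size_t n = max;
    while (n > 0 && may_be_inside_multifold((unsigned char)text[n - 1]))
        n--;
    return n > 0 ? n : max;            /* no safe split point: use a full node */
}
```

If every character in the window is a candidate (for example, a run of 'f's), the sketch falls back to a full node. Nothing shorter would avoid the problem in that case.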
*	Eliminate PL_OP_SLAB_ALLOC	Father Chrysostomos	2012-07-12	1	-11/+0
	This commit eliminates the old slab allocator. It had bugs in it, in that ops would not be cleaned up properly after syntax errors. So why not fix it? Well, the new slab allocator is the old one fixed. Now that this is gone, we don’t have to worry as much about ops leaking when errors occur, because it won’t happen any more. Recent commits eliminated the only reason to hang on to it: PERL_DEBUG_READONLY_OPS required it.
*	PERL_DEBUG_READONLY_OPS with the new allocator	Father Chrysostomos	2012-07-12	1	-2/+2
	I want to eliminate the old slab allocator (PL_OP_SLAB_ALLOC), but this useful debugging tool needs to be rewritten for the new one first. This is slightly better than under PL_OP_SLAB_ALLOC, in that CVs created after the main CV starts running will get read-only ops, too. It is when a CV finishes compiling and relinquishes ownership of the slab that the slab is made read-only, because at that point it should not be used again for allocation. BEGIN blocks are exempt, as they are processed before the Slab_to_ro call in newATTRSUB. The Slab_to_ro call must come at the very end, after LEAVE_SCOPE, because otherwise the ops freed via the stack (the SAVEFREEOP calls near the top of newATTRSUB) will make the slab writable again. At that point, the BEGIN block has already been run and its slab freed. Maybe slabs belonging to BEGIN blocks can be made read-only later. Under PERL_DEBUG_READONLY_OPS, op slabs have two extra fields to record the size and readonliness of each slab. (Only the first slab in a CV’s slab chain uses the readonly flag, since it is conceptually simpler to treat them all as one unit.) Without recording this information manually, things become unbearably slow, the tests taking hours and hours instead of minutes.