delta/perl.git - github.com: perl/perl5.git

	Commit message	Author	Age	Files	Lines
*	bump version to 5.19.1	Ricardo Signes	2013-05-20	1	-1/+1
*	bump version to 5.19.0	Ricardo Signes	2013-05-18	1	-1/+1
*	Make it possible to disable and control hash key traversal randomization	Yves Orton	2013-05-07	1	-0/+5
	Adds support for the PERL_PERTURB_KEYS environment variable, which controls the level of randomization applied to keys() and friends. When PERL_PERTURB_KEYS is 0 we will not randomize key order at all. The chance that keys() changes due to an insert will be the same as in previous perls, basically only when the bucket size is changed. When PERL_PERTURB_KEYS is 1 we will randomize keys in a non-repeatable way. The chance that keys() changes due to an insert will be very high. This is the most secure and default mode. When PERL_PERTURB_KEYS is 2 we will randomize keys in a repeatable way. Repeated runs of the same program should produce the same output every time. The chance that keys() changes due to an insert will be very high. This patch also makes PERL_HASH_SEED imply a non-default PERL_PERTURB_KEYS setting. Setting PERL_HASH_SEED=0 (exactly one 0) implies PERL_PERTURB_KEYS=0 (hash key randomization disabled); setting PERL_HASH_SEED to any other value implies PERL_PERTURB_KEYS=2 (deterministic/repeatable hash key randomization). Specifying PERL_PERTURB_KEYS explicitly to a different level overrides this behavior. Includes changes to allow one to compile out various aspects of the patch. One can compile such that PERL_PERTURB_KEYS is not respected, or can compile without hash key traversal randomization at all. Note that support for these modes is incomplete, and currently a few tests will fail. Also includes a new subroutine, Hash::Util::hash_traversal_mask(), which can be used to ensure a given hash produces a predictable key order (assuming the same hash seed is in effect). This sub acts as a getter and a setter. NOTE - this patch lacks tests, but I lack tuits to get them done quickly, so I am pushing this with the hope that others can add them afterwards.
*	Re-order intrpvar.h to minimise holes in the interpreter struct.	Nicholas Clark	2013-03-20	1	-20/+23
	Commit 19bc2726ec6be805 created 32 bytes of holes (on LP64 systems).
*	Harden hashes against hash seed discovery by randomizing hash iteration	Yves Orton	2013-03-19	1	-0/+1
	Adds: S_ptr_hash() - A new static function in hv.c which can be used to hash a pointer or integer. PL_hash_rand_bits - A new interpreter variable used as a cheap provider of "semi-random" state for use by the hash infrastructure. xpvhv_aux.xhv_rand - Used as a mask which is xored against the xpvhv_aux.riter during iteration to randomize the order the actual buckets are visited. PL_hash_rand_bits is initialized at interpreter start from the random hash seed, and then modified by "mixing in" the result of ptr_hash() on the bucket array pointer in the hv (HvARRAY(hv)) every time hv_auxinit() allocates a new iterator structure. The net result is that every hash has its own iteration order, which should make it much more difficult to determine what the current hash seed is. This required some tests to be restructured, as they tested for something that was not necessarily true: we never guaranteed that two hashes with the same keys would produce the same key order; we merely promised that using keys(), values(), or each() on the same hash, without any insertions in between, would produce the same order of visiting the key/values.
*	reorder intrpvar.h	David Mitchell	2013-03-09	1	-109/+117
	Move more of the more commonly-used PL_ variables towards the front of the file (and thus to the top of the interpreter struct on MULTIPLICITY builds). This helps ensure that "hot" variables are clustered together on the same small number of cache lines, and also that the machine code to load them will have shorter offsets, which on some architectures may be achieved with shorter instructions. The "hotness" has been determined purely by my subjective judgement rather than any profiling. It's still open for the latter to be done. (Only simple shunting of whole lines has been done; no changes have been made to individual lines.)
*	Prepare PL_sv_objcount removal	Steffen Mueller	2013-03-06	1	-1/+3
	This used to keep track of all objects. At least by now, that is for no particularly good reason. Just because it could avoid a bit of work during global destruction if no objects remained. Let's do less work at run-time instead. The interpreter global will remain for one deprecation cycle.
*	Use native-size integers for some global counters	Steffen Mueller	2013-02-27	1	-3/+3
	It may be unlikely that a Perl program will hit 2 billion SVs, but by the time that 5.18 is ancient history, it's looking a lot more likely. This makes two global counters use native-size ints. I'm preserving signedness just for hysterical raisins: It might be deliberate.
*	Rename PL_interp_size_5_16_0 to PL_interp_size_5_18_0.	Nicholas Clark	2013-02-19	1	-2/+2
*	Re-order intrpvar.h to minimise holes in the interpreter struct.	Nicholas Clark	2013-02-19	1	-4/+6
	Holes were created by commit f59909ab8dad6ceb (April 2012) which removed PL_reginterp_cnt, commit 7dc8663964c66a69 (Nov 2012) which removed PL_rehash_seed_set, and commit 8936b48a49448f4e (Dec 2012) which removed PL_glob_index. There is still an unavoidable U16 sized hole on the default threaded configuration on x86_64. (U8 if PERL_SAWAMPERSAND is defined).
*	regex: Add pseudo-Posix class: 'cased'	Karl Williamson	2012-12-31	1	-2/+0
	/[[:upper:]]/i and /[[:lower:]]/i should match the Unicode property \p{Cased}. This commit introduces a pseudo-Posix class, internally named 'cased', to represent this. This class isn't specifiable by the user, except through using either /[[:upper:]]/i or /[[:lower:]]/i. Debug output will say ':cased:'. The regex parsing either of :lower: or :upper: will change them into :cased:, where already existing logic can handle this, just like any other class. This commit fixes the regression introduced in 3018b823898645e44b8c37c70ac5c6302b031381, and also the fact that these have never worked under 'use locale'. The next commit will un-TODO the tests for these things.
*	handy.h: Add full complement of isIDCONT() macros	Karl Williamson	2012-12-23	1	-0/+1
	This also changes isIDCONT_utf8() to use the Perl definition, which excludes any \W characters (the Unicode definition includes a few of these). Tests are also added. These macros remain undocumented for now.
*	Use an array for some inversion lists	Karl Williamson	2012-12-22	1	-6/+2
	Previous commits have placed some inversion list pointers into arrays. This commit extends that to another group of inversion lists.
*	Use an array for some inversion lists	Karl Williamson	2012-12-22	1	-29/+2
	An earlier commit placed some inversion list pointers into an array. This commit extends that to another group of inversion lists.
*	Use array for some inversion lists	Karl Williamson	2012-12-22	1	-8/+1
	This patch creates an array pointing to the inversion lists that cover the Latin-1 ranges for Posix character classes, and uses it instead of the individual variables previously referred to.
*	intrpvar.h: Place some swash pointers in an array	Karl Williamson	2012-12-22	1	-9/+1
*	intrpvar.h: #include handy.h	Karl Williamson	2012-12-22	1	-0/+2
	This will allow some mnemonics to be used in future commits.
*	regexec.c: More efficient Korean \X processing	Karl Williamson	2012-12-16	1	-1/+0
	This slightly refactors the code that checks for Korean precomposed syllables in \X. It eliminates the PL_variable formerly used to keep track of things.
*	Zap PL_glob_index	Father Chrysostomos	2012-12-09	1	-2/+0
	As of the previous commit, nothing is using it.
*	Add functions for getting ctype ALNUMC	Karl Williamson	2012-12-09	1	-0/+1
	We think this is meant to stand for C's alphanumeric, that is, what is matched by POSIX [:alnum:]. There were no functions or dedicated swash available for accessing it. Future commits will want to use these.
*	intrpvar.h: Add comment	Karl Williamson	2012-12-09	1	-1/+1
*	intrpvar.h: Use #define instead of hard-coded number	Karl Williamson	2012-12-09	1	-1/+1
	Otherwise it is a mystery why the number 12 is being used.
*	Disable PL_sawampersand	Father Chrysostomos	2012-11-27	1	-0/+2
	PL_sawampersand actually causes bugs (e.g., perl #4289), because the behaviour changes. eval '$&' after a match will produce different results depending on whether $& was seen before the match. Using copy-on-write for the pre-match copy (preceding patches do that) alleviates the slowdown caused by mentioning $&. The copy doesn’t happen unless the string is modified after the match. It’s now a post-match copy. So we no longer need to do things differently depending on whether $& has been seen. PL_sawampersand is now #defined to be equal to what it would be if every program began with $',$&,$`. I left the PL_sawampersand code in place, in case this commit proves immature. Running Configure with -Accflags=PERL_SAWAMPERSAND will reënable the PL_sawampersand mechanism.
*	Remove 3 unused interpreter variables	Karl Williamson	2012-11-26	1	-3/+0
	These variables have been unused in the Perl core since commit 4c88d5e0740d796bf5064336d280bba72897f385. The variables are undocumented. The only real use of any of these I found in CPAN is at https://metacpan.org/source/ABERGMAN/Devel-GC-Helper-0.25/Helper.xs#L1 The uses there appear to be in a list of known Perl variables. Since the module was published, more than a few new variables have been added, making this code obsolete anyway.
*	Hash Function Change - Murmur hash and true per process hash seed	Yves Orton	2012-11-17	1	-5/+1
	This patch does the following:
	*) Introduces multiple new hash functions to choose from at build time. This includes Murmur-32, SDBM, DJB2, SipHash, SuperFast, and One-at-a-time. Currently this is handled by munging hv.h. Configure support hopefully to follow.
	*) Changes the default hash to Murmur hash which is faster than the old default One-at-a-time.
	*) Rips out the old HvREHASH mechanism and replaces it with a per-process random hash seed.
	*) Changes the old PL_hash_seed from an interpreter value to a global variable. This means it does not have to be copied during interpreter setup or cloning.
	*) Changes the format of the PERL_HASH_SEED variable to a hex string so that hash seeds longer than fit in an integer are possible.
	*) Changes the return of Hash::Util::hash_seed() from a number to a string. This is to accommodate hash functions which have more bits than can be fit in an integer.
	*) Adds new functions to Hash::Util to improve introspection of hashes
	-) hash_value() - returns an integer hash value for a given string.
	-) bucket_info() - returns basic hash bucket utilization info
	-) bucket_stats() - returns more hash bucket utilization info
	-) bucket_array() - which keys are in which buckets in a hash
	More details on the new hash functions can be found below:
	Murmur Hash: (v3) from google, see http://code.google.com/p/smhasher/wiki/MurmurHash3
	Superfast Hash: From Paul Hsieh. http://www.azillionmonkeys.com/qed/hash.html
	DJB2: a hash function from Daniel Bernstein http://www.cse.yorku.ca/~oz/hash.html
	SDBM: a hash function sdbm. http://www.cse.yorku.ca/~oz/hash.html
	SipHash: by Jean-Philippe Aumasson and Daniel J. Bernstein. https://www.131002.net/siphash/
	They have all been converted into Perl's ugly macro format. I have not done any rigorous testing to make sure this conversion is correct. They seem to function as expected however. All of them use the random hash seed.
	You can force the use of a given function by defining one of PERL_HASH_FUNC_MURMUR PERL_HASH_FUNC_SUPERFAST PERL_HASH_FUNC_DJB2 PERL_HASH_FUNC_SDBM PERL_HASH_FUNC_ONE_AT_A_TIME
	Setting the environment variable PERL_HASH_SEED_DEBUG to 1 will make perl output the current seed (changed to hex) and the hash function it has been built with.
	Setting the environment variable PERL_HASH_SEED to a hex value will cause that value to be used as the seed. Any missing bits of the seed will be set to 0. The bits are filled in from left to right, not the traditional right to left, so setting it to FE results in a seed value of "FE000000" not "000000FE".
	Note that we do the hash seed initialization in perl_construct(). Doing it via perl_alloc() (via init_tls) causes problems under threaded builds as the buffers used for reentrant srand48 functions are not allocated. See also the p5p mail "Hash improvements blocker: portable random code that doesnt depend on a functional interpreter", Message-ID: <CANgJU+X+wNayjsNOpKRqYHnEy_+B9UH_2irRA5O3ZmcYGAAZFQ@mail.gmail.com>
*	Validate above-Latin1 characters in \N{} aliases	Karl Williamson	2012-11-11	1	-0/+2
	This completes the process of allowing users to define their own aliases for \N{} in any language they choose. Names have some validation applied so that they can't, for example, begin with something that is a digit in some Unicode script. Tests and documentation are included in this patch. The loop in toke.c that does the validation for user-supplied translators is revamped, and the messages that are output when there is an error are fixed to work with UTF-8.
*	Used pad name lists for pad ids	Father Chrysostomos	2012-10-16	1	-1/+0
	I added pad IDs so that a pad could record which pad it closes over, to avoid problems with closures closing over the wrong pad, resulting in crashes or bizarre copies. These pad IDs were shared between clones of the same pad. In commit 9ef8d56, for efficiency I made clones of the same closure share the same pad name list. It has just occurred to me that each padlist containing the same pad name list also has the same pad ID, so we can just use the pad name list itself as the ID. This makes padlists 32 bits smaller and eliminates PL_pad_generation from the interpreter struct.
*	PATCH: [perl #89774] multi-char fold + its fold in char class	Karl Williamson	2012-10-14	1	-0/+1
	The design for handling characters that fold to multiple characters when the former are encountered in a bracketed character class is defective. The ticket reads, "If a bracketed character class includes a character that has a multi-char fold, and it also includes the first character of that fold, the multi-char fold will never be matched; just the first character of the fold.". Thus, in the class /[\0-\xff]/i, \xDF will never be matched, because its fold is 'ss', the first character of which, 's', is also in the class. The reason the design is defective is that it doesn't allow for backtracking and trying the other options. This commit solves this by effectively rewriting the above to be / (?: \xdf | [\0-\xde\xe0-\xff] ) /xi. And so the backtracking gets handled automatically by the regex engine.
*	Eliminate the vestigial comment "magical thingies" from intrpvar.h	Nicholas Clark	2012-09-20	1	-1/+0
	The original comment "magical thingies" was added to perl.c by commit 8ebc5c0145d2e355 in Jan 1997. It is above code which frees 4 interpreter-global SVs in perl_destruct. The comment "magical thingies" was in intrpvar.h since the file was created by commit 49f531dad558d800 on 29 Nov 1997. At that time, it was followed by a block of 13 relevant interpreter global variables. However, by commit d4cce5f1785350c2 (30 Nov 1997) all bar two were now in other places, mostly in thrdvar.h. With the abolition of PL_formfeed, the comment now annotates just one "magical" thingy, PL_basetime, which isn't even one of the SVs freed at the analogous location in perl.c. Hence the comment adds no value.
*	Get rid of PL_formfeed.	Enache Adrian	2012-09-20	1	-2/+0
	$^L is neither a magical variable, nor a normal one (like $;) but it's just a little bit special :) This patch removes PL_formfeed - IMHO, an extra gv_fetchpv per page when using formats isn't going to cause a sensible speed regression. I suppose that removing the intrpvar.h hunk from the patch is enough to keep binary compatibility - unless someone used PL_formfeed from an XS module. [with regen.pl run as noted by the author, and an additional change to perl.c to remove the reference to PL_formfeed added soon after this patch was sent]
*	Use macro not swash for utf8 quotemeta	Karl Williamson	2012-09-13	1	-1/+0
	The rules for whether an above-Latin1 code point matches are now saved in a macro generated from a trie by regen/regcharclass.pl, and these are now used by pp.c to test these cases. This allows removal of a wrapper subroutine, and also there is no need for dynamic loading at run-time into a swash. This macro is about as big as I'm comfortable compiling in, but it saves the building of a hash that can grow over time, and removes a subroutine and interpreter variables. Indeed, performance benchmarks show that it is about the same speed as a hash, but it does not require having to load the rules in from disk the first time it is used.
*	regexec.c: Use new macros instead of swashes	Karl Williamson	2012-09-13	1	-7/+0
	A previous commit has caused macros to be generated that will match Unicode code points of interest to the \X algorithm. This patch uses them. This speeds up modern Korean processing by 15%. Together with recent previous commits, the throughput of modern Korean under \X has more than doubled, and is now comparable to other languages (which have themselves increased by 35%).
*	PL_sawampersand: use 3 bit flags rather than bool	David Mitchell	2012-09-08	1	-1/+1
	Set a separate flag for each of $`, $& and $'. It still works fine in boolean context. This will allow us to have more refined control over what parts of a match string to copy (we currently copy the whole string).
*	Refactor \X regex handling to avoid a typical case table lookup	Karl Williamson	2012-08-28	1	-1/+1
	Prior to this commit 98.4% of Unicode code points that went through \X had to be looked up to see if they begin a grapheme cluster; then looked up again to find that they didn't require special handling. This commit refactors things so only one look-up is required for those 98.4%. It changes the table generated by mktables to accomplish this, and hence the name of it, and references to it are changed to correspond.
*	Prepare for Unicode 6.2	Karl Williamson	2012-08-26	1	-1/+2
	This changes code to be able to handle Unicode 6.2, while continuing to handle all previous releases. The major change was a new definition of \X, which adds a property to its calculation. Unfortunately \X is hard-coded into regexec.c, and so has to be revised whenever there is a change of this magnitude in Unicode, which fortunately isn't all that often. I refactored the code in mktables to make it easier next time there is a change like this one.
*	Comment out unused function	Karl Williamson	2012-08-25	1	-1/+0
	In looking at \X handling, I noticed that this function, which is intended for use in it, actually isn't used. This function may someday be useful, so I'm leaving the source in.
*	Use new types for comppad and comppad_name	Father Chrysostomos	2012-08-21	1	-2/+2
	I know that a few times I’ve looked at perl source files to find out what type to use in ‘<type> foo = PL_whatever’. So I am changing intrpvar.h as well as the api docs.
*	Fix format closure bug with redefined outer sub	Father Chrysostomos	2012-08-21	1	-0/+1
	CVs close over their outer CVs. So, when you write:
	    my $x = 52; sub foo { sub bar { sub baz { $x } } }
	baz’s CvOUTSIDE pointer points to bar, bar’s CvOUTSIDE points to foo, and foo’s to the main cv. When the inner reference to $x is looked up, the CvOUTSIDE chain is followed, and each sub’s pad is looked at to see if it has an $x. (This happens at compile time.) It can happen that bar is undefined and then redefined:
	    undef &bar; eval 'sub bar { my $x = 34 }';
	After this, baz will still refer to the main cv’s $x (52), but, if baz had ‘eval '$x'’ instead of just $x, it would see the new bar’s $x. (It’s not really a new bar, as its refaddr is the same, but it has a new body.) This particular case is harmless, and is obscure enough that we could define it any way we want, and it could still be considered correct. The real problem happens when CVs are cloned. When a CV is cloned, its name pad already contains the offsets into the parent pad where the values are to be found. If the outer CV has been undefined and redefined, those pad offsets can be completely bogus. Normally, a CV cannot be cloned except when its outer CV is running. And the outer CV cannot have been undefined without also throwing away the op that would have cloned the prototype. But formats can be cloned when the outer CV is not running. So it is possible for cloned formats to close over bogus entries in a new parent pad. In this example, \$x gives us an array ref. It shows ARRAY(0xbaff1ed) instead of SCALAR(0xdeafbee):
	    sub foo {
	        my $x;
	    format =
	    @
	    ($x,warn \$x)[0]
	    .
	    }
	    undef &foo;
	    eval 'sub foo { my @x; write }';
	    foo
	    __END__
	And if the offset that the format’s pad closes over is beyond the end of the parent’s new pad, we can even get a crash, as in this case:
	    eval
	    'sub foo {' .
	    '{my ($a,$b,$c,$d,$e,$f,$g,$h,$i,$j,$k,$l,$m,$n,$o,$p,$q,$r,$s,$t,$u)}'x999
	    . q|
	        my $x;
	    format =
	    @
	    ($x,warn \$x)[0]
	    .
	    }
	    |;
	    undef &foo;
	    eval 'sub foo { my @x; my $x = 34; write }';
	    foo();
	    __END__
	So now, instead of using CvROOT to identify clones of CvOUTSIDE(format), we use the padlist ID instead. Padlists don’t actually have an ID, so we give them one. Any time a sub is cloned, the new padlist gets the same ID as the old. The format needs to remember what its outer sub’s padlist ID was, so we put that in the padlist struct, too.
*	regcomp.c: Fix multi-char fold bug	Karl Williamson	2012-08-02	1	-0/+2
	Input text to be matched under /i is placed in EXACTFish nodes. The current limit on such text is 255 bytes per node. Even if we raised that limit, it will always be finite. If the input text is longer than this, it is split across 2 or more nodes. A problem occurs when that split occurs within a potential multi-character fold. For example, if the final character that fits in a node is 'f', and the next character is 'i', it should be matchable by LATIN SMALL LIGATURE FI, but because Perl isn't structured to find multi-char folds that cross node boundaries, we will miss it. The solution presented here isn't optimum. What we do is try to prevent all EXACTFish nodes from ending in a character that could be at the beginning or middle of a multi-char fold. That prevents the problem. But in actuality, the problem only occurs if the input text is actually a multi-char fold, which happens much less frequently. For example, we try to not end a full node with an 'f', but the problem doesn't actually occur unless the adjacent following node begins with an 'i' (or one of the other characters that 'f' participates in). That is, this patch splits when it doesn't need to. At the point of execution for this patch, we only know that the final character that fits in the node is that 'f'. The next character remains unparsed, and could be in any number of forms, a literal 'i', or a hex, octal, or named character constant, or it may need to be decoded (from 'use encoding'). So look-ahead is not really viable. So finding if a real multi-character fold is involved would have to be done later in the process, when we have full knowledge of the nodes, at the places where join_exact() is now called, and would require inserting a new node(s) in the middle of existing ones. This solution seems reasonable instead. It does not yet address named character constants (\N{}) which currently bypass the code added here.
*	Eliminate PL_OP_SLAB_ALLOC	Father Chrysostomos	2012-07-12	1	-11/+0
	This commit eliminates the old slab allocator. It had bugs in it, in that ops would not be cleaned up properly after syntax errors. So why not fix it? Well, the new slab allocator is the old one fixed. Now that this is gone, we don’t have to worry as much about ops leaking when errors occur, because it won’t happen any more. Recent commits eliminated the only reason to hang on to it: PERL_DEBUG_READONLY_OPS required it.
*	PERL_DEBUG_READONLY_OPS with the new allocator	Father Chrysostomos	2012-07-12	1	-2/+2
	I want to eliminate the old slab allocator (PL_OP_SLAB_ALLOC), but this useful debugging tool needs to be rewritten for the new one first. This is slightly better than under PL_OP_SLAB_ALLOC, in that CVs created after the main CV starts running will get read-only ops, too. It is when a CV finishes compiling and relinquishes ownership of the slab that the slab is made read-only, because at that point it should not be used again for allocation. BEGIN blocks are exempt, as they are processed before the Slab_to_ro call in newATTRSUB. The Slab_to_ro call must come at the very end, after LEAVE_SCOPE, because otherwise the ops freed via the stack (the SAVEFREEOP calls near the top of newATTRSUB) will make the slab writable again. At that point, the BEGIN block has already been run and its slab freed. Maybe slabs belonging to BEGIN blocks can be made read-only later. Under PERL_DEBUG_READONLY_OPS, op slabs have two extra fields to record the size and readonliness of each slab. (Only the first slab in a CV’s slab chain uses the readonly flag, since it is conceptually simpler to treat them all as one unit.) Without recording this information manually, things become unbearably slow, the tests taking hours and hours instead of minutes.
*	handy.h: Fix isBLANK_uni and isBLANK_utf8	Karl Williamson	2012-06-29	1	-0/+1
	These macros have never worked outside the Latin1 range, so this extends them to work. There are no tests I could find for things in handy.h, except that many of them are called all over the place during the normal course of events. This commit adds a new file for such testing, containing for now only a few tests for the isBLANK's.
*	eliminate PL_reginterp_cnt	David Mitchell	2012-06-13	1	-2/+0
	This used to be the mechanism to determine whether "use re 'eval'" needed to be in scope; but now that we make a clear distinction between literal and runtime code blocks, it's no longer needed.
*	[perl #78742] Store CopSTASH in a pad under threads	Father Chrysostomos	2012-06-04	1	-0/+3
	Before this commit, a pointer to the cop’s stash was stored in cop->cop_stash under non-threaded perls, and the name and name length were stored in cop->cop_stashpv and cop->cop_stashlen under ithreads. Consequently, eval "__PACKAGE__" would end up returning the wrong package name under threads if the current package had been assigned over. This commit changes the way cops store their stash under threads. Now it is an offset (cop->cop_stashoff) into the new PL_stashpad array (just a mallocked block), which holds pointers to all stashes that have code compiled in them. I didn’t use the lexical pads, because CopSTASH(cop) won’t work unless PL_curpad is holding the right pad. And things start to get very hairy in pp_caller, since the correct pad isn’t anywhere easily accessible on the context stack (oldcomppad actually referring to the current comppad). The approach I’ve followed uses far less code, too. In addition to fixing the bug, this also saves memory. Instead of allocating a separate PV for every single statement (to hold the stash name), now all lines of code in a package can share the same stashpad slot. So, on a 32-bit OS X, that’s 16 bytes less memory per COP for short package names. Since stashoff is the same size as stashpv, there is no difference there. Each package now needs just 4 bytes in the stashpad for storing a pointer. For speed’s sake PL_stashpadix stores the index of the last-used stashpad offset. So only when switching packages is there a linear search through the stashpad.
*	Excise PL_amagic_generation	Father Chrysostomos	2012-05-23	1	-2/+0
	The core is not using it any more. Every CPAN module that increments it also does newXS, which triggers mro_method_changed_in, which is sufficient; so nothing will break. So, to keep those modules compiling, PL_amagic_generation is now an alias to PL_na outside the core.
*	Remove gete?[ug]id caching	Ævar Arnfjörð Bjarmason	2012-02-18	1	-4/+4
	Currently we cache the UID/GID and effective UID/GID similarly to how we used to cache getpid() before v5.14.0-251-g0e21945. Remove this magical behavior in favor of always calling getpid(), getgid() etc. This resolves RT #96208. A minimal testcase for this is the following by Leon Timmermans attached to RT #96208: eval { require 'syscall.ph'; 1 } or eval { require 'sys/syscall.ph'; 1 } or die $@; if (syscall(&SYS_setuid, $ARGV[0] + 0 || 1000) >= 0 or die "$!") { printf "\$< = %d, getuid = %d\n", $<, syscall(&SYS_getuid); } I.e. if we call the sete?[ug]id() functions unbeknownst to perl the $<, $>, $( and $) variables won't be updated. This results in the same sort of issues we had with $$ before v5.14.0-251-g0e21945, and getppid() before my v5.15.7-407-gd7c042c patch. I'm completely eliminating the PL_egid, PL_euid, PL_gid and PL_uid variables as part of this patch, this will break some CPAN modules, but it'll be really easy before the v5.16.0 final to reinstate them. I'd like to remove them to see what breaks, and how easy it is to fix it. These variables are not part of the public API, and the modules using them could either use the Perl_gete?[ug]id() functions or are working around the bug I'm fixing with this commit. The new PL_delaymagic_(egid|euid|gid|uid) variables I'm adding are only intended to be used internally in the interpreter to facilitate the delaymagic in Perl_pp_sassign. There's probably some way not to export these to programs that embed perl, but I haven't found out how to do that.
*	perl #77654: quotemeta quotes non-ASCII consistently	Karl Williamson	2012-02-15	1	-0/+1
	As described in the pod changes in this commit, this changes quotemeta() to consistently quote non-ASCII characters when used under unicode_strings. The behavior is changed for these and UTF-8 encoded strings to more closely align with Unicode's recommendations. The end result is that we could at some future point start using other characters as metacharacters than the 12 we do now.
*	Further eliminate POSIX-emulation under LinuxThreads	Ævar Arnfjörð Bjarmason	2012-02-15	1	-5/+0
	Under POSIX threads the getpid() and getppid() functions return the same values across multiple threads, i.e. threads don't have their own PID's. This is not the case under the obsolete LinuxThreads where each thread has a different PID, so getpid() and getppid() will return different values across threads. Ever since the first perl 5.0 we've returned POSIX-consistent semantics for $$, until v5.14.0-251-g0e21945 when the getpid() cache was removed. In 5.8.1 Rafael added further explicit POSIX emulation in perl-5.8.0-133-g4d76a34 [1] by explicitly caching getppid(), so that multiple threads would always return the same value. I don't think all this effort to emulate POSIX semantics is worth it. I think $$ and getppid() are OS-level functions that should always return the same as their C equivalents. I shouldn't have to use a module like Linux::Pid to get the OS version of the return values. This is pretty much a complete non-issue in practice these days, LinuxThreads was a Linux 2.4 thread implementation that nobody maintains anymore[2], all modern Linux distros use NPTL threads which don't suffer from this discrepancy. Debian GNU/kFreeBSD does use LinuxThreads in the 6.0 release, but they too will be moving away from it in future releases, and really, nobody uses Debian GNU/kFreeBSD anyway. This caching makes it unnecessarily tedious to fork an embedded Perl interpreter. When someone constructs an embedded perl interpreter and forks their application, the fork(2) system call isn't going to run Perl_pp_fork(), and thus the return value of $$ and getppid() doesn't reflect the current process. See [3] for a bug in uWSGI related to this, and Perl::AfterFork on the CPAN for XS code that you need to run after forking a PerlInterpreter unbeknownst to perl.
	We've already been failing the tests in t/op/getpid.t on these Linux systems that nobody apparently uses. The Debian GNU/kFreeBSD users did notice and filed #96270. This patch fixes that failure by changing the tests to test for different behavior under LinuxThreads; I've tested that this works on my Debian GNU/kFreeBSD 6.0.4 virtual machine. If this change is found to be unacceptable (i.e. we want to continue to emulate POSIX thread semantics for the sake of LinuxThreads) we also need to revert v5.14.0-251-g0e21945, because currently we're only emulating POSIX semantics for getppid(), not getpid(). But I don't think we should do that, both v5.14.0-251-g0e21945 and this commit are awesome. This commit includes a change to embedvar.h made by "make regen_headers". 1. http://www.nntp.perl.org/group/perl.perl5.porters/2002/08/msg64603.html 2. http://pauillac.inria.fr/~xleroy/linuxthreads/ 3. http://projects.unbit.it/uwsgi/ticket/85
*	intrpvar.h: Rmv no longer used PL_ variable	Karl Williamson	2012-02-11	1	-2/+0
	Commit 24caacbccae7b938deecdcc3f13dd66c9c6a684e removed all uses of this variable, but failed to remove it.
*	regcomp.c: /[[:lower:]]/i should match the same as /\p{Lower}/i	Karl Williamson	2012-02-11	1	-0/+2
	Same for [[:upper:]] and \p{Upper}. These were instead matching all of [[:alpha:]] or \p{Alpha}. What /\p{Lower}/i and /\p{Upper}/i match is \p{Cased}, and so that is what these should match.