delta/perl.git - github.com: perl/perl5.git

	Commit message (Collapse)	Author	Age	Files	Lines
*	regcomp.c: Set flags when optimizing a [char class]	Karl Williamson	2012-08-11	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A bracketed character class containing a single Latin1-range character has long been optimized into an EXACT node. Also, flags are set to include SIMPLE. However, EXACT nodes containing code points that are different when encoded under UTF-8 versus not UTF-8 should not be marked simple. To fix this, the address of the flags parameter is now passed to regclass(), the function that parses bracketed character classes, which now sets it appropriately. The unconditional setting of SIMPLE that was always done in the code after calling regclass() has been removed. In addition, the setting of the flags for EXACT nodes has been pushed into the common function that populates them. regclass() will also now increment the naughtiness count if optimized to a node that normally does that. I do not understand this heuristic behavior very well, and could not come up with a test case for it; experimentation revealed that there are no test cases in our test suite for which naughtiness makes any difference at all.
*	Don’t let format arguments ‘leak out’ of formline	Father Chrysostomos	2012-08-08	1	-0/+1
	When parsing formats, the lexer invents tokens to feed to the parser. So when the lexer dissects this:
	    format =
	    @<<<< @>>>>
	    $foo, $bar, $baz
	    .
	The parser actually sees this (the parser knows that = . is like { }):
	    format = ; formline "@<<<< @>>>>\n", $foo, $bar, $baz; .
	The lexer makes no effort to make sure that the argument line is contained within formline’s arguments. To make
	    { do_stuff; $foo, bar }
	work, the lexer supplies a ‘do’ before the block, if it is inside a format. This means that
	    $a, $b; $c, $d
	feeds ($a, $b) to formline, whereas
	    { $a, $b; $c, $d }
	feeds ($c, $d) to formline. It also has various other strange effects.
	This script prints "# 0" as I would expect:
	    print "# ";
	    format =
	    @
	    (0 and die)
	    .
	    write
	This one, lacking parentheses, dies because ‘and’ has low precedence:
	    print "# ";
	    format =
	    @
	    0 and die
	    .
	    write
	This does not work:
	    my $day = "Wed";
	    format =
	    @<<<<<<<<<<
	    ({qw[ Sun 0 Mon 1 Tue 2 Wed 3 Thu 4 Fri 5 Sat 6 ]}->{$day})
	    .
	    write
	You have to do this:
	    my $day = "Wed";
	    format =
	    @<<<<<<<<<<
	    ({my %d = qw[ Sun 0 Mon 1 Tue 2 Wed 3 Thu 4 Fri 5 Sat 6 ]; \%d}->{$day})
	    .
	    write
	which is very strange and shouldn’t even be valid syntax.
	This does not work, because ‘no’ is not allowed in an expression:
	    use strict;
	    $::foo = "bar"
	    format =
	    @<<<<<<<<<<<
	    no strict; $foo
	    .
	    write;
	Putting a block around it makes it work. Putting a semicolon before ‘no’ stops it from being a syntax error, but it silently does the wrong thing.
	I thought I could fix all these by putting an implicit do { ... } around the argument line and removing the special-casing for an opening brace, allowing anonymous hashrefs to work in formats, such that this:
	    format =
	    @<<<< @>>>>
	    $foo, $bar, $baz
	    .
	would turn into this:
	    format = ; formline "@<<<< @>>>>\n", do { $foo, $bar, $baz; }; .
	But that will lead to madness like this ‘working’:
	    format =
	    @
	    }+do{
	    .
	It would also stop lexicals declared in one format line from being visible in another.
	So instead this commit starts being honest with the parser. We still have some ‘invented’ tokens, to indicate the start and end of a format line, but now it is the parser itself that understands a sequence of format lines, instead of being fed generated code. So the example above is now presented to the parser like this:
	    format = ; FORMRBRACK "@<<<< @>>>>\n" FORMLBRACK $foo, $bar, $baz ; FORMRBRACK ; .
	Note about the semicolons: The parser expects to see a semicolon at the end of each statement. So the lexer has to supply one before FORMRBRACK. The final dot goes through the same code that handles closing braces, which generates a semicolon for the same reason. It’s easier to make the parser expect a semicolon before the final dot than to change the } code in the lexer. We use the } code for . because it handles the internal variables that keep track of how many nested levels there are, what kind, etc.
	The extra ;FORMRBRACK after the = is there also to keep the lexer simple (ahem). When a newline is encountered during ‘normal’ (as opposed to format picture) parsing inside a format, that’s when the semicolon and FORMRBRACK are emitted. (There was already a semicolon there before this commit. I have just added FORMRBRACK in the same spot.)
*	regcomp.c: inline trivial static function	Karl Williamson	2012-08-02	1	-1/+1
*	regcomp.c: Revise API for static function	Karl Williamson	2012-08-02	1	-2/+3
\| \| \| \| \| \| \| \| \| \|	This is to allow future changes. The function now returns success or failure, and the created regnode (if any) is set via a parameter pointer. I removed the 'register' declaration to get this to work, because such declarations are considered bad form these days, e.g., http://stackoverflow.com/questions/314994/whats-a-good-example-of-register-variable-usage-in-c
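The calling convention described above (a success/failure return value, with the created node handed back through a pointer parameter) can be sketched as follows. This is only an illustration: `demo_node` and `demo_make_node` are hypothetical names, not the actual static function or regnode type in regcomp.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a regnode; not the real struct from regcomp.h. */
typedef struct demo_node {
    int type;
} demo_node;

/* Returns true on success and stores the new node through node_p.
 * On failure returns false and leaves *node_p set to NULL. */
static bool demo_make_node(int type, demo_node **node_p)
{
    *node_p = NULL;
    if (type < 0)                 /* assumed rule: negative types are invalid */
        return false;

    demo_node *n = malloc(sizeof *n);
    if (n == NULL)
        return false;

    n->type = type;
    *node_p = n;
    return true;
}
```

A caller checks the return value first and only then looks at the node, which leaves room for the function to succeed without creating a node at all.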
*	regcomp.c: Make invlist_search() usable from re_comp.c	Karl Williamson	2012-08-02	1	-1/+1
\| \| \| \| \| \| \| \| \|	This was a static function which I couldn't get to be callable from the debugging version of regcomp.c. This makes it public, but known only in the regcomp.c source file. It changes the name to begin with an underscore so that if someone cheats by adding preprocessor #defines, they still have to call it with the name that convention indicates is a private function.
*	regcomp.c: Rename static fcn to better reflect its purpose	Karl Williamson	2012-08-02	1	-1/+1
\| \| \| \|	This function handles \N of any ilk, not just named sequences.
*	Oust sv_gmagical_2iv_please	Father Chrysostomos	2012-07-28	1	-1/+0
	The magic flags patch prevents this from ever being called, since the OK flags work the same way for magic variables now as they have for muggle vars, avoiding these fiddly games. (It was when writing it that I realised the value of the magic flags proposal.)
*	Flatten vstrings modified in place	Father Chrysostomos	2012-07-27	1	-1/+0
	A substitution forces its target to a string upon successful substitution, even if the substitution did nothing:
	    $ ./perl -Ilib -le '$a = *f; $a =~ s/f/f/; print ref \$a'
	    SCALAR
	Notice that $a is no longer a glob after s///. But vstrings are different:
	    $ ./perl -Ilib -le '$a = v102; $a =~ s/f/f/; print ref \$a'
	    VSTRING
	I fixed this in 5.16 (1e6bda93) for those cases where the vstring ends up with a value that doesn’t correspond to the actual string:
	    $ ./perl -Ilib -le '$a = v102; $a =~ s/f/o/; print ref \$a'
	    SCALAR
	It works through vstring set-magic, which does the check and removes the magic if it doesn’t match. I did it that way because I couldn’t think of any other way to fix bug #29070, and I didn’t realise at the time that I hadn’t fixed all the bugs.
	By making SvTHINKFIRST true on a vstring, we force it through sv_force_normal before any in-place string operations. We can also make sv_force_normal handle vstrings as well. This fixes all the lingering-vstring-magic bugs in just two lines, making the vstring set-magic (which is also slow) redundant. It also allows the special case in sv_setsv_flags to be removed.
	Or at least that was what I had hoped. It turns out that pp_subst twists and turns in tortuous ways, and needs special treatment for things like this. And the do_trans functions weren’t checking SvTHINKFIRST when arguably they should have.
	I tweaked sv_2pv{utf8,byte} to avoid copying magic variables that do not need copying.
*	regcomp.c: Add _invlist_contains_cp	Karl Williamson	2012-07-24	1	-0/+1
\| \| \| \| \|	This simply searches an inversion list without going through a swash. It will be used in a future commit.
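As an illustration of what searching an inversion list directly involves: an inversion list is a sorted array of code points in which even-indexed entries start a range that is in the set and odd-indexed entries start a range that is not. The sketch below is a simplified stand-alone version; the `demo_` names and the plain `uint32_t` array are assumptions for the example, not the data layout or code used in regcomp.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Binary search: returns the index of the largest element <= cp,
 * or -1 if cp is smaller than every element. */
static ptrdiff_t demo_invlist_search(const uint32_t *list, size_t len,
                                     uint32_t cp)
{
    size_t lo = 0, hi = len;              /* search window is [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (list[mid] <= cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (ptrdiff_t)lo - 1;
}

/* cp is in the set when the range it falls in starts at an even index. */
static bool demo_invlist_contains_cp(const uint32_t *list, size_t len,
                                     uint32_t cp)
{
    ptrdiff_t i = demo_invlist_search(list, len, cp);
    return i >= 0 && (i % 2) == 0;
}
```

For example, the list {0x41, 0x5B, 0x61, 0x7B} represents A-Z and a-z: 0x41 and 0x61 open the two ranges, and 0x5B and 0x7B are the first code points past them.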
*	utf8.c: Add a get_() method to hide internal details	Karl Williamson	2012-07-24	1	-0/+1
\| \| \| \| \| \|	This should have been written this way to begin with (I'm the culprit). But we should have a method so another routine doesn't have to know the internal details.
*	regcomp.c: Add func to test 2 inversion lists for equality	Karl Williamson	2012-07-24	1	-0/+1
	This adds _invlistEQ, which for now is commented out.
*	regcomp.c: Extract some code into an inline function	Karl Williamson	2012-07-24	1	-0/+1
\| \| \| \|	This code will be used in future commits in multiple places
*	regcomp.c: Extract code to inline function	Karl Williamson	2012-07-24	1	-0/+2
\| \| \| \| \| \|	Future commits will use this paradigm in additional places, so extract it to a function, so they all do things right. This isn't a great API, but it works for the few places this will be called.
*	embed.fnc: Remove duplicate entry	Karl Williamson	2012-07-24	1	-1/+0
*	embed.fnc: Add const to remove compiler warning	Karl Williamson	2012-07-24	1	-1/+1
\| \| \| \|	This should have been declared const.
*	Unify code that initializes constants yes, no, and undef	Chip Salzenberg	2012-07-23	1	-0/+1
*	regcomp.c: Refactor code into a function	Karl Williamson	2012-07-19	1	-0/+1
\| \| \| \| \| \|	Future commits will use this functionality in additional places beyond the single one currently. It makes sense to abstract it into a function.
*	utf8.c: Create API so internals can be hidden	Karl Williamson	2012-07-19	1	-0/+1
\| \| \| \| \| \| \|	This creates a function to hide some of the internal details of swashes from the regex engine, which is the only authorized user, enforced through #ifdefs in embed.fnc. These work closely together, but it's best to have a clean interface.
*	Magic flags harmonization.	Chip Salzenberg	2012-07-15	1	-1/+6
	In restore_magic(), which is called after any magic processing, all of the public OK flags have been shifted into the private OK flags. Thus the lack of an appropriate public OK flag was used to trigger both get magic and required conversions. This scheme did not cover ROK, however, so all properly written code had to make sure mg_get was called the right number of times anyway. Meanwhile the private OK flags gained a second purpose of marking converted but non-authoritative values (e.g. the IV conversion of an NV), and the inadequate flag shift mechanic broke this in some cases. This patch removes the shift mechanic for magic flags, thus exposing (and fixing) some improper usage of magic SVs in which mg_get() was not called correctly. It also has the side effect of making magic get functions specifically set their SVs to undef if that is desired, as the new behavior of empty get functions is to leave the value unchanged. This is a feature, as now get magic that does not modify its value, e.g. tainting, does not have to be special cased. The changes to cpan/ here are only temporary, for development only, to keep blead working until upstream applies them (or something like them). Thanks to Rik and Father C for review input.
*	Eliminate PL_OP_SLAB_ALLOC	Father Chrysostomos	2012-07-12	1	-9/+2
	This commit eliminates the old slab allocator. It had bugs in it, in that ops would not be cleaned up properly after syntax errors. So why not fix it? Well, the new slab allocator is the old one fixed. Now that this is gone, we don’t have to worry as much about ops leaking when errors occur, because it won’t happen any more. Recent commits eliminated the only reason to hang on to it: PERL_DEBUG_READONLY_OPS required it.
*	PERL_DEBUG_READONLY_OPS with the new allocator	Father Chrysostomos	2012-07-12	1	-0/+6
	I want to eliminate the old slab allocator (PL_OP_SLAB_ALLOC), but this useful debugging tool needs to be rewritten for the new one first. This is slightly better than under PL_OP_SLAB_ALLOC, in that CVs created after the main CV starts running will get read-only ops, too. It is when a CV finishes compiling and relinquishes ownership of the slab that the slab is made read-only, because at that point it should not be used again for allocation. BEGIN blocks are exempt, as they are processed before the Slab_to_ro call in newATTRSUB. The Slab_to_ro call must come at the very end, after LEAVE_SCOPE, because otherwise the ops freed via the stack (the SAVEFREEOP calls near the top of newATTRSUB) will make the slab writable again. At that point, the BEGIN block has already been run and its slab freed. Maybe slabs belonging to BEGIN blocks can be made read-only later. Under PERL_DEBUG_READONLY_OPS, op slabs have two extra fields to record the size and readonliness of each slab. (Only the first slab in a CV’s slab chain uses the readonly flag, since it is conceptually simpler to treat them all as one unit.) Without recording this information manually, things become unbearably slow, the tests taking hours and hours instead of minutes.
*	regcomp.c: Remove obsolete code	Karl Williamson	2012-06-29	1	-2/+0
	A previous commit has removed all calls to these two functions (moving a large portion of the bit_fold() one to another place), and the variable is no longer set.
*	handy.h: Fix isBLANK_uni and isBLANK_utf8	Karl Williamson	2012-06-29	1	-0/+2
	These macros have never worked outside the Latin1 range, so this extends them to work. There are no tests I could find for things in handy.h, except that many of them are called all over the place during the normal course of events. This commit adds a new file for such testing, containing for now only a few tests for the isBLANK macros.
*	Make formats close over the right closure	Father Chrysostomos	2012-06-29	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This was brought up in ticket #113812. Formats that are nested inside closures only work if invoked from directly inside that closure. Calling the format from an inner sub call won’t work. Commit af41786fe57 stopped it from crashing, making it work as well as 5.8, in that closed-over variables would be undefined, being unavailable. This commit adds a variation of the find_runcv function that can check whether CvROOT matches an argument passed in. So we look not for the current sub, but for the topmost sub on the call stack that is a clone of the closure prototype that the format’s CvOUTSIDE field points to.
*	CV-based slab allocation for ops	Father Chrysostomos	2012-06-29	1	-5/+11
	This addresses bugs #111462 and #112312 and part of #107000.
	When a longjmp occurs during lexing, parsing or compilation, any ops in C autos that are not referenced anywhere are leaked.
	This commit introduces op slabs that are attached to the currently-compiling CV. New ops are allocated on the slab. When an error occurs and the CV is freed, any ops remaining are freed.
	This is based on Nick Ing-Simmons’ old experimental op slab implementation, but it had to be rewritten to work this way.
	The old slab allocator has a pointer before each op that points to a reference count stored at the beginning of the slab. Freed ops are never reused. When the last op on a slab is freed, the slab itself is freed. When a slab fills up, a new one is created.
	To allow iteration through the slab to free everything, I had to have two pointers; one points to the next item (op slot); the other points to the slab, for accessing the reference count. Ops come in different sizes, so adding sizeof(OP) to a pointer won’t work.
	The old slab allocator puts the ops at the end of the slab first, the idea being that the leaves are allocated first, so the order will be cache-friendly as a result. I have preserved that order for a different reason: We don’t need to store the size of the slab (slabs vary in size; see below) if we can simply follow pointers to find the last op.
	I tried eliminating reference counts altogether, by having all ops implicitly attached to PL_compcv when allocated and freed when the CV is freed. That also allowed op_free to skip FreeOp altogether, freeing ops faster. But that doesn’t work in those cases where ops need to survive beyond their CVs; e.g., re-evals.
	The CV also has to have a reference count on the slab. Sometimes the first op created is immediately freed. If the reference count of the slab reaches 0, then it will be freed with the CV still pointing to it.
	CVs use the new CVf_SLABBED flag to indicate that the CV has a reference count on the slab. When this flag is set, the slab is accessible via CvSTART when CvROOT is not set, or by subtracting two pointers (2*sizeof(I32 *)) from CvROOT when it is set. I decided to sneak the slab into CvSTART during compilation, because enlarging the xpvcv struct by another pointer would make all CVs larger, even though this patch benefits only a few (programs using string eval).
	When the CVf_SLABBED flag is set, the CV takes responsibility for freeing the slab. If CvROOT is not set when the CV is freed or undeffed, it is assumed that a compilation error has occurred, so the op slab is traversed and all the ops are freed.
	Under normal circumstances, the CV forgets about its slab (decrementing the reference count) when the root is attached. So the slab reference counting that happens when ops are freed takes care of freeing the slab. In some cases, the CV is told to forget about the slab (cv_forget_slab) precisely so that the ops can survive after the CV is done away with.
	Forgetting the slab when the root is attached is not strictly necessary, but avoids potential problems with CvROOT being written over. There is code all over the place, both in core and on CPAN, that does things with CvROOT, so forgetting the slab makes things more robust and avoids potential problems.
	Since the CV takes ownership of its slab when flagged, that flag is never copied when a CV is cloned, as one CV could free a slab that another CV still points to, since forced freeing of ops ignores the reference count (but asserts that it looks right).
	To avoid slab fragmentation, freed ops are marked as freed and attached to the slab’s freed chain (an idea stolen from DBM::Deep). Those freed ops are reused when possible. I did consider not reusing freed ops, but realised that would result in significantly higher memory usage for programs with large ‘if (DEBUG) {...}’ blocks.
	SAVEFREEOP was slightly problematic. Sometimes it can cause an op to be freed after its CV. If the CV has forcibly freed the ops on its slab and the slab itself, then we will be fiddling with a freed slab. Making SAVEFREEOP a no-op won’t help, as sometimes an op can be savefreed when there is no compilation error, so the op would never be freed. It holds a reference count on the slab, so the whole slab would leak. So SAVEFREEOP now sets a special flag on the op (->op_savefree). The forced freeing of ops after a compilation error won’t free any ops thus marked.
	Since many pieces of code create tiny subroutines consisting of only a few ops, and since a huge slab would be quite a bit of baggage for those to carry around, the first slab is always very small. To avoid allocating too many slabs for a single CV, each subsequent slab is twice the size of the previous.
	Smartmatch expects to be able to allocate an op at run time, run it, and then throw it away. For that to work the op is simply mallocked when PL_compcv hasn’t been set up. So all slab-allocated ops are marked as such (->op_slabbed), to distinguish them from mallocked ops.
	All of this is kept under lock and key via #ifdef PERL_CORE, as it should be completely transparent. If it isn’t transparent, I would consider that a bug.
	I have left the old slab allocator (PL_OP_SLAB_ALLOC) in place, as it is used by PERL_DEBUG_READONLY_OPS, which I am not about to rewrite. :-)
	Concerning the change from A to X for slab allocation functions: Many times in the past, A has been used for functions that were not intended to be public but were used for public macros. Since PL_OP_SLAB_ALLOC is rarely used, it didn’t make sense for Perl_Slab_* to be API functions, since they were rarely actually available. To avoid propagating this mistake further, they are now X.
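Two of the ideas in this message, a small first slab with each later slab twice the size of the previous one, and a freed chain whose entries are reused before fresh slots are handed out, can be sketched in isolation. The sketch below is a deliberate simplification: it uses fixed-size slots, no reference counts and no CV, and every `demo_` name and the starting size of 4 are assumptions made for the example rather than anything taken from op.c.

```c
#include <stddef.h>
#include <stdlib.h>

#define DEMO_FIRST_SLAB_SLOTS 4      /* assumed starting size, not Perl's */

typedef struct demo_slot {
    struct demo_slot *next_free;     /* links freed slots for reuse */
    int payload;
} demo_slot;

typedef struct demo_slab {
    struct demo_slab *next;          /* previously created slab */
    size_t capacity;                 /* number of slots in this slab */
    size_t used;                     /* slots handed out so far */
    demo_slot slots[];               /* flexible array member */
} demo_slab;

typedef struct demo_pool {
    demo_slab *head;                 /* most recently created slab */
    demo_slot *freed;                /* chain of freed slots */
} demo_pool;

static demo_slot *demo_alloc(demo_pool *p)
{
    if (p->freed) {                  /* reuse a freed slot first */
        demo_slot *s = p->freed;
        p->freed = s->next_free;
        return s;
    }
    if (!p->head || p->head->used == p->head->capacity) {
        /* First slab is small; each later slab doubles the previous one. */
        size_t cap = p->head ? p->head->capacity * 2 : DEMO_FIRST_SLAB_SLOTS;
        demo_slab *slab = malloc(sizeof *slab + cap * sizeof(demo_slot));
        if (!slab)
            return NULL;
        slab->next = p->head;
        slab->capacity = cap;
        slab->used = 0;
        p->head = slab;
    }
    return &p->head->slots[p->head->used++];
}

/* Marks a slot as free by pushing it onto the freed chain. */
static void demo_free(demo_pool *p, demo_slot *s)
{
    s->next_free = p->freed;
    p->freed = s;
}

/* Releases every slab at once, in the spirit of the forced cleanup
 * that happens when a CV is discarded after a compilation error. */
static void demo_pool_destroy(demo_pool *p)
{
    while (p->head) {
        demo_slab *next = p->head->next;
        free(p->head);
        p->head = next;
    }
    p->freed = NULL;
}
```

Starting small keeps tiny subroutines cheap, while doubling bounds the number of slabs a large subroutine needs to roughly the logarithm of its op count.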
*	Reset the iterator when an array is cleared	Vincent Pit	2012-06-22	1	-0/+1
\| \| \| \|	This fixes RT #75596.
*	Refactor \x processing to single function	Karl Williamson	2012-06-20	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	There are three places that process \x. These can and did get out of sync. This moves all three to use a common static inline function so that they all do the same thing on the same inputs, and their behaviors will not drift apart again. This commit should not change current behavior. A previous commit was designed to bring all three to identical behavior.
*	Don’t create pads for sub stubs	Father Chrysostomos	2012-06-15	1	-0/+1
	Two code paths, sv_2cv (for \&name) and get_cvn_flags (for &{"name"}()) were using start_subparse and newATTRSUB to create a subroutine stub, which is what usually happens for Perl subs (with op trees). This resulted in subs with unused pads attached to them, because start_subparse sets up the pad, which must be accessible during parsing. One code path, gv_init, which (among other things) reifies a GV after a sub declaration (like ‘sub foo;’, which for efficiency doesn’t create a CV), created the subroutine stub itself, without using start_subparse/newATTRSUB. This commit takes the code from gv_init, makes it more generic so it can apply to the other two cases, puts it in a new function called newSTUB, and makes all three locations call it. Now stub creation should be faster and use less memory. Additionally, this commit causes sv_2cv and get_cvn_flags to bypass bug #107370 (glob stringification not round-tripping properly). They used to stringify the GV and pass the string to newATTRSUB (wrapped in an op, of all things) for it to look up the GV again. While bug been fixed, as it was a side effect of sv_2cv triggering bug #107370.
*	S_regcppush/pop : don't save PL_reginput	David Mitchell	2012-06-13	1	-1/+1
\| \| \| \| \| \| \|	currently, S_regcppush() pushes PL_reginput, then S_regcppop() pops its value and returns it. However, all calls to S_regcppop() are currently in void context, so nothing actually uses this value. So don't save it in the first place.
*	eliminate PL_reglast(close)?paren, PL_regoffs	David Mitchell	2012-06-13	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	eliminate the three vars PL_reglastcloseparen PL_reglastparen PL_regoffs (which are actually aliases to PL_reg_state struct elements). These three vars always point to the corresponding fields within the currently executing regex; so just access those fields directly instead. This makes switching between regexes with (??{}) simpler: just update rex, and everything automatically references the new fields.
*	make is_bare_re bool, not int in re_op_compile	David Mitchell	2012-06-13	1	-1/+1
\| \| \| \| \|	This flag pointer only stores truth, so make it a pointer to a bool rather than to an int.
*	eliminate sv_compile_2op, sv_compile_2op_is_broken	David Mitchell	2012-06-13	1	-8/+1
\| \| \| \| \| \| \| \| \|	These two functions, which have been a pimple on the face of perl for far too long, are no longer needed, now that regex code blocks are compiled in a sensible manner. This also allows S_doeval() to be simplified, now that it is no longer called from sv_compile_2op_is_broken().
*	Fix up runtime regex codeblocks.	David Mitchell	2012-06-13	1	-1/+2
	The previous commits in this branch have brought literal code blocks into the New World Order; now do the same for runtime blocks, i.e. those needing "use re 'eval'".
	The main user-visible changes from this commit are that:
	    * the code is now fully parsed, rather than needing balanced {}'s; i.e. this now works:
	          my $code = q[ (?{ $a = '{' }) ]; use re 'eval'; /$code/
	    * warnings and errors are now reported as coming from "(eval NNN)" rather than "(re_eval NNN)" (although see the next commit for some fixups to that). Indeed, the string "re_eval" has been expunged from the source and documentation.
	The big internal difference is that the sv_compile_2op() and sv_compile_2op_is_broken() functions are no longer used, and will be removed shortly.
	It works by the regex compiler detecting the presence of run-time code blocks, and feeding the whole pattern string back into the parser (where the run-time blocks are now seen as compile-time), then extracting out any compiled code blocks and adding them to the mix.
	For example, in the following:
	    $c = '(?{"runtime"})d'; use re 'eval'; /a(?{"literal"})\b'$c/
	At the point the regex compiler is called, the perl parser will already have compiled the literal code block and presented it to the regex engine. The engine examines the pattern string, sees two '(?{', but only one accounted for by the parser, and so constructs a short string to be evalled: based on the pattern, but with literal code-blocks blanked out, and \ and ' escaped. In the above example, the pattern string is
	    a(?{"literal"})\b'(?{"runtime"})d
	and we call eval_sv() with an SV containing the text
	    qr'a \\b\'(?{"runtime"})d'
	The returned qr will contain the new code-block (and associated CV and pad) which can be extracted and added to the list of compiled code blocks of the original pattern.
	Note that with this scheme, the requirement for "use re 'eval'" is easily determined, and no longer requires all the pp_regcreset / PL_reginterp_cnt machinery, which will be removed shortly.
	Two subtleties of this scheme are that normally, \\ isn't collapsed into \ for literal regexes (unlike literal strings), and hints aren't inherited when using eval_sv(). We get round both of these by adding and setting a new flag, PL_reg_state.re_reparsing, which indicates that we are refeeding a pattern into the perl parser.
*	add op_comp field to regexp_engine API	David Mitchell	2012-06-13	1	-2/+2
	Perl's internal function for compiling regexes that knows about code blocks, Perl_re_op_compile, isn't part of the engine API. However, the way that regcomp.c is dual-lifed as ext/re/re_comp.c with debugging compiled in, means that Perl_re_op_compile is also compiled as my_re_op_compile. These days the mechanism to choose whether to call the main functions or the debugging my_* functions when 'use re debug' is in scope, is the re engine API jump table. Ergo, to ensure that my_re_op_compile gets called appropriately, this method needs adding to the jump table. So, I've added it, but documented as 'for perl internal use only, set to null in your engine'. I've also updated current_re_engine() to always return a pointer to a jump table, even if we're using the internal engine (formerly it returned null). This then allows us to use the simple condition (eng->op_comp) to determine whether the current engine supports code blocks.
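The dispatch pattern described here, a jump table with an optional op_comp slot and a lookup function that never returns null, can be sketched roughly as below. All `demo_` names, the two-slot struct and the integer return values are assumptions for illustration; the real regexp_engine struct has more members and different signatures.

```c
#include <stddef.h>

typedef int (*demo_comp_fn)(const char *pattern);

/* Hypothetical, much reduced engine table. */
typedef struct demo_engine {
    demo_comp_fn comp;      /* plain-string compile; always present */
    demo_comp_fn op_comp;   /* code-block-aware compile; NULL if unsupported */
} demo_engine;

static int demo_core_comp(const char *p)    { (void)p; return 1; }
static int demo_core_op_comp(const char *p) { (void)p; return 2; }
static int demo_plugin_comp(const char *p)  { (void)p; return 3; }

static const demo_engine demo_core_engine   = { demo_core_comp, demo_core_op_comp };
static const demo_engine demo_plugin_engine = { demo_plugin_comp, NULL };

/* Always returns a table, never NULL, even for the built-in engine. */
static const demo_engine *demo_current_engine(const demo_engine *in_scope)
{
    return in_scope ? in_scope : &demo_core_engine;
}

/* Uses op_comp when the engine provides it, else falls back to comp. */
static int demo_compile(const demo_engine *in_scope, const char *pattern)
{
    const demo_engine *eng = demo_current_engine(in_scope);
    return eng->op_comp ? eng->op_comp(pattern) : eng->comp(pattern);
}
```

Because the lookup always yields a table, callers need only the single `eng->op_comp` test and no separate null check for the built-in engine.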
*	re_op_compile(): split flags into two arguments	David Mitchell	2012-06-13	1	-1/+1
	There are two sets of regex-related flags; the RXf_* which end up in the extflags field of a REGEXP, and the PMf_*, which are in the op_pmflags field of a PMOP. Since I added the PMf_HAS_CV and PMf_IS_QR flags, I've been conflating these two meanings in the single flags arg to re_op_compile(), which meant that some bits were being misinterpreted. The only test that was failing was peek.t, but it may have quietly broken other things that simply weren't tested for (for example PMf_HAS_CV and RXf_SPLIT share the same value, so something with split qr/(?{...})/ might get messed up). So, split this arg into two; one for the RXf_* flags, and one for the PMf_* flags. The public regexp API continues to have only a single flags arg, which should only be accepting RXf_* flags.
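The hazard of passing two overlapping flag sets through one argument can be shown with a small sketch. The bit values below are invented so that the two flags collide, as the message says PMf_HAS_CV and RXf_SPLIT do; the `DEMO_` macros and `demo_` functions are hypothetical and are not the definitions from regexp.h or op.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Invented values, chosen only so that the two sets collide. */
#define DEMO_RXF_SPLIT   (1u << 3)
#define DEMO_PMF_HAS_CV  (1u << 3)

/* Single-argument form: the callee cannot tell which set a bit came from. */
static bool demo_is_split_conflated(uint32_t flags)
{
    return (flags & DEMO_RXF_SPLIT) != 0;
}

/* Two-argument form: each set has its own parameter, so a PMf bit
 * can no longer be mistaken for an RXf bit. */
static bool demo_is_split(uint32_t rx_flags, uint32_t pm_flags)
{
    (void)pm_flags;   /* consulted only for PMf-specific decisions */
    return (rx_flags & DEMO_RXF_SPLIT) != 0;
}
```

With the single argument, a caller that sets only the "has CV" bit is wrongly treated as a split pattern; with two arguments the same call is classified correctly.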
*	re_op_compile(): rename pm_flags to rx_flags	David Mitchell	2012-06-13	1	-2/+2
	The orig_pm_flags argument and its modified copy, pm_flags, actually contain bits related to REGEXPs (i.e. RXf_*) rather than PMOPs (i.e. PMf_*); although there is some overlap between the two sets of bit flags. Rename the variables to make this less unclear. Ditto for re_compile().
*	"don't recompile pattern" check: account for UTF8	David Mitchell	2012-06-13	1	-1/+2
	When recompiling a pattern (e.g. for $x (x,y) { /$x/ }), it tests whether the new pattern string matches the old one, and if so skips recompiling it. However, it doesn't take account of the UTF8ness of the old and new patterns, so can falsely skip recompiling. Now fixed. Also, there is a feature in re_op_compile() that may abort a pattern compilation, upgrade the pattern to UTF8, then begin the compile again. I've added a second check for whether the pattern matches the old pattern, against the upgraded string. I can't see a way to test this, since it's just an optimisation. Arguably I could add a BEGIN in an embedded code block to see if it gets compiled twice, but soon I'm going to make it so that embedded code blocks always get recompiled anyway.
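The corrected check amounts to comparing the UTF8 flag as well as the bytes, since the same byte sequence means different characters depending on that flag. A minimal sketch, assuming a hypothetical `demo_pattern` struct rather than Perl's SV and REGEXP structures:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of a pattern: raw bytes plus a UTF8 flag. */
typedef struct demo_pattern {
    const char *bytes;
    size_t len;
    bool is_utf8;
} demo_pattern;

/* Recompilation may be skipped only if both the bytes and the
 * UTF8ness of the old and new patterns agree. */
static bool demo_can_skip_recompile(const demo_pattern *old_pat,
                                    const demo_pattern *new_pat)
{
    return old_pat->is_utf8 == new_pat->is_utf8
        && old_pat->len == new_pat->len
        && memcmp(old_pat->bytes, new_pat->bytes, old_pat->len) == 0;
}
```

For instance, the two bytes 0xC3 0xA9 are one character when flagged as UTF8 and two characters otherwise, so a byte-only comparison would wrongly treat the patterns as identical.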
*	Move bulk of pp_regcomp() into re_op_compile()	David Mitchell	2012-06-13	1	-2/+5
	When called, pp_regcomp() is presented with a list of SVs on the stack. Previously, it would perform (amongst other things):
	    * overloading those SVs;
	    * concatenating them;
	    * detection of bare /$qr/;
	    * detection of unchanged pattern;
	optionally followed by a call to the built-in or an external regexp compiler.
	Since we want to avoid premature concatenation (so that we can handle /$runtime(?{...})/), move all these activities from pp_regcomp() into re_op_compile(). This makes re_op_compile() a bit cumbersome, with a large arg list, but I haven't found any way of moving only a subset of the above.
	Note that a side-effect of this is that qr-overloading now works for all regex compilations, not just those reached via pp_regcomp(); in particular this now invokes the qr method rather than the "" method if available:
	    /(??{ $overloaded_object })/
*	change re_op_compile() to take a list of SVs	David Mitchell	2012-06-13	1	-2/+2
\| \| \| \| \| \| \| \|	rather than passing a single SV string containing the pattern, allow a list of SVs (plus count) to be passed. For the moment, only allow that list to be one element long, but this will allow us to directly pass in the list of SVs normally pre-processed into a single SV by pp_regcomp.
*	make qr/(?{})/ behave with closures	David Mitchell	2012-06-13	1	-1/+1
	With this commit, qr// with a literal (compile-time) code block will Do the Right Thing as regards closures and the scope of lexical vars; in particular, the following now creates three regexes that match 0, 1 and 2:
	    for my $i (0..2) { push @r, qr/^(??{$i})$/; }
	    "1" =~ $r[1]; # matches
	Previously, $i would be evaluated as undef in all 3 patterns.
	This is achieved by wrapping the compilation of the pattern within a new anonymous CV, which is then attached to the pattern. At run-time pp_qr() clones the CV as well as copying the REGEXP; and when the code block is executed, it does so using the pad of the cloned CV. Which makes everything come out all right in the wash. The CV is stored in a new field of the REGEXP, called qr_anoncv.
	Note that run-time qr//s are still not fixed, e.g. qr/$foo(?{...})/; nor is it yet fixed where the qr// is embedded within another pattern: continuing with the code example from above,
	    my $i = 99;
	    "1" =~ $r[1]; # bare qr: matches: correct!
	    "X99" =~ /X$r[1]/; # embedded qr: matches: whoops, it's still seeing the wrong $i
*	add Perl_re_op_compile function	David Mitchell	2012-06-13	1	-0/+2
|	Make Perl_re_compile() a thin wrapper around a new function, Perl_re_op_compile(). This function can take either a string pattern or a list of ops. Then make pmruntime() pass a list of ops directly to it, rather than concatenating all the consts into a single string and passing the const to Perl_re_compile().
|
|	For now, Perl_re_op_compile just does the same: if it's passed an op tree rather than an SV, then it just concats the consts. So this is just the next step towards eventually allowing the regex engine to use the ops directly.
*	add Perl_current_re_engine() function	David Mitchell	2012-06-13	1	-0/+1
|	Abstract out into a separate function the task of finding the current in-scope regex engine ($^H{regex}). Currently this task is only done in one place each for compile- and run-time, but shortly we'll need it in other places too.
*	Add alloccopstash provisionally to the API	Father Chrysostomos	2012-06-08	1	-1/+1
|
*	[perl #109542] Make num ops treat $1 as "$1"	Father Chrysostomos	2012-06-07	1	-0/+1
|	Numeric ops were not taking magical variables into account. So $1 (a magical variable) would be treated differently from "$1" (a non-magical variable).
|
|	In determining whether to use an integer operation, they would call SvIV_please_nomg, and then check whether the sv was SvIOK as a result. SvIV_please_nomg would call SvIV_nomg if the sv were SvPOK or SvNOK.
|
|	The problem here is that gmagical variables are never SvIOK, but only SvIOKp.
|
|	In fact, the private flags are used differently for gmagical and non-magical variables. For non-gmagical variables, the private flag indicates that there is a cached value. If the public flag is not set, then the cached value is imprecise. For gmagical variables, imprecise values are never cached; only the private flags are used, and they are equivalent to the public flags on non-gmagical variables.
|
|	This commit changes SvIV_please_nomg to take gmagical variables into account, using the newly-added sv_gmagical_2iv_please (see the docs for it in the diff). SvIV_please_nomg now returns true or false, not void, since a subsequent SvIOK is not reliable. So ‘SvIV_please_nomg(sv); if(SvIOK)’ becomes ‘if(SvIV_please_nomg(sv))’.
*	Make setdefout accept only NN	Father Chrysostomos	2012-06-07	1	-1/+1
|	Just search through the source for GvIOp(PL_defoutgv) and you will see that perl assumes that PL_defoutgv is never null. I tried setting it to null from XS and got crashes, unsurprisingly. The only CPAN user of PL_defoutgv sets it to STDOUT.
*	Do away with stashpv_hvname_match	Father Chrysostomos	2012-06-04	1	-2/+0
|	For some reason this is listed in the API, even though it is not documented and is only available under ithreads. It was added by commit ed221c5717, which doesn’t explain why it needed to be part of the API. (Presumably because a public macro used it, even though there are better ways to solve that.) It is unused on CPAN and (now) in core, so there is no reason to keep it.
*	[perl #78742] Store CopSTASH in a pad under threads	Father Chrysostomos	2012-06-04	1	-0/+3
|	Before this commit, a pointer to the cop’s stash was stored in cop->cop_stash under non-threaded perls, and the name and name length were stored in cop->cop_stashpv and cop->cop_stashlen under ithreads.
|
|	Consequently, eval "__PACKAGE__" would end up returning the wrong package name under threads if the current package had been assigned over.
|
|	This commit changes the way cops store their stash under threads. Now it is an offset (cop->cop_stashoff) into the new PL_stashpad array (just a mallocked block), which holds pointers to all stashes that have code compiled in them.
|
|	I didn’t use the lexical pads, because CopSTASH(cop) won’t work unless PL_curpad is holding the right pad. And things start to get very hairy in pp_caller, since the correct pad isn’t anywhere easily accessible on the context stack (oldcomppad actually referring to the current comppad). The approach I’ve followed uses far less code, too.
|
|	In addition to fixing the bug, this also saves memory. Instead of allocating a separate PV for every single statement (to hold the stash name), now all lines of code in a package can share the same stashpad slot. So, on a 32-bit OS X, that’s 16 bytes less memory per COP for short package names. Since stashoff is the same size as stashpv, there is no difference there. Each package now needs just 4 bytes in the stashpad for storing a pointer.
|
|	For speed’s sake PL_stashpadix stores the index of the last-used stashpad offset. So only when switching packages is there a linear search through the stashpad.
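The lookup scheme described in this message (an array of stash pointers, an offset stored per statement, and a cached last-used index) can be sketched roughly as follows. This is a toy model under stated assumptions, not the code from the commit: the `stash` and `stashpad` types and the functions `stashpad_offset` and `stashpad_fetch` are hypothetical names invented for illustration, and the growth policy is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a stash (a package's symbol table). */
typedef struct stash { const char *name; } stash;

/* Toy model of a stashpad: a malloc'ed array of stash pointers plus the
 * index of the most recently used slot. All names are illustrative. */
typedef struct {
    stash **slots;
    size_t  len;      /* slots in use */
    size_t  cap;      /* slots allocated */
    size_t  last_ix;  /* cache: slot returned by the previous lookup */
} stashpad;

/* Return the offset of `st` in the pad, appending it if absent.
 * Returns (size_t)-1 if memory cannot be allocated. */
static size_t stashpad_offset(stashpad *pad, stash *st)
{
    assert(pad != NULL && st != NULL);

    /* Fast path: consecutive statements usually share a package. */
    if (pad->len > 0 && pad->slots[pad->last_ix] == st)
        return pad->last_ix;

    /* Slow path: linear search, reached only when the package changes. */
    for (size_t i = 0; i < pad->len; i++) {
        if (pad->slots[i] == st) {
            pad->last_ix = i;
            return i;
        }
    }

    /* Not present yet: append, growing the block if necessary. */
    if (pad->len == pad->cap) {
        size_t ncap = pad->cap ? pad->cap * 2 : 4;
        stash **grown = realloc(pad->slots, ncap * sizeof *grown);
        if (grown == NULL)
            return (size_t)-1;
        pad->slots = grown;
        pad->cap = ncap;
    }
    pad->slots[pad->len] = st;
    pad->last_ix = pad->len;
    return pad->len++;
}

/* Recover the stash pointer from an offset stored in a statement. */
static stash *stashpad_fetch(const stashpad *pad, size_t off)
{
    assert(pad != NULL && off < pad->len);
    return pad->slots[off];
}
```

In this model every statement compiled in the same package stores the same small offset, so the per-statement cost is one integer and each package costs one pointer-sized slot, which is the memory saving the message describes.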
*	remove deprecated qw-as-parens behaviour	Zefram	2012-05-25	1	-2/+0
|
*	The reentrant API should always have prototypes.	Craig A. Berry	2012-05-24	1	-2/+2
|	reentr.c always defines and exports its functions even when USE_REENTRANT_API is not defined (though they'll be empty functions in that case). In general we shouldn't be exporting functions without providing prototypes for them, but specifically, when compiling with C++, the prototype-less functions get their names mangled. So the purpose of defining the functions when we aren't using them (to have a consistent API) is defeated because no one looking for those functions under their proper names would be able to find them.
|
|	So this makes us stop hiding the prototypes when USE_REENTRANT_API is not defined.
*	utf8.c: Add nomix-ASCII option to to_fold functions	Karl Williamson	2012-05-22	1	-1/+1
|	Under /iaa regex matching, folds that cross the ASCII/non-ASCII boundary are prohibited. This changes the _to_uni_fold_flags() and _to_utf8_fold_flags() functions to take a new flag which, when set, tells them not to accept such folds. This allows us to later move the intelligence for handling this situation to these centralized functions.
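The idea of a fold function that refuses folds crossing the ASCII/non-ASCII boundary can be illustrated with a small sketch. It is not Perl's implementation: the flag name `FOLD_NOMIX_ASCII`, the functions `simple_fold` and `fold_with_flags`, and the tiny hard-coded fold table are assumptions made for illustration, and returning the input unchanged when a fold is refused is this sketch's own choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag: refuse folds that cross the ASCII boundary. */
#define FOLD_NOMIX_ASCII 0x1u

static bool is_ascii(uint32_t cp) { return cp < 0x80; }

/* Toy single-character fold table covering only a few code points. */
static uint32_t simple_fold(uint32_t cp)
{
    if (cp >= 'A' && cp <= 'Z')
        return cp + ('a' - 'A');
    switch (cp) {
    case 0x212A: return 'k';      /* KELVIN SIGN folds to ASCII k */
    case 0x017F: return 's';      /* LATIN SMALL LETTER LONG S folds to s */
    case 0x00C0: return 0x00E0;   /* A WITH GRAVE: capital to small */
    default:     return cp;
    }
}

/* Fold `cp`; with FOLD_NOMIX_ASCII set, reject any fold whose input and
 * output lie on opposite sides of the ASCII boundary. */
static uint32_t fold_with_flags(uint32_t cp, unsigned flags)
{
    assert(cp <= 0x10FFFF);

    uint32_t folded = simple_fold(cp);
    if ((flags & FOLD_NOMIX_ASCII) && is_ascii(cp) != is_ascii(folded))
        return cp;   /* refused: sketch returns the input unchanged */
    return folded;
}
```

With the flag clear, `fold_with_flags(0x212A, 0)` gives `'k'`; with the flag set it gives back `0x212A`, so a pattern such as /k/iaa would not match the Kelvin sign, while folds that stay on one side of the boundary (`'K'` to `'k'`, U+00C0 to U+00E0) are unaffected.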