delta/perl.git - github.com: perl/perl5.git

	Commit message	Author	Age	Files	Lines
*	[perl #122911] regexp.h: Rmv VOL from op_comp sig	Father Chrysostomos	2014-10-06	1	-1/+1
\| \| \| \|	It is no longer needed as of 1067df30ae9.
*	Suppress some Solaris warnings	Karl Williamson	2014-09-29	1	-15/+16
\| \| \| \| \| \| \| \|	We get an integer overflow message when we left shift a 1 into the highest bit of a word. This changes the 1's into 1U's to indicate unsigned. This is done for all the flag bits in the affected word, as they could get reordered by someone in the future, unintentionally reintroducing this problem.
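The overflow described above can be sketched in isolation. The flag names here are hypothetical, not the actual regexp.h defines, and the sketch assumes a 32-bit `int`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits, NOT the real regexp.h names.
 * With a 32-bit int, (1 << 31) shifts into the sign bit and overflows;
 * (1U << 31) is well defined because the operand is unsigned. */
#define EX_FLAG_LOW   (1U << 0)
#define EX_FLAG_MID   (1U << 15)
#define EX_FLAG_HIGH  (1U << 31)

/* Return nonzero if `flag` is set in `flags`. */
static int ex_flag_is_set(uint32_t flags, uint32_t flag)
{
    return (flags & flag) != 0;
}

/* Set every example flag and return the resulting word. */
static uint32_t ex_all_flags(void)
{
    uint32_t flags = 0;
    flags |= EX_FLAG_LOW;
    flags |= EX_FLAG_MID;
    flags |= EX_FLAG_HIGH;
    assert(ex_flag_is_set(flags, EX_FLAG_HIGH));
    return flags;
}
```

Using `1U` for every bit, not only the highest one, means that reordering the defines later cannot silently move a signed shift into the top position.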
*	Deprecate multiple "x" in "/xx"	Karl Williamson	2014-09-29	1	-5/+12
\| \| \| \| \| \| \| \| \| \|	It is planned for a future Perl release to have /xx mean something different from just /x. To prepare for this, this commit raises a deprecation warning if someone currently has this usage. A grep of CPAN did not turn up any instances of this, but this is to be safe anyway. The added code is more general than actually needed, in case we want to do this for another flag.
*	Make space for /xx flag	Karl Williamson	2014-09-29	1	-2/+2
\| \| \| \| \| \|	This doesn't actually use the flag yet. We no longer have to make version-dependent changes to ext/Devel-Peek/t/Peek.t, (it being in /ext) so this doesn't
*	regexp.h: Comment shared-pool free bits scheme	Karl Williamson	2014-09-29	1	-3/+39
\|
*	regexp.h: Make tentative division of free-bit space	Karl Williamson	2014-09-29	1	-20/+18
\| \| \| \| \| \|	This sets a #define to point in the middle of the free-space, so that bits at either end can be added without having to adjust many other defines.
*	regexp.h: Define flag bit directly, not indirectly	Karl Williamson	2014-09-29	1	-8/+5
\| \| \| \| \| \| \|	This #defined a symbol then did a compile time check that it was the same as another symbol. This commit simply defines it as the other symbol directly, and moves it to above the other definitions, which it no longer is part of. This prepares for the next commit.
*	regexp.h Remove unused bit placeholders	Karl Williamson	2014-09-29	1	-6/+1
\| \| \| \| \| \|	We do not need a placeholder for unused flag bits. And removing them makes the generated regnodes.h more accurate as to what bits are available.
*	regexp.h: Move regex flag bit positions.	Karl Williamson	2014-09-29	1	-5/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This moves three bits to create a block of unused bits at the beginning. The first bit had to be moved to make space for other uses that are coming in future commits. This breaks binary compatibility, so might as well move the other two bits so that all the unused bits are consolidated at the beginning. This pool of unused bits is the boundary between the bits that are common to op.h and regexp.h (and in op_reg_common.h) and those that are separate. It's best to have all the unused bits there, so when we need to use one, it can be taken from either side, as needed, without us being trapped into having an available bit, but of the wrong kind.
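A minimal sketch of the layout idea: shared bits at the low end, private bits defined relative to a base shift, and the unused pool in between. All names and bit counts are hypothetical; the real definitions live in op_reg_common.h and regexp.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only. */
#define EX_SHARED_BITS   5   /* bits 0..4: common to both headers    */
#define EX_FREE_BITS     4   /* bits 5..8: unused pool in the middle */
#define EX_PRIVATE_BASE  (EX_SHARED_BITS + EX_FREE_BITS)

#define EX_SHARED_FLAG(n)   (1U << (n))
#define EX_PRIVATE_FLAG(n)  (1U << (EX_PRIVATE_BASE + (n)))

/* Mask covering exactly the unused pool between the two groups. */
static uint32_t ex_free_mask(void)
{
    uint32_t below_private = (1U << EX_PRIVATE_BASE) - 1U;
    uint32_t shared        = (1U << EX_SHARED_BITS) - 1U;

    assert(EX_PRIVATE_BASE < 32);
    return below_private & ~shared;
}
```

Because the pool sits on the boundary, a new shared flag can take its lowest bit and a new private flag its highest bit without renumbering the other group.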
*	Some low-hanging -Wunreachable-code fruits.	Jarkko Hietaniemi	2014-06-15	1	-31/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- after return/croak/die/exit, return/break are pointless (break is not a terminator/separator, it's a goto) - after goto, another goto (!) is pointless - in some cases (usually function ends) introduce explicit NOT_REACHED to make the noreturn nature clearer (do not do this everywhere, though, since that would mean adding NOT_REACHED after every croak) - for the added NOT_REACHED also add /* NOTREACHED */ since NOT_REACHED is for gcc (and VC), while the comment is for linters - declaring variables in switch blocks is just too fragile: it kind of works for narrowing the scope (which is nice), but breaks the moment there are initializations for the variables (the initializations will be skipped since the flow will bypass the start of the block); in some easy cases simply hoist the declarations out of the block and move them earlier Note 1: Since after this patch the core is not yet -Wunreachable-code clean, not enabling that via cflags.SH, one needs to -Accflags=... it. Note 2: At least with the older gcc 4.4.7 there are far too many "unreachable code" warnings, which seem to go away with gcc 4.8, maybe better flow control analysis. Therefore, the warning should eventually be enabled only for modernish gccs (what about clang and Intel cc?)
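The point about declarations in switch blocks can be illustrated with a small sketch. The function and variable names are invented for the example; the fragile form is shown only in a comment because compilers may warn about it.

```c
#include <assert.h>

/* Fragile form (comment only):
 *
 *     switch (kind) {
 *         int scale = 10;      // control jumps past this initialization
 *     case 1:
 *         return n * scale;    // reads an uninitialized variable
 *     }
 */

/* Hoisted form: the declaration is moved out of the switch block,
 * so the initialization always runs before any case label is reached. */
static int ex_scaled(int kind, int n)
{
    int scale = 10;

    switch (kind) {
    case 1:
        return n * scale;
    case 2:
        return n + scale;
    default:
        break;
    }
    return n;
}
```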
*	Revert "Some low-hanging -Wunreachable-code fruits."	Jarkko Hietaniemi	2014-06-13	1	-2/+2
\| \| \| \| \| \| \|	This reverts commit 8c2b19724d117cecfa186d044abdbf766372c679. I don't understand - smoke-me came back happy with three separate reports... oh well, some other time.
*	Some low-hanging -Wunreachable-code fruits.	Jarkko Hietaniemi	2014-06-13	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- after croak/die/exit (or return), break (or return!) are pointless (break is not a terminator/separator, it's a promise of a jump) - after goto, another goto (!) is pointless - in some cases (usually function ends) introduce explicit NOT_REACHED to make the noreturn nature clearer (do not do this everywhere, though, since that would mean adding NOT_REACHED after every croak) - for the added NOT_REACHED also add /* NOTREACHED */ since NOT_REACHED is for gcc (and VC), while the comment is for linters - declaring variables in switch blocks is just too fragile: it kind of works for narrowing the scope (which is nice), but breaks the moment there are initializations for the variables (they will be skipped!); in some easy cases simply hoist the declarations out of the block and move them earlier There are still a few places left.
*	Undo 63b558ddd980cd36bcbd8a7465a3412e886ba75e.	Jarkko Hietaniemi	2014-05-29	1	-1/+1
\| \| \| \|	(For some odd reason assert() cannot be found and Jenkins becomes apoplectic.)
*	Use NOT_REACHED for the impossible case.	Jarkko Hietaniemi	2014-05-29	1	-1/+1
\| \| \| \| \| \| \| \|	The default case really is impossible because all the valid enum values are already covered in the switch. The NOT_REACHED; is for the compiler (from perl.h), the /* NOTREACHED */ is for static analyzers.
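A sketch of the pattern, using a stand-in macro. `EX_NOT_REACHED` and the enum are hypothetical; perl.h provides the real NOT_REACHED, whose definition is not reproduced here.

```c
#include <assert.h>

/* Stand-in for illustration only; NOT the perl.h definition. */
#define EX_NOT_REACHED assert(0)

typedef enum { EX_RED, EX_GREEN, EX_BLUE } ex_colour;

/* Every valid enum value has its own case, so the default is impossible. */
static int ex_colour_index(ex_colour c)
{
    switch (c) {
    case EX_RED:   return 0;
    case EX_GREEN: return 1;
    case EX_BLUE:  return 2;
    default:
        EX_NOT_REACHED; /* NOTREACHED */
    }
    return -1; /* only reachable if assert() is compiled out */
}
```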
*	[perl #121854] use re 'taint' regression	David Mitchell	2014-05-13	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit v5.19.8-533-g63baef5 changed the handling of locale-dependent regexes so that the pattern was considered tainted at compile-time, rather than determining it each time at run-time whenever it executed a locale-dependent node. Unfortunately due to the conflating of two flags, RXf_TAINTED and RXf_TAINTED_SEEN, it had the side effect of permanently marking a pattern as tainted once it had had a single tainted result. E.g. use re qw(taint); use Scalar::Util qw(tainted); for ($^X, "abc") { /(.)/ or die; print "not " unless tainted("$1"); print "tainted\n"; }; which from 5.19.9 onwards output: tainted tainted but with this commit (and with 5.19.8 and earlier), it now outputs: tainted not tainted The RXf_TAINTED flag indicates that the pattern itself is tainted, e.g. $r = qr/$tainted_value/ while the RXf_TAINTED_SEEN flag means that the results of the last match are tainted, e.g. use re 'taint'; $tainted =~ /(.)/; # $1 is tainted Pre 63baef5, the code used to look like: at run-time: turn off RXf_TAINTED_SEEN; while (nodes to execute) { switch(node) { case BOUNDL: /* and other locale-specific ops */ turn on RXf_TAINTED_SEEN; ...; } } if (tainted \|\| RXf_TAINTED) turn on RXf_TAINTED_SEEN; 63baef5 changed it to: at compile-time: if (pattern has locale ops) turn on RXf_TAINTED_SEEN; at run-time: while (nodes to execute) { ... } if (tainted \|\| RXf_TAINTED) turn on RXf_TAINTED_SEEN; This commit changes it to: at compile-time: if (pattern has locale ops) turn on RXf_TAINTED; at run-time: turn off RXf_TAINTED_SEEN; while (nodes to execute) { ... } if (tainted \|\| RXf_TAINTED) turn on RXf_TAINTED_SEEN;
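The final scheme in the message above can be modelled as a small standalone sketch. The struct, functions and flag macros are hypothetical stand-ins for RXf_TAINTED and RXf_TAINTED_SEEN, not the engine's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two flags. */
#define EX_TAINTED       (1U << 0)  /* the pattern itself is tainted      */
#define EX_TAINTED_SEEN  (1U << 1)  /* the last match's results are tainted */

typedef struct { unsigned flags; } ex_regex;

/* Compile time: a pattern with locale ops is marked as a tainted pattern. */
static void ex_compile(ex_regex *rx, bool has_locale_ops)
{
    assert(rx != 0);
    rx->flags = has_locale_ops ? EX_TAINTED : 0;
}

/* Run time: clear the per-match flag first, then recompute it. */
static bool ex_match(ex_regex *rx, bool input_tainted)
{
    rx->flags &= ~EX_TAINTED_SEEN;
    /* ... nodes would be executed here ... */
    if (input_tainted || (rx->flags & EX_TAINTED))
        rx->flags |= EX_TAINTED_SEEN;
    return (rx->flags & EX_TAINTED_SEEN) != 0;
}
```

Clearing the per-match flag at the start of each run is what stops one tainted result from marking every later match as tainted.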
*	regex substrs: record index of check substr	David Mitchell	2014-02-07	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently prog->substrs->data[] is a 3 element array of structures. Elements 0 and 1 record the longest anchored and floating substrings, while element 2 ('check'), is a copy of the longest of 0 and 1. Record in a new field, prog->substrs->check_ix, the index of which element was copied. (Eventually I intend to remove the copy altogether.) Also for the anchored substr, set max_offset equal to min_offset. Previously it was left as zero and ignored, although if copied to check, the check copy of max was set equal to min. Having this always set will allow us to make the code simpler.
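A simplified sketch of the data layout described above. The struct and field names are reduced, hypothetical versions of the real ones, and the tie-breaking rule (anchored wins on equal length) is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced, hypothetical version of one substring record. */
typedef struct {
    const char *substr;     /* the fixed string, or NULL if absent */
    size_t      len;        /* length in characters                */
    ptrdiff_t   min_offset;
    ptrdiff_t   max_offset;
} ex_substr_datum;

typedef struct {
    ex_substr_datum data[3]; /* 0 = anchored, 1 = floating, 2 = check (copy) */
    int check_ix;            /* which of 0 or 1 was copied into data[2]      */
} ex_substr_data;

/* Pick the longer of anchored/floating as the check substring. */
static void ex_set_check(ex_substr_data *s)
{
    assert(s != 0);
    /* An anchored substring has a single possible position. */
    s->data[0].max_offset = s->data[0].min_offset;
    s->check_ix = (s->data[1].len > s->data[0].len) ? 1 : 0;
    s->data[2]  = s->data[s->check_ix];
}
```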
*	regexp.h: document the fields of reg_substr_datum	David Mitchell	2014-02-07	1	-3/+3
\| \| \| \| \|	In particular, specify that the various offset fields are char rather than byte counts.
*	Avoid compiler warnings by consistently using #ifdef instead of plain #if	Brian Fraser	2014-02-05	1	-1/+1
\|
*	Add RXf_UNBOUNDED_QUANTIFIER and regexp->maxlen	Yves Orton	2014-02-03	1	-1/+2
\| \| \| \| \| \| \| \| \|	The flag tells us that a pattern may match an infinitely long string. The new member in the regexp struct tells us how long the string might be. With these two items we can implement regexp based $/
*	Move the RXf_ANCH flags to intflags as PREGf_ANCH_xxx and add RXf_IS_ANCHORED as a replacement	Yves Orton	2014-01-31	1	-8/+5
\| \| \| \| \| \| \| \| \| \|	The only requirement outside of the regex engine is to identify that there is an anchor involved at all. So we move the 4 anchor flags to intflags and replace them with a single aggregate flag RXf_IS_ANCHORED in extflags. This frees up another 3 bits in extflags.
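The aggregate-flag idea can be sketched as follows. The macro names and bit positions are hypothetical, chosen only to show four internal bits collapsing to one external bit.

```c
#include <assert.h>

/* Hypothetical internal anchor flags (stand-ins for PREGf_ANCH_xxx). */
#define EX_INT_ANCH_A  (1U << 0)
#define EX_INT_ANCH_B  (1U << 1)
#define EX_INT_ANCH_C  (1U << 2)
#define EX_INT_ANCH_D  (1U << 3)
#define EX_INT_ANCH_MASK \
    (EX_INT_ANCH_A | EX_INT_ANCH_B | EX_INT_ANCH_C | EX_INT_ANCH_D)

/* Hypothetical external aggregate flag (stand-in for RXf_IS_ANCHORED). */
#define EX_EXT_IS_ANCHORED (1U << 0)

/* Code outside the engine only needs to know whether any anchor exists. */
static unsigned ex_ext_anchor_flag(unsigned intflags)
{
    unsigned ext = (intflags & EX_INT_ANCH_MASK) ? EX_EXT_IS_ANCHORED : 0;
    assert(ext == 0 || ext == EX_EXT_IS_ANCHORED);
    return ext;
}
```

Four bits in the external word become one, which matches the three freed bits the message mentions.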
*	rename RXf_UNUSED flags to match their BASE_SHIFT offset	Yves Orton	2014-01-31	1	-4/+4
\| \| \| \|	So they stay stable as I move other flags from extflags to intflags
*	move RXf_GPOS_SEEN and RXf_GPOS_FLOAT to intflags	Yves Orton	2014-01-31	1	-5/+4
\| \| \| \| \| \| \| \|	This required removing the RXf_GPOS_CHECK mask as it uses one flag that will stay in extflags for now (RXf_ANCH_GPOS), and one flag that moves to intflags (RXf_GPOS_SEEN). This mask is strange however, as you can't have RXf_ANCH_GPOS without having RXf_GPOS_SEEN so I don't know why we test both. Further investigation required.
*	Rename RXf_CANY_SEEN to PREGf_CANY_SEEN and move from extflags to intflags	Yves Orton	2014-01-31	1	-2/+2
\|
*	move RXf_NOSCAN from extflags to intflags as PREGf_NOSCAN	Yves Orton	2014-01-31	1	-1/+1
\| \| \| \| \|	Includes some improvements to how we dump regexps so that when a regexp is for the standard perl engine we also show the intflags for the engine
*	perlapi: Consistent spaces after dots	Father Chrysostomos	2013-12-29	1	-1/+1
\| \| \| \|	plus some typo fixes. I probably changed some things in perlintern, too.
*	Use SSize_t/STRLEN in more places in regexp code	Father Chrysostomos	2013-08-25	1	-10/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As part of getting the regexp engine to handle long strings, this commit changes any variables, parameters and struct members that hold lengths of the string being matched against (or parts thereof) to use SSize_t or STRLEN instead of [IU]32. To avoid having to change any logic, I kept the signedness the same. I did not change anything that affects the length of the regular expression itself, so regexps are still practically limited to I32_MAX. Changing that would involve changing the size of regnodes, which would be a lot more involved. These changes should fix bugs, but are very hard to test. In most cases, I don’t know the regexp engine well enough to come up with test cases that test the paths in question with long strings. In other cases I don’t have a box with enough memory to test the fix.
*	Stop substr re optimisation from rejecting long strs	Father Chrysostomos	2013-08-25	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Using I32 for the fields that record information about the location of a fixed string that must be found for a regular expression to match can result in match failures, because I32 is not large enough to store offsets >= 2**31. SSize_t is appropriate, since it is 64 bits on 64-bit platforms and 32 bits on 32-bit platforms. This commit changes enough instances of I32 to SSize_t to get the added test passing and suppress compiler warnings. A later commit will change many more.
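The size limit behind this change is plain arithmetic, which the following sketch checks. The helper name is invented; `int32_t` stands in for I32 here as an assumption about its width.

```c
#include <assert.h>
#include <stdint.h>

/* True if `offset` can be stored in a 32-bit signed field.
 * int32_t stands in for I32 in this sketch. */
static int ex_fits_in_i32(int64_t offset)
{
    int fits = (offset >= INT32_MIN && offset <= INT32_MAX);
    assert(fits == 0 || fits == 1);
    return fits;
}
```

An offset of exactly 2**31 is one past INT32_MAX, so it is the first value a 32-bit signed field cannot represent, while a 64-bit signed type holds it easily.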
*	Make $' work past the 2**31 threshold	Father Chrysostomos	2013-08-25	1	-1/+1
\|
*	[perl #116907] Allow //g matching past 2**31 threshold	Father Chrysostomos	2013-08-25	1	-3/+4
\| \| \| \| \| \| \| \| \|	Change the internal fields for storing positions so that //g in scalar context can move past the 2**31 character threshold. Before this commit, the numbers would wrap, resulting in assertion failures. The changes in this commit are only enough to get the added test passing. Stay tuned for more.
*	Stop pos() from being confused by changing utf8ness	Father Chrysostomos	2013-08-25	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The value of pos() is stored as a byte offset. If it is stored on a tied variable or a reference (or glob), then the stringification could change, resulting in pos() now pointing to a different character offset or pointing to the middle of a character: $ ./perl -Ilib -le '$x = bless [], chr 256; pos $x=1; bless $x, a; print pos $x' 2 $ ./perl -Ilib -le '$x = bless [], chr 256; pos $x=1; bless $x, "\x{1000}"; print pos $x' Malformed UTF-8 character (unexpected end of string) in match position at -e line 1. 0 So pos() should be stored as a character offset. The regular expression engine expects byte offsets always, so allow it to store bytes when possible (a pure non-magical string) but use characters otherwise. This does result in more complexity than I should like, but the alternative (always storing a character offset) would slow down regular expressions, which is a big no-no.
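Why a byte offset and a character offset can disagree is shown by this small sketch. The helper is hypothetical, assumes well-formed UTF-8, and is not how the engine itself converts offsets.

```c
#include <assert.h>
#include <stddef.h>

/* Count the UTF-8 characters that start within the first `bytes`
 * bytes of `s`. Assumes `s` is well-formed UTF-8. */
static size_t ex_bytes_to_chars(const unsigned char *s, size_t bytes)
{
    size_t chars = 0;

    assert(s != 0);
    for (size_t i = 0; i < bytes; i++) {
        if ((s[i] & 0xC0) != 0x80)   /* not a continuation byte */
            chars++;
    }
    return chars;
}
```

For a string starting with U+0100 (encoded as the two bytes 0xC4 0x80), byte offset 2 corresponds to character offset 1, and byte offset 1 falls in the middle of that character.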
*	improve regexec_flags() API documentation	David Mitchell	2013-08-13	1	-11/+16
\| \| \| \| \| \|	In the API, rename the 'screamer' arg to be 'sv' instead; update the description of the function's args; improve the documentation of the REXEC_* flags for the 'flags' arg.
*	s/.(?=.\G)/X/g: refuse to go backwards	David Mitchell	2013-07-28	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On something like: $_ = "123456789"; pos = 6; s/.(?=.\G)/X/g; each iteration could in theory start with pos one character to the left of the previous position, and with the substitution replacing bits that it has already replaced. Since that way madness lies, ban any attempt by s/// to substitute to the left of a previous position. To implement this, add a new flag to regexec(), REXEC_FAIL_ON_UNDERFLOW. This tells regexec() to return failure even if the match itself succeeded, but where the start of $& is before the passed stringarg point. This change caused one existing test to fail (which was added about a year ago): $_="abcdef"; s/bc\|(.)\G(.)/$1 ? "[$1-$2]" : "XX"/ge; print; # used to print "aXX[c-d][d-e][e-f]"; now prints "aXXdef" I think that that test relies on ambiguous behaviour, and that my change makes things saner. Note that s/// with \G is generally very under-tested.
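The rule the new flag enforces can be reduced to a single comparison. The flag value and function below are hypothetical; only the name REXEC_FAIL_ON_UNDERFLOW comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for REXEC_FAIL_ON_UNDERFLOW. */
#define EX_FAIL_ON_UNDERFLOW (1U << 0)

/* Decide whether a successful match should still be reported as a
 * failure because it starts to the left of the requested start point. */
static int ex_accept_match(size_t match_start, size_t stringarg,
                           unsigned flags)
{
    if ((flags & EX_FAIL_ON_UNDERFLOW) && match_start < stringarg)
        return 0;   /* start of $& is before stringarg: report failure */
    return 1;
}
```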
*	regexec: handle \G ourself, rather than in callers	David Mitchell	2013-07-28	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Normally a /g match starts its processing at the previous pos() (or at char 0 if pos is not set); however in the case of something like /abc\G/ we actually need to start 3 characters before pos. This has been handled by the callers of regexec() subtracting prog->gofs from the stringarg arg before calling it, or by setting stringarg to strbeg for floating, such as /\w+\G/. This is clearly wrong: the callers of regexec() shouldn't need to worry about the details of getting \G right: move this code into regexec() itself. (Note that although this commit passes all tests, it quite possibly isn't logically correct. It will get fixed up further during the next few commits)
*	document fields of regmatch_info struct	David Mitchell	2013-06-02	1	-6/+6
\|
*	eliminate PL_reg_state	David Mitchell	2013-06-02	1	-7/+0
\| \| \| \| \| \| \| \| \| \|	This is a struct that holds all the global state of the current regex match. The previous set of commits have gradually removed all the fields of this struct (by making things local rather than global state). Since the struct is now empty, the PL_reg_state var can be removed, along with the SAVEt_RE_STATE save type which was used to save and restore those fields on recursive re-entry to the regex engine.
*	Eliminate PL_reg_starttry	David Mitchell	2013-06-02	1	-2/+2
\| \| \| \| \| \| \| \|	It's only used for printing debugging messages, and its value is already available as the startpos local var in S_regmatch(). Whoo hoo! This var was the last field within the PL_reg_state global state struct.
*	make PL_reg_curpm global	David Mitchell	2013-06-02	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently PL_reg_curpm is actually #defined to a field within PL_reg_state; promote it into a fully autonomous perl-interpreter variable. PL_reg_curpm points to a fake PMOP that's used to temporarily point PL_curpm to, that we can hang the current regex off, so that this works: "a" =~ /^(.)(?{ print $1 })/ # prints 'a' It turns out that it doesn't need to be saved and restored when we recursively enter the regex engine; that is already handled by saving and restoring which regex is currently attached to PL_reg_curpm. So we just need a single global (per interpreter) placeholder. Since we're shortly going to get rid of PL_reg_state, we need to move it out of that struct.
*	eliminate PL_reg_poscache, PL_reg_poscache_size	David Mitchell	2013-06-02	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Eliminate these two global vars (well, fields in the global PL_reg_state), that hold the regex super-linear cache. PL_reg_poscache_size gets replaced with a field in the local regmatch_info struct, while PL_reg_poscache (which needs freeing at end of pattern execution or on croak()), goes in the regmatch_info_aux struct. Note that this includes a slight change in behaviour. Each regex execution now has its own private poscache pointer, initially null. If the super-linear behaviour is detected, the cache is malloced, used for the duration of the pattern match, then freed. The former behaviour allocated a global poscache on first use, which was retained between regex executions. Since the poscache could be between 0.25 and 2x the size of the string being matched, that could potentially be a big buffer lying around unused. So we save memory at the expense of a new malloc/free for every regex that triggers super-linear behaviour. The old behaviour saved the old pointer on reentrancy, then restored the old one (and possibly freed the new buffer) at exit. Except it didn't for (?{}), so /(?{ m{something-that-triggers-super-linear-cache} })/ would leak each time the inner regex was called. This is now fixed automatically.
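The per-match lifetime described above can be sketched as a lazily allocated buffer with a single cleanup routine. The struct and function names are hypothetical, and the sketch keeps both fields in one struct for brevity, whereas the message places the size in regmatch_info and the pointer in regmatch_info_aux.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-match state; simplified to a single struct. */
typedef struct {
    char  *poscache;       /* NULL until super-linear behaviour is seen */
    size_t poscache_size;
} ex_match_aux;

/* Allocate the cache on first use within this match. Returns 1 on success. */
static int ex_poscache_ensure(ex_match_aux *aux, size_t size)
{
    assert(aux != 0);
    if (!aux->poscache) {
        aux->poscache = calloc(size, 1);
        if (!aux->poscache)
            return 0;
        aux->poscache_size = size;
    }
    return 1;
}

/* Called at the end of the match (or on error): nothing is retained. */
static void ex_cleanup(ex_match_aux *aux)
{
    free(aux->poscache);
    aux->poscache = NULL;
    aux->poscache_size = 0;
}
```

Each match owns its buffer, so a nested match cannot leak or clobber the outer one; the cost is one allocation per match that needs the cache.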
*	use new cleanup for PL_regmatch_state	David Mitchell	2013-06-02	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The previous commit reorganised state save and cleanup at the end of regex execution. Use this new mechanism, by recording the original values of PL_regmatch_slab and PL_regmatch_state in the regmatch_info_aux struct, and restoring them and freeing higher slabs as part of the general S_cleanup_regmatch_info_aux() destructor, rather than pushing the old values directly onto the savestack and using another specific destructor. Also, make the initial allocating of (up to) 3 PL_regmatch_state slots more efficient by doing it in a loop. We also skip the first slot; this may already be in use if we're called reentrantly. try 1
*	unify regmatch_info data	David Mitchell	2013-06-02	1	-10/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously the regmatch_info struct was allocated as a local var on the C stack, while some extra state (only needed for regexes having (?{})) was malloced (as a regmatch_eval_state struct) as needed - and a destructor set up to clean it up afterwards. This being because the stuff being cleaned up couldn't be allocated on the C stack as it needed to hang around after a croak(). Reorganise this so that: * regmatch_info is on the C stack as before. * a new struct, regmatch_info_aux is allocated within the first slot of the regmatch_state stack, for fields which must always exist but which need cleanup afterwards. This is currently unused, but will be shortly. * a new struct, regmatch_info_aux_eval (which is just a renamed regmatch_eval_state struct), is optionally allocated in the second slot of regmatch_state. This is logically part of regmatch_info_aux, except that splitting it in two stops it being too large to fit in a regmatch_state slot (we can fit it in two instead). (The second and third structs aren't allocated when we're intuit() rather than regexec()). Doing it like this simplifies allocation and cleanup: there's no need for a malloc(), and we are already going to allocate a slab's worth of regmatch_state slots, so using an extra one or two of them is effectively free; and the cleanup just requires calling a single overall destructor. In the next few commits, more of the regexec() state setup and tear-down will be integrated into this new regime. And in particular, the new regmatch_info_aux struct will give us somewhere to hang things like PL_reg_poscache once it stops being global (it being local state that needs cleanup).
*	eliminate PL_reg_maxiter, PL_reg_leftiter	David Mitchell	2013-06-02	1	-4/+2
\| \| \| \| \|	Move these two fields of PL_reg_state into the regmatch_info struct, so they are local to each match.
*	Eliminate PL_reg_match_utf8	David Mitchell	2013-06-02	1	-4/+2
\| \| \| \| \|	Earlier commits made the use of this var just local to the current match, so move it to the local regmatch_info struct instead.
*	regex engine: simplify is_utf8_pat handling	David Mitchell	2013-06-02	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since this value is actually just always equal to cBOOL(RX_UTF8(rx)), there's no need to save the old value of the local boolean (as u.eval.saved_utf8_pat) when switching back and forwards between regexes with (??{}); instead, just re-calculate it whenever we switch, and update reginfo->is_utf8_pat and its cached value in the is_utf8_pat local var accordingly. Also, pass reginfo as an arg to S_setup_EXACTISH_ST_c1_c2() rather than is_utf8_pat directly; this will allow us to eliminate PL_reg_match_utf8 shortly. A new test is included that detects a mistake I made while working up this change: I recalculated is_utf8_pat, but forgot to update reginfo->is_utf8_pat too.
*	stop callers of rex engine using RX_MATCH_UTF8_set	David Mitchell	2013-06-02	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The way that the regex engine knows that the match string is utf8 is currently a complete mess. It's partially signalled by the utf8 flag of the passed SV, but also by the RXf_MATCH_UTF8 flag in the regex itself, and the value of PL_reg_match_utf8. Currently all the callers of the engine (such as pp_match, pp_split etc) initially use RX_MATCH_UTF8_set() before calling the engine. This sets both the RXf_MATCH_UTF8 flag on the regex, and PL_reg_match_utf8. Then the two entry points to the engine (regexec_flags() and re_intuit_start()) initially repeat the RX_MATCH_UTF8_set() themselves. Remove the usage of RX_MATCH_UTF8_set() by the callers of the engine, and instead just rely on the engine to do it. Also, remove the "secret" setting of PL_reg_match_utf8 by RX_MATCH_UTF8_set(), and do it explicitly. This is a prelude to eliminating PL_reg_match_utf8.
*	add regmatch_eval_state struct	David Mitchell	2013-06-02	1	-19/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Replace several PL_reg* vars with a new struct. This is part of the goal of removing all global regex state. These particular vars are used in the case of a regex with (?{}) code blocks. In this case, when the code in a block is called, various bits of state (such as $1, pos($_)) are temporarily set up, even though the match has not yet completed. This involves updating the current PL_curpm to point to a fake PMOP which points to the regex currently being executed. That regex has all its current fields that are associated with captures (such as subbeg) temporarily saved and overwritten with the current partial match results. Similarly, $_ is temporarily aliased to the current match string, and any old pos() position is saved. This saving was formerly done to the various PL_reg* vars. When the regex has finished executing (or if the code block croaks), its fields are restored to the original values. Since this can happen in a croak, it may be done using SAVEDESTRUCTOR_X() on the save stack. This precludes just moving the PL_reg* vars into the regmatch_info struct, since that is just allocated as a local var in regexec_flags(), and would have already been abandoned and possibly overwritten after the croak and longjmp, but before the SAVEDESTRUCTOR_X() action is taken. So instead we put all the vars into new struct, and malloc that on entry to the regex engine when we know we need to copy the various fields. We save a pointer to that in the regmatch_info struct, as well as passing it to SAVEDESTRUCTOR_X(). The destructor may get called up to twice in the non-croak case: first it's called explicitly at the end of regexec_flags(), which restores subbeg etc; then again from the savestack, which just free()s the struct. In the croak case, it's called just once, and does both the restoring and the freeing. 
The vars / PL_reg_state fields this commit eliminates are: re_state_eval_setup_done, PL_reg_oldsaved, PL_reg_oldsavedlen, PL_reg_oldsavedoffset, PL_reg_oldsavedcoffset, PL_reg_magic, PL_reg_oldpos, PL_nrs, PL_reg_oldcurpm.
*	remove unused reginfo->bol field	David Mitchell	2013-06-02	1	-1/+0
\|
*	eliminate PL_bostr	David Mitchell	2013-06-02	1	-2/+1
\| \| \| \| \|	by moving it from the global PL_reg_state struct to the local reginfo struct.
*	add strbeg argument to Perl_re_intuit_start()	David Mitchell	2013-06-02	1	-2/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(note that this is a change both to the perl API and the regex engine plugin API). Currently, Perl_re_intuit_start() is passed an SV, plus pointers to: where in the string to start matching (strpos); and to the end of the string (strend). Unlike Perl_regexec_flags(), it doesn't also have a strbeg arg. Because of this, it guesses strbeg: based on the passed SV (if it's SvPOK()); or just set to strpos otherwise. This latter can happen if for example the SV is overloaded. Note also that this latter guess is wrong, and could in theory make /\b.../ fail. But just to confuse matters, although Perl_re_intuit_start() itself uses its guesstimate strbeg var, some of the functions it calls use the global value of PL_bostr instead. To make this work, the callers of Perl_re_intuit_start() currently set PL_bostr first. This is why \b doesn't actually break. The fix to this unholy mess is to simply add a strbeg arg to Perl_re_intuit_start(). It's also the first step to eliminating PL_bostr altogether.
*	eliminate PL_regeol	David Mitchell	2013-06-02	1	-2/+1
\| \| \| \| \| \| \|	This is another global regex state variable (actually a field of PL_reg_state). Eliminate it by moving it into the regmatch_info struct instead, which is local to each match. Also, rename it to strend, which is a less misleading description in these exciting days of multi-line matches.
*	make more use of regmatch_info struct.	David Mitchell	2013-06-02	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	regmatch_info is a small struct that is currently directly allocated as a local var in Perl_regexec_flags(), and has a few fields that maintain part of the state of the current pattern match. It is passed as an arg to various functions that regexec_flags() calls, such as regtry(). In some ways it's a rival to PL_reg_state, which also maintains state for the current match, but which is a global variable (whose state needs saving and restoring whenever the regex engine goes reentrant). It makes more sense to store state in the regmatch_info struct, and as a first step in moving more state to there, this commit makes more use of regmatch_info. In particular, it makes Perl_re_intuit_start() also allocate such a struct, so that now both the main execution entry points to the regex engine make use of it. It's also now passed as an arg to more of the static functions that these two op-level ones call. Two changes of special note. First, whether S_find_byclass() got called with a null reginfo pointer or not indicated whether it had been called from Perl_regexec_flags() (with a valid reginfo pointer), or from Perl_re_intuit_start() (null pointer). Since they both pass non-null reginfo pointers now, instead we add an extra field, reginfo->intuit that indicates who's the top-level caller. Secondly, to allow in future for various macros to uniformly refer to values like reginfo->foo, where the structure is actually allocated as a local var in Perl_regexec_flags(), we change the reginfo from being the struct itself to being a pointer to the struct, (so Perl_regexec_flags itself now uses reginfo->foo too rather than reginfo.foo). In summary, all the above is essentially window dressing that makes no functional changes to the code, but will facilitate future changes.