delta/perl.git - github.com: perl/perl5.git

	Commit message	Author	Age	Files	Lines
*	fix 'ignoring return value' compiler warnings	David Mitchell	2013-11-24	1	-10/+19
\| \| \| \| \| \| \| \| \| \| \|	Various system functions like write() are marked with the __warn_unused_result__ attribute, which causes an 'ignoring return value' warning to be emitted, even if the function call result is cast to (void). The generic solution seems to be int rc = write(...); PERL_UNUSED_VAR(rc);
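The pattern named in the message above (store the result, then mark the variable as unused) might look roughly like this in C. This is a minimal sketch: `UNUSED_VAR` is a stand-in whose definition is an assumption, not Perl's actual `PERL_UNUSED_VAR`, and `emit_note` is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Stand-in for PERL_UNUSED_VAR; this definition is an assumption. */
#define UNUSED_VAR(x) ((void)(x))

/* Hypothetical helper: best-effort write of a message to a descriptor. */
static void emit_note(int fd, const char *msg)
{
    assert(msg != NULL);
    /* Casting the call itself to (void) does not silence the warning for
       functions marked warn_unused_result, so keep the result in a
       variable and mark that variable as deliberately unused. */
    ssize_t rc = write(fd, msg, strlen(msg));
    UNUSED_VAR(rc);
}
```

Calling `emit_note(2, "note\n")` would write to standard error while ignoring short writes and errors, which is only appropriate for best-effort diagnostics.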
*	[perl #119811] Remove %DB::lsub	Father Chrysostomos	2013-10-28	1	-1/+1
\| \| \| \| \| \| \| \|	Under the debugger (the -d switch), if there is a DB::sub subroutine, then %DB::lsub gets autovivified during a call to an lvalue sub. That hash is never used. The code for vivifying the DB::lsub glob was copied from DB::sub when lsub was added, and DB::sub does have a hash (%DB::sub), but DB::lsub doesn’t need one.
*	pp_hot.c:pp_rv2av: Remove superfluous SPAGAIN	Father Chrysostomos	2013-10-24	1	-1/+0
\| \| \| \| \| \| \| \| \| \|	Commit perl-5.8.0-9908-g5e17dd7, in March of 2007, added a SPAGAIN here to account for the stack shifting when hv_scalar calls tied hashes’ SCALAR method. It was never necessary, because commit perl-5.8.0-3008-ga3bcc51, which added hv_scalar and magic_scalarpack in December of 2003, made magic_scalarpack push a new stack, protecting the old one.
*	rv2hv does not use its TARG	Father Chrysostomos	2013-10-24	1	-1/+1
\| \| \| \| \| \| \|	rv2hv has had a TARG since perl 5.000, but it has not used it since hv_scalar was added in perl-5.8.0-3008-ga3bcc51. This commit removes it, saving a tiny bit of space in the pad.
*	Make &xsub and goto &xsub work with tied @_	Father Chrysostomos	2013-09-09	1	-2/+11
\| \| \| \| \| \| \| \| \| \| \| \| \|	This is the only place where tied @_ does not work, and there appears to be no reason why it shouldn’t, apart from the fact that it hasn’t been implemented. Commit 67955e0c was what made &xsub work to begin with. 93965878572 introduced tied arrays and added the comment to pp_entersub saying that @_ is not tiable. goto &xsub has worked since perl 5.000, but 93965878572 did not make it work with tied arrays.
*	[perl #117265] move the "glob failed" warning to the point of failure	Tony Cook	2013-09-09	1	-7/+3
\| \| \| \|	This avoids an extraneous warning when globbing fails for other reasons.
*	Allow 64-bit array and stack offsets in entersub & goto	Father Chrysostomos	2013-09-06	1	-4/+4
\| \| \| \| \|	I don’t have enough memory to test this, but it needs to be done eventually anyway.
*	Stop &xsub and goto &xsub from crashing on undef *_	Father Chrysostomos	2013-09-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	$ perl -e 'undef _; &Internals::V' Segmentation fault: 11 $ perl -e 'sub { undef _; goto &Internals::V }->()' $ perl5.18.1 -e 'sub { undef _; goto &Internals::V }->()' Segmentation fault: 11 The goto case is actually a regression from 5.16 (049bd5ffd62), as goto used to ignore changes to _. (Fixing one bug uncovers another.) We shouldn’t assume that GvAV(PL_defgv) (*_{ARRAY}) gives us anything. While we’re at it, since we have to add extra checks anyway, use them to speed up empty @_ in goto (by checking items, rather than arg).
*	Put AV defelem creation code in one place	Father Chrysostomos	2013-09-06	1	-24/+5
\|
*	Use defelems for (goto) &xsub calls	Father Chrysostomos	2013-09-06	1	-1/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Before ce0d59f: $ perl -e '++$#_; &utf8::encode' Modification of a read-only value attempted at -e line 1. As of ce0d59f: $ ./perl -Ilib -e '++$#_; &utf8::encode' Assertion failed: (sv), function Perl_sv_utf8_encode, file sv.c, line 3581. Abort trap: 6 Calling sub { utf8::encode($_[0]) } should be more or less equivalent to calling utf8::encode, but it is not in this case: $ ./perl -Ilib -we '++$#_; &{sub { utf8::encode($_[0]) }}' Use of uninitialized value in subroutine entry at -e line 1. In the first two examples above, an implementation detail is leaking through. What you are seeing is not the array element, but a placeholder that indicates an element that has not been assigned to yet. We should use defelem magic so that what the XSUB assigns to will create an array element (as happens with utf8::encode($_[0])). All of the above applies to goto &xsub as well.
*	pp_hot.c:pp_aelem: Use _NN in one spot	Father Chrysostomos	2013-09-06	1	-1/+1
\| \| \| \| \|	This av can never be null here. av_len will already have failed an assertion if it is.
*	Stop creating defelems for undef in foreach(@_)	Father Chrysostomos	2013-08-28	1	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is part of ticket #119433. This particular bug is triggered by Data::Dump’s test suite. Commit ce0d59f changed arrays to use NULL for nonexistent elements, instead of &PL_sv_undef (the special scalar returned by Perl’s ‘undef’ operator). ‘foreach’ was not updated to account. It was still treating &PL_sv_undef as a nonexistent element. This was causing ‘Modification of non-creatable array value attempted, subscript 0’, due to a similar bug in vivify_defelem, which the next commit will fix. (Fixing vivify_defelem without fixing foreach will make the test pass, but for foreach to create a defelem to begin with is inefficient and should be addressed anyway.)
*	[perl #119311] Keep CvDEPTH and savestack in sync	Father Chrysostomos	2013-08-27	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	when unwinding sub and format calls. The comments in the added test file explain what the problem is. The fix is to call LEAVE_SCOPE in POPSUB and POPFORMAT (to free their lexicals) before lowering CvDEPTH. If the context has already been popped via cxstack_ix--, then LEAVE_SCOPE could overwrite it, so accessing cx after LEAVE_SCOPE is unsafe. Hence the changes to POPSUB and POPFORMAT are a bit involved. Some callers of POPSUB do a temporary cxstack_ix++ first so they can access cx afterwards. Two cases needed to be changed to work that way.
*	pp_hot.c: Show lengths in -Dr output for minlen optimisation	Father Chrysostomos	2013-08-25	1	-1/+3
\|
*	[perl #116907] Allow //g matching past 2**31 threshold	Father Chrysostomos	2013-08-25	1	-1/+1
\| \| \| \| \| \| \| \| \|	Change the internal fields for storing positions so that //g in scalar context can move past the 2**31 character threshold. Before this commit, the numbers would wrap, resulting in assertion failures. The changes in this commit are only enough to get the added test passing. Stay tuned for more.
*	Stop minlen regexp optimisation from rejecting long strings	Father Chrysostomos	2013-08-25	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	This fixes #112790 and part of #116907. The length of the string is cast to I32, so it wraps and ends up less than the minimum length. For now, simply skip this optimisation if minlen itself wraps and becomes negative.
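The wrap described above can be illustrated with a small C sketch. It is a simplified model, not the code in pp_hot.c: `rejected_by_minlen` is a hypothetical name, `int32_t` stands in for I32, and the narrowing conversion is assumed to wrap modulo 2**32 as it does with common compilers (the C standard leaves the result implementation-defined).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the faulty check: returns 1 when the string is
   rejected as shorter than minlen. The length is narrowed to 32 bits
   first, so lengths of 2**31 or more can come out negative. */
static int rejected_by_minlen(int64_t len, int32_t minlen)
{
    assert(len >= 0);
    int32_t len32 = (int32_t)(uint32_t)len; /* assumed to wrap, see lead-in */
    return len32 < minlen;
}

/* Comparison done at full width, shown only for contrast; this is not
   the fix the commit describes (it skips the optimisation instead). */
static int rejected_by_minlen_wide(int64_t len, int64_t minlen)
{
    assert(len >= 0);
    return len < minlen;
}
```

Under the stated assumption, a string of 2**31 bytes is rejected by the first function for a minlen of 3, although it is far longer than the minimum.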
*	Stop pos() from being confused by changing utf8ness	Father Chrysostomos	2013-08-25	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The value of pos() is stored as a byte offset. If it is stored on a tied variable or a reference (or glob), then the stringification could change, resulting in pos() now pointing to a different character offset or pointing to the middle of a character: $ ./perl -Ilib -le '$x = bless [], chr 256; pos $x=1; bless $x, a; print pos $x' 2 $ ./perl -Ilib -le '$x = bless [], chr 256; pos $x=1; bless $x, "\x{1000}"; print pos $x' Malformed UTF-8 character (unexpected end of string) in match position at -e line 1. 0 So pos() should be stored as a character offset. The regular expression engine expects byte offsets always, so allow it to store bytes when possible (a pure non-magical string) but use characters otherwise. This does result in more complexity than I should like, but the alternative (always storing a character offset) would slow down regular expressions, which is a big no-no.
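To make the byte-offset versus character-offset distinction concrete, here is a minimal C sketch that converts a byte offset into a character offset for a UTF-8 buffer. The function name is hypothetical and the code is not taken from the Perl source; it assumes well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Count the characters that start within the first byte_off bytes of s.
   In UTF-8 every byte except a continuation byte (bit pattern 10xxxxxx)
   starts a new character. */
static size_t utf8_byte_to_char_offset(const unsigned char *s, size_t byte_off)
{
    assert(s != NULL);
    size_t chars = 0;
    for (size_t i = 0; i < byte_off; i++) {
        if ((s[i] & 0xC0) != 0x80)
            chars++;
    }
    return chars;
}
```

For the two-character string U+0100 followed by "a" (bytes C4 80 61), byte offset 2 corresponds to character offset 1. A stored byte offset therefore stops meaning the same position once the string's encoding changes, which is the confusion the commit addresses.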
*	Use SSize_t for arrays	Father Chrysostomos	2013-08-25	1	-7/+7
\| \| \| \| \| \| \| \| \| \|	Make the array interface 64-bit safe by using SSize_t instead of I32 for array indices. This is based on a patch by Chip Salzenberg. This completes what the previous commit began when it changed av_extend.
*	[perl #118747] Allow in-place s///g when !!PL_sawampersand	Father Chrysostomos	2013-08-22	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the more correct version of 1555b325 which was reverted by 6200d5a0e. In pp_subst, there is an initial pattern match against the target string, followed by logic to determine which of several code paths will handle the rest of the substitution, depending on which shortcuts can be taken. There is one path specifically for doing a global substitution (/g) and modifying the target in place. This code was skipped if the target was a copy-on-write scalar or if the pre-match copy was enabled. The pre-match copy is always enabled now, so this code is unreachable. In-place substitution stringifies the rhs at the outset, just after the first regexp match, but before any substitution. Then it uses that string buffer, expecting it not to change. That clearly cannot work with s/a/$&/g; it will also cause erratic behaviour in the case of regexp code blocks (which will see the string being modified, which does not happen with unoptimised subst). That’s why the in-place optimisation has to be skipped when the REXEC_COPY_STR flag is set. But we can tweak that logic: • As long as the rhs is not a magical var, its contents are not going to change from one iteration to the next. • If there are no code blocks, nothing will see the string during the substitution. So this commit adds logic to check those things, enabling this optimisation where possible. Skipping this optimisation for the pre-match copy was originally added in commit 5d5aaa5e7.
*	Revert "[perl #118747] Allow in-place s///g when !!PL_sawampersand"	Father Chrysostomos	2013-08-22	1	-0/+1
\| \| \| \| \| \| \|	This reverts commit 1555b325296e46f7b95bee03fe856cec348b0d57. This is causing test failures (not on my machine), and I do not have time right now to track them all down.
*	pp_hot.c:pp_subst: Move comment	Father Chrysostomos	2013-08-21	1	-2/+2
\| \| \| \| \| \| \| \| \|	Perl 5 has always had the ‘don't match same null twice’ comment in pp_subst. Originally it was a parenthetical note right below the s == m. 71be2cbc removed the parentheses. Commit f722798be moved it further away from s == m. Now what it refers to is far from clear.
*	[perl #118747] Allow in-place s///g when !!PL_sawampersand	Father Chrysostomos	2013-08-21	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In pp_subst, there is an initial pattern match against the target string, followed by logic to determine which of several code paths will handle the rest of the substitution, depending on which shortcuts can be taken. There is one path specifically for doing a global substitution (/g) and modifying the target in place. This code was skipped if the target was a copy-on-write scalar or if the pre-match copy was enabled. The pre-match copy is always enabled now, so this code is unreachable. There does not appear to be any reason why this path must be skipped in the presence of the pre-match copy. The string gets copied by the initial regexp match and $& and friends point there afterwards. This skip was added in commit 5d5aaa5e7 (a jumbo patch, so good luck figuring it out). This commit removes the skip, and all tests pass. This, of course, only affects those cases where copy-on-write does not kick in; for instance, when the string’s length is one less than its buffer: $ ./perl -Ilib -e 'use Devel::Peek; $x = " "; $x .= " "x22; Dump $x; $x =~ s/ /b/g; Dump $x' SV = PV(0x7ffb0b807098) at 0x7ffb0b82eed8 REFCNT = 1 FLAGS = (POK,pPOK) PV = 0x7ffb0b4066b8 " "\0 CUR = 23 LEN = 24 SV = PV(0x7ffb0b807098) at 0x7ffb0b82eed8 REFCNT = 1 FLAGS = (POK,pPOK) PV = 0x7ffb0b4066b8 "bbbbbbbbbbbbbbbbbbbbbbb"\0 CUR = 23 LEN = 24
*	[perl #118691] Allow defelem magic with neg indices	Father Chrysostomos	2013-08-21	1	-2/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a nonexistent array element is passed to a subroutine, a special ‘deferred element’ scalar (implemented using something called defelem magic) is passed to the subroutine instead, which delegates to the array element. This allows some_benign_function($array[$nonexistent]) to avoid autovivifying unnecessarily. Whether this magic would be triggered was based on whether the element was within the range 0..$#array. Since arrays can contain nonexistent elements before $#array, this logic is incorrect. It also makes sense to allow $array[$neg] where the negative number points before the beginning of the array to create a deferred element and only croak if it is assigned to. This commit fixes the logic for when deferred elements are created and implements these deferred negative elements. Since we have to be able to store negative values in xlv_targoff, it is convenient to make it a union (with two types--signed and unsigned) and use LvSTARGOFF for defelem array indices.
*	[perl #7508] Use NULL for nonexistent array elems	Father Chrysostomos	2013-08-20	1	-4/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit fixes bug #7508 and provides the groundwork for fixing several other bugs. Elements of @_ are aliased to the arguments, so that \$_[0] within sub foo will reference the same scalar as \$x if the sub is called as foo($x). &PL_sv_undef (the global read-only undef scalar returned by the ‘undef’ operator itself) was being used to represent nonexistent array elements. So the pattern would be broken for foo(undef), where \$_[0] would vivify a new $_[0] element, treating it as having been nonexistent. This also causes other problems with constants under ithreads (#105906) and causes a pending fix for another bug (#118691) to trigger this bug. This commit changes the internals to use a null pointer to represent a nonexistent element. This requires that Storable be changed to account for it. Also, IPC::Open3 was relying on the bug. So this commit patches both modules.
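The difference between the two representations can be sketched in C as follows. `Scalar`, `shared_undef`, `exists_old` and `exists_new` are hypothetical stand-ins for illustration only; they are not the real SV type, `&PL_sv_undef`, or any Perl API function.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int value; } Scalar;   /* hypothetical stand-in for an SV */
static Scalar shared_undef = { 0 };      /* stand-in for &PL_sv_undef */

/* Old scheme: the shared undef scalar doubles as the "no element" marker,
   so a slot that really holds undef looks nonexistent. */
static int exists_old(Scalar *const *slots, size_t i)
{
    assert(slots != NULL);
    return slots[i] != &shared_undef;
}

/* New scheme: only a null pointer means "no element", so a slot holding
   the shared undef scalar counts as a real element. */
static int exists_new(Scalar *const *slots, size_t i)
{
    assert(slots != NULL);
    return slots[i] != NULL;
}
```

With a slot array such as `{ &x, &shared_undef, NULL }`, the old check reports slot 1 as missing even though undef was stored there on purpose, while the new check reports it as present and only slot 2 as missing.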
*	Copy PADTMPS passed to XSUBs	Father Chrysostomos	2013-08-13	1	-0/+9
\| \| \| \| \| \| \| \| \| \|	This resolves the last remaining issue in ticket #78194, that newRV is supposedly buggy because it doesn’t copy its referent. The full implications of the PADTMP are not explained anywhere in the API docs, and even XSUBs shouldn’t have to worry about special handling. (E.g., what if they do SvREFCNT_dec(SvRV(sv)); SvRV(sv)=...?) So the real solution here is not to let XSUBs see them.
*	Read-only COWs should not be exempt from s/// croaking	Father Chrysostomos	2013-08-11	1	-3/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	$ ./miniperl -Ilib -e 'for(__PACKAGE__) { s/a/a/ }' Modification of a read-only value attempted at -e line 1. $ ./miniperl -Ilib -e 'for(__PACKAGE__) { s/b/b/ }' $ ./miniperl -Ilib -e 'for("main") { s/a/a/ }' Modification of a read-only value attempted at -e line 1. $ ./miniperl -Ilib -e 'for("main") { s/b/b/ }' Modification of a read-only value attempted at -e line 1. When I pass the constant "main" to s///, it croaks whether the regular expression matches or not. When I pass __PACKAGE__, which has the same content and is also read- only, it only croaks when the pattern matches. This commit removes some logic that is left over from when READONLY+FAKE meant copy-on-write. Read-only does mean read-only now, so copy-on-write scalars should not be exempt from read-only checks.
*	pp_match(): remove some superfluous braces	David Mitchell	2013-07-31	1	-4/+2
\|
*	pp_match(): only look up pos() magic once	David Mitchell	2013-07-31	1	-7/+7
\| \| \| \| \| \| \|	Currently before matching, we see whether the SV has any pos() magic attached; then after the match we look it up again to update pos(). Instead just remember the previous value of mg and reuse it where possible.
*	pp_match(): remove redundant condition	David Mitchell	2013-07-31	1	-8/+5
\| \| \| \| \|	a successful match always sets $-[0] now, so there's no need to check whether it's set
*	RT #118213: handle $r=qr/.../; /$r/p properly	David Mitchell	2013-07-30	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the case where a qr// regex is directly used by PMOP (rather than being interpolated with some other stuff and a new regex created, such as /a$r/p), then the PMf_KEEPCOPY flag will be set on the PMOP, but the corresponding RXf_PMf_KEEPCOPY flag won't be set on the regex. Since most of the regex handling for copying the string and extracting out ${^PREMATCH} etc is done based on the RXf_PMf_KEEPCOPY flag in the regex, this is a bit of a problem. Prior to 5.18.0 this wasn't so noticeable, since various other bugs around //p handling meant that ${^PREMATCH} etc often accidentally got set anyway. 5.18.0 fixed these bugs, and so as a side-effect, exposed the PMOP versus regex flag issue. In particular, this stopped working in 5.18.0: my $pat = qr/a/; 'aaaa' =~ /$pat/gp or die; print "MATCH=[${^MATCH}]\n"; (prints 'a' in 5.16.0, undef in 5.18.0). The presence of /g caused the engine to copy the string anyway by luck. We can't just set the RXf_PMf_KEEPCOPY flag on the regex if we see the PMf_KEEPCOPY flag on the PMOP, otherwise stuff like this will be wrong: $r = qr/..../; /$r/p; # set RXf_PMf_KEEPCOPY on $r /$r/; # does a /p match by mistake Since for 5.19.x onwards COW is enabled by default (and cheap copies are always made regardless of /p), then this fix is mainly for PERL_NO_COW builds and for backporting to 5.18.x. (Although it still applies to strings that can't be COWed for whatever reason). Since we can't set a flag in the rx, we fix this by: 1) when calling the regex engine (which may attempt to copy part or all of the capture string), make sure we pass REXEC_COPY_STR, but neither of REXEC_COPY_SKIP_PRE, REXEC_COPY_SKIP_POST when we call regexec() from pp_match or pp_subst when the corresponding PMOP has PMf_KEEPCOPY set.
2) in Perl_reg_numbered_buff_fetch() etc, check for PMf_KEEPCOPY in PL_curpm as well as for RXf_PMf_KEEPCOPY in the current rx before deciding whether to process ${^PREMATCH} etc. As well as adding new tests to t/re/reg_pmod.t, I also changed the string to be matched against from being '12...' to '012...', to ensure that the lengths of ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} would all be different.
*	s/.(?=.\G)/X/g: refuse to go backwards	David Mitchell	2013-07-28	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On something like: $_ = "123456789"; pos = 6; s/.(?=.\G)/X/g; each iteration could in theory start with pos one character to the left of the previous position, and with the substitution replacing bits that it has already replaced. Since that way madness lies, ban any attempt by s/// to substitute to the left of a previous position. To implement this, add a new flag to regexec(), REXEC_FAIL_ON_UNDERFLOW. This tells regexec() to return failure even if the match itself succeeded, but where the start of $& is before the passed stringarg point. This change caused one existing test to fail (which was added about a year ago): $_="abcdef"; s/bc\|(.)\G(.)/$1 ? "[$1-$2]" : "XX"/ge; print; # used to print "aXX[c-d][d-e][e-f]"; now prints "aXXdef" I think that that test relies on ambiguous behaviour, and that my change makes things saner. Note that s/// with \G is generally very under-tested.
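The rule behind the new flag can be modelled with a few lines of C. This is only a sketch of the decision described above: `FAIL_ON_UNDERFLOW` has an invented value, `accept_match` is a hypothetical function, and offsets are plain integers rather than pointers into the string as in the real regexec().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag value; the real REXEC_FAIL_ON_UNDERFLOW constant is
   not shown in this log. */
#define FAIL_ON_UNDERFLOW 0x1u

/* match_start: offset at which $& begins for a match that succeeded.
   stringarg:   offset the caller asked matching to start from.
   Returns 1 if the match is reported as a success, 0 otherwise. */
static int accept_match(size_t match_start, size_t stringarg, unsigned flags)
{
    assert((flags & ~FAIL_ON_UNDERFLOW) == 0); /* only flag modelled here */
    if ((flags & FAIL_ON_UNDERFLOW) && match_start < stringarg)
        return 0; /* matched, but to the left of the allowed start */
    return 1;
}
```

Without the flag a match starting at offset 5 is accepted even when the caller passed offset 6; with the flag it is reported as a failure, which is what stops s///g from revisiting text it has already replaced.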
*	pp_subst: don't use REXEC_COPY_STR on 2nd match	David Mitchell	2013-07-28	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \|	pp_subst() sets the REXEC_COPY_STR flag on the first match. On the second and subsequent matches, it doesn't set it in two out of three of the branches (including pp_substcont) where it calls CALLREGEXEC(). The one place where it does set it is a (harmless) mistake, since regexec ignores REXEC_COPY_STR if REXEC_NOT_FIRST is set (which it is, on all 3 branches). So unset REXEC_COPY_STR in the third branch too, for consistency.
*	pp_subst: combine 3 small elsif blocks into 1	David Mitchell	2013-07-28	1	-11/+5
\| \| \| \|	and slightly reduce the scope of the temporary i var.
*	pp_subst: remove one use of 'm' local var	David Mitchell	2013-07-28	1	-2/+1
\|
*	pp_subst: reduce scope of 'i' variable	David Mitchell	2013-07-28	1	-2/+3
\| \| \| \| \|	it's just used as a temporary var in a few blocks; declare it individually in each block rather than being scoped to the whole function.
*	pp_subst: reduce scope of 'm' var	David Mitchell	2013-07-28	1	-7/+8
\| \| \| \| \|	it's mainly just a temporary local var; declare it individually within each scope that makes use of it.
*	pp_subst: set/use s,m vars near where they're used	David Mitchell	2013-07-28	1	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This should be just a cosmetic change; but basically change stuff like m = orig; s = foo(); ... lots of lines not using s or m ... bar(m,s) ... more stuff using s ... to ... lots of lines not using s or m ... s = foo(); bar(orig,s) ... more stuff using s ... This is part of a few commits to generally clean up the scope and comprehensibility of the vars within pp_subst
*	pp_subst: reduce scope of 'd' variable	David Mitchell	2013-07-28	1	-2/+3
\| \| \| \| \|	It's just used as a temporary value in two branches; so make it a local var in each of those branches.
*	pp_subst: cosmetic re-arrangement of vars	David Mitchell	2013-07-28	1	-8/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	since 'orig' always points to the start of the string, while 's' varies, change s = SvPV_nomg(...); ...other stuff using value of s ... orig = s ... to orig = SvPV_nomg(...); ...other stuff using value of orig ... s = orig ... No functional change, just reduces the cognitive load slightly. Also adds some comments as to what force_on_match is about.
*	pp_match: simplify pos()-getting code	David Mitchell	2013-07-28	1	-16/+9
\| \| \| \| \| \|	The previous commit removed the \G handling from pp_match; most of what's left in that code block is redundant code that just sets curpos under all conditions. So tidy it up.
*	regexec: handle \G ourself, rather than in callers	David Mitchell	2013-07-28	1	-17/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Normally a /g match starts its processing at the previous pos() (or at char 0 if pos is not set); however in the case of something like /abc\G/ we actually need to start 3 characters before pos. This has been handled by the callers of regexec() subtracting prog->gofs from the stringarg arg before calling it, or by setting stringarg to strbeg for floating, such as /\w+\G/. This is clearly wrong: the callers of regexec() shouldn't need to worry about the details of getting \G right: move this code into regexec() itself. (Note that although this commit passes all tests, it quite possibly isn't logically correct. It will get fixed up further during the next few commits)
*	pp_match(): don't set REXEC_IGNOREPOS on 1st iter	David Mitchell	2013-07-28	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently all core callers of regexec set both the REXEC_IGNOREPOS and REXEC_NOT_FIRST flags, or neither, depending on whether this is the first or subsequent iteration of a //g; except for one place in pp_match(), where REXEC_IGNOREPOS is set on the first iteration for the one specific case of /g with an anchored \G. Now AFAICT this makes no difference, because the starting position as calculated by regexec() still comes to the same value of (strbeg + pos -gofs), and the same value of ganch is calculated. Also in the commit that added this particular use of the flag to pp_match, (0ef3e39ecdfec), removing the flag makes no difference to the passing or not of the new test case. So I don't understand what its purpose is, and it's possibly a mistake. Removing it now makes the code simpler for further cleanup.
*	pp_match(): stop setting $-[0] before regexec()	David Mitchell	2013-07-28	1	-5/+5
\| \| \| \|	It doesn't actually achieve anything.
*	pp_match: avoid setting $+[0]	David Mitchell	2013-07-28	1	-5/+7
\| \| \| \| \| \|	This function sometimes set $+[0] to pos() before calling regexec(). This value isn't used by regexec(), and was really just a way of updating the new start position for //g. Replace it with a local var instead.
*	pp_match(): eliminate unused t variable	David Mitchell	2013-07-28	1	-8/+7
\| \| \| \|	and restrict usage of s variable
*	pp_match(): skip passing gpos arg to regexec()	David Mitchell	2013-07-28	1	-5/+3
\| \| \| \| \| \| \| \| \| \| \| \|	In one specific case, pp_match() passes the value of pos() to regexec() via the otherwise unused 'data' arg. It turns out that pp_match() only passes this value when it exists and is >= 0, while regexec() only uses it when there's no pos magic or pos() < 0. So it's never used as far as I can tell. So, strip it for now.
*	add intuit-only match to s///	David Mitchell	2013-07-28	1	-15/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	pp_match() has an intuit-only match mode: if intuit_start() succeeds and the regex is marked as only needing intuit (RXf_CHECK_ALL), then the call to regexec() is skipped; $& is just set and the function returns. The commit which originally added that feature to pp_match() also added a comment to pp_subst() suggesting that the same thing could be done there. This commit finally achieves that. It builds on the previous commit (which moved this mechanism from pp_match() directly into regexec()), skipping calling intuit_start() and directly calling regexec() with the REXEC_CHECKED flag not set. This appears to reduce the execution time of a simple substitution like s/abc/def/ by a fifth.
*	move intuit call from pp_match() into regexec()	David Mitchell	2013-07-28	1	-30/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently the main part of pp_match() looks like: if (can_use_intuit) { if (!intuit_start()) goto nope; if (can_match_based_only_on_intuit_result) { ... set up $&, $-[0] etc ... goto gotcha; } } if (!regexec(..., REXEC_CHECKED\|r_flags)) goto nope; gotcha: ... This rather breaks the regex API encapsulation. The caller of the regex engine shouldn't have to worry about whether to call intuit() or regexec(), and to know to set $& in the intuit-only case. So, move all the intuit-calling and $& setting into regexec itself. This is cleaner, and will also shortly allow us to enable intuit-only matches in pp_subst() too. After this change, the code above looks like (in its entirety): if (!regexec(..., r_flags)) goto nope; ... There, isn't that nicer?
*	make intuit_start() handle mixed utf8-ness	David Mitchell	2013-07-28	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix a bug in intuit_start() that makes it fail when the utf8-ness of the string and pattern differ. This was mostly masked, since pp_match() skips calling intuit in this case (and has done since 2000, presumably as a workaround for this issue, and possibly for other issues since fixed). But pp_subst() didn't skip, so code like this would fail: $c = "\x{c0}"; utf8::upgrade($c); print "ok\n" if $c =~ s/\xC0{1,2}$/\xC0/i; Now that intuit is (hopefully) fixed, also remove the guard in pp_match().
*	pp_match(): fix UTF* match setting	David Mitchell	2013-07-28	1	-1/+1
\| \| \| \| \| \| \|	A recent commit did RX_MATCH_UTF8_set() based on the utf8-ness of the pattern rather than the match string. It didn't matter because in that branch they were guaranteed to have the same value, but fix it anyway, both for correctness' sake, and because it will matter shortly