summaryrefslogtreecommitdiff
path: root/pp_hot.c
Commit message (Collapse)AuthorAgeFilesLines
* [perl #121854] use re 'taint' regressionDavid Mitchell2014-05-131-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit v5.19.8-533-g63baef5 changed the handling of locale-dependent regexes so that the pattern was considered tainted at compile-time, rather than determining it each time at run-time whenever it executed a locale-dependent node. Unfortunately due to the conflating of two flags, RXf_TAINTED and RXf_TAINTED_SEEN, it had the side effect of permanently marking a pattern as tainted once it had had a single tainted result. E.g. use re qw(taint); use Scalar::Util qw(tainted); for ($^X, "abc") { /(.*)/ or die; print "not " unless tainted("$1"); print "tainted\n"; }; which from 5.19.9 onwards output: tainted tainted but with this commit (and with 5.19.8 and earlier), it now outputs: tainted not tainted The RXf_TAINTED flag indicates that the pattern itself is tainted, e.g. $r = qr/$tainted_value/ while the RXf_TAINTED_SEEN flag means that the results of the last match are tainted, e.g. use re 'tainted'; $tainted =~ /(.*)/; # $1 is tainted Pre 63baef5, the code used to look like: at run-time: turn off RXf_TAINTED_SEEN; while (nodes to execute) { switch(node) { case BOUNDL: /* and other locale-specific ops */ turn on RXf_TAINTED_SEEN; ...; } } if (tainted || RXf_TAINTED) turn on RXf_TAINTED_SEEN; 63baef5 changed it to: at compile-time: if (pattern has locale ops) turn on RXf_TAINTED_SEEN; at run-time: while (nodes to execute) { ... } if (tainted || RXf_TAINTED) turn on RXf_TAINTED_SEEN; This commit changes it to: at compile-time; if (pattern has locale ops) turn on RXf_TAINTED; at run-time: turn off RXf_TAINTED_SEEN; while (nodes to execute) { ... } if (tainted || RXf_TAINTED) turn on RXf_TAINTED_SEEN;
* Reduce excessive stat calls in glob on VMS.Craig A. Berry2014-03-211-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When PERL_EXTERNAL_GLOB is defined (currently only on VMS, or on other platforms when running miniperl), each item returned from the glob operation is checked against a set of glob metacharacters, and then, if it matches any of these characters, it's checked with lstat() to see if it actually exists. Then it's passed through if it exists but skipped otherwise. Presumably this is because returning the pattern unchanged is how a shell glob indicates "no match." It appears that the lstat() is necessary because glob metacharacters are almost always valid filename characters on Unix, so it's the only way to distinguish a funny-looking but real filename from the no match case (oops -- indeterminate grammar). Since these filenames are rare on Unix, lstat() is seldom called. Enter VMS, where "external glob" is neither external, nor is it actually a glob. It is a native wildcard-matching search built into Perl with vms/vms.c's Perl_vms_start_glob(), which does return the original pattern when there is no match, but that pattern will only contain native wildcard characters, so the check for glob metacharacters is really not the right thing to be doing. Moreover, glob metacharacters such as dollar signs, brackets, and semicolons are extremely common in native VMS paths, so in many common scenarios, the lstat() to see if the file really exists gets triggered for every single file, which is expensive. This commit replaces, on VMS only, the check for glob metacharacters with a check for VMS wildcard characters. A simple glob of a Perl source tree becomes 60% faster on the first iteration and 80% faster on subsequent iterations. Thanks to Hein van den Heuvel, who reported the problem and explained that a search operation should only be looking at directory files, whereas a stat does I/O to the individual file header, resulting in a massive increase in unnecessary I/O. This closes [perl #121440].
* Change core uses of Perl_do_openn() to Perl_do_open6() or Perl_do_open_raw().Nicholas Clark2014-03-191-1/+1
| | | | | | Calls to Perl_do_openn() all have at least 2 unused arguments which clutter the code and hinder easy understanding. Perl_do_open6() and Perl_do_open_raw() each only do one job, so don't have the dead arguments.
* sprinkle LIKELY() on pp_hot.c scope.c and some *.hDavid Mitchell2014-03-121-53/+56
| | | | | | | | I've gone through pp_hot.c and scope.c and added LIKELY() or UNLIKELY() to all conditionals where I understand the code well enough to know that a particular branch is or isn't likely to be taken very often. I also processed some of the .h files which contain commonly used macros.
* make OP_AELEMFAST work with negative indicesDavid Mitchell2014-02-281-1/+5
| | | | | | | | | | Use aelemfast for literal index array access where the index is in the range -128..127, rather than 0..255. You'd expect something like $a[-1] or $a[-2] to be a lot more common than $a[100] say. In fact a quick CPAN grep shows 66 distributions matching /\$\w+\[\d{3,}\]/, but "at least" 1000 matching /\$\w+\[\-\d\]/. And most of the former appear to be table initialisations.
* don't set SvPADTMP() on PADGV'sDavid Mitchell2014-02-271-4/+11
| | | | | | | | | | | | | | | | | | | | | | | Under threaded builds, GVs for OPs are stored in the pad rather than being directly attached to the op. For some reason, all such GV's were getting the SvPADTMP flag set. There seems to be be no good reason for this, and after skipping setting the flag, all tests still pass. The advantage of not setting this flag is that there are quite a few hot places in the code that do if (SvPADTMP(sv) && !IS_PADGV(sv)) { ... I've replaced them all with if (SvPADTMP(sv)) { assert(!IS_PADGV(sv)); ... Since the IS_PADGV() macro expands to something quite heavyweight, this is quite a saving: for example this commit reduces the size of pp_entersub by 111 bytes.
* pp_entersub(): simplify XSUB arg cleanup codeDavid Mitchell2014-02-271-6/+6
| | | | | | The code that reduces the returned XS args to 1 arg in scalar context was a bit messy. This makes it smaller, more comprehensible, with slighylu smaller object size.
* pp_entersub(): tweak some varsDavid Mitchell2014-02-271-5/+8
| | | | | | | | | make the scope of item be smaller; don't assign +1 to MARK when we will shortly overwrite it; use a local var to to store &GvAV(PL_defgv) to avoid recalculating it use a _NN variant of SvREFCNT_inc Shaves 2% off the object size of pp_entersub.
* pp_entersub(): assign CvDEPTH to a local varDavid Mitchell2014-02-271-5/+5
| | | | stop repeatedly retrieving it
* pp_entersub(): re-indent blockDavid Mitchell2014-02-271-54/+54
| | | | | the previous commit wrapped a switch statement in an if(), so re-indent accordingly. Whitespace-only change.
* tweak pp_entersubDavid Mitchell2014-02-271-18/+25
| | | | | | | | | liberally sprinkle some LIKELY()s, initialise gimme closer to where it's first needed, and shortcut the common case of a GV with valid CV, skipping the switch statement. With gcc and x86_64, this reduces the code size of the function by about 5%.
* Change av_len calls to av_tindex for clarityKarl Williamson2014-02-201-2/+2
| | | | | | av_tindex is a more clearly named synonym for av_len, available starting in v5.18. This changes the core uses to it, including modules in /ext, which are not dual-lifed.
* Make taint checking regex compile time instead of runtimeKarl Williamson2014-02-191-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | See discussion at https://rt.perl.org/Ticket/Display.html?id=120675 There are several unresolved items in this discussion, but we did agree that tainting should be dependent only on the regex pattern, and not the particular input string being matched against: "The bottom line is we are moving to the policy that tainting is based on the operation being in locale, without regard to the particular operand's contents passed this time to the operation. This means simpler core code and more consistent tainting results. And it lessens the likelihood that there are paths in the core that should taint but don't" This commit does the minimal work to change regex pattern matching to determine tainting at pattern compilation time. Simply put, if a pattern contains a regnode whose match/not match depends on the run-time locale, any attempt to match against that pattern will taint, regardless of the actual target string or runtime locale in effect. Given this change, there are optimizations that can be made to avoid runtime work, but these are deferred until later. Note that just because a regular expression is compiled under locale doesn't mean that the generated pattern will be tainted. It depends on the actual pattern. For example, the pattern /(.)/ doesn't taint because it will match exactly one character of the input, regardless of locale settings.
* pp_concat: Only call SvPV_force_nolen when neededSteffen Mueller2014-02-101-4/+6
| | | | | | | | | If we just did an sv_setpvs on it, the SvPV_force_nolen should not do anything useful, so let's not. Side note: s/TARG/left/ in the enclosing block because they are the same pointer, so why use a define that needs grokking by the reader of the code if the local variable is guaranteed to be the same?
* pp_hot.c: Rmv unnecessary testKarl Williamson2014-01-221-2/+1
| | | | | The strchr() cannot not succeed unless the first test fails, so the first test need not be tried at all.
* fix 'ignoring return value' compiler warningsDavid Mitchell2013-11-241-10/+19
| | | | | | | | | | | Various system functions like write() are marked with the __warn_unused_result__ attribute, which causes an 'ignoring return value' warning to be emitted, even if the function call result is cast to (void). The generic solution seems to be int rc = write(...); PERL_UNUSED_VAR(rc);
* [perl #119811] Remove %DB::lsubFather Chrysostomos2013-10-281-1/+1
| | | | | | | | Under the debugger (the -d switch), if there is a DB::sub subroutine, then %DB::lsub gets autovivified during a call to an lvalue sub. That hash is never used. The code for vivifying the *DB::lsub glob was copied from *DB::sub when lsub was added, and *DB::sub does have a hash (%DB::sub), but *DB::lsub doesn’t need one.
* pp_hot.c:pp_rv2av: Remove superfluous SPAGAINFather Chrysostomos2013-10-241-1/+0
| | | | | | | | | | Commit perl-5.8.0-9908-g5e17dd7, in March of 2007, added a SPAGAIN here to account for the stack shifting when hv_scalar calls tied hashes’ SCALAR method. It was never necessary, because commit perl-5.8.0-3008-ga3bcc51, which added hv_scalar and magic_scalarpack in December of 2003, made magic_scalarpack push a new stack, protecting the old one.
* rv2hv does not use its TARGFather Chrysostomos2013-10-241-1/+1
| | | | | | | rv2hv has had a TARG since perl 5.000, but it has not used it since hv_scalar was added in perl-5.8.0-3008-ga3bcc51. This commit removes it, saving a tiny bit of space in the pad.
* Make &xsub and goto &xsub work with tied @_Father Chrysostomos2013-09-091-2/+11
| | | | | | | | | | | | | This is the only place where tied @_ does not work, and there appears to be no reason why it shouldn’t, apart from the fact that it hasn’t been implemented. Commit 67955e0c was what made &xsub work to begin with. 93965878572 introduced tied arrays and added the comment to pp_entersub saying that @_ is not tiable. goto &xsub has worked since perl 5.000, but 93965878572 did not make it work with tied arrays.
* [perl #117265] move the "glob failed" warning to the point of failureTony Cook2013-09-091-7/+3
| | | | This avoids an extraneous warning when globbing fails for other reasons.
* Allow 64-bit array and stack offsets in entersub & gotoFather Chrysostomos2013-09-061-4/+4
| | | | | I don’t have enough memory to test this, but it needs to be done even- tually anyway.
* Stop &xsub and goto &xsub from crashing on undef *_Father Chrysostomos2013-09-061-1/+1
| | | | | | | | | | | | | | | | $ perl -e 'undef *_; &Internals::V' Segmentation fault: 11 $ perl -e 'sub { undef *_; goto &Internals::V }->()' $ perl5.18.1 -e 'sub { undef *_; goto &Internals::V }->()' Segmentation fault: 11 The goto case is actually a regression from 5.16 (049bd5ffd62), as goto used to ignore changes to *_. (Fixing one bug uncovers another.) We shouldn’t assume that GvAV(PL_defgv) (*_{ARRAY}) gives us anything. While we’re at it, since we have to add extra checks anyway, use them to speed up empty @_ in goto (by checking items, rather than arg).
* Put AV defelem creation code in one placeFather Chrysostomos2013-09-061-24/+5
|
* Use defelems for (goto) &xsub callsFather Chrysostomos2013-09-061-1/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Before ce0d59f: $ perl -e '++$#_; &utf8::encode' Modification of a read-only value attempted at -e line 1. As of ce0d59f: $ ./perl -Ilib -e '++$#_; &utf8::encode' Assertion failed: (sv), function Perl_sv_utf8_encode, file sv.c, line 3581. Abort trap: 6 Calling sub { utf8::encode($_[0]) } should be more or less equivalent to calling utf8::encode, but it is not in this case: $ ./perl -Ilib -we '++$#_; &{sub { utf8::encode($_[0]) }}' Use of uninitialized value in subroutine entry at -e line 1. In the first two examples above, an implementation detail is leaking through. What you are seeing is not the array element, but a place- holder that indicates an element that has not been assigned to yet. We should use defelem magic so that what the XSUB assigns to will cre- ate an array element (as happens with utf8::encode($_[0])). All of the above applies to goto &xsub as well.
* pp_hot.c:pp_aelem: Use _NN in one spotFather Chrysostomos2013-09-061-1/+1
| | | | | This av can never be null here. av_len will already have failed an assertion if it is.
* Stop creating defelems for undef in foreach(@_)Father Chrysostomos2013-08-281-4/+3
| | | | | | | | | | | | | | | | | | This is part of ticket #119433. This particular bug is triggered by Data::Dump’s test suite. Commit ce0d59f changed arrays to use NULL for nonexistent elements, instead of &PL_sv_undef (the special scalar returned by Perl’s ‘undef’ operator). ‘foreach’ was not updated to account. It was still treating &PL_sv_undef as a nonexistent element. This was causing ‘Modifica- tion of non-creatable array value attempted, subscript 0’, due to a similar bug in vivify_defelem, which the next commit will fix. (Fixing vivify_defelem without fixing foreach will make the test pass, but for foreach to create a defelem to begin with is inefficient and should be addressed anyway.)
* [perl #119311] Keep CvDEPTH and savestack in syncFather Chrysostomos2013-08-271-1/+1
| | | | | | | | | | | | | | | | when unwinding sub and format calls. The comments in the added test file explain what the problem is. The fix is to call LEAVE_SCOPE in POPSUB and POPFORMAT (to free their lexicals) before lowering CvDEPTH. If the context has already been popped via cxstack_ix--, then LEAVE_SCOPE could overwrite it, so accessing cx after LEAVE_SCOPE is unsafe. Hence the changes to POPSUB and POPFORMAT are a bit involved. Some callers of POPSUB do a temporary cxstack_ix++ first so they can access cx afterwards. Two cases needed to be changed to work that way.
* pp_hot.c: Show lengths in -Dr output for minlen optimisationFather Chrysostomos2013-08-251-1/+3
|
* [perl #116907] Allow //g matching past 2**31 thresholdFather Chrysostomos2013-08-251-1/+1
| | | | | | | | | Change the internal fields for storing positions so that //g in scalar context can move past the 2**31 character threshold. Before this com- mit, the numbers would wrap, resulting in assertion failures. The changes in this commit are only enough to get the added test pass- ing. Stay tuned for more.
* Stop minlen regexp optimisation from rejecting long stringsFather Chrysostomos2013-08-251-1/+1
| | | | | | | | | | This fixes #112790 and part of #116907. The length of the string is cast to I32, so it wraps and end up less than the minimum length. For now, simply skip this optimisation if minlen itself wraps and becomes negative.
* Stop pos() from being confused by changing utf8nessFather Chrysostomos2013-08-251-2/+2
| | | | | | | | | | | | | | | | | | | | | | | The value of pos() is stored as a byte offset. If it is stored on a tied variable or a reference (or glob), then the stringification could change, resulting in pos() now pointing to a different character off- set or pointing to the middle of a character: $ ./perl -Ilib -le '$x = bless [], chr 256; pos $x=1; bless $x, a; print pos $x' 2 $ ./perl -Ilib -le '$x = bless [], chr 256; pos $x=1; bless $x, "\x{1000}"; print pos $x' Malformed UTF-8 character (unexpected end of string) in match position at -e line 1. 0 So pos() should be stored as a character offset. The regular expression engine expects byte offsets always, so allow it to store bytes when possible (a pure non-magical string) but use char- acters otherwise. This does result in more complexity than I should like, but the alter- native (always storing a character offset) would slow down regular expressions, which is a big no-no.
* Use SSize_t for arraysFather Chrysostomos2013-08-251-7/+7
| | | | | | | | | | Make the array interface 64-bit safe by using SSize_t instead of I32 for array indices. This is based on a patch by Chip Salzenberg. This completes what the previous commit began when it changed av_extend.
* [perl #118747] Allow in-place s///g when !!PL_sawampersandFather Chrysostomos2013-08-221-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the more correct version of 1555b325 which was reverted by 6200d5a0e. In pp_subst, there is an initial pattern match against the target string, followed by logic to determine which of several code paths will handle the rest of the substitution, depending on which shortcuts can be taken. There is one path specifically for doing a global sort (/g) and modi- fying the target in place. This code was skipped if the target was a copy-on-write scalar or if the pre-match copy was enabled. The pre- match copy is always enabled now, so this code is unreachable. In-place substitution stringifies the rhs at the outset, just after the first regexp match, but before any substitution. Then it uses that string buffer, expecting it not to change. That clearly cannot work with s/a/$&/g; it will also cause erratic behaviour in the case of regexp code blocks (which will see the string being modified, which doesn not happen with unoptimised subst). That’s why the in-place optimisation has to be skipped when the REXEC_COPY_STR flag is set. But we can tweak that logic: • As long as the rhs is not a magical var, its contents are not going to change from one iteration to the next. • If there are no code blocks, nothing will see the string during the substitution. So this commit adds logic to check those things, enabling this opti- misation where possible. Skipping this optimisation for the pre-match copy was originally added in commit 5d5aaa5e7.
* Revert "[perl #118747] Allow in-place s///g when !!PL_sawampersand"Father Chrysostomos2013-08-221-0/+1
| | | | | | | This reverts commit 1555b325296e46f7b95bee03fe856cec348b0d57. This is causing test failures (not on my machine), and I do not have time right now to track them all down.
* pp_hot.c:pp_subst: Move commentFather Chrysostomos2013-08-211-2/+2
| | | | | | | | | Perl 5 has always had the ‘don't match same null twice’ comment in pp_subst. Originally it was a parenthetical note right below the s == m. 71be2cbc removed the parentheses. Commit f722798be moved it further away from s == m. Now what it refers to is far from clear.
* [perl #118747] Allow in-place s///g when !!PL_sawampersandFather Chrysostomos2013-08-211-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In pp_subst, there is an initial pattern match against the target string, followed by logic to determine which of several code paths will handle the rest of the substitution, depending on which shortcuts can be taken. There is one path specifically for doing a global sort (/g) and modi- fying the target in place. This code was skipped if the target was a copy-on-write scalar or if the pre-match copy was enabled. The pre- match copy is always enabled now, so this code is unreachable. There does not appear to be any reason why this path must be skipped in the presence of the pre-match copy. The string gets copied by the initial regexp match and $& and friends point there afterwards. This skip was added in commit 5d5aaa5e7 (a jumbo patch, so good luck figuring it out). This commit removes the skip, and all tests pass. This, of course, only affects those cases where copy-on-write does not kick in; for instance, when the string’s length is one less than its buffer: $ ./perl -Ilib -e 'use Devel::Peek; $x = " "; $x .= " "x22; Dump $x; $x =~ s/ /b/g; Dump $x' SV = PV(0x7ffb0b807098) at 0x7ffb0b82eed8 REFCNT = 1 FLAGS = (POK,pPOK) PV = 0x7ffb0b4066b8 " "\0 CUR = 23 LEN = 24 SV = PV(0x7ffb0b807098) at 0x7ffb0b82eed8 REFCNT = 1 FLAGS = (POK,pPOK) PV = 0x7ffb0b4066b8 "bbbbbbbbbbbbbbbbbbbbbbb"\0 CUR = 23 LEN = 24
* [perl #118691] Allow defelem magic with neg indicesFather Chrysostomos2013-08-211-2/+8
| | | | | | | | | | | | | | | | | | | | | | When a nonexistent array element is passed to a subroutine, a special ‘deferred element’ scalar (implemented using something called defelem magic) is passed to the subroutine instead, which delegates to the array element. This allows some_benign_function($array[$nonexistent]) to avoid autovivifying unnecessarily. Whether this magic would be triggered was based on whether the element was within the range 0..$#array. Since arrays can contain nonexistent elements before $#array, this logic is incorrect. It also makes sense to allow $array[$neg] where the negative number points before the beginning of the array to create a deferred element and only croak if it is assigned to. This commit fixes the logic for when deferred elements are created and implements these deferred negative elements. Since we have to be able to store negative values in xlv_targoff, it is convenient to make it a union (with two types--signed and unsigned) and use LvSTARGOFF for defelem array indices.
* [perl #7508] Use NULL for nonexistent array elemsFather Chrysostomos2013-08-201-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | This commit fixes bug #7508 and provides the groundwork for fixing several other bugs. Elements of @_ are aliased to the arguments, so that \$_[0] within sub foo will reference the same scalar as \$x if the sub is called as foo($x). &PL_sv_undef (the global read-only undef scalar returned by the ‘undef’ operator itself) was being used to represent nonexistent array elements. So the pattern would be broken for foo(undef), where \$_[0] would vivify a new $_[0] element, treating it as having been nonexistent. This also causes other problems with constants under ithreads (#105906) and causes a pending fix for another bug (#118691) to trig- ger this bug. This commit changes the internals to use a null pointer to represent a nonexistent element. This requires that Storable be changed to account for it. Also, IPC::Open3 was relying on the bug. So this commit patches both modules.
* Copy PADTMPS passed to XSUBsFather Chrysostomos2013-08-131-0/+9
| | | | | | | | | | This resolves the last remaining issue in ticket #78194, that newRV is supposedly buggy because it doesn’t copy its referent. The full implications of the PADTMP are not explained anywhere in the API docs, and even XSUBs shouldn’t have to worry about special handling. (E.g., what if they do SvREFCNT_dec(SvRV(sv)); SvRV(sv)=...?) So the real solution here is not to let XSUBs see them.
* Read-only COWs should not be exempt from s/// croakingFather Chrysostomos2013-08-111-3/+0
| | | | | | | | | | | | | | | | | | | | $ ./miniperl -Ilib -e 'for(__PACKAGE__) { s/a/a/ }' Modification of a read-only value attempted at -e line 1. $ ./miniperl -Ilib -e 'for(__PACKAGE__) { s/b/b/ }' $ ./miniperl -Ilib -e 'for("main") { s/a/a/ }' Modification of a read-only value attempted at -e line 1. $ ./miniperl -Ilib -e 'for("main") { s/b/b/ }' Modification of a read-only value attempted at -e line 1. When I pass the constant "main" to s///, it croaks whether the regular expression matches or not. When I pass __PACKAGE__, which has the same content and is also read- only, it only croaks when the pattern matches. This commit removes some logic that is left over from when READONLY+FAKE meant copy-on-write. Read-only does mean read-only now, so copy-on-write scalars should not be exempt from read-only checks.
* pp_match(): remove some superfluous bracesDavid Mitchell2013-07-311-4/+2
|
* pp_match(): only look up pos() magic onceDavid Mitchell2013-07-311-7/+7
| | | | | | | Currently before matching, we see whether the SV has any pos() magic attached; then after the match we look it up again to update pos(). Instead just remember the previous value of mg and reuse it where possible.
* pp_match(): remove redundant conditionDavid Mitchell2013-07-311-8/+5
| | | | | a successful match always sets $-[0] now, so there's no need to check whether its set
* RT #118213: handle $r=qr/.../; /$r/p properlyDavid Mitchell2013-07-301-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the case where a qr// regex is directly used by PMOP (rather than being interpolated with some other stuff and a new regex created, such as /a$r/p), then the PMf_KEEPCOPY flag will be set on the PMOP, but the corresponding RXf_PMf_KEEPCOPY flag *won't* be set on the regex. Since most of the regex handling for copying the string and extracting out ${^PREMATCH} etc is done based on the RXf_PMf_KEEPCOPY flag in the regex, this is a bit of a problem. Prior to 5.18.0 this wasn't so noticeable, since various other bugs around //p handling meant that ${$PREMATCH} etc often accidentally got set anyway. 5.18.0 fixed these bugs, and so as a side-effect, exposed the PMOP verses regex flag issue. In particular, this stopped working in 5.18.0: my $pat = qr/a/; 'aaaa' =~ /$pat/gp or die; print "MATCH=[${^MATCH}]\n"; (prints 'a' in 5.16.0, undef in 5.18.0). The presence /g caused the engine to copy the string anyway by luck. We can't just set the RXf_PMf_KEEPCOPY flag on the regex if we see the PMf_KEEPCOPY flag on the PMOP, otherwise stuff like this will be wrong: $r = qr/..../; /$r/p; # set RXf_PMf_KEEPCOPY on $r /$r/; # does a /p match by mistake Since for 5.19.x onwards COW is enabled by default (and cheap copies are always made regardless of /p), then this fix is mainly for PERL_NO_COW builds and for backporting to 5.18.x. (Although it still applies to strings that can't be COWed for whatever reason). Since we can't set a flag in the rx, we fix this by: 1) when calling the regex engine (which may attempt to copy part or all of the capture string), make sure we pass REXEC_COPY_STR, but neither of REXEC_COPY_SKIP_PRE, REXEC_COPY_SKIP_POST when we call regexec() from pp_match or pp_subst when the corresponding PMOP has PMf_KEEPCOPY set. 2) in Perl_reg_numbered_buff_fetch() etc, check for PMf_KEEPCOPY in PL_curpm as well as for RXf_PMf_KEEPCOPY in the current rx before deciding whether to process ${^PREMATCH} etc. As well as adding new tests to t/re/reg_pmod.t, I also changed the string to be matched against from being '12...' to '012...', to ensure that the lengths of ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} would all be different.
* s/.(?=.\G)/X/g: refuse to go backwardsDavid Mitchell2013-07-281-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | On something like: $_ = "123456789"; pos = 6; s/.(?=.\G)/X/g; each iteration could in theory start with pos one character to the left of the previous position, and with the substitution replacing bits that it has already replaced. Since that way madness lies, ban any attempt by s/// to substitute to the left of a previous position. To implement this, add a new flag to regexec(), REXEC_FAIL_ON_UNDERFLOW. This tells regexec() to return failure even if the match itself succeeded, but where the start of $& is before the passed stringarg point. This change caused one existing test to fail (which was added about a year ago): $_="abcdef"; s/bc|(.)\G(.)/$1 ? "[$1-$2]" : "XX"/ge; print; # used to print "aXX[c-d][d-e][e-f]"; now prints "aXXdef" I think that that test relies on ambiguous behaviour, and that my change makes things saner. Note that s/// with \G is generally very under-tested.
* pp_subst: don't use REXEC_COPY_STR on 2nd matchDavid Mitchell2013-07-281-2/+1
| | | | | | | | | | | pp_subst() sets the REXEC_COPY_STR flag on the first match. On the second and subsequent matches, it doesn't set it in two out three of the branches (including pp_susbstcont) where it calls CALLREGEXEC(). The one place where it *does* set it is a (harmless) mistake, since regexec ignores REXEC_COPY_STR if REXEC_NOT_FIRST is set (which is it is, on all 3 brnanches). So unset REXEC_COPY_STR in the third branch too, for consistency
* pp_subst: combine 3 small elsif blocks into 1David Mitchell2013-07-281-11/+5
| | | | and slightly reduce the scope of the temporary i var.
* pp_subst: remove one use of 'm' local varDavid Mitchell2013-07-281-2/+1
|
* pp_subst: reduce scope of 'i' variableDavid Mitchell2013-07-281-2/+3
| | | | | it's just used a temporary var in a few blocks; declare it individually in each block rather than being scoped to the whole function.