| Commit message (Collapse) | Author | Age | Files | Lines |
| |
This is the only place where tied @_ does not work, and there appears
to be no reason why it shouldn’t, apart from the fact that it hasn’t
been implemented.
Commit 67955e0c was what made &xsub work to begin with. 93965878572
introduced tied arrays and added the comment to pp_entersub saying
that @_ is not tiable.
goto &xsub has worked since perl 5.000, but 93965878572 did not make
it work with tied arrays.
| |
This avoids an extraneous warning when globbing fails for other reasons.
| |
I don’t have enough memory to test this, but it needs to be done even-
tually anyway.
| |
$ perl -e 'undef *_; &Internals::V'
Segmentation fault: 11
$ perl -e 'sub { undef *_; goto &Internals::V }->()'
$ perl5.18.1 -e 'sub { undef *_; goto &Internals::V }->()'
Segmentation fault: 11
The goto case is actually a regression from 5.16 (049bd5ffd62), as
goto used to ignore changes to *_. (Fixing one bug uncovers another.)
We shouldn’t assume that GvAV(PL_defgv) (*_{ARRAY}) gives us anything.
While we’re at it, since we have to add extra checks anyway, use them
to speed up empty @_ in goto (by checking items, rather than arg).
| |
Before ce0d59f:
$ perl -e '++$#_; &utf8::encode'
Modification of a read-only value attempted at -e line 1.
As of ce0d59f:
$ ./perl -Ilib -e '++$#_; &utf8::encode'
Assertion failed: (sv), function Perl_sv_utf8_encode, file sv.c, line 3581.
Abort trap: 6
Calling sub { utf8::encode($_[0]) } should be more or less equivalent
to calling utf8::encode, but it is not in this case:
$ ./perl -Ilib -we '++$#_; &{sub { utf8::encode($_[0]) }}'
Use of uninitialized value in subroutine entry at -e line 1.
In the first two examples above, an implementation detail is leaking
through. What you are seeing is not the array element, but a place-
holder that indicates an element that has not been assigned to yet.
We should use defelem magic so that what the XSUB assigns to will cre-
ate an array element (as happens with utf8::encode($_[0])).
All of the above applies to goto &xsub as well.
| |
This av can never be null here. av_len will already have failed an
assertion if it is.
| |
This is part of ticket #119433. This particular bug is triggered by
Data::Dump’s test suite.
Commit ce0d59f changed arrays to use NULL for nonexistent elements,
instead of &PL_sv_undef (the special scalar returned by Perl’s ‘undef’
operator).
‘foreach’ was not updated to account for this. It was still treating
&PL_sv_undef as a nonexistent element. This was causing ‘Modifica-
tion of non-creatable array value attempted, subscript 0’, due to a
similar bug in vivify_defelem, which the next commit will fix.
(Fixing vivify_defelem without fixing foreach will make the test pass,
but for foreach to create a defelem to begin with is inefficient and
should be addressed anyway.)
| |
when unwinding sub and format calls.
The comments in the added test file explain what the problem is.
The fix is to call LEAVE_SCOPE in POPSUB and POPFORMAT (to free their
lexicals) before lowering CvDEPTH.
If the context has already been popped via cxstack_ix--, then
LEAVE_SCOPE could overwrite it, so accessing cx after LEAVE_SCOPE is
unsafe. Hence the changes to POPSUB and POPFORMAT are a bit involved.
Some callers of POPSUB do a temporary cxstack_ix++ first so they
can access cx afterwards. Two cases needed to be changed to
work that way.
| |
Change the internal fields for storing positions so that //g in scalar
context can move past the 2**31 character threshold. Before this com-
mit, the numbers would wrap, resulting in assertion failures.
The changes in this commit are only enough to get the added test pass-
ing. Stay tuned for more.
| |
This fixes #112790 and part of #116907.
The length of the string is cast to I32, so it wraps and ends up less
than the minimum length.
For now, simply skip this optimisation if minlen itself wraps and
becomes negative.
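The wrap-around can be reproduced in isolation. What follows is a minimal sketch in C, not the actual pp_match code: wrap_to_i32, buggy_too_short and fixed_too_short are hypothetical names, and the "fixed" check only mirrors the idea of skipping the shortcut when minlen is negative.

```c
#include <assert.h>
#include <stdint.h>

/* Portable model of casting a 64-bit length to a 32-bit signed integer
 * (what a cast to I32 does on common platforms): keep the low 32 bits
 * and reinterpret them as two's complement. */
int32_t wrap_to_i32(uint64_t len)
{
    uint32_t low = (uint32_t)(len & 0xFFFFFFFFu);
    if (low >= 0x80000000u)
        return (int32_t)((int64_t)low - 4294967296LL);
    return (int32_t)low;
}

/* Buggy shortcut: the wrapped length is compared against minlen, so a
 * string longer than 2**31 - 1 bytes can look shorter than minlen. */
int buggy_too_short(uint64_t len, int32_t minlen)
{
    return wrap_to_i32(len) < minlen;
}

/* Hypothetical fixed shortcut: skip the optimisation when minlen is
 * negative (it has itself wrapped), and compare in the wide type. */
int fixed_too_short(uint64_t len, int32_t minlen)
{
    if (minlen < 0)
        return 0;   /* cannot trust minlen; let the engine decide */
    return len < (uint64_t)minlen;
}
```

With a length of 2**31 + 2 and a minlen of 3, assert(buggy_too_short(2147483650ULL, 3)) holds, while fixed_too_short reports that the string is long enough.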
| |
The value of pos() is stored as a byte offset. If it is stored on a
tied variable or a reference (or glob), then the stringification could
change, resulting in pos() now pointing to a different character off-
set or pointing to the middle of a character:
$ ./perl -Ilib -le '$x = bless [], chr 256; pos $x=1; bless $x, a; print pos $x'
2
$ ./perl -Ilib -le '$x = bless [], chr 256; pos $x=1; bless $x, "\x{1000}"; print pos $x'
Malformed UTF-8 character (unexpected end of string) in match position at -e line 1.
0
So pos() should be stored as a character offset.
The regular expression engine expects byte offsets always, so allow it
to store bytes when possible (a pure non-magical string) but use char-
acters otherwise.
This does result in more complexity than I should like, but the alter-
native (always storing a character offset) would slow down regular
expressions, which is a big no-no.
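The two failure modes shown in the examples above can be modelled with plain UTF-8 arithmetic. This is only a sketch in C with hypothetical helper names (is_continuation, chars_to_bytes, bytes_to_chars); it is not the code Perl uses to convert between the two kinds of offset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* True if b is a UTF-8 continuation byte (10xxxxxx). */
int is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Convert a character offset into a byte offset within a well-formed,
 * NUL-terminated UTF-8 string. Stops at the end of the string. */
size_t chars_to_bytes(const char *s, size_t nchars)
{
    size_t i = 0;
    while (nchars > 0 && s[i] != '\0') {
        i++;                                /* lead byte */
        while (is_continuation((unsigned char)s[i]))
            i++;                            /* rest of this character */
        nchars--;
    }
    return i;
}

/* Convert a byte offset back into a character offset. Returns -1 if the
 * offset lands in the middle of a character. */
long bytes_to_chars(const char *s, size_t nbytes)
{
    long chars = 0;
    size_t i;
    assert(nbytes <= strlen(s));
    if (is_continuation((unsigned char)s[nbytes]))
        return -1;
    for (i = 0; i < nbytes; i++)
        if (!is_continuation((unsigned char)s[i]))
            chars++;
    return chars;
}
```

Character 1 of "\xC4\x80=ARRAY" (class name chr 256) is byte 2. The same byte offset is character 2 of "a=ARRAY" and falls inside the first character of "\xE1\x80\x80=ARRAY" (class name "\x{1000}"), which corresponds to the two results printed above.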
| |
Make the array interface 64-bit safe by using SSize_t instead of I32
for array indices.
This is based on a patch by Chip Salzenberg.
This completes what the previous commit began when it changed
av_extend.
| |
This is the more correct version of 1555b325 which was reverted
by 6200d5a0e.
In pp_subst, there is an initial pattern match against the target
string, followed by logic to determine which of several code paths
will handle the rest of the substitution, depending on which shortcuts
can be taken.
There is one path specifically for doing a global substitution (/g) and
modifying the target in place. This code was skipped if the target was a
copy-on-write scalar or if the pre-match copy was enabled. The pre-match
copy is always enabled now, so this code is unreachable.
In-place substitution stringifies the rhs at the outset, just after
the first regexp match, but before any substitution. Then it uses
that string buffer, expecting it not to change.
That clearly cannot work with s/a/$&/g; it will also cause erratic
behaviour in the case of regexp code blocks (which will see the
string being modified, which does not happen with unoptimised subst).
That’s why the in-place optimisation has to be skipped when the
REXEC_COPY_STR flag is set.
But we can tweak that logic:
• As long as the rhs is not a magical var, its contents are not going
to change from one iteration to the next.
• If there are no code blocks, nothing will see the string during the
substitution.
So this commit adds logic to check those things, enabling this opti-
misation where possible.
Skipping this optimisation for the pre-match copy was originally added
in commit 5d5aaa5e7.
| |
This reverts commit 1555b325296e46f7b95bee03fe856cec348b0d57.
This is causing test failures (not on my machine), and I do not have
time right now to track them all down.
| |
Perl 5 has always had the ‘don't match same null twice’ comment
in pp_subst. Originally it was a parenthetical note right below
the s == m.
71be2cbc removed the parentheses. Commit f722798be moved it further
away from s == m. Now what it refers to is far from clear.
| |
In pp_subst, there is an initial pattern match against the target
string, followed by logic to determine which of several code paths
will handle the rest of the substitution, depending on which shortcuts
can be taken.
There is one path specifically for doing a global substitution (/g) and
modifying the target in place. This code was skipped if the target was a
copy-on-write scalar or if the pre-match copy was enabled. The pre-match
copy is always enabled now, so this code is unreachable.
There does not appear to be any reason why this path must be skipped
in the presence of the pre-match copy. The string gets copied by the
initial regexp match and $& and friends point there afterwards.
This skip was added in commit 5d5aaa5e7 (a jumbo patch, so good luck
figuring it out). This commit removes the skip, and all tests pass.
This, of course, only affects those cases where copy-on-write does
not kick in; for instance, when the string’s length is one less than
its buffer:
$ ./perl -Ilib -e 'use Devel::Peek; $x = " "; $x .= " "x22; Dump $x; $x =~ s/ /b/g; Dump $x'
SV = PV(0x7ffb0b807098) at 0x7ffb0b82eed8
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x7ffb0b4066b8 " "\0
CUR = 23
LEN = 24
SV = PV(0x7ffb0b807098) at 0x7ffb0b82eed8
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x7ffb0b4066b8 "bbbbbbbbbbbbbbbbbbbbbbb"\0
CUR = 23
LEN = 24
| |
When a nonexistent array element is passed to a subroutine, a special
‘deferred element’ scalar (implemented using something called defelem
magic) is passed to the subroutine instead, which delegates to the
array element. This allows some_benign_function($array[$nonexistent])
to avoid autovivifying unnecessarily.
Whether this magic would be triggered was based on whether the element
was within the range 0..$#array. Since arrays can contain nonexistent
elements before $#array, this logic is incorrect. It also makes sense
to allow $array[$neg] where the negative number points before the
beginning of the array to create a deferred element and only croak if
it is assigned to.
This commit fixes the logic for when deferred elements are created
and implements these deferred negative elements.
Since we have to be able to store negative values in xlv_targoff, it
is convenient to make it a union (with two types--signed and unsigned)
and use LvSTARGOFF for defelem array indices.
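The idea of a deferred element can be illustrated with a toy model. The sketch below is C written for this note, with hypothetical types (ToyArray, Deferred) and a fixed capacity. It does not reflect how defelem magic or LvSTARGOFF are implemented, only the behaviour described: reading never creates the element, assignment does, and a negative index that points before the start fails only on assignment.

```c
#include <assert.h>
#include <stddef.h>

#define CAPACITY 8

typedef struct {
    long value[CAPACITY];
    int  present[CAPACITY];   /* 0 = nonexistent element */
    long len;                 /* number of slots in use ($#array + 1) */
} ToyArray;

/* Deferred element: remembers the array and a signed index, and touches
 * the array only when it is assigned to. */
typedef struct {
    ToyArray *array;
    long      index;          /* may be negative, counted from the end */
} Deferred;

/* Reading through the proxy never creates the element. */
int deferred_exists(const Deferred *d)
{
    long i = d->index < 0 ? d->index + d->array->len : d->index;
    return i >= 0 && i < d->array->len && d->array->present[i];
}

/* Assigning through the proxy creates the element. Returns 0 on success,
 * -1 where Perl would croak (index before the start of the array or, in
 * this toy only, beyond CAPACITY). */
int deferred_store(Deferred *d, long v)
{
    long i = d->index < 0 ? d->index + d->array->len : d->index;
    long k;
    if (i < 0 || i >= CAPACITY)
        return -1;
    for (k = d->array->len; k < i; k++)
        d->array->present[k] = 0;         /* new holes stay nonexistent */
    if (i >= d->array->len)
        d->array->len = i + 1;
    d->array->value[i] = v;
    d->array->present[i] = 1;
    return 0;
}
```

Note that deferred_exists checks the per-slot flag rather than only comparing the index with the array length, which is the distinction this commit draws for holes inside 0..$#array.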
| |
This commit fixes bug #7508 and provides the groundwork for fixing
several other bugs.
Elements of @_ are aliased to the arguments, so that \$_[0] within
sub foo will reference the same scalar as \$x if the sub is called
as foo($x).
&PL_sv_undef (the global read-only undef scalar returned by the
‘undef’ operator itself) was being used to represent nonexistent
array elements. So the pattern would be broken for foo(undef), where
\$_[0] would vivify a new $_[0] element, treating it as having been
nonexistent.
This also causes other problems with constants under ithreads
(#105906) and causes a pending fix for another bug (#118691) to trig-
ger this bug.
This commit changes the internals to use a null pointer to represent a
nonexistent element.
This requires that Storable be changed to account for it. Also,
IPC::Open3 was relying on the bug. So this commit patches
both modules.
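The ambiguity is easy to see in a stripped-down model. The following C sketch uses a toy Scalar type and a stand-in shared_undef object, both hypothetical and unrelated to the real SV structures. It only contrasts the two conventions for marking a slot as nonexistent.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int defined; int value; } Scalar;   /* toy scalar */

/* Stand-in for a shared, read-only "undef" scalar. */
Scalar shared_undef = { 0, 0 };

/* Old convention: a slot holding the shared undef scalar is treated as
 * nonexistent, so an explicit undef argument cannot be told apart from
 * a missing element. */
int exists_old(Scalar *const *slots, size_t i)
{
    assert(slots != NULL);
    return slots[i] != &shared_undef;
}

/* New convention: only a null pointer means "nonexistent". */
int exists_new(Scalar *const *slots, size_t i)
{
    assert(slots != NULL);
    return slots[i] != NULL;
}
```

For an argument list holding a real scalar, the shared undef and a null slot, exists_old reports the second slot as missing even though the caller passed undef explicitly; exists_new reports it as present and only the null slot as missing.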
| |
This resolves the last remaining issue in ticket #78194, that
newRV is supposedly buggy because it doesn’t copy its referent.
The full implications of the PADTMP are not explained anywhere in
the API docs, and even XSUBs shouldn’t have to worry about special
handling. (E.g., what if they do SvREFCNT_dec(SvRV(sv)); SvRV(sv)=...?)
So the real solution here is not to let XSUBs see them.
| |
$ ./miniperl -Ilib -e 'for(__PACKAGE__) { s/a/a/ }'
Modification of a read-only value attempted at -e line 1.
$ ./miniperl -Ilib -e 'for(__PACKAGE__) { s/b/b/ }'
$ ./miniperl -Ilib -e 'for("main") { s/a/a/ }'
Modification of a read-only value attempted at -e line 1.
$ ./miniperl -Ilib -e 'for("main") { s/b/b/ }'
Modification of a read-only value attempted at -e line 1.
When I pass the constant "main" to s///, it croaks whether the regular
expression matches or not.
When I pass __PACKAGE__, which has the same content and is also read-
only, it only croaks when the pattern matches.
This commit removes some logic that is left over from when
READONLY+FAKE meant copy-on-write. Read-only does mean read-only now,
so copy-on-write scalars should not be exempt from read-only checks.
| |
Currently before matching, we see whether the SV has any pos() magic
attached; then after the match we look it up again to update pos().
Instead just remember the previous value of mg and reuse it where
possible.
| |
a successful match always sets $-[0] now, so there's no need to check
whether it's set.
| |
In the case where a qr// regex is directly used by PMOP (rather than being
interpolated with some other stuff and a new regex created, such as
/a$r/p), then the PMf_KEEPCOPY flag will be set on the PMOP, but the
corresponding RXf_PMf_KEEPCOPY flag *won't* be set on the regex.
Since most of the regex handling for copying the string and extracting out
${^PREMATCH} etc is done based on the RXf_PMf_KEEPCOPY flag in the regex,
this is a bit of a problem.
Prior to 5.18.0 this wasn't so noticeable, since various other bugs around
//p handling meant that ${^PREMATCH} etc often accidentally got set
anyway. 5.18.0 fixed these bugs, and so as a side-effect, exposed the
PMOP versus regex flag issue. In particular, this stopped working in
5.18.0:
my $pat = qr/a/;
'aaaa' =~ /$pat/gp or die;
print "MATCH=[${^MATCH}]\n";
(prints 'a' in 5.16.0, undef in 5.18.0).
The presence of /g caused the engine to copy the string anyway by luck.
We can't just set the RXf_PMf_KEEPCOPY flag on the regex if we see the
PMf_KEEPCOPY flag on the PMOP, otherwise stuff like this will be wrong:
$r = qr/..../;
/$r/p; # set RXf_PMf_KEEPCOPY on $r
/$r/; # does a /p match by mistake
Since for 5.19.x onwards COW is enabled by default (and cheap copies are
always made regardless of /p), then this fix is mainly for PERL_NO_COW
builds and for backporting to 5.18.x. (Although it still applies to
strings that can't be COWed for whatever reason).
Since we can't set a flag in the rx, we fix this by:
1) when calling the regex engine (which may attempt to copy part or all of
the capture string), make sure we pass REXEC_COPY_STR, but neither of
REXEC_COPY_SKIP_PRE, REXEC_COPY_SKIP_POST when we call regexec() from
pp_match or pp_subst when the corresponding PMOP has PMf_KEEPCOPY set.
2) in Perl_reg_numbered_buff_fetch() etc, check for PMf_KEEPCOPY in
PL_curpm as well as for RXf_PMf_KEEPCOPY in the current rx before deciding
whether to process ${^PREMATCH} etc.
As well as adding new tests to t/re/reg_pmod.t, I also changed the
string to be matched against from being '12...' to '012...', to ensure that
the lengths of ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} would all be
different.
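Point 2 of the fix amounts to consulting two flag words instead of one. A minimal sketch in C, assuming made-up flag values (the SKETCH_ constants are not the real PMf_KEEPCOPY or RXf_PMf_KEEPCOPY bits) and a hypothetical helper name:

```c
#include <assert.h>

/* Hypothetical bit values, chosen for illustration only. */
#define SKETCH_PMf_KEEPCOPY     0x1u
#define SKETCH_RXf_PMf_KEEPCOPY 0x1u

/* /p semantics apply if either the op performing the match or the
 * compiled regex carries the keep-copy flag. Nothing is written back to
 * the regex, so a later match without /p is unaffected. */
int keepcopy_in_effect(unsigned pmop_flags, unsigned rx_flags)
{
    return (pmop_flags & SKETCH_PMf_KEEPCOPY) != 0
        || (rx_flags & SKETCH_RXf_PMf_KEEPCOPY) != 0;
}
```

In the qr// example above, the first match would be keepcopy_in_effect(SKETCH_PMf_KEEPCOPY, 0), which is true, and the later plain /$r/ would be keepcopy_in_effect(0, 0), which is false.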
| |
On something like:
$_ = "123456789";
pos = 6;
s/.(?=.\G)/X/g;
each iteration could in theory start with pos one character to the left
of the previous position, and with the substitution replacing bits that
it has already replaced. Since that way madness lies, ban any attempt by
s/// to substitute to the left of a previous position.
To implement this, add a new flag to regexec(), REXEC_FAIL_ON_UNDERFLOW.
This tells regexec() to return failure even if the match itself succeeded,
but where the start of $& is before the passed stringarg point.
This change caused one existing test to fail (which was added about a year
ago):
$_="abcdef";
s/bc|(.)\G(.)/$1 ? "[$1-$2]" : "XX"/ge;
print; # used to print "aXX[c-d][d-e][e-f]"; now prints "aXXdef"
I think that that test relies on ambiguous behaviour, and that my change
makes things saner.
Note that s/// with \G is generally very under-tested.
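The rule behind the new flag can be written as a one-line predicate. This is a sketch in C under stated assumptions: SKETCH_FAIL_ON_UNDERFLOW is a made-up bit value, report_match is a hypothetical name, and positions are modelled as plain offsets rather than pointers into the string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag value, chosen for illustration only. */
#define SKETCH_FAIL_ON_UNDERFLOW 0x01u

/* Decide whether a match that the engine found should be reported as a
 * success. match_start is where $& begins; stringarg is where the caller
 * asked matching to begin. Both are offsets from the start of the string. */
int report_match(size_t match_start, size_t stringarg, unsigned flags)
{
    if ((flags & SKETCH_FAIL_ON_UNDERFLOW) && match_start < stringarg)
        return 0;   /* $& would start left of the previous position */
    return 1;
}
```

A match starting at offset 4 when the caller asked for offset 6 is rejected with the flag set and accepted without it, so callers other than s/// keep their existing behaviour.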
| |
pp_subst() sets the REXEC_COPY_STR flag on the first match. On the second
and subsequent matches, it doesn't set it in two out of three of the branches
(including pp_substcont) where it calls CALLREGEXEC().
The one place where it *does* set it is a (harmless) mistake, since regexec
ignores REXEC_COPY_STR if REXEC_NOT_FIRST is set (which it is, on all 3
branches).
So unset REXEC_COPY_STR in the third branch too, for consistency.
| |
and slightly reduce the scope of the temporary i var.
| |
it's just used as a temporary var in a few blocks; declare it individually
in each block rather than being scoped to the whole function.
| |
it's mainly just a temporary local var; declare it individually within each
scope that makes use of it.
| |
This should be just a cosmetic change; but basically change stuff like
m = orig;
s = foo();
... lots of lines not using s or m ...
bar(m,s)
... more stuff using s ...
to
... lots of lines not using s or m ...
s = foo();
bar(orig,s)
... more stuff using s ...
This is part of a few commits to generally clean up the scope and
comprehensibility of the vars within pp_subst
| |
It's just used as a temporary value in two branches;
so make it a local var in each of those branches.
| |
since 'orig' always points to the start of the string, while 's' varies,
change
s = SvPV_nomg(...);
...other stuff using value of s ...
orig = s
...
to
orig = SvPV_nomg(...);
...other stuff using value of orig ...
s = orig
...
No functional change, just reduces the cognitive load slightly.
Also adds some comments as to what force_on_match is about.
| |
The previous commit removed the \G handling from pp_match; most of what's
left in that code block is redundant code that just sets curpos under all
conditions. So tidy it up.
| |
Normally a /g match starts its processing at the previous pos() (or at
char 0 if pos is not set); however in the case of something like /abc\G/
we actually need to start 3 characters before pos. This has been handled
by the *callers* of regexec() subtracting prog->gofs from the stringarg
arg before calling it, or by setting stringarg to strbeg for floating,
such as /\w+\G/.
This is clearly wrong: the callers of regexec() shouldn't need to worry
about the details of getting \G right: move this code into regexec()
itself.
(Note that although this commit passes all tests, it quite possibly isn't
logically correct. It will get fixed up further during the next few
commits)
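The start-position rule that callers used to apply can be summarised as follows. This is a sketch in C with hypothetical names (ganch_kind, start_offset) working on offsets rather than pointers. Clamping to 0 when pos is smaller than gofs is a simplification made here, not something the commit states.

```c
#include <assert.h>
#include <stddef.h>

/* How \G appears in the pattern (hypothetical classification). */
enum ganch_kind {
    GANCH_NONE,      /* no \G, or \G at the very start of the pattern */
    GANCH_FIXED,     /* \G at a fixed distance gofs, as in /abc\G/ */
    GANCH_FLOATING   /* \G at a variable distance, as in /\w+\G/ */
};

/* Offset from the start of the string at which matching should begin,
 * given the current pos(). */
size_t start_offset(size_t pos, enum ganch_kind kind, size_t gofs)
{
    switch (kind) {
    case GANCH_FIXED:
        /* Back up by the fixed distance; the clamp is this sketch's own
         * simplification for pos < gofs. */
        return pos >= gofs ? pos - gofs : 0;
    case GANCH_FLOATING:
        return 0;            /* start from the beginning of the string */
    default:
        return pos;
    }
}
```

For /abc\G/ with pos at 5, start_offset(5, GANCH_FIXED, 3) gives 2, the "3 characters before pos" case described above.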
| |
Currently all core callers of regexec set both the
REXEC_IGNOREPOS and REXEC_NOT_FIRST flags, or neither, depending
on whether this is the first or subsequent iteration of a //g;
*except* for one place in pp_match(), where REXEC_IGNOREPOS is set
on the first iteration for the one specific case of /g with an anchored
\G.
Now AFAICT this makes no difference, because the starting position
as calculated by regexec() still comes to the same value of
(strbeg + pos - gofs), and the same value of ganch calculated.
Also in the commit that added this particular use of the flag to pp_match,
(0ef3e39ecdfec), removing the flag makes no difference to the passing or
not of the new test case.
So I don't understand what its purpose is, and it's possibly a mistake.
Removing it now makes the code simpler for further cleanup.
| |
It doesn't actually achieve anything.
| |
This function sometimes set $+[0] to pos() before calling regexec().
This value isn't used by regexec(), and was really just a way of updating
the new start position for //g. Replace it with a local var instead.
| |
and restrict usage of s variable
| |
In one specific case, pp_match() passes the value of pos() to regexec()
via the otherwise unused 'data' arg.
It turns out that pp_match() only passes this value when it exists and is
>= 0, while regexec() only uses it when there's no pos magic or pos() < 0.
So it's never used as far as I can tell.
So, strip it for now.
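The reasoning is that the two conditions are exact opposites. A small C sketch with hypothetical function names makes this checkable; it models only the two conditions as described in this message, not the surrounding code.

```c
#include <assert.h>

/* pp_match() side: the value is passed only when pos() exists and is >= 0. */
int caller_passes(int has_pos_magic, long pos)
{
    return has_pos_magic && pos >= 0;
}

/* regexec() side: the value is used only when there is no pos magic or
 * pos() < 0. */
int callee_uses(int has_pos_magic, long pos)
{
    return !has_pos_magic || pos < 0;
}
```

By De Morgan's law the second condition is the negation of the first, so there is no input for which the value is both passed and used.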
| |
pp_match() has an intuit-only match mode: if intuit_start() succeeds and
the regex is marked as only needing intuit (RXf_CHECK_ALL), then calling
regexec() is skipped, and pp_match() just sets $& and then returns.
The commit which originally added that feature to pp_match() also added a
comment to pp_subst() suggesting that the same thing could be done there.
This commit finally achieves that. It builds on the previous commit (which
moved this mechanism from pp_match() directly into regexec()), skipping
calling intuit_start() and directly calling regexec() with the
REXEC_CHECKED flag not set.
This appears to reduce the execution time of a simple substitution
like s/abc/def/ by a fifth.
| |
Currently the main part of pp_match() looks like:
    if (can_use_intuit) {
        if (!intuit_start())
            goto nope;
        if (can_match_based_only_on_intuit_result) {
            ... set up $&, $-[0] etc ...
            goto gotcha;
        }
    }
    if (!regexec(..., REXEC_CHECKED|r_flags))
        goto nope;
  gotcha:
    ...
This rather breaks the regex API encapsulation. The caller of the regex
engine shouldn't have to worry about whether to call intuit() or
regexec(), and to know to set $& in the intuit-only case.
So, move all the intuit-calling and $& setting into regexec itself.
This is cleaner, and will also shortly allow us to enable intuit-only
matches in pp_subst() too. After this change, the code above looks like
(in its entirety):
    if (!regexec(..., r_flags))
        goto nope;
    ...
There, isn't that nicer?
| |
Fix a bug in intuit_start() that makes it fail when the utf8-ness of the
string and pattern differ. This was mostly masked, since pp_match() skips
calling intuit in this case (and has done since 2000, presumably as a
workaround for this issue, and possibly for other issues since fixed).
But pp_subst() didn't skip, so code like this would fail:
$c = "\x{c0}";
utf8::upgrade($c);
print "ok\n" if $c =~ s/\xC0{1,2}$/\xC0/i;
Now that intuit is (hopefully) fixed, also remove the guard in pp_match().
| |
A recent commit did RX_MATCH_UTF8_set() based on the utf8-ness of the
pattern rather than the match string. It didn't matter because in that
branch they were guaranteed to have the same value, but fix it anyway,
both for correctness' sake, and because it *will* matter shortly.
| |
It looks like we no longer need to skip intuit-only matching when the
match is a ref or overloaded (e.g. $ref =~ /ARRAY/)
| |
The nope: and ret_no: labels labelled the same point in the code.
Eliminate one of them.
| |
There was some code that looked roughly like:
    if (can_match_on_intuit_only) {
        ....
        goto yup;
    }
    if (!regexec())
        goto ret_no;
  gotcha:
    A; B;
    if (simple)
        RETURNYES;
    X; Y;
    RETURN;
  yup:
    A;
    if (!simple)
        goto gotcha;
    B;
    RETURNYES;
Refactor it to look like
    if (can_match_on_intuit_only) {
        ....
        goto gotcha;
    }
    if (!regexec())
        goto ret_no;
  gotcha:
    A; B;
    if (simple)
        RETURNYES;
    X; Y;
    RETURN;
As well as simplifying the code, it also avoids duplicating some work
(the 'A' above was done twice sometimes) - harmless but less efficient.
| |
change
    if (intuit_only)
        goto yup;
    ...
  yup:
    A; B; X; Y;
to
    if (intuit_only) {
        A; B;
        goto yup;
    }
    ...
  yup:
    X; Y;
where A and B are intuit_only-specific steps while X and Y are done by the
regexec() branch too. This will shortly allow us to merge the two
branches.