| Commit message | Author | Age | Files | Lines |
We've had a few reports of segmentation faults and other misbehaviour
when sub-parsing, such as within interpolated expressions, fails.
This change aborts compilation if anything complex enough to not be
parsed by the lexer is compiled in a sub-parse *and* an error
occurs within the sub-parse.
An earlier version of this patch also failed on simpler expressions,
which caused many test failures; this version doesn't (which may
just mean we need more tests...)
(cherry picked from commit bb4e4c3869d9fb6ee5bddd820c2a373601ecc310)
Modified for maint by cherry-picker: New parser struct members moved to
end of struct to preserve backwards-compatibility.
(cherry picked from commit 1141a2c757171575dd43caa4b731ca4f491c2bcf)
The:
d = skipspace(d);
can reallocate linestr in the test case, invalidating s. This would
end up in PL_bufptr from the embedded (PL_bufptr = s) in the TOKEN()
macro.
Assigning s to PL_bufptr and restoring s from PL_bufptr allows
lex_next_chunk() to adjust the pointer to the reallocated buffer.
(cherry picked from commit 3b8804a4c2320ae4e7e713c5836d340eb210b6cd)
In the test case, scan_ident() ends up fetching another line
(updating PL_linestart), and since in this case we don't
successfully parse ${identifier}, s (and PL_bufptr) end up
before PL_linestart.
(cherry picked from commit 36000cd1c47863d8412b285701db7232dd450239)
This reverts commit d9fc04eebe29b8cf5f6f6bf31373b202eafa44d6.
As discussed in
http://www.nntp.perl.org/group/perl.perl5.porters/2016/05/msg236423.html,
the current perl6-shebang code has rather sharp edge-cases. Hence a revert
until we come up with a better solution seems wise.
(cherry picked from commit f691e4455dd520eff11e7f070a9b034b0fa5ca1c)
Looking ahead for the "Missing $ on loop variable" diagnostic can reallocate
PL_linestr, invalidating our pointer. Save the offset so we can update it
in that case.
Some vars have been tagged as const because they do not change in their
new scopes. In pp_reverse in pp.c, I32 tmp is only used to hold a char,
so is changed to char.
Perl tries to continue parsing in the face of errors for the convenience
of the person running the script, so as to batch up as many errors as
possible, and cut down the number of runs. Some errors will, however,
have a cascading effect, resulting in the parser getting confused as to
the intent. Perl currently aborts parsing if 10 errors accumulate.
However, some things are reparsed as compilation continues, in
particular tr///, s///, and qr//. The code that reparses has an
expectation of basic sanity in what it is looking at, and so reparsing
with known errors can lead to segfaults. Recent commits have tightened
this up to avoid reparsing, or substitute valid stuff before reparsing.
This all works, as the code won't execute until all the errors get
fixed.
Commit f065e1e68bf6a5541c8ceba8c9fcc6e18f51a32b changed things so that
if there is an error in parsing a pattern, the whole compilation is
immediately aborted. Since then, I realized it would be relatively
simple to instead, skip compilation of that particular pattern, but
continue on with the parsing of the program as a whole, up to the
maximum number of allowed errors. And again the program will refuse to
execute after compilation if there were any errors.
This commit implements that, the benefit being that we don't try to
reparse a pattern that failed the original parse, but can go on to find
errors elsewhere in the program.
Commit 3dd4eaeb8ac39e08179145b86aedda36584a3509 fixed a bug wherein the
tr/// operator parsing code could be looking at uninitialized data.
This happens only because we try to carry on when we find errors, so as
to find as many errors as possible in a single run, as a convenience to
the person debugging the script being compiled. And we failed to
initialize stuff upon getting an error; stuff that was later looked at
by tr///.
That commit fixed the ticket by making sure the things mentioned there
got initialized upon error, but didn't handle the various other places
in the loop where the same thing could happen.
At the time, I thought it would be easier to instead change the tr///
handling code to know that its inputs were problematic, and to avoid
looking at them in that case. This is easily done, and would
automatically catch all the cases in the loop, now and any added in the
future.
But then I thought, maybe tr/// isn't the only operator that could be
thrown off by this. It is the most obvious one, to someone who knows
how it goes about getting compiled; but there may be other operators
whose compilation I'm not familiar with that have the same or a
similar problem. The better solution then would be to extend
3dd4eaeb8ac39e08179145b86aedda36584a3509 to make sure everything gets
initialized when there is an error. That is what this current commit
does.
The previous few commits have refactored things so as to minimize the
number of places that need to be handled here, down to three. I kinda
doubt that new constructs will be added, at this stage in the language
development, that would require the same initialization handling. But,
if they were, hopefully those doing it would follow the existing
paradigm that this commit and 3dd4eaeb8ac39e08179145b86aedda36584a3509
establish.
Another way to handle this would have been, instead of doing an
initialize-and-'continue', to jump to a common label at the bottom of
the loop which does the initialization. I don't think it matters much
either way, so I left it as is.
In these two cases, we know we are at the end of the input, and that we
have an error. There is no need to try to patch things up so we can
continue to parse looking for other errors; there's nothing left to
parse. So skip having to deal with patching up.
By refactoring slightly, we make this code in a switch statement
have the same entrance and exit invariants as the other cases, so they
all can be handled uniformly at the end of the switch.
Regular expression patterns are parsed by the lexer/toker, and then
compiled by the regex compiler. It is foolish to try to compile one
that the parser has rejected as syntactically bad; assumptions may be
violated and segfaults ensue. This commit abandons all parsing
immediately if a pattern had errors in it. A better solution would be
to flag this pattern as not to be compiled, and continue parsing other
things so as to find the most errors in a single attempt, but I don't
think it's worth the extra effort.
Making this change caused some misleading error messages in the test
suite to be replaced by better ones.
This is to be called to abort the parsing early, before the required
number of errors has accumulated. It is used when continuing the parse
would either be fruitless or risk looking at garbage.
Indent after the previous commit enclosed this code in a new block.
This changes yyerror_pvn so that its first parameter can be NULL. This
indicates no message is to be output, but that parsing is to be
abandoned immediately, without waiting for more errors to build up.
This creates a function in toke.c to output the compilation aborted
message, changing perl.c to call that function. This is in preparation
for it to be called from a second place.
The previous commit tightened up the checking for well-formed UTF8ness,
so that the ones removed here were redundant.
The test during a string eval may also no longer be necessary, but since
there are many ways to create that string, I'm not confident enough to
remove it.
Previous commits have tightened up the checking of UTF-8 for
well-formedness in the input program or string eval. This is done in
lex_next_chunk and lex_start. But it doesn't handle the case of
use utf8; foo
because 'foo' is checked while UTF-8 is still off. This solves that
problem by noticing when utf8 is turned on, and then rechecking at the
next opportunity.
See thread beginning at
http://nntp.perl.org/group/perl.perl5.porters/242916
This fixes [perl #130675]. A test will be added in a future commit.
This catches some errors earlier than they used to be caught, and
aborts, so some tests in the suite had to be split into multiple parts.
The input is far more likely to be well-formed than not.
The comments about what this function does were incorrect.
This moves an automatic variable closer to the only place it is used;
it also adds branch prediction. It is likely that the input will be
well-formed.
I am adding the braces because in one of the areas, the lack of braces
had led to a blead failure.
Before starting this memEQ, we know that the first bytes are the same,
so we might as well start the comparison at the second bytes.
This automatic variable doesn't need such a large scope.
Commit d2067945159644d284f8064efbd41024f9e8448a reverted commit
b5248d1e210c2a723adae8e9b7f5d17076647431. b5248 removed a parameter
from S_scan_ident, and changed its interior to use PL_bufend instead of
that parameter. The parameter had been used to limit how far into the
string being parsed scan_ident could look. In all calls to scan_ident
but one, the parameter was already PL_bufend. In the one call where it
wasn't, b5248 compensated by temporarily changing PL_bufend around the
call, running afoul, eventually, of the expectation that PL_bufend
points to a NUL.
I would have expected the reversion to add back both the parameter and
the uses of it, but apparently the function interior has changed enough
since the original commit, that it didn't even think there were
conflicts. As a result the parameter got added back, but not the uses
of it.
I tried both approaches to fix this:
1) to change the function to use the parameter;
2) to simply delete the parameter.
Only the latter passed the test suite without error.
I then tried to understand why the parameter existed in the first place,
and why b5248 introduced the kludge to work around removing it. It appears
to me that this is for the benefit of the intuit_more function to enable
it to discern $] from a $ ending a bracketed character class, by ending
the scan before the ']' when in a pattern.
The trouble is that modern scan_ident versions do not view themselves as
constrained by PL_bufend. If that is reached at a point where white
space is allowed, it will try appending the next input line and
continuing, thus changing PL_bufend. Thus the kludge in b5248 wouldn't
necessarily do the expected limiting anyway. The reason the approach
"1)" I tried didn't work was that the function continued to use the
original value, even after it had read in new things, instead of
accounting for those.
Hence approach "2)" is used. I'm a little nervous about this, as it may
lead to intuit_more() (which uses heuristics) having more cases where it
makes the wrong choice about $] vs [...$]. But I don't see a way around
this, and the pre-existing code could fail anyway.
Spotted by Dave Mitchell.
The root cause of this was code like this

    if (a)
        b

which got changed into

    if (a)
        c
        b

thus causing 'b', despite the indentation, to be executed
unconditionally. The solution is just to add braces:

    if (a) {
        c
        b
    }

This is why I always use braces even if not required at the moment. It
was the coding standard at $work.
It turns out that #130567 doesn't even come up with this fix in place.
This bug happened under things like

    tr/\x{101}-\x{200}/
       \x{201}-\x{301}/

The newline in the middle was crucial. Because of it, the second line
got parsed already knowing that the result was UTF-8, so setting a
variable got skipped; that setting happens only when we discover we
need to flip into UTF-8.
The solution adopted here is to set the variable under other conditions,
which leads to it getting set multiple times. But this extra branch and
setting is confined to somewhat rare circumstances, leaving the mainline
code untouched.
RT #130661
In the presence of 'use feature "signatures"', a char >= 0x80 where a sigil
was expected triggered an assertion failure, because the (signed)
character was being promoted to int and ended up being returned from
yylex() as a negative value.
buffer" argument, use PL_bufend"
This reverts commit b5248d1e210c2a723adae8e9b7f5d17076647431.
This commit, dating from 2013, was made unnecessary by later removal of
the MAD code. It temporarily changed the PL_bufend variable; doing that
ran afoul of an assertion, added in
fac0f7a38edc4e50a7250b738699165079b852d8, that expects PL_bufend to
point to a terminating NUL.
Beyond the reversion, a test is added here.
It turns out that eval text isn't necessarily parsed by
lex_next_chunk(), but is by lex_start(). So, add a test there to
look for malformed UTF-8.
Except under cpan/ and dist/
and broke PL_bufptr when it did.
Changed one deprecation message to not use a leading v in the Perl
version number, as the other deprecation messages don't have them
either.
Commit af9be36c89322d2469f27b3c98c20c32044697fe changed toke.c to count
the number of UTF-8 variant characters seen in a string so far. If the
count is 0 when the string has to be upgraded to UTF-8, then only a flag
has to be flipped, saving reparse time. Incrementing this count wasn't
getting done during the expansion of ranges like A-Z under tr///. This
currently doesn't matter for ASCII platforms, as the count is currently
treated as a boolean, and it was getting set if a range endpoint is
variant. On EBCDIC platforms a range may contain variants even if both
endpoints are not. For example \x00-\xFF. (\xFF is a control that is
an invariant). This led to a lot of noise on an EBCDIC smoke, but no
actual tests failing.
I want to keep it as a count so that in the future, things could be
changed so that count can be used to know how big to grow a string when
it is converted to UTF-8, without having to re-parse it as we do now.
It turns out that we need to have this count anyway in the tr/// code as
that grows the string to account for the expansion, and needs to know
how many variants there are in order to do so if the string already is
in UTF-8. So refactoring that code slightly allows the count to serve
double duty: how much to grow if the string is already UTF-8, and how
much to grow if it isn't. And it fixes the noise problem on EBCDIC.
A two-element range here is already fully set up, so there is no need
to do anything.
A single element range can skip a bunch of work.
By ordering these sequential tests properly, a branch in the mainline
can be saved.
A parse error due to invalid octal or hex escape in the range of a
transliteration must still ensure some kind of start and end values
are captured, since we don't stop on the first such error. Failure
to do so can cause invalid reads after "Here we have parsed a range".
Fixes [perl #70878].
This will no longer be allowed in 5.30.
It will be fatal by Perl 5.28.
Heredocs without a terminator after the << have been deprecated
since 5.000. After more than 2 decades, it's time to retire this
construct. They will be fatal in 5.28.
Use of \N{} in a double quoted context, with nothing between the
braces, was deprecated in 5.24, and it will be a fatal error in 5.28.
The :unique and :locked attributes have had no effect since 5.8.8
and 5.005 respectively. They were deprecated in 5.12. They are now
scheduled to be deleted in 5.28.
There are two places the deprecation warning can be issued:
in lib/attributes.pm, and in toke.c. The warnings were phrased
differently, but since we're changing the warning anyway (as we
added the version of Perl in which the attributes will disappear),
we've used the same phrasing for this warning, regardless of where
it is generated:

    Attribute "locked" is deprecated, and will disappear in Perl 5.28
    Attribute "unique" is deprecated, and will disappear in Perl 5.28
This was first proposed in the thread starting at
http://www.nntp.perl.org/group/perl.perl5.porters/2014/09/msg219394.html
This function is too long to be effectively inlined, so don't request
the compiler to do so.
If that byte was part of a UTF-8 character, this caused inappropriate
"malformed utf8" warnings or assertions.
In principle this should also skip the newline, but failing to do so
is safe.