| Commit message | Author | Age | Files | Lines |
|
|
|
|
| |
The majority of perlapi uses C<> to specify these things, but a few
things used I<> instead. Standardize to C<>.
|
|
|
|
|
|
|
| |
This makes && and || begin lines instead of end them for easier
following of the logic, as suggested in Perl Best Practices. A couple
of { } pairs were added to make the code within an 'if' stand out from
the 'if' test itself.
|
| |
|
|
|
|
| |
Instead of #include-ing the C file, compile it normally.
|
| |
|
|
|
|
|
|
|
|
| |
On VC2003 32b -O1, the .text section of miniperl.exe decreased from
0xAEFCD bytes of machine code to 0xAEF9D after this patch.
See also
http://www.nntp.perl.org/group/perl.perl5.porters/2015/07/msg229308.html
|
|
|
|
|
| |
This moves the definition to before the function it is used in, rather
than disrupting the flow of code within the function.
|
|
|
|
|
|
|
|
|
|
|
| |
See http://nntp.perl.org/group/perl.perl5.porters/229168
Also, the documentation has been updated beyond this change to clarify
related matters, based on some experimentation.
Previously, spaces couldn't be in variable names; now ASCII control
characters can't be either. The remaining permissible ASCII characters
in a variable name now must be all graphic ones.
|
|
|
|
|
| |
Add some clarifying comments, and properly indent some lines to
prevailing level.
|
|
|
|
|
| |
The feature still exists, for compatibility with code that tries to enable
it, but it has no effect. The postderef_qq feature still exists, however.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As proposed by RJBS.
The "5.24" feature bundle (and therefore C<< use v5.24 >>) now enables
postderef and postderef_qq.
I can't find any precedent for what to do with the relevant experimental::*
warnings category when an experimental feature graduates to acceptance. I
have elected to leave the category in place, so that code doing C<< no
warnings "experimental::postderef" >> will continue to work. This means that
C<< use warnings "experimental::postderef" >> is also accepted, but has no
effect.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This is a follow-up to FC's 488bc579589, which stopped assertion
failures on parsing 0${. A similar code path still failed with
0$#{.
The fix is similar to the previous fix, although I suspect a more general
fix is needed - perhaps moving the fixes into S_no_op() - but not this
close to the 5.22 release.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
RT# 124216
When parsing the interpolated string "$X", where X is a Unicode char that
is not a legal variable name, failure to restore things properly during
error recovery led to corrupted state and assertion failures.
In more detail:
When parsing a double-quoted string, S_sublex_push() saves most of the
current parser state. On parse error, the save stack is popped back,
which restores all that state. However, PL_lex_defer wasn't being saved,
so if we were in the middle of handling a forced token, PL_lex_state gets
restored from PL_lex_defer, and suddenly the lexer thinks we're back
inside an interpolated string again. So S_sublex_done() gets called
multiple times, too many scopes are popped, and things like PL_compcv are
freed prematurely.
Note that in order to reproduce:
* we must be within a double quoted context;
* we must be parsing a var (which causes a forced token);
* the variable name must be illegal, which implies unicode, as
chr(0..255) are all legal names;
* the terminating string quote must be the last char of the input
file, as this code:
    case LEX_INTERPSTART:
        if (PL_bufptr == PL_bufend)
            return REPORT(sublex_done());
won't trigger an extra call to sublex_done() otherwise.
I'm sure this bug affects other cases too, but this was the only way I
found to reproduce.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
While isWORDCHAR_lazy_if is UTF-8 aware, the check advanced byte-by-byte.
This led to errors of the form:
Passing malformed UTF-8 to "XPosixWord" is deprecated
Malformed UTF-8 character (unexpected continuation byte 0x9d, with
no preceding start byte)
Warning: Use of "�" without parentheses is ambiguous
Use UTF8SKIP to advance character-by-character, not byte-by-byte.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When not explicitly quoted, tokenization of the HERE-document terminator
dealt improperly with multi-byte characters, advancing one byte at a
time instead of one character at a time. This led to
incomprehensible-to-the-user errors of the form:
Passing malformed UTF-8 to "XPosixWord" is deprecated
Malformed UTF-8 character (unexpected continuation byte 0xa7, with
no preceding start byte)
Can't find string terminator "EnFra�" anywhere before EOF
If enclosed in single or double quotes, parsing was performed correctly,
as delimcpy advances byte-by-byte but looks only for the single-byte
ending character.
When doing a \w+ match looking for the end of the word, advance
character-by-character instead of byte-by-byte, ensuring that the size
does not extend past the available size in PL_tokenbuf.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
During parsing, toke.c checks whether the user is attempting to provide
multiple indexes to a single array subscript:
$a[ $foo, $bar ];
However, while the check for word characters in variable names is aware
of multi-byte characters when "use utf8" is enabled, the loop was only
advanced one byte at a time, not one character at a time. As such,
multi-byte variables in array indexes incorrectly yield warnings:
Passing malformed UTF-8 to "XPosixWord" is deprecated
Malformed UTF-8 character (unexpected continuation byte 0x9d, with
no preceding start byte)
Switch the loop to advance character-by-character if UTF-8 semantics are
in use.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
An empty cpan/.dir-locals.el stops Emacs using the core defaults for
code imported from CPAN.
Committer's work:
To keep t/porting/cmp_version.t and t/porting/utils.t happy, $VERSION needed
to be incremented in many files, including throughout dist/PathTools.
perldelta entry for module updates.
Add two Emacs control files to MANIFEST; re-sort MANIFEST.
For: RT #124119.
|
|
|
|
|
|
|
|
|
|
| |
This changes the way some of the current internal-only macros are named
and used in order to simplify things and minimize what gets exposed as
part of the API.
Although these have not been listed as publicly available, it costs
essentially nothing to keep the old names around in case someone was
illegally using them.
|
|
|
|
|
|
|
| |
If s;; gobbles up the implicit semicolon that is tacked on to the end
of the file, it can confuse the here-doc parser into thinking it is
inside a string eval, because there is no file handle. We need to
check for that possibility where the assertion was failing.
|
|
|
|
|
|
|
|
|
|
|
| |
Prior to this commit, the regex compiler was relying on the lexer to do
the translation from Unicode to native for \N{...} constructs, where it
was simpler to do. However, when the pattern is a single-quoted string,
it is passed unchanged to the regex compiler, where this did not work. Fixing
it required some refactoring, though it led to a clean API in a static
function.
This was spotted by Father Chrysostomos.
|
|
|
|
|
|
|
|
|
|
|
|
| |
If an @ sign in a double-quoted string is not followed by a valid
identifier, then it is treated literally. Or at least that is how it
was intended to work.
The lexer was actually not self-consistent. It was treating non-ASCII
digits as valid identifiers in determining where the interpolation
started, but was not treating them as valid identifiers when actually
parsing the interpolated code. So this would result in syntax errors,
and even crashes in some cases.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some questions and loose ends:
XXX gv.c:S_gv_magicalize - why are we using SSize_t for paren?
XXX mg.c:Perl_magic_set - need appropriate error handling for $)
XXX regcomp.c:S_reg - need to check if we do the right thing if parno
was not grokked
Perl_get_debug_opts should probably return something unsigned; not sure
if that's something we can change.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is related to bug #123617 and is a follow-up to eabab8bcc.
This code:
"@0{0s 000";eval"$"
begins tokenisation as follows:
stringify ( join ( $ " , @ 0 { 0 subst
When seeing the subst after the 0, the parser discards many tokens and
we end up at the ; outside the quotes.
Since PL_lex_stuff (the temporary spot for storing the contents of a
quote-like operator) is localised as of eabab8bcc, we end up with just
PL_sublex_info.repl (the temporary spot for storing the replacement
part) set. Since it is still set when we get to the next double-
quote, it is treated as a two-part quote-like operator, like y or s.
That can’t happen, and we have assertions to make sure of it.
We need to localise PL_sublex_info.repl as well, so it gets freed
properly when scopes are popped after an error.
|
|
|
|
|
|
|
|
|
| |
This is a follow-up to f4460c6f7a. The check to see whether we are
in a quote-like operator needs to come before the call to sublex_done,
as sublex_done is just as problematic as doing SvIVX on a PV. (See
479ae48e22f for details on why.) Checking the type of PL_linestr is
not a reliable way to see whether we are in a quote-like op, so use
PL_in_what instead.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the parser (perly.c) pops tokens when encountering a syntax error,
it can pop inner lexing scopes (which handle the contents of quote-
like operators).
If there is a pending token on the pending token stack, then the cur-
rent lexing state is LEX_KNOWNEXT. It usually gets set to a pending
value stored in PL_lex_defer when the last pending token is emitted.
If scopes are exited when there is a pending token, then the state is
reset, since it is localised, even though we still have a token pending.
We have code to account for that and still emit the pending
token. (See 7aa8cb0dec1.) But the pending lexing state is still
used after the pending token is emitted. So we can end up with
LEX_INTERPEND when there is no inner lexing scope.
LEX_INTERPEND will cause sublex_done (in toke.c) to be called, which
does a LEAVE. If the current scope does not belong to it, we end up
exiting a scope set up by the parser, which frees the parser stack
(via SAVEDESTRUCTOR_X and clear_yystack in perly.c). The parser is
still using the stack, which holds reference counts on the CV in
PL_compcv, so reference counts get screwed up.
We need to check whether we have a proper lexing scope set up if the
lexing state is LEX_INTERPEND.
This is a follow-up to f4460c6f7a, which was a similar bug, but
occurred with LEX_INTERPCONCAT, rather than LEX_INTERPEND.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The lexer normally turns s##...#e into
PMFUNC '(' WORD '/' DO '{' ... ';' '}' ')'
If you have [} inside the replacement, that becomes '[' ';' '}'. When
the parser gets to the second semicolon, it pops scopes to try to
recover from the syntax error, and in so doing it exits the inner lex-
ing scope that was set up for the substitution.
When that happens, the second '}' is already on the pending token
stack. Since we set the lexing state to LEX_KNOWNEXT when there is a
pending token (though we may not have to; see 7aa8cb0dec1), we have to
record a pending state as well, so we know what to set the state back
to. That pending state is not localised, and, in this case, was set
before the scopes were popped.
So we end up in the outermost lexing scope, with the lexing state set
to LEX_INTERPEND.
Inside an inner lexing scope, PL_linestr is of type PVIV, with the IVX
field used to hold extra information about the type of quote. In the
main lexing scope, PL_linestr is an SVt_PV with no IVX field.
If the lexing state is LEX_INTERPanything, it is assumed that
PL_linestr has an IVX field, which is not the case here, so we fail an
assertion or crash.
The safest pre-5.22 solution is to check the type of PL_linestr before
reading IVX.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit v5.21.8-320-ge47d32d stopped code interpolated into quote-like
operators from reading more lines of input, by making lex_next_chunk
ignore the open filehandle and return false. That causes this block
under case 0 in yylex to loop:
    if (!lex_next_chunk(fake_eof)) {
        CopLINE_dec(PL_curcop);
        s = PL_bufptr;
        TOKEN(';'); /* not infinite loop because rsfp is NULL now */
    }
(rsfp is not null there.) This commit makes it check for quote-like
operators above, in the same place where it checks whether the file is
open, to avoid falling through to this code that can loop.
This changes the syntax errors for a couple of cases recently added
to t/op/lex.t, though I think the error output is now more consis-
tent overall.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PL_sublex_info.sub_inwhat (in the parser struct) is a temporary spot
to store the value of PL_lex_inwhat (also in the parser struct)
when a sub-lexing scope (for a quote-like operator) is entered.
PL_lex_inwhat is localised, and the value is copied from its temporary
spot (sub_inwhat) into PL_lex_inwhat.
The PL_sublex_info.sub_inwhat was not localised, but instead the value
was set to 0 when a sub-lexing scope was exited. This value was being
used, in a couple of places, to determine whether we were inside a
quote-like operator. But because the value is not localised, it can
be wrong when it is set to 0, if we have nested lexing scopes.
So this ends up crashing for the same reason described in e47d32dcd5:
echo -n '/$a[m||/<<a' | ./miniperl
perl-5.005_02-1816-g09bef84 added the first use of
PL_sublex_info.sub_inwhat to determine whether we are in a quote-like
operator. (Later it got shifted around.) I copied that in e47d32dcd5
(earlier today), because I assumed the logic was correct. Other parts
of the code use PL_lex_inwhat, which is already localised, as I said,
and does not suffer this problem.
If we do not check PL_sublex_info.sub_inwhat to see if we are in
a quote-like construct, then we don’t need to clear it on lexing
scope exit.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The parser used to read more lines of input when parsing code
interpolated into quote-like operators, under some circumstances. This would
result in code like this working, even though it should be a syn-
tax error:
s||${s/.*/|;
/s}Just another Perl hacker,
print
"${;s/.*/Just an";
other Perl hacker,
/s} die or return;
print
While this was harmless, other cases, like /$a[/<<a with no trailing
newline, would cause unexpected internal state that did not meet the
reasonable assumptions made by S_scan_heredoc, resulting in a crash.
The simplest fix is to modify the function that reads more input,
namely, lex_next_chunk, and prevent it from reading more lines of
input from inside a quote-like operator. (The alternative would be to
modify all the calls to lex_next_chunk, and make them conditional.)
That breaks here-doc parsing for things like s//<<EOF/, but the
LEX_NO_TERM flag to lex_next_chunk is used only by the here-doc
parser, so lex_next_chunk can make an exception if it is set.
|
|
|
|
| |
The type is unused, so there is no need to set it to 0.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In this naughty code snippet:
s)$0{0h());qx(@0);qx(@0);qx(@0)
the s)...)) is treated as a substitution, with $0{0h( for the left part.
When the lexer reaches the h( it tries to emit two tokens at once, '&'
and a WORD token representing the h. To do that it pushes the WORD on
to the pending token stack and then emits '&'. The next call to yylex
will usually pop the token off the pending stack and use that, because
the lexing state (PL_lex_state) is LEX_KNOWNEXT.
However, when the parser sees '&', it immediately reports it as
a syntax error, and tries to pop tokens to make sense of what it
has, popping scopes in the process. Inside a quote-like operator,
PL_lex_state is localised, so the value after this scope-popping is
no longer LEX_KNOWNEXT, so the next call to yylex continues parsing
‘;qx...’ and ignores the pending token.
When it reaches the @0 inside the qx, it tries to push five pending
tokens on to the stack at once, because that’s how the implicit join
works. But the stack only has room for five items. Since it already
has one, the last item overflows, corrupting the parser state.
Crashes ensue.
If we check for the number of pending tokens and always emit any
regardless of the lexing state, then we avoid the crash. This is
arguably how it should have been written to begin with.
This makes LEX_KNOWNEXT, and probably PL_lex_defer, redundant, but I
will wait till after perl 5.22 before removing those, as the removal
may break CPAN modules, and this is a little late in the dev cycle.
|
|
|
|
| |
With this assertion, the test case from #123743 fails sooner.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes the problem mentioned in 3c47da3c2e with an op address
being used as flags. '&' not followed by an identifier was being fed
to the parser with a stale token value, left over from the previous
token that had a value, which might be an op address. This would
cause the flags on the op to vary randomly.
Usually the rv2cv op created this way is nulled, but if there is a
syntax error it may be freed before that happens. And it is when the
op is freed that the private flags are checked to make sure no invalid
flags have been set.
The test added to t/op/lex.t used to fail an assertion (for me) more
than half the time, but not always, because the 0x10 bit was being set
in op_private (rv2cv does not use that bit).
|
|
|
|
|
|
| |
As of v5.21.3-105-gc5e7362, force_ident no longer reads the value of
PL_expect, so the assignment can come after it. And TERM('&') (just
after this if-statement) already assigns XOPERATOR to PL_expect.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Normally the lexer skips over stray nulls, treating them as white-
space. After a sigil, though, it was getting confused. While $\0foo
would work and was equivalent to $foo (but did not work for lexicals),
$\0eq was a syntax error. Some cases of &\0foo would cause assertion
failures or outright buggy behaviour, such as strictures randomly
turning on and off.
There were two problems occurring:
1) Nulls were not being treated as whitespace right after a sigil,
unlike elsewhere.
2) '&' not followed immediately by an identifier was not getting
pl_yylval set, so the previous value, which might be an op address,
was being passed as a flags parameter to an op constructor. (The
other sigil tokens never use their values.)
This commit addresses the first of those. I still need to investigate
whether the second can still cause problems.
|
|
|
|
|
|
|
|
|
|
| |
S_no_op, which displays ‘Foo found where operator expected’, assumes
that PL_bufptr points to the beginning of the token, but that was not
the case for ${ at the end of a line. The attempt to read more into
the buffer would make PL_bufptr point to the end of the line. This
meant it would use a negative string length when generating the
‘(Missing operator before foo?)’ text, only accidentally escaping a
crash. On debugging builds, it failed an assertion.
|
|
|
|
|
| |
In these three code paths, PL_lex_stuff is never null, so there is no
need to check that.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes crashes and assertion failures related to ticket #123617.
When the lexer encounters a quote-like operator, it scans for the
final delimiter, putting the string in PL_lex_stuff and the replace-
ment, if any, in PL_sublex_info.repl. Those are just temporary spots
for those values. As soon as the next token is emitted (FUNC or
PMFUNC), the values are copied to PL_linestr and PL_lex_repl, respec-
tively, after these latter have been localised.
When scan_str (which scans a quote-like op) sees that PL_lex_stuff is
already set, it assumes that it is now parsing a replacement, so it
puts the result in PL_sublex_info.repl.
The FUNC or PMFUNC token for a quote-like operator may trigger a syn-
tax error while PL_lex_stuff and PL_sublex_info.repl are still set. A
syntax error can cause scopes to be popped, discarding the inner lex-
ing scope (for the quote op) that we were about to enter, but leaving
a PL_lex_stuff value behind.
If another quote-like op is parsed after that, scan_str will assume it
is parsing a replacement since PL_lex_stuff is set. So you can end up
with a replacement for an op of type OP_MATCH, which is not supposed
to happen. S_sublex_done fails an assertion in that case. Some exam-
ples of this bug crash later on non-debugging builds.
Localising PL_lex_stuff fixes the problem.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The PL_lex_stuff variable in the parser struct is reference-counted.
Yet, in toke.c:S_sublex_start we pass the value to S_tokeq, which may
pass it to S_new_constant, which takes ownership of the reference
count (possibly freeing or mortalising the SV), and then relinquishes
its ownership of the returned SV (incrementing the reference count if
it is the same SV passed to it). If S_new_constant croaks, then it
will have mortalised the SV passed to it while PL_lex_stuff still
points to it.
This example makes S_new_constant croak indirectly, by causing its
yyerror call to croak because of the number of errors:
$ perl5.20.1 -e 'BEGIN { $^H|=0x8000} undef(1,2); undef(1,2); undef(1,2); undef(1,2); undef(1,2); undef(1,2); undef(1,2); undef(1,2); undef(1,2); "a"'
Too many arguments for undef operator at -e line 1, near "2)"
Too many arguments for undef operator at -e line 1, near "2)"
Too many arguments for undef operator at -e line 1, near "2)"
Too many arguments for undef operator at -e line 1, near "2)"
Too many arguments for undef operator at -e line 1, near "2)"
Too many arguments for undef operator at -e line 1, near "2)"
Too many arguments for undef operator at -e line 1, near "2)"
Too many arguments for undef operator at -e line 1, near "2)"
Too many arguments for undef operator at -e line 1, near "2)"
Constant(q) unknown at -e line 1, near ";"a""
-e has too many errors.
Attempt to free unreferenced scalar: SV 0x7fb49882fae8 at -e line 1.
|
| |
|
|
|
|
|
|
|
|
| |
If we are parsing a \N{U+XXX.YYY} construct in a regexp literal, we do
not need to pass it to grok_hex, because we do not need the numeric
value at this point. The regexp engine will be calling grok_hex
again, after all. A simple scan for hex digits should be faster, and
makes the code a little simpler, too.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The code for handling this only worked in double-quotish contexts. To
make it work in single-quotish areas as well, it needs to be moved out
of toke.c, and corresponding code added to regcomp.c. This commit does
just the portion that removes the code from toke.c. The other portion
hasn't been fully debugged yet. This means that blead will now fail on
EBCDIC platforms in double-quotish contexts. But EBCDIC platforms
aren't fully supported in blead currently anyway.
The reason this partial commit is being pushed to blead now is that its
absence is blocking other work in toke.c.
Spotted by Father Chrysostomos.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
perl-5.8.0-117-g6f33ba7, which added the XTERMORDORDOR hack, did not
change the leftbracket code to treat XTERMORDORDOR the same way as
XTERM, so -l {0} and getc {0} (among other ops) were treating {...} as
a block, rather than an anonymous hash. This was not, however, being
turned into a real block with enter/leave ops to protect the stack,
so the nextstate op was corrupting the stack and possibly freeing mor-
tals in use.
This commit makes the leftbracket code check for XTERMORDORDOR and
treat it like XTERM, so that -l {0} once more creates an anonymous
hash. There is really no way to get to that hash, though, so all I
can test for is the crash.
|
|
|
|
| |
Yay, the semicolons are back.
|
|
|
|
|
|
|
|
|
| |
Generally the guideline is to outdent C labels (e.g. 'foo:') 2 columns
from the surrounding code.
If the label starts at column zero, then diffs, such as those
generated by git, display the label rather than the function name at
the head of a diff block, which makes diffs harder to peruse.
|
|
|
|
| |
That also avoids crashing on overrun.
|
| |
|
| |
|