| Commit message (Collapse) | Author | Age | Files | Lines |
| |
This turned out to be tricky. Normally @ at the beginning of the
interpolated code signals to the lexer to emit ‘join($",’ immediately.
With "$_->@*" we would have to retract the $ _ -> tokens upon
encountering @*, which we obviously cannot do.
Waiting until we reach the end of the interpolated text before
emitting anything could not work either, as it may contain BEGIN blocks
that affect the way part of the interpolated code is parsed.
So what we do is introduce an egregious or clever hack, depending on
how you look at it.
Normally, the lexer turns "@foo" into:
    stringify ( join ( $ " , @ foo ) )
(The " is a WORD token, representing a variable name.)
"$_" becomes:
    stringify ( $ _ )
We can turn "$_->@*" into:
    stringify ( $ _ -> @ * POSTJOIN )
where POSTJOIN is a new lexer token with special handling that creates
a join op just the way join($", ...) does.
To make "foo$_->@*bar" work as well, we have to make POSTJOIN have
precedence just below ->, so that
    stringify ( "foo" . $ _ -> @ * POSTJOIN . "bar" )
(what the parser sees) is equivalent to:
    stringify ( "foo" . ( $ _ -> @ * POSTJOIN ) . "bar" )
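The streaming constraint described above can be sketched in C. This is only a toy model, not the code in toke.c: toy_lex_interp and emit are hypothetical names, and only the three input shapes "@name", "$name" and "$name->@*" are handled. It shows that a leading @ lets the join prefix go out first, while a trailing ->@* can only be handled by appending a token.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Append one token and a separating space; tokens are never taken back. */
static void emit(char *out, size_t cap, const char *tok)
{
    size_t used = strlen(out);
    assert(used + strlen(tok) + 2 <= cap);
    snprintf(out + used, cap - used, "%s ", tok);
}

/* Toy model of the token stream for "@name", "$name" and "$name->@*". */
static void toy_lex_interp(const char *src, char *out, size_t cap)
{
    char name[32];
    size_t n = 0;
    int is_array = (src[0] == '@');

    assert(cap > 0 && (src[0] == '@' || src[0] == '$'));
    out[0] = '\0';
    emit(out, cap, "stringify");
    emit(out, cap, "(");
    if (is_array) {
        /* A leading @ is known up front, so the join prefix goes out first. */
        emit(out, cap, "join");
        emit(out, cap, "(");
        emit(out, cap, "$");
        emit(out, cap, "\"");
        emit(out, cap, ",");
    }
    emit(out, cap, is_array ? "@" : "$");
    for (src++; (isalnum((unsigned char)*src) || *src == '_')
                && n < sizeof name - 1; src++)
        name[n++] = *src;
    name[n] = '\0';
    emit(out, cap, name);
    if (is_array) {
        emit(out, cap, ")");
    } else if (strncmp(src, "->@*", 4) == 0) {
        /* $ and the name are already emitted; append instead of retracting. */
        emit(out, cap, "->");
        emit(out, cap, "@");
        emit(out, cap, "*");
        emit(out, cap, "POSTJOIN");
    }
    emit(out, cap, ")");
    out[strlen(out) - 1] = '\0';   /* drop the trailing separator */
}
```

Calling toy_lex_interp on "$_->@*" yields the same token order as the third example in the message, with POSTJOIN placed after the tokens that had already been emitted.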
| |
Commit 2179133 inadvertently stopped the PERL5DB env var from being
truncated just before the first line break. (I’m considering that
a bug fix.)
The result is that #!perl -d:foo will throw the line numbers off by
one, as will line breaks in PERL5DB:
    $ PERL5DB='sub DB::DB{}'$'\n\n\n''' ./perl -dle 'warn "ok"'
    ok at -e line 4.
#!perl -d:foo has thrown off line numbers since f0e67a1d291 in 5.12.
This commit fixes both, by storing the line number of #! -d or the
number 0 for -d on the command line in the new PL_parser->preambling
member, which now overrides any number in PL_curcop.
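A minimal sketch of the override, assuming a sentinel value marks "not in the debugger preamble". The struct, the TOY_NO_PREAMBLE sentinel and reported_line are hypothetical stand-ins, not the actual parser members.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define TOY_NO_PREAMBLE UINT_MAX   /* hypothetical "not preambling" sentinel */

struct toy_parser {
    unsigned cop_line;     /* stand-in for the line number kept in PL_curcop */
    unsigned preambling;   /* line of "#! -d", or 0 for -d on the command line */
};

/* While the preamble is being parsed, its recorded line wins. */
static unsigned reported_line(const struct toy_parser *p)
{
    assert(p != NULL);
    return p->preambling != TOY_NO_PREAMBLE ? p->preambling : p->cop_line;
}
```

With this shape, line breaks inside the PERL5DB text can advance cop_line freely without shifting the line numbers reported for the program itself.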
| |
Previously, the line number was localised in lexing scopes. herelines
had to be in the lex_shared struct so that inner lexing scopes
could peek into values belonging to outer lexing scopes and set the
herelines value belonging to the same scope that a here-doc body was
extracted from. (herelines records how much extra to increase the
line number at the next line ending, to jump over a here-doc.)
In commit ffdb8b167e, I changed things so that lexing scopes no longer
localised the line number, except for here-docs, and the line number
was incremented within the inner lexing scope, instead of during the
initial scan for the terminator. That meant the herelines value had
to be copied into the inner lexing scope.
For nested here-docs, the inner here-doc’s body is always inside the
outer here-doc, so no peeking into outer scopes is necessary.
Hence, there is no longer any reason for herelines to be inside the
lex_shared struct. We can put it directly inside the parser struct.
Here-docs will localise it. Other quote-like constructs will not (and
can avoid the copy.)
| |
‘Ambiguous use of * resolved as operator *’: This message can occur in
cases where there is no multiplication operator, so what it is saying
is completely wrong.
When the lexer parses a bareword, it looks at the previous character
and warns if it happens to match /[*%&]/, so foo**bar and foo&&bar
result in this warning, as does print $%foo.
The purpose of the code is to catch *bar *bar or &black &sheep.
To avoid false positives, when emitting one of the three operators
* % & the lexer can record that fact, so when it sees a bareword
preceded by one of those three characters, instead of guessing that the
infix operator was used, it will *know*.
The test cases added also trigger ‘Bareword found where operator
expected’. I don’t know whether that should change, but at least the
current behaviour is tested, so we will know when it does change.
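The record-then-check idea might look roughly like this. The field and function names are illustrative only, and the sketch assumes the flag is reset whenever any other token is emitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_lexer {
    bool last_was_infix_sigil;  /* true only right after an infix * % & */
};

/* Called for every emitted token; infix_sigil is true for infix * % &. */
static void toy_note_token(struct toy_lexer *lx, bool infix_sigil)
{
    assert(lx != NULL);
    lx->last_was_infix_sigil = infix_sigil;
}

/* Called when a bareword is lexed; prev is the character just before it. */
static bool toy_should_warn(const struct toy_lexer *lx, char prev)
{
    bool looks_like_sigil = (prev == '*' || prev == '%' || prev == '&');
    return looks_like_sigil && lx->last_was_infix_sigil;
}
```

Under these assumptions, foo**bar and foo&&bar no longer warn, because the token emitted before the bareword was not one of the three single-character infix operators.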
| |
When re-parsing a pattern for run-time (?{}) code blocks,
we end up with the EVAL_RE_REPARSING flag set in PL_in_eval.
Currently we clear this flag as soon as scan_str() returns, to ensure that
it's not set if we happen to parse further patterns (e.g. within the
(?{ ... }) code itself).
However, a soon-to-be-applied bugfix requires us to know the reparsing
state beyond this point. To solve this, we add a new boolean flag to the
parser struct, which is set from PL_in_eval in S_sublex_push() (with the
old value being saved). This allows us to have the flag around for the
entire pattern string parsing phase, without it affecting nested pattern
compilation.
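A rough sketch of the save-and-copy step, assuming the global bit is cleared at the moment the per-parser copy is taken. The names here are hypothetical, and the small array stands in for the savestack.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_DEPTH 8

struct toy_parser {
    bool re_reparsing;            /* hypothetical per-parser flag */
    bool saved[TOY_MAX_DEPTH];    /* stand-in for the savestack */
    int depth;
};

/* Entering a sublex scope: save the old flag, then take over the global. */
static void toy_sublex_push(struct toy_parser *p, bool *in_eval_reparsing)
{
    assert(p->depth < TOY_MAX_DEPTH);
    p->saved[p->depth++] = p->re_reparsing;
    p->re_reparsing = *in_eval_reparsing;
    *in_eval_reparsing = false;   /* nested patterns must not see it */
}

/* Leaving the scope restores whatever the outer scope had. */
static void toy_sublex_pop(struct toy_parser *p)
{
    assert(p->depth > 0);
    p->re_reparsing = p->saved[--p->depth];
}
```

The outer pattern keeps its flag for the whole string-parsing phase, while a pattern nested inside the code block starts with the flag cleared.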
| |
yylex must emit exactly one token each time it is called. Sometimes
yylex needs to parse several tokens at once. That’s what
the various force functions are for. But that is also what
PL_pending_ident is for.
The various force_next, force_word, force_ident, etc., functions keep
a stack of tokens (PL_nextval/PL_nexttype) that yylex will check
immediately when called.
PL_pending_ident is used to track a single identifier that yylex will
hand off to S_pending_ident to handle.
S_pending_ident is the only piece of code for resolving an
identifier that could be lexical but could also be a package variable.
force_ident assumes it is looking for a package variable.
force_* takes precedence over PL_pending_ident.
All this means that, if an identifier needs to be looked up in the pad
on the next yylex invocation, it has to use PL_pending_ident, and the
force_* functions cannot be used at the same time.
Not realising that, when I made ‘our sub foo’ store the sub in the
pad I also made ‘our sub foo ($)’ into a syntax error, because it
was being parsed as ‘our sub ($) foo’ (the prototype being ‘forced’);
i.e., the pending tokens were being pulled out of the ‘queue’ in the
wrong order. (I put queue in quotes, because one queue and one
unrelated buffer together don’t exactly count as ‘a queue’.)
Changing PL_pending_ident to have precedence over the force stack
breaks ext/XS-APItest/t/swaptwostmts.t, because the statement-parsing
interface does not localise PL_pending_ident. It could be changed to
do that, but I don’t think it is the right solution.
Having two separate pending token mechanisms makes things
needlessly fragile.
This commit eliminates the PL_pending_ident mechanism and
modifies S_pending_ident (renaming it in the process to
S_force_ident_maybe_lex) to work with the force mechanism. I was
going to merge it with force_ident, but the two make incompatible
assumptions that just complicate the code if merged. S_pending_ident
needs the sigil in the same string buffer, to pass to the pad
interface. force_ident needs to be able to work without a sigil present.
So now we only have one queue for pending tokens and the order is more
predictable.
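A single pending-token mechanism can be sketched as one small stack that the lexer drains before reading more input. This is a simplified model with made-up token names; it assumes the most recently forced token is handed out first, so callers force tokens in reverse order.

```c
#include <assert.h>

enum { TOY_NONE = 0, TOY_NAME, TOY_PROTO };
#define TOY_MAX_PENDING 5

struct toy_lexer {
    int npending;
    int pending[TOY_MAX_PENDING];
};

/* Queue a token to be returned by a later toy_yylex call. */
static void toy_force(struct toy_lexer *lx, int tok)
{
    assert(lx->npending < TOY_MAX_PENDING);
    lx->pending[lx->npending++] = tok;
}

/* One token per call: pending tokens first, TOY_NONE when empty. */
static int toy_yylex(struct toy_lexer *lx)
{
    return lx->npending ? lx->pending[--lx->npending] : TOY_NONE;
}
```

Because every pending token goes through the same structure, the order in which the name and the prototype come out depends only on the order in which they were forced.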
| |
This commit fixes here-docs in single-line re-evals in files (as
opposed to evals) and here-docs in single-line quote-like operators
inside re-evals.
In both cases, the here-doc parser has to look into an outer
lexing scope to find the here-doc body. And in both cases it
was stomping on PL_linestr (the current line buffer) while
PL_sublex_info.re_eval_start was pointing to an offset in that buffer.
(re_eval_start is used to construct the string to include in the
regexp’s stringification once the lexer reaches the end of the
re-eval.)
Fixing this entails moving re_eval_start and re_eval_str to
PL_parser->lex_shared, making the pre-localised values visible.
This is so that the code that peeks into an outer linestr buffer to
steal the here-doc body can set up re_eval_str in the right scope.
(re_eval_str is used to store the re-eval text when the here-doc
parser has no choice but to modify linestr; see also commit
db4442662555874019.)
It also entails making the stream-based parser (i.e., that reads from
an input stream) leave PL_linestr alone, instead of clobbering it and
then reconstructing part of it afterwards.
| |
Unfortunately, PL_parser->linestr and PL_parser->bufptr are both
part of the API, so we can’t just move them to PL_parser->lex_shared.
Instead, we have to copy them in sublex_push, to make them visible to
inner lexing scopes.
This allows the SvIVX(PL_linestr) and SvNVX(PL_linestr) hack to
be removed.
It should also speed things up slightly. We are already allocating
PL_parser->lex_shared in sublex_push, so there should be no need to
upgrade PL_linestr to SvNVX as well.
I was pleasantly surprised to see how the here-doc code seemed to
shrink all by itself when modified to account for this.
PL_sublex_info.super_bufptr is also superseded by the addition of
->ls_bufptr to the LEXSHARED struct. Its old values when localised
were not visible, being stashed away on the savestack, so it was
harder to use.
| |
It took me a while to figure this out, so here it is for
future readers.
| |
PL_parser->herelines needs to be visible to inner lexing scopes, which
also need to have their own copy of it, so that the here-doc parser
can modify the right herelines variable corresponding to the
PL_linestr from which it is stealing its body. (A subsequent commit
will take care of that.)
| |
The line numbers for operators after a here-doc marker on the same
line were off by the length of the here-doc.
This is because the here-doc parser would artificially increase the
line number as it went, because it was stealing lines out of the
input stream.
Instead, we can record the number of lines in the here-doc, and add it
to the line number the next time we need to increment it.
This also fixes the line numbers after s//<<END/e to the end of the
file, which were off because the line number adjusted by the <<END was
localised to the s///.
Since herelines is visible to inner lexing scopes, the outer lexing
scope can see changes made by the inner one.
The lack of localisation does cause problems with line numbers inside
quote-like operators (but they were off by one already), which will be
addressed in subsequent commits.
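The deferred adjustment can be sketched as follows. The struct and function names are hypothetical; the only idea taken from the message is that the here-doc's line count is recorded and added at the next line increment.

```c
#include <assert.h>
#include <stddef.h>

struct toy_parser {
    unsigned line;        /* current line number */
    unsigned herelines;   /* extra lines to skip at the next line ending */
};

/* The here-doc parser took body_lines lines out of the input. */
static void toy_note_heredoc(struct toy_parser *p, unsigned body_lines)
{
    assert(p != NULL);
    p->herelines += body_lines;   /* the line number itself is untouched */
}

/* The lexer reached the end of the current line. */
static void toy_next_line(struct toy_parser *p)
{
    p->line += 1 + p->herelines;
    p->herelines = 0;
}
```

Anything lexed on the same line after the here-doc marker still sees the marker's line number, and the jump over the body happens only when the line ends.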
| |
For re-evals, this is something that broke recently, post-5.16 (the
jumbo fix). For other interpolating constructs, this has never
worked, as far as I can tell.
The lexer was losing track of PL_lex_state (aka PL_parser->lex_state)
when parsing formats. Usually, the state alternates between
LEX_FORMLINE (a picture line) and LEX_NORMAL (an argument line), but
the LEX_NORMAL should actually be whatever the state was before the
format started.
This commit adds a new parser member to track the ‘normal’ state when
parsing a format.
It also tweaks S_scan_formline to handle multi-line buffers outside of
string eval (such as happens in interpolating constructs).
The bufend assignment that is removed as a result has been unnecessary
since a0d0e21ea6ea (perl 5.000). That very commit added a bufend
assignment after the sv_gets (later filter_gets; later lex_next_chunk)
further down in the loop in scan_formline.
| |
This updates the editor hints in our files for Emacs and vim to request
that tabs be inserted as spaces.
| |
Previously it would leave the file handle open if it was (equal to) stdin,
on the assumption that this must have been because no script name was
supplied on the interpreter command line, so the interpreter was defaulting
to reading the script from standard input.
However, if the program has closed STDIN, then the next file handle opened
(for any reason) will have file descriptor 0. So in this situation, the
handle that require opened to read the module would be mistaken for the above
situation and left open. Effectively, this leaked a file handle.
This is now fixed, by explicitly tracking from parser creation time whether
it should keep the file handle open, and only setting this flag when
defaulting to reading the main program from standard input. This resolves
RT #37033.
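The difference between the two policies can be sketched like this. The struct and function names are illustrative, not the real parser fields; file descriptors are plain ints here.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_parser {
    int fd;            /* descriptor the source is read from */
    bool keep_open;    /* decided once, at parser creation */
};

static struct toy_parser toy_parser_new(int fd, bool main_program_from_stdin)
{
    struct toy_parser p;
    assert(fd >= 0);
    p.fd = fd;
    p.keep_open = main_program_from_stdin;
    return p;
}

/* Old policy: guess from the descriptor number. */
static bool old_should_close(const struct toy_parser *p)
{
    return p->fd != 0;
}

/* New policy: trust the flag recorded at creation time. */
static bool new_should_close(const struct toy_parser *p)
{
    return !p->keep_open;
}
```

A module opened by require after the program closed STDIN lands on descriptor 0, so the old test leaves it open while the flag-based test closes it.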
| |
lex_flags holds 4 flag bits, with multiple flag bits manipulated together
at times, so they can't be split out into individual bitfields. This change
permits the C compiler to generate simpler code, reducing toke.o by about
400 bytes on this platform, but doesn't change the size of the structure.
lex_flags was added in commit 802a15e9c01d1a0b in August 2011, so is not in
any stable release.
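To illustrate why flags that are manipulated together are kept in one integer field, here is a small sketch. The four flag names are made up, and the sketch assumes the flags live in a plain 8-bit member.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_FLAG_A   0x01u
#define TOY_FLAG_B   0x02u
#define TOY_FLAG_C   0x04u
#define TOY_FLAG_D   0x08u
#define TOY_FLAG_ALL 0x0Fu

struct toy_parser {
    uint8_t lex_flags;   /* four flag bits in one byte */
};

/* Set several flags with a single OR. */
static void toy_set(struct toy_parser *p, unsigned mask)
{
    assert((mask & ~TOY_FLAG_ALL) == 0);
    p->lex_flags = (uint8_t)(p->lex_flags | mask);
}

/* Clear several flags with a single AND. */
static void toy_clear(struct toy_parser *p, unsigned mask)
{
    assert((mask & ~TOY_FLAG_ALL) == 0);
    p->lex_flags = (uint8_t)(p->lex_flags & ~mask);
}

static bool toy_has_all(const struct toy_parser *p, unsigned mask)
{
    return (p->lex_flags & mask) == mask;
}
```

With separate one-bit bitfields, each of these operations would need one statement per flag.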
| |
Sync copyright dates with actual changes according to git history.
[Plus run regen_perly.pl to update the SHA-256 checksums, and
regen/regcharclass.pl to update regcharclass.h]
| |
Perl_lex_start copies the string passed to it unconditionally.
Sometimes pp_entereval makes a copy before passing the string
to lex_start. So in those cases we can pass a flag to avoid a
redundant copy.
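The copy-avoidance flag might be sketched as below. toy_lex_start and TOY_LEX_ALREADY_COPIED are hypothetical names, and plain C strings stand in for Perl scalars.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_LEX_ALREADY_COPIED 0x1u   /* caller hands over its own copy */

/* Returns the buffer the lexer will own, or NULL if allocation fails. */
static char *toy_lex_start(char *src, unsigned flags)
{
    size_t n;
    char *copy;

    assert(src != NULL);
    if (flags & TOY_LEX_ALREADY_COPIED)
        return src;               /* reuse the caller's copy as-is */
    n = strlen(src) + 1;
    copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, src, n);
    return copy;
}
```

A caller that has already duplicated the string passes the flag and keeps a single allocation; every other caller gets the unconditional copy as before.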
| |
By combining two booleans with the flags field, we save some space.
By making it a 16-bit instead of 32-bit field (only two flag bits
are currently used), we also avoid alignment holes (I hope; I’m
not very good at this).
| |
When a filter is added, the current buffer is hung on the end of
the filters array, and a new substring of it becomes the current
buffer.
| |
Put LEX_IGNORE_UTF8_HINTS near the only other constant passed
to lex_start
| |
(modified by the committer only to apply when the unicode_eval
feature is enabled)
| |
Before this commit:
    commit f07ec6dd59215a56bc1159449a9631be7a02a94d
    Author: Zefram <zefram@fysh.org>
    Date: Wed Oct 13 19:05:19 2010 +0100
        remove filter inheritance option from lex_start
        The only uses of lex_start that had the new_filter parameter false,
        to make the new lexer context share source filters with the previous
        lexer context, were uses with rsfp null, which therefore never invoked
        source filters. Inheriting source filters from a logically unrelated
        file seems like a silly idea anyway.
string evals could inherit the same source filter space as the
currently compiling code. Despite what the quoted commit message says,
sharing source filters allows filters to be inherited in both
directions: A source filter created when the eval is being compiled also
applies to the file with which it is sharing its space.
There are at least 20 CPAN distributions relying on this behaviour
(or, rather, what could be considered a Test::More bug). So this
commit restores the source-filter-sharing capability. It does not change
the current API or make public the API for sharing source filters, as
this is supposed to be a temporary stop-gap measure for 5.14.
| |
New API functions parse_fullexpr(), parse_listexpr(), parse_termexpr(),
and parse_arithexpr(), to parse an expression at various precedence
levels.
| |
New API function parse_label() parses a label, separate from statements.
If a label has not already been lexed and queued up, it does not use
yylex(), but parses the label itself at the character level, to avoid
unwanted lexing past an absent optional label.
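Character-level label scanning could be sketched roughly as follows. This is not the parse_label() implementation: toy_scan_label is a hypothetical helper, and the rule it uses (an identifier, optional blanks, then a single colon that is not part of "::") is a simplifying assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns the number of characters consumed by a leading "NAME:" label,
 * or 0 if there is none; *name_len receives the length of NAME. */
static size_t toy_scan_label(const char *src, size_t *name_len)
{
    const char *p = src;
    size_t len;

    assert(src != NULL && name_len != NULL);
    *name_len = 0;
    if (!(isalpha((unsigned char)*p) || *p == '_'))
        return 0;
    while (isalnum((unsigned char)*p) || *p == '_')
        p++;
    len = (size_t)(p - src);
    while (*p == ' ' || *p == '\t')
        p++;
    if (p[0] != ':' || p[1] == ':')
        return 0;                 /* no label: nothing has been consumed */
    *name_len = len;
    return (size_t)(p + 1 - src);
}
```

When no label is present the function reports zero characters consumed, so the caller can hand the untouched text to the normal lexer.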
| |
PL_doextract had two unrelated jobs, neither best served by an interpreter
global variable. The first was to track the -x command-line switch.
That is replaced with a local variable in S_parse_body(). The second
was to track whether the lexer is in the middle of a =pod section.
That is replaced with an element in PL_parser.
| |
it reference counted. Properly solves [perl #66094]
| |
Attached is a patch that adds a public API for the lowest layers of
lexing. This is meant to provide a solid foundation for the parsing that
Devel::Declare and similar modules do, and it complements the pluggable
keyword mechanism. The API consists of some existing variables combined
with some new functions, all marked as experimental (which making them
public certainly is).
| |
p4raw-id: //depot/perl@32793
| |
Re-order struct yy_stack_frame to save space on LP64 systems.
p4raw-id: //depot/perl@31618
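The effect of member ordering on LP64 can be illustrated with two made-up structs. The field names are hypothetical and do not reflect the real layout of yy_stack_frame; the exact sizes depend on the ABI.

```c
#include <assert.h>
#include <stddef.h>

/* Pointers and ints interleaved: on LP64 each int is typically
 * followed by 4 bytes of padding. */
struct toy_frame_interleaved {
    int   state;
    void *val;
    int   flags;
    void *comppad;
};

/* Same members, pointers first: the two ints can share one 8-byte slot. */
struct toy_frame_grouped {
    void *val;
    void *comppad;
    int   state;
    int   flags;
};

/* Bytes saved by grouping; may be 0 where ints and pointers are equal size. */
static size_t toy_bytes_saved(void)
{
    assert(sizeof(struct toy_frame_grouped)
           <= sizeof(struct toy_frame_interleaved));
    return sizeof(struct toy_frame_interleaved)
         - sizeof(struct toy_frame_grouped);
}
```

On a typical LP64 system the interleaved layout occupies 32 bytes and the grouped one 24, but that figure is platform-dependent.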
| |
p4raw-link: @31615 on //depot/perl: 503de4705ff6537018ae94e9179e16636748b2a6
p4raw-id: //depot/perl@31616
| |
Change #22306 inadvertently made 'local $[' statement-scoped
rather than block-scoped; so revert that change and add a
different fix. The problem was to ensure that the savestack got
popped correctly while popping errored tokens. We now record the
current value of PL_savestack_ix with each pushed parser state.
p4raw-id: //depot/perl@31615
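Recording the savestack index with each parser state can be sketched as below. The structures are simplified stand-ins with hypothetical names, and the savestack is reduced to a single counter.

```c
#include <assert.h>

#define TOY_MAX_STATES 16

struct toy_state {
    int token;
    int savestack_ix;   /* savestack depth when this state was pushed */
};

struct toy_parser {
    int nstates;
    int savestack_ix;   /* stand-in for PL_savestack_ix */
    struct toy_state states[TOY_MAX_STATES];
};

static void toy_push_state(struct toy_parser *p, int token)
{
    assert(p->nstates < TOY_MAX_STATES);
    p->states[p->nstates].token = token;
    p->states[p->nstates].savestack_ix = p->savestack_ix;
    p->nstates++;
}

/* Error recovery: discard a state and unwind whatever was saved after it. */
static void toy_pop_errored_state(struct toy_parser *p)
{
    assert(p->nstates > 0);
    p->nstates--;
    p->savestack_ix = p->states[p->nstates].savestack_ix;
}
```

Each discarded state unwinds exactly the entries saved after it was pushed, independent of where statement boundaries fall.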
| |
p4raw-id: //depot/perl@31255
| |
p4raw-id: //depot/perl@31254
| |
p4raw-id: //depot/perl@31252
| |
p4raw-id: //depot/perl@31203
| |
p4raw-id: //depot/perl@31201
| |
p4raw-id: //depot/perl@31200
| |
and simplify its creation and destruction
p4raw-id: //depot/perl@31199
| |
p4raw-id: //depot/perl@31154
| |
PL_nexttoke PL_curforce PL_nextval PL_nexttype
p4raw-id: //depot/perl@31148
| |
PL_bufptr PL_oldbufptr PL_oldoldbufptr
PL_linestart PL_bufend
PL_last_uni PL_last_lop PL_last_lop_op
p4raw-id: //depot/perl@31147
| |
p4raw-id: //depot/perl@31134
| |
p4raw-id: //depot/perl@31058
| |
actually only holding chars.
p4raw-id: //depot/perl@31015
| |
platforms. On LP64 structs stackinfo, refcounted_he, and magic shrink
by 8 bytes, struct yy_parser by 16.
p4raw-id: //depot/perl@30817
| |
earlier we missed in av.h and hv.h)
p4raw-id: //depot/perl@29670
| |
(where "easy" == "only appear in toke.c")
p4raw-id: //depot/perl@29655