| Commit message (Collapse) | Author | Age | Files | Lines |
| |
Mainly it no longer generates some tables used for debugging.
This commit also adds a new define showing what bison version was used.
| |
Previously the assignment was hidden by the not op wrapped around the
condition, but newCONDOP() is sufficiently flexible that it isn't
needed.
| |
There is dead code that used to allow
    my $_;
    ...
    given ($foo) {
        # lexical $_ aliased to $foo here
    }
Now that lexical $_ has been removed, remove the code. I've left the
signatures of the newFOO() functions unchanged; they just expect a target
of 0 to always be passed now.
| |
This token’s type is never used. We don’t bother setting the type,
either, in toke.c, so it will be garbage. Removing the type makes
it harder to use the garbage value by mistake in refactoring.
| |
These two tokens never use their value, and the value is not even set
in toke.c, which means it will contain a junk value from some previous
token. Removing the types prevents that junk value from being acci-
dentally used.
| |
Yay, the semicolons are back.
| |
This moves signatures before attributes in the grammar by
creating separate branches for the prototype and signatures
cases, so that the introduced block and the fact that signatures
do not allow for declarations can be handled properly.
Tests and regen_perly to follow.
| |
These two operators were being translated into subst("","") and
tr("","") by the lexer. Then pmruntime in op.c would take apart the
resulting list op. Instead of constructing a list op only to take it
apart again, feed the replacement part to pmruntime separately. We
can achieve this by introducing a new token ('/') that the parser rec-
ognizes as introducing a replacement.
If we had followed this approach to begin with, then bug #123542 would
never have happened.
(Actually, it seems the parser did know about the replacement part to
begin with, but it changed in perl-5.8.0-4047-g131b3ad to fix some
overloading problems.)
| |
ck_spair also applies lvalue context to the kid ops, so we just end up
calling op_lvalue twice on the same ops. It’s harmless (being idempo-
tent), but wasteful.
| |
Formats need the same logic applied in 34b54951568, to avoid this:
Input:
{
my $x;
format STDOUT =
@
$x
.
}
__END__
$ pbpaste|./perl -Ilib -MO=Deparse
{
my $x;
}
format STDOUT =
@
$x
.
__DATA__
- syntax OK
That $x in the format is now global, not lexical.
Also, we need to make sure the sequence numbers are correct. The
statement after the format stopped having a higher sequence number
than CvOUTSIDE_SEQ(format) in 8635e3c238f, because the ‘pending’
sequence number set aside for the nextstate op created just after com-
pile-time block exit (which nextstate op comes before the block) was
not actually being used by newFORM (unlike newATTRSUB), so the follow-
ing statement was using that number.
| |
A block in perl usually consists of an enter/leave pair plus the con-
tents of the block:
    leave
      enter
      nextstate
      whatever
But if the contents of the block are simple enough to forego the
full block structure, a simple scope op is used, which is not
even executed:
    scope
      ex-nextstate
      whatever
If there is a real nextstate op anywhere in the block, it resets the
stack to whatever it was at block entry, based on the value on the
context stack placed there by the enter op. That’s why we can never
have scope+nextstate (we have ex-nextstate, or a former nextstate op
that is not executed).
A for-loop (for(init; cond; cont) { ... }) executes the init section
first, and then an unstack op, which is like nextstate in that it
resets the stack based on what the context stack says is the base off-
set for this block.
If we have an unstack op, we can’t use scope, just as we can’t use it
with nextstate. But we *were* nonetheless using scope in this case.
Hence, map { for(...;...;...) {...} } ... caused the for-loop to reset
the stack to the beginning of map’s own arguments. So the for-loop
would stomp on them.
We can see the same bug with ‘for’ clobbering an outer list:
$ perl5.20.1 -le 'print 1..3, do{for(0;0;){}}, 4..6;'
0456
| |
8635e3c2 (5.21.6) changed the COP sequence numbers for nested blocks,
such that most BEGIN blocks (incl. ‘use’ statements) and sub declara-
tions end up in the right place. However, it had the side effect of
causing declarations at the end of the enclosing scope to fall out of
it and appear below.
This commit fixes that by adding an extra nulled COP to the end of the
enclosing scope if that scope ends with a sub, so the final declara-
tion gets deparsed before it.
The frequency of sub declarations at the end of the enclosing scope is
sufficiently low (I’m guessing a bit here) that this slight increase
in run-time memory usage is probably acceptable.
I had to change B::Deparse to deparse nulled COPs the same way it does
live COPs, which means we get more extraneous semicolons than before.
I hope to fix that in a forthcoming commit. I also ran into a B bug,
in that null ops are not presented to Perl code with the right op
class (see the blessing in the patch). I plan to fix that in a separ-
ate commit, too.
| |
In the op tree, a statement consists of a nextstate/dbstate op (of
class cop) followed by the contents of the statement. This cop is
created after the statement has been parsed. So if you have nested
statements, the outermost statement has the highest sequence number
(cop_seq). Every sub (including BEGIN blocks) has a sequence number
indicating where it occurs in its containing sub.
So
    BEGIN { } #1
    # seq 2
    {
        # seq 1
        ...
    }
is indistinguishable from
    # seq 2
    {
        BEGIN { } #1
        # seq 1
        ...
    }
because the sequence number of the BEGIN block is 1 in both examples.
By reserving a sequence number at the start of every block and using
it once the block has finished parsing, we can do this:
    BEGIN { } #1
    # seq 1
    {
        # seq 2
        ...
    }

    # seq 1
    {
        BEGIN { } #2
        # seq 2
        ...
    }
and now B::Deparse can tell where to put the blocks.
PL_compiling.cop_seq was unused, so this is where I am stashing
the pending sequence number.
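To make the numbering scheme concrete, here is a toy model in Perl. It is only a sketch: the node kinds and the number() helper are invented for illustration and do not correspond to anything in op.c or perly.y. It hands out one number per sub, statement and bare block, either the old way or with a number reserved at block start.

```perl
use strict;
use warnings;

# Toy model (hypothetical): a "program" is a list of nodes, each one of
#   ['sub']            a BEGIN block or sub declaration
#   ['stmt']           a simple statement
#   ['block', @kids]   a bare block containing further nodes
# number() returns the number given to each kind of node, as a string.
sub number {
    my ($reserve, @nodes) = @_;
    my $next = 1;
    my %seq;
    my $walk;
    $walk = sub {
        for my $node (@_) {
            my ($kind, @kids) = @$node;
            if ($kind eq 'sub') {
                $seq{$kind} = $next;          # a sub records the current number
            }
            elsif ($kind eq 'stmt') {
                $seq{$kind} = $next++;        # a statement consumes a number
            }
            else {
                my $pending = $next;
                $next++ if $reserve;          # new scheme: reserve at block start
                $walk->(@kids);
                $seq{block} = $reserve ? $pending : $next++;
            }
        }
    };
    $walk->(@nodes);
    return join ' ', map { "$_=$seq{$_}" } sort keys %seq;
}

my @outside = (['sub'], ['block', ['stmt']]);   # BEGIN { }  { ... }
my @inside  = (['block', ['sub'], ['stmt']]);   # { BEGIN { } ... }

my $old_out = number(0, @outside);
my $old_in  = number(0, @inside);
my $new_out = number(1, @outside);
my $new_in  = number(1, @inside);

print "old: $old_out | $old_in\n";
print "new: $new_out | $new_in\n";
```

In this model the old scheme gives both layouts the same numbers, while the reserved number makes them differ, which is the distinction B::Deparse needs.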
| |
When (\$x)=\$y is compiled, the \ on the lhs gives lvalue context to
its argument by calling op_lvalue. Then later the = gives lvalue con-
text to the \, calling op_lvalue again, which transforms the $x into
an lvref op (via op.c:S_lvref).
I just copied that logic when I extended aliasing via reference to
foreach \$x. But here, we don’t need to call op_lvalue on the $x,
because we know it is going to go through op.c:S_lvref, which doesn’t
care whether it has been through op_lvalue already or not. The end
result is the same.
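For reference, a minimal sketch of the user-visible feature this touches (aliasing via reference, experimental and behind the refaliasing feature as of 5.22). The variable names are just for illustration.

```perl
use v5.22;
use warnings;
use feature 'refaliasing';
no warnings 'experimental::refaliasing';

my $y = 1;
my $x;
\$x = \$y;      # $x and $y now name the same scalar
$x = 5;         # so the change is visible through $y

my @rows = ({ n => 1 }, { n => 2 });
my @seen;
foreach \my %row (@rows) {      # %row aliases each referenced hash in turn
    push @seen, $row{n};
}

print "y=$y seen=@seen\n";
```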
| |
Some passing tests are still marked to-do. We need more tests still.
| |
Introduce a new opcode class, METHOP, which will hold class/method related
info needed at runtime to improve performance of class/object method
calls, then change OP_METHOD and OP_METHOD_NAMED from being UNOP/SVOP to
being METHOP.
Note that because OP_METHOD is a UNOP with an op_first, while
OP_METHOD_NAMED is an SVOP, the first field of the METHOP structure
is a union holding either op_first or op_sv. This was seen as less messy
than having to introduce two new op classes.
The new op class's character is '.'
Nothing has changed in functionality and/or performance by this commit.
It just introduces new structure which will be extended with extra
fields and used in later commits.
Added METHOP constructors:
- newMETHOP() for method ops with dynamic method names.
The only optype for this op is OP_METHOD.
- newMETHOP_named() for method ops with constant method names.
Optypes for this op are: OP_METHOD_NAMED (currently) and (later)
OP_METHOD_SUPER, OP_METHOD_REDIR, OP_METHOD_NEXT, OP_METHOD_NEXTCAN,
OP_METHOD_MAYBENEXT
(This commit includes fixups by davem)
| |
All these code snippets are embedded inside a function
(perly.c:yyparse) that puts the current value of PL_parser in a local
variable named parser. So the two are equivalent, but the latter
only has to access a local variable.
Before:
$ ls -ld perly.o
-rw-r--r-- 1 sprout staff 94748 Aug 22 06:12 perly.o
After:
$ ls -ld perly.o
-rw-r--r-- 1 sprout staff 94340 Aug 22 06:15 perly.o
| |
When curly subscripts are parsed, the lexer (toke.c:yylex) notes that
the value of PL_expect needs to be set to XSTATE (expecting a state-
ment) after the final brace. When the final brace is encountered,
PL_expect is set to that recorded value. But then the parser
(perly.y) sets it to XOPERATOR immediately thereafter.
This approach requires a plethora of identical statements in perly.y.
If we just set PL_expect to the right value to begin with, we can
avoid all those assignments.
| |
As it worked before, the parser (perly.y) would set PL_expect to
XSTATE after encountering a statement-terminating semicolon.
Two functions in op.c--package and utilize--had to set the value to
XSTATE as a result.
Also, in the case of a closing brace, the lexer emits an implicit
semicolon followed by '}' (emitted via force_next). force_next
records the value of PL_expect and restores it when emitting the
token. So in this case the value of PL_expect was flipping back and
forth between two values.
Instead of having the parser set it to XSTATE, we can have the lexer
set it to XSTATE by default when emitting an explicit semicolon. (It
was setting it to XTERM.) The parser can set it to XTERM in the only
place that matters; viz., the header of a for-loop. This simplifies
things conceptually, and makes the code a whole line shorter.
(The diff stat shows more savings in line count, but that is because
the version of bison I used to regenerate the tables produces smaller
headers than what was already committed.)
| |
MAD = Misc Attribute Decoration; unmaintained attempt at preserving
the Perl parse tree more faithfully so that automatic conversion to
Perl 6 would have been easier.
| |
Declarative syntax to unwrap argument list into lexical variables.
"sub foo ($a,$b) {...}" checks number of arguments and puts the
arguments into lexical variables. Signatures are not equivalent to the
existing idiom of "sub foo { my($a,$b) = @_; ... }". Signatures are only
available by enabling a non-default feature, and generate warnings about
being experimental. The syntactic clash with prototypes is managed by
disabling the short prototype syntax when signatures are enabled.
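A small sketch of what this looks like from the user's side, assuming a perl with the signatures feature (5.20 or later). The add() sub is a made-up example; the exact wording of the error differs between versions, so only its beginning is relied on here.

```perl
use strict;
use warnings;
use feature 'signatures';
no warnings 'experimental::signatures';

# Two mandatory positional parameters, unpacked into lexicals.
sub add ($x, $y) {
    return $x + $y;
}

my $sum = add(2, 3);

# The argument count is checked at call time; a wrong count dies.
my $err = eval { add(1); 1 } ? '' : $@;

print "sum=$sum\n";
print "error: $err";
```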
| |
It's been deprecated (and emitting a warning) since Perl v5.0.0, and
support for it constitutes nearly 3% of the grammar.
| |
This turned out to be tricky. Normally @ at the beginning of the
interpolated code signals to the lexer to emit ‘join($",’ immediately.
With "$_->@*" we would have to retract the $ _ -> tokens upon encoun-
tering @*, which we obviously cannot do.
Waiting until we reach the end of the interpolated text before emit-
ting anything could not work either, as it may contain BEGIN blocks
that affect the way part of the interpolated code is parsed.
So what we do is introduce an egregious or clever hack, depending on
how you look at it.
Normally, the lexer turns "@foo" into:
stringify ( join ( $ " , @ foo ) )
(The " is a WORD token, representing a variable name.)
"$_" becomes:
stringify ( $ _ )
We can turn "$_->@*" into:
stringify ( $ _ -> @ * POSTJOIN )
Where POSTJOIN is a new lexer token with special handling that creates
a join op just the way join($", ...) does.
To make "foo$_->@*bar" work as well, we have to make POSTJOIN have
precedence just below ->, so that
stringify ( "foo" . $ _ -> @ * POSTJOIN . "bar" )
(what the parser sees) is equivalent to:
stringify ( "foo" . ( $ _ -> @ * POSTJOIN ) . "bar" )
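From the Perl side, the effect can be sketched like this. It assumes perl 5.24 or later, where the :5.24 feature bundle turns on postderef_qq; on 5.20 and 5.22 the feature has to be enabled explicitly and is experimental.

```perl
use v5.24;      # the :5.24 bundle enables postderef_qq
use warnings;

my $ref = [1, 2, 3];

# ->@* inside a string joins the elements with $", like "@{$ref}" does.
my $plain  = "foo$ref->@*bar";
my $old    = "foo@{$ref}bar";
my $dashed = do { local $" = '-'; "[$ref->@*]" };

say $plain;
say $dashed;
```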
| |
$_->$* means $$_ (and compiles down to the same op tree)
$_->@* means @$_ ( ditto ditto blah blah blah )
$_->%* means %$_ (...)
$_->&* means &$_
$_->** means *$_
$_->@[...] means @$_[...]
$_->@{...} means @$_{...}
$_->*{...} means *$_{...}
$_->@* is not always equivalent to @$_, particularly in contexts like
@foo[0], which cannot be written foo->@*[0]. (Just omit the asterisk
and it works.)
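A short sketch of a few of these forms side by side with the prefix spellings. It assumes perl 5.24 or later, where postfix dereference is always enabled; on 5.20 and 5.22 it needs the experimental postderef feature.

```perl
use v5.24;
use warnings;

my $aref = [10, 20, 30];
my $href = { a => 1, b => 2 };
my $sref = \"hello";

my @all  = $aref->@*;            # same as @$aref
my @pick = $aref->@[0, 2];       # same as @$aref[0, 2]
my @vals = $href->@{qw(a b)};    # same as @$href{qw(a b)}
my %copy = $href->%*;            # same as %$href
my $text = $sref->$*;            # same as $$sref

say "@all | @pick | @vals | $text";
```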
| |
This resolves tickets #28380 and #114024.
Commit 95a31aad5 did something similar to this for the new %hash{...}
syntax. This commit extends it to @ slices and combines the two
code paths.
The heuristics in toke.c can easily produce false positives. So the
op is flagged as being a candidate for the warning. Then when op.c
has the op tree available, it examines it to see whether the heuristic
may have been a false positive.
This avoids bugs with qw "foo bar baz" and sub calls triggering
the warning.
The source code is no longer available for the warning, so we recon-
struct it from the op tree, skipping the subscript if it is anything
other than a const op.
This means that @hash{$foo} comes out as @hash{...} and @hash{foo} as
@hash{"foo"}. It also means that @hash{"]"} is displayed correctly
instead of as @hash{"].
Commit 95a31aad5 also modified the heuristic for %hash{...} to exempt
qw altogether. But it did not exempt it if it was preceded by a tab.
So this commit rectifies that.
This commit also improves the false positive detection by exempting
any ops returning lists that can get past toke.c’s heuristic. I went
through the entire list of ops, but I may have missed some.
Also, @ slices on the lhs of = are exempt, as they change the context
and are hence actually useful.
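One way to observe the warning from Perl code is sketched below. It uses a string eval so that the compile-time warning is raised while a __WARN__ handler is installed; the hash and its keys are made up, and I only rely on the "better written as" part of the message.

```perl
use strict;
use warnings;

my @warnings;
{
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    # Compiled at run time, so the handler above sees compile-time warnings.
    eval q{
        my %hash = (foo => 1, bar => 2);
        my @one = @hash{foo};            # single-key slice: candidate for the warning
        my @two = @hash{qw(foo bar)};    # genuine slice
        1;
    } or die $@;
}

print for @warnings;
```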
| |
Instead of warning in the lexer, flag the op and then warn in op.c,
when the op tree is available, so we don’t end up warning for actual
lists or for sub calls.
Also, only warn in scalar context, as in list context $hash{$scalar}
and %hash{$scalar} do different things.
In op.c we no longer have easy access to the source code, so recon-
struct the hash/array access based on the op tree. This means
%hash{foo} becomes %hash{"foo"}. We only reconstruct constant keys,
so %hash{++$x} becomes %hash{...}. This also corrects erroneous
dumps, like %hash{"} for %hash{"}"}.
Instead of triggering the warning solely based on the op tree, we
still keep the heuristic in toke.c, so that common workarounds for
that warning (e.g., {q<key>} and {("key")}) continue to work.
The heuristic in toke.c is tweaked to avoid warning for qw().
In a future commit I plan to extend this to the existing @array[0] and
@hash{key} warnings, to avoid false positives.
| |
kvaslice operator that implements %a[0,2,4] syntax, which
results in a list of index/value pairs. Implemented for
consistency with the "key/value hash slice" operator.
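A minimal sketch of the syntax in use, assuming perl 5.20 or later (the array contents are made up):

```perl
use v5.20;
use warnings;

my @a = qw(zero one two three four);

my @pairs    = %a[0, 2, 4];    # index/value pairs, in the order requested
my %by_index = %a[1, 3];       # the pairs can populate a hash directly

say "@pairs";
```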
| |
kvhslice operator that implements %h{1,2,3,4} syntax, which
returns a list of key/value pairs rather than just values
(as regular slices do).
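A minimal sketch contrasting the two kinds of slice, assuming perl 5.20 or later (the hash contents are made up):

```perl
use v5.20;
use warnings;

my %h = (1 => 'a', 2 => 'b', 3 => 'c', 4 => 'd');

my %subset = %h{1, 2};    # key/value slice: (1, 'a', 2, 'b')
my @values = @h{1, 2};    # regular slice:   ('a', 'b')

say join ",", map { "$_=$subset{$_}" } sort keys %subset;
say "@values";
```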
| |
NOAMP is only emitted by toke.c when there are no parentheses. If
there is a parenthesis following a word, the lexer conjures up an '&'
token from nowhere.
| |
Without this fixup the build breaks on Win32. Previously it was enabled using
a second regular expression to parse the bison version string. This
effectively duplicates the first regular expression that parses the bison
version string to determine if the version of bison found can be used.
The second regular expression could get missed when changing the first, as
happened with commit f5cf236ddbe8e24e. So refactor the code to extract the
bison version once, and then use this later as a simple numeric comparison.
With this change, we can actually use bison 2.7 to regenerate perly.*
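The idea of parsing the version once and comparing numerically can be sketched as follows. This is not the code in regen_perly.pl: bison_version_number() is a hypothetical helper, and the banner format is assumed to be the usual first line of `bison --version`.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "bison (GNU Bison) 2.7" into 2007 so that
# later checks are plain numeric comparisons. Returns nothing on no match.
sub bison_version_number {
    my ($banner) = @_;
    my ($major, $minor) = $banner =~ /\(GNU Bison\)\s+(\d+)\.(\d+)/
        or return;
    return $major * 1000 + $minor;
}

my $v27  = bison_version_number("bison (GNU Bison) 2.7");
my $v241 = bison_version_number("bison (GNU Bison) 2.4.1");
my $none = bison_version_number("not bison at all");

print "2.7 -> $v27, 2.4.1 -> $v241\n";
```

Every later decision can then be a single comparison such as `$version >= 2004`; that threshold is only an example, not a value taken from the script.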
| |
This is needed due to the change to regen_perly.pl. Otherwise
regen.t starts complaining. The actual diff is just noise.
| |
Under mad builds, commit 5db1eb8 caused this warning:
$ PERL_XMLDUMP=/dev/null ./perl -Ilib -e 'foo:'
Invalid TOKEN object ignored at -e line 1.
Since I don’t understand the mad code so well, the easiest fix is to
revert back to using a PV, as we did before 5db1eb8. To record the
utf8ness, we sneak it behind the trailing null.
| |
They have leaked since v5.15.9-35-g5db1eb8 (which probably broke mad
dumping of labels; to be addressed in the next commit).
| |
This token is not used any more.
| |
The pad slot for a my sub now holds a stub with a prototype CV
attached to it by proto magic.
The prototype is cloned on scope entry. The stub in the pad is used
when cloning, so any code that references the sub before scope entry
will be able to see that stub become defined, making these behave
similarly:
    our $x;
    BEGIN { $x = \&foo }
    sub foo { }

    our $x;
    my sub foo { }
    BEGIN { $x = \&foo }
Constants are currently not cloned, but that may cause bugs in
pad_push. I’ll have to look into that.
On scope exit, lexical CVs go through leave_scope’s SAVEt_CLEARSV sec-
tion, like lexical variables. If the sub is referenced elsewhere, it
is abandoned, and its proto magic is stolen and attached to a new stub
stored in the pad. If the sub is not referenced elsewhere, it is
undefined via cv_undef.
To clone my subs on scope entry, we create a sequence of introcv and
clonecv ops. See the huge comment in block_end that explains why we
need two separate ops for each CV.
To allow my subs to be defined in inner subs (my sub foo; sub { sub
foo {} }), pad_add_name_pvn and S_pad_findlex now upgrade the entry
for a my sub to a CV to begin with, so that fake entries added to pads
(fake entries are those that reference outer pads) can share the same
CV. Otherwise newMYSUB would have to add the CV to every pad that
closes over the ‘my sub’ declaration. newMYSUB no longer throws away
the initial value, replacing it with a new one.
Prototypes are not currently visible to sub calls at compile time,
because the lexer sees the empty stub. A future commit will
solve that.
When I added name heks to CV’s I made mistakes in a few places, by not
turning on the CVf_NAMED flag, or by not clearing the field when free-
ing the hek. Those code paths were not exercised enough by state
subs, so the problems did not show up till now. So this commit fixes
those, too.
One of the tests in lexsub.t, involving foreach loops, was incorrect,
and has been fixed. Another test has been added to the end for a par-
ticular case of state subs closing over my subs that I broke when ini-
tially trying to get sibling my subs to close over each other, before
I had separate introcv and clonecv ops.
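The per-scope-entry cloning can be seen from Perl with a sketch like the one below. It assumes perl 5.26 or later, where lexical subs are no longer experimental; make_counter() and bump() are made-up names.

```perl
use v5.26;
use warnings;

sub make_counter {
    my ($n) = @_;
    my sub bump { return ++$n }    # a fresh clone per call, closing over this $n
    return \&bump;                 # the reference keeps that clone alive
}

my $c1 = make_counter(10);
my $c2 = make_counter(100);

my @got = ($c1->(), $c1->(), $c2->());
say "@got";
```

Each call to make_counter() enters the scope again, so the two returned code refs keep separate counters.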
| |
Since state variables are not shared between closures, but only
between invocations of the same closure, state subs should behave
the same way.
This was a little tricky. When we clone a sub, we now clone inner
state subs at the same time. When walking through the pad, cloning
items, we cannot simply clone the inner sub when we see it, because it
may close over things we haven’t cloned yet:
    sub {
        state sub foo;
        my $x;
        sub foo { $x }
    }
We can’t just delay cloning it and do it afterwards, because there may
be multiple subs closing over each other:
    sub {
        state sub foo;
        state sub bar;
        sub foo { \&bar }
        sub bar { \&foo }
    }
So *all* the entries in the new pad must be filled before any inner
subs can be cloned.
So what we do is put a stub in place of the cloned sub. And then
in a second pass clone the inner subs, reusing the stubs from the
first pass.
| |
This commit does just enough to get things compiling. The padcv op
is still unimplemented (in fact, converting the padany to a padcv is
still not done), so you can’t actually run the code yet.
Bareword lookup in yylex now produces PRIVATEREF tokens for state
subs, so the grammar has been adjusted to accept a ‘subname’ in sub
calls (PRIVATEREF or WORD) where previously only a WORD was permitted.
| |
In making ‘sub foo’ respect previous ‘our sub’ declarations in a
recent commit, I actually made ‘state sub foo’ into a syntax error.
(At the time, I patched up MYSUB in perly.y to keep the tests for ‘"my
sub" not yet implemented’ still working.) Basically, it was creat-
ing an empty pad entry, but returning something that perly.y was not
expecting.
This commit adjusts the grammar to allow the SUB branch of barestmt to
accept a PRIVATEREF for its subname, in addition to a WORD. It reuses
the subname rule that SUB used to use (before our subs were added),
gutting it to remove the special block handling, which SUB now takes
care of. That means the MYSUB rule will no longer turn on CvSPECIAL
on the PL_compcv that is going to be thrown away anyway.
The code for special blocks (BEGIN, END, etc.) that turns on CvSPECIAL
now checks for state subs and skips those. It only applies to our
subs and package subs.
newMYSUB has now actually been written. It basically duplicates
newATTRSUB, except for GV-specific things. It does currently vivify a
GV and set CvGV, but I am hoping to change that later. I also hope to
merge some of the code later, too.
I changed the prototype of newMYSUB to make it easier to use. It is
not used anywhere on CPAN and has always simply died, so that should
be all right.
| |
This commit switches all sub definitions, whether with ‘our’ or not,
to using S_force_ident_maybe_lex (formerly known as S_pending_ident).
This means that an unqualified (no our/my/state or package prefix)
‘sub foo’ declaration does a pad lookup, just like $foo.
It turns out that the vivification that I added to the then
S_pending_ident for CVs was unnecessary and actually buggy. We
*don’t* want to autovivify GVs for CVs, because they might be con-
stants or forward declarations, which are stored in a simpler form.
I also had to change the subname rule used by MYSUB in perly.y, since
it can now be fed a PRIVATEREF, which it does not expect. This may
prove to be temporary, but it keeps current tests passing.
| |
Currently the name is only allocated there. Nothing fetches it yet.
Notes on the implementation:
S_pending_ident contains the logic for determining whether $foo or
@foo refers to a lexical or package variable.
yylex defers to S_pending_ident if PL_pending_ident is set.
The KEY_sub case in yylex is changed to set PL_pending_ident instead
of using force_word. For package variables (including our),
S_pending_ident returns a WORD token, which is the same thing that
force_word produces. So *that* aspect of this change does not affect
the grammar. However....
The barestmt rule’s SUB branch begins with ‘SUB startsub subname’.
startsub is a null rule that creates a new sub in PL_compcv via
start_subparse(). subname is defined in terms of WORD and also checks
whether this is a special block, turning on CvSPECIAL(PL_compcv) if
it is. That flag has to be visible during compilation of the sub.
But for a lexical name, such as ‘our foo’, to be allocated in the
right pad, it has to come *before* startsub, i.e., ‘SUB subname
startsub’.
But subname needs to modify the sub that startsub created, set-
ting the flag.
So I copied (not moved, because MYSUB still uses it) the name-checking
code from the subname rule into the SUB branch of barestmt. Now that
uses WORD directly instead of invoking subname. That allows the code
there to set everything up in the right order.
| |
When parsing formats, the lexer invents tokens to feed to the parser.
So when the lexer dissects this:
format =
@<<<< @>>>>
$foo, $bar, $baz
.
The parser actually sees this (the parser knows that = . is like { }):
format =
; formline "@<<<< @>>>>\n", $foo, $bar, $baz;
.
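For readers who have not met formline directly, here is a rough hand-written equivalent of one picture/argument pair. It is a sketch with made-up values, using two fields only; the generated code also deals with the surrounding format machinery.

```perl
use strict;
use warnings;

my ($foo, $bar) = ('ab', 'cd');

# Single quotes, so Perl does not try to interpolate anything after the @.
my $picture = '@<<<< @>>>>' . "\n";

$^A = '';                          # the format accumulator
formline($picture, $foo, $bar);    # left-justify $foo, right-justify $bar
my $line = $^A;
$^A = '';

print $line;
```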
The lexer makes no effort to make sure that the argument line is con-
tained within formline’s arguments. To make
{ do_stuff; $foo, bar }
work, the lexer supplies a ‘do’ before the block, if it is
inside a format.
This means that
$a, $b; $c, $d
feeds ($a, $b) to formline, whereas
{ $a, $b; $c, $d }
feeds ($c, $d) to formline. It also has various other
strange effects:
This script prints "# 0" as I would expect:
print "# ";
format =
@
(0 and die)
.
write
This one, lacking parentheses, dies because ‘and’ has low precedence:
print "# ";
format =
@
0 and die
.
write
This does not work:
my $day = "Wed";
format =
@<<<<<<<<<<
({qw[ Sun 0 Mon 1 Tue 2 Wed 3 Thu 4 Fri 5 Sat 6 ]}->{$day})
.
write
You have to do this:
my $day = "Wed";
format =
@<<<<<<<<<<
({my %d = qw[ Sun 0 Mon 1 Tue 2 Wed 3 Thu 4 Fri 5 Sat 6 ]; \%d}->{$day})
.
write
which is very strange and shouldn’t even be valid syntax.
This does not work, because ‘no’ is not allowed in an expression:
use strict;
$::foo = "bar";
format =
@<<<<<<<<<<<
no strict; $foo
.
write;
Putting a block around it makes it work. Putting a semicolon before
‘no’ stops it from being a syntax error, but it silently does the
wrong thing.
I thought I could fix all these by putting an implicit do { ... }
around the argument line and removing the special-casing for an open-
ing brace, allowing anonymous hashrefs to work in formats, such
that this:
format =
@<<<< @>>>>
$foo, $bar, $baz
.
would turn into this:
format =
; formline "@<<<< @>>>>\n", do { $foo, $bar, $baz; };
.
But that will lead to madness like this ‘working’:
format =
@
}+do{
.
It would also stop lexicals declared in one format line from being
visible in another.
So instead this commit starts being honest with the parser. We still
have some ‘invented’ tokens, to indicate the start and end of a format
line, but now it is the parser itself that understands a sequence of
format lines, instead of being fed generated code.
So the example above is now presented to the parser like this:
format = ; FORMRBRACK
"@<<<< @>>>>\n" FORMLBRACK $foo, $bar, $baz ; FORMRBRACK
; .
Note about the semicolons: The parser expects to see a semicolon at
the end of each statement. So the lexer has to supply one before
FORMRBRACK. The final dot goes through the same code that handles
closing braces, which generates a semicolon for the same reason. It’s
easier to make the parser expect a semicolon before the final dot than
to change the } code in the lexer. We use the } code for . because it
handles the internal variables that keep track of how many nested
levels there are, of what kind, etc.
The extra ;FORMRBRACK after the = is there also to keep the lexer sim-
ple (ahem). When a newline is encountered during ‘normal’ (as opposed
to format picture) parsing inside a format, that’s when the semicolon
and FORMRBRACK are emitted. (There was already a semicolon there
before this commit. I have just added FORMRBRACK in the same spot.)
| |
This was brought up in ticket #43425.
The new slab allocator for ops (8be227ab5e) makes a CV responsible for
cleaning up its ops if it is freed prematurely (before the root is
attached).
Certain syntax errors involving formats can cause the parser to free
the CV owning an op when that op is on the PL_nextval stack.
This happens in these cases:
    format =
    @
    use; format
    strict
    .

    format =
    @
    ;use
    strict
    .

    format foo require bar
In the first two cases it is the line containing ‘strict’ that is
being interpreted as a format picture line and being fed through
force_next by S_scan_formline.
Then the error condition kicks in after the force, causing a
LEAVE_SCOPE in the parser (perly.c) which frees the sub owning the
const op for the format picture. Then a token with a freed op is fed
to the parser.
To make this clearer, when the second case above is parsed, the tokens
produced are as follows:
format =
[;] [formline] "@\n" [,]
; use [<word>]
[;] [formline] <freed>
[;] .
Notice how there is an implicit semicolon before each (implicit)
formline. Notice also how there is an implicit semicolon before the
final dot.
The <freed> thing represents the "strict\n" constant after it has been
freed. (-DT displays it as THING(opval=op_freed).)
When the implicit semicolon is emitted before a formline, the next two
tokens (formline and the string constant for this format picture line)
are put on to the PL_nextval stack via force_next.
It is when the implicit semicolon before "strict\n" is emitted that
the parser sees the error (there is only one path through the gram-
mar that uses the USE token, and it must have two WORDs following it;
therefore a semicolon after one WORD is an immediate error), calling
LEAVE_SCOPE, which frees the sub created by ‘use’, which owns the
const op on the PL_nextval stack containing the word "strict" and con-
sequently frees it.
I thought I could fix this by putting an implicit do { ... } around
the argument line. (This would fix another bug, whereby the argument
line in a format can ‘leak out’ of the formline(...).) But this does
not solve anything, as we end up with four tokens ( } ; formline
const ) on the PL_nextval stack when we emit the implicit semicolon
after ‘use’, instead of two.
format=
@
;use
strict
.
will turn into
format =
[;] [formline] "@\n" [,]
[do] [{] ; use [<word>] [;] [}]
[;] [formline] "strict\n"/<freed>
[;] .
It is when the lexer reaches "strict" that it will emit the semicolon
after the use. So we will be in the same situation as before.
So fixing the fact that the argument line can ‘leak out’ of the
formline and start a new statement won’t solve this particu-
lar problem.
I tried eliminating the LEAVE_SCOPE. (See
<https://rt.perl.org/rt3/Ticket/Display.html?id=43425#txn-273447>
where Dave Mitchell explains that the LEAVE_SCOPE is not strictly nec-
essary, but it is ‘still good to ensure that the savestack gets cor-
rectly popped during error recovery’.) That does not help, because
the lexer itself does ENTER/LEAVE to make sure form_lex_state and
lex_formbrack get restored properly after the lexer exits the format
(see 583c9d5cccf and 64a408986cf).
So when the final dot is reached, the ‘use’ CV is freed. Then an op
tree that includes the now-freed "strict\n" const op is passed to
newFORM, which tries to do op_free(block) (as of 3 commits ago; before
that the errors were more catastrophic), and ends up freeing an op
belonging to a freed slab.
Removing LEAVE_SCOPE did actually fix ‘format foo require bar’,
because there is no ENTER/LEAVE involved there, as the = (ENTER) has
not been reached yet. It was failing because ‘require bar’ would call
force_next for "bar", and then feed a REQUIRE token to the parser,
which would immediately see the error and call LEAVE_SCOPE (free-
ing the format), with the "bar" (belonging to the format’s slab)
still pending.
The final solution I came up with was to reuse a mechanism I came up
with earlier. Since the savestack may cause ops to outlive their CVs
due to SAVEFREEOP, opslab_force_free (called when an incomplete CV is
freed prematurely) will skip any op with o->op_savestack set. The
nextval stack can use the same flag. To make sure nothing goes awry
(we don’t want the same op on the nextval stack and the savestack at
the same time), I added a few assertions.
| |
Otherwise you can end up with a format that has a nonsensical op tree,
but it gets attached and you can call it anyway.
| |
As long as the argument line is not missing, braces can be used
for a format:
# valid
format foo {
@<<<
$a
}
but if the argument line is missing, you get a syntax error:
# invalid
format foo {
@<<<
}
and also if the name is missing:
# invalid
format {
@<<<
$a
}
If you use = then you can use a closing brace to terminate the format,
but only if the argument line is missing:
# valid, but useless
format =
@<<<
}
# invalid
format =
@<<<
$a
}
In commit 79072805 (perl 5.0 alpha 20), the lexer was changed to lie
to the parser about formats’ = and . delimiters, pretending they are
actually { and }, because the bracket-handling code takes care of all
the scoping issues (well, most of them).
Unfortunately, this implementation detail can leak through, but it is
far from consistent, as shown above.
Fixing this makes it easier to fix other bugs.
We do this by recording the fact that we are dealing with a format
token before jumping to the bracket code. And we feed format-specific
tokens to the parser in that case, so it can’t get fake brackets and
real brackets mixed up.