summaryrefslogtreecommitdiff
path: root/perly.h
Commit message (Collapse)AuthorAgeFilesLines
* run regen_perly.plDavid Mitchell2016-02-111-1/+1
|
* Add support for bison 3.0David Mitchell2016-02-031-6/+8
| | | | | | Mainly it no longer generates some tables used for debugging. This commit also adds a new define showing what bison version was used.
* [perl #127122] warn on unless (assignment) when syntax warnings are onTony Cook2016-01-211-5/+5
| | | | | | Previously the assignment was hidden by the not op wrapped around the condition, but newCONDOP() is sufficiently flexible that it isn't needed.
* given(): remove support for lexical $_David Mitchell2015-10-021-22/+42
| | | | | | | | | | | | | | There is dead code that used to allow my $_; ... given ($foo) { # lexical $_ aliased to $foo here } Now that lexical $_ has been removed, remove the code. I've left the signatures of the newFOO() functions unchanged; they just expect a target of 0 to always be passed now.
* perly.y: Remove type from ';'Father Chrysostomos2015-02-151-1/+1
| | | | | | This token’s type is never used. We don’t bother setting the type, either, in toke.c, so it will be garbage. Removing the type makes it harder to use the garbage value by mistake in refactoring.
* perly.y: Remove types for '$' and '*'Father Chrysostomos2015-02-071-1/+1
| | | | | | | These two tokens never use their value, and the value is not even set in toke.c, which means it will contain a junk value from some previous token. Removing the types prevents that junk value from being acci- dentally used.
* Parse and compile string- and num-specific bitopsFather Chrysostomos2015-01-311-42/+22
| | | | Yay, the semicolons are back.
* perly.y changes from Lukas Mai in RT 123069Peter Martini2015-01-181-22/+42
| | | | | | | | | This moves signatures before attributes in the grammar by creating separate branches for the prototype and signatures cases, so that the introduced block and the fact that signatures do not allow for declarations can be handled properly. Tests and regen_perly to follow.
* Simplify s/// and tr/// parsing logicFather Chrysostomos2015-01-081-1/+1
| | | | | | | | | | | | | | | | These two operators were being translated into subst("","") and tr("","") by the lexer. Then pmruntime in op.c would take apart the resulting list op. Instead of constructing a list op only to take it apart again, feed the replacement part to pmruntime separately. We can achieve this by introducing a new token ('/') that the parser rec- ognizes as introducing a replacement. If we had followed this approach to begin with, then bug #123542 would never have happened. (Actually, it seems the parser did know about the replacement part to begin with, but it changed in perl-5.8.0-4047-g131b3ad to fix some overloading problems.)
* perly.y: Don’t call op_lvalue on refgen kidFather Chrysostomos2015-01-061-1/+1
| | | | | | ck_spair also applies lvalue context to the kid ops, so we just end up calling op_lvalue twice on the same ops. It’s harmless (being idempo- tent), but wasteful.
* Unify format and named-sub pad weakref codeFather Chrysostomos2014-12-091-1/+1
|
* Deparse formats in the right spotFather Chrysostomos2014-12-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Formats need the same logic applied in 34b54951568, to avoid this: Input: { my $x; format STDOUT = @ $x . } __END__ $ pbpaste|./perl -Ilib -MO=Deparse { my $x; } format STDOUT = @ $x . __DATA__ - syntax OK That $x in the format is now global, not lexical. Also, we need to make sure the sequence numbers are correct. The statement after the format stopped having a higher sequence number than CvOUTSIDE_SEQ(format) in 8635e3c238f, because the ‘pending’ sequence number set aside for the nextstate op created just after com- pile-time block exit (which nextstate op comes before the block) was not actually being used by newFORM (unlike newATTRSUB), so the follow- ing statement was using that number.
* [perl #123286] Lone C-style for in a blockFather Chrysostomos2014-11-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A block in perl usually consists of an enter/leave pair plus the con- tents of the block: leave enter nextstate whatever But if the contents of the block are simple enough to forego the full block structure, a simple scope op is used, which is not even executed: scope ex-nextstate whatever If there is a real nextstate op anywhere in the block, it resets the stack to whatever it was at block entry, based on the value on the context stack placed there by the enter op. That’s why we can never have scope+nextstate (we have ex-nextstate, or a former nextstate op that is not executed). A for-loop (for(init; cond; cont) { ... }) executes the init section first, and then an unstack op, which is like nextstate in that it resets the stack based on what the context stack says is the base off- set for this block. If we have an unstack op, we can’t use scope, just as we can’t use it with nextstate. But we *were* nonetheless using scope in this case. Hence, map { for(...;...;...) {...} } ... caused the for-loop to reset the stack to the beginning of map’s own arguments. So the for-loop would stomp on them. We can see the same bug with ‘for’ clobbering an outer list: $ perl5.20.1 -le 'print 1..3, do{for(0;0;){}}, 4..6;' 0456
* [perl #77452] Deparse { ...; BEGIN{} } correctlyFather Chrysostomos2014-11-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | 8635e3c2 (5.21.6) changed the COP sequence numbers for nested blocks, such that most BEGIN blocks (incl. ‘use’ statements) and sub declara- tions end up in the right place. However, it had the side effect of causing declarations at the end of the enclosing scope to fall out of it and appear below. This commit fixes that by adding an extra nulled COP to the end of the enclosing scope if that scope ends with a sub, so the final declara- tion gets deparsed before it. The frequency of sub declarations at the end of the enclosing scope is sufficiently low (I’m guessing a bit here) that this slight increase in run-time memory usage is probably acceptable. I had to change B::Deparse to deparse nulled COPs the same way it does live COPs, which means we get more extraneous semicolons than before. I hope to fix that in a forthcoming commit. I also ran into a B bug, in that null ops are not presented to Perl code with the right op class (see the blessing in the patch). I plan to fix that in a separ- ate commit, too.
* [perl #77452] Deparse BEGIN blocks in the right placeFather Chrysostomos2014-11-061-42/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the op tree, a statement consists of a nextstate/dbstate op (of class cop) followed by the contents of the statement. This cop is created after the statement has been parsed. So if you have nested statements, the outermost statement has the highest sequence number (cop_seq). Every sub (including BEGIN blocks) has a sequence number indicating where it occurs in its containing sub. So BEGIN { } #1 # seq 2 { # seq 1 ... } is indistinguishable from # seq 2 { BEGIN { } #1 # seq 1 ... } because the sequence number of the BEGIN block is 1 in both examples. By reserving a sequence number at the start of every block and using it once the block has finished parsing, we can do this: BEGIN { } #1 # seq 1 { # seq 2 ... } # seq 1 { BEGIN { } #2 # seq 2 ... } and now B::Deparse can tell where to put the blocks. PL_compiling.cop_seq was unused, so this is where I am stashing the pending sequence number.
* rename convert to op_convert_list and APIfyLukas Mai2014-10-261-22/+42
|
* Remove redundant op_lvalue calls in perly.yFather Chrysostomos2014-10-241-1/+1
| | | | | | | | | | | | | When (\$x)=\$y is compiled, the \ on the lhs gives lvalue context to its argument by calling op_lvalue. Then later the = gives lvalue con- text to the \, calling op_lvalue again, which transforms the $x into an lvref op (via op.c:S_lvref). I just copied that logic when I extended aliasing via reference to foreach \$x. But here, we don’t need to call op_lvalue on the $x, because we know it is going to go through op.c:S_lvref, which doesn’t care whether it has been through op_lvalue already or not. The end result is the same.
* foreach \$varFather Chrysostomos2014-10-111-42/+22
| | | | Some passing tests are still marked to-do. We need more tests still.
* Make OP_METHOD* to be of new class METHOPsyber2014-10-031-22/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce a new opcode class, METHOP, which will hold class/method related info needed at runtime to improve performance of class/object method calls, then change OP_METHOD and OP_METHOD_NAMED from being UNOP/SVOP to being METHOP. Note that because OP_METHOD is a UNOP with an op_first, while OP_METHOD_NAMED is an SVOP, the first field of the METHOP structure is a union holding either op_first or op_sv. This was seen as less messy than having to introduce two new op classes. The new op class's character is '.' Nothing has changed in functionality and/or performance by this commit. It just introduces new structure which will be extended with extra fields and used in later commits. Added METHOP constructors: - newMETHOP() for method ops with dynamic method names. The only optype for this op is OP_METHOD. - newMETHOP_named() for method ops with constant method names. Optypes for this op are: OP_METHOD_NAMED (currently) and (later) OP_METHOD_SUPER, OP_METHOD_REDIR, OP_METHOD_NEXT, OP_METHOD_NEXTCAN, OP_METHOD_MAYBENEXT (This commit includes fixups by davem)
* In perly.y, change PL_parser to parserFather Chrysostomos2014-08-241-1/+1
| | | | | | | | | | | | | | | | | All these code snippets are embedded inside a function (perly.c:yyparse) that puts the current value of PL_parser in a local variable named parser. So the two are equivalent, but the latter only has to access a local variable. Before: $ ls -ld perly.o -rw-r--r-- 1 sprout staff 94748 Aug 22 06:12 perly.o After: $ ls -ld perly.o -rw-r--r-- 1 sprout staff 94340 Aug 22 06:15 perly.o
* Set PL_expect only once after curly subscriptsFather Chrysostomos2014-08-241-1/+1
| | | | | | | | | | | | When curly subscripts are parsed, the lexer (toke.c:yylex) notes that the value of PL_expect needs to be set to XSTATE (expecting a state- ment) after the final brace. When the final brace is encountered, PL_expect is set to that recorded value. But then the parser (perly.y) sets it to XOPERATOR immediately thereafter. This approach requires a plethora of identical statements in perly.y. If we just set PL_expect to the right value to begin with, we can avoid all those assignments.
* Set PL_expect less often when parsing semicolonsFather Chrysostomos2014-08-241-42/+22
| | | | | | | | | | | | | | | | | | | | | | | | As it worked before, the parser (perly.y) would set PL_expect to XSTATE after encountering a statement-terminating semicolon. Two functions in op.c--package and utilize--had to set the value to XSTATE as a result. Also, in the case of a closing brace, the lexer emits an implicit semicolon followed by '}' (emitted via force_next). force_next records the value of PL_expect and restores it when emitting the token. So in this case the value of PL_expect was flipping back and forth between two values. Instead of having the parser set it to XSTATE, we can have the lexer set it to XSTATE by default when emitting an explicit semicolon. (It was setting it to XTERM.) The parser can set it to XTERM in the only place that matters; viz., the header of a for-loop. This simplifies things conceptually, and makes the code a whole line shorter. (The diff stat shows more savings in line count, but that is because the version of bison I used to regenerate the tables produces smaller headers than what was already committed.)
* Remove MAD.Jarkko Hietaniemi2014-06-131-23/+27
| | | | | | MAD = Misc Attribute Decoration; unmaintained attempt at preserving the Perl parse tree more faithfully so that automatic conversion to Perl 6 would have been easier.
* subroutine signaturesZefram2014-02-011-25/+9
| | | | | | | | | | Declarative syntax to unwrap argument list into lexical variables. "sub foo ($a,$b) {...}" checks number of arguments and puts the arguments into lexical variables. Signatures are not equivalent to the existing idiom of "sub foo { my($a,$b) = @_; ... }". Signatures are only available by enabling a non-default feature, and generate warnings about being experimental. The syntactic clash with prototypes is managed by disabling the short prototype syntax when signatures are enabled.
* Remove support for "do SUBROUTINE(LIST)"Dagfinn Ilmari Mannsåker2013-12-221-22/+42
| | | | | It's been deprecated (and emitting a warning) since Perl v5.0.0, and support for it consitutes nearly 3% of the grammar.
* ->$#*Father Chrysostomos2013-11-241-1/+1
|
* Allow ->@ ->$ interpolation under postderef_qq featureFather Chrysostomos2013-10-051-13/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This turned out to be tricky. Normally @ at the beginning of the interpolated code signals to the lexer to emit ‘join($",’ immediately. With "$_->@*" we would have to retract the $ _ -> tokens upon encoun- tering @*, which we obviously cannot do. Waiting until we reach the end of the interpolated text before emit- ting anything could not work either, as it may contain BEGIN blocks that affect the way part of the interpolated code is parsed. So what we do is introduce an egregious or clever hack, depending on how you look at it. Normally, the lexer turns "@foo" into: stringify ( join ( $ " , @ foo ) ) (The " is a WORD token, representing a variable name.) "$_" becomes: stringify ( $ _ ) We can turn "$_->@*" into: stringify ( $ _ -> @ * POSTJOIN ) Where POSTJOIN is a new lexer token with special handling that creates a join op just the way join($", ...) does. To make "foo$_->@*bar" work as well, we have to make POSTJOIN have precedence just below ->, so that stringify ( "foo" . $ _ -> @ * POSTJOIN . "bar" ) (what the parser sees) is equivalent to: stringify ( "foo" . ( $ _ -> @ * POSTJOIN ) . "bar" )
* ->%{ ->%[Father Chrysostomos2013-10-051-1/+1
|
* Postfix dereference syntaxFather Chrysostomos2013-10-051-1/+1
| | | | | | | | | | | | | | | $_->$* means $$_ (and compiled down to the same op tree) $_->@* means @$_ ( ditto ditto blah blah blah ) $_->%* means %$_ (...) $_->&* means &$_ $_->** means *$_ $_->@[...] means @$_[...] $_->@{...} means @$_{...} $_->*{...} means *$_{...} $_->@* is not always equivalent to @$_, particularly in contexts like @foo[0], which cannot be written foo->@*[0]. (Just omit the asterisk and it works.)
* Reduce false positives for @hsh{$s} and @ary[$s] warningsFather Chrysostomos2013-09-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This resolves tickets #28380 and #114024. Commit 95a31aad5 did something similar to this for the new %hash{...} syntax. This commit extends it to @ slices and combines the two code paths. The heuristics in toke.c can easily produce false positives. So the op is flagged as being a candidate for the warning. Then when op.c has the op tree available, it examines it to see whether the heuristic may have been a false positive. This avoids bugs with qw "foo bar baz" and sub calls triggering the warning. The source code is no longer available for the warning, so we recon- struct it from the op tree, skipping the subscript if it is anything other than a const op. This means that @hash{$foo} comes out as @hash{...} and @hash{foo} as @hash{"foo"}. It also meeans that @hash{"]"} is displayed correctly instead of as @hash{"]. Commit 95a31aad5 also modified the heuristic for %hash{...} to exempt qw altogether. But it did not exempt it if it was preceded by a tab. So this commit rectifies that. This commit also improves the false positive detection by exempting any ops returning lists that can get past toke.c’s heuristic. I went through the entire list of ops, but I may have missed some. Also, @ slices on the lhs of = are exempt, as they change the context and are hence actually useful.
* Fewer false positives for %hash{$scalar} warningFather Chrysostomos2013-09-131-42/+22
| | | | | | | | | | | | | | | | | | | | | | | | Instead of warning in the lexer, flag the op and then warn in op.c, when the op tree is available, so we don’t end up warning for actual lists or for sub calls. Also, only warn in scalar context, as in list context $hash{$scalar} and %hash{$scalar} do different things. In op.c we no longer have easy access to the source code, so recon- struct the hash/array access based on the op tree. This means %hash{foo} becomes %hash{"foo"}. We only reconstruct constant keys, so %hash{++$x} becomes %hash{...}. This also corrects erroneous dumps, like %hash{"} for %hash{"}"}. Instead of triggering the warning solely based on the op tree, we still keep the heuristic in toke.c, so that common workarounds for that warning (e.g., {q<key>} and {("key")}) continue to work. The heuristic in toke.c is tweaked to avoid warning for qw(). In a future commit I plan to extend this to the existing @array[0] and @hash{key} warnings, to avoid false positives.
* index/value array slice operationRuslan Zakirov2013-09-131-1/+1
| | | | | | kvaslice operator that imlements %a[0,2,4] syntax which result in list of index/value pairs. Implemented in consistency with "key/value hash slice" operator.
* key/value hash slice operationRuslan Zakirov2013-09-131-22/+42
| | | | | | kvhslice operator that implements %h{1,2,3,4} syntax which returns list of key value pairs rather than just values (regular slices).
* Correct three sub call comments in perly.yFather Chrysostomos2013-05-311-42/+22
| | | | | | NOAMP is only emitted by toke.c when there are no parentheses. If there is a parenthesis following a word, the lexer conjures up an '&' token from nowhere.
* Improve the logic in regen_perly.pl for manually generating token macros.Nicholas Clark2013-05-061-1/+1
| | | | | | | | | | | | | Without this fixup the build breaks on Win32. Previously it was enabled using a second regular expression to parse the bison version string. This effectively duplicates the first regular expression that parses the bison version string to determine if the version of bison found can be used. The second regular expression could get missed when changing the first, as happened with commit f5cf236ddbe8e24e. So refactor the code to extract the bison version once, and then use this later as a simple numeric comparison. With this change, we can actually use bison 2.7 to regenerate perly.*
* run regen_perlyDavid Mitchell2013-03-031-22/+42
| | | | | this is needed due to the change to regen_perly.pl. Otherwise regen.t starts complaining. The actual diff is just noise.
* Fix invalid token warning with PERL_XMLDUMP and labelFather Chrysostomos2012-11-041-1/+1
| | | | | | | | | | | Under mad builds, commit 5db1eb8 caused this warning: $ PERL_XMLDUMP=/dev/null ./perl -Ilib -e 'foo:' Invalid TOKEN object ignored at -e line 1. Since I don’t understand the mad code so well, the easiest fix is to revert back to using a PV, as we did before 5db1eb8. To record the utf8ness, we sneak it behind the trailing null.
* Stop statement labels from leakingFather Chrysostomos2012-11-041-42/+22
| | | | | They have leaked since v5.15.9-35-g5db1eb8 (which probably broke mad dumping of labels; to be addressed in the next commit).
* apparently this actually needs to be regenerated tooJesse Luehrs2012-09-251-23/+43
|
* perly.y: Remove MYSUBFather Chrysostomos2012-09-151-55/+53
| | | | This token is not used any more.
* Clone my subs on scope entryFather Chrysostomos2012-09-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The pad slot for a my sub now holds a stub with a prototype CV attached to it by proto magic. The prototype is cloned on scope entry. The stub in the pad is used when cloning, so any code that references the sub before scope entry will be able to see that stub become defined, making these behave similarly: our $x; BEGIN { $x = \&foo } sub foo { } our $x; my sub foo { } BEGIN { $x = \&foo } Constants are currently not cloned, but that may cause bugs in pad_push. I’ll have to look into that. On scope exit, lexical CVs go through leave_scope’s SAVEt_CLEARSV sec- tion, like lexical variables. If the sub is referenced elsewhere, it is abandoned, and its proto magic is stolen and attached to a new stub stored in the pad. If the sub is not referenced elsewhere, it is undefined via cv_undef. To clone my subs on scope entry, we create a sequence of introcv and clonecv ops. See the huge comment in block_end that explains why we need two separate ops for each CV. To allow my subs to be defined in inner subs (my sub foo; sub { sub foo {} }), pad_add_name_pvn and S_pad_findlex now upgrade the entry for a my sub to a CV to begin with, so that fake entries added to pads (fake entries are those that reference outer pads) can share the same CV. Otherwise newMYSUB would have to add the CV to every pad that closes over the ‘my sub’ declaration. newMYSUB no longer throws away the initial value replacing it with a new one. Prototypes are not currently visible to sub calls at compile time, because the lexer sees the empty stub. A future commit will solve that. When I added name heks to CV’s I made mistakes in a few places, by not turning on the CVf_NAMED flag, or by not clearing the field when free- ing the hek. Those code paths were not exercised enough by state subs, so the problems did not show up till now. So this commit fixes those, too. One of the tests in lexsub.t, involving foreach loops, was incorrect, and has been fixed. Another test has been added to the end for a par- ticular case of state subs closing over my subs that I broke when ini- tially trying to get sibling my subs to close over each other, before I had separate introcv and clonecv ops.
* Clone state subs in anon subsFather Chrysostomos2012-09-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since state variables are not shared between closures, but only between invocations of the same closure, state subs should behave the same way. This was a little tricky. When we clone a sub, we now clone inner state subs at the same time. When walking through the pad, cloning items, we cannot simply clone the inner sub when we see it, because it may close over things we haven’t cloned yet: sub { state sub foo; my $x sub foo { $x } } We can’t just delay cloning it and do it afterwards, because they may be multiple subs closing over each other: sub { state sub foo; state sub bar; sub foo { \&bar } sub bar { \&foo } } So *all* the entries in the new pad must be filled before any inner subs can be cloned. So what we do is put a stub in place of the cloned sub. And then in a second pass clone the inner subs, reusing the stubs from the first pass.
* Look up state subs in the padFather Chrysostomos2012-09-151-1/+1
| | | | | | | | | | This commit does just enough to get things compiling. The padcv op is still unimplemented (in fact, converting the padany to a padcv is still not done), so you can’t actually run the code yet. Bareword lookup in yylex now produces PRIVATEREF tokens for state subs, so the grammar has been adjusted to accept a ‘subname’ in sub calls (PRIVATEREF or WORD) where previously only a WORD was permitted.
* Store state subs in the padFather Chrysostomos2012-09-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | In making ‘sub foo’ respect previous ‘our sub’ declarations in a recent commit, I actually made ‘state sub foo’ into a syntax error. (At the time, I patched up MYSUB in perly.y to keep the tests for ‘"my sub" not yet implemented’ still working.) Basically, it was creat- ing an empty pad entry, but returning something that perly.y was not expecting. This commit adjusts the grammar to allow the SUB branch of barestmt to accept a PRIVATEREF for its subname, in addition to a WORD. It reuses the subname rule that SUB used to use (before our subs were added), gutting it to remove the special block handling, which SUB now tokes care of. That means the MYSUB rule will no longer turn on CvSPECIAL on the PL_compcv that is going to be thrown away anyway. The code for special blocks (BEGIN, END, etc.) that turns on CvSPECIAL now checks for state subs and skips those. It only applies to our subs and package subs. newMYSUB has now actually been written. It basically duplicates newATTRSUB, except for GV-specific things. It does currently vivify a GV and set CvGV, but I am hoping to change that later. I also hope to merge some of the code later, too. I changed the prototype of newMYSUB to make it easier to use. It is not used anywhere on CPAN and has always simply died, so that should be all right.
* Make ‘sub foo{}’ respect ‘our foo’Father Chrysostomos2012-09-151-1/+1
| | | | | | | | | | | | | | | | | This commit switches all sub definitions, whether with ‘our’ or not, to using S_force_ident_maybe_lex (formerly known as S_pending_ident). This means that an unqualified (no our/my/state or package prefix) ‘sub foo’ declaration does a pad lookup, just like $foo. It turns out that the vivification that I added to the then S_pending_ident for CVs was unnecessary and actually buggy. We *don’t* want to autovivify GVs for CVs, because they might be con- stants or forward declarations, which are stored in a simpler form. I also had to change the subname rule used by MYSUB in perly.y, since it can now be fed a PRIVATEREF, which it does not expect. This may prove to be temporary, but it keeps current tests passing.
* Allocate ‘our sub’ in the padFather Chrysostomos2012-09-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the name is only allocated there. Nothing fetches it yet. Notes on the implementation: S_pending_ident contains the logic for determining whether $foo or @foo refers to a lexical or package variable. yylex defers to S_pending_ident if PL_pending_ident is set. The KEY_sub case in yylex is changed to set PL_pending_ident instead of using force_word. For package variables (including our), S_pending_ident returns a WORD token, which is the same thing that force_word produces. So *that* aspect of this change does not affect the grammar. However.... The barestmt rule’s SUB branch begins with ‘SUB startsub subname’. startsub is a null rule that creates a new sub in PL_compcv via start_subparse(). subname is defined in terms of WORD and also checks whether this is a special block, turning on CvSPECIAL(PL_compcv) if it is. That flag has to be visible during compilation of the sub. But for a lexical name, such as ‘our foo’, to be allocated in the right pad, it has to come *before* startsub, i.e., ‘SUB subname startsub’. But subname needs to modify the sub that startsub created, set- ting the flag. So I copied (not moved, because MYSUB still uses it) the name-checking code from the subname rule into the SUB branch of barestmt. Now that uses WORD directly instead of invoking subname. That allows the code there to set everything up in the right order.
* Don’t let format arguments ‘leak out’ of formlineFather Chrysostomos2012-08-081-45/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When parsing formats, the lexer invents tokens to feed to the parser. So when the lexer dissects this: format = @<<<< @>>>> $foo, $bar, $baz . The parser actually sees this (the parser knows that = . is like { }): format = ; formline "@<<<< @>>>>\n", $foo, $bar, $baz; . The lexer makes no effort to make sure that the argument line is con- tained within formline’s arguments. To make { do_stuff; $foo, bar } work, the lexer supplies a ‘do’ before the block, if it is inside a format. This means that $a, $b; $c, $d feeds ($a, $b) to formline, wheras { $a, $b; $c, $d } feeds ($c, $d) to formline. It also has various other strange effects: This script prints "# 0" as I would expect: print "# "; format = @ (0 and die) . write This one, locking parentheses, dies because ‘and’ has low precedence: print "# "; format = @ 0 and die . write This does not work: my $day = "Wed"; format = @<<<<<<<<<< ({qw[ Sun 0 Mon 1 Tue 2 Wed 3 Thu 4 Fri 5 Sat 6 ]}->{$day}) . write You have to do this: my $day = "Wed"; format = @<<<<<<<<<< ({my %d = qw[ Sun 0 Mon 1 Tue 2 Wed 3 Thu 4 Fri 5 Sat 6 ]; \%d}->{$day}) . write which is very strange and shouldn’t even be valid syntax. This does not work, because ‘no’ is not allowed in an expression: use strict; $::foo = "bar" format = @<<<<<<<<<<< no strict; $foo . write; Putting a block around it makes it work. Putting a semicolon before ‘no’ stop it from being a syntax error, but it silently does the wrong thing. I thought I could fix all these by putting an implicit do { ... } around the argument line and removing the special-casing for an open- ing brace, allowing anonymous hashrefs to work in formats, such that this: format = @<<<< @>>>> $foo, $bar, $baz . would turn into this: format = ; formline "@<<<< @>>>>\n", do { $foo, $bar, $baz; }; . But that will lead to madness like this ‘working’: format = @ }+do{ . It would also stop lexicals declared in one format line from being visible in another. So instead this commit starts being honest with the parser. We still have some ‘invented’ tokens, to indicate the start and end of a format line, but now it is the parser itself that understands a sequence of format lines, instead of being fed generated code. So the example above is now presented to the parser like this: format = ; FORMRBRACK "@<<<< @>>>>\n" FORMLBRACK $foo, $bar, $baz ; FORMRBRACK ; . Note about the semicolons: The parser expects to see a semicolon at the end of each statement. So the lexer has to supply one before FORMRBRACK. The final dot goes through the same code that handles closing braces, which generates a semicolon for the same reason. It’s easier to make the parser expect a semicolon before the final dot than to change the } code in the lexer. We use the } code for . because it handles the internal variables that keep track of how many nested lev- els there, what kind, etc. The extra ;FORMRBRACK after the = is there also to keep the lexer sim- ple (ahem). When a newline is encountered during ‘normal’ (as opposed to format picture) parsing inside a format, that’s when the semicolon and FORMRBRACK are emitted. (There was already a semicolon there before this commit. I have just added FORMRBRACK in the same spot.)
* Prevent double frees/crashes with format syntax errsFather Chrysostomos2012-08-081-1/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This was brought up in ticket #43425. The new slab allocator for ops (8be227ab5e) makes a CV responsible for cleaning up its ops if it is freed prematurely (before the root is attached). Certain syntax errors involving formats can cause the parser to free the CV owning an op when that op is on the PL_nextval stack. This happens in these cases: format = @ use; format strict . format = @ ;use strict . format foo require bar In the first two cases it is the line containing ‘strict’ that is being interpreted as a format picture line and being fed through force_next by S_scan_formline. Then the error condition kicks in after the force, causing a LEAVE_SCOPE in the parser (perly.c) which frees the sub owning the const op for the format picture. Then a token with a freed op is fed to the parser. To make this clearer, when the second case above is parsed, the tokens produced are as follows: format = [;] [formline] "@\n" [,] ; use [<word>] [;] [formline] <freed> [;] . Notice how there is an implicit semicolon before each (implicit) formline. Notice also how there is an implicit semicolon before the final dot. The <freed> thing represents the "strict\n" constant after it has been freed. (-DT displays it as THING(opval=op_freed).) When the implicit semicolon is emitted before a formline, the next two tokens (formline and the string constant for this format picture line) are put on to the PL_nextval stack via force_next. It is when the implicit semicolon before "strict\n" is emitted that the parser sees the error (there is only one path through the gram- mar that uses the USE token, and it must have two WORDs following it; therefore a semicolon after one WORD is an immediate error), calling LEAVE_SCOPE, which frees the sub created by ‘use’, which owns the const op on the PL_nextval stack containing the word "strict" and con- sequently frees it. I thought I could fix this by putting an implicit do { ... } around the argument line. (This would fix another bug, whereby the argument line in a format can ‘leak out’ of the formline(...).) But this does not solve anything, as we end up with four tokens ( } ; formline const ) on the PL_nextval stack when we emit the implicit semicolon after ‘use’, instead of two. format= @ ;use strict . will turn into format = [;] [formline] "@\n" [,] [do] [{] ; use [<word>] [;] [}] [;] [formline] "strict\n"/<freed> [;] . It is when the lexer reaches "strict" that it will emit the semicolon after the use. So we will be in the same situation as before. So fixing the fact that the argument line can ‘leak out’ of the formline and start a new statement won’t solve this particu- lar problem. I tried eliminating the LEAVE_SCOPE. (See <https://rt.perl.org/rt3/Ticket/Display.html?id=43425#txn-273447> where Dave Mitchell explains that the LEAVE_SCOPE is not strictly nec- essary, but it is ‘still good to ensure that the savestack gets cor- rectly popped during error recovery’.) That does not help, because the lexer itself does ENTER/LEAVE to make sure form_lex_state and lex_formbrack get restored properly after the lexer exits the format (see 583c9d5cccf and 64a408986cf). So when the final dot is reached, the ‘use’ CV is freed. Then an op tree that includes the now-freed "strict\n" const op is passed to newFORM, which tries to do op_free(block) (as of 3 commits ago; before that the errors were more catastrophic), and ends up freeing an op belonging to a freed slab. Removing LEAVE_SCOPE did actually fix ‘format foo require bar’, because there is no ENTER/LEAVE involved there, as the = (ENTER) has not been reached yet. It was failing because ‘require bar’ would call force_next for "bar", and then feed a REQUIRE token to the parser, which would immediately see the error and call LEAVE_SCOPE (free- ing the format), with the "bar" (belonging to the format’s slab) still pending. The final solution I came up with was to reuse an mechanism I came up with earlier. Since the savestack may cause ops to outlive their CVs due to SAVEFREEOP, opslab_force_free (called when an incomplete CV is freed prematurely) will skip any op with o->op_savestack set. The nextval stack can use the same flag. To make sure nothing goes awry (we don’t want the same op on the nextval stack and the savestack at the same time), I added a few assertions.
* Don’t create formats after compilation errorsFather Chrysostomos2012-08-081-1/+1
| | | | | Otherwise you can end up with a format that has nonsensical op tree, but it gets attached and you can call it anyway.
* Forbid braces as format delimitersFather Chrysostomos2012-08-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As long the argument line is not missing, braces can be used for a format: # valid format foo { @<<< $a } but if the argument line is missing, you get a syntax error: # invalid format foo { @<<< } and also if the name is missing: # invalid format { @<<< $a } If you use = then you can use a closing brace to terminate the format, but only if the argument line is missing: # valid, but useless format = @<<< } # invalid format = @<<< $a } In commit 79072805 (perl 5.0 alpha 20), the lexer was changed to lie to the parser about formats’ = and . delimiters, pretending they are actually { and }, because the bracket-handling code takes care of all the scoping issues (well, most of them). Unfortunately, this implementation detail can leak through, but it is far from consistent, as shown above. Fixing this makes it easier to fix other bugs. We do this by recording the fact that we are dealing with a format token before jumping to the bracket code. And we feed format-specific tokens to the parser in that case, so it can’t get fake brackets and real brackets mixed up.