path: root/toke.c
Commit message | Author | Age | Files | Lines
* (perl #125351) abort parsing if parse errors happen in a sub lex | Steve Hay | 2018-03-19 | 1 | -0/+18
| | | | | | | | | | | | | | | | | | We've had a few reports of segmentation faults and other misbehaviour when sub-parsing, such as within interpolated expressions, fails. This change aborts compilation if anything complex enough to not be parsed by the lexer is compiled in a sub-parse *and* an error occurs within the sub-parse. An earlier version of this patch failed on simpler expressions, which caused many test failures, which this version doesn't (which may just mean we need more tests...) (cherry picked from commit bb4e4c3869d9fb6ee5bddd820c2a373601ecc310) Modified for maint by cherry-picker: New parser struct members moved to end of struct to preserve backwards-compatibility.
* (perl #131949) adjust s in case peekspace() moves the line string | Tony Cook | 2018-03-12 | 1 | -1/+4
| | | | (cherry picked from commit 1141a2c757171575dd43caa4b731ca4f491c2bcf)
* (perl #131836) avoid a use-after-free after parsing a "sub" keyword | Tony Cook | 2018-03-12 | 1 | -0/+2
| | | | | | | | | | | | | | | The: d = skipspace(d); can reallocate linestr in the test case, invalidating s. This would end up in PL_bufptr from the embedded (PL_bufptr = s) in the TOKEN() macro. Assigning s to PL_bufptr and restoring s from PL_bufptr allows lex_next_chunk() to adjust the pointer to the reallocated buffer. (cherry picked from commit 3b8804a4c2320ae4e7e713c5836d340eb210b6cd)
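The fix described above parks the local cursor in a slot that the buffer owner knows about, so the owner can re-point it when the buffer moves. A minimal sketch of that idea follows. The names `parser_state`, `next_chunk` and `read_more_with_cursor` are hypothetical stand-ins, not the real `PL_bufptr` or `lex_next_chunk()` code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser state: the buffer owner tracks one cursor,
 * playing the role PL_bufptr plays for PL_linestr. */
typedef struct {
    char  *linestr;   /* start of the line buffer (NUL-terminated) */
    size_t len;       /* bytes in use, excluding the NUL */
    char  *bufptr;    /* tracked cursor into linestr */
} parser_state;

/* Append a chunk. The buffer may move, so the tracked cursor is
 * rebuilt from its offset. Returns 0 on success, -1 on failure. */
int next_chunk(parser_state *p, const char *chunk)
{
    size_t add = strlen(chunk);
    ptrdiff_t off = p->bufptr - p->linestr;
    char *moved = realloc(p->linestr, p->len + add + 1);
    if (!moved)
        return -1;
    memcpy(moved + p->len, chunk, add + 1);
    p->linestr = moved;
    p->len += add;
    p->bufptr = moved + off;   /* only the tracked cursor is fixed up */
    return 0;
}

/* Park the local cursor s in the tracked slot, read more input,
 * then restore s from the (possibly adjusted) tracked slot. */
char *read_more_with_cursor(parser_state *p, char *s, const char *chunk)
{
    assert(s >= p->linestr && s <= p->linestr + p->len);
    p->bufptr = s;
    if (next_chunk(p, chunk) != 0)
        return NULL;
    return p->bufptr;
}
```

Any other raw pointer into the old buffer would still dangle after the call, which is why the local cursor has to travel through the tracked slot.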
* (perl #131793) sanely handle PL_linestart > PL_bufptr | Tony Cook | 2018-03-12 | 1 | -4/+15
| | | | | | | | | In the test case, scan_ident() ends up fetching another line (updating PL_linestart), and since in this case we don't successfully parse ${identifier}, s (and PL_bufptr) ends up being before PL_linestart. (cherry picked from commit 36000cd1c47863d8412b285701db7232dd450239)
* Revert "Respect hashbangs containing perl6" | Leon Timmermans | 2017-03-20 | 1 | -2/+0
| | | | | | | | | | | This reverts commit d9fc04eebe29b8cf5f6f6bf31373b202eafa44d6. As discussed in http://www.nntp.perl.org/group/perl.perl5.porters/2016/05/msg236423.html, the current perl6-shebang code has rather sharp edge-cases. Hence a revert until we come up with a better solution seems wise. (cherry picked from commit f691e4455dd520eff11e7f070a9b034b0fa5ca1c)
* [perl #130814] update pointer into PL_linestr after lookahead | Hugo van der Sanden | 2017-02-21 | 1 | -0/+4
| | | | | | Looking ahead for the "Missing $ on loop variable" diagnostic can reallocate PL_linestr, invalidating our pointer. Save the offset so we can update it in that case.
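Saving an offset rather than a pointer is a general way to survive a reallocation. The sketch below illustrates it with a hypothetical `line_buf` type standing in for `PL_linestr`. It is not the code from the commit.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a lexer line buffer such as PL_linestr. */
typedef struct {
    char  *data;
    size_t len;   /* bytes in use, excluding the trailing NUL */
    size_t cap;   /* bytes allocated */
} line_buf;

/* Append text. This may move buf->data, just as a lookahead that
 * reads more input can. Returns 0 on success, -1 on failure. */
int line_buf_append(line_buf *buf, const char *text)
{
    size_t add = strlen(text);
    if (buf->len + add + 1 > buf->cap) {
        size_t new_cap = (buf->len + add + 1) * 2;
        char *moved = realloc(buf->data, new_cap);
        if (!moved)
            return -1;
        buf->data = moved;
        buf->cap = new_cap;
    }
    memcpy(buf->data + buf->len, text, add + 1);
    buf->len += add;
    return 0;
}

/* Look ahead (which may reallocate), then rebuild the cursor from
 * the offset saved before the lookahead. */
char *lookahead_keep_cursor(line_buf *buf, char *cursor, const char *more)
{
    size_t offset;
    assert(cursor >= buf->data && cursor <= buf->data + buf->len);
    offset = (size_t)(cursor - buf->data);   /* save offset, not pointer */
    if (line_buf_append(buf, more) != 0)
        return NULL;
    return buf->data + offset;               /* valid even if data moved */
}
```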
* Moving variables to their innermost scope. | Andy Lester | 2017-02-18 | 1 | -6/+6
| | | | | | Some vars have been tagged as const because they do not change in their new scopes. In pp_reverse in pp.c, I32 tmp is only used to hold a char, so is changed to char.
* Improve handling pattern compilation errors | Karl Williamson | 2017-02-14 | 1 | -4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Perl tries to continue parsing in the face of errors for the convenience of the person running the script, so as to batch up as many errors as possible, and cut down the number of runs. Some errors will, however, have a cascading effect, resulting in the parser getting confused as to the intent. Perl currently aborts parsing if 10 errors accumulate. However, some things are reparsed as compilation continues, in particular tr///, s///, and qr//. The code that reparses has an expectation of basic sanity in what it is looking at, and so reparsing with known errors can lead to segfaults. Recent commits have tightened this up to avoid reparsing, or substitute valid stuff before reparsing. This all works, as the code won't execute until all the errors get fixed. Commit f065e1e68bf6a5541c8ceba8c9fcc6e18f51a32b changed things so that if there is an error in parsing a pattern, the whole compilation is immediately aborted. Since then, I realized it would be relatively simple to instead, skip compilation of that particular pattern, but continue on with the parsing of the program as a whole, up to the maximum number of allowed errors. And again the program will refuse to execute after compilation if there were any errors. This commit implements that, the benefit being that we don't try to reparse a pattern that failed the original parse, but can go on to find errors elsewhere in the program.
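The policy described above (count the error, never compile the failed pattern, keep parsing until the error limit) can be sketched as follows. All names are hypothetical. The limit of 10 comes from the commit message, and the "compile" step is a placeholder for the real regex compiler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ERRORS 10   /* the limit mentioned in the commit message */

typedef struct {
    int  error_count;
    bool aborted;          /* set once MAX_ERRORS is reached */
} parse_state;

typedef struct {
    const char *source;
    bool parse_failed;     /* set by the tokenizer for a bad pattern */
    bool compiled;
} pattern;

void record_error(parse_state *ps)
{
    if (++ps->error_count >= MAX_ERRORS)
        ps->aborted = true;
}

/* Compile a pattern only if tokenizing it succeeded.
 * Returns false once parsing as a whole must stop. */
bool handle_pattern(parse_state *ps, pattern *pat)
{
    assert(ps != NULL && pat != NULL);
    if (pat->parse_failed) {
        record_error(ps);      /* count it, but never compile it */
        pat->compiled = false;
    } else {
        pat->compiled = true;  /* placeholder for the regex compiler */
    }
    return !ps->aborted;
}
```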
* toke.c: Make sure things are initialized | Karl Williamson | 2017-02-13 | 1 | -0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 3dd4eaeb8ac39e08179145b86aedda36584a3509 fixed a bug wherein the tr/// operator parsing code could be looking at uninitialized data. This happens only because we try to carry on when we find errors, so as to find as many errors as possible in a single run, as a convenience to the person debugging the script being compiled. And we failed to initialize stuff upon getting an error; stuff that was later looked at by tr///. That commit fixed the ticket by making sure the things mentioned there got initialized upon error, but didn't handle the various other places in the loop where the same thing could happen. At the time, I thought it would be easier to instead change the tr/// handling code to know that its inputs were problematic, and to avoid looking at them in that case. This is easily done, and would automatically catch all the cases in the loop, now and any added in the future. But then I thought, maybe tr/// isn't the only operator that could be thrown off by this. It is the most obvious one, to someone who knows how it goes about getting compiled; but there may be other operators that I don't know how they get compiled and have the same or a similar problem. The better solution then would be to extend 3dd4eaeb8ac39e08179145b86aedda36584a3509 to make sure everything gets initialized when there is an error. That is what this current commit does. The previous few commits have refactored things so as to minimize the number of places that need to be handled here, down to three. I kinda doubt that new constructs will be added, at this stage in the language development, that would require the same initialization handling. But, if they were, hopefully those doing it would follow the existing paradigm that this commit and 3dd4eaeb8ac39e08179145b86aedda36584a3509 establish. 
Another way to handle this would have been, instead of doing an initialize-and-'continue', to jump to a common label at the bottom of the loop which does the initialization. I think it doesn't matter much which, so I left it as this.
* toke.c: Quit now if error at end of input | Karl Williamson | 2017-02-13 | 1 | -1/+2
| | | | | | | In these two cases, we know we are at the end of the input, and that we have an error. There is no need to try to patch things up so we can continue to parse looking for other errors; there's nothing left to parse. So skip having to deal with patching up.
* toke.c: Un-special case something | Karl Williamson | 2017-02-13 | 1 | -2/+2
| | | | | | By refactoring slightly, we make this code in a switch statement have the same entrance and exit invariants as the other cases, so they all can be handled uniformly at the end of the switch.
* Don't try to compile a pattern known to be in error | Karl Williamson | 2017-02-13 | 1 | -0/+9
| | | | | | | | | | | | | | Regular expression patterns are parsed by the lexer/toker, and then compiled by the regex compiler. It is foolish to try to compile one that the parser has rejected as syntactically bad; assumptions may be violated and segfaults ensue. This commit abandons all parsing immediately if a pattern had errors in it. A better solution would be to flag this pattern as not to be compiled, and continue parsing other things so as to find the most errors in a single attempt, but I don't think it's worth the extra effort. Making this change caused some misleading error messages in the test suite to be replaced by better ones.
* toke.c: Add internal function to abort parsing | Karl Williamson | 2017-02-13 | 1 | -0/+9
| | | | | | This is to be called to abort the parsing early, before the required number of errors have been found. It is used when continuing the parse would be either fruitless or we could be looking at garbage.
* toke.c: White-space only | Karl Williamson | 2017-02-13 | 1 | -81/+86
| | | | Indent after the previous commit enclosed this code in a new block.
* Relax internal function API | Karl Williamson | 2017-02-13 | 1 | -9/+23
| | | | | | This changes yyerror_pvn so that its first parameter can be NULL. This indicates no message is to be output, but that parsing is to be abandoned immediately, without waiting for more errors to build up.
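The convention of passing NULL as the message to mean "abandon parsing now, print nothing" can be illustrated with a small sketch. `report_error` and `parse_state` are hypothetical. This is not the signature or body of `yyerror_pvn`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_ERRORS 10   /* assumed limit, for illustration only */

typedef struct {
    int  error_count;
    bool abort_requested;
} parse_state;

/* msg == NULL: output nothing and abandon parsing immediately.
 * Otherwise: print the message and count the error.
 * Returns true if parsing should stop. */
bool report_error(parse_state *ps, const char *msg, size_t len)
{
    assert(ps != NULL);
    if (msg == NULL) {
        ps->abort_requested = true;
        return true;
    }
    fprintf(stderr, "%.*s\n", (int)len, msg);
    if (++ps->error_count >= MAX_ERRORS)
        ps->abort_requested = true;
    return ps->abort_requested;
}
```

Overloading the existing entry point keeps a single place where the decision to stop is made, at the cost of a less obvious call site.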
* Extract code into a function | Karl Williamson | 2017-02-13 | 1 | -0/+14
| | | | | | This creates a function in toke.c to output the compilation aborted message, changing perl.c to call that function. This is in preparation for this to be called from a second place.
* toke.c: Rmv no longer necessary UTF-8 checks | Karl Williamson | 2017-02-13 | 1 | -28/+1
| | | | | | | | | The previous commit tightened up the checking for well-formed UTF8ness, so that the ones removed here were redundant. The test during a string eval may also no longer be necessary, but since there are many ways to create that string, I'm not confident enough to remove it.
* toke.c: Fix bugs where UTF-8 is turned on in mid chunk | Karl Williamson | 2017-02-13 | 1 | -0/+33
| | | | | | | | | | | | | | | | | | | | Previous commits have tightened up the checking of UTF-8 for well-formedness in the input program or string eval. This is done in lex_next_chunk and lex_start. But it doesn't handle the case of

    use utf8; foo

because 'foo' is checked while UTF-8 is still off. This solves that problem by noticing when utf8 is turned on, and then rechecking at the next opportunity. See thread beginning at http://nntp.perl.org/group/perl.perl5.porters/242916

This fixes [perl #130675]. A test will be added in a future commit.

This catches some errors earlier than they used to be caught, and aborts, so some tests in the suite had to be split into multiple parts.
* toke.c: Add branch prediction | Karl Williamson | 2017-02-13 | 1 | -4/+7
| | | | The input is far more likely to be well-formed than not.
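A branch-prediction hint tells the compiler which way a test usually goes. The sketch below uses the GCC/Clang builtin `__builtin_expect` behind hypothetical `MY_LIKELY`/`MY_UNLIKELY` macros. The function scans for a non-ASCII byte, a simpler check than the UTF-8 well-formedness test the commit touches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hint macros. They fall back to a plain test on
 * compilers without __builtin_expect. */
#if defined(__GNUC__) || defined(__clang__)
#  define MY_LIKELY(x)   __builtin_expect(!!(x), 1)
#  define MY_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define MY_LIKELY(x)   (x)
#  define MY_UNLIKELY(x) (x)
#endif

/* Returns the index of the first byte >= 0x80, or len if every
 * byte is ASCII. The non-ASCII case is hinted as the rare one. */
size_t first_non_ascii(const unsigned char *s, size_t len)
{
    size_t i;
    assert(s != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (MY_UNLIKELY(s[i] >= 0x80))
            return i;
    }
    return len;
}
```

The hint changes code layout only. The result is the same whether or not the prediction holds.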
* toke.c: Fix comments describing S_tokeq | Karl Williamson | 2017-02-13 | 1 | -4/+3
| | | | The comments about what this function does were incorrect.
* toke.c: Slight refactor. | Karl Williamson | 2017-02-13 | 1 | -10/+13
| | | | | | This moves an automatic variable to closer to the only place it is used; it also adds branch prediction. It is likely that the input will be well-formed.
* toke.c: White space, comments, braces | Karl Williamson | 2017-02-13 | 1 | -6/+16
| | | | | I am adding the braces because in one of the areas, the lack of braces had led to a blead failure.
* toke.c: Don't compare same bytes twice | Karl Williamson | 2017-02-13 | 1 | -1/+2
| | | | | Before starting this memEQ, we know that the first bytes are the same, so might as well start the compare with the 2nd bytes.
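When the caller has already established that the first bytes match, the comparison can start at the second byte. A minimal sketch using the standard `memcmp` follows. `rest_matches` is a hypothetical helper, not the code changed by the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Does buf (buf_len bytes) start with kw (kw_len >= 1 bytes)?
 * Precondition: the caller has already checked buf[0] == kw[0],
 * so the comparison skips that byte. */
bool rest_matches(const char *buf, size_t buf_len,
                  const char *kw, size_t kw_len)
{
    assert(kw_len >= 1 && buf_len >= 1 && buf[0] == kw[0]);
    if (buf_len < kw_len)
        return false;
    return memcmp(buf + 1, kw + 1, kw_len - 1) == 0;
}
```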
* toke.c: Move declaration | Karl Williamson | 2017-02-13 | 1 | -1/+2
| | | | This automatic variable doesn't need such a large scope.
* toke.c: Remove unused param from static function | Karl Williamson | 2017-02-01 | 1 | -11/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit d2067945159644d284f8064efbd41024f9e8448a reverted commit b5248d1e210c2a723adae8e9b7f5d17076647431. b5248 removed a parameter from S_scan_ident, and changed its interior to use PL_bufend instead of that parameter. The parameter had been used to limit how far into the string being parsed scan_ident could look. In all calls to scan_ident but one, the parameter was already PL_bufend. In the one call where it wasn't, b5248 compensated by temporarily changing PL_bufend around the call, running afoul, eventually, of the expectation that PL_bufend points to a NUL.

I would have expected the reversion to add back both the parameter and the uses of it, but apparently the function interior has changed enough since the original commit that it didn't even think there were conflicts. As a result the parameter got added back, but not the uses of it.

I tried both approaches to fix this: 1) to change the function to use the parameter; 2) to simply delete the parameter. Only the latter passed the test suite without error.

I then tried to understand why the parameter existed in the first place, and why b5248 introduced a kludge to work around removing it. It appears to me that this is for the benefit of the intuit_more function, to enable it to discern $] from a $ ending a bracketed character class, by ending the scan before the ']' when in a pattern.

The trouble is that modern scan_ident versions do not view themselves as constrained by PL_bufend. If that is reached at a point where white space is allowed, it will try appending the next input line and continuing, thus changing PL_bufend. Thus the kludge in b5248 wouldn't necessarily do the expected limiting anyway. The reason approach "1)" didn't work was that the function continued to use the original value, even after it had read in new things, instead of accounting for those. Hence approach "2)" is used.
I'm a little nervous about this, as it may lead to intuit_more() (which uses heuristics) having more cases where it makes the wrong choice about $] vs [...$]. But I don't see a way around this, and the pre-existing code could fail anyway. Spotted by Dave Mitchell.
* PATCH: [perl #130655] Unrecognized UTF-8 char | Karl Williamson | 2017-01-31 | 1 | -0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | The root cause of this was code like this

    if (a)
        b

which got changed into

    if (a)
        c
        b

thus causing 'b' to be executed unconditionally. The solution is just to add braces

    if (a) {
        c
        b
    }

This is why I always use braces even if not required at the moment. It was the coding standard at $work. It turns out that #130567 doesn't even come up with this fix in place.
* PATCH: [perl #130656] tr// failure with UTF-8 across lines | Karl Williamson | 2017-01-31 | 1 | -3/+16
| | | | | | | | | | | | | | | | | This bug happened under things like

    tr/\x{101}-\x{200}/
       \x{201}-\x{301}/

The newline in the middle was crucial. As a result, the second line got parsed already knowing that the result was UTF-8, and consequently the setting of a variable was skipped, because that setting happens only when we discover we need to flip into UTF-8. The solution adopted here is to set the variable under other conditions, which leads to it getting set multiple times. But this extra branch and setting is confined to somewhat rare circumstances, leaving the mainline code untouched.
* signature sub (\x80 triggered an assertion | David Mitchell | 2017-01-30 | 1 | -1/+1
| | | | | | | | | RT #130661

In the presence of 'use feature "signatures"', a char >= 0x80 where a sigil was expected triggered an assert failure, because the (signed) character was being promoted to int and ended up getting returned from yylex() as a negative value.
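Whether plain `char` is signed is implementation-defined in C, so a byte >= 0x80 can become a negative `int` when promoted. The sketch below contrasts the two conversions. The function names are hypothetical and the token-code convention (non-negative values only) is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the byte as a token code. Where plain char is signed,
 * a byte >= 0x80 comes back negative. */
int token_from_char_buggy(const char *s)
{
    assert(s != NULL);
    return *s;
}

/* Going through unsigned char first always yields 0..255. */
int token_from_char_fixed(const char *s)
{
    assert(s != NULL);
    return (unsigned char)*s;
}
```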
* PATCH: [perl #130666]: Revert "toke.c, S_scan_ident(): Don't take a "end of buffer" argument, use PL_bufend" | Karl Williamson | 2017-01-29 | 1 | -11/+8
| | | | | | | | | | | | | | This reverts commit b5248d1e210c2a723adae8e9b7f5d17076647431. This commit, dating from 2013, was made unnecessary by later removal of the MAD code. It temporarily changed the PL_bufend variable; doing that ran afoul of an assertion, added in fac0f7a38edc4e50a7250b738699165079b852d8, that expects PL_bufend to point to a terminating NUL. Beyond the reversion, a test is added here.
* perlapi: Fix grammar | Karl Williamson | 2017-01-26 | 1 | -1/+1
|
* PATCH: [perl #130567] Assertion failure in scan_const | Karl Williamson | 2017-01-25 | 1 | -0/+14
| | | | | | It turns out that eval text isn't necessarily parsed by lex_next_chunk(), but is by lex_start(). So, add a test to there to look for malformed UTF-8.
* Use cBOOL() instead of ? TRUE : FALSE | Dagfinn Ilmari Mannsåker | 2017-01-25 | 1 | -2/+2
| | | | Except under cpan/ and dist/
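The replacement described above swaps a spelled-out ternary for a macro that normalizes any truthy value to a boolean. The sketch below uses a hypothetical `MY_CBOOL` as a local stand-in. Perl's own `cBOOL()` is defined in its headers and may differ in detail, and `FLAG_UTF8` is an invented flag value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical local stand-in for a normalize-to-bool macro. */
#define MY_CBOOL(x) ((x) ? (bool)1 : (bool)0)

#define FLAG_UTF8 0x100u   /* invented flag bit above the low byte */

/* The spelled-out form. */
bool is_utf8_verbose(unsigned flags)
{
    return (flags & FLAG_UTF8) ? true : false;
}

/* The same test through the macro. */
bool is_utf8_concise(unsigned flags)
{
    return MY_CBOOL(flags & FLAG_UTF8);
}
```

Normalizing matters most where the boolean type is a narrow integer typedef, since a flag bit above the low byte would otherwise be truncated to zero.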
* (perl #129190) intuit_method() can move the line buffer | Tony Cook | 2017-01-24 | 1 | -1/+9
| | | | and broke PL_bufptr when it did.
* (perl #129274) avoid treating the # in $# as a comment intro | Tony Cook | 2017-01-24 | 1 | -1/+3
|
* Be consistent in deprecation messages. | Abigail | 2017-01-23 | 1 | -1/+1
| | | | | | Changed one deprecation message to not use a leading v in the Perl version number, as the other deprecation messages don't have them either.
* toke.c: Refactor part of tr// handling, mostly for EBCDIC | Karl Williamson | 2017-01-19 | 1 | -30/+86
| | | | | | | | | | | | | | | | | | | | | | | | Commit af9be36c89322d2469f27b3c98c20c32044697fe changed toke.c to count the number of UTF-8 variant characters seen in a string so far. If the count is 0 when the string has to be upgraded to UTF-8, then only a flag has to be flipped, saving reparse time. Incrementing this count wasn't getting done during the expansion of ranges like A-Z under tr///. This currently doesn't matter for ASCII platforms, as the count is currently treated as a boolean, and it was getting set if a range endpoint is variant. On EBCDIC platforms a range may contain variants even if both endpoints are not. For example \x00-\xFF. (\xFF is a control that is an invariant). This led to a lot of noise on an EBCDIC smoke, but no actual tests failing.

I want to keep it as a count so that in the future, things could be changed so that count can be used to know how big to grow a string when it is converted to UTF-8, without having to re-parse it as we do now.

It turns out that we need to have this count anyway in the tr/// code as that grows the string to account for the expansion, and needs to know how many variants there are in order to do so if the string already is in UTF-8. So refactoring that code slightly allows the count to serve double duty, for the grow if it is already UTF-8, and how much to grow if it isn't UTF-8. And it fixes the noise problem on EBCDIC.
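The value of keeping a count rather than a flag can be shown on an ASCII platform, where each Latin-1 byte >= 0x80 becomes exactly two bytes in UTF-8. The sketch below is an illustration under that assumption. It does not cover EBCDIC, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Count bytes that change representation under UTF-8
 * (ASCII platforms only: bytes >= 0x80). */
size_t count_variants(const unsigned char *s, size_t len)
{
    size_t i, n = 0;
    for (i = 0; i < len; i++)
        if (s[i] >= 0x80)
            n++;
    return n;
}

/* Each variant grows by one byte, so the upgraded size is known
 * without a second pass over the string. */
size_t utf8_len_after_upgrade(size_t len, size_t variants)
{
    return len + variants;
}

/* Upgrade Latin-1 to UTF-8. out must hold len + variants bytes.
 * Returns the number of bytes written. */
size_t upgrade_to_utf8(const unsigned char *s, size_t len,
                       unsigned char *out)
{
    size_t i, o = 0;
    assert(s != NULL && out != NULL);
    for (i = 0; i < len; i++) {
        if (s[i] < 0x80) {
            out[o++] = s[i];
        } else {
            out[o++] = (unsigned char)(0xC0 | (s[i] >> 6));
            out[o++] = (unsigned char)(0x80 | (s[i] & 0x3F));
        }
    }
    return o;
}
```

A count of zero means the upgrade is a no-op on the bytes, which matches the "only a flag has to be flipped" case in the message.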
* toke.c: Avoid work if tr/a-b/foo/ | Karl Williamson | 2017-01-19 | 1 | -0/+8
| | | | | A two-element range here is already fully set up, and no need to do anything.
* toke.c: Avoid work for tr/a-a/.../ | Karl Williamson | 2017-01-19 | 1 | -1/+14
| | | | A single element range can skip a bunch of work.
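The two shortcuts above (a one-element range and a two-element range need no interior fill) can be sketched with a hypothetical byte-range expander. This is not the tr/// code itself, which works on a different representation.

```c
#include <assert.h>
#include <stddef.h>

/* Expand the inclusive range lo-hi into out.
 * Returns the number of bytes written, or 0 if lo > hi or the
 * output buffer is too small. */
size_t expand_range(unsigned char lo, unsigned char hi,
                    unsigned char *out, size_t cap)
{
    size_t n, i;
    assert(out != NULL);
    if (lo > hi)
        return 0;
    n = (size_t)(hi - lo) + 1;
    if (n > cap)
        return 0;
    out[0] = lo;
    if (n == 1)
        return 1;          /* like tr/a-a/: a single element */
    out[n - 1] = hi;
    if (n == 2)
        return 2;          /* like tr/a-b/: the endpoints are the range */
    for (i = 1; i + 1 < n; i++)
        out[i] = (unsigned char)(lo + i);   /* fill the interior */
    return n;
}
```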
* toke.c: Save a branch | Karl Williamson | 2017-01-19 | 1 | -3/+5
| | | | | By ordering these sequential tests properly, a branch in the mainline can be saved.
* toke.c: Add, clarify some comments, white-space | Karl Williamson | 2017-01-19 | 1 | -69/+67
|
* [perl #129342] ensure range-start is set after error in tr/// | Hugo van der Sanden | 2017-01-19 | 1 | -2/+2
| | | | | | | A parse error due to invalid octal or hex escape in the range of a transliteration must still ensure some kind of start and end values are captured, since we don't stop on the first such error. Failure to do so can cause invalid reads after "Here we have parsed a range".
* warn at most once per literal about misplaced _ | Zefram | 2017-01-17 | 1 | -26/+23
| | | | Fixes [perl #70878].
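Warning at most once per literal comes down to a per-literal flag that is set on the first misplaced underscore. The sketch below handles decimal digits only and uses a simplified notion of "misplaced" (an underscore not directly between two digits). `scan_number` is hypothetical and is not the scanner in toke.c.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Scan a run of decimal digits and underscores.
 * Stores the parsed value in *value and returns the number of
 * warnings that would be issued, which is 0 or 1. */
int scan_number(const char *s, unsigned long *value)
{
    bool warned = false;       /* per-literal: set on first warning */
    bool prev_digit = false;
    int warnings = 0;
    unsigned long v = 0;

    assert(s != NULL && value != NULL);
    for (; isdigit((unsigned char)*s) || *s == '_'; s++) {
        if (*s == '_') {
            /* Misplaced unless directly between two digits. */
            bool misplaced = !prev_digit || !isdigit((unsigned char)s[1]);
            if (misplaced && !warned) {
                warned = true;
                warnings++;
            }
            prev_digit = false;
        } else {
            v = v * 10 + (unsigned long)(*s - '0');
            prev_digit = true;
        }
    }
    *value = v;
    return warnings;
}
```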
* Deprecation of an unqualified dump() to mean CORE::dump(). | Abigail | 2017-01-16 | 1 | -2/+4
| | | | This will no longer be allowed in 5.30.
* Use of comma-less variable lists is deprecated. | Abigail | 2017-01-16 | 1 | -1/+1
| | | | It will be fatal by Perl 5.28.
* Bare heredocs will be fatal in 5.28. | Abigail | 2017-01-16 | 1 | -1/+1
| | | | | | Heredocs without a terminator after the << have been deprecated since 5.000. After more than 2 decades, it's time to retire this construct. They will be fatal in 5.28.
* Use of \N{} will be fatal in 5.28. | Abigail | 2017-01-16 | 1 | -2/+1
| | | | | Use of \N{} in a double quoted context, with nothing between the braces, was deprecated in 5.24, and it will be a fatal error in 5.28.
* Time limit the deprecation of :unique and :locked. | Abigail | 2017-01-16 | 1 | -2/+4
| | | | | | | | | | | | | | | | The :unique and :locked attributes have had no effect since 5.8.8 and 5.005 respectively. They were deprecated in 5.12. They are now scheduled to be deleted in 5.28. There are two places the deprecation warning can be issued: in lib/attributes.pm, and in toke.c. The warnings were phrased differently, but since we're changing the warning anyway (as we added the version of Perl in which the attributes will disappear), we've used the same phrasing for this warning, regardless of where it is generated:

    Attribute "locked" is deprecated, and will disappear in Perl 5.28
    Attribute "unique" is deprecated, and will disappear in Perl 5.28
* Add /xx regex pattern modifier | Karl Williamson | 2017-01-13 | 1 | -8/+0
| | | | | This was first proposed in the thread starting at http://www.nntp.perl.org/group/perl.perl5.porters/2014/09/msg219394.html
* toke.c: Make too-long inline function just static | Karl Williamson | 2017-01-05 | 1 | -1/+1
| | | | | This function is too long to be effectively inlined, so don't request the compiler to do so.
* [perl #130495] /x comment skipping stops a byte short | Hugo van der Sanden | 2017-01-04 | 1 | -1/+6
| | | | | | | | If that byte was part of a utf-8 character, this caused inappropriate "malformed utf8" warnings or assertions. In principle this should also skip the newline, but failing to do so is safe.
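A comment skipper that never stops in the middle of a character can be sketched as a scan that ends only at a newline or at the end of the buffer. The helper below is hypothetical and does not reproduce the exact off-by-one from the ticket. Unlike the fix described above, it also consumes the newline.

```c
#include <assert.h>
#include <stddef.h>

/* s points at the '#' that starts a /x comment and end points one
 * past the last byte of the pattern. Returns a pointer just past
 * the comment: after the newline if there is one, else end. */
const char *skip_x_comment(const char *s, const char *end)
{
    assert(s != NULL && s < end && *s == '#');
    while (s < end && *s != '\n')
        s++;
    if (s < end)
        s++;               /* also consume the newline */
    return s;
}
```

Because 0x0A never appears inside a multi-byte UTF-8 sequence, stopping only at a newline or at the end cannot leave the cursor inside a character.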