path: root/toke.c
Commit message (Author, Age, Files, Lines)
* eliminate recursion from yyl_fake_eof() into yyl_try() (Tony Cook, 2020-05-21, 1 file, -14/+46)
    This is intended as a minimal commit due to the current stage of the release process.

    fixes #17268
* eliminate len from recursive yyl_try/yyl_fake_eof (Tony Cook, 2020-05-21, 1 file, -10/+9)
    len is only used in these functions to pass to the other function when recursing.
* removed unused len parameter from yyl_dblquote() (Tony Cook, 2020-05-21, 1 file, -2/+3)
    The len variable is used, but the value is overwritten before being read.
* Remove spurious double spaces before open braces in core C code (Dagfinn Ilmari Mannsåker, 2020-04-13, 1 file, -1/+1)
* gh-17645: avoid oob read on conflict marker detection (Hugo van der Sanden, 2020-03-27, 1 file, -3/+3)
    Introduced in 0ae5281a2d.
* chained comparisons (Zefram, 2020-03-12, 1 file, -22/+28)
* Fix variable name in wrap_keyword_plugin documentation (Stefan Seifert, 2020-03-05, 1 file, -1/+1)
* Add 'indirect' feature that can be turned off to disable indirect object syntax (Dagfinn Ilmari Mannsåker, 2020-02-16, 1 file, -1/+4)
    Co-authored-by: Tony Cook <tony@develop-help.com>
* toke.c: Split code to load _charnames.pm into own fnc (Karl Williamson, 2020-02-12, 1 file, -29/+62)
    This is in preparation for it being called from more than one place.
* toke.c: Change variable name, add one (Karl Williamson, 2020-02-12, 1 file, -8/+12)
    This is for clarity as to what's going on, and to simplify some expressions.
* toke.c: extract charnames code from S_new_constant (Karl Williamson, 2020-02-12, 1 file, -76/+82)
    The code for dealing with charnames is intertwined and special cased in S_new_constant. My guess is it was originally to offer customized, better error messages when things go wrong. Much later the function was changed so that a message could be returned instead of output, and the code didn't really need the customization any longer. But by then autoloading of charnames had been added when a \N{} was parsed, meaning that more special casing was added instead, as that had been the logical place to do it.

    This commit extracts the special charnames handling to the one place it is actually used, and the disentangled S_new_constant is then called. This is in preparation for future commits, and makes the code cleaner.

    This adds testing of the new syntax to lib/charnames.t. That file randomly generates some tests, simply because there are too many names to test reasonably at once. To compensate for the added tests, I lowered the percentage per run of characters tested so that this file takes about the same amount of time as before.
* toke.c - handle ${10} properly - Issue #12948 (Yves Orton, 2020-02-10, 1 file, -6/+25)
    ${10} and $10 were handled differently, this patch makes them be handled the same. It also forbids multi-digit numeric variables from starting with 0. Thus $00 is now a new fatal exception "Numeric variables with more than one digit may not start with '0'"
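The leading-zero rule described in this commit can be illustrated with a small standalone check. This is only a sketch of the rule as stated in the message, not the code in toke.c; the function name `numeric_var_name_ok` is hypothetical, and the function deliberately considers all-digit names only.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not from toke.c): returns 1 if `name` is an
 * all-digit variable name allowed by the rule in the commit message,
 * 0 otherwise. Names containing non-digits are out of scope and
 * are rejected here. */
static int numeric_var_name_ok(const char *name)
{
    size_t len = 0;

    while (name[len] != '\0') {
        if (!isdigit((unsigned char)name[len]))
            return 0;               /* not a purely numeric name */
        len++;
    }
    if (len == 0)
        return 0;                   /* empty name */
    if (len > 1 && name[0] == '0')
        return 0;                   /* "00", "010": more than one digit, leading 0 */
    return 1;                       /* "0", "1", "10", ... */
}
```

Under this sketch "0" and "10" are accepted, while "00" is rejected, matching the example given in the message.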
* toke.c: fix Multidimensional array heuristic to ignore function calls (Yves Orton, 2020-02-02, 1 file, -6/+33)
    Fix issue #16535 - $t[index $x, $y] should not throw Multidimensional array warnings.

    The heuristic for detecting lists in array subscripts is implemented in toke.c, which means it is not particularly reliable. There are lots of ways that code might return a list in an array subscript. So for instance $t[do{ $x, $y }] should throw a warning but doesn't.

    On the other hand, we can make this warning less likely to happen by being a touch more careful about how we parse the inside of the square brackets so we do not throw an exception from $t[index $x,$y].

    Really this should be moved to the parser so we do not need to rely on fallible heuristics, and also into the runtime so that if we have $t[f()] and f() returns a list we can also warn there.

    But for now this improves things somewhat.
* toke.c: Don't accept illegal code points (Karl Williamson, 2020-01-23, 1 file, -2/+12)
    This now croaks if the input is an illegal code point. Before, it likely would eventually croak if that code point was actually used in some manner.
* Remove dquote_inline.h (Karl Williamson, 2020-01-23, 1 file, -1/+0)
    The remaining function in this file is moved to inline.h, just to not have an extra file lying around with hardly anything in it.
* (toke|regcomp).c: Use common fcn to handle \0 problems (Karl Williamson, 2020-01-23, 1 file, -6/+9)
    This changes warning messages for too short \0 octal constants to use the function introduced in the previous commit. This function assures a consistent and clear warning message, which is slightly different than the one this commit replaces. I know of no CPAN code which depends on this warning's wording.
* Restructure grok_bslash_[ox] (Karl Williamson, 2020-01-23, 1 file, -8/+10)
    This commit causes these functions to allow a caller to request any messages generated to be returned to the caller, instead of always being handled within these functions. The messages are somewhat changed from previously to be clearer. I did not find any code in CPAN that relied on the previous message text.

    Like the previous commit for grok_bslash_c, here are two reasons to do this, repeated here.

    1) In pattern compilation this brings these messages into conformity with the other ones that get generated in pattern compilation, where there is a particular syntax, including marking the exact position in the parse where the problem occurred.

    2) These could generate truncated messages due to the (mostly) single-pass nature of pattern compilation that is now in effect. It keeps track of where during a parse a message has been output, and won't output it again if a second parsing pass turns out to be necessary. Prior to this commit, it had to assume that a message from one of these functions did get output, and this caused some out-of-bounds reads when a subparse (using a constructed pattern) was executed. The possibility of those went away in commit 5d894ca5213, which guarantees it won't try to read outside bounds, but that may still mean it is outputting text from the wrong parse, giving meaningless results. This commit should stop that possibility.
* Restructure grok_bslash_c (Karl Williamson, 2020-01-23, 1 file, -1/+8)
    This commit causes this function to allow a caller to request any messages generated to be returned to the caller, instead of always being handled within this function.

    Like the previous commit for grok_bslash_c, here are two reasons to do this, repeated here.

    1) In pattern compilation this brings these messages into conformity with the other ones that get generated in pattern compilation, where there is a particular syntax, including marking the exact position in the parse where the problem occurred.

    2) The messages could be truncated due to the (mostly) single-pass nature of pattern compilation that is now in effect. It keeps track of where during a parse a message has been output, and won't output it again if a second parsing pass turns out to be necessary. Prior to this commit, it had to assume that a message from one of these functions did get output, and this caused some out-of-bounds reads when a subparse (using a constructed pattern) was executed. The possibility of those went away in commit 5d894ca5213, which guarantees it won't try to read outside bounds, but that may still mean it is outputting text from the wrong parse, giving meaningless results. This commit should stop that possibility.
* Hoist code point portability warnings (Karl Williamson, 2020-01-23, 1 file, -6/+8)
* utf8.c: Change parameter types of internal fcns (Karl Williamson, 2020-01-03, 1 file, -1/+1)
    These generated warnings on certain platform builds, and weren't the best types for the purpose anyway.
* Add memCHRs() macro and use it (Karl Williamson, 2019-12-18, 1 file, -29/+29)
    This replaces strchr("list", c) calls throughout the core. They don't work properly when 'c' is a NUL, returning the position of the terminating NUL in "list" instead of failure. This could lead to segfaults or even security issues.
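The strchr() pitfall named in this commit is standard C behaviour: the terminating NUL counts as part of the string, so searching for '\0' succeeds. The sketch below contrasts that with a length-bounded memchr() search. The macro `LITERAL_HAS_BYTE` and the two wrapper functions are hypothetical names for illustration; this is not claimed to be the definition of memCHRs() in the Perl core.

```c
#include <string.h>

/* Hypothetical macro (an assumption, not Perl's memCHRs()): search a
 * string literal for byte c, excluding the terminating NUL. The ""
 * concatenation makes the macro fail to compile for non-literals. */
#define LITERAL_HAS_BYTE(lit, c) \
    (memchr("" lit "", (c), sizeof(lit) - 1) != NULL)

/* Returns 1 if strchr() reports c as present in "abc". */
static int strchr_finds(int c)
{
    return strchr("abc", c) != NULL;
}

/* Returns 1 if the length-bounded search finds c in "abc". */
static int bounded_finds(int c)
{
    return LITERAL_HAS_BYTE("abc", c);
}
```

With these definitions, `strchr_finds('\0')` reports a match even though NUL is not one of the listed characters, while `bounded_finds('\0')` does not.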
* Note that certain flags are documented (Karl Williamson, 2019-12-17, 1 file, -0/+7)
    This is useful in Devel::PPPort for generating its api-info data. That useful feature of D:P allows someone to find out what was the first release of Perl to have a function, macro, or flag. And whether using ppport.h backports it further.

    I went through apidoc.pod and looked for flags that were documented but that D:P didn't know about. This commit adds entries for each so that D:P can find them.
* Rmv leading underscore from macro name (Karl Williamson, 2019-12-11, 1 file, -2/+2)
    These are illegal in C, but we have plenty of them around; I happened to be looking at this function, and decided to fix it. Note that only the macro name is illegal; the function was fine, but to change the macro name means changing the function one.
* Add the `isa` operator (Paul "LeoNerd" Evans, 2019-12-09, 1 file, -0/+5)
    Adds a new infix operator named `isa`, with the semantics that

        $x isa SomeClass

    is true if and only if `$x` is a blessed object reference that is either `SomeClass` directly, or includes the class somewhere in its @ISA hierarchy. It is false without warning or error for non-references or non-blessed references.

    This operator respects `->isa` method overloading, and is intended to replace boilerplate code such as

        use Scalar::Util 'blessed';
        blessed($x) and $x->isa("SomeClass")
* Fix: local variable hiding parameter of same name (James E Keenan, 2019-11-12, 1 file, -5/+5)
    LGTM provides static code analysis and recommendations for code quality improvements. Their recent run over the Perl 5 core distribution identified 12 instances where a local variable hid a parameter of the same name in an outer scope. The LGTM rule governing this situation can be found here:

    Per: https://lgtm.com/rules/2156240606/

    This patch renames local variables in approximately 8 of those instances to comply with the LGTM recommendation. Suggestions for renamed variables were made by Tony Cook.

    For: https://github.com/Perl/perl5/pull/17281
* fix build under PERL_GLOBAL_STRUCT_PRIVATE (David Mitchell, 2019-11-12, 1 file, -0/+1)
    sprinkle a few random 'dVAR's at the top of some fns.
* Remove swashes from core (Karl Williamson, 2019-11-06, 1 file, -2/+2)
    Also references to the term.
* Reimplement tr/// without swashes (Karl Williamson, 2019-11-06, 1 file, -20/+2)
    This large commit removes the last use of swashes from core.

    It replaces swashes by inversion maps. This data structure is already in use for some Unicode properties, such as case changing.

    The inversion map data structure leads to straight forward implementation code, so I collapsed the two doop.c routines do_trans_complex_utf8() and do_trans_simple_utf8() into one. A few conditionals could be avoided in the loop if this function were split so that one version didn't have to test for, e.g., squashing, but I suspect these are in the noise in the loop, which has to deal with UTF-8 conversions. This should be faster than the previous implementation anyway. I measured the differences some releases back, and inversion maps were faster than the equivalent swash for up to 512 or 1024 different ranges. These numbers are unlikely to be exceeded in tr/// except possibly in machine-generated ones.

    Inversion maps are capable of handling both UTF-8 and non-UTF-8 cases, but I left in the existing non-UTF-8 implementation, which uses tables, because I suspect it is faster. This means that there is extra code, purely for runtime performance.

    An inversion map is always created from the input, and then if the table implementation is to be used, the table is easily derived from the map. Prior to this commit, the table implementation was used in certain edge cases involving code points above 255. Those cases are now handled by the inversion map implementation, because it would have taken extra code to detect them, and I didn't think it was worth it. That could be changed if I am wrong.

    Creating an inversion map for all inputs essentially normalizes them, and then the same logic is usable for all. This fixes some false negatives in the previous implementation. It also allows for detecting if the actual transliteration can be done in place. Previously, the code mostly punted on that detection for the UTF-8 case.

    This also allows for accurate counting of the lengths of the two sides, fixing some longstanding TODO warning tests.

    A new flag is created, OPpTRANS_CAN_FORCE_UTF8, when the tr/// has a below 256 character resolving to one that requires UTF-8. If this isn't set, the code knows that a non-UTF-8 input won't become UTF-8 in the process, and so can take short cuts. The bit representing this flag is the same as OPpTRANS_FROM_UTF, which is no longer used. That name is left in so that the dozen-ish modules in cpan that refer to it can still compile. AFAICT none of them actually use the flag, as well they shouldn't since it is private to the core.

    Inversion maps are ideally suited for tr/// implementations. An issue with them in general is that for some pathological data, they can become fragmented requiring more space than you would expect, to represent the underlying data. However, the typical tr/// would not have this issue, requiring only very short inversion maps to represent; in some cases shorter than the table implementation.

    Inversion maps are also easier to deparse than swashes. A deparse TODO was also fixed by this commit, and the code to deparse UTF-8 inputs is simplified.

    One could implement specialized data structures for specific types of inputs. For example, a common tr/// form is a single range, like tr/A-Z/a-z/. That could be implemented without a table and be quite fast. An intermediate step would be to use the inversion map implementation always when the transliteration is a single range, and then special case length=1 maps at execution time.

    Thanks to Nicholas Rochemagne for his help on B
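To make the inversion map idea concrete, here is a minimal lookup sketch for the tr/A-Z/a-z/ case mentioned above. It is a simplified illustration, not the core's data layout: the `invmap` struct, the `IDENTITY` marker, and the function `invmap_lookup` are all hypothetical names, and the sketch assumes each range either maps to itself or maps by a constant offset from the range start.

```c
#include <stddef.h>
#include <stdint.h>

#define IDENTITY UINT32_MAX       /* marker: this range maps to itself */

/* Hypothetical layout: starts[i] is the first code point of range i
 * (sorted ascending, starts[0] == 0); targets[i] is what starts[i]
 * maps to, or IDENTITY. Range i ends just before starts[i + 1]. */
typedef struct {
    const uint32_t *starts;
    const uint32_t *targets;
    size_t n;
} invmap;

/* Binary search for the range containing cp, then apply its mapping. */
static uint32_t invmap_lookup(const invmap *m, uint32_t cp)
{
    size_t lo = 0, hi = m->n;     /* cp lies in some range in [lo, hi) */

    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (m->starts[mid] <= cp)
            lo = mid;
        else
            hi = mid;
    }
    if (m->targets[lo] == IDENTITY)
        return cp;
    return m->targets[lo] + (cp - m->starts[lo]);
}

/* tr/A-Z/a-z/ as three ranges: below 'A', 'A'..'Z', above 'Z'. */
static const uint32_t az_starts[]  = { 0,        'A', 'Z' + 1  };
static const uint32_t az_targets[] = { IDENTITY, 'a', IDENTITY };
static const invmap az_map = { az_starts, az_targets, 3 };
```

A single-range transliteration needs only three entries here, which is consistent with the message's point that typical tr/// inputs produce very short maps.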
* Change macro name in tr/// code (Karl Williamson, 2019-11-06, 1 file, -8/+9)
    This makes it more mnemonic. Also add an explanation in toke.c
* toke.c: comment, White-space only (Karl Williamson, 2019-11-06, 1 file, -2/+3)
    Wrap a too-long line
* toke.c: comment changes (Karl Williamson, 2019-11-05, 1 file, -9/+2)
    These should have been included in 0c311b7c345769239f38d0139ea7738feec5ca4d
* Remove unused `key` and `orig_keyword` parameters from `yyl_key_core` (Dagfinn Ilmari Mannsåker, 2019-11-05, 1 file, -3/+5)
    They were only ever passed as zeros, so just make them local to the function.
* Rename `tmp` local to `key` in `yyl_keylookup` (Dagfinn Ilmari Mannsåker, 2019-11-05, 1 file, -9/+9)
    Also only initialise it just before it's actually used.
* Remove unused `key` parameter from `yyl_just_a_word` (Dagfinn Ilmari Mannsåker, 2019-11-05, 1 file, -13/+12)
* toke.c: const-ify formbrack parameters (Aaron Crane, 2019-11-04, 1 file, -2/+2)
* toke.c: replace recursive calls to yyl_try() with goto (Aaron Crane, 2019-11-04, 1 file, -15/+24)
    The downside of writing these calls recursively is that not all compilers will compile the tail-position calls as jumps; that's especially true in earlier versions of this refactoring process (where yyl_try() took a large number of arguments), but it's not in general something we can expect to happen, especially in the presence of `-O0` or similar compiler options. This can lead to call-stack overflow in some circumstances.

    Most recursive calls to yyl_try() occur within yyl_try() itself, so we can easily replace them with an explicit `goto` (which is what most compilers would use for the recursive calls anyway, now that yyl_try() takes ≤3 parameters).

    There are only two other recursive-call cases. One is yyl_fake_eof(), which as far as I can tell is never called repeatedly within a single file; this seems safe. The other is yyl_eol(). It has exactly two distinct return paths, so this commit moves the retry logic into its yyl_try() caller.

    With this change, we no longer seem to trigger call-stack overflow.

    Closes #17220
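The transformation described in this commit, turning a tail-position self-call into an explicit jump, can be sketched on a toy scanner. This is an illustration of the general technique only; `next_token_recursive` and `next_token_goto` are hypothetical functions and bear no relation to the real yyl_try() beyond the shape of the rewrite.

```c
/* Recursive form: skipping one blank re-enters the function. Unless
 * the compiler turns the tail call into a jump (not guaranteed, for
 * example at -O0), every skipped byte costs a stack frame. */
static const char *next_token_recursive(const char *s)
{
    if (*s == ' ' || *s == '\n')
        return next_token_recursive(s + 1);
    return s;
}

/* goto form: same behaviour, but the retry is an explicit jump, so
 * stack usage stays constant however many blanks are skipped. */
static const char *next_token_goto(const char *s)
{
  retry:
    if (*s == ' ' || *s == '\n') {
        s++;
        goto retry;
    }
    return s;
}
```

Both functions return a pointer to the first non-blank byte; only the second has stack usage that is independent of the input length regardless of optimisation level.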
* toke.c: delete unused bof parameters (Aaron Crane, 2019-11-04, 1 file, -14/+13)
* toke.c: don't pass around a copy of PL_parser->saw_infix_sigil (Aaron Crane, 2019-11-04, 1 file, -56/+65)
    There's exactly one place where we need to consult it (and that only for producing good error messages in a specific group of term-after-term situations).

    The reason for passing it around was so that it could be reset to false early on in the process of lexing a token, while then allowing the three separate cases that might need to set it true to do so independently.

    Instead, centralise the logic of determining when it needs to be true.
* toke.c: remove some spurious orig_keyword uses (Aaron Crane, 2019-11-04, 1 file, -6/+4)
* toke.c: remove formbrack argument from yyl_try() (Aaron Crane, 2019-11-04, 1 file, -23/+20)
    With this commit, yyl_try() has few enough arguments that the RETRY() macro no longer serves any useful purpose; delete it too.
* toke.c: delete weird initial_state arg to yyl_try() (Aaron Crane, 2019-11-04, 1 file, -20/+13)
    I thought I was going to end up using this for more stuff, but I've found better approaches.

    This commit also removes two more goto targets.
* toke.c: factor out static yyl_keylookup() (Aaron Crane, 2019-11-04, 1 file, -157/+143)
* toke.c: factor out static yyl_key_core() and yyl_word_or_keyword() (Aaron Crane, 2019-11-04, 1 file, -945/+947)
* toke.c: bundle some yyl_just_a_word() params into a struct (Aaron Crane, 2019-11-04, 1 file, -71/+98)
    This makes calls to it much easier to understand.
* toke.c: factor out static yyl_just_a_word() (Aaron Crane, 2019-11-04, 1 file, -291/+275)
* toke.c: stop passing around several needless local variables (Aaron Crane, 2019-11-04, 1 file, -22/+13)
    I introduced these parameters as part of mechanically refactoring goto-heavy logic into subroutines. However, they aren't actually needed through most of the code. Even in the recursive case (in which yyl_try() or one of its callees will call itself), we can reset the variables to zero.
* toke.c: factor out static yyl_strictwarn_bareword() (Aaron Crane, 2019-11-04, 1 file, -31/+39)
* toke.c: remove the really_sub goto label (Aaron Crane, 2019-11-04, 1 file, -18/+7)
    This permits some additional pleasing simplifications.
* toke.c: factor out static yyl_constant_op() (Aaron Crane, 2019-11-04, 1 file, -43/+48)
    With the removal of another goto label!
* toke.c: factor out static yyl_safe_bareword() (Aaron Crane, 2019-11-04, 1 file, -14/+20)