path: root/toke.c
Commit message (author, date, files changed, lines -removed/+added)
* warnings.pm - add deprecated::delimiter_will_be_paired category (Yves Orton, 2023-03-18, 1 file, -1/+1)
  Some delimiters are considered deprecated because in the future they will be used as part of a paired delimiter. This adds a new warning category for these cases.
* warnings.pm - add deprecated::apostrophe_as_package_separator as new deprecation category (Yves Orton, 2023-03-18, 1 file, -3/+3)
  This category covers the use of the apostrophe as a package separator, e.g. in things like "Test::More::isn't()".
* warnings.pm - support deprecated::smartmatch category (Yves Orton, 2023-03-18, 1 file, -3/+3)
  Currently we seem to lack a way to have a subcategory under deprecated. It seems reasonable to me that people might want to disable a specific subcategory warning while leaving the rest in place. This patch allows that. Note that both
      no warnings "deprecated";
  and
      no warnings "deprecated::smartmatch";
  work to disable the warning. Deprecation warnings shouldn't be "all or nothing"; they should be specific and targeted.
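The subcategory rule above (disabling either "deprecated" or "deprecated::smartmatch" silences the smartmatch warning) can be modelled as a prefix check on "::"-separated names. This is a hypothetical sketch only: warnings.pm actually works with bitmasks, and `warning_is_disabled()` and its helper are invented names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if `disabled` names `category` itself or one of its parents,
 * i.e. `category` is `disabled` optionally followed by "::...". */
static bool is_same_or_parent(const char *disabled, const char *category)
{
    size_t len = strlen(disabled);
    if (strncmp(disabled, category, len) != 0)
        return false;
    /* Exact match, or the parent name is followed by "::". */
    return category[len] == '\0'
        || (category[len] == ':' && category[len + 1] == ':');
}

/* Hypothetical check: is a warning in `category` switched off by any
 * entry of the "no warnings" list? */
bool warning_is_disabled(const char *category,
                         const char *const *disabled, size_t ndisabled)
{
    for (size_t i = 0; i < ndisabled; i++)
        if (is_same_or_parent(disabled[i], category))
            return true;
    return false;
}
```

Requiring "::" after the parent name keeps a mere string prefix such as "deprecate" from matching "deprecated::smartmatch".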
* Have start_subparse() call class_prepare_method_parse() if CVf_IsMETHOD (Paul "LeoNerd" Evans, 2023-03-06, 1 file, -1/+8)
* Deprecate smartmatch (Philippe Bruhat (BooK), 2023-02-25, 1 file, -6/+6)
  Make the 'experimental::smartmatch' warning obsolete, and use 'deprecated' instead.
* pp_ctl.c - Consistently exit after 10 errors (Yves Orton, 2023-02-20, 1 file, -23/+4)
  Currently we only check the error count when we report an error via yyerror(), even though we say we will stop processing after 10 errors. Errors reported directly to qerror() bypass the check. This fixes that by checking the number of errors reported in qerror() itself. We also change qerror() so that qerror(NULL) triggers the exception; this way we can move the logic out of yyerror() and into qerror().
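A minimal model of the revised counting, assuming a hypothetical `struct parser_state` and `model_qerror()` in place of perl's real globals and qerror(). The `aborted` flag stands in for the exception, and the cap of 10 follows the message above; the exact comparison perl uses is an assumption here.

```c
#include <stdbool.h>
#include <stddef.h>

#define ERROR_LIMIT 10  /* "stop processing after 10 errors" */

/* Hypothetical stand-in for the parser's error bookkeeping. */
struct parser_state {
    int  error_count;
    bool aborted;       /* models the exception perl would throw */
};

/* Model of the revised qerror(): every error, however it was reported,
 * is counted here; qerror(NULL) records nothing but forces the abort. */
void model_qerror(struct parser_state *ps, const char *msg)
{
    if (msg != NULL)
        ps->error_count++;      /* real code would also queue msg */
    if (msg == NULL || ps->error_count >= ERROR_LIMIT)
        ps->aborted = true;
}
```

Because the check lives in the one function every error passes through, callers that bypass yyerror() can no longer skip it.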
* perl.h, pp_ctl.c - switch to standard way of terminating compilation (Yves Orton, 2023-02-20, 1 file, -9/+1)
  I did not fully understand the use of yyquit() when I implemented the SYNTAX_ERROR related stuff. It is not needed, and switching to this makes eval compile error messages more consistent.
* toke.c - silence maybe-uninitialized warning on gcc 12 (Yves Orton, 2023-02-19, 1 file, -1/+1)
  This silences the following (bogus) warning:
      toke.c:12104:24: warning: ‘b’ may be used uninitialized [-Wmaybe-uninitialized]
  It won't be used uninitialized, but there is no reason not to initialize it to shut up the warning on gcc-12.
  Fixes GitHub issue #20816.
* toke.c - use SvREFCNT_dec() rather than calling sv_free() (Richard Leach, 2023-02-10, 1 file, -5/+8)
  sv_free() is a function call that just does SvREFCNT_dec() anyway. SvREFCNT_dec is a macro that just calls the simple inline function Perl_SvREFCNT_dec(). In places where the SV being operated on has been newly created, we can use ASSUME() statements to help the compiler eliminate some unnecessary branches in this function.
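A rough sketch of why an inlined decrement plus an assumption can help, using a hypothetical `struct obj` instead of a real SV and a `MY_ASSUME()` macro standing in for perl's ASSUME(). On GCC and Clang, `__builtin_unreachable()` is what lets the compiler drop branches the caller has ruled out; whether it actually does so depends on the optimizer.

```c
#include <stdlib.h>

/* Hypothetical stand-in for perl's ASSUME(): promise the compiler that
 * `x` holds, so branches contradicting it may be dropped. */
#if defined(__GNUC__) || defined(__clang__)
#  define MY_ASSUME(x) do { if (!(x)) __builtin_unreachable(); } while (0)
#else
#  define MY_ASSUME(x) ((void)0)
#endif

struct obj { unsigned refcnt; };   /* hypothetical refcounted value */

struct obj *obj_new(void)
{
    struct obj *o = malloc(sizeof *o);
    if (o != NULL)
        o->refcnt = 1;
    return o;
}

/* Inline decrement in the style of SvREFCNT_dec(): a NULL check, a
 * cheap decrement, and a free only when the last reference goes.
 * Returns 1 if the object was freed. */
static inline int obj_refcnt_dec(struct obj *o)
{
    if (o == NULL)
        return 0;
    if (o->refcnt > 1) {
        o->refcnt--;
        return 0;
    }
    free(o);
    return 1;
}

/* The caller guarantees `o` is freshly created and non-NULL, so the
 * NULL and refcnt > 1 branches are dead after the assumption. */
int discard_fresh(struct obj *o)
{
    MY_ASSUME(o != NULL && o->refcnt == 1);
    return obj_refcnt_dec(o);
}
```

Calling `discard_fresh()` on anything that is not freshly created would be undefined behaviour, which is why such assumptions are only placed where the object's origin is known.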
* Field :param attributes, //= and ||= default assignments (Paul "LeoNerd" Evans, 2023-02-10, 1 file, -2/+3)
* Initial attack at parsing attribute syntax for class blocks; though no attrs are yet defined (Paul "LeoNerd" Evans, 2023-02-10, 1 file, -3/+4)
* Initial attack at basic 'class' feature (Paul "LeoNerd" Evans, 2023-02-10, 1 file, -7/+58)
  Adds a new experimental warning, feature, keywords and enough parsing to implement basic classes with an empty `new` constructor method.
  Inject a $self lexical into method bodies; populate it with the object instance, suitably shifted.
  Creates a new OP_METHSTART opcode to perform method setup.
  Define an aux flag to remark which stashes are classes.
  Basic implementation of fields.
  Basic anonymous methods.
* regcomp.c - remove (**{ ... }) from the regex engine (Yves Orton, 2023-02-08, 1 file, -9/+9)
  Dave M pointed out that this idea was flawed, and after some testing I have come to agree with him. This removes it. It was only available in 5.37.8, so no deprecation cycle is involved.
  The point of (**{ ... }) was to have a postponed eval that does not disable optimizations. But some of the optimizations are disabled because if they are not, we do not match correctly: the optimizations would make unwarranted assumptions about the pattern, assumptions which can be incorrect depending on what pattern is returned from the code block.
  The original idea was proposed because (?{ ... }) was treated as though it was (??{ ... }) and disabled many optimizations, when in fact it doesn't interact with optimizations at all. When I added (*{ ... }) as the optimistic version of (?{ ... }), I used "completeness" as the justification for also adding (**{ ... }), when it does not make sense to do so.
* toke.c - don't just return the function, assign it to an intermediary (Yves Orton, 2023-02-07, 1 file, -2/+5)
  So we can debug it before we return. An optimizing compiler should make them the same thing anyway, but under -Og it is helpful to be able to see the return value before we return it.
* toke.c: deprecation warning for ' as a package separator (Tony Cook, 2023-02-07, 1 file, -38/+44)
  First stage of RFC 0015.
  This also changes the warning for ' as a package separator in quoted strings to a deprecation warning.
* Enable the current "Old package separator used in string" by default (Tony Cook, 2023-02-07, 1 file, -1/+1)
  If we're going to be removing this entirely, the behavior of existing code is going to change. Make sure everyone sees it.
* regcomp.c - add optimistic eval (*{ ... }) and (**{ ... }) (Yves Orton, 2023-01-19, 1 file, -11/+17)
  This adds (*{ ... }) and (**{ ... }) as equivalents to (?{ ... }) and (??{ ... }). The only difference is that the star variants are "optimistic" and are defined to never disable optimisations. This is especially relevant now that use of (?{ ... }) prevents important optimisations anywhere in the pattern, instead of the older and inconsistent rules where it only affected the parts that contained the EVAL.
  It is also very useful for injecting debugging-style expressions into the pattern to understand what the regex engine is actually doing. The older style (?{ ... }) variants would change the regex engine's behavior, meaning this was not as effective a tool as it could have been.
  Similarly, it is now possible to test that a given regex optimisation works correctly using (*{ ... }), which was not possible with (?{ ... }).
* toke.c: Change spelling of one variable (James E Keenan, 2022-12-29, 1 file, -6/+6)
  s/overriden/overridden/gc
* toke.c: Manual correction of typos from GH 20435 (James E Keenan, 2022-12-29, 1 file, -4/+4)
  This commit only corrects typos in comments. A subsequent commit will change the spelling of a variable.
* Define five new operator precedence levels (Paul "LeoNerd" Evans, 2022-12-16, 1 file, -12/+27)
  Assignment operators (`=`) were missing, as were both the logical and the low-precedence short-circuiting OR and AND operators (`&&`, `||`, `and`, `or`).
  Also renumbered them somewhat to even out the spacing. This is fine during a development cycle.
  Also renamed the tokenizer/parser symbol names from "PLUG*OP" to "PLUGIN_*_OP" for better readability.
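To illustrate how ordered numeric levels let a custom infix operator "fit alongside" the builtin ones, here is a hypothetical enum. The names and values are invented for this sketch and perl's real enumeration may differ; only the relative order follows perlop, where `or` binds looser than `and`, which binds looser than assignment, then `||`, then `&&`.

```c
/* Hypothetical precedence levels for custom infix operators, loosest
 * first. Comments show builtin operators at each level; names and
 * numbers are invented for this sketch. */
enum infix_prec {
    PREC_LOW             = 10,
    PREC_LOGICAL_OR_LOW  = 20,  /* or */
    PREC_LOGICAL_AND_LOW = 30,  /* and */
    PREC_ASSIGN          = 40,  /* = */
    PREC_LOGICAL_OR      = 50,  /* || */
    PREC_LOGICAL_AND     = 60,  /* && */
    PREC_REL             = 70,  /* < > <= >= lt gt le ge */
    PREC_ADD             = 80,  /* + - . */
    PREC_MUL             = 90,  /* * / % x */
    PREC_POW             = 100, /* ** */
    PREC_HIGH            = 110
};

/* Nonzero if an operator at level `a` binds more tightly than one at `b`. */
int binds_tighter(enum infix_prec a, enum infix_prec b)
{
    return a > b;
}
```

Evenly spaced values leave room to slot further levels in later without renumbering every existing one.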
* Write an apidoc fragment for wrap_infix_plugin() (Paul "LeoNerd" Evans, 2022-12-14, 1 file, -0/+21)
* Token type `pval` should be a void * pointer (Paul "LeoNerd" Evans, 2022-12-14, 1 file, -3/+3)
  The `pval` field of the token type union is currently only used in one place: storing the result of the infix operator plugin. Its use here stores a structure pointer, not a string. The union should define this field as a `void *` and not a `char *`.
  In addition, we should not attempt to debug-print it as a string, because its value is not valid as one.
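A small sketch of the distinction, with a hypothetical `struct infix_info` standing in for the plugin's result and a simplified union rather than perl's real token type: storing the pointer in a `void *` member needs no casts, and the debug printer formats it with `%p` instead of treating it as a C string.

```c
#include <stdio.h>

/* Hypothetical stand-in for what the infix plugin hands back. */
struct infix_info {
    int         precedence;
    const char *name;
};

/* Simplified token value: pval is an opaque pointer, not a string. */
union token_value {
    long  ival;
    void *pval;
};

/* Recover the structure; C converts void * implicitly, no cast needed. */
int infix_precedence(union token_value v)
{
    const struct infix_info *info = v.pval;
    return info->precedence;
}

/* Debug output prints the address and never dereferences it as char *. */
int debug_print_pval(char *buf, size_t len, union token_value v)
{
    return snprintf(buf, len, "pval=%p", v.pval);
}
```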
* Define a PL_infix_plugin hook, of a similar style to PL_keyword_plugin (Paul "LeoNerd" Evans, 2022-12-08, 1 file, -0/+132)
  Runs for identifier-named custom infix operators and sequences of non-identifier symbol characters.
  Defines multiple precedence levels for custom infix operators that fit alongside exponentiation, multiplication, addition, or relational comparison operators, as well as a "high" and "low" at either end.
* Simplify a few callsites with the newPADxVOP() function (Paul "LeoNerd" Evans, 2022-12-08, 1 file, -2/+1)
* Replace SvGROW with sv_grow_fresh in perl.c, pp_sys.c, toke.c (Richard Leach, 2022-11-30, 1 file, -1/+1)
  Changed:
    * perl.c - Perl_moreswitches
    * pp_sys.c - pp_sysread
    * toke.c - Perl_scan_str
  In each of the above functions, one instance of SvGROW on a new SVt_PV can be swapped for the more efficient sv_grow_fresh. In two of the instances, the calls used to create the SVt_PV have also been streamlined.
  There should not be any functional change as a result of this commit.
* Recognise `//=` and `||=` syntax in signature parameter defaults (Paul "LeoNerd" Evans, 2022-11-26, 1 file, -1/+17)
  These create parameters where the default expression is assigned whenever the caller did not pass a defined (or true) value, i.e. both when it is missing and when it is present but undef (or false).
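The three kinds of signature default can be summarised as a decision function. This is a hypothetical C model of an argument (present / defined / true), not perl's implementation: plain `=` applies only when the argument is missing, `//=` also when it is undef, and `||=` also when it is false.

```c
#include <stdbool.h>

/* Hypothetical model of one caller-supplied argument. */
struct arg {
    bool present;   /* did the caller pass it at all? */
    bool defined;   /* ... and was it defined? */
    bool truthy;    /* ... and was it true? */
};

enum default_kind {
    DEFAULT_IF_MISSING,  /* ($x = EXPR)   */
    DEFAULT_IF_UNDEF,    /* ($x //= EXPR) */
    DEFAULT_IF_FALSE     /* ($x ||= EXPR) */
};

/* Should the default expression be evaluated and assigned? */
bool use_default(enum default_kind kind, const struct arg *a)
{
    if (!a->present)
        return true;            /* every kind applies to a missing arg */
    switch (kind) {
    case DEFAULT_IF_MISSING: return false;
    case DEFAULT_IF_UNDEF:   return !a->defined;
    case DEFAULT_IF_FALSE:   return !a->truthy;
    }
    return false;
}
```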
* toke.c - rework "Perl_no_op" warnings so we call Perl_warner() once only (Yves Orton, 2022-10-26, 1 file, -17/+41)
  Using multiple calls to Perl_warner() means that fatalized warnings do not include the full diagnostics. It also means that $SIG{__WARN__} might get called more than once for a given warning, with parts of the message in each call. This would affect "missing operator" warnings, which often come up in the context of barewords and misspelled sub names.
  This patch moves the parenthesized "hint" part of the warning to the same line as the main warning and ensures the entire message is dispatched in a single call to Perl_warner(). The result is that the hint is visible even under fatalized warnings and that $SIG{__WARN__} is called only once for the warning.
  At the same time, this patch fixes an oversight where we would sometimes form warning messages with a subject (variable name or bareword name) that was unquoted and sometimes had leading whitespace. This patch changes this to quote the subject, like most of our errors do, and to strip the whitespace when appropriate. (Note this doesn't use the QUOTEDPREFIX formats, as it didn't seem to be necessary with the type of warnings this is involved in.) This is not done in a separate patch, as that would mean manually altering all the tests multiple times over multiple patches.
  Note that yywarn() calls Perl_warner(), so even though this patch does not call it directly, it does call it indirectly via yywarn() via yyerror().
  This resolves GH issue #20425.
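A sketch of the single-dispatch approach, assuming a hypothetical handler callback in place of Perl_warner() and `$SIG{__WARN__}`. The main text, the quoted subject (with leading whitespace stripped) and the parenthesized hint are formatted into one buffer and delivered in one call; the exact wording of the message here is illustrative, not copied from toke.c.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sink standing in for Perl_warner() / $SIG{__WARN__}. */
typedef void (*warn_handler)(const char *msg, void *ctx);

/* Example handler: remembers the last message and counts calls. */
struct captured {
    int  calls;
    char last[256];
};

void capture_warning(const char *msg, void *ctx)
{
    struct captured *c = ctx;
    size_t len = strlen(msg);
    if (len >= sizeof c->last)
        len = sizeof c->last - 1;
    memcpy(c->last, msg, len);
    c->last[len] = '\0';
    c->calls++;
}

/* Build the whole "missing operator" warning, hint included, and
 * dispatch it in one call. */
int emit_missing_operator(warn_handler handler, void *ctx,
                          const char *what, const char *subject)
{
    char buf[256];
    while (*subject != '\0' && isspace((unsigned char)*subject))
        subject++;                         /* strip leading whitespace */
    int n = snprintf(buf, sizeof buf,
                     "%s found where operator expected "
                     "(Missing operator before \"%s\"?)", what, subject);
    if (n < 0)
        return n;
    handler(buf, ctx);                     /* exactly one dispatch */
    return n;
}
```

A handler that dies on its first call (the fatalized case) therefore still sees the hint, since nothing is sent in a second call.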
* Better handling of builtin CV attributes (Paul "LeoNerd" Evans, 2022-10-25, 1 file, -40/+88)
  The previous code would handle subroutine attributes directly against `PL_compcv` as a side-effect of merely parsing the syntax in `yyl_colon()`, an unlikely place for anyone to find it. This complicates the way the parser works.
  The new structure creates a new function to apply all the builtin attributes out of an attribute list to any given CV, and invokes it from the parser at a slightly better time.
* Use `LINE_Tf` for formatting line numbers (TAKAI Kousuke, 2022-10-13, 1 file, -4/+4)
* toke.c: Use `line_t` (rather than `I32`) to hold the value of `CopLINE()` (TAKAI Kousuke, 2022-10-13, 1 file, -1/+1)
* Use `LINE_Tf` thoroughly for formatting the value of CopLINE() (TAKAI Kousuke, 2022-10-13, 1 file, -3/+3)
  The value of CopLINE() used to be formatted in various ways: sometimes with `%ld` and a `(long)` cast, sometimes with `IVdf` and an `(IV)` cast, or with `%d`, and so on.
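A minimal sketch of the pattern, with hypothetical `my_line_t` and `MY_LINE_Tf` standing in for perl's `line_t` and `LINE_Tf`. They are assumed here to be a 32-bit unsigned type and its matching `<inttypes.h>` format; the point is that the format macro travels with the type, so no cast to `long` or `IV` is needed and large line numbers print correctly.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical equivalents of perl's line_t and LINE_Tf, assuming a
 * 32-bit unsigned line type. */
typedef uint32_t my_line_t;
#define MY_LINE_Tf PRIu32

/* The format macro matches the type, so no (long) or (IV) cast. */
int format_location(char *buf, size_t len, const char *file, my_line_t line)
{
    return snprintf(buf, len, "at %s line %" MY_LINE_Tf ".", file, line);
}
```

A line number such as 4000000000 would come out negative through a `(int)` cast and `%d`, but prints correctly here.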
* toke.c - silence build warning for non DEBUGGING mode (Yves Orton, 2022-09-25, 1 file, -5/+3)
  A previous commit introduced the variable 'bool syntax_error', which was only used under DEBUGGING and so produced unused-variable warnings on "production" builds. This reworks the code to not need the variable at all. Warning fixed:
      toke.c: In function ‘Perl_yyerror_pvn’:
      toke.c:12662:14: warning: unused variable ‘syntax_error’ [-Wunused-variable]
       12662 |         bool syntax_error = PERL_PARSE_IS_SYNTAX_ERROR(PL_error_count);
             |              ^~~~~~~~~~~~
* S_scan_heredoc - use SvPVCLEAR_FRESH on new SV (Richard Leach, 2022-09-18, 1 file, -1/+1)
* S_scan_heredoc: fresh sv functions close to point of use (Richard Leach, 2022-09-18, 1 file, -2/+2)
  S_scan_heredoc does a SvGROW on a fresh PVIV. A sv_grow_fresh is more efficient, plus it seems only really needed in the nearby "else" branch. In the "if" branch, sv_setsv_fresh can be used directly.
* Stop parsing on first syntax error (Yves Orton, 2022-09-09, 1 file, -15/+41)
  We try to keep parsing after many types of errors, up to a (current) maximum of 10 errors. Continuing after a semantic error (like undeclared variables) can be helpful, for instance by showing a set of common errors, but continuing after a syntax error isn't helpful most of the time, as the internal state of the parser can get confused and is not reliably restored in between attempts. This can sometimes produce completely bizarre errors which just obscure the true error, and has resulted in security tickets being filed in the past.
  This patch makes the parser stop after the first syntax error, while preserving the current behavior for other errors. An error is considered a syntax error if the error message from our internals is the literal text "syntax error". This may not be a complete list of true syntax errors; we can iterate on that in the future.
  This fixes the segfaults reported in issues #17397 and #16944, and likely fixes other "segfault due to compiler continuation after syntax error" bugs that we have on record, which has been a recurring issue over the years.
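The policy described above, as a hypothetical sketch (`note_error()`, `struct error_state` and `ERROR_CAP` are invented names, not toke.c code): a message that is exactly "syntax error" stops parsing immediately, while other errors are counted up to the cap of 10.

```c
#include <stdbool.h>
#include <string.h>

#define ERROR_CAP 10   /* the existing limit for non-syntax errors */

/* Hypothetical error bookkeeping for this sketch. */
struct error_state {
    int  count;
    bool stop;
};

/* Record one error; returns true when parsing should stop now. */
bool note_error(struct error_state *st, const char *msg)
{
    st->count++;
    if (strcmp(msg, "syntax error") == 0)
        st->stop = true;        /* never continue past a syntax error */
    else if (st->count >= ERROR_CAP)
        st->stop = true;        /* other errors: keep going up to the cap */
    return st->stop;
}
```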
* Replace sv_2mortal(newSVhek( with newSVhek_mortal (Richard Leach, 2022-08-05, 1 file, -1/+1)
  The new Perl_newSVhek_mortal function is slightly more efficient.
* toke.c: Variable should be declared Size_t, not SSize_t (Karl Williamson, 2022-08-03, 1 file, -1/+1)
* OP_RUNCV should be created by newSVOP() (Paul "LeoNerd" Evans, 2022-08-03, 1 file, -1/+4)
  This is in case rpeep converts it into an OP_CONST; it will need the space big enough to be a full SVOP.
  Before this commit it called `newPVOP()`, which wasn't technically correct, but since sizeof(PVOP) == sizeof(SVOP), nothing actually broke when the memory slab was reused. However, if the definition of either op type is changed so that this is no longer the case, it may cause otherwise hard-to-debug memory corruption.
* toke.c - consistently refuse octal digit vars, and allow ${10} under strict (Yves Orton, 2022-07-30, 1 file, -34/+57)
  Executive summary: in ${ .. } style notation, consistently forbid octal and allow multi-digit decimal values under strict. The vars ${1} through ${9} have always been allowed under strict, but ${10} threw an error, unlike its equivalent variable $10.
  In 60267e1d0e12bb5bdc88c62a18294336ab03d4b8 I patched toke.c to refuse octal like $001 but did not properly handle ${001} and related cases when the code was under 'use utf8'. Part of the reason was the confusing macro VALID_LEN_ONE_IDENT(), which despite its name does not restrict what it matches to things which are one character long.
  Since the VALID_LEN_ONE_IDENT() macro is used in only one place and its name and placement are confusing, I have moved it back into the code inline as part of this fix. I have also added more comments about what is going on, and moved the related comment directly next to the code that it affects. If it is moved out of this code, then we should think of a better name and be more careful and clear about checking things like length.
  I would argue the logic is used to parse what might be called a variable "description", and thus it is not identical to code which might validate an actual parsed variable name. E.g., ${^Var} is a description of the variable whose "name" is "\026ar". The exception, of course, is $^, whose name actually is "^".
  This includes more tests for allowed vars and forbidden var names.
  See issues #12948, #19986, and #19989.
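The naming rule for numeric variables in `${ ... }`, as summarised above, can be sketched as a small validator. `numeric_var_name_ok()` is a hypothetical helper, not toke.c code: it accepts only ASCII digits and rejects a leading `0` unless the name is exactly `0`, so `${10}` passes and `${001}` does not.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check for a numeric name written as ${...}: ASCII digits
 * only, and no leading '0' unless the whole name is "0". */
bool numeric_var_name_ok(const char *name, size_t len)
{
    if (len == 0)
        return false;
    for (size_t i = 0; i < len; i++)
        if (!isdigit((unsigned char)name[i]))
            return false;
    return !(len > 1 && name[0] == '0');   /* "001" looks octal: refuse */
}
```

Taking an explicit length rather than relying on a one-character assumption is the point the commit makes about VALID_LEN_ONE_IDENT().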
* Rename CVf_METHOD to CVf_NOWARN_AMBIGUOUS (Paul "LeoNerd" Evans, 2022-07-26, 1 file, -2/+2)
  Also renames the CvMETHOD* macro family to CvNOWARN_AMBIGUOUS*.
* Consistency - use DEBUG_TOKEN for all tokens (Branislav Zahradník, 2022-07-13, 1 file, -100/+100)
* fix no bareword_filehandles for method calls as first argument of LSTOPs (Tony Cook, 2022-07-04, 1 file, -6/+0)
  The original code in toke.c tried to detect a bareword as a filehandle during tokenization rather than once the expression following the LSTOP had been parsed.
  I've moved the checking for a bareword filehandle to the ck functions for each OP in most cases, which covers most OPs except for LSTOPs like print. To handle that I've added a check in newGVREF(). This means in most cases that the error is produced just as the bareword would normally be wrapped in an OP_RV2GV.
  The special case for that is readline(), which does its own rv2gv() at runtime, but it does have a ck function which can check for the bareword handle.
* Rename token types for keywords to add KW_... prefix (Paul "LeoNerd" Evans, 2022-07-02, 1 file, -54/+54)
  Some of the token types represent simple keywords; some of them do not. It's easier to read and work out what's going on if all the simple keyword ones have a common prefix, `KW_...` in this case.
  Additionally I've renamed the four `sub`-related keywords to have a bit more structure to them.
  Also added comments.
* Rename some grammar rules/tokens to avoid 'method' (Paul "LeoNerd" Evans, 2022-06-28, 1 file, -10/+10)
  These token names are shared between perl.y and toke.c, to communicate the nature of various tokens parsed from perl source. The name FUNCMETH used to refer to a method call with possible arguments
      ->NAME(...)
  whereas METHOD referred to one without even the parens
      ->NAME
  These names are a little confusing, and most importantly, METHOD was in the way of my adding a new `method` keyword as part of the upcoming work on 'use feature "class"'.
  As such, this simple rename moves them out of the way and makes them slightly more consistent and easier to read/remember, by calling them METHCALL and METHCALL0.
  This commit also renames the `method` grammar rule to `methodname`, for similar reasons.
  As all of these names are entirely internal to the tokenizer/parser, there is not expected to be any upstream CPAN incompatibility, or other issues, caused by these renames.
* toke.c: Remove Undefined C behavior (Karl Williamson, 2022-06-10, 1 file, -2/+6)
  Spotted by clang 14.
* perlapi: Document start_subparse (Karl Williamson, 2022-05-27, 1 file, -0/+17)
* perlapi: Document scan_vstring (Karl Williamson, 2022-05-10, 1 file, -0/+3)
* perlapi: Document filter_del (Karl Williamson, 2022-05-07, 1 file, -1/+9)
* toke.c: Reorder branches for clarity (Karl Williamson, 2022-04-01, 1 file, -4/+4)
  The trivial case should be handled first.
* toke.c: scan_str(): Rmv special handling for NUL delim (Karl Williamson, 2022-04-01, 1 file, -10/+9)
  Because we use ninstr(), which can handle NULs, no special handling of them is required.
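A length-bounded search in the spirit of ninstr() shows why a NUL delimiter needs no special case: both strings are delimited by end pointers, so a `'\0'` byte is compared like any other. `find_bytes()` is a hypothetical name, and this is a sketch rather than perl's implementation.

```c
#include <stddef.h>
#include <string.h>

/* Search for [little, lend) inside [big, bigend); both ranges are
 * given by start/end pointers, so embedded NUL bytes are ordinary
 * data. Returns a pointer to the first match, or NULL. */
const char *find_bytes(const char *big, const char *bigend,
                       const char *little, const char *lend)
{
    size_t n = (size_t)(lend - little);
    if (n == 0)
        return big;                          /* empty needle matches at start */
    if ((size_t)(bigend - big) < n)
        return NULL;                         /* haystack too short */
    for (const char *p = big; p <= bigend - n; p++)
        if (memcmp(p, little, n) == 0)
            return p;
    return NULL;
}
```

A `strstr()`-based search would stop at the first NUL in either string, which is exactly the case that previously needed separate handling.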