path: root/toke.c
Commit message  Author  Age  Files  Lines
* support "package Foo { ... }"  Zefram  2010-05-20  1  -2/+5
| Package block syntax limits the scope of the package declaration to the attached block. It's cleaner than requiring the declaration to come inside the block.
* Revert "New deprecation warning: Dot after %s literal is concatenation"  Jesse Vincent  2010-05-05  1  -9/+0
| This reverts commit 6fb472bab4fadd0ae2ca9624b74596afab4fb8cb.
| Zefram asked me to revert this as he's going to be doing something more pluggable.
* Revert "Deprecation warnings should always be mandatory since 5.12.0"  Jesse Vincent  2010-05-05  1  -2/+2
| This reverts commit a7e260e62a5e47961e908363da32ef16f41301b2.
| Zefram asked me to revert this as he's going to be doing something more pluggable.
* Revert "tweak "0x123.456" deprecation"  Jesse Vincent  2010-05-05  1  -12/+9
| This reverts commit 1183a10042af0734ee65e252f15bd820b7bbe686.
| Zefram asked me to revert this as he's going to be doing something more pluggable.
* If we're going to introduce an @@ array, we'll want to be able to parse $#@ too  Rafael Garcia-Suarez  2010-05-05  1  -1/+1
|
* tweak "0x123.456" deprecation  Zefram  2010-05-03  1  -9/+12
| Some improvements to the deprecation added in commit 6fb472bab4fadd0ae2ca9624b74596afab4fb8cb:
|   - warning message includes the word "deprecated"
|   - warning is in "syntax" category as well as "deprecated"
|   - more systematic tests
|   - dot detected more efficiently by incorporation into existing switch
|   - small doc rewording
|   - avoid the warning in t/op/taint.t
* remove Perl_pmflag  Robin Barker  2010-04-26  1  -13/+0
|
* Deal with "\c{", and its kin  Karl Williamson  2010-04-26  1  -6/+1
| make regen is needed
|
| This patch forbids non-ascii following the "\c". It also terminates for "\c{" with a message to contact p5p if there is need for continuing its current definition. And if the character following the "\c" causes the result to not be a control character, a warning is issued. This is currently 'deprecated', which by default is turned on. This can easily be changed later.
|
| This patch is the initial patch. It does not do any fancy showing the context where the problematic construct occurs. This can be added later.
|
| It gathers the 3 occurrences of evaluating \c and puts them in one common routine.
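The rules in this commit message can be sketched in C as follows. This is only an illustration: `ctrl_from_backslash_c` is a hypothetical name rather than the routine the patch adds, the mapping assumed is the conventional "uppercase, then XOR 64", and the fatal "\c{" case is reduced to a plain error return.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical helper (not the routine added by the patch): map the
 * character that follows "\c" to its control-character value.
 * Returns -1 when the input is rejected; *warn is set to 1 when the
 * result is not a control character. */
int ctrl_from_backslash_c(unsigned char source, int *warn)
{
    int result;

    *warn = 0;
    if (source > 127)
        return -1;              /* non-ASCII after "\c" is forbidden */
    if (source == '{')
        return -1;              /* "\c{" is refused outright */

    /* Assumed mapping: uppercase the character, then flip bit 6. */
    result = toupper(source) ^ 64;

    /* Control characters are 0..31 and 127; anything else warns. */
    if (!(result < 32 || result == 127))
        *warn = 1;
    return result;
}
```

With this sketch, "\cA" and "\ca" both yield 1 without a warning, while "\c1" yields 'q' and sets the warning flag.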
* PATCH: memory leak introduced in 5.12.0  Karl Williamson  2010-04-25  1  -7/+4
| There is a small possibility of a memory leak in toke.c when there is a deprecated character in the name in a \N{...} construct, and the Perl is embedded or something like that so that memory isn't freed up when it exits. This patch avoids the creation of a new scalar, and gives a better error message besides.
* consting in lex_stuff_pvn  Robin Barker  2010-04-23  1  -4/+4
|
* Deprecation warnings should always be mandatory since 5.12.0  Rafael Garcia-Suarez  2010-04-23  1  -2/+2
|
* New deprecation warning: Dot after %s literal is concatenation  James Mastros  2010-04-23  1  -0/+9
|
* use cBOOL for bool casts  David Mitchell  2010-04-15  1  -1/+1
|     bool b = (bool)some_int
| doesn't necessarily do what you think. In some builds, bool is defined as char, and that cast's behaviour is thus undefined. So this line in mg.c:
|     const bool was_temp = (bool)SvTEMP(sv);
| was actually setting was_temp to false even when the SVs_TEMP flag was set.
|
| Fix this by replacing all the (bool) casts with a new cBOOL() cast macro that (hopefully) does the right thing.
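A small C sketch of the failure mode described in this commit message. Everything here is an assumption made for illustration: `char_bool` stands in for a build where bool is a one-byte type (unsigned, so the narrowing is well defined), `DEMO_FLAG_TEMP` is a made-up flag bit above the low byte, and `DEMO_CBOOL` only mimics the idea of the cBOOL() macro rather than quoting its definition.

```c
#include <assert.h>

/* Assumption: a build where "bool" is a one-byte type. unsigned char is
 * used so that the narrowing conversion below is well defined. */
typedef unsigned char char_bool;

/* Hypothetical flag bit that lives above the low eight bits. */
#define DEMO_FLAG_TEMP 0x00080000UL

/* Plain cast: only the low byte survives, so the flag is lost. */
char_bool naive_bool(unsigned long flags)
{
    return (char_bool)(flags & DEMO_FLAG_TEMP);
}

/* cBOOL-style conversion: test for non-zero first, then narrow. */
#define DEMO_CBOOL(x) ((x) ? (char_bool)1 : (char_bool)0)

char_bool safe_bool(unsigned long flags)
{
    return DEMO_CBOOL(flags & DEMO_FLAG_TEMP);
}
```

With the flag set, `naive_bool` returns 0 while `safe_bool` returns 1, the same mismatch reported for was_temp.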
* [perl #74006] 5.12.0-RC stuffing bug  Zefram  2010-04-14  1  -0/+5
| There's a small bug in lex_stuff_pvn() that causes spurious syntax errors in an obscure situation. It happens if stuffing is performed on the last line of a file, and the line ends with a statement that lacks its terminating semicolon. Attached patch fixes and adds test.
* Revert "Revert "* Fixed typo in toke.c docs, identified by Zefram""  Jesse Vincent  2010-04-13  1  -1/+1
| This reverts commit 06164d6c3ad67ed7ba18030ae378f46f482a29af.
* Revert "* Fixed typo in toke.c docs, identified by Zefram"  Jesse Vincent  2010-04-12  1  -1/+1
| The commit was good, but we're in freeze for 5.12.0. I'd be happy to see this hit blead again after 5.12.0 is tagged.
|
| This reverts commit 675ac12c19e6fe00eff6e604a7d637bf621997ef.
* * Fixed typo in toke.c docs, identified by Zefram  brian d foy  2010-04-11  1  -1/+1
|
* Revert "Forbid labels with keyword names"  Jan Dubois  2010-03-02  1  -2/+0
| This reverts commit f71d6157c7933c0d3df645f0411d97d7e2b66b2f.
|
| Revert "Add new error "Can't use keyword '%s' as a label""
|
| This reverts commit 28ccebc469d90664106fcc1cb73d7321c4b60716.
* PATCH: deprecation warnings for unreasonable charnames  Karl Williamson  2010-02-20  1  -1/+64
| Prior to now just about anything has been legal for a character name in \N{...}. This means that legal code was broken by having \N{3,4} for example mean [^\n]{3,4}. Such code doesn't come from standard charnames, but from legal custom translators.
|
| This patch deprecates "unreasonable" names. handy.h is changed by the addition of macros that, taken together, define the names we deem reasonable, namely an alphabetic character to begin with, then alphanumerics and some punctuation as continuations. toke.c is changed to parse each name and to raise a warning if any problematic characters are found. Some tests and diagnostic documentation are also included.
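The "reasonable name" rule can be sketched in C as below. This is a sketch under stated assumptions: `is_reasonable_charname` and its helpers are hypothetical names, not the macros added to handy.h, and the continuation punctuation set (space, hyphen, parentheses, colon) is a guess, since the commit message only says "some punctuations".

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical: a name must begin with an alphabetic character. */
static int name_begin(unsigned char c)
{
    return isalpha(c) != 0;
}

/* Hypothetical: continuation characters are alphanumerics plus an
 * assumed punctuation set (space, hyphen, parentheses, colon). */
static int name_cont(unsigned char c)
{
    return isalnum(c) || (c != '\0' && strchr(" -():", c) != NULL);
}

/* Returns 1 if every character of the name is acceptable, else 0. */
int is_reasonable_charname(const char *name)
{
    const unsigned char *p = (const unsigned char *)name;

    if (!name_begin(*p))
        return 0;
    for (p++; *p != '\0'; p++) {
        if (!name_cont(*p))
            return 0;
    }
    return 1;
}
```

Under these assumptions a name such as "LATIN SMALL LETTER A" passes, while "3,4" fails on its first character, which is the case the commit message singles out.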
* Add some missing dVAR's  Marcus Holland-Moritz  2010-02-20  1  -0/+2
| Commits c3acb9e0760135dfd888c0ee1b415777d784aabc, 867fa1e2da145229b4db2c6e8d5b51700c15f114 and f0e67a1d29102aa9905aecf2b0f98449697d5af3 added or changed functions that now require a dVAR declaration to compile with -DPERL_GLOBAL_STRUCT.
* Avoid returning an undefined SV*  Rafael Garcia-Suarez  2010-02-19  1  -1/+2
|
* Make a missing right brace on \N{ fatal  Karl Williamson  2010-02-19  1  -24/+9
| It was decided that this should be a fatal error instead of a warning. Also some comments were updated.
* PATCH: [perl #56444] delayed interpolation of \N{...}  Karl Williamson  2010-02-19  1  -85/+299
| make regen embed.fnc needs to be run on this patch.
|
| This patch fixes Bugs #56444 and #62056.
|
| Hopefully we have finally gotten this right. The parser used to handle all the escaped constants, expanding \x2e to its single byte equivalent. The problem is that for regexp patterns, this is a '.', which is a metacharacter and has special meaning that \x2e does not. So things were changed so that the parser didn't expand things in patterns. But this causes problems for \N{NAME}, when the pattern doesn't get evaluated until runtime, as for example when it has a scalar reference in it, like qr/$foo\N{NAME}/. We want the value for \N{NAME} that was in effect at the point during the parsing phase that this regex was encountered in, but we don't actually look at it until runtime, when these bug reports show that it is gone.
|
| The solution is for the tokenizer to parse \N{NAME}, but to compile it into an intermediate value that won't ever be considered a metacharacter. We have chosen to compile NAME to its equivalent code point value, and express it in the already existing \N{U+...} form. This indicates to the regex compiler that the original input was a named character and retains the value it had at that point in the parse.
|
| This means that \N{U+...} now always must imply Unicode semantics for the string or pattern it appeared in. Previously there was an inconsistency, where effectively \N{NAME} implied Unicode semantics, but \N{U+...} did not necessarily. So now, any string or pattern that has either of these forms is utf8 upgraded.
|
| A complication is that a charnames handler can return a sequence of multiple characters instead of just one. To deal with this case, the tokenizer will generate a constant of the form \N{U+c1.c2.c3...}, where c1 etc are the individual characters. Perhaps this will be made a public interface someday, but I decided to not expose it externally as far as possible for now in case we find reason to change it. It is possible to defeat this by passing it in a single quoted string to the regex compiler, so the documentation will be changed to discourage that.
|
| A further complication is that \N can have an additional meaning: to match a non-newline. This means that the two meanings have to be disambiguated.
|
| embed.fnc was changed to make public the function regcurly() in regcomp.c so that it could be referred to in toke.c to see if the ... in \N{...} is a legal quantifier like {2,}. This is used in the disambiguation.
|
| toke.c was changed to update some out-dated relevant comments. It now parses \N in patterns. If it determines that it isn't a named sequence, it passes it through unchanged. This happens when there is no brace after the \N, or no closing brace, or if the braces enclose a legal quantifier. Previously there has been essentially no restriction on what can come between the braces so that a custom translator can accept virtually anything. Now, legal quantifiers are assumed to mean that the \N is a "match non-newline that quantity of times".
|
| I removed the #ifdef'd out code that had been left in in case pack U reverted to earlier behavior. I did this because it complicated things, and because the change to pack U has been in long enough and shown that it is correct so it's not likely to be reverted.
|
| \N meaning a named character is handled differently depending on whether this is a pattern or not. In all cases, the output will be upgraded to utf8 because a named character implies Unicode semantics. If not a pattern, the \N is parsed into a utf8 string, as before. Otherwise it will be parsed into the intermediate \N{U+...} form. If the original was already a valid \N{U+...} constant, it is passed through unchanged.
|
| I now check that the sequence returned by the charnames handler is not malformed, which was lacking before.
|
| The code in regcomp.c which dealt with interfacing with the charnames handler has been removed. All the values should be determined by the time regcomp.c gets involved. The affected subroutine is necessarily restructured.
|
| An EXACT-type node is generated for the character sequence. Such a node has a capacity of 255 bytes, and so it is possible to overflow it. This wasn't checked for before, but now it is, and a warning issued and the overflowing characters are discarded.
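The disambiguation between the two meanings of \N described in this commit message can be sketched in C roughly as follows. This is a minimal sketch, not the code in toke.c: `looks_like_quantifier` is a hypothetical stand-in for the check that regcurly() provides, and `classify_backslash_N` is an invented name for the three pass-through cases the message lists (no brace, no closing brace, braces enclosing a legal quantifier).

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical stand-in for the quantifier check; this is not the
 * regcurly() from regcomp.c. Accepts "{n}", "{n,}" and "{n,m}". */
int looks_like_quantifier(const char *s)
{
    if (*s++ != '{')
        return 0;
    if (!isdigit((unsigned char)*s))
        return 0;
    while (isdigit((unsigned char)*s))
        s++;
    if (*s == ',') {
        s++;
        while (isdigit((unsigned char)*s))
            s++;
    }
    return *s == '}';
}

enum n_kind { N_NON_NEWLINE, N_NAMED };

/* Classify the text that follows "\N" in a pattern. */
enum n_kind classify_backslash_N(const char *after_N)
{
    const char *p = after_N;

    if (*after_N != '{')
        return N_NON_NEWLINE;           /* no brace after \N */
    while (*p != '\0' && *p != '}')
        p++;
    if (*p != '}')
        return N_NON_NEWLINE;           /* no closing brace */
    if (looks_like_quantifier(after_N))
        return N_NON_NEWLINE;           /* e.g. \N{2,} */
    return N_NAMED;                     /* treated as a named character */
}
```

In this sketch "{3,4}" and "{2,}" are classified as quantifiers on a non-newline match, while "{LATIN SMALL LETTER A}" is classified as a named character.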
* Allow arbitrary whitespace between NAME and VERSION in "package NAME VERSION;" statements  Jesse Vincent  2010-02-03  1  -0/+1
| Fixes [perl #72432]
* Parse 'use NAME VERSION' with C locale  David Golden  2010-01-16  1  -0/+6
|
* Omnibus strict and lax version parsing  David Golden  2010-01-13  1  -1/+48
| Authors: John Peacock, David Golden and Zefram
|
| The goal of this mega-patch is to enforce strict rules for version numbers provided to 'package NAME VERSION' while formalizing the prior, lax rules used for version object creation. Parsing for use() is unchanged.
|
| version.pm adds two globals, $STRICT and $LAX, containing regular expressions that define the rules. There are two additional functions -- version::is_strict and version::is_lax -- that test an argument against these rules.
|
| However, parsing of strings that might contain version numbers is done in core via the Perl_scan_version function, which may be called during compilation or may be called later when version objects are created by Perl_new_version or Perl_upg_version.
|
| A new helper function, Perl_prescan_version, has been added to validate a string under either strict or lax rules. This is used in toke.c for 'package NAME VERSION' in strict mode and by Perl_scan_version in lax mode. It matches the behavior of the version.pm regular expressions, but does not use them directly.
|
| A new test file, comp/packagev.t, validates strict and lax behaviors of 'package NAME VERSION' and 'version->new(VERSION)' respectively and verifies their behavior against the $STRICT and $LAX regular expressions, as well. Validating these two implementations should help ensure they each work as intended.
|
| Other files and tests have been modified as necessary to support these changes.
|
| There is remaining work to be done in a few areas:
|   * documenting all changes in behavior and new functions
|   * determining proper treatment of "," as decimal separators in various locales
|   * updating diagnostics for new error messages
|   * porting changes back to the version.pm distribution on CPAN, including pure-Perl versions
* Move prototype parsing related warnings from the 'syntax' top level warnings category to a new 'illegalproto' subcategory.  Matt S Trout  2010-01-10  1  -4/+4
| Two warnings can be emitted when parsing a prototype -
|     Illegal character in prototype for %s : %s
|     Prototype after '%c' for %s : %s
| The first one is emitted when any invalid character is found, the latter when further prototype-type stuff is found after a slurpy entry (i.e. valid character but in such a place as to be a no-op, and therefore likely a bug).
|
| These warnings are distinct from those emitted when a sub is overwritten by one with a different prototype, and when calls are made to subroutines with prototypes - those are in the pre-existing sub-category 'prototype'.
|
| Since modules such as signatures.pm and Web::Simple only need to disable the warnings during parsing, I chose to add a new category containing only these. Moving these warnings into the 'prototype' sub-category would have forced authors to disable more warnings than they intended, and the entire raison d'etre of this patch is to allow the specific warnings involved to be disabled.
|
| In order to maintain compatibility with existing code, the new location needed to be a sub-category of 'syntax' - this means that
|     no warnings 'syntax';
| will continue to work as expected - even in cases like Web::Simple where all subcategories extant prior to this patch are re-enabled (this is another reason why a move into the 'prototype' category would not achieve the desired goal).
|
| The category name 'illegalproto' was chosen because the most common warning to encounter is the "Illegal character" one, and therefore 'illegalproto' while minorly inaccurate by ignoring the (relatively recent and unknown) second warning is an easy name to spot on an initial skim of perllexwarn and will behave as expected by also disabling the case of an unusual prototype that happens to look like a normal one.
|
| This patch updates pod/perllexwarn.pod, perldiag.pod and perl5113delta.pod to document the new category, toke.c and warnings.pl to create and implement the new category, and a new test t/op/protowarn.t that verifies the new behaviour in a number of cases. It also includes the files generated by regen.pl that are found in the repo - notably warnings.h and lib/warnings.pm.
* [perl #71748] Bleadperl f0e67a1 breaks CPAN: Template::Plugin::YAML::Encode 0.02  Zefram  2010-01-05  1  -12/+8
| Unsurprisingly, the nature of the bug is that I accidentally changed the logic of one of the several types of space skipping. Fix attached.
* Allow "{sub f}" to compile  Vincent Pit  2010-01-03  1  -1/+1
|
* Remove spurious case of warning "Use of %s without parentheses is ambiguous"  Rafael Garcia-Suarez  2009-12-20  1  -2/+0
| Eric Brine pointed out that this warning doesn't apply to ".", as in C<rand . 1>, that shouldn't warn since C<. 1> cannot be mistaken for a floating point number.
* Introduce C<use feature "unicode_strings">  Rafael Garcia-Suarez  2009-12-20  1  -1/+1
| This turns on the unicode semantics for uc/lc/ucfirst/lcfirst operations on strings without the UTF8 bit set but with ASCII characters higher than 127. This replaces the "legacy" pragma experiment.
|
| Note that currently this feature sets both a bit in $^H and a (unused) key in %^H. The bit in $^H could be replaced by a flag on the uc/lc/etc op. It's probably not feasible to test a key in %^H in pp_uc and friends each time we want to know which semantics to apply.
* Make eval {} compile directly to OP_ENTERTRY  Rafael Garcia-Suarez  2009-12-20  1  -2/+8
| This way, it's correctly caught and blocked by Safe, separately from eval "".
* Fix for [perl #70910] wrong line number in syntax error message  Zefram  2009-12-09  1  -1/+2
|
* -Dmad: double free or corruption  Tony Cook  2009-12-01  1  -1/+1
| > If your perl has -Dmad, the following program crashes:
| >
| > $ bleadperl -we '$x="x" x 257; eval "for $x"'
| > *** glibc detected *** bleadperl: double free or corruption (!prev): 0x0000000001dca670 ***
|
| Change 6136c704 changed S_scan_ident from:
|     e = d + destlen - 3;
| to:
|     register char * const e = d + destlen + 3;
| where e is used to mark the end of the buffer. This meant that the various buffer end checks allowed the various buffers supplied to S_scan_ident to overflow.
|
| Attached is a fix, various tests with fencepost checks on different identifier lengths, and the specific case mentioned in the ticket.
|
| Tony
|
| Signed-off-by: H.Merijn Brand <h.m.brand@xs4all.nl>
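The role of the end marker can be illustrated with a short C sketch. It is not S_scan_ident: `copy_ident` is a hypothetical helper that copies identifier characters into a caller-supplied buffer, and the three bytes of slack simply mirror the "- 3" quoted in the commit message.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not S_scan_ident itself: copy identifier
 * characters from src into dest, which holds destlen bytes.
 * Returns the number of characters copied, or -1 if the identifier
 * does not fit. */
long copy_ident(const char *src, char *dest, size_t destlen)
{
    char *d = dest;
    char *e;

    if (destlen < 4)
        return -1;

    /* End marker placed before the real end of the buffer, leaving
     * slack for the terminating NUL. Writing "+ 3" here instead would
     * put the marker past the buffer and let the loop overflow it. */
    e = dest + destlen - 3;

    while (isalnum((unsigned char)*src) || *src == '_') {
        if (d >= e)
            return -1;          /* fencepost check against the marker */
        *d++ = *src++;
    }
    *d = '\0';
    return (long)(d - dest);
}
```

With an 8-byte buffer, this sketch accepts identifiers of up to five characters and rejects a six-character one, which is the kind of boundary the fencepost tests mentioned in the commit exercise.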
* Fix -DPERL_NO_UTF16_FILTER  Eric Brine  2009-11-30  1  -7/+15
|
* Allow a closing brace after an "use VERSION"  Vincent Pit  2009-11-28  1  -1/+2
| This fixes [perl #70884] : use VERSION in BLOCK without semicolon -> syntax error
* -Dmad minitest failure bisect  Zefram  2009-11-26  1  -3/+0
| Tony Cook wrote:
| >Smokes with -Dmad have been failing during make minitest since the
| >middle of last month.
|
| Mostly fixed by the attached patch. The fault is a logic error on my part, probably from the early phase of developing the lexer API patch, when I didn't properly understand the various buffer pointer variables.
|
| In my tests with -Dmad, I'm still getting a test failure ("panic: input overflow") from t/op/incfilter.t. The underlying problem is the filter layer mishandling things when a filter function gives it a multiline string, so it generates an invalid SV state (strlen(SvPVX(PL_linestr)) > SvCUR(PL_linestr)). This faulty state also occurs without -Dmad, and so doesn't appear to be Mad-related, it just doesn't in practice cause the test panic without -Dmad. I'm investigating this bug now.
|
| -zefram
|
| Signed-off-by: H.Merijn Brand <h.m.brand@xs4all.nl>
* perl-5.11.2 breaks NYTProf savesrc option (Lexer API suspected)  Zefram  2009-11-25  1  -5/+8
| Tim Bunce wrote:
| >The primary issue is the off-by-one error in the array indexing.
|
| There's a bit more to it than that. The indexing was off-by-one for *some* places that process a new line, but correct for others, so the saved source as a whole was mangled rather than simply offset. Also, there were some redundant calls to update_debugger_info(), so some lines got saved twice, in some cases off-by-one for one saving and not for the other. The saved source is, therefore, hopelessly broken in 5.11.2.
|
| Attached patch fixes the source saving. Includes a new test, which works through all reachable places that source lines get saved. This should close RT #70804.
|
| -zefram
* Also skip spaces after variable if we are within lexical brackets. Fixes #70091: Segmentation fault in hash lookup in regex substitution  Gerard Goossen  2009-11-25  1  -1/+1
* lexer API fixes  Zefram  2009-11-19  1  -3/+6
| The attached patch contains these fixes for the lexer API work:
|   * fix MinGW-revealed problem in BOM logic (replacing Jan's patch)
|   * fix warnings from t/op/incfilter.t
|   * probably fix g++ failure due to goto bypassing initialisation
|   * perl5112delta update
|
| -zefram
|
| Signed-off-by: H.Merijn Brand <h.m.brand@xs4all.nl>
* Remove dead preprocessor code from toke.c  Jan Dubois  2009-11-16  1  -13/+0
| The symbol FTELL_FOR_PIPE_IS_BROKEN is no longer being used and should have been removed with the commit 4c84d7f2, which removed the -P option.
* Fix crash in refactored lexer internals  Jan Dubois  2009-11-16  1  -1/+1
| Commit f0e67a1d29102aa9905aecf2b0f98449697d5af3 changed the control flow so that PerlIO_tell(PL_rsfp) could be called when PL_rsfp was NULL, which produces a crash at least on Windows with the MSVCRT runtime.
|
| This change moves the detection of whether PL_rsfp is NULL closer to the location where it is actually tested, which gets rid of the crashes. I however have *not* verified if the changes in control flow in f0e67a1d are otherwise correct or not.
* lexer API  Zefram  2009-11-15  1  -213/+758
| Attached is a patch that adds a public API for the lowest layers of lexing. This is meant to provide a solid foundation for the parsing that Devel::Declare and similar modules do, and it complements the pluggable keyword mechanism. The API consists of some existing variables combined with some new functions, all marked as experimental (which making them public certainly is).
* Add length and flags arguments to Perl_allocmy().  Nicholas Clark  2009-11-09  1  -2/+2
| Currently no flags bits are used, and the length is cross-checked against strlen() on the pointer, but the intent is to re-work the entire pad API to be UTF-8 aware, from the current situation of char * pointers only.
* Bareword sub lookups  Zefram  2009-11-08  1  -30/+41
| Attached is a patch that changes how the tokeniser looks up subroutines, when they're referenced by a bareword, for prototype and const-sub purposes. Formerly, it has looked up bareword subs directly in the package, which is contrary to the way the generated op tree looks up the sub, via an rv2cv op. The patch makes the tokeniser generate the rv2cv op earlier, and dig around in that.
|
| The motivation for this is to allow modules to hook the rv2cv op creation, to affect the name->subroutine lookup process. Currently, such hooking affects op execution as intended, but everything goes wrong with a bareword ref where the tokeniser looks at some unrelated CV, or a blank space, in the package. With the patch in place, an rv2cv hook correctly affects the tokeniser and therefore the prototype-based aspects of parsing.
|
| The patch also changes ck_subr (which applies the argument context and checking parts of prototype behaviour) to handle subs referenced by an RV const op inside the rv2cv, where formerly it would only handle a gv op inside the rv2cv. This is to support the most likely kind of modified rv2cv op.
|
| The attached patch is the resulting revised version of the bareword sub patch. It incorporates the original patch (allowing rv2cv op hookers to control prototype processing), the GV-downgrading addition, and a mention in perldelta.
* Add length and flags arguments to Perl_pad_findmy(), moving it to the public API.  Nicholas Clark  2009-11-07  1  -2/+2
| Currently no flags bits are used, and the length is cross-checked against strlen() on the pointer, but the intent is to re-work the entire pad API to be UTF-8 aware, from the current situation of char * pointers only.
* Placate a warning from Borland's compiler.  Nicholas Clark  2009-11-07  1  -1/+1
|
* Implement facility to plug in syntax triggered by keywords  Jesse Vincent  2009-11-05  1  -17/+58
| Date: Tue, 27 Oct 2009 01:29:40 +0000
| From: Zefram <zefram@fysh.org>
| To: perl5-porters@perl.org
| Subject: bareword sub lookups
|
| Attached is a patch that changes how the tokeniser looks up subroutines, when they're referenced by a bareword, for prototype and const-sub purposes. Formerly, it has looked up bareword subs directly in the package, which is contrary to the way the generated op tree looks up the sub, via an rv2cv op. The patch makes the tokeniser generate the rv2cv op earlier, and dig around in that.
|
| The motivation for this is to allow modules to hook the rv2cv op creation, to affect the name->subroutine lookup process. Currently, such hooking affects op execution as intended, but everything goes wrong with a bareword ref where the tokeniser looks at some unrelated CV, or a blank space, in the package. With the patch in place, an rv2cv hook correctly affects the tokeniser and therefore the prototype-based aspects of parsing.
|
| The patch also changes ck_subr (which applies the argument context and checking parts of prototype behaviour) to handle subs referenced by an RV const op inside the rv2cv, where formerly it would only handle a gv op inside the rv2cv. This is to support the most likely kind of modified rv2cv op.
|
| [This commit includes the Makefile.PL for XS-APITest-KeywordRPN missing from the original patch, as well as updates to perldiag.pod and a MANIFEST sort]
* Deprecate use of := to mean an empty attribute list in my $pi := 4;  Nicholas Clark  2009-11-04  1  -0/+3
| An accident of Perl's parser meant that
|     my $pi := 4;
| was parsed as an empty attribute list. Empty attribute lists are ignored, hence the above is equivalent to
|     my $pi = 4;
| However, the fact that it is currently valid syntax means that := cannot be used as new token, without silently changing the meaning of existing code.
|
| Hence it is now deprecated, so that it can subsequently be removed, allowing the possibility of := to be used as a new token with new semantics.
* S_utf16_textfilter() was not returning EOF correctly in some situations.  Nicholas Clark  2009-11-01  1  -2/+6
|