| Commit message | Author | Age | Files | Lines |
| |
This reverts commit f71d6157c7933c0d3df645f0411d97d7e2b66b2f.
Revert "Add new error "Can't use keyword '%s' as a label""
This reverts commit 28ccebc469d90664106fcc1cb73d7321c4b60716.
| |
Prior to now just about anything has been legal for a character name in
\N{...}. This means that legal code was broken by having \N{3,4} for
example mean [^\n]{3,4}. Such code doesn't come from standard
charnames, but from legal custom translators.
This patch deprecates "unreasonable" names. handy.h is changed by the
addition of macros that taken together define the names we deem
reasonable, namely those beginning with an alphabetic character and
continuing with alphanumerics and some punctuation.
toke.c is changed to parse each name and to raise a warning if any
problematic characters are found.
Some tests and diagnostic documentation are also included.
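A minimal sketch of the kind of check described above, not the handy.h macros themselves. The function name is hypothetical, and the set of continuation punctuation (space, '-', '(', ')', ':') is an assumption, since the message only says "some punctuation".

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper, not the real handy.h macros.
 * Assumption: a reasonable name starts with an ASCII letter and
 * continues with letters, digits, space, '-', '(', ')' or ':'. */
static bool is_reasonable_charname(const char *name)
{
    const unsigned char *p = (const unsigned char *)name;

    if (!isalpha(*p))
        return false;               /* empty, or does not start with alpha */
    for (p++; *p; p++) {
        if (isalnum(*p))
            continue;
        if (*p == ' ' || *p == '-' || *p == '(' || *p == ')' || *p == ':')
            continue;
        return false;               /* problematic character found */
    }
    return true;
}
```

Under these assumptions a name such as "LATIN SMALL LETTER A" passes, while "3,4" is rejected because it does not begin with a letter.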
| |
Commits c3acb9e0760135dfd888c0ee1b415777d784aabc, 867fa1e2da145229b4db2c6e8d5b51700c15f114
and f0e67a1d29102aa9905aecf2b0f98449697d5af3 added or changed functions that now require a
dVAR declaration to compile with -DPERL_GLOBAL_STRUCT.
| |
It was decided that this should be a fatal error instead of a warning.
Also some comments were updated.
| |
make regen embed.fnc
needs to be run on this patch.
This patch fixes Bugs #56444 and #62056.
Hopefully we have finally gotten this right. The parser used to handle
all the escaped constants, expanding \x2e to its single byte equivalent.
The problem is that for regexp patterns, this is a '.', which is a
metacharacter and has special meaning that \x2e does not. So things
were changed so that the parser didn't expand things in patterns. But
this causes problems for \N{NAME}, when the pattern doesn't get
evaluated until runtime, as for example when it has a scalar reference
in it, like qr/$foo\N{NAME}/. We want the value for \N{NAME} that was
in effect at the point during the parsing phase that this regex was
encountered in, but we don't actually look at it until runtime, when
these bug reports show that it is gone. The solution is for the
tokenizer to parse \N{NAME}, but to compile it into an intermediate
value that won't ever be considered a metacharacter. We have chosen to
compile NAME to its equivalent code point value, and express it in the
already existing \N{U+...} form. This indicates to the regex compiler
that the original input was a named character and retains the value it
had at that point in the parse.
This means that \N{U+...} now always must imply Unicode semantics for
the string or pattern it appeared in. Previously there was an
inconsistency, where effectively \N{NAME} implied Unicode semantics, but
\N{U+...} did not necessarily. So now, any string or pattern that has
either of these forms is utf8 upgraded.
A complication is that a charnames handler can return a sequence of
multiple characters instead of just one. To deal with this case, the
tokenizer will generate a constant of the form \N{U+c1.c2.c3...}, where
c1 etc are the individual characters. Perhaps this will be made a
public interface someday, but I decided to not expose it externally as
far as possible for now in case we find reason to change it. It is
possible to defeat this by passing it in a single quoted string to the
regex compiler, so the documentation will be changed to discourage that.
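As an illustration of the intermediate form, here is a small sketch that renders a sequence of code points as \N{U+c1.c2...}. The function name, the use of uppercase hex digits, and the error convention are assumptions made for this example; this is not the toke.c code.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical formatter: writes "\N{U+c1.c2...}" for n code points.
 * Returns the number of characters written (excluding the NUL), or -1
 * if n is 0 or the buffer is too small. */
static int format_named_seq(char *buf, size_t buflen,
                            const unsigned long *cps, size_t n)
{
    size_t used, i;
    int w;

    if (n == 0)
        return -1;
    w = snprintf(buf, buflen, "\\N{U+");
    if (w < 0 || (size_t)w >= buflen)
        return -1;
    used = (size_t)w;
    for (i = 0; i < n; i++) {
        /* Code points are separated by '.', written in hex. */
        w = snprintf(buf + used, buflen - used, "%s%lX",
                     i ? "." : "", cps[i]);
        if (w < 0 || (size_t)w >= buflen - used)
            return -1;
        used += (size_t)w;
    }
    w = snprintf(buf + used, buflen - used, "}");
    if (w < 0 || (size_t)w >= buflen - used)
        return -1;
    used += (size_t)w;
    return (int)used;
}
```

For the two code points 0x41 and 0x301 this produces the string \N{U+41.301}.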
A further complication is that \N can have an additional meaning: to
match a non-newline. This means that the two meanings have to be
disambiguated.
embed.fnc was changed to make public the function regcurly() in
regcomp.c so that it could be referred to in toke.c to see if the ... in
\N{...} is a legal quantifier like {2,}. This is used in the
disambiguation.
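The quantifier test used for the disambiguation can be pictured roughly as follows. This is a stand-in written for illustration, not the regcurly() source; it assumes a legal quantifier is '{', one or more digits, an optional ',' followed by zero or more digits, and '}'.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical stand-in for the quantifier check; s points at the
 * character that should be the opening brace. */
static bool looks_like_quantifier(const char *s)
{
    if (*s++ != '{')
        return false;
    if (!isdigit((unsigned char)*s))
        return false;               /* need at least one digit */
    while (isdigit((unsigned char)*s))
        s++;
    if (*s == ',') {                /* optional upper bound part */
        s++;
        while (isdigit((unsigned char)*s))
            s++;
    }
    return *s == '}';
}
```

With this rule, the text after \N in \N{3,4} or \N{2,} is treated as a quantifier, while \N{LATIN SMALL LETTER A} and \N{U+41} are not.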
toke.c was changed to update some out-dated relevant comments.
It now parses \N in patterns. If it determines that it isn't a named
sequence, it passes it through unchanged. This happens when there is no
brace after the \N, or no closing brace, or if the braces enclose a
legal quantifier. Previously there has been essentially no restriction
on what can come between the braces so that a custom translator can
accept virtually anything. Now, legal quantifiers are assumed to mean
that the \N is a "match non-newline that quantity of times".
I removed the #ifdef'd out code that had been left in, in case pack U
reverted to earlier behavior. I did this because it complicated things,
and because the change to pack U has been in long enough and shown that
it is correct so it's not likely to be reverted.
\N meaning a named character is handled differently depending on whether
this is a pattern or not. In all cases, the output will be upgraded to
utf8 because a named character implies Unicode semantics. If not a
pattern, the \N is parsed into a utf8 string, as before. Otherwise it
will be parsed into the intermediate \N{U+...} form. If the original
was already a valid \N{U+...} constant, it is passed through unchanged.
I now check that the sequence returned by the charnames handler is not
malformed, which was lacking before.
The code in regcomp.c which dealt with interfacing with the charnames
handler has been removed. All the values should be determined by the
time regcomp.c gets involved. The affected subroutine is necessarily
restructured.
An EXACT-type node is generated for the character sequence. Such a node
has a capacity of 255 bytes, and so it is possible to overflow it. This
wasn't checked for before, but now it is: a warning is issued and the
overflowing characters are discarded.
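The overflow check might look something like the sketch below. The function name and the truncated flag are hypothetical, and backing up to a UTF-8 character boundary is an assumption about how "overflowing characters are discarded" would be done; the regcomp.c code may differ.

```c
#include <stddef.h>
#include <string.h>

#define NODE_CAPACITY 255   /* capacity stated in the message above */

/* Hypothetical helper: copies at most NODE_CAPACITY bytes of valid
 * UTF-8 from src into node. Returns the number of bytes kept and sets
 * *truncated when characters had to be discarded. */
static size_t fill_exact_node(unsigned char *node, const unsigned char *src,
                              size_t len, int *truncated)
{
    size_t keep = len;

    *truncated = 0;
    if (len > NODE_CAPACITY) {
        keep = NODE_CAPACITY;
        /* src[keep] is the first discarded byte. If it is a UTF-8
         * continuation byte (10xxxxxx), back up so that no character
         * is cut in half. */
        while (keep > 0 && (src[keep] & 0xC0) == 0x80)
            keep--;
        *truncated = 1;             /* caller would issue the warning */
    }
    memcpy(node, src, keep);
    return keep;
}
```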
| |
VERSION;" statements
Fixes [perl #72432]
| |
Authors: John Peacock, David Golden and Zefram
The goal of this mega-patch is to enforce strict rules for version
numbers provided to 'package NAME VERSION' while formalizing the prior,
lax rules used for version object creation. Parsing for use() is
unchanged.
version.pm adds two globals, $STRICT and $LAX, containing regular
expressions that define the rules. There are two additional functions
-- version::is_strict and version::is_lax -- that test an argument
against these rules.
However, parsing of strings that might contain version numbers is done
in core via the Perl_scan_version function, which may be called during
compilation or may be called later when version objects are created by
Perl_new_version or Perl_upg_version.
A new helper function, Perl_prescan_version, has been added to validate
a string under either strict or lax rules. This is used in toke.c for
'package NAME VERSION' in strict mode and by Perl_scan_version in lax
mode. It matches the behavior of the version.pm regular expressions,
but does not use them directly.
A new test file, comp/packagev.t, validates strict and lax behaviors of
'package NAME VERSION' and 'version->new(VERSION)' respectively and
verifies their behavior against the $STRICT and $LAX regular
expressions, as well. Validating these two implementations should help
ensure they each work as intended.
Other files and tests have been modified as necessary to support these
changes.
There is remaining work to be done in a few areas:
* documenting all changes in behavior and new functions
* determining proper treatment of "," as decimal separators in
various locales
* updating diagnostics for new error messages
* porting changes back to the version.pm distribution on CPAN,
including pure-Perl versions
| |
category to a new 'illegalproto' subcategory.
Two warnings can be emitted when parsing a prototype -
Illegal character in prototype for %s : %s
Prototype after '%c' for %s : %s
The first one is emitted when any invalid character is found, the latter
when further prototype-type stuff is found after a slurpy entry (i.e. valid
character but in such a place as to be a no-op, and therefore likely a bug).
These warnings are distinct from those emitted when a sub is overwritten by
one with a different prototype, and when calls are made to subroutines with
prototypes - those are in the pre-existing sub-category 'prototype'.
Since modules such as signatures.pm and Web::Simple only need to disable
the warnings during parsing, I chose to add a new category containing only
these. Moving these warnings into the 'prototype' sub-category would have
forced authors to disable more warnings than they intended, and the entire
raison d'etre of this patch is to allow the specific warnings involved to
be disabled.
In order to maintain compatibility with existing code, the new location
needed to be a sub-category of 'syntax' - this means that
no warnings 'syntax';
will continue to work as expected - even in cases like Web::Simple where all
subcategories extant prior to this patch are re-enabled (this is another
reason why a move into the 'prototype' category would not achieve the desired
goal).
The category name 'illegalproto' was chosen because the most common warning
to encounter is the "Illegal character" one. The name is slightly inaccurate
in that it ignores the (relatively recent and little-known) second warning,
but it is easy to spot on an initial skim of perllexwarn and will behave as
expected by also disabling the case of an unusual prototype that happens to
look like a normal one.
This patch updates pod/perllexwarn.pod, perldiag.pod and perl5113delta.pod
to document the new category, toke.c and warnings.pl to create and implement
the new category, and a new test t/op/protowarn.t that verifies the new
behaviour in a number of cases. It also includes the files generated by
regen.pl that are found in the repo - notably warnings.h and lib/warnings.pm.
| |
Unsurprisingly, the nature of the bug is that I accidentally changed
the logic of one of the several types of space skipping. Fix attached.
| |
Eric Brine pointed out that this warning doesn't apply to ".":
C<rand . 1> shouldn't warn, since C<. 1> cannot be mistaken
for a floating point number.
| |
This turns on the Unicode semantics for uc/lc/ucfirst/lcfirst
operations on strings without the UTF8 bit set but with
characters whose code points are above 127. This replaces the
"legacy" pragma experiment.
Note that currently this feature sets both a bit in $^H and
an (unused) key in %^H. The bit in $^H could be replaced by
a flag on the uc/lc/etc op. It's probably not feasible to
test a key in %^H in pp_uc and friends each time we want to
know which semantics to apply.
| |
This way, it's correctly caught and blocked by Safe, separately
from eval "".
| |
> If your perl has -Dmad, the following program crashes:
>
> $ bleadperl -we '$x="x" x 257; eval "for $x"'
> *** glibc detected *** bleadperl: double free or corruption (!prev): 0x0000000001dca670 ***
Change 6136c704 changed S_scan_ident from:
e = d + destlen - 3;
to:
register char * const e = d + destlen + 3;
where e is used to mark the end of the buffer. This meant that the
various buffer end checks allowed the various buffers supplied to
S_scan_ident to overflow.
Attached is a fix, various tests with fencepost checks on different
identifier lengths, and the specific case mentioned in the ticket.
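To make the fencepost concrete, here is a much simplified sketch of an identifier copier with an end marker computed the way the original expression did. It is not S_scan_ident; the function name, the accepted character set, and the return convention are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical simplified scanner: copies [A-Za-z0-9_] characters from
 * src into dest. Returns the number of characters copied, or -1 if the
 * identifier does not fit. */
static int copy_ident(char *dest, size_t destlen, const char *src)
{
    char *d = dest;
    char *e;

    if (destlen < 4)
        return -1;
    /* End marker inside the buffer, leaving 3 bytes of headroom.
     * Writing "+ 3" here instead would place e past the end of dest,
     * so the d >= e check below could never stop an overflow. */
    e = d + destlen - 3;
    while (isalnum((unsigned char)*src) || *src == '_') {
        if (d >= e)
            return -1;              /* identifier too long */
        *d++ = *src++;
    }
    *d = '\0';
    return (int)(d - dest);
}
```

With an 8-byte buffer this accepts identifiers of up to 5 characters and rejects a 6-character one, which is the kind of boundary the fencepost tests exercise.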
Tony
Signed-off-by: H.Merijn Brand <h.m.brand@xs4all.nl>
| |
This fixes [perl #70884] : use VERSION in BLOCK without semicolon -> syntax error
| |
Tony Cook wrote:
>Smokes with -Dmad have been failing during make minitest since the
>middle of last month.
Mostly fixed by the attached patch. The fault is a logic error on my
part, probably from the early phase of developing the lexer API patch,
when I didn't properly understand the various buffer pointer variables.
In my tests with -Dmad, I'm still getting a test failure ("panic: input
overflow") from t/op/incfilter.t. The underlying problem is the filter
layer mishandling things when a filter function gives it a multiline
string, so it generates an invalid SV state (strlen(SvPVX(PL_linestr))
> SvCUR(PL_linestr)). This faulty state also occurs without -Dmad,
and so doesn't appear to be Mad-related, it just doesn't in practice
cause the test panic without -Dmad. I'm investigating this bug now.
-zefram
Signed-off-by: H.Merijn Brand <h.m.brand@xs4all.nl>
| |
Tim Bunce wrote:
>The primary issue is the off-by-one error in the array indexing.
There's a bit more to it than that. The indexing was off-by-one for
*some* places that process a new line, but correct for others, so the
saved source as a whole was mangled rather than simply offset. Also,
there were some redundant calls to update_debugger_info(), so some lines
got saved twice, in some cases off-by-one for one saving and not for
the other. The saved source is, therefore, hopelessly broken in 5.11.2.
Attached patch fixes the source saving. Includes a new test, which works
through all reachable places that source lines get saved. This should
close RT #70804.
-zefram
| |
#70091: Segmentation fault in hash lookup in regex substitution
| |
The attached patch contains these fixes for the lexer API work:
* fix MinGW-revealed problem in BOM logic (replacing Jan's patch)
* fix warnings from t/op/incfilter.t
* probably fix g++ failure due to goto bypassing initialisation
* perl5112delta update
-zefram
Signed-off-by: H.Merijn Brand <h.m.brand@xs4all.nl>
| |
The symbol FTELL_FOR_PIPE_IS_BROKEN is no longer being used
and should have been removed with the commit 4c84d7f2, which
removed the -P option.
| |
Commit f0e67a1d29102aa9905aecf2b0f98449697d5af3 changed the control
flow so that PerlIO_tell(PL_rsfp) could be called when PL_rsfp was
NULL, which produces a crash at least on Windows with the MSVCRT
runtime.
This change moves the check for whether PL_rsfp is NULL closer to
the location where it is actually tested, which gets rid of the
crashes. I have, however, *not* verified whether the changes in control
flow in f0e67a1d are otherwise correct.
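The pattern of testing the handle right where it is used can be sketched with plain stdio standing in for PerlIO. The helper name and the -1 return value for a missing handle are assumptions for illustration only.

```c
#include <stdio.h>

/* Hypothetical helper: only asks for the file position when there is
 * actually an open stream; a NULL handle yields -1 instead of being
 * passed on to ftell(). */
static long tell_if_open(FILE *fp)
{
    if (fp == NULL)
        return -1L;
    return ftell(fp);
}
```

Keeping the NULL test next to the call means later changes in control flow cannot reach the tell with a NULL handle.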
| |
Attached is a patch that adds a public API for the lowest layers of
lexing. This is meant to provide a solid foundation for the parsing that
Devel::Declare and similar modules do, and it complements the pluggable
keyword mechanism. The API consists of some existing variables combined
with some new functions, all marked as experimental (which making them
public certainly is).
| |
Currently no flags bits are used, and the length is cross-checked against
strlen() on the pointer, but the intent is to re-work the entire pad API to
be UTF-8 aware, from the current situation of char * pointers only.
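A rough sketch of the argument checking described here, assuming a function that receives a name pointer, an explicit length, and a flags word. The function name and signature are hypothetical and do not reproduce the real pad API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical validator: returns 1 when the arguments are acceptable,
 * 0 otherwise. No flag bits are defined yet, so any set bit is
 * rejected, and the explicit length must agree with strlen(). */
static int pad_name_args_ok(const char *name, size_t len, unsigned int flags)
{
    if (flags != 0)
        return 0;
    if (strlen(name) != len)
        return 0;
    return 1;
}
```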
| |
Attached is a patch that changes how the tokeniser looks up subroutines,
when they're referenced by a bareword, for prototype and const-sub
purposes. Formerly, it has looked up bareword subs directly in the
package, which is contrary to the way the generated op tree looks up
the sub, via an rv2cv op. The patch makes the tokeniser generate the
rv2cv op earlier, and dig around in that.
The motivation for this is to allow modules to hook the rv2cv op
creation, to affect the name->subroutine lookup process. Currently,
such hooking affects op execution as intended, but everything goes wrong
with a bareword ref where the tokeniser looks at some unrelated CV,
or a blank space, in the package. With the patch in place, an rv2cv
hook correctly affects the tokeniser and therefore the prototype-based
aspects of parsing.
The patch also changes ck_subr (which applies the argument context and
checking parts of prototype behaviour) to handle subs referenced by an
RV const op inside the rv2cv, where formerly it would only handle a gv
op inside the rv2cv. This is to support the most likely kind of
modified rv2cv op.
The attached patch is the resulting revised version of the bareword
sub patch. It incorporates the original patch (allowing rv2cv op
hookers to control prototype processing), the GV-downgrading addition,
and a mention in perldelta.
| |
API.
Currently no flags bits are used, and the length is cross-checked against
strlen() on the pointer, but the intent is to re-work the entire pad API to be
UTF-8 aware, from the current situation of char * pointers only.
| |
Date: Tue, 27 Oct 2009 01:29:40 +0000
From: Zefram <zefram@fysh.org>
To: perl5-porters@perl.org
Subject: bareword sub lookups
Attached is a patch that changes how the tokeniser looks up subroutines,
when they're referenced by a bareword, for prototype and const-sub
purposes. Formerly, it has looked up bareword subs directly in the
package, which is contrary to the way the generated op tree looks up
the sub, via an rv2cv op. The patch makes the tokeniser generate the
rv2cv op earlier, and dig around in that.
The motivation for this is to allow modules to hook the rv2cv op
creation, to affect the name->subroutine lookup process. Currently,
such hooking affects op execution as intended, but everything goes wrong
with a bareword ref where the tokeniser looks at some unrelated CV,
or a blank space, in the package. With the patch in place, an rv2cv
hook correctly affects the tokeniser and therefore the prototype-based
aspects of parsing.
The patch also changes ck_subr (which applies the argument context and
checking parts of prototype behaviour) to handle subs referenced by an
RV const op inside the rv2cv, where formerly it would only handle a gv
op inside the rv2cv. This is to support the most likely kind of modified
rv2cv op.
[This commit includes the Makefile.PL for XS-APITest-KeywordRPN missing
from the original patch, as well as updates to perldiag.pod and a
MANIFEST sort]
| |
An accident of Perl's parser meant that my $pi := 4; was parsed as an empty
attribute list. Empty attribute lists are ignored, hence the above is
equivalent to my $pi = 4; However, the fact that it is currently valid syntax
means that := cannot be used as a new token, without silently changing the
meaning of existing code.
Hence it is now deprecated, so that it can subsequently be removed, allowing
the possibility of := to be used as a new token with new semantics.
| |
regcomp.c stopped using it before 5.10, leaving only toke.c. The only code on
CPAN that uses it is copies of regcomp.c. Replace it with a static function,
with a cleaner interface.
| |
ckWARN(WARN_AMBIGUOUS) is cheaper than Perl_gv_fetchpvn_flags().
| |
This saves allocating an extra SV head and body.
| |
Easier said than done.
| |
Treat any (and all) octets after the BOM (or all, if there was no BOM) as
initial read data for the filter, and call it to convert them to the first
line, reading more if necessary. This correctly handles the "problem" that
UTF-16LE read as a line, on the assumption that it's ASCII/ISO-8859-*/UTF-8/etc,
will be truncated after the first octet of the "\n\0" pair that is "\n"
encoded as UTF-16LE. This fixes bug #69678.
Read from the upstream filter in block mode, rather than line mode.
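The truncation problem with UTF-16LE newlines can be shown with two small line-length helpers. Both are written for illustration (the names are hypothetical) and are not the filter code itself.

```c
#include <stddef.h>

/* Byte-wise view: number of octets up to and including the first 0x0A
 * octet, or len if there is none. On UTF-16LE input this stops after
 * the first octet of the "\n\0" pair. */
static size_t line_len_bytewise(const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        if (buf[i] == 0x0A)
            return i + 1;
    return len;
}

/* UTF-16LE view: scans 2-octet code units and returns the number of
 * octets up to and including the complete "\n\0" pair, or len if there
 * is none. */
static size_t line_len_utf16le(const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        if (buf[i] == 0x0A && buf[i + 1] == 0x00)
            return i + 2;
    return len;
}
```

For the six octets of "a\nb" in UTF-16LE, the byte-wise helper reports a 3-octet line, which splits the newline's code unit, while the code-unit helper reports 4 octets.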
| |
Given that t/TEST already had code to add -I../lib when testing UTF-8 with
-utf8, do likewise for testing UTF-16 with -utf16.
| |
Re-use the same SV for each call. Store it in IoTOP_GV(filter).
| |
Conceptually it's also wrong, as if there are source filters, the passed-in
file handle is not passed up the stack of filters for the topmost filter to
use to read from. It was in the parameter list from the first creation of
filter_gets() in 16d20bd98cd29be76029ebf04027a7edd34d817b, when calls to
sv_gets() were replaced by it.
| |
aa6dbd607b0a3d8a wrongly assumed that the filter's state SV was the SV passed
in as an argument to the filter read function.
| |
Remove it. All its writes were byte-for-byte identical with the memory they
overwrote. The bugs it attempts to fix are real, but caused by the design and
implementation of other parts of this routine and S_utf16_textfilter().
| |
Use IoLINES() on the filter's SV to determine which encoding is in use.