| Commit message | Author | Age | Files | Lines |
|
Package block syntax limits the scope of the package declaration to the
attached block. It's cleaner than requiring the declaration to come
inside the block.
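For illustration only (not part of the commit message), a minimal sketch of
the two forms, using placeholder package names:

    package Foo {
        our $VERSION = 1;         # scoped to the attached block
        sub hi { "hi from Foo" }
    }
    # here we are back in the surrounding package

    package Bar;                  # older form: in effect until the end of
    sub hi { "hi from Bar" }      # the enclosing scope or file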
|
This reverts commit 6fb472bab4fadd0ae2ca9624b74596afab4fb8cb.
Zefram asked me to revert this as he's going to be doing something more
pluggable.
|
This reverts commit a7e260e62a5e47961e908363da32ef16f41301b2.
Zefram asked me to revert this as he's going to be doing something more
pluggable.
|
This reverts commit 1183a10042af0734ee65e252f15bd820b7bbe686.
Zefram asked me to revert this as he's going to be doing something more
pluggable.
|
Some improvements to the deprecation added in commit
6fb472bab4fadd0ae2ca9624b74596afab4fb8cb:
- warning message includes the word "deprecated"
- warning is in "syntax" category as well as "deprecated"
- more systematic tests
- dot detected more efficiently by incorporation into existing switch
- small doc rewording
- avoid the warning in t/op/taint.t
|
make regen is needed.
This patch forbids non-ASCII characters following "\c". It also makes
"\c{" fatal, with a message to contact p5p if there is a need to keep
its current definition. And if the character following the "\c" would
not yield a control character, a warning is issued. The warning is
currently in the 'deprecated' category, which is on by default; this
can easily be changed later.
This is the initial patch. It does not do anything fancy like showing
the context where the problematic construct occurs; that can be added
later.
It also gathers the 3 occurrences of evaluating \c and puts them in one
common routine.
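A rough sketch of the behaviour described above (characters chosen purely
for illustration; the uppercase-then-XOR-with-0x40 mapping is the
long-standing \c rule):

    print "\cA";    # fine: a real control character, chr(1)
    print "\c{";    # now a fatal error, with a message to contact p5p
                    # if its old meaning is still needed
    print "\c1";    # '1' maps to 'q', not a control character, so this
                    # now draws a (default-on) deprecation-category warning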
|
There is a small possibility of a memory leak in toke.c when there is a
deprecated character in the name in a \N{...} construct and the perl
interpreter is embedded (or similar), so that memory isn't freed up when
it exits. This patch avoids the creation of a new scalar, and gives a
better error message besides.
|
    bool b = (bool)some_int
doesn't necessarily do what you think. In some builds, bool is defined as
char, and that cast then simply truncates the value to its low byte. So this
line in mg.c:
    const bool was_temp = (bool)SvTEMP(sv);
was actually setting was_temp to false even when the SVs_TEMP flag was set,
because that flag bit lies above the low byte.
Fix this by replacing all the (bool) casts with a new cBOOL() cast macro
that (hopefully) does the right thing.
|
There's a small bug in lex_stuff_pvn() that causes spurious syntax errors
in an obscure situation. It happens if stuffing is performed on the
last line of a file, and the line ends with a statement that lacks its
terminating semicolon. Attached patch fixes and adds test.
|
This reverts commit 06164d6c3ad67ed7ba18030ae378f46f482a29af.
|
The commit was good, but we're in freeze for 5.12.0. I'd be happy to
see this hit blead again after 5.12.0 is tagged.
This reverts commit 675ac12c19e6fe00eff6e604a7d637bf621997ef.
|
This reverts commit f71d6157c7933c0d3df645f0411d97d7e2b66b2f.
Revert "Add new error "Can't use keyword '%s' as a label""
This reverts commit 28ccebc469d90664106fcc1cb73d7321c4b60716.
|
Prior to now, just about anything has been legal for a character name in
\N{...}. This means that legal code was broken by having \N{3,4}, for
example, come to mean [^\n]{3,4}. Such code doesn't come from standard
charnames, but from legal custom translators.
This patch deprecates "unreasonable" names. handy.h is changed by the
addition of macros that, taken together, define the names we deem
reasonable: names beginning with an alphabetic character, with
alphanumerics and some punctuation allowed as continuations.
toke.c is changed to parse each name and to raise a warning if any
problematic characters are found.
Some tests and diagnostic documentation are also included.
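To make the ambiguity concrete, a small sketch (the custom translator is
hypothetical):

    # With a custom charnames translator that accepts arbitrary strings,
    # "3,4" could be a character *name* -- but in a pattern the same text
    # reads as a quantifier:
    my $re = qr/\N{3,4}/;    # i.e. [^\n]{3,4}, three to four non-newlines
    # Names that begin with an alphabetic character, such as
    # \N{LATIN SMALL LETTER A}, remain unambiguous and are the kind deemed
    # "reasonable" here.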
|
Commits c3acb9e0760135dfd888c0ee1b415777d784aabc, 867fa1e2da145229b4db2c6e8d5b51700c15f114
and f0e67a1d29102aa9905aecf2b0f98449697d5af3 added or changed functions that now require a
dVAR declaration to compile with -DPERL_GLOBAL_STRUCT.
|
It was decided that this should be a fatal error instead of a warning.
Also some comments were updated.
|
make regen needs to be run on this patch, as embed.fnc has changed.
This patch fixes Bugs #56444 and #62056.
Hopefully we have finally gotten this right. The parser used to handle
all the escaped constants, expanding \x2e to its single byte equivalent.
The problem is that for regexp patterns, this is a '.', which is a
metacharacter and has special meaning that \x2e does not. So things
were changed so that the parser didn't expand things in patterns. But
this causes problems for \N{NAME}, when the pattern doesn't get
evaluated until runtime, as for example when it has a scalar reference
in it, like qr/$foo\N{NAME}/. We want the value for \N{NAME} that was
in effect at the point in the parsing phase where this regex was
encountered, but we don't actually look at it until runtime, by which
time these bug reports show that it is gone. The solution is for the
tokenizer to parse \N{NAME}, but to compile it into an intermediate
value that won't ever be considered a metacharacter. We have chosen to
compile NAME to its equivalent code point value, and express it in the
already existing \N{U+...} form. This indicates to the regex compiler
that the original input was a named character and retains the value it
had at that point in the parse.
This means that \N{U+...} now always must imply Unicode semantics for
the string or pattern it appeared in. Previously there was an
inconsistency, where effectively \N{NAME} implied Unicode semantics, but
\N{U+...} did not necessarily. So now, any string or pattern that has
either of these forms is utf8 upgraded.
A complication is that a charnames handler can return a sequence of
multiple characters instead of just one. To deal with this case, the
tokenizer will generate a constant of the form \N{U+c1.c2.c2...}, where
c1 etc are the individual characters. Perhaps this will be made a
public interface someday, but for now I have avoided exposing it
externally as far as possible, in case we find reason to change it. It is
possible to defeat this by passing it in a single quoted string to the
regex compiler, so the documentation will be changed to discourage that.
A further complication is that \N can have an additional meaning: to
match a non-newline. This means that the two meanings have to be
disambiguated.
embed.fnc was changed to make public the function regcurly() in
regcomp.c so that it could be referred to in toke.c to see if the ... in
\N{...} is a legal quantifier like {2,}. This is used in the
disambiguation.
toke.c was changed to update some out-dated relevant comments.
It now parses \N in patterns. If it determines that it isn't a named
sequence, it passes it through unchanged. This happens when there is no
brace after the \N, or no closing brace, or if the braces enclose a
legal quantifier. Previously there was essentially no restriction
on what could come between the braces, so that a custom translator could
accept virtually anything. Now, legal quantifiers are assumed to mean
that the \N is a "match non-newline that quantity of times".
I removed the #ifdef'd-out code that had been left in, in case pack U
reverted to earlier behavior. I did this because it complicated things,
and because the change to pack U has been in long enough, and has shown
itself correct, so it's not likely to be reverted.
\N meaning a named character is handled differently depending on whether
this is a pattern or not. In all cases, the output will be upgraded to
utf8 because a named character implies Unicode semantics. If not a
pattern, the \N is parsed into a utf8 string, as before. Otherwise it
will be parsed into the intermediate \N{U+...} form. If the original
was already a valid \N{U+...} constant, it is passed through unchanged.
I now check that the sequence returned by the charnames handler is not
malformed, which was lacking before.
The code in regcomp.c which dealt with interfacing with the charnames
handler has been removed. All the values should be determined by the
time regcomp.c gets involved. The affected subroutine is necessarily
restructured.
An EXACT-type node is generated for the character sequence. Such a node
has a capacity of 255 bytes, and so it is possible to overflow it. This
wasn't checked for before, but now it is, and a warning issued and the
overflowing characters are discarded.
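A short sketch of the user-visible effect (the character is just an
example):

    use charnames ":full";

    my $foo = qr/foo/;
    my $re  = qr/$foo\N{WHITE SMILING FACE}/;   # tokenised to \N{U+263A}, so
                                                # the value survives to runtime
    print "matched\n" if "foo\x{263A}" =~ $re;

    # \N{U+...} (and hence \N{NAME}) now implies Unicode semantics, so the
    # pattern is upgraded to utf8; multi-character results from a custom
    # translator travel internally in the \N{U+c1.c2...} form.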
|
VERSION;" statements
Fixes [perl #72432]
|
Authors: John Peacock, David Golden and Zefram
The goal of this mega-patch is to enforce strict rules for version
numbers provided to 'package NAME VERSION' while formalizing the prior,
lax rules used for version object creation. Parsing for use() is
unchanged.
version.pm adds two globals, $STRICT and $LAX, containing regular
expressions that define the rules. There are two additional functions
-- version::is_strict and version::is_lax -- that test an argument
against these rules.
However, parsing of strings that might contain version numbers is done
in core via the Perl_scan_version function, which may be called during
compilation or may be called later when version objects are created by
Perl_new_version or Perl_upg_version.
A new helper function, Perl_prescan_version, has been added to validate
a string under either strict or lax rules. This is used in toke.c for
'package NAME VERSION' in strict mode and by Perl_scan_version in lax
mode. It matches the behavior of the version.pm regular expressions,
but does not use them directly.
A new test file, comp/packagev.t, validates strict and lax behaviors of
'package NAME VERSION' and 'version->new(VERSION)' respectively and
verifies their behavior against the $STRICT and $LAX regular
expressions, as well. Validating these two implementations should help
ensure they each work as intended.
Other files and tests have been modified as necessary to support these
changes.
There is remaining work to be done in a few areas:
* documenting all changes in behavior and new functions
* determining proper treatment of "," as decimal separators in
various locales
* updating diagnostics for new error messages
* porting changes back to the version.pm distribution on CPAN,
including pure-Perl versions
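As a rough illustration of the two rule sets (assuming a version.pm recent
enough to expose these functions):

    use version 0.77;

    print version::is_strict('v1.2.3') ? "strict\n" : "not strict\n";  # strict
    print version::is_strict('1.2_3')  ? "strict\n" : "not strict\n";  # not strict
    print version::is_lax('1.2_3')     ? "lax\n"    : "not lax\n";     # lax

    # 'package Foo v1.2.3;' is validated against the strict rules;
    # version->new(...) keeps accepting the lax forms.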
|
This patch moves the prototype-parsing warnings from the 'syntax' warnings
category to a new 'illegalproto' subcategory.
Two warnings can be emitted when parsing a prototype -
Illegal character in prototype for %s : %s
Prototype after '%c' for %s : %s
The first one is emitted when any invalid character is found, the latter
when further prototype-type stuff is found after a slurpy entry (i.e. valid
character but in such a place as to be a no-op, and therefore likely a bug).
These warnings are distinct from those emitted when a sub is overwritten by
one with a different prototype, and when calls are made to subroutines with
prototypes - those are in the pre-existing sub-category 'prototype'.
Since modules such as signatures.pm and Web::Simple only need to disable
the warnings during parsing, I chose to add a new category containing only
these. Moving these warnings into the 'prototype' sub-category would have
forced authors to disable more warnings than they intended, and the entire
raison d'etre of this patch is to allow the specific warnings involved to
be disabled.
In order to maintain compatibility with existing code, the new location
needed to be a sub-category of 'syntax' - this means that
no warnings 'syntax';
will continue to work as expected - even in cases like Web::Simple where all
subcategories extant prior to this patch are re-enabled (this is another
reason why a move into the 'prototype' category would not achieve the desired
goal).
The category name 'illegalproto' was chosen because the most common warning
to encounter is the "Illegal character" one, and therefore 'illegalproto'
while minorly inaccurate by ignoring the (relatively recent and unknown)
second warning is an easy name to spot on an initial skim of perllexwarn
and will behave as expected by also disabling the case of an unusual prototype
that happens to look like a normal one.
This patch updates pod/perllexwarn.pod, perldiag.pod and perl5113delta.pod
to document the new category, toke.c and warnings.pl to create and implement
the new category, and a new test t/op/protowarn.t that verifies the new
behaviour in a number of cases. It also includes the files generated by
regen.pl that are found in the repo - notably warnings.h and lib/warnings.pm.
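A minimal sketch of the new sub-category in use (sub names are made up):

    use warnings;

    {
        no warnings 'illegalproto';   # silences just these two warnings
        sub greet ($name) { }         # would warn: Illegal character in prototype
        sub slurp (@$)    { }         # would warn: Prototype after '@'
    }

    # 'no warnings "syntax"' still disables them as well, since
    # 'illegalproto' lives underneath 'syntax'.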
|
Unsurprisingly, the nature of the bug is that I accidentally changed
the logic of one of the several types of space skipping. Fix attached.
|
Eric Brine pointed out that this warning doesn't apply to ".":
C<rand . 1> shouldn't warn, since C<. 1> cannot be mistaken for a
floating point number.
|
This turns on Unicode semantics for uc/lc/ucfirst/lcfirst
operations on strings without the UTF8 bit set but containing
characters higher than 127. This replaces the "legacy" pragma
experiment.
Note that currently this feature sets both a bit in $^H and
an (unused) key in %^H. The bit in $^H could be replaced by
a flag on the uc/lc/etc. op. It's probably not feasible to
test a key in %^H in pp_uc and friends each time we want to
know which semantics to apply.
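A small sketch of what changes for a non-UTF8 string (code point chosen
arbitrarily):

    use feature 'unicode_strings';

    my $s = "\xE0";         # LATIN SMALL LETTER A WITH GRAVE, no UTF8 flag
    print uc($s), "\n";     # "\xC0" with the feature enabled; without it
                            # the byte would be returned unchanged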
|
This way, it's correctly caught and blocked by Safe, separately
from eval "".
|
> If your perl has -Dmad, the following program crashes:
>
> $ bleadperl -we '$x="x" x 257; eval "for $x"'
> *** glibc detected *** bleadperl: double free or corruption (!prev): 0x0000000001dca670 ***
Change 6136c704 changed S_scan_ident from:
e = d + destlen - 3;
to:
register char * const e = d + destlen + 3;
where e is used to mark the end of the buffer. This meant that the
various buffer-end checks allowed the various buffers supplied to
S_scan_ident to overflow.
Attached is a fix, various tests with fencepost checks on different
identifier lengths, and the specific case mentioned in the ticket.
Tony
Signed-off-by: H.Merijn Brand <h.m.brand@xs4all.nl>
|
This fixes [perl #70884]: use VERSION in BLOCK without semicolon -> syntax error
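The failing case looked roughly like this (illustrative only):

    {
        use 5.010       # last statement of the block, no trailing semicolon
    }                   # formerly a syntax error, now parses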
|
Tony Cook wrote:
>Smokes with -Dmad have been failing during make minitest since the
>middle of last month.
Mostly fixed by the attached patch. The fault is a logic error on my
part, probably from the early phase of developing the lexer API patch,
when I didn't properly understand the various buffer pointer variables.
In my tests with -Dmad, I'm still getting a test failure ("panic: input
overflow") from t/op/incfilter.t. The underlying problem is the filter
layer mishandling things when a filter function gives it a multiline
string, so it generates an invalid SV state (strlen(SvPVX(PL_linestr))
> SvCUR(PL_linestr)). This faulty state also occurs without -Dmad,
and so doesn't appear to be Mad-related; it just doesn't in practice
cause the test panic without -Dmad. I'm investigating this bug now.
-zefram
Signed-off-by: H.Merijn Brand <h.m.brand@xs4all.nl>
|
Tim Bunce wrote:
>The primary issue is the off-by-one error in the array indexing.
There's a bit more to it than that. The indexing was off-by-one for
*some* places that process a new line, but correct for others, so the
saved source as a whole was mangled rather than simply offset. Also,
there were some redundant calls to update_debugger_info(), so some lines
got saved twice, in some cases off-by-one for one saving and not for
the other. The saved source is, therefore, hopelessly broken in 5.11.2.
Attached patch fixes the source saving. Includes a new test, which works
through all reachable places that source lines get saved. This should
close RT #70804.
-zefram
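For reference, the "saved source" in question is what ends up in the
per-file line arrays that the debugger reads (the filename here is
hypothetical):

    no strict 'refs';
    # When $^P has the source-saving bits set (e.g. under perl -d), each
    # compiled file's lines are kept in an array named "_<$filename":
    my @saved = @{"_<example.pl"};
    print scalar(@saved), " lines saved\n";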
|
#70091: Segmentation fault in hash lookup in regex substitution
|
The attached patch contains these fixes for the lexer API work:
* fix MinGW-revealed problem in BOM logic (replacing Jan's patch)
* fix warnings from t/op/incfilter.t
* probably fix g++ failure due to goto bypassing initialisation
* perl5112delta update
-zefram
Signed-off-by: H.Merijn Brand <h.m.brand@xs4all.nl>
|
The symbol FTELL_FOR_PIPE_IS_BROKEN is no longer being used
and should have been removed with the commit 4c84d7f2, which
removed the -P option.
|
Commit f0e67a1d29102aa9905aecf2b0f98449697d5af3 changed the control
flow so that PerlIO_tell(PL_rsfp) could be called when PL_rsfp was
NULL, which produces a crash at least on Windows with the MSVCRT
runtime.
This change moves the detection if PL_rsfp is NULL or not closer to
the location where is is actually tested, which gets rid of the
crashes. I however have *not* verified if the changes in control
flow in f0e67a1d are otherwise correct or not.
|
Attached is a patch that adds a public API for the lowest layers of
lexing. This is meant to provide a solid foundation for the parsing that
Devel::Declare and similar modules do, and it complements the pluggable
keyword mechanism. The API consists of some existing variables combined
with some new functions, all marked as experimental (which making them
public certainly is).
|
Currently no flags bits are used, and the length is cross-checked against
strlen() on the pointer, but the intent is to re-work the entire pad API to
be UTF-8 aware, from the current situation of char * pointers only.
|
Attached is a patch that changes how the tokeniser looks up subroutines,
when they're referenced by a bareword, for prototype and const-sub
purposes. Formerly, it has looked up bareword subs directly in the
package, which is contrary to the way the generated op tree looks up
the sub, via an rv2cv op. The patch makes the tokeniser generate the
rv2cv op earlier, and dig around in that.
The motivation for this is to allow modules to hook the rv2cv op
creation, to affect the name->subroutine lookup process. Currently,
such hooking affects op execution as intended, but everything goes wrong
with a bareword ref where the tokeniser looks at some unrelated CV,
or a blank space, in the package. With the patch in place, an rv2cv
hook correctly affects the tokeniser and therefore the prototype-based
aspects of parsing.
The patch also changes ck_subr (which applies the argument context and
checking parts of prototype behaviour) to handle subs referenced by an
RV const op inside the rv2cv, where formerly it would only handle a gv
op inside the rv2cv. This is to support the most likely kind of
modified rv2cv op.
The attached patch is the resulting revised version of the bareword
sub patch. It incorporates the original patch (allowing rv2cv op
hookers to control prototype processing), the GV-downgrading addition,
and a mention in perldelta.
|
API.
Currently no flags bits are used, and the length is cross-checked against
strlen() on the pointer, but the intent is to re-work the entire pad API to be
UTF-8 aware, from the current situation of char * pointers only.
|
Date: Tue, 27 Oct 2009 01:29:40 +0000
From: Zefram <zefram@fysh.org>
To: perl5-porters@perl.org
Subject: bareword sub lookups
Attached is a patch that changes how the tokeniser looks up subroutines,
when they're referenced by a bareword, for prototype and const-sub
purposes. Formerly, it has looked up bareword subs directly in the
package, which is contrary to the way the generated op tree looks up
the sub, via an rv2cv op. The patch makes the tokeniser generate the
rv2cv op earlier, and dig around in that.
The motivation for this is to allow modules to hook the rv2cv op
creation, to affect the name->subroutine lookup process. Currently,
such hooking affects op execution as intended, but everything goes wrong
with a bareword ref where the tokeniser looks at some unrelated CV,
or a blank space, in the package. With the patch in place, an rv2cv
hook correctly affects the tokeniser and therefore the prototype-based
aspects of parsing.
The patch also changes ck_subr (which applies the argument context and
checking parts of prototype behaviour) to handle subs referenced by an
RV const op inside the rv2cv, where formerly it would only handle a gv
op inside the rv2cv. This is to support the most likely kind of modified
rv2cv op.
[This commit includes the Makefile.PL for XS-APITest-KeywordRPN missing
from the original patch, as well as updates to perldiag.pod and a
MANIFEST sort]
|
An accident of Perl's parser meant that my $pi := 4; was parsed as an empty
attribute list. Empty attribute lists are ignored, hence the above is
equivalent to my $pi = 4; However, the fact that it is currently valid syntax
means that := cannot be used as a new token without silently changing the
meaning of existing code.
Hence it is now deprecated, so that it can subsequently be removed, allowing
the possibility of := to be used as a new token with new semantics.
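Concretely, a tiny sketch of the parse in question:

    use warnings;
    my $x := 4;    # parsed as  my $x : = 4;  -- an empty attribute list,
                   # so it is the same as  my $x = 4;  and now triggers a
                   # deprecation warning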
|