| Commit message (Collapse) | Author | Age | Files | Lines |
|
All built-in functions that operate directly on array or hash
containers now also accept hard references to arrays or hashes:
|----------------------------+---------------------------|
| Traditional syntax         | Terse syntax              |
|----------------------------+---------------------------|
| push @$arrayref, @stuff    | push $arrayref, @stuff    |
| unshift @$arrayref, @stuff | unshift $arrayref, @stuff |
| pop @$arrayref             | pop $arrayref             |
| shift @$arrayref           | shift $arrayref           |
| splice @$arrayref, 0, 2    | splice $arrayref, 0, 2    |
| keys %$hashref             | keys $hashref             |
| keys @$arrayref            | keys $arrayref            |
| values %$hashref           | values $hashref           |
| values @$arrayref          | values $arrayref          |
| ($k,$v) = each %$hashref   | ($k,$v) = each $hashref   |
| ($k,$v) = each @$arrayref  | ($k,$v) = each $arrayref  |
|----------------------------+---------------------------|
This allows these built-in functions to act on long dereferencing
chains or on the return value of subroutines without needing to wrap
them in C<@{}> or C<%{}>:
    push @{$obj->tags}, $new_tag;                    # old way
    push $obj->tags, $new_tag;                       # new way
    for ( keys %{$hoh->{genres}{artists}} ) {...}    # old way
    for ( keys $hoh->{genres}{artists} ) {...}       # new way
For C<push>, C<unshift> and C<splice>, the reference will auto-vivify
if it is not defined, just as if it were wrapped with C<@{}>.
Calling C<keys> or C<values> directly on a reference gives a
substantial performance improvement over explicit dereferencing.
For C<keys>, C<values> and C<each>, when overloaded dereferencing is
present, the overloaded dereference is used instead of dereferencing
the underlying reftype. Warnings are issued about assumptions made in
the following three ambiguous cases:
(a) If both %{} and @{} overloading exist, %{} is used
(b) If %{} overloading exists on a blessed arrayref, %{} is used
(c) If @{} overloading exists on a blessed hashref, @{} is used
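A minimal sketch of the traditional forms from the table, including the auto-vivification that the terse forms are described as matching. The variable names are illustrative only. The terse spellings appear in comments rather than being executed, because not every perl release accepts them.

```perl
use strict;
use warnings;

# $tags starts out undefined; pushing through @{} auto-vivifies an array.
my $tags;
push @{$tags}, 'perl', 'core';      # terse form: push $tags, 'perl', 'core';

# A nested hash reached through a dereferencing chain.
my $hoh = { genres => { artists => { bowie => 1, eno => 1 } } };
my @artists = sort keys %{ $hoh->{genres}{artists} };
                                    # terse form: keys $hoh->{genres}{artists}

my $last = pop @{$tags};            # terse form: pop $tags;

print "tags: @{$tags}; last: $last; artists: @artists\n";
```

The explicit C<@{}> and C<%{}> wrappers are what the terse syntax is meant to make unnecessary.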
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds recognition of these modifiers, with appropriate action
for d and l. u does nothing useful yet. This allows for the
interpolation of a regex into another one without losing the character
set semantics that it was compiled with, as for the first time, the
semantics is now specified in the stringification as one of these
modifiers.
To this end, it allocates an unused bit in the structures. The
offsets change so as to not disturb other bits.
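As a rough illustration of why this matters for interpolation, the sketch below compiles a pattern with an explicit character-set modifier and embeds it in another pattern. It assumes a perl in which these modifiers are accepted on qr// and fully implemented; the variable names are illustrative.

```perl
use strict;
use warnings;

# Compile an inner pattern with explicit Unicode semantics.
my $inner = qr/\w+/u;

# Interpolating it embeds its stringification, modifier included,
# so the inner part keeps the semantics it was compiled with.
my $outer = qr/\A$inner\z/;

print "inner: $inner\n";
print "outer: $outer\n";
```

Printing both patterns shows the inner stringification carried verbatim inside the outer one.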
|
This adds (?^...) to signify that the default regex modifiers are to be
used for the cluster or embedded pattern-match modifier change. The major
purpose of this is to simplify regex stringification, so that "^" is
output in place of "-xism". As a result, the stringification will not
change in the future when new regex modifiers are added, so tests, etc.
that rely on a particular stringification will have to change now, but
never again.
Code that needs to work properly with both old- and new-style regexes
can use something like the following:
# Accept both old and new-style stringification
my $modifiers = (qr/foobar/ =~ /\Q(?^/) ? '^' : '-xism';
This construct is Ben Morrow's idea.
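A short sketch of how a test might use that check to build the stringification it expects on whichever perl is running it. The variable names $expected and $actual are illustrative, not taken from the patch.

```perl
use strict;
use warnings;

# Accept both old and new-style stringification (as in the snippet above).
my $modifiers = (qr/foobar/ =~ /\Q(?^/) ? '^' : '-xism';

# Build the stringification a test should expect on this perl.
my $expected = "(?${modifiers}:foobar)";
my $actual   = "" . qr/foobar/;

print $actual eq $expected ? "ok\n" : "not ok: got $actual\n";
```

This assumes no pragma in scope adds further modifiers to the stringified pattern.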
|
|
|
|
|
|
|
|
|
|
| |
This makes a qw(...) list literal a distinct token type for the
parser, where previously it was munged into a "(",THING,")" sequence.
The change means that qw(...) can't accidentally supply parens to parts
of the grammar that want real parens. Because much existing code takes
advantage of the old behaviour, as in "foreach my $x qw(...) {}", this
patch also includes a hack to coerce qw(...) to the old-style
parenthesised THING, emitting a deprecation warning along the way.
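For code that wants to avoid the deprecated form, a minimal sketch of the portable spelling follows. The list contents and the @seen array are illustrative.

```perl
use strict;
use warnings;

my @seen;

# Portable form: qw() supplies the list, explicit parens satisfy foreach.
foreach my $x (qw(alpha beta gamma)) {
    push @seen, $x;
}

# The deprecated form relied on qw() supplying the parens itself:
#   foreach my $x qw(alpha beta gamma) { ... }

print "@seen\n";
```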
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
yyparse() becomes reentrant. The yacc stack and related resources
are allocated in yyparse(), rather than in lex_start(), and they are
localised to yyparse(), preserving their values from any outer parser.
yyparse() now takes a parameter which determines which production it
will parse at the top level. New API function parse_fullstmt() uses this
facility to parse just a single statement. The top-level single-statement
production that is used for this then messes with the parser's head so
that the parsing stops without seeing EOF, and any lookahead token seen
after the statement is pushed back to the lexer.
|
| |
|
|
|
|
|
| |
Also updates porting/diag.t to standardize the
detected messages into the format used in perldiag.pod
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
This patch adds a mention of \o{} to perlre to avoid the backreference
ambiguities, and uses 3 octal digits in an example, and suggests using 3
digits where 2 were suggested before.
Signed-off-by: David Golden <dagolden@cpan.org>
|
|
|
|
|
|
|
|
|
|
| |
This commit adds the new construct \o{} to express a character constant
by its octal ordinal value, along with ancillary tests and
documentation.
A function to handle this is added to util.c, and it is called from the
3 parsing places it could occur. The function is a candidate for
in-lining, though I doubt that it will ever be used frequently.
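A brief sketch of the construct in a string and in a pattern, assuming a perl that includes this change. The variable names are illustrative.

```perl
use strict;
use warnings;

my $by_octal = "\o{101}";      # octal 101 is decimal 65
my $by_hex   = "\x{41}";       # the same ordinal written in hex
my $wide     = "\o{400}";      # octal 400 is decimal 256

# In a pattern, the braces keep the escape from being read as a backreference.
my $matched = "A" =~ /\o{101}/ ? 1 : 0;

printf "%s %s %d %d\n", $by_octal, $by_hex, ord($wide), $matched;
```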
|
|
|
|
|
|
|
|
|
|
|
|
| |
Prior to this patch, \400 - \777 meant something different in some
circumstances in regexes outside bracketed character classes. A
deprecation warning has been issued since 5.10.1 when this
happens. Remove the warning, and bring the behavior into line with the
other double-quotish contexts. \400 - \777 now always means the same
thing as \x{100} - \x{1FF} (except when the octal forms are taken as
backreferences).
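A small sketch of the resulting equivalence, assuming a perl that includes this patch. The pattern deliberately has no capture groups, so \400 cannot be taken as a backreference.

```perl
use strict;
use warnings;

my $char = "\x{100}";

# In a double-quoted string, \400 is the character with ordinal 0x100.
my $same_in_string = "\400" eq $char ? 1 : 0;

# With no capture groups there is nothing for \400 to refer back to,
# so the pattern reads it as the same octal escape.
my $same_in_regex = $char =~ /\A\400\z/ ? 1 : 0;

print "string: $same_in_string, regex: $same_in_regex\n";
```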
Signed-off-by: David Golden <dagolden@cpan.org>
|
|
|
|
|
|
|
| |
This patch raises a deprecation warning on constructs like
$result = $a =~ m/$foo/sand $bar;
which means
$result = $a =~ m/$foo/s and $bar;
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A number of function names that do warnings have been added, but diag.t
hasn't kept up.
This patch changes it to look for likely function names in embed.fnc, so
it will automatically keep up in the future. There's no need to worry
about it looking for inappropriate functions, as the syntax of messages
that it looks for is so restrictive, that there won't be false
positives. Instead there are still many messages it fails to catch.
Because it has fallen behind, several issues have crept in. I
resolved the couple I thought were clear (including one in a comment;
diag.t doesn't strip comments, but mostly it doesn't matter), and added
the others to the <DATA> section to ignore.
are
|
Attached is a patch for some of this issue. I took Nicholas' advice,
and if the result of \cX isn't a word character, the output message will
precede it with a backslash, so the message in the example would be
"\c`" more clearly written simply as "\ " at -e line 1.
I think that message is true.
I also added tests.
There is a test that guarantees that we won't ship 5.14 with things as
they are now in it. I added wording to the comments next to that test
to be sure to verify with this email thread if we should remove the
deprecation, and mentioned that in the explanatory wording in the pod.
I support removing the deprecation, but for now I'm not touching that,
to see what other issues may yet arise before 5.14.
|
|
|
|
|
| |
Prior to this patch, messages in perldiag.pod had to have \\ instead of
the correct single backslash in order for diag.t to not complain.
|
| |
|
|
|
|
|
|
|
| |
This reverts commit 6fb472bab4fadd0ae2ca9624b74596afab4fb8cb.
Zefram asked me to revert this as he's going to be doing something more
pluggable
|
|
|
|
|
|
|
| |
This reverts commit 1183a10042af0734ee65e252f15bd820b7bbe686.
Zefram asked me to revert this as he's going to be doing something more
pluggable
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some improvements to the deprecation added in commit
6fb472bab4fadd0ae2ca9624b74596afab4fb8cb:
- warning message includes the word "deprecated"
- warning is in "syntax" category as well as "deprecated"
- more systematic tests
- dot detected more efficiently by incorporation into existing switch
- small doc rewording
- avoid the warning in t/op/taint.t
|
make regen is needed
This patch forbids non-ascii following the "\c". It also terminates for
"\c{" with a message to contact p5p if there is need for continuing its
current definition. And if the character following the "\c" causes the
result to not be a control character, a warning is issued. This is
currently 'deprecated', which by default is turned on. This can easily
be changed later.
This patch is the initial patch. It does not attempt to show the
context in which the problematic construct occurs. This can be added
later.
It gathers the 3 occurrences of evaluating \c and puts them in one
common routine.
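For reference, a sketch of what \c produces for a few ordinary cases on ASCII platforms, where the result is the ordinal of the (uppercased) following character with bit 0x40 flipped. The %control hash is illustrative.

```perl
use strict;
use warnings;

# Each value is the ordinal that the \cX escape evaluates to.
my %control = (
    'A' => ord("\cA"),   # start of heading
    'Z' => ord("\cZ"),   # substitute
    '[' => ord("\c["),   # escape
    '?' => ord("\c?"),   # delete
);

for my $char (sort keys %control) {
    my $expected = ord($char) ^ 64;
    printf "\\c%s => %d (expected %d)\n", $char, $control{$char}, $expected;
}
```

All four results are control characters, so none of these spellings falls under the new warning.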
|
|
|
|
|
|
|
|
| |
There is a small possibility of a memory leak in toke.c when there is a
deprecated character in the name in a \N{...} construct, and the Perl is
embedded or something like that so that memory isn't freed up when it
exits. This patch avoids the creation of a new scalar, and gives a
better error message besides.
|
| |
|
|
|
|
| |
Missing warning description noticed by Zefram
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The message ‘Variable "%s" is not imported’ cannot be suppressed, even
with -X (local $SIG{__WARN__}=sub{} is what I have to use):
perl -Xle '$foo;use strict; eval q/$foo/ or die "---$@---"'
Variable "$foo" is not imported at (eval 1) line 2.
---Global symbol "$foo" requires explicit package name at (eval 1) line 2.
--- at -e line 1.
This is because we have what appears to the user to be a multi-line
error message. It is in fact a warning ‘Variable...’ followed by an
error ‘Global symbol...’.
The attached patch assigns a warning category to the warning.
|
|
|
|
|
|
|
|
| |
This reverts commit f71d6157c7933c0d3df645f0411d97d7e2b66b2f.
Revert "Add new error "Can't use keyword '%s' as a label""
This reverts commit 28ccebc469d90664106fcc1cb73d7321c4b60716.
|
Prior to now just about anything has been legal for a character name in
\N{...}. This means that legal code was broken by having \N{3,4} for
example mean [^\n]{3,4}. Such code doesn't come from standard
charnames, but from legal custom translators.
This patch deprecates "unreasonable" names. handy.h is changed by the
addition of macros that, taken together, define the names we deem
reasonable: those that begin with an alphabetic character and continue
with alphanumerics and some punctuation.
toke.c is changed to parse each name and to raise a warning if any
problematic characters are found.
Some tests and diagnostic documentation are also included.
|
|
|
|
|
|
|
|
|
|
|
|
| |
It is possible to bypass the lexer's parsing of \N. This patch causes
the regex compiler to deal with that better. The compiler no longer
assumes that the lexer parsed the \N. It generates an error message if
the \N isn't in a form it is expecting, and invalid hexadecimal digits
are now fatal errors, with the position of the error more clearly
marked.
The diagnostic pod has been updated to reflect the new error messages,
with some slight clarifications to the previous ones as well.
|
|
|
|
|
|
| |
It was decided that this should be a fatal error instead of a warning.
Also some comments were updated.
|
make regen embed.fnc
needs to be run on this patch.
This patch fixes Bugs #56444 and #62056.
Hopefully we have finally gotten this right. The parser used to handle
all the escaped constants, expanding \x2e to its single byte equivalent.
The problem is that for regexp patterns, this is a '.', which is a
metacharacter and has special meaning that \x2e does not. So things
were changed so that the parser didn't expand things in patterns. But
this causes problems for \N{NAME}, when the pattern doesn't get
evaluated until runtime, as for example when it has a scalar reference
in it, like qr/$foo\N{NAME}/. We want the value for \N{NAME} that was
in effect at the point during the parsing phase that this regex was
encountered in, but we don't actually look at it until runtime, when
these bug reports show that it is gone. The solution is for the
tokenizer to parse \N{NAME}, but to compile it into an intermediate
value that won't ever be considered a metacharacter. We have chosen to
compile NAME to its equivalent code point value, and express it in the
already existing \N{U+...} form. This indicates to the regex compiler
that the original input was a named character and retains the value it
had at that point in the parse.
This means that \N{U+...} now always must imply Unicode semantics for
the string or pattern it appeared in. Previously there was an
inconsistency, where effectively \N{NAME} implied Unicode semantics, but
\N{U+...} did not necessarily. So now, any string or pattern that has
either of these forms is utf8 upgraded.
A complication is that a charnames handler can return a sequence of
multiple characters instead of just one. To deal with this case, the
tokenizer will generate a constant of the form \N{U+c1.c2.c3...}, where
c1 etc are the individual characters. Perhaps this will be made a
public interface someday, but I decided to not expose it externally as
far as possible for now in case we find reason to change it. It is
possible to defeat this by passing it in a single quoted string to the
regex compiler, so the documentation will be changed to discourage that.
A further complication is that \N can have an additional meaning: to
match a non-newline. This means that the two meanings have to be
disambiguated.
embed.fnc was changed to make public the function regcurly() in
regcomp.c so that it could be referred to in toke.c to see if the ... in
\N{...} is a legal quantifier like {2,}. This is used in the
disambiguation.
toke.c was changed to update some out-dated relevant comments.
It now parses \N in patterns. If it determines that it isn't a named
sequence, it passes it through unchanged. This happens when there is no
brace after the \N, or no closing brace, or if the braces enclose a
legal quantifier. Previously there has been essentially no restriction
on what can come between the braces so that a custom translator can
accept virtually anything. Now, legal quantifiers are assumed to mean
that the \N is a "match non-newline that quantity of times".
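The two meanings of \N can be sketched as follows, assuming a perl that includes this patch. The sample strings and variable names are illustrative.

```perl
use strict;
use warnings;
use charnames ':full';

# A name between the braces: \N{...} is a named character.
my $named = "\N{LATIN SMALL LETTER E WITH ACUTE}";

# The explicit code point form names the same character.
my $by_code_point = "\N{U+E9}";

# A legal quantifier between the braces: \N means "not a newline",
# repeated that many times.
my ($two) = "abc\ndef" =~ /\A(\N{2})/;

# No brace at all: \N is again "not a newline".
my ($line) = "abc\ndef" =~ /\A(\N+)/;

printf "%d %d %s %s\n", ord($named), ord($by_code_point), $two, $line;
```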
I removed the #ifdef'd out code that had been left in in case pack U
reverted to earlier behavior. I did this because it complicated things,
and because the change to pack U has been in long enough and shown that
it is correct so it's not likely to be reverted.
\N meaning a named character is handled differently depending on whether
this is a pattern or not. In all cases, the output will be upgraded to
utf8 because a named character implies Unicode semantics. If not a
pattern, the \N is parsed into a utf8 string, as before. Otherwise it
will be parsed into the intermediate \N{U+...} form. If the original
was already a valid \N{U+...} constant, it is passed through unchanged.
I now check that the sequence returned by the charnames handler is not
malformed, which was lacking before.
The code in regcomp.c which dealt with interfacing with the charnames
handler has been removed. All the values should be determined by the
time regcomp.c gets involved. The affected subroutine is necessarily
restructured.
An EXACT-type node is generated for the character sequence. Such a node
has a capacity of 255 bytes, so it is possible to overflow it. This
wasn't checked for before, but now it is: a warning is issued and the
overflowing characters are discarded.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Otherwise gmtime(2**66) will cause a very, very, very long loop and
DoS Perl.
Add a test that very, very large times don't send gmtime and localtime into a loop
Had to fix some revealed mistakes in op/time.t when warnings were turned on.
Fix Time::gmtime and Time::localtime tests to match the new limits of gm/localtime.
|
|
|
|
|
|
|
| |
Currently the meaning of e.g. \q is simply 'q', with a warning
generated. Elsewhere it is documented that this means that \q is
thus a reserved term, available for Perl to co-opt in some future
release. But the warning doesn't say that. Clarify it.
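A sketch of the current behaviour, capturing the warning rather than printing it. The exact wording of the message depends on the perl version, so the code only looks for the "Unrecognized escape" prefix.

```perl
use strict;
use warnings;

my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Compile the string at run time so the warning can be captured.
# q{} leaves the backslash alone, so the eval sees "\q".
my $value = eval q{ "\q" };

print "value: $value\n";
print "warning: $_" for @warnings;
```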
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I looked at all the instances of spaces around -- and in most cases
converted the sentences to use more appropriate punctuation. In
general, the -- in the perl docs seem to be there only to make
really complicated and really long sentences.
I didn't look at the closed em-dashes. They probably have the same
sentence-complexity problem.
I left some open em-dashes in place. Those are the ones used in
lists.
|
|
|
|
|
|
| |
A check for the warning category was missing from commit
885ef6f56b61fd750ef3b1fa614d11480baac635. Also, document
the warning category in perldiag.
|
|
|
|
| |
about it. Fixes part of [perl #68758].
|
category to a new 'illegalproto' subcategory.
Two warnings can be emitted when parsing a prototype -
Illegal character in prototype for %s : %s
Prototype after '%c' for %s : %s
The first one is emitted when any invalid character is found, the latter
when further prototype-type stuff is found after a slurpy entry (i.e. valid
character but in such a place as to be a no-op, and therefore likely a bug).
These warnings are distinct from those emitted when a sub is overwritten by
one with a different prototype, and when calls are made to subroutines with
prototypes - those are in the pre-existing sub-category 'prototype'.
Since modules such as signatures.pm and Web::Simple only need to disable
the warnings during parsing, I chose to add a new category containing only
these. Moving these warnings into the 'prototype' sub-category would have
forced authors to disable more warnings than they intended, and the entire
raison d'etre of this patch is to allow the specific warnings involved to
be disabled.
In order to maintain compatibility with existing code, the new location
needed to be a sub-category of 'syntax' - this means that
no warnings 'syntax';
will continue to work as expected - even in cases like Web::Simple where all
subcategories extant prior to this patch are re-enabled (this is another
reason why a move into the 'prototype' category would not achieve the desired
goal).
The category name 'illegalproto' was chosen because the most common warning
to encounter is the "Illegal character" one, and therefore 'illegalproto'
while minorly inaccurate by ignoring the (relatively recent and unknown)
second warning is an easy name to spot on an initial skim of perllexwarn
and will behave as expected by also disabling the case of an unusual prototype
that happens to look like a normal one.
This patch updates pod/perllexwarn.pod, perldiag.pod and perl5113delta.pod
to document the new category, toke.c and warnings.pl to create and implement
the new category, and a new test t/op/protowarn.t that verifies the new
behaviour in a number of cases. It also includes the files generated by
regen.pl that are found in the repo - notably warnings.h and lib/warnings.pm.
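A minimal sketch of the intended use, assuming a perl that includes this patch and has no signature feature enabled, so that the parenthesised text after the sub name is parsed as a prototype. The sub names are hypothetical.

```perl
use strict;
use warnings;

my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# '$foo' is not a valid prototype, so compiling this sub warns.
eval q{ sub with_default_warnings ($foo) { 1 } 1 } or die $@;
my $warned = scalar @warnings;

# Disable only the new subcategory while the prototype is parsed.
@warnings = ();
eval q{
    no warnings 'illegalproto';
    sub with_category_disabled ($foo) { 1 }
    1;
} or die $@;
my $silenced = scalar @warnings;

print "default: $warned warning(s), disabled: $silenced warning(s)\n";
```

Other 'syntax' and 'prototype' warnings stay enabled in the second block, which is the point of the separate subcategory.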
|
|
|
|
| |
Signed-off-by: Abigail <abigail@abigail.be>
|
| |
|
|
|
|
| |
Add a new warning "Missing argument in %s"
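A sketch of a call that triggers the warning, assuming a perl that includes this change. The format string and variable names are illustrative.

```perl
use strict;
use warnings;

my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Two conversions, one argument: the second %s has nothing to format.
my $arg  = "only";
my $text = sprintf "%s-%s", $arg;

print "result: $text\n";
print "warning: $_" for @warnings;
```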
|