| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
This is intended as a minimal commit due to the current stage
of the release process.
Fixes #17268
|
|
|
|
|
| |
len is used in these functions only to pass to the other function when
recursing.
|
|
|
|
|
| |
The len variable is used, but the value is overwritten before being
read.
|
| |
|
|
|
|
| |
Introduced in 0ae5281a2d.
|
| |
|
| |
|
|
|
|
| |
Co-authored-by: Tony Cook <tony@develop-help.com>
|
|
|
|
| |
This is in preparation for it being called from more than one place.
|
|
|
|
|
| |
This is for clarity as to what's going on, and to simplify some
expressions.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The code for dealing with charnames is intertwined with, and special-cased
in, S_new_constant. My guess is that this was originally done to offer
customized, better error messages when things go wrong. Much later the
function was changed so that a message could be returned instead of output,
and the code no longer really needed the customization. But by then
autoloading of charnames when a \N{} was parsed had been added, and more
special casing was added instead, as that had been the logical place to
do it.
This commit extracts the special charnames handling to the one place it
is actually used, and the disentangled S_new_constant is then called.
This is in preparation for future commits, and makes the code cleaner.
This adds testing of the new syntax to lib/charnames.t. That file
randomly generates some tests, simply because there are too many names
to test reasonably at once. To compensate for the added tests, I
lowered the percentage of characters tested per run so that this file
takes about the same amount of time as before.
|
|
|
|
|
|
|
|
| |
${10} and $10 were handled differently; this patch makes them be handled
the same way. It also forbids multi-digit numeric variables from
starting with 0. Thus $00 now dies with a new fatal error:
"Numeric variables with more than one digit may not start with '0'"
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix issue #16535: $t[index $x, $y] should not trigger Multidimensional
array warnings.
The heuristic for detecting lists in array subscripts is implemented
in toke.c, which means it is not particularly reliable. There are
lots of ways that code might return a list in an array subscript;
for instance, $t[do{ $x, $y }] should warn but doesn't.
On the other hand, we can make this warning less likely to fire
spuriously by being a touch more careful about how we parse the inside
of the square brackets, so that we do not warn for $t[index $x, $y].
Really this should be moved to the parser, so we do not need to rely
on fallible heuristics, and also into the runtime, so that if we have
$t[f()]
and f() returns a list we can warn there too. But for now this
improves things somewhat.
|
|
|
|
|
|
| |
This now croaks if the input is an illegal code point. Before, it
would likely croak eventually, once that code point was actually used
in some manner.
|
|
|
|
|
| |
The remaining function in this file is moved to inline.h, just to not
have an extra file lying around with hardly anything in it.
|
|
|
|
|
|
|
|
| |
This changes the warning messages for too-short \0 octal constants to
use the function introduced in the previous commit. That function
ensures a consistent and clear warning message, which is slightly
different from the one this commit replaces. I know of no CPAN code
that depends on this warning's wording.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit causes these functions to allow a caller to request any
messages generated to be returned to the caller, instead of always being
handled within these functions. The messages are somewhat changed from
previously to be clearer. I did not find any code in CPAN that relied
on the previous message text.
As with the previous commit for grok_bslash_c, there are two reasons to
do this, repeated here:
1) In pattern compilation this brings these messages into conformity
with the other ones that get generated in pattern compilation, where
there is a particular syntax, including marking the exact position in
the parse where the problem occurred.
2) These could generate truncated messages due to the (mostly)
single-pass nature of pattern compilation that is now in effect. It
keeps track of where during a parse a message has been output, and
won't output it again if a second parsing pass turns out to be
necessary. Prior to this commit, it had to assume that a message
from one of these functions did get output, and this caused some
out-of-bounds reads when a subparse (using a constructed pattern) was
executed. The possibility of those went away in commit 5d894ca5213,
which guarantees it won't try to read outside bounds, but that may
still mean it is outputting text from the wrong parse, giving
meaningless results. This commit should stop that possibility.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit causes this function to allow a caller to request any
messages generated to be returned to the caller, instead of always being
handled within this function.
As with the previous commit for grok_bslash_c, there are two reasons to
do this, repeated here:
1) In pattern compilation this brings these messages into conformity
with the other ones that get generated in pattern compilation, where
there is a particular syntax, including marking the exact position in
the parse where the problem occurred.
2) The messages could be truncated due to the (mostly) single-pass
nature of pattern compilation that is now in effect. It keeps track
of where during a parse a message has been output, and won't output
it again if a second parsing pass turns out to be necessary. Prior
to this commit, it had to assume that a message from one of these
functions did get output, and this caused some out-of-bounds reads
when a subparse (using a constructed pattern) was executed. The
possibility of those went away in commit 5d894ca5213, which
guarantees it won't try to read outside bounds, but that may still
mean it is outputting text from the wrong parse, giving meaningless
results. This commit should stop that possibility.
|
| |
|
|
|
|
|
| |
These generated warnings on certain platform builds, and weren't the
best types for the purpose anyway.
|
|
|
|
|
|
|
| |
This replaces strchr("list", c) calls throughout the core. They don't
work properly when 'c' is a NUL, returning the position of the
terminating NUL in "list" instead of failure. This could lead to
segfaults or even security issues.
|
|
|
|
|
|
|
|
|
|
|
| |
This is useful to Devel::PPPort for generating its api-info data. That
feature of D:P allows someone to find out the first release of Perl to
have a given function, macro, or flag, and whether using ppport.h
backports it further.
I went through apidoc.pod and looked for flags that were documented but
that D:P didn't know about. This commit adds entries for each so that
D:P can find them.
|
|
|
|
|
|
|
| |
These are illegal in C, but we have plenty of them around; I happened
to be looking at this function and decided to fix it. Note that only
the macro name is illegal; the function name was fine, but changing the
macro name means changing the function's name as well.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds a new infix operator named `isa`, with the semantics that
$x isa SomeClass
is true if and only if `$x` is a blessed object reference whose class
either is `SomeClass` directly or includes it somewhere in its @ISA
hierarchy. It is false, without warning or error, for non-references or
non-blessed references.
This operator respects `->isa` method overloading, and is intended to
replace boilerplate code such as
use Scalar::Util 'blessed';
blessed($x) and $x->isa("SomeClass")
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
LGTM provides static code analysis and recommendations for code quality
improvements. Their recent run over the Perl 5 core distribution
identified 12 instances where a local variable hid a parameter of
the same name in an outer scope. The LGTM rule governing this situation
can be found at:
https://lgtm.com/rules/2156240606/
This patch renames local variables in approximately 8 of those instances
to comply with the LGTM recommendation. Suggestions for renamed
variables were made by Tony Cook.
For: https://github.com/Perl/perl5/pull/17281
|
|
|
|
| |
Sprinkle a few random 'dVAR's at the top of some functions.
|
|
|
|
| |
Also references to the term.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This large commit removes the last use of swashes from core.
It replaces swashes by inversion maps. This data structure is already
in use for some Unicode properties, such as case changing.
The inversion map data structure leads to straightforward
implementation code, so I collapsed the two doop.c routines
do_trans_complex_utf8() and do_trans_simple_utf8() into one. A few
conditionals could be avoided in the loop if this function were split so
that one version didn't have to test for, e.g., squashing, but I suspect
these are in the noise in the loop, which has to deal with UTF-8
conversions. This should be faster than the previous implementation
anyway. I measured the differences some releases back, and inversion
maps were faster than the equivalent swash for up to 512 or 1024
different ranges. These numbers are unlikely to be exceeded in tr///
except possibly in machine-generated ones.
Inversion maps are capable of handling both UTF-8 and non-UTF-8 cases,
but I left in the existing non-UTF-8 implementation, which uses tables,
because I suspect it is faster. This means that there is extra code,
purely for runtime performance.
An inversion map is always created from the input, and then if the table
implementation is to be used, the table is easily derived from the map.
Prior to this commit, the table implementation was used in certain edge
cases involving code points above 255. Those cases are now handled by
the inversion map implementation, because it would have taken extra code
to detect them, and I didn't think it was worth it. That could be
changed if I am wrong.
Creating an inversion map for all inputs essentially normalizes them,
and then the same logic is usable for all. This fixes some false
negatives in the previous implementation. It also allows for detecting
if the actual transliteration can be done in place. Previously, the
code mostly punted on that detection for the UTF-8 case.
This also allows for accurate counting of the lengths of the two sides,
fixing some longstanding TODO warning tests.
A new flag, OPpTRANS_CAN_FORCE_UTF8, is created for when the tr/// maps
a character below 256 to one that requires UTF-8. If this flag isn't
set, the code knows that a non-UTF-8 input won't become UTF-8 in the
process, and so can take short cuts. The bit representing this flag is
the same as OPpTRANS_FROM_UTF, which is no longer used. That name is
left in so that the dozen-ish modules on CPAN that refer to it can still
compile. AFAICT none of them actually use the flag, nor should they,
since it is private to the core.
Inversion maps are ideally suited for tr/// implementations. An issue
with them in general is that for some pathological data they can become
fragmented, requiring more space than you would expect to represent the
underlying data. However, the typical tr/// does not have this issue,
needing only a very short inversion map; in some cases shorter than the
table implementation.
Inversion maps are also easier to deparse than swashes. A deparse TODO
was also fixed by this commit, and the code to deparse UTF-8 inputs is
simplified.
One could implement specialized data structures for specific types of
inputs. For example, a common tr/// form is a single range, like
tr/A-Z/a-z/. That could be implemented without a table and be quite
fast. An intermediate step would be to use the inversion map
implementation always when the transliteration is a single range, and
then special case length=1 maps at execution time.
Thanks to Nicholas Rochemagne for his help on B.
|
|
|
|
| |
This makes it more mnemonic. Also adds an explanation in toke.c.
|
|
|
|
| |
Wrap a too-long line
|
|
|
|
|
| |
These should have been included in
0c311b7c345769239f38d0139ea7738feec5ca4d
|
|
|
|
|
| |
They were only ever passed as zeros, so just make them local to the
function.
|
|
|
|
| |
Also only initialise it just before it's actually used.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The downside of writing these calls recursively is that not all compilers
will compile the tail-position calls as jumps; that's especially true in
earlier versions of this refactoring process (where yyl_try() took a large
number of arguments), but it's not in general something we can expect to
happen, especially in the presence of `-O0` or similar compiler options.
This can lead to call-stack overflow in some circumstances.
Most recursive calls to yyl_try() occur within yyl_try() itself, so we can
easily replace them with an explicit `goto` (which is what most compilers
would use for the recursive calls anyway, now that yyl_try() takes ≤3
parameters).
There are only two other recursive-call cases. One is yyl_fake_eof(), which
as far as I can tell is never called repeatedly within a single file; this
seems safe.
The other is yyl_eol(). It has exactly two distinct return paths, so this
commit moves the retry logic into its yyl_try() caller.
With this change, we no longer seem to trigger call-stack overflow.
Closes #17220
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
There's exactly one place where we need to consult it (and that only for
producing good error messages in a specific group of term-after-term
situations).
The reason for passing it around was so that it could be reset to false
early in the process of lexing a token, while still allowing the three
separate cases that might need to set it true to do so independently.
Instead, centralise the logic of determining when it needs to be true.
|
| |
|
|
|
|
|
| |
With this commit, yyl_try() has few enough arguments that the RETRY()
macro no longer serves any useful purpose; delete it too.
|
|
|
|
|
|
|
| |
I thought I was going to end up using this for more stuff, but I've
found better approaches.
This commit also removes two more goto targets.
|
| |
|
| |
|
|
|
|
| |
This makes calls to it much easier to understand.
|
| |
|
|
|
|
|
|
|
| |
I introduced these parameters as part of mechanically refactoring goto-heavy
logic into subroutines. However, they aren't actually needed through most of
the code. Even in the recursive case (in which yyl_try() or one of its
callees will call itself), we can reset the variables to zero.
|
| |
|
|
|
|
| |
This permits some additional pleasing simplifications.
|
|
|
|
| |
With the removal of another goto label!
|
| |
|