| Commit message | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
| |
64 bits on that platform requires a long long, and 1UL isn't one. I should
have copied more carefully the similar code in utf8.h
(reported to me privately by Craig Berry)
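A minimal sketch of the kind of fix described above, assuming the bug was a 64-bit constant built from 1UL. The helper name bit64 is hypothetical and is not taken from the commit; it only illustrates why the constant's type matters where unsigned long is 32 bits wide.

```c
#include <stdint.h>

/* Hypothetical helper: returns a 64-bit value with only bit `n` set.
 * Writing `1UL << n` instead would be wrong on a platform where
 * unsigned long is 32 bits: shifting by 32 or more is undefined there.
 * UINT64_C(1) always has a type at least 64 bits wide. */
static uint64_t bit64(unsigned n)
{
    return UINT64_C(1) << n;   /* caller must keep n in 0..63 */
}
```

On platforms where unsigned long is already 64 bits the two spellings behave the same, which is why this kind of bug tends to show up on only some systems.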
|
|
|
|
|
| |
These were introduced in the tr/// changes in the series
merged in 240494d6992696a7a350217c131e1d5dc1444a0c
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This large commit removes the last use of swashes from core.
It replaces swashes by inversion maps. This data structure is already
in use for some Unicode properties, such as case changing.
The inversion map data structure leads to straightforward
implementation code, so I collapsed the two doop.c routines
do_trans_complex_utf8() and do_trans_simple_utf8() into one. A few
conditionals could be avoided in the loop if this function were split so
that one version didn't have to test for, e.g., squashing, but I suspect
these are in the noise in the loop, which has to deal with UTF-8
conversions. This should be faster than the previous implementation
anyway. I measured the differences some releases back, and inversion
maps were faster than the equivalent swash for up to 512 or 1024
different ranges. These numbers are unlikely to be exceeded in tr///
except possibly in machine-generated ones.
Inversion maps are capable of handling both UTF-8 and non-UTF-8 cases,
but I left in the existing non-UTF-8 implementation, which uses tables,
because I suspect it is faster. This means that there is extra code,
purely for runtime performance.
An inversion map is always created from the input, and then if the table
implementation is to be used, the table is easily derived from the map.
Prior to this commit, the table implementation was used in certain edge
cases involving code points above 255. Those cases are now handled by
the inversion map implementation, because it would have taken extra code
to detect them, and I didn't think it was worth it. That could be
changed if I am wrong.
Creating an inversion map for all inputs essentially normalizes them,
and then the same logic is usable for all. This fixes some false
negatives in the previous implementation. It also allows for detecting
if the actual transliteration can be done in place. Previously, the
code mostly punted on that detection for the UTF-8 case.
This also allows for accurate counting of the lengths of the two sides,
fixing some longstanding TODO warning tests.
A new flag is created, OPpTRANS_CAN_FORCE_UTF8, when the tr/// has a
character below 256 resolving to one that requires UTF-8. If this isn't
set, the code knows that a non-UTF-8 input won't become UTF-8 in the
process, and so can take short cuts. The bit representing this flag is
the same as OPpTRANS_FROM_UTF, which is no longer used. That name is
left in so that the dozen-ish modules on CPAN that refer to it can still
compile. AFAICT none of them actually use the flag, as well they
shouldn't since it is private to the core.
Inversion maps are ideally suited for tr/// implementations. An issue
with them in general is that for some pathological data, they can become
fragmented requiring more space than you would expect, to represent the
underlying data. However, the typical tr/// would not have this issue,
requiring only very short inversion maps to represent; in some cases
shorter than the table implementation.
Inversion maps are also easier to deparse than swashes. A deparse TODO
was also fixed by this commit, and the code to deparse UTF-8 inputs is
simplified.
One could implement specialized data structures for specific types of
inputs. For example, a common tr/// form is a single range, like
tr/A-Z/a-z/. That could be implemented without a table and be quite
fast. An intermediate step would be to use the inversion map
implementation always when the transliteration is a single range, and
then special case length=1 maps at execution time.
Thanks to Nicholas Rochemagne for his help on B
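The inversion map idea described in this message can be sketched roughly as follows. This is a simplified illustration, not the core's implementation: the struct layout, the MAP_UNCHANGED sentinel, and all function names are assumptions made for the example. It shows a binary-search lookup and how a 256-entry table could be derived from the map for the non-UTF-8 case.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel: code points in this range map to themselves. */
#define MAP_UNCHANGED UINT32_MAX

/* starts[] is sorted ascending with starts[0] == 0, so every code point
 * falls in exactly one range; range i covers starts[i] .. starts[i+1]-1.
 * maps[i] is the image of starts[i]; the rest of the range follows in
 * step (starts[i]+k maps to maps[i]+k). */
typedef struct {
    const uint32_t *starts;
    const uint32_t *maps;
    size_t len;              /* number of ranges, at least 1 */
} invmap;

/* Binary search for the last range whose start is <= cp. */
static size_t invmap_search(const invmap *im, uint32_t cp)
{
    size_t lo = 0, hi = im->len;          /* answer lies in [lo, hi) */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (im->starts[mid] <= cp)
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}

/* Translate one code point through the map. */
static uint32_t invmap_translate(const invmap *im, uint32_t cp)
{
    size_t i = invmap_search(im, cp);
    if (im->maps[i] == MAP_UNCHANGED)
        return cp;
    return im->maps[i] + (cp - im->starts[i]);
}

/* Derive a direct lookup table for code points 0..255 from the map. */
static void invmap_to_table(const invmap *im, uint32_t table[256])
{
    for (uint32_t c = 0; c < 256; c++)
        table[c] = invmap_translate(im, c);
}
```

Under these assumptions, tr/A-Z/a-z/ needs only three ranges: starts = {0, 'A', 'Z'+1} with maps = {MAP_UNCHANGED, 'a', MAP_UNCHANGED}, which is the "very short inversion map" case the message mentions.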
|
|
|
|
| |
This function dumps out an inversion map
|
|
|
|
|
| |
This also makes sure 'struct_size' has the correct value in it for any
future uses.
|
|
|
|
| |
For legibility and maintainability
|
|
|
|
| |
This makes it more mnemonic. Also add an explanation in toke.c
|
|
|
|
| |
Indent for clarity, and add a comment
|
|
|
|
| |
Remove trailing blanks and outdent a doubly indented block
|
|
|
|
|
| |
This is in preparation for a future commit which will surround this with
an 'if'.
|
|
|
|
| |
gh #17254
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Perform only a bit check instead of a much more expensive hash
lookup to test features.
For now I've just added a U32 to the cop structure to store the bits;
if we need more we could either add more bits directly, or make it a
pointer.
We don't have the immediate need for a pointer that warnings do, since
we don't dynamically add new features during compilation/runtime.
The changes to %^H are retained so that caller() can be used from perl
code to check the features enabled at a given caller's scope.
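The bit check described above can be sketched as follows. The struct and the feature names are hypothetical stand-ins, not the core's definitions; the point is only that testing a feature becomes a single mask operation on a 32-bit field instead of a hash lookup.

```c
#include <stdint.h>

/* Hypothetical feature bits; the names are illustrative only. */
#define FEAT_SAY    (UINT32_C(1) << 0)
#define FEAT_STATE  (UINT32_C(1) << 1)
#define FEAT_FC     (UINT32_C(1) << 2)

/* Stand-in for the per-statement structure that carries the bits. */
struct stmt_info {
    uint32_t features;
};

/* Test a feature with one AND instead of a hash lookup. */
static int feature_enabled(const struct stmt_info *s, uint32_t bit)
{
    return (s->features & bit) != 0;
}

/* Turn one or more features on. */
static void feature_enable(struct stmt_info *s, uint32_t bits)
{
    s->features |= bits;
}

/* Turn one or more features off. */
static void feature_disable(struct stmt_info *s, uint32_t bits)
{
    s->features &= ~bits;
}
```

A 32-bit field caps the number of features at 32, which matches the message's note that more bits or a pointer could be added later if needed.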
|
|
|
|
|
|
| |
When looking for a suitable op-sized chunk of memory in a slab's free
list, perl logs the search but doesn't log a successful match. Add such
a log line to make analysis of the output of 'perl -DS' easier.
|
|
|
|
|
|
|
|
|
|
|
| |
When using one of the globals like $_ or @_ in a subroutine signature,
the error message was misleading:
    Can't use global $_ in "my"
This commit changes it to:
    Can't use global $_ in subroutine signature
|
|
|
|
|
|
|
|
|
|
| |
rpeep() already optimises away consecutive nextstate ops. This commit
makes it do this even if there are 'noop' ops between them like null,
scope, lineseq.
This has a specific utility for the next commit, which will reorganise
the optree for subroutine signatures in a way which introduces a lineseq
between two nextstates.
|
|
|
|
|
|
|
|
| |
original merge commit: v5.31.3-198-gd2cd363728
reverted by: v5.31.4-0-g20ef288c53
The commit following this commit fixes the breakage, which means
the revert can be undone.
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit d2cd363728088adada85312725ac9d96c29659be, reversing
changes made to 068b48acd4bdf9e7c69b87f4ba838bdff035053c.
This change breaks installing Test::Deep:
...
not ok 37 - Test 'isa eq' completed
ok 38 - Test 'isa eq' no premature diagnostication
...
|
|
|
|
|
|
|
|
| |
The OP_ENTER planted at the start of a program (and possibly elsewhere)
gets left as UNKNOWN context rather than VOID context, due to op_scope()
not honouring the current context.
Fixing this makes things infinitesimally faster.
|
|
|
|
| |
It requires the prefix and a thread context parameter.
|
|
|
|
|
|
|
|
|
| |
RT #134344
My recent commit v5.31.2-54-g8c47b5bce7 broke some CPAN modules because
the code in Perl_newFOROP() wasn't accounting for the overhead in the
opslot struct when deciding whether an allocated LISTOP was large enough
to be upgraded in-place to a LOOPOP.
|
|
|
|
|
|
|
|
|
| |
Formerly, slots were allocated within a slab, but leaving the very top
word in the slab as a NULL pointer which appeared as a fake slot so that
a 'while (slot->opslot_next)' loop would stop. Since opslot_next has
been eradicated and the NULL is no longer allocated, the loop condition
for scanning all slots can be simplified slightly (with no change in
functionality).
|
|
|
|
|
|
|
|
|
| |
Currently, each allocated opslot has a pointer to the opslot that was
allocated immediately above it. Replace this with a U16 opslot_size field
giving the size of the opslot. The next opslot can then be found by
adding slot->opslot_size * sizeof(void*) to slot.
This saves space.
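A rough sketch of the size-based stepping described above. The slot struct here is a simplified stand-in, not the real OPSLOT layout, and the function names are hypothetical; it only illustrates finding the next slot by adding size * sizeof(void*) to the current slot's address.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified slot header (an assumption for illustration): the size is
 * measured in pointer-sized words and includes the header itself. */
typedef struct {
    uint16_t size_words;
} slot;

/* The next slot starts size_words pointer-sized words further on. */
static slot *slot_next(slot *s)
{
    return (slot *)((char *)s + (size_t)s->size_words * sizeof(void *));
}

/* Count the slots packed back to back in [first, end). */
static size_t count_slots(void *first, void *end)
{
    size_t n = 0;
    for (slot *s = first; (char *)s < (char *)end; s = slot_next(s))
        n++;
    return n;
}
```

A 16-bit size takes less room than a full pointer, which is the space saving the message refers to.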
|
|
|
|
|
|
|
| |
Currently an OPSLAB maintains a pointer to the lowest allocated OPSLOT
within the slab (slots are allocated downwards). Replace this pointer
with a U16 indicating how many pointer-sized words are free below the
lowest allocated slot.
|
|
|
|
|
|
|
|
| |
Currently this struct only has the opslab_size field on debugging
builds. Change it so that this field is always present. This will make
it easier to not need a fake partial OPSLOT at the end of the slab with
a NULL opslot_next field, which will in turn simplify converting
opslot_next into a U16 size field shortly.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Each OPSLOT allocated within an OPSLAB contains a pointer, opslot_slab,
which points back to the first (head) slab of the slab chain (i.e. not
necessarily to the slab which the op is contained in).
This commit changes the pointer to be a 16-bit offset from the start of
the current slab, and adds a pointer at the start of each slab which
points back to the head slab.
The mapping from an op to the head slab is now a two-step process: use
the op's slot's opslot_offset field to find the start of the current
slab, then use that slab's new opslab_head pointer to find the head
slab.
The advantage of this is that it reduces the storage per op. (It
probably doesn't make any practical difference yet, due to alignment
issues, but that will be sorted shortly in this branch.)
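The two-step mapping from a slot to the head slab might look roughly like this. The struct layouts and function names are simplified assumptions for illustration (slot storage is assumed to follow the slab header directly in one allocation); only the idea of a 16-bit offset back to the slab start plus a per-slab head pointer comes from the message.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified slab header; slot storage is assumed to follow it directly. */
typedef struct slab slab;
struct slab {
    slab *head;   /* first slab in the chain */
    slab *next;   /* next slab in the chain, or NULL */
};

/* Simplified slot header: distance back to the start of the containing
 * slab, measured in pointer-sized words. */
typedef struct {
    uint16_t offset_words;
} slot;

/* Hypothetical helper: place a slot at word index i of the slab's storage
 * and record its offset from the slab start. */
static slot *slot_place(slab *sl, size_t i)
{
    slot *s = (slot *)((char *)(sl + 1) + i * sizeof(void *));
    s->offset_words = (uint16_t)(((char *)s - (char *)sl) / sizeof(void *));
    return s;
}

/* Step 1: from the slot back to the slab that contains it. */
static slab *slot_to_slab(slot *s)
{
    return (slab *)((char *)s - (size_t)s->offset_words * sizeof(void *));
}

/* Step 2: from that slab to the head of the chain. */
static slab *slot_to_head(slot *s)
{
    return slot_to_slab(s)->head;
}
```

The trade is one extra pointer per slab in exchange for replacing a full pointer in every slot with a 16-bit field.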
|
|
|
|
|
| |
Rename this local var to better identify that it always points to the
first slab in the slab chain, rather than to the current slab.
|
|
|
|
|
| |
Recursion is left in a few places where it is necessary to call itself
with a different value for 'type'.
|
|
|
|
| |
... between switch cases for readability.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently this function asserts that its 'o' argument is non-VOID;
later when recursing an OP_LIST, it skips any kids which are VOID.
This commit changes it so that the assert becomes a return, and
OP_LIST doesn't check whether its kids are VOID.
Doing it this way makes it easier to shortly make Perl_op_lvalue_flags()
non-recursive.
The only functional difference is that on debugging builds,
Perl_op_lvalue_flags() will no longer fail an assert if inadvertently
called with a VOID op.
|
|
|
|
|
|
|
|
| |
First, move the apidoc text for op_lvalue() to be directly above
Perl_op_lvalue_flags() (it had wandered).
Secondly, add a brief non-API note explaining what the extra 'flags'
parameter does
|
|
|
|
|
| |
... after the previous commit wrapped most of it in a while loop. Also
put a blank line after each switch case for readability.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For an OP_NULL, this function formerly recursed into *all* its kids
if it was an ex-list, otherwise only the first one.
To simplify making this function non-recursive, make it so that it
unconditionally recurses into all the kids.
However for now, also add an assertion that a non ex-list OP_NULL
will only have one child at most. If we find some code which violates
this, then we can make a more informed decision as to whether
non ex-list OP_NULL's should have all, or only their first child
examined.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For every CV that's freed which has a shared optree (e.g. a closure
or between threads), the whole optree is walked looking for PMOPs.
Make that walk non-recursive.
Contrived code that triggers a stack overflow:
    {
        my $outer;
        my $e = 'sub { $outer && '
                . join('&&', ('$x') x 100_000)
                . " }";
        #print $e, "\n";
        eval $e;
    }
Even after this commit, that code still SEGVs due to a separate stack
blow in Perl_rpeep().
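A generic sketch of a non-recursive tree walk of the kind this message describes. The node layout (explicit first_child, next_sibling and parent pointers) and the is_match flag are assumptions for illustration and do not mirror the real op structures; the point is that the walk uses constant C stack space no matter how deep the tree is.

```c
#include <stddef.h>

/* Hypothetical tree node; is_match stands in for "this op is a PMOP". */
typedef struct node node;
struct node {
    node *first_child;
    node *next_sibling;
    node *parent;
    int is_match;
};

/* Pre-order walk of the subtree rooted at `root` without recursion. */
static size_t count_matches(node *root)
{
    size_t n = 0;
    node *o = root;

    while (o) {
        if (o->is_match)
            n++;

        /* Descend first if possible. */
        if (o->first_child) {
            o = o->first_child;
            continue;
        }

        /* Otherwise climb until there is a sibling to visit,
         * stopping if we arrive back at the root. */
        while (o != root && !o->next_sibling)
            o = o->parent;
        if (o == root)
            break;
        o = o->next_sibling;
    }
    return n;
}
```

A recursive version of the same walk would use one C stack frame per level of nesting, which is what overflows on inputs like the 100_000-term expression above.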
|
|
|
|
| |
Previous commit added a while loop.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This stops the following code from SEGVing for example:
    my $e = "\$r";
    $e = "+do{$e}" for 1..70_000;
    $e = "push \@{$e}, 1";
    eval $e;
Similarly with a long
    $a[0][0][0][0].....
This commit causes a slight change in behaviour, in that scalar(o)
is now only called once at the end of the top-level doref() call,
rather than at the end of processing each child. This should make no
functional difference, apart from speeding up compiling infinitesimally.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
With this commit and some previous ones, the following code no longer
blows the stack:
    my $e = "1";
    $e = "do { \$x; $e}" for 1..100_000;
    $e = "\@x = $e";
    eval $e;
|
| |
|
|
|
|
|
|
|
|
| |
This function just blindly assumes that cUNOPo->op_first is a valid
indication that the op has at least one child. This is successful *most*
of the time. Putting in an assertion caused t/op/lvref.t to fail.
Instead, check the OPf_KIDS flag.
|
|
|
|
|
| |
It applies void context, which isn't all that obvious just from the
name.
|
|
|
|
|
| |
There are a couple of places where this function recurses, but they
are both effectively tail recursion and can be easily eliminated.
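Eliminating tail recursion, as described above, generally means turning "do some work, then call myself on one child as the last step" into a loop. The sketch below uses a hypothetical op struct and function names; it is not the function this commit changes, only an illustration of the transformation.

```c
#include <stddef.h>

/* Hypothetical op: each op has at most one child we continue into. */
typedef struct op op;
struct op {
    op *last_child;
    int visited;
};

/* Recursive form: the recursive call is the very last thing done,
 * so nothing in this frame is needed once it is made. */
static void mark_recursive(op *o)
{
    if (!o)
        return;
    o->visited = 1;
    mark_recursive(o->last_child);   /* tail call */
}

/* Equivalent loop: the tail call becomes "update o and go round again". */
static void mark_iterative(op *o)
{
    while (o) {
        o->visited = 1;
        o = o->last_child;
    }
}
```

Both functions mark the same ops; the loop version simply does not depend on the compiler performing tail-call optimisation to avoid deep stacks.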
|
|
|
|
| |
plus a few blank lines for readability.
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 4fec880468dad87517895b935b19a8d51e98b5a6 converted the
static boolean function S_is_list_assignment() into a 3-valued
function: S_assignment_type().
However, much of the code body still did things like 'return TRUE'.
Replace these with 'return ASSIGN_LIST' etc. These have the same
physical values, so there's no functional change here. But it makes the
code more consistent and readable.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The part of this function that scans the children of e.g.
    $scalar = do { void; void; scalar }
applying scalar context only to the last child: tail call optimise that
call to Perl_scalar().
It also adds some extra 'warnings' tests. An earlier attempt at this
patch caused some unrelated tests to start emitting spurious 'useless in
void context' messages, which are covered by the new tests.
This also showed up that the current method for updating PL_curcop
while descending optrees in Perl_scalar/scalarvoid/S_scalarseq is a bit
broken. It gets updated every time a newstate op is seen, but haphazardly
(and sometimes wrongly) restored to &PL_compiling when going back up the
tree. One of the tests is TODO based on PL_curcop being wrong and so the
'no warnings "void"' leaking into an outer scope.
This commit maintains the status quo.
|
|
|
|
|
|
|
|
|
| |
The if statement that scans children applying void context to all except
the last child:
1) document what it does;
2) reorganise it (without changing its logical meaning) to make it
simpler to understand, and to make the next commit easier.
|
|
|
|
|
| |
.. that has just been wrapped in a while loop.
Whitespace-only change.
|