| Commit message | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
| |
These are identified as being static shared COW strings whose string
buffer points directly at PL_Yes / PL_No
Define sv_setbool() and sv_setbool_mg() macros
Use sv_setbool() where appropriate
Have sv_dump() annotate when an SV's PV buffer is one of the PL_(Yes|No) special booleans
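A standalone C analogue of the idea (FakeSV, YES_PV, NO_PV, setbool, and is_shared_bool are invented names for illustration, not Perl's API): a boolean's string buffer points directly at a shared static buffer, so code like sv_dump() can recognize a boolean by buffer identity rather than by contents.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Interned buffers standing in for Perl's PL_Yes ("1") and PL_No ("") */
static const char YES_PV[] = "1";
static const char NO_PV[]  = "";

typedef struct { const char *pv; size_t cur; } FakeSV;

/* Analogue of sv_setbool(): point the string buffer directly at the
 * shared static buffer instead of copying it */
#define setbool(sv, b) \
    ((b) ? ((sv)->pv = YES_PV, (sv)->cur = 1) \
         : ((sv)->pv = NO_PV,  (sv)->cur = 0))

/* Analogue of the sv_dump() annotation: a shared boolean is detected
 * by buffer address, not by string contents */
static int is_shared_bool(const FakeSV *sv) {
    return sv->pv == YES_PV || sv->pv == NO_PV;
}
```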
|
|
|
|
|
|
|
|
| |
This saves a conditional in many cases. Core Perl doesn't call this
on an empty string, so the first test that it is empty is redundant.
We can't guarantee this for non-core calls, so the conditional is made
explicit for them.
|
|
|
|
| |
This resolves GH #19069
|
|
|
|
|
|
|
|
|
| |
Now that the DFA is used by the only callers of this function, eliminating the need to check for, e.g., wrong continuation bytes, the function can be refactored to use a switch statement, which makes it clearer, shorter, and faster.
The name is changed to indicate its private nature.
|
|
|
|
| |
This makes it use the fast DFA for this functionality.
|
|
|
|
|
|
|
|
| |
The DFA macro for determining if a sequence is valid UTF-8 was
deliberately made general enough to accommodate this use-case, in which
only a partial character is acceptable. Change the code to use the DFA.
The helper function's name is changed to indicate it is private.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are currently three functions for variants of finding if the next
few bytes of a string form a proper UTF-8 encoded character of some ilk.
The main code for each is identical to the others, except for the table
that drives it.
This commit makes that code a macro that takes arguments to customize
its behavior sufficiently for current and foreseeable needs.
This makes it easier to keep the varieties in sync with each other
through future changes.
The macro has three exit points:
  1) successful parsing
  2) unsuccessful parsing
  3) successful parsing as far as it went, but the input was exhausted
     before reaching a full character.
What to do for each of these eventualities is passed to the macro. This
is a change in behavior: previously 2) and 3) were not distinguished
from each other. The new distinction actually leads to fewer tests in
some situations, and future commits using this DFA for other purposes
will take advantage of it.
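A simplified byte-wise checker (not Perl's actual DFA table, which also rejects overlongs and surrogates) illustrating the three exit points the macro distinguishes:

```c
#include <assert.h>
#include <stddef.h>

enum utf8_result { UTF8_OK, UTF8_FAIL, UTF8_SHORT };

/* Classify the bytes at s..end as: a complete valid UTF-8 character
 * (OK), invalid (FAIL), or valid as far as it went but the input was
 * exhausted before a full character (SHORT). */
static enum utf8_result
check_utf8_char(const unsigned char *s, const unsigned char *end)
{
    size_t want, i;
    if (s >= end)          return UTF8_SHORT;   /* no input at all */
    if (s[0] < 0x80)       return UTF8_OK;      /* ASCII/invariant  */
    else if (s[0] < 0xC2)  return UTF8_FAIL;    /* bare continuation or
                                                   overlong lead byte */
    else if (s[0] < 0xE0)  want = 2;
    else if (s[0] < 0xF0)  want = 3;
    else if (s[0] < 0xF5)  want = 4;
    else                   return UTF8_FAIL;    /* beyond Unicode max */
    for (i = 1; i < want; i++) {
        if (s + i >= end)            return UTF8_SHORT;  /* exhausted */
        if ((s[i] & 0xC0) != 0x80)   return UTF8_FAIL;   /* bad cont. */
    }
    return UTF8_OK;
}
```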
|
|
|
|
|
|
|
| |
Now that previous commits have made it fast to find the position of the
first set bit in a word, we can use a formula to find how many bytes the
UTF-8 representation of that value will occupy. This allows for
simplification of this macro, removing several conditionals.
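A sketch of such a formula for standard UTF-8 (code points up to 0x10FFFF; Perl's extended range and EBCDIC differ, and the exact expression the macro uses isn't shown here): past the 7-bit ASCII range, each additional byte carries 5 more payload bits than the previous total would suggest, so the length falls out of the msb position directly.

```c
#include <assert.h>
#include <stdint.h>

/* Position of the most significant set bit, 0-based (portable
 * stand-in for the fast single-instruction version) */
static int msb_pos(uint32_t cp) {
    int p = 0;
    while (cp >>= 1) p++;
    return p;
}

/* Bytes needed to encode cp in standard UTF-8: 1 byte holds 7 bits;
 * 2 bytes 11; 3 bytes 16; 4 bytes 21 */
static int utf8_len_for(uint32_t cp) {
    int b = cp ? msb_pos(cp) : 0;
    return b < 7 ? 1 : (b + 4) / 5;
}
```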
|
|
|
|
|
|
|
|
|
| |
This commit makes is_HANGUL_ED_utf8_safe() return 0 unconditionally on
EBCDIC platforms. This means its callers don't have to care which
platform they are running on. Change the two callers to take advantage
of this.
The commit also changes the description of the macro to be slightly
more accurate.
|
|
|
|
|
|
| |
Experiments by Tomasz Konojacki indicated that gcc, for one, doesn't
optimize a subtraction from 2**n-1 well. This commit spells the
optimization out for the compiler.
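The identity involved can be sketched as follows (the commit's exact expression isn't shown here; this is just the borrow-free observation): subtracting x from an all-ones mask 2**n-1 never borrows, so it is the same as flipping the low n bits.

```c
#include <assert.h>
#include <stdint.h>

/* For any x with 0 <= x <= mask where mask == 2**n - 1, the
 * subtraction mask - x cannot borrow, so it equals x ^ mask */
static uint32_t sub_from_mask(uint32_t x, uint32_t mask) {
    return x ^ mask;   /* same value as mask - x, but plainly cheap */
}
```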
|
|
|
|
|
|
|
|
|
| |
Some platforms have a fast way to get the msb but not the lsb; others,
more rarely, have the reverse. But a few shifts and similar
instructions allow us to express either operation in terms of the
other.
This commit causes any available fast method to be used by turning the
non-available case into the available one.
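The two reductions can be sketched like this (the slow loop versions stand in for whatever fast primitive the platform actually provides): `x & -x` isolates the lowest set bit, and the msb of a one-bit word is that bit; going the other way, smearing the top bit downward and re-isolating it turns an lsb primitive into an msb.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-ins for the platform's fast primitives */
static int msb_pos(uint32_t x) {               /* x must be nonzero */
    int p = 0;
    while (x >>= 1) p++;
    return p;
}
static int lsb_pos(uint32_t x) {               /* x must be nonzero */
    int p = 0;
    while (!(x & 1)) { x >>= 1; p++; }
    return p;
}

/* lsb in terms of msb: x & -x leaves only the lowest set bit */
static int lsb_from_msb(uint32_t x) {
    return msb_pos(x & -x);
}

/* msb in terms of lsb: smear the top bit down so every lower bit is
 * set, then isolate the top bit and ask for its (only) low bit */
static int msb_from_lsb(uint32_t x) {
    x |= x >> 1;  x |= x >> 2;  x |= x >> 4;
    x |= x >> 8;  x |= x >> 16;
    return lsb_pos((x >> 1) + 1);
}
```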
|
| |
|
|
|
|
|
| |
Windows has intrinsics different from the ones the previous commit
added, with a different API for counting leading/trailing zeros.
|
|
|
|
|
|
| |
On many modern platforms these functions can be replaced by a single
machine instruction or two. This commit looks for this possibility and
uses it if possible.
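For the GCC/Clang case (Perl's probe-and-fallback machinery isn't shown here), the builtins below typically compile to one or two machine instructions; both are undefined for a zero argument, so callers must guarantee a set bit exists.

```c
#include <assert.h>
#include <stdint.h>

/* __builtin_clz counts leading zeros, so the msb position is its
 * complement within the 32-bit word; __builtin_ctz counts trailing
 * zeros, which is the lsb position directly.  x must be nonzero. */
static int msb_pos(uint32_t x) { return 31 - __builtin_clz(x); }
static int lsb_pos(uint32_t x) { return __builtin_ctz(x); }
```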
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The existing code to determine the position of the most significant 1
bit in a word is extracted from variant_byte_number(), and generalized
to use the deBruijn method previously added that works on any bit in the
word, rather than the existing method, which looks just at the msb of
each byte. The code is moved to a new function in preparation for being
called from other places.
A U32 version is created, and on 64 bit platforms, a second, parallel,
version taking a U64 argument is also created. This is because future
commits may care about the word size differences.
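A 32-bit sketch of the de Bruijn method applied to the msb (this uses the well-known table from the public bit-twiddling literature; Perl's actual constants may differ): smear the high bit down, isolate it, and let the de Bruijn multiply map the lone bit to its position.

```c
#include <assert.h>
#include <stdint.h>

/* Classic 32-bit de Bruijn lookup: the multiply places a distinct
 * 5-bit index in the top bits for each one-bit input word */
static const int debruijn32[32] = {
     0,  1, 28,  2, 29, 14, 24,  3, 30, 22, 20, 15, 25, 17,  4,  8,
    31, 27, 13, 23, 21, 19, 16,  7, 26, 12, 18,  6, 11,  5, 10,  9
};

/* msb position of x; x must be nonzero */
static int msb_pos32(uint32_t x) {
    x |= x >> 1;  x |= x >> 2;  x |= x >> 4;   /* smear msb downward */
    x |= x >> 8;  x |= x >> 16;
    x = (x >> 1) + 1;                          /* lone msb remains   */
    return debruijn32[(x * 0x077CB531u) >> 27];
}
```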
|
|
|
|
|
|
|
|
|
|
| |
The existing code to determine the position of the least significant 1
bit in a word is extracted from variant_byte_number() and moved to a new
function in preparation for being called from other places.
A U32 version is created, and on 64 bit platforms, a second, parallel,
version taking a U64 argument is also created. This is because future
commits may care about the word size differences.
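The lsb case is the textbook form of the technique (again using the widely published 32-bit table, which may not match Perl's exact constants): `x & -x` isolates the lowest set bit and the de Bruijn multiply maps it to its index.

```c
#include <assert.h>
#include <stdint.h>

/* Classic 32-bit de Bruijn lookup table */
static const int debruijn32[32] = {
     0,  1, 28,  2, 29, 14, 24,  3, 30, 22, 20, 15, 25, 17,  4,  8,
    31, 27, 13, 23, 21, 19, 16,  7, 26, 12, 18,  6, 11,  5, 10,  9
};

/* lsb position of x; x must be nonzero */
static int lsb_pos32(uint32_t x) {
    return debruijn32[((x & -x) * 0x077CB531u) >> 27];
}
```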
|
|
|
|
|
| |
This should be called only when it is known there is a variant byte.
The assert() previously wasn't checking that precisely.
|
|
|
|
|
|
|
| |
The current mechanism doesn't work if the lowest bit is the one set. At
the moment that doesn't matter as we aren't looking at that bit anyway.
But a future commit will refactor things so that bit will be looked at.
So prepare for that. The new expression is simpler, besides.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This will prove useful in future commits on platforms that have 64 bit
capability.
The deBruijn sequence used here, taken from the internet, differs from
the 32 bit one in how it treats a word with no set bits. But that input
is considered undefined behavior, so the difference is immaterial.
Apparently deriving these sequences requires brute-force search, so I
decided to live with the difference rather than expend the time needed
to bring them into sync.
|
|
|
|
|
|
| |
This moves the code from regcomp.c to inline.h that calculates the
position of the lone set bit in a U32. This is in preparation for use
by other call sites.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These three functions to determine if the next part of a string is
UTF-8 (constrained in three different ways) have basically the same
short loop.
One of the initial conditions in the while() is always true the first
time around. By moving that condition to the middle of the loop, we
avoid it for the common case where the loop is executed just once. This
is when the input is a UTF-8 invariant character (ASCII on ASCII
platforms).
If the functions were constrained to require the first byte pointed to
by the input to exist, the while() could be a do {} while(), and there
would be no extra conditional in calling this vs checking if the next
character is invariant, and if not calling this. And there would be
fewer conditionals for the case of 2 or more bytes in the character.
|
| |
|
|
|
|
|
|
| |
It is legal to call this function with empty input, though core doesn't
do so. By swapping two conditions in the same 'if', we check whether
the input is empty before trying to access it.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The try change added code to pp_return to skip past try contexts
when looking for the sub/sort/eval context to return from.
This was only needed because cx_pusheval() sets si_cxsubix to the
current frame, and try uses that function to push its context; that
value is then used by the dopopto_cursub() macro to shortcut
walking the context stack.
Since we don't need to treat try as a sub for return, list-vs-array
checks, or lvalue sub checks, don't set si_cxsubix on try.
|
|
|
|
|
|
|
|
|
|
|
| |
This just detabifies to get rid of the mixed tab/space indentation.
Applying consistent indentation and dealing with other tabs is a
separate issue.
Done with `expand -i`.
* vutil.* left alone; it's part of version.
* Left regen-managed files alone for now.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes GH #18341
There are problems with getenv() on threaded perls which can lead to
incorrect results when compiled with PERL_MEM_LOG.
Commit 0b83dfe6dd9b0bda197566adec923f16b9a693cd fixed this for some
platforms, but as Tony Cook pointed out, there may be
standards-compliant platforms that it didn't fix.
The detailed comments outline the issues and the (complicated) full
solution.
|
|
|
|
|
|
|
|
| |
get_env() needs to lock other threads out of writing to the environment
while it is executing. It may need an exclusive lock if those threads
can clobber its buffer before it gets a chance to save its contents.
The previous commit has added a Configure probe which tells us if that
is the case. This commit uses it to select which type of mutex to use.
|
|
|
|
| |
5.32 did this for one form; now all do.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These were occurring on FreeBSD smokes.
warning: implicit conversion from 'IV' (aka 'long') to 'double' changes value from 9223372036854775807 to 9223372036854775808 [-Wimplicit-int-float-conversion]
9223372036854775807 is IV_MAX. What needed to be done here was to use
the NV containing IV_MAX+1, a value that already exists in perl.h.
In other instances, simply casting to an NV before doing the comparison
with the NV was what was needed.
This fixes #18328
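The underlying issue can be demonstrated directly (names here are illustrative, not Perl's; on a typical platform IV is a 64-bit integer and NV a double): 2**63 - 1 has no exact double representation and rounds up to 2**63, while 2**63 itself is exact, so that is the safe comparison bound.

```c
#include <assert.h>
#include <stdint.h>

/* INT64_MAX (2**63 - 1) silently rounds up to 2**63 when converted to
 * double - exactly what the FreeBSD warning flagged.  Comparing
 * against the exact double 2**63 (the "NV containing IV_MAX+1")
 * avoids the lossy conversion. */
int nv_exceeds_iv_max(double nv) {
    return nv >= 9223372036854775808.0;
}
```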
|
|
|
|
|
|
| |
This function is called only at compile time; experience has shown that
compile-time operations are not time-critical. And future commits will
lengthen it, making it not practically inlinable anyway.
|
|
|
|
|
|
|
|
|
|
|
| |
This feature allows documentation destined for perlapi or perlintern to
be split into sections of related functions, no matter where the
documentation source is. Prior to this commit the line had to contain
the exact text of the title of the section. Now it can be a $variable
name that autodoc.pl expands to the title. It still has to be an exact
match for the variable in autodoc, but now, the expanded text can be
changed in autodoc alone, without other files needing to be updated at
the same time.
|
| |
|
| |
|
|
|
|
|
| |
This uses a new organization of sections that I came up with. I asked
for comments on p5p, but there were none.
|
|
|
|
|
| |
apidoc_section is slightly favored over head1, as it is known only to
autodoc, and can't be confused with real pod.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This inline function was added by v5.31.0-27-g3a019afd6f to consolidate
similar code in several places, like pp_add(). It also avoided undefined
behaviour, as seen by ASan, by no longer unconditionally trying to cast
an NV to IV - ASan would complain when nv was -Inf for example.
However that commit introduced a performance regression into common
numeric operators like pp_and(). This commit partially claws back
performance by skipping the initial 'skip if NaN' test, which called
Perl_isnan(). Instead, except on systems where NAN_COMPARE_BROKEN is
true, it relies on NaN being compared to anything always being false,
and simply rearranges existing conditions nv < IV_MIN etc to be
nv >= IV_MIN so that any NaN comparison will trigger a false return.
This claws back about half the performance loss. The rest seems
unavoidable, since the two range tests for IV_MIN..IV_MAX are an
unavoidable part of avoiding undefined behaviour.
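The trick can be sketched as follows (a simplified range check, not the actual inline function): writing both bounds in the "is in range" direction means a NaN, for which every ordered comparison is false, falls through to "does not fit" without any explicit isnan() call.

```c
#include <assert.h>
#include <math.h>

/* Both comparisons are phrased so they must be TRUE for the value to
 * fit; NaN makes each of them false, so NaN is rejected for free.
 * Bounds are the 64-bit IV range expressed as exact doubles: -2**63
 * and 2**63. */
static int nv_fits_iv(double nv) {
    return nv >= -9223372036854775808.0 && nv < 9223372036854775808.0;
}
```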
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GH #18081
A sub call via return in a sort block was called in void rather than
scalar context, causing the comparison result to be discarded.
This is because when a sort block is called it is not a real function
call, even though a sort block can be returned from. Instead, a
CXt_NULL is pushed on the context stack. Because this isn't a sub-ish
context type (unlike CXt_SUB, CXt_EVAL etc) there is no 'caller sub'
on the context stack to be found to retrieve the caller's context
(i.e. cx->cx_gimme).
This commit fixes it by special-casing Perl_gimme_V().
Ideally at some future point, a new context type, CXt_SORT, should be
added. This would be used instead of CXt_NULL when a sort BLOCK is
called. Like other sub-ish context types, it would have an old_cxsubix
field and PL_curstackinfo->si_cxsubix would point to it. This would
eliminate needing special-case handling in places like Perl_gimme_V().
|
|
|
|
|
|
|
|
|
| |
This returns the number of elements in an array in a clearly named
function.
av_top_index() and av_tindex() are clearly named, but are less than
ideal; they came about because no one back then had thought of this
name, until Paul Evans now did.
|
|
|
|
|
| |
It only does anything under PERL_GLOBAL_STRUCT, which is gone.
Keep the dNOOP definition for CPAN back-compat.
|
|
|
|
|
| |
Mostly in comments and docs, but some in diagnostic messages and one
case of 'or die die'.
|
|
|
|
|
|
|
| |
This reverts commit 6c714a09cc08600278e72aea1fcdf83576d061b4.
croak_memory_wrap is designed to save a few bytes of memory, and was
never intended to be inlined. This commit moves it to util.c where the
other croak functions are.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous commit added a mutex specifically for protecting against
simultaneous accesses of the environment. This commit changes the
normal getenv, putenv, and clearenv functions to use it, to avoid races.
This makes the code simpler in places where we've gotten burned and
added stuff to avoid races. Other places where we were getting burned
without knowing it may also have existed. Now the protection comes
automatically, and we can remove the special cases we earlier stumbled
over.
getenv() returns a pointer to static memory, which can be overwritten at
any moment from another thread, or even another getenv from the same
thread. This commit changes the accesses to be under control of a
mutex, and in the case of getenv, a mortalized copy is created so that
there is no possible race.
|
| |
|
| |
|
| |
|
|
|
|
| |
This internal function is more properly bool, not I32.
|
|
|
|
|
| |
The remaining function in this file is moved to inline.h, just to not
have an extra file lying around with hardly anything in it.
|
|
|
|
|
|
|
| |
This commit changes this function to use memchr() instead of looping
byte-by-byte through the string. And it inlines it into 3 lines of
code. This should give comparable performance to a native libc
strnlen().
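The three-line shape the commit describes looks roughly like this (my_strnlen is a stand-in name): libc's heavily optimized memchr() finds the NUL, and the result falls out of pointer arithmetic.

```c
#include <stddef.h>
#include <string.h>

/* strnlen replacement: scan at most maxlen bytes for the NUL using
 * memchr(), returning maxlen if none is found */
static size_t my_strnlen(const char *s, size_t maxlen) {
    const char *p = (const char *)memchr(s, '\0', maxlen);
    return p ? (size_t)(p - s) : maxlen;
}
```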
|