| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On something like $hash{constantstring}, at compile-time the
PVX string on the SV attached to the OP_CONST is converted into a
HEK (with an appropriate offset shift).
At run-time on hash keying, this HEK is used to speed up the bucket
search; however it turns out that this can be improved. Currently,
the main bucket loop does:
for (; entry; entry = HeNEXT(entry)) {
if (HeHASH(entry) != hash)
continue;
if (HeKLEN(entry) != (I32)klen)
continue;
if (HeKEY(entry) != key && memNE(HeKEY(entry),key,klen))
continue;
if ((HeKFLAGS(entry) ^ masked_flags) & HVhek_UTF8)
continue;
The 'HeKEY(entry) != key' test is the bit that allows us to skip the
memNE() when 'key' is actually part of a HEK. However, this means that in
the const HEK scenario, for a match, we do pointless hash, klen and
HVhek_UTF8 tests, when HeKEY(entry) == key is sufficient for a
match. Conversely, in the non-const-HEK scenario, the 'HeKEY(entry) !=
key' will always fail, and so it's just dead weight in the loop.
To work around this, this commit splits the code into two separate bucket
search loops; one for const-HEKs that just compare HEK pointers, and a
general loop that now doesn't have do the 'HeKEY(entry) != key' test.
Analysing this code with cachegrind shows that with this commit, lookups
of constant keys that exist (e.g. the typical perl object scenario,
$self->{somefield}) takes 15% less instruction reads in hv_common(), 14%
less data reads and 27% less writes.
A lookup with a non-existing constant key ($hash{not_exist}) is about the
same as before (0.7% improvement).
Non-constant existing lookup ($hash{$existing_key}) is about 5% less
instructions, while $hash{$non_existing_key} is about 0.7%.
|
|
|
|
|
|
|
|
| |
You need to configure with g++ *and* -Accflags=-DPERL_GLOBAL_STRUCT
or -Accflags=-DPERL_GLOBAL_STRUCT_PRIVATE to see any difference.
(g++ does not do the "post-annotation" form of "unused".)
The version code has some of these issues, reported upstream.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Removing context params will save machine code in the callers of these
functions, and 1 ptr of stack space. Some of these funcs are heavily used
as mg_find*. The contexts can always be readded in the future the same way
they were removed. This patch inspired by commit dc3bf40570. Also remove
PERL_UNUSED_CONTEXT when its not needed. See removal candidate rejection
rational in [perl #122106].
-Perl_hv_backreferences_p uses context in S_hv_auxinit
commit 96a5add60f was wrong
-Perl_whichsig_sv and Perl_whichsig_pv wrongly used PERL_UNUSED_CONTEXT
from inception in commit 84c7b88cca
-in authors opinion cast_* shouldn't be public API, no CPAN grep usage,
can't be static and/or inline optimized since it is exported
-Perl_my_unexec move to block where it is needed, make Win32 block, context
free, for inlining likelyhood, private api and only 2 callers in core
-Perl_my_dirfd make all blocks context free, then change proto
-Perl_bytes_cmp_utf8 wrongly used PERL_UNUSED_CONTEXT
from inception in commit fed3ba5d6b
|
|
|
|
|
|
|
|
| |
This meant sprinkling some PERL_UNUSED_CONTEXT invocations,
as well as stopping some functions from getting my_perl in
the first place; all of the functions in the latter category
are internal (S_ prefix and s or i in embed.fnc), so
this should be both safe and economical.
|
|
|
|
| |
This silences a chunk of warnings under -Wformat
|
|
|
|
|
|
|
|
|
|
|
| |
Unlike other pod handling routines, autodoc requires the line following
an =head1 to be non-empty for its text to be included in the paragraph
started by the heading. If you fail to do this, silently the text will
be omitted from perlapi. I went through the source code, and where it
was apparent that the text was supposed to be in perlapi, deleted the
empty line so it would be, with some revisions to make more sense.
I added =cuts where I thought it best for the text to not be included.
|
|
|
|
|
|
| |
Fix for Coverity perl5 CID 28935:
Operands don't affect result (CONSTANT_EXPRESSION_RESULT)
result_independent_of_operands: (unsigned long)entry->hent_hek->hek_hash >> 47 /* 64 - 17 */ is 0 regardless of the values of its operands. This occurs as the bitwise second operand of '|'.
|
|
|
|
|
|
|
|
|
|
|
|
| |
The assumption is that the time/space tradeoff of not allocating
the HvAUX() structure goes away for a large bucket array where the
size of the allocated buffer is much larger than the nonallocated
HvAUX() "extension".
This should make keys() and each() on larger hashes faster, but
still preserve the essence of the original space conservation,
where the assumption is a lot of small hash based objects which
will never be traversed.
|
|
|
|
|
|
| |
Changes nothing except that it introduces hv_auxinit_interal() which
does part of the job of hv_auxinit(), so that we can call it from
somewhere else in the next commit.
|
|
|
|
| |
HvUSEDKEYS contains a function call nowadays. Don't call it repeatedly.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since the HvAUX structure is just tacked onto the end of the HvARRAY()
struct, code like this can do bad things:
aux = HvAUX();
... something that might split hv ...
aux->foo = ...; /* SEGV! */
So I've visually audited core for places where HbAUX() is saved and then
re-used, and re-initialised the var if it looks like HvARRAY() could
have changed in the meantime.
I've been very conservative about what might be unsafe. For example,
destructors or __WARN__ handlers could call perl code that modifies the
hash.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add an extra U32 general flags field to the xpvhv_aux struct (which is
used on HVs such as stashes, that need extra fields).
On 64-bit systems, this doesn't consume any extra space since there's
already an odd number of I32/U32 fields. On 32-bit systems it will consume
an extra 4 bytes. But of course only on those hashes that have the aux
struct.
As well as providing extra flags in the AUX case, it will also allow
us to free up at least one general flag bit for HVs - see next commit.
|
|
|
|
| |
This should fix RT #116441 and possibly other bugs.
|
|
|
|
| |
This reduces the size of hv.o by 32 bytes under clang.
|
|
|
|
| |
plus some typo fixes. I probably changed some things in perlintern, too.
|
|
|
|
|
|
|
| |
In those cases where the hash key comes from a hek, we already have a
computed hash value, so pass that to hv_common.
The easiest way to accomplish this is to add a new macro.
|
|
|
|
|
| |
This uses macros which work cross-platform. This has the added advantge
what is going on is much clearer.
|
|
|
|
|
|
|
| |
Iterated hashes shouldn’t have to allocate space for something
specific to stashes, so move the SUPER method cache from the
HvAUX struct (which all iterated hashes have) into the mro
meta struct (which only stashes have).
|
|
|
|
|
|
|
| |
This is left over from when READONLY+FAKE meant copy-on-write.
Read-only copy-on-write scalars (which could not occur with the old
way of flagging things) must not be exempt from hash key restrictions.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is part of #116907, too. It also fixes #72924 as a side effect;
the next commit will explain.
The value of pos($foo) was being stored as an I32, not allowing values
above I32_MAX. Change it to SSize_t (the signed equivalent of size_t,
representing the maximum string length the OS/compiler supports).
This is accomplished by changing the size of the entry in the magic
struct, which is the simplest fix.
Other parts of the code base can benefit from this, too.
We actually cast the pos value to STRLEN (size_t) when reading
it, to allow *very* long strings. Only the value -1 is special,
meaning there is no pos. So the maximum supported offset is
2**sizeof(size_t)-2.
The regexp engine itself still cannot handle large strings, so being
able to set pos to large values is useless right now. This is but one
piece in a larger puzzle.
Changing the size of mg->mg_len also requires that
Perl_hv_placeholders_p change its type. This function
should in fact not be in the API, since it exists
solely to implement the HvPLACEHOLDERS macro. See
<https://rt.perl.org/rt3/Ticket/Display.html?id=116907#txn-1237043>.
|
|
|
|
| |
It was not clear that it referred to uvar magic.
|
|
|
|
|
|
| |
This avoids HvFILL() being O(n) for large n on large hashes, but also avoids
storing the value of HvFILL() in smaller hashes (ie a memory overhead on
every single object built using a hash.)
|
|
|
|
|
|
|
| |
No keys implies no chains used, so the return value is 0. One key
unambiguously means 1 chain used, and all the others are free. Two or more
keys might share the same chain, or might not, so the calculation can't be
short-circuited.
|
|
|
|
|
| |
The are lots of places where local vars aren't used when compiled
with NO_TAINT_SUPPORT.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds support for PERL_PERTURB_KEYS environment variable, which in turn allows one to control
the level of randomization applied to keys() and friends.
When PERL_PERTURB_KEYS is 0 we will not randomize key order at all. The
chance that keys() changes due to an insert will be the same as in
previous perls, basically only when the bucket size is changed.
When PERL_PERTURB_KEYS is 1 we will randomize keys in a non repeatedable
way. The chance that keys() changes due to an insert will be very high.
This is the most secure and default mode.
When PERL_PERTURB_KEYS is 2 we will randomize keys in a repeatedable way.
Repititive runs of the same program should produce the same output every
time. The chance that keys changes due to an insert will be very high.
This patch also makes PERL_HASH_SEED imply a non-default
PERL_PERTURB_KEYS setting. Setting PERL_HASH_SEED=0 (exactly one 0) implies
PERL_PERTURB_KEYS=0 (hash key randomization disabled), settng PERL_HASH_SEED
to any other value, implies PERL_PERTURB_KEYS=2 (deterministic/repeatable
hash key randomization). Specifying PERL_PERTURB_KEYS explicitly to a
different level overrides this behavior.
Includes changes to allow one to compile out various aspects of the
patch. One can compile such that PERL_PERTURB_KEYS is not respected, or
can compile without hash key traversal randomization at all. Note that
support for these modes is incomplete, and currently a few tests will
fail.
Also includes a new subroutine in Hash::Util::hash_traversal_mask()
which can be used to ensure a given hash produces a predictable key
order (assuming the same hash seed is in effect). This sub acts as a
getter and a setter.
NOTE - this patch lacks tests, but I lack tuits to get them done quickly,
so I am pushing this with the hope that others can add them afterwards.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The usages are as far as I know incorrect anyway. We resize
the hash bucket array based on the number of keys it holds,
not based on the number of buckets that are used, so this
usage was wrong anyway.
Another bug that this revealed is that the old code would allow
HvMAX(hv) to fall to 0, even though every other part of the
core expects it to have a minimum of 7 (meaning 8 buckets).
As part of this we change the hard coded 7 to a defined constant
PERL_HASH_DEFAULT_HvMAX.
After this patch there remains one use of HvFILL in core, that used
for scalar(%hash) which I plan to remove in a later patch.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Inserting into a hash that is being traversed with each()
has always produced undefined behavior. With hash traversal
randomization this is more pronounced, and at the same
time relatively easy to spot. At the cost of an extra U32
in the xpvhv_aux structure we can detect that the xhv_rand
has changed and then produce a warning if it has.
It was suggested on IRC that this should produce a fatal
error, but I couldn't see a clean way to manage that with
"strict", it was much easier to create a "severe" (internal)
warning, which is enabled by default but suppressible with
C<no warnings "internal";> if people /really/ wanted.
|
|
|
|
|
|
|
| |
This serves two functions, it makes it harder for an attacker
to learn useful information by viewing the output of keys(),
and it makes "insert during traversal" errors much easier to
spot, as they will almost always produce degenerate behavior.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When inserting into a hash results in a collision the order of the items
in the bucket chain is predictable (FILO), and can be used to determine
that a collision has occured.
When a hash is too small for the number of items it holds we double
its size and remap the items as required. During this process the
keys in a bucket will reverse order, and exposes information to an
attacker that a collision has occured.
We therefore use the PL_hash_rand_bits() and the S_ptr_hash()
infrastructure to randomly "perturb" the order that colliding
items are inserted into the bucket chain. During insertion and
mapping instead of doing a simple "insert to top" we check the low
bit of PL_hash_rand_bits() and depending if it is set or not we
insert at the top of the chain, otherwise second from the top.
The end result being that the order in a bucket is less predictable,
which should make it harder for an attacker to spot a collision.
Every insert (via hv_common), and bucket doubling (via hsplit())
results in us updating PL_hash_rand_bits() using "randomish" data
like the hashed bucket address, the hash of the inserted item, and
the address of the inserted item.
This also updates the xhv_rand() of the hash, if there is one, during
S_hsplit() so that the iteration order changes when S_hsplit() is
called. This also is intended to make it harder for an attacker to
aquire information about collisions.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds:
S_ptr_hash() - A new static function in hv.c which can be used to
hash a pointer or integer.
PL_hash_rand_bits - A new interpreter variable used as a cheap
provider of "semi-random" state for use by the hash infrastructure.
xpvhv_aux.xhv_rand - Used as a mask which is xored against the
xpvhv_aux.riter during iteration to randomize the order the actual
buckets are visited.
PL_hash_rand_bits is initialized as interpreter start from the random
hash seed, and then modified by "mixing in" the result of ptr_hash()
on the bucket array pointer in the hv (HvARRAY(hv)) every time
hv_auxinit() allocates a new iterator structure.
The net result is that every hash has its own iteration order, which
should make it much more difficult to determine what the current hash
seed is.
This required some test to be restructured, as they tested for something
that was not necessarily true, we never guaranteed that two hashes with
the same keys would produce the same key order, we merely promised that
using keys(), values(), or each() on the same hash, without any
insertions in between, would produce the same order of visiting the
key/values.
|
|
|
|
| |
This saves one call to HvPLACEHOLDERS_get().
|
|
|
|
|
|
|
|
|
|
|
| |
Which makes me realise that if we clear placeholders, we may be able to
avoid the need to split at all.
(In fact, as hv_common() only adds one key to the hash, under the current
definition of DO_HSPLIT() which only considers total number of keys,
clearing any placeholder is going to be enough to drop the total number of
keys, and so no longer trigger the split. But we'll leave the code making a
second check, to avoid a tight coupling with the internals of DO_HSPLIT().)
|
|
|
|
|
| |
Seems pointless to check the exit condition before any iterations, when we
know that it will always be false the first time.
|
|
|
|
|
| |
The code duplication that introduced hv_ksplit() as a fork of hsplit() back
with commit 72940dca186befa0 in Sept 1996 is finally healed.
|
|
|
|
|
|
|
| |
This mimics the behaviour in Perl_hv_ksplit().
Also remove a vestigial comment. The code it relates to was removed in
commit 7dc8663964c66a69 in Nov 2012.
|
|
|
|
|
|
|
|
| |
Whilst this is slightly more work for its existing two callers, it will
permit Perl_hv_ksplit() to also call it.
Use STRLEN for the parameters, and change a local variable from I32 to
STRLEN to match.
|
|
|
|
|
| |
This makes the rest of the code of Perl_hv_ksplit() closer to that of
S_hsplit().
|
|
|
|
| |
Making the code as similar as possible will make it simpler to merge the two.
|
|
|
|
|
|
|
|
|
|
| |
The relevant code calls Perl_hv_clear_placeholders() at split time, if there
are still placeholders left over from a (previously) restricted hash.
There are two callers to S_hsplit(), one from the regular HV code, and one
from the shared string table code. As the shared string table can't contain
placeholders, only the other call site could trigger this condition, so move
the code there. This simplifies S_hsplit(), and will make splitting the
shared string table marginally faster.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
STRANGE_MALLOC was added in 5.002 beta 1 (4633a7c4bad06b47) as part of an
work around for typical mallocs, which had a bad interaction with perl's
allocation needs. Specifically, repeatedly extending an array and then
creating SV heads (such as when reading lines of a file into an array)
could end up with each reallocation for the array being unable to extend in
place, needing a fresh chunk of memory, and the released memory not being
suitable for use as more SV heads, so sitting unused. The solution was for
perl to recycle the old array body as SV heads, instead of returning it to
the system, passing the memory from the the AV code to the SV code using
offer_nice_chunk(), PL_nice_chunk and PL_nice_chunk_size.
STRANGE_MALLOC was actually a signal that the malloc() didn't need
protecting from itself, and to disable the work around.
offer_nice_chunk(), PL_nice_chunk and PL_nice_chunk_size were removed by
commit 9a87bd09eea1d037 in Nov 2010, without any ill effects, hence the
code used when STRANGE_MALLOC was *not* defined is essentially doing extra
work for no benefits.
From the lack of problems reported, one can assume that in the intervening
15 years malloc technology has got significantly improved, and it is probably
better to be honest with it, rather than trying to second guess it.
Hence remove all the non-STRANGE_MALLOC code, and leave everyone using the
much simpler code. See also
http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2005-11/msg00495.html
http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2013-01/msg00126.html
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The purpose is less machine instructions/faster code.
* S_hv_free_ent_ret() is always called with entry non-null: so change its
signature to reflect this, and remove a null check;
* Add some SvREFCNT_dec_NNs;
* In hv_clear(), refactor the code slightly to only do a SvREFCNT_dec_NN
within the branch where its already been determined that the arg is
non-null; also, use the _nocontext variant of Perl_croak() to save
a push instruction in threaded perls.
|
|
|
|
|
| |
In this instance, we know that av is not null, so no need to check
whether it is
|
|
|
|
|
|
|
| |
There is no point in modifying a struct just before freeing it.
This was my mistake, in commit 47f1cf7702, when I moved the code from
S_hfreeentries to hv_undef.
|
|
|
|
|
|
|
| |
This finishes the removal of register declarations started by
eb578fdb5569b91c28466a4d1939e381ff6ceaf4. It neglected the ones in
function parameter declarations, and didn't include things in dist, ext,
and lib, which this does include
|
|
|
|
|
|
| |
We do not want to resize the hash every time the bucket length is
too long. Nor do we want to pay the price of checking how long
the bucket length is when there is nothing we can do about it anyway.
|
|
|
|
|
| |
note: this failed to build in smoke-me eg.
http://perl.develop-help.com/raw/?id=131750
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch does the following:
*) Introduces multiple new hash functions to choose from at build
time. This includes Murmur-32, SDBM, DJB2, SipHash, SuperFast, and
One-at-a-time. Currently this is handled by muning hv.h. Configure
support hopefully to follow.
*) Changes the default hash to Murmur hash which is faster than the
old default One-at-a-time.
*) Rips out the old HvREHASH mechanism and replaces it with a
per-process random hash seed.
*) Changes the old PL_hash_seed from an interpreter value to a
global variable. This means it does not have to be copied during
interpreter setup or cloning.
*) Changes the format of the PERL_HASH_SEED variable to a hex
string so that hash seeds longer than fit in an integer are possible.
*) Changes the return of Hash::Util::hash_seed() from a number to a
string. This is to accomodate hash functions which have more bits than
can be fit in an integer.
*) Adds new functions to Hash::Util to improve introspection of hashes
-) hash_value() - returns an integer hash value for a given string.
-) bucket_info() - returns basic hash bucket utilization info
-) bucket_stats() - returns more hash bucket utilization info
-) bucket_array() - which keys are in which buckets in a hash
More details on the new hash functions can be found below:
Murmur Hash: (v3) from google, see
http://code.google.com/p/smhasher/wiki/MurmurHash3
Superfast Hash: From Paul Hsieh.
http://www.azillionmonkeys.com/qed/hash.html
DJB2: a hash function from Daniel Bernstein
http://www.cse.yorku.ca/~oz/hash.html
SDBM: a hash function sdbm.
http://www.cse.yorku.ca/~oz/hash.html
SipHash: by Jean-Philippe Aumasson and Daniel J. Bernstein.
https://www.131002.net/siphash/
They have all be converted into Perl's ugly macro format.
I have not done any rigorous testing to make sure this conversion
is correct. They seem to function as expected however.
All of them use the random hash seed.
You can force the use of a given function by defining one of
PERL_HASH_FUNC_MURMUR
PERL_HASH_FUNC_SUPERFAST
PERL_HASH_FUNC_DJB2
PERL_HASH_FUNC_SDBM
PERL_HASH_FUNC_ONE_AT_A_TIME
Setting the environment variable PERL_HASH_SEED_DEBUG to 1 will make
perl output the current seed (changed to hex) and the hash function
it has been built with.
Setting the environment variable PERL_HASH_SEED to a hex value will
cause that value to be used at the seed. Any missing bits of the seed
will be set to 0. The bits are filled in from left to right, not
the traditional right to left so setting it to FE results in a seed
value of "FE000000" not "000000FE".
Note that we do the hash seed initialization in perl_construct().
Doing it via perl_alloc() (via init_tls) causes problems under
threaded builds as the buffers used for reentrant srand48 functions
are not allocated. See also the p5p mail "Hash improvements blocker:
portable random code that doesnt depend on a functional interpreter",
Message-ID:
<CANgJU+X+wNayjsNOpKRqYHnEy_+B9UH_2irRA5O3ZmcYGAAZFQ@mail.gmail.com>
|