| Commit message (Collapse) | Author | Age | Files | Lines |
| |
The assumption is that the time/space tradeoff of not allocating
the HvAUX() structure goes away for a large bucket array where the
size of the allocated buffer is much larger than the nonallocated
HvAUX() "extension".
This should make keys() and each() on larger hashes faster, but
still preserve the essence of the original space conservation,
where the assumption is a lot of small hash based objects which
will never be traversed.
| |
Consider a class that has some minimal overloading added - e.g. to give
pretty stringification of objects - but which *doesn't* overload
dereference methods such as '@{}', '%{}', etc.
In this case, simple dereferencing, such as $obj->[0] or $obj->{foo}
becomes much slower than if the object was blessed into a non-overloaded
class.
This is because every time a dereferencing is performed in pp_rv2av for
example, the "normal" code path has to go through the full checking of:
* is the stash into which the referent is blessed overloaded? If so,
* retrieve the overload magic from the stash;
* check whether the overload method cache has been invalidated and if so
rebuild it;
* check whether we are in the scope of 'no overloading', and if so
is the current method disabled in this scope?
* Is there a '@{}' or whatever (or 'nomethod') method in the cache?
If not, then process the ref as normal.
That's a lot of extra overhead to decide that an overloaded method doesn't
in fact need to be called.
This commit adds a new flag to the newish xhv_aux_flags field,
HvAUXf_NO_DEREF, which signals that the overloading of this stash
contains no deref (nor 'nomethod') overloaded methods. Thus a quick check
for this flag in the common case allows us to short-circuit all the above
checks except the first one.
Before this commit, a simple $obj->[0] was about 40-50% slower if the
class it was blessed into was overloaded (but didn't have deref methods);
after the commit, the slowdown is 0-10%. (These timings are very
approximate, given the vagaries of nano benchmarks.)
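The short-circuit described above can be sketched roughly as follows. This is a minimal illustration, not the code in pp_rv2av: the demo_stash struct, the DEMO_AUXf_NO_DEREF bit and both functions are hypothetical stand-ins, and setting the flag from inside the slow path is an assumption made only to keep the example self-contained.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the real perl structures. */
#define DEMO_AUXf_NO_DEREF 0x1u    /* "no deref/nomethod overloads" flag bit */

typedef struct {
    bool     overloaded;    /* does the class have any overloading at all? */
    uint32_t aux_flags;     /* plays the role of xhv_aux_flags */
    int      slow_lookups;  /* counts how often the slow path ran */
    bool     has_deref;     /* what the slow path would discover */
} demo_stash;

/* Slow path: stands in for fetching the overload magic, revalidating the
 * method cache, checking 'no overloading' and looking up the method. */
static bool demo_slow_has_deref(demo_stash *st)
{
    st->slow_lookups++;
    if (!st->has_deref)
        st->aux_flags |= DEMO_AUXf_NO_DEREF;  /* assumption: remember the negative result here */
    return st->has_deref;
}

/* Returns true if an overloaded deref method has to be called. */
static bool demo_needs_deref_overload(demo_stash *st)
{
    if (!st->overloaded)
        return false;                     /* the first check is always performed */
    if (st->aux_flags & DEMO_AUXf_NO_DEREF)
        return false;                     /* flag set: skip all remaining checks */
    return demo_slow_has_deref(st);
}
```

Calling demo_needs_deref_overload() twice on an overloaded class without deref methods runs the slow path only once in this sketch; the second call is answered by the flag test alone.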
|
|
|
|
|
|
|
|
|
|
| |
Currently the SVf_IsCOW flag doesn't have any meaning for HVs,
except that it is used in the specific case of gv_check() to temporarily
mark a stash as being scanned. Since stashes will have the HV_AUX fields,
we can use a flags bit in the new xhv_aux_flags field instead.
This then potentially frees up the SVf_IsCOW for use as a new general flag
bit for *all* HVs (including non-stash ones).
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add an extra U32 general flags field to the xpvhv_aux struct (which is
used on HVs such as stashes, that need extra fields).
On 64-bit systems, this doesn't consume any extra space since there's
already an odd number of I32/U32 fields. On 32-bit systems it will consume
an extra 4 bytes. But of course only on those hashes that have the aux
struct.
As well as providing extra flags in the AUX case, it will also allow
us to free up at least one general flag bit for HVs - see next commit.
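The space argument can be checked with a small sketch. The two structs below are hypothetical layouts (a pointer followed by three or four 32-bit fields), not the real xpvhv_aux definition; they only show how an odd number of 32-bit fields leaves padding that a fourth field can occupy when pointers are 8-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: a pointer followed by an odd number of 32-bit fields. */
struct demo_aux_before {
    void    *name;
    int32_t  riter;
    int32_t  name_count;
    uint32_t rand;
};

/* Same layout plus the extra U32 flags field. */
struct demo_aux_after {
    void    *name;
    int32_t  riter;
    int32_t  name_count;
    uint32_t rand;
    uint32_t flags;   /* the new field */
};

/* Number of bytes the extra field costs on the current platform. */
static size_t demo_growth(void)
{
    return sizeof(struct demo_aux_after) - sizeof(struct demo_aux_before);
}
```

On a typical 64-bit ABI demo_growth() should come out as 0, because the new field lands in what was trailing padding; on a typical 32-bit ABI it should be 4.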
|
|
|
|
| |
plus some typo fixes. I probably changed some things in perlintern, too.
|
|
|
|
|
|
|
|
| |
where possible
This involved adding hv_fetchhek and hv_storehek macros and changing
S_mro_clean_isarev to accept a hash parameter and expect HVhek_UTF8
instead of SVf_UTF8.
|
|
|
|
| |
Let’s defuse this time bomb before it causes problems.
|
|
|
|
|
|
|
| |
In those cases where the hash key comes from a hek, we already have a
computed hash value, so pass that to hv_common.
The easiest way to accomplish this is to add a new macro.
|
|
|
|
|
|
|
|
|
|
| |
HeSVKEY_force() is only used in two places in core.
In the first case, the key is always stored as an SV (when handling tie
magic, since NEXTKEY can only return an SV).
The second case is in B::HE, but I don't see a way to create a B::HE object
from a hash.
|
|
|
|
|
|
|
| |
Iterated hashes shouldn’t have to allocate space for something
specific to stashes, so move the SUPER method cache from the
HvAUX struct (which all iterated hashes have) into the mro
meta struct (which only stashes have).
|
|
|
|
|
|
|
|
|
| |
Commit 8c34e50d inadvertently caused DESTROY caches not to be
reset when UNIVERSAL::DESTROY changes. Normally, a change to
a method will cause mro_method_changed_in to be called on all
subclasses, but mro.c cheats for UNIVERSAL and just does
++PL_sub_generation. So clearing the DESTROY cache explicitly
in mro_method_changed_in is clearly not enough.
|
|
|
|
|
|
| |
This avoids HvFILL() being O(n) for large n on large hashes, but also avoids
storing the value of HvFILL() in smaller hashes (i.e. a memory overhead on
every single object built using a hash).
|
|
|
|
|
|
| |
Install was a copy of other material, made heavy reference to 5.8.x and
didn't really document what it should have. I reworked it to be more
up to date.
| |
Adds support for the PERL_PERTURB_KEYS environment variable, which in turn
allows one to control the level of randomization applied to keys() and friends.
When PERL_PERTURB_KEYS is 0 we will not randomize key order at all. The
chance that keys() changes due to an insert will be the same as in
previous perls, basically only when the bucket size is changed.
When PERL_PERTURB_KEYS is 1 we will randomize keys in a non-repeatable
way. The chance that keys() changes due to an insert will be very high.
This is the most secure and default mode.
When PERL_PERTURB_KEYS is 2 we will randomize keys in a repeatable way.
Repeated runs of the same program should produce the same output every
time. The chance that keys() changes due to an insert will be very high.
This patch also makes PERL_HASH_SEED imply a non-default
PERL_PERTURB_KEYS setting. Setting PERL_HASH_SEED=0 (exactly one 0) implies
PERL_PERTURB_KEYS=0 (hash key randomization disabled); setting PERL_HASH_SEED
to any other value implies PERL_PERTURB_KEYS=2 (deterministic/repeatable
hash key randomization). Specifying PERL_PERTURB_KEYS explicitly to a
different level overrides this behavior.
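The interaction between the two variables can be sketched as a small decision function. This is only an illustration of the rules stated above: demo_perturb_keys_mode() is a hypothetical helper, it takes the raw environment strings (or NULL when unset), and it accepts only the single digits 0, 1 and 2 for PERL_PERTURB_KEYS, which is an assumption about the accepted syntax.

```c
#include <stddef.h>
#include <string.h>

/* Returns the effective mode (0, 1 or 2).
 * hash_seed / perturb_keys are the raw environment values, or NULL if unset. */
static int demo_perturb_keys_mode(const char *hash_seed, const char *perturb_keys)
{
    /* An explicit PERL_PERTURB_KEYS setting overrides everything else.
     * Assumption: only the single digits "0", "1" and "2" are recognised. */
    if (perturb_keys != NULL &&
        perturb_keys[0] >= '0' && perturb_keys[0] <= '2' &&
        perturb_keys[1] == '\0')
        return perturb_keys[0] - '0';

    /* PERL_HASH_SEED implies a non-default mode: exactly "0" disables
     * randomization, any other value selects the repeatable mode. */
    if (hash_seed != NULL)
        return strcmp(hash_seed, "0") == 0 ? 0 : 2;

    return 1;   /* default: non-repeatable randomization */
}
```

For example, demo_perturb_keys_mode("0", NULL) gives 0, demo_perturb_keys_mode("ABCD", NULL) gives 2, and demo_perturb_keys_mode("0", "1") gives 1 because the explicit setting wins.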
Includes changes to allow one to compile out various aspects of the
patch. One can compile such that PERL_PERTURB_KEYS is not respected, or
can compile without hash key traversal randomization at all. Note that
support for these modes is incomplete, and currently a few tests will
fail.
Also includes a new subroutine, Hash::Util::hash_traversal_mask(),
which can be used to ensure a given hash produces a predictable key
order (assuming the same hash seed is in effect). This sub acts as a
getter and a setter.
NOTE - this patch lacks tests, but I lack tuits to get them done quickly,
so I am pushing this with the hope that others can add them afterwards.
| |
As far as I know the usages are incorrect anyway. We resize
the hash bucket array based on the number of keys it holds,
not based on the number of buckets that are used.
Another bug that this revealed is that the old code would allow
HvMAX(hv) to fall to 0, even though every other part of the
core expects it to have a minimum of 7 (meaning 8 buckets).
As part of this we change the hard coded 7 to a defined constant
PERL_HASH_DEFAULT_HvMAX.
After this patch there remains one use of HvFILL in core, that used
for scalar(%hash) which I plan to remove in a later patch.
|
|
|
|
|
|
|
|
| |
* If the hash is not OOK omit any iterator status information
instead of showing -1/NULL
* If the hash is OOK then add the RAND value from the iterator
and if the LASTRAND is not the same show it too
* Tweak tests to test the above.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Inserting into a hash that is being traversed with each()
has always produced undefined behavior. With hash traversal
randomization this is more pronounced, and at the same
time relatively easy to spot. At the cost of an extra U32
in the xpvhv_aux structure we can detect that the xhv_rand
has changed and then produce a warning if it has.
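The detection idea can be sketched like this. The demo_iter struct and its functions are hypothetical, and the way an insert changes the mask (adding a constant) is purely an assumption for the example; the text above only says that a change in xhv_rand is what gets detected.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t rand;       /* plays the role of xhv_rand */
    uint32_t last_rand;  /* the extra U32: the value of rand the iterator last saw */
} demo_iter;

/* Called when iteration starts: remember the current mask. */
static void demo_iter_start(demo_iter *it)
{
    it->last_rand = it->rand;
}

/* Assumption: an insert perturbs the mask somehow. Adding a nonzero
 * constant is used here only because it always changes the value. */
static void demo_on_insert(demo_iter *it)
{
    it->rand += 0x9E3779B9u;
}

/* Called on each each() step. Returns false (where a warning would be
 * emitted) if the mask changed since the last step, then resynchronises
 * so the condition is reported once per change. */
static bool demo_iter_step_is_safe(demo_iter *it)
{
    if (it->last_rand != it->rand) {
        it->last_rand = it->rand;
        return false;
    }
    return true;
}
```

In this sketch a step right after demo_iter_start() is reported as safe, a step following demo_on_insert() is not, and the step after that is safe again.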
It was suggested on IRC that this should produce a fatal
error, but I couldn't see a clean way to manage that with
"strict", it was much easier to create a "severe" (internal)
warning, which is enabled by default but suppressible with
C<no warnings "internal";> if people /really/ wanted.
| |
Adds:
S_ptr_hash() - A new static function in hv.c which can be used to
hash a pointer or integer.
PL_hash_rand_bits - A new interpreter variable used as a cheap
provider of "semi-random" state for use by the hash infrastructure.
xpvhv_aux.xhv_rand - Used as a mask which is xored against the
xpvhv_aux.riter during iteration to randomize the order the actual
buckets are visited.
PL_hash_rand_bits is initialized at interpreter start from the random
hash seed, and then modified by "mixing in" the result of ptr_hash()
on the bucket array pointer in the hv (HvARRAY(hv)) every time
hv_auxinit() allocates a new iterator structure.
The net result is that every hash has its own iteration order, which
should make it much more difficult to determine what the current hash
seed is.
This required some tests to be restructured, as they tested for something
that was not necessarily true. We never guaranteed that two hashes with
the same keys would produce the same key order; we merely promised that
using keys(), values(), or each() on the same hash, without any
insertions in between, would produce the same order of visiting the
key/values.
|
|
|
|
|
| |
This includes various tweaks related to building SipHash and other
cleanup.
|
|
|
|
|
|
| |
builds
Murmurhash has certain disadvantages that neither ONE_AT_A_TIME nor SIPHASH possess
|
|
|
|
|
|
|
|
|
| |
With a 0 seed and ONE_AT_A_TIME_OLD hashing enabled one can simulate
older perls (with the exception there is no rehashing at play).
This includes a modest tweak to reduce ops per character by comparing
the string pointer to the end of the string, instead of maintaining
a position counter.
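The end-pointer loop can be sketched with the classic Bob Jenkins one-at-a-time function. This is not the macro from hv.h: demo_one_at_a_time() is a hypothetical standalone function, and initialising the state directly from the seed is an assumption about how the seed is mixed in.

```c
#include <stddef.h>
#include <stdint.h>

/* One-at-a-time hash, walking the string by comparing the pointer to the
 * end of the buffer instead of maintaining a separate position counter. */
static uint32_t demo_one_at_a_time(const unsigned char *str, size_t len, uint32_t seed)
{
    const unsigned char *end = str + len;
    uint32_t hash = seed;          /* assumption: the seed is the initial state */

    while (str < end) {
        hash += *str++;
        hash += (hash << 10);
        hash ^= (hash >> 6);
    }
    /* final avalanche */
    hash += (hash << 3);
    hash ^= (hash >> 11);
    hash += (hash << 15);
    return hash;
}
```

With a seed of 0 the one-byte string "a" hashes to 0xca2e9442, the usual reference value for this function.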
|
|
|
|
|
|
|
| |
This finishes the removal of register declarations started by
eb578fdb5569b91c28466a4d1939e381ff6ceaf4. It neglected the ones in
function parameter declarations, and didn't include things in dist, ext,
and lib, which this does include.
|
|
|
|
|
|
|
|
|
|
|
| |
This is just a toy. Probably not worth using in production. But
interesting enough I thought I would include it.
The idea is to use the hash seed as a table of random 16 bit integers
whose values are what we hash depending on the character we read.
It is pretty fast, I have no idea how secure it is. It will probably
work really badly if the seed is crap. YMMV.
|
| |
|
|
|
|
|
| |
The approach MurmurHash3 supplied wasn't able to probe endianness
successfully on (at least) HP-UX.
| |
This patch does the following:
*) Introduces multiple new hash functions to choose from at build
time. This includes Murmur-32, SDBM, DJB2, SipHash, SuperFast, and
One-at-a-time. Currently this is handled by munging hv.h. Configure
support hopefully to follow.
*) Changes the default hash to Murmur hash which is faster than the
old default One-at-a-time.
*) Rips out the old HvREHASH mechanism and replaces it with a
per-process random hash seed.
*) Changes the old PL_hash_seed from an interpreter value to a
global variable. This means it does not have to be copied during
interpreter setup or cloning.
*) Changes the format of the PERL_HASH_SEED variable to a hex
string so that hash seeds longer than fit in an integer are possible.
*) Changes the return of Hash::Util::hash_seed() from a number to a
string. This is to accommodate hash functions which have more bits than
can be fit in an integer.
*) Adds new functions to Hash::Util to improve introspection of hashes
-) hash_value() - returns an integer hash value for a given string.
-) bucket_info() - returns basic hash bucket utilization info
-) bucket_stats() - returns more hash bucket utilization info
-) bucket_array() - which keys are in which buckets in a hash
More details on the new hash functions can be found below:
Murmur Hash: (v3) from google, see
http://code.google.com/p/smhasher/wiki/MurmurHash3
Superfast Hash: From Paul Hsieh.
http://www.azillionmonkeys.com/qed/hash.html
DJB2: a hash function from Daniel Bernstein
http://www.cse.yorku.ca/~oz/hash.html
SDBM: a hash function sdbm.
http://www.cse.yorku.ca/~oz/hash.html
SipHash: by Jean-Philippe Aumasson and Daniel J. Bernstein.
https://www.131002.net/siphash/
They have all been converted into Perl's ugly macro format.
I have not done any rigorous testing to make sure this conversion
is correct. They seem to function as expected however.
All of them use the random hash seed.
You can force the use of a given function by defining one of
PERL_HASH_FUNC_MURMUR
PERL_HASH_FUNC_SUPERFAST
PERL_HASH_FUNC_DJB2
PERL_HASH_FUNC_SDBM
PERL_HASH_FUNC_ONE_AT_A_TIME
Setting the environment variable PERL_HASH_SEED_DEBUG to 1 will make
perl output the current seed (changed to hex) and the hash function
it has been built with.
Setting the environment variable PERL_HASH_SEED to a hex value will
cause that value to be used as the seed. Any missing bits of the seed
will be set to 0. The bits are filled in from left to right, not
the traditional right to left so setting it to FE results in a seed
value of "FE000000" not "000000FE".
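The left-to-right fill can be sketched as below. demo_parse_seed() and demo_hex_digit() are hypothetical helpers written for this example; stopping at the first non-hex character and ignoring digits beyond the buffer are assumptions, not documented behaviour.

```c
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int demo_hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Fills seed[0..seed_len) from the hex text, left to right.
 * Bits not covered by the text stay 0. Parsing stops at the first
 * non-hex character (including the terminating NUL). */
static void demo_parse_seed(const char *hex, unsigned char *seed, size_t seed_len)
{
    size_t nibble = 0;

    memset(seed, 0, seed_len);
    while (nibble < seed_len * 2) {
        int v = demo_hex_digit(hex[nibble]);
        if (v < 0)
            break;
        if (nibble % 2 == 0)
            seed[nibble / 2] = (unsigned char)(v << 4);   /* high half first */
        else
            seed[nibble / 2] |= (unsigned char)v;         /* then the low half */
        nibble++;
    }
}
```

Parsing "FE" into a 4-byte buffer yields the bytes FE 00 00 00, matching the "FE000000" example, rather than 00 00 00 FE.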
Note that we do the hash seed initialization in perl_construct().
Doing it via perl_alloc() (via init_tls) causes problems under
threaded builds as the buffers used for reentrant srand48 functions
are not allocated. See also the p5p mail "Hash improvements blocker:
portable random code that doesnt depend on a functional interpreter",
Message-ID:
<CANgJU+X+wNayjsNOpKRqYHnEy_+B9UH_2irRA5O3ZmcYGAAZFQ@mail.gmail.com>
| |
Perl caches SUPER methods inside packages named Foo::SUPER. But this
interferes with actual method calls on those packages (SUPER->foo,
foo::SUPER->foo).
The first time a package is looked up, it is vivified under the name
with which it is looked up. So *SUPER:: will cause that package
to be called SUPER, and *main::SUPER:: will cause it to be named
main::SUPER.
main->SUPER::isa used to be very sensitive to the name of the
main::FOO package (where the cache is kept). If it happened to be
called SUPER, that call would fail.
Fixing that bug (commit 3c104e59d83f) caused the CPAN module named
SUPER to fail, because SUPER->foo was now being treated as a
SUPER::method call. gv_fetchmeth_pvn was using the ::SUPER suffix to
determine where to look for the method. The package passed to it (the
::SUPER package) was being used to look for cached methods, but the
package with ::SUPER stripped off was being used for the rest of
lookup. 3c104e59d83f made main->SUPER::foo work by treating SUPER
as main::SUPER in that case. Mentioning *main::SUPER:: or doing a
main->SUPER::foo call before loading SUPER.pm also caused it to fail,
even before 3c104e59d83f.
Instead of using publicly-visible packages for internal caches, we
should be keeping them internal, to avoid such side effects.
This commit adds a new member to the HvAUX struct, where a hash of GVs
is stored, to cache super methods. I cannot simply use a hash of CVs,
because I need GvCVGEN. Using a hash of GVs allows the existing
method cache code to be used.
This new hash of GVs is not actually a stash, as it has no HvAUX
struct (i.e., no name, no mro_meta). It doesn’t even need an @ISA
entry as before (which was only used to make isa caches reset), as it
shares its owner stash’s mro_meta generation numbers. In fact, the
GVs inside it have their GvSTASH pointers pointing to the owner stash.
In terms of memory use, it is probably the same as before. Every
stash and every iterated or weakly-referenced hash is now one pointer
larger than before, but every SUPER cache is smaller (no HvAUX, no
*ISA + @ISA + $ISA[0] + magic).
The code is a lot simpler now and uses fewer stash lookups, so it
should be faster.
This will break any XS code that expects gv_fetchmeth_pvn to treat
the ::SUPER suffix as magical. This behaviour was only barely documented
(the suffix was mentioned, but what it did was not), and is
unused on CPAN.
| |
This removes most register declarations in C code (and accompanying
documentation) in the Perl core. Retained are those in the ext
directory, Configure, and those that are associated with assembly
language.
See:
http://stackoverflow.com/questions/314994/whats-a-good-example-of-register-variable-usage-in-c
which says, in part:
There is no good example of register usage when using modern compilers
(read: last 10+ years) because it almost never does any good and can do
some bad. When you use register, you are telling the compiler "I know
how to optimize my code better than you do" which is almost never the
case. One of three things can happen when you use register:
The compiler ignores it, this is most likely. In this case the only
harm is that you cannot take the address of the variable in the
code.
The compiler honors your request and as a result the code runs slower.
The compiler honors your request and the code runs faster, this is the least likely scenario.
Even if one compiler produces better code when you use register, there
is no reason to believe another will do the same. If you have some
critical code that the compiler is not optimizing well enough your best
bet is probably to use assembler for that part anyway but of course do
the appropriate profiling to verify the generated code is really a
problem first.
|
|
|
|
|
| |
This updates the editor hints in our files for Emacs and vim to request
that tabs be inserted as spaces.
|
|
|
|
|
|
| |
When seeing whether the cop hint hash contains the given feature,
Perl_feature_is_enabled only needs to see whether the hint hash element
exists. It doesn’t need to turn it into a scalar.
|
|
|
|
|
|
| |
This goes all the way back to bbce6d6978 (inseparable changes from
patch from perl5.003_08 to perl5.003_09). It is mightily confusing
for anyone trying to figure out how these things work.
|
|
|
|
| |
This comment was made obsolete by commit bc5cdc2388.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Brian's comments:
if xhv_name_count == 1, HvENAME_HEK_NN returns null.
So there's no need to use that macro twice. Just check for -1
The real need to make these smaller is the fact that some precompilers
(e.g. HP-UX 10.20) cannot cope with the size these have grown to. The
precompiler has since got an option (-Hnnn) to increase the macrospace
but that option never made it to these old compilers.
Signed-off-by: H.Merijn Brand <h.m.brand@xs4all.nl>
|
| |
|
| |
|
|
|
|
|
| |
For macros that return flags, the _get convention implies that there
could be a _set variant some day. But we don’t do that for flags.
|
|
|
|
| |
Groundwork for the following commits.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Those two macros expand into two large, almost identical chunks of code.
The only difference between the two is the source of the hash seed.
So parameterize this into a new PERL_HASH_INTERNAL_() macro.
Also, there are a couple of places in hv.c that do the rough equivalent of
if (HvREHASH(hv))
key = PERL_HASH_INTERNAL(...)
else
key = PERL_HASH(...)
which incorporates two complete macro expansions into the code.
Reorganise them to be
key = PERL_HASH_INTERNAL_(..., HvREHASH(hv))
|
|
|
|
|
|
|
|
|
| |
# New Ticket Created by (Peter J. Acklam)
# Please include the string: [perl #81904]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=81904 >
Signed-off-by: Abigail <abigail@abigail.be>
|
|
|
|
|
| |
This avoids a lot of casting. Nothing outside the perl core code is accessing
that member directly.
| |
unless called from sv_clear.
This is necessary as an undeffed stash, though it nominally becomes
just a plain hash and is not a stash any more, is still to be found
in the symbol table. It may even be in multiple places. HvENAME’s
raison d’être is to keep track of this. If the effective name is
deleted, then things can get out of sync as the test in the commit
demonstrates. This can cause problems if the hash is turned back
into a stash.
This does not change the deletion of the HvNAME, which is the only
difference between hv_clear and hv_undef on stashes that is visible
from Perl. caller still returns (unknown) or __ANON__::....
I tried to make this into several small commits, but each part of it
breaks things without the other parts, so this is one big commit.
These are the various parts:
• hv_undef no longer calls mro_package_named directly, as it deletes
the effective name of the stash. It must only be called on sub-
stashes, so hfreeentries has been modified to do that.
• hv_name_set, which has erased the HvENAME when passed a null arg
for the value ever since effective names were added (a special case
put in just for hv_undef), now leaves the HvENAME alone, unless the
new HV_NAME_SETALL flag (set to 2 to allow for UTF8 in future)
is passed.
• hv_undef does not delete the name before the call to hfreeentries
during global destruction. That extra name deletion was added when
hfreeentries stopped hiding the name, as CVs won’t be anonymised
properly if they see it. It does not matter where the CVs point if
they are to be freed shortly. This is just a speed optimisation, as
it allows the name and effective name to be deleted in one fell
swoop. Deleting just the name (not the effective name) can require a
memory allocation.
• hv_undef calls mro_isa_changed_in as it used to (before it started
using mro_package_moved), but now it happens after the entries are
freed. Calling it first, as 5.13.6 and earlier versions did, was
simply wrong.
• Both names are deleted from PL_stashcache. I inadvertently switched
it back and forth between the two names in previous commits. Since
it needed to be accounted for, it made no sense to omit it, as that would
just complicate things. (I think PL_stashcache is buggy, though I
have yet to come up with a test case.)
• sv_clear now calls Perl_hv_undef_flags with the HV_NAME_SETALL
flag, which is passed through to the second hv_name_set call,
after hfreeentries. That determines whether the effective names
are deleted.
• The changes at the end of hv_undef consist of pussyfooting to avoid
unnecessary work. They make sure that everything is freed that needs
to be and nothing is freed that must not be.
|
|
|
|
|
|
|
| |
Add flags param to hv_undef.
There is no mathom, as the changes that this will support
are by no means suitable for maint.
|
|
|
|
|
|
|
|
| |
as of 80ebaca.
It was nice while it lasted.
This reverts 6f86b615fa.
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit adds a new HV_FETCH_EMPTY_HE flag for hv_common. It is to
be used in conjunction with HV_FETCH_LVALUE. It just stops the newly-
created HE from having a new undef scalar assigned to it.
This allows code to call hv_common just once instead of an hv_exists/
hv_store pair.
It was such a double hv_common call that I was trying to avoid with
HV_FETCH_LVALUE, without realising that it was leaking.
|
|
|
|
|
|
| |
This avoids structure padding on architectures with 64 bit alignment for
pointers. For example, on x86_64 it reduces the structure size from 48 to 40
bytes.
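The effect of reordering members can be reproduced with a small sketch. The two structs are hypothetical (three pointers and three 32-bit integers each), not the structure this commit changed; they are chosen so that the sizes on a typical x86_64 ABI happen to match the 48 and 40 bytes quoted above.

```c
#include <stddef.h>
#include <stdint.h>

/* Pointers and 32-bit fields interleaved: each int32_t is followed by
 * 4 bytes of padding so the next pointer is 8-byte aligned. */
struct demo_interleaved {
    void   *a;
    int32_t x;
    void   *b;
    int32_t y;
    void   *c;
    int32_t z;
};

/* Same members, pointers first: the 32-bit fields pack together and only
 * the tail needs padding. */
struct demo_grouped {
    void   *a;
    void   *b;
    void   *c;
    int32_t x;
    int32_t y;
    int32_t z;
};

/* Bytes saved by grouping on the current platform. */
static size_t demo_bytes_saved(void)
{
    return sizeof(struct demo_interleaved) - sizeof(struct demo_grouped);
}
```

Where pointers are 4-byte aligned both layouts should have the same size, so demo_bytes_saved() is expected to be 0 there.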
|
| |
|
| |
|