| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
When giving a function-style prototype for a macro taking a literal string
parameter, put a string literal in place of a type for that parameter.
This goofy appearance makes it obvious that this isn't really a function,
and clues the reader in that the parameter can't actually be an arbitrary
expression of the right type. Also change the nonsensical "NUL-terminated
literal string" to "literal string" to describe these parameters.
Fixes [perl #116286].
|
|
|
|
|
|
|
| |
Incidentally, it currently works on SV *'s as well because there's an
explicit cast after an SvANY. Let's not rely on that. This commit also
removes a pointless const in a cast. Again. It takes an HV * as argument.
Let's only change that if we have a strong reason to.
|
|
|
|
| |
Except under cpan/ and dist/
|
|
|
|
| |
The previous commit had a double closing comment (*/)
|
| |
|
|
|
|
|
|
|
|
| |
This reverts commit d3148f758506efd28325dfd8e1b698385133f0cd.
SV keys are stored as pointers in the key_key, on platforms with
alignment requirements (such as PA-RISC) this resulted in bus errors
early in the build.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move the store of HEK_FLAGS off the end of the allocated
hek_key into the hek struct, simplifying access and providing
clarity to the code.
What is not clear is why Nicholas or perhaps Jarkko did
not do this themselves. We use similar tricks elsewhere,
so perhaps it was just continuing a tradition...
One thought is that we often have do strcmp/memeq on these
strings, and having their start be aligned might improve
performance, wheras this patch changes them to be unaligned.
If so perhaps we should just make flags a U32 and let the
HEK's be larger. They are shared in PL_strtab, and are
probably often sitting in malloc blocks that are sufficiently
large enough that making them bigger would make no practical
difference. (All of this is worth checking.)
[with edits by Yves Orton]
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For some reason s suffix macro definitions intended for handling
constant string arguments were put into handy.h and not into hv.h.
I think this is wrong, especially as the macro defintions have
"drifted" and are not properly defined in terms the right base
macros.
Also we want to have such wrappers for the main hash functions,
so move them all to hv.h, recode them properly in terms of the
right base macros, and add support for the missing functions.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This subject has a long history see [perl #114576] for more discussion.
https://rt.perl.org/Public/Bug/Display.html?id=114576
There are a variety of reasons we want to change the return signature of
scalar(%hash). One is that it leaks implementation details about our
associative array structure. Another is that it requires us to keep track
of the used buckets in the hash, which we use for no other purpose but
for scalar(%hash). Another is that it is just odd. Almost nothing needs to
know these values. Perhaps debugging, but we have several much better
functions for introspecting the internals of a hash.
By changing the return signature we can remove all the logic related
to maintaining and updating xhv_fill_lazy. This should make hot code
paths a little faster, and maybe save some memory for traversed hashes.
In order to provide some form of backwards compatibility we adds three
new functions to the Hash::Util namespace: bucket_ratio(), num_buckets()
and used_buckets(). These functions are actually implemented in
universal.c, and thus always available even if Hash::Util is not loaded.
This simplifies testing. At the same time Hash::Util contains backwards
compatible code so that the new functions are available from it should
they be needed in older perls.
There are many tests in t/op/hash.t that are more or less obsolete after
this patch as they test that xhv_fill_lazy is correctly set in various
situations. However since we have a backwards compat layer we can just
switch them to use bucket_ratio(%hash) instead of scalar(%hash) and keep
the tests, just in case they are actually testing something not tested
elsewhere.
|
|
|
|
|
| |
-perl doesn't use malloc, it uses Newx (per interp memory)
-say what the return type is of SSNEW
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We're already keeping destroy_gen there, so keep the CV there too.
The previous implementation, introduced in 8c34e50d, kept the
destroy method cache in the stash's stash, which broke B's SvSTASH
method.
Before that, the DESTROY method was cached in overload magic.
A previous version of this patch didn't clear the destructor cache on
a clone, which caused ext/XS-APItest/t/clone_with_stack.t to fail.
|
|
|
|
| |
Some entries already had this. For those, it standardizes the text.
|
| |
|
|
|
|
| |
Removes 'the' in front of parameter names in some instances.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
An empty cpan/.dir-locals.el stops Emacs using the core defaults for
code imported from CPAN.
Committer's work:
To keep t/porting/cmp_version.t and t/porting/utils.t happy, $VERSION needed
to be incremented in many files, including throughout dist/PathTools.
perldelta entry for module updates.
Add two Emacs control files to MANIFEST; re-sort MANIFEST.
For: RT #124119.
|
|
|
|
|
|
|
|
|
| |
This is to prevent a conflict showing up on z/OS (os390) because this
file's name is the same as one in /ext, and there are functions
cross-referenced between them, and the loader on that platform
can't deal with this.
See http://nntp.perl.org/group/perl.perl5.porters/226612
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The assumption is that the time/space tradeoff of not allocating
the HvAUX() structure goes away for a large bucket array where the
size of the allocated buffer is much larger than the nonallocated
HvAUX() "extension".
This should make keys() and each() on larger hashes faster, but
still preserve the essence of the original space conservation,
where the assumption is a lot of small hash based objects which
will never be traversed.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Consider a class that has some minimal overloading added - e.g. to give
pretty stringification of objects - but which *doesn't* overload
dereference methods such as '@[]'. '%[]' etc.
In this case, simple dereferencing, such as $obj->[0] or $obj->{foo}
becomes much slower than if the object was blessed into a non-overloaded
class.
This is because every time a dereferencing is performed in pp_rv2av for
example, the "normal" code path has to go through the full checking of:
* is the stash into which the referent is blessed overloaded? If so,
* retrieve the overload magic from the stash;
* check whether the overload method cache has been invalidated and if so
rebuild it;
* check whether we are in the scope of 'no overloading', and if so
is the current method disabled in this scope?
* Is there a '@{}' or whatever (or 'nomethod') method in the cache?
If not, then process the ref as normal.
That's a lot of extra overhead to decide that an overloaded method doesn't
in fact need to be called.
This commit adds a new flag to the newish xhv_aux_flags field,
HvAUXf_NO_DEREF, which signals that the overloading of this stash
contains no deref (nor 'nomethod') overloaded methods. Thus a quick check
for this flag in the common case allows us to short-circuit all the above
checks except the first one.
Before this commit, a simple $obj->[0] was about 40-50% slower if the
class it was blessed into was overloaded (but didn't have deref methods);
after the commit, the slowdown is 0-10%. (These timings are very
approximate, given the vagaries of nano benchmarks.)
|
|
|
|
|
|
|
|
|
|
| |
Currently the SVf_IsCOW flag doesn't have any meaning for HVs,
except that it is used in the specific case of gv_check() to temporarily
mark a stash as being scanned. Since stashes will have the HV_AUX fields,
we can use a flags bit in the new xhv_aux_flags field instead.
This then potentially frees up the SVf_IsCOW for use as a new general flag
bit for *all* HVs (including non-stash ones).
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add an extra U32 general flags field to the xpvhv_aux struct (which is
used on HVs such as stashes, that need extra fields).
On 64-bit systems, this doesn't consume any extra space since there's
already an odd number of I32/U32 fields. On 32-bit systems it will consume
an extra 4 bytes. But of course only on those hashes that have the aux
struct.
As well as providing extra flags in the AUX case, it will also allow
us to free up at least one general flag bit for HVs - see next commit.
|
|
|
|
| |
plus some typo fixes. I probably changed some things in perlintern, too.
|
|
|
|
|
|
|
|
| |
where possible
This involved adding hv_fetchhek and hv_storehek macros and changing
S_mro_clean_isarev to accept a hash parameter and expect HVhek_UTF8
instead of SVf_UTF8.
|
|
|
|
| |
Let’s defuse this time bomb before it causes problems.
|
|
|
|
|
|
|
| |
In those cases where the hash key comes from a hek, we already have a
computed hash value, so pass that to hv_common.
The easiest way to accomplish this is to add a new macro.
|
|
|
|
|
|
|
|
|
|
| |
HeSVKEY_force() is only used in two places in core.
In the first case, the key is always stored as a SV (when handling tie
magic, since NEXTKEY can only return a SV)
The second case is in B::HE, but I don't see a way to create a B::HE object
from a hash.
|
|
|
|
|
|
|
| |
Iterated hashes shouldn’t have to allocate space for something
specific to stashes, so move the SUPER method cache from the
HvAUX struct (which all iterated hashes have) into the mro
meta struct (which only stashes have).
|
|
|
|
|
|
|
|
|
| |
Commit 8c34e50d inadvertently caused DESTROY caches not to be
reset when UNIVERSAL::DESTROY changes. Normally, a change to
a method will cause mro_method_changed_in to be called on all
subclasses, but mro.c cheats for UNIVERSAL and just does
++PL_sub_generation. So clearing the DESTROY cache explicitly
in mro_method_changed_in is clearly not enough.
|
|
|
|
|
|
| |
This avoids HvFILL() being O(n) for large n on large hashes, but also avoids
storing the value of HvFILL() in smaller hashes (ie a memory overhead on
every single object built using a hash.)
|
|
|
|
|
|
| |
Install was a copy of other material, made heavy reference to 5.8.x and
and didnt really document what it should have. I reworked it to be more
up to date.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds support for PERL_PERTURB_KEYS environment variable, which in turn allows one to control
the level of randomization applied to keys() and friends.
When PERL_PERTURB_KEYS is 0 we will not randomize key order at all. The
chance that keys() changes due to an insert will be the same as in
previous perls, basically only when the bucket size is changed.
When PERL_PERTURB_KEYS is 1 we will randomize keys in a non repeatedable
way. The chance that keys() changes due to an insert will be very high.
This is the most secure and default mode.
When PERL_PERTURB_KEYS is 2 we will randomize keys in a repeatedable way.
Repititive runs of the same program should produce the same output every
time. The chance that keys changes due to an insert will be very high.
This patch also makes PERL_HASH_SEED imply a non-default
PERL_PERTURB_KEYS setting. Setting PERL_HASH_SEED=0 (exactly one 0) implies
PERL_PERTURB_KEYS=0 (hash key randomization disabled), settng PERL_HASH_SEED
to any other value, implies PERL_PERTURB_KEYS=2 (deterministic/repeatable
hash key randomization). Specifying PERL_PERTURB_KEYS explicitly to a
different level overrides this behavior.
Includes changes to allow one to compile out various aspects of the
patch. One can compile such that PERL_PERTURB_KEYS is not respected, or
can compile without hash key traversal randomization at all. Note that
support for these modes is incomplete, and currently a few tests will
fail.
Also includes a new subroutine in Hash::Util::hash_traversal_mask()
which can be used to ensure a given hash produces a predictable key
order (assuming the same hash seed is in effect). This sub acts as a
getter and a setter.
NOTE - this patch lacks tests, but I lack tuits to get them done quickly,
so I am pushing this with the hope that others can add them afterwards.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The usages are as far as I know incorrect anyway. We resize
the hash bucket array based on the number of keys it holds,
not based on the number of buckets that are used, so this
usage was wrong anyway.
Another bug that this revealed is that the old code would allow
HvMAX(hv) to fall to 0, even though every other part of the
core expects it to have a minimum of 7 (meaning 8 buckets).
As part of this we change the hard coded 7 to a defined constant
PERL_HASH_DEFAULT_HvMAX.
After this patch there remains one use of HvFILL in core, that used
for scalar(%hash) which I plan to remove in a later patch.
|
|
|
|
|
|
|
|
| |
* If the hash is not OOK omit any iterator status information
instead of showing -1/NULL
* If the hash is OOK then add the RAND value from the iterator
and if the LASTRAND is not the same show it too
* Tweak tests to test the above.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Inserting into a hash that is being traversed with each()
has always produced undefined behavior. With hash traversal
randomization this is more pronounced, and at the same
time relatively easy to spot. At the cost of an extra U32
in the xpvhv_aux structure we can detect that the xhv_rand
has changed and then produce a warning if it has.
It was suggested on IRC that this should produce a fatal
error, but I couldn't see a clean way to manage that with
"strict", it was much easier to create a "severe" (internal)
warning, which is enabled by default but suppressible with
C<no warnings "internal";> if people /really/ wanted.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds:
S_ptr_hash() - A new static function in hv.c which can be used to
hash a pointer or integer.
PL_hash_rand_bits - A new interpreter variable used as a cheap
provider of "semi-random" state for use by the hash infrastructure.
xpvhv_aux.xhv_rand - Used as a mask which is xored against the
xpvhv_aux.riter during iteration to randomize the order the actual
buckets are visited.
PL_hash_rand_bits is initialized as interpreter start from the random
hash seed, and then modified by "mixing in" the result of ptr_hash()
on the bucket array pointer in the hv (HvARRAY(hv)) every time
hv_auxinit() allocates a new iterator structure.
The net result is that every hash has its own iteration order, which
should make it much more difficult to determine what the current hash
seed is.
This required some test to be restructured, as they tested for something
that was not necessarily true, we never guaranteed that two hashes with
the same keys would produce the same key order, we merely promised that
using keys(), values(), or each() on the same hash, without any
insertions in between, would produce the same order of visiting the
key/values.
|
|
|
|
|
| |
This includes various tweaks related to building SipHash and other
cleanup.
|
|
|
|
|
|
| |
builds
Murmurhash has certain disadvantages that neither ONE_AT_A_TIME nor SIPHASH posses
|
|
|
|
|
|
|
|
|
| |
With a 0 seed and ONE_AT_A_TIME_OLD hashing enabled one can simulate
older perls (with the exception there is no rehashing at play).
This includes a modest tweak to reduce ops per character by comparing
the string pointer to the end of the string, instead of maintaining
a position counter.
|
|
|
|
|
|
|
| |
This finishes the removal of register declarations started by
eb578fdb5569b91c28466a4d1939e381ff6ceaf4. It neglected the ones in
function parameter declarations, and didn't include things in dist, ext,
and lib, which this does include
|
|
|
|
|
|
|
|
|
|
|
| |
This is just a toy. Probably not worth using in production. But
interesting enough I thought I would include it.
The idea is to use the hash seed as a table of random 16 bit integers
whose values are what we hash depending on the character we read.
It is pretty fast, I have no idea how secure it is. It will probably
work really badly if the seed is crap. YMMV.
|
| |
|
|
|
|
|
| |
The approach MurmurHash3 supplied wasn't able to probe endianness
successfully on (at least) HP-UX.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch does the following:
*) Introduces multiple new hash functions to choose from at build
time. This includes Murmur-32, SDBM, DJB2, SipHash, SuperFast, and
One-at-a-time. Currently this is handled by muning hv.h. Configure
support hopefully to follow.
*) Changes the default hash to Murmur hash which is faster than the
old default One-at-a-time.
*) Rips out the old HvREHASH mechanism and replaces it with a
per-process random hash seed.
*) Changes the old PL_hash_seed from an interpreter value to a
global variable. This means it does not have to be copied during
interpreter setup or cloning.
*) Changes the format of the PERL_HASH_SEED variable to a hex
string so that hash seeds longer than fit in an integer are possible.
*) Changes the return of Hash::Util::hash_seed() from a number to a
string. This is to accomodate hash functions which have more bits than
can be fit in an integer.
*) Adds new functions to Hash::Util to improve introspection of hashes
-) hash_value() - returns an integer hash value for a given string.
-) bucket_info() - returns basic hash bucket utilization info
-) bucket_stats() - returns more hash bucket utilization info
-) bucket_array() - which keys are in which buckets in a hash
More details on the new hash functions can be found below:
Murmur Hash: (v3) from google, see
http://code.google.com/p/smhasher/wiki/MurmurHash3
Superfast Hash: From Paul Hsieh.
http://www.azillionmonkeys.com/qed/hash.html
DJB2: a hash function from Daniel Bernstein
http://www.cse.yorku.ca/~oz/hash.html
SDBM: a hash function sdbm.
http://www.cse.yorku.ca/~oz/hash.html
SipHash: by Jean-Philippe Aumasson and Daniel J. Bernstein.
https://www.131002.net/siphash/
They have all be converted into Perl's ugly macro format.
I have not done any rigorous testing to make sure this conversion
is correct. They seem to function as expected however.
All of them use the random hash seed.
You can force the use of a given function by defining one of
PERL_HASH_FUNC_MURMUR
PERL_HASH_FUNC_SUPERFAST
PERL_HASH_FUNC_DJB2
PERL_HASH_FUNC_SDBM
PERL_HASH_FUNC_ONE_AT_A_TIME
Setting the environment variable PERL_HASH_SEED_DEBUG to 1 will make
perl output the current seed (changed to hex) and the hash function
it has been built with.
Setting the environment variable PERL_HASH_SEED to a hex value will
cause that value to be used at the seed. Any missing bits of the seed
will be set to 0. The bits are filled in from left to right, not
the traditional right to left so setting it to FE results in a seed
value of "FE000000" not "000000FE".
Note that we do the hash seed initialization in perl_construct().
Doing it via perl_alloc() (via init_tls) causes problems under
threaded builds as the buffers used for reentrant srand48 functions
are not allocated. See also the p5p mail "Hash improvements blocker:
portable random code that doesnt depend on a functional interpreter",
Message-ID:
<CANgJU+X+wNayjsNOpKRqYHnEy_+B9UH_2irRA5O3ZmcYGAAZFQ@mail.gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Perl caches SUPER methods inside packages named Foo::SUPER. But this
interferes with actual method calls on those packages (SUPER->foo,
foo::SUPER->foo).
The first time a package is looked up, it is vivified under the name
with which it is looked up. So *SUPER:: will cause that package
to be called SUPER, and *main::SUPER:: will cause it to be named
main::SUPER.
main->SUPER::isa used to be very sensitive to the name of the
main::FOO package (where the cache is kept). If it happened to be
called SUPER, that call would fail.
Fixing that bug (commit 3c104e59d83f) caused the CPAN module named
SUPER to fail, because SUPER->foo was now being treated as a
SUPER::method call. gv_fetchmeth_pvn was using the ::SUPER suffix to
determine where to look for the method. The package passed to it (the
::SUPER package) was being used to look for cached methods, but the
package with ::SUPER stripped off was being used for the rest of
lookup. 3c104e59d83f made main->SUPER::foo work by treating SUPER
as main::SUPER in that case. Mentioning *main::SUPER:: or doing a
main->SUPER::foo call before loading SUPER.pm also caused it to fail,
even before 3c104e59d83f.
Instead of using publicly-visible packages for internal caches, we
should be keeping them internal, to avoid such side effects.
This commit adds a new member to the HvAUX struct, where a hash of GVs
is stored, to cache super methods. I cannot simpy use a hash of CVs,
because I need GvCVGEN. Using a hash of GVs allows the existing
method cache code to be used.
This new hash of GVs is not actually a stash, as it has no HvAUX
struct (i.e., no name, no mro_meta). It doesn’t even need an @ISA
entry as before (which was only used to make isa caches reset), as it
shares its owner stash’s mro_meta generation numbers. In fact, the
GVs inside it have their GvSTASH pointers pointing to the owner stash.
In terms of memory use, it is probably the same as before. Every
stash and every iterated or weakly-referenced hash is now one pointer
larger than before, but every SUPER cache is smaller (no HvAUX, no
*ISA + @ISA + $ISA[0] + magic).
The code is a lot simpler now and uses fewer stash lookups, so it
should be faster.
This will break any XS code that expects the gv_fetchmeth_pvn to treat
the ::SUPER suffix as magical. This behaviour was only barely docu-
mented (the suffix was mentioned, but what it did was not), and is
unused on CPAN.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This removes most register declarations in C code (and accompanying
documentation) in the Perl core. Retained are those in the ext
directory, Configure, and those that are associated with assembly
language.
See:
http://stackoverflow.com/questions/314994/whats-a-good-example-of-register-variable-usage-in-c
which says, in part:
There is no good example of register usage when using modern compilers
(read: last 10+ years) because it almost never does any good and can do
some bad. When you use register, you are telling the compiler "I know
how to optimize my code better than you do" which is almost never the
case. One of three things can happen when you use register:
The compiler ignores it, this is most likely. In this case the only
harm is that you cannot take the address of the variable in the
code.
The compiler honors your request and as a result the code runs slower.
The compiler honors your request and the code runs faster, this is the least likely scenario.
Even if one compiler produces better code when you use register, there
is no reason to believe another will do the same. If you have some
critical code that the compiler is not optimizing well enough your best
bet is probably to use assembler for that part anyway but of course do
the appropriate profiling to verify the generated code is really a
problem first.
|
|
|
|
|
| |
This updates the editor hints in our files for Emacs and vim to request
that tabs be inserted as spaces.
|
|
|
|
|
|
| |
When seeing whether the cop hint hash contains the given feature,
Perl_feature_is_enabled only needs to see whether the hint hash ele-
ment exists. It doesn’t need to turn it into a scalar.
|