path: root/hv.c
* Refer to CopLABEL_len[_flags] in pod for cop_fetch_label (Karl Williamson, 2019-09-02; 1 file, -2/+8)
* perlapi: Clarify pod for cop_store_label (Karl Williamson, 2019-09-02; 1 file, -1/+1)
* Remove redundant info on =for apidoc lines (Karl Williamson, 2019-05-30; 1 file, -10/+10)

  This information is already in embed.fnc, and we know it compiles. Some
  of this information is now out-of-date. Get rid of it.

  There was one bit of information that was (apparently) wrong in
  embed.fnc. The apidoc line asked that there be no usage example
  generated for newXS. I added that flag to the embed.fnc entry.
* perlapi: Clarify entry for hv_store() (Karl Williamson, 2019-03-12; 1 file, -1/+3)
* S_hv_delete_common(): avoid undefined behaviour (David Mitchell, 2018-11-21; 1 file, -1/+1)

  ASAN -fsanitize=undefined was tripping on the second of these two lines:

      svp = AvARRAY(isa);
      end = svp + AvFILLp(isa) + 1;

  In the case where svp is NULL and AvFILLp(isa) is -1, the first
  addition is undefined behaviour. Add the 1 first, so that it becomes
  svp + (-1 + 1), which is safe.
* Use memEQs, memNEs in core files (Karl Williamson, 2017-11-06; 1 file, -1/+1)

  Where the length is known, we can use these functions which relieve the
  programmer and the program reader from having to count characters.

  The memFOO functions should also be slightly faster than the strFOO
  equivalents.

  In some instances in this commit, hard coded numbers are used. These
  come from the 'case' statement values that apply to them.
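  As a hedged illustration (not code from this commit), the difference in
  call style looks like this:

      /* before: the reader must verify that 8 == strlen("filename") */
      if (memEQ(name, "filename", 8)) { ... }

      /* after: the macro takes the buffer length and a string literal,
       * and supplies the literal's length itself */
      if (memEQs(name, len, "filename")) { ... }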
* Rename strEQs to strBEGINs; remove strNEs (Karl Williamson, 2017-11-06; 1 file, -1/+1)

  The original names are confusing. See thread beginning with
  http://nntp.perl.org/group/perl.perl5.porters/244335

  The two macros are mapped into just that one, complementing the result
  for the few cases where strNEs was used.
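  A small usage sketch (illustrative only); the new name makes the
  prefix-test semantics explicit:

      /* true if line starts with the literal prefix "#!" */
      if (strBEGINs(line, "#!")) { ... }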
* Consider magic %ENV as tied in hv_pushkv. (Craig A. Berry, 2017-08-05; 1 file, -1/+5)

  For the DYNAMIC_ENV_FETCH case, we don't know the number of keys until
  the first iteration triggers a call to prime_env_iter(), so piggyback
  on the tied magic case, which already handles extending the stack for
  each iteration rather than all at once beforehand.
* hv_pushkv(): handle keys() and values() too (David Mitchell, 2017-07-27; 1 file, -16/+35)

  The newish function hv_pushkv() currently just pushes all key/value
  pairs on the stack, i.e. it does the equivalent of the perl code
  '() = %h'. Extend it so that it can handle 'keys %h' and 'values %h'
  too.

  This is basically moving the remaining list-context functionality out
  of do_kv() and into hv_pushkv().

  The rationale for this is that hv_pushkv() is a pure HV-related
  function, while do_kv() is a pp function for several ops including
  OP_KEYS/VALUES, and expects PL_op->op_flags/op_private to be valid.
* Perl_hv_pushkv(): unroll hv_iterkeysv() (David Mitchell, 2017-07-27; 1 file, -6/+12)

  Do our own mortal stack extending and handling.
* create Perl_hv_pushkv() function (David Mitchell, 2017-07-27; 1 file, -0/+44)

  ...and make pp_padhv(), pp_rv2hv() use it rather than using
  Perl_do_kv().

  Both pp_padhv() and pp_rv2hv() (via S_padhv_rv2hv_common()) outsource
  to Perl_do_kv() the list-context pushing/flattening of a hash onto the
  stack.

  Perl_do_kv() is a big function that handles all the actions of keys,
  values etc. Instead, create a new function which does just the pushing
  of a hash onto the stack.

  At the same time, split it out into two loops, one for tied, one for
  normal: the untied one can skip extending the stack on each iteration,
  and use a cheaper HeVAL() instead of calling hv_iterval().
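  A minimal sketch of the untied fast path described above, using the
  standard iteration API (the real function also handles tied hashes and
  does its own unrolled mortal handling):

      /* flatten an untied hash onto the stack, extending it just once */
      dSP;
      HE *entry;
      const SSize_t n = HvUSEDKEYS(hv);
      EXTEND(SP, n * 2);              /* one key SV plus one value SV each */
      hv_iterinit(hv);
      while ((entry = hv_iternext(hv))) {
          PUSHs(sv_2mortal(newSVhek(HeKEY_hek(entry))));  /* key */
          PUSHs(HeVAL(entry));        /* value: no hv_iterval() call */
      }
      PUTBACK;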
* make callers of SvTRUE() more efficient (David Mitchell, 2017-07-27; 1 file, -1/+1)

  Where it's obvious that the args can't be null, use SvTRUE_NN()
  instead. Avoid possible multiple evaluations of the arg by assigning to
  a local var first if necessary.
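  For example (illustrative only):

      SV * const sv = HeVAL(entry);   /* local var: evaluated only once */
      if (SvTRUE_NN(sv))              /* sv known non-NULL: skip the check */
          ...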
* use the new PL_sv_zero in obvious places (David Mitchell, 2017-07-27; 1 file, -3/+4)

  In places that do things like mPUSHi(0) or newSViv(0), replace them
  with PUSHs(&PL_sv_zero) and &PL_sv_zero, etc. This avoids the cost of
  creating and/or mortalising an SV, and/or setting its value to 0.

  This commit causes a subtle change to tainting in various places as a
  side-effect. For example, grep in scalar context returns 0 if it has no
  args. Formerly the zero value could in theory get tainted:

      @a = ();
      $x = ( ($^X . ""), grep { 1 } @a);

  It used to be the case that $x would be tainted; now it's not. In
  practice this doesn't matter - the zero value was only getting tainted
  as a side-effect of tainting's "if anything in the statement uses a
  tainted value, taint everything" mechanism, which gives (documented)
  false positives. This commit merely removes some such false positives,
  and makes the behaviour similar to functions which return
  &PL_sv_undef/no/yes, which are also immune to side-effect tainting.
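  The substitution itself is mechanical; a hedged before/after:

      mPUSHi(0);              /* before: new SV created and mortalised */
      PUSHs(&PL_sv_zero);     /* after: shared immortal zero, no alloc */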
* hv.c: fixup args assert for HV_FREE_ENTRIES (Yves Orton, 2017-07-01; 1 file, -1/+1)
* hv.c: rename static function S_hfreeentries() to S_hv_free_entries() (Yves Orton, 2017-07-01; 1 file, -6/+6)

  hfreeentries() reads very poorly; hv_free_entries() makes more sense.
* fixup typo in comment (Yves Orton, 2017-07-01; 1 file, -1/+1)
* hv.c: silence compiler warning (Yves Orton, 2017-06-01; 1 file, -1/+1)

      hv.c: In function 'Perl_hv_undef_flags':
      hv.c:2053:35: warning: 'orig_ix' may be used uninitialized in this
      function [-Wmaybe-uninitialized]
         PL_tmps_stack[orig_ix] = &PL_sv_undef;

  The warning is bogus, as we only use orig_ix if "save" is true, and if
  "save" is true we will have initialized orig_ix. However, initializing
  it in the first place avoids any issue.
* RT #127742: Hash keys are limited to 2 GB - throw an exception if hash keys are too long (Aaron Crane, 2017-06-01; 1 file, -3/+7)

  We currently require hash keys to be less than 2**31 bytes long. But
  (a) nothing actually tries to enforce that, and (b) if a Perl program
  tries to create a hash with such a key (using a 64-bit system), we
  miscalculate the size of a memory block, yielding a panic:

      $ ./perl -e '+{ "x" x 2**31, undef }'
      panic: malloc, size=18446744071562068026 at -e line 1.

  Instead, check for this situation, and croak with an appropriate (new)
  diagnostic in the unlikely event that it occurs. This also involves
  changing the type of an argument to a public API function:
  Perl_share_hek() previously took the key's length as an I32, but that
  makes it impossible to detect over-long keys, so it must be SSize_t
  instead.

  From Yves: We also inject the length test into the PERL_HASH() macro,
  so that where the macro is used *before* calling into any of the hv
  functions we can avoid hashing a very long string only to throw an
  exception that it is too long. Might as well fail fast.
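  The shape of the guard is roughly as follows (a hedged sketch; the
  exact test and message in core may differ):

      /* reject over-long keys before hashing or allocating anything */
      if (keylen > I32_MAX)
          Perl_croak_nocontext("Sorry, hash keys must be smaller"
                               " than 2**31 bytes");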
* Restore "Tweak our hash bucket splitting rules"Yves Orton2017-06-011-12/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit e4343ef32499562ce956ba3cb9cf4454d5d2ff7f, which was a revert of 05f97de032fe95cabe8c9f6d6c0a5897b1616194. Prior to this patch we resized hashes when after inserting a key the load factor of the hash reached 1 (load factor= keys / buckets). This patch makes two subtle changes to this logic: 1. We split only after inserting a key into an utilized bucket, 2. and the maximum load factor exceeds 0.667 The intent and effect of this change is to increase our hash tables efficiency. Reducing the maximum load factor 0.667 means that we should have much less keys in collision overall, at the cost of some unutilized space (2/3rds was chosen as it is easier to calculate than 0.7). On the other hand, only splitting after a collision means in theory that we execute the "final split" less often. Additionally, insertin a key into an unused bucket increases the efficiency of the hash, without changing the worst case.[1] In other words without increasing collisions we use the space in our hashes more efficiently. A side effect of this hash is that the size of a hash is more sensitive to key insert order. A set of keys with some collisions might be one size if those collisions were encountered early, or another if they were encountered later. Assuming random distribution of hash values about 50% of hashes should be smaller than they would be without this rule. The two changes complement each other, as changing the maximum load factor decreases the chance of a collision, but changing to only split after a collision means that we won't waste as much of that space we might. [1] Since I personally didnt find this obvious at first here is my explanation: The old behavior was that we doubled the number of buckets when the number of keys in the hash matched that of buckets. So on inserting the Kth key into a K bucket hash, we would double the number of buckets. Thus the worse case prior to this patch was a hash containing K-1 keys which all hash into a single bucket, and the post split worst case behavior would be having K items in a single bucket of a hash with 2*K buckets total. The new behavior says that we double the size of the hash once inserting an item into an occupied bucket and after doing so we exceeed the maximum load factor (leave aside the change in maximum load factor in this patch). If we insert into an occupied bucket (including the worse case bucket) then we trigger a key split, and we have exactly the same cases as before. If we insert into an empty bucket then we now have a worst case of K-1 items in one bucket, and 1 item in another, in a hash with K buckets, thus the worst case has not changed.
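  Expressed as a hedged sketch (names invented; the real logic lives in
  macros in hv.c/hv.h):

      /* split only when this insert landed in an occupied bucket AND
       * the load factor now exceeds 2/3 (integer math, no floats) */
      if (collision && keys * 3 > buckets * 2)
          hsplit(hv);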
* Revert "Tweak our hash bucket splitting rules"Yves Orton2017-04-231-31/+12
| | | | | | This reverts commit 05f97de032fe95cabe8c9f6d6c0a5897b1616194. Accidentally pushed while waiting for blead-unfreeze.
* Tweak our hash bucket splitting rules (Yves Orton, 2017-04-23; 1 file, -12/+31)

  Prior to this patch we resized hashes when, after inserting a key, the
  load factor of the hash reached 1 (load factor = keys / buckets). This
  patch makes two subtle changes to this logic:

  1. We split only after inserting a key into a utilized bucket,
  2. and only when the load factor exceeds the maximum of 0.667.

  The intent and effect of this change is to increase our hash tables'
  efficiency. Reducing the maximum load factor to 0.667 means that we
  should have many fewer keys in collision overall, at the cost of some
  unutilized space (2/3rds was chosen as it is easier to calculate than
  0.7). On the other hand, only splitting after a collision means in
  theory that we execute the "final split" less often. Additionally,
  inserting a key into an unused bucket increases the efficiency of the
  hash without changing the worst case.[1] In other words, without
  increasing collisions we use the space in our hashes more efficiently.

  A side effect of this change is that the size of a hash is more
  sensitive to key insert order. A set of keys with some collisions might
  be one size if those collisions were encountered early, or another if
  they were encountered later. Assuming random distribution of hash
  values, about 50% of hashes should be smaller than they would be
  without this rule.

  The two changes complement each other: changing the maximum load factor
  decreases the chance of a collision, but changing to only split after a
  collision means that we won't waste as much of that space as we
  otherwise might.

  [1] Since I personally didn't find this obvious at first, here is my
  explanation: The old behavior was that we doubled the number of buckets
  when the number of keys in the hash matched that of buckets. So on
  inserting the Kth key into a K-bucket hash, we would double the number
  of buckets. Thus the worst case prior to this patch was a hash
  containing K-1 keys which all hash into a single bucket, and the
  post-split worst case behavior would be having K items in a single
  bucket of a hash with 2*K buckets total.

  The new behavior says that we double the size of the hash once we
  insert an item into an occupied bucket and, after doing so, exceed the
  maximum load factor (leave aside the change in maximum load factor in
  this patch). If we insert into an occupied bucket (including the worst
  case bucket) then we trigger a key split, and we have exactly the same
  cases as before. If we insert into an empty bucket then we now have a
  worst case of K-1 items in one bucket, and 1 item in another, in a hash
  with K buckets; thus the worst case has not changed.
* Correct hv_iterinit's return value documentation (Matthew Horsfall, 2017-02-28; 1 file, -2/+2)
* HvTOTALKEYS() takes a HV* as argument (Steffen Mueller, 2017-02-03; 1 file, -1/+1)

  Incidentally, it currently works on SV *'s as well because there's an
  explicit cast after an SvANY. Let's not rely on that.

  This commit also removes a pointless const in a cast. Again. It takes
  an HV * as argument. Let's only change that if we have a strong reason
  to.
* Use cBOOL() instead of ? TRUE : FALSE (Dagfinn Ilmari Mannsåker, 2017-01-25; 1 file, -2/+2)

  Except under cpan/ and dist/
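  The transformation (illustrative only):

      found = SvOK(sv) ? TRUE : FALSE;   /* before */
      found = cBOOL(SvOK(sv));           /* after */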
* Clean up warnings uncovered by 'clang -Weverything'. (Andy Lester, 2016-12-05; 1 file, -0/+1)

  For: RT #130195
* Change white space to avoid C++ deprecation warning (Karl Williamson, 2016-11-18; 1 file, -15/+15)

  C++11 requires space between the end of a string literal and a macro,
  so that a feature can unambiguously be added to the language. Starting
  in g++ 6.2, the compiler emits a warning when there isn't a space
  (presumably so that future versions can support C++11). Unfortunately
  there are many such instances in the perl core. This commit fixes
  those, including those in ext/, but individual commits will be used for
  the other modules, those in dist/ and cpan/.

  This commit also inserts space at the end of a macro before a string
  literal, even though that is not deprecated, and removes useless ""
  literals following a macro (instead of inserting a blank). The result
  is easier to read, making the macro stand out, and be clearer as to the
  intention.

  Code and modules included with the Perl core need to be compilable
  using C++. This is so that perl can be embedded in C++ programs.
  (Actually, only the hdr files need to be so compilable, but it would be
  hard to test that just the hdrs are compilable.) So we need to
  accommodate changes to the C++ language.
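  Concretely (FMT_MACRO is a hypothetical macro name for illustration):

      warn("bad value: %s"FMT_MACRO, s);    /* C++11: user-defined literal? */
      warn("bad value: %s" FMT_MACRO, s);   /* unambiguous: literal, macro */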
* Revert "hv.h: rework HEK_FLAGS to a proper member in struct hek"Tony Cook2016-11-031-1/+2
| | | | | | | | This reverts commit d3148f758506efd28325dfd8e1b698385133f0cd. SV keys are stored as pointers in the key_key, on platforms with alignment requirements (such as PA-RISC) this resulted in bus errors early in the build.
* speed up AV and HV clearing/undeffing (David Mitchell, 2016-10-26; 1 file, -7/+27)

  av_clear(), av_undef(), hv_clear(), hv_undef() and av_make() all have
  similar guards along the lines of:

      ENTER;
      SAVEFREESV(SvREFCNT_inc_simple_NN(av));
      ... do stuff ...;
      LEAVE;

  to stop the AV or HV leaking or being prematurely freed while
  processing its elements (e.g. FETCH() or DESTROY() might do something
  to it).

  Introducing an extra scope and calling leave_scope() is expensive.
  Instead, use a trick I introduced in my recent pp_aassign() recoding:
  add the AV/HV to the temps stack, then at the end of the function, just
  PL_tmps_ix-- if nothing else has been pushed on the tmps stack in the
  meantime, or replace the tmps stack slot with &PL_sv_undef otherwise
  (which doesn't care how many times its ref count gets decremented).
  This is efficient, and doesn't artificially extend the life of the SV
  like sv_2mortal() would.

  This commit makes this code around 5% faster:

      my @a;
      for my $i (1..3_000_000) {
          @a = (1,2,3);
          @a = ();
      }

  and this code around 3% faster:

      my %h;
      for my $i (1..3_000_000) {
          %h = qw(a 1 b 2);
          %h = ();
      }
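  A hedged sketch of the trick (simplified; the real code in hv.c/av.c
  differs in detail):

      SSize_t orig_ix;

      /* park hv on the temps stack instead of ENTER/SAVEFREESV/LEAVE */
      EXTEND_MORTAL(1);
      PL_tmps_stack[++PL_tmps_ix] = SvREFCNT_inc_simple_NN((SV *)hv);
      orig_ix = PL_tmps_ix;

      /* ... free entries; this may trigger FETCH()/DESTROY() ... */

      if (PL_tmps_ix == orig_ix)
          PL_tmps_ix--;                 /* nothing else pushed: just pop */
      else
          PL_tmps_stack[orig_ix] = &PL_sv_undef;  /* neutralize the slot */
      SvREFCNT_dec_NN((SV *)hv);        /* drop the guard reference */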
* hv.h: rework HEK_FLAGS to a proper member in struct hek (Todd Rinaldo, 2016-10-24; 1 file, -2/+1)

  Move the store of HEK_FLAGS off the end of the allocated hek_key into
  the hek struct, simplifying access and providing clarity to the code.

  What is not clear is why Nicholas or perhaps Jarkko did not do this
  themselves. We use similar tricks elsewhere, so perhaps it was just
  continuing a tradition...

  One thought is that we often do strcmp/memeq on these strings, and
  having their start be aligned might improve performance, whereas this
  patch changes them to be unaligned. If so perhaps we should just make
  flags a U32 and let the HEKs be larger. They are shared in PL_strtab,
  and are probably often sitting in malloc blocks that are sufficiently
  large that making them bigger would make no practical difference. (All
  of this is worth checking.)

  [with edits by Yves Orton]
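  For context, the layout being discussed (a hedged sketch of struct hek
  as it stands, simplified from hv.h):

      struct hek {
          U32  hek_hash;    /* computed hash of the key */
          I32  hek_len;     /* length of the key in bytes */
          char hek_key[1];  /* key bytes, a trailing "\0", and then the
                             * flag byte hidden past the end */
      };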
* hv.c: use new SvPVCLEAR and constant string friendly macros (Yves Orton, 2016-10-19; 1 file, -1/+1)
* perlapi: Add entry for hv_bucket_ratio (Karl Williamson, 2016-06-30; 1 file, -1/+1)

  autodoc doesn't find things like Perl_hv_bucket_ratio().
* Change scalar(%hash) to be the same as 0+keys(%hash) (Yves Orton, 2016-06-22; 1 file, -54/+57)

  This subject has a long history; see [perl #114576] for more
  discussion: https://rt.perl.org/Public/Bug/Display.html?id=114576

  There are a variety of reasons we want to change the return signature
  of scalar(%hash). One is that it leaks implementation details about our
  associative array structure. Another is that it requires us to keep
  track of the used buckets in the hash, which we use for no other
  purpose but for scalar(%hash). Another is that it is just odd. Almost
  nothing needs to know these values. Perhaps debugging, but we have
  several much better functions for introspecting the internals of a
  hash.

  By changing the return signature we can remove all the logic related to
  maintaining and updating xhv_fill_lazy. This should make hot code paths
  a little faster, and maybe save some memory for traversed hashes.

  In order to provide some form of backwards compatibility we add three
  new functions to the Hash::Util namespace: bucket_ratio(),
  num_buckets() and used_buckets(). These functions are actually
  implemented in universal.c, and thus always available even if
  Hash::Util is not loaded. This simplifies testing. At the same time
  Hash::Util contains backwards compatible code so that the new functions
  are available from it should they be needed in older perls.

  There are many tests in t/op/hash.t that are more or less obsolete
  after this patch, as they test that xhv_fill_lazy is correctly set in
  various situations. However, since we have a backwards compat layer we
  can just switch them to use bucket_ratio(%hash) instead of
  scalar(%hash) and keep the tests, just in case they are actually
  testing something not tested elsewhere.
* [perl #128086] Fix precedence in hv_ename_delete (Hugo van der Sanden, 2016-05-15; 1 file, -1/+2)

  A stash’s array of names may have null for the first entry, in which
  case it is not one of the effective names, and the name count will be
  negative. The ‘count > 0’ is meant to prevent hv_ename_delete from
  trying to read that entry, but a precedence problem introduced in
  4643eb699 stopped it from doing that.

  [This commit message was written by the committer.]
* [perl #123788] update isa magic stash records when *ISA is deleted (Tony Cook, 2016-01-11; 1 file, -1/+66)
* Improve pod for [ah]v_(clear|undef) (David Mitchell, 2015-10-20; 1 file, -6/+4)

  See [perl #117341].
* Add macro for converting Latin1 to UTF-8, and use it (Karl Williamson, 2015-09-04; 1 file, -2/+2)

  This adds a macro that converts a code point in the ASCII 128-255 range
  to UTF-8, and changes existing code to use it when the range is known
  to be restricted to this one, rather than the previous macro which
  accepted a wider range (any code point representable by 2 bytes), but
  had an extra test on EBCDIC platforms, hence was larger than necessary
  and slightly slower.
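  Assuming the macro pair in question is UTF8_EIGHT_BIT_HI/LO (an
  educated guess; this log does not name the macro), usage would look
  like:

      /* emit the two UTF-8 bytes for a code point in 0x80..0xFF */
      *d++ = UTF8_EIGHT_BIT_HI(c);
      *d++ = UTF8_EIGHT_BIT_LO(c);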
* perlapi: use 'UTF-8' instead of variants of that (Karl Williamson, 2015-09-03; 1 file, -1/+1)
* Various pods: Add C<> around many typed-as-is things (Karl Williamson, 2015-09-03; 1 file, -23/+24)

  Removes 'the' in front of parameter names in some instances.
* perlapi, perlintern: Add L<> links to pod (Karl Williamson, 2015-09-03; 1 file, -7/+8)
* perlapi: Use C<> instead of I<> for parameter names, etc (Karl Williamson, 2015-08-01; 1 file, -11/+11)

  The majority of perlapi uses C<> to specify these things, but a few
  things used I<> instead. Standardize to C<>.
* Impossible for entry to be NULL at this point. (Jarkko Hietaniemi, 2015-06-26; 1 file, -1/+1)

      740    if (return_svp) {

      notnull: At condition entry, the value of entry cannot be NULL.
      dead_error_condition: The condition entry must be true.
      CID 104777: Logically dead code (DEADCODE)
      dead_error_line: Execution cannot reach the expression NULL inside
      this statement: return entry ? (void *)&ent....

      741        return entry ? (void *) &HeVAL(entry) : NULL;
* mg_find can return NULL. (Jarkko Hietaniemi, 2015-06-26; 1 file, -1/+5)

      CID 104831: Dereference null return value (NULL_RETURNS)
      43. dereference: Dereferencing a pointer that might be null
      Perl_mg_find(sv, 112) when calling Perl_magic_existspack. (The
      dereference is assumed on the basis of the 'nonnull' parameter
      attribute.)

      499    magic_existspack(svret, mg_find(sv, PERL_MAGIC_tiedelem));
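  The natural fix, sketched (the real change may differ in detail):

      MAGIC * const mg = mg_find(sv, PERL_MAGIC_tiedelem);
      if (mg)
          magic_existspack(svret, mg);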
* Stop $^H |= 0x1c020000 from enabling all features (Father Chrysostomos, 2015-03-27; 1 file, -1/+2)

  That set of bits sets the feature bundle to ‘custom’, which means that
  the features are set by %^H, and also indicates that %^H has been
  diddled with, so it’s worth looking at.

  In the specific case where %^H is untouched and there is no
  corresponding cop hint hash behind the scenes, Perl_feature_is_enabled
  (in toke.c) ends up returning TRUE.

  Commit v5.15.6-55-g94250ae sped up feature checking by allowing
  refcounted_he_fetch to return a boolean when checking for existence,
  instead of converting the value to a scalar, whose contents we are not
  even going to use. This was when the bug started happening. I did not
  update the code path in refcounted_he_fetch that handles the absence of
  a hint hash. So it was returning &PL_sv_placeholder instead of NULL;
  TRUE instead of FALSE.

  This did not cause problems for most code, but with the introduction of
  the new bitwise ops in v5.21.8-150-g8823cb8, it started causing
  uni::perl to fail, because they were implicitly enabled, making ^ a
  numeric op, when it was being used as a string op.
* Replace common Emacs file-local variables with dir-locals (Dagfinn Ilmari Mannsåker, 2015-03-22; 1 file, -6/+0)

  An empty cpan/.dir-locals.el stops Emacs using the core defaults for
  code imported from CPAN.

  Committer's work:

  To keep t/porting/cmp_version.t and t/porting/utils.t happy, $VERSION
  needed to be incremented in many files, including throughout
  dist/PathTools.

  perldelta entry for module updates.

  Add two Emacs control files to MANIFEST; re-sort MANIFEST.

  For: RT #124119.
* [perl #123847] crash with *foo::=*bar::=*with_hash (Father Chrysostomos, 2015-03-11; 1 file, -2/+5)

  When a hash has no canonical name and one effective name, the array of
  names has a null pointer at the beginning. hv_ename_add was not taking
  that into account, and was trying to dereference the null pointer.
* don't test non-null args (David Mitchell, 2015-03-11; 1 file, -23/+0)

  For lots of core functions: if a function parameter has been declared
  NN in embed.fnc, don't test for nullness at the start of the function,
  i.e. eliminate code like

      if (!foo)
          ...

  On debugging builds the test is redundant, as the PERL_ARGS_ASSERT_FOO
  at the start of the function will already have croaked. On optimised
  builds, it will skip the check (and so be slightly faster), but if
  actually passed a null arg, will now crash with a null-deref SEGV
  rather than doing whatever the check used to do (e.g. croak, or
  silently return and let the caller's code logic go awry). But hopefully
  this should never happen, as such instances will already have been
  detected on debugging builds.

  It also has the advantage of shutting up recent clangs, which spew
  forth lots of stuff like:

      sv.c:6308:10: warning: nonnull parameter 'bigstr' will evaluate to
      'true' on first encounter [-Wpointer-bool-conversion]
          if (!bigstr)

  The only exception was in dump.c, where rather than skipping the null
  test, I instead changed the function def in embed.fnc to allow a null
  arg, on the basis that dump functions are often used for debugging
  (where pointers may unexpectedly become NULL) and it's better there to
  display that this item is null than to SEGV.

  See the p5p thread starting at 20150224112829.GG28599@iabyn.com.
* Consistently use NOT_REACHED; /* NOTREACHED */ (Jarkko Hietaniemi, 2015-03-04; 1 file, -1/+1)

  Both needed: the macro is for compilers, the comment for static
  checkers. (This doesn't address whether each spot is correct and
  necessary.)
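  The idiom in question, for reference:

      default:
          NOT_REACHED; /* NOTREACHED */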
* Corrections to spelling and grammatical errors. (Lajos Veres, 2015-01-28; 1 file, -1/+1)

  Extracted from patch submitted by Lajos Veres in RT #123693.
* Rework sv_get_backrefs() so it is simpler, and C++ compliant (Yves Orton, 2014-12-25; 1 file, -0/+1)

  We unroll hv_backreferences_p() in sv_get_backrefs() so the logic is
  simpler (we don't need an SV ** for this function), and (hopefully)
  make it C++ compliant at the same time.
* Restructure hv_backreferences_p() so assert makes sense (Yves Orton, 2014-12-25; 1 file, -4/+4)

  Prior to this patch the assert was meaningless as we would use the
  argument before we asserted things about it. This patch restructures
  the logic so we do the asserts first and *then* use the argument.