path: root/hv.c

* S_clear_placeholders() should call HvHASKFLAGS_off() if no keys remain. (Nicholas Clark, 2021-07-26; 1 file, -9/+6)

    This isn't essential - HvHASKFLAGS() set when there are no keys with
    flags merely disables some potential optimisations. (The other way
    round - not being set when keys have flags - would be a bug.)

    This is a regression I introduced in Feb 2004 with commit
    d36773897a6f30fc:

        hv_clear_placeholders now manipulates the linked lists directly,
        rather than using the iterator interface and calling hv_delete.
        This will allow hv_delete to be simplified to remove most of the
        special casing related to placeholders.

    However, several people have looked at the code since then and no-one
    has realised that with the logic as-was, this call had to be
    unreachable.

    Also avoid calling HvPLACEHOLDERS_get() twice - each caller has
    already done this, so pass the value in.

* Correctly call delete magic on all hash magic (Leon Timmermans, 2021-06-02; 1 file, -1/+1)

    Previously it would only call it correctly if the hash magic was
    RMAGICAL, which is only set if a magic either has clear magic or has
    neither get nor set magic. This means any magic with get or set would
    break.

* hv.c: add a guard clause to prevent the number of buckets in a hash from getting too large (Yves Orton, 2021-02-12; 1 file, -1/+9)

    This caps it at 1<<28 buckets, i.e. ~268M. In theory, without a guard
    clause like this we could grow to the point of possibly wrapping
    around in terms of size, not to mention being ridiculously wasteful of
    memory at larger sizes. Even this cap is probably too high. It should
    probably be something like 1<<24.
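
    A minimal standalone sketch of the shape of such a guard clause; the
    cap name and function are invented, not the actual hv.c code:

        #include <stddef.h>

        #define MAX_BUCKETS ((size_t)1 << 28)   /* the commit's cap */

        static size_t grow_buckets(size_t nbuckets)
        {
            if (nbuckets >= MAX_BUCKETS)
                return nbuckets;   /* guard: stop growing, accept collisions */
            return nbuckets * 2;   /* normal doubling on a hash split */
        }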

* style: Detabify indentation of the C code maintained by the core. (Michael G. Schwern, 2021-01-17; 1 file, -1301/+1301)

    This just detabifies to get rid of the mixed tab/space indentation.
    Applying consistent indentation and dealing with other tabs are
    another issue. Done with `expand -i`.

    * vutil.* left alone, it's part of version.
    * Left regen managed files alone for now.

* Remove empty "#ifdef"s (Tom Hukins, 2020-12-08; 1 file, -4/+0)

* Fix documentation grammar (Tom Hukins, 2020-11-20; 1 file, -1/+1)

    Replace "Frees the all the" with "Frees all the". The original wording
    was introduced in c2217cd33590ef654 and a4395ebabc8655115.

* autodoc.pl: Enhance apidoc_section feature (Karl Williamson, 2020-11-06; 1 file, -2/+2)

    This feature allows documentation destined for perlapi or perlintern
    to be split into sections of related functions, no matter where the
    documentation source is. Prior to this commit the line had to contain
    the exact text of the title of the section. Now it can be a $variable
    name that autodoc.pl expands to the title. It still has to be an exact
    match for the variable in autodoc, but now the expanded text can be
    changed in autodoc alone, without other files needing to be updated at
    the same time.

* Reorganize perlapi (Karl Williamson, 2020-09-04; 1 file, -1/+3)

    This uses a new organization of sections that I came up with. I asked
    for comments on p5p, but there were none.

* Change some link pod for better rendering (Karl Williamson, 2020-08-31; 1 file, -3/+3)

    C<L</foo>> renders better in places than L</C<foo>>.
* Revert "there is no obvious reason not to set flags"Karl Williamson2020-07-301-2/+3
| | | | | This reverts commit 0ddecb91901742e7df780394170d4bf818ee1da8 as part of https://github.com/Perl/perl5/issues/15855

* Remove use of dVAR in core (Dagfinn Ilmari Mannsåker, 2020-07-20; 1 file, -16/+0)

    It only does anything under PERL_GLOBAL_STRUCT, which is gone. Keep
    the dNOOP definition for CPAN back-compat.

* hv.c: Remove obsolete/confusing constant (Eric Herman, 2020-07-10; 1 file, -1/+0)

    The HV_FILL_THRESHOLD is no longer used and is misleading. See also:

        commit 8bf4c4010cc474d4000c2a8c78f6890fa5f1e577
        Date:   Mon Jun 20 22:51:38 2016 +0200

            Change scalar(%hash) to be the same as 0+keys(%hash)

* Note that certain flags are documented (Karl Williamson, 2019-12-17; 1 file, -0/+2)

    This is useful in Devel::PPPort for generating its api-info data. That
    useful feature of D:P allows someone to find out which release of Perl
    first had a function, macro, or flag, and whether using ppport.h
    backports it further.

    I went through apidoc.pod and looked for flags that were documented
    but that D:P didn't know about. This commit adds entries for each so
    that D:P can find them.

* Fix: local variable hiding parameter of same name (James E Keenan, 2019-11-12; 1 file, -5/+5)

    LGTM provides static code analysis and recommendations for code
    quality improvements. Their recent run over the Perl 5 core
    distribution identified 12 instances where a local variable hid a
    parameter of the same name in an outer scope. The LGTM rule governing
    this situation can be found at https://lgtm.com/rules/2156240606/

    This patch renames local variables in approximately 8 of those
    instances to comply with the LGTM recommendation. Suggestions for
    renamed variables were made by Tony Cook.

    For: https://github.com/Perl/perl5/pull/17281
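
    A minimal illustration of the shadowing pattern (invented names, not
    code from hv.c); the fix is simply to rename the inner variable so the
    parameter stays visible:

        static int sum_entries(const int *entries, int n)
        {
            int total = 0;
            for (int i = 0; i < n; i++) {
                int n = entries[i];   /* BAD: hides the parameter 'n' */
                total += n;
            }
            return total;
        }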

* Refer to CopLABEL_len[_flags] in pod for cop_fetch_label (Karl Williamson, 2019-09-02; 1 file, -2/+8)

* perlapi: Clarify pod for cop_store_label (Karl Williamson, 2019-09-02; 1 file, -1/+1)

* Remove redundant info on =for apidoc lines (Karl Williamson, 2019-05-30; 1 file, -10/+10)

    This information is already in embed.fnc, and we know it compiles.
    Some of this information is now out-of-date. Get rid of it.

    There was one bit of information that was (apparently) wrong in
    embed.fnc. The apidoc line asked that there be no usage example
    generated for newXS. I added that flag to the embed.fnc entry.

* perlapi: Clarify entry for hv_store() (Karl Williamson, 2019-03-12; 1 file, -1/+3)

* S_hv_delete_common(): avoid undefined behaviour (David Mitchell, 2018-11-21; 1 file, -1/+1)

    ASAN -fsanitize=undefined was tripping on the second of these two
    lines:

        svp = AvARRAY(isa);
        end = svp + AvFILLp(isa) + 1;

    In the case where svp is NULL and AvFILLp(isa) is -1, the first
    addition is undefined behaviour. Add the 1 first, so that it becomes
    svp + (-1+1), which is safe.
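
    A self-contained illustration of the same trap, with invented names
    mirroring AvARRAY()/AvFILLp() on an empty array:

        #include <stddef.h>

        static int *span_end(int *arr, ptrdiff_t fill)
        {
            /* return arr + fill + 1;   evaluates NULL + (-1) first: UB */
            return arr + (fill + 1);    /* NULL + 0: accepted by the sanitizer */
        }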

* Use memEQs, memNEs in core files (Karl Williamson, 2017-11-06; 1 file, -1/+1)

    Where the length is known, we can use these functions, which relieve
    the programmer and the program reader from having to count characters.
    The memFOO functions should also be slightly faster than the strFOO
    equivalents.

    In some instances in this commit, hard-coded numbers are used. These
    come from the 'case' statement values that apply to them.
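
    For example (a sketch; 'key' and 'klen' are assumed variables): the
    mem form takes an explicit length plus a string literal whose length
    the macro computes itself, so the buffer need not be NUL-terminated:

        if (memEQs(key, klen, "ISA"))
            handle_isa();    /* hypothetical handler */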

* Rename strEQs to strBEGINs; remove strNEs (Karl Williamson, 2017-11-06; 1 file, -1/+1)

    The original names are confusing. See the thread beginning with
    http://nntp.perl.org/group/perl.perl5.porters/244335

    The two macros are mapped into just that one, complementing the result
    for the few cases where strNEs was used.
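
    A sketch of the renamed macro's intent ('name' is an assumed
    variable): strBEGINs() tests for a literal prefix, which the old name
    strEQs wrongly suggested was a full equality test:

        if (strBEGINs(name, "utf8::"))
            handle_utf8_namespace();    /* hypothetical */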

* Consider magic %ENV as tied in hv_pushkv. (Craig A. Berry, 2017-08-05; 1 file, -1/+5)

    For the DYNAMIC_ENV_FETCH case, we don't know the number of keys until
    the first iteration triggers a call to prime_env_iter(), so piggyback
    on the tied magic case, which already handles extending the stack for
    each iteration rather than all at once beforehand.

* hv_pushkv(): handle keys() and values() too (David Mitchell, 2017-07-27; 1 file, -16/+35)

    The newish function hv_pushkv() currently just pushes all key/value
    pairs on the stack, i.e. it does the equivalent of the perl code
    '() = %h'. Extend it so that it can handle 'keys %h' and 'values %h'
    too.

    This is basically moving the remaining list-context functionality out
    of do_kv() and into hv_pushkv(). The rationale for this is that
    hv_pushkv() is a pure HV-related function, while do_kv() is a pp
    function for several ops including OP_KEYS/VALUES, and expects
    PL_op->op_flags/op_private to be valid.

* Perl_hv_pushkv(): unroll hv_iterkeysv() (David Mitchell, 2017-07-27; 1 file, -6/+12)

    Do our own mortal stack extending and handling.

* create Perl_hv_pushkv() function (David Mitchell, 2017-07-27; 1 file, -0/+44)

    ...and make pp_padhv(), pp_rv2hv() use it rather than using
    Perl_do_kv().

    Both pp_padhv() and pp_rv2hv() (via S_padhv_rv2hv_common()) outsource
    the list-context pushing/flattening of a hash onto the stack to
    Perl_do_kv(). Perl_do_kv() is a big function that handles all the
    actions of keys, values etc. Instead, create a new function which does
    just the pushing of a hash onto the stack.

    At the same time, split it out into two loops, one for tied, one for
    normal: the untied one can skip extending the stack on each iteration,
    and use a cheaper HeVAL() instead of calling hv_iterval().
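
    A sketch of that two-loop shape using the public iterator API; it
    assumes a pp/XS context where SP is set up, and the real function
    differs in detail (e.g. in how tiedness is detected):

        HE *entry;
        (void)hv_iterinit(hv);
        if (SvMAGICAL((SV *)hv)) {
            /* tied: FETCH may move the stack, so extend every iteration */
            while ((entry = hv_iternext(hv))) {
                EXTEND(SP, 2);
                PUSHs(hv_iterkeysv(entry));
                PUSHs(hv_iterval(hv, entry));
            }
        }
        else {
            /* plain: extend once up front; HeVAL() is a direct access */
            EXTEND(SP, (SSize_t)HvUSEDKEYS(hv) * 2);
            while ((entry = hv_iternext(hv))) {
                PUSHs(hv_iterkeysv(entry));
                PUSHs(HeVAL(entry));
            }
        }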

* make callers of SvTRUE() more efficient (David Mitchell, 2017-07-27; 1 file, -1/+1)

    Where it's obvious that the args can't be null, use SvTRUE_NN()
    instead. Avoid possible multiple evaluations of the arg by assigning
    to a local var first if necessary.
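
    For instance (a sketch; 'svp' is assumed, and sv is known non-NULL at
    this point):

        SV *const sv = *svp;    /* evaluate the arg exactly once */
        if (SvTRUE_NN(sv))      /* skips SvTRUE()'s NULL check */
            do_true_branch();   /* hypothetical */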

* use the new PL_sv_zero in obvious places (David Mitchell, 2017-07-27; 1 file, -3/+4)

    In places that do things like mPUSHi(0) or newSViv(0), replace them
    with PUSHs(&PL_sv_zero) and &PL_sv_zero, etc. This avoids the cost of
    creating and/or mortalising an SV, and/or setting its value to 0.

    This commit causes a subtle change to tainting in various places as a
    side-effect. For example, grep in scalar context returns 0 if it has
    no args. Formerly the zero value could in theory get tainted:

        @a = ();
        $x = ( ($^X . ""), grep { 1 } @a);

    It used to be the case that $x would be tainted; now it's not. In
    practice this doesn't matter - the zero value was only getting tainted
    as a side-effect of tainting's "if anything in the statement uses a
    tainted value, taint everything" mechanism, which gives (documented)
    false positives. This commit merely removes some such false positives,
    and makes the behaviour similar to functions which return
    &PL_sv_undef/no/yes, which are also immune to side-effect tainting.
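
    The before/after shape in C (two alternatives, not sequential code):

        /* before: allocate and mortalise a fresh IV just to push a zero */
        mPUSHi(0);

        /* after: push the shared read-only zero SV - no allocation */
        PUSHs(&PL_sv_zero);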

* hv.c: fixup args assert for HV_FREE_ENTRIES (Yves Orton, 2017-07-01; 1 file, -1/+1)

* hv.c: rename static function S_hfreeentries() to S_hv_free_entries() (Yves Orton, 2017-07-01; 1 file, -6/+6)

    hfreeentries() reads very poorly - hv_free_entries() makes more sense
    too.

* fixup typo in comment (Yves Orton, 2017-07-01; 1 file, -1/+1)

* hv.c: silence compiler warning (Yves Orton, 2017-06-01; 1 file, -1/+1)

        hv.c: In function ‘Perl_hv_undef_flags’:
        hv.c:2053:35: warning: ‘orig_ix’ may be used uninitialized in
        this function [-Wmaybe-uninitialized]
           PL_tmps_stack[orig_ix] = &PL_sv_undef;

    The warning is bogus, as we only use orig_ix if "save" is true, and if
    "save" is true we will have initialized orig_ix. However, initializing
    it in the first place avoids any issue.

* RT #127742: Hash keys are limited to 2 GB - throw an exception if hash keys are too long (Aaron Crane, 2017-06-01; 1 file, -3/+7)

    We currently require hash keys to be less than 2**31 bytes long. But
    (a) nothing actually tries to enforce that, and (b) if a Perl program
    tries to create a hash with such a key (using a 64-bit system), we
    miscalculate the size of a memory block, yielding a panic:

        $ ./perl -e '+{ "x" x 2**31, undef }'
        panic: malloc, size=18446744071562068026 at -e line 1.

    Instead, check for this situation, and croak with an appropriate (new)
    diagnostic in the unlikely event that it occurs. This also involves
    changing the type of an argument to a public API function:
    Perl_share_hek() previously took the key's length as an I32, but that
    makes it impossible to detect over-long keys, so it must be SSize_t
    instead.

    From Yves: We also inject the length test into the PERL_HASH() macro,
    so that where the macro is used *before* calling into any of the hv
    functions, we can avoid hashing a very long string only to throw an
    exception that it is too long. Might as well fail fast.
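
    The shape of such a fail-fast guard (a sketch; 'keylen' and the exact
    diagnostic wording are assumptions, not the literal hv.c code):

        if (keylen > I32_MAX)
            Perl_croak(aTHX_ "Sorry, hash keys must be smaller"
                             " than 2**31 bytes");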
* Restore "Tweak our hash bucket splitting rules"Yves Orton2017-06-011-12/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit e4343ef32499562ce956ba3cb9cf4454d5d2ff7f, which was a revert of 05f97de032fe95cabe8c9f6d6c0a5897b1616194. Prior to this patch we resized hashes when after inserting a key the load factor of the hash reached 1 (load factor= keys / buckets). This patch makes two subtle changes to this logic: 1. We split only after inserting a key into an utilized bucket, 2. and the maximum load factor exceeds 0.667 The intent and effect of this change is to increase our hash tables efficiency. Reducing the maximum load factor 0.667 means that we should have much less keys in collision overall, at the cost of some unutilized space (2/3rds was chosen as it is easier to calculate than 0.7). On the other hand, only splitting after a collision means in theory that we execute the "final split" less often. Additionally, insertin a key into an unused bucket increases the efficiency of the hash, without changing the worst case.[1] In other words without increasing collisions we use the space in our hashes more efficiently. A side effect of this hash is that the size of a hash is more sensitive to key insert order. A set of keys with some collisions might be one size if those collisions were encountered early, or another if they were encountered later. Assuming random distribution of hash values about 50% of hashes should be smaller than they would be without this rule. The two changes complement each other, as changing the maximum load factor decreases the chance of a collision, but changing to only split after a collision means that we won't waste as much of that space we might. [1] Since I personally didnt find this obvious at first here is my explanation: The old behavior was that we doubled the number of buckets when the number of keys in the hash matched that of buckets. So on inserting the Kth key into a K bucket hash, we would double the number of buckets. Thus the worse case prior to this patch was a hash containing K-1 keys which all hash into a single bucket, and the post split worst case behavior would be having K items in a single bucket of a hash with 2*K buckets total. The new behavior says that we double the size of the hash once inserting an item into an occupied bucket and after doing so we exceeed the maximum load factor (leave aside the change in maximum load factor in this patch). If we insert into an occupied bucket (including the worse case bucket) then we trigger a key split, and we have exactly the same cases as before. If we insert into an empty bucket then we now have a worst case of K-1 items in one bucket, and 1 item in another, in a hash with K buckets, thus the worst case has not changed.
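
    An illustrative restatement of the new split rule in standalone C
    (invented names, not the actual hv.c macro):

        #include <stddef.h>

        static int should_split(size_t keys, size_t buckets, int collided)
        {
            /* split only if this insert hit an occupied bucket AND the
             * load factor keys/buckets now exceeds 2/3 */
            return collided && keys * 3 > buckets * 2;
        }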
* Revert "Tweak our hash bucket splitting rules"Yves Orton2017-04-231-31/+12
| | | | | | This reverts commit 05f97de032fe95cabe8c9f6d6c0a5897b1616194. Accidentally pushed while waiting for blead-unfreeze.

* Tweak our hash bucket splitting rules (Yves Orton, 2017-04-23; 1 file, -12/+31)

    (The message is identical, verbatim, to that of the "Restore" commit
    above.)

* Correct hv_iterinit's return value documentation (Matthew Horsfall, 2017-02-28; 1 file, -2/+2)

* HvTOTALKEYS() takes an HV* as argument (Steffen Mueller, 2017-02-03; 1 file, -1/+1)

    Incidentally, it currently works on SV *'s as well because there's an
    explicit cast after an SvANY. Let's not rely on that. This commit also
    removes a pointless const in a cast. Again: it takes an HV * as
    argument. Let's only change that if we have a strong reason to.
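
    A one-line sketch ('hv' is assumed to be a valid HV *):

        const STRLEN nkeys = HvTOTALKEYS(hv);   /* pass an HV *, not an SV * */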

* Use cBOOL() instead of ? TRUE : FALSE (Dagfinn Ilmari Mannsåker, 2017-01-25; 1 file, -2/+2)

    Except under cpan/ and dist/.
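
    For example (the particular flag test is illustrative):

        bool is_utf8 = cBOOL(flags & HVhek_UTF8);
        /* was: bool is_utf8 = (flags & HVhek_UTF8) ? TRUE : FALSE; */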

* Clean up warnings uncovered by 'clang -Weverything'. (Andy Lester, 2016-12-05; 1 file, -0/+1)

    For: RT #130195

* Change white space to avoid C++ deprecation warning (Karl Williamson, 2016-11-18; 1 file, -15/+15)

    C++11 requires space between the end of a string literal and a macro,
    so that a feature can unambiguously be added to the language. Starting
    in g++ 6.2, the compiler emits a warning when there isn't a space
    (presumably so that future versions can support C++11). Unfortunately
    there are many such instances in the perl core. This commit fixes
    those, including those in ext/, but individual commits will be used
    for the other modules, those in dist/ and cpan/.

    This commit also inserts space at the end of a macro before a string
    literal, even though that is not deprecated, and removes useless ""
    literals following a macro (instead of inserting a blank). The result
    is easier to read, making the macro stand out, and be clearer as to
    the intention.

    Code and modules included with the Perl core need to be compilable
    using C++. This is so that perl can be embedded in C++ programs.
    (Actually, only the hdr files need to be so compilable, but it would
    be hard to test that just the hdrs are compilable.) So we need to
    accommodate changes to the C++ language.
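
    A self-contained example of the pattern (macro and strings invented):

        #define SUFFIX "world"

        const char *ok = "hello " SUFFIX;    /* space: fine in C and C++ */
        /* const char *bad = "hello "SUFFIX;    C++11 parses this as a
         * user-defined-literal suffix; g++ >= 6.2 warns about it */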
* Revert "hv.h: rework HEK_FLAGS to a proper member in struct hek"Tony Cook2016-11-031-1/+2
| | | | | | | | This reverts commit d3148f758506efd28325dfd8e1b698385133f0cd. SV keys are stored as pointers in the key_key, on platforms with alignment requirements (such as PA-RISC) this resulted in bus errors early in the build.

* speed up AV and HV clearing/undeffing (David Mitchell, 2016-10-26; 1 file, -7/+27)

    av_clear(), av_undef(), hv_clear(), hv_undef() and av_make() all have
    similar guards along the lines of:

        ENTER;
        SAVEFREESV(SvREFCNT_inc_simple_NN(av));
        ... do stuff ...;
        LEAVE;

    to stop the AV or HV leaking or being prematurely freed while
    processing its elements (e.g. FETCH() or DESTROY() might do something
    to it). Introducing an extra scope and calling leave_scope() is
    expensive. Instead, use a trick I introduced in my recent pp_assign()
    recoding: add the AV/HV to the temps stack, then at the end of the
    function, just PL_tmps_ix-- if nothing else has been pushed on the
    tmps stack in the meantime, or replace the tmps stack slot with
    &PL_sv_undef otherwise (which doesn't care how many times its ref
    count gets decremented). This is efficient, and doesn't artificially
    extend the life of the SV like sv_2mortal() would.

    This commit makes this code around 5% faster:

        my @a;
        for my $i (1..3_000_000) { @a = (1,2,3); @a = (); }

    and this code around 3% faster:

        my %h;
        for my $i (1..3_000_000) { %h = qw(a 1 b 2); %h = (); }
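
    A sketch of the trick's shape (illustrative only; the real hv.c code
    differs in detail and must also handle immortal SVs):

        SSize_t orig_ix;

        EXTEND_MORTAL(1);                 /* reserve a temps-stack slot */
        PL_tmps_stack[++PL_tmps_ix] = SvREFCNT_inc_simple_NN((SV *)hv);
        orig_ix = PL_tmps_ix;

        /* ... free elements; destructors may push further temps ... */

        if (PL_tmps_ix == orig_ix)
            PL_tmps_ix--;                 /* nothing else pushed: just pop */
        else
            PL_tmps_stack[orig_ix] = &PL_sv_undef;  /* undef absorbs the dec */
        SvREFCNT_dec_NN((SV *)hv);        /* balance the earlier increment */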

* hv.h: rework HEK_FLAGS to a proper member in struct hek (Todd Rinaldo, 2016-10-24; 1 file, -2/+1)

    Move the store of HEK_FLAGS off the end of the allocated hek_key into
    the hek struct, simplifying access and providing clarity to the code.

    What is not clear is why Nicholas or perhaps Jarkko did not do this
    themselves. We use similar tricks elsewhere, so perhaps it was just
    continuing a tradition...

    One thought is that we often do strcmp/memeq on these strings, and
    having their start be aligned might improve performance, whereas this
    patch changes them to be unaligned. If so, perhaps we should just make
    flags a U32 and let the HEKs be larger. They are shared in PL_strtab,
    and are probably often sitting in malloc blocks that are sufficiently
    large that making them bigger would make no practical difference. (All
    of this is worth checking.)

    [with edits by Yves Orton]

* hv.c: use new SvPVCLEAR and constant string friendly macros (Yves Orton, 2016-10-19; 1 file, -1/+1)

* perlapi: Add entry for hv_bucket_ratio (Karl Williamson, 2016-06-30; 1 file, -1/+1)

    autodoc doesn't find things like Perl_hv_bucket_ratio().

* Change scalar(%hash) to be the same as 0+keys(%hash) (Yves Orton, 2016-06-22; 1 file, -54/+57)

    This subject has a long history; see [perl #114576] for more
    discussion: https://rt.perl.org/Public/Bug/Display.html?id=114576

    There are a variety of reasons we want to change the return signature
    of scalar(%hash). One is that it leaks implementation details about
    our associative array structure. Another is that it requires us to
    keep track of the used buckets in the hash, which we use for no other
    purpose but scalar(%hash). Another is that it is just odd. Almost
    nothing needs to know these values. Perhaps debugging, but we have
    several much better functions for introspecting the internals of a
    hash.

    By changing the return signature we can remove all the logic related
    to maintaining and updating xhv_fill_lazy. This should make hot code
    paths a little faster, and maybe save some memory for traversed
    hashes.

    In order to provide some form of backwards compatibility we add three
    new functions to the Hash::Util namespace: bucket_ratio(),
    num_buckets() and used_buckets(). These functions are actually
    implemented in universal.c, and thus always available even if
    Hash::Util is not loaded. This simplifies testing. At the same time
    Hash::Util contains backwards-compatible code so that the new
    functions are available from it should they be needed in older perls.

    There are many tests in t/op/hash.t that are more or less obsolete
    after this patch, as they test that xhv_fill_lazy is correctly set in
    various situations. However, since we have a backwards-compat layer we
    can just switch them to use bucket_ratio(%hash) instead of
    scalar(%hash) and keep the tests, just in case they are actually
    testing something not tested elsewhere.

* [perl #128086] Fix precedence in hv_ename_delete (Hugo van der Sanden, 2016-05-15; 1 file, -1/+2)

    A stash’s array of names may have null for the first entry, in which
    case it is not one of the effective names, and the name count will be
    negative. The ‘count > 0’ check is meant to prevent hv_ename_delete
    from trying to read that entry, but a precedence problem introduced in
    4643eb699 stopped it from doing that.

    [This commit message was written by the committer.]

* [perl #123788] update isa magic stash records when *ISA is deleted (Tony Cook, 2016-01-11; 1 file, -1/+66)

* Improve pod for [ah]v_(clear|undef) (David Mitchell, 2015-10-20; 1 file, -6/+4)

    See [perl #117341].

* Add macro for converting Latin1 to UTF-8, and use it (Karl Williamson, 2015-09-04; 1 file, -2/+2)

    This adds a macro that converts a code point in the 128-255 range to
    UTF-8, and changes existing code to use it when the range is known to
    be restricted to this one, rather than the previous macro, which
    accepted a wider range (any code point representable by 2 bytes) but
    had an extra test on EBCDIC platforms, hence was larger than necessary
    and slightly slower.
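
    Whatever the macro's name (not assumed here), the ASCII-platform
    arithmetic it wraps is fixed: each code point in 0x80-0xFF becomes
    exactly two UTF-8 bytes:

        static void cp_to_utf8(unsigned char c, unsigned char out[2])
        {
            out[0] = 0xC0 | (c >> 6);     /* leading byte: 0xC2 or 0xC3 */
            out[1] = 0x80 | (c & 0x3F);   /* continuation byte */
        }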