path: root/ext
Commit message (Author, Age, Files changed, Lines -removed/+added)
* Tweak our hash bucket splitting rules (Yves Orton, 2017-04-23, 2 files, -6/+8)
  Prior to this patch we resized a hash when, after inserting a key, its load factor reached 1 (load factor = keys / buckets). This patch makes two subtle changes to this logic:

    1. We split only after inserting a key into an already-occupied bucket,
    2. and only if the load factor then exceeds the new maximum of 0.667.

  The intent and effect of this change is to increase the efficiency of our hash tables. Reducing the maximum load factor to 0.667 means that we should have far fewer keys in collision overall, at the cost of some unused space (2/3 was chosen because it is easier to calculate than 0.7). On the other hand, only splitting after a collision means that, in theory, we execute the "final split" less often. Additionally, inserting a key into an unused bucket increases the efficiency of the hash without changing the worst case.[1] In other words, without increasing collisions we use the space in our hashes more efficiently.

  A side effect of this patch is that the size of a hash is more sensitive to key insertion order. A set of keys with some collisions might end up one size if those collisions were encountered early, or another if they were encountered later. Assuming a random distribution of hash values, about 50% of hashes should be smaller than they would be without this rule.

  The two changes complement each other: lowering the maximum load factor decreases the chance of a collision, while splitting only after a collision means that we won't waste as much of that space as we otherwise might.

  [1] Since I personally didn't find this obvious at first, here is my explanation:

  The old behavior was that we doubled the number of buckets when the number of keys in the hash matched the number of buckets. So on inserting the Kth key into a K-bucket hash, we would double the number of buckets. Thus the worst case prior to this patch was a hash containing K-1 keys which all hash into a single bucket, and the post-split worst case would be K items in a single bucket of a hash with 2*K buckets total.

  The new behavior is that we double the size of the hash when we insert an item into an occupied bucket and, having done so, exceed the maximum load factor (leave aside the change in maximum load factor in this patch). If we insert into an occupied bucket (including the worst-case bucket), we trigger a bucket split, and we have exactly the same cases as before. If we insert into an empty bucket, we now have a worst case of K-1 items in one bucket and 1 item in another, in a hash with K buckets, so the worst case has not changed.
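  The two split rules can be compared side by side in a small C sketch. This is not perl's hv.c code: the function names are hypothetical, and the 2/3 threshold is written as the integer test 3*keys > 2*buckets so that no floating point is needed. The flag bucket_was_occupied stands for "the key just inserted landed in a bucket that already held at least one key".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Old rule (sketch): split as soon as keys/buckets reaches 1. */
static bool old_should_split(size_t keys, size_t buckets)
{
    return keys >= buckets;
}

/* New rule (sketch): split only if the key just inserted collided with an
 * existing key AND keys/buckets now exceeds 2/3.
 * 3 * keys > 2 * buckets is the integer form of keys / buckets > 2/3. */
static bool new_should_split(size_t keys, size_t buckets,
                             bool bucket_was_occupied)
{
    return bucket_was_occupied && 3 * keys > 2 * buckets;
}
```

  With 8 buckets, the old rule splits on the 8th key; the new rule can split as early as the 6th key (18 > 16), but only if that key collided. A key landing in an empty bucket never triggers a split under the new rule.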
* fix and test execution of non-empty .bs files (David Mitchell, 2017-04-07, 3 files, -2/+26)
  During the build of XS modules, an empty Foo.bs file is normally created for each Foo.so file. If a Foo_BS file is present, this instead triggers the auto-generation of a .bs file, which may have executable Perl content. However, nothing in core currently generates a non-empty .bs file.

  So add a test that this mechanism works, and fix up the three dynamic lib loaders which implement the 'do $bs if -s $bs' mechanism so that they don't rely on the process having '.' present in @INC. As it happens, this already works, because the name of the .bs file to load will usually be something like ../../lib/auto/Foo/Foo.bs, and the leading '..' causes 'do' to load the file directly rather than via @INC. But locally fix up @INC anyway, in case the path doesn't always start with '../'.
* fix ext/Pod-Html/t/*.t that assumed '.' in @INC (David Mitchell, 2017-04-07, 15 files, -15/+15)
* fix ext/arybase/t/*.t that assumed '.' in @INC (David Mitchell, 2017-04-07, 1 file, -0/+1)
* fix ext/XS-APItest/t/*.t that assumed '.' in @INC (David Mitchell, 2017-04-07, 2 files, -2/+5)
* POSIX.pod: Remove obsolete text (Karl Williamson, 2017-03-08, 1 file, -1/+1)
* Revert "ext/VMS-Stdio: switch to using macros designed for string constant args" (Craig A. Berry, 2017-02-21, 2 files, -4/+4)
  This reverts commit c0dea56fe487504493d97df5a7a6be57a2d2834d.

  The new macros introduced there have now been rendered invisible by 8f71649941d02d5bdfe4f. Using macros that we can't see breaks the build, so revert this for now. It can be reintroduced when the macro names are settled and no longer hidden.
* Split XS-APItest/t/utf8.t (Karl Williamson, 2017-02-20, 14 files, -1539/+1684)
  This test file is one of the longest-running ones. It has three main semi-independent parts. Two of them are split off into two files, which require a common file. The remaining part is still long-running, so it is split so that a common file is used to run the tests, but that file is called with a chunk number and executes only the tests belonging to that chunk.

  The number of chunks is based on the environment variable TEST_JOBS, up to 10. Each chunk executes 1/TEST_JOBS of the total test. If TEST_JOBS is not set, it reverts to 1 chunk. The alternative would be to revert to 10, but since there is overhead associated with each new chunk, I chose, for now, 1. There may be a better solution later on, but I think this is good enough for now.
* Split APItest/t/handy.t (Karl Williamson, 2017-02-19, 11 files, -1/+86)
  This is a very long-running test. This commit splits it into smaller chunks, based on the environment variable TEST_JOBS, up to 10. Each chunk executes 1/TEST_JOBS of the total test. If TEST_JOBS is not set, it reverts to 1 chunk. The alternative would be to revert to 10, but since there is overhead associated with each new chunk, I chose, for now, 1. There may be a better solution later on, but I think this is good enough for now.
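  The chunking rule can be sketched in C under a few assumptions: the helper names are hypothetical, a non-numeric or non-positive TEST_JOBS is treated like an unset one, and tests are dealt out to chunks round-robin. The real .t files are written in Perl and may partition the work differently; this only illustrates the "up to 10, default 1" arithmetic.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_CHUNKS 10   /* "up to 10", per the commit message */

/* Number of chunks from the value of TEST_JOBS (NULL if unset),
 * clamped to the range [1, MAX_CHUNKS]. */
static int chunk_count(const char *test_jobs)
{
    if (test_jobs == NULL)
        return 1;                     /* unset: a single chunk */
    long n = strtol(test_jobs, NULL, 10);
    if (n < 1)
        return 1;                     /* garbage or non-positive: one chunk */
    if (n > MAX_CHUNKS)
        return MAX_CHUNKS;
    return (int)n;
}

/* Round-robin assignment: test number i runs in chunk c (0-based)
 * out of n chunks, so each chunk runs roughly 1/n of the tests. */
static int in_chunk(int i, int c, int n)
{
    return i % n == c;
}
```

  A caller would typically use chunk_count(getenv("TEST_JOBS")). Round-robin is chosen here only because it keeps every chunk's share within one test of 1/n without knowing the total count in advance.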
* Use cBOOL() instead of ? TRUE : FALSE (Dagfinn Ilmari Mannsåker, 2017-01-25, 2 files, -2/+2)
  Except under cpan/ and dist/.
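  A minimal C illustration of the substitution. cBOOL() is redefined below so the example compiles on its own; it mirrors the shape of perl's handy.h macro but should be read as a stand-in, and FLAG_HIGH and the two functions are hypothetical. true/false from stdbool.h stand in for perl's TRUE/FALSE.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for perl's cBOOL(): collapses any scalar to (bool)0 or (bool)1. */
#define cBOOL(cbool) ((cbool) ? (bool)1 : (bool)0)

#define FLAG_HIGH 0x100u   /* hypothetical flag bit above the low 8 bits */

/* Before: the hand-written conditional this commit replaces. */
static bool has_high_flag_old(unsigned flags)
{
    return (flags & FLAG_HIGH) ? true : false;
}

/* After: the same result, spelled with cBOOL(). */
static bool has_high_flag_new(unsigned flags)
{
    return cBOOL(flags & FLAG_HIGH);
}
```

  Both spellings give the same answer; the macro just says "normalize to a boolean" in one word. The normalization matters where bool is a char-sized typedef rather than C99 _Bool, because a plain cast like (bool)(flags & 0x100) could then truncate the set bit to 0.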
* dump.c: handle GV being really a ref to a CV (David Mitchell, 2017-01-23, 1 file, -2/+2)
  RT #129285

  These days a 'GV' can actually just be a ref to a CV when the only thing that would be stored in the glob is a CV. Update S_do_op_dump_bar() to handle this. Formerly it would trigger an assert on a non-threaded build.

  In fact, incorporate the fixed logic into a static function, S_gv_display(), that is shared by both S_do_op_dump_bar() and Perl_debop(), so both perl -Dx and perl -Dt get the benefit.

  Also, for the -Dx case, make it display the raw address of the GV too.
* Fix memory leak in B::RHE->HASH method. (Sergey Aleynikov, 2017-01-23, 2 files, -2/+2)
* Move I8 test helpers to common file (Karl Williamson, 2017-01-22, 1 file, -35/+8)
  This moves the code that helps in testing I8 (which is the same as UTF-8 on non-EBCDIC platforms) to t/charset_tools.pl, away from the .t where it previously was. This means these helpers can now be used in other .t's.
* revamp the op_dump() output format (David Mitchell, 2017-01-21, 1 file, -44/+36)
  This is mainly used for low-level debugging these days (higher-level tools like Concise having since been created), e.g. calling op_dump() from within a debugger or running with -Dx. Make it display more info, and use an ASCII-art tree to show the structure.

  The main changes are:

  * added 'ASCII-art' tree structure;
  * it now displays each op's class and address;
  * for op_next etc. links, it now displays the type and address of the linked-to op in addition to its sequence number;
  * the following ops now have their op_other field displayed, like op_and etc. already do:
        andassign argdefelem dor dorassign entergiven entertry enterwhen once orassign regcomp substcont
  * enteriter now has its op_redo etc. fields displayed, like enterloop already does.

  Here is a sample before and after of perl -Dx -e'($x+$y) * $z'

  Before:

      {
      1   TYPE = leave  ===> NULL
          TARG = 1
          FLAGS = (VOID,KIDS,PARENS,SLABBED)
          PRIVATE = (REFC)
          REFCNT = 1
          {
      2       TYPE = enter  ===> 3
              FLAGS = (UNKNOWN,SLABBED,MORESIB)
          }
          {
      3       TYPE = nextstate  ===> 4
              FLAGS = (VOID,SLABBED,MORESIB)
              LINE = 1
              PACKAGE = "main"
              SEQ = 4294967246
          }
          {
      5       TYPE = multiply  ===> 1
              TARG = 5
              FLAGS = (VOID,KIDS,SLABBED)
              PRIVATE = (0x2)
              {
      6           TYPE = add  ===> 7
                  TARG = 3
                  FLAGS = (SCALAR,KIDS,PARENS,SLABBED,MORESIB)
                  PRIVATE = (0x2)
                  {
      8               TYPE = null  ===> (9)
                        (was rv2sv)
                      FLAGS = (SCALAR,KIDS,SLABBED,MORESIB)
                      PRIVATE = (0x1)
                      {
      4                   TYPE = gvsv  ===> 9
                          FLAGS = (SCALAR,SLABBED)
                          PADIX = 1
                      }
                  }
                  {
      10              TYPE = null  ===> (6)
                        (was rv2sv)
                      FLAGS = (SCALAR,KIDS,SLABBED)
                      PRIVATE = (0x1)
                      {
      9                   TYPE = gvsv  ===> 6
                          FLAGS = (SCALAR,SLABBED)
                          PADIX = 2
                      }
                  }
              }
              {
      11          TYPE = null  ===> (5)
                    (was rv2sv)
                  FLAGS = (SCALAR,KIDS,SLABBED)
                  PRIVATE = (0x1)
                  {
      7               TYPE = gvsv  ===> 5
                      FLAGS = (SCALAR,SLABBED)
                      PADIX = 4
                  }
              }
          }
      }

  After:

      1    leave LISTOP(0xdecb38) ===> [0x0]
           TARG = 1
           FLAGS = (VOID,KIDS,PARENS,SLABBED)
           PRIVATE = (REFC)
           REFCNT = 1
           |
      2    +--enter OP(0xdecb00) ===> 3 [nextstate 0xdecb80]
           |   FLAGS = (UNKNOWN,SLABBED,MORESIB)
           |
      3    +--nextstate COP(0xdecb80) ===> 4 [gvsv 0xdeb3b8]
           |   FLAGS = (VOID,SLABBED,MORESIB)
           |   LINE = 1
           |   PACKAGE = "main"
           |   SEQ = 4294967246
           |
      5    +--multiply BINOP(0xdecbe0) ===> 1 [leave 0xdecb38]
               TARG = 5
               FLAGS = (VOID,KIDS,SLABBED)
               PRIVATE = (0x2)
               |
      6        +--add BINOP(0xdeb2b0) ===> 7 [gvsv 0xdeb270]
               |   TARG = 3
               |   FLAGS = (SCALAR,KIDS,PARENS,SLABBED,MORESIB)
               |   PRIVATE = (0x2)
               |   |
      8        |   +--null (ex-rv2sv) UNOP(0xdeb378) ===> 9 [gvsv 0xdeb338]
               |   |   FLAGS = (SCALAR,KIDS,SLABBED,MORESIB)
               |   |   PRIVATE = (0x1)
               |   |   |
      4        |   |   +--gvsv PADOP(0xdeb3b8) ===> 9 [gvsv 0xdeb338]
               |   |       FLAGS = (SCALAR,SLABBED)
               |   |       PADIX = 1
               |   |
      10       |   +--null (ex-rv2sv) UNOP(0xdeb2f8) ===> 6 [add 0xdeb2b0]
               |       FLAGS = (SCALAR,KIDS,SLABBED)
               |       PRIVATE = (0x1)
               |       |
      9        |       +--gvsv PADOP(0xdeb338) ===> 6 [add 0xdeb2b0]
               |           FLAGS = (SCALAR,SLABBED)
               |           PADIX = 2
               |
      11       +--null (ex-rv2sv) UNOP(0xdeb220) ===> 5 [multiply 0xdecbe0]
                   FLAGS = (SCALAR,KIDS,SLABBED)
                   PRIVATE = (0x1)
                   |
      7            +--gvsv PADOP(0xdeb270) ===> 5 [multiply 0xdecbe0]
                       FLAGS = (SCALAR,SLABBED)
                       PADIX = 4
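  The tree layout in the "After" sample follows a simple recursive rule: each child gets a "+--" connector, and a "|" column stays open beneath a child only while it still has later siblings. A minimal C sketch of that rule follows. It is not dump.c's code: struct node is a hypothetical stand-in for perl's OP, and the sketch omits sequence numbers, addresses, per-op fields, and the blank "|" spacer lines.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for an op: a name plus first-child/next-sibling links. */
struct node {
    const char *name;
    struct node *first_child;
    struct node *sibling;
};

/* Append s to the NUL-terminated string in buf, truncating if buf is full. */
static void append(char *buf, size_t bufsz, const char *s)
{
    size_t len = strlen(buf);
    snprintf(buf + len, bufsz - len, "%s", s);
}

/* Render every child of parent, each line starting with the inherited prefix. */
static void render_children(const struct node *parent, const char *prefix,
                            char *buf, size_t bufsz)
{
    for (const struct node *c = parent->first_child; c; c = c->sibling) {
        char child_prefix[256];

        append(buf, bufsz, prefix);
        append(buf, bufsz, "+--");
        append(buf, bufsz, c->name);
        append(buf, bufsz, "\n");

        /* Keep a "|" column open under c only if later siblings follow it. */
        snprintf(child_prefix, sizeof child_prefix, "%s%s", prefix,
                 c->sibling ? "|  " : "   ");
        render_children(c, child_prefix, buf, bufsz);
    }
}

/* Render the whole tree rooted at root into buf. */
static void render_tree(const struct node *root, char *buf, size_t bufsz)
{
    buf[0] = '\0';
    append(buf, bufsz, root->name);
    append(buf, bufsz, "\n");
    render_children(root, "", buf, bufsz);
}
```

  Applied to the op tree of the sample (leave with children enter, nextstate, multiply, and so on), this produces the same skeleton of "+--" and "|" columns as the "After" output, minus the field lines.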
* add Perl_op_class(o) API function (David Mitchell, 2017-01-21, 2 files, -155/+5)
  Given an op, this function determines what type of struct it has been allocated as. It returns one of the OPclass enums, such as OPclass_LISTOP.

  Originally this was a static function in B.xs, but it has wider applicability; indeed, several XS modules on CPAN have cut and pasted it.

  This commit adds the OPclass enum to op.h. B.xs had a similar enum, but with names like OPc_LISTOP. I've renamed them to OPclass_LISTOP etc. so as not to clash with the cut-and-pasted code already on CPAN.
* APItest/t/handy.t: Skip some tests on EBCDIC (Karl Williamson, 2017-01-18, 1 file, -0/+3)
  The skipped tests are for malformed input to the various isCNTRL functions. Perl does not go out of its way to test for malformedness in these, only making sure the input is well-formed if that is necessary for the correct operation of the function. Since all controls in EBCDIC are represented by a single byte, and you can't malform a single byte, none of the malformed-control tests can detect malformedness on EBCDIC platforms, so skip them.
* APItest/t/handy.t: Use more mnemonic variable names (Karl Williamson, 2017-01-18, 1 file, -40/+40)
  The previous commit might not have been necessary if these had been more mnemonic in the first place.
* Avoid deprecation message. (Abigail, 2017-01-17, 1 file, -1/+1)
  File::Glob::glob is deprecated. So, if we test it, we should suppress the warning.
* B::OP::terse will go away in Perl 5.28. (Abigail, 2017-01-16, 1 file, -2/+2)
  Adjusted the deprecation message, and bumped the version of B::Terse.
* Don't recognize the --libpods option in Pod::Html (Abigail, 2017-01-16, 1 file, -4/+2)
  Since Perl 5.18, the --libpods option has been recognized, but it did nothing other than issue a deprecation warning. As of now, using the --libpods option is an error.

  The version number of Pod::Html has been bumped to 1.2202.
* Time limit the deprecation of :unique and :locked. (Abigail, 2017-01-16, 1 file, -4/+7)
  The :unique and :locked attributes have had no effect since 5.8.8 and 5.005 respectively. They were deprecated in 5.12. They are now scheduled to be deleted in 5.28.

  There are two places the deprecation warning can be issued: in lib/attributes.pm, and in toke.c. The warnings were phrased differently, but since we're changing the warning anyway (to add the version of Perl in which the attributes will disappear), we now use the same phrasing regardless of where the warning is generated:

      Attribute "locked" is deprecated, and will disappear in Perl 5.28
      Attribute "unique" is deprecated, and will disappear in Perl 5.28
* Actively deprecate File::Glob::glob(). (Abigail, 2017-01-16, 1 file, -1/+7)
  This function has been deprecated since 5.8. However, no deprecation message was issued; only perl5.008delta.pod and a comment in the file mention its deprecation.

  This patch issues a deprecation message and warns the user that the function will be gone in Perl 5.30.

  Since all this function does is call File::Glob::bsd_glob anyway, code calling it is easily fixed.
* re.pm: pod formatting nits, and clarifications (Karl Williamson, 2017-01-13, 1 file, -8/+29)
* Various .t's: Escape literal '}' and ']' in patterns (Karl Williamson, 2017-01-13, 2 files, -2/+2)
  This makes it clearer that these characters, which are sometimes meta and sometimes literal, are meant to be taken literally here.
* Add /xx regex pattern modifier (Karl Williamson, 2017-01-13, 2 files, -11/+29)
  This was first proposed in the thread starting at http://www.nntp.perl.org/group/perl.perl5.porters/2014/09/msg219394.html
* (perl #130534) fix test failures under DragonFly BSD (Tomasz Konojacki, 2017-01-10, 1 file, -1/+3)
* (perl #130108) generate a dummy dtrace_main.o if perlmain.o doesn't contain probes (Tony Cook, 2017-01-05, 1 file, -1/+2)
  efc4bddfd4 added generating a probes object file for perlmain.o, since the compiler was generating probes even for unused inline functions.

  The default compiler on FreeBSD 11, however, doesn't generate probes for these unused inline functions, and dtrace -G fails because it can't find any.

  So if dtrace fails for perlmain.o, generate a dummy object file to take its place.

  Similarly for XS::APItest.
* APItest/t/handy.t: Skip locale tests when not in locale (Karl Williamson, 2017-01-04, 1 file, -5/+3)
  In XS code, the macros that pay attention to locale don't check whether they are being called from within the scope of 'use locale'; they assume that the calling code wouldn't use them unless appropriate. That's not true of Perl-level code, and I forgot that when writing these tests. Normally it doesn't show up as a problem, as the underlying locale is the C locale, which on almost all platforms has the effect of not being in a locale. But the VMS C locale is special, and so doesn't meet the assumptions of these tests. The solution is to skip locale-aware macros unless we are testing locale.
* APItest/t/handy.t: Fix for EBCDIC (Karl Williamson, 2017-01-02, 1 file, -11/+12)
  There were several instances where the native code point and the Unicode equivalent were being conflated.
* APItest.xs: Silence compiler warnings (Karl Williamson, 2016-12-29, 1 file, -4/+10)
  See: http://www.nntp.perl.org/group/perl.perl5.porters/2016/12/msg241877.html
* Deprecate toFOO_utf8() (Karl Williamson, 2016-12-23, 2 files, -0/+74)
  Now that there are _safe versions, deprecate the unsafe ones.
* Add toFOO_utf8_safe() macros (Karl Williamson, 2016-12-23, 2 files, -11/+42)
* Deprecate isFOO_utf8() macros (Karl Williamson, 2016-12-23, 1 file, -3/+28)
  These macros are being replaced by safe versions; they now generate a deprecation message at each call site upon the first use there in each program run.
* Allow allowing UTF-8 overflow malformation (Karl Williamson, 2016-12-23, 1 file, -9/+6)
  Perl has never allowed the UTF-8 overflow malformation, for some reason. But as long as overflows are turned into the REPLACEMENT CHARACTER, there is no real reason not to. And making it allowable lets code that wants to carry on in the face of malformed input do so without risk of contaminating things, as the REPLACEMENT CHARACTER is the Unicode-prescribed way of handling malformations.
* Return REPLACEMENT for UTF-8 overlong malformation (Karl Williamson, 2016-12-23, 1 file, -0/+24)
  When perl decodes UTF-8 into a code point, it must decide what to do if the input is malformed in some way. When the flags passed to the decode function indicate that a given malformation type is not acceptable, the function returns 0 to indicate failure; on success it returns the decoded code point (unfortunately that may require disambiguation if the input is validly a NUL). As perl evolved, what happened when various allowed malformations were encountered got stricter and stricter. This is the final malformation that was not turned into a REPLACEMENT CHARACTER when the malformation was allowed, and this commit changes it to return that.

  Unlike most other malformations, the code point value of an overlong is well-defined, and that is why it hadn't been changed heretofore. But it is safer to use the Unicode-prescribed behavior on all malformations, which is to replace them with the REPLACEMENT CHARACTER. Just in case there is code that requires the old behavior, it is retained, but you have to search the source for the undocumented flag that enables it.
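  The policy of mapping every malformation, overlongs included, to U+FFFD can be sketched as a small C decoder. This is not perl's utf8n_to_uvchr(): decode_one() is a hypothetical name, it handles only standard sequences of up to 4 bytes, and it does not check for surrogates or code points above 0x10FFFF. Empty input also yields U+FFFD, in line with the "empty malformation" commit that follows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHARACTER 0xFFFDu

/* Decode one UTF-8 sequence from [s, e). Any malformation yields U+FFFD.
 * *consumed is set to the number of bytes the caller should skip. */
static uint32_t decode_one(const unsigned char *s, const unsigned char *e,
                           size_t *consumed)
{
    size_t len;
    uint32_t cp, min;

    *consumed = 0;
    if (s >= e)
        return REPLACEMENT_CHARACTER;            /* empty input */

    unsigned char b = s[0];
    if (b < 0x80) { *consumed = 1; return b; }   /* ASCII */
    else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; min = 0x80; }
    else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; min = 0x800; }
    else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; min = 0x10000; }
    else { *consumed = 1; return REPLACEMENT_CHARACTER; } /* bad start byte */

    if ((size_t)(e - s) < len) {                 /* sequence runs past e */
        *consumed = (size_t)(e - s);
        return REPLACEMENT_CHARACTER;
    }
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) {             /* missing continuation */
            *consumed = i;
            return REPLACEMENT_CHARACTER;
        }
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    *consumed = len;

    /* Overlong: the value is well-defined, but it could have been encoded
     * in fewer bytes, so treat it like any other malformation. */
    if (cp < min)
        return REPLACEMENT_CHARACTER;
    return cp;
}
```

  For example, the overlong pair 0xC0 0x80 would otherwise decode to NUL; here it becomes U+FFFD, so a filter looking for NUL bytes cannot be bypassed this way.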
* Return REPLACEMENT for UTF-8 empty malformation (Karl Williamson, 2016-12-23, 1 file, -1/+1)
  The previous commit no longer allows this so-called malformation under DEBUGGING builds, unless code explicitly changes to request it (or already explicitly does, but there are no instances of this on CPAN). If it was explicitly allowed, prior to this commit it returned NUL. If it wasn't allowed, it returned 0. Most code won't treat these as different. When returning NUL, it is basically making nothing into something, which might be exploitable in some way by an attacker. The Unicode-accepted way of dealing with malformations is to replace them with the REPLACEMENT CHARACTER, and so this commit changes things to conform to this.
* utf8.c: Forbid zero-length malformation under DEBUGGING (Karl Williamson, 2016-12-23, 1 file, -4/+6)
* utf8.h: Renumber flag bits (Karl Williamson, 2016-12-23, 1 file, -10/+10)
  This creates a gap that will be filled by future commits.
* Add isFOO_utf8_safe() macros (Karl Williamson, 2016-12-23, 2 files, -78/+437)
  The original API does not check that we aren't reading beyond the end of a buffer, apparently assuming that we could keep malformed UTF-8 out by use of gatekeepers, but that is currently impossible. This commit adds "safe" macros for determining if a UTF-8 sequence represents an alphabetic, a digit, etc. Each new macro has an extra parameter pointing to the end of the sequence, so that looking beyond the input string can be avoided.

  The macros aren't currently completely safe, as they don't test that there is at least a single valid byte in the input, except by an assertion in DEBUGGING builds. This is because typically they are called in code that makes that assumption, and frequently tests the current byte for one thing or another.
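  The "extra end pointer" pattern can be sketched in C. Everything here is hypothetical: is_hblank_utf8_safe() is not a perl macro, and it recognizes only space, tab, and U+00A0 NO-BREAK SPACE, just to show the bounds check. Like the macros described above, it only asserts (rather than tests) that at least one byte is available, then refuses to read continuation bytes that would lie at or past e.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sequence length implied by a UTF-8 start byte, or 0 if b cannot start one. */
static size_t utf8_expected_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 0;   /* continuation byte or invalid start byte */
}

/* Hypothetical bounds-checked predicate in the spirit of isFOO_utf8_safe(p, e):
 * is the character at p a space, a tab, or U+00A0 (encoded as 0xC2 0xA0)? */
static bool is_hblank_utf8_safe(const unsigned char *p, const unsigned char *e)
{
    assert(p < e);   /* at least one byte; only asserted, not tested */

    size_t len = utf8_expected_len(p[0]);
    if (len == 0 || (size_t)(e - p) < len)
        return false;   /* malformed, or the sequence would run past e */

    if (len == 1)
        return p[0] == ' ' || p[0] == '\t';
    if (len == 2)
        return p[0] == 0xC2 && p[1] == 0xA0;
    return false;       /* longer blanks (e.g. U+3000) omitted in this sketch */
}
```

  Given a buffer that ends right after the 0xC2 lead byte, the function returns false instead of reading the byte beyond the buffer, which is exactly what the unbounded isFOO_utf8() form could not guarantee.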
* Switch most open() calls to three-argument form. (John Lightsey, 2016-12-23, 21 files, -42/+42)
  Switch from the two-argument form. Filehandle cloning is still done with the two-argument form for backward compatibility.

  Committer: Get all porting tests to pass. Increment some $VERSIONs. Run:

      ./perl -Ilib regen/mk_invlists.pl; ./perl -Ilib regen/regcharclass.pl

  For: RT #130122
* APItest/t/handy.t: Bring final special case into loop (Karl Williamson, 2016-12-23, 1 file, -11/+8)
  All the tests in this file are now in two loops, one for the isFOO() macros and the other for the toFOO() macros. Thus the main logic applies to all, and tests can be added or changed easily.
* APItest/t/handy.t: White-space only (Karl Williamson, 2016-12-23, 1 file, -13/+13)
  Indent newly formed block.
* APItest/t/handy.t: Add more tests (Karl Williamson, 2016-12-23, 2 files, -14/+224)
  Macros with the '_uvchr' suffix were not being tested at all. Instead, the undocumented backwards-compatibility-only macros with the suffix _uni were being tested, but these might diverge, and the tests wouldn't detect that.
* APItest/t/handy.t: Add more tests (Karl Williamson, 2016-12-23, 3 files, -5/+142)
  The macros like isALPHA() were not getting tested; instead, the theory was that testing isALPHA_A() was good enough because they are #defined to be the same. But that might change, and the tests wouldn't uncover that. And it turned out that some things weren't getting tested at all if there was no _A version of the macro, for example isALNUM(). This commit adds tests for the versions of the isFOO() macros with no suffix.
* APItest/t/handy.t: Use abbrev. char name in test names (Karl Williamson, 2016-12-23, 1 file, -7/+29)
  I got tired of seeing all these long character names fly by on my screen while testing, so this changes to use any official Unicode abbreviation when available. It's kind of silly to do this in this test, but I might extract and improve this for more general use in tests of characters in the future.

  This also changes some imports so that the full module name need not always be specified.
* APItest/t/handy.t: White-space only (Karl Williamson, 2016-12-23, 1 file, -10/+10)
  Indent newly formed block.
* APItest/t/handy.t: Fold in another special case (Karl Williamson, 2016-12-23, 1 file, -15/+12)
  The previous commit revamped this .t to make most things part of a single loop. This adds another thing that was outside it.
* APItest/t/handy.t: Refactor for maintenance (Karl Williamson, 2016-12-23, 1 file, -205/+194)
  Over the years, code has kept getting copied and modified slightly in each new place, and a future commit would create still more copies. This cuts down the number of slightly different versions to the minimum reasonably attainable.
* PerlIO-scalar: Bump version to 0.26 (Karl Williamson, 2016-12-22, 1 file, -1/+1)
* PerlIOScalar_eof(): silence compiler warning: (David Mitchell, 2016-12-22, 1 file, -2/+1)

      scalar.xs:23:15: warning: variable ‘p’ set but not used [-Wunused-but-set-variable]
               char *p;

  I'm not sure why this has only started warning, but this commit shuts it up anyway.