| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Prior to this patch we resized hashes when, after inserting a key,
the load factor of the hash reached 1 (load factor = keys / buckets).
This patch makes two subtle changes to this logic:
1. We split only after inserting a key into a utilized bucket,
2. and only when the load factor then exceeds the new maximum of 0.667
The intent and effect of this change is to increase our hash tables'
efficiency. Reducing the maximum load factor to 0.667 means that we should
have far fewer keys in collision overall, at the cost of some unutilized
space (2/3rds was chosen as it is easier to calculate than 0.7). On the
other hand, only splitting after a collision means in theory that we execute
the "final split" less often. Additionally, inserting a key into an unused
bucket increases the efficiency of the hash, without changing the worst
case.[1] In other words, without increasing collisions we use the space
in our hashes more efficiently.
A side effect of this change is that the size of a hash is more sensitive
to key insert order. A set of keys with some collisions might be one
size if those collisions were encountered early, or another if they were
encountered later. Assuming random distribution of hash values about
50% of hashes should be smaller than they would be without this rule.
The two changes complement each other, as changing the maximum load
factor decreases the chance of a collision, but changing to only split
after a collision means that we won't waste as much of that space as we
otherwise might.
[1] Since I personally didn't find this obvious at first, here is my
explanation:
The old behavior was that we doubled the number of buckets when the
number of keys in the hash matched that of buckets. So on inserting
the Kth key into a K bucket hash, we would double the number of
buckets. Thus the worst case prior to this patch was a hash
containing K-1 keys which all hash into a single bucket, and the post
split worst case behavior would be having K items in a single bucket
of a hash with 2*K buckets total.
The new behavior says that we double the size of the hash once we insert
an item into an occupied bucket and, after doing so, exceed the maximum
load factor (leave aside the change in maximum load factor in this patch).
If we insert into an occupied bucket (including the worst case bucket) then
we trigger a key split, and we have exactly the same cases as before.
If we insert into an empty bucket then we now have a worst case of K-1 items
in one bucket, and 1 item in another, in a hash with K buckets, thus the
worst case has not changed.
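As a rough illustration of the new rule and of the insert-order sensitivity
described above, here is a minimal Python sketch. It is not the code in hv.c:
the names should_split and final_bucket_count, the initial size of 8 buckets,
and the use of modulo instead of a bit mask are all assumptions made for the
example.

```python
MAX_LOAD_NUM, MAX_LOAD_DEN = 2, 3  # maximum load factor of 2/3, as in the text


def should_split(bucket_was_occupied, keys, buckets):
    """New rule: split only after a collision AND only above the 2/3 load factor."""
    return bucket_was_occupied and keys * MAX_LOAD_DEN > buckets * MAX_LOAD_NUM


def final_bucket_count(hash_values, initial_buckets=8):
    """Insert precomputed hash values in order; return the final bucket count.

    initial_buckets=8 is an assumption for the sketch.
    """
    buckets = initial_buckets
    table = [[] for _ in range(buckets)]
    keys = 0
    for h in hash_values:
        idx = h % buckets
        occupied = bool(table[idx])  # did this insert collide?
        table[idx].append(h)
        keys += 1
        if should_split(occupied, keys, buckets):
            # Double the bucket array and redistribute every key.
            buckets *= 2
            new_table = [[] for _ in range(buckets)]
            for chain in table:
                for v in chain:
                    new_table[v % buckets].append(v)
            table = new_table
    return buckets


# Same seven keys, one collision (0 and 8 share a bucket while there are 8 buckets).
collision_early = [0, 8, 1, 2, 3, 4, 5]  # collision happens at load 2/8: no split
collision_late = [0, 1, 2, 3, 4, 5, 8]   # collision happens at load 7/8: split
```

With these inputs the early-collision order stays at 8 buckets while the
late-collision order ends at 16, which is the order sensitivity the message
describes.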
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
During the build of XS modules, an empty Foo.bs file is normally created
for each Foo.so file. If a Foo_BS file is present, this instead triggers
the auto-generation of a .bs file which may have executable perl
content.
However, nothing in core currently generates a non-empty .bs file. So add
a test that this mechanism works, and fix up the three dynamic lib loaders
which implement the 'do $bs if -s $bs' mechanism to not rely on the
process having '.' present in @INC.
As it happens this already works currently, because the name of the
.bs file to load will usually be something like
../../lib/auto/Foo/Foo.bs
and the presence of the leading '..' causes 'do' to load the file directly
rather than via @INC. But locally fix up @INC anyway, in case the leading
'../' isn't always present.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
This reverts commit c0dea56fe487504493d97df5a7a6be57a2d2834d.
The new macros introduced here have now just been rendered invisible
by 8f71649941d02d5bdfe4f. Using macros that we can't see breaks the
build, so revert this for now. It can be reintroduced when the macro
names are settled and no longer hidden.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This test file is one of the longest running ones. It has three main
semi-independent parts. Two of them are split off into 2 files with a
common file required. The other part is still long running, so it is
split so that a common file is used to run the tests, but it is called
with a chunk number and it only executes based on that chunk. The
number of chunks is based on the environment variable TEST_JOBS, up to
10. Each chunk executes 1/TEST_JOBS of the total test. If TEST_JOBS is
not set, it reverts to 1 chunk. The alternative would be to revert to
10, but since there is overhead associated with each new chunk, I chose,
for now, 1.
There may be a better solution later on, but I think this is good enough
for now.
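The chunking rule described above (TEST_JOBS chunks, capped at 10, defaulting
to 1) can be sketched as follows. This is only an illustration in Python, not
the test file's actual code: the function names, the handling of a
non-numeric TEST_JOBS, and the way tests are divided among chunks are
assumptions.

```python
import os

MAX_CHUNKS = 10  # upper bound stated in the commit message


def chunk_count(env=None):
    """Number of chunks: TEST_JOBS capped at 10, or 1 if TEST_JOBS is unset."""
    env = os.environ if env is None else env
    raw = env.get("TEST_JOBS")
    if raw is None:
        return 1
    try:
        jobs = int(raw)
    except ValueError:
        return 1  # assumption: treat an unparsable value like "unset"
    return max(1, min(jobs, MAX_CHUNKS))


def chunk_slice(total_tests, chunk, chunks):
    """Half-open range of test indices run by 0-based chunk number `chunk`."""
    start = total_tests * chunk // chunks
    end = total_tests * (chunk + 1) // chunks
    return range(start, end)
```

Each chunk gets roughly 1/chunks of the tests, and the ranges for all chunk
numbers together cover every test exactly once.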
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a very long running test. This commit splits it into smaller
chunks, based on the environment variable TEST_JOBS, up to 10. Each
chunk executes 1/TEST_JOBS of the total test. If TEST_JOBS is not set,
it reverts to 1 chunk. The alternative would be to revert to 10, but
since there is overhead associated with each new chunk, I chose, for
now, 1.
There may be a better solution later on, but I think this is good enough
for now.
|
|
|
|
| |
Except under cpan/ and dist/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
RT #129285
These days a 'GV' can actually just be a ref to a CV when the only thing
that would be stored in the glob is a CV. Update S_do_op_dump_bar() to
handle this. Formerly it would trigger an assert on a non-threaded build.
In fact, incorporate the fixed logic into a static function,
S_gv_display(), that is shared by both S_do_op_dump_bar() and
Perl_debop(); so both
perl -Dx
and
perl -Dt
get the benefit.
Also for the -Dx case, make it display the raw address of the GV too.
|
| |
|
|
|
|
|
|
| |
This moves the code that helps in testing I8 (which is the same as UTF-8
on non-EBCDIC platforms) to t/charset_tools.pl, away from the .t where
it previously was. This means it can now be used in other .t's.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is mainly used for low-level debugging these days (higher level stuff
like Concise having since been created), e.g. calling op_dump() from
within a debugger or running with -Dx. Make it display more info, and use
an ASCII-art tree to show the structure.
The main changes are:
* added 'ASCII-art' tree structure;
* it now displays each op's class and address;
* for op_next etc links, it now displays the type and address of the
linked-to op in addition to its sequence number;
* the following ops now have their op_other field displayed, like op_and
etc already do:
andassign argdefelem dor dorassign entergiven entertry enterwhen
once orassign regcomp substcont
* enteriter now has its op_redo etc fields displayed, like enterloop
already does;
Here is a sample before and after of perl -Dx -e'($x+$y) * $z'
Before:
{
1 TYPE = leave ===> NULL
TARG = 1
FLAGS = (VOID,KIDS,PARENS,SLABBED)
PRIVATE = (REFC)
REFCNT = 1
{
2 TYPE = enter ===> 3
FLAGS = (UNKNOWN,SLABBED,MORESIB)
}
{
3 TYPE = nextstate ===> 4
FLAGS = (VOID,SLABBED,MORESIB)
LINE = 1
PACKAGE = "main"
SEQ = 4294967246
}
{
5 TYPE = multiply ===> 1
TARG = 5
FLAGS = (VOID,KIDS,SLABBED)
PRIVATE = (0x2)
{
6 TYPE = add ===> 7
TARG = 3
FLAGS = (SCALAR,KIDS,PARENS,SLABBED,MORESIB)
PRIVATE = (0x2)
{
8 TYPE = null ===> (9)
(was rv2sv)
FLAGS = (SCALAR,KIDS,SLABBED,MORESIB)
PRIVATE = (0x1)
{
4 TYPE = gvsv ===> 9
FLAGS = (SCALAR,SLABBED)
PADIX = 1
}
}
{
10 TYPE = null ===> (6)
(was rv2sv)
FLAGS = (SCALAR,KIDS,SLABBED)
PRIVATE = (0x1)
{
9 TYPE = gvsv ===> 6
FLAGS = (SCALAR,SLABBED)
PADIX = 2
}
}
}
{
11 TYPE = null ===> (5)
(was rv2sv)
FLAGS = (SCALAR,KIDS,SLABBED)
PRIVATE = (0x1)
{
7 TYPE = gvsv ===> 5
FLAGS = (SCALAR,SLABBED)
PADIX = 4
}
}
}
}
After:
1 leave LISTOP(0xdecb38) ===> [0x0]
TARG = 1
FLAGS = (VOID,KIDS,PARENS,SLABBED)
PRIVATE = (REFC)
REFCNT = 1
|
2 +--enter OP(0xdecb00) ===> 3 [nextstate 0xdecb80]
| FLAGS = (UNKNOWN,SLABBED,MORESIB)
|
3 +--nextstate COP(0xdecb80) ===> 4 [gvsv 0xdeb3b8]
| FLAGS = (VOID,SLABBED,MORESIB)
| LINE = 1
| PACKAGE = "main"
| SEQ = 4294967246
|
5 +--multiply BINOP(0xdecbe0) ===> 1 [leave 0xdecb38]
TARG = 5
FLAGS = (VOID,KIDS,SLABBED)
PRIVATE = (0x2)
|
6 +--add BINOP(0xdeb2b0) ===> 7 [gvsv 0xdeb270]
| TARG = 3
| FLAGS = (SCALAR,KIDS,PARENS,SLABBED,MORESIB)
| PRIVATE = (0x2)
| |
8 | +--null (ex-rv2sv) UNOP(0xdeb378) ===> 9 [gvsv 0xdeb338]
| | FLAGS = (SCALAR,KIDS,SLABBED,MORESIB)
| | PRIVATE = (0x1)
| | |
4 | | +--gvsv PADOP(0xdeb3b8) ===> 9 [gvsv 0xdeb338]
| | FLAGS = (SCALAR,SLABBED)
| | PADIX = 1
| |
10 | +--null (ex-rv2sv) UNOP(0xdeb2f8) ===> 6 [add 0xdeb2b0]
| FLAGS = (SCALAR,KIDS,SLABBED)
| PRIVATE = (0x1)
| |
9 | +--gvsv PADOP(0xdeb338) ===> 6 [add 0xdeb2b0]
| FLAGS = (SCALAR,SLABBED)
| PADIX = 2
|
11 +--null (ex-rv2sv) UNOP(0xdeb220) ===> 5 [multiply 0xdecbe0]
FLAGS = (SCALAR,KIDS,SLABBED)
PRIVATE = (0x1)
|
7 +--gvsv PADOP(0xdeb270) ===> 5 [multiply 0xdecbe0]
FLAGS = (SCALAR,SLABBED)
PADIX = 4
|
|
|
|
|
|
|
|
|
|
|
|
| |
Given an op, this function determines what type of struct it has been
allocated as. Returns one of the OPclass enums, such as OPclass_LISTOP.
Originally this was a static function in B.xs, but it has wider
applicability; indeed several XS modules on CPAN have cut and pasted it.
It adds the OPclass enum to op.h. In B.xs there was a similar enum, but
with names like OPc_LISTOP. I've renamed them to OPclass_LISTOP etc. so as
not to clash with the cut+paste code already on CPAN.
|
|
|
|
|
|
|
|
|
|
| |
The skipped tests are for malformed input for the various isCNTRL
functions. Perl does not go out of its way to test for malformedness in
these, only making sure they are well-formed if that is necessary
for the correct operation of the function. Since all controls in EBCDIC
are represented by a single byte, and you can't malform a single byte,
all the malformedness control tests will not detect malformedness on
EBCDIC platforms, so skip them.
|
|
|
|
|
| |
The previous commit might not have been necessary if these had been more
mnemonic in the first place.
|
|
|
|
|
| |
File::Glob::glob is deprecated. So, if we test it, we should avoid
the warning.
|
|
|
|
| |
Adjusted the deprecation message, and bumped the version of B::Terse.
|
|
|
|
|
|
|
|
| |
Since Perl 5.18, the --libpods option has been recognized, but
did not do anything other than issue a deprecation warning.
As of now, using the --libpods option creates an error.
The version number of Pod::Html has been bumped to 1.2202.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The :unique and :locked attributes have had no effect since 5.8.8
and 5.005 respectively. They were deprecated in 5.12. They are now
scheduled to be deleted in 5.28.
There are two places the deprecation warning can be issued:
in lib/attributes.pm, and in toke.c. The warnings were phrased
differently, but since we're changing the warning anyway (as we
added the version of Perl in which the attributes will disappear),
we've used the same phrasing for this warning, regardless of where
it is generated:
Attribute "locked" is deprecated, and will disappear in Perl 5.28
Attribute "unique" is deprecated, and will disappear in Perl 5.28
|
|
|
|
|
|
|
|
|
|
| |
This function has been deprecated since 5.8. However, no deprecation
message was issued; only perl5.008delta.pod and a comment in the file
mention its deprecation.
This patch issues a deprecation message, and warns the user it will
be gone in perl 5.30. Since all this method does is call
File::Glob::bsd_glob anyway, code calling this is easily fixed.
|
| |
|
|
|
|
|
| |
It is clearer to show that these characters which are sometimes meta and
sometimes literal are meant to be taken literally here.
|
|
|
|
|
| |
This was first proposed in the thread starting at
http://www.nntp.perl.org/group/perl.perl5.porters/2014/09/msg219394.html
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
probes
efc4bddfd4 added generating a probes object file for perlmain.o, since
the compiler was generating probes even for unused inline functions.
The default compiler on FreeBSD 11 however doesn't generate probes for
these unused inline functions, and dtrace -G fails because it can't
find any.
So if dtrace fails for perlmain.o generate a dummy object file to
take its place.
Similarly for XS::APItest.
|
|
|
|
|
|
|
|
|
|
|
|
| |
In XS code, the macros that pay attention to locale don't check if they
are being called from within the scope of 'use locale'; they assume that
the code calling them wouldn't be doing so unless appropriate. That's
not true of Perl-level code. I forgot that when writing these tests.
Normally it doesn't show up as a problem as the underlying locale is the
C locale, which on almost all platforms has the effect of not being in a
locale. But the VMS C locale is special, and so doesn't meet the
assumptions of these tests. The solution is to skip locale-aware macros
unless we are testing locale.
|
|
|
|
|
| |
There were several instances where the native code point and the Unicode
equivalent were being conflated.
|
|
|
|
| |
See: http://www.nntp.perl.org/group/perl.perl5.porters/2016/12/msg241877.html
|
|
|
|
| |
Now that there are _safe versions, deprecate the unsafe ones.
|
| |
|
|
|
|
|
|
| |
These macros are being replaced by a safe version; they now generate a
deprecation message at each call site upon the first use there in each
program run.
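The "once per call site per program run" behavior can be pictured with the
following Python sketch. It is only an analogy for the idea, not how the C
macros are implemented: deprecated_call, the messages list, and the use of
the caller's file and line as the call-site key are all assumptions.

```python
import sys

_warned_sites = set()  # call sites that have already produced a message
messages = []          # collected rather than printed, to keep the sketch testable


def deprecated_call(name):
    """Record a deprecation message the first time each call site is reached."""
    caller = sys._getframe(1)  # CPython-specific way to find the caller
    site = (caller.f_code.co_filename, caller.f_lineno)
    if site not in _warned_sites:
        _warned_sites.add(site)
        messages.append(f"{name} is deprecated (called at {site[0]} line {site[1]})")
```

Calling it repeatedly from the same line (for example inside a loop) records
one message; calling it from a second line records a second one.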
|
|
|
|
|
|
|
|
|
| |
perl has never allowed the UTF-8 overflow malformation, for some reason.
But as long as overflows are turned into the REPLACEMENT CHARACTER,
there is no real reason not to. And making it allowable allows code
that wants to carry on in the face of malformed input to do so, without
risk of contaminating things, as the REPLACEMENT is the Unicode
prescribed way of handling malformations.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When perl decodes UTF-8 into a code point, it must decide what to do if
the input is malformed in some way. When the flags passed to the decode
function indicate that a given malformation type is not acceptable, the
function returns 0 to indicate failure; on success it returns the decoded
code point (unfortunately that may require disambiguation if the
input is validly a NUL). As perl evolved, what happened when various
allowed malformations were encountered got stricter and stricter. This
is the final malformation that was not turned into a REPLACEMENT
CHARACTER when the malformation was allowed, and this commit changes it to
return that. Unlike most other malformations, the code point value of
an overlong is well-defined, and that is why it hadn't been changed
heretofore. But it is safer to use the Unicode prescribed behavior on
all malformations, which is to replace them with the REPLACEMENT
CHARACTER. Just in case there is code that requires the old behavior,
it is retained, but you have to search the source for the undocumented
flag that enables it.
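The return-value convention described above can be sketched in Python as
follows. This is a simplified stand-in, not perl's decoder: the flag names
ALLOW_OVERLONG and ALLOW_OVERLONG_VALUE are hypothetical, only sequences of up
to four bytes are handled, and the input is assumed to be non-empty.

```python
REPLACEMENT = 0xFFFD          # U+FFFD REPLACEMENT CHARACTER
ALLOW_OVERLONG = 0x1          # hypothetical flag: overlongs are acceptable
ALLOW_OVERLONG_VALUE = 0x2    # hypothetical flag: old behavior, return the value

# Smallest code point that legitimately needs a sequence of each length.
_MIN_FOR_LENGTH = {2: 0x80, 3: 0x800, 4: 0x10000}


def decode_one(data, flags=0):
    """Decode the first character of non-empty `data`; return 0 on failure."""
    first = data[0]
    if first < 0x80:
        return first  # note: a valid NUL also yields 0, hence the ambiguity
    if 0xC0 <= first < 0xE0:
        length, cp = 2, first & 0x1F
    elif 0xE0 <= first < 0xF0:
        length, cp = 3, first & 0x0F
    elif 0xF0 <= first < 0xF8:
        length, cp = 4, first & 0x07
    else:
        return 0  # not a start byte handled by this sketch
    if len(data) < length:
        return 0  # truncated sequence
    for b in data[1:length]:
        if b & 0xC0 != 0x80:
            return 0  # not a continuation byte
        cp = (cp << 6) | (b & 0x3F)
    if cp < _MIN_FOR_LENGTH[length]:  # overlong encoding
        if not flags & ALLOW_OVERLONG:
            return 0
        if flags & ALLOW_OVERLONG_VALUE:
            return cp  # retained old behavior
        return REPLACEMENT
    return cp
```

For the overlong encoding b"\xC0\xAF" of "/", this returns 0 when overlongs
are not allowed, 0xFFFD when they are, and 0x2F only when the second
(hypothetical) flag is also set.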
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous commit no longer allows this so-called malformation under
DEBUGGING builds, except if code explicitly changes to request it (or
already explicitly does, but there are no instances of this in CPAN).
If it is explicitly allowed, prior to this commit it returned NUL. If
it wasn't allowed, it returned 0. Most code won't treat these as
different. When returning NUL, it basically is making nothing into
something, which might be exploitable some way by an attacker. The
Unicode accepted way of dealing with malformations is to replace them
with the REPLACEMENT CHARACTER, and so this commit changes things to
conform to this.
|
| |
|
|
|
|
| |
This creates a gap that will be filled by future commits
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The original API does not check that we aren't reading beyond the end of
a buffer, apparently assuming that we could keep malformed UTF-8 out by
use of gatekeepers, but that is currently impossible. This commit adds
"safe" macros for determining if a UTF-8 sequence represents
an alphabetic, a digit, etc. Each new macro has an extra parameter
pointing to the end of the sequence, so that looking beyond the input
string can be avoided.
The macros aren't currently completely safe, as they don't test that
there is at least a single valid byte in the input, except by an
assertion in DEBUGGING builds. This is because typically they are
called in code that makes that assumption, and frequently tests the
current byte for one thing or another.
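The role of the extra end parameter can be sketched in Python as below. This
illustrates the bounds check only, not the actual macros: the function names
are hypothetical, and returning False for a sequence that would run past the
end is an assumption about the behavior, not something the commit message
states.

```python
import unicodedata


def expected_length(first):
    """Length implied by a UTF-8 start byte, or 0 if it cannot start a character."""
    if first < 0x80:
        return 1
    if 0xC2 <= first <= 0xDF:
        return 2
    if 0xE0 <= first <= 0xEF:
        return 3
    if 0xF0 <= first <= 0xF4:
        return 4
    return 0


def is_digit_utf8_safe(buf, pos, end):
    """Is the character starting at buf[pos] a decimal digit?

    Never looks at or beyond buf[end]. Assumes end <= len(buf).
    """
    assert pos < end  # mirrors the DEBUGGING-only assertion in the text
    length = expected_length(buf[pos])
    if length == 0 or pos + length > end:
        return False  # assumption: truncated or invalid input is "not a digit"
    try:
        ch = buf[pos:pos + length].decode("utf-8")
    except UnicodeDecodeError:
        return False
    return unicodedata.category(ch) == "Nd"
```

With the two-byte ARABIC-INDIC DIGIT THREE (b"\xd9\xa3"), passing end=2
classifies it as a digit, while passing end=1 stops before the second byte
and reports False instead of reading past the end.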
|
|
|
|
|
|
|
|
|
|
| |
Switch from two-argument form. Filehandle cloning is still done with the two
argument form for backward compatibility.
Committer: Get all porting tests to pass. Increment some $VERSIONs.
Run: ./perl -Ilib regen/mk_invlists.pl; ./perl -Ilib regen/regcharclass.pl
For: RT #130122
|
|
|
|
|
|
| |
All the tests in this file are now in two loops, one for the isFOO()
macros, and the other for the toFOO() macros. Thus the main logic
applies to all, and tests can be added or changed easily.
|
|
|
|
| |
Indent newly formed block
|
|
|
|
|
|
|
| |
Macros with the '_uvchr' suffix were not being tested at all. Instead,
the undocumented backwards-compatibility-only macros with the suffixes
_uni were being tested, but these might diverge, and the tests wouldn't
find that.
|
|
|
|
|
|
|
|
|
| |
The macros like isALPHA() were not getting tested, the theory
being that testing isALPHA_A() was good enough because they are #defined
to be the same. But that might change and the tests wouldn't uncover
that. And it turned out that some things weren't getting tested at all
if there was no _A version of the macro, for example isALNUM(). This
commit adds tests for the version of the isFOO() macros with no suffix.
|
|
|
|
|
|
|
|
|
|
|
| |
I got tired of seeing all these long character names fly by on my screen
while testing, so this changes to use any official Unicode abbreviation
when available. It's kind of silly to do this in this test, but I might
extract and improve this for more general use in tests of characters in
the future.
This also changes some imports so that the full module name need not
always be specified.
|
|
|
|
| |
indent newly formed block.
|
|
|
|
|
| |
The previous commit revamped this .t to make most things
part of a single loop. This adds another thing that was outside it.
|
|
|
|
|
|
|
| |
Over the years code has kept getting copied and modified slightly in
each new place. And a future commit would create still more. This cuts
down the number of slightly different versions to the minimum reasonably
attainable.
|
| |
|
|
|
|
|
|
|
|
|
| |
scalar.xs:23:15: warning: variable ‘p’ set but not used
[-Wunused-but-set-variable]
char *p;
I'm not sure why this has only started warning, but this commit shuts it
up anyway.
|