| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
I am starting to write a Unicode::Private_Use module which will allow
one to specify the Unicode properties of private use code points, thus
making them actually useful. This commit adds a hook to regcomp.c to
accommodate this module. The changes are pretty minimal. This way we
don't have to wait another release cycle to get it out there.
I don't want to document this interface, until it's proven.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The MY_CXT subsystem allows per-thread pseudo-static data storage.
Part of the implementation for this involves each XS module being
assigned a unique index in its my_cxt_index static var when first
loaded.
Because PERL_GLOBAL_STRUCT bans any static vars, under those builds
there is instead a table which maps the MY_CXT_KEY identifying string to
index.
Unfortunately, this table was allocated per-interpreter rather than
globally, meaning if multiple threads tried to load the same XS module,
crashes could ensue.
This manifested itself in failures in
ext/XS-APItest/t/keyword_plugin_threads.t
The fix is relatively straightforward: allocate PL_my_cxt_keys globally
rather than per-interpreter.
Also record the size of this struct in a new var, PL_my_cxt_keys_size,
rather than doing double duty on PL_my_cxt_size.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix the various Perl_PerlSock_dup2_cloexec() type functions so that
t/porting/liberl.a passes under -DPERL_GLOBAL_STRUCT_PRIVATE builds.
In these builds it is forbidden to have any static variables, but each
of these functions (via convoluted macros) has a static var called
'strategy' which records, for each function, whether a run-time probe
has been done to determine the best way of achieving close-exec
functionality, and the result.
Replace them all with 'global' vars: PL_strategy_dup2 etc.
NB these vars aren't thread-safe but it doesn't really matter, as the
worst that can happen is for a redundant probe or two to be done before
a suitable "don't probe any more" value is written to the var and seen
by all the threads.
|
|
|
|
|
|
|
| |
A global hash has to be specially handled. The keys can't be shared,
and all the SVs stored into it must be in its thread. This commit adds
the hash, and initialization, and macros for context change, but doesn't
use them. The code to deal with this is entirely confined to regcomp.c.
|
|
|
|
| |
This will be used in future commits
|
|
|
|
| |
This will be used in a future commit.
|
|
|
|
|
| |
The inversion list this refers to now includes the Latin 1 range, so the
name was misleading.
|
|
|
|
|
|
| |
This variable's name was out-of-date and misleading. It is the name of
an inversion list that contains all the code points in the current
version of Unicode that participate in any way in a /i type of fold.
|
|
|
|
|
|
|
| |
This table contains all the code points that are in any multi-character
fold (not the folded-from character, but what that character folds to).
It will be used in a future commit.
|
|
|
|
|
| |
These variables are constant, once initialized, through the life of a
program, so having them be per instance is a waste of time and space
|
|
|
|
|
|
|
|
|
|
|
|
| |
Under /a pattern matching, the matches of the [:posix:] classes are
restricted to the ASCII range. Previously, in a time/space trade-off
that favored space, we created the list of matching characters at
pattern compilation time by ANDing the full-range Posix class with the
set of ASCII characters.
But now, the tables for just the ASCII-range classes are generated
anyway, so there's no need to do that compilation-time intersection.
This slightly simplifies the code.
|
|
|
|
|
|
|
|
|
|
| |
This commit changes to use the C data structures generated by the
previous commit to compute what characters fold to a given one. This is
used to find out what things should match under /i.
This now avoids the expensive start up cost of switching to perl
utf8_heavy.pl, loading a file from disk, and constructing a hash from
it.
|
|
|
|
|
|
|
|
| |
This commit makes the inversion lists for parsing character name global
instead of interpreter level, so can be initialized once per process,
and no copies are created upon new thread instantiation. More
importantly, this is another instance where utf8_heavy.pl no longer
needs to be loaded, and the definition files read from disk.
|
|
|
|
|
| |
These are now constant through the life of the program, so don't need to
be duplicated at each new thread instantiation.
|
|
|
|
|
|
|
|
|
|
| |
These structures are read-only, use const C strings, and are truly
global, so no need to have them be interpreter level. This saves
duplicating and freeing them as threads come and go.
In doing this, I noticed that not every one was properly being
copied/deallocated, so this fixes some potential unreported bugs, and
leaks.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It is possible for operations on threaded perls which don't 'use locale'
to still change the locale. This happens when calling
POSIX::localeconv() and I18N::Langinfo(), and in earlier perls, it can
happen for other operations when perl has been initialized with the
environment causing the various locale categories to not have a uniform
locale.
This commit causes the areas where the locale for this category should
predictably be in one or the other state to be a critical section where
another thread can't interrupt and change it. This is a separate
mutex, so that only these particular operations will be held up.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit e6a172f358c0f48c4b744dbd5e9ef6ff0b4ff289,
which was a revert of a3bf60fbb1f05cd2c69d4ff0a2ef99537afdaba7.
Add new hashing and "hash with state" infrastructure
This adds support for three new hash functions: StadtX, Zaphod32 and SBOX,
and reworks some of our hash internals infrastructure to do so.
SBOX is special in that it is designed to be used in conjuction with any
other hash function for hashing short strings very efficiently and very
securely. It features compile time options on how much memory and startup
time are traded off to control the length of keys that SBOX hashes.
This also adds support for caching the hash values of single byte characters
which can be used in conjuction with any other hash, including SBOX, although
SBOX itself is as fast as the lookup cache, so typically you wouldnt use both
at the same time.
This also *removes* support for Jenkins One-At-A-Time. It has served us
well, but it's day is done.
This patch adds three new files: zaphod32_hash.h, stadtx_hash.h,
sbox32_hash.h
|
|
|
|
|
|
| |
This reverts commit a3bf60fbb1f05cd2c69d4ff0a2ef99537afdaba7.
Accidentally pushed work pending unfreeze.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds support for three new hash functions: StadtX, Zaphod32 and SBOX,
and reworks some of our hash internals infrastructure to do so.
SBOX is special in that it is designed to be used in conjuction with any
other hash function for hashing short strings very efficiently and very
securely. It features compile time options on how much memory and startup
time are traded off to control the length of keys that SBOX hashes.
This also adds support for caching the hash values of single byte characters
which can be used in conjuction with any other hash, including SBOX, although
SBOX itself is as fast as the lookup cache, so typically you wouldnt use both
at the same time.
This also *removes* support for Jenkins One-At-A-Time. It has served us
well, but it's day is done.
This patch adds three new files: zaphod32_hash.h, stadtx_hash.h,
sbox32_hash.h
|
|
|
|
|
|
|
| |
Because if we're running under a Unix shell, the path separator is
likely to meet the expectations of Unix shell scripts better if it's
the Unix ':' rather than the VMS '|'. There is no change when
running under DCL.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit is the first step in making locale handling thread-safe.
[perl #127708] was solved for 5.24 by adding a mutex in this function.
That bug was caused by the code changing the locale even if the calling
program is not consciously using locales.
Posix 2008 introduced thread-safe locale functions. This commit changes
this function to use them if the perl is threaded and the platform has
them available. This means that the mutex is avoided on modern
platforms.
It restructures the function to return a mortal copy of the error
message. This is a step towards making the function completely thread
safe. Right now, as documented, if you do 'use locale', locale handling
isn't thread-safe.
A global C locale object is created and used here if necessary. It is
destroyed at the end of the program.
Note that some platforms have a strerror_r(), which is automatically
used instead of strerror() if available. It differs form straight
strerror() by taking a buffer to place the returned string, so the
return does not point to internal static storage. One could test for
the existence of this and avoid the mortal copy.
|
|
|
|
|
| |
This adds a new mutex for use in the next commit for use with locale
handling.
|
| |
|
|
|
|
|
|
|
|
| |
Because some platforms (like HP-UX 10.*) have HUGE_VAL as DBL_MAX,
which, while large, is not quite the infinity. So have infinity
own our very own.
Similarly for NV_NAN.
|
|
|
|
|
|
| |
This reverts commit c1cec775e9019cc8ae244d4db239a7ea5c0b343e.
See ticket #120864.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The cop address for each breakable line was being stored in the IVX
slot of ${"_<$file"}[$line]. This value itself, writable from Perl
space, was being used as the address of the op to be flagged, whenever
a breakpoint was set.
This meant writing to ${"_<$file"}[$line] and assigning a number (like
42) would cause perl to use 42 as an op address, and crash when trying
to flag the op.
Furthermore, since the array holding the lines could outlive the ops,
setting a breakpoint on the op could write to freed memory or to an
unrelated op (even a different type), potentially changing the beha-
viour of unrelated code.
This commit solves those pitfalls by moving breakpoints into a global
breakpoint bitfield. Dbstate ops now have an extra field on the end
holding a sequence number, representing which bit holds the breakpoint
for that op.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch does the following:
*) Introduces multiple new hash functions to choose from at build
time. This includes Murmur-32, SDBM, DJB2, SipHash, SuperFast, and
One-at-a-time. Currently this is handled by muning hv.h. Configure
support hopefully to follow.
*) Changes the default hash to Murmur hash which is faster than the
old default One-at-a-time.
*) Rips out the old HvREHASH mechanism and replaces it with a
per-process random hash seed.
*) Changes the old PL_hash_seed from an interpreter value to a
global variable. This means it does not have to be copied during
interpreter setup or cloning.
*) Changes the format of the PERL_HASH_SEED variable to a hex
string so that hash seeds longer than fit in an integer are possible.
*) Changes the return of Hash::Util::hash_seed() from a number to a
string. This is to accomodate hash functions which have more bits than
can be fit in an integer.
*) Adds new functions to Hash::Util to improve introspection of hashes
-) hash_value() - returns an integer hash value for a given string.
-) bucket_info() - returns basic hash bucket utilization info
-) bucket_stats() - returns more hash bucket utilization info
-) bucket_array() - which keys are in which buckets in a hash
More details on the new hash functions can be found below:
Murmur Hash: (v3) from google, see
http://code.google.com/p/smhasher/wiki/MurmurHash3
Superfast Hash: From Paul Hsieh.
http://www.azillionmonkeys.com/qed/hash.html
DJB2: a hash function from Daniel Bernstein
http://www.cse.yorku.ca/~oz/hash.html
SDBM: a hash function sdbm.
http://www.cse.yorku.ca/~oz/hash.html
SipHash: by Jean-Philippe Aumasson and Daniel J. Bernstein.
https://www.131002.net/siphash/
They have all be converted into Perl's ugly macro format.
I have not done any rigorous testing to make sure this conversion
is correct. They seem to function as expected however.
All of them use the random hash seed.
You can force the use of a given function by defining one of
PERL_HASH_FUNC_MURMUR
PERL_HASH_FUNC_SUPERFAST
PERL_HASH_FUNC_DJB2
PERL_HASH_FUNC_SDBM
PERL_HASH_FUNC_ONE_AT_A_TIME
Setting the environment variable PERL_HASH_SEED_DEBUG to 1 will make
perl output the current seed (changed to hex) and the hash function
it has been built with.
Setting the environment variable PERL_HASH_SEED to a hex value will
cause that value to be used at the seed. Any missing bits of the seed
will be set to 0. The bits are filled in from left to right, not
the traditional right to left so setting it to FE results in a seed
value of "FE000000" not "000000FE".
Note that we do the hash seed initialization in perl_construct().
Doing it via perl_alloc() (via init_tls) causes problems under
threaded builds as the buffers used for reentrant srand48 functions
are not allocated. See also the p5p mail "Hash improvements blocker:
portable random code that doesnt depend on a functional interpreter",
Message-ID:
<CANgJU+X+wNayjsNOpKRqYHnEy_+B9UH_2irRA5O3ZmcYGAAZFQ@mail.gmail.com>
|
|
|
|
|
| |
This function provides a convenient and thread-safe way for modules to
hook op checking.
|
|
|
|
|
|
|
| |
For the default (non-multiplicity) configuration, PERLVAR*() macros now
directly expand their arguments to tokens such as C<PL_defgv>, instead of
expanding to C<PL_Idefgv>. This removes over 350 lines from F<embedvar.h>,
which defined macros to map from C<PL_Idefgv> to C<PL_defgv> and so forth.
|
|
|
|
|
| |
They exist solely to ensure that Perl_runops_standard and Perl_runops_debug
are linked in - nothing assigns to either variable, and nothing reads them.
|
|
|
|
| |
Make them const U16 - they should have been const from the start.
|
|
|
|
|
| |
Rename PL_interp_size_5_10_0 to PL_interp_size_5_16_0, as it is only intended to
track interpreter size within (forwards) binary compatible maintenance branches.
|
|
|
|
|
| |
To get the initialisation to work, the location of #include patchlevel.h needs
to be moved.
|
|
|
|
|
|
|
|
|
|
|
| |
They were converted in perl.h from const char[] to #define in 31fb120917c4f65d,
then re-instated as const char[], but in perlvars.h, in 3fe35a814d0a98f4.
There's no need for compile-time constants to jump through the hoops of
perlvars.h, even for Symbian, as the various "EXTCONST" variables already in
perl.h demonstrate.
These were the only 3 users of the the PERLVARISC macro, so eliminate that, and
all related code.
|
|
|
|
|
|
|
| |
patleave was added in perl 3.0 patch #35 patch #29 -- 395c379347344a50,
used in scanpat(). scanpat() was refactored and renamed to scan_pat() in
5.0 alpha 2, "commented" out with the C pre-processor in 5.000, and removed in
5.001.
|
|
|
|
|
|
|
|
|
| |
As PL_charclass is a constant, it doesn't need to be accessed via the global
struct. It should be exported via globvar.sym, not PERLVARA() in perlvars.h
[With a PERVARA() it all compiles perfectly, once C<dVAR>s are added where
now needed, but the build loops forever because the (real) charclass array is
never initialised]
|
|
|
|
|
|
|
|
|
| |
Previously all the scripts in regen/ had code to generate header comments
(buffer-read-only, "do not edit this file", and optionally regeneration
script, regeneration data, copyright years and filename).
This change results in some minor reformatting of header blocks, and
standardises the copyright line as "Larry Wall and others".
|
|
|
|
|
|
|
|
|
|
|
| |
Eliminate the #define pp_foo Perl_pp_foo(pTHX) macros, and update the 13
locations that relied on them.
regen/opcode.pl now generates prototypes for the PP functions directly, into
pp_proto.h. It no longer writes pp.sym, and regen/embed.pl no longer reads
this, removing the only ordering dependency in the regen scripts. opcode.pl
is now responsible for prototypes for pp_* functions. (embed.pl remains
responsible for ck_* functions, reading from regen/opcodes)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There was some buggy code in Perl_sighandler() related to getting an SV
with the signal name to pass to the perl-level handler function.
`
Basically:
on threaded builds, a sig handler that died leaked PL_psig_name[sig];
on unthreaded builds, in a recursive handler that died, PL_sig_sv was
prematurely freed.
PL_sig_sv was originally just a file static var that was not
recursion-save anyway, and got promoted to perlvars.h when it should
instead have been done away with. So I've got rid of it now, and
rationalised the code, which fixed the two issues listed above.
Also added an assert which makes the dodgy manual popping of the save
stack slightly less dodgy.
|
|
|
|
| |
so newcomers can find it more easily
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When MULTIPLICITY was first developed, and interpreter state moved into an
interpreter struct, thread and interpreter local PL_* variables were defined
as macros that called accessor functions, returning the address of the value,
outside of the perl core. The intent was to allow members within the
interpreter struct to change size without breaking binary compatibility, so
that bug fixes could be merged to a maintenance branch that necessitated such
a size change.
However, some non-core code defines PERL_CORE, sometimes intentionally to
bypass this mechanism for speed reasons, sometimes for other reasons but with
the inadvertent side effect of bypassing this mechanism. As some of this code
is widespread in production use, the result is that the core *can't* change
the size of members of the interpreter struct, as it will break such modules
compiled against a previous release on that maintenance branch. The upshot
is that this mechanism is redundant, and well-behaved code is penalised by
it. Hence it can and should be removed.
Accessor functions are still needed for globals when PERL_GLOBAL_STRUCT is
defined.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These provided a non-public API for the hash and array code to donate free
memory direct to the SV head allocation routines, instead of returning it
to the malloc system with free().
I assume that on some older mallocs this could offer significant benefits.
However, my benchmarking on a modern malloc couldn't detect any significant
effect (positive or negative) on removing the code. Its (continued) presence,
however, has downsides
a: slightly more code complexity
b: slightly larger interpreter structure
c: in the steady state, if net creation of SVs is zero, 1 chunk of allocated
but unused memory will exist (per thread)
So I think it best to remove it.
|
|
|
|
|
|
|
|
| |
New variable PL_rpeepp makes it possible for extensions to hook
the per-op-chain part of the peephole optimiser (which recurses into
side chains). The existing variable PL_peepp still allows hooking the
per-sub part of the peephole optimiser, maintaining perfect backward
compatibility.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds PL_apiversion, allowing the API version of a running interpreter to be
introspected. It is used in the new XS_APIVERSION_BOOTCHECK macro, which is
added to the _boot function of every XS module, to compare it against the API
version the module has been compiled against. If the versions do not match, an
exception is thrown.
This doesn't fully prevent binary incompatible extensions to be loaded. It
merely compares PERL_API_* between compile- and runtime, and does not attempt to
solve the problem of identifying binary incompatible perls with the same API
version (i.e. the same perl version configured with and without DEBUGGING).
|
|
|
|
|
|
|
|
|
|
| |
These take the form of a vtable pushed onto the new PL_blockhooks array.
This could probably do with a API around it later. Separate pre_end and
post_end hooks are needed to capture globals before the stack is unwound
(like needblockscope in the existing code). The intention is that once
a vtable is installed it never gets removed, so where necessary
extensions using this will need to use a hinthv element to determine
whether to do anything or not.
|
|
|
|
|
| |
This is initially intended for threads::shared and shouldn't (yet)
be considered part of the public API.
|
|
|
|
|
|
|
|
| |
When unwinding due to die, the new global PL_restartjmpenv points
to the JMP_ENV at which longjmping should stop and control should
be transferred to PL_restartop. This replaces the previous
use of cxstack[cxstack_ix+1].blk_eval.cur_top_env, located in a
nominally-discarded context frame.
|
| |
|