This reverts commit 9e254b0b5b145c9bfc3053e778e9f7fbb3760b45.
Date: Wed Apr 5 12:26:26 2023 -0600
This fixes GH #21040
The reverted commit caused failures in platforms using the musl library,
notably Alpine Linux. I came up with a fix for that, which instead
broke Windows. In looking at that I realized the original fix is
incomplete, and that things are too precarious to try to fix so close to
5.38.0. For example, I spent hours because a %p format printed 0 for
what turned out to be a non-NULL string pointer. I think it has to do
with the fact that the failing code is in the middle of transitioning
between threads, and the printing got confused as a result.
The reverted commit was part of a series fixing #20155 and #20231. But
the earlier part of the series succeeded in fixing those, without that
commit, so reverting it should not cause things to break as a result.
This whole issue has to do with locales and threading. Those still
don't play well together. I have a series of well over 200 commits that
address this situation, for applying in early 5.39. My point is that we
are a long way from solving these kinds of issues; and they don't come
up that much in the field because they just don't get used. The
reverted commit would help if it worked properly, but it's not the only
thing wrong by a long shot.
---
This is a macro that does a quick check before calling a function to
actually do the work. The sense of that check was reversed.
The check is repeated in the function, but this time correctly.
The bottom line was if the function should be called, the macro failed
to call it. If it shouldn't be called the macro would call it, but the
check in the function caused it to return without doing anything. Hence
this whole thing was a no-op.
However, I can't get things to fail without this patch. ISTR (I seem
to recall) this was the result of a BBC (blead breaks CPAN), with
another one likely affected, but I can't find them now.
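A minimal sketch of the bug pattern described above. All names are hypothetical and this is not the actual Perl macro; it only shows how a reversed quick check in the wrapper, combined with a correct re-check inside the function, turns the whole thing into a silent no-op.

```c
#include <stdbool.h>

/* Hypothetical names throughout; not Perl's actual macro or function. */
static int work_done = 0;              /* counts how often real work ran */

static bool needs_work(int state)      /* the cheap "quick check" */
{
    return state != 0;
}

static void do_work(int state)
{
    if (!needs_work(state))            /* the function repeats the check */
        return;
    work_done++;
}

/* Buggy wrapper: the sense of the quick check is reversed, so do_work()
 * is only ever called when it will immediately return. */
#define MAYBE_DO_WORK_BUGGY(s) \
    do { if (!needs_work(s)) do_work(s); } while (0)

/* Fixed wrapper: call the function only when the quick check passes. */
#define MAYBE_DO_WORK(s) \
    do { if (needs_work(s)) do_work(s); } while (0)
```

With the buggy wrapper, `work_done` never changes no matter what state is passed in, which is why nothing visibly fails without the fix.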
---
Our checks on the define info we expose via Internals::V(), especially
the sorted part, did not really work properly as it only checked defines
that are actually exposed in our standard builds. Many of the defines
that are exposed in this list are special cases that would not be
enabled in a normal build we test under CI, and indeed prior to this
patch it was possible for us to produce unsorted output if certain
defines were enabled.
This patch adds checks that read the actual code. They check that the
define and the string are the same, and that the strings would be
output in sorted order assuming every define was enabled.
There are two historical exceptions where the string we show and the
define we use internally are different, but we work around these two
cases with a special-case hash.
---
There are two classes of warnings that account for a very large number of
the warnings produced when building on HPUX Itanium. We know the cause
of these warnings and we are ok with ignoring them.
One set comes from our memory wrap checks, where we end up doing a
comparison against constants in certain conditions. See the comments in
handy.h line 2723 related to PERL_MALLOC_WRAP.
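The comparison-against-constants case can be illustrated with a generic overflow guard. This is a sketch, not the actual PERL_MALLOC_WRAP code in handy.h, and `WOULD_WRAP`/`checked_bytes` are hypothetical names: when the element size is the constant 1, the guard collapses to `count > SIZE_MAX`, which can never be true for a `size_t`, and that kind of always-false comparison is what some compilers warn about.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guard: nonzero if count * elem_size would exceed SIZE_MAX.
 * elem_size must be nonzero. With elem_size == 1 this is always false. */
#define WOULD_WRAP(count, elem_size) \
    ((size_t)(count) > SIZE_MAX / (size_t)(elem_size))

/* Returns the byte count, or 0 to signal overflow (a real allocator
 * would report the error instead). */
static size_t checked_bytes(size_t count, size_t elem_size)
{
    return WOULD_WRAP(count, elem_size) ? 0 : count * elem_size;
}
```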
The other set comes from our common "trick" of doing OO in C code with
casting. This is the foundation of how we manage SV types and how we
manage regular expression ops (regops).
If this logic really was a problem then we would have lots of test
failures and segfaults, and we do not, so we can silence them.
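A minimal sketch of the "OO in C by casting" pattern, using hypothetical shape types rather than Perl's SV bodies or regops: every derived struct starts with a common header, and code casts a base pointer to the derived type after checking a type tag. The downcasts are the kind of conversion that can draw compiler warnings, yet they are correct as long as the tag is honest.

```c
/* Hypothetical types; Perl's SVs and regops are far more elaborate. */
typedef enum { SHAPE_RECT, SHAPE_SQUARE } shape_type;

typedef struct { shape_type type; } shape;          /* common "base" */
typedef struct { shape base; int w, h; } rect;     /* base comes first */
typedef struct { shape base; int side; } square;

static int area(const shape *s)
{
    switch (s->type) {
    case SHAPE_RECT: {
        /* Downcast: valid because s really points at the first member of
         * a rect, and C guarantees that a pointer to a struct, suitably
         * converted, points to its first member and vice versa. */
        const rect *r = (const rect *)s;
        return r->w * r->h;
    }
    case SHAPE_SQUARE: {
        const square *q = (const square *)s;
        return q->side * q->side;
    }
    }
    return -1;  /* unknown type tag */
}
```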
---
mauke: it was added by Andy Lester in 6f207bd3ddac24959aa7f00f2d7a66f116dcc7ed
mauke: when he replaced '/*EMPTY*/;' statements by 'NOOP;'
mauke: I would also remove the comment
---
Parens are required or precedence issues can occur.
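The classic illustration, with hypothetical macro names: both the whole expansion and each use of an argument need parentheses, or the surrounding operators regroup the expression.

```c
/* Hypothetical macros showing why parentheses matter in expansions. */
#define DOUBLE_BAD(x)   x + x          /* no outer parentheses */
#define DOUBLE_GOOD(x)  ((x) + (x))

#define SQUARE_BAD(x)   (x * x)        /* argument not parenthesized */
#define SQUARE_GOOD(x)  ((x) * (x))
```

Here `3 * DOUBLE_BAD(2)` expands to `3 * 2 + 2`, which is 8 rather than 12, and `SQUARE_BAD(1 + 2)` expands to `(1 + 2 * 1 + 2)`, which is 5 rather than 9. Parentheses do not fix double evaluation, so arguments with side effects remain unsafe in these macros.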
---
The leading underbar is reserved by C.
These defines are debugging only "recursion" depth related counters
injected into the function macro wrappers when a function is marked as
'W', much the same way that aTHX_ and pTHX_ are when building under
threaded builds. The functions are expected to increment the depth
parameter themselves. Note that "recursion" is quoted above because in
practice currently they are only used by the regex engine when recursing
virtually, and they do not relate to true C stack related recursion.
(But they could be used for tracking C level recursion under debugging
if someone needed it.)
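A sketch of how such a depth parameter can be spliced into prototypes and calls the way pTHX_ and aTHX_ are. The macro names (`pDEPTH_`, `aDEPTH_`, `aDEPTH0_`) and the `TRACK_DEPTH` switch are hypothetical, chosen with a trailing rather than leading underscore; the embed.fnc 'W' machinery that generates the real wrappers is not modelled. Undefining `TRACK_DEPTH` makes the extra parameter disappear entirely.

```c
/* Hypothetical stand-in for a DEBUGGING build. */
#define TRACK_DEPTH 1

#ifdef TRACK_DEPTH
#  define pDEPTH_   unsigned depth,   /* extra leading parameter */
#  define aDEPTH_   depth,            /* pass the current depth on */
#  define aDEPTH0_  0u,               /* top-level call starts at 0 */
#else
#  define pDEPTH_
#  define aDEPTH_
#  define aDEPTH0_
#endif

static unsigned max_depth_seen = 0;

/* The callee increments its own copy of the depth, as described above. */
static int count_down(pDEPTH_ int n)
{
#ifdef TRACK_DEPTH
    depth++;
    if (depth > max_depth_seen)
        max_depth_seen = depth;
#endif
    if (n == 0)
        return 0;
    return 1 + count_down(aDEPTH_ n - 1);
}
```

Calling `count_down(aDEPTH0_ 3)` returns 3 and records a maximum depth of 4 (one level per call, including the final `n == 0` call).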
---
The limited POSIX guarantees of thread safety for nl_langinfo_l() aren't
enough for our uses, and I was naive to think that a simple Configure probe
could rule out all possible thread-safety issues that might exist in a
libc call. I don't remember what the platforms were that falsely tested
ok for the probe, but if it were necessary to find out, revert this
patch, and start a smoke-me test.
What that Configure probe did was find one particular point of
non-safety. And it turns out various platforms pass that, but don't
have a thread-safe nl_langinfo_l() generally.
There are two calls to nl_langinfo_l() in the code. This commit removes
one, where the major advantage of using nl_langinfo_l() over plain
nl_langinfo() was efficiency. There still had to be an alternate
implementation available that used plain nl_langinfo(). Since we can't
guarantee that the _l implementation doesn't have bugs, simply remove
it, and the existing alternative gets automatically used.
The remaining use of nl_langinfo_l() is only when using glibc, and is
disabled by default, requiring an explicit Configure parameter to
enable. I have never seen a case where the glibc implementation failed
to be thread-safe. This use may be enabled by default at some point,
but not until early in a development cycle.
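One way such a compile-time choice between the _l variant and a plain-nl_langinfo() fallback could look, as a sketch only: `TRUST_NL_LANGINFO_L` is a hypothetical stand-in for a Configure result, and `radix_string()` is a hypothetical helper, not perl's code. The fallback makes the wanted locale current for the calling thread with POSIX 2008 uselocale(), then copies the plain nl_langinfo() result while holding a mutex, since its returned buffer may be shared.

```c
#define _POSIX_C_SOURCE 200809L
#include <langinfo.h>
#include <locale.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical Configure result; left undefined so the fallback is used. */
/* #define TRUST_NL_LANGINFO_L 1 */

static pthread_mutex_t langinfo_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Copy the radix character string of locale 'loc' into buf (size >= 1). */
static void radix_string(locale_t loc, char *buf, size_t size)
{
#ifdef TRUST_NL_LANGINFO_L
    /* Efficient path: query 'loc' directly, no thread-global state. */
    strncpy(buf, nl_langinfo_l(RADIXCHAR, loc), size - 1);
#else
    /* Fallback: switch this thread to 'loc', then copy the result of plain
     * nl_langinfo() under a mutex before anyone can overwrite it. */
    locale_t old = uselocale(loc);
    pthread_mutex_lock(&langinfo_mutex);
    strncpy(buf, nl_langinfo(RADIXCHAR), size - 1);
    pthread_mutex_unlock(&langinfo_mutex);
    uselocale(old);
#endif
    buf[size - 1] = '\0';
}
```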
---
Noticed by Zefram
---
We have fixed bugs related to $SIG{__DIE__} being inconsistently
triggered during eval, and we have fixed bugs with compilation
inconsistently stopping after 10 errors. This patch also includes a
micro-tweak to perl.h to allow the threshold to be sanely overridden in
Configure.
---
I did not fully understand the use of yyquit() when I implemented
the SYNTAX_ERROR related stuff. It is not needed, and switching to this
makes eval compile error messages more consistent.
---
This is an obsolete name, retained for backward compatibility with CPAN. Make sure
the core doesn't have it defined.
---
Calls to libc snprintf() were overlooked when perl was fixed to use
the proper radix character based on whether or not 'use locale' is in
effect. Perl-level code is unaffected, but core and XS code is.
This commit wraps snprintf() calls with the macros designed for the
purpose, which have long been used for similar situations elsewhere in
the code.
Doing this requires the thread context. In a few places I obtained it
with a dTHX, instead of assuming that callers would already have the
context available and adding a pTHX_ parameter. I tried doing it the
other way, and got a few breakages in our test suite. Formatting
already requires significant CPU time, so this addition should just be
in the noise.
This bug was found by new tests that will be added in a future commit.
---
Having half of the comment have the * on the left side is confusing
for humans and especially so for programs. Split the two styles into
two comments.
---
In GH 20435 many typos in our C code were corrected. However, this pull
request was not applied to blead and developed merge conflicts. I
extracted diffs for the individual modified files and applied them with
'git apply', excepting four files where patch conflicts were reported.
Those files were:
handy.h
locale.c
regcomp.c
toke.c
We can handle these in a subsequent commit. Also, I had to run these two
programs to keep 'make test_porting' happy:
$ ./perl -Ilib regen/uconfig_h.pl
$ ./perl -Ilib regen/regcomp.pl regnodes.h
---
They are similar to SVf and SVf_QUOTEDPREFIX but take an HV * argument
and use HvNAME() and related macros to extract the string. This is
helpful as it makes constructing error messages from a stash (HV *)
easier. It is the caller's responsibility to ensure that the HV is
actually a stash.
---
Assignment operators (`=`) were missing, as were both the logical and
the low-precedence short-circuiting OR and AND operators (`||`, `&&`,
`or`, `and`).
Also renumbered them somewhat to even out the spacing. This is
fine during a development cycle.
Also renamed the tokenizer/parser symbol names from "PLUG*OP" to
"PLUGIN_*_OP" for better readability.
---
the numbers
---
This splits a bunch of the subcomponents of the regex engine into
smaller files.
regcomp_debug.c
regcomp_internal.h
regcomp_invlist.c
regcomp_study.c
regcomp_trie.c
Besides the changes to the build machinery needed to achieve the split,
the only real change is the addition of some new defines that can be
used in embed.fnc to control exports without having to enumerate
/every/ regex engine file. For instance, all of the regcomp*.c files
define PERL_IN_REGCOMP_ANY, and this is used in embed.fnc to manage
exports.
---
Runs for identifier-named custom infix operators and sequences of
non-identifier symbol characters.
Defines multiple precedence levels for custom infix operators that fit
alongside exponentiation, multiplication, addition, or relational
comparison operators, as well as a "high" and a "low" level at either end.
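The idea of precedence levels that slot in alongside existing operator families can be sketched as an ordered enum. The names and numbers below are hypothetical, not the constants perl defines; leaving gaps between values is one way to keep room for adding levels later.

```c
/* Hypothetical precedence levels for custom infix operators, ordered
 * from loosest to tightest binding. Gaps leave room for new levels. */
enum infix_prec {
    PREC_LOW  = 10,   /* looser than everything below */
    PREC_REL  = 30,   /* alongside relational comparison operators */
    PREC_ADD  = 50,   /* alongside addition */
    PREC_MUL  = 70,   /* alongside multiplication */
    PREC_POW  = 90,   /* alongside exponentiation */
    PREC_HIGH = 110   /* tighter than everything above */
};

/* In "a OP1 b OP2 c", the operator with the higher level binds first. */
static int binds_tighter(enum infix_prec a, enum infix_prec b)
{
    return a > b;
}
```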
---
localeconv() returns a structure containing fields that are associated
with two different categories: LC_NUMERIC and LC_MONETARY. Perl, via
POSIX::localeconv(), returns a hash containing all the fields.
Testing on Windows showed that if LC_CTYPE is not the same locale as
LC_MONETARY for the monetary fields, or isn't the same as LC_NUMERIC for
the numeric ones, mojibake can result.
The solution to similar situations elsewhere in the code is to toggle
LC_CTYPE into being the same locale as the one for the returned fields.
But those situations only have a single locale that LC_CTYPE has to
match, so it doesn't work here when LC_NUMERIC and LC_MONETARY are
different locales. Unlike Schrödinger's cat, LC_CTYPE has to be one or
the other, not both at the same time.
The previous implementation did not consider this possibility, and
wasn't easily changeable to work.
Therefore, this rewrites a bunch of it. The solution used is to call
localeconv() twice when the LC_NUMERIC locale and the LC_MONETARY locale
don't match (with LC_CTYPE toggled to the corresponding one each time).
(Only one call is made if the two categories have the same locale.)
Supporting both one and two calls complicated the code, but I thought
it was worth it given that the single call is the most likely case.
Another complication is that on platforms that lack nl_langinfo(),
(Windows, for example), localeconv() is used to emulate portions of it.
Previously there was a separate function to handle this, using an SV()
cast as an HV() to avoid using a hash that wasn't actually necessary.
That proved to lead to extra duplicated code under the new scheme, so
that function was collapsed into a single one and a real hash is used in
all circumstances, but is only populated with the one or two fields
needed for the emulation.
The only part of this commit that I thought could be split off from the
rest concerns the fact that localeconv()'s return is not thread-safe,
and so must be copied to a safe place (the hash) while in a critical
section, locking out all other threads. Before this commit, that
copying was accompanied by determining if each string field needed to be
marked as UTF-8. That determination isn't necessarily trivial, so it
should really not be in the critical section. This commit moves it out.
With some effort, that part could have been split into a separate
commit, but I didn't think it was worth the effort.
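The one-or-two-calls scheme can be sketched in plain C as follows. This is not perl's code: the mutex, the `lconv_copy` struct (the real code fills a hash), and all function names are hypothetical, and only two representative fields are copied. The locale names and the localeconv() results are copied while the mutex is held, with LC_CTYPE toggled to match each call. Any UTF-8 determination would then run on the private copies after the unlock.

```c
#include <locale.h>
#include <pthread.h>
#include <string.h>

static pthread_mutex_t locale_mutex = PTHREAD_MUTEX_INITIALIZER;

typedef struct {
    char decimal_point[16];    /* an LC_NUMERIC field */
    char currency_symbol[16];  /* an LC_MONETARY field */
    int  localeconv_calls;     /* 1 or 2, kept for illustration */
} lconv_copy;

/* Truncating copy that always NUL-terminates. */
static void copy_str(char *dst, size_t size, const char *src)
{
    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';
}

/* Call localeconv() with LC_CTYPE matching 'name', copying the wanted
 * fields before anything can overwrite its static result. */
static void lconv_with_ctype(const char *name, lconv_copy *out,
                             int want_numeric, int want_monetary)
{
    setlocale(LC_CTYPE, name);
    const struct lconv *lc = localeconv();
    if (want_numeric)
        copy_str(out->decimal_point, sizeof out->decimal_point,
                 lc->decimal_point);
    if (want_monetary)
        copy_str(out->currency_symbol, sizeof out->currency_symbol,
                 lc->currency_symbol);
    out->localeconv_calls++;
}

static void get_lconv(lconv_copy *out)
{
    char numeric[256], monetary[256], ctype[256];

    memset(out, 0, sizeof *out);
    pthread_mutex_lock(&locale_mutex);        /* critical section starts */

    /* setlocale(cat, NULL) also returns static storage: copy it now. */
    copy_str(numeric, sizeof numeric, setlocale(LC_NUMERIC, NULL));
    copy_str(monetary, sizeof monetary, setlocale(LC_MONETARY, NULL));
    copy_str(ctype, sizeof ctype, setlocale(LC_CTYPE, NULL));

    if (strcmp(numeric, monetary) == 0) {
        lconv_with_ctype(numeric, out, 1, 1); /* the common case */
    } else {
        lconv_with_ctype(numeric, out, 1, 0);
        lconv_with_ctype(monetary, out, 0, 1);
    }

    setlocale(LC_CTYPE, ctype);               /* restore LC_CTYPE */
    pthread_mutex_unlock(&locale_mutex);      /* critical section ends */

    /* Deciding whether each copied string is UTF-8 would go here,
     * outside the critical section. */
}
```

In a program that never calls setlocale() with a non-NULL argument, every category is "C", so only one localeconv() call is made.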
---
LC_NAME is a GNU extension that Perl hadn't been aware of. The
consequences were that it couldn't be set or queried in Perl (except by
using LC_ALL to set everything). There are other GNU extensions that
Perl has long known about; this was the only missing one.
The values associated with this category are retrievable by the glibc
call nl_langinfo(3) in XS code. The standard-specified items are
retrievable from pure Perl via I18N::Langinfo, but it doesn't know
about any of the non-standard ones, including the ones for this
category.
---
The lock expands to nothing if perl is unthreaded or if thread-local
storage is in effect. Otherwise, it protects a global value from being
clobbered by another thread.
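A sketch of a lock macro that compiles away when it isn't needed. The configuration switches and macro names are hypothetical stand-ins, not perl's: with threads but no thread-local storage the value is shared and gets a real mutex, while unthreaded or thread-local builds get no-op macros.

```c
#include <pthread.h>

/* Hypothetical configuration switches standing in for Configure results. */
#define USE_THREADS_SKETCH 1
/* #define USE_THREAD_LOCAL_SKETCH 1 */

#if defined(USE_THREADS_SKETCH) && !defined(USE_THREAD_LOCAL_SKETCH)
/* Threads share one copy of the value, so it needs a real lock. */
static pthread_mutex_t value_mutex = PTHREAD_MUTEX_INITIALIZER;
#  define VALUE_LOCK    pthread_mutex_lock(&value_mutex)
#  define VALUE_UNLOCK  pthread_mutex_unlock(&value_mutex)
static int global_value;
#else
/* Unthreaded, or each thread has its own copy: nothing to protect. */
#  define VALUE_LOCK    ((void)0)
#  define VALUE_UNLOCK  ((void)0)
#  if defined(USE_THREAD_LOCAL_SKETCH)
static _Thread_local int global_value;
#  else
static int global_value;
#  endif
#endif

static int bump_global_value(void)
{
    int v;
    VALUE_LOCK;
    v = ++global_value;
    VALUE_UNLOCK;
    return v;
}
```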
---
Major malloc implementations, including the popular dlmalloc
derivatives, all return chunks of memory that are a multiple of
the platform's pointer size. Perl's traditional default string
allocation of 10 bytes will almost certainly result in a larger
allocation than requested. Consequently, the interpreter may try
to Renew() an allocation to increase the PV buffer size when it
does not actually need to do so.
This commit increases the default string size by rounding it up to a
multiple of the pointer size (12 bytes for 32-bit pointers, 16 bytes
for 64-bit pointers). This is almost certainly unnecessarily small
for 64-bit platforms, since most common malloc implementations
seem to return 3*pointer size (i.e. 24 bytes) as the smallest
allocation. However, 16 bytes was chosen to prevent an increase
in memory usage on memory-constrained platforms which might have
a smaller minimum memory allocation.
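The rounding described above is the usual power-of-two alignment trick. A minimal sketch, assuming the pointer size is a power of two; `MIN_STRING_ALLOC` is a hypothetical name, not the perl.h define.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to a multiple of align, which must be a power of two. */
static size_t round_up_pow2(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Hypothetical name: the historical 10-byte minimum rounded up to a
 * multiple of the pointer size (12 with 4-byte, 16 with 8-byte pointers). */
#define MIN_STRING_ALLOC round_up_pow2(10, sizeof(void *))
```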
---
In a nutshell, for a long time the minimum PV length (hardcoded
in Perl_sv_grow) has been 10 bytes and the minimum AV array size
(hardcoded in av_extend_guts) has been 4 elements.
These numbers have been used elsewhere for consistency (e.g.
Perl_sv_grow_fresh) in the past couple of development cycles.
Having a standard definition, rather than hardcoding in multiple
places, is more maintainable. This commit therefore introduces
into perl.h:
PERL_ARRAY_NEW_MIN_KEY
PERL_STRLEN_NEW_MIN
(Note: Subsequent commit(s) will actually change the values.)
---
These macros were defined in perl.h using preprocessor conditionals,
but determining whether I32 is "int" or "long" is pretty hard with
the preprocessor when INTSIZE == LONGSIZE. The Configure script
should know the exact underlying type of I32, so it should be able to
determine more robustly whether %d or %ld should be used to format an
I32 value.
Various pre-configured files, such as uconfig.h, are updated to
align with this.
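C99 solved the analogous problem for `int32_t` with `PRId32` from `<inttypes.h>`, which expands to whichever conversion matches the platform's actual typedef. A sketch of the same idea, where the hypothetical `my_I32` and `MY_I32df` stand in for Perl's I32 and a Configure-provided format string.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins: the type and a format chosen to match it.
 * PRId32 is "d" or "ld", whichever fits this platform's int32_t. */
typedef int32_t my_I32;
#define MY_I32df PRId32

static int format_i32(char *buf, size_t size, my_I32 v)
{
    /* String-literal concatenation yields "%d" or "%ld". */
    return snprintf(buf, size, "%" MY_I32df, v);
}
```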
---
With RCPV strings we can use the RCPV_LEN() macro, and
make this logic a little less weird.
---
As noted in the previous commit, some library functions now keep
per-thread state. So far the only ones we care about are libc
locale-changing ones.
When perl changes threads by swapping out tTHX, those library functions
need to be informed about the new value so that they remain in sync with
what perl thinks the locale should be.
This commit creates a function to do this, and changes the
thread-changing macros to also call this as part of the change.
For POSIX 2008, the function just calls uselocale() using the
per-interpreter object introduced previously.
For Windows, this commit adds a per-interpreter string of the current
LC_ALL, and the function calls setlocale on that. We keep the same
string for POSIX 2008 implementations that lack querylocale(), so this
commit just enables that variable on Windows as well. The code is
already in place to free the memory the string occupies when done.
The commit also creates a mechanism to skip this during thread
destruction. A thread in its death throes doesn't need to have accurate
locale information, and the information needed to map from thread to
what libc needs to know gets destroyed as part of those throes, while
relics of the thread remain. I couldn't find a way to accurately know
if we are dealing with a relic or not, so the solution I adopted was to
just not switch during destruction.
This commit completes fixing #20155.
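For the POSIX 2008 case, the synchronization can be sketched like this. The struct, variable, and function names are hypothetical, and the real interpreter structure and thread-switch macros differ; the only real APIs used are newlocale(), uselocale(), and freelocale(). The Windows variant, which reapplies a saved LC_ALL string with setlocale(), is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stddef.h>

/* Hypothetical per-interpreter data. */
typedef struct {
    locale_t cur_locale;   /* the locale this interpreter believes is current */
} interp;

static interp *current_interp = NULL;  /* stand-in for swapping tTHX */

/* Switch interpreters and bring libc's per-thread locale into sync.
 * 'destructing' mirrors the choice to skip the sync during destruction. */
static void switch_interp(interp *to, int destructing)
{
    current_interp = to;
    if (destructing || to == NULL || to->cur_locale == (locale_t)0)
        return;
    uselocale(to->cur_locale);
}
```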
---
Some configurations require us to store the current locale for each
category. Prior to this commit, this was done in the array
PL_curlocales, with the entry for LC_ALL being in the highest element.
Future commits will need just the value for LC_ALL in some other
configurations, without needing the rest of the array. This commit
splits off the LC_ALL element into its own per-interpreter variable to
accommodate those. It always had to have special handling anyway, beyond
that of the rest of the array elements.
---
This has become too much of a twisty little maze over time. Clean it up, add
comments, and make sure the logic is the same on both.
---
If there aren't threads, then yes, locales are trivially thread-safe, but
the code that gets executed to make them so doesn't need to get
compiled, and that is controlled by this #define.
---
It's unlikely that perl will be compiled without the LC_CTYPE locale
category being enabled. But if it isn't, there is no sense in having
per-interpreter variables for various conditions in it, and no sense
in having code that tests those variables.
This commit changes a macro to always yield 'false' when this is
disabled, adds a new similar macro, and changes some occurrences that
test a variable to use the macros instead of the variables. That
way the compiler knows these two conditions can never be true.
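A sketch of the pattern with hypothetical names: when the category is compiled out, the macro becomes the constant 0, so any `if` it guards is visibly dead and the compiler can drop both the test and the guarded code, along with the need for the variable.

```c
/* Hypothetical switch; left undefined to simulate a build without the
 * LC_CTYPE category. */
/* #define USE_LOCALE_CTYPE_SKETCH 1 */

#ifdef USE_LOCALE_CTYPE_SKETCH
static int in_utf8_ctype_locale;   /* per-interpreter in the real thing */
#  define IN_UTF8_CTYPE_LOCALE  (in_utf8_ctype_locale != 0)
#else
#  define IN_UTF8_CTYPE_LOCALE  0  /* constant: guarded code is dead */
#endif

static int bytes_per_char_guess(void)
{
    if (IN_UTF8_CTYPE_LOCALE)      /* folds to if (0) when compiled out */
        return 4;
    return 1;
}
```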
---
The outer pair is all that is necessary.
---
There are various system calls used by perl that need to be protected by
a mutex in some configurations. This commit adds the ones not
previously added, for use in future commits. Further details are
in the merge commit message for this series of commits.
---
There are cases where an executing function is vulnerable to either the
locale or environment being changed by another thread. This commit
implements macros that use mutexes to protect these critical sections.
There are two cases: one where the functions only read, and
one where they also need exclusive control so that a competing
thread can't overwrite the returned static buffer before it is safely
copied.
5.32 had a placeholder for these, but didn't actually implement it.
Instead it locked just the ENV portion. On modern platforms with
thread-safe locales, the locale portion is a no-op anyway, so things
worked on them.
This new commit extends that safety to other platforms. This has long
been a vulnerability in Perl.
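The two cases map naturally onto a reader/writer lock. This is a sketch, not perl's macros: the single lock and both function names are hypothetical. Pure readers take a shared lock, so they can run concurrently but never in the middle of a writer. A call whose result lives in a static buffer that even another query may overwrite, such as setlocale() with a NULL locale argument, takes the lock exclusively until the result is copied.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical single lock covering both the environment and the locale. */
static pthread_rwlock_t env_locale_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Read-only case: concurrent readers are fine; code that changes the
 * environment or locale would take the write lock. */
static size_t env_value_len(const char *name)
{
    size_t len = 0;
    pthread_rwlock_rdlock(&env_locale_lock);
    const char *v = getenv(name);
    if (v != NULL)
        len = strlen(v);
    pthread_rwlock_unlock(&env_locale_lock);
    return len;
}

/* Exclusive case: hold the lock until the static result is copied.
 * The caller frees the returned copy. */
static char *copy_current_locale(void)
{
    char *copy = NULL;
    pthread_rwlock_wrlock(&env_locale_lock);
    const char *name = setlocale(LC_ALL, NULL);
    if (name != NULL) {
        copy = malloc(strlen(name) + 1);
        if (copy != NULL)
            strcpy(copy, name);
    }
    pthread_rwlock_unlock(&env_locale_lock);
    return copy;
}
```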
---
So they are closer to related statements
---
The old name was confusing.
---
These are subsumed by gwENVr_LOCALEr_LOCK created in the previous
commit.
---
This is for functions that read the locale and environment and write to
some global space.
---
And remove the similar advice that applied only to STMT_START {}
STMT_END.
---
Previous commits have left this empty except for comments, and
equivalent comments have also been added elsewhere.
---
To enable future simplifications
---
This simplifies slightly, and will allow further simplification
---
This macro is used to surround raw setlocale() calls so that the return
value in a global static buffer can be saved without interference with
other threads.