| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
This is an obsolete name, retained for backward compatibility with CPAN.
Make sure the core doesn't have it defined.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Calls to libc snprintf() were overlooked when perl was fixed to change
the radix character to the proper one based on whether or not
'use locale' is in effect. Perl-level code is unaffected, but
core and XS code is.
This commit changes to wrap snprintf() calls with the macros designed
for the purpose, long used for similar situations elsewhere in the code.
Doing this requires the thread context. I achieved this in a few places
by a dTHX, instead of assuming a caller would have the context already
available, and adding a pTHX_ parameter. I tried doing it the other
way, and got a few breakages in our test suite. Formatting already
requires significant CPU time, so this addition should just be in the
noise.
This bug was found by new tests that will be added in a future commit.
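A minimal standalone sketch of the idea, not the core's actual macros: set
LC_NUMERIC to the wanted locale around the snprintf() call, then restore it.
The helper name sketch_format_double() is hypothetical, and no thread
locking is shown.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: format VALUE with LC_NUMERIC temporarily set to
 * WANTED, then restore the previous setting.  Returns the snprintf()
 * result, or -1 if WANTED could not be selected. */
int sketch_format_double(char *buf, size_t size, double value,
                         const char *wanted)
{
    char saved[256];
    const char *current = setlocale(LC_NUMERIC, NULL);
    int len;

    /* setlocale() may reuse its return buffer, so copy before changing. */
    snprintf(saved, sizeof saved, "%s", current ? current : "C");

    if (setlocale(LC_NUMERIC, wanted) == NULL)
        return -1;                  /* requested locale unavailable */

    /* The radix character printed here follows LC_NUMERIC. */
    len = snprintf(buf, size, "%.2f", value);

    setlocale(LC_NUMERIC, saved);   /* put things back as they were */
    return len;
}
```

Calling sketch_format_double(buf, sizeof buf, 1.5, "C") fills buf with the
value formatted using the "C" locale's radix character.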
|
| |
|
|
|
|
|
|
| |
Having half of the comment have the * on the left side is confusing
for humans and especially so for programs. Split the two styles into
two comments.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In GH 20435 many typos in our C code were corrected. However, this pull
request was not applied to blead and developed merge conflicts. I
extracted diffs for the individual modified files and applied them with
'git apply', excepting four files where patch conflicts were reported.
Those files were:
handy.h
locale.c
regcomp.c
toke.c
We can handle these in a subsequent commit. Also, I had to run these two
programs to keep 'make test_porting' happy:
$ ./perl -Ilib regen/uconfig_h.pl
$ ./perl -Ilib regen/regcomp.pl regnodes.h
|
| |
|
|
|
|
|
|
|
|
| |
They are similar to SVf and SVf_QUOTEDPREFIX but take an HV * argument
and use HvNAME() and related macros to extract the string. This is
helpful as it makes constructing error messages from a stash (HV *)
easier. It is the caller's responsibility to ensure that the HV is
actually a stash.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Assignment operators (`==`) were missing, as were both the logical and
the low-precedence shortcutting OR and AND operators (`&&`, `||`,
`and`, `or`).
Also renumbered them around somewhat to even out the spacing. This is
fine during a development cycle.
Also renamed the tokenizer/parser symbol names from "PLUG*OP" to
"PLUGIN_*_OP" for better readability.
|
|
|
|
| |
the numbers
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This splits a bunch of the subcomponents of the regex engine into
smaller files.
regcomp_debug.c
regcomp_internal.h
regcomp_invlist.c
regcomp_study.c
regcomp_trie.c
The only real change, besides the build machinery changes needed to
achieve the split, is the addition of some new defines which can be used
in embed.fnc to control exports without having to enumerate /every/ regex
engine file. For instance, all of regcomp*.c define PERL_IN_REGCOMP_ANY,
and this is used in embed.fnc to manage exports.
|
|
|
|
|
|
|
|
|
| |
Runs for identifier-named custom infix operators and sequences of
non-identifier symbol characters.
Defines multiple precedence levels for custom infix operators that fit
alongside exponentiation, multiplication, addition, or relational
comparison operators, as well as a "high" and "low" at either end.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
localeconv() returns a structure containing fields that are associated
with two different categories: LC_NUMERIC and LC_MONETARY. Perl via
POSIX::localeconv() returns a hash containing all the fields.
Testing on Windows showed that if LC_CTYPE is not the same locale as
LC_MONETARY for the monetary fields, or isn't the same as LC_NUMERIC for
the numeric ones, mojibake can result.
The solution to similar situations elsewhere in the code is to toggle
LC_CTYPE into being the same locale as the one for the returned fields.
But those situations only have a single locale that LC_CTYPE has to
match, so it doesn't work here when LC_NUMERIC and LC_MONETARY are
different locales. Unlike Schrödinger's cat, LC_CTYPE has to be one or
the other, not both at the same time.
The previous implementation did not consider this possibility, and
wasn't easily changeable to work.
Therefore, this rewrites a bunch of it. The solution used is to call
localeconv() twice when the LC_NUMERIC locale and the LC_MONETARY locale
don't match (with LC_CTYPE toggled to the corresponding one each time).
(Only one call is made if the two categories have the same locale.)
Handling one call versus two complicates the code, but I thought it was
worth it given that the one call is the most likely case.
Another complication is that on platforms that lack nl_langinfo()
(Windows, for example), localeconv() is used to emulate portions of it.
Previously there was a separate function to handle this, using an SV()
cast as an HV() to avoid using a hash that wasn't actually necessary.
That proved to lead to extra duplicated code under the new scheme, so
that function was collapsed into a single one and a real hash is used in
all circumstances, but is only populated with the one or two fields
needed for the emulation.
The only part of this commit that I thought could be split off from the
rest concerns the fact that localeconv()'s return is not thread-safe,
and so must be copied to a safe place (the hash) while in a critical
section, locking out all other threads. Before this commit, that
copying was accompanied by determining if each string field needed to be
marked as UTF-8. That determination isn't necessarily trivial, so
should really not be in the critical section. This commit moves it out.
And, with some effort, that part could have been split into a separate
commit, but I didn't think it was worth the effort.
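The one-call-versus-two scheme described above can be sketched in
standalone C roughly as follows. This is an illustration under stated
assumptions, not the code in locale.c: the names sketch_conv,
sketch_copy() and sketch_fetch_conv() are hypothetical, only two fields
are copied, and the critical section that a threaded build would need
around localeconv() is omitted.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

struct sketch_conv {
    char decimal_point[16];      /* an LC_NUMERIC field */
    char mon_decimal_point[16];  /* an LC_MONETARY field */
};

/* Copy SRC into DST, treating NULL as the empty string. */
static void sketch_copy(char *dst, size_t size, const char *src)
{
    snprintf(dst, size, "%s", src ? src : "");
}

/* Fills OUT and returns how many localeconv() calls were needed. */
int sketch_fetch_conv(struct sketch_conv *out)
{
    char ctype[256], numeric[256], monetary[256];
    const struct lconv *lc;
    int calls = 1;

    /* Remember the current locale of each category involved. */
    sketch_copy(ctype, sizeof ctype, setlocale(LC_CTYPE, NULL));
    sketch_copy(numeric, sizeof numeric, setlocale(LC_NUMERIC, NULL));
    sketch_copy(monetary, sizeof monetary, setlocale(LC_MONETARY, NULL));

    /* First call: LC_CTYPE matches the numeric locale. */
    setlocale(LC_CTYPE, numeric);
    lc = localeconv();
    sketch_copy(out->decimal_point, sizeof out->decimal_point,
                lc->decimal_point);

    /* Second call only when the monetary locale differs. */
    if (strcmp(numeric, monetary) != 0) {
        setlocale(LC_CTYPE, monetary);
        lc = localeconv();
        calls = 2;
    }
    sketch_copy(out->mon_decimal_point, sizeof out->mon_decimal_point,
                lc->mon_decimal_point);

    setlocale(LC_CTYPE, ctype);  /* restore the caller's LC_CTYPE */
    return calls;
}
```

When both categories are in the same locale, as at program start, only one
localeconv() call is made.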
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
LC_NAME is a GNU extension that Perl hadn't been aware of. The
consequences were that it couldn't be set or queried in Perl (except by
using LC_ALL to set everything). There are other GNU extensions that
Perl has long known about; this was the only missing one.
The values associated with this category are retrievable by the glibc
call nl_langinfo(3) in XS code. The standard-specified items are
retrievable from pure Perl via I18N::Langinfo, but it doesn't know
about any of the non-standard ones, including the ones for this
category.
|
|
|
|
|
|
| |
The lock expands to nothing if unthreaded, or if thread-local storage is
in effect. Otherwise it protects a global value from being clobbered by
another thread.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Major malloc implementations, including the popular dlmalloc
derivatives, all return chunks of memory that are a multiple of
the platform's pointer size. Perl's traditional default string
allocation of 10 bytes will almost certainly result in a larger
allocation than requested. Consequently, the interpreter may try
to Renew() an allocation to increase the PV buffer size when it
does not actually need to do so.
This commit increases the default string size to the nearest
pointer multiple (12 bytes for 32-bit pointers, 16 bytes for
64-bit pointers). This is almost certainly unnecessarily small
for 64-bit platforms, since most common malloc implementations
seem to return 3*pointer size (i.e. 24 bytes) as the smallest
allocation. However, 16 bytes was chosen to prevent an increase
in memory usage in memory-constrained platforms which might have
a smaller minimum memory allocation.
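The rounding described above can be illustrated with a small standalone
sketch. The function name is hypothetical and this is not the code used in
perl itself; it only shows how 10 bytes becomes 12 or 16 depending on
pointer size.

```c
#include <stddef.h>

/* Round N up to the next multiple of the platform's pointer size.
 * With 4-byte pointers 10 becomes 12; with 8-byte pointers, 16. */
size_t sketch_round_up_to_pointer_multiple(size_t n)
{
    const size_t p = sizeof(void *);

    return ((n + p - 1) / p) * p;
}
```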
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In a nutshell, for a long time the minimum PV length (hardcoded
in Perl_sv_grow) has been 10 bytes and the minimum AV array size
(hardcoded in av_extend_guts) has been 4 elements.
These numbers have been used elsewhere for consistency (e.g.
Perl_sv_grow_fresh) in the past couple of development cycles.
Having a standard definition, rather than hardcoding in multiple
places, is more maintainable. This commit therefore introduces
into perl.h:
PERL_ARRAY_NEW_MIN_KEY
PERL_STRLEN_NEW_MIN
(Note: Subsequent commit(s) will actually change the values.)
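As a rough standalone illustration of why a shared definition helps, the
sketch below uses hypothetical SKETCH_* names and helper functions; the
values mirror the 10-byte and 4-element minimums mentioned above, with the
array minimum expressed as a highest index (an assumption of this sketch).

```c
#include <stddef.h>

/* Hypothetical stand-ins for the shared definitions. */
#define SKETCH_STRLEN_NEW_MIN     10  /* minimum string buffer, in bytes */
#define SKETCH_ARRAY_NEW_MIN_KEY   3  /* highest index, i.e. 4 elements */

/* Every allocation site consults the same constant. */
size_t sketch_string_alloc_len(size_t wanted)
{
    return wanted < SKETCH_STRLEN_NEW_MIN ? SKETCH_STRLEN_NEW_MIN : wanted;
}

ptrdiff_t sketch_array_alloc_key(ptrdiff_t wanted_key)
{
    return wanted_key < SKETCH_ARRAY_NEW_MIN_KEY ? SKETCH_ARRAY_NEW_MIN_KEY
                                                 : wanted_key;
}
```

Changing a minimum then means editing one definition rather than hunting
for every hardcoded literal.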
|
|
|
|
|
|
|
|
|
|
|
|
| |
These macros were defined in perl.h using preprocessor conditionals,
but determining whether I32 is "int" or "long" is pretty hard with the
preprocessor when INTSIZE == LONGSIZE. The Configure script
should know the exact underlying type of I32, so it should be able to
determine more robustly whether %d or %ld should be used to format an
I32 value.
Various pre-configured files, such as uconfig.h, are updated to
align with this.
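The general pattern, a format macro supplied by configuration rather than
guessed with preprocessor size checks, can be sketched with the standard
<inttypes.h> analogue. SKETCH_I32df and sketch_format_i32() are
hypothetical names; PRId32 plays the role that a Configure-generated
definition would play.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for a configuration-provided format: the platform headers
 * know whether int32_t is "int" or "long" and pick "d" or "ld". */
#define SKETCH_I32df PRId32

int sketch_format_i32(char *buf, size_t size, int32_t value)
{
    /* String-literal concatenation builds the full conversion. */
    return snprintf(buf, size, "%" SKETCH_I32df, value);
}
```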
|
|
|
|
|
| |
With RCPV strings we can use the RCPV_LEN() macro, and
make this logic a little less weird.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As noted in the previous commit, some library functions now keep
per-thread state. So far the only ones we care about are libc
locale-changing ones.
When perl changes threads by swapping out tTHX, those library functions
need to be informed about the new value so that they remain in sync with
what perl thinks the locale should be.
This commit creates a function to do this, and changes the
thread-changing macros to also call this as part of the change.
For POSIX 2008, the function just calls uselocale() using the
per-interpreter object introduced previously.
For Windows, this commit adds a per-interpreter string of the current
LC_ALL, and the function calls setlocale on that. We keep the same
string for POSIX 2008 implementations that lack querylocale(), so this
commit just enables that variable on Windows as well. The code is
already in place to free the memory the string occupies when done.
The commit also creates a mechanism to skip this during thread
destruction. A thread in its death throes doesn't need to have accurate
locale information, and the information needed to map from thread to
what libc needs to know gets destroyed as part of those throes, while
relics of the thread remain. I couldn't find a way to accurately know
if we are dealing with a relic or not, so the solution I adopted was to
just not switch during destruction.
This commit completes fixing #20155.
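For the POSIX 2008 case, the shape of such a function might look like the
sketch below. It assumes a glibc-like system where <locale.h> declares
uselocale(); the struct and function names are hypothetical, and the real
per-interpreter data is far richer than this.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stddef.h>

/* Hypothetical per-interpreter state. */
struct sketch_interp {
    locale_t locale_obj;   /* locale object owned by this interpreter */
    int in_destruction;    /* set while the thread is being torn down */
};

static struct sketch_interp *sketch_current;

/* Tell libc which locale the newly current interpreter expects. */
void sketch_sync_locale(const struct sketch_interp *interp)
{
    if (interp->in_destruction)
        return;            /* skip: the mapping may already be gone */
    if (interp->locale_obj != (locale_t)0)
        uselocale(interp->locale_obj);
}

/* The context switch itself also performs the locale sync. */
void sketch_set_context(struct sketch_interp *interp)
{
    sketch_current = interp;
    sketch_sync_locale(interp);
}
```

After sketch_set_context(), a query with uselocale((locale_t)0) reports
the interpreter's own locale object, unless the interpreter is marked as
being destroyed.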
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some configurations require us to store the current locale for each
category. Prior to this commit, this was done in the array
PL_curlocales, with the entry for LC_ALL being in the highest element.
Future commits will need just the value for LC_ALL in some other
configurations, without needing the rest of the array. This commit
splits off the LC_ALL element into its own per-interpreter variable to
accommodate those. It always had to have special handling anyway beyond
the rest of the array elements.
|
|
|
|
|
| |
This has gotten too twisty little mazy over time. Clean it up, add
comments, and make sure the logic is the same on both.
|
|
|
|
|
|
| |
If there aren't threads, yes locales are trivially thread-safe, but
the code that gets executed to make them so doesn't need to get
compiled, and that is controlled by this #define.
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's unlikely that perl will be compiled without the LC_CTYPE locale
category being enabled. But if it isn't, there is no sense in having
per-interpreter variables for various conditions in it, and no sense
having code that tests those variables.
This commit changes a macro to always yield 'false' when this is
disabled, adds a new similar macro, and changes some occurrences that
test for a variable to use the macros instead of the variables. That
way the compiler knows these conditions can never be true.
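A minimal sketch of the technique, with hypothetical SKETCH_* names rather
than the real macros: when the category is compiled out, the test macro is
a constant false, so the variable need not exist and the guarded branch can
be discarded by the compiler.

```c
#include <stdbool.h>

/* Hypothetical per-interpreter state. */
struct sketch_interp {
#ifdef SKETCH_USE_LOCALE_CTYPE
    bool in_utf8_ctype_locale;  /* only exists when LC_CTYPE is enabled */
#endif
    int unrelated_state;
};

#ifdef SKETCH_USE_LOCALE_CTYPE
#  define SKETCH_IN_UTF8_CTYPE_LOCALE(interp) ((interp)->in_utf8_ctype_locale)
#else
   /* Constant false: callers compile, but the branch is dead code. */
#  define SKETCH_IN_UTF8_CTYPE_LOCALE(interp) ((void)(interp), false)
#endif

int sketch_max_char_bytes(const struct sketch_interp *interp)
{
    if (SKETCH_IN_UTF8_CTYPE_LOCALE(interp))
        return 4;   /* multi-byte characters are possible */
    return 1;
}
```

Callers test the macro instead of the variable, so the same source builds
in both configurations.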
|
|
|
|
| |
The outer pair is all that is necessary.
|
|
|
|
|
|
|
| |
There are various system calls used by perl that need to be protected by
a mutex in some configurations. This commit adds the ones not
previously added, for use in future commits. Further details are
in the merge commit message for this series of commits.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are cases where an executing function is vulnerable to either the
locale or environment being changed by another thread. This commit
implements macros that use mutexes to protect these critical sections.
There are two cases: one where the functions only read; and one where
they also need exclusive control so that a competing thread can't
overwrite the returned static buffer before it is safely copied.
5.32 had a placeholder for these, but didn't actually implement it.
Instead it locked just the ENV portion. On modern platforms with
thread-safe locales, the locale portion is a no-op anyway, so things
worked on them.
This new commit extends that safety to other platforms. This has long
been a vulnerability in Perl.
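The two cases can be sketched with a POSIX read-write lock. This is only
an illustration: the SKETCH_* macros and helper functions are hypothetical,
and the commit's own macros are not shown here.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static pthread_rwlock_t sketch_lock = PTHREAD_RWLOCK_INITIALIZER;

#define SKETCH_READ_LOCK()      pthread_rwlock_rdlock(&sketch_lock)
#define SKETCH_EXCLUSIVE_LOCK() pthread_rwlock_wrlock(&sketch_lock)
#define SKETCH_UNLOCK()         pthread_rwlock_unlock(&sketch_lock)

/* Case 1: the function only reads the environment, so many threads
 * may run it at once.  Returns 1 if NAME is set, else 0. */
int sketch_copy_env(const char *name, char *buf, size_t size)
{
    const char *value;
    int found;

    SKETCH_READ_LOCK();
    value = getenv(name);
    found = (value != NULL);
    snprintf(buf, size, "%s", found ? value : "");
    SKETCH_UNLOCK();
    return found;
}

/* Case 2: the result may live in a static buffer, so hold the lock
 * exclusively until it has been copied somewhere safe. */
void sketch_copy_locale_name(char *buf, size_t size)
{
    const char *name;

    SKETCH_EXCLUSIVE_LOCK();
    name = setlocale(LC_ALL, NULL);
    snprintf(buf, size, "%s", name ? name : "");
    SKETCH_UNLOCK();
}
```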
|
|
|
|
| |
So they are closer to related statements
|
|
|
|
| |
The old name was confusing.
|
|
|
|
|
| |
These are subsumed by gwENVr_LOCALEr_LOCK created in the previous
commit.
|
|
|
|
|
| |
This is for functions that read the locale and environment and write to
some global space.
|
|
|
|
|
| |
And remove the similar advice that applied only to STMT_START {}
STMT_END.
|
|
|
|
|
| |
Previous commits have left this empty except for comments, and
equivalent comments have also been added elsewhere
|
|
|
|
| |
To enable future simplifications
|
|
|
|
| |
This simplifies slightly, and will allow further simplification
|
|
|
|
|
|
| |
This macro is used to surround raw setlocale() calls so that the return
value in a global static buffer can be saved without interference with
other threads.
|
| |
|
|
|
|
|
| |
LOCALE_LOCK has already been defined in all circumstances earlier in the
file
|
|
|
|
|
| |
This reverts commit d0b8b8e8a48798446382161f988e6081140578d6.
I got ahead of myself. This commit was premature.
|
|
|
|
| |
This simplifies slightly, and will allow further simplification
|
|
|
|
|
|
|
|
|
| |
Without this commit, Perl won't compile if -DUSE_NL_LOCALE_NAME is
specified to Configure. This is an undocumented feature that uses an
undocumented glibc feature that is effectively the querylocale() found
on Darwin and some other systems. POSIX 2017 has added a
querylocale-like function to the repertoire, and should eventually
supplant this option.
|
|
|
|
| |
This is needed in just one function, in locale.c, so move it there.
|
|
|
|
| |
This is needed in precisely one place in the code, so move it to there.
|
|
|
|
|
| |
This commit uses the new macro introduced in the previous commit to
define the internal locale mutex macros in POSIX.xs
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some functions return a result in a global-to-the-program buffer, or
they use global memory internally. Other threads must be kept from
simultaneously using that function. This macro is to be used for all
such ones dealing with locales. Ideally, there would be a separate mutex
for each such buffer space. But these functions also have to lock the
locale from changing during their execution, and there aren't that many
such functions, and they actually are rarely executed. So a single lock
will do.
This will allow future commits to have more targeted locking for
functions that don't affect the global locale.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit removes the separate mutex for locking locale-related
numeric operations on threaded perls, using the general locale one
instead. The previous commit made that a general semaphore, so it is now
suitable for this purpose as well.
This means that the locale can be locked for the duration of some
sprintf operations, longer than before this commit. But on most modern
platforms, thread-safe locales cause this lock to expand just to a
no-op; so there is no effect on these. And on the impacted platforms,
one is not supposed to be using locales and threads in combination, as
races can occur. This lock is used on those perls to keep Perl's
manipulation of LC_NUMERIC thread-safe. And for those there is also no
effect, as they already lock around those sprintf's.
|
|
|
|
|
| |
Future commits will use this new capability, including in configurations
where no locale locking is currently necessary.
|
|
|
|
| |
Disposing of the trivial case first makes things easier to read.
|