| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
Adds a new function to statically detect forbidden control flow out of a
block.
|
|
|
|
|
|
|
| |
Returns the CV pointer to the overloaded method,
which will be needed by join to detect concat magic.
Co-authored-by: Philippe Bruhat (BooK) <book@cpan.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This splits a bunch of the subcomponents of the regex engine into
smaller files.
regcomp_debug.c
regcomp_internal.h
regcomp_invlist.c
regcomp_study.c
regcomp_trie.c
The only real change, besides the build machinery changes needed to
achieve the split, is the addition of some new defines which can be used
in embed.fnc to control exports without having to enumerate /every/
regex engine file. For instance, all of the regcomp*.c files define
PERL_IN_REGCOMP_ANY, and this is used in embed.fnc to manage exports.
|
|
|
|
|
|
|
|
|
| |
Runs for identifier-named custom infix operators and sequences of
non-identifier symbol characters.
Defines multiple precedence levels for custom infix operators that fit
alongside exponentiation, multiplication, addition, or relational
comparison operators, as well as a "high" and "low" at either end.
|
|
|
|
|
|
|
|
|
| |
This function conveniently sets the ->op_targ field of the returned op,
making it neater to use inline in larger trees of new*OP functions used
to build optree fragments.
This function is implemented as `static inline`, for speed and code-size
reasons.
|
|
| |
localeconv() returns a structure containing fields that are associated
with two different categories: LC_NUMERIC and LC_MONETARY. Perl via
POSIX::localeconv() returns a hash containing all the fields.
Testing on Windows showed that if LC_CTYPE is not the same locale as
LC_MONETARY for the monetary fields, or isn't the same as LC_NUMERIC for
the numeric ones, mojibake can result.
The solution to similar situations elsewhere in the code is to toggle
LC_CTYPE into being the same locale as the one for the returned fields.
But those situations only have a single locale that LC_CTYPE has to
match, so it doesn't work here when LC_NUMERIC and LC_MONETARY are
different locales. Unlike Schrödinger's cat, LC_CTYPE has to be one or
the other, not both at the same time.
The previous implementation did not consider this possibility, and
wasn't easily changeable to work.
Therefore, this rewrites a bunch of it. The solution used is to call
localeconv() twice when the LC_NUMERIC locale and the LC_MONETARY locale
don't match (with LC_CTYPE toggled to the corresponding one each time).
(Only one call is made if the two categories have the same locale.)
This one vs two complicated the code, but I thought it was worth it
given that the one call is the most likely case.
Another complication is that on platforms that lack nl_langinfo()
(Windows, for example), localeconv() is used to emulate portions of it.
Previously there was a separate function to handle this, using an SV()
cast as an HV() to avoid using a hash that wasn't actually necessary.
That proved to lead to extra duplicated code under the new scheme, so
that function was collapsed into a single one and a real hash is used in
all circumstances, but is only populated with the one or two fields
needed for the emulation.
The only part of this commit that I thought could be split off from the
rest concerns the fact that localeconv()'s return is not thread-safe,
and so must be copied to a safe place (the hash) while in a critical
section, locking out all other threads. Before this commit, that
copying was accompanied by determining if each string field needed to be
marked as UTF-8. That determination isn't necessarily trivial, so
should really not be in the critical section. This commit does that.
And, with some effort, that part could have been split into a separate
commit, but I didn't think it was worth the effort.
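The one-call-versus-two-calls scheme described above can be sketched in standard C. This is a minimal illustration, not the code in locale.c: the struct and all function names are hypothetical, only four fields are copied, a plain struct stands in for the perl hash, and the critical section that the commit message mentions is left out.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for a few fields; the commit stores them in a
 * perl hash instead. */
struct conv_fields {
    char *decimal_point;     /* governed by LC_NUMERIC */
    char *thousands_sep;     /* governed by LC_NUMERIC */
    char *currency_symbol;   /* governed by LC_MONETARY */
    char *mon_decimal_point; /* governed by LC_MONETARY */
};

/* Duplicate a string; abort on allocation failure to keep this short. */
static char *dup_or_die(const char *s)
{
    size_t n = strlen(s) + 1;
    char *d = malloc(n);
    if (!d)
        abort();
    memcpy(d, s, n);
    return d;
}

/* setlocale() may reuse its return buffer, so keep a private copy. */
static char *query_locale(int category)
{
    const char *name = setlocale(category, NULL);
    return dup_or_die(name ? name : "C");
}

/* Not thread-safe as written: no lock is held around localeconv(). */
static void fetch_conv_fields(struct conv_fields *out)
{
    char *numeric  = query_locale(LC_NUMERIC);
    char *monetary = query_locale(LC_MONETARY);
    char *ctype    = query_locale(LC_CTYPE);

    /* First (and usually only) call: LC_CTYPE matches LC_NUMERIC. */
    setlocale(LC_CTYPE, numeric);
    const struct lconv *lc = localeconv();
    out->decimal_point = dup_or_die(lc->decimal_point);
    out->thousands_sep = dup_or_die(lc->thousands_sep);

    /* Second call only when the monetary locale differs. */
    if (strcmp(numeric, monetary) != 0) {
        setlocale(LC_CTYPE, monetary);
        lc = localeconv();
    }
    out->currency_symbol   = dup_or_die(lc->currency_symbol);
    out->mon_decimal_point = dup_or_die(lc->mon_decimal_point);

    setlocale(LC_CTYPE, ctype);   /* restore the caller's LC_CTYPE */
    free(numeric);
    free(monetary);
    free(ctype);
}
```

Copying every string before LC_CTYPE is toggled again matters because the structure returned by localeconv() may be overwritten by later calls to localeconv() or setlocale().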
|
|
|
|
|
| |
This is in preparation for them to be called on platforms where locale
handling is not enabled.
|
|
|
|
| |
This dates from an earlier implementation
|
|
|
|
|
|
| |
These create parameters where the default expression is assigned
whenever the caller did not pass a defined (or true) value. I.e. both if
it is missing, or is present but undef (or false).
|
|
| |
We have a weird bifurcation of the cop logic around threads. With
threads we use a char * cop_file member, without it we use a GV * and
replace cop_file with cop_filegv.
The GV * code refcounts filenames and more or less efficiently shares
the filename amongst many opcodes. However under threads we were
simply copying the filenames into each opcode. This is because in
theory opcodes created in one thread can be destroyed in another. I say
in theory because as far as I know the core code does not actually do
this. But we have tests showing that you can construct a perl, clone it,
and then destroy the original, and have the copy work just fine. This
means that opcodes constructed in the main thread will be destroyed in
the cloned thread. This in turn means that you can't put SV derived
structures into the op-tree under threads, which is why we cannot use
the GV * strategy under threads.
As such this code adds a new struct/type RCPV, which is a refcounted
string using shared memory. This is implemented in such a way that code
that previously used a char * can continue to do so, as the refcounting
data is located a specific offset before the char * pointer itself.
This also allows the len data to be embedded "into" the PV, which allows
us to expose macros to access the length of what is in theory a null
terminated string.
    struct rcpv {
        UV refcount;
        STRLEN len;
        char pv[1];
    };
    typedef struct rcpv RCPV;
The struct is sized appropriately on creation in rcpv_new() so that the
pv member contains the full string plus a null byte. It then returns a
pointer to the pv member of the struct. Thus the refcount and length are
embedded at a predictable offset in front of the char *, which means we
do not have to change any types for members using this.
We provide three operations: rcpv_new(), rcpv_copy() and rcpv_free(),
which roughly correspond with newSVpv(), SvREFCNT_inc(), SvREFCNT_dec(),
and a handful of macros as well. We also expose SAVERCPVFREE which is
similar to SAVEGENERICSV but operates on pv's constructed with
rcpv_new().
Currently I have not restricted use of this logic to threaded perls. We
simply do not use it in unthreaded perls, but I see no reason we
couldn't normalize the code to use this in both cases, except possibly
that actually the GV case is more efficient.
Note that rcpv_new() does NOT use a hash table to dedup strings. Two
calls to rcpv_new() with the same arguments will produce two distinct
pointers with their own refcount data.
Refcounting the cop_file data was Tony Cook's idea.
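The layout trick described above, a header sitting at a fixed offset in front of the char * that callers see, can be shown in plain C. This is only a sketch under stated assumptions: the names rc_str, RC_FROM_PV, rc_new, rc_copy and rc_free are hypothetical stand-ins rather than the perl API, size_t replaces UV and STRLEN, malloc() replaces the shared-memory allocator, and the refcount updates are not protected against concurrent access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative stand-in for the struct in the commit message. */
struct rc_str {
    size_t refcount;
    size_t len;
    char   pv[1];   /* over-allocated to hold the string plus a NUL */
};

/* Recover the header from the char * handed out to callers. */
#define RC_FROM_PV(p) \
    ((struct rc_str *)((char *)(p) - offsetof(struct rc_str, pv)))

/* Allocate header and string in one block; return the string part. */
static char *rc_new(const char *src, size_t len)
{
    struct rc_str *rc = malloc(offsetof(struct rc_str, pv) + len + 1);
    if (!rc)
        return NULL;
    rc->refcount = 1;
    rc->len = len;
    memcpy(rc->pv, src, len);
    rc->pv[len] = '\0';
    return rc->pv;
}

/* Take another reference; the same pointer is shared, nothing is copied. */
static char *rc_copy(char *pv)
{
    if (pv)
        RC_FROM_PV(pv)->refcount++;
    return pv;
}

/* Drop a reference and release the block when the last one goes away.
 * Always returns NULL so callers can clear their pointer in one step. */
static char *rc_free(char *pv)
{
    if (pv) {
        struct rc_str *rc = RC_FROM_PV(pv);
        if (--rc->refcount == 0)
            free(rc);
    }
    return NULL;
}
```

Because callers only ever hold the pointer to the pv member, code that treats the value as an ordinary NUL-terminated char * keeps working unchanged, while code that knows about the header can read the length without calling strlen().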
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
The previous code would handle subroutine attributes directly against
`PL_compcv` as a side-effect of merely parsing the syntax in
`yyl_colon()`, an unlikely place for anyone to find it. This complicates
the way the parser works.
The new structure creates a new function to apply all the builtin
attributes out of an attribute list to any given CV, and invokes it from
the parser at a slightly better time.
|
|
|
| |
Currently you can use sv_dump() to dump an AV or HV as these are
still SV's underneath in C terms, but it will only dump out the top
level object and will not dump out its contents, whereas if you have
an RV which references the same AV or HV it will dump it out to depth
of 4.
This adds av_dump() and hv_dump() which dump up to a depth of 3 (thus
matching what sv_dump() would have showed had it been used to dump
an RV to the same object). It also adds sv_dump_depth() which allows
passing in an arbitrary depth. You could argue the former are redundant
in light of sv_dump_depth(), but the av_dump() and hv_dump() variants
do not require a cast for their arguments.
These functions are provided as debugging aids for development. They
aren't used directly in the core, and they all wrap the same core
routine that is used for sv_dump() (do_sv_dump()).
|
|
|
|
| |
This was generating warnings on several different platforms
|
|
|
|
|
| |
This is in preparation for it to be used in more instances in future
commits. It uses a symbol that won't be defined until those commits.
|
|
|
|
|
|
|
|
|
| |
This changes these functions to take the code page as input, instead of
being just UTF-8. Macros are created to call them with UTF-8.
I'm doing this because there is no loss of efficiency, and it is
somewhat jarring, given Perl terminology, to call a function with 'Byte'
in the name with a parameter with 'utf8' in the name.
|
|
|
|
|
|
|
| |
These are non-API, used in this file, and because of #ifdefs, not
accessible outside it, so there is no current need to make them publicly
available. If we were ever to need them to be accessible more widely,
they would not belong in this file.
|
|
|
|
|
|
| |
There is code in locale.c to emulate POSIX 'setlocale(foo, "")'. And
there is separate code to emulate this on Windows. This commit
collapses them, ensuring the same algorithm is used on both systems.
|
|
|
|
|
| |
This is in preparation for this function to be used under more
circumstances.
|
|
|
|
| |
This makes the calls to it cleaner.
|
|
|
|
|
| |
A future commit will want the context for more than just DEBUGGING
builds.
|
|
|
|
|
|
| |
In reading this code, I realized that there were instances where the
functions didn't work properly. It is hard to test these, but a future
commit will do so.
|
|
|
|
|
| |
This makes some print statements less awkward, and is more flexible,
which will be used in future commits
|
|
|
|
|
|
|
|
| |
S_less_dicey_bool_setlocale_r() is a short function that makes a
complete set of similar functions, but there is no current use of it.
So just #ifdef it out.
This resolves #20338
|
|
|
|
|
| |
This is to prevent warnings due to the char * frequently being
unused.
|
|
|
|
|
| |
setlocale_debug_string() variants now use Perl_form, a function I
didn't know existed when I originally wrote this code.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a new set of macros and functions to do locale changing and
querying for platforms where perl is compiled with threads, but the
platform doesn't have thread-safe locale handling.
All it does is:
1) The return of setlocale() is always safely saved in a per-thread
buffer, and
2) setlocale() is protected by a mutex from other threads which are
using perl's locale functions.
This isn't much, but it might be enough to get some programs to work on
such platforms which rarely change or query the locale.
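The two guarantees listed above can be sketched with POSIX threads and C11 thread-local storage. The name guarded_setlocale and the fixed 256-byte buffer are assumptions made for illustration; the macros and functions actually added by the commit are not reproduced here.

```c
#include <assert.h>
#include <locale.h>
#include <pthread.h>
#include <string.h>

/* One process-wide lock serialises every call that goes through here. */
static pthread_mutex_t locale_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Each thread gets its own copy of the last result (size is arbitrary). */
static _Thread_local char saved_locale[256];

/* Hypothetical wrapper: call setlocale() under the mutex and copy its
 * return value into the per-thread buffer before releasing the lock.
 * Returns NULL if setlocale() fails or the name does not fit. */
static const char *guarded_setlocale(int category, const char *locale)
{
    const char *result = NULL;

    pthread_mutex_lock(&locale_mutex);
    const char *raw = setlocale(category, locale);
    if (raw && strlen(raw) < sizeof saved_locale) {
        strcpy(saved_locale, raw);
        result = saved_locale;
    }
    pthread_mutex_unlock(&locale_mutex);

    return result;
}
```

As the commit message says, this only protects callers that go through the wrapper; a thread that calls setlocale() directly bypasses the mutex entirely.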
|
|
|
|
|
| |
On Configurations without LC_COLLATE, various unused warnings were
being generated.
|
|
|
|
|
| |
On Configurations without LC_CTYPE, various unused warnings were
being generated.
|
|
|
|
|
| |
This function is not used unless locales are enabled, so need not be
defined unless that is true.
|
|
|
|
|
| |
This function is not used unless LC_NUMERIC is enabled, so need not be
defined unless that is true.
|
|
|
|
|
|
|
| |
This encapsulates a common paradigm, helpful for debugging.
It requires the calculate_LC_ALL to be additionally available when there
is no LC_ALL.
|
|
|
|
|
| |
The previous commit removed all but one use of this function, which is
replaceable by an array lookup
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
What these functions do has been subsumed by code introduced in previous
commits, and in a more straightforward manner.
Also removed in this commit is the cache recording which locales are
UTF-8 and which are not. This data is now cheaper to calculate when needed, and
there is now a single entry cache, so I don't think the complexity
warrants keeping it.
It could be added back if necessary, split off from the remainder of
this commit.
|
|
|
|
|
|
|
|
|
|
|
| |
locale.c has the infrastructure to handle this, so remove repeated
logic.
The removed code tried to discern better based on using script runs, but
this actually doesn't help, so is removed.
Since we're now using C99, we can remove the block that was previously
needed, and now the code is properly indented, whereas before it wasn't
|
|
|
|
| |
A future commit will need this even when locales are not used.
|
|
|
|
|
|
| |
This prints out more information, better organized.
It also moves up the info from -DLv to plain -DL
|
|
|
|
|
| |
This is in preparation for working on it; the new name, mem_collxfrm_ is
in compliance with the C Standard; the old was not.
|
|
|
|
|
|
|
| |
Previous commits have made this function much smaller, and its branches
can be easily absorbed into the callers, with clearer code, and in fact
removal of a redundant calculation of the locale's radix character,
promised in a previous commit's message
|
|
|
|
|
| |
my_langinfo_i() now will additionally return the UTF-8ness of the
returned string.
|
|
|
|
|
| |
This is like plain my_strftime(), but additionally returns an indication
of the UTF-8ness of the returned string
|
|
|
|
|
|
|
|
| |
A previous commit moved the logic for localeconv() into locale.c. This
commit takes advantage of that to use it instead of repeating the logic.
Notably, this commit removes the inconsistent duplicate logic that had
been used to deal with the Windows broken localeconv() bug.
|
|
|
|
|
|
|
|
|
|
|
|
| |
The code currently in POSIX.xs is moved to locale.c, and reworked some
to fit in that scheme, and the logic for the workaround for the Windows
broken localeconv() is made more robust.
This is in preparation for the next commit which will use this logic
instead of (imperfectly) duplicating it.
This also creates Perl_localeconv() for direct XS calls of this
functionality.
|
|
|
|
|
| |
get_locale_string_utf8ness_i() will determine if the string it is passed
in the locale it is passed is to be treated as UTF-8, or not.
|
|
|
|
|
|
|
|
|
|
|
| |
Previous commits have added the infrastructure to be able to determine
if a locale is UTF-8. This will prove useful, and this commit adds
a function to encapsulate this information, and uses it in a couple of
places, with more to come in future commits.
This uses as a final fallback, mbtowc(), supposed to be available in
C99. Future commits will add heuristics when that function isn't
available or is known to be unreliable on a particular system.
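A fallback probe based on mbtowc() might look roughly like the following. This is a sketch, not the function the commit adds: the name ctype_locale_decodes_utf8 is hypothetical, the choice of U+FFFD as the probe character is an assumption, and the comparison assumes wchar_t holds Unicode code points on the platform.

```c
#include <assert.h>
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>

/* Ask the current LC_CTYPE locale to decode a known UTF-8 sequence.
 * If all three bytes are consumed and the expected code point comes
 * back, the locale is very likely UTF-8. Uses mbtowc()'s internal
 * state, so it is not safe to call from several threads at once. */
static bool ctype_locale_decodes_utf8(void)
{
    static const char probe[] = "\xEF\xBF\xBD";   /* U+FFFD in UTF-8 */
    wchar_t wc = 0;

    (void) mbtowc(NULL, NULL, 0);                 /* reset shift state */
    int len = mbtowc(&wc, probe, sizeof probe - 1);

    return len == 3 && wc == (wchar_t) 0xFFFD;
}
```

The probe inspects whatever locale LC_CTYPE currently names, so a caller interested in a different locale would have to switch LC_CTYPE first and restore it afterwards.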
|
|
|
|
|
|
|
|
|
|
| |
This commit changes the calling sequence for my_langinfo to add the
desired locale, and the locale category of the desired item.
This allows the function to be able to return the desired value for any
locale, avoiding some locale changes that would happen until this
commit, and hiding the need for locale changes from outside functions,
though a couple continue to do so to avoid potential multiple changes.
|
|
|
|
|
|
|
|
| |
It determines if the name indicates it is UTF-8 or not. There are
several variant spellings in use, and this hides that from the
callers.
It won't be actually used until the next commit
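A name-based check of this kind can be sketched as below. The function name name_looks_utf8 is hypothetical, and the set of spellings it accepts ("UTF-8", "utf8" and case variants, anywhere in the name) is an assumption; the spellings the real function recognises are not listed in the commit message.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/* Return true if the name contains "utf", an optional '-', and then an
 * '8' that is not followed by another digit, ignoring case. */
static bool name_looks_utf8(const char *name)
{
    for (const char *p = name; *p; p++) {
        if (tolower((unsigned char) p[0]) == 'u'
            && tolower((unsigned char) p[1]) == 't'
            && tolower((unsigned char) p[2]) == 'f')
        {
            const char *q = p + 3;
            if (*q == '-')
                q++;
            if (*q == '8' && !isdigit((unsigned char) q[1]))
                return true;
        }
    }
    return false;
}
```

Keeping this test in one place means callers can pass a locale or codeset name as-is and never need to know which spelling a given platform prefers.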
|
|
|
|
|
|
|
|
|
|
|
| |
This makes my_langinfo() reentrant by adding parameters specifying where
to store the result.
This prepares for future commits, and fixes some minor bugs for XS
writers, in that the claim was that the buffer in calling
Perl_langinfo() was safe from getting zapped until the next call to it
in the same thread. It turns out there were cases where, because of
internal calls, the buffer did get zapped.
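The idea of making such a function reentrant by letting the caller say where the result goes can be illustrated with POSIX nl_langinfo(). The function name langinfo_into and its signature are hypothetical; the real my_langinfo() parameters are not shown in the commit message.

```c
#include <assert.h>
#include <langinfo.h>
#include <stdlib.h>
#include <string.h>

/* Copy the nl_langinfo() result into a caller-owned, growable buffer.
 * *buf may start as NULL with *buf_size 0. Returns *buf, or NULL if the
 * buffer could not be grown. Later calls that use a different buffer
 * cannot disturb the string returned here. */
static const char *langinfo_into(nl_item item, char **buf, size_t *buf_size)
{
    const char *raw = nl_langinfo(item);
    size_t need = strlen(raw) + 1;

    if (need > *buf_size) {
        char *grown = realloc(*buf, need);
        if (!grown)
            return NULL;
        *buf = grown;
        *buf_size = need;
    }
    memcpy(*buf, raw, need);
    return *buf;
}
```

With this shape, an internal call made for some other item uses its own buffer, so the value an XS caller is still holding is not overwritten behind its back.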
|
|
|
|
|
|
| |
The preprocessor directives were only looking for plain nl_langinfo().
It's quite unlikely that a platform will have the '_l' version without
also having the plain one. But this makes sure.
|
|
|
|
| |
The extra syllable(s) are unnecessary noise
|