| Commit message | Author | Age | Files | Lines |
|
Prior to this commit, specifying a named sequence would result in a
mostly unhelpful fatal error message. This makes their use legal.
This is also the beginning of allowing Unicode string properties, which
are a new thing in the (still draft) Unicode requirements for regular
expression parsing, UTS 18. Full compliance will have to come later.
|
This fixes #17612
This adds an inline function to pp_hot to be called to determine whether
debugging info should be output, regardless of whether the request comes
from -Dr or from a 'use re Debug' statement.
|
Use of this function was removed as part of adding wildcarding to the
Unicode name property.
|
The compilation of User-defined properties in a regular expression that
haven't been defined at the time that pattern is compiled is deferred
until execution time. Until this commit, any request for debugging info
on those was ignored.
This fixes that by
|
The previous commit added a mutex specifically for protecting against
simultaneous accesses of the environment. This commit changes the
normal getenv, putenv, and clearenv functions to use it, to avoid races.
This simplifies the code in places where we had been burned and had
added special handling to avoid races. Other places where we were
unknowingly exposed to races could have existed until now. Race
protection now comes automatically, and we can remove the special cases
we earlier stumbled over.
getenv() returns a pointer to static memory, which can be overwritten at
any moment from another thread, or even another getenv from the same
thread. This commit changes the accesses to be under control of a
mutex, and in the case of getenv, a mortalized copy is created so that
there is no possible race.
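
The locking-plus-copy idea described above can be sketched roughly as
follows. This is only an illustration, not the core implementation: the
names env_mutex, safe_getenv_copy and safe_setenv are hypothetical, a
plain pthread mutex stands in for the environment mutex, and a
malloc'd copy stands in for the mortalized copy.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the environment mutex described above. */
static pthread_mutex_t env_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Returns a heap copy of the variable's value, or NULL if it is unset
 * (or if allocation fails).  The caller owns the copy and must free() it;
 * the commit instead describes a mortalized copy that is freed for you. */
static char *
safe_getenv_copy(const char *name)
{
    char *copy = NULL;
    const char *value;

    assert(name != NULL);

    pthread_mutex_lock(&env_mutex);
    value = getenv(name);            /* points into shared static storage */
    if (value != NULL) {
        size_t len = strlen(value) + 1;
        copy = malloc(len);
        if (copy != NULL)
            memcpy(copy, value, len);
    }
    pthread_mutex_unlock(&env_mutex);

    return copy;                     /* safe to use after the lock is released */
}

/* Writers take the same lock, so they cannot interleave with readers. */
static int
safe_setenv(const char *name, const char *value)
{
    int rc;

    pthread_mutex_lock(&env_mutex);
    rc = setenv(name, value, 1);
    pthread_mutex_unlock(&env_mutex);
    return rc;
}
```

Because the value is copied while the lock is held, a later write to the
same variable cannot change what an earlier reader already received.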
|
This commit adds wildcard subpatterns for the Name and Name Aliases
properties.
|
This makes special-cased forms such as sort { $b <=> $a }
even faster.
Also, since this commit removes PL_sort_RealCmp, it fixes the
issue with nested sort calls mentioned in gh #16129.
|
This adds two main functions that were previously only defined in
regcomp.c to also be defined in re_comp.c. This allows re.pm to use
debugging with them. To avoid duplicating large data structures,
several lightweight wrapper functions are added to regcomp.c that
re_comp.c calls to access those structures.
|
These have to have a version in re_comp.c for re.pm to work on them.
|
This moves a bunch of entries around so that they make more sense, and
consolidates some blocks that had the same #ifdefs. There should be no
change in what gets compiled.
|
This is in preparation for being called from more than one place.
It has the salubrious effect that the wrapping we do around the user's
supplied pattern is no longer visible in the Debug output of that
pattern.
|
This does the bulk of re_compile(), but is a private entry point,
meaning it takes an extra parameter, and a future commit will call it
from another place.
|
Two of the functions are internal to the core; the third has long been
deprecated.
|
A set operations expression can contain a previously-compiled one
interpolated in. Prior to this commit, some heuristics were employed
to verify it actually was such a thing, and not a sort of look-alike
that wasn't necessarily valid. The heuristics actually forbade legal
ones. I don't know of any illegal ones that were let through, but it is
certainly possible. Also, the error/warning messages referred to the
heuristics, and were unhelpful at best.
The technique used instead in this commit is to return a regop only used
by this feature for any nested compilations. This guarantees that the
caller can determine if the result is valid, and what that result is
without having to do any heuristics or inspecting any flags. The
error/warning messages are changed to reflect this, and I believe are
now helpful.
This fixes the bugs in #16779
https://github.com/Perl/perl5/issues/16779#issuecomment-563987618
|
This is in preparation for it being called from more than one place.
|
There is now a function that generates this error message. This is so
that it is always the same from wherever generated.
|
This is in preparation for it being used elsewhere, to reduce
duplication of code.
|
The remaining function in this file is moved to inline.h, just to not
have an extra file lying around with hardly anything in it.
|
This changes warning messages for too short \0 octal constants to use
the function introduced in the previous commit. This function assures a
consistent and clear warning message, which is slightly different than
the one this commit replaces. I know of no CPAN code which depends on
this warning's wording.
|
This commit causes these functions to allow a caller to request any
messages generated to be returned to the caller, instead of always being
handled within these functions. The messages are somewhat changed from
previously to be clearer. I did not find any code in CPAN that relied
on the previous message text.
Like the previous commit for grok_bslash_c, there are two reasons to do
this, repeated here.
1) In pattern compilation this brings these messages into conformity
with the other ones that get generated in pattern compilation, where
there is a particular syntax, including marking the exact position in
the parse where the problem occurred.
2) These could generate truncated messages due to the (mostly)
single-pass nature of pattern compilation that is now in effect. It
keeps track of where during a parse a message has been output, and
won't output it again if a second parsing pass turns out to be
necessary. Prior to this commit, it had to assume that a message
from one of these functions did get output, and this caused some
out-of-bounds reads when a subparse (using a constructed pattern) was
executed. The possibility of those went away in commit 5d894ca5213,
which guarantees it won't try to read outside bounds, but that may
still mean it is outputting text from the wrong parse, giving
meaningless results. This commit should stop that possibility.
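
The general pattern of letting the caller ask for the message text
rather than having the function emit it can be sketched as below. The
function parse_octal, its message text, and its parameters are
hypothetical and are not the signatures of the functions this commit
changes; the sketch only shows the optional out-parameter idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical parser for up to three octal digits.  On success it stores
 * the value in *result and returns true.  On failure it returns false and
 * either hands the diagnostic back through *message (when the caller passed
 * a non-NULL 'message') or prints it itself, as a function that always
 * handles its own messages would. */
static bool
parse_octal(const char *s, size_t len, unsigned *result, const char **message)
{
    unsigned value = 0;
    size_t i;

    assert(s != NULL && result != NULL);

    if (message != NULL)
        *message = NULL;

    for (i = 0; i < len && i < 3; i++) {
        if (s[i] < '0' || s[i] > '7')
            break;
        value = value * 8 + (unsigned)(s[i] - '0');
    }

    if (i == 0) {
        static const char text[] = "No octal digits found";
        if (message != NULL)
            *message = text;     /* caller decides how and where to report it */
        else
            fprintf(stderr, "%s\n", text);
        return false;
    }

    *result = value;
    return true;
}
```

A caller such as a pattern compiler can then wrap the returned text in
its own message format and record exactly when it was output.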
|
This commit causes this function to allow a caller to request any
messages generated to be returned to the caller, instead of always being
handled within this function.
Like the previous commit for grok_bslash_c, there are two reasons to do
this, repeated here.
1) In pattern compilation this brings these messages into conformity
with the other ones that get generated in pattern compilation, where
there is a particular syntax, including marking the exact position in
the parse where the problem occurred.
2) The messages could be truncated due to the (mostly) single-pass
nature of pattern compilation that is now in effect. It keeps track
of where during a parse a message has been output, and won't output
it again if a second parsing pass turns out to be necessary. Prior
to this commit, it had to assume that a message from one of these
functions did get output, and this caused some out-of-bounds reads
when a subparse (using a constructed pattern) was executed. The
possibility of those went away in commit 5d894ca5213, which
guarantees it won't try to read outside bounds, but that may still
mean it is outputting text from the wrong parse, giving meaningless
results. This commit should stop that possibility.
|
This commit uses a variety of techniques for speeding this up. It is
now faster than blead, and has less maintenance cost than before.
Most of the checks that the current character isn't NUL are unnecessary.
The logic works on that character, even if, for some reason, you can't
trust the input length. A special test is added to not output the
illegal character message if that character is a NUL. This is simply
for backcompat.
And a switch statement is used to unroll the loop for the leading digits
in the number. This should handle most common cases. Beyond these,
one has to start worrying about overflow. So this version has removed
that worrying from the common cases.
Extra conditionals are avoided for large numbers by extracting the
portability warning message code into a separate static function called
from two different places. Simplifying this logic led me to see that if
it overflowed, it must be non-portable, so another conditional could be
removed.
Other conditionals were removed at the expense of adding parameters to
the function. This function isn't public, but is called from the
grok_hex, et al. macros. grok_hex knows, for example, that it is
looking for an 'x' prefix and not a 'b'. Previously the code had a
conditional to determine that.
Similarly in pp.c, we look for the prefix. Having found it we can start
the parse after the prefix, and tell this function not to look for it.
Previously, this work was duplicated.
The previous changes had left this function slower than blead. That is
in part due to the fact that the loop doesn't go through that many
iterations per function call, and the gcc compiler managed to optimize
away the conditionals in XDIGIT_VALUE in the call of it from the loop.
(The other call in this function did have the conditionals.)
Thanks to Sergey Aleynikov for his help on this
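
The switch-based unrolling of the leading digits can be illustrated with
a small hex-only sketch. This is not the function the commit changes:
parse_hex, hex_digit_value and the choice of four unrolled digits are
assumptions made for the example. The first few digits cannot overflow
a 64-bit value, so only the trailing loop needs an overflow check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value of a hex digit, or -1 if 'c' is not one. */
static int
hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Accumulate one digit, or leave the switch at the first non-digit. */
#define STEP                                        \
    if ((d = hex_digit_value(*p)) < 0) break;       \
    value = (value << 4) | (uint64_t)d;             \
    p++

/* Parses the leading hex digits of s[0..len).  Stores the number of digits
 * consumed in *used and sets *overflowed if the value did not fit. */
static uint64_t
parse_hex(const char *s, size_t len, size_t *used, int *overflowed)
{
    const char *p = s;
    const char *end = s + len;
    uint64_t value = 0;
    int d;

    assert(s != NULL && used != NULL && overflowed != NULL);
    *overflowed = 0;

    /* Fast path: up to four digits, entered according to how many
     * characters are available; no overflow is possible here. */
    switch (len) {
      case 0:
        break;
      default:                  /* four or more characters available */
        STEP;
        /* FALLTHROUGH */
      case 3:
        STEP;
        /* FALLTHROUGH */
      case 2:
        STEP;
        /* FALLTHROUGH */
      case 1:
        STEP;
    }

    /* Slow path: any remaining digits, with an overflow check per digit. */
    while (p < end && (d = hex_digit_value(*p)) >= 0) {
        if (value > (UINT64_MAX >> 4)) {
            *overflowed = 1;
            break;
        }
        value = (value << 4) | (uint64_t)d;
        p++;
    }

    *used = (size_t)(p - s);
    return value;
}
```

Short inputs never reach the loop at all, which is the common case the
commit message is aiming at.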
|
These functions are identical in logic in the main loop, the difference
being which digits they accept. The rest of the code had slight
variations. This commit unifies the functions.
I presume the reason they were kept separate was because of speed.
Future commits will make this unified function faster than blead, and
the reduced maintenance cost makes this worthwhile.
|
These are illegal in C, but we have plenty of them around; I happened
to be looking at this function, and decided to fix it. Note that only
the macro name is illegal; the function was fine, but to change the
macro name means changing the function one.
|
Adds a new infix operator named `isa`, with the semantics that
    $x isa SomeClass
is true if and only if `$x` is a blessed object reference that is either
`SomeClass` directly, or includes the class somewhere in its @ISA
hierarchy. It is false without warning or error for non-references or
non-blessed references.
This operator respects `->isa` method overloading, and is intended to
replace boilerplate code such as
    use Scalar::Util 'blessed';
    blessed($x) and $x->isa("SomeClass")
|
This was caused by a static inline function in a header that was
#included in a file that didn't use it. Normally, these functions are
#ifdef'd so as to be visible only to files in which they are used.
Some compilers warn that the function is defined but not used
otherwise. The solution is to remove this function's visibility from
the file that didn't use it.
|
This makes it less complicated to find the lowest code point in an
inversion list. This makes the place where it's used clearer as to what
is going on. And it may eventually be used in more than one place.
|
Currently, whether the OS-level signal handler function is declared as
1-arg or 3-arg depends on the configuration. Add explicit versions of
these functions, principally so that POSIX.xs can call whichever version
of the handler it wants regardless of configuration: see next commit.
|
This function implements the body of what used to be Perl_sighandler(),
the latter becoming a thin wrapper round Perl_perly_sighandler().
The main reason for this change is that it allows us to add an extra
arg, 'safe' to the function without breaking backcompat. This arg
indicates whether the function is being called directly from the OS
signal handler (safe==0), or deferred via Perl_despatch_signals()
(safe==1).
This allows an infelicity in the code to be fixed - it was formerly
trying to determine the route it had been called by (and hence whether a
'safe' route) by seeing if either of the sig/uap parameters was
non-null. It turns out that this was highly dodgy, and only worked by
luck. The safe caller did indeed pass NULL args, but due to a bug
(shortly to be fixed), sometimes the kernel thinks it's calling a 1-arg
sig handler when it's actually calling a 3-arg one. This means that the
sig/uap args are random garbage, and happen to be non-zero only by happy
coincidence on the OS/platforms so far.
Also, it turns out that the call via Perl_csighandler() was getting it
wrong: its explicit (NULL,NULL) args made it look like a safe signal
call. This will be corrected in the next commit, but for this commit the
old wrong behaviour is preserved.
See RT #82040 for details of when/why the original dodgy 'safe' check
was added.
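
The shape of this change, a worker function that is told explicitly how
it was reached plus a thin wrapper that keeps the old signature, might
look roughly like the sketch below. All names here (handle_signal,
legacy_handler, dispatch_deferred) are hypothetical and the bodies only
record their arguments; the point is that the worker never inspects the
pointer arguments to guess the call route.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Records how the most recent call arrived; for illustration only. */
static int  last_sig = 0;
static bool last_was_safe = false;

/* Hypothetical worker: the caller states whether this is a deferred
 * ("safe") call, so the pointer arguments are never used as a hint. */
static void
handle_signal(int sig, siginfo_t *info, void *uap, bool safe)
{
    (void)info;
    (void)uap;
    assert(sig > 0);
    last_sig = sig;
    last_was_safe = safe;
}

/* Old three-argument entry point, kept as a thin wrapper so existing
 * callers keep working: a direct call is reported as not safe. */
static void
legacy_handler(int sig, siginfo_t *info, void *uap)
{
    handle_signal(sig, info, uap, false);
}

/* Deferred dispatcher: runs later, outside the OS-level handler. */
static void
dispatch_deferred(int sig)
{
    handle_signal(sig, NULL, NULL, true);
}
```

Note that calling legacy_handler with NULL pointers is still reported as
unsafe, which the old argument-sniffing check could not distinguish.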
|
There are a bunch of places in core that do
    #if defined(HAS_SIGACTION) && defined(SA_SIGINFO)
to decide whether the C signal handler function should be declared with,
and called with, 1 arg or 3 args.
This commit just adds
    #if defined(HAS_SIGACTION) && defined(SA_SIGINFO)
    #  define PERL_USE_3ARG_SIGHANDLER
    #endif
It then uses the new macro in all other places rather than checking
HAS_SIGACTION and SA_SIGINFO. Thus there is no functional change; it just
makes the code more readable.
However, it turns out that all is not well with core's use of 1-arg
versus 3-arg, and the next few commits will fix this.
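
As a rough standalone illustration of why one macro should govern both
the handler's declaration and how it is installed, consider the sketch
below. DEMO_USE_3ARG_SIGHANDLER is a hypothetical stand-in for the new
macro, and the example assumes a POSIX system where sigaction() exists
(core tests HAS_SIGACTION for that; this sketch does not).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical stand-in for the macro added by the commit. */
#if defined(SA_SIGINFO)
#  define DEMO_USE_3ARG_SIGHANDLER
#endif

static volatile sig_atomic_t seen_signal = 0;

/* The same macro selects the handler's signature ... */
#ifdef DEMO_USE_3ARG_SIGHANDLER
static void
demo_handler(int sig, siginfo_t *info, void *context)
{
    (void)info;
    (void)context;
    seen_signal = sig;
}
#else
static void
demo_handler(int sig)
{
    seen_signal = sig;
}
#endif

/* ... and how it is registered, so the two cannot disagree.
 * Returns 0 on success, -1 on failure (as sigaction() does). */
static int
install_demo_handler(int sig)
{
    struct sigaction act;

    assert(sig > 0);
    memset(&act, 0, sizeof act);
    sigemptyset(&act.sa_mask);
#ifdef DEMO_USE_3ARG_SIGHANDLER
    act.sa_sigaction = demo_handler;
    act.sa_flags = SA_SIGINFO;   /* ask the kernel to pass three arguments */
#else
    act.sa_handler = demo_handler;
    act.sa_flags = 0;
#endif
    return sigaction(sig, &act, NULL);
}
```

If a 3-arg function were installed without SA_SIGINFO, its second and
third arguments would be undefined, which is the kind of mismatch the
surrounding commits discuss.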
|
This further decouples this function from knowing details of the calling
structure, by passing this detail in.
|
This includes:
- remove them from the API
- simplify quadmath_format_single()'s interface, and rename it
to match the new interface
fixes #17288
|
Also references to the term.
|
This large commit removes the last use of swashes from core.
It replaces swashes by inversion maps. This data structure is already
in use for some Unicode properties, such as case changing.
The inversion map data structure leads to straightforward
implementation code, so I collapsed the two doop.c routines
do_trans_complex_utf8() and do_trans_simple_utf8() into one. A few
conditionals could be avoided in the loop if this function were split so
that one version didn't have to test for, e.g., squashing, but I suspect
these are in the noise in the loop, which has to deal with UTF-8
conversions. This should be faster than the previous implementation
anyway. I measured the differences some releases back, and inversion
maps were faster than the equivalent swash for up to 512 or 1024
different ranges. These numbers are unlikely to be exceeded in tr///
except possibly in machine-generated ones.
Inversion maps are capable of handling both UTF-8 and non-UTF-8 cases,
but I left in the existing non-UTF-8 implementation, which uses tables,
because I suspect it is faster. This means that there is extra code,
purely for runtime performance.
An inversion map is always created from the input, and then if the table
implementation is to be used, the table is easily derived from the map.
Prior to this commit, the table implementation was used in certain edge
cases involving code points above 255. Those cases are now handled by
the inversion map implementation, because it would have taken extra code
to detect them, and I didn't think it was worth it. That could be
changed if I am wrong.
Creating an inversion map for all inputs essentially normalizes them,
and then the same logic is usable for all. This fixes some false
negatives in the previous implementation. It also allows for detecting
if the actual transliteration can be done in place. Previously, the
code mostly punted on that detection for the UTF-8 case.
This also allows for accurate counting of the lengths of the two sides,
fixing some longstanding TODO warning tests.
A new flag is created, OPpTRANS_CAN_FORCE_UTF8, when the tr/// has a
below 256 character resolving to one that requires UTF-8. If this isn't
set, the code knows that a non-UTF-8 input won't become UTF-8 in the
process, and so can take short cuts. The bit representing this flag is
the same as OPpTRANS_FROM_UTF, which is no longer used. That name is
left in so that the dozen-ish modules in CPAN that refer to it can still
compile. AFAICT none of them actually use the flag, as well they
shouldn't since it is private to the core.
Inversion maps are ideally suited for tr/// implementations. An issue
with them in general is that for some pathological data, they can become
fragmented requiring more space than you would expect, to represent the
underlying data. However, the typical tr/// would not have this issue,
requiring only very short inversion maps to represent; in some cases
shorter than the table implementation.
Inversion maps are also easier to deparse than swashes. A deparse TODO
was also fixed by this commit, and the code to deparse UTF-8 inputs is
simplified.
One could implement specialized data structures for specific types of
inputs. For example, a common tr/// form is a single range, like
tr/A-Z/a-z/. That could be implemented without a table and be quite
fast. An intermediate step would be to use the inversion map
implementation always when the transliteration is a single range, and
then special case length=1 maps at execution time.
Thanks to Nicholas Rochemagne for his help on B
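
For readers unfamiliar with the data structure, here is a minimal sketch
of an inversion map lookup. It is not the core implementation: the
inv_map struct, the UNMAPPED marker and the function names are
hypothetical. The idea is a sorted array of range starts with a parallel
array giving what the first code point of each range maps to; a binary
search finds the range, and the offset within the range is added.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UNMAPPED UINT32_MAX   /* hypothetical marker: range maps to itself */

typedef struct {
    const uint32_t *starts;   /* sorted range starts; starts[0] must be 0 */
    const uint32_t *maps;     /* maps[i] is what starts[i] maps to        */
    size_t          count;    /* number of ranges                         */
} inv_map;

/* Returns the code point that 'cp' maps to. */
static uint32_t
inv_map_lookup(const inv_map *im, uint32_t cp)
{
    size_t lo = 0;
    size_t hi;

    assert(im != NULL && im->count > 0 && im->starts[0] == 0);
    hi = im->count;

    /* Invariant: starts[lo] <= cp, and cp < starts[hi] when hi < count. */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (im->starts[mid] <= cp)
            lo = mid;
        else
            hi = mid;
    }

    if (im->maps[lo] == UNMAPPED)
        return cp;
    return im->maps[lo] + (cp - im->starts[lo]);
}

/* tr/A-Z/a-z/ expressed as a three-range inversion map. */
static uint32_t
demo_tr_upper_to_lower(uint32_t cp)
{
    static const uint32_t starts[] = { 0,        'A', 'Z' + 1  };
    static const uint32_t maps[]   = { UNMAPPED, 'a', UNMAPPED };
    const inv_map im = { starts, maps, 3 };

    return inv_map_lookup(&im, cp);
}
```

A single-range tr/// needs only three entries here, which matches the
observation above that typical transliterations give very short maps.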
|
This function dumps out an inversion map
|
instead of deriving it each time from inside the function. This is in
preparation for future commits.
|
They are still only accessible from regcomp.c, but this is in
preparation for them to be usable from other core files as well.
|
This removes an unnecessary leading underscore
|
into one that takes both SV*/char*+len arguments, like hv_common,
to be able to use speedups from SV* stash lookup API.
|
I'm not sure why this didn't show up elsewhere, but we have embed.fnc
entries for non-existent functions that should have been removed in
dd1a3ba7882ca70c1e85b0fd6c03d07856672075.
In addition, I see several more functions that should have been removed,
and this commit removes them.
|
These were missed by 059703b088f44d5665f67fba0b9d80cad89085fd.
|
It has been deprecated since 5.26 to use various macros that deal with
UTF-8 inputs but don't have a parameter indicating the maximum length
beyond which we should not look. This commit changes all such macros,
as threatened in existing documentation and warning messages, to have an
extra parameter giving the length.
This was originally scheduled to happen in 5.30, but was delayed because
it broke some CPAN modules, and there wasn't really a good way around
it. But now that Devel::PPPort 3.54 is out, ppport.h has new facilities
for getting modules making these changes to work with older Perl
releases.
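
To show why a length (or end-pointer) parameter matters, here is a small
sketch of a bounded check. The function names are hypothetical and are
not the core macros; the check is also deliberately simplified (it does
not reject every malformed sequence and follows standard UTF-8 lengths
only). What it demonstrates is that, given the end of the buffer, the
code can refuse to read past it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Number of bytes a UTF-8 sequence starting with 'first' should occupy,
 * or 0 if 'first' cannot start a sequence. */
static size_t
utf8_expected_len(unsigned char first)
{
    if (first < 0x80)                   return 1;
    if (first >= 0xC2 && first <= 0xDF) return 2;
    if (first >= 0xE0 && first <= 0xEF) return 3;
    if (first >= 0xF0 && first <= 0xF4) return 4;
    return 0;
}

/* True only if a whole character starting at 's' lies inside [s, e).
 * Without 'e' there would be no way to know when to stop reading. */
static bool
utf8_char_is_complete(const unsigned char *s, const unsigned char *e)
{
    size_t need;
    size_t i;

    assert(s != NULL && e != NULL);

    if (s >= e)
        return false;               /* nothing available to read */

    need = utf8_expected_len(*s);
    if (need == 0 || (size_t)(e - s) < need)
        return false;               /* bad start byte, or sequence cut short */

    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return false;           /* not a continuation byte */
    }
    return true;
}
```

A single-pointer interface has to trust that the buffer holds a complete
character; the two-pointer form makes a truncated buffer a detectable
condition instead of an out-of-bounds read.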
|
original merge commit: v5.31.3-198-gd2cd363728
reverted by: v5.31.4-0-g20ef288c53
The commit following this commit fixes the breakage, which means
the revert can be undone.
|
This reverts commit d2cd363728088adada85312725ac9d96c29659be, reversing
changes made to 068b48acd4bdf9e7c69b87f4ba838bdff035053c.
This change breaks installing Test::Deep:
...
not ok 37 - Test 'isa eq' completed
ok 38 - Test 'isa eq' no premature diagnostication
...
|
This function makes use of PL_curstackinfo->si_cxsubix to avoid the
overhead of a call to block_gimme() when the context of the op is
unknown.
|
This is because it has the X flag, which means the function is visible
on ELF systems.
|
These function names need a Perl_ prefix to avoid namespace pollution.
|