| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
| |
Instead of destroying the input by first swapping the bytes, this calls
a base function with the order to use. The non-reverse function is
changed to call the base function with the non-reversed order.
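The idea of a shared base function that takes the byte order as an argument can be sketched as follows. This is only an illustration: the names `unit_order`, `decode_units_base()`, `decode_units()` and `decode_units_reversed()` are hypothetical, and the sketch assumes the input is a buffer of 16-bit units, which the commit message does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: which byte of each 16-bit unit is the most significant. */
typedef enum { ORDER_HIGH_FIRST, ORDER_LOW_FIRST } unit_order;

/* Base function: copies 16-bit units out of 'in' using the requested
 * byte order. The input buffer is const and is never modified. */
static size_t decode_units_base(const uint8_t *in, size_t nbytes,
                                uint16_t *out, unit_order order)
{
    size_t n = nbytes / 2;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        const uint8_t *p = in + 2 * i;
        out[i] = (order == ORDER_HIGH_FIRST)
            ? (uint16_t)((p[0] << 8) | p[1])
            : (uint16_t)((p[1] << 8) | p[0]);
    }
    return n;
}

/* The non-reversed entry point just names the non-reversed order. */
static size_t decode_units(const uint8_t *in, size_t nbytes, uint16_t *out)
{
    return decode_units_base(in, nbytes, out, ORDER_HIGH_FIRST);
}

/* The reversed entry point no longer needs to swap bytes in place. */
static size_t decode_units_reversed(const uint8_t *in, size_t nbytes,
                                    uint16_t *out)
{
    return decode_units_base(in, nbytes, out, ORDER_LOW_FIRST);
}
```

Both wrappers leave the caller's buffer untouched, so the same input can be decoded in either order.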
|
|
|
|
|
|
|
|
|
| |
The only callers of this function now use the DFA, which eliminates the
need to check for things such as wrong continuation bytes. The function
can therefore be refactored to use a switch statement, which makes it
clearer, shorter, and faster.
The name is changed to indicate its private nature.
|
|
|
|
| |
This makes it use the fast DFA for this functionality.
|
|
|
|
|
|
|
|
|
| |
This specialized functionality is used to check the validity of Perl's
extended-length UTF-8, which has some idiosyncratic characteristics that
distinguish it from the shorter sequences. This means this function doesn't have to
consider those differences. It will be used in the next commit to avoid
some work, and to eventually enable is_utf8_char_helper() to be
simplified.
|
|
|
|
|
|
|
|
| |
One of these functions is now only called from the other, and there is
significant overlap in their logic.
This commit refactors them into a single function, which is half
the code and more straightforward.
|
|
|
|
|
|
|
|
|
| |
I've always been uncomfortable with the input constraints this function
had. Now that it has been refactored into using a switch(), new cases
for full generality can be added without affecting performance, and
some conditionals removed before calling it.
The function is renamed to reflect its greater generality.
|
|
|
|
|
| |
This changes only portions of the capitalization, and the new version is
more in keeping with other function names.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The existing code to determine the position of the most significant 1
bit in a word is extracted from variant_byte_number(), and generalized
to use the deBruijn method previously added that works on any bit in the
word, rather than the existing method which looks just at the msb of
each byte. The code is moved to a new function in preparation for being
called from other places.
A U32 version is created, and on 64 bit platforms, a second, parallel,
version taking a U64 argument is also created. This is because future
commits may care about the word size differences.
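One way to find the most significant set bit with a de Bruijn lookup is to reduce the word to its top bit alone and then hash that single bit. The sketch below assumes a commonly published 32-bit de Bruijn constant, 0x077CB531, which may not be the one the commit uses; `single_bit_pos32()` and `msb_pos32()` are hypothetical names. The lookup table is filled at first use rather than hard-coded.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constant: a widely published 32-bit de Bruijn sequence. */
#define DEBRUIJN32 UINT32_C(0x077CB531)

/* Position of the only set bit in 'bit'; exactly one bit must be set.
 * The lazy table fill is not thread-safe; this is only a sketch. */
static unsigned single_bit_pos32(uint32_t bit)
{
    static unsigned char table[32];
    static int ready = 0;

    if (!ready) {
        /* Each single-bit value maps to a distinct 5-bit index. */
        for (unsigned i = 0; i < 32; i++)
            table[(uint32_t)((UINT32_C(1) << i) * DEBRUIJN32) >> 27]
                = (unsigned char)i;
        ready = 1;
    }
    return table[(uint32_t)(bit * DEBRUIJN32) >> 27];
}

/* Position (0 = least significant) of the highest set bit in 'word'.
 * A word with no set bits is treated as a caller error. */
static unsigned msb_pos32(uint32_t word)
{
    assert(word != 0);

    /* Smear the top bit into every lower position... */
    word |= word >> 1;
    word |= word >> 2;
    word |= word >> 4;
    word |= word >> 8;
    word |= word >> 16;

    /* ...then keep only the top bit. */
    word -= word >> 1;

    return single_bit_pos32(word);
}
```

For example, `msb_pos32(6)` yields 2, since bit 2 is the highest bit set in binary 110.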
|
|
|
|
|
|
|
|
|
|
| |
The existing code to determine the position of the least significant 1
bit in a word is extracted from variant_byte_number() and moved to a new
function in preparation for being called from other places.
A U32 version is created, and on 64 bit platforms, a second, parallel,
version taking a U64 argument is also created. This is because future
commits may care about the word size differences.
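The least significant set bit can be isolated with a two's-complement trick before its position is computed. The sketch below shows a 32-bit function and a parallel 64-bit one. The names `lsb_pos32()` and `lsb_pos64()` are hypothetical, and a plain loop stands in for whatever faster position lookup the real code uses.

```c
#include <assert.h>
#include <stdint.h>

/* Position (0 = least significant) of the lowest set bit in 'word'.
 * A word with no set bits is treated as a caller error. */
static unsigned lsb_pos32(uint32_t word)
{
    uint32_t lone;
    unsigned pos = 0;

    assert(word != 0);

    /* word & -word leaves only the lowest set bit. */
    lone = word & (UINT32_C(0) - word);

    /* Simple stand-in for a table lookup: count the shifts needed. */
    while ((lone >>= 1) != 0)
        pos++;

    return pos;
}

/* Parallel 64-bit version, built here from the 32-bit one. */
static unsigned lsb_pos64(uint64_t word)
{
    uint32_t low;

    assert(word != 0);

    low = (uint32_t)(word & UINT64_C(0xFFFFFFFF));
    return low != 0 ? lsb_pos32(low)
                    : 32 + lsb_pos32((uint32_t)(word >> 32));
}
```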
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This will prove useful in future commits on platforms that have 64 bit
capability.
The deBruijn sequence used here, taken from the internet, differs
from the 32 bit one in how it treats a word with no set bits. But that
case is considered undefined behavior, so the difference is immaterial.
Apparently finding such a sequence requires brute force methods, so I
decided to live with this difference, rather than to expend the time
needed to bring them into sync.
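A 64-bit de Bruijn lookup has the same shape as the 32-bit one, with a 6-bit index instead of a 5-bit one. The sketch below assumes the constant 0x07EDD5E59A4E28C2, a published 64-bit de Bruijn sequence that is not necessarily the one this commit adopted; `single_bit_pos64()` is a hypothetical name. A word with no set bits is rejected rather than given a defined result.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constant: one published 64-bit de Bruijn sequence. */
#define DEBRUIJN64 UINT64_C(0x07EDD5E59A4E28C2)

/* Position of the only set bit in 'bit'; exactly one bit must be set.
 * The lazy table fill is not thread-safe; this is only a sketch. */
static unsigned single_bit_pos64(uint64_t bit)
{
    static unsigned char table[64];
    static int ready = 0;

    /* No set bits is undefined for this technique, so refuse it. */
    assert(bit != 0);

    if (!ready) {
        /* Multiplying by a single bit shifts the sequence left, and the
         * top six bits of the product are unique for each shift. */
        for (unsigned i = 0; i < 64; i++)
            table[((UINT64_C(1) << i) * DEBRUIJN64) >> 58]
                = (unsigned char)i;
        ready = 1;
    }
    return table[(bit * DEBRUIJN64) >> 58];
}
```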
|
|
|
|
|
|
| |
This moves the code from regcomp.c to inline.h that calculates the
position of the lone set bit in a U32. This is in preparation for use
by other call sites.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since the removal of PERL_OBJECT
(acfe0abcedaf592fb4b9cb69ce3468308ae99d91) PERL_IMPLICIT_CONTEXT and
MULTIPLICITY have been synonymous and they're being used interchangeably.
To simplify the code, this commit replaces all instances of
PERL_IMPLICIT_CONTEXT with MULTIPLICITY.
PERL_IMPLICIT_CONTEXT will stay defined for compatibility with XS
modules.
|
| |
|
|
|
|
|
| |
S_regclass() is unwieldy. This commit splits it into two nearly equal
size parts. More could be done.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The try change added code to pp_return to skip past try contexts
when looking for the sub/sort/eval context to return from.
This was only needed because cx_pusheval() sets si_cxsubix to the
current frame, and try uses that function to push its context. That
value is then used by the dopopto_cursub() macro to shortcut
walking the context stack.
Since we don't need to treat try as a sub for return, list vs array
checks or lvalue sub checks, don't set si_cxsubix on try.
|
| |
|
| |
|
|
|
|
|
| |
This used to be called from utf8.c, but no longer is, so there is no need
for it to be anything other than static. This allows the compiler to
optimize it better.
|
|
|
|
|
|
|
|
| |
This change has been planned for a long time, bringing Perl into parity
with similar languages, but it took many deprecation cycles to be able
to reach the point where it could safely go in.
This fixes GH #18264
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit copies portions of new_regcurly(), which has been around
since 5.28, into plain regcurly(), as a baby step in preparation for
converting entirely to the new one. These functions are used for
parsing {m,n} quantifiers. Future commits will add capabilities not
available using the old version.
The commit adds an optional parameter through which the function returns
to the caller the information it gleans during parsing.
regpiece() is changed by this commit to use this information, instead of
itself reparsing the input. Part of the reason for this commit is that
changes are planned soon to what is legal syntax. With this commit in
place, those changes only have to be done once.
This commit also extracts into a function the calculation of the
quantifier bounds. This allows the logic for that to be done in one
place instead of two.
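The pattern of parsing a {m,n} quantifier once and handing the results back through an optional parameter can be sketched as below. This is a simplified illustration, not the regcurly() interface: `quant_info` and `parse_quantifier()` are hypothetical, only the forms {m}, {m,} and {m,n} are accepted, and numeric overflow is not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of what the parser learned. */
typedef struct {
    unsigned min;
    unsigned max;        /* meaningful only when !unbounded */
    bool     unbounded;  /* true for the "{m,}" form */
    size_t   length;     /* bytes consumed, including both braces */
} quant_info;

/* Returns true if 's' starts with a quantifier. If 'info' is non-NULL,
 * the parsed bounds are stored there so the caller need not reparse. */
static bool parse_quantifier(const char *s, quant_info *info)
{
    const char *p = s;
    unsigned min = 0, max = 0;
    bool unbounded = false;

    assert(s != NULL);

    if (*p != '{')
        return false;
    p++;

    if (!isdigit((unsigned char)*p))
        return false;
    while (isdigit((unsigned char)*p))
        min = min * 10 + (unsigned)(*p++ - '0');

    if (*p == ',') {
        p++;
        if (isdigit((unsigned char)*p)) {
            while (isdigit((unsigned char)*p))
                max = max * 10 + (unsigned)(*p++ - '0');
        } else {
            unbounded = true;
        }
    } else {
        max = min;       /* "{m}" means exactly m */
    }

    if (*p != '}')
        return false;
    p++;

    if (info != NULL) {
        info->min = min;
        info->max = max;
        info->unbounded = unbounded;
        info->length = (size_t)(p - s);
    }
    return true;
}
```

A caller that only wants a yes/no answer passes NULL, while one that needs the bounds passes a `quant_info` and avoids a second scan of the input.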
|
|
|
|
| |
This disables use of bareword filehandles except for the built-in handles.
|
|
|
|
| |
5.32 did this for one form; now all do.
|
| |
|
|
|
|
|
|
| |
This function is called only at compile time; experience has shown that
compile-time operations are not time-critical. And future commits will
lengthen it, making it not practically inlinable anyway.
|
|
|
|
|
|
| |
I modified this function in ab01742544b98b5b5e13d8e1a6e9df474b9e3005,
and did not fully understand the edge cases. This commit now handles
those properly, the same as plain delimcpy() does.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
S_mg_free_struct() has a workaround to never free mg->mg_ptr for
PERL_MAGIC_regex_global.
Move this logic into a new magic vtable free method instead, so that
S_mg_free_struct() (which gets called for every type of magic) doesn't
have the overhead of checking every time for mg->mg_type ==
PERL_MAGIC_regex_global.
[ No, I don't know why PERL_MAGIC_regex_global's vtable and methods
are suffixed mglob rather than regex_global or vice versa ]
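The general shape of moving a per-type special case out of a shared free routine and into a per-type hook can be sketched with toy types. Nothing below mirrors Perl's actual MAGIC or MGVTBL layout: `toy_magic`, `toy_vtbl`, `detach_ptr_hook()` and `toy_free_struct()` are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_magic;

/* Hypothetical per-type method table with an optional free hook. */
typedef struct {
    void (*free_hook)(struct toy_magic *mg);
} toy_vtbl;

typedef struct toy_magic {
    const toy_vtbl *vtbl;
    char           *ptr;
    long            len;
} toy_magic;

/* Hook for a type whose ptr must never be freed by the generic code:
 * it detaches the pointer before the generic routine looks at it. */
static void detach_ptr_hook(toy_magic *mg)
{
    mg->ptr = NULL;
    mg->len = 0;
}

/* Generic routine, run for every type. It contains no type checks;
 * any special handling lives in the hook. */
static void toy_free_struct(toy_magic *mg)
{
    assert(mg != NULL);

    if (mg->vtbl != NULL && mg->vtbl->free_hook != NULL)
        mg->vtbl->free_hook(mg);

    if (mg->ptr != NULL && mg->len > 0)
        free(mg->ptr);

    mg->ptr = NULL;
    mg->len = 0;
}
```

Only types that install a hook pay for their special case; every other type takes the plain path with a single pointer test.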
|
|
|
|
|
|
|
|
|
|
| |
S_mg_free_struct() has a workaround to free mg->mg_ptr in
PERL_MAGIC_utf8 even if mg_len is zero.
Move this logic into a new magic vtable free method instead, so that
S_mg_free_struct() (which gets called for every type of magic) doesn't
have the overhead of checking every time for mg->mg_type ==
PERL_MAGIC_utf8.
|
|
|
|
|
|
|
|
|
|
| |
v5.29.9-139-g44955e7de8 added a workaround to S_mg_free_struct() to
free mg->mg_ptr in PERL_MAGIC_collxfrm even if mg_len is zero.
Move this logic into a new magic vtable free method instead, so that
S_mg_free_struct() (which gets called for every type of magic) doesn't
have the overhead of checking every time for mg->mg_type ==
PERL_MAGIC_collxfrm.
|
| |
|
|
|
|
|
|
|
|
|
| |
This returns the number of elements in an array in a clearly named
function.
av_top_index() and av_tindex() are clearly named, but are less than
ideal; they came about because no one back then thought of this name,
until Paul Evans did.
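The relationship between a top-index function and a count function can be sketched with a toy array type. `toy_av`, `toy_top_index()` and `toy_count()` are hypothetical stand-ins, under the assumption that an empty array has a top index of -1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical array header: index of the last element, -1 when empty. */
typedef struct {
    ptrdiff_t fill;
} toy_av;

/* Highest valid index; callers wanting a count must remember the +1. */
static ptrdiff_t toy_top_index(const toy_av *av)
{
    assert(av != NULL && av->fill >= -1);
    return av->fill;
}

/* Number of elements, with the +1 done in exactly one place. */
static size_t toy_count(const toy_av *av)
{
    return (size_t)(toy_top_index(av) + 1);
}
```

A loop written as `for (i = 0; i < toy_count(av); i++)` then reads naturally, without an off-by-one adjustment at each call site.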
|
|
|
|
|
|
|
|
| |
This was originally added for MinGW, which no longer needs it, and
was still used only by Symbian, which has now been removed.
This also leaves perlapi.[ch] empty, but we keep the header for CPAN
backwards compatibility.
|
|
|
|
|
| |
Also eliminate USE_HEAP_INSTEAD_OF_STACK and
SETSOCKOPT_OPTION_VALUE_T, since Symbian was the only user of those.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As described in #17743, study_chunk can re-enter itself either by
simple recursion or by enframing. 089ad25d3f used the new mutate_ok
variable to track whether we were within the framing scope of GOSUB,
and to disallow mutating changes to ops if so.
This commit extends that logic to reentry by recursion, passing in
the current state as was_mutate_ok.
(CVE-2020-12723)
(cherry picked from commit 3445383845ed220eaa12cd406db2067eb7b8a741)
|
|
|
|
|
|
| |
(CVE-2020-10878)
(cherry picked from commit 4fccd2d99bdeb28c2937c3220ea5334999564aa8)
|
|
|
|
|
|
|
|
|
| |
Prior to this commit, specifying a named sequence would result in a
mostly unhelpful fatal error message. This makes their use legal.
This is also the beginning of allowing Unicode string properties, which
are a new thing in the (still draft) Unicode requirements for regular
expression parsing, UTS 18. Full compliance will have to come later.
|
|
|
|
|
|
|
|
| |
This fixes #17612.
This adds an inline function to pp_hot to be called to determine whether
debugging info should be output, regardless of whether the request comes
from -Dr or from a 'use re Debug' statement.
|
| |
|
|
|
|
|
| |
Use of this function was removed as part of adding wildcarding to the
Unicode name property.
|
|
|
|
|
|
|
|
|
| |
The compilation of User-defined properties in a regular expression that
haven't been defined at the time that pattern is compiled is deferred
until execution time. Until this commit, any request for debugging info
on those was ignored.
This fixes that by
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous commit added a mutex specifically for protecting against
simultaneous accesses of the environment. This commit changes the
normal getenv, putenv, and clearenv functions to use it, to avoid races.
This simplifies the code in places where we had been burned and had
added special handling to avoid races. Other places where we were being
burned without knowing it could have existed until now. That protection
now comes automatically, and we can remove the special cases we earlier
stumbled over.
getenv() returns a pointer to static memory, which can be overwritten at
any moment from another thread, or even another getenv from the same
thread. This commit changes the accesses to be under control of a
mutex, and in the case of getenv, a mortalized copy is created so that
there is no possible race.
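The combination of a mutex and a private copy can be sketched with POSIX threads. This is an illustration only: `env_mutex`, `getenv_copy()` and `setenv_locked()` are hypothetical names, and the copy is handed to the caller to free, whereas the commit describes a mortalized copy managed by Perl.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* One lock guarding every access to the process environment. */
static pthread_mutex_t env_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Returns a heap copy of the variable's value, or NULL if it is unset
 * or memory is exhausted. The copy is made while the lock is held, so
 * a later change to the environment cannot alter it. Caller frees. */
static char *getenv_copy(const char *name)
{
    const char *value;
    char *copy = NULL;

    assert(name != NULL);

    pthread_mutex_lock(&env_mutex);
    value = getenv(name);
    if (value != NULL)
        copy = strdup(value);
    pthread_mutex_unlock(&env_mutex);

    return copy;
}

/* Writers take the same lock, so they cannot interleave with readers. */
static int setenv_locked(const char *name, const char *value)
{
    int rc;

    assert(name != NULL && value != NULL);

    pthread_mutex_lock(&env_mutex);
    rc = setenv(name, value, 1);
    pthread_mutex_unlock(&env_mutex);

    return rc;
}
```

The protection only holds if every reader and writer in the process goes through the locked wrappers; a direct getenv() call elsewhere would bypass it.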
|
|
|
|
|
| |
This commit adds wildcard subpatterns for the Name and Name Aliases
properties.
|
|
|
|
|
|
|
|
| |
This makes special-cased forms such as sort { $b <=> $a }
even faster.
Also, since this commit removes PL_sort_RealCmp, it fixes the
issue with nested sort calls mentioned in gh #16129
|
|
|
|
|
|
|
|
| |
This causes two main functions that were previously defined only in
regcomp.c to be defined in re_comp.c as well. This allows re.pm to use
debugging with them. To avoid duplicating large data structures,
several lightweight wrapper functions are added to regcomp.c that
re_comp.c calls to access those structures.
|
|
|
|
| |
These have to have a version in re_comp.c for re.pm to work on them.
|
|
|
|
|
|
| |
This moves a bunch of entries around so that they make more sense, and
consolidates some blocks that had the same #ifdefs. There should be no
change in what gets compiled.
|