| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As described in #17743, study_chunk can re-enter itself either by
simple recursion or by enframing. 089ad25d3f used the new mutate_ok
variable to track whether we were within the framing scope of GOSUB,
and to disallow mutating changes to ops if so.
This commit extends that logic to reentry by recursion, passing in
the current state as was_mutate_ok.
(CVE-2020-12723)
(cherry picked from commit 3445383845ed220eaa12cd406db2067eb7b8a741)
|
|
|
|
|
|
| |
(CVE-2020-10878)
(cherry picked from commit 4fccd2d99bdeb28c2937c3220ea5334999564aa8)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
One analyzer said that what this commit changes was C undefined
behavior, in casting void* pointers. Right now, the only actual type it
is called with is SV*, but I made it void*, because I thought it might
be used more generally. But it turns out that Unicode is planning to
change its regular expression processing requirements in a way that
means what I have will no longer make sense. And, since only SV* is actually used,
this commit changes the void* to SV*, removing any undefined behavior,
with no changes to program logic.
The changes for the new Unicode direction will probably come in 5.34;
their document is still in draft, but I anticipate it will soon be
finalized.
|
|
|
|
|
|
|
|
|
| |
Prior to this commit, specifying a named sequence would result in a
mostly unhelpful fatal error message. This makes their use legal.
This is also the beginning of allowing Unicode string properties, which
are a new thing in the (still draft) Unicode requirements for regular
expression parsing, UTS 18. Full compliance will have to come later.
|
|
|
|
|
|
|
|
| |
This fixes #17612
This adds an inline function to pp_hot to be called to determine if
debugging info should be output or not, regardless of whether it comes
from -Dr, or from a 'use re Debug' statement.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Optype appears to be almost completely unused, and on Win32 builds
we saw warnings from the cmpchain patches:
    perly.y(1063) : warning C4244: 'function' : conversion from 'I32' to 'Optype', possible loss of data
    perly.y(1065) : warning C4244: 'function' : conversion from 'I32' to 'Optype', possible loss of data
    perly.y(1079) : warning C4244: 'function' : conversion from 'I32' to 'Optype', possible loss of data
    perly.y(1081) : warning C4244: 'function' : conversion from 'I32' to 'Optype', possible loss of data
Reviewing the code I noticed that functions like Perl_newBINOP() have
an I32 type argument, and functions like OpTYPE_set() coerce such
arguments into type OPCODE:
    #define OpTYPE_set(o,type) \
        STMT_START { \
            o->op_type = (OPCODE)type; \
            o->op_ppaddr = PL_ppaddr[type]; \
        } STMT_END
This patch changes the signature of the new cmpchain functions so that
they do the same, and changes the type used to store op_type values
to OPCODE as well, like most of the other op.c code.
|
| |
|
|
|
|
|
|
|
| |
Pattern compilation is not a performance critical process; there's no
need to request these to be inlined. Let the compiler decide, given
they are static anyway. This came up because g++ was warning they
weren't getting inlined anyway.
|
|
|
|
|
|
|
| |
This reverts commit 6c714a09cc08600278e72aea1fcdf83576d061b4.
croak_memory_wrap is designed to save a few bytes of memory, and was
never intended to be inlined. This commit moves it to util.c where the
other croak functions are.
|
|
|
|
|
| |
Use of this function was removed as part of adding wildcarding to the
Unicode name property.
|
|
|
|
|
|
|
|
|
| |
The compilation of User-defined properties in a regular expression that
haven't been defined at the time that pattern is compiled is deferred
until execution time. Until this commit, any request for debugging info
on those was ignored.
This fixes that by
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous commit added a mutex specifically for protecting against
simultaneous accesses of the environment. This commit changes the
normal getenv, putenv, and clearenv functions to use it, to avoid races.
This simplifies the code in places where we had been burned and had
added special-case code to avoid races. There may well have been other
places where we were getting burned without knowing it. Now the
protection comes automatically, and we can remove the special cases we
had stumbled over earlier.
getenv() returns a pointer to static memory, which can be overwritten at
any moment from another thread, or even another getenv from the same
thread. This commit changes the accesses to be under control of a
mutex, and in the case of getenv, a mortalized copy is created so that
there is no possible race.
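The locking-plus-copy idea described above can be sketched in plain C.
This is only an illustration under stated assumptions: the names
env_mutex, env_get_copy and env_set are hypothetical, a POSIX pthread
mutex stands in for the mutex the commit refers to, and a malloc'd copy
that the caller frees stands in for the mortalized copy.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for setenv() under strict modes */
#endif
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mutex guarding every access to the environment. */
static pthread_mutex_t env_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Look up NAME and return a private copy of its value, or NULL if it is
 * unset (or memory runs out).  The copy is made while the mutex is
 * held, so a later getenv/setenv cannot overwrite what the caller sees.
 * The caller frees the result. */
static char *env_get_copy(const char *name)
{
    char *copy = NULL;
    const char *value;

    pthread_mutex_lock(&env_mutex);
    value = getenv(name);
    if (value) {
        size_t len = strlen(value) + 1;
        copy = malloc(len);
        if (copy)
            memcpy(copy, value, len);
    }
    pthread_mutex_unlock(&env_mutex);
    return copy;
}

/* Set NAME to VALUE under the same mutex. */
static int env_set(const char *name, const char *value)
{
    int rc;

    pthread_mutex_lock(&env_mutex);
    rc = setenv(name, value, 1);
    pthread_mutex_unlock(&env_mutex);
    return rc;
}
```

The point of returning a copy rather than the getenv() pointer is that
the pointer is only trustworthy while the lock is held.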
|
|
|
|
|
| |
This commit adds wildcard subpatterns for the Name and Name Aliases
properties.
|
|
|
|
|
|
|
|
| |
This makes special-cased forms such as sort { $b <=> $a }
even faster.
Also, since this commit removes PL_sort_RealCmp, it fixes the
issue with nested sort calls mentioned in gh #16129
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes #17026
Patterns can have subpatterns since 5.30. These are processed when
encountered, by suspending the main pattern compilation, compiling the
subpattern, and then matching that against the set of all legal
possibilities, which Perl knows about.
Debugging info for the compilation portion of the subpattern was added
by be8790133a0ce8fc67454e55e7849a47a0858d32, without fanfare. But,
prior to this new commit, debugging info was not available for that
matching portion of the compilation, except under DEBUGGING builds, with
-Drv. This commit adds a new option to 'use re qw(Debug ...)',
WILDCARD, to enable subpattern match debugging. Whatever other match
debugging options have been turned on will show up when a wildcard
subpattern is compiled iff WILDCARD is specified.
The output of this may be voluminous, which is why you have to ask for
it specifically. Or, the EXTRA option turns it on, along with several
other things.
|
|
|
|
|
|
|
|
| |
This adds two main functions that were previously only defined in
regcomp.c to also be defined in re_comp.c. This allows re.pm to use
debugging with them. To avoid duplicating large data structures,
several lightweight wrapper functions are added to regcomp.c that
re_comp.c calls to access those structures.
|
|
|
|
| |
These have to have a version in re_comp.c for re.pm to work on them.
|
|
|
|
|
|
| |
This moves a bunch of entries around so that they make more sense, and
consolidates some blocks that had the same #ifdefs. There should be no
change in what gets compiled.
|
|
|
|
|
|
| |
This makes sure that a function with varargs arguments is marked as
format or non-format, so that such a new function can't be added
without considering if it should be marked as 'f'.
|
|
|
|
|
| |
This enables compiler warnings when argument types don't match the
format.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This changes this function from taking two format parameters to instead
taking a single one. The reason is that the generality isn't actually
currently needed, and it prevents the function from being declared as
taking a format, which adds extra checking. If this checking had been
in effect, GH #17574 would have generated a warning message.
The reason the second format isn't required is that in all cases, both
formats are literal strings. In the macros that call this, simply
removing the comma separators between them causes the two adjacent
literals to be automagically concatenated into one by the C compiler.
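A minimal sketch of the idea, not the actual core function: report()
and REPORT_AT() are hypothetical names, and the format attribute shown
is the GCC/Clang spelling. Because the macro writes the two literals
side by side with no comma, the function receives a single format that
the compiler can check against the arguments.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical reporter taking ONE format string.  Declaring it as
 * printf-like lets GCC and Clang warn about mismatched arguments. */
static int report(char *buf, size_t size, const char *fmt, ...)
#if defined(__GNUC__)
    __attribute__((format(printf, 3, 4)))
#endif
    ;

static int report(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    int n;

    va_start(args, fmt);
    n = vsnprintf(buf, size, fmt, args);
    va_end(args);
    return n;
}

/* msg must be a string literal.  Written next to the second literal
 * with no comma in between, the two are concatenated into one. */
#define REPORT_AT(buf, size, msg, pos) \
    report(buf, size, msg " at offset %d", pos)
```

With this in place, REPORT_AT(buf, sizeof buf, "Missing braces", 7)
formats the single string "Missing braces at offset %d", and passing a
non-integer for pos would draw a format warning.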
|
|
|
|
|
|
|
|
| |
This is in preparation for being called from more than one place.
It has the salubrious effect that the wrapping we do around the user's
supplied pattern is no longer visible in the Debug output of that
pattern.
|
|
|
|
|
|
| |
This does the bulk of re_compile(), but is a private entry point,
meaning it takes an extra parameter, and a future commit will call it
from another place.
|
|
|
|
|
| |
Two of the functions are internal to the core; the third has long been
deprecated.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A set operations expression can contain a previously-compiled one
interpolated in. Prior to this commit, some heuristics were employed
to verify it actually was such a thing, and not a sort of look-alike
that wasn't necessarily valid. The heuristics actually forbade legal
ones. I don't know of any illegal ones that were let through, but it is
certainly possible. Also, the error/warning messages referred to the
heuristics, and were unhelpful at best.
The technique used instead in this commit is to return a regop only used
by this feature for any nested compilations. This guarantees that the
caller can determine if the result is valid, and what that result is
without having to do any heuristics or inspecting any flags. The
error/warning messages are changed to reflect this, and I believe are
now helpful.
This fixes the bugs in #16779
https://github.com/Perl/perl5/issues/16779#issuecomment-563987618
|
|
|
|
| |
This is in preparation for it being called from more than one place.
|
|
|
|
|
| |
There is now a function that generates this error message. This is so
that it is always the same from wherever generated.
|
|
|
|
|
| |
This is in preparation for it being used elsewhere, to reduce
duplication of code.
|
|
|
|
| |
This internal function is more properly bool, not I32.
|
|
|
|
|
| |
The remaining function in this file is moved to inline.h, just to not
have an extra file lying around with hardly anything in it.
|
|
|
|
|
|
|
|
| |
This changes warning messages for too short \0 octal constants to use
the function introduced in the previous commit. This function assures a
consistent and clear warning message, which is slightly different than
the one this commit replaces. I know of no CPAN code which depends on
this warning's wording.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit causes these functions to allow a caller to request any
messages generated to be returned to the caller, instead of always being
handled within these functions. The messages are somewhat changed from
previously to be clearer. I did not find any code in CPAN that relied
on the previous message text.
As with the previous commit for grok_bslash_c, there are two reasons to
do this, repeated here.
1) In pattern compilation this brings these messages into conformity
with the other ones that get generated in pattern compilation, where
there is a particular syntax, including marking the exact position in
the parse where the problem occurred.
2) These could generate truncated messages due to the (mostly)
single-pass nature of pattern compilation that is now in effect. It
keeps track of where during a parse a message has been output, and
won't output it again if a second parsing pass turns out to be
necessary. Prior to this commit, it had to assume that a message
from one of these functions did get output, and this caused some
out-of-bounds reads when a subparse (using a constructed pattern) was
executed. The possibility of those went away in commit 5d894ca5213,
which guarantees it won't try to read outside bounds, but that may
still mean it is outputting text from the wrong parse, giving
meaningless results. This commit should stop that possibility.
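The calling convention described above, where the caller may ask for
the message back instead of having it output, might look roughly like
this. The function name, its parameters, and the message texts are
all assumptions made for illustration; the real signatures are not
shown in this log.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical parser for up to three octal digits at S.
 *
 * If MESSAGE is non-NULL, any diagnostic is handed back through it
 * (NULL meaning "nothing to report") and the caller decides how and
 * where to output it.  If MESSAGE is NULL, the function reports the
 * problem itself, as older callers expect. */
static bool sketch_grok_octal(const char *s, unsigned *result,
                              const char **message)
{
    unsigned value = 0;
    int digits = 0;
    const char *problem = NULL;

    if (message)
        *message = NULL;

    while (digits < 3 && s[digits] >= '0' && s[digits] <= '7') {
        value = value * 8 + (unsigned) (s[digits] - '0');
        digits++;
    }
    *result = value;

    if (digits == 0)
        problem = "Missing octal digits";
    else if (s[digits] == '8' || s[digits] == '9')
        problem = "Non-octal character terminates octal constant";

    if (problem) {
        if (message)
            *message = problem;             /* caller handles it */
        else
            fprintf(stderr, "%s\n", problem);
    }
    return digits > 0;
}
```

A pattern compiler calling this with a non-NULL message pointer can
then attach its own position marker and emit the text at most once,
whichever parse it came from.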
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit causes this function to allow a caller to request any
messages generated to be returned to the caller, instead of always being
handled within this function.
As with the previous commit for grok_bslash_c, there are two reasons to
do this, repeated here.
1) In pattern compilation this brings these messages into conformity
with the other ones that get generated in pattern compilation, where
there is a particular syntax, including marking the exact position in
the parse where the problem occurred.
2) The messages could be truncated due to the (mostly) single-pass
nature of pattern compilation that is now in effect. It keeps track
of where during a parse a message has been output, and won't output
it again if a second parsing pass turns out to be necessary. Prior
to this commit, it had to assume that a message from one of these
functions did get output, and this caused some out-of-bounds reads
when a subparse (using a constructed pattern) was executed. The
possibility of those went away in commit 5d894ca5213, which
guarantees it won't try to read outside bounds, but that may still
mean it is outputting text from the wrong parse, giving meaningless
results. This commit should stop that possibility.
|
|
|
|
|
|
| |
In two functions, future commits will generalize this parameter to be
possibly a warning message instead of only an error message. Change its
name to reflect the added meaning.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit uses a variety of techniques for speeding this up. It is
now faster than blead, and has less maintenance cost than before.
Most of the checks that the current character isn't NUL are unnecessary.
The logic works on that character, even if, for some reason, you can't
trust the input length. A special test is added to not output the
illegal character message if that character is a NUL. This is simply
for backcompat.
And a switch statement is used to unroll the loop for the leading digits
in the number. This should handle most common cases. Beyond these, one
has to start worrying about overflow. So this version has removed
that worrying from the common cases.
Extra conditionals are avoided for large numbers by extracting the
portability warning message code into a separate static function called
from two different places. Simplifying this logic led me to see that if
it overflowed, it must be non-portable, so another conditional could be
removed.
Other conditionals were removed at the expense of adding parameters to
the function. This function isn't public, but is called from the
grok_hex, et al. macros. grok_hex knows, for example, that it is
looking for an 'x' prefix and not a 'b'. Previously the code had a
conditional to determine that.
Similarly in pp.c, we look for the prefix. Having found it we can start
the parse after the prefix, and tell this function not to look for it.
Previously, this work was duplicated.
The previous changes had left this function slower than blead. That is
in part due to the fact that the loop doesn't go through that many
iterations per function call, and the gcc compiler managed to optimize
away the conditionals in XDIGIT_VALUE in the call of it from the loop.
(The other call in this function did have the conditionals.)
Thanks to Sergey Aleynikov for his help on this
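The switch-based unrolling of the leading digits can be illustrated
with a small standalone sketch. It is not the core implementation:
sketch_hex_prefix(), hex_digit_value() and the STEP macro are
hypothetical, the accumulator is assumed to be 32 bits wide, and the
limit of 7 digits is chosen because 7 x 4 = 28 bits cannot overflow it.

```c
#include <stddef.h>
#include <stdint.h>

/* Value of a hex digit, or -1 if C is not one. */
static int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* One unrolled step: stop at the first non-digit ("break" leaves the
 * switch), otherwise accumulate the digit and advance. */
#define STEP                               \
    d = hex_digit_value(*s);               \
    if (d < 0) break;                      \
    value = (value << 4) | (uint32_t) d;   \
    s++

/* Parse at most 7 leading hex digits of S (LEN bytes long) with no
 * overflow checks.  *CONSUMED receives how many digits were used. */
static uint32_t sketch_hex_prefix(const char *s, size_t len,
                                  size_t *consumed)
{
    const char *start = s;
    uint32_t value = 0;
    int d;

    switch (len > 7 ? 7 : len) {
    case 7: STEP; /* FALLTHROUGH */
    case 6: STEP; /* FALLTHROUGH */
    case 5: STEP; /* FALLTHROUGH */
    case 4: STEP; /* FALLTHROUGH */
    case 3: STEP; /* FALLTHROUGH */
    case 2: STEP; /* FALLTHROUGH */
    case 1: STEP; /* FALLTHROUGH */
    case 0: break;
    }
    *consumed = (size_t) (s - start);
    return value;
}
#undef STEP
```

A caller that finds more digits after the consumed prefix would carry
on with a slower loop that does check for overflow; short inputs never
reach it.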
|
|
|
|
|
|
|
|
|
|
| |
These functions are identical in logic in the main loop, the difference
being which digits they accept. The rest of the code had slight
variations. This commit unifies the functions.
I presume the reason they were kept separate was because of speed.
Future commits will make this unified function faster than blead, and
the reduced maintenance cost makes this worthwhile.
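To illustrate why the three parsers can share one body, here is a
sketch in which the only thing that varies is the number of bits per
digit (1 for binary, 3 for octal, 4 for hex). The function name and
its parameters are hypothetical and do not reflect the core's actual
signature.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Value of a digit in any base up to 16, or -1 if C is not one. */
static int digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parse digits of base 2**SHIFT from S (LEN bytes).  Returns the
 * number of digits consumed; *RESULT gets the value accumulated so
 * far and *OVERFLOWED is set if another digit would not fit. */
static size_t sketch_grok_digits(const char *s, size_t len,
                                 unsigned shift, uint64_t *result,
                                 bool *overflowed)
{
    const uint64_t max_before_shift = UINT64_MAX >> shift;
    const int base = 1 << shift;
    uint64_t value = 0;
    size_t i;

    *overflowed = false;
    for (i = 0; i < len; i++) {
        int d = digit_value(s[i]);

        if (d < 0 || d >= base)     /* not a digit in this base */
            break;
        if (value > max_before_shift) {
            *overflowed = true;     /* shifting would lose bits */
            break;
        }
        value = (value << shift) | (uint64_t) d;
    }
    *result = value;
    return i;
}
```

Thin wrappers for each base then reduce to a call with shift set to 1,
3 or 4.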
|
|
|
|
|
|
|
| |
This commit changes this function to use memchr() instead of looping
byte-by-byte through the string. And it inlines it into 3 lines of
code. This should give comparable performance to a native libc
strnlen().
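A memchr()-based length function of the kind described might look like
the following. The name sketch_strnlen is hypothetical and this is not
necessarily the exact code the commit adds.

```c
#include <stddef.h>
#include <string.h>

/* Length of STR, but never more than MAXLEN.  memchr() does the scan,
 * so the search for the terminating NUL can use whatever optimized
 * implementation the C library provides. */
static size_t sketch_strnlen(const char *str, size_t maxlen)
{
    const char *end = (const char *) memchr(str, '\0', maxlen);

    return end ? (size_t) (end - str) : maxlen;
}
```

When no NUL is found within maxlen bytes, the function returns maxlen,
matching the behavior of the POSIX strnlen().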
|
|
|
|
| |
This parameter isn't const
|
|
|
|
| |
We handle longer strings than 31 bits.
|
|
|
|
|
| |
These generated warnings on certain platform builds, and weren't the
best types for the purpose anyway.
|
|
|
|
|
| |
This makes the first parameter consistent with the other similar
parameter.
|
|
|
|
|
|
|
| |
This is because these deal with only legal Unicode code points, which
are restricted to 21 bits, so 16 is too few, but 32 is sufficient to
hold them. Doing this saves some space/memory on 64 bit builds where an
int is 64 bits.
|
|
|
|
|
|
|
| |
These are illegal in C, but we have plenty of them around; I happened
to be looking at this function, and decided to fix it. Note that only
the macro name is illegal; the function was fine, but to change the
macro name means changing the function one.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds a new infix operator named `isa`, with the semantics that
    $x isa SomeClass
is true if and only if `$x` is a blessed object reference that is either
`SomeClass` directly, or includes the class somewhere in its @ISA
hierarchy. It is false without warning or error for non-references or
non-blessed references.
This operator respects `->isa` method overloading, and is intended to
replace boilerplate code such as
    use Scalar::Util 'blessed';
    blessed($x) and $x->isa("SomeClass")
|
|
|
|
|
|
|
|
|
| |
This was caused by a static inline function in a header that was
#included in a file that didn't use it. Normally, these functions are
#ifdef'd so as to be visible only to files in which they are used.
Some compilers warn that the function is defined but not used
otherwise. The solution is to remove this function's visibility from
the file that didn't use it.
|
|
|
|
|
|
| |
This makes it less complicated to find the lowest code point in an
inversion list. This makes the place where it's used clearer as to what
is going on. And it may eventually be used in more than one place.
|
| |
|
|
|
|
|
|
|
| |
Currently, whether the OS-level signal handler function is declared as
1-arg or 3-arg depends on the configuration. Add explicit versions of
these functions, principally so that POSIX.xs can call whichever version
of the handler it wants regardless of configuration: see next commit.
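For readers unfamiliar with the two handler shapes, here is a
standalone POSIX sketch of explicit 1-arg and 3-arg handlers and of
choosing between them at install time. All names here are
hypothetical; the core's own handler functions are not shown in this
log.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for sigaction() and siginfo_t */
#endif
#include <signal.h>
#include <stddef.h>

static volatile sig_atomic_t last_signal = 0;
static volatile sig_atomic_t saw_siginfo = 0;

/* Explicit 1-arg version: only the signal number is available. */
static void handler1(int sig)
{
    last_signal = sig;
    saw_siginfo = 0;
}

/* Explicit 3-arg version: also receives siginfo and a context. */
static void handler3(int sig, siginfo_t *info, void *context)
{
    (void) context;
    last_signal = sig;
    saw_siginfo = (info != NULL);
}

/* Install whichever version the caller asks for, independent of any
 * build-time configuration. */
static int install_handler(int sig, int want_three_args)
{
    struct sigaction sa = {0};

    sigemptyset(&sa.sa_mask);
    if (want_three_args) {
        sa.sa_flags = SA_SIGINFO;
        sa.sa_sigaction = handler3;
    }
    else {
        sa.sa_flags = 0;
        sa.sa_handler = handler1;
    }
    return sigaction(sig, &sa, NULL);
}
```

Having both functions under distinct names is what lets a caller pick
one explicitly instead of relying on how a single handler was declared.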
|