| Commit message | Author | Age | Files | Lines |
| |
|
|
|
|
|
| |
The flagp parameter currently can only be used to pass values up, not
down.
|
|
|
|
| |
to silence some compilers that were warning
|
|
|
|
|
| |
This variable will be used in future commits in more places, so compute
it just once.
|
| |
|
|
|
|
|
| |
This code is irrelevant unless the condition of the block immediately
before it is TRUE, so move it to within that block.
|
| |
|
|
|
|
|
| |
Based on a comment from @hvds, I think it is better for this function to
return an impossible node value if it didn't find a node to use.
|
| |
|
|
|
|
|
| |
Spotted by Hugo van der Sanden. Doing this caused it to attempt to be
compiled, and showed a typo.
|
|
|
|
| |
This is then used in regcomp.c to avoid an #ifdef EBCDIC
|
| |
|
|
|
|
|
|
| |
This moves the code from regcomp.c to inline.h that calculates the
position of the lone set bit in a U32. This is in preparation for use
by other call sites.
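
As an illustration only, here is a minimal sketch of one way to find the position of the lone set bit in a 32-bit word. The name `lone_bit_pos` and the binary-search approach are assumptions for this example; they are not taken from the code that was moved to inline.h, which may use a different technique.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not the function in inline.h): returns the
 * 0-based position of the only set bit in `word`. */
static unsigned lone_bit_pos(uint32_t word)
{
    unsigned pos = 0;

    /* Precondition: exactly one bit is set. */
    assert(word != 0 && (word & (word - 1)) == 0);

    /* Binary search: halve the candidate range at each step. */
    if (word & 0xFFFF0000u) { pos += 16; word >>= 16; }
    if (word & 0x0000FF00u) { pos += 8;  word >>= 8;  }
    if (word & 0x000000F0u) { pos += 4;  word >>= 4;  }
    if (word & 0x0000000Cu) { pos += 2;  word >>= 2;  }
    if (word & 0x00000002u) { pos += 1; }

    return pos;
}
```

Keeping such a helper in a shared header means every call site gets the same precondition check instead of reimplementing the bit scan.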
|
|
|
|
| |
Don't reinvent the macro
|
| |
|
|
|
|
| |
Comment change suggestions from @hvds in PR #18835.
|
|
|
|
|
|
|
|
| |
My attempt to insulate the year-old commits, finally pushed as
77a6d54c0deb1165b37dcf11c21cd334ae2579bb and
403d7eb3e4320188571cf61b9dab62ff10799f49, from the leading tab removal
failed miserably.
I spent a bunch of time sorting it all out, and this is the result.
|
| |
|
| |
|
|
|
|
|
| |
*ACCEPT already avoids this (because it is "ENDLIKE"), but gets a
related fix to stop scanning for start class.
|
| |
|
|
|
|
|
|
|
| |
This is a rebasing by @khw of part of GH #18792, which I needed to get
in now to proceed with other commits.
It also strips trailing white space from the affected files.
|
|
|
|
|
| |
S_regclass() is unwieldy. This commit splits it into two parts of nearly
equal size. More could be done.
|
|
|
|
|
|
|
|
| |
The expression we're about to add to data->pos_delta in this part of
study_chunk() can be either positive or negative; however, while we apply
an overflow check to avoid exceeding OPTIMIZE_INFTY, we were happily
subtracting from it when the expression was negative, making it no longer
infinite.
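
The rule described above can be sketched as a saturating add in which infinity is sticky. This is a toy model: `TOY_INFTY` stands in for OPTIMIZE_INFTY (its value here is an assumption), `toy_add_delta` is a hypothetical name, and the accumulator is assumed to be non-negative.

```c
#include <assert.h>
#include <limits.h>

/* Stand-in for perl's OPTIMIZE_INFTY; the value here is an assumption. */
#define TOY_INFTY LONG_MAX

/* Add `inc` (which may be negative) to a non-negative `acc`,
 * treating TOY_INFTY as a sticky "infinite" value. */
static long toy_add_delta(long acc, long inc)
{
    assert(acc >= 0);

    if (acc == TOY_INFTY)
        return TOY_INFTY;               /* never subtract from infinity */
    if (inc > 0 && acc > TOY_INFTY - inc)
        return TOY_INFTY;               /* would overflow: saturate */
    return acc + inc;                   /* acc >= 0, so this cannot overflow */
}
```

The first check is the one the commit message is about: without it, a negative increment would turn an infinite delta into a large finite one.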
|
|
|
|
| |
delta and pos_delta may hold OPTIMIZE_INFTY to represent infinity.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes GH #18604. There was a path through the code where a
particular SV did not get its reference count decremented.
I did an audit of the function and came up with several other
possibilities that are included in this commit.
Further, there would be leaks for some instances of finding syntax
errors in the input pattern, or when warnings are fatalized. Those
would require mortalizing some SVs, but that is beyond the scope of this
commit.
|
|
|
|
|
|
|
|
|
| |
Otherwise a strict linker will fail to build the extension due
to a multiply defined symbol. We used to do this but it was
removed in e513125ac7bdea1f for unknown reasons. The same
commit also defined some macros inside the function that are used
both inside and outside it, so put them where they can be seen
regardless of whether we are defining the function itself.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 122af31004 acted on the wrong assumption that NEXTOPER() and
regnext() were equivalent, and in fixing a valgrind complaint tried
to simplify code for detecting specific patterns for split() that
merited special-case handling by making them all use regnext().
As a result, the special case /\s+/ was no longer correctly detected,
resulting in a degree of pessimisation.
This commit fixes that, and avoids reading via the calculated 'next'
pointer except for the ops we need (in which cases we know it'll point
to another regop) - for the EXACT case (which we don't need), valgrind
was correctly pointing out that it points to potentially uninitialized
data.
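
To make the distinction concrete, here is a toy model, not perl's real regnode layout. It assumes that a NEXTOPER()-style step yields the node physically following the current one (the operand of a quantifier), while a regnext()-style step follows the node's stored "next" offset. All `toy_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum toy_op { TOY_END, TOY_PLUS, TOY_SPACE };

struct toy_node {
    enum toy_op op;
    unsigned next_off;   /* distance to the next node in sequence, 0 if none */
};

/* Node physically following `p`: the operand, when `p` is TOY_PLUS. */
static const struct toy_node *toy_nextoper(const struct toy_node *p)
{
    assert(p != NULL);
    return p + 1;
}

/* Node reached through the stored offset, or NULL at the end. */
static const struct toy_node *toy_regnext(const struct toy_node *p)
{
    assert(p != NULL);
    return p->next_off ? p + p->next_off : NULL;
}

/* Toy program for something like /\s+/: PLUS, its SPACE operand, END. */
static const struct toy_node toy_prog[] = {
    { TOY_PLUS,  2 },
    { TOY_SPACE, 1 },
    { TOY_END,   0 },
};

/* Detect the special case: the operand must be inspected via
 * toy_nextoper(), the continuation via toy_regnext(). */
static int toy_is_space_plus(const struct toy_node *first)
{
    const struct toy_node *after;

    if (first->op != TOY_PLUS || toy_nextoper(first)->op != TOY_SPACE)
        return 0;
    after = toy_regnext(first);
    return after != NULL && after->op == TOY_END;
}
```

In this model, using the offset-following step to look for the operand would land on the END node and miss the pattern, which is the kind of mix-up the commit message describes.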
|
| |
|
|
|
|
|
| |
This was the consensus in
http://nntp.perl.org/group/perl.perl5.porters/258489
|
|
|
|
|
|
| |
Not all three synonyms were documented.
This also fixes up related comments in regcomp.c to correspond.
|
|
|
|
|
| |
By changing a bool into a pointer, we can avoid some work and prepare
for a future commit.
|
|
|
|
|
|
|
|
| |
This moves the finding of the matching '}' for \g{ to earlier, and
creates a temporary to point to the current position in the parse. This
makes it easier to deal with backtracking; we haven't advanced the main
parse pointer, so don't have to remember how far we advanced. This will
prove advantageous in a future commit.
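
A minimal sketch of the idea, under stated assumptions: `struct toy_parse` and `toy_parse_g_braces` are hypothetical names, and the accepted syntax is simplified. The point is only that the closing '}' is located first and all work is done through a temporary cursor, so a failure leaves the main parse pointer where it was.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parser state; perl's real state structure is different. */
struct toy_parse {
    const char *pos;   /* main parse pointer */
    const char *end;   /* one past the last pattern byte */
};

/* Called with st->pos just past "\g". On success, stores the brace
 * contents in *name and *len, advances st->pos past '}' and returns 1.
 * On failure, returns 0 and leaves st->pos untouched. */
static int toy_parse_g_braces(struct toy_parse *st,
                              const char **name, size_t *len)
{
    const char *s;        /* temporary cursor */
    const char *close;

    assert(st != NULL && st->pos <= st->end);
    s = st->pos;

    if (s >= st->end || *s != '{')
        return 0;
    s++;

    /* Find the matching '}' before doing anything else. */
    close = memchr(s, '}', (size_t)(st->end - s));
    if (close == NULL || close == s)
        return 0;                     /* unterminated or empty braces */

    *name = s;
    *len = (size_t)(close - s);
    st->pos = close + 1;              /* commit only on success */
    return 1;
}
```

Because nothing is committed until the end, the caller never has to compute how far to back up.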
|
|
|
|
| |
This is considered better practice.
|
|
|
|
|
|
|
| |
Rather than know how far we have advanced in parsing when we have to
back up, use the already-existing checkpoint position. This results in
slightly more maintainable code that a future commit will take advantage
of.
|
|
|
|
|
|
|
|
| |
This change has been planned for a long time, bringing Perl into parity
with similar languages, but it took many deprecation cycles to be able
to reach the point where it could safely go in.
This fixes GH #18264
|
|
|
|
|
|
| |
Prior to this commit, the error message for a curly quantifier that had
an error in its bounds pointed to the left brace. Now the error message
points to the first bound that has a problem.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit copies portions of new_regcurly(), which has been around
since 5.28, into plain regcurly(), as a baby step in preparation for
converting entirely to the new one. These functions are used for
parsing {m,n} quantifiers. Future commits will add capabilities not
available using the old version.
The commit adds an optional parameter, to return to the caller
information it gleans during parsing.
regpiece() is changed by this commit to use this information, instead of
itself reparsing the input. Part of the reason for this commit is that
changes are planned soon to what is legal syntax. With this commit in
place, those changes only have to be done once.
This commit also extracts into a function the calculation of the
quantifier bounds. This allows the logic for that to be done in one
place instead of two.
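
The shape of that interface can be sketched as follows. This is not the real regcurly(): the names `toy_regcurly`, `toy_curly` and `toy_bound` are hypothetical, and the accepted syntax ({m}, {m,} and {m,n} with plain digits) is an assumption made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical record of what the recognizer saw. */
struct toy_curly {
    const char *min_start, *min_end;  /* digits of the lower bound */
    const char *max_start, *max_end;  /* digits of the upper bound; both NULL
                                         without a comma, equal for "{m,}" */
    int has_comma;
};

/* Returns 1 if [s, end) starts with {m}, {m,} or {m,n}. If `out` is
 * non-NULL, it receives the positions found, so the caller need not
 * reparse the input. */
static int toy_regcurly(const char *s, const char *end, struct toy_curly *out)
{
    struct toy_curly r = { NULL, NULL, NULL, NULL, 0 };

    assert(s != NULL && s <= end);
    if (s >= end || *s != '{')
        return 0;
    s++;

    r.min_start = s;
    while (s < end && isdigit((unsigned char)*s))
        s++;
    r.min_end = s;
    if (r.min_end == r.min_start)
        return 0;                       /* need at least one digit */

    if (s < end && *s == ',') {
        r.has_comma = 1;
        s++;
        r.max_start = s;
        while (s < end && isdigit((unsigned char)*s))
            s++;
        r.max_end = s;
    }
    if (s >= end || *s != '}')
        return 0;

    if (out != NULL)
        *out = r;                       /* optional parameter */
    return 1;
}

/* Bound calculation kept in one place; saturates instead of overflowing. */
static long toy_bound(const char *start, const char *stop)
{
    long v = 0;

    for (; start < stop; start++) {
        if (v > (LONG_MAX - 9) / 10)
            return LONG_MAX;
        v = v * 10 + (*start - '0');
    }
    return v;
}
```

Callers that only need a yes/no answer pass NULL; callers that go on to build the quantifier pass a record and reuse the positions.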
|
|
|
|
|
|
| |
The new names are more understandable to me. This also adds a second
parameter to one macro, that is unused until the next commit in the
series.
|
|
|
|
|
|
|
|
|
|
|
| |
This just detabifies to get rid of the mixed tab/space indentation.
Applying consistent indentation and dealing with other tabs are another issue.
Done with `expand -i`.
* vutil.* left alone, it's part of version.
* Left regen managed files alone for now.
|
|
|
|
|
|
| |
This makes the linker have to decide (or guess) which of the
identically-named symbols to include. The VMS linker refuses
and throws a multiply-defined symbol error.
|
| |
|
|
|
|
|
|
| |
The names were intended to force people to not use them outside their
intended scopes. But by restricting those scopes in the first place, we
don't need such unwieldy names
|
|
|
|
|
|
| |
This function is called only at compile time; experience has shown that
compile-time operations are not time-critical. And future commits will
lengthen it, making it not practically inlinable anyway.
|
|
|
|
|
|
|
|
| |
Many of the files in perl are for one thing only, and hence their
embedded documentation will be for that one thing. By creating a hash
here of them, those files don't have to worry about what section that
documentation goes under, and so it can be completely changed without
affecting them.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This also simplifies the flagging for these assertions: since this
error is now the only thing using in_lookhead and in_lookbehind, they
can be combined into a single in_lookaround.
Rather than conditional increment/decrement as we recurse into S_reg
I simply save the value of in_lookaround and restore it before
returning. Some unsuccessful or restart paths don't do the restore,
but they either result in a croak(), or a restart which reinitialises
in_lookaround anyway.
Also added tests to ensure that all the different zero-width assertions
with content trigger the error.
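
The save-and-restore pattern can be sketched like this. It is a toy recursion, not S_reg itself; `toy_state`, `toy_group` and `toy_reg` are hypothetical names, and the return value (how many groups were visited while inside a lookaround) exists only to make the behaviour observable.

```c
#include <assert.h>
#include <stddef.h>

struct toy_state {
    int in_lookaround;   /* non-zero while compiling inside a lookaround */
};

enum toy_kind { TOY_GROUP, TOY_LOOKAROUND };

struct toy_group {
    enum toy_kind kind;
    const struct toy_group *child;   /* nested group, or NULL */
};

/* Visits `g` and its nested groups; returns how many of them were
 * visited while st->in_lookaround was set. */
static int toy_reg(struct toy_state *st, const struct toy_group *g)
{
    int saved;
    int inside = 0;

    assert(st != NULL && g != NULL);
    saved = st->in_lookaround;          /* save on entry */

    if (g->kind == TOY_LOOKAROUND)
        st->in_lookaround = 1;
    if (st->in_lookaround)
        inside = 1;
    if (g->child != NULL)
        inside += toy_reg(st, g->child);

    st->in_lookaround = saved;          /* restore before returning */
    return inside;
}
```

Compared with paired increments and decrements, there is a single restore point per call, so an early-return path cannot leave the counter unbalanced as long as it goes through that restore.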
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was an assertion failure in regexec.c under rare circumstances. A
reduction of the fuzzed test case is now in pat_advanced.t.
The root cause of this was that the pattern being compiled was encoded in
UTF-8 and 'use locale' was in effect, equivalent to the /l charset, and
then the charset was reset inside the pattern, to /d. But /d in a UTF-8
pattern is illegal, hence the later assertion failure.
The solution is to reset instead to /u when the pattern is UTF-8.
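
Expressed as code, the rule is a one-line choice. The enum and function names below are hypothetical and stand in for whatever regcomp.c actually uses.

```c
#include <assert.h>

/* Toy charset tags standing in for the /d, /l and /u modifiers. */
enum toy_charset { TOY_CHARSET_D, TOY_CHARSET_L, TOY_CHARSET_U };

/* Charset to fall back to when the pattern resets its charset:
 * /u for a UTF-8 pattern (where /d is not allowed), otherwise /d. */
static enum toy_charset toy_reset_charset(int pattern_is_utf8)
{
    assert(pattern_is_utf8 == 0 || pattern_is_utf8 == 1);
    return pattern_is_utf8 ? TOY_CHARSET_U : TOY_CHARSET_D;
}
```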
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Generally we have to wait until runtime to do folding for regnodes that
are locale dependent, because we don't know what the locale at runtime
will be, and hence what the folds will be.
But UTF-8 locales all have the same folding behavior, no matter what the
locale is, with the exception of two fold pairs in Turkish. (Lithuanian
too, but Perl doesn't support that language's special folding rules.)
UTF-8 is the only locale type that Perl supports that can represent code
points above 255. Therefore we do know at compile time what the
above-255 folds are (again excepting the two in Turkish), and so we can
do the folding then. But only if both the components are above 255.
There are a few folds that cross the 255/256 boundary, and they must be
deferred.
However, there are two instances where there are three characters that
fold together in which two of them are above 255, and the third isn't.
That the two high ones are equivalent under /i is known at compile time,
and so that equivalence can be stated then.
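
The decision described above reduces to a simple predicate, sketched here under assumptions: `toy_can_fold_at_compile_time` is a hypothetical name, the Turkish exceptions are deliberately not modelled, and the code points in the comments are ordinary Unicode case-folding examples chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the fold between `cp` and `partner` can be resolved at
 * compile time under a locale-dependent match: both code points must be
 * above 255, where only UTF-8 locales can represent them. Folds that
 * touch the 0..255 range depend on the runtime locale and are deferred.
 * (The two Turkish fold pairs would need a separate check.) */
static int toy_can_fold_at_compile_time(uint32_t cp, uint32_t partner)
{
    assert(cp != partner);
    return cp > 255 && partner > 255;
}

/* Examples:
 *   U+0100 / U+0101 (A with macron, both cases): both above 255
 *   U+00FF / U+0178 (y with diaeresis): crosses the 255/256 boundary
 *   U+00B5, U+039C, U+03BC (micro sign and Greek mu): three characters
 *   that fold together, of which only the two Greek ones are above 255 */
```

For a trio such as the mu example, the predicate accepts the pair of high code points while still deferring any pairing that involves the low one.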
|