| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
| |
This is in preparation for a somewhat different use to be added.
|
|
|
|
|
|
| |
Before this commit, the code looped through a bitmap looking for a set
bit. Now that we have a fast way to find where a set bit is, use it,
and avoid the fruitless iterations.
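
As a rough illustration of the difference (not the code from this commit), the sketch below finds the lowest set bit of a word in two ways. The helper names `lsb_pos_loop` and `lsb_pos_fast` are hypothetical, a 64-bit word is assumed, and the fast version uses a generic mask-based binary search rather than whatever helper the commit actually calls.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical "before" style: test each bit in turn until a set one
 * turns up.  The cost grows with the number of clear bits skipped. */
static unsigned lsb_pos_loop(uint64_t word)
{
    unsigned pos = 0;

    assert(word != 0);
    while (! (word & 1)) {
        word >>= 1;
        pos++;
    }
    return pos;
}

/* Hypothetical "after" style: isolate the lowest set bit, then work out
 * its position in a fixed six steps, one per bit of the answer. */
static unsigned lsb_pos_fast(uint64_t word)
{
    unsigned pos = 0;

    assert(word != 0);
    word &= (~word + 1);    /* keep only the lowest set bit */
    if (word & UINT64_C(0xFFFFFFFF00000000)) pos += 32;
    if (word & UINT64_C(0xFFFF0000FFFF0000)) pos += 16;
    if (word & UINT64_C(0xFF00FF00FF00FF00)) pos += 8;
    if (word & UINT64_C(0xF0F0F0F0F0F0F0F0)) pos += 4;
    if (word & UINT64_C(0xCCCCCCCCCCCCCCCC)) pos += 2;
    if (word & UINT64_C(0xAAAAAAAAAAAAAAAA)) pos += 1;
    return pos;
}
```

Both functions return the same position for any non-zero word; only the amount of work differs.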
|
| |
|
|
|
|
|
|
|
| |
This is a rebasing by @khw of part of GH #18792, which I needed to get
in now to proceed with other commits.
It also strips trailing white space from the affected files.
|
|
|
|
|
| |
Instead of calling the macro with a cast parameter, do the cast inside
the macro so the caller doesn't have to be bothered with it.
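
A minimal sketch of the idea, using made-up macro names (`IS_ASCII_NOCAST`, `IS_ASCII`) and a made-up caller rather than the macro this commit touches: once the cast lives in the macro body, a call site can pass a plain `char` without thinking about signedness.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical "before" form: correct only if the caller casts first,
 * e.g. IS_ASCII_NOCAST((uint8_t) c).  A negative char would otherwise
 * compare as less than 0x80. */
#define IS_ASCII_NOCAST(c)  ((c) < 0x80)

/* Hypothetical "after" form: the macro does the cast itself. */
#define IS_ASCII(c)         (((uint8_t) (c)) < 0x80)

/* Count the ASCII bytes in the first len bytes of s. */
static int count_ascii(const char *s, int len)
{
    int i, n = 0;

    assert(len >= 0);
    for (i = 0; i < len; i++) {
        if (IS_ASCII(s[i])) {   /* no cast needed at the call site */
            n++;
        }
    }
    return n;
}
```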
|
|
|
|
| |
Don't repeat a paradigm
|
|
|
|
|
| |
One comment there since 5.000 had become meaningless, so remove it;
add a couple of other code comments to compensate.
|
|
|
|
|
| |
This used to be called from utf8.c, but no longer; no need to make it
other than static. This allows the compiler to better optimize.
|
|
|
|
|
|
| |
Addresses this build-time warning:
suggest braces around initialization of subobject [-Wmissing-braces]
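
For readers unfamiliar with the warning, here is a generic sketch of what triggers it and how braces silence it. The `point` and `segment` types and the `segment_dx` helper are invented for illustration and are not the structures changed by this commit.

```c
#include <assert.h>

struct point   { int x; int y; };
struct segment { struct point from; struct point to; };

/* Valid C, but compilers may warn under -Wmissing-braces because the
 * nested structs are initialized from one flat list:
 *
 *     static const struct segment diag = { 0, 0, 3, 4 };
 *
 * Giving each subobject its own braces says the same thing explicitly. */
static const struct segment diag = { { 0, 0 }, { 3, 4 } };

/* Horizontal extent of a segment. */
static int segment_dx(const struct segment *s)
{
    assert(s != 0);
    return s->to.x - s->from.x;
}
```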
|
|
|
|
|
| |
This was caused by copying too many characters for the size of the
buffer. Only one character is needed.
|
| |
|
| |
|
|
|
|
|
|
| |
This bug was introduced by bb3825626ed2b1217a2ac184eff66d0d4ed6e070, and
was the result of overflowing a 32 bit space. The solution is to rework
the expression so that it can't overflow.
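
The message does not show the expression involved, so the following is only a generic illustration of the technique, assuming unsigned 32-bit operands: a midpoint written as a sum can wrap, while the reworked form never produces an intermediate value larger than its inputs. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Overflow-prone form: lo + hi may not fit in 32 bits, in which case the
 * sum wraps around before the division. */
static uint32_t midpoint_naive(uint32_t lo, uint32_t hi)
{
    return (uint32_t) (lo + hi) / 2;
}

/* Reworked form: hi - lo always fits, and adding half of it back to lo
 * cannot exceed hi. */
static uint32_t midpoint_safe(uint32_t lo, uint32_t hi)
{
    assert(lo <= hi);
    return lo + (hi - lo) / 2;
}
```

With `lo = 4000000000` and `hi = 4294967295`, the naive form wraps and returns a value below `lo`, while the safe form stays between the two bounds.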
|
| |
|
|
|
|
| |
Mostly indent because the prior commit created a new block
|
|
|
|
|
|
|
|
|
|
|
| |
Consider the pattern /A*B/ where A and B are arbitrary. The pattern
matching code tries to make a tight loop to match the span of A's. The
logic of this was not really updated when UTF-8 was added. I did
revamp it some releases ago to fix some bugs and to at least consider
UTF-8.
This commit changes it so that Unicode is now a first class citizen.
Some details are listed in the ticket GH #18414
|
|
|
|
| |
The new name reflects its new functionality coming in future commits
|
| |
|
|
|
|
|
|
| |
The names were intended to force people to not use them outside their
intended scopes. But by restricting those scopes in the first place, we
don't need such unwieldy names
|
|
|
|
|
|
|
|
|
|
|
| |
This feature allows documentation destined for perlapi or perlintern to
be split into sections of related functions, no matter where the
documentation source is. Prior to this commit the line had to contain
the exact text of the title of the section. Now it can be a $variable
name that autodoc.pl expands to the title. It still has to be an exact
match for the variable in autodoc, but now, the expanded text can be
changed in autodoc alone, without other files needing to be updated at
the same time.
|
|
|
|
|
|
| |
This makes the text look cleaner, and prepares for a future commit,
where we will want to change the variable (which can't be done with the
expression).
|
|
|
|
| |
This makes it like a corresponding variable.
|
|
|
|
|
|
|
|
|
|
|
| |
I found myself getting confused, as this most likely was named before
UTF-8 came along. It actually is just a byte, plus an out-of-bounds
value.
While I'm at it, I'm also changing the type from I32, to the perl
equivalent of the C99 'int_fast16_t', as it doesn't need to be 32 bits,
and we should let the compiler choose what size is the most efficient
that still meets our needs.
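
A small sketch of the "byte, plus an out-of-bounds value" idea, written with the C99 type directly rather than the perl equivalent. The sentinel `NO_BYTE` and the function `byte_or_none` are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sentinel: lies outside 0..255, so it can never be
 * mistaken for a real byte. */
#define NO_BYTE  (-1)

/* Return the byte at s, or NO_BYTE if s is already at the end.  The
 * result needs 256 byte values plus one sentinel, so any signed type of
 * at least 16 bits is enough; int_fast16_t lets the compiler choose the
 * most efficient such type. */
static int_fast16_t byte_or_none(const unsigned char *s,
                                 const unsigned char *end)
{
    assert(s <= end);
    return (s < end) ? (int_fast16_t) *s : (int_fast16_t) NO_BYTE;
}
```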
|
|
|
|
|
| |
This commit uses the new macros from the previous commit to simplify some
code.
|
| |
|
| |
|
|
|
|
|
| |
This is to distinguish it from a similar variable being added in a
future commit
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a follow-on to the previous commit. The case number of the main
switch statement now includes three things: the regnode op, the UTF8ness
of the target, and the UTF8ness of the pattern.
This allows the conditionals within the previous cases (which only
encoded the op), to be removed, and things to be moved around so that
there are more fall-throughs and fewer gotos, and the macros that are
called no longer have to test for UTF8ness; so I teased the UTF8 ones
apart from the non_UTF8 ones.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This uses the #defines created in the previous commit to make the switch
statement in this function incorporate the UTF8ness of both the pattern
and the target string.
The reason for this is that the first statement in nearly every case of
the switch is to test if the target string being matched is UTF-8 or
not. By putting that information into the case number, those
conditionals can be eliminated, leading to cleaner, more modular code.
I had hoped that this would also improve performance since there are
fewer conditionals, but Sergey Aleynikov did performance testing of this
change for me, and found no real noticeable gain nor loss.
Further, the cases involving matching EXACTish nodes have to also test
if the pattern is UTF-8 or not before doing anything else. I added that
information as well to the case number, so that those conditionals can
be eliminated. For the non-EXACTish nodes, it simply means that
two case statements execute the same code.
This is an intermediate commit, which only does the expansion of the
current cases into four for each. The refactoring that takes advantage
of this is in the following commit.
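
The scheme can be sketched roughly as follows. Everything here is a simplified stand-in: the two ops, the handler names, and the `CASE_NUM` macro are hypothetical and do not reproduce the #defines from the previous commit. The point is only that each op expands into four case numbers, and that an op which does not care about the pattern's UTF8ness lets two of them share code.

```c
#include <assert.h>

enum op      { OP_EXACT, OP_DIGIT };
enum handler { EXACT_BYTES, EXACT_UTF8_BOTH, EXACT_MIXED,
               DIGIT_BYTES, DIGIT_UTF8, NO_HANDLER };

#define TARGET_UTF8   1
#define PATTERN_UTF8  2

/* Fold the op and both UTF8ness bits into one switchable number. */
#define CASE_NUM(op, utf8ness)  (((op) << 2) | (utf8ness))

static enum handler pick_handler(enum op op,
                                 int target_is_utf8, int pattern_is_utf8)
{
    const int utf8ness = (target_is_utf8  ? TARGET_UTF8  : 0)
                       | (pattern_is_utf8 ? PATTERN_UTF8 : 0);

    assert(op == OP_EXACT || op == OP_DIGIT);

    switch (CASE_NUM(op, utf8ness)) {
      /* EXACTish: both pieces of information matter. */
      case CASE_NUM(OP_EXACT, 0):
        return EXACT_BYTES;
      case CASE_NUM(OP_EXACT, TARGET_UTF8 | PATTERN_UTF8):
        return EXACT_UTF8_BOTH;
      case CASE_NUM(OP_EXACT, TARGET_UTF8):
      case CASE_NUM(OP_EXACT, PATTERN_UTF8):
        return EXACT_MIXED;

      /* Non-EXACTish: only the target matters, so the two pattern
       * variants run the same code. */
      case CASE_NUM(OP_DIGIT, 0):
      case CASE_NUM(OP_DIGIT, PATTERN_UTF8):
        return DIGIT_BYTES;
      case CASE_NUM(OP_DIGIT, TARGET_UTF8):
      case CASE_NUM(OP_DIGIT, TARGET_UTF8 | PATTERN_UTF8):
        return DIGIT_UTF8;
    }
    return NO_HANDLER;
}
```

No case body has to begin with an "is the target UTF-8?" test, because that answer was already used to select the case.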
|
|
|
|
|
| |
GH #17594: the logic here expects the node to have width 1 (except for
LNBREAK); it is not expected to do the right thing on zero-width nodes.
|
|
|
|
| |
Adjust indentation as a result of the previous commit.
|
|
|
|
|
|
|
| |
There are five \b variants. Plain \b (without braces) is the outlier as
far as implementation. This commit moves the handling of plain \b to
outside the switch that handles the others. That allows the duplicate
code that previously existed to be consolidated into one occurrence.
|
|
|
|
|
| |
apidoc_section is slightly favored over head1, as it is known only to
autodoc, and can't be confused with real pod.
|
|
|
|
|
|
|
| |
I was never happy with this short form, and other people weren't either.
Now that most things are better expressed in terms of av_count, convert
the few remaining items that are clearer when referring to an index into
using the fully spelled out form
|
|
|
|
| |
This is faster, and clearer
|
| |
|
|
|
|
|
| |
This code was advancing per-byte on a UTF-8 string, which still works,
but is slower than need be.
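
To illustrate why both approaches give the same answer but differ in cost, here is a sketch that searches well-formed UTF-8 for an ASCII byte. The helpers `utf8_skip`, `find_bytewise`, and `find_charwise` are hypothetical stand-ins, not the code changed here. Per-byte stepping works because every byte of a multi-byte character is 0x80 or above and so can never equal an ASCII byte; per-character stepping simply never looks at the continuation bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Byte length of the UTF-8 character that starts with `first`
 * (assumes well-formed input). */
static size_t utf8_skip(unsigned char first)
{
    if (first < 0x80)           return 1;
    if ((first & 0xE0) == 0xC0) return 2;
    if ((first & 0xF0) == 0xE0) return 3;
    return 4;
}

/* Step one byte at a time.  Returns the offset of `want`, or len if it
 * is absent; *examined counts the positions looked at. */
static size_t find_bytewise(const unsigned char *s, size_t len,
                            unsigned char want, size_t *examined)
{
    size_t i;

    assert(want < 0x80);
    *examined = 0;
    for (i = 0; i < len; i++) {
        (*examined)++;
        if (s[i] == want) return i;
    }
    return len;
}

/* Step one character at a time, skipping continuation bytes. */
static size_t find_charwise(const unsigned char *s, size_t len,
                            unsigned char want, size_t *examined)
{
    size_t i = 0;

    assert(want < 0x80);
    *examined = 0;
    while (i < len) {
        (*examined)++;
        if (s[i] == want) return i;
        i += utf8_skip(s[i]);
    }
    return len;
}
```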
|
|
|
|
|
| |
It only does anything under PERL_GLOBAL_STRUCT, which is gone.
Keep the dNOOP definition for CPAN back-compat.
|
| |
|
|
|
|
|
| |
A future commit will change this array so that its size isn't known at
compilation time.
|
|
|
|
|
| |
Mostly in comments and docs, but some in diagnostic messages and one
case of 'or die die'.
|
|
|
|
|
|
| |
The code this replaces relies on the internal structure of a macro,
which can change and break things. This commit changes to use a more
straightforward way of accomplishing the same thing.
|
|
|
|
|
|
|
|
|
| |
The compilation of User-defined properties in a regular expression that
haven't been defined at the time that pattern is compiled is deferred
until execution time. Until this commit, any request for debugging info
on those was ignored.
This fixes that by
|
|
|
|
|
| |
It wasn't clear to me that the macro did more than a declaration, given
its name. Rename it to be clear as to what it does.
|
|
|
|
|
|
|
|
| |
A proper debugging statement isn't controlled by DEBUG_r alone; it also
needs to say which class of debugging controls it, so that re.pm can
operate properly.
This is the second of two cases in the code where it was wrong.
|
|
|
|
|
|
|
|
| |
A proper debugging statement isn't controlled by DEBUG_r alone; it also
needs to say which class of debugging controls it, so that re.pm can
operate properly.
This is one of two cases in the code where it was wrong.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unicode 12.0 used a new property file that was not from the Unicode
Character Database. It only had a long property name. I incorporated
it into our data, and rather than use the very long name all the time, I
created my own short name, since there was no official one.
Now, the upcoming 13.0 has moved the file to the UCD, and come up with a
short name that differs from the one I had. This commit converts to use
Unicode's name. This property is not exposed to user or XS space, so
there is no user impact.
|
|
|
|
|
| |
It wasn't intended to be part of the recursion logic, and doesn't get
decremented again (GH 17490).
|
|
|
|
|
| |
This makes the first parameter consistent with the other similar
parameter.
|