| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
This encapsulates a common paradigm, making sure that it is done
correctly for the platform's size.
|
| |
|
| |
|
|
|
|
| |
This is another step in the process
|
| |
|
| |
|
|
|
|
|
| |
Add missing U suffix to unsigned numeric constant; parenthesize macro
expansions for safety.
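A minimal sketch of why both changes matter, using hypothetical macro names (the `PAR_` identifiers are illustrative assumptions, not the ones touched by the commit):

```c
#include <stdint.h>

/* Hypothetical names for illustration; not the identifiers from the commit. */
#define PAR_BASE_SHIFT   2
#define PAR_EXTRA_BITS   3

/* Unparenthesized: the expansion can regroup with neighbouring operators. */
#define PAR_SHIFT_UNSAFE PAR_BASE_SHIFT + PAR_EXTRA_BITS
/* Parenthesized: always expands as a single value. */
#define PAR_SHIFT_SAFE   (PAR_BASE_SHIFT + PAR_EXTRA_BITS)

/* The U suffix keeps the constant unsigned, so the shift is done in
 * unsigned arithmetic. */
#define PAR_FLAG         (1U << PAR_SHIFT_SAFE)

/* Expands to 2 * 2 + 3, not 2 * (2 + 3). */
uint32_t par_unsafe_times_two(void) { return 2 * PAR_SHIFT_UNSAFE; }

/* Expands to 2 * (2 + 3). */
uint32_t par_safe_times_two(void)   { return 2 * PAR_SHIFT_SAFE; }

uint32_t par_flag(void)             { return PAR_FLAG; }
```

The unparenthesized form silently changes value depending on where it is used, which is the hazard the parentheses remove.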
|
|
|
|
|
| |
I don't understand why this compile error check is failing Jenkins, but
am removing it for now to get things to work.
|
|
|
|
|
|
|
|
| |
We get an integer overflow message when we left shift a 1 into the
highest bit of a word. This changes the 1's into 1U's to indicate
unsigned. This is done for all the flag bits in the affected word, as
they could get reordered by someone in the future, unintentionally
reintroducing this problem.
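The change can be sketched as follows, assuming a 32-bit `unsigned int`; the `SHL_` names and bit positions are hypothetical and only stand in for the real flag word:

```c
#include <stdint.h>

/* Hypothetical flag word; names and bit positions are illustrative only. */
#define SHL_FLAG_LOW   (1U << 0)
#define SHL_FLAG_MID   (1U << 15)
/* With a plain signed 1, this shift would move a bit into the sign
 * position of a 32-bit int, which is what the compiler complains about.
 * 1U keeps the arithmetic unsigned. */
#define SHL_FLAG_TOP   (1U << 31)

/* Every flag uses 1U, not just the top one, so reordering the bits
 * later cannot bring the problem back. */
uint32_t shl_all_flags(void) {
    return SHL_FLAG_LOW | SHL_FLAG_MID | SHL_FLAG_TOP;
}

int shl_top_is_set(uint32_t word) {
    return (word & SHL_FLAG_TOP) != 0;
}
```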
|
|
|
|
|
| |
The PL file previously referred to has been deleted, and replaced by a
different one.
|
|
|
|
| |
Align columns vertically
|
|
|
|
|
|
| |
This doesn't actually use the flag yet.
We no longer have to make version-dependent changes to
ext/Devel-Peek/t/Peek.t, (it being in /ext) so this doesn't
|
|
|
|
|
|
| |
The mask to copy bits should always include at least the compile-time
bits. By defining it in terms of the compile-time bits, we make it
easier to change and understand.
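As an illustration of the idea (a sketch with hypothetical `MSK_` names, not the actual definitions), the copy mask can be written as the compile-time set plus whatever else needs copying:

```c
#include <stdint.h>

/* Hypothetical flags; not the real definitions. */
#define MSK_COMPILETIME_A  (1U << 0)
#define MSK_COMPILETIME_B  (1U << 1)
#define MSK_COMPILETIME    (MSK_COMPILETIME_A | MSK_COMPILETIME_B)
#define MSK_RUNTIME_ONLY   (1U << 2)
#define MSK_ALSO_COPIED    (1U << 3)

/* The copy mask is built on top of the compile-time set, so adding a
 * compile-time bit automatically adds it to the mask as well. */
#define MSK_COPY_MASK      (MSK_COMPILETIME | MSK_ALSO_COPIED)

/* Replace the masked bits of dst with those of src; leave the rest alone. */
uint32_t msk_copy_flags(uint32_t dst, uint32_t src) {
    return (dst & ~MSK_COPY_MASK) | (src & MSK_COPY_MASK);
}
```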
|
|
|
|
|
|
|
|
|
| |
Previously the regex pattern compilation flags needed for this construct
would fit into an 8-bit byte. This conveniently fits into the flags
structure element of a regnode. There are changes coming that require
more than 8 bits, so in preparation, this commit adds an argument to the
node that implements (??{}) (31-bits usable for flags), and moves the
storage to that.
|
| |
This patch resolves several issues at once. The parts are
sufficiently interconnected that it is hard to break it down
into smaller commits. The tickets open for these issues are:
    RT #94490  - split and constant folding
    RT #116086 - split "\x20" doesn't work as documented
It additionally corrects some issues with cached regexes that
were exposed by the split changes (and applied to them).
It effectively reverts 5255171e6cd0accee6f76ea2980e32b3b5b8e171
and cccd1425414e6518c1fc8b7bcaccfb119320c513.
Prior to this patch the special RXf_SKIPWHITE behavior of
    split(" ", $thing)
was only available if Perl could resolve the first argument to
split at compile time, meaning under various arcane situations.
This manifested as oddities like
    my $delim = $cond ? " " : qr/\s+/;
    split $delim, $string;
and
    split $cond ? " " : qr/\s+/, $string
not behaving the same as:
    ($cond ? split(" ", $string) : split(/\s+/, $string))
which isn't very convenient.
This patch changes this by adding a new flag to the op_pmflags,
PMf_SPLIT, which lets pp_regcomp() know whether it was called
as part of split, and so allows RXf_SPLIT to be passed into run
time regex compilation. So that pattern caching works properly, we
also add a new property to the regexp structure, "compflags", and
related macros for accessing it. It preserves the original flags
passed into the compilation process, so we can compare them when
deciding whether we need to recompile.
Note that this is essentially the opposite fix from the one applied
originally to fix #94490 in 5255171e6cd0accee6f76ea2980e32b3b5b8e171.
The reverted patch was meant to make:
    split( 0 || " ", $thing )            #1
consistent with
    my $x=0; split( $x || " ", $thing )  #2
and not with
    split( " ", $thing )                 #3
This was reverted because it broke C<split("\x{20}", $thing)>, and
because one might argue that it is not #1 that does the wrong thing,
but rather that the behavior of #2 is wrong. In other words
we might expect that all three should behave the same as #3, and
that instead of "fixing" the behavior of #1 to be like #2, we should
really fix the behavior of #2 to behave like #3. (Which is what we did.)
Also, it doesn't make sense to move the special case detection logic
further from the regex engine. We really want the regex engine to decide
this stuff itself, otherwise split " ", ... wouldn't work properly with
an alternate engine. (Imagine we add a special regexp meta pattern that behaves
the same as " " does in a split /.../. For instance we might make
split /(*SPLITWHITE)/ trigger the same behavior as split " ".)
The other major change as a result of this patch is that it effectively
reverts commit cccd1425414e6518c1fc8b7bcaccfb119320c513, which
was intended to get rid of RXf_SPLIT and RXf_SKIPWHITE, and
free up bits in the regex flags structure.
But we don't want to get rid of these flags, and it turns out that
RXf_SEEN_LOOKBEHIND is used only in the same situation as the new
RXf_MODIFIES_VARS. So I have renamed RXf_SEEN_LOOKBEHIND to
RXf_NO_INPLACE_SUBST, and we now use that one flag instead of two,
which in turn allows RXf_SPLIT and RXf_SKIPWHITE to have their bits
back.
|
|
|
|
|
|
| |
Commit 3018b823898645e44b8c37c70ac5c6302b031381 removed the regular
expression operations (regnodes) that these comments refer to, replacing
them with different ones. Update the comments to be accurate.
|
|
|
|
|
|
|
|
|
|
|
|
| |
For the node types that have differing versions depending on the
character set regex modifiers, /d, /l, /u, /a, and /aa, we can use the
enum values as offsets from the base node number to derive the correct
one. This eliminates a number of tests.
Because there is no DIGITU node type, I added placeholders for it (and
NDIGITU) to avoid some special casing of it (more important in future
commits). We currently have many available node types, so can afford to
waste these two.
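The offset technique can be sketched like this. The enum, the node names, and the node numbers below are all hypothetical, and only four charset values are shown for brevity:

```c
/* Hypothetical charset enum; the order matters because the values are
 * used as offsets. */
typedef enum {
    OFS_CHARSET_DEPENDS = 0,  /* /d */
    OFS_CHARSET_LOCALE,       /* /l */
    OFS_CHARSET_UNICODE,      /* /u */
    OFS_CHARSET_ASCII         /* /a */
} ofs_charset;

/* Hypothetical node numbers. The variants must be contiguous and in the
 * same order as the enum, which is why a missing variant needs a
 * placeholder slot rather than a gap. */
enum {
    OFS_NODE_ALNUM = 10,  /* base node: /d variant */
    OFS_NODE_ALNUML,      /* /l variant */
    OFS_NODE_ALNUMU,      /* /u variant */
    OFS_NODE_ALNUMA       /* /a variant */
};

/* One addition replaces a chain of "if charset is X use node Y" tests. */
int ofs_node_for(int base_node, ofs_charset cs) {
    return base_node + (int)cs;
}
```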
|
|
|
|
| |
and add assert that a (U32 & mask) value can fit in a U8.
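A sketch of that kind of check, using standard C types in place of U32 and U8 and a hypothetical mask value:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mask; the real one is not shown in the commit message. */
#define FIT_FLAGS_MASK 0x3FU

/* Narrow a 32-bit flags word to the 8-bit field, asserting that nothing
 * is lost by the narrowing. */
uint8_t fit_pack_flags(uint32_t flags) {
    uint32_t masked = flags & FIT_FLAGS_MASK;
    assert(masked <= UINT8_MAX);  /* would fire if the mask grew past 8 bits */
    return (uint8_t)masked;
}
```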
|
|
|
|
|
|
|
| |
Sync copyright dates with actual changes according to git history.
[Plus run regen_perly.h to update the SHA-256 checksums, and
regen/regcharclass.pl to update regcharclass.h]
|
| |
|
|
|
|
|
| |
This changes the bits to add a new charset type for /aa, and other bookkeeping
for it.
|
|
|
|
|
| |
A version of the g++ compiler isn't allowing the implicit cast of U32 to an
enum. Change to use an explicit cast.
|
|
|
|
|
| |
This restricts certain constructs, like \w, to matching in the ASCII range
only.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The /d, /l, and /u regex modifiers are mutually exclusive. This patch
changes the field that stores the character set to use more than one bit
with an enum determining which one. This data structure more
closely follows the semantics of their being mutually exclusive,
conserves bits as well, and is easier to extend.
A small API is added to set and query the bit field.
This patch is not .xs source backwards compatible. A handful of CPAN
programs are affected.
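A rough sketch of such a set/query pair follows. The `CSF_` names, the shift position, and the two-bit width are assumptions made for the example, not the definitions from the patch:

```c
#include <stdint.h>

/* Hypothetical enum of mutually exclusive character sets. */
typedef enum {
    CSF_CHARSET_DEPENDS = 0,  /* /d */
    CSF_CHARSET_LOCALE,       /* /l */
    CSF_CHARSET_UNICODE       /* /u */
} csf_charset;

/* Hypothetical position and width of the field inside the flags word. */
#define CSF_CHARSET_SHIFT 5
#define CSF_CHARSET_MASK  (3U << CSF_CHARSET_SHIFT)  /* two bits hold the enum */

/* Store cs in the field, leaving all other flag bits untouched. */
void csf_set_charset(uint32_t *flags, csf_charset cs) {
    *flags &= ~CSF_CHARSET_MASK;
    *flags |= ((uint32_t)cs << CSF_CHARSET_SHIFT);
}

/* Read the field back. The cast is explicit because an integer does not
 * convert implicitly to an enum in C++. */
csf_charset csf_get_charset(uint32_t flags) {
    return (csf_charset)((flags & CSF_CHARSET_MASK) >> CSF_CHARSET_SHIFT);
}
```

Because the field holds a single enum value, two modifiers can never be set at once, which is the mutual exclusivity the message describes.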
|
|
|
|
| |
This is in preparation for adding some in-line functions.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds recognition of these modifiers, with appropriate action
for d and l. u does nothing useful yet. This allows for the
interpolation of a regex into another one without losing the character
set semantics that it was compiled with, as for the first time, the
semantics is now specified in the stringification as one of these
modifiers.
To this end, it allocates an unused bit in the structures. The
offsets change so as to not disturb other bits.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The new op_reg_common.h did not have in it all the things that made
sense for it to have, including some comment changes that I should have
made when I created it.
I also realized that the new mechanism of using shifts allowed
RXf_PMf_STD_PMMOD_SHIFT to actually control things, rather than be a
#define that one had to remember to change if those things changed
independently.
Finally, I created a check so that adding bits without adding them to
RXf_PMf_COMPILETIME will force a compilation error. (This came from the
school of hard knocks.)
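One way such a check could look is sketched below with hypothetical `CHK_` names; the real check may be structured differently:

```c
/* Hypothetical flag bits and their combined compile-time mask. */
#define CHK_FLAG_A       (1U << 0)
#define CHK_FLAG_B       (1U << 1)
#define CHK_FLAG_C       (1U << 2)
#define CHK_COMPILETIME  (CHK_FLAG_A | CHK_FLAG_B | CHK_FLAG_C)

/* Number of flag bits that exist; bumped whenever a bit is added. */
#define CHK_BIT_COUNT    3
#define CHK_ALL_BITS     ((1U << CHK_BIT_COUNT) - 1)

/* If someone adds a bit and bumps the count but forgets the mask,
 * the build stops here instead of misbehaving at run time. */
#if CHK_COMPILETIME != CHK_ALL_BITS
#  error "a flag bit was added without adding it to CHK_COMPILETIME"
#endif

unsigned chk_compiletime_mask(void) { return CHK_COMPILETIME; }
```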
|
|
|
|
|
|
|
| |
Moving the definitions of the duplicate variables makes it easier to
read. Unfortunately, the values can't be in terms of the previous ones
because defsubs_h.PL doesn't pick them up. So I've made them numeric
with a #if to make sure they don't drift off.
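The arrangement might look roughly like this. The `DRF_` names and values are hypothetical; the point is only the pairing of literal duplicates with a preprocessor guard:

```c
/* Primary definitions, written as shifts. */
#define DRF_BASE_SHIFT          0
#define DRF_PRIMARY_MULTILINE   (1U << (DRF_BASE_SHIFT + 0))
#define DRF_PRIMARY_SINGLELINE  (1U << (DRF_BASE_SHIFT + 1))

/* Duplicates written as plain numbers, so that a simple text-scanning
 * script can pick up the values without evaluating expressions. */
#define DRF_DUP_MULTILINE       1
#define DRF_DUP_SINGLELINE      2

/* Guard: fail the build if the literals drift from the primary values. */
#if DRF_DUP_MULTILINE != DRF_PRIMARY_MULTILINE
#  error "DRF_DUP_MULTILINE has drifted from DRF_PRIMARY_MULTILINE"
#endif
#if DRF_DUP_SINGLELINE != DRF_PRIMARY_SINGLELINE
#  error "DRF_DUP_SINGLELINE has drifted from DRF_PRIMARY_SINGLELINE"
#endif

unsigned drf_dup_multiline(void)  { return DRF_DUP_MULTILINE; }
unsigned drf_dup_singleline(void) { return DRF_DUP_SINGLELINE; }
```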
|
|
|
|
|
|
|
|
| |
This patch changes the variable that tells how many common bits there
are to instead be one more than that value, so bits won't get reused.
A later commit will renumber the bits in op.h and regexp.h, but for now
things are left as-is there, which means the base variables in those two
files must subtract one to compensate for the +1.
|
|
op.h and regexp.h share common elements in their data structures. They
have had to manually be kept in sync. This patch makes it easier by
putting those common parts into a common header #included by the two.
To do this, it seemed easiest to change the symbol definitions to use
left shifts to generate the flag bits. But this meant that regcomp.pl
and ext/B/defsubs_h.PL had to be taught to recognize those forms of
expressions, done in separate commits.
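A simplified sketch of the shared-layout idea, collapsed into one file for brevity. All `SHR_` names are hypothetical, and the real headers differ in detail:

```c
#include <stdint.h>

/* --- would live in the shared header --- */
#define SHR_COMMON_SHIFT  0
#define SHR_MULTILINE     (1U << (SHR_COMMON_SHIFT + 0))
#define SHR_SINGLELINE    (1U << (SHR_COMMON_SHIFT + 1))
#define SHR_FOLD          (1U << (SHR_COMMON_SHIFT + 2))
#define SHR_COMMON_BITS   3
#define SHR_COMMON_MASK   (((1U << SHR_COMMON_BITS) - 1) << SHR_COMMON_SHIFT)

/* --- each including header starts its private flags above the common
 *     ones, so the two can never collide with the shared bits --- */
#define SHR_OP_PRIVATE_A  (1U << (SHR_COMMON_SHIFT + SHR_COMMON_BITS + 0))
#define SHR_RX_PRIVATE_A  (1U << (SHR_COMMON_SHIFT + SHR_COMMON_BITS + 0))

/* Extract only the bits that both structures agree on. */
uint32_t shr_common_part(uint32_t flags) {
    return flags & SHR_COMMON_MASK;
}
```

Since every definition is derived from the shift and the bit count, changing either in one place moves all dependent flags together.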
|