| Commit message | Author | Age | Files | Lines |
|
By calling get-magic twice, it could cause its string buffer to be
reallocated, resulting in incorrect and random return values.
|
Prior to this commit 98.4% of Unicode code points that went through \X
had to be looked up to see if they begin a grapheme cluster; then looked
up again to find that they didn't require special handling. This commit
refactors things so only one look-up is required for those 98.4%. It
changes the table generated by mktables to accomplish this, and hence
its name; references to it are changed to correspond.
|
This changes code to be able to handle Unicode 6.2, while continuing to
handle all previous releases.
The major change was a new definition of \X, which adds a property to
its calculation. Unfortunately \X is hard-coded into regexec.c, and so
has to be revised whenever there is a change of this magnitude in Unicode,
which fortunately isn't all that often. I refactored the code in
mktables to make it easier next time there is a change like this one.
|
Since 6ea72b3a1, rv2hv and padhv have had the ability to return boo-
leans in scalar context, instead of bucket stats, if flagged the right
way. sub { %hash || ... } is optimised to take advantage of this. If
the || is in unknown context at compile time, the %hash is flagged as
being maybe a true boolean. When flagged that way, it returns a bool-
ean if block_gimme() returns G_VOID.
If rv2hv and padhv can already do this, then we don’t need the
boolkeys op any more. We can just flag the rv2hv to return a boolean.
In all the cases where boolkeys was used, we know at compile time that
it is true boolean context, so we add a new flag for that.
|
Now that we have a flags parameter, we can put this parameter as
just another flag, giving a cleaner interface to this internal-only
function. This also renames the flag parameter to <flag_p> to indicate
it needs to be dereferenced.
|
This function only does something on EBCDIC platforms. On ASCII ones,
make it a macro, like similar ones, to avoid useless function nesting.
|
This revises the API for the version of swash_init() that is usable
by core Perl. The external interface is unaffected. There is now a
flags parameter to allow for future growth. And the core internal-only
function that returns if a swash has a user-defined property in it or
not has been removed. This information is now returned via the new
flags parameter upon initialization, and is unavailable afterwards.
This is to prepare for the flexibility to change the swash that is
needed in future commits.
|
This function is not designed for a public API, and should have been so
listed.
|
Benchmarking showed some speed-up when the result of the previous
search in an inversion list is cached, thus potentially avoiding a
search in the next call. This adds a field to each inversion list which
caches its previous search result.
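The caching idea can be sketched as follows. This is a minimal illustration, not the code in regcomp.c: the struct, its field names and both function names are hypothetical, and the cache_hits counter exists only to make the effect observable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical inversion list: a sorted array of code points in which
 * even-indexed elements start a range that is in the set and
 * odd-indexed elements start a range that is not.  Names are
 * illustrative, not those used in the Perl core. */
typedef struct {
    const unsigned long *elems;  /* sorted range boundaries */
    size_t len;                  /* number of boundaries */
    size_t prev;                 /* index found by the previous search */
    unsigned long cache_hits;    /* bookkeeping for the demonstration */
} invlist;

/* Return the index of the last element <= cp, or -1 if cp precedes the
 * first element.  The cached index is tried before the binary search. */
long invlist_search(invlist *il, unsigned long cp)
{
    size_t lo, hi;

    assert(il != NULL);
    if (il->len == 0 || cp < il->elems[0])
        return -1;

    /* Fast path: cp falls in the same range as the previous lookup. */
    if (il->prev < il->len
        && il->elems[il->prev] <= cp
        && (il->prev + 1 == il->len || cp < il->elems[il->prev + 1])) {
        il->cache_hits++;
        return (long)il->prev;
    }

    /* Slow path: binary search for the last element <= cp. */
    lo = 0;
    hi = il->len - 1;
    while (lo < hi) {
        size_t mid = lo + (hi - lo + 1) / 2;
        if (il->elems[mid] <= cp)
            lo = mid;
        else
            hi = mid - 1;
    }
    il->prev = lo;               /* remember the result for next time */
    return (long)lo;
}

/* A code point is in the set when it lands in an even-indexed range. */
int invlist_contains(invlist *il, unsigned long cp)
{
    long i = invlist_search(il, cp);
    return i >= 0 && (i % 2) == 0;
}
```

Consecutive lookups of nearby code points, which is the common case when scanning text in one script, take the fast path and skip the binary search.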
|
In looking at \X handling, I noticed that this function, which is
intended for use in it, isn't actually used. This function may someday
be useful, so I'm leaving the source in.
|
This populates inline_invlist.c with some static inline functions and
macro defines. These are the ones that are anticipated to be needed in
the near term outside regcomp.c.
|
These two functions will be moved into a header in a future commit,
where they will be accessible outside regcomp.c. Prefix their names with
an underscore to emphasize that they are private.
|
CVs close over their outer CVs. So, when you write:
    my $x = 52;
    sub foo {
        sub bar {
            sub baz {
                $x
            }
        }
    }
baz’s CvOUTSIDE pointer points to bar, bar’s CvOUTSIDE points to foo,
and foo’s to the main cv.
When the inner reference to $x is looked up, the CvOUTSIDE chain is
followed, and each sub’s pad is looked at to see if it has an $x.
(This happens at compile time.)
It can happen that bar is undefined and then redefined:
    undef &bar;
    eval 'sub bar { my $x = 34 }';
After this, baz will still refer to the main cv’s $x (52), but, if baz
had ‘eval '$x'’ instead of just $x, it would see the new bar’s $x.
(It’s not really a new bar, as its refaddr is the same, but it has a
new body.)
This particular case is harmless, and is obscure enough that we could
define it any way we want, and it could still be considered correct.
The real problem happens when CVs are cloned.
When a CV is cloned, its name pad already contains the offsets into
the parent pad where the values are to be found. If the outer CV
has been undefined and redefined, those pad offsets can be com-
pletely bogus.
Normally, a CV cannot be cloned except when its outer CV is running.
And the outer CV cannot have been undefined without also throwing
away the op that would have cloned the prototype.
But formats can be cloned when the outer CV is not running. So it
is possible for cloned formats to close over bogus entries in a new
parent pad.
In this example, \$x gives us an array ref. It shows ARRAY(0xbaff1ed)
instead of SCALAR(0xdeafbee):
    sub foo {
        my $x;
    format =
    @
    ($x,warn \$x)[0]
    .
    }
    undef &foo;
    eval 'sub foo { my @x; write }';
    foo
    __END__
And if the offset that the format’s pad closes over is beyond the end
of the parent’s new pad, we can even get a crash, as in this case:
    eval
    'sub foo {' .
    '{my ($a,$b,$c,$d,$e,$f,$g,$h,$i,$j,$k,$l,$m,$n,$o,$p,$q,$r,$s,$t,$u)}'x999
    . q|
        my $x;
    format =
    @
    ($x,warn \$x)[0]
    .
    }
    |;
    undef &foo;
    eval 'sub foo { my @x; my $x = 34; write }';
    foo();
    __END__
So now, instead of using CvROOT to identify clones of
CvOUTSIDE(format), we use the padlist ID instead. Padlists don’t
actually have an ID, so we give them one. Any time a sub is cloned,
the new padlist gets the same ID as the old. The format needs to
remember what its outer sub’s padlist ID was, so we put that in the
padlist struct, too.
|
In order to fix a bug, I need to add new fields to padlists. But I
cannot easily do that as long as they are AVs.
So I have created a new padlist struct.
This not only allows me to extend the padlist struct with new members
as necessary, but also saves memory, as we now have a three-pointer
struct where before we had a whole SV head (3-4 pointers) + XPVAV (5
pointers).
This will unfortunately break half of CPAN, but the pad API docs
clearly say this:
    NOTE: this function is experimental and may change or be
    removed without notice.
This would have broken B::Debug, but a patch sent upstream has already
been integrated into blead with commit 9d2d23d981.
|
Much code relies on the fact that PADLIST is typedeffed as AV.
PADLIST should be treated as a distinct type.
|
A bracketed character class containing a single Latin1-range character
has long been optimized into an EXACT node. Also, flags are set to
include SIMPLE. However, EXACT nodes containing code points that are
different when encoded under UTF-8 versus not UTF-8 should not be marked
simple.
To fix this, the address of the flags parameter is now passed to
regclass(), the function that parses bracketed character classes, which
now sets it appropriately. The unconditional setting of SIMPLE that was
always done in the code after calling regclass() has been removed.
In addition, the setting of the flags for EXACT nodes has been pushed
into the common function that populates them.
regclass() will also now increment the naughtiness count if optimized to
a node that normally does that. I do not understand this heuristic
behavior very well, and could not come up with a test case for it;
experimentation revealed that there are no test cases in our test suite
for which naughtiness makes any difference at all.
|
When parsing formats, the lexer invents tokens to feed to the parser.
So when the lexer dissects this:
    format =
    @<<<< @>>>>
    $foo, $bar, $baz
    .
The parser actually sees this (the parser knows that = . is like { }):
    format =
    ; formline "@<<<< @>>>>\n", $foo, $bar, $baz;
    .
The lexer makes no effort to make sure that the argument line is con-
tained within formline’s arguments. To make
    { do_stuff; $foo, bar }
work, the lexer supplies a ‘do’ before the block, if it is
inside a format.
This means that
    $a, $b; $c, $d
feeds ($a, $b) to formline, whereas
    { $a, $b; $c, $d }
feeds ($c, $d) to formline. It also has various other
strange effects:
This script prints "# 0" as I would expect:
    print "# ";
    format =
    @
    (0 and die)
    .
    write
This one, lacking parentheses, dies because ‘and’ has low precedence:
    print "# ";
    format =
    @
    0 and die
    .
    write
This does not work:
    my $day = "Wed";
    format =
    @<<<<<<<<<<
    ({qw[ Sun 0 Mon 1 Tue 2 Wed 3 Thu 4 Fri 5 Sat 6 ]}->{$day})
    .
    write
You have to do this:
    my $day = "Wed";
    format =
    @<<<<<<<<<<
    ({my %d = qw[ Sun 0 Mon 1 Tue 2 Wed 3 Thu 4 Fri 5 Sat 6 ]; \%d}->{$day})
    .
    write
which is very strange and shouldn’t even be valid syntax.
This does not work, because ‘no’ is not allowed in an expression:
    use strict;
    $::foo = "bar";
    format =
    @<<<<<<<<<<<
    no strict; $foo
    .
    write;
Putting a block around it makes it work. Putting a semicolon before
‘no’ stops it from being a syntax error, but it silently does the
wrong thing.
I thought I could fix all these by putting an implicit do { ... }
around the argument line and removing the special-casing for an open-
ing brace, allowing anonymous hashrefs to work in formats, such
that this:
    format =
    @<<<< @>>>>
    $foo, $bar, $baz
    .
would turn into this:
    format =
    ; formline "@<<<< @>>>>\n", do { $foo, $bar, $baz; };
    .
But that will lead to madness like this ‘working’:
    format =
    @
    }+do{
    .
It would also stop lexicals declared in one format line from being
visible in another.
So instead this commit starts being honest with the parser. We still
have some ‘invented’ tokens, to indicate the start and end of a format
line, but now it is the parser itself that understands a sequence of
format lines, instead of being fed generated code.
So the example above is now presented to the parser like this:
    format = ; FORMRBRACK
    "@<<<< @>>>>\n" FORMLBRACK $foo, $bar, $baz ; FORMRBRACK
    ; .
Note about the semicolons: The parser expects to see a semicolon at
the end of each statement. So the lexer has to supply one before
FORMRBRACK. The final dot goes through the same code that handles
closing braces, which generates a semicolon for the same reason. It’s
easier to make the parser expect a semicolon before the final dot than
to change the } code in the lexer. We use the } code for . because it
handles the internal variables that keep track of how many nested
levels there are, of what kind, etc.
The extra ;FORMRBRACK after the = is there also to keep the lexer sim-
ple (ahem). When a newline is encountered during ‘normal’ (as opposed
to format picture) parsing inside a format, that’s when the semicolon
and FORMRBRACK are emitted. (There was already a semicolon there
before this commit. I have just added FORMRBRACK in the same spot.)
|
This is to allow future changes. The function now returns success or
failure, and the created regnode (if any) is set via a parameter
pointer.
I removed the 'register' declaration to get this to work, because
such declarations are considered bad form these days, e.g.,
http://stackoverflow.com/questions/314994/whats-a-good-example-of-register-variable-usage-in-c
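The shape of such an API change can be sketched as below. This is only an illustration under stated assumptions: the node type, the function name parse_item and its toy rules (a NULL input fails, an empty input succeeds without creating a node) are hypothetical and do not reflect the actual function in regcomp.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a regnode. */
typedef struct { int type; } node;

/* The return value reports success or failure; the created node, if
 * any, is handed back through *node_p.  Success with *node_p == NULL
 * means "parsed fine, but nothing needed to be created", a case that a
 * pointer-returning function cannot tell apart from failure. */
bool parse_item(const char *s, node *storage, node **node_p)
{
    assert(storage != NULL && node_p != NULL);
    *node_p = NULL;

    if (s == NULL)
        return false;            /* failure: no input at all */
    if (*s == '\0')
        return true;             /* success, but no node created */

    storage->type = (unsigned char)*s;
    *node_p = storage;
    return true;                 /* success, node available */
}
```

Separating the status from the result leaves room to add further outcomes later without overloading NULL.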
|
This was a static function which I couldn't get to be callable from the
debugging version of regcomp.c. This makes it public, but known only
in the regcomp.c source file. It changes the name to begin with an
underscore so that if someone cheats by adding preprocessor #defines,
they still have to call it with the name that convention indicates is a
private function.
|
This function handles \N of any ilk, not just named sequences.
|
The magic flags patch prevents this from ever being called, since the
OK flags work the same way for magic variables now as they have for
muggle vars, avoiding these fiddly games. (It was when writing it that I
realised the value of the magic flags proposal.)
|
A substitution forces its target to a string upon successful substitu-
tion, even if the substitution did nothing:
    $ ./perl -Ilib -le '$a = *f; $a =~ s/f/f/; print ref \$a'
    SCALAR
Notice that $a is no longer a glob after s///.
But vstrings are different:
    $ ./perl -Ilib -le '$a = v102; $a =~ s/f/f/; print ref \$a'
    VSTRING
I fixed this in 5.16 (1e6bda93) for those cases where the vstring ends
up with a value that doesn’t correspond to the actual string:
    $ ./perl -Ilib -le '$a = v102; $a =~ s/f/o/; print ref \$a'
    SCALAR
It works through vstring set-magic, which does the check and removes
the magic if it doesn’t match.
I did it that way because I couldn’t think of any other way to fix
bug #29070, and I didn’t realise at the time that I hadn’t fixed
all the bugs.
By making SvTHINKFIRST true on a vstring, we force it through
sv_force_normal before any in-place string operations. We can also
make sv_force_normal handle vstrings as well. This fixes all the lin-
gering-vstring-magic bugs in just two lines, making the vstring set-
magic (which is also slow) redundant. It also allows the special case
in sv_setsv_flags to be removed.
Or at least that was what I had hoped.
It turns out that pp_subst twists and turns in tortuous ways, and
needs special treatment for things like this.
And the do_trans functions weren’t checking SvTHINKFIRST when arguably
they should have.
I tweaked sv_2pv{utf8,byte} to avoid copying magic variables that do
not need copying.
|
This simply searches an inversion list without going through a swash.
It will be used in a future commit.
|
This should have been written this way to begin with (I'm the culprit).
But we should have a method so another routine doesn't have to know the
internal details.
|
This adds _invlistEQ, which for now is commented out.
|
This code will be used in future commits in multiple places.
|
Future commits will use this paradigm in additional places, so extract
it to a function, so they all do things right. This isn't a great API,
but it works for the few places this will be called.
|
This should have been declared const.
|
Future commits will use this functionality in additional places beyond
the single one currently. It makes sense to abstract it into a
function.
|
This creates a function to hide some of the internal details of swashes
from the regex engine, which is the only authorized user, enforced
through #ifdefs in embed.fnc. These work closely together, but it's
best to have a clean interface.
|
In restore_magic(), which is called after any magic processing, all of
the public OK flags have been shifted into the private OK flags. Thus
the lack of an appropriate public OK flag was used to trigger both get
magic and required conversions. This scheme did not cover ROK, however,
so all properly written code had to make sure mg_get was called the right
number of times anyway. Meanwhile the private OK flags gained a second
purpose of marking converted but non-authoritative values (e.g. the IV
conversion of an NV), and the inadequate flag shift mechanic broke this
in some cases.
This patch removes the shift mechanic for magic flags, thus exposing (and
fixing) some improper usage of magic SVs in which mg_get() was not called
correctly. It also has the side effect of making magic get functions
specifically set their SVs to undef if that is desired, as the new behavior
of empty get functions is to leave the value unchanged. This is a feature,
as now get magic that does not modify its value, e.g. tainting, does not
have to be special cased.
The changes to cpan/ here are only temporary, for development only, to
keep blead working until upstream applies them (or something like them).
Thanks to Rik and Father C for review input.
|
This commit eliminates the old slab allocator. It had bugs in it, in
that ops would not be cleaned up properly after syntax errors. So why
not fix it? Well, the new slab allocator *is* the old one fixed.
Now that this is gone, we don’t have to worry as much about ops leak-
ing when errors occur, because it won’t happen any more.
Recent commits eliminated the only reason to hang on to it:
PERL_DEBUG_READONLY_OPS required it.
|
I want to eliminate the old slab allocator (PL_OP_SLAB_ALLOC),
but this useful debugging tool needs to be rewritten for the new
one first.
This is slightly better than under PL_OP_SLAB_ALLOC, in that CVs cre-
ated after the main CV starts running will get read-only ops, too. It
is when a CV finishes compiling and relinquishes ownership of the slab
that the slab is made read-only, because at that point it should not
be used again for allocation.
BEGIN blocks are exempt, as they are processed before the Slab_to_ro
call in newATTRSUB. The Slab_to_ro call must come at the very end,
after LEAVE_SCOPE, because otherwise the ops freed via the stack (the
SAVEFREEOP calls near the top of newATTRSUB) will make the slab writa-
ble again. At that point, the BEGIN block has already been run and
its slab freed. Maybe slabs belonging to BEGIN blocks can be made
read-only later.
Under PERL_DEBUG_READONLY_OPS, op slabs have two extra fields to
record the size and readonliness of each slab. (Only the first slab
in a CV’s slab chain uses the readonly flag, since it is conceptually
simpler to treat them all as one unit.) Without recording this infor-
mation manually, things become unbearably slow, the tests taking hours
and hours instead of minutes.
|
A previous commit has removed all calls to these two functions (moving a
large portion of the bit_fold() one to another place), and no longer sets
the variable.
|
These macros have never worked outside the Latin1 range, so this extends
them to work.
There are no tests I could find for things in handy.h, except that many
of them are called all over the place during the normal course of
events. This commit adds a new file for such testing, containing for
now only a few tests for the isBLANK macros.
|
This was brought up in ticket #113812.
Formats that are nested inside closures only work if invoked from
directly inside that closure. Calling the format from an inner sub
call won’t work.
Commit af41786fe57 stopped it from crashing, making it work as well
as 5.8, in that closed-over variables would be undefined, being
unavailable.
This commit adds a variation of the find_runcv function that can check
whether CvROOT matches an argument passed in. So we look not for the
current sub, but for the topmost sub on the call stack that is a clone
of the closure prototype that the format’s CvOUTSIDE field points to.
|
This addresses bugs #111462 and #112312 and part of #107000.
When a longjmp occurs during lexing, parsing or compilation, any ops
in C autos that are not referenced anywhere are leaked.
This commit introduces op slabs that are attached to the currently-
compiling CV. New ops are allocated on the slab. When an error
occurs and the CV is freed, any ops remaining are freed.
This is based on Nick Ing-Simmons’ old experimental op slab implemen-
tation, but it had to be rewritten to work this way.
The old slab allocator has a pointer before each op that points to a
reference count stored at the beginning of the slab. Freed ops are
never reused. When the last op on a slab is freed, the slab itself is
freed. When a slab fills up, a new one is created.
To allow iteration through the slab to free everything, I had to have
two pointers; one points to the next item (op slot); the other points
to the slab, for accessing the reference count. Ops come in different
sizes, so adding sizeof(OP) to a pointer won’t work.
The old slab allocator puts the ops at the end of the slab first, the
idea being that the leaves are allocated first, so the order will be
cache-friendly as a result. I have preserved that order for a dif-
ferent reason: We don’t need to store the size of the slab (slabs
vary in size; see below) if we can simply follow pointers to find
the last op.
I tried eliminating reference counts altogether, by having all ops
implicitly attached to PL_compcv when allocated and freed when the CV
is freed. That also allowed op_free to skip FreeOp altogether, free-
ing ops faster. But that doesn’t work in those cases where ops need
to survive beyond their CVs; e.g., re-evals.
The CV also has to have a reference count on the slab. Sometimes the
first op created is immediately freed. If the reference count of
the slab reaches 0, then it will be freed with the CV still point-
ing to it.
CVs use the new CVf_SLABBED flag to indicate that the CV has a refer-
ence count on the slab. When this flag is set, the slab is accessible
via CvSTART when CvROOT is not set, or by subtracting two pointers
(2*sizeof(I32 *)) from CvROOT when it is set. I decided to sneak the
slab into CvSTART during compilation, because enlarging the xpvcv
struct by another pointer would make all CVs larger, even though this
patch only benefits few (programs using string eval).
When the CVf_SLABBED flag is set, the CV takes responsibility for
freeing the slab. If CvROOT is not set when the CV is freed or
undeffed, it is assumed that a compilation error has occurred, so the
op slab is traversed and all the ops are freed.
Under normal circumstances, the CV forgets about its slab (decrement-
ing the reference count) when the root is attached. So the slab ref-
erence counting that happens when ops are freed takes care of free-
ing the slab. In some cases, the CV is told to forget about the slab
(cv_forget_slab) precisely so that the ops can survive after the CV is
done away with.
Forgetting the slab when the root is attached is not strictly neces-
sary, but avoids potential problems with CvROOT being written over.
There is code all over the place, both in core and on CPAN, that does
things with CvROOT, so forgetting the slab makes things more robust
and avoids potential problems.
Since the CV takes ownership of its slab when flagged, that flag is
never copied when a CV is cloned, as one CV could free a slab that
another CV still points to, since forced freeing of ops ignores the
reference count (but asserts that it looks right).
To avoid slab fragmentation, freed ops are marked as freed and
attached to the slab’s freed chain (an idea stolen from DBM::Deep).
Those freed ops are reused when possible. I did consider not reusing
freed ops, but realised that would result in significantly higher
memory use for programs with large ‘if (DEBUG) {...}’ blocks.
SAVEFREEOP was slightly problematic. Sometimes it can cause an op to
be freed after its CV. If the CV has forcibly freed the ops on its
slab and the slab itself, then we will be fiddling with a freed slab.
Making SAVEFREEOP a no-op won’t help, as sometimes an op can be
savefreed when there is no compilation error, so the op would never
be freed. It holds a reference count on the slab, so the whole
slab would leak. So SAVEFREEOP now sets a special flag on the op
(->op_savefree). The forced freeing of ops after a compilation error
won’t free any ops thus marked.
Since many pieces of code create tiny subroutines consisting of only
a few ops, and since a huge slab would be quite a bit of baggage for
those to carry around, the first slab is always very small. To avoid
allocating too many slabs for a single CV, each subsequent slab is
twice the size of the previous.
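The effect of the doubling rule can be sketched as below. This is a rough illustration only: FIRST_SLAB_SLOTS and both function names are hypothetical, sizes are counted in abstract op slots, and any upper limit the real allocator may impose on slab size is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative size of the first slab, in op slots.  The constant used
 * by the real allocator may differ. */
#define FIRST_SLAB_SLOTS 8

/* Size of the slab that follows one holding prev_slots slots; a
 * prev_slots of 0 means no slab has been allocated yet. */
size_t next_slab_slots(size_t prev_slots)
{
    return prev_slots == 0 ? FIRST_SLAB_SLOTS : prev_slots * 2;
}

/* Number of slabs allocated before the chain can hold n_ops slots. */
size_t slabs_needed(size_t n_ops)
{
    size_t slabs = 0, size = 0, capacity = 0;

    assert(n_ops < ((size_t)-1) / 4);   /* keep the arithmetic in range */
    while (capacity < n_ops) {
        size = next_slab_slots(size);
        capacity += size;
        slabs++;
    }
    return slabs;
}
```

With these numbers a tiny sub of five ops occupies a single 8-slot slab, while a body of 1000 ops needs only seven slabs, so the slab count grows logarithmically with the size of the sub.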
Smartmatch expects to be able to allocate an op at run time, run it,
and then throw it away. For that to work the op is simply mallocked
when PL_compcv hasn’t been set up. So all slab-allocated ops are
marked as such (->op_slabbed), to distinguish them from mallocked ops.
All of this is kept under lock and key via #ifdef PERL_CORE, as it
should be completely transparent. If it isn’t transparent, I would
consider that a bug.
I have left the old slab allocator (PL_OP_SLAB_ALLOC) in place, as
it is used by PERL_DEBUG_READONLY_OPS, which I am not about to
rewrite. :-)
Concerning the change from A to X for slab allocation functions:
Many times in the past, A has been used for functions that were
not intended to be public but were used for public macros. Since
PL_OP_SLAB_ALLOC is rarely used, it didn’t make sense for Perl_Slab_*
to be API functions, since they were rarely actually available. To
avoid propagating this mistake further, they are now X.
|
This fixes RT #75596.
|
There are three places that process \x. These can and did get out of
sync. This moves all three to use a common static inline function so
that they all do the same thing on the same inputs, and their behaviors
will not drift apart again.
This commit should not change current behavior. A previous commit
was designed to bring all three to identical behavior.
|
Two code paths, sv_2cv (for \&name) and get_cvn_flags (for
&{"name"}()) were using start_subparse and newATTRSUB to create a
subroutine stub, which is what usually happens for Perl subs (with
op trees).
This resulted in subs with unused pads attached to them, because
start_subparse sets up the pad, which must be accessible dur-
ing parsing.
One code path, gv_init, which (among other things) reifies a GV after
a sub declaration (like ‘sub foo;’, which for efficiency doesn’t
create a CV), created the subroutine stub itself, without using
start_subparse/newATTRSUB.
This commit takes the code from gv_init, makes it more generic so it
can apply to the other two cases, puts it in a new function called
newSTUB, and makes all three locations call it.
Now stub creation should be faster and use less memory.
Additionally, this commit causes sv_2cv and get_cvn_flags to bypass
bug #107370 (glob stringification not round-tripping properly). They
used to stringify the GV and pass the string to newATTRSUB (wrapped in
an op, of all things) for it to look up the GV again. While bug
been fixed, as it was a side effect of sv_2cv triggering bug #107370.
|
Currently, S_regcppush() pushes PL_reginput, then S_regcppop() pops its
value and returns it. However, all calls to S_regcppop() are currently in
void context, so nothing actually uses this value. So don't save it in the
first place.
|
eliminate the three vars
    PL_reglastcloseparen
    PL_reglastparen
    PL_regoffs
(which are actually aliases to PL_reg_state struct elements).
These three vars always point to the corresponding fields within the
currently executing regex; so just access those fields directly instead.
This makes switching between regexes with (??{}) simpler: just update
rex, and everything automatically references the new fields.
|
This flag pointer only stores truth, so make it a pointer to a bool rather
than to an int.
|
These two functions, which have been a pimple on the face of perl for
far too long, are no longer needed, now that regex code blocks are
compiled in a sensible manner.
This also allows S_doeval() to be simplified, now that it is no longer
called from sv_compile_2op_is_broken().
|
The previous commits in this branch have brought literal code blocks
into the New World Order; now do the same for runtime blocks, i.e. those
needing "use re 'eval'".
The main user-visible changes from this commit are that:
* the code is now fully parsed, rather than needing balanced {}'s; i.e.
  this now works:
      my $code = q[ (?{ $a = '{' }) ];
      use re 'eval';
      /$code/
* warnings and errors are now reported as coming from "(eval NNN)" rather
  than "(re_eval NNN)" (although see the next commit for some fixups to
  that). Indeed, the string "re_eval" has been expunged from the source
  and documentation.
The big internal difference is that the sv_compile_2op() and
sv_compile_2op_is_broken() functions are no longer used, and will be
removed shortly.
It works by the regex compiler detecting the presence of run-time code
blocks, and feeding the whole pattern string back into the parser (where
the run-time blocks are now seen as compile-time), then extracting out
any compiled code blocks and adding them to the mix.
For example, in the following:
    $c = '(?{"runtime"})d';
    use re 'eval';
    /a(?{"literal"})\b'$c/
At the point the regex compiler is called, the perl parser will already
have compiled the literal code block and presented it to the regex engine.
The engine examines the pattern string, sees two '(?{', but only one
accounted for by the parser, and so constructs a short string to be
evalled: based on the pattern, but with literal code-blocks blanked out,
and \ and ' escaped. In the above example, the pattern string is
    a(?{"literal"})\b'(?{"runtime"})d
and we call eval_sv() with an SV containing the text
    qr'a              \\b\'(?{"runtime"})d'
The returned qr will contain the new code-block (and associated CV and
pad) which can be extracted and added to the list of compiled code blocks
of the original pattern.
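The construction of the string to be evalled can be sketched as below. This is a simplified illustration, not the code in regcomp.c: the function name and signature are hypothetical, it handles a single literal code block given as a byte range, and it assumes that blanking overwrites each byte of the block with a space so that the length of that span is preserved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Wrap pat in qr'...', replacing the already-compiled literal code
 * block at [blk_start, blk_end) with spaces and escaping every \ and '
 * elsewhere with a backslash.  Returns the number of bytes written,
 * not counting the terminating NUL, or 0 if out is too small. */
size_t build_reparse_text(const char *pat, size_t blk_start, size_t blk_end,
                          char *out, size_t out_size)
{
    size_t len = strlen(pat);
    size_t i, n = 0;

    assert(blk_start <= blk_end && blk_end <= len);

    /* Worst case: every byte escaped, plus "qr'", the closing ' and NUL. */
    if (out_size < 2 * len + 5)
        return 0;

    out[n++] = 'q';
    out[n++] = 'r';
    out[n++] = '\'';
    for (i = 0; i < len; i++) {
        if (i >= blk_start && i < blk_end) {
            out[n++] = ' ';              /* blank out the literal block */
        } else {
            if (pat[i] == '\\' || pat[i] == '\'')
                out[n++] = '\\';         /* escape for the qr'' quoting */
            out[n++] = pat[i];
        }
    }
    out[n++] = '\'';
    out[n] = '\0';
    return n;
}
```

For the pattern in the example above, the literal block occupies bytes 1 to 14, so it would be passed as the range [1, 15); the run-time block is left intact for the parser to compile.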
Note that with this scheme, the requirement for "use re 'eval'" is easily
determined, and no longer requires all the pp_regcreset / PL_reginterp_cnt
machinery, which will be removed shortly.
Two subtleties of this scheme are that normally, \\ isn't collapsed into \
for literal regexes (unlike literal strings), and hints aren't inherited
when using eval_sv(). We get round both of these by adding and setting a
new flag, PL_reg_state.re_reparsing, which indicates that we are refeeding
a pattern into the perl parser.
|
Perl's internal function for compiling regexes that knows about code
blocks, Perl_re_op_compile, isn't part of the engine API. However, the
way that regcomp.c is dual-lifed as ext/re/re_comp.c with debugging
compiled in, means that Perl_re_op_compile is also compiled as
my_re_op_compile. These days the mechanism to choose whether to call
the main functions or the debugging my_* functions when 'use re debug' is
in scope, is the re engine API jump table. Ergo, to ensure that
my_re_op_compile gets called appropriately, this method needs adding to
the jump table.
So, I've added it, but documented as 'for perl internal use only, set to
null in your engine'.
I've also updated current_re_engine() to always return a pointer to a jump
table, even if we're using the internal engine (formerly it returned
null). This then allows us to use the simple condition (eng->op_comp)
to determine whether the current engine supports code blocks.
|
There are two sets of regex-related flags; the RXf_* which
end up in the extflags field of a REGEXP, and the PMf_*, which
are in the op_pmflags field of a PMOP.
Since I added the PMf_HAS_CV and PMf_IS_QR flags, I've been conflating
these two meanings in the single flags arg to re_op_compile(), which meant
that some bits were being misinterpreted. The only test that was failing
was peek.t, but it may have quietly broken other things that simply
weren't tested for (for example PMf_HAS_CV and RXf_SPLIT share the same
value, so something with split qr/(?{...})/ might get messed up).
So, split this arg into two; one for the RXf_* flags, and one for the PMf_*
flags.
The public regexp API continues to have only a single flags arg,
which should only be accepting RXf_* flags.
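The hazard of sharing one flags argument between two flag namespaces can be sketched as below. The DEMO_ constants, the struct and both functions are hypothetical; the values are chosen to collide on purpose, mirroring the statement above that PMf_HAS_CV and RXf_SPLIT share a value, and are not the real values from the Perl headers.

```c
#include <assert.h>

/* Two flag namespaces whose bit values overlap (illustrative values). */
#define DEMO_RXf_SPLIT   0x0100u
#define DEMO_PMf_HAS_CV  0x0100u

typedef struct {
    int is_split;   /* pattern is being compiled for split */
    int has_cv;     /* pattern op carries a code block CV */
} compile_request;

/* One conflated argument: a set bit cannot be attributed to either
 * namespace, so both meanings are reported. */
compile_request decode_one_arg(unsigned flags)
{
    compile_request r;
    r.is_split = (flags & DEMO_RXf_SPLIT) != 0;
    r.has_cv   = (flags & DEMO_PMf_HAS_CV) != 0;
    return r;
}

/* Two arguments: each bit is interpreted only within its namespace. */
compile_request decode_two_args(unsigned rx_flags, unsigned pm_flags)
{
    compile_request r;
    r.is_split = (rx_flags & DEMO_RXf_SPLIT) != 0;
    r.has_cv   = (pm_flags & DEMO_PMf_HAS_CV) != 0;
    return r;
}
```

Passing only the code-block flag through the single-argument version also reports a split, which is the kind of quiet misinterpretation described above; the two-argument version keeps the meanings apart.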
|