| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
These functions are only used when the native sockets functions are not
available, e.g. when building miniperl on Windows following commit
19253ae62c, so gcc's warning about ignoring the __malloc__ attribute here
is not normally seen.
The addition of "a" to these functions in embed.fnc by
commit f54cb97a39 was presumably wrong since none of them actually
allocate any memory (nor did so at the time), so change it to just "R"
(which is implied by the "a" and is still appropriate).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Most perls are built with PERL_MALLOC_WRAP. This causes MEM_WRAP_CHECK
macro to perform some checks on the requested allocation size in macro
Newx. The checks are performed at the caller, not in the callee (for me
on Win32 perl the callee in Newx is Perl_safesysmalloc) of Newx.
If the check fails a "Perl_croak_nocontext("%s",PL_memory_wrap)" is done.
In x86 machine code,
"if(bad_alloc) Perl_croak_nocontext("%s",PL_memory_wrap); will be written
as "cond jmp ahead ~15 bytes", "push const pointer", "push const pointer",
"call const pointer". For each Newx where the allocation amount was not a
constant (constant folding would remove the croak memory wrap branch
compleatly), the branch takes 15-19 bytes depending on x86 compiler. There
are about 80 Newx'es in the interp (win32 dynamic linking perl) that do
the memory wrap check and have a
"Perl_croak_nocontext("%s",PL_memory_wrap)" in them after all optimizations
by the compiler.
This patch reduces the memory wrap branch from 15-19 to
5 bytes on x86. Since croak_memory_wrap is a static and a noreturn, a
compiler with IPO may optimize the whole branch to "cond jmp 32 bits
relative" at each callsite. A less optimal complier may do "cond jmp 8 bits
relative (jump past the "call S_croak_memory_wrap" instruction),
then "call S_croak_memory_wrap". Both ways are better than the current
situation. The reason why croak_memory_wrap is a static and not an export
is that the compiler has more opportunity to optimize/reduce the impact of
the memory wrap branch at the call site if the target is in the same image
rather than in a different image, which would require using the platform
specific dynamic linking mechanism/export table/etc, which often requires
a new stack frame per ABI of the platform. If a dynamic linked XS module
does not use S_croak_memory_wrap it will be removed from the image by the
C compiler. If it is included in the XS image, it is a very small block
of code and a 3 byte string litteral. A CPU cache line is typically
32 or 64 bytes and a memory read is typically 16. Cutting the
instructions by 10 to 16 bytes out of "hot code" (10 of the ~80 call
sites are pp_*) is a worthy goal. In a few places the memory wrap croak is
used explictly, not from a MEM_WRAP_CHECK, this patch converts those to use
the static. If PERL_MALLOC_WRAP is undef, there are still a couple uses of
croak memory wrap, so do not keep S_croak_memory_wrap in a ifdef
PERL_MALLOC_WRAP. Also see
http://www.nntp.perl.org/group/perl.perl5.porters/2012/10/msg194383.html
and [perl #115456].
|
|
|
|
|
|
|
|
|
| |
An ANYOF node now no longer matches more than one character, since
9d53c4576e551530162e7cd79ab72ed81b1e1a0f. This code was overlooked in
the clean up commit e0193e472b025d41438e251be622aad42c9af9cc. Since the
maximum match is 1 character, there is no point in passing a ptr that
was set to indicate how far the match went, so that parameter is
removed.
|
|
|
|
|
|
| |
A recent commit has changed the algorithm used to handle multi-character
folding in bracketed character classes. The old code is no longer
needed.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In commit cdc4a174060 static noreturn function, on a C++ build, (specific
example, GCC ) got a post preprocessor prototype of
"extern "C" static void S_fn_doesnt_return(". GCC generates a compile error
if "extern "C"" and static used together. Plain C build were not affected.
This commit fixed the problem by creating 2 new static exclusive macros, so
extern "C" does not wind up on statics in a C++ build. The macros allow
enough flexibility so any compiler/platform that needs a noreturn
declaration specifier instead of a noreturn function attribute can have
one.
|
|
|
|
|
|
|
|
|
| |
In commit 12a2785c7e8 PERL_CALLCONV_NO_RET was added to allow MS Visual C's
noreturn to work. In that commit, statics did not get a PERL_CALLCONV_NO_RET
so Visual C may not always figure out that a certain static is a noreturn.
This patch fixes that and allows statics to be Visual C noreturns. I
observed a drop in the .text section from 0xBEAAF to 0xBE8CF on no
DEBUGGING 32 bit VC 2003 -01 -GL/-LTCG after applying this.
|
|
|
|
|
|
| |
To prevent this very-internal core function from being used by XS
writers, it isn't defined except if the preprocessor indicates it is
compiling certain .c files. Add regexec.c to the list
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit c72077c4fff72b66cdde1621c62fb4fd383ce093 fixed a place where
to_byte_substr() fails, but the code continued as if it had succeeded.
There is yet another place where the return is not checked. This commit
adds a check there.
However, it turns out that there is another underlying problem to
[perl #114808]. The function to_byte_substr() tries to downgrade the
substr fields in the regex program it is passed. If it fails (because
something in it is expressible only in UTF-8), it permanently changes
that field to point to PL_sv_undef, thus losing the original
information. This is fine as long as the program will be used once and
discarded. However, there are places where the program is re-used, as
in the test case introduced by this commit, and the original value has
been lost.
To solve this, this commit also changes to_byte_substr() from returning
void to instead returning bool, indicating success or failure. On
failure, the original substrs are left intact.
The calls to this function are correspondingly changed. One of them had
a trace statement when the failure happens, I reworded it to be more
general and accurate (it was slightly misleading), and added the trace
to every such place, not just the one.
In addition, I found the use of the same ternary operation in 3 or 4
consecutive lines very hard to understand; and is inefficient unless
compiled under C optimization which avoids recalculating things. So I
expanded the several nearly identical places in the code that do that so
that I could quickly see what is going on.
|
|
|
|
|
|
|
|
| |
XS code doing sv_mortalcopy(sv) will expect to get a true copy, and
not a COW ‘copy’.
So make sv_mortalcopy and wrapper around the new sv_mortalcopy_flags
that passes it SV_DO_COW_SVSETSV, which is defined as 0 for XS code.
|
|
|
|
|
|
|
|
|
| |
It is not possible to know how to interpret the returned length
without accessing the UTF8 flag, which is not reliable until
the SV has been stringified, which requires get-magic. So length
magic has not made senses since utf8 support was added. I have
removed all uses of length magic from the core, so this is now
dead code.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
mg_length returns the number of bytes if a scalar has length magic,
but the number of characters otherwise.
sv_len_utf8 used to assume that mg_length would return bytes. The
first mistake was added in commit b76347f2eb, which assumed that
mg_length would return characters. But it was #ifdeffed out until
commit ffc61ed20e.
Later, commit 5636d518683 met sv_len_utf8’s assumptions by making
mg_length return the length in characters, without accounting for
sv_len, which also used mg_length.
So we ended up with a buggy sv_len that would return a character
count for scalars with get- but not length-magic, and a byte count
otherwise.
In the previous commit, I fixed sv_len not to use mg_length at all. I
plan shortly to remove any use of mg_length (the one remaining use is
in sv_len_utf8, which is currently not called on magical values).
The reason for removing all calls to mg_length is that the returned
length cannot be converted to characters without access to the PV as
well, which requires get-magic. So length magic on scalars makes no
sense since the advent of utf8.
This commit restore mg_length to its old behaviour and lists it as
deprecated. This is mostly cosmetic, as there are no CPAN users. But
it is in the API, and I don’t know whether we can easily remove it.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I accidentally broke these in commit 85ffec3682, yet everything passed
for me under threads+mad.
PL_compcv is usually restored to its previous value at the end of
newATTRSUB when LEAVE_SCOPE is called. But BEGIN blocks are called
before that. I needed PL_compcv to be restored to its previ-
ous value before it was called, so I added LEAVE_SCOPE before
process_special_blocks.
But that caused the name to be freed before S_process_special_blocks
got a chance to look at it.
So I have now added a new parameter to S_process_special_blocks to
allow *it* to call LEAVE_SCOPE after it determines that it is a BEGIN
block, but before it calls it.
|
|
|
|
|
|
|
|
|
| |
According to the documentation, reset() with no argument resets pat-
terns. But reset "" and reset "\0foo" were also resetting patterns.
While I was at it, I fixed embedded nulls, too, though it’s not likely
anyone is using this. I could not fix the bug within the existing API
for sv_reset, so I created a new function and left the old one with
the old behaviour. Call me pear-annoyed.
|
|
|
|
|
|
|
| |
apply_attrs consisted of a top-level if/else conditional upon a bool-
ean argument. It was being called with a TRUE argument in only one
place, apply_attrs_my. Inlining that branch into apply_attrs_my actu-
ally reduces the amount of code slightly.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was broken by commit 60ac52eb5d5.
What that commit did was to merge two different queues that the
lexer had for pending tokens. Since bison requires that yylex
return exactly one token for each call, when the lexer sometimes has
to set aside tokens in a queue and return them from the next few
calls to yylex.
Formerly, there were two mechanism: the forced token queue (used by
force_next), and PL_pending_ident. PL_pending_ident was used for
names that had to be looked up in the pads.
$foo was handled like this:
First call to yylex:
1. Put '$foo' in PL_tokenbuf.
2. Set PL_pending_ident.
3. Return a '$' token.
Second call:
PL_pending_ident is set, so call S_pending_ident, which looks up
the name from PL_tokenbuf, and return the THING token containing
the appropriate op.
The forced token queue took precedence over PL_pending_ident. Chang-
ing the order (necessary for parsing ‘our sub foo($)’) caused some
XS::APItest tests to fail. So I concluded that the two queues needed
to be merged.
As a result, the $foo handling changed to this:
First call to yylex:
1. Put '$foo' in PL_tokenbuf.
2. Call force_ident_maybe_lex (S_pending_ident renamed and modi-
fied), which looks up the symbol and adds it to the forced
token queue.
3. Return a '$' token.
Second call:
Return the token from the forced token queue.
That had the unforeseen consequence of changing this:
for my $x (...) { ... }
$x;
such that the $x was still visible after the for loop. It only hap-
pened when the $ was the next token after the closing }:
$ ./miniperl -e 'for my $x(()){} $x = 3; warn $x'
Warning: something's wrong at -e line 1.
$ ./miniperl -e 'for my $x(()){} ;$x = 3; warn $x'
3 at -e line 1.
This broke Class::Declare.
The name lookup in the pad must not happen before the '$' token is
emitted. At that point, the parser has not yet created the for loop
(which includes exiting its scope), as it does not yet know whether
there is a continue block. (See the ‘FOR MY...’ branch of the
barestmt rule in perly.y.) So we must delay the name lookup till the
second call.
So we rename force_ident_maybe_lex back to S_pending_ident, removing
the force_next stuff. And we add a new force_ident_maybe_lex function
that adds a special ‘pending ident’ token to the forced token queue.
The part of yylex that handles pending tokens (case LEX_KNOWNEXT) is
modified to account for these special ‘pending ident’ tokens and call
S_pending_ident.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Perl caches SUPER methods inside packages named Foo::SUPER. But this
interferes with actual method calls on those packages (SUPER->foo,
foo::SUPER->foo).
The first time a package is looked up, it is vivified under the name
with which it is looked up. So *SUPER:: will cause that package
to be called SUPER, and *main::SUPER:: will cause it to be named
main::SUPER.
main->SUPER::isa used to be very sensitive to the name of the
main::FOO package (where the cache is kept). If it happened to be
called SUPER, that call would fail.
Fixing that bug (commit 3c104e59d83f) caused the CPAN module named
SUPER to fail, because SUPER->foo was now being treated as a
SUPER::method call. gv_fetchmeth_pvn was using the ::SUPER suffix to
determine where to look for the method. The package passed to it (the
::SUPER package) was being used to look for cached methods, but the
package with ::SUPER stripped off was being used for the rest of
lookup. 3c104e59d83f made main->SUPER::foo work by treating SUPER
as main::SUPER in that case. Mentioning *main::SUPER:: or doing a
main->SUPER::foo call before loading SUPER.pm also caused it to fail,
even before 3c104e59d83f.
Instead of using publicly-visible packages for internal caches, we
should be keeping them internal, to avoid such side effects.
This commit adds a new member to the HvAUX struct, where a hash of GVs
is stored, to cache super methods. I cannot simpy use a hash of CVs,
because I need GvCVGEN. Using a hash of GVs allows the existing
method cache code to be used.
This new hash of GVs is not actually a stash, as it has no HvAUX
struct (i.e., no name, no mro_meta). It doesn’t even need an @ISA
entry as before (which was only used to make isa caches reset), as it
shares its owner stash’s mro_meta generation numbers. In fact, the
GVs inside it have their GvSTASH pointers pointing to the owner stash.
In terms of memory use, it is probably the same as before. Every
stash and every iterated or weakly-referenced hash is now one pointer
larger than before, but every SUPER cache is smaller (no HvAUX, no
*ISA + @ISA + $ISA[0] + magic).
The code is a lot simpler now and uses fewer stash lookups, so it
should be faster.
This will break any XS code that expects the gv_fetchmeth_pvn to treat
the ::SUPER suffix as magical. This behaviour was only barely docu-
mented (the suffix was mentioned, but what it did was not), and is
unused on CPAN.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The pad slot for a my sub now holds a stub with a prototype CV
attached to it by proto magic.
The prototype is cloned on scope entry. The stub in the pad is used
when cloning, so any code that references the sub before scope entry
will be able to see that stub become defined, making these behave
similarly:
our $x;
BEGIN { $x = \&foo }
sub foo { }
our $x;
my sub foo { }
BEGIN { $x = \&foo }
Constants are currently not cloned, but that may cause bugs in
pad_push. I’ll have to look into that.
On scope exit, lexical CVs go through leave_scope’s SAVEt_CLEARSV sec-
tion, like lexical variables. If the sub is referenced elsewhere, it
is abandoned, and its proto magic is stolen and attached to a new stub
stored in the pad. If the sub is not referenced elsewhere, it is
undefined via cv_undef.
To clone my subs on scope entry, we create a sequence of introcv and
clonecv ops. See the huge comment in block_end that explains why we
need two separate ops for each CV.
To allow my subs to be defined in inner subs (my sub foo; sub { sub
foo {} }), pad_add_name_pvn and S_pad_findlex now upgrade the entry
for a my sub to a CV to begin with, so that fake entries added to pads
(fake entries are those that reference outer pads) can share the same
CV. Otherwise newMYSUB would have to add the CV to every pad that
closes over the ‘my sub’ declaration. newMYSUB no longer throws away
the initial value replacing it with a new one.
Prototypes are not currently visible to sub calls at compile time,
because the lexer sees the empty stub. A future commit will
solve that.
When I added name heks to CV’s I made mistakes in a few places, by not
turning on the CVf_NAMED flag, or by not clearing the field when free-
ing the hek. Those code paths were not exercised enough by state
subs, so the problems did not show up till now. So this commit fixes
those, too.
One of the tests in lexsub.t, involving foreach loops, was incorrect,
and has been fixed. Another test has been added to the end for a par-
ticular case of state subs closing over my subs that I broke when ini-
tially trying to get sibling my subs to close over each other, before
I had separate introcv and clonecv ops.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In making ‘sub foo’ respect previous ‘our sub’ declarations in a
recent commit, I actually made ‘state sub foo’ into a syntax error.
(At the time, I patched up MYSUB in perly.y to keep the tests for ‘"my
sub" not yet implemented’ still working.) Basically, it was creat-
ing an empty pad entry, but returning something that perly.y was not
expecting.
This commit adjusts the grammar to allow the SUB branch of barestmt to
accept a PRIVATEREF for its subname, in addition to a WORD. It reuses
the subname rule that SUB used to use (before our subs were added),
gutting it to remove the special block handling, which SUB now tokes
care of. That means the MYSUB rule will no longer turn on CvSPECIAL
on the PL_compcv that is going to be thrown away anyway.
The code for special blocks (BEGIN, END, etc.) that turns on CvSPECIAL
now checks for state subs and skips those. It only applies to our
subs and package subs.
newMYSUB has now actually been written. It basically duplicates
newATTRSUB, except for GV-specific things. It does currently vivify a
GV and set CvGV, but I am hoping to change that later. I also hope to
merge some of the code later, too.
I changed the prototype of newMYSUB to make it easier to use. It is
not used anywhere on CPAN and has always simply died, so that should
be all right.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
yylex must emit exactly one token each time it is called. Some-
times yylex needs to parse several tokens at once. That’s what
the various force functions are for. But that is also what
PL_pending_ident is for.
The various force_next, force_word, force_ident, etc., functions keep
a stack of tokens (PL_nextval/PL_nexttype) that yylex will check imme-
diately when called.
PL_pending_ident is used to track a single identifier that yylex will
hand off to S_pending_ident to handle.
S_pending_ident is the only piece of code for resolving an identi-
fier that could be lexical but could also be a package variable.
force_ident assumes it is looking for a package variable.
force_* takes precedence over PL_pending_ident.
All this means that, if an identifier needs to be looked up in the pad
on the next yylex invocation, it has to use PL_pending_ident, and the
force_* functions cannot be used at the same time.
Not realising that, when I made ‘our sub foo’ store the sub in the
pad I also made ‘our sub foo ($)’ into a syntax error, because it
was being parsed as ‘our sub ($) foo’ (the prototype being ‘forced’);
i.e., the pending tokens were being pulled out of the ‘queue’ in the
wrong order. (I put queue in quotes, because one queue and one unre-
lated buffer together don’t exactly count as ‘a queue’.)
Changing PL_pending_ident to have precedence over the force stack
breaks ext/XS-APItest/t/swaptwostmts.t, because the statement-parsing
interface does not localise PL_pending_ident. It could be changed to
do that, but I don’t think it is the right solution.
Having two separate pending token mechanisms makes things need-
lessly fragile.
This commit eliminates the PL_pending_ident mechanism and
modifies S_pending_ident (renaming it in the process to
S_force_ident_maybe_lex) to work with the force mechanism. I was
going to merge it with force_ident, but the two make incompatible
assumptions that just complicate the code if merged. S_pending_ident
needs the sigil in the same string buffer, to pass to the pad inter-
face. force_ident needs to be able to work without a sigil present.
So now we only have one queue for pending tokens and the order is more
predictable.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PL_reginput (which is actually #defined to PL_reg_state.re_state_reginput)
is, to all intents and purposes, state that is only used within
S_regmatch().
The only other places it is referenced are in S_regtry() and S_regrepeat(),
where it is used to pass the current match position back and forth between
the subs.
Do this passing instead via function args, and bingo! PL_reginput is now
just a local var of S_regmatch().
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The rules for matching whether an above-Latin1 code point are now saved
in a macro generated from a trie by regen/regcharclass.pl, and these are
now used by pp.c to test these cases. This allows removal of a wrapper
subroutine, and also there is no need for dynamic loading at run-time
into a swash.
This macro is about as big as I'm comfortable compiling in, but it
saves the building of a hash that can grow over time, and removes a
subroutine and interpreter variables. Indeed, performance benchmarks
show that it is about the same speed as a hash, but it does not require
having to load the rules in from disk the first time it is used.
|
|
|
|
|
|
|
| |
One of these functions is currently commented out. The other is called
only in regexec.c in one place, and was recently revised to no longer
require the static function in utf8.c that it formerly called. They can
be made static inline.
|
|
|
|
|
|
|
|
|
|
| |
A previous commit has caused macros to be generated that will match
Unicode code points of interest to the \X algorithm. This patch uses
them. This speeds up modern Korean processing by 15%.
Together with recent previous commits, the throughput of modern Korean
under \X has more than doubled, and is now comparable to other
languages (which have increased themselved by 35%)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The debugger implements breakpoints by setting/clearing OPf_SPECIAL on
OP_DBSTATE ops. This means that it is writing to the optree at runtime,
and it falls foul of the enforced read-only OP slabs when debugging with
-DPERL_DEBUG_READONLY_OPS
Avoid this by removing static from Slab_to_rw(), and using it and Slab_to_ro()
in Perl_magic_setdbline() to temporarily make the slab re-write whilst
changing the breakpoint flag.
With this all tests pass with -DPERL_DEBUG_READONLY_OPS (on this system)
|
|
|
|
| |
This makes it consistent with Perl_Slab_to_ro(), which takes an OPSLAB *.
|
|
|
|
|
| |
By calling get-magic twice, it could cause its string buffer to be
reallocated, resulting in incorrect and random return values.
|
|
|
|
|
|
|
|
|
| |
Prior to this commit 98.4% of Unicode code points that went through \X
had to be looked up to see if they begin a grapheme cluster; then looked
up again to find that they didn't require special handling. This commit
refactors things so only one look-up is required for those 98.4%. It
changes the table generated by mktables to accomplish this, and hence
the name of it, and references to it are changed to correspond.
|
|
|
|
|
|
|
|
|
|
|
| |
This changes code to be able to handle Unicode 6.2, while continuing to
handle all prevrious releases.
The major change was a new definition of \X, which adds a property to
its calculation. Unfortunately \X is hard-coded into regexec.c, and so
has to revised whenever there is a change of this magnitude in Unicode,
which fortunately isn't all that often. I refactored the code in
mktables to make it easier next time there is a change like this one.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since 6ea72b3a1, rv2hv and padhv have had the ability to return boo-
leans in scalar context, instead of bucket stats, if flagged the right
way. sub { %hash || ... } is optimised to take advantage of this. If
the || is in unknown context at compile time, the %hash is flagged as
being maybe a true boolean. When flagged that way, it returns a bool-
ean if block_gimme() returns G_VOID.
If rv2hv and padhv can already do this, then we don’t need the
boolkeys op any more. We can just flag the rv2hv to return a boolean.
In all the cases where boolkeys was used, we know at compile time that
it is true boolean context, so we add a new flag for that.
|
|
|
|
|
|
|
| |
Now that we have a flags parameter, we can get put this parameter as
just another flag, giving a cleaner interface to this internal-only
function. This also renames the flag parameter to <flag_p> to indicate
it needs to be dereferenced.
|
|
|
|
|
| |
This function only does something on EBCDIC platforms. On ASCII ones
make it a macro, like similar ones to avoid useless function nesting
|
|
|
|
|
|
|
|
|
|
|
| |
This revises the API for the version of swash_init() that is usable
by core Perl. The external interface is unaffected. There is now a
flags parameter to allow for future growth. And the core internal-only
function that returns if a swash has a user-defined property in it or
not has been removed. This information is now returned via the new
flags parameter upon initialization, and is unavailable afterwards.
This is to prepare for the flexibility to change the swash that is
needed in future commits.
|
|
|
|
|
|
|
| |
Benchmarking showed some speed-up when the result of the previous
search in an inversion list is cached, thus potentially avoiding a
search in the next call. This adds a field to each inversion list which
caches its previous search result.
|
|
|
|
|
|
| |
In looking at \X handling, I noticed that this function which is
intended for use in it, actually isn't used. This function may someday
be useful, so I'm leaving the source in.
|
|
|
|
|
|
| |
This populates inline_invlist.c with some static inline functions and
macro defines. These are the ones that are anticipated to be needed in
the near term outside regcomp.c
|
|
|
|
|
|
| |
These two functions will be moved into a header in a future commit,
where they will be accessible outside regcomp.c Prefix their names with
an underscore to emphasize that they are private
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CVs close over their outer CVs. So, when you write:
my $x = 52;
sub foo {
sub bar {
sub baz {
$x
}
}
}
baz’s CvOUTSIDE pointer points to bar, bar’s CvOUTSIDE points to foo,
and foo’s to the main cv.
When the inner reference to $x is looked up, the CvOUTSIDE chain is
followed, and each sub’s pad is looked at to see if it has an $x.
(This happens at compile time.)
It can happen that bar is undefined and then redefined:
undef &bar;
eval 'sub bar { my $x = 34 }';
After this, baz will still refer to the main cv’s $x (52), but, if baz
had ‘eval '$x'’ instead of just $x, it would see the new bar’s $x.
(It’s not really a new bar, as its refaddr is the same, but it has a
new body.)
This particular case is harmless, and is obscure enough that we could
define it any way we want, and it could still be considered correct.
The real problem happens when CVs are cloned.
When a CV is cloned, its name pad already contains the offsets into
the parent pad where the values are to be found. If the outer CV
has been undefined and redefined, those pad offsets can be com-
pletely bogus.
Normally, a CV cannot be cloned except when its outer CV is running.
And the outer CV cannot have been undefined without also throwing
away the op that would have cloned the prototype.
But formats can be cloned when the outer CV is not running. So it
is possible for cloned formats to close over bogus entries in a new
parent pad.
In this example, \$x gives us an array ref. It shows ARRAY(0xbaff1ed)
instead of SCALAR(0xdeafbee):
sub foo {
my $x;
format =
@
($x,warn \$x)[0]
.
}
undef &foo;
eval 'sub foo { my @x; write }';
foo
__END__
And if the offset that the format’s pad closes over is beyond the end
of the parent’s new pad, we can even get a crash, as in this case:
eval
'sub foo {' .
'{my ($a,$b,$c,$d,$e,$f,$g,$h,$i,$j,$k,$l,$m,$n,$o,$p,$q,$r,$s,$t,$u)}'x999
. q|
my $x;
format =
@
($x,warn \$x)[0]
.
}
|;
undef &foo;
eval 'sub foo { my @x; my $x = 34; write }';
foo();
__END__
So now, instead of using CvROOT to identify clones of
CvOUTSIDE(format), we use the padlist ID instead. Padlists don’t
actually have an ID, so we give them one. Any time a sub is cloned,
the new padlist gets the same ID as the old. The format needs to
remember what its outer sub’s padlist ID was, so we put that in the
padlist struct, too.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to fix a bug, I need to add new fields to padlists. But I
cannot easily do that as long as they are AVs.
So I have created a new padlist struct.
This not only allows me to extend the padlist struct with new members
as necessary, but also saves memory, as we now have a three-pointer
struct where before we had a whole SV head (3-4 pointers) + XPVAV (5
pointers).
This will unfortunately break half of CPAN, but the pad API docs
clearly say this:
NOTE: this function is experimental and may change or be
removed without notice.
This would have broken B::Debug, but a patch sent upstream has already
been integrated into blead with commit 9d2d23d981.
|
|
|
|
|
| |
Much code relies on the fact that PADLIST is typedeffed as AV.
PADLIST should be treated as a distinct type.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A bracketed character class containing a single Latin1-range character
has long been optimized into an EXACT node. Also, flags are set to
include SIMPLE. However, EXACT nodes containing code points that are
different when encoded under UTF-8 versus not UTF-8 should not be marked
simple.
To fix this, the address of the flags parameter is now passed to
regclass(), the function that parses bracketed character classes, which
now sets it appropriately. The unconditional setting of SIMPLE that was
always done in the code after calling regclass() has been removed.
In addition, the setting of the flags for EXACT nodes has been pushed
into the common function that populates them.
regclass() will also now increment the naughtiness count if optimized to
a node that normally does that. I do not understand this heuristic
behavior very well, and could not come up with a test case for it;
experimentation revealed that there are no test cases in our test suite
for which naughtiness makes any difference at all.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When parsing formats, the lexer invents tokens to feed to the parser.
So when the lexer dissects this:
format =
@<<<< @>>>>
$foo, $bar, $baz
.
The parser actually sees this (the parser knows that = . is like { }):
format =
; formline "@<<<< @>>>>\n", $foo, $bar, $baz;
.
The lexer makes no effort to make sure that the argument line is con-
tained within formline’s arguments. To make
{ do_stuff; $foo, bar }
work, the lexer supplies a ‘do’ before the block, if it is
inside a format.
This means that
$a, $b; $c, $d
feeds ($a, $b) to formline, wheras
{ $a, $b; $c, $d }
feeds ($c, $d) to formline. It also has various other
strange effects:
This script prints "# 0" as I would expect:
print "# ";
format =
@
(0 and die)
.
write
This one, locking parentheses, dies because ‘and’ has low precedence:
print "# ";
format =
@
0 and die
.
write
This does not work:
my $day = "Wed";
format =
@<<<<<<<<<<
({qw[ Sun 0 Mon 1 Tue 2 Wed 3 Thu 4 Fri 5 Sat 6 ]}->{$day})
.
write
You have to do this:
my $day = "Wed";
format =
@<<<<<<<<<<
({my %d = qw[ Sun 0 Mon 1 Tue 2 Wed 3 Thu 4 Fri 5 Sat 6 ]; \%d}->{$day})
.
write
which is very strange and shouldn’t even be valid syntax.
This does not work, because ‘no’ is not allowed in an expression:
use strict;
$::foo = "bar"
format =
@<<<<<<<<<<<
no strict; $foo
.
write;
Putting a block around it makes it work. Putting a semicolon before
‘no’ stop it from being a syntax error, but it silently does the
wrong thing.
I thought I could fix all these by putting an implicit do { ... }
around the argument line and removing the special-casing for an open-
ing brace, allowing anonymous hashrefs to work in formats, such
that this:
format =
@<<<< @>>>>
$foo, $bar, $baz
.
would turn into this:
format =
; formline "@<<<< @>>>>\n", do { $foo, $bar, $baz; };
.
But that will lead to madness like this ‘working’:
format =
@
}+do{
.
It would also stop lexicals declared in one format line from being
visible in another.
So instead this commit starts being honest with the parser. We still
have some ‘invented’ tokens, to indicate the start and end of a format
line, but now it is the parser itself that understands a sequence of
format lines, instead of being fed generated code.
So the example above is now presented to the parser like this:
format = ; FORMRBRACK
"@<<<< @>>>>\n" FORMLBRACK $foo, $bar, $baz ; FORMRBRACK
; .
Note about the semicolons: The parser expects to see a semicolon at
the end of each statement. So the lexer has to supply one before
FORMRBRACK. The final dot goes through the same code that handles
closing braces, which generates a semicolon for the same reason. It’s
easier to make the parser expect a semicolon before the final dot than
to change the } code in the lexer. We use the } code for . because it
handles the internal variables that keep track of how many nested lev-
els there, what kind, etc.
The extra ;FORMRBRACK after the = is there also to keep the lexer sim-
ple (ahem). When a newline is encountered during ‘normal’ (as opposed
to format picture) parsing inside a format, that’s when the semicolon
and FORMRBRACK are emitted. (There was already a semicolon there
before this commit. I have just added FORMRBRACK in the same spot.)
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This is to allow future changes. The function now returns success or
failure, and the created regnode (if any) is set via a parameter
pointer.
I removed the 'register' declaration to get this to work, because
such declarations are considered bad form these days, e.g.,
http://stackoverflow.com/questions/314994/whats-a-good-example-of-register-variable-usage-in-c
|
|
|
|
|
|
|
|
|
| |
This was a static function which I couldn't get to be callable from the
debugging version of regcomp.c. This makes it public, but known only
in the regcomp.c source file. It changes the name to begin with an
underscore so that if someone cheats by adding preprocessor #defines,
they still have to call it with the name that convention indicates is a
private function.
|
|
|
|
| |
This function handles \N of any ilk, not just named sequences.
|
|
|
|
|
|
|
| |
The magic flags patch prevents this from ever being called, since the
OK flags work the same way for magic variables now as they have for
muggle vars, avoid these fiddly games. (It was when writing it that I
realised the value of the magic flags proposal.)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A substitution forces its target to a string upon successful substitu-
tion, even if the substitution did nothing:
$ ./perl -Ilib -le '$a = *f; $a =~ s/f/f/; print ref \$a'
SCALAR
Notice that $a is no longer a glob after s///.
But vstrings are different:
$ ./perl -Ilib -le '$a = v102; $a =~ s/f/f/; print ref \$a'
VSTRING
I fixed this in 5.16 (1e6bda93) for those cases where the vstring ends
up with a value that doesn’t correspond to the actual string:
$ ./perl -Ilib -le '$a = v102; $a =~ s/f/o/; print ref \$a'
SCALAR
It works through vstring set-magic, that does the check and removes
the magic if it doesn’t match.
I did it that way because I couldn’t think of any other way to fix
bug #29070, and I didn’t realise at the time that I hadn’t fixed
all the bugs.
By making SvTHINKFIRST true on a vstring, we force it through
sv_force_normal before any in-place string operations. We can also
make sv_force_normal handle vstrings as well. This fixes all the lin-
gering-vstring-magic bugs in just two lines, making the vstring set-
magic (which is also slow) redundant. It also allows the special case
in sv_setsv_flags to be removed.
Or at least that was what I had hoped.
It turns out that pp_subst, twists and turns in tortuous ways, and
needs special treatment for things like this.
And do_trans functions wasn’t checking SvTHINKFIRST when arguably it
should have.
I tweaked sv_2pv{utf8,byte} to avoid copying magic variables that do
not need copying.
|
|
|
|
|
|
| |
ck_chdir, added in 2006 (d4ac975e) duplicates ck_trunc, added in
1993 (79072805), except for a null op check which is harmless when
applied to chdir.
|
|
|
|
|
| |
This simply searches an inversion list without going through a swash.
It will be used in a future commit.
|
|
|
|
|
|
| |
This should have been written this way to begin with (I'm the culprit).
But we should have a method so another routine doesn't have to know the
internal details.
|