| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Several commits in the 5.23 series improved the display of the compiled
ANYOF regnodes, but introduced two bugs. One of them is in \p{Any} and
similar things that match the entire range 0-255. That range is omitted,
so it looks like \p{Any} only matches code points above 255. Note that
this is only what gets displayed under -Dr. What actually gets compiled
has been and still is fine.
The other is that when displaying a pattern that still has unresolved
user-defined properties that are complemented, it doesn't show properly
that the whole thing is complemented. That is, the output looks like it
doesn't obey De Morgan's laws.
The fixes to these are quite intertwined, and so I didn't try to
separate them.
(cherry picked from commit 753b2c6a60a81dacbe59e2041e30e8302484dc2d)
|
|
|
|
|
|
|
|
|
| |
Specifically in the S_space_join_names_mortal static function that
several pp functions call. On some platforms (such as Gentoo Linux
with torsocks), hent->h_aliases (where hent is a struct hostent *) may
be null after a gethostent call.
(cherry picked from commit d35c1b5e43e773f353239d9182ddccb41cdab3d6)
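A minimal sketch of the kind of guard involved, not the actual
S_space_join_names_mortal code: join_names() is a hypothetical helper that
joins a NULL-terminated array of names with spaces and treats a NULL array
(as h_aliases may be) as an empty list instead of dereferencing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: joins a NULL-terminated array of C strings with
 * single spaces. Returns a malloc'd string the caller must free, or NULL
 * on allocation failure. A NULL array is treated as an empty list. */
static char *
join_names(char *const *names)
{
    size_t len = 1;                        /* room for the trailing NUL */
    size_t i;
    char *out, *p;

    if (names) {
        for (i = 0; names[i]; i++)
            len += strlen(names[i]) + 1;   /* name plus one separator */
    }

    out = malloc(len);
    if (!out)
        return NULL;

    p = out;
    if (names) {
        for (i = 0; names[i]; i++) {
            size_t n = strlen(names[i]);
            if (i > 0)
                *p++ = ' ';
            memcpy(p, names[i], n);
            p += n;
        }
    }
    assert((size_t)(p - out) < len);       /* still room for the NUL */
    *p = '\0';
    return out;
}
```

With this shape, join_names(NULL) yields an empty string rather than a
crash, which is the behaviour the fix is after.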
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
$ cat > foo
print "What?!\n"
^D
$ chmod +x foo
$ ./perl -Ilib -Te '$ENV{PATH}="."; exec "foo"'
Insecure directory in $ENV{PATH} while running with -T switch at -e line 1.
That is what I expect to see. But:
$ ./perl -Ilib -Te '$ENV{PATH}="/\\:."; exec "foo"'
What?!
Perl is allowing the \ to escape the :, but the \ is not treated as an
escape by the system, allowing a relative path in PATH to be considered
safe.
(cherry picked from commit ba0a4150f6f1604df236035adf6df18bd43de88e)
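The splitting rule at issue can be sketched as follows. This is only an
illustration under stated assumptions, not perl's taint-checking code:
path_is_all_absolute() is a hypothetical function that splits on every ':'
the way the system does, treating a backslash as an ordinary character, and
rejects any component that is empty or relative.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical check: returns true only if every ':'-separated component
 * of path is non-empty and starts with '/'. A backslash never escapes
 * the separator here, matching how the system searches PATH. */
static bool
path_is_all_absolute(const char *path)
{
    const char *start = path;

    assert(path != NULL);
    for (;;) {
        const char *end = strchr(start, ':');
        size_t n = end ? (size_t)(end - start) : strlen(start);

        /* An empty component means "current directory" to the system. */
        if (n == 0 || start[0] != '/')
            return false;
        if (!end)
            return true;
        start = end + 1;
    }
}
```

Under this rule the value "/\:." from the example splits into "/\" and ".",
and the second component causes the whole PATH to be rejected.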
|
|
|
|
|
|
| |
This reverts commit fea1d2dd5d210564d442a09fe034b62f262f35f9 due to it
causing problems so close to the release of 5.24. See
https://rt.perl.org/Ticket/Display.html?id=127852
|
|
|
|
|
|
|
|
| |
It had rotted a bit. Well, more than one, probably.
Move the declarations of the functions Perl_mem_log_alloc etc. from handy.h
into embed.fnc where they belong, and where Malloc_t will have already
been defined.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
c:\p523\src\inline.h(211) : warning C4267: 'function' : conversion from 'size_t' to 'I32', possible loss of data
c:\p523\src\inline.h(212) : warning C4267: 'function' : conversion from 'size_t' to 'I32', possible loss of data
c:\p523\src\inline.h(421) : warning C4244: '=' : conversion from '__int64' to 'I32', possible loss of data
c:\p523\src\inline.h(423) : warning C4244: '=' : conversion from '__int64' to 'I32', possible loss of data
To fix the warnings at lines 211 and 212, change the function to use a
signed pointer-length type. On x64, a 64-bit to 64-bit move instruction
is 1 byte longer than a 32-bit to 32-bit move, so this commit adds a
couple more bytes of machine code to the interpreter. However, a PV's len
and cur are STRLEN, which is 64 bits on a 64-bit OS, so something bad
would happen if a very large off arg passed to Perl_utf8_hop were
truncated to 32 bits. Casting to silence the warning therefore isn't
appropriate; a bigger type is needed.
In S_cx_pushblock, an 8*(2^32)-byte, or 32 GB, perl stack malloc block
is unrealistic, and a 32 GB mark stack means infinite recursion. Cast
away those warnings.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit:
1. Renames the various dtrace probe macros into a consistent and
self-documenting pattern, e.g.
ENTRY_PROBE => PERL_DTRACE_PROBE_ENTRY
RETURN_PROBE => PERL_DTRACE_PROBE_RETURN
Since they're supposed to be defined only under PERL_CORE, this shouldn't
break anything that's not being naughty.
2. Implements the main body of these macros using a real function.
They were formerly defined along the lines of
if (PERL_SUB_ENTRY_ENABLED())
PERL_SUB_ENTRY(...);
The PERL_SUB_ENTRY() part is a macro generated by the dtrace system, which
for example on linux expands to a large bunch of assembly directives.
Replace the direct macro with a function wrapper, e.g.
if (PERL_SUB_ENTRY_ENABLED())
Perl_dtrace_probe_call(aTHX_ cv, TRUE);
This reduces the number of times the macro is expanded to one.
The new functions also take simpler args and then process the values they
need using intermediate temporary vars to avoid huge macro expansions.
For example
ENTRY_PROBE(CvNAMED(cv)
? HEK_KEY(CvNAME_HEK(cv))
: GvENAME(CvGV(cv)),
CopFILE((const COP *)CvSTART(cv)),
CopLINE((const COP *)CvSTART(cv)),
CopSTASHPV((const COP *)CvSTART(cv)));
is now
PERL_DTRACE_PROBE_ENTRY(cv);
This reduces the executable size by 1K on -O2 -Dusedtrace builds,
and by 45K on -DDEBUGGING -Dusedtrace builds.
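The pattern can be sketched in isolation as follows. Everything here is a
hypothetical stand-in: struct sub_info plays the role of the CV,
RAW_ENTRY_PROBE and RAW_RETURN_PROBE play the role of the large
dtrace-generated macros, and probes_enabled replaces the generated
*_ENABLED() test. The point is that the big macros are expanded once,
inside probe_call(), while each call site expands only to a test and a
function call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the information derived from a CV. */
struct sub_info { const char *name; const char *file; int line; };

static bool probes_enabled = false;
static int entry_fired = 0, return_fired = 0;
static const char *last_name = NULL;

/* Stand-ins for the generated probe macros; imagine these being large. */
#define RAW_ENTRY_PROBE(name, file, line) \
    do { entry_fired++; last_name = (name); (void)(file); (void)(line); } while (0)
#define RAW_RETURN_PROBE(name, file, line) \
    do { return_fired++; last_name = (name); (void)(file); (void)(line); } while (0)

/* The only place where the raw probe macros are expanded. Arguments are
 * computed into temporaries instead of inside the macro call. */
static void
probe_call(const struct sub_info *sub, bool is_call)
{
    const char *name;

    assert(sub != NULL);
    name = sub->name ? sub->name : "(anon)";
    if (is_call)
        RAW_ENTRY_PROBE(name, sub->file, sub->line);
    else
        RAW_RETURN_PROBE(name, sub->file, sub->line);
}

/* Cheap call-site macros: a flag test plus a function call. */
#define PROBE_ENTRY(sub) \
    do { if (probes_enabled) probe_call((sub), true); } while (0)
#define PROBE_RETURN(sub) \
    do { if (probes_enabled) probe_call((sub), false); } while (0)
```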
|
|
|
|
| |
... thus avoiding a function call overhead
|
|
|
|
|
| |
embed.fnc declared it as "U32 depth", while it was defined as "const U32
depth".
|
|
|
|
|
|
|
|
| |
This should fix the smoke failures on threaded builds.
It also renames re_indentfo, which was a terrible name in the first
place. Now that I have had to strip the Perl_ prefixes from
these subs with a perl -i -pe, I took the opportunity to rename
it to re_exec_indent, which self-documents much better.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This introduces three new subs:
Perl_re_printf() which is a wrapper for
PerlIO_printf( Perl_debug_log, ... ),
which cuts down on clutter in the code. Arguably this could be moved
to util.c and renamed something like PerlIO_debugf() and then we could
declutter all the statements that write to the Perl_debug_log
filehandle. But that is a bit too ambitious for me right now, so
I leave this as a regex engine only sub for now.
Perl_re_indentf() which is a wrapper for Perl_re_printf(),
which adds an indent argument and automatically indents the
line appropriately, and is used in regcomp.c for trace diagnostics
during compilation.
Perl_re_indentfo() which is similar to Perl_re_indentf() but
is used in regexec.c which adds a specific prefix to each indented
line to account for the fact that during execution we normally have
string position information on the left.
The end result of this patch is that a lot of clutter in the debugging
statements in the regex engine is reduced, exposing what is actually
going on. It should also now be easier to add new diagnostics which
"do the right thing".
Over time the debugging trace output in regexec has become
very cluttered and confusing. This patch cleans much of it up:
if something happens at a given recursion depth it is output
at the right depth, etc., and formats have been changed to not have
leading spaces so you can actually see the indentation properly.
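The shape of such an indenting wrapper can be sketched as below. This is
not the Perl_re_indentf() implementation: indentf() is a hypothetical
function that formats into a caller-supplied buffer rather than writing to
Perl_debug_log, and the two-spaces-per-level indent is an assumption made
for the example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical indenting printf: writes depth*2 spaces followed by the
 * formatted text into buf. Like snprintf, it returns the number of
 * characters the full output needs (excluding the NUL), or a negative
 * value on error. */
static int
indentf(char *buf, size_t size, unsigned depth, const char *fmt, ...)
{
    va_list ap;
    int pad, body;

    assert(fmt != NULL);

    /* "%*s" with an empty string emits exactly the requested padding. */
    pad = snprintf(buf, size, "%*s", (int)(depth * 2), "");
    if (pad < 0)
        return pad;

    va_start(ap, fmt);
    if ((size_t)pad < size)
        body = vsnprintf(buf + pad, size - (size_t)pad, fmt, ap);
    else
        body = vsnprintf(NULL, 0, fmt, ap);   /* only measure the length */
    va_end(ap);

    return body < 0 ? body : pad + body;
}
```

Callers pass the recursion depth and a format without leading spaces, so
the indentation is produced in one place only.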
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
| |
A future commit will make more sense if these names are changed. This
also reindents some code so that it doesn't overflow 79 columns.
|
|
|
|
|
| |
I found myself using this function, forgetting that it zapped one of the
parameters, so change the name so that can't be forgotten.
|
|
|
|
|
|
|
|
|
|
|
| |
This is at least a partial patch for [perl #127392], cutting the maximum
memory used on my box from around 8600kB to 7800kB. For [perl #127568],
which has been merged into #127392, the savings are even larger, about
37%.
Previously a large number of large mortal SVs could be created while
compiling a single regex pattern, and their accumulated memory quickly
added up. This changes things to not use so many mortals.
|
|
|
|
|
|
| |
I don't know of any cases where this happens, but in working on the next
commit I triggered a problem with shrinking an inversion list so much
that the required 0 UV at the beginning was freed.
|
|
|
|
|
|
|
|
|
|
|
|
| |
This revamps the handling of -Dr for bracketed character classes. There
were bugs introduced earlier in 5.23, and this consolidates the handling
of /d classes so that the interactions can be better considered. It
tries inverting the portion that is in the bitmap range to see if the
output is shorter, and clearer that way. And it always makes the
above-bitmap code points show as not-inverted, as that is clearer.
I ran out of time before the freeze, so I had to not invert in some
cases.
|
|
|
|
|
|
|
| |
This parameter will be used in a future commit. It changes the output
format of this function that displays the contents of an inversion list
so that it won't have to be parsed later, simplifying the code at that
time.
|
|
|
|
|
|
|
| |
This function was used outside the file it contains, but was only
defined (by #ifdef's) for those few internal core files for which it was
needed. Now all those uses have gone, save for the one file. Better to
make it static so no one can circumvent those #ifdef's.
|
|
|
|
|
|
|
|
|
|
| |
grok_bslash_x() is so large that no compiler will inline it. Move it to
dquote.c from dq_inline.c. Conversely, move form_octal_warning() to
dq_inline.c. It is so tiny that the function call overhead is scarcely
smaller than the function body.
This also moves things in embed.fnc so all these functions are not
visible outside the few files they are supposed to be used in.
|
|
|
|
|
|
| |
These should be internal only, and we may want to get rid of them
someday. Hide their existence so that people who don't already know
about them won't be tempted to try to use them.
|
|
|
|
|
| |
This is the one remaining empty {} that was accepted under the
experimental 'use re "strict"'.
|
|
|
|
|
|
| |
This takes code that was duplicated and makes it into a single static
inline function, so that maintenance tasks don't have to be done on both
copies.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A problem with bracketed character classes, qr/[foo]/, is that they have
very little structure, so almost anything is legal, and so
typos just silently compile into something unintended. One of the
possible components is a posix character class. There are 14 of them,
and they have a very restricted structure, which is easy to get slightly
wrong, so that instead of the intended posix class being compiled,
something else silently is created. This commit causes the regex
compiler to look for slightly misspelled posix character classes and to
raise a warning when found. It does not change the results of the
compilation.
To do this, it introduces fuzzy parsing into the regex compiler, using
the Damerau-Levenshtein algorithm to find out how many single character
edits it would take to transform the input into one of the 14 classes.
If it is 1 or 2 off, it considers the input to have been intended to be
that class and raises the warning. If more edits would be needed, it
remains silent.
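The idea can be sketched as follows. This is not the code perl uses: it
swaps in the optimal string alignment variant (a restricted form of
Damerau-Levenshtein), it compares bare names only rather than the full
[: :] construct, and osa_distance(), suspected_posix_class() and MAX_NAME
are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAME 16   /* hypothetical cap; longer inputs count as too far */

/* Optimal string alignment distance: insertions, deletions,
 * substitutions, and transpositions of adjacent characters. */
static size_t
osa_distance(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t d[MAX_NAME + 1][MAX_NAME + 1];
    size_t i, j;

    if (la > MAX_NAME || lb > MAX_NAME)
        return (size_t)-1;

    for (i = 0; i <= la; i++) d[i][0] = i;
    for (j = 0; j <= lb; j++) d[0][j] = j;

    for (i = 1; i <= la; i++) {
        for (j = 1; j <= lb; j++) {
            size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            size_t best = d[i - 1][j] + 1;                /* deletion */
            if (d[i][j - 1] + 1 < best)
                best = d[i][j - 1] + 1;                   /* insertion */
            if (d[i - 1][j - 1] + cost < best)
                best = d[i - 1][j - 1] + cost;            /* substitution */
            if (i > 1 && j > 1 && a[i - 1] == b[j - 2]
                && a[i - 2] == b[j - 1]
                && d[i - 2][j - 2] + 1 < best)
                best = d[i - 2][j - 2] + 1;               /* transposition */
            d[i][j] = best;
        }
    }
    return d[la][lb];
}

static const char *const posix_names[] = {
    "alnum", "alpha", "ascii", "blank", "cntrl", "digit", "graph",
    "lower", "print", "punct", "space", "upper", "word", "xdigit"
};

/* Returns the nearest class name when input is 1 or 2 edits away from
 * it, or NULL when input is spelled correctly or is too far from all. */
static const char *
suspected_posix_class(const char *input)
{
    const char *best = NULL;
    size_t best_dist = 3;          /* only distances 1 and 2 qualify */
    size_t i;

    assert(input != NULL);
    for (i = 0; i < sizeof posix_names / sizeof posix_names[0]; i++) {
        size_t dist = osa_distance(input, posix_names[i]);
        if (dist == 0)
            return NULL;           /* exact match: nothing to warn about */
        if (dist < best_dist) {
            best_dist = dist;
            best = posix_names[i];
        }
    }
    return best;
}
```

A non-NULL result is where a caller would raise the warning; what gets
compiled is unaffected either way.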
This is a heuristic, and someone could have made enough typos that this
thinks a class wasn't intended when it was. Conversely it could raise a
warning when no class was intended, though warnings only happen when the
input very closely resembles one of the 14 legal posix classes.
The algorithm can be tweaked if experience indicates it should. But the
bottom line is that many more cases of unintended results will now be
warned about.
Things like having blanks in the construct and having the '^' before the
colon are recognized as being intended posix classes (given that the
actual names are close to one of the 14), and raise warnings. Again
this commit does not change what gets compiled. This found a bug in
autodoc.pl which was fixed a few commits ago.
The [. .] and [= =] POSIX constructs cause perl to croak that they are
unimplemented. This commit improves the parsing of these two, and fixes
some false positives. See
http://nntp.perl.org/group/perl.perl5.porters/230975
The new code combines two functions in regcomp.c into one new one.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This will be used in a future commit.
This code is taken from CPAN Text::Levenshtein::Damerau::XS with the
author's knowledge. There have been white-space changes to make it
conform better to perl's core coding standards, and declaration changes
to make it more portable, such as using UV instead of 'unsigned int',
and PERL_STATIC_INLINE instead of a less portable form, but the logic is
unchanged. One variable was changed to signed from unsigned to avoid a
warning message from some compilers.
The author and I will decide later about keeping the cpan module and
this code in sync. It changes very rarely.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The value of gimme stored in the context stack is U8.
Make all other uses in the main core consistent with this.
My primary motivation on this was that the new function cx_pushblock(),
which I gave a 'U8 gimme' parameter, was generating warnings where callers
were passing I32 gimme vars to it. Rather than play whack-a-mole, it
seemed simpler to just uniformly use U8 everywhere.
Porting/bench.pl shows a consistent reduction of about 2 instructions on
the loop and sub benchmarks, so this change isn't harming performance.
|
|
|
|
| |
Replace CX_PUSHGIVEN() with cx_pushgiven() etc.
|
|
|
|
| |
Replace CX_PUSHLOOP_FOR() with cx_pushloop_for() etc.
|
|
|
|
|
|
| |
Replace CX_PUSHEVAL() with cx_pusheval() etc.
No functional changes.
|
|
|
|
|
|
| |
Replace CX_PUSHFORMAT() with cx_pushformat() etc.
No functional changes.
|
|
|
|
|
|
| |
Replace CX_PUSHSUB() with cx_pushsub() etc.
No functional changes.
|
|
|
|
|
|
| |
Replace CX_PUSHBLOCK() with cx_pushblock() etc.
No functional changes.
|
|
|
|
|
|
|
|
| |
By making SAVETMPS have its own dedicated save type, it avoids having to
push the address of PL_tmps_floor onto the save stack each time.
By also giving it a dedicated save function, the function can do
the PL_tmps_floor = PL_tmps_ix step too, making the binary slightly more
compact.
|
|
|
|
|
|
| |
Rather than doing cx->blk_eval.retop = NULL in PUSHEVAL, then relying on
the caller to subsequently change it to something more useful, make it an
arg to PUSHEVAL.
|
|
|
|
|
|
|
|
|
|
| |
Make the remaining callers of S_leave_common() use leave_adjust_stacks()
instead, then delete this static function.
This brings the benefits of freeing TEMPS on all scope exits that
has already been introduced on sub exits; uses the optimised code for
creating mortal copies; and finally unifies all the different 'process
return args on scope exit' implementations into a single function.
|
|
|
|
|
|
|
| |
It was using S_leave_common(), but that's shortly to be removed. It also
required adding an extra arg to leave_adjust_stacks() to indicate where to
shift the return args to. This will also be needed for when we replace the
remaining uses of S_leave_common() with leave_adjust_stacks().
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently S_leavesub_adjust_stacks() is just used by pp_leavesub.
Rename it to Perl_leave_adjust_stacks(), extend its functionality
slightly, then make pp_leavesublv() use it too.
This means that lvalue sub exit gains the benefit of FREETMPS being done,
and (where mortal copying needs doing) the optimised copying code.
It also means there is now one less version of the "process args on scope
exit" code.
pp_leavesublv() still does a scan of its return args looking for things to
croak() on, but leaves everything else to leave_adjust_stacks().
leave_adjust_stacks() is intended shortly to be used in place of
S_leave_common() too, thus unifying all args-on-scope-exit code.
The changes to leave_adjust_stacks() in this commit (apart from the
renaming and doc changes) are:
* a new arg to indicate what condition to use to decide whether to
pass or copy the arg;
* a new branch to mortalise and ref count bump an arg
|
|
|
|
|
| |
This makes it a bit more obvious what niche in the "eval" ecosystem
that it occupies.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently one of the args to S_leave_common() is supposed to be the
current stack pointer; it returns an updated sp. Instead make it get/set
PL_stack_sp directly.
e.g. in the caller, replace
dSP;
SP = S_leave_common(..., SP, ...);
PUTBACK;
with
S_leave_common(..., ...);
and in S_leave_common(), make it initially get PL_stack_sp, and before
returning, update PL_stack_sp.
|
|
|
|
|
| |
Since it searches the context stack for the next GIVEN *or* FOR LOOP
context, make the name better express its purpose.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This function implements the less commonly used branch in the POPSUB()
macro that clears @_ in place, or abandons it and creates a new array
in pad slot 0 of the function (the common branch is where @_ hasn't been
reified, and so can be cleared simply by setting fill to -1).
By moving this out to a separate function we can avoid repeating the same
code everywhere the POPSUB macro is used; but since it's only used
in the less frequent cases, the extra overhead of a function call doesn't
matter.
It has a currently unused arg, 'abandon', which will be used shortly.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This changes the handling of Grapheme Cluster Breaks to be entirely via
a lookup table generated by regen/mk_invlists.pl.
This is easier to maintain and follow, as the generation of the table
follows the text of Unicode's UAX29 precisely, and loops can be used to
set every class up instead of having to name each explicitly, so it will
be easier to add new rules. And the runtime switch statement is
replaced by a single line.
My gcc compiler optimized the previous version to an array lookup, but
this commit does it for not so clever compilers.
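A table-driven boundary test has roughly the following shape. This is a
heavily simplified sketch, not the table generated by
regen/mk_invlists.pl: it covers only five classes, encodes just the UAX29
rules GB3, GB4, GB5, GB9 and GB999, and uses plain booleans; the names
gcb_break and is_gcb_break are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* A small subset of the Grapheme_Cluster_Break classes. */
enum gcb { GCB_Other, GCB_CR, GCB_LF, GCB_Control, GCB_Extend, GCB_COUNT };

/* gcb_break[before][after] is true when a break is allowed between a
 * character of class 'before' and a following one of class 'after'. */
static const bool gcb_break[GCB_COUNT][GCB_COUNT] = {
    /*              Other  CR     LF     Control Extend */
    /* Other   */ { true,  true,  true,  true,   false },  /* GB9 */
    /* CR      */ { true,  true,  false, true,   true  },  /* GB3 */
    /* LF      */ { true,  true,  true,  true,   true  },  /* GB4 */
    /* Control */ { true,  true,  true,  true,   true  },  /* GB4 */
    /* Extend  */ { true,  true,  true,  true,   false },  /* GB9 */
};

/* The runtime decision is a single array lookup. */
static bool
is_gcb_break(enum gcb before, enum gcb after)
{
    assert((unsigned)before < GCB_COUNT && (unsigned)after < GCB_COUNT);
    return gcb_break[before][after];
}
```

Adding a class means adding a row and a column to the table, while the
lookup itself stays unchanged.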
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds the final Unicode boundary type previously missing from core
Perl: the LineBreak one. This feature is already available in the
Unicode::LineBreak module, but I've been told that there are portability
and some other issues with that module. What's added here is a
light-weight version that is lacking the customizable features of the
module.
This implements the default Line Breaking algorithm, but with the
customizations that Unicode is expecting everybody to add, as their
test file tests for them. In other words, this passes Unicode's fairly
extensive furnished tests, but wouldn't if it didn't include certain
customizations specified by Unicode beyond the basic algorithm.
The implementation uses a look-up table of the characters surrounding a
boundary to see if it is a suitable place to break a line. In a few
cases, context needs to be taken into account, so there is code in
addition to the lookup table to handle those.
This should meet the needs for line breaking of many applications,
without having to load the module.
The algorithm is somewhat independent of the Unicode version, just like
the other boundary types. Only if new rules are added, or existing ones
modified is there need to go in and change this code. Otherwise,
running regen/mk_invlists.pl should be sufficient when a new Unicode
release is done to keep it up-to-date, again like the other Unicode
boundary types.
|
|
|
|
| |
whitespace-only change
|
|
|
|
| |
This will allow new behavior, needed in a future commit.
|
|
|
|
| |
This is just acting on the TODO comment.
|