| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This appears to resolve these three related tickets:
[perl #116989] S_croak_memory_wrap breaks gcc warning flags detection
[perl #117319] Can't include perl.h without linking to libperl
[perl #117331] Time::HiRes::clock_gettime not implemented on Linux (regression?)
This patch changes S_croak_memory_wrap from a static (but not inline)
function into an ordinary exported function Perl_croak_memory_wrap.
This has the advantage of allowing programs (particularly probes, such
as in cflags.SH and Time::HiRes) to include perl.h without linking
against libperl. Since it is not a static function defined within each
compilation unit, the optimizer can no longer remove it when it's not
needed or inline it as needed. This likely negates some of the savings
that motivated the original commit 380f764c1ead36fe3602184804292711.
However, calling the simpler function Perl_croak_memory_wrap() still
does take less set-up than the previous version, so it may still be a
slight win. Specific cross-platform measurements are welcome.
|
|
|
|
|
|
|
|
|
|
| |
Remove volatile qualifiers. Remove the variable jump_ret. Move the
initialisation of restudied back to the declaration. This reverts several of
the changes made by commits 5d51ce98fae3de07 and bbd61b5ffb7621c2.
However, I can't see a cleaner way to avoid code duplication when restarting
the parse than the approach I've taken here - the label redo_first_pass is
now inside an if (0) block, which is clear but ugly.
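The restart structure described above can be sketched in isolation. This
is only an illustration of the control flow, not the regcomp.c code: the
names parse_pass, compile_with_restart and wide_mode are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical first pass: fails until the wider mode has been enabled,
 * and counts how many times it was run. */
static bool parse_pass(bool wide_mode, int *passes)
{
    (*passes)++;
    return wide_mode;
}

/* Returns the number of passes that were needed. */
int compile_with_restart(void)
{
    bool wide_mode = false;
    int passes = 0;

    /* Skipped on normal entry; reached only through the goto below. */
    if (0) {
      redo_first_pass:
        wide_mode = true;   /* restart-only setup lives here */
    }

    if (!parse_pass(wide_mode, &passes))
        goto redo_first_pass;

    return passes;
}
```

The code after the if (0) block is shared by the first attempt and the
restart, so nothing has to be duplicated.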
|
|
|
|
|
|
|
|
|
|
|
| |
The SV listsv is sometimes stored in an array generated near the end of
S_regclass(). In other cases it is not used, and it needs to be freed if
any of the warnings that S_regclass() can trigger turn out to be fatal.
The simplest solution to this problem is to declare it from the start as a
mortal, and claim a (new) reference to it if it is *not* to be freed. This
permits the removal of all other code related to ensuring that it is freed
at the right time, but not freed prematurely if a call to a warning returns.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds:
S_ptr_hash() - A new static function in hv.c which can be used to
hash a pointer or integer.
PL_hash_rand_bits - A new interpreter variable used as a cheap
provider of "semi-random" state for use by the hash infrastructure.
xpvhv_aux.xhv_rand - Used as a mask which is xored against the
xpvhv_aux.riter during iteration to randomize the order the actual
buckets are visited.
PL_hash_rand_bits is initialized at interpreter start from the random
hash seed, and then modified by "mixing in" the result of ptr_hash()
on the bucket array pointer in the hv (HvARRAY(hv)) every time
hv_auxinit() allocates a new iterator structure.
The net result is that every hash has its own iteration order, which
should make it much more difficult to determine what the current hash
seed is.
This required some tests to be restructured, as they tested for something
that was not necessarily true. We never guaranteed that two hashes with
the same keys would produce the same key order, we merely promised that
using keys(), values(), or each() on the same hash, without any
insertions in between, would produce the same order of visiting the
key/values.
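A minimal sketch of the masked bucket walk, assuming a power-of-two
bucket count. The function name masked_bucket_order and its parameters are
illustrative only; they are not the hv.c code.

```c
#include <stddef.h>

/* Visit every bucket of a power-of-two-sized table exactly once, in an
 * order permuted by a per-table mask (playing the role of xhv_rand).
 * `order` receives the bucket indexes in visiting order. */
void masked_bucket_order(size_t nbuckets, size_t mask, size_t *order)
{
    for (size_t riter = 0; riter < nbuckets; riter++) {
        /* nbuckets is a power of two, so the final masking keeps the
         * index in range and the xor remains a bijection. */
        order[riter] = (riter ^ mask) & (nbuckets - 1);
    }
}
```

Two tables with different masks walk the same bucket indexes in different
orders, while one table with a fixed mask always repeats its own order.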
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Namely:
* The first character in ${...} used to have no restrictions
* ${foo:bar} used to be legal
* ${foo::bar} worked, but ${foo'bar} didn't
And possibly other subtle, so far undiscovered bugs. This was
resolved by simply using the same code for both things.
Note that this commit is not entirely useful on its own; while
tests pass, it requires changes from the following commit to work
entirely.
|
|
|
|
|
|
|
|
| |
Whilst this is slightly more work for its existing two callers, it will
permit Perl_hv_ksplit() to also call it.
Use STRLEN for the parameters, and change a local variable from I32 to
STRLEN to match.
|
|
|
|
|
| |
The latter is a somewhat less clumsy name. The old one is provided as a
very clear name; the new one as a somewhat slangy version.
|
|
|
|
|
| |
This function is just an assert and a macro call. Avoid the function
call overhead by making it inline.
|
|
|
|
|
|
|
|
| |
In using the av_top() function created in a recent commit, I found
myself being confused, and thinking it meant the top element of the
array, whereas it really means the index of the top element of that
array. Since the new name has not appeared in a stable release, it can
be changed, without remorse, to include 'index' in it.
|
|
|
|
|
|
|
|
|
| |
This commit adds the capability for '(?[ ])' to contain interpolated
variables from other '(?[ ])' constructs. A set operation can thus be
built up from the composition of other ones, without having to worry
about precedence, etc.
Thanks to Aaron Crane for suggesting this.
|
|
|
|
| |
Thanks to Hugo van der Sanden for reviewing this new code.
|
|
|
|
|
|
| |
The code to parse the flags that occur after the '?' in '(?foo)' and
'(?foo:bar)' is extracted into a function; some comments were added.
This is in preparation for this to be called from an additional place
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The purpose is fewer machine instructions and faster code.
* S_hv_free_ent_ret() is always called with entry non-null: so change its
signature to reflect this, and remove a null check;
* Add some SvREFCNT_dec_NNs;
* In hv_clear(), refactor the code slightly to only do a SvREFCNT_dec_NN
within the branch where it's already been determined that the arg is
non-null; also, use the _nocontext variant of Perl_croak() to save
a push instruction in threaded perls.
|
|
|
|
|
|
|
| |
Make the second variable name in embed.fnc match those used in the
actual function declaration. This will matter if we add in 'entry'
to PERL_ARGS_ASSERT_HV_FREE_ENT_RET. Also regen headers (only proto.h
is affected) to match.
|
|
|
|
| |
av_len() is misleadingly named.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are three pairs of characters that Perl recognizes as
metacharacters in regular expression patterns: {}, [], and (). These
can be used as well to delimit patterns, as in:
m{foo}
s(foo)(bar)
Since they are metacharacters, they have special meaning to regular
expression patterns, and it turns out that you can't turn off that
special meaning by the normal means of preceding them with a backslash,
if you use them, paired, within a pattern delimited by them. For
example, in
m{foo\{1,3\}}
the backslashes do not change the behavior, and this matches "f", "o"
followed by one to three more occurrences of "o".
Usages like this, where they are interpreted as metacharacters, are
exceedingly rare; we think there are none, for example, in all of CPAN.
Hence, this deprecation should affect very little code. It does give
notice, however, that any such code needs to change, which will in turn
allow us to change the behavior in future Perl versions so that the
backslashes do have an effect, and without fear that we are silently
breaking any existing code.
=head1 Performance Enhancements
|
|
|
|
|
|
|
|
|
|
| |
This recently added regex syntax imposes stricter rules on parsing than
normal. However, this did not include parsing \N{} constructs that
occur within it. This commit does that, making fatal the warnings that
come from \N{}
I will add to perldiag the newly added messages along with the others
for (?[ ]) before 5.18 ships
|
|
|
|
|
| |
("function declared with __declspec(noreturn) has non-void return type" /
"function declared with __declspec(noreturn) has a return statement".)
|
|
|
|
|
| |
("'initializing' : conversion from 'I32' to 'U8', possible loss of data"
and "formal parameter n different from declaration".)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was discussed in thread
http://perl.markmail.org/thread/avtzvtpzemvg2ki2
but I never got around to this portion of the consensus, until now.
I did a cpan grep
http://grep.cpan.me/?q=%28^|[^\\]%29\\[0-7]{1%2C2}[8-9]&page=1
and eyeballing the results, saw three cases where this warning might
show up; one of which was for EBCDIC. The others looked to be false
positives, such as in .css files.
|
|
|
|
|
|
|
|
|
|
|
| |
These macros should not be used as they are prone to misuse. There are
no occurrences of them in CPAN. The single use of either of them in
core has recently been removed (commit
8d40577bdbdfa85ed3293f84bf26a313b1b92f55), because it was a misuse.
Instead code should use isIDFIRST_lazy_if or isWORDCHAR_lazy_if
(isALNUM_lazy_if is also available, but can be confused with the Posix
alnum, which it doesn't mean).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a fancier [bracketed] character class which allows set
operations, such as intersection and subtraction. The entry in perlre
for this commit details its operation.
Besides extending regular expressions to handle this functionality,
recommended by Unicode, the intent here is to do three things:
1) Intersection has been simulated by regexes using zero-width
look-around assertions, which are non-obvious. This allows replacing
those with a more powerful and clearer syntax; the compiled regexes
are smaller and faster. Everything is known at compile time.
2) Set operations have also been simulated by using user-defined Unicode
properties. These are globals, have security implications,
restricted names, and don't allow as complex expressions as this
new feature.
3) I hope that this feature will come to be viewed as a "better"
bracketed character class. I took advantage of the fact that there
is no embedded base to have to be compatible with to forbid certain
iffy practices with the existing ones, while remaining mostly
backwards compatible. The main difference is that /x is always
enabled, so white space can be pretty much freely used with these,
but to specify a match on white space, it must be escaped. Things
that should have been illegal now are, such as \x{}, and \x{abcdefghi}.
Things that look like a posix specifier but don't quite meet the
rules now give an error instead of silently compiling. e.g., [:digit]
is an error instead of the union of the characters that compose it.
I may have omitted things; perhaps it should be an error to have the
same letter occur twice, adjacent. Since this is experimental, we
can make such changes based on feedback from the field.
The intent is to keep this feature, since it is strongly recommended by
Unicode. The exact syntax is subject to change, so is experimental.
|
|
|
|
|
| |
This is currently unused, but will have regclass() return an inversion
list instead of a node.
|
|
|
|
|
| |
This adds a parameter to regpposixcc() to enforce stricter rules on the
posix class syntax. It is currently unused
|
|
|
|
|
|
|
| |
The plan is to eventually convert all of regcomp to use this for white
space ignoring under /x, but this will be used for now in just the new
syntax for (?[ ]), coming in a few commits. Until then, this function
is unused.
|
|
|
|
|
| |
This parameter silences warnings for non-portable characters. It
currently is always FALSE, meaning that warnings are given.
|
|
|
|
|
|
|
|
|
| |
This parameter allows the caller to specify whether multi-character
folds should be allowed or not. In general it should, and in the case
where this commit says it shouldn't, they never are returned anyway from
Unicode properties.
This capability will be put to real use by future commits
|
|
|
|
|
|
|
|
| |
If a hex or octal number is too big to fit in a 32 bit word, grok_oct
and grok_hex by default output a warning that it is a non-portable
value. This new parameter to the grok_bslash functions can cause them
to shut up those warnings. This is currently unused, but will be needed
in future commits.
|
|
|
|
|
|
| |
This mode croaks on any iffy constructs that currently compile. It is
not currently used; documentation of the error messages will be
delivered later.
|
|
|
|
|
|
| |
By passing the address of the parse pointer, the functions can advance
it, eliminating a parameter to the function, and simplifying the code in
the caller.
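The calling convention described above can be sketched as follows. The
helper parse_hex_run is a hypothetical stand-in, not one of the grok_bslash
functions, and it ignores overflow for brevity.

```c
#include <ctype.h>
#include <stdbool.h>

/* Parse a run of hex digits starting at *pp, store the value in *out, and
 * advance *pp past what was consumed. Returns false if no digit was found. */
bool parse_hex_run(const char **pp, unsigned long *out)
{
    const char *p = *pp;
    unsigned long value = 0;

    while (isxdigit((unsigned char)*p)) {
        int c = tolower((unsigned char)*p);
        value = value * 16
              + (unsigned long)(isdigit(c) ? c - '0' : c - 'a' + 10);
        p++;
    }
    if (p == *pp)
        return false;   /* nothing consumed; the caller's pointer is untouched */

    *out = value;
    *pp = p;            /* the caller resumes right after the digits */
    return true;
}
```

Because the callee moves the pointer itself, the caller needs no separate
"length consumed" output parameter and no pointer arithmetic of its own.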
|
|
|
|
|
|
|
|
|
|
|
|
| |
When parsing \p{} outside of a bracketed character class, code in
regcomp.c has pretended it is a bracketed character class by changing
and restoring the parsing pointers, and then calling the charclass
handler. This code can be simplified by instead passing a flag to the
handler meaning to just parse one item. The faking is simpler there,
with no restoring necessary. Also we can eliminate the duplicate
handling of special cases.
Future commits will make more extensive use of this mechanism.
|
|
|
|
|
| |
This function is specified as inline in the source code, but not in the
prototypes; only one compiler seems to have noticed.
|
|
|
|
|
|
|
| |
This adds functions to prevent accidental (or deliberate) iteration over
an inversion list while it is being modified. This is to catch
development errors, and in production builds, the asserts() are likely
no-ops
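One way such a guard might look, as a simplified sketch. The invlist
struct and every function below are hypothetical reductions for
illustration and do not match the real inversion list layout in the core.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define INVLIST_MAX 16

/* Simplified inversion list: sorted code points at which membership
 * flips, plus a flag recording that an iteration is in progress. */
typedef struct {
    unsigned long elems[INVLIST_MAX];
    size_t len;
    size_t iter_pos;
    bool iterating;
} invlist;

void invlist_append(invlist *l, unsigned long cp)
{
    assert(!l->iterating);          /* modifying during iteration is a bug */
    assert(l->len < INVLIST_MAX);
    l->elems[l->len++] = cp;
}

void invlist_iterinit(invlist *l)
{
    assert(!l->iterating);          /* no nested iteration in this sketch */
    l->iter_pos = 0;
    l->iterating = true;
}

/* Yields the next inclusive [start, end] range; false when exhausted. */
bool invlist_iternext(invlist *l, unsigned long *start, unsigned long *end)
{
    assert(l->iterating);
    if (l->iter_pos >= l->len) {
        l->iterating = false;       /* exhausted: modification allowed again */
        return false;
    }
    *start = l->elems[l->iter_pos++];
    /* An odd-length list leaves the final range open-ended. */
    *end = (l->iter_pos < l->len) ? l->elems[l->iter_pos++] - 1 : ULONG_MAX;
    return true;
}

/* For callers that stop iterating before the list is exhausted. */
void invlist_iterfinish(invlist *l)
{
    l->iterating = false;
}
```

When compiled with NDEBUG the assert() calls expand to nothing, so the
guard costs only the flag updates.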
|
|
|
|
|
|
|
|
|
|
|
| |
This global flag is cleared at the start of execution, and then set if
any locale-based nodes are executed. At the end of execution, the
RXf_TAINTED_SEEN flag on the regex is set/cleared based on RF_tainted.
We eliminate RF_tainted by simply directly setting RXf_TAINTED_SEEN
each time a taintable node is executed.
This is the final step before eliminating PL_reg_flags.
|
|
|
|
|
|
|
|
| |
This global flag indicates whether the currently executing regex is utf8.
Replace it with a boolean var local to the matching function, and pass
it around via function args, or as a member of the regmatch_info struct.
This is a first step to eliminating PL_reg_flags.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit saves machine instructions by preventing inlining, and keeps
the error handling code for an extremely rare panic out of hot code. This
should make the interp smaller and faster.
Perl_error_log is a macro that has a very large expansion on threaded
perls, 4 branches and possibly a call to Perl_PerlIO_stderr. POPSTACK
appears 18 times, by asm, on my non DEBUGGING threaded Win32 32 bit Perl 5.17
-O1 compiled with VC 2003. POPSTACK is also used in some core XS modules,
for example List::Util and PerlIO::encoding. The .text section of
perl517.dll dropped from 0xc05ff bytes of x86 instructions to 0xc00ff
after applying this for me.
Perl_croak_popstack was made contextless to save a push/move instruction
at each caller (fewer instructions in the instruction cache) and for more
opportunity for the compiler to optimize. Since Perl_croak_popstack is a
noreturn, some compilers may optimize it to just a conditional jump
instruction. VC 2003 32 bit did this inside perl517.dll and from XS
modules using POPSTACK. Perl_croak_popstack measures at 0x48 bytes of
instructions under -O1 for me, so previously, those 0x48 minus the
dTHX overhead would have been sitting in the caller because of macro
expansion.
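The general technique can be sketched like this, assuming a C11 compiler
for _Noreturn. The names demo_stack, demo_croak_popstack and DEMO_POP are
hypothetical and much simpler than the real POPSTACK macro.

```c
#include <stdio.h>
#include <stdlib.h>

/* top is the index of the last item, or -1 when the stack is empty. */
typedef struct { int items[8]; int top; } demo_stack;

/* Cold path: a single shared copy of the error handling. It never
 * returns, so callers need no code after the call. */
static _Noreturn void demo_croak_popstack(void)
{
    fputs("panic: demo stack underflow\n", stderr);
    abort();
}

/* Hot path: each expansion costs a compare, a branch and a call, rather
 * than a full inline copy of the error reporting. */
#define DEMO_POP(s) \
    ((s)->top < 0 ? (demo_croak_popstack(), 0) : (s)->items[(s)->top--])
```

Taking no context argument, as the commit does for Perl_croak_popstack,
keeps each call site as small as possible.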
|
|
|
|
|
|
|
| |
This also changes isIDCONT_utf8() to use the Perl definition, which
excludes any \W characters (the Unicode definition includes a few of
these). Tests are also added. These macros remain undocumented for
now.
|
|
|
|
|
| |
Coders should use the macros in handy.h instead of calling these
directly.
|
|
|
|
|
|
|
|
|
|
|
| |
This is so we can deprecate non-core use of the existing one in a future
commit. XS coders should be using the macros in handy.h instead of
calling such functions directly. A future commit will deprecate all of
them, but first the core uses of this one must change so they don't
generate deprecation messages. I will not have a chance to look for
some time, but I suspect that most uses of this function in the core
should be changed to use something else, but in the meantime, the
non-core uses can be deprecated.
|
|
|
|
|
| |
These functions were used internally as helpers for matching \X in
regular expressions. They are no longer used.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The regular expression operation POSIXA works on any of the (currently)
16 posix classes (like \w and [:graph:]) under the regex modifier /a.
This commit creates similar operations for the other modifiers: POSIXL
(for /l), POSIXD (for /d), POSIXU (for /u), plus their complements.
It causes these ops to be generated instead of the ALNUM, DIGIT,
HORIZWS, SPACE, and VERTWS ops, as well as all their variants. The net
saving is 22 regnode types.
The reason to do this is for maintenance. As of this commit, there are
now 22 fewer node types for which code has to be maintained. The code
for each variant was essentially the same logic, but on different
operands. It would be easy to make a change to one copy and forget to
make the corresponding change in the others. Indeed, this patch fixes
[perl #114272] in which one copy was out of sync with others.
This patch actually reduces the number of separate code paths to 5:
POSIXA, NPOSIXA, POSIXL, POSIXD, and POSIXU. The complements of the
last 3 use the same code path as their non-complemented version, except
that a variable is initialized differently. The code then XORs this
variable with its result to do the complementing or not. Further, the
POSIXD branch now just checks if the target string being matched is
UTF-8 or not, and then jumps to either the POSIXU or POSIXA code
respectively. So, there are effectively only 4 cases that are coded:
POSIXA, NPOSIXA, POSIXL, and POSIXU. (POSIXA doesn't have to worry
about UTF-8, while NPOSIXA does, hence these for efficiency are coded
separately.)
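The XOR trick for sharing one code path between a class and its
complement can be sketched as below. This is an illustration using the C
library's isdigit(); span_digit_class and to_complement are assumed names,
not the regexec.c code.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Count the leading characters of s that match [[:digit:]] when
 * to_complement is false, or that do not match it when it is true. */
size_t span_digit_class(const char *s, bool to_complement)
{
    size_t n = 0;

    /* The class test is normalized to 0 or 1, then flipped by the xor
     * when the complemented form is wanted. */
    while (s[n] != '\0'
           && (to_complement ^ (isdigit((unsigned char)s[n]) != 0)))
        n++;
    return n;
}
```

The only difference between the two variants is how to_complement is
initialized before the loop starts.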
Removing all this code saves memory. The output of the Linux size
command shows that the perl executable was shrunk by 33K bytes on my
platform compiled under -O0 (.7%) and by 18K bytes (1.3%) under -O2.
The reason this patch was doable was previous work in numbering the
POSIX classes, so that they could be indexed in arrays and bit
positions. This is a large patch; I didn't see how to break it into
smaller components.
I chose to make this code more efficient as opposed to saving even more
memory. Thus there is a separate loop that is jumped to after we know
we have to load a swash; this just saves having to test if the swash is
loaded each time through the loop. I avoid loading the swash until
absolutely necessary. In places in the previous version of this code,
the swash was loaded when the input was UTF-8, even if it wasn't yet
needed (and might never be if the input didn't contain anything above
Latin1); apparently to avoid the extra test per iteration.
The Perl test suite runs slightly faster on my platform with this patch
under -O0, and the speeds are indistinguishable under -O2. This is in
spite of these new POSIX regops being unknown to the regex optimizer
(this will be addressed in future commits), and extra machine
instructions being required for each character (the xor, and some
shifting and masking). I expect this is a result of better caching, and
not loading swashes unless absolutely necessary.
|
|
|
|
|
|
| |
This function uses table lookup to replace 9 more specific functions,
which can be deprecated. They should not have been exposed to the
public API in the first place
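As a rough sketch of the idea, one function indexed by a class number can
stand in for a family of per-class functions. The class numbers, tables
and demo_is_class below are hypothetical and cover only ASCII ranges; the
real function consults Unicode data.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical class numbers; the real core numbering is not shown here. */
enum demo_classnum { DEMO_DIGIT, DEMO_UPPER, DEMO_LOWER, DEMO_CLASS_COUNT };

typedef struct { unsigned long lo, hi; } demo_range;   /* inclusive bounds */

static const demo_range demo_digit[] = { { '0', '9' } };
static const demo_range demo_upper[] = { { 'A', 'Z' } };
static const demo_range demo_lower[] = { { 'a', 'z' } };

/* One table indexed by class number replaces one function per class. */
static const struct { const demo_range *ranges; size_t count; }
demo_tables[DEMO_CLASS_COUNT] = {
    [DEMO_DIGIT] = { demo_digit, 1 },
    [DEMO_UPPER] = { demo_upper, 1 },
    [DEMO_LOWER] = { demo_lower, 1 },
};

bool demo_is_class(enum demo_classnum classnum, unsigned long cp)
{
    for (size_t i = 0; i < demo_tables[classnum].count; i++) {
        if (cp >= demo_tables[classnum].ranges[i].lo
            && cp <= demo_tables[classnum].ranges[i].hi)
            return true;
    }
    return false;
}
```

Adding a class then means adding a table entry instead of another
exported function.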
|
|
|
|
|
|
|
|
|
| |
This var (or rather PL_reg_state.re_state_regsize, which it is #deffed to)
just holds the index of the maximum opening paren index seen so far in
S_regmatch(). So make it a local var of S_regmatch() and pass it as a
param to the couple of static functions called from there that need it.
(Also give the local var the more meaningful name 'maxopenparen'.)
|
|
|
|
|
|
| |
This refactors the code slightly that checks for Korean precomposed
syllables in \X. It eliminates the PL_variable formerly used to keep
track of things.
|
|
|
|
| |
Rename arguments to make clearer what the function takes.
|
|
|
|
|
|
| |
This saves 1.5 KB in the text section on my machine in regexec.o
(unoptimized) and 820 optimized. I did not benchmark, as we don't
really care very much about performance under 'use locale'.
|
|
|
|
|
| |
These functions are not used by the Perl core. Code should be using
the equivalent macros in handy.h that may avoid a function call.
|
|
|
|
|
|
|
| |
We think this is meant to stand for C's alphanumeric, that is what is
matched by POSIX [:alnum:]. There were no functions or dedicated
swash available for accessing it. Future commits will want to use
these.
|
|
|
|
|
| |
This function is defined in utf8.c, but isn't called by the core, and
there was no entry for it in embed.fnc
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This only affected threaded builds. I think the comments in the added
test explain well enough what was happening.
The solution is to store a stashpad offset in the pmop, instead of the
name of the stash. This is similar to what was done with cop stashes
in d4d03940c58a.
Not only does this fix the crash, but it also makes compilation faster
and saves memory (no separate malloc for every m?pat?).
I had to move Safefree(PL_stashpad) later on in perl_destruct, because
freeing a pmop causes the PL_stashpad to be accessed, and pmops can be
freed during sv_clean_all. Its previous location was not a problem
for cops, as PL_stashpad[cop->cop_stashoff] is only accessed when
PL_curcop==that_cop and Perl code is running, not when cops are freed.
|