| Commit message (Collapse) | Author | Age | Files | Lines |
| |
A micro-optimization inspired by bulk88's perl #115112.
The original proposal suggested applying two changes that removed the
duplicate calls, and then explicitly inlined path_is_absolute().
This version removes the duplicate calls, renames the function to better
match its purpose and asks the compiler to inline it.
| |
(note that this is a change both to the perl API and the regex engine
plugin API).
Currently, Perl_re_intuit_start() is passed an SV, plus pointers to:
where in the string to start matching (strpos); and to the end of the
string (strend).
Unlike Perl_regexec_flags(), it doesn't also have a strbeg arg.
Because of this, it guesses strbeg: based on the passed SV (if it's
SvPOK()); or just sets it to strpos otherwise. This latter can happen if for
example the SV is overloaded. Note also that this latter guess is wrong,
and could in theory make /\b.../ fail.
But just to confuse matters, although Perl_re_intuit_start() itself uses
its guesstimate strbeg var, some of the functions it calls use the global
value of PL_bostr instead. To make this work, the *callers* of
Perl_re_intuit_start() currently set PL_bostr first. This is why \b
doesn't actually break.
The fix to this unholy mess is to simply add a strbeg arg to
Perl_re_intuit_start(). It's also the first step to eliminating PL_bostr
altogether.
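
To illustrate why the guessed strbeg matters, here is a minimal sketch (not the regex engine's code) of a \b-style check. The helper names is_word_char() and at_word_boundary() are hypothetical. The look-behind is only possible when the true start of the string is known, so passing strpos as strbeg changes the answer.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper, not part of the Perl API: is *p a word character? */
static bool is_word_char(const char *p)
{
    return isalnum((unsigned char)*p) || *p == '_';
}

/* Hypothetical helper: does pos sit on a \b-style boundary within
 * [strbeg, strend)?  The character before pos may only be inspected
 * when pos lies past the real start of the string. */
static bool at_word_boundary(const char *strbeg, const char *pos,
                             const char *strend)
{
    bool before = (pos > strbeg) && is_word_char(pos - 1);
    bool after  = (pos < strend) && is_word_char(pos);
    return before != after;
}
```

With the string "foobar" and pos at offset 3, the real strbeg reports no boundary, whereas a strbeg guessed to be equal to pos reports one.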
| |
Remove the is_utf8_pat arg from these two static functions in regexec.c.
Since both these functions are now passed a valid reginfo pointer, this
info is already available as one of the fields in that struct.
| |
regmatch_info is a small struct that is currently directly allocated as a
local var in Perl_regexec_flags(), and has a few fields that maintain part
of the state of the current pattern match. It is passed as an arg to
various functions that regexec_flags() calls, such as regtry().
In some ways it's a rival to PL_reg_state, which also maintains state for
the current match, but which is a global variable (whose state needs
saving and restoring whenever the regex engine goes reentrant). It makes
more sense to store state in the regmatch_info struct, and as a first step
in moving more state to there, this commit makes more use of
regmatch_info.
In particular, it makes Perl_re_intuit_start() also allocate such a
struct, so that now *both* the main execution entry points to the regex
engine make use of it. It's also now passed as an arg to more of the static
functions that these two op-level ones call.
Two changes of special note. First, whether S_find_byclass() got called
with a null reginfo pointer or not indicated whether it had been called
from Perl_regexec_flags() (with a valid reginfo pointer), or from
Perl_re_intuit_start() (null pointer). Since they both pass non-null
reginfo pointers now, instead we add an extra field, reginfo->intuit that
indicates who's the top-level caller.
Secondly, to allow in future for various macros to uniformly refer to
values like reginfo->foo, where the structure is actually allocated as a
local var in Perl_regexec_flags(), we change the reginfo from being the
struct itself to being a pointer to the struct, (so Perl_regexec_flags
itself now uses reginfo->foo too rather than reginfo.foo).
In summary, all the above is essentially window dressing that makes no
functional changes to the code, but will facilitate future changes.
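
The switch from "null pointer means intuit" to an explicit field can be sketched roughly as follows. This is an illustration only: struct match_info stands in for regmatch_info, and who_called() and the enum are hypothetical names.

```c
#include <stdbool.h>

/* Stand-in for regmatch_info; only the field discussed above is shown. */
struct match_info {
    bool intuit;   /* true when the top-level caller is the intuit path */
};

enum caller { CALLER_EXEC, CALLER_INTUIT };

/* Hypothetical helper: both entry points now pass a valid pointer,
 * so the caller is identified by the field rather than by NULL. */
static enum caller who_called(const struct match_info *info)
{
    return info->intuit ? CALLER_INTUIT : CALLER_EXEC;
}
```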
| |
As Peter Martini noted in ticket #116735, lexical subs produce different
op trees for ‘foo 1’ and ‘foo(1)’. foo(1) produces an rv2cv
op with a padcv kid. The unparenthetical version produces just
a padcv op.
And the difference in op trees caused lexical sub calls to honour
prototypes only in the presence of parentheses, because rv2cv_op_cv
(which searches for the cv in order to check its prototype) was
expecting rv2cv+padcv.
Not realising there was a discrepancy between the two forms, and
noticing that foo() produces *two* newCVREF ops, in commit 279d09bf893
I made newCVREF return just a padcv op for lexical subs. At the time
I couldn’t figure out why there were two rv2cv ops, and punted on
researching it.
This is how it works for package subs:
When a sub call is compiled, if there are parentheses, an implicit '&'
is fed to the parser. The token that follows is a WORD token with a
constant op attached to it, containing the name of the subroutine.
When the parser sees '&', it calls newCVREF on the const op to create
an rv2cv op.
For sub calls without parentheses, the token passed to the parser is
already an rv2cv op.
The resulting op tree is the same either way.
For lexical subs, I had the lexer emitting an rv2cv op in both paths,
which was why we got the double rv2cv when newCVREF was returning an
rv2cv for lexical subs.
The real solution is to call newCVREF in the lexer only when there
are no parentheses, since in that case the lexer is not going to call
newCVREF itself. That avoids a redundant newCVREF call. Hence, we
can have newCVREF always return an rv2cv op.
The result is that ‘foo(1)’ and ‘foo 1’ produce identical op trees for
a lexical sub.
One more thing needed to change: The lexer was not looking at the
lexical prototype CV but simply the stub to be autovivified, so it
couldn’t see the parameter prototype attached to the CV (the stub
doesn’t have one).
The lexer needs to see the parameter prototype too, in order to
determine precedence.
The logic for digging through pads to find the CV has been extracted
out of rv2cv_op_cv into a separate (non-API!) routine.
| |
This avoids HvFILL() being O(n) for large n on large hashes, but also avoids
storing the value of HvFILL() in smaller hashes (ie a memory overhead on
every single object built using a hash.)
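
One way such a trade-off could be structured is sketched below. This is an assumption for illustration, not the layout used in hv.c: struct table, count_fill() and table_fill() are hypothetical names, and the cached count is assumed to be kept up to date elsewhere. Small tables carry no stored count and are scanned on demand, while large tables point at a cached value.

```c
#include <stddef.h>

/* Hypothetical table: a NULL bucket entry means the bucket is empty. */
struct table {
    void  **buckets;
    size_t  nbuckets;
    size_t *fill_cache;   /* NULL for small tables: no stored count */
};

/* O(n) scan, acceptable only while nbuckets is small. */
static size_t count_fill(const struct table *t)
{
    size_t n = 0;
    for (size_t i = 0; i < t->nbuckets; i++)
        if (t->buckets[i] != NULL)
            n++;
    return n;
}

/* Return the number of non-empty buckets, using the cache when present. */
static size_t table_fill(const struct table *t)
{
    if (t->fill_cache != NULL)
        return *t->fill_cache;   /* O(1) for large tables */
    return count_fill(t);
}
```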
| |
It is not marked as part of the API, and no code on CPAN is using it.
| |
This should restore support for big endian Crays. It doesn't support
mixed-endian systems.
| |
These are now only being used for mixed-endian platforms which do not
provide their own htonl (etc) functions. Given that the fallbacks have been
buggy since they were added in Perl 3.0, it's safe to conclude that no
mixed-endian platforms were ever using these functions.
It's also unclear why these functions were ever marked as 'A', part of the
API. XS code can't call them directly, as it can't rely on them being
compiled. Unsurprisingly, no code on CPAN references them.
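
For comparison, a conversion written purely with shifts does not depend on the host byte order at all. The sketch below is not the removed fallback code; portable_htonl() is a hypothetical name used only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return a value whose in-memory bytes are the
 * big-endian (network order) encoding of host, whatever the host order. */
static uint32_t portable_htonl(uint32_t host)
{
    unsigned char bytes[4];
    uint32_t net;

    bytes[0] = (unsigned char)(host >> 24);  /* most significant byte first */
    bytes[1] = (unsigned char)(host >> 16);
    bytes[2] = (unsigned char)(host >> 8);
    bytes[3] = (unsigned char)(host);

    memcpy(&net, bytes, sizeof net);
    return net;
}
```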
| |
This will be used in future commits to pass more flags.
| |
Adds support for PERL_PERTURB_KEYS environment variable, which in turn allows one to control
the level of randomization applied to keys() and friends.
When PERL_PERTURB_KEYS is 0 we will not randomize key order at all. The
chance that keys() changes due to an insert will be the same as in
previous perls, basically only when the bucket size is changed.
When PERL_PERTURB_KEYS is 1 we will randomize keys in a non-repeatable
way. The chance that keys() changes due to an insert will be very high.
This is the most secure and default mode.
When PERL_PERTURB_KEYS is 2 we will randomize keys in a repeatable way.
Repeated runs of the same program should produce the same output every
time. The chance that keys() changes due to an insert will be very high.
This patch also makes PERL_HASH_SEED imply a non-default
PERL_PERTURB_KEYS setting. Setting PERL_HASH_SEED=0 (exactly one 0) implies
PERL_PERTURB_KEYS=0 (hash key randomization disabled); setting PERL_HASH_SEED
to any other value implies PERL_PERTURB_KEYS=2 (deterministic/repeatable
hash key randomization). Specifying PERL_PERTURB_KEYS explicitly to a
different level overrides this behavior.
Includes changes to allow one to compile out various aspects of the
patch. One can compile such that PERL_PERTURB_KEYS is not respected, or
can compile without hash key traversal randomization at all. Note that
support for these modes is incomplete, and currently a few tests will
fail.
Also includes a new subroutine, Hash::Util::hash_traversal_mask(),
which can be used to ensure a given hash produces a predictable key
order (assuming the same hash seed is in effect). This sub acts as a
getter and a setter.
NOTE - this patch lacks tests, but I lack tuits to get them done quickly,
so I am pushing this with the hope that others can add them afterwards.
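
The interaction between the two environment variables described above can be sketched as a small decision function. This is an illustration of the stated rules, not the code in the patch: resolve_perturb_keys() is a hypothetical name, it only recognises the numeric values "0", "1" and "2", and treating any other PERL_PERTURB_KEYS value as unset is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper. Arguments are the raw environment values, or NULL
 * when the variable is unset. Returns the effective mode: 0 (no
 * randomization), 1 (random, the default) or 2 (deterministic). */
static int resolve_perturb_keys(const char *hash_seed,
                                const char *perturb_keys)
{
    /* An explicit PERL_PERTURB_KEYS wins over anything implied by the seed. */
    if (perturb_keys != NULL
        && perturb_keys[0] >= '0' && perturb_keys[0] <= '2'
        && perturb_keys[1] == '\0')
        return perturb_keys[0] - '0';

    /* PERL_HASH_SEED of exactly "0" implies mode 0; any other seed, mode 2. */
    if (hash_seed != NULL)
        return strcmp(hash_seed, "0") == 0 ? 0 : 2;

    return 1;   /* neither variable set: default mode */
}
```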
| |
This appears to resolve these three related tickets:
[perl #116989] S_croak_memory_wrap breaks gcc warning flags detection
[perl #117319] Can't include perl.h without linking to libperl
[perl #117331] Time::HiRes::clock_gettime not implemented on Linux (regression?)
This patch changes S_croak_memory_wrap from a static (but not inline)
function into an ordinary exported function Perl_croak_memory_wrap.
This has the advantage of allowing programs (particularly probes, such
as in cflags.SH and Time::HiRes) to include perl.h without linking
against libperl. Since it is not a static function defined within each
compilation unit, the optimizer can no longer remove it when it's not
needed or inline it as needed. This likely negates some of the savings
that motivated the original commit 380f764c1ead36fe3602184804292711.
However, calling the simpler function Perl_croak_memory_wrap() still
does take less set-up than the previous version, so it may still be a
slight win. Specific cross-platform measurements are welcome.
| |
Remove volatile qualifiers. Remove the variable jump_ret. Move the
initialisation of restudied back to the declaration. This reverts several of
the changes made by commits 5d51ce98fae3de07 and bbd61b5ffb7621c2.
However, I can't see a cleaner way to avoid code duplication when restarting
the parse than the approach I've taken here - the label redo_first_pass is
now inside an if (0) block, which is clear but ugly.
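
The shape of that construct, in a minimal stand-alone sketch (the function and struct names are hypothetical; only the label name is taken from the message above): the body of the if (0) block is skipped on normal entry and is reached only via the goto, so the reset code is written once.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping so the control flow can be observed. */
struct parse_result {
    int passes;   /* how many times the first pass ran */
    int resets;   /* how many times the restart-only code ran */
};

static struct parse_result parse_with_restart(bool needs_restart)
{
    struct parse_result r = { 0, 0 };
    bool restarted = false;

    if (0) {
      redo_first_pass:
        /* Reached only by the goto below, never on normal entry. */
        r.resets++;
        restarted = true;
    }

    r.passes++;                       /* the first pass itself */

    if (needs_restart && !restarted)
        goto redo_first_pass;         /* restart without duplicating code */

    return r;
}
```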
| |
The SV listsv is sometimes stored in an array generated near the end of
S_regclass(). In other cases it is not used, and it needs to be freed if
any of the warnings that S_regclass() can trigger turn out to be fatal.
The simplest solution to this problem is to declare it from the start as a
mortal, and claim a (new) reference to it if it is *not* to be freed. This
permits the removal of all other code related to ensuring that it is freed
at the right time, but not freed prematurely if a call to a warning returns.
| |
Adds:
S_ptr_hash() - A new static function in hv.c which can be used to
hash a pointer or integer.
PL_hash_rand_bits - A new interpreter variable used as a cheap
provider of "semi-random" state for use by the hash infrastructure.
xpvhv_aux.xhv_rand - Used as a mask which is xored against the
xpvhv_aux.riter during iteration to randomize the order the actual
buckets are visited.
PL_hash_rand_bits is initialized at interpreter start from the random
hash seed, and then modified by "mixing in" the result of ptr_hash()
on the bucket array pointer in the hv (HvARRAY(hv)) every time
hv_auxinit() allocates a new iterator structure.
The net result is that every hash has its own iteration order, which
should make it much more difficult to determine what the current hash
seed is.
This required some tests to be restructured, as they tested for something
that was not necessarily true: we never guaranteed that two hashes with
the same keys would produce the same key order, we merely promised that
using keys(), values(), or each() on the same hash, without any
insertions in between, would produce the same order of visiting the
key/values.
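
The effect of xoring a mask against the iterator can be sketched as below. This is a simplified illustration, not the hv.c code; iteration_order() is a hypothetical name and the table size is assumed to be a power of two. Because xor with a fixed value is a bijection on that index range, every bucket is still visited exactly once, just in a different order per mask.

```c
#include <stddef.h>

/* Hypothetical helper: fill out[] with the bucket indexes in the order
 * they would be visited when the running iterator value is xored with
 * mask. nbuckets must be a power of two; the final AND keeps the index
 * inside the table if mask has bits beyond the table size. */
static void iteration_order(size_t nbuckets, size_t mask, size_t *out)
{
    for (size_t riter = 0; riter < nbuckets; riter++)
        out[riter] = (riter ^ mask) & (nbuckets - 1);
}
```

Two hashes with the same keys but different masks therefore walk the same buckets in different orders, while a single hash with an unchanged mask repeats its own order.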
| |
Namely:
* The first character in ${...} used to have no restrictions
* ${foo:bar} used to be legal
* ${foo::bar} worked, but ${foo'bar} didn't
And possibly other subtle, so far undiscovered bugs. This was
resolved by simply using the same code for both things.
Note that this commit is not entirely useful on its own; while
tests pass, it requires changes from the following commit to work
entirely.
| |
Whilst this is slightly more work for its existing two callers, it will
permit Perl_hv_ksplit() to also call it.
Use STRLEN for the parameters, and change a local variable from I32 to
STRLEN to match.
| |
The latter is a somewhat less clumsy name. The old one is provided as a
very clear name; the new one as a somewhat slangy version.
| |
This function is just an assert and a macro call. Avoid the function
call overhead by making it inline.
| |
In using the av_top() function created in a recent commit, I found
myself being confused, and thinking it meant the top element of the
array, whereas it really means the index of the top element of that
array. Since the new name has not appeared in a stable release, it can
be changed, without remorse, to include 'index' in it.
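
The distinction that caused the confusion can be shown with a toy array type. This is a sketch only: struct int_array and both helpers are hypothetical and do not reflect the AV implementation.

```c
#include <stddef.h>

/* Hypothetical array: fill is the index of the last element, -1 if empty. */
struct int_array {
    const int *elems;
    ptrdiff_t  fill;
};

/* Index of the top element, not the element itself and not the length. */
static ptrdiff_t array_top_index(const struct int_array *a)
{
    return a->fill;
}

/* Number of elements, which is always one more than the top index. */
static size_t array_count(const struct int_array *a)
{
    return (size_t)(a->fill + 1);
}
```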
| |
These functions do not begin with 'Perl_'; currently this flag is
ignored here.
| |
This commit adds the capability for '(?[ ])' to contain interpolated
variables from other '(?[ ])' constructs. A set operation can thus be
built up from the composition of other ones, without having to worry
about precedence, etc.
Thanks to Aaron Crane for suggesting this.
| |
Thanks to Hugo van der Sanden for reviewing this new code.
| |
The code to parse the flags that occur in '(?foo)' and
'(?foo:bar)' is extracted into a function; some comments were added.
This is in preparation for this to be called from an additional place.
| |
The purpose is fewer machine instructions/faster code.
* S_hv_free_ent_ret() is always called with entry non-null: so change its
signature to reflect this, and remove a null check;
* Add some SvREFCNT_dec_NNs;
* In hv_clear(), refactor the code slightly to only do a SvREFCNT_dec_NN
within the branch where it's already been determined that the arg is
non-null; also, use the _nocontext variant of Perl_croak() to save
a push instruction in threaded perls.
| |
Make the second variable name in embed.fnc match the one used in the
actual function declaration. This will matter if we add in 'entry'
to PERL_ARGS_ASSERT_HV_FREE_ENT_RET. Also regen headers (only proto.h
is affected) to match.
| |
av_len() is misleadingly named.
| |
There are three pairs of characters that Perl recognizes as
metacharacters in regular expression patterns: {}, [], and (). These
can be used as well to delimit patterns, as in:
m{foo}
s(foo)(bar)
Since they are metacharacters, they have special meaning to regular
expression patterns, and it turns out that you can't turn off that
special meaning by the normal means of preceding them with a backslash,
if you use them, paired, within a pattern delimited by them. For
example, in
m{foo\{1,3\}}
the backslashes do not change the behavior, and this matches "f", "o"
followed by one to three more occurrences of "o".
Usages like this, where they are interpreted as metacharacters, are
exceedingly rare; we think there are none, for example, in all of CPAN.
Hence, this deprecation should affect very little code. It does give
notice, however, that any such code needs to change, which will in turn
allow us to change the behavior in future Perl versions so that the
backslashes do have an effect, and without fear that we are silently
breaking any existing code.
=head1 Performance Enhancements
| |
This recently added regex syntax imposes stricter rules on parsing than
normal. However, this did not include parsing \N{} constructs that
occur within it. This commit does that, making fatal the warnings that
come from \N{}.
I will add to perldiag the newly added messages along with the others
for (?[ ]) before 5.18 ships.
| |
MSVC++-specific warning"
There is no written investigation to google up for
the record. I don't want to forget that the #ifdef is benign and
accidentally reinvestigate it in the future. .text section of
perl517.dll was 0xC013F before and after the commit. No change.
-----------------------------------------------------------------
| |
("function declared with __declspec(noreturn) has non-void return type" /
"function declared with __declspec(noreturn) has a return statement".)
| |
("'initializing' : conversion from 'I32' to 'U8', possible loss of data"
and "formal parameter n different from declaration".)
| |
This was discussed in thread
http://perl.markmail.org/thread/avtzvtpzemvg2ki2
but I never got around to this portion of the consensus, until now.
I did a cpan grep
http://grep.cpan.me/?q=%28^|[^\\]%29\\[0-7]{1%2C2}[8-9]&page=1
and eyeballing the results, saw three cases where this warning might
show up; one of which was for EBCDIC. The others looked to be false
positives, such as in .css files.
| |
Macros don't have variable numbers of args, hence the entry in embed.h is
suppressed.
| |
These macros should not be used as they are prone to misuse. There are
no occurrences of them in CPAN. The single use of either of them in
core has recently been removed (commit
8d40577bdbdfa85ed3293f84bf26a313b1b92f55), because it was a misuse.
Instead code should use isIDFIRST_lazy_if or isWORDCHAR_lazy_if
(isALNUM_lazy_if is also available, but can be confused with the Posix
alnum, which it doesn't mean).
| |
This is a fancier [bracketed] character class which allows set
operations, such as intersection and subtraction. The entry in perlre
for this commit details its operation.
Besides extending regular expressions to handle this functionality,
recommended by Unicode, the intent here is to do three things:
1) Intersection has been simulated by regexes using zero-width
look-around assertions, which are non-obvious. This allows replacing
those with a more powerful and clearer syntax; the compiled regexes
are smaller and faster. Everything is known at compile time.
2) Set operations have also been simulated by using user-defined Unicode
properties. These are globals, have security implications,
restricted names, and don't allow as complex expressions as this
new feature.
3) I hope that this feature will come to be viewed as a "better"
bracketed character class. I took advantage of the fact that there
is no embedded base to have to be compatible with to forbid certain
iffy practices with the existing ones, while remaining mostly
backwards compatible. The main difference is that /x is always
enabled, so white space can be pretty much freely used with these,
but to specify a match on white space, it must be escaped. Things
that should have been illegal now are, such as \x{}, and \x{abcdefghi}.
Things that look like a posix specifier but don't quite meet the
rules now give an error instead of silently compiling. e.g., [:digit]
is an error instead of the union of the characters that compose it.
I may have omitted things; perhaps it should be an error to have the
same letter occur twice, adjacent. Since this is experimental, we
can make such changes based on field feedback.
The intent is to keep this feature, since it is strongly recommended by
Unicode. The exact syntax is subject to change, so is experimental.
| |
This is currently unused, but will have regclass() return an inversion
list instead of a node.
| |
This adds a parameter to regpposixcc() to enforce stricter rules on the
posix class syntax. It is currently unused.
| |
The plan is to eventually convert all of regcomp to use this for white
space ignoring under /x, but this will be used for now in just the new
syntax for (?[ ]), coming in a few commits. Until then, this function
is unused.
| |
This parameter silences warnings for non-portable characters. It
currently is always FALSE, meaning that warnings are given.
| |
This parameter allows the caller to specify whether multi-character
folds should be allowed or not. In general it should, and in the case
where this commit says it shouldn't, they never are returned anyway from
Unicode properties.
This capability will be put to real use by future commits.
| |
If a hex or octal number is too big to fit in a 32 bit word, grok_oct
and grok_hex by default output a warning that it is a non-portable
value. This new parameter to the grok_bslash functions can cause them
to shut up those warnings. This is currently unused, but will be needed
in future commits.
| |
This mode croaks on any iffy constructs that currently compile. It is
not currently used; documentation of the error messages will be
delivered later.
| |
By passing the address of the parse pointer, the functions can advance
it, eliminating a parameter to the function, and simplifying the code in
the caller.
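
The calling convention can be sketched with a small stand-alone parser. This is an illustration of the pattern, not the regcomp.c functions; parse_number() is a hypothetical name. The caller hands over the address of its parse pointer and finds it already advanced on return, so no separate "characters consumed" output is needed.

```c
#include <ctype.h>

/* Hypothetical parser: read decimal digits starting at *pp, return their
 * value, and leave *pp pointing at the first character not consumed. */
static unsigned parse_number(const char **pp)
{
    const char *p = *pp;
    unsigned value = 0;

    while (isdigit((unsigned char)*p)) {
        value = value * 10 + (unsigned)(*p - '0');
        p++;
    }

    *pp = p;   /* advance the caller's parse pointer */
    return value;
}
```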
| |
When parsing \p{} outside of a bracketed character class, code in
regcomp.c has pretended it is a bracketed character class by changing
and restoring the parsing pointers, and then calling the charclass
handler. This code can be simplified by instead passing a flag to the
handler meaning to just parse one item. The faking is simpler there,
with no restoring necessary. Also we can eliminate the duplicate
handling of special cases.
Future commits will make more extensive use of this mechanism.
| |
This debugging function is normally #ifdef'd out, but should it be
enabled, the flags were wrong.
| |
This function is specified as inline in the source code, but not in the
prototypes; only one compiler seems to have noticed.
|