| Commit message (Collapse) | Author | Age | Files | Lines |
| |
so that stringification will be able to use it, too.
| |
newATTRSUB requires the sub name to be passed to it wrapped up in
a const op.
Commit 8756617677dbd allowed it to accept a GV that way, since
S_maybe_add_coresub (in gv.c) needed to pass it an existing GV not in
the symbol table yet (to simplify code elsewhere).
This had the inadvertent side-effect of making the GV read-only, since
that’s what the check function for const ops does.
Even if we were to call this a feature, it wouldn’t make sense as
implemented, as GVs for non-ampable (&-able) subs like *CORE::chdir
were not being made read-only.
This commit adds a new flag to newATTRSUB, to allow a GV to be passed
as the o parameter, instead of an op. While this may look as though
it’s undoing the simplification in commit 8756617677dbd by adding
more code, the new code is still conceptually simpler and more
straightforward.
Since newATTRSUB is in the API, I had to add a new _flags variant.
(How did newATTRSUB get into the API to begin with?)
In adding a test, I also discovered that ‘used once’ warnings
were applying to these subs, which is obviously wrong. Commit
8756617677dbd caused that, too, as it was relying on the side-effect
of newATTRSUB doing a GV lookup.
This fixes that, too, by turning on the multi flag in
S_maybe_add_coresub.
| |
The strings in every EXACTFish node are examined for certain problematic
sequences and code points. Prior to this patch, this was done in
several passes, but this refactors the routine to do it in a single
pass.
| |
This changes the function that returns the swash associated with a
bracketed character class so that it returns the original swash and not
a copy. The function is renamed and made accessible only from within
regexec.c, and a new wrapper function with the original name is created
that just calls the other one and returns a copy of the swash.
Thus, all access from outside regexec.c will use a copy which if
overwritten will not harm others; while the option exists from within
regexec.c to use a shared version.
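The split between a file-private accessor and a copying wrapper can be sketched in plain C. This is only an illustration under assumed names (`table_t`, `get_table_shared`, `get_table` are hypothetical and do not come from the Perl source):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a swash: a small table of cached values. */
typedef struct {
    int cached[4];
} table_t;

static table_t shared_table = { {1, 2, 3, 4} };

/* File-private accessor: hands out the original, so writes are shared. */
static table_t *get_table_shared(void)
{
    return &shared_table;
}

/* Public accessor keeping the old contract: callers get a private copy
 * and cannot corrupt the original. Returns NULL if allocation fails. */
table_t *get_table(void)
{
    table_t *copy = malloc(sizeof *copy);
    if (copy != NULL)
        memcpy(copy, get_table_shared(), sizeof *copy);
    return copy;
}
```

Two callers of `get_table()` each receive their own copy, so a write through one pointer is invisible to the other.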
| |
This will be used in future commits for debug traces.
| |
Add a new parameter to _core_swash_init() that is an inversion list to
add to the swash, along with a boolean to indicate if this inversion
list is derived from a user-defined property. This capability will prove
useful in future commits.
| |
This adds the capability, to be used in future commits, for swash_init()
to return NULL instead of croaking if it can't find a property, so that
the caller can choose how to handle the situation.
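A minimal sketch of the "return NULL instead of croaking" pattern, in standalone C. The flag name `LOOKUP_RETURN_IF_UNDEF`, the function `find_property`, and the property list are all assumptions made for illustration; they are not the real swash_init() interface:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag; not taken from the Perl source. */
#define LOOKUP_RETURN_IF_UNDEF 0x1

static const char *known_props[] = { "Upper", "Lower", "Digit" };

/* Look up a property name. On failure, either return NULL (flag set)
 * or print a message and abort, mimicking a "croak". */
const char *find_property(const char *name, unsigned flags)
{
    size_t i;
    for (i = 0; i < sizeof known_props / sizeof known_props[0]; i++) {
        if (strcmp(known_props[i], name) == 0)
            return known_props[i];
    }
    if (flags & LOOKUP_RETURN_IF_UNDEF)
        return NULL;
    fprintf(stderr, "Can't find property definition \"%s\"\n", name);
    abort();
}
```

Callers that pass the flag take responsibility for checking the result, while existing callers that pass 0 keep the old fail-hard behaviour.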
| |
This function will be used in future commits.
| |
This function does a binary search on an inversion list. It will be
used in future commits.
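For readers unfamiliar with the data structure, here is a sketch of such a search in standalone C. It assumes the usual inversion-list layout (a sorted array in which even-indexed elements start ranges that are in the set and odd-indexed elements start ranges that are not). The function names and types are hypothetical and are not those used in the Perl core:

```c
#include <stddef.h>

/* Returns the index of the largest element <= cp, or -1 if cp is
 * smaller than every element. */
long invlist_search(const unsigned long *list, size_t len, unsigned long cp)
{
    size_t low = 0, high = len;   /* search window is [low, high) */
    while (low < high) {
        size_t mid = low + (high - low) / 2;
        if (list[mid] <= cp)
            low = mid + 1;
        else
            high = mid;
    }
    return (long)low - 1;
}

/* cp is in the set when it falls in a range that starts at an even index. */
int invlist_contains(const unsigned long *list, size_t len, unsigned long cp)
{
    long i = invlist_search(list, len, cp);
    return i >= 0 && (i % 2) == 0;
}
```

With the list `{0x41, 0x5B, 0x61, 0x7B}` (A-Z and a-z), 'A' and 'z' are reported as members and '[' and '0' are not.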
| |
Currently, swash_init returns a copy of the swash it finds. The core
portions of the swash are read-only, and the non-read-only portions are
derived from them. When the value for a code point is looked up, the
results for it and adjacent code points are stored in a new element,
so that the lookup never has to be performed again. But since a copy is
returned, those results are stored only in the copy, and any other uses
of the same logical swash don't have access to them, so the lookups have
to be performed for each logical use.
Here's an example. If you have 2 occurrences of /\p{Upper}/ in your
program, there are 2 different swashes created, both initialized
identically. As you start matching against code points, say "A" =~
/\p{Upper}/, the swashes diverge, as the results for each match are
saved in the one applicable to that match. If you match "A" in each
swash, it has to be looked up in each swash, and an (identical) element
will be saved for it in each swash. This is wasteful of both time and
memory.
This patch renames the function and returns the original and not a copy,
thus eliminating the overhead for swashes accessed through the new
interface. The old function name is serviced by a new function which
merely wraps the new name result with a copy, thus preserving the
interface for existing calls.
Thus, in the example above, there is only one swash, and matching "A"
against it results in only one new element, and so the second use will
find that, and not have to go out looking again. In a program with lots
of regular expressions, the savings in time and memory can be quite
large.
The new name is restricted to use only in regcomp.c and utf8.c (unless
XS code cheats the preprocessor), where we will code so as to not
destroy the original's data. Otherwise, a change to that would change
the definition of a Unicode property everywhere in the program.
Note that there are no current callers of the new interface; these will
be added in future commits.
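The cost described above can be modelled with a toy memoizing cache in C. Everything here is a hypothetical stand-in (`prop_cache`, `cached_is_upper`, an ASCII-only "expensive" lookup); it only illustrates why one shared cache needs fewer lookups than two independent copies:

```c
#include <string.h>

/* Hypothetical model of a swash: a memo of per-code-point answers. */
typedef struct {
    signed char memo[128];   /* -1 = not yet looked up, else 0 or 1 */
} prop_cache;

static int slow_lookups = 0;  /* counts trips to the "expensive" source */

void cache_init(prop_cache *c)
{
    memset(c->memo, -1, sizeof c->memo);
}

/* Stand-in for the expensive lookup of \p{Upper}, ASCII only. */
static int slow_is_upper(unsigned cp)
{
    slow_lookups++;
    return cp >= 'A' && cp <= 'Z';
}

int cached_is_upper(prop_cache *c, unsigned cp)
{
    if (cp >= 128)
        return 0;                       /* outside this toy model */
    if (c->memo[cp] < 0)
        c->memo[cp] = (signed char)slow_is_upper(cp);
    return c->memo[cp];
}

int lookups_so_far(void)
{
    return slow_lookups;
}
```

Matching "A" twice through one shared cache costs one slow lookup; matching it once through each of two separate copies costs two.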
| |
This function has always confused me, as it doesn't return a swash, but
a swatch.
| |
These 4 functions have been replaced by variants to_utf8_foo_flags(),
but for XS code that called the old ones in the Perl_to_utf8_foo()
forms, backwards compatibility versions need to be created.
For calls of just the to_utf8_foo() forms, macros have been used to
automatically call the new forms without the performance penalty of
going through the compatibility functions.
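The two compatibility routes can be sketched as follows. The names `my_to_upper_flags`, `My_to_upper` and `my_to_upper` are invented for this illustration and only mimic the long-form/short-form naming split; the real functions operate on UTF-8 strings, not single characters:

```c
#include <ctype.h>

/* New form: hypothetical signature with an extra flags argument. */
int my_to_upper_flags(int c, unsigned flags)
{
    (void)flags;                 /* flags unused in this toy version */
    return toupper((unsigned char)c);
}

/* Compatibility function: keeps the old symbol alive for callers that
 * explicitly name the long form. Costs one extra call. */
int My_to_upper(int c)
{
    return my_to_upper_flags(c, 0);
}

/* Short form: a macro, so callers reach the new function directly. */
#define my_to_upper(c) my_to_upper_flags((c), 0)
```

Code using the short form is recompiled straight onto the new function, while code naming the long form still links and behaves as before.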
| |
Some operators, like pp_complement, assign their argument to TARG
(which copies vstring magic), modify it in place, and then call set-
magic. That’s supposed to work, but vstring magic was remaining as it
was, such that ~v7 would still be treated as "v7" by vstring-aware
code, even though the resulting string is not "\7".
This commit adds vstring set-magic that checks to see whether the pv
still matches the vstring. It cannot simply free the vstring magic,
as that would prevent $x=v0 from working.
| |
This bug is a side effect of rv2gv’s starting to return an incoercible
mortal copy of a coercible glob in 5.14:
$ perl5.12.4 -le 'open FH, "t/test.pl"; $fh=*FH; tell $fh; print tell'
0
$ perl5.14.0 -le 'open FH, "t/test.pl"; $fh=*FH; tell $fh; print tell'
-1
In the first case, tell without arguments is returning the position of
the filehandle.
In the second case, tell with an explicit argument that happens to
be a coercible glob (tell has an implicit rv2gv, so tell $fh is
actually tell *$fh) sets PL_last_in_gv to a mortal copy thereof, which
is freed at the end of the statement, setting PL_last_in_gv to null. So
there is no ‘last used’ handle by the time we get to the tell without
arguments.
This commit adds a new rv2gv flag that tells it not to copy the glob.
By doing it unconditionally on the kidop, this allows tell(*$fh) to
work the same way.
Let’s hope nobody does tell(*{*$fh}), which will unset PL_last_in_gv
because the inner * returns a mortal copy.
This whole area is really icky. PL_last_in_gv should be refcounted,
but that would cause handles to leak out of scope, breaking programs
that rely on the auto-closing ‘feature’.
| |
This changes the 4 case changing functions to take extra parameters to
specify if the utf8 string is to be processed under locale rules when
the code points are < 256. The current functions are changed to macros
that call the new versions so that current behavior is unchanged.
An additional, static, function is created that makes sure that the
255/256 boundary is not crossed during the case change.
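A sketch of what such a boundary check might look like. The message does not say what happens when a crossing is detected; keeping the original code point is an assumption made here, and `keep_within_boundary` is a hypothetical name:

```c
/* Returns `changed` unless the change would move the code point across
 * the 255/256 boundary, in which case the original is kept (assumed
 * behaviour, not confirmed by the commit message). */
unsigned long keep_within_boundary(unsigned long original,
                                   unsigned long changed)
{
    int orig_low = original < 256;
    int new_low  = changed < 256;
    return (orig_low == new_low) ? changed : original;
}
```

For example, U+0178 lowercases to U+00FF, which crosses the boundary, so under this assumption the guard would hand back U+0178 unchanged.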
| |
When substr() occurs in potential lvalue context, the offsets are
adjusted to the current string (negative being converted to positive,
lengths reaching beyond the end of the string being shortened, etc.)
as soon as the special lvalue to be returned is created.
When that lvalue is assigned to, the original scalar is stringified
once more.
That implementation results in two bugs:
1) Fetch is called twice in a simple substr() assignment (except in
void context, due to the special optimisation of commit 24fcb59fc).
2) These two calls are not equivalent:
$SIG{__WARN__} = sub { warn "w ",shift};
sub myprint { print @_; $_[0] = 1 }
print substr("", 2);
myprint substr("", 2);
The second one dies. The first one only warns. That’s mean. The
error is also wrong, sometimes, if the original string is going to get
longer before the substr lvalue is actually used.
The behaviour of \substr($str, -1) if $str changes length is
completely undocumented. Before 5.10, it was documented as being
unreliable and subject to change.
What this commit does is make the lvalue returned by substr remember
the original arguments and only adjust the offsets when the
assignment happens.
This means that the following now prints z, instead of xyz (which is
actually what I would expect):
$str = "a";
$substr = \substr($str,-1);
$str = "xyz";
print $substr;
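The idea of remembering the original arguments and resolving them late can be modelled in C. This sketch covers only the offset (no length argument), and `substr_ref` and `substr_ref_resolve` are hypothetical names, not part of the Perl implementation:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical deferred substring reference: remembers the arguments
 * as given and resolves them only when used. */
typedef struct {
    const char **target;   /* the string variable being referenced */
    long offset;           /* may be negative: counts from the end */
} substr_ref;

/* Resolve against the *current* target; returns NULL if out of range. */
const char *substr_ref_resolve(const substr_ref *ref)
{
    long len = (long)strlen(*ref->target);
    long pos = ref->offset < 0 ? len + ref->offset : ref->offset;
    if (pos < 0 || pos > len)
        return NULL;
    return *ref->target + pos;
}
```

Because the offset -1 is resolved only at the point of use, a reference taken while the string was "a" yields "z" once the string has become "xyz", matching the behaviour described above.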
| |
This simplifies the code, as it's only called from one spot, in
Perl_moreswitches().
| |
When -Dusesitecustomize is used with -Duserelocatableinc,
SITELIB_EXP/sitecustomize.pl is not found due to SITELIB_EXP having a
'.../..' relocation path.
This patch refactors the path relocation code from S_incpush() into
S_mayberelocate() so that it can be used in both S_incpush() and in
usesitecustomize's use of SITELIB_EXP.
| |
sv_force_normal is passed the SV_COW_DROP_PV flag if the scalar is
about to be written over. That flag is not currently used. We can
speed up assignment over fake GVs a lot by taking advantage of the flag.
Before and after:
$ time ./perl -e '$x = *foo, undef $x for 1..2000000'
real 0m4.264s
user 0m4.248s
sys 0m0.007s
$ time ./perl -e '$x = *foo, undef $x for 1..2000000'
real 0m1.820s
user 0m1.812s
sys 0m0.005s
| |
The logic surrounding subroutine redefinition warnings (to warn or not
to warn?) was in three places. Over time, they drifted apart, to the
point that newXS was following completely different rules. It was
only warning for redefinition of functions in the autouse namespace.
Recent commits have brought it into conformity with the other
redefinition warnings.
Obviously it’s about time we put it in one function.
| |
There is no reason why constant redefinition warnings should be
default warnings for sub foo(){1}, but not for newCONSTSUB (which
calls newXS, which triggers the warning).
To make this work properly, I also had to import sv.c’s ‘are these
const subs from the same SV originally?’ logic. Constants created
with XS can have NULL for the SV (they return an empty list or
&PL_sv_undef), which means sv.c’s logic will stop *this=\&that from
warning if both this and that are such XS-created constants.
newCONSTSUB needed to be consistent with that. It required tweaking a
test I added a few commits ago, which arguably shouldn’t have warned
the way it was written.
As of this commit (and before it, too, come to think of it),
newXS_len_flags’s calling convention is quite awful and would need to
be thoroughly re-thunk before being made into an API, or probably
simply never made into an API.
| |
It accepts a length as well as a pv for the name.
Since newXS_flags is marked with M in embed.fnc and is undocumented,
technically policy allows me to change it, but there are files
throughout cpan/ that use newXS_flags. So it seemed safer to add a
new function.
| |
This function was added after 5.14.0, so it is not too late to
change it. It is currently unused.
| |
It doesn’t any more.
Now the hints are localised in a separate inner scope surrounding the
call to yyparse. This meant moving hint-handling code from pp_require
and pp_entereval into S_doeval.
Some tests in t/comp/hints.t were testing for the buggy behaviour, so
they have been adjusted.
Basically, this fixes
sub import {
eval "strict->import"
}
which should work the same way as
sub import {
strict->import
}
but was not working because %^H and $^H were being localised to
the eval at its run time, not just its compilation. So the values
assigned to %^H and $^H at the eval’s run time would simply be lost.
| |
If something like this were to be made more generally available, it
would be better to have two in-line functions, to_upper_latin1() and
to_title_latin1() that just call this underlying one with the correct
final parameter.
| |
This adds a function similar to the ones for the other three case
changing operations that works on latin1 characters only, and avoids
having to go out to swashes. It changes to_uni_fold() and
to_utf8_fold() to call it on the appropriate input.
| |
This creates a new function to handle upper/title casing code points in
the latin1 range, and avoids using a swash to compute the case. This is
because the correct values are compiled-in.
And it calls this function when appropriate for both title and upper
casing, in both utf8 and uni forms.
Unlike the similar function for lower casing, it may make sense for this
function to be called from outside utf8.c, but inside the core, so it is
not static, but its name begins with an underscore.
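To show why no table lookup is needed in this range, here is a sketch of a compiled-in Latin-1 upper-casing routine. It covers only the simple (one code point to one code point) Unicode mappings, assumes an ASCII-based execution character set, and uses the hypothetical name `latin1_simple_upper`. It is not the function this commit adds, which also has to deal with title case and multi-character results:

```c
/* Simple uppercase mapping for the Latin-1 range 0x00-0xFF. */
unsigned long latin1_simple_upper(unsigned char c)
{
    if (c >= 'a' && c <= 'z')
        return c - 0x20;
    if (c == 0xB5)                 /* MICRO SIGN -> GREEK CAPITAL MU */
        return 0x39C;
    if (c == 0xFF)                 /* y with diaeresis -> U+0178 */
        return 0x178;
    if (c >= 0xE0 && c <= 0xFE && c != 0xF7)   /* 0xF7 is DIVISION SIGN */
        return c - 0x20;
    return c;      /* includes 0xDF, which has no simple mapping */
}
```

Only two Latin-1 characters (0xB5 and 0xFF) have simple uppercase forms above 255, which is why the result type is wider than the input.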
| |
The portion that deals with Latin1 range characters is refactored into a
separate (static) function, so that it can be called from more than one place.
| |
Following Michael Schwern’s suggestion, here is a warning for those
hapless folks who use $[ for version checks.
It applies whenever $[ is used in one of: < > <= >=
| |
Currently, whenever we dump an op tree, we first call sequence(),
which walks the tree, creating address => sequence# mappings in
PL_op_sequence. Then when individual ops or op-next fields are displayed,
the sequence is looked up.
Instead, do away with the initial walk, and just map addresses on request.
This simplifies the code.
As a deliberate side-effect, it no longer assigns a seq# of zero to
null ops. This makes it easier to work out what's going on when you
call op_dump() during a debugging session with partially constructed
op-trees. It also removes the ambiguity in "====> 0" as to whether
op_next is NULL or just points to an op_null.
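Mapping addresses to sequence numbers on request, with no initial walk, can be sketched like this. The fixed-size array and linear scan are simplifications chosen for brevity, and `sequence_num` is a hypothetical name; the real code keeps its mapping in PL_op_sequence:

```c
#include <stddef.h>

#define MAX_SEQ 64   /* arbitrary capacity for this sketch */

static const void *seen[MAX_SEQ];
static size_t seen_count = 0;

/* Return the sequence number for addr, assigning the next free number
 * (starting at 1) on first request. NULL maps to 0; as a limitation of
 * this sketch, a full table also yields 0. */
size_t sequence_num(const void *addr)
{
    size_t i;
    if (addr == NULL)
        return 0;
    for (i = 0; i < seen_count; i++) {
        if (seen[i] == addr)
            return i + 1;
    }
    if (seen_count == MAX_SEQ)
        return 0;
    seen[seen_count++] = addr;
    return seen_count;
}
```

Numbers are handed out in the order addresses are first asked about, so a partially built tree can be dumped without first walking it.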
| |
This adds _sv, _pvn, and _pv versions of whichsig() in mg.c, which
get both kill "NAME" and %SIG lookup nul-clean.
| |
It is no longer used in core (having been superseded by
cv_ckproto_len_flags), is unused on CPAN, and is not part of the API.
The cv_ckproto ‘public’ macro is modified to use the _flags version.
I put ‘public’ in quotes because, even before this commit, cv_ckproto
was using a non-exported function, and hence could never have worked
on a strict linker (or whatever you call it).
| |
This means that eval "sub foo ($;\0whoops) { say @_ }" will correctly
include \0whoops in the CV's prototype (while complaining about illegal
characters), and that
use utf8;
BEGIN { $::{"foo"} = "\$\0L\351on" }
BEGIN { eval "sub foo (\$\0L\x{c3}\x{a9}on) {};"; }
will not warn about a mismatched prototype.
| |
This adds _sv, _pv, and _pvn forms to sv_does, and changes it to use
sv_ref() instead of sv_reftype().
| |
This patch also duplicates existing mro tests with copies that use
Unicode in identifiers, to test the mro code.
Since those tests trigger it, it also fixes a bug in the parsing
of *{...}: If the first character inside the braces is a non-ASCII
Unicode identifier character, the inside is now implicitly quoted
if it is just an identifier (just as it is with ASCII identifiers),
instead of being parsed as a bareword that would violate strict subs.
| |
This makes them both nul-and-UTF8 clean, although the latter
is somewhat superficial, as mro isn't clean yet.
(Tests coming once ->can and ->DOES are clean)
| |
This is exported so that attributes.xs can use it.
| |
newXS was merged into newXS_flags; added a line in the docs
recommending using that instead.
newCONSTSUB got a _flags version, which generates the CV in
the right glob if passed the UTF-8 flag.
| |
Since multi is a boolean (even though it’s typed as an int), there is
no need to have a separate parameter. We can just use a flag bit.
| |
method is a boolean flag (typed I32, but used as a boolean) added by
commit 54310121b442.
These new gv_autoload_* functions have a flags parameter, so there’s
no reason for this extra effective bool. We can just use a flag bit.
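Folding a boolean argument into an existing flags word looks roughly like this. The flag names and helper functions are hypothetical and chosen only for illustration; they are not the actual gv_autoload_* flags:

```c
/* Hypothetical flag bits; names are illustrative only. */
#define LOOKUP_UTF8    0x1u
#define LOOKUP_METHOD  0x2u   /* replaces a separate boolean argument */

/* The callee tests the bit instead of reading a dedicated parameter. */
int wants_method_lookup(unsigned flags)
{
    return (flags & LOOKUP_METHOD) != 0;
}

/* Helper for callers that still hold a separate boolean. */
unsigned fold_method_into_flags(unsigned flags, int method)
{
    return method ? (flags | LOOKUP_METHOD) : (flags & ~LOOKUP_METHOD);
}
```

The function signature loses one parameter, and further booleans can be added later without changing it again.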
| |
The 4 was added in commit 54310121b442 (inseparable changes during
5.003/4 development), presumably the ‘Don't look up &AUTOLOAD in @ISA
when calling plain function’ part.
Before that, gv_autoload had three arguments, so the 4 indicated the
new version (with the method argument).
Since these new functions don’t all have four arguments, and since
they have a new naming convention, there is no reason for the 4.
| |
In addition to taking a flags parameter, it also takes the
length of the method. This will eventually make method
lookup nul-clean.
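Why passing a length matters for nul-cleanliness can be seen in a small C comparison. Both function names are hypothetical; the point is only the difference between a NUL-terminated and a length-aware comparison:

```c
#include <stddef.h>
#include <string.h>

/* NUL-terminated comparison: stops at the first '\0'. */
int names_equal_pv(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Length-aware comparison: embedded '\0' bytes are ordinary data. */
int names_equal_pvn(const char *a, size_t alen,
                    const char *b, size_t blen)
{
    return alen == blen && memcmp(a, b, alen) == 0;
}
```

The names "foo\0bar" and "foo\0baz" compare equal under the first function, because both look like "foo", but differ under the second when their full 7-byte lengths are passed.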
| |
I'm probably pushing this too early. Can't do the
Perl-level tests because of that. TODO.
| |
gv_init_pvn() is the same as the old gv_init(), but takes
a flags parameter, which will be used for the UTF-8 cleanup.
The old gv_init() is now implemented as a macro in gv.h.
Also included is some minimal testing in XS::APItest.