rather than $__ANONIO__
That dollar sign *has* to have been a mistake. In ck_fun, the
name was set to __ANONIO__, but it seems the change that added it
(afd1915d43) did not account for the fact that a little later on the
same function checks to make sure it begins with a dollar sign, as it
could only be a variable name.
rv2gv’s use of $__ANONIO__ (added recently by yours truly) was just
copying what ck_fun was doing.
As proposed on p5p and approved, this changes the functions uc(), lc(),
ucfirst(), and lcfirst() to respect locale for code points below 256, and
to use Unicode semantics for those at 256 and above. This gives better,
but not perfect, results, as noted in the changed pods, and brings these
functions into line with how regular expression pattern matching already
works.
When substr() occurs in potential lvalue context, the offsets are
adjusted to the current string (negative being converted to positive,
lengths reaching beyond the end of the string being shortened, etc.)
as soon as the special lvalue to be returned is created.
When that lvalue is assigned to, the original scalar is stringified
once more.
That implementation results in two bugs:
1) Fetch is called twice in a simple substr() assignment (except in
void context, due to the special optimisation of commit 24fcb59fc).
2) These two calls are not equivalent:
    $SIG{__WARN__} = sub { warn "w ",shift};
    sub myprint { print @_; $_[0] = 1 }
    print substr("", 2);
    myprint substr("", 2);
The second one dies. The first one only warns. That’s mean. The
error is also wrong, sometimes, if the original string is going to get
longer before the substr lvalue is actually used.
The behaviour of \substr($str, -1) if $str changes length is
completely undocumented. Before 5.10, it was documented as being
unreliable and subject to change.
What this commit does is make the lvalue returned by substr remember
the original arguments and only adjust the offsets when the
assignment happens.
This means that the following now prints z, instead of xyz (which is
actually what I would expect):
    $str = "a";
    $substr = \substr($str,-1);
    $str = "xyz";
    print $substr;
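The difference between adjusting offsets when the lvalue is created and adjusting them when it is used can be modelled in a few lines of C. This is a hypothetical sketch, not perl's code. The names `substr_lv` and `substr_lv_resolve` are invented, and the model keeps only the two rules mentioned above: a negative offset counts from the end, and a length that runs past the end is shortened.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model (not perl's code) of a substr() lvalue that keeps the
   arguments it was created with instead of pre-adjusting them. */
typedef struct {
    long offset;      /* as passed; negative counts from the end */
    bool has_length;  /* false means "to the end of the string" */
    long length;      /* used only when has_length */
} substr_lv;

/* Resolve the stored arguments against the string's *current* contents.
   Returns false when the offset lies outside the string (a simplification
   of perl's real rules); otherwise sets *start and *count. */
static bool substr_lv_resolve(const substr_lv *lv, const char *str,
                              size_t *start, size_t *count)
{
    assert(lv != NULL && str != NULL && start != NULL && count != NULL);
    long cur = (long)strlen(str);
    long off = lv->offset < 0 ? cur + lv->offset : lv->offset;
    if (off < 0 || off > cur)
        return false;
    long n = lv->has_length ? lv->length : cur - off;
    if (n < 0)
        n = 0;            /* treated as 0 here; perl gives negative lengths another meaning */
    if (n > cur - off)
        n = cur - off;    /* shorten a length that runs past the end */
    *start = (size_t)off;
    *count = (size_t)n;
    return true;
}
```

If `{ -1, false, 0 }` is resolved against "xyz", it selects index 2 for one character. That is the "z" the commit now prints. If the same arguments were adjusted against "a" at creation time, the offset would be fixed at 0, and with no length given that selects all of "xyz".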
In void context we can optimise
    substr($foo, $bar, $baz) = $replacement;
to something like
    substr($foo, $bar, $baz, $replacement);
except that the execution order must be preserved. So what we
actually do is
    substr($replacement, $foo, $bar, $baz);
with a flag to indicate that the replacement comes first. This means
we can also optimise assignment to two-argument substr the same way.
Although optimisations are not supposed to change behaviour,
this one does.
• It stops substr assignment from calling get-magic twice, which means
the optimisation makes things less buggy than usual.
• It causes the uninitialized warning (for an undefined first
argument) to mention the substr operator, as it did before the previous
commit, rather than the assignment operator. I think that sort of
detail is minor enough.
I had to make the warning about clobbering references apply whenever
substr does a replacement, and not only when used as an lvalue. So
four-argument substr now emits that warning. I would consider that a
bug fix, too.
Also, if the numeric arguments to four-argument substr and the
replacement string are undefined, the order of the uninitialized
warnings is slightly different, but is consistent regardless of whether
the optimisation is in effect.
I believe this will make 95% of substr assignments run faster. So
there is less incentive to use what I consider the less readable form
(the four-argument form, which is not self-documenting).
Since I like naïve benchmarks, here are Before and After:
Before:
    $ time ./miniperl -le 'do{$x="hello"; substr ($x,0,0) = 34;0}for 1..1000000'
    real 0m2.391s
    user 0m2.381s
    sys 0m0.005s
After:
    $ time ./miniperl -le 'do{$x="hello"; substr ($x,0,0) = 34;0}for 1..1000000'
    real 0m0.936s
    user 0m0.927s
    sys 0m0.005s
This program:
    #!perl -l
    sub myprint { print @_ }
    print substr *foo, 1;
    myprint substr *foo, 1;
produces:
    main::foo
    Can't coerce GLOB to string in substr at - line 4.
Ouch!
I would expect \substr simply to give me a scalar that peeks into the
original string, but without modifying the original until the return
value of \substr is actually assigned to.
But it turns out that it coerces the original into a string
immediately, unless it’s GMAGICAL. I find the exception for magical
variables rather befuddling. I can only imagine it was for efficiency (since
the stringified form will be overwritten when magic_setsubstr calls
SvGETMAGIC), but that doesn’t make sense as the original variable can
itself be modified between the return of the special lvalue and the
assignment to that lvalue.
Since magic_setsubstr itself coerces the variable into a string upon
assignment to the lvalue, we can just remove the coercion code from
pp_substr.
But that causes double uninitialized warnings in cases like
    substr($undef, 0,0) = "lrep";
That happens because pp_substr is still stringifying the variable (but
without modifying it). It has to do that, as it looks at the length
of the original string and accordingly adjusts the offsets stored in
the lvalue if they are negative or if they extend beyond the end of
the string.
So this commit takes the simple route of avoiding the warning in
pp_substr by only stringifying a variable that is SvOK if called in
lvalue context.
Hence, assignment to substr($tied...) will continue to call FETCH
twice, but that is not a new bug.
The ideal solution would be for the offsets to be translated in mg.c,
rather than in pp_substr. But that would be a more involved change
(including most of this commit, which is therefore not wasted) with
potential backward-compatibility issues with negative numbers.
A side effect is that the ‘Attempt to use reference as lvalue in
substr’ warning now occurs during the assignment to the substr lvalue,
rather than in substr itself. This means it occurs even for tied
variables, so things are now more consistent.
The example at the beginning could still croak if the glob were
replaced with a null string, so this commit only partially
alleviates the pain.
After sv_force_normal_flags, the scalar will no longer be read-only,
except in those cases where sv_force_normal_flags croaks. So this
check will never be true when SvFAKE was true.
As amagic_deref_call pushes a new stack, PL_stack_sp will always have
the same value before and after, so SPAGAIN is unnecessary.
After much alternation, altercation and alteration, __SUB__ is
finally here.
This brings it into conformity with y without the /r.
A compiler generated a warning about this. It is the degenerate case
with an empty input, so it isn't really a problem, but this silences the warning.
Now that there is a function that can convert a latin1 character to
title or upper case without going out to swashes, we can call it instead
of repeating the code. There is the additional overhead of a function
call, but this could be avoided if it comes down to it by making it
in-line.
Now that there is a function that can convert a latin1 character to
title or upper case without going out to swashes, we can call it
instead of repeating the code. There is the additional overhead of a
function call, but this could be avoided if it comes down to it by
making it in-line. And this only happens when upper-casing y with
diaeresis and the micro sign.
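The Latin-1 upper-casing that such a function can do without swashes is small enough to sketch in C. This is a hypothetical helper, not perl's actual function, and it returns only the simple (single code point) uppercase mapping. It also shows why y with diaeresis and the micro sign are the odd ones out: their uppercase forms, U+0178 and U+039C, lie outside Latin-1. Sharp s (U+00DF) has no single-character uppercase at all, because its full uppercase is "SS".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not perl's function): simple uppercase mapping for
   a Latin-1 code point, computed without any table lookup. */
static uint32_t latin1_simple_upper(uint32_t cp)
{
    assert(cp <= 0xFF);                    /* Latin-1 input only */
    if (cp >= 'a' && cp <= 'z')
        return cp - 0x20;
    /* U+00E0..U+00FE map to U+00C0..U+00DE, except U+00F7 (division sign) */
    if (cp >= 0xE0 && cp <= 0xFE && cp != 0xF7)
        return cp - 0x20;
    switch (cp) {
    case 0xB5: return 0x039C;  /* MICRO SIGN -> GREEK CAPITAL LETTER MU */
    case 0xFF: return 0x0178;  /* y WITH DIAERESIS -> capital, outside Latin-1 */
    default:   return cp;      /* includes U+00DF, whose full uppercase is "SS" */
    }
}
```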
This outdents and reflows comments as a result of the removal of a
surrounding block.
Now that toLOWER_utf8() and toTITLE_utf8() have the intelligence to skip
going out to swashes for Latin1 code points, it's not so critical to
bypass calling them for these (for speed). It simplifies things not to
have the intelligence repeated. There is the additional overhead of two
function calls (minus the branches saved), but these could be avoided if
it comes down to it by making them in-line.
This outdents and reflows comments as a result of the removal of a
surrounding block.
Now that toUPPER_utf8() has the intelligence to skip going out to
swashes for Latin1 code points, it's not so critical to bypass calling
it for these (for speed). It simplifies things not to have the
intelligence repeated. There is the additional overhead of two function
calls (minus the branches saved), but these could be avoided if it comes
down to it by making them in-line.
Almost always the input to uc() will be one of the other 253 Latin1
characters rather than one of the three that get here.
This outdents and reflows comments as a result of the removal of a
surrounding block.
Now that toLOWER_utf8() has the intelligence to skip going out to
swashes for Latin1 code points, it's not so critical to bypass calling
it for these (for speed). It simplifies things not to have the
intelligence repeated. There is the additional overhead of two function
calls (minus the branches saved), but these could be avoided if it comes
down to it by making them in-line.
gv_efullname4 produces undef if the GV points to no stash, instead of
using __ANON__, as it does when the stash has no name.
Instead of jumping through hoops to try and work around it elsewhere, fix
gv_efullname4.
This means that
    $x = *$io;
    $x .= "whate’er";
no longer produces an uninitialized warning. (The warning was rather
strange, as defined() returned true.)
This commit also gives the glob the name $__ANONIO__ (yes, with a
dollar sign). It may seem a little strange, but there is precedent in
other autovivified globs, such as those open() produces when it cannot
determine the variable name (e.g., open $t->{fh}).
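The fallback described for gv_efullname4 can be sketched in C. The sketch treats "no stash" the same as "stash without a name", so both produce __ANON__ and never an undefined name. This is a hypothetical model: `struct stash`, `effective_stash_name` and `glob_full_name` are invented for illustration and are not perl's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model, not perl's gv_efullname4: a glob may point to no
   stash at all, or to a stash that has no name. */
struct stash { const char *name; };   /* name == NULL: nameless stash */

static const char ANON_NAME[] = "__ANON__";

/* Both "no stash" and "nameless stash" fall back to __ANON__. */
static const char *effective_stash_name(const struct stash *st)
{
    return (st != NULL && st->name != NULL) ? st->name : ANON_NAME;
}

/* Writes "Package::name" into out; returns the length snprintf reports. */
static int glob_full_name(char *out, size_t outlen,
                          const struct stash *st, const char *gvname)
{
    assert(out != NULL && gvname != NULL);
    return snprintf(out, outlen, "%s::%s", effective_stash_name(st), gvname);
}
```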
This outdents a block to the same level as the surrounding text, and
reflows the comments to take advantage of the extra space and use fewer
lines.
This code was always #ifdef'd out. It would have been used to convert
to a Greek final sigma from a non-final one, depending on context. The
problem is that we can't know algorithmically if a final sigma is in
order or not. I excerpt this quote, which I find persuasive, from
correspondence from Father Chrysostomos, who knows Greek:
"I cannot see how any algorithm can know to get it right.
"The letter σ (or Σ in capitals) represents the number 200 in Greek
numerals. Those are not just ancient Greek numerals, but are used on a
regular basis even in modern Greek. In many printed books ς is used in
place of ϛ, which represents the number 6. So if casefolding should
change ͵ΑΣʹ to ͵αςʹ, or if an output layer changes ͵ασʹ similarly, it
will be changing the number (from 1200 to 1006). You can’t get around
it by checking for the Greek numeral sign (ʹ), as sometimes the tonos
(΄), oxeia (´), or even the ASCII straight quote is used. And often in
lists or chapter titles a dot is used instead of numeral sign.
"Also, σ is commonly used at the ends of abbreviations. Changing ‘βλέπε
σ. 16’ (‘see page 16’) to ‘βλέπε ς. 16’ is not acceptable.
"So, no, I don’t think a programming language should be fiddling with σ
versus ς. (A word processor is another matter.)"
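The arithmetic in the quote can be checked with a small C sketch. It handles only the letters the quote mentions. U+0375 (͵) multiplies the next letter by 1000, U+0374 (ʹ) is the trailing numeral sign with no value of its own, and ς is read as ϛ (6), as the quote says it often is in print. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value of one Greek numeral letter; only the letters used in the quote. */
static unsigned letter_value(uint32_t cp)
{
    switch (cp) {
    case 0x0391: case 0x03B1: return 1;    /* Alpha, alpha */
    case 0x03A3: case 0x03C3: return 200;  /* Sigma, sigma */
    case 0x03DA: case 0x03DB:              /* stigma */
    case 0x03C2:              return 6;    /* final sigma, read as stigma */
    default:                  return 0;
    }
}

/* Sum a numeral given as code points. */
static unsigned greek_numeral_value(const uint32_t *cps, size_t n)
{
    assert(cps != NULL || n == 0);
    unsigned total = 0, mult = 1;
    for (size_t i = 0; i < n; i++) {
        if (cps[i] == 0x0375) { mult = 1000; continue; }  /* lower numeral sign */
        if (cps[i] == 0x0374) continue;                   /* numeral sign */
        total += letter_value(cps[i]) * mult;
        mult = 1;
    }
    return total;
}
```

The folded form ͵αςʹ therefore evaluates to 1006, while both ͵ΑΣʹ and ͵ασʹ evaluate to 1200.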
Did you know that a subroutine’s prototype can be modified with s///?
Don’t look:
    *AUTOLOAD = *Internals'SvREFCNT;
    my $f = "Just another "; eval{main->$f};
    print prototype AUTOLOAD;
    $f =~ s/Just another /Perl hacker,\n/;
    print prototype AUTOLOAD;
You did look, didn’t you? You must admit that’s creepy.
The problem goes back to this:
    commit adb5a9ae91a0bed93d396bb0abda99831f9e2e6f
    Author: Doug MacEachern <dougm@covalent.net>
    Date: Sat Jan 6 01:30:05 2001 -0800
        [patch] xsub AUTOLOAD fix/optimization
        Message-ID: <Pine.LNX.4.10.10101060924280.24460-100000@mojo.covalent.net>
        Allow AUTOLOAD to be an xsub and allow such xsubs
        to avoid use of $AUTOLOAD.
        p4raw-id: //depot/perl@8362
which includes this:
    +    if (CvXSUB(cv)) {
    +        /* rather than lookup/init $AUTOLOAD here
    +         * only to have the XSUB do another lookup for $AUTOLOAD
    +         * and split that value on the last '::',
    +         * pass along the same data via some unused fields in the CV
    +         */
    +        CvSTASH(cv) = stash;
    +        SvPVX(cv) = (char *)name; /* cast to loose constness warning */
    +        SvCUR(cv) = len;
    +        return gv;
    +    }
That ‘unused’ field is not unused. It’s where the prototype is
stored. So, not only is it clobbering the prototype, it’s also
leaking it by assigning over the top of SvPVX. Furthermore, it’s blindly
assigning someone else’s string, which could be freed before it’s
even used.
Since it has been documented for a long time that SvPVX contains the
name of the AUTOLOADed sub, and since the use of SvPVX for prototypes
is documented nowhere, we have to preserve the former.
So this commit makes the prototype and the sub name share the same
buffer, in a manner resembling that which CvFILE used before I changed
it with bad4ae38.
There are two new internal macros, CvPROTO and CvPROTOLEN, for
retrieving the prototype.
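Sharing one buffer between the name and the prototype can be pictured with the layout "name\0proto\0". In that layout the current length covers only the name, and the prototype starts just past the name's terminating NUL. The C below is a hypothetical model under that assumption. It is not perl's CvPROTO/CvPROTOLEN code, and it assumes NUL-free strings.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model: one allocation holding "name\0proto\0".
   cur is the name's length (what readers of the name see);
   len is the size of the whole allocation. */
struct shared_buf {
    char  *pv;
    size_t cur;
    size_t len;
};

static int shared_buf_set(struct shared_buf *b, const char *name, const char *proto)
{
    assert(b != NULL && name != NULL && proto != NULL);
    size_t nlen = strlen(name), plen = strlen(proto);
    char *pv = malloc(nlen + 1 + plen + 1);
    if (pv == NULL)
        return -1;
    memcpy(pv, name, nlen + 1);              /* name plus its NUL */
    memcpy(pv + nlen + 1, proto, plen + 1);  /* prototype plus its NUL */
    free(b->pv);                             /* release the old buffer rather than leak it */
    b->pv = pv;
    b->cur = nlen;
    b->len = nlen + 1 + plen + 1;
    return 0;
}

/* The prototype starts just past the name's terminating NUL. */
static const char *shared_buf_proto(const struct shared_buf *b)
{
    return b->pv + b->cur + 1;
}

static size_t shared_buf_protolen(const struct shared_buf *b)
{
    return b->len - b->cur - 2;
}
```

Because code that reads the name stops at `cur`, it never sees the prototype. Storing a new name rewrites the buffer with the prototype carried along, instead of overwriting it.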
This makes perl -E '$::{example} = "\x{30cb}"; say prototype example;'
store and fetch the correctly flagged prototype.
With this, all TODO tests in gv.t pass; the next commit will deal
with making the parsing of prototypes nul-clean.
Since typeglobs may have the UTF8 flag set now, we need to avoid
testing SvCUR on a potential glob, as that would trip an assertion.
This adds a new function to sv.c, sv_ref, which is a nul-and-UTF8
clean version of sv_reftype. pp_ref now uses that.
sv_ref() not only returns the SV, but also takes in an SV
to modify, so we can say both sv_ref(TARG, obj, TRUE); and
sv = sv_ref(NULL, obj, TRUE);
Some tests in t/uni/bless.t are TODO, as ref() isn't
clean yet.
They were nearly identical.
They are almost identical. This gives the compiler less code
to digest.
These ops considered typeglobs read-only, even if they weren’t.
$[ remains as a variable. It no longer has compile-time magic.
At runtime, it always reads as zero, accepts a write of zero, but dies
on writing any other value.
There are so many cases that use this incantation to get around
gv_fetchsv’s calling of get-magic--
    STRLEN len;
    const char *name = SvPV_nomg_const(sv,len);
    gv = gv_fetchpvn_flags(name, len, flags | SvUTF8(sv), type);
--that it’s about time we had a shorthand.
and silences some compiler warnings.
I do not understand the code in toke.c, but the change aligns it with
other uses of FUN0OP; it produces no warnings and does not break any tests.
Magical variables usually get autovivified, even in rvalue context,
because Perl is trying to pretend they have been there all along.
That means defined(${"."}) will autovivify $. and return true.
Until CORE subs were introduced, there were no subroutines that popped
into existence when looked at.
This commit makes rv_2cv use the GV_ADDMG flag added in commit
23496c6ea. When this flag is passed, gv_fetchpvn_flags creates a GV
but does not add it to the stash until it finds out that it is
creating a magical one. The CORE sub code calls newATTRSUB, which expects
to add the CV to the stash itself. So the gv has to be added there
and then. So gv_fetchpvn_flags is also adjusted to add the gv to the
stash right before calling newATTRSUB, and to tell itself that the
GV_ADDMG flag is actually off.
It might be better to move the CV-creation code into op.c and inline
parts of newATTRSUB, to avoid fiddling with the addmg variable (and
avoid prototype checks on CORE subs), but that refactoring should
probably come in separate commits.
This resolves perl bug #97978.
Many built-in variables, like $], are actually created on the fly
when first accessed. Perl likes to pretend that these variables have
always existed, so it autovivifies the *] glob even in rvalue context
(e.g., defined *{"]"}, close "]").
The list of variables that were autovivified was maintained
separately (in is_gv_magical_sv) from the code that actually creates
them (gv_fetchpvn_flags). ‘Maintained’ is not actually precise: it
*wasn’t* being maintained, and there were new variables that never
got added to is_gv_magical_sv and one deleted variable that was
never removed.
There are only two pieces of code that call is_gv_magical_sv, both in
pp.c: S_rv2gv (called by *{} and also the implicit *{} that functions
like close() provide) and Perl_softrefxv (called by ${}, @{}, %{}).
In both cases, the glob is immediately autovivified if
is_gv_magical_sv returns true.
So this commit eliminates the extra maintenance burden by
extirpating is_gv_magical_sv altogether, and replacing it with a new flag to
gv_fetchpvn_flags, GV_ADDMG, which will autovivify a glob *if* it’s a
magical one.
It does make defined(*{"frobbly"}) slightly slower, in that it creates
a temporary glob and then frees it when it sees nothing magical has
been done with it. But this case is rare enough that it should not matter.
At least I got rid of the bugginess.
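The "autovivify only if magical" approach can be illustrated with a toy symbol table in C. The lookup builds a temporary entry and lets the single piece of creation code decide whether the entry is magical. It keeps the entry if so and frees it otherwise. Everything here is hypothetical: the table, `fetch_addmg`, `init_entry` and the list of magical names are invented for illustration and are not perl's stash code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define NAME_LEN    32
#define MAX_ENTRIES 16

/* Hypothetical toy symbol table, not perl's stash code. */
struct entry { char name[NAME_LEN]; bool magical; };
struct table { struct entry *slots[MAX_ENTRIES]; size_t used; };

static struct entry *lookup(const struct table *t, const char *name)
{
    for (size_t i = 0; i < t->used; i++)
        if (strcmp(t->slots[i]->name, name) == 0)
            return t->slots[i];
    return NULL;
}

/* Stand-in for the creation code: the only place that knows which names
   are magical. This list is illustrative, not perl's. */
static void init_entry(struct entry *e)
{
    static const char *const magic[] = { "]", ".", "0" };
    for (size_t i = 0; i < sizeof magic / sizeof magic[0]; i++)
        if (strcmp(e->name, magic[i]) == 0)
            e->magical = true;
}

/* "Add only if magical": build a temporary entry, run the creation code,
   then keep the entry if it turned out magical and free it otherwise. */
static struct entry *fetch_addmg(struct table *t, const char *name)
{
    assert(t != NULL && name != NULL);
    struct entry *e = lookup(t, name);
    if (e != NULL)
        return e;
    if (t->used == MAX_ENTRIES || strlen(name) >= NAME_LEN)
        return NULL;
    e = calloc(1, sizeof *e);
    if (e == NULL)
        return NULL;
    strcpy(e->name, name);
    init_entry(e);
    if (!e->magical) {
        free(e);          /* the extra cost paid for non-magical names */
        return NULL;
    }
    t->slots[t->used++] = e;
    return e;
}
```

The allocate-then-free path for names like "frobbly" is the small slowdown the message mentions. In exchange, the set of magical names is defined in one place only.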
This commit allows &CORE::unpack to be called through references and
via ampersand syntax.
It moves the $_-handling code in pp_coreargs inside the parameter
loop, so it can apply to the second parameter, not just the first.
Consequently, a mkdir test has been added that ensures implicit $_
is not used for mkdir’s second argument; i.e., that the $_-handling
code’s if() condition is correct.
This commit allows the tie, tied and untie subroutines in the CORE
namespace to be called through references and via &ersand() syntax.
pp_coreargs is modified to handle the functions with \[$@%*] in their
prototypes (which happen to be just the tie functions).
This commit makes &CORE::substr callable through references and via
&ersand syntax.
It’s a bit awkward, as we need a substr op that is flagged as
having 4 arguments *and* possibly returning an lvalue. The code in
op_lvalue_flags wasn’t really set up for that, so I needed to flag
the op with OPpMAYBE_LVSUB in coresub_op before it gets passed to
op_lvalue_flags. It turned out that only that was necessary, as
op_lvalue_flags does an op_private == 4 check (rather than (op_private
& 7) == 4 or some such) when checking for the 4-arg case and
croaking. When the op arrives in op_lvalue_flags, it’s already flagged
OPpMAYBE_LVSUB|4, which != 4.
pp_substr is also modified to check for nulls and, if necessary,
adjust its count of how many arguments were actually passed.
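Setting the extra flag was enough because of simple bit arithmetic, shown below in C. `MAYBE_LVSUB_FLAG` is a made-up value standing in for OPpMAYBE_LVSUB; its real value is not assumed. Only the low three bits are taken to hold the argument count, as the `& 7` in the text suggests.

```c
#include <assert.h>
#include <stdbool.h>

#define ARGCOUNT_MASK    0x07u   /* low bits holding the argument count */
#define MAYBE_LVSUB_FLAG 0x08u   /* hypothetical stand-in for OPpMAYBE_LVSUB */

static_assert((MAYBE_LVSUB_FLAG & ARGCOUNT_MASK) == 0,
              "the flag must not overlap the count bits");

/* Exact test: false as soon as any other bit is set. */
static bool four_args_exact(unsigned op_private)
{
    return op_private == 4;
}

/* Masked test: looks only at the count bits. */
static bool four_args_masked(unsigned op_private)
{
    return (op_private & ARGCOUNT_MASK) == 4;
}
```

With the flag set, `MAYBE_LVSUB_FLAG | 4` fails the exact test but would pass the masked one. That is why the 4-argument croak is skipped.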
This commit allows &CORE::srand to be called through references and
via ampersand syntax. pp_srand is modified to take into account the
nulls pushed on to the stack in pp_coreargs, which happens because
pp_coreargs has no other way to tell srand how many arguments it’s
actually getting. See commit 0163043a for details.
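Nulls are used to signal arguments that were not passed. The general idea can be sketched in C: the caller always fills a fixed number of slots, and omitted ones are NULL. The callee counts the leading non-NULL slots to learn how many arguments it really got. This is a hypothetical illustration of the convention, not pp_coreargs or pp_srand, and `effective_argc` is an invented name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: count the leading non-NULL slots, i.e. the arguments the
   caller actually supplied when missing ones are passed as NULL. */
static size_t effective_argc(const void *const *slots, size_t nslots)
{
    assert(slots != NULL || nslots == 0);
    size_t n = 0;
    while (n < nslots && slots[n] != NULL)
        n++;
    return n;
}
```

A srand-like callee would then choose its own seed when the count is 0, and use the supplied value when it is 1.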
Previously, all case changing on utf8-encoded strings used the tables on
disk, under the off-chance that there was a user-defined case change
override in effect. Now that that feature has been removed, this can't
happen, so we can use the existing built-in tables.
This code has been present and ifdef'd out since 5.10.1. New compiler
warnings forced a few other changes besides removing the #if statements.
Running some primitive benchmarks showed that this sped up upper-casing of
utf8 strings in the latin1 range by 2 orders of magnitude.
This now reflects Tom Christiansen's and my current thinking about Greek
Final Sigma.
These are grouped together because they all have \$ in their
prototypes.
This commit allows the subs in the CORE package under those names to
be called through references and via &ersand syntax.
The coreargs op in the subroutine is marked with the OPpSCALARMOD
flag. (scalar_mod_type in op.c returns true for these three ops,
indicating that the OA_SCALARREF parameter is \$, not \[$@%(&)*].)
pp_coreargs uses that flag to decide what arguments to reject.