path: root/pp.c
Commit message | Author | Age | Files | Lines
* Name anon handles __ANONIO__ | Father Chrysostomos | 2011-12-15 | 1 | -1/+1
  rather than $__ANONIO__

  That dollar sign *has* to have been a mistake. In ck_fun, the name was set to __ANONIO__, but it seems the change that added it (afd1915d43) did not account for the fact that, a little later on, the same function checks to make sure it begins with a dollar sign, as it could only be a variable name. rv2gv’s use of $__ANONIO__ (added recently by yours truly) was just copying what ck_fun was doing.
* pp.c: Changing case of utf8 strings under locale uses locale for < 256 | Karl Williamson | 2011-12-15 | 1 | -4/+29
  As proposed on p5p and approved, this changes the functions uc(), lc(), ucfirst(), and lcfirst() to respect locale for code points < 256, and to use Unicode semantics for those above 255. This gives better, but not perfect, results, as noted in the changed pods, and brings these functions into line with how regular expression pattern matching already works.
* Adjust substr offsets when using, not when creating, lvalue | Father Chrysostomos | 2011-12-04 | 1 | -75/+96
  When substr() occurs in potential lvalue context, the offsets are adjusted to the current string (negative being converted to positive, lengths reaching beyond the end of the string being shortened, etc.) as soon as the special lvalue to be returned is created.

  When that lvalue is assigned to, the original scalar is stringified once more.

  That implementation results in two bugs:

  1) FETCH is called twice in a simple substr() assignment (except in void context, due to the special optimisation of commit 24fcb59fc).

  2) These two calls are not equivalent:

    $SIG{__WARN__} = sub { warn "w ",shift};
    sub myprint { print @_; $_[0] = 1 }
    print substr("", 2);
    myprint substr("", 2);

  The second one dies. The first one only warns. That’s mean. The error is also wrong, sometimes, if the original string is going to get longer before the substr lvalue is actually used.

  The behaviour of \substr($str, -1) if $str changes length is completely undocumented. Before 5.10, it was documented as being unreliable and subject to change.

  What this commit does is make the lvalue returned by substr remember the original arguments and only adjust the offsets when the assignment happens.

  This means that the following now prints z, instead of xyz (which is actually what I would expect):

    $str = "a";
    $substr = \substr($str,-1);
    $str = "xyz";
    print $$substr;
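The deferred-offset behaviour described in this commit can be observed from plain Perl. The following is a minimal sketch, assuming a perl that contains this change (5.16 or later); the variable names are illustrative only. Per the commit message, the first print should show the last character of the string as it is at the time of use.

```perl
use strict;
use warnings;

# Take a reference to the substr lvalue while the string is one character long.
my $str = "a";
my $ref = \substr($str, -1);

# Lengthen the original string before the lvalue is ever used.
$str = "xyz";

# The offset -1 is resolved here, against the current contents of $str.
my $seen = $$ref;
print "$seen\n";

# Assigning through the reference replaces the current last character.
$$ref = "!";
print "$str\n";
```

On a perl that predates the change, the offset would instead have been fixed when the reference was taken.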
* Optimise substr assignment in void context | Father Chrysostomos | 2011-11-26 | 1 | -8/+15
  In void context we can optimise

    substr($foo, $bar, $baz) = $replacement;

  to something like

    substr($foo, $bar, $baz, $replacement);

  except that the execution order must be preserved. So what we actually do is

    substr($replacement, $foo, $bar, $baz);

  with a flag to indicate that the replacement comes first. This means we can also optimise assignment to two-argument substr the same way.

  Although optimisations are not supposed to change behaviour, this one does.

  • It stops substr assignment from calling get-magic twice, which means the optimisation makes things less buggy than usual.
  • It causes the uninitialized warning (for an undefined first argument) to mention the substr operator, as it did before the previous commit, rather than the assignment operator. I think that sort of detail is minor enough.

  I had to make the warning about clobbering references apply whenever substr does a replacement, and not only when used as an lvalue. So four-argument substr now emits that warning. I would consider that a bug fix, too.

  Also, if the numeric arguments to four-argument substr and the replacement string are undefined, the order of the uninitialized warnings is slightly different, but is consistent regardless of whether the optimisation is in effect.

  I believe this will make 95% of substr assignments run faster. So there is less incentive to use what I consider the less readable form (the four-argument form, which is not self-documenting).

  Since I like naïve benchmarks, here are Before and After:

  Before:

    $ time ./miniperl -le 'do{$x="hello"; substr ($x,0,0) = 34;0}for 1..1000000'
    real 0m2.391s
    user 0m2.381s
    sys  0m0.005s

  After:

    $ time ./miniperl -le 'do{$x="hello"; substr ($x,0,0) = 34;0}for 1..1000000'
    real 0m0.936s
    user 0m0.927s
    sys  0m0.005s
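The two source-level forms that this optimisation treats alike can be compared directly. This is only a small sketch with illustrative variable names; it shows that the lvalue form and the four-argument form leave the string in the same state, not how the op tree is rewritten internally.

```perl
use strict;
use warnings;

# Lvalue form in void context: the case this commit optimises.
my $lvalue_form = "hello";
substr($lvalue_form, 0, 0) = 34;

# Four-argument form: the replacement is passed as the last argument.
my $four_arg_form = "hello";
substr($four_arg_form, 0, 0, 34);

print "$lvalue_form\n";
print "$four_arg_form\n";
```

Both variables end up holding the same string, which is why the lvalue form can be compiled down to the replacement form when its return value is not needed.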
* Don’t coerce $x immediately in foo(substr $x...) | Father Chrysostomos | 2011-11-26 | 1 | -16/+7
  This program:

    #!perl -l
    sub myprint { print @_ }
    print substr *foo, 1;
    myprint substr *foo, 1;

  produces:

    main::foo
    Can't coerce GLOB to string in substr at - line 4.

  Ouch!

  I would expect \substr simply to give me a scalar that peeks into the original string, but without modifying the original until the return value of \substr is actually assigned to.

  But it turns out that it coerces the original into a string immediately, unless it’s GMAGICAL. I find the exception for magical variables rather befuddling. I can only imagine it was for efficiency (since the stringified form will be overwritten when magic_setsubstr calls SvGETMAGIC), but that doesn’t make sense as the original variable can itself be modified between the return of the special lvalue and the assignment to that lvalue.

  Since magic_setsubstr itself coerces the variable into a string upon assignment to the lvalue, we can just remove the coercion code from pp_substr.

  But that causes double uninitialized warnings in cases like substr($undef, 0,0) = "lrep".

  That happens because pp_substr is still stringifying the variable (but without modifying it). It has to do that, as it looks at the length of the original string and accordingly adjusts the offsets stored in the lvalue if they are negative or if they extend beyond the end of the string.

  So this commit takes the simple route of avoiding the warning in pp_substr by only stringifying a variable that is SvOK if called in lvalue context.

  Hence, assignment to substr($tied...) will continue to call FETCH twice, but that is not a new bug.

  The ideal solution would be for the offsets to be translated in mg.c, rather than in pp_substr. But that would be a more involved change (including most of this commit, which is therefore not wasted) with potential backward-compatibility issues with negative numbers.

  A side effect is that the ‘Attempt to use reference as lvalue in substr’ warning now occurs during the assignment to the substr lvalue, rather than in substr itself. This means it occurs even for tied variables, so things are now more consistent.

  The example at the beginning could still croak if the glob were replaced with a null string, so this commit only partially alleviates the pain.
* Call FETCH once when chomping a tied ref | Father Chrysostomos | 2011-11-24 | 1 | -1/+1
* pp.c: Remove useless read-only check from S_do_chomp | Father Chrysostomos | 2011-11-24 | 1 | -1/+1
  After sv_force_normal_flags, the scalar will no longer be read-only, except in those cases where sv_force_normal_flags croaks. So this check will never be true when SvFAKE was true.
* amagic_deref_call does not necessitate SPAGAIN | Father Chrysostomos | 2011-11-22 | 1 | -4/+0
  As amagic_deref_call pushes a new stack, PL_stack_sp will always have the same value before and after, so SPAGAIN is unnecessary.
* [perl #80628] __SUB__ | Father Chrysostomos | 2011-11-22 | 1 | -0/+19
  After much alternation, altercation and alteration, __SUB__ is finally here.
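A short illustration of what __SUB__ provides: a reference to the currently running subroutine, which lets an anonymous sub recurse without naming itself. This sketch assumes a perl with the `current_sub` feature (5.16 or later); the factorial function is just an example.

```perl
use strict;
use warnings;
use feature 'current_sub';

# An anonymous recursive subroutine: __SUB__ refers to the sub being run,
# so no named variable has to be captured (which would create a cycle).
my $factorial = sub {
    my ($n) = @_;
    return $n <= 1 ? 1 : $n * __SUB__->($n - 1);
};

my $result = $factorial->(5);
print "$result\n";
```

Without __SUB__, the usual workaround is to declare `my $factorial;` first and call `$factorial->(...)` inside the body, which makes the closure refer to itself.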
* Mention implicit $_ in y///r uninit warning | Father Chrysostomos | 2011-11-19 | 1 | -2/+4
  This brings it into conformity with y without the /r.
* expunge gratuitous Unicode punctuation in comments | Zefram | 2011-11-16 | 1 | -1/+1
* pp.c: Make sure variable is initialized | Karl Williamson | 2011-11-12 | 1 | -0/+1
  A compiler generated a warning about this. It is the degenerate case with an empty input, so it isn't really a problem, but this silences the warning.
* pp.c: Call subroutine instead of repeat code | Karl Williamson | 2011-11-11 | 1 | -49/+34
  Now that there is a function that can convert a latin1 character to title or upper case without going out to swashes, we can call it instead of repeating the code. There is the additional overhead of a function call, but this could be avoided if it comes down to it by making it in-line.
* pp.c: Remove macro no-longer called | Karl Williamson | 2011-11-11 | 1 | -10/+2
* pp.c: Call subroutine instead of repeat code | Karl Williamson | 2011-11-11 | 1 | -38/+2
  Now that there is a function that can convert a latin1 character to title or upper case without going out to swashes, we can call it instead of repeating the code. There is the additional overhead of a function call, but this could be avoided if it comes down to it by making it in-line. And this only happens when upper-casing y with diaeresis and the micro sign.
* pp.c: White-space only | Karl Williamson | 2011-11-11 | 1 | -7/+6
  This outdents and reflows comments as a result of the removal of a surrounding block.
* pp.c: Call subroutine instead of repeat code | Karl Williamson | 2011-11-11 | 1 | -76/+1
  Now that toLOWER_utf8() and toTITLE_utf8() have the intelligence to skip going out to swashes for Latin1 code points, it's not so critical to bypass calling them for these (for speed). It simplifies things not to have the intelligence repeated. There is the additional overhead of two function calls (minus the branches saved), but these could be avoided if it comes down to it by making them in-line.
* pp.c: White-space only | Karl Williamson | 2011-11-11 | 1 | -28/+28
  This outdents and reflows comments as a result of the removal of a surrounding block.
* pp.c: Call subroutine instead of repeat code | Karl Williamson | 2011-11-11 | 1 | -24/+8
  Now that toUPPER_utf8() has the intelligence to skip going out to swashes for Latin1 code points, it's not so critical to bypass calling it for these (for speed). It simplifies things not to have the intelligence repeated. There is the additional overhead of two function calls (minus the branches saved), but these could be avoided if it comes down to it by making them in-line.
* pp.c: Add compiler hint | Karl Williamson | 2011-11-11 | 1 | -1/+1
  Almost always the input to uc() will be one of the other 253 Latin1 characters rather than one of the three that get here.
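The commit does not name the three characters. A reasonable reading, stated here as an assumption, is that they are the three Latin1 code points whose upper-case form falls outside Latin1: the micro sign (U+00B5), sharp s (U+00DF) and y with diaeresis (U+00FF). The sketch below uses a hypothetical helper, `upper_code_points`, to show what uc() returns for each under Unicode semantics.

```perl
use strict;
use warnings;

# Upper-case one Latin-1 code point under Unicode semantics and return
# the result as a space-separated list of U+XXXX code points.
sub upper_code_points {
    my ($code_point) = @_;
    my $char = chr $code_point;
    utf8::upgrade($char);    # UTF-8 encoded internally, so uc() uses Unicode rules
    return join ' ', map { sprintf 'U+%04X', ord } split //, uc $char;
}

# Micro sign, sharp s, and y with diaeresis.
for my $code_point (0xB5, 0xDF, 0xFF) {
    printf "U+%04X -> %s\n", $code_point, upper_code_points($code_point);
}
```

Each of the three maps to something that cannot be expressed as a single Latin1 character, which is why they need special handling while the remaining 253 do not.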
* pp.c: White-space only | Karl Williamson | 2011-11-11 | 1 | -24/+23
  This outdents and reflows comments as a result of the removal of a surrounding block.
* pp.c: Call subroutine instead of repeat code | Karl Williamson | 2011-11-11 | 1 | -19/+0
  Now that toLOWER_utf8() has the intelligence to skip going out to swashes for Latin1 code points, it's not so critical to bypass calling it for these (for speed). It simplifies things not to have the intelligence repeated. There is the additional overhead of two function calls (minus the branches saved), but these could be avoided if it comes down to it by making them in-line.
* [perl #96326] *{$io} should not be semi-defined | Father Chrysostomos | 2011-11-06 | 1 | -1/+1
  gv_efullname4 produces undef if the GV points to no stash, instead of using __ANON__, as it does when the stash has no name.

  Instead of going through hoops to try and work around it elsewhere, fix gv_efullname4.

  This means that

    $x = *$io;
    $x .= "whate’er";

  no longer produces an uninitialized warning. (The warning was rather strange, as defined() returned true.)

  This commit also gives the glob the name $__ANONIO__ (yes, with a dollar sign). It may seem a little strange, but there is precedent in other autovivified globs, such as those open() produces when it cannot determine the variable name (e.g., open $t->{fh}).
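The fixed behaviour can be checked with a small script. This is a sketch assuming a perl that includes this change (5.16 or later); it uses `*STDOUT{IO}` as a convenient way to obtain a bare IO reference, and collects warnings in a hypothetical `@warnings` array rather than printing them.

```perl
use strict;
use warnings;

# Collect any warnings instead of printing them.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# *STDOUT{IO} is a reference to the bare IO object, with no glob attached.
my $io = *STDOUT{IO};

# Dereferencing it as a glob wraps the handle in an anonymous glob.
my $x = *$io;
$x .= "whate'er";

print "$x\n";
print scalar(@warnings), " warning(s)\n";
```

The exact glob name printed depends on the perl version, since a later commit in this log changes it from $__ANONIO__ to __ANONIO__.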
* pp.c: White space only | Karl Williamson | 2011-10-17 | 1 | -16/+15
  This outdents a block to the same level as the surrounding text, and reflows the comments to take advantage of the extra space and use fewer lines.
* pp.c: Remove disabled code for context sensitive lc | Karl Williamson | 2011-10-17 | 1 | -70/+4
  This code was always #ifdef'd out. It would have been used to convert to a Greek final sigma from a non-final one, depending on context. The problem is that we can't know algorithmically if a final sigma is in order or not. I excerpt this quote, that I find persuasive, from correspondence from Father Chrysostomos, who knows Greek:

  "I cannot see how any algorithm can know to get it right.

  "The letter σ (or Σ in capitals) represents the number 200 in Greek numerals. Those are not just ancient Greek numerals, but are used on a regular basis even in modern Greek. In many printed books ς is used in place of ϛ, which represents the number 6. So if casefolding should change ͵ΑΣʹ to ͵αςʹ, or if an output layer changes ͵ασʹ similarly, it will be changing the number (from 1200 to 1006). You can’t get around it by checking for the Greek numeral sign (ʹ), as sometimes the tonos (΄), oxeia (´), or even the ASCII straight quote is used. And often in lists or chapter titles a dot is used instead of numeral sign.

  "Also, σ is commonly used at the ends of abbreviations. Changing ‘βλέπε σ. 16’ (‘see page 16’) to ‘βλέπε ς. 16’ is not acceptable.

  "So, no, I don’t think a programming language should be fiddling with σ versus ς. (A word processor is another matter.)"
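The consequence of leaving context out of lc() is easy to see. The sketch below lower-cases a Greek word written in capitals (ΟΔΟΣ, given as escapes so the script is encoding-independent) and reports the code point of the last character; the variable names are illustrative.

```perl
use strict;
use warnings;

# The word ΟΔΟΣ in capitals: omicron, delta, omicron, sigma.
my $upper = "\x{39F}\x{394}\x{39F}\x{3A3}";

# lc() maps each character independently of its position in the word.
my $lower = lc $upper;

# Report the last character: U+03C3 is σ, U+03C2 would be final ς.
my $last = sprintf 'U+%04X', ord substr($lower, -1);
print "$last\n";
```

A capital sigma is always mapped to the non-final form, so any final-sigma handling is left to the application.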
* do not return useless value from void-context substr | Chip Salzenberg | 2011-10-10 | 1 | -9/+14
* Resolve XS AUTOLOAD-prototype conflict | Father Chrysostomos | 2011-10-09 | 1 | -1/+3
  Did you know that a subroutine’s prototype can be modified with s///? Don’t look:

    *AUTOLOAD = *Internals'SvREFCNT;
    my $f = "Just another ";
    eval{main->$f};
    print prototype AUTOLOAD;
    $f =~ s/Just another /Perl hacker,\n/;
    print prototype AUTOLOAD;

  You did look, didn’t you? You must admit that’s creepy.

  The problem goes back to this:

    commit adb5a9ae91a0bed93d396bb0abda99831f9e2e6f
    Author: Doug MacEachern <dougm@covalent.net>
    Date: Sat Jan 6 01:30:05 2001 -0800

        [patch] xsub AUTOLOAD fix/optimization
        Message-ID: <Pine.LNX.4.10.10101060924280.24460-100000@mojo.covalent.net>

        Allow AUTOLOAD to be an xsub and allow such xsubs to avoid use of $AUTOLOAD.

        p4raw-id: //depot/perl@8362

  which includes this:

    + if (CvXSUB(cv)) {
    +     /* rather than lookup/init $AUTOLOAD here
    +      * only to have the XSUB do another lookup for $AUTOLOAD
    +      * and split that value on the last '::',
    +      * pass along the same data via some unused fields in the CV
    +      */
    +     CvSTASH(cv) = stash;
    +     SvPVX(cv) = (char *)name; /* cast to loose constness warning */
    +     SvCUR(cv) = len;
    +     return gv;
    + }

  That ‘unused’ field is not unused. It’s where the prototype is stored. So, not only is it clobbering the prototype, it’s also leaking it by assigning over the top of SvPVX. Furthermore, it’s blindly assigning someone else’s string, which could be freed before it’s even used.

  Since it has been documented for a long time that SvPVX contains the name of the AUTOLOADed sub, and since the use of SvPVX for prototypes is documented nowhere, we have to preserve the former.

  So this commit makes the prototype and the sub name share the same buffer, in a manner resembling that which CvFILE used before I changed it with bad4ae38.

  There are two new internal macros, CvPROTO and CvPROTOLEN, for retrieving the prototype.
* gv.c, op.c, pp.c: Stash-injected prototypes and prototype() are UTF-8 clean. | Brian Fraser | 2011-10-06 | 1 | -1/+1
  This makes

    perl -E '$::{example} = "\x{30cb}"; say prototype example;'

  store and fetch the correctly flagged prototype. With this, all TODO tests in gv.t pass. The next commit will deal with making the parsing of prototypes nul-clean.
* pp.c: Got pp_gelem nul-clean. | Brian Fraser | 2011-10-06 | 1 | -11/+12
* pp.c: Make warnings utf8-clean | Brian Fraser | 2011-10-06 | 1 | -3/+5
* pp.c: pp_substr for UTF-8 globs. | Brian Fraser | 2011-10-06 | 1 | -2/+2
  Since typeglobs may have the UTF8 flag set now, we need to avoid testing SvCUR on a potential glob, as that would trip an assertion.
* pp.c & sv.c: pp_ref UTF8 and null cleanup. | Brian Fraser | 2011-10-06 | 1 | -3/+2
  This adds a new function to sv.c, sv_ref, which is a nul-and-UTF8 clean version of sv_reftype. pp_ref now uses that.

  sv_ref() not only returns the SV, but also takes in an SV to modify, so we can say both

    sv_ref(TARG, obj, TRUE);

  and

    sv = sv_ref(NULL, obj, TRUE);
* pp.c: pp_bless UTF8 cleanup. | Brian Fraser | 2011-10-06 | 1 | -1/+1
  Some tests in t/uni/bless.t are TODO, as ref() isn't clean yet.
* pp.c: Make pp_rv2cv use gv_autoload_pvn() | Brian Fraser | 2011-10-06 | 1 | -1/+1
* pp.c: pp_rv2gv UTF8 cleanup. | Brian Fraser | 2011-10-06 | 1 | -4/+3
* Merge postinc and postdec | Father Chrysostomos | 2011-09-16 | 1 | -25/+7
  They were nearly identical.
* Merge preinc and postinc | Father Chrysostomos | 2011-09-16 | 1 | -17/+0
  They are almost identical. This gives the compiler less code to digest.
* Make ++ and -- work on glob copies | Father Chrysostomos | 2011-09-16 | 1 | -3/+3
  These ops considered typeglobs read-only, even if they weren’t.
* remove index offsetting ($[) | Zefram | 2011-09-09 | 1 | -51/+13
  $[ remains as a variable. It no longer has compile-time magic. At runtime, it always reads as zero, accepts a write of zero, but dies on writing any other value.
* Enter gv_fetchsv_nomg | Father Chrysostomos | 2011-09-08 | 1 | -21/+5
  There are so many cases that use this incantation to get around gv_fetchsv’s calling of get-magic--

    STRLEN len;
    const char *name = SvPV_nomg_const(sv,len);
    gv = gv_fetchpvn_flags(name, len, flags | SvUTF8(sv), type);

  --that it’s about time we had a shorthand.
* remove unused variables and assignments | Robin Barker | 2011-09-08 | 1 | -2/+1
  and silences some compiler warnings.

  I do not understand the code in toke.c, but the change aligns the code with other uses of FUN0OP; it produces no warnings and does not break any test.
* [perl #97484] Make defined &{...} vivify CORE subs | Father Chrysostomos | 2011-09-01 | 1 | -1/+1
  Magical variables usually get autovivified, even in rvalue context, because Perl is trying to pretend they have been there all along. That means defined(${"."}) will autovivify $. and return true. Until CORE subs were introduced, there were no subroutines that popped into existence when looked at.

  This commit makes rv2cv use the GV_ADDMG flag added in commit 23496c6ea. When this flag is passed, gv_fetchpvn_flags creates a GV but does not add it to the stash until it finds out that it is creating a magical one.

  The CORE sub code calls newATTRSUB, which expects to add the CV to the stash itself. So the gv has to be added there and then. So gv_fetchpvn_flags is also adjusted to add the gv to the stash right before calling newATTRSUB, and to tell itself that the GV_ADDMG flag is actually off.

  It might be better to move the CV-creation code into op.c and inline parts of newATTRSUB, to avoid fiddling with the addmg variable (and avoid prototype checks on CORE subs), but that refactoring should probably come in separate commits.
* Eliminate is_gv_magical_sv | Father Chrysostomos | 2011-08-30 | 1 | -18/+6
  This resolves perl bug #97978.

  Many built-in variables, like $], are actually created on the fly when first accessed. Perl likes to pretend that these variables have always existed, so it autovivifies the *] glob even in rvalue context (e.g., defined *{"]"}, close "]").

  The list of variables that were autovivified was maintained separately (in is_gv_magical_sv) from the code that actually creates them (gv_fetchpvn_flags). ‘Maintained’ is not actually precise: it *wasn’t* being maintained, and there were new variables that never got added to is_gv_magical_sv and one deleted variable that was never removed.

  There are only two pieces of code that call is_gv_magical_sv, both in pp.c: S_rv2gv (called by *{} and also the implicit *{} that functions like close() provide) and Perl_softrefxv (called by ${}, @{}, %{}). In both cases, the glob is immediately autovivified if is_gv_magical_sv returns true.

  So this commit eliminates the extra maintenance burden by extirpating is_gv_magical_sv altogether, and replacing it with a new flag to gv_fetchpvn_flags, GV_ADDMG, which will autovivify a glob *if* it’s a magical one.

  It does make defined(*{"frobbly"}) slightly slower, in that it creates a temporary glob and then frees it when it sees nothing magical has been done with it. But this case is rare enough it should not matter. At least I got rid of the bugginess.
* &CORE::unpack() | Father Chrysostomos | 2011-08-29 | 1 | -15/+14
  This commit allows &CORE::unpack to be called through references and via ampersand syntax. It moves the $_-handling code in pp_coreargs inside the parameter loop, so it can apply to the second parameter, not just the first. Consequently, a mkdir test has been added that ensures implicit $_ is not used for mkdir’s second argument; i.e., that the $_-handling code’s if() condition is correct.
* &CORE::foo() for tie functions | Father Chrysostomos | 2011-08-29 | 1 | -2/+6
  This commit allows the tie, tied and untie subroutines in the CORE namespace to be called through references and via &ampersand() syntax. pp_coreargs is modified to handle the functions with \[$@%*] in their prototypes (which happen to be just the tie functions).
* &CORE::substr() | Father Chrysostomos | 2011-08-27 | 1 | -5/+9
  This commit makes &CORE::substr callable through references and via &ampersand syntax.

  It’s a bit awkward, as we need a substr op that is flagged as having 4 arguments *and* possibly returning an lvalue. The code in op_lvalue_flags wasn’t really set up for that, so I needed to flag the op with OPpMAYBE_LVSUB in coresub_op before it gets passed to op_lvalue_flags. It turned out that only that was necessary, as op_lvalue_flags does an op_private == 4 check (rather than (op_private & 7) == 4 or some such) when checking for the 4-arg case and croaking. When the op arrives in op_lvalue_flags, it’s already flagged OPpMAYBE_LVSUB|4 which != 4.

  pp_substr is also modified to check for nulls and, if necessary, adjust its count of how many arguments were actually passed.
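From the Perl side, the effect of this commit is that substr can be treated like an ordinary subroutine. The following is a minimal sketch, assuming a perl with callable CORE:: subs (5.16 or later); the strings and variable names are illustrative.

```perl
use strict;
use warnings;

# Take a reference to the built-in and call it like any other code ref.
my $substr = \&CORE::substr;
my $first_word = $substr->("hello world", 0, 5);

# Call it with ampersand syntax, here with only two arguments, which is
# the case where pp_substr has to work out how many were really passed.
my $second_word = &CORE::substr("hello world", 6);

print "$first_word\n";
print "$second_word\n";
```

Being able to take such a reference is mainly useful for passing built-ins to higher-order code without wrapping them in an anonymous sub.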
* &CORE::srand() | Father Chrysostomos | 2011-08-27 | 1 | -1/+1
  This commit allows &CORE::srand to be called through references and via ampersand syntax. pp_srand is modified to take into account the nulls pushed on to the stack in pp_coreargs, which happens because pp_coreargs has no other way to tell srand how many arguments it’s actually getting. See commit 0163043a for details.
* pp.c: Use built-in case tables for ords < 256 | Karl Williamson | 2011-08-27 | 1 | -27/+4
  Previously, all case changing on utf8-encoded strings used the tables on disk, on the off chance that there was a user-defined case change override in effect. Now that that feature has been removed, this can't happen, so we can use the existing built-in tables. This code has been present and ifdef'd out since 5.10.1.

  New compiler warnings forced a few other changes besides removing the #if statements.

  Running some primitive benchmarks showed that this sped up upper-casing of utf8 strings in the latin1 range by 2 orders of magnitude.
* pp.c: Change comment | Karl Williamson | 2011-08-27 | 1 | -14/+8
  This now reflects Tom Christiansen's and my current thinking about Greek Final Sigma.
* &CORE::foo() for (sys)read and recv | Father Chrysostomos | 2011-08-26 | 1 | -4/+15
  These are grouped together because they all have \$ in their prototypes. This commit allows the subs in the CORE package under those names to be called through references and via &ampersand syntax.

  The coreargs op in the subroutine is marked with the OPpSCALARMOD flag. (scalar_mod_type in op.c returns true for these three ops, indicating that the OA_SCALARREF parameter is \$, not \[$@%(&)*].) pp_coreargs uses that flag to decide what arguments to reject.