summaryrefslogtreecommitdiff
path: root/embed.h
Commit message (Collapse)AuthorAgeFilesLines
* op.c: Disentangle apply_attrs_my from apply_attrsFather Chrysostomos2012-09-191-1/+1
| | | | | | | apply_attrs consisted of a top-level if/else conditional upon a bool- ean argument. It was being called with a TRUE argument in only one place, apply_attrs_my. Inlining that branch into apply_attrs_my actu- ally reduces the amount of code slightly.
* [perl #114942] Correct scoping for ‘for my $x(){} $x’Father Chrysostomos2012-09-191-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This was broken by commit 60ac52eb5d5. What that commit did was to merge two different queues that the lexer had for pending tokens. Since bison requires that yylex return exactly one token for each call, when the lexer sometimes has to set aside tokens in a queue and return them from the next few calls to yylex. Formerly, there were two mechanism: the forced token queue (used by force_next), and PL_pending_ident. PL_pending_ident was used for names that had to be looked up in the pads. $foo was handled like this: First call to yylex: 1. Put '$foo' in PL_tokenbuf. 2. Set PL_pending_ident. 3. Return a '$' token. Second call: PL_pending_ident is set, so call S_pending_ident, which looks up the name from PL_tokenbuf, and return the THING token containing the appropriate op. The forced token queue took precedence over PL_pending_ident. Chang- ing the order (necessary for parsing ‘our sub foo($)’) caused some XS::APItest tests to fail. So I concluded that the two queues needed to be merged. As a result, the $foo handling changed to this: First call to yylex: 1. Put '$foo' in PL_tokenbuf. 2. Call force_ident_maybe_lex (S_pending_ident renamed and modi- fied), which looks up the symbol and adds it to the forced token queue. 3. Return a '$' token. Second call: Return the token from the forced token queue. That had the unforeseen consequence of changing this: for my $x (...) { ... } $x; such that the $x was still visible after the for loop. It only hap- pened when the $ was the next token after the closing }: $ ./miniperl -e 'for my $x(()){} $x = 3; warn $x' Warning: something's wrong at -e line 1. $ ./miniperl -e 'for my $x(()){} ;$x = 3; warn $x' 3 at -e line 1. This broke Class::Declare. The name lookup in the pad must not happen before the '$' token is emitted. At that point, the parser has not yet created the for loop (which includes exiting its scope), as it does not yet know whether there is a continue block. (See the ‘FOR MY...’ branch of the barestmt rule in perly.y.) So we must delay the name lookup till the second call. So we rename force_ident_maybe_lex back to S_pending_ident, removing the force_next stuff. And we add a new force_ident_maybe_lex function that adds a special ‘pending ident’ token to the forced token queue. The part of yylex that handles pending tokens (case LEX_KNOWNEXT) is modified to account for these special ‘pending ident’ tokens and call S_pending_ident.
* [perl #114924] Make method calls work with ::SUPER packagesFather Chrysostomos2012-09-171-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Perl caches SUPER methods inside packages named Foo::SUPER. But this interferes with actual method calls on those packages (SUPER->foo, foo::SUPER->foo). The first time a package is looked up, it is vivified under the name with which it is looked up. So *SUPER:: will cause that package to be called SUPER, and *main::SUPER:: will cause it to be named main::SUPER. main->SUPER::isa used to be very sensitive to the name of the main::FOO package (where the cache is kept). If it happened to be called SUPER, that call would fail. Fixing that bug (commit 3c104e59d83f) caused the CPAN module named SUPER to fail, because SUPER->foo was now being treated as a SUPER::method call. gv_fetchmeth_pvn was using the ::SUPER suffix to determine where to look for the method. The package passed to it (the ::SUPER package) was being used to look for cached methods, but the package with ::SUPER stripped off was being used for the rest of lookup. 3c104e59d83f made main->SUPER::foo work by treating SUPER as main::SUPER in that case. Mentioning *main::SUPER:: or doing a main->SUPER::foo call before loading SUPER.pm also caused it to fail, even before 3c104e59d83f. Instead of using publicly-visible packages for internal caches, we should be keeping them internal, to avoid such side effects. This commit adds a new member to the HvAUX struct, where a hash of GVs is stored, to cache super methods. I cannot simpy use a hash of CVs, because I need GvCVGEN. Using a hash of GVs allows the existing method cache code to be used. This new hash of GVs is not actually a stash, as it has no HvAUX struct (i.e., no name, no mro_meta). It doesn’t even need an @ISA entry as before (which was only used to make isa caches reset), as it shares its owner stash’s mro_meta generation numbers. In fact, the GVs inside it have their GvSTASH pointers pointing to the owner stash. In terms of memory use, it is probably the same as before. Every stash and every iterated or weakly-referenced hash is now one pointer larger than before, but every SUPER cache is smaller (no HvAUX, no *ISA + @ISA + $ISA[0] + magic). The code is a lot simpler now and uses fewer stash lookups, so it should be faster. This will break any XS code that expects the gv_fetchmeth_pvn to treat the ::SUPER suffix as magical. This behaviour was only barely docu- mented (the suffix was mentioned, but what it did was not), and is unused on CPAN.
* Clone my subs on scope entryFather Chrysostomos2012-09-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The pad slot for a my sub now holds a stub with a prototype CV attached to it by proto magic. The prototype is cloned on scope entry. The stub in the pad is used when cloning, so any code that references the sub before scope entry will be able to see that stub become defined, making these behave similarly: our $x; BEGIN { $x = \&foo } sub foo { } our $x; my sub foo { } BEGIN { $x = \&foo } Constants are currently not cloned, but that may cause bugs in pad_push. I’ll have to look into that. On scope exit, lexical CVs go through leave_scope’s SAVEt_CLEARSV sec- tion, like lexical variables. If the sub is referenced elsewhere, it is abandoned, and its proto magic is stolen and attached to a new stub stored in the pad. If the sub is not referenced elsewhere, it is undefined via cv_undef. To clone my subs on scope entry, we create a sequence of introcv and clonecv ops. See the huge comment in block_end that explains why we need two separate ops for each CV. To allow my subs to be defined in inner subs (my sub foo; sub { sub foo {} }), pad_add_name_pvn and S_pad_findlex now upgrade the entry for a my sub to a CV to begin with, so that fake entries added to pads (fake entries are those that reference outer pads) can share the same CV. Otherwise newMYSUB would have to add the CV to every pad that closes over the ‘my sub’ declaration. newMYSUB no longer throws away the initial value replacing it with a new one. Prototypes are not currently visible to sub calls at compile time, because the lexer sees the empty stub. A future commit will solve that. When I added name heks to CV’s I made mistakes in a few places, by not turning on the CVf_NAMED flag, or by not clearing the field when free- ing the hek. Those code paths were not exercised enough by state subs, so the problems did not show up till now. So this commit fixes those, too. One of the tests in lexsub.t, involving foreach loops, was incorrect, and has been fixed. Another test has been added to the end for a par- ticular case of state subs closing over my subs that I broke when ini- tially trying to get sibling my subs to close over each other, before I had separate introcv and clonecv ops.
* Store state subs in the padFather Chrysostomos2012-09-151-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | In making ‘sub foo’ respect previous ‘our sub’ declarations in a recent commit, I actually made ‘state sub foo’ into a syntax error. (At the time, I patched up MYSUB in perly.y to keep the tests for ‘"my sub" not yet implemented’ still working.) Basically, it was creat- ing an empty pad entry, but returning something that perly.y was not expecting. This commit adjusts the grammar to allow the SUB branch of barestmt to accept a PRIVATEREF for its subname, in addition to a WORD. It reuses the subname rule that SUB used to use (before our subs were added), gutting it to remove the special block handling, which SUB now tokes care of. That means the MYSUB rule will no longer turn on CvSPECIAL on the PL_compcv that is going to be thrown away anyway. The code for special blocks (BEGIN, END, etc.) that turns on CvSPECIAL now checks for state subs and skips those. It only applies to our subs and package subs. newMYSUB has now actually been written. It basically duplicates newATTRSUB, except for GV-specific things. It does currently vivify a GV and set CvGV, but I am hoping to change that later. I also hope to merge some of the code later, too. I changed the prototype of newMYSUB to make it easier to use. It is not used anywhere on CPAN and has always simply died, so that should be all right.
* eliminate PL_reginputDavid Mitchell2012-09-141-2/+2
| | | | | | | | | | | | | PL_reginput (which is actually #defined to PL_reg_state.re_state_reginput) is, to all intents and purposes, state that is only used within S_regmatch(). The only other places it is referenced are in S_regtry() and S_regrepeat(), where it is used to pass the current match position back and forth between the subs. Do this passing instead via function args, and bingo! PL_reginput is now just a local var of S_regmatch().
* Use macro not swash for utf8 quotemetaKarl Williamson2012-09-131-3/+0
| | | | | | | | | | | | | | The rules for matching whether an above-Latin1 code point are now saved in a macro generated from a trie by regen/regcharclass.pl, and these are now used by pp.c to test these cases. This allows removal of a wrapper subroutine, and also there is no need for dynamic loading at run-time into a swash. This macro is about as big as I'm comfortable compiling in, but it saves the building of a hash that can grow over time, and removes a subroutine and interpreter variables. Indeed, performance benchmarks show that it is about the same speed as a hash, but it does not require having to load the rules in from disk the first time it is used.
* Move 2 functions from utf8.c to regexec.cKarl Williamson2012-09-131-1/+1
| | | | | | | One of these functions is currently commented out. The other is called only in regexec.c in one place, and was recently revised to no longer require the static function in utf8.c that it formerly called. They can be made static inline.
* regexec.c: Use new macros instead of swashesKarl Williamson2012-09-131-7/+0
| | | | | | | | | | A previous commit has caused macros to be generated that will match Unicode code points of interest to the \X algorithm. This patch uses them. This speeds up modern Korean processing by 15%. Together with recent previous commits, the throughput of modern Korean under \X has more than doubled, and is now comparable to other languages (which have increased themselved by 35%)
* Perl_magic_setdbline() should clear and set read-only OP slabs.Nicholas Clark2012-09-041-5/+1
| | | | | | | | | | | | | The debugger implements breakpoints by setting/clearing OPf_SPECIAL on OP_DBSTATE ops. This means that it is writing to the optree at runtime, and it falls foul of the enforced read-only OP slabs when debugging with -DPERL_DEBUG_READONLY_OPS Avoid this by removing static from Slab_to_rw(), and using it and Slab_to_ro() in Perl_magic_setdbline() to temporarily make the slab re-write whilst changing the breakpoint flag. With this all tests pass with -DPERL_DEBUG_READONLY_OPS (on this system)
* Stop substr($utf8) from calling get-magic twiceFather Chrysostomos2012-08-301-0/+1
| | | | | By calling get-magic twice, it could cause its string buffer to be reallocated, resulting in incorrect and random return values.
* Refactor \X regex handling to avoid a typical case table lookupKarl Williamson2012-08-281-1/+1
| | | | | | | | | Prior to this commit 98.4% of Unicode code points that went through \X had to be looked up to see if they begin a grapheme cluster; then looked up again to find that they didn't require special handling. This commit refactors things so only one look-up is required for those 98.4%. It changes the table generated by mktables to accomplish this, and hence the name of it, and references to it are changed to correspond.
* Prepare for Unicode 6.2Karl Williamson2012-08-261-1/+2
| | | | | | | | | | | This changes code to be able to handle Unicode 6.2, while continuing to handle all prevrious releases. The major change was a new definition of \X, which adds a property to its calculation. Unfortunately \X is hard-coded into regexec.c, and so has to revised whenever there is a change of this magnitude in Unicode, which fortunately isn't all that often. I refactored the code in mktables to make it easier next time there is a change like this one.
* Banish boolkeysFather Chrysostomos2012-08-251-1/+0
| | | | | | | | | | | | | | Since 6ea72b3a1, rv2hv and padhv have had the ability to return boo- leans in scalar context, instead of bucket stats, if flagged the right way. sub { %hash || ... } is optimised to take advantage of this. If the || is in unknown context at compile time, the %hash is flagged as being maybe a true boolean. When flagged that way, it returns a bool- ean if block_gimme() returns G_VOID. If rv2hv and padhv can already do this, then we don’t need the boolkeys op any more. We can just flag the rv2hv to return a boolean. In all the cases where boolkeys was used, we know at compile time that it is true boolean context, so we add a new flag for that.
* utf8.c: collapse a function parameterKarl Williamson2012-08-251-1/+1
| | | | | | | Now that we have a flags parameter, we can get put this parameter as just another flag, giving a cleaner interface to this internal-only function. This also renames the flag parameter to <flag_p> to indicate it needs to be dereferenced.
* embed.fnc: Turn null wrapper function into macroKarl Williamson2012-08-251-1/+1
| | | | | This function only does something on EBCDIC platforms. On ASCII ones make it a macro, like similar ones to avoid useless function nesting
* utf8.c: Revise internal API of swash_init()Karl Williamson2012-08-251-1/+0
| | | | | | | | | | | This revises the API for the version of swash_init() that is usable by core Perl. The external interface is unaffected. There is now a flags parameter to allow for future growth. And the core internal-only function that returns if a swash has a user-defined property in it or not has been removed. This information is now returned via the new flags parameter upon initialization, and is unavailable afterwards. This is to prepare for the flexibility to change the swash that is needed in future commits.
* Add caching to inversion list searchesKarl Williamson2012-08-251-0/+3
| | | | | | | Benchmarking showed some speed-up when the result of the previous search in an inversion list is cached, thus potentially avoiding a search in the next call. This adds a field to each inversion list which caches its previous search result.
* Comment out unused functionKarl Williamson2012-08-251-1/+0
| | | | | | In looking at \X handling, I noticed that this function which is intended for use in it, actually isn't used. This function may someday be useful, so I'm leaving the source in.
* regcomp.c: Move functions to inline_invlist.cKarl Williamson2012-08-251-4/+4
| | | | | | This populates inline_invlist.c with some static inline functions and macro defines. These are the ones that are anticipated to be needed in the near term outside regcomp.c
* regcomp.c: Rename 2 functions to indicate private natureKarl Williamson2012-08-251-2/+2
| | | | | | These two functions will be moved into a header in a future commit, where they will be accessible outside regcomp.c Prefix their names with an underscore to emphasize that they are private
* Stop padlists from being AVsFather Chrysostomos2012-08-211-0/+2
| | | | | | | | | | | | | | | | | | | | | In order to fix a bug, I need to add new fields to padlists. But I cannot easily do that as long as they are AVs. So I have created a new padlist struct. This not only allows me to extend the padlist struct with new members as necessary, but also saves memory, as we now have a three-pointer struct where before we had a whole SV head (3-4 pointers) + XPVAV (5 pointers). This will unfortunately break half of CPAN, but the pad API docs clearly say this: NOTE: this function is experimental and may change or be removed without notice. This would have broken B::Debug, but a patch sent upstream has already been integrated into blead with commit 9d2d23d981.
* regcomp.c: Set flags when optimizing a [char class]Karl Williamson2012-08-111-2/+2
| | | | | | | | | | | | | | | | | | | | | | A bracketed character class containing a single Latin1-range character has long been optimized into an EXACT node. Also, flags are set to include SIMPLE. However, EXACT nodes containing code points that are different when encoded under UTF-8 versus not UTF-8 should not be marked simple. To fix this, the address of the flags parameter is now passed to regclass(), the function that parses bracketed character classes, which now sets it appropriately. The unconditional setting of SIMPLE that was always done in the code after calling regclass() has been removed. In addition, the setting of the flags for EXACT nodes has been pushed into the common function that populates them. regclass() will also now increment the naughtiness count if optimized to a node that normally does that. I do not understand this heuristic behavior very well, and could not come up with a test case for it; experimentation revealed that there are no test cases in our test suite for which naughtiness makes any difference at all.
* Don’t let format arguments ‘leak out’ of formlineFather Chrysostomos2012-08-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When parsing formats, the lexer invents tokens to feed to the parser. So when the lexer dissects this: format = @<<<< @>>>> $foo, $bar, $baz . The parser actually sees this (the parser knows that = . is like { }): format = ; formline "@<<<< @>>>>\n", $foo, $bar, $baz; . The lexer makes no effort to make sure that the argument line is con- tained within formline’s arguments. To make { do_stuff; $foo, bar } work, the lexer supplies a ‘do’ before the block, if it is inside a format. This means that $a, $b; $c, $d feeds ($a, $b) to formline, wheras { $a, $b; $c, $d } feeds ($c, $d) to formline. It also has various other strange effects: This script prints "# 0" as I would expect: print "# "; format = @ (0 and die) . write This one, locking parentheses, dies because ‘and’ has low precedence: print "# "; format = @ 0 and die . write This does not work: my $day = "Wed"; format = @<<<<<<<<<< ({qw[ Sun 0 Mon 1 Tue 2 Wed 3 Thu 4 Fri 5 Sat 6 ]}->{$day}) . write You have to do this: my $day = "Wed"; format = @<<<<<<<<<< ({my %d = qw[ Sun 0 Mon 1 Tue 2 Wed 3 Thu 4 Fri 5 Sat 6 ]; \%d}->{$day}) . write which is very strange and shouldn’t even be valid syntax. This does not work, because ‘no’ is not allowed in an expression: use strict; $::foo = "bar" format = @<<<<<<<<<<< no strict; $foo . write; Putting a block around it makes it work. Putting a semicolon before ‘no’ stop it from being a syntax error, but it silently does the wrong thing. I thought I could fix all these by putting an implicit do { ... } around the argument line and removing the special-casing for an open- ing brace, allowing anonymous hashrefs to work in formats, such that this: format = @<<<< @>>>> $foo, $bar, $baz . would turn into this: format = ; formline "@<<<< @>>>>\n", do { $foo, $bar, $baz; }; . But that will lead to madness like this ‘working’: format = @ }+do{ . It would also stop lexicals declared in one format line from being visible in another. So instead this commit starts being honest with the parser. We still have some ‘invented’ tokens, to indicate the start and end of a format line, but now it is the parser itself that understands a sequence of format lines, instead of being fed generated code. So the example above is now presented to the parser like this: format = ; FORMRBRACK "@<<<< @>>>>\n" FORMLBRACK $foo, $bar, $baz ; FORMRBRACK ; . Note about the semicolons: The parser expects to see a semicolon at the end of each statement. So the lexer has to supply one before FORMRBRACK. The final dot goes through the same code that handles closing braces, which generates a semicolon for the same reason. It’s easier to make the parser expect a semicolon before the final dot than to change the } code in the lexer. We use the } code for . because it handles the internal variables that keep track of how many nested lev- els there, what kind, etc. The extra ;FORMRBRACK after the = is there also to keep the lexer sim- ple (ahem). When a newline is encountered during ‘normal’ (as opposed to format picture) parsing inside a format, that’s when the semicolon and FORMRBRACK are emitted. (There was already a semicolon there before this commit. I have just added FORMRBRACK in the same spot.)
* regcomp.c: Revise API for static functionKarl Williamson2012-08-021-1/+1
| | | | | | | | | | This is to allow future changes. The function now returns success or failure, and the created regnode (if any) is set via a parameter pointer. I removed the 'register' declaration to get this to work, because such declarations are considered bad form these days, e.g., http://stackoverflow.com/questions/314994/whats-a-good-example-of-register-variable-usage-in-c
* regcomp.c: Make invlist_search() usable from re_comp.cKarl Williamson2012-08-021-1/+1
| | | | | | | | | This was a static function which I couldn't get to be callable from the debugging version of regcomp.c. This makes it public, but known only in the regcomp.c source file. It changes the name to begin with an underscore so that if someone cheats by adding preprocessor #defines, they still have to call it with the name that convention indicates is a private function.
* regcomp.c: Rename static fcn to better reflect its purposeKarl Williamson2012-08-021-1/+1
| | | | This function handles \N of any ilk, not just named sequences.
* Oust sv_gmagical_2iv_pleaseFather Chrysostomos2012-07-281-1/+0
| | | | | | | The magic flags patch prevents this from ever being called, since the OK flags work the same way for magic variables now as they have for muggle vars, avoid these fiddly games. (It was when writing it that I realised the value of the magic flags proposal.)
* Flatten vstrings modified in placeFather Chrysostomos2012-07-271-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A substitution forces its target to a string upon successful substitu- tion, even if the substitution did nothing: $ ./perl -Ilib -le '$a = *f; $a =~ s/f/f/; print ref \$a' SCALAR Notice that $a is no longer a glob after s///. But vstrings are different: $ ./perl -Ilib -le '$a = v102; $a =~ s/f/f/; print ref \$a' VSTRING I fixed this in 5.16 (1e6bda93) for those cases where the vstring ends up with a value that doesn’t correspond to the actual string: $ ./perl -Ilib -le '$a = v102; $a =~ s/f/o/; print ref \$a' SCALAR It works through vstring set-magic, that does the check and removes the magic if it doesn’t match. I did it that way because I couldn’t think of any other way to fix bug #29070, and I didn’t realise at the time that I hadn’t fixed all the bugs. By making SvTHINKFIRST true on a vstring, we force it through sv_force_normal before any in-place string operations. We can also make sv_force_normal handle vstrings as well. This fixes all the lin- gering-vstring-magic bugs in just two lines, making the vstring set- magic (which is also slow) redundant. It also allows the special case in sv_setsv_flags to be removed. Or at least that was what I had hoped. It turns out that pp_subst, twists and turns in tortuous ways, and needs special treatment for things like this. And do_trans functions wasn’t checking SvTHINKFIRST when arguably it should have. I tweaked sv_2pv{utf8,byte} to avoid copying magic variables that do not need copying.
* Merge ck_trunc and ck_chdirFather Chrysostomos2012-07-251-1/+0
| | | | | | ck_chdir, added in 2006 (d4ac975e) duplicates ck_trunc, added in 1993 (79072805), except for a null op check which is harmless when applied to chdir.
* regcomp.c: Add _invlist_contains_cpKarl Williamson2012-07-241-0/+1
| | | | | This simply searches an inversion list without going through a swash. It will be used in a future commit.
* utf8.c: Add a get_() method to hide internal detailsKarl Williamson2012-07-241-0/+1
| | | | | | This should have been written this way to begin with (I'm the culprit). But we should have a method so another routine doesn't have to know the internal details.
* regcomp.c: Extract some code into an inline functionKarl Williamson2012-07-241-0/+1
| | | | This code will be used in future commits in multiple places
* regcomp.c: Extract code to inline functionKarl Williamson2012-07-241-0/+1
| | | | | | Future commits will use this paradigm in additional places, so extract it to a function, so they all do things right. This isn't a great API, but it works for the few places this will be called.
* embed.fnc: Remove duplicate entryKarl Williamson2012-07-241-1/+0
|
* Unify code that initializes constants yes, no, and undefChip Salzenberg2012-07-231-0/+1
|
* regcomp.c: Refactor code into a functionKarl Williamson2012-07-191-0/+1
| | | | | | Future commits will use this functionality in additional places beyond the single one currently. It makes sense to abstract it into a function.
* utf8.c: Create API so internals can be hiddenKarl Williamson2012-07-191-0/+1
| | | | | | | This creates a function to hide some of the internal details of swashes from the regex engine, which is the only authorized user, enforced through #ifdefs in embed.fnc. These work closely together, but it's best to have a clean interface.
* Magic flags harmonization.Chip Salzenberg2012-07-151-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | In restore_magic(), which is called after any magic processing, all of the public OK flags have been shifted into the private OK flags. Thus the lack of an appropriate public OK flags was used to trigger both get magic and required conversions. This scheme did not cover ROK, however, so all properly written code had to make sure mg_get was called the right number of times anyway. Meanwhile the private OK flags gained a second purpose of marking converted but non-authoritative values (e.g. the IV conversion of an NV), and the inadequate flag shift mechanic broke this in some cases. This patch removes the shift mechanic for magic flags, thus exposing (and fixing) some improper usage of magic SVs in which mg_get() was not called correctly. It also has the side effect of making magic get functions specifically set their SVs to undef if that is desired, as the new behavior of empty get functions is to leave the value unchanged. This is a feature, as now get magic that does not modify its value, e.g. tainting, does not have to be special cased. The changes to cpan/ here are only temporary, for development only, to keep blead working until upstream applies them (or something like them). Thanks to Rik and Father C for review input.
* Eliminate PL_OP_SLAB_ALLOCFather Chrysostomos2012-07-121-15/+9
| | | | | | | | | | | | This commit eliminates the old slab allocator. It had bugs in it, in that ops would not be cleaned up properly after syntax errors. So why not fix it? Well, the new slab allocator *is* the old one fixed. Now that this is gone, we don’t have to worry as much about ops leak- ing when errors occur, because it won’t happen any more. Recent commits eliminated the only reason to hang on to it: PERL_DEBUG_READONLY_OPS required it.
* PERL_DEBUG_READONLY_OPS with the new allocatorFather Chrysostomos2012-07-121-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | I want to eliminate the old slab allocator (PL_OP_SLAB_ALLOC), but this useful debugging tool needs to be rewritten for the new one first. This is slightly better than under PL_OP_SLAB_ALLOC, in that CVs cre- ated after the main CV starts running will get read-only ops, too. It is when a CV finishes compiling and relinquishes ownership of the slab that the slab is made read-only, because at that point it should not be used again for allocation. BEGIN blocks are exempt, as they are processed before the Slab_to_ro call in newATTRSUB. The Slab_to_ro call must come at the very end, after LEAVE_SCOPE, because otherwise the ops freed via the stack (the SAVEFREEOP calls near the top of newATTRSUB) will make the slab writa- ble again. At that point, the BEGIN block has already been run and its slab freed. Maybe slabs belonging to BEGIN blocks can be made read-only later. Under PERL_DEBUG_READONLY_OPS, op slabs have two extra fields to record the size and readonliness of each slab. (Only the first slab in a CV’s slab chain uses the readonly flag, since it is conceptually simpler to treat them all as one unit.) Without recording this infor- mation manually, things become unbearably slow, the tests taking hours and hours instead of minutes.
* regcomp.c: Remove obsolete codeKarl Williamson2012-06-291-2/+0
| | | | | | A previous commit has removed all calls to these two functions (moving a large portion of the bit_fold() one to another place, and no longer sets the variable.
* handy.h: Fix isBLANK_uni and isBLANK_utf8Karl Williamson2012-06-291-0/+2
| | | | | | | | | | These macros have never worked outside the Latin1 range, so this extends them to work. There are no tests I could find for things in handy.h, except that many of them are called all over the place during the normal course of events. This commit adds a new file for such testing, containing for now only with a few tests for the isBLANK's
* Make formats close over the right closureFather Chrysostomos2012-06-291-0/+1
| | | | | | | | | | | | | | | | | This was brought up in ticket #113812. Formats that are nested inside closures only work if invoked from directly inside that closure. Calling the format from an inner sub call won’t work. Commit af41786fe57 stopped it from crashing, making it work as well as 5.8, in that closed-over variables would be undefined, being unavailable. This commit adds a variation of the find_runcv function that can check whether CvROOT matches an argument passed in. So we look not for the current sub, but for the topmost sub on the call stack that is a clone of the closure prototype that the format’s CvOUTSIDE field points to.
* CV-based slab allocation for opsFather Chrysostomos2012-06-291-6/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This addresses bugs #111462 and #112312 and part of #107000. When a longjmp occurs during lexing, parsing or compilation, any ops in C autos that are not referenced anywhere are leaked. This commit introduces op slabs that are attached to the currently- compiling CV. New ops are allocated on the slab. When an error occurs and the CV is freed, any ops remaining are freed. This is based on Nick Ing-Simmons’ old experimental op slab implemen- tation, but it had to be rewritten to work this way. The old slab allocator has a pointer before each op that points to a reference count stored at the beginning of the slab. Freed ops are never reused. When the last op on a slab is freed, the slab itself is freed. When a slab fills up, a new one is created. To allow iteration through the slab to free everything, I had to have two pointers; one points to the next item (op slot); the other points to the slab, for accessing the reference count. Ops come in different sizes, so adding sizeof(OP) to a pointer won’t work. The old slab allocator puts the ops at the end of the slab first, the idea being that the leaves are allocated first, so the order will be cache-friendly as a result. I have preserved that order for a dif- ferent reason: We don’t need to store the size of the slab (slabs vary in size; see below) if we can simply follow pointers to find the last op. I tried eliminating reference counts altogether, by having all ops implicitly attached to PL_compcv when allocated and freed when the CV is freed. That also allowed op_free to skip FreeOp altogether, free- ing ops faster. But that doesn’t work in those cases where ops need to survive beyond their CVs; e.g., re-evals. The CV also has to have a reference count on the slab. Sometimes the first op created is immediately freed. If the reference count of the slab reaches 0, then it will be freed with the CV still point- ing to it. CVs use the new CVf_SLABBED flag to indicate that the CV has a refer- ence count on the slab. When this flag is set, the slab is accessible via CvSTART when CvROOT is not set, or by subtracting two pointers (2*sizeof(I32 *)) from CvROOT when it is set. I decided to sneak the slab into CvSTART during compilation, because enlarging the xpvcv struct by another pointer would make all CVs larger, even though this patch only benefits few (programs using string eval). When the CVf_SLABBED flag is set, the CV takes responsibility for freeing the slab. If CvROOT is not set when the CV is freed or undeffed, it is assumed that a compilation error has occurred, so the op slab is traversed and all the ops are freed. Under normal circumstances, the CV forgets about its slab (decrement- ing the reference count) when the root is attached. So the slab ref- erence counting that happens when ops are freed takes care of free- ing the slab. In some cases, the CV is told to forget about the slab (cv_forget_slab) precisely so that the ops can survive after the CV is done away with. Forgetting the slab when the root is attached is not strictly neces- sary, but avoids potential problems with CvROOT being written over. There is code all over the place, both in core and on CPAN, that does things with CvROOT, so forgetting the slab makes things more robust and avoids potential problems. Since the CV takes ownership of its slab when flagged, that flag is never copied when a CV is cloned, as one CV could free a slab that another CV still points to, since forced freeing of ops ignores the reference count (but asserts that it looks right). To avoid slab fragmentation, freed ops are marked as freed and attached to the slab’s freed chain (an idea stolen from DBM::Deep). Those freed ops are reused when possible. I did consider not reusing freed ops, but realised that would result in significantly higher mem- ory using for programs with large ‘if (DEBUG) {...}’ blocks. SAVEFREEOP was slightly problematic. Sometimes it can cause an op to be freed after its CV. If the CV has forcibly freed the ops on its slab and the slab itself, then we will be fiddling with a freed slab. Making SAVEFREEOP a no-op won’t help, as sometimes an op can be savefreed when there is no compilation error, so the op would never be freed. It holds a reference count on the slab, so the whole slab would leak. So SAVEFREEOP now sets a special flag on the op (->op_savefree). The forced freeing of ops after a compilation error won’t free any ops thus marked. Since many pieces of code create tiny subroutines consisting of only a few ops, and since a huge slab would be quite a bit of baggage for those to carry around, the first slab is always very small. To avoid allocating too many slabs for a single CV, each subsequent slab is twice the size of the previous. Smartmatch expects to be able to allocate an op at run time, run it, and then throw it away. For that to work the op is simply mallocked when PL_compcv has’t been set up. So all slab-allocated ops are marked as such (->op_slabbed), to distinguish them from mallocked ops. All of this is kept under lock and key via #ifdef PERL_CORE, as it should be completely transparent. If it isn’t transparent, I would consider that a bug. I have left the old slab allocator (PL_OP_SLAB_ALLOC) in place, as it is used by PERL_DEBUG_READONLY_OPS, which I am not about to rewrite. :-) Concerning the change from A to X for slab allocation functions: Many times in the past, A has been used for functions that were not intended to be public but were used for public macros. Since PL_OP_SLAB_ALLOC is rarely used, it didn’t make sense for Perl_Slab_* to be API functions, since they were rarely actually available. To avoid propagating this mistake further, they are now X.
* Reset the iterator when an array is clearedVincent Pit2012-06-221-0/+1
| | | | This fixes RT #75596.
* Refactor \x processing to single functionKarl Williamson2012-06-201-0/+1
| | | | | | | | | | There are three places that process \x. These can and did get out of sync. This moves all three to use a common static inline function so that they all do the same thing on the same inputs, and their behaviors will not drift apart again. This commit should not change current behavior. A previous commit was designed to bring all three to identical behavior.
* Don’t create pads for sub stubsFather Chrysostomos2012-06-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Two code paths, sv_2cv (for \&name) and get_cvn_flags (for &{"name"}()) were using start_subparse and newATTRSUB to create a subroutine stub, which is what usually happens for Perl subs (with op trees). This resulted in subs with unused pads attached to them, because start_subparse sets up the pad, which must be accessible dur- ing parsing. One code path, gv_init, which (among other things) reifies a GV after a sub declaration (like ‘sub foo;’, which for efficiency doesn’t create a CV), created the subroutine stub itself, without using start_subparse/newATTRSUB. This commit takes the code from gv_init, makes it more generic so it can apply to the other two cases, puts it in a new function called newSTUB, and makes all three locations call it. Now stub creation should be faster and use less memory. Additionally, this commit causes sv_2cv and get_cvn_flags to bypass bug #107370 (glob stringification not round-tripping properly). They used to stringify the GV and pass the string to newATTRSUB (wrapped in an op, of all things) for it to look up the GV again. While bug been fixed, as it was a side effect of sv_2cv triggering bug #107370.
* eliminate PL_reglast(close)?paren, PL_regoffsDavid Mitchell2012-06-131-1/+1
| | | | | | | | | | | | | | eliminate the three vars PL_reglastcloseparen PL_reglastparen PL_regoffs (which are actually aliases to PL_reg_state struct elements). These three vars always point to the corresponding fields within the currently executing regex; so just access those fields directly instead. This makes switching between regexes with (??{}) simpler: just update rex, and everything automatically references the new fields.
* eliminate sv_compile_2op, sv_compile_2op_is_brokenDavid Mitchell2012-06-131-2/+1
| | | | | | | | | These two functions, which have been a pimple on the face of perl for far too long, are no longer needed, now that regex code blocks are compiled in a sensible manner. This also allows S_doeval() to be simplified, now that it is no longer called from sv_compile_2op_is_broken().