path: root/embed.fnc
Commit message | Author | Age | Files | Lines
* embed.fnc: Fix flags problem for regposixcc (Karl Williamson, 2012-11-19; 1 file, -1/+1)
  The static and inline flags are considered mutually exclusive. This is not a fatal embed error, as something can't be inline unless it is also static, but the warning is there because the entry looks suspicious. Commit 2fd63cc5b615213574e0153ed2bf14d9df23c073 introduced the flags that caused the warning.
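  For readers unfamiliar with embed.fnc, each entry is a flags field, a return type, a name, and the arguments, separated by pipes. A sketch of the kind of entry the warning complains about (the function name here is made up; the flag letters are as used in embed.fnc's own legend, where 'i' already implies static linkage, making an explicit 's' redundant):

```
: 's' = static, 'i' = static inline; combining them is what embed
: warns about, since 'i' already implies static:
si	|bool	|regposixcc_demo|NN const char *s
```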
* embed.fnc: Make a function global (Karl Williamson, 2012-11-19; 1 file, -1/+1)
  This function is supposed to only be called internally, but it is called by a macro that has global scope, so it also has to be global.
* regcomp.c: Revise debugging function (Karl Williamson, 2012-11-19; 1 file, -0/+1)
  I use this function for debugging, but it is normally commented out. This commit adds an entry to embed.fnc for it that can quickly be uncommented, and makes some revisions to the function itself.
* utf8.c: Request function to be inline (Karl Williamson, 2012-11-19; 1 file, -1/+1)
  This could remove a layer of function-call overhead for this small function (if the compiler doesn't already choose to inline it).
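  The idea can be sketched in plain C (a hypothetical helper, not the utf8.c function in question): for a function this small, the call overhead can rival the work done, so `static inline` lets the compiler substitute the body at each call site.

```c
#include <stdint.h>

/* A tiny predicate like this is a natural candidate for inlining:
 * "static inline" (what embed.fnc's inline flag requests) invites the
 * compiler to paste the body into each caller instead of emitting a
 * call. */
static inline int
is_ascii_digit(uint8_t c)
{
    return c >= '0' && c <= '9';
}
```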
* embed.fnc: Restrict access to non-public functions (Karl Williamson, 2012-11-19; 1 file, -2/+2)
  These two functions added earlier in 5.17 are not meant to be public. This moves them so they are defined only for certain core files, and marks them as experimental should they show up in any documentation.
* Request that regcomp.c:S_regpposixcc be inlined (Father Chrysostomos, 2012-11-18; 1 file, -1/+1)
  It is only called from one spot.
* Stop /[.zog.]/ and /[[.zog.]]/ from leaking (Father Chrysostomos, 2012-11-18; 1 file, -1/+2)
  Before croaking, we need to free any SVs we might have allocated temporarily. Also, Simple_vFAIL does not free the regular expression. For that we need vFAIL.
* Inline regcomp.c:S_checkposixcc into its only caller (Father Chrysostomos, 2012-11-18; 1 file, -1/+0)
  In the next commit, it will need to access other variables around its call site.
* Hash Function Change - Murmur hash and true per process hash seed (Yves Orton, 2012-11-17; 1 file, -1/+1)
  This patch does the following:
  *) Introduces multiple new hash functions to choose from at build time: Murmur-32, SDBM, DJB2, SipHash, SuperFast, and One-at-a-time. Currently this is handled by munging hv.h. Configure support hopefully to follow.
  *) Changes the default hash to Murmur hash, which is faster than the old default, One-at-a-time.
  *) Rips out the old HvREHASH mechanism and replaces it with a per-process random hash seed.
  *) Changes the old PL_hash_seed from an interpreter value to a global variable. This means it does not have to be copied during interpreter setup or cloning.
  *) Changes the format of the PERL_HASH_SEED variable to a hex string so that hash seeds longer than fit in an integer are possible.
  *) Changes the return of Hash::Util::hash_seed() from a number to a string. This is to accommodate hash functions which have more bits than can fit in an integer.
  *) Adds new functions to Hash::Util to improve introspection of hashes:
     -) hash_value() - returns an integer hash value for a given string
     -) bucket_info() - returns basic hash bucket utilization info
     -) bucket_stats() - returns more hash bucket utilization info
     -) bucket_array() - which keys are in which buckets in a hash
  More details on the new hash functions can be found below:
     Murmur Hash (v3), from Google: http://code.google.com/p/smhasher/wiki/MurmurHash3
     SuperFast Hash, from Paul Hsieh: http://www.azillionmonkeys.com/qed/hash.html
     DJB2, a hash function from Daniel Bernstein: http://www.cse.yorku.ca/~oz/hash.html
     SDBM, the hash function from sdbm: http://www.cse.yorku.ca/~oz/hash.html
     SipHash, by Jean-Philippe Aumasson and Daniel J. Bernstein: https://www.131002.net/siphash/
  They have all been converted into Perl's ugly macro format. I have not done any rigorous testing to make sure this conversion is correct. They seem to function as expected, however. All of them use the random hash seed.
  You can force the use of a given function by defining one of PERL_HASH_FUNC_MURMUR, PERL_HASH_FUNC_SUPERFAST, PERL_HASH_FUNC_DJB2, PERL_HASH_FUNC_SDBM, or PERL_HASH_FUNC_ONE_AT_A_TIME.
  Setting the environment variable PERL_HASH_SEED_DEBUG to 1 will make perl output the current seed (in hex) and the hash function it has been built with. Setting the environment variable PERL_HASH_SEED to a hex value will cause that value to be used as the seed. Any missing bits of the seed will be set to 0. The bits are filled in from left to right, not the traditional right to left, so setting it to FE results in a seed value of "FE000000", not "000000FE".
  Note that we do the hash seed initialization in perl_construct(). Doing it via perl_alloc() (via init_tls) causes problems under threaded builds, as the buffers used for the reentrant srand48 functions are not allocated. See also the p5p mail "Hash improvements blocker: portable random code that doesnt depend on a functional interpreter", Message-ID: <CANgJU+X+wNayjsNOpKRqYHnEy_+B9UH_2irRA5O3ZmcYGAAZFQ@mail.gmail.com>
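  To make the discussion concrete, here is one of the simpler functions on that list, DJB2, sketched in C with a seed parameter bolted on to illustrate the per-process-seed idea (the seeding-by-addition shown here is an illustration, not necessarily how the core mixes the seed in):

```c
#include <stddef.h>
#include <stdint.h>

/* The classic DJB2 string hash (Daniel Bernstein): start from 5381
 * and fold each byte in with hash * 33 + c.  A per-process seed
 * perturbs the starting state so bucket orders differ between runs. */
static uint32_t
djb2_hash(uint32_t seed, const char *str, size_t len)
{
    uint32_t hash = 5381 + seed;
    size_t   i;
    for (i = 0; i < len; i++)
        hash = ((hash << 5) + hash) + (uint8_t)str[i]; /* hash * 33 + c */
    return hash;
}
```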
* Stop eval "END OF TERMS" from leaking (Father Chrysostomos, 2012-11-14; 1 file, -0/+4)
  I found this memory leak by evaluating lines of the Copying file as Perl code. :-)
  The parser requires yylex to return exactly one token with each call. Sometimes yylex needs to record a few tokens ahead of time, so it puts them in its forced token stack. The next call to yylex then pops the pending token off that stack.
  Ops belong to their subroutines. If the subroutine is freed before its root is attached, all the ops created when PL_compcv pointed to that sub are freed as well. To avoid crashes, the ops on the savestack and the forced token stack are specially marked so they are not freed when the sub is freed.
  When it comes to evaluating "END OF TERMS AND CONDITIONS", the END token causes a subroutine to be created and placed in PL_compcv. The OF token is treated by the lexer as a method call on the TERMS package. The TERMS token is placed in the forced token stack as an sv in an op for a WORD token, and a METHOD token for OF is returned. As soon as the parser sees the OF, it generates an error, which results in LEAVE_SCOPE being called, which frees the subroutine for END while TERMS is still on the forced token stack. So the subroutine's op cleanup skips that op. Then the parser calls back into the lexer, which returns the TERMS token from the forced token stack. Since there has been an error, the parser discards that token, so the op is never freed. The forced token stack cleanup that happens in parser_free does not catch this, as the token is no longer on that stack.
  Earlier, to solve the problem of yylex returning freed ops to the parser, resulting in crashes, I set the op_savefree flag on ops on the forced token stack. But that resulted in a leak. So now I am using a different approach: when the sub is freed and frees all its ops, have it also look in the parser's forced token stack, freeing any ops that belong to it, and setting the pointers to null.
* clean up the users of PL_no_mem (Daniel Dragan, 2012-11-12; 1 file, -1/+1)
  This commit eliminates a couple of strlen()s of a literal. "Out of memory!\n" and PL_no_mem did not string-pool on Visual C, so PL_no_mem was given a length. This commit removes S_write_no_mem and replaces it with a nonstatic function. Perl_croak_no_mem was made nocontext to save instructions in its callers. NORETURN_FUNCTION_END caused a syntax error in Visual C's C++ mode and therefore was removed.
* rmv context from Perl_croak_no_modify and Perl_croak_xs_usage (Daniel Dragan, 2012-11-12; 1 file, -2/+2)
  Remove the context/pTHX from Perl_croak_no_modify and Perl_croak_xs_usage. croak_no_modify now has no parameters (and has always been no-return), and on some compilers will now be optimized to a conditional jump. For Perl_croak_xs_usage one push asm opcode is removed at the caller. For both funcs, their footprint in their callers (which probably are hot code) is smaller, which means a tiny bit more room in the cache. My text section went from 0xC1A2F to 0xC198F after applying this. Also see http://www.nntp.perl.org/group/perl.perl5.porters/2012/11/msg195233.html .
* embed.fnc: Allow toke.c to call core_swash_init() (Karl Williamson, 2012-11-11; 1 file, -1/+3)
  This internal function is restricted to just a few core files by #ifdef's in embed.fnc. Expand the list to include toke.c, as it will be needed in a future commit.
* toke.c: Extract part of \N{} processing to new function (Karl Williamson, 2012-11-11; 1 file, -0/+2)
  This is in preparation for making fatal the deprecations that this code covers. This code combines the first and final portions of the code that handles \N{names}, leaving the middle intact. There are no intentional logic changes. The code is moved and outdented as appropriate for not being within nested "if's", and the comments are reflowed to fill 79 columns. One declaration had a const added.
  This causes the logic that checks for input name validity to be moved from after everything is computed to doing it beforehand. Since invalid names are not currently fatal, there was no problem with checking them after computing things, but a future commit will make them fatal, so this saves the work of computing something that is erroneous.
* "func not implemented" croaks optimizations in /win32/* (Daniel Dragan, 2012-11-08; 1 file, -1/+3)
  This commit removes a number of "* not implemented" strings from the image. A win32_croak_not_implemented wrapper is created to reduce machine code by not putting the format string on the C stack many times. embed.fnc was used to declare win32_croak_not_implemented for proper cross-compiler support of noreturn (noreturn on GCC and VC OK). Tail-calling and noreturn optimizations of the C compiler are heavily used in this commit.
* Remove __attribute__malloc__ from MYSWAP functions (Steve Hay, 2012-11-01; 1 file, -3/+3)
  These functions are only used when the native sockets functions are not available, e.g. when building miniperl on Windows following commit 19253ae62c, so gcc's warning about ignoring the __malloc__ attribute here is not normally seen. The addition of "a" to these functions in embed.fnc by commit f54cb97a39 was presumably wrong, since none of them actually allocate any memory (nor did so at the time), so change it to just "R" (which is implied by the "a" and is still appropriate).
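  For context, the attribute in question is gcc's `malloc` attribute, which embed.fnc's "a" flag expands to. It promises the returned pointer aliases no other live object, so it is only correct on functions that really return fresh memory; hence its removal from functions that allocate nothing. A minimal sketch (hypothetical wrapper, not the MYSWAP code):

```c
#include <stdlib.h>

/* gcc's malloc attribute tells the optimizer the returned pointer is
 * "fresh" -- it aliases nothing the caller already holds.  Applying it
 * to a function that returns existing or swapped-in-place data would
 * be wrong, which is why it was dropped from the MYSWAP functions. */
#ifdef __GNUC__
__attribute__((malloc))
#endif
static void *
my_alloc(size_t n)
{
    return malloc(n);
}
```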
* optimize memory wrap croaks, often used in MEM_WRAP_CHECK (Daniel Dragan, 2012-10-25; 1 file, -0/+1)
  Most perls are built with PERL_MALLOC_WRAP. This causes the MEM_WRAP_CHECK macro to perform some checks on the requested allocation size in macro Newx. The checks are performed at the caller, not in the callee of Newx (for me on Win32 perl the callee in Newx is Perl_safesysmalloc). If the check fails, a Perl_croak_nocontext("%s",PL_memory_wrap) is done.
  In x86 machine code, 'if(bad_alloc) Perl_croak_nocontext("%s",PL_memory_wrap);' will be written as "cond jmp ahead ~15 bytes", "push const pointer", "push const pointer", "call const pointer". For each Newx where the allocation amount was not a constant (constant folding would remove the croak memory wrap branch completely), the branch takes 15-19 bytes, depending on the x86 compiler. There are about 80 Newx'es in the interp (win32 dynamic-linking perl) that do the memory wrap check and have a Perl_croak_nocontext("%s",PL_memory_wrap) in them after all optimizations by the compiler.
  This patch reduces the memory wrap branch from 15-19 to 5 bytes on x86. Since croak_memory_wrap is a static and a noreturn, a compiler with IPO may optimize the whole branch to "cond jmp 32 bits relative" at each call site. A less optimal compiler may do "cond jmp 8 bits relative" (jump past the "call S_croak_memory_wrap" instruction), then "call S_croak_memory_wrap". Both ways are better than the current situation.
  The reason why croak_memory_wrap is a static and not an export is that the compiler has more opportunity to optimize/reduce the impact of the memory wrap branch at the call site if the target is in the same image, rather than in a different image, which would require using the platform-specific dynamic linking mechanism/export table/etc., which often requires a new stack frame per ABI of the platform. If a dynamically linked XS module does not use S_croak_memory_wrap, it will be removed from the image by the C compiler. If it is included in the XS image, it is a very small block of code and a 3-byte string literal. A CPU cache line is typically 32 or 64 bytes and a memory read is typically 16. Cutting the instructions by 10 to 16 bytes out of "hot code" (10 of the ~80 call sites are pp_*) is a worthy goal.
  In a few places the memory wrap croak is used explicitly, not from a MEM_WRAP_CHECK; this patch converts those to use the static. If PERL_MALLOC_WRAP is undef, there are still a couple of uses of the croak, so do not keep S_croak_memory_wrap in an #ifdef PERL_MALLOC_WRAP. Also see http://www.nntp.perl.org/group/perl.perl5.porters/2012/10/msg194383.html and [perl #115456].
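  The check itself is the standard "would n * size overflow?" guard. A minimal sketch of that kind of test (a hypothetical helper illustrating the idea behind MEM_WRAP_CHECK, not the core macro):

```c
#include <stddef.h>
#include <stdint.h>

/* Before allocating n elements of elem_size bytes each, verify the
 * product cannot wrap past SIZE_MAX.  Dividing instead of multiplying
 * avoids the overflow we are trying to detect. */
static int
alloc_would_wrap(size_t n, size_t elem_size)
{
    return elem_size != 0 && n > SIZE_MAX / elem_size;
}
```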
* regexec.c: Remove dead code (Karl Williamson, 2012-10-24; 1 file, -2/+2)
  An ANYOF node now no longer matches more than one character, since 9d53c4576e551530162e7cd79ab72ed81b1e1a0f. This code was overlooked in the clean-up commit e0193e472b025d41438e251be622aad42c9af9cc. Since the maximum match is 1 character, there is no point in passing a ptr that was set to indicate how far the match went, so that parameter is removed.
* regex: Remove old code that tried to handle multi-char folds (Karl Williamson, 2012-10-14; 1 file, -2/+1)
  A recent commit has changed the algorithm used to handle multi-character folding in bracketed character classes. The old code is no longer needed.
* Allow _swash_inversion_hash() to be called in regexec.c (Karl Williamson, 2012-10-09; 1 file, -1/+1)
  To prevent this very-internal core function from being used by XS writers, it isn't defined except when the preprocessor indicates it is compiling certain .c files. Add regexec.c to the list.
* regexec.c: PATCH: [perl #114808] (Karl Williamson, 2012-10-06; 1 file, -1/+1)
  Commit c72077c4fff72b66cdde1621c62fb4fd383ce093 fixed a place where to_byte_substr() fails, but the code continued as if it had succeeded. There is yet another place where the return is not checked. This commit adds a check there.
  However, it turns out that there is another underlying problem in [perl #114808]. The function to_byte_substr() tries to downgrade the substr fields in the regex program it is passed. If it fails (because something in it is expressible only in UTF-8), it permanently changes that field to point to PL_sv_undef, thus losing the original information. This is fine as long as the program will be used once and discarded. However, there are places where the program is re-used, as in the test case introduced by this commit, and the original value has been lost.
  To solve this, this commit also changes to_byte_substr() from returning void to instead returning bool, indicating success or failure. On failure, the original substrs are left intact. The calls to this function are correspondingly changed. One of them had a trace statement when the failure happens; I reworded it to be more general and accurate (it was slightly misleading), and added the trace to every such place, not just the one.
  In addition, I found the use of the same ternary operation in 3 or 4 consecutive lines very hard to understand, and it is inefficient unless compiled under C optimization which avoids recalculating things. So I expanded the several nearly identical places in the code that do that so that I could quickly see what is going on.
* [perl #79824] Don't cow for sv_mortalcopy call from XS (Father Chrysostomos, 2012-10-05; 1 file, -1/+2)
  XS code doing sv_mortalcopy(sv) will expect to get a true copy, and not a COW 'copy'. So make sv_mortalcopy a wrapper around the new sv_mortalcopy_flags, passing it SV_DO_COW_SVSETSV, which is defined as 0 for XS code.
* Remove length magic on scalars (Father Chrysostomos, 2012-10-01; 1 file, -1/+0)
  It is not possible to know how to interpret the returned length without accessing the UTF8 flag, which is not reliable until the SV has been stringified, which requires get-magic. So length magic has not made sense since utf8 support was added. I have removed all uses of length magic from the core, so this is now dead code.
* Deprecate mg_length; make it return bytes (Father Chrysostomos, 2012-10-01; 1 file, -1/+1)
  mg_length returns the number of bytes if a scalar has length magic, but the number of characters otherwise.
  sv_len_utf8 used to assume that mg_length would return bytes. The first mistake was added in commit b76347f2eb, which assumed that mg_length would return characters. But it was #ifdeffed out until commit ffc61ed20e. Later, commit 5636d518683 met sv_len_utf8's assumptions by making mg_length return the length in characters, without accounting for sv_len, which also used mg_length. So we ended up with a buggy sv_len that would return a character count for scalars with get- but not length-magic, and a byte count otherwise.
  In the previous commit, I fixed sv_len not to use mg_length at all. I plan shortly to remove any use of mg_length (the one remaining use is in sv_len_utf8, which is currently not called on magical values). The reason for removing all calls to mg_length is that the returned length cannot be converted to characters without access to the PV as well, which requires get-magic. So length magic on scalars makes no sense since the advent of utf8.
  This commit restores mg_length to its old behaviour and lists it as deprecated. This is mostly cosmetic, as there are no CPAN users. But it is in the API, and I don't know whether we can easily remove it.
* Restore special blocks to working order (Father Chrysostomos, 2012-09-26; 1 file, -1/+2)
  I accidentally broke these in commit 85ffec3682, yet everything passed for me under threads+mad.
  PL_compcv is usually restored to its previous value at the end of newATTRSUB when LEAVE_SCOPE is called. But BEGIN blocks are called before that. I needed PL_compcv to be restored to its previous value before it was called, so I added LEAVE_SCOPE before process_special_blocks. But that caused the name to be freed before S_process_special_blocks got a chance to look at it.
  So I have now added a new parameter to S_process_special_blocks to allow *it* to call LEAVE_SCOPE after it determines that it is a BEGIN block, but before it calls it.
* [perl #97958] Make reset "" match its docs (Father Chrysostomos, 2012-09-24; 1 file, -0/+2)
  According to the documentation, reset() with no argument resets patterns. But reset "" and reset "\0foo" were also resetting patterns. While I was at it, I fixed embedded nulls, too, though it's not likely anyone is using this. I could not fix the bug within the existing API for sv_reset, so I created a new function and left the old one with the old behaviour. Call me pear-annoyed.
* op.c: Disentangle apply_attrs_my from apply_attrs (Father Chrysostomos, 2012-09-19; 1 file, -1/+1)
  apply_attrs consisted of a top-level if/else conditional upon a boolean argument. It was being called with a TRUE argument in only one place, apply_attrs_my. Inlining that branch into apply_attrs_my actually reduces the amount of code slightly.
* embed.fnc: Remove d flag from cv_clone_into (Father Chrysostomos, 2012-09-19; 1 file, -1/+1)
  I copied it from another entry by mistake. I don't think this should be documented, at least not yet, as I am not confident the interface is stable yet. There may be significant changes before 5.18.
* [perl #114942] Correct scoping for 'for my $x(){} $x' (Father Chrysostomos, 2012-09-19; 1 file, -1/+2)
  This was broken by commit 60ac52eb5d5. What that commit did was to merge two different queues that the lexer had for pending tokens.
  Bison requires that yylex return exactly one token for each call, so the lexer sometimes has to set aside tokens in a queue and return them from the next few calls to yylex. Formerly, there were two mechanisms: the forced token queue (used by force_next), and PL_pending_ident. PL_pending_ident was used for names that had to be looked up in the pads.
  $foo was handled like this:
    First call to yylex:
      1. Put '$foo' in PL_tokenbuf.
      2. Set PL_pending_ident.
      3. Return a '$' token.
    Second call:
      PL_pending_ident is set, so call S_pending_ident, which looks up the name from PL_tokenbuf, and return the THING token containing the appropriate op.
  The forced token queue took precedence over PL_pending_ident. Changing the order (necessary for parsing 'our sub foo($)') caused some XS::APItest tests to fail. So I concluded that the two queues needed to be merged. As a result, the $foo handling changed to this:
    First call to yylex:
      1. Put '$foo' in PL_tokenbuf.
      2. Call force_ident_maybe_lex (S_pending_ident renamed and modified), which looks up the symbol and adds it to the forced token queue.
      3. Return a '$' token.
    Second call:
      Return the token from the forced token queue.
  That had the unforeseen consequence of changing this:
    for my $x (...) { ... }
    $x;
  such that the $x was still visible after the for loop. It only happened when the $ was the next token after the closing }:
    $ ./miniperl -e 'for my $x(()){} $x = 3; warn $x'
    Warning: something's wrong at -e line 1.
    $ ./miniperl -e 'for my $x(()){} ;$x = 3; warn $x'
    3 at -e line 1.
  This broke Class::Declare.
  The name lookup in the pad must not happen before the '$' token is emitted. At that point, the parser has not yet created the for loop (which includes exiting its scope), as it does not yet know whether there is a continue block. (See the 'FOR MY...' branch of the barestmt rule in perly.y.) So we must delay the name lookup till the second call.
  So we rename force_ident_maybe_lex back to S_pending_ident, removing the force_next stuff. And we add a new force_ident_maybe_lex function that adds a special 'pending ident' token to the forced token queue. The part of yylex that handles pending tokens (case LEX_KNOWNEXT) is modified to account for these special 'pending ident' tokens and call S_pending_ident.
* embed.fnc: Clarify o comments (Father Chrysostomos, 2012-09-18; 1 file, -1/+1)
  It applies also to S_ for static functions.
* [perl #114924] Make method calls work with ::SUPER packages (Father Chrysostomos, 2012-09-17; 1 file, -1/+0)
  Perl caches SUPER methods inside packages named Foo::SUPER. But this interferes with actual method calls on those packages (SUPER->foo, foo::SUPER->foo).
  The first time a package is looked up, it is vivified under the name with which it is looked up. So *SUPER:: will cause that package to be called SUPER, and *main::SUPER:: will cause it to be named main::SUPER. main->SUPER::isa used to be very sensitive to the name of the main::FOO package (where the cache is kept). If it happened to be called SUPER, that call would fail. Fixing that bug (commit 3c104e59d83f) caused the CPAN module named SUPER to fail, because SUPER->foo was now being treated as a SUPER::method call. gv_fetchmeth_pvn was using the ::SUPER suffix to determine where to look for the method. The package passed to it (the ::SUPER package) was being used to look for cached methods, but the package with ::SUPER stripped off was being used for the rest of lookup. 3c104e59d83f made main->SUPER::foo work by treating SUPER as main::SUPER in that case. Mentioning *main::SUPER:: or doing a main->SUPER::foo call before loading SUPER.pm also caused it to fail, even before 3c104e59d83f.
  Instead of using publicly visible packages for internal caches, we should be keeping them internal, to avoid such side effects. This commit adds a new member to the HvAUX struct, where a hash of GVs is stored, to cache super methods. I cannot simply use a hash of CVs, because I need GvCVGEN. Using a hash of GVs allows the existing method cache code to be used.
  This new hash of GVs is not actually a stash, as it has no HvAUX struct (i.e., no name, no mro_meta). It doesn't even need an @ISA entry as before (which was only used to make isa caches reset), as it shares its owner stash's mro_meta generation numbers. In fact, the GVs inside it have their GvSTASH pointers pointing to the owner stash.
  In terms of memory use, it is probably the same as before. Every stash and every iterated or weakly referenced hash is now one pointer larger than before, but every SUPER cache is smaller (no HvAUX, no *ISA + @ISA + $ISA[0] + magic). The code is a lot simpler now and uses fewer stash lookups, so it should be faster.
  This will break any XS code that expects gv_fetchmeth_pvn to treat the ::SUPER suffix as magical. This behaviour was only barely documented (the suffix was mentioned, but what it did was not), and is unused on CPAN.
* Clone my subs on scope entry (Father Chrysostomos, 2012-09-15; 1 file, -1/+2)
  The pad slot for a my sub now holds a stub with a prototype CV attached to it by proto magic. The prototype is cloned on scope entry. The stub in the pad is used when cloning, so any code that references the sub before scope entry will be able to see that stub become defined, making these behave similarly:
    our $x;
    BEGIN { $x = \&foo }
    sub foo { }

    our $x;
    my sub foo { }
    BEGIN { $x = \&foo }
  Constants are currently not cloned, but that may cause bugs in pad_push. I'll have to look into that.
  On scope exit, lexical CVs go through leave_scope's SAVEt_CLEARSV section, like lexical variables. If the sub is referenced elsewhere, it is abandoned, and its proto magic is stolen and attached to a new stub stored in the pad. If the sub is not referenced elsewhere, it is undefined via cv_undef.
  To clone my subs on scope entry, we create a sequence of introcv and clonecv ops. See the huge comment in block_end that explains why we need two separate ops for each CV.
  To allow my subs to be defined in inner subs (my sub foo; sub { sub foo {} }), pad_add_name_pvn and S_pad_findlex now upgrade the entry for a my sub to a CV to begin with, so that fake entries added to pads (fake entries are those that reference outer pads) can share the same CV. Otherwise newMYSUB would have to add the CV to every pad that closes over the 'my sub' declaration. newMYSUB no longer throws away the initial value, replacing it with a new one.
  Prototypes are not currently visible to sub calls at compile time, because the lexer sees the empty stub. A future commit will solve that.
  When I added name heks to CVs I made mistakes in a few places, by not turning on the CVf_NAMED flag, or by not clearing the field when freeing the hek. Those code paths were not exercised enough by state subs, so the problems did not show up till now. So this commit fixes those, too.
  One of the tests in lexsub.t, involving foreach loops, was incorrect, and has been fixed. Another test has been added to the end for a particular case of state subs closing over my subs that I broke when initially trying to get sibling my subs to close over each other, before I had separate introcv and clonecv ops.
* Store state subs in the pad (Father Chrysostomos, 2012-09-15; 1 file, -5/+1)
  In making 'sub foo' respect previous 'our sub' declarations in a recent commit, I actually made 'state sub foo' into a syntax error. (At the time, I patched up MYSUB in perly.y to keep the tests for '"my sub" not yet implemented' still working.) Basically, it was creating an empty pad entry, but returning something that perly.y was not expecting.
  This commit adjusts the grammar to allow the SUB branch of barestmt to accept a PRIVATEREF for its subname, in addition to a WORD. It reuses the subname rule that SUB used to use (before our subs were added), gutting it to remove the special block handling, which SUB now takes care of. That means the MYSUB rule will no longer turn on CvSPECIAL on the PL_compcv that is going to be thrown away anyway. The code for special blocks (BEGIN, END, etc.) that turns on CvSPECIAL now checks for state subs and skips those. It only applies to our subs and package subs.
  newMYSUB has now actually been written. It basically duplicates newATTRSUB, except for GV-specific things. It does currently vivify a GV and set CvGV, but I am hoping to change that later. I also hope to merge some of the code later, too. I changed the prototype of newMYSUB to make it easier to use. It is not used anywhere on CPAN and has always simply died, so that should be all right.
* Fix our sub with proto (Father Chrysostomos, 2012-09-15; 1 file, -0/+1)
  yylex must emit exactly one token each time it is called. Sometimes yylex needs to parse several tokens at once. That's what the various force functions are for. But that is also what PL_pending_ident is for.
  The various force_next, force_word, force_ident, etc. functions keep a stack of tokens (PL_nextval/PL_nexttype) that yylex will check immediately when called. PL_pending_ident is used to track a single identifier that yylex will hand off to S_pending_ident to handle. S_pending_ident is the only piece of code for resolving an identifier that could be lexical but could also be a package variable. force_ident assumes it is looking for a package variable.
  force_* takes precedence over PL_pending_ident. All this means that, if an identifier needs to be looked up in the pad on the next yylex invocation, it has to use PL_pending_ident, and the force_* functions cannot be used at the same time.
  Not realising that, when I made 'our sub foo' store the sub in the pad I also made 'our sub foo ($)' into a syntax error, because it was being parsed as 'our sub ($) foo' (the prototype being 'forced'); i.e., the pending tokens were being pulled out of the 'queue' in the wrong order. (I put queue in quotes, because one queue and one unrelated buffer together don't exactly count as 'a queue'.)
  Changing PL_pending_ident to have precedence over the force stack breaks ext/XS-APItest/t/swaptwostmts.t, because the statement-parsing interface does not localise PL_pending_ident. It could be changed to do that, but I don't think it is the right solution. Having two separate pending token mechanisms makes things needlessly fragile.
  This commit eliminates the PL_pending_ident mechanism and modifies S_pending_ident (renaming it in the process to S_force_ident_maybe_lex) to work with the force mechanism. I was going to merge it with force_ident, but the two make incompatible assumptions that just complicate the code if merged: S_pending_ident needs the sigil in the same string buffer, to pass to the pad interface; force_ident needs to be able to work without a sigil present. So now we only have one queue for pending tokens and the order is more predictable.
* eliminate PL_reginput (David Mitchell, 2012-09-14; 1 file, -3/+3)
  PL_reginput (which is actually #defined to PL_reg_state.re_state_reginput) is, to all intents and purposes, state that is only used within S_regmatch(). The only other places it is referenced are in S_regtry() and S_regrepeat(), where it is used to pass the current match position back and forth between the subs. Do this passing instead via function args, and bingo! PL_reginput is now just a local var of S_regmatch().
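  The shape of that refactor can be sketched in a few lines of C (a toy matcher, not the regex engine): instead of two functions communicating the current match position through a shared global, the position is passed in as an argument and the new position handed back as the return value.

```c
/* Before: something like "static const char *cur_input;" mutated by
 * both caller and callee.  After: the position travels through the
 * call interface, so the state lives in one function's locals. */
static const char *
match_digits(const char *pos, const char *end)
{
    while (pos < end && *pos >= '0' && *pos <= '9')
        pos++;
    return pos;  /* returned position replaces the global update */
}
```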
* Use macro not swash for utf8 quotemetaKarl Williamson2012-09-131-1/+0
| | | | | | | | | | | | | | The rules for matching whether an above-Latin1 code point are now saved in a macro generated from a trie by regen/regcharclass.pl, and these are now used by pp.c to test these cases. This allows removal of a wrapper subroutine, and also there is no need for dynamic loading at run-time into a swash. This macro is about as big as I'm comfortable compiling in, but it saves the building of a hash that can grow over time, and removes a subroutine and interpreter variables. Indeed, performance benchmarks show that it is about the same speed as a hash, but it does not require having to load the rules in from disk the first time it is used.
* Move 2 functions from utf8.c to regexec.cKarl Williamson2012-09-131-2/+2
| | | | | | | One of these functions is currently commented out. The other is called only in regexec.c in one place, and was recently revised to no longer require the static function in utf8.c that it formerly called. They can be made static inline.
* regexec.c: Use new macros instead of swashesKarl Williamson2012-09-131-7/+0
| | | | | | | | | | A previous commit has caused macros to be generated that will match Unicode code points of interest to the \X algorithm. This patch uses them. This speeds up modern Korean processing by 15%. Together with recent previous commits, the throughput of modern Korean under \X has more than doubled, and is now comparable to other languages (which have themselves increased by 35%).
* Perl_magic_setdbline() should clear and set read-only OP slabs.Nicholas Clark2012-09-041-3/+1
| | | | | | | | | | | | | The debugger implements breakpoints by setting/clearing OPf_SPECIAL on OP_DBSTATE ops. This means that it is writing to the optree at runtime, and it falls foul of the enforced read-only OP slabs when debugging with -DPERL_DEBUG_READONLY_OPS. Avoid this by removing static from Slab_to_rw(), and using it and Slab_to_ro() in Perl_magic_setdbline() to temporarily make the slab read-write whilst changing the breakpoint flag. With this, all tests pass with -DPERL_DEBUG_READONLY_OPS (on this system).
* In op.c, change S_Slab_to_rw() from an OP * parameter to an OPSLAB *.Nicholas Clark2012-09-041-1/+1
| | | | This makes it consistent with Perl_Slab_to_ro(), which takes an OPSLAB *.
* Stop substr($utf8) from calling get-magic twiceFather Chrysostomos2012-08-301-0/+1
| | | | | By calling get-magic twice, it could cause its string buffer to be reallocated, resulting in incorrect and random return values.
* Refactor \X regex handling to avoid a typical case table lookupKarl Williamson2012-08-281-1/+1
| | | | | | | | | Prior to this commit 98.4% of Unicode code points that went through \X had to be looked up to see if they begin a grapheme cluster; then looked up again to find that they didn't require special handling. This commit refactors things so only one look-up is required for those 98.4%. It changes the table generated by mktables to accomplish this, and hence the name of it, and references to it are changed to correspond.
* Prepare for Unicode 6.2Karl Williamson2012-08-261-1/+2
| | | | | | | | | | | This changes code to be able to handle Unicode 6.2, while continuing to handle all previous releases. The major change was a new definition of \X, which adds a property to its calculation. Unfortunately \X is hard-coded into regexec.c, and so has to be revised whenever there is a change of this magnitude in Unicode, which fortunately isn't all that often. I refactored the code in mktables to make it easier next time there is a change like this one.
* Banish boolkeysFather Chrysostomos2012-08-251-1/+0
| | | | | | | | | | | | | | Since 6ea72b3a1, rv2hv and padhv have had the ability to return booleans in scalar context, instead of bucket stats, if flagged the right way. sub { %hash || ... } is optimised to take advantage of this. If the || is in unknown context at compile time, the %hash is flagged as being maybe a true boolean. When flagged that way, it returns a boolean if block_gimme() returns G_VOID. If rv2hv and padhv can already do this, then we don’t need the boolkeys op any more. We can just flag the rv2hv to return a boolean. In all the cases where boolkeys was used, we know at compile time that it is true boolean context, so we add a new flag for that.
* utf8.c: collapse a function parameterKarl Williamson2012-08-251-1/+1
| | | | | | | Now that we have a flags parameter, we can pass this parameter as just another flag, giving a cleaner interface to this internal-only function. This also renames the flag parameter to <flag_p> to indicate it needs to be dereferenced.
* embed.fnc: Turn null wrapper function into macroKarl Williamson2012-08-251-1/+2
| | | | | This function only does something on EBCDIC platforms. On ASCII platforms, make it a macro, like similar ones, to avoid useless function nesting.
* utf8.c: Revise internal API of swash_init()Karl Williamson2012-08-251-4/+3
| | | | | | | | | | | This revises the API for the version of swash_init() that is usable by core Perl. The external interface is unaffected. There is now a flags parameter to allow for future growth. And the core internal-only function that returns whether or not a swash has a user-defined property in it has been removed. This information is now returned via the new flags parameter upon initialization, and is unavailable afterwards. This is to prepare for the flexibility to change the swash that is needed in future commits.
* embed.fnc: Mark internal function as "may change"Karl Williamson2012-08-251-1/+1
| | | | | This function is not designed for a public API, and should have been so listed.
* Add caching to inversion list searchesKarl Williamson2012-08-251-1/+4
| | | | | | | Benchmarking showed some speed-up when the result of the previous search in an inversion list is cached, thus potentially avoiding a search in the next call. This adds a field to each inversion list which caches its previous search result.
* Comment out unused functionKarl Williamson2012-08-251-1/+1
| | | | | | In looking at \X handling, I noticed that this function, which is intended for use in it, actually isn't used. This function may someday be useful, so I'm leaving the source in.