path: root/embed.fnc
Commit message | Author | Age | Files | Lines
* regcomp.c: Use minimal struct formal parameterKarl Williamson2014-03-041-1/+1
| | | | | | | The static function get_ANYOF_cp_list_for_ssc() takes a struct formal parameter that is a superset of what it actually uses. The calls to it have to cast to that superset. By setting the parameter to the smallest structure it uses, we simplify things.
* Optimization: Remove needless list/pushmark pairs from the OP executionSteffen Mueller2014-02-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is an optimization for OP trees that involve list OPs in list context. In list context, the list OP's first child, a pushmark, will do what its name claims and push a mark to the mark stack, indicating the start of a list of parameters to another OP. Then the list's other child OPs will do their stack pushing. Finally, the list OP will be executed and do nothing but undo what the pushmark has done. This is because the main effect of the list OP only really kicks in if it's not in array context (actually, it should probably only kick in if it's in scalar context, but I don't know of any valid examples of list OPs in void contexts). This optimization is quite a measurable speed-up for array or hash slicing and some other situations. Another (contrived) example is that (1,2,(3,4)) now actually is the same, performance-wise as (1,2,3,4), albeit that's rarely relevant. The price to pay for this is a slightly convoluted (by standards other than the perl core) bit of optimization logic that has to do minor look-ahead on certain OPs in the peephole optimizer. A number of tests failed after the first attack on this problem. The failures were in two categories: a) Tests that are sensitive to details of the OP tree structure and did verbatim text comparisons of B::Concise output (ouch). These are just patched according to the new red in this commit. b) Test that validly failed because certain conditions in op.c were expecting OP_LISTs where there are now OP_NULLs (with op_targ=OP_LIST). For these, the respective conditions in op.c were adjusted. The change includes modifying B::Deparse to handle the new OP tree structure in the face of nulled OP_LISTs.
* Improve how regprop dumps REF-like nodes during executionYves Orton2014-02-241-1/+1
| | | | | We pass in the regmatch_info struct, which allows us to dump what a given REF is going to match.
* regcomp.c: Don't read uninitialized dataKarl Williamson2014-02-191-1/+1
| | | | | | | | I keep forgetting that the OP of a regnode is not defined in Pass 1 of the regex compiler. This is likely the cause of inconsistent results in lib/locale.t, as valgrind shows there to be a read of uninitialized data before this patch, and the result is randomly tainting when there shouldn't be, consistent with the test failures.
* regcomp.c: Remove no longer used functionKarl Williamson2014-02-191-1/+0
| | | | I don't think this function will need to be used again.
* regcomp.c: Fix more alignment problemsKarl Williamson2014-02-191-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | I believe this will fix the remaining alignment problems recently being shown on gcc on HP-UX. It works on the procura machine. regnodes should not have stricter alignment than required by U32, for reasons given in the comments this commit adds to the beginning of regcomp.h. Commit 31f05a37 added a new ANYOF regnode struct with a pointer field. This requires stricter alignment on some 64-bit platforms, and hence doesn't work on those platforms. This commit removes that regnode struct type, and instead stores the pointer it used via a more indirect, but already existing mechanism that stores other data. The function that returns that other data is enlarged to return this new field as well. It now needs to be called from regcomp.c, so the previous commit had renamed and made it accessible from there. The "public" function that wraps this one is unchanged. (I put "public" in quotes here, because I don't think anyone outside core is or should be using it, but since it has been publicly available for a long time, I'm treating the API as unchangeable. regcomp.c called this public function before this commit, but needs the additional data returned by the inner one).
* regexec.c: Rename function, add parameter, make non-staticKarl Williamson2014-02-191-3/+7
| | | | | | This is in preparation for a future commit where the function does more things so its current name would be misleading. It will need to be callable from regcomp.c as well.
* Convert more EXACTFish nodes to EXACT when possibleKarl Williamson2014-02-191-1/+1
| | | | | | | | | | Under /i matching, many characters match only themselves, such as punctuation. If a node contains only such characters it can be an EXACT node. The optimizer gets better hints when dealing with EXACT nodes than ones with folding. This changes the alloc_maybe_populate() function to look for possibilities of non-folding input.
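The idea of spotting non-folding input can be sketched in a few lines of C. This is only an illustration restricted to ASCII: the helper name `needs_no_folding` is hypothetical and is not the logic of alloc_maybe_populate(), which also has to deal with Unicode and locale rules.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if no byte in the ASCII string has a case
 * variant, so a case-insensitive match could use a plain exact compare.
 * Non-ASCII bytes are conservatively treated as "may fold". */
static bool needs_no_folding(const char *s, size_t len)
{
    assert(s != NULL);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c > 127 || isalpha(c))
            return false;   /* letters fold; non-ASCII is out of scope here */
    }
    return true;
}
```

A string such as "!?;" passes the check, while "a!" does not, because 'a' also has to match 'A' under /i.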
* regcomp.c: Fix some alignment problemsKarl Williamson2014-02-171-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The bracketed character class (e.g. /[abc]/) in regular expression patterns is implemented as an ANYOF regnode. There are several different structs used for these, each a superset of the next smaller size, with extra fields tacked on to its end. Bits in the part common to all of them are set to indicate which size this particular instance is. Several functions in regcomp.c take the largest of these as a formal parameter, even though a smaller one may actually be passed. This avoids the need to have casts to access the optional fields, but the code needs to be careful to check the common part bits before trying to access a portion that may not actually be present. This practice dates to at least Perl v5.6.2. It turns out that there is a further problem with this if the tacked-on fields require a stricter alignment than the common fields. The code in the functions may assume that the actual parameter has suitable alignment, which may not be the case. Some months ago I added some extra optional pointer fields, which have stricter alignment requirements on 64-bit machines than the common portion, but no apparent problems ensued. Then, I changed things slightly, so that the gcc compiler on HP machines found an optimization possibility whose use required the proper alignment, which wasn't present, and bus errors started happening there. Tony Cook diagnosed the problem. A summary of his work can be found at http://markmail.org/message/hee5zyah7rb62c72 This commit changes the formal parameter to the smallest ANYOF struct, and uses casts to access the optional portions. I don't know how common the coding style formerly used in regcomp.c is, but it is dangerous and can lead to unrelated changes causing errors. 
This commit should enable gcc builds to complete on the HP gcc smokers (previously miniperl built, but crashed in building the rest of perl), but we're not sure because unrelated header issues on the gcc on the machine that we have access to prevent blead from fully compiling there. There remain alignment bugs which will cause the tests to fail there, as the appended pointer field needs to have strict alignment on that platform, but when the regnodes are allocated alignment isn't done. I am working on fixing those.
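The "smallest struct as formal parameter" pattern described above can be sketched as follows. All names here (`node_base`, `node_ext`, `HAS_EXTRA`, `node_extra`) are hypothetical and are not Perl's regnode definitions. Note one deliberate difference: the commit uses casts to reach the optional portion, whereas this sketch copies the optional field out with memcpy so that it does not rely on the storage being aligned for a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HAS_EXTRA 0x1u   /* hypothetical flag bit in the common part */

/* Smallest variant: the part common to every node. */
struct node_base {
    uint32_t flags;
    uint32_t bitmap;
};

/* Larger variant: same prefix, plus an optional pointer-sized field
 * that may need stricter alignment than the common part. */
struct node_ext {
    struct node_base base;
    void *extra;
};

/* Takes the smallest struct. The optional field is read only when the
 * flag says it is present, and it is copied out bytewise so that no
 * alignment stricter than the base struct's is assumed. */
static void *node_extra(const struct node_base *n)
{
    void *p = NULL;
    assert(n != NULL);
    if (n->flags & HAS_EXTRA)
        memcpy(&p,
               (const unsigned char *)n + offsetof(struct node_ext, extra),
               sizeof p);
    return p;
}
```

Callers pass `&big.base` for the large variant and a plain `struct node_base *` for the small one; in both cases the function signature promises nothing beyond the common part.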
* Emulate POSIX locale setting on WindowsKarl Williamson2014-02-151-0/+5
| | | | | | | | | | | | | Locale initialization and setting on Windows haven't been as described in perllocale for setting locales to "". This is because that tells Windows to use the system default locale, as set through the Control Panel, but on POSIX systems, it means to look at various environment variables. This commit creates a wrapper for setlocale, used only on Windows, that looks for the appropriate environment variables when called with a "" input locale. If none are found, it continues to use the system default locale.
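A minimal sketch of the lookup such a wrapper might perform when asked for the "" locale. The helper name `first_nonempty` is hypothetical, and the candidate order suggested in the note below (LC_ALL, then the category's own variable, then LANG) is the usual POSIX precedence rather than something stated in this commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the first candidate that is set and
 * non-empty, or NULL when none is, meaning "keep the system default". */
static const char *first_nonempty(const char *const *candidates, size_t n)
{
    assert(candidates != NULL);
    for (size_t i = 0; i < n; i++) {
        if (candidates[i] != NULL && candidates[i][0] != '\0')
            return candidates[i];
    }
    return NULL;
}
```

A caller would fill the array with, for example, getenv("LC_ALL"), getenv("LC_CTYPE") and getenv("LANG"), and pass the result to setlocale() only when it is not NULL.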
* re_intuit_start(): bias last* vars; revive reghop4David Mitchell2014-02-071-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the "just matched float substr, now match fixed substr" branch, initially add an extra prog->anchored_offset to the last and last2 vars; since a lot of the later calculations involve adding anchored_offset, doing this early to the last* vars means less work in some cases. In particular, last is calculated from s by a single HOP4(s, prog->anchored_offset-start_shift,...) rather than two separate HOP3(s, -start_shift,...); HOP3(..., prog->anchored_offset,...); which may mostly cancel each other out. Similarly with last2. Later, we can skip adding prog->anchored_offset to last1, since its antecedents already have the bias added. In the case of failure, calculating a new start position involves an extra HOP to s, but removes a HOP from other_last, so the two cancel out. To make this work, I revived the reghop4() function which had been commented out, and added a HOP4c() wrapper macro. This is like HOP3c(), but allows you to specify both lower and upper limits. Useful when you don't know the sign of the offset in advance. (Yves had earlier added this function, but had commented it out until such time as it was actually used.) I also added some extra comments to this block and removed the comment about it being maybe broken under utf8, since I'm auditing the code for utf8-safeness.
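A hop with both a lower and an upper limit can be sketched like this. The function `hop_clamped` is a hypothetical stand-in, not the real reghop4(); it assumes well-formed UTF-8 and that `s` points at the start of a character.

```c
#include <assert.h>
#include <stddef.h>

/* True for a UTF-8 continuation byte (10xxxxxx). */
static int is_continuation(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

/* Move `off` characters from `s` (forwards if positive, backwards if
 * negative), never going below `llim` or beyond `rlim`. */
static const unsigned char *hop_clamped(const unsigned char *s, ptrdiff_t off,
                                        const unsigned char *llim,
                                        const unsigned char *rlim)
{
    assert(llim <= s && s <= rlim);
    while (off > 0 && s < rlim) {
        s++;                                     /* step over the lead byte */
        while (s < rlim && is_continuation(*s))  /* and its continuation bytes */
            s++;
        off--;
    }
    while (off < 0 && s > llim) {
        s--;                                     /* step back at least one byte */
        while (s > llim && is_continuation(*s))  /* back up to the lead byte */
            s--;
        off++;
    }
    return s;
}
```

Because both limits are always supplied, the caller does not need to know the sign of the offset in advance, which is the property the commit message highlights.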
* merge basic zefram/purple_signatures into bleadZefram2014-02-061-0/+1
|\
| * Merge blead into zefram/purple_signaturesZefram2014-02-011-2/+2
| |\
| * | subroutine signaturesZefram2014-02-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Declarative syntax to unwrap argument list into lexical variables. "sub foo ($a,$b) {...}" checks number of arguments and puts the arguments into lexical variables. Signatures are not equivalent to the existing idiom of "sub foo { my($a,$b) = @_; ... }". Signatures are only available by enabling a non-default feature, and generate warnings about being experimental. The syntactic clash with prototypes is managed by disabling the short prototype syntax when signatures are enabled.
* | | Forbid "\c{" and \c{non-ascii}Karl Williamson2014-02-051-1/+1
| |/ |/| | | | | | | These constructs have been deprecated since v5.14 with the intention of making them fatal in 5.18. This wasn't done; and is being done now.
* | update embed.fnc now op_null and op_free have docsDavid Mitchell2014-01-291-2/+2
|/
* regcomp.c: Change a variable and flag bit namesKarl Williamson2014-01-271-1/+1
| | | | | The meaning of these was expanded two commits ago, so update the name to reflect this, to prevent future confusion
* Work properly under UTF-8 LC_CTYPE localesKarl Williamson2014-01-271-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This large (sorry, I couldn't figure out how to meaningfully split it up) commit causes Perl to fully support LC_CTYPE operations (case changing, character classification) in UTF-8 locales. As a side effect it resolves [perl #56820]. The basics are easy, but there were a lot of details, and one troublesome edge case discussed below. What essentially happens is that when the locale is changed to a UTF-8 one, a global variable is set TRUE (FALSE when changed to a non-UTF-8 locale). Within the scope of 'use locale', this variable is checked, and if TRUE, the code that Perl uses for non-locale behavior is used instead of the code for locale behavior. Since Perl's internal representation is UTF-8, we get UTF-8 behavior for a UTF-8 locale. More work had to be done for regular expressions. There are three cases. 1) The character classes \w, [[:punct:]] needed no extra work, as the changes fall out from the base work. 2) Strings that are to be matched case-insensitively. These form EXACTFL regops (nodes). Notice that if such a string contains only characters above-Latin1 that match only themselves, that the node can be downgraded to an EXACT-only node, which presents better optimization possibilities, as we now have a fixed string known at compile time to be required to be in the target string to match. Similarly if all characters in the string match only other above-Latin1 characters case-insensitively, the node can be downgraded to a regular EXACTFU node (match, folding, using Unicode, not locale, rules). The code changes for this could be done without accepting UTF-8 locales fully, but there were edge cases which needed to be handled differently if I stopped there, so I continued on. 
In an EXACTFL node, all such characters are now folded at compile time (just as before this commit), while the other characters whose folds are locale-dependent are left unfolded. This means that they have to be folded at execution time based on the locale in effect at the moment. Again, this isn't a change from before. The difference is that now some of the folds that need to be done at execution time (in regexec) are potentially multi-char. Some of the code in regexec was trivial to extend to account for this because of existing infrastructure, but the part dealing with regex quantifiers had to have more work. Also the code that joins EXACTish nodes together had to be expanded to account for the possibility of multi-character folds within locale handling. This was fairly easy, because it already has infrastructure to handle these under somewhat different circumstances. 3) In bracketed character classes, represented by ANYOF nodes, a new inversion list was created giving the characters that should be matched by this node when the runtime locale is UTF-8. The list is ignored except under that circumstance. To do this, I created a new ANYOF type which has an extra SV for the inversion list. The edge case that caused the most difficulty is folding involving the MICRO SIGN, U+00B5. It folds to the GREEK SMALL LETTER MU, as does the GREEK CAPITAL LETTER MU. The MICRO SIGN is the only 0-255 range character that folds to outside that range. The issue is that it doesn't naturally fall out that it will match the CAP MU. If we let the CAP MU fold to the small mu at compile time (which it can because both are above-Latin1 and so the fold is the same no matter what locale is in effect), it could appear that the regnode can be downgraded away from EXACTFL to EXACTFU, but doing so would cause the MICRO SIGN to not case-insensitively match the CAP MU. This could be special cased in regcomp and regexec, but I wanted to avoid that. 
Instead the mktables tables are set up to include the CAP MU as a character whose presence forbids the downgrading, so the special casing is in mktables, and not in the C code.
* Taint more operands with case changesKarl Williamson2014-01-271-4/+8
| | | | | | | | | | The documentation says that Perl taints certain operations when subject to locale rules, such as lc() and ucfirst(). Prior to this commit there were exceptions when the operand to these functions contained no characters whose case change actually varied depending on the locale, for example the empty string or above-Latin1 code points. Changing to conform to the documentation simplifies the core code, and yields more consistent results.
* regcomp.c: Extract out code into a separate functionKarl Williamson2014-01-221-0/+1
| | | | This is in preparation for it to be called from a 2nd place.
* Comments, white-spaceKarl Williamson2014-01-221-1/+1
| | | | | | | | This adds and modifies various comments in several files, rewrapping some comments to occupy fewer lines but not exceed 79 columns. And fixes some indentation and other white space issues. It includes removing trailing white space in lines in regcomp.c. I didn't think it was worth making a commit for each file.
* regcomp.c: request inlining of single line functionKarl Williamson2014-01-221-1/+1
|
* sv_buf_to_rw can be staticFather Chrysostomos2014-01-171-1/+3
| | | | | sv_buf_to_ro needs to be non-static because op.c uses it, but sv_buf_to_rw is only called from sv.c.
* PERL_DEBUG_READONLY_COWFather Chrysostomos2014-01-161-0/+4
| | | | | | | | | | | | | | | | | | | | | | Make perls compiled with -Accflags=-DPERL_DEBUG_READONLY_COW to turn COW buffer violations into crashes. We do this using mmap to allocate memory and then mprotect to mark memory as read-only when buffers are shared. We have to do this at the safesysmalloc level, because some code does SvPV_set with buffers it allocates on its own via safemalloc(). Unfortunately this means many things are allocated using mmap that will never be marked read-only, slowing things down considerably, but I see no other way. Because munmap and mprotect need to know the length, we use the existing sTHX/perl_memory_debug_header mechanism used already by PERL_TRACK_MEMPOOL and store the size there (as PERL_POISON already does when PERL_TRACK_MEMPOOL is enabled). perl_memory_debug_header is a struct positioned at the beginning of every allocated buffer, for tracking things.
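The mmap/mprotect scheme with a size-carrying header can be sketched as below. This assumes a POSIX system with MAP_ANONYMOUS; the names `debug_header`, `ro_alloc`, `ro_set_readonly` and `ro_free` are hypothetical and are not the functions or the perl_memory_debug_header layout used in the core.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical header at the start of every mapping, so that the
 * protect and free routines can recover the mapped length. */
struct debug_header {
    size_t mapped_len;
};

/* Map header + nbytes and return the address just past the header.
 * The result is aligned for size_t only, which is enough for a sketch. */
static void *ro_alloc(size_t nbytes)
{
    size_t total = sizeof(struct debug_header) + nbytes;
    void *m = mmap(NULL, total, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return NULL;
    ((struct debug_header *)m)->mapped_len = total;
    return (struct debug_header *)m + 1;
}

static struct debug_header *header_of(void *buf)
{
    assert(buf != NULL);
    return (struct debug_header *)buf - 1;
}

/* Mark the whole mapping read-only (or writable again).
 * Returns 0 on success and -1 on failure, as mprotect() does. */
static int ro_set_readonly(void *buf, int readonly)
{
    struct debug_header *h = header_of(buf);
    return mprotect(h, h->mapped_len,
                    readonly ? PROT_READ : (PROT_READ | PROT_WRITE));
}

/* Unmap the buffer, using the length stored in its header. */
static int ro_free(void *buf)
{
    struct debug_header *h = header_of(buf);
    return munmap(h, h->mapped_len);
}
```

While a buffer is marked read-only, any stray write to it faults immediately, which is how a violation turns into a crash rather than silent corruption.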
* IDStart and IDCont no longer go out to diskKarl Williamson2014-01-091-1/+1
| | | | | | | These are the base names for various macros used in parsing identifiers. Prior to this patch, parsing a code point above Latin1 caused loading disk files. This patch causes all the information to be compiled into the Perl binary.
* isWORDCHAR_uni(), isDIGIT_utf8() etc no longer go out to diskKarl Williamson2014-01-091-1/+1
| | | | | | | Previous commits in this series have caused all the POSIX classes to be completely specified at C compile time. This allows us to revise the base function used by all these macros to use these definitions, avoiding reading them in from disk.
* Move initialization of PL_XPosix_ptrs[] to perl.cKarl Williamson2014-01-091-2/+2
| | | | | | | This was performed unconditionally in regcomp.c. However, future commits will use this from other code. Almost all (but not completely all) Perl code uses regular expressions, so only rarely will this small amount of initialization be performed when it currently isn't.
* Hide some undocumented functions from perlapiKarl Williamson2014-01-041-5/+5
| | | | | | | | These functions should not be called from any other places than they are now. They have been marked in the public API as undocumented. I presume they are there because they are called from various parts of the Perl core, so can't be static. But this suppresses them from being listed so people won't be tempted to use them.
* regexec.c: Guard against malformed UTF-8 in [...]Karl Williamson2014-01-011-1/+4
| | | | | | | | | The code that handles bracketed character classes assumed that the string being matched against did not have the too-short malformation; this could lead to reading beyond-the-end-of-buffer. (It did check for other malformations.) This is solved by changing the function that operates on bracketed character classes to take and use an extra parameter, the actual buffer end.
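Why passing the buffer end matters can be shown with a small length check. The function names are hypothetical, and the sketch guards only against the too-short case; it does not validate continuation bytes or overlong forms.

```c
#include <assert.h>
#include <stddef.h>

/* Length implied by a UTF-8 lead byte, or 0 if the byte cannot start
 * a sequence (for example, a continuation byte). */
static size_t utf8_expected_len(unsigned char lead)
{
    if (lead < 0x80)           return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Number of bytes that may safely be read for the character at `s`,
 * or 0 when the sequence would run past `end`. */
static size_t utf8_char_len_checked(const unsigned char *s,
                                    const unsigned char *end)
{
    size_t need;
    assert(s != NULL && end != NULL && s <= end);
    if (s == end)
        return 0;
    need = utf8_expected_len(*s);
    if (need == 0 || (size_t)(end - s) < need)
        return 0;   /* malformed lead byte, or too short for the buffer */
    return need;
}
```

Without the `end` argument, a truncated three-byte sequence at the end of a buffer would invite a read of one or two bytes that do not belong to the string.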
* Remove no-longer used inversion list functionKarl Williamson2013-12-311-1/+0
| | | | | | | | | | | The function _invlist_invert_prop() is hereby removed. The recent changes to allow \p{} to match above-Unicode means that no special handling of properties need be done when inverting. This function was accessible to XS code that cheated by using #defines to pretend it was something it wasn't, but it also has been marked as subject to change since its inception, and never appeared in any documentation.
* Change format of mktables output binary property tablesKarl Williamson2013-12-311-0/+1
| | | | | | | | | mktables now outputs the tables for binary properties as inversion lists, with a size as the first element. This means simpler handling of these tables in the core, including removal of an entire pass over them (it was done just to get the size). These tables are marked as for internal use by the Perl core only, so their format is changeable at will.
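An inversion list with its size stored first can be searched as sketched here. The exact layout is an assumption made for illustration (element 0 is the count, the remaining elements are ascending range boundaries); the real mktables output may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: table[0] is the number of boundaries that follow.
 * Boundaries are ascending; the 1st, 3rd, ... start a range that is in
 * the set, and the 2nd, 4th, ... are the first code point past it. */
static bool invlist_contains(const uint32_t *table, uint32_t cp)
{
    const uint32_t *elems;
    size_t lo = 0, hi;

    assert(table != NULL);
    elems = table + 1;
    hi = table[0];

    /* Binary search for the number of boundaries that are <= cp. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (elems[mid] <= cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* An odd count means cp lies inside a range. */
    return (lo & 1) != 0;
}
```

Storing the count up front lets the search start without first walking the table to find its length, which is the extra pass the commit message says was removed.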
* Purge sfio support, which has been broken for a decade.Nicholas Clark2013-12-271-1/+1
| | | | | | | | | | | The last Perl release that built with -Dusesfio was v5.8.0, and even that failed many regression tests. Every subsequent release fails to build, and in the decade that has passed we have had no bug reports about this. So it's safe to delete all the code. The Configure related code will be purged in a subsequent commit. 2 references to sfio intentionally remain in fakesdio.h and nostdio.h, as these appear to be for using its stdio API-compatibility layer.
* [perl #115736] fix undocumented param from newATTRSUB_flagsDaniel Dragan2013-12-231-3/+3
| | | | | flags param was poorly designed and didn't have a formal api. Replace it with the bool it really is. See #115736 for details.
* don't check format args on taint_properDavid Mitchell2013-12-011-1/+1
| | | | | | | | My recent commit 5d37acd6b65eb enabled (among other things) format-arg checking of taint_proper(). This was not a good idea since taint_proper() adds extra args before it actually calls a printf-style function. This was masked since on some gcc systems, a NULLOK format arg disables this check.
* silence -Wformat-nonliteral compiler warningsDavid Mitchell2013-11-281-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Due to the security risks associated with user-supplied formats being passed to C-level printf() style functions (eg %n), gcc has a -Wformat-nonliteral warning that complains whenever such a function is passed a non-literal format string. This commit silences all such warnings in core and ext/. The main changes are 1) the 'f' (format) flag in embed.fnc is now handled slightly more cleverly. Rather than just applying to functions whose last arg is '...' (and where the format arg is assumed to be the previous arg), it can now handle non-'...' functions: arg checking is disabled, but format checking is still done: it works by assuming that an arg called 'fmt', 'pat' or 'f' is the format string (and dies if it fails to find exactly one such arg). 2) with the new embed.fnc functionality, more functions have been marked with the 'f' flag. When such a function passes its fmt arg onto an inner printf-like function, we simply disable the warning for that call using GCC_DIAG_IGNORE(-Wformat-nonliteral), since we know that the caller must have already checked it. 3) In quite a few places the format string isn't literal, but it *is* constant (e.g. PL_warn_uninit_sv). For those cases, again disable the warning. 4) In pp_formline(), a particular format was one of several different literal strings depending on circumstances. Rather than assigning this string to a temporary variable, incorporate the ?: branches directly in the function call arg. gcc is clever enough to decide the arg is then always literal.
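What marking a function as printf-like amounts to at the C level can be sketched as follows. The macro `PRINTF_LIKE` and the function `format_message` are hypothetical; the attribute itself is the standard GCC/Clang `format` attribute, and this is not the code that embed.fnc generates.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

#if defined(__GNUC__)
#  define PRINTF_LIKE(fmt_idx, first_arg) \
       __attribute__((format(printf, fmt_idx, first_arg)))
#else
#  define PRINTF_LIKE(fmt_idx, first_arg)
#endif

/* The attribute asks the compiler to check each caller's arguments
 * against the format string in parameter 3. */
static int format_message(char *buf, size_t size, const char *fmt, ...)
    PRINTF_LIKE(3, 4);

static int format_message(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(buf != NULL && fmt != NULL);
    va_start(ap, fmt);
    /* fmt is not a literal at this point; the checking happened at the
     * call sites, which is why the inner call can be trusted. */
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;
}
```

With the attribute in place, a call such as format_message(buf, sizeof buf, "%d", "oops") draws a -Wformat warning at the call site instead of misbehaving at run time.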
* mark Perl_my_strftime with format attributeDavid Mitchell2013-11-281-3/+5
| | | | | | | | | | | | | | mark this function with __attribute__format__null_ok__(__strftime__,pTHX_1,0) so that compiler checks and warnings about strftime-style format args can be checked. Rather than adding new flag(s) to embed.fnc, I just enhanced the f flag to treat it as strftime-style rather than printf if the function name matches /strftime/. This was quicker, and we're unlikely to have many such functions.
* remove almost unreachable NULL sv arg code from sv_2*n_flagsDaniel Dragan2013-11-281-7/+7
| | | | | | | | | | | The NULL sv code being removed dates to commit e334a159a5 Perl 1.0 as the pre-SV str_2ptr and str_2num calls. When SVs were introduced in commit 79072805bf Perl 5.0 alpha 2, the NULL sv code was copied to the new SV functions. The functions were bulk marked non-NULL in commit f54cb97a39 during 5.9.3 development. The docs were corrected to say NULLOK support in commit 53e8571218 during 5.11.0. See the perldelta part of this patch for the rest of commit body.
* mg.c: Extract code into a function.Karl Williamson2013-11-261-0/+1
| | | | | This is in preparation for the same code to be used in additional places. There should be no logic changes.
* Avoid pointer churn in study_chunk recursion bitmap allocationYves Orton2013-11-241-1/+1
| | | | | | | | | | | | | | | | Since we can only recurse into a given paren (or the entire pattern) once, we know that the maximum recursion depth is the number of parens in the pattern (plus one for "whole pattern"). This means we can preallocate one large bitmap, and then use different chunks of it for each level. That avoids SAVEFREEPV costs for each bitmap, which are likely short anyway. (One could imagine an optimization where a flag somewhere lets us use the RExC_study_chunk_recursed pointer as a bitmap, so we don't have to allocate at all when we have less than 32 parens.) This removes the "recursed" argument from study_chunk() and replaces it with a "recursive_depth" argument which counts how deep we are in the bitmap "stack".
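The "one large bitmap, one chunk per level" layout can be sketched like this. The struct and function names are hypothetical, and the sketch shows only the allocation and indexing, not how study_chunk() fills or consults the bits.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* One allocation holding (nparens + 1) bitmaps, one per possible
 * recursion depth. Each bitmap has one bit per paren, plus bit 0 for
 * the whole pattern. */
struct recursion_bitmaps {
    unsigned char *bits;
    size_t bytes_per_level;
    size_t levels;
};

/* Returns 0 on success, -1 if the allocation fails. */
static int bitmaps_init(struct recursion_bitmaps *rb, size_t nparens)
{
    assert(rb != NULL);
    rb->levels = nparens + 1;
    rb->bytes_per_level = (nparens + 1 + CHAR_BIT - 1) / CHAR_BIT;
    rb->bits = calloc(rb->levels, rb->bytes_per_level);
    return rb->bits != NULL ? 0 : -1;
}

/* The chunk of the big bitmap that belongs to one recursion depth. */
static unsigned char *level_chunk(struct recursion_bitmaps *rb, size_t depth)
{
    assert(rb != NULL && depth < rb->levels);
    return rb->bits + depth * rb->bytes_per_level;
}

static void mark_paren(struct recursion_bitmaps *rb, size_t depth, size_t paren)
{
    level_chunk(rb, depth)[paren / CHAR_BIT] |=
        (unsigned char)(1u << (paren % CHAR_BIT));
}

static int paren_marked(struct recursion_bitmaps *rb, size_t depth, size_t paren)
{
    return (level_chunk(rb, depth)[paren / CHAR_BIT] >> (paren % CHAR_BIT)) & 1;
}
```

Passing a depth counter down the recursion then replaces passing a freshly allocated bitmap pointer, and a single free() releases everything at the end.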
* [perl #120463] s/// and tr/// with wide delimitersFather Chrysostomos2013-11-141-1/+2
| | | | | | | | | | | | | | | | | | | | | $ perl -Mutf8 -e 's αaαα' Substitution replacement not terminated at -e line 1. What is happening is that the first scan goes past the delimiter at the end of the pattern. Then a single byte is compared (the previous character against the first byte of the opening delimiter) to see whether the parser needs to step back one byte before scanning the second part. That means you can do the equivalent of s/foo/|bar|g if / is replaced with a wide character: $ perl -l -Mutf8 -e '$_ = "a"; s αaα|b|; print' b This commit fixes it by giving toke.c:S_scan_str an extra parameter, so it can tell the callers that need this (scan_subst and scan_trans) where to start scanning the replacement.
* fix multi-eval of Perl_custom_op_xop in XopENTRYDaniel Dragan2013-11-101-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 1830b3d9c8 introduced a flaw where XopENTRY calls Perl_custom_op_xop twice to retrieve the same XOP *. This is inefficient and causes extra machine code. Since I found no CPAN or upstream=blead usage of Perl_custom_op_xop, and its previous docs say it isn't 100% public, it is being converted to a macro. Most usage of Perl_custom_op_xop is to conditionally fetch a member of the XOP struct, which was previously implemented by XopENTRY. Move the XopENTRY logic and picking defaults to an expanded version of Perl_custom_op_xop. The union allows Perl_custom_op_get_field to return its result in 1 register, since the union is similar to a void * or IV, but with the machine code overhead of casting, if any, being done in the callee (Perl_custom_op_get_field), not the caller. Perl_custom_op_get_field can also return the XOP * without looking inside it to implement Perl_custom_op_xop. XopENTRYCUSTOM is a wrapper around Perl_custom_op_get_field with XopENTRY-like usage. XopENTRY is used by the OP_* macros, which are heavily used (but rarely called, since custom ops are rare) by Perl lang warnings system. The vararg warning arguments are usually evaluated no matter if the warning will be printed to STDERR or not. Since some people like to ignore warnings or run no strict; and warnings branches are frequent in pp_*, it is beneficial to make the OP_* macros smaller in machine code. The design of Perl_custom_op_get_field supports these goals. This commit does not pass judgement on Ben Morrow's unclear public or private API designation of Perl_custom_op_xop, and whether Perl_custom_op_xop should be deprecated and removed from public API. It was trivial to leave a form of Perl_custom_op_xop in the new design. XOPe enums are identical to XOPf constants so no conversion has to be done between the field selector parameter and the field flag to test in machine code. 
ASSUME and NOT_REACHED are being introduced. The closest to the 2 previously was "assert(0)". Perl has not used ASSUME or CC specific versions of it before. Clang, GCC >= 4.5, and Visual C are supported. For completeness, ARMCC's __promise was added, but Perl is not known to have any support for ARMCC by this committer. This patch is part of perl #115032.
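Compiler-specific "assume" hints of this kind are typically built along the following lines. The macro names `MY_ASSUME` and `MY_NOT_REACHED` are hypothetical, chosen so as not to be mistaken for Perl's actual definitions, and the choice to fall back to assert() in non-NDEBUG builds is a decision of this sketch only.

```c
#include <assert.h>

#ifndef NDEBUG
#  define MY_ASSUME(x) assert(x)            /* checked while debugging */
#elif defined(_MSC_VER)
#  define MY_ASSUME(x) __assume(x)          /* Visual C hint */
#elif defined(__GNUC__)                     /* also defined by Clang */
#  define MY_ASSUME(x) ((x) ? (void)0 : __builtin_unreachable())
#else
#  define MY_ASSUME(x) ((void)0)            /* no hint available */
#endif

#define MY_NOT_REACHED MY_ASSUME(0)

enum color { RED, GREEN, BLUE };

/* Every enumerator is handled, so control should never leave the
 * switch; the hint tells the optimizer (or the debugger) so. */
static int color_rank(enum color c)
{
    switch (c) {
    case RED:   return 1;
    case GREEN: return 2;
    case BLUE:  return 3;
    }
    MY_NOT_REACHED;
    return 0;   /* keeps compilers without the hint quiet */
}
```

In an optimized build the hint lets the compiler drop the fall-through path entirely, while a debugging build still traps if the assumption is ever violated.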
* Fix qx, `` and <<`` overridesFather Chrysostomos2013-11-061-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This resolves two RT tickets: • #115330 is that qx and `` overrides do not support interpolation. • #119827 is that <<`` does not support readpipe overrides at all. The obvious fix for #115330 fixes #119827 at the same time. When quote-like operators are parsed, after the string has been scanned S_sublex_push is called, which decides which of two paths to follow: 1) For things not requiring interpolation, the string is passed to tokeq (originally called q, it handles double backslashes and backslashed delimiters) and returned to the parser immediately. 2) For anything that interpolates, the lexer enters a special interpolation mode (LEX_INTERPPUSH) and goes through a more complex sequence over the next few calls (e.g., qq"a.$b.c" is turned into ‘stringify ( "a." . $ b . ".c" )’). When commit e3f73d4ed (Oct 2006, perl 5.10) added support for overriding `` and qx with a readpipe sub, it did so by creating an entersub op in toke.c and making S_sublex_push follow path no. 1, taking the result of tokeq and inserting it into the already-constructed op tree for the sub call. That approach caused interpolation to be skipped when qx or `` is overridden. Furthermore it didn’t touch <<`` at all. The easiest solution is to let toke.c follow its normal path and create a backtick op (instead of trying to half-intercept it), and to deal with override lookup afterwards in ck_backtick, the same way require overrides are handled. Since <<`` also turns into a backtick op, it gets handled too that way.
* Put common override code into gv_overrideFather Chrysostomos2013-11-061-0/+2
| | | | | | | | | | | | When I moved the three occurrences of this code in op.c into a static function, I did not realise at the time that it also occurred three times in toke.c. So now it is in a new non-static function in gv.c. Only two of the instances in toke.c could be changed to use this function, as the other is a little different. I couldn’t see a simple way of factoring its requirements in.
* Move the function to set $^X to its own filePeter Martini2013-11-041-0/+2
| | | | | | | This also moves the indirect dependency on stdbool.h to its own file, rather than being pulled in for all of perl.c, for those cases where one may want to test using other definitions of bool.
* Make mro code pass precomputed hash valuesFather Chrysostomos2013-11-041-1/+2
| | | | | | | | where possible. This involved adding hv_fetchhek and hv_storehek macros and changing S_mro_clean_isarev to accept a hash parameter and expect HVhek_UTF8 instead of SVf_UTF8.
* [perl #119797] Fix if/else in lvalue subFather Chrysostomos2013-10-231-1/+2
| | | | | | | | | | | | | | | | | When if/else/unless is the last thing in an lvalue sub, the lvalue context is not always propagated properly and scope exit tries to copy things, including arrays, resulting in ‘Bizarre copy of ARRAY’. This commit fixes the bizarre copy by flagging any leave op that is part of an lvalue sub’s return sequence, using the OPpLEAVE flag added for this purpose in the previous commit. Then pp_leave uses that flag to avoid copying return values, but protects them via the mortals stack just like pp_leavesublv (actually pp_ctl.c:S_return_lvalues). For ‘if’ and ‘unless’ without ‘else’, the lvalue context was not being propagated, resulting in arrays’ getting flattened despite the lvalue context. op_lvalue_flags in op.c needed to handle AND and OR ops, which ‘if’ and ‘unless’ compile to, to make this work.
* make sv_2bool_flags() non-recursive on overloadDaniel Dragan2013-10-211-1/+1
| | | | | | | | | | | | | When Perl_sv_2bool_flags() has an overloaded arg, it calls SvTRUE() on the SV returned from the overload method. This indirectly calls sv_2bool_flags() again. Change it so that sv_2bool_flags() just iterates the new overload value each time. 2 callsites were converted to gotos. A SvTRUE_common was expanded so goto can be used. This function's machine code size on VC2003 32 bits dropped by 0x24 bytes after this patch.
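Turning this kind of self-call into a loop can be sketched as follows. The `value` struct and its `delegate` pointer are hypothetical stand-ins for an SV whose overloaded bool method returns another value; none of this is the sv_2bool_flags() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Either a plain number, or an object whose truthiness is delegated to
 * another value (standing in for the result of an overload method). */
struct value {
    int number;
    const struct value *delegate;   /* non-NULL: ask this value instead */
};

static bool value_is_true(const struct value *v)
{
    assert(v != NULL);
    while (v->delegate != NULL)     /* loop instead of calling ourselves */
        v = v->delegate;
    return v->number != 0;
}
```

Each level of delegation now costs one loop iteration rather than a nested function call, so the C stack does not grow with the length of the chain.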
* Adding a prototype attribute.Peter Martini2013-10-161-1/+2
| | | | | | | | | | | | | | | This attribute adds an additional way of declaring a prototype for a sub, making sub foo($$) and sub foo : prototype($$) equivalent. The intent is to keep the functionality of prototypes while allowing other modules to use the syntactic space it currently occupies for other purposes. The attribute is supported in attributes.xs to allow attributes::->import to work, but if it's defined inline via something like sub foo : prototype($$) {}, it will not call out to the attributes module. For: RT #119251
* regcomp.c: Remove unused parameter in static functionKarl Williamson2013-09-241-2/+1
| | | | | This parameter is no longer used, since a few commits ago in this series.
* Teach regex optimizer to handle above-Latin1Karl Williamson2013-09-241-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Until this commit, the regular expression optimizer has essentially punted on above-Latin1 code points. Under some circumstances, they would be taken into account, more or less, but often, the generated synthetic start class would end up matching all above-Latin1 code points. With the advent of inversion lists, it becomes feasible to actually fully handle such code points, as inversion lists are a convenient way to express arbitrary lists of code points and take their union, intersection, etc. This commit changes the optimizer to use inversion lists for operating on the code points the synthetic start class can match. I don't much understand the overall operation of the optimizer. I'm told that previous porters found that perturbing it caused unexpected behaviors. I had promised to get this change in 5.18, but didn't. I'm trying to get it in early enough into the 5.20 preliminary series that any problems will surface before 5.20 ships. This commit doesn't change the macro level logic, but does significantly change various micro level things. Thus the 'and' and 'or' subroutines have been rewritten to use inversion lists. I'm pretty confident that they do what their names suggest. I re-derived the equations for what these operations should do, getting the same results in some cases, but extending others where the previous code mostly punted. The derivations are given in comments in the respective routines. Some of the code is greatly simplified, as it no longer has to treat above-Latin1 specially. It is now feasible for /i matching of above-Latin1 code points to know explicitly the folds that should be in the synthetic start class. But more preparatory work needs to be done before putting that into place. ...