delta/perl.git - github.com: perl/perl5.git

	Commit message	Author	Age	Files	Lines
*	Remove the non-inline function S_croak_memory_wrap from inline.h.	Andy Dougherty	2013-03-28	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This appears to resolve these three related tickets: [perl #116989] S_croak_memory_wrap breaks gcc warning flags detection [perl #117319] Can't include perl.h without linking to libperl [perl #117331] Time::HiRes::clock_gettime not implemented on Linux (regression?) This patch changes S_croak_memory_wrap from a static (but not inline) function into an ordinary exported function Perl_croak_memory_wrap. This has the advantage of allowing programs (particularly probes, such as in cflags.SH and Time::HiRes) to include perl.h without linking against libperl. Since it is not a static function defined within each compilation unit, the optimizer can no longer remove it when it's not needed or inline it as needed. This likely negates some of the savings that motivated the original commit 380f764c1ead36fe3602184804292711. However, calling the simpler function Perl_croak_memory_wrap() still does take less set-up than the previous version, so it may still be a slight win. Specific cross-platform measurements are welcome.
*	In Perl_re_op_compile(), tidy up after removing setjmp().	Nicholas Clark	2013-03-19	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Remove volatile qualifiers. Remove the variable jump_ret. Move the initialisation of restudied back to the declaration. This reverts several of the changes made by commits 5d51ce98fae3de07 and bbd61b5ffb7621c2. However, I can't see a cleaner way to avoid code duplication when restarting the parse than the approach I've taken here - the label redo_first_pass is now inside an if (0) block, which is clear but ugly.
*	In S_regclass(), create listsv as a mortal, claiming a reference if needed.	Nicholas Clark	2013-03-19	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	The SV listsv is sometimes stored in an array generated near the end of S_regclass(). In other cases it is not used, and it needs to be freed if any of the warnings that S_regclass() can trigger turn out to be fatal. The simplest solution to this problem is to declare it from the start as a mortal, and claim a (new) reference to it if it is not to be freed. This permits the removal of all other code related to ensuring that it is freed at the right time, but not freed prematurely if a call to a warning returns.
*	Harden hashes against hash seed discovery by randomizing hash iteration	Yves Orton	2013-03-19	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Adds: S_ptr_hash() - A new static function in hv.c which can be used to hash a pointer or integer. PL_hash_rand_bits - A new interpreter variable used as a cheap provider of "semi-random" state for use by the hash infrastructure. xpvhv_aux.xhv_rand - Used as a mask which is xored against the xpvhv_aux.riter during iteration to randomize the order the actual buckets are visited. PL_hash_rand_bits is initialized at interpreter start from the random hash seed, and then modified by "mixing in" the result of ptr_hash() on the bucket array pointer in the hv (HvARRAY(hv)) every time hv_auxinit() allocates a new iterator structure. The net result is that every hash has its own iteration order, which should make it much more difficult to determine what the current hash seed is. This required some tests to be restructured, as they tested for something that was not necessarily true: we never guaranteed that two hashes with the same keys would produce the same key order, we merely promised that using keys(), values(), or each() on the same hash, without any insertions in between, would produce the same order of visiting the key/values.
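The masking idea in the message above can be sketched in isolation. This is not the code in hv.c: `masked_visit_order` and `is_permutation` are hypothetical helpers, and the sketch assumes the bucket count is a power of two so that XORing a counter with the low bits of a mask still reaches every bucket exactly once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of hv.c: fill `order` with the bucket
 * indexes visited when a plain counter 0..nbuckets-1 is XORed with a
 * per-hash mask. nbuckets must be a power of two; only the low bits of
 * the mask are kept, so the result is a reordering of 0..nbuckets-1. */
void masked_visit_order(size_t nbuckets, size_t rand_mask, size_t *order)
{
    assert(nbuckets != 0 && (nbuckets & (nbuckets - 1)) == 0);
    for (size_t riter = 0; riter < nbuckets; riter++) {
        order[riter] = (riter ^ rand_mask) & (nbuckets - 1);
    }
}

/* Returns 1 if `order` holds every index in 0..nbuckets-1 exactly once. */
int is_permutation(const size_t *order, size_t nbuckets)
{
    for (size_t want = 0; want < nbuckets; want++) {
        size_t hits = 0;
        for (size_t i = 0; i < nbuckets; i++)
            hits += (order[i] == want);
        if (hits != 1)
            return 0;
    }
    return 1;
}
```

Two hashes with different masks walk the same bucket array in different orders, while a single hash with a fixed mask repeats its own order, which matches the guarantee the message describes.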
*	Fix several differences in the parsing of $.. and ${...}	Brian Fraser	2013-03-06	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Namely: * The first character in ${...} used to have no restrictions * ${foo:bar} used to be legal * ${foo::bar} worked, but ${foo'bar} didn't And possibly other subtle, so far undiscovered bugs. This was resolved by simply using the same code for both things. Note that this commit is not entirely useful on its own; While tests pass, it requires changes from the following commit to work entirely.
*	Pass the current and desired hash sizes to S_hsplit().	Nicholas Clark	2013-02-26	1	-1/+1
\| \| \| \| \| \| \| \|	Whilst this is slightly more work for its existing two callers, it will permit Perl_hv_ksplit() to also call it. Use STRLEN for the parameters, and change a local variable from I32 to STRLEN to match.
*	Add av_tindex() synonym for av_top_index()	Karl Williamson	2013-02-08	1	-0/+4
\| \| \| \| \|	The latter is a somewhat less clumsy name. The old one is provided as a very clear name; the new one as a somewhat slangy version.
*	Inline av_top_index()	Karl Williamson	2013-02-08	1	-1/+1
\| \| \| \| \|	This function is just an assert and a macro call. Avoid the function call overhead by making it inline.
*	Change name 'av_top' to 'av_top_index'	Karl Williamson	2013-02-08	1	-2/+2
\| \| \| \| \| \| \| \|	In using the av_top() function created in a recent commit, I found myself being confused, and thinking it meant the top element of the array, whereas it really means the index of the top element of that array. Since the new name has not appeared in a stable release, it can be changed, without remorse, to include 'index' in it.
*	Add interpolations to regex sets	Karl Williamson	2013-02-03	1	-3/+3
\| \| \| \| \| \| \| \| \|	This commit adds the capability for '(?[ ])' to contain interpolated variables from other '(?[ ])' constructs. A set operation can thus be built up from the composition of other ones, without having to worry about precedence, etc. Thanks to Aaron Crane for suggesting this.
*	Incorporate code review feedback for (?[])	Karl Williamson	2013-02-03	1	-4/+4
\| \| \| \|	Thanks to Hugo van der Sanden for reviewing this new code.
*	regcomp.c: Extract code into function	Karl Williamson	2013-02-03	1	-0/+5
\| \| \| \| \| \|	The code to parse the flags that occur after in '(?foo)' and '(?foo:bar)' is extracted into a function; some comments were added. This is in preparation for this to be called from an additional place
*	hv.c: add some NULL check removal	bulk88 (via RT)	2013-01-29	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The purpose is fewer machine instructions/faster code. * S_hv_free_ent_ret() is always called with entry non-null: so change its signature to reflect this, and remove a null check; * Add some SvREFCNT_dec_NNs; * In hv_clear(), refactor the code slightly to only do a SvREFCNT_dec_NN within the branch where it's already been determined that the arg is non-null; also, use the _nocontext variant of Perl_croak() to save a push instruction in threaded perls.
*	Correct variable names in embed.fnc for hv_free_ent and hv_free_ent_ret.	Andy Dougherty	2013-01-25	1	-2/+2
\| \| \| \| \| \| \|	Make the second variable name in embed.fnc match those used in the actual function declaration. This will matter if we add in 'entry' to PERL_ARGS_ASSERT_HV_FREE_ENT_RET. Also regen headers (only proto.h is affected) to match.
*	Add av_top() synonym for av_len()	Karl Williamson	2013-01-19	1	-0/+6
\| \| \| \|	av_len() is misleadingly named.
*	Deprecate certain rare uses of backslashes within regexes	Karl Williamson	2013-01-19	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are three pairs of characters that Perl recognizes as metacharacters in regular expression patterns: {}, [], and (). These can be used as well to delimit patterns, as in: m{foo} s(foo)(bar) Since they are metacharacters, they have special meaning to regular expression patterns, and it turns out that you can't turn off that special meaning by the normal means of preceding them with a backslash, if you use them, paired, within a pattern delimited by them. For example, in m{foo\{1,3\}} the backslashes do not change the behavior, and this matches "f", "o" followed by one to three more occurrences of "o". Usages like this, where they are interpreted as metacharacters, are exceedingly rare; we think there are none, for example, in all of CPAN. Hence, this deprecation should affect very little code. It does give notice, however, that any such code needs to change, which will in turn allow us to change the behavior in future Perl versions so that the backslashes do have an effect, and without fear that we are silently breaking any existing code.
*	Extend strictness for qr/(?[ \N{} ])/	Karl Williamson	2013-01-19	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	This recently added regex syntax imposes stricter rules on parsing than normal. However, this did not include parsing \N{} constructs that occur within it. This commit does that, making fatal the warnings that come from \N{}. I will add to perldiag the newly added messages along with the others for (?[ ]) before 5.18 ships.
*	Silence a MSVC++-specific warning	Steve Hay	2013-01-15	1	-7/+17
\| \| \| \| \|	("function declared with __declspec(noreturn) has non-void return type" / "function declared with __declspec(noreturn) has a return statement".)
*	Silence a couple of warnings	Steve Hay	2013-01-14	1	-1/+1
\| \| \| \| \|	("'initializing' : conversion from 'I32' to 'U8', possible loss of data" and "formal parameter n different from declaration".)
*	Add warnings for "\08", /\017/	Karl Williamson	2013-01-14	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \|	This was discussed in thread http://perl.markmail.org/thread/avtzvtpzemvg2ki2 but I never got around to this portion of the consensus, until now. I did a cpan grep http://grep.cpan.me/?q=%28^\|[^\\]%29\\[0-7]{1%2C2}[8-9]&page=1 and eyeballing the results, saw three cases where this warning might show up; one of which was for EBCDIC. The others looked to be false positives, such as in .css files.
*	Create deprecated fncs to replace to-be-removed macros	Karl Williamson	2013-01-12	1	-0/+16
\| \| \| \| \| \| \| \| \| \| \|	These macros should not be used as they are prone to misuse. There are no occurrences of them in CPAN. The single use of either of them in core has recently been removed (commit 8d40577bdbdfa85ed3293f84bf26a313b1b92f55), because it was a misuse. Instead code should use isIDFIRST_lazy_if or isWORDCHAR_lazy_if (isALNUM_lazy_if is also available, but can be confused with the Posix alnum, which it doesn't mean).
*	New regex experimental feature: (?[ ])	Karl Williamson	2013-01-11	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a fancier [bracketed] character class which allows set operations, such as intersection and subtraction. The entry in perlre for this commit details its operation. Besides extending regular expressions to handle this functionality, recommended by Unicode, the intent here is to do three things: 1) Intersection has been simulated by regexes using zero-width look-around assertions, which are non-obvious. This allows replacing those with a more powerful and clearer syntax; the compiled regexes are smaller and faster. Everything is known at compile time. 2) Set operations have also been simulated by using user-defined Unicode properties. These are globals, have security implications, restricted names, and don't allow as complex expressions as this new feature. 3) I hope that this feature will come to be viewed as a "better" bracketed character class. I took advantage of the fact that there is no embedded base to have to be compatible with to forbid certain iffy practices with the existing ones, while remaining mostly backwards compatible. The main difference is that /x is always enabled, so white space can be pretty much freely used with these, but to specify a match on white space, it must be escaped. Things that should have been illegal now are, such as \x{}, and \x{abcdefghi}. Things that look like a posix specifier but don't quite meet the rules now give an error instead of silently compiling; e.g., [:digit] is an error instead of the union of the characters that compose it. I may have omitted things; perhaps it should be an error to have the same letter occur twice, adjacent. Since this is experimental, we can make such changes based on field feedback. The intent is to keep this feature, since it is strongly recommended by Unicode. The exact syntax is subject to change, so is experimental.
*	regcomp.c: Add capability for regclass() to return inversion list	Karl Williamson	2013-01-11	1	-1/+1
\| \| \| \| \|	This is currently unused, but will have regclass() return an inversion list instead of a node.
*	regcomp.c: Add capability for strict [:posix:]	Karl Williamson	2013-01-11	1	-1/+1
\| \| \| \| \|	This adds a parameter to regpposixcc() to enforce stricter rules on the posix class syntax. It is currently unused
*	regcomp.c: Add function to skip pattern white space	Karl Williamson	2013-01-11	1	-0/+7
\| \| \| \| \| \| \|	The plan is to eventually convert all of regcomp to use this for white space ignoring under /x, but this will be used for now in just the new syntax for (?[ ]), coming in a few commits. Until then, this function is unused.
*	regcomp.c: Add parameter to regclass()	Karl Williamson	2013-01-11	1	-1/+1
\| \| \| \| \|	This parameter silences warnings for non-portable characters. It currently is always FALSE, meaning that warnings are given.
*	regcomp.c: Add parameter to regclass()	Karl Williamson	2013-01-11	1	-1/+1
\| \| \| \| \| \| \| \| \|	This parameter allows the caller to specify whether multi-character folds should be allowed or not. In general it should, and in the case where this commit says it shouldn't, they never are returned anyway from Unicode properties. This capability will be put to real use by future commits
*	grok_bslash_[ox]: Add param to silence non-portable warnings	Karl Williamson	2013-01-11	1	-2/+2
\| \| \| \| \| \| \| \|	If a hex or octal number is too big to fit in a 32 bit word, grok_oct and grok_hex by default output a warning that it is a non-portable value. This new parameter to the grok_bslash functions can cause them to shut up those warnings. This is currently unused, but will be needed in future commits.
*	Add optional strict mode to grok_bslash_[xo]	Karl Williamson	2013-01-11	1	-2/+2
\| \| \| \| \| \|	This mode croaks on any iffy constructs that currently compile. It is not currently used; documentation of the error messages will be delivered later.
*	Revise calling sequences for grok_bslash_[xo]	Karl Williamson	2013-01-11	1	-8/+6
\| \| \| \| \| \|	By passing the address of the parse pointer, the functions can advance it, eliminating a parameter to the function, and simplifying the code in the caller.
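The calling convention described above, where the parser receives the address of the caller's parse pointer and advances it, can be illustrated with a small stand-alone sketch. `parse_hex_advance` is a hypothetical function, not the real grok_bslash_x(), whose signature is not shown in this log; overflow handling is deliberately left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical parser, not the real grok_bslash_x(): reads hex digits
 * starting at *s, stores their value in *result, and advances *s past
 * what it consumed. Returns the number of digits read (0 if none).
 * Overflow of *result is not checked in this sketch. */
int parse_hex_advance(const char **s, unsigned long *result)
{
    assert(s != NULL && *s != NULL && result != NULL);

    const char *p = *s;
    unsigned long value = 0;
    int ndigits = 0;

    while (isxdigit((unsigned char)*p)) {
        int c = tolower((unsigned char)*p);
        value = value * 16 + (unsigned long)(isdigit(c) ? c - '0' : c - 'a' + 10);
        p++;
        ndigits++;
    }
    *result = value;
    *s = p;   /* the caller's pointer now sits just after the digits */
    return ndigits;
}
```

Because the function moves the pointer itself, the caller no longer needs a separate "length consumed" out-parameter or its own pointer arithmetic after the call.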
*	regcomp.c: Use a parameter to simplify some code	Karl Williamson	2013-01-11	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	When parsing \p{} outside of a bracketed character class, code in regcomp.c has pretended it is a bracketed character class by changing and restoring the parsing pointers, and then calling the charclass handler. This code can be simplified by instead passing a flag to the handler meaning to just parse one item. The faking is simpler there, with no restoring necessary. Also we can eliminate the duplicate handling of special cases. Future commits will make more extensive use of this mechanism.
*	embed.fnc: Properly declare fcn inline	Karl Williamson	2013-01-06	1	-1/+1
\| \| \| \| \|	This function is specified as inline in the source code, but not in the prototypes; only one compiler seems to have noticed.
*	regcomp.c: Don't iterate while changing an inversion list	Karl Williamson	2012-12-28	1	-0/+11
\| \| \| \| \| \| \|	This adds functions to prevent accidental (or deliberate) iteration over an inversion list while it is being modified. This is to catch development errors, and in production builds, the asserts() are likely no-ops
*	Eliminate RF_tainted flag from PL_reg_flags	David Mitchell	2012-12-25	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \|	This global flag is cleared at the start of execution, and then set if any locale-based nodes are executed. At the end of execution, the RXf_TAINTED_SEEN flag on the regex is set/cleared based on RF_tainted. We eliminate RF_tainted by simply directly setting RXf_TAINTED_SEEN each time a taintable node is executed. This is the final step before eliminating PL_reg_flags.
*	eliminate RF_utf8 flag from PL_reg_flags	David Mitchell	2012-12-25	1	-2/+2
\| \| \| \| \| \| \| \|	This global flag indicates whether the currently executing regex is utf8. Replace it with a boolean var local to the matching function, and pass it around via function args, or as a member of the regmatch_info struct. This is a first step to eliminating PL_reg_flags.
*	uninline panic branch from POPSTACK	Daniel Dragan	2012-12-23	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit saves machine instructions by preventing inlining, and keeps the error handling code for an extremely rare panic out of hot code. This should make the interp smaller and faster. Perl_error_log is a macro that has a very large expansion on threaded perls, 4 branches and possibly a call to Perl_PerlIO_stderr. POPSTACK occurs 18 times, by asm, on my non DEBUGGING threaded Win32 32 bit Perl 5.17 -O1 compiled with VC 2003. POPSTACK is also used in some core XS modules, for example List::Util and PerlIO::encoding. The .text section of perl517.dll dropped from 0xc05ff bytes of x86 instructions to 0xc00ff after applying this for me. Perl_croak_popstack was made contextless to save a push/move instruction at each caller (fewer instructions in the instruction cache) and for more opportunity for the compiler to optimize. Since Perl_croak_popstack is a noreturn, some compilers may optimize it to just a conditional jump instruction. VC 2003 32 bit did this inside perl517.dll and from XS modules using POPSTACK. Perl_croak_popstack measures at 0x48 bytes of instructions under -O1 for me, so previously, those 0x48 minus the dTHX overhead would have been sitting in the caller because of macro expansion.
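The general pattern described above, moving a rarely taken error branch out of a frequently expanded macro or inline path into one shared noreturn function, might look like the following sketch. The names `small_stack`, `small_stack_pop`, and `panic_pop_underflow` are invented for illustration and are unrelated to the real POPSTACK macro or Perl_croak_popstack; the sketch assumes a C11 compiler for `_Noreturn`.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical out-of-line panic routine: the message formatting and
 * the abort live in one place instead of being repeated at each caller. */
_Noreturn void panic_pop_underflow(void)
{
    fputs("panic: pop on empty stack\n", stderr);
    abort();
}

/* Hypothetical fixed-size stack used only for this illustration. */
typedef struct {
    int items[16];
    int depth;
} small_stack;

/* The hot path is one compare plus the pop itself; the cold path is a
 * single call to a function the compiler knows never returns. */
int small_stack_pop(small_stack *st)
{
    assert(st != NULL);
    if (st->depth <= 0)
        panic_pop_underflow();
    return st->items[--st->depth];
}
```

Each call site then pays only for a compare and a branch, which is the kind of saving the message measures in the .text section.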
*	handy.h: Add full complement of isIDCONT() macros	Karl Williamson	2012-12-23	1	-0/+9
\| \| \| \| \| \| \|	This also changes isIDCONT_utf8() to use the Perl definition, which excludes any \W characters (the Unicode definition includes a few of these). Tests are also added. These macros remain undocumented for now.
*	Deprecate all is_(uni\|utf8)_foo function uses	Karl Williamson	2012-12-22	1	-0/+29
\| \| \| \| \|	Coders should use the macros in handy.h instead of calling these directly.
*	Create internal _is_utf8_mark()	Karl Williamson	2012-12-22	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \|	This is so we can deprecate non-core use of the existing one in a future commit. XS coders should be using the macros in handy.h instead of calling such functions directly. A future commit will deprecate all of them, but first the core uses of this one must change so they don't generate deprecation messages. I will not have a chance to look for some time, but I suspect that most uses of this function in the core should be changed to use something else, but in the meantime, the non-core uses can be deprecated.
*	utf8.c: Remove two internal now unused functions.	Karl Williamson	2012-12-22	1	-12/+0
\| \| \| \| \|	These functions were used internally as helpers for matching \X in regular expressions. They are no longer used.
*	Consolidate some regex OPS	Karl Williamson	2012-12-22	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The regular expression operation POSIXA works on any of the (currently) 16 posix classes (like \w and [:graph:]) under the regex modifier /a. This commit creates similar operations for the other modifiers: POSIXL (for /l), POSIXD (for /d), POSIXU (for /u), plus their complements. It causes these ops to be generated instead of the ALNUM, DIGIT, HORIZWS, SPACE, and VERTWS ops, as well as all their variants. The net saving is 22 regnode types. The reason to do this is for maintenance. As of this commit, there are now 22 fewer node types for which code has to be maintained. The code for each variant was essentially the same logic, but on different operands. It would be easy to make a change to one copy and forget to make the corresponding change in the others. Indeed, this patch fixes [perl #114272] in which one copy was out of sync with others. This patch actually reduces the number of separate code paths to 5: POSIXA, NPOSIXA, POSIXL, POSIXD, and POSIXU. The complements of the last 3 use the same code path as their non-complemented version, except that a variable is initialized differently. The code then XORs this variable with its result to do the complementing or not. Further, the POSIXD branch now just checks if the target string being matched is UTF-8 or not, and then jumps to either the POSIXU or POSIXA code respectively. So, there are effectively only 4 cases that are coded: POSIXA, NPOSIXA, POSIXL, and POSIXU. (POSIXA doesn't have to worry about UTF-8, while NPOSIXA does, hence these for efficiency are coded separately.) Removing all this code saves memory. The output of the Linux size command shows that the perl executable was shrunk by 33K bytes on my platform compiled under -O0 (.7%) and by 18K bytes (1.3%) under -O2.
The reason this patch was doable was previous work in numbering the POSIX classes, so that they could be indexed in arrays and bit positions. This is a large patch; I didn't see how to break it into smaller components. I chose to make this code more efficient as opposed to saving even more memory. Thus there is a separate loop that is jumped to after we know we have to load a swash; this just saves having to test if the swash is loaded each time through the loop. I avoid loading the swash until absolutely necessary. In places in the previous version of this code, the swash was loaded when the input was UTF-8, even if it wasn't yet needed (and might never be if the input didn't contain anything above Latin1); apparently to avoid the extra test per iteration. The Perl test suite runs slightly faster on my platform with this patch under -O0, and the speeds are indistinguishable under -O2. This is in spite of these new POSIX regops being unknown to the regex optimizer (this will be addressed in future commits), and extra machine instructions being required for each character (the xor, and some shifting and masking). I expect this is a result of better caching, and not loading swashes unless absolutely necessary.
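The "one code path plus an XORed flag" trick described in this message can be shown with a small ASCII-only sketch. The enum values and the functions `in_class_ascii` and `matches_class` are hypothetical; the real regnodes index 16 POSIX classes and also handle locale and UTF-8, none of which is modelled here.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical class numbers, standing in for the numbered POSIX classes. */
enum posix_class { CLASS_DIGIT, CLASS_SPACE, CLASS_WORD };

/* ASCII-only membership test for one class; always returns 0 or 1. */
int in_class_ascii(enum posix_class cls, unsigned char c)
{
    if (c > 127)
        return 0;
    switch (cls) {
    case CLASS_DIGIT: return isdigit(c) != 0;
    case CLASS_SPACE: return isspace(c) != 0;
    case CLASS_WORD:  return isalnum(c) != 0 || c == '_';
    }
    return 0;
}

/* A single code path serves a class and its complement: to_complement
 * is 0 for the plain class and 1 for the negated one, and XORing it
 * with the 0/1 membership result flips the answer only when asked. */
int matches_class(enum posix_class cls, int to_complement, unsigned char c)
{
    assert(to_complement == 0 || to_complement == 1);
    return to_complement ^ in_class_ascii(cls, c);
}
```

The cost is one extra XOR per character, which the message notes did not measurably slow the test suite.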
*	Add generic _is_(uni\|utf8)_FOO() function	Karl Williamson	2012-12-22	1	-0/+9
\| \| \| \| \| \|	This function uses table lookup to replace 9 more specific functions, which can be deprecated. They should not have been exposed to the public API in the first place
*	eliminate PL_regsize	David Mitchell	2012-12-16	1	-4/+5
\| \| \| \| \| \| \| \| \|	This var (or rather PL_reg_state.re_state_regsize, which it is #deffed to) just holds the index of the maximum opening paren index seen so far in S_regmatch(). So make it a local var of S_regmatch() and pass it as a param to the couple of static functions called from there that need it. (Also give the local var the more meaningful name 'maxopenparen'.)
*	regexec.c: More efficient Korean \X processing	Karl Williamson	2012-12-16	1	-6/+0
\| \| \| \| \| \|	This refactors the code slightly that checks for Korean precomposed syllables in \X. It eliminates the PL_variable formerly used to keep track of things.
*	hash argument is not used anymore in do_oddball	Ruslan Zakirov	2012-12-11	1	-4/+3
\| \| \| \|	Rename the arguments to make it clearer what the function takes.
*	regexec.c: Replace infamous if-else-if sequence by loop	Karl Williamson	2012-12-09	1	-0/+3
\| \| \| \| \| \|	This saves 1.5 KB in the text section on my machine in regexec.o (unoptimized) and 820 optimized. I did not benchmark, as we don't really care very much about performance under 'use locale'.
*	Deprecate some functions in utf8.c	Karl Williamson	2012-12-09	1	-0/+26
\| \| \| \| \|	These functions are not used by the Perl core. Code should be using the equivalent macros in handy.h that may avoid a function call.
*	Add functions for getting ctype ALNUMC	Karl Williamson	2012-12-09	1	-0/+14
\| \| \| \| \| \| \|	We think this is meant to stand for C's alphanumeric, that is what is matched by POSIX [:alnum:]. There were no functions or dedicated swash available for accessing it. Future commits will want to use these.
*	embed.fnc: Add missing entry	Karl Williamson	2012-12-09	1	-0/+4
\| \| \| \| \|	This function is defined in utf8.c, but isn't called by the core, and there was no entry for it in embed.fnc
*	Stop renamed packages from making reset() crash	Father Chrysostomos	2012-12-05	1	-16/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This only affected threaded builds. I think the comments in the added test explain well enough what was happening. The solution is to store a stashpad offset in the pmop, instead of the name of the stash. This is similar to what was done with cop stashes in d4d03940c58a. Not only does this fix the crash, but it also makes compilation faster and saves memory (no separate malloc for every m?pat?). I had to move Safefree(PL_stashpad) later on in perl_destruct, because freeing a pmop causes the PL_stashpad to be accessed, and pmops can be freed during sv_clean_all. Its previous location was not a problem for cops, as PL_stashpad[cop->cop_stashoff] is only accessed when PL_curcop==that_cop and Perl code is running, not when cops are freed.