delta/perl.git - github.com: perl/perl5.git

	Commit message	Author	Age	Files	Lines
*	Fix -Dr output regression	Karl Williamson	2017-08-09	1	-1/+2
	Several commits in the 5.23 series improved the display of the compiled ANYOF regnodes, but introduced two bugs. One of them is in \p{Any} and similar things that match the entire range 0-255. That range is omitted, so it looks like \p{Any} only matches code points above 255. Note that this is only what gets displayed under -Dr. What actually gets compiled has been and still is fine.
	The other is that when displaying a pattern that still has unresolved user-defined properties that are complemented, it doesn't show properly that the whole thing is complemented. That is, the output looks like it doesn't obey De Morgan's laws.
	The fixes to these are quite intertwined, and so I didn't try to separate them.
	(cherry picked from commit 753b2c6a60a81dacbe59e2041e30e8302484dc2d)
*	[perl #128740] Check for null in pp_ghostent et al.	Father Chrysostomos	2017-07-28	1	-1/+1
	Specifically in the S_space_join_names_mortal static function that several pp functions call. On some platforms (such as Gentoo Linux with torsocks), hent->h_aliases (where hent is a struct hostent *) may be null after a gethostent call.
	(cherry picked from commit d35c1b5e43e773f353239d9182ddccb41cdab3d6)
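The null guard described above can be sketched as follows. This is a minimal illustration, not the code in pp_sys.c: space_join_names() is a hypothetical stand-in for S_space_join_names_mortal(), it returns a malloc'd C string rather than a mortal SV, and treating a null array as an empty list is an assumption about the intended behaviour.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for S_space_join_names_mortal(): joins a
 * NULL-terminated array of names with single spaces.  The caller must
 * free() the result.  A NULL array (the case the commit guards against)
 * is treated like an empty one. */
static char *space_join_names(char *const *names)
{
    size_t len = 0;
    size_t i;
    char *out, *p;

    if (names) {
        for (i = 0; names[i]; i++)
            len += strlen(names[i]) + 1;   /* +1 for a space or the final NUL */
    }
    out = malloc(len ? len : 1);
    assert(out != NULL);                   /* sketch only: no real OOM handling */
    p = out;
    if (names) {
        for (i = 0; names[i]; i++) {
            size_t n = strlen(names[i]);
            if (i > 0)
                *p++ = ' ';
            memcpy(p, names[i], n);
            p += n;
        }
    }
    *p = '\0';
    return out;
}
```

Without the `if (names)` checks, the first `names[i]` would dereference a null pointer on the platforms mentioned in the commit message.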
*	Fix checks for tainted dir in $ENV{PATH}	Father Chrysostomos	2017-02-23	1	-0/+4
	    $ cat > foo
	    print "What?!\n"
	    ^D
	    $ chmod +x foo
	    $ ./perl -Ilib -Te '$ENV{PATH}="."; exec "foo"'
	    Insecure directory in $ENV{PATH} while running with -T switch at -e line 1.
	That is what I expect to see. But:
	    $ ./perl -Ilib -Te '$ENV{PATH}="/\\:."; exec "foo"'
	    What?!
	Perl is allowing the \ to escape the :, but the \ is not treated as an escape by the system, allowing a relative path in PATH to be considered safe.
	(cherry picked from commit ba0a4150f6f1604df236035adf6df18bd43de88e)
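The mismatch above comes down to how PATH is split. A minimal sketch of a check that splits the way the system does, with the backslash as an ordinary character, might look like this. The function name path_has_relative_dir() is hypothetical and this is not the code in taint.c; it only illustrates why "/\:." must be rejected.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if any ':'-separated component of path is not absolute.
 * A backslash is NOT an escape here, matching how the system splits PATH.
 * An empty component means the current directory, so it counts as relative. */
static int path_has_relative_dir(const char *path)
{
    const char *p = path;

    assert(path != NULL);
    for (;;) {
        const char *end = strchr(p, ':');
        if (*p != '/')          /* covers both empty and relative components */
            return 1;
        if (!end)
            return 0;
        p = end + 1;            /* step past the ':' to the next component */
    }
}
```

With this splitting, the value from the second command above breaks into the components `/\` and `.`, and the second one is relative.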
*	Revert "Make instr() a macro"	Karl Williamson	2016-04-08	1	-1/+1
	This reverts commit fea1d2dd5d210564d442a09fe034b62f262f35f9 due to it causing problems so close to the release of 5.24. See https://rt.perl.org/Ticket/Display.html?id=127852
*	Get -Accflags=-DPERL_MEM_LOG compiling again	Matthew Horsfall	2016-04-05	1	-0/+6
	It had rotted a bit. Well, more than one probably. Move the declarations of the functions Perl_mem_log_alloc etc. from handy.h into embed.fnc where they belong, and where Malloc_t will have already been defined.
*	silence warnings in inline.h on Win64 VC build	Daniel Dragan	2016-03-28	1	-1/+1
	c:\p523\src\inline.h(211) : warning C4267: 'function' : conversion from 'size_t' to 'I32', possible loss of data
	c:\p523\src\inline.h(212) : warning C4267: 'function' : conversion from 'size_t' to 'I32', possible loss of data
	c:\p523\src\inline.h(421) : warning C4244: '=' : conversion from '__int64' to 'I32', possible loss of data
	c:\p523\src\inline.h(423) : warning C4244: '=' : conversion from '__int64' to 'I32', possible loss of data
	To fix the warnings at lines 211 and 212, change the function to use a signed pointer-length type. On x64, a 64b to 64b move instruction is 1 byte longer than a 32b to 32b move, so this commit adds a couple more bytes of machine code to the interpreter. But a PV's len and cur are STRLEN, which is 64b on a 64b OS, so something bad would happen if a very large off arg passed to Perl_utf8_hop were truncated to 32b. Hence casting to silence the warning isn't appropriate; a bigger type is needed instead.
	For S_cx_pushblock, an 8*(2^32), or 32 GB, long perl stack malloc block is unrealistic. A 32 GB mark stack is infinite recursion. Cast away the warnings.
*	rename and function-ise dtrace macros	David Mitchell	2016-03-18	1	-0/+7
	This commit:
	1. Renames the various dtrace probe macros into a consistent and self-documenting pattern, e.g.
	    ENTRY_PROBE => PERL_DTRACE_PROBE_ENTRY
	    RETURN_PROBE => PERL_DTRACE_PROBE_RETURN
	Since they're supposed to be defined only under PERL_CORE, this shouldn't break anything that's not being naughty.
	2. Implements the main body of these macros using a real function. They were formerly defined along the lines of
	    if (PERL_SUB_ENTRY_ENABLED()) PERL_SUB_ENTRY(...);
	The PERL_SUB_ENTRY() part is a macro generated by the dtrace system, which for example on linux expands to a large bunch of assembly directives. Replace the direct macro with a function wrapper, e.g.
	    if (PERL_SUB_ENTRY_ENABLED()) Perl_dtrace_probe_call(aTHX_ cv, TRUE);
	This reduces the number of times the macro is expanded to one.
	The new functions also take simpler args and then process the values they need using intermediate temporary vars to avoid huge macro expansions. For example
	    ENTRY_PROBE(CvNAMED(cv) ? HEK_KEY(CvNAME_HEK(cv)) : GvENAME(CvGV(cv)),
	        CopFILE((const COP *)CvSTART(cv)),
	        CopLINE((const COP *)CvSTART(cv)),
	        CopSTASHPV((const COP *)CvSTART(cv)));
	is now
	    PERL_DTRACE_PROBE_ENTRY(cv);
	This reduces the executable size by 1K on -O2 -Dusedtrace builds, and by 45K on -DDEBUGGING -Dusedtrace builds.
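The wrapper pattern described in this commit can be sketched in isolation. Every name below (sub_info, RAW_SUB_PROBE, probe_call, PROBE_ENTRY and the counters) is a hypothetical stand-in chosen for illustration; none of it is Perl's or dtrace's real definition. The point is only that the bulky macro is expanded in exactly one function, while call sites expand to a cheap test plus a function call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the data a probe needs about a sub. */
struct sub_info {
    const char *name;
    const char *file;
    int line;
};

static int probe_enabled = 0;               /* stands in for PERL_SUB_ENTRY_ENABLED() */
static const char *last_probe_name = NULL;  /* records what the "probe" saw */
static int probe_fire_count = 0;

/* Stands in for the dtrace-generated macro, which in reality expands to a
 * large amount of code. */
#define RAW_SUB_PROBE(name, file, line)                         \
    do {                                                        \
        last_probe_name = (name);                               \
        (void)(file);                                           \
        (void)(line);                                           \
        probe_fire_count++;                                     \
    } while (0)

/* The only place RAW_SUB_PROBE is expanded.  It takes one simple argument
 * and derives the values the probe needs in temporaries. */
static void probe_call(const struct sub_info *cv)
{
    const char *name;

    assert(cv != NULL);
    name = cv->name ? cv->name : "(anon)";
    RAW_SUB_PROBE(name, cv->file, cv->line);
}

/* What each call site expands to: a flag test and a function call. */
#define PROBE_ENTRY(cv) \
    do { if (probe_enabled) probe_call(cv); } while (0)
```

The trade-off is one extra function call when probes are enabled, in exchange for a smaller expansion at every call site.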
*	Make instr() a macro	Karl Williamson	2016-03-17	1	-1/+1
	... thus avoiding a function call overhead
*	harmonize S_dump_exec_pos()'s last arg type	David Mitchell	2016-03-15	1	-1/+1
	embed.fnc declared it as "U32 depth", while it was defined as "const U32 depth".
*	fixup definitions and usage of new re debugging subs	Yves Orton	2016-03-13	1	-3/+3
	This should fix the smoke failures on threaded builds. It also renames re_indentfo, which was a terrible name in the first place: now that I have had to strip the Perl_ prefixes from these subs with a perl -i -pe, I took the opportunity to rename it to re_exec_indent, which self-documents much better.
*	Rework diagnostics in the regex engine	Yves Orton	2016-03-13	1	-1/+5
	This introduces three new subs:
	Perl_re_printf(), which is a wrapper for PerlIO_printf( Perl_debug_log, ... ), which cuts down on clutter in the code. Arguably this could be moved to util.c and renamed something like PerlIO_debugf() and then we could declutter all the statements that write to the Perl_debug_log filehandle. But that is a bit too ambitious for me right now, so I leave this as a regex engine only sub for now.
	Perl_re_indentf(), which is a wrapper for PerlIO_re_printf(), which adds an indent argument and automatically indents the line appropriately, and is used in regcomp.c for trace diagnostics during compilation.
	Perl_re_indentfo(), which is similar to Perl_re_indentf() but is used in regexec.c, and which adds a specific prefix to each indented line to account for the fact that during execution we normally have string position information on the left.
	The end result of this patch is that a lot of clutter in the debugging statements in the regex engine is reduced, exposing what is actually going on. It should also now be easier to add new diagnostics which "do the right thing".
	Over time the debugging trace output in regexec has become very cluttered and confusing. This patch cleans much of it up: if something happens at a given recursion depth it is output at the right depth, etc., and formats have been changed to not have leading spaces so you can actually see the indentation properly.
*	make building without memcpy work (RT #127619)	Lukas Mai	2016-03-07	1	-1/+1
*	util.c: make my_mem/my_b prototypes more like the originals	Lukas Mai	2016-03-07	1	-4/+4
*	embed.fnc: Fcn should have been within an #ifdef	Karl Williamson	2016-03-01	1	-1/+1
*	PATCH: [perl #127581] Spurious warning about posix class	Karl Williamson	2016-03-01	1	-1/+6
*	regcomp.c: Add new static inline convenience function	Karl Williamson	2016-02-27	1	-0/+1
*	regcomp.c: Change variable names, white-space	Karl Williamson	2016-02-27	1	-1/+1
	A future commit will make more sense if these names are changed. This reindents some code so that it doesn't overflow 79 columns.
*	regcomp.c: Change name of static function	Karl Williamson	2016-02-27	1	-1/+1
	I found myself using this function, forgetting that it zapped one of the parameters, so change the name so that can't be forgotten.
*	Use less memory in compiling regexes	Karl Williamson	2016-02-23	1	-0/+1
	This is at least a partial patch for [perl #127392], cutting the maximum memory used on my box from around 8600kB to 7800kB. For [perl #127568], which has been merged into #127392, the savings are even larger, about 37%.
	Previously a large number of large mortal SVs could be created while compiling a single regex pattern, and their accumulated memory quickly added up. This changes things to not use so many mortals.
*	regcomp.c: Guard against corrupting inversion list SV	Karl Williamson	2016-02-23	1	-1/+1
	I don't know of any cases where this happens, but in working on the next commit I triggered a problem with shrinking an inversion list so much that the required 0 UV at the beginning was freed.
*	Revamp -Dr handling of /[...]/	Karl Williamson	2016-02-19	1	-4/+16
	This revamps the handling of -Dr for bracketed character classes. There were bugs introduced earlier in 5.23, and this consolidates the handling of /d classes so that the interactions can be better considered. It tries inverting the portion that is in the bitmap range to see if the output is shorter and clearer that way. And it always makes the above-bitmap code points show as not-inverted, as that is clearer. I ran out of time before the freeze, so I had to not invert in some cases.
*	Add a parameter to a static function	Karl Williamson	2016-02-19	1	-1/+2
	This parameter will be used in a future commit; it changes the output format of this function that displays the contents of an inversion list so that it won't have to be parsed later, simplifying the code at that time.
*	Change private function to static	Karl Williamson	2016-02-19	1	-2/+2
	This function was used outside the file it contains, but was only defined (by #ifdef's) for those few internal core files for which it was needed. Now all those uses have gone, save for the one file. Better to make it static so no one can circumvent those #ifdef's.
*	regcomp.c, toke.c: swap functions being inline static	Karl Williamson	2016-02-18	1	-7/+5
	grok_bslash_x() is so large that no compiler will inline it. Move it to dquote.c from dq_inline.c. Conversely, move form_octal_warning() to dq_inline.c. It is so tiny that the function call overhead is scarcely smaller than the function body. This also moves things in embed.fnc so all these functions are not visible outside the few files they are supposed to be used in.
*	perlapi: Hide the swash functions	Karl Williamson	2016-02-18	1	-3/+3
	These should be internal only, and we may want to get rid of them someday. Hide their existence so that people who don't already know about them won't be tempted to try to use them.
*	Don't allow /\N{}/ under 're strict'	Karl Williamson	2016-02-18	1	-0/+1
	This is the one remaining empty {} that was accepted under the experimental 'use re "strict"'.
*	regcomp.c: Extract duped code into one fcn	Karl Williamson	2016-02-10	1	-0/+4
	This takes code that was duplicated and makes it into a single static inline function, so that maintenance tasks don't have to be done on both copies.
*	PATCH: [perl #8904] Revamp [:posix:] parsing	Karl Williamson	2016-02-09	1	-5/+8
	A problem with bracketed character classes, qr/[foo]/, is that there is very little structure about them, so almost anything is legal, and so typos just silently compile into something unintended.
	One of the possible components is a posix character class. There are 14 of them, and they have a very restricted structure, which is easy to get slightly wrong, so that instead of the intended posix class being compiled, something else silently is created.
	This commit causes the regex compiler to look for slightly misspelled posix character classes and to raise a warning when found. It does not change the results of the compilation.
	To do this, it introduces fuzzy parsing into the regex compiler, using the Damerau-Levenshtein algorithm to find out how many single character edits it would take to transform the input into one of the 14 classes. If it is 1 or 2 off, it considers the input to have been intended to be that class and raises the warning. If more edits would be needed, it remains silent. This is a heuristic, and someone could have made enough typos that this thinks a class wasn't intended that was. Conversely it could raise a warning when no class was intended, though warnings only happen when the input very closely resembles one of the 14 legal posix classes. The algorithm can be tweaked if experience indicates it should.
	But the bottom line is that many more cases of unintended results will now be warned about. Things like having blanks in the construct and having the '^' before the colon are recognized as being intended posix classes (given that the actual names are close to one of the 14), and raise warnings. Again, this commit does not change what gets compiled.
	This found a bug in autodoc.pl which was fixed a few commits ago.
	The [. .] and [= =] POSIX constructs cause perl to croak that they are unimplemented. This commit improves the parsing of these two, and fixes some false positives. See http://nntp.perl.org/group/perl.perl5.porters/230975
	The new code combines two functions in regcomp.c into one new one.
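The "1 or 2 edits away" heuristic described above can be illustrated with a small sketch. It is not the code in regcomp.c: it uses the restricted "optimal string alignment" variant of Damerau-Levenshtein, which is simpler than the full algorithm, and the names MAX_NAME, osa_distance() and closest_posix_class() are hypothetical. The 14 class names are the ones Perl documents.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAME 15   /* longest input this sketch handles */

static size_t min3(size_t a, size_t b, size_t c)
{
    size_t m = a < b ? a : b;
    return m < c ? m : c;
}

/* Optimal string alignment distance: insertions, deletions, substitutions
 * and transpositions of two adjacent characters each cost 1. */
static size_t osa_distance(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t d[MAX_NAME + 1][MAX_NAME + 1];
    size_t i, j;

    assert(la <= MAX_NAME && lb <= MAX_NAME);
    for (i = 0; i <= la; i++) d[i][0] = i;
    for (j = 0; j <= lb; j++) d[0][j] = j;
    for (i = 1; i <= la; i++) {
        for (j = 1; j <= lb; j++) {
            size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            d[i][j] = min3(d[i - 1][j] + 1,          /* deletion */
                           d[i][j - 1] + 1,          /* insertion */
                           d[i - 1][j - 1] + cost);  /* substitution or match */
            if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]) {
                size_t t = d[i - 2][j - 2] + 1;      /* adjacent transposition */
                if (t < d[i][j])
                    d[i][j] = t;
            }
        }
    }
    return d[la][lb];
}

static const char *const posix_names[] = {
    "alpha", "alnum", "ascii", "blank", "cntrl", "digit", "graph",
    "lower", "print", "punct", "space", "upper", "word", "xdigit"
};

/* Returns the closest class name if input is at most 2 edits away (an exact
 * match counts as 0 edits), storing the distance in *dist; else NULL. */
static const char *closest_posix_class(const char *input, size_t *dist)
{
    const char *best = NULL;
    size_t best_d = 3;   /* anything at distance 3 or more is ignored */
    size_t i;

    if (strlen(input) > MAX_NAME)
        return NULL;
    for (i = 0; i < sizeof posix_names / sizeof posix_names[0]; i++) {
        size_t dd = osa_distance(input, posix_names[i]);
        if (dd < best_d) {
            best_d = dd;
            best = posix_names[i];
        }
    }
    if (best && dist)
        *dist = best_d;
    return best;
}
```

A caller in the spirit of the commit would warn only when the returned distance is 1 or 2. For example, "aplha" is one transposition away from "alpha", while "foobar" is not close to any class and yields NULL.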
*	regcomp.c: Add code to compute edit distance (Damerau–Levenshtein)	Karl Williamson	2016-02-09	1	-0/+5
	This will be used in a future commit. This code is taken from CPAN Text::Levenshtein::Damerau::XS with the author's knowledge. There have been white-space changes to make it conform better to perl's core coding standards, and declaration changes to make it more portable, such as using UV instead of 'unsigned int', and PERL_STATIC_INLINE instead of a less portable form, but the logic is unchanged. One variable was changed to signed from unsigned to avoid a warning message from some compilers.
	The author and I will decide later about keeping the cpan module and this code in sync. It changes very rarely.
*	make gimme consistently U8	David Mitchell	2016-02-03	1	-4/+4
	The value of gimme stored in the context stack is U8. Make all other uses in the main core consistent with this.
	My primary motivation on this was that the new function cx_pushblock(), which I gave a 'U8 gimme' parameter, was generating warnings where callers were passing I32 gimme vars to it. Rather than play whack-a-mole, it seemed simpler to just uniformly use U8 everywhere.
	Porting/bench.pl shows a consistent reduction of about 2 instructions on the loop and sub benchmarks, so this change isn't harming performance.
*	convert CX_{PUSH\|POP}{WHEN\|GIVEN} to inline fns	David Mitchell	2016-02-03	1	-0/+4
	Replace CX_PUSHGIVEN() with cx_pushgiven() etc.
*	convert CX_PUSHLOOP*/POPLOOP to inline fns	David Mitchell	2016-02-03	1	-0/+4
	Replace CX_PUSHLOOP_FOR() with cx_pushloop_for() etc.
*	convert CX_PUSHEVAL/POPEVAL to inline fns	David Mitchell	2016-02-03	1	-0/+3
	Replace CX_PUSHEVAL() with cx_pusheval() etc. No functional changes.
*	convert CX_PUSHFORMAT/POPFORMAT to inline fns	David Mitchell	2016-02-03	1	-2/+5
	Replace CX_PUSHFORMAT() with cx_pushformat() etc. No functional changes.
*	convert CX_PUSHSUB/POPSUB to inline fns	David Mitchell	2016-02-03	1	-1/+6
	Replace CX_PUSHSUB() with cx_pushsub() etc. No functional changes.
*	convert CX_PUSH/POP/TOPBLOCK to inline fns	David Mitchell	2016-02-03	1	-0/+6
	Replace CX_PUSHBLOCK() with cx_pushblock() etc. No functional changes.
*	add SAVEt_TMPSFLOOR save type and Perl_savetmps()	David Mitchell	2016-02-03	1	-0/+1
	Giving SAVETMPS its own dedicated save type avoids having to push the address of PL_tmps_floor onto the save stack each time. Giving it a dedicated save function as well lets the function do the PL_tmps_floor = PL_tmps_ix step too, making the binary slightly more compact.
*	PUSHEVAL: make retop a parameter	David Mitchell	2016-02-03	1	-1/+1
	Rather than doing cx->blk_eval.retop = NULL in PUSHEVAL, then relying on the caller to subsequently change it to something more useful, make it an arg to PUSHEVAL.
*	replace leave_common() with leave_adjust_stacks()	David Mitchell	2016-02-03	1	-2/+0
	Make the remaining callers of S_leave_common() use leave_adjust_stacks() instead, then delete this static function.
	This brings to all scope exits the benefit, already introduced on sub exits, of freeing TEMPS; uses the optimised code for creating mortal copies; and finally unifies all the different 'process return args on scope exit' implementations into a single function.
*	make pp_return() use leave_adjust_stacks()	David Mitchell	2016-02-03	1	-1/+2
	It was using S_leave_common(), but that's shortly to be removed. It also required adding an extra arg to leave_adjust_stacks() to indicate where to shift the return args to. This will also be needed for when we replace the remaining uses of S_leave_common() with leave_adjust_stacks().
*	make pp_leavesublv use S_leavesub_adjust_stacks()	David Mitchell	2016-02-03	1	-0/+2
	Currently S_leavesub_adjust_stacks() is just used by pp_leavesub. Rename it to Perl_leave_adjust_stacks(), extend its functionality slightly, then make pp_leavesublv() use it too.
	This means that lvalue sub exit gains the benefit of FREETMPS being done, and (where mortal copying needs doing) the optimised copying code. It also means there is now one less version of the "process args on scope exit" code.
	pp_leavesublv() still does a scan of its return args looking for things to croak() on, but leaves everything else to leave_adjust_stacks().
	leave_adjust_stacks() is intended shortly to be used in place of S_leave_common() too, thus unifying all args-on-scope-exit code.
	The changes to leave_adjust_stacks() in this commit (apart from the renaming and doc changes) are:
	* a new arg to indicate what condition to use to decide whether to pass or copy the arg;
	* a new branch to mortalise and ref count bump an arg
*	rename S_doeval() to S_doeval_compile()	David Mitchell	2016-02-03	1	-2/+1
	This makes it a bit more obvious what niche in the "eval" ecosystem it occupies.
*	simplify S_leave_common() and callers	David Mitchell	2016-02-03	1	-1/+1
	Currently one of the args to S_leave_common() is supposed to be the current stack pointer; it returns an updated sp. Instead make it get/set PL_stack_sp directly. E.g. in the caller, replace
	    dSP;
	    SP = S_leave_common(..., SP, ...);
	    PUTBACK;
	with
	    S_leave_common(..., ...);
	and in S_leave_common(), make it initially get PL_stack_sp, and before returning, update PL_stack_sp.
*	rename S_dopoptogiven() to S_dopoptogivenfor()	David Mitchell	2016-02-03	1	-1/+1
	Since it searches the context stack for the next GIVEN or FOR LOOP context, make the name better express its purpose.
*	add Perl_clear_defarray()	David Mitchell	2016-02-03	1	-0/+1
	This function implements the less commonly used branch in the POPSUB() macro that clears @_ in place, or abandons it and creates a new array in pad slot 0 of the function (the common branch is where @_ hasn't been reified, and so can be cleared simply by setting fill to -1).
	By moving this out to a separate function we can avoid repeating the same code everywhere the POPSUB macro is used; but since it's only used in the less frequent cases, the extra overhead of a function call doesn't matter.
	It has a currently unused arg, 'abandon', which will be used shortly.
*	Use lookup table for /\b{gcb}/ instead of switch stmt	Karl Williamson	2016-01-19	1	-1/+1
	This changes the handling of Grapheme Cluster Breaks to be entirely via a lookup table generated by regen/mk_invlists.pl.
	This is easier to maintain and follow, as the generation of the table follows the text of Unicode's UAX29 precisely, and loops can be used to set every class up instead of having to name each explicitly, so it will be easier to add new rules. And the runtime switch statement is replaced by a single line.
	My gcc compiler optimized the previous version to an array lookup, but this commit does it for not so clever compilers.
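The table-driven approach can be sketched with a toy subset of classes. The enum, table and function names below are hypothetical, and only four classes are covered (Other, CR, LF, Extend); the real table is generated by regen/mk_invlists.pl and covers every class. The entries follow the UAX #29 rules for these classes: no break between CR and LF, a break after and before CR and LF otherwise, no break before Extend, and a break everywhere else.

```c
#include <assert.h>

/* Toy subset of grapheme cluster break classes (hypothetical names). */
enum gcb_class { GCB_OTHER, GCB_CR, GCB_LF, GCB_EXTEND, GCB_COUNT };

/* gcb_table[before][after] is 1 if a cluster boundary falls between a
 * character of class 'before' and a following character of class 'after'. */
static const unsigned char gcb_table[GCB_COUNT][GCB_COUNT] = {
    /*               OTHER  CR  LF  EXTEND */
    /* OTHER  */ {   1,     1,  1,  0 },
    /* CR     */ {   1,     1,  0,  1 },
    /* LF     */ {   1,     1,  1,  1 },
    /* EXTEND */ {   1,     1,  1,  0 },
};

/* The runtime decision is a single array lookup instead of a switch. */
static int is_gcb_break(enum gcb_class before, enum gcb_class after)
{
    assert(before < GCB_COUNT && after < GCB_COUNT);
    return gcb_table[before][after];
}
```

Adding a class in this scheme means adding one row and one column to the table, rather than threading new cases through nested switch statements.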
*	Add qr/\b{lb}/	Karl Williamson	2016-01-19	1	-0/+12
	This adds the final Unicode boundary type previously missing from core Perl: the LineBreak one. This feature is already available in the Unicode::LineBreak module, but I've been told that there are portability and some other issues with that module. What's added here is a light-weight version that is lacking the customizable features of the module.
	This implements the default Line Breaking algorithm, but with the customizations that Unicode is expecting everybody to add, as their test file tests for them. In other words, this passes Unicode's fairly extensive furnished tests, but wouldn't if it didn't include certain customizations specified by Unicode beyond the basic algorithm.
	The implementation uses a look-up table of the characters surrounding a boundary to see if it is a suitable place to break a line. In a few cases, context needs to be taken into account, so there is code in addition to the lookup table to handle those.
	This should meet the needs for line breaking of many applications, without having to load the module.
	The algorithm is somewhat independent of the Unicode version, just like the other boundary types. Only if new rules are added, or existing ones modified, is there need to go in and change this code. Otherwise, running regen/mk_invlists.pl should be sufficient when a new Unicode release is done to keep it up-to-date, again like the other Unicode boundary types.
*	embed.fnc: fix some back indentation	David Mitchell	2016-01-15	1	-2/+2
	whitespace-only change
*	regexec.c: Add a parameter to a static function	Karl Williamson	2016-01-08	1	-1/+2
	This will allow new behavior, needed in a future commit.
*	regcomp.c: Make some params to a static fcn const	Karl Williamson	2015-12-22	1	-2/+4
	This is just acting on the TODO comment.