path: root/embed.h
Commit message | Author | Age | Files | Lines
* add wrap_keyword_plugin function (RT #132413) (Lukas Mai, 2017-11-11; 1 file, -0/+1)
* embed.fnc: Change fcn from A to X (Karl Williamson, 2017-11-08; 1 file, -1/+1)
  This function is marked as accessible anywhere, but experimental, and so is
  changeable at any time without notice, and its name begins with an
  underscore to indicate its private nature. I didn't know at the time I wrote
  it that we have an existing mechanism to deal with functions whose only use
  should be a public macro. This changes it to use that mechanism.
* Change name of internal function (Karl Williamson, 2017-11-08; 1 file, -1/+1)
  Following on the previous commit, this changes the name of the function that
  changes the variable, to be in sync with it.
* locale.c: Change static fcn name (Karl Williamson, 2017-11-08; 1 file, -1/+1)
  The new name more closely reflects what it does.
* locale.c: Refactor static fcn to save work (Karl Williamson, 2017-11-08; 1 file, -1/+1)
  This adds a parameter to the function that sets the radix character for
  floating point numbers. We know that the radix by default is a dot, so there
  is no need to calculate it in that case.

  This code was previously using localeconv() to find the locale's decimal
  point. The just-added my_nl_langinfo() fcn does the same with an easier API,
  is more thread-safe, and automatically switches to use localeconv() when
  nl_langinfo() isn't available, so revise the conditional compilation
  directives that previously were necessary, and collapse directives that
  were unnecessarily nested. Also adjust indentation.
* locale.c: Create extended internal Perl_langinfo() (Karl Williamson, 2017-11-08; 1 file, -0/+10)
  This extended version allows it to be called so that it uses the current
  locale for LC_NUMERIC, instead of toggling to the underlying one. (This can
  be useful when in the middle of things.) This ability won't be used until
  the next commit.
* toke.c: Add limit parameter to 3 static functions (Karl Williamson, 2017-11-06; 1 file, -3/+3)
  This will make it possible to handle embedded NULs in the next commits.
* dquote.c: Use memchr() instead of strchr() (Karl Williamson, 2017-11-06; 1 file, -2/+2)
  This allows \x and \o to work properly in the face of embedded NULs. A limit
  parameter is added to each function, and that is passed to memchr() (which
  replaces strchr()). See the branch merge message for more information.
* Add my_memrchr() implementation of memrchr() (Karl Williamson, 2017-11-01; 1 file, -0/+3)
  On platforms that have memrchr(), my_memrchr() maps to use that instead.
  This is useful functionality, lacking on many platforms. This commit also
  uses the new function in two places in the core where the comments
  previously indicated it would be advantageous to use it if we had it. It is
  left usable only in core, so that if this turns out to have been a bad
  idea, it can be easily removed.
* Add OP_MULTICONCAT op (David Mitchell, 2017-10-31; 1 file, -0/+1)
  Allow multiple OP_CONCAT, OP_CONST ops, plus optionally an OP_SASSIGN or
  OP_STRINGIFY, to be combined into a single OP_MULTICONCAT op, which can
  make things a *lot* faster: 4x or more.

  In more detail: it will optimise into a single OP_MULTICONCAT most
  expressions of the form

      LHS RHS

  where LHS is one of

      (empty)
      my $lexical =
      $lexical =
      $lexical .=
      expression =
      expression .=

  and RHS is one of

      (A . B . C . ...)    where A, B, C etc are expressions and/or
                           string constants
      "aAbBc..."           where a, A, b, B etc are expressions and/or
                           string constants
      sprintf "..%s..%s..", A, B, ...
                           where the format is a constant string containing
                           only '%s' and '%%' elements, and A, B, etc are
                           scalar expressions (so only a fixed,
                           compile-time-known number of args: no arrays or
                           list-context function calls etc)

  It doesn't optimise other forms, such as

      ($a . $b) . ($c . $d)
      ((($a .= $b) .= $c) .= $d);

  (although sub-parts of those expressions might be converted to an
  OP_MULTICONCAT). This is partly because it would be hard to maintain the
  correct ordering of tie or overload calls.

  The compiler uses heuristics to determine when to convert: in general,
  expressions involving a single OP_CONCAT aren't converted, unless some
  other saving can be made, for example if an OP_CONST can be eliminated, or
  in the presence of 'my $x = ..' which OP_MULTICONCAT can apply
  OPpTARGET_MY to, but OP_CONST can't.

  The multiconcat op is of type UNOP_AUX, with the op_aux structure directly
  holding a pointer to a single constant char* string plus a list of segment
  lengths. So for "a=$a b=$b\n" the constant string is "a= b=\n", and the
  segment lengths are (2,3,1).

  If the constant string has different non-utf8 and utf8 representations
  (such as "\x80"), then both variants are pre-computed and stored in the
  aux struct, along with two sets of segment lengths.

  For all the above LHS types, any SASSIGN op is optimised away. For a LHS
  of '$lex=', '$lex.=' or 'my $lex=', the PADSV is optimised away too. For
  example, where $a and $b are lexical vars, this statement:

      my $c = "a=$a, b=$b\n";

  formerly compiled to

      const[PV "a="] s
      padsv[$a:1,3] s
      concat[t4] sK/2
      const[PV ", b="] s
      concat[t5] sKS/2
      padsv[$b:1,3] s
      concat[t6] sKS/2
      const[PV "\n"] s
      concat[t7] sKS/2
      padsv[$c:2,3] sRM*/LVINTRO
      sassign vKS/2

  and now compiles to:

      padsv[$a:1,3] s
      padsv[$b:1,3] s
      multiconcat("a=, b=\n",2,4,1)[$c:2,3] vK/LVINTRO,TARGMY,STRINGIFY

  In terms of how much faster it is, this code:

      my $a = "the quick brown fox jumps over the lazy dog";
      my $b = "to be, or not to be; sorry, what was the question again?";
      for my $i (1..10_000_000) {
          my $c = "a=$a, b=$b\n";
      }

  runs 2.7 times faster, and if you throw utf8 mixtures in it gets even
  better. This loop runs 4 times faster:

      my $s;
      my $a = "ab\x{100}cde";
      my $b = "fghij";
      my $c = "\x{101}klmn";
      for my $i (1..10_000_000) {
          $s = "\x{100}wxyz";
          $s .= "foo=$a bar=$b baz=$c";
      }

  The main ways in which OP_MULTICONCAT gains its speed are:

  * Any OP_CONSTs are eliminated, and the constant bits (already in the
    right encoding) are copied directly from the constant string attached to
    the op's aux structure.

  * It optimises away any SASSIGN op, and possibly a PADSV op on the LHS,
    in all cases; OP_CONCAT only did this in very limited circumstances.

  * Because it has a holistic view of the entire concatenation expression,
    it can do the whole thing in one efficient go, rather than creating and
    copying intermediate results. pp_multiconcat() goes to considerable
    efforts to avoid inefficiencies: for example it will only SvGROW() the
    target once, and to the exact size needed, no matter what mix of utf8
    and non-utf8 appear on the LHS and RHS. It never allocates any temporary
    SVs except possibly in the case of tie or overloading.

  * It does all its own appending and utf8 handling rather than calling out
    to functions like sv_catsv().

  * It's very good at handling the LHS appearing on the RHS; for example in

        $x = "abcd";
        $x = "-$x-$x-";

    it will do roughly the equivalent of the following (where targ is $x):

        SvPV_force(targ);
        SvGROW(targ, 11);
        p = SvPVX(targ);
        Move(p, p+1, 4, char);
        Copy("-", p, 1, char);
        Copy("-", p+5, 1, char);
        Copy(p+1, p+6, 4, char);
        Copy("-", p+10, 1, char);
        SvCUR(targ) = 11;
        p[11] = '\0';

    Formerly, pp_concat would have used multiple PADTMPs or temporary SVs to
    handle situations like that.

  The code is quite big; both S_maybe_multiconcat() and pp_multiconcat()
  (the main compile-time and runtime parts of the implementation) are over
  700 lines each. It turns out that when you combine multiple ops, the
  number of edge cases grows exponentially ;-)
* add extra optimization phase (David Mitchell, 2017-10-31; 1 file, -0/+2)
  Add the function optimize_optree(). Optree optimization/finalization is
  now done in three main phases:

      1) optimize_optree(optree);
      2) CALL_PEEP(*startp);
      3) finalize_optree(optree);

  (1) and (3) are done in top-down order, while (2) is done in execution
  order. Note that this function doesn't actually optimize anything yet;
  this commit is just adding the necessary infrastructure.

  Adding this extra top-down phase allows certain combinations of ops to be
  recognised in ways that the peephole optimizer would find hard. For
  example in

      $a = expression1 . expression2 . expression3 . expression4

  the top-down tree looks like

      sassign
          concat
              concat
                  concat
                      expression1 ...
                      expression2 ...
                  expression3 ...
              expression4 ...
          padsv[$a]

  so it's easy to see the nested concats, while execution order looks like

      ... lots of ops for expression1 ...
      ... lots of ops for expression2 ...
      concat
      ... lots of ops for expression3 ...
      concat
      ... lots of ops for expression4 ...
      concat
      padsv[$a]
      sassign

  where it's not at all obvious that there is a chain of nested concats.
  Similarly, trying to do this in finalize_optree() is hard because the
  peephole optimizer will have messed things up. Also, it would be too late
  to remove nulled-out ops from the execution path.
* Assume we have sane C89 memcmp() (Aaron Crane, 2017-10-21; 1 file, -3/+0)
  "Sane" means that it works correctly on bytes with their high bit set, as
  C89 also requires. We therefore no longer need to probe for and/or use BSD
  bcmp().
* Assume we have C89 memcpy() and memmove() (Aaron Crane, 2017-10-21; 1 file, -3/+0)
  We can therefore also avoid probing for and/or using BSD bcopy().
* Don't look for a "safe" memcpy() (Aaron Crane, 2017-10-21; 1 file, -1/+1)
  C89 says that, if you want to copy overlapping memory blocks, you must use
  memmove(), and that attempting to copy overlapping memory blocks using
  memcpy() yields undefined behaviour. So we should never even attempt to
  probe for a system memcpy() implementation that just happens to handle
  overlapping memory blocks. In particular, the compiler might compile the
  probe program in such a way that Configure thinks overlapping memcpy()
  works even when it doesn't.

  This has the additional advantage of removing a Configure probe that needs
  to execute a target-platform program on the build host.
* Assume we have C89 memset() (Aaron Crane, 2017-10-21; 1 file, -6/+0)
  This means we also never need to consider using BSD bzero().
* (perl #127663) safer in-place editing (Tony Cook, 2017-09-11; 1 file, -0/+1)
  Previously, in-place editing opened the file then immediately *replaced*
  it, so if an error occurred while writing the output, such as running out
  of space, the content of the original file was lost. This changes in-place
  editing to write to a work file, which is renamed over the original only
  once the output file has been successfully closed.

  It also fixes an issue with setting setuid/setgid file modes for recursive
  in-place editing.
* Add API function Perl_langinfo() (Karl Williamson, 2017-09-09; 1 file, -14/+19)
  This is designed to generally replace nl_langinfo() in XS code. It is
  thread-safer, hides the quirks of perl's LC_NUMERIC handling, and can be
  used on systems lacking nl_langinfo().
* Add new API function sv_rvunweaken (Dagfinn Ilmari Mannsåker, 2017-09-04; 1 file, -0/+1)
  Needed to fix in-place sort of weak references in a future commit. Stolen
  from Scalar::Util::unweaken, which will be made to use this when available
  via CPAN upstream.
* [perl #131883] Include pkg in :prototype warnings (Father Chrysostomos, 2017-08-28; 1 file, -1/+1)
  The subref-in-stash optimisation was causing the package name to be dropped
  in prototype warnings triggered by the :prototype() attribute syntax, since
  the GV containing the stash name and the sub name did not exist because of
  the optimisation. Commit 2eaf799e, which introduced said optimisation,
  simply did not include the package name in validate_proto's 'name'
  parameter, but just the sub name. This commit makes it tell validate_proto
  to use the current stash name.
* Add another param to validate_proto (Father Chrysostomos, 2017-08-28; 1 file, -1/+1)
  I need this in order to fix bug #131883. Since it has a bit of churn, I'm
  putting it in a separate commit.
* add sv_string_from_errnum() (Zefram, 2017-08-19; 1 file, -0/+1)
  This is a new API function, partly substituting for the my_strerror() that
  was recently removed from the public API, but also incorporating the
  decoding work that's done for $!.
* Improve heuristic for UTF-8 detection in "$!" (Karl Williamson, 2017-08-18; 1 file, -1/+1)
  Previously, the stringification of "$!" was considered to be UTF-8 if it
  had any characters with the high bit set and everything was syntactically
  legal UTF-8. This can fail to guess correctly on short strings where there
  are only a few non-ASCII bytes, which could happen in languages based on
  the Latin script, where many words don't use non-ASCII.

  This commit adds a check that the locale is a UTF-8 one. That check is a
  call to an already-existing subroutine which goes to some lengths to get
  an accurate answer, and should be essentially completely reliable on
  modern systems that have nl_langinfo() and/or mbtowc().

  See the thread starting at
  http://nntp.perl.org/group/perl.perl5.porters/245902
* add cv_get_call_checker_flags() (Zefram, 2017-08-08; 1 file, -0/+1)
  The new cv_get_call_checker_flags() is the obvious counterpart to the
  existing cv_set_call_checker_flags(), which was added without providing
  any public way to retrieve the flag state. Not only does
  cv_get_call_checker_flags() return the CALL_CHECKER_REQUIRE_GV flag state,
  it also takes a flags parameter as an input, to allow for future
  expansion.

  The gflags input can at minimum be used for the caller to indicate which
  flags it understands, in case checker flags added in the future are not
  ignorable in the way that CALL_CHECKER_REQUIRE_GV is. In this commit the
  gflags parameter is applied to indicate whether the caller understands the
  CALL_CHECKER_REQUIRE_GV flag, or more precisely (due to the funny inverted
  sense of the flag) whether it understands the flag being clear. This use
  of gflags isn't really necessary, but establishes the pattern of usage.
* merge Perl_ck_cmp() and Perl_ck_eq() (David Mitchell, 2017-08-04; 1 file, -1/+0)
  I added ck_eq() recently; it's used for the EQ and NE ops, while ck_cmp()
  is used for LT, GT, LE, GE. This commit eliminates the ck_eq() function
  and makes ck_cmp() handle EQ/NE too. This will make it easier to extend
  the index() == -1 optimisation to handle index() >= 0 etc too. At the
  moment there should be no functional differences.
* hv_pushkv(): handle keys() and values() too (David Mitchell, 2017-07-27; 1 file, -1/+1)
  The newish function hv_pushkv() currently just pushes all key/value pairs
  on the stack, i.e. it does the equivalent of the perl code '() = %h'.
  Extend it so that it can handle 'keys %h' and 'values %h' too.

  This is basically moving the remaining list-context functionality out of
  do_kv() and into hv_pushkv(). The rationale for this is that hv_pushkv()
  is a pure HV-related function, while do_kv() is a pp function for several
  ops including OP_KEYS/VALUES, and expects PL_op->op_flags/op_private to be
  valid.
* create Perl_hv_pushkv() function (David Mitchell, 2017-07-27; 1 file, -0/+1)
  ...and make pp_padhv(), pp_rv2hv() use it rather than Perl_do_kv().

  Both pp_padhv() and pp_rv2hv() (via S_padhv_rv2hv_common()) outsource to
  Perl_do_kv() the list-context pushing/flattening of a hash onto the stack.
  Perl_do_kv() is a big function that handles all the actions of keys,
  values, etc. Instead, create a new function which does just the pushing of
  a hash onto the stack.

  At the same time, split it out into two loops, one for tied and one for
  normal hashes: the untied one can skip extending the stack on each
  iteration, and use a cheaper HeVAL() instead of calling hv_iterval().
* optimise (index() == -1) (David Mitchell, 2017-07-27; 1 file, -0/+1)
  Unusually, index() and rindex() return -1 on failure, so it's reasonably
  common to see code like

      if (index(...) != -1) { ... }

  and variants. For such code, this commit optimises away the OP_EQ and
  OP_CONST, and sets a couple of private flags on the index op instead,
  indicating:

      OPpTRUEBOOL       return a boolean which is a comparison of what the
                        return would have been, against -1
      OPpINDEX_BOOLNEG  negate the boolean result

  It also supports OPpTRUEBOOL in conjunction with the existing OPpTARGET_MY
  flag, so for example in

      $lexical = (index(...) == -1)

  the padmy, sassign, eq and const ops are all optimised away.
* embed.fnc: Fix declaration of my_strerror() (Karl Williamson, 2017-07-15; 1 file, -1/+1)
  This was improperly made public (but the docs indicate it should not be
  used by the public).
* embed.fnc: Change some functions only used in macros (Karl Williamson, 2017-07-15; 1 file, -5/+4)
  The X flag is used for this situation, where a function is public only
  because it is called from a public macro.
* Move bulk of POSIX::setlocale to locale.c (Karl Williamson, 2017-07-15; 1 file, -11/+7)
  This cleans up the interface, as it allows several functions that used to
  have to be callable from outside locale.c to now be static.
* Add debugging to locale handling (Karl Williamson, 2017-07-14; 1 file, -0/+1)
  These debug statements have proven useful in the past in tracking down
  problems. I looked them over and kept the ones that I thought might be
  useful in the future. This includes extracting some code into a static
  function so it can be called from more than one place.
* utf8n_to_uvchr() Properly test for extended UTF-8 (Karl Williamson, 2017-07-12; 1 file, -1/+5)
  It somehow dawned on me that the code is incorrect for warning about or
  disallowing very high code points. What is really wanted in the API is to
  catch UTF-8 that is not necessarily portable. There are several classes of
  this, but I'm referring here to just the code points that are above the
  Unicode-defined maximum of 0x10FFFF. These can be considered non-portable,
  and there is a mechanism in the API to warn/disallow these.

  However, an earlier standard defined UTF-8 to handle code points up to
  2**31-1. Anything above that is using an extension to UTF-8 that has never
  been officially recognized. Perl does use such an extension, and the API
  is supposed to have a different mechanism to warn/disallow on this.

  Thus there are two classes of warning/disallowing for above-Unicode code
  points: one for things that have some non-Unicode official recognition,
  and the other for things that have never had official recognition.

  UTF-EBCDIC differs somewhat in this, and since Perl 5.24 we have had a
  Perl extension that allows it to handle any code point that fits in a
  64-bit word. This kicks in at code points above 2**30-1, a different
  number than where extended UTF-8 kicks in on ASCII platforms.

  Things are also complicated by the fact that the API has provisions for
  accepting the overlong UTF-8 malformation: it is possible to use extended
  UTF-8 to represent code points smaller than 31-bit ones.

  Until this commit, the extended warning/disallowing was based on the
  resultant code point, and only when that code point did not fit into 31
  bits. But what is really wanted is whether extended UTF-8 was used to
  represent a code point, no matter how large the resultant code point is.
  This differs from the previous definition only on EBCDIC platforms, or
  when the overlong malformation is also present, so it does not affect very
  many real-world cases. This commit fixes that.

  It turns out that it is easier to tell if something is using extended
  UTF-8: one just looks at the first byte of a sequence.

  The trailing part of the warning message that gets raised is slightly
  changed to be clearer. It's not significant enough to affect perldiag.
* _byte_dump_string() callable from regcomp, regexec (Karl Williamson, 2017-07-02; 1 file, -1/+1)
  This changes the function so it's visible from re_comp and re_exec.
* hv.c: rename static function S_hfreeentries() to S_hv_free_entries() (Yves Orton, 2017-07-01; 1 file, -1/+1)
  hfreeentries() reads very poorly; hv_free_entries() makes more sense too.
* Add new function utf8_from_bytes_loc() (Karl Williamson, 2017-06-14; 1 file, -1/+1)
  This is currently undocumented externally, so we can change the API if
  needed. It is like utf8_from_bytes(), but when it cannot convert the whole
  string, it converts the initial substring that is convertible, and tells
  you where it had to stop.
* Add XS-callable function is_utf8_invariant_string_loc() (Karl Williamson, 2017-06-08; 1 file, -1/+1)
  This is like is_utf8_invariant_string(), but takes an additional
  parameter: a pointer into which it stores the location of the first
  variant character, if any is found.
* Remove deprecated function 'to_utf8_case()' (Karl Williamson, 2017-06-01; 1 file, -1/+0)
  This is in keeping with the schedule for 5.28.
* Relax fatal circumstances of unescaped '{' (Karl Williamson, 2017-06-01; 1 file, -0/+1)
  After the 5.26.0 code freeze, it came out that an application that many
  others depend on, GNU Autoconf, has an unescaped '{' in it. Commit
  7335cb814c19345052a23bc4462c701ce734e6c5 created a kludge that was minimal
  and designed to get just that one application to work.

  I originally proposed a less kludgy patch that was applicable across a
  larger set of applications. The proposed patch didn't fatalize uses of
  unescaped '{' where we don't anticipate using it for something other than
  its literal self. That approach worked for Autoconf, and for far more
  instances besides, but was more complicated, and was rejected as being too
  risky during code freeze.

  Now this commit implements my original suggestion. I am putting it in now
  to let it soak in blead, in case something else surfaces besides Autoconf
  that we need to work around. By having experience with the patch live, we
  can be more confident about using it, if necessary, in a dot release.
* Eliminate remaining uses of PL_statbuf (Dagfinn Ilmari Mannsåker, 2017-06-01; 1 file, -2/+2)
  Give Perl_nextargv its own statbuf and pass a pointer to it into
  Perl_do_open_raw and thence S_openn_cleanup when needed.

  Also reduce the scope of the existing statbuf in Perl_nextargv to make it
  clear it's distinct from the one populated by do_open_raw.

  Fix perldelta entry for PL_statbuf removal.
* Remove deprecated comma-less format variable lists (Dagfinn Ilmari Mannsåker, 2017-06-01; 1 file, -1/+0)
  This has been issuing a deprecation warning since perl 5.000.
* embed.fnc: _byte_dump_string is core-only (Karl Williamson, 2017-02-24; 1 file, -1/+1)
  This commit, made during the freeze, was approved by the pumpking.
* Followup on a4570f51 for t/porting/extrefs.t (Jarkko Hietaniemi, 2017-02-24; 1 file, -7/+5)
  More functions have appeared that are PERL_STATIC_INLINE, but
  porting/extrefs.t compiles with -DPERL_NO_INLINE_FUNCTIONS, which means no
  bodies are visible; the Tru64 cc takes static inline seriously, requiring
  the bodies.

  Instead of manually adding #ifndef PERL_NO_INLINE_FUNCTIONS tweaks to
  embed.fnc, fix the problem in embed.pl so that the 'i' type inserts the
  required ifndef. Remove the manual PERL_NO_INLINE_FUNCTIONS insertions
  made in a4570f51 (note that the types of some have diverged).

  Now extrefs.t again works in Tru64 (and no other compiler has ever tripped
  on this).
* Make _byte_dump_string() usable in all of core (Karl Williamson, 2017-02-13; 1 file, -1/+1)
  I found myself needing this function for development debugging; formerly
  it was only usable from utf8.c. This enhances it to allow a second format
  type, and makes it core-accessible.
* toke.c: Add internal function to abort parsing (Karl Williamson, 2017-02-13; 1 file, -0/+1)
  This is to be called to abort the parsing early, before the required
  number of errors have been found. It is used when continuing the parse
  would be fruitless, or when we could be looking at garbage.
* Extract code into a function (Karl Williamson, 2017-02-13; 1 file, -0/+1)
  This creates a function in toke.c to output the "compilation aborted"
  message, changing perl.c to call that function. This is in preparation for
  it to be called from a second place.
* toke.c: Fix bugs where UTF-8 is turned on in mid chunk (Karl Williamson, 2017-02-13; 1 file, -0/+1)
  Previous commits have tightened up the checking of UTF-8 for
  well-formedness in the input program or string eval. This is done in
  lex_next_chunk and lex_start. But that doesn't handle the case of

      use utf8;
      foo

  because 'foo' is checked while UTF-8 is still off. This solves that
  problem by noticing when utf8 is turned on, and then rechecking at the
  next opportunity.

  See the thread beginning at
  http://nntp.perl.org/group/perl.perl5.porters/242916

  This fixes [perl #130675]. A test will be added in a future commit.

  This catches some errors earlier than it used to and aborts, so some tests
  in the suite had to be split into multiple parts.
* toke.c: Remove unused param from static function (Karl Williamson, 2017-02-01; 1 file, -1/+1)
  Commit d2067945159644d284f8064efbd41024f9e8448a reverted commit
  b5248d1e210c2a723adae8e9b7f5d17076647431.

  b5248 removed a parameter from S_scan_ident, and changed its interior to
  use PL_bufend instead of that parameter. The parameter had been used to
  limit how far into the string being parsed scan_ident could look. In all
  calls to scan_ident but one, the parameter was already PL_bufend. In the
  one call where it wasn't, b5248 compensated by temporarily changing
  PL_bufend around the call, running afoul, eventually, of the expectation
  that PL_bufend points to a NUL.

  I would have expected the reversion to add back both the parameter and the
  uses of it, but apparently the function interior has changed enough since
  the original commit that it didn't even think there were conflicts. As a
  result, the parameter got added back, but not the uses of it.

  I tried both approaches to fix this: 1) changing the function to use the
  parameter; 2) simply deleting the parameter. Only the latter passed the
  test suite without error.

  I then tried to understand why the parameter was there in the first
  place, and why b5248 introduced the kludge to work around removing it. It
  appears to me that this is for the benefit of the intuit_more function,
  to enable it to discern $] from a $ ending a bracketed character class,
  by ending the scan before the ']' when in a pattern. The trouble is that
  modern scan_ident versions do not view themselves as constrained by
  PL_bufend: if that is reached at a point where white space is allowed,
  the function will try appending the next input line and continuing, thus
  changing PL_bufend. Thus the kludge in b5248 wouldn't necessarily do the
  expected limiting anyway. The reason approach 1) didn't work was that the
  function continued to use the original value, even after it had read in
  new things, instead of accounting for those. Hence approach 2) is used.

  I'm a little nervous about this, as it may lead to intuit_more() (which
  uses heuristics) having more cases where it makes the wrong choice about
  $] vs [...$]. But I don't see a way around this, and the pre-existing
  code could fail anyway.

  Spotted by Dave Mitchell.
* PATCH: [perl #130666]: Revert "toke.c, S_scan_ident(): Don't take a "end of buffer" argument, use PL_bufend" (Karl Williamson, 2017-01-29; 1 file, -1/+1)
  This reverts commit b5248d1e210c2a723adae8e9b7f5d17076647431. That commit,
  dating from 2013, was made unnecessary by the later removal of the MAD
  code. It temporarily changed the PL_bufend variable; doing that ran afoul
  of an assertion, added in fac0f7a38edc4e50a7250b738699165079b852d8, that
  expects PL_bufend to point to a terminating NUL.

  Beyond the reversion, a test is added here.
* add Perl_op_class(o) API function (David Mitchell, 2017-01-21; 1 file, -0/+1)
  Given an op, this function determines what type of struct it has been
  allocated as. It returns one of the OPclass enums, such as
  OPclass_LISTOP.

  Originally this was a static function in B.xs, but it has wider
  applicability; indeed, several XS modules on CPAN have cut and pasted it.

  It adds the OPclass enum to op.h. In B.xs there was a similar enum, with
  names like OPc_LISTOP. I've renamed them to OPclass_LISTOP etc. so as not
  to clash with the cut+paste code already on CPAN.
* Deprecate non-grapheme string delimiter (Karl Williamson, 2016-12-23; 1 file, -0/+3)
  In order for Perl eventually to allow string delimiters to be Unicode
  grapheme clusters (which look like a single character, but may be multiple
  ones), we have to stop allowing a single-char delimiter that isn't a
  grapheme by itself. These are unlikely to exist in actual code, as they
  would typically display as attached to the character in front of them,
  but we should be sure.