delta/perl.git - github.com: perl/perl5.git

	Commit message (Collapse)	Author	Age	Files	Lines
*	comment pp_foo aliases in pp*.c	David Mitchell	2014-09-19	1	-0/+39
\| \| \| \| \| \| \| \|	Where pp_foo() also handles OP_BAR, add a comment above the function mentioning that it also does pp_bar. This means that "grep pp_bar pp*.c" quickly locates the file/function that handles OP_BAR.
*	Remove !IS_PADGV assertions	Father Chrysostomos	2014-09-17	1	-3/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 60779a30f stopped setting PADTMP on GVs in pads, and changed many instances of if(SvPADTMP && !IS_PADGV) to if(SvPADTMP){assert(!IS_PADGV)...}. This PADTMP was leftover from the original ithreads implementation that marked these as PADTMP under the assumption that anything in a pad had to be marked PADMY or PADTMP. Since we don’t want these GVs copied the way operator targets are (PADTMP indicates the latter), we needed this !IS_PADGV check all over the place. 60779a30f was actually right in removing the flag and the !IS_PADGV check, because it should be possible for XS modules to create ops that return copiable GVs. But the assertions prevent that from working. More importantly (to me at least), this IS_PADGV doesn’t quite make sense and I am trying to eliminate it. BTW, you have to be doubly naughty, but it is possible to fail these assertions: $ ./perl -Ilib -e 'BEGIN { sub { $::{foo} = \@_; constant::_make_const(@_) }->(*bar) } \ foo' Assertion failed: (!IS_PADGV(sv)), function S_refto, file pp.c, line 576. Abort trap: 6
*	Stop undef &foo from temporarily anonymising	Father Chrysostomos	2014-09-15	1	-12/+1
\| \| \| \| \|	Instead of setting aside the name, calling cv_undef, and then naming the sub anew, just pass a flag to tell cv_undef not to unname it.
*	Fix assertion failure with undef &my_sub/&anon	Father Chrysostomos	2014-09-15	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	$ ./perl -Ilib -le 'use experimental lexical_subs; my sub x; undef &x;' Assertion failed: (isGV_with_GP(_gvname_hek)), function Perl_leave_scope, file scope.c, line 1035. Abort trap: 6 pp_undef undefines a subroutine via cv_undef, which wipes out the name, and then restores the name again afterwards. For subs with GVs, it would call CvGV_set afterwards with the same gv. But cv_undef could have freed the GV, if the CV held the only reference count. I caused this for lexical subs a few commits ago in ae77754ae (because CvGV will always return non-null; in fact the CvNAME_HEK code in pp_undef is no longer exercised, but I will address that soon). For anonymous subs it is older: $ perl5.14.4 -e '$_ = sub{}; delete $::{__ANON__}; undef &$_; use Devel::Peek; Dump $_' SV = IV(0x7fed9982f9c0) at 0x7fed9982f9d0 REFCNT = 1 FLAGS = (ROK) RV = 0x7fed9982f9e8 SV = PVCV(0x7fed9982e290) at 0x7fed9982f9e8 REFCNT = 2 FLAGS = (PADMY,WEAKOUTSIDE,CVGV_RC) COMP_STASH = 0x7fed99806b68 "main" ROOT = 0x0 GVGV::GV = 0x7fed9982fa48 Assertion failed: (isGV_with_GP(_gvname_hek)), function Perl_do_gvgv_dump, file dump.c, line 1477. Abort trap: 6 (Probably commit 803f2748.) Presumably that could be made to crash in other ways than introspection, but it is much harder. This commit fixes the problem by fiddling with reference counts. But this is only a temporary fix. I think I plan to stop cv_undef from removing the name (gv/hek) when called from pp_undef.
*	Avoid creating GVs when subs are declared	Father Chrysostomos	2014-09-15	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch changes ‘sub foo {...}’ declarations to store subroutine references in the stash, to save memory. Typeglobs still notionally exist. Accessing CvGV(cv) will reify them. Hence, currently the savings are lost when a sub call is compiled. $ ./miniperl -e 'sub foo{} BEGIN { warn $::{foo} } foo(); BEGIN { warn $::{foo} }' CODE(0x7f8ef082ad98) at -e line 1. *main::foo at -e line 1. This optimisation is skipped if the subroutine declaration contains a package separator. Concerning the changes in caller.t, this code: sub foo { print +(caller(0))[3],"\n" } my $fooref = delete $::{foo}; $fooref -> (); used to crash in 5.7.3 or thereabouts. It was fixed by 16658 (aka 07b8c804e8) to produce ‘(unknown)’ instead. Then in 5.13.3 it was changed (by 803f274) to produce ‘main::__ANON__’ instead. So the tests are really checking that we don’t get a crash. I think it is acceptable that it has now changed to ‘main::foo’.
*	Fix refcounting in rv2gv when it calls newGVgen	Father Chrysostomos	2014-09-02	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When the compiler (op.c) can’t figure out the name of a vivified filehandle based on the variable name, then pp.c:S_rv2gv (which vivifies the handle at run time) calls newGVgen, which generates something named _GEN_0 or suchlike. When it does that, the reference counting is wrong, because the stash gets a *_GEN_0 typeglob and the reference stored in open’s argument points to it, too; but the reference count is nevertheless 1. So if both sources shed their pointers to the GV, then you get a double free. Because usually the typeglob sits in the stash until program exit, this bug has gone unnoticed for a long time. This bug appears to have been present ever since rv2gv started calling newGVgen, in 2c8ac474a0.
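The refcounting mistake described above can be modelled with a toy counter. This is only a sketch: `Thing`, `thing_inc`, `thing_dec` and `releases_after_two_owners` are hypothetical names, not Perl internals, and instead of really freeing memory the model just counts how often the count reaches zero or below.

```c
#include <assert.h>

/* Toy model of a reference-counted value. Nothing here is Perl API. */
typedef struct {
    int refcnt;
    int free_calls;   /* times the object would have been freed */
} Thing;

static void thing_init(Thing *t) { t->refcnt = 1; t->free_calls = 0; }
static void thing_inc(Thing *t)  { t->refcnt++; }

static void thing_dec(Thing *t)
{
    /* A real implementation would free the object here; the model
     * only records the event so that a double free stays observable. */
    if (--t->refcnt <= 0)
        t->free_calls++;
}

/* Two owners point at the same object. If the second owner is not
 * counted, both owners dropping their pointer releases it twice. */
static int releases_after_two_owners(int count_second_owner)
{
    Thing t;

    thing_init(&t);            /* first owner (think: the stash entry) */
    if (count_second_owner)
        thing_inc(&t);         /* second owner (think: the reference) */
    thing_dec(&t);             /* first owner sheds its pointer */
    thing_dec(&t);             /* second owner sheds its pointer */
    assert(t.free_calls >= 1);
    return t.free_calls;
}
```

With the second owner uncounted the model reports two releases, which corresponds to the double free; with it counted there is exactly one.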
*	Make certain pp_sin result is always initialized.	Jarkko Hietaniemi	2014-09-01	1	-1/+2
\|
*	Avoid using function pointers for math functions.	Jarkko Hietaniemi	2014-08-31	1	-22/+17
\| \| \| \| \| \|	Otherwise AIX with long double has issues, see perl #122571. AIX has some rather intricate arrangement of symbols and macros. Also, it is okay to use two switches instead of just one.
*	[perl #122556] Make undef $s free refs forthwith	Father Chrysostomos	2014-08-28	1	-1/+2
\|
*	Make sprintf %c and chr() on inf/nan return the U+FFFD.	Jarkko Hietaniemi	2014-08-27	1	-11/+20
\| \| \| \| \|	%c was made to produce "Inf"/"NaN" earlier, but let's keep with the Unicode way, and make chr() agree with %c.
*	Add and use macros for case-insensitive comparison	Karl Williamson	2014-08-22	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds to handy.h isALPHA_FOLD_EQ(c1,c2) which efficiently tests if c1 and c2 are the same character, case-insensitively. For example isALPHA_FOLD_EQ(c, 's') returns true if and only if <c> is 's' or 'S'. isALPHA_FOLD_NE() is also added by this commit. At least one of c1 and c2 must be known to be in [A-Za-z] or this macro doesn't work properly. (There is an assert for this in the macro in DEBUGGING builds). That is why the name includes "ALPHA", so you won't forget when using it. This functionality has been in regcomp.c for a while, under a different name. I had thought that the only reason to make it more generally available was potential speed gain, but recent gcc versions optimize to the same code, so I thought there wasn't any point to doing so. But I now think that using this makes things easier to read (and certainly shorter to type in). Once you grok what this macro does, it simplifies what you have to keep in your mind when reading logical expressions with multiple operands. That something can be either upper or lower case can be a distraction to understanding the larger point of the expression.
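A minimal sketch of how such a comparison can work, assuming an ASCII character set, where upper- and lowercase letters differ only in one bit. The names `alpha_fold_eq`, `is_ascii_alpha` and `CASE_BIT` are made up for illustration; this is not a copy of the handy.h macro.

```c
#include <assert.h>

/* In ASCII, 'A' ^ 'a' is the single bit (0x20) that separates cases. */
#define CASE_BIT ('A' ^ 'a')

static int is_ascii_alpha(int c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* Case-insensitive equality. At least one argument must be a letter:
 * two non-letters that differ only in the case bit, such as '@' and
 * '`', would otherwise compare equal. */
static int alpha_fold_eq(int c1, int c2)
{
    assert(is_ascii_alpha(c1) || is_ascii_alpha(c2));
    return (c1 & ~CASE_BIT) == (c2 & ~CASE_BIT);
}
```

The assertion plays the role of the DEBUGGING-build check mentioned in the commit message: it documents the precondition instead of silently returning a wrong answer.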
*	pp.c: Fixed a quotemeta bug on perls built without locale.	Brian Fraser	2014-07-25	1	-3/+6
\| \| \| \| \|	This was causing quotemeta("\N{U+D7}") to not be quoted, as well as some other codepoints in the latin1 range.
*	refactor pp_ref	Daniel Dragan	2014-07-14	1	-6/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	similar to commit b3cf48215c -removed: -4/-8 pop on SP +4/+8 push on SP PUTBACK 1 non vol register save/restore (TARG not saved across the sv_ref()) TARG is not computed if the SV isn't a reference, so the PL_sv_no branch is slightly faster. On VC 2003 32 bit miniperl, this func dropped from 0x6D to 0x58 bytes of machine code.
*	Remove or downgrade unnecessary dVAR.	Jarkko Hietaniemi	2014-06-25	1	-110/+86
\| \| \| \| \| \| \| \|	You need to configure with g++ and -Accflags=-DPERL_GLOBAL_STRUCT or -Accflags=-DPERL_GLOBAL_STRUCT_PRIVATE to see any difference. (g++ does not do the "post-annotation" form of "unused".) The version code has some of these issues, reported upstream.
*	PERL_UNUSED_CONTEXT -> remove interp context where possible	Daniel Dragan	2014-06-24	1	-5/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Removing context params will save machine code in the callers of these functions, and 1 ptr of stack space. Some of these funcs are heavily used as mg_find. The contexts can always be readded in the future the same way they were removed. This patch inspired by commit dc3bf40570. Also remove PERL_UNUSED_CONTEXT when it's not needed. See removal candidate rejection rationale in [perl #122106]. -Perl_hv_backreferences_p uses context in S_hv_auxinit commit 96a5add60f was wrong -Perl_whichsig_sv and Perl_whichsig_pv wrongly used PERL_UNUSED_CONTEXT from inception in commit 84c7b88cca -in author's opinion cast_ shouldn't be public API, no CPAN grep usage, can't be static and/or inline optimized since it is exported -Perl_my_unexec move to block where it is needed, make Win32 block, context free, for inlining likelihood, private api and only 2 callers in core -Perl_my_dirfd make all blocks context free, then change proto -Perl_bytes_cmp_utf8 wrongly used PERL_UNUSED_CONTEXT from inception in commit fed3ba5d6b
*	Revert "/* NOTREACHED */ belongs *before* the unreachable."	Jarkko Hietaniemi	2014-06-19	1	-2/+1
\| \| \| \| \| \|	This reverts commit 148f39b7de6eae9ddd59e0b0aff691d6abea7aca. (Still needs more work, but wanted to see how well this passed with Jenkins.)
*	/* NOTREACHED */ belongs *before* the unreachable.	Jarkko Hietaniemi	2014-06-19	1	-1/+2
\| \| \| \| \| \|	Definitely not after it. It marks the start of the unreachable, not the first unreachable line. And if they are in that order, it looks better to linebreak after the lint hint.
*	PATCH: [perl #122126] BBC DBD::SQLite	Karl Williamson	2014-06-18	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	This problem turns out to be a misspelling in two places of a compiler definition. Since the definition didn't exist (as it was misspelled), the #ifdef failed. I don't know how really to test this as it is locale collation, which varies by locale, and we would be relying on vendor-supplied locales which may be inconsistent between platforms. I intend to tackle improvements to collation later this release cycle, and should come up with tests at that time. The failing tests in the module were comparing the Perl sort results with those of the module, and finding they differ.
*	PATCH: [perl #121816] Add warning for repetition x < 0	Karl Williamson	2014-06-17	1	-6/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	I consider this experimental, so that if code breaks as a result, we will remove it. I chose the numeric warnings category. But misc or a new subcategory of numeric might be better choices. There is also the issue if someone is calculating the repeat count in floating point and gets something that would be 0 if there were infinite precision, but ends up being a very small negative number. The current implementation will warn on that, but probably shouldn't. I suspect that this would be extremely rare in practice.
*	Some low-hanging -Wunreachable-code fruits.	Jarkko Hietaniemi	2014-06-15	1	-12/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- after return/croak/die/exit, return/break are pointless (break is not a terminator/separator, it's a goto) - after goto, another goto (!) is pointless - in some cases (usually function ends) introduce explicit NOT_REACHED to make the noreturn nature clearer (do not do this everywhere, though, since that would mean adding NOT_REACHED after every croak) - for the added NOT_REACHED also add /* NOTREACHED */ since NOT_REACHED is for gcc (and VC), while the comment is for linters - declaring variables in switch blocks is just too fragile: it kind of works for narrowing the scope (which is nice), but breaks the moment there are initializations for the variables (the initializations will be skipped since the flow will bypass the start of the block); in some easy cases simply hoist the declarations out of the block and move them earlier Note 1: Since after this patch the core is not yet -Wunreachable-code clean, not enabling that via cflags.SH, one needs to -Accflags=... it. Note 2: At least with the older gcc 4.4.7 there are far too many "unreachable code" warnings, which seem to go away with gcc 4.8, maybe better flow control analysis. Therefore, the warning should eventually be enabled only for modernish gccs (what about clang and Intel cc?)
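The point about declarations inside switch blocks can be illustrated with a small sketch. The function `scaled` and its values are invented for this example; the fragile form is shown only as a comment because reading the variable there would mean using an uninitialized value.

```c
/* Fragile pattern (not compiled here): control jumps straight to the
 * case label, so the initializer is never executed.
 *
 *     switch (kind) {
 *         int scale = 10;      // skipped: flow bypasses this line
 *     case 1:
 *         return n * scale;    // reads an uninitialized variable
 *     }
 */

/* Hoisted version: the declaration and its initialization sit before
 * the switch, so every case sees an initialized value. */
static int scaled(int kind, int n)
{
    int scale = 10;

    switch (kind) {
    case 1:
        return n * scale;
    case 2:
        return n * scale * 2;
    default:
        return n;
    }
}
```

Hoisting gives up the narrower scope, which is the trade-off the commit message accepts for the easy cases.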
*	Revert "Some low-hanging -Wunreachable-code fruits."	Jarkko Hietaniemi	2014-06-13	1	-14/+12
\| \| \| \| \| \| \|	This reverts commit 8c2b19724d117cecfa186d044abdbf766372c679. I don't understand - smoke-me came back happy with three separate reports... oh well, some other time.
*	Some low-hanging -Wunreachable-code fruits.	Jarkko Hietaniemi	2014-06-13	1	-12/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- after croak/die/exit (or return), break (or return!) are pointless (break is not a terminator/separator, it's a promise of a jump) - after goto, another goto (!) is pointless - in some cases (usually function ends) introduce explicit NOT_REACHED to make the noreturn nature clearer (do not do this everywhere, though, since that would mean adding NOT_REACHED after every croak) - for the added NOT_REACHED also add /* NOTREACHED */ since NOT_REACHED is for gcc (and VC), while the comment is for linters - declaring variables in switch blocks is just too fragile: it kind of works for narrowing the scope (which is nice), but breaks the moment there are initializations for the variables (they will be skipped!); in some easy cases simply hoist the declarations out of the block and move them earlier There are still a few places left.
*	pp.c: Add comment	Karl Williamson	2014-06-13	1	-1/+2
\|
*	Silence several -Wunused-parameter warnings about my_perl	Brian Fraser	2014-06-13	1	-0/+1
\| \| \| \| \| \| \| \|	This meant sprinkling some PERL_UNUSED_CONTEXT invocations, as well as stopping some functions from getting my_perl in the first place; all of the functions in the latter category are internal (S_ prefix and s or i in embed.fnc), so this should be both safe and economical.
*	Adding missing SVfARG() invocations	Brian Fraser	2014-06-13	1	-1/+1
\| \| \| \|	This silences a chunk of warnings under -Wformat
*	Fix some compilation warnings	Karl Williamson	2014-06-12	1	-1/+6
\| \| \| \| \| \|	After commits d6ded95025185cb1ec8ca3ba5879cab881d8b180 and 130c5df3625bd130cd1e2771308fcd4eb66cebb2, there are some compilation warnings if not all locale categories are used.
*	pp.c: Fix Win32 compilation problems	Karl Williamson	2014-06-12	1	-16/+8
\| \| \| \| \| \| \| \|	Commit 130c5df3625bd130cd1e2771308fcd4eb66cebb2 introduced errors into Windows (at least) compilations because it used #if's in the middle of apparent function calls, but these were really macros that turned the function call foo() into a call of Perl_foo(), and so we were doing an #if from within a #define which is not generally legal.
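A sketch of the general workaround for this class of problem, under stated assumptions: `pick` stands in for a function-like macro that forwards to a real function, and `HAVE_EXTRA` is a made-up feature macro. None of these names come from the Perl source. The conditional is resolved in ordinary statements first, and only the finished value is passed to the macro.

```c
/* Hypothetical function-like macro wrapping a real function, in the
 * spirit of foo() expanding to Perl_foo(). */
static int real_pick(int a, int b) { return a > b ? a : b; }
#define pick(a, b) real_pick((a), (b))

static int choose(int base)
{
    int extra;

    /* Problematic form (not compiled here): a preprocessor directive
     * inside the argument list of a function-like macro has undefined
     * behavior in C, and some compilers reject it.
     *
     *     pick(base,
     *     #ifdef HAVE_EXTRA
     *          base + 1
     *     #else
     *          0
     *     #endif
     *     );
     */

    /* Portable form: decide the argument outside the macro call. */
#ifdef HAVE_EXTRA
    extra = base + 1;
#else
    extra = 0;
#endif
    return pick(base, extra);
}
```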
*	Allow to compile if don't have LC_CTYPE etc defined	Karl Williamson	2014-06-12	1	-19/+82
\| \| \| \| \| \| \| \|	Commit d6ded95025185cb1ec8ca3ba5879cab881d8b180 introduced the ability to specify individual category parameters to 'use locale'. However in doing so, it causes Perl to not be able to compile on platforms that don't have some or all of those categories defined, such as Android. This commit uses #ifdefs to remedy that.
*	pp.c: Vertically stack ternary operators	Karl Williamson	2014-06-12	1	-2/+5
\| \| \| \|	This is for comprehensibility and to make a future commit easier.
*	Add parameters to "use locale"	Karl Williamson	2014-06-05	1	-18/+18
\| \| \| \| \| \| \|	This commit allows one to specify to enable locale-awareness for only a specified subset of the locale categories. Thus you could make a section of code LC_MESSAGES aware, with no locale-awareness for the other categories.
*	Accessing array before its start is dubious.	Jarkko Hietaniemi	2014-05-28	1	-4/+4
\| \| \| \| \| \| \| \| \|	Fix by petermartini. Fix for Coverity perl5 CID 28909: Out-of-bounds access (ARRAY_VS_SINGLETON) ptr_arith: Using &unsliced_keysv as an array. This might corrupt or misinterpret adjacent memory locations.
*	refactor pp_list	Daniel Dragan	2014-05-28	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \|	-move PL_stack_sp and PL_stack_base reads into the branch in which they are used, this also removes 1 var from being saved across the function call in GIMME, which removes saving and restoring 1 non-vol register -write SP to PL_stack_sp (PUTBACK) only if it was changed -POPMARK is mutable, it must execute on all branches this reduced pp_list's machine code size of the function from 0x58 to 0x53 bytes on VC 2003 -01 32 bits
*	fix the I32 bug for index() and rindex()	Tony Cook	2014-05-28	1	-5/+5
\|
*	don't set SvPADTMP() on PADGV's	David Mitchell	2014-02-27	1	-5/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Under threaded builds, GVs for OPs are stored in the pad rather than being directly attached to the op. For some reason, all such GV's were getting the SvPADTMP flag set. There seems to be no good reason for this, and after skipping setting the flag, all tests still pass. The advantage of not setting this flag is that there are quite a few hot places in the code that do if (SvPADTMP(sv) && !IS_PADGV(sv)) { ... I've replaced them all with if (SvPADTMP(sv)) { assert(!IS_PADGV(sv)); ... Since the IS_PADGV() macro expands to something quite heavyweight, this is quite a saving: for example this commit reduces the size of pp_entersub by 111 bytes.
*	Change av_len calls to av_tindex for clarity	Karl Williamson	2014-02-20	1	-3/+3
\| \| \| \| \| \|	av_tindex is a more clearly named synonym for av_len, available starting in v5.18. This changes the core uses to it, including modules in /ext, which are not dual-lifed.
*	[perl #121242] Fix crash in gp_free when gv is freed	Father Chrysostomos	2014-02-15	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 4571f4a caused the gp to have a refcount of 1, not 0, in gp_free when the contents of the glob are freed. This makes gv_try_downgrade see the gv as a candidate for downgrading in this example: sub Fred::AUTOLOAD { $Fred::AUTOLOAD } undef *{"Fred::AUTOLOAD"}; When the glob is undefined, the sub inside it is freed, and the gvop ($Fred::AUTOLOAD), when freed, tries to downgrade the glob (*Fred::AUTOLOAD). Since it is empty, it deletes it completely from the containing stash, so the GV is freed out from under gp_free, which is still using it, causing an assertion failure. We can trigger a similar condition more explicitly: $ ./miniperl -e 'DESTROY{delete $::{foo}} ${"foo"} = bless []; undef *{"foo"}' This bug is nothing new. On a non-debugging 5.18.2, I get this: $ perl5.18.2 -e 'DESTROY{delete $::{foo}} ${"foo"} = bless []; undef *{"foo"}' Attempt to free unreferenced glob pointers at -e line 1. Segmentation fault: 11 That crashes in pp_undef after the call to gp_free, because pp_undef continues to manipulate the GV. 
The problem occurs not only with pp_undef, but also with other functions calling gp_free: sv_setsv_flags: $ ./miniperl -e 'DESTROY{delete $::{foo}} ${"foo"} = bless []; *{"foo"}="bar"' glob_assign_glob: $ ./miniperl -e 'DESTROY{delete $::{foo}} ${"foo"} = bless []; *{"foo"}=*bar' sv_unglob, reached through various paths: $ ./miniperl -e 'DESTROY{delete $::{foo}} ${"foo"} = do {local *bar}; $${"foo"} = bless []; ${"foo"} = 3' $ ./miniperl -e 'DESTROY{delete $::{foo}} ${"foo"} = do {local *bar}; $${"foo"} = bless []; utf8::encode(${"foo"})' $ ./miniperl -e 'DESTROY{delete $::{foo}} ${"foo"} = do {local *bar}; $${"foo"} = bless []; open bar, "t/TEST"; ${"foo"} .= <bar>' $ ./miniperl -e 'DESTROY{delete $::{foo}} ${"foo"} = do {local *bar}; $${"foo"} = bless []; ${"foo"}++' $ ./miniperl -e 'DESTROY{delete $::{foo}} ${"foo"} = do {local *bar}; $${"foo"} = bless []; undef ${"foo"}' $ ./miniperl -e 'DESTROY{delete $::{foo}} ${"foo"} = 3; ${"foo"} =~ s/3/${"foo"} = do {local *bar}; $${"foo"} = bless []; 4/e' And there are probably more ways to trigger this through sv_unglob. (I stopped looking when I thought of the fix.) This patch fixes the problem by protecting the GV using the mortals stack in functions that call gp_free. I did not change gp_free itself, since it is an API function that as yet does not touch the mortals stack, and I am not sure that should change. All of its callers that this patch touches already do sv_2mortal in some circumstances.
*	pp.c: Silence compiler warning	Karl Williamson	2014-01-30	1	-1/+1
\| \| \| \| \| \|	The only time the result of toFOLD_LC() can be larger than a byte is in a UTF-8 locale, which has already been ruled out for this section of code.
*	Move the RXf_ANCH flags to intflags as PREGf_ANCH_xxx and add RXf_IS_ANCHORED as a replacement	Yves Orton	2014-01-31	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	The only requirement outside of the regex engine is to identify that there is an anchor involved at all. So we move the 4 anchor flags to intflags and replace it with a single aggregate flag RXf_IS_ANCHORED in extflags. This frees up another 3 bits in extflags.
*	Work properly under UTF-8 LC_CTYPE locales	Karl Williamson	2014-01-27	1	-15/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This large (sorry, I couldn't figure out how to meaningfully split it up) commit causes Perl to fully support LC_CTYPE operations (case changing, character classification) in UTF-8 locales. As a side effect it resolves [perl #56820]. The basics are easy, but there were a lot of details, and one troublesome edge case discussed below. What essentially happens is that when the locale is changed to a UTF-8 one, a global variable is set TRUE (FALSE when changed to a non-UTF-8 locale). Within the scope of 'use locale', this variable is checked, and if TRUE, the code that Perl uses for non-locale behavior is used instead of the code for locale behavior. Since Perl's internal representation is UTF-8, we get UTF-8 behavior for a UTF-8 locale. More work had to be done for regular expressions. There are three cases. 1) The character classes \w, [[:punct:]] needed no extra work, as the changes fall out from the base work. 2) Strings that are to be matched case-insensitively. These form EXACTFL regops (nodes). Notice that if such a string contains only characters above-Latin1 that match only themselves, that the node can be downgraded to an EXACT-only node, which presents better optimization possibilities, as we now have a fixed string known at compile time to be required to be in the target string to match. Similarly if all characters in the string match only other above-Latin1 characters case-insensitively, the node can be downgraded to a regular EXACTFU node (match, folding, using Unicode, not locale, rules). The code changes for this could be done without accepting UTF-8 locales fully, but there were edge cases which needed to be handled differently if I stopped there, so I continued on. 
In an EXACTFL node, all such characters are now folded at compile time (just as before this commit), while the other characters whose folds are locale-dependent are left unfolded. This means that they have to be folded at execution time based on the locale in effect at the moment. Again, this isn't a change from before. The difference is that now some of the folds that need to be done at execution time (in regexec) are potentially multi-char. Some of the code in regexec was trivial to extend to account for this because of existing infrastructure, but the part dealing with regex quantifiers had to have more work. Also the code that joins EXACTish nodes together had to be expanded to account for the possibility of multi-character folds within locale handling. This was fairly easy, because it already has infrastructure to handle these under somewhat different circumstances. 3) In bracketed character classes, represented by ANYOF nodes, a new inversion list was created giving the characters that should be matched by this node when the runtime locale is UTF-8. The list is ignored except under that circumstance. To do this, I created a new ANYOF type which has an extra SV for the inversion list. The edge case that caused the most difficulty is folding involving the MICRO SIGN, U+00B5. It folds to the GREEK SMALL LETTER MU, as does the GREEK CAPITAL LETTER MU. The MICRO SIGN is the only 0-255 range character that folds to outside that range. The issue is that it doesn't naturally fall out that it will match the CAP MU. If we let the CAP MU fold to the small mu at compile time (which it can because both are above-Latin1 and so the fold is the same no matter what locale is in effect), it could appear that the regnode can be downgraded away from EXACTFL to EXACTFU, but doing so would cause the MICRO SIGN to not case insensitively match the CAP MU. This could be special cased in regcomp and regexec, but I wanted to avoid that. 
Instead the mktables tables are set up to include the CAP MU as a character whose presence forbids the downgrading, so the special casing is in mktables, and not in the C code.
*	Taint more operands with case changes	Karl Williamson	2014-01-27	1	-41/+23
\| \| \| \| \| \| \| \| \| \|	The documentation says that Perl taints certain operations when subject to locale rules, such as lc() and ucfirst(). Prior to this commit there were exceptions when the operand to these functions contained no characters whose case change actually varied depending on the locale, for example the empty string or above-Latin1 code points. Changing to conform to the documentation simplifies the core code, and yields more consistent results.
*	Add some cBOOL casts to macros	Karl Williamson	2014-01-22	1	-4/+4
\| \| \| \| \| \|	I kept getting burned by these macros returning non-zero instead of a boolean, as they are used as bool parameters to functions. So I added cBOOLs to them.
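Why a non-zero value is not the same as a boolean can be shown with a sketch. It assumes a build where the boolean type is a plain 8-bit `unsigned char` (as in pre-C99 style code); `byte_bool`, `FLAG_WIDE`, `TO_BOOL` and the helper functions are hypothetical names chosen for this example.

```c
/* Hypothetical narrow boolean type; assumes CHAR_BIT == 8. */
typedef unsigned char byte_bool;

#define FLAG_WIDE 0x100u

/* Yields 0 or 0x100: non-zero, but not a boolean. */
#define HAS_WIDE_RAW(flags) ((flags) & FLAG_WIDE)

/* Normalizes any non-zero value to exactly 1. */
#define TO_BOOL(x) ((x) ? (byte_bool)1 : (byte_bool)0)
#define HAS_WIDE(flags) TO_BOOL((flags) & FLAG_WIDE)

static int takes_bool(byte_bool b) { return b ? 1 : 0; }

/* 0x100 does not fit in 8 bits, so the implicit conversion to
 * byte_bool drops the only set bit and the callee sees 0. */
static int raw_result(unsigned flags)
{
    return takes_bool(HAS_WIDE_RAW(flags));
}

/* With the normalizing cast the callee sees 1, as intended. */
static int fixed_result(unsigned flags)
{
    return takes_bool(HAS_WIDE(flags));
}
```

On builds where the boolean type is C99 `_Bool` the raw form happens to work, because conversion to `_Bool` already normalizes; the explicit cast makes the macro safe either way.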
*	pp.c: Remove unnecessary mask operation.	Karl Williamson	2014-01-01	1	-1/+1
\| \| \| \| \|	An unsigned character (U8) should not have more than 8 bits of data, so no need to force that by masking with 0xFF.
*	pp.c: Guard against malformed UTF-8 input in ord()	Karl Williamson	2014-01-01	1	-1/+2
\| \| \| \| \| \| \|	This code got the actual length of the input scalar, but discarded it. If that scalar contains malformed UTF-8 that has fewer bytes than is indicated, a read beyond-buffer-end could happen. Simply use the actual length.
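The kind of check involved can be sketched as follows. This assumes standard UTF-8 with sequences of at most four bytes, which is narrower than what Perl accepts internally, and the function names are invented for illustration. The idea is simply to compare the length the lead byte claims against the length the buffer actually has before reading continuation bytes.

```c
#include <stddef.h>

/* Length a UTF-8 sequence claims to have, judged from its first byte.
 * Returns 0 for a byte that cannot start a sequence. */
static size_t utf8_claimed_len(unsigned char first)
{
    if (first < 0x80)           return 1;
    if ((first & 0xE0) == 0xC0) return 2;
    if ((first & 0xF0) == 0xE0) return 3;
    if ((first & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Returns 1 if the first character can be decoded without reading
 * past buf[len - 1], and 0 if the input is empty or truncated. */
static int first_char_fits(const unsigned char *buf, size_t len)
{
    size_t need;

    if (len == 0)
        return 0;
    need = utf8_claimed_len(buf[0]);
    return need != 0 && need <= len;
}
```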
*	pp.c: Simplify lc and uc stringification code	Father Chrysostomos	2014-01-01	1	-39/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Originally, lc and uc would not warn about undef, due to an implementation detail. The implementation changed in 673061948, and extra code was added to keep the behaviour the same. Commit 0a0ffbced enabled the warnings about undef, but did so by adding even more code in the midst of the blocks that existed solely to avoid the warning. We can just delete those blocks and put in a simple stringification.
*	pp.c: Improve self-referential comment	Father Chrysostomos	2014-01-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	pp.c:pp_lc has this: /* Here is where we would do context-sensitive actions. See the * commit message for this comment for why there isn't any */ If I try to look up the commit that added the comment, I get this: commit 06b5486afd6f58eb7fdf8c5c8cdb8520a4c87f40 Author: Karl Williamson <public@khwilliamson.com> Date: Fri Nov 11 10:13:28 2011 -0700 pp.c: White-space only This outdents and reflows comments as a result of the removal of a surrounding block 86510fb15 was the commit that added the comment, whose commit message contains the explanation, so cite that directly.
*	Reënable in-place lc/uc	Father Chrysostomos	2014-01-01	1	-5/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It used to be that this code: for("$foo") { lc $_; ... } would modify $_, allowing other code in the ‘for’ block to see the changes (bug #43207). Commit 17fa077605 fixed that by changing the logic that determined whether lc/uc(first) could modify the scalar in place. In doing so, it stopped in-place modification from happening at all, because the condition became SvPADTMP && SvTEMP, which never happens. (SvPADTMP usually indicates an operator return value stored in a pad; i.e., a scalar that will next be used by the same operator again to return another value. SvTEMP indicates that the REFCNT will go down shortly, usually a temporary value created solely for the sake of returning something.) Now that bug #78194 is fixed, for("$foo") no longer exposes a PADTMP to the following code, so we can now assume (as was done erroneously before) that PADTMP indicates something like lc("$foo$bar") and modify pp_stringify’s return value in place. Also, we can extend this to apply to TEMP variables that have a reference count of 1, since they cannot be in use elsewhere. We skip TEMP variables with set-magic, because they could be tied, and SvSETMAGIC would have a side effect. (That could happen with lc(delete $h{tied_elem}).) Previously, this was skipped for uc and lc for overloaded references, since stringification could change the utf8ness. That is no longer sufficient. As of Perl 5.16, typeglobs and non-overloaded blessed references can also enable their utf8 flag upon stringification, if the stash or glob names contain wide characters. So I changed the !SvAMAGIC (not overloaded) to SvPOK (is a string already), which will cover most cases where this optimisation helps. The two tests added to the end of lc.t fail with !SvAMAGIC.
*	pp.c: Remove redundant diag_listed_as	Father Chrysostomos	2013-12-23	1	-1/+0
\|
*	Revert "make perl core quiet under -Wfloat-equal"	David Mitchell	2013-11-16	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \|	A suggested way of avoiding the warning on nv1 != nv2 by replacing it with (nv1 < nv2 \|\| nv1 > nv2), has too many issues with NaN. [perl #120538]. I haven't found any other way of selectively disabling the warning, so for now I'm just reverting the whole commit. This reverts commit c279c4550ce59702722d0921739b1a1b92701b0d.
*	make perl core quiet under -Wfloat-equal	David Mitchell	2013-11-09	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The gcc option -Wfloat-equal warns when two floating-point numbers are directly compared for equality or inequality, the idea being that this is usually a logic error, and that you should be checking that the values are instead very near to each other. perl on the other hand has lots of reasons to do a direct comparison. Add two macros, NV_eq_nowarn(a,b) and NV_ne_nowarn(a,b) that do the same as (a == b) and (a != b), but without the warnings. They achieve this by instead doing (a < b) \|\| ( a > b). Under gcc at least, this is optimised into the same code as the direct comparison. There are three places that I've left untouched, because they are handling NaNs, and that gets a bit tricky. In particular (nv != nv) is a test for a NaN, and replacing it with (< \|\| >) creates signalling NaNs (whereas == and != create quiet NaNs)
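The difference between the two forms shows up as soon as a NaN is involved. The sketch below uses two hypothetical helper functions rather than the macros from the commit, and relies only on the `NAN` constant from `<math.h>`: an ordered comparison is false whenever either operand is NaN, so the `<`/`>` form reports "not different" exactly where `!=` reports "different".

```c
#include <math.h>   /* only for the NAN constant; no libm calls */

/* Direct inequality: true for any NaN operand. */
static int ne_direct(double a, double b)
{
    return a != b;
}

/* Ordered replacement: both comparisons are false for a NaN operand,
 * so the result is 0 even though the values do not compare equal. */
static int ne_ordered(double a, double b)
{
    return (a < b) || (a > b);
}
```

For ordinary numbers the two helpers agree, which is why the substitution looks safe until NaN handling is considered.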
*	Stop lexical CORE sub from interfering with CORE::	Father Chrysostomos	2013-11-08	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The way CORE:: was handled in the lexer was convoluted. CORE was treated initially as a keyword, with exceptions in the lexer to make it behave correctly. If it turned out not to be followed by ::, then the lexer would fall back to treating it as a bareword or sub name. Before even checking for a keyword, the lexer looks for :: and goes to the bareword/sub code. But it made a special exception there for CORE::. In the end, treating CORE as a keyword recognized by the keyword() function requires more special cases than simply special-casing CORE:: in toke.c. This fixes the lexical CORE sub bug, while reducing the total number of lines.