delta/perl.git - github.com: perl/perl5.git

	Commit message	Author	Age	Files	Lines
*	Make global two interpreter variables	Karl Williamson	2018-07-14	1	-4/+0
\| \| \| \| \|	These variables are constant, once initialized, through the life of a program, so having them be per instance is a waste of time and space
*	Use compiled-in C structure for inverted case folds	Karl Williamson	2018-03-31	1	-4/+0
\| \| \| \| \| \| \| \| \| \|	This commit changes to use the C data structures generated by the previous commit to compute what characters fold to a given one. This is used to find out what things should match under /i. This now avoids the expensive start up cost of switching to perl utf8_heavy.pl, loading a file from disk, and constructing a hash from it.
*	Remove obsolete variables	Karl Williamson	2018-03-31	1	-1/+0
\| \| \| \| \|	These were for when some of the Posix character classes were implemented as swashes, which is no longer the case, so these can be removed.
*	Use charnames inversion lists	Karl Williamson	2018-03-31	1	-2/+0
\| \| \| \| \| \| \| \|	This commit makes the inversion lists for parsing character names global instead of interpreter level, so they can be initialized once per process, and no copies are created upon new thread instantiation. More importantly, this is another instance where utf8_heavy.pl no longer needs to be loaded, nor the definition files read from disk.
*	Move case change invlists from interpreter to global	Karl Williamson	2018-03-26	1	-6/+0
\| \| \| \| \|	These are now constant through the life of the program, so don't need to be duplicated at each new thread instantiation.
*	Move UTF-8 case changing data into core	Karl Williamson	2018-03-26	1	-0/+1
\|	Prior to this commit, if a program wanted to compute the case-change of a character above 0xFF, the C code would switch to perl, loading lib/utf8_heavy.pl and then read another file from disk, and then create a hash. Future references would use the hash, but the start up cost is quite large. There are five case change types, uc, lc, tc, fc, and simple fc. Only the first encountered requires loading of utf8_heavy, but each required switching to utf8_heavy, and reading the appropriate file from disk.

This commit changes these functions to use compiled-in C data structures (inversion maps) to represent the data. To look something up requires a binary search instead of a hash lookup.

An individual hash lookup tends to be faster than a binary search, but the differences are small for small sizes. I did some benchmarking some years ago (commit message 87367d5f9dc9bbf7db1a6cf87820cea76571bf1a) and the results were that for fewer than 512 entries, the binary search was just as fast as a hash, if not actually faster. Now, I've done some more benchmarks on blead, using the tool benchmark.pl, which wasn't available back then. The results below indicate that the differences are minimal up through 2047 entries, which all Unicode properties are well within.

A hash, PL_foldclosures, is still constructed at runtime for the case of regular expression /i matching, and this could be generated at Perl compile time, as a further enhancement for later. But reading a file from disk is no longer required to do this.

======================= benchmarking results =======================

Key:
    Ir   Instruction read
    Dr   Data read
    Dw   Data write
    COND conditional branches
    IND  indirect branches
    _m   branch predict miss
    _m1  level 1 cache miss
    _mm  last cache (e.g. L3) miss
    -    indeterminate percentage (e.g. 1/0)

The numbers represent raw counts per loop iteration.

"\x{10000}" =~ qr/\p{CWKCF}/"

               swash   invlist   Ratio %
               fetch    search
              ------   -------   -------
    Ir        2259.0    2264.0      99.8
    Dr         665.0     664.0     100.2
    Dw         406.0     404.0     100.5
    COND       406.0     405.0     100.2
    IND         17.0      15.0     113.3
    COND_m       8.0       8.0     100.0
    IND_m        4.0       4.0     100.0
    Ir_m1        8.9      17.0      52.4
    Dr_m1        4.5       3.4     132.4
    Dw_m1        1.9       1.2     158.3
    Ir_mm        0.0       0.0     100.0
    Dr_mm        0.0       0.0     100.0
    Dw_mm        0.0       0.0     100.0

These were constructed by using the file whose contents are below, which uses the property in Unicode that currently has the largest number of entries in its inversion list, > 1600. The test was run on blead -O2, no debugging, no threads. Then the cut-off boundary was changed from 512 to 2047 for when we use a hash vs an inversion list, and the test run again. This yields the difference between a hash fetch and an inversion list binary search.

===================== The benchmark file is below ===============

    no warnings 'once';

    my @benchmarks;

    push @benchmarks, 'swash' => {
        desc  => '"\x{10000}" =~ qr/\p{CWKCF}/"',
        setup => 'no warnings "once"; my $re = qr/\p{CWKCF}/; my $a = "\x{10000}";',
        code  => '$a =~ $re;',
    };

    \@benchmarks;
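To make the lookup cost discussed in this entry concrete, here is a minimal sketch of a binary search over an inversion list. It is not the code in perl's core; the names `demo_invlist` and `invlist_contains` and the sample data are hypothetical. It assumes the usual inversion-list convention: a sorted array in which each even-indexed element starts a range that is in the set and each odd-indexed element starts a range that is not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy inversion list encoding the set
 * {0x41..0x5A, 0x61..0x7A, 0x10000..0x1FFFF}.
 * Even indices open an "in the set" range, odd indices close it. */
static const uint32_t demo_invlist[] = {
    0x41, 0x5B, 0x61, 0x7B, 0x10000, 0x20000
};
#define DEMO_INVLIST_LEN (sizeof demo_invlist / sizeof demo_invlist[0])

/* Return 1 if cp is in the set described by list[0..len-1], else 0. */
int invlist_contains(const uint32_t *list, size_t len, uint32_t cp)
{
    size_t lo = 0, hi = len;            /* search window is [lo, hi) */

    assert(list != NULL || len == 0);
    if (len == 0 || cp < list[0])
        return 0;                       /* below the first range */

    /* Find the largest index whose element is <= cp. */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (list[mid] <= cp)
            lo = mid;
        else
            hi = mid;
    }
    /* An even index means cp falls inside an "in the set" range. */
    return (lo % 2) == 0;
}
```

The loop runs about log2(len) times, so even a list of 2047 entries needs at most 11 comparisons, which is consistent with the small differences reported in the table.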
*	Don't include interpreter variable unless used	Karl Williamson	2018-03-16	1	-0/+2
\| \| \| \| \| \| \|	This adds an #ifdef around this variable, so that it isn't defined unless used. Spotted by Daniel Dragan.
*	Make Unicode data structures global	Karl Williamson	2018-03-14	1	-24/+0
\| \| \| \| \| \| \| \| \| \|	These structures are read-only, use const C strings, and are truly global, so no need to have them be interpreter level. This saves duplicating and freeing them as threads come and go. In doing this, I noticed that not every one was properly being copied/deallocated, so this fixes some potential unreported bugs, and leaks.
*	Add thread-safe locale handling	Karl Williamson	2018-02-18	1	-0/+8
\| \| \| \| \| \|	This (large) commit allows locales to be used in threaded perls on platforms that support it. This includes recent Windows and Posix 2008 ones.
*	Latch LC_NUMERIC during critical sections	Karl Williamson	2018-02-18	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	It is possible for operations on threaded perls which don't 'use locale' to still change the locale. This happens when calling POSIX::localeconv() and I18N::Langinfo(), and in earlier perls, it can happen for other operations when perl has been initialized with the environment causing the various locale categories to not have a uniform locale. This commit causes the areas where the locale for this category should predictably be in one or the other state to be a critical section where another thread can't interrupt and change it. This is a separate mutex, so that only these particular operations will be held up.
*	Add Perl_setlocale()	Karl Williamson	2018-02-18	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	khw could not find any modules on CPAN that correctly use the C library function setlocale(). (The very few that do try, do not use it correctly, looking at the return value incorrectly, so they are broken.) This analysis does not include modules that call non-Perl libraries that may call setlocale(). And, a future commit will render the setlocale() function useless in some configurations on some platforms. So this commit adds Perl_setlocale(), for XS code to call, and which is always effective, but it should not be used to alter the locale except on platforms where the predefined variable ${^SAFE_LOCALES} evaluates to 1. This function is also what POSIX::setlocale() calls to do the real work.
*	Use proper #define to see if need PLnumeric underlying_obj	Karl Williamson	2018-02-18	1	-1/+1
\| \| \| \| \|	perl.h has a single #define which is the combination of several that determines if this object should be created or not.
*	Avoid changing locale when finding radix char	Karl Williamson	2018-01-30	1	-0/+5
\| \| \| \| \| \| \| \| \|	On systems that have the POSIX 2008 operations, including nl_langinfo_l(), this commit causes them to not have to actually change the locale when determining what the decimal point character is. The locale may have to change during the printing/reading of numbers, but eventually we can use sprintf_l(), if available, to avoid that too.
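As an illustration of the POSIX 2008 approach this entry describes, the sketch below queries a locale's radix string through a locale object, without touching the process-wide locale. The helper name `radix_of_locale` is hypothetical and this is not perl's implementation; `newlocale()`, `nl_langinfo_l()`, `freelocale()` and `RADIXCHAR` are the standard POSIX 2008 interfaces, and the sketch assumes a platform that provides them.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Copy the decimal point (radix) string of the named locale into out.
 * The global locale is never switched.  Returns 0 on success, -1 if the
 * locale cannot be created or out is too small. */
int radix_of_locale(const char *locale_name, char *out, size_t out_size)
{
    locale_t loc;
    const char *radix;
    size_t len;

    assert(locale_name != NULL && out != NULL);

    /* Build a standalone locale object for LC_NUMERIC only. */
    loc = newlocale(LC_NUMERIC_MASK, locale_name, (locale_t)0);
    if (loc == (locale_t)0)
        return -1;

    /* Ask that object, not the current locale, for its radix. */
    radix = nl_langinfo_l(RADIXCHAR, loc);
    len = strlen(radix);
    if (len >= out_size) {
        freelocale(loc);
        return -1;
    }
    memcpy(out, radix, len + 1);        /* copy before freeing the object */
    freelocale(loc);
    return 0;
}
```

Because no call to setlocale() or uselocale() is made, other threads never observe a changed locale while the radix is being looked up.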
*	Cache locale UTF8-ness lookups	Karl Williamson	2018-01-30	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some locales are UTF-8, some are not. Knowledge of this is needed in various circumstances. This commit saves the results of the last several lookups so they don't have to be recalculated each time. The full generality of POSIX locales is such that you can have error messages be displayed in one locale, say Spanish, while other things are in French. To accommodate this generality, the program can loop through all the locale categories finding the UTF8ness of the locale it points to. However, in almost all instances, people are going to be in either French or in Spanish, and not in some combination. Suppose it is a French UTF-8 locale for all categories. This new cache will know that the French locale is UTF-8, and the queries for all but the first category can return that immediately. This simple cache avoids the overhead of hashes. This also fixes a bug I realized exists in threaded perls, but haven't reproduced. We do not support locales in such perls, and the user must not change the locale or 'use locale'. But perl itself could change the locale behind the scenes, leading to segfaults or incorrect results. One such instance is the determination of UTF8ness. But this only could happen if the full generality of locales is used so that the categories are not all in the same locale. This could only happen (if the user doesn't change locales) if the environment is such that the perl program is started up so that the categories are in such a state. This commit fixes this potential bug by caching the UTF8ness of each category at startup, before any threads are instantiated, and so checking for it later just looks it up in the cache, without perl changing the locale.
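The "simple cache" idea can be sketched as a tiny most-recently-used array that is scanned linearly, so no hashing is involved. Everything here is illustrative: the cache size, the name-length cap, and in particular `probe_locale_is_utf8()`, which is a stand-in that merely inspects the locale name, whereas the real determination queries the C library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define UTF8NESS_CACHE_SIZE 4   /* hypothetical: "the last several lookups" */
#define LOCALE_NAME_MAX     64  /* hypothetical cap on a cached name */

struct utf8ness_entry {
    char name[LOCALE_NAME_MAX];
    int  is_utf8;
    int  in_use;
};

static struct utf8ness_entry utf8ness_cache[UTF8NESS_CACHE_SIZE];
int expensive_probe_calls = 0;  /* counts cache misses, for demonstration */

/* Stand-in for the costly lookup; it only looks at the name. */
static int probe_locale_is_utf8(const char *name)
{
    expensive_probe_calls++;
    return strstr(name, "UTF-8") != NULL || strstr(name, "utf8") != NULL;
}

int locale_is_utf8(const char *name)
{
    struct utf8ness_entry entry;
    size_t i, name_len;

    assert(name != NULL);
    memset(&entry, 0, sizeof entry);

    /* Linear scan: cheap for a handful of entries. */
    for (i = 0; i < UTF8NESS_CACHE_SIZE; i++) {
        if (utf8ness_cache[i].in_use
            && strcmp(utf8ness_cache[i].name, name) == 0)
        {
            /* Hit: move the entry to the front. */
            entry = utf8ness_cache[i];
            memmove(&utf8ness_cache[1], &utf8ness_cache[0],
                    i * sizeof utf8ness_cache[0]);
            utf8ness_cache[0] = entry;
            return entry.is_utf8;
        }
    }

    /* Miss: do the expensive work once. */
    entry.is_utf8 = probe_locale_is_utf8(name);
    name_len = strlen(name);
    if (name_len >= LOCALE_NAME_MAX)
        return entry.is_utf8;           /* too long to cache; still answer */

    /* Insert at the front, dropping the least recently used entry. */
    memcpy(entry.name, name, name_len + 1);
    entry.in_use = 1;
    memmove(&utf8ness_cache[1], &utf8ness_cache[0],
            (UTF8NESS_CACHE_SIZE - 1) * sizeof utf8ness_cache[0]);
    utf8ness_cache[0] = entry;
    return entry.is_utf8;
}
```

When every category names the same locale, only the first query pays for the probe; the remaining ones hit the first slot immediately.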
*	Avoid some unnecessary changing of locales	Karl Williamson	2018-01-30	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	The LC_NUMERIC locale category is kept so that generally the decimal point (radix) is a dot. For some (mostly) output purposes, it needs to be swapped into the program's current underlying locale so that a non-dot can be printed. This commit changes things so that if the current underlying locale already uses a dot as its decimal point, the swap doesn't happen, as it's not needed.
*	Remove unused interpreter variable	Karl Williamson	2017-12-26	1	-1/+0
\| \| \| \|	This somehow became unused or never got used; I didn't do the research.
*	Add script_run regex feature	Karl Williamson	2017-12-24	1	-0/+1
\| \| \| \|	As explained in the docs, this helps detect spoofing attacks.
*	make exec keep its argument list more reliably	Zefram	2017-12-14	1	-2/+0
\| \| \| \| \| \| \| \| \| \|	Bits of exec code were putting the constructed commands into globals PL_Argv and PL_Cmd, which could then be clobbered by reentrancy. These are only global in order to manage their freeing, but that's better managed by using the scope stack. So replace them with automatic variables, with ENTER/SAVEFREEPV/LEAVE to free the memory. Also copy the strings acquired from SVs, to avoid magic clobbering the buffers of SVs already read. Fixes [perl #129888].
*	Change name of locale per-interpreter variable	Karl Williamson	2017-11-08	1	-3/+3
\| \| \| \| \| \|	The real purpose of this internal variable is to give the name of the locale that is the underlying one for the C program. Various macros already indicate that. This furthers the process.
*	Change upper limit handling of -Dr output	Karl Williamson	2017-10-27	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 2bfbbbaf9ef1783ba914ff9e9270e877fbbb6aba changed things so -Dr output could be changed through an environment variable to truncate the output differently than the default. For most purposes, the default is good enough, but for someone trying to debug the regcomp internals, sometimes one wants to see more than is output by default. That commit did not catch all the places. This one changes the handling so that any place that uses the previous default maximum now uses the environment variable (if set) instead.
*	Don't use VOL internally, because "volatile" works just fine	Aaron Crane	2017-10-21	1	-1/+1
\| \| \| \|	However, we do preserve it outside PERL_CORE for the use of XS authors.
*	(perl #127663) create a separate random source for internal use	Tony Cook	2017-09-11	1	-0/+8
\| \| \| \| \|	and use it to initialize hash randomization and to inoculate against quadratic behaviour in pp_sort
*	Add API function Perl_langinfo()	Karl Williamson	2017-09-09	1	-0/+3
\| \| \| \| \| \|	This is designed to generally replace nl_langinfo() in XS code. It is thread-safer, hides the quirks of perl's LC_NUMERIC handling, and can be used on systems lacking nl_langinfo.
*	Make immortal SVs contiguous	David Mitchell	2017-07-27	1	-1/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Ensure that PL_sv_yes, PL_sv_undef, PL_sv_no and PL_sv_zero are allocated adjacently in memory. This allows the SvIMMORTAL() test to be more efficient, and will (in the next commit) allow SvTRUE() to be more efficient. In MULTIPLICITY builds the constraint is already met by virtue of them being adjacent items in the interpreter struct. For non-MULTIPLICITY builds, they were just 4 global vars with no guarantees of where they would be allocated. For this case, PL_sv_undef etc. are deleted as global vars and replaced with a new global var PL_sv_immortals[4], with #define PL_sv_yes (PL_sv_immortals[0]) etc in their place.
*	add PL_sv_zero	David Mitchell	2017-07-27	1	-0/+8
\| \| \| \| \| \| \| \| \| \|	it's like PL_sv_no, except that its string value is "0" rather than "". It can be used for example where a pp function wants to push a zero return value on the stack. The next commit will start to use it. Also update SvIMMORTAL() to be more efficient: it now checks whether the SV's address is in a range rather than individually checking against &PL_sv_undef, &PL_sv_no etc.
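The address-range test can be sketched as follows. This is a toy model, not perl's headers: `demo_sv`, `demo_immortals` and `demo_sv_is_immortal()` are hypothetical names, and only the index of the "yes" slot (0) comes from the log; the other indices are assumed to follow the order in which the log lists the variables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for perl's SV; the real struct is far richer. */
typedef struct demo_sv { int flags; } demo_sv;

/* Four immortals allocated adjacently, in the spirit of PL_sv_immortals[4].
 * Index 0 for "yes" is from the log; the rest is an assumed ordering. */
demo_sv demo_immortals[4];
#define DEMO_SV_YES   (demo_immortals[0])
#define DEMO_SV_UNDEF (demo_immortals[1])
#define DEMO_SV_NO    (demo_immortals[2])
#define DEMO_SV_ZERO  (demo_immortals[3])

demo_sv demo_ordinary;  /* an SV that is not one of the immortals */

/* One range check replaces four separate pointer comparisons. */
int demo_sv_is_immortal(const demo_sv *sv)
{
    uintptr_t p  = (uintptr_t)sv;
    uintptr_t lo = (uintptr_t)&demo_immortals[0];
    uintptr_t hi = (uintptr_t)(demo_immortals + 4);  /* one past the end */

    assert(sv != NULL);
    return p >= lo && p < hi;
}
```

The comparison goes through `uintptr_t` so that the sketch does not rely on relational comparison of pointers into unrelated objects.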
*	Eliminate remaining uses of PL_statbuf	Dagfinn Ilmari Mannsåker	2017-06-01	1	-1/+0
\| \| \| \| \| \| \| \| \| \|	Give Perl_nextargv its own statbuf and pass a pointer to it into Perl_do_open_raw and thence S_openn_cleanup when needed. Also reduce the scope of the existing statbuf in Perl_nextargv to make it clear it's distinct from the one populated by do_open_raw. Fix perldelta entry for PL_statbuf removal
*	Improve perlintern.pod docs for PL_dowarn	Aaron Crane	2017-01-07	1	-2/+4
\|
*	Create inversion list for Assigned code points	Karl Williamson	2016-12-23	1	-0/+1
\| \| \| \|	This will be used in a future commit.
*	Deprecate isFOO_utf8() macros	Karl Williamson	2016-12-23	1	-0/+1
\| \| \| \| \| \|	These macros are being replaced by a safe version; they now generate a deprecation message at each call site upon the first use there in each program run.
*	Change name of PL_ variable	Karl Williamson	2016-11-28	1	-2/+1
\| \| \| \| \| \| \|	This variable really means the character that replaces any embedded NULs when doing collation. Change the name accordingly. (Embedded NULs must be replaced because the libc function strxfrm is used, and it operates on C strings which have no embedded NULs.)
*	PATCH: [perl #129953] lib/locale.t failures on FREEBSD	Karl Williamson	2016-11-28	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I thought this bug was in FREEBSD, but when I went to gather the info needed to report it to the vendor, it turned out to be a mistake I had made. The problem is basically doubly encoding into UTF-8. In order to save CPU time, in a UTF-8 locale, I had stored a string as UTF-8 encoded. This string is to be inserted into a larger string. What I neglected to consider in this situation is that not all strings in such locales need be in UTF-8. The UTF-8 encoded insert could get added to a non-UTF-8 string, and the result later was switched to UTF-8, so the inserted string's bytes were individually converted to UTF-8, effectively a second time. This is a problem only if the inserted string is different when encoded in UTF-8 than not, and for this particular usage, on most platforms it was UTF-8 invariant, so did not show up, except on those platforms where it was variant. The solution is to store the replacement as a code point, and encode it as UTF-8 only if necessary, once. This actually simplifies the code.
*	rework perl #129903 - inf recursion from use of empty pattern in regex codeblock	Yves Orton	2016-11-01	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	FC didn't like my previous patch for this issue, so here is the one he likes better. With tests, etc. :-) The basic problem is that code like this: /(?{ s!!! })/ can trigger infinite recursion on the C stack (not the normal perl stack) when the last successful pattern in scope is itself. Since the C stack overflows this manifests as an untrappable error/segfault, which then kills perl. We avoid the segfault by simply forbidding the use of the empty pattern when it would resolve to the currently executing pattern. I imagine with a bit of effort someone can trigger the original SEGV, unlike my original fix which forbade use of the empty pattern in a regex code block. So if someone actually reports such a bug we might have to revert to the older approach of prohibiting this.
*	make PL_ pad vars be of type PADOFFSET	David Mitchell	2016-09-26	1	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that PADOFFSET is signed, make PL_comppad_name_fill, PL_comppad_name_floor, PL_padix, PL_constpadix, PL_padix_floor, PL_min_intro_pending and PL_max_intro_pending be of type PADOFFSET rather than I32, to match the rest of the pad interface. At the same time, change various I32 local vars in pad.c functions to be PADOFFSET.
*	Re-order intrp struct	Father Chrysostomos	2016-08-14	1	-3/+1
\|
*	Remove PL_maxo	Father Chrysostomos	2016-08-14	1	-1/+1
\|	We have an interpreter variable using memory, PL_maxo, which is defined to be the same as MAXO, a #defined constant. As far as I can tell, it is never used in lvalue context, in core or on CPAN, except for the initialisation in intrpvar.h. It can simply be removed and replaced with a macro defined as equivalent to MAXO.

It was added in this commit:

    commit 84ea024ac9cdf20f21223e686dddea82d5eceb4f
    Author: Perl 5 Porters <perl5-porters.nicoh.com>
    Date: Tue Jan 2 23:21:55 1996 +0000

        perl 5.002beta1h patch: perl.h

        5.002beta1 attempted some memory optimizations, but unfortunately they can result in a memory leak problem. This can be avoided by #define STRANGE_MALLOC. I do that here until consensus is reached on a better strategy for handling the memory optimizations.

        Include maxo for the maximum number of operations (needed for the Safe extension).

But apparently it is not needed for the Safe extension (tests pass without it).
*	Remove PL_(lex_)encoding and all dependent code	Father Chrysostomos	2016-07-13	1	-3/+0
\|
*	Do better locale collation in UTF-8 locales	Karl Williamson	2016-05-24	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On some platforms, the libc strxfrm() works reasonably well on UTF-8 locales, giving a default collation ordering. It will assume that every string passed to it is in UTF-8. This commit changes Perl to make sure that strxfrm's expectations are met. Likewise under a non-UTF-8 locale, strxfrm is expecting a non-UTF-8 string. And this commit makes sure of that as well. So, simply meeting strxfrm's expectations allows Perl to start supporting default collation in UTF-8 locales, and fixes it to work on single-byte locales with UTF-8 input. (Unicode::Collate provides tailorable functionality and is portable to platforms where strxfrm isn't as intelligent, but is a much more heavy-weight solution that may not be needed for particular applications.) There is a problem in non-UTF-8 locales if the passed string contains code points representable only in UTF-8. This commit causes them to be changed, before being passed to strxfrm, into the highest collating character in the locale that doesn't require UTF-8. They then will sort the same as that character, which means after all other characters in the locale but that one. In strings that don't have that character, this will generally provide exactly correct operation. There still is a problem, if that character, in the given locale, combines with adjacent characters to form a specially weighted sequence. Then, the change of these above-255 code points into that character can skew the results. See the commit message for 6696cfa7cc3a0e1e0eab29a11ac131e6f5a3469e for more on this. But it is really an illegal situation to have above-255 code points in a single-byte locale, so this behavior is a reasonable degradation when given illegal input. 
If two transformed strings compare exactly equal, Perl already uses the un-transformed versions to break ties, and there, these faked-up strings will collate so the above-255 code points sort after everything else, and in code point order amongst themselves.
*	locale.c: Change algorithm for strxfrm() trials	Karl Williamson	2016-05-24	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It's kind of guesswork deciding how big a buffer to give to strxfrm(). If you give it too small a one, it will fail. Prior to this commit, the buffer size was doubled and then strxfrm() was called again, looping until it worked, or we used too much memory. Each time a new locale is made, we try to minimize the necessity of doing this by calculating numbers 'm' and 'b' that can be plugged into the equation mx + b where 'x' is the size of the string passed to strxfrm(). strxfrm() is roughly linear with respect to its input's length, so this generally works without us having to do many loops to get a large enough size. But on many systems, strxfrm(), in failing, returns how much space you should have given it. On such systems, we can just use that number on the 2nd try and not have to keep guessing. This commit changes to do that. But on other systems this doesn't work. So the original method is retained if we determine that there are problems with strxfrm(), either from previous experience, or because using the size returned from the first trial didn't work.
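The two-try strategy can be sketched with the standard strxfrm() contract, under which the return value is the length the transformed string needs (excluding the terminating NUL). The helper name `xfrm_dup` and its parameters are hypothetical; `m` and `b` play the role of the per-locale estimate mx + b from the entry above. Unlike the commit, this sketch simply gives up instead of falling back to buffer doubling when the reported size turns out to be unreliable, and it does not guard against overflow of the size arithmetic.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Transform src for collation under the current LC_COLLATE locale.
 * Returns a malloc'd string the caller must free, or NULL on failure. */
char *xfrm_dup(const char *src, size_t m, size_t b)
{
    size_t guess, needed;
    char *buf, *bigger;

    assert(src != NULL);

    /* First try: the linear estimate m*x + b, plus room for the NUL. */
    guess = m * strlen(src) + b + 1;
    buf = malloc(guess);
    if (buf == NULL)
        return NULL;

    needed = strxfrm(buf, src, guess);
    if (needed < guess)
        return buf;                     /* the estimate was big enough */

    /* Second try: use the size strxfrm() said it wanted. */
    bigger = realloc(buf, needed + 1);
    if (bigger == NULL) {
        free(buf);
        return NULL;
    }
    buf = bigger;
    if (strxfrm(buf, src, needed + 1) > needed) {
        /* The reported size was not trustworthy on this platform. */
        free(buf);
        return NULL;
    }
    return buf;
}
```

Passing `m = 0, b = 0` forces the second path for any non-empty input, which is a convenient way to exercise it.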
*	Change mem_collxfrm() algorithm for embedded NULs	Karl Williamson	2016-05-24	1	-0/+1
\|	One of the problems in implementing Perl is that the C library routines forbid embedded NUL characters, which Perl accepts. This is true for the case of strxfrm() which handles collation under locale.

The best solution as far as functionality goes, would be for Perl to write its own strxfrm replacement which would handle the specific needs of Perl. But that is not going to happen because of the huge complexity in handling it across many platforms. We would have to know the location and format of the locale definition files for every such platform. Some might follow POSIX guidelines, some might not.

strxfrm creates a transformation of its input into a new string consisting of weight bytes. In the typical but general case, a 3 character NUL-terminated input string 'A B C 00' (spaces added for readability) gets transformed into something like:

    A¹ B¹ C¹ 01 A² B² C² 01 A³ B³ C³ 00

where the superscripted characters are weights for the corresponding input characters. Superscript 1 represents (essentially) the primary sorting key; 2, the secondary, etc, for as many levels as the locale definition gives. The 01 byte is likely to be the separator between levels, but not necessarily, and there could be some other mechanisms used on various platforms.

To handle embedded NULs, the simplest thing would be to just remove them before passing in to strxfrm(). Then they would be entirely ignored, which might not be what you want. You might want them to have some weight at the tertiary level, for example. It also causes problems because strxfrm is very context sensitive. The locale definition can define weights for specific sequences of any length (and the weights can be multi-byte), and by removing a NUL, two characters now become adjacent that weren't in the input, and they could now form one of those special sequences and thus throw things off.

Another way to handle NULs, that seemingly ignores them, but actually doesn't, is the mechanism in use prior to this commit. The input string is split at the NULs, and the substrings are independently passed to strxfrm, and the results concatenated together. This doesn't work either. In our example 'A B C 00', suppose B is a NUL, and should have some weight at the tertiary level. What we want is:

    A¹ C¹ 01 A² C² 01 A³ B³ C³ 00

But that's not at all what you get. Instead it is:

    A¹ 01 A² 01 A³ C¹ 01 C² 01 C³ 00

The primary weight of C comes immediately after the tertiary weight of A, but more importantly, a NUL, instead of being ignored at the primary levels, is significant at all levels, so that "a\0c" would sort before "ab".

Still another possibility is to replace the NUL with some other character before passing it to strxfrm. That was my original plan, to replace each NUL with the character that this code determines has the lowest collation order for the current locale. On strings that don't contain that character, the results would be as good as it gets for that locale. That character is likely to be ignored at higher weight levels, but have some small non-ignored weight at the lowest ones. And hopefully the character would rarely be encountered in practice. When it does happen, it and NUL would sort identically; hardly the end of the world. If the entire strings sorted identically, the NUL-containing one would come out before the other one, since the original Perl strings are used as a tie breaker.

However, testing showed a problem with this. If that other character is part of a sequence that has special weighting, the results won't be correct. With gcc, U+00B4 ACUTE ACCENT is the lowest collating character in many UTF-8 locales. It combines in Romanian and Vietnamese with some other characters to change weights, and hence changing NULs into U+B4 screws things up.

What I finally have come to do is a modification of this final approach, where the possible NUL replacements are limited to just characters that are controls in the locale. NULs are replaced by the lowest collating control. It would really be a defective locale if this control combined with some other character to form a special sequence. Often the character will be a 01, START OF HEADING. In the very unlikely case that there are absolutely no controls in the locale, 01 is used, because we have to replace it with something.

The code added by this commit is mostly utf8-ready. A few commits from now will make Perl properly work with UTF-8 (if the platform supports it). But until that time, this isn't a full implementation; it only looks for the lowest-sorting control that is invariant, where the UTF8ness doesn't matter. The added tests are marked as TODO until then.
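The final approach described in this entry (replace each embedded NUL with the lowest-collating control) can be sketched as below. Both function names are hypothetical and this is not the code in locale.c. The search is deliberately limited to the ASCII range, on the assumption that those controls are the "invariant" ones whose encoding is the same with or without UTF-8; the 01 fallback follows the entry.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Find the control character (ASCII range only) that collates lowest in
 * the current locale.  Falls back to 01 if no control is found. */
char lowest_collating_control(void)
{
    char best = '\001';                 /* fallback named in the entry */
    int found = 0;
    int c;

    for (c = 1; c < 128; c++) {
        char cand[2], cur[2];

        if (!iscntrl(c))
            continue;
        cand[0] = (char)c;  cand[1] = '\0';
        cur[0]  = best;     cur[1]  = '\0';
        if (!found || strcoll(cand, cur) < 0) {
            best = (char)c;
            found = 1;
        }
    }
    return best;
}

/* Copy the len bytes at s, replacing each embedded NUL, and terminate the
 * result so it can be handed to strxfrm().  The caller frees the result. */
char *replace_embedded_nuls(const char *s, size_t len, char replacement)
{
    char *out;
    size_t i;

    assert(s != NULL || len == 0);
    assert(replacement != '\0');

    out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    for (i = 0; i < len; i++)
        out[i] = (s[i] == '\0') ? replacement : s[i];
    out[len] = '\0';
    return out;
}
```

A caller would compute the replacement once per locale change and then pass the rewritten copy, not the original Perl string, to strxfrm().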
*	Keep track of if collation locale is UTF-8 or not	Karl Williamson	2016-05-24	1	-0/+1
\| \| \| \|	This will be used in future commits
*	Add environment variable for -Dr: PERL_DUMP_RE_MAX_LEN	Karl Williamson	2016-02-19	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The regex engine when displaying debugging info, say under -Dr, will elide data in order to keep the output from getting too long. For example, the number of code points in all of Unicode matched by \w is quite large, and so when displaying a pattern that matches this, only the first some number of them are printed, and the rest are truncated, represented by "...". Sometimes, one wants to see more than what the compiled-into-the-engine-max shows. This commit creates code to read this environment variable to override the default max lengths. This changes the lengths for everything to the input number, even if they have different compiled maximums in the absence of this variable. I'm not currently documenting this variable, as I don't think it works properly under threads, and we may want to alter the behavior in various ways as a result of gaining experience with using it.
*	intrvar.h: document PL_tmps_max	David Mitchell	2016-02-03	1	-1/+1
\| \| \| \|	Its name implies that it's the top allocated element; in fact it's top+1.
*	Add qr/\b{lb}/	Karl Williamson	2016-01-19	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds the final Unicode boundary type previously missing from core Perl: the LineBreak one. This feature is already available in the Unicode::LineBreak module, but I've been told that there are portability and some other issues with that module. What's added here is a light-weight version that is lacking the customizable features of the module. This implements the default Line Breaking algorithm, but with the customizations that Unicode is expecting everybody to add, as their test file tests for them. In other words, this passes Unicode's fairly extensive furnished tests, but wouldn't if it didn't include certain customizations specified by Unicode beyond the basic algorithm. The implementation uses a look-up table of the characters surrounding a boundary to see if it is a suitable place to break a line. In a few cases, context needs to be taken into account, so there is code in addition to the lookup table to handle those. This should meet the needs for line breaking of many applications, without having to load the module. The algorithm is somewhat independent of the Unicode version, just like the other boundary types. Only if new rules are added, or existing ones modified is there need to go in and change this code. Otherwise, running regen/mk_invlists.pl should be sufficient when a new Unicode release is done to keep it up-to-date, again like the other Unicode boundary types.
*	remove deprecated PL_timesbuf	Daniel Dragan	2016-01-17	1	-5/+0
\| \| \| \|	Saves memory in interp struct.
*	Fix broken fix for RT #127212	Aaron Crane	2016-01-17	1	-3/+5
\| \| \| \| \| \|	As ilmari++ points out, the fix didn't work on builds without PERL_IMPLICIT_CONTEXT (including non-threaded, non-multiplicity) or PERL_DEBUG_READONLY_COW.
*	Fix version numbers in intrpvar.h comments	Aaron Crane	2016-01-17	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	There are two version numbers in intrpvar.h that have been repeatedly but confusingly bumped by an older version of Porting/bump-perl-version. Now that the porting script ignores intrpvar.h, it's better to restore the version numbers to the way they were when they were originally added. The comment for PERL_LAST_5_18_0_INTERP_MEMBER was introduced in commit d399cf59bde32e412ae99791ae46a871c7337b42, and the comment for PL_timesbuf was introduced in 25983af42cdcf2dc1fea6717dac7aac441b6301d.
*	RT #127212: retain binary compatibility across plain/DEBUGGING	Aaron Crane	2016-01-17	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Niko Tyni of Debian points out that the size of the interpreter structure differs between plain and -DDEBUGGING builds, and that this breaks binary compatibility of XS modules between such builds. Making the definition of PL_memory_debug_header unconditional on PERL_TRACK_MEMPOOL (which itself is defined only on debug builds) eliminates this needless incompatibility. There is some confusion about whether plain and debug builds are expected to be compatible. Commit 1e8125c621275d18c74bc8dae3bfc3c03929fe1e (July 2010) refers in passing to "binary incompatible perls with the same API version (i.e. the same perl version configured with and without DEBUGGING)". But f2b88940d815760ad254d35a0ee1eb2ed8ce7762 (November 2009) says explicitly that "-DDEBUGGING and not need to be binary compatible with each other", and I think this explicit statement is a better example to follow. Further, this compatibility is clearly useful for our downstream packagers (as reported by Niko), and for any users who'd like to be able to use a debug build for tracking down problems (including those encountered while using modules with XS parts).
*	Bump the perl version in various places for 5.23.7	David Golden	2015-12-21	1	-2/+2
\|
*	Bump the perl version in various places for 5.23.6	Abigail	2015-11-20	1	-2/+2
\|
*	avoid (TAINTING_get && TAINT_get)	David Mitchell	2015-11-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	In various places we test for both (PL_tainting && PL_tainted). Since if tainting isn't enabled PL_tainted should never get set, it's more efficient to just test for (TAINT_get). We ensure that PL_tainted doesn't actually get set when !PL_tainting by changing some "setting" macros from PL_tainted = TRUE to PL_tainted = PL_tainting.
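The macro change can be illustrated with a toy model. The `demo_` names are hypothetical and only mirror the shape of the change described above: the "set" macro assigns the tainting flag rather than TRUE, so the tainted flag can never be set while tainting is off, and a single test of the tainted flag is enough.

```c
#include <assert.h>

/* Toy stand-ins for PL_tainting and PL_tainted. */
int demo_tainting = 0;
int demo_tainted  = 0;

/* Setting taint copies the tainting flag instead of assigning TRUE. */
#define DEMO_TAINT_set (demo_tainted = demo_tainting)
#define DEMO_TAINT_get (demo_tainted)

/* With the invariant "tainted implies tainting", one test suffices. */
int demo_is_tainted(void)
{
    assert(!demo_tainted || demo_tainting);
    return DEMO_TAINT_get;
}
```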