delta/perl.git - github.com: perl/perl5.git

	Commit message	Author	Age	Files	Lines
*	replace all instances of PERL_IMPLICIT_CONTEXT with MULTIPLICITY	Tomasz Konojacki	2021-06-09	1	-23/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Since the removal of PERL_OBJECT (acfe0abcedaf592fb4b9cb69ce3468308ae99d91) PERL_IMPLICIT_CONTEXT and MULTIPLICITY have been synonymous and they're being used interchangeably. To simplify the code, this commit replaces all instances of PERL_IMPLICIT_CONTEXT with MULTIPLICITY. PERL_IMPLICIT_CONTEXT will stay defined for compatibility with XS modules.
*	Fix broken PERL_MEM_LOG under threads	Karl Williamson	2020-12-19	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes GH #18341. There are problems with getenv() on threaded perls which can lead to incorrect results when compiled with PERL_MEM_LOG. Commit 0b83dfe6dd9b0bda197566adec923f16b9a693cd fixed this for some platforms, but as Tony Cook pointed out, there may be standards-compliant platforms that it didn't fix. The detailed comments outline the issues and the (complicated) full solution.
*	Remove obsolete FCRYPT ifdefs and associated PL_cryptseen (#17624)	Richard Leach	2020-07-30	1	-1/+0
\| \| \|	Co-authored-by: Karl Williamson <khw@cpan.org>
*	Remove PERL_GLOBAL_STRUCT	Dagfinn Ilmari Mannsåker	2020-07-20	1	-129/+0
\| \| \| \| \| \| \| \|	This was originally added for MinGW, which no longer needs it, and only still used by Symbian, which is now removed. This also leaves perlapi.[ch] empty, but we keep the header for CPAN backwards compatibility.
*	Make PL_utf8_foldclosures interpreter level	Karl Williamson	2020-06-02	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This resolves #17774. This ticket is because the fixes in GH #17154 failed to get every case, leaving this one outlier to be fixed by this commit. The text in https://github.com/Perl/perl5/issues/17154 gives extensive details as to the problem. But briefly, in an attempt to speed up interpreter cloning, I moved certain SVs from interpreter level to global level in e80a0113c4a8036dfb22aec44d0a9feb65d36fed (v5.27.11, March 2018). This was doable, we thought, because the content of these SVs is constant throughout the life of the program, so no need to copy them when cloning a new interpreter or thread. However when an interpreter exits, all its SVs get cleaned up, which caused these to become garbage in applications where another interpreter remains running. This circumstance is rare enough that the bug wasn't reported until September 2019, #17154. I made an initial attempt to fix the problem, and closed that ticket, but I overlooked one of the variables, which was reported in #17774, which this commit addresses. Effectively the behavior is reverted to the way it was before e80a0113c4a8036dfb22aec44d0a9feb65d36fed.
*	Add mutex for accessing ENV	Karl Williamson	2020-03-11	1	-0/+2
*	optimize sort by inlining comparison functions	Tomasz Konojacki	2020-03-09	1	-1/+0
\| \| \| \| \| \| \| \|	This makes special-cased forms such as sort { $b <=> $a } even faster. Also, since this commit removes PL_sort_RealCmp, it fixes the issue with nested sort calls mentioned in gh #16129
*	Fixup POSIX::mbtowc, wctomb	Karl Williamson	2020-02-19	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit enhances these functions so that on threaded perls, they use mbrtowc and wcrtomb when available, making them thread safe. The substitution isn't completely transparent, as no effort is made to hide any differences in errno setting upon error. And there may be slight differences in edge case behavior on some platforms. This commit also changes the behaviors so that they take a scalar parameter instead of a char *, and this might be 'undef' or not be forceable into a valid PV. If not a PV, the functions initialize the shift state. Previously the shift state was always reinitialized with every call, which meant these could not work on locales with shift states. In addition, there were several issues in mbtowc and wctomb that this commit fixes. mbtowc and wctomb, when used, are now run with a semaphore. This avoids races if called at the same time in another thread. The returned wide character from mbtowc() could well have been garbage. The final parameter to mbtowc is now optional, as passing an SV allows us to determine the length without the need for an extra parameter. It is now used only to restrict the parsing of the string to shorter than the actual length. wctomb would segfault if the string parameter was shared or hadn't been pre-allocated with a string of sufficient length to hold the result.
*	POSIX::mblen() Make thread-safe; allow shift state control	Karl Williamson	2020-02-19	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit changes the behavior so that it takes a scalar parameter instead of a char *, and thus might not be forceable into a valid PV. When not a PV, the shift state is reinitialized, like calling mblen with a NULL first parameter. Previously the shift state was always reinitialized with every call, which meant this could not work on locales with shift states. This commit also changes to use mbrlen() on threaded perls transparently (mostly), when available, to achieve thread-safe operation. It is not completely transparent because mbrlen (under the very rare stateful locales) returns a different value when it's resetting the shift state. It also may set errno differently upon errors, and no effort is made to hide that difference. Also mbrlen on some platforms can handle partial characters. [perl #133928] showed that someone was having trouble with shift states.
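The difference between mblen() and mbrlen() described above comes down to who owns the shift state. The following is a minimal sketch in plain C, not the POSIX.xs code: the helper name `demo_count_chars` is hypothetical, and it only illustrates how a caller-owned `mbstate_t` lets the state persist across calls without relying on hidden static storage.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the characters in the first `len` bytes of a
 * multibyte string, keeping the shift state in a caller-owned mbstate_t
 * instead of the hidden static state that mblen() uses.
 * Returns -1 on an invalid or incomplete sequence. */
static long demo_count_chars(const char *s, size_t len)
{
    mbstate_t state;
    long count = 0;

    assert(s != NULL);
    memset(&state, 0, sizeof state);    /* start in the initial shift state */

    while (len > 0) {
        size_t n = mbrlen(s, len, &state);

        if (n == (size_t)-1 || n == (size_t)-2)
            return -1;                  /* invalid or truncated character */
        if (n == 0)
            n = 1;                      /* an embedded NUL occupies one byte */
        s += n;
        len -= n;
        count++;
    }
    return count;
}
```

Because the state lives on the caller's stack, two threads running this loop at the same time do not interfere with each other, which is the property the commit is after.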
*	Revert "Move PL_check to the interp vars to fix threading issues"	Tony Cook	2019-12-16	1	-1/+2
\| \| \| \| \|	and the associated commits, at least until a way to make wrap_op_checker() work is available.
*	Move PL_check to the interp vars to fix threading issues	Stefan Seifert	2019-12-12	1	-2/+1
\| \| \| \|	Fixes issue #14816
*	Move regex global variables to interpreter level	Karl Williamson	2019-11-26	1	-62/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is part of fixing gh #17154. This scenario from the ticket (https://github.com/Perl/perl5/issues/17154#issuecomment-558877358) shows why this fix is necessary: (1) the main interpreter initializes PL_AboveLatin1 to an SV it owns; (2) it loads threads::lite and creates a new thread/interpreter, which initializes PL_AboveLatin1 to an SV owned by the new interpreter; (3) the threads::lite child interpreter finishes, freeing all of its SVs, so PL_AboveLatin1 is now invalid; (4) the main interpreter uses a regexp that relies on PL_AboveLatin1 and dies horribly. By making these interpreter level variables, this is avoided. There is extra copying, but it is just the SV headers, as the real data is kept as static C arrays.
*	add explicit 1-arg and 3-arg sig handler functions	David Mitchell	2019-11-18	1	-0/+6
\| \| \| \| \| \| \|	Currently, whether the OS-level signal handler function is declared as 1-arg or 3-arg depends on the configuration. Add explicit versions of these functions, principally so that POSIX.xs can call whichever version of the handler it wants regardless of configuration: see next commit.
*	Remove generation and use of NonFinalFold table	Karl Williamson	2019-11-16	1	-2/+0
\| \| \| \| \| \|	With the revamping done in cc288b7a2732c37504039083ebb98241954636be, the table of Unicode case folds that are more than a single character is no longer used, so there is no need to generate it or to have it available.
*	Remove swashes from core	Karl Williamson	2019-11-06	1	-5/+0
\| \| \| \|	Also references to the term.
*	intrpvar.h: Add variable for use in tr///	Karl Williamson	2019-11-06	1	-0/+1
\| \| \| \|	This is part of this branch of changes.
*	Rmv more deprecated characlassify/case change macros	Karl Williamson	2019-10-31	1	-1/+0
\| \| \| \|	These were missed by 059703b088f44d5665f67fba0b9d80cad89085fd.
*	Add hook for Unicode private use override	Karl Williamson	2019-03-07	1	-0/+2
\| \| \| \| \| \| \| \| \| \|	I am starting to write a Unicode::Private_Use module which will allow one to specify the Unicode properties of private use code points, thus making them actually useful. This commit adds a hook to regcomp.c to accommodate this module. The changes are pretty minimal. This way we don't have to wait another release cycle to get it out there. I don't want to document this interface, until it's proven.
*	fix thread issue with PERL_GLOBAL_STRUCT	David Mitchell	2019-02-19	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The MY_CXT subsystem allows per-thread pseudo-static data storage. Part of the implementation for this involves each XS module being assigned a unique index in its my_cxt_index static var when first loaded. Because PERL_GLOBAL_STRUCT bans any static vars, under those builds there is instead a table which maps the MY_CXT_KEY identifying string to index. Unfortunately, this table was allocated per-interpreter rather than globally, meaning if multiple threads tried to load the same XS module, crashes could ensue. This manifested itself in failures in ext/XS-APItest/t/keyword_plugin_threads.t The fix is relatively straightforward: allocate PL_my_cxt_keys globally rather than per-interpreter. Also record the size of this struct in a new var, PL_my_cxt_keys_size, rather than doing double duty on PL_my_cxt_size.
*	foo_cloexec() under PERL_GLOBAL_STRUCT_PRIVATE	David Mitchell	2019-02-19	1	-0/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix the various Perl_PerlSock_dup2_cloexec() type functions so that t/porting/liberl.a passes under -DPERL_GLOBAL_STRUCT_PRIVATE builds. In these builds it is forbidden to have any static variables, but each of these functions (via convoluted macros) has a static var called 'strategy' which records, for each function, whether a run-time probe has been done to determine the best way of achieving close-exec functionality, and the result. Replace them all with 'global' vars: PL_strategy_dup2 etc. NB these vars aren't thread-safe but it doesn't really matter, as the worst that can happen is for a redundant probe or two to be done before a suitable "don't probe any more" value is written to the var and seen by all the threads.
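The "probe once, remember the answer" pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the perl source: `demo_strategy_dup2` stands in for PL_strategy_dup2, and the enum values, `demo_probe` and `demo_get_strategy` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical strategy values; 0 means "not probed yet". */
enum demo_strategy { DEMO_UNPROBED = 0, DEMO_USE_ATOMIC, DEMO_USE_FALLBACK };

/* Plays the role of a global such as PL_strategy_dup2: one variable per
 * function, at file scope rather than as a function-local static. */
static int demo_strategy_dup2 = DEMO_UNPROBED;

/* Counts how often the (pretend) run-time probe actually runs. */
static int demo_probe_count = 0;

/* Stand-in for the real run-time probe; it always reports the fallback. */
static int demo_probe(void)
{
    demo_probe_count++;
    return DEMO_USE_FALLBACK;
}

/* Return the cached strategy, probing only if nothing is recorded yet. */
static int demo_get_strategy(void)
{
    if (demo_strategy_dup2 == DEMO_UNPROBED)
        demo_strategy_dup2 = demo_probe();
    assert(demo_strategy_dup2 != DEMO_UNPROBED);
    return demo_strategy_dup2;
}
```

With a plain `int` like this, two threads arriving together could both run the probe before either stores the result, which is the redundant-probe case the commit message accepts. The sketch itself is only meant to be read as single-threaded.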
*	Add global hash to handle \p{user-defined}	Karl Williamson	2019-02-14	1	-0/+4
\| \| \| \| \| \| \|	A global hash has to be specially handled. The keys can't be shared, and all the SVs stored into it must be in its thread. This commit adds the hash, and initialization, and macros for context change, but doesn't use them. The code to deal with this is entirely confined to regcomp.c.
*	Add mutex for dealing with qr/\p{user-defined}/	Karl Williamson	2019-02-14	1	-0/+2
\| \| \| \|	This will be used in future commits
*	Add variable for if the current UTF-8 locale is Turkic	Karl Williamson	2019-02-05	1	-0/+1
\| \| \| \|	It currently is always set false, until later in this series of commits.
*	regen/mk_invlists.pl: Create new inversion list	Karl Williamson	2019-02-05	1	-0/+2
\| \| \| \|	This will be used in a future commit.
*	Change name of PL_NonL1NonFinalFold	Karl Williamson	2018-12-25	1	-2/+2
\| \| \| \| \|	The inversion list this refers to now includes the Latin 1 range, so the name was misleading.
*	Change name of PL_utf8_foldable variable	Karl Williamson	2018-12-25	1	-2/+2
\| \| \| \| \| \|	This variable's name was out-of-date and misleading. It is the name of an inversion list that contains all the code points in the current version of Unicode that participate in any way in a /i type of fold.
*	regen/mk_invlists.pl: Add new table	Karl Williamson	2018-12-07	1	-0/+2
\| \| \| \| \| \| \|	This table contains all the code points that are in any multi-character fold (not the folded-from character, but what that character folds to). It will be used in a future commit.
*	Make global two interpreter variables	Karl Williamson	2018-07-14	1	-2/+4
\| \| \| \| \|	These variables are constant, once initialized, through the life of a program, so having them be per instance is a waste of time and space
*	regcomp.c: Simplify	Karl Williamson	2018-06-25	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Under /a pattern matching, the matches of the [:posix:] classes are restricted to the ASCII range. Previously, in a time/space trade-off that favored space, we created the list of matching characters at pattern compilation time by ANDing the full-range Posix class with the set of ASCII characters. But now, the tables for just the ASCII-range classes are generated anyway, so there's no need to do that compilation-time intersection. This slightly simplifies the code.
*	Use compiled-in C structure for inverted case folds	Karl Williamson	2018-03-31	1	-1/+2
\| \| \| \| \| \| \| \| \| \|	This commit changes to use the C data structures generated by the previous commit to compute what characters fold to a given one. This is used to find out what things should match under /i. This now avoids the expensive start up cost of switching to perl utf8_heavy.pl, loading a file from disk, and constructing a hash from it.
*	Remove obsolete variables	Karl Williamson	2018-03-31	1	-1/+0
\| \| \| \| \|	These were for when some of the Posix character classes were implemented as swashes, which is no longer the case, so these can be removed.
*	Use charnames inversion lists	Karl Williamson	2018-03-31	1	-2/+4
\| \| \| \| \| \| \| \|	This commit makes the inversion lists for parsing character names global instead of interpreter level, so they can be initialized once per process, and no copies are created upon new thread instantiation. More importantly, this is another instance where utf8_heavy.pl no longer needs to be loaded, and the definition files read from disk.
*	Move case change invlists from interpreter to global	Karl Williamson	2018-03-26	1	-5/+10
\| \| \| \| \|	These are now constant through the life of the program, so don't need to be duplicated at each new thread instantiation.
*	Move UTF-8 case changing data into core	Karl Williamson	2018-03-26	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Prior to this commit, if a program wanted to compute the case-change of a character above 0xFF, the C code would switch to perl, loading lib/utf8heavy.pl and then read another file from disk, and then create a hash. Future references would use the hash, but the start up cost is quite large. There are five case change types, uc, lc, tc, fc, and simple fc. Only the first encountered requires loading of utf8_heavy, but each required switching to utf8_heavy, and reading the appropriate file from disk. This commit changes these functions to use compiled-in C data structures (inversion maps) to represent the data. To look something up requires a binary search instead of a hash lookup. An individual hash lookup tends to be faster than a binary search, but the differences are small for small sizes. I did some benchmarking some years ago, (commit message 87367d5f9dc9bbf7db1a6cf87820cea76571bf1a) and the results were that for fewer than 512 entries, the binary search was just as fast as a hash, if not actually faster. Now, I've done some more benchmarks on blead, using the tool benchmark.pl, which wasn't available back then. The results below indicate that the differences are minimal up through 2047 entries, which all Unicode properties are well within. A hash, PL_foldclosures, is still constructed at runtime for the case of regular expression /i matching, and this could be generated at Perl compile time, as a further enhancement for later. But reading a file from disk is no longer required to do this. 
======================= benchmarking results =======================

Key:
    Ir   Instruction read
    Dr   Data read
    Dw   Data write
    COND conditional branches
    IND  indirect branches
    _m   branch predict miss
    _m1  level 1 cache miss
    _mm  last cache (e.g. L3) miss
    -    indeterminate percentage (e.g. 1/0)

The numbers represent raw counts per loop iteration.

"\x{10000}" =~ qr/\p{CWKCF}/"

             swash  invlist  Ratio %
             fetch   search
            ------  -------  -------
    Ir      2259.0   2264.0     99.8
    Dr       665.0    664.0    100.2
    Dw       406.0    404.0    100.5
    COND     406.0    405.0    100.2
    IND       17.0     15.0    113.3
    COND_m     8.0      8.0    100.0
    IND_m      4.0      4.0    100.0
    Ir_m1      8.9     17.0     52.4
    Dr_m1      4.5      3.4    132.4
    Dw_m1      1.9      1.2    158.3
    Ir_mm      0.0      0.0    100.0
    Dr_mm      0.0      0.0    100.0
    Dw_mm      0.0      0.0    100.0

These were constructed by using the file whose contents are below, which uses the property in Unicode that currently has the largest number of entries in its inversion list, > 1600. The test was run on blead -O2, no debugging, no threads. Then the cut-off boundary was changed from 512 to 2047 for when we use a hash vs an inversion list, and the test run again. This yields the difference between a hash fetch and an inversion list binary search.

===================== The benchmark file is below ===============

    no warnings 'once';

    my @benchmarks;

    push @benchmarks, 'swash' => {
        desc  => '"\x{10000}" =~ qr/\p{CWKCF}/"',
        setup => 'no warnings "once"; my $re = qr/\p{CWKCF}/; my $a = "\x{10000}";',
        code  => '$a =~ $re;',
    };

    \@benchmarks;
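For readers unfamiliar with the data structure being benchmarked, here is a minimal sketch of a binary search over an inversion list. It assumes the usual representation (a sorted array of code points where even-indexed elements start a range inside the set and odd-indexed elements start a range outside it). The table contents and the names `demo_invlist`, `invlist_search` and `invlist_contains` are hypothetical and are not perl's internal API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inversion list: sorted code points where each even-indexed
 * element starts a range that IS in the set and each odd-indexed element
 * starts a range that is NOT. */
static const uint32_t demo_invlist[] = {
    0x41, 0x5B,     /* 'A'..'Z' are in the set */
    0x61, 0x7B,     /* 'a'..'z' are in the set */
    0x10000         /* U+10000 and everything above is in the set */
};
#define DEMO_INVLIST_LEN (sizeof demo_invlist / sizeof demo_invlist[0])

/* Binary search: index of the last element <= cp, or -1 if cp is smaller
 * than every element. */
static ptrdiff_t invlist_search(const uint32_t *list, size_t len, uint32_t cp)
{
    size_t lo = 0, hi = len;            /* the candidate range is [lo, hi) */

    assert(list != NULL || len == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (list[mid] <= cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (ptrdiff_t)lo - 1;
}

/* cp is in the set exactly when the range it falls into starts at an even
 * index. */
static int invlist_contains(const uint32_t *list, size_t len, uint32_t cp)
{
    ptrdiff_t i = invlist_search(list, len, cp);
    return i >= 0 && (i % 2) == 0;
}
```

A lookup costs about log2(n) comparisons on a read-only array, so nothing has to be built or loaded at run time, which is the trade-off against a hash that the commit message weighs.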
*	Make Unicode data structures global	Karl Williamson	2018-03-14	1	-19/+38
\| \| \| \| \| \| \| \| \| \|	These structures are read-only, use const C strings, and are truly global, so no need to have them be interpreter level. This saves duplicating and freeing them as threads come and go. In doing this, I noticed that not every one was properly being copied/deallocated, so this fixes some potential unreported bugs, and leaks.
*	Add thread-safe locale handling	Karl Williamson	2018-02-18	1	-0/+1
\| \| \| \| \| \|	This (large) commit allows locales to be used in threaded perls on platforms that support it. This includes recent Windows and Posix 2008 ones.
*	Latch LC_NUMERIC during critical sections	Karl Williamson	2018-02-18	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	It is possible for operations on threaded perls which don't 'use locale' to still change the locale. This happens when calling POSIX::localeconv() and I18N::Langinfo(), and in earlier perls, it can happen for other operations when perl has been initialized with the environment causing the various locale categories to not have a uniform locale. This commit causes the areas where the locale for this category should predictably be in one or the other state to be a critical section where another thread can't interrupt and change it. This is a separate mutex, so that only these particular operations will be held up.
*	Add Perl_setlocale()	Karl Williamson	2018-02-18	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	khw could not find any modules on CPAN that correctly use the C library function setlocale(). (The very few that do try, do not use it correctly, looking at the return value incorrectly, so they are broken.) This analysis does not include modules that call non-Perl libraries that may call setlocale(). And, a future commit will render the setlocale() function useless in some configurations on some platforms. So this commit adds Perl_setlocale(), for XS code to call, and which is always effective, but it should not be used to alter the locale except on platforms where the predefined variable ${^SAFE_LOCALES} evaluates to 1. This function is also what POSIX::setlocale() calls to do the real work.
*	Avoid changing locale when finding radix char	Karl Williamson	2018-01-30	1	-0/+1
\| \| \| \| \| \| \| \| \|	On systems that have the POSIX 2008 operations, including nl_langinfo_l(), this commit causes them to not have to actually change the locale when determining what the decimal point character is. The locale may have to change during the printing/reading of numbers, but eventually we can use sprintf_l(), if available, to avoid that too.
*	Cache locale UTF8-ness lookups	Karl Williamson	2018-01-30	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some locales are UTF-8, some are not. Knowledge of this is needed in various circumstances. This commit saves the results of the last several lookups so they don't have to be recalculated each time. The full generality of POSIX locales is such that you can have error messages be displayed in one locale, say Spanish, while other things are in French. To accommodate this generality, the program can loop through all the locale categories finding the UTF8ness of the locale it points to. However, in almost all instances, people are going to be in either French or in Spanish, and not in some combination. Suppose it is a French UTF-8 locale for all categories. This new cache will know that the French locale is UTF-8, and the queries for all but the first category can return that immediately. This simple cache avoids the overhead of hashes. This also fixes a bug I realized exists in threaded perls, but haven't reproduced. We do not support locales in such perls, and the user must not change the locale or 'use locale'. But perl itself could change the locale behind the scenes, leading to segfaults or incorrect results. One such instance is the determination of UTF8ness. But this only could happen if the full generality of locales is used so that the categories are not all in the same locale. This could only happen (if the user doesn't change locales) if the environment is such that the perl program is started up so that the categories are in such a state. This commit fixes this potential bug by caching the UTF8ness of each category at startup, before any threads are instantiated, and so checking for it later just looks it up in the cache, without perl changing the locale.
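A small most-recently-used cache of the kind described above might look like the sketch below. Everything in it is an assumption made for illustration: the slot count, the `demo_` names, and especially `demo_slow_is_utf8`, which merely inspects the locale name and stands in for the real (expensive) determination.

```c
#include <assert.h>
#include <string.h>

#define DEMO_CACHE_SLOTS 4      /* "the last several lookups" (assumed size) */
#define DEMO_NAME_MAX    64

struct demo_entry {
    char name[DEMO_NAME_MAX];   /* locale name, truncated if too long */
    int  is_utf8;
    int  used;
};

static struct demo_entry demo_cache[DEMO_CACHE_SLOTS];
static int demo_slow_lookups = 0;   /* how often the slow path ran */

/* Stand-in for the expensive check; here it only looks at the name. */
static int demo_slow_is_utf8(const char *locale)
{
    demo_slow_lookups++;
    return strstr(locale, "UTF-8") != NULL;
}

/* Linear scan of a tiny array, most recently used entry first; no hash. */
static int demo_is_utf8_locale(const char *locale)
{
    struct demo_entry entry;
    int i;

    assert(locale != NULL);
    for (i = 0; i < DEMO_CACHE_SLOTS; i++) {
        if (demo_cache[i].used && strcmp(demo_cache[i].name, locale) == 0) {
            /* Hit: move the entry to the front and answer from the cache. */
            entry = demo_cache[i];
            memmove(&demo_cache[1], &demo_cache[0],
                    (size_t)i * sizeof demo_cache[0]);
            demo_cache[0] = entry;
            return entry.is_utf8;
        }
    }

    /* Miss: do the slow lookup, then push the result in at the front,
     * dropping the least recently used entry off the end. */
    memset(&entry, 0, sizeof entry);
    strncpy(entry.name, locale, DEMO_NAME_MAX - 1);
    entry.is_utf8 = demo_slow_is_utf8(locale);
    entry.used = 1;
    memmove(&demo_cache[1], &demo_cache[0],
            (DEMO_CACHE_SLOTS - 1) * sizeof demo_cache[0]);
    demo_cache[0] = entry;
    return entry.is_utf8;
}
```

When every category points at the same locale, only the first query takes the slow path and the rest are answered from slot 0.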
*	Avoid some unnecessary changing of locales	Karl Williamson	2018-01-30	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	The LC_NUMERIC locale category is kept so that generally the decimal point (radix) is a dot. For some (mostly) output purposes, it needs to be swapped into the program's current underlying locale so that a non-dot can be printed. This commit changes things so that if the current underlying locale already uses a dot as its decimal point, the swap doesn't happen, as it's not needed.
*	Remove unused interpreter variable	Karl Williamson	2017-12-26	1	-1/+0
\| \| \| \|	This somehow became unused or never got used; I didn't do the research.
*	Add script_run regex feature	Karl Williamson	2017-12-24	1	-0/+1
\| \| \| \|	As explained in the docs, this helps detect spoofing attacks.
*	make exec keep its argument list more reliably	Zefram	2017-12-14	1	-2/+0
\| \| \| \| \| \| \| \| \| \|	Bits of exec code were putting the constructed commands into globals PL_Argv and PL_Cmd, which could then be clobbered by reentrancy. These are only global in order to manage their freeing, but that's better managed by using the scope stack. So replace them with automatic variables, with ENTER/SAVEFREEPV/LEAVE to free the memory. Also copy the strings acquired from SVs, to avoid magic clobbering the buffers of SVs already read. Fixes [perl #129888].
*	add wrap_keyword_plugin function (RT #132413)	Lukas Mai	2017-11-11	1	-0/+2
*	Change name of locale per-interpreter variable	Karl Williamson	2017-11-08	1	-1/+1
\| \| \| \| \| \|	The real purpose of this internal variable is to give the name of the locale that is the underlying one for the C program. Various macros already indicate that. This furthers the process.
*	(perl #127663) create a separate random source for internal use	Tony Cook	2017-09-11	1	-0/+1
\| \| \| \| \|	and use it to initialize hash randomization and to inoculate against quadratic behaviour in pp_sort
*	Add API function Perl_langinfo()	Karl Williamson	2017-09-09	1	-0/+2
\| \| \| \| \| \|	This is designed to generally replace nl_langinfo() in XS code. It is thread-safer, hides the quirks of perl's LC_NUMERIC handling, and can be used on systems lacking nl_langinfo.
*	Make immortal SVs contiguous	David Mitchell	2017-07-27	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Ensure that PL_sv_yes, PL_sv_undef, PL_sv_no and PL_sv_zero are allocated adjacently in memory. This allows the SvIMMORTAL() test to be more efficient, and will (in the next commit) allow SvTRUE() to be more efficient. In MULTIPLICITY builds the constraint is already met by virtue of them being adjacent items in the interpreter struct. For non-MULTIPLICITY builds, they were just 4 global vars with no guarantees of where they would be allocated. For this case, PL_sv_undef etc. are deleted as global vars and replaced with a new global var PL_sv_immortals[4], with #define PL_sv_yes (PL_sv_immortals[0]) etc. in their place.
*	add PL_sv_zero	David Mitchell	2017-07-27	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	It's like PL_sv_no, except that its string value is "0" rather than "". It can be used, for example, where a pp function wants to push a zero return value on the stack. The next commit will start to use it. Also update SvIMMORTAL() to be more efficient: it now checks whether the SV's address is in a range rather than individually checking against &PL_sv_undef, &PL_sv_no etc.