| Commit message | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
| |
These are identified as being static shared COW strings whose string
buffer points directly at PL_Yes / PL_No
Define sv_setbool() and sv_setbool_mg() macros
Use sv_setbool() where appropriate
Have sv_dump() annotate when an SV's PV buffer is one of the PL_(Yes|No) special booleans
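A standalone C analogue of the idea (FakeSV, YES_PV, NO_PV, setbool, and is_shared_bool are invented names for illustration, not Perl's API): a boolean's string buffer points directly at a shared static buffer, so code like sv_dump() can recognize a boolean by buffer identity rather than by contents.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Interned buffers standing in for Perl's PL_Yes ("1") and PL_No ("") */
static const char YES_PV[] = "1";
static const char NO_PV[]  = "";

typedef struct { const char *pv; size_t cur; } FakeSV;

/* Analogue of sv_setbool(): point the string buffer directly at the
 * shared static buffer instead of copying it */
#define setbool(sv, b) \
    ((b) ? ((sv)->pv = YES_PV, (sv)->cur = 1) \
         : ((sv)->pv = NO_PV,  (sv)->cur = 0))

/* Analogue of the sv_dump() annotation: a shared boolean is detected
 * by buffer address, not by string contents */
static int is_shared_bool(const FakeSV *sv) {
    return sv->pv == YES_PV || sv->pv == NO_PV;
}
```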
|
|
|
|
|
|
|
|
| |
This saves a conditional in many cases. Core Perl doesn't call this
on an empty string, so the first test that it is empty is redundant.
We can't guarantee this for non-core calls, so the conditional is made
explicit for them.
|
|
|
|
| |
This resolves GH #19069
|
|
|
|
|
|
|
|
|
| |
Now that the DFA is used by the only callers of this function, eliminating the need to check for, e.g., wrong continuation bytes, the function can be refactored to use a switch statement, which makes it clearer, shorter, and faster.
The name is changed to indicate its private nature.
|
|
|
|
| |
This makes it use the fast DFA for this functionality.
|
|
|
|
|
|
|
|
| |
The DFA macro for determining if a sequence is valid UTF-8 was
deliberately made general enough to accommodate this use-case, in which
only a partial character is acceptable. Change the code to use the DFA.
The helper function's name is changed to indicate it is private.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are currently three functions for variants of finding if the next
few bytes of a string form a proper UTF-8 encoded character of some ilk.
The main code for each is identical to the others, except for the table
that drives it.
This commit makes that code a macro that takes arguments to customize
its behavior sufficiently for current and foreseeable needs.
This makes it easier to keep the varieties in sync with each other
through future changes.
The macro has three exit points:
  1) successful parsing
  2) unsuccessful parsing
  3) successful parsing as far as it went, but the input was exhausted
     before reaching a full character.
What to do for each of these eventualities is passed to the macro. This
is a change in behavior: previously 2) and 3) were not distinguished
from each other. The new distinction actually leads to fewer tests in
some situations, and future commits using this DFA for other purposes
will take advantage of it.
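A simplified byte-wise checker (not Perl's actual DFA table, which also rejects overlongs and surrogates) illustrating the three exit points the macro distinguishes:

```c
#include <assert.h>
#include <stddef.h>

enum utf8_result { UTF8_OK, UTF8_FAIL, UTF8_SHORT };

/* Classify the bytes at s..end as: a complete valid UTF-8 character
 * (OK), invalid (FAIL), or valid as far as it went but the input was
 * exhausted before a full character (SHORT). */
static enum utf8_result
check_utf8_char(const unsigned char *s, const unsigned char *end)
{
    size_t want, i;
    if (s >= end)          return UTF8_SHORT;   /* no input at all */
    if (s[0] < 0x80)       return UTF8_OK;      /* ASCII/invariant  */
    else if (s[0] < 0xC2)  return UTF8_FAIL;    /* bare continuation or
                                                   overlong lead byte */
    else if (s[0] < 0xE0)  want = 2;
    else if (s[0] < 0xF0)  want = 3;
    else if (s[0] < 0xF5)  want = 4;
    else                   return UTF8_FAIL;    /* beyond Unicode max */
    for (i = 1; i < want; i++) {
        if (s + i >= end)            return UTF8_SHORT;  /* exhausted */
        if ((s[i] & 0xC0) != 0x80)   return UTF8_FAIL;   /* bad cont. */
    }
    return UTF8_OK;
}
```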
|
|
|
|
|
|
|
| |
Now that previous commits have made it fast to find the position of the
first set bit in a word, we can use a formula to find how many bytes the
UTF-8 representation of that value will occupy. This allows for
simplification of this macro, removing several conditionals.
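A sketch of such a formula for standard UTF-8 (code points up to 0x10FFFF; Perl's extended range and EBCDIC differ, and the exact expression the macro uses isn't shown here): past the 7-bit ASCII range, each additional byte carries 5 more payload bits than the previous total would suggest, so the length falls out of the msb position directly.

```c
#include <assert.h>
#include <stdint.h>

/* Position of the most significant set bit, 0-based (portable
 * stand-in for the fast single-instruction version) */
static int msb_pos(uint32_t cp) {
    int p = 0;
    while (cp >>= 1) p++;
    return p;
}

/* Bytes needed to encode cp in standard UTF-8: 1 byte holds 7 bits;
 * 2 bytes 11; 3 bytes 16; 4 bytes 21 */
static int utf8_len_for(uint32_t cp) {
    int b = cp ? msb_pos(cp) : 0;
    return b < 7 ? 1 : (b + 4) / 5;
}
```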
|
|
|
|
|
|
|
|
|
| |
This commit makes is_HANGUL_ED_utf8_safe() return 0 unconditionally on
EBCDIC platforms. This means its callers don't have to care which
platform they are running on. Change the two callers to take advantage
of this.
The commit also changes the description of the macro to be slightly
more accurate.
|
|
|
|
|
|
| |
Experiments by Tomasz Konojacki indicated that gcc, for one, doesn't
optimize a subtraction from 2**n-1 well. This commit spells the
optimization out for the compiler.
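The identity involved can be sketched as follows (the commit's exact expression isn't shown here; this is just the borrow-free observation): subtracting x from an all-ones mask 2**n-1 never borrows, so it is the same as flipping the low n bits.

```c
#include <assert.h>
#include <stdint.h>

/* For any x with 0 <= x <= mask where mask == 2**n - 1, the
 * subtraction mask - x cannot borrow, so it equals x ^ mask */
static uint32_t sub_from_mask(uint32_t x, uint32_t mask) {
    return x ^ mask;   /* same value as mask - x, but plainly cheap */
}
```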
|
|
|
|
|
|
|
|
|
| |
Some platforms have a fast way to get the msb but not the lsb; others,
more rarely, have the reverse. But a few shifts and similar
instructions allow us to express either operation in terms of the
other.
This commit causes any available fast method to be used by turning the
non-available case into the available one.
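The two reductions can be sketched like this (the slow loop versions stand in for whatever fast primitive the platform actually provides): `x & -x` isolates the lowest set bit, and the msb of a one-bit word is that bit; going the other way, smearing the top bit downward and re-isolating it turns an lsb primitive into an msb.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-ins for the platform's fast primitives */
static int msb_pos(uint32_t x) {               /* x must be nonzero */
    int p = 0;
    while (x >>= 1) p++;
    return p;
}
static int lsb_pos(uint32_t x) {               /* x must be nonzero */
    int p = 0;
    while (!(x & 1)) { x >>= 1; p++; }
    return p;
}

/* lsb in terms of msb: x & -x leaves only the lowest set bit */
static int lsb_from_msb(uint32_t x) {
    return msb_pos(x & -x);
}

/* msb in terms of lsb: smear the top bit down so every lower bit is
 * set, then isolate the top bit and ask for its (only) low bit */
static int msb_from_lsb(uint32_t x) {
    x |= x >> 1;  x |= x >> 2;  x |= x >> 4;
    x |= x >> 8;  x |= x >> 16;
    return lsb_pos((x >> 1) + 1);
}
```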
|
| |
|
|
|
|
|
| |
Windows has intrinsics different from the ones the previous commit
added, with a different API for counting leading/trailing zeros.
|
|
|
|
|
|
| |
On many modern platforms these functions can be replaced by a single
machine instruction or two. This commit looks for this possibility and
uses it if possible.
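For the GCC/Clang case (Perl's probe-and-fallback machinery isn't shown here), the builtins below typically compile to one or two machine instructions; both are undefined for a zero argument, so callers must guarantee a set bit exists.

```c
#include <assert.h>
#include <stdint.h>

/* __builtin_clz counts leading zeros, so the msb position is its
 * complement within the 32-bit word; __builtin_ctz counts trailing
 * zeros, which is the lsb position directly.  x must be nonzero. */
static int msb_pos(uint32_t x) { return 31 - __builtin_clz(x); }
static int lsb_pos(uint32_t x) { return __builtin_ctz(x); }
```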
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The existing code to determine the position of the most significant 1
bit in a word is extracted from variant_byte_number(), and generalized
to use the deBruijn method previously added that works on any bit in the
word, rather than the existing method, which looks just at the msb of
each byte. The code is moved to a new function in preparation for being
called from other places.
A U32 version is created, and on 64 bit platforms, a second, parallel,
version taking a U64 argument is also created. This is because future
commits may care about the word size differences.
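A 32-bit sketch of the de Bruijn method applied to the msb (this uses the well-known table from the public bit-twiddling literature; Perl's actual constants may differ): smear the high bit down, isolate it, and let the de Bruijn multiply map the lone bit to its position.

```c
#include <assert.h>
#include <stdint.h>

/* Classic 32-bit de Bruijn lookup: the multiply places a distinct
 * 5-bit index in the top bits for each one-bit input word */
static const int debruijn32[32] = {
     0,  1, 28,  2, 29, 14, 24,  3, 30, 22, 20, 15, 25, 17,  4,  8,
    31, 27, 13, 23, 21, 19, 16,  7, 26, 12, 18,  6, 11,  5, 10,  9
};

/* msb position of x; x must be nonzero */
static int msb_pos32(uint32_t x) {
    x |= x >> 1;  x |= x >> 2;  x |= x >> 4;   /* smear msb downward */
    x |= x >> 8;  x |= x >> 16;
    x = (x >> 1) + 1;                          /* lone msb remains   */
    return debruijn32[(x * 0x077CB531u) >> 27];
}
```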
|
|
|
|
|
|
|
|
|
|
| |
The existing code to determine the position of the least significant 1
bit in a word is extracted from variant_byte_number() and moved to a new
function in preparation for being called from other places.
A U32 version is created, and on 64 bit platforms, a second, parallel,
version taking a U64 argument is also created. This is because future
commits may care about the word size differences.
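The lsb case is the textbook form of the technique (again using the widely published 32-bit table, which may not match Perl's exact constants): `x & -x` isolates the lowest set bit and the de Bruijn multiply maps it to its index.

```c
#include <assert.h>
#include <stdint.h>

/* Classic 32-bit de Bruijn lookup table */
static const int debruijn32[32] = {
     0,  1, 28,  2, 29, 14, 24,  3, 30, 22, 20, 15, 25, 17,  4,  8,
    31, 27, 13, 23, 21, 19, 16,  7, 26, 12, 18,  6, 11,  5, 10,  9
};

/* lsb position of x; x must be nonzero */
static int lsb_pos32(uint32_t x) {
    return debruijn32[((x & -x) * 0x077CB531u) >> 27];
}
```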
|
|
|
|
|
| |
This should be called only when it is known there is a variant byte.
The assert() previously wasn't checking that precisely.
|
|
|
|
|
|
|
| |
The current mechanism doesn't work if the lowest bit is the one set. At
the moment that doesn't matter as we aren't looking at that bit anyway.
But a future commit will refactor things so that bit will be looked at.
So prepare for that. The new expression is simpler, besides.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This will prove useful in future commits on platforms that have 64 bit
capability.
The deBruijn sequence used here, taken from the internet, differs from
the 32 bit one in how it treats a word with no set bits. But that input
is considered undefined behavior, so the difference is immaterial.
Apparently deriving these sequences requires brute-force search, so I
decided to live with the difference rather than expend the time needed
to bring them into sync.
|
|
|
|
|
|
| |
This moves the code from regcomp.c to inline.h that calculates the
position of the lone set bit in a U32. This is in preparation for use
by other call sites.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These three functions to determine if the next part of a string is
UTF-8 (constrained in three different ways) have basically the same
short loop.
One of the initial conditions in the while() is always true the first
time around. By moving that condition to the middle of the loop, we
avoid it for the common case where the loop is executed just once. This
is when the input is a UTF-8 invariant character (ASCII on ASCII
platforms).
If the functions were constrained to require the first byte pointed to
by the input to exist, the while() could be a do {} while(), and there
would be no extra conditional in calling this vs checking if the next
character is invariant, and if not calling this. And there would be
fewer conditionals for the case of 2 or more bytes in the character.
|
| |
|
|
|
|
|
|
| |
It is legal to call this function with empty input, though core doesn't
do so. By swapping two conditions in the same 'if', we check whether
the input is empty before trying to access it.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The try change added code to pp_return to skip past try contexts
when looking for the sub/sort/eval context to return from.
This was only needed because cx_pusheval() sets si_cxsubix to the
current frame, and try uses that function to push its context; that
value is then used by the dopopto_cursub() macro to shortcut
walking the context stack.
Since we don't need to treat try as a sub for return, list-vs-array
checks, or lvalue sub checks, don't set si_cxsubix on try.
|
|
|
|
|
|
|
|
|
|
|
| |
This just detabifies to get rid of the mixed tab/space indentation.
Applying consistent indentation and dealing with other tabs is a
separate issue.
Done with `expand -i`.
* vutil.* left alone; it's part of version.
* Left regen-managed files alone for now.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes GH #18341
There are problems with getenv() on threaded perls which can lead to
incorrect results when compiled with PERL_MEM_LOG.
Commit 0b83dfe6dd9b0bda197566adec923f16b9a693cd fixed this for some
platforms, but as Tony Cook pointed out, there may be
standards-compliant platforms that it didn't fix.
The detailed comments outline the issues and the (complicated) full
solution.
|
|
|
|
|
|
|
|
| |
get_env() needs to lock other threads out of writing to the environment
while it is executing. It may need an exclusive lock if those threads
can clobber its buffer before it gets a chance to save its contents.
The previous commit has added a Configure probe which tells us if that
is the case. This commit uses it to select which type of mutex to use.
|
|
|
|
| |
5.32 did this for one form; now all do.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These were occurring on FreeBSD smokes.
warning: implicit conversion from 'IV' (aka 'long') to 'double' changes value from 9223372036854775807 to 9223372036854775808 [-Wimplicit-int-float-conversion]
9223372036854775807 is IV_MAX. What needed to be done here was to use
the NV containing IV_MAX+1, a value that already exists in perl.h.
In other instances, simply casting to an NV before doing the comparison
with the NV was what was needed.
This fixes #18328
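The underlying issue can be demonstrated directly (names here are illustrative, not Perl's; on a typical platform IV is a 64-bit integer and NV a double): 2**63 - 1 has no exact double representation and rounds up to 2**63, while 2**63 itself is exact, so that is the safe comparison bound.

```c
#include <assert.h>
#include <stdint.h>

/* INT64_MAX (2**63 - 1) silently rounds up to 2**63 when converted to
 * double - exactly what the FreeBSD warning flagged.  Comparing
 * against the exact double 2**63 (the "NV containing IV_MAX+1")
 * avoids the lossy conversion. */
int nv_exceeds_iv_max(double nv) {
    return nv >= 9223372036854775808.0;
}
```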
|
|
|
|
|
|
| |
This function is called only at compile time; experience has shown that
compile-time operations are not time-critical. And future commits will
lengthen it, making it not practically inlinable anyway.
|
|
|
|
|
|
|
|
|
|
|
| |
This feature allows documentation destined for perlapi or perlintern to
be split into sections of related functions, no matter where the
documentation source is. Prior to this commit the line had to contain
the exact text of the title of the section. Now it can be a $variable
name that autodoc.pl expands to the title. It still has to be an exact
match for the variable in autodoc, but now, the expanded text can be
changed in autodoc alone, without other files needing to be updated at
the same time.
|
| |
|
| |
|
|
|
|
|
| |
This uses a new organization of sections that I came up with. I asked
for comments on p5p, but there were none.
|
|
|
|
|
| |
apidoc_section is slightly favored over head1, as it is known only to
autodoc, and can't be confused with real pod.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This inline function was added by v5.31.0-27-g3a019afd6f to consolidate
similar code in several places, like pp_add(). It also avoided undefined
behaviour, as seen by ASan, by no longer unconditionally trying to cast
an NV to IV - ASan would complain when nv was -Inf for example.
However that commit introduced a performance regression into common
numeric operators like pp_and(). This commit partially claws back
performance by skipping the initial 'skip if NaN' test, which called
Perl_isnan(). Instead, except on systems where NAN_COMPARE_BROKEN is
true, it relies on NaN being compared to anything always being false,
and simply rearranges existing conditions nv < IV_MIN etc to be
nv >= IV_MIN so that any NaN comparison will trigger a false return.
This claws back about half the performance loss. The rest seems
unavoidable, since the two range tests for IV_MIN..IV_MAX are an
unavoidable part of avoiding undefined behaviour.
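The trick can be sketched as follows (a simplified range check, not the actual inline function): writing both bounds in the "is in range" direction means a NaN, for which every ordered comparison is false, falls through to "does not fit" without any explicit isnan() call.

```c
#include <assert.h>
#include <math.h>

/* Both comparisons are phrased so they must be TRUE for the value to
 * fit; NaN makes each of them false, so NaN is rejected for free.
 * Bounds are the 64-bit IV range expressed as exact doubles: -2**63
 * and 2**63. */
static int nv_fits_iv(double nv) {
    return nv >= -9223372036854775808.0 && nv < 9223372036854775808.0;
}
```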
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GH #18081
A sub call via return in a sort block was called in void rather than
scalar context, causing the comparison result to be discarded.
This is because when a sort block is called it is not a real function
call, even though a sort block can be returned from. Instead, a
CXt_NULL is pushed on the context stack. Because this isn't a sub-ish
context type (unlike CXt_SUB, CXt_EVAL etc) there is no 'caller sub'
on the context stack to be found to retrieve the caller's context
(i.e. cx->cx_gimme).
This commit fixes it by special-casing Perl_gimme_V().
Ideally at some future point, a new context type, CXt_SORT, should be
added. This would be used instead of CXt_NULL when a sort BLOCK is
called. Like other sub-ish context types, it would have an old_cxsubix
field and PL_curstackinfo->si_cxsubix would point to it. This would
eliminate needing special-case handling in places like Perl_gimme_V().
|
|
|
|
|
|
|
|
|
| |
This returns the number of elements in an array in a clearly named
function.
av_top_index() and av_tindex() are clearly named, but are less than
ideal; they came about because no one back then had thought of this
name, until Paul Evans now did.
|
|
|
|
|
| |
It only does anything under PERL_GLOBAL_STRUCT, which is gone.
Keep the dNOOP definition for CPAN back-compat.
|
|
|
|
|
| |
Mostly in comments and docs, but some in diagnostic messages and one
case of 'or die die'.
|
|
|
|
|
|
|
| |
This reverts commit 6c714a09cc08600278e72aea1fcdf83576d061b4.
croak_memory_wrap is designed to save a few bytes of memory, and was
never intended to be inlined. This commit moves it to util.c where the
other croak functions are.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous commit added a mutex specifically for protecting against
simultaneous accesses of the environment. This commit changes the
normal getenv, putenv, and clearenv functions to use it, to avoid races.
This makes the code simpler in places where we've gotten burned and
added stuff to avoid races. Other places where we were getting burned
without knowing it may also have existed. Now the protection comes
automatically, and we can remove the special cases we earlier stumbled
over.
getenv() returns a pointer to static memory, which can be overwritten at
any moment from another thread, or even another getenv from the same
thread. This commit changes the accesses to be under control of a
mutex, and in the case of getenv, a mortalized copy is created so that
there is no possible race.
|
| |
|
| |
|
| |
|
|
|
|
| |
This internal function is more properly bool, not I32.
|
|
|
|
|
| |
The remaining function in this file is moved to inline.h, just to not
have an extra file lying around with hardly anything in it.
|
|
|
|
|
|
|
| |
This commit changes this function to use memchr() instead of looping
byte-by-byte through the string. And it inlines it into 3 lines of
code. This should give comparable performance to a native libc
strnlen().
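The three-line shape the commit describes looks roughly like this (my_strnlen is a stand-in name): libc's heavily optimized memchr() finds the NUL, and the result falls out of pointer arithmetic.

```c
#include <stddef.h>
#include <string.h>

/* strnlen replacement: scan at most maxlen bytes for the NUL using
 * memchr(), returning maxlen if none is found */
static size_t my_strnlen(const char *s, size_t maxlen) {
    const char *p = (const char *)memchr(s, '\0', maxlen);
    return p ? (size_t)(p - s) : maxlen;
}
```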
|