path: root/embed.h
Commit message | Author | Age | Files | Lines
* Add utf8_to_utf16Karl Williamson2021-08-141-0/+1
|
* Improve utf16_to_utf8_reversed()Karl Williamson2021-08-141-2/+1
| | | | | | Instead of destroying the input by first swapping the bytes, this calls a base function with the order to use. The non-reverse function is changed to call the base function with the non-reversed order.
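The idea of passing the byte order to a shared base routine, rather than swapping the input bytes in place, can be sketched as follows. This is a minimal illustration, not the code in utf8.c: every name ending in `_sketch`, the parameter layout, and the error convention are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical base routine. high_byte and low_byte are the offsets (0 or 1)
 * of the most and least significant byte within each 16-bit unit, so the
 * input is only read, never modified. dst must have room for 3 bytes per
 * 2 input bytes. Returns the number of bytes written, or (size_t)-1 on
 * malformed input. */
static size_t
utf16_to_utf8_base_sketch(const uint8_t *src, size_t bytelen, uint8_t *dst,
                          int high_byte, int low_byte)
{
    size_t out = 0;

    assert(high_byte != low_byte);
    if (bytelen % 2 != 0)
        return (size_t)-1;                    /* truncated 16-bit unit */

    for (size_t i = 0; i < bytelen; i += 2) {
        uint32_t cp = ((uint32_t)src[i + high_byte] << 8) | src[i + low_byte];

        if (cp >= 0xD800 && cp <= 0xDBFF) {   /* high surrogate */
            uint32_t lo;
            if (i + 4 > bytelen)
                return (size_t)-1;            /* missing low surrogate */
            lo = ((uint32_t)src[i + 2 + high_byte] << 8) | src[i + 2 + low_byte];
            if (lo < 0xDC00 || lo > 0xDFFF)
                return (size_t)-1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            i += 2;                           /* consumed a second unit */
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;                /* lone low surrogate */
        }

        if (cp < 0x80) {
            dst[out++] = (uint8_t)cp;
        } else if (cp < 0x800) {
            dst[out++] = (uint8_t)(0xC0 | (cp >> 6));
            dst[out++] = (uint8_t)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            dst[out++] = (uint8_t)(0xE0 | (cp >> 12));
            dst[out++] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (uint8_t)(0x80 | (cp & 0x3F));
        } else {
            dst[out++] = (uint8_t)(0xF0 | (cp >> 18));
            dst[out++] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
            dst[out++] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (uint8_t)(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

/* Both public entry points become thin wrappers that differ only in the
 * byte order they hand to the base routine. */
static size_t
utf16be_to_utf8_sketch(const uint8_t *src, size_t bytelen, uint8_t *dst)
{
    return utf16_to_utf8_base_sketch(src, bytelen, dst, 0, 1);
}

static size_t
utf16le_to_utf8_sketch(const uint8_t *src, size_t bytelen, uint8_t *dst)
{
    return utf16_to_utf8_base_sketch(src, bytelen, dst, 1, 0);
}
```

Because the source buffer is `const`, the same bytes can be converted more than once, which an in-place swap would not allow.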
* utf8.c: Refactor is_utf8_char_helper()Karl Williamson2021-08-141-1/+1
| | | | | | | | | Now that the only callers of this function use the DFA, which eliminates the need to check for, e.g., wrong continuation bytes, this function can be refactored to use a switch statement, which makes it clearer, shorter, and faster. The name is changed to indicate its private nature.
* Make macro isUTF8_CHAR_flags an inline fcnKarl Williamson2021-08-141-0/+1
| | | | This makes it use the fast DFA for this functionality.
* Add helper function for longest UTF8 sequenceKarl Williamson2021-08-071-0/+1
| | | | | | | | | This specialized functionality is used to check the validity of Perl's extended-length UTF-8, which has some idiosyncratic characteristics that differ from the shorter sequences. This means this function doesn't have to consider those differences. It will be used in the next commit to avoid some work, and to eventually enable is_utf8_char_helper() to be simplified.
* utf8.c: Fold 2 overlapping fcns into oneKarl Williamson2021-08-071-5/+0
| | | | | | | | One of these functions is now only called from the other, and there is significant overlap in their logic. This commit refactors them into one resulting function, which is half the code, and more straightforward.
* utf8.c: Generalize static fcnKarl Williamson2021-08-071-1/+1
| | | | | | | | | I've always been uncomfortable with the input constraints this function had. Now that it has been refactored into using a switch(), new cases for full generality can be added without affecting performance, and some conditionals removed before calling it. The function is renamed to reflect its greater generality.
* utf8.c: Change name of static functionKarl Williamson2021-08-071-1/+1
| | | | | This changes only portions of the capitalization, and the new version is more in keeping with other function names.
* Create and use 32 and 64 bit msbit_pos() fcnsKarl Williamson2021-07-301-0/+2
| | | | | | | | | | | | | The existing code to determine the position of the most significant 1 bit in a word is extracted from variant_byte_number(), and generalized to use the deBruijn method previously added that works on any bit in the word, rather than the existing method which looks just at the msb of each byte. The code is moved to a new function in preparation for being called from other places. A U32 version is created, and on 64 bit platforms, a second, parallel, version taking a U64 argument is also created. This is because future commits may care about the word size differences.
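The de Bruijn technique mentioned above can be sketched for the 32-bit case as follows. The constant and lookup table are the widely published ones for 32-bit words; the function names ending in `_sketch` are illustrative and are not the ones in inline.h. As the commit messages note, a word with no set bits is undefined input.

```c
#include <assert.h>
#include <stdint.h>

/* Lookup table for the 32-bit de Bruijn constant 0x077CB531. */
static const uint8_t debruijn_pos32_sketch[32] = {
     0,  1, 28,  2, 29, 14, 24,  3, 30, 22, 20, 15, 25, 17,  4,  8,
    31, 27, 13, 23, 21, 19, 16,  7, 26, 12, 18,  6, 11,  5, 10,  9
};

/* Position (0 = least significant) of the lone set bit in word.
 * Multiplying by the de Bruijn constant shifts a unique 5-bit pattern
 * into the top bits, which then indexes the table. */
static unsigned
single_1bit_pos32_sketch(uint32_t word)
{
    assert(word != 0 && (word & (word - 1)) == 0);   /* exactly one bit set */
    return debruijn_pos32_sketch[(uint32_t)(word * UINT32_C(0x077CB531)) >> 27];
}

/* Least significant set bit: isolate it with word & -word, then look it up. */
static unsigned
lsbit_pos32_sketch(uint32_t word)
{
    assert(word != 0);
    return single_1bit_pos32_sketch(word & (~word + 1u));
}

/* Most significant set bit: smear it into every lower position, then
 * subtract the smeared value shifted right by one to leave only the top bit. */
static unsigned
msbit_pos32_sketch(uint32_t word)
{
    assert(word != 0);
    word |= word >> 1;
    word |= word >> 2;
    word |= word >> 4;
    word |= word >> 8;
    word |= word >> 16;
    return single_1bit_pos32_sketch(word - (word >> 1));
}
```

A 64-bit variant follows the same pattern with a 64-entry table and a 64-bit constant; it is omitted here rather than guessed at.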
* Create and use 32 and 64 bit lsbit_pos() fcnsKarl Williamson2021-07-301-0/+2
| | | | | | | | | | The existing code to determine the position of the least significant 1 bit in a word is extracted from variant_byte_number() and moved to a new function in preparation for being called from other places. A U32 version is created, and on 64 bit platforms, a second, parallel, version taking a U64 argument is also created. This is because future commits may care about the word size differences.
* Add 64bit single-1bit_pos()Karl Williamson2021-07-301-0/+3
| | | | | | | | | | | | | This will prove useful in future commits on platforms that have 64 bit capability. The deBruijn sequence used here, taken from the internet, differs from the 32 bit one in how they treat a word with no set bits. But this is considered undefined behavior, so that difference is immaterial. Apparently figuring this out uses brute force methods, and so I decided to live with this difference, rather than to expend the time needed to bring them into sync.
* Create and use single_1bit_pos32()Karl Williamson2021-07-301-0/+1
| | | | | | This moves the code from regcomp.c to inline.h that calculates the position of the lone set bit in a U32. This is in preparation for use by other call sites.
* Add inline av_fetch_simple and av_store_simple functionsRichard Leach2021-07-031-0/+2
|
* Rename scalarseq() to a somewhat more meaningful voidnonfinal()Paul "LeoNerd" Evans2021-06-161-1/+1
|
* replace all instances of PERL_IMPLICIT_CONTEXT with MULTIPLICITYTomasz Konojacki2021-06-091-24/+24
| | | | | | | | | | | | Since the removal of PERL_OBJECT (acfe0abcedaf592fb4b9cb69ce3468308ae99d91) PERL_IMPLICIT_CONTEXT and MULTIPLICITY have been synonymous and they're being used interchangeably. To simplify the code, this commit replaces all instances of PERL_IMPLICIT_CONTEXT with MULTIPLICITY. PERL_IMPLICIT_CONTEXT will stay defined for compatibility with XS modules.
* Call magic on all elements on %SIG delocalizationLeon Timmermans2021-06-021-0/+1
|
* regcomp.c: Extract code from a too-large-functionKarl Williamson2021-05-311-0/+1
| | | | | S_regclass() is unwieldy. This commit splits it into two nearly equal size parts. More could be done.
* Add Perl_av_new_alloc() function and newAV_alloc_x/z() macrosRichard Leach2021-05-261-0/+1
|
* try isn't treated as a sub call like eval isTony Cook2021-02-141-0/+1
| | | | | | | | | | | | | The try change added code to pp_return to skip past try contexts when looking for the sub/sort/eval context to return from. This was only needed because cx_pusheval() sets si_cxsubix to the current frame and try uses that function to push its context; that value is then used by the dopopto_cursub() macro to shortcut walking the context stack. Since we don't need to treat try as a sub for return, list vs array checks or lvalue sub checks, don't set si_cxsubix on try.
* A totally new optree structure for try/catch involving three new optypesPaul "LeoNerd" Evans2021-02-141-0/+1
|
* Add a newTRYCATCHOP(); migrate the custom code out of perly.y into itPaul "LeoNerd" Evans2021-02-141-0/+1
|
* regexec.c: Make internal function staticKarl Williamson2021-02-101-3/+1
| | | | | This used to be called from utf8.c, but no longer; no need to make it other than static. This allows the compiler to better optimize.
* Allow empty lower bound in /{,n}/Karl Williamson2021-01-201-1/+0
| | | | | | | | This change has been planned for a long time, bringing Perl into parity with similar languages, but it took many deprecation cycles to be able to reach the point where it could safely go in. This fixes GH #18264
* Revamp regcurly(), regpiece() use of itKarl Williamson2021-01-201-0/+1
| | | | | | | | | | | | | | | | | | | | This commit copies portions of new_regcurly(), which has been around since 5.28, into plain regcurly(), as a baby step in preparation for converting entirely to the new one. These functions are used for parsing {m,n} quantifiers. Future commits will add capabilities not available using the old version. The commit adds an optional parameter, to return to the caller information it gleans during parsing. regpiece() is changed by this commit to use this information, instead of itself reparsing the input. Part of the reason for this commit is that changes are planned soon to what is legal syntax. With this commit in place, those changes only have to be done once. This commit also extracts into a function the calculation of the quantifier bounds. This allows the logic for that to be done in one place instead of two.
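A parser that recognizes a {m,n} quantifier and optionally hands the bounds back to the caller, so the caller need not reparse, might look like the sketch below. It is a simplified stand-in, not regcurly() itself: the struct, the names ending in `_sketch`, the use of -1 for "no upper bound", and the rejection of "{,}" are assumptions for illustration, and overflow of very large bounds is not handled. It accepts the empty lower bound form {,n}.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type; max == -1 stands for "no upper bound". */
struct quant_bounds_sketch {
    long min;
    long max;
};

/* Returns true if s begins with a well-formed {m}, {m,}, {m,n} or {,n}.
 * If result is non-NULL, the parsed bounds are stored there, so the caller
 * does not have to parse the text a second time. */
static bool
parse_curly_sketch(const char *s, struct quant_bounds_sketch *result)
{
    long min = 0, max = 0;
    bool have_min = false, have_max = false, have_comma = false;

    assert(s != NULL);
    if (*s++ != '{')
        return false;

    while (*s >= '0' && *s <= '9') {
        min = min * 10 + (*s++ - '0');
        have_min = true;
    }
    if (*s == ',') {
        have_comma = true;
        s++;
        while (*s >= '0' && *s <= '9') {
            max = max * 10 + (*s++ - '0');
            have_max = true;
        }
    }
    if (*s != '}')
        return false;
    if (!have_min && !have_max)
        return false;               /* "{}" and "{,}" are not quantifiers here */

    if (!have_comma)
        max = min;                  /* {n}: exactly n times */
    else if (!have_max)
        max = -1;                   /* {m,}: no upper bound */
    /* {,n}: min keeps its default of 0 */

    if (result != NULL) {
        result->min = min;
        result->max = max;
    }
    return true;
}
```

Passing NULL for `result` gives a pure yes/no check, which mirrors the optional-parameter design described above.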
* add a bareword_filehandles feature, which is enabled by defaultTony Cook2021-01-041-0/+1
| | | | This disables use of bareword filehandles except for the built-in handles
* Evaluate arg once in all forms of SvTRUEKarl Williamson2020-12-061-0/+4
| | | | 5.32 did this for one form; now all do.
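The hazard that motivates evaluating the argument only once is generic to C macros, and can be shown with a small sketch. Nothing below is the SvTRUE implementation; all names ending in `_sketch` are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A macro that mentions its parameter twice evaluates the argument twice. */
#define IS_TRUE_MACRO_SKETCH(p) ((p) != NULL && *(p) != 0)

/* An inline function evaluates its argument exactly once, at the call. */
static inline bool
is_true_inline_sketch(const int *p)
{
    return p != NULL && *p != 0;
}

/* An argument expression with a visible side effect: it counts its calls. */
static int fetch_calls_sketch = 0;

static const int *
fetch_sketch(const int *p)
{
    fetch_calls_sketch++;
    return p;
}

/* Returns how many times the argument expression ran for the chosen form. */
static int
evaluations_sketch(bool use_macro)
{
    int value = 1;
    bool result;

    fetch_calls_sketch = 0;
    if (use_macro)
        result = IS_TRUE_MACRO_SKETCH(fetch_sketch(&value));
    else
        result = is_true_inline_sketch(fetch_sketch(&value));
    assert(result);
    (void)result;
    return fetch_calls_sketch;
}
```

With the macro form the side effect happens twice; with the inline function it happens once, which is why converting such macros to inline functions makes arguments like `POPs` or function calls safe to pass.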
* perlapi: Note proper rplcemnt for pad_compname_typeKarl Williamson2020-11-221-0/+3
|
* Move regcurly to regcomp.c (from inline.h)Karl Williamson2020-11-181-1/+1
| | | | | | This function is called only at compile time; experience has shown that compile-time operations are not time-critical. And future commits will lengthen it, making it not practically inlinable anyway.
* Fix up delimcpy_no_escape()Karl Williamson2020-10-311-1/+1
| | | | | | I modified this function in ab01742544b98b5b5e13d8e1a6e9df474b9e3005, and did not fully understand the edge cases. This commit now handles those properly, the same as plain delimcpy() does.
* embed.h: Add caution about PERL_NO_SHORT_NAMESKarl Williamson2020-10-261-1/+3
|
* add Perl_magic_freemglob() magic vtable methodDavid Mitchell2020-10-231-0/+1
| | | | | | | | | | | | | S_mg_free_struct() has a workaround to never free mg->mg_ptr for PERL_MAGIC_regex_global. Move this logic into a new magic vtable free method instead, so that S_mg_free_struct() (which gets called for every type of magic) doesn't have the overhead of checking every time for mg->mg_type == PERL_MAGIC_regex_global. [ No, I don't know why PERL_MAGIC_regex_global's vtable and methods are suffixed mglob rather than regex_global or vice versa ]
* add Perl_magic_freeutf8() magic vtable methodDavid Mitchell2020-10-231-0/+1
| | | | | | | | | | S_mg_free_struct() has a workaround to free mg->mg_ptr in PERL_MAGIC_utf8 even if mg_len is zero. Move this logic into a new magic vtable free method instead, so that S_mg_free_struct() (which gets called for every type of magic) doesn't have the overhead of checking every time for mg->mg_type == PERL_MAGIC_utf8.
* add Perl_magic_freecollxfrm() magic vtable methodDavid Mitchell2020-10-231-0/+1
| | | | | | | | | | v5.29.9-139-g44955e7de8 added a workaround to S_mg_free_struct() to free mg->mg_ptr in PERL_MAGIC_collxfrm even if mg_len is zero. Move this logic into a new magic vtable free method instead, so that S_mg_free_struct() (which gets called for every type of magic) doesn't have the overhead of checking every time for mg->mg_type == PERL_MAGIC_collxfrm.
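The pattern shared by these three commits, moving a per-type special case out of a generic routine and into a per-type free hook, can be sketched as below. The structs and names ending in `_sketch` are simplified stand-ins and do not reproduce Perl's MAGIC or MGVTBL layouts.

```c
#include <assert.h>
#include <stdlib.h>

struct magic_sketch;

/* Hypothetical vtable with only the free hook shown. */
struct vtbl_sketch {
    void (*free_fn)(struct magic_sketch *mg);
};

struct magic_sketch {
    const struct vtbl_sketch *vtbl;
    char *ptr;
    long len;
};

static int frees_sketch = 0;        /* counts real frees, for demonstration */

static void
release_sketch(char *p)
{
    frees_sketch++;
    free(p);
}

/* Hook for a type whose ptr must be freed even when len is zero. */
static void
free_ptr_always_sketch(struct magic_sketch *mg)
{
    if (mg->ptr != NULL)
        release_sketch(mg->ptr);
    mg->ptr = NULL;
}

/* Hook for a type whose ptr must never be freed by the generic code. */
static void
forget_ptr_sketch(struct magic_sketch *mg)
{
    mg->ptr = NULL;
}

static const struct vtbl_sketch always_free_vtbl_sketch = { free_ptr_always_sketch };
static const struct vtbl_sketch never_free_vtbl_sketch  = { forget_ptr_sketch };

/* Generic routine, called for every kind of magic. It no longer tests the
 * magic type; any special handling already happened in the hook. */
static void
mg_free_struct_sketch(struct magic_sketch *mg)
{
    assert(mg != NULL);
    if (mg->vtbl != NULL && mg->vtbl->free_fn != NULL)
        mg->vtbl->free_fn(mg);
    if (mg->ptr != NULL && mg->len > 0)
        release_sketch(mg->ptr);    /* the ordinary rule */
    mg->ptr = NULL;
}
```

The common path pays for one indirect call only when a hook exists, instead of comparing the type against each special case on every call.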
* perlapi: deprecate pack_cat() (a mathoms func)Karl Williamson2020-09-051-0/+3
|
* Add av_count()Karl Williamson2020-08-191-1/+1
| | | | | | | | | This returns the number of elements in an array in a clearly named function. av_top_index() and av_tindex() are clearly named, but are less than ideal; they came about because no one back then thought of this name, until Paul Evans did now.
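The relationship between a "top index" accessor and a "count" accessor is simple but easy to get off by one, which is the case for a dedicated function. The sketch below uses a made-up array struct, not Perl's AV, and hypothetical names ending in `_sketch`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical array header: fill is the highest used index, -1 when empty. */
struct array_sketch {
    ptrdiff_t fill;
};

/* Analogue of a top-index accessor: returns -1 for an empty array. */
static ptrdiff_t
top_index_sketch(const struct array_sketch *av)
{
    assert(av != NULL);
    return av->fill;
}

/* Analogue of a count accessor: always top index + 1, so never negative. */
static size_t
count_sketch(const struct array_sketch *av)
{
    return (size_t)(top_index_sketch(av) + 1);
}
```

Callers that want "how many elements" can then write the count call directly instead of adding 1 to the top index at each call site.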
* Remove PERL_GLOBAL_STRUCTDagfinn Ilmari Mannsåker2020-07-201-5/+0
| | | | | | | | This was originally added for MinGW, which no longer needs it, and only still used by Symbian, which is now removed. This also leaves perlapi.[ch] empty, but we keep the header for CPAN backwards compatibility.
* Remove Symbian portDagfinn Ilmari Mannsåker2020-07-201-1/+1
| | | | | Also eliminate USE_HEAP_INSTEAD_OF_STACK and SETSOCKOPT_OPTION_VALUE_T, since Symbian was the only user of those.
* study_chunk: honour mutate_ok over recursionHugo van der Sanden2020-06-011-1/+1
| | | | | | | | | | | | | | As described in #17743, study_chunk can re-enter itself either by simple recursion or by enframing. 089ad25d3f used the new mutate_ok variable to track whether we were within the framing scope of GOSUB, and to disallow mutating changes to ops if so. This commit extends that logic to reentry by recursion, passing in the current state as was_mutate_ok. (CVE-2020-12723) (cherry picked from commit 3445383845ed220eaa12cd406db2067eb7b8a741)
* study_chunk: extract rck_elide_nothingHugo van der Sanden2020-06-011-0/+1
| | | | | | (CVE-2020-10878) (cherry picked from commit 4fccd2d99bdeb28c2937c3220ea5334999564aa8)
* Add named sequences to Unicode wildcard name capabilitiesKarl Williamson2020-03-201-2/+2
| | | | | | | | | Prior to this commit, specifying a named sequence would result in a mostly unhelpful fatal error message. This makes their use legal. This is also the beginning of allowing Unicode string properties, which are a new thing in the (still draft) Unicode requirements for regular expression parsing, UTS 18. Full compliance will have to come later.
* pp_match(): output regex debugging infoKarl Williamson2020-03-181-0/+1
| | | | | | | | This fixes #17612. This adds an inline function to pp_hot to be called to determine if debugging info should be output or not, regardless of whether it comes from -Dr, or from a 'use re Debug' statement.
* chained comparisonsZefram2020-03-121-0/+3
|
* Rmv obsolete functionKarl Williamson2020-03-111-1/+0
| | | | | Use of this function was removed as part of adding wildcarding to the Unicode name property
* Allow debugging from regexec.c back to regcomp.cKarl Williamson2020-03-111-1/+8
| | | | | | | | | The compilation of User-defined properties in a regular expression that haven't been defined at the time that pattern is compiled is deferred until execution time. Until this commit, any request for debugging info on those was ignored. This fixes that by
* Add thread safety to some environment accessesKarl Williamson2020-03-111-0/+1
| | | | | | | | | | | | | | | | | | The previous commit added a mutex specifically for protecting against simultaneous accesses of the environment. This commit changes the normal getenv, putenv, and clearenv functions to use it, to avoid races. This makes the code simpler in places where we've gotten burned and added stuff to avoid races. Other places where we haven't known we were getting burned could have existed until now. Now that comes automatically, and we can remove the special cases we earlier stumbled over. getenv() returns a pointer to static memory, which can be overwritten at any moment from another thread, or even another getenv from the same thread. This commit changes the accesses to be under control of a mutex, and in the case of getenv, a mortalized copy is created so that there is no possible race.
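The approach described above, taking a lock around the environment access and returning a private copy, can be sketched with POSIX threads. This is an illustration only: the mutex and function names are hypothetical, and where Perl creates a mortalized copy that is released automatically, the sketch returns a malloc'd string that the caller must free.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical process-wide lock guarding every environment access. */
static pthread_mutex_t env_mutex_sketch = PTHREAD_MUTEX_INITIALIZER;

/* Returns a heap copy of the variable's value, or NULL if it is unset or
 * memory is exhausted. The copy is made while the lock is held, so a
 * concurrent writer that also takes the lock cannot overwrite the static
 * buffer in the middle of the copy. */
static char *
getenv_copy_sketch(const char *name)
{
    char *copy = NULL;
    const char *value;

    assert(name != NULL);
    pthread_mutex_lock(&env_mutex_sketch);
    value = getenv(name);
    if (value != NULL) {
        size_t size = strlen(value) + 1;
        copy = malloc(size);
        if (copy != NULL)
            memcpy(copy, value, size);
    }
    pthread_mutex_unlock(&env_mutex_sketch);
    return copy;
}
```

The protection only holds if every writer (the putenv and clearenv analogues) takes the same lock, which is why the commit routes all the normal accessors through it.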
* Implement \p{Name=/.../} wildcardsKarl Williamson2020-03-111-0/+1
| | | | | This commit adds wildcard subpatterns for the Name and Name Aliases properties.
* optimize sort by inlining comparison functionsTomasz Konojacki2020-03-091-0/+9
| | | | | | | | This makes special-cased forms such as sort { $b <=> $a } even faster. Also, since this commit removes PL_sort_RealCmp, it fixes the issue with nested sort calls mentioned in gh #16129
* Allow more debugging in re_comp.cKarl Williamson2020-03-021-2/+6
| | | | | | | | This adds two main functions that were previously only defined in regcomp.c to also be defined in re_comp.c. This allows re.pm to use debugging with them. To avoid duplicating large data structures, several lightweight wrapper functions are added to regcomp.c that re_comp.c calls to access those structures.
* Move two regex functions so that can use re debugKarl Williamson2020-03-021-2/+2
| | | | These have to have a version in re_comp.c for re.pm to work on them.
* embed.fnc: Reorder the entries dealing with regexesKarl Williamson2020-03-021-33/+33
| | | | | | This moves a bunch of entries around so that they make more sense, and consolidates some blocks that had the same #ifdefs. There should be no change in what gets compiled.