path: root/embed.fnc
Commit message, Author, Age, Files, Lines
...
* embed.fnc - documentation improvementsYves Orton2022-12-241-291/+319
| - improve comment about exporting symbols. The explanation was a bit scanty and out of date. This expands and updates it to modern practice.
| - ensure that docs for flags are all single quoted, so they can be easily found with a search in the file.
* Revert "Inline strlcat(), strlcpy()"Yves Orton2022-12-221-2/+2
| | | | | | This reverts commit 5be23dd97e360d19c8abb4c0c534f5d5f9d3691a. The original patch seems to break DBM.
* Inline strlcat(), strlcpy()Karl Williamson2022-12-221-2/+2
| | | | These short functions are reasonably frequently used.
* Inline savepv() and related functionsKarl Williamson2022-12-221-4/+4
| | | | | | | These short functions are moved from util.c to inline.h. savesharedpv() and savesharedpvn() aren't moved because there is a complication of needing also to move croak_no_mem(), which could be done, but are these called enough to justify it?
* gv.c - rename amagic_find() to amagic_applies()Yves Orton2022-12-221-1/+1
| | | | | | | | | The API for amagic_find() didn't make as much sense as we thought at first. Most people will be using this as a predicate, and don't care about the returned CV, so to simplify things until we can really think through the required API, this switches it to return a bool and renames it to amagic_applies(), as in "which amagic_applies to this sv".
* Fix broken API: sync_locale()Karl Williamson2022-12-201-4/+4
| | | | | | | | | | | | | | This fixes GH #20565. Lack of tests allowed sync_locale() to get broken until CPAN testing showed it so. Basically, I blew it in 9f5a615be674d7663d3b4719849baa1ba3027f5b. Most egregiously, I forgot to turn back on, when a sync_locale() is executed, the toggling for locales whose radix character isn't a dot. And this needs a way to tell the other code that it needs to recompute things at this time, since our records don't reflect what happened before the sync.
* Add `forbid_outofblock_ops()` to op.cPaul "LeoNerd" Evans2022-12-171-0/+1
| | | | | Adds a new function to statically detect forbidden control flow out of a block.
* Added function amagic_find(sv, method, flags)Eric Herman2022-12-161-0/+1
| | | | | | | Returns the CV pointer to the overloaded method, which will be needed by join to detect concat magic. Co-authored-by: Philippe Bruhat (BooK) <book@cpan.org>
* regcomp.c - decompose into smaller filesYves Orton2022-12-091-118/+149
| | | | | | | | | | | | | | | | | This splits a bunch of the subcomponents of the regex engine into smaller files: regcomp_debug.c, regcomp_internal.h, regcomp_invlist.c, regcomp_study.c, regcomp_trie.c. The only real change, besides changes to the build machinery to achieve the split, is to also add some new defines which can be used in embed.fnc to control exports without having to enumerate /every/ regex engine file. For instance, all of regcomp*.c define PERL_IN_REGCOMP_ANY, and this is used in embed.fnc to manage exports.
* Define a PL_infix_plugin hook, of a similar style to PL_keyword_pluginPaul "LeoNerd" Evans2022-12-081-0/+4
| | | | | | | | | Runs for identifier-named custom infix operators and sequences of non-identifier symbol characters. Defines multiple precedence levels for custom infix operators that fit alongside exponentiation, multiplication, addition, or relational comparison operators, as well as a "high" and "low" at either end.
* Define a newPADxVOP() convenience functionPaul "LeoNerd" Evans2022-12-081-0/+1
| | | | | | | | | This function conveniently sets the ->op_targ field of the returned op, making it neater to use inline in larger trees of new*OP functions used to build optree fragments. This function is implemented as `static inline`, for speed and code-size reasons.
* locale.c: Rewrite localeconv() handlingKarl Williamson2022-12-071-15/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | localeconv() returns a structure containing fields that are associated with two different categories: LC_NUMERIC and LC_MONETARY. Perl via POSIX::localeconv() returns a hash containing all the fields. Testing on Windows showed that if LC_CTYPE is not the same locale as LC_MONETARY for the monetary fields, or isn't the same as LC_NUMERIC for the numeric ones, mojibake can result. The solution to similar situations elsewhere in the code is to toggle LC_CTYPE into being the same locale as the one for the returned fields. But those situations only have a single locale that LC_CTYPE has to match, so it doesn't work here when LC_NUMERIC and LC_MONETARY are different locales. Unlike Schrödinger's cat, LC_CTYPE has to be one or the other, not both at the same time. The previous implementation did not consider this possibility, and wasn't easily changeable to work. Therefore, this rewrites a bunch of it. The solution used is to call localeconv() twice when the LC_NUMERIC locale and the LC_MONETARY locale don't match (with LC_CTYPE toggled to the corresponding one each time). (Only one call is made if the two categories have the same locale.) This one vs two complicated the code, but I thought it was worth it given that the one call is the most likely case. Another complication is that on platforms that lack nl_langinfo() (Windows, for example), localeconv() is used to emulate portions of it. Previously there was a separate function to handle this, using an SV() cast as an HV() to avoid using a hash that wasn't actually necessary. That proved to lead to extra duplicated code under the new scheme, so that function was collapsed into a single one and a real hash is used in all circumstances, but is only populated with the one or two fields needed for the emulation.
The only part of this commit that I thought could be split off from the rest concerns the fact that localeconv()'s return is not thread-safe, and so must be copied to a safe place (the hash) while in a critical section, locking out all other threads. Before this commit, that copying was accompanied by determining if each string field needed to be marked as UTF-8. That determination isn't necessarily trivial, so should really not be in the critical section. This commit does that. And, with some effort, that part could have been split into a separate commit, but I didn't think it was worth the effort.
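The critical-section point above can be illustrated with a minimal C sketch. It is not the code from locale.c: the struct, the mutex, and both function names are hypothetical, the copy goes into fixed buffers rather than a Perl hash, and the "needs UTF-8 marking" decision is replaced by a much cruder non-ASCII check. The only thing it tries to show is the ordering: copy while locked, analyse after unlocking.

```c
#include <assert.h>
#include <locale.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical holder for the two fields this sketch copies. */
struct sketch_lconv_copy {
    char decimal_point[16];
    char mon_decimal_point[16];
};

/* Hypothetical mutex standing in for whatever lock guards localeconv(). */
static pthread_mutex_t sketch_lconv_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Copy the fields out of localeconv()'s static result while holding the
 * lock. Nothing but the copy happens inside the critical section. */
static void sketch_copy_localeconv(struct sketch_lconv_copy *out)
{
    assert(out != NULL);
    pthread_mutex_lock(&sketch_lconv_mutex);
    const struct lconv *lc = localeconv();
    snprintf(out->decimal_point, sizeof out->decimal_point,
             "%s", lc->decimal_point);
    snprintf(out->mon_decimal_point, sizeof out->mon_decimal_point,
             "%s", lc->mon_decimal_point);
    pthread_mutex_unlock(&sketch_lconv_mutex);
}

/* Post-processing done on the private copy, outside the lock. A crude
 * stand-in for the real "is this UTF-8?" determination. */
static bool sketch_has_non_ascii(const char *s)
{
    for (; *s != '\0'; s++) {
        if ((unsigned char)*s > 0x7F)
            return true;
    }
    return false;
}
```

A caller would invoke sketch_copy_localeconv() once per localeconv() call and then run sketch_has_non_ascii() on each copied field at leisure, since the copy is no longer shared with other threads.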
* locale.c: Move 2 functions elsewhere in the codeKarl Williamson2022-12-071-6/+6
| | | | | This is in preparation for them to be called on platforms where locale handling is not enabled.
* locale.c: Reorder parameters to static functionKarl Williamson2022-12-071-4/+4
| | | | | | Move the least important parameters (that can be NULL to indicate unused) to the end of the parameter list, thereby moving the required ones to the beginning. This makes it clear what is important.
* locale.c: Rmv unnecessary parameter from static functionKarl Williamson2022-12-011-2/+1
| | | | This dates from an earlier implementation
* pp_sort: don't force inline the comparison functionsTony Cook2022-11-301-14/+14
| | | | | | | | | | | | With gcc 12, when building with -Og, forcing inline causes the build to fail, with the compiler complaining it can't inline those functions. With -O2, the functions are inlined anyway. Note that -Og is the gcc recommended optimization option to build with if you are going to debug the binary. Fixes #20395
* Fix POSIX::strxfrm()Karl Williamson2022-11-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit does two things. Most simply, it extends strxfrm() to handle strings containing NUL characters. Previously the transformation stopped at the first NUL encountered. Second, it combines the implementation of this with the existing implementation used for the 'cmp' operator, eliminating existing discrepancies and preventing future ones. This function takes an SV containing a PV. The encoding of that PV is based on the LC_CTYPE locale. It really doesn't make sense to collate based off of the sequencing of a different locale, which prior to this commit it would do (but not for 'cmp') if the LC_COLLATE locale were different. As an example, consider the string: my $string = quotemeta join "", map { chr } (1..255); and with LC_CTYPE=8859-1 (Latin-1, used for several Western European languages), LC_COLLATE set to ja_JP.utf8. This doesn't make much sense, outside of specialty uses such as a lazy implementation of a Japanese/French dictionary, or for quoting snippets in one language in a document written in the other. ('lazy' because such text should really be changing locales to the language of the snippet currently being worked on.) Nevertheless Perl should do something as sensible as possible, and this commit changes POSIX::strxfrm() to use the method already in use by the code implementing 'cmp'.
Prior to this commit, POSIX::strxfrm($string) yielded on glibc 12.1: ^\3^\4^\5^\6^\a^\b^\t^\n^\13^\f^\r^\16^\17^\20^\21^\22^\23^\24^\25^\26^\27^\30^\31^\32^\e^\34^\35^\36^\37^ ^!^\"^#^\$^%^&^'^(^)^*^+^,^-^.^/^0^123456789:;^<^=^>^?^\@^A^BCDEFGHIJKLMNOPQRSTUVWXYZ[\\^]^^^_^`a^bcdefghijklmnopqrstuvwxyz{|^}^~^\177^\302\200^\302\201^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3^\3 These are effectively a sorting order, and it is not meant to be human understandable. But it is clear that most of the characters had the same weight of 3, so a libc sort would mark them as ties in sorting order. And after, ^\3^\4^\5^\6^\a^\b^\t^\n^\13^\f^\r^\16^\17^\20^\21^\22^\23^\24^\25^\26^\27^\30^\31^\32^\e^\34^\35^\36^\37^ 
^!^\"^#^\$^%^&^'^(^)^*^+^,^-^.^/^0^123456789:;^<^=^>^?^\@^A^BCDEFGHIJKLMNOPQRSTUVWXYZ[\\^]^^^_^`a^bcdefghijklmnopqrstuvwxyz{|^}^~^\177^\302\200^\302\201^\302\202^\302\203^\302\204^\302\205^\302\206^\302\207^\302\210^\302\211^\302\212^\302\213^\302\214^\302\215^\302\216^\302\217^\3\3^\3\3^\302\220^\302\221^\302\222^\302\223^\302\224^\302\225^\302\226^\302\227^\302\230^\302\231^\302\232^\302\233^\302\234^\302\235^\302\236^\302\237^\3\3^\341\257\211^\304\257^\304\260^\341\257\221^\3\3^\341\257\212^\304\266^\303\255^\341\257\216^\341\257\215^\3\3^\305\225^\3\3^\341\257\217^\341\257\203^\304\251^\304\234^\3\3^\3\3^\303\253^\3\3^\305\260^\3\3^\341\257\200^\3\3^\341\257\214^\3\3^\3\3^\3\3^\3\3^\341\257\213^\341\260\236^\341\260\235^\341\260\240^\341\260\246^\341\260\237^\341\260\245^\341\260\202^\341\260\252^\341\260\256^\341\260\255^\341\260\260^\341\260\257^\341\260\273^\341\260\272^\341\260\275^\341\260\274^\3\3^\341\261\213^\341\261\215^\341\261\214^\341\261\217^\341\261\223^\341\261\216^\304\235^\341\260\211^\341\261\236^\341\261\235^\341\261\240^\341\261\237^\341\261\255^\341\260\214^\341\260\232^\341\261\264^\341\261\263^\341\261\266^\341\261\274^\341\261\265^\341\261\273^\341\260\215^\341\262\200^\341\262\204^\341\262\203^\341\262\206^\341\262\205^\341\262\221^\341\262\220^\341\262\223^\341\262\222^\341\260\217^\341\262\240^\341\262\242^\341\262\241^\341\262\244^\341\262\250^\341\262\243^\304\236^\341\260\230^\341\262\263^\341\262\262^\341\262\265^\341\262\264^\341\263\202^\341\260\234^\341\263\203 which shows that most of the ties have been resolved, and hence the results are more sensible
* Recognise `//=` and `||=` syntax in signature parameter defaultsPaul "LeoNerd" Evans2022-11-261-0/+1
| | | | | | These create parameters where the default expression is assigned whenever the caller did not pass a defined (or true) value. I.e. both if it is missing, or is present but undef (or false).
* change the return value of SSNEW to SSize_tTony Cook2022-11-031-2/+2
| | | | | | | | | | The normal savestack index is an I32, but that counts in ANY (which are typically the larger of pointer or IV sizes). This meant that if the save stack was large, but still nowhere near its limit, the result of SSNEW() could overflow. So make the result SSize_t and adjust SSPTR() to match. SSPTR() asserts that the supplied type is the same size as SSize_t, to ensure callers are updated to handle the new limit.
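The overflow described above is plain arithmetic, and a small C sketch can make it concrete. The union below is only a stand-in for ANY with an assumed size of 8 bytes, the function names are hypothetical, and the sketch assumes that the value in question is a byte offset derived from a slot index, which is how the commit message reads.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for Perl's ANY: assumed here to be 8 bytes wide. */
typedef union {
    void   *ptr;
    int64_t iv;
} sketch_any;

/* Byte offset of slot `ix`, computed in a type wide enough to hold it. */
static int64_t sketch_byte_offset(int32_t ix)
{
    assert(ix >= 0);
    return (int64_t)ix * (int64_t)sizeof(sketch_any);
}

/* Would the byte offset still fit in a 32-bit signed result? */
static int sketch_fits_in_i32(int32_t ix)
{
    return sketch_byte_offset(ix) <= INT32_MAX;
}
```

With 8-byte slots, an index of 300,000,000 is far below the I32 index limit, yet its byte offset of 2,400,000,000 no longer fits in 32 signed bits, which is the case a wider return type addresses.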
* cop.h - get rid of the STRLEN* stuff from cop_warningsYves Orton2022-11-021-2/+2
| | | | | With RCPV strings we can use the RCPV_LEN() macro, and make this logic a little less weird.
* cop.h - add support for refcounted filenames in cops under threadsYves Orton2022-11-011-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have a weird bifurcation of the cop logic around threads. With threads we use a char * cop_file member, without it we use a GV * and replace cop_file with cop_filegv. The GV * code refcounts filenames and more or less efficiently shares the filename amongst many opcodes. However under threads we were simply copying the filenames into each opcode. This is because in theory opcodes created in one thread can be destroyed in another. I say in theory because as far as I know the core code does not actually do this. But we have tests that you can construct a perl, clone it, and then destroy the original, and have the copy work just fine; this means that opcodes constructed in the main thread will be destroyed in the cloned thread. This in turn means that you can't put SV derived structures into the op-tree under threads. Which is why we can not use the GV * strategy under threads. As such this code adds a new struct/type RCPV, which is a refcounted string using shared memory. This is implemented in such a way that code that previously used a char * can continue to do so, as the refcounting data is located at a specific offset before the char * pointer itself. This also allows the len data to be embedded "into" the PV, which allows us to expose macros to access the length of what is in theory a null terminated string. struct rcpv { UV refcount; STRLEN len; char pv[1]; }; typedef struct rcpv RCPV; The struct is sized appropriately on creation in rcpv_new() so that the pv member contains the full string plus a null byte. It then returns a pointer to the pv member of the struct. Thus the refcount and length are embedded at a predictable offset in front of the char *, which means we do not have to change any types for members using this.
We provide three operations: rcpv_new(), rcpv_copy() and rcpv_free(), which roughly correspond with newSVpv(), SvREFCNT_inc(), SvREFCNT_dec(), and a handful of macros as well. We also expose SAVERCPVFREE which is similar to SAVEGENERICSV but operates on pv's constructed with rcpv_new(). Currently I have not restricted use of this logic to threaded perls. We simply do not use it in unthreaded perls, but I see no reason we couldn't normalize the code to use this in both cases, except possibly that actually the GV case is more efficient. Note that rcpv_new() does NOT use a hash table to dedup strings. Two calls to rcpv_new() with the same arguments will produce two distinct pointers with their own refcount data. Refcounting the cop_file data was Tony Cook's idea.
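The "header in front of the char *" layout described above can be sketched in standalone C. This is not the perl implementation: every sketch_ name is hypothetical, size_t replaces UV and STRLEN, plain malloc() replaces shared memory, there is no locking, and whether len counts the trailing NUL is a choice made here (it does not).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Same shape as the struct in the commit message, with stand-in types. */
struct sketch_rcpv {
    size_t refcount;
    size_t len;      /* bytes in pv, not counting the trailing NUL */
    char   pv[1];    /* really len + 1 bytes, sized at allocation time */
};

/* Step back from the string pointer to the header in front of it. */
static struct sketch_rcpv *sketch_rcpv_header(char *pv)
{
    return (struct sketch_rcpv *)(void *)
        (pv - offsetof(struct sketch_rcpv, pv));
}

/* Allocate header plus string in one block; hand out only the string. */
static char *sketch_rcpv_new(const char *src)
{
    size_t len = strlen(src);
    struct sketch_rcpv *r = malloc(offsetof(struct sketch_rcpv, pv) + len + 1);
    if (r == NULL)
        return NULL;
    r->refcount = 1;
    r->len = len;
    char *pv = (char *)r + offsetof(struct sketch_rcpv, pv);
    memcpy(pv, src, len + 1);
    return pv;
}

/* Take another reference; the caller gets the same pointer back. */
static char *sketch_rcpv_copy(char *pv)
{
    sketch_rcpv_header(pv)->refcount++;
    return pv;
}

/* Drop a reference; returns the remaining count (0 means freed). */
static size_t sketch_rcpv_free(char *pv)
{
    struct sketch_rcpv *r = sketch_rcpv_header(pv);
    assert(r->refcount > 0);
    if (--r->refcount == 0) {
        free(r);
        return 0;
    }
    return r->refcount;
}

/* Length lookup without calling strlen(). */
static size_t sketch_rcpv_len(char *pv)
{
    return sketch_rcpv_header(pv)->len;
}
```

Because callers only ever hold the char *, code that treats the value as an ordinary NUL-terminated string keeps working, while code that knows about the header can reach the count and length by pointer arithmetic.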
* Add sv_derived_from_hv() helper functionPaul "LeoNerd" Evans2022-10-251-0/+1
|
* Better handling of builtin CV attributesPaul "LeoNerd" Evans2022-10-251-0/+1
| | | | | | | | | | | The previous code would handle subroutine attributes directly against `PL_compcv` as a side-effect of merely parsing the syntax in `yyl_colon()`, an unlikely place for anyone to find it. This complicates the way the parser works. The new structure creates a new function to apply all the builtin attributes out of an attribute list to any given CV, and invokes it from the parser at a slightly better time.
* dump.c - add ways to dump HV's and AV's and SV's to any depth.Yves Orton2022-10-181-0/+3
| | | | | | | | | | | | | | | | | | | Currently you can use sv_dump() to dump an AV or HV as these are still SV's underneath in C terms, but it will only dump out the top level object and will not dump out its contents, whereas if you have an RV which references the same AV or HV it will dump it out to a depth of 4. This adds av_dump() and hv_dump() which dump up to a depth of 3 (thus matching what sv_dump() would have shown had it been used to dump an RV to the same object). It also adds sv_dump_depth() which allows passing in an arbitrary depth. You could argue the former are redundant in light of sv_dump_depth(), but the av_dump() and hv_dump() variants do not require a cast for their arguments. These functions are provided as debugging aids for development. They aren't used directly in the core, and they all wrap the same core routine that is used for sv_dump() (do_sv_dump()).
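The relationship between a fixed-depth dumper and a depth-taking one can be sketched generically in C. Nothing here uses the perl API: the node type and both functions are hypothetical, and the exact meaning of "depth" (levels allowed below the top-level object) is an assumption of this sketch rather than a statement about do_sv_dump().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical nested value; one child per node keeps the sketch short. */
struct sketch_node {
    const char               *label;
    const struct sketch_node *child;
};

/* Dump `n`, descending at most `depth` levels below the starting node.
 * Returns the number of nodes written. */
static int sketch_dump_depth(FILE *out, const struct sketch_node *n,
                             int level, int depth)
{
    assert(out != NULL);
    if (n == NULL)
        return 0;
    fprintf(out, "%*s%s\n", level * 2, "", n->label);
    if (level >= depth)
        return 1;                     /* depth budget used up: stop here */
    return 1 + sketch_dump_depth(out, n->child, level + 1, depth);
}

/* Convenience wrapper with a fixed depth of 3 and no extra arguments. */
static int sketch_dump(FILE *out, const struct sketch_node *n)
{
    return sketch_dump_depth(out, n, 0, 3);
}
```

The wrapper exists for the same reason given in the commit message: a caller with a specific type in hand gets a short call with sensible defaults, while the general routine stays available when a different depth is wanted.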
* locale.c: Don't compile get_displayable_string if unusedKarl Williamson2022-10-181-0/+2
| | | | This was generating warnings on several different platforms
* Switch libc per-interpreter data when tTHX changesKarl Williamson2022-10-181-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As noted in the previous commit, some library functions now keep per-thread state. So far the only ones we care about are libc locale-changing ones. When perl changes threads by swapping out tTHX, those library functions need to be informed about the new value so that they remain in sync with what perl thinks the locale should be. This commit creates a function to do this, and changes the thread-changing macros to also call this as part of the change. For POSIX 2008, the function just calls uselocale() using the per-interpreter object introduced previously. For Windows, this commit adds a per-interpreter string of the current LC_ALL, and the function calls setlocale on that. We keep the same string for POSIX 2008 implementations that lack querylocale(), so this commit just enables that variable on Windows as well. The code is already in place to free the memory the string occupies when done. The commit also creates a mechanism to skip this during thread destruction. A thread in its death throes doesn't need to have accurate locale information, and the information needed to map from thread to what libc needs to know gets destroyed as part of those throes, while relics of the thread remain. I couldn't find a way to accurately know if we are dealing with a relic or not, so the solution I adopted was to just not switch during destruction. This commit completes fixing #20155.
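For the POSIX 2008 case, the idea of informing libc whenever the current interpreter changes can be sketched as follows. The struct, the global, and the function are hypothetical and much simpler than perl's context-switching macros; only newlocale(), uselocale(), and freelocale() are real library calls. The in_destruct flag mirrors the "skip this during thread destruction" mechanism in spirit only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>

/* Hypothetical per-interpreter state. */
struct sketch_interp {
    locale_t cur_locale;   /* locale object this interpreter expects */
    int      in_destruct;  /* nonzero while the interpreter is dying */
};

/* Hypothetical "current interpreter" pointer (tTHX in spirit). */
static struct sketch_interp *sketch_current;

/* Switch interpreters and keep libc's per-thread locale in step. */
static void sketch_set_context(struct sketch_interp *next)
{
    sketch_current = next;
    if (next == NULL || next->in_destruct)
        return;            /* no locale switch for a dying interpreter */
    locale_t previous = uselocale(next->cur_locale);
    assert(previous != (locale_t)0);
    (void)previous;
}
```

A Windows variant would, per the commit message, call setlocale() on a saved per-interpreter LC_ALL string instead of uselocale(); that path is not sketched here.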
* locale.c: Compile display fcn under more circumstancesKarl Williamson2022-10-181-1/+3
| | | | | This is in preparation for it to be used in more instances in future commits. It uses a symbol that won't be defined until those commits.
* locale.c: Make win32_setlocale return const *Karl Williamson2022-10-101-1/+1
| | | | | Adds a bit of safety, and makes it correspond to the other setlocale returns we use.
* Add some const to wrap_wsetlocaleKarl Williamson2022-10-101-1/+2
| | | | And move declarations closer to first use as allowed in C99
* locale.c: Generalize static functionsKarl Williamson2022-10-101-2/+4
| | | | | | | | | This changes these functions to take the code page as input, instead of being just UTF-8. Macros are created to call them with UTF-8. I'm doing this because there is no loss of efficiency, and it is somewhat jarring, given Perl terminology, to call a function with 'Byte' in the name with a parameter with 'utf8' in the name.
* locale.c: Make static 2 Win-only functionsKarl Williamson2022-10-101-2/+2
| | | | | | | These are non-API, used in this file, and because of #ifdefs, not accessible outside it, so there is no current need to make them publicly available. If we were ever to need them to be accessible more widely, they would not belong in this file.
* locale.c: Meld two functions into oneKarl Williamson2022-10-101-2/+3
| | | | | | There is code in locale.c to emulate POSIX 'setlocale(foo, "")'. And there is separate code to emulate this on Windows. This commit collapses them, ensuring the same algorithm is used on both systems.
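What emulating setlocale(foo, "") involves can be sketched with the standard POSIX lookup order: LC_ALL first, then the variable named after the category, then LANG, with empty values treated as unset. The function name below is hypothetical, the final "C" fallback is an assumption of this sketch, and the merged perl function may well handle more cases than this.

```c
#define _POSIX_C_SOURCE 200809L  /* for setenv()/unsetenv() in callers */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Return the locale name the environment selects for one category,
 * e.g. category_name = "LC_NUMERIC". */
static const char *sketch_locale_from_env(const char *category_name)
{
    assert(category_name != NULL);
    const char *names[3] = { "LC_ALL", category_name, "LANG" };
    for (size_t i = 0; i < 3; i++) {
        const char *value = getenv(names[i]);
        if (value != NULL && value[0] != '\0')
            return value;          /* first non-empty variable wins */
    }
    return "C";                    /* assumed fallback for this sketch */
}
```

Using one routine like this on every platform is what guarantees that the same precedence rules apply everywhere, which is the stated point of melding the two functions.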
* locale.c: Move find_locale_from_environment() in fileKarl Williamson2022-10-101-1/+3
| | | | | This is in preparation for this function to be used under more circumstances.
* Add wrap_wsetlocale() to embed.fncKarl Williamson2022-10-101-0/+2
| | | | This makes the calls to it cleaner.
* Add pTHX to thread_locale_(init|term)Karl Williamson2022-09-301-2/+2
| | | | | A future commit will want the context for more than just DEBUGGING builds.
* locale.c: Revamp sync_locale(), switch_to_global_locale()Karl Williamson2022-09-291-2/+2
| | | | | | In reading this code, I realized that there were instances where the functions didn't work properly. It is hard to test these, but a future commit will do so.
* locale.c Change function to return a string, not printKarl Williamson2022-09-291-3/+1
| | | | | This makes some print statements less awkward, and is more flexible, which will be used in future commits
* locale.c: Stop compiler warningKarl Williamson2022-09-251-0/+3
| | | | | | | | S_less_dicey_bool_setlocale_r() is a short function that makes a complete set of similar functions, but there is no current use of it. So just #ifdef it out. This resolves #20338
* SvPVCLEAR_FRESH - change from macro to inline functionRichard Leach2022-09-221-0/+1
| | | | | This is to prevent warnings due to the char * frequently being unused.
* locale.c: Refactor internal debugging functionKarl Williamson2022-09-221-3/+5
| | | | | setlocale_debug_string() variants now use Perl_form, a function I didn't know existed when I originally wrote this code.
* locale.c: Mitigate unsafe threaded localesKarl Williamson2022-09-211-0/+15
| | | | | | | | | | | | | | | | This is a new set of macros and functions to do locale changing and querying for platforms where perl is compiled with threads, but the platform doesn't have thread-safe locale handling. All it does is: 1) the return of setlocale() is always safely saved in a per-thread buffer, and 2) setlocale() is protected by a mutex from other threads which are using perl's locale functions. This isn't much, but it might be enough to get some programs to work on such platforms, provided they rarely change or query the locale.
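The two mitigations listed above, a per-thread copy of the result and a mutex around the call, can be sketched in a few lines of C. The names and the fixed 256-byte buffer are hypothetical choices for this sketch and not the macros or functions the commit adds; it also assumes a compiler with C11 _Thread_local support.

```c
#include <assert.h>
#include <locale.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical lock shared by every caller of the wrapper. */
static pthread_mutex_t sketch_locale_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical per-thread buffer holding the last result. */
static _Thread_local char sketch_locale_buf[256];

/* Call setlocale() under the mutex and copy its result into this
 * thread's buffer before releasing the lock. Returns NULL on failure
 * or if the name does not fit in the buffer. */
static const char *sketch_safe_setlocale(int category, const char *locale)
{
    const char *result = NULL;
    int rc = pthread_mutex_lock(&sketch_locale_mutex);
    assert(rc == 0);
    (void)rc;
    const char *raw = setlocale(category, locale);
    if (raw != NULL && strlen(raw) < sizeof sketch_locale_buf) {
        strcpy(sketch_locale_buf, raw);
        result = sketch_locale_buf;
    }
    pthread_mutex_unlock(&sketch_locale_mutex);
    return result;
}
```

As the commit message itself notes, this only protects callers that go through the wrapper; a thread calling setlocale() directly can still race with them.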
* make expr parameter to newLOOPOP() NN, it was required anywayTony Cook2022-09-211-1/+1
| | | | | | | | | | Coverity picked up that while we checked that expr was non-NULL in newLOOPOP(), the call to new_logop() dereferenced it anyway: the address of expr is passed to new_logop(), which sees it as firstp (so firstp == &expr); that is then dereferenced to first (first == expr), and the code then switches on first->op_type, hence dereferencing expr. Since both callers always pass in an expr value, and the code would have crashed anyway with a NULL expr, make it required.
* locale.c: Silence compiler warning when no LC_COLLATEKarl Williamson2022-09-101-7/+11
| | | | | On Configurations without LC_COLLATE, various unused warnings were being generated.
* locale.c: Silence compiler warning when no LC_CTYPEKarl Williamson2022-09-101-1/+3
| | | | | On Configurations without LC_CTYPE, various unused warnings were being generated.
* locale.c: Silence compiler warning about S_mortalized_pv_copyKarl Williamson2022-09-101-1/+1
| | | | | This function is not used unless locales are enabled, so need not be defined unless that is true.
* locale.c: Silence compiler warning about S_new_numericKarl Williamson2022-09-101-0/+2
| | | | | This function is not used unless LC_NUMERIC is enabled, so need not be defined unless that is true.
* Stop parsing on first syntax error.Yves Orton2022-09-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | We try to keep parsing after many types of errors, up to a (current) maximum of 10 errors. Continuing after a semantic error (like undeclared variables) can be helpful, for instance showing a set of common errors, but continuing after a syntax error isn't helpful most of the time, as the internal state of the parser can get confused and is not reliably restored in between attempts. This can sometimes produce completely bizarre errors which just obscure the true error, and has resulted in security tickets being filed in the past. This patch makes the parser stop after the first syntax error, while preserving the current behavior for other errors. An error is considered a syntax error if the error message from our internals is the literal text "syntax error". This may not be a complete list of true syntax errors; we can iterate on that in the future. This fixes the segfaults reported in Issue #17397 and #16944, and likely fixes other "segfault due to compiler continuation after syntax error" bugs that we have on record, which has been a recurring issue over the years.
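The stopping rule described above, keep going for up to 10 errors but stop at once on the literal message "syntax error", can be written down as a small C sketch. The struct and function are hypothetical and unrelated to the real parser state; only the two conditions come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_MAX_ERRORS 10   /* the "(current) maximum of 10 errors" */

/* Hypothetical parser bookkeeping. */
struct sketch_parser {
    int  error_count;
    bool stopped;
};

/* Record one error. Returns true if parsing may continue. */
static bool sketch_report_error(struct sketch_parser *p, const char *msg)
{
    assert(p != NULL && msg != NULL);
    p->error_count++;
    if (strcmp(msg, "syntax error") == 0)
        p->stopped = true;                  /* first syntax error: stop */
    else if (p->error_count >= SKETCH_MAX_ERRORS)
        p->stopped = true;                  /* too many other errors */
    return !p->stopped;
}
```

Matching on the message text is deliberately narrow, which is why the commit message calls the list of recognised syntax errors possibly incomplete.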
* locale.c: Add S_get_LC_ALL_display()Karl Williamson2022-09-081-4/+10
| | | | | | | This encapsulates a common paradigm, helpful for debugging. It requires calculate_LC_ALL to be additionally available when there is no LC_ALL.
* embed.fnc: Fix indentationKarl Williamson2022-09-071-1/+1
|
* locale.c: Convert final use of S_category_name()Karl Williamson2022-09-021-1/+0
| | | | | The previous commit removed all but one use of this function, which is replaceable by an array lookup