path: root/embed.h
Commit message | Author | Age | Files | Lines
* Add `forbid_outofblock_ops()` to op.cPaul "LeoNerd" Evans2022-12-171-0/+1
| | | | | Adds a new function to statically detect forbidden control flow out of a block.
* Added function amagic_find(sv, method, flags)Eric Herman2022-12-161-0/+1
| | | | | | | Returns the CV pointer to the overloaded method, which will be needed by join to detect concat magic. Co-authored-by: Philippe Bruhat (BooK) <book@cpan.org>
* regcomp.c - decompose into smaller filesYves Orton2022-12-091-99/+113
| | | | | | | | | | | | | | | | | This splits a bunch of the subcomponents of the regex engine into smaller files: regcomp_debug.c, regcomp_internal.h, regcomp_invlist.c, regcomp_study.c, and regcomp_trie.c. The only real change, besides the build machinery changes needed to achieve the split, is the addition of some new defines which can be used in embed.fnc to control exports without having to enumerate /every/ regex engine file. For instance, all of regcomp*.c define PERL_IN_REGCOMP_ANY, and this is used in embed.fnc to manage exports.
* Define a PL_infix_plugin hook, of a similar style to PL_keyword_pluginPaul "LeoNerd" Evans2022-12-081-0/+2
| | | | | | | | | Runs for identifier-named custom infix operators and sequences of non-identifier symbol characters. Defines multiple precedence levels for custom infix operators that fit alongside exponentiation, multiplication, addition, or relational comparison operators, as well as a "high" and "low" at either end.
* Define a newPADxVOP() convenience functionPaul "LeoNerd" Evans2022-12-081-0/+1
| | | | | | | | | This function conveniently sets the ->op_targ field of the returned op, making it neater to use inline in larger trees of new*OP functions used to build optree fragments. This function is implemented as `static inline`, for speed and code-size reasons.
* locale.c: Rewrite localeconv() handlingKarl Williamson2022-12-071-17/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | localeconv() returns a structure containing fields that are associated with two different categories: LC_NUMERIC and LC_MONETARY. Perl via POSIX::localeconv() returns a hash containing all the fields.
Testing on Windows showed that if LC_CTYPE is not the same locale as LC_MONETARY for the monetary fields, or isn't the same as LC_NUMERIC for the numeric ones, mojibake can result.
The solution to similar situations elsewhere in the code is to toggle LC_CTYPE into being the same locale as the one for the returned fields. But those situations only have a single locale that LC_CTYPE has to match, so it doesn't work here when LC_NUMERIC and LC_MONETARY are different locales. Unlike Schrödinger's cat, LC_CTYPE has to be one or the other, not both at the same time.
The previous implementation did not consider this possibility, and wasn't easily changeable to work. Therefore, this rewrites a bunch of it. The solution used is to call localeconv() twice when the LC_NUMERIC locale and the LC_MONETARY locale don't match (with LC_CTYPE toggled to the corresponding one each time). (Only one call is made if the two categories have the same locale.) Handling one call versus two complicates the code, but I thought it was worth it given that the one call is the most likely case.
Another complication is that on platforms that lack nl_langinfo() (Windows, for example), localeconv() is used to emulate portions of it. Previously there was a separate function to handle this, using an SV() cast as an HV() to avoid using a hash that wasn't actually necessary. That proved to lead to extra duplicated code under the new scheme, so that function was collapsed into a single one and a real hash is used in all circumstances, but is only populated with the one or two fields needed for the emulation.
The only part of this commit that I thought could be split off from the rest concerns the fact that localeconv()'s return is not thread-safe, and so must be copied to a safe place (the hash) while in a critical section, locking out all other threads. Before this commit, that copying was accompanied by determining if each string field needed to be marked as UTF-8. That determination isn't necessarily trivial, so should really not be in the critical section. This commit moves that determination out of the critical section. With some effort, that part could have been split into a separate commit, but I didn't think it was worth the effort.
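The critical-section point above can be illustrated with a minimal sketch. It assumes a POSIX threads mutex and a fixed-size private struct in place of Perl's hash; `conv_copy`, `snapshot_localeconv()` and `has_non_ascii()` are hypothetical names, and the non-ASCII scan merely stands in for the real UTF-8ness determination, which is not shown in this log.

```c
#include <assert.h>
#include <locale.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical private copy of two localeconv() fields. */
struct conv_copy {
    char decimal_point[16];   /* governed by LC_NUMERIC  */
    char currency_symbol[16]; /* governed by LC_MONETARY */
};

/* Assumed: one process-wide mutex guarding locale access. */
static pthread_mutex_t locale_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Copy src into a fixed-size field, always NUL-terminating. */
static void copy_field(char *dst, size_t size, const char *src)
{
    assert(dst != NULL && size > 0);
    snprintf(dst, size, "%s", src ? src : "");
}

/* Snapshot the static localeconv() result while holding the mutex.
 * Only plain copying happens inside the critical section. */
static void snapshot_localeconv(struct conv_copy *out)
{
    pthread_mutex_lock(&locale_mutex);
    const struct lconv *lc = localeconv();
    copy_field(out->decimal_point, sizeof out->decimal_point,
               lc->decimal_point);
    copy_field(out->currency_symbol, sizeof out->currency_symbol,
               lc->currency_symbol);
    pthread_mutex_unlock(&locale_mutex);
}

/* Post-processing done on the private copy, outside the lock.
 * A crude placeholder: reports whether any byte is non-ASCII. */
static int has_non_ascii(const char *s)
{
    for (; *s != '\0'; s++)
        if ((unsigned char) *s >= 0x80)
            return 1;
    return 0;
}
```

A caller would run `snapshot_localeconv()` first and only then examine the copied strings, so other threads are locked out for no longer than the copy takes.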
* locale.c: Move 2 functions elsewhere in the codeKarl Williamson2022-12-071-2/+2
| | | | | This is in preparation for them to be called on platforms where locale handling is not enabled.
* locale.c: Rmv unnecessary parameter from static functionKarl Williamson2022-12-011-1/+1
| | | | This dates from an earlier implementation
* Recognise `//=` and `||=` syntax in signature parameter defaultsPaul "LeoNerd" Evans2022-11-261-0/+1
| | | | | | These create parameters where the default expression is assigned whenever the caller did not pass a defined (or true) value. That is, the default applies both when the argument is missing and when it is present but undef (or false).
* cop.h - add support for refcounted filenames in cops under threadsYves Orton2022-11-011-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have a weird bifurcation of the cop logic around threads. With threads we use a char * cop_file member, without it we use a GV * and replace cop_file with cop_filegv. The GV * code refcounts filenames and more or less efficiently shares the filename amongst many opcodes.
However under threads we were simply copying the filenames into each opcode. This is because in theory opcodes created in one thread can be destroyed in another. I say in theory because as far as I know the core code does not actually do this. But we have tests showing that you can construct a perl, clone it, and then destroy the original, and have the copy work just fine; this means that opcodes constructed in the main thread will be destroyed in the cloned thread. This in turn means that you can't put SV derived structures into the op-tree under threads, which is why we cannot use the GV * strategy under threads.
As such this code adds a new struct/type RCPV, which is a refcounted string using shared memory. This is implemented in such a way that code that previously used a char * can continue to do so, as the refcounting data is located at a specific offset before the char * pointer itself. This also allows the len data to be embedded "into" the PV, which allows us to expose macros to access the length of what is in theory a null terminated string.
    struct rcpv {
        UV refcount;
        STRLEN len;
        char pv[1];
    };
    typedef struct rcpv RCPV;
The struct is sized appropriately on creation in rcpv_new() so that the pv member contains the full string plus a null byte. It then returns a pointer to the pv member of the struct. Thus the refcount and length are embedded at a predictable offset in front of the char *, which means we do not have to change any types for members using this.
We provide three operations: rcpv_new(), rcpv_copy() and rcpv_free(), which roughly correspond with newSVpv(), SvREFCNT_inc(), SvREFCNT_dec(), and a handful of macros as well. We also expose SAVERCPVFREE which is similar to SAVEGENERICSV but operates on pv's constructed with rcpv_new().
Currently I have not restricted use of this logic to threaded perls. We simply do not use it in unthreaded perls, but I see no reason we couldn't normalize the code to use this in both cases, except possibly that actually the GV case is more efficient.
Note that rcpv_new() does NOT use a hash table to dedup strings. Two calls to rcpv_new() with the same arguments will produce two distinct pointers with their own refcount data.
Refcounting the cop_file data was Tony Cook's idea.
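The header-before-the-pointer layout described above can be tried out in isolation. What follows is a toy analogue, not the Perl implementation: it uses `size_t` and `malloc()` where the commit uses `UV`, `STRLEN` and shared memory, it is not thread-safe, and every `rcstr_*` name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy analogue of struct rcpv; field types are simplified. */
struct rcstr {
    size_t refcount;
    size_t len;
    char   pv[1];   /* over-allocated to hold the string plus NUL */
};

/* Recover the header from the char * that callers hold. */
static struct rcstr *rcstr_head(const char *pv)
{
    return (struct rcstr *) ((char *) pv - offsetof(struct rcstr, pv));
}

/* Allocate header and string in one block; return the string part. */
static char *rcstr_new(const char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    struct rcstr *head = malloc(offsetof(struct rcstr, pv) + len + 1);
    if (head == NULL)
        return NULL;
    head->refcount = 1;
    head->len = len;
    char *pv = (char *) head + offsetof(struct rcstr, pv);
    memcpy(pv, s, len + 1);
    return pv;
}

/* Take another reference; the same pointer is returned. */
static char *rcstr_copy(char *pv)
{
    rcstr_head(pv)->refcount++;
    return pv;
}

/* Drop a reference, freeing the block when the last one goes. */
static void rcstr_free(char *pv)
{
    struct rcstr *head = rcstr_head(pv);
    if (--head->refcount == 0)
        free(head);
}

/* Accessors that read the hidden header. */
static size_t rcstr_len(const char *pv)    { return rcstr_head(pv)->len; }
static size_t rcstr_refcnt(const char *pv) { return rcstr_head(pv)->refcount; }
```

Because callers only ever see a plain `char *`, existing code that treats the value as a NUL-terminated string keeps working, while the length is available without a `strlen()` call. As in the commit message, two `rcstr_new()` calls with the same text give two independent blocks.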
* Add sv_derived_from_hv() helper functionPaul "LeoNerd" Evans2022-10-251-0/+1
* Better handling of builtin CV attributesPaul "LeoNerd" Evans2022-10-251-0/+1
| | | | | | | | | | | The previous code would handle subroutine attributes directly against `PL_compcv` as a side-effect of merely parsing the syntax in `yyl_colon()`, an unlikely place for anyone to find it. This complicates the way the parser works. The new structure creates a new function to apply all the builtin attributes out of an attribute list to any given CV, and invokes it from the parser at a slightly better time.
* dump.c - add ways to dump HV's and AV's and SV's to any depth.Yves Orton2022-10-181-0/+3
| | | | | | | | | | | | | | | | | | | Currently you can use sv_dump() to dump an AV or HV as these are still SV's underneath in C terms, but it will only dump out the top level object and will not dump out its contents, whereas if you have an RV which references the same AV or HV it will dump it out to a depth of 4. This adds av_dump() and hv_dump() which dump up to a depth of 3 (thus matching what sv_dump() would have shown had it been used to dump an RV to the same object). It also adds sv_dump_depth() which allows passing in an arbitrary depth. You could argue the former are redundant in light of sv_dump_depth(), but the av_dump() and hv_dump() variants do not require a cast for their arguments. These functions are provided as debugging aids for development. They aren't used directly in the core, and they all wrap the same core routine that is used for sv_dump() (do_sv_dump()).
* locale.c: Don't compile get_displayable_string if unusedKarl Williamson2022-10-181-1/+5
| | | | This was generating warnings on several different platforms
* locale.c: Compile display fcn under more circumstancesKarl Williamson2022-10-181-1/+7
| | | | | This is in preparation for it to be used in more instances in future commits. It uses a symbol that won't be defined until those commits.
* locale.c: Generalize static functionsKarl Williamson2022-10-101-2/+2
| | | | | | | | | This changes these functions to take the code page as input, instead of being just UTF-8. Macros are created to call them with UTF-8. I'm doing this because there is no loss of efficiency, and it is somewhat jarring, given Perl terminology, to call a function with 'Byte' in the name with a parameter with 'utf8' in the name.
* locale.c: Make static 2 Win-only functionsKarl Williamson2022-10-101-8/+2
| | | | | | | These are non-API, used in this file, and because of #ifdefs, not accessible outside it, so there is no current need to make them publicly available. If we were ever to need them to be accessible more widely, they would not belong in this file.
* locale.c: Meld two functions into oneKarl Williamson2022-10-101-4/+4
| | | | | | There is code in locale.c to emulate POSIX 'setlocale(foo, "")'. And there is separate code to emulate this on Windows. This commit collapses them, ensuring the same algorithm is used on both systems.
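For readers unfamiliar with what `setlocale(foo, "")` has to emulate, here is a minimal sketch of the POSIX lookup order: `LC_ALL` first, then the category's own variable, then `LANG`, with empty values treated as unset. It is not the locale.c algorithm. `lookup_fn`, `env_lookup()`, `demo_lookup()` and `locale_from()` are hypothetical names, and the final `"C"` fallback is an assumption, since POSIX leaves the default implementation-defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* A lookup is injected so the precedence logic can be exercised
 * without touching the real environment. */
typedef const char *(*lookup_fn)(const char *name);

/* Real environment lookup. */
static const char *env_lookup(const char *name)
{
    return getenv(name);
}

/* Fixed table for demonstration: LC_ALL is set but empty. */
static const char *demo_lookup(const char *name)
{
    if (strcmp(name, "LC_ALL") == 0)     return "";
    if (strcmp(name, "LC_NUMERIC") == 0) return "de_DE.UTF-8";
    if (strcmp(name, "LANG") == 0)       return "en_US.UTF-8";
    return NULL;
}

/* Return the locale name that applies to the given category variable
 * (for example "LC_NUMERIC"), following LC_ALL > LC_xxx > LANG. */
static const char *locale_from(lookup_fn lookup, const char *category_var)
{
    assert(lookup != NULL && category_var != NULL);
    const char *names[] = { "LC_ALL", category_var, "LANG" };

    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        const char *value = lookup(names[i]);
        if (value != NULL && value[0] != '\0')
            return value;
    }
    return "C";   /* assumed default when nothing is set */
}
```

With the demonstration table, `locale_from(demo_lookup, "LC_NUMERIC")` picks the category's own value because `LC_ALL` is empty, while `locale_from(demo_lookup, "LC_TIME")` falls through to `LANG`. Passing `env_lookup` consults the real environment instead.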
* locale.c: Move find_locale_from_environment() in fileKarl Williamson2022-10-101-1/+3
| | | | | This is in preparation for this function to be used under more circumstances.
* Add wrap_wsetlocale() to embed.fncKarl Williamson2022-10-101-0/+1
| | | | This makes the calls to it cleaner.
* Add pTHX to thread_locale_(init|term)Karl Williamson2022-09-301-2/+2
| | | | | A future commit will want the context for more than just DEBUGGING builds.
* locale.c: Revamp sync_locale(), switch_to_global_locale()Karl Williamson2022-09-291-2/+2
| | | | | | In reading this code, I realized that there were instances where the functions didn't work properly. It is hard to test these, but a future commit will do so.
* locale.c Change function to return a string, not printKarl Williamson2022-09-291-1/+1
| | | | | This makes some print statements less awkward, and is more flexible, which will be used in future commits
* locale.c: Stop compiler warningKarl Williamson2022-09-251-1/+9
| | | | | | | | S_less_dicey_bool_setlocale_r() is a short function that makes a complete set of similar functions, but there is no current use of it. So just #ifdef it out. This resolves #20338
* SvPVCLEAR_FRESH - change from macro to inline functionRichard Leach2022-09-221-0/+1
| | | | | This is to prevent warnings due to the char * frequently being unused.
* locale.c: Refactor internal debugging functionKarl Williamson2022-09-221-1/+1
| | | | | setlocale_debug_string() variants now use Perl_form, a function I didn't know existed when I originally wrote this code.
* locale.c: Mitigate unsafe threaded localesKarl Williamson2022-09-211-0/+11
| | | | | | | | | | | | | | | | This is a new set of macros and functions to do locale changing and querying for platforms where perl is compiled with threads, but the platform doesn't have thread-safe locale handling. All it does is: 1) The return of setlocale() is always safely saved in a per-thread buffer, and 2) setlocale() is protected by a mutex from other threads which are using perl's locale functions. This isn't much, but it might be enough to get some programs to work on such platforms which rarely change or query the locale.
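The two measures listed above can be sketched in a few lines of C11. This is only an illustration under stated assumptions: a POSIX threads mutex, a fixed 256-byte `_Thread_local` buffer, and the hypothetical name `guarded_setlocale()`. The macros and functions the commit actually adds are not shown in this log.

```c
#include <assert.h>
#include <locale.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Assumed: one mutex shared by every caller of the wrapper. */
static pthread_mutex_t setlocale_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Per-thread copy of the last setlocale() result. */
static _Thread_local char saved_locale[256];

/* Call setlocale() under the mutex and return a per-thread copy of
 * its result, or NULL if setlocale() itself failed. */
static const char *guarded_setlocale(int category, const char *locale)
{
    const char *result = NULL;

    pthread_mutex_lock(&setlocale_mutex);
    const char *raw = setlocale(category, locale);
    if (raw != NULL) {
        /* A real implementation would grow the buffer instead. */
        assert(strlen(raw) < sizeof saved_locale);
        snprintf(saved_locale, sizeof saved_locale, "%s", raw);
        result = saved_locale;
    }
    pthread_mutex_unlock(&setlocale_mutex);

    return result;
}
```

The returned pointer stays valid for the calling thread until that thread calls the wrapper again, regardless of what other threads do. As the commit message says, this does nothing for code that calls `setlocale()` directly or that depends on the locale staying unchanged between calls.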
* locale.c: Silence compiler warning when no LC_COLLATEKarl Williamson2022-09-101-2/+6
| | | | | On Configurations without LC_COLLATE, various unused warnings were being generated.
* locale.c: Silence compiler warning when no LC_CTYPEKarl Williamson2022-09-101-2/+4
| | | | | On Configurations without LC_CTYPE, various unused warnings were being generated.
* locale.c: Silence compiler warning about S_mortalixzed_pv_copyKarl Williamson2022-09-101-1/+1
| | | | | This function is not used unless locales are enabled, so need not be defined unless that is true.
* locale.c: Silence compiler warning about S_new_numericKarl Williamson2022-09-101-1/+3
| | | | | This function is not used unless LC_NUMERIC is enabled, so need not be defined unless that is true.
* locale.c: Add S_get_LC_ALL display()Karl Williamson2022-09-081-6/+15
| | | | | | | This encapsulates a common paradigm, helpful for debugging. It requires calculate_LC_ALL to be additionally available when there is no LC_ALL.
* locale.c: Convert final use of S_category_name()Karl Williamson2022-09-021-1/+0
| | | | | The previous commit removed all but one use of this function, which is replaceable by an array lookup
* locale.c: Rmv no longer used code; UTF8ness cacheKarl Williamson2022-09-021-5/+0
| | | | | | | | | | | | | What these functions do has been subsumed by code introduced in previous commits, and in a more straightforward manner. Also removed in this commit is the cache recording which locales are UTF-8 and which are not. This data is now cheaper to calculate when needed, and there is now a single entry cache, so I don't think the complexity warrants keeping it. It could be added back if necessary, split off from the remainder of this commit.
* Move utf8ness calc for $! into locale.c from mg.cKarl Williamson2022-09-021-1/+4
| | | | | | | | | | | locale.c has the infrastructure to handle this, so remove repeated logic. The removed code tried to discern better based on using script runs, but this actually doesn't help, so is removed. Since we're now using C99, we can remove the block that was previously needed, and now the code is properly indented, whereas before it wasn't
* Define print_bytes_for_locale() outside localeKarl Williamson2022-09-021-1/+1
| | | | A future commit will need this even when locales are not used.
* locale.c: Improve debugging for mem_collxfrm()Karl Williamson2022-09-011-1/+1
| | | | | | This prints out more information, better organized. It also moves up the info from -DLv to plain -DL
* Change internal function name to be standards compliantKarl Williamson2022-09-011-5/+5
| | | | | This is in preparation for working on it; the new name, mem_collxfrm_ is in compliance with the C Standard; the old was not.
* locale.c: Rmv S_set_numeric_radix()Karl Williamson2022-09-011-1/+0
| | | | | | | Previous commits have made this function much smaller, and its branches can be easily absorbed into the callers, with clearer code, and in fact removal of a redundant calculation of the locale's radix character, promised in a previous commit's message
* locale.c: Add utf8ness return param to my_langinfo_i()Karl Williamson2022-08-221-2/+2
| | | | | my_langinfo_i() now will additionally return the UTF-8ness of the returned string.
* Add my_strftime8()Karl Williamson2022-08-221-0/+1
| | | | | This is like plain my_strftime(), but additionally returns an indication of the UTF-8ness of the returned string
* locale.c: Collapse duplicate logic into one instanceKarl Williamson2022-08-221-2/+11
| | | | | | A previous commit moved the logic for localeconv() into locale.c. This commit takes advantage of that to use it instead of repeating the logic. Notably, this commit removes the inconsistent duplicate logic that had been used to deal with the Windows broken localeconv() bug.
* Move POSIX::localeconv() logic to locale.cKarl Williamson2022-08-221-0/+8
| | | | | | | | | | | | The code currently in POSIX.xs is moved to locale.c, and reworked some to fit in that scheme, and the logic for the workaround for the Windows broken localeconv() is made more robust. This is in preparation for the next commit which will use this logic instead of (imperfectly) duplicating it. This also creates Perl_localeconv() for direct XS calls of this functionality.
* locale.c: Add fcn for UTF8ness determinationKarl Williamson2022-08-221-0/+1
| | | | | get_locale_string_utf8ness_i() will determine if the string it is passed in the locale it is passed is to be treated as UTF-8, or not.
* locale.c: Add is_locale_utf8()Karl Williamson2022-08-221-0/+1
| | | | | | | | | | | Previous commits have added the infrastructure to be able to determine if a locale is UTF-8. This will prove useful, and this commit adds a function to encapsulate this information, and uses it in a couple of places, with more to come in future commits. This uses as a final fallback, mbtowc(), supposed to be available in C99. Future commits will add heuristics when that function isn't available or is known to be unreliable on a particular system.
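The mbtowc() fallback mentioned above can be approximated as follows. This is a heuristic sketch, not the check locale.c performs: it asks the current LC_CTYPE locale to decode the UTF-8 bytes of U+20AC and treats "one character, three bytes long" as evidence of UTF-8. The function name `ctype_looks_utf8()` and the choice of test character are assumptions.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Returns 1 if the current LC_CTYPE locale decodes the UTF-8
 * encoding of U+20AC (EURO SIGN) as a single three-byte character,
 * 0 otherwise. */
static int ctype_looks_utf8(void)
{
    static const char euro[] = "\xE2\x82\xAC";
    wchar_t wc = 0;

    (void) mbtowc(NULL, NULL, 0);   /* reset any shift state */
    int len = mbtowc(&wc, euro, sizeof euro - 1);

    /* mbtowc() never consumes more than MB_CUR_MAX bytes. */
    assert(len <= (int) MB_CUR_MAX);

#ifdef __STDC_ISO_10646__
    /* wchar_t holds Unicode code points, so check the value too. */
    return len == 3 && wc == 0x20AC;
#else
    return len == 3;
#endif
}
```

In a single-byte locale the call either fails or consumes one byte, so the function returns 0. A check like this relies on mbtowc() behaving correctly, which is the reason the commit message anticipates extra heuristics for systems where it is missing or unreliable.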
* New signature for static fcn my_langinfo()Karl Williamson2022-08-221-2/+2
| | | | | | | | | | This commit changes the calling sequence for my_langinfo to add the desired locale, and the locale category of the desired item. This allows the function to be able to return the desired value for any locale, avoiding some locale changes that would happen until this commit, and hiding the need for locale changes from outside functions, though a couple continue to do so to avoid potential multiple changes.
* locale.c: Add static fcn to analyze locale name codesetKarl Williamson2022-08-221-0/+1
| | | | | | | | It determines if the name indicates it is UTF-8 or not. There are several variant spellings in use, and this hides that from the callers. It won't actually be used until the next commit
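To make the "variant spellings" concrete, here is a small sketch that inspects the codeset portion of a POSIX-style locale name. The accepted spellings (case-insensitive `utf`, an optional hyphen, then `8`) are an assumption for illustration; the exact set that locale.c recognises is not listed in this log, and `name_says_utf8()` is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Returns 1 if the codeset part of a locale name such as
 * "en_US.UTF-8@euro" spells UTF-8 as "UTF-8", "utf8", "UTF8" and
 * so on; returns 0 otherwise, including when there is no codeset. */
static int name_says_utf8(const char *name)
{
    assert(name != NULL);

    const char *dot = strchr(name, '.');
    if (dot == NULL)
        return 0;                       /* no codeset given */

    const char *p = dot + 1;
    for (size_t i = 0; i < 3; i++, p++)
        if (tolower((unsigned char) *p) != "utf"[i])
            return 0;

    if (*p == '-')                      /* optional hyphen */
        p++;
    if (*p != '8')
        return 0;
    p++;

    /* The codeset must end here or be followed by an @modifier. */
    return *p == '\0' || *p == '@';
}
```

For example, `name_says_utf8("de_DE.utf8@euro")` is true while `name_says_utf8("ja_JP.eucJP")` and `name_says_utf8("en_US.UTF-16")` are false. A name-based test like this is only a hint, since a locale name is free to omit or misstate its codeset.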
* locale.c: Make S_save_to_buffer() reentrantKarl Williamson2022-08-221-2/+2
| | | | | | | | | | | This makes my_langinfo() reentrant by adding parameters specifying where to store the result. This prepares for future commits, and fixes some minor bugs for XS writers, in that the claim was that the buffer in calling Perl_langinfo() was safe from getting zapped until the next call to it in the same thread. It turns out there were cases where, because of internal calls, the buffer did get zapped.
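The idea of passing in "where to store the result" can be shown with a generic sketch. It assumes heap buffers managed with `realloc()`, and `save_to_caller_buffer()` is a hypothetical name. The real S_save_to_buffer() signature is not given in this log.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy src into the caller-owned buffer *buf, growing it (and
 * updating *buf_size) when it is too small. Returns *buf, or NULL
 * if src is NULL or memory runs out. */
static const char *save_to_caller_buffer(const char *src,
                                         char **buf, size_t *buf_size)
{
    assert(buf != NULL && buf_size != NULL);

    if (src == NULL)
        return NULL;

    size_t need = strlen(src) + 1;
    if (need > *buf_size) {
        char *grown = realloc(*buf, need);
        if (grown == NULL)
            return NULL;
        *buf = grown;
        *buf_size = need;
    }

    memcpy(*buf, src, need);
    return *buf;
}
```

Because each caller supplies its own buffer, an internal call that saves a second string cannot overwrite the first, which is the kind of clobbering the commit message describes for Perl_langinfo().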
* embed.fnc: Also check for NL_LANGINFO_LKarl Williamson2022-08-221-2/+2
| | | | | | The preprocessor directives were only looking for plain nl_langinfo(). It's quite unlikely that a platform will have the '_l' version without also having the plain one. But this makes sure.
* locale.c: Shorten my_nl_langinfo() to my_langinfo()Karl Williamson2022-08-221-2/+2
| | | | The extra syllable(s) are unnecessary noise