path: root/locale.c
Commit message (author, date, files changed, lines -removed/+added)
...
* Switch libc per-interpreter data when tTHX changes (Karl Williamson, 2022-10-18, 1 file, -1/+54)
  As noted in the previous commit, some library functions now keep per-thread state. So far the only ones we care about are the libc locale-changing ones. When perl changes threads by swapping out tTHX, those library functions need to be informed about the new value so that they remain in sync with what perl thinks the locale should be. This commit creates a function to do this, and changes the thread-changing macros to also call it as part of the change. For POSIX 2008, the function just calls uselocale() using the per-interpreter object introduced previously. For Windows, this commit adds a per-interpreter string holding the current LC_ALL, and the function calls setlocale() on that. We keep the same string for POSIX 2008 implementations that lack querylocale(), so this commit just enables that variable on Windows as well. The code is already in place to free the memory the string occupies when done. The commit also creates a mechanism to skip this during thread destruction. A thread in its death throes doesn't need accurate locale information, and the information needed to map from the thread to what libc needs to know gets destroyed as part of those throes, while relics of the thread remain. I couldn't find a reliable way to know whether we are dealing with a relic or not, so the solution I adopted was simply not to switch during destruction. This commit completes the fix for #20155.
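A minimal sketch of the POSIX 2008 half of this idea, assuming a system with uselocale(). The struct `interp`, its fields, `current_interp`, and `switch_interp()` are hypothetical stand-ins for a Perl interpreter, its per-interpreter locale object, and the thread-changing macros; they are not Perl's actual names. The point it shows is that updating the C-level "current interpreter" pointer alone leaves libc's per-thread locale untouched, so the switch must also call uselocale(), except while the interpreter is being destroyed. The Windows path (calling setlocale() on a saved LC_ALL string) is omitted.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a Perl interpreter: only the fields this
 * sketch needs. */
struct interp {
    locale_t cur_locale_obj;   /* per-interpreter POSIX 2008 locale object */
    bool     in_destruct;      /* hypothetical "being torn down" flag */
};

/* Hypothetical analogue of the thread-context variable (tTHX). */
static struct interp *current_interp = NULL;

/* Switch the current interpreter and bring libc's per-thread locale along.
 * uselocale() acts only on the calling OS thread and knows nothing about
 * current_interp, so both must be updated together. */
static void switch_interp(struct interp *new_interp)
{
    current_interp = new_interp;

    if (new_interp == NULL || new_interp->in_destruct)
        return;                /* a dying interpreter needs no accurate locale */

    if (new_interp->cur_locale_obj != (locale_t) 0)
        uselocale(new_interp->cur_locale_obj);
}
```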
* Some locale operations need to be done in proper thread (Karl Williamson, 2022-10-18, 1 file, -21/+36)
  This is a step in solving #20155. The POSIX 2008 locale API introduces per-thread locales, but the previous global locale system is retained, probably for backward compatibility. The POSIX 2008 interface causes memory to be malloc'd that needs to be freed. In order to free it, the caller must first stop using that memory by switching to another locale. perl accomplishes this during termination by switching to the global locale, which is always available and doesn't need to be freed. Perl has long assumed that all that was needed to switch threads was to change out tTHX, because that structure was intended to hold all the information for a given thread. But it turns out that this doesn't work when some library independently holds information about the thread's state, and there are now some libraries that do that. What was happening in this case was that perl thought that switching tTHX was sufficient to change to a different thread in order to free the memory, and then used the POSIX 2008 function to change to the global locale so that the memory could be safely freed. But the POSIX 2008 function doesn't care about tTHX, and was typically operating on a different thread, so it changed that thread to the global locale instead of the intended one. Often that was the top-level thread, thread 0. That caused whatever thread it was to no longer be in the expected locale, and to no longer be thread-safe with regard to locales. This commit causes locale_term(), which has always been called from the actual terminating thread that POSIX 2008 knows about, to change to the global locale and free the memory. It also creates a new per-interpreter variable that effectively maps the tTHX thread to the associated POSIX 2008 memory.
  During perl_destruct(), it frees the memory this variable points to, instead of blindly assuming the memory to free is the current tTHX thread's. This fixes the symptoms associated with #20155, but doesn't solve the whole problem. In general, a library that has independent thread status needs to be updated to the new thread when Perl changes threads using tTHX. Future commits will do this.
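A minimal sketch of the termination rule, assuming a POSIX 2008 system; `struct interp` and `interp_locale_term()` are hypothetical names, not Perl's. A locale_t must not be freed while it is installed, and uselocale() affects only the calling OS thread, so the switch to LC_GLOBAL_LOCALE has to happen on the thread that installed the object. Freeing the object recorded in the interpreter, rather than whatever happens to be current, mirrors the per-interpreter mapping this commit introduces.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>

/* Hypothetical per-interpreter record of the locale object this
 * interpreter created, so teardown frees the right one. */
struct interp {
    locale_t cur_locale_obj;
};

/* Must run on the OS thread that installed interp->cur_locale_obj. */
static void interp_locale_term(struct interp *interp)
{
    locale_t obj = interp->cur_locale_obj;

    if (obj == (locale_t) 0 || obj == LC_GLOBAL_LOCALE)
        return;                          /* nothing was allocated */

    /* Stop using the object before freeing it; the global locale is
     * always available and never needs freeing. */
    if (uselocale((locale_t) 0) == obj)
        uselocale(LC_GLOBAL_LOCALE);

    freelocale(obj);
    interp->cur_locale_obj = (locale_t) 0;
}
```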
* locale.c: Do uselocale() earlier in init process (Karl Williamson, 2022-10-18, 1 file, -0/+5)
  This avoids some unnecessary steps that the next commit would turn into memory leaks.
* locale.c: Compile display fcn under more circumstances (Karl Williamson, 2022-10-18, 1 file, -2/+2)
  This is in preparation for it to be used in more instances in future commits. It uses a symbol that won't be defined until those commits.
* locale: Create special variable to hold current LC_ALL (Karl Williamson, 2022-10-18, 1 file, -24/+24)
  Some configurations require us to store the current locale for each category. Prior to this commit, this was done in the array PL_curlocales, with the entry for LC_ALL being the highest element. Future commits will need just the value for LC_ALL in some other configurations, without needing the rest of the array. This commit splits off the LC_ALL element into its own per-interpreter variable to accommodate those. It always needed special handling beyond the rest of the array elements anyway.
* locale.c: Silence unused var warning on freebsd (Karl Williamson, 2022-10-15, 1 file, -6/+1)
  This just moves some code out of #ifdefs so that the compiler sees it, decides it is always false, and almost certainly won't generate any code for it, but no longer warns.
* locale.c: Fix Debug statement on netbsd (Karl Williamson, 2022-10-13, 1 file, -2/+2)
  Other platforms declare the nl_item typedef as an int, but this one makes it a long. To portably output its value, cast it to a long and use the %ld format.
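A small illustration of that portable pattern; `format_nl_item()` is a hypothetical helper. Because nl_item's underlying type differs between platforms, widening it to long explicitly guarantees that the argument matches the %ld conversion everywhere.

```c
#define _POSIX_C_SOURCE 200809L
#include <langinfo.h>
#include <stddef.h>
#include <stdio.h>

/* Writes "item=<n>" into buf.  nl_item may be an int or a long depending
 * on the platform, so cast to long so that "%ld" always matches. */
static int format_nl_item(char *buf, size_t len, nl_item item)
{
    return snprintf(buf, len, "item=%ld", (long) item);
}
```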
* Use `LINE_Tf` for formatting line numbers (TAKAI Kousuke, 2022-10-13, 1 file, -2/+2)
* locale.c: Add explicit (line_t) cast to silence DEBUGGING build warnings (TAKAI Kousuke, 2022-10-13, 1 file, -2/+7)
  Warning fixed:

      locale.c:130:55: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 4 has type ‘int’ [-Wformat=]
        130 |     dSAVE_ERRNO; dTHX; PerlIO_printf(Perl_debug_log, "\n%s: %" LINE_Tf ": ", \
            |                                                      ^~~~~~~~~

  This warning may occur only on 32-bit builds.
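The general shape of this fix, as a sketch: `__LINE__` expands to an int, so passing it to a conversion meant for a wider unsigned type is a mismatch that -Wformat reports. `my_line_t`, `MY_LINE_Tf`, and `DEBUG_WHERE()` are hypothetical stand-ins for Perl's line_t, LINE_Tf, and its debug-output macro. Pairing a typedef with its own format string and casting at the call site keeps the two in agreement whatever width the typedef has on a given build.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Hypothetical line-number type and the printf conversion that matches it.
 * If the typedef ever changes, only these two lines need to change. */
typedef unsigned long my_line_t;
#define MY_LINE_Tf "lu"

/* __LINE__ is an int; the explicit cast makes the argument's type agree
 * with MY_LINE_Tf, so -Wformat has nothing to complain about. */
#define DEBUG_WHERE(buf, len) \
    snprintf((buf), (len), "%s: %" MY_LINE_Tf ": ", __FILE__, (my_line_t) __LINE__)
```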
* locale.c: Add comments/white space; slight tidying (Karl Williamson, 2022-10-10, 1 file, -6/+9)
  C99 allows declarations to be closer to their first use. This also removes a redundant conditional that would set a variable to the value it was already initialized to.
* locale.c: Make win32_setlocale return const * (Karl Williamson, 2022-10-10, 1 file, -2/+2)
  This adds a bit of safety, and makes it correspond to the other setlocale() returns we use.
* Add some const to wrap_wsetlocale (Karl Williamson, 2022-10-10, 1 file, -9/+8)
  And move declarations closer to first use, as allowed in C99.
* locale.c: Generalize static functions (Karl Williamson, 2022-10-10, 1 file, -10/+14)
  This changes these functions to take the code page as input, instead of handling only UTF-8. Macros are created to call them with UTF-8. I'm doing this because there is no loss of efficiency, and it is somewhat jarring, given Perl terminology, to call a function with 'Byte' in its name with a parameter with 'utf8' in its name.
* locale.c: Make static 2 Win-only functions (Karl Williamson, 2022-10-10, 1 file, -2/+2)
  These are non-API, used in this file, and because of #ifdefs, not accessible outside it, so there is no current need to make them publicly available. If we were ever to need them to be accessible more widely, they would not belong in this file.
* locale.c: Remove unused cpp alternatives (Karl Williamson, 2022-10-10, 1 file, -10/+0)
  The wide setlocale function in Windows has been in the field since 5.32, long enough that we won't be forced to discontinue its use. So we can remove the never-used overrides, cleaning the code up slightly.
* locale.c: Windows special case NULL input first (Karl Williamson, 2022-10-10, 1 file, -1/+5)
  This gets the trivial case out of the way, and can use plain setlocale(): there is no locale string, so there are no differing character sets to handle.
* locale.c: Allow function to have side effects (Karl Williamson, 2022-10-10, 1 file, -17/+7)
  The previous commit changed find_locale_from_environment() to work on Windows, and took care not to give the function side effects. But in the only use of this function so far (and likely forever), those side effects are fine. Allowing them simplifies things.
* locale.c: Meld two functions into one (Karl Williamson, 2022-10-10, 1 file, -92/+73)
  There is code in locale.c to emulate POSIX 'setlocale(foo, "")'. And there is separate code to emulate this on Windows. This commit collapses them, ensuring the same algorithm is used on both systems.
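For reference, a minimal sketch of the lookup order POSIX specifies for setlocale(category, ""), which is what such emulation code has to reproduce. `locale_from_environment()` is a hypothetical name; it takes the category's environment-variable name as a string, and it omits details a real implementation must handle, such as checking that the named locale actually exists.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>

/* POSIX precedence for setlocale(cat, ""):
 *   1. LC_ALL, if set and non-empty, wins for every category;
 *   2. otherwise the category's own variable (e.g. "LC_NUMERIC");
 *   3. otherwise LANG;
 *   4. otherwise an implementation-defined default, taken here as "C". */
static const char *locale_from_environment(const char *category_name)
{
    const char *names[3] = { "LC_ALL", category_name, "LANG" };

    for (int i = 0; i < 3; i++) {
        const char *val = getenv(names[i]);
        if (val != NULL && val[0] != '\0')   /* empty counts as unset */
            return val;
    }
    return "C";
}
```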
* locale.c: Refactor S_find_locale_from_environment() (Karl Williamson, 2022-10-10, 1 file, -25/+38)
  This changes the function a bit to make the next commit easier; that commit will extend the function to be usable on Windows. This also moves declarations closer to first use, as now allowed in C99.
* locale.c: Move find_locale_from_environment() in file (Karl Williamson, 2022-10-10, 1 file, -80/+79)
  This is in preparation for this function to be used under more circumstances.
* Add wrap_wsetlocale() to embed.fnc (Karl Williamson, 2022-10-10, 1 file, -3/+5)
  This makes the calls to it cleaner.
* Allow non-ASCII locale names (Karl Williamson, 2022-10-07, 1 file, -4/+5)
  Locale names are supposed to be opaque to the calling program. The only requirement is that any name output by libc means the same thing when input to that libc. And it makes sense: you might very well want to have a locale name in your native language. This commit changes locale.c to not impose any restrictions on the name proper. (It should be noted, however, that other standards have come along that specify a particular syntax using only ASCII. Perl needn't, and shouldn't, impose those further restrictions.)
* locale.c: Display thread in DEBUG statements (Karl Williamson, 2022-10-06, 1 file, -4/+12)
  This makes it easier to understand what's going on in threaded perls.
* locale.c: Fix syntax error (only when no LC_ALL) (Karl Williamson, 2022-10-02, 1 file, -1/+1)
* locale.c: Remove obsolete, unused label (Karl Williamson, 2022-10-01, 1 file, -7/+0)
* Add pTHX to thread_locale_(init|term) (Karl Williamson, 2022-09-30, 1 file, -4/+2)
  A future commit will want the context for more than just DEBUGGING builds.
* locale.c: Revamp sync_locale(), switch_to_global_locale() (Karl Williamson, 2022-09-29, 1 file, -106/+146)
  In reading this code, I realized that there were instances where the functions didn't work properly. It is hard to test these, but a future commit will do so.
* locale.c: Change function to return a string, not print (Karl Williamson, 2022-09-29, 1 file, -26/+44)
  This makes some print statements less awkward, and is more flexible, which future commits will take advantage of.
* locale.c: Save output of emulate_setlocale in buffer (Karl Williamson, 2022-09-29, 1 file, -3/+13)
  Depending on Configuration, platform, and details of the current request, the value returned could be pointing to a system static buffer, or be a temporary freeable upon LEAVE. This commit standardizes it to a known per-interpreter buffer that can be properly freed at termination.
* locale.c: Teach save_to_buffer to handle self param (Karl Williamson, 2022-09-29, 1 file, -4/+7)
  This function is called to save a string to a buffer. Teach it to treat the call as a no-op when the string passed is the buffer itself. This generalizes it to work properly under more circumstances; the commit also removes the current case where the function call was explicitly avoided under this circumstance.
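A minimal sketch of a save-to-buffer helper with that self-parameter check. The name and signature here are hypothetical and differ from Perl's internal function: it copies a string into a growable heap buffer and returns the buffer. Passing the buffer's own contents back in returns immediately, so callers need no special case; memmove() is used so that a source lying elsewhere inside the buffer is also safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Hypothetical analogue of save_to_buffer(): copy `string` into the heap
 * buffer *buf of size *buf_size, growing it if needed, and return *buf. */
static const char *save_to_buffer(const char *string, char **buf, size_t *buf_size)
{
    if (string == NULL)
        return NULL;

    if (string == *buf)
        return *buf;            /* already saved: nothing to do */

    size_t needed = strlen(string) + 1;
    if (needed > *buf_size) {   /* cannot happen if string lies inside *buf */
        char *bigger = realloc(*buf, needed);
        if (bigger == NULL)
            return NULL;
        *buf = bigger;
        *buf_size = needed;
    }

    /* memmove, not memcpy: tolerates a source inside the buffer too */
    memmove(*buf, string, needed);
    return *buf;
}
```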
* locale.c: Use synonym name for clarity (Karl Williamson, 2022-09-28, 1 file, -1/+1)
  At this point we have two variables which we just set equal. Change here to use the synonym that doesn't require looking elsewhere to understand what's going on.
* ensure curlocales[] is initialized (Tony Cook, 2022-09-26, 1 file, -0/+4)
  Coverity complains that the call to Safefree() now at line 5098 could be called with an uninitialized value for curlocales[i], which appears to be possible if the trial_locales loop just below this change fails to find a locale. Fixes CID 184451.
* locale.c: Stop compiler warning (Karl Williamson, 2022-09-25, 1 file, -0/+3)
  S_less_dicey_bool_setlocale_r() is a short function that completes a set of similar functions, but there is no current use of it. So just #ifdef it out. This resolves #20338.
* locale.c: Refactor internal debugging function (Karl Williamson, 2022-09-22, 1 file, -24/+36)
  setlocale_debug_string() variants now use Perl_form, a function I didn't know existed when I originally wrote this code.
* locale.c: Mitigate unsafe threaded locales (Karl Williamson, 2022-09-21, 1 file, -2/+109)
  This adds a new set of macros and functions to do locale changing and querying on platforms where perl is compiled with threads, but the platform doesn't have thread-safe locale handling. All it does is: 1) the return of setlocale() is always safely saved in a per-thread buffer, and 2) setlocale() is protected by a mutex from other threads which are using perl's locale functions. This isn't much, but it might be enough to get some programs that rarely change or query the locale to work on such platforms.
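A minimal sketch of those two measures using POSIX threads. `locked_setlocale()` and `setlocale_mutex` are hypothetical names; the caller supplies the buffer, which in a threaded program would be per-thread storage. The result is copied out while the lock is still held, so no other thread can overwrite libc's static return buffer in between. As the commit notes, this protects only callers that go through the wrapper; code calling setlocale() directly is not serialized.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* One process-wide lock around every setlocale() made through the wrapper
 * (a hypothetical analogue of POSIX_SETLOCALE_LOCK/UNLOCK). */
static pthread_mutex_t setlocale_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Calls setlocale() and copies its result into buf before unlocking.
 * Returns buf, or NULL if setlocale() failed or buf is too small. */
static const char *locked_setlocale(int category, const char *locale,
                                    char *buf, size_t buf_len)
{
    const char *result = NULL;

    pthread_mutex_lock(&setlocale_mutex);
    const char *raw = setlocale(category, locale);
    if (raw != NULL && strlen(raw) < buf_len) {
        memcpy(buf, raw, strlen(raw) + 1);
        result = buf;
    }
    pthread_mutex_unlock(&setlocale_mutex);

    return result;
}
```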
* Add POSIX_SETLOCALE_LOCK/UNLOCK (Karl Williamson, 2022-09-21, 1 file, -2/+7)
  This macro is used to surround raw setlocale() calls so that the return value in a global static buffer can be saved without interference from other threads.
* locale.c: Workaround for attributes.pm breakage (Karl Williamson, 2022-09-20, 1 file, -0/+9)
  See https://github.com/Perl/perl5/issues/20155. The root cause of that problem is that under POSIX 2008, when a thread terminates, it causes thread 0 (the controller) to change to the global locale. Commit a7ff7ac caused perl to pay attention to the environment variables in effect at startup for setting the global locale when using the POSIX 2008 locale API. (Previously only the initial per-thread locale was affected.) This causes problems when the initial setting was for a locale that uses a comma as the radix character, but thread 0 is set to a locale that expects a dot as the radix character. Whenever another thread terminated, thread 0 was silently changed to using the global locale, and hence a comma. This caused parse errors. The real solution is to fix thread 0 to remain in its chosen locale. But that fix is not ready in time for 5.37.4, and it is deemed important to get something working for this monthly development release. This commit changes the initial global LC_NUMERIC locale to always be C, and hence to use a dot radix. The vast majority of code expects a dot. This is not the ultimate fix, but it works around the immediate problem at hand. The test case is courtesy of @bram-perl.
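A sketch of the workaround's effect using the classic global-locale API rather than Perl's internals; `init_global_locale()` is a hypothetical name. Whatever the environment requests for the other categories, LC_NUMERIC ends up as "C", so the radix character reported by localeconv() is a dot and strtod() parses "1.5" as expected.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>

/* Take each category from the environment, then pin LC_NUMERIC to "C" so
 * the radix character is '.' regardless of LC_ALL, LC_NUMERIC or LANG.
 * Returns nonzero on success. */
static int init_global_locale(void)
{
    if (setlocale(LC_ALL, "") == NULL)
        setlocale(LC_ALL, "C");   /* environment names an unavailable locale */

    return setlocale(LC_NUMERIC, "C") != NULL;
}
```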
* locale.c: Use LC_ALL only if available on platform (Karl Williamson, 2022-09-12, 1 file, -5/+13)
  These are the final (unless I missed something) cases where LC_ALL could be referred to even if undefined on the system.
* Win32 should start new threads in C locale (Karl Williamson, 2022-09-12, 1 file, -2/+1)
  Prior to this commit, wrong cpp directives did not guarantee this.
* locale.c: Clean up two DEBUG stmts (Karl Williamson, 2022-09-10, 1 file, -8/+4)
  STMT_START...END aren't required in DEBUG() calls, as that macro already wraps its argument with those. So they are just clutter here.
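To illustrate why the extra wrapping is redundant: a multi-statement macro is conventionally wrapped in `do { ... } while (0)` so that it behaves as a single statement, for instance as the body of an unbraced if/else. Perl spells this STMT_START/STMT_END; the definitions below are minimal stand-ins, and `DEBUG_L()` and `debug_enabled` are hypothetical. Because `DEBUG_L()` already applies the wrapping, a caller that wraps its argument again gets correct but cluttered code.

```c
/* Minimal stand-ins for Perl's statement-wrapping macros. */
#define STMT_START do
#define STMT_END   while (0)

static int debug_enabled = 1;   /* hypothetical runtime debug switch */

/* The macro wraps its argument itself, so callers can pass a bare
 * statement and still use the result between an unbraced if and else. */
#define DEBUG_L(a) STMT_START { if (debug_enabled) { a; } } STMT_END
```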
* Move #include from locale.c to perl.h (Karl Williamson, 2022-09-10, 1 file, -1/+0)
  Without this commit, Perl won't compile if -DUSE_NL_LOCALE_NAME is specified to Configure. This is an undocumented feature that uses an undocumented glibc feature that is effectively the querylocale() found on Darwin and some other systems. POSIX 2017 has added a querylocale-like function to the repertoire, which should eventually supplant this option.
* locale.c: Remove no-longer necessary conditionals (Karl Williamson, 2022-09-10, 1 file, -2/+2)
  The previous commit initialized this variable early in start up, so that we never have to now check that it is non-NULL.
* locale.c: Use locale change subs at initialization (Karl Williamson, 2022-09-10, 1 file, -11/+21)
  There are currently 4 functions that do special handling when the locale for their respective categories changes. One of these is for LC_ALL, which has been, and continues to be, called at the end of initialization. But the other three have changed in recent commits to specially handle the trivial case of the locale being "C". These changes now avoid the complexities required for the general case (which needs everything to be set up at the time of the call). They can thus be called early in the initialization process. This avoids having to duplicate their logic in the initialization code, which has led to some things being overlooked there. Now everything is guaranteed to stay in sync.
* locale.c: Initialize PL_underlying_numeric_obj (Karl Williamson, 2022-09-10, 1 file, -0/+6)
  This probably doesn't matter, but it's better form to initialize it to a sane value.
* locale.c: More new_ctype() refactoring (Karl Williamson, 2022-09-10, 1 file, -28/+21)
  Merge commit e4bbbfe02b9e9aae521b164eba0e518ca478945f refactored this function some. Most of the commits in that series dated from before we could assume C99. In re-reading the result, I saw some opportunities to take advantage of C99, for example by moving declarations closer to their use. I also hadn't previously noticed that when changing to the C locale (a frequent occurrence), various things that were being recalculated are determinable at compile time. So this commit returns early in this circumstance. Also, an obsolete comment is removed.
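A sketch of the early-return pattern, assuming a POSIX 2008 system; `struct ctype_state`, its fields, and this `new_ctype()` are hypothetical and far simpler than Perl's function. For "C" (or its synonym "POSIX") the answers are known in advance, so the handler sets them and returns before asking libc anything. Only other locales take the general path, which here merely checks the codeset with nl_langinfo_l().

```c
#define _POSIX_C_SOURCE 200809L
#include <langinfo.h>
#include <locale.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical per-interpreter facts about the current LC_CTYPE locale. */
struct ctype_state {
    bool is_C_locale;
    bool in_utf8_locale;
    int  general_path_runs;    /* how often the expensive path was taken */
};

static void new_ctype(struct ctype_state *st, const char *newctype)
{
    /* Fast path: everything about the C locale is known at compile time. */
    if (strcmp(newctype, "C") == 0 || strcmp(newctype, "POSIX") == 0) {
        st->is_C_locale = true;
        st->in_utf8_locale = false;
        return;
    }

    /* General path: ask libc about the named locale. */
    st->is_C_locale = false;
    st->in_utf8_locale = false;
    st->general_path_runs++;

    locale_t obj = newlocale(LC_CTYPE_MASK, newctype, (locale_t) 0);
    if (obj == (locale_t) 0)
        return;                /* unknown locale: keep the defaults */

    /* "UTF-8" is glibc's spelling; other libcs may report e.g. "utf8". */
    st->in_utf8_locale = strcmp(nl_langinfo_l(CODESET, obj), "UTF-8") == 0;
    freelocale(obj);
}
```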
* locale.c: Silence compiler warning when no LC_COLLATE (Karl Williamson, 2022-09-10, 1 file, -8/+3)
  On Configurations without LC_COLLATE, various unused warnings were being generated.
* locale.c: Silence compiler warning when no LC_CTYPE (Karl Williamson, 2022-09-10, 1 file, -13/+11)
  On Configurations without LC_CTYPE, various unused warnings were being generated.
* locale.c: Silence compiler warning about S_mortalized_pv_copy (Karl Williamson, 2022-09-10, 1 file, -0/+3)
  This function is not used unless locales are enabled, so it need not be defined unless that is true.
* locale.c: Silence compiler warning about S_new_numeric (Karl Williamson, 2022-09-10, 1 file, -9/+3)
  This function is not used unless LC_NUMERIC is enabled, so it need not be defined unless that is true.
* locale.c: Silence C_codeset compiler warning (Karl Williamson, 2022-09-10, 1 file, -1/+2)
  This fixes #20140. This static variable is used in just one or (unlikely) two places, and only in some Configurations. Rather than adding #ifdefs or a PERL_UNUSED call somewhere, making it a #define fixes the issue without taking up extra memory, except in some dumb compilers under unlikely Configurations.
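A sketch of the trade-off. The string value and the helper `is_C_codeset()` are hypothetical: "ANSI_X3.4-1968" is what glibc reports as the C locale's codeset, but it is not necessarily what Perl stores. A file-scope `static const char C_codeset[]` can draw an unused-variable warning in any configuration that never reads it; a macro that is never expanded costs nothing and draws no warning.

```c
#include <string.h>

/* Before (sketch): a variable that some configurations never read.
 *     static const char C_codeset[] = "ANSI_X3.4-1968";
 * After: a macro, which is simply never expanded where it is unused. */
#define C_codeset "ANSI_X3.4-1968"   /* hypothetical value; glibc's name for ASCII */

static int is_C_codeset(const char *codeset)
{
    return strcmp(codeset, C_codeset) == 0;
}
```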