author Karl Williamson <khw@cpan.org> 2022-11-28 05:37:04 -0700
committer Karl Williamson <khw@cpan.org> 2022-12-07 09:13:37 -0700
commit 6af0187d84aff4f4fd1b8c5039418e86d16cfafd
tree a2dea82a5d6c4ac518def2cd29a0d4b744ac7742 /perl.h
parent 2fc8f382932e3d44b386b00160160e845822903f
locale.c: Rewrite localeconv() handling
localeconv() returns a structure containing fields that are associated with two different categories: LC_NUMERIC and LC_MONETARY. Perl, via POSIX::localeconv(), returns a hash containing all the fields. Testing on Windows showed that mojibake can result if LC_CTYPE is not the same locale as LC_MONETARY for the monetary fields, or not the same as LC_NUMERIC for the numeric ones.

The solution to similar situations elsewhere in the code is to toggle LC_CTYPE into being the same locale as the one for the returned fields. But those situations have only a single locale that LC_CTYPE has to match, so the approach doesn't work here when LC_NUMERIC and LC_MONETARY are different locales. Unlike Schrödinger's cat, LC_CTYPE has to be one or the other, not both at the same time. The previous implementation did not consider this possibility and wasn't easily changed to handle it, so this commit rewrites much of it.

The solution used is to call localeconv() twice when the LC_NUMERIC and LC_MONETARY locales don't match, with LC_CTYPE toggled to the corresponding locale each time. (Only one call is made if the two categories have the same locale.) Supporting both the one-call and the two-call paths complicates the code, but I thought it was worth it given that the one-call case is the most likely.

Another complication is that on platforms that lack nl_langinfo() (Windows, for example), localeconv() is used to emulate portions of it. Previously there was a separate function to handle this, which used an SV cast as an HV to avoid creating a hash that wasn't actually necessary. Under the new scheme that would have led to extra duplicated code, so that function was merged into the single general one. A real hash is now used in all circumstances, but for the emulation it is populated with only the one or two fields needed.
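A minimal sketch of the one-or-two-call strategy, using only standard C setlocale() and localeconv(). The struct and function names are hypothetical, only a few fields are copied, and it assumes the name returned by setlocale(LC_NUMERIC, NULL) or setlocale(LC_MONETARY, NULL) can be passed back to setlocale(LC_CTYPE, ...). It is not thread-safe on its own; since localeconv() uses shared storage, a real implementation would run it under a lock.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for a few copied localeconv() fields. */
struct lconv_copy {
    char decimal_point[16];      /* LC_NUMERIC field */
    char thousands_sep[16];      /* LC_NUMERIC field */
    char currency_symbol[32];    /* LC_MONETARY field */
    char mon_decimal_point[16];  /* LC_MONETARY field */
    int  localeconv_calls;       /* 1 if the categories match, else 2 */
};

static void copy_str(char *dst, size_t len, const char *src) {
    snprintf(dst, len, "%s", src ? src : "");
}

/* Save the current locale name of 'category'; -1 if missing or too long. */
static int save_name(char *dst, size_t len, int category) {
    const char *name = setlocale(category, NULL);
    if (name == NULL || strlen(name) >= len)
        return -1;
    memcpy(dst, name, strlen(name) + 1);
    return 0;
}

static void copy_numeric(struct lconv_copy *out, const struct lconv *lc) {
    copy_str(out->decimal_point, sizeof out->decimal_point, lc->decimal_point);
    copy_str(out->thousands_sep, sizeof out->thousands_sep, lc->thousands_sep);
}

static void copy_monetary(struct lconv_copy *out, const struct lconv *lc) {
    copy_str(out->currency_symbol, sizeof out->currency_symbol, lc->currency_symbol);
    copy_str(out->mon_decimal_point, sizeof out->mon_decimal_point, lc->mon_decimal_point);
}

/* Returns 0 on success, -1 if a locale name could not be read or applied. */
static int get_lconv(struct lconv_copy *out) {
    char numeric[256], monetary[256], ctype[256];

    assert(out != NULL);
    /* setlocale(..., NULL) may return a static buffer that later calls
       overwrite, so each name is copied right away. */
    if (save_name(numeric, sizeof numeric, LC_NUMERIC) != 0 ||
        save_name(monetary, sizeof monetary, LC_MONETARY) != 0 ||
        save_name(ctype, sizeof ctype, LC_CTYPE) != 0)
        return -1;

    /* Pass 1: LC_CTYPE matches LC_NUMERIC. */
    if (setlocale(LC_CTYPE, numeric) == NULL)
        return -1;
    const struct lconv *lc = localeconv();
    out->localeconv_calls = 1;
    copy_numeric(out, lc);

    if (strcmp(numeric, monetary) == 0) {
        /* Common case: one call covers both categories. */
        copy_monetary(out, lc);
    } else {
        /* Pass 2: LC_CTYPE matches LC_MONETARY. */
        if (setlocale(LC_CTYPE, monetary) == NULL) {
            setlocale(LC_CTYPE, ctype);
            return -1;
        }
        lc = localeconv();
        out->localeconv_calls = 2;
        copy_monetary(out, lc);
    }

    setlocale(LC_CTYPE, ctype);  /* restore the caller's LC_CTYPE */
    return 0;
}
```

In a freshly started program every category is in the "C" locale, so this takes the single-call path.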
The only part of this commit that I thought could be split off from the rest concerns the fact that the structure localeconv() returns is not thread-safe, and so must be copied to a safe place (the hash) while in a critical section that locks out all other threads. Before this commit, that copying was accompanied by determining whether each string field needed to be marked as UTF-8. That determination isn't necessarily trivial, so it really should not be in the critical section. This commit moves it out. With some effort, that part could have been split into a separate commit, but I didn't think it was worth the effort.
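A minimal sketch of that split, assuming a single pthread mutex stands in for Perl's locale lock (the lock, struct, and function names here are hypothetical). Only the raw byte copies happen while the lock is held; the UTF-8 classification runs after it is released. The UTF-8 check is deliberately simplified (it does not reject overlong forms or surrogates) and is not Perl's own routine.

```c
#include <locale.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical lock; it only excludes threads that also take it. */
static pthread_mutex_t locale_mutex = PTHREAD_MUTEX_INITIALIZER;

struct field_copy {
    char value[64];
    bool is_utf8;  /* decided only after the lock is dropped */
};

/* True if s is structurally valid UTF-8 and contains at least one
   non-ASCII byte, i.e. it would need a UTF-8 flag. */
static bool needs_utf8_flag(const unsigned char *s) {
    bool saw_high = false;
    while (*s) {
        unsigned char c = *s;
        size_t len;
        if (c < 0x80) { s++; continue; }
        saw_high = true;
        if ((c & 0xE0) == 0xC0)      len = 2;
        else if ((c & 0xF0) == 0xE0) len = 3;
        else if ((c & 0xF8) == 0xF0) len = 4;
        else return false;
        /* Stops at the NUL terminator, so it never reads past the string. */
        for (size_t i = 1; i < len; i++)
            if ((s[i] & 0xC0) != 0x80) return false;
        s += len;
    }
    return saw_high;
}

static void snapshot_lconv(struct field_copy *dp, struct field_copy *cs) {
    pthread_mutex_lock(&locale_mutex);
    const struct lconv *lc = localeconv();
    /* Copy while locked: localeconv()'s storage may be overwritten later. */
    snprintf(dp->value, sizeof dp->value, "%s", lc->decimal_point);
    snprintf(cs->value, sizeof cs->value, "%s", lc->currency_symbol);
    pthread_mutex_unlock(&locale_mutex);

    /* The potentially expensive classification runs outside the lock. */
    dp->is_utf8 = needs_utf8_flag((const unsigned char *)dp->value);
    cs->is_utf8 = needs_utf8_flag((const unsigned char *)cs->value);
}
```

Keeping the critical section down to plain copies shortens the time other threads spend blocked on the lock.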
Diffstat (limited to 'perl.h')
-rw-r--r-- perl.h 6
1 files changed, 6 insertions, 0 deletions
diff --git a/perl.h b/perl.h
index 7d96132dd1..19ccb1cd5c 100644
--- a/perl.h
+++ b/perl.h
@@ -1330,6 +1330,12 @@ typedef enum { /* Is the locale UTF8? */
LOCALE_UTF8NESS_UNKNOWN
} locale_utf8ness_t;
+typedef struct {
+ const char *name;
+ size_t offset;
+} lconv_offset_t;
+
+
#endif
#include <setjmp.h>
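The new lconv_offset_t type pairs a field name with a size_t offset, which suggests a table-driven way to reach struct lconv members by name. The sketch below assumes that usage; the table names and helper functions are hypothetical and not taken from locale.c. Grouping fields are left out because they hold byte counts rather than text.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <stdio.h>

/* Same layout as the lconv_offset_t typedef added to perl.h. */
typedef struct {
    const char *name;
    size_t offset;
} lconv_offset_t;

/* Hypothetical tables of string fields, grouped by locale category and
   terminated by a NULL name. */
static const lconv_offset_t numeric_string_fields[] = {
    { "decimal_point", offsetof(struct lconv, decimal_point) },
    { "thousands_sep", offsetof(struct lconv, thousands_sep) },
    { NULL, 0 }
};

static const lconv_offset_t monetary_string_fields[] = {
    { "int_curr_symbol",   offsetof(struct lconv, int_curr_symbol) },
    { "currency_symbol",   offsetof(struct lconv, currency_symbol) },
    { "mon_decimal_point", offsetof(struct lconv, mon_decimal_point) },
    { "mon_thousands_sep", offsetof(struct lconv, mon_thousands_sep) },
    { "positive_sign",     offsetof(struct lconv, positive_sign) },
    { NULL, 0 }
};

/* Read the char * member located 'offset' bytes into *lc. */
static const char *lconv_string_at(const struct lconv *lc, size_t offset) {
    return *(char *const *)((const char *)lc + offset);
}

/* Print every field in 'table' as name="value"; returns the field count. */
static size_t dump_fields(const struct lconv *lc, const lconv_offset_t *table) {
    size_t count = 0;
    for (; table->name != NULL; table++, count++)
        printf("%s=\"%s\"\n", table->name, lconv_string_at(lc, table->offset));
    return count;
}
```

Keeping the numeric and monetary fields in separate tables fits the commit's approach of possibly calling localeconv() once per category: each call only needs to walk the table for the category whose locale LC_CTYPE currently matches.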