author | Karl Williamson <khw@cpan.org> | 2022-11-28 05:37:04 -0700 |
---|---|---|
committer | Karl Williamson <khw@cpan.org> | 2022-12-07 09:13:37 -0700 |
commit | 6af0187d84aff4f4fd1b8c5039418e86d16cfafd (patch) | |
tree | a2dea82a5d6c4ac518def2cd29a0d4b744ac7742 /perl.h | |
parent | 2fc8f382932e3d44b386b00160160e845822903f (diff) | |
download | perl-6af0187d84aff4f4fd1b8c5039418e86d16cfafd.tar.gz |
locale.c: Rewrite localeconv() handling
localeconv() returns a structure containing fields that are associated
with two different categories: LC_NUMERIC and LC_MONETARY. Perl via
POSIX::localeconv() returns a hash containing all the fields.
Testing on Windows showed that if LC_CTYPE is not the same locale as
LC_MONETARY for the monetary fields, or isn't the same as LC_NUMERIC for
the numeric ones, mojibake can result.
The solution to similar situations elsewhere in the code is to toggle
LC_CTYPE into being the same locale as the one for the returned fields.
But those situations only have a single locale that LC_CTYPE has to
match, so it doesn't work here when LC_NUMERIC and LC_MONETARY are
different locales. Unlike Schrödinger's cat, LC_CTYPE has to be one or
the other, not both at the same time.
The previous implementation did not consider this possibility, and
wasn't easily changeable to work.
Therefore, this rewrites a bunch of it. The solution used is to call
localeconv() twice when the LC_NUMERIC locale and the LC_MONETARY locale
don't match (with LC_CTYPE toggled to the corresponding one each time).
(Only one call is made if the two categories have the same locale.)
This one vs two complicated the code, but I thought it was worth it
given that the one call is the most likely case.
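
The one-call versus two-call scheme can be sketched roughly as follows. This is an illustration only, not the code in locale.c: `copied_lconv_t`, `dup_str()` and `fetch_lconv()` are hypothetical names, only two fields are copied, no locking is shown, and the sketch assumes that a locale name returned for one category is acceptable to setlocale() for LC_CTYPE, which holds on common platforms but is not strictly guaranteed.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the two fields this sketch copies. */
typedef struct {
    char *decimal_point;    /* governed by LC_NUMERIC  */
    char *currency_symbol;  /* governed by LC_MONETARY */
} copied_lconv_t;

/* Hypothetical helper: heap copy of a string, or NULL on failure. */
static char *dup_str(const char *s)
{
    char *copy;
    size_t size;

    if (s == NULL) return NULL;
    size = strlen(s) + 1;
    copy = malloc(size);
    if (copy != NULL) memcpy(copy, s, size);
    return copy;
}

/* Returns the number of localeconv() calls made (1 or 2), 0 on failure. */
static int fetch_lconv(copied_lconv_t *out)
{
    /* setlocale() may reuse its return buffer, so copy each name. */
    char *numeric  = dup_str(setlocale(LC_NUMERIC,  NULL));
    char *monetary = dup_str(setlocale(LC_MONETARY, NULL));
    char *ctype    = dup_str(setlocale(LC_CTYPE,    NULL));
    int calls = 0;

    assert(out != NULL);

    if (numeric != NULL && monetary != NULL && ctype != NULL) {
        const struct lconv *lc;

        /* First call: LC_CTYPE matches the LC_NUMERIC locale. */
        setlocale(LC_CTYPE, numeric);
        lc = localeconv();
        calls++;
        out->decimal_point = dup_str(lc->decimal_point);

        if (strcmp(numeric, monetary) != 0) {
            /* Locales differ: toggle LC_CTYPE and call again. */
            setlocale(LC_CTYPE, monetary);
            lc = localeconv();
            calls++;
        }
        out->currency_symbol = dup_str(lc->currency_symbol);

        /* Restore the caller's LC_CTYPE. */
        setlocale(LC_CTYPE, ctype);
    }

    free(numeric);
    free(monetary);
    free(ctype);
    return calls;
}
```

In a program that has not changed its locale, both categories are "C", so the sketch takes the single-call path.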
Another complication is that on platforms that lack nl_langinfo()
(Windows, for example), localeconv() is used to emulate portions of it.
Previously there was a separate function to handle this, using an SV()
cast as an HV() to avoid using a hash that wasn't actually necessary.
That proved to lead to extra duplicated code under the new scheme, so
that function was collapsed into a single one and a real hash is used in
all circumstances, but is only populated with the one or two fields
needed for the emulation.
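
As a rough illustration of what emulating a piece of nl_langinfo() with localeconv() involves, the sketch below answers two items from the corresponding struct lconv fields. The enum `sketch_item_t` and the function `emulate_langinfo()` are hypothetical and stand in for whatever locale.c actually does; the sketch skips the hash, the locking, and the LC_CTYPE toggling described in this message.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the RADIXCHAR and THOUSEP items. */
typedef enum { SKETCH_RADIXCHAR, SKETCH_THOUSEP } sketch_item_t;

/* Copies the requested item into buf; returns buf, or NULL if it
 * does not fit. */
static const char *emulate_langinfo(sketch_item_t item,
                                    char *buf, size_t size)
{
    const struct lconv *lc = localeconv();
    const char *src = (item == SKETCH_RADIXCHAR)
                      ? lc->decimal_point
                      : lc->thousands_sep;

    assert(buf != NULL && size > 0);

    if (strlen(src) >= size) return NULL;
    strcpy(buf, src);   /* safe: length was checked just above */
    return buf;
}
```

Only the one field needed for the requested item is read, which mirrors the idea of populating the hash with just the fields the emulation needs.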
The only part of this commit that I thought could be split off from the
rest concerns the fact that localeconv()'s return is not thread-safe,
and so must be copied to a safe place (the hash) while in a critical
section, locking out all other threads. Before this commit, that
copying was accompanied by determining if each string field needed to be
marked as UTF-8. That determination isn't necessarily trivial, so
should really not be in the critical section. This commit moves it out.
And, with some effort, that part could have been split into a separate
commit, but I didn't think it was worth the effort.
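
The split between copying inside the critical section and classifying outside it can be sketched as below. Everything here is an assumption made for illustration: the mutex `lconv_mutex`, the type `field_t`, and the functions `looks_like_utf8()` and `copy_currency_symbol()` are hypothetical, a POSIX mutex stands in for Perl's own locking, and the UTF-8 check is a simplified structural test rather than the determination Perl actually performs.

```c
#include <assert.h>
#include <locale.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical lock guarding localeconv()'s static return buffer. */
static pthread_mutex_t lconv_mutex = PTHREAD_MUTEX_INITIALIZER;

typedef struct {
    char *value;    /* private copy of the field    */
    bool  is_utf8;  /* filled in after the lock is released */
} field_t;

/* Simplified stand-in: true if s contains a non-ASCII byte and every
 * multi-byte sequence has the right number of continuation bytes.
 * Overlong forms and surrogates are not rejected. */
static bool looks_like_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *) s;
    bool saw_non_ascii = false;

    assert(s != NULL);

    while (*p != '\0') {
        int extra;

        if (*p < 0x80) { p++; continue; }
        else if ((*p & 0xE0) == 0xC0) extra = 1;
        else if ((*p & 0xF0) == 0xE0) extra = 2;
        else if ((*p & 0xF8) == 0xF0) extra = 3;
        else return false;

        saw_non_ascii = true;
        p++;
        while (extra-- > 0) {
            /* A NUL here fails the test, so p never runs past the end. */
            if ((*p & 0xC0) != 0x80) return false;
            p++;
        }
    }
    return saw_non_ascii;
}

static field_t copy_currency_symbol(void)
{
    field_t f = { NULL, false };
    const char *raw;
    size_t size;

    /* Critical section: only the copy happens here. */
    pthread_mutex_lock(&lconv_mutex);
    raw = localeconv()->currency_symbol;
    size = strlen(raw) + 1;
    f.value = malloc(size);
    if (f.value != NULL) memcpy(f.value, raw, size);
    pthread_mutex_unlock(&lconv_mutex);

    /* Potentially expensive classification runs on the private copy. */
    if (f.value != NULL) f.is_utf8 = looks_like_utf8(f.value);
    return f;
}
```

Keeping the locked region down to a string copy means other threads wait only as long as the copy takes, however costly the classification turns out to be.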
Diffstat (limited to 'perl.h')
-rw-r--r-- | perl.h | 6 |
1 files changed, 6 insertions, 0 deletions
@@ -1330,6 +1330,12 @@ typedef enum { /* Is the locale UTF8? */
     LOCALE_UTF8NESS_UNKNOWN
 } locale_utf8ness_t;
 
+typedef struct {
+    const char *name;
+    size_t offset;
+} lconv_offset_t;
+
+
 #endif
 
 #include <setjmp.h>
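
The diff above shows only the type, not how it is used. One plausible use, sketched here as an assumption rather than taken from locale.c, is a table that pairs each struct lconv field name with its offsetof() value, so that fields can be fetched generically by name. The table `numeric_strings` and the function `lconv_string_field()` are hypothetical.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Same shape as the typedef added to perl.h. */
typedef struct {
    const char *name;
    size_t offset;
} lconv_offset_t;

/* Hypothetical table of the LC_NUMERIC string fields, NULL-terminated. */
static const lconv_offset_t numeric_strings[] = {
    { "decimal_point", offsetof(struct lconv, decimal_point) },
    { "thousands_sep", offsetof(struct lconv, thousands_sep) },
    { NULL, 0 }
};

/* Looks up a string field of *lc by name; NULL if not in the table. */
static const char *lconv_string_field(const struct lconv *lc,
                                      const char *name)
{
    const lconv_offset_t *entry;

    assert(lc != NULL && name != NULL);

    for (entry = numeric_strings; entry->name != NULL; entry++) {
        if (strcmp(entry->name, name) == 0) {
            /* The member at this offset is a char *, so read it as one. */
            const char *base = (const char *) lc;
            return *(char * const *) (base + entry->offset);
        }
    }
    return NULL;
}
```

A table of this kind lets one loop handle every field, which fits the goal of populating either the full hash or only the one or two fields needed.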