author: Karl Williamson <khw@cpan.org> 2018-03-12 12:24:04 -0600
committer: Karl Williamson <khw@cpan.org> 2018-03-12 12:47:18 -0600
commit: 9487427ba26d65e7adf5954069fc2fde3bdedf41
tree: 61eb06214fe38b3ff158e2c52e3b3464444a4947
parent: 02916c24e049d202108ea97c1e420790acb6090a
Fix comments/pod for LC_NUMERIC not always C
In recent Perl versions, the underlying locale for LC_NUMERIC has been kept in C because XS code expects a dot as the radix character. But if the underlying LC_NUMERIC locale already uses a dot, that is unnecessary. (For safety, we also verify that the thousands grouping separator is empty.) Thus 5.27 doesn't always keep the underlying locale in C; it does so only when necessary. This commit updates various comments and pods to reflect this change.
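The condition described above (dot radix, empty thousands separator) can be sketched in plain C using the standard `localeconv()` call. This is only an illustration of the idea, not the code Perl uses; the function names `radix_and_grouping_match_c` and `current_numeric_locale_matches_c` are hypothetical and do not appear in locale.c or perl.h.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: returns 1 when the given radix and thousands
 * separator are the ones the C locale uses (a dot and the empty string),
 * 0 otherwise. */
int radix_and_grouping_match_c(const char *decimal_point,
                               const char *thousands_sep)
{
    return strcmp(decimal_point, ".") == 0 && thousands_sep[0] == '\0';
}

/* Hypothetical helper: applies the check above to whatever LC_NUMERIC
 * locale is currently in effect, as reported by localeconv(). */
int current_numeric_locale_matches_c(void)
{
    const struct lconv *lc = localeconv();

    return radix_and_grouping_match_c(lc->decimal_point, lc->thousands_sep);
}
```

Under this sketch, a caller would only need to switch LC_NUMERIC to C when `current_numeric_locale_matches_c()` returns 0 for the underlying locale.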
-rw-r--r-- locale.c 30
-rw-r--r-- perl.h 32
2 files changed, 35 insertions, 27 deletions
diff --git a/locale.c b/locale.c
index 6a4e012954..d907e37c3d 100644
--- a/locale.c
+++ b/locale.c
@@ -2081,9 +2081,13 @@ S_win32_setlocale(pTHX_ int category, const char* locale)
This is an (almost) drop-in replacement for the system L<C<setlocale(3)>>,
taking the same parameters, and returning the same information, except that it
-returns the correct underlying C<LC_NUMERIC> locale, instead of C<C> always, as
-perl keeps that locale category as C<C>, changing it briefly during the
-operations where the underlying one is required.
+returns the correct underlying C<LC_NUMERIC> locale. Regular C<setlocale> will
+instead return C<C> if the underlying locale has a non-dot decimal point
+character, or a non-empty thousands separator for displaying floating point
+numbers. This is because perl keeps that locale category such that it has a
+dot and empty separator, changing the locale briefly during the operations
+where the underlying one is required. C<Perl_setlocale> knows about this, and
+compensates; regular C<setlocale> doesn't.
Another reason it isn't completely a drop-in replacement is that it is
declared to return S<C<const char *>>, whereas the system setlocale omits the
@@ -2123,8 +2127,9 @@ Perl_setlocale(const int category, const char * locale)
/* A NULL locale means only query what the current one is. We have the
* LC_NUMERIC name saved, because we are normally switched into the C
- * locale for it. For an LC_ALL query, switch back to get the correct
- * results. All other categories don't require special handling */
+ * (or equivalent) locale for it. For an LC_ALL query, switch back to get
+ * the correct results. All other categories don't require special
+ * handling */
if (locale == NULL) {
if (category == LC_NUMERIC) {
@@ -2291,13 +2296,14 @@ rather than getting segfaults at runtime.
It delivers the correct results for the C<RADIXCHAR> and C<THOUSEP> items,
without you having to write extra code. The reason for the extra code would be
because these are from the C<LC_NUMERIC> locale category, which is normally
-kept set to the C locale by Perl, no matter what the underlying locale is
-supposed to be, and so to get the expected results, you have to temporarily
-toggle into the underlying locale, and later toggle back. (You could use plain
-C<nl_langinfo> and C<L</STORE_LC_NUMERIC_FORCE_TO_UNDERLYING>> for this but
-then you wouldn't get the other advantages of C<Perl_langinfo()>; not keeping
-C<LC_NUMERIC> in the C locale would break a lot of CPAN, which is expecting the
-radix (decimal point) character to be a dot.)
+kept set by Perl so that the radix is a dot, and the separator is the empty
+string, no matter what the underlying locale is supposed to be, and so to get
+the expected results, you have to temporarily toggle into the underlying
+locale, and later toggle back. (You could use plain C<nl_langinfo> and
+C<L</STORE_LC_NUMERIC_FORCE_TO_UNDERLYING>> for this but then you wouldn't get
+the other advantages of C<Perl_langinfo()>; not keeping C<LC_NUMERIC> in the C
+(or equivalent) locale would break a lot of CPAN, which is expecting the radix
+(decimal point) character to be a dot.)
=item *
diff --git a/perl.h b/perl.h
index 5462b4793e..e76b9b8f32 100644
--- a/perl.h
+++ b/perl.h
@@ -5807,7 +5807,10 @@ typedef struct am_table_short AMTS;
#ifdef USE_LOCALE_NUMERIC
/* These macros are for toggling between the underlying locale (UNDERLYING or
- * LOCAL) and the C locale (STANDARD).
+ * LOCAL) and the C locale (STANDARD). (Actually we don't have to use the C
+ * locale if the underlying locale is indistinguishable from it in the numeric
+ * operations used by Perl, namely the decimal point, and even the thousands
+ * separator.)
=head1 Locale-related functions and macros
@@ -5851,10 +5854,11 @@ close by, and guaranteed to be called.
=for apidoc Am|void|STORE_LC_NUMERIC_SET_TO_NEEDED
-This is used to help wrap XS or C code that that is C<LC_NUMERIC> locale-aware.
-This locale category is generally kept set to the C locale by Perl for
-backwards compatibility, and because most XS code that reads floating point
-values can cope only with the decimal radix character being a dot.
+This is used to help wrap XS or C code that is C<LC_NUMERIC> locale-aware.
+This locale category is generally kept set to a locale where the decimal radix
+character is a dot, and the separator between groups of digits is empty. This
+is because most XS code that reads floating point numbers is expecting them to
+have this syntax.
This macro makes sure the current C<LC_NUMERIC> state is set properly, to be
aware of locale if the call to the XS or C code from the Perl program is
@@ -5906,16 +5910,14 @@ expression, but with an empty argument list, like this:
*/
-/* The numeric locale is generally kept in the C locale instead of the
- * underlying locale. The current status is known by looking at two words.
- * One is non-zero if the current numeric locale is the standard C/POSIX one.
- * The other is non-zero if the current locale is the underlying locale. Both
- * can be non-zero if, as often happens, the underlying locale is C.
- *
- * Its slightly more complicated than this, as the PL_numeric_standard variable
- * is set if the current numeric locale is indistinguishable from the C locale.
- * This happens when the radix character is a dot, and the thousands separator
- * is the empty string.
+/* If the underlying numeric locale has a non-dot decimal point or has a
+ * non-empty floating point thousands separator, the current locale is instead
+ * generally kept in the C locale instead of that underlying locale. The
+ * current status is known by looking at two words. One is non-zero if the
+ * current numeric locale is the standard C/POSIX one or is indistinguishable
+ * from C. The other is non-zero if the current locale is the underlying
+ * locale. Both can be non-zero if, as often happens, the underlying locale is
+ * C or indistinguishable from it.
*
* khw believes the reason for the variables instead of the bits in a single
* word is to avoid having to have masking instructions. */
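
The two status words described in the final comment can be modelled as follows. This is a toy model under stated assumptions, not the perl.h implementation: the struct, its field names, and both functions are hypothetical, and the locale is represented only by its radix and thousands separator strings.

```c
#include <string.h>

/* Hypothetical stand-ins for the two status words. */
struct numeric_state {
    int standard;   /* non-zero: current numeric locale is C or
                     * indistinguishable from it */
    int underlying; /* non-zero: current numeric locale is the
                     * underlying one */
};

/* 1 if the radix is a dot and the thousands separator is empty. */
static int looks_like_c(const char *radix, const char *thousands_sep)
{
    return strcmp(radix, ".") == 0 && thousands_sep[0] == '\0';
}

/* State after toggling into the underlying locale, whose radix and
 * separator are passed in. */
struct numeric_state state_after_switch_to_underlying(const char *radix,
                                                      const char *sep)
{
    struct numeric_state s;

    s.underlying = 1;
    s.standard = looks_like_c(radix, sep);
    return s;
}

/* State after toggling into the standard (C) locale. When the underlying
 * locale is indistinguishable from C, both words end up non-zero. */
struct numeric_state state_after_switch_to_standard(const char *radix,
                                                    const char *sep)
{
    struct numeric_state s;

    s.standard = 1;
    s.underlying = looks_like_c(radix, sep);
    return s;
}
```

In this model, an underlying locale with a comma radix sets exactly one word at a time, while one with a dot radix and empty separator sets both regardless of which toggle ran last.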