|
| |
This is a problem on Darwin due to a bug there. According to Tony
Cook, the C99 standard requires MB_CUR_MAX to be an unsigned value, and
it is on Linux. But Darwin declares it to be signed, even though the
minimum value it can reach is +1. Maybe other systems have the same
defect. But there is a simple fix: just cast it to unsigned.
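A minimal sketch of the cast, assuming a hypothetical helper named
can_hold_one_char() that is not part of locale.c. It only illustrates
keeping the comparison unsigned on both sides:

```c
#include <stddef.h>   /* size_t */
#include <stdlib.h>   /* MB_CUR_MAX */

/* Hypothetical helper (not from the Perl source): returns 1 if a buffer
 * of 'len' bytes can hold the longest multibyte character of the current
 * locale.  Casting MB_CUR_MAX to size_t keeps the comparison
 * unsigned-vs-unsigned even on platforms that declare the macro with a
 * signed type. */
static int
can_hold_one_char(size_t len)
{
    return len >= (size_t) MB_CUR_MAX;
}
```

Without the cast, a compiler may warn about comparing signed and
unsigned operands on the affected platforms.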
|
| |
The previous commit calculated this and placed the result in a header
file. This commit now uses the calculated value instead of a hard-coded
"4", which is incorrect on EBCDIC platforms.
|
| |
Replace this number with an already-existing mnemonic.
|
| |
Instead of a switch() statement we can use 'foo ? bar : baz;'
|
| |
Commit 7aaa36b196e5a478a3d1bd32506797db7cebf0b2 changed to use
strerror_l() if available on the platform. But there is a potential bug
with this on threaded perls. The code uses strerror_l() when it needs
the answer on a locale that isn't necessarily the current one. But it
uses plain strerror() when the locale is known to be the current one.
Plain strerror() isn't necessarily thread-safe. However, on systems
that have strerror_r(), reentr.h has caused our apparent call to plain
strerror() to instead call the thread-safe strerror_r() under the hood.
So there is no bug on unthreaded perls nor on ones that have
strerror_r().
This commit fixes the bug on threaded builds which have strerror_l() but
not strerror_r(). It does this by using strerror_l() for everything,
and constructing a locale object that is the current locale to use when
the locale doesn't need to be changed. This is somewhat more work than
the alternative above does, so that one is used if available.
No changes are made to how it works on systems that don't have
strerror_l().
Some systems have deprecated strerror_r(). reentr.h does not use it on
such systems. The reason for the deprecation, we would hope, may be
that the plain strerror() is implemented thread-safely. We don't know
that, so we just assume that the plain version is thread-unsafe.
We do have tests that try to find races here, but they haven't shown
any. It could be that systems that are advanced enough to have
strerror_l() also have strerror_r().
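The approach of using strerror_l() for everything can be sketched as
follows. This is an illustration under assumptions, not the locale.c
code: the helper name current_locale_strerror() is hypothetical, and it
assumes a platform with the POSIX 2008 functions uselocale(),
duplocale(), strerror_l() and freelocale().

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the message for 'errnum', in the thread's
 * current locale, into 'buf'.  Returns 0 on success, -1 on failure. */
static int
current_locale_strerror(int errnum, char *buf, size_t size)
{
    locale_t cur;
    const char *msg;

    if (size == 0)
        return -1;

    /* uselocale((locale_t) 0) queries the current locale without
     * changing it.  The answer may be LC_GLOBAL_LOCALE, which must not
     * be handed to strerror_l(), so duplicate it to get a real locale
     * object that represents the current locale. */
    cur = duplocale(uselocale((locale_t) 0));
    if (cur == (locale_t) 0)
        return -1;

    msg = strerror_l(errnum, cur);
    if (msg == NULL) {
        freelocale(cur);
        return -1;
    }

    /* Copy before freeing the locale object the message came from */
    strncpy(buf, msg, size - 1);
    buf[size - 1] = '\0';

    freelocale(cur);
    return 0;
}
```

Creating and freeing the locale object on every call is the extra work
referred to above, which is why the strerror_r() route is preferred
when it is available.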
|
| |
This is in prep for a future commit which needs it earlier
|
| |
The previous commit added arrays of locale categories. This commit
creates compile-time mappings from the category number to the index it
has in the array. It also changes to use the #define for the index of
LC_ALL in places it is expected to be defined. This causes bugs in this
logic to be found at compile time on systems that don't have LC_ALL.
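A rough sketch of such a compile-time mapping, with hypothetical names
(IDX_LC_*, locale_categories) and an enum standing in for the #defines
the commit uses:

```c
#include <locale.h>

/* Hypothetical index names.  Each category gets the next free slot, but
 * only if the platform defines that category. */
enum {
#ifdef LC_CTYPE
    IDX_LC_CTYPE,
#endif
#ifdef LC_NUMERIC
    IDX_LC_NUMERIC,
#endif
#ifdef LC_ALL
    IDX_LC_ALL,
#endif
    IDX_COUNT
};

/* The array is filled in through the same index names, so the category
 * number and its index cannot drift apart. */
static const int locale_categories[IDX_COUNT] = {
#ifdef LC_CTYPE
    [IDX_LC_CTYPE]   = LC_CTYPE,
#endif
#ifdef LC_NUMERIC
    [IDX_LC_NUMERIC] = LC_NUMERIC,
#endif
#ifdef LC_ALL
    [IDX_LC_ALL]     = LC_ALL,
#endif
};

/* Code that assumes LC_ALL exists refers to IDX_LC_ALL directly.  On a
 * platform without LC_ALL the name is undeclared, so the mistake shows
 * up as a compile error instead of a run-time bug. */
static int
lc_all_category(void)
{
    return locale_categories[IDX_LC_ALL];
}
```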
|
| |
locale.c is full of compiler conditionals because platforms vary widely
(or have in the past) in what categories they use. Prior to this
commit, there were many sections of code which had copies of the same
constructs which were #ifdef'd so they'd run only on the categories that
are to be used in this build.
This duplication creates the opportunity for changes to get applied to
only some of the places that they should, and also makes it hard to
read.
This commit adds two parallel arrays that can map a category to/from its
name, and are defined with each element conditionally compiled in based
on the needs of the build. Doing the conditionals during array
construction means that most of the other conditionals can be replaced
by looping through the arrays. Thus the duplicated code is eliminated,
as well as almost 200 lines in this file.
Most of these loops get executed only at process initialization, so the
slight performance hit is inconsequential.
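The idea can be sketched like this. The array and function names are
hypothetical and the category list is abbreviated; the point is only
that the conditionals live in the array definitions and the rest of the
code loops:

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Two parallel arrays; element i of one corresponds to element i of the
 * other.  The #ifdefs appear only here. */
static const int category_values[] = {
#ifdef LC_CTYPE
    LC_CTYPE,
#endif
#ifdef LC_NUMERIC
    LC_NUMERIC,
#endif
#ifdef LC_MESSAGES
    LC_MESSAGES,
#endif
#ifdef LC_ALL
    LC_ALL,
#endif
};

static const char * const category_names[] = {
#ifdef LC_CTYPE
    "LC_CTYPE",
#endif
#ifdef LC_NUMERIC
    "LC_NUMERIC",
#endif
#ifdef LC_MESSAGES
    "LC_MESSAGES",
#endif
#ifdef LC_ALL
    "LC_ALL",
#endif
};

#define CATEGORY_COUNT (sizeof category_values / sizeof category_values[0])

/* Map a category number to its name; NULL if not used in this build */
static const char *
category_name(int category)
{
    size_t i;

    for (i = 0; i < CATEGORY_COUNT; i++) {
        if (category_values[i] == category)
            return category_names[i];
    }
    return NULL;
}
```

A lookup in the other direction (name to category) would be the same
loop with strcmp() on category_names.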
|
| |
I noticed this flaw by code reading; I doubt that it's exploitable.
foldEQ assumes that both operands are at least as long as its length
parameter. In this case, it's possible that the codeset returned by
nl_langinfo is shorter than 5, in which case, it would try to access the
extra characters in the heap. Real codesets tend to be longer than
this, so an attacker would likely have to install a locale with a
made-up codeset whose name is shorter.
Even the codeset of the C locale is longer: "ANSI_X3.4-1968"
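A sketch of the kind of guard involved. fold_eq_n() is a hypothetical
stand-in for foldEQ (it is not the Perl function), and the exact-length
test is an assumption made for the illustration:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a fixed-length, case-insensitive compare.
 * Like foldEQ, it reads exactly 'len' bytes from BOTH operands, so the
 * caller must guarantee both are at least that long. */
static int
fold_eq_n(const char *a, const char *b, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (tolower((unsigned char) a[i]) != tolower((unsigned char) b[i]))
            return 0;
    }
    return 1;
}

/* Check the length first, so a short codeset name is never read past
 * its terminating NUL. */
static int
codeset_is_utf8(const char *codeset)
{
    return strlen(codeset) == 5 && fold_eq_n(codeset, "UTF-8", 5);
}
```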
|
| |
This makes savepv() part of the expressions instead of a separate
statement.
|
| |
This is trying to determine if the locale is UTF-8. The easiest way to
tell is if the codeset returned by nl_langinfo() says UTF-8, but if that
fails or nl_langinfo() is not present on the system, a fallback method
is to use the libc routines to convert a known byte string to a code
point and see if that matches the expected Unicode code point. Prior to
this patch, the byte string representing HYPHEN was used. That's
probably good enough, but we can do better with no extra work. This
commit changes to use the REPLACEMENT CHARACTER instead. That is a
Unicode concept. The chances of a non-UTF-8 locale taking the UTF-8 byte
string for the REPLACEMENT CHARACTER and evaluating it to the
REPLACEMENT CHARACTER are vanishingly small.
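The fallback can be sketched as below. The function name is
hypothetical, and the sketch assumes that wchar_t values are Unicode
code points on the platform, which the C standard does not guarantee:

```c
#include <stdlib.h>   /* mbtowc */
#include <wchar.h>    /* wchar_t */

/* Hypothetical helper: returns 1 if the current LC_CTYPE locale decodes
 * the UTF-8 bytes of U+FFFD REPLACEMENT CHARACTER to that code point. */
static int
locale_decodes_utf8(void)
{
    static const char replacement_utf8[] = "\xEF\xBF\xBD";
    wchar_t wc = 0;
    int len;

    /* Reset any shift state left over from earlier conversions */
    (void) mbtowc(NULL, NULL, 0);

    len = mbtowc(&wc, replacement_utf8, sizeof replacement_utf8 - 1);

    /* Save the one result we care about; a non-UTF-8 locale will either
     * fail, consume a different number of bytes, or yield another value */
    return len == 3 && wc == (wchar_t) 0xFFFD;
}
```

A program that has not called setlocale() runs in the "C" locale, where
this returns 0.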
|
| |
This is done only when debugging, but in some locales that have shift
states, the extra call could blow up. Instead save the result of the
mbtowc() call we care about.
|
| |
This adds STRLENs() where the argument must be a literal string
constant.
This may deserve wider applicability, but in case it doesn't, I'm making
it local to just this file.
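One way such a macro can be written is shown here as a sketch; the
definition actually added by the commit is not reproduced in this
message:

```c
#include <stddef.h>

/* Length of a literal string constant, computed at compile time.  The
 * empty strings on either side rely on adjacent string literal
 * concatenation, so passing anything other than a literal (for example
 * a char * variable) is a syntax error instead of silently yielding
 * sizeof(pointer) - 1. */
#define STRLENs(s) (sizeof("" s "") - 1)

/* Example use: no strlen() call happens at run time */
static size_t
utf8_name_length(void)
{
    return STRLENs("UTF-8");
}
```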
|
| |
This comment contains a list of code points that are unusual, but it
also included ones that are standard, which made me keep looking to see
why they were unusual, each time realizing in the end that they were
not.
|
| |
Following on the previous commit, this changes the name of the function
that changes the variable to be in sync with it.
|
| |
The real purpose of this internal variable is to give the name of the
locale that is the underlying one for the C program. Various macros
already indicate that. This furthers the process.
|
| |
This code is full of 'if's interrupted by #ifdefs, which makes it hard
to read. Changing it to a switch() makes it much easier to understand.
|
| |
This call was switching to the locale's radix character, but nothing
later in this function needs that, and the later code switches things to
use a dot anyway, so this call is useless.
|
| |
This converts the final plain nl_langinfo() function call in locale.c to
use the new equivalent, which is more thread-safe and whose returned
memory doesn't have to be freed. There was an unlikely leak before
this, if the return was somehow "".
|
| |
The extra '!' that snuck in there caused this code to not work properly.
Fortunately, it doesn't get used except as a last resort, and that
apparently hasn't happened so as to have gotten reported from the field.
A test can't be added because it would only occur on a system that had
bad locales.
|
| |
This standardizes things to make them easier to understand, and prepares
for future commits.
|
| |
This will be useful in future commits
|
| |
The new name more closely reflects what it does
|
| |
This adds a parameter to the function that sets the radix character for
floating point numbers. We know that the radix by default is a dot, so
no need to calculate it in that case.
This code was previously using localeconv() to find the locale's decimal
point. The just-added my_nl_langinfo() function does the same with an
easier API, is more thread-safe, and automatically switches to use
localeconv() when nl_langinfo() isn't available, so revise the
conditional compilation directives that previously were necessary, and
collapse directives that were unnecessarily nested.
And adjust indentation
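As an illustration only, here is a sketch that uses plain POSIX
nl_langinfo(RADIXCHAR) in place of my_nl_langinfo(), whose signature is
not given in this message. The helper name and its parameters are
hypothetical:

```c
#include <langinfo.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: store the radix string in 'buf'.  When
 * 'use_locale' is zero the caller already knows the radix is a dot, so
 * nothing is looked up. */
static void
radix_string(int use_locale, char *buf, size_t size)
{
    const char *radix = ".";

    if (size == 0)
        return;

    if (use_locale) {
        const char *r = nl_langinfo(RADIXCHAR);

        /* Fall back to the dot if the lookup yields nothing usable */
        if (r != NULL && *r != '\0')
            radix = r;
    }

    strncpy(buf, radix, size - 1);
    buf[size - 1] = '\0';
}
```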
|
| |
This extended version allows it to be called so that it uses the current
locale for the LC_NUMERIC, instead of toggling to the underlying one.
(This can be useful when in the middle of things.)
This ability won't be used until the next commit
|
| |
This function is called as part of the call made in the line before. No
need to do it twice.
|
| |
This file is full of conditional compilation, due to the fact that
locale support has been highly variable in the OSes Perl has operated
on. This commit properly indents nested compiler directives, and makes
sure there is a blank line between the directives and real code. I find
that much easier to read. It also re-orders some

    #ifdef some_feature
        Many lines of code handling feature
    #else
        1 to 3 lines of trivial code to avoid compilation warnings
    #endif

to

    #ifndef some_feature
        1 to 3 lines of trivial code to avoid compilation warnings
    #else
        Many lines of code handling feature
    #endif

Otherwise the trivial code may be hundreds of lines from the original
'#if', which makes it hard to grok.
This commit also clarifies and fixes typos in comments, and removes some
obsolete comments.
|
| |
Things like LC_CTYPE are locale variables, but not LC_ctype nor
LC__CTYPE. Prior to this commit all were treated as locale variables.
Many platforms have more locale variables than Perl knows about, e.g.,
LC_PAPER, and the code tries to catch all possibilities.
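A sketch of one way to make that distinction. It assumes entries of the
form "NAME=value", as found in environ, and an ASCII platform; the
function name is hypothetical and the rule shown (one or more uppercase
letters between "LC_" and "=") is an assumption consistent with the
examples above:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if 'entry' looks like a locale
 * environment variable assignment, i.e. "LC_" followed by one or more
 * uppercase letters and then '='. */
static int
is_locale_env_entry(const char *entry)
{
    static const char uppers[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    size_t n;

    if (strncmp(entry, "LC_", 3) != 0)
        return 0;

    /* Length of the run of uppercase letters after the prefix */
    n = strspn(entry + 3, uppers);

    return n > 0 && entry[3 + n] == '=';
}
```

Because nothing is hard-coded beyond the prefix, names Perl doesn't
know about, such as LC_PAPER, are still recognized.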
|
| |
Align vertically, and indent blocks to standard. It adds braces for
clarity.
|
| |
The original names are confusing.
See thread beginning with
http://nntp.perl.org/group/perl.perl5.porters/244335
The two macros are mapped into just that one, complementing the result
for the few cases where strNEs was used.
|
| |
This function allows us to avoid using a mutex and changing the locale.
|
| |
Perl_langinfo() is supposed to return a pointer to internal storage that
is supposed to remain valid until the next call to it. That should come
automatically on single-threaded perls. The previous version took
advantage of this to avoid copying the result to a buffer, and just
called plain nl_langinfo(). However, it turns out that some systems
destroy the internal space also when a setlocale() is done. That means
the result must be copied in all instances.
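The copying can be sketched as follows for the single-threaded case.
The function name is hypothetical and this is not the Perl_langinfo()
implementation:

```c
#include <langinfo.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper for a single-threaded program: return the
 * nl_langinfo() result from storage this code owns, so that a later
 * setlocale() cannot invalidate it.  The pointer stays valid until the
 * next call to this function.  Returns NULL if memory runs out. */
static const char *
saved_langinfo(nl_item item)
{
    static char *buf = NULL;
    static size_t buf_size = 0;
    const char *result = nl_langinfo(item);
    size_t needed = strlen(result) + 1;

    if (needed > buf_size) {
        char *bigger = realloc(buf, needed);

        if (bigger == NULL)
            return NULL;
        buf = bigger;
        buf_size = needed;
    }

    memcpy(buf, result, needed);   /* includes the trailing NUL */
    return buf;
}
```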
|
| |
It's unclear why the code uses this number, so expand out the expression
that yields that, which makes it clearer.
|
| |
This code I wrote was attempting to avoid multiple calls to strlen in
constructing the catenation of various components of a string. It did
this by keeping track of how far it got each iteration, and using that
as a starting point for the next. I now realize that strlcat returns
the length the result would have had if it had succeeded, even if there
isn't enough room. That means that if there were a problem, this could
start out an iteration such that it would be writing beyond the end of
the buffer. It is safer to not do this, so this commit removes it.
The use of strlcat is a safety measure, as there should be a sufficient
amount of space calculated for things to fit, so there is no bug here.
But one should be safe.
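To illustrate the return value, here is a sketch with a stand-in named
sketch_strlcat() that mimics the BSD strlcat() semantics (the real
function is not available on every libc), plus a hypothetical
join_parts() that always appends from the start of the buffer:

```c
#include <stddef.h>
#include <string.h>

/* Stand-in with BSD strlcat() semantics: appends 'src' to 'dst' without
 * writing more than 'size' bytes in total, and returns the length the
 * result WOULD have had with unlimited room. */
static size_t
sketch_strlcat(char *dst, const char *src, size_t size)
{
    size_t dlen = 0;
    size_t slen = strlen(src);
    size_t copy;

    while (dlen < size && dst[dlen] != '\0')
        dlen++;
    if (dlen == size)               /* no NUL found within 'size' */
        return size + slen;

    copy = size - dlen - 1;         /* room left, minus the NUL */
    if (slen < copy)
        copy = slen;
    memcpy(dst + dlen, src, copy);
    dst[dlen + copy] = '\0';

    return dlen + slen;             /* may be >= size on truncation */
}

/* Hypothetical joiner: never uses the return value as an offset, only
 * as a truncation check.  Returns 0 on success, -1 on truncation. */
static int
join_parts(char *buf, size_t size, const char * const *parts, size_t n)
{
    size_t i;

    if (size == 0)
        return -1;
    buf[0] = '\0';
    for (i = 0; i < n; i++) {
        if (sketch_strlcat(buf, parts[i], size) >= size)
            return -1;
    }
    return 0;
}
```

With an 8-byte buffer already holding "abcdef", appending "ghij"
returns 10 while only "abcdefg" is stored, so using that return value
as the next write position would point past the end of the buffer.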
|
| |
This is the better way to do this.
|
| |
This is designed to generally replace nl_langinfo() in XS code. It is
thread-safer, hides the quirks of perl's LC_NUMERIC handling, and can be
used on systems lacking nl_langinfo.
|
| |
strerror_l makes the my_strerror function trivial, as it doesn't have to
worry about critical sections, etc. Even on unthreaded perls, it avoids
having to change the current locale, and then change it back.
|
| |
This moves all the handling of the case where there are no locale
messages into one place, instead of splitting it up across long
stretches of conditionally compiled code. This code is essentially
trivial, and is seen to be so when it isn't split up; this prepares for
the next commit.
The final return of the function is still split off so that all branches
go through it, and the debugging code adjacent to it.
|
| |
This is moved so it gets executed for all branches.
|
| |
This changes the controlling #define for using the POSIX 2008 locale
functions to "USE_POSIX_2008_LOCALE". The previous controlling name
"USE_THREAD_SAFE_LOCALE" is retained for backward compatibility.
The reason for this change is that we may add thread-safe locale
handling even on platforms that don't have POSIX 2008, so the name
USE_THREAD_SAFE_LOCALE would be used for controlling things in that
situation.
In other words, the concepts may become distinct, and so prepare for
that.
|
| |
This cleans up the interface, as it allows several functions to now be
static that used to have to be called from outside locale.c
|
| |
I pushed the previous commit without actually amending it to include
this
|
| |
These debug statements have proven useful in the past tracking down
problems. I looked them over and kept the ones that I thought might be
useful in the future. This includes extracting some code into a
static function so it can be called from more than one place.
|
| |
(this is debugging-only code)
It was trying to printf a U32 using %u
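For illustration, a portable way to print a 32-bit unsigned value in
standard C is sketched below. It uses uint32_t as a stand-in for U32
and the PRIu32 macro from <inttypes.h>; the format actually chosen by
this commit is not shown in the message, and format_u32() is a
hypothetical helper:

```c
#include <inttypes.h>   /* uint32_t, PRIu32 */
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format a 32-bit unsigned value.  "%u" expects an
 * unsigned int, which need not be the type behind a 32-bit typedef, so
 * use the matching PRIu32 conversion instead. */
static int
format_u32(char *buf, size_t size, uint32_t value)
{
    return snprintf(buf, size, "%" PRIu32, value);
}
```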
|
| |
I found myself needing this function for development debugging, which
formerly was only usable from utf8.c. This enhances it to allow a
second format type, and makes it core-accessible.
|
| |
An array was being declared and initialized from a non-constant.
Spotted by James Keenan
|