| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Coverity CID 288709
|
| |
The previous commit added a mutex specifically for protecting against
simultaneous accesses of the environment. This commit changes the
normal getenv, putenv, and clearenv functions to use it, to avoid races.
This simplifies the places where we had been burned and had added ad hoc
code to avoid races. Other races that we never noticed could have
existed until now. The protection now comes automatically, so we can
remove the special cases we stumbled over earlier.
getenv() returns a pointer to static memory, which can be overwritten at
any moment from another thread, or even another getenv from the same
thread. This commit changes the accesses to be under control of a
mutex, and in the case of getenv, a mortalized copy is created so that
there is no possible race.
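As a rough illustration of the approach described above (not perl's actual code), the sketch below takes a lock around getenv() and returns a private copy of the value. The names env_mutex, getenv_copy and setenv_locked are hypothetical, and a plain heap copy that the caller frees stands in for perl's mortalized copy.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the environment mutex the commit refers to. */
static pthread_mutex_t env_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Copy the value of NAME into a freshly allocated string while holding the
 * lock. Returns NULL if the variable is unset or allocation fails; the
 * caller frees the result. */
static char *getenv_copy(const char *name)
{
    char *copy = NULL;

    assert(name != NULL);
    pthread_mutex_lock(&env_mutex);
    const char *value = getenv(name);
    if (value != NULL)
        copy = strdup(value);   /* copied before the lock is released */
    pthread_mutex_unlock(&env_mutex);
    return copy;
}

/* Writers take the same lock, so they cannot run while a reader is copying. */
static int setenv_locked(const char *name, const char *value)
{
    pthread_mutex_lock(&env_mutex);
    int rc = setenv(name, value, 1);
    pthread_mutex_unlock(&env_mutex);
    return rc;
}
```

The copy is what removes the race: once the lock is dropped, later changes to the environment cannot affect the string the caller holds.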
|
| |
This commit enhances these functions so that on threaded perls, they use
mbrtowc and wcrtomb when available, making them thread safe. The
substitution isn't completely transparent, as no effort is made to hide
any differences in errno setting upon error. And there may be slight
differences in edge case behavior on some platforms.
This commit also changes the behaviors so that they take a scalar
parameter instead of a char *, and this might be 'undef' or not be
forceable into a valid PV. If not a PV, the functions initialize the
shift state. Previously the shift state was always reinitialized with
every call, which meant these could not work on locales with shift
states.
In addition, there were several issues in mbtowc and wctomb that this
commit fixes.
mbtowc and wctomb, when used, are now run with a semaphore. This avoids
races if called at the same time in another thread.
The returned wide character from mbtowc() could well have been garbage.
The final parameter to mbtowc is now optional, as passing an SV allows
us to determine the length without the need for an extra parameter. It
is now used only to restrict the parsing of the string to shorter than
the actual length.
wctomb would segfault if the string parameter was shared or hadn't
been pre-allocated with a string of sufficient length to hold the
result.
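The two C-level hazards mentioned above (an output buffer that is too small, and a wide character left uninitialized on failure) can be sketched as follows. This is only an illustration using the standard restartable functions; wide_to_bytes and bytes_to_wide are hypothetical names, not the POSIX module's functions.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <wchar.h>

/* Convert one wide character to its multibyte form.
 * OUT must have room for MB_LEN_MAX bytes, the most any single character can
 * need in any locale. Returns the byte count, or (size_t)-1 on error. */
static size_t wide_to_bytes(wchar_t wc, char out[MB_LEN_MAX], mbstate_t *state)
{
    assert(out != NULL && state != NULL);
    return wcrtomb(out, wc, state);
}

/* Decode the first character of S (at most LEN bytes) into *WC.
 * *WC is given a defined value first, so a failed conversion never leaves
 * garbage in it. */
static size_t bytes_to_wide(wchar_t *wc, const char *s, size_t len,
                            mbstate_t *state)
{
    assert(wc != NULL && s != NULL && state != NULL);
    *wc = L'\0';
    return mbrtowc(wc, s, len, state);
}
```

Because each caller supplies its own mbstate_t, two threads converting at the same time do not share shift state, which is the property the restartable variants provide.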
|
| |
This commit changes the behavior so that it takes a scalar parameter
instead of a char *, and thus might not be forceable into a valid PV.
When not a PV, the shift state is reinitialized, like calling mblen with
a NULL first parameter. Previously the shift state was always
reinitialized with every call, which meant this could not work on
locales with shift states.
This commit also switches to mbrlen() on threaded perls, (mostly)
transparently, when available, to achieve thread-safe operation. It is not
completely transparent because mbrlen (under the very rare stateful
locales) returns a different value when it's resetting the shift state.
It also may set errno differently upon errors, and no effort is made to
hide that difference. Also mbrlen on some platforms can handle partial
characters.
[perl #133928] showed that someone was having trouble with shift states.
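To make the shift-state point concrete, here is a minimal sketch of walking a multibyte string with mbrlen() and a state that persists across calls instead of being reset each time. The helper name count_chars is hypothetical, and the sketch treats a partial trailing character as an error, which is one possible policy rather than what perl does.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Count the characters in the first LEN bytes of S, carrying one shift state
 * through the whole string. Returns -1 on an invalid or truncated sequence. */
static long count_chars(const char *s, size_t len)
{
    mbstate_t state;
    long count = 0;

    assert(s != NULL);
    memset(&state, 0, sizeof state);   /* start in the initial shift state */

    while (len > 0) {
        size_t n = mbrlen(s, len, &state);
        if (n == (size_t)-1 || n == (size_t)-2)
            return -1;                 /* invalid, or incomplete character */
        if (n == 0)
            n = 1;                     /* an embedded NUL occupies one byte */
        s += n;
        len -= n;
        count++;
    }
    return count;
}
```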
|
|
|
|
|
|
|
| |
This changes the code to use USE_POSIX_2008_LOCALE instead of
HAS_POSIX_2008_LOCALE. Rarely do they differ, but someone may choose to
configure their installation to not use these more modern functions,
even if available, perhaps because they're buggy on that system.
|
| |
|
| |
Please see the ticket for a full explanation. This bug has been
submitted to glibc, without any real action forthcoming so far.
This invalidates the message cache each time the locale of LC_MESSAGES
is changed, as glibc should be doing this when uselocale changes that,
but glibc fails to do so.
This patch is an extension to the one submitted by Niko Tyni++.
I don't know how to test it, since a test would rely on several
different locales in different languages being available, and that
depends on what's installed on the platform. I suppose that one could
go through the available locales, and try to find three with different
wording for the same message. Doing so, however, would trigger the bug,
and at the end, if we didn't get three that differed, we wouldn't know
whether that was because of the bug or because they just didn't exist
on the system.
However, below is a perl program that demonstrated the patch worked.
You could adjust it to the available locales. The buggy code shows the
same text for all locales. The fixed shows three different languages.
    use strict;
    use Locale::gettext;
    use POSIX;

    $ENV{LANG} = 'C.UTF-8';
    for my $lang (qw(fi_FI fr_FR en_US)) {
        $ENV{LANGUAGE} = $lang;
        setlocale(LC_MESSAGES, '');
        my $d = Locale::gettext->domain("bash");
        print $d->get('syntax error'), "\n";
    }
|
| |
|
| |
This appears to abort because the supplied locale string isn't validly
encoded in the current code page, so we see the following steps:
1) an internal sizing call to mbstowcs_s() fails, but
2) the calling (CRT) code doesn't handle that, allocating a zero length
buffer
3) mbstowcs_s() is called with a buffer and a zero size, causing the
exception.
Since it's the conversion that fails, perform our own conversion.
Rather than using the current code page always use CP_UTF8, since
this is perl's typical non-Latin1 encoding.
Unfortunately we don't have the SVf_UTF8 flag at this point, so
all we can do is assume UTF-8.
This introduces a change in behaviour - previously locale names
were interpreted in the current code page, but most locale names
are ASCII, so it shouldn't matter.
One issue is that the return value is freed on the next LEAVE, but
all callers immediately use or copy the string.
|
|
|
|
|
| |
Coverity is right, so re-order these clauses. This code is executed
only if some very strange error occurs.
|
|
|
|
| |
And regen affected files
|
|
|
|
| |
The wrong #define was being tested for
|
|
|
|
|
| |
This was just an oversight. The code doesn't get executed unless it's
trying to panic.
|
| |
|
|
|
|
|
|
| |
This was failing in gcc 2.95. The original commit added a cast, but we
figured out that removing this other one that really served no purpose
causes this compiler to work.
|
|
|
|
|
|
| |
Karl pointed out that a couple of my recent commits used (lower case)
safefree() rather than Safefree(), the latter having extra debugging
facilities.
|
| |
The following leaked:
LANG= perl -e1
because in S_emulate_setlocale(), it was
1) making a copy of $ENV{"LANG"};
2) throwing that copy away and replacing it with "C" when it discovered
that the string was empty.
A little judicious reordering of that chunk of code makes the issue go
away.
Showed up as failures of lib/locale_threads.t under valgrind / ASan.
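The reordering can be pictured with the small sketch below: decide which string to use first, and only then make the single copy. The function name default_locale_name is hypothetical and this is not the code in S_emulate_setlocale(); it only shows the pattern.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a freshly allocated locale name: a copy of LANG_VALUE, or "C" when
 * LANG_VALUE is NULL or empty. Choosing before copying means exactly one
 * allocation is made, so nothing is thrown away unfreed. */
static char *default_locale_name(const char *lang_value)
{
    const char *chosen = lang_value;

    if (chosen == NULL || *chosen == '\0')
        chosen = "C";

    char *copy = strdup(chosen);
    assert(copy == NULL || copy[0] != '\0');   /* never an empty name */
    return copy;
}
```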
|
| |
For example the following leaked:
require POSIX; import POSIX ':locale_h';
setlocale(&POSIX::LC_ALL, 'aa_DJ.iso88591') or die;
use locale;
my $ok = 'A' lt chr 0x100;
Some code in Perl__mem_collxfrm() does a couple of
for (j = 1; j < 256; j++) { ... }
loops where for each chr(j) character it recursively calls itself, and
records the index of the 'smallest' / 'largest' result. However, when
updating cur_min_x / cur_max_x, it wasn't freeing the previous value.
The symptoms were that valgrind / Address Sanitizer found fault with
lib/locale.t
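The shape of the fix, freeing the old extreme before replacing it, might look like the sketch below. It uses plain strcmp() on heap strings instead of collation transforms, and offer_min is a hypothetical helper, so it illustrates only the ownership rule, not Perl__mem_collxfrm() itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Offer CANDIDATE (heap-allocated; ownership passes to this function) as the
 * new minimum. Whichever string loses the comparison is freed, so *CUR_MIN
 * holds at most one live allocation at any time. */
static void offer_min(char **cur_min, char *candidate)
{
    assert(cur_min != NULL && candidate != NULL);
    if (*cur_min == NULL || strcmp(candidate, *cur_min) < 0) {
        free(*cur_min);        /* the step a leaky version would omit */
        *cur_min = candidate;
    } else {
        free(candidate);
    }
}
```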
|
| |
|
|
|
|
|
|
| |
It is possible to have a single-threaded build use the thread-safe
locale setting operations. Add a word to indicate it's not 100% the
other way.
|
| |
Commit 70bd6bc82ba64c1d197d3ec823f43c4a454b2920 fixed a leak (likely due
to a bug in glibc) by not duplicating the C locale object. However,
that meant that there's only one copy running around. And freeing that
will cause havoc, as it's supposed to be there until destruction. What
appears to be happening is that the current locale object is freed upon
thread destruction, and that could be this global one. But I don't
understand why it's only happening on FreeBSD and only on this version.
But this commit fixes the problem there, and makes sense. Simply don't
free this global object upon thread destruction.
This commit also changes it so it doesn't get destroyed at destruction
time, leaving it to the final PERL_SYS_TERM to free. I'm not sure, but
I think this fixes any issues with embedded perls.
|
|
|
|
|
| |
Indent a block newly formed in the previous commit.
Wrap some too-long lines
|
| |
On threaded perls, we create a locale object for LC_ALL "C" early in the
startup phase. When the user asks for that locale, we can just switch
to it instead of trying to create a new one.
Doing the creation worked, but ended up with a memory leak. My guess,
and it's only a guess, is that it's a bug in glibc newlocale.c, in which
it does an early return, not doing proper cleanup, when it discovers it
can re-use an existing locale without needing to create a new one.
The reason I think it's a glibc bug is that the sample one-liner sent
to me
PERL_DESTRUCT_LEVEL=2 valgrind --leak-check=full ./perl -DLv -Ilib -e'require POSIX;POSIX::setlocale(&POSIX::LC_ALL, "C");' 2>&1 | more
produced a stack output of where the leaked memory had been allocated.
I put a print immediately after that line, and prints at the points
where things get freed. Every allocation was matched by an attempt to
free it. But clearly at least one failed. freelocale() returns void,
so can't be checked for failing.
Anyway, it's better to try not to create a new locale when we already
have an existing one, and doing so, as this commit does, causes the leak
to go away.
No tests are added, as there are plenty of similar tests already in the
suite, and they all should have been leaking.
|
| |
|
| |
Some systems fake their locales, so that they pretend to accept a locale
change, but they either do nothing, making everything the C locale, or
on some systems there is a second locale "C-UTF-8" that can be
switched to. Configure probes have been added to find such systems, and
this commit changes to use the results of these probes, so that we don't
try looking for other locales (any names we came up with would be
accepted as valid, but don't work, and tests were failing as a result).
Anything running the musl library fits, as does OpenBSD and its kin, as
they view locales as security risks. This commit allows us to take out
some code that was looking for particular OS's.
|
|
|
|
|
| |
C99 has wide character case changing. If those are available, use them
to be surer we have a Turkic locale.
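A check along these lines could be sketched as below. It assumes wchar_t values are Unicode code points, which is common but not guaranteed by the C standard, and both function names are hypothetical rather than taken from locale.c.

```c
#include <assert.h>
#include <wchar.h>
#include <wctype.h>

/* Turkic casing maps 'i' up to U+0130 (capital I with dot above) and 'I'
 * down to U+0131 (dotless i). */
static int is_turkic_casing(wint_t upper_of_i, wint_t lower_of_I)
{
    assert(upper_of_i != WEOF && lower_of_I != WEOF);
    return upper_of_i == (wint_t)0x130 && lower_of_I == (wint_t)0x131;
}

/* Ask the C99 wide-character case functions how the current LC_CTYPE locale
 * cases the two letters that Turkic locales treat specially. */
static int current_locale_looks_turkic(void)
{
    return is_turkic_casing(towupper((wint_t)L'i'), towlower((wint_t)L'I'));
}
```

Splitting the comparison from the library calls keeps the rule testable on systems that have no Turkic locale installed.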
|
| |
|
| |
change a couple of
const char * foo[] = { ... }
to
const char * const foo[] = { ... }
Making the string ptrs const means the whole thing is RO and doesn't
appear in data section, making porting/libperl.t happier when building
under -DPERL_GLOBAL_STRUCT_PRIVATE.
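The difference between the two declarations can be seen in a small sketch. The array and function names here are hypothetical; where each array actually lands in the object file depends on the compiler and platform.

```c
#include <assert.h>
#include <stddef.h>

/* The pointers can be reassigned, so the array itself has to live in
 * writable data even though the strings are constant. */
static const char *writable_names[] = { "LC_CTYPE", "LC_NUMERIC" };

/* Both the strings and the pointers are const, so the compiler is free to
 * place the whole array in a read-only section. */
static const char * const readonly_names[] = { "LC_CTYPE", "LC_NUMERIC" };

/* Legal for the first array; the same assignment to readonly_names[0]
 * would be rejected at compile time. */
static void rename_first(const char *replacement)
{
    writable_names[0] = replacement;
}

static const char *name_at(size_t i)
{
    assert(i < sizeof readonly_names / sizeof readonly_names[0]);
    return readonly_names[i];
}
```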
|
|
|
|
|
|
| |
The perl build option -DPERL_GLOBAL_STRUCT_PRIVATE had bit-rotted
due to lack of smoking. The main fix is to just add 'dVAR;' to any
functions which have a pTHX arg. It's a NOOP on normal builds.
|
|
|
|
| |
This code would fail to require if Configure had ccflags=-DNO_LOCALE
|
| |
When switching into a new locale, after it is decided this is a UTF-8
locale, the code now also checks whether the locale is a specialized
Turkic one, which has a couple of slightly modified casing change rules.
If so, it sets a flag indicating this.
The code added in previous commits in this series checks whether that
flag is set when it is actually paying attention to the background
locale, and if so behaves according to Unicode Turkic rules.
|
|
|
|
| |
It currently is always set false, until later in this series of commits.
|
|
|
|
|
| |
This is part of [perl #133696]. A typo was causing a macro to be
defined in terms of itself, hence an illegal recursive definition.
|
|
|
|
|
| |
This commit #ifdef's a usage of a variable that isn't valid unless the
system has LC_NUMERIC.
|
|
|
|
|
|
| |
The function print_bytes_for_locale() should be defined if DEBUGGING;
prior to this commit it didn't get defined unless LC_COLLATE was
defined on the platform.
|
|
|
|
|
|
| |
Per: https://lgtm.com/projects/g/Perl/perl5/alerts/?mode=tree&ruleFocus=2157860312
For: RT # 133686 (partial)
|
|
|
|
| |
avoid malloc/free when possible
|
| |
With Perl 5.28.0, there are some mismatches between blocks
and conditional compilation in the Perl__is_cur_LC_category_utf8() function.
The compilation of miniperl can fail like this:
```
locale.c: In function `Perl__is_cur_LC_category_utf8`:
locale.c:5481:1: error: expected declaration or statement at end of input
}
^
```
Signed-off-by: Francois Perrad <francois.perrad@gadz.org>
|
|
|
|
|
| |
Several problems with this compile option were not caught before 5.28
was frozen.
|
|
|
|
|
| |
I found these confusing while trying to debug a field problem. This
reorders them, adjusting the wording slightly to compensate.
|
|
|
|
|
|
|
| |
Commit 32a62865ef662fce2b2250a7e0eca15861e7fe20 did not work, as gcc
doesn't recognize a void cast as handling a return value. This should
hopefully work, though we discard the value before looking at it, which
could cause another warning.
|
|
|
|
| |
so that it is easier to debug memory leaks.
|
|
|
|
|
|
| |
This was caused by doing some initialization work out-of-order. This
commit just moves some code to later in the function, revising some
comments to make sense after the move.
|
|
|
|
|
|
|
|
|
| |
When there are discrepancies in the locale and what Perl is expecting, a
warning is raised listing the problematic characters. For \n and \t,
they should have been displayed as mnemonics, but a required backslash
to escape things had been omitted, so they were displayed literally, so
looked just like white space. Also, put any displayed blank in ' ' so
it won't look like the list is empty.
|
|
|
|
|
|
| |
setlocale(LC_ALL, "LC_foo=bar; LC_baz=gah") is legal. Any categories
omitted in the string are set to "C". Prior to this commit the omitted
categories were unchanged.
|
|
|
|
|
| |
The return value is discarded here, and the code a few lines down calls
this function again, retaining its return value.
|
| |
The next call to setlocale can overwrite the returned value from the
current call, depending on platform. Therefore, one should save the
results. I forgot this in commit 39e69e777b8. Now fixing it.
I also audited locale.c to find any other instances. There were several
where setlocale() is called without saving, and that return is passed to
a function. It may work now, but it's dangerous to rely on the function
not getting changed in such a way as to do its own setlocale, expecting
the input parameter to be unchanged. So save the returns from these as
well, as a precaution.
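The precaution described above amounts to copying the string setlocale() returns before anything else can call setlocale() again. A minimal sketch, with hypothetical helper names and plain heap copies rather than perl's own allocation routines:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Return a private copy of the current locale name for CATEGORY, or NULL on
 * failure. The pointer setlocale() hands back may be overwritten by the next
 * setlocale() call, so it is copied immediately. */
static char *save_locale_name(int category)
{
    const char *current = setlocale(category, NULL);
    return current != NULL ? strdup(current) : NULL;
}

/* Switch CATEGORY back to a name obtained from save_locale_name() and
 * release the copy. Returns 1 on success, 0 if the locale was rejected. */
static int restore_locale_name(int category, char *saved)
{
    assert(saved != NULL);
    int ok = setlocale(category, saved) != NULL;
    free(saved);
    return ok;
}
```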
|
| |
|
| |
In recent Perl versions, the underlying locale for LC_NUMERIC has been
kept in C because XS code is expecting a dot radix character. But if
the LC_NUMERIC locale has a dot, that is unnecessary. (There is also
the thousands grouping separator which for safety we verify is empty.)
Thus 5.27 doesn't always keep the underlying locale in C; it does so
only if necessary.
This commit updates various comments and pods to reflect this change.
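The condition described above, a dot radix plus an empty thousands separator, can be tested with localeconv(). The sketch below is only an illustration of that test; the function name is hypothetical and it is not how locale.c performs the check.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Return 1 if the current LC_NUMERIC locale formats numbers the way code
 * expecting the C locale would: a "." radix and no thousands separator. */
static int numeric_locale_is_c_compatible(void)
{
    const struct lconv *lc = localeconv();

    assert(lc != NULL);
    return strcmp(lc->decimal_point, ".") == 0
        && lc->thousands_sep[0] == '\0';
}
```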
|
|
|
|
|
|
|
|
|
| |
Prior to this patch, the code assumed that if you have the other, more
significant, POSIX 2008 functions available, that duplocale was present
and correctly functioning too.
However, we found that there have been bugs in it, so that a hints file
or Configure probe might want to exclude just it.
|