path: root/locale.c
Commit message (Author, Age, Files, Lines)
* switch_category_locale_to_template: Fix use-after-free under -DLv (Dagfinn Ilmari Mannsåker, 2020-03-16, 1 file, -1/+1)
  Coverity CID 288709
* Add thread safety to some environment accesses (Karl Williamson, 2020-03-11, 1 file, -26/+4)
  The previous commit added a mutex specifically for protecting against simultaneous accesses of the environment. This commit changes the normal getenv, putenv, and clearenv functions to use it, to avoid races. This simplifies the code in places where we had been burned and had added special handling to avoid races. Other places where we were being burned without knowing it may have existed until now. That protection now comes automatically, and we can remove the special cases we stumbled over earlier.

  getenv() returns a pointer to static memory, which can be overwritten at any moment from another thread, or even by another getenv() in the same thread. This commit puts these accesses under control of the mutex, and in the case of getenv, creates a mortalized copy so that there is no possible race.
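  To illustrate the locking scheme described above, here is a minimal C sketch, assuming POSIX threads. The names `env_mutex`, `locked_getenv_copy()` and `locked_setenv()` are hypothetical, not perl's internals; a heap copy handed to the caller stands in for perl's mortalized SV, so the caller must `free()` it.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical process-wide lock guarding every environment access. */
static pthread_mutex_t env_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Return a heap copy of the variable's value, or NULL if it is unset.
   The copy is taken while the lock is held, so a concurrent writer
   cannot change it afterwards.  The caller owns and frees the copy. */
char *locked_getenv_copy(const char *name)
{
    char *copy = NULL;
    pthread_mutex_lock(&env_mutex);
    const char *value = getenv(name);
    if (value != NULL)
        copy = strdup(value);
    pthread_mutex_unlock(&env_mutex);
    return copy;
}

/* Writers take the same lock, so no reader sees a half-updated entry. */
int locked_setenv(const char *name, const char *value)
{
    pthread_mutex_lock(&env_mutex);
    int rc = setenv(name, value, 1);
    pthread_mutex_unlock(&env_mutex);
    return rc;
}
```

  The essential point is that the copy happens inside the critical section; returning the raw `getenv()` pointer after unlocking would reopen the race.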
* Fixup POSIX::mbtowc, wctomb (Karl Williamson, 2020-02-19, 1 file, -2/+8)
  This commit enhances these functions so that, on threaded perls, they use mbrtowc and wcrtomb when available, making them thread-safe. The substitution isn't completely transparent: no effort is made to hide any differences in errno setting upon error, and there may be slight differences in edge-case behavior on some platforms.

  This commit also changes the behaviors so that they take a scalar parameter instead of a char *, and this might be 'undef' or not be forceable into a valid PV. If not a PV, the functions initialize the shift state. Previously the shift state was always reinitialized with every call, which meant these could not work on locales with shift states.

  In addition, there were several issues in mbtowc and wctomb that this commit fixes.

  mbtowc and wctomb, when used, are now run with a semaphore. This avoids races if they are called at the same time in another thread.

  The wide character returned from mbtowc() could well have been garbage.

  The final parameter to mbtowc is now optional, as passing an SV allows us to determine the length without the need for an extra parameter. It is now used only to restrict the parsing of the string to shorter than the actual length.

  wctomb would segfault if the string parameter was shared or hadn't been pre-allocated with a string of sufficient length to hold the result.
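  The key change is that the restartable functions keep the shift state in a caller-owned `mbstate_t` instead of hidden static storage. The C sketch below shows that idea. The `mb_converter` type and its helpers are hypothetical illustrations, not perl's code; they keep one state per direction, and resetting both states corresponds to passing a non-PV argument.

```c
#include <limits.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical converter: one mbstate_t per direction, owned by the
   caller rather than hidden inside mbtowc()/wctomb(). */
typedef struct {
    mbstate_t in_state;   /* multibyte -> wide */
    mbstate_t out_state;  /* wide -> multibyte */
} mb_converter;

/* Return both directions to the initial shift state. */
void mb_reset(mb_converter *c)
{
    memset(&c->in_state, 0, sizeof c->in_state);
    memset(&c->out_state, 0, sizeof c->out_state);
}

/* Decode one character from at most n bytes of s into *out.  Returns
   the byte count, 0 for a NUL, or (size_t)-1 / (size_t)-2 exactly as
   mbrtowc() does.  The state persists between calls, so stateful
   encodings work, and no static data is shared between threads. */
size_t mb_decode(mb_converter *c, wchar_t *out, const char *s, size_t n)
{
    return mbrtowc(out, s, n, &c->in_state);
}

/* Encode wc into buf, which must hold at least MB_LEN_MAX bytes, so
   the result always fits.  Returns bytes written or (size_t)-1. */
size_t mb_encode(mb_converter *c, char buf[static MB_LEN_MAX], wchar_t wc)
{
    return wcrtomb(buf, wc, &c->out_state);
}
```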
* POSIX::mblen() Make thread-safe; allow shift state control (Karl Williamson, 2020-02-19, 1 file, -0/+6)
  This commit changes the behavior so that it takes a scalar parameter instead of a char *, and thus might not be forceable into a valid PV. When not a PV, the shift state is reinitialized, like calling mblen with a NULL first parameter. Previously the shift state was always reinitialized with every call, which meant this could not work on locales with shift states.

  This commit also switches to using mbrlen() on threaded perls, transparently (mostly), when it is available, to achieve thread-safe operation. It is not completely transparent because mbrlen (under the very rare stateful locales) returns a different value when it's resetting the shift state. It also may set errno differently upon errors, and no effort is made to hide that difference. Also, mbrlen on some platforms can handle partial characters.

  [perl #133928] showed that someone was having trouble with shift states.
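  A minimal C sketch of an mblen() replacement built on mbrlen(), assuming the caller keeps the shift state. `safe_mblen()` is a hypothetical name. A NULL string resets the state, mirroring `mblen(NULL, 0)`, but it always returns 0, which is the kind of return-value difference the commit mentions.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical thread-safe stand-in for mblen(): the shift state
   lives in *ps, supplied by the caller, instead of in mblen()'s
   hidden static state. */
size_t safe_mblen(mbstate_t *ps, const char *s, size_t n)
{
    if (s == NULL) {
        /* Reset to the initial shift state.  Unlike mblen(NULL, 0),
           this returns 0 even for stateful encodings. */
        memset(ps, 0, sizeof *ps);
        return 0;
    }
    /* Byte length of the next character, 0 for a NUL, or
       (size_t)-1 / (size_t)-2 for invalid / incomplete input. */
    return mbrlen(s, n, ps);
}
```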
* locale.c: Use proper #ifdef to enable behavior (Karl Williamson, 2019-11-30, 1 file, -3/+3)
  This changes to use USE_POSIX_2008_LOCALE instead of HAS_POSIX_2008_LOCALE. Rarely do they differ, but someone may choose to configure their installation to not use these more modern functions, even if available, perhaps because they're buggy on that system.
* locale.c white space only (Karl Williamson, 2019-11-30, 1 file, -2/+2)
* PATCH: GH #17081: Workaround glibc bug with LC_MESSAGES (Karl Williamson, 2019-11-30, 1 file, -0/+21)
  Please see the ticket for a full explanation.

  This bug has been submitted to glibc, without any real action forthcoming so far.

  This patch invalidates the message cache each time the locale of LC_MESSAGES is changed. glibc should be doing this when uselocale changes that category, but fails to do so.

  This patch is an extension to the one submitted by Niko Tyni++.

  I don't know how to test it, since a test would rely on several different locales in different languages being available, and that depends on what's installed on the platform. I suppose one could go through the available locales and try to find three with different wording for the same message. Doing so, however, would trigger the bug, and at the end, if we didn't get three that differed, we wouldn't know whether it was because of the bug or because they just didn't exist on the system.

  However, below is a perl program that demonstrates the patch works. You could adjust it to the available locales. The buggy code shows the same text for all locales. The fixed code shows three different languages.

      use strict;
      use Locale::gettext;
      use POSIX;

      $ENV{LANG} = 'C.UTF-8';
      for my $lang (qw(fi_FI fr_FR en_US)) {
          $ENV{LANGUAGE} = $lang;
          setlocale(LC_MESSAGES, '');
          my $d = Locale::gettext->domain("bash");
          print $d->get('syntax error'), "\n";
      }
* (perl #133981) fix my stupid mistake (Tony Cook, 2019-09-05, 1 file, -2/+2)
* (perl #133981) fix for Win32 setlocale() abort (Tony Cook, 2019-09-03, 1 file, -1/+60)
  This appears to abort because the supplied locale string isn't validly encoded in the current code page, so we see the following steps:

    1) an internal sizing call to mbstowcs_s() fails, but
    2) the calling (CRT) code doesn't handle that, allocating a zero-length buffer;
    3) mbstowcs_s() is called with a buffer and a zero size, causing the exception.

  Since it's the conversion that fails, perform our own conversion. Rather than using the current code page, always use CP_UTF8, since this is perl's typical non-Latin1 encoding.

  Unfortunately we don't have the SVf_UTF8 flag at this point, so all we can do is assume UTF-8.

  This introduces a change in behaviour: previously locale names were interpreted in the current code page, but most locale names are ASCII, so it shouldn't matter.

  One issue is that the return value is freed on the next LEAVE, but all callers immediately use or copy the string.
* locale.c: Stop Coverity warning (Karl Williamson, 2019-08-06, 1 file, -5/+6)
  Coverity is right, so re-order these clauses. This code is executed only if some very strange error occurs.
* Fix "it it" typos (Dagfinn Ilmari Mannsåker, 2019-07-04, 1 file, -1/+1)
  And regen affected files
* PATCH: [perl #134098] no locales + debugging = no compile (Karl Williamson, 2019-05-24, 1 file, -1/+1)
  The wrong #define was being tested for.
* locale.c: Fix '%s' directive argument is null (Karl Williamson, 2019-05-24, 1 file, -0/+1)
  This was just an oversight. The code doesn't get executed unless it's trying to panic.
* locale.c: Add some comments (Karl Williamson, 2019-05-24, 1 file, -4/+7)
* locale.c: remove unnecessary cast (Jerome Duval, 2019-05-24, 1 file, -3/+1)
  This was failing in gcc 2.95. The original commit added a cast, but we figured out that removing this other cast, which served no real purpose, makes this compiler work.
* s/safefree()/Safefree() in a few places (David Mitchell, 2019-04-17, 1 file, -2/+2)
  Karl pointed out that a couple of my recent commits used (lower case) safefree() rather than Safefree(), the latter having extra debugging facilities.
* fix leak when $LANG unset (David Mitchell, 2019-04-16, 1 file, -11/+8)
  The following leaked:

      LANG= perl -e1

  because in S_emulate_setlocale(), it was

    1) making a copy of $ENV{"LANG"};
    2) throwing that copy away and replacing it with "C" when it discovered that the string was empty.

  A little judicious reordering of that chunk of code makes the issue go away.

  Showed up as failures of lib/locale_threads.t under valgrind / ASan.
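  The reordering can be sketched in C as "decide on the fallback first, then copy once". `default_locale_name()` is a hypothetical helper, not the body of S_emulate_setlocale(); the caller frees the result.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Choose the name before copying, so no copy is ever made and then
   discarded: exactly one allocation happens on every path. */
char *default_locale_name(void)
{
    const char *lang = getenv("LANG");
    if (lang == NULL || *lang == '\0')
        lang = "C";            /* unset or empty $LANG means "C" */
    return strdup(lang);       /* caller must free() */
}
```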
* fix locale leaks on utf8 strings (David Mitchell, 2019-04-16, 1 file, -0/+2)
  For example the following leaked:

      require POSIX;
      import POSIX ':locale_h';
      setlocale(&POSIX::LC_ALL, 'aa_DJ.iso88591') or die;
      use locale;
      my $ok = 'A' lt chr 0x100;

  Some code in Perl__mem_collxfrm() does a couple of

      for (j = 1; j < 256; j++) { ... }

  loops where for each chr(j) character it recursively calls itself, and records the index of the 'smallest' / 'largest' result. However, when updating cur_min_x / cur_max_x, it wasn't freeing the previous value.

  The symptoms were that valgrind / Address Sanitizer found fault with lib/locale.t.
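  The leak pattern is easier to see in a standalone loop. This C sketch uses plain `strxfrm()` in place of perl's recursive Perl__mem_collxfrm() call, and tracks only the minimum (the maximum is symmetric). `xfrm_char()` and `lowest_collating_byte()` are hypothetical names. Each transformed candidate is either kept, in which case the previous minimum is freed, or freed immediately.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: malloc'd strxfrm() of the 1-byte string {c}. */
static char *xfrm_char(char c)
{
    char src[2] = { c, '\0' };
    size_t len = strxfrm(NULL, src, 0);   /* size query; n == 0 */
    char *buf = malloc(len + 1);
    if (buf != NULL)
        strxfrm(buf, src, len + 1);
    return buf;
}

/* Return the byte 1..255 whose transform sorts lowest. */
int lowest_collating_byte(void)
{
    char *cur_min_x = NULL;
    int cur_min_j = -1;
    for (int j = 1; j < 256; j++) {
        char *x = xfrm_char((char) j);
        if (x == NULL)
            continue;
        if (cur_min_x == NULL || strcmp(x, cur_min_x) < 0) {
            free(cur_min_x);   /* release the old minimum: the missing step */
            cur_min_x = x;
            cur_min_j = j;
        } else {
            free(x);           /* not a new minimum: discard it */
        }
    }
    free(cur_min_x);
    return cur_min_j;
}
```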
* fix locale.c under -DPERL_GLOBAL_STRUCT_PRIVATE (David Mitchell, 2019-04-02, 1 file, -0/+1)
* perlapi: Add weasel word to make stmt accurate (Karl Williamson, 2019-03-27, 1 file, -1/+1)
  It is possible to have a single-threaded build use the thread-safe locale setting operations. Add a word to indicate it's not 100% the other way.
* PATCH: [perl #133959] FreeBSD broken tests (Karl Williamson, 2019-03-27, 1 file, -1/+1)
  Commit 70bd6bc82ba64c1d197d3ec823f43c4a454b2920 fixed a leak (likely due to a bug in glibc) by not duplicating the C locale object. However, that meant there's only one copy running around, and freeing it will cause havoc, as it's supposed to be there until destruction. What appears to be happening is that the current locale object is freed upon thread destruction, and that could be this global one. But I don't understand why it's only happening on FreeBSD and only on this version. Still, this commit fixes the problem there, and makes sense: simply don't free this global object upon thread destruction.

  This commit also changes it so it doesn't get destroyed at destruction time, leaving it to the final PERL_SYS_TERM to free. I'm not sure, but I think this fixes any issues with embedded perls.
* locale.c: White-space, comment only (Karl Williamson, 2019-03-21, 1 file, -45/+60)
  Indent a block newly formed in the previous commit. Wrap some too-long lines.
* locale.c: Don't try to recreate the LC_ALL C locale (Karl Williamson, 2019-03-21, 1 file, -0/+23)
  On threaded perls, we create a locale object for LC_ALL "C" early in the startup phase. When the user asks for that locale, we can just switch to it instead of trying to create a new one.

  Doing the creation worked, but ended up with a memory leak. My guess, and it's only a guess, is that it's a bug in glibc newlocale.c, in which it does an early return, not doing proper cleanup, when it discovers it can re-use an existing locale without needing to create a new one.

  The reason I think it's a glibc bug is that the sample one-liner sent to me

      PERL_DESTRUCT_LEVEL=2 valgrind --leak-check=full ./perl -DLv -Ilib -e'require POSIX;POSIX::setlocale(&POSIX::LC_ALL, "C");' 2>&1 | more

  produced a stack output of where the leaked memory had been allocated. I put a print immediately after that line, and prints at the points where things get freed. Every allocation was matched by an attempt to free it. But clearly at least one failed. freelocale() returns void, so it can't be checked for failure.

  Anyway, it's better not to create a new locale when we already have an existing one, and doing so, as this commit does, causes the leak to go away.

  No tests are added, as there are plenty of similar tests already in the suite, and they all should have been leaking.
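  A minimal C sketch of the idea in this commit and in the FreeBSD fix above, assuming POSIX 2008 locale objects (`newlocale`, `uselocale`, `freelocale`). It creates the "C" object once, switches to it instead of calling `newlocale()` again, and never frees it on thread exit. All function and variable names are hypothetical, and the sketch does not track objects created for other locale names.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <string.h>

/* Hypothetical process-wide "C" locale object: created once at
   startup and freed only at final teardown, never at thread exit. */
static locale_t global_C_locale = (locale_t) 0;

int init_C_locale(void)
{
    global_C_locale = newlocale(LC_ALL_MASK, "C", (locale_t) 0);
    return global_C_locale != (locale_t) 0;
}

/* Make `name` the calling thread's locale for all categories.  For
   "C" and its synonym "POSIX", reuse the cached object rather than
   asking newlocale() for a fresh one.  Returns the object now in
   use, or (locale_t)0 on failure. */
locale_t switch_to_locale(const char *name)
{
    locale_t obj;
    if (strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0)
        obj = global_C_locale;
    else
        obj = newlocale(LC_ALL_MASK, name, (locale_t) 0);
    if (obj != (locale_t) 0)
        uselocale(obj);
    return obj;
}

/* Thread-exit cleanup: detach first (freeing a locale still in use
   is undefined), and never free the shared object here. */
void release_thread_locale(locale_t obj)
{
    uselocale(LC_GLOBAL_LOCALE);
    if (obj != (locale_t) 0 && obj != global_C_locale)
        freelocale(obj);
}
```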
* Add, improve some debugging stmts for -DL (locales) (Karl Williamson, 2019-03-21, 1 file, -1/+5)
* Properly handle systems with crippled locales (Karl Williamson, 2019-03-04, 1 file, -2/+5)
  Some systems fake their locales: they pretend to accept a locale change, but either do nothing, making everything the C locale, or, on some systems, switch to a second locale, "C-UTF-8". Configure probes have been added to find such systems, and this commit changes to use the results of these probes, so that we don't try looking for other locales (any names we came up with would be accepted as valid but wouldn't work, and tests were failing as a result).

  Anything running the musl library fits, as does OpenBSD and its kin, as they view locales as security risks.

  This commit allows us to take out some code that was looking for particular OSes.
* locale.c: Tighten turkish locale tests on C99 platforms (Karl Williamson, 2019-03-04, 1 file, -0/+12)
  C99 has wide-character case-changing functions. If those are available, use them to be more certain we have a Turkic locale.
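  A sketch of such a check in C, using the C99 `towupper()`/`towlower()` functions. `current_locale_is_turkic()` is a hypothetical name, and the sketch assumes `wchar_t` values are Unicode code points, as on glibc. In a Turkic locale, 'i' (U+0069) uppercases to U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE, and 'I' (U+0049) lowercases to U+0131 LATIN SMALL LETTER DOTLESS I.

```c
#include <wchar.h>
#include <wctype.h>

/* Nonzero if the current locale applies the Turkic i/I case mappings.
   Testing the mappings themselves is stricter than matching locale
   names such as "tr_TR" or "az_AZ". */
int current_locale_is_turkic(void)
{
    return towupper(L'i') == (wint_t) 0x130
        && towlower(L'I') == (wint_t) 0x131;
}
```

  In the default "C" locale both mappings are plain ASCII, so the function returns 0.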
* locale.c: Fix grammar in comment (Karl Williamson, 2019-03-04, 1 file, -1/+1)
* PERL_GLOBAL_STRUCT_PRIVATE: fix some const strings (David Mitchell, 2019-02-19, 1 file, -1/+1)
  Change a couple of

      const char * foo[] = { ... }

  to

      const char * const foo[] = { ... }

  Making the string ptrs const means the whole thing is RO and doesn't appear in the data section, making porting/libperl.t happier when building under -DPERL_GLOBAL_STRUCT_PRIVATE.
* add dVAR's for PERL_GLOBAL_STRUCT_PRIVATE builds (David Mitchell, 2019-02-19, 1 file, -0/+2)
  The perl build option -DPERL_GLOBAL_STRUCT_PRIVATE had bit-rotted due to lack of smoking. The main fix is to just add 'dVAR;' to any functions which have a pTHX arg. It's a NOOP on normal builds.
* locale.c: Fix compilation error (Karl Williamson, 2019-02-06, 1 file, -4/+3)
  This code would fail to compile if Configure had ccflags=-DNO_LOCALE.
* locale.c: Add detection of Turkic UTF-8 locales (Karl Williamson, 2019-02-05, 1 file, -1/+24)
  When switching into a new locale, after it is decided this is a UTF-8 locale, the code now also checks whether the locale is a specialized Turkic one, which has a couple of slightly modified case-changing rules. If so, it sets a flag indicating this. The code added in previous commits in this series checks whether that flag is set when it is actually paying attention to the background locale, and if so, behaves according to Unicode Turkic rules.
* Add variable for if the current UTF-8 locale is Turkic (Karl Williamson, 2019-02-05, 1 file, -0/+2)
  It is currently always set to false, until later in this series of commits.
* locale.c: Failure to build if not allowing LC_COLLATE (Karl Williamson, 2018-11-29, 1 file, -1/+1)
  This is part of [perl #133696]. A typo was causing a macro to be defined in terms of itself, hence an illegal recursive definition.
* locale.c: Don't use numeric unless LC_NUMERIC (Karl Williamson, 2018-11-29, 1 file, -0/+4)
  This commit #ifdef's a usage of a variable that isn't valid unless the system has LC_NUMERIC.
* locale.c: Fix wrong scope of #if's (Karl Williamson, 2018-11-29, 1 file, -2/+5)
  The function print_bytes_for_locale() should be defined if DEBUGGING; prior to this commit it didn't get defined unless LC_COLLATE was defined on the platform.
* Rename local variable to prevent confusion with global (James E Keenan, 2018-11-26, 1 file, -3/+3)
  Per: https://lgtm.com/projects/g/Perl/perl5/alerts/?mode=tree&ruleFocus=2157860312

  For: RT # 133686 (partial)
* use a buffer for is_cur_LC_category_utf8 (Nicolas R, 2018-09-27, 1 file, -5/+17)
  avoid malloc/free when possible
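  The "use a buffer" pattern can be sketched in C as follows. This is an illustration, not the code from the commit. `copy_into_buffer()` and `release_copy()` are hypothetical helpers: they copy into a caller-supplied stack buffer when the string fits and fall back to the heap only when it doesn't.

```c
#include <stdlib.h>
#include <string.h>

/* Copy src into buf when it fits (no allocation at all); otherwise
   malloc() a copy.  Returns NULL only if the heap fallback fails. */
char *copy_into_buffer(const char *src, char *buf, size_t bufsize)
{
    size_t len = strlen(src) + 1;          /* include the NUL */
    char *dest = (len <= bufsize) ? buf : malloc(len);
    if (dest != NULL)
        memcpy(dest, src, len);
    return dest;
}

/* Free only what copy_into_buffer() actually allocated. */
void release_copy(char *p, const char *buf)
{
    if (p != buf)
        free(p);
}
```

  Most locale names are short, so with a modestly sized buffer the common path never touches the allocator.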
* locale.c: Fix conditional compilation (Francois Perrad, 2018-07-16, 1 file, -2/+3)
  With Perl 5.28.0, there are some mismatches between blocks and conditional compilation in the Perl__is_cur_LC_category_utf8() function.

  The compilation of miniperl could fail like this:

  ```
  locale.c: In function `Perl__is_cur_LC_category_utf8`:
  locale.c:5481:1: error: expected declaration or statement at end of input
   }
   ^
  ```

  Signed-off-by: Francois Perrad <francois.perrad@gadz.org>
* Fix to compile under -DNO_LOCALE (Karl Williamson, 2018-07-01, 1 file, -20/+53)
  Several problems with this compile option were not caught before 5.28 was frozen.
* locale.c: Reorder some debugging statements (Karl Williamson, 2018-04-18, 1 file, -12/+14)
  I found these confusing while trying to debug a field problem. This reorders them, adjusting the wording slightly to compensate.
* locale.c: Really silence compiler warning (Karl Williamson, 2018-03-21, 1 file, -1/+1)
  Commit 32a62865ef662fce2b2250a7e0eca15861e7fe20 did not work, as gcc doesn't recognize a void cast as handling a return value. This should hopefully work, though we discard the value before looking at it, which could cause another warning.
* locale.c: Add detail to debugging statement (Karl Williamson, 2018-03-19, 1 file, -1/+1)
  So that it is easier to debug memory leaks.
* locale.c: Fix memory leak (Karl Williamson, 2018-03-19, 1 file, -50/+56)
  This was caused by doing some initialization work out-of-order. This commit just moves some code to later in the function, revising some comments to make sense after the move.
* locale.c: Clarify warning message (Karl Williamson, 2018-03-16, 1 file, -4/+8)
  When there are discrepancies between the locale and what Perl is expecting, a warning is raised listing the problematic characters. \n and \t should have been displayed as mnemonics, but a required backslash to escape them had been omitted, so they were displayed literally and looked just like white space.

  Also, put any displayed blank in ' ' so it won't look like the list is empty.
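  The display rule can be sketched in C like this. `append_display_char()` is a hypothetical helper, not perl's warning code. It appends `\n` and `\t` as two-character escapes, a blank as `' '`, and anything else as-is, truncating safely if the output buffer is full.

```c
#include <stdio.h>
#include <string.h>

/* Append a printable rendering of c to the NUL-terminated string in
   out, whose total capacity is outsize bytes. */
void append_display_char(char *out, size_t outsize, char c)
{
    size_t used = strlen(out);
    char single[2] = { c, '\0' };
    const char *rep;
    switch (c) {
    case '\n': rep = "\\n"; break;   /* backslash must be escaped in C */
    case '\t': rep = "\\t"; break;
    case ' ':  rep = "' '"; break;   /* keep a lone blank visible */
    default:   rep = single; break;
    }
    snprintf(out + used, outsize - used, "%s", rep);
}
```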
* locale.c: Handle an edge case (Karl Williamson, 2018-03-16, 1 file, -1/+11)
  setlocale(LC_ALL, "LC_foo=bar; LC_baz=gah") is legal. Any categories omitted in the string are set to "C". Prior to this commit the omitted categories were left unchanged.
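  To make the rule concrete, here is a C sketch of parsing such a string, with every category the string omits defaulting to "C". The category table, the buffer size `LOCALE_NAME_BUF`, and `parse_lc_all()` are all hypothetical; a real implementation would use the platform's own LC_* constants and list syntax.

```c
#include <string.h>

/* Hypothetical category table for illustration. */
static const char * const category_names[] = {
    "LC_CTYPE", "LC_NUMERIC", "LC_COLLATE",
    "LC_MONETARY", "LC_TIME", "LC_MESSAGES"
};
#define N_CATEGORIES (sizeof category_names / sizeof category_names[0])
#define LOCALE_NAME_BUF 64

/* Parse "LC_x=val; LC_y=val" into values[].  Categories not named get
   "C".  Returns 0 on success, -1 on a malformed or unknown entry. */
int parse_lc_all(const char *spec, char values[][LOCALE_NAME_BUF])
{
    for (size_t i = 0; i < N_CATEGORIES; i++)
        strcpy(values[i], "C");            /* default for omitted ones */

    const char *p = spec;
    while (*p != '\0') {
        while (*p == ' ' || *p == ';')     /* skip separators */
            p++;
        if (*p == '\0')
            break;
        const char *eq = strchr(p, '=');
        if (eq == NULL)
            return -1;
        const char *end = strchr(eq, ';');
        if (end == NULL)
            end = eq + strlen(eq);
        size_t i;                          /* find the category by name */
        for (i = 0; i < N_CATEGORIES; i++) {
            size_t n = strlen(category_names[i]);
            if ((size_t) (eq - p) == n && strncmp(p, category_names[i], n) == 0)
                break;
        }
        size_t vlen = (size_t) (end - eq - 1);
        if (i == N_CATEGORIES || vlen >= LOCALE_NAME_BUF)
            return -1;
        memcpy(values[i], eq + 1, vlen);
        values[i][vlen] = '\0';
        p = end;
    }
    return 0;
}
```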
* locale.c: Silence Win32 compiler warning (Karl Williamson, 2018-03-14, 1 file, -2/+2)
  The return value is discarded here, and a few lines down this function is called again, retaining its return value.
* locale.c: Add savepv() to setlocale() returns (Karl Williamson, 2018-03-13, 1 file, -7/+14)
  The next call to setlocale can overwrite the returned value from the current call, depending on platform. Therefore, one should save the results. I forgot this in commit 39e69e777b8. Now fixing it.

  I also audited locale.c to find any other instances. There were several where setlocale() is called without saving, and that return is passed to a function. It may work now, but it's dangerous to rely on the function not getting changed in such a way as to do its own setlocale, expecting the input parameter to be unchanged. So save the returns from these as well, as a precaution.
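  In plain C, the equivalent of wrapping the result in savepv() is copying the string before anything else can call setlocale(). `saved_setlocale()` is a hypothetical helper; the caller frees the copy.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* setlocale() may return a pointer to storage that the next call
   overwrites, so copy it immediately.  Returns NULL on failure. */
char *saved_setlocale(int category, const char *locale)
{
    const char *r = setlocale(category, locale);
    return r != NULL ? strdup(r) : NULL;
}
```

  A typical use is saving the current locale with a NULL query, changing it, and later restoring from the saved copy, which stays valid no matter how many setlocale() calls happen in between.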
* perlapi/Perl_setlocale: Clarify (Karl Williamson, 2018-03-12, 1 file, -3/+3)
* Fix comments/pod for LC_NUMERIC not always C (Karl Williamson, 2018-03-12, 1 file, -12/+18)
  In recent Perl versions, the underlying locale for LC_NUMERIC has been kept in C because XS code is expecting a dot radix character. But if the LC_NUMERIC locale has a dot, that is unnecessary. (There is also the thousands grouping separator, which for safety we verify is empty.) Thus 5.27 doesn't always keep the underlying locale in C; it does so only if necessary. This commit updates various comments and pods to reflect this change.
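  The "only if necessary" test can be sketched with `localeconv()`. `numeric_locale_is_C_like()` is a hypothetical name, and because `localeconv()` is not thread-safe, a real implementation would need to call it under a lock.

```c
#include <locale.h>
#include <string.h>

/* Nonzero when the current LC_NUMERIC already behaves like "C" for
   XS code: the radix is "." and there is no thousands separator, so
   the underlying locale need not be forced to "C". */
int numeric_locale_is_C_like(void)
{
    const struct lconv *lc = localeconv();
    return strcmp(lc->decimal_point, ".") == 0
        && lc->thousands_sep[0] == '\0';
}
```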
* Don't use duplocale() unless it is present (Karl Williamson, 2018-03-12, 1 file, -2/+5)
  Prior to this patch, the code assumed that if you have the other, more significant, POSIX 2008 functions available, duplocale was present and correctly functioning too. However, we found that there have been bugs in it, so a hints file or Configure probe might want to exclude just it.