path: root/locale.c
Commit message (Author, Age, Files, Lines)
* Corrections to spelling and grammatical errors. (Lajos Veres, 2015-01-28, 1 file, -1/+1)
  Extracted from patch submitted by Lajos Veres in RT #123693.
* locale.c: Add comment; move #if (Karl Williamson, 2015-01-13, 1 file, -3/+6)
  A better comment is added. The #if is moved so that the rare compilation that doesn't use LC_CTYPE generates no unused-variable warning.
* Move unlikely executed macro to function (Karl Williamson, 2015-01-13, 1 file, -0/+23)
  The bulk of this macro is extremely rarely executed, so it makes sense to optimize for space, as it is called from a fair number of places, and move as much as possible to a single function. For whatever it's worth, on my system with my typical compilation options, including -O0, the savings were 19640 bytes in regexec.o and 4528 in utf8.o, at a cost of 1488 in locale.o.
* locale.c: Fix memory leak. (Karl Williamson, 2015-01-13, 1 file, -0/+1)
  I spotted this in code review. I didn't add a test for it, because to expose the much more serious bug fixed by the previous commit, I had to temporarily change the C code to force these extremely unlikely-to-be-taken branches to execute.
* Fix breakage of 780fcc9 (Karl Williamson, 2014-12-29, 1 file, -2/+7)
  I got confused in writing this: the global needs to be cleared always, and set to NULL.
* Don't raise 'poorly supported' locale warning unnecessarily (Karl Williamson, 2014-12-29, 1 file, -11/+24)
  Commit 8c6180a91de91a1194f427fc639694f43a903a78 added a warning message for when Perl determines that the underlying locale the program has just switched into is poorly supported. At the time it was thought that this would be an extremely rare occurrence. However, a bug in HP-UX - B.11.00/64 causes this message to be raised for the "C" locale. A workaround was done that silenced those. However, before it got fixed, this message would occur gobs of times executing the test suite. It was raised even if the script is not locale-aware, so that the underlying locale was completely irrelevant. There is a good prospect that someone using an older Asian locale as their default would get this message inappropriately, even if they don't use locales, or switch to a supported one before using them.

  This commit causes the message to be raised only if it actually is relevant. When not in the scope of 'use locale', the message is stored, not raised. Upon the first locale-dependent operation within a bad locale, the saved message is raised, and the storage cleared. I was able to do this without adding extra branching to the main-line non-locale execution code. This was done by adding regnodes which get jumped to by switch statements, and refactoring some existing C tests so they exclude non-locale right off the bat. These changes would have been necessary for another locale warning that I previously agreed to implement, and which is coming a few commits from now.

  I do not know of any way to add tests in the test suite for this. It is in fact rare for modern locales to have these issues. The way I tested this was to temporarily change the C code so that all locales are viewed as defective, and manually note that the warnings came out where expected, and only where expected.

  I chose not to try to output this warning on any POSIX functions called. I believe that all that are affected are deprecated or scheduled to be deprecated anyway. And POSIX is closer to the hardware of the machine. For convenience, I also don't output the message for some zero-length pattern matches. If something is going to be matched, the message will likely very soon be raised anyway.
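The store-then-raise behaviour described in that commit can be pictured with a small sketch. This is not the code in locale.c; the global `pending_locale_warning` and both function names are hypothetical, and the warning is simply written to stderr rather than going through Perl's warning machinery.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical storage for a warning that has not been raised yet. */
char *pending_locale_warning = NULL;

/* Called when a poorly supported locale is entered.  Inside the scope of
 * 'use locale' the warning is raised at once; otherwise it is only saved. */
void note_bad_locale(const char *msg, int in_use_locale_scope)
{
    /* Always clear any earlier message first. */
    free(pending_locale_warning);
    pending_locale_warning = NULL;

    if (in_use_locale_scope) {
        fprintf(stderr, "%s\n", msg);
        return;
    }
    pending_locale_warning = malloc(strlen(msg) + 1);
    if (pending_locale_warning != NULL)
        strcpy(pending_locale_warning, msg);
}

/* Called at the first locale-dependent operation.  Raises the saved
 * warning, clears the storage, and returns 1; returns 0 if nothing
 * was pending. */
int flush_locale_warning(void)
{
    if (pending_locale_warning == NULL)
        return 0;
    fprintf(stderr, "%s\n", pending_locale_warning);
    free(pending_locale_warning);
    pending_locale_warning = NULL;
    return 1;
}
```

Code that never performs a locale-dependent operation never calls `flush_locale_warning()`, so in this sketch it pays nothing beyond the one-time save.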
* Stop erroneous warnings for C locale (Karl Williamson, 2014-12-11, 1 file, -1/+10)
  HP-UX - B.11.00/64 has a problem with the C locale that's only noticeable from newly added warnings flooding the logs. This adds a test to suppress them.
* Change core to use is_invariant_string() (Karl Williamson, 2014-11-26, 1 file, -5/+5)
  is_ascii_string's name has misled me in the past; the new name is clearer.
* locale.c: Account for setlocale using static storage (Karl Williamson, 2014-11-19, 1 file, -2/+9)
  Some systems' setlocale() implementations use static storage for the locale name they return, so that a subsequent setlocale() call overwrites it. Therefore, you must make a copy of the name if you want it to remain valid after the next setlocale().

  Thanks to Craig Berry for finding and diagnosing this problem.
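The copy-before-reuse rule from this commit can be sketched as follows. The helper names `copy_current_locale` and `restore_locale` are made up for this illustration; Perl itself uses its own allocation routines (such as savepv()) rather than malloc().

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap copy of the current locale name for
 * `category`, or NULL on failure.  The pointer returned by setlocale() may
 * point at static storage, so it is copied before any further call. */
char *copy_current_locale(int category)
{
    const char *name = setlocale(category, NULL);   /* query only */
    char *copy;

    if (name == NULL)
        return NULL;
    copy = malloc(strlen(name) + 1);
    if (copy != NULL)
        strcpy(copy, name);
    return copy;
}

/* Hypothetical helper: switch back to a previously saved locale name and
 * free the copy.  Returns 1 on success, 0 on failure. */
int restore_locale(int category, char *saved)
{
    int ok = saved != NULL && setlocale(category, saved) != NULL;

    free(saved);
    return ok;
}
```

A caller would take the copy first, switch locales as often as needed, and hand the copy to `restore_locale()` at the end.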
* Reinstate "Raise warnings for poorly supported locales" (Karl Williamson, 2014-11-14, 1 file, -0/+82)
  This reverts commit 1244bd171b8d1fd4b6179e537f7b95c38bd8f099, thus reinstating commit 3d3a881c1b0eb9c855d257a2eea1f72666e30fbc.
* Revert "Raise warnings for poorly supported locales" (Karl Williamson, 2014-11-04, 1 file, -82/+0)
  This reverts commit 3d3a881c1b0eb9c855d257a2eea1f72666e30fbc.
  Win32 with a 1252 code page was failing blead. Revert until I have time to look at it.
* Raise warnings for poorly supported locales (Karl Williamson, 2014-11-04, 1 file, -0/+82)
  Perl only supports single-byte locales (except for UTF-8 ones), and has poor support for 7-bit locales that aren't supersets of ASCII (these should be exceedingly rare these days). This commit raises warnings in the new locale warning category when such a locale is entered.
* fix type incompatibilities between format strings/args (Lukas Mai, 2014-10-26, 1 file, -1/+1)
  Building a debugging perl triggered warnings such as
    warning: format ‘%d’ expects argument of type ‘int’, but argument 3 has type ‘U32’
    warning: field width specifier ‘*’ expects argument of type ‘int’, but argument 5 has type ‘long unsigned int’
    warning: format ‘%x’ expects argument of type ‘unsigned int’, but argument 3 has type ‘wchar_t’
* EBCDIC doesn't have real UTF-8 locales. (Karl Williamson, 2014-10-21, 1 file, -0/+3)
  At least on the system that we have tested on. There are locales that say they are UTF-8, but they're not; they're EBCDIC 1047.
* Add and use macros for case-insensitive comparison (Karl Williamson, 2014-08-22, 1 file, -2/+2)
  This adds to handy.h isALPHA_FOLD_EQ(c1,c2) which efficiently tests if c1 and c2 are the same character, case-insensitively. For example isALPHA_FOLD_EQ(c, 's') returns true if and only if <c> is 's' or 'S'. isALPHA_FOLD_NE() is also added by this commit.

  At least one of c1 and c2 must be known to be in [A-Za-z] or this macro doesn't work properly. (There is an assert for this in the macro in DEBUGGING builds). That is why the name includes "ALPHA", so you won't forget when using it.

  This functionality has been in regcomp.c for a while, under a different name. I had thought that the only reason to make it more generally available was potential speed gain, but recent gcc versions optimize to the same code, so I thought there wasn't any point to doing so.

  But I now think that using this makes things easier to read (and certainly shorter to type in). Once you grok what this macro does, it simplifies what you have to keep in your mind when reading logical expressions with multiple operands. That something can be either upper or lower case can be a distraction to understanding the larger point of the expression.
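To see why the "at least one operand is a letter" precondition matters, here is a minimal ASCII-only sketch of such a comparison. It is an illustration, not the definition in handy.h: the names `alpha_fold_eq` and `IS_ASCII_ALPHA` are invented here, and EBCDIC platforms are ignored.

```c
#include <assert.h>

/* Hypothetical helper, ASCII only: true if c is in [A-Za-z].  Setting bit
 * 0x20 maps an upper-case letter onto its lower-case partner. */
#define IS_ASCII_ALPHA(c) \
    ((((c) | 0x20) >= 'a') && (((c) | 0x20) <= 'z'))

/* Case-insensitive equality for ASCII letters.  'A' ^ 'a' is 0x20, the one
 * bit that separates the two cases, so masking it off folds case. */
int alpha_fold_eq(int c1, int c2)
{
    /* Without this precondition, non-letters that differ only in bit 0x20
     * (for example '@' and '`') would wrongly compare equal. */
    assert(IS_ASCII_ALPHA(c1) || IS_ASCII_ALPHA(c2));
    return (c1 & ~('A' ^ 'a')) == (c2 & ~('A' ^ 'a'));
}
```

With this, a test such as `c == 's' || c == 'S'` collapses to `alpha_fold_eq(c, 's')`, which is the readability gain the commit message describes.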
* Add sync_locale() (Karl Williamson, 2014-08-14, 1 file, -0/+35)
  This trivial function is to be used by XS code when it changes the program's locale. It hides the details from that code of what needs to be done, which could change in the future.
* locale.c: Clarify comment (Karl Williamson, 2014-08-13, 1 file, -1/+1)
* locale.c: Use PERL_UNUSED_RESULT (Karl Williamson, 2014-08-13, 1 file, -3/+1)
  The previous way to suppress messages wasn't working for all gcc versions. Spotted by Jarkko Hietaniemi.
* Use grok_atou instead of atoi. (Jarkko Hietaniemi, 2014-07-22, 1 file, -1/+2)
  Remaining atoi() uses include at least: ext/DynaLoader/dl_aix.xs, os2/os2.c, vms/vms.c
* locale.c: Improve some comments (Karl Williamson, 2014-07-12, 1 file, -3/+6)
* locale.c: Fix some unused code for potential future use (Karl Williamson, 2014-07-12, 1 file, -33/+46)
  This code extends the heuristics used to determine if a locale is UTF-8 or not on older platforms. It has been #ifdef'd out because it only added a little value on dromedary. Now the previous commit has added new heuristics, and tests on dromedary show that this adds nothing to that. But I'm leaving it in the source in case it might ever prove useful. In order to test it, I compiled it and found some problems with the earlier version that this now fixes.
* locale.c: Add new heuristic for finding if locale is UTF-8 (Karl Williamson, 2014-07-12, 1 file, -1/+93)
  On older platforms that conform to neither POSIX 2001 nor C99, heuristics are employed to try to determine if a locale is UTF-8 or not. This commit improves those heuristics by looking at names of the months and days of the week to see if they are UTF-8 or not. This is done if looking at the currency symbol failed to help.
* locale.c: White-space only (Karl Williamson, 2014-07-12, 1 file, -12/+12)
  Indent and outdent blocks of code to conform to newly formed or removed braces
* locale.c: Refactor UTF8ness of currency symbol code (Karl Williamson, 2014-07-12, 1 file, -26/+24)
  On older platforms that are neither C99 nor POSIX 2001, locale.c uses the currency symbol to try to see if a locale is UTF-8 or not. This commit refactors it somewhat to make it cleaner, which also fixes several problems. The least of these was that it sometimes did a setlocale() unnecessarily. Others are that in some circumstances it called localeconv() and/or looked at the result while within the wrong locale.
* locale.c: Use ptr's value before freeing it, not after (Karl Williamson, 2014-07-12, 1 file, -1/+1)
  This only affected runs with the -DL parameter to perl set.
* locale.c: Use safer code practice (Karl Williamson, 2014-07-12, 1 file, -5/+6)
  The interior-most function can return NULL. Currently savepv() which is the next outer function handles this correctly, as does the next outer function, but it is dangerous to rely on that behavior. So we test for NULL before calling functions on a NULL ptr.
* locale.c: Skip compiling fallback code on modern platforms (Karl Williamson, 2014-07-12, 1 file, -1/+5)
  In the function that determines if a POSIX locale is UTF-8 or not, if either nl_langinfo or MB_CUR_MAX is defined, it can reliably determine the answer. If neither is defined, it uses heuristics to figure things out as best it can. This code doesn't add value for those platforms where one of the two symbols is defined, so it can just be ifdef'd out.
* locale.c: name should be last resort when deciding if locale is utf8 (Karl Williamson, 2014-07-12, 1 file, -73/+78)
  Checking whether the currency symbol is UTF-8 should come ahead of looking at the locale name.
* locale.c: Prepare for rearrangement of code blocks (Karl Williamson, 2014-07-12, 1 file, -7/+6)
  This section of code generally just returned. This commit changes it so that it drops off the end if it can't determine if the current locale is UTF-8 or not, so that additional tests can be added later. The function defaults to not UTF-8 if this drops off the end, so there should be no functionality change.
* locale.c: Fix misplaced parenthesis (Karl Williamson, 2014-07-10, 1 file, -2/+2)
  Commit a39edc4c877304d4075679b1d8de1904671a9c37 got a parenthesis misplaced so it wasn't really looking at the next character, like it was supposed to be doing
* locale.c: White-space only (Karl Williamson, 2014-07-09, 1 file, -8/+9)
  Outdent because the previous commit removed the enclosing block.
* locale.c: Remove conditionals. (Karl Williamson, 2014-07-09, 1 file, -11/+10)
  These two functions are supposed to normally be called through macro interfaces which check whether they actually should be called or not. That means the conditionals removed by this commit are redundant from the normal interface. Removing them allows the exceptional case, where the code should be executed unconditionally, to be handled by calling the functions directly instead of through the macro interface.
* locale.c: Keep better track of C/non-C locale (Karl Williamson, 2014-07-09, 1 file, -2/+2)
  Perl uses three interpreter-level (but private) variables to keep track of numeric locales. PL_numeric_name is the current underlying locale. PL_standard is a boolean to indicate if we are switched to the C (or POSIX) locale, and PL_local is a boolean to indicate if we are switched to the underlying one. The reason there are two booleans is that if the underlying locale is C, both can be true at the same time. But the code that is being changed by this commit didn't realize this, and could unnecessarily set the booleans to FALSE. This could cause unnecessary switching of locales.
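The point about the two booleans can be made concrete with a toy model. Everything here is hypothetical (the struct, its field names, and the two functions); it only mirrors the rule that when the underlying locale is itself C or POSIX, being "in standard" and being "in local" are the same state.

```c
#include <string.h>

/* Hypothetical stand-in for the three interpreter variables. */
struct numeric_state {
    const char *underlying_name;  /* name of the underlying LC_NUMERIC locale */
    int in_standard;              /* currently behaving as the C/POSIX locale */
    int in_local;                 /* currently behaving as the underlying locale */
};

static int name_is_c(const char *name)
{
    return strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0;
}

/* Record a switch to the C locale.  The 'local' flag is cleared only if
 * the underlying locale really differs from C. */
void note_switch_to_standard(struct numeric_state *st)
{
    st->in_standard = 1;
    st->in_local = name_is_c(st->underlying_name);
}

/* Record a switch to the underlying locale, with the mirror-image rule. */
void note_switch_to_local(struct numeric_state *st)
{
    st->in_local = 1;
    st->in_standard = name_is_c(st->underlying_name);
}
```

Blindly clearing the other flag on each switch would make a later check conclude that a setlocale() call is needed when it is not, which is the unnecessary switching the commit removes.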
* locale.c: Make a common idiom into a macro (Karl Williamson, 2014-07-09, 1 file, -10/+18)
* Avoid unused warnings from locale-less systems. (Jarkko Hietaniemi, 2014-07-03, 1 file, -0/+6)
  From Brian Fraser: "Technically, any Perl compiled with -Accflags="-UUSE_LOCALE", or -Ui_locale -Ud_setlocale... realistically, for Android".
* Remove or downgrade unnecessary dVAR. (Jarkko Hietaniemi, 2014-06-25, 1 file, -11/+0)
  You need to configure with g++ *and* -Accflags=-DPERL_GLOBAL_STRUCT or -Accflags=-DPERL_GLOBAL_STRUCT_PRIVATE to see any difference. (g++ does not do the "post-annotation" form of "unused".)
  The version code has some of these issues, reported upstream.
* Put back an #if-0-ed chunk 7053d92 removed. (Jarkko Hietaniemi, 2014-06-13, 1 file, -0/+75)
  The chunk is not MAD-related but instead locale stuff. I have no idea why that chunk got removed (I used a combination of unifdef(1) and editor). It's #if-0-ed, so no change of behavior either way, but let's keep the code for now, since it seems to have "historical significance".
* Remove MAD. (Jarkko Hietaniemi, 2014-06-13, 1 file, -75/+0)
  MAD = Misc Attribute Decoration; unmaintained attempt at preserving the Perl parse tree more faithfully so that automatic conversion to Perl 6 would have been easier.
* locale.c: Fix uncomplemented 'if' test (Karl Williamson, 2014-06-07, 1 file, -1/+1)
  Somehow the ! in this if () got dropped, and there were no tests to catch it. Now both are remedied.
* fix locale.c under -DPERL_GLOBAL_STRUCT (David Mitchell, 2014-06-07, 1 file, -0/+1)
* Use C locale for "$!" outside 'use locale' scope (Karl Williamson, 2014-06-05, 1 file, -0/+33)
  The stringification of $! has long been an outlier in Perl locale handling. The theory has been that these operating system messages are likely to be of use to the final user, and should be in their language. Things like
    No space left on device
    Can't fork
  are not something the program is likely to handle, but could be meaningfully helpful to the end-user.

  There are problems with this though. One is that many perl messages are in English, with the $! appended to them, so that the resultant message is of mixed language, and may need to be translated anyway. Things like
    No space left on device
  probably won't need the remaining portion of the message to give someone a clear indication as to what's wrong. But there are many other messages where both the OS error and the Perl error would be needed together to understand the problem. An on-line translation tool can be used to do this.

  Another problem is that it can lead to garbage coming out on the user's terminal when the program is not expecting UTF-8, but the underlying locale is UTF-8. This is what happens in Bug #112208, and another that was merged with it. It's a lot harder to translate mojibake via an online tool than English.

  This commit solves that by using the C locale for messages, except within the scope of 'use locale'. It is extremely likely that the messages in the C locale will be English, but if not they will be ASCII, and there will be no garbage printed. A program that says "use locale" is indicating that it has the intelligence necessary to deal with locales.
* Add parameters to "use locale" (Karl Williamson, 2014-06-05, 1 file, -0/+21)
  This commit allows one to enable locale-awareness for only a specified subset of the locale categories. Thus you could make a section of code LC_MESSAGES aware, with no locale-awareness for the other categories.
* Allow dynamic lock of LC_NUMERIC (Karl Williamson, 2014-06-05, 1 file, -3/+6)
  When processing version strings, the radix character must be a dot even if we otherwise would be using some other character. vutil.c upg_version() changes to the dot, but calls sv_catpvf() which may try to change the character to something else. This commit introduces a way to lock the character to a dot around the call to sv_catpvf().

  vutil.c is cpan-upstream, but already blead and cpan have diverged, so this just updates the SHA of the new version.
* Fix up LC_NUMERIC wrap macros (Karl Williamson, 2014-06-05, 1 file, -2/+2)
  perl.h has some macros used to manipulate the locale exposed for the category LC_NUMERIC. These are currently undocumented, but will need to be documented as the development of 5.21 progresses. This fixes these up in several ways:

  The tests for if we are in the correct state are made into macros. This is in preparation for the next commit, which will make one of them more complicated, and so that complication will only have to be in one place.

  The variable declared by them is renamed to be preceded by an underscore. It is dangerous practice to have a name used in a macro, as it could conflict with a name used by outside code. This alleviates it somewhat by making it even less likely to conflict. This will have to be revisited when some of these macros are made part of the public API.

  The tests to see if things need to change are reversed. Previously we said we need to change to standard, for example, if the variable for 'local' is set. But both can be true at the same time if the underlying locale is C. In this case, we only need to change if we aren't in standard. Whether that is also local is irrelevant.
* Keep LC_NUMERIC in C locale, except for brief periods (Karl Williamson, 2014-06-05, 1 file, -0/+6)
  This is for XS modules, so they don't have to worry about the radix being a non-dot. When the locale needs to be in the underlying one, the operation should be wrapped using macros for the purpose. That API may change as we gain experience in 5.21, so I'm not including it now.
* Set utf8 flag properly in localeconv (Karl Williamson, 2014-06-05, 1 file, -5/+5)
  It is rare, but not unheard of, for the strings returned by localeconv to be in UTF-8. This commit checks for that and sets the UTF-8 flag if they are so encoded. A private function had to be changed from static for this. It is renamed to begin with an underscore to emphasize its private nature.
* Use system default locale only if there is one. (Jarkko Hietaniemi, 2014-05-29, 1 file, -4/+13)
  (Currently, only Win32 has one.)
  [perl #121865]
  Fix for Coverity perl5 CID 28949: Logically dead code (DEADCODE)
  dead_error_line: Execution cannot reach this statement name = system_default_locale;
* Leaked string in failure path. (Jarkko Hietaniemi, 2014-05-28, 1 file, -0/+1)
  Fix for Coverity perl5 CID 29058: Resource leak (RESOURCE_LEAK)
  leaked_storage: Variable codeset going out of scope leaks the storage it points to.
  The savepv-ed codeset was not freed in the failure path. (The save_input_locale is freed just a few lines later.)
* g++ cleanups. (Jarkko Hietaniemi, 2014-05-28, 1 file, -1/+1)
  regcomp.c:11083: warning: suggest a space before ';' or explicit braces around empty body in 'for' statement
  locale.c:1113: warning: comparison between signed and unsigned integer expressions
* Fix for Coverity perl5 CID 45366: Use after free (USE_AFTER_FREE) (Jarkko Hietaniemi, 2014-04-29, 1 file, -2/+2)
  pass_freed_arg: Passing freed pointer save_input_locale as an argument to PerlIO_printf.
  Printfing save-pvs after freeing them.