path: root/locale.c
Commit message (Author, Age, Files, Lines)
* PATCH: [perl #132516] locale.c compiler warning (Karl Williamson, 2017-11-29, 1 file, -1/+1)
| This is a problem on Darwin due to a bug there. According to Tony Cook, the C99 standard requires MB_CUR_MAX to be an unsigned value, and it is on Linux. But Darwin declares it to be signed, even though the minimum value it can reach is +1. Maybe other systems have the same defect. But there is a simple fix: just cast it to unsigned.
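A minimal sketch of the kind of cast described above. The function name and the comparison are illustrative assumptions; the actual line changed in locale.c is not shown in this log.

```c
#include <assert.h>
#include <stdlib.h>   /* MB_CUR_MAX */

/* Hypothetical helper: can a buffer of buf_len bytes hold one multibyte
 * character in the current locale? */
static int can_hold_one_char(unsigned int buf_len)
{
    /* The commit message notes the minimum value is +1. */
    assert(MB_CUR_MAX >= 1);

    /* Casting avoids a signed/unsigned comparison warning on platforms
     * that declare MB_CUR_MAX as a signed quantity. */
    return buf_len >= (unsigned int) MB_CUR_MAX;
}
```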
* locale.c: Use computed length for U+10FFFF (Karl Williamson, 2017-11-18, 1 file, -1/+1)
| The previous commit calculated this and placed the result in a header file. This now uses the calculated value instead of a hard-coded "4", which is incorrect on EBCDIC platforms.
* locale.c: Use mnemonic (Karl Williamson, 2017-11-14, 1 file, -1/+1)
| Replace this number by an already existing mnemonic.
* locale.c: Simplify code in Perl_langinfo() (Karl Williamson, 2017-11-10, 1 file, -23/+7)
| Instead of a switch() statement we can use 'foo ? bar : baz;'
* locale.c: strerror_l() not fool proof (Karl Williamson, 2017-11-09, 1 file, -2/+39)
| Commit 7aaa36b196e5a478a3d1bd32506797db7cebf0b2 changed to use strerror_l() if available on the platform. But there is a potential bug with this on threaded perls. The code uses strerror_l() when it needs the answer on a locale that isn't necessarily the current one. But it uses plain strerror() when the locale is known to be the current one. Plain strerror() isn't necessarily thread-safe.
|
| However, on systems that have strerror_r(), reentr.h has caused our apparent call to plain strerror() to instead call the thread-safe strerror_r() under the hood. So there is no bug on unthreaded perls nor on ones that have strerror_r().
|
| This commit fixes the bug on threaded builds which have strerror_l() but not strerror_r(). It does this by using strerror_l() for everything, and constructing a locale object that is the current locale to use when the locale doesn't need to be changed. This is somewhat more work than the alternative above does, so that one is used if available.
|
| No changes are made to how it works on systems that don't have strerror_l().
|
| Some systems have deprecated strerror_r(). reentr.h does not use it on such systems. The reason for the deprecation, we would hope, may be that the plain strerror() is implemented thread-safely. We don't know that, so we just assume that the plain version is thread-unsafe.
|
| We do have tests that try to find races here, but they haven't shown any. It could be that systems that are advanced enough to have strerror_l() also have strerror_r().
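The idea of "constructing a locale object that is the current locale" can be sketched with the POSIX 2008 locale functions. This is only an illustration under stated assumptions, not the code in locale.c: the helper name and its buffer-based interface are hypothetical, and some platforms declare these functions in <xlocale.h> rather than <locale.h>.

```c
#ifndef _XOPEN_SOURCE
#  define _XOPEN_SOURCE 700   /* request POSIX 2008 declarations */
#endif
#include <assert.h>
#include <errno.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the message for errnum, in the calling
 * thread's current locale, into buf.  Returns 0 on success, -1 on error. */
static int strerror_current_locale(int errnum, char *buf, size_t buflen)
{
    locale_t cur;
    const char *msg;

    assert(buf != NULL && buflen > 0);

    /* uselocale(0) queries the current locale without changing it.  The
     * result may be LC_GLOBAL_LOCALE, which duplocale() accepts but which
     * must not be passed to strerror_l() directly, hence the copy. */
    cur = duplocale(uselocale((locale_t) 0));
    if (cur == (locale_t) 0) {
        return -1;
    }

    msg = strerror_l(errnum, cur);

    /* Copy before freeing the locale object, in case the returned string
     * does not outlive it. */
    snprintf(buf, buflen, "%s", msg ? msg : "");
    freelocale(cur);
    return 0;
}
```

Duplicating and freeing a locale object on every call is the "somewhat more work" trade-off the message mentions; a real implementation might cache the object.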
* locale.c: Move a #define to earlier in the file (Karl Williamson, 2017-11-09, 1 file, -14/+15)
| This is in prep for a future commit which needs it earlier.
* locale.c: Add #define's (Karl Williamson, 2017-11-09, 1 file, -30/+101)
| The previous commit added arrays of locale categories. This commit creates compile-time mappings from the category number to the index it has in the array. It also changes to use the #define for the index of LC_ALL in places it is expected to be defined. This causes bugs in this logic to be found at compile time on systems that don't have LC_ALL.
* locale.c: Remove many #if conditionals (Karl Williamson, 2017-11-09, 1 file, -384/+195)
| locale.c is full of compiler conditionals because platforms vary widely (or have in the past) in what categories they use. Prior to this commit, there were many sections of code which had copies of the same constructs which were #ifdef'd so they'd run only on the categories that are to be used in this build. This duplication creates the opportunity for changes to get applied to only some of the places that they should, and also makes it hard to read.
|
| This commit adds two parallel arrays that can map a category to/from its name, and are defined with each element conditionally compiled in based on the needs of the build. Doing the conditionals during array construction means that most of the other conditionals can be replaced by looping through the arrays. Thus the duplicated code is eliminated, as well as almost 200 lines in this file.
|
| Most of these loops get executed only at process initialization, so the slight performance hit is inconsequential.
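The parallel-array technique can be sketched as follows. The array and function names are hypothetical, and the set of categories is an assumption; the real arrays in locale.c are conditioned on Perl's own configuration macros rather than directly on the LC_* macros.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Each element is compiled in only if the platform defines the category,
 * so the conditionals live here and nowhere else. */
static const int categories[] = {
#ifdef LC_CTYPE
    LC_CTYPE,
#endif
#ifdef LC_NUMERIC
    LC_NUMERIC,
#endif
#ifdef LC_COLLATE
    LC_COLLATE,
#endif
#ifdef LC_TIME
    LC_TIME,
#endif
#ifdef LC_MESSAGES
    LC_MESSAGES,
#endif
#ifdef LC_MONETARY
    LC_MONETARY,
#endif
#ifdef LC_ALL
    LC_ALL,
#endif
};

/* Must list the same categories in the same order as the array above. */
static const char * const category_names[] = {
#ifdef LC_CTYPE
    "LC_CTYPE",
#endif
#ifdef LC_NUMERIC
    "LC_NUMERIC",
#endif
#ifdef LC_COLLATE
    "LC_COLLATE",
#endif
#ifdef LC_TIME
    "LC_TIME",
#endif
#ifdef LC_MESSAGES
    "LC_MESSAGES",
#endif
#ifdef LC_MONETARY
    "LC_MONETARY",
#endif
#ifdef LC_ALL
    "LC_ALL",
#endif
};

#define N_CATEGORIES (sizeof categories / sizeof categories[0])

/* Category number -> name, or NULL if the category is not in this build. */
static const char *category_name(int category)
{
    size_t i;
    for (i = 0; i < N_CATEGORIES; i++) {
        if (categories[i] == category) {
            return category_names[i];
        }
    }
    return NULL;
}

/* Name -> category number.  Returns 1 and sets *category on success. */
static int category_from_name(const char *name, int *category)
{
    size_t i;
    assert(name != NULL && category != NULL);
    for (i = 0; i < N_CATEGORIES; i++) {
        if (strcmp(category_names[i], name) == 0) {
            *category = categories[i];
            return 1;
        }
    }
    return 0;
}
```

Code that previously needed one #ifdef'd copy per category can instead loop from 0 to N_CATEGORIES.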
* locale.c: Avoid potential read beyond buffer end (Karl Williamson, 2017-11-08, 1 file, -2/+4)
| I noticed this flaw by code reading; I doubt that it's exploitable. foldEQ assumes that both operands are at least as long as its length parameter. In this case, it's possible that the codeset returned by nl_langinfo is shorter than 5, in which case, it would try to access the extra characters in the heap. Real codesets tend to be longer than this, so an attacker would likely have to install a locale with a made-up codeset whose name is shorter. Even the C locale is longer: "ANSI_X3.4-1968".
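A sketch of the guard described above: check the operand's length before calling a fixed-length comparison. fold_eq_n() is a hypothetical stand-in for a foldEQ-style helper, and the "UTF-8" target string is an assumption inferred from the length 5 mentioned in the message.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a foldEQ()-style helper.  Its contract is that
 * both operands are at least len bytes long.  This version happens to stop
 * at the first mismatch, but a helper with this contract is free to read
 * all len bytes, so callers must guarantee the length. */
static int fold_eq_n(const char *a, const char *b, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        if (tolower((unsigned char) a[i]) != tolower((unsigned char) b[i])) {
            return 0;
        }
    }
    return 1;
}

/* Case-insensitively tests whether codeset begins with "UTF-8". */
static int codeset_begins_utf8(const char *codeset)
{
    static const char want[] = "UTF-8";
    const size_t want_len = sizeof want - 1;

    assert(codeset != NULL);

    /* The guard: never hand the helper an operand shorter than len. */
    if (strlen(codeset) < want_len) {
        return 0;
    }
    return fold_eq_n(codeset, want, want_len);
}
```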
* locale.c: Clarify some debug statements (Karl Williamson, 2017-11-08, 1 file, -2/+2)
|
* locale.c: Slight refactor (Karl Williamson, 2017-11-08, 1 file, -4/+2)
| This makes savepv() part of the expressions instead of a separate statement.
* locale.c: Use REPLACEMENT_CHARACTER as a test (Karl Williamson, 2017-11-08, 1 file, -4/+6)
| This is trying to determine if the locale is UTF-8. The easiest way to tell is if the codeset returned by nl_langinfo says UTF-8, but if that fails or nl_langinfo() is not present on the system, a fallback method is to use the libc routines to convert a known byte string to code point and see if that matches the expected Unicode code point. Prior to this patch, the byte string representing HYPHEN was used. That's probably good enough, but we can do better with no extra work. This commit changes to use the REPLACEMENT CHARACTER instead. That is a Unicode concept. The chances of a non-UTF-8 locale taking the UTF-8 byte string for the REPLACEMENT and evaluating to REPLACEMENT are vanishingly small.
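The fallback test can be sketched with mbtowc(). This is an illustration rather than the code in locale.c; the function name is hypothetical, and it assumes the platform's wchar_t holds Unicode code points (true where __STDC_ISO_10646__ is defined, as on glibc).

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: does the current LC_CTYPE locale decode the UTF-8
 * encoding of U+FFFD REPLACEMENT CHARACTER back to U+FFFD? */
static int ctype_locale_looks_utf8(void)
{
    static const char replacement[] = "\xEF\xBF\xBD";  /* U+FFFD in UTF-8 */
    wchar_t wc = 0;
    int len;

    (void) mbtowc(NULL, NULL, 0);   /* reset any shift state */
    len = mbtowc(&wc, replacement, sizeof replacement - 1);

    /* A UTF-8 locale consumes all three bytes and yields U+FFFD. */
    return len == 3 && wc == (wchar_t) 0xFFFD;
}
```

A program starts in the "C" locale, so the function is expected to return 0 until setlocale() selects a UTF-8 locale.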
* locale.c: Avoid extra call to mbtowc() (Karl Williamson, 2017-11-08, 1 file, -3/+6)
| This is done only when debugging, but in some locales that have shift states, the extra call could blow up. Instead save the result of the mbtowc() call we care about.
* locale.c: Add macro (Karl Williamson, 2017-11-08, 1 file, -0/+4)
| This adds STRLENs() where the argument must be a literal string constant. This may deserve wider applicability, but in case it doesn't, I'm making it local to just this file.
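The log does not show the macro's definition. One common way to write such a macro, offered here as an assumption, is to concatenate the argument with empty string literals so that anything other than a literal fails to compile, and to let sizeof compute the length at compile time. The helper function is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of a string literal, excluding the trailing NUL.  The "" s ""
 * concatenation only compiles when s is itself a string literal. */
#define STRLENs(s) (sizeof("" s "") - 1)

/* Hypothetical use: compare against a literal without repeating its length. */
static int has_utf8_prefix(const char *s)
{
    assert(s != NULL);
    return strncmp(s, "UTF-8", STRLENs("UTF-8")) == 0;
}
```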
* locale.c: Rmv extraneous detail from comment (Karl Williamson, 2017-11-08, 1 file, -2/+2)
| This comment contains a list of code points that are unusual, but it also included ones that are standard, which made me keep looking to see why they were unusual, each time realizing in the end that they were not.
* Change name of internal function (Karl Williamson, 2017-11-08, 1 file, -3/+3)
| Following on the previous commit, this changes the name of the function that changes the variable to be in sync with it.
* Change name of locale per-interpreter variable (Karl Williamson, 2017-11-08, 1 file, -5/+5)
| The real purpose of this internal variable is to give the name of the locale that is the underlying one for the C program. Various macros already indicate that. This furthers the process.
* Perl_locale(): Refactor for clarity (Karl Williamson, 2017-11-08, 1 file, -76/+37)
| This code is full of 'if's interrupted by #ifdefs, which makes it hard to read. Changing it to a switch() makes it much easier to understand.
* locale.c:sync_locale(): Add debugging info (Karl Williamson, 2017-11-08, 1 file, -3/+16)
|
* locale.c:sync_locale(): Rmv useless call (Karl Williamson, 2017-11-08, 1 file, -1/+0)
| This was changing to use the locale's radix, but this is unnecessary for the later things in this function, and those change things to use dot, so this call is useless.
* locale.c: Use new nl_langinfo equivalent (Karl Williamson, 2017-11-08, 1 file, -4/+7)
| This converts the final plain nl_langinfo() function call in locale.c to use the new equivalent, which is more thread safe and does not require the caller to free the returned memory. There was an unlikely leak before this, if the return was somehow "".
* locale.c: Rmv erroneous complement operator (Karl Williamson, 2017-11-08, 1 file, -1/+1)
| The extra '!' that snuck in there caused this code to not work properly. Fortunately, it doesn't get used except as a last resort, and that apparently hasn't happened so as to have gotten reported from the field. A test can't be added because it would only occur on a system that had bad locales.
* locale.c: Refactor locale macros (Karl Williamson, 2017-11-08, 1 file, -22/+23)
| This standardizes the macros to make them easier to understand and to prepare for future commits.
* locale.c: Convert setlocale() calls to macros (Karl Williamson, 2017-11-08, 1 file, -36/+41)
| This will be useful in future commits.
* locale.c: Change static fcn name (Karl Williamson, 2017-11-08, 1 file, -1/+3)
| The new name more closely reflects what it does.
* locale.c: Refactor static fcn to save work (Karl Williamson, 2017-11-08, 1 file, -26/+36)
| This adds a parameter to the function that sets the radix character for floating point numbers. We know that the radix by default is a dot, so no need to calculate it in that case.
|
| This code was previously using localeconv() to find the locale's decimal point. The just added my_nl_langinfo() fcn does the same with an easier API, and is more thread safe, and automatically switches to use localeconv() when nl_langinfo() isn't available, so revise the conditional compilation directives that previously were necessary, and collapse directives that were unnecessarily nested.
|
| And adjust indentation.
* locale.c: Create extended internal Perl_langinfo() (Karl Williamson, 2017-11-08, 1 file, -1/+10)
| This extended version allows it to be called so that it uses the current locale for the LC_NUMERIC, instead of toggling to the underlying one. (This can be useful when in the middle of things.) This ability won't be used until the next commit.
* locale.c: Rmv redundant fcn call (Karl Williamson, 2017-11-08, 1 file, -2/+0)
| This function is called as part of the call made in the line before. No need to do it twice.
* locale.c: White-space, comment, rearrange some #else (Karl Williamson, 2017-11-08, 1 file, -260/+452)
| This file is full of conditional compilation, due to the fact that locale support has been highly variable in the OSes Perl has operated on. This commit properly indents nested compiler directives, and makes sure there is a blank line between the directives and real code. I find that much easier to read.
|
| It also re-orders some
|
|     #ifdef some_feature
|         Many lines of code handling feature
|     #else
|         1 to 3 lines of trivial code to avoid compilation warnings
|     #endif
|
| to
|
|     #ifndef some_feature
|         1 to 3 lines of trivial code to avoid compilation warnings
|     #else
|         Many lines of code handling feature
|     #endif
|
| Otherwise the trivial code may be hundreds of lines from the original '#if', which makes it hard to grok.
|
| This commit also clarifies and fixes typos in comments, and removes some obsolete comments.
* locale.c: Tighten what is considered a LC variable (Karl Williamson, 2017-11-06, 1 file, -3/+13)
| Things like LC_CTYPE are locale variables, but not LC_ctype nor LC__CTYPE. Prior to this commit all were treated as locale variables. Many platforms have more locale variables than Perl knows about, e.g., LC_PAPER, and the code tries to catch all possibilities.
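One rule consistent with the examples above (LC_CTYPE and LC_PAPER accepted; LC_ctype and LC__CTYPE rejected) is "LC_" followed by one or more uppercase letters. The exact rule the commit implements is not shown in this log, so the sketch below is an assumption, and the function name is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical check: "LC_" followed by at least one character, all of
 * which must be uppercase letters (as judged by isupper()). */
static int looks_like_lc_variable(const char *name)
{
    const char *p;

    assert(name != NULL);

    if (strncmp(name, "LC_", 3) != 0) {
        return 0;
    }
    p = name + 3;
    if (*p == '\0') {
        return 0;               /* bare "LC_" is not a category name */
    }
    for (; *p != '\0'; p++) {
        if (!isupper((unsigned char) *p)) {
            return 0;
        }
    }
    return 1;
}
```

Because the check is by pattern rather than by a fixed list, it also accepts platform-specific names that Perl does not know about.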
* locale.c: White space, extra braces only (Karl Williamson, 2017-11-06, 1 file, -6/+8)
| Align vertically, and indent blocks to standard. It adds braces for clarity.
* Use memENDs() in core (Karl Williamson, 2017-11-06, 1 file, -8/+2)
|
* Rename strEQs to strBEGINs; remove strNEs (Karl Williamson, 2017-11-06, 1 file, -2/+2)
| The original names are confusing. See thread beginning with http://nntp.perl.org/group/perl.perl5.porters/244335
|
| The two macros are mapped into just that one, complementing the result for the few cases where strNEs was used.
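A sketch of what a "begins with this literal" macro can look like, and how the removed negative form reduces to a complement. The definition below is an assumption for illustration, not a copy of Perl's header, and the helper function is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* True if s begins with the string literal lit.  The "" lit "" form only
 * compiles when lit is a literal; sizeof gives its length at compile time. */
#define strBEGINs(s, lit) \
    (strncmp((s), "" lit "", sizeof(lit) - 1) == 0)

/* Hypothetical caller.  A former strNEs(name, "LC_") test would simply be
 * written as !strBEGINs(name, "LC_"). */
static int is_not_lc_name(const char *name)
{
    assert(name != NULL);
    return !strBEGINs(name, "LC_");
}
```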
* Use nl_langinfo_l() if available (Karl Williamson, 2017-10-30, 1 file, -2/+29)
| This function allows us to avoid using a mutex and changing the locale.
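A sketch of how nl_langinfo_l() queries a locale without switching to it. The helper name and its buffer-based interface are hypothetical, and this is not the code in locale.c; some platforms declare the locale-object functions in <xlocale.h>.

```c
#ifndef _XOPEN_SOURCE
#  define _XOPEN_SOURCE 700   /* request POSIX 2008 declarations */
#endif
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: copy the codeset name of the named locale into out
 * without touching the process or thread locale.  Returns 0 on success. */
static int codeset_of_locale(const char *locale_name, char *out, size_t size)
{
    locale_t loc;

    assert(locale_name != NULL && out != NULL && size > 0);

    loc = newlocale(LC_ALL_MASK, locale_name, (locale_t) 0);
    if (loc == (locale_t) 0) {
        return -1;              /* unknown locale */
    }

    /* No setlocale() call and no mutex: the query is made against the
     * locale object directly.  Copy before freeing the object. */
    snprintf(out, size, "%s", nl_langinfo_l(CODESET, loc));
    freelocale(loc);
    return 0;
}
```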
* Fix Perl_langinfo() non-threaded bug (Karl Williamson, 2017-10-21, 1 file, -35/+5)
| Perl_langinfo() is supposed to return a pointer to internal storage that is supposed to remain valid until the next call to it. That should come automatically on single-threaded perls. The previous version took advantage of this to avoid copying the result to a buffer, and just called plain nl_langinfo(). However, it turns out that some systems destroy the internal space also when a setlocale() is done. That means the result must be copied in all instances.
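The "copy in all instances" approach can be sketched as below for the single-threaded case. The function name and the static buffer are illustrative assumptions, not the actual implementation.

```c
#ifndef _XOPEN_SOURCE
#  define _XOPEN_SOURCE 700
#endif
#include <assert.h>
#include <langinfo.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical single-threaded wrapper: returns a copy of the
 * nl_langinfo() result that stays valid until the next call to this
 * function, even if setlocale() is called in between.  Returns NULL if
 * memory cannot be allocated. */
static const char *saved_langinfo(nl_item item)
{
    static char *buf = NULL;      /* owned by this function */
    static size_t buf_size = 0;
    const char *raw = nl_langinfo(item);
    size_t need;

    assert(raw != NULL);
    need = strlen(raw) + 1;

    if (need > buf_size) {
        char *grown = realloc(buf, need);
        if (grown == NULL) {
            return NULL;
        }
        buf = grown;
        buf_size = need;
    }
    memcpy(buf, raw, need);       /* includes the trailing NUL */
    return buf;
}
```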
* locale.c: Show how the number '112' was derived (Karl Williamson, 2017-09-14, 1 file, -1/+1)
| It's unclear why the code uses this number, so expand out the expression that yields that, which makes it clearer.
* locale.c: Add a branch prediction (Karl Williamson, 2017-09-09, 1 file, -1/+2)
|
* locale.c: Don't be too clever in strlcat (Karl Williamson, 2017-09-09, 1 file, -9/+3)
| This code I wrote was attempting to avoid multiple calls to strlen in constructing the catenation of various components of a string. It did this by keeping track of how far it got each iteration, and using that as a starting point for the next. I now realize that the return value of strlcat is as if it succeeds, even if there isn't enough room. That means that if there were a problem, this could start out an iteration such that it would be writing beyond the end of the buffer. It is safer to not do this, so this commit removes it. The use of strlcat is a safety measure, as there should be a sufficient amount of space calculated for things to fit, so there is no bug here. But one should be safe.
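The pitfall can be seen with a small stand-in that follows the BSD strlcat() convention of returning the length the result would have had with unlimited room. demo_strlcat() is a hypothetical local function written for this illustration, since strlcat() is not available in every C library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in with strlcat()-style semantics: appends src to dst
 * within a buffer of `size` bytes, always NUL-terminating when there is
 * room, and returns (initial length of dst) + strlen(src) even when the
 * result was truncated. */
static size_t demo_strlcat(char *dst, const char *src, size_t size)
{
    size_t dlen = 0;
    size_t slen;
    size_t room;
    size_t n;

    assert(dst != NULL && src != NULL);
    slen = strlen(src);

    while (dlen < size && dst[dlen] != '\0') {
        dlen++;
    }
    if (dlen == size) {
        return size + slen;     /* dst was not terminated within size */
    }

    room = size - dlen - 1;     /* bytes available before the NUL */
    n = slen < room ? slen : room;
    memcpy(dst + dlen, src, n);
    dst[dlen + n] = '\0';
    return dlen + slen;         /* may exceed size - 1 on truncation */
}

/* A return value >= size means truncation; it is then not a valid offset
 * into the buffer and must not be used as a starting point for writing. */
static int was_truncated(size_t ret, size_t size)
{
    return ret >= size;
}
```

With an 8-byte buffer, appending "abcdef" and then "ghij" leaves "abcdefg" in the buffer while the second call returns 10, which is past the end of the buffer.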
* locale.c: Use %z to specify printf format for length (Karl Williamson, 2017-09-09, 1 file, -1/+1)
| This is the better way to do this.
* Add API function Perl_langinfo() (Karl Williamson, 2017-09-09, 1 file, -6/+593)
| This is designed to generally replace nl_langinfo() in XS code. It is thread-safer, hides the quirks of perl's LC_NUMERIC handling, and can be used on systems lacking nl_langinfo.
* locale.c: Use strerror_l if platform has it (Karl Williamson, 2017-08-12, 1 file, -7/+26)
| strerror_l makes the my_strerror function trivial, as it doesn't have to worry about critical sections, etc. Even on unthreaded perls, it avoids having to change the current locale, and then change it back.
* locale.c: Refactor some #if clauses (Karl Williamson, 2017-08-12, 1 file, -15/+13)
| This moves all the handling of the case where there are no locale messages into one place, instead of splitting it up across long stretches of conditionally compiled code. This code is essentially trivial, and seen to be so when it isn't split up; this prepares for the next commit.
|
| The final return of the function is still split off so that all branches go through it, and the debugging code adjacent to it.
* locale.c: Move some DEBUGGING code (Karl Williamson, 2017-08-12, 1 file, -10/+9)
| This is moved so it gets executed for all branches.
* locales: Add #define; change how to override (Karl Williamson, 2017-08-12, 1 file, -3/+3)
| This changes the controlling #define for using the POSIX 2008 locale functions to "USE_POSIX_2008_LOCALE". The previous controlling name "USE_THREAD_SAFE_LOCALE" is retained for backward compatibility.
|
| The reason for this change is that we may add thread-safe locale handling even on platforms that don't have POSIX 2008, so the name USE_THREAD_SAFE_LOCALE would be used for controlling things in that situation. In other words, the concepts may become distinct, and so prepare for that.
* Move bulk of POSIX::setlocale to locale.c (Karl Williamson, 2017-07-15, 1 file, -20/+184)
| This cleans up the interface, as it allows several functions to now be static that used to have to be called from outside locale.c.
* locale.c: Add forgotten #if DEBUGGING (Karl Williamson, 2017-07-14, 1 file, -0/+5)
| I pushed the previous commit without actually amending it to include this.
* Add debugging to locale handling (Karl Williamson, 2017-07-14, 1 file, -6/+45)
| These debug statements have proven useful in the past tracking down problems. I looked them over and kept the ones that I thought might be useful in the future. This includes extracting some code into a static function so it can be called from more than one place.
* locale.c: fix compiler warning (David Mitchell, 2017-03-17, 1 file, -2/+2)
| (this is debugging-only code)
|
| It was trying to printf a U32 using %u.
* Make _byte_dump_string() usable in all of core (Karl Williamson, 2017-02-13, 1 file, -5/+3)
| I found myself needing this function for development debugging, which formerly was only usable from utf8.c. This enhances it to allow a second format type, and makes it core-accessible.
* locale.c: Use only C89 legal C (Karl Williamson, 2017-02-08, 1 file, -5/+9)
| An array was being declared and initialized from a non-constant.
|
| Spotted by James Keenan.