| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Extracted from patch submitted by Lajos Veres in RT #123693.
|
|
|
|
|
|
| |
A better comment is added. The #if is moved so that, in the rare
compilation that doesn't use LC_CTYPE, no unused variable warning is
generated.
|
|
|
|
|
|
|
|
|
|
| |
The bulk of this macro is extremely rarely executed, so it makes sense
to optimize for space, as it is called from a fair number of places, and
move as much as possible to a single function.
For whatever it's worth, on my system with my typical compilation
options, including -O0, the savings were 19640 bytes in regexec.o and
4528 in utf8.o, at a cost of 1488 in locale.o.
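The space-saving pattern described above can be sketched roughly as follows. This is only an illustration, not the macro that was changed: CHECK_VALUE, report_rare_case, double_checked and slow_path_calls are all hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical counter, only here so the slow path can be observed. */
static int slow_path_calls = 0;

/* The bulky, rarely executed work lives in one out-of-line function. */
static void report_rare_case(int value)
{
    assert(value < 0);      /* callers only get here via the macro's test */
    slow_path_calls++;
    fprintf(stderr, "unexpected negative value %d\n", value);
}

/* Each call site now carries only the cheap test and a function call. */
#define CHECK_VALUE(value) \
    do { if ((value) < 0) report_rare_case(value); } while (0)

/* Example call site: the macro expands to very little code here. */
static int double_checked(int value)
{
    CHECK_VALUE(value);
    return value * 2;
}
```

The trade-off is the one the message reports: every expansion of the macro shrinks, while the object file that holds the shared function grows a little.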
|
|
|
|
|
|
|
| |
I spotted this in code review. I didn't add a test for it, because to
expose the much more serious bug fixed by the previous commit, I had to
temporarily change the C code to force these extremely
unlikely-to-be-taken branches to execute.
|
|
|
|
|
| |
I got confused in writing this: the global needs to be cleared always,
and set to NULL.
| |
Commit 8c6180a91de91a1194f427fc639694f43a903a78 added a warning message
for when Perl determines that the underlying locale the program has just
switched into is poorly supported. At the time it was thought that this
would be an extremely rare occurrence. However, a bug in HP-UX -
B.11.00/64 causes this message to be raised for the "C" locale. A
workaround was added that silenced those. However, before that was in
place, this message occurred a great many times when executing the test
suite. It was raised even if the script was not locale-aware, so that
the underlying locale was completely irrelevant. There is a good
prospect that someone using an older Asian locale as their default would
get this message inappropriately, even if they don't use locales, or
switch to a supported one before using them.
This commit causes the message to be raised only if it actually is
relevant. When not in the scope of 'use locale', the message is stored,
not raised. Upon the first locale-dependent operation within a bad
locale, the saved message is raised, and the storage cleared. I was
able to do this without adding extra branching to the main-line
non-locale execution code. This was done by adding regnodes which get
jumped to by switch statements, and refactoring some existing C tests so
they exclude non-locale right off the bat.
These changes would have been necessary for another locale warning that
I previously agreed to implement, and which is coming a few commits from
now.
I do not know of any way to add tests in the test suite for this. It is
in fact rare for modern locales to have these issues. The way I tested
this was to temporarily change the C code so that all locales are viewed
as defective, and manually note that the warnings came out where
expected, and only where expected.
I chose not to try to output this warning on any POSIX functions called.
I believe that all that are affected are deprecated or scheduled to be
deprecated anyway. And POSIX is closer to the hardware of the machine.
For convenience, I also don't output the message for some zero-length
pattern matches. If something is going to be matched, the message will
likely very soon be raised anyway.
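The store-now, raise-later behavior described in this message can be sketched as below. It is a minimal illustration under assumed names (pending_warning, store_locale_warning, raise_pending_locale_warning, warnings_raised are hypothetical) and does not reproduce the regnode or switch-statement changes the commit actually made.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char *pending_warning = NULL;  /* hypothetical saved message */
static int warnings_raised = 0;       /* hypothetical, for observation only */

/* Called when a poorly supported locale is entered: remember the
 * message instead of printing it. */
static void store_locale_warning(const char *msg)
{
    char *copy;

    assert(msg != NULL);
    copy = malloc(strlen(msg) + 1);
    if (copy == NULL)
        return;                       /* out of memory: drop the message */
    strcpy(copy, msg);
    free(pending_warning);            /* a newer locale replaces the old one */
    pending_warning = copy;
}

/* Called from the first locale-dependent operation: raise the saved
 * message once, then clear the storage. */
static void raise_pending_locale_warning(void)
{
    if (pending_warning == NULL)
        return;
    fprintf(stderr, "%s\n", pending_warning);
    warnings_raised++;
    free(pending_warning);
    pending_warning = NULL;
}
```

Code that never performs a locale-dependent operation never calls the second function, so it never sees the message.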
|
|
|
|
|
|
| |
HP-UX - B.11.00/64 has a problem with the C locale that's only
noticeable from newly added warnings flooding the logs. This adds a
test to suppress them.
|
|
|
|
|
| |
is_ascii_string's name has misled me in the past; the new name is
clearer.
|
|
|
|
|
|
|
|
|
| |
Some systems' setlocale() implementations use static storage for the
locale name they return, so that a subsequent setlocale() overwrites it.
Therefore, you must make a copy of the name if you want it to work after
the next setlocale.
Thanks to Craig Berry for finding and diagnosing this problem.
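A minimal sketch of the copy-before-reuse rule, using only standard setlocale(); the helper names save_locale_name and restore_locale are hypothetical and not taken from locale.c.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Return a heap copy of the current locale name for 'category', or NULL
 * on failure. The pointer returned by setlocale() may refer to static
 * storage, so it is copied before any further setlocale() call. */
static char *save_locale_name(int category)
{
    const char *name = setlocale(category, NULL);  /* query, don't change */
    char *copy;

    if (name == NULL)
        return NULL;
    copy = malloc(strlen(name) + 1);
    if (copy != NULL)
        strcpy(copy, name);
    return copy;
}

/* Switch back to a name obtained from save_locale_name() and free it.
 * Returns 1 on success, 0 if setlocale() rejected the name. */
static int restore_locale(int category, char *saved)
{
    int ok;

    assert(saved != NULL);
    ok = setlocale(category, saved) != NULL;
    free(saved);
    return ok;
}
```

Typical use is to call save_locale_name(), switch locales for some operation, and then call restore_locale() with the saved copy.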
|
|
|
|
|
| |
This reverts commit 1244bd171b8d1fd4b6179e537f7b95c38bd8f099,
thus reinstating commit 3d3a881c1b0eb9c855d257a2eea1f72666e30fbc.
|
|
|
|
|
|
|
| |
This reverts commit 3d3a881c1b0eb9c855d257a2eea1f72666e30fbc.
Win32 with a 1252 code page was failing blead. Revert until I have time
to look at it.
|
|
|
|
|
|
|
|
|
| |
Perl only supports single-byte locales (except for UTF-8 ones), and has
poor support for 7-bit locales that aren't supersets of ASCII (these
should be exceedingly rare these days).
This commit raises warnings in the new locale warning category when
such a locale is entered.
|
|
|
|
|
|
|
|
| |
Building a debugging perl triggered warnings such as
warning: format ‘%d’ expects argument of type ‘int’, but argument 3 has type ‘U32’
warning: field width specifier ‘*’ expects argument of type ‘int’, but argument 5 has type ‘long unsigned int’
warning: format ‘%x’ expects argument of type ‘unsigned int’, but argument 3 has type ‘wchar_t’
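The message does not say how the warnings were silenced. One common approach, sketched here as an assumption, is to cast each argument to the type its format specifier expects. U32_like and format_debug_line are hypothetical names standing in for the real types and call sites.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <wchar.h>

typedef uint32_t U32_like;   /* hypothetical stand-in for a 32-bit unsigned type */

/* Format a value in a field of the given width, followed by a wide
 * character's code in hex, with every argument cast to the exact type
 * its conversion specifier expects. */
static int format_debug_line(char *buf, size_t size, U32_like value,
                             unsigned long width, wchar_t wc)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%*lu|%x",
                    (int) width,            /* '*' expects int */
                    (unsigned long) value,  /* %lu expects unsigned long */
                    (unsigned int) wc);     /* %x expects unsigned int */
}
```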
|
|
|
|
|
| |
At least on the system that we have tested on. There are locales that
say they are UTF-8, but they're not; they're EBCDIC 1047.
| |
This adds to handy.h isALPHA_FOLD_EQ(c1,c2) which efficiently tests if
c1 and c2 are the same character, case-insensitively. For example
isALPHA_FOLD_EQ(c, 's') returns true if and only if <c> is 's' or 'S'.
isALPHA_FOLD_NE() is also added by this commit.
At least one of c1 and c2 must be known to be in [A-Za-z] or this macro
doesn't work properly. (There is an assert for this in the macro in
DEBUGGING builds.) That is why the name includes "ALPHA", so you won't
forget when using it.
This functionality has been in regcomp.c for a while, under a different
name. I had thought that the only reason to make it more generally
available was potential speed gain, but recent gcc versions optimize to
the same code, so I thought there wasn't any point to doing so.
But I now think that using this makes things easier to read (and
certainly shorter to type in). Once you grok what this macro does, it
simplifies what you have to keep in your mind when reading logical
expressions with multiple operands. That something can be either upper
or lower case can be a distraction to understanding the larger point of
the expression.
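The idea behind such a comparison can be sketched as below. This is not the handy.h definition: it is written as functions rather than a macro, assumes an ASCII-based character set, and the names is_ascii_alpha, alpha_fold_eq and alpha_fold_ne are hypothetical.

```c
#include <assert.h>

/* True if c is an ASCII letter. */
static int is_ascii_alpha(int c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* Case-insensitive equality for letters. Upper and lower case forms of
 * an ASCII letter differ in exactly one bit ('A' ^ 'a'), so masking that
 * bit off both operands compares them without regard to case. This is
 * only meaningful if at least one operand is a letter; otherwise
 * unrelated characters such as '@' and '`' would compare equal. */
static int alpha_fold_eq(int c1, int c2)
{
    const int case_bit = 'A' ^ 'a';     /* 0x20 in ASCII */

    assert(is_ascii_alpha(c1) || is_ascii_alpha(c2));
    return (c1 & ~case_bit) == (c2 & ~case_bit);
}

static int alpha_fold_ne(int c1, int c2)
{
    return !alpha_fold_eq(c1, c2);
}
```

With this, a test such as (c == 's' || c == 'S') shrinks to alpha_fold_eq(c, 's'), which is the readability gain the message describes.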
|
|
|
|
|
|
| |
This trivial function is to be used by XS code when it changes the
program's locale. It hides the details from that code of what needs to
be done, which could change in the future.
|
| |
|
|
|
|
|
| |
The previous way to suppress messages wasn't working for all gcc
versions. Spotted by Jarkko Hietaniemi.
|
|
|
|
|
| |
Remaining atoi() uses include at least:
ext/DynaLoader/dl_aix.xs, os2/os2.c, vms/vms.c
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This code extends the heuristics used to determine if a locale is UTF-8
or not on older platforms. It has been #ifdef'd out because it only
added a little value on dromedary. Now the previous commit has added
new heuristics, and tests on dromedary show that this adds nothing to
that. But I'm leaving it in the source in case it might ever prove
useful. In order to test it, I compiled it and found some problems with
the earlier version that this now fixes.
|
|
|
|
|
|
|
|
| |
On older platforms that conform to neither POSIX 2001 nor C99, heuristics
are employed to try to determine if a locale is UTF-8 or not. This
commit improves those heuristics by looking at names of the months and
days of the week to see if they are UTF-8 or not. This is done if
looking at the currency symbol failed to help.
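A rough sketch of the kind of test such a heuristic could apply to each month name, day name or currency symbol. It is an assumption about the general shape, not the code in locale.c, and utf8_evidence is a hypothetical name. The validator is simplified: it checks lead and continuation bytes but does not reject every overlong or surrogate encoding.

```c
#include <assert.h>
#include <stddef.h>

/* Classify a NUL-terminated string:
 *    1  well-formed UTF-8 containing at least one multi-byte character
 *    0  contains a byte sequence that is not well-formed UTF-8
 *   -1  pure ASCII, which is no evidence either way */
static int utf8_evidence(const char *s)
{
    const unsigned char *p = (const unsigned char *) s;
    int saw_multibyte = 0;

    assert(s != NULL);
    while (*p != '\0') {
        unsigned extra;                 /* continuation bytes expected */

        if (*p < 0x80) {                /* ASCII byte */
            p++;
            continue;
        }
        else if (*p >= 0xC2 && *p <= 0xDF) extra = 1;
        else if (*p >= 0xE0 && *p <= 0xEF) extra = 2;
        else if (*p >= 0xF0 && *p <= 0xF4) extra = 3;
        else return 0;                  /* stray continuation or bad lead */

        p++;
        while (extra-- > 0) {
            if ((*p & 0xC0) != 0x80)    /* also catches a premature NUL */
                return 0;
            p++;
        }
        saw_multibyte = 1;
    }
    return saw_multibyte ? 1 : -1;
}
```

A caller would try several strings from the locale in turn and stop at the first one that returns 0 or 1, since an all-ASCII string such as "May" settles nothing.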
|
|
|
|
|
| |
Indent and outdent blocks of code to conform to newly formed or removed
braces.
|
|
|
|
|
|
|
|
|
| |
On older platforms that are neither C99 nor POSIX 2001, locale.c uses the
currency symbol to try to see if a locale is UTF-8 or not. This commit
refactors it somewhat to make it cleaner, which also fixes several
problems. The least serious was that it sometimes did a setlocale()
unnecessarily. Others are that in some circumstances it called
localeconv() and/or looked at the result while within the wrong locale.
|
|
|
|
| |
This only affected runs with the -DL parameter to perl set.
|
|
|
|
|
|
|
| |
The interior-most function can return NULL. Currently savepv(), which is
the next outer function, handles this correctly, as does the function
outside that, but it is dangerous to rely on that behavior. So we test
for NULL before calling functions on the pointer.
|
|
|
|
|
|
|
|
|
| |
In the function that determines if a POSIX locale is UTF-8 or not, if
either nl_langinfo or MB_CUR_MAX is defined, it can reliably determine
the answer. If neither is defined, it uses heuristics to figure things
out as best it can. The heuristic code doesn't add value on those
platforms where one of the two symbols is defined, so it can just be
ifdef'd out.
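For the reliable case, a sketch using POSIX nl_langinfo(CODESET) might look like the following. The function names codeset_is_utf8 and current_locale_is_utf8 are hypothetical, and the spellings accepted for the codeset name are an assumption, not a statement of what locale.c accepts.

```c
#include <assert.h>
#include <ctype.h>
#include <langinfo.h>   /* POSIX; assumed to be available */
#include <locale.h>
#include <stddef.h>

/* True if a codeset name spells UTF-8, ignoring case and hyphens
 * (so "UTF-8", "utf8" and "Utf-8" are all accepted). */
static int codeset_is_utf8(const char *codeset)
{
    static const char want[] = "UTF8";
    size_t i = 0;

    assert(codeset != NULL);
    for (; *codeset != '\0'; codeset++) {
        if (*codeset == '-')
            continue;
        if (want[i] == '\0'
            || toupper((unsigned char) *codeset) != want[i])
            return 0;
        i++;
    }
    return want[i] == '\0';
}

/* Ask the C library directly what the current LC_CTYPE codeset is. */
static int current_locale_is_utf8(void)
{
    const char *codeset = nl_langinfo(CODESET);

    return codeset != NULL && codeset_is_utf8(codeset);
}
```

On a platform where this works, guessing from currency symbols or locale names adds nothing, which is why that code can be compiled out there.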
|
|
|
|
|
| |
Checking whether the currency symbol is UTF-8 should come ahead of
looking at the locale name.
|
|
|
|
|
|
|
|
| |
This section of code just returned generally. This commit changes it
so that it drops off the end if it can't determine if the current locale
is UTF-8 or not, so that additional tests can be added later. The
function defaults to not UTF-8 if this drops off the end, so there
should be no functionality change.
|
|
|
|
|
|
| |
Commit a39edc4c877304d4075679b1d8de1904671a9c37 misplaced a parenthesis,
so it wasn't really looking at the next character as it was supposed to
be doing.
|
|
|
|
| |
Outdent because the previous commit removed the enclosing block.
|
|
|
|
|
|
|
|
|
| |
These two functions are normally supposed to be called through macro
interfaces which check whether they actually should be called or not.
That means the conditionals removed by this commit are redundant when
going through the normal interface. Removing them allows the exceptional
case, where the code should be executed unconditionally, to be handled
by calling the functions directly instead of using the macro interface.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Perl uses three interpreter-level (but private) variables to keep track
of numeric locales. PL_numeric_name is the current underlying locale.
PL_numeric_standard is a boolean to indicate if we are switched to the C
(or POSIX) locale, and PL_numeric_local is a boolean to indicate if we
are switched to the underlying one. The reason there are two booleans
is that if the underlying locale is C, both can be true at the same
time. But the code that is being changed by this commit didn't realize
this, and could unnecessarily set the booleans to FALSE. This could
cause unnecessary switching of locales.
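The two-boolean bookkeeping can be modelled in a few lines. This is a toy model with hypothetical names (struct numeric_state and its functions); the real state lives in interpreter variables and the real code calls setlocale(), which is replaced here by a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct numeric_state {
    const char *underlying;  /* name of the underlying numeric locale */
    bool standard;           /* currently in the C/POSIX locale? */
    bool local;              /* currently in the underlying locale? */
    int switches;            /* stands in for actual setlocale() calls */
};

static bool is_c_locale(const char *name)
{
    return strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0;
}

static void numeric_init(struct numeric_state *st, const char *underlying)
{
    assert(st != NULL && underlying != NULL);
    st->underlying = underlying;
    st->local = true;                        /* we start in the underlying locale */
    st->standard = is_c_locale(underlying);  /* which may also be the standard one */
    st->switches = 0;
}

static void set_numeric_standard(struct numeric_state *st)
{
    if (st->standard)
        return;              /* already there, whatever 'local' says */
    st->switches++;
    st->standard = true;
    st->local = is_c_locale(st->underlying);
}

static void set_numeric_local(struct numeric_state *st)
{
    if (st->local)
        return;              /* already there, whatever 'standard' says */
    st->switches++;
    st->local = true;
    st->standard = is_c_locale(st->underlying);
}
```

When the underlying locale is C, both flags stay true and neither function ever switches. Clearing one flag whenever the other is set would force needless switches in exactly that case.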
|
| |
|
|
|
|
|
|
| |
From Brian Fraser: "Technically, any Perl compiled with
-Accflags="-UUSE_LOCALE", or -Ui_locale -Ud_setlocale...
realistically, for Android".
|
|
|
|
|
|
|
|
| |
You need to configure with g++ *and* -Accflags=-DPERL_GLOBAL_STRUCT
or -Accflags=-DPERL_GLOBAL_STRUCT_PRIVATE to see any difference.
(g++ does not do the "post-annotation" form of "unused".)
The version code has some of these issues, reported upstream.
|
|
|
|
|
|
|
| |
The chunk is not MAD-related but instead locale stuff. I have no idea
why that chunk got removed (I used a combination of unifdef(1) and editor).
It's #if-0-ed, so no change of behavior either way, but let's keep
the code for now, since it seems to have "historical significance".
|
|
|
|
|
|
| |
MAD = Misc Attribute Decoration; unmaintained attempt at preserving
the Perl parse tree more faithfully so that automatic conversion to
Perl 6 would have been easier.
|
|
|
|
|
| |
Somehow the ! in this if () got dropped, and there were no tests to
catch it. Now both are remedied.
|
| |
| |
The stringification of $! has long been an outlier in Perl locale
handling. The theory has been that these operating system messages are
likely to be of use to the final user, and should be in their language.
Things like
    No space left on device
    Can't fork
are not something the program is likely to handle, but could be
meaningfully helpful to the end-user.
There are problems with this though. One is that many perl messages are
in English, with the $! appended to them, so that the resultant message
is of mixed language, and may need to be translated anyway. Things like
    No space left on device
probably won't need the remaining portion of the message to give someone
a clear indication as to what's wrong. But there are many other
messages where both the OS error and the Perl error would be needed
together to understand the problem. An on-line translation tool can be
used to do this.
Another problem is that it can lead to garbage coming out on the user's
terminal when the program is not expecting UTF-8, but the underlying
locale is UTF-8. This is what happens in Bug #112208, and another that
was merged with it. It's a lot harder to translate mojibake via an
online tool than English.
This commit solves that by using the C locale for messages, except
within the scope of 'use locale'. It is extremely likely that the
messages in the C locale will be English, but if not they will be ASCII,
and there will be no garbage printed. A program that says "use locale"
is indicating that it has the intelligence necessary to deal with
locales.
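At the C level, fetching an error message in the C locale regardless of the user's settings could be sketched as follows. This is an assumption about the general approach, not the code this commit added. copy_string and strerror_in_c_locale are hypothetical names, and LC_MESSAGES is a POSIX category that is assumed to be available.

```c
#include <assert.h>
#include <errno.h>      /* error constants such as ENOSPC for callers */
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Heap copy of a string, or NULL if allocation fails. */
static char *copy_string(const char *s)
{
    char *copy;

    assert(s != NULL);
    copy = malloc(strlen(s) + 1);
    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}

/* Return a heap copy of the message for 'errnum' as worded in the C
 * locale, leaving the caller's LC_MESSAGES setting as it was found.
 * Returns NULL on failure; the caller frees the result. */
static char *strerror_in_c_locale(int errnum)
{
    const char *current = setlocale(LC_MESSAGES, NULL);
    char *saved = current != NULL ? copy_string(current) : NULL;
    char *message;

    if (saved == NULL)
        return NULL;
    setlocale(LC_MESSAGES, "C");
    message = copy_string(strerror(errnum));  /* copy before switching back */
    setlocale(LC_MESSAGES, saved);
    free(saved);
    return message;
}
```

Code inside the scope of 'use locale' would skip the switch and take the message from the user's locale instead.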
|
|
|
|
|
|
|
| |
This commit allows locale-awareness to be enabled for only a specified
subset of the locale categories. Thus you could make a section of code
LC_MESSAGES-aware, with no locale-awareness for the other categories.
|
|
|
|
|
|
|
|
|
|
|
| |
When processing version strings, the radix character must be a dot even
if we otherwise would be using some other character. vutil.c
upg_version() changes to the dot, but calls sv_catpvf() which may try to
change the character to something else. This commit introduces a way to
lock the character to a dot around the call to sv_catpvf().
vutil.c is cpan-upstream, but blead and cpan have already diverged, so
this just updates the SHA of the new version.
| |
perl.h has some macros used to manipulate the locale exposed for the
category LC_NUMERIC. These are currently undocumented, but will need to
be documented as the development of 5.21 progresses. This fixes these
up in several ways:
The tests for if we are in the correct state are made into macros. This
is in preparation for the next commit, which will make one of them more
complicated, and so that complication will only have to be in one place.
The variable declared by them is renamed to be preceded by an
underscore. It is dangerous practice to declare a variable inside a
macro, as its name could conflict with a name used by outside code. The
underscore alleviates this somewhat by making a conflict even less
likely. This will have to be revisited when some of these macros are
made part of the public API.
The tests to see if things need to change are reversed. Previously we
said we need to change to standard, for example, if the variable for
'local' is set. But both can be true at the same time if the underlying
locale is C. In this case, we only need to change if we aren't in
standard. Whether that is also local is irrelevant.
|
|
|
|
|
|
|
| |
This is for XS modules, so they don't have to worry about the radix
being a non-dot. When the locale needs to be in the underlying one, the
operation should be wrapped using macros for the purpose. That API may
change as we gain experience in 5.21, so I'm not including it now.
|
|
|
|
|
|
|
|
|
| |
Rare, but not unheard of, is for the strings returned by localeconv to
be in UTF-8. This commit looks for this and sets the UTF-8 flag if they
are so encoded.
A private function had to be changed from static for this. It is renamed
to begin with an underscore to emphasize its private nature.
|
|
|
|
|
|
|
|
|
|
|
| |
(Currently, only Win32 has one.)
[perl #121865]
Fix for Coverity perl5 CID 28949:
Logically dead code (DEADCODE)
dead_error_line: Execution cannot reach this statement
name = system_default_locale;
|
|
|
|
|
|
|
|
|
| |
Fix for Coverity perl5 CID 29058: Resource leak
(RESOURCE_LEAK) leaked_storage: Variable codeset going out of scope leaks the
storage it points to.
The savepv-ed codeset was not freed in the failure path.
(The save_input_locale is freed just a few lines later.)
|
|
|
|
|
|
| |
regcomp.c:11083: warning: suggest a space before ';' or explicit braces around empty body in 'for' statement
locale.c:1113: warning: comparison between signed and unsigned integer expressions
|
|
|
|
|
|
| |
pass_freed_arg: Passing freed pointer save_input_locale as an argument to PerlIO_printf.
The savepv-ed strings were being printed after they had been freed.
|