| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
This reverts commit 148f39b7de6eae9ddd59e0b0aff691d6abea7aca.
(Still needs more work, but wanted to see how well this passed with Jenkins.)
|
|
|
|
|
|
| |
Definitely not *after* it. It marks the start of the unreachable,
not the first unrechable line. And if they are in that order,
it looks better to linebreak after the lint hint.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This problem turns out to be a misspelling in two places of a compiler
definition. Since the definition didn't exist (as it was misspelled),
the #ifdef failed.
I don't know how really to test this as it is locale collation, which
varies by locale, and we would be relying on vendor-supplied locales
which may be inconsistent between platforms. I intend to tackle
improvements to collaction later this release cycle, and should come up
with tests at that time. The failing tests in the module were comparing
the Perl sort results with those of the module, and finding they differ.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I consider this experimental, so that if code breaks as a result, we
will remove it.
I chose the numeric warnings category. But misc or a new subcategory of
numeric might be better choices.
There is also the issue if someone is calculating the repeat count in
floating point and gets something that would be 0 if there were infinite
precision, but ends up being a very small negative number. The current
implementation will warn on that, but probably shouldn't. I suspect that
this would be extremely rare in practice.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- after return/croak/die/exit, return/break are pointless
(break is not a terminator/separator, it's a goto)
- after goto, another goto (!) is pointless
- in some cases (usually function ends) introduce explicit NOT_REACHED
to make the noreturn nature clearer (do not do this everywhere, though,
since that would mean adding NOT_REACHED after every croak)
- for the added NOT_REACHED also add /* NOTREACHED */ since
NOT_REACHED is for gcc (and VC), while the comment is for linters
- declaring variables in switch blocks is just too fragile:
it kind of works for narrowing the scope (which is nice),
but breaks the moment there are initializations for the variables
(the initializations will be skipped since the flow will bypass
the start of the block); in some easy cases simply hoist the declarations
out of the block and move them earlier
Note 1: Since after this patch the core is not yet -Wunreachable-code
clean, not enabling that via cflags.SH, one needs to -Accflags=... it.
Note 2: At least with the older gcc 4.4.7 there are far too many
"unreachable code" warnings, which seem to go away with gcc 4.8,
maybe better flow control analysis. Therefore, the warning should
eventually be enabled only for modernish gccs (what about clang and
Intel cc?)
|
|
|
|
|
|
|
| |
This reverts commit 8c2b19724d117cecfa186d044abdbf766372c679.
I don't understand - smoke-me came back happy with three
separate reports... oh well, some other time.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- after croak/die/exit (or return), break (or return!) are pointless
(break is not a terminator/separator, it's a promise of a jump)
- after goto, another goto (!) is pointless
- in some cases (usually function ends) introduce explicit NOT_REACHED
to make the noreturn nature clearer (do not do this everywhere, though,
since that would mean adding NOT_REACHED after every croak)
- for the added NOT_REACHED also add /* NOTREACHED */ since
NOT_REACHED is for gcc (and VC), while the comment is for linters
- declaring variables in switch blocks is just too fragile:
it kind of works for narrowing the scope (which is nice),
but breaks the moment there are initializations for the variables
(they will be skipped!); in some easy cases simply hoist the declarations
out of the block and move them earlier
There are still a few places left.
|
| |
|
|
|
|
|
|
|
|
| |
This meant sprinkling some PERL_UNUSED_CONTEXT invocations,
as well as stopping some functions from getting my_perl in
the first place; all of the functions in the latter category
are internal (S_ prefix and s or i in embed.fnc), so
this should be both safe and economical.
|
|
|
|
| |
This silences a chunk of warnings under -Wformat
|
|
|
|
|
|
| |
After commits d6ded95025185cb1ec8ca3ba5879cab881d8b180 and
130c5df3625bd130cd1e2771308fcd4eb66cebb2, there are some compilation
warnings if not all locale categories are used.
|
|
|
|
|
|
|
|
| |
Commit 130c5df3625bd130cd1e2771308fcd4eb66cebb2 introduced errors into
Windows (at least) compilations because it used #if's in the middle of
apparent function calls, but these were really macros that turned the
function call foo() into a call of Perl_foo(), and so we were doing
an #if from within a #define which is not generally legal.
|
|
|
|
|
|
|
|
| |
Commit d6ded95025185cb1ec8ca3ba5879cab881d8b180 introduced
the ability to specify individual category parameters to 'use locale'.
However in doing so, it causes Perl to not be able to compile on
platforms that don't have some or all of those categories defined, such
as Android. This commit uses #ifdefs to remedy that.
|
|
|
|
| |
This is for comprehensibility and to make a future commit easier.
|
|
|
|
|
|
|
| |
This commit allows one to specify to enable locale-awareness for only a
specified subset of the locale categories. Thus you could make a
section of code LC_MESSAGES aware, with no locale-awareness for the
other categories.
|
|
|
|
|
|
|
|
|
| |
Fix by petermartini.
Fix for Coverity perl5 CID 28909:
Out-of-bounds access (ARRAY_VS_SINGLETON)
ptr_arith: Using &unsliced_keysv as an array.
This might corrupt or misinterpret adjacent memory locations.
|
|
|
|
|
|
|
|
|
|
|
| |
-move PL_stack_sp and PL_stack_base reads into the branch in which they
are used, this also removes 1 var from being saved across the function
call in GIMME, which removes saving and restoring 1 non-vol register
-write SP to PL_stack_sp (PUTBACK) only if it was changed
-POPMARK is mutable, it must execute on all branches
this reduced pp_list's machine code size of the function from 0x58 to
0x53 bytes on VC 2003 -01 32 bits
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Under threaded builds, GVs for OPs are stored in the pad rather than
being directly attached to the op. For some reason, all such GV's were
getting the SvPADTMP flag set. There seems to be be no good reason for
this, and after skipping setting the flag, all tests still pass.
The advantage of not setting this flag is that there are quite a few
hot places in the code that do
if (SvPADTMP(sv) && !IS_PADGV(sv)) {
...
I've replaced them all with
if (SvPADTMP(sv)) {
assert(!IS_PADGV(sv));
...
Since the IS_PADGV() macro expands to something quite heavyweight, this is
quite a saving: for example this commit reduces the size of pp_entersub by
111 bytes.
|
|
|
|
|
|
| |
av_tindex is a more clearly named synonym for av_len, available starting
in v5.18. This changes the core uses to it, including modules in /ext,
which are not dual-lifed.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 4571f4a caused the gp to have a refcount of 1, not 0, in
gp_free when the contents of the glob are freed. This makes
gv_try_downgrade see the gv as a candidate for downgrading in
this example:
sub Fred::AUTOLOAD { $Fred::AUTOLOAD }
undef *{"Fred::AUTOLOAD"};
When the glob is undefined, the sub inside it is freed, and the
gvop ($Fred::AUTOLOAD), when freed, tries to downgrade the glob
(*Fred::AUTOLOAD). Since it is empty, it deletes it completely from
the containing stash, so the GV is freed out from under gp_free, which
is still using it, causing an assertion failure.
We can trigger a similar condition more explicitly:
$ ./miniperl -e 'DESTROY{delete $::{foo}} ${"foo"} = bless []; undef *{"foo"}'
This bug is nothing new. On a non-debugging 5.18.2, I get this:
$ perl5.18.2 -e 'DESTROY{delete $::{foo}} ${"foo"} = bless []; undef *{"foo"}'
Attempt to free unreferenced glob pointers at -e line 1.
Segmentation fault: 11
That crashes in pp_undef after the call to gp_free, becaues pp_undef
continues to manipulate the GV.
The problem occurs not only with pp_undef, but also with other func-
tions calling gp_free:
sv_setsv_flags:
$ ./miniperl -e 'DESTROY{delete $::{foo}} ${"foo"} = bless []; *{"foo"}="bar"'
glob_assign_glob:
$ ./miniperl -e 'DESTROY{delete $::{foo}} ${"foo"} = bless []; *{"foo"}=*bar'
sv_unglob, reached through various paths:
$ ./miniperl -e 'DESTROY{delete $::{foo}} ${"foo"} = do {local *bar}; $${"foo"} = bless []; ${"foo"} = 3'
$ ./miniperl -e 'DESTROY{delete $::{foo}} ${"foo"} = do {local *bar}; $${"foo"} = bless []; utf8::encode(${"foo"})'
$ ./miniperl -e 'DESTROY{delete $::{foo}} ${"foo"} = do {local *bar}; $${"foo"} = bless []; open bar, "t/TEST"; ${"foo"} .= <bar>'
$ ./miniperl -e 'DESTROY{delete $::{foo}} ${"foo"} = do {local *bar}; $${"foo"} = bless []; ${"foo"}++'
$ ./miniperl -e 'DESTROY{delete $::{foo}} ${"foo"} = do {local *bar}; $${"foo"} = bless []; undef ${"foo"}'
$ ./miniperl -e 'DESTROY{delete $::{foo}} ${"foo"} = 3; ${"foo"} =~ s/3/${"foo"} = do {local *bar}; $${"foo"} = bless []; 4/e'
And there are probably more ways to trigger this through sv_unglob.
(I stopped looking when I thought of the fix.)
This patch fixes the problem by protecting the GV using the mortals
stack in functions that call gp_free. I did not change gp_free
itself, since it is an API function that as yet does not touch the
mortals stack, and I am not sure that should change. All of its
callers that this patch touches already do sv_2mortal in some cir-
cumstances.
|
|
|
|
|
|
| |
The only time the result of toFOLD_LC() can be larger than a byte is
in a UTF-8 locale, which has already been ruled out for this section of
code.
|
|
|
|
|
|
|
|
|
|
| |
RXf_IS_ANCHORED as a replacement
The only requirement outside of the regex engine is to identify that there is
an anchor involved at all. So we move the 4 anchor flags to intflags and replace
it with a single aggregate flag RXf_IS_ANCHORED in extflags.
This frees up another 3 bits in extflags.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This large (sorry, I couldn't figure out how to meaningfully split it
up) commit causes Perl to fully support LC_CTYPE operations (case
changing, character classification) in UTF-8 locales.
As a side effect it resolves [perl #56820].
The basics are easy, but there were a lot of details, and one
troublesome edge case discussed below.
What essentially happens is that when the locale is changed to a UTF-8
one, a global variable is set TRUE (FALSE when changed to a non-UTF-8
locale). Within the scope of 'use locale', this variable is checked,
and if TRUE, the code that Perl uses for non-locale behavior is used
instead of the code for locale behavior. Since Perl's internal
representation is UTF-8, we get UTF-8 behavior for a UTF-8 locale.
More work had to be done for regular expressions. There are three
cases.
1) The character classes \w, [[:punct:]] needed no extra work, as
the changes fall out from the base work.
2) Strings that are to be matched case-insensitively. These form
EXACTFL regops (nodes). Notice that if such a string contains only
characters above-Latin1 that match only themselves, that the node can be
downgraded to an EXACT-only node, which presents better optimization
possibilities, as we now have a fixed string known at compile time to be
required to be in the target string to match. Similarly if all
characters in the string match only other above-Latin1 characters
case-insensitively, the node can be downgraded to a regular EXACTFU node
(match, folding, using Unicode, not locale, rules). The code changes
for this could be done without accepting UTF-8 locales fully, but there
were edge cases which needed to be handled differently if I stopped
there, so I continued on.
In an EXACTFL node, all such characters are now folded at compile time
(just as before this commit), while the other characters whose folds are
locale-dependent are left unfolded. This means that they have to be
folded at execution time based on the locale in effect at the moment.
Again, this isn't a change from before. The difference is that now some
of the folds that need to be done at execution time (in regexec) are
potentially multi-char. Some of the code in regexec was trivial to
extend to account for this because of existing infrastructure, but the
part dealing with regex quantifiers, had to have more work.
Also the code that joins EXACTish nodes together had to be expanded to
account for the possibility of multi-character folds within locale
handling. This was fairly easy, because it already has infrastructure
to handle these under somewhat different circumstances.
3) In bracketed character classes, represented by ANYOF nodes, a new
inversion list was created giving the characters that should be matched
by this node when the runtime locale is UTF-8. The list is ignored
except under that circumstance. To do this, I created a new ANYOF type
which has an extra SV for the inversion list.
The edge case that caused the most difficulty is folding involving the
MICRO SIGN, U+00B5. It folds to the GREEK SMALL LETTER MU, as does the
GREEK CAPITAL LETTER MU. The MICRO SIGN is the only 0-255 range
character that folds to outside that range. The issue is that it
doesn't naturally fall out that it will match the CAP MU. If we let the
CAP MU fold to the samll mu at compile time (which it can because both
are above-Latin1 and so the fold is the same no matter what locale is in
effect), it could appear that the regnode can be downgraded away from
EXACTFL to EXACTFU, but doing so would cause the MICRO SIGN to not case
insensitvely match the CAP MU. This could be special cased in regcomp
and regexec, but I wanted to avoid that. Instead the mktables tables
are set up to include the CAP MU as a character whose presence forbids
the downgrading, so the special casing is in mktables, and not in the C
code.
|
|
|
|
|
|
|
|
|
|
| |
The documentation says that Perl taints certain operations when subject
to locale rules, such as lc() and ucfirst(). Prior to this commit
there were exceptions when the operand to these functions contained no
characters whose case change actually varied depending on the locale,
for example the empty string or above-Latin1 code points. Changing to
conform to the documentation simplifies the core code, and yields more
consistent results.
|
|
|
|
|
|
| |
I kept getting burned by these macros returning non-zero instead of a
boolean, as they are used as bool parameters to functions. So I added
cBOOLs to them.
|
|
|
|
|
| |
An unsigned character (U8) should not have more than 8 bits of data, so
no need to force that by masking with 0xFF.
|
|
|
|
|
|
|
| |
This code got the actual length of the input scalar, but discarded it.
If that scalar contains malformed UTF-8 that has fewer bytes than is
indicated, a read beyond-buffer-end could happen. Simply use the actual
length.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Originally, lc and uc would not warn about undef, due to an implemen-
tation detail.
The implementation changed in 673061948, and extra code was added to
keep the behaviour the same.
Commit 0a0ffbced enabled the warnings about undef, but did so by added
even more code in the midst of the blocks that existed solely to avoid
the warning.
We can just delete those blocks and put in a simple stringification.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
pp.c:pp_lc has this:
/* Here is where we would do context-sensitive actions. See the
* commit message for this comment for why there isn't any */
If I try to look up the commit that added the comment, I get this:
commit 06b5486afd6f58eb7fdf8c5c8cdb8520a4c87f40
Author: Karl Williamson <public@khwilliamson.com>
Date: Fri Nov 11 10:13:28 2011 -0700
pp.c: White-space only
This outdents and reflows comments as a result of the removal of a
surrounding block
86510fb15 was the commit that added the comment, whose commit message
contains the explanation, so cite that directly.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It used to be that this code:
for("$foo") {
lc $_;
...
}
would modify $_, allowing other code in the ‘for’ block to see the
changes (bug #43207). Commit 17fa077605 fixed that by changing the
logic that determined whether lc/uc(first) could modify the sca-
lar in place.
In doing so, it stopped in-place modification from happening at all,
because the condition became SvPADTMP && SvTEMP, which never happens.
(SvPADTMP unually indicates an operator return value stored in a pad;
i.e., a scalar that will next be used by the same operator again to
return another value. SvTEMP indicates that the REFCNT will go down
shortly, usually a temporary value created solely for the sake of
returning something.)
Now that bug #78194 is fixed, for("$foo") no longer exposes a PADTMP
to the following code, so we *can* now assume (as was done erroneously
before) that PADTMP indicates something like lc("$foo$bar") and modify
pp_stringify’s return value in place.
Also, we can extend this to apply to TEMP variables that have a ref-
erence count of 1, since they cannot be in use elsewhere. We skip
TEMP variables with set-magic, because they could be tied, and
SvSETMAGIC would have a side effect. (That could happen with
lc(delete $h{tied_elem}).)
Previously, this was skipped for uc and lc for overloaded references,
since stringification could change the utf8ness. That is no longer
sufficient. As of Perl 5.16, typeglobs and non-overloaded blessed
references can also enable their utf8 flag upon stringification, if
the stash or glob names contains wide characters. So I changed the
!SvAMAGIC (not overloaded) to SvPOK (is a string already), which will
cover most cases where this optimisation helps. The two tests added
to the end of lc.t fail with !SvAMAGIC.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
A suggested way of avoiding the the warning on nv1 != nv2
by replacing it with (nv1 < nv2 || nv1 > nv2), has too many issues
with NaN. [perl #120538].
I haven't found any other way of selectively disabling the warning,
so for now I'm just reverting the whole commit.
This reverts commit c279c4550ce59702722d0921739b1a1b92701b0d.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The gcc option -Wfloat-equal warns when two floating-point numbers
are directly compared for equality or inequality, the idea being that
this is usually a logic error, and that you should be checking that the
values are instead very near to each other.
perl on the other hand has lots of reasons to do a direct comparison.
Add two macros, NV_eq_nowarn(a,b) and NV_eq_nowarn(a,b)
that do the same as (a == b) and (a != b), but without the warnings.
They achieve this by instead doing (a < b) || ( a > b).
Under gcc at least, this is optimised into the same code as the direct
comparison.
The are three places that I've left untouched, because they are handling
NaNs, and that gets a bit tricky. In particular (nv != nv) is a test for a
NaN, and replacing it with (< || >) creates signalling NaNs (whereas ==
and != create quiet NaNs)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The way CORE:: was handled in the lexer was convoluted.
CORE was treated initially as a keyword, with exceptions in the lexer
to make it behave correctly. If it turned out not to be followed
by ::, then the lexer would fall back to treating it as a bareword
or sub name.
Before even checking for a keyword, the lexer looks for :: and goes
to the bareword/sub code. But it made a special exception there
for CORE::.
In the end, treating CORE as a keyword recognized by the keyword()
function requires more special cases than simply special-casing CORE::
in toke.c.
This fixes the lexical CORE sub bug, while reducing the total num-
ber of lines.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As of v5.16.0-81-g234df27, SvAMAGIC has only meant potentially over-
loaded. When method changes occur, the flag is turned on. When over-
loading tables are calculated (the first time overloading is used),
the flag is turned off if it turns out there is no overloading.
At the time I did that, I assumed that all uses of SvAMAGIC were to
avoid the inefficient code path for non-overloaded objects. What I
did not notice at the time was that SvAMAGIC is used in pp_bless to
determine whether an object used as a class name should be exempt from
‘Attempt to bless into a reference’.
Hence, the bizarre result:
$ ./perl -Ilib -e 'sub foo{} bless [], bless []'
$ ./perl -Ilib -e 'bless [], bless []'
Attempt to bless into a reference at -e line 1.
This commit makes both die consistently, as they did in 5.16.
|
|
|
|
|
| |
There is no reason tied (or otherwise magical variables like $/)
should be exempt from the ‘Attempt to bless into a reference’ error.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When PERL_DONT_CREATE_GVSV is defined, perl generally does not vivify
the scalar slot in every GV. But it hides that implementation detail
by vivifying it when *foo{SCALAR} is accessed.
undef(*foo) was one exception to this. It vivified the scalar in
the scalar slot regardless of whether PERL_DONT_CREATE_GVSV was
defined.
Until recently, it was not safe to remove this exception, because a
typeglob with no scalar could be a candidate for downgrading (see
gv.c:gv_try_downgrade), causing global pointers like PL_DBgv to point
to freed SVs. Recent commits have fixed all those cases, so this
is now safe.
|
|
|
|
| |
These get rid of some "possible loss of data" warnings
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
kvaslice operator that imlements %a[0,2,4] syntax which
result in list of index/value pairs. Implemented in
consistency with "key/value hash slice" operator.
|
|
|
|
|
|
| |
kvhslice operator that implements %h{1,2,3,4} syntax which
returns list of key value pairs rather than just values
(regular slices).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Based on Yves's random branch work.
This version makes the new random number visible to external modules,
for example, List::Util's XS shuffle() implementation.
I've also added a 64-bit implementation when HAS_QUAD is true, this
should be significantly faster, even on 32-bit CPUs. This is intended to
produce exactly the same sequence as the original implementation.
The original version of this commit retained the "freebsd" name from
Yves's original work for the function and data structure names. I've
removed "freebsd" from most function names so the name isn't an issue
if we choose to replace the implementation,
|
|
|
|
|
|
|
| |
This removes a macro not yet even in a development release, and splits
its calls into two classes: those where the input is a byte; and those
where it can be any unsigned integer. The byte implementation avoids a
function call on EBCDIC platforms.
|
|
|
|
|
|
|
| |
Commit ce0d59f changed AVs to use NULLs for nonexistent elements.
pp_splice needs to take that into account and avoid pushing NULLs on
to the stack.
|
|
|
|
| |
Make a ternary operation more clear
|
|
|
|
|
|
|
|
| |
Now that the Unicode tables are stored in native format, we shouldn't be
doing remapping.
Note that this assumes that the Latin1 casing tables are stored in
native order; not all of this has been done yet.
|
|
|
|
|
|
|
|
| |
The conversion from UTF-8 to code point should generally be to the
native code point. This adds a macro to do that, and converts the
core calls to the existing macro to use the new one instead. The old
macro is retained for possible backwards compatibility, though it
probably should be deprecated.
|