| Commit message | Author | Age | Files | Lines |
| |
MSVC, due to a bug, doesn't merge identical vars between .o files or discard
these vars and their contents.
MEM_WRAP_CHECK_2 has never been used outside of core, according to a cpan grep.
MEM_WRAP_CHECK_2 was removed on the "have PERL_MALLOC_WRAP" branch in
commit fabdb6c0879 "pre-likely cleanup" without explanation, probably because
it was unused. But MEM_WRAP_CHECK_2 was still left on the "no
PERL_MALLOC_WRAP" branch, so remove it from the "no" side for tidiness,
since it was a mistake to leave it there once it was removed from the "yes"
side of the #ifdef.
Add a MEM_WRAP_CHECK_s API; the letter "s" means the argument is a string
literal (static). This lets us get rid of the "%s" argument passed to
Perl_croak_nocontext at a couple of call sites, since we fully control the
next and only argument and it's guaranteed to be a string literal. This
allows the linker to merge the two "Out of memory during array extend"
C strings.
Also change the 2 op.h messages into macros which become string literals
at their call sites, instead of the "read char * from a global char **"
that was going on before.
VC 2003 32-bit perl527.dll section virtual sizes, before:
  .text  DE503
  .rdata 4B621
after:
  .text  DE503
  .rdata 4B5D1
|
|
|
|
| |
whitespace-only change left over from my recent tr///c fix work
|
|
|
|
|
| |
I inspected the code, and there is no problem here; it's a compiler
mistake. Nevertheless, simply initializing the variable silences it.
|
|
|
|
| |
Indent to correspond with the new block placed by the previous commit.
|
| |
This is already a fatal error for operations whose outcome depends on
them, but in things like
"abc" & "def\x{100}"
the wide character doesn't actually need to participate in the AND, and
so perl doesn't. As a result of the discussion in the thread beginning
with http://nntp.perl.org/group/perl.perl5.porters/244884, it was
decided to deprecate these ones too.
|
|
|
|
| |
This is slightly cleaner than hand rolling the min.
|
|
|
|
| |
Replace each with a more appropriate type
|
|
|
|
|
|
| |
Change the signature of all the internal do_trans*() functions to return
Size_t rather than I32, so that the count returned by tr/// can cope with
strings longer than 2Gb.
|
| |
The run-time code to handle a non-utf8 tr/// against a utf8 string
is complex, with many variants of similar code repeated depending on the
presence of the /s and /c flags.
Simplify them all into a single code block by changing how the translation
table is stored. Formerly, the tr struct contained possibly two tables:
the basic 0-255 slot one, plus in the presence of /c, a second one
to map the implicit search range (\x{100}...) against any residual
replacement chars not consumed by the first table.
This commit merges the two tables into a single unified whole. For example
tr/\x00-\xfe/abcd/c
is equivalent to
tr/\xff-\x{7fffffff}/abcd/
which generates a 259-entry translation table consisting of:
0x00 => -1
0x01 => -1
...
0xfe => -1
0xff => a
0x100 => b
0x101 => c
0x102 => d
In addition we store:
1) the size of the translation table (0x103 in the example above);
2) an extra 'wildcard' entry stored 1 slot beyond the main table,
which specifies the action for any codepoints outside the range of
the table (i.e. chars 0x103..0x7fffffff). This can be either:
a) a character, when the last replacement char is repeated;
b) -1 when /c isn't in effect;
c) -2 when /d is in effect;
d) -3 identity: when the replacement list is empty but not /d.
In the example above, this would be
0x103 => d
The addition of -3 as a valid slot value is new.
This makes the main runtime code for the utf8 string with non-utf8 tr//
case look like, at its core:
size = tbl->size;
mapped_ch = tbl->map[ch >= size ? size : ch];
which then processes mapped_ch based on whether it's >= 0, or -1/-2/-3.
This is a lot simpler than the old scheme, and should generally be faster
too.
|
| |
RT #132608
In the non-utf8 case, the /c (complement) flag to tr adds an implied
\x{100}-\x{7fffffff} range to the search charlist. If the replacement list
contains more chars than are paired with the 0-255 part of the search
list, then the excess chars are stored in an extended part of the table.
The excess char count was being stored as a short, which caused problems
if the replacement list contained more than 32767 excess chars: either
substituting the wrong char, or substituting for a char located up to
0xffff bytes in memory before the real translation table.
So change it to SSize_t.
Note that this is only a problem when the search and replacement charlists
are non-utf8, the replacement list contains around 0x8000+ entries, and
where the string being translated is utf8 with at least one codepoint >=
U+8000.
|
| |
Originally, the op_pv of an OP_TRANS op pointed to a 256-slot array of
shorts, which contained the translations. However, in the presence of
tr///c, extra information needs to be stored to handle utf8 strings.
The 256-slot array was extended, with slot 0x100 holding a length,
and slots 0x101 onwards holding some extra chars.
This has made things a bit messy, so this commit adds two structs,
one being an array of 256 shorts, and the other being the same but with
some extra fields. So for example tbl->[0x100] has been replaced with
tbl->excess_len.
This commit should make no functional difference, but will allow us
shortly to fix a bug by changing the type of the excess_len field from
short to something bigger, for example.
|
|
|
|
| |
Outdent a code block following the previous commit.
|
| |
In transliterations where the search and replacement charlists are
non-utf8, but where the string being modified contains codepoints >=
0x100, then tr/.../.../cd would always delete all such codepoints, rather
than potentially mapping some of them.
In more detail: in the presence of /c (complement), an implicit
0x100..0x7fffffff is added to a non-utf8 search charlist. If the
replacement list is longer than the < 0x100 part of the search list, then
the last few replacement chars should in principle be paired off against
the first few of (\x100, \x101, ...). However, this wasn't happening. For
example,
tr/\x00-\xfd/ABCD/cd
should be equivalent to
tr/\xfe-\x{7fffffff}/ABCD/d
which should
map:
\xfe => A,
\xff => B,
\x{100} => C,
\x{101} => D,
and delete \x{102} onwards.
But instead, it behaved like
tr/\xfe-\x{7fffffff}/AB/d
and deleted all codepoints >= 0x100.
This commit fixes that by using the extended mapping table format
for all /c variants (formerly it excluded /cd).
I also changed a variable holding the mapped char from being I32 to UV:
principally to avoid a casting mess in the fixed code. This may (or may
not), as a side-effect, have fixed possible issues with very large
codepoints.
|
| |
For non-utf8, OP_TRANS(R) ops have a translation table consisting of an
array of 256 shorts attached. For tr///c, this table is extended to hold
information about chars in the replacement list which aren't paired with
chars in the search list. For example,
tr/\x00-AE-\xff/bcdefg/c
is equivalent to
tr/BCD\x{100}-\x{7fffffff}/bcdefg/
which is equivalent to
tr/BCD\x{100}-\x{7fffffff}/bcdefggggggggg..../
Only the BCD => bcd mappings can be stored in the basic 256-slot table,
so potentially the following extra information needs recording in an
extended table to handle codepoints > 0xff in the string being modified:
1) the extra replacement chars ("efg");
2) the number of extra replacement chars (3);
3) the "repeat" char ('g').
Currently 2) and 3) are combined: the repeat char is found as the last
extra char, and if there are no extra chars, the repeat char is treated
as an extra char list of length 1.
Similarly, an 'extra chars' length value of 1 can imply either one extra
char, or no extra chars with the repeat char being faked as an extra char.
An 'extra chars' length of 0 implies an empty replacement list, i.e.
tr/....//c.
This commit changes it so that the repeat char is *always* stored (in slot
0x101), with the extra chars stored beginning at slot 0x102.
The 'extra chars' length value (located at slot 0x0100) has changed its
meaning slightly: now
-1 implies tr/....//c
0 implies no more replacement chars than search chars
1+ the number of excess replacement chars.
This (should) make no functional difference, but the extra information
stored will make it easier to fix some bugs shortly.
|
| |
This:
DEBUG_t( Perl_deb(aTHX_ "2.TBL\n"));
has been around in one form or another since perl1, but it has made no sense
since perl5.000, where -Dt shows the name of the op being executed.
|
| |
For the various C functions which implement the compile-time and
run-time aspects of OP_TRANS, add some basic code comments at the top of
each function explaining what its purpose is.
Also add lots of code comments to the body of S_pmtrans() (which compiles
a tr///).
Also comment what the OPpTRANS_ private flag bits mean.
No functional changes.
|
|
|
|
|
|
|
| |
This commit changes 3 occurrences of byte-at-a-time checking of whether a
string is invariant under UTF-8 to use the inlined
is_utf8_invariant_string(), which now does a much faster word-at-a-time
check.
|
| |
The newish function hv_pushkv() currently just pushes all key/value pairs on
the stack. i.e. it does the equivalent of the perl code '() = %h'.
Extend it so that it can handle 'keys %h' and 'values %h' too.
This is basically moving the remaining list-context functionality out of
do_kv() and into hv_pushkv().
The rationale for this is that hv_pushkv() is a pure HV-related function,
while do_kv() is a pp function for several ops including OP_KEYS/VALUES,
and expects PL_op->op_flags/op_private to be valid.
|
| |
...and make pp_padhv(), pp_rv2hv() use it rather than Perl_do_kv().
Both pp_padhv() and pp_rv2hv() (via S_padhv_rv2hv_common()) outsource to
Perl_do_kv() the list-context pushing/flattening of a hash onto the
stack.
Perl_do_kv() is a big function that handles all the actions of
keys, values etc. Instead, create a new function which does just the
pushing of a hash onto the stack.
At the same time, split it out into two loops, one for tied, one for
normal: the untied one can skip extending the stack on each iteration,
and use a cheaper HeVAL() instead of calling hv_iterval().
|
|
|
|
|
|
| |
This op doesn't use that bit, but it calls the function Perl_do_kv(),
which is called by several different ops which *do* use that bit.
So ensure no-one in future thinks that bit is spare in OP_VALUES.
|
|
|
|
| |
Use this symbolic constant rather than the literal constant '3'.
|
| |
This function can be called directly or indirectly by several ops.
Update its code comments to explain this in detail, and assert which
ops can call it. Also remove a redundant comment about
OP_RKEYS/OP_RVALUES; these ops have been removed.
Also, reformat the 'dokv = ' expressions.
Finally, add some code comments to pp_avhvswitch explaining what it's for.
Apart from the op_type asserts, there should be no functional changes.
|
| |
Commit 08b6664b858b8fd4b5c0c27542763337b6d78e46 breaks things like
$foo = "" & "\x{100}";
We have deprecated using above-FF code points in bitwise operations, and
made them illegal in 5.27. However, the case where the illegal code
points don't play a part in the operation never raised deprecation
warnings. The example above is one such, because the \x{100} comes
after the operation stops since the other operand has length 0.
We can't make something illegal without warning people about it for 2
releases.
Rather than revert that commit, and reinstate a bunch of slow code that
is far more general than now needed, this commit adds some extra code to
deal with these situations, but the basic operations still take place in
tight loops, which 08b6664b858b8fd4b5c0c27542763337b6d78e46 caused to
happen.
In the case of "&", the illegal code points get truncated away. In the
case of ^ and |, they get catenated as-is. This preserves earlier
behavior.
It has not been decided if these should at least warn, or the usage
should be deprecated. A commit can easily be done to change to whatever
the final decision is, but this commit doesn't raise any warnings, hence
preserves existing behavior.
The breaking commit looks like it might create some havoc with CPAN, and
fixing it now will save the CPAN testers effort, as they won't have to
deal with a bunch of broken distributions.
|
|
|
|
| |
Outdent, since the previous commit removed an enclosing block
|
| |
Commit 5d09ee1cb7b68f5e6fd15233bfe5048612e8f949 fatalized bitwise
operations of operands with wide characters in them. It retained the
regular UTF-8 handling, but throws an error when a wide character is
encountered.
But this code is complicated because of its original intended
generality. It can essentially be ripped out, replaced by code that
just downgrades the operand to non-UTF-8. Then we use the regular code
to do the operation. In the complement case, that's all that need be
done to mimic earlier behavior, as the result has not been in UTF-8.
For the other operations, the result is simply upgraded to UTF-8.
This removes quite a few lines of code, and now the UTF-8 handling uses
the same tight loops as the non-UTF-8 case. Downgrading and upgrading had
to be done specially before, but now they are done in tight loops, before
the operation and after it.
|
| |
It was formerly I32. It should be unsigned since you can't have a negative
number of args. And although you're unlikely to call sprintf with more
than 0x7fffffff args, it makes it more consistent with other APIs which
we've been gradually expanding to 64-bit/ptrsize. It also makes the
code internal to Perl_sv_vcatpvfn_flags more consistent, when
dealing with explicit arg index formats like "%10$s". This function still
has a mix of STRLEN (for string lengths) and Size_t (for arg indexes)
but they are aliases for each other.
I made Perl_do_sprintf()'s len arg SSize_t rather than Size_t, since
it typically gets called with ptr diff arithmetic. Not sure if this is
being overly cautious.
|
|
|
|
|
|
| |
This commit removes quite a number of tests, mostly from t/op/bop.t,
which test the behaviour of such code points in combination with
bitwise operators. Since the behaviour is now fatal, the tests are no
longer useful.
|
|
|
|
|
|
|
| |
RT# 129300
This hash-dumping debugging flag corrupted hash values and has probably
not been used by anyone in 20 years.
|
| |
|
|
|
|
| |
This will make this consistent with the bitwise operators.
|
| |
RT #131083
Recent commits v5.25.10-81-gd69c430 and v5.25.10-82-g67dd6f3 added
out-of-range/overflow checks for the offset arg of vec(). However in
lvalue context, these croaks now happen before the SVt_PVLV was created,
rather than when its set magic was called. This means that something like
sub f { $x = $_[0] }
f(vec($s, -1, 8))
now croaks even though the out-of-range value never ended up getting used
in lvalue context.
This commit fixes things by having pp_vec(), rather than croaking, just set
flag bits in LvFLAGS() to indicate that the offset is negative / out of
range. Then Perl_magic_getvec() returns 0 if these flags are set, and
Perl_magic_setvec() croaks with a suitable error.
|
| |
RT #130915
In something like
vec($str, $bignum, 16)
(i.e. where $str is treated as a series of 16-bit words), Perl_do_vecget()
and Perl_do_vecset() end up doing calculations equivalent to:
$start = $bignum*2;
$end = $start + 2;
Currently both these calculations can wrap if $bignum is near the maximum
value of a STRLEN (the previous commit already fixed cases for $bignum >
max(STRLEN)).
So this commit makes them check for potential overflow before doing such
calculations.
It also takes account of the fact that the previous commit changed the
type of offset from signed to unsigned.
Finally, it also adds some tests to t/op/vec.t for where the 'word'
overlaps the end of the string, for example
$x = vec("ab", 0, 64)
should behave the same as:
$x = vec("ab\0\0\0\0\0\0", 0, 64)
This uses a separate code path, and I couldn't see any tests for it.
This commit is based on an earlier proposed fix by Aaron Crane.
|
| |
... and fix up its caller, pp_vec().
This is part of a fix for RT #130915.
pp_vec() is responsible for extracting out the offset and size from SVs on
the stack, and then calling do_vecget() with those values. (Sometimes the
call is done indirectly by storing the offset in the LvTARGOFF() field of
a SVt_PVLV, then later Perl_magic_getvec() passes the LvTARGOFF() value to
do_vecget().)
Now SvCUR, SvLEN and LvTARGOFF are all of type STRLEN (a.k.a. Size_t),
while the offset arg of do_vecget() is of type SSize_t (i.e. there's a
signed/unsigned mismatch). It makes more sense to make the arg of type
STRLEN. So that is what this commit does.
At the same time this commit fixes up pp_vec() to handle all the
possibilities where the offset value can't fit into a STRLEN, returning 0
or croaking accordingly, so that do_vecget() is never called with a
truncated or wrapped offset.
The next commit will fix up the internals of do_vecget() and do_vecset(),
which have to worry about offset*(2^n) wrapping or being > SvCUR().
This commit is based on an earlier proposed fix by Aaron Crane.
|
|
|
|
|
|
| |
Some vars have been tagged as const because they do not change in their
new scopes. In pp_reverse in pp.c, I32 tmp is only used to hold a char,
so is changed to char.
|
| |
TonyC's recent commit v5.25.6-172-gdc529e6 updated do_vop() to avoid
doing a sv_catpvn() when the left and destination SVs are the same.
As well as being more efficient, it is needed, as a recent change to
sv_catpvn() made it more likely to grow and realloc the buffer, meaning
the copy()'s src buffer had been freed.
This commit represents my parallel attempt to fix the same issue; I'm
replacing Tony's version with mine as it is logically more comprehensive:
it copes with the dest being the same as the right arg as well as the
left, and checks for string pointers being equal rather than SVs being
equal. Neither of these makes any difference currently, but they could in
theory (although it's unlikely) catch some future change in usage.
RT #129995
|
|
|
|
|
|
|
| |
This could call sv_catpvn() with the source string being within the
destination SV, which caused a freed memory access if do_vop() and
sv_catpvn_flags() had different ideas about the ideal size of the
target SV's buffer.
|
| |
|
|
|
|
|
| |
The dual-life dists affected use Devel::PPPort, so can safely use
sv_setpvs() even though it wasn't added until Perl v5.10.0.
|
| |
|
|
|
|
|
|
|
|
| |
The & and &. operators were not appending a null byte to the string
in utf8 mode.
(The internal function that they use is the same. I used &. in the
test just because its intent is clearer.)
|
| |
|
|
|
|
|
| |
&CORE::keys does not yet work as an lvalue. (I’m not sure how to make
that work.)
|
| |
This commit makes perl die when keys(%hash) is returned from an lvalue
sub and the lvalue sub call is assigned to in list assignment:
sub foo : lvalue { keys(%INC) }
(foo) = 3; # death
This prevents an assignment that is completely useless and probably a
mistake, and it makes the lvalue-sub use of keys behave the same way
as (keys(%INC)) = 3.
|
| |
| |
The value of gimme stored in the context stack is U8.
Make all other uses in the main core consistent with this.
My primary motivation on this was that the new function cx_pushblock(),
which I gave a 'U8 gimme' parameter, was generating warnings where callers
were passing I32 gimme vars to it. Rather than play whack-a-mole, it
seemed simpler to just uniformly use U8 everywhere.
Porting/bench.pl shows a consistent reduction of about 2 instructions on
the loop and sub benchmarks, so this change isn't harming performance.
|
|
|
|
|
|
|
| |
See thread starting at
http://nntp.perl.org/group/perl.perl5.porters/227698
Ricardo Signes provided the perldelta and perldiag text.
|
| |
| |
The previous commit made it clear that the N argument to EXTEND()
is supposed to be signed, in particular SSize_t, and now typically
triggers compiler warnings where this isn't the case.
This commit fixes the various places in core that passed the wrong sort of
N to EXTEND(). The fixes are in three broad categories.
First, where sensible, I've changed the relevant var to be SSize_t.
Second, where its expected that N could never be large enough to wrap,
I've just added an assert and a cast.
Finally, I've added extra code to detect whether the cast could
wrap/truncate, and if so set N to -1, which will trigger a panic in
stack_grow().
This also fixes
[perl #125937] 'x' operator on list causes segfault with possible
stack corruption
|
|
|
|
|
|
|
| |
Commit 2b32fed8 removed the PUTBACK/SPAGAIN around hv_iterval and
Perl_sv_setpvf, but didn't take the opportunity to merge the
initialisation with the declaration now that there's no code between
them.
|
| |
|