| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
| |
Removed a XXX section.
|
| |
|
| |
|
| |
|
|
|
|
| |
Spotted by Dan Book
|
|
|
|
|
|
|
|
|
|
| |
6661956a2 was a little too powerful, and, in addition to fixing the
bug that @_ did not properly alias nonexistent elements, also broke
other uses of nonexistent array elements. (See the tests added.)
This commit changes it so that putting @a on the stack does not vivify
all ‘holes’ in @a; instead it creates defelem (deferred element) scalars
for them, and only in lvalue context.
|
|
|
|
| |
The next commit will depend on it.
|
RT #132141
Attributes such as :lvalue have to come *before* the signature to ensure
that they're applied to any code block within the signature; e.g.
sub f :lvalue ($a = do { $x = "abc"; return substr($x,0,1)}) {
....
}
So this commit moves sub attributes to come before the signature. This is
how they were originally, but they were swapped with v5.21.7-394-gabcf453.
This commit is essentially a revert of that commit (and its followups
v5.21.7-395-g71917f6, v5.21.7-421-g63ccd0d), plus some extra work for
Deparse, and an extra test.
See:
RT #123069 for why they were originally swapped
RT #132141 for why that broke :lvalue
http://nntp.perl.org/group/perl.perl5.porters/247999
for a general discussion about RT #132141
|
|
|
|
|
|
| |
There is no "buffer" argument; don't refer to one.
Spotted by KES
|
An unescaped left brace that is meant to be taken literally is
officially deprecated, though there are no plans to remove it in contexts
where we don't expect to use it to mean something else, and no warning
is raised in those contexts.
reg_mesg.t tests the known set of these contexts, currently (after this
commit):
/^{/
/foo|{/
/foo|^{/
/foo(:?{bar)/
/\s*{/
/a{3,4}{/
This commit deprecates this context:
/foo({bar})/
This probably should have been illegal all along when 'bar' is a valid
quantifier, as we do with the other quantifiers that follow a left
paren whose illegality we haven't already taken advantage of to mean
something else:
qr/(+0)/
Quantifier follows nothing in regex
This deprecation will allow ({...}) to be usable for a possible future
regex extension.
|
|
|
|
| |
Indent to correspond with the new block placed by the previous commit.
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is already a fatal error for operations whose outcome depends on
them, but in things like
"abc" & "def\x{100}"
the wide character doesn't actually need to participate in the AND, and
so perl doesn't raise an error. As a result of the discussion in the
thread beginning
with http://nntp.perl.org/group/perl.perl5.porters/244884, it was
decided to deprecate these ones too.
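The distinction drawn above can be sketched in C. This is only an illustration, not perl's implementation: strings are modelled as arrays of code points, and the helper names has_wide_within() and wide_char_participates_in_and() are hypothetical. It assumes a string AND only examines as many positions as the shorter operand has.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if a code point above 0xFF occurs within
 * the first n positions of s (or within all of s, if it is shorter). */
static int has_wide_within(const uint32_t *s, size_t len, size_t n)
{
    size_t limit = len < n ? len : n;
    assert(s != NULL || limit == 0);
    for (size_t i = 0; i < limit; i++)
        if (s[i] > 0xFF)
            return 1;
    return 0;
}

/* Assumption: a string AND only looks at min(len_a, len_b) positions, so
 * a wide character beyond that point never contributes to the result. */
static int wide_char_participates_in_and(const uint32_t *a, size_t len_a,
                                         const uint32_t *b, size_t len_b)
{
    size_t n = len_a < len_b ? len_a : len_b;
    return has_wide_within(a, len_a, n) || has_wide_within(b, len_b, n);
}
```

For "abc" & "def\x{100}" the function returns 0: the wide character sits at position 3, past the end of the shorter operand. That is the case this commit additionally deprecates.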
|
|
|
|
| |
This is slightly cleaner than hand rolling the min.
|
| |
|
|
|
|
| |
Now, 2018 is included.
|
| |
|
|\
This branch does the following:
Fixes an issue with tr/non_utf8/long_non_utf8/c, where
length(long_non_utf8) > 0x7fff.
Fixes an issue with tr/non_utf8/non_utf8/cd: basically, the
implicit \x{100}-\x{7fffffff} added to the searchlist by /c wasn't being
added.
Adds a lot of code comments to the various tr/// functions.
Adds tr///c tests - basically /c was almost completely untested.
Changes the layout of the op_pv transliteration table: it used to be roughly
256 x short - basic table
1 x short - length of extended table (n)
n x short - extended table
where the 2nd and 3rd items were only present under /c. It's now
1 x Size_t - length of table (256+n)
(256+n) x short - table - both basic and extended
where n == 0 apart from under /c.
The new table format also allowed the tr/non_utf8/non_utf8/ code branches
to be considerably simplified.
op_dump() now dumps the contents of the (non-utf8 variant) transliteration
table.
Removes I32's from the tr/non_utf8/non_utf8/ code paths, making it fully
64-bit clean.
Improves the pod for tr///.
|
| |
| |
| |
| | |
Specifically, explain more clearly what the /csd modifiers do.
|
| |
| |
| |
| | |
Replace each with a more appropriate type
|
| |
| |
| |
| |
| |
| | |
Change the signature of all the internal do_trans*() functions to return
Size_t rather than I32, so that the count returned by tr/// can cope with
strings longer than 2GB.
|
| |
| |
| |
| |
| | |
Using I32 to hold char counts etc. is generally a bug. I've replaced it with Size_t.
I've left the swash part of the code alone.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When, for each slot, deciding whether to set OPpTRANS_GROWS, the
calculation is only done in one of 4 possible branches. It turns out that
in the other branches, the condition can never be true; but determining
that is subtle, and the assumption might break for future changes. Move
the test outside the if/else tree so it can be seen to always apply.
So in theory this commit makes no functional difference.
|
| |
| |
| |
| |
| |
| | |
Previously it just displayed its address.
Also, when the table is in fact a swash, don't display its address
on threaded builds, as it's actually just a padix.
|
The run-time code to handle a non-utf8 tr/// against a utf8 string
is complex, with many variants of similar code repeated depending on the
presence of the /s and /c flags.
Simplify them all into a single code block by changing how the translation
table is stored. Formerly, the tr struct contained possibly two tables:
the basic 0-255 slot one, plus in the presence of /c, a second one
to map the implicit search range (\x{100}...) against any residual
replacement chars not consumed by the first table.
This commit merges the two tables into a single unified whole. For example
tr/\x00-\xfe/abcd/c
is equivalent to
tr/\xff-\x{7fffffff}/abcd/
which generates a 259-entry translation table consisting of:
0x00 => -1
0x01 => -1
...
0xfe => -1
0xff => a
0x100 => b
0x101 => c
0x102 => d
In addition we store:
1) the size of the translation table (0x103 in the example above);
2) an extra 'wildcard' entry stored 1 slot beyond the main table,
which specifies the action for any codepoints outside the range of
the table (i.e. chars 0x103..0x7fffffff). This can be either:
a) a character, when the last replacement char is repeated;
b) -1 when /c isn't in effect;
c) -2 when /d is in effect;
d) -3 identity: when the replacement list is empty but not /d.
In the example above, this would be
0x103 => d
The addition of -3 as a valid slot value is new.
This makes the main runtime code for the utf8 string with non-utf8 tr//
case look like, at its core:
size = tbl->size;
mapped_ch = tbl->map[ch >= size ? size : ch];
which then processes mapped_ch based on whether it's >= 0, or -1/-2/-3.
This is a lot simpler than the old scheme, and should generally be faster
too.
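A minimal C sketch of the unified table and its wildcard slot, using the tr/\x00-\xfe/abcd/c example above. The type tr_table, the enum names and both functions are hypothetical stand-ins, not the names used in the perl source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the special slot values described above. */
enum { TR_UNMAPPED = -1, TR_DELETE = -2, TR_IDENTITY = -3 };

typedef struct {
    size_t size;       /* number of ordinary slots (0x103 in the example) */
    const short *map;  /* size + 1 entries; map[size] is the wildcard slot */
} tr_table;

/* Look up one code point; anything at or past 'size' is clamped onto
 * the wildcard slot stored one entry beyond the main table. */
static short tr_lookup(const tr_table *tbl, unsigned long ch)
{
    assert(tbl != NULL && tbl->map != NULL);
    size_t size = tbl->size;
    return tbl->map[ch >= size ? size : ch];
}

/* Fill caller-provided storage (at least 0x104 shorts) with the table
 * for tr/\x00-\xfe/abcd/c and return a view onto it. */
static tr_table tr_example_table(short *storage)
{
    for (size_t i = 0; i <= 0xfe; i++)
        storage[i] = TR_UNMAPPED;
    storage[0xff]  = 'a';
    storage[0x100] = 'b';
    storage[0x101] = 'c';
    storage[0x102] = 'd';
    storage[0x103] = 'd';  /* wildcard: the last replacement char repeats */
    tr_table t = { 0x103, storage };
    return t;
}
```

With this layout every code point, however large, costs one comparison and one array read; the caller then branches on whether the result is >= 0 or one of the three negative markers.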
|
RT #132608
In the non-utf8 case, the /c (complement) flag to tr adds an implied
\x{100}-\x{7fffffff} range to the search charlist. If the replacement list
contains more chars than are paired with the 0-255 part of the search
list, then the excess chars are stored in an extended part of the table.
The excess char count was being stored as a short, which caused problems
if the replacement list contained more than 32767 excess chars: either
substituting the wrong char, or substituting for a char located up to
0xffff bytes in memory before the real translation table.
So change it to SSize_t.
Note that this is only a problem when the search and replacement charlists
are non-utf8, the replacement list contains around 0x8000+ entries, and
where the string being translated is utf8 with at least one codepoint >=
U+8000.
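The truncation can be illustrated with a small C sketch. It is not the perl code: int16_t stands in for the old short field, ptrdiff_t stands in for SSize_t, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store an excess-char count in a 16-bit signed field, as the old layout
 * did, and return what is read back. Converting an out-of-range value to
 * int16_t is implementation-defined; the original count cannot survive. */
static long store_count_in_short(long count)
{
    assert(count >= 0);
    int16_t field = (int16_t)count;
    return (long)field;
}

/* The same round trip through a pointer-sized signed type, used here as
 * a stand-in for SSize_t. */
static long store_count_in_ssize(long count)
{
    assert(count >= 0);
    ptrdiff_t field = (ptrdiff_t)count;
    return (long)field;
}
```

A count of 32767 survives either field, while 0x8000 only survives the wider one. On common platforms the 16-bit field reads back as a negative number, which is consistent with the symptom of indexing memory before the real table.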
|
| |
| |
| |
| |
| |
| |
| |
| | |
Recent commits slightly changed the layout of the extended map table: it
now always stores a repeat count, and there are now two structs defined,
rather than treating certain slots, like tbl[0x101], specially.
Update B and Deparse to reflect this.
|
Originally, the op_pv of an OP_TRANS op pointed to a 256-slot array of
shorts, which contained the translations. However, in the presence of
tr///c, extra information needs to be stored to handle utf8 strings.
The 256 slot array was extended, with slot 0x100 holding a length,
and slots 0x101 onwards holding some extra chars.
This has made things a bit messy, so this commit adds two structs,
one being an array of 256 shorts, and the other being the same but with
some extra fields. So for example tbl->[0x100] has been replaced with
tbl->excess_len.
This commit should make no functional difference, but will allow us
shortly to fix a bug by changing the type of the excess_len field from
short to something bigger, for example.
|
| |
| |
| |
| | |
Outdent a code block following the previous commit.
|
In transliterations where the search and replacement charlists are
non-utf8, but where the string being modified contains codepoints >=
0x100, then tr/.../.../cd would always delete all such codepoints, rather
than potentially mapping some of them.
In more detail: in the presence of /c (complement), an implicit
0x100..0x7fffffff is added to a non-utf8 search charlist. If the
replacement list is longer than the < 0x100 part of the search list, then
the last few replacement chars should in principle be paired off against
the first few of (\x{100}, \x{101}, ...). However, this wasn't happening. For
example,
tr/\x00-\xfd/ABCD/cd
should be equivalent to
tr/\xfe-\x{7fffffff}/ABCD/d
which should map:
\xfe => A,
\xff => B,
\x{100} => C,
\x{101} => D,
and delete \x{102} onwards.
But instead, it behaved like
tr/\xfe-\x{7fffffff}/AB/d
and deleted all codepoints >= 0x100.
This commit fixes that by using the extended mapping table format
for all /c variants (formerly it excluded /cd).
I also changed a variable holding the mapped char from being I32 to UV:
principally to avoid a casting mess in the fixed code. This may (or may
not), as a side-effect, have fixed possible issues with very large
codepoints.
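The intended pairing for tr/\x00-\xfd/ABCD/cd can be sketched in C as below. This is an illustration under assumptions, not the code in S_pmtrans(): the function names, the TR_DELETE marker and the flat map array are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TR_DELETE (-2)  /* hypothetical marker for "delete this char" */

/* Mark bytes lo..hi (inclusive) as members of the literal search list. */
static void mark_range(unsigned char in_search[256], int lo, int hi)
{
    assert(0 <= lo && lo <= hi && hi <= 0xff);
    for (int i = lo; i <= hi; i++)
        in_search[i] = 1;
}

/* Fill map[0..map_len-1] for a /cd transliteration. Code points in the
 * complemented search list (every byte not marked, plus everything from
 * 0x100 up) are paired with repl in ascending order; once repl runs out
 * they are deleted. Code points outside the complement get -1. */
static void build_cd_map(const unsigned char in_search[256],
                         const char *repl, size_t repl_len,
                         short *map, size_t map_len)
{
    size_t next = 0;
    for (size_t cp = 0; cp < map_len; cp++) {
        int complemented = (cp >= 0x100) || !in_search[cp];
        if (!complemented)
            map[cp] = -1;
        else if (next < repl_len)
            map[cp] = (short)(unsigned char)repl[next++];
        else
            map[cp] = TR_DELETE;
    }
}
```

The pairing does not stop at slot 0xff: C and D land on 0x100 and 0x101, which is the behaviour the old /cd code path missed.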
|
For non-utf8, OP_TRANS(R) ops have a translation table consisting of an
array of 256 shorts attached. For tr///c, this table is extended to hold
information about chars in the replacement list which aren't paired with
chars in the search list. For example,
tr/\x00-AE-\xff/bcdefg/c
is equivalent to
tr/BCD\x{100}-\x{7fffffff}/bcdefg/
which is equivalent to
tr/BCD\x{100}-\x{7fffffff}/bcdefggggggggg..../
Only the BCD => bcd mappings can be stored in the basic 256-slot table,
so potentially the following extra information needs recording in an
extended table to handle codepoints > 0xff in the string being modified:
1) the extra replacement chars ("efg");
2) the number of extra replacement chars (3);
3) the "repeat" char ('g').
Currently 2) and 3) are combined: the repeat char is found as the last
extra char, and if there are no extra chars, the repeat char is treated
as an extra char list of length 1.
Similarly, an 'extra chars' length value of 1 can imply either one extra
char, or no extra chars with the repeat char being faked as an extra char.
An 'extra chars' length of 0 implies an empty replacement list, i.e.
tr/....//c.
This commit changes it so that the repeat char is *always* stored (in slot
0x101), with the extra chars stored beginning at slot 0x102.
The 'extra chars' length value (located at slot 0x0100) has changed its
meaning slightly: now
-1 implies tr/....//c
0 implies no more replacement chars than search chars
1+ the number of excess replacement chars.
This (should) make no functional difference, but the extra information
stored will make it easier to fix some bugs shortly.
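A rough C sketch of how a code point above 0xff could be resolved against the slot layout described above (length at 0x100, repeat char at 0x101, extra chars from 0x102). The enum and function names are hypothetical, and treating a length of -1 as an identity mapping is an assumption about what an empty replacement list means here.

```c
#include <assert.h>

/* Hypothetical names for the special slots of the extended table. */
enum { SLOT_EXCESS_LEN = 0x100, SLOT_REPEAT = 0x101, SLOT_EXTRA = 0x102 };

/* Resolve a code point >= 0x100 against the extended part of the table. */
static long map_high_codepoint(const short *tbl, long cp)
{
    assert(tbl != 0 && cp >= 0x100);
    long excess_len = tbl[SLOT_EXCESS_LEN];
    long idx = cp - 0x100;

    if (excess_len == -1)          /* tr/....//c: assumed to map to itself */
        return cp;
    if (idx < excess_len)          /* paired with an excess replacement char */
        return tbl[SLOT_EXTRA + idx];
    return tbl[SLOT_REPEAT];       /* otherwise the repeat char applies */
}

/* Fill the extended slots for tr/\x00-AE-\xff/bcdefg/c; tbl must have
 * room for at least 0x105 shorts. The basic 0..0xff part is left alone. */
static void fill_example_slots(short *tbl)
{
    tbl[SLOT_EXCESS_LEN] = 3;
    tbl[SLOT_REPEAT]     = 'g';
    tbl[SLOT_EXTRA + 0]  = 'e';
    tbl[SLOT_EXTRA + 1]  = 'f';
    tbl[SLOT_EXTRA + 2]  = 'g';
}
```

Because the repeat char always has its own slot, a length of 0 no longer needs a faked one-element extra list: the lookup simply falls through to the repeat char.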
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In tr/search/replace/c, the number of 'paired' replacement chars
will always be <= length(replace). Assert this, and thus simplify a couple
of conditionals from >= to ==.
It should make no difference to execution, but reduces the cognitive
load.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The /c (complement) flag is almost completely untested. Indeed, for the
all non-utf8 case, nothing in core exercises a plain tr///c.
So this commit adds reasonably comprehensive tests for tr///c and variants
(/cs, /cd, /csd) where the search and replacement ranges are non-utf8, and
the string being matched may or may not be utf8.
A few tests are TODO for now as I've exposed some bugs - to be fixed
shortly.
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Various flag vars are set early on, such as:
const I32 complement = o->op_private & OPpTRANS_COMPLEMENT;
but sometimes these vars weren't being used, and op_private was being
tested again.
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This:
DEBUG_t( Perl_deb(aTHX_ "2.TBL\n"));
has been around in one form or another since perl1, but it makes no sense
since perl 5.000, where -Dt now shows the name of the op being executed.
|
| |
| |
| |
| |
| | |
Removal of MAD a long time ago left a couple of lines with very weird
indentation.
|
|/
|
|
|
|
|
|
|
|
|
|
|
| |
For the various C functions which implement the compile-time and
run-time aspects of OP_TRANS, add some basic code comments at the top of
each function explaining what its purpose is.
Also add lots of code comments to the body of S_pmtrans() (which compiles
a tr///).
Also comment what the OPpTRANS_ private flag bits mean.
No functional changes.
|
|
|
|
|
|
|
| |
The upgrade from 2.020_04 to 2.025 was not noted manually in
perldelta, since there was nothing particularly noteworthy in the
update, and the comment at the top of the section says this will
happen automatically as part of the release process.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The F0convert() function used to implement the %.0f format specifier
more cheaply went wrong on some edge cases. Its rounding went wrong
when the exponent is such that fractional values are not representable,
making the "+= 0.5" invoke floating point rounding. Fix that by only
invoking that rounding logic for values that start out fractional.
That fixes the output part of [perl #47602]. It also failed to emit the
sign for negative zero. Fix that by making it not apply to zero values.
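The rounding hazard can be reproduced with a short C sketch. This is not F0convert() itself: both function names are hypothetical, only non-negative values are handled, the round-half-to-even step is an added assumption to mirror typical %.0f output, and the default round-to-nearest floating point mode is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Naive variant: always add 0.5 and truncate. For doubles at or above
 * 2**52 the sum is not representable, so the addition itself rounds. */
static uint64_t round_always_add_half(double nv)
{
    assert(nv >= 0.0 && nv < 18446744073709551616.0);
    volatile double bumped = nv + 0.5;  /* volatile forces double precision */
    return (uint64_t)bumped;
}

/* Guarded variant: only apply the +0.5 logic when nv actually has a
 * fractional part, so integral values pass through untouched. */
static uint64_t round_if_fractional(double nv)
{
    assert(nv >= 0.0 && nv < 18446744073709551616.0);
    uint64_t uv = (uint64_t)nv;
    if ((double)uv != nv) {
        volatile double bumped = nv + 0.5;
        uv = (uint64_t)bumped;
        /* Exact tie (x.5): step back to the even neighbour. */
        if ((uv & 1) && (double)uv == bumped)
            uv--;
    }
    return uv;
}
```

For 4503599627370497 (2**52 + 1) the naive variant yields 4503599627370498, because the tie in the addition rounds to the even neighbour, while the guarded variant returns the value unchanged.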
|
| |
|
|
|
|
| |
Addresses RT #132737.
|
| |
|
|
|
|
|
| |
Some things VMS doesn't have and one that it does. All were
missing from the config.sh we generate.
|
|
|
|
|
|
|
| |
On Darwin 15.6.0, mkostemp() was observed to be autodetected as present
but actually be unlinkable. It is unknown what other Darwin versions
are affected, so for the time being just override the autodetection on
all versions.
|
|
|
|
|
|
| |
This reverts commit 523d71b314dc75bd212794cc8392eab8267ea744, reinstating
commit 2cdf406af42834c46ef407517daab0734f7066fc. Reversion is not the
way to address the porting problem that motivated that reversion.
|
When displaying each reg node being executed, the code that dumps a REF
node assumed that a capture was valid if progs->offs[n].start != -1.
In fact during backtracking after a failure, a capture is "undone" by
merely setting progs->offs[n].end = -1.
So make the dump code account for that too.
This was causing a test in t/re/pat.t to coredump:
use re qw(Debug EXECUTE);
"x" =~ m{ () y | () \1 }x;
Although given that neither the test nor the REF code in regprop() have
changed recently, I'm not sure why this has only recently started crashing.
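The corrected validity test amounts to checking both offsets. A minimal C sketch follows; capture_offsets is a hypothetical stand-in for one progs->offs[n] entry and the function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a single capture group's offset pair. */
typedef struct {
    long start;
    long end;
} capture_offsets;

/* A capture is only usable if neither offset has been invalidated. */
static int capture_is_valid(const capture_offsets *offs)
{
    assert(offs != NULL);
    return offs->start != -1 && offs->end != -1;
}

/* Mimic what backtracking does after a failure: only 'end' is reset,
 * so 'start' alone says nothing about validity. */
static void undo_capture(capture_offsets *offs)
{
    assert(offs != NULL);
    offs->end = -1;
}
```

After undo_capture() the start offset still holds its old value, which is why a check on start alone let the dump code read a stale capture.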
|