| Commit message (Collapse) | Author | Age | Files | Lines |
|
All but one of scan_ident()'s callers already passed PL_bufend as
the removed argument; the one deviant was intuit_more(), which was
setting the "end of buffer" argument to the next close-bracket.
This commit modifies intuit_more() to temporarily set PL_bufend and
then restore it.
This was done as groundwork for the following commit, which will add
more uses of PEEKSPACE() to scan_ident() in order to fix some whitespace
and line number bugs. PEEKSPACE() modifies PL_bufend directly
if it encounters a newline at the end of the buffer; that last bit
is why changing intuit_more() to modify-and-restore PL_bufend is
safe, since the end of the buffer will always be a ']'.
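The modify-and-restore pattern described above can be sketched roughly as follows. This is only an illustration: buf_end, scan_token() and scan_token_before_bracket() are hypothetical stand-ins, not the real PL_bufend, scan_ident() or intuit_more().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the lexer's global end-of-buffer pointer. */
static const char *buf_end;

/* Hypothetical scanner: counts non-space bytes from s up to buf_end.
 * It reads the global end pointer instead of taking an end argument. */
static size_t scan_token(const char *s)
{
    size_t n = 0;

    assert(s <= buf_end);
    while (s + n < buf_end && s[n] != ' ')
        n++;
    return n;
}

/* Scan no further than the next ']' by narrowing buf_end, then restore it. */
static size_t scan_token_before_bracket(const char *s)
{
    const char *saved_end = buf_end;
    const char *close_br = memchr(s, ']', (size_t)(buf_end - s));
    size_t n;

    if (close_br != NULL)
        buf_end = close_br;   /* temporarily treat ']' as the end of buffer */
    n = scan_token(s);
    buf_end = saved_end;      /* restore the real end before returning */
    return n;
}
```

The caller that needs a shorter limit pays for the save and restore; every other caller simply uses the global as-is.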
|
Based on Yves's random branch work.
This version makes the new random number visible to external modules,
for example, List::Util's XS shuffle() implementation.
I've also added a 64-bit implementation for when HAS_QUAD is true; this
should be significantly faster, even on 32-bit CPUs. It is intended to
produce exactly the same sequence as the original implementation.
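As a rough sketch of why a 64-bit version can reproduce the same sequence, assume the generator is the classic 48-bit linear congruential one from the drand48 family (multiplier 0x5DEECE66D, increment 0xB). The function names below are made up for illustration and are not the ones used in the core.

```c
#include <assert.h>
#include <stdint.h>

#define RAND48_MULT UINT64_C(0x5DEECE66D)
#define RAND48_ADD  UINT64_C(0xB)
#define RAND48_MASK UINT64_C(0xFFFFFFFFFFFF)

/* One step with the 48-bit state held in a single 64-bit integer.
 * Wrapping modulo 2**64 is harmless because only the low 48 bits are kept. */
static uint64_t rand48_step64(uint64_t x)
{
    assert(x <= RAND48_MASK);
    return (x * RAND48_MULT + RAND48_ADD) & RAND48_MASK;
}

/* The same step with the state split into three 16-bit limbs (x[0] lowest),
 * using schoolbook multiplication and 32-bit accumulators. */
static void rand48_step16(uint16_t x[3])
{
    static const uint32_t a[3] = { 0xE66D, 0xDEEC, 0x0005 };
    uint32_t acc;
    uint16_t lo, mid;

    acc = a[0] * x[0] + (uint32_t)RAND48_ADD;
    lo = (uint16_t)acc;
    acc >>= 16;
    acc += a[0] * x[1] + a[1] * x[0];
    mid = (uint16_t)acc;
    acc >>= 16;
    acc += a[0] * x[2] + a[1] * x[1] + a[2] * x[0];
    x[0] = lo;
    x[1] = mid;
    x[2] = (uint16_t)acc;
}

/* Returns 1 if both variants produce identical states for `steps` steps. */
static int rand48_steps_agree(uint64_t seed, int steps)
{
    uint64_t wide = seed & RAND48_MASK;
    uint16_t limbs[3];
    int i;

    limbs[0] = (uint16_t)wide;
    limbs[1] = (uint16_t)(wide >> 16);
    limbs[2] = (uint16_t)(wide >> 32);
    for (i = 0; i < steps; i++) {
        uint64_t packed;

        wide = rand48_step64(wide);
        rand48_step16(limbs);
        packed = (uint64_t)limbs[0]
               | ((uint64_t)limbs[1] << 16)
               | ((uint64_t)limbs[2] << 32);
        if (packed != wide)
            return 0;
    }
    return 1;
}
```

The 64-bit step replaces several multiplications and carries with a single multiply, add and mask.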
The original version of this commit retained the "freebsd" name from
Yves's original work for the function and data structure names. I've
removed "freebsd" from most function names so the name isn't an issue
if we choose to replace the implementation.
|
gv_is_in_main() checks if an unqualified identifier is in the main::
stash.
|
Namely, gv_magicalize no longer stores the GV into the stash, which
is gv_fetchpvn_flags' job.
|
This bit is called when a GV already exists, but its name is length-one
and it's in the main:: stash, so it might have multiple kinds of magic,
like $! and %!, or @+ and %+.
|
This commit takes a chunk of code out of gv_fetchpvn_flags and
turns it into two functions: parse_gv_stash_name and find_default_stash.
|
This reverts commit 7b6e8075e45ebc684565efbe3ce7b70435f20c79.
It turns out to be problematic, because it causes NULLs on the stack,
which XSUBs may trip on.
My main reason for it was actually to try to resolve some CPAN
failures, but it turns out that other fixes have removed the
need for that.
|
These functions worked with ints instead of SSize_t,
|
Now that NULL is used for a nonexistent element, it is easy for XS
code to pass it to av_push(). av_store already accepts NULL, and
av_push already works with it on non-debugging builds, so there is
really no need for this restriction.
|
warn and die have special code (closest_cop) to find a nulled
nextstate op closest to the warn or die op, to get the line number
from it. This commit extends that capability to caller, so that
    if (1) {
        foo();
    }
    sub foo { warn +(caller)[2] }
shows the right line number.
|
Now that the Unicode data is stored in native character set order, it is
rare to need to work with the Unicode order. Traditionally, the real
work was done in functions that worked with the Unicode order, and
wrapper functions (or macros) were used to translate to/from native.
There are two groups of functions: one that translates from code point
to UTF-8, and the other group goes the opposite direction.
This commit changes the base function that translates from UTF-8 to code
point to output native instead of Unicode. Those extremely rare
instances where Unicode output is needed instead will have to hand-wrap
calls to this function with a translation macro, as now described in the
API pod. Prior to this, it was the other way around: the native was
wrapped, and the rare, strict Unicode wasn't. This eliminates a layer of
function call overhead for a common case.
The base function that translates from code point to UTF-8 retains its
Unicode input, as that is more natural to process. However, it is
de-emphasized in the pod, with the functionality description moved to
the pod for a native input wrapper function. And, those wrappers are
now macros in all cases; previously there was function call overhead
sometimes. (Equivalent exported functions are retained, however, for XS
code that uses the Perl_foo() form.)
I had hoped to rebase this commit, squashing it with an earlier commit
in this series, eliminating the use of a temporary function name change,
but the work involved turns out to be large, with no real payoff.
|
This is in preparation for deprecating these functions, to force any
code that has been using these functions to change.
Since the Unicode tables are now stored in native order, these
functions should only rarely be needed.
However, the functionality of these is needed, and in actuality, on
ASCII platforms, the native functions are #defined to these. So what
this commit does is rename the functions to something else, and create
wrappers with the old names, so that anyone using them will get the
deprecation when it actually goes into effect: we are waiting for CPAN
files distributed with the core to change before doing the deprecation.
According to grep.cpan.me, this should affect fewer than 10 additional
CPAN distributions.
|
Code should almost never be dealing with non-native code points.
This is in preparation for later deprecation when our CPAN modules have
been converted away from using it.
|
Now that the tables are stored in native order, there is almost no need
for code to be dealing in Unicode order.
According to grep.cpan.me, there are no uses of this function in CPAN.
|
Now that all the tables are stored in native format, there is very
little reason to use this function; and those who do need this kind of
functionality should be using the bottom level routine, so as to make it
clear they are doing nonstandard stuff.
According to grep.cpan.me, there are no uses of this function in CPAN.
|
This is in preparation for the current wrapee becoming deprecated.
|
These macros are no longer called in the Perl core. This commit turns
them into functions so that they can use gcc's deprecation facility.
I believe these were defective right from the beginning, and I have
struggled to understand what's going on. From the name, it appears
NATIVE_TO_NEED takes a native byte and turns it into UTF-8 if the
appropriate parameter indicates that. But that is impossible to do
correctly from that API, as for variant characters, it needs to return
two bytes. It could only work correctly if ch is an I8 byte, which
isn't native, and hence the name would be wrong.
Similar arguments for ASCII_TO_NEED.
The function S_append_utf8_from_native_byte(const U8 byte, U8** dest)
does what I think NATIVE_TO_NEED intended.
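To illustrate why a single return byte cannot work, here is a minimal sketch of appending one byte as UTF-8 through a moving destination pointer. It assumes an ASCII/Latin-1 platform (no EBCDIC handling), and append_utf8_from_latin1_byte() is a hypothetical name, not the core function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Append the UTF-8 encoding of one Latin-1 byte to *dest and advance *dest.
 * Bytes below 0x80 are invariant; the rest need two output bytes, which is
 * why an interface that returns a single byte cannot be correct. */
static void append_utf8_from_latin1_byte(uint8_t byte, uint8_t **dest)
{
    assert(dest != NULL && *dest != NULL);
    if (byte < 0x80) {
        *(*dest)++ = byte;                            /* one byte, unchanged */
    } else {
        *(*dest)++ = (uint8_t)(0xC0 | (byte >> 6));   /* lead byte */
        *(*dest)++ = (uint8_t)(0x80 | (byte & 0x3F)); /* continuation byte */
    }
}
```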
|
This fairly short paradigm is repeated in several places; a later commit
will improve it.
|
Otherwise when compiling XS code, there is a declaration for a function
which is never used, which can cause some compilers to issue a warning.
|
Check for the nul char in pathnames and string arguments to
syscalls; if one is found, return undef and set errno to ENOENT.
Added the syscalls category to the io warnings.
Strings with embedded \0 chars were previously ignored in the syscall but
kept in perl. The hidden payloads in these invalid string args may cause
unnoticed security problems, as they are hard to detect, ignored by
the syscalls but kept around in perl PVs.
Allow an ending \0 though, as several modules add a \0 to
such strings without adjusting the length.
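A minimal sketch of such a check, assuming the string is available as a pointer plus length. The helper name is hypothetical and the warning and errno handling are left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the len-byte string pv may be passed to a syscall:
 * it contains no NUL byte, except possibly as its very last byte. */
static int no_embedded_nul(const char *pv, size_t len)
{
    assert(pv != NULL || len == 0);
    if (len > 1 && memchr(pv, '\0', len - 1) != NULL)
        return 0;   /* embedded NUL: caller returns undef with ENOENT */
    return 1;
}
```

Searching only the first len - 1 bytes is what lets a single trailing \0 through.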
This is based on a change originally by Reini Urban, but pretty much
all of the code has been replaced.
|
As part of getting the regexp engine to handle long strings, this
commit changes any variables, parameters and struct members that hold
lengths of the string being matched against (or parts thereof) to use
SSize_t or STRLEN instead of [IU]32.
To avoid having to change any logic, I kept the signedness the same.
I did not change anything that affects the length of the regular
expression itself, so regexps are still practically limited to
I32_MAX. Changing that would involve changing the size of regnodes,
which would be a lot more involved.
These changes should fix bugs, but are very hard to test. In most
cases, I don’t know the regexp engine well enough to come up with test
cases that test the paths in question with long strings. In other
cases I don’t have a box with enough memory to test the fix.
|
Using I32 for the fields that record information about the location of
a fixed string that must be found for a regular expression to match
can result in match failures, because I32 is not large enough to store
offsets >= 2**31.
SSize_t is appropriate, since it is 64 bits on 64-bit platforms and 32
bits on 32-bit platforms.
This commit changes enough instances of I32 to SSize_t to get the
added test passing and suppress compiler warnings. A later commit
will change many more.
|
Change the internal fields for storing positions so that //g in scalar
context can move past the 2**31 character threshold. Before this
commit, the numbers would wrap, resulting in assertion failures.
The changes in this commit are only enough to get the added test
passing. Stay tuned for more.
|
The value of pos() is stored as a byte offset. If it is stored on a
tied variable or a reference (or glob), then the stringification could
change, resulting in pos() now pointing to a different character
offset or pointing to the middle of a character:
    $ ./perl -Ilib -le '$x = bless [], chr 256; pos $x=1; bless $x, a; print pos $x'
    2
    $ ./perl -Ilib -le '$x = bless [], chr 256; pos $x=1; bless $x, "\x{1000}"; print pos $x'
    Malformed UTF-8 character (unexpected end of string) in match position at -e line 1.
    0
So pos() should be stored as a character offset.
The regular expression engine expects byte offsets always, so allow it
to store bytes when possible (a pure non-magical string) but use
characters otherwise.
This does result in more complexity than I should like, but the
alternative (always storing a character offset) would slow down regular
expressions, which is a big no-no.
|
Make the array interface 64-bit safe by using SSize_t instead of I32
for array indices.
This is based on a patch by Chip Salzenberg.
This completes what the previous commit began when it changed
av_extend.
|
(I am referring to what is usually known simply as The Stack.)
This partially fixes #119161.
By casting the argument to int, we can end up truncating/wrapping
it on 64-bit systems, so EXTEND(SP, 2147483648) typically becomes
EXTEND(SP, -2147483648), which does not extend the stack at all. Then writing
to the stack in code like ()=1..1000000000000 goes past the end of
allocated memory and crashes.
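The effect of the cast can be shown with a small sketch. The two functions are hypothetical simplifications of the "is there enough room?" test, not the real EXTEND macro. The narrowing conversion is implementation-defined in C; the sketch assumes a 32-bit int and the wrapping behaviour of common compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy check: narrows the requested count to int before comparing.
 * A count of 2**31 typically becomes negative, so growth looks unnecessary. */
static int needs_growth_int(int64_t slots_free, int64_t wanted)
{
    assert(slots_free >= 0);
    return slots_free < (int)wanted;
}

/* Fixed check: compares at full width, so large counts are seen correctly. */
static int needs_growth_wide(int64_t slots_free, int64_t wanted)
{
    assert(slots_free >= 0);
    return slots_free < wanted;
}
```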
I can’t really write a test for this, since instead of crashing it
will use more memory than I have available (and then I’ll start
forgetting things).
|
This is a partial fix for #119161.
On 64-bit platforms, I32 is too small to hold offsets into a stack
that can grow larger than I32_MAX. What happens is the offsets can
wrap so we end up referencing and modifying elements with negative
indices, corrupting memory, and causing crashes.
With this commit, ()=1..1000000000000 stops crashing immediately.
Instead, it gobbles up all your memory first, and then, if your
computer still survives, crashes. The second crash happens because of
a similar bug with the argument stack, which the next commit will
take care of.
|
This commit adds #if's to cause locale handling code to compile on
platforms that don't have full-featured locale handling. The commits
mentioned in the ticket did not adequately cover these situations.
|
This reverts commit c82ecf346.
It turns out to be faulty, because a location shared between threads
(the cop) was holding a reference count on a pad entry in a
particular thread. So when you free the cop, how do you know where to do
SvREFCNT_dec?
In reverting c82ecf346, this commit still preserves the bug fix from
1311cfc0a7b, but shifts it around.
|
gv_check was only checking for stashes nested directly inside
themselves (*foo:: = *foo::foo) and the main stash.
Other stash circularities would cause infinite recursion, blowing the
C stack and crashing.
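One general way to survive arbitrary circularities in a recursive walk is an "in progress" flag on each node. The sketch below shows that technique on a hypothetical namespace structure; it is not a description of how gv_check itself is implemented.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical namespace node; child links may form cycles. */
struct ns {
    const char *name;
    struct ns  *child[MAX_CHILDREN];
    size_t      nchildren;
    int         visiting;   /* set while this node is on the current path */
};

/* Walk the namespace graph and return the number of node visits.
 * A node already on the current path is skipped, so cycles of any
 * length end the recursion instead of exhausting the C stack. */
static size_t walk_namespaces(struct ns *n)
{
    size_t visits = 1;
    size_t i;

    assert(n != NULL);
    if (n->visiting)
        return 0;
    n->visiting = 1;
    for (i = 0; i < n->nchildren; i++)
        visits += walk_namespaces(n->child[i]);
    n->visiting = 0;
    return visits;
}
```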
|
This saves having to allocate a separate string buffer for every cop
(control op; every statement has one).
Under non-threaded builds, every cop has a pointer to the GV for that
source file, namely *{"_<filename"}.
Under threaded builds, the name of the GV used to be stored instead.
Now we store an offset into the per-interpreter PL_filegvpad, which
points to the GV.
This makes no significant speed difference, but it reduces
memory usage.
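The memory saving comes from storing a small index per statement instead of a string per statement. A rough sketch of that idea, using a hypothetical fixed-size table of file names in place of PL_filegvpad and its GVs:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FILES 16
#define NO_SLOT   ((size_t)-1)

/* Hypothetical shared table: one entry per source file.
 * The table stores the pointers only; callers keep the strings alive. */
static const char *file_table[MAX_FILES];
static size_t      file_count;

/* Return the slot for name, adding it on first use; NO_SLOT if full. */
static size_t file_slot(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < file_count; i++)
        if (strcmp(file_table[i], name) == 0)
            return i;
    if (file_count == MAX_FILES)
        return NO_SLOT;
    file_table[file_count] = name;
    return file_count++;
}

/* Each statement records an index into the table, not its own string. */
struct stmt {
    size_t   file_ix;
    unsigned line;
};

static const char *stmt_file(const struct stmt *s)
{
    return file_table[s->file_ix];
}
```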
|
This changes the previously unused _invlist_dump() function to be called
from sv_dump() to dump inversion list scalars. The format for regular
SVt_PVs doesn't give human-friendly output for these.
Since these lists are currently not visible outside the Perl core, the
format is documented only in comments in the function itself.
|
This code appears twice in nearly duplicate form.
|
This function was introduced a few commits ago. Since it's now only
called from within regexec.c, make it static.
|
Cut and paste into a separate function, the block of code in
regexec_flags() that is responsible (on successful match) for setting
RX_SAVED_COPY, RX_SUBBEG etc, ready for use by capture vars like $1, $&.
Although this function is currently only called from one place, we will
shortly use it elsewhere too.
This should contain no functional changes.
|
When a closure closes over a variable, it references the variable
itself, as opposed to taking a snapshot of its value.
This was broken by the constant optimisation added for
constant.pm’s sake:
    {
        my $x;
        sub () { $x }; # takes a snapshot of the current value of $x
    }
constant.pm no longer uses that mechanism, except on older perls, so
we can remove this hack, causing code like this to start working
again:
    BEGIN{
        my $x = 5;
        *foo = sub(){$x};
        $x = 6
    }
    print foo; # now prints 6, not 5
|
These are inlined the same way as 1..5. We have two ops:
    rv2av
      |
      `-- const
The const op returns an AV, which is stored in the op tree, and then
rv2av flattens it.
|
This reverts commit 43387ee1abcd83c3c7586b7f7aa86e838d239aac,
which reverted parts of f019c49e380f764c1ead36fe3602184804292711; that
reversion may no longer be necessary.
See [perl #116989]
|
This is part of #116907, too. It also fixes #72924 as a side effect;
the next commit will explain.
The value of pos($foo) was being stored as an I32, not allowing values
above I32_MAX. Change it to SSize_t (the signed equivalent of size_t,
representing the maximum string length the OS/compiler supports).
This is accomplished by changing the size of the entry in the magic
struct, which is the simplest fix.
Other parts of the code base can benefit from this, too.
We actually cast the pos value to STRLEN (size_t) when reading
it, to allow *very* long strings. Only the value -1 is special,
meaning there is no pos. So the maximum supported offset is
2**(8*sizeof(size_t))-2.
The regexp engine itself still cannot handle large strings, so being
able to set pos to large values is useless right now. This is but one
piece in a larger puzzle.
Changing the size of mg->mg_len also requires that
Perl_hv_placeholders_p change its type. This function
should in fact not be in the API, since it exists
solely to implement the HvPLACEHOLDERS macro. See
<https://rt.perl.org/rt3/Ticket/Display.html?id=116907#txn-1237043>.
|
This, similar to sv_pos_u2b_flags, is a more friendly variant of
sv_pos_u2b that works with 2GB strings and actually returns a
value instead of modifying a passed-in value in place through
a pointer.
The next commit will use this.
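The "return a value" style can be sketched as below. utf8_bytes_to_chars() is a hypothetical standalone helper, not the new core function; it assumes well-formed UTF-8 and a byte offset that falls on a character boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Convert a byte offset into a character offset by counting the bytes
 * that start a character (everything except 10xxxxxx continuation bytes).
 * The result is returned rather than written back through a pointer,
 * and size_t keeps it valid for strings longer than 2GB. */
static size_t utf8_bytes_to_chars(const unsigned char *s, size_t byte_off)
{
    size_t chars = 0;
    size_t i;

    assert(s != NULL);
    for (i = 0; i < byte_off; i++)
        if ((s[i] & 0xC0) != 0x80)
            chars++;
    return chars;
}
```

A caller can then use the result directly in an expression instead of declaring a temporary to be overwritten.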
|
This is the second try. 5969c5766a5d3 had a bug in it under non-
MAD builds.
If I have a sub I can use its name as a bareword as long as I suffix
it with =>, even if the => is on the next line:
    $ ./perl -Ilib -e 'sub tim; warn tim' -e '=>'
    tim at -e line 1.
If I want to use a built-in keyword’s name as a bareword, I can put =>
after it:
    $ ./perl -Ilib -e 'warn time =>'
    time at -e line 1.
But if I combine the two (keyword + newline), it does not work:
    $ ./perl -Ilib -e 'warn time' -e ' =>'
    1373611283 at -e line 1.
unless I override the keyword:
    $ ./perl -Ilib -Msubs=time -e 'warn time' -e ' =>'
    time at -e line 1.
=> after a bareword is checked for in two places in toke.c. The first
comes before a comment saying ‘NO SKIPSPACE BEFORE HERE!’; it only
skips spaces and finds a => on the same line. The second comes later;
it skips vertical space and comments, too.
But the second check is in a code path that is not reached by keywords
that are not overridden (as is the ‘NO SKIPSPACE’ comment).
This commit adds an extra check for built-in keywords after we have
determined that the keyword is not overridden. In that case, there is
no reason we cannot use skipspace, as we no longer have to worry about
what PL_oldbufptr etc. point to.
This commit leaves __DATA__ and __END__ alone, since they
are special, problematic and controversial. (See, e.g.,
<https://rt.perl.org/rt3/Ticket/Display.html?id=78348#txn-1234355>.)
Allowing whitespace to be scanned across line boundaries without
increasing the line number (something this commit has to do to make
this work) can cause the way PL_linestr is handled to change.
PL_linestr usually holds just the current line when reading from a
handle. Now it can hold the current line plus the next line or
several lines, depending on how much whitespace is to be found there.
When '\n' or '#' was encountered, the lexer would modify the buffer in
place and add a null, setting PL_bufend to point to that null. That
would make it look as though the end of the line had been reached, and
avoided having to scan to find the end of a comment.
In string eval and quote-like operators, the end of the comment does
have to be scanned for. We can’t just fake EOL and read the next
line of input.
Under MAD builds, the end of the comment was being scanned for
anyway, even when reading from a handle. So everything worked under MAD,
which was what I tested 5969c5766a5d3 under.
This commit changes the '\n' and '#' handling to match the MAD code
(scan for the end of the comment instead of faking a buffer
truncation), which 5969c5766a5d3 failed to do.
|
The number of elements in an inversion list is a simple calculation
based on SvCUR(). Prior to this patch there was a field that contained
that number directly, and the two values diverged, causing a bug. A
single value can't get out-of-sync with itself.
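The design choice can be sketched with a hypothetical structure in which the element count is always derived from the number of bytes in use. The real calculation also has to account for the inversion list's header details, which are ignored here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified inversion list: no separate length field. */
struct inv_list {
    uint64_t *elems;
    size_t    cur_bytes;   /* bytes in use, analogous to SvCUR() */
};

/* The element count is computed on demand from the single stored value,
 * so there is no second copy that could drift out of sync. */
static size_t inv_list_len(const struct inv_list *l)
{
    assert(l != NULL);
    assert(l->cur_bytes % sizeof(uint64_t) == 0);
    return l->cur_bytes / sizeof(uint64_t);
}
```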
|
The function invlist_set_len() has to be called after the offset header
field in an inversion list has been set. To make sure that future
maintainers don't forget to do this, add the parameter for the 'offset'
to its call, so it can't be called without having this value.
|