| Commit message (Collapse) | Author | Age | Files | Lines |
| |
Currently the SvSCREAM flag is set on the sv pointed to by
cx->blk_eval.cur_text, to indicate that it is ref counted.
Instead, use a spare bit in the blk_u16 field of the eval context.
This is to reduce the number of odd special cases for the SvSCREAM flag.
|
|
|
|
| |
when reporting unrecognized characters in UTF mode.
|
| |
The second argument to S_lop() is an int, but it gets stored in
PL_expect which is a U8. If we need a U8, then let's bring it
into the function as a U8.
|
|
|
|
| |
when there's a short UTF-8 character at the end.
|
| |
This patch simplifies two bits of code that I came across while
working on supporting the clang -Weverything flag.
The first, in Perl_validate_proto, removes unnecessary variable
initialization if proto of NULL is passed.
The second, in S_scan_const, rearranges some code and #ifdefs so that
the convert_unicode and real_range_max variables are only declared
if EBCDIC is set. This means we no longer have to set useless variables
just to keep the compiler happy, and it saves some unnecessary
"if (convert_unicode)" checks. One of the comments said
"(Compilers should optimize this out for non-EBCDIC)", but now the
compiler won't even see these variables or tests.
|
| |
|
| |
|
| |
| |
The value here is Unicode, not native, so needs a different macro.
There's no test for this: the code is allocating space and could be off
by one byte, which is a problem only if the allocation is one byte too
small and we happen to be at a limit where that single byte makes the
difference.
|
| |
| |
PVNVs used to be used to hold pad names (like '$lex'), but aren't used
for this purpose any more. So eliminate the xpad_cop_seq part of the
union.
Since S_scan_subst() was using xnv_u.xpad_cop_seq.xlow to store a
temporary line count, add a new union member for that.
The main usage of this field on CPAN is to define
COP_SEQ_RANGE_LOW()-style macros, so if the module is still using
xpad_cop_seq for that purpose, it's already broken.
|
| |
| |
While fixing delimcpy(), I found that it wasn't always clear what its
callers did, so I've added some extra code comments.
Also add a balancing '}' in a comment block to help editors that
jump between matching brackets.
|
| |
When the parameter of an attribute has no matching closing ')', there
are several issues with the way the error is handled.
First, the code currently tries to set PL_bufptr back to the position
of the opening '(' for a better error message. However, since the error
will have been discovered only at EOF after all the remaining lines have
been read and discarded, the buffer no longer contains ':attr(...'.
So the error message includes a spurious \0 followed by possibly some
random chunk of characters from the last line read in.
Worse, if the input buffer gets realloced while perl searches for the ')',
then PL_bufptr is reset to point into the freed buffer. [perl #129086].
It does yyerror() rather than croak(), so further error messages appear,
even though we're at EOF and no further parsing can occur. Similar cases
such as no matching closing string delimiter all croak instead.
It resets cop_line to zero *before* printing the error, so the line number
of the error is erroneously reported as zero.
This commit fixes all these issues by handling the error more similarly
to the way unterminated strings are handled. In particular, it no longer
tries to print out the section of src where it thinks the error is.
For comparison, running perl on this code file:
# this is line 1
my $x :foo(bar;
the
quick
brown
fox jumps over the lazy dog
used to output:
Unterminated attribute parameter in attribute list at /tmp/p1 line 0, near "\0\0x jumps "
syntax error at /tmp/p1 line 0, at EOF
Execution of /tmp/p1 aborted due to compilation errors.
but now outputs:
Unterminated attribute parameter in attribute list at /tmp/p1 line 2.
Note how previously: the error message included two literal null chars
(represented by \0 above), followed by a random chunk of the last line;
it claimed to be on line 0; it output two further error messages.
For comparison, change the ':foo' to 'q' so that it's an unterminated
string, and you get (and always got):
Can't find string terminator ")" anywhere before EOF at /tmp/p1 line 2.
|
| |
|
|
|
|
| |
In some cases this is used in building error messages.
|
| |
9bde56224 added this as part of macro:
- PL_last_lop_op = f; \
+ PL_last_lop_op = f < 0 ? -f : f; \
which broke win32 builds due to this
UNIBRACK(-OP_ENTEREVAL)
expanding to
PL_last_lop_op = -345 < 0 ? --345 : -345
and the -- being seen as a pre-dec op.
Diagnosed by Dagfinn Ilmari Mannsåker.
|
| |
5dc13276 added some code to toke.c that did not take into account
that the opnum (‘f’) argument to UNI* could be a negated op number.
PL_last_lop_op must never be negative, since it is used as an offset
into a struct.
Tests for the crash will come in the next commit.
|
| |
This changes the places in the core to use the clearer synonym added by
the previous commit. It also changes one place that hand-rolled its own
code to use this function instead.
|
| |
Perl_yylex maintains up to two pointers, `s` and `d`, into the parser
buffer at PL_bufptr. It can call skipspace(), which can potentially
grow (and realloc) its argument. This can leave the second pointer
pointing at the old buffer. Under most cases it isn't visible, because
the old buffer isn't reused or zeroed. However, under Valgrind or
libdislocator, this memory management error becomes visible.
This patch, in the two affected places, saves the offset of the second
pointer before the call to skipspace() and restores it afterwards.
|
| |
This should make the sites that use LEX_NO_INCLINE a bit less arcane.
This has nothing to do with the erstwhile PEEKSPACE macro that existed
for MADness’ sake.
|
| |
|
| |
It was added back in 21791330a when we still had MADness. Back then
there were about four skipspace functions, some of them before
S_update_debugger_info and some after, and I just put the #define
before all of them. But now the only skipspace function left is
after S_update_debugger_info, so having the #define before it just
makes it harder to see what’s what.
|
| |
By \327 I mean character number 327 in octal.
Without memory tools like ASan, it produces garbled output. The added
test fails like this:
# Failed test 18 - @ { \327 \n - used to garble output (or fail asan) [perl \#128951] at ./test.pl line 1058
# got "Unrecognized character \\xD7; marked by <-- HERE after \x{a0}\x{f6}@3\x{a8}\x{7f}\000\000@{<-- HERE near column -1 at - line 1."
# expected "Unrecognized character \\xD7; marked by <-- HERE after @{<-- HERE near column 3 at - line 1."
Dave Mitchell’s explanation from the RT ticket:
> The src code contains the bytes:
>
> @ { \327 \n
>
> after seeing "@{" the lexer calls scan_ident(), which sees the \327 as an
> ident, then calls S_skipspace_flags() to skip the spaces following the
> ident. This moves the current cursor position to the \n, and since that's
> a line boundary, it updates PL_linestart and PL_bufptr to point to \n
> too.
>
> When it finds that the next char isn't a '}', it does this:
>
> /* Didn't find the closing } at the point we expected, so restore
> state such that the next thing to process is the opening { and */
> s = SvPVX(PL_linestr) + bracket; /* let the parser handle it */
>
> i.e. it moves s back to the "{\327" then continues.
>
> However, PL_linestart doesn't get reset, so later when the parser
> encounters the \327 and tries to croak with "Unrecognized character %s ...",
> when it prints out the section of src code in error, since s < PL_linestr,
> negative string lengths and ASAN errors ensue.
This commit fixes it by passing the LEX_NO_INCLINE flag (added by
21791330a), which specifies that we are not trying to read past the
newline but simply peek ahead. In that case lex_read_space does not
reset PL_linestart.
But that does cause problems with code like:
${;
#line 3
}
because we end up jumping ahead via skipspace without updating the
line number. So we need to do a skipspace_flags(..., LEX_NO_INCLINE)
first (i.e., peek ahead), and then when we know we don’t need to go
back again we can skipspace(...) for real.
|
| |
When I moved subroutine signature processing into perly.y with
v5.25.3-101-gd3d9da4, I added a new lexer PL_expect state, XSIGVAR.
This indicated, when about to parse a variable, that it was a signature
element rather than a my variable; in particular, it makes ($,...)
be toked as the lone sigil '$' rather than the punctuation variable '$,'.
However this is a bit heavy-handed; so instead this commit adds a
new allowed pseudo-keyword value to PL_in_my: as well as KEY_my, KEY_our and
KEY_state, it can now be KEY_sigvar. This is a less intrusive change
to the lexer.
|
| |
I had not realized that SvGROW returned the new string pointer. Using
that makes a one-step process from a two-step process.
I examined the code for other possible occurrences, and found others
where the two-step version seemed clearer, so I left those alone.
|
| |
Mentioning a constant twice in a row results in an assertion failure:
$ ./miniperl -e 'sub ub(){0} ub ub'
Assertion failed: (SvTYPE(cv) == SVt_PVCV || SvTYPE(cv) == SVt_PVFM), function Perl_cv_const_sv_or_av, file op.c, line 7926.
Abort trap: 6
A bisect points to 2eaf799e7, but I don’t understand why that commit
introduced it. I suspect it was failing an assertion for a slightly
different reason back then, but am too lazy to check.
In any case, it fails now because, while ‘ub ub’ is being compiled,
when the sub is looked up initially (in toke.c:yylex), we call
rv2cv_op_cv with the RV2GVOPCV_RETURN_STUB flag, which allows a
bare constant ref to be returned. So the ‘cv’ variable contains
an RV (\0):
cv = lex
? isGV(gv)
? GvCV(gv)
: SvROK(gv) && SvTYPE(SvRV(gv)) == SVt_PVCV
? (CV *)SvRV(gv)
: ((CV *)gv)
: rv2cv_op_cv(rv2cv_op, RV2CVOPCV_RETURN_STUB);
(‘ub’ here is a constant 0, which is stored in the symbol table as
\0; i.e., ‘sub ub(){0}’ is equivalent to ‘BEGIN { $::{ub} = \0 }’.)
Then if we see a word immediately following it (the second ‘ub’) we
check a little further down to see whether it might be a method call.
That entails calling intuit_method, which does this:
indirgv = gv_fetchpvn_flags(tmpbuf, len, ( UTF ? SVf_UTF8 : 0 ), SVt_PVCV);
if (indirgv && GvCVu(indirgv))
return 0;
So we are looking to see whether the second word refers to a sub and
deciding this is not an indirect method call if there is a sub.
But calling gv_fetchpvn_flags like that has the effect of upgrading
the symbol table entry to a full GV. Since the ‘cv’ variable in yylex
points to that symbol table entry, it ends up pointing to a GV, which
certain code later on does not expect to happen.
So we should pass the GV_NOADD_NOINIT flag to gv_fetchpvn_flags to
prevent lookup of the second bareword from upgrading the entry (we
already do that earlier in intuit_method for the first bareword). We
only check the GV to see whether it has a sub or io thingy in it any-
way, so we don’t actually need a full GV. (As a bonus, GvIO will
already work on a non-GV and return NULL, so that part of the code
remains unchanged.)
|
| |
RT #128952
In
eval "q" . chr(100000000064);
generating the error message C<Can't find string terminator "XXX"'>
was overrunning a buffer designed to hold a single utf8 char, since
it wasn't allowing for the \0 at the end.
|
| |
|
| |
|
| |
| |
The bug here is simply an I32 was used when an IV was needed.
One could argue that the parser should refuse to accept something
larger than an IV. I chose not to do that, as this is
a deprecated usage, which generates a warning by default and will be a
syntax error anyway in a future release.
|
| |
RT #128719. This:
sub f ($x, $x) {}
gave
"state" variable $x masks earlier declaration in same scope
This commit changes that to '"my" variable'
|
| |
During the course of parsing and execution, these values get stored
as ints and UVs, then used as SSize_t.
Standardise on IVs instead. Technically they can never be negative, but
their final use is as indices into AVs, which is SSize_t, so it's
easier to standardise on a signed value throughout.
|
| |
e.g.
a slurpy parameter may not have a default value
=>
A slurpy parameter may not have a default value
Also, split the "Too %s arguments for subroutine" diagnostic into
separate "too few" and "too many" entries in perldiag.
Based on suggestions by Father Chrysostomos.
|
| |
Currently subroutine signature parsing emits many small discrete ops
to implement arg handling. This commit replaces them with a couple of ops
per signature element, plus an initial signature check op.
These new ops are added to the OP tree during parsing, so will be visible
to hooks called up to and including peephole optimisation. It is intended
soon that the peephole optimiser will take these per-element ops, and
replace them with a single OP_SIGNATURE op which handles the whole
signature in a single go. So normally these ops won't actually get executed
much. But adding these intermediate-level ops gives three advantages:
1) it allows the parser to efficiently generate subtrees containing
individual signature elements, which can't be done if only OP_SIGNATURE
or discrete ops are available;
2) prior to optimisation, it provides a simple and straightforward
representation of the signature;
3) hooks can mess with the signature OP subtree in ways that make it
no longer possible to optimise into an OP_SIGNATURE, but which can
still be executed, deparsed etc (if less efficiently).
This code:
use feature "signatures";
sub f($a, $, $b = 1, @c) {$a}
under 'perl -MO=Concise,f' now gives:
d <1> leavesub[1 ref] K/REFC,1 ->(end)
- <@> lineseq KP ->d
1 <;> nextstate(main 84 foo:6) v:%,469762048 ->2
2 <+> argcheck(3,1,@) v ->3
3 <;> nextstate(main 81 foo:6) v:%,469762048 ->4
4 <+> argelem(0)[$a:81,84] v/SV ->5
5 <;> nextstate(main 82 foo:6) v:%,469762048 ->6
8 <+> argelem(2)[$b:82,84] vKS/SV ->9
6 <|> argdefelem(other->7)[2] sK ->8
7 <$> const(IV 1) s ->8
9 <;> nextstate(main 83 foo:6) v:%,469762048 ->a
a <+> argelem(3)[@c:83,84] v/AV ->b
- <;> ex-nextstate(main 84 foo:6) v:%,469762048 ->b
b <;> nextstate(main 84 foo:6) v:%,469762048 ->c
c <0> padsv[$a:81,84] s ->d
The argcheck(3,1,@) op knows the number of positional params (3), the
number of optional params (1), and whether it has an array / hash slurpy
element at the end. This op is responsible for checking that @_ contains
the right number of args.
A simple argelem(0)[$a] op does the equivalent of 'my $a = $_[0]'.
Similarly, argelem(3)[@c] is equivalent to 'my @c = @_[3..$#_]'.
If it has a child, it gets its arg from the stack rather than using $_[N].
Currently the only used child is the logop argdefelem.
argdefelem(other->7)[2] is equivalent to '@_ > 2 ? $_[2] : other'.
[ These ops currently assume that the lexical var being introduced
is undef/empty and non-magical etc. This is an incorrect assumption and
is fixed in a few commits' time ]
|
| |
Currently the signature of a sub (i.e. the '($a, $b = 1)' bit) is parsed
in toke.c using a roll-your-own mini-parser. This commit makes
the signature be part of the general grammar in perly.y instead.
In theory it should still generate the same optree as before, except
that an OP_STUB is no longer appended to each signature optree: it's
unnecessary, and I assume that was a hangover from early development of
the original signature code.
Error messages have changed somewhat: the generic 'Parse error' has
changed to the generic 'syntax error', with the addition of ', near "xyz"'
now appended to each message.
Also, some specific error messages have been added; for example
(@a=1) now says that slurpy params can't have a default value, rather than
just giving 'Parse error'.
It introduces a new lexer expect state, XSIGVAR, since otherwise when
the lexer saw something like '($, ...)' it would see the identifier
'$,' rather than the tokens '$' and ','.
Since it no longer uses parse_termexpr(), it is no longer subject to the
bug (#123010) associated with that; so sub f($x = print, $y) {}
is no longer mis-interpreted as sub f($x = print($_, $y)) {}
|
| |
sublex_info is never validly copied or set* all at once and no pointer
is ever taken to it. It seems to be left over from the time when
PL_sublex_info was a global variable. (Indeed, the struct is still
defined in perl.h, an odd place for something used only by parser.h.)
It will be easier to eliminate alignment holes in the parser struct if
we just empty it out.
* The one instance of sublex_info being copied, in
sv.c:Perl_parser_dup, ended up potentially sharing an SV between
threads, which is a no-no. I say potentially, because I can’t see how
it could be non-null during thread cloning, which would have to happen
between sublex_start and sublex_push.
|
| |
Built-in arrays should not be giving warnings about possible unintended
interpolation (which happens with nonexistent arrays).
Some built-in variables do not exist if they are not needed, but perl
will generally pretend that they did already exist whenever they are
fetched. It is such variables that trigger this warning erroneously:
$ ./miniperl -we 'sub dump_isa { warn "@ISA" } @ISA = ("foo","bar"); dump_isa'
Possible unintended interpolation of @ISA in string at -e line 1.
foo bar at -e line 1.
I discovered this when writing a test for @DB::args, using -w.
warnings.pm uses @DB::args, so ‘use warnings’ code won’t get the
warning, but code using -w gets it:
$ ./miniperl -we 'sub foo { package DB { () = caller 0 } print "@DB::args\n" } foo(1..3);'
Possible unintended interpolation of @DB::args in string at -e line 1.
1 2 3
The code in toke.c that decides whether this warning should take place
needs to supply the GV_ADDMG flag to gv_fetchpvn_flags, making it one
of the code paths that engages in the pretence mentioned above.
That code already had an explicit exception for @+ and @-. This
commit removes it as being no longer necessary.
|
|
|
|
| |
Re-wrap a comment.
|
| |
In commit 6745174b561 I changed the multi_open and multi_close
parser struct members (known as PL_multi_open/close in toke.c) from
char to UV.
I failed to change the localization code in S_sublex_start:
SAVEI8(PL_multi_close);
So on big-endian architectures only the most significant byte would be
localized. That meant that effectively no localization would happen
for ASCII string delimiters.
In S_sublex_done:
LEAVE;
if (PL_multi_close == '<')
PL_parser->herelines += l - PL_multi_end;
That LEAVE undoes the localization. '<' for PL_multi_close is a
special value that can only happen for here-docs. The ->herelines line
makes sure that line numbers are correct after a here-doc.
What ended up happening was that s//<<END/e would throw off line
numbers after the here-doc body. PL_multi_close would end up being set
to '<', not '/', when the lexer was finishing up the s///, so it
treated it like a here-doc and screwed things up. This resulted in
the test failures in ticket #128747.
I found that we also had a bug on little-endian machines. But to get
the localization of the *least* significant byte to screw things up,
you have to try something other than s//<<END/e:
use utf8;
<<END;
${
#line 57
qq || }
END
warn; # line 59
Replace the pipes with lightning bolts:
use utf8;
<<END;
${
#line 57
qq ϟϟ }
END
warn; # line 7
and you get line 7 instead of 59. In this case, the inner construct
has a delimiter character whose code is > 255, but only the lower
8 bits get localized. So when the localization unwinds, you get
ord("ϟ") & 0xff | ord("<") instead of just ord("<"), resulting in the
here-doc line number handling being skipped.
This commit fixes the localization and adds the little-endian test.
|
| |
The upper latin1 characters when expressed as \N{U+...} were failing.
This was due to trying to convert them to UTF-8 when the result isn't
UTF-8. I added a test for \N{name} as well, though these were not
affected by this regression.
|
| |
| |
The output of
perl -CS -e 'use utf8; q«'
is now correctly:
Can't find string terminator "«" anywhere before EOF at -e line 1.
Previously, the first byte of the delimiter (as encoded in UTF-8)
would be used instead:
Can't find string terminator "Â" anywhere before EOF at -e line 1.
|
| |
We will need to store characters > 255 in here.
Also, cast accordingly in toke.c.
|
| |
toke.c:4698:36: warning: enumeral and non-enumeral type in conditional expression [-Wextra]
return REPORT(dojoin_was == 1 ? ')' : POSTJOIN);
I have no idea why mixing enums and non-enums is bad in C++. This commit
just casts the hell out of the expression to shut it up.
|
| |
The enum value "WORD" can apparently clash with an enum value in windows
headers. It used to be worked around in windows by #defining YYTOKENTYPE,
which caused the
enum yytokentype {
...
WORD = ...;
}
in perly.h to be skipped, while still using the
#define WORD ...
which appears later in the same file.
In bison 3.x, the auto-generated perly.h no longer includes the #defines,
so this workaround no longer works.
Instead, change the name of the token from "WORD" to "BAREWORD".
|
| |
|
| |
| |
This applies to ‘my’, ‘our’, ‘state’ and ‘local’, and both to single
variable and lists of variables, in all their variations:
my \$a # equivalent to \my $a
my \($a,$b) # equivalent to \my($a, $b)
my (\($a,$b)) # same
my (\$a, $b) # equivalent to (\my $a, $b)
|