This is the one remaining empty {} that was accepted under the
experimental 'use re "strict"'.
It is an error to specify an empty Unicode property name, as in
qr/\p{}/. It is also illegal to write just qr/\p/. Prior to this commit
the error message for the latter construct misleadingly referred to
braces. Since there are no braces in the input, they shouldn't be
mentioned.
I undertook a code review of how regcomp.c parses things in light of the
tickets found by the fuzzer,
https://rt.perl.org/Ticket/Display.html?id=126546. This commit is the
result of my efforts so far. I was not planning to push it now, but the
work found a couple of new error messages that should be raised, and
this has to be done before the visible-changes code freeze coming up all
too soon. I will add test cases after that freeze, including ones to check
whether these changes fix all the observed issues.

The audit was tedious, and may have missed some things. Several issues
occurred in multiple places. One is failing to advance the parse by
UTF8SKIP where appropriate; another is subtracting one byte from the
parse position and assuming that it points to the beginning of the
previous character (which under UTF-8 it may not). Another is assuming
that the pattern is a C string, that is, that there are no interior NULs
in it. I also found unnecessary tests, given that an SV always has a
terminating NUL.
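
The UTF-8 pitfalls listed above can be illustrated with a small C sketch. It assumes well-formed standard UTF-8 with at most 4 bytes per character. `utf8_char_len()`, `utf8_prev_char()` and `utf8_count_chars()` are hypothetical helpers, not regcomp.c code; perl's real macro for the first is UTF8SKIP. The sketch shows three things:

- advancing by the length encoded in the start byte;
- stepping backwards over continuation bytes instead of by a single byte;
- relying on an explicit length, so interior NULs are harmless.

```c
#include <stddef.h>

/* Hypothetical stand-in for perl's UTF8SKIP: the number of bytes in the
 * character whose start byte is *s (standard UTF-8, at most 4 bytes).
 * A continuation or invalid start byte is treated as length 1. */
static size_t utf8_char_len(const unsigned char *s)
{
    if (*s < 0x80)           return 1;  /* 0xxxxxxx: ASCII */
    if ((*s & 0xE0) == 0xC0) return 2;  /* 110xxxxx        */
    if ((*s & 0xF0) == 0xE0) return 3;  /* 1110xxxx        */
    if ((*s & 0xF8) == 0xF0) return 4;  /* 11110xxx        */
    return 1;                           /* malformed: move on one byte */
}

/* Return the start of the character preceding s, never moving before
 * start. Subtracting one byte is not enough under UTF-8: continuation
 * bytes (10xxxxxx) must be skipped until a start byte is reached. */
static const unsigned char *utf8_prev_char(const unsigned char *start,
                                           const unsigned char *s)
{
    if (s <= start)
        return start;
    do {
        s--;
    } while (s > start && (*s & 0xC0) == 0x80);
    return s;
}

/* Count the characters in a buffer of known length. Using the length
 * (as an SV records it) instead of strlen() copes with interior NULs. */
static size_t utf8_count_chars(const unsigned char *buf, size_t len)
{
    size_t count = 0;
    const unsigned char *p = buf, *end = buf + len;
    while (p < end) {
        size_t n = utf8_char_len(p);
        if (n > (size_t)(end - p))  /* truncated character at the end */
            n = (size_t)(end - p);
        p += n;
        count++;
    }
    return count;
}
```
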
A problem with bracketed character classes, qr/[foo]/, is that they have
very little structure, so almost anything is legal, and typos just
silently compile into something unintended. One of the possible
components is a POSIX character class. There are 14 of them, and they
have a very restricted structure, which is easy to get slightly wrong,
so that instead of the intended POSIX class being compiled, something
else is silently created. This commit causes the regex compiler to look
for slightly misspelled POSIX character classes and to raise a warning
when it finds one. It does not change the results of the compilation.

To do this, it introduces fuzzy parsing into the regex compiler, using
the Damerau-Levenshtein algorithm to find how many single-character
edits it would take to transform the input into one of the 14 classes.
If the input is 1 or 2 edits away, it is considered to have been intended
as that class, and the warning is raised. If more edits would be needed,
the compiler remains silent.
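
A minimal C sketch of this kind of fuzzy check follows. It assumes the restricted variant of Damerau-Levenshtein distance, also called optimal string alignment, in which an adjacent transposition costs 1. That variant may differ from what regcomp.c actually uses. The function names and the 16-character length limit are hypothetical. An exact match produces no suggestion, a distance of 1 or 2 returns the likely intended class, and anything further is ignored.

```c
#include <stddef.h>
#include <string.h>

#define MAX_NAME 16  /* hypothetical limit: longer inputs are never matched */

static size_t min3(size_t a, size_t b, size_t c)
{
    size_t m = a < b ? a : b;
    return m < c ? m : c;
}

/* Optimal-string-alignment distance: insertions, deletions,
 * substitutions and adjacent transpositions each cost 1. */
static size_t osa_distance(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t d[MAX_NAME + 1][MAX_NAME + 1];
    if (la > MAX_NAME || lb > MAX_NAME)
        return (size_t)-1;                     /* treat as "very far away" */
    for (size_t i = 0; i <= la; i++) d[i][0] = i;
    for (size_t j = 0; j <= lb; j++) d[0][j] = j;
    for (size_t i = 1; i <= la; i++) {
        for (size_t j = 1; j <= lb; j++) {
            size_t cost = a[i - 1] == b[j - 1] ? 0 : 1;
            d[i][j] = min3(d[i - 1][j] + 1,          /* deletion     */
                           d[i][j - 1] + 1,          /* insertion    */
                           d[i - 1][j - 1] + cost);  /* substitution */
            if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]
                && d[i - 2][j - 2] + 1 < d[i][j])
                d[i][j] = d[i - 2][j - 2] + 1;       /* transposition */
        }
    }
    return d[la][lb];
}

static const char *const posix_classes[14] = {
    "alpha", "alnum", "ascii", "blank", "cntrl", "digit", "graph",
    "lower", "print", "punct", "space", "upper", "word", "xdigit"
};

/* Return the class the input was probably meant to be, or NULL.
 * Distance 0 means a legal name (no warning); 1 or 2 means a likely
 * misspelling; 3 or more is assumed not to be a POSIX class at all. */
static const char *likely_misspelled_class(const char *name)
{
    const char *best = NULL;
    size_t best_dist = 3;
    for (size_t k = 0; k < 14; k++) {
        size_t dist = osa_distance(name, posix_classes[k]);
        if (dist == 0)
            return NULL;
        if (dist < best_dist) {
            best_dist = dist;
            best = posix_classes[k];
        }
    }
    return best;
}
```

For example, `likely_misspelled_class("alpah")` suggests "alpha" (one transposition), while a legal name such as "digit" yields NULL.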

This is a heuristic: someone could have made enough typos that it
decides no class was intended when one was. Conversely, it could raise a
warning when no class was intended, though warnings only happen when the
input very closely resembles one of the 14 legal POSIX classes. The
algorithm can be tweaked if experience indicates it should be. But the
bottom line is that many more cases of unintended results will now be
warned about.

Things like having blanks in the construct and having the '^' before the
colon are recognized as indicating an intended POSIX class (given that
the actual name is close to one of the 14), and raise warnings. Again,
this commit does not change what gets compiled. This found a bug in
autodoc.pl which was fixed a few commits ago.

The [. .] and [= =] POSIX constructs cause perl to croak that they are
unimplemented. This commit improves the parsing of these two, and fixes
some false positives. See
http://nntp.perl.org/group/perl.perl5.porters/230975

The new code combines two functions in regcomp.c into one new one.
It panics if the context it has just popped back to isn't a CXt_EVAL.
Since we have just called dopoptoeval(), which can only pop to an eval,
and since we assert that it's an eval, this seems such an unlikely "this
can never happen" case that it doesn't really seem worth testing for.

It was added in perl5.000, and may have made slightly more sense then, as
there used to be a POPBLOCK just before it.
See thread starting at
http://nntp.perl.org/group/perl.perl5.porters/227698
Ricardo Signes provided the perldelta and perldiag text.
The old text used the passive voice. No 5.23 release has been made with
the old text, so no perldelta changes are needed.
and a small typo fix
See https://rt.perl.org/Ticket/Display.html?id=115166
This uses for UTF-EBCDIC essentially the same mechanism that Perl
already uses for UTF-8 on ASCII platforms to extend it beyond what might
be its natural maximum. That is, when the UTF-8 start byte is 0xFF, it
adds a bunch more bytes to the character than it otherwise would,
bringing it to a total of 14 bytes for UTF-EBCDIC. This is enough to
handle any code point that fits in a 64-bit word.

The downside is that this extension is not compatible with previous
perls for the range 2**30 up through the previous max, 2**31 - 1. A
simple program could be written to convert files that were written out
using an older perl so that they can be read with newer perls, and the
perldelta says we will do this should anyone ask.
However, I strongly suspect that the number of such files in existence
is zero, as people in EBCDIC land don't seem to use Unicode much, and
these are very large code points, which are associated with a
portability warning every time they are output in some way.

This extension brings UTF-EBCDIC to parity with UTF-8, so that both can
cover a 64-bit word. It allows some removal of special cases for EBCDIC
in core code and core tests. And it is a necessary step to handle Perl
6's NFG, which I'd like eventually to bring to Perl 5.

This commit causes two implementations of a macro in utf8.h and
utfebcdic.h to become the same, and they are merged into a single one in
the portion of utf8.h common to both.

To illustrate, the I8 for U+3FFFFFFF (2**30-1) is
"\xFE\xBF\xBF\xBF\xBF\xBF\xBF" before and after this commit, but the I8
for the next code point, U+40000000 is now
"\xFF\xA0\xA0\xA0\xA0\xA0\xA0\xA1\xA0\xA0\xA0\xA0\xA0\xA0",
and before this commit it was "\xFF\xA0\xA0\xA0\xA0\xA0\xA0".
The I8 for 2**64-1 (U+FFFFFFFFFFFFFFFF) is
"\xFF\xAF\xBF\xBF\xBF\xBF\xBF\xBF\xBF\xBF\xBF\xBF\xBF\xBF", whereas
before this commit it was unrepresentable.
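
The byte sequences above can be reproduced with a short C sketch. Based on those examples, it assumes the following layout:

- a 0xFF start byte;
- 13 continuation bytes (0xA0..0xBF), each carrying 5 bits, most significant group first.

The sketch handles only this 14-byte form, which covers code points of 2**30 and above. The function names are hypothetical, not perl's own.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch: encode a code point >= 2**30 in the extended 14-byte I8 form.
 * 13 continuation bytes of the form 101xxxxx give 13 * 5 = 65 bits,
 * enough for any 64-bit value. Returns the number of bytes written. */
static size_t i8_encode_ff_form(uint64_t cp, unsigned char out[14])
{
    out[0] = 0xFF;
    for (int i = 0; i < 13; i++) {
        unsigned shift = 5u * (unsigned)(12 - i);   /* 60, 55, ..., 0 */
        out[1 + i] = (unsigned char)(0xA0 | ((cp >> shift) & 0x1F));
    }
    return 14;
}

/* Inverse: rebuild the code point from the 13 continuation bytes. */
static uint64_t i8_decode_ff_form(const unsigned char in[14])
{
    uint64_t cp = 0;
    for (int i = 1; i < 14; i++)
        cp = (cp << 5) | (uint64_t)(in[i] & 0x1F);
    return cp;
}
```
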

Commit 7c560c3beefbb9946463c9f7b946a13f02f319d8 said in its message that
it was moving something that hadn't been needed on EBCDIC until the
"next commit". That statement turned out to be wrong, overtaken by
events. This is now the commit it was referring to.
commit I prematurely
pushed that
So, if it isn't found and 'foo' doesn't begin with 'In' or 'Is', we know
that there would be a run-time error, so we can fail at compile time
instead. We use a different error message than when we can't tell
whether it is a user-defined property.
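
The prefix rule above can be sketched in C. This follows perlunicode, which documents that user-defined properties are subroutines whose names begin with "In" or "Is". Judging a package-qualified name such as "main::IsFoo" by its last component is a simplifying assumption, and `could_be_user_defined()` is a hypothetical name.

```c
#include <stdbool.h>
#include <string.h>

/* Sketch: could \p{name} refer to a user-defined property? If not, an
 * unknown name can be reported at compile time rather than run time.
 * Assumption: only the part after the last "::" is examined. */
static bool could_be_user_defined(const char *name)
{
    const char *base = name;
    for (const char *p = name; (p = strstr(p, "::")) != NULL; p += 2)
        base = p + 2;                       /* skip package qualifier */
    return strncmp(base, "In", 2) == 0 || strncmp(base, "Is", 2) == 0;
}
```
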

See thread beginning at
http://nntp.perl.org/group/perl.perl5.porters/231658

I didn't make a perldelta entry, as I doubt that this has ever come up
in the field; I discovered the issue myself while playing around
investigating other bugs.
I noticed that this message was still present but hasn't been used for
some time, having been replaced; I didn't look too hard for when.
See threads beginning at
http://nntp.perl.org/group/perl.perl5.porters/231263
http://nntp.perl.org/group/perl.perl5.porters/231389

Prior to this commit, these did not generate the pattern that would be
expected, and displayed apparently irrelevant warnings. Now this is a
fatal error.

This resolves [perl #126187]. I don't think it's worth a perldelta
entry for this ticket, as the new error message is now in perldelta, and
this never worked properly anyway; it's just that now we have a proper
error message. Patches welcome if you disagree.
Prior to this commit, an unknown Unicode property gave different
messages depending on when the problem was found. Prior to the previous
commit, most were found at run-time, but now most are found at
compile-time. Therefore use the run-time message everywhere, as it was
the one most often encountered before.
This makes the cause of the error more obvious if you accidentally call
a non-lvalue sub in the final position of an lvalue one.
This expands the concept of a portable range, introduced for regular
expressions in v5.22, to the transliteration operators. A portable range
has at least one endpoint expressed as \N{}, indicating that the Unicode
definition is desired, or has both endpoints expressed as uppercase ASCII
letters or both as lowercase ASCII letters.

The refactor fixes several EBCDIC problems, and on all platforms it
fixes the problem whereby the first endpoint of a range was not checked
to be <= the final endpoint in UTF-8 strings.
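
The two checks described above can be sketched in C. The function names and the boolean flags recording whether an endpoint was written as \N{...} are hypothetical, and the sketch covers only the cases the message lists. Code points are taken as Unicode values, so the ASCII letter ranges are written numerically. On an EBCDIC platform, native values would first need converting.

```c
#include <stdbool.h>

/* Sketch of the "portable range" test. start/end are Unicode code
 * points; *_is_named says whether that endpoint was written as \N{...}. */
static bool is_portable_range(unsigned long start, bool start_is_named,
                              unsigned long end, bool end_is_named)
{
    if (start_is_named || end_is_named)
        return true;                                  /* Unicode meaning asked for */
    if (start >= 0x41 && start <= 0x5A && end >= 0x41 && end <= 0x5A)
        return true;                                  /* both 'A'..'Z' */
    if (start >= 0x61 && start <= 0x7A && end >= 0x61 && end <= 0x7A)
        return true;                                  /* both 'a'..'z' */
    return false;
}

/* Any range, portable or not, must not run backwards. */
static bool range_is_valid(unsigned long start, unsigned long end)
{
    return start <= end;
}
```
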

There remains a bug in which perl loops if any transliterated code point
is larger than IV_MAX.
Removes 'the' in front of parameter names in some instances.
Previously, use of this under /l regex rules was a compile-time error.
Now it works like \b{wb} and \b{sb}, which compile under locale rules
and always work the way Unicode says they should. A UTF-8 locale implies
Unicode rules, and the goal is for it to work seamlessly with the rest
of perl. This construct was the only one I am aware of that didn't work
seamlessly (not counting OS interfaces) under UTF-8 LC_CTYPE locales.

For all three of these constructs, use with a non-UTF-8 run-time locale
raises a warning, and Unicode rules are used anyway.

UTF-8 locale collation still has problems, but fixing them is low
priority, as it's a lot of work, and if one really cares, one should be
using Unicode::Collate.
- reserve enough buffer space
- name the two different errors differently
- test around the problem spot
sv_catpvf() and friends ultimately end up calling sv_vcatpvfn_flags() with a
C-style va_list argument (rather than with an array of SV pointers). When
the sprintf implementation in sv_vcatpvfn_flags() is called with a va_list,
it has always ignored any attempt by the format string to reorder the
arguments. This reasonable limitation is now documented, and the
implementation throws an exception when it encounters this situation.

Minimal tests for these exceptions have been added to XS::APItest.
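
For illustration, here is a C sketch of how a va_list-based formatter might detect a format that tries to reorder its arguments. It looks for the POSIX "%n$" syntax, e.g. "%2$s". This is not perl's code: the function name is hypothetical. Only each conversion's own index is checked, not indexed "*n$" widths or precisions.

```c
#include <ctype.h>
#include <stdbool.h>

/* Sketch: does a printf-style format use an explicit argument index
 * such as "%2$s"? A formatter consuming a va_list can only walk its
 * arguments in order, so such a format could be rejected up front. */
static bool format_has_explicit_index(const char *fmt)
{
    for (const char *p = fmt; *p; p++) {
        if (*p != '%')
            continue;
        if (p[1] == '%') {          /* "%%" is a literal percent sign */
            p++;
            continue;
        }
        const char *q = p + 1;
        while (isdigit((unsigned char)*q))
            q++;
        if (q > p + 1 && *q == '$') /* "%<digits>$" */
            return true;
    }
    return false;
}
```
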
Non-characters are no longer forbidden as of Unicode 7.0; they are just
"not recommended". The wording of the warning changes accordingly.
As proposed by RJBS.

The "5.24" feature bundle (and therefore C<< use v5.24 >>) now enables
postderef and postderef_qq.

I can't find any precedent for what to do with the relevant experimental::*
warnings category when an experimental feature graduates to acceptance. I
have elected to leave the category in place, so that code doing C<< no
warnings "experimental::postderef" >> will continue to work. This means that
C<< use warnings "experimental::postderef" >> is also accepted, but has no
effect.
By default, give a warning, do not set the alarm, and return undef.
(the signedness problem detected by Coverity, CID 104837)
alarm() takes and returns unsigned int, not signed.
In other words, the C library function alarm() cannot fail, ever.
See for example:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/alarm.html
http://linux.die.net/man/3/alarm
https://www.freebsd.org/cgi/man.cgi?query=alarm&sektion=3
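
A minimal C sketch of the described behaviour follows, assuming the seconds value arrives from Perl space as a signed `long long`. `checked_alarm()` and its warning text are hypothetical; perl's real code and diagnostic differ. It calls POSIX alarm() from <unistd.h>, which takes and returns unsigned int and cannot fail, so the only check needed is on the caller's signed value. Values above UINT_MAX are not handled in this sketch.

```c
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Sketch: refuse a negative request instead of letting it wrap around
 * to a huge unsigned value. On refusal no alarm is set and *previous is
 * left untouched; the caller would then return undef. */
static bool checked_alarm(long long seconds, unsigned int *previous)
{
    if (seconds < 0) {
        fprintf(stderr, "negative alarm value %lld ignored\n", seconds);
        return false;
    }
    /* alarm() returns the seconds remaining on any earlier alarm, or 0. */
    *previous = alarm((unsigned int)seconds);
    return true;
}
```
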
This horrible thing broke encapsulation and was as buggy as a very buggy
thing. It's been officially deprecated since 5.20.0 and now it can finally
die die die!!!!
This commit:

1) Makes the gimme of sort blocks, as specified by the pushed cx_gimme,
   be G_SCALAR. Formerly it was likely to be G_ARRAY, as it inherited
   whatever sort() was called as, and sort doesn't bother calling a sort
   block unless sort was called in list context.

   This change is largely cosmetic, as
   a) the sort block is already compiled in scalar context; and
   b) the code in S_sortcv() etc. does its own return arg context
      processing anyway, and assumes scalar context.
   But it makes it consistent with sort SUB, which *does* set gimme to
   G_SCALAR.

2) Makes use of the fact that a sort sub or block will always be
   called as the first context in a new stackinfo, and such stackinfos
   always have PL_stack_base[0] set to &PL_sv_undef as a guard.
   So handling scalar context return (where zero args returned needs to
   be converted into 1 PL_sv_undef arg) can be simplified by just always
   accessing the last arg, *PL_stack_sp, regardless of whether 0, 1, or
   2+ args were returned.

   Note that some code making use of MULTICALL (e.g. List::Util) has
   already been (possibly inadvertently) relying on this fact.

3) Removes the "Sort subroutine didn't return single value" fatal error.
   This croak was removed from the sort BLOCK and sort NON-XS-SUB variants
   in v5.15.5-82-g1715fa6, but the croak was left for XS sort subs.
   That commit incorrectly asserted that for "sort BLOCK" and "sort
   NON-XS-SUB", more than 1 arg could never be returned, but:

       $ perl -e'sub f { return (1,2) } @a = sort f 1,2,3'
       perl: pp_sort.c:1789: S_sortcv: Assertion `PL_stack_sp == PL_stack_base' failed.

   That has been fixed by (2) above. By removing the croak from the XS
   branch too, we make things consistent. This means that an XS sub which
   returns more than 1 arg will just get its return args evaluated in
   scalar context (so $return_args[-1] will be used), rather than being
   handled specially.
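
The trick in (2) can be modelled with a toy stack in C. Everything here is hypothetical and simplified: a string pointer stands in for &PL_sv_undef, and `sp` points at the topmost occupied slot, as PL_stack_sp does. Because slot 0 always holds the guard, reading `*sp` gives the scalar-context result without counting the returned values. That holds whether the callback returned zero, one or several.

```c
#include <stddef.h>

/* Toy model only: stands in for &PL_sv_undef in PL_stack_base[0]. */
static const char *const undef_guard = "undef";

typedef struct {
    const char *slots[8];
    const char **sp;            /* topmost occupied slot, like PL_stack_sp */
} toy_stack;

static void toy_stack_init(toy_stack *st)
{
    st->slots[0] = undef_guard; /* the guard every new stack starts with */
    st->sp = &st->slots[0];
}

/* Push one return value; returns 0 if the toy stack is full. */
static int toy_push(toy_stack *st, const char *v)
{
    if (st->sp == &st->slots[7])
        return 0;
    *++st->sp = v;
    return 1;
}

/* Scalar-context result: the last value pushed, or the guard if the
 * callback pushed nothing, since sp is then still on slot 0. */
static const char *toy_scalar_result(const toy_stack *st)
{
    return *st->sp;
}
```
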
Except when referring to actual names of things.
Also update the diagnostic description in perldiag.
Why is this change needed?
Committer: Correct two instances of double 'S<' encoding.
For: RT # 124334
Commit modifies 4 of 5 files in patch submitted by author in RT #124335.
Juerd Waalboer suggested long ago that in documentation, using the term
"rules" is better than "semantics". This changes some of the places
that had remained unchanged. I looked at the still-remaining places,
and decided that it was best to leave them as-is.
A function determines whether the space between any two characters is
a grapheme cluster break. After I wrote this, I realized that an array
lookup might be a better implementation, but the deadline for v5.22 was
too close to change it. I did see that my gcc optimized it down to
an array lookup.

This makes the implementation of \X go from being complicated to
trivial.
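
As an illustration of the array-lookup alternative, a C sketch follows. It is deliberately reduced: it models only five Grapheme_Cluster_Break values and rules GB3, GB4, GB5, GB9 and GB999 of UAX #29. The real rules also cover Hangul syllables, regional indicators, SpacingMark, Prepend and more. The enum and function names are hypothetical.

```c
#include <stdbool.h>

/* A handful of Grapheme_Cluster_Break property values. */
enum gcb { GCB_OTHER, GCB_CR, GCB_LF, GCB_CONTROL, GCB_EXTEND, GCB_COUNT };

/* gcb_break[before][after] is true when a boundary falls between them:
 * GB3 CR x LF; GB4 break after CR/LF/Control; GB5 break before them;
 * GB9 no break before Extend; GB999 break everywhere else. */
static const bool gcb_break[GCB_COUNT][GCB_COUNT] = {
    /*              OTHER  CR     LF     CONTROL EXTEND */
    /* OTHER   */ { true,  true,  true,  true,   false },
    /* CR      */ { true,  true,  false, true,   true  },
    /* LF      */ { true,  true,  true,  true,   true  },
    /* CONTROL */ { true,  true,  true,  true,   true  },
    /* EXTEND  */ { true,  true,  true,  true,   false },
};

static bool is_gcb_break(enum gcb before, enum gcb after)
{
    return gcb_break[before][after];
}
```
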
4258cf903c752ec19a3aeee9b93020533d923e1a
91e945c051cfcdf499d5b43aa5ac0a5681cdd595
eb254f2672a985ec3c34810f624f36c18fc35fc7
c9a671b17a9c588469bcef958038daaaaf9cc88b
99fcdd4df47515fb0a62a046e622adec0871754d
ba511db061a88439acb528a66c780ab574bb4fb0
0d1cf11425608e9be019f27a3a4575bc71c49e6b
c2ea8a88f8537d00ba25ec8feb63ef5dc085ef2b
b5a6eedc2f49a90089cca896ee20f41e373fb4c9
30419b527d2c5a06cefe2db9183f59e2697c47fc
29b62199cd4c359dfc6b9d690341de40d105ca5f
be181dc9d91c84a2fe03912c993c8259fed92641
4de1bcfe1abdaba0a5da394ddea0cc6fd7e36c7b
6e915616c4ccb4f6cc3122c5d395765db96c0a2d
b2e3501558a1017eb529be0915c25d31671e7869
bfaa02d55f4ace1571e6fa9e5b47d5e3ac3cecc6
569f27e562618bdddcf4a9fc71612283a73747e9
4f89311dc8de87ddc9a302c6f2d2c844951bbd28
a307a0b0d83c509cc2adaad8cebb44260294bf36
6640aa2c3b93d7ac78e4e86983fe5948b3ca55f2
b74dc0b3c96390d8bf83d8c3ffc0c2c2d1f0a5d3
c3a8e5a5b4bb89a15de642c023dfd5cbc4678938
Also: display the payload, and the number of bits
Yay, the semicolons are back.