| Commit message | Author | Age | Files | Lines |
| |
'typedef enum x { ... } x' causes h2xs to enter a substitution loop while
trying to write the typemap file.
| |
The documentation was ambiguous about what type of interpolation
was disabled in single-quoted regexps. It is a bit debatable whether
"\n" in a regex is a regexp meta-escape that happens to match "\n",
or a string escape that needs to be interpolated. Since single-quoted
regexps should allow regexp meta-escapes (for instance \s), it makes
more sense to treat \n and \x{..} as regexp meta-escapes too, which
leaves nothing but variables that /could/ be interpolated.
This effectively makes the current behavior officially correct,
and will allow us to close a number of tickets. In particular we can
close #21491 as "not a bug", and probably related tickets as well.
| |
(This retains the blead customizations from 01b515d1d7 and 0fc44d0a18.)
| |
FC didn't like my previous patch for this issue, so here is the
one he likes better. With tests, etc. :-)
The basic problem is that code like this: /(?{ s!!! })/ can trigger
infinite recursion on the C stack (not the normal perl stack) when the
last successful pattern in scope is itself. Since the C stack overflows,
this manifests as an untrappable error/segfault, which then kills perl.
We avoid the segfault by simply forbidding the use of the empty pattern
when it would resolve to the currently executing pattern.
I imagine that with a bit of effort someone could still trigger the
original SEGV, which was not possible with my original fix, since that
forbade any use of the empty pattern in a regex code block. So if someone
actually reports such a bug, we might have to revert to the older
approach of prohibiting this.
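The guard described above can be sketched in C as follows. This is a simplified model, not perl's code: `struct pattern`, `last_successful`, `currently_executing` and `resolve_empty_pattern` are all hypothetical names standing in for the interpreter's real state. The point is only the comparison: an empty pattern resolves to the last successful pattern, unless that is the pattern already running, in which case the caller reports a trappable error instead of recursing.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a compiled regex (not perl's struct). */
struct pattern {
    const char *source;
};

/* Hypothetical interpreter state: the last pattern that matched successfully
   in scope, and the pattern currently being executed (NULL if none). */
static const struct pattern *last_successful = NULL;
static const struct pattern *currently_executing = NULL;

/* Resolve an empty pattern (as in s!!!). Returns 0 and sets *out on success.
   Returns -1, meaning "raise a trappable error", when the empty pattern would
   resolve to the pattern already executing: matching it again from inside its
   own code block would recurse on the C stack without bound. */
static int resolve_empty_pattern(const struct pattern *literal_empty,
                                 const struct pattern **out)
{
    const struct pattern *target =
        last_successful != NULL ? last_successful : literal_empty;
    if (target == currently_executing)
        return -1;              /* forbidden: would re-enter the running pattern */
    *out = target;
    return 0;
}
```

Checking the resolved target at the point of use, rather than banning empty patterns inside code blocks outright, is the narrower rule this commit describes.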
| |
@{^CAPTURE} exposes the capture buffers of the last match
as an array, so $1 is ${^CAPTURE}[0].
%{^CAPTURE} is equivalent to %+ (i.e. named captures).
%{^CAPTURE_ALL} is equivalent to %- (i.e. all named captures).
| |
At first I thought these would be ftz/daz (flush-to-zero/
denormals-are-zero) problems: when compiled with bare cc, those do seem
to happen with denormals (e.g. DBL_MIN * 0.5), but the "cc -ieee"
that perl is compiled with does make the ftz/daz go away. Needs
further study, so make them TODO for now.
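A small probe for the behavior discussed above, offered as a sketch rather than as one of the affected tests: under IEEE 754 gradual underflow, DBL_MIN * 0.5 must be a nonzero subnormal, while flush-to-zero (or denormals-are-zero) modes turn it into 0. The function name is hypothetical; it uses only standard <float.h> and <math.h> facilities.

```c
#include <float.h>
#include <math.h>

/* Returns 1 if halving DBL_MIN yields a subnormal double, as IEEE 754
   gradual underflow requires; returns 0 if the result was flushed to zero,
   as happens under flush-to-zero / denormals-are-zero modes. */
static int denormals_survive(void)
{
    volatile double tiny = DBL_MIN;   /* volatile: force the multiply at run time */
    double half = tiny * 0.5;
    return half != 0.0 && fpclassify(half) == FP_SUBNORMAL;
}
```

Building such a probe once with plain cc and once with the flags perl uses (for example "cc -ieee" on the platform in question) would show whether the compiler flags are what changes the result.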
| |
When a match is anchored against the start of a string, the regexp
can be compiled to include a fixed string match against a fixed
offset in the string.
In some cases, where the string being matched against contained UTF-8
characters before the fixed offset, this could result in a memcmp() that
overlapped the end of the string and potentially read past the end of
the allocated memory.
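The hazard above comes from the offset being counted in characters while memcmp() works in bytes. Here is a minimal sketch of a bounds-checked version. It is not the regex engine's code, and `utf8_char_len` and `fixed_at_char_offset` are hypothetical helpers assuming well-formed UTF-8: walk to the byte position, then confirm enough bytes remain before comparing.

```c
#include <stddef.h>
#include <string.h>

/* Length in bytes of the UTF-8 character whose first byte is c
   (assumes well-formed input; a stray continuation byte counts as 1). */
static size_t utf8_char_len(unsigned char c)
{
    if (c < 0xC0) return 1;
    if (c < 0xE0) return 2;
    if (c < 0xF0) return 3;
    return 4;
}

/* Does `lit` (llen bytes) occur in `s` (slen bytes) at character offset
   `char_ofs`? Characters before the offset may be multi-byte, so the byte
   position is found by walking, and the remaining length is checked before
   memcmp() so the comparison can never run past the end of the buffer. */
static int fixed_at_char_offset(const char *s, size_t slen, size_t char_ofs,
                                const char *lit, size_t llen)
{
    size_t pos = 0;
    while (char_ofs-- > 0) {
        if (pos >= slen) return 0;                   /* offset beyond string */
        pos += utf8_char_len((unsigned char)s[pos]);
    }
    if (pos > slen || slen - pos < llen) return 0;   /* would overrun the end */
    return memcmp(s + pos, lit, llen) == 0;
}
```

The `pos > slen` test also covers a truncated multi-byte character at the very end of the buffer.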
| |
when there's a short UTF-8 character at the end.
| |
This patch simplifies two bits of code that I came across while
working on supporting the clang -Weverything flag.
The first, in Perl_validate_proto, removes an unnecessary variable
initialization when a NULL proto is passed.
The second, in S_scan_const, rearranges some code and #ifdefs so that
the convert_unicode and real_range_max variables are only declared
if EBCDIC is set. This means we no longer have to set otherwise-useless
variables just to keep the compiler happy, and it saves us some
unnecessary "if (convert_unicode)" checks. One of the comments says
"(Compilers should optimize this out for non-EBCDIC)", but now the
compiler won't even see these unnecessary variables or tests.
| |
of 2
They are just performance bombs waiting to hit the regex engine
and other code. If someone wants this precise level of management,
then we should provide an API for them to do so.
Really this just shows the flaw in our current COW implementation.
| |
Otherwise we get degenerate performance in things like the regex
engine under certain cases.
| |
As part of testing, certain malformations are perturbed to also be
overlong to see that the combination of them is properly handled. To do
this, the code will take a test case and calculate an overlong that is
longer than it. However, if the test case is already as long as the
overlong would be, this can't be done, and the test is skipped. This
commit uses a longer overlong than before (now the maximum possible)
so that fewer tests have to be skipped.
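To make "overlong" concrete, here is a sketch that encodes a code point into a chosen number of bytes. It assumes the original 6-byte UTF-8 scheme; perl's own extended UTF-8 allows longer sequences, so its maximum overlong is longer than this sketch can produce. `encode_overlong` is a hypothetical helper. Picking `n` larger than the shortest form gives an overlong, and picking the largest `n` leaves the most room above the test case's own length, which is why fewer cases need to be skipped.

```c
#include <stddef.h>

/* Encode code point `cp` as an n-byte sequence (2 <= n <= 6) using the
   original 6-byte UTF-8 scheme. If n exceeds the shortest form, the result
   is an overlong. Returns 0 if n is out of range or cp does not fit. */
static int encode_overlong(unsigned long cp, int n, unsigned char *out)
{
    if (n < 2 || n > 6) return 0;
    int payload_bits = (7 - n) + 6 * (n - 1);   /* lead-byte bits + 6 per continuation */
    if (cp >> payload_bits) return 0;           /* too big for n bytes */
    /* Lead byte: n one-bits, a zero, then the top bits of cp. */
    out[0] = (unsigned char)(((0xFF00u >> n) & 0xFF) | (cp >> (6 * (n - 1))));
    for (int i = 1; i < n; i++)                  /* continuation bytes: 10xxxxxx */
        out[i] = (unsigned char)(0x80 | ((cp >> (6 * (n - 1 - i))) & 0x3F));
    return 1;
}
```

For example, 'A' (0x41) becomes C1 81 as a 2-byte overlong and FC 80 80 80 81 81 as a 6-byte one.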
| |
This number needs to be adjusted for EBCDIC platforms.
| |
The maximum byte length of a single code point's UTF-8 representation is
used in a bunch of places. Calculate it once.
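The "calculate it once" idea, sketched in C. This assumes standard UTF-8 capped at U+10FFFF (4 bytes). Perl itself permits code points beyond that, so its real maximum is larger. `UTF8_LEN_FOR`, `MAX_CODE_POINT` and `MAX_UTF8_BYTES` are hypothetical names. The value is derived in one place as a compile-time constant, and every user refers to that name instead of recomputing it.

```c
/* Bytes needed to encode cp in standard UTF-8 (assumes cp <= 0x10FFFF). */
#define UTF8_LEN_FOR(cp) \
    ((cp) < 0x80 ? 1 : (cp) < 0x800 ? 2 : (cp) < 0x10000 ? 3 : 4)

/* Hypothetical: the highest code point this sketch supports. */
#define MAX_CODE_POINT 0x10FFFF

/* Computed once, at compile time, rather than at each place it is needed. */
enum { MAX_UTF8_BYTES = UTF8_LEN_FOR(MAX_CODE_POINT) };
```

A caller can then size a buffer as `unsigned char buf[MAX_UTF8_BYTES];` without repeating the arithmetic.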
| |
The I8 string doesn't work the same as UTF-8, as it only takes 5 bits
from each continuation byte instead of 6.
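The difference can be shown by folding continuation bytes with a configurable payload width. This is a sketch with a hypothetical helper, and it covers continuation bytes only: UTF-8 continuation bytes look like 10xxxxxx (6 payload bits), while UTF-EBCDIC's I8 continuation bytes look like 101xxxxx (5 payload bits). Lead-byte decoding also differs between the two and is left to the caller, who passes the lead byte's payload in `acc`.

```c
#include <stddef.h>

/* Fold `n` continuation bytes into `acc`, taking `bits` payload bits from
   each: 6 for UTF-8 (10xxxxxx), 5 for UTF-EBCDIC's I8 form (101xxxxx).
   `acc` starts as the payload already extracted from the lead byte. */
static unsigned long fold_continuations(unsigned long acc, const unsigned char *p,
                                        size_t n, unsigned bits)
{
    unsigned char mask = (unsigned char)((1u << bits) - 1);
    for (size_t i = 0; i < n; i++)
        acc = (acc << bits) | (p[i] & mask);
    return acc;
}
```

Code that hard-wires a shift of 6 and a mask of 0x3F will therefore decode I8 strings incorrectly.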
| |
There are still hacks (in a good sense) for detecting "vax float"
in the cpan/ modules (patches submitted upstream, customized moves done),
but that is fine since the new Config symbols will be available only in
the future.
| |
For Windows/NetWare, it seems that many of the recent fp definitions
have not yet been copied over there [1] [2], so I went mostly by dead
reckoning [3].
[1] Note that many of them are not absolutely necessary for building.
[2] The proper updating involves doing stuff in win32, which I do not have.
[3] As far as I can tell, Windows CE does not really have long double.
| |
Note also that the computation needs to be runtime, not compiletime.
| |
(1) Do not assume it is called 'tar'.
(2) Do not assume it has the compression features.
(3) Do not assume there is only one 'tar'.
(4) Do not assume the first one found has the compression features.
(5) Add the platform executable suffix to the name.
| |
Commit da42332b10691ba7af7550035ffc7f46c87e4e66 introduced a new test.
But on EBCDIC platforms that test doesn't do what is intended. It
uses \xE4, assuming it will have a different representation when encoded
in UTF-8, and it is trying to test that having a different
representation still works. But \xE4 on EBCDIC is a UTF-8 invariant
(CAPITAL U).
perlhacktips gives some suggestions on writing tests that work on both
character sets. In this case \xB6, which is mentioned there, works, as
it is UTF-8 variant both in ASCII and in all EBCDIC code pages that
have ever been supported by Perl.
| |
I wrote this code some time ago. It is somewhat of
a state machine with some interesting implicit
assumptions which took me a while to remember. Now that
I do, it seems reasonable to document them so the next
guy (maybe/probably me) doesn't have to think so hard.
| |
Using 'idx' and 'ofs' interchangeably is confusing; calling
this first_ofs makes it more obvious what it is used for.
| |
In four places we use the same logic, so refactor
it to one macro called from four places and avoid
any future oversights missing one.
| |
We save a few ops for package vars starting with 'E'
by checking the second char as well. We could
probably be much smarter with this switch, but we
would have to generate it, which involves its own
issues.
| |
(Silence lots of used once warnings we used to not generate)
| |
optimisation
The trie code contains a number of sub-optimisations, one of which
extracts common prefixes from alternations, and another of which is a
bitmap of the possible matching first chars.
The bitmap needs to contain the possible first octets of the string
which the trie can match, and for codepoints whose first octet differs
between utf8 and non-utf8 it needs to register BOTH first octets.
So for instance in the pattern (?:a|a\x{E4}) we should restructure this
as a(|\x{E4}), and the bitmap for the trie should contain both \x{E4} AND
\x{C3}, as \x{C3} is the first byte of \x{E4} expressed as utf8.
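The rule for the first-octet bitmap can be sketched as follows. This is not the trie code itself: the type and function names are hypothetical, and code points are assumed to be at most U+10FFFF. A code point is registered under its native single-byte value (if it has one) and under the lead byte of its UTF-8 form, so the bitmap is correct whether or not the target string is UTF-8 encoded.

```c
#include <stdint.h>

/* A 256-bit bitmap of octets that may start a match. */
typedef struct { uint8_t bits[32]; } first_byte_map;

static void map_set(first_byte_map *m, unsigned byte)
{
    m->bits[byte >> 3] |= (uint8_t)(1u << (byte & 7));
}

static int map_test(const first_byte_map *m, unsigned byte)
{
    return (m->bits[byte >> 3] >> (byte & 7)) & 1;
}

/* Register a code point that can start a match: its native (non-UTF-8)
   octet if below 0x100, plus the first octet of its UTF-8 encoding. */
static void register_first_char(first_byte_map *m, unsigned cp)
{
    if (cp < 0x100)
        map_set(m, cp);                        /* non-UTF-8 octet */
    if (cp < 0x80)
        return;                                /* ASCII: identical in UTF-8 */
    if (cp < 0x800)
        map_set(m, 0xC0 | (cp >> 6));          /* 2-byte lead byte */
    else if (cp < 0x10000)
        map_set(m, 0xE0 | (cp >> 12));         /* 3-byte lead byte */
    else
        map_set(m, 0xF0 | (cp >> 18));         /* 4-byte lead byte */
}
```

Registering U+00E4 this way sets both 0xE4 and 0xC3, matching the example in the commit message.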
| |
I added alternates to a regex for matching an f/p number, but forgot
to put parentheses around them, so it was being ridiculously over-liberal.
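The commit does not show the regex itself, so here is a made-up number pattern that illustrates the precedence mistake, using POSIX extended regexes from <regex.h> rather than Perl. Alternation binds more loosely than anchors, so without a group each anchor applies to only one alternative.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if `s` matches the POSIX extended regex `pat`, 0 if it does not,
   -1 if the pattern fails to compile. */
static int matches(const char *pat, const char *s)
{
    regex_t re;
    if (regcomp(&re, pat, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* Parsed as (^-?[0-9]+) OR ([0-9]*\.[0-9]+$): each alternative is anchored
   at only one end, so "42 apples" is accepted. */
static const char *UNGROUPED = "^-?[0-9]+|[0-9]*\\.[0-9]+$";

/* Grouping the alternatives makes both anchors apply to both of them. */
static const char *GROUPED = "^-?([0-9]+|[0-9]*\\.[0-9]+)$";
```

With the group, "42 apples" is rejected while "3.14" and "-7" still match.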
| |
[perl #129954] dist/Carp/t/arg_string.t: Test fails
This test script checks that args are displayed sensibly in longmess()
output, but floating-point numbers can be displayed in various formats
depending on platform, so make the regex more forgiving.
Also add a comment to the top of the script explaining its purpose.
| |
from some compilers
| |
Some locales are incompatible with Perl. The only multi-byte locales
accepted are UTF-8 ones. ISO 646 locales can have things like a '|' be
a \w, which will create havoc with regular expressions. This commit
causes locale.t to not test such locales, and to note which ones are
skipped, and why.
If we were to do the tests anyway, it could create segfaults. For
example locales with state can have their states screwed up by Perl,
which knows nothing about that. One could argue that the locale should
be immune from bad state, and not segfault, but that is not under Perl's
control, and we will get blamed initially, anyway, when a segfault
happens, so don't test them.
The multi-byte locales are more incompatible with Perl than the 7-bit
locales that aren't ASCII compatible. So it could be argued that the
latter should be tested, but I don't want to undertake the work to
separate out the two causes from each other. The ISO 646 locales are
essentially obsolete, and hence unlikely to be encountered in practice,
so there would be little payoff for doing that work.
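The multi-byte part of the screening rule can be sketched in C with standard locale facilities. This is not what locale.t does internally, and `locale_is_testable` is a hypothetical name. The sketch treats a locale as testable if it exists and is either single-byte or has a UTF-8 codeset. It does not attempt to detect ISO 646 variants, which would need a separate check. It also assumes the codeset is reported as "UTF-8" or "utf8".

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if locale `name` exists and is single-byte or UTF-8; returns 0
   if it is unavailable or a non-UTF-8 multi-byte locale (to be skipped).
   LC_CTYPE is restored to its previous value before returning. */
static int locale_is_testable(const char *name)
{
    char saved[256];
    const char *cur = setlocale(LC_CTYPE, NULL);
    snprintf(saved, sizeof saved, "%s", cur ? cur : "C");

    int ok = 0;
    if (setlocale(LC_CTYPE, name) != NULL) {
        if (MB_CUR_MAX == 1) {
            ok = 1;                                 /* single-byte locale */
        } else {
            const char *cs = nl_langinfo(CODESET);  /* e.g. "UTF-8" */
            ok = strcmp(cs, "UTF-8") == 0 || strcmp(cs, "utf8") == 0;
        }
    }
    setlocale(LC_CTYPE, saved);
    return ok;
}
```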
| |
Prior to this commit, in order to work properly, find_locales required
the C enum number for a locale category. This relaxes that to allow the
name of the category as well. Thus it will work seamlessly when a given
category isn't on the platform. Previously, unless the call was wrapped
in an eval or checked before use, it was a potential bug to call this
function at all, because it didn't properly handle a string, and trying
to find the locale number might fail on a given platform.
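The idea of accepting a category name can be mirrored in C as a sketch. find_locales itself is Perl test code, and `locale_category_number` below is a hypothetical helper, not its implementation. Each entry is compiled in only when the platform defines that category, so asking for a missing category yields -1 instead of a failure. This assumes category numbers are non-negative, as on common platforms.

```c
#include <locale.h>
#include <string.h>

/* Map a category name such as "LC_CTYPE" to its number on this platform,
   or return -1 if the platform does not define that category. */
static int locale_category_number(const char *name)
{
    static const struct { const char *name; int number; } table[] = {
        { "LC_ALL",      LC_ALL },
        { "LC_COLLATE",  LC_COLLATE },
        { "LC_CTYPE",    LC_CTYPE },
        { "LC_MONETARY", LC_MONETARY },
        { "LC_NUMERIC",  LC_NUMERIC },
        { "LC_TIME",     LC_TIME },
#ifdef LC_MESSAGES                      /* not guaranteed by ISO C */
        { "LC_MESSAGES", LC_MESSAGES },
#endif
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].name, name) == 0)
            return table[i].number;
    return -1;
}
```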
| |
This is a simple move, with no changes. This will make things flow more
logically after a future commit.