| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unlike utf8_hop(), utf8_hop_safe() won't navigate before the
beginning or after the end of the supplied buffer.
The original version of this put all of the logic into
utf8_hop_safe(), but in many cases a caller specifically
needs to go forward or backward, and supplying the other limit
made the function less usable, so I split the function
into forward and backward cases.
This split may also make inlining these functions more efficient
or more likely.
|
|
|
|
|
|
|
|
| |
Commit 2b5e7bc2e60b4c4b5d87aa66e066363d9dce7930 changed the algorithm
for detecting overflow during decoding UTF-8 into code points. However,
on 32-bit platforms, this change caused it to claim some things overflow
that really don't. ALl such are overlong malformations, which are
normally forbidden, but not necessarily. This commit fixes that.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
In the regex engine it can be useful in debugging mode to
maintain a depth counter, but in normal mode this argument
would be unused. This allows us to define functions in embed.fnc
with a "W" flag which use _pDEPTH and _aDEPTH defines which
effectively define/pass through a U32 depth parameter to the
macro wrappers. These defines are similar to the existing
aTHX and pTHX parameters.
|
|
|
|
|
|
| |
This is enabled by a C flag, as commented. It is designed to be found
only by someone reading the code and wanting something temporary to help
in debugging.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The first can be used to wrap several SVPV steps into
a single sub, and a wrapper macro which is the equivalent
of
$s= "";
but is optimized in various ways.
|
|
|
|
|
|
|
|
|
|
| |
This new function behaves like utf8n_to_uvchr(), but takes an extra
parameter that points to a U32 which will be set to 0 if no errors are
found; otherwise each error found will set a bit in it. This can be
used by the caller to figure out precisely what the error(s) is/are.
Previously, one would have to capture and parse the warning/error
messages raised. This can be used, for example, to customize the
messages to the expected end-user's knowledge level.
|
|
|
|
|
| |
This is in preparation for the same functionality to each be used in a
new place in a future commit
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I've long been unsatisfied with the information contained in the
error/warning messages raised when some input is malformed UTF-8, but
have been reluctant to change the text in case some one is relying on
it. One reason that someone might be parsing the messages is that there
has been no convenient way to otherwise pin down what the exact
malformation might be. A few commits from now will add a facility
to get the type of malformation unambiguously. This will be a better
mechanism to use for those rare modules that need to know what's the
exact malformation.
So, I will fix and issue pull requests for any module broken by this
commit.
The messages are changed by now dumping (in \xXY format) the bytes that
make up the malformed character, and extra details are added in most
cases.
Messages about overlongs now display the code point they evaluate to and
what the shortest UTF-8 sequence for generating that code point is.
Messages about overflowing now just display that it overflows, since the
entire byte sequence is now dumped. The previous message displayed just
the byte which was being processed where overflow was detected, but that
information is not at all meaningfull.
|
|
|
|
| |
This text is generated in 2 places; consolidate into one place.
|
|
|
|
|
|
|
|
| |
This encodes a simple pattern that may not be immediately obvious to
someone needing it. If you have a fixed-size buffer that is full of
purportedly UTF-8 bytes, is it valid or not? It's easy to do, as shown
in this commit. The file test operators -T and -B can be simpified by
using this function.
|
|
|
|
|
|
|
|
|
|
| |
These functions are all extensions of the is_utf8_string_foo()
functions, that restrict the UTF-8 recognized as valid in various ways.
There are named ones for the two definitions that Unicode makes, and
foo_flags ones for more custom restrictions.
The named ones are implemented as tries, while the flags ones provide
complete generality
|
|
|
|
|
| |
This is a generalization of is_utf8_valid_partial_char to allow the
caller to automatically exclude things such as surrogates.
|
|
|
|
|
|
|
| |
This changes the name of this helper function and adds a parameter and
functionality to allow it to exclude problematic classes of code
points, the same ones excludeable by utf8n_to_uvchar(), like surrogates
or non-character code points.
|
|
|
|
|
|
| |
Actually the code isn't quite duplicate, but should be because one
instance is wrong. This failure would only show up on EBCDIC platforms.
Tests are coming in a future commit.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
$ cat > foo
#!/usr/bin/perl
print "What?!\n"
^D
$ chmod +x foo
$ ./perl -Ilib -Te '$ENV{PATH}="."; exec "foo"'
Insecure directory in $ENV{PATH} while running with -T switch at -e line 1.
That is what I expect to see. But:
$ ./perl -Ilib -Te '$ENV{PATH}="/\\:."; exec "foo"'
What?!
Perl is allowing the \ to escape the :, but the \ is not treated as an
escape by the system, allowing a relative path in PATH to be consid-
ered safe.
|
|
|
|
|
|
|
|
|
| |
This new function can test some purported UTF-8 to see if it is
well-formed as far as it goes. That is there aren't enough bytes for
the character they start, but what is there is legal so far. This can
be useful in a fixed width buffer, where the final character is split in
the middle, and we want to test without waiting for the next read that
the entire buffer is valid.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The macro isUTF8_CHAR calls a helper function for code points higher
than it can handle. That function had been an inlined wrapper around
utf8n_to_uvchr().
The function has been rewritten to not call utf8n_to_uvchr(), so it is
now too big to be effectively inlined. Instead, it implements a faster
method of checking the validity of the UTF-8 without having to decode
it. It just checks for valid syntax and now knows where the
few discontinuities are in UTF-8 where overlongs can occur, and uses a
string compare to verify that overflow won't occur.
As a result this is now a pure function.
This also causes a previously generated deprecation warning to not be,
because in printing UTF-8, no longer does it have to be converted to
internal form. I could add a check for that, but I think it's best not
to. If you manipulated what is getting printed in any way, the
deprecation message will already have been raised.
This commit also fleshes out the documentation of isUTF8_CHAR.
|
| |
|
|
|
|
|
|
| |
This is clearer as to its meaning than the existing 'is_ascii_string'
and 'is_invariant_string', which are retained for back compat. The
thread context variable is removed as it is not used.
|
|
|
|
|
|
|
|
| |
This function has been in several releases without problem, and is short
enough that some compilers can inline it. This commit also notes that
the result should not be ignored, and removes the unused pTHX. The
function has explicitly been marked as being changeable, and has not
been part of the API until now.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When I moved subroutine signature processing into perly.y with
v5.25.3-101-gd3d9da4, I added a new lexer PL_expect state, XSIGVAR.
This indicated, when about to parse a variable, that it was a signature
element rather than a my variable; in particular, it makes ($,...)
be toked as the lone sigil '$' rather than the punctuation variable '$,'.
However this is a bit heavy-handled; so instead this commit adds a
new allowed pseudo-keyword value to PL_in_my: as well as KEY_my, KEY_our and
KEY_state, it can now be KEY_sigvar. This is a less intrusive change
to the lexer.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This code is confusing, and confusion has resulted in bugs. So rework
the code a bit to make it more comprehensible.
gv_magicalize no longer has such an arcane return value. What it does
now is simply add appropriate magic to (or do appropriate vivification
in) the GV passed as its argument. It returns true if magic or vivi-
fication was applicable.
The caller (gv_fetchpvn_flags) uses that return value to determine
whether to install the GV in the symbol table, if the caller has
requested that a symbol only be added if it is a magical one
(GV_ADDMG).
This reworking does mean that the GV is now checked for content even
when it would make no difference, but I think the resulting clarity
(ahem, *relative* clarity) of the code is worth it.
|
|
|
|
|
|
| |
All the callers create the SV on the fly. We might as well put the
SV creation into the function itself. (A forthcoming commit will
refactor things to avoid the SV when possible.)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are many built-in variables that perl creates on demand for
efficiency’s sake. gv_fetchpvn_flags (which is responsible for sym-
bol lookup) will fill in those variables automatically when add-
ing a symbol.
The special GV_ADDMG flag passed to this function by a few code paths
(such as defined *{"..."}) tells gv_fetchpvn_flags to add the symbol,
but only if it is one of the ‘magical’ built-in variables that we pre-
tend already exist.
To accomplish this, when the GV_ADDMG flag is passed,
gv_fetchpvn_flags, if the symbol does not already exist, creates a new
GV that is not attached to the stash. It then runs it through its
magicalization code and checks afterward to see whether the GV
changed. If it did, then it gets added to the stash. Otherwise, it
is discarded.
Three of the variables, %-, %!, and $], are problematic, in that they
are implemented by external modules. gv_fetchpvn_flags loads those
modules, which tie the variable in question, and then control is
returned to gv_fetchpvn_flags. If it has a GV that has not been
installed in the symbol table yet, then the module will vivify that GV
on its own by a recursive call to gv_fetchpvn_flags (with the GV_ADD
flag, which does none of this temporary-dangling-GV stuff), and
gv_fetchpvn_flags will have a separate one which, when installed,
would clobber the one with the tied variable.
We solved that by having the GV installed right before calling the
module, for those three variables (in perl 5.16).
The implementation changed in commit v5.19.3-437-g930867a, which was
supposed to clean up the code and make it easier to follow. Unfortun-
ately there was a bug in the implementation. It tries to install the
GV for those cases *before* the magicalization code, but the logic is
wrong. It checks to see whether we are adding only magical symbols
(addmg) and whether the GV has anything in it, but before anything has
been added to the GV. So the symbol never gets installed. Instead,
it just leaks, and the one that the implementing module vivifies
gets used.
This leak can be observed with XS::APItest::sv_count:
$ ./perl -Ilib -MXS::APItest -e 'for (1..10){ defined *{"!"}; delete $::{"!"}; warn sv_count }'
3833 at -e line 1.
4496 at -e line 1.
4500 at -e line 1.
4504 at -e line 1.
4508 at -e line 1.
4512 at -e line 1.
4516 at -e line 1.
4520 at -e line 1.
4524 at -e line 1.
4528 at -e line 1.
Perl 5.18 does not exhibit the leak.
So in this commit I am finally implementing something that was dis-
cussed about the time that v5.19.3-437-g930867a was introduced. To
avoid the whole problem of recursive calls to gv_fetchpvn_flags vying
over whose GV counts, I have stopped the implementing modules from
tying the variables themselves. Instead, whichever gv_fetchpvn_flags
call is trying to create the glob is now responsible for seeing that
the variable is tied after the module is loaded. Each module now pro-
vides a _tie_it function that gv_fetchpvn_flags can call.
One remaining infelicity is that Errno mentions $! in its source, so
*! will be vivified when it is loading, only to be clobbered by the
GV subsequently installed by gv_fetch_pvn_flags. But at least it
will not leak.
One test that failed as a result of this (in t/op/magic.t) was try-
ing to undo the loading of Errno.pm in order to test it afresh with
*{"!"}. But it did not remove *! before the test. The new logic in
the code happens to work in such a way that the tiedness of the vari-
able determines whether the module needs to be loaded (which is neces-
sary, now that the module does not tie the variable). Since the test
is by no means normal code, it seems reasonable to change it.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently subroutine signature parsing emits many small discrete ops
to implement arg handling. This commit replaces them with a couple of ops
per signature element, plus an initial signature check op.
These new ops are added to the OP tree during parsing, so will be visible
to hooks called up to and including peephole optimisation. It is intended
soon that the peephole optimiser will take these per-element ops, and
replace them with a single OP_SIGNATURE op which handles the whole
signature in a single go. So normally these ops wont actually get executed
much. But adding these intermediate-level ops gives three advantages:
1) it allows the parser to efficiently generate subtrees containing
individual signature elements, which can't be done if only OP_SIGNATURE
or discrete ops are available;
2) prior to optimisation, it provides a simple and straightforward
representation of the signature;
3) hooks can mess with the signature OP subtree in ways that make it
no longer possible to optimise into an OP_SIGNATURE, but which can
still be executed, deparsed etc (if less efficiently).
This code:
use feature "signatures";
sub f($a, $, $b = 1, @c) {$a}
under 'perl -MO=Concise,f' now gives:
d <1> leavesub[1 ref] K/REFC,1 ->(end)
- <@> lineseq KP ->d
1 <;> nextstate(main 84 foo:6) v:%,469762048 ->2
2 <+> argcheck(3,1,@) v ->3
3 <;> nextstate(main 81 foo:6) v:%,469762048 ->4
4 <+> argelem(0)[$a:81,84] v/SV ->5
5 <;> nextstate(main 82 foo:6) v:%,469762048 ->6
8 <+> argelem(2)[$b:82,84] vKS/SV ->9
6 <|> argdefelem(other->7)[2] sK ->8
7 <$> const(IV 1) s ->8
9 <;> nextstate(main 83 foo:6) v:%,469762048 ->a
a <+> argelem(3)[@c:83,84] v/AV ->b
- <;> ex-nextstate(main 84 foo:6) v:%,469762048 ->b
b <;> nextstate(main 84 foo:6) v:%,469762048 ->c
c <0> padsv[$a:81,84] s ->d
The argcheck(3,1,@) op knows the number of positional params (3), the
number of optional params (1), and whether it has an array / hash slurpy
element at the end. This op is responsible for checking that @_ contains
the right number of args.
A simple argelem(0)[$a] op does the equivalent of 'my $a = $_[0]'.
Similarly, argelem(3)[@c] is equivalent to 'my @c = @_[3..$#_]'.
If it has a child, it gets its arg from the stack rather than using $_[N].
Currently the only used child is the logop argdefelem.
argdefelem(other->7)[2] is equivalent to '@_ > 2 ? $_[2] : other'.
[ These ops currently assume that the lexical var being introduced
is undef/empty and non-magival etc. This is an incorrect assumption and
is fixed in a few commits' time ]
|
|
|
|
| |
This is principally so that it can be accessed from perly.y too.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently the signature of a sub (i.e. the '($a, $b = 1)' bit) is parsed
in toke.c using a roll-your-own mini-parser. This commit makes
the signature be part of the general grammar in perly.y instead.
In theory it should still generate the same optree as before, except
that an OP_STUB is no longer appended to each signature optree: it's
unnecessary, and I assume that was a hangover from early development of
the original signature code.
Error messages have changed somewhat: the generic 'Parse error' has
changed to the generic 'syntax error', with the addition of ', near "xyz"'
now appended to each message.
Also, some specific error messages have been added; for example
(@a=1) now says that slurpy params can't have a default vale, rather than
just giving 'Parse error'.
It introduces a new lexer expect state, XSIGVAR, since otherwise when
the lexer saw something like '($, ...)' it would see the identifier
'$,' rather than the tokens '$' and ','.
Since it no longer uses parse_termexpr(), it is no longer subject to the
bug (#123010) associated with that; so sub f($x = print, $y) {}
is no longer mis-interpreted as sub f($x = print($_, $y)) {}
|
|
|
|
|
| |
This also involves moving some complicated debugging statements to a
separate function so that it can be called from more than one place
|
|
|
|
| |
These functions are no longer needed in re_comp.c
|
|
|
|
|
|
| |
It turns out that the changes in
0854ea0b9abfd9ff71c9dca1b5a5765dad2a20bd caused two functions to no
longer be used in re_comp.c
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Several commits in the 5.23 series improved the display of the compiled
ANYOF regnodes, but introduced two bugs. One of them is in \p{Any} and
similar things that match the entire range 0-255. That range is omitted,
so it looks like \p{Any} only matches code points above 255. Note that
this is only what gets displayed under -Dr. What actually gets compiled
has been and still is fine.
The other is that when displaying a pattern that still has unresolved
user-defined properties that are complemented, it doesn't show properly
that the whole thing is complemented. That is, the output looks like it
doesn't obey De Morgan's laws.
The fixes to these are quite intertwined, and so I didn't try to
separate them.
|
|
|
|
| |
Nothing uses it now, and it does nothing.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The function scan_word, in toke.c, is used to parse barewords. The
scan_ident function is used to scan an identifier after a sigil.
Prior to v5.17.9-108-g07f7264, both functions had their own parsing
loops, and scan_ident actually had two, one for $foo and another
for ${foo}.
The state purpose of 07f7264 was to fix discrepancies in the parsing
of $foo vs ${foo}, by making the two forms use the same parsing code.
In accomplishing this, the commit in question merged not only the
two loops in scan_ident, but all three loops, including the one in
scan_word, by introducing a new function, parse_ident, that the
others call.
One result was that some logic appropriate only to scan_word started
to be applied also to scan_ident; namely, that ::$ would be explicitly
checked for and disallowed (the parsing would stop before the ::), for
the sake of the “Bad name after Foo::” error.
The consequence was that "$foo::$bar" started to be parsed as
$foo."::".$bar, instead of $foo:: . $bar, as previously.
Now, "$foo::@bar" was unaffected, so by fixing one form of inconsis-
tency we ended up form, including B::Deparse bugs (because B::Deparse
was not consistent with the core).
This commit restores the previous behaviour by giving parse_ident an
extra parameter, making the ::$ check optional.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This subject has a long history see [perl #114576] for more discussion.
https://rt.perl.org/Public/Bug/Display.html?id=114576
There are a variety of reasons we want to change the return signature of
scalar(%hash). One is that it leaks implementation details about our
associative array structure. Another is that it requires us to keep track
of the used buckets in the hash, which we use for no other purpose but
for scalar(%hash). Another is that it is just odd. Almost nothing needs to
know these values. Perhaps debugging, but we have several much better
functions for introspecting the internals of a hash.
By changing the return signature we can remove all the logic related
to maintaining and updating xhv_fill_lazy. This should make hot code
paths a little faster, and maybe save some memory for traversed hashes.
In order to provide some form of backwards compatibility we adds three
new functions to the Hash::Util namespace: bucket_ratio(), num_buckets()
and used_buckets(). These functions are actually implemented in
universal.c, and thus always available even if Hash::Util is not loaded.
This simplifies testing. At the same time Hash::Util contains backwards
compatible code so that the new functions are available from it should
they be needed in older perls.
There are many tests in t/op/hash.t that are more or less obsolete after
this patch as they test that xhv_fill_lazy is correctly set in various
situations. However since we have a backwards compat layer we can just
switch them to use bucket_ratio(%hash) instead of scalar(%hash) and keep
the tests, just in case they are actually testing something not tested
elsewhere.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The major code changes needed to support Unicode 9.0 are to changes in
the boundary (break) rules, for things like \b{lb}, \b{wb}.
regen/mk_invlists.pl creates two-dimensional arrays for all these
properties. To see if a given point in the target string is a break or
not, regexec.c looks up the entry in the property's table whose row
corresponds to the code point before the potential break, and whose
column corresponds to the one after. Mostly this is completely
determining, but for some cases, extra context is required, and the
array entry indicates this, and there has to be specially crafted code
in regexec.c to handle each such possibility. When a new release comes
along, mk_invlists.pl has to be changed to handle any new or changed
rules, and regexec.c has to be changed to handle any changes to the
custom code.
Unfortunately this is not a mature area of the Standard, and changes are
fairly common in new releases. In part, this is because new types of
code points come along, which need new rules. Sometimes it is because
they realized the previous version didn't work as well as it could. An
example of the latter is that Unicode now realizes that Regional
Indicator (RI) characters come in pairs, and that one should be able to
break between each pair, but not within a pair. Previous versions
treated any run of them as unbreakable. (Regional Indicators are a
fairly recent type that was added to the Standard in 6.0, and things are
still getting shaken out.)
The other main changes to these rules also involve a fairly new type of
character, emojis. We can expect further changes to these in the next
Unicode releases.
\b{gcb} for the first time, now depends on context (in rarely
encountered cases, like RI's), so the function had to be changed from a
simple table look-up to be more like the functions handling the other
break properties.
Some years ago I revamped mktables in part to try to make it require as
few manual interventions as possible when upgrading to a new version of
Unicode. For example, a new data file in a release requires telling
mktables about it, but as long as it follows the format of existing
recent files, nothing else need be done to get whatever properties it
describes to be included.
Some of changes to mktables involved guessing, from existing limited
data, what the underlying paradigm for that data was. The problem with
that is there may not have been a paradigm, just something they did ad
hoc, which can change at will; or I didn't understand their unstated
thinking, and guessed wrong.
Besides the boundary rule changes, the only change that the existing
mktables couldn't cope with was the addition of the Tangut script, whose
character names include the code point, like CJK UNIFIED IDEOGRAPH-3400
has always done. The paradigm for this wasn't clear, since CJK was the
only script that had this characteristic, and so I hard-coded it into
mktables. The way Tangut is structured may show that there is a
paradigm emerging (but we only have two examples, and there may not be a
paradigm at all), and so I have guessed one, and changed mktables to
assume this guessed paradigm. If other scripts like this come along,
and I have guessed correctly, mktables will cope with these
automatically without manual intervention.
|
|
|
|
|
|
| |
The previous commit causes this function being moved to be just a
wrapper not called in core. Just in case someone is calling it, it is
retained, but moved to mathoms.c
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On some platforms, the libc strxfrm() works reasonably well on UTF-8
locales, giving a default collation ordering. It will assume that every
string passed to it is in UTF-8. This commit changes Perl to make sure
that strxfrm's expectations are met.
Likewise under a non-UTF-8 locale, strxfrm is expecting a non-UTF-8
string. And this commit makes sure of that as well.
So, simply meeting strxfrm's expectations allows Perl to start
supporting default collation in UTF-8 locales, and fixes it to work on
single-byte locales with UTF-8 input. (Unicode::Collate provides
tailorable functionality and is portable to platforms where strxfrm
isn't as intelligent, but is a much more heavy-weight solution that may
not be needed for particular applications.)
There is a problem in non-UTF-8 locales if the passed string contains
code points representable only in UTF-8. This commit causes them to be
changed, before being passed to strxfrm, into the highest collating
character in the locale that doesn't require UTF-8. They then will sort
the same as that character, which means after all other characters in
the locale but that one. In strings that don't have that character,
this will generally provide exactly correct operation. There still is a
problem, if that character, in the given locale, combines with adjacent
characters to form a specially weighted sequence. Then, the change of
these above-255 code points into that character can skew the results.
See the commit message for 6696cfa7cc3a0e1e0eab29a11ac131e6f5a3469e for
more on this. But it is really an illegal situation to have above-255
code points in a single-byte locale, so this behavior is a reasonable
degradation when given illegal input. If two transformed strings
compare exactly equal, Perl already uses the un-transformed versions to
break ties, and there, these faked-up strings will collate so the
above-255 code points sort after everything else, and in code point
order amongst themselves.
|
| |
|
|
|
|
|
|
|
| |
The functions sv_setpviv() and sv_setpviv_mg() exist only for backcompat
with 5.005. (verified with Jarkko and Nicholas). Make them compile only
when mathoms is present. They can't be moved to mathoms.c because they
both call a static function visible only in sv.c.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit changes this flag to mean that the backward compatibility
functions are compiled unless the -DNO_MATHOMS cflag is specified to
Configure. Previously the meaning was sort of like that but not
precisely.
Doing this means that the prototypes that needed to be manually added to
mathoms.c are no longer needed. No special parameter assertions have to
be made. makedef.pl no longer needs to parse mathoms.c and have special
cases for it. And several special case entries in embed.fnc can be
non-special cased.
|
|
|
|
|
|
| |
This reverts commit 2e08dfb2b133af0fbcb4346f8d096ca68454ca54, thus
reinstating the commit it reverted. This was delayed until 5.25. The
next commit will solve some problems with c++ that this commit causes
|
|
|
|
|
|
| |
This reverts commit fea1d2dd5d210564d442a09fe034b62f262f35f9 due to it
causing problems so close to the release of 5.24. See
https://rt.perl.org/Ticket/Display.html?id=127852
|
|
|
|
|
|
|
|
| |
It had rotted a bit Well, more than one probably.
Move the declarations of the functions Perl_mem_log_alloc etc from handy.h
into embed.fnc where whey belong, and where Malloc_t will have already
been defined.
|
|
|
|
| |
... thus avoiding a function call overhead
|
|
|
|
|
|
|
|
| |
this should fix the smoke failures on threaded builds,
also it renames re_indentfo which was a terrible name in the first
place, and now what i have had to strip the Perl_prefixes from
these subs with a perl -i -pe, I took the opportunity to rename
it to re_exec_indent, which self documents much better.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This introduces three new subs:
Perl_re_printf() which is a wrapper for
PerlIO_printf( Perl_debug_log, ... ),
which cuts down on clutter in the code. Arguably this could be moved
to util.c and renamed something like PerlIO_debugf() and then we could
declutter all the statements that write to the Perl_debug_log
filehandle. But that is a bit too ambituous for me right now, so
I leave this as a regex engine only sub for now.
Perl_re_indentf() which is a wrapper for PerlIO_re_printf(),
which adds an indent argument and automatically indents the
line appropriately, and is used in regcomp.c for trace diagnostics
during compilation.
Perl_re_indentfo() which is similar to Perl_re_indentf() but
is used in regexec.c which adds a specific prefix to each indented
line to account for the fact that during execution we normally have
string position information on the left.
The end result of this patch is that a lot of clutter in the debugging
statements in the regex engine is reduced, exposing what is actually
going on. It should also now be easier to add new diagnostics which
"do the right thing".
Over time the debugging trace output in regexec has become
very cluttered and confusing. This patch cleans much of it up,
if something happens at a given recursion depth it is output
at the right depth, etc, and formats have been changed to not have
leading spaces so you can actually see the indentation properly.
|
| |
|