| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For S_ functions, remove the context.
For Perl_ functions, add PERL_UNUSED_CONTEXT.
Tricky because sometimes depends on DEBUGGING, and sometimes
on whether we are have PERL_IMPLICIT_SYS.
(Why all the mathoms Perl_is_uni_... and Perl_is_utf8_...
functions are not being whined about is a mystery.)
vutil.c (included via util.c) has one of these, but it's cpan/,
and a known problem: https://rt.cpan.org/Ticket/Display.html?id=96100
|
|
|
|
|
| |
The problem was that a function was defined only in PERL_CORE, and
embed.fnc just needed to change to grant access outside that.
|
|
|
|
|
|
| |
MAD = Misc Attribute Decoration; unmaintained attempt at preserving
the Perl parse tree more faithfully so that automatic conversion to
Perl 6 would have been easier.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit also causes escaped (by a backslash) "(", "[", and "{" to be
considered literally. In the previous 2 Perl versions, the escaping was
ignored, and a (default-on) deprecation warning was raised. Now that we
have warned for 2 release cycles, we can change the meaning.of escaping
to actually do something
Warning when a literal left brace is not escaped by a backslash, will
allow us to eventually use this character in more contexts as being
meta, allowing us to extend the language. For example, the lower limit
of a quantifier could be omited, and better error checking instituted,
or things like \w could be followed by a {...} indicating some special
word character, like \w{Greek} to restrict to just Greek word
characters.
We tried to do this in v5.16, and many CPAN modules changed to backslash
their left braces at that time. However we had to back out that change
before 5.16 shipped because it turned out that escaping a left brace in
some contexts didn't work, namely when the brace would normally be a
metacharacter (for example surrounding a quantifier), and the pattern
delimiters were { }. Instead we raised the useless backslash warning
mentioned above, which has now been there for the requisite 2 cycles.
This patch partially reverts 2 patches. The first,
e62d0b1335a7959680be5f7e56910067d6f33c1f, partially reverted
the deprecation of unescaped literal left brace. The other,
4d68ffa0f7f345bc1ae6751744518ba4bc3859bd, instituted the deprecation of
the useless left-characters.
Note that, as in the original attempt to deprecate, we don't raise a
warning if the left brace is the first character in the pattern. This
is because in that position it can't be a metacharacter, so we don't
require any disambiguation, and we found that if we did raise an error,
there were quite a few places where this occurred.
|
|
|
|
|
|
|
|
| |
This meant sprinkling some PERL_UNUSED_CONTEXT invocations,
as well as stopping some functions from getting my_perl in
the first place; all of the functions in the latter category
are internal (S_ prefix and s or i in embed.fnc), so
this should be both safe and economical.
|
|
|
|
|
| |
Namely, die_nocontext, die, die_sv, and screaminstr. They
all croak and never return, so let's mark them as non-returning.
|
|
|
|
|
|
| |
After commits d6ded95025185cb1ec8ca3ba5879cab881d8b180 and
130c5df3625bd130cd1e2771308fcd4eb66cebb2, there are some compilation
warnings if not all locale categories are used.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PL_markstack_ptr was read once to do the ++ and comparison. Then after
the markstack_grow call, or not, depending on the branch. The code reads
PL_markstack_ptr a 2nd time. It has to be reread in case (or always does)
markstack_grow reallocs the mark stack. markstack_grow has a void retval.
That is a waste of a register. Let us put it to use to return the new
PL_markstack_ptr. In markstack_grow the contents that will be assigned to
PL_markstack_ptr are already in a register. So let the I32* flow out from
markstack_grow to its caller.
In VC2003 32 bit asm, mark_stack_entry is register eax. The retval of
markstack_grow is in eax. So the assignment "=" in
"mark_stack_entry = markstack_grow();" has no overhead. Since the other,
not extend branch, is function call free,
"(mark_stack_entry = ++PL_markstack_ptr)" assigns to eax. Ultimatly with
this patch a 3 byte mov instruction is saved for each instance of PUSHMARK,
and 1 interp var read is removed. I observed 42 callers of markstack_grow
with my disassembler, so theoretically 3*42 bytes of machine code was
removed for me.
Perl_pp_pushmark dropped from 0x2b to 0x28 bytes of x86 VC 2003
machine code. [perl #122034]
|
|
|
|
|
|
|
|
| |
Useful for at least debugging.
Supported in Linux and OS X (possibly to some extent in *BSD).
See perlhacktips for details.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The stringification of $! has long been an outlier in Perl locale
handling. The theory has been that these operating system messages are
likely to be of use to the final user, and should be in their language.
Things like
No space left on device
Can't fork
are not something the program is likely to handle, but could be
meaningfully helpful to the end-user.
There are problems with this though. One is that many perl messages are
in English, with the $! appended to them, so that the resultant message
is of mixed language, and may need to be translated anyway. Things like
No space left on device
probably won't need the remaining portion of the message to give someone
a clear indication as to what's wrong. But there are many other
messages where both the OS error and the Perl error would be needed
togther to understand the problem. An on-line translation tool can be
used to do this.
Another problem is that it can lead to garbage coming out on the user's
terminal when the program is not expecting UTF-8, but the underlying
locale is UTF-8. This is what happens in Bug #112208, and another that
was merged with it. It's a lot harder to translate mojibake via an
online tool than English.
This commit solves that by using the C locale for messages, except
within the scope of 'use locale'. It is extremely likely that the
messages in the C locale will be English, but if not they will be ASCII,
and there will be no garbage printed. A program that says "use locale"
is indicating that it has the intelligence necessary to deal with
locales.
|
|
|
|
|
|
|
| |
This commit allows one to specify to enable locale-awareness for only a
specified subset of the locale categories. Thus you could make a
section of code LC_MESSAGES aware, with no locale-awareness for the
other categories.
|
|
|
|
|
|
|
|
|
| |
Rare, but not unheard of, is for the strings returned by localeconv to
be in UTF-8. This commit looks for and sets the UTF-8 flag if they are.
so encoded.
A private function had to changed from static for this. It is renamed
to begin with an underscore to emphasize its private nature.
|
|
|
|
|
|
|
|
|
| |
self-documenting
The logic of setting up an AHO-CORASICK regex start class was not fully
encapsuated in the make_trie_failtable() function, which itself was
poorly named. Merged the code into make_trie_failtable() and renamed
it to construct_ahocorasick_from_trie().
|
|
|
|
|
| |
This entailed creating new internal functions for some of them to call
so that the functionality can be retained during the deprecation period.
|
|
|
|
|
|
| |
This function is now more efficiently implemented as a synonym for
isUTF8_CHAR(). We retain the Perl_is_utf8_char_buf() function for code
that calls it that way.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This macro will inline the code to determine if a character is
well-formed UTF-8 for code points below a certain value, falling back to
a slower function for larger ones. On ASCII platforms, it will inline
for well-beyond all legal Unicode code points. On EBCDIC, it currently
does it for code points up to 0x3FFF. This could be increased, but our
porting tests do the regen every time to make sure everything is ok, and
making it larger slows that down. This is worked around on ASCII by
normally commenting out the code that generates this info, but including
in utf8.h a version that did get generated. This is static information
and won't change. (This could be done for EBCDIC too, but I chose not
to at this time as each code page has a different macro generated, and
it gets ugly getting all of them in utf8.h)
Using this macro allowed for simplification of several functions in
utf8.c
|
|
|
|
|
| |
This is in preparation for it being called from outside utf8.c. It is
renamed to have a leading underscore to emphasize its private nature
|
|
|
|
| |
This is in preparation for it to be called from another place
|
|
|
|
| |
There are other cases where this functionality will be needed as well.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This brings Perl regular expressions more into conformance with Unicode.
/x now accepts 5 additional characters as white space. Use of these
characters as literals under /x has been deprecated since 5.18, so now
we are free to change what they mean.
This commit eliminates the static function that processes the old
whitespace definition (and a generated macro that was used only for
this), using the already existing one for the new definition. It
refactors slightly the static function that skips comments to mesh
better with the needs of its callers, and calls it in one place where
before the code was essentially duplicated.
p5p discussion starting in
http://nntp.perl.org/group/perl.perl5.porters/214726 convinced me that
the (?[ ]) comments should be terminated the same way as regular /x
comments, and this was also done in this commit. No prior notice is
necessary as this is an experimental feature.
|
|
|
|
| |
This automatically generates assertions for pointer arguments, etc.
|
|
|
|
|
|
|
| |
Otherwise the prototype is seen in places where the function
itself is not defined, which is illegal for inline functions.
Follow-up to 7cb3f9598b37fc.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 137da2b05b4b7628115049f343163bdaf2c30dbb. See the
"How about having a recommended way to add constant subs dynamically?"
thread on perl5-porters, specifically while it sucks that we have this
bug, it's been documented to work this way since 5.003 in "Constant
Functions" in perlsub:
If the result after optimization and constant folding is either a
constant or a lexically-scoped scalar which has no other references,
then it will be used in place of function calls made without C<&>
-- http://perldoc.perl.org/perlsub.html#Constant-Functions
Since we've had this documented bug for a long time we should introduce
this fix in a deprecation cycle rather than silently slowing down code
that assumes it's going to be optimized by constant folding.
I didn't revert the tests it t/op/sub.t, but turned them into TODO tests
instead.
Conflicts:
t/op/sub.t
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This failed as in it was producing:
Argument "123abc" treated as 0 in increment (++) at -e line 1.
when the user incremented that value (which is a lie).
This reverts commits 8140a7a801e37d147db0e5a8d89551d9d77666e0 and
2cd5095e471e1d84dc9e0b79900ebfd66aabc909.
I expect to revert this commit, and add fixes, after 5.20 is released.
Conflicts:
pod/perldiag.pod
|
|
|
|
|
|
|
| |
Perl_do_open_raw() handles the as_raw part of Perl_do_openn().
Perl_do_open6() handles the !as_raw part of Perl_do_openn().
do_open6() isn't a great name, but I can't see an obvious concise name that
covers 2 arg open, 3 arg open, piped open, implicit fork, and layers.
|
|
|
|
|
|
| |
A 12 parameter function is extremely ugly (as demonstrated by the need to add
macros for it to perl.h), but it's private, and it will permit the two-headed
public interface of Perl_do_openn() to be simplified.
|
| |
|
|
|
|
|
|
| |
Changes nothing except that it introduces hv_auxinit_interal() which
does part of the job of hv_auxinit(), so that we can call it from
somewhere else in the next commit.
|
|
|
|
|
|
|
|
| |
Perl_rpeep() elides OP_NULLs etc in the middle of an op_next chain, but
not at the start or end. Doing it at the start is hard (and not addressed
here); doing it at the end is trivial, and it just looks like a mistake in
the original code (there since 1994) that was (incorrectly) worried about
following through a null pointer.
|
|
|
|
|
|
|
| |
The static function get_ANYOF_cp_list_for_ssc() takes a struct formal
parameter that is a superset of what it actually uses. The calls to it
have to cast to that superset. By setting the parameter to the smallest
structure it uses, we simplify things.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is an optimization for OP trees that involve list OPs in list
context. In list context, the list OP's first child, a pushmark, will do
what its name claims and push a mark to the mark stack, indicating the
start of a list of parameters to another OP. Then the list's other
child OPs will do their stack pushing. Finally, the list OP will be
executed and do nothing but undo what the pushmark has done. This is
because the main effect of the list OP only really kicks in if it's
not in array context (actually, it should probably only kick in if
it's in scalar context, but I don't know of any valid examples of
list OPs in void contexts).
This optimization is quite a measurable speed-up for array or hash
slicing and some other situations. Another (contrived) example is
that (1,2,(3,4)) now actually is the same, performance-wise as
(1,2,3,4), albeit that's rarely relevant.
The price to pay for this is a slightly convoluted (by standards other
than the perl core) bit of optimization logic that has to do minor
look-ahead on certain OPs in the peephole optimizer.
A number of tests failed after the first attack on this problem. The
failures were in two categories:
a) Tests that are sensitive to details of the OP tree structure and did
verbatim text comparisons of B::Concise output (ouch). These are just
patched according to the new red in this commit.
b) Test that validly failed because certain conditions in op.c were
expecting OP_LISTs where there are now OP_NULLs (with op_targ=OP_LIST).
For these, the respective conditions in op.c were adjusted.
The change includes modifying B::Deparse to handle the new OP tree
structure in the face of nulled OP_LISTs.
|
|
|
|
|
| |
We pass in the regmatch_info struct, which allows us to dump
what a given REF is going to match.
|
|
|
|
|
|
|
|
| |
I keep forgetting that the OP of a regnode is not defined in Pass 1 of
the regex compiler. This is likely the cause of inconsistent results in
lib/locale.t, as valgrind shows there to be a read of uninitialized
data before this patch, and the result is randomly tainting when there
shouldn't be, consistent with the test failures.
|
|
|
|
| |
I don't think this function will need to be used again.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I believe this will fix the remaining alignment problems recently being
shown on gcc on HP-UX, It works on the procura machine.
regnodes should not have stricter alignment than required by U32, for
reasons given in the comments this commit adds to the beginning of
regcomp.h. Commit 31f05a37 added a new ANYOF regnode struct with a
pointer field. This requires stricter alignment on some 64-bit platforms,
and hence doesn't work on those platforms.
This commit removes that regnode struct type, and instead stores the
pointer it used via a more indirect, but already existing mechanism
that stores other data..
The function that returns that other data is enlarged to return this new
field as well. It now needs to be called from regcomp.c, so the
previous commit had renamed and made it accessible from there. The
"public" function that wraps this one is unchanged. (I put "public" in
quotes here, because I don't think anyone outside core is or should be
using it, but since it has been publicly available for a long time, I'm
treating the API as unchangeable. regcomp.c called this public function
before this commit, but needs the additional data returned by the inner
one).
|
|
|
|
|
|
| |
This is in preparation for a future commit where the function does more
things so its current name would be misleading. It will need to be
callable from regcomp.c as well.
|
|
|
|
|
|
|
|
|
|
| |
Under /i matching, many characters match only themselves, such a
punctuation. If a node contains only such characters it can be an EXACT
node. The optimizer gets better hints when dealing with EXACT nodes
than ones with folding.
This changes the alloc_maybe_populate() function to look for
possibilities of non-folding input.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The bracketed character class (e.g. /[abc]/) in regular expression
patterns is implemented as an ANYOF regnode. There are several
different structs used for these, each a superset of the next smaller
size, with extra fields tacked on to its end. Bits in the part common
to all of them are set to indicate which size this particular instance
is.
Several functions in regcomp.c take the largest of these as a formal
parameter, even though a smaller one may actually be passed. This
avoids the need to have casts to access the optional fields, but the
code needs to be careful to check the common part bits before trying to
access a portion that may not actually be present. This practice dates
to at least Perl v5.6.2.
It turns out that there is further a problem with this if the tacked-on
fields require a stricter alignment than the common fields. The code in
the functions may assume that the actual parameter has suitable
alignment, which may not be the case.
Some months ago I added some extra optional pointer fields, which have
stricter alignment requirements on 64-bit machines than the common
portion, but no apparent problems ensued.
Then, I changed things slightly, so that the gcc compiler on HP machines
found an optimization possibility whose use required the proper
alignment, which wasn't present, and bus errors started happening there.
Tony Cook diagnosed the problem. A summary of his work can be found
at http://markmail.org/message/hee5zyah7rb62c72
This commit changes the formal parameter to the smallest ANYOF struct,
and uses casts to acess the optional portions.
I don't know how common the coding style formerly used in regcomp.c is,
but it is dangerous and can lead to unrelated changes causing errors.
This commit should enable gcc builds to complete on the HP gcc smokers
(previously miniperl built, but crashed in building the rest of perl),
but we're not sure because unrelated header issues on the gcc on the
machine that we have access to prevent blead from fully compiling there.
There remain alignment bugs which will cause the tests to fail there, as
the appended pointer field needs to have strict alignment on that
platform, but when the regnodes are allocated alignment isn't done. I
am working on fixing those.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Locale initialization and setting on Windows haven't been as
described in perllocale for setting locales to "". This is because that
tells Windows to use the system default locale, as set through the
Control Panel, but on POSIX systems, it means to look at various
environment variables.
This commit creates a wrapper for setlocale, used only on Windows, that
looks for the appropriate environment variables when called with a ""
input locale. If none are found, it continues to use the system default
locale.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the "just matched float substr, now match fixed substr" branch,
initially add an extra prog->anchored_offset to the last and last2 vars;
since a lot of the later calculations involve adding anchored_offset,
doing this early to the last* vars means less work in some cases. In
particular, last is calculated from s by a single
HOP4(s, prog->anchored_offset-start_shift,...)
rather than two separate
HOP3(s, -start_shift,...);
HOP3(..., prog->anchored_offset,...);
which may mostly cancel each other out.
Similarly with last2. Later, we can skip adding prog->anchored_offset to
last1, since its antecedents already have the bias added.
In the case of failure, calculating a new start position involves an extra
HOP to s, but removes a HOP from other_last, so the two cancel out.
To make this work, I revived the reghop4() function which had been
commented out, and added a HOP4c() wrapper macro. This is like HOP3c(),
but allows you to specify both lower and upper limits. Useful when you
don't know the sign of the offset in advance.
(Yves had earlier added this function, but had commented it out until such
time as it was actually used.)
I also added some extra comments to this block and removed the comment
about it being maybe broken under utf8, since I'm auditing the code for
utf8-safeness.
|
|\ |
|
| |\ |
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Declarative syntax to unwrap argument list into lexical variables.
"sub foo ($a,$b) {...}" checks number of arguments and puts the
arguments into lexical variables. Signatures are not equivalent to the
existing idiom of "sub foo { my($a,$b) = @_; ... }". Signatures are only
available by enabling a non-default feature, and generate warnings about
being experimental. The syntactic clash with prototypes is managed by
disabling the short prototype syntax when signatures are enabled.
|
| |/
|/|
| |
| |
| | |
These constructs have been deprecated since v5.14 with the intention of
making them fatal in 5.18. This wasn't done; and is being done now.
|
|/ |
|
|
|
|
|
| |
The meaning of these was expanded two commits ago, so update the name to
reflect this, to prevent future confusion
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This large (sorry, I couldn't figure out how to meaningfully split it
up) commit causes Perl to fully support LC_CTYPE operations (case
changing, character classification) in UTF-8 locales.
As a side effect it resolves [perl #56820].
The basics are easy, but there were a lot of details, and one
troublesome edge case discussed below.
What essentially happens is that when the locale is changed to a UTF-8
one, a global variable is set TRUE (FALSE when changed to a non-UTF-8
locale). Within the scope of 'use locale', this variable is checked,
and if TRUE, the code that Perl uses for non-locale behavior is used
instead of the code for locale behavior. Since Perl's internal
representation is UTF-8, we get UTF-8 behavior for a UTF-8 locale.
More work had to be done for regular expressions. There are three
cases.
1) The character classes \w, [[:punct:]] needed no extra work, as
the changes fall out from the base work.
2) Strings that are to be matched case-insensitively. These form
EXACTFL regops (nodes). Notice that if such a string contains only
characters above-Latin1 that match only themselves, that the node can be
downgraded to an EXACT-only node, which presents better optimization
possibilities, as we now have a fixed string known at compile time to be
required to be in the target string to match. Similarly if all
characters in the string match only other above-Latin1 characters
case-insensitively, the node can be downgraded to a regular EXACTFU node
(match, folding, using Unicode, not locale, rules). The code changes
for this could be done without accepting UTF-8 locales fully, but there
were edge cases which needed to be handled differently if I stopped
there, so I continued on.
In an EXACTFL node, all such characters are now folded at compile time
(just as before this commit), while the other characters whose folds are
locale-dependent are left unfolded. This means that they have to be
folded at execution time based on the locale in effect at the moment.
Again, this isn't a change from before. The difference is that now some
of the folds that need to be done at execution time (in regexec) are
potentially multi-char. Some of the code in regexec was trivial to
extend to account for this because of existing infrastructure, but the
part dealing with regex quantifiers, had to have more work.
Also the code that joins EXACTish nodes together had to be expanded to
account for the possibility of multi-character folds within locale
handling. This was fairly easy, because it already has infrastructure
to handle these under somewhat different circumstances.
3) In bracketed character classes, represented by ANYOF nodes, a new
inversion list was created giving the characters that should be matched
by this node when the runtime locale is UTF-8. The list is ignored
except under that circumstance. To do this, I created a new ANYOF type
which has an extra SV for the inversion list.
The edge case that caused the most difficulty is folding involving the
MICRO SIGN, U+00B5. It folds to the GREEK SMALL LETTER MU, as does the
GREEK CAPITAL LETTER MU. The MICRO SIGN is the only 0-255 range
character that folds to outside that range. The issue is that it
doesn't naturally fall out that it will match the CAP MU. If we let the
CAP MU fold to the samll mu at compile time (which it can because both
are above-Latin1 and so the fold is the same no matter what locale is in
effect), it could appear that the regnode can be downgraded away from
EXACTFL to EXACTFU, but doing so would cause the MICRO SIGN to not case
insensitvely match the CAP MU. This could be special cased in regcomp
and regexec, but I wanted to avoid that. Instead the mktables tables
are set up to include the CAP MU as a character whose presence forbids
the downgrading, so the special casing is in mktables, and not in the C
code.
|
|
|
|
|
|
|
|
|
|
| |
The documentation says that Perl taints certain operations when subject
to locale rules, such as lc() and ucfirst(). Prior to this commit
there were exceptions when the operand to these functions contained no
characters whose case change actually varied depending on the locale,
for example the empty string or above-Latin1 code points. Changing to
conform to the documentation simplifies the core code, and yields more
consistent results.
|
|
|
|
| |
This is in preparation for it to be called from a 2nd place.
|