| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 0254aed965 says in its message, among other things:
Now that the RXf_MATCH_UTF8 flag on a regex is just used to indicate
whether the captures on a successful match are utf8, only set
this flag at the end of a successful match, rather than at the start of
the match.
But if we only set the utf8 flag at the end of the match, then $^N
accessed within the match will have the flag wrongly set.
|
|
|
|
|
|
| |
Requires adding a new optional argument to regprop as we do not
have a completed regexp object to give us the names, and we need
to get it from RExC_state.
|
| |
|
|
|
|
|
|
|
|
|
| |
Previously the regex pattern compilation flags needed for this construct
would fit into an 8-bit byte. This conveniently fits into the flags
structure element of a regnode. There are changes coming that require
more than 8 bits, so in preparation, this commit adds an argument to the
node that implements (??{}) (31-bits usable for flags), and moves the
storage to that.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 60779a30f stopped setting PADTMP on GVs in pads,
and changed many instances of if(SvPADTMP && !IS_PADGV) to
if(SvPADTMP){assert(!IS_PADGV)...}.
This PADTMP was leftover from the original ithreads implementation
that marked these as PADTMP under the assumption that anything in a
pad had to be marked PADMY or PADTMP.
Since we don’t want these GVs copied the way operator targets are
(PADTMP indicates the latter), we needed this !IS_PADGV check all over
the place.
60779a30f was actually right in removing the flag and the !IS_PADGV
check, because it should be possible for XS modules to create ops that
return copiable GVs. But the assertions prevent that from working.
More importantly (to me at least), this IS_PADGV doesn’t quite make
sense and I am trying to eliminate it.
BTW, you have to be doubly naughty, but it is possible to fail these
assertions:
$ ./perl -Ilib -e 'BEGIN { sub { $::{foo} = \@_; constant::_make_const(@_) }->(*bar) } \ foo'
Assertion failed: (!IS_PADGV(sv)), function S_refto, file pp.c, line 576.
Abort trap: 6
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
See also perl5porters thread titled: "Perl MBOLism in regex engine"
In the perl 5.000 release (a0d0e21ea6ea90a22318550944fe6cb09ae10cda)
the BOL regop was split into two behaviours MBOL and SBOL, with SBOL
and BOL behaving identically. Similarly the EOL regop was split into
two behaviors SEOL and MEOL, with EOL and SEOL behaving identically.
This then resulted in various duplicative code related to flags and
case statements in various parts of the regex engine.
It appears that perhaps BOL and EOL were kept because they are the
type ("regkind") for SBOL/MBOL and SEOL/MEOL/EOS. Reworking regcomp.pl
to handle aliases for the type data so that SBOL/MBOL are of type
BOL, even though BOL == SBOL seems to cover that case without adding
to the confusion.
This means two regops, a regstate, and an internal regex flag can
be removed (and used for other things), and various logic relating
to them can be removed.
For the uninitiated, SBOL is /^/ and /\A/ (with or without /m) and
MBOL is /^/m. (I consider it a fail we have no way to say MBOL without
the /m modifier). Similarly SEOL is /$/ and MEOL is /$/m (there is
also a /\z/ which is EOS "end of string" with or without the /m).
|
|
|
|
|
|
|
|
| |
This commit allows Perl to be compiled with a bitmap size that is larger
than 256. This bitmap is used to directly look up whether a character
matches or not, without having to do a binary search or hash lookup. It
might improve the performance for some installations that have a lot of
use of scripts that are above the Latin1 range.
|
|
|
|
|
|
|
|
|
| |
These are renamed to be more clear as to their actual meanings. I know
other people have been confused by their former names.
Some of the name changes will become more important as future commits
will allow the bitmap in a bracketed character class to be a different
size.
|
|
|
|
| |
Two "if"s can be combined, leading to one fewer (unoptimized) tests
|
|
|
|
|
|
|
|
|
|
| |
There is an optimization in regcomp.c which saves memory when used, but
which caused -Dr output to display the same code points twice.
I didn't add a test to ext/re/t/regop.t because this only happens for a
Unicode property, under certain circumstances, and that triggers a lot
of other regex patterns to be compiled which are subject to change, so
the test would frequently have to be updated. This only affects
debugging output anyway.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This function, though mostly used by regexec.c and previously contained
therein, relies on the particular data structure in regcomp.c, so it
makes sense to move it adjacent to the code that generates the
structure. Future commits will add function calls to this function that
are (currently) only in regcomp.c, and so will tie it tighter to that
file.
This just moves the code with surrounding #ifdef, adds a couple of blank
lines, removes some trailing white space, and adjusts the comments. No
other changes were made.
|
|
|
|
|
|
|
| |
ANYOF nodes (for bracketed character classes) currently are for code
points 0-255. This is the first step in the eventual making that size
configurable. This also renames a static function, as the domain may
not necessarily be 'latin1'
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove (almost all) direct access to the op_sibling field of OP structs,
and use these three new macros instead:
OP_SIBLING(o);
OP_HAS_SIBLING(o);
OP_SIBLING_set(o, new_value);
OP_HAS_SIBLING is intended to be a slightly more efficient version of
OP_SIBLING when only boolean context is needed.
For now these three macros are just defined in the obvious way:
#define OP_SIBLING(o) (0 + (o)->op_sibling)
#define OP_HAS_SIBLING(o) (cBOOL((o)->op_sibling))
#define OP_SIBLING_set(o, sib) ((o)->op_sibling = (sib))
but abstracting them out will allow us shortly to make the last pointer in
an op_sibling chain point back to the parent rather than being null, with
a new flag indicating whether this is the last op.
Perl_ck_fun() still has a couple of direct uses of op_sibling, since it
takes the field's address, which is not covered by these macros.
|
|
|
|
| |
Coverity perl5 CID 68587.
|
|
|
|
| |
Keep on keeping Coverity happy (Coverity perl5 CID 45352).
|
|
|
|
| |
This puts them in a more logical order
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The root cause of this is that the loop is a "< strend" instead of a
"<=". However, the macro that is called to form the loop is used
elsewhere where "<" is appropriate. So I opted to just repeat the
small salient portions of the loop right after it so it gets executed
one final time in the final position.
This code:
if ((!prog->minlen && tmp) && (reginfo->intuit || regtry(reginfo, &s))) \
goto got_it;
had the effect previously of causing \b to look at the position just
after the string, but not \B. By doing it the other way, this is no
longer needed, except I don't understand the prog->minlen portion. It
isn't used anywhere else in the tests which see if one should goto
got_it, nor do any test suite tests fail by removing it. I think it is
a relic, but if a bisect should end up blaming this commit, I'd start
with that.
|
| |
|
|
|
|
|
| |
This reorders things to avoid unnecessary work, as the now-first line
may jump, making the assignment that used to be unconditional irrelevant
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The definition of \w is now compiled into the Perl core. This allows
the complicated swash_fetch function call to be replaced by
isWORDCHAR_utf8, which takes a single parameter, so the interface can be
simplified. [1].
This macro will execute faster on Latin1-range inputs, as it doesn't do
a swash_fetch on them, but slower on other code points due to function
call overhead, and some currently in-place error checking that wasn't
done previously. This overhead could be removed by using inline
functions, and perhaps a different interface for known non-malformed
input (though I'm actually not sure the input is known to be well-formed
in this case).
These macros still depend on and modify outside variables. That could
be cleaned up by adding additional parameters to them, but I'm not going
to do it now. I don't like these kinds of code-generating macros, and
have been tempted to rewrite these as inline functions, but it's not a
trivial task to do.
[1] I hadn't realized it before, but the interface could have been
cleaned up instead by introducting a macro that makes it look like a
single parameter is used uniformly to existing macros, looking like
#define FBC_BOUND_SWASH_FETCH(s) \
cBOOL(swash_fetch(PL_utf8_swash_ptrs[_CC_WORDCHAR], s, utf8_target))
But it seems better to me to use isWORDCHAR_utf8 as it is faster for
Western European languages, and can be made nearly the same speed as the
alternative if experience tells us that this is a slow spot that should
be sped up.
|
|
|
|
|
|
|
|
|
| |
These code-generating macros are a pain to maintain. These have side
effects and use variables outside the macros. This commit cleans up the
interface slightly by separating the operand of a function from the
function, so the code outside doesn't have to know as much about the
macro as before. The next commit will fix up the remaining parameter
like this.
|
|
|
|
|
|
| |
The names were misleading; they were not for general use, but only for
this FBC area of the code, and they no longer have to do with having to
load utf8 or not.
|
|
|
|
|
|
|
| |
I find these a pain to type and read, and there is no reason for them.
As formal parameters there is no possibility of conflict with the code
outside them. And I'm about to do some editing of these, so this change
will make that easier.
|
|
|
|
|
|
|
|
| |
You need to configure with g++ *and* -Accflags=-DPERL_GLOBAL_STRUCT
or -Accflags=-DPERL_GLOBAL_STRUCT_PRIVATE to see any difference.
(g++ does not do the "post-annotation" form of "unused".)
The version code has some of these issues, reported upstream.
|
|
|
|
|
|
| |
This reverts commit 148f39b7de6eae9ddd59e0b0aff691d6abea7aca.
(Still needs more work, but wanted to see how well this passed with Jenkins.)
|
|
|
|
|
|
| |
Definitely not *after* it. It marks the start of the unreachable,
not the first unrechable line. And if they are in that order,
it looks better to linebreak after the lint hint.
|
|
|
|
| |
(Detected in HP-UX.)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- after return/croak/die/exit, return/break are pointless
(break is not a terminator/separator, it's a goto)
- after goto, another goto (!) is pointless
- in some cases (usually function ends) introduce explicit NOT_REACHED
to make the noreturn nature clearer (do not do this everywhere, though,
since that would mean adding NOT_REACHED after every croak)
- for the added NOT_REACHED also add /* NOTREACHED */ since
NOT_REACHED is for gcc (and VC), while the comment is for linters
- declaring variables in switch blocks is just too fragile:
it kind of works for narrowing the scope (which is nice),
but breaks the moment there are initializations for the variables
(the initializations will be skipped since the flow will bypass
the start of the block); in some easy cases simply hoist the declarations
out of the block and move them earlier
Note 1: Since after this patch the core is not yet -Wunreachable-code
clean, not enabling that via cflags.SH, one needs to -Accflags=... it.
Note 2: At least with the older gcc 4.4.7 there are far too many
"unreachable code" warnings, which seem to go away with gcc 4.8,
maybe better flow control analysis. Therefore, the warning should
eventually be enabled only for modernish gccs (what about clang and
Intel cc?)
|
|
|
|
|
|
|
| |
This reverts commit 8c2b19724d117cecfa186d044abdbf766372c679.
I don't understand - smoke-me came back happy with three
separate reports... oh well, some other time.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- after croak/die/exit (or return), break (or return!) are pointless
(break is not a terminator/separator, it's a promise of a jump)
- after goto, another goto (!) is pointless
- in some cases (usually function ends) introduce explicit NOT_REACHED
to make the noreturn nature clearer (do not do this everywhere, though,
since that would mean adding NOT_REACHED after every croak)
- for the added NOT_REACHED also add /* NOTREACHED */ since
NOT_REACHED is for gcc (and VC), while the comment is for linters
- declaring variables in switch blocks is just too fragile:
it kind of works for narrowing the scope (which is nice),
but breaks the moment there are initializations for the variables
(they will be skipped!); in some easy cases simply hoist the declarations
out of the block and move them earlier
There are still a few places left.
|
|
|
|
|
|
|
|
| |
This meant sprinkling some PERL_UNUSED_CONTEXT invocations,
as well as stopping some functions from getting my_perl in
the first place; all of the functions in the latter category
are internal (S_ prefix and s or i in embed.fnc), so
this should be both safe and economical.
|
|
|
|
|
| |
The dual-life dists affected use Devel::PPPort, so can safely use
newSVpvs() even though it wasn't added until Perl v5.8.9.
|
|
|
|
| |
This uses an C automatic variable instead of a malloc and free.
|
|
|
|
| |
There are other cases where this functionality will be needed as well.
|
|
|
|
|
|
| |
We were testing for UTF-8 invariant, when we should have been testing
for ASCII. This is a problem only on EBCDIC platforms, where they mean
two different sets of code points.
|
|
|
|
|
|
|
| |
Used by linters (static checkers), and also good for human readers.
Even though "FALL THROUGH" seems to be the most common, e.g BSD lint
manual only knows "FALLTHROUGH" (or "FALLTHRU").
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Coverity suspects paths like this:
(1)
p = foo();
if (!p) p = bar();
baz(p->q);
since it cannot prove that p is always non-NULL at the dereference.
(2) Or simply something like:
mg = mg_find(...);
foo(mg->blah);
Since mg_find() can fail returning NULL.
Adding assert() calls before the dereferences. Testing with -DDEBUGGING.
Hopefully there are regular smokes doing the same.
[perl #121894]
Fix for Coverity perl5 CIDs 28950,28952..28955,28964,28967,28970..28795,49921:
CID ...: Dereference after null check (FORWARD_NULL)
var_deref_op: Dereferencing null pointer p->q
(TODO: Coverity perl5 CIDs 28962,28968,28969: the same issue, but would
conflict with already in-flight changes, prepare catch-up patch later.)
---
dist/Data-Dumper/Dumper.xs | 1 +
dump.c | 1 +
ext/Devel-Peek/Peek.xs | 1 +
ext/XS-APItest/APItest.xs | 1 +
mg.c | 2 ++
mro.c | 4 +++-
op.c | 4 ++++
perlio.c | 1 +
pp_hot.c | 5 +++--
regcomp.c | 3 +++
regexec.c | 5 ++++-
universal.c | 2 +-
util.c | 4 +++-
13 files changed, 28 insertions(+), 6 deletions(-)
diff --git a/dist/Data-Dumper/Dumper.xs b/dist/Data-Dumper/Dumper.xs
index 12c4ebd..23e0cf4 100644
--- a/dist/Data-Dumper/Dumper.xs
+++ b/dist/Data-Dumper/Dumper.xs
@@ -641,6 +641,7 @@ DD_dump(pTHX_ SV *val, const char *name, STRLEN namelen, SV *retval, HV *seenhv,
else {
sv_pattern = val;
}
+ assert(sv_pattern);
rval = SvPV(sv_pattern, rlen);
rend = rval+rlen;
slash = rval;
diff --git a/dump.c b/dump.c
index 59be3e0..c2d72fd 100644
--- a/dump.c
+++ b/dump.c
@@ -471,6 +471,7 @@ Perl_sv_peek(pTHX_ SV *sv)
finish:
while (unref--)
sv_catpv(t, ")");
+ /* XXX when is sv ever NULL? */
if (TAINTING_get && SvTAINTED(sv))
sv_catpv(t, " [tainted]");
return SvPV_nolen(t);
diff --git a/ext/Devel-Peek/Peek.xs b/ext/Devel-Peek/Peek.xs
index 679efa5..b20fa94 100644
--- a/ext/Devel-Peek/Peek.xs
+++ b/ext/Devel-Peek/Peek.xs
@@ -450,6 +450,7 @@ PPCODE:
BOOT:
{
CV * const cv = get_cvn_flags("Devel::Peek::Dump", 17, 0);
+ assert(cv);
cv_set_call_checker(cv, S_ck_dump, (SV *)cv);
XopENTRY_set(&my_xop, xop_name, "Dump");
diff --git a/ext/XS-APItest/APItest.xs b/ext/XS-APItest/APItest.xs
index a51924d..18ba381 100644
--- a/ext/XS-APItest/APItest.xs
+++ b/ext/XS-APItest/APItest.xs
@@ -2096,6 +2096,7 @@ newCONSTSUB(stash, name, flags, sv)
break;
}
EXTEND(SP, 2);
+ assert(mycv);
PUSHs( CvCONST(mycv) ? &PL_sv_yes : &PL_sv_no );
PUSHs((SV*)CvGV(mycv));
diff --git a/mg.c b/mg.c
index 76912bd..7f3339a 100644
--- a/mg.c
+++ b/mg.c
@@ -1675,6 +1675,7 @@ Perl_magic_clearisa(pTHX_ SV *sv, MAGIC *mg)
same function. */
mg = mg_find(mg->mg_obj, PERL_MAGIC_isa);
+ assert(mg);
if (SvTYPE(mg->mg_obj) == SVt_PVAV) { /* multiple stashes */
SV **svp = AvARRAY((AV *)mg->mg_obj);
I32 items = AvFILLp((AV *)mg->mg_obj) + 1;
@@ -3437,6 +3438,7 @@ Perl_magic_copycallchecker(pTHX_ SV *sv, MAGIC *mg, SV *nsv,
sv_magic(nsv, &PL_sv_undef, mg->mg_type, NULL, 0);
nmg = mg_find(nsv, mg->mg_type);
+ assert(nmg);
if (nmg->mg_flags & MGf_REFCOUNTED) SvREFCNT_dec(nmg->mg_obj);
nmg->mg_ptr = mg->mg_ptr;
nmg->mg_obj = SvREFCNT_inc_simple(mg->mg_obj);
diff --git a/mro.c b/mro.c
index 1b37ca7..ccf4bf4 100644
--- a/mro.c
+++ b/mro.c
@@ -638,12 +638,14 @@ Perl_mro_isa_changed_in(pTHX_ HV* stash)
hv_storehek(mroisarev, namehek, &PL_sv_yes);
}
- if((SV *)isa != &PL_sv_undef)
+ if ((SV *)isa != &PL_sv_undef) {
+ assert(namehek);
mro_clean_isarev(
isa, HEK_KEY(namehek), HEK_LEN(namehek),
HvMROMETA(revstash)->isa, HEK_HASH(namehek),
HEK_UTF8(namehek)
);
+ }
}
}
}
diff --git a/op.c b/op.c
index 796cb03..79621ce 100644
--- a/op.c
+++ b/op.c
@@ -2907,6 +2907,7 @@ S_my_kid(pTHX_ OP *o, OP *attrs, OP **imopsp)
S_cant_declare(aTHX_ o);
} else if (attrs) {
GV * const gv = cGVOPx_gv(cUNOPo->op_first);
+ assert(PL_parser);
PL_parser->in_my = FALSE;
PL_parser->in_my_stash = NULL;
apply_attrs(GvSTASH(gv),
@@ -2929,6 +2930,7 @@ S_my_kid(pTHX_ OP *o, OP *attrs, OP **imopsp)
else if (attrs && type != OP_PUSHMARK) {
HV *stash;
+ assert(PL_parser);
PL_parser->in_my = FALSE;
PL_parser->in_my_stash = NULL;
@@ -10229,6 +10231,7 @@ Perl_ck_split(pTHX_ OP *o)
op_append_elem(OP_SPLIT, o, newDEFSVOP());
kid = kid->op_sibling;
+ assert(kid);
scalar(kid);
if (!kid->op_sibling)
@@ -10902,6 +10905,7 @@ Perl_cv_set_call_checker(pTHX_ CV *cv, Perl_call_checker ckfun, SV *ckobj)
MAGIC *callmg;
sv_magic((SV*)cv, &PL_sv_undef, PERL_MAGIC_checkcall, NULL, 0);
callmg = mg_find((SV*)cv, PERL_MAGIC_checkcall);
+ assert(callmg);
if (callmg->mg_flags & MGf_REFCOUNTED) {
SvREFCNT_dec(callmg->mg_obj);
callmg->mg_flags &= ~MGf_REFCOUNTED;
diff --git a/perlio.c b/perlio.c
index d4c43d0..e6ff9e4 100644
--- a/perlio.c
+++ b/perlio.c
@@ -2225,6 +2225,7 @@ PerlIOBase_dup(pTHX_ PerlIO *f, PerlIO *o, CLONE_PARAMS *param, int flags)
PerlIO_funcs * const self = PerlIOBase(o)->tab;
SV *arg = NULL;
char buf[8];
+ assert(self);
PerlIO_debug("PerlIOBase_dup %s f=%p o=%p param=%p\n",
self ? self->name : "(Null)",
(void*)f, (void*)o, (void*)param);
diff --git a/pp_hot.c b/pp_hot.c
index 2cccc48..9c9d1e9 100644
--- a/pp_hot.c
+++ b/pp_hot.c
@@ -3078,6 +3078,7 @@ S_method_common(pTHX_ SV* meth, U32* hashp)
const HE* const he = hv_fetch_ent(stash, meth, 0, *hashp);
if (he) {
gv = MUTABLE_GV(HeVAL(he));
+ assert(stash);
if (isGV(gv) && GvCV(gv) &&
(!GvCVGEN(gv) || GvCVGEN(gv)
== (PL_sub_generation + HvMROMETA(stash)->cache_gen)))
@@ -3085,9 +3086,9 @@ S_method_common(pTHX_ SV* meth, U32* hashp)
}
}
+ assert(stash || packsv);
gv = gv_fetchmethod_sv_flags(stash ? stash : MUTABLE_HV(packsv),
- meth, GV_AUTOLOAD | GV_CROAK);
-
+ meth, GV_AUTOLOAD | GV_CROAK);
assert(gv);
return isGV(gv) ? MUTABLE_SV(GvCV(gv)) : MUTABLE_SV(gv);
diff --git a/regcomp.c b/regcomp.c
index eaee604..3d49827 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -2007,6 +2007,7 @@ S_make_trie(pTHX_ RExC_state_t *pRExC_state, regnode *startbranch,
});
re_trie_maxbuff = get_sv(RE_TRIE_MAXBUF_NAME, 1);
+ assert(re_trie_maxbuff);
if (!SvIOK(re_trie_maxbuff)) {
sv_setiv(re_trie_maxbuff, RE_TRIE_MAXBUF_INIT);
}
@@ -14920,6 +14921,7 @@ S_set_ANYOF_arg(pTHX_ RExC_state_t* const pRExC_state,
av_store(av, 0, (runtime_defns)
? SvREFCNT_inc(runtime_defns) : &PL_sv_undef);
if (swash) {
+ assert(cp_list);
av_store(av, 1, swash);
SvREFCNT_dec_NN(cp_list);
}
@@ -16609,6 +16611,7 @@ S_dumpuntil(pTHX_ const regexp *r, const regnode *start, const regnode *node,
last= plast;
while (PL_regkind[op] != END && (!last || node < last)) {
+ assert(node);
/* While that wasn't END last time... */
NODE_ALIGN(node);
op = OP(node);
diff --git a/regexec.c b/regexec.c
index 362390b..ffab4f2 100644
--- a/regexec.c
+++ b/regexec.c
@@ -7010,6 +7010,8 @@ no_silent:
sv_commit = &PL_sv_yes;
sv_yes_mark = &PL_sv_no;
}
+ assert(sv_err);
+ assert(sv_mrk);
sv_setsv(sv_err, sv_commit);
sv_setsv(sv_mrk, sv_yes_mark);
}
@@ -7620,6 +7622,7 @@ Perl__get_regclass_nonbitmap_data(pTHX_ const regexp *prog,
*only_utf8_locale_ptr = ary[2];
}
else {
+ assert(only_utf8_locale_ptr);
*only_utf8_locale_ptr = NULL;
}
@@ -7641,7 +7644,7 @@ Perl__get_regclass_nonbitmap_data(pTHX_ const regexp *prog,
}
else if (doinit && ((si && si != &PL_sv_undef)
|| (invlist && invlist != &PL_sv_undef))) {
-
+ assert(si);
sw = _core_swash_init("utf8", /* the utf8 package */
"", /* nameless */
si,
diff --git a/universal.c b/universal.c
index a29696d..65e02df 100644
--- a/universal.c
+++ b/universal.c
@@ -67,7 +67,7 @@ S_isa_lookup(pTHX_ HV *stash, const char * const name, STRLEN len, U32 flags)
if (our_stash) {
HEK *canon_name = HvENAME_HEK(our_stash);
if (!canon_name) canon_name = HvNAME_HEK(our_stash);
-
+ assert(canon_name);
if (hv_common(isa, NULL, HEK_KEY(canon_name), HEK_LEN(canon_name),
HEK_FLAGS(canon_name),
HV_FETCH_ISEXISTS, NULL, HEK_HASH(canon_name))) {
diff --git a/util.c b/util.c
index b90abe5..cd0afb6 100644
--- a/util.c
+++ b/util.c
@@ -850,15 +850,17 @@ Perl_fbm_instr(pTHX_ unsigned char *big, unsigned char *bigend, SV *littlestr, U
{
const MAGIC *const mg = mg_find(littlestr, PERL_MAGIC_bm);
- const unsigned char * const table = (const unsigned char *) mg->mg_ptr;
const unsigned char *oldlittle;
+ assert(mg);
+
--littlelen; /* Last char found by table lookup */
s = big + littlelen;
little += littlelen; /* last char */
oldlittle = little;
if (s < bigend) {
+ const unsigned char * const table = (const unsigned char *) mg->mg_ptr;
I32 tmp;
top2:
--
1.8.5.2 (Apple Git-48)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix for Coverity perl5 CID 29033: Out-of-bounds read
(OVERRUN) overrun-local: Overrunning array PL_fold_locale of 256 bytes at
byte offset 256 using index c1 (which evaluates to 256).
- the "c1 > 256" was off-by-one, it needed to be "c1 > 255",
it could have caused the PL_fold_locale to be accessed one past the end,
at offset 256, but we have dodged the bullet thanks to the regex engine
optimizing the bad case away before we hit it (analysis by Karl Williamson):
regexec.c
- comment fixes (pointed out by Karl Williamson): regexec.c
- add tests to nail down the behaviour of fold matching
for the last of Latin-1 (0xFF, lowercase which curiously does not have
uppercase within Latin-1). and the first pure Unicode: t/re/pat.t
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit v5.19.8-533-g63baef5 changed the handling of locale-dependent
regexes so that the pattern was considered tainted at compile-time, rather
than determining it each time at run-time whenever it executed a
locale-dependent node. Unfortunately due to the conflating of two flags,
RXf_TAINTED and RXf_TAINTED_SEEN, it had the side effect of permanently
marking a pattern as tainted once it had had a single tainted result.
E.g.
use re qw(taint);
use Scalar::Util qw(tainted);
for ($^X, "abc") {
/(.*)/ or die;
print "not " unless tainted("$1"); print "tainted\n";
};
which from 5.19.9 onwards output:
tainted
tainted
but with this commit (and with 5.19.8 and earlier), it now outputs:
tainted
not tainted
The RXf_TAINTED flag indicates that the pattern itself is tainted, e.g.
$r = qr/$tainted_value/
while the RXf_TAINTED_SEEN flag means that the results of the last match
are tainted, e.g.
use re 'tainted';
$tainted =~ /(.*)/;
# $1 is tainted
Pre 63baef5, the code used to look like:
at run-time:
turn off RXf_TAINTED_SEEN;
while (nodes to execute) {
switch(node) {
case
BOUNDL: /* and other locale-specific ops */
turn on RXf_TAINTED_SEEN;
...;
}
}
if (tainted || RXf_TAINTED)
turn on RXf_TAINTED_SEEN;
63baef5 changed it to:
at compile-time:
if (pattern has locale ops)
turn on RXf_TAINTED_SEEN;
at run-time:
while (nodes to execute) {
...
}
if (tainted || RXf_TAINTED)
turn on RXf_TAINTED_SEEN;
This commit changes it to:
at compile-time;
if (pattern has locale ops)
turn on RXf_TAINTED;
at run-time:
turn off RXf_TAINTED_SEEN;
while (nodes to execute) {
...
}
if (tainted || RXf_TAINTED)
turn on RXf_TAINTED_SEEN;
|
|
|
|
|
|
|
|
|
|
|
| |
My recent commit d0d4464849e2b30aee8 in re_intuit_start() reduced the
scope of a 'skip if multiline' check, so that certain optimisations
weren't being unnecessarily skipped. Unfortunately it didn't reduce the
scope enough, so a vital slen-- was being skipped in the
SvTAIL-but-don't-fail case.
This commit just moves the !multiline test further down, and updates the
commentary and condition formatting a bit.
|
|
|
|
|
| |
After the previous reworking of the PREGf_IMPLICIT tests, move commentary
about PREGf_IMPLICIT closer to where its now relevant.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When BmUSEFUL() for a pattern goes negative, we unset ~RXf_USE_INTUIT.
A bug fix in 2010 (c941595168) made this part of the code also remove
the RXf_ANCH_MBOL at the same time, if PREGf_IMPLICIT was set.
However, this was really just working round a bug in regexec_flags().
Once intuit was disabled, regexec_flags() would then start taking the
buggy code path.
This buggy code path was separately fixed in 2012 by 21eede782, so there's
no longer any need to remove this flag. So don't.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The ml_anch variable is supposed to indicate that a multi-line match is
possible, i.e. that the regex is anchored to a \n.
Currently we just do
ml_anch = (prog->intflags & PREGf_ANCH_MBOL);
However, MBOL is also set on /.*..../; the two cases are distinguished by
adding the PREGf_IMPLICIT flag too.
So at the moment we have lots of tests along the lines of
if (ml_anch && !(prog->intflags & PREGf_IMPLICIT))
Simplify this by adding the IMPLICIT condition when initially calculating
ml_anch, so there's no need to keep testing for it later. This also means
that ml_anch actually means what it says now.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The block of code that deals with absolute anchors looks like:
if (...) {
if (.... && !PREGf_IMPLICIT) ...;
if (FOO) {
assert(!PREGf_IMPLICIT);
....
}
}
where the constraints of FOO imply PREGf_IMPLICIT, as shown by the
assertion. Simplify this to:
if (... && !PREGf_IMPLICIT) {
if (....) ...;
if (FOO) {
....
}
}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the stclass block of code, there are various checks for anchored-ness.
Add !(prog->intflags & PREGf_IMPLICIT) to these conditions, since
from the perspective of these tests, we can only quit early etc if these
are real rather than fake anchors.
As it happens, this currently makes no logical difference, since an
PREGf_IMPLICIT pattern starts with .*, and so more or less by definition
can't have an stclass.
So this commit is really only for logical completeness, and to allow us to
change the definition of ml_anch shortly.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The 'check' block of code has special handling for if the pattern is
anchored; in this case, it allows us to set an upper bound on where a
floating substring might match; e.g. in
/^..\d?abc/
the floating string can't be more than 3 chars from the absolute beginning
of the string. Similarly with /..\G\d?abc/, it can't be more than 3 chars
from the start position (assuming that position been calculated correctly
by the caller).
However, the current code used rx_origin as the base for the offset
calculation, rather than strbeg/strpos as appropriate. This meant that
a) the first time round the loop, if strpos > strbeg, then the upper bound
would be set higher than needed;
b) if we ever go back to restart: with an incremented rx_origin, then the
upper limit is recalculated with more wasted slack at the latter end.
This commit changes the limit calculation, which reduces the following
from second to milliseconds:
$s = "abcdefg" x 1_000_000;
$s =~ /^XX\d{1,10}cde/ for 1..100;
It also adds a quick test to skip hopping when the result is likely to
leave end_point unchanged, and adds an explicit test for !PREGf_IMPLICIT.
This latter test isn't strictly necessary, as if PREGf_IMPLICIT were set,
it implies that the pattern starts with '.*', which implies that
prog->check_offset_max == SSize_t_MAX, which is already tested for.
However, it makes the overall condition more comprehensible, and makes it
more robust in the face of future changes.
|