path: root/regexec.c
Commit message, Author, Age, Files, Lines
* Manually revert "Rationalise RX_MATCH_UTF8_set()"Father Chrysostomos2014-11-051-2/+2
|
| Commit 0254aed965 says in its message, among other things:
|
|     Now that the RXf_MATCH_UTF8 flag on a regex is just used to indicate
|     whether the captures on a successful match are utf8, only set this
|     flag at the end of a successful match, rather than at the start of
|     the match.
|
| But if we only set the utf8 flag at the end of the match, then $^N
| accessed within the match will have the flag wrongly set.
|
* regcomp.c: Improve re debug output by showing buffer names if they existYves Orton2014-10-201-3/+3
| | | | | | Requires adding a new optional argument to regprop as we do not have a completed regexp object to give us the names, and we need to get it from RExC_state.
* Eliminate unused BACK regnodeAaron Crane2014-09-291-3/+0
|
* Up regex flags limit for (??{})Karl Williamson2014-09-291-1/+1
| | | | | | | | | Previously the regex pattern compilation flags needed for this construct would fit into an 8-bit byte. This conveniently fits into the flags structure element of a regnode. There are changes coming that require more than 8 bits, so in preparation, this commit adds an argument to the node that implements (??{}) (31-bits usable for flags), and moves the storage to that.
* restore color to debug diagnosticsYves Orton2014-09-251-1/+2
|
* Remove !IS_PADGV assertionsFather Chrysostomos2014-09-171-1/+0
|
| Commit 60779a30f stopped setting PADTMP on GVs in pads, and changed
| many instances of if(SvPADTMP && !IS_PADGV) to
| if(SvPADTMP){assert(!IS_PADGV)...}.
|
| This PADTMP was leftover from the original ithreads implementation
| that marked these as PADTMP under the assumption that anything in a
| pad had to be marked PADMY or PADTMP.
|
| Since we don’t want these GVs copied the way operator targets are
| (PADTMP indicates the latter), we needed this !IS_PADGV check all
| over the place.
|
| 60779a30f was actually right in removing the flag and the !IS_PADGV
| check, because it should be possible for XS modules to create ops
| that return copiable GVs. But the assertions prevent that from
| working.
|
| More importantly (to me at least), this IS_PADGV doesn’t quite make
| sense and I am trying to eliminate it.
|
| BTW, you have to be doubly naughty, but it is possible to fail these
| assertions:
|
|     $ ./perl -Ilib -e 'BEGIN { sub { $::{foo} = \@_; constant::_make_const(@_) }->(*bar) } \ foo'
|     Assertion failed: (!IS_PADGV(sv)), function S_refto, file pp.c, line 576.
|     Abort trap: 6
|
* Eliminate the duplicative regops BOL and EOLYves Orton2014-09-171-8/+5
|
| See also perl5porters thread titled: "Perl MBOLism in regex engine"
|
| In the perl 5.000 release (a0d0e21ea6ea90a22318550944fe6cb09ae10cda)
| the BOL regop was split into two behaviours MBOL and SBOL, with SBOL and
| BOL behaving identically. Similarly the EOL regop was split into two
| behaviors SEOL and MEOL, with EOL and SEOL behaving identically.
|
| This then resulted in various duplicative code related to flags and case
| statements in various parts of the regex engine.
|
| It appears that perhaps BOL and EOL were kept because they are the type
| ("regkind") for SBOL/MBOL and SEOL/MEOL/EOS. Reworking regcomp.pl to
| handle aliases for the type data so that SBOL/MBOL are of type BOL, even
| though BOL == SBOL seems to cover that case without adding to the
| confusion.
|
| This means two regops, a regstate, and an internal regex flag can be
| removed (and used for other things), and various logic relating to them
| can be removed.
|
| For the uninitiated, SBOL is /^/ and /\A/ (with or without /m) and MBOL
| is /^/m. (I consider it a fail we have no way to say MBOL without the /m
| modifier). Similarly SEOL is /$/ and MEOL is /$/m (there is also a /\z/
| which is EOS "end of string" with or without the /m).
|
* Allow for changing size of bracketed regex char classKarl Williamson2014-09-031-1/+4
| | | | | | | | This commit allows Perl to be compiled with a bitmap size that is larger than 256. This bitmap is used to directly look up whether a character matches or not, without having to do a binary search or hash lookup. It might improve the performance for some installations that have a lot of use of scripts that are above the Latin1 range.
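The direct lookup this commit refers to can be illustrated with a minimal sketch. It is not the regex engine's actual ANYOF code: `char_class`, `class_add`, `class_test`, and `CLASS_BITMAP_BITS` are hypothetical names, and the size of 256 is only a placeholder for what the commit turns into a build-time choice.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical bitmap size; the commit makes the real one configurable. */
#define CLASS_BITMAP_BITS 256

/* Toy character class: one bit per code point below CLASS_BITMAP_BITS. */
typedef struct {
    unsigned char bits[CLASS_BITMAP_BITS / CHAR_BIT];
} char_class;

/* Mark code point cp as a member of the class. */
static void class_add(char_class *cc, unsigned long cp)
{
    assert(cp < CLASS_BITMAP_BITS);
    cc->bits[cp / CHAR_BIT] |= (unsigned char)(1u << (cp % CHAR_BIT));
}

/* Returns 1 or 0 for code points covered by the bitmap, and -1 when cp
 * is outside it, meaning the caller needs some slower lookup. */
static int class_test(const char_class *cc, unsigned long cp)
{
    if (cp >= CLASS_BITMAP_BITS)
        return -1;
    return (cc->bits[cp / CHAR_BIT] >> (cp % CHAR_BIT)) & 1;
}
```

With a larger `CLASS_BITMAP_BITS`, more code points would be answered by the single shift-and-mask instead of falling through to the slower path, which is the trade-off the commit message describes.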
* Rename some internal regex #definesKarl Williamson2014-09-031-8/+10
| | | | | | | | | These are renamed to be more clear as to their actual meanings. I know other people have been confused by their former names. Some of the name changes will become more important as future commits will allow the bitmap in a bracketed character class to be a different size.
* regexec.c: Simplify a short code sectionKarl Williamson2014-09-031-5/+5
| | | | Two "if"s can be combined, leading to one fewer (unoptimized) tests
* Avoid redundant text in -Dr outputKarl Williamson2014-08-211-2/+2
| | | | | | | | | | There is an optimization in regcomp.c which saves memory when used, but which caused -Dr output to display the same code points twice. I didn't add a test to ext/re/t/regop.t because this only happens for a Unicode property, under certain circumstances, and that triggers a lot of other regex patterns to be compiled which are subject to change, so the test would frequently have to be updated. This only affects debugging output anyway.
* Move _get_regclass_nonbitmap_data() to regcomp.cKarl Williamson2014-08-211-112/+0
| | | | | | | | | | | | | This function, though mostly used by regexec.c and previously contained therein, relies on the particular data structure in regcomp.c, so it makes sense to move it adjacent to the code that generates the structure. Future commits will add function calls to this function that are (currently) only in regcomp.c, and so will tie it tighter to that file. This just moves the code with surrounding #ifdef, adds a couple of blank lines, removes some trailing white space, and adjusts the comments. No other changes were made.
* regex: Use #define for number of bits in ANYOFKarl Williamson2014-08-211-1/+1
| | | | | | | ANYOF nodes (for bracketed character classes) currently are for code points 0-255. This is the first step in the eventual making that size configurable. This also renames a static function, as the domain may not necessarily be 'latin1'
* Move return false out of switch default.Jarkko Hietaniemi2014-07-291-8/+2
|
* NOTREACHED goes at/in the unreachable, not after it.Jarkko Hietaniemi2014-07-281-47/+90
|
* Negatives as UVs: sign-extension intentional, add cast.Jarkko Hietaniemi2014-07-281-2/+2
|
* wrap op_sibling field access in OP_SIBLING* macrosDavid Mitchell2014-07-081-1/+1
|
| Remove (almost all) direct access to the op_sibling field of OP structs,
| and use these three new macros instead:
|
|     OP_SIBLING(o);
|     OP_HAS_SIBLING(o);
|     OP_SIBLING_set(o, new_value);
|
| OP_HAS_SIBLING is intended to be a slightly more efficient version of
| OP_SIBLING when only boolean context is needed.
|
| For now these three macros are just defined in the obvious way:
|
|     #define OP_SIBLING(o)           (0 + (o)->op_sibling)
|     #define OP_HAS_SIBLING(o)       (cBOOL((o)->op_sibling))
|     #define OP_SIBLING_set(o, sib)  ((o)->op_sibling = (sib))
|
| but abstracting them out will allow us shortly to make the last pointer
| in an op_sibling chain point back to the parent rather than being null,
| with a new flag indicating whether this is the last op.
|
| Perl_ck_fun() still has a couple of direct uses of op_sibling, since it
| takes the field's address, which is not covered by these macros.
|
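As a rough illustration of how the three macros are meant to be used, here is a self-contained sketch on a toy struct. `toy_op` and `count_siblings` are hypothetical, and `OP_HAS_SIBLING` is written with a plain NULL comparison because `cBOOL` is Perl-internal; only the macro shapes follow the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for Perl's OP struct: just the field the macros touch. */
typedef struct toy_op {
    int id;
    struct toy_op *op_sibling;
} toy_op;

/* Same shape as in the commit message. The "0 +" makes OP_SIBLING an
 * rvalue, so it cannot be assigned to or have its address taken. */
#define OP_SIBLING(o)           (0 + (o)->op_sibling)
#define OP_HAS_SIBLING(o)       ((o)->op_sibling != NULL)
#define OP_SIBLING_set(o, sib)  ((o)->op_sibling = (sib))

/* Walk a sibling chain through the accessor only. */
static int count_siblings(toy_op *first)
{
    int n = 0;
    toy_op *o;
    for (o = first; o; o = OP_SIBLING(o))
        n++;
    return n;
}
```

Because callers never touch `op_sibling` directly, the representation of the end of the chain could later change behind the macros without editing every loop like the one above.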
* Assert before deref due to possible NULL.Jarkko Hietaniemi2014-06-301-0/+1
| | | | Coverity perl5 CID 68587.
* Guard cur_curlyx at least with an assert.Jarkko Hietaniemi2014-06-271-3/+6
| | | | Keep on keeping Coverity happy (Coverity perl5 CID 45352).
* regexec.c: Move some macro definitions aroundKarl Williamson2014-06-261-38/+38
| | | | This puts them in a more logical order
* PATCH: [perl #122090] Non-word-boundary doesn't match EOSKarl Williamson2014-06-261-2/+12
|
| The root cause of this is that the loop is a "< strend" instead of a
| "<=". However, the macro that is called to form the loop is used
| elsewhere where "<" is appropriate. So I opted to just repeat the small
| salient portions of the loop right after it so it gets executed one
| final time in the final position.
|
| This code:
|
|     if ((!prog->minlen && tmp) && (reginfo->intuit || regtry(reginfo, &s))) \
|         goto got_it;
|
| had the effect previously of causing \b to look at the position just
| after the string, but not \B. By doing it the other way, this is no
| longer needed, except I don't understand the prog->minlen portion. It
| isn't used anywhere else in the tests which see if one should goto
| got_it, nor do any test suite tests fail by removing it. I think it is
| a relic, but if a bisect should end up blaming this commit, I'd start
| with that.
|
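The "<" versus "<=" issue can be modelled in a few lines. This is a simplified sketch, not the FBC macro code: `is_word` and `first_non_boundary` are hypothetical helpers, word characters are approximated with `isalnum` plus underscore, and positions outside the string are treated as non-word.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Rough ASCII-only approximation of \w, for illustration. */
static int is_word(unsigned char c) { return isalnum(c) || c == '_'; }

/* Offset of the first position where \B would match, or -1 if none.
 * With include_end == 0 this mimics a loop that stops at s < strend
 * and therefore never examines the position just past the last char. */
static long first_non_boundary(const char *str, int include_end)
{
    const char *strend = str + strlen(str);
    const char *s;
    int prev = 0;                 /* before the string counts as non-word */

    for (s = str; s < strend; s++) {
        int cur = is_word((unsigned char)*s);
        if (prev == cur)          /* same kind on both sides: not a boundary */
            return (long)(s - str);
        prev = cur;
    }

    /* The repeated final step: at s == strend the "next" character is
     * treated as non-word, so \B matches when the last one was too. */
    if (include_end && prev == 0)
        return (long)(strend - str);
    return -1;
}
```

For the input `"a!"`, the only \B position is offset 2, the end of the string, so the loop alone reports no match while the extra final step finds it.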
* regexec.c: Comments and white-space onlyKarl Williamson2014-06-261-87/+126
|
* regexec.c: Exchange 2 lines of code and add commentsKarl Williamson2014-06-261-2/+11
| | | | | This reorders things to avoid unnecessary work, as the now-first line may jump, making the assignment that used to be unconditional irrelevant
* regexec.c: More cleaning of FBC macro/code interfaceKarl Williamson2014-06-261-35/+18
|
| The definition of \w is now compiled into the Perl core. This allows the
| complicated swash_fetch function call to be replaced by isWORDCHAR_utf8,
| which takes a single parameter, so the interface can be simplified. [1]
|
| This macro will execute faster on Latin1-range inputs, as it doesn't do
| a swash_fetch on them, but slower on other code points due to function
| call overhead, and some currently in-place error checking that wasn't
| done previously. This overhead could be removed by using inline
| functions, and perhaps a different interface for known non-malformed
| input (though I'm actually not sure the input is known to be
| well-formed in this case).
|
| These macros still depend on and modify outside variables. That could be
| cleaned up by adding additional parameters to them, but I'm not going to
| do it now. I don't like these kinds of code-generating macros, and have
| been tempted to rewrite these as inline functions, but it's not a
| trivial task to do.
|
| [1] I hadn't realized it before, but the interface could have been
| cleaned up instead by introducing a macro that makes it look like a
| single parameter is used uniformly to existing macros, looking like
|
|     #define FBC_BOUND_SWASH_FETCH(s) \
|         cBOOL(swash_fetch(PL_utf8_swash_ptrs[_CC_WORDCHAR], s, utf8_target))
|
| But it seems better to me to use isWORDCHAR_utf8 as it is faster for
| Western European languages, and can be made nearly the same speed as
| the alternative if experience tells us that this is a slow spot that
| should be sped up.
|
* regexec.c: Clean up macro/code interface slightlyKarl Williamson2014-06-261-17/+16
| | | | | | | | | These code-generating macros are a pain to maintain. These have side effects and use variables outside the macros. This commit cleans up the interface slightly by separating the operand of a function from the function, so the code outside doesn't have to know as much about the macro as before. The next commit will fix up the remaining parameter like this.
* regexec.c: Change names of 4 macrosKarl Williamson2014-06-261-14/+14
| | | | | | The names were misleading; they were not for general use, but only for this FBC area of the code, and they no longer have to do with having to load utf8 or not.
* regexec.c: Change MiXeD cAsE formal macro parametersKarl Williamson2014-06-261-16/+16
| | | | | | | I find these a pain to type and read, and there is no reason for them. As formal parameters there is no possibility of conflict with the code outside them. And I'm about to do some editing of these, so this change will make that easier.
* Remove or downgrade unnecessary dVAR.Jarkko Hietaniemi2014-06-251-15/+0
| | | | | | | | You need to configure with g++ *and* -Accflags=-DPERL_GLOBAL_STRUCT or -Accflags=-DPERL_GLOBAL_STRUCT_PRIVATE to see any difference. (g++ does not do the "post-annotation" form of "unused".) The version code has some of these issues, reported upstream.
* Revert "/* NOTREACHED */ belongs *before* the unreachable."Jarkko Hietaniemi2014-06-191-86/+43
| | | | | | This reverts commit 148f39b7de6eae9ddd59e0b0aff691d6abea7aca. (Still needs more work, but wanted to see how well this passed with Jenkins.)
* /* NOTREACHED */ belongs *before* the unreachable.Jarkko Hietaniemi2014-06-191-43/+86
| | | | | | Definitely not *after* it. It marks the start of the unreachable, not the first unreachable line. And if they are in that order, it looks better to linebreak after the lint hint.
* Do not printf U32 and I32 (both) as %i.Jarkko Hietaniemi2014-06-161-2/+3
| | | | (Detected in HP-UX.)
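The underlying portability point is that a signed and an unsigned 32-bit value need different conversion specifiers. The sketch below uses the C99 `<inttypes.h>` macros as a stand-in; Perl's own `U32`/`I32` types and format macros are not shown, and `format_pair` is a hypothetical helper.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Format an unsigned and a signed 32-bit value, each with a specifier
 * that matches its type, instead of using %i for both. */
static int format_pair(char *buf, size_t len, uint32_t u, int32_t i)
{
    return snprintf(buf, len, "%" PRIu32 " %" PRId32, u, i);
}
```

Passing the largest unsigned value shows the difference: with a matching specifier it prints as 4294967295 rather than being reinterpreted as a negative number.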
* Some low-hanging -Wunreachable-code fruits.Jarkko Hietaniemi2014-06-151-3/+0
|
| - after return/croak/die/exit, return/break are pointless
|   (break is not a terminator/separator, it's a goto)
|
| - after goto, another goto (!) is pointless
|
| - in some cases (usually function ends) introduce explicit NOT_REACHED
|   to make the noreturn nature clearer
|   (do not do this everywhere, though, since that would mean adding
|   NOT_REACHED after every croak)
|
| - for the added NOT_REACHED also add /* NOTREACHED */ since
|   NOT_REACHED is for gcc (and VC), while the comment is for linters
|
| - declaring variables in switch blocks is just too fragile:
|   it kind of works for narrowing the scope (which is nice),
|   but breaks the moment there are initializations for the variables
|   (the initializations will be skipped since the flow will bypass
|   the start of the block); in some easy cases simply hoist the
|   declarations out of the block and move them earlier
|
| Note 1: Since after this patch the core is not yet -Wunreachable-code
| clean, not enabling that via cflags.SH, one needs to -Accflags=... it.
|
| Note 2: At least with the older gcc 4.4.7 there are far too many
| "unreachable code" warnings, which seem to go away with gcc 4.8,
| maybe better flow control analysis. Therefore, the warning should
| eventually be enabled only for modernish gccs (what about clang and
| Intel cc?)
|
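The last bullet, about declarations inside switch blocks, is easy to miss, so here is a small sketch of the hoisting it recommends. The function `scaled` and its values are made up for illustration; the fragile form is shown only in a comment.

```c
#include <assert.h>

/* Fragile form, shown only as a comment: control jumps straight to a
 * case label, so the initializer at the top of the block never runs.
 *
 *     switch (kind) {
 *         int scale = 10;          // skipped: no case label leads here
 *     case 1:
 *         return value * scale;    // reads an uninitialized variable
 *     }
 */

/* Hoisted form: the declaration and its initializer come before the
 * switch, so every case sees an initialized value. */
static int scaled(int kind, int value)
{
    int scale = 10;

    switch (kind) {
    case 1:
        return value * scale;
    case 2:
        return value * scale * scale;
    default:
        return value;
    }
}
```

The cost of hoisting is a slightly wider scope for `scale`, which is the trade-off the bullet acknowledges.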
* Revert "Some low-hanging -Wunreachable-code fruits."Jarkko Hietaniemi2014-06-131-0/+3
| | | | | | | This reverts commit 8c2b19724d117cecfa186d044abdbf766372c679. I don't understand - smoke-me came back happy with three separate reports... oh well, some other time.
* Some low-hanging -Wunreachable-code fruits.Jarkko Hietaniemi2014-06-131-3/+0
|
| - after croak/die/exit (or return), break (or return!) are pointless
|   (break is not a terminator/separator, it's a promise of a jump)
|
| - after goto, another goto (!) is pointless
|
| - in some cases (usually function ends) introduce explicit NOT_REACHED
|   to make the noreturn nature clearer
|   (do not do this everywhere, though, since that would mean adding
|   NOT_REACHED after every croak)
|
| - for the added NOT_REACHED also add /* NOTREACHED */ since
|   NOT_REACHED is for gcc (and VC), while the comment is for linters
|
| - declaring variables in switch blocks is just too fragile:
|   it kind of works for narrowing the scope (which is nice),
|   but breaks the moment there are initializations for the variables
|   (they will be skipped!); in some easy cases simply hoist the
|   declarations out of the block and move them earlier
|
| There are still a few places left.
|
* Silence several -Wunused-parameter warnings about my_perlBrian Fraser2014-06-131-1/+1
| | | | | | | | This meant sprinkling some PERL_UNUSED_CONTEXT invocations, as well as stopping some functions from getting my_perl in the first place; all of the functions in the latter category are internal (S_ prefix and s or i in embed.fnc), so this should be both safe and economical.
* Change newSVpvn("…", …) to newSVpvs("…")Dagfinn Ilmari Mannsåker2014-06-121-1/+1
| | | | | The dual-life dists affected use Devel::PPPort, so can safely use newSVpvs() even though it wasn't added until Perl v5.8.9.
* regexec.c: Eliminate a malloc/freeKarl Williamson2014-05-301-5/+4
| | | | This uses a C automatic variable instead of a malloc and free.
* regcomp.c, regexec.c: Move common code to a functionKarl Williamson2014-05-301-8/+1
| | | | There are other cases where this functionality will be needed as well.
* regexec.c: Fix some EBCDIC problemsKarl Williamson2014-05-301-6/+4
| | | | | | We were testing for UTF-8 invariant, when we should have been testing for ASCII. This is a problem only on EBCDIC platforms, where they mean two different sets of code points.
* Unify the "fall-through" lint annotation.Jarkko Hietaniemi2014-05-291-17/+17
| | | | | | | Used by linters (static checkers), and also good for human readers. Even though "FALL THROUGH" seems to be the most common, e.g BSD lint manual only knows "FALLTHROUGH" (or "FALLTHRU").
* Insert asserts to paths suspected by Coverity.Jarkko Hietaniemi2014-05-291-1/+4
|
| Coverity suspects paths like this:
|
| (1) p = foo();
|     if (!p) p = bar();
|     baz(p->q);
|
|     since it cannot prove that p is always non-NULL at the dereference.
|
| (2) Or simply something like:
|
|     mg = mg_find(...);
|     foo(mg->blah);
|
|     Since mg_find() can fail returning NULL.
|
| Adding assert() calls before the dereferences.
| Testing with -DDEBUGGING. Hopefully there are regular smokes doing the
| same.
|
| [perl #121894]
|
| Fix for Coverity perl5 CIDs 28950,28952..28955,28964,28967,28970..28795,49921:
| CID ...: Dereference after null check (FORWARD_NULL)
| var_deref_op: Dereferencing null pointer p->q
|
| (TODO: Coverity perl5 CIDs 28962,28968,28969: the same issue, but would
| conflict with already in-flight changes, prepare catch-up patch later.)
|
--- dist/Data-Dumper/Dumper.xs | 1 + dump.c | 1 + ext/Devel-Peek/Peek.xs | 1 + ext/XS-APItest/APItest.xs | 1 + mg.c | 2 ++ mro.c | 4 +++- op.c | 4 ++++ perlio.c | 1 + pp_hot.c | 5 +++-- regcomp.c | 3 +++ regexec.c | 5 ++++- universal.c | 2 +- util.c | 4 +++- 13 files changed, 28 insertions(+), 6 deletions(-) diff --git a/dist/Data-Dumper/Dumper.xs b/dist/Data-Dumper/Dumper.xs index 12c4ebd..23e0cf4 100644 --- a/dist/Data-Dumper/Dumper.xs +++ b/dist/Data-Dumper/Dumper.xs @@ -641,6 +641,7 @@ DD_dump(pTHX_ SV *val, const char *name, STRLEN namelen, SV *retval, HV *seenhv, else { sv_pattern = val; } + assert(sv_pattern); rval = SvPV(sv_pattern, rlen); rend = rval+rlen; slash = rval; diff --git a/dump.c b/dump.c index 59be3e0..c2d72fd 100644 --- a/dump.c +++ b/dump.c @@ -471,6 +471,7 @@ Perl_sv_peek(pTHX_ SV *sv) finish: while (unref--) sv_catpv(t, ")"); + /* XXX when is sv ever NULL? */ if (TAINTING_get && SvTAINTED(sv)) sv_catpv(t, " [tainted]"); return SvPV_nolen(t); diff --git a/ext/Devel-Peek/Peek.xs b/ext/Devel-Peek/Peek.xs index 679efa5..b20fa94 100644 --- a/ext/Devel-Peek/Peek.xs +++ b/ext/Devel-Peek/Peek.xs @@ -450,6 +450,7 @@ PPCODE: BOOT: { CV * const cv = get_cvn_flags("Devel::Peek::Dump", 17, 0); + assert(cv); cv_set_call_checker(cv, S_ck_dump, (SV *)cv); XopENTRY_set(&my_xop, xop_name, "Dump"); diff --git a/ext/XS-APItest/APItest.xs b/ext/XS-APItest/APItest.xs index a51924d..18ba381 100644 --- a/ext/XS-APItest/APItest.xs +++ b/ext/XS-APItest/APItest.xs @@ -2096,6 +2096,7 @@ newCONSTSUB(stash, name, flags, sv) break; } EXTEND(SP, 2); + assert(mycv); PUSHs( CvCONST(mycv) ? &PL_sv_yes : &PL_sv_no ); PUSHs((SV*)CvGV(mycv)); diff --git a/mg.c b/mg.c index 76912bd..7f3339a 100644 --- a/mg.c +++ b/mg.c @@ -1675,6 +1675,7 @@ Perl_magic_clearisa(pTHX_ SV *sv, MAGIC *mg) same function. 
*/ mg = mg_find(mg->mg_obj, PERL_MAGIC_isa); + assert(mg); if (SvTYPE(mg->mg_obj) == SVt_PVAV) { /* multiple stashes */ SV **svp = AvARRAY((AV *)mg->mg_obj); I32 items = AvFILLp((AV *)mg->mg_obj) + 1; @@ -3437,6 +3438,7 @@ Perl_magic_copycallchecker(pTHX_ SV *sv, MAGIC *mg, SV *nsv, sv_magic(nsv, &PL_sv_undef, mg->mg_type, NULL, 0); nmg = mg_find(nsv, mg->mg_type); + assert(nmg); if (nmg->mg_flags & MGf_REFCOUNTED) SvREFCNT_dec(nmg->mg_obj); nmg->mg_ptr = mg->mg_ptr; nmg->mg_obj = SvREFCNT_inc_simple(mg->mg_obj); diff --git a/mro.c b/mro.c index 1b37ca7..ccf4bf4 100644 --- a/mro.c +++ b/mro.c @@ -638,12 +638,14 @@ Perl_mro_isa_changed_in(pTHX_ HV* stash) hv_storehek(mroisarev, namehek, &PL_sv_yes); } - if((SV *)isa != &PL_sv_undef) + if ((SV *)isa != &PL_sv_undef) { + assert(namehek); mro_clean_isarev( isa, HEK_KEY(namehek), HEK_LEN(namehek), HvMROMETA(revstash)->isa, HEK_HASH(namehek), HEK_UTF8(namehek) ); + } } } } diff --git a/op.c b/op.c index 796cb03..79621ce 100644 --- a/op.c +++ b/op.c @@ -2907,6 +2907,7 @@ S_my_kid(pTHX_ OP *o, OP *attrs, OP **imopsp) S_cant_declare(aTHX_ o); } else if (attrs) { GV * const gv = cGVOPx_gv(cUNOPo->op_first); + assert(PL_parser); PL_parser->in_my = FALSE; PL_parser->in_my_stash = NULL; apply_attrs(GvSTASH(gv), @@ -2929,6 +2930,7 @@ S_my_kid(pTHX_ OP *o, OP *attrs, OP **imopsp) else if (attrs && type != OP_PUSHMARK) { HV *stash; + assert(PL_parser); PL_parser->in_my = FALSE; PL_parser->in_my_stash = NULL; @@ -10229,6 +10231,7 @@ Perl_ck_split(pTHX_ OP *o) op_append_elem(OP_SPLIT, o, newDEFSVOP()); kid = kid->op_sibling; + assert(kid); scalar(kid); if (!kid->op_sibling) @@ -10902,6 +10905,7 @@ Perl_cv_set_call_checker(pTHX_ CV *cv, Perl_call_checker ckfun, SV *ckobj) MAGIC *callmg; sv_magic((SV*)cv, &PL_sv_undef, PERL_MAGIC_checkcall, NULL, 0); callmg = mg_find((SV*)cv, PERL_MAGIC_checkcall); + assert(callmg); if (callmg->mg_flags & MGf_REFCOUNTED) { SvREFCNT_dec(callmg->mg_obj); callmg->mg_flags &= ~MGf_REFCOUNTED; diff --git 
a/perlio.c b/perlio.c index d4c43d0..e6ff9e4 100644 --- a/perlio.c +++ b/perlio.c @@ -2225,6 +2225,7 @@ PerlIOBase_dup(pTHX_ PerlIO *f, PerlIO *o, CLONE_PARAMS *param, int flags) PerlIO_funcs * const self = PerlIOBase(o)->tab; SV *arg = NULL; char buf[8]; + assert(self); PerlIO_debug("PerlIOBase_dup %s f=%p o=%p param=%p\n", self ? self->name : "(Null)", (void*)f, (void*)o, (void*)param); diff --git a/pp_hot.c b/pp_hot.c index 2cccc48..9c9d1e9 100644 --- a/pp_hot.c +++ b/pp_hot.c @@ -3078,6 +3078,7 @@ S_method_common(pTHX_ SV* meth, U32* hashp) const HE* const he = hv_fetch_ent(stash, meth, 0, *hashp); if (he) { gv = MUTABLE_GV(HeVAL(he)); + assert(stash); if (isGV(gv) && GvCV(gv) && (!GvCVGEN(gv) || GvCVGEN(gv) == (PL_sub_generation + HvMROMETA(stash)->cache_gen))) @@ -3085,9 +3086,9 @@ S_method_common(pTHX_ SV* meth, U32* hashp) } } + assert(stash || packsv); gv = gv_fetchmethod_sv_flags(stash ? stash : MUTABLE_HV(packsv), - meth, GV_AUTOLOAD | GV_CROAK); - + meth, GV_AUTOLOAD | GV_CROAK); assert(gv); return isGV(gv) ? MUTABLE_SV(GvCV(gv)) : MUTABLE_SV(gv); diff --git a/regcomp.c b/regcomp.c index eaee604..3d49827 100644 --- a/regcomp.c +++ b/regcomp.c @@ -2007,6 +2007,7 @@ S_make_trie(pTHX_ RExC_state_t *pRExC_state, regnode *startbranch, }); re_trie_maxbuff = get_sv(RE_TRIE_MAXBUF_NAME, 1); + assert(re_trie_maxbuff); if (!SvIOK(re_trie_maxbuff)) { sv_setiv(re_trie_maxbuff, RE_TRIE_MAXBUF_INIT); } @@ -14920,6 +14921,7 @@ S_set_ANYOF_arg(pTHX_ RExC_state_t* const pRExC_state, av_store(av, 0, (runtime_defns) ? SvREFCNT_inc(runtime_defns) : &PL_sv_undef); if (swash) { + assert(cp_list); av_store(av, 1, swash); SvREFCNT_dec_NN(cp_list); } @@ -16609,6 +16611,7 @@ S_dumpuntil(pTHX_ const regexp *r, const regnode *start, const regnode *node, last= plast; while (PL_regkind[op] != END && (!last || node < last)) { + assert(node); /* While that wasn't END last time... 
*/ NODE_ALIGN(node); op = OP(node); diff --git a/regexec.c b/regexec.c index 362390b..ffab4f2 100644 --- a/regexec.c +++ b/regexec.c @@ -7010,6 +7010,8 @@ no_silent: sv_commit = &PL_sv_yes; sv_yes_mark = &PL_sv_no; } + assert(sv_err); + assert(sv_mrk); sv_setsv(sv_err, sv_commit); sv_setsv(sv_mrk, sv_yes_mark); } @@ -7620,6 +7622,7 @@ Perl__get_regclass_nonbitmap_data(pTHX_ const regexp *prog, *only_utf8_locale_ptr = ary[2]; } else { + assert(only_utf8_locale_ptr); *only_utf8_locale_ptr = NULL; } @@ -7641,7 +7644,7 @@ Perl__get_regclass_nonbitmap_data(pTHX_ const regexp *prog, } else if (doinit && ((si && si != &PL_sv_undef) || (invlist && invlist != &PL_sv_undef))) { - + assert(si); sw = _core_swash_init("utf8", /* the utf8 package */ "", /* nameless */ si, diff --git a/universal.c b/universal.c index a29696d..65e02df 100644 --- a/universal.c +++ b/universal.c @@ -67,7 +67,7 @@ S_isa_lookup(pTHX_ HV *stash, const char * const name, STRLEN len, U32 flags) if (our_stash) { HEK *canon_name = HvENAME_HEK(our_stash); if (!canon_name) canon_name = HvNAME_HEK(our_stash); - + assert(canon_name); if (hv_common(isa, NULL, HEK_KEY(canon_name), HEK_LEN(canon_name), HEK_FLAGS(canon_name), HV_FETCH_ISEXISTS, NULL, HEK_HASH(canon_name))) { diff --git a/util.c b/util.c index b90abe5..cd0afb6 100644 --- a/util.c +++ b/util.c @@ -850,15 +850,17 @@ Perl_fbm_instr(pTHX_ unsigned char *big, unsigned char *bigend, SV *littlestr, U { const MAGIC *const mg = mg_find(littlestr, PERL_MAGIC_bm); - const unsigned char * const table = (const unsigned char *) mg->mg_ptr; const unsigned char *oldlittle; + assert(mg); + --littlelen; /* Last char found by table lookup */ s = big + littlelen; little += littlelen; /* last char */ oldlittle = little; if (s < bigend) { + const unsigned char * const table = (const unsigned char *) mg->mg_ptr; I32 tmp; top2: -- 1.8.5.2 (Apple Git-48)
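Reduced to its essentials, the pattern applied throughout the patch above looks roughly like the following sketch. `entry`, `find_entry`, and `value_of` are hypothetical; `find_entry` merely plays the role of a lookup such as `mg_find()` that may return NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;
    int value;
} entry;

/* Hypothetical lookup that returns NULL when nothing matches. */
static const entry *find_entry(const entry *tab, size_t n, const char *name)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return &tab[i];
    return NULL;
}

/* The caller "knows" the entry exists. The assert records that
 * assumption for readers and static analysers, and traps in debugging
 * builds if it is ever violated. */
static int value_of(const entry *tab, size_t n, const char *name)
{
    const entry *e = find_entry(tab, n, name);
    assert(e);
    return e->value;
}
```

Note that a plain `assert` compiles away when `NDEBUG` is defined, so this documents and checks the invariant in debugging builds only; it is not a substitute for real error handling where NULL is a legitimate outcome.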
* Off-by-one in PL_fold_locale use.Jarkko Hietaniemi2014-05-281-4/+4
|
| Fix for Coverity perl5 CID 29033: Out-of-bounds read (OVERRUN)
| overrun-local: Overrunning array PL_fold_locale of 256 bytes at byte
| offset 256 using index c1 (which evaluates to 256).
|
| - the "c1 > 256" was off-by-one, it needed to be "c1 > 255"; it could
|   have caused the PL_fold_locale to be accessed one past the end, at
|   offset 256, but we have dodged the bullet thanks to the regex engine
|   optimizing the bad case away before we hit it (analysis by Karl
|   Williamson): regexec.c
| - comment fixes (pointed out by Karl Williamson): regexec.c
| - add tests to nail down the behaviour of fold matching for the last of
|   Latin-1 (0xFF, lowercase which curiously does not have uppercase
|   within Latin-1) and the first pure Unicode: t/re/pat.t
|
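The off-by-one is easiest to see on a toy table. In this sketch `fold_table` is a hypothetical stand-in for `PL_fold_locale`, filled with an ASCII-only lowercase fold, and `fold_cp` is an invented helper; only the bound check mirrors the fix.

```c
#include <assert.h>

/* Stand-in for a 256-entry fold table; valid indices are 0..255. */
static unsigned char fold_table[256];

/* Fill the table with an ASCII-only lowercase fold, for illustration. */
static void init_fold_table(void)
{
    int i;
    for (i = 0; i < 256; i++)
        fold_table[i] = (unsigned char)
            ((i >= 'A' && i <= 'Z') ? i + ('a' - 'A') : i);
}

/* Fold c through the table when it fits, otherwise return it unchanged.
 * Writing the guard as "c > 256" would let c == 256 through and read
 * one element past the end of the array. */
static unsigned long fold_cp(unsigned long c)
{
    if (c > 255)
        return c;
    return fold_table[c];
}
```

The boundary values 255 and 256 are exactly the cases the added tests in t/re/pat.t are described as pinning down.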
* [perl #121854] use re 'taint' regressionDavid Mitchell2014-05-131-0/+2
|
| Commit v5.19.8-533-g63baef5 changed the handling of locale-dependent
| regexes so that the pattern was considered tainted at compile-time,
| rather than determining it each time at run-time whenever it executed a
| locale-dependent node.
|
| Unfortunately due to the conflating of two flags, RXf_TAINTED and
| RXf_TAINTED_SEEN, it had the side effect of permanently marking a
| pattern as tainted once it had had a single tainted result.
|
| E.g.
|
|     use re qw(taint);
|     use Scalar::Util qw(tainted);
|     for ($^X, "abc") {
|         /(.*)/ or die;
|         print "not " unless tainted("$1");
|         print "tainted\n";
|     };
|
| which from 5.19.9 onwards output:
|
|     tainted
|     tainted
|
| but with this commit (and with 5.19.8 and earlier), it now outputs:
|
|     tainted
|     not tainted
|
| The RXf_TAINTED flag indicates that the pattern itself is tainted, e.g.
|
|     $r = qr/$tainted_value/
|
| while the RXf_TAINTED_SEEN flag means that the results of the last match
| are tainted, e.g.
|
|     use re 'taint';
|     $tainted =~ /(.*)/;
|     # $1 is tainted
|
| Pre 63baef5, the code used to look like:
|
|     at run-time:
|
|         turn off RXf_TAINTED_SEEN;
|         while (nodes to execute) {
|             switch(node) {
|             case BOUNDL: /* and other locale-specific ops */
|                 turn on RXf_TAINTED_SEEN;
|                 ...;
|             }
|         }
|         if (tainted || RXf_TAINTED)
|             turn on RXf_TAINTED_SEEN;
|
| 63baef5 changed it to:
|
|     at compile-time:
|
|         if (pattern has locale ops)
|             turn on RXf_TAINTED_SEEN;
|
|     at run-time:
|
|         while (nodes to execute) {
|             ...
|         }
|         if (tainted || RXf_TAINTED)
|             turn on RXf_TAINTED_SEEN;
|
| This commit changes it to:
|
|     at compile-time:
|
|         if (pattern has locale ops)
|             turn on RXf_TAINTED;
|
|     at run-time:
|
|         turn off RXf_TAINTED_SEEN;
|         while (nodes to execute) {
|             ...
|         }
|         if (tainted || RXf_TAINTED)
|             turn on RXf_TAINTED_SEEN;
|
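The final version of the pseudocode can be modelled as a tiny state machine. The sketch below is only a model under stated assumptions: `toy_regex`, `toy_compile`, `toy_exec`, and the `TOY_*` flag values are hypothetical and do not correspond to the real `RXf_*` bit assignments.

```c
#include <assert.h>

/* Hypothetical flag bits standing in for RXf_TAINTED / RXf_TAINTED_SEEN. */
#define TOY_TAINTED       0x1u   /* the pattern itself is tainted        */
#define TOY_TAINTED_SEEN  0x2u   /* results of the last match are tainted */

typedef struct {
    unsigned flags;
} toy_regex;

/* Compile time: locale ops taint the pattern, not the match results. */
static void toy_compile(toy_regex *rx, int has_locale_ops)
{
    rx->flags = 0;
    if (has_locale_ops)
        rx->flags |= TOY_TAINTED;
}

/* Run time: reset the per-match flag first, then set it again only if
 * this input or the pattern is tainted. Returns nonzero if the captures
 * of this match would be tainted. */
static int toy_exec(toy_regex *rx, int input_tainted)
{
    rx->flags &= ~TOY_TAINTED_SEEN;
    /* ... the nodes of the pattern would execute here ... */
    if (input_tainted || (rx->flags & TOY_TAINTED))
        rx->flags |= TOY_TAINTED_SEEN;
    return (rx->flags & TOY_TAINTED_SEEN) != 0;
}
```

Matching a tainted input and then an untainted one with the same `toy_regex` reproduces the "tainted / not tainted" output from the Perl example: the second match starts from a cleared per-match flag.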
* [perl #121484] /m causing false negativeDavid Mitchell2014-03-241-6/+9
| | | | | | | | | | | My recent commit d0d4464849e2b30aee8 in re_intuit_start() reduced the scope of a 'skip if multiline' check, so that certain optimisations weren't being unnecessarily skipped. Unfortunately it didn't reduce the scope enough, so a vital slen-- was being skipped in the SvTAIL-but-don't-fail case. This commit just moves the !multiline test further down, and updates the commentary and condition formatting a bit.
* re_intuit_start(): move comments about IMPLICITDavid Mitchell2014-03-191-9/+9
| | | | | After the previous reworking of the PREGf_IMPLICIT tests, move commentary about PREGf_IMPLICIT closer to where it's now relevant.
* re_intuit_start(): don't unset MBOL on uselessnessDavid Mitchell2014-03-191-9/+0
| | | | | | | | | | | | | | When BmUSEFUL() for a pattern goes negative, we unset ~RXf_USE_INTUIT. A bug fix in 2010 (c941595168) made this part of the code also remove the RXf_ANCH_MBOL at the same time, if PREGf_IMPLICIT was set. However, this was really just working round a bug in regexec_flags(). Once intuit was disabled, regexec_flags() would then start taking the buggy code path. This buggy code path was separately fixed in 2012 by 21eede782, so there's no longer any need to remove this flag. So don't.
* re_intuit_start(): change definition of ml_anchDavid Mitchell2014-03-191-8/+5
|
| The ml_anch variable is supposed to indicate that a multi-line match is
| possible, i.e. that the regex is anchored to a \n.
|
| Currently we just do
|
|     ml_anch = (prog->intflags & PREGf_ANCH_MBOL);
|
| However, MBOL is also set on /.*..../; the two cases are distinguished by
| adding the PREGf_IMPLICIT flag too.
|
| So at the moment we have lots of tests along the lines of
|
|     if (ml_anch && !(prog->intflags & PREGf_IMPLICIT))
|
| Simplify this by adding the IMPLICIT condition when initially calculating
| ml_anch, so there's no need to keep testing for it later. This also means
| that ml_anch actually means what it says now.
|
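A compact way to see the change is to compare the old and new computations side by side. In this sketch the `TOY_*` bit values and both helper functions are hypothetical; only the boolean logic follows the commit message.

```c
#include <assert.h>

/* Hypothetical bit values standing in for PREGf_ANCH_MBOL and
 * PREGf_IMPLICIT; the real values are not given in the log. */
#define TOY_ANCH_MBOL 0x1u
#define TOY_IMPLICIT  0x2u

/* Old definition: true for a real /^.../m anchor and also for the
 * implicit anchor of a pattern starting with .* */
static int ml_anch_old(unsigned intflags)
{
    return (intflags & TOY_ANCH_MBOL) != 0;
}

/* New definition: fold the IMPLICIT test into the initial calculation,
 * so later code can test ml_anch alone. */
static int ml_anch_new(unsigned intflags)
{
    return (intflags & TOY_ANCH_MBOL) && !(intflags & TOY_IMPLICIT);
}
```

Every later test of the form `ml_anch_old(f) && !(f & TOY_IMPLICIT)` is equivalent to `ml_anch_new(f)`, which is why the repeated checks can be dropped.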
* re_intuit_start(): check for IMPLICIT in abs anchDavid Mitchell2014-03-191-14/+3
|
| The block of code that deals with absolute anchors looks like:
|
|     if (...) {
|         if (.... && !PREGf_IMPLICIT)
|             ...;
|
|         if (FOO) {
|             assert(!PREGf_IMPLICIT);
|             ....
|         }
|     }
|
| where the constraints of FOO imply !PREGf_IMPLICIT, as shown by the
| assertion. Simplify this to:
|
|     if (... && !PREGf_IMPLICIT) {
|         if (....)
|             ...;
|
|         if (FOO) {
|             ....
|         }
|     }
|
* re_intuit_start(): check for IMPLICIT in stclassDavid Mitchell2014-03-191-3/+4
|
| In the stclass block of code, there are various checks for anchored-ness.
| Add !(prog->intflags & PREGf_IMPLICIT) to these conditions, since from
| the perspective of these tests, we can only quit early etc if these are
| real rather than fake anchors.
|
| As it happens, this currently makes no logical difference, since a
| PREGf_IMPLICIT pattern starts with .*, and so more or less by definition
| can't have an stclass.
|
| So this commit is really only for logical completeness, and to allow us
| to change the definition of ml_anch shortly.
|
* re_intuit_start(): use better limit on anch floatDavid Mitchell2014-03-191-9/+23
|
| The 'check' block of code has special handling for if the pattern is
| anchored; in this case, it allows us to set an upper bound on where a
| floating substring might match; e.g. in
|
|     /^..\d?abc/
|
| the floating string can't be more than 3 chars from the absolute
| beginning of the string. Similarly with /..\G\d?abc/, it can't be more
| than 3 chars from the start position (assuming that position has been
| calculated correctly by the caller).
|
| However, the current code used rx_origin as the base for the offset
| calculation, rather than strbeg/strpos as appropriate. This meant that
|
| a) the first time round the loop, if strpos > strbeg, then the upper
|    bound would be set higher than needed;
| b) if we ever go back to restart: with an incremented rx_origin, then
|    the upper limit is recalculated with more wasted slack at the latter
|    end.
|
| This commit changes the limit calculation, which reduces the following
| from seconds to milliseconds:
|
|     $s = "abcdefg" x 1_000_000;
|     $s =~ /^XX\d{1,10}cde/ for 1..100;
|
| It also adds a quick test to skip hopping when the result is likely to
| leave end_point unchanged, and adds an explicit test for !PREGf_IMPLICIT.
| This latter test isn't strictly necessary, as if PREGf_IMPLICIT were set,
| it implies that the pattern starts with '.*', which implies that
| prog->check_offset_max == SSize_t_MAX, which is already tested for.
| However, it makes the overall condition more comprehensible, and makes
| it more robust in the face of future changes.
|