author | Yves Orton <demerphq@gmail.com> | 2012-07-11 09:03:09 +0200 |
---|---|---|
committer | Father Chrysostomos <sprout@cpan.org> | 2012-07-13 20:10:43 -0700 |
commit | ac7af3f615eb56bda50bf123662b15779da26826 (patch) | |
tree | ea9a0337f5c67dfbac96e509bd26c28e360f1781 /regcomp.c | |
parent | 79a3e5ea36208f2f54e36fa3a73c72808a6d0ad8 (diff) | |
download | perl-ac7af3f615eb56bda50bf123662b15779da26826.tar.gz |
fix RT#114068 optimizer handles MEOL in middle of pattern improperly
It seems that under certain circumstances the optimiser handles the
MEOL operator (what $ turns into under /m) improperly: it includes
things that follow the MEOL. This results in compilation like this:
Compiling REx "( [^z] $ [^z]+ )"
Final program:
1: OPEN1 (3)
3: ANYOF[\x00-y{-\xff][{unicode}0100-INFINITY] (14)
14: MEOL (15)
15: PLUS (27)
16: ANYOF[\x00-y{-\xff][{unicode}0100-INFINITY] (0)
27: CLOSE1 (29)
29: END (0)
anchored ""$ at 2 stclass ANYOF[\x00-y{-\xff][{unicode}0100-INFINITY]
The '""$ at 2' is the sign of the bug. The problem is that the optimiser
does not "commit" the string when it encounters an MEOL, which means that
text that follows it is included. This has probably always been wrong, as
$ is a multichar pattern (it matches before a \n or including a \n). This
failure to commit then interacts with the implementation of PLUS, leading
to an incorrect offset. Adding a SCAN_COMMIT() as part of the optimiser's
handling of EOL constructs avoids the problem. Note that most uses of
$ were ok for other reasons.
Diffstat (limited to 'regcomp.c')
-rw-r--r-- | regcomp.c | 2 |
1 files changed, 2 insertions, 0 deletions
@@ -4404,6 +4404,8 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode **scanp,
                     data->flags |= (OP(scan) == MEOL
                                     ? SF_BEFORE_MEOL
                                     : SF_BEFORE_SEOL);
+            SCAN_COMMIT(pRExC_state, data, minlenp);
+
         }
         else if (  PL_regkind[OP(scan)] == BRANCHJ
                  /* Lookbehind, or need to calculate parens/evals/stclass: */