diff options
author | Jeffrey Friedl <jfriedl@regex.info> | 2000-08-08 17:59:43 -0700 |
---|---|---|
committer | Jarkko Hietaniemi <jhi@iki.fi> | 2000-08-09 23:05:47 +0000 |
commit | b45f050a81173020b0089d3ff02fa0276958461a (patch) | |
tree | 9ba9f52c16593c4b22ca597ee69abdc99927555b | |
parent | 68d47915e40dc6ab5050a2c749c177f42d1374ff (diff) | |
download | perl-b45f050a81173020b0089d3ff02fa0276958461a.tar.gz |
Re: enhanced(?) regex error messages
Message-Id: <200008090759.AAA07144@ventrue.yahoo.com>
(plus two small patches sent privately)
(this still seems to leave few test failures)
p4raw-id: //depot/perl@6560
-rw-r--r-- | pod/perldiag.pod | 218 | ||||
-rw-r--r-- | regcomp.c | 411 | ||||
-rw-r--r-- | regcomp.h | 14 | ||||
-rwxr-xr-x | t/op/misc.t | 2 | ||||
-rw-r--r-- | t/op/regmesg.t | 185 | ||||
-rw-r--r-- | t/pragma/warn/regcomp | 53 |
6 files changed, 650 insertions, 233 deletions
diff --git a/pod/perldiag.pod b/pod/perldiag.pod index ea6f8931a3..c20d71d568 100644 --- a/pod/perldiag.pod +++ b/pod/perldiag.pod @@ -404,7 +404,7 @@ check the return value of your socket() call? See L<perlfunc/bind>. =item Bizarre copy of %s in %s (P) Perl detected an attempt to copy an internal value that is not -copiable. +copyable. =item B<-P> not allowed for setuid/setgid script @@ -563,10 +563,11 @@ C<-i.bak>, or some such. characters and Perl was unable to create a unique filename during inplace editing with the B<-i> switch. The file was ignored. -=item Can't do {n,m} with n > m +=item Can't do {n,m} with n > m at <HERE< in regex m/%s/ -(F) Minima must be less than or equal to maxima. If you really want -your regexp to match something 0 times, just put {0}. See L<perlre>. +(F) Minima must be less than or equal to maxima. If you really want your +regexp to match something 0 times, just put {0}. The <HERE< shows in the +regular expression about where the problem was discovered. See L<perlre>. =item Can't do setegid! @@ -1043,35 +1044,6 @@ references can be weakened. with an assignment operator, which implies modifying the value itself. Perhaps you need to copy the value to a temporary, and repeat that. -=item Character class syntax [%s] belongs inside character classes - -(W unsafe) The character class constructs [: :], [= =], and [. .] go -I<inside> character classes, the [] are part of the construct, for -example: /[012[:alpha:]345]/. Note that [= =] and [. .] are not -currently implemented; they are simply placeholders for future -extensions and will cause fatal errors. - -=item Character class syntax [. .] is reserved for future extensions - -(F regexp) Within regular expression character classes ([]) the syntax -beginning with "[." and ending with ".]" is reserved for future -extensions. If you need to represent those character sequences inside -a regular expression character class, just quote the square brackets -with the backslash: "\[." and ".\]". - -=item Character class syntax [= =] is reserved for future extensions - -(F) Within regular expression character classes ([]) the syntax -beginning with "[=" and ending with "=]" is reserved for future -extensions. If you need to represent those character sequences inside -a regular expression character class, just quote the square brackets -with the backslash: "\[=" and "=\]". - -=item Character class [:%s:] unknown - -(F) The class in the character class [: :] syntax is unknown. See -L<perlre>. - =item chmod() mode argument is missing initial 0 (W chmod) A novice will sometimes say @@ -1413,10 +1385,11 @@ some time before now. Check your logic flow. flock() operates on filehandles. Are you attempting to call flock() on a dirhandle by the same name? -=item ?+* follows nothing in regexp +=item Quantifier follows nothing at <HERE< in regex m/%s/ -(F) You started a regular expression with a quantifier. Backslash it if -you meant it literally. See L<perlre>. +(F) You started a regular expression with a quantifier. Backslash it if you +meant it literally. The <HERE< shows in the regular expression about where the +problem was discovered. See L<perlre>. =item Format not terminated @@ -1672,9 +1645,12 @@ transparently promotes all numbers to a floating point representation internally--subject to loss of precision errors in subsequent operations. -=item internal disaster in regexp +=item Internal disaster at <HERE< in regex m/%s/ (P) Something went badly wrong in the regular expression parser. +The <HERE< shows in the regular expression about where the problem was +discovered. + =item Internal inconsistency in tracking vforks @@ -1685,9 +1661,11 @@ L<perlvms/"exec LIST">). Somehow, this count has become scrambled, so Perl is making a guess and treating this C<exec> as a request to terminate the Perl script and execute the specified command. -=item internal urp in regexp at /%s/ +=item Internal urp at <HERE< in regex m/%s/ + +(P) Something went badly awry in the regular expression parser. The <HERE< +shows in the regular expression about where the problem was discovered. -(P) Something went badly awry in the regular expression parser. =item %s (...) interpreted as function @@ -1779,11 +1757,6 @@ effective uids or gids failed. to check the return value of your socket() call? See L<perlfunc/listen>. -=item Lookbehind longer than %d not implemented at {#} mark in regex 5s - -There is an upper limit to the depth of lookbehind in the (?<= -regular expression construct. - =item lstat() on filehandle %s (W io) You tried to do a lstat on a filehandle. What did you mean @@ -1796,6 +1769,12 @@ instead on the filehandle.) values cannot be returned in subroutines used in lvalue context. See L<perlsub/"Lvalue subroutines">. +=item Lookbehind longer than %d not implemented at <HERE< in reges m/%s/ + +(F) There is currently a limit on the length of string which lookbehind can +handle. This restriction may be eased in a future release. The <HERE< shows in +the regular expression about where the problem was discovered. + =item Malformed PERLLIB_PREFIX (F) An error peculiar to OS/2. PERLLIB_PREFIX should be of the form @@ -1972,14 +1951,16 @@ provided for this purpose. (F) You tried to do a read/write/send/recv operation with a buffer length that is less than 0. This is difficult to imagine. -=item Nested quantifiers in regexp +=item Nested quantifiers at <HERE< in regex m/%s/ -(F) You can't quantify a quantifier without intervening parentheses. So -things like ** or +* or ?* are illegal. +(F) You can't quantify a quantifier without intervening parentheses. So +things like ** or +* or ?* are illegal. The <HERE< shows in the regular +expression about where the problem was discovered. Note, however, that the minimal matching quantifiers, C<*?>, C<+?>, and C<??> appear to be nested quantifiers, but aren't. See L<perlre>. + =item %s never introduced (S internal) The symbol in question was declared but somehow went out of @@ -2588,6 +2569,35 @@ problem can be found in L<perllocale> section B<LOCALE PROBLEMS>. process which isn't a subprocess of the current process. While this is fine from VMS' perspective, it's probably not what you intended. +=item POSIX syntax [%s] belongs inside character classes + +(W unsafe) The character class constructs [: :], [= =], and [. .] go +I<inside> character classes, the [] are part of the construct, for +example: /[012[:alpha:]345]/. Note that [= =] and [. .] are not +currently implemented; they are simply placeholders for future +extensions and will cause fatal errors. + +=item POSIX syntax [. .] is reserved for future extensions + +(F regexp) Within regular expression character classes ([]) the syntax +beginning with "[." and ending with ".]" is reserved for future +extensions. If you need to represent those character sequences inside +a regular expression character class, just quote the square brackets +with the backslash: "\[." and ".\]". + +=item POSIX syntax [= =] is reserved for future extensions + +(F) Within regular expression character classes ([]) the syntax +beginning with "[=" and ending with "=]" is reserved for future +extensions. If you need to represent those character sequences inside +a regular expression character class, just quote the square brackets +with the backslash: "\[=" and "=\]". + +=item POSIX class [:%s:] unknown + +(F) The class in the character class [: :] syntax is unknown. See +L<perlre>. + =item POSIX getpgrp can't take an argument (F) Your system has POSIX getpgrp(), which takes no argument, unlike @@ -2710,14 +2720,19 @@ in L<perlos2>. (S unsafe) The subroutine being declared or defined had previously been declared or defined with a different function prototype. -=item Quantifier in {,} bigger than %d at {#} mark in regex %s +=item Quantifier in {,} bigger than %d at <HERE< in regex m/%s/ -(F) There is an upper limit to the number of allowed repetitions in the {,} -regular expression construct. +(F) There is currently a limit to the size of the min and max values of the +{min,max} construct. The <HERE< shows in the regular expression about where +the problem was discovered. See L<perlre>. -=item Quantifier follows nothing in rgexp +=item Quantifier unexpected on zero-length expression at <HERE< in regex m/%s/ -(F) Quantifiers like * are suffixes, they quantify something preceding them. +(W regexp) You applied a regular expression quantifier in a place where +it makes no sense, such as on a zero-width assertion. Try putting the +quantifier inside the assertion instead. For example, the way to match +"abc" provided that it is followed by three repetitions of "xyz" is +C</abc(?=(?:xyz){3})/>, not C</abc(?=xyz){3}/>. =item Range iterator outside integer range @@ -2779,22 +2794,22 @@ Doing so has no effect. (W internal) The internal sv_replace() function was handed a new SV with a reference count of other than 1. -=item Reference to nonexistent group +=item Reference to nonexistant group at <HERE< in regex m/%s/ + +(F) You used something like C<\7> in your regular expression, but there are +not at least seven sets of capturing parentheses in the expression. If you +wanted to have the character with value 7 inserted into the regular expression, +prepend a zero to make the number at least two digits: C<\07> -(F) In a regexp you tried to reference (\1, \2, ...) a group that -doesn't exist. Count your parentheses. +The <HERE< shows in the regular expression about where the problem was +discovered. =item regexp memory corruption (P) The regular expression engine got confused by what the regular expression compiler gave it. -=item regexp *+ operand could be empty - -(F) The part of the regexp subject to either the * or + quantifier could -match an empty string. - -=item regexp out of space +=item Regexp out of space (P) A "can't happen" error, because safemalloc() should have caught it earlier. @@ -2891,22 +2906,31 @@ scalar that had previously been marked as free. (W closed) The socket you're sending to got itself closed sometime before now. Check your logic flow. -=item Sequence (? incomplete +=item Sequence (? incomplete at <HERE< mark in regex m/%s/ -(F) A regular expression ended with an incomplete extension (?. See +(F) A regular expression ended with an incomplete extension (?. The <HERE< +shows in the regular expression about where the problem was discovered. See L<perlre>. -=item Sequence (?%s...) not implemented +=item Sequence (?{...}) not terminated or not {}-balanced in regex m/%s/ + +(F) If the contents of a (?{...}) clause contains braces, they must balance +for Perl to properly detect the end of the clause. See L<perlre>. -(F) A proposed regular expression extension has the character reserved -but has not yet been written. See L<perlre>. +=item Sequence (?%s...) not implemented at <HERE< mark in regex m/%s/ -=item Sequence (?%s...) not recognized +(F) A proposed regular expression extension has the character reserved but +has not yet been written. The <HERE< shows in the regular expression about +where the problem was discovered. See L<perlre>. + +=item Sequence (?%s...) not recognized at <HERE< mark in regex m/%s/ (F) You used a regular expression extension that doesn't make sense. +The <HERE< shows in the regular expression about +where the problem was discovered. See L<perlre>. -=item Sequence (?#... not terminated +=item Sequence (?#... not terminated in regex m/%s/ (F) A regular expression comment must be terminated by a closing parenthesis. Embedded parentheses aren't allowed. See L<perlre>. @@ -3043,14 +3067,6 @@ a block by itself. (W unopened) You tried to use the stat() function on a filehandle that was either never opened or has since been closed. -=item Strange *+?{} on zero-length expression - -(W regexp) You applied a regular expression quantifier in a place where -it makes no sense, such as on a zero-width assertion. Try putting the -quantifier inside the assertion instead. For example, the way to match -"abc" provided that it is followed by three repetitions of "xyz" is -C</abc(?=(?:xyz){3})/>, not C</abc(?=xyz){3}/>. - =item Stub found while resolving method `%s' overloading %s (P) Overloading resolution over @ISA tree may be broken by importation @@ -3098,6 +3114,24 @@ assignment or as a subroutine argument for example). (F) Your Perl was compiled with B<-D>SETUID_SCRIPTS_ARE_SECURE_NOW, but a version of the setuid emulator somehow got run anyway. +=item Switch (?(condition)... contains too many branches at <HERE< in regex m/%s/ + +(F) A (?(condition)if-clause|else-clause) construct can have at most two +branches (the if-clause and the else-clause). If you want one or both to +contain alternation, such as using C<this|that|other>, enclose it in +clustering parentheses: + + (?(condition)(?:this|that|other)|else-clause) + +The <HERE< shows in the regular expression about where the problem was +discovered. See L<perlre>. + +=item Switch condition not recognized at <HERE< in regex m/%s/ + +(F) If the argument to the (?(...)if-clause|else-clause) construct is a +number, it can be only a number. The <HERE< shows in the regular expression +about where the problem was discovered. See L<perlre>. + =item switching effective %s is not implemented (F) While under the C<use filetest> pragma, we cannot switch the real @@ -3367,11 +3401,23 @@ Check the #! line, or manually feed your script into Perl yourself. (F) The unexec() routine failed for some reason. See your local FSF representative, who probably put it there in the first place. + =item Unknown BYTEORDER (F) There are no byte-swapping functions for a machine with this byte order. +=item Unknown switch condition (?(%.2s at <HERE< in regex m/%s/ + +(F) The condition of a (?(condition)if-clause|else-clause) construct is not +known. The condition may be lookaround (the condition is true if the +lookaround is true), a (?{...}) construct (the condition is true if the +code evaluates to a true value), or a number (the condition is true if the +set of capturing parentheses named by the number is defined). + +The <HERE< shows in the regular expression about where the problem was +discovered. See L<perlre>. + =item Unknown open() mode '%s' (F) The second argument of 3-argument open() is not among the list @@ -3423,12 +3469,14 @@ script, a binary program, or a directory as a Perl program. recognized by Perl inside character classes. The character was understood literally. -=item /%s/: Unrecognized escape \\%c passed through +=item Unrecognized escape \\%c passed through at <HERE< in m/%s/ (W regexp) You used a backslash-character combination which is not -recognized by Perl. This combination appears in an interpolated -variable or a C<'>-delimited regular expression. The character was -understood literally. +recognized by Perl. This combination appears in an interpolated variable or +a C<'>-delimited regular expression. The character was understood +literally. The <HERE< shows in the regular expression about where the escape +was discovered. + =item Unrecognized escape \\%c passed through @@ -3659,10 +3707,6 @@ something else of the same name (usually a subroutine) is exported by that module. It usually means you put the wrong funny character on the front of your variable. -=item Variable length lookbehind not implemented - -(F) Lookbehind currently only works for fixed-length regular expressions. - =item "%s" variable %s masks earlier declaration in same %s (W misc) A "my" or "our" variable has been redeclared in the current @@ -3719,6 +3763,12 @@ anonymous, using the C<sub {}> syntax. When inner anonymous subs that reference variables in outer subroutines are called or referenced, they are automatically rebound to the current values of such variables. +=item Variable length lookbehind not implemented at <HERE< in regex m/%s/ + +(F) Lookbehind is allowed only for subexpressions whose length is fixed and +known at compile time. The <HERE< shows in the regular expression about where +the problem was discovered. + =item Version number must be a constant number (P) The attempt to translate a C<use Module n.n LIST> statement into @@ -202,25 +202,180 @@ static scan_data_t zero_scan_data = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, #define CHR_DIST(a,b) (UTF ? utf8_distance(a,b) : a - b) -#define vFAIL(m) \ +/* length of regex to show in messages that don't mark a position within */ +#define RegexLengthToShowInErrorMessages 127 + +/* + * If MARKER[12] are adjusted, be sure to adjust the constants at the top + * of t/op/regmesg.t, the tests in t/op/re_tests, and those in + * op/pragma/warn/regcomp. + */ +#define MARKER1 "<HERE<" /* marker as it appears in the description */ +#define MARKER2 " <<<HERE<<< " /* marker as it appears within the regex */ + +#define REPORT_LOCATION " at " MARKER1 " mark in regex m/%.*s" MARKER2 "%s/" + +/* + * Calls SAVEDESTRUCTOR_X if needed, then calls Perl_croak with the given + * arg. Show regex, up to a maximum length. If it's too long, chop and add + * "...". + */ +#define FAIL(m) \ STMT_START { \ + char *elipises = ""; \ + unsigned len = strlen(PL_regprecomp); \ + \ if (!SIZE_ONLY) \ SAVEDESTRUCTOR_X(clear_re,(void*)PL_regcomp_rx); \ - Perl_croak(aTHX_ "%s at {#} mark in regex m/%.*s{#}%s/", m, \ - strlen(PL_regprecomp)-(PL_regxend - PL_regcomp_parse), \ - PL_regprecomp, \ - PL_regprecomp + strlen(PL_regprecomp)-(PL_regxend - PL_regcomp_parse));\ + \ + if (len > RegexLengthToShowInErrorMessages) { \ + /* chop 10 shorter than the max, to ensure meaning of "..." */ \ + len = RegexLengthToShowInErrorMessages - 10; \ + elipises = "..."; \ + } \ + Perl_croak(aTHX_ "%s in regex m/%.*s%s/", \ + m, len, PL_regprecomp, elipises); \ } STMT_END -#define vFAIL2(pat,m) \ +/* + * Calls SAVEDESTRUCTOR_X if needed, then calls Perl_croak with the given + * args. Show regex, up to a maximum length. If it's too long, chop and add + * "...". + */ +#define FAIL2(pat,m) \ STMT_START { \ + char *elipises = ""; \ + unsigned len = strlen(PL_regprecomp); \ + \ if (!SIZE_ONLY) \ SAVEDESTRUCTOR_X(clear_re,(void*)PL_regcomp_rx); \ \ - S_re_croak2(aTHX_ pat, " at {#} mark in regex m/%.*s{#}%s/: ", m, \ - strlen(PL_regprecomp)-(PL_regxend - PL_regcomp_parse), \ - PL_regprecomp, \ - PL_regprecomp + strlen(PL_regprecomp)-(PL_regxend - PL_regcomp_parse));\ + if (len > RegexLengthToShowInErrorMessages) { \ + /* chop 10 shorter than the max, to ensure meaning of "..." */ \ + len = RegexLengthToShowInErrorMessages - 10; \ + elipises = "..."; \ + } \ + S_re_croak2(aTHX_ pat, " in regex m/%.*s%s/", \ + m, len, PL_regprecomp, elipises); \ + } STMT_END + + +/* + * Simple_vFAIL -- like FAIL, but marks the current location in the scan + */ +#define Simple_vFAIL(m) \ + STMT_START { \ + unsigned offset = strlen(PL_regprecomp)-(PL_regxend-PL_regcomp_parse); \ + \ + Perl_croak(aTHX_ "%s" REPORT_LOCATION, \ + m, offset, PL_regprecomp, PL_regprecomp + offset); \ + } STMT_END + +/* + * Calls SAVEDESTRUCTOR_X if needed, then Simple_vFAIL() + */ +#define vFAIL(m) \ + STMT_START { \ + if (!SIZE_ONLY) \ + SAVEDESTRUCTOR_X(clear_re,(void*)PL_regcomp_rx); \ + Simple_vFAIL(m); \ + } STMT_END + +/* + * Like Simple_vFAIL(), but accepts two arguments. + */ +#define Simple_vFAIL2(m,a1) \ + STMT_START { \ + unsigned offset = strlen(PL_regprecomp)-(PL_regxend-PL_regcomp_parse); \ + \ + S_re_croak2(aTHX_ m, REPORT_LOCATION, a1, \ + offset, PL_regprecomp, PL_regprecomp + offset); \ + } STMT_END + +/* + * Calls SAVEDESTRUCTOR_X if needed, then Simple_vFAIL2(). + */ +#define vFAIL2(m,a1) \ + STMT_START { \ + if (!SIZE_ONLY) \ + SAVEDESTRUCTOR_X(clear_re,(void*)PL_regcomp_rx); \ + Simple_vFAIL2(m, a1); \ + } STMT_END + + +/* + * Like Simple_vFAIL(), but accepts three arguments. + */ +#define Simple_vFAIL3(m, a1, a2) \ + STMT_START { \ + unsigned offset = strlen(PL_regprecomp)-(PL_regxend-PL_regcomp_parse); \ + \ + S_re_croak2(aTHX_ m, REPORT_LOCATION, a1, a2, \ + offset, PL_regprecomp, PL_regprecomp + offset); \ + } STMT_END + +/* + * Calls SAVEDESTRUCTOR_X if needed, then Simple_vFAIL3(). + */ +#define vFAIL3(m,a1,a2) \ + STMT_START { \ + if (!SIZE_ONLY) \ + SAVEDESTRUCTOR_X(clear_re,(void*)PL_regcomp_rx); \ + Simple_vFAIL3(m, a1, a2); \ + } STMT_END + +/* + * Like Simple_vFAIL(), but accepts four arguments. + */ +#define Simple_vFAIL4(m, a1, a2, a3) \ + STMT_START { \ + unsigned offset = strlen(PL_regprecomp)-(PL_regxend-PL_regcomp_parse); \ + \ + S_re_croak2(aTHX_ m, REPORT_LOCATION, a1, a2, a3,\ + offset, PL_regprecomp, PL_regprecomp + offset); \ + } STMT_END + +/* + * Like Simple_vFAIL(), but accepts five arguments. + */ +#define Simple_vFAIL5(m, a1, a2, a3, a4) \ + STMT_START { \ + unsigned offset = strlen(PL_regprecomp)-(PL_regxend-PL_regcomp_parse); \ + S_re_croak2(aTHX_ m, REPORT_LOCATION, a1, a2, a3, a4,\ + offset, PL_regprecomp, PL_regprecomp + offset); \ + } STMT_END + + +#define vWARN(loc,m) \ + STMT_START { \ + unsigned offset = strlen(PL_regprecomp)-(PL_regxend-(loc)); \ + Perl_warner(aTHX_ WARN_REGEXP, "%s" REPORT_LOCATION,\ + m, offset, PL_regprecomp, PL_regprecomp + offset); \ + } STMT_END \ + + +#define vWARN2(loc, m, a1) \ + STMT_START { \ + unsigned offset = strlen(PL_regprecomp)-(PL_regxend-(loc)); \ + Perl_warner(aTHX_ WARN_REGEXP, m REPORT_LOCATION,\ + a1, \ + offset, PL_regprecomp, PL_regprecomp + offset); \ + } STMT_END + +#define vWARN3(loc, m, a1, a2) \ + STMT_START { \ + unsigned offset = strlen(PL_regprecomp) - (PL_regxend - (loc)); \ + Perl_warner(aTHX_ WARN_REGEXP, m REPORT_LOCATION, \ + a1, a2, \ + offset, PL_regprecomp, PL_regprecomp + offset); \ + } STMT_END + +#define vWARN4(loc, m, a1, a2, a3) \ + STMT_START { \ + unsigned offset = strlen(PL_regprecomp)-(PL_regxend-(loc)); \ + Perl_warner(aTHX_ WARN_REGEXP, m REPORT_LOCATION,\ + a1, a2, a3, \ + offset, PL_regprecomp, PL_regprecomp + offset); \ } STMT_END @@ -788,8 +943,11 @@ S_study_chunk(pTHX_ regnode **scanp, I32 *deltap, regnode *last, scan_data_t *da if (ckWARN(WARN_REGEXP) && (minnext + deltanext == 0) && !(data->flags & (SF_HAS_PAR|SF_IN_PAR)) && maxcount <= REG_INFTY/3) /* Complement check for big count */ - Perl_warner(aTHX_ WARN_REGEXP, - "Strange *+?{} on zero-length expression"); + { + vWARN(PL_regcomp_parse, + "Quantifier unexpected on zero-length expression"); + } + min += minnext * mincount; is_inf_internal |= ((maxcount == REG_INFTY && (minnext + deltanext) > 0) @@ -852,7 +1010,7 @@ S_study_chunk(pTHX_ regnode **scanp, I32 *deltap, regnode *last, scan_data_t *da regnode *nxt1 = NEXTOPER(oscan) + EXTRA_STEP_2ARGS; /* OPEN*/ if (OP(nxt) != CLOSE) - FAIL("panic opt close"); + FAIL("Panic opt close"); oscan->flags = ARG(nxt); OP(nxt1) = OPTIMIZED; /* was OPEN. */ OP(nxt) = OPTIMIZED; /* was CLOSE. */ @@ -1414,7 +1572,8 @@ Perl_pregcomp(pTHX_ char *exp, char *xend, PMOP *pm) Newc(1001, r, sizeof(regexp) + (unsigned)PL_regsize * sizeof(regnode), char, regexp); if (r == NULL) - FAIL("regexp out of space"); + FAIL("Regexp out of space"); + #ifdef DEBUGGING /* avoid reading uninitialized memory in DEBUGGING code in study_chunk() */ Zero(r, sizeof(regexp) + (unsigned)PL_regsize * sizeof(regnode), char); @@ -1721,6 +1880,7 @@ S_reg(pTHX_ I32 paren, I32 *flagp) register regnode *ender = 0; register I32 parno = 0; I32 flags, oregflags = PL_regflags, have_branch = 0, open = 0; + char *oregcomp_parse = PL_regcomp_parse; char c; *flagp = 0; /* Tentatively. */ @@ -1731,6 +1891,7 @@ S_reg(pTHX_ I32 paren, I32 *flagp) U16 posflags = 0, negflags = 0; U16 *flagsp = &posflags; int logical = 0; + char *seqstart = PL_regcomp_parse; PL_regcomp_parse++; paren = *PL_regcomp_parse++; @@ -1763,8 +1924,7 @@ S_reg(pTHX_ I32 paren, I32 *flagp) return NULL; case 'p': if (SIZE_ONLY) - Perl_warner(aTHX_ WARN_REGEXP, - "(?p{}) is deprecated - use (??{})"); + vWARN(PL_regcomp_parse, "(?p{}) is deprecated - use (??{})"); /* FALL THROUGH*/ case '?': logical = 1; @@ -1791,7 +1951,10 @@ S_reg(pTHX_ I32 paren, I32 *flagp) PL_regcomp_parse++; } if (*PL_regcomp_parse != ')') - FAIL("Sequence (?{...}) not terminated or not {}-balanced"); + { + PL_regcomp_parse = s; + vFAIL("Sequence (?{...}) not terminated or not {}-balanced"); + } if (!SIZE_ONLY) { AV *av; @@ -1850,7 +2013,7 @@ S_reg(pTHX_ I32 paren, I32 *flagp) PL_regcomp_parse++; ret = reganode(GROUPP, parno); if ((c = *nextchar()) != ')') - vFAIL2("Switch (?(number%c not recognized", c); + vFAIL("Switch condition not recognized"); insert_if: regtail(ret, reganode(IFTHEN, 0)); br = regbranch(&flags, 1); @@ -1884,10 +2047,11 @@ S_reg(pTHX_ I32 paren, I32 *flagp) return ret; } else { - vFAIL2("Unknown condition for (?(%.2s", PL_regcomp_parse); + vFAIL2("Unknown switch condition (?(%.2s", PL_regcomp_parse); } } case 0: + PL_regcomp_parse--; /* for vFAIL to print correctly */ vFAIL("Sequence (? incomplete"); break; default: @@ -1911,8 +2075,10 @@ S_reg(pTHX_ I32 paren, I32 *flagp) break; } unknown: - if (*PL_regcomp_parse != ')') - vFAIL2("Sequence (?%c...) not recognized", *PL_regcomp_parse); + if (*PL_regcomp_parse != ')') { + PL_regcomp_parse++; + vFAIL3("Sequence (%.*s...) not recognized", PL_regcomp_parse-seqstart, seqstart); + } nextchar(); *flagp = TRYAGAIN; return NULL; @@ -2024,15 +2190,17 @@ S_reg(pTHX_ I32 paren, I32 *flagp) if (paren) { PL_regflags = oregflags; if (PL_regcomp_parse >= PL_regxend || *nextchar() != ')') { - FAIL("unmatched () in regexp"); + PL_regcomp_parse++; + vFAIL("Unmatched ("); } } else if (!paren && PL_regcomp_parse < PL_regxend) { if (*PL_regcomp_parse == ')') { - FAIL("unmatched () in regexp"); + PL_regcomp_parse = oregcomp_parse; + vFAIL("Unmatched ("); } else - FAIL("junk on end of regexp"); /* "Can't happen". */ + FAIL("Junk on end of regexp"); /* "Can't happen". */ /* NOTREACHED */ } @@ -2207,8 +2375,19 @@ S_regpiece(pTHX_ I32 *flagp) } #if 0 /* Now runtime fix should be reliable. */ + + /* if this is reinstated, don't forget to put this back into perldiag: + + =item Regexp *+ operand could be empty at {#} in regex m/%s/ + + (F) The part of the regexp subject to either the * or + quantifier + could match an empty string. The {#} shows in the regular + expression about where the problem was discovered. + + */ + if (!(flags&HASWIDTH) && op != '?') - FAIL("regexp *+ operand could be empty"); + vFAIL("Regexp *+ operand could be empty"); #endif nextchar(); @@ -2239,8 +2418,10 @@ S_regpiece(pTHX_ I32 *flagp) } nest_check: if (ckWARN(WARN_REGEXP) && !SIZE_ONLY && !(flags&HASWIDTH) && max > REG_INFTY/3) { - Perl_warner(aTHX_ WARN_REGEXP, "%.*s matches null string many times", - PL_regcomp_parse - origparse, origparse); + vWARN3(PL_regcomp_parse, + "%.*s matches null string many times", + PL_regcomp_parse - origparse, + origparse); } if (*PL_regcomp_parse == '?') { @@ -2248,8 +2429,10 @@ S_regpiece(pTHX_ I32 *flagp) reginsert(MINMOD, ret); regtail(ret, ret + NODE_STEP_REGNODE); } - if (ISMULT2(PL_regcomp_parse)) - vFAIL("Nested quantifiers in regexp"); + if (ISMULT2(PL_regcomp_parse)) { + PL_regcomp_parse++; + vFAIL("Nested quantifiers"); + } return(ret); } @@ -2262,8 +2445,7 @@ S_regpiece(pTHX_ I32 *flagp) * faster to run. Backslashed characters are exceptions, each becoming a * separate node; the code is simpler that way and it's not worth fixing. * - * [Yes, it is worth fixing, some scripts can run twice the speed.] - */ + * [Yes, it is worth fixing, some scripts can run twice the speed.] */ STATIC regnode * S_regatom(pTHX_ I32 *flagp) { @@ -2315,13 +2497,17 @@ tryagain: PL_regnaughty++; break; case '[': - PL_regcomp_parse++; + { + char *oregcomp_parse = ++PL_regcomp_parse; ret = (UTF ? regclassutf8() : regclass()); - if (*PL_regcomp_parse != ']') - FAIL("unmatched [] in regexp"); + if (*PL_regcomp_parse != ']') { + PL_regcomp_parse = oregcomp_parse; + vFAIL("Unmatched ["); + } nextchar(); *flagp |= HASWIDTH|SIMPLE; break; + } case '(': nextchar(); ret = reg(1, &flags); @@ -2344,7 +2530,7 @@ tryagain: *flagp |= TRYAGAIN; return NULL; } - vFAIL("internal urp"); + vFAIL("Internal urp"); /* Supposed to be caught earlier. */ break; case '{': @@ -2356,7 +2542,8 @@ tryagain: case '?': case '+': case '*': - vFAIL("Quantifier follows nothing in regexp"); + PL_regcomp_parse++; + vFAIL("Quantifier follows nothing"); break; case '\\': switch (*++PL_regcomp_parse) { @@ -2480,8 +2667,11 @@ tryagain: if (PL_regcomp_parse[1] == '{') { PL_regxend = strchr(PL_regcomp_parse, '}'); - if (!PL_regxend) - FAIL("Missing right brace on \\p{}"); + if (!PL_regxend) { + PL_regcomp_parse += 2; + PL_regxend = oldregxend; + vFAIL("Missing right brace on \\p{}"); + } PL_regxend++; } else @@ -2514,6 +2704,9 @@ tryagain: if (num > 9 && num >= PL_regnpar) goto defchar; else { + while (isDIGIT(*PL_regcomp_parse)) + PL_regcomp_parse++; + if (!SIZE_ONLY && num > PL_regcomp_rx->nparens) vFAIL("Reference to nonexistent group"); PL_regsawback = 1; @@ -2521,8 +2714,6 @@ tryagain: ? (LOC ? REFFL : REFF) : REF, num); *flagp |= HASWIDTH; - while (isDIGIT(*PL_regcomp_parse)) - PL_regcomp_parse++; PL_regcomp_parse--; nextchar(); } @@ -2530,7 +2721,7 @@ tryagain: break; case '\0': if (PL_regcomp_parse >= PL_regxend) - FAIL("trailing \\ in regexp"); + FAIL("Trailing \\"); /* FALL THROUGH */ default: /* Do not generate `unrecognized' warnings here, we fall @@ -2632,8 +2823,10 @@ tryagain: if (*++p == '{') { char* e = strchr(p, '}'); - if (!e) - FAIL("Missing right brace on \\x{}"); + if (!e) { + PL_regcomp_parse = p + 1; + vFAIL("Missing right brace on \\x{}"); + } else if (UTF) { numlen = 1; /* allow underscores */ ender = (UV)scan_hex(p + 1, e - p - 1, &numlen); @@ -2645,7 +2838,11 @@ tryagain: p = e + 1; } else + { + PL_regcomp_parse = e + 1; vFAIL("Can't use \\x{} without 'use utf8' declaration"); + } + } else { numlen = 0; /* disallow underscores */ @@ -2673,14 +2870,11 @@ tryagain: break; case '\0': if (p >= PL_regxend) - FAIL("trailing \\ in regexp"); + FAIL("Trailing \\"); /* FALL THROUGH */ default: if (!SIZE_ONLY && ckWARN(WARN_REGEXP) && isALPHA(*p)) - Perl_warner(aTHX_ WARN_REGEXP, - "/%.127s/: Unrecognized escape \\%c passed through", - PL_regprecomp, - *p); + vWARN2(p +1, "Unrecognized escape \\%c passed through", *p); goto normal_default; } break; @@ -2728,7 +2922,7 @@ tryagain: PL_regcomp_parse = p - 1; nextchar(); if (len < 0) - vFAIL("internal disaster"); + vFAIL("Internal disaster"); if (len > 0) *flagp |= HASWIDTH; if (len == 1) @@ -2866,13 +3060,19 @@ S_regpposixcc(pTHX_ I32 value) if (namedclass == OOB_NAMEDCLASS || posixcc[skip] != ':' || posixcc[skip+1] != ']') - Perl_croak(aTHX_ - "Character class [:%.*s:] unknown", - t - s - 1, s + 1); - } else if (!SIZE_ONLY) + { + Simple_vFAIL3("POSIX class [:%.*s:] unknown", + t - s - 1, s + 1); + } + } else if (!SIZE_ONLY) { /* [[=foo=]] and [[.foo.]] are still future. */ - Perl_croak(aTHX_ - "Character class syntax [%c %c] is reserved for future extensions", c, c); + + /* adjust PL_regcomp_parse so the warning shows after + the class closes */ + while (*PL_regcomp_parse && *PL_regcomp_parse != ']') + PL_regcomp_parse++; + Simple_vFAIL3("POSIX syntax [%c %c] is reserved for future extensions", c, c); + } } else { /* Maternal grandfather: * "[:" ending in ":" but not in ":]" */ @@ -2897,11 +3097,17 @@ S_checkposixcc(pTHX) while(*s && isALNUM(*s)) s++; if (*s && c == *s && s[1] == ']') { - Perl_warner(aTHX_ WARN_REGEXP, - "Character class syntax [%c %c] belongs inside character classes", c, c); + vWARN3(s+2, "POSIX syntax [%c %c] belongs inside character classes", c, c); + + /* [[=foo=]] and [[.foo.]] are still future. */ if (c == '=' || c == '.') - Perl_croak(aTHX_ - "Character class syntax [%c %c] is reserved for future extensions", c, c); + { + /* adjust PL_regcomp_parse so the error shows after + the class closes */ + while (*PL_regcomp_parse && *PL_regcomp_parse++ != ']') + ; + Simple_vFAIL3("POSIX syntax [%c %c] is reserved for future extensions", c, c); + } } } } @@ -2991,10 +3197,8 @@ S_regclass(pTHX) break; default: if (!SIZE_ONLY && ckWARN(WARN_REGEXP) && isALPHA(value)) - Perl_warner(aTHX_ WARN_REGEXP, - "/%.127s/: Unrecognized escape \\%c in character class passed through", - PL_regprecomp, - (int)value); + + vWARN2(PL_regcomp_parse, "Unrecognized escape \\%c in character class passed through", (int)value); break; } } @@ -3005,12 +3209,11 @@ S_regclass(pTHX) if (range) { /* a-\d, a-[:digit:] */ if (!SIZE_ONLY) { if (ckWARN(WARN_REGEXP)) - Perl_warner(aTHX_ WARN_REGEXP, - "/%.127s/: false [] range \"%*.*s\" in regexp", - PL_regprecomp, - PL_regcomp_parse - rangebegin, - PL_regcomp_parse - rangebegin, - rangebegin); + vWARN4(PL_regcomp_parse, + "False [] range \"%*.*s\"", + PL_regcomp_parse - rangebegin, + PL_regcomp_parse - rangebegin, + rangebegin); ANYOF_BITMAP_SET(ret, lastvalue); ANYOF_BITMAP_SET(ret, '-'); } @@ -3264,7 +3467,7 @@ S_regclass(pTHX) } break; default: - vFAIL("invalid [::] class"); + vFAIL("Invalid [::] class"); break; } if (LOC) @@ -3274,12 +3477,10 @@ S_regclass(pTHX) } if (range) { if (lastvalue > value) /* b-a */ { - Perl_croak(aTHX_ - "/%.127s/: invalid [] range \"%*.*s\" in regexp", - PL_regprecomp, - PL_regcomp_parse - rangebegin, - PL_regcomp_parse - rangebegin, - rangebegin); + Simple_vFAIL4("Invalid [] range \"%*.*s\"", + PL_regcomp_parse - rangebegin, + PL_regcomp_parse - rangebegin, + rangebegin); } range = 0; } @@ -3290,12 +3491,11 @@ S_regclass(pTHX) PL_regcomp_parse++; if (namedclass > OOB_NAMEDCLASS) { /* \w-, [:word:]- */ if (ckWARN(WARN_REGEXP)) - Perl_warner(aTHX_ WARN_REGEXP, - "/%.127s/: false [] range \"%*.*s\" in regexp", - PL_regprecomp, - PL_regcomp_parse - rangebegin, - PL_regcomp_parse - rangebegin, - rangebegin); + vWARN4(PL_regcomp_parse, + "False [] range \"%*.*s\"", + PL_regcomp_parse - rangebegin, + PL_regcomp_parse - rangebegin, + rangebegin); if (!SIZE_ONLY) ANYOF_BITMAP_SET(ret, '-'); } else @@ -3416,7 +3616,7 @@ S_regclassutf8(pTHX) if (*PL_regcomp_parse == '{') { e = strchr(PL_regcomp_parse++, '}'); if (!e) - FAIL("Missing right brace on \\p{}"); + vFAIL("Missing right brace on \\p{}"); n = e - PL_regcomp_parse; } else { @@ -3449,8 +3649,8 @@ S_regclassutf8(pTHX) case 'x': if (*PL_regcomp_parse == '{') { e = strchr(PL_regcomp_parse++, '}'); - if (!e) - FAIL("Missing right brace on \\x{}"); + if (!e) + vFAIL("Missing right brace on \\x{}"); numlen = 1; /* allow underscores */ value = (UV)scan_hex(PL_regcomp_parse, e - PL_regcomp_parse, @@ -3475,10 +3675,9 @@ S_regclassutf8(pTHX) break; default: if (!SIZE_ONLY && ckWARN(WARN_REGEXP) && isALPHA(value)) - Perl_warner(aTHX_ WARN_REGEXP, - "/%.127s/: Unrecognized escape \\%c in character class passed through", - PL_regprecomp, - (int)value); + vWARN2(PL_regcomp_parse, + "Unrecognized escape \\%c in character class passed through", + (int)value); break; } } @@ -3486,12 +3685,11 @@ S_regclassutf8(pTHX) if (range) { /* a-\d, a-[:digit:] */ if (!SIZE_ONLY) { if (ckWARN(WARN_REGEXP)) - Perl_warner(aTHX_ WARN_REGEXP, - "/%.127s/: false [] range \"%*.*s\" in regexp", - PL_regprecomp, - PL_regcomp_parse - rangebegin, - PL_regcomp_parse - rangebegin, - rangebegin); + vWARN4(PL_regcomp_parse, + "False [] range \"%*.*s\"", + PL_regcomp_parse - rangebegin, + PL_regcomp_parse - rangebegin, + rangebegin); Perl_sv_catpvf(aTHX_ listsv, /* 0x002D is Unicode for '-' */ "%04"UVxf"\n002D\n", (UV)lastvalue); @@ -3558,12 +3756,10 @@ S_regclassutf8(pTHX) } if (range) { if (lastvalue > value) { /* b-a */ - Perl_croak(aTHX_ - "/%.127s/: invalid [] range \"%*.*s\" in regexp", - PL_regprecomp, - PL_regcomp_parse - rangebegin, - PL_regcomp_parse - rangebegin, - rangebegin); + Simple_vFAIL4("invalid [] range \"%*.*s\"", + PL_regcomp_parse - rangebegin, + PL_regcomp_parse - rangebegin, + rangebegin); } range = 0; } @@ -3574,12 +3770,11 @@ S_regclassutf8(pTHX) PL_regcomp_parse++; if (namedclass > OOB_NAMEDCLASS) { /* \w-, [:word:]- */ if (ckWARN(WARN_REGEXP)) - Perl_warner(aTHX_ WARN_REGEXP, - "/%.127s/: false [] range \"%*.*s\" in regexp", - PL_regprecomp, - PL_regcomp_parse - rangebegin, - PL_regcomp_parse - rangebegin, - rangebegin); + vWARN4(PL_regcomp_parse, + "False [] range \"%*.*s\"", + PL_regcomp_parse - rangebegin, + PL_regcomp_parse - rangebegin, + rangebegin); if (!SIZE_ONLY) Perl_sv_catpvf(aTHX_ listsv, /* 0x002D is Unicode for '-' */ @@ -3976,7 +4171,7 @@ Perl_regprop(pTHX_ SV *sv, regnode *o) sv_setpvn(sv, "", 0); if (OP(o) >= reg_num) /* regnode.type is unsigned */ - FAIL("corrupted regexp opcode"); + FAIL("Corrupted regexp opcode"); sv_catpv(sv, (char*)reg_name[OP(o)]); /* Take off const! */ k = PL_regkind[(U8)OP(o)]; @@ -269,20 +269,6 @@ struct regnode_charclass_class { #define UCHARAT(p) PL_regdummy #endif /* lint */ -#define FAIL(m) \ - STMT_START { \ - if (!SIZE_ONLY) \ - SAVEDESTRUCTOR_X(clear_re,(void*)PL_regcomp_rx); \ - Perl_croak(aTHX_ "/%.127s/: %s", PL_regprecomp,m); \ - } STMT_END - -#define FAIL2(pat,m) \ - STMT_START { \ - if (!SIZE_ONLY) \ - SAVEDESTRUCTOR_X(clear_re,(void*)PL_regcomp_rx); \ - S_re_croak2(aTHX_ "/%.127s/: ",pat,PL_regprecomp,m); \ - } STMT_END - #define EXTRA_SIZE(guy) ((sizeof(guy)-1)/sizeof(struct regnode)) #define REG_SEEN_ZERO_LEN 1 diff --git a/t/op/misc.t b/t/op/misc.t index 00abc99b45..900e014d58 100755 --- a/t/op/misc.t +++ b/t/op/misc.t @@ -346,7 +346,7 @@ print "you die joe!\n" unless "@x" eq 'x y z'; /(?{"{"})/ # Check it outside of eval too EXPECT Sequence (?{...}) not terminated or not {}-balanced at - line 1, within pattern -/(?{"{"})/: Sequence (?{...}) not terminated or not {}-balanced at - line 1. +Sequence (?{...}) not terminated or not {}-balanced at <HERE< mark in regex m/(?{ <<<HERE<<< "{"})/ at - line 1. ######## /(?{"{"}})/ # Check it outside of eval too EXPECT diff --git a/t/op/regmesg.t b/t/op/regmesg.t new file mode 100644 index 0000000000..fa22c0649b --- /dev/null +++ b/t/op/regmesg.t @@ -0,0 +1,185 @@ +#!./perl -w + +BEGIN { + chdir 't' if -d 't'; + unshift @INC, '../lib'; +} + +my $debug = 1; + +## +## If the markers used are changed (search for "MARKER1" in regcomp.c), +## update only these two variables, and leave the {#} in the @death/@warning +## arrays below. The {#} is a meta-marker -- it marks where the marker should +## go. + +my $marker1 = "<HERE<"; +my $marker2 = " <<<HERE<<< "; + +## +## Key-value pairs of code/error of code that should have fatal errors. +## +my @death = +( + '/[[=foo=]]/' => 'POSIX syntax [= =] is reserved for future extensions at {#} mark in regex m/[[=foo=]{#}]/', + + '/(?<= .*)/' => 'Variable length lookbehind not implemented at {#} mark in regex m/(?<= .*){#}/', + + '/(?<= x{10000})/' => 'Lookbehind longer than 255 not implemented at {#} mark in regex m/(?<= x{10000}){#}/', + + '/(?@)/' => 'Sequence (?@...) not implemented at {#} mark in regex m/(?@{#})/', + + '/(?{ 1/' => 'Sequence (?{...}) not terminated or not {}-balanced at {#} mark in regex m/(?{{#} 1/', + + '/(?(1x))/' => 'Switch condition not recognized at {#} mark in regex m/(?(1x{#}))/', + + '/(?(1)x|y|z)/' => 'Switch (?(condition)... contains too many branches at {#} mark in regex m/(?(1)x|y|{#}z)/', + + '/(?(x)y|x)/' => 'Unknown switch condition (?(x) at {#} mark in regex m/(?({#}x)y|x)/', + + '/(?/' => 'Sequence (? incomplete at {#} mark in regex m/(?{#}/', + + '/(?;x/' => 'Sequence (?;...) not recognized at {#} mark in regex m/(?;{#}x/', + '/(?<;x/' => 'Sequence (?<;...) not recognized at {#} mark in regex m/(?<;{#}x/', + + '/((x)/' => 'Unmatched ( at {#} mark in regex m/({#}(x)/', + + '/x{99999}/' => 'Quantifier in {,} bigger than 32766 at {#} mark in regex m/x{{#}99999}/', + + '/x{3,1}/' => 'Can\'t do {n,m} with n > m at {#} mark in regex m/x{3,1}{#}/', + + '/x**/' => 'Nested quantifiers at {#} mark in regex m/x**{#}/', + + '/x[/' => 'Unmatched [ at {#} mark in regex m/x[{#}/', + + '/*/', => 'Quantifier follows nothing at {#} mark in regex m/*{#}/', + + '/\p{x/' => 'Missing right brace on \p{} at {#} mark in regex m/\p{{#}x/', + + 'use utf8; /[\p{x]/' => 'Missing right brace on \p{} at {#} mark in regex m/[\p{{#}x]/', + + '/(x)\2/' => 'Reference to nonexistent group at {#} mark in regex m/(x)\2{#}/', + + 'my $m = chr(92); $m =~ $m', => 'Trailing \ in regex m/\/', + + '/\x{1/' => 'Missing right brace on \x{} at {#} mark in regex m/\x{{#}1/', + + 'use utf8; /[\x{X]/' => 'Missing right brace on \x{} at {#} mark in regex m/[\x{{#}X]/', + + '/\x{x}/' => 'Can\'t use \x{} without \'use utf8\' declaration at {#} mark in regex m/\x{x}{#}/', + + '/[[:barf:]]/' => 'POSIX class [:barf:] unknown at {#} mark in regex m/[[:barf:]{#}]/', + + '/[[=barf=]]/' => 'POSIX syntax [= =] is reserved for future extensions at {#} mark in regex m/[[=barf=]{#}]/', + + '/[[.barf.]]/' => 'POSIX syntax [. .] is reserved for future extensions at {#} mark in regex m/[[.barf.]{#}]/', + + '/[z-a]/' => 'Invalid [] range "z-a" at {#} mark in regex m/[z-a{#}]/', +); + +## +## Key-value pairs of code/error of code that should have non-fatal warnings. +## +@warning = ( + "m/(?p{ 'a' })/" => "(?p{}) is deprecated - use (??{}) at {#} mark in regex m/(?p{#}{ 'a' })/", + + 'm/\b*/' => '\b* matches null string many times at {#} mark in regex m/\b*{#}/', + + 'm/[:blank:]/' => 'POSIX syntax [: :] belongs inside character classes at {#} mark in regex m/[:blank:]{#}/', + + "m'[\\y]'" => 'Unrecognized escape \y in character class passed through at {#} mark in regex m/[\y{#}]/', + + 'm/[a-\d]/' => 'False [] range "a-\d" at {#} mark in regex m/[a-\d{#}]/', + 'm/[\w-x]/' => 'False [] range "\w-" at {#} mark in regex m/[\w-{#}x]/', + "m'\\y'" => 'Unrecognized escape \y passed through at {#} mark in regex m/\y{#}/', +); + +my $total = (@death + @warning)/2; + +print "1..$total\n"; + +my $count = 0; + +while (@death) +{ + $count++; + my $regex = shift @death; + my $result = shift @death; + + undef $@; + $_ = "x"; + eval $regex; + if (not $@) { + if ($debug) { + print "oops, $regex didn't die\n" + } else { + print "not ok $count\n"; + } + next; + } + chomp $@; + $@ =~ s/ at \(.*?\) line \d+\.$//; + $result =~ s/{\#}/$marker1/; + $result =~ s/{\#}/$marker2/; + if ($@ ne $result) { + if ($debug) { + print "For $regex, expected:\n $result\nGot:\n $@\n\n"; + } else { + print "not ok $count\n"; + } + next; + } + print "ok $count\n"; +} + + +our $warning; +$SIG{__WARN__} = sub { $warning = shift }; + +while (@warning) +{ + $count++; + my $regex = shift @warning; + my $result = shift @warning; + + undef $warning; + $_ = "x"; + eval $regex; + + if ($@) + { + if ($debug) { + print "oops, $regex died with:\n\t$@\n"; + } else { + print "not ok $count\n"; + } + next; + } + + if (not $warning) + { + if ($debug) { + print "oops, $regex didn't generate a warning\n"; + } else { + print "not ok $count\n"; + } + next; + } + chomp $warning; + $warning =~ s/ at \(.*?\) line \d+\.$//; + $result =~ s/{\#}/$marker1/; + $result =~ s/{\#}/$marker2/; + if ($warning ne $result) + { + if ($debug) { + print "For $regex, expected:\n $result\nGot:\n $warning\n\n"; + } else { + print "not ok $count\n"; + } + next; + } + print "ok $count\n"; +} + + + diff --git a/t/pragma/warn/regcomp b/t/pragma/warn/regcomp index ef87b7fbb4..82b9b53722 100644 --- a/t/pragma/warn/regcomp +++ b/t/pragma/warn/regcomp @@ -29,7 +29,7 @@ $a =~ /(?=a)*/ ; no warnings 'regexp' ; $a =~ /(?=a)*/ ; EXPECT -(?=a)* matches null string many times at - line 4. +(?=a)* matches null string many times at <HERE< mark in regex m/(?=a)* <<<HERE<<< / at - line 4. ######## # regcomp.c [S_study_chunk] use warnings 'regexp' ; @@ -38,7 +38,7 @@ $_ = "" ; no warnings 'regexp' ; /(?=a)?/; EXPECT -Strange *+?{} on zero-length expression at - line 4. +Quantifier unexpected on zero-length expression at <HERE< mark in regex m/(?=a)? <<<HERE<<< / at - line 4. ######## # regcomp.c [S_regatom] $x = '\m' ; @@ -47,7 +47,7 @@ $a =~ /a$x/ ; no warnings 'regexp' ; $a =~ /a$x/ ; EXPECT -/a\m/: Unrecognized escape \m passed through at - line 4. +Unrecognized escape \m passed through at <HERE< mark in regex m/a\m <<<HERE<<< / at - line 4. ######## # regcomp.c [S_regpposixcc S_checkposixcc] BEGIN { $ENV{PERL_DESTRUCT_LEVEL} = 0 unless $ENV{PERL_DESTRUCT_LEVEL} > 3 } @@ -61,9 +61,9 @@ no warnings 'regexp' ; /[:zog:]/; /[[:zog:]]/; EXPECT -Character class syntax [: :] belongs inside character classes at - line 5. -Character class syntax [: :] belongs inside character classes at - line 6. -Character class [:zog:] unknown at - line 7. +POSIX syntax [: :] belongs inside character classes at <HERE< mark in regex m/[:alpha:] <<<HERE<<< / at - line 5. +POSIX syntax [: :] belongs inside character classes at <HERE< mark in regex m/[:zog:] <<<HERE<<< / at - line 6. +POSIX class [:zog:] unknown at <HERE< mark in regex m/[[:zog:] <<<HERE<<< ]/ ######## # regcomp.c [S_checkposixcc] BEGIN { $ENV{PERL_DESTRUCT_LEVEL} = 0 unless $ENV{PERL_DESTRUCT_LEVEL} > 3 } @@ -73,8 +73,8 @@ $_ = "" ; no warnings 'regexp' ; /[.zog.]/; EXPECT -Character class syntax [. .] belongs inside character classes at - line 5. -Character class syntax [. .] is reserved for future extensions at - line 5. +POSIX syntax [. .] belongs inside character classes at <HERE< mark in regex m/[.zog.] <<<HERE<<< / at - line 5. +POSIX syntax [. .] is reserved for future extensions at <HERE< mark in regex m/[.zog.] <<<HERE<<< / ######## # regcomp.c [S_checkposixcc] BEGIN { $ENV{PERL_DESTRUCT_LEVEL} = 0 unless $ENV{PERL_DESTRUCT_LEVEL} > 3 } @@ -84,7 +84,7 @@ $_ = "" ; no warnings 'regexp' ; /[[.zog.]]/; EXPECT -Character class syntax [. .] is reserved for future extensions at - line 5. +POSIX syntax [. .] is reserved for future extensions at <HERE< mark in regex m/[[.zog.] <<<HERE<<< ]/ ######## # regcomp.c [S_regclass] $_ = ""; @@ -109,14 +109,14 @@ no warnings 'regexp' ; /[[:alpha:]-[:digit:]]/; /[[:digit:]-[:alpha:]]/; EXPECT -/[a-\d]/: false [] range "a-\d" in regexp at - line 5. -/[\d-b]/: false [] range "\d-" in regexp at - line 6. -/[\s-\d]/: false [] range "\s-" in regexp at - line 7. -/[\d-\s]/: false [] range "\d-" in regexp at - line 8. -/[a-[:digit:]]/: false [] range "a-[:digit:]" in regexp at - line 9. -/[[:digit:]-b]/: false [] range "[:digit:]-" in regexp at - line 10. -/[[:alpha:]-[:digit:]]/: false [] range "[:alpha:]-" in regexp at - line 11. -/[[:digit:]-[:alpha:]]/: false [] range "[:digit:]-" in regexp at - line 12. +False [] range "a-\d" at <HERE< mark in regex m/[a-\d <<<HERE<<< ]/ at - line 5. +False [] range "\d-" at <HERE< mark in regex m/[\d- <<<HERE<<< b]/ at - line 6. +False [] range "\s-" at <HERE< mark in regex m/[\s- <<<HERE<<< \d]/ at - line 7. +False [] range "\d-" at <HERE< mark in regex m/[\d- <<<HERE<<< \s]/ at - line 8. +False [] range "a-[:digit:]" at <HERE< mark in regex m/[a-[:digit:] <<<HERE<<< ]/ at - line 9. +False [] range "[:digit:]-" at <HERE< mark in regex m/[[:digit:]- <<<HERE<<< b]/ at - line 10. +False [] range "[:alpha:]-" at <HERE< mark in regex m/[[:alpha:]- <<<HERE<<< [:digit:]]/ at - line 11. +False [] range "[:digit:]-" at <HERE< mark in regex m/[[:digit:]- <<<HERE<<< [:alpha:]]/ at - line 12. ######## # regcomp.c [S_regclassutf8] BEGIN { @@ -148,14 +148,14 @@ no warnings 'regexp' ; /[[:alpha:]-[:digit:]]/; /[[:digit:]-[:alpha:]]/; EXPECT -/[a-\d]/: false [] range "a-\d" in regexp at - line 12. -/[\d-b]/: false [] range "\d-" in regexp at - line 13. -/[\s-\d]/: false [] range "\s-" in regexp at - line 14. -/[\d-\s]/: false [] range "\d-" in regexp at - line 15. -/[a-[:digit:]]/: false [] range "a-[:digit:]" in regexp at - line 16. -/[[:digit:]-b]/: false [] range "[:digit:]-" in regexp at - line 17. -/[[:alpha:]-[:digit:]]/: false [] range "[:alpha:]-" in regexp at - line 18. -/[[:digit:]-[:alpha:]]/: false [] range "[:digit:]-" in regexp at - line 19. +False [] range "a-\d" at <HERE< mark in regex m/[a-\d <<<HERE<<< ]/ at - line 12. +False [] range "\d-" at <HERE< mark in regex m/[\d- <<<HERE<<< b]/ at - line 13. +False [] range "\s-" at <HERE< mark in regex m/[\s- <<<HERE<<< \d]/ at - line 14. +False [] range "\d-" at <HERE< mark in regex m/[\d- <<<HERE<<< \s]/ at - line 15. +False [] range "a-[:digit:]" at <HERE< mark in regex m/[a-[:digit:] <<<HERE<<< ]/ at - line 16. +False [] range "[:digit:]-" at <HERE< mark in regex m/[[:digit:]- <<<HERE<<< b]/ at - line 17. +False [] range "[:alpha:]-" at <HERE< mark in regex m/[[:alpha:]- <<<HERE<<< [:digit:]]/ at - line 18. +False [] range "[:digit:]-" at <HERE< mark in regex m/[[:digit:]- <<<HERE<<< [:alpha:]]/ at - line 19. ######## # regcomp.c [S_regclass S_regclassutf8] use warnings 'regexp' ; @@ -163,4 +163,5 @@ $a =~ /[a\zb]/ ; no warnings 'regexp' ; $a =~ /[a\zb]/ ; EXPECT -/[a\zb]/: Unrecognized escape \z in character class passed through at - line 3. +Unrecognized escape \z in character class passed through at <HERE< mark in regex m/[a\z <<<HERE<<< b]/ at - line 3. + |