Re: enhanced(?) regex error messages

Message-Id: <200008090759.AAA07144@ventrue.yahoo.com> (plus two small patches sent privately) (this still seems to leave few test failures) p4raw-id: //depot/perl@6560
author: Jeffrey Friedl <jfriedl@regex.info> 2000-08-08 17:59:43 -0700
committer: Jarkko Hietaniemi <jhi@iki.fi> 2000-08-09 23:05:47 +0000
commit: b45f050a81173020b0089d3ff02fa0276958461a (patch)
tree: 9ba9f52c16593c4b22ca597ee69abdc99927555b
parent: 68d47915e40dc6ab5050a2c749c177f42d1374ff (diff)
download: perl-b45f050a81173020b0089d3ff02fa0276958461a.tar.gz
6 files changed, 650 insertions, 233 deletions
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index ea6f8931a3..c20d71d568 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -404,7 +404,7 @@ check the return value of your socket() call?  See L<perlfunc/bind>.
 =item Bizarre copy of %s in %s
 
 (P) Perl detected an attempt to copy an internal value that is not
-copiable.
+copyable.
 
 =item B<-P> not allowed for setuid/setgid script
 
@@ -563,10 +563,11 @@ C<-i.bak>, or some such.
 characters and Perl was unable to create a unique filename during
 inplace editing with the B<-i> switch.  The file was ignored.
 
-=item Can't do {n,m} with n > m
+=item Can't do {n,m} with n > m at <HERE< in regex m/%s/
 
-(F) Minima must be less than or equal to maxima.  If you really want
-your regexp to match something 0 times, just put {0}.  See L<perlre>.
+(F) Minima must be less than or equal to maxima. If you really want your
+regexp to match something 0 times, just put {0}. The <HERE< shows in the
+regular expression about where the problem was discovered. See L<perlre>.
 
 =item Can't do setegid!
 
@@ -1043,35 +1044,6 @@ references can be weakened.
 with an assignment operator, which implies modifying the value itself.
 Perhaps you need to copy the value to a temporary, and repeat that.
 
-=item Character class syntax [%s] belongs inside character classes
-
-(W unsafe) The character class constructs [: :], [= =], and [. .]  go
-I<inside> character classes, the [] are part of the construct, for
-example: /[012[:alpha:]345]/.  Note that [= =] and [. .] are not
-currently implemented; they are simply placeholders for future
-extensions and will cause fatal errors.
-
-=item Character class syntax [. .] is reserved for future extensions
-
-(F regexp) Within regular expression character classes ([]) the syntax
-beginning with "[." and ending with ".]" is reserved for future
-extensions.  If you need to represent those character sequences inside
-a regular expression character class, just quote the square brackets
-with the backslash: "\[." and ".\]".
-
-=item Character class syntax [= =] is reserved for future extensions
-
-(F) Within regular expression character classes ([]) the syntax
-beginning with "[=" and ending with "=]" is reserved for future
-extensions.  If you need to represent those character sequences inside
-a regular expression character class, just quote the square brackets
-with the backslash: "\[=" and "=\]".
-
-=item Character class [:%s:] unknown
-
-(F) The class in the character class [: :] syntax is unknown.  See
-L<perlre>.
-
 =item chmod() mode argument is missing initial 0
 
 (W chmod) A novice will sometimes say
@@ -1413,10 +1385,11 @@ some time before now.  Check your logic flow.  flock() operates on
 filehandles.  Are you attempting to call flock() on a dirhandle by the
 same name?
 
-=item ?+* follows nothing in regexp
+=item Quantifier follows nothing at <HERE< in regex m/%s/
 
-(F) You started a regular expression with a quantifier.  Backslash it if
-you meant it literally.   See L<perlre>.
+(F) You started a regular expression with a quantifier. Backslash it if you
+meant it literally. The <HERE< shows in the regular expression about where the
+problem was discovered. See L<perlre>.
 
 =item Format not terminated
 
@@ -1672,9 +1645,12 @@ transparently promotes all numbers to a floating point representation
 internally--subject to loss of precision errors in subsequent
 operations.
 
-=item internal disaster in regexp
+=item Internal disaster at <HERE< in regex m/%s/
 
 (P) Something went badly wrong in the regular expression parser.
+The <HERE< shows in the regular expression about where the problem was
+discovered.
+
 
 =item Internal inconsistency in tracking vforks
 
@@ -1685,9 +1661,11 @@ L<perlvms/"exec LIST">).  Somehow, this count has become scrambled, so
 Perl is making a guess and treating this C<exec> as a request to
 terminate the Perl script and execute the specified command.
 
-=item internal urp in regexp at /%s/
+=item Internal urp at <HERE< in regex m/%s/
+
+(P) Something went badly awry in the regular expression parser. The <HERE<
+shows in the regular expression about where the problem was discovered.
 
-(P) Something went badly awry in the regular expression parser.
 
 =item %s (...) interpreted as function
 
@@ -1779,11 +1757,6 @@ effective uids or gids failed.
 to check the return value of your socket() call?  See
 L<perlfunc/listen>.
 
-=item Lookbehind longer than %d not implemented at {#} mark in regex 5s
-
-There is an upper limit to the depth of lookbehind in the (?<=
-regular expression construct.
-
 =item lstat() on filehandle %s
 
 (W io) You tried to do a lstat on a filehandle.  What did you mean
@@ -1796,6 +1769,12 @@ instead on the filehandle.)
 values cannot be returned in subroutines used in lvalue context.  See
 L<perlsub/"Lvalue subroutines">.
 
+=item Lookbehind longer than %d not implemented at <HERE< in reges m/%s/
+
+(F) There is currently a limit on the length of string which lookbehind can
+handle. This restriction may be eased in a future release. The <HERE< shows in
+the regular expression about where the problem was discovered.
+       
 =item Malformed PERLLIB_PREFIX
 
 (F) An error peculiar to OS/2.  PERLLIB_PREFIX should be of the form
@@ -1972,14 +1951,16 @@ provided for this purpose.
 (F) You tried to do a read/write/send/recv operation with a buffer
 length that is less than 0.  This is difficult to imagine.
 
-=item Nested quantifiers in regexp
+=item Nested quantifiers at <HERE< in regex m/%s/
 
-(F) You can't quantify a quantifier without intervening parentheses.  So
-things like ** or +* or ?* are illegal.
+(F) You can't quantify a quantifier without intervening parentheses. So
+things like ** or +* or ?* are illegal. The <HERE< shows in the regular
+expression about where the problem was discovered.
 
 Note, however, that the minimal matching quantifiers, C<*?>, C<+?>, and
 C<??> appear to be nested quantifiers, but aren't.  See L<perlre>.
 
+
 =item %s never introduced
 
 (S internal) The symbol in question was declared but somehow went out of
@@ -2588,6 +2569,35 @@ problem can be found in L<perllocale> section B<LOCALE PROBLEMS>.
 process which isn't a subprocess of the current process.  While this is
 fine from VMS' perspective, it's probably not what you intended.
 
+=item POSIX syntax [%s] belongs inside character classes
+
+(W unsafe) The character class constructs [: :], [= =], and [. .]  go
+I<inside> character classes, the [] are part of the construct, for
+example: /[012[:alpha:]345]/.  Note that [= =] and [. .] are not
+currently implemented; they are simply placeholders for future
+extensions and will cause fatal errors.
+
+=item POSIX  syntax [. .] is reserved for future extensions
+
+(F regexp) Within regular expression character classes ([]) the syntax
+beginning with "[." and ending with ".]" is reserved for future
+extensions.  If you need to represent those character sequences inside
+a regular expression character class, just quote the square brackets
+with the backslash: "\[." and ".\]".
+
+=item POSIX syntax [= =] is reserved for future extensions
+
+(F) Within regular expression character classes ([]) the syntax
+beginning with "[=" and ending with "=]" is reserved for future
+extensions.  If you need to represent those character sequences inside
+a regular expression character class, just quote the square brackets
+with the backslash: "\[=" and "=\]".
+
+=item POSIX class [:%s:] unknown
+
+(F) The class in the character class [: :] syntax is unknown.  See
+L<perlre>.
+
 =item POSIX getpgrp can't take an argument
 
 (F) Your system has POSIX getpgrp(), which takes no argument, unlike
@@ -2710,14 +2720,19 @@ in L<perlos2>.
 (S unsafe) The subroutine being declared or defined had previously been
 declared or defined with a different function prototype.
 
-=item Quantifier in {,} bigger than %d at {#} mark in regex %s
+=item Quantifier in {,} bigger than %d at <HERE< in regex m/%s/
 
-(F) There is an upper limit to the number of allowed repetitions in the {,}
-regular expression construct.
+(F) There is currently a limit to the size of the min and max values of the
+{min,max} construct. The <HERE< shows in the regular expression about where
+the problem was discovered. See L<perlre>.
 
-=item Quantifier follows nothing in rgexp
+=item Quantifier unexpected on zero-length expression at <HERE< in regex m/%s/
 
-(F) Quantifiers like * are suffixes, they quantify something preceding them.
+(W regexp) You applied a regular expression quantifier in a place where
+it makes no sense, such as on a zero-width assertion.  Try putting the
+quantifier inside the assertion instead.  For example, the way to match
+"abc" provided that it is followed by three repetitions of "xyz" is
+C</abc(?=(?:xyz){3})/>, not C</abc(?=xyz){3}/>.
 
 =item Range iterator outside integer range
 
@@ -2779,22 +2794,22 @@ Doing so has no effect.
 (W internal) The internal sv_replace() function was handed a new SV with
 a reference count of other than 1.
 
-=item Reference to nonexistent group
+=item Reference to nonexistant group at <HERE< in regex m/%s/
+
+(F) You used something like C<\7> in your regular expression, but there are
+not at least seven sets of capturing parentheses in the expression. If you
+wanted to have the character with value 7 inserted into the regular expression,
+prepend a zero to make the number at least two digits: C<\07>
 
-(F) In a regexp you tried to reference (\1, \2, ...) a group that
-doesn't exist.  Count your parentheses.
+The <HERE< shows in the regular expression about where the problem was
+discovered.
 
 =item regexp memory corruption
 
 (P) The regular expression engine got confused by what the regular
 expression compiler gave it.
 
-=item regexp *+ operand could be empty
-
-(F) The part of the regexp subject to either the * or + quantifier could
-match an empty string.
-
-=item regexp out of space
+=item Regexp out of space
 
 (P) A "can't happen" error, because safemalloc() should have caught it
 earlier.
@@ -2891,22 +2906,31 @@ scalar that had previously been marked as free.
 (W closed) The socket you're sending to got itself closed sometime
 before now.  Check your logic flow.
 
-=item Sequence (? incomplete
+=item Sequence (? incomplete at <HERE< mark in regex m/%s/
 
-(F) A regular expression ended with an incomplete extension (?.  See
+(F) A regular expression ended with an incomplete extension (?. The <HERE<
+shows in the regular expression about where the problem was discovered. See
 L<perlre>.
 
-=item Sequence (?%s...) not implemented
+=item Sequence (?{...}) not terminated or not {}-balanced in regex m/%s/
+
+(F) If the contents of a (?{...}) clause contains braces, they must balance
+for Perl to properly detect the end of the clause. See L<perlre>.
 
-(F) A proposed regular expression extension has the character reserved
-but has not yet been written.  See L<perlre>.
+=item Sequence (?%s...) not implemented at <HERE< mark in regex m/%s/
 
-=item Sequence (?%s...) not recognized
+(F) A proposed regular expression extension has the character reserved but
+has not yet been written. The <HERE< shows in the regular expression about
+where the problem was discovered. See L<perlre>.
+
+=item Sequence (?%s...) not recognized at <HERE< mark in regex m/%s/
 
 (F) You used a regular expression extension that doesn't make sense.
+The <HERE< shows in the regular expression about
+where the problem was discovered. 
 See L<perlre>.
 
-=item Sequence (?#... not terminated
+=item Sequence (?#... not terminated in regex m/%s/
 
 (F) A regular expression comment must be terminated by a closing
 parenthesis.  Embedded parentheses aren't allowed.  See L<perlre>.
@@ -3043,14 +3067,6 @@ a block by itself.
 (W unopened) You tried to use the stat() function on a filehandle that
 was either never opened or has since been closed.
 
-=item Strange *+?{} on zero-length expression
-
-(W regexp) You applied a regular expression quantifier in a place where
-it makes no sense, such as on a zero-width assertion.  Try putting the
-quantifier inside the assertion instead.  For example, the way to match
-"abc" provided that it is followed by three repetitions of "xyz" is
-C</abc(?=(?:xyz){3})/>, not C</abc(?=xyz){3}/>.
-
 =item Stub found while resolving method `%s' overloading %s
 
 (P) Overloading resolution over @ISA tree may be broken by importation
@@ -3098,6 +3114,24 @@ assignment or as a subroutine argument for example).
 (F) Your Perl was compiled with B<-D>SETUID_SCRIPTS_ARE_SECURE_NOW, but
 a version of the setuid emulator somehow got run anyway.
 
+=item Switch (?(condition)... contains too many branches at <HERE< in regex m/%s/
+
+(F) A (?(condition)if-clause|else-clause) construct can have at most two
+branches (the if-clause and the else-clause). If you want one or both to
+contain alternation, such as using C<this|that|other>, enclose it in
+clustering parentheses:
+
+    (?(condition)(?:this|that|other)|else-clause)
+
+The <HERE< shows in the regular expression about where the problem was
+discovered. See L<perlre>.
+
+=item Switch condition not recognized at <HERE< in regex m/%s/
+
+(F) If the argument to the (?(...)if-clause|else-clause) construct is a
+number, it can be only a number. The <HERE< shows in the regular expression
+about where the problem was discovered. See L<perlre>.
+
 =item switching effective %s is not implemented
 
 (F) While under the C<use filetest> pragma, we cannot switch the real
@@ -3367,11 +3401,23 @@ Check the #! line, or manually feed your script into Perl yourself.
 (F) The unexec() routine failed for some reason.  See your local FSF
 representative, who probably put it there in the first place.
 
+
 =item Unknown BYTEORDER
 
 (F) There are no byte-swapping functions for a machine with this byte
 order.
 
+=item Unknown switch condition (?(%.2s at <HERE< in regex m/%s/
+
+(F) The condition of a (?(condition)if-clause|else-clause) construct is not
+known. The condition may be lookaround (the condition is true if the
+lookaround is true), a (?{...}) construct (the condition is true if the
+code evaluates to a true value), or a number (the condition is true if the
+set of capturing parentheses named by the number is defined).
+
+The <HERE< shows in the regular expression about where the problem was
+discovered.  See L<perlre>.
+
 =item Unknown open() mode '%s'
 
 (F) The second argument of 3-argument open() is not among the list
@@ -3423,12 +3469,14 @@ script, a binary program, or a directory as a Perl program.
 recognized by Perl inside character classes.  The character was
 understood literally.
 
-=item /%s/: Unrecognized escape \\%c passed through
+=item Unrecognized escape \\%c passed through at <HERE< in m/%s/
 
 (W regexp) You used a backslash-character combination which is not
-recognized by Perl.  This combination appears in an interpolated
-variable or a C<'>-delimited regular expression.  The character was
-understood literally.
+recognized by Perl. This combination appears in an interpolated variable or
+a C<'>-delimited regular expression. The character was understood
+literally. The <HERE< shows in the regular expression about where the escape
+was discovered.
+
 
 =item Unrecognized escape \\%c passed through
 
@@ -3659,10 +3707,6 @@ something else of the same name (usually a subroutine) is exported by
 that module.  It usually means you put the wrong funny character on the
 front of your variable.
 
-=item Variable length lookbehind not implemented
-
-(F) Lookbehind currently only works for fixed-length regular expressions.
-
 =item "%s" variable %s masks earlier declaration in same %s
 
 (W misc) A "my" or "our" variable has been redeclared in the current
@@ -3719,6 +3763,12 @@ anonymous, using the C<sub {}> syntax.  When inner anonymous subs that
 reference variables in outer subroutines are called or referenced, they
 are automatically rebound to the current values of such variables.
 
+=item Variable length lookbehind not implemented at <HERE< in regex m/%s/
+
+(F) Lookbehind is allowed only for subexpressions whose length is fixed and
+known at compile time. The <HERE< shows in the regular expression about where
+the problem was discovered.
+
 =item Version number must be a constant number
 
 (P) The attempt to translate a C<use Module n.n LIST> statement into
diff --git a/regcomp.c b/regcomp.c
index 12b2eef633..fd4633ba9f 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -202,25 +202,180 @@ static scan_data_t zero_scan_data = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 #define CHR_DIST(a,b) (UTF ? utf8_distance(a,b) : a - b)
 
 
-#define	vFAIL(m)                                                             \
+/* length of regex to show in messages that don't mark a position within */
+#define RegexLengthToShowInErrorMessages 127
+
+/*
+ * If MARKER[12] are adjusted, be sure to adjust the constants at the top
+ * of t/op/regmesg.t, the tests in t/op/re_tests, and those in
+ * op/pragma/warn/regcomp.
+ */
+#define MARKER1 "<HERE<"      /* marker as it appears in the description */
+#define MARKER2 " <<<HERE<<< "  /* marker as it appears within the regex */
+   
+#define REPORT_LOCATION " at " MARKER1 " mark in regex m/%.*s" MARKER2 "%s/"
+
+/*
+ * Calls SAVEDESTRUCTOR_X if needed, then calls Perl_croak with the given
+ * arg. Show regex, up to a maximum length. If it's too long, chop and add
+ * "...".
+ */
+#define	FAIL(m)                                                              \
     STMT_START {                                                             \
+        char *elipises = "";                                                 \
+        unsigned len = strlen(PL_regprecomp);                                \
+                                                                             \
 	if (!SIZE_ONLY)                                                      \
 	    SAVEDESTRUCTOR_X(clear_re,(void*)PL_regcomp_rx);                 \
-	Perl_croak(aTHX_ "%s at {#} mark in regex m/%.*s{#}%s/", m,          \
-		   strlen(PL_regprecomp)-(PL_regxend - PL_regcomp_parse),    \
-                   PL_regprecomp,                                            \
-                   PL_regprecomp + strlen(PL_regprecomp)-(PL_regxend - PL_regcomp_parse));\
+                                                                             \
+	if (len > RegexLengthToShowInErrorMessages) {                        \
+            /* chop 10 shorter than the max, to ensure meaning of "..." */   \
+	    len = RegexLengthToShowInErrorMessages - 10;                     \
+	    elipises = "...";                                                \
+	}                                                                    \
+	Perl_croak(aTHX_ "%s in regex m/%.*s%s/",                            \
+		   m, len, PL_regprecomp, elipises);                         \
     } STMT_END
 
-#define	vFAIL2(pat,m)                                                        \
+/*
+ * Calls SAVEDESTRUCTOR_X if needed, then calls Perl_croak with the given
+ * args. Show regex, up to a maximum length. If it's too long, chop and add
+ * "...".
+ */
+#define	FAIL2(pat,m)                                                         \
     STMT_START {                                                             \
+        char *elipises = "";                                                 \
+        unsigned len = strlen(PL_regprecomp);                                \
+                                                                             \
 	if (!SIZE_ONLY)                                                      \
 	    SAVEDESTRUCTOR_X(clear_re,(void*)PL_regcomp_rx);                 \
                                                                              \
-	S_re_croak2(aTHX_ pat, " at {#} mark in regex m/%.*s{#}%s/: ", m,    \
-		   strlen(PL_regprecomp)-(PL_regxend - PL_regcomp_parse),    \
-                   PL_regprecomp,                                            \
-                   PL_regprecomp + strlen(PL_regprecomp)-(PL_regxend - PL_regcomp_parse));\
+	if (len > RegexLengthToShowInErrorMessages) {                        \
+            /* chop 10 shorter than the max, to ensure meaning of "..." */   \
+	    len = RegexLengthToShowInErrorMessages - 10;                     \
+	    elipises = "...";                                                \
+	}                                                                    \
+	S_re_croak2(aTHX_ pat, " in regex m/%.*s%s/",                        \
+		    m, len, PL_regprecomp, elipises);                        \
+    } STMT_END
+
+
+/*
+ * Simple_vFAIL -- like FAIL, but marks the current location in the scan
+ */
+#define	Simple_vFAIL(m)                                                      \
+    STMT_START {                                                             \
+      unsigned offset = strlen(PL_regprecomp)-(PL_regxend-PL_regcomp_parse); \
+                                                                             \
+      Perl_croak(aTHX_ "%s" REPORT_LOCATION,               \
+		 m, offset, PL_regprecomp, PL_regprecomp + offset);          \
+    } STMT_END
+
+/*
+ * Calls SAVEDESTRUCTOR_X if needed, then Simple_vFAIL()
+ */
+#define	vFAIL(m)                                                             \
+    STMT_START {                                                             \
+      if (!SIZE_ONLY)                                                        \
+	    SAVEDESTRUCTOR_X(clear_re,(void*)PL_regcomp_rx);                 \
+      Simple_vFAIL(m);                                                       \
+    } STMT_END
+
+/*
+ * Like Simple_vFAIL(), but accepts two arguments.
+ */
+#define	Simple_vFAIL2(m,a1)                                                  \
+    STMT_START {                                                             \
+      unsigned offset = strlen(PL_regprecomp)-(PL_regxend-PL_regcomp_parse); \
+                                                                             \
+      S_re_croak2(aTHX_ m, REPORT_LOCATION, a1,       \
+		  offset, PL_regprecomp, PL_regprecomp + offset);            \
+    } STMT_END
+
+/*
+ * Calls SAVEDESTRUCTOR_X if needed, then Simple_vFAIL2().
+ */
+#define	vFAIL2(m,a1)                                                         \
+    STMT_START {                                                             \
+      if (!SIZE_ONLY)                                                        \
+	    SAVEDESTRUCTOR_X(clear_re,(void*)PL_regcomp_rx);                 \
+      Simple_vFAIL2(m, a1);                                                  \
+    } STMT_END
+
+
+/*
+ * Like Simple_vFAIL(), but accepts three arguments.
+ */
+#define	Simple_vFAIL3(m, a1, a2)                                             \
+    STMT_START {                                                             \
+      unsigned offset = strlen(PL_regprecomp)-(PL_regxend-PL_regcomp_parse); \
+                                                                             \
+      S_re_croak2(aTHX_ m, REPORT_LOCATION, a1, a2,   \
+		  offset, PL_regprecomp, PL_regprecomp + offset);            \
+    } STMT_END
+
+/*
+ * Calls SAVEDESTRUCTOR_X if needed, then Simple_vFAIL3().
+ */
+#define	vFAIL3(m,a1,a2)                                                      \
+    STMT_START {                                                             \
+      if (!SIZE_ONLY)                                                        \
+	    SAVEDESTRUCTOR_X(clear_re,(void*)PL_regcomp_rx);                 \
+      Simple_vFAIL3(m, a1, a2);                                              \
+    } STMT_END
+
+/*
+ * Like Simple_vFAIL(), but accepts four arguments.
+ */
+#define	Simple_vFAIL4(m, a1, a2, a3)                                         \
+    STMT_START {                                                             \
+      unsigned offset = strlen(PL_regprecomp)-(PL_regxend-PL_regcomp_parse); \
+                                                                             \
+      S_re_croak2(aTHX_ m, REPORT_LOCATION, a1, a2, a3,\
+		  offset, PL_regprecomp, PL_regprecomp + offset);            \
+    } STMT_END
+
+/*
+ * Like Simple_vFAIL(), but accepts five arguments.
+ */
+#define	Simple_vFAIL5(m, a1, a2, a3, a4)                                     \
+    STMT_START {                                                             \
+      unsigned offset = strlen(PL_regprecomp)-(PL_regxend-PL_regcomp_parse); \
+      S_re_croak2(aTHX_ m, REPORT_LOCATION, a1, a2, a3, a4,\
+		  offset, PL_regprecomp, PL_regprecomp + offset);            \
+    } STMT_END
+
+
+#define	vWARN(loc,m)                                                         \
+    STMT_START {                                                             \
+        unsigned offset = strlen(PL_regprecomp)-(PL_regxend-(loc));          \
+	Perl_warner(aTHX_ WARN_REGEXP, "%s" REPORT_LOCATION,\
+		 m, offset, PL_regprecomp, PL_regprecomp + offset);          \
+    } STMT_END                                                               \
+
+
+#define	vWARN2(loc, m, a1)                                                   \
+    STMT_START {                                                             \
+        unsigned offset = strlen(PL_regprecomp)-(PL_regxend-(loc));          \
+	Perl_warner(aTHX_ WARN_REGEXP, m REPORT_LOCATION,\
+                 a1,                                                         \
+		 offset, PL_regprecomp, PL_regprecomp + offset);             \
+    } STMT_END
+
+#define	vWARN3(loc, m, a1, a2)                                               \
+    STMT_START {                                                             \
+      unsigned offset = strlen(PL_regprecomp) - (PL_regxend - (loc));        \
+	Perl_warner(aTHX_ WARN_REGEXP, m REPORT_LOCATION,                    \
+                 a1, a2,                                                     \
+		 offset, PL_regprecomp, PL_regprecomp + offset);             \
+    } STMT_END
+
+#define	vWARN4(loc, m, a1, a2, a3)                                           \
+    STMT_START {                                                             \
+      unsigned offset = strlen(PL_regprecomp)-(PL_regxend-(loc));            \
+	Perl_warner(aTHX_ WARN_REGEXP, m REPORT_LOCATION,\
+                 a1, a2, a3,                                                 \
+		 offset, PL_regprecomp, PL_regprecomp + offset);             \
     } STMT_END
 
 
@@ -788,8 +943,11 @@ S_study_chunk(pTHX_ regnode **scanp, I32 *deltap, regnode *last, scan_data_t *da
 		if (ckWARN(WARN_REGEXP) && (minnext + deltanext == 0) 
 		    && !(data->flags & (SF_HAS_PAR|SF_IN_PAR))
 		    && maxcount <= REG_INFTY/3) /* Complement check for big count */
-		    Perl_warner(aTHX_ WARN_REGEXP,
-				"Strange *+?{} on zero-length expression");
+		{
+		    vWARN(PL_regcomp_parse,
+			  "Quantifier unexpected on zero-length expression");
+		}
+
 		min += minnext * mincount;
 		is_inf_internal |= ((maxcount == REG_INFTY 
 				     && (minnext + deltanext) > 0)
@@ -852,7 +1010,7 @@ S_study_chunk(pTHX_ regnode **scanp, I32 *deltap, regnode *last, scan_data_t *da
 			regnode *nxt1 = NEXTOPER(oscan) + EXTRA_STEP_2ARGS; /* OPEN*/
 
 			if (OP(nxt) != CLOSE) 
-			    FAIL("panic opt close");
+			    FAIL("Panic opt close");
 			oscan->flags = ARG(nxt);
 			OP(nxt1) = OPTIMIZED;	/* was OPEN. */
 			OP(nxt) = OPTIMIZED;	/* was CLOSE. */
@@ -1414,7 +1572,8 @@ Perl_pregcomp(pTHX_ char *exp, char *xend, PMOP *pm)
     Newc(1001, r, sizeof(regexp) + (unsigned)PL_regsize * sizeof(regnode),
 	 char, regexp);
     if (r == NULL)
-	FAIL("regexp out of space");
+	FAIL("Regexp out of space");
+
 #ifdef DEBUGGING
     /* avoid reading uninitialized memory in DEBUGGING code in study_chunk() */
     Zero(r, sizeof(regexp) + (unsigned)PL_regsize * sizeof(regnode), char);
@@ -1721,6 +1880,7 @@ S_reg(pTHX_ I32 paren, I32 *flagp)
     register regnode *ender = 0;
     register I32 parno = 0;
     I32 flags, oregflags = PL_regflags, have_branch = 0, open = 0;
+    char *oregcomp_parse = PL_regcomp_parse;
     char c;
 
     *flagp = 0;				/* Tentatively. */
@@ -1731,6 +1891,7 @@ S_reg(pTHX_ I32 paren, I32 *flagp)
 	    U16 posflags = 0, negflags = 0;
 	    U16 *flagsp = &posflags;
 	    int logical = 0;
+	    char *seqstart = PL_regcomp_parse;
 
 	    PL_regcomp_parse++;
 	    paren = *PL_regcomp_parse++;
@@ -1763,8 +1924,7 @@ S_reg(pTHX_ I32 paren, I32 *flagp)
 		return NULL;
 	    case 'p':
 		if (SIZE_ONLY)
-		    Perl_warner(aTHX_ WARN_REGEXP,
-				"(?p{}) is deprecated - use (??{})");
+		    vWARN(PL_regcomp_parse, "(?p{}) is deprecated - use (??{})");
 		/* FALL THROUGH*/
 	    case '?':
 		logical = 1;
@@ -1791,7 +1951,10 @@ S_reg(pTHX_ I32 paren, I32 *flagp)
 		    PL_regcomp_parse++;
 		}
 		if (*PL_regcomp_parse != ')')
-		    FAIL("Sequence (?{...}) not terminated or not {}-balanced");
+		{
+		    PL_regcomp_parse = s;		    
+		    vFAIL("Sequence (?{...}) not terminated or not {}-balanced");
+		}
 		if (!SIZE_ONLY) {
 		    AV *av;
 		    
@@ -1850,7 +2013,7 @@ S_reg(pTHX_ I32 paren, I32 *flagp)
 			PL_regcomp_parse++;
 		    ret = reganode(GROUPP, parno);
 		    if ((c = *nextchar()) != ')')
-			vFAIL2("Switch (?(number%c not recognized", c);
+			vFAIL("Switch condition not recognized");
 		  insert_if:
 		    regtail(ret, reganode(IFTHEN, 0));
 		    br = regbranch(&flags, 1);
@@ -1884,10 +2047,11 @@ S_reg(pTHX_ I32 paren, I32 *flagp)
 		    return ret;
 		}
 		else {
-		    vFAIL2("Unknown condition for (?(%.2s", PL_regcomp_parse);
+		    vFAIL2("Unknown switch condition (?(%.2s", PL_regcomp_parse);
 		}
 	    }
             case 0:
+		PL_regcomp_parse--; /* for vFAIL to print correctly */
                 vFAIL("Sequence (? incomplete");
                 break;
 	    default:
@@ -1911,8 +2075,10 @@ S_reg(pTHX_ I32 paren, I32 *flagp)
 		    break;
 		}		
 	      unknown:
-		if (*PL_regcomp_parse != ')')
-		    vFAIL2("Sequence (?%c...) not recognized", *PL_regcomp_parse);
+		if (*PL_regcomp_parse != ')') {
+		    PL_regcomp_parse++;
+		    vFAIL3("Sequence (%.*s...) not recognized", PL_regcomp_parse-seqstart, seqstart);
+		}
 		nextchar();
 		*flagp = TRYAGAIN;
 		return NULL;
@@ -2024,15 +2190,17 @@ S_reg(pTHX_ I32 paren, I32 *flagp)
     if (paren) {
 	PL_regflags = oregflags;
 	if (PL_regcomp_parse >= PL_regxend || *nextchar() != ')') {
-	    FAIL("unmatched () in regexp");
+ 	    PL_regcomp_parse++;
+ 	    vFAIL("Unmatched (");
 	}
     }
     else if (!paren && PL_regcomp_parse < PL_regxend) {
 	if (*PL_regcomp_parse == ')') {
-	    FAIL("unmatched () in regexp");
+ 	    PL_regcomp_parse = oregcomp_parse;
+ 	    vFAIL("Unmatched (");
 	}
 	else
-	    FAIL("junk on end of regexp");	/* "Can't happen". */
+	    FAIL("Junk on end of regexp");	/* "Can't happen". */
 	/* NOTREACHED */
     }
 
@@ -2207,8 +2375,19 @@ S_regpiece(pTHX_ I32 *flagp)
     }
 
 #if 0				/* Now runtime fix should be reliable. */
+
+    /* if this is reinstated, don't forget to put this back into perldiag:
+
+	    =item Regexp *+ operand could be empty at {#} in regex m/%s/
+
+	   (F) The part of the regexp subject to either the * or + quantifier
+           could match an empty string. The {#} shows in the regular
+           expression about where the problem was discovered.
+
+    */
+
     if (!(flags&HASWIDTH) && op != '?')
-      FAIL("regexp *+ operand could be empty");
+      vFAIL("Regexp *+ operand could be empty");
 #endif 
 
     nextchar();
@@ -2239,8 +2418,10 @@ S_regpiece(pTHX_ I32 *flagp)
     }
   nest_check:
     if (ckWARN(WARN_REGEXP) && !SIZE_ONLY && !(flags&HASWIDTH) && max > REG_INFTY/3) {
-	Perl_warner(aTHX_ WARN_REGEXP, "%.*s matches null string many times",
-	    PL_regcomp_parse - origparse, origparse);
+	vWARN3(PL_regcomp_parse,
+	       "%.*s matches null string many times",
+	       PL_regcomp_parse - origparse,
+	       origparse);
     }
 
     if (*PL_regcomp_parse == '?') {
@@ -2248,8 +2429,10 @@ S_regpiece(pTHX_ I32 *flagp)
 	reginsert(MINMOD, ret);
 	regtail(ret, ret + NODE_STEP_REGNODE);
     }
-    if (ISMULT2(PL_regcomp_parse))
-	vFAIL("Nested quantifiers in regexp");
+    if (ISMULT2(PL_regcomp_parse)) {
+	PL_regcomp_parse++;
+	vFAIL("Nested quantifiers");
+    }
 
     return(ret);
 }
@@ -2262,8 +2445,7 @@ S_regpiece(pTHX_ I32 *flagp)
  * faster to run.  Backslashed characters are exceptions, each becoming a
  * separate node; the code is simpler that way and it's not worth fixing.
  *
- * [Yes, it is worth fixing, some scripts can run twice the speed.]
- */
+ * [Yes, it is worth fixing, some scripts can run twice the speed.] */
 STATIC regnode *
 S_regatom(pTHX_ I32 *flagp)
 {
@@ -2315,13 +2497,17 @@ tryagain:
 	PL_regnaughty++;
 	break;
     case '[':
-	PL_regcomp_parse++;
+    {
+	char *oregcomp_parse = ++PL_regcomp_parse;
 	ret = (UTF ? regclassutf8() : regclass());
-	if (*PL_regcomp_parse != ']')
-	    FAIL("unmatched [] in regexp");
+	if (*PL_regcomp_parse != ']') {
+	    PL_regcomp_parse = oregcomp_parse;
+	    vFAIL("Unmatched [");
+	}
 	nextchar();
 	*flagp |= HASWIDTH|SIMPLE;
 	break;
+    }
     case '(':
 	nextchar();
 	ret = reg(1, &flags);
@@ -2344,7 +2530,7 @@ tryagain:
 	    *flagp |= TRYAGAIN;
 	    return NULL;
 	}
-	vFAIL("internal urp");
+	vFAIL("Internal urp");
 				/* Supposed to be caught earlier. */
 	break;
     case '{':
@@ -2356,7 +2542,8 @@ tryagain:
     case '?':
     case '+':
     case '*':
-	vFAIL("Quantifier follows nothing in regexp");
+	PL_regcomp_parse++;
+	vFAIL("Quantifier follows nothing");
 	break;
     case '\\':
 	switch (*++PL_regcomp_parse) {
@@ -2480,8 +2667,11 @@ tryagain:
 
 		if (PL_regcomp_parse[1] == '{') {
 		    PL_regxend = strchr(PL_regcomp_parse, '}');
-		    if (!PL_regxend)
-			FAIL("Missing right brace on \\p{}");
+		    if (!PL_regxend) {
+			PL_regcomp_parse += 2;
+			PL_regxend = oldregxend;
+			vFAIL("Missing right brace on \\p{}");
+		    }
 		    PL_regxend++;
 		}
 		else
@@ -2514,6 +2704,9 @@ tryagain:
 		if (num > 9 && num >= PL_regnpar)
 		    goto defchar;
 		else {
+                   while (isDIGIT(*PL_regcomp_parse))
+                       PL_regcomp_parse++;
+
 		    if (!SIZE_ONLY && num > PL_regcomp_rx->nparens)
 			vFAIL("Reference to nonexistent group");
 		    PL_regsawback = 1;
@@ -2521,8 +2714,6 @@ tryagain:
 				   ? (LOC ? REFFL : REFF)
 				   : REF, num);
 		    *flagp |= HASWIDTH;
-		    while (isDIGIT(*PL_regcomp_parse))
-			PL_regcomp_parse++;
 		    PL_regcomp_parse--;
 		    nextchar();
 		}
@@ -2530,7 +2721,7 @@ tryagain:
 	    break;
 	case '\0':
 	    if (PL_regcomp_parse >= PL_regxend)
-		FAIL("trailing \\ in regexp");
+		FAIL("Trailing \\");
 	    /* FALL THROUGH */
 	default:
 	    /* Do not generate `unrecognized' warnings here, we fall
@@ -2632,8 +2823,10 @@ tryagain:
 			if (*++p == '{') {
 			    char* e = strchr(p, '}');
 	 
-			    if (!e)
-				FAIL("Missing right brace on \\x{}");
+			    if (!e) {
+				PL_regcomp_parse = p + 1;
+				vFAIL("Missing right brace on \\x{}");
+			    }
 			    else if (UTF) {
 				numlen = 1;	/* allow underscores */
 				ender = (UV)scan_hex(p + 1, e - p - 1, &numlen);
@@ -2645,7 +2838,11 @@ tryagain:
 				p = e + 1;
 			    }
 			    else
+			    {
+				PL_regcomp_parse = e + 1;
 				vFAIL("Can't use \\x{} without 'use utf8' declaration");
+			    }
+
 			}
 			else {
 			    numlen = 0;		/* disallow underscores */
@@ -2673,14 +2870,11 @@ tryagain:
 			break;
 		    case '\0':
 			if (p >= PL_regxend)
-			    FAIL("trailing \\ in regexp");
+			    FAIL("Trailing \\");
 			/* FALL THROUGH */
 		    default:
 			if (!SIZE_ONLY && ckWARN(WARN_REGEXP) && isALPHA(*p))
-			    Perl_warner(aTHX_ WARN_REGEXP, 
-					"/%.127s/: Unrecognized escape \\%c passed through",
-					PL_regprecomp,
-					*p);
+			    vWARN2(p +1, "Unrecognized escape \\%c passed through", *p);
 			goto normal_default;
 		    }
 		    break;
@@ -2728,7 +2922,7 @@ tryagain:
 	    PL_regcomp_parse = p - 1;
 	    nextchar();
 	    if (len < 0)
-		vFAIL("internal disaster");
+		vFAIL("Internal disaster");
 	    if (len > 0)
 		*flagp |= HASWIDTH;
 	    if (len == 1)
@@ -2866,13 +3060,19 @@ S_regpposixcc(pTHX_ I32 value)
 		    if (namedclass == OOB_NAMEDCLASS ||
 			posixcc[skip] != ':' ||
 			posixcc[skip+1] != ']')
-			Perl_croak(aTHX_
-				   "Character class [:%.*s:] unknown",
-				   t - s - 1, s + 1);
-		} else if (!SIZE_ONLY)
+		    {
+			Simple_vFAIL3("POSIX class [:%.*s:] unknown",
+				      t - s - 1, s + 1);
+		    }
+		} else if (!SIZE_ONLY) {
 		    /* [[=foo=]] and [[.foo.]] are still future. */
-		    Perl_croak(aTHX_
-			       "Character class syntax [%c %c] is reserved for future extensions", c, c);
+
+		    /* adjust PL_regcomp_parse so the warning shows after
+		       the class closes */
+		    while (*PL_regcomp_parse && *PL_regcomp_parse != ']')
+			PL_regcomp_parse++;
+		    Simple_vFAIL3("POSIX syntax [%c %c] is reserved for future extensions", c, c);
+		}
 	    } else {
 		/* Maternal grandfather:
 		 * "[:" ending in ":" but not in ":]" */
@@ -2897,11 +3097,17 @@ S_checkposixcc(pTHX)
 	while(*s && isALNUM(*s))
 	    s++;
 	if (*s && c == *s && s[1] == ']') {
-	    Perl_warner(aTHX_ WARN_REGEXP,
-			"Character class syntax [%c %c] belongs inside character classes", c, c);
+	    vWARN3(s+2, "POSIX syntax [%c %c] belongs inside character classes", c, c);
+
+	    /* [[=foo=]] and [[.foo.]] are still future. */
 	    if (c == '=' || c == '.')
-		Perl_croak(aTHX_
-			   "Character class syntax [%c %c] is reserved for future extensions", c, c);
+	    {
+		/* adjust PL_regcomp_parse so the error shows after
+		   the class closes */
+		while (*PL_regcomp_parse && *PL_regcomp_parse++ != ']')
+		    ;
+		Simple_vFAIL3("POSIX syntax [%c %c] is reserved for future extensions", c, c);
+	    }
 	}
     }
 }
@@ -2991,10 +3197,8 @@ S_regclass(pTHX)
 		break;
 	    default:
 		if (!SIZE_ONLY && ckWARN(WARN_REGEXP) && isALPHA(value))
-		    Perl_warner(aTHX_ WARN_REGEXP, 
-				"/%.127s/: Unrecognized escape \\%c in character class passed through",
-				PL_regprecomp,
-				(int)value);
+
+		    vWARN2(PL_regcomp_parse, "Unrecognized escape \\%c in character class passed through", (int)value);
 		break;
 	    }
 	}
@@ -3005,12 +3209,11 @@ S_regclass(pTHX)
 	    if (range) { /* a-\d, a-[:digit:] */
 		if (!SIZE_ONLY) {
 		    if (ckWARN(WARN_REGEXP))
-			Perl_warner(aTHX_ WARN_REGEXP,
-				    "/%.127s/: false [] range \"%*.*s\" in regexp",
-				    PL_regprecomp,
-				    PL_regcomp_parse - rangebegin,
-				    PL_regcomp_parse - rangebegin,
-				    rangebegin);
+			vWARN4(PL_regcomp_parse,
+			       "False [] range \"%*.*s\"",
+			       PL_regcomp_parse - rangebegin,
+			       PL_regcomp_parse - rangebegin,
+			       rangebegin);
 		    ANYOF_BITMAP_SET(ret, lastvalue);
 		    ANYOF_BITMAP_SET(ret, '-');
 		}
@@ -3264,7 +3467,7 @@ S_regclass(pTHX)
 		    }
 		    break;
 		default:
-		    vFAIL("invalid [::] class");
+		    vFAIL("Invalid [::] class");
 		    break;
 		}
 		if (LOC)
@@ -3274,12 +3477,10 @@ S_regclass(pTHX)
 	}
 	if (range) {
 	    if (lastvalue > value) /* b-a */ {
-		Perl_croak(aTHX_
-			   "/%.127s/: invalid [] range \"%*.*s\" in regexp",
-			   PL_regprecomp,
-			   PL_regcomp_parse - rangebegin,
-			   PL_regcomp_parse - rangebegin,
-			   rangebegin);
+		Simple_vFAIL4("Invalid [] range \"%*.*s\"",
+			      PL_regcomp_parse - rangebegin,
+			      PL_regcomp_parse - rangebegin,
+			      rangebegin);
 	    }
 	    range = 0;
 	}
@@ -3290,12 +3491,11 @@ S_regclass(pTHX)
 		PL_regcomp_parse++;
 		if (namedclass > OOB_NAMEDCLASS) { /* \w-, [:word:]- */
 		    if (ckWARN(WARN_REGEXP))
-			Perl_warner(aTHX_ WARN_REGEXP,
-				    "/%.127s/: false [] range \"%*.*s\" in regexp",
-				    PL_regprecomp,
-				    PL_regcomp_parse - rangebegin,
-				    PL_regcomp_parse - rangebegin,
-				    rangebegin);
+			vWARN4(PL_regcomp_parse,
+			       "False [] range \"%*.*s\"",
+			       PL_regcomp_parse - rangebegin,
+			       PL_regcomp_parse - rangebegin,
+			       rangebegin);
 		    if (!SIZE_ONLY)
 			ANYOF_BITMAP_SET(ret, '-');
 		} else
@@ -3416,7 +3616,7 @@ S_regclassutf8(pTHX)
 		if (*PL_regcomp_parse == '{') {
 		    e = strchr(PL_regcomp_parse++, '}');
                     if (!e)
-                        FAIL("Missing right brace on \\p{}");
+                        vFAIL("Missing right brace on \\p{}");
 		    n = e - PL_regcomp_parse;
 		}
 		else {
@@ -3449,8 +3649,8 @@ S_regclassutf8(pTHX)
 	    case 'x':
 		if (*PL_regcomp_parse == '{') {
 		    e = strchr(PL_regcomp_parse++, '}');
-                    if (!e)
-                        FAIL("Missing right brace on \\x{}");
+                    if (!e) 
+                        vFAIL("Missing right brace on \\x{}");
 		    numlen = 1;		/* allow underscores */
 		    value = (UV)scan_hex(PL_regcomp_parse,
 				     e - PL_regcomp_parse,
@@ -3475,10 +3675,9 @@ S_regclassutf8(pTHX)
 		break;
 	    default:
 		if (!SIZE_ONLY && ckWARN(WARN_REGEXP) && isALPHA(value))
-		    Perl_warner(aTHX_ WARN_REGEXP, 
-				"/%.127s/: Unrecognized escape \\%c in character class passed through",
-				PL_regprecomp,
-				(int)value);
+		    vWARN2(PL_regcomp_parse,
+			   "Unrecognized escape \\%c in character class passed through",
+			   (int)value);
 		break;
 	    }
 	}
@@ -3486,12 +3685,11 @@ S_regclassutf8(pTHX)
 	    if (range) { /* a-\d, a-[:digit:] */
 		if (!SIZE_ONLY) {
 		    if (ckWARN(WARN_REGEXP))
-			Perl_warner(aTHX_ WARN_REGEXP,
-				    "/%.127s/: false [] range \"%*.*s\" in regexp",
-				    PL_regprecomp,
-				    PL_regcomp_parse - rangebegin,
-				    PL_regcomp_parse - rangebegin,
-				    rangebegin);
+			vWARN4(PL_regcomp_parse,
+			       "False [] range \"%*.*s\"",
+			       PL_regcomp_parse - rangebegin,
+			       PL_regcomp_parse - rangebegin,
+			       rangebegin);
 		    Perl_sv_catpvf(aTHX_ listsv,
 				   /* 0x002D is Unicode for '-' */
 				   "%04"UVxf"\n002D\n", (UV)lastvalue);
@@ -3558,12 +3756,10 @@ S_regclassutf8(pTHX)
 	}
         if (range) {
 	    if (lastvalue > value) { /* b-a */
-		Perl_croak(aTHX_
-			   "/%.127s/: invalid [] range \"%*.*s\" in regexp",
-			   PL_regprecomp,
-			   PL_regcomp_parse - rangebegin,
-			   PL_regcomp_parse - rangebegin,
-			   rangebegin);
+		Simple_vFAIL4("invalid [] range \"%*.*s\"",
+			      PL_regcomp_parse - rangebegin,
+			      PL_regcomp_parse - rangebegin,
+			      rangebegin);
 	    }
 	    range = 0;
 	}
@@ -3574,12 +3770,11 @@ S_regclassutf8(pTHX)
 		PL_regcomp_parse++;
 		if (namedclass > OOB_NAMEDCLASS) { /* \w-, [:word:]- */
 		    if (ckWARN(WARN_REGEXP))
-			Perl_warner(aTHX_ WARN_REGEXP,
-				    "/%.127s/: false [] range \"%*.*s\" in regexp",
-				    PL_regprecomp,
-				    PL_regcomp_parse - rangebegin,
-				    PL_regcomp_parse - rangebegin,
-				    rangebegin);
+			vWARN4(PL_regcomp_parse,
+			       "False [] range \"%*.*s\"",
+			       PL_regcomp_parse - rangebegin,
+			       PL_regcomp_parse - rangebegin,
+			       rangebegin);
 		    if (!SIZE_ONLY)
 			Perl_sv_catpvf(aTHX_ listsv,
 				       /* 0x002D is Unicode for '-' */
@@ -3976,7 +4171,7 @@ Perl_regprop(pTHX_ SV *sv, regnode *o)
 
     sv_setpvn(sv, "", 0);
     if (OP(o) >= reg_num)		/* regnode.type is unsigned */
-	FAIL("corrupted regexp opcode");
+	FAIL("Corrupted regexp opcode");
     sv_catpv(sv, (char*)reg_name[OP(o)]); /* Take off const! */
 
     k = PL_regkind[(U8)OP(o)];
diff --git a/regcomp.h b/regcomp.h
index 3624917c6e..34c5c251be 100644
--- a/regcomp.h
+++ b/regcomp.h
@@ -269,20 +269,6 @@ struct regnode_charclass_class {
 #define UCHARAT(p)	PL_regdummy
 #endif /* lint */
 
-#define	FAIL(m) \
-    STMT_START {							\
-	if (!SIZE_ONLY)							\
-	    SAVEDESTRUCTOR_X(clear_re,(void*)PL_regcomp_rx);		\
-	Perl_croak(aTHX_ "/%.127s/: %s",  PL_regprecomp,m);		\
-    } STMT_END
-
-#define	FAIL2(pat,m) \
-    STMT_START {							\
-	if (!SIZE_ONLY)							\
-	    SAVEDESTRUCTOR_X(clear_re,(void*)PL_regcomp_rx);		\
-	S_re_croak2(aTHX_ "/%.127s/: ",pat,PL_regprecomp,m);		\
-    } STMT_END
-
 #define EXTRA_SIZE(guy) ((sizeof(guy)-1)/sizeof(struct regnode))
 
 #define REG_SEEN_ZERO_LEN	1
diff --git a/t/op/misc.t b/t/op/misc.t
index 00abc99b45..900e014d58 100755
--- a/t/op/misc.t
+++ b/t/op/misc.t
@@ -346,7 +346,7 @@ print "you die joe!\n" unless "@x" eq 'x y z';
 /(?{"{"})/	# Check it outside of eval too
 EXPECT
 Sequence (?{...}) not terminated or not {}-balanced at - line 1, within pattern
-/(?{"{"})/: Sequence (?{...}) not terminated or not {}-balanced at - line 1.
+Sequence (?{...}) not terminated or not {}-balanced at <HERE< mark in regex m/(?{ <<<HERE<<< "{"})/ at - line 1.
 ########
 /(?{"{"}})/	# Check it outside of eval too
 EXPECT
diff --git a/t/op/regmesg.t b/t/op/regmesg.t
new file mode 100644
index 0000000000..fa22c0649b
--- /dev/null
+++ b/t/op/regmesg.t
@@ -0,0 +1,185 @@
+#!./perl -w
+
+BEGIN {
+	chdir 't' if -d 't';
+	unshift @INC, '../lib';
+}
+
+my $debug = 1;
+
+##
+## If the markers used are changed (search for "MARKER1" in regcomp.c),
+## update only these two variables, and leave the {#} in the @death/@warning
+## arrays below. The {#} is a meta-marker -- it marks where the marker should
+## go.
+
+my $marker1 = "<HERE<";
+my $marker2 = " <<<HERE<<< ";
+
+##
+## Key-value pairs of code/error of code that should have fatal errors.
+##
+my @death =
+(
+ '/[[=foo=]]/' => 'POSIX syntax [= =] is reserved for future extensions at {#} mark in regex m/[[=foo=]{#}]/',
+
+ '/(?<= .*)/' =>  'Variable length lookbehind not implemented at {#} mark in regex m/(?<= .*){#}/',
+
+ '/(?<= x{10000})/' => 'Lookbehind longer than 255 not implemented at {#} mark in regex m/(?<= x{10000}){#}/',
+
+ '/(?@)/' => 'Sequence (?@...) not implemented at {#} mark in regex m/(?@{#})/',
+
+ '/(?{ 1/' => 'Sequence (?{...}) not terminated or not {}-balanced at {#} mark in regex m/(?{{#} 1/',
+
+ '/(?(1x))/' => 'Switch condition not recognized at {#} mark in regex m/(?(1x{#}))/',
+
+ '/(?(1)x|y|z)/' => 'Switch (?(condition)... contains too many branches at {#} mark in regex m/(?(1)x|y|{#}z)/',
+
+ '/(?(x)y|x)/' => 'Unknown switch condition (?(x) at {#} mark in regex m/(?({#}x)y|x)/',
+
+ '/(?/' => 'Sequence (? incomplete at {#} mark in regex m/(?{#}/',
+
+ '/(?;x/' => 'Sequence (?;...) not recognized at {#} mark in regex m/(?;{#}x/',
+ '/(?<;x/' => 'Sequence (?<;...) not recognized at {#} mark in regex m/(?<;{#}x/',
+
+ '/((x)/' => 'Unmatched ( at {#} mark in regex m/({#}(x)/',
+
+ '/x{99999}/' => 'Quantifier in {,} bigger than 32766 at {#} mark in regex m/x{{#}99999}/',
+
+ '/x{3,1}/' => 'Can\'t do {n,m} with n > m at {#} mark in regex m/x{3,1}{#}/',
+
+ '/x**/' => 'Nested quantifiers at {#} mark in regex m/x**{#}/',
+
+ '/x[/' => 'Unmatched [ at {#} mark in regex m/x[{#}/',
+
+ '/*/', => 'Quantifier follows nothing at {#} mark in regex m/*{#}/',
+
+ '/\p{x/' => 'Missing right brace on \p{} at {#} mark in regex m/\p{{#}x/',
+
+ 'use utf8; /[\p{x]/' => 'Missing right brace on \p{} at {#} mark in regex m/[\p{{#}x]/',
+
+ '/(x)\2/' => 'Reference to nonexistent group at {#} mark in regex m/(x)\2{#}/',
+
+ 'my $m = chr(92); $m =~ $m', => 'Trailing \ in regex m/\/',
+
+ '/\x{1/' => 'Missing right brace on \x{} at {#} mark in regex m/\x{{#}1/',
+
+ 'use utf8; /[\x{X]/' => 'Missing right brace on \x{} at {#} mark in regex m/[\x{{#}X]/',
+
+ '/\x{x}/' => 'Can\'t use \x{} without \'use utf8\' declaration at {#} mark in regex m/\x{x}{#}/',
+
+ '/[[:barf:]]/' => 'POSIX class [:barf:] unknown at {#} mark in regex m/[[:barf:]{#}]/',
+
+ '/[[=barf=]]/' => 'POSIX syntax [= =] is reserved for future extensions at {#} mark in regex m/[[=barf=]{#}]/',
+
+ '/[[.barf.]]/' => 'POSIX syntax [. .] is reserved for future extensions at {#} mark in regex m/[[.barf.]{#}]/',
+  
+ '/[z-a]/' => 'Invalid [] range "z-a" at {#} mark in regex m/[z-a{#}]/',
+);
+
+##
+## Key-value pairs of code/error of code that should have non-fatal warnings.
+##
+@warning = (
+    "m/(?p{ 'a' })/" => "(?p{}) is deprecated - use (??{}) at {#} mark in regex m/(?p{#}{ 'a' })/",
+
+    'm/\b*/' => '\b* matches null string many times at {#} mark in regex m/\b*{#}/',
+
+    'm/[:blank:]/' => 'POSIX syntax [: :] belongs inside character classes at {#} mark in regex m/[:blank:]{#}/',
+
+    "m'[\\y]'"     => 'Unrecognized escape \y in character class passed through at {#} mark in regex m/[\y{#}]/',
+
+    'm/[a-\d]/' => 'False [] range "a-\d" at {#} mark in regex m/[a-\d{#}]/',
+    'm/[\w-x]/' => 'False [] range "\w-" at {#} mark in regex m/[\w-{#}x]/',
+    "m'\\y'"     => 'Unrecognized escape \y passed through at {#} mark in regex m/\y{#}/',
+);
+
+my $total = (@death + @warning)/2;
+
+print "1..$total\n";
+
+my $count = 0;
+
+while (@death)
+{
+    $count++;
+    my $regex = shift @death;
+    my $result = shift @death;
+
+    undef $@;
+    $_ = "x";
+    eval $regex;
+    if (not $@) {
+	if ($debug) {
+	    print "oops, $regex didn't die\n"
+	} else {
+	    print "not ok $count\n";
+	}
+	next;
+    }
+    chomp $@;
+    $@ =~ s/ at \(.*?\) line \d+\.$//;
+    $result =~ s/{\#}/$marker1/;
+    $result =~ s/{\#}/$marker2/;
+    if ($@ ne $result) {
+	if ($debug) {
+	    print "For $regex, expected:\n  $result\nGot:\n  $@\n\n";
+	} else {
+	    print "not ok $count\n";
+	}
+	next;
+    }
+    print "ok $count\n";
+}
+
+
+our $warning;
+$SIG{__WARN__} = sub { $warning = shift };
+
+while (@warning)
+{
+    $count++;
+    my $regex = shift @warning;
+    my $result = shift @warning;
+
+    undef $warning;
+    $_ = "x";
+    eval $regex;
+
+    if ($@)
+    {
+	if ($debug) {
+	    print "oops, $regex died with:\n\t$@\n";
+	} else {
+	    print "not ok $count\n";
+	}
+	next;
+    }
+
+    if (not $warning)
+    {
+	if ($debug) {
+	    print "oops, $regex didn't generate a warning\n";
+	} else {
+	    print "not ok $count\n";
+	}
+	next;
+    }
+    chomp $warning;
+    $warning =~ s/ at \(.*?\) line \d+\.$//;
+    $result =~ s/{\#}/$marker1/;
+    $result =~ s/{\#}/$marker2/;
+    if ($warning ne $result)
+    {
+	if ($debug) {
+	    print "For $regex, expected:\n  $result\nGot:\n  $warning\n\n";
+	} else {
+	    print "not ok $count\n";
+	}
+	next;
+    }
+    print "ok $count\n";
+}
+
+
+
diff --git a/t/pragma/warn/regcomp b/t/pragma/warn/regcomp
index ef87b7fbb4..82b9b53722 100644
--- a/t/pragma/warn/regcomp
+++ b/t/pragma/warn/regcomp
@@ -29,7 +29,7 @@ $a =~ /(?=a)*/ ;
 no warnings 'regexp' ;
 $a =~ /(?=a)*/ ;
 EXPECT
-(?=a)* matches null string many times at - line 4.
+(?=a)* matches null string many times at <HERE< mark in regex m/(?=a)* <<<HERE<<< / at - line 4.
 ########
 # regcomp.c [S_study_chunk]
 use warnings 'regexp' ;
@@ -38,7 +38,7 @@ $_ = "" ;
 no warnings 'regexp' ;
 /(?=a)?/;
 EXPECT
-Strange *+?{} on zero-length expression at - line 4.
+Quantifier unexpected on zero-length expression at <HERE< mark in regex m/(?=a)? <<<HERE<<< / at - line 4.
 ########
 # regcomp.c [S_regatom]
 $x = '\m' ;
@@ -47,7 +47,7 @@ $a =~ /a$x/ ;
 no warnings 'regexp' ;
 $a =~ /a$x/ ;
 EXPECT
-/a\m/: Unrecognized escape \m passed through at - line 4.
+Unrecognized escape \m passed through at <HERE< mark in regex m/a\m <<<HERE<<< / at - line 4.
 ########
 # regcomp.c [S_regpposixcc S_checkposixcc]
 BEGIN { $ENV{PERL_DESTRUCT_LEVEL} = 0 unless $ENV{PERL_DESTRUCT_LEVEL} > 3 }
@@ -61,9 +61,9 @@ no warnings 'regexp' ;
 /[:zog:]/;
 /[[:zog:]]/;
 EXPECT
-Character class syntax [: :] belongs inside character classes at - line 5.
-Character class syntax [: :] belongs inside character classes at - line 6.
-Character class [:zog:] unknown at - line 7.
+POSIX syntax [: :] belongs inside character classes at <HERE< mark in regex m/[:alpha:] <<<HERE<<< / at - line 5.
+POSIX syntax [: :] belongs inside character classes at <HERE< mark in regex m/[:zog:] <<<HERE<<< / at - line 6.
+POSIX class [:zog:] unknown at <HERE< mark in regex m/[[:zog:] <<<HERE<<< ]/
 ########
 # regcomp.c [S_checkposixcc]
 BEGIN { $ENV{PERL_DESTRUCT_LEVEL} = 0 unless $ENV{PERL_DESTRUCT_LEVEL} > 3 }
@@ -73,8 +73,8 @@ $_ = "" ;
 no warnings 'regexp' ;
 /[.zog.]/;
 EXPECT
-Character class syntax [. .] belongs inside character classes at - line 5.
-Character class syntax [. .] is reserved for future extensions at - line 5.
+POSIX syntax [. .] belongs inside character classes at <HERE< mark in regex m/[.zog.] <<<HERE<<< / at - line 5.
+POSIX syntax [. .] is reserved for future extensions at <HERE< mark in regex m/[.zog.] <<<HERE<<< /
 ########
 # regcomp.c [S_checkposixcc]
 BEGIN { $ENV{PERL_DESTRUCT_LEVEL} = 0 unless $ENV{PERL_DESTRUCT_LEVEL} > 3 }
@@ -84,7 +84,7 @@ $_ = "" ;
 no warnings 'regexp' ;
 /[[.zog.]]/;
 EXPECT
-Character class syntax [. .] is reserved for future extensions at - line 5.
+POSIX syntax [. .] is reserved for future extensions at <HERE< mark in regex m/[[.zog.] <<<HERE<<< ]/
 ########
 # regcomp.c [S_regclass]
 $_ = "";
@@ -109,14 +109,14 @@ no warnings 'regexp' ;
 /[[:alpha:]-[:digit:]]/;
 /[[:digit:]-[:alpha:]]/;
 EXPECT
-/[a-\d]/: false [] range "a-\d" in regexp at - line 5.
-/[\d-b]/: false [] range "\d-" in regexp at - line 6.
-/[\s-\d]/: false [] range "\s-" in regexp at - line 7.
-/[\d-\s]/: false [] range "\d-" in regexp at - line 8.
-/[a-[:digit:]]/: false [] range "a-[:digit:]" in regexp at - line 9.
-/[[:digit:]-b]/: false [] range "[:digit:]-" in regexp at - line 10.
-/[[:alpha:]-[:digit:]]/: false [] range "[:alpha:]-" in regexp at - line 11.
-/[[:digit:]-[:alpha:]]/: false [] range "[:digit:]-" in regexp at - line 12.
+False [] range "a-\d" at <HERE< mark in regex m/[a-\d <<<HERE<<< ]/ at - line 5.
+False [] range "\d-" at <HERE< mark in regex m/[\d- <<<HERE<<< b]/ at - line 6.
+False [] range "\s-" at <HERE< mark in regex m/[\s- <<<HERE<<< \d]/ at - line 7.
+False [] range "\d-" at <HERE< mark in regex m/[\d- <<<HERE<<< \s]/ at - line 8.
+False [] range "a-[:digit:]" at <HERE< mark in regex m/[a-[:digit:] <<<HERE<<< ]/ at - line 9.
+False [] range "[:digit:]-" at <HERE< mark in regex m/[[:digit:]- <<<HERE<<< b]/ at - line 10.
+False [] range "[:alpha:]-" at <HERE< mark in regex m/[[:alpha:]- <<<HERE<<< [:digit:]]/ at - line 11.
+False [] range "[:digit:]-" at <HERE< mark in regex m/[[:digit:]- <<<HERE<<< [:alpha:]]/ at - line 12.
 ########
 # regcomp.c [S_regclassutf8]
 BEGIN {
@@ -148,14 +148,14 @@ no warnings 'regexp' ;
 /[[:alpha:]-[:digit:]]/;
 /[[:digit:]-[:alpha:]]/;
 EXPECT
-/[a-\d]/: false [] range "a-\d" in regexp at - line 12.
-/[\d-b]/: false [] range "\d-" in regexp at - line 13.
-/[\s-\d]/: false [] range "\s-" in regexp at - line 14.
-/[\d-\s]/: false [] range "\d-" in regexp at - line 15.
-/[a-[:digit:]]/: false [] range "a-[:digit:]" in regexp at - line 16.
-/[[:digit:]-b]/: false [] range "[:digit:]-" in regexp at - line 17.
-/[[:alpha:]-[:digit:]]/: false [] range "[:alpha:]-" in regexp at - line 18.
-/[[:digit:]-[:alpha:]]/: false [] range "[:digit:]-" in regexp at - line 19.
+False [] range "a-\d" at <HERE< mark in regex m/[a-\d <<<HERE<<< ]/ at - line 12.
+False [] range "\d-" at <HERE< mark in regex m/[\d- <<<HERE<<< b]/ at - line 13.
+False [] range "\s-" at <HERE< mark in regex m/[\s- <<<HERE<<< \d]/ at - line 14.
+False [] range "\d-" at <HERE< mark in regex m/[\d- <<<HERE<<< \s]/ at - line 15.
+False [] range "a-[:digit:]" at <HERE< mark in regex m/[a-[:digit:] <<<HERE<<< ]/ at - line 16.
+False [] range "[:digit:]-" at <HERE< mark in regex m/[[:digit:]- <<<HERE<<< b]/ at - line 17.
+False [] range "[:alpha:]-" at <HERE< mark in regex m/[[:alpha:]- <<<HERE<<< [:digit:]]/ at - line 18.
+False [] range "[:digit:]-" at <HERE< mark in regex m/[[:digit:]- <<<HERE<<< [:alpha:]]/ at - line 19.
 ########
 # regcomp.c [S_regclass S_regclassutf8]
 use warnings 'regexp' ;
@@ -163,4 +163,5 @@ $a =~ /[a\zb]/ ;
 no warnings 'regexp' ;
 $a =~ /[a\zb]/ ;
 EXPECT
-/[a\zb]/: Unrecognized escape \z in character class passed through at - line 3.
+Unrecognized escape \z in character class passed through at <HERE< mark in regex m/[a\z <<<HERE<<< b]/ at - line 3.
+
author	Jeffrey Friedl <jfriedl@regex.info>	2000-08-08 17:59:43 -0700
committer	Jarkko Hietaniemi <jhi@iki.fi>	2000-08-09 23:05:47 +0000
commit	b45f050a81173020b0089d3ff02fa0276958461a (patch)
tree	9ba9f52c16593c4b22ca597ee69abdc99927555b
parent	68d47915e40dc6ab5050a2c749c177f42d1374ff (diff)
download	perl-b45f050a81173020b0089d3ff02fa0276958461a.tar.gz