sv.c - add a _QUOTEDPREFIX version of SVf, UTF8f, and HEKf for use in error messages.

These new formats are intended to be used in error messages where we want to show the contents of a string without any possible hidden characters not rendering in the error message, and where it would be unreasonable to show every character of the string if it is very long. A good example would be when we want to say that a class name is illegal. Consider: "Foo\0"->thing() should not throw an error message about "Foo" being missing, the fact there is a null in there should be visible to the developer. Similarly if we had ("x" x 1000_000)->thing() we also do not want to throw a 1MB error message as it is generally just unhelpful, a class name that long is almost certainly a mistake. Currently this patch restricts it to showing 256 characters, the first 128 followed by an ellipses followed by the last 128 characters, but the docs are such that we can change that if we wish, I suspect something like 100 would be more reasonable. You can override the define PERL_QUOTEDPREFIX_LEN to a longer value in Configure if you wish. Example usage: other= newSVpvs("Some\0::Thing\n"); sv_catpvf(msg_sv,"%" SVf_QUOTEDPREFIX, SVfARG(other)); Should append "Some\0::Thing\n" to the msg_sv. If it were very long it would have ellipses infixed. The class name "x" x 1_000_000 would show Can't locate object method "non_existent_method" via \ package "x[repeated 128 times]"..."x[repeated 128 times]" \ (perhaps you forgot to load \ "x[repeated 128 times]"..."x[repeated 128 times]"?) at -e line 1. (but obviously as one line with the literal text of the class instead of "[repeated 128 times]") This patch changes a variety of error messages that used to output the full string always. I haven't changed every place that this could happen yet, just the main ones related to method calls, subroutine names and the like.
author: Yves Orton <demerphq@gmail.com> 2022-08-01 16:21:12 +0200
committer: Yves Orton <demerphq@gmail.com> 2022-08-25 14:46:33 +0200
commit: 33ef5d2cdf69f4a7b22ff4c220758689b59b28a8 (patch)
tree: 82f059df72c6aac72eff8ff761051f69debba20e
parent: 697eaf802a042beb1c1c6f1983a08a147f12eb72 (diff)
download: perl-33ef5d2cdf69f4a7b22ff4c220758689b59b28a8.tar.gz
9 files changed, 256 insertions, 49 deletions
diff --git a/dump.c b/dump.c
index 232130be39..7272e65ff8 100644
--- a/dump.c
+++ b/dump.c
@@ -110,28 +110,31 @@ will also be escaped.
 Normally the SV will be cleared before the escaped string is prepared,
 but when C<PERL_PV_ESCAPE_NOCLEAR> is set this will not occur.
 
-If C<PERL_PV_ESCAPE_UNI> is set then the input string is treated as UTF-8
-if C<PERL_PV_ESCAPE_UNI_DETECT> is set then the input string is scanned
+If C<PERL_PV_ESCAPE_UNI> is set then the input string is treated as UTF-8.
+If C<PERL_PV_ESCAPE_UNI_DETECT> is set then the input string is scanned
 using C<is_utf8_string()> to determine if it is UTF-8.
 
 If C<PERL_PV_ESCAPE_ALL> is set then all input chars will be output
-using C<\x01F1> style escapes, otherwise if C<PERL_PV_ESCAPE_NONASCII> is set, only
-non-ASCII chars will be escaped using this style; otherwise, only chars above
-255 will be so escaped; other non printable chars will use octal or
-common escaped patterns like C<\n>.
-Otherwise, if C<PERL_PV_ESCAPE_NOBACKSLASH>
-then all chars below 255 will be treated as printable and
-will be output as literals.
+using C<\x01F1> style escapes, otherwise if C<PERL_PV_ESCAPE_NONASCII>
+is set, only non-ASCII chars will be escaped using this style;
+otherwise, only chars above 255 will be so escaped; other non printable
+chars will use octal or common escaped patterns like C<\n>. Otherwise,
+if C<PERL_PV_ESCAPE_NOBACKSLASH> then all chars below 255 will be
+treated as printable and will be output as literals. The
+C<PERL_PV_ESCAPE_NON_WC> modifies the previous rules to cause word
+chars, unicode or otherwise, to be output as literals, note this uses
+the *unicode* rules for deciding on word characters.
 
 If C<PERL_PV_ESCAPE_FIRSTCHAR> is set then only the first char of the
-string will be escaped, regardless of max.  If the output is to be in hex,
-then it will be returned as a plain hex
-sequence.  Thus the output will either be a single char,
-an octal escape sequence, a special escape like C<\n> or a hex value.
+string will be escaped, regardless of max. If the output is to be in
+hex, then it will be returned as a plain hex sequence. Thus the output
+will either be a single char, an octal escape sequence, a special escape
+like C<\n> or a hex value.
 
-If C<PERL_PV_ESCAPE_RE> is set then the escape char used will be a C<"%"> and
-not a C<"\\">.  This is because regexes very often contain backslashed
-sequences, whereas C<"%"> is not a particularly common character in patterns.
+If C<PERL_PV_ESCAPE_RE> is set then the escape char used will be a
+C<"%"> and not a C<"\\">. This is because regexes very often contain
+backslashed sequences, whereas C<"%"> is not a particularly common
+character in patterns.
 
 Returns a pointer to the escaped text as held by C<dsv>.
 
@@ -144,6 +147,7 @@ Returns a pointer to the escaped text as held by C<dsv>.
 =for apidoc Amnh||PERL_PV_ESCAPE_RE
 =for apidoc Amnh||PERL_PV_ESCAPE_UNI
 =for apidoc Amnh||PERL_PV_ESCAPE_UNI_DETECT
+=for apidoc Amnh||PERL_PV_ESCAPE_NON_WC
 
 =cut
 
@@ -161,7 +165,7 @@ Unused or not for public use
 
 char *
 Perl_pv_escape( pTHX_ SV *dsv, char const * const str, 
-                const STRLEN count, const STRLEN max, 
+                const STRLEN count, STRLEN max,
                 STRLEN * const escaped, U32 flags )
 {
 
@@ -173,13 +177,42 @@ Perl_pv_escape( pTHX_ SV *dsv, char const * const str,
 
     const char esc = (flags & PERL_PV_ESCAPE_RE) ? '%' : '\\';
     const char dq = (flags & PERL_PV_ESCAPE_QUOTE) ? '"' : esc;
+    const char *qs;
+    const char *qe;
+
     char octbuf[PV_ESCAPE_OCTBUFSIZE] = "%123456789ABCDF";
     STRLEN wrote = 0;    /* chars written so far */
     STRLEN chsize = 0;   /* size of data to be written */
     STRLEN readsize = 1; /* size of data just read */
-    bool isuni= flags & PERL_PV_ESCAPE_UNI ? 1 : 0; /* is this UTF-8 */
+    bool isuni= (flags & PERL_PV_ESCAPE_UNI)
+                ? TRUE : FALSE; /* is this UTF-8 */
     const char *pv  = str;
     const char * const end = pv + count; /* end of string */
+    const char *restart = NULL;
+    STRLEN extra_len = 0;
+    STRLEN tail = 0;
+    if ((flags & PERL_PV_ESCAPE_TRUNC_MIDDLE) && max > 3) {
+        if (flags & PERL_PV_ESCAPE_QUOTE) {
+            qs = qe = "\"";
+            extra_len = 5;
+        } else if (flags & PERL_PV_PRETTY_LTGT) {
+            qs = "<";
+            qe = ">";
+            extra_len = 5;
+        } else {
+            qs = qe = "";
+            extra_len = 3;
+        }
+        tail = max / 2;
+        restart = isuni ? (char *)utf8_hop_back((U8*)end,-tail,(U8*)pv) : end - tail;
+        if (restart > pv) {
+            max -= tail;
+        } else {
+            tail = 0;
+            restart = NULL;
+        }
+    }
+
     octbuf[0] = esc;
 
     PERL_ARGS_ASSERT_PV_ESCAPE;
@@ -192,9 +225,10 @@ Perl_pv_escape( pTHX_ SV *dsv, char const * const str,
     if ((flags & PERL_PV_ESCAPE_UNI_DETECT) && is_utf8_string((U8*)pv, count))
         isuni = 1;
     
-    for ( ; (pv < end && (!max || (wrote < max))) ; pv += readsize ) {
+    for ( ; pv < end ; pv += readsize ) {
         const UV u= (isuni) ? utf8_to_uvchr_buf((U8*)pv, (U8*) end, &readsize) : (U8)*pv;
         const U8 c = (U8)u;
+        const char *source_buf = octbuf;
         
         if ( ( u > 255 )
           || (flags & PERL_PV_ESCAPE_ALL)
@@ -204,6 +238,11 @@ Perl_pv_escape( pTHX_ SV *dsv, char const * const str,
                 chsize = my_snprintf( octbuf, PV_ESCAPE_OCTBUFSIZE, 
                                       "%" UVxf, u);
             else
+            if ((flags & PERL_PV_ESCAPE_NON_WC) && isWORDCHAR_uvchr(u)) {
+                chsize = readsize;
+                source_buf = pv;
+            }
+            else
                 chsize = my_snprintf( octbuf, PV_ESCAPE_OCTBUFSIZE, 
                                       ((flags & PERL_PV_ESCAPE_DWIM) && !isuni)
                                       ? ( use_uc_hex ? ("%c" PV_BYTE_HEX_UC) : ("%c" PV_BYTE_HEX_LC) )
@@ -251,11 +290,22 @@ Perl_pv_escape( pTHX_ SV *dsv, char const * const str,
                 chsize = 1;
             }
         }
-        if ( max && (wrote + chsize > max) ) {
-            break;
+        if (max && (wrote + chsize > max)) {
+            if (restart) {
+                /* this only happens with PERL_PV_ESCAPE_TRUNC_MIDDLE */
+                if (dsv)
+                    Perl_sv_catpvf( aTHX_ dsv,"%s...%s", qe, qs);
+                wrote += extra_len;
+                pv = restart;
+                max = tail;
+                wrote = tail = 0;
+                restart = NULL;
+            } else {
+                break;
+            }
         } else if (chsize > 1) {
             if (dsv)
-                sv_catpvn(dsv, octbuf, chsize);
+                sv_catpvn(dsv, source_buf, chsize);
             wrote += chsize;
         } else {
             /* If PERL_PV_ESCAPE_NOBACKSLASH is set then non-ASCII bytes
diff --git a/gv.c b/gv.c
index 9db0ada4e6..c6bb258656 100644
--- a/gv.c
+++ b/gv.c
@@ -1245,8 +1245,8 @@ Perl_gv_fetchmethod_pvn_flags(pTHX_ HV *stash, const char *name, const STRLEN le
                         return gv;
                 }
                 Perl_croak(aTHX_
-                           "Can't locate object method \"%" UTF8f
-                           "\" via package \"%" HEKf "\"",
+                           "Can't locate object method %" UTF8f_QUOTEDPREFIX ""
+                           " via package %" HEKf_QUOTEDPREFIX,
                                     UTF8fARG(is_utf8, name_end - name, name),
                                     HEKfARG(HvNAME_HEK(stash)));
             }
@@ -1261,9 +1261,9 @@ Perl_gv_fetchmethod_pvn_flags(pTHX_ HV *stash, const char *name, const STRLEN le
                 }
 
                 Perl_croak(aTHX_
-                           "Can't locate object method \"%" UTF8f
-                           "\" via package \"%" SVf "\""
-                           " (perhaps you forgot to load \"%" SVf "\"?)",
+                           "Can't locate object method %" UTF8f_QUOTEDPREFIX ""
+                           " via package %" SVf_QUOTEDPREFIX ""
+                           " (perhaps you forgot to load %" SVf_QUOTEDPREFIX "?)",
                            UTF8fARG(is_utf8, name_end - name, name),
                            SVfARG(packnamesv), SVfARG(packnamesv));
             }
diff --git a/perl.h b/perl.h
index e4551440c6..b61a8417ad 100644
--- a/perl.h
+++ b/perl.h
@@ -4152,30 +4152,59 @@ out there, Solaris being the most prominent.
 
 #define SVfARG(p) ((void*)(p))
 
+/* Render an SV as a quoted and escaped string suitable for an error message.
+ * Only shows the first PERL_QUOTEDPREFIX_LEN characters, and adds ellipses if the
+ * string is too long.
+ */
+#ifndef PERL_QUOTEDPREFIX_LEN
+# define PERL_QUOTEDPREFIX_LEN 256
+#endif
+#ifndef SVf_QUOTEDPREFIX
+#  define SVf_QUOTEDPREFIX "5p"
+#endif
+
+/* like %s but runs through the quoted prefix logic */
+#ifndef PVf_QUOTEDPREFIX
+#  define PVf_QUOTEDPREFIX "1p"
+#endif
+
 #ifndef HEKf
 #  define HEKf "2p"
 #endif
 
+#ifndef HEKf_QUOTEDPREFIX
+#  define HEKf_QUOTEDPREFIX "7p"
+#endif
+
 /* Not ideal, but we cannot easily include a number in an already-numeric
  * format sequence. */
 #ifndef HEKf256
 #  define HEKf256 "3p"
 #endif
 
+#ifndef HEKf256_QUOTEDPREFIX
+#  define HEKf256_QUOTEDPREFIX "8p"
+#endif
+
 #define HEKfARG(p) ((void*)(p))
 
 /* Documented in perlguts
  *
- * %4p is a custom format
+ * %4p and %9p are custom formats for handling UTF8 parameters.
+ * They only occur when prefixed by specific other formats.
  */
 #ifndef UTF8f
 #  define UTF8f "d%" UVuf "%4p"
 #endif
+#ifndef UTF8f_QUOTEDPREFIX
+#  define UTF8f_QUOTEDPREFIX "d%" UVuf "%9p"
+#endif
 #define UTF8fARG(u,l,p) (int)cBOOL(u), (UV)(l), (void*)(p)
 
 #define PNf UTF8f
 #define PNfARG(pn) (int)1, (UV)PadnameLEN(pn), (void *)PadnamePV(pn)
 
+
 #ifdef PERL_CORE
 /* not used; but needed for backward compatibility with XS code? - RMB
 =for apidoc_section $io_formats
@@ -8121,7 +8150,7 @@ Allows one ending \0
 #define PERL_PV_ESCAPE_NONASCII     0x000400
 #define PERL_PV_ESCAPE_FIRSTCHAR    0x000800
 
-#define PERL_PV_ESCAPE_ALL            0x001000
+#define PERL_PV_ESCAPE_ALL          0x001000
 #define PERL_PV_ESCAPE_NOBACKSLASH  0x002000
 #define PERL_PV_ESCAPE_NOCLEAR      0x004000
 #define PERL_PV_PRETTY_NOCLEAR      PERL_PV_ESCAPE_NOCLEAR
@@ -8133,6 +8162,18 @@ Allows one ending \0
 /* Escape PV with all hex, including NUL. */
 #define PERL_PV_ESCAPE_DWIM_ALL_HEX 0x020000
 
+/* Do not escape word characters, alters meaning of other flags */
+#define PERL_PV_ESCAPE_NON_WC       0x040000
+#define PERL_PV_ESCAPE_TRUNC_MIDDLE 0x080000
+
+#define PERL_PV_PRETTY_QUOTEDPREFIX (   \
+        PERL_PV_PRETTY_ELLIPSES |       \
+        PERL_PV_PRETTY_QUOTE    |       \
+        PERL_PV_ESCAPE_NONASCII |       \
+        PERL_PV_ESCAPE_NON_WC   |       \
+        PERL_PV_ESCAPE_TRUNC_MIDDLE |   \
+        0)
+
 
 /* used by pv_display in dump.c*/
 #define PERL_PV_PRETTY_DUMP  PERL_PV_PRETTY_ELLIPSES|PERL_PV_PRETTY_QUOTE
diff --git a/pod/perlguts.pod b/pod/perlguts.pod
index 4205003e80..9d9a74b986 100644
--- a/pod/perlguts.pod
+++ b/pod/perlguts.pod
@@ -3021,6 +3021,16 @@ You can use this to concatenate two scalars:
  SV *var3 = newSVpvf("var1=%" SVf " and var2=%" SVf,
                      SVfARG(var1), SVfARG(var2));
 
+=for apidoc Amnh||SVf_QUOTEDPREFIX
+
+C<SVf_QUOTEDPREFIX> is similar to C<SVf> except that it restricts the
+number of the characters printed, showing at most the first
+C<PERL_QUOTEDPREFIX_LEN> characters of the argument, and rendering it with
+double quotes and with the contents escaped using double quoted string
+escaping rules. If the string is longer than this then ellipses "..."
+will be appended after the trailing quote. This is intended for error
+messages where the string is assumed to be a class name.
+
 =head2 Formatted Printing of Strings
 
 If you just want the bytes printed in a 7bit NUL-terminated string, you can
@@ -3054,6 +3064,11 @@ C<-C> parameter.  (See L<perlrun/-C [numberE<sol>list]>.
 
 =for apidoc_section $io_formats
 =for apidoc Amnh||UTF8f
+Output a possibly UTF8 value. Be sure to use UTF8fARG() to compose
+the arguments for this format.
+=for apidoc Amnh||UTF8f_QUOTEDPREFIX
+Same as C<UTF8f> but the output is quoted, escaped and length limited.
+See C<SVf_QUOTEDPREFIX> for more details on escaping.
 =for apidoc Amh||UTF8fARG|bool is_utf8|Size_t byte_len|char *str
 
 =cut
diff --git a/pp_sys.c b/pp_sys.c
index 48ad17df80..6ae5cd5aa7 100644
--- a/pp_sys.c
+++ b/pp_sys.c
@@ -932,7 +932,8 @@ PP(pp_tie)
        stash = gv_stashsv(*MARK, 0);
        if (!stash) {
            if (SvROK(*MARK))
-               DIE(aTHX_ "Can't locate object method \"%s\" via package \"%" SVf "\"",
+               DIE(aTHX_ "Can't locate object method %" PVf_QUOTEDPREFIX
+                         " via package %" SVf_QUOTEDPREFIX,
                    methname, SVfARG(*MARK));
            else if (isGV(*MARK)) {
                /* If the glob doesn't name an existing package, using
@@ -940,15 +941,17 @@ PP(pp_tie)
                 * generate the name for the error message explicitly. */
                SV *stashname = sv_newmortal();
                gv_fullname4(stashname, (GV *) *MARK, NULL, FALSE);
-               DIE(aTHX_ "Can't locate object method \"%s\" via package \"%" SVf "\"",
+               DIE(aTHX_ "Can't locate object method %" PVf_QUOTEDPREFIX
+                         " via package %" SVf_QUOTEDPREFIX,
                    methname, SVfARG(stashname));
            }
            else {
                SV *stashname = !SvPOK(*MARK) ? &PL_sv_no
                              : SvCUR(*MARK)  ? *MARK
                              :                 newSVpvs_flags("main", SVs_TEMP);
-               DIE(aTHX_ "Can't locate object method \"%s\" via package \"%" SVf "\""
-                   " (perhaps you forgot to load \"%" SVf "\"?)",
+               DIE(aTHX_ "Can't locate object method %" PVf_QUOTEDPREFIX
+                         " via package %" SVf_QUOTEDPREFIX
+                   " (perhaps you forgot to load %" SVf_QUOTEDPREFIX "?)",
                    methname, SVfARG(stashname), SVfARG(stashname));
            }
        }
@@ -957,7 +960,8 @@ PP(pp_tie)
             * been deleted from the symbol table, which this one can't
             * be, since we just looked it up by name.
             */
-           DIE(aTHX_ "Can't locate object method \"%s\" via package \"%" HEKf "\"",
+           DIE(aTHX_ "Can't locate object method %" PVf_QUOTEDPREFIX
+                     " via package %" HEKf_QUOTEDPREFIX ,
                methname, HvENAME_HEK_NN(stash));
        }
         ENTER_with_name("call_TIE");
diff --git a/sv.c b/sv.c
index 3bda95ce7a..e7991350e0 100644
--- a/sv.c
+++ b/sv.c
@@ -12027,6 +12027,7 @@ Perl_sv_vcatpvfn_flags(pTHX_ SV *const sv, const char *const pat, const STRLEN p
 #endif
     /* we never change this unless USE_LOCALE_NUMERIC */
     bool in_lc_numeric = FALSE;
+    SV *tmp_sv = NULL;
 
     PERL_ARGS_ASSERT_SV_VCATPVFN_FLAGS;
     PERL_UNUSED_ARG(maybe_tainted);
@@ -12132,6 +12133,8 @@ Perl_sv_vcatpvfn_flags(pTHX_ SV *const sv, const char *const pat, const STRLEN p
 
         char c;                       /* the actual format ('d', s' etc) */
 
+        bool escape_it   = FALSE;     /* if this is a string should we quote and escape it? */
+
 
         /* echo everything up to the next format specification */
         for (q = fmtstart; q < patend && *q != '%'; ++q)
@@ -12505,6 +12508,21 @@ Perl_sv_vcatpvfn_flags(pTHX_ SV *const sv, const char *const pat, const STRLEN p
             }
 
         string:
+            if (escape_it) {
+                U32 flags = PERL_PV_PRETTY_QUOTEDPREFIX;
+                if (is_utf8)
+                    flags |= PERL_PV_ESCAPE_UNI;
+
+                if (!tmp_sv) {
+                    /* "blah"... where blah might be made up
+                     * of characters like \x{1234} */
+                    tmp_sv = newSV(1 + (PERL_QUOTEDPREFIX_LEN * 8) + 1 + 3);
+                    sv_2mortal(tmp_sv);
+                }
+                pv_pretty(tmp_sv, eptr, elen, PERL_QUOTEDPREFIX_LEN,
+                            NULL, NULL, flags);
+                eptr = SvPV_const(tmp_sv, elen);
+            }
             if (has_precis && precis < elen)
                 elen = precis;
             break;
@@ -12513,7 +12531,34 @@ Perl_sv_vcatpvfn_flags(pTHX_ SV *const sv, const char *const pat, const STRLEN p
 
         case 'p':
 
-            /* %p extensions:
+            /* BEGIN NOTE
+             *
+             * We want to extend the C level sprintf format API with
+             * custom formats for specific types (eg SV*) and behavior.
+             * However some C compilers are "sprintf aware" and will
+             * throw compile time exceptions when an illegal sprintf is
+             * encountered, so we can't just add new format letters.
+             *
+             * However it turns out the length argument to the %p format
+             * is more or less useless (the size of a pointer does not
+             * change over time) and is not really used in the C level
+             * code. Accordingly we can map our special behavior to
+             * specific "length" options to the %p format. We hide these
+             * mappings behind defines anyway, so nobody needs to know
+             * that HEKf is actually %2p. This keeps the C compiler
+             * happy while allowing us to add new formats.
+             *
+             * Note the existing logic for which number is used for what
+             * is torturous. All negative values are used for SVf, and
+             * non-negative values have arbitrary meanings with no
+             * structure to them. This may change in the future.
+             *
+             * NEVER use the raw %p values directly. Always use the define
+             * as the underlying mapping may change in the future.
+             *
+             * END NOTE
+             *
+             * %p extensions:
              *
              * "%...p" is normally treated like "%...x", except that the
              * number to print is the SV's address (or a pointer address
@@ -12523,23 +12568,42 @@ Perl_sv_vcatpvfn_flags(pTHX_ SV *const sv, const char *const pat, const STRLEN p
              * extensions. These are currently:
              *
              * %-p       (SVf)  Like %s, but gets the string from an SV*
-             *                  arg rather than a char* arg.
+             *                  arg rather than a char* arg. Use C<SVfARG()>
+             *                  to set up the argument properly.
              *                  (This was previously %_).
              *
-             * %-<num>p         Ditto but like %.<num>s (i.e. num is max width)
+             * %-<num>p         Ditto but like %.<num>s (i.e. num is max
+             *                  width), there is no escaped and quoted version
+             *                  of this.
+             *
+             * %1p       (PVf_QUOTEDPREFIX). Like raw %s, but it is escaped
+             *                  and quoted.
+             *
+             * %5p       (SVf_QUOTEDPREFIX) Like SVf, but length restricted,
+             *                  escaped and quoted with pv_pretty. Intended
+             *                  for error messages.
              *
              * %2p       (HEKf) Like %s, but using the key string in a HEK
+             * %7p       (HEKf_QUOTEDPREFIX) ... but escaped and quoted.
              *
              * %3p       (HEKf256) Ditto but like %.256s
+             * %8p       (HEKf256_QUOTEDPREFIX) ... but escaped and quoted
              *
              * %d%lu%4p  (UTF8f) A utf8 string. Consumes 3 args:
              *                       (cBOOL(utf8), len, string_buf).
              *                   It's handled by the "case 'd'" branch
              *                   rather than here.
+             * %d%lu%9p  (UTF8f_QUOTEDPREFIX) .. but escaped and quoted.
+             *
              *
-             * %<num>p   where num is 1 or > 4: reserved for future
+             * %<num>p   where num is > 9: reserved for future
              *           extensions. Warns, but then is treated as a
              *           general %p (print hex address) format.
+             *
+             * NOTE: If you add a new magic %p value you will
+             * need to update F<t/porting/diag.t> to be aware of it
+             * on top of adding the various defines and etc. Do not
+             * forget to add it to F<pod/perlguts.pod> as well.
              */
 
             if (   args
@@ -12551,10 +12615,12 @@ Perl_sv_vcatpvfn_flags(pTHX_ SV *const sv, const char *const pat, const STRLEN p
                 && q[-2] != '*'
                 && q[-2] != '$'
             ) {
-                if (left) {			/* %-p (SVf), %-NNNp */
-                    if (width) {
+                if (left || width == 5) {                /* %-p (SVf), %-NNNp, %5p */
+                    if (left && width) {
                         precis = width;
                         has_precis = TRUE;
+                    } else if (width == 5) {
+                        escape_it = TRUE;
                     }
                     argsv = MUTABLE_SV(va_arg(*args, void*));
                     eptr = SvPV_const(argsv, elen);
@@ -12563,7 +12629,9 @@ Perl_sv_vcatpvfn_flags(pTHX_ SV *const sv, const char *const pat, const STRLEN p
                     width = 0;
                     goto string;
                 }
-                else if (width == 2 || width == 3) {	/* HEKf, HEKf256 */
+                else if (width == 2 || width == 3 ||
+                         width == 7 || width == 8)
+                {        /* HEKf, HEKf256, HEKf_QUOTEDPREFIX, HEKf256_QUOTEDPREFIX */
                     HEK * const hek = va_arg(*args, HEK *);
                     eptr = HEK_KEY(hek);
                     elen = HEK_LEN(hek);
@@ -12573,10 +12641,20 @@ Perl_sv_vcatpvfn_flags(pTHX_ SV *const sv, const char *const pat, const STRLEN p
                         precis = 256;
                         has_precis = TRUE;
                     }
+                    if (width > 5)
+                        escape_it = TRUE;
+                    width = 0;
+                    goto string;
+                }
+                else if (width == 1) {
+                    eptr = va_arg(*args,char *);
+                    elen = strlen(eptr);
+                    escape_it = TRUE;
                     width = 0;
                     goto string;
                 }
                 else if (width) {
+                    /* note width=4 or width=9 is handled under %d */
                     Perl_ck_warner_d(aTHX_ packWARN(WARN_INTERNAL),
                          "internal %%<num>p might conflict with future printf extensions");
                 }
@@ -12617,7 +12695,8 @@ Perl_sv_vcatpvfn_flags(pTHX_ SV *const sv, const char *const pat, const STRLEN p
         case 'd':
             /* probably just a plain %d, but it might be the start of the
              * special UTF8f format, which usually looks something like
-             * "%d%lu%4p" (the lu may vary by platform)
+             * "%d%lu%4p" (the lu may vary by platform) or
+             * "%d%lu%9p" for an escaped version.
              */
             assert((UTF8f)[0] == 'd');
             assert((UTF8f)[1] == '%');
@@ -12626,10 +12705,15 @@ Perl_sv_vcatpvfn_flags(pTHX_ SV *const sv, const char *const pat, const STRLEN p
                  && q == fmtstart + 1 /* plain %d, not %....d */
                  && patend >= fmtstart + sizeof(UTF8f) - 1 /* long enough */
                  && *q == '%'
-                 && strnEQ(q + 1, (UTF8f) + 2, sizeof(UTF8f) - 3))
+                 && strnEQ(q + 1, (UTF8f) + 2, sizeof(UTF8f) - 5)
+                 && q[sizeof(UTF8f)-3] == 'p'
+                 && (q[sizeof(UTF8f)-4] == '4' ||
+                     q[sizeof(UTF8f)-4] == '9'))
             {
                 /* The argument has already gone through cBOOL, so the cast
                    is safe. */
+                if (q[sizeof(UTF8f)-4] == '9')
+                    escape_it = TRUE;
                 is_utf8 = (bool)va_arg(*args, int);
                 elen = va_arg(*args, UV);
                 /* if utf8 length is larger than 0x7ffff..., then it might
diff --git a/t/op/method.t b/t/op/method.t
index d0fc321804..776a0e4ffb 100644
--- a/t/op/method.t
+++ b/t/op/method.t
@@ -428,7 +428,7 @@ is $kalled, 1, 'calling a class method via a magic variable';
     eval {
         NulTest->${ \"method\0Whoops" };
     };
-    like $@, qr/Can't locate object method "method\0Whoops" via package "NulTest" at/,
+    like $@, qr/Can't locate object method "method\\0Whoops" via package "NulTest" at/,
             "method lookup is nul-clean";
 
     *NulTest::AUTOLOAD = sub { our $AUTOLOAD; return $AUTOLOAD };
diff --git a/t/porting/diag.t b/t/porting/diag.t
index 4866e610c3..141d456970 100644
--- a/t/porting/diag.t
+++ b/t/porting/diag.t
@@ -182,7 +182,9 @@ foreach my $cur_entry ( keys %entries) {
 # List from perlguts.pod "Formatted Printing of IVs, UVs, and NVs"
 # Convert from internal formats to ones that the readers will be familiar
 # with, while removing any format modifiers, such as precision, the
-# presence of which would just confuse the pod's explanation
+# presence of which would just confuse the pod's explanation.
+# Note that the 'S' formats get converted into \"%s\" as they inject
+# double quotes.
 my %specialformats = (IVdf => 'd',
 		      UVuf => 'd',
 		      UVof => 'o',
@@ -192,12 +194,18 @@ my %specialformats = (IVdf => 'd',
 		      NVff => 'f',
 		      NVgf => 'f',
 		      HEKf256=>'s',
+		      HEKf256_QUOTEDPREFIX => 'S',
 		      HEKf => 's',
+		      HEKf_QUOTEDPREFIX => 'S',
 		      UTF8f=> 's',
+		      UTF8f_QUOTEDPREFIX => 'S',
 		      SVf256=>'s',
 		      SVf32=> 's',
 		      SVf  => 's',
+		      SVf_QUOTEDPREFIX  => 'S',
+                      PVf_QUOTEDPREFIX  => 'S',
 		      PNf  => 's');
+
 my $format_modifiers = qr/ [#0\ +-]*              # optional flags
 			  (?: [1-9][0-9]* | \* )? # optional field width
 			  (?: \. \d* )?           # optional precision
@@ -205,8 +213,8 @@ my $format_modifiers = qr/ [#0\ +-]*              # optional flags
 			/x;
 
 my $specialformats =
- join '|', sort { length $b cmp length $a } keys %specialformats;
-my $specialformats_re = qr/%$format_modifiers"\s*($specialformats)(\s*")?/;
+ join '|', sort { length($b) <=> length($a) || $a cmp $b } keys %specialformats;
+my $specialformats_re = qr/%$format_modifiers"\s*($specialformats)(\s*(?:"|\z))?/;
 
 # We skip the bodies of most XS functions, but not within these files
 my @include_xs_files = (
@@ -321,6 +329,7 @@ sub check_file {
     # in later lines.
 
     s/$specialformats_re/"%$specialformats{$1}" .  (defined $2 ? '' : '"')/ge;
+    s/\%S/\\"%s\\"/g; # convert an %S into a quoted %s.
 
     # Remove any remaining format modifiers, but not in %%
     s/ (?<!%) % $format_modifiers ( [dioxXucsfeEgGp] ) /%$1/xg;
diff --git a/t/uni/method.t b/t/uni/method.t
index 7fb7f18f48..36e37f6bbf 100644
--- a/t/uni/method.t
+++ b/t/uni/method.t
@@ -113,6 +113,7 @@ is( ref Føø::Bær->new, 'Føø::Bær');
 my $new_ascii = "new";
 my $new_latin = "nèw";
 my $e_with_grave = byte_utf8a_to_utf8n("\303\250");
+my $e_with_grave_escaped= $e_with_grave=~s/\x{a8}/\\\\x\\{a8\\}/r;
 my $new_utf8  = "n${e_with_grave}w";
 my $newoct    = "n${e_with_grave}w";
 utf8::decode($new_utf8);
@@ -123,7 +124,8 @@ like( Føø::Bær->$new_utf8, qr/Føø::Bær=HASH/u, "Can access \$new_utf8, [$n
 {
     local $@;
     eval { Føø::Bær->$newoct };
-    like($@, qr/Can't locate object method "n${e_with_grave}w" via package "Føø::Bær"/u, "Can't access [$newoct], stored in a scalar, as a method through a UTF-8 package." );
+    like($@, qr/Can't locate object method "n${e_with_grave_escaped}w" via package "Føø::Bær"/u,
+        "Can't access [$newoct], stored in a scalar, as a method through a UTF-8 package." );
 }
 
 
@@ -139,8 +141,10 @@ like( $pkg_latin_1->$new_latin, qr/Føø::Bær=HASH/u, "Can access \$new_latin,
 like( $pkg_latin_1->$new_utf8, qr/Føø::Bær=HASH/u, "Can access \$new_utf8, [$new_utf8], stored in a scalar, as a method, when the UTF-8 package name is also in a scalar." );
 {
     local $@;
+
     eval { $pkg_latin_1->$newoct };
-    like($@, qr/Can't locate object method "n${e_with_grave}w" via package "Føø::Bær"/u, "Can't access [$newoct], stored in a scalar, as a method, when the UTF-8 package name is also in a scalar.");
+    like($@, qr/Can't locate object method "n${e_with_grave_escaped}w" via package "Føø::Bær"/u,
+        "Can't access [$newoct], stored in a scalar, as a method, when the UTF-8 package name is also in a scalar.");
 }
 
 ok !!Føø::Bær->can($new_ascii), "->can works for [$new_ascii]";
author	Yves Orton <demerphq@gmail.com>	2022-08-01 16:21:12 +0200
committer	Yves Orton <demerphq@gmail.com>	2022-08-25 14:46:33 +0200
commit	33ef5d2cdf69f4a7b22ff4c220758689b59b28a8 (patch)
tree	82f059df72c6aac72eff8ff761051f69debba20e
parent	697eaf802a042beb1c1c6f1983a08a147f12eb72 (diff)
download	perl-33ef5d2cdf69f4a7b22ff4c220758689b59b28a8.tar.gz