diff options
author | Gurusamy Sarathy <gsar@cpan.org> | 2000-03-14 05:49:08 +0000 |
---|---|---|
committer | Gurusamy Sarathy <gsar@cpan.org> | 2000-03-14 05:49:08 +0000 |
commit | 055fd3a96a4b067d75446c3d47ffc318e9acc40d (patch) | |
tree | b6449a19782d8aa2703033c9338c80210f4189eb /pod/perldebguts.pod | |
parent | e3e876cf806e9a3bb353ac41418f1f80df999716 (diff) | |
download | perl-055fd3a96a4b067d75446c3d47ffc318e9acc40d.tar.gz |
patches for many bugs in the debugger; documentation updates for
perldelta; split perldebug.pod into perldeb{ug,guts}.pod (from
Tom Christiansen)
p4raw-id: //depot/perl@5723
Diffstat (limited to 'pod/perldebguts.pod')
-rw-r--r-- | pod/perldebguts.pod | 923 |
1 files changed, 923 insertions, 0 deletions
diff --git a/pod/perldebguts.pod b/pod/perldebguts.pod new file mode 100644 index 0000000000..b74f3efb6b --- /dev/null +++ b/pod/perldebguts.pod @@ -0,0 +1,923 @@ +=head1 NAME + +perldebguts - Guts of Perl debugging + +=head1 DESCRIPTION + +This is not the perldebug(1) manpage, which tells you how to use +the debugger. This manpage describes low-level details ranging +between difficult and impossible for anyone who isn't incredibly +intimate with Perl's guts to understand. Caveat lector. + +=head1 Debugger Internals + +Perl has special debugging hooks at compile-time and run-time used +to create debugging environments. These hooks are not to be confused +with the I<perl -Dxxx> command described in L<perlrun>, which are +usable only if a special Perl built per the instructions the +F<INSTALL> podpage in the Perl source tree. + +For example, whenever you call Perl's built-in C<caller> function +from the package DB, the arguments that the corresponding stack +frame was called with are copied to the the @DB::args array. The +general mechanisms is enabled by calling Perl with the B<-d> switch, the +following additional features are enabled (cf. L<perlvar/$^P>): + +=over + +=item * + +Perl inserts the contents of C<$ENV{PERL5DB}> (or C<BEGIN {require +'perl5db.pl'}> if not present) before the first line of your program. + +=item * + +The array C<@{"_<$filename"}> holds the lines of $filename for all +files compiled by Perl. The same for C<eval>ed strings that contain +subroutines, or which are currently being executed. The $filename +for C<eval>ed strings looks like C<(eval 34)>. Code assertions +in regexes look like C<(re_eval 19)>. + +=item * + +The hash C<%{"_<$filename"}> contains breakpoints and actions keyed +by line number. Individual entries (as opposed to the whole hash) +are settable. Perl only cares about Boolean true here, although +the values used by F<perl5db.pl> have the form +C<"$break_condition\0$action">. Values in this hash are magical +in numeric context: they are zeros if the line is not breakable. + +The same holds for evaluated strings that contain subroutines, or +which are currently being executed. The $filename for C<eval>ed strings +looks like C<(eval 34)> or C<(re_eval 19)>. + +=item * + +The scalar C<${"_<$filename"}> contains C<"_<$filename">. This is +also the case for evaluated strings that contain subroutines, or +which are currently being executed. The $filename for C<eval>ed +strings looks like C<(eval 34)> or C<(re_eval 19)>. + +=item * + +After each C<require>d file is compiled, but before it is executed, +C<DB::postponed(*{"_<$filename"})> is called if the subroutine +C<DB::postponed> exists. Here, the $filename is the expanded name of +the C<require>d file, as found in the values of %INC. + +=item * + +After each subroutine C<subname> is compiled, the existence of +C<$DB::postponed{subname}> is checked. If this key exists, +C<DB::postponed(subname)> is called if the C<DB::postponed> subroutine +also exists. + +=item * + +A hash C<%DB::sub> is maintained, whose keys are subroutine names +and whose values have the form C<filename:startline-endline>. +C<filename> has the form C<(eval 34)> for subroutines defined inside +C<eval>s, or C<(re_eval 19)> for those within regex code assertions. + +=item * + +When the execution of your program reaches a point that can hold a +breakpoint, the C<DB::DB()> subroutine is called any of the variables +$DB::trace, $DB::single, or $DB::signal is true. These variables +are not C<local>izable. This feature is disabled when executing +inside C<DB::DB()>, including functions called from it +unless C<< $^D & (1<<30) >> is true. + +=item * + +When execution of the program reaches a subroutine call, a call to +C<&DB::sub>(I<args>) is made instead, with C<$DB::sub> holding the +name of the called subroutine. This doesn't happen if the subroutine +was compiled in the C<DB> package.) + +=back + +Note that if C<&DB::sub> needs external data for it to work, no +subroutine call is possible until this is done. For the standard +debugger, the C<$DB::deep> variable (how many levels of recursion +deep into the debugger you can go before a mandatory break) gives +an example of such a dependency. + +=head2 Writing Your Own Debugger + +The minimal working debugger consists of one line + + sub DB::DB {} + +which is quite handy as contents of C<PERL5DB> environment +variable: + + $ PERL5DB="sub DB::DB {}" perl -d your-script + +Another brief debugger, slightly more useful, could be created +with only the line: + + sub DB::DB {print ++$i; scalar <STDIN>} + +This debugger would print the sequential number of encountered +statement, and would wait for you to hit a newline before continuing. + +The following debugger is quite functional: + + { + package DB; + sub DB {} + sub sub {print ++$i, " $sub\n"; &$sub} + } + +It prints the sequential number of subroutine call and the name of the +called subroutine. Note that C<&DB::sub> should be compiled into the +package C<DB>. + +At the start, the debugger reads your rc file (F<./.perldb> or +F<~/.perldb> under Unix), which can set important options. This file may +define a subroutine C<&afterinit> to be executed after the debugger is +initialized. + +After the rc file is read, the debugger reads the PERLDB_OPTS +environment variable and parses this as the remainder of a C<O ...> +line as one might enter at the debugger prompt. + +The debugger also maintains magical internal variables, such as +C<@DB::dbline>, C<%DB::dbline>, which are aliases for +C<@{"::_<current_file"}> C<%{"::_<current_file"}>. Here C<current_file> +is the currently selected file, either explicitly chosen with the +debugger's C<f> command, or implicitly by flow of execution. + +Some functions are provided to simplify customization. See +L<perldebug/"Options"> for description of options parsed by +C<DB::parse_options(string)>. The function C<DB::dump_trace(skip[, +count])> skips the specified number of frames and returns a list +containing information about the calling frames (all of them, if +C<count> is missing). Each entry is reference to a a hash with +keys C<context> (either C<.>, C<$>, or C<@>), C<sub> (subroutine +name, or info about C<eval>), C<args> (C<undef> or a reference to +an array), C<file>, and C<line>. + +The function C<DB::print_trace(FH, skip[, count[, short]])> prints +formatted info about caller frames. The last two functions may be +convenient as arguments to C<< < >>, C<< << >> commands. + +Note that any variables and functions that are not documented in +this manpages (or in L<perldebug>) are considered for internal +use only, and as such are subject to change without notice. + +=head1 Frame Listing Output Examples + +The C<frame> option can be used to control the output of frame +information. For example, contrast this expression trace: + + $ perl -de 42 + Stack dump during die enabled outside of evals. + + Loading DB routines from perl5db.pl patch level 0.94 + Emacs support available. + + Enter h or `h h' for help. + + main::(-e:1): 0 + DB<1> sub foo { 14 } + + DB<2> sub bar { 3 } + + DB<3> t print foo() * bar() + main::((eval 172):3): print foo() + bar(); + main::foo((eval 168):2): + main::bar((eval 170):2): + 42 + +with this one, once the C<O>ption C<frame=2> has been set: + + DB<4> O f=2 + frame = '2' + DB<5> t print foo() * bar() + 3: foo() * bar() + entering main::foo + 2: sub foo { 14 }; + exited main::foo + entering main::bar + 2: sub bar { 3 }; + exited main::bar + 42 + +By way of demonstration, we present below a laborious listing +resulting from setting your C<PERLDB_OPTS> environment variable to +the value C<f=n N>, and running I<perl -d -V> from the command line. +Examples use various values of C<n> are shown to give you a feel +for the difference between settings. Long those it may be, this +is not a complete listing, but only excerpts. + +=over 4 + +=item 1 + + entering main::BEGIN + entering Config::BEGIN + Package lib/Exporter.pm. + Package lib/Carp.pm. + Package lib/Config.pm. + entering Config::TIEHASH + entering Exporter::import + entering Exporter::export + entering Config::myconfig + entering Config::FETCH + entering Config::FETCH + entering Config::FETCH + entering Config::FETCH + +=item 2 + + entering main::BEGIN + entering Config::BEGIN + Package lib/Exporter.pm. + Package lib/Carp.pm. + exited Config::BEGIN + Package lib/Config.pm. + entering Config::TIEHASH + exited Config::TIEHASH + entering Exporter::import + entering Exporter::export + exited Exporter::export + exited Exporter::import + exited main::BEGIN + entering Config::myconfig + entering Config::FETCH + exited Config::FETCH + entering Config::FETCH + exited Config::FETCH + entering Config::FETCH + +=item 4 + + in $=main::BEGIN() from /dev/null:0 + in $=Config::BEGIN() from lib/Config.pm:2 + Package lib/Exporter.pm. + Package lib/Carp.pm. + Package lib/Config.pm. + in $=Config::TIEHASH('Config') from lib/Config.pm:644 + in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 + in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from li + in @=Config::myconfig() from /dev/null:0 + in $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574 + in $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574 + in $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574 + in $=Config::FETCH(ref(Config), 'PERL_SUBVERSION') from lib/Config.pm:574 + in $=Config::FETCH(ref(Config), 'osname') from lib/Config.pm:574 + in $=Config::FETCH(ref(Config), 'osvers') from lib/Config.pm:574 + +=item 6 + + in $=main::BEGIN() from /dev/null:0 + in $=Config::BEGIN() from lib/Config.pm:2 + Package lib/Exporter.pm. + Package lib/Carp.pm. + out $=Config::BEGIN() from lib/Config.pm:0 + Package lib/Config.pm. + in $=Config::TIEHASH('Config') from lib/Config.pm:644 + out $=Config::TIEHASH('Config') from lib/Config.pm:644 + in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 + in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/ + out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/ + out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 + out $=main::BEGIN() from /dev/null:0 + in @=Config::myconfig() from /dev/null:0 + in $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574 + out $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574 + in $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574 + out $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574 + in $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574 + out $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574 + in $=Config::FETCH(ref(Config), 'PERL_SUBVERSION') from lib/Config.pm:574 + +=item 14 + + in $=main::BEGIN() from /dev/null:0 + in $=Config::BEGIN() from lib/Config.pm:2 + Package lib/Exporter.pm. + Package lib/Carp.pm. + out $=Config::BEGIN() from lib/Config.pm:0 + Package lib/Config.pm. + in $=Config::TIEHASH('Config') from lib/Config.pm:644 + out $=Config::TIEHASH('Config') from lib/Config.pm:644 + in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 + in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/E + out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/E + out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 + out $=main::BEGIN() from /dev/null:0 + in @=Config::myconfig() from /dev/null:0 + in $=Config::FETCH('Config=HASH(0x1aa444)', 'package') from lib/Config.pm:574 + out $=Config::FETCH('Config=HASH(0x1aa444)', 'package') from lib/Config.pm:574 + in $=Config::FETCH('Config=HASH(0x1aa444)', 'baserev') from lib/Config.pm:574 + out $=Config::FETCH('Config=HASH(0x1aa444)', 'baserev') from lib/Config.pm:574 + +=item 30 + + in $=CODE(0x15eca4)() from /dev/null:0 + in $=CODE(0x182528)() from lib/Config.pm:2 + Package lib/Exporter.pm. + out $=CODE(0x182528)() from lib/Config.pm:0 + scalar context return from CODE(0x182528): undef + Package lib/Config.pm. + in $=Config::TIEHASH('Config') from lib/Config.pm:628 + out $=Config::TIEHASH('Config') from lib/Config.pm:628 + scalar context return from Config::TIEHASH: empty hash + in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 + in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/Exporter.pm:171 + out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/Exporter.pm:171 + scalar context return from Exporter::export: '' + out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 + scalar context return from Exporter::import: '' + +=back + +In all cases shown above, the line indentation shows the call tree. +If bit 2 of C<frame> is set, a line is printed on exit from a +subroutine as well. If bit 4 is set, the arguments are printed +along with the caller info. If bit 8 is set, the arguments are +printed even if they are tied or references. If bit 16 is set, the +return value is printed, too. + +When a package is compiled, a line like this + + Package lib/Carp.pm. + +is printed with proper indentation. + +=head1 Debugging regular expressions + +There are two ways to enable debugging output for regular expressions. + +If your perl is compiled with C<-DDEBUGGING>, you may use the +B<-Dr> flag on the command line. + +Otherwise, one can C<use re 'debug'>, which has effects at +compile time and run time. It is not lexically scoped. + +=head2 Compile-time output + +The debugging output at compile time looks like this: + + compiling RE `[bc]d(ef*g)+h[ij]k$' + size 43 first at 1 + 1: ANYOF(11) + 11: EXACT <d>(13) + 13: CURLYX {1,32767}(27) + 15: OPEN1(17) + 17: EXACT <e>(19) + 19: STAR(22) + 20: EXACT <f>(0) + 22: EXACT <g>(24) + 24: CLOSE1(26) + 26: WHILEM(0) + 27: NOTHING(28) + 28: EXACT <h>(30) + 30: ANYOF(40) + 40: EXACT <k>(42) + 42: EOL(43) + 43: END(0) + anchored `de' at 1 floating `gh' at 3..2147483647 (checking floating) + stclass `ANYOF' minlen 7 + +The first line shows the pre-compiled form of the regex. The second +shows the size of the compiled form (in arbitrary units, usually +4-byte words) and the label I<id> of the first node that does a +match. + +The last line (split into two lines above) contains optimizer +information. In the example shown, the optimizer found that the match +should contain a substring C<de> at offset 1, plus substring C<gh> +at some offset between 3 and infinity. Moreover, when checking for +these substrings (to abandon impossible matches quickly), Perl will check +for the substring C<gh> before checking for the substring C<de>. The +optimizer may also use the knowledge that the match starts (at the +C<first> I<id>) with a character class, and the match cannot be +shorter than 7 chars. + +The fields of interest which may appear in the last line are + +=over + +=item C<anchored> I<STRING> C<at> I<POS> + +=item C<floating> I<STRING> C<at> I<POS1..POS2> + +See above. + +=item C<matching floating/anchored> + +Which substring to check first. + +=item C<minlen> + +The minimal length of the match. + +=item C<stclass> I<TYPE> + +Type of first matching node. + +=item C<noscan> + +Don't scan for the found substrings. + +=item C<isall> + +Means that the optimizer info is all that the regular +expression contains, and thus one does not need to enter the regex engine at +all. + +=item C<GPOS> + +Set if the pattern contains C<\G>. + +=item C<plus> + +Set if the pattern starts with a repeated char (as in C<x+y>). + +=item C<implicit> + +Set if the pattern starts with C<.*>. + +=item C<with eval> + +Set if the pattern contain eval-groups, such as C<(?{ code })> and +C<(??{ code })>. + +=item C<anchored(TYPE)> + +If the pattern may match only at a handful of places, (with C<TYPE> +being C<BOL>, C<MBOL>, or C<GPOS>. See the table below. + +=back + +If a substring is known to match at end-of-line only, it may be +followed by C<$>, as in C<floating `k'$>. + +The optimizer-specific info is used to avoid entering (a slow) regex +engine on strings that will not definitely match. If C<isall> flag +is set, a call to the regex engine may be avoided even when the optimizer +found an appropriate place for the match. + +The rest of the output contains the list of I<nodes> of the compiled +form of the regex. Each line has format + +C< >I<id>: I<TYPE> I<OPTIONAL-INFO> (I<next-id>) + +=head2 Types of nodes + +Here are the possible types, with short descriptions: + + # TYPE arg-description [num-args] [longjump-len] DESCRIPTION + + # Exit points + END no End of program. + SUCCEED no Return from a subroutine, basically. + + # Anchors: + BOL no Match "" at beginning of line. + MBOL no Same, assuming multiline. + SBOL no Same, assuming singleline. + EOS no Match "" at end of string. + EOL no Match "" at end of line. + MEOL no Same, assuming multiline. + SEOL no Same, assuming singleline. + BOUND no Match "" at any word boundary + BOUNDL no Match "" at any word boundary + NBOUND no Match "" at any word non-boundary + NBOUNDL no Match "" at any word non-boundary + GPOS no Matches where last m//g left off. + + # [Special] alternatives + ANY no Match any one character (except newline). + SANY no Match any one character. + ANYOF sv Match character in (or not in) this class. + ALNUM no Match any alphanumeric character + ALNUML no Match any alphanumeric char in locale + NALNUM no Match any non-alphanumeric character + NALNUML no Match any non-alphanumeric char in locale + SPACE no Match any whitespace character + SPACEL no Match any whitespace char in locale + NSPACE no Match any non-whitespace character + NSPACEL no Match any non-whitespace char in locale + DIGIT no Match any numeric character + NDIGIT no Match any non-numeric character + + # BRANCH The set of branches constituting a single choice are hooked + # together with their "next" pointers, since precedence prevents + # anything being concatenated to any individual branch. The + # "next" pointer of the last BRANCH in a choice points to the + # thing following the whole choice. This is also where the + # final "next" pointer of each individual branch points; each + # branch starts with the operand node of a BRANCH node. + # + BRANCH node Match this alternative, or the next... + + # BACK Normal "next" pointers all implicitly point forward; BACK + # exists to make loop structures possible. + # not used + BACK no Match "", "next" ptr points backward. + + # Literals + EXACT sv Match this string (preceded by length). + EXACTF sv Match this string, folded (prec. by length). + EXACTFL sv Match this string, folded in locale (w/len). + + # Do nothing + NOTHING no Match empty string. + # A variant of above which delimits a group, thus stops optimizations + TAIL no Match empty string. Can jump here from outside. + + # STAR,PLUS '?', and complex '*' and '+', are implemented as circular + # BRANCH structures using BACK. Simple cases (one character + # per match) are implemented with STAR and PLUS for speed + # and to minimize recursive plunges. + # + STAR node Match this (simple) thing 0 or more times. + PLUS node Match this (simple) thing 1 or more times. + + CURLY sv 2 Match this simple thing {n,m} times. + CURLYN no 2 Match next-after-this simple thing + # {n,m} times, set parens. + CURLYM no 2 Match this medium-complex thing {n,m} times. + CURLYX sv 2 Match this complex thing {n,m} times. + + # This terminator creates a loop structure for CURLYX + WHILEM no Do curly processing and see if rest matches. + + # OPEN,CLOSE,GROUPP ...are numbered at compile time. + OPEN num 1 Mark this point in input as start of #n. + CLOSE num 1 Analogous to OPEN. + + REF num 1 Match some already matched string + REFF num 1 Match already matched string, folded + REFFL num 1 Match already matched string, folded in loc. + + # grouping assertions + IFMATCH off 1 2 Succeeds if the following matches. + UNLESSM off 1 2 Fails if the following matches. + SUSPEND off 1 1 "Independent" sub-regex. + IFTHEN off 1 1 Switch, should be preceded by switcher . + GROUPP num 1 Whether the group matched. + + # Support for long regex + LONGJMP off 1 1 Jump far away. + BRANCHJ off 1 1 BRANCH with long offset. + + # The heavy worker + EVAL evl 1 Execute some Perl code. + + # Modifiers + MINMOD no Next operator is not greedy. + LOGICAL no Next opcode should set the flag only. + + # This is not used yet + RENUM off 1 1 Group with independently numbered parens. + + # This is not really a node, but an optimized away piece of a "long" node. + # To simplify debugging output, we mark it as if it were a node + OPTIMIZED off Placeholder for dump. + +=head2 Run-time output + +First of all, when doing a match, one may get no run-time output even +if debugging is enabled. This means that the regex engine was never +entered and that all of the job was therefore done by the optimizer. + +If the regex engine was entered, the output may look like this: + + Matching `[bc]d(ef*g)+h[ij]k$' against `abcdefg__gh__' + Setting an EVAL scope, savestack=3 + 2 <ab> <cdefg__gh_> | 1: ANYOF + 3 <abc> <defg__gh_> | 11: EXACT <d> + 4 <abcd> <efg__gh_> | 13: CURLYX {1,32767} + 4 <abcd> <efg__gh_> | 26: WHILEM + 0 out of 1..32767 cc=effff31c + 4 <abcd> <efg__gh_> | 15: OPEN1 + 4 <abcd> <efg__gh_> | 17: EXACT <e> + 5 <abcde> <fg__gh_> | 19: STAR + EXACT <f> can match 1 times out of 32767... + Setting an EVAL scope, savestack=3 + 6 <bcdef> <g__gh__> | 22: EXACT <g> + 7 <bcdefg> <__gh__> | 24: CLOSE1 + 7 <bcdefg> <__gh__> | 26: WHILEM + 1 out of 1..32767 cc=effff31c + Setting an EVAL scope, savestack=12 + 7 <bcdefg> <__gh__> | 15: OPEN1 + 7 <bcdefg> <__gh__> | 17: EXACT <e> + restoring \1 to 4(4)..7 + failed, try continuation... + 7 <bcdefg> <__gh__> | 27: NOTHING + 7 <bcdefg> <__gh__> | 28: EXACT <h> + failed... + failed... + +The most significant information in the output is about the particular I<node> +of the compiled regex that is currently being tested against the target string. +The format of these lines is + +C< >I<STRING-OFFSET> <I<PRE-STRING>> <I<POST-STRING>> |I<ID>: I<TYPE> + +The I<TYPE> info is indented with respect to the backtracking level. +Other incidental information appears interspersed within. + +=head1 Debugging Perl memory usage + +Perl is a profligate wastrel when it comes to memory use. There +is a saying that to estimate memory usage of Perl, assume a reasonable +algorithm for memory allocation, multiply that estimate by 10, and +while you still may miss the mark, at least you won't be quite so +astonished. This is not absolutely true, but may prvide a good +grasp of what happens. + +Assume that an integer cannot take less than 20 bytes of memory, a +float cannot take less than 24 bytes, a string cannot take less +than 32 bytes (all these examples assume 32-bit architectures, the +result are quite a bit worse on 64-bit architectures). If a variable +is accessed in two of three different ways (which require an integer, +a float, or a string), the memory footprint may increase yet another +20 bytes. A sloppy malloc(3) implementation can make inflate these +numbers dramatically. + +On the opposite end of the scale, a declaration like + + sub foo; + +may take up to 500 bytes of memory, depending on which release of Perl +you're running. + +Anecdotal estimates of source-to-compiled code bloat suggest an +eightfold increase. This means that the compiled form of reasonable +(normally commented, properly indented etc.) code will take +about eight times more space in memory than the code took +on disk. + +There are two Perl-specific ways to analyze memory usage: +$ENV{PERL_DEBUG_MSTATS} and B<-DL> command-line switch. The first +is available only if Perl is compiled with Perl's malloc(); the +second only if Perl was built with C<-DDEBUGGING>. See the +instructions for how to do this in the F<INSTALL> podpage at +the top level of the Perl source tree. + +=head2 Using C<$ENV{PERL_DEBUG_MSTATS}> + +If your perl is using Perl's malloc() and was compiled with the +necessary switches (this is the default), then it will print memory +usage statistics after compiling your code hwen C<< $ENV{PERL_DEBUG_MSTATS} +> 1 >>, and before termination of the program when C<< +$ENV{PERL_DEBUG_MSTATS} >= 1 >>. The report format is similar to +the following example: + + $ PERL_DEBUG_MSTATS=2 perl -e "require Carp" + Memory allocation statistics after compilation: (buckets 4(4)..8188(8192) + 14216 free: 130 117 28 7 9 0 2 2 1 0 0 + 437 61 36 0 5 + 60924 used: 125 137 161 55 7 8 6 16 2 0 1 + 74 109 304 84 20 + Total sbrk(): 77824/21:119. Odd ends: pad+heads+chain+tail: 0+636+0+2048. + Memory allocation statistics after execution: (buckets 4(4)..8188(8192) + 30888 free: 245 78 85 13 6 2 1 3 2 0 1 + 315 162 39 42 11 + 175816 used: 265 176 1112 111 26 22 11 27 2 1 1 + 196 178 1066 798 39 + Total sbrk(): 215040/47:145. Odd ends: pad+heads+chain+tail: 0+2192+0+6144. + +It is possible to ask for such a statistic at arbitrary points in +your execution using the mstats() function out of the standard +Devel::Peek module. + +Here is some explanation of that format: + +=over + +=item C<buckets SMALLEST(APPROX)..GREATEST(APPROX)> + +Perl's malloc() uses bucketed allocations. Every request is rounded +up to the closest bucket size available, and a bucket is taken from +the pool of buckets of that size. + +The line above describes the limits of buckets currently in use. +Each bucket has two sizes: memory footprint and the maximal size +of user data that can fit into this bucket. Suppose in the above +example that the smallest bucket were size 4. The biggest bucket +would have usable size 8188, and the memory footprint would be 8192. + +In a Perl built for debugging, some buckets may have negative usable +size. This means that these buckets cannot (and will not) be used. +For larger buckets, the memory footprint may be one page greater +than a power of 2. If so, case the corresponding power of two is +printed in the C<APPROX> field above. + +=item Free/Used + +The 1 or 2 rows of numbers following that correspond to the number +of buckets of each size between C<SMALLEST> and C<GREATEST>. In +the first row, the sizes (memory footprints) of buckets are powers +of two--or possibly one page greater. In the second row, if present, +the memory footprints of the buckets are between the memory footprints +of two buckets "above". + +For example, suppose under the pervious example, the memory footprints +were + + free: 8 16 32 64 128 256 512 1024 2048 4096 8192 + 4 12 24 48 80 + +With non-C<DEBUGGING> perl, the buckets starting from C<128> have +a 4-byte overhead, and thus a 8192-long bucket may take up to +8188-byte allocations. + +=item C<Total sbrk(): SBRKed/SBRKs:CONTINUOUS> + +The first two fields give the total amount of memory perl sbrk(2)ed +(ess-broken? :-) and number of sbrk(2)s used. The third number is +what perl thinks about continuity of returned chunks. So long as +this number is positive, malloc() will assume that it is probable +that sbrk(2) will provide continuous memory. + +Memory allocated by external libraries is not counted. + +=item C<pad: 0> + +The amount of sbrk(2)ed memory needed to keep buckets aligned. + +=item C<heads: 2192> + +Although memory overhead of bigger buckets is kept inside the bucket, for +smaller buckets, it is kept in separate areas. This field gives the +total size of these areas. + +=item C<chain: 0> + +malloc() may want to subdivide a bigger bucket into smaller buckets. +If only a part of the deceased bucket is left unsubdivided, the rest +is kept as an element of a linked list. This field gives the total +size of these chunks. + +=item C<tail: 6144> + +To minimize the number of sbrk(2)s, malloc() asks for more memory. This +field gives the size of the yet unused part, which is sbrk(2)ed, but +never touched. + +=back + +=head2 Example of using B<-DL> switch + +Below we show how to analyse memory usage by + + do 'lib/auto/POSIX/autosplit.ix'; + +The file in question contains a header and 146 lines similar to + + sub getcwd; + +B<WARNING>: The discussion below supposes 32-bit architecture. In +newer releases of Perl, memory usage of the constructs discussed +here is greatly improved, but the story discussed below is a real-life +story. This story is mercilessly terse, and assumes rather more than cursory +knowledge of Perl internals. Type space to continue, `q' to quit. +(Actually, you just want to skip to the next section.) + +Here is the itemized list of Perl allocations performed during parsing +of this file: + + !!! "after" at test.pl line 3. + Id subtot 4 8 12 16 20 24 28 32 36 40 48 56 64 72 80 80+ + 0 02 13752 . . . . 294 . . . . . . . . . . 4 + 0 54 5545 . . 8 124 16 . . . 1 1 . . . . . 3 + 5 05 32 . . . . . . . 1 . . . . . . . . + 6 02 7152 . . . . . . . . . . 149 . . . . . + 7 02 3600 . . . . . 150 . . . . . . . . . . + 7 03 64 . -1 . 1 . . 2 . . . . . . . . . + 7 04 7056 . . . . . . . . . . . . . . . 7 + 7 17 38404 . . . . . . . 1 . . 442 149 . . 147 . + 9 03 2078 17 249 32 . . . . 2 . . . . . . . . + + +To see this list, insert two C<warn('!...')> statements around the call: + + warn('!'); + do 'lib/auto/POSIX/autosplit.ix'; + warn('!!! "after"'); + +and run it with PErl's B<-DL> option. The first warn() will print +memory allocation info before parsing the file and will memorize +the statistics at this point (we ignore what it prints). The second +warn() prints increments with respect to these memorized data. This +is the printout shown above. + +Different I<Id>s on the left correspond to different subsystems of +the perl interpreter. They are just the first argument given to +the perl memory allocation API named New(). To find what C<9 03> +means, just B<grep> the perl source for C<903>. You'll find it in +F<util.c>, function savepvn(). (I know, you wonder why we told you +to B<grep> and then gave away the answer. That's because grepping +the source is good for the soul.) This function is used to store +a copy of an existing chunk of memory. Using a C debugger, one can +see that the function was called either directly from gv_init() or +via sv_magic(), and that gv_init() is called from gv_fetchpv()--which +was itself called from newSUB(). Please stop to catch your breath now. + +B<NOTE>: To reach this point in the debugger and skip the calls to +savepvn() during the compilation of the main program, you should +set a C breakpoint +in Perl_warn(), continue until this point is reached, and I<then> set +a C breakpoint in Perl_savepvn(). Note that you may need to skip a +handful of Perl_savepvn() calls that do not correspond to mass production +of CVs (there are more C<903> allocations than 146 similar lines of +F<lib/auto/POSIX/autosplit.ix>). Note also that C<Perl_> prefixes are +added by macroization code in perl header files to avoid conflicts +with external libraries. + +Anyway, we see that C<903> ids correspond to creation of globs, twice +per glob - for glob name, and glob stringification magic. + +Here are explanations for other I<Id>s above: + +=over + +=item C<717> + +CReates bigger C<XPV*> structures. In the case above, it +creates 3 C<AV>s per subroutine, one for a list of lexical variable +names, one for a scratchpad (which contains lexical variables and +C<targets>), and one for the array of scratchpads needed for +recursion. + +It also creates a C<GV> and a C<CV> per subroutine, all called from +start_subparse(). + +=item C<002> + +Creates a C array corresponding to the C<AV> of scratchpads and the +scratchpad itself. The first fake entry of this scratchpad is +created though the subroutine itself is not defined yet. + +It also creates C arrays to keep data for the stash. This is one HV, +but it grows; thus, there are 4 big allocations: the big chunks are not +freed, but are kept as additional arenas for C<SV> allocations. + +=item C<054> + +Creates a C<HEK> for the name of the glob for the subroutine. This +name is a key in a I<stash>. + +Big allocations with this I<Id> correspond to allocations of new +arenas to keep C<HE>. + +=item C<602> + +Creates a C<GP> for the glob for the subroutine. + +=item C<702> + +Creates the C<MAGIC> for the glob for the subroutine. + +=item C<704> + +Creates I<arenas> which keep SVs. + +=back + +=head2 B<-DL> details + +If Perl is run with B<-DL> option, then warn()s that start with `!' +behave specially. They print a list of I<categories> of memory +allocations, and statistics of allocations of different sizes for +these categories. + +If warn() string starts with + +=over + +=item C<!!!> + +print changed categories only, print the differences in counts of allocations. + +=item C<!!> + +print grown categories only; print the absolute values of counts, and totals. + +=item C<!> + +print nonempty categories, print the absolute values of counts and totals. + +=back + +=head2 Limitations of B<-DL> statistics + +If an extension or external library does not use the Perl API to +allocate memory, such allocations are not counted. + +=head1 SEE ALSO + +L<perldebug>, +L<perlguts>, +L<perlrun> +L<re>, +and +L<Devel::Dprof>. |