patches for many bugs in the debugger; documentation updates for
perldelta; split perldebug.pod into perldeb{ug,guts}.pod (from Tom Christiansen)

p4raw-id: //depot/perl@5723
author: Gurusamy Sarathy <gsar@cpan.org> 2000-03-14 05:49:08 +0000
committer: Gurusamy Sarathy <gsar@cpan.org> 2000-03-14 05:49:08 +0000
commit: 055fd3a96a4b067d75446c3d47ffc318e9acc40d (patch)
tree: b6449a19782d8aa2703033c9338c80210f4189eb /pod/perldebguts.pod
parent: e3e876cf806e9a3bb353ac41418f1f80df999716 (diff)
download: perl-055fd3a96a4b067d75446c3d47ffc318e9acc40d.tar.gz
1 files changed, 923 insertions, 0 deletions
diff --git a/pod/perldebguts.pod b/pod/perldebguts.pod
new file mode 100644
index 0000000000..b74f3efb6b
--- /dev/null
+++ b/pod/perldebguts.pod
@@ -0,0 +1,923 @@
+=head1 NAME
+
+perldebguts - Guts of Perl debugging 
+
+=head1 DESCRIPTION
+
+This is not the perldebug(1) manpage, which tells you how to use
+the debugger.  This manpage describes low-level details ranging
+between difficult and impossible for anyone who isn't incredibly
+intimate with Perl's guts to understand.  Caveat lector.
+
+=head1 Debugger Internals
+
+Perl has special debugging hooks at compile-time and run-time used
+to create debugging environments.  These hooks are not to be confused
+with the I<perl -Dxxx> command described in L<perlrun>, which is
+usable only if a special Perl is built per the instructions in the
+F<INSTALL> podpage in the Perl source tree.
+
+For example, whenever you call Perl's built-in C<caller> function
+from the package DB, the arguments that the corresponding stack
+frame was called with are copied to the @DB::args array.  These
+mechanisms are enabled by calling Perl with the B<-d> switch.
+Specifically, the following additional features are enabled
+(cf. L<perlvar/$^P>):
+
+=over
+
+=item *
+
+Perl inserts the contents of C<$ENV{PERL5DB}> (or C<BEGIN {require
+'perl5db.pl'}> if not present) before the first line of your program.
+
+=item *
+
+The array C<@{"_<$filename"}> holds the lines of $filename for all
+files compiled by Perl.  The same for C<eval>ed strings that contain
+subroutines, or which are currently being executed.  The $filename
+for C<eval>ed strings looks like C<(eval 34)>.   Code assertions
+in regexes look like C<(re_eval 19)>.
+
+=item *
+
+The hash C<%{"_<$filename"}> contains breakpoints and actions keyed
+by line number.  Individual entries (as opposed to the whole hash)
+are settable.  Perl only cares about Boolean true here, although
+the values used by F<perl5db.pl> have the form
+C<"$break_condition\0$action">.  Values in this hash are magical
+in numeric context: they are zeros if the line is not breakable.
+
+The same holds for evaluated strings that contain subroutines, or
+which are currently being executed.  The $filename for C<eval>ed strings
+looks like C<(eval 34)> or  C<(re_eval 19)>.
+
+=item *
+
+The scalar C<${"_<$filename"}> contains C<"_<$filename">.  This is
+also the case for evaluated strings that contain subroutines, or
+which are currently being executed.  The $filename for C<eval>ed
+strings looks like C<(eval 34)> or C<(re_eval 19)>.
+
+=item *
+
+After each C<require>d file is compiled, but before it is executed,
+C<DB::postponed(*{"_<$filename"})> is called if the subroutine
+C<DB::postponed> exists.  Here, the $filename is the expanded name of
+the C<require>d file, as found in the values of %INC.
+
+=item *
+
+After each subroutine C<subname> is compiled, the existence of
+C<$DB::postponed{subname}> is checked.  If this key exists,
+C<DB::postponed(subname)> is called if the C<DB::postponed> subroutine
+also exists.
+
+=item *
+
+A hash C<%DB::sub> is maintained, whose keys are subroutine names
+and whose values have the form C<filename:startline-endline>.
+C<filename> has the form C<(eval 34)> for subroutines defined inside
+C<eval>s, or C<(re_eval 19)> for those within regex code assertions.
+
+=item *
+
+When the execution of your program reaches a point that can hold a
+breakpoint, the C<DB::DB()> subroutine is called if any of the variables
+$DB::trace, $DB::single, or $DB::signal is true.  These variables
+are not C<local>izable.  This feature is disabled when executing
+inside C<DB::DB()>, including functions called from it 
+unless C<< $^D & (1<<30) >> is true.
+
+=item *
+
+When execution of the program reaches a subroutine call, a call to
+C<&DB::sub>(I<args>) is made instead, with C<$DB::sub> holding the
+name of the called subroutine.  (This doesn't happen if the subroutine
+was compiled in the C<DB> package.)
+
+=back
+
+Note that if C<&DB::sub> needs external data for it to work, no
+subroutine call is possible until this is done.  For the standard
+debugger, the  C<$DB::deep> variable (how many levels of recursion
+deep into the debugger you can go before a mandatory break) gives
+an example of such a dependency.
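+
+As a small illustration of the C<caller> behaviour described at the
+start of this section, the sketch below calls C<caller> from code
+compiled in package C<DB> and then reads C<@DB::args>.  The names
+C<args_of_caller> and C<outer> are made up for the example, and it
+assumes that this particular hook can be observed without the B<-d>
+switch (the C<Carp> module relies on the same behaviour).
+
```perl
use strict;
use warnings;

# Hypothetical helper: returns the arguments its caller was invoked with.
sub args_of_caller {
    package DB;               # caller() must be compiled in package DB
    @DB::args = ();           # clear any stale contents first
    my @frame = caller(1);    # frame 1 is the sub that called us
    return @DB::args;         # filled in as a side effect of caller()
}

sub outer { return args_of_caller() }

my @seen = outer('alpha', 42);
print "outer was called with: @seen\n";   # prints "outer was called with: alpha 42"
```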
+
+=head2 Writing Your Own Debugger
+
+The minimal working debugger consists of one line
+
+  sub DB::DB {}
+
+which is quite handy as the contents of the C<PERL5DB> environment
+variable:
+
+  $ PERL5DB="sub DB::DB {}" perl -d your-script
+
+Another brief debugger, slightly more useful, could be created
+with only the line:
+
+  sub DB::DB {print ++$i; scalar <STDIN>}
+
+This debugger would print the sequence number of each statement
+encountered, and would wait for you to hit a newline before continuing.
+
+The following debugger is quite functional:
+
+  {
+    package DB;
+    sub DB  {}
+    sub sub {print ++$i, " $sub\n"; &$sub}
+  }
+
+It prints the sequence number of each subroutine call and the name of the
+called subroutine.  Note that C<&DB::sub> should be compiled into the
+package C<DB>.
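+
+The subroutine-call hook can also be tried from a single script.  The
+following is only a sketch: it assumes that setting bit C<0x01> of
+C<$^P> inside a C<BEGIN> block is enough to route later-compiled
+subroutine calls through C<&DB::sub>, standing in for the B<-d> switch.
+The names C<@calls> and C<greet> are invented for the illustration.
+
```perl
use strict;
use warnings;

our @calls;    # subroutine names recorded by the hook below

BEGIN {
    package DB;                        # the hooks must be compiled in package DB
    sub DB { }                         # per-statement hook, unused in this sketch
    sub sub {
        push @main::calls, "$DB::sub"; # name of the subroutine being called
        no strict 'refs';              # $DB::sub usually holds a name, not a ref
        &$DB::sub;                     # make the real call, passing @_ through
    }
    $^P |= 0x01;    # assumption: enables only the subroutine-call hook
}

sub greet { return "hello, $_[0]" }

my $msg = greet('world');
print "$msg\n";
print "calls seen: @calls\n";
```
+
+Because C<&DB::sub> is compiled in package C<DB>, the call it makes
+on behalf of the program is not itself intercepted.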
+
+At the start, the debugger reads your rc file (F<./.perldb> or
+F<~/.perldb> under Unix), which can set important options.  This file may
+define a subroutine C<&afterinit> to be executed after the debugger is
+initialized.
+
+After the rc file is read, the debugger reads the PERLDB_OPTS
+environment variable and parses this as the remainder of a C<O ...>
+line as one might enter at the debugger prompt.
+
+The debugger also maintains magical internal variables, such as
+C<@DB::dbline> and C<%DB::dbline>, which are aliases for
+C<@{"::_<current_file"}> and C<%{"::_<current_file"}>.  Here C<current_file>
+is the currently selected file, either explicitly chosen with the
+debugger's C<f> command, or implicitly by flow of execution.
+
+Some functions are provided to simplify customization.  See
+L<perldebug/"Options"> for description of options parsed by
+C<DB::parse_options(string)>.  The function C<DB::dump_trace(skip[,
+count])> skips the specified number of frames and returns a list
+containing information about the calling frames (all of them, if
+C<count> is missing).  Each entry is a reference to a hash with
+keys C<context> (either C<.>, C<$>, or C<@>), C<sub> (subroutine
+name, or info about C<eval>), C<args> (C<undef> or a reference to
+an array), C<file>, and C<line>.
+
+The function C<DB::print_trace(FH, skip[, count[, short]])> prints
+formatted info about caller frames.  The last two functions may be
+convenient as arguments to C<< < >>, C<< << >> commands.
+
+Note that any variables and functions that are not documented in
+this manpage (or in L<perldebug>) are considered for internal
+use only, and as such are subject to change without notice.
+
+=head1 Frame Listing Output Examples
+
+The C<frame> option can be used to control the output of frame 
+information.  For example, contrast this expression trace:
+
+ $ perl -de 42
+ Stack dump during die enabled outside of evals.
+
+ Loading DB routines from perl5db.pl patch level 0.94
+ Emacs support available.
+
+ Enter h or `h h' for help.
+
+ main::(-e:1):   0
+   DB<1> sub foo { 14 }
+
+   DB<2> sub bar { 3 }
+
+   DB<3> t print foo() * bar()
+ main::((eval 172):3):   print foo() * bar();
+ main::foo((eval 168):2):
+ main::bar((eval 170):2):
+ 42
+
+with this one, once the C<O>ption C<frame=2> has been set:
+
+   DB<4> O f=2
+                frame = '2'
+   DB<5> t print foo() * bar()
+ 3:      foo() * bar()
+ entering main::foo
+  2:     sub foo { 14 };
+ exited main::foo
+ entering main::bar
+  2:     sub bar { 3 };
+ exited main::bar
+ 42
+
+By way of demonstration, we present below a laborious listing
+resulting from setting your C<PERLDB_OPTS> environment variable to
+the value C<f=n N>, and running I<perl -d -V> from the command line.
+Examples using various values of C<n> are shown to give you a feel
+for the difference between settings.  Long though it may be, this
+is not a complete listing, but only excerpts.
+
+=over 4
+
+=item 1
+
+  entering main::BEGIN
+   entering Config::BEGIN
+    Package lib/Exporter.pm.
+    Package lib/Carp.pm.
+   Package lib/Config.pm.
+   entering Config::TIEHASH
+   entering Exporter::import
+    entering Exporter::export
+  entering Config::myconfig
+   entering Config::FETCH
+   entering Config::FETCH
+   entering Config::FETCH
+   entering Config::FETCH
+
+=item 2
+
+  entering main::BEGIN
+   entering Config::BEGIN
+    Package lib/Exporter.pm.
+    Package lib/Carp.pm.
+   exited Config::BEGIN
+   Package lib/Config.pm.
+   entering Config::TIEHASH
+   exited Config::TIEHASH
+   entering Exporter::import
+    entering Exporter::export
+    exited Exporter::export
+   exited Exporter::import
+  exited main::BEGIN
+  entering Config::myconfig
+   entering Config::FETCH
+   exited Config::FETCH
+   entering Config::FETCH
+   exited Config::FETCH
+   entering Config::FETCH
+
+=item 4
+
+  in  $=main::BEGIN() from /dev/null:0
+   in  $=Config::BEGIN() from lib/Config.pm:2
+    Package lib/Exporter.pm.
+    Package lib/Carp.pm.
+   Package lib/Config.pm.
+   in  $=Config::TIEHASH('Config') from lib/Config.pm:644
+   in  $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
+    in  $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from li
+  in  @=Config::myconfig() from /dev/null:0
+   in  $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574
+   in  $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574
+   in  $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574
+   in  $=Config::FETCH(ref(Config), 'PERL_SUBVERSION') from lib/Config.pm:574
+   in  $=Config::FETCH(ref(Config), 'osname') from lib/Config.pm:574
+   in  $=Config::FETCH(ref(Config), 'osvers') from lib/Config.pm:574
+
+=item 6
+
+  in  $=main::BEGIN() from /dev/null:0
+   in  $=Config::BEGIN() from lib/Config.pm:2
+    Package lib/Exporter.pm.
+    Package lib/Carp.pm.
+   out $=Config::BEGIN() from lib/Config.pm:0
+   Package lib/Config.pm.
+   in  $=Config::TIEHASH('Config') from lib/Config.pm:644
+   out $=Config::TIEHASH('Config') from lib/Config.pm:644
+   in  $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
+    in  $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/
+    out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/
+   out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
+  out $=main::BEGIN() from /dev/null:0
+  in  @=Config::myconfig() from /dev/null:0
+   in  $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574
+   out $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574
+   in  $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574
+   out $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574
+   in  $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574
+   out $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574
+   in  $=Config::FETCH(ref(Config), 'PERL_SUBVERSION') from lib/Config.pm:574
+
+=item 14
+
+  in  $=main::BEGIN() from /dev/null:0
+   in  $=Config::BEGIN() from lib/Config.pm:2
+    Package lib/Exporter.pm.
+    Package lib/Carp.pm.
+   out $=Config::BEGIN() from lib/Config.pm:0
+   Package lib/Config.pm.
+   in  $=Config::TIEHASH('Config') from lib/Config.pm:644
+   out $=Config::TIEHASH('Config') from lib/Config.pm:644
+   in  $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
+    in  $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/E
+    out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/E
+   out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
+  out $=main::BEGIN() from /dev/null:0
+  in  @=Config::myconfig() from /dev/null:0
+   in  $=Config::FETCH('Config=HASH(0x1aa444)', 'package') from lib/Config.pm:574
+   out $=Config::FETCH('Config=HASH(0x1aa444)', 'package') from lib/Config.pm:574
+   in  $=Config::FETCH('Config=HASH(0x1aa444)', 'baserev') from lib/Config.pm:574
+   out $=Config::FETCH('Config=HASH(0x1aa444)', 'baserev') from lib/Config.pm:574
+
+=item 30
+
+  in  $=CODE(0x15eca4)() from /dev/null:0
+   in  $=CODE(0x182528)() from lib/Config.pm:2
+    Package lib/Exporter.pm.
+   out $=CODE(0x182528)() from lib/Config.pm:0
+   scalar context return from CODE(0x182528): undef
+   Package lib/Config.pm.
+   in  $=Config::TIEHASH('Config') from lib/Config.pm:628
+   out $=Config::TIEHASH('Config') from lib/Config.pm:628
+   scalar context return from Config::TIEHASH:   empty hash
+   in  $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
+    in  $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/Exporter.pm:171
+    out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/Exporter.pm:171
+    scalar context return from Exporter::export: ''
+   out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
+   scalar context return from Exporter::import: ''
+
+=back
+
+In all cases shown above, the line indentation shows the call tree.
+If bit 2 of C<frame> is set, a line is printed on exit from a
+subroutine as well.  If bit 4 is set, the arguments are printed
+along with the caller info.  If bit 8 is set, the arguments are
+printed even if they are tied or references.  If bit 16 is set, the
+return value is printed, too.
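+
+Since C<frame> is a bit mask, the settings used in the listings above
+can be decoded mechanically.  The sketch below does so with a
+hypothetical helper, C<describe_frame>, whose table simply restates
+the bits named in the preceding paragraph; it is not part of
+F<perl5db.pl>.
+
```perl
use strict;
use warnings;

# Bit meanings as described in the paragraph above.
my @FRAME_BITS = (
    [  2 => 'print a line on subroutine exit'        ],
    [  4 => 'print arguments with the caller info'   ],
    [  8 => 'print tied and reference arguments too' ],
    [ 16 => 'print the return value'                 ],
);

# Hypothetical helper: list the features enabled by a frame value.
sub describe_frame {
    my ($frame) = @_;
    return map { $_->[1] } grep { $frame & $_->[0] } @FRAME_BITS;
}

for my $n (2, 6, 14, 30) {
    print "frame=$n:\n";
    print "  $_\n" for describe_frame($n);
}
```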
+
+When a package is compiled, a line like this
+
+    Package lib/Carp.pm.
+
+is printed with proper indentation.
+
+=head1 Debugging regular expressions
+
+There are two ways to enable debugging output for regular expressions.
+
+If your perl is compiled with C<-DDEBUGGING>, you may use the
+B<-Dr> flag on the command line.
+
+Otherwise, one can C<use re 'debug'>, which has effects at
+compile time and run time.  It is not lexically scoped.
+
+=head2 Compile-time output
+
+The debugging output at compile time looks like this:
+
+  compiling RE `[bc]d(ef*g)+h[ij]k$'
+  size 43 first at 1
+     1: ANYOF(11)
+    11: EXACT <d>(13)
+    13: CURLYX {1,32767}(27)
+    15:   OPEN1(17)
+    17:     EXACT <e>(19)
+    19:     STAR(22)
+    20:       EXACT <f>(0)
+    22:     EXACT <g>(24)
+    24:   CLOSE1(26)
+    26:   WHILEM(0)
+    27: NOTHING(28)
+    28: EXACT <h>(30)
+    30: ANYOF(40)
+    40: EXACT <k>(42)
+    42: EOL(43)
+    43: END(0)
+  anchored `de' at 1 floating `gh' at 3..2147483647 (checking floating)
+				    stclass `ANYOF' minlen 7
+
+The first line shows the pre-compiled form of the regex.  The second
+shows the size of the compiled form (in arbitrary units, usually
+4-byte words) and the label I<id> of the first node that does a
+match.
+
+The last line (split into two lines above) contains optimizer
+information.  In the example shown, the optimizer found that the match 
+should contain a substring C<de> at offset 1, plus substring C<gh>
+at some offset between 3 and infinity.  Moreover, when checking for
+these substrings (to abandon impossible matches quickly), Perl will check
+for the substring C<gh> before checking for the substring C<de>.  The
+optimizer may also use the knowledge that the match starts (at the
+C<first> I<id>) with a character class, and the match cannot be
+shorter than 7 chars.
+
+The fields of interest which may appear in the last line are
+
+=over
+
+=item C<anchored> I<STRING> C<at> I<POS>
+
+=item C<floating> I<STRING> C<at> I<POS1..POS2>
+
+See above.
+
+=item C<matching floating/anchored>
+
+Which substring to check first.
+
+=item C<minlen>
+
+The minimal length of the match.
+
+=item C<stclass> I<TYPE>
+
+Type of first matching node.
+
+=item C<noscan>
+
+Don't scan for the found substrings.
+
+=item C<isall>
+
+Means that the optimizer info is all that the regular
+expression contains, and thus one does not need to enter the regex engine at
+all.
+
+=item C<GPOS>
+
+Set if the pattern contains C<\G>.
+
+=item C<plus> 
+
+Set if the pattern starts with a repeated char (as in C<x+y>).
+
+=item C<implicit>
+
+Set if the pattern starts with C<.*>.
+
+=item C<with eval> 
+
+Set if the pattern contains eval-groups, such as C<(?{ code })> and
+C<(??{ code })>.
+
+=item C<anchored(TYPE)>
+
+If the pattern may match only at a handful of places, with C<TYPE>
+being C<BOL>, C<MBOL>, or C<GPOS>.  See the table below.
+
+=back
+
+If a substring is known to match at end-of-line only, it may be
+followed by C<$>, as in C<floating `k'$>.
+
+The optimizer-specific info is used to avoid entering (a slow) regex
+engine on strings that will definitely not match.  If the C<isall> flag
+is set, a call to the regex engine may be avoided even when the optimizer
+found an appropriate place for the match.
+
+The rest of the output contains the list of I<nodes> of the compiled
+form of the regex.  Each line has the format
+
+C<   >I<id>: I<TYPE> I<OPTIONAL-INFO> (I<next-id>)
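+
+To make the line format concrete, here is a rough sketch that splits
+node lines from the dump shown earlier into their fields.  The helper
+name C<parse_node_line> and its regular expression are assumptions
+made for this illustration; they handle the sample lines in this
+document and nothing more is promised.
+
```perl
use strict;
use warnings;

# Hypothetical helper: split one node line of the compile-time dump into
# its id, type, optional info, and next-id fields.
sub parse_node_line {
    my ($line) = @_;
    my ($id, $type, $info, $next) =
        $line =~ /^\s*(\d+):\s+(\w+)\s*(.*?)\s*\((\d+)\)\s*$/
        or return;    # not a node line
    return { id => $id, type => $type, info => $info, next => $next };
}

for my $line ('     1: ANYOF(11)', '    13: CURLYX {1,32767}(27)') {
    my $node = parse_node_line($line) or next;
    printf "%d %s [%s] -> %d\n", @{$node}{qw(id type info next)};
}
```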
+
+=head2 Types of nodes
+
+Here are the possible types, with short descriptions:
+
+    # TYPE arg-description [num-args] [longjump-len] DESCRIPTION
+
+    # Exit points
+    END		no	End of program.
+    SUCCEED	no	Return from a subroutine, basically.
+
+    # Anchors:
+    BOL		no	Match "" at beginning of line.
+    MBOL	no	Same, assuming multiline.
+    SBOL	no	Same, assuming singleline.
+    EOS		no	Match "" at end of string.
+    EOL		no	Match "" at end of line.
+    MEOL	no	Same, assuming multiline.
+    SEOL	no	Same, assuming singleline.
+    BOUND	no	Match "" at any word boundary
+    BOUNDL	no	Match "" at any word boundary
+    NBOUND	no	Match "" at any word non-boundary
+    NBOUNDL	no	Match "" at any word non-boundary
+    GPOS	no	Matches where last m//g left off.
+
+    # [Special] alternatives
+    ANY		no	Match any one character (except newline).
+    SANY	no	Match any one character.
+    ANYOF	sv	Match character in (or not in) this class.
+    ALNUM	no	Match any alphanumeric character
+    ALNUML	no	Match any alphanumeric char in locale
+    NALNUM	no	Match any non-alphanumeric character
+    NALNUML	no	Match any non-alphanumeric char in locale
+    SPACE	no	Match any whitespace character
+    SPACEL	no	Match any whitespace char in locale
+    NSPACE	no	Match any non-whitespace character
+    NSPACEL	no	Match any non-whitespace char in locale
+    DIGIT	no	Match any numeric character
+    NDIGIT	no	Match any non-numeric character
+
+    # BRANCH	The set of branches constituting a single choice are hooked
+    #		together with their "next" pointers, since precedence prevents
+    #		anything being concatenated to any individual branch.  The
+    #		"next" pointer of the last BRANCH in a choice points to the
+    #		thing following the whole choice.  This is also where the
+    #		final "next" pointer of each individual branch points; each
+    #		branch starts with the operand node of a BRANCH node.
+    #
+    BRANCH	node	Match this alternative, or the next...
+
+    # BACK	Normal "next" pointers all implicitly point forward; BACK
+    #		exists to make loop structures possible.
+    # not used
+    BACK	no	Match "", "next" ptr points backward.
+
+    # Literals
+    EXACT	sv	Match this string (preceded by length).
+    EXACTF	sv	Match this string, folded (prec. by length).
+    EXACTFL	sv	Match this string, folded in locale (w/len).
+
+    # Do nothing
+    NOTHING	no	Match empty string.
+    # A variant of above which delimits a group, thus stops optimizations
+    TAIL	no	Match empty string. Can jump here from outside.
+
+    # STAR,PLUS	'?', and complex '*' and '+', are implemented as circular
+    #		BRANCH structures using BACK.  Simple cases (one character
+    #		per match) are implemented with STAR and PLUS for speed
+    #		and to minimize recursive plunges.
+    #
+    STAR	node	Match this (simple) thing 0 or more times.
+    PLUS	node	Match this (simple) thing 1 or more times.
+
+    CURLY	sv 2	Match this simple thing {n,m} times.
+    CURLYN	no 2	Match next-after-this simple thing 
+    #			{n,m} times, set parens.
+    CURLYM	no 2	Match this medium-complex thing {n,m} times.
+    CURLYX	sv 2	Match this complex thing {n,m} times.
+
+    # This terminator creates a loop structure for CURLYX
+    WHILEM	no	Do curly processing and see if rest matches.
+
+    # OPEN,CLOSE,GROUPP	...are numbered at compile time.
+    OPEN	num 1	Mark this point in input as start of #n.
+    CLOSE	num 1	Analogous to OPEN.
+
+    REF		num 1	Match some already matched string
+    REFF	num 1	Match already matched string, folded
+    REFFL	num 1	Match already matched string, folded in loc.
+
+    # grouping assertions
+    IFMATCH	off 1 2	Succeeds if the following matches.
+    UNLESSM	off 1 2	Fails if the following matches.
+    SUSPEND	off 1 1	"Independent" sub-regex.
+    IFTHEN	off 1 1	Switch, should be preceded by switcher .
+    GROUPP	num 1	Whether the group matched.
+
+    # Support for long regex
+    LONGJMP	off 1 1	Jump far away.
+    BRANCHJ	off 1 1	BRANCH with long offset.
+
+    # The heavy worker
+    EVAL	evl 1	Execute some Perl code.
+
+    # Modifiers
+    MINMOD	no	Next operator is not greedy.
+    LOGICAL	no	Next opcode should set the flag only.
+
+    # This is not used yet
+    RENUM	off 1 1	Group with independently numbered parens.
+
+    # This is not really a node, but an optimized away piece of a "long" node.
+    # To simplify debugging output, we mark it as if it were a node
+    OPTIMIZED	off	Placeholder for dump.
+
+=head2 Run-time output
+
+First of all, when doing a match, one may get no run-time output even
+if debugging is enabled.  This means that the regex engine was never
+entered and that all of the job was therefore done by the optimizer.
+
+If the regex engine was entered, the output may look like this:
+
+  Matching `[bc]d(ef*g)+h[ij]k$' against `abcdefg__gh__'
+    Setting an EVAL scope, savestack=3
+     2 <ab> <cdefg__gh_>    |  1: ANYOF
+     3 <abc> <defg__gh_>    | 11: EXACT <d>
+     4 <abcd> <efg__gh_>    | 13: CURLYX {1,32767}
+     4 <abcd> <efg__gh_>    | 26:   WHILEM
+				0 out of 1..32767  cc=effff31c
+     4 <abcd> <efg__gh_>    | 15:     OPEN1
+     4 <abcd> <efg__gh_>    | 17:     EXACT <e>
+     5 <abcde> <fg__gh_>    | 19:     STAR
+			     EXACT <f> can match 1 times out of 32767...
+    Setting an EVAL scope, savestack=3
+     6 <bcdef> <g__gh__>    | 22:       EXACT <g>
+     7 <bcdefg> <__gh__>    | 24:       CLOSE1
+     7 <bcdefg> <__gh__>    | 26:       WHILEM
+				    1 out of 1..32767  cc=effff31c
+    Setting an EVAL scope, savestack=12
+     7 <bcdefg> <__gh__>    | 15:         OPEN1
+     7 <bcdefg> <__gh__>    | 17:         EXACT <e>
+       restoring \1 to 4(4)..7
+				    failed, try continuation...
+     7 <bcdefg> <__gh__>    | 27:         NOTHING
+     7 <bcdefg> <__gh__>    | 28:         EXACT <h>
+				    failed...
+				failed...
+
+The most significant information in the output is about the particular I<node>
+of the compiled regex that is currently being tested against the target string.
+The format of these lines is
+
+C<    >I<STRING-OFFSET> <I<PRE-STRING>> <I<POST-STRING>>   |I<ID>:  I<TYPE>
+
+The I<TYPE> info is indented with respect to the backtracking level.
+Other incidental information appears interspersed within.
+
+=head1 Debugging Perl memory usage
+
+Perl is a profligate wastrel when it comes to memory use.  There
+is a saying that to estimate memory usage of Perl, assume a reasonable
+algorithm for memory allocation, multiply that estimate by 10, and
+while you still may miss the mark, at least you won't be quite so
+astonished.  This is not absolutely true, but may provide a good
+grasp of what happens.
+
+Assume that an integer cannot take less than 20 bytes of memory, a
+float cannot take less than 24 bytes, a string cannot take less
+than 32 bytes (all these examples assume 32-bit architectures; the
+results are quite a bit worse on 64-bit architectures).  If a variable
+is accessed in two of three different ways (which require an integer,
+a float, or a string), the memory footprint may increase yet another
+20 bytes.  A sloppy malloc(3) implementation can inflate these
+numbers dramatically.
+
+On the opposite end of the scale, a declaration like
+
+  sub foo;
+
+may take up to 500 bytes of memory, depending on which release of Perl
+you're running.
+
+Anecdotal estimates of source-to-compiled code bloat suggest an
+eightfold increase.  This means that the compiled form of reasonable
+(normally commented, properly indented etc.) code will take
+about eight times more space in memory than the code took
+on disk.
+
+There are two Perl-specific ways to analyze memory usage:
+$ENV{PERL_DEBUG_MSTATS} and the B<-DL> command-line switch.  The first
+is available only if Perl is compiled with Perl's malloc(); the
+second only if Perl was built with C<-DDEBUGGING>.  See the
+instructions for how to do this in the F<INSTALL> podpage at 
+the top level of the Perl source tree.
+
+=head2 Using C<$ENV{PERL_DEBUG_MSTATS}>
+
+If your perl is using Perl's malloc() and was compiled with the
+necessary switches (this is the default), then it will print memory
+usage statistics after compiling your code when C<< $ENV{PERL_DEBUG_MSTATS}
+> 1 >>, and before termination of the program when C<<
+$ENV{PERL_DEBUG_MSTATS} >= 1 >>.  The report format is similar to
+the following example:
+
+  $ PERL_DEBUG_MSTATS=2 perl -e "require Carp"
+  Memory allocation statistics after compilation: (buckets 4(4)..8188(8192)
+     14216 free:   130   117    28     7     9   0   2     2   1 0 0
+		437    61    36     0     5
+     60924 used:   125   137   161    55     7   8   6    16   2 0 1
+		 74   109   304    84    20
+  Total sbrk(): 77824/21:119. Odd ends: pad+heads+chain+tail: 0+636+0+2048.
+  Memory allocation statistics after execution:   (buckets 4(4)..8188(8192)
+     30888 free:   245    78    85    13     6   2   1     3   2 0 1
+		315   162    39    42    11
+    175816 used:   265   176  1112   111    26  22  11    27   2 1 1
+		196   178  1066   798    39
+  Total sbrk(): 215040/47:145. Odd ends: pad+heads+chain+tail: 0+2192+0+6144.
+
+It is possible to ask for such a statistic at arbitrary points in
+your execution using the mstats() function out of the standard
+Devel::Peek module.
+
+Here is some explanation of that format:
+
+=over
+
+=item C<buckets SMALLEST(APPROX)..GREATEST(APPROX)>
+
+Perl's malloc() uses bucketed allocations.  Every request is rounded
+up to the closest bucket size available, and a bucket is taken from
+the pool of buckets of that size.
+
+The line above describes the limits of buckets currently in use.
+Each bucket has two sizes: memory footprint and the maximal size
+of user data that can fit into this bucket.  Suppose in the above
+example that the smallest bucket were size 4.  The biggest bucket
+would have usable size 8188, and the memory footprint would be 8192.
+
+In a Perl built for debugging, some buckets may have negative usable
+size.  This means that these buckets cannot (and will not) be used.
+For larger buckets, the memory footprint may be one page greater
+than a power of 2.  If so, the corresponding power of two is
+printed in the C<APPROX> field above.
+
+=item Free/Used
+
+The 1 or 2 rows of numbers following that correspond to the number
+of buckets of each size between C<SMALLEST> and C<GREATEST>.  In
+the first row, the sizes (memory footprints) of buckets are powers
+of two--or possibly one page greater.  In the second row, if present,
+the memory footprints of the buckets are between the memory footprints
+of two buckets "above".
+
+For example, suppose under the previous example, the memory footprints
+were
+
+     free:    8     16    32    64    128  256 512 1024 2048 4096 8192
+	   4     12    24    48    80
+
+With non-C<DEBUGGING> perl, the buckets starting from C<128> have
+a 4-byte overhead, and thus an 8192-long bucket may take up to
+8188-byte allocations.
+
+=item C<Total sbrk(): SBRKed/SBRKs:CONTINUOUS>
+
+The first two fields give the total amount of memory perl sbrk(2)ed
+(ess-broken? :-) and number of sbrk(2)s used.  The third number is
+what perl thinks about continuity of returned chunks.  So long as
+this number is positive, malloc() will assume that it is probable
+that sbrk(2) will provide continuous memory.
+
+Memory allocated by external libraries is not counted.
+
+=item C<pad: 0>
+
+The amount of sbrk(2)ed memory needed to keep buckets aligned.
+
+=item C<heads: 2192>
+
+Although memory overhead of bigger buckets is kept inside the bucket, for
+smaller buckets, it is kept in separate areas.  This field gives the
+total size of these areas.
+
+=item C<chain: 0>
+
+malloc() may want to subdivide a bigger bucket into smaller buckets.
+If only a part of the deceased bucket is left unsubdivided, the rest
+is kept as an element of a linked list.  This field gives the total
+size of these chunks.
+
+=item C<tail: 6144>
+
+To minimize the number of sbrk(2)s, malloc() asks for more memory.  This
+field gives the size of the yet unused part, which is sbrk(2)ed, but
+never touched.
+
+=back
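+
+The summary line of the report is regular enough to pick apart
+with a pattern match.  The sketch below parses the second C<Total
+sbrk()> line from the sample report above and adds up the "odd ends";
+the hash keys and the pattern are assumptions chosen for the example,
+not something Perl provides.
+
```perl
use strict;
use warnings;

# Sample line copied from the report shown earlier.
my $line = 'Total sbrk(): 215040/47:145. Odd ends: '
         . 'pad+heads+chain+tail: 0+2192+0+6144.';

# Field names follow the item headings above; they are illustrative only.
my %stat;
@stat{qw(sbrked sbrks continuous pad heads chain tail)} = $line =~ m{
    Total \s sbrk\(\): \s (\d+) / (\d+) : (-?\d+) \. \s
    Odd \s ends: \s pad\+heads\+chain\+tail: \s
    (\d+) \+ (\d+) \+ (\d+) \+ (\d+)
}x or die "unrecognised mstats line\n";

# Memory that was sbrk()ed but is not part of any bucket's usable space.
my $odd = $stat{pad} + $stat{heads} + $stat{chain} + $stat{tail};

printf "%d bytes from %d sbrk() calls; %d bytes (%.1f%%) in odd ends\n",
    $stat{sbrked}, $stat{sbrks}, $odd, 100 * $odd / $stat{sbrked};
```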
+
+=head2 Example of using B<-DL> switch
+
+Below we show how to analyse memory usage by 
+
+  do 'lib/auto/POSIX/autosplit.ix';
+
+The file in question contains a header and 146 lines similar to
+
+  sub getcwd;
+
+B<WARNING>: The discussion below supposes 32-bit architecture.  In 
+newer releases of Perl, memory usage of the constructs discussed
+here is greatly improved, but the story discussed below is a real-life
+story.  This story is mercilessly terse, and assumes rather more than cursory
+knowledge of Perl internals.  Type space to continue, `q' to quit. 
+(Actually, you just want to skip to the next section.)
+
+Here is the itemized list of Perl allocations performed during parsing
+of this file:
+
+ !!! "after" at test.pl line 3.
+    Id  subtot   4   8  12  16  20  24  28  32  36  40  48  56  64  72  80 80+
+  0 02   13752   .   .   .   . 294   .   .   .   .   .   .   .   .   .   .   4
+  0 54    5545   .   .   8 124  16   .   .   .   1   1   .   .   .   .   .   3
+  5 05      32   .   .   .   .   .   .   .   1   .   .   .   .   .   .   .   .
+  6 02    7152   .   .   .   .   .   .   .   .   .   . 149   .   .   .   .   .
+  7 02    3600   .   .   .   .   . 150   .   .   .   .   .   .   .   .   .   .
+  7 03      64   .  -1   .   1   .   .   2   .   .   .   .   .   .   .   .   .
+  7 04    7056   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   7
+  7 17   38404   .   .   .   .   .   .   .   1   .   . 442 149   .   . 147   .
+  9 03    2078  17 249  32   .   .   .   .   2   .   .   .   .   .   .   .   .
+
+
+To see this list, insert two C<warn('!...')> statements around the call:
+
+  warn('!');
+  do 'lib/auto/POSIX/autosplit.ix';
+  warn('!!! "after"');
+
+and run it with Perl's B<-DL> option.  The first warn() will print
+memory allocation info before parsing the file and will memorize
+the statistics at this point (we ignore what it prints).  The second
+warn() prints increments with respect to these memorized data.  This
+is the printout shown above.
+
+Different I<Id>s on the left correspond to different subsystems of
+the perl interpreter.  They are just the first argument given to
+the perl memory allocation API named New().  To find what C<9 03>
+means, just B<grep> the perl source for C<903>.  You'll find it in
+F<util.c>, function savepvn().  (I know, you wonder why we told you
+to B<grep> and then gave away the answer.  That's because grepping
+the source is good for the soul.)  This function is used to store
+a copy of an existing chunk of memory.  Using a C debugger, one can
+see that the function was called either directly from gv_init() or
+via sv_magic(), and that gv_init() is called from gv_fetchpv()--which
+was itself called from newSUB().  Please stop to catch your breath now.
+
+B<NOTE>: To reach this point in the debugger and skip the calls to
+savepvn() during the compilation of the main program, you should
+set a C breakpoint
+in Perl_warn(), continue until this point is reached, and I<then> set
+a C breakpoint in Perl_savepvn().  Note that you may need to skip a
+handful of Perl_savepvn() calls that do not correspond to mass production
+of CVs (there are more C<903> allocations than 146 similar lines of
+F<lib/auto/POSIX/autosplit.ix>).  Note also that C<Perl_> prefixes are
+added by macroization code in perl header files to avoid conflicts
+with external libraries.
+
+Anyway, we see that C<903> ids correspond to creation of globs, twice
+per glob - for glob name, and glob stringification magic.
+
+Here are explanations for other I<Id>s above: 
+
+=over
+
+=item C<717> 
+
+Creates bigger C<XPV*> structures.  In the case above, it
+creates 3 C<AV>s per subroutine, one for a list of lexical variable
+names, one for a scratchpad (which contains lexical variables and
+C<targets>), and one for the array of scratchpads needed for
+recursion.  
+
+It also creates a C<GV> and a C<CV> per subroutine, all called from
+start_subparse().
+
+=item C<002>
+
+Creates a C array corresponding to the C<AV> of scratchpads and the
+scratchpad itself.  The first fake entry of this scratchpad is
+created though the subroutine itself is not defined yet.
+
+It also creates C arrays to keep data for the stash.  This is one HV,
+but it grows; thus, there are 4 big allocations: the big chunks are not
+freed, but are kept as additional arenas for C<SV> allocations.
+
+=item C<054>
+
+Creates a C<HEK> for the name of the glob for the subroutine.  This
+name is a key in a I<stash>.
+
+Big allocations with this I<Id> correspond to allocations of new
+arenas to keep C<HE>.
+
+=item C<602>
+
+Creates a C<GP> for the glob for the subroutine.
+
+=item C<702>
+
+Creates the C<MAGIC> for the glob for the subroutine.
+
+=item C<704>
+
+Creates I<arenas> which keep SVs.
+
+=back
+
+=head2 B<-DL> details
+
+If Perl is run with B<-DL> option, then warn()s that start with `!'
+behave specially.  They print a list of I<categories> of memory
+allocations, and statistics of allocations of different sizes for
+these categories.
+
+If the warn() string starts with
+
+=over
+
+=item C<!!!> 
+
+print changed categories only, print the differences in counts of allocations.
+
+=item C<!!> 
+
+print grown categories only; print the absolute values of counts, and totals.
+
+=item C<!>
+
+print nonempty categories, print the absolute values of counts and totals.
+
+=back
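+
+Because the three prefixes overlap, the longest one has to be tested
+first.  The sketch below shows that ordering with a hypothetical
+helper, C<dl_report_mode>; it only classifies a message string and
+does not reproduce what perl itself does under B<-DL>.
+
```perl
use strict;
use warnings;

# Hypothetical helper: name the report mode selected by a warn() string,
# following the three prefixes listed above (longest prefix first).
sub dl_report_mode {
    my ($msg) = @_;
    return 'changed categories, differences in counts' if $msg =~ /^!!!/;
    return 'grown categories, absolute counts'         if $msg =~ /^!!/;
    return 'nonempty categories, absolute counts'      if $msg =~ /^!/;
    return;    # an ordinary warning
}

for my $msg ('!', '!!', '!!! "after"') {
    printf "%-12s => %s\n", $msg, dl_report_mode($msg);
}
```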
+
+=head2 Limitations of B<-DL> statistics
+
+If an extension or external library does not use the Perl API to
+allocate memory, such allocations are not counted.
+
+=head1 SEE ALSO
+
+L<perldebug>,
+L<perlguts>,
+L<perlrun>,
+L<re>,
+and
+L<Devel::DProf>.