Diffstat (limited to 'docs/users_guide/profiling.xml')
-rw-r--r--  docs/users_guide/profiling.xml  1440
1 file changed, 1440 insertions, 0 deletions
diff --git a/docs/users_guide/profiling.xml b/docs/users_guide/profiling.xml
new file mode 100644
index 0000000000..a88c8bbf4c
--- /dev/null
+++ b/docs/users_guide/profiling.xml
@@ -0,0 +1,1440 @@
+<?xml version="1.0" encoding="iso-8859-1"?>
+<chapter id="profiling">
+ <title>Profiling</title>
+ <indexterm><primary>profiling</primary>
+ </indexterm>
+ <indexterm><primary>cost-centre profiling</primary></indexterm>
+
+ <para> Glasgow Haskell comes with a time and space profiling
+ system. Its purpose is to help you improve your understanding of
+ your program's execution behaviour, so you can improve it.</para>
+
+ <para> Any comments, suggestions and/or improvements you have are
+ welcome. Recommended &ldquo;profiling tricks&rdquo; would be
+ especially cool! </para>
+
+ <para>Profiling a program is a three-step process:</para>
+
+ <orderedlist>
+ <listitem>
+ <para> Re-compile your program for profiling with the
+ <literal>-prof</literal> option, and probably one of the
+ <literal>-auto</literal> or <literal>-auto-all</literal>
+ options. These options are described in more detail in <xref
+ linkend="prof-compiler-options"/> </para>
+ <indexterm><primary><literal>-prof</literal></primary>
+ </indexterm>
+ <indexterm><primary><literal>-auto</literal></primary>
+ </indexterm>
+ <indexterm><primary><literal>-auto-all</literal></primary>
+ </indexterm>
+ </listitem>
+
+ <listitem>
+ <para> Run your program with one of the profiling options, eg.
+ <literal>+RTS -p -RTS</literal>. This generates a file of
+ profiling information.</para>
+ <indexterm><primary><option>-p</option></primary><secondary>RTS
+ option</secondary></indexterm>
+ </listitem>
+
+ <listitem>
+ <para> Examine the generated profiling information, using one of
+ GHC's profiling tools. The tool to use will depend on the kind
+ of profiling information generated.</para>
+ </listitem>
+
+ </orderedlist>
+
+ <sect1 id="cost-centres">
+ <title>Cost centres and cost-centre stacks</title>
+
+ <para>GHC's profiling system assigns <firstterm>costs</firstterm>
+ to <firstterm>cost centres</firstterm>. A cost is simply the time
+ or space required to evaluate an expression. Cost centres are
+ program annotations around expressions; all costs incurred by the
+ annotated expression are assigned to the enclosing cost centre.
+ Furthermore, GHC will remember the stack of enclosing cost centres
+ for any given expression at run-time and generate a call-graph of
+ cost attributions.</para>
+
+ <para>Let's take a look at an example:</para>
+
+ <programlisting>
+main = print (nfib 25)
+nfib n = if n &lt; 2 then 1 else nfib (n-1) + nfib (n-2)
+</programlisting>
+
+ <para>Compile and run this program as follows:</para>
+
+ <screen>
+$ ghc -prof -auto-all -o Main Main.hs
+$ ./Main +RTS -p
+121393
+$
+</screen>
+
+ <para>When a GHC-compiled program is run with the
+ <option>-p</option> RTS option, it generates a file called
+ <filename>&lt;prog&gt;.prof</filename>. In this case, the file
+ will contain something like this:</para>
+
+<screen>
+ Fri May 12 14:06 2000 Time and Allocation Profiling Report (Final)
+
+ Main +RTS -p -RTS
+
+ total time = 0.14 secs (7 ticks @ 20 ms)
+ total alloc = 8,741,204 bytes (excludes profiling overheads)
+
+COST CENTRE MODULE %time %alloc
+
+nfib Main 100.0 100.0
+
+
+ individual inherited
+COST CENTRE MODULE entries %time %alloc %time %alloc
+
+MAIN MAIN 0 0.0 0.0 100.0 100.0
+ main Main 0 0.0 0.0 0.0 0.0
+ CAF PrelHandle 3 0.0 0.0 0.0 0.0
+ CAF PrelAddr 1 0.0 0.0 0.0 0.0
+ CAF Main 6 0.0 0.0 100.0 100.0
+ main Main 1 0.0 0.0 100.0 100.0
+ nfib Main 242785 100.0 100.0 100.0 100.0
+</screen>
+
+
+ <para>The first part of the file gives the program name and
+ options, and the total time and total memory allocation measured
+ during the run of the program (note that the total memory
+ allocation figure isn't the same as the amount of
+ <emphasis>live</emphasis> memory needed by the program at any one
+ time; the latter can be determined using heap profiling, which we
+ will describe shortly).</para>
+
+ <para>The second part of the file is a break-down by cost centre
+ of the most costly functions in the program. In this case, there
+ was only one significant function in the program, namely
+ <function>nfib</function>, and it was responsible for 100&percnt;
+ of both the time and allocation costs of the program.</para>
+
+ <para>The third and final section of the file gives a profile
+ break-down by cost-centre stack. This is roughly a call-graph
+ profile of the program. In the example above, it is clear that
+ the costly call to <function>nfib</function> came from
+ <function>main</function>.</para>
+
+ <para>The time and allocation incurred by a given part of the
+ program is displayed in two ways: &ldquo;individual&rdquo;, which
+  is the cost incurred by the code covered by this cost centre
+ stack alone, and &ldquo;inherited&rdquo;, which includes the costs
+ incurred by all the children of this node.</para>
+
+ <para>The usefulness of cost-centre stacks is better demonstrated
+ by modifying the example slightly:</para>
+
+ <programlisting>
+main = print (f 25 + g 25)
+f n = nfib n
+g n = nfib (n `div` 2)
+nfib n = if n &lt; 2 then 1 else nfib (n-1) + nfib (n-2)
+</programlisting>
+
+ <para>Compile and run this program as before, and take a look at
+ the new profiling results:</para>
+
+<screen>
+COST CENTRE MODULE scc %time %alloc %time %alloc
+
+MAIN MAIN 0 0.0 0.0 100.0 100.0
+ main Main 0 0.0 0.0 0.0 0.0
+ CAF PrelHandle 3 0.0 0.0 0.0 0.0
+ CAF PrelAddr 1 0.0 0.0 0.0 0.0
+ CAF Main 9 0.0 0.0 100.0 100.0
+ main Main 1 0.0 0.0 100.0 100.0
+ g Main 1 0.0 0.0 0.0 0.2
+ nfib Main 465 0.0 0.2 0.0 0.2
+ f Main 1 0.0 0.0 100.0 99.8
+ nfib Main 242785 100.0 99.8 100.0 99.8
+</screen>
+
+ <para>Now although we had two calls to <function>nfib</function>
+ in the program, it is immediately clear that it was the call from
+ <function>f</function> which took all the time.</para>
+
+ <para>The actual meaning of the various columns in the output is:</para>
+
+ <variablelist>
+ <varlistentry>
+ <term>entries</term>
+ <listitem>
+ <para>The number of times this particular point in the call
+ graph was entered.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>individual &percnt;time</term>
+ <listitem>
+ <para>The percentage of the total run time of the program
+ spent at this point in the call graph.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>individual &percnt;alloc</term>
+ <listitem>
+ <para>The percentage of the total memory allocations
+ (excluding profiling overheads) of the program made by this
+ call.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>inherited &percnt;time</term>
+ <listitem>
+ <para>The percentage of the total run time of the program
+ spent below this point in the call graph.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>inherited &percnt;alloc</term>
+ <listitem>
+ <para>The percentage of the total memory allocations
+ (excluding profiling overheads) of the program made by this
+ call and all of its sub-calls.</para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+
+ <para>In addition you can use the <option>-P</option> RTS option
+ <indexterm><primary><option>-P</option></primary></indexterm> to
+ get the following additional information:</para>
+
+ <variablelist>
+ <varlistentry>
+ <term><literal>ticks</literal></term>
+ <listitem>
+ <para>The raw number of time &ldquo;ticks&rdquo; which were
+ attributed to this cost-centre; from this, we get the
+ <literal>&percnt;time</literal> figure mentioned
+ above.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>bytes</literal></term>
+ <listitem>
+ <para>Number of bytes allocated in the heap while in this
+ cost-centre; again, this is the raw number from which we get
+ the <literal>&percnt;alloc</literal> figure mentioned
+ above.</para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
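+
+  <para>For example, in the first profile above the whole run
+  accumulated 7 ticks in total (0.14 seconds at 20&nbsp;ms per tick), so
+  the 7 ticks attributed to <function>nfib</function> are what give it
+  the 100.0&percnt; time figure, and its 8,741,204 allocated bytes give
+  the 100.0&percnt; allocation figure.</para>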
+
+ <para>What about recursive functions, and mutually recursive
+ groups of functions? Where are the costs attributed? Well,
+ although GHC does keep information about which groups of functions
+ called each other recursively, this information isn't displayed in
+  the basic time and allocation profile; instead the call-graph is
+ flattened into a tree. The XML profiling tool (described in <xref
+ linkend="prof-xml-tool"/>) will be able to display real loops in
+ the call-graph.</para>
+
+ <sect2><title>Inserting cost centres by hand</title>
+
+ <para>Cost centres are just program annotations. When you say
+ <option>-auto-all</option> to the compiler, it automatically
+ inserts a cost centre annotation around every top-level function
+ in your program, but you are entirely free to add the cost
+ centre annotations yourself.</para>
+
+ <para>The syntax of a cost centre annotation is</para>
+
+ <programlisting>
+ {-# SCC "name" #-} &lt;expression&gt;
+</programlisting>
+
+      <para>where <literal>"name"</literal> is an arbitrary string
+      that will become the name of your cost centre as it appears
+ in the profiling output, and
+ <literal>&lt;expression&gt;</literal> is any Haskell
+ expression. An <literal>SCC</literal> annotation extends as
+ far to the right as possible when parsing.</para>
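+
+    <para>For example, here is a hedged sketch of a program with
+    hand-written cost centres (the names <literal>"build_list"</literal>
+    and <literal>"sum_list"</literal> are arbitrary names invented for
+    this illustration).  The parentheses stop each annotation extending
+    further to the right than intended:</para>
+
+<programlisting>
+main :: IO ()
+main = print total
+  where
+    -- Each SCC names the cost centre to which the costs of the
+    -- annotated expression are attributed in the profile.
+    total = ({-# SCC "sum_list" #-} sum xs)
+    xs    = ({-# SCC "build_list" #-} [1 .. 1000000 :: Integer])
+</programlisting>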
+
+ </sect2>
+
+ <sect2 id="prof-rules">
+ <title>Rules for attributing costs</title>
+
+ <para>The cost of evaluating any expression in your program is
+ attributed to a cost-centre stack using the following rules:</para>
+
+ <itemizedlist>
+ <listitem>
+ <para>If the expression is part of the
+ <firstterm>one-off</firstterm> costs of evaluating the
+ enclosing top-level definition, then costs are attributed to
+ the stack of lexically enclosing <literal>SCC</literal>
+ annotations on top of the special <literal>CAF</literal>
+ cost-centre. </para>
+ </listitem>
+
+ <listitem>
+ <para>Otherwise, costs are attributed to the stack of
+ lexically-enclosing <literal>SCC</literal> annotations,
+ appended to the cost-centre stack in effect at the
+ <firstterm>call site</firstterm> of the current top-level
+ definition<footnote> <para>The call-site is just the place
+ in the source code which mentions the particular function or
+ variable.</para></footnote>. Notice that this is a recursive
+ definition.</para>
+ </listitem>
+
+ <listitem>
+ <para>Time spent in foreign code (see <xref linkend="ffi"/>)
+ is always attributed to the cost centre in force at the
+ Haskell call-site of the foreign function.</para>
+ </listitem>
+ </itemizedlist>
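+
+      <para>As a hedged illustration of the second rule (the definitions
+      below are invented for this example), consider:</para>
+
+<programlisting>
+f x = {-# SCC "f1" #-} g x * 2
+g y = {-# SCC "g1" #-} y + 1
+</programlisting>
+
+      <para>If <function>f</function> is entered while the cost-centre
+      stack <literal>MAIN</literal> is in effect, then the call to
+      <function>g</function> happens under the stack
+      <literal>MAIN, f1</literal>, and so the costs of evaluating
+      <literal>y + 1</literal> are attributed to the stack
+      <literal>MAIN, f1, g1</literal>: the lexically enclosing
+      <literal>SCC</literal> appended to the stack in effect at
+      <function>g</function>'s call site.</para>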
+
+ <para>What do we mean by one-off costs? Well, Haskell is a lazy
+ language, and certain expressions are only ever evaluated once.
+ For example, if we write:</para>
+
+ <programlisting>
+x = nfib 25
+</programlisting>
+
+ <para>then <varname>x</varname> will only be evaluated once (if
+ at all), and subsequent demands for <varname>x</varname> will
+ immediately get to see the cached result. The definition
+ <varname>x</varname> is called a CAF (Constant Applicative
+ Form), because it has no arguments.</para>
+
+ <para>For the purposes of profiling, we say that the expression
+ <literal>nfib 25</literal> belongs to the one-off costs of
+ evaluating <varname>x</varname>.</para>
+
+ <para>Since one-off costs aren't strictly speaking part of the
+ call-graph of the program, they are attributed to a special
+ top-level cost centre, <literal>CAF</literal>. There may be one
+ <literal>CAF</literal> cost centre for each module (the
+ default), or one for each top-level definition with any one-off
+ costs (this behaviour can be selected by giving GHC the
+ <option>-caf-all</option> flag).</para>
+
+ <indexterm><primary><literal>-caf-all</literal></primary>
+ </indexterm>
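+
+      <para>As a hedged illustration (the module below is invented for
+      this example), both of the following top-level definitions are
+      CAFs, so their one-off evaluation costs are attributed to a
+      <literal>CAF</literal> cost centre: one per module by default, or
+      one each if the module is compiled with
+      <option>-caf-all</option>:</para>
+
+<programlisting>
+module Main where
+
+-- Both definitions have no arguments, so each is a CAF: its
+-- right-hand side is evaluated at most once, and that one-off cost
+-- is attributed to a CAF cost centre rather than to any call site.
+table :: [Integer]
+table = map (^ 2) [1 .. 100000]
+
+limit :: Integer
+limit = sum (take 1000 table)
+
+main :: IO ()
+main = print limit
+</programlisting>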
+
+ <para>If you think you have a weird profile, or the call-graph
+ doesn't look like you expect it to, feel free to send it (and
+ your program) to us at
+ <email>glasgow-haskell-bugs@haskell.org</email>.</para>
+ </sect2>
+ </sect1>
+
+ <sect1 id="prof-compiler-options">
+ <title>Compiler options for profiling</title>
+
+ <indexterm><primary>profiling</primary><secondary>options</secondary></indexterm>
+ <indexterm><primary>options</primary><secondary>for profiling</secondary></indexterm>
+
+ <variablelist>
+ <varlistentry>
+ <term>
+ <option>-prof</option>:
+ <indexterm><primary><option>-prof</option></primary></indexterm>
+ </term>
+ <listitem>
+ <para> To make use of the profiling system
+ <emphasis>all</emphasis> modules must be compiled and linked
+ with the <option>-prof</option> option. Any
+ <literal>SCC</literal> annotations you've put in your source
+ will spring to life.</para>
+
+ <para> Without a <option>-prof</option> option, your
+ <literal>SCC</literal>s are ignored; so you can compile
+ <literal>SCC</literal>-laden code without changing
+ it.</para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+
+ <para>There are a few other profiling-related compilation options.
+ Use them <emphasis>in addition to</emphasis>
+ <option>-prof</option>. These do not have to be used consistently
+ for all modules in a program.</para>
+
+ <variablelist>
+ <varlistentry>
+ <term>
+ <option>-auto</option>:
+ <indexterm><primary><option>-auto</option></primary></indexterm>
+ <indexterm><primary>cost centres</primary><secondary>automatically inserting</secondary></indexterm>
+ </term>
+ <listitem>
+ <para> GHC will automatically add
+ <function>&lowbar;scc&lowbar;</function> constructs for all
+ top-level, exported functions.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option>-auto-all</option>:
+ <indexterm><primary><option>-auto-all</option></primary></indexterm>
+ </term>
+ <listitem>
+ <para> <emphasis>All</emphasis> top-level functions,
+ exported or not, will be automatically
+ <function>&lowbar;scc&lowbar;</function>'d.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option>-caf-all</option>:
+ <indexterm><primary><option>-caf-all</option></primary></indexterm>
+ </term>
+ <listitem>
+ <para> The costs of all CAFs in a module are usually
+ attributed to one &ldquo;big&rdquo; CAF cost-centre. With
+ this option, all CAFs get their own cost-centre. An
+ &ldquo;if all else fails&rdquo; option&hellip;</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option>-ignore-scc</option>:
+ <indexterm><primary><option>-ignore-scc</option></primary></indexterm>
+ </term>
+ <listitem>
+ <para>Ignore any <function>&lowbar;scc&lowbar;</function>
+ constructs, so a module which already has
+ <function>&lowbar;scc&lowbar;</function>s can be compiled
+ for profiling with the annotations ignored.</para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
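+
+  <para>For example (a hedged sketch; <filename>Main.hs</filename>
+  stands for your own module), a program can be compiled for profiling
+  with automatic cost centres on all top-level functions and a separate
+  cost centre for each CAF like this:</para>
+
+<screen>
+$ ghc -prof -auto-all -caf-all -o Main Main.hs
+</screen>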
+
+ </sect1>
+
+ <sect1 id="prof-time-options">
+ <title>Time and allocation profiling</title>
+
+ <para>To generate a time and allocation profile, give one of the
+ following RTS options to the compiled program when you run it (RTS
+ options should be enclosed between <literal>+RTS...-RTS</literal>
+ as usual):</para>
+
+ <variablelist>
+ <varlistentry>
+ <term>
+ <option>-p</option> or <option>-P</option>:
+ <indexterm><primary><option>-p</option></primary></indexterm>
+ <indexterm><primary><option>-P</option></primary></indexterm>
+ <indexterm><primary>time profile</primary></indexterm>
+ </term>
+ <listitem>
+ <para>The <option>-p</option> option produces a standard
+ <emphasis>time profile</emphasis> report. It is written
+ into the file
+ <filename><replaceable>program</replaceable>.prof</filename>.</para>
+
+ <para>The <option>-P</option> option produces a more
+ detailed report containing the actual time and allocation
+ data as well. (Not used much.)</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option>-px</option>:
+ <indexterm><primary><option>-px</option></primary></indexterm>
+ </term>
+ <listitem>
+ <para>The <option>-px</option> option generates profiling
+ information in the XML format understood by our new
+        profiling tool; see <xref linkend="prof-xml-tool"/>.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option>-xc</option>
+ <indexterm><primary><option>-xc</option></primary><secondary>RTS option</secondary></indexterm>
+ </term>
+ <listitem>
+ <para>This option makes use of the extra information
+ maintained by the cost-centre-stack profiler to provide
+ useful information about the location of runtime errors.
+ See <xref linkend="rts-options-debugging"/>.</para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ </sect1>
+
+ <sect1 id="prof-heap">
+ <title>Profiling memory usage</title>
+
+ <para>In addition to profiling the time and allocation behaviour
+ of your program, you can also generate a graph of its memory usage
+ over time. This is useful for detecting the causes of
+ <firstterm>space leaks</firstterm>, when your program holds on to
+    more memory at run-time than it needs to.  Space leaks lead to
+ longer run-times due to heavy garbage collector activity, and may
+ even cause the program to run out of memory altogether.</para>
+
+ <para>To generate a heap profile from your program:</para>
+
+ <orderedlist>
+ <listitem>
+ <para>Compile the program for profiling (<xref
+ linkend="prof-compiler-options"/>).</para>
+ </listitem>
+ <listitem>
+ <para>Run it with one of the heap profiling options described
+ below (eg. <option>-hc</option> for a basic producer profile).
+ This generates the file
+ <filename><replaceable>prog</replaceable>.hp</filename>.</para>
+ </listitem>
+ <listitem>
+        <para>Run <command>hp2ps</command> to produce a PostScript
+ file,
+ <filename><replaceable>prog</replaceable>.ps</filename>. The
+ <command>hp2ps</command> utility is described in detail in
+ <xref linkend="hp2ps"/>.</para>
+ </listitem>
+ <listitem>
+        <para>Display the heap profile using a PostScript viewer such
+        as <application>Ghostview</application>, or print it out on a
+        PostScript-capable printer.</para>
+ </listitem>
+ </orderedlist>
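+
+  <para>Putting these steps together (a hedged sketch; substitute your
+  own program for <filename>Main</filename> and your preferred viewer
+  for <command>gv</command>):</para>
+
+<screen>
+$ ghc -prof -auto-all -o Main Main.hs
+$ ./Main +RTS -hc -RTS
+$ hp2ps -c Main.hp
+$ gv Main.ps
+</screen>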
+
+ <sect2 id="rts-options-heap-prof">
+ <title>RTS options for heap profiling</title>
+
+ <para>There are several different kinds of heap profile that can
+ be generated. All the different profile types yield a graph of
+ live heap against time, but they differ in how the live heap is
+ broken down into bands. The following RTS options select which
+ break-down to use:</para>
+
+ <variablelist>
+ <varlistentry>
+ <term>
+ <option>-hc</option>
+ <indexterm><primary><option>-hc</option></primary><secondary>RTS option</secondary></indexterm>
+ </term>
+ <listitem>
+ <para>Breaks down the graph by the cost-centre stack which
+ produced the data.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option>-hm</option>
+ <indexterm><primary><option>-hm</option></primary><secondary>RTS option</secondary></indexterm>
+ </term>
+ <listitem>
+          <para>Breaks down the live heap by the module containing
+ the code which produced the data.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option>-hd</option>
+ <indexterm><primary><option>-hd</option></primary><secondary>RTS option</secondary></indexterm>
+ </term>
+ <listitem>
+ <para>Breaks down the graph by <firstterm>closure
+ description</firstterm>. For actual data, the description
+ is just the constructor name, for other closures it is a
+ compiler-generated string identifying the closure.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option>-hy</option>
+ <indexterm><primary><option>-hy</option></primary><secondary>RTS option</secondary></indexterm>
+ </term>
+ <listitem>
+ <para>Breaks down the graph by
+ <firstterm>type</firstterm>. For closures which have
+ function type or unknown/polymorphic type, the string will
+ represent an approximation to the actual type.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option>-hr</option>
+ <indexterm><primary><option>-hr</option></primary><secondary>RTS option</secondary></indexterm>
+ </term>
+ <listitem>
+          <para>Breaks down the graph by <firstterm>retainer
+ set</firstterm>. Retainer profiling is described in more
+ detail below (<xref linkend="retainer-prof"/>).</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option>-hb</option>
+ <indexterm><primary><option>-hb</option></primary><secondary>RTS option</secondary></indexterm>
+ </term>
+ <listitem>
+          <para>Breaks down the graph by
+ <firstterm>biography</firstterm>. Biographical profiling
+ is described in more detail below (<xref
+ linkend="biography-prof"/>).</para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+
+ <para>In addition, the profile can be restricted to heap data
+ which satisfies certain criteria - for example, you might want
+ to display a profile by type but only for data produced by a
+ certain module, or a profile by retainer for a certain type of
+ data. Restrictions are specified as follows:</para>
+
+ <variablelist>
+ <varlistentry>
+ <term>
+ <option>-hc</option><replaceable>name</replaceable>,...
+ <indexterm><primary><option>-hc</option></primary><secondary>RTS option</secondary></indexterm>
+ </term>
+ <listitem>
+ <para>Restrict the profile to closures produced by
+ cost-centre stacks with one of the specified cost centres
+ at the top.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option>-hC</option><replaceable>name</replaceable>,...
+ <indexterm><primary><option>-hC</option></primary><secondary>RTS option</secondary></indexterm>
+ </term>
+ <listitem>
+ <para>Restrict the profile to closures produced by
+ cost-centre stacks with one of the specified cost centres
+ anywhere in the stack.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option>-hm</option><replaceable>module</replaceable>,...
+ <indexterm><primary><option>-hm</option></primary><secondary>RTS option</secondary></indexterm>
+ </term>
+ <listitem>
+ <para>Restrict the profile to closures produced by the
+ specified modules.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option>-hd</option><replaceable>desc</replaceable>,...
+ <indexterm><primary><option>-hd</option></primary><secondary>RTS option</secondary></indexterm>
+ </term>
+ <listitem>
+ <para>Restrict the profile to closures with the specified
+ description strings.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option>-hy</option><replaceable>type</replaceable>,...
+ <indexterm><primary><option>-hy</option></primary><secondary>RTS option</secondary></indexterm>
+ </term>
+ <listitem>
+ <para>Restrict the profile to closures with the specified
+ types.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option>-hr</option><replaceable>cc</replaceable>,...
+ <indexterm><primary><option>-hr</option></primary><secondary>RTS option</secondary></indexterm>
+ </term>
+ <listitem>
+ <para>Restrict the profile to closures with retainer sets
+ containing cost-centre stacks with one of the specified
+ cost centres at the top.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option>-hb</option><replaceable>bio</replaceable>,...
+ <indexterm><primary><option>-hb</option></primary><secondary>RTS option</secondary></indexterm>
+ </term>
+ <listitem>
+ <para>Restrict the profile to closures with one of the
+ specified biographies, where
+ <replaceable>bio</replaceable> is one of
+ <literal>lag</literal>, <literal>drag</literal>,
+ <literal>void</literal>, or <literal>use</literal>.</para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+
+ <para>For example, the following options will generate a
+ retainer profile restricted to <literal>Branch</literal> and
+ <literal>Leaf</literal> constructors:</para>
+
+<screen>
+<replaceable>prog</replaceable> +RTS -hr -hdBranch,Leaf
+</screen>
+
+    <para>There can only be one &ldquo;break-down&rdquo; option
+ (eg. <option>-hr</option> in the example above), but there is no
+ limit on the number of further restrictions that may be applied.
+ All the options may be combined, with one exception: GHC doesn't
+ currently support mixing the <option>-hr</option> and
+ <option>-hb</option> options.</para>
+
+ <para>There are two more options which relate to heap
+ profiling:</para>
+
+ <variablelist>
+ <varlistentry>
+ <term>
+ <option>-i<replaceable>secs</replaceable></option>:
+ <indexterm><primary><option>-i</option></primary></indexterm>
+ </term>
+ <listitem>
+ <para>Set the profiling (sampling) interval to
+ <replaceable>secs</replaceable> seconds (the default is
+ 0.1&nbsp;second). Fractions are allowed: for example
+          <option>-i0.2</option> will give 5 samples per second.
+          This only affects heap profiling; time profiles are always
+          sampled at 1/50 second intervals.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option>-xt</option>
+ <indexterm><primary><option>-xt</option></primary><secondary>RTS option</secondary></indexterm>
+ </term>
+ <listitem>
+ <para>Include the memory occupied by threads in a heap
+ profile. Each thread takes up a small area for its thread
+ state in addition to the space allocated for its stack
+ (stacks normally start small and then grow as
+ necessary).</para>
+
+ <para>This includes the main thread, so using
+ <option>-xt</option> is a good way to see how much stack
+ space the program is using.</para>
+
+ <para>Memory occupied by threads and their stacks is
+ labelled as &ldquo;TSO&rdquo; when displaying the profile
+ by closure description or type description.</para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
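+
+    <para>As a hedged example of combining these options, the following
+    run produces a heap profile broken down by type, sampled twice a
+    second (<replaceable>prog</replaceable> stands for your own
+    program):</para>
+
+<screen>
+<replaceable>prog</replaceable> +RTS -hy -i0.5 -RTS
+</screen>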
+
+ </sect2>
+
+ <sect2 id="retainer-prof">
+ <title>Retainer Profiling</title>
+
+ <para>Retainer profiling is designed to help answer questions
+ like <quote>why is this data being retained?</quote>. We start
+ by defining what we mean by a retainer:</para>
+
+ <blockquote>
+ <para>A retainer is either the system stack, or an unevaluated
+ closure (thunk).</para>
+ </blockquote>
+
+ <para>In particular, constructors are <emphasis>not</emphasis>
+ retainers.</para>
+
+ <para>An object B retains object A if (i) B is a retainer object and
+ (ii) object A can be reached by recursively following pointers
+ starting from object B, but not meeting any other retainer
+ objects on the way. Each live object is retained by one or more
+      retainer objects, collectively called its
+      <firstterm>retainer set</firstterm>, or its
+      <firstterm>retainers</firstterm>.</para>
+
+ <para>When retainer profiling is requested by giving the program
+ the <option>-hr</option> option, a graph is generated which is
+ broken down by retainer set. A retainer set is displayed as a
+ set of cost-centre stacks; because this is usually too large to
+ fit on the profile graph, each retainer set is numbered and
+ shown abbreviated on the graph along with its number, and the
+ full list of retainer sets is dumped into the file
+ <filename><replaceable>prog</replaceable>.prof</filename>.</para>
+
+ <para>Retainer profiling requires multiple passes over the live
+ heap in order to discover the full retainer set for each
+ object, which can be quite slow. So we set a limit on the
+ maximum size of a retainer set, where all retainer sets larger
+ than the maximum retainer set size are replaced by the special
+ set <literal>MANY</literal>. The maximum set size defaults to 8
+ and can be altered with the <option>-R</option> RTS
+ option:</para>
+
+ <variablelist>
+ <varlistentry>
+ <term><option>-R</option><replaceable>size</replaceable></term>
+ <listitem>
+ <para>Restrict the number of elements in a retainer set to
+ <replaceable>size</replaceable> (default 8).</para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+
+ <sect3>
+ <title>Hints for using retainer profiling</title>
+
+ <para>The definition of retainers is designed to reflect a
+ common cause of space leaks: a large structure is retained by
+ an unevaluated computation, and will be released once the
+ computation is forced. A good example is looking up a value in
+ a finite map, where unless the lookup is forced in a timely
+ manner the unevaluated lookup will cause the whole mapping to
+      be retained.  These kinds of space leaks can often be
+ eliminated by forcing the relevant computations to be
+ performed eagerly, using <literal>seq</literal> or strictness
+ annotations on data constructor fields.</para>
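+
+      <para>The following is a hedged sketch of this kind of leak and
+      its cure; it uses the <literal>Data.Map</literal> library purely
+      for illustration:</para>
+
+<programlisting>
+import qualified Data.Map as Map
+import Data.IORef
+
+-- Storing the unevaluated lookup keeps the whole map alive, because
+-- the stored thunk holds a reference to it.
+cacheLeaky :: IORef (Maybe String) -> Int -> Map.Map Int String -> IO ()
+cacheLeaky ref k m = writeIORef ref (Map.lookup k m)
+
+-- Forcing the lookup with seq before storing it means only the
+-- (small) result is retained, and the map itself can be collected.
+cacheStrict :: IORef (Maybe String) -> Int -> Map.Map Int String -> IO ()
+cacheStrict ref k m = let r = Map.lookup k m
+                      in r `seq` writeIORef ref r
+</programlisting>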
+
+ <para>Often a particular data structure is being retained by a
+ chain of unevaluated closures, only the nearest of which will
+ be reported by retainer profiling - for example A retains B, B
+ retains C, and C retains a large structure. There might be a
+ large number of Bs but only a single A, so A is really the one
+ we're interested in eliminating. However, retainer profiling
+ will in this case report B as the retainer of the large
+ structure. To move further up the chain of retainers, we can
+ ask for another retainer profile but this time restrict the
+ profile to B objects, so we get a profile of the retainers of
+ B:</para>
+
+<screen>
+<replaceable>prog</replaceable> +RTS -hr -hcB
+</screen>
+
+ <para>This trick isn't foolproof, because there might be other
+ B closures in the heap which aren't the retainers we are
+ interested in, but we've found this to be a useful technique
+ in most cases.</para>
+ </sect3>
+ </sect2>
+
+ <sect2 id="biography-prof">
+ <title>Biographical Profiling</title>
+
+ <para>A typical heap object may be in one of the following four
+ states at each point in its lifetime:</para>
+
+ <itemizedlist>
+ <listitem>
+ <para>The <firstterm>lag</firstterm> stage, which is the
+ time between creation and the first use of the
+ object,</para>
+ </listitem>
+ <listitem>
+ <para>the <firstterm>use</firstterm> stage, which lasts from
+ the first use until the last use of the object, and</para>
+ </listitem>
+ <listitem>
+ <para>The <firstterm>drag</firstterm> stage, which lasts
+ from the final use until the last reference to the object
+ is dropped.</para>
+ </listitem>
+ <listitem>
+ <para>An object which is never used is said to be in the
+ <firstterm>void</firstterm> state for its whole
+ lifetime.</para>
+ </listitem>
+ </itemizedlist>
+
+ <para>A biographical heap profile displays the portion of the
+ live heap in each of the four states listed above. Usually the
+ most interesting states are the void and drag states: live heap
+ in these states is more likely to be wasted space than heap in
+ the lag or use states.</para>
+
+ <para>It is also possible to break down the heap in one or more
+    of these states by a different criterion, by restricting a
+ profile by biography. For example, to show the portion of the
+ heap in the drag or void state by producer: </para>
+
+<screen>
+<replaceable>prog</replaceable> +RTS -hc -hbdrag,void
+</screen>
+
+ <para>Once you know the producer or the type of the heap in the
+ drag or void states, the next step is usually to find the
+ retainer(s):</para>
+
+<screen>
+<replaceable>prog</replaceable> +RTS -hr -hc<replaceable>cc</replaceable>...
+</screen>
+
+    <para>NOTE: this two-stage process is required because GHC
+ cannot currently profile using both biographical and retainer
+ information simultaneously.</para>
+ </sect2>
+
+ <sect2 id="mem-residency">
+ <title>Actual memory residency</title>
+
+ <para>How does the heap residency reported by the heap profiler relate to
+ the actual memory residency of your program when you run it? You might
+ see a large discrepancy between the residency reported by the heap
+ profiler, and the residency reported by tools on your system
+ (eg. <literal>ps</literal> or <literal>top</literal> on Unix, or the
+ Task Manager on Windows). There are several reasons for this:</para>
+
+ <itemizedlist>
+ <listitem>
+ <para>There is an overhead of profiling itself, which is subtracted
+ from the residency figures by the profiler. This overhead goes
+ away when compiling without profiling support, of course. The
+ space overhead is currently 2 extra
+ words per heap object, which probably results in
+ about a 30% overhead.</para>
+ </listitem>
+
+ <listitem>
+ <para>Garbage collection requires more memory than the actual
+ residency. The factor depends on the kind of garbage collection
+ algorithm in use: a major GC in the standard
+        generational copying collector will usually require 3L bytes of
+ memory, where L is the amount of live data. This is because by
+ default (see the <option>+RTS -F</option> option) we allow the old
+ generation to grow to twice its size (2L) before collecting it, and
+ we require additionally L bytes to copy the live data into. When
+ using compacting collection (see the <option>+RTS -c</option>
+ option), this is reduced to 2L, and can further be reduced by
+ tweaking the <option>-F</option> option. Also add the size of the
+        allocation area (currently a fixed 512Kb); see the worked
+        example after this list.</para>
+ </listitem>
+
+ <listitem>
+ <para>The stack isn't counted in the heap profile by default. See the
+ <option>+RTS -xt</option> option.</para>
+ </listitem>
+
+ <listitem>
+ <para>The program text itself, the C stack, any non-heap data (eg. data
+ allocated by foreign libraries, and data allocated by the RTS), and
+ <literal>mmap()</literal>'d memory are not counted in the heap profile.</para>
+ </listitem>
+ </itemizedlist>
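+
+      <para>As a rough worked example (the numbers are illustrative, not
+      from a real run): if the heap profiler reports 10 megabytes of
+      live data, a major collection under the default copying collector
+      may need around 3L = 30 megabytes, plus the fixed 512Kb allocation
+      area, before the stack and the other non-heap memory mentioned
+      above are counted.</para>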
+ </sect2>
+
+ </sect1>
+
+ <sect1 id="prof-xml-tool">
+ <title>Graphical time/allocation profile</title>
+
+ <para>You can view the time and allocation profiling graph of your
+ program graphically, using <command>ghcprof</command>. This is a
+  new tool with GHC 4.08, and will eventually be the de facto
+ standard way of viewing GHC profiles<footnote><para>Actually this
+  isn't true any more; we are working on a new tool for
+ displaying heap profiles using Gtk+HS, so
+ <command>ghcprof</command> may go away at some point in the future.</para>
+ </footnote></para>
+
+ <para>To run <command>ghcprof</command>, you need
+ <productname>uDraw(Graph)</productname> installed, which can be
+ obtained from <ulink
+ url="http://www.informatik.uni-bremen.de/uDrawGraph/en/uDrawGraph/uDrawGraph.html"><citetitle>uDraw(Graph)</citetitle></ulink>. Install one of
+ the binary
+ distributions, and set your
+ <envar>UDG_HOME</envar> environment variable to point to the
+ installation directory.</para>
+
+ <para><command>ghcprof</command> uses an XML-based profiling log
+ format, and you therefore need to run your program with a
+ different option: <option>-px</option>. The file generated is
+ still called <filename>&lt;prog&gt;.prof</filename>. To see the
+ profile, run <command>ghcprof</command> like this:</para>
+
+ <indexterm><primary><option>-px</option></primary></indexterm>
+
+<screen>
+$ ghcprof &lt;prog&gt;.prof
+</screen>
+
+ <para>which should pop up a window showing the call-graph of your
+ program in glorious detail. More information on using
+ <command>ghcprof</command> can be found at <ulink
+ url="http://www.dcs.warwick.ac.uk/people/academic/Stephen.Jarvis/profiler/index.html"><citetitle>The
+ Cost-Centre Stack Profiling Tool for
+ GHC</citetitle></ulink>.</para>
+
+ </sect1>
+
+ <sect1 id="hp2ps">
+ <title><command>hp2ps</command>&ndash;&ndash;heap profile to PostScript</title>
+
+ <indexterm><primary><command>hp2ps</command></primary></indexterm>
+ <indexterm><primary>heap profiles</primary></indexterm>
+ <indexterm><primary>postscript, from heap profiles</primary></indexterm>
+ <indexterm><primary><option>-h&lt;break-down&gt;</option></primary></indexterm>
+
+ <para>Usage:</para>
+
+<screen>
+hp2ps [flags] [&lt;file&gt;[.hp]]
+</screen>
+
+ <para>The program
+ <command>hp2ps</command><indexterm><primary>hp2ps
+ program</primary></indexterm> converts a heap profile as produced
+ by the <option>-h&lt;break-down&gt;</option> runtime option into a
+ PostScript graph of the heap profile. By convention, the file to
+ be processed by <command>hp2ps</command> has a
+ <filename>.hp</filename> extension. The PostScript output is
+  written to <filename>&lt;file&gt;.ps</filename>.  If
+ <filename>&lt;file&gt;</filename> is omitted entirely, then the
+ program behaves as a filter.</para>
+
+ <para><command>hp2ps</command> is distributed in
+ <filename>ghc/utils/hp2ps</filename> in a GHC source
+ distribution. It was originally developed by Dave Wakeling as part
+ of the HBC/LML heap profiler.</para>
+
+ <para>The flags are:</para>
+
+ <variablelist>
+
+ <varlistentry>
+ <term><option>-d</option></term>
+ <listitem>
+ <para>In order to make graphs more readable,
+ <command>hp2ps</command> sorts the shaded bands for each
+ identifier. The default sort ordering is for the bands with
+ the largest area to be stacked on top of the smaller ones.
+ The <option>-d</option> option causes rougher bands (those
+ representing series of values with the largest standard
+ deviations) to be stacked on top of smoother ones.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-b</option></term>
+ <listitem>
+ <para>Normally, <command>hp2ps</command> puts the title of
+ the graph in a small box at the top of the page. However, if
+ the JOB string is too long to fit in a small box (more than
+ 35 characters), then <command>hp2ps</command> will choose to
+ use a big box instead. The <option>-b</option> option
+ forces <command>hp2ps</command> to use a big box.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-e&lt;float&gt;[in&verbar;mm&verbar;pt]</option></term>
+ <listitem>
+ <para>Generate encapsulated PostScript suitable for
+ inclusion in LaTeX documents. Usually, the PostScript graph
+ is drawn in landscape mode in an area 9 inches wide by 6
+ inches high, and <command>hp2ps</command> arranges for this
+          area to be approximately centred on a sheet of A4 paper.
+          This format is convenient for studying the graph in detail,
+ but it is unsuitable for inclusion in LaTeX documents. The
+ <option>-e</option> option causes the graph to be drawn in
+ portrait mode, with float specifying the width in inches,
+ millimetres or points (the default). The resulting
+ PostScript file conforms to the Encapsulated PostScript
+ (EPS) convention, and it can be included in a LaTeX document
+ using Rokicki's dvi-to-PostScript converter
+ <command>dvips</command>.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-g</option></term>
+ <listitem>
+ <para>Create output suitable for the <command>gs</command>
+ PostScript previewer (or similar). In this case the graph is
+ printed in portrait mode without scaling. The output is
+ unsuitable for a laser printer.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-l</option></term>
+ <listitem>
+ <para>Normally a profile is limited to 20 bands with
+ additional identifiers being grouped into an
+ <literal>OTHER</literal> band. The <option>-l</option> flag
+          removes this 20 band limit, producing as many bands as
+          necessary. No key is produced as it won't fit! It is useful
+          for displaying creation-time profiles with many bands.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-m&lt;int&gt;</option></term>
+ <listitem>
+ <para>Normally a profile is limited to 20 bands with
+ additional identifiers being grouped into an
+ <literal>OTHER</literal> band. The <option>-m</option> flag
+ specifies an alternative band limit (the maximum is
+ 20).</para>
+
+ <para><option>-m0</option> requests the band limit to be
+          removed. As many bands as necessary are produced. However, no
+          key is produced as it won't fit! It is useful for displaying
+          creation-time profiles with many bands.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-p</option></term>
+ <listitem>
+ <para>Use previous parameters. By default, the PostScript
+ graph is automatically scaled both horizontally and
+ vertically so that it fills the page. However, when
+ preparing a series of graphs for use in a presentation, it
+ is often useful to draw a new graph using the same scale,
+ shading and ordering as a previous one. The
+ <option>-p</option> flag causes the graph to be drawn using
+ the parameters determined by a previous run of
+ <command>hp2ps</command> on <filename>file</filename>. These
+          are extracted from <filename>file.aux</filename>.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-s</option></term>
+ <listitem>
+ <para>Use a small box for the title.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-t&lt;float&gt;</option></term>
+ <listitem>
+ <para>Normally trace elements which sum to a total of less
+ than 1&percnt; of the profile are removed from the
+ profile. The <option>-t</option> option allows this
+ percentage to be modified (maximum 5&percnt;).</para>
+
+ <para><option>-t0</option> requests no trace elements to be
+ removed from the profile, ensuring that all the data will be
+ displayed.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-c</option></term>
+ <listitem>
+ <para>Generate colour output.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-y</option></term>
+ <listitem>
+ <para>Ignore marks.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-?</option></term>
+ <listitem>
+ <para>Print out usage information.</para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
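+
+  <para>For example (a hedged sketch), to produce a colour graph showing
+  at most ten bands from the profile <filename>FOO.hp</filename>:</para>
+
+<screen>
+hp2ps -c -m10 FOO.hp
+</screen>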
+
+
+ <sect2 id="manipulating-hp">
+ <title>Manipulating the hp file</title>
+
+<para>(Notes kindly offered by Jan-Willem Maessen.)</para>
+
+<para>
+The <filename>FOO.hp</filename> file produced when you ask for the
+heap profile of a program <filename>FOO</filename> is a text file with a particularly
+simple structure. Here's a representative example, with much of the
+actual data omitted:
+<screen>
+JOB "FOO -hC"
+DATE "Thu Dec 26 18:17 2002"
+SAMPLE_UNIT "seconds"
+VALUE_UNIT "bytes"
+BEGIN_SAMPLE 0.00
+END_SAMPLE 0.00
+BEGIN_SAMPLE 15.07
+ ... sample data ...
+END_SAMPLE 15.07
+BEGIN_SAMPLE 30.23
+ ... sample data ...
+END_SAMPLE 30.23
+... etc.
+BEGIN_SAMPLE 11695.47
+END_SAMPLE 11695.47
+</screen>
+The first four lines (<literal>JOB</literal>, <literal>DATE</literal>, <literal>SAMPLE_UNIT</literal>, <literal>VALUE_UNIT</literal>) form a
+header. Each block of lines starting with <literal>BEGIN_SAMPLE</literal> and ending
+with <literal>END_SAMPLE</literal> forms a single sample (you can think of this as a
+vertical slice of your heap profile). The <command>hp2ps</command> utility should accept
+any input with a properly-formatted header followed by a series of
+<emphasis>complete</emphasis> samples.
+</para>
+</sect2>
+
+ <sect2>
+ <title>Zooming in on regions of your profile</title>
+
+<para>
+You can look at particular regions of your profile simply by loading a
+copy of the <filename>.hp</filename> file into a text editor and deleting the unwanted
+samples. The resulting <filename>.hp</filename> file can be run through <command>hp2ps</command> and viewed
+or printed.
+</para>
+</sect2>
+
+ <sect2>
+ <title>Viewing the heap profile of a running program</title>
+
+<para>
+The <filename>.hp</filename> file is generated incrementally as your
+program runs. In principle, running <command>hp2ps</command> on the incomplete file
+should produce a snapshot of your program's heap usage. However, the
+last sample in the file may be incomplete, causing <command>hp2ps</command> to fail. If
+you are using a machine with UNIX utilities installed, it's not too
+hard to work around this problem (though the resulting command line
+looks rather Byzantine):
+<screen>
+ head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
+ | hp2ps > FOO.ps
+</screen>
+
+The command <command>fgrep -n END_SAMPLE FOO.hp</command> finds the
+end of every complete sample in <filename>FOO.hp</filename>, and labels each sample with
+its ending line number. We then select the line number of the last
+complete sample using <command>tail</command> and <command>cut</command>. This is used as a
+parameter to <command>head</command>; the result is as if we deleted the final
+incomplete sample from <filename>FOO.hp</filename>. This results in a properly-formatted
+.hp file which we feed directly to <command>hp2ps</command>.
+</para>
+</sect2>
+ <sect2>
+ <title>Viewing a heap profile in real time</title>
+
+<para>
+The <command>gv</command> and <command>ghostview</command> programs
+have a "watch file" option which can be used to view an up-to-date heap
+profile of your program as it runs. Simply generate an incremental
+heap profile as described in the previous section. Run <command>gv</command> on your
+profile:
+<screen>
+ gv -watch -seascape FOO.ps
+</screen>
+If you forget the <literal>-watch</literal> flag you can still select
+"Watch file" from the "State" menu. Now each time you generate a new
+profile <filename>FOO.ps</filename> the view will update automatically.
+</para>
+
+<para>
+This can all be encapsulated in a little script:
+<screen>
+ #!/bin/sh
+ head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
+ | hp2ps > FOO.ps
+ gv -watch -seascape FOO.ps &amp;
+ while [ 1 ] ; do
+ sleep 10 # We generate a new profile every 10 seconds.
+ head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
+ | hp2ps > FOO.ps
+ done
+</screen>
+Occasionally <command>gv</command> will choke as it tries to read an incomplete copy of
+<filename>FOO.ps</filename> (because <command>hp2ps</command> is still running as an update
+occurs). A slightly more complicated script works around this
+problem, by using the fact that sending a SIGHUP to gv will cause it
+to re-read its input file:
+<screen>
+ #!/bin/sh
+ head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
+ | hp2ps > FOO.ps
+ gv FOO.ps &amp;
+ gvpsnum=$!
+ while [ 1 ] ; do
+ sleep 10
+ head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
+ | hp2ps > FOO.ps
+ kill -HUP $gvpsnum
+ done
+</screen>
+</para>
+</sect2>
+
+
+ </sect1>
+
+ <sect1 id="ticky-ticky">
+ <title>Using &ldquo;ticky-ticky&rdquo; profiling (for implementors)</title>
+ <indexterm><primary>ticky-ticky profiling</primary></indexterm>
+
+ <para>(ToDo: document properly.)</para>
+
+ <para>It is possible to compile Glasgow Haskell programs so that
+ they will count lots and lots of interesting things, e.g., number
+ of updates, number of data constructors entered, etc., etc. We
+ call this &ldquo;ticky-ticky&rdquo;
+ profiling,<indexterm><primary>ticky-ticky
+ profiling</primary></indexterm> <indexterm><primary>profiling,
+ ticky-ticky</primary></indexterm> because that's the sound a Sun4
+ makes when it is running up all those counters
+ (<emphasis>slowly</emphasis>).</para>
+
+ <para>Ticky-ticky profiling is mainly intended for implementors;
+ it is quite separate from the main &ldquo;cost-centre&rdquo;
+ profiling system, intended for all users everywhere.</para>
+
+ <para>To be able to use ticky-ticky profiling, you will need to
+ have built appropriate libraries and things when you made the
+ system. See &ldquo;Customising what libraries to build,&rdquo; in
+ the installation guide.</para>
+
+ <para>To get your compiled program to spit out the ticky-ticky
+ numbers, use a <option>-r</option> RTS
+ option<indexterm><primary>-r RTS option</primary></indexterm>.
+ See <xref linkend="runtime-control"/>.</para>
+
+ <para>Compiling your program with the <option>-ticky</option>
+ switch yields an executable that performs these counts. Here is a
+ sample ticky-ticky statistics file, generated by the invocation
+ <command>foo +RTS -rfoo.ticky</command>.</para>
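+
+  <para>(As a hedged sketch, the corresponding compile command would be
+  something like the following; the sample output itself appears
+  below.)</para>
+
+<screen>
+$ ghc -ticky -o foo Foo.hs
+</screen>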
+
+<screen>
+ foo +RTS -rfoo.ticky
+
+
+ALLOCATIONS: 3964631 (11330900 words total: 3999476 admin, 6098829 goods, 1232595 slop)
+ total words: 2 3 4 5 6+
+ 69647 ( 1.8%) function values 50.0 50.0 0.0 0.0 0.0
+2382937 ( 60.1%) thunks 0.0 83.9 16.1 0.0 0.0
+1477218 ( 37.3%) data values 66.8 33.2 0.0 0.0 0.0
+ 0 ( 0.0%) big tuples
+ 2 ( 0.0%) black holes 0.0 100.0 0.0 0.0 0.0
+ 0 ( 0.0%) prim things
+ 34825 ( 0.9%) partial applications 0.0 0.0 0.0 100.0 0.0
+ 2 ( 0.0%) thread state objects 0.0 0.0 0.0 0.0 100.0
+
+Total storage-manager allocations: 3647137 (11882004 words)
+ [551104 words lost to speculative heap-checks]
+
+STACK USAGE:
+
+ENTERS: 9400092 of which 2005772 (21.3%) direct to the entry code
+ [the rest indirected via Node's info ptr]
+1860318 ( 19.8%) thunks
+3733184 ( 39.7%) data values
+3149544 ( 33.5%) function values
+ [of which 1999880 (63.5%) bypassed arg-satisfaction chk]
+ 348140 ( 3.7%) partial applications
+ 308906 ( 3.3%) normal indirections
+ 0 ( 0.0%) permanent indirections
+
+RETURNS: 5870443
+2137257 ( 36.4%) from entering a new constructor
+ [the rest from entering an existing constructor]
+2349219 ( 40.0%) vectored [the rest unvectored]
+
+RET_NEW: 2137257: 32.5% 46.2% 21.3% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
+RET_OLD: 3733184: 2.8% 67.9% 29.3% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
+RET_UNBOXED_TUP: 2: 0.0% 0.0%100.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
+
+RET_VEC_RETURN : 2349219: 0.0% 0.0%100.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
+
+UPDATE FRAMES: 2241725 (0 omitted from thunks)
+SEQ FRAMES: 1
+CATCH FRAMES: 1
+UPDATES: 2241725
+ 0 ( 0.0%) data values
+ 34827 ( 1.6%) partial applications
+ [2 in place, 34825 allocated new space]
+2206898 ( 98.4%) updates to existing heap objects (46 by squeezing)
+UPD_CON_IN_NEW: 0: 0 0 0 0 0 0 0 0 0
+UPD_PAP_IN_NEW: 34825: 0 0 0 34825 0 0 0 0 0
+
+NEW GEN UPDATES: 2274700 ( 99.9%)
+
+OLD GEN UPDATES: 1852 ( 0.1%)
+
+Total bytes copied during GC: 190096
+
+**************************************************
+3647137 ALLOC_HEAP_ctr
+11882004 ALLOC_HEAP_tot
+ 69647 ALLOC_FUN_ctr
+ 69647 ALLOC_FUN_adm
+ 69644 ALLOC_FUN_gds
+ 34819 ALLOC_FUN_slp
+ 34831 ALLOC_FUN_hst_0
+ 34816 ALLOC_FUN_hst_1
+ 0 ALLOC_FUN_hst_2
+ 0 ALLOC_FUN_hst_3
+ 0 ALLOC_FUN_hst_4
+2382937 ALLOC_UP_THK_ctr
+ 0 ALLOC_SE_THK_ctr
+ 308906 ENT_IND_ctr
+ 0 E!NT_PERM_IND_ctr requires +RTS -Z
+[... lots more info omitted ...]
+ 0 GC_SEL_ABANDONED_ctr
+ 0 GC_SEL_MINOR_ctr
+ 0 GC_SEL_MAJOR_ctr
+ 0 GC_FAILED_PROMOTION_ctr
+ 47524 GC_WORDS_COPIED_ctr
+</screen>
+
+ <para>The formatting of the information above the row of asterisks
+ is subject to change, but hopefully provides a useful
+ human-readable summary. Below the asterisks <emphasis>all
+ counters</emphasis> maintained by the ticky-ticky system are
+ dumped, in a format intended to be machine-readable: zero or more
+ spaces, an integer, a space, the counter name, and a newline.</para>
+
+ <para>In fact, not <emphasis>all</emphasis> counters are
+ necessarily dumped; compile- or run-time flags can render certain
+ counters invalid. In this case, either the counter will simply
+ not appear, or it will appear with a modified counter name,
+ possibly along with an explanation for the omission (notice
+ <literal>ENT&lowbar;PERM&lowbar;IND&lowbar;ctr</literal> appears
+ with an inserted <literal>!</literal> above). Software analysing
+ this output should always check that it has the counters it
+ expects. Also, beware: some of the counters can have
+ <emphasis>large</emphasis> values!</para>
+
+ </sect1>
+
+</chapter>
+
+<!-- Emacs stuff:
+ ;;; Local Variables: ***
+ ;;; mode: xml ***
+ ;;; sgml-parent-document: ("users_guide.xml" "book" "chapter") ***
+ ;;; End: ***
+ -->