Diffstat (limited to 'docs/users_guide/profiling.xml')
-rw-r--r-- | docs/users_guide/profiling.xml | 1440 |
1 file changed, 1440 insertions, 0 deletions
diff --git a/docs/users_guide/profiling.xml b/docs/users_guide/profiling.xml new file mode 100644 index 0000000000..a88c8bbf4c --- /dev/null +++ b/docs/users_guide/profiling.xml @@ -0,0 +1,1440 @@ +<?xml version="1.0" encoding="iso-8859-1"?> +<chapter id="profiling"> + <title>Profiling</title> + <indexterm><primary>profiling</primary> + </indexterm> + <indexterm><primary>cost-centre profiling</primary></indexterm> + + <para> Glasgow Haskell comes with a time and space profiling + system. Its purpose is to help you improve your understanding of + your program's execution behaviour, so you can improve it.</para> + + <para> Any comments, suggestions and/or improvements you have are + welcome. Recommended “profiling tricks” would be + especially cool! </para> + + <para>Profiling a program is a three-step process:</para> + + <orderedlist> + <listitem> + <para> Re-compile your program for profiling with the + <literal>-prof</literal> option, and probably one of the + <literal>-auto</literal> or <literal>-auto-all</literal> + options. These options are described in more detail in <xref + linkend="prof-compiler-options"/> </para> + <indexterm><primary><literal>-prof</literal></primary> + </indexterm> + <indexterm><primary><literal>-auto</literal></primary> + </indexterm> + <indexterm><primary><literal>-auto-all</literal></primary> + </indexterm> + </listitem> + + <listitem> + <para> Run your program with one of the profiling options, eg. + <literal>+RTS -p -RTS</literal>. This generates a file of + profiling information.</para> + <indexterm><primary><option>-p</option></primary><secondary>RTS + option</secondary></indexterm> + </listitem> + + <listitem> + <para> Examine the generated profiling information, using one of + GHC's profiling tools. 
The tool to use will depend on the kind + of profiling information generated.</para> + </listitem> + + </orderedlist> + + <sect1 id="cost-centres"> + <title>Cost centres and cost-centre stacks</title> + + <para>GHC's profiling system assigns <firstterm>costs</firstterm> + to <firstterm>cost centres</firstterm>. A cost is simply the time + or space required to evaluate an expression. Cost centres are + program annotations around expressions; all costs incurred by the + annotated expression are assigned to the enclosing cost centre. + Furthermore, GHC will remember the stack of enclosing cost centres + for any given expression at run-time and generate a call-graph of + cost attributions.</para> + + <para>Let's take a look at an example:</para> + + <programlisting> +main = print (nfib 25) +nfib n = if n < 2 then 1 else nfib (n-1) + nfib (n-2) +</programlisting> + + <para>Compile and run this program as follows:</para> + + <screen> +$ ghc -prof -auto-all -o Main Main.hs +$ ./Main +RTS -p +121393 +$ +</screen> + + <para>When a GHC-compiled program is run with the + <option>-p</option> RTS option, it generates a file called + <filename><prog>.prof</filename>. 
In this case, the file + will contain something like this:</para> + +<screen> + Fri May 12 14:06 2000 Time and Allocation Profiling Report (Final) + + Main +RTS -p -RTS + + total time = 0.14 secs (7 ticks @ 20 ms) + total alloc = 8,741,204 bytes (excludes profiling overheads) + +COST CENTRE MODULE %time %alloc + +nfib Main 100.0 100.0 + + + individual inherited +COST CENTRE MODULE entries %time %alloc %time %alloc + +MAIN MAIN 0 0.0 0.0 100.0 100.0 + main Main 0 0.0 0.0 0.0 0.0 + CAF PrelHandle 3 0.0 0.0 0.0 0.0 + CAF PrelAddr 1 0.0 0.0 0.0 0.0 + CAF Main 6 0.0 0.0 100.0 100.0 + main Main 1 0.0 0.0 100.0 100.0 + nfib Main 242785 100.0 100.0 100.0 100.0 +</screen> + + + <para>The first part of the file gives the program name and + options, and the total time and total memory allocation measured + during the run of the program (note that the total memory + allocation figure isn't the same as the amount of + <emphasis>live</emphasis> memory needed by the program at any one + time; the latter can be determined using heap profiling, which we + will describe shortly).</para> + + <para>The second part of the file is a break-down by cost centre + of the most costly functions in the program. In this case, there + was only one significant function in the program, namely + <function>nfib</function>, and it was responsible for 100% + of both the time and allocation costs of the program.</para> + + <para>The third and final section of the file gives a profile + break-down by cost-centre stack. This is roughly a call-graph + profile of the program. 
In the example above, it is clear that + the costly call to <function>nfib</function> came from + <function>main</function>.</para> + + <para>The time and allocation incurred by a given part of the + program is displayed in two ways: “individual”, which + are the costs incurred by the code covered by this cost centre + stack alone, and “inherited”, which includes the costs + incurred by all the children of this node.</para> + + <para>The usefulness of cost-centre stacks is better demonstrated + by modifying the example slightly:</para> + + <programlisting> +main = print (f 25 + g 25) +f n = nfib n +g n = nfib (n `div` 2) +nfib n = if n < 2 then 1 else nfib (n-1) + nfib (n-2) +</programlisting> + + <para>Compile and run this program as before, and take a look at + the new profiling results:</para> + +<screen> +COST CENTRE MODULE scc %time %alloc %time %alloc + +MAIN MAIN 0 0.0 0.0 100.0 100.0 + main Main 0 0.0 0.0 0.0 0.0 + CAF PrelHandle 3 0.0 0.0 0.0 0.0 + CAF PrelAddr 1 0.0 0.0 0.0 0.0 + CAF Main 9 0.0 0.0 100.0 100.0 + main Main 1 0.0 0.0 100.0 100.0 + g Main 1 0.0 0.0 0.0 0.2 + nfib Main 465 0.0 0.2 0.0 0.2 + f Main 1 0.0 0.0 100.0 99.8 + nfib Main 242785 100.0 99.8 100.0 99.8 +</screen> + + <para>Now although we had two calls to <function>nfib</function> + in the program, it is immediately clear that it was the call from + <function>f</function> which took all the time.</para> + + <para>The actual meaning of the various columns in the output is:</para> + + <variablelist> + <varlistentry> + <term>entries</term> + <listitem> + <para>The number of times this particular point in the call + graph was entered.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>individual %time</term> + <listitem> + <para>The percentage of the total run time of the program + spent at this point in the call graph.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>individual %alloc</term> + <listitem> + <para>The percentage of the total memory allocations 
+ (excluding profiling overheads) of the program made by this + call.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>inherited %time</term> + <listitem> + <para>The percentage of the total run time of the program + spent below this point in the call graph.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>inherited %alloc</term> + <listitem> + <para>The percentage of the total memory allocations + (excluding profiling overheads) of the program made by this + call and all of its sub-calls.</para> + </listitem> + </varlistentry> + </variablelist> + + <para>In addition you can use the <option>-P</option> RTS option + <indexterm><primary><option>-P</option></primary></indexterm> to + get the following additional information:</para> + + <variablelist> + <varlistentry> + <term><literal>ticks</literal></term> + <listitem> + <para>The raw number of time “ticks” which were + attributed to this cost-centre; from this, we get the + <literal>%time</literal> figure mentioned + above.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term><literal>bytes</literal></term> + <listitem> + <para>Number of bytes allocated in the heap while in this + cost-centre; again, this is the raw number from which we get + the <literal>%alloc</literal> figure mentioned + above.</para> + </listitem> + </varlistentry> + </variablelist> + + <para>What about recursive functions, and mutually recursive + groups of functions? Where are the costs attributed? Well, + although GHC does keep information about which groups of functions + called each other recursively, this information isn't displayed in + the basic time and allocation profile, instead the call-graph is + flattened into a tree. The XML profiling tool (described in <xref + linkend="prof-xml-tool"/>) will be able to display real loops in + the call-graph.</para> + + <sect2><title>Inserting cost centres by hand</title> + + <para>Cost centres are just program annotations. 
When you say + <option>-auto-all</option> to the compiler, it automatically + inserts a cost centre annotation around every top-level function + in your program, but you are entirely free to add the cost + centre annotations yourself.</para> + + <para>The syntax of a cost centre annotation is</para> + + <programlisting> + {-# SCC "name" #-} <expression> +</programlisting> + + <para>where <literal>"name"</literal> is an arbitrary string, + that will become the name of your cost centre as it appears + in the profiling output, and + <literal><expression></literal> is any Haskell + expression. An <literal>SCC</literal> annotation extends as + far to the right as possible when parsing.</para> + + </sect2> + + <sect2 id="prof-rules"> + <title>Rules for attributing costs</title> + + <para>The cost of evaluating any expression in your program is + attributed to a cost-centre stack using the following rules:</para> + + <itemizedlist> + <listitem> + <para>If the expression is part of the + <firstterm>one-off</firstterm> costs of evaluating the + enclosing top-level definition, then costs are attributed to + the stack of lexically enclosing <literal>SCC</literal> + annotations on top of the special <literal>CAF</literal> + cost-centre. </para> + </listitem> + + <listitem> + <para>Otherwise, costs are attributed to the stack of + lexically-enclosing <literal>SCC</literal> annotations, + appended to the cost-centre stack in effect at the + <firstterm>call site</firstterm> of the current top-level + definition<footnote> <para>The call-site is just the place + in the source code which mentions the particular function or + variable.</para></footnote>. Notice that this is a recursive + definition.</para> + </listitem> + + <listitem> + <para>Time spent in foreign code (see <xref linkend="ffi"/>) + is always attributed to the cost centre in force at the + Haskell call-site of the foreign function.</para> + </listitem> + </itemizedlist> + + <para>What do we mean by one-off costs? 
Well, Haskell is a lazy + language, and certain expressions are only ever evaluated once. + For example, if we write:</para> + + <programlisting> +x = nfib 25 +</programlisting> + + <para>then <varname>x</varname> will only be evaluated once (if + at all), and subsequent demands for <varname>x</varname> will + immediately get to see the cached result. The definition + <varname>x</varname> is called a CAF (Constant Applicative + Form), because it has no arguments.</para> + + <para>For the purposes of profiling, we say that the expression + <literal>nfib 25</literal> belongs to the one-off costs of + evaluating <varname>x</varname>.</para> + + <para>Since one-off costs aren't strictly speaking part of the + call-graph of the program, they are attributed to a special + top-level cost centre, <literal>CAF</literal>. There may be one + <literal>CAF</literal> cost centre for each module (the + default), or one for each top-level definition with any one-off + costs (this behaviour can be selected by giving GHC the + <option>-caf-all</option> flag).</para> + + <indexterm><primary><literal>-caf-all</literal></primary> + </indexterm> + + <para>If you think you have a weird profile, or the call-graph + doesn't look like you expect it to, feel free to send it (and + your program) to us at + <email>glasgow-haskell-bugs@haskell.org</email>.</para> + </sect2> + </sect1> + + <sect1 id="prof-compiler-options"> + <title>Compiler options for profiling</title> + + <indexterm><primary>profiling</primary><secondary>options</secondary></indexterm> + <indexterm><primary>options</primary><secondary>for profiling</secondary></indexterm> + + <variablelist> + <varlistentry> + <term> + <option>-prof</option>: + <indexterm><primary><option>-prof</option></primary></indexterm> + </term> + <listitem> + <para> To make use of the profiling system + <emphasis>all</emphasis> modules must be compiled and linked + with the <option>-prof</option> option. 
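For example, a hand-annotated module might look like the following sketch (the cost-centre names here are invented for illustration; the same source compiles unchanged with or without <option>-prof</option>):

```haskell
-- A sketch of hand-inserted cost centres.  The names "fib-worker"
-- and "sum-squares" are illustrative, not prescribed by GHC.
module Main (main) where

fib :: Int -> Int
fib n = {-# SCC "fib-worker" #-}
        if n < 2 then 1 else fib (n - 1) + fib (n - 2)

sumSquares :: Int -> Int
sumSquares n = {-# SCC "sum-squares" #-} sum [ i * i | i <- [1 .. n] ]

main :: IO ()
main = print (fib 10 + sumSquares 100)
```

Compiled without <option>-prof</option> the pragmas are simply ignored; with it, the time and allocation of each annotated expression are attributed to the named cost centres.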
Any + <literal>SCC</literal> annotations you've put in your source + will spring to life.</para> + + <para> Without a <option>-prof</option> option, your + <literal>SCC</literal>s are ignored; so you can compile + <literal>SCC</literal>-laden code without changing + it.</para> + </listitem> + </varlistentry> + </variablelist> + + <para>There are a few other profiling-related compilation options. + Use them <emphasis>in addition to</emphasis> + <option>-prof</option>. These do not have to be used consistently + for all modules in a program.</para> + + <variablelist> + <varlistentry> + <term> + <option>-auto</option>: + <indexterm><primary><option>-auto</option></primary></indexterm> + <indexterm><primary>cost centres</primary><secondary>automatically inserting</secondary></indexterm> + </term> + <listitem> + <para> GHC will automatically add + <function>_scc_</function> constructs for all + top-level, exported functions.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-auto-all</option>: + <indexterm><primary><option>-auto-all</option></primary></indexterm> + </term> + <listitem> + <para> <emphasis>All</emphasis> top-level functions, + exported or not, will be automatically + <function>_scc_</function>'d.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-caf-all</option>: + <indexterm><primary><option>-caf-all</option></primary></indexterm> + </term> + <listitem> + <para> The costs of all CAFs in a module are usually + attributed to one “big” CAF cost-centre. With + this option, all CAFs get their own cost-centre. 
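As an illustrative sketch (the module and names are invented), consider a module with two one-off top-level computations:

```haskell
-- Sketch: two CAFs.  With plain -prof their evaluation costs share one
-- module-wide CAF cost-centre; adding -caf-all gives bigTable and
-- total a CAF cost-centre each.
module Main (main) where

bigTable :: [Int]
bigTable = map (* 2) [1 .. 10000]   -- a CAF: no arguments, evaluated at most once

total :: Int
total = sum bigTable                -- another CAF, built on the first

main :: IO ()
main = print total
```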
An + “if all else fails” option…</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-ignore-scc</option>: + <indexterm><primary><option>-ignore-scc</option></primary></indexterm> + </term> + <listitem> + <para>Ignore any <function>_scc_</function> + constructs, so a module which already has + <function>_scc_</function>s can be compiled + for profiling with the annotations ignored.</para> + </listitem> + </varlistentry> + + </variablelist> + + </sect1> + + <sect1 id="prof-time-options"> + <title>Time and allocation profiling</title> + + <para>To generate a time and allocation profile, give one of the + following RTS options to the compiled program when you run it (RTS + options should be enclosed between <literal>+RTS...-RTS</literal> + as usual):</para> + + <variablelist> + <varlistentry> + <term> + <option>-p</option> or <option>-P</option>: + <indexterm><primary><option>-p</option></primary></indexterm> + <indexterm><primary><option>-P</option></primary></indexterm> + <indexterm><primary>time profile</primary></indexterm> + </term> + <listitem> + <para>The <option>-p</option> option produces a standard + <emphasis>time profile</emphasis> report. It is written + into the file + <filename><replaceable>program</replaceable>.prof</filename>.</para> + + <para>The <option>-P</option> option produces a more + detailed report containing the actual time and allocation + data as well. 
(In practice it is rarely needed.)</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-px</option>: + <indexterm><primary><option>-px</option></primary></indexterm> + </term> + <listitem> + <para>The <option>-px</option> option generates profiling + information in the XML format understood by our new + profiling tool, see <xref linkend="prof-xml-tool"/>.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-xc</option> + <indexterm><primary><option>-xc</option></primary><secondary>RTS option</secondary></indexterm> + </term> + <listitem> + <para>This option makes use of the extra information + maintained by the cost-centre-stack profiler to provide + useful information about the location of runtime errors. + See <xref linkend="rts-options-debugging"/>.</para> + </listitem> + </varlistentry> + + </variablelist> + + </sect1> + + <sect1 id="prof-heap"> + <title>Profiling memory usage</title> + + <para>In addition to profiling the time and allocation behaviour + of your program, you can also generate a graph of its memory usage + over time. This is useful for detecting the causes of + <firstterm>space leaks</firstterm>, when your program holds on to + more memory at run-time than it needs to. Space leaks lead to + longer run-times due to heavy garbage collector activity, and may + even cause the program to run out of memory altogether.</para> + + <para>To generate a heap profile from your program:</para> + + <orderedlist> + <listitem> + <para>Compile the program for profiling (<xref + linkend="prof-compiler-options"/>).</para> + </listitem> + <listitem> + <para>Run it with one of the heap profiling options described + below (e.g. <option>-hc</option> for a basic producer profile). + This generates the file + <filename><replaceable>prog</replaceable>.hp</filename>.</para> + </listitem> + <listitem> + <para>Run <command>hp2ps</command> to produce a PostScript + file, + <filename><replaceable>prog</replaceable>.ps</filename>. 
The + <command>hp2ps</command> utility is described in detail in + <xref linkend="hp2ps"/>.</para> + </listitem> + <listitem> + <para>Display the heap profile using a postscript viewer such + as <application>Ghostview</application>, or print it out on a + Postscript-capable printer.</para> + </listitem> + </orderedlist> + + <sect2 id="rts-options-heap-prof"> + <title>RTS options for heap profiling</title> + + <para>There are several different kinds of heap profile that can + be generated. All the different profile types yield a graph of + live heap against time, but they differ in how the live heap is + broken down into bands. The following RTS options select which + break-down to use:</para> + + <variablelist> + <varlistentry> + <term> + <option>-hc</option> + <indexterm><primary><option>-hc</option></primary><secondary>RTS option</secondary></indexterm> + </term> + <listitem> + <para>Breaks down the graph by the cost-centre stack which + produced the data.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-hm</option> + <indexterm><primary><option>-hm</option></primary><secondary>RTS option</secondary></indexterm> + </term> + <listitem> + <para>Break down the live heap by the module containing + the code which produced the data.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-hd</option> + <indexterm><primary><option>-hd</option></primary><secondary>RTS option</secondary></indexterm> + </term> + <listitem> + <para>Breaks down the graph by <firstterm>closure + description</firstterm>. For actual data, the description + is just the constructor name, for other closures it is a + compiler-generated string identifying the closure.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-hy</option> + <indexterm><primary><option>-hy</option></primary><secondary>RTS option</secondary></indexterm> + </term> + <listitem> + <para>Breaks down the graph by + <firstterm>type</firstterm>. 
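As a rough sketch (the band names are our assumption about what such a profile would show, not captured output), a program whose live heap is mostly a large list of boxed integers would give an <option>-hy</option> profile dominated by list-cell and <literal>Int</literal> bands:

```haskell
-- Sketch for -hy: most live heap here is list cells and boxed Ints,
-- since the CAF xs is kept alive between the two traversals.
module Main (main) where

xs :: [Int]
xs = [1 .. 100000]

main :: IO ()
main = print (sum xs `div` length xs)
```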
For closures which have + function type or unknown/polymorphic type, the string will + represent an approximation to the actual type.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-hr</option> + <indexterm><primary><option>-hr</option></primary><secondary>RTS option</secondary></indexterm> + </term> + <listitem> + <para>Break down the graph by <firstterm>retainer + set</firstterm>. Retainer profiling is described in more + detail below (<xref linkend="retainer-prof"/>).</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-hb</option> + <indexterm><primary><option>-hb</option></primary><secondary>RTS option</secondary></indexterm> + </term> + <listitem> + <para>Break down the graph by + <firstterm>biography</firstterm>. Biographical profiling + is described in more detail below (<xref + linkend="biography-prof"/>).</para> + </listitem> + </varlistentry> + </variablelist> + + <para>In addition, the profile can be restricted to heap data + which satisfies certain criteria - for example, you might want + to display a profile by type but only for data produced by a + certain module, or a profile by retainer for a certain type of + data. Restrictions are specified as follows:</para> + + <variablelist> + <varlistentry> + <term> + <option>-hc</option><replaceable>name</replaceable>,... + <indexterm><primary><option>-hc</option></primary><secondary>RTS option</secondary></indexterm> + </term> + <listitem> + <para>Restrict the profile to closures produced by + cost-centre stacks with one of the specified cost centres + at the top.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-hC</option><replaceable>name</replaceable>,... 
+ <indexterm><primary><option>-hC</option></primary><secondary>RTS option</secondary></indexterm> + </term> + <listitem> + <para>Restrict the profile to closures produced by + cost-centre stacks with one of the specified cost centres + anywhere in the stack.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-hm</option><replaceable>module</replaceable>,... + <indexterm><primary><option>-hm</option></primary><secondary>RTS option</secondary></indexterm> + </term> + <listitem> + <para>Restrict the profile to closures produced by the + specified modules.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-hd</option><replaceable>desc</replaceable>,... + <indexterm><primary><option>-hd</option></primary><secondary>RTS option</secondary></indexterm> + </term> + <listitem> + <para>Restrict the profile to closures with the specified + description strings.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-hy</option><replaceable>type</replaceable>,... + <indexterm><primary><option>-hy</option></primary><secondary>RTS option</secondary></indexterm> + </term> + <listitem> + <para>Restrict the profile to closures with the specified + types.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-hr</option><replaceable>cc</replaceable>,... + <indexterm><primary><option>-hr</option></primary><secondary>RTS option</secondary></indexterm> + </term> + <listitem> + <para>Restrict the profile to closures with retainer sets + containing cost-centre stacks with one of the specified + cost centres at the top.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-hb</option><replaceable>bio</replaceable>,... 
+ <indexterm><primary><option>-hb</option></primary><secondary>RTS option</secondary></indexterm> + </term> + <listitem> + <para>Restrict the profile to closures with one of the + specified biographies, where + <replaceable>bio</replaceable> is one of + <literal>lag</literal>, <literal>drag</literal>, + <literal>void</literal>, or <literal>use</literal>.</para> + </listitem> + </varlistentry> + </variablelist> + + <para>For example, the following options will generate a + retainer profile restricted to <literal>Branch</literal> and + <literal>Leaf</literal> constructors:</para> + +<screen> +<replaceable>prog</replaceable> +RTS -hr -hdBranch,Leaf +</screen> + + <para>There can only be one "break-down" option + (eg. <option>-hr</option> in the example above), but there is no + limit on the number of further restrictions that may be applied. + All the options may be combined, with one exception: GHC doesn't + currently support mixing the <option>-hr</option> and + <option>-hb</option> options.</para> + + <para>There are two more options which relate to heap + profiling:</para> + + <variablelist> + <varlistentry> + <term> + <option>-i<replaceable>secs</replaceable></option>: + <indexterm><primary><option>-i</option></primary></indexterm> + </term> + <listitem> + <para>Set the profiling (sampling) interval to + <replaceable>secs</replaceable> seconds (the default is + 0.1 second). Fractions are allowed: for example + <option>-i0.2</option> will get 5 samples per second. + This only affects heap profiling; time profiles are always + sampled on a 1/50 second frequency.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-xt</option> + <indexterm><primary><option>-xt</option></primary><secondary>RTS option</secondary></indexterm> + </term> + <listitem> + <para>Include the memory occupied by threads in a heap + profile. 
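For instance (an illustrative sketch we have constructed, not an example from the sources), a program that keeps many threads blocked would be expected to show a noticeable <literal>TSO</literal> band when heap-profiled with <option>-xt</option>:

```haskell
-- Sketch: 100 sleeping threads, each contributing a TSO plus stack
-- to the heap profile while blocked, when -xt is given.
module Main (main) where

import Control.Concurrent (forkIO, newEmptyMVar, putMVar, takeMVar, threadDelay)
import Control.Monad (replicateM_)

main :: IO ()
main = do
  done <- newEmptyMVar
  let n = 100
  replicateM_ n (forkIO (threadDelay 100000 >> putMVar done ()))
  replicateM_ n (takeMVar done)     -- wait for every thread to finish
  putStrLn "all threads finished"
```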
Each thread takes up a small area for its thread + state in addition to the space allocated for its stack + (stacks normally start small and then grow as + necessary).</para> + + <para>This includes the main thread, so using + <option>-xt</option> is a good way to see how much stack + space the program is using.</para> + + <para>Memory occupied by threads and their stacks is + labelled as “TSO” when displaying the profile + by closure description or type description.</para> + </listitem> + </varlistentry> + </variablelist> + + </sect2> + + <sect2 id="retainer-prof"> + <title>Retainer Profiling</title> + + <para>Retainer profiling is designed to help answer questions + like <quote>why is this data being retained?</quote>. We start + by defining what we mean by a retainer:</para> + + <blockquote> + <para>A retainer is either the system stack, or an unevaluated + closure (thunk).</para> + </blockquote> + + <para>In particular, constructors are <emphasis>not</emphasis> + retainers.</para> + + <para>An object B retains object A if (i) B is a retainer object and + (ii) object A can be reached by recursively following pointers + starting from object B, but not meeting any other retainer + objects on the way. Each live object is retained by one or more + retainer objects, collectively called its + <firstterm>retainer set</firstterm>, or its + <firstterm>retainers</firstterm>.</para> + + <para>When retainer profiling is requested by giving the program + the <option>-hr</option> option, a graph is generated which is + broken down by retainer set. 
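As a concrete sketch of this definition (our illustration, not profiler output), an unevaluated thunk can be the retainer of a large structure:

```haskell
-- Sketch: while `total` is an unevaluated thunk it is a retainer of
-- every cell of `xs`; once forced, the list can be collected.
-- (Compile without optimisation, or GHC may evaluate it eagerly.)
module Main (main) where

main :: IO ()
main = do
  let xs    = [1 .. 100000] :: [Integer]
      total = sum xs        -- a thunk: retains xs while unevaluated
  putStrLn "thunk built; xs is retained by it"
  print total               -- forcing the thunk releases xs
```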
A retainer set is displayed as a + set of cost-centre stacks; because this is usually too large to + fit on the profile graph, each retainer set is numbered and + shown abbreviated on the graph along with its number, and the + full list of retainer sets is dumped into the file + <filename><replaceable>prog</replaceable>.prof</filename>.</para> + + <para>Retainer profiling requires multiple passes over the live + heap in order to discover the full retainer set for each + object, which can be quite slow. So we set a limit on the + maximum size of a retainer set, where all retainer sets larger + than the maximum retainer set size are replaced by the special + set <literal>MANY</literal>. The maximum set size defaults to 8 + and can be altered with the <option>-R</option> RTS + option:</para> + + <variablelist> + <varlistentry> + <term><option>-R</option><replaceable>size</replaceable></term> + <listitem> + <para>Restrict the number of elements in a retainer set to + <replaceable>size</replaceable> (default 8).</para> + </listitem> + </varlistentry> + </variablelist> + + <sect3> + <title>Hints for using retainer profiling</title> + + <para>The definition of retainers is designed to reflect a + common cause of space leaks: a large structure is retained by + an unevaluated computation, and will be released once the + computation is forced. A good example is looking up a value in + a finite map, where unless the lookup is forced in a timely + manner the unevaluated lookup will cause the whole mapping to + be retained. These kind of space leaks can often be + eliminated by forcing the relevant computations to be + performed eagerly, using <literal>seq</literal> or strictness + annotations on data constructor fields.</para> + + <para>Often a particular data structure is being retained by a + chain of unevaluated closures, only the nearest of which will + be reported by retainer profiling - for example A retains B, B + retains C, and C retains a large structure. 
There might be a + large number of Bs but only a single A, so A is really the one + we're interested in eliminating. However, retainer profiling + will in this case report B as the retainer of the large + structure. To move further up the chain of retainers, we can + ask for another retainer profile but this time restrict the + profile to B objects, so we get a profile of the retainers of + B:</para> + +<screen> +<replaceable>prog</replaceable> +RTS -hr -hcB +</screen> + + <para>This trick isn't foolproof, because there might be other + B closures in the heap which aren't the retainers we are + interested in, but we've found this to be a useful technique + in most cases.</para> + </sect3> + </sect2> + + <sect2 id="biography-prof"> + <title>Biographical Profiling</title> + + <para>A typical heap object may be in one of the following four + states at each point in its lifetime:</para> + + <itemizedlist> + <listitem> + <para>The <firstterm>lag</firstterm> stage, which is the + time between creation and the first use of the + object,</para> + </listitem> + <listitem> + <para>the <firstterm>use</firstterm> stage, which lasts from + the first use until the last use of the object, and</para> + </listitem> + <listitem> + <para>The <firstterm>drag</firstterm> stage, which lasts + from the final use until the last reference to the object + is dropped.</para> + </listitem> + <listitem> + <para>An object which is never used is said to be in the + <firstterm>void</firstterm> state for its whole + lifetime.</para> + </listitem> + </itemizedlist> + + <para>A biographical heap profile displays the portion of the + live heap in each of the four states listed above. 
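The stages can be pictured with a small sketch (the stage comments are our reading of the definitions above, not profiler output):

```haskell
-- Sketch of the biographical stages for the list xs.
module Main (main) where

import Control.Concurrent (threadDelay)

main :: IO ()
main = do
  let xs = map (* 2) [1 .. 50000 :: Int]
  threadDelay 50000    -- lag: xs (as a thunk) exists but is unused
  print (head xs)      -- first use: the use stage begins
  print (sum xs)       -- last use: the use stage ends; any part of xs
                       -- still reachable afterwards is in its drag stage
  threadDelay 50000
  putStrLn "done"
```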
Usually the + most interesting states are the void and drag states: live heap + in these states is more likely to be wasted space than heap in + the lag or use states.</para> + + <para>It is also possible to break down the heap in one or more + of these states by a different criteria, by restricting a + profile by biography. For example, to show the portion of the + heap in the drag or void state by producer: </para> + +<screen> +<replaceable>prog</replaceable> +RTS -hc -hbdrag,void +</screen> + + <para>Once you know the producer or the type of the heap in the + drag or void states, the next step is usually to find the + retainer(s):</para> + +<screen> +<replaceable>prog</replaceable> +RTS -hr -hc<replaceable>cc</replaceable>... +</screen> + + <para>NOTE: this two stage process is required because GHC + cannot currently profile using both biographical and retainer + information simultaneously.</para> + </sect2> + + <sect2 id="mem-residency"> + <title>Actual memory residency</title> + + <para>How does the heap residency reported by the heap profiler relate to + the actual memory residency of your program when you run it? You might + see a large discrepancy between the residency reported by the heap + profiler, and the residency reported by tools on your system + (eg. <literal>ps</literal> or <literal>top</literal> on Unix, or the + Task Manager on Windows). There are several reasons for this:</para> + + <itemizedlist> + <listitem> + <para>There is an overhead of profiling itself, which is subtracted + from the residency figures by the profiler. This overhead goes + away when compiling without profiling support, of course. The + space overhead is currently 2 extra + words per heap object, which probably results in + about a 30% overhead.</para> + </listitem> + + <listitem> + <para>Garbage collection requires more memory than the actual + residency. 
The factor depends on the kind of garbage collection
+          algorithm in use: a major GC in the standard
+          generational copying collector will usually require 3L bytes of
+          memory, where L is the amount of live data.  This is because by
+          default (see the <option>+RTS -F</option> option) we allow the old
+          generation to grow to twice its size (2L) before collecting it, and
+          we additionally require L bytes to copy the live data into.  When
+          using compacting collection (see the <option>+RTS -c</option>
+          option), this is reduced to 2L, and can be reduced further by
+          tweaking the <option>-F</option> option.  Also add the size of the
+          allocation area (currently a fixed 512KB).</para>
+        </listitem>
+
+        <listitem>
+          <para>The stack isn't counted in the heap profile by default.  See the
+          <option>+RTS -xt</option> option.</para>
+        </listitem>
+
+        <listitem>
+          <para>The program text itself, the C stack, any non-heap data (eg. data
+          allocated by foreign libraries, and data allocated by the RTS), and
+          <literal>mmap()</literal>'d memory are not counted in the heap profile.</para>
+        </listitem>
+      </itemizedlist>
+    </sect2>
+
+  </sect1>
+
+  <sect1 id="prof-xml-tool">
+    <title>Graphical time/allocation profile</title>
+
+    <para>You can view the time and allocation profiling graph of your
+    program graphically, using <command>ghcprof</command>.  This is a
+    new tool with GHC 4.08, and will eventually be the de-facto
+    standard way of viewing GHC profiles<footnote><para>Actually this
+    isn't true any more; we are working on a new tool for
+    displaying heap profiles using Gtk+HS, so
+    <command>ghcprof</command> may go away at some point in the future.</para>
+    </footnote></para>
+
+    <para>To run <command>ghcprof</command>, you need
+    <productname>uDraw(Graph)</productname> installed, which can be
+    obtained from <ulink
+    url="http://www.informatik.uni-bremen.de/uDrawGraph/en/uDrawGraph/uDrawGraph.html"><citetitle>uDraw(Graph)</citetitle></ulink>.
Install one of
+    the binary distributions, and set your
+    <envar>UDG_HOME</envar> environment variable to point to the
+    installation directory.</para>
+
+    <para><command>ghcprof</command> uses an XML-based profiling log
+    format, and you therefore need to run your program with a
+    different option: <option>-px</option>.  The file generated is
+    still called <filename><prog>.prof</filename>.  To see the
+    profile, run <command>ghcprof</command> like this:</para>
+
+    <indexterm><primary><option>-px</option></primary></indexterm>
+
+<screen>
+$ ghcprof <prog>.prof
+</screen>
+
+    <para>which should pop up a window showing the call-graph of your
+    program in glorious detail.  More information on using
+    <command>ghcprof</command> can be found at <ulink
+    url="http://www.dcs.warwick.ac.uk/people/academic/Stephen.Jarvis/profiler/index.html"><citetitle>The
+    Cost-Centre Stack Profiling Tool for
+    GHC</citetitle></ulink>.</para>
+
+  </sect1>
+
+  <sect1 id="hp2ps">
+    <title><command>hp2ps</command>––heap profile to PostScript</title>
+
+    <indexterm><primary><command>hp2ps</command></primary></indexterm>
+    <indexterm><primary>heap profiles</primary></indexterm>
+    <indexterm><primary>postscript, from heap profiles</primary></indexterm>
+    <indexterm><primary><option>-h<break-down></option></primary></indexterm>
+
+    <para>Usage:</para>
+
+<screen>
+hp2ps [flags] [<file>[.hp]]
+</screen>
+
+    <para>The program
+    <command>hp2ps</command><indexterm><primary>hp2ps
+    program</primary></indexterm> converts a heap profile as produced
+    by the <option>-h<break-down></option> runtime option into a
+    PostScript graph of the heap profile.  By convention, the file to
+    be processed by <command>hp2ps</command> has a
+    <filename>.hp</filename> extension.  The PostScript output is
+    written to <filename><file>.ps</filename>.
If
+    <filename><file></filename> is omitted entirely, then the
+    program behaves as a filter.</para>
+
+    <para><command>hp2ps</command> is distributed in
+    <filename>ghc/utils/hp2ps</filename> in a GHC source
+    distribution.  It was originally developed by Dave Wakeling as part
+    of the HBC/LML heap profiler.</para>
+
+    <para>The flags are:</para>
+
+    <variablelist>
+
+      <varlistentry>
+        <term><option>-d</option></term>
+        <listitem>
+          <para>In order to make graphs more readable,
+          <command>hp2ps</command> sorts the shaded bands for each
+          identifier.  The default sort ordering is for the bands with
+          the largest area to be stacked on top of the smaller ones.
+          The <option>-d</option> option causes rougher bands (those
+          representing series of values with the largest standard
+          deviations) to be stacked on top of smoother ones.</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term><option>-b</option></term>
+        <listitem>
+          <para>Normally, <command>hp2ps</command> puts the title of
+          the graph in a small box at the top of the page.  However, if
+          the JOB string is too long to fit in a small box (more than
+          35 characters), then <command>hp2ps</command> will choose to
+          use a big box instead.  The <option>-b</option> option
+          forces <command>hp2ps</command> to use a big box.</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term><option>-e<float>[in|mm|pt]</option></term>
+        <listitem>
+          <para>Generate encapsulated PostScript suitable for
+          inclusion in LaTeX documents.  Usually, the PostScript graph
+          is drawn in landscape mode in an area 9 inches wide by 6
+          inches high, and <command>hp2ps</command> arranges for this
+          area to be approximately centred on a sheet of A4 paper.
+          This format is convenient for studying the graph in detail,
+          but it is unsuitable for inclusion in LaTeX documents.  The
+          <option>-e</option> option causes the graph to be drawn in
+          portrait mode, with <float> specifying the width in inches,
+          millimetres or points (the default).
The resulting
+          PostScript file conforms to the Encapsulated PostScript
+          (EPS) convention, and it can be included in a LaTeX document
+          using Rokicki's dvi-to-PostScript converter
+          <command>dvips</command>.</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term><option>-g</option></term>
+        <listitem>
+          <para>Create output suitable for the <command>gs</command>
+          PostScript previewer (or similar).  In this case the graph is
+          printed in portrait mode without scaling.  The output is
+          unsuitable for a laser printer.</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term><option>-l</option></term>
+        <listitem>
+          <para>Normally a profile is limited to 20 bands with
+          additional identifiers being grouped into an
+          <literal>OTHER</literal> band.  The <option>-l</option> flag
+          removes this 20 band limit, producing as many bands as
+          necessary.  No key is produced, as it won't fit!  It is useful
+          for displaying creation time profiles with many bands.</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term><option>-m<int></option></term>
+        <listitem>
+          <para>Normally a profile is limited to 20 bands with
+          additional identifiers being grouped into an
+          <literal>OTHER</literal> band.  The <option>-m</option> flag
+          specifies an alternative band limit (the maximum is
+          20).</para>
+
+          <para><option>-m0</option> requests the band limit to be
+          removed.  As many bands as necessary are produced.  However, no
+          key is produced as it won't fit!  It is useful for displaying
+          creation time profiles with many bands.</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term><option>-p</option></term>
+        <listitem>
+          <para>Use previous parameters.  By default, the PostScript
+          graph is automatically scaled both horizontally and
+          vertically so that it fills the page.  However, when
+          preparing a series of graphs for use in a presentation, it
+          is often useful to draw a new graph using the same scale,
+          shading and ordering as a previous one.
The
+          <option>-p</option> flag causes the graph to be drawn using
+          the parameters determined by a previous run of
+          <command>hp2ps</command> on <filename>file</filename>.  These
+          are extracted from <filename>file.aux</filename>.</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term><option>-s</option></term>
+        <listitem>
+          <para>Use a small box for the title.</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term><option>-t<float></option></term>
+        <listitem>
+          <para>Normally, trace elements which sum to a total of less
+          than 1% of the profile are removed from the
+          profile.  The <option>-t</option> option allows this
+          percentage to be modified (maximum 5%).</para>
+
+          <para><option>-t0</option> requests that no trace elements be
+          removed from the profile, ensuring that all the data will be
+          displayed.</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term><option>-c</option></term>
+        <listitem>
+          <para>Generate colour output.</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term><option>-y</option></term>
+        <listitem>
+          <para>Ignore marks.</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term><option>-?</option></term>
+        <listitem>
+          <para>Print out usage information.</para>
+        </listitem>
+      </varlistentry>
+    </variablelist>
+
+
+  <sect2 id="manipulating-hp">
+    <title>Manipulating the hp file</title>
+
+<para>(Notes kindly offered by Jan-Willem Maessen.)</para>
+
+<para>
+The <filename>FOO.hp</filename> file produced when you ask for the
+heap profile of a program <filename>FOO</filename> is a text file with a particularly
+simple structure.  Here's a representative example, with much of the
+actual data omitted:
+<screen>
+JOB "FOO -hC"
+DATE "Thu Dec 26 18:17 2002"
+SAMPLE_UNIT "seconds"
+VALUE_UNIT "bytes"
+BEGIN_SAMPLE 0.00
+END_SAMPLE 0.00
+BEGIN_SAMPLE 15.07
+  ... sample data ...
+END_SAMPLE 15.07
+BEGIN_SAMPLE 30.23
+  ... sample data ...
+END_SAMPLE 30.23
+... etc.
+BEGIN_SAMPLE 11695.47
+END_SAMPLE 11695.47
+</screen>
+The first four lines (<literal>JOB</literal>, <literal>DATE</literal>, <literal>SAMPLE_UNIT</literal>, <literal>VALUE_UNIT</literal>) form a
+header.  Each block of lines starting with <literal>BEGIN_SAMPLE</literal> and ending
+with <literal>END_SAMPLE</literal> forms a single sample (you can think of this as a
+vertical slice of your heap profile).  The <command>hp2ps</command> utility should accept
+any input with a properly-formatted header followed by a series of
+<emphasis>complete</emphasis> samples.
+</para>
+</sect2>
+
+  <sect2>
+    <title>Zooming in on regions of your profile</title>
+
+<para>
+You can look at particular regions of your profile simply by loading a
+copy of the <filename>.hp</filename> file into a text editor and deleting the unwanted
+samples.  The resulting <filename>.hp</filename> file can be run through <command>hp2ps</command> and viewed
+or printed.
+</para>
+</sect2>
+
+  <sect2>
+    <title>Viewing the heap profile of a running program</title>
+
+<para>
+The <filename>.hp</filename> file is generated incrementally as your
+program runs.  In principle, running <command>hp2ps</command> on the incomplete file
+should produce a snapshot of your program's heap usage.  However, the
+last sample in the file may be incomplete, causing <command>hp2ps</command> to fail.  If
+you are using a machine with UNIX utilities installed, it's not too
+hard to work around this problem (though the resulting command line
+looks rather Byzantine):
+<screen>
+ head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
+   | hp2ps > FOO.ps
+</screen>
+
+The command <command>fgrep -n END_SAMPLE FOO.hp</command> finds the
+end of every complete sample in <filename>FOO.hp</filename>, and labels each sample with
+its ending line number.  We then select the line number of the last
+complete sample using <command>tail</command> and <command>cut</command>.
This is used as a
+parameter to <command>head</command>; the result is as if we deleted the final
+incomplete sample from <filename>FOO.hp</filename>.  This results in a properly-formatted
+<filename>.hp</filename> file which we feed directly to <command>hp2ps</command>.
+</para>
+</sect2>
+  <sect2>
+    <title>Viewing a heap profile in real time</title>
+
+<para>
+The <command>gv</command> and <command>ghostview</command> programs
+have a "watch file" option which can be used to view an up-to-date heap
+profile of your program as it runs.  Simply generate an incremental
+heap profile as described in the previous section.  Run <command>gv</command> on your
+profile:
+<screen>
+ gv -watch -seascape FOO.ps
+</screen>
+If you forget the <literal>-watch</literal> flag you can still select
+"Watch file" from the "State" menu.  Now each time you generate a new
+profile <filename>FOO.ps</filename>, the view will update automatically.
+</para>
+
+<para>
+This can all be encapsulated in a little script:
+<screen>
+ #!/bin/sh
+ head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
+   | hp2ps > FOO.ps
+ gv -watch -seascape FOO.ps &
+ while [ 1 ] ; do
+   sleep 10 # We generate a new profile every 10 seconds.
+   head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
+     | hp2ps > FOO.ps
+ done
+</screen>
+Occasionally <command>gv</command> will choke as it tries to read an incomplete copy of
+<filename>FOO.ps</filename> (because <command>hp2ps</command> is still running as an update
+occurs).  A slightly more complicated script works around this
+problem, by using the fact that sending a SIGHUP to <command>gv</command> will cause it
+to re-read its input file:
+<screen>
+ #!/bin/sh
+ head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
+   | hp2ps > FOO.ps
+ gv FOO.ps &
+ gvpsnum=$!
+ while [ 1 ] ; do + sleep 10 + head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \ + | hp2ps > FOO.ps + kill -HUP $gvpsnum + done +</screen> +</para> +</sect2> + + + </sect1> + + <sect1 id="ticky-ticky"> + <title>Using “ticky-ticky” profiling (for implementors)</title> + <indexterm><primary>ticky-ticky profiling</primary></indexterm> + + <para>(ToDo: document properly.)</para> + + <para>It is possible to compile Glasgow Haskell programs so that + they will count lots and lots of interesting things, e.g., number + of updates, number of data constructors entered, etc., etc. We + call this “ticky-ticky” + profiling,<indexterm><primary>ticky-ticky + profiling</primary></indexterm> <indexterm><primary>profiling, + ticky-ticky</primary></indexterm> because that's the sound a Sun4 + makes when it is running up all those counters + (<emphasis>slowly</emphasis>).</para> + + <para>Ticky-ticky profiling is mainly intended for implementors; + it is quite separate from the main “cost-centre” + profiling system, intended for all users everywhere.</para> + + <para>To be able to use ticky-ticky profiling, you will need to + have built appropriate libraries and things when you made the + system. See “Customising what libraries to build,” in + the installation guide.</para> + + <para>To get your compiled program to spit out the ticky-ticky + numbers, use a <option>-r</option> RTS + option<indexterm><primary>-r RTS option</primary></indexterm>. + See <xref linkend="runtime-control"/>.</para> + + <para>Compiling your program with the <option>-ticky</option> + switch yields an executable that performs these counts. 
Here is a
+  sample ticky-ticky statistics file, generated by the invocation
+  <command>foo +RTS -rfoo.ticky</command>.</para>
+
+<screen>
+ foo +RTS -rfoo.ticky
+
+
+ALLOCATIONS: 3964631 (11330900 words total: 3999476 admin, 6098829 goods, 1232595 slop)
+                            total words:      2     3     4     5    6+
+  69647 (  1.8%) function values           50.0  50.0   0.0   0.0   0.0
+2382937 ( 60.1%) thunks                     0.0  83.9  16.1   0.0   0.0
+1477218 ( 37.3%) data values               66.8  33.2   0.0   0.0   0.0
+      0 (  0.0%) big tuples
+      2 (  0.0%) black holes                0.0 100.0   0.0   0.0   0.0
+      0 (  0.0%) prim things
+  34825 (  0.9%) partial applications       0.0   0.0   0.0 100.0   0.0
+      2 (  0.0%) thread state objects       0.0   0.0   0.0   0.0 100.0
+
+Total storage-manager allocations: 3647137 (11882004 words)
+ [551104 words lost to speculative heap-checks]
+
+STACK USAGE:
+
+ENTERS: 9400092  of which 2005772 (21.3%) direct to the entry code
+ [the rest indirected via Node's info ptr]
+1860318 ( 19.8%) thunks
+3733184 ( 39.7%) data values
+3149544 ( 33.5%) function values
+ [of which 1999880 (63.5%) bypassed arg-satisfaction chk]
+ 348140 (  3.7%) partial applications
+ 308906 (  3.3%) normal indirections
+      0 (  0.0%) permanent indirections
+
+RETURNS: 5870443
+2137257 ( 36.4%) from entering a new constructor
+ [the rest from entering an existing constructor]
+2349219 ( 40.0%) vectored [the rest unvectored]
+
+RET_NEW:         2137257:  32.5% 46.2% 21.3%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%
+RET_OLD:         3733184:   2.8% 67.9% 29.3%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%
+RET_UNBOXED_TUP:       2:   0.0%  0.0%100.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%
+
+RET_VEC_RETURN : 2349219:   0.0%  0.0%100.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%
+
+UPDATE FRAMES: 2241725 (0 omitted from thunks)
+SEQ FRAMES:    1
+CATCH FRAMES:  1
+UPDATES: 2241725
+      0 (  0.0%) data values
+  34827 (  1.6%) partial applications
+ [2 in place, 34825 allocated new space]
+2206898 ( 98.4%) updates to existing heap objects (46 by squeezing)
+UPD_CON_IN_NEW:      0:      0      0      0      0      0      0      0      0      0
+UPD_PAP_IN_NEW:  34825:      0      0      0  34825      0      0      0      0      0
+
+NEW GEN UPDATES: 2274700 ( 99.9%)
+
+OLD GEN UPDATES:    1852 (  0.1%)
+
+Total bytes copied during GC: 190096
+
+**************************************************
+ 3647137 ALLOC_HEAP_ctr
+11882004 ALLOC_HEAP_tot
+   69647 ALLOC_FUN_ctr
+   69647 ALLOC_FUN_adm
+   69644 ALLOC_FUN_gds
+   34819 ALLOC_FUN_slp
+   34831 ALLOC_FUN_hst_0
+   34816 ALLOC_FUN_hst_1
+       0 ALLOC_FUN_hst_2
+       0 ALLOC_FUN_hst_3
+       0 ALLOC_FUN_hst_4
+ 2382937 ALLOC_UP_THK_ctr
+       0 ALLOC_SE_THK_ctr
+  308906 ENT_IND_ctr
+       0 E!NT_PERM_IND_ctr requires +RTS -Z
+[... lots more info omitted ...]
+       0 GC_SEL_ABANDONED_ctr
+       0 GC_SEL_MINOR_ctr
+       0 GC_SEL_MAJOR_ctr
+       0 GC_FAILED_PROMOTION_ctr
+   47524 GC_WORDS_COPIED_ctr
+</screen>
+
+  <para>The formatting of the information above the row of asterisks
+  is subject to change, but hopefully provides a useful
+  human-readable summary.  Below the asterisks <emphasis>all
+  counters</emphasis> maintained by the ticky-ticky system are
+  dumped, in a format intended to be machine-readable: zero or more
+  spaces, an integer, a space, the counter name, and a newline.</para>
+
+  <para>In fact, not <emphasis>all</emphasis> counters are
+  necessarily dumped; compile- or run-time flags can render certain
+  counters invalid.  In this case, either the counter will simply
+  not appear, or it will appear with a modified counter name,
+  possibly along with an explanation for the omission (notice
+  <literal>ENT_PERM_IND_ctr</literal> appears
+  with an inserted <literal>!</literal> above).  Software analysing
+  this output should always check that it has the counters it
+  expects.  Also, beware: some of the counters can have
+  <emphasis>large</emphasis> values!</para>
+
+</sect1>
+
+</chapter>
+
+<!-- Emacs stuff:
+     ;;; Local Variables: ***
+     ;;; mode: xml ***
+     ;;; sgml-parent-document: ("users_guide.xml" "book" "chapter") ***
+     ;;; End: ***
+  -->