Diffstat (limited to 'docs/users_guide/profiling.xml')
-rw-r--r-- | docs/users_guide/profiling.xml | 1440 |
1 file changed, 1440 insertions, 0 deletions
diff --git a/docs/users_guide/profiling.xml b/docs/users_guide/profiling.xml new file mode 100644 index 0000000000..a88c8bbf4c --- /dev/null +++ b/docs/users_guide/profiling.xml @@ -0,0 +1,1440 @@ +<?xml version="1.0" encoding="iso-8859-1"?> +<chapter id="profiling"> + <title>Profiling</title> + <indexterm><primary>profiling</primary> + </indexterm> + <indexterm><primary>cost-centre profiling</primary></indexterm> + + <para> Glasgow Haskell comes with a time and space profiling + system. Its purpose is to help you improve your understanding of + your program's execution behaviour, so you can improve it.</para> + + <para> Any comments, suggestions and/or improvements you have are + welcome. Recommended “profiling tricks” would be + especially cool! </para> + + <para>Profiling a program is a three-step process:</para> + + <orderedlist> + <listitem> + <para> Re-compile your program for profiling with the + <literal>-prof</literal> option, and probably one of the + <literal>-auto</literal> or <literal>-auto-all</literal> + options. These options are described in more detail in <xref + linkend="prof-compiler-options"/> </para> + <indexterm><primary><literal>-prof</literal></primary> + </indexterm> + <indexterm><primary><literal>-auto</literal></primary> + </indexterm> + <indexterm><primary><literal>-auto-all</literal></primary> + </indexterm> + </listitem> + + <listitem> + <para> Run your program with one of the profiling options, eg. + <literal>+RTS -p -RTS</literal>. This generates a file of + profiling information.</para> + <indexterm><primary><option>-p</option></primary><secondary>RTS + option</secondary></indexterm> + </listitem> + + <listitem> + <para> Examine the generated profiling information, using one of + GHC's profiling tools. 
The tool to use will depend on the kind + of profiling information generated.</para> + </listitem> + + </orderedlist> + + <sect1 id="cost-centres"> + <title>Cost centres and cost-centre stacks</title> + + <para>GHC's profiling system assigns <firstterm>costs</firstterm> + to <firstterm>cost centres</firstterm>. A cost is simply the time + or space required to evaluate an expression. Cost centres are + program annotations around expressions; all costs incurred by the + annotated expression are assigned to the enclosing cost centre. + Furthermore, GHC will remember the stack of enclosing cost centres + for any given expression at run-time and generate a call-graph of + cost attributions.</para> + + <para>Let's take a look at an example:</para> + + <programlisting> +main = print (nfib 25) +nfib n = if n < 2 then 1 else nfib (n-1) + nfib (n-2) +</programlisting> + + <para>Compile and run this program as follows:</para> + + <screen> +$ ghc -prof -auto-all -o Main Main.hs +$ ./Main +RTS -p +121393 +$ +</screen> + + <para>When a GHC-compiled program is run with the + <option>-p</option> RTS option, it generates a file called + <filename><prog>.prof</filename>. 
In this case, the file + will contain something like this:</para> + +<screen> + Fri May 12 14:06 2000 Time and Allocation Profiling Report (Final) + + Main +RTS -p -RTS + + total time = 0.14 secs (7 ticks @ 20 ms) + total alloc = 8,741,204 bytes (excludes profiling overheads) + +COST CENTRE MODULE %time %alloc + +nfib Main 100.0 100.0 + + + individual inherited +COST CENTRE MODULE entries %time %alloc %time %alloc + +MAIN MAIN 0 0.0 0.0 100.0 100.0 + main Main 0 0.0 0.0 0.0 0.0 + CAF PrelHandle 3 0.0 0.0 0.0 0.0 + CAF PrelAddr 1 0.0 0.0 0.0 0.0 + CAF Main 6 0.0 0.0 100.0 100.0 + main Main 1 0.0 0.0 100.0 100.0 + nfib Main 242785 100.0 100.0 100.0 100.0 +</screen> + + + <para>The first part of the file gives the program name and + options, and the total time and total memory allocation measured + during the run of the program (note that the total memory + allocation figure isn't the same as the amount of + <emphasis>live</emphasis> memory needed by the program at any one + time; the latter can be determined using heap profiling, which we + will describe shortly).</para> + + <para>The second part of the file is a break-down by cost centre + of the most costly functions in the program. In this case, there + was only one significant function in the program, namely + <function>nfib</function>, and it was responsible for 100% + of both the time and allocation costs of the program.</para> + + <para>The third and final section of the file gives a profile + break-down by cost-centre stack. This is roughly a call-graph + profile of the program. 
In the example above, it is clear that + the costly call to <function>nfib</function> came from + <function>main</function>.</para> + + <para>The time and allocation incurred by a given part of the + program is displayed in two ways: “individual”, which + are the costs incurred by the code covered by this cost centre + stack alone, and “inherited”, which includes the costs + incurred by all the children of this node.</para> + + <para>The usefulness of cost-centre stacks is better demonstrated + by modifying the example slightly:</para> + + <programlisting> +main = print (f 25 + g 25) +f n = nfib n +g n = nfib (n `div` 2) +nfib n = if n < 2 then 1 else nfib (n-1) + nfib (n-2) +</programlisting> + + <para>Compile and run this program as before, and take a look at + the new profiling results:</para> + +<screen> +COST CENTRE MODULE scc %time %alloc %time %alloc + +MAIN MAIN 0 0.0 0.0 100.0 100.0 + main Main 0 0.0 0.0 0.0 0.0 + CAF PrelHandle 3 0.0 0.0 0.0 0.0 + CAF PrelAddr 1 0.0 0.0 0.0 0.0 + CAF Main 9 0.0 0.0 100.0 100.0 + main Main 1 0.0 0.0 100.0 100.0 + g Main 1 0.0 0.0 0.0 0.2 + nfib Main 465 0.0 0.2 0.0 0.2 + f Main 1 0.0 0.0 100.0 99.8 + nfib Main 242785 100.0 99.8 100.0 99.8 +</screen> + + <para>Now although we had two calls to <function>nfib</function> + in the program, it is immediately clear that it was the call from + <function>f</function> which took all the time.</para> + + <para>The actual meaning of the various columns in the output is:</para> + + <variablelist> + <varlistentry> + <term>entries</term> + <listitem> + <para>The number of times this particular point in the call + graph was entered.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>individual %time</term> + <listitem> + <para>The percentage of the total run time of the program + spent at this point in the call graph.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>individual %alloc</term> + <listitem> + <para>The percentage of the total memory allocations 
+ (excluding profiling overheads) of the program made by this + call.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>inherited %time</term> + <listitem> + <para>The percentage of the total run time of the program + spent below this point in the call graph.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>inherited %alloc</term> + <listitem> + <para>The percentage of the total memory allocations + (excluding profiling overheads) of the program made by this + call and all of its sub-calls.</para> + </listitem> + </varlistentry> + </variablelist> + + <para>In addition you can use the <option>-P</option> RTS option + <indexterm><primary><option>-P</option></primary></indexterm> to + get the following additional information:</para> + + <variablelist> + <varlistentry> + <term><literal>ticks</literal></term> + <listitem> + <para>The raw number of time “ticks” which were + attributed to this cost-centre; from this, we get the + <literal>%time</literal> figure mentioned + above.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term><literal>bytes</literal></term> + <listitem> + <para>Number of bytes allocated in the heap while in this + cost-centre; again, this is the raw number from which we get + the <literal>%alloc</literal> figure mentioned + above.</para> + </listitem> + </varlistentry> + </variablelist> + + <para>What about recursive functions, and mutually recursive + groups of functions? Where are the costs attributed? Well, + although GHC does keep information about which groups of functions + called each other recursively, this information isn't displayed in + the basic time and allocation profile, instead the call-graph is + flattened into a tree. The XML profiling tool (described in <xref + linkend="prof-xml-tool"/>) will be able to display real loops in + the call-graph.</para> + + <sect2><title>Inserting cost centres by hand</title> + + <para>Cost centres are just program annotations. 
When you say + <option>-auto-all</option> to the compiler, it automatically + inserts a cost centre annotation around every top-level function + in your program, but you are entirely free to add the cost + centre annotations yourself.</para> + + <para>The syntax of a cost centre annotation is</para> + + <programlisting> + {-# SCC "name" #-} <expression> +</programlisting> + + <para>where <literal>"name"</literal> is an arbitrary string, + that will become the name of your cost centre as it appears + in the profiling output, and + <literal><expression></literal> is any Haskell + expression. An <literal>SCC</literal> annotation extends as + far to the right as possible when parsing.</para> + + </sect2> + + <sect2 id="prof-rules"> + <title>Rules for attributing costs</title> + + <para>The cost of evaluating any expression in your program is + attributed to a cost-centre stack using the following rules:</para> + + <itemizedlist> + <listitem> + <para>If the expression is part of the + <firstterm>one-off</firstterm> costs of evaluating the + enclosing top-level definition, then costs are attributed to + the stack of lexically enclosing <literal>SCC</literal> + annotations on top of the special <literal>CAF</literal> + cost-centre. </para> + </listitem> + + <listitem> + <para>Otherwise, costs are attributed to the stack of + lexically-enclosing <literal>SCC</literal> annotations, + appended to the cost-centre stack in effect at the + <firstterm>call site</firstterm> of the current top-level + definition<footnote> <para>The call-site is just the place + in the source code which mentions the particular function or + variable.</para></footnote>. Notice that this is a recursive + definition.</para> + </listitem> + + <listitem> + <para>Time spent in foreign code (see <xref linkend="ffi"/>) + is always attributed to the cost centre in force at the + Haskell call-site of the foreign function.</para> + </listitem> + </itemizedlist> + + <para>What do we mean by one-off costs? 
Well, Haskell is a lazy + language, and certain expressions are only ever evaluated once. + For example, if we write:</para> + + <programlisting> +x = nfib 25 +</programlisting> + + <para>then <varname>x</varname> will only be evaluated once (if + at all), and subsequent demands for <varname>x</varname> will + immediately get to see the cached result. The definition + <varname>x</varname> is called a CAF (Constant Applicative + Form), because it has no arguments.</para> + + <para>For the purposes of profiling, we say that the expression + <literal>nfib 25</literal> belongs to the one-off costs of + evaluating <varname>x</varname>.</para> + + <para>Since one-off costs aren't strictly speaking part of the + call-graph of the program, they are attributed to a special + top-level cost centre, <literal>CAF</literal>. There may be one + <literal>CAF</literal> cost centre for each module (the + default), or one for each top-level definition with any one-off + costs (this behaviour can be selected by giving GHC the + <option>-caf-all</option> flag).</para> + + <indexterm><primary><literal>-caf-all</literal></primary> + </indexterm> + + <para>If you think you have a weird profile, or the call-graph + doesn't look like you expect it to, feel free to send it (and + your program) to us at + <email>glasgow-haskell-bugs@haskell.org</email>.</para> + </sect2> + </sect1> + + <sect1 id="prof-compiler-options"> + <title>Compiler options for profiling</title> + + <indexterm><primary>profiling</primary><secondary>options</secondary></indexterm> + <indexterm><primary>options</primary><secondary>for profiling</secondary></indexterm> + + <variablelist> + <varlistentry> + <term> + <option>-prof</option>: + <indexterm><primary><option>-prof</option></primary></indexterm> + </term> + <listitem> + <para> To make use of the profiling system + <emphasis>all</emphasis> modules must be compiled and linked + with the <option>-prof</option> option. 
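For example, a hand-annotated module might look like the following sketch (the cost-centre names here are invented for illustration; the same source compiles unchanged with or without <option>-prof</option>):

```haskell
-- A sketch of hand-inserted cost centres.  The names "fib-worker"
-- and "sum-squares" are illustrative, not prescribed by GHC.
module Main (main) where

fib :: Int -> Int
fib n = {-# SCC "fib-worker" #-}
        if n < 2 then 1 else fib (n - 1) + fib (n - 2)

sumSquares :: Int -> Int
sumSquares n = {-# SCC "sum-squares" #-} sum [ i * i | i <- [1 .. n] ]

main :: IO ()
main = print (fib 10 + sumSquares 100)
```

Compiled without <option>-prof</option> the pragmas are simply ignored; with it, the time and allocation of each annotated expression are attributed to the named cost centres.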
Any + <literal>SCC</literal> annotations you've put in your source + will spring to life.</para> + + <para> Without a <option>-prof</option> option, your + <literal>SCC</literal>s are ignored; so you can compile + <literal>SCC</literal>-laden code without changing + it.</para> + </listitem> + </varlistentry> + </variablelist> + + <para>There are a few other profiling-related compilation options. + Use them <emphasis>in addition to</emphasis> + <option>-prof</option>. These do not have to be used consistently + for all modules in a program.</para> + + <variablelist> + <varlistentry> + <term> + <option>-auto</option>: + <indexterm><primary><option>-auto</option></primary></indexterm> + <indexterm><primary>cost centres</primary><secondary>automatically inserting</secondary></indexterm> + </term> + <listitem> + <para> GHC will automatically add + <function>_scc_</function> constructs for all + top-level, exported functions.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-auto-all</option>: + <indexterm><primary><option>-auto-all</option></primary></indexterm> + </term> + <listitem> + <para> <emphasis>All</emphasis> top-level functions, + exported or not, will be automatically + <function>_scc_</function>'d.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-caf-all</option>: + <indexterm><primary><option>-caf-all</option></primary></indexterm> + </term> + <listitem> + <para> The costs of all CAFs in a module are usually + attributed to one “big” CAF cost-centre. With + this option, all CAFs get their own cost-centre. 
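As an illustrative sketch (the module and names are invented), consider a module with two one-off top-level computations:

```haskell
-- Sketch: two CAFs.  With plain -prof their evaluation costs share one
-- module-wide CAF cost-centre; adding -caf-all gives bigTable and
-- total a CAF cost-centre each.
module Main (main) where

bigTable :: [Int]
bigTable = map (* 2) [1 .. 10000]   -- a CAF: no arguments, evaluated at most once

total :: Int
total = sum bigTable                -- another CAF, built on the first

main :: IO ()
main = print total
```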
An + “if all else fails” option…</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-ignore-scc</option>: + <indexterm><primary><option>-ignore-scc</option></primary></indexterm> + </term> + <listitem> + <para>Ignore any <function>_scc_</function> + constructs, so a module which already has + <function>_scc_</function>s can be compiled + for profiling with the annotations ignored.</para> + </listitem> + </varlistentry> + + </variablelist> + + </sect1> + + <sect1 id="prof-time-options"> + <title>Time and allocation profiling</title> + + <para>To generate a time and allocation profile, give one of the + following RTS options to the compiled program when you run it (RTS + options should be enclosed between <literal>+RTS...-RTS</literal> + as usual):</para> + + <variablelist> + <varlistentry> + <term> + <option>-p</option> or <option>-P</option>: + <indexterm><primary><option>-p</option></primary></indexterm> + <indexterm><primary><option>-P</option></primary></indexterm> + <indexterm><primary>time profile</primary></indexterm> + </term> + <listitem> + <para>The <option>-p</option> option produces a standard + <emphasis>time profile</emphasis> report. It is written + into the file + <filename><replaceable>program</replaceable>.prof</filename>.</para> + + <para>The <option>-P</option> option produces a more + detailed report containing the actual time and allocation + data as well. 
(In practice it is rarely needed.)</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-px</option>: + <indexterm><primary><option>-px</option></primary></indexterm> + </term> + <listitem> + <para>The <option>-px</option> option generates profiling + information in the XML format understood by our new + profiling tool, see <xref linkend="prof-xml-tool"/>.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-xc</option> + <indexterm><primary><option>-xc</option></primary><secondary>RTS option</secondary></indexterm> + </term> + <listitem> + <para>This option makes use of the extra information + maintained by the cost-centre-stack profiler to provide + useful information about the location of runtime errors. + See <xref linkend="rts-options-debugging"/>.</para> + </listitem> + </varlistentry> + + </variablelist> + + </sect1> + + <sect1 id="prof-heap"> + <title>Profiling memory usage</title> + + <para>In addition to profiling the time and allocation behaviour + of your program, you can also generate a graph of its memory usage + over time. This is useful for detecting the causes of + <firstterm>space leaks</firstterm>, when your program holds on to + more memory at run-time than it needs to. Space leaks lead to + longer run-times due to heavy garbage collector activity, and may + even cause the program to run out of memory altogether.</para> + + <para>To generate a heap profile from your program:</para> + + <orderedlist> + <listitem> + <para>Compile the program for profiling (<xref + linkend="prof-compiler-options"/>).</para> + </listitem> + <listitem> + <para>Run it with one of the heap profiling options described + below (e.g. <option>-hc</option> for a basic producer profile). + This generates the file + <filename><replaceable>prog</replaceable>.hp</filename>.</para> + </listitem> + <listitem> + <para>Run <command>hp2ps</command> to produce a PostScript + file, + <filename><replaceable>prog</replaceable>.ps</filename>. 
The + <command>hp2ps</command> utility is described in detail in + <xref linkend="hp2ps"/>.</para> + </listitem> + <listitem> + <para>Display the heap profile using a postscript viewer such + as <application>Ghostview</application>, or print it out on a + Postscript-capable printer.</para> + </listitem> + </orderedlist> + + <sect2 id="rts-options-heap-prof"> + <title>RTS options for heap profiling</title> + + <para>There are several different kinds of heap profile that can + be generated. All the different profile types yield a graph of + live heap against time, but they differ in how the live heap is + broken down into bands. The following RTS options select which + break-down to use:</para> + + <variablelist> + <varlistentry> + <term> + <option>-hc</option> + <indexterm><primary><option>-hc</option></primary><secondary>RTS option</secondary></indexterm> + </term> + <listitem> + <para>Breaks down the graph by the cost-centre stack which + produced the data.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-hm</option> + <indexterm><primary><option>-hm</option></primary><secondary>RTS option</secondary></indexterm> + </term> + <listitem> + <para>Break down the live heap by the module containing + the code which produced the data.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-hd</option> + <indexterm><primary><option>-hd</option></primary><secondary>RTS option</secondary></indexterm> + </term> + <listitem> + <para>Breaks down the graph by <firstterm>closure + description</firstterm>. For actual data, the description + is just the constructor name, for other closures it is a + compiler-generated string identifying the closure.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-hy</option> + <indexterm><primary><option>-hy</option></primary><secondary>RTS option</secondary></indexterm> + </term> + <listitem> + <para>Breaks down the graph by + <firstterm>type</firstterm>. 
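As a rough sketch (the band names are our assumption about what such a profile would show, not captured output), a program whose live heap is mostly a large list of boxed integers would give an <option>-hy</option> profile dominated by list-cell and <literal>Int</literal> bands:

```haskell
-- Sketch for -hy: most live heap here is list cells and boxed Ints,
-- since the CAF xs is kept alive between the two traversals.
module Main (main) where

xs :: [Int]
xs = [1 .. 100000]

main :: IO ()
main = print (sum xs `div` length xs)
```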
For closures which have + function type or unknown/polymorphic type, the string will + represent an approximation to the actual type.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-hr</option> + <indexterm><primary><option>-hr</option></primary><secondary>RTS option</secondary></indexterm> + </term> + <listitem> + <para>Break down the graph by <firstterm>retainer + set</firstterm>. Retainer profiling is described in more + detail below (<xref linkend="retainer-prof"/>).</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-hb</option> + <indexterm><primary><option>-hb</option></primary><secondary>RTS option</secondary></indexterm> + </term> + <listitem> + <para>Break down the graph by + <firstterm>biography</firstterm>. Biographical profiling + is described in more detail below (<xref + linkend="biography-prof"/>).</para> + </listitem> + </varlistentry> + </variablelist> + + <para>In addition, the profile can be restricted to heap data + which satisfies certain criteria - for example, you might want + to display a profile by type but only for data produced by a + certain module, or a profile by retainer for a certain type of + data. Restrictions are specified as follows:</para> + + <variablelist> + <varlistentry> + <term> + <option>-hc</option><replaceable>name</replaceable>,... + <indexterm><primary><option>-hc</option></primary><secondary>RTS option</secondary></indexterm> + </term> + <listitem> + <para>Restrict the profile to closures produced by + cost-centre stacks with one of the specified cost centres + at the top.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-hC</option><replaceable>name</replaceable>,... 
+ <indexterm><primary><option>-hC</option></primary><secondary>RTS option</secondary></indexterm> + </term> + <listitem> + <para>Restrict the profile to closures produced by + cost-centre stacks with one of the specified cost centres + anywhere in the stack.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-hm</option><replaceable>module</replaceable>,... + <indexterm><primary><option>-hm</option></primary><secondary>RTS option</secondary></indexterm> + </term> + <listitem> + <para>Restrict the profile to closures produced by the + specified modules.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-hd</option><replaceable>desc</replaceable>,... + <indexterm><primary><option>-hd</option></primary><secondary>RTS option</secondary></indexterm> + </term> + <listitem> + <para>Restrict the profile to closures with the specified + description strings.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-hy</option><replaceable>type</replaceable>,... + <indexterm><primary><option>-hy</option></primary><secondary>RTS option</secondary></indexterm> + </term> + <listitem> + <para>Restrict the profile to closures with the specified + types.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-hr</option><replaceable>cc</replaceable>,... + <indexterm><primary><option>-hr</option></primary><secondary>RTS option</secondary></indexterm> + </term> + <listitem> + <para>Restrict the profile to closures with retainer sets + containing cost-centre stacks with one of the specified + cost centres at the top.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-hb</option><replaceable>bio</replaceable>,... 
+ <indexterm><primary><option>-hb</option></primary><secondary>RTS option</secondary></indexterm> + </term> + <listitem> + <para>Restrict the profile to closures with one of the + specified biographies, where + <replaceable>bio</replaceable> is one of + <literal>lag</literal>, <literal>drag</literal>, + <literal>void</literal>, or <literal>use</literal>.</para> + </listitem> + </varlistentry> + </variablelist> + + <para>For example, the following options will generate a + retainer profile restricted to <literal>Branch</literal> and + <literal>Leaf</literal> constructors:</para> + +<screen> +<replaceable>prog</replaceable> +RTS -hr -hdBranch,Leaf +</screen> + + <para>There can only be one "break-down" option + (eg. <option>-hr</option> in the example above), but there is no + limit on the number of further restrictions that may be applied. + All the options may be combined, with one exception: GHC doesn't + currently support mixing the <option>-hr</option> and + <option>-hb</option> options.</para> + + <para>There are two more options which relate to heap + profiling:</para> + + <variablelist> + <varlistentry> + <term> + <option>-i<replaceable>secs</replaceable></option>: + <indexterm><primary><option>-i</option></primary></indexterm> + </term> + <listitem> + <para>Set the profiling (sampling) interval to + <replaceable>secs</replaceable> seconds (the default is + 0.1 second). Fractions are allowed: for example + <option>-i0.2</option> will get 5 samples per second. + This only affects heap profiling; time profiles are always + sampled on a 1/50 second frequency.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-xt</option> + <indexterm><primary><option>-xt</option></primary><secondary>RTS option</secondary></indexterm> + </term> + <listitem> + <para>Include the memory occupied by threads in a heap + profile. 
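For instance (an illustrative sketch we have constructed, not an example from the sources), a program that keeps many threads blocked would be expected to show a noticeable <literal>TSO</literal> band when heap-profiled with <option>-xt</option>:

```haskell
-- Sketch: 100 sleeping threads, each contributing a TSO plus stack
-- to the heap profile while blocked, when -xt is given.
module Main (main) where

import Control.Concurrent (forkIO, newEmptyMVar, putMVar, takeMVar, threadDelay)
import Control.Monad (replicateM_)

main :: IO ()
main = do
  done <- newEmptyMVar
  let n = 100
  replicateM_ n (forkIO (threadDelay 100000 >> putMVar done ()))
  replicateM_ n (takeMVar done)     -- wait for every thread to finish
  putStrLn "all threads finished"
```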
Each thread takes up a small area for its thread + state in addition to the space allocated for its stack + (stacks normally start small and then grow as + necessary).</para> + + <para>This includes the main thread, so using + <option>-xt</option> is a good way to see how much stack + space the program is using.</para> + + <para>Memory occupied by threads and their stacks is + labelled as “TSO” when displaying the profile + by closure description or type description.</para> + </listitem> + </varlistentry> + </variablelist> + + </sect2> + + <sect2 id="retainer-prof"> + <title>Retainer Profiling</title> + + <para>Retainer profiling is designed to help answer questions + like <quote>why is this data being retained?</quote>. We start + by defining what we mean by a retainer:</para> + + <blockquote> + <para>A retainer is either the system stack, or an unevaluated + closure (thunk).</para> + </blockquote> + + <para>In particular, constructors are <emphasis>not</emphasis> + retainers.</para> + + <para>An object B retains object A if (i) B is a retainer object and + (ii) object A can be reached by recursively following pointers + starting from object B, but not meeting any other retainer + objects on the way. Each live object is retained by one or more + retainer objects, collectively called its + <firstterm>retainer set</firstterm>, or its + <firstterm>retainers</firstterm>.</para> + + <para>When retainer profiling is requested by giving the program + the <option>-hr</option> option, a graph is generated which is + broken down by retainer set. 
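As a concrete sketch of this definition (our illustration, not profiler output), an unevaluated thunk can be the retainer of a large structure:

```haskell
-- Sketch: while `total` is an unevaluated thunk it is a retainer of
-- every cell of `xs`; once forced, the list can be collected.
-- (Compile without optimisation, or GHC may evaluate it eagerly.)
module Main (main) where

main :: IO ()
main = do
  let xs    = [1 .. 100000] :: [Integer]
      total = sum xs        -- a thunk: retains xs while unevaluated
  putStrLn "thunk built; xs is retained by it"
  print total               -- forcing the thunk releases xs
```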
A retainer set is displayed as a + set of cost-centre stacks; because this is usually too large to + fit on the profile graph, each retainer set is numbered and + shown abbreviated on the graph along with its number, and the + full list of retainer sets is dumped into the file + <filename><replaceable>prog</replaceable>.prof</filename>.</para> + + <para>Retainer profiling requires multiple passes over the live + heap in order to discover the full retainer set for each + object, which can be quite slow. So we set a limit on the + maximum size of a retainer set, where all retainer sets larger + than the maximum retainer set size are replaced by the special + set <literal>MANY</literal>. The maximum set size defaults to 8 + and can be altered with the <option>-R</option> RTS + option:</para> + + <variablelist> + <varlistentry> + <term><option>-R</option><replaceable>size</replaceable></term> + <listitem> + <para>Restrict the number of elements in a retainer set to + <replaceable>size</replaceable> (default 8).</para> + </listitem> + </varlistentry> + </variablelist> + + <sect3> + <title>Hints for using retainer profiling</title> + + <para>The definition of retainers is designed to reflect a + common cause of space leaks: a large structure is retained by + an unevaluated computation, and will be released once the + computation is forced. A good example is looking up a value in + a finite map, where unless the lookup is forced in a timely + manner the unevaluated lookup will cause the whole mapping to + be retained. These kind of space leaks can often be + eliminated by forcing the relevant computations to be + performed eagerly, using <literal>seq</literal> or strictness + annotations on data constructor fields.</para> + + <para>Often a particular data structure is being retained by a + chain of unevaluated closures, only the nearest of which will + be reported by retainer profiling - for example A retains B, B + retains C, and C retains a large structure. 
There might be a + large number of Bs but only a single A, so A is really the one + we're interested in eliminating. However, retainer profiling + will in this case report B as the retainer of the large + structure. To move further up the chain of retainers, we can + ask for another retainer profile but this time restrict the + profile to B objects, so we get a profile of the retainers of + B:</para> + +<screen> +<replaceable>prog</replaceable> +RTS -hr -hcB +</screen> + + <para>This trick isn't foolproof, because there might be other + B closures in the heap which aren't the retainers we are + interested in, but we've found this to be a useful technique + in most cases.</para> + </sect3> + </sect2> + + <sect2 id="biography-prof"> + <title>Biographical Profiling</title> + + <para>A typical heap object may be in one of the following four + states at each point in its lifetime:</para> + + <itemizedlist> + <listitem> + <para>The <firstterm>lag</firstterm> stage, which is the + time between creation and the first use of the + object,</para> + </listitem> + <listitem> + <para>the <firstterm>use</firstterm> stage, which lasts from + the first use until the last use of the object, and</para> + </listitem> + <listitem> + <para>The <firstterm>drag</firstterm> stage, which lasts + from the final use until the last reference to the object + is dropped.</para> + </listitem> + <listitem> + <para>An object which is never used is said to be in the + <firstterm>void</firstterm> state for its whole + lifetime.</para> + </listitem> + </itemizedlist> + + <para>A biographical heap profile displays the portion of the + live heap in each of the four states listed above. 
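The stages can be pictured with a small sketch (the stage comments are our reading of the definitions above, not profiler output):

```haskell
-- Sketch of the biographical stages for the list xs.
module Main (main) where

import Control.Concurrent (threadDelay)

main :: IO ()
main = do
  let xs = map (* 2) [1 .. 50000 :: Int]
  threadDelay 50000    -- lag: xs (as a thunk) exists but is unused
  print (head xs)      -- first use: the use stage begins
  print (sum xs)       -- last use: the use stage ends; any part of xs
                       -- still reachable afterwards is in its drag stage
  threadDelay 50000
  putStrLn "done"
```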
Usually the + most interesting states are the void and drag states: live heap + in these states is more likely to be wasted space than heap in + the lag or use states.</para> + + <para>It is also possible to break down the heap in one or more + of these states by a different criteria, by restricting a + profile by biography. For example, to show the portion of the + heap in the drag or void state by producer: </para> + +<screen> +<replaceable>prog</replaceable> +RTS -hc -hbdrag,void +</screen> + + <para>Once you know the producer or the type of the heap in the + drag or void states, the next step is usually to find the + retainer(s):</para> + +<screen> +<replaceable>prog</replaceable> +RTS -hr -hc<replaceable>cc</replaceable>... +</screen> + + <para>NOTE: this two stage process is required because GHC + cannot currently profile using both biographical and retainer + information simultaneously.</para> + </sect2> + + <sect2 id="mem-residency"> + <title>Actual memory residency</title> + + <para>How does the heap residency reported by the heap profiler relate to + the actual memory residency of your program when you run it? You might + see a large discrepancy between the residency reported by the heap + profiler, and the residency reported by tools on your system + (eg. <literal>ps</literal> or <literal>top</literal> on Unix, or the + Task Manager on Windows). There are several reasons for this:</para> + + <itemizedlist> + <listitem> + <para>There is an overhead of profiling itself, which is subtracted + from the residency figures by the profiler. This overhead goes + away when compiling without profiling support, of course. The + space overhead is currently 2 extra + words per heap object, which probably results in + about a 30% overhead.</para> + </listitem> + + <listitem> + <para>Garbage collection requires more memory than the actual + residency. 
The factor depends on the kind of garbage collection
+          algorithm in use: a major GC in the standard
+          generational copying collector will usually require 3L bytes of
+          memory, where L is the amount of live data.  This is because by
+          default (see the <option>+RTS -F</option> option) we allow the old
+          generation to grow to twice its size (2L) before collecting it, and
+          we additionally require L bytes to copy the live data into.  When
+          using compacting collection (see the <option>+RTS -c</option>
+          option), this is reduced to 2L, and can be reduced further by
+          tweaking the <option>-F</option> option.  Also add the size of the
+          allocation area (currently a fixed 512KB).</para>
+        </listitem>
+
+        <listitem>
+          <para>The stack isn't counted in the heap profile by default.  See the
+          <option>+RTS -xt</option> option.</para>
+        </listitem>
+
+        <listitem>
+          <para>The program text itself, the C stack, any non-heap data (eg. data
+          allocated by foreign libraries, and data allocated by the RTS), and
+          <literal>mmap()</literal>'d memory are not counted in the heap profile.</para>
+        </listitem>
+      </itemizedlist>
+    </sect2>
+
+  </sect1>
+
+  <sect1 id="prof-xml-tool">
+    <title>Graphical time/allocation profile</title>
+
+    <para>You can view the time and allocation profiling graph of your
+    program graphically, using <command>ghcprof</command>.  This is a
+    new tool with GHC 4.08, and will eventually be the de-facto
+    standard way of viewing GHC profiles<footnote><para>Actually this
+    isn't true any more; we are working on a new tool for
+    displaying heap profiles using Gtk+HS, so
+    <command>ghcprof</command> may go away at some point in the future.</para>
+    </footnote></para>
+
+    <para>To run <command>ghcprof</command>, you need
+    <productname>uDraw(Graph)</productname> installed, which can be
+    obtained from <ulink
+    url="http://www.informatik.uni-bremen.de/uDrawGraph/en/uDrawGraph/uDrawGraph.html"><citetitle>uDraw(Graph)</citetitle></ulink>.
Install one of
+    the binary distributions, and set your
+    <envar>UDG_HOME</envar> environment variable to point to the
+    installation directory.</para>
+
+    <para><command>ghcprof</command> uses an XML-based profiling log
+    format, and you therefore need to run your program with a
+    different option: <option>-px</option>.  The file generated is
+    still called <filename><prog>.prof</filename>.  To see the
+    profile, run <command>ghcprof</command> like this:</para>
+
+    <indexterm><primary><option>-px</option></primary></indexterm>
+
+<screen>
+$ ghcprof <prog>.prof
+</screen>
+
+    <para>which should pop up a window showing the call-graph of your
+    program in glorious detail.  More information on using
+    <command>ghcprof</command> can be found at <ulink
+    url="http://www.dcs.warwick.ac.uk/people/academic/Stephen.Jarvis/profiler/index.html"><citetitle>The
+    Cost-Centre Stack Profiling Tool for
+    GHC</citetitle></ulink>.</para>
+
+  </sect1>
+
+  <sect1 id="hp2ps">
+    <title><command>hp2ps</command>––heap profile to PostScript</title>
+
+    <indexterm><primary><command>hp2ps</command></primary></indexterm>
+    <indexterm><primary>heap profiles</primary></indexterm>
+    <indexterm><primary>postscript, from heap profiles</primary></indexterm>
+    <indexterm><primary><option>-h<break-down></option></primary></indexterm>
+
+    <para>Usage:</para>
+
+<screen>
+hp2ps [flags] [<file>[.hp]]
+</screen>
+
+    <para>The program
+    <command>hp2ps</command><indexterm><primary>hp2ps
+    program</primary></indexterm> converts a heap profile as produced
+    by the <option>-h<break-down></option> runtime option into a
+    PostScript graph of the heap profile.  By convention, the file to
+    be processed by <command>hp2ps</command> has a
+    <filename>.hp</filename> extension.  The PostScript output is
+    written to <filename><file>.ps</filename>.
If
+    <filename><file></filename> is omitted entirely, then the
+    program behaves as a filter.</para>
+
+    <para><command>hp2ps</command> is distributed in
+    <filename>ghc/utils/hp2ps</filename> in a GHC source
+    distribution.  It was originally developed by Dave Wakeling as part
+    of the HBC/LML heap profiler.</para>
+
+    <para>The flags are:</para>
+
+    <variablelist>
+
+      <varlistentry>
+        <term><option>-d</option></term>
+        <listitem>
+          <para>In order to make graphs more readable,
+          <command>hp2ps</command> sorts the shaded bands for each
+          identifier.  The default sort ordering is for the bands with
+          the largest area to be stacked on top of the smaller ones.
+          The <option>-d</option> option causes rougher bands (those
+          representing series of values with the largest standard
+          deviations) to be stacked on top of smoother ones.</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term><option>-b</option></term>
+        <listitem>
+          <para>Normally, <command>hp2ps</command> puts the title of
+          the graph in a small box at the top of the page.  However, if
+          the JOB string is too long to fit in a small box (more than
+          35 characters), then <command>hp2ps</command> will choose to
+          use a big box instead.  The <option>-b</option> option
+          forces <command>hp2ps</command> to use a big box.</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term><option>-e<float>[in|mm|pt]</option></term>
+        <listitem>
+          <para>Generate encapsulated PostScript suitable for
+          inclusion in LaTeX documents.  Usually, the PostScript graph
+          is drawn in landscape mode in an area 9 inches wide by 6
+          inches high, and <command>hp2ps</command> arranges for this
+          area to be approximately centred on a sheet of A4 paper.
+          This format is convenient for studying the graph in detail,
+          but it is unsuitable for inclusion in LaTeX documents.  The
+          <option>-e</option> option causes the graph to be drawn in
+          portrait mode, with <float> specifying the width in inches,
+          millimetres or points (the default).
The resulting
+          PostScript file conforms to the Encapsulated PostScript
+          (EPS) convention, and it can be included in a LaTeX document
+          using Rokicki's dvi-to-PostScript converter
+          <command>dvips</command>.</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term><option>-g</option></term>
+        <listitem>
+          <para>Create output suitable for the <command>gs</command>
+          PostScript previewer (or similar).  In this case the graph is
+          printed in portrait mode without scaling.  The output is
+          unsuitable for a laser printer.</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term><option>-l</option></term>
+        <listitem>
+          <para>Normally a profile is limited to 20 bands with
+          additional identifiers being grouped into an
+          <literal>OTHER</literal> band.  The <option>-l</option> flag
+          removes this 20 band limit, producing as many bands as
+          necessary.  No key is produced, as it won't fit!  It is useful
+          for displaying creation time profiles with many bands.</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term><option>-m<int></option></term>
+        <listitem>
+          <para>Normally a profile is limited to 20 bands with
+          additional identifiers being grouped into an
+          <literal>OTHER</literal> band.  The <option>-m</option> flag
+          specifies an alternative band limit (the maximum is
+          20).</para>
+
+          <para><option>-m0</option> requests the band limit to be
+          removed.  As many bands as necessary are produced.  However, no
+          key is produced as it won't fit!  It is useful for displaying
+          creation time profiles with many bands.</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term><option>-p</option></term>
+        <listitem>
+          <para>Use previous parameters.  By default, the PostScript
+          graph is automatically scaled both horizontally and
+          vertically so that it fills the page.  However, when
+          preparing a series of graphs for use in a presentation, it
+          is often useful to draw a new graph using the same scale,
+          shading and ordering as a previous one.
The
+          <option>-p</option> flag causes the graph to be drawn using
+          the parameters determined by a previous run of
+          <command>hp2ps</command> on <filename>file</filename>.  These
+          are extracted from <filename>file.aux</filename>.</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term><option>-s</option></term>
+        <listitem>
+          <para>Use a small box for the title.</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term><option>-t<float></option></term>
+        <listitem>
+          <para>Normally, trace elements which sum to a total of less
+          than 1% of the profile are removed from the
+          profile.  The <option>-t</option> option allows this
+          percentage to be modified (maximum 5%).</para>
+
+          <para><option>-t0</option> requests that no trace elements be
+          removed from the profile, ensuring that all the data will be
+          displayed.</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term><option>-c</option></term>
+        <listitem>
+          <para>Generate colour output.</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term><option>-y</option></term>
+        <listitem>
+          <para>Ignore marks.</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term><option>-?</option></term>
+        <listitem>
+          <para>Print out usage information.</para>
+        </listitem>
+      </varlistentry>
+    </variablelist>
+
+
+  <sect2 id="manipulating-hp">
+    <title>Manipulating the hp file</title>
+
+<para>(Notes kindly offered by Jan-Willem Maessen.)</para>
+
+<para>
+The <filename>FOO.hp</filename> file produced when you ask for the
+heap profile of a program <filename>FOO</filename> is a text file with a particularly
+simple structure.  Here's a representative example, with much of the
+actual data omitted:
+<screen>
+JOB "FOO -hC"
+DATE "Thu Dec 26 18:17 2002"
+SAMPLE_UNIT "seconds"
+VALUE_UNIT "bytes"
+BEGIN_SAMPLE 0.00
+END_SAMPLE 0.00
+BEGIN_SAMPLE 15.07
+  ... sample data ...
+END_SAMPLE 15.07
+BEGIN_SAMPLE 30.23
+  ... sample data ...
+END_SAMPLE 30.23
+... etc.
+BEGIN_SAMPLE 11695.47
+END_SAMPLE 11695.47
+</screen>
+The first four lines (<literal>JOB</literal>, <literal>DATE</literal>, <literal>SAMPLE_UNIT</literal>, <literal>VALUE_UNIT</literal>) form a
+header.  Each block of lines starting with <literal>BEGIN_SAMPLE</literal> and ending
+with <literal>END_SAMPLE</literal> forms a single sample (you can think of this as a
+vertical slice of your heap profile).  The <command>hp2ps</command> utility should accept
+any input with a properly-formatted header followed by a series of
+<emphasis>complete</emphasis> samples.
+</para>
+</sect2>
+
+  <sect2>
+    <title>Zooming in on regions of your profile</title>
+
+<para>
+You can look at particular regions of your profile simply by loading a
+copy of the <filename>.hp</filename> file into a text editor and deleting the unwanted
+samples.  The resulting <filename>.hp</filename> file can be run through <command>hp2ps</command> and viewed
+or printed.
+</para>
+</sect2>
+
+  <sect2>
+    <title>Viewing the heap profile of a running program</title>
+
+<para>
+The <filename>.hp</filename> file is generated incrementally as your
+program runs.  In principle, running <command>hp2ps</command> on the incomplete file
+should produce a snapshot of your program's heap usage.  However, the
+last sample in the file may be incomplete, causing <command>hp2ps</command> to fail.  If
+you are using a machine with UNIX utilities installed, it's not too
+hard to work around this problem (though the resulting command line
+looks rather Byzantine):
+<screen>
+ head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
+   | hp2ps > FOO.ps
+</screen>
+
+The command <command>fgrep -n END_SAMPLE FOO.hp</command> finds the
+end of every complete sample in <filename>FOO.hp</filename>, and labels each sample with
+its ending line number.  We then select the line number of the last
+complete sample using <command>tail</command> and <command>cut</command>.
This is used as a
+parameter to <command>head</command>; the result is as if we deleted the final
+incomplete sample from <filename>FOO.hp</filename>.  This results in a properly-formatted
+<filename>.hp</filename> file which we feed directly to <command>hp2ps</command>.
+</para>
+</sect2>
+  <sect2>
+    <title>Viewing a heap profile in real time</title>
+
+<para>
+The <command>gv</command> and <command>ghostview</command> programs
+have a "watch file" option which can be used to view an up-to-date heap
+profile of your program as it runs.  Simply generate an incremental
+heap profile as described in the previous section.  Run <command>gv</command> on your
+profile:
+<screen>
+ gv -watch -seascape FOO.ps
+</screen>
+If you forget the <literal>-watch</literal> flag you can still select
+"Watch file" from the "State" menu.  Now each time you generate a new
+profile <filename>FOO.ps</filename>, the view will update automatically.
+</para>
+
+<para>
+This can all be encapsulated in a little script:
+<screen>
+ #!/bin/sh
+ head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
+   | hp2ps > FOO.ps
+ gv -watch -seascape FOO.ps &
+ while [ 1 ] ; do
+   sleep 10 # We generate a new profile every 10 seconds.
+   head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
+     | hp2ps > FOO.ps
+ done
+</screen>
+Occasionally <command>gv</command> will choke as it tries to read an incomplete copy of
+<filename>FOO.ps</filename> (because <command>hp2ps</command> is still running as an update
+occurs).  A slightly more complicated script works around this
+problem, by using the fact that sending a SIGHUP to <command>gv</command> will cause it
+to re-read its input file:
+<screen>
+ #!/bin/sh
+ head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
+   | hp2ps > FOO.ps
+ gv FOO.ps &
+ gvpsnum=$!
+ while [ 1 ] ; do + sleep 10 + head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \ + | hp2ps > FOO.ps + kill -HUP $gvpsnum + done +</screen> +</para> +</sect2> + + + </sect1> + + <sect1 id="ticky-ticky"> + <title>Using “ticky-ticky” profiling (for implementors)</title> + <indexterm><primary>ticky-ticky profiling</primary></indexterm> + + <para>(ToDo: document properly.)</para> + + <para>It is possible to compile Glasgow Haskell programs so that + they will count lots and lots of interesting things, e.g., number + of updates, number of data constructors entered, etc., etc. We + call this “ticky-ticky” + profiling,<indexterm><primary>ticky-ticky + profiling</primary></indexterm> <indexterm><primary>profiling, + ticky-ticky</primary></indexterm> because that's the sound a Sun4 + makes when it is running up all those counters + (<emphasis>slowly</emphasis>).</para> + + <para>Ticky-ticky profiling is mainly intended for implementors; + it is quite separate from the main “cost-centre” + profiling system, intended for all users everywhere.</para> + + <para>To be able to use ticky-ticky profiling, you will need to + have built appropriate libraries and things when you made the + system. See “Customising what libraries to build,” in + the installation guide.</para> + + <para>To get your compiled program to spit out the ticky-ticky + numbers, use a <option>-r</option> RTS + option<indexterm><primary>-r RTS option</primary></indexterm>. + See <xref linkend="runtime-control"/>.</para> + + <para>Compiling your program with the <option>-ticky</option> + switch yields an executable that performs these counts. 
Here is a
+  sample ticky-ticky statistics file, generated by the invocation
+  <command>foo +RTS -rfoo.ticky</command>.</para>
+
+<screen>
+ foo +RTS -rfoo.ticky
+
+
+ALLOCATIONS: 3964631 (11330900 words total: 3999476 admin, 6098829 goods, 1232595 slop)
+                            total words:      2     3     4     5    6+
+  69647 (  1.8%) function values           50.0  50.0   0.0   0.0   0.0
+2382937 ( 60.1%) thunks                     0.0  83.9  16.1   0.0   0.0
+1477218 ( 37.3%) data values               66.8  33.2   0.0   0.0   0.0
+      0 (  0.0%) big tuples
+      2 (  0.0%) black holes                0.0 100.0   0.0   0.0   0.0
+      0 (  0.0%) prim things
+  34825 (  0.9%) partial applications       0.0   0.0   0.0 100.0   0.0
+      2 (  0.0%) thread state objects       0.0   0.0   0.0   0.0 100.0
+
+Total storage-manager allocations: 3647137 (11882004 words)
+ [551104 words lost to speculative heap-checks]
+
+STACK USAGE:
+
+ENTERS: 9400092  of which 2005772 (21.3%) direct to the entry code
+ [the rest indirected via Node's info ptr]
+1860318 ( 19.8%) thunks
+3733184 ( 39.7%) data values
+3149544 ( 33.5%) function values
+ [of which 1999880 (63.5%) bypassed arg-satisfaction chk]
+ 348140 (  3.7%) partial applications
+ 308906 (  3.3%) normal indirections
+      0 (  0.0%) permanent indirections
+
+RETURNS: 5870443
+2137257 ( 36.4%) from entering a new constructor
+ [the rest from entering an existing constructor]
+2349219 ( 40.0%) vectored [the rest unvectored]
+
+RET_NEW:         2137257:  32.5% 46.2% 21.3%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%
+RET_OLD:         3733184:   2.8% 67.9% 29.3%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%
+RET_UNBOXED_TUP:       2:   0.0%  0.0%100.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%
+
+RET_VEC_RETURN : 2349219:   0.0%  0.0%100.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%
+
+UPDATE FRAMES: 2241725 (0 omitted from thunks)
+SEQ FRAMES:    1
+CATCH FRAMES:  1
+UPDATES: 2241725
+      0 (  0.0%) data values
+  34827 (  1.6%) partial applications
+ [2 in place, 34825 allocated new space]
+2206898 ( 98.4%) updates to existing heap objects (46 by squeezing)
+UPD_CON_IN_NEW:      0:      0      0      0      0      0      0      0      0      0
+UPD_PAP_IN_NEW:  34825:      0      0      0  34825      0      0      0      0      0
+
+NEW GEN UPDATES: 2274700 ( 99.9%)
+
+OLD GEN UPDATES:    1852 (  0.1%)
+
+Total bytes copied during GC: 190096
+
+**************************************************
+ 3647137 ALLOC_HEAP_ctr
+11882004 ALLOC_HEAP_tot
+   69647 ALLOC_FUN_ctr
+   69647 ALLOC_FUN_adm
+   69644 ALLOC_FUN_gds
+   34819 ALLOC_FUN_slp
+   34831 ALLOC_FUN_hst_0
+   34816 ALLOC_FUN_hst_1
+       0 ALLOC_FUN_hst_2
+       0 ALLOC_FUN_hst_3
+       0 ALLOC_FUN_hst_4
+ 2382937 ALLOC_UP_THK_ctr
+       0 ALLOC_SE_THK_ctr
+  308906 ENT_IND_ctr
+       0 E!NT_PERM_IND_ctr requires +RTS -Z
+[... lots more info omitted ...]
+       0 GC_SEL_ABANDONED_ctr
+       0 GC_SEL_MINOR_ctr
+       0 GC_SEL_MAJOR_ctr
+       0 GC_FAILED_PROMOTION_ctr
+   47524 GC_WORDS_COPIED_ctr
+</screen>
+
+  <para>The formatting of the information above the row of asterisks
+  is subject to change, but hopefully provides a useful
+  human-readable summary.  Below the asterisks <emphasis>all
+  counters</emphasis> maintained by the ticky-ticky system are
+  dumped, in a format intended to be machine-readable: zero or more
+  spaces, an integer, a space, the counter name, and a newline.</para>
+
+  <para>In fact, not <emphasis>all</emphasis> counters are
+  necessarily dumped; compile- or run-time flags can render certain
+  counters invalid.  In this case, either the counter will simply
+  not appear, or it will appear with a modified counter name,
+  possibly along with an explanation for the omission (notice
+  <literal>ENT_PERM_IND_ctr</literal> appears
+  with an inserted <literal>!</literal> above).  Software analysing
+  this output should always check that it has the counters it
+  expects.  Also, beware: some of the counters can have
+  <emphasis>large</emphasis> values!</para>
+
+</sect1>
+
+</chapter>
+
+<!-- Emacs stuff:
+     ;;; Local Variables: ***
+     ;;; mode: xml ***
+     ;;; sgml-parent-document: ("users_guide.xml" "book" "chapter") ***
+     ;;; End: ***
+  -->