diff options
author | Simon Marlow <marlowsd@gmail.com> | 2011-10-27 13:47:27 +0100 |
---|---|---|
committer | Simon Marlow <marlowsd@gmail.com> | 2011-11-02 16:34:05 +0000 |
commit | 7bb0447df9a783c222c2a077e35e5013c7c68d91 (patch) | |
tree | 78d6d2a14f7e42df5cda32199c71ced973f169ef /docs | |
parent | bd72eeb184a95ae0ae79ccad19c8ccc2b45a12e0 (diff) | |
download | haskell-7bb0447df9a783c222c2a077e35e5013c7c68d91.tar.gz |
Overhaul of infrastructure for profiling, coverage (HPC) and breakpoints
User visible changes
====================
Profilng
--------
Flags renamed (the old ones are still accepted for now):
OLD NEW
--------- ------------
-auto-all -fprof-auto
-auto -fprof-exported
-caf-all -fprof-cafs
New flags:
-fprof-auto Annotates all bindings (not just top-level
ones) with SCCs
-fprof-top Annotates just top-level bindings with SCCs
-fprof-exported Annotates just exported bindings with SCCs
-fprof-no-count-entries Do not maintain entry counts when profiling
(can make profiled code go faster; useful with
heap profiling where entry counts are not used)
Cost-centre stacks have a new semantics, which should in most cases
result in more useful and intuitive profiles. If you find this not to
be the case, please let me know. This is the area where I have been
experimenting most, and the current solution is probably not the
final version, however it does address all the outstanding bugs and
seems to be better than GHC 7.2.
Stack traces
------------
+RTS -xc now gives more information. If the exception originates from
a CAF (as is common, because GHC tends to lift exceptions out to the
top-level), then the RTS walks up the stack and reports the stack in
the enclosing update frame(s).
Result: +RTS -xc is much more useful now - but you still have to
compile for profiling to get it. I've played around a little with
adding 'head []' to GHC itself, and +RTS -xc does pinpoint the problem
quite accurately.
I plan to add more facilities for stack tracing (e.g. in GHCi) in the
future.
Coverage (HPC)
--------------
* derived instances are now coloured yellow if they weren't used
* likewise record field names
* entry counts are more accurate (hpc --fun-entry-count)
* tab width is now correct (markup was previously off in source with
tabs)
Internal changes
================
In Core, the Note constructor has been replaced by
Tick (Tickish b) (Expr b)
which is used to represent all the kinds of source annotation we
support: profiling SCCs, HPC ticks, and GHCi breakpoints.
Depending on the properties of the Tickish, different transformations
apply to Tick. See CoreUtils.mkTick for details.
Tickets
=======
This commit closes the following tickets, test cases to follow:
- Close #2552: not a bug, but the behaviour is now more intuitive
(test is T2552)
- Close #680 (test is T680)
- Close #1531 (test is result001)
- Close #949 (test is T949)
- Close #2466: test case has bitrotted (doesn't compile against current
version of vector-space package)
Diffstat (limited to 'docs')
-rw-r--r-- | docs/users_guide/flags.xml | 47 | ||||
-rw-r--r-- | docs/users_guide/profiling.xml | 560 | ||||
-rw-r--r-- | docs/users_guide/runtime_control.xml | 56 |
3 files changed, 388 insertions, 275 deletions
diff --git a/docs/users_guide/flags.xml b/docs/users_guide/flags.xml index 244bbf64a6..e765525c13 100644 --- a/docs/users_guide/flags.xml +++ b/docs/users_guide/flags.xml @@ -1668,32 +1668,45 @@ phase <replaceable>n</replaceable></entry> </thead> <tbody> <row> - <entry><option>-auto</option></entry> - <entry>Auto-add <literal>_scc_</literal>s to all - exported functions not marked INLINE</entry> - <entry>dynamic</entry> - <entry><option>-no-auto</option></entry> + <entry><option>-prof</option></entry> + <entry>Turn on profiling</entry> + <entry>static</entry> + <entry>-</entry> </row> <row> - <entry><option>-auto-all</option></entry> - <entry>Auto-add <literal>_scc_</literal>s to all - top-level functions not marked INLINE</entry> + <entry><option>-fprof-auto</option></entry> + <entry>Auto-add <literal>SCC</literal>s to all bindings + not marked INLINE</entry> <entry>dynamic</entry> - <entry><option>-no-auto-all</option></entry> + <entry><option>-fno-prof-auto</option></entry> </row> <row> - <entry><option>-caf-all</option></entry> - <entry>Auto-add <literal>_scc_</literal>s to all CAFs</entry> - <entry>dynamic</entry> - <entry><option>-no-caf-all</option></entry> + <entry><option>-fprof-auto-top</option></entry> + <entry>Auto-add <literal>SCC</literal>s to all top-level + bindings not marked INLINE</entry> + <entry>dynamic</entry> + <entry><option>-fno-prof-auto</option></entry> </row> <row> - <entry><option>-prof</option></entry> - <entry>Turn on profiling</entry> - <entry>static</entry> - <entry>-</entry> + <entry><option>-fprof-auto-exported</option></entry> + <entry>Auto-add <literal>SCC</literal>s to all exported + bindings not marked INLINE</entry> + <entry>dynamic</entry> + <entry><option>-fno-prof-auto</option></entry> </row> <row> + <entry><option>-fprof-cafs</option></entry> + <entry>Auto-add <literal>SCC</literal>s to all CAFs</entry> + <entry>dynamic</entry> + <entry><option>-fno-prof-cafs</option></entry> + </row> + <row> + <entry><option>-fno-prof-count-entries</option></entry> + <entry>Do not collect entry counts</entry> + <entry>dynamic</entry> + <entry><option>-fprof-count-entries</option></entry> + </row> + <row> <entry><option>-ticky</option></entry> <entry><link linkend="ticky-ticky">Turn on ticky-ticky profiling</link></entry> <entry>static</entry> diff --git a/docs/users_guide/profiling.xml b/docs/users_guide/profiling.xml index 01c7576b63..16dead80ae 100644 --- a/docs/users_guide/profiling.xml +++ b/docs/users_guide/profiling.xml @@ -5,45 +5,58 @@ </indexterm> <indexterm><primary>cost-centre profiling</primary></indexterm> - <para> Glasgow Haskell comes with a time and space profiling - system. Its purpose is to help you improve your understanding of - your program's execution behaviour, so you can improve it.</para> + <para>GHC comes with a time and space profiling system, so that you + can answer questions like "why is my program so slow?", or "why is + my program using so much memory?".</para> - <para> Any comments, suggestions and/or improvements you have are - welcome. Recommended “profiling tricks” would be - especially cool! </para> + <para>Note that multi-processor execution (e.g. <literal>+RTS + -N2</literal>) is not currently supported with GHC's time and space + profiling. However, there is a separate tool specifically for + profiling concurrent and parallel programs: <ulink + url="http://www.haskell.org/haskellwiki/ThreadScope">ThreadScope</ulink>.</para> <para>Profiling a program is a three-step process:</para> <orderedlist> <listitem> - <para> Re-compile your program for profiling with the - <literal>-prof</literal> option, and probably one of the - <literal>-auto</literal> or <literal>-auto-all</literal> - options. These options are described in more detail in <xref - linkend="prof-compiler-options"/> </para> - <indexterm><primary><literal>-prof</literal></primary> - </indexterm> - <indexterm><primary><literal>-auto</literal></primary> - </indexterm> - <indexterm><primary><literal>-auto-all</literal></primary> - </indexterm> + <para>Re-compile your program for profiling with the + <literal>-prof</literal> option, and probably one of the options + for adding automatic annotations: + <literal>-fprof-auto</literal> is the most common<footnote><para><option>-fprof-auto</option> was known as <option>-auto-all</option><indexterm><primary><literal>-auto-all</literal></primary> + </indexterm> prior to GHC 7.4.1.</para></footnote>. + <indexterm><primary><literal>-fprof-auto</literal></primary> + </indexterm></para> + + <para>If you are using external packages with + <literal>cabal</literal>, you may need to reinstall these + packages with profiling support; typically this is done with + <literal>cabal install -p <replaceble>package</replaceble> + --reinstall</literal>.</para> </listitem> <listitem> - <para> Run your program with one of the profiling options, eg. - <literal>+RTS -p -RTS</literal>. This generates a file of - profiling information. Note that multi-processor execution - (e.g. <literal>+RTS -N2</literal>) is not supported while - profiling.</para> - <indexterm><primary><option>-p</option></primary><secondary>RTS - option</secondary></indexterm> + <para>Having compiled the program for profiling, you now need to + run it to generate the profile. For example, a simple time + profile can be generated by running the program with + <option>+RTS + -p</option><indexterm><primary><option>-p</option></primary><secondary>RTS + option</secondary></indexterm>, which generates a file named + <literal><replaceable>prog</replaceable>.prof</literal> where + <replaceable>prog</replaceable> is the name of your program + (without the <literal>.exe</literal> extension, if you are on + Windows).</para> + + <para>There are many different kinds of profile that can be + generated, selected by different RTS options. We will be + describing the various kinds of profile throughout the rest of + this chapter. Some profiles require further processing using + additional tools after running the program.</para> </listitem> <listitem> - <para> Examine the generated profiling information, using one of - GHC's profiling tools. The tool to use will depend on the kind - of profiling information generated.</para> + <para>Examine the generated profiling information, use the + information to optimise your program, and repeat as + necessary.</para> </listitem> </orderedlist> @@ -53,24 +66,24 @@ <para>GHC's profiling system assigns <firstterm>costs</firstterm> to <firstterm>cost centres</firstterm>. A cost is simply the time - or space required to evaluate an expression. Cost centres are + or space (memory) required to evaluate an expression. Cost centres are program annotations around expressions; all costs incurred by the annotated expression are assigned to the enclosing cost centre. Furthermore, GHC will remember the stack of enclosing cost centres - for any given expression at run-time and generate a call-graph of + for any given expression at run-time and generate a call-tree of cost attributions.</para> <para>Let's take a look at an example:</para> <programlisting> -main = print (nfib 25) -nfib n = if n < 2 then 1 else nfib (n-1) + nfib (n-2) +main = print (fib 30) +fib n = if n < 2 then 1 else fib (n-1) + fib (n-2) </programlisting> <para>Compile and run this program as follows:</para> <screen> -$ ghc -prof -auto-all -o Main Main.hs +$ ghc -prof -fprof-auto -rtsopts Main.hs $ ./Main +RTS -p 121393 $ @@ -78,53 +91,52 @@ $ <para>When a GHC-compiled program is run with the <option>-p</option> RTS option, it generates a file called - <filename><prog>.prof</filename>. In this case, the file + <filename><replaceable>prog</replaceable>.prof</filename>. In this case, the file will contain something like this:</para> <screen> - Fri May 12 14:06 2000 Time and Allocation Profiling Report (Final) + Wed Oct 12 16:14 2011 Time and Allocation Profiling Report (Final) Main +RTS -p -RTS - total time = 0.14 secs (7 ticks @ 20 ms) - total alloc = 8,741,204 bytes (excludes profiling overheads) + total time = 0.68 secs (34 ticks @ 20 ms) + total alloc = 204,677,844 bytes (excludes profiling overheads) -COST CENTRE MODULE %time %alloc +COST CENTRE MODULE %time %alloc -nfib Main 100.0 100.0 +fib Main 100.0 100.0 - individual inherited -COST CENTRE MODULE entries %time %alloc %time %alloc + individual inherited +COST CENTRE MODULE no. entries %time %alloc %time %alloc -MAIN MAIN 0 0.0 0.0 100.0 100.0 - main Main 0 0.0 0.0 0.0 0.0 - CAF PrelHandle 3 0.0 0.0 0.0 0.0 - CAF PrelAddr 1 0.0 0.0 0.0 0.0 - CAF Main 6 0.0 0.0 100.0 100.0 - main Main 1 0.0 0.0 100.0 100.0 - nfib Main 242785 100.0 100.0 100.0 100.0 +MAIN MAIN 102 0 0.0 0.0 100.0 100.0 + CAF GHC.IO.Handle.FD 128 0 0.0 0.0 0.0 0.0 + CAF GHC.IO.Encoding.Iconv 120 0 0.0 0.0 0.0 0.0 + CAF GHC.Conc.Signal 110 0 0.0 0.0 0.0 0.0 + CAF Main 108 0 0.0 0.0 100.0 100.0 + main Main 204 1 0.0 0.0 100.0 100.0 + fib Main 205 2692537 100.0 100.0 100.0 100.0 </screen> - <para>The first part of the file gives the program name and options, and the total time and total memory allocation measured during the run of the program (note that the total memory allocation figure isn't the same as the amount of <emphasis>live</emphasis> memory needed by the program at any one time; the latter can be determined using heap profiling, which we - will describe shortly).</para> + will describe later in <xref linkend="prof-heap" />).</para> <para>The second part of the file is a break-down by cost centre of the most costly functions in the program. In this case, there was only one significant function in the program, namely - <function>nfib</function>, and it was responsible for 100% + <function>fib</function>, and it was responsible for 100% of both the time and allocation costs of the program.</para> <para>The third and final section of the file gives a profile - break-down by cost-centre stack. This is roughly a call-graph + break-down by cost-centre stack. This is roughly a call-tree profile of the program. In the example above, it is clear that - the costly call to <function>nfib</function> came from + the costly call to <function>fib</function> came from <function>main</function>.</para> <para>The time and allocation incurred by a given part of the @@ -137,33 +149,39 @@ MAIN MAIN 0 0.0 0.0 100.0 100.0 by modifying the example slightly:</para> <programlisting> -main = print (f 25 + g 25) -f n = nfib n -g n = nfib (n `div` 2) -nfib n = if n < 2 then 1 else nfib (n-1) + nfib (n-2) +main = print (f 30 + g 30) + where + f n = fib n + g n = fib (n `div` 2) + +fib n = if n < 2 then 1 else fib (n-1) + fib (n-2) </programlisting> <para>Compile and run this program as before, and take a look at the new profiling results:</para> <screen> -COST CENTRE MODULE scc %time %alloc %time %alloc - -MAIN MAIN 0 0.0 0.0 100.0 100.0 - main Main 0 0.0 0.0 0.0 0.0 - CAF PrelHandle 3 0.0 0.0 0.0 0.0 - CAF PrelAddr 1 0.0 0.0 0.0 0.0 - CAF Main 9 0.0 0.0 100.0 100.0 - main Main 1 0.0 0.0 100.0 100.0 - g Main 1 0.0 0.0 0.0 0.2 - nfib Main 465 0.0 0.2 0.0 0.2 - f Main 1 0.0 0.0 100.0 99.8 - nfib Main 242785 100.0 99.8 100.0 99.8 +COST CENTRE MODULE no. entries %time %alloc %time %alloc + +MAIN MAIN 102 0 0.0 0.0 100.0 100.0 + CAF GHC.IO.Handle.FD 128 0 0.0 0.0 0.0 0.0 + CAF GHC.IO.Encoding.Iconv 120 0 0.0 0.0 0.0 0.0 + CAF GHC.Conc.Signal 110 0 0.0 0.0 0.0 0.0 + CAF Main 108 0 0.0 0.0 100.0 100.0 + main Main 204 1 0.0 0.0 100.0 100.0 + main.g Main 207 1 0.0 0.0 0.0 0.1 + fib Main 208 1973 0.0 0.1 0.0 0.1 + main.f Main 205 1 0.0 0.0 100.0 99.9 + fib Main 206 2692537 100.0 99.9 100.0 99.9 </screen> - <para>Now although we had two calls to <function>nfib</function> - in the program, it is immediately clear that it was the call from - <function>f</function> which took all the time.</para> + <para>Now although we had two calls to <function>fib</function> in + the program, it is immediately clear that it was the call from + <function>f</function> which took all the time. The functions + <literal>f</literal> and <literal>g</literal> which are defined in + the <literal>where</literal> clause in <literal>main</literal> are + given their own cost centres, <literal>main.f</literal> and + <literal>main.g</literal> respectively.</para> <para>The actual meaning of the various columns in the output is:</para> @@ -172,7 +190,7 @@ MAIN MAIN 0 0.0 0.0 100.0 100.0 <term>entries</term> <listitem> <para>The number of times this particular point in the call - graph was entered.</para> + tree was entered.</para> </listitem> </varlistentry> @@ -180,7 +198,7 @@ MAIN MAIN 0 0.0 0.0 100.0 100.0 <term>individual %time</term> <listitem> <para>The percentage of the total run time of the program - spent at this point in the call graph.</para> + spent at this point in the call tree.</para> </listitem> </varlistentry> @@ -197,7 +215,7 @@ MAIN MAIN 0 0.0 0.0 100.0 100.0 <term>inherited %time</term> <listitem> <para>The percentage of the total run time of the program - spent below this point in the call graph.</para> + spent below this point in the call tree.</para> </listitem> </varlistentry> @@ -242,19 +260,24 @@ MAIN MAIN 0 0.0 0.0 100.0 100.0 although GHC does keep information about which groups of functions called each other recursively, this information isn't displayed in the basic time and allocation profile, instead the call-graph is - flattened into a tree.</para> + flattened into a tree as follows: a call to a function that occurs + elsewhere on the current stack does not push another entry on the + stack, instead the costs for this call are aggregated into the + caller<footnote><para>Note that this policy has changed slightly + in GHC 7.4.1 relative to earlier versions, and may yet change + further, feedback is welcome.</para></footnote>.</para> - <sect2><title>Inserting cost centres by hand</title> + <sect2 id="scc-pragma"><title>Inserting cost centres by hand</title> <para>Cost centres are just program annotations. When you say - <option>-auto-all</option> to the compiler, it automatically - inserts a cost centre annotation around every top-level function - not marked INLINE in your program, but you are entirely free to - add the cost centre annotations yourself.</para> + <option>-fprof-auto</option> to the compiler, it automatically + inserts a cost centre annotation around every binding not marked + INLINE in your program, but you are entirely free to add cost + centre annotations yourself.</para> <para>The syntax of a cost centre annotation is</para> - <programlisting> +<programlisting> {-# SCC "name" #-} <expression> </programlisting> @@ -270,92 +293,72 @@ MAIN MAIN 0 0.0 0.0 100.0 100.0 <programlisting> main :: IO () -main = do let xs = {-# SCC "X" #-} [1..1000000] - let ys = {-# SCC "Y" #-} [1..2000000] - print $ last xs - print $ last $ init xs - print $ last ys - print $ last $ init ys +main = do let xs = [1..1000000] + let ys = [1..2000000] + print $ {-# SCC "last_xs" #-} last xs + print $ {-# SCC "last_init_xs" #-} last $ init xs + print $ {-# SCC "last_ys" #-} last ys + print $ {-# SCC "last_init_ys" #-}last $ init ys </programlisting> - <para>which gives this heap profile when run:</para> + <para>which gives this profile when run:</para> - <!-- contentwidth/contentheight don't appear to have any effect - other than making the PS file generation work, rather than - falling over. The result seems to be broken PS on the page - with the image. --> - <imagedata fileref="prof_scc" contentwidth="645px" - contentdepth="428px"/> +<screen> +COST CENTRE MODULE no. entries %time %alloc %time %alloc + +MAIN MAIN 102 0 0.0 0.0 100.0 100.0 + CAF GHC.IO.Handle.FD 130 0 0.0 0.0 0.0 0.0 + CAF GHC.IO.Encoding.Iconv 122 0 0.0 0.0 0.0 0.0 + CAF GHC.Conc.Signal 111 0 0.0 0.0 0.0 0.0 + CAF Main 108 0 0.0 0.0 100.0 100.0 + main Main 204 1 0.0 0.0 100.0 100.0 + last_init_ys Main 210 1 25.0 27.4 25.0 27.4 + main.ys Main 209 1 25.0 39.2 25.0 39.2 + last_ys Main 208 1 12.5 0.0 12.5 0.0 + last_init_xs Main 207 1 12.5 13.7 12.5 13.7 + main.xs Main 206 1 18.8 19.6 18.8 19.6 + last_xs Main 205 1 6.2 0.0 6.2 0.0 +</screen> </sect2> <sect2 id="prof-rules"> <title>Rules for attributing costs</title> - <para>The cost of evaluating any expression in your program is - attributed to a cost-centre stack using the following rules:</para> - - <itemizedlist> - <listitem> - <para>If the expression is part of the - <firstterm>one-off</firstterm> costs of evaluating the - enclosing top-level definition, then costs are attributed to - the stack of lexically enclosing <literal>SCC</literal> - annotations on top of the special <literal>CAF</literal> - cost-centre. </para> - </listitem> - - <listitem> - <para>Otherwise, costs are attributed to the stack of - lexically-enclosing <literal>SCC</literal> annotations, - appended to the cost-centre stack in effect at the - <firstterm>call site</firstterm> of the current top-level - definition<footnote> <para>The call-site is just the place - in the source code which mentions the particular function or - variable.</para></footnote>. Notice that this is a recursive - definition.</para> - </listitem> - - <listitem> - <para>Time spent in foreign code (see <xref linkend="ffi"/>) - is always attributed to the cost centre in force at the - Haskell call-site of the foreign function.</para> - </listitem> - </itemizedlist> - - <para>What do we mean by one-off costs? Well, Haskell is a lazy - language, and certain expressions are only ever evaluated once. - For example, if we write:</para> - - <programlisting> -x = nfib 25 -</programlisting> - - <para>then <varname>x</varname> will only be evaluated once (if - at all), and subsequent demands for <varname>x</varname> will - immediately get to see the cached result. The definition - <varname>x</varname> is called a CAF (Constant Applicative - Form), because it has no arguments.</para> - - <para>For the purposes of profiling, we say that the expression - <literal>nfib 25</literal> belongs to the one-off costs of - evaluating <varname>x</varname>.</para> - - <para>Since one-off costs aren't strictly speaking part of the - call-graph of the program, they are attributed to a special - top-level cost centre, <literal>CAF</literal>. There may be one - <literal>CAF</literal> cost centre for each module (the - default), or one for each top-level definition with any one-off - costs (this behaviour can be selected by giving GHC the - <option>-caf-all</option> flag).</para> - - <indexterm><primary><literal>-caf-all</literal></primary> - </indexterm> - - <para>If you think you have a weird profile, or the call-graph - doesn't look like you expect it to, feel free to send it (and - your program) to us at - <email>glasgow-haskell-bugs@haskell.org</email>.</para> + <para>While running a program with profiling turned on, GHC + maintains a cost-centre stack behind the scenes, and attributes + any costs (memory allocation and time) to whatever the current + cost-centre stack is at the time the cost is incurred.</para> + + <para>The mechanism is simple: whenever the program evaluates an + expression with an SCC annotation, <literal>{-# SCC c -#} + E</literal>, the cost centre <literal>c</literal> is pushed on + the current stack, and the entry count for this stack is + incremented by one. The stack also sometimes has to be saved + and restored; in particular when the program creates a + <firstterm>thunk</firstterm> (a lazy suspension), the current + cost-centre stack is stored in the thunk, and restored when the + thunk is evaluated. In this way, the cost-centre stack is + independent of the actual evaluation order used by GHC at + runtime.</para> + + <para>At a function call, GHC takes the stack stored in the + function being called (which for a top-level function will be + empty), and <emphasis>appends</emphasis> it to the current + stack, ignoring any prefix that is identical to a prefix of the + current stack.</para> + + <para>We mentioned earlier that lazy computations, i.e. thunks, + capture the current stack when they are created, and restore + this stack when they are evaluated. What about top-level + thunks? They are "created" when the program is compiled, so + what stack should we give them? The technical name for a + top-level thunk is a CAF ("Constant Applicative Form"). GHC + assigns every CAF in a module a stack consisting of the single + cost centre <literal>M.CAF</literal>, where <literal>M</literal> + is the name of the module. It is also possible to give each CAF + a different stack, using the option + <option>-fprof-cafs</option><indexterm><primary><option>-fprof-cafs</option></primary></indexterm>.</para> </sect2> </sect1> @@ -372,13 +375,13 @@ x = nfib 25 <indexterm><primary><option>-prof</option></primary></indexterm> </term> <listitem> - <para> To make use of the profiling system + <para>To make use of the profiling system <emphasis>all</emphasis> modules must be compiled and linked with the <option>-prof</option> option. Any <literal>SCC</literal> annotations you've put in your source will spring to life.</para> - <para> Without a <option>-prof</option> option, your + <para>Without a <option>-prof</option> option, your <literal>SCC</literal>s are ignored; so you can compile <literal>SCC</literal>-laden code without changing it.</para> @@ -394,40 +397,52 @@ x = nfib 25 <variablelist> <varlistentry> <term> - <option>-auto</option>: - <indexterm><primary><option>-auto</option></primary></indexterm> + <option>-fprof-auto</option>: + <indexterm><primary><option>-fprof-auto</option></primary></indexterm> + </term> + <listitem> + <para><emphasis>All</emphasis> bindings not marked INLINE, + whether exported or not, top level or nested, will be given + automatic <literal>SCC</literal> annotations. Functions + marked INLINE must be given a cost centre manually.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-fprof-auto-top</option>: + <indexterm><primary><option>-fprof-auto-top</option></primary></indexterm> <indexterm><primary>cost centres</primary><secondary>automatically inserting</secondary></indexterm> </term> <listitem> - <para> GHC will automatically add - <function>_scc_</function> constructs for all - top-level, exported functions not marked INLINE. If you - want a cost centre on an INLINE function, you have to add - it manually.</para> + <para>GHC will automatically add <literal>SCC</literal> + annotations for all top-level bindings not marked INLINE. If + you want a cost centre on an INLINE function, you have to + add it manually.</para> </listitem> </varlistentry> <varlistentry> <term> - <option>-auto-all</option>: - <indexterm><primary><option>-auto-all</option></primary></indexterm> + <option>-fprof-auto-exported</option>: + <indexterm><primary><option>-fprof-auto-top</option></primary></indexterm> + <indexterm><primary>cost centres</primary><secondary>automatically inserting</secondary></indexterm> </term> <listitem> - <para> <emphasis>All</emphasis> top-level functions - not marked INLINE, exported or not, will be automatically - <function>_scc_</function>'d. - The functions marked INLINE must be given a cost centre - manually.</para> + <para>GHC will automatically add <literal>SCC</literal> + annotations for all exported functions not marked + INLINE. If you want a cost centre on an INLINE function, you + have to add it manually.</para> </listitem> </varlistentry> <varlistentry> <term> - <option>-caf-all</option>: - <indexterm><primary><option>-caf-all</option></primary></indexterm> + <option>-fprof-cafs</option>: + <indexterm><primary><option>-fprof-cafs</option></primary></indexterm> </term> <listitem> - <para> The costs of all CAFs in a module are usually + <para>The costs of all CAFs in a module are usually attributed to one “big” CAF cost-centre. With this option, all CAFs get their own cost-centre. An “if all else fails” option…</para> @@ -436,17 +451,46 @@ x = nfib 25 <varlistentry> <term> - <option>-ignore-scc</option>: - <indexterm><primary><option>-ignore-scc</option></primary></indexterm> + <option>-fno-prof-auto</option>: + <indexterm><primary><option>-no-fprof-auto</option></primary></indexterm> </term> <listitem> - <para>Ignore any <function>_scc_</function> - constructs, so a module which already has - <function>_scc_</function>s can be compiled - for profiling with the annotations ignored.</para> - </listitem> + <para>Disables any previous <option>-fprof-auto</option>, + <option>-fprof-auto-top</option>, or + <option>-fprof-auto-exported</option> options. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-fno-prof-cafs</option>: + <indexterm><primary><option>-fno-prof-cafs</option></primary></indexterm> + </term> + <listitem> + <para>Disables any previous <option>-fprof-cafs</option> option. + </para> + </listitem> </varlistentry> + <varlistentry> + <term> + <option>-fno-prof-count-entries</option>: + <indexterm><primary><option>-fno-prof-count-entries</option></primary></indexterm> + </term> + <listitem> + <para>Tells GHC not to collect information about how often + functions are entered at runtime (the "entries" column of + the time profile), for this module. This tends to make the + profiled code run faster, and hence closer to the speed of + the unprofiled code, because GHC is able to optimise more + aggressively if it doesn't have to maintain correct entry + counts. This option can be useful if you aren't interested + in the entry counts (for example, if you only intend to do + heap profiling). + </para> + </listitem> + </varlistentry> </variablelist> </sect1> @@ -491,7 +535,7 @@ x = nfib 25 <listitem> <para>Sets the interval that the RTS clock ticks at, which is also the sampling interval of the time and allocation profile. - The default is 0.02 second.</para> + The default is 0.02 seconds.</para> </listitem> </varlistentry> @@ -501,13 +545,14 @@ x = nfib 25 <indexterm><primary><option>-xc</option></primary><secondary>RTS option</secondary></indexterm> </term> <listitem> - <para>This option makes use of the extra information - maintained by the cost-centre-stack profiler to provide - useful information about the location of runtime errors. - See <xref linkend="rts-options-debugging"/>.</para> - </listitem> + <para>This option causes the runtime to print out the + current cost-centre stack whenever an exception is raised. + This can be particularly useful for debugging the location + of exceptions, such as the notorious <literal>Prelude.head: + empty list</literal> error. See <xref + linkend="rts-options-debugging"/>.</para> + </listitem> </varlistentry> - </variablelist> </sect1> @@ -520,7 +565,7 @@ x = nfib 25 over time. This is useful for detecting the causes of <firstterm>space leaks</firstterm>, when your program holds on to more memory at run-time that it needs to. Space leaks lead to - longer run-times due to heavy garbage collector activity, and may + slower execution due to heavy garbage collector activity, and may even cause the program to run out of memory altogether.</para> <para>To generate a heap profile from your program:</para> @@ -532,7 +577,7 @@ x = nfib 25 </listitem> <listitem> <para>Run it with one of the heap profiling options described - below (eg. <option>-hc</option> for a basic producer profile). + below (eg. <option>-h</option> for a basic producer profile). This generates the file <filename><replaceable>prog</replaceable>.hp</filename>.</para> </listitem> @@ -550,6 +595,16 @@ x = nfib 25 </listitem> </orderedlist> + <para>For example, here is a heap profile produced for the program given above in <xref linkend="scc-pragma" />:</para> + + <!-- + contentwidth/contentheight don't appear to have any effect + other than making the PS file generation work, rather than + falling over. The result seems to be broken PS on the page + with the image. --> + <imagedata fileref="prof_scc.png" contentwidth="645px" + contentdepth="428px"/> + <para>You might also want to take a look at <ulink url="http://www.haskell.org/haskellwiki/Hp2any">hp2any</ulink>, a more advanced suite of tools (not distributed with GHC) for @@ -571,7 +626,7 @@ x = nfib 25 <indexterm><primary><option>-hc</option></primary><secondary>RTS option</secondary></indexterm> </term> <listitem> - <para>Breaks down the graph by the cost-centre stack which + <para>(can be shortened to <option>-h</option>). Breaks down the graph by the cost-centre stack which produced the data.</para> </listitem> </varlistentry> @@ -782,8 +837,9 @@ x = nfib 25 space the program is using.</para> <para>Memory occupied by threads and their stacks is - labelled as “TSO” when displaying the profile - by closure description or type description.</para> + labelled as “TSO” and “STACK” + respectively when displaying the profile by closure + description or type description.</para> </listitem> </varlistentry> @@ -1305,14 +1361,13 @@ to re-read its input file: <indexterm><primary>hpc</primary></indexterm> <para> - Code coverage tools allow a programmer to determine what parts of - their code have been actually executed, and which parts have + Code coverage tools allow a programmer to determine what parts + of their code have been actually executed, and which parts have never actually been invoked. GHC has an option for generating instrumented code that records code coverage as part of the - <ulink url="http://www.haskell.org/hpc">Haskell Program Coverage - </ulink>(HPC) toolkit, which is included with GHC. HPC tools can - be used to render the generated code coverage information into - human understandable format. </para> + Haskell Program Coverage (HPC) toolkit, which is included with + GHC. HPC tools can be used to render the generated code coverage + information into human understandable format. </para> <para> Correctly instrumented code provides coverage information of two @@ -1327,8 +1382,8 @@ to re-read its input file: <para> HPC displays both kinds of information in two primary ways: - textual reports with summary statistics (hpc report) and sources - with color mark-up (hpc markup). For boolean coverage, there + textual reports with summary statistics (<literal>hpc report</literal>) and sources + with color mark-up (<literal>hpc markup</literal>). For boolean coverage, there are four possible outcomes for each guard, condition or qualifier: both True and False values occur; only True; only False; never evaluated. In hpc-markup output, highlighting with @@ -1340,7 +1395,7 @@ to re-read its input file: <sect2><title>A small example: Reciprocation</title> <para> - For an example we have a program, called Recip.hs, which computes exact decimal + For an example we have a program, called <filename>Recip.hs</filename>, which computes exact decimal representations of reciprocals, with recurring parts indicated in brackets. </para> @@ -1377,21 +1432,35 @@ main = do main </programlisting> - <para>The HPC instrumentation is enabled using the -fhpc flag. + <para>HPC instrumentation is enabled with the -fhpc flag: </para> <screen> -$ ghc -fhpc Recip.hs --make +$ ghc -fhpc Recip.hs </screen> - <para>HPC index (.mix) files are placed in .hpc subdirectory. These can be considered like - the .hi files for HPC. - </para> + <para>GHC creates a subdirectory <filename>.hpc</filename> in the + current directory, and puts HPC index (<filename>.mix</filename>) + files in there, one for each module compiled. You don't need to + worry about these files: they contain information needed by the + <literal>hpc</literal> tool to generate the coverage data for + compiled modules after the program is run.</para> <screen> $ ./Recip 1/3 = 0.(3) </screen> - <para>We can generate a textual summary of coverage:</para> + <para>Running the program generates a file with the + <literal>.tix</literal> suffix, in this case + <filename>Recip.tix</filename>, which contains the coverage data + for this run of the program. The program may be run multiple + times (e.g. with different test data), and the coverage data from + the separate runs is accumulated in the <filename>.tix</filename> + file. To reset the coverage data and start again, just remove the + <filename>.tix</filename> file. + </para> + + <para>Having run the program, we can generate a textual summary of + coverage:</para> <screen> $ hpc report Recip 80% expressions used (81/101) @@ -1418,20 +1487,32 @@ writing Recip.hs.html </sect2> <sect2><title>Options for instrumenting code for coverage</title> - <para> - Turning on code coverage is easy, use the -fhpc flag. - Instrumented and non-instrumented can be freely mixed. - When compiling the Main module GHC automatically detects when there - is an hpc compiled file, and adds the correct initialization code. - </para> + + <variablelist> + <varlistentry> + <term><option>-fhpc</option></term> + <indexterm><primary><option>-fhpc</option></primary></indexterm> + <listitem> + <para>Enable code coverage for the current module or modules + being compiled.</para> + + <para>Modules compiled with this option can be freely mixed + with modules compiled without it; indeed, most libraries + will typically be compiled without <option>-fhpc</option>. + When the program is run, coverage data will only be + generated for those modules that were compiled with + <option>-fhpc</option>, and the <literal>hpc</literal> tool + will only show information about those modules. + </para> + </listitem> + </varlistentry> + </variablelist> </sect2> <sect2><title>The hpc toolkit</title> - <para> - The hpc toolkit uses a cvs/svn/darcs-like interface, where a - single binary contains many function units.</para> + <para>The hpc command has several sub-commands:</para> <screen> $ hpc Usage: hpc COMMAND ... @@ -1453,18 +1534,17 @@ Others: version Display version for hpc </screen> - <para>In general, these options act on .tix file after an - instrumented binary has generated it, which hpc acting as a - conduit between the raw .tix file, and the more detailed reports - produced. + <para>In general, these options act on a + <filename>.tix</filename> file after an instrumented binary has + generated it. </para> <para> The hpc tool assumes you are in the top-level directory of - the location where you built your application, and the .tix + the location where you built your application, and the <filename>.tix</filename> file is in the same top-level directory. You can use the - flag --srcdir to use hpc for any other directory, and use - --srcdir multiple times to analyse programs compiled from + flag <option>--srcdir</option> to use <literal>hpc</literal> for any other directory, and use + <option>--srcdir</option> multiple times to analyse programs compiled from difference locations, as is typical for packages. </para> @@ -1473,10 +1553,10 @@ Others: </para> <sect3><title>hpc report</title> - <para>hpc report gives a textual report of coverage. By default, + <para><literal>hpc report</literal> gives a textual report of coverage. By default, all modules and packages are considered in generating report, unless include or exclude are used. The report is a summary - unless the --per-module flag is used. The --xml-output option + unless the <option>--per-module</option> flag is used. The <option>--xml-output</option> option allows for tools to use hpc to glean coverage. </para> <screen> @@ -1497,7 +1577,7 @@ Options: </screen> </sect3> <sect3><title>hpc markup</title> - <para>hpc markup marks up source files into colored html. + <para><literal>hpc markup</literal> marks up source files into colored html. </para> <screen> $ hpc help markup @@ -1518,8 +1598,8 @@ Options: </sect3> <sect3><title>hpc sum</title> - <para>hpc sum adds together any number of .tix files into a single - .tix file. hpc sum does not change the original .tix file; it generates a new .tix file. + <para><literal>hpc sum</literal> adds together any number of <filename>.tix</filename> files into a single + <filename>.tix</filename> file. <literal>hpc sum</literal> does not change the original <filename>.tix</filename> file; it generates a new <filename>.tix</filename> file. </para> <screen> $ hpc help sum @@ -1535,10 +1615,10 @@ Options: </screen> </sect3> <sect3><title>hpc combine</title> - <para>hpc combine is the swiss army knife of hpc. It can be - used to take the difference between .tix files, to subtract one - .tix file from another, or to add two .tix files. hpc combine does not - change the original .tix file; it generates a new .tix file. + <para><literal>hpc combine</literal> is the swiss army knife of <literal>hpc</literal>. It can be + used to take the difference between <filename>.tix</filename> files, to subtract one + <filename>.tix</filename> file from another, or to add two <filename>.tix</filename> files. hpc combine does not + change the original <filename>.tix</filename> file; it generates a new <filename>.tix</filename> file. </para> <screen> $ hpc help combine @@ -1556,8 +1636,8 @@ Options: </screen> </sect3> <sect3><title>hpc map</title> - <para>hpc map inverts or zeros a .tix file. hpc map does not - change the original .tix file; it generates a new .tix file. + <para>hpc map inverts or zeros a <filename>.tix</filename> file. hpc map does not + change the original <filename>.tix</filename> file; it generates a new <filename>.tix</filename> file. </para> <screen> $ hpc help map @@ -1608,9 +1688,9 @@ Options: </sect2> <sect2><title>Caveats and Shortcomings of Haskell Program Coverage</title> <para> - HPC does not attempt to lock the .tix file, so multiple concurrently running + HPC does not attempt to lock the <filename>.tix</filename> file, so multiple concurrently running binaries in the same directory will exhibit a race condition. There is no way - to change the name of the .tix file generated, apart from renaming the binary. + to change the name of the <filename>.tix</filename> file generated, apart from renaming the binary. HPC does not work with GHCi. </para> </sect2> diff --git a/docs/users_guide/runtime_control.xml b/docs/users_guide/runtime_control.xml index 2af87bcc41..20aca07cd3 100644 --- a/docs/users_guide/runtime_control.xml +++ b/docs/users_guide/runtime_control.xml @@ -1068,7 +1068,7 @@ char *ghc_rts_opts = "-H128m -K1m"; option</secondary></indexterm> </term> <listitem> - <para>Generates a basic heap profile, in the + <para>(can be shortened to <option>-h</option>.) Generates a basic heap profile, in the file <literal><replaceable>prog</replaceable>.hp</literal>. To produce the heap profile graph, use <command>hp2ps</command> (see <xref linkend="hp2ps" @@ -1101,7 +1101,7 @@ char *ghc_rts_opts = "-H128m -K1m"; <para> In binary format to a file for later analysis by a variety of tools. One such tool - is <ulink url="http://hackage.haskell.org/package/ThreadScope">ThreadScope</ulink><indexterm><primary>ThreadScope</primary></indexterm>, + is <ulink url="http://www.haskell.org/haskellwiki/ThreadScope">ThreadScope</ulink><indexterm><primary>ThreadScope</primary></indexterm>, which interprets the event log to produce a visual parallel execution profile of the program. </para> @@ -1314,33 +1314,53 @@ char *ghc_rts_opts = "-H128m -K1m"; <para>(Only available when the program is compiled for profiling.) When an exception is raised in the program, this option causes the current cost-centre-stack to be - dumped to <literal>stderr</literal>.</para> + dumped to <literal>stderr</literal>.</para> <para>This can be particularly useful for debugging: if your program is complaining about a <literal>head []</literal> error and you haven't got a clue which bit of code is causing it, compiling with <literal>-prof - -auto-all</literal> and running with <literal>+RTS -xc + -fprof-auto</literal> and running with <literal>+RTS -xc -RTS</literal> will tell you exactly the call stack at the point the error was raised.</para> - <para>The output contains one line for each exception raised - in the program (the program might raise and catch several - exceptions during its execution), where each line is of the - form:</para> + <para>The output contains one report for each exception + raised in the program (the program might raise and catch + several exceptions during its execution), where each report + looks something like this: + </para> <screen> -< cc<subscript>1</subscript>, ..., cc<subscript>n</subscript> > +*** Exception raised (reporting due to +RTS -xc), stack trace: + GHC.List.CAF + --> evaluated by: Main.polynomial.table_search, + called from Main.polynomial.theta_index, + called from Main.polynomial, + called from Main.zonal_pressure, + called from Main.make_pressure.p, + called from Main.make_pressure, + called from Main.compute_initial_state.p, + called from Main.compute_initial_state, + called from Main.CAF + ... </screen> - <para>each <literal>cc</literal><subscript>i</subscript> is - a cost centre in the program (see <xref - linkend="cost-centres"/>), and the sequence represents the - “call stack” at the point the exception was - raised. The leftmost item is the innermost function in the - call stack, and the rightmost item is the outermost - function.</para> - - </listitem> + <para>The stack trace may often begin with something + uninformative like <literal>GHC.List.CAF</literal>; this is + an artifact of GHC's optimiser, which lifts out exceptions + to the top-level where the profiling system assigns them to + the cost centre "CAF". However, <literal>+RTS -xc</literal> + doesn't just print the current stack, it looks deeper and + reports the stack at the time the CAF was evaluated, and it + may report further stacks until a non-CAF stack is found. In + the example above, the next stack (after <literal>--> + evaluated by</literal>) contains plenty of information about + what the program was doing when it evaluated <literal>head + []</literal>.</para> + + <para>Implementation details aside, the function names in + the stack should hopefully give you enough clues to track + down the bug.</para> + </listitem> </varlistentry> <varlistentry> |