author    Simon Marlow <marlowsd@gmail.com>    2011-10-27 13:47:27 +0100
committer Simon Marlow <marlowsd@gmail.com>    2011-11-02 16:34:05 +0000
commit    7bb0447df9a783c222c2a077e35e5013c7c68d91 (patch)
tree      78d6d2a14f7e42df5cda32199c71ced973f169ef /docs
parent    bd72eeb184a95ae0ae79ccad19c8ccc2b45a12e0 (diff)
download  haskell-7bb0447df9a783c222c2a077e35e5013c7c68d91.tar.gz
Overhaul of infrastructure for profiling, coverage (HPC) and breakpoints
User visible changes
====================

Profiling
---------

Flags renamed (the old ones are still accepted for now):

  OLD        NEW
  ---------  ---------------
  -auto-all  -fprof-auto
  -auto      -fprof-exported
  -caf-all   -fprof-cafs

New flags:

  -fprof-auto              Annotates all bindings (not just top-level
                           ones) with SCCs
  -fprof-top               Annotates just top-level bindings with SCCs
  -fprof-exported          Annotates just exported bindings with SCCs
  -fprof-no-count-entries  Do not maintain entry counts when profiling
                           (can make profiled code go faster; useful
                           with heap profiling where entry counts are
                           not used)

Cost-centre stacks have a new semantics, which should in most cases
result in more useful and intuitive profiles.  If you find this not to
be the case, please let me know.  This is the area where I have been
experimenting most, and the current solution is probably not the final
version, however it does address all the outstanding bugs and seems to
be better than GHC 7.2.

Stack traces
------------

+RTS -xc now gives more information.  If the exception originates from
a CAF (as is common, because GHC tends to lift exceptions out to the
top-level), then the RTS walks up the stack and reports the stack in
the enclosing update frame(s).

Result: +RTS -xc is much more useful now - but you still have to
compile for profiling to get it.  I've played around a little with
adding 'head []' to GHC itself, and +RTS -xc does pinpoint the problem
quite accurately.

I plan to add more facilities for stack tracing (e.g. in GHCi) in the
future.

Coverage (HPC)
--------------

 * derived instances are now coloured yellow if they weren't used
 * likewise record field names
 * entry counts are more accurate (hpc --fun-entry-count)
 * tab width is now correct (markup was previously off in source with
   tabs)

Internal changes
================

In Core, the Note constructor has been replaced by

        Tick (Tickish b) (Expr b)

which is used to represent all the kinds of source annotation we
support: profiling SCCs, HPC ticks, and GHCi breakpoints.

Depending on the properties of the Tickish, different transformations
apply to Tick.  See CoreUtils.mkTick for details.

Tickets
=======

This commit closes the following tickets, test cases to follow:

  - Close #2552: not a bug, but the behaviour is now more intuitive
    (test is T2552)
  - Close #680 (test is T680)
  - Close #1531 (test is result001)
  - Close #949 (test is T949)
  - Close #2466: test case has bitrotted (doesn't compile against
    current version of vector-space package)
Diffstat (limited to 'docs')
-rw-r--r--docs/users_guide/flags.xml47
-rw-r--r--docs/users_guide/profiling.xml560
-rw-r--r--docs/users_guide/runtime_control.xml56
3 files changed, 388 insertions, 275 deletions
diff --git a/docs/users_guide/flags.xml b/docs/users_guide/flags.xml
index 244bbf64a6..e765525c13 100644
--- a/docs/users_guide/flags.xml
+++ b/docs/users_guide/flags.xml
@@ -1668,32 +1668,45 @@ phase <replaceable>n</replaceable></entry>
</thead>
<tbody>
<row>
- <entry><option>-auto</option></entry>
- <entry>Auto-add <literal>_scc_</literal>s to all
- exported functions not marked INLINE</entry>
- <entry>dynamic</entry>
- <entry><option>-no-auto</option></entry>
+ <entry><option>-prof</option></entry>
+ <entry>Turn on profiling</entry>
+ <entry>static</entry>
+ <entry>-</entry>
</row>
<row>
- <entry><option>-auto-all</option></entry>
- <entry>Auto-add <literal>_scc_</literal>s to all
- top-level functions not marked INLINE</entry>
+ <entry><option>-fprof-auto</option></entry>
+ <entry>Auto-add <literal>SCC</literal>s to all bindings
+ not marked INLINE</entry>
<entry>dynamic</entry>
- <entry><option>-no-auto-all</option></entry>
+ <entry><option>-fno-prof-auto</option></entry>
</row>
<row>
- <entry><option>-caf-all</option></entry>
- <entry>Auto-add <literal>_scc_</literal>s to all CAFs</entry>
- <entry>dynamic</entry>
- <entry><option>-no-caf-all</option></entry>
+ <entry><option>-fprof-auto-top</option></entry>
+ <entry>Auto-add <literal>SCC</literal>s to all top-level
+ bindings not marked INLINE</entry>
+ <entry>dynamic</entry>
+ <entry><option>-fno-prof-auto</option></entry>
</row>
<row>
- <entry><option>-prof</option></entry>
- <entry>Turn on profiling</entry>
- <entry>static</entry>
- <entry>-</entry>
+ <entry><option>-fprof-auto-exported</option></entry>
+ <entry>Auto-add <literal>SCC</literal>s to all exported
+ bindings not marked INLINE</entry>
+ <entry>dynamic</entry>
+ <entry><option>-fno-prof-auto</option></entry>
</row>
<row>
+ <entry><option>-fprof-cafs</option></entry>
+ <entry>Auto-add <literal>SCC</literal>s to all CAFs</entry>
+ <entry>dynamic</entry>
+ <entry><option>-fno-prof-cafs</option></entry>
+ </row>
+ <row>
+ <entry><option>-fno-prof-count-entries</option></entry>
+ <entry>Do not collect entry counts</entry>
+ <entry>dynamic</entry>
+ <entry><option>-fprof-count-entries</option></entry>
+ </row>
+ <row>
<entry><option>-ticky</option></entry>
<entry><link linkend="ticky-ticky">Turn on ticky-ticky profiling</link></entry>
<entry>static</entry>
diff --git a/docs/users_guide/profiling.xml b/docs/users_guide/profiling.xml
index 01c7576b63..16dead80ae 100644
--- a/docs/users_guide/profiling.xml
+++ b/docs/users_guide/profiling.xml
@@ -5,45 +5,58 @@
</indexterm>
<indexterm><primary>cost-centre profiling</primary></indexterm>
- <para> Glasgow Haskell comes with a time and space profiling
- system. Its purpose is to help you improve your understanding of
- your program's execution behaviour, so you can improve it.</para>
+ <para>GHC comes with a time and space profiling system, so that you
+ can answer questions like "why is my program so slow?", or "why is
+ my program using so much memory?".</para>
- <para> Any comments, suggestions and/or improvements you have are
- welcome. Recommended &ldquo;profiling tricks&rdquo; would be
- especially cool! </para>
+ <para>Note that multi-processor execution (e.g. <literal>+RTS
+ -N2</literal>) is not currently supported with GHC's time and space
+ profiling. However, there is a separate tool specifically for
+ profiling concurrent and parallel programs: <ulink
+ url="http://www.haskell.org/haskellwiki/ThreadScope">ThreadScope</ulink>.</para>
<para>Profiling a program is a three-step process:</para>
<orderedlist>
<listitem>
- <para> Re-compile your program for profiling with the
- <literal>-prof</literal> option, and probably one of the
- <literal>-auto</literal> or <literal>-auto-all</literal>
- options. These options are described in more detail in <xref
- linkend="prof-compiler-options"/> </para>
- <indexterm><primary><literal>-prof</literal></primary>
- </indexterm>
- <indexterm><primary><literal>-auto</literal></primary>
- </indexterm>
- <indexterm><primary><literal>-auto-all</literal></primary>
- </indexterm>
+ <para>Re-compile your program for profiling with the
+ <literal>-prof</literal> option, and probably one of the options
+ for adding automatic annotations:
+ <literal>-fprof-auto</literal> is the most common<footnote><para><option>-fprof-auto</option> was known as <option>-auto-all</option><indexterm><primary><literal>-auto-all</literal></primary>
+ </indexterm> prior to GHC 7.4.1.</para></footnote>.
+ <indexterm><primary><literal>-fprof-auto</literal></primary>
+ </indexterm></para>
+
+ <para>If you are using external packages with
+ <literal>cabal</literal>, you may need to reinstall these
+ packages with profiling support; typically this is done with
+ <literal>cabal install -p <replaceable>package</replaceable>
+ --reinstall</literal>.</para>
</listitem>
<listitem>
- <para> Run your program with one of the profiling options, eg.
- <literal>+RTS -p -RTS</literal>. This generates a file of
- profiling information. Note that multi-processor execution
- (e.g. <literal>+RTS -N2</literal>) is not supported while
- profiling.</para>
- <indexterm><primary><option>-p</option></primary><secondary>RTS
- option</secondary></indexterm>
+ <para>Having compiled the program for profiling, you now need to
+ run it to generate the profile. For example, a simple time
+ profile can be generated by running the program with
+ <option>+RTS
+ -p</option><indexterm><primary><option>-p</option></primary><secondary>RTS
+ option</secondary></indexterm>, which generates a file named
+ <literal><replaceable>prog</replaceable>.prof</literal> where
+ <replaceable>prog</replaceable> is the name of your program
+ (without the <literal>.exe</literal> extension, if you are on
+ Windows).</para>
+
+ <para>There are many different kinds of profile that can be
+ generated, selected by different RTS options. We will be
+ describing the various kinds of profile throughout the rest of
+ this chapter. Some profiles require further processing using
+ additional tools after running the program.</para>
</listitem>
<listitem>
- <para> Examine the generated profiling information, using one of
- GHC's profiling tools. The tool to use will depend on the kind
- of profiling information generated.</para>
+ <para>Examine the generated profiling information, use the
+ information to optimise your program, and repeat as
+ necessary.</para>
</listitem>
</orderedlist>
@@ -53,24 +66,24 @@
<para>GHC's profiling system assigns <firstterm>costs</firstterm>
to <firstterm>cost centres</firstterm>. A cost is simply the time
- or space required to evaluate an expression. Cost centres are
+ or space (memory) required to evaluate an expression. Cost centres are
program annotations around expressions; all costs incurred by the
annotated expression are assigned to the enclosing cost centre.
Furthermore, GHC will remember the stack of enclosing cost centres
- for any given expression at run-time and generate a call-graph of
+ for any given expression at run-time and generate a call-tree of
cost attributions.</para>
<para>Let's take a look at an example:</para>
<programlisting>
-main = print (nfib 25)
-nfib n = if n &lt; 2 then 1 else nfib (n-1) + nfib (n-2)
+main = print (fib 30)
+fib n = if n &lt; 2 then 1 else fib (n-1) + fib (n-2)
</programlisting>
<para>Compile and run this program as follows:</para>
<screen>
-$ ghc -prof -auto-all -o Main Main.hs
+$ ghc -prof -fprof-auto -rtsopts Main.hs
$ ./Main +RTS -p
1346269
$
@@ -78,53 +91,52 @@ $
<para>When a GHC-compiled program is run with the
<option>-p</option> RTS option, it generates a file called
- <filename>&lt;prog&gt;.prof</filename>. In this case, the file
+ <filename><replaceable>prog</replaceable>.prof</filename>. In this case, the file
will contain something like this:</para>
<screen>
- Fri May 12 14:06 2000 Time and Allocation Profiling Report (Final)
+ Wed Oct 12 16:14 2011 Time and Allocation Profiling Report (Final)
Main +RTS -p -RTS
- total time = 0.14 secs (7 ticks @ 20 ms)
- total alloc = 8,741,204 bytes (excludes profiling overheads)
+ total time = 0.68 secs (34 ticks @ 20 ms)
+ total alloc = 204,677,844 bytes (excludes profiling overheads)
-COST CENTRE MODULE %time %alloc
+COST CENTRE MODULE %time %alloc
-nfib Main 100.0 100.0
+fib Main 100.0 100.0
- individual inherited
-COST CENTRE MODULE entries %time %alloc %time %alloc
+ individual inherited
+COST CENTRE MODULE no. entries %time %alloc %time %alloc
-MAIN MAIN 0 0.0 0.0 100.0 100.0
- main Main 0 0.0 0.0 0.0 0.0
- CAF PrelHandle 3 0.0 0.0 0.0 0.0
- CAF PrelAddr 1 0.0 0.0 0.0 0.0
- CAF Main 6 0.0 0.0 100.0 100.0
- main Main 1 0.0 0.0 100.0 100.0
- nfib Main 242785 100.0 100.0 100.0 100.0
+MAIN MAIN 102 0 0.0 0.0 100.0 100.0
+ CAF GHC.IO.Handle.FD 128 0 0.0 0.0 0.0 0.0
+ CAF GHC.IO.Encoding.Iconv 120 0 0.0 0.0 0.0 0.0
+ CAF GHC.Conc.Signal 110 0 0.0 0.0 0.0 0.0
+ CAF Main 108 0 0.0 0.0 100.0 100.0
+ main Main 204 1 0.0 0.0 100.0 100.0
+ fib Main 205 2692537 100.0 100.0 100.0 100.0
</screen>
-
<para>The first part of the file gives the program name and
options, and the total time and total memory allocation measured
during the run of the program (note that the total memory
allocation figure isn't the same as the amount of
<emphasis>live</emphasis> memory needed by the program at any one
time; the latter can be determined using heap profiling, which we
- will describe shortly).</para>
+ will describe later in <xref linkend="prof-heap" />).</para>
<para>The second part of the file is a break-down by cost centre
of the most costly functions in the program. In this case, there
was only one significant function in the program, namely
- <function>nfib</function>, and it was responsible for 100&percnt;
+ <function>fib</function>, and it was responsible for 100&percnt;
of both the time and allocation costs of the program.</para>
<para>The third and final section of the file gives a profile
- break-down by cost-centre stack. This is roughly a call-graph
+ break-down by cost-centre stack. This is roughly a call-tree
profile of the program. In the example above, it is clear that
- the costly call to <function>nfib</function> came from
+ the costly call to <function>fib</function> came from
<function>main</function>.</para>
<para>The time and allocation incurred by a given part of the
@@ -137,33 +149,39 @@ MAIN MAIN 0 0.0 0.0 100.0 100.0
by modifying the example slightly:</para>
<programlisting>
-main = print (f 25 + g 25)
-f n = nfib n
-g n = nfib (n `div` 2)
-nfib n = if n &lt; 2 then 1 else nfib (n-1) + nfib (n-2)
+main = print (f 30 + g 30)
+ where
+ f n = fib n
+ g n = fib (n `div` 2)
+
+fib n = if n &lt; 2 then 1 else fib (n-1) + fib (n-2)
</programlisting>
<para>Compile and run this program as before, and take a look at
the new profiling results:</para>
<screen>
-COST CENTRE MODULE scc %time %alloc %time %alloc
-
-MAIN MAIN 0 0.0 0.0 100.0 100.0
- main Main 0 0.0 0.0 0.0 0.0
- CAF PrelHandle 3 0.0 0.0 0.0 0.0
- CAF PrelAddr 1 0.0 0.0 0.0 0.0
- CAF Main 9 0.0 0.0 100.0 100.0
- main Main 1 0.0 0.0 100.0 100.0
- g Main 1 0.0 0.0 0.0 0.2
- nfib Main 465 0.0 0.2 0.0 0.2
- f Main 1 0.0 0.0 100.0 99.8
- nfib Main 242785 100.0 99.8 100.0 99.8
+COST CENTRE MODULE no. entries %time %alloc %time %alloc
+
+MAIN MAIN 102 0 0.0 0.0 100.0 100.0
+ CAF GHC.IO.Handle.FD 128 0 0.0 0.0 0.0 0.0
+ CAF GHC.IO.Encoding.Iconv 120 0 0.0 0.0 0.0 0.0
+ CAF GHC.Conc.Signal 110 0 0.0 0.0 0.0 0.0
+ CAF Main 108 0 0.0 0.0 100.0 100.0
+ main Main 204 1 0.0 0.0 100.0 100.0
+ main.g Main 207 1 0.0 0.0 0.0 0.1
+ fib Main 208 1973 0.0 0.1 0.0 0.1
+ main.f Main 205 1 0.0 0.0 100.0 99.9
+ fib Main 206 2692537 100.0 99.9 100.0 99.9
</screen>
- <para>Now although we had two calls to <function>nfib</function>
- in the program, it is immediately clear that it was the call from
- <function>f</function> which took all the time.</para>
+ <para>Now although we had two calls to <function>fib</function> in
+ the program, it is immediately clear that it was the call from
+ <function>f</function> which took all the time. The functions
+ <literal>f</literal> and <literal>g</literal> which are defined in
+ the <literal>where</literal> clause in <literal>main</literal> are
+ given their own cost centres, <literal>main.f</literal> and
+ <literal>main.g</literal> respectively.</para>
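As a sanity check (our own sketch, not part of the manual), the entry counts in the profile above follow directly from the call structure of <literal>fib</literal>: every call either returns immediately or makes two further calls, so evaluating <literal>fib n</literal> enters the function <literal>2 * fib n - 1</literal> times. The helper <literal>callCount</literal> below is hypothetical, introduced only for this check:

```haskell
-- Cross-checking the "entries" column of the profile above.
-- fib is the function from the example; callCount is our own helper.
fib :: Int -> Integer
fib n = if n < 2 then 1 else fib (n-1) + fib (n-2)

-- Each call to fib either returns directly (n < 2) or makes exactly two
-- more calls, so the total number of entries is 2 * fib n - 1.
callCount :: Int -> Integer
callCount n = 2 * fib n - 1

main :: IO ()
main = do
  print (callCount 15)  -- entries for g's call, fib (30 `div` 2)
  print (callCount 30)  -- entries for f's call, fib 30
```

This reproduces the 1973 and 2692537 entry counts shown for the two <literal>fib</literal> rows in the profile.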
<para>The actual meaning of the various columns in the output is:</para>
@@ -172,7 +190,7 @@ MAIN MAIN 0 0.0 0.0 100.0 100.0
<term>entries</term>
<listitem>
<para>The number of times this particular point in the call
- graph was entered.</para>
+ tree was entered.</para>
</listitem>
</varlistentry>
@@ -180,7 +198,7 @@ MAIN MAIN 0 0.0 0.0 100.0 100.0
<term>individual &percnt;time</term>
<listitem>
<para>The percentage of the total run time of the program
- spent at this point in the call graph.</para>
+ spent at this point in the call tree.</para>
</listitem>
</varlistentry>
@@ -197,7 +215,7 @@ MAIN MAIN 0 0.0 0.0 100.0 100.0
<term>inherited &percnt;time</term>
<listitem>
<para>The percentage of the total run time of the program
- spent below this point in the call graph.</para>
+ spent below this point in the call tree.</para>
</listitem>
</varlistentry>
@@ -242,19 +260,24 @@ MAIN MAIN 0 0.0 0.0 100.0 100.0
although GHC does keep information about which groups of functions
called each other recursively, this information isn't displayed in
the basic time and allocation profile, instead the call-graph is
- flattened into a tree.</para>
+ flattened into a tree as follows: a call to a function that occurs
+ elsewhere on the current stack does not push another entry on the
+ stack, instead the costs for this call are aggregated into the
+ caller<footnote><para>Note that this policy has changed slightly
+ in GHC 7.4.1 relative to earlier versions, and may yet change
+ further; feedback is welcome.</para></footnote>.</para>
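To make the flattening rule concrete, here is a minimal sketch of our own (not an example from the manual). Compiled with <literal>-prof -fprof-auto</literal>, every recursive call of <literal>sumTo</literal> finds <literal>sumTo</literal> already on the cost-centre stack, so no new entry is pushed and all the recursive costs are aggregated into the single <literal>sumTo</literal> row of the profile:

```haskell
-- A directly recursive function. Under cost-centre profiling, a call to
-- sumTo from within sumTo does not push a second "sumTo" frame; the
-- costs of the whole recursive nest are aggregated into one entry.
sumTo :: Int -> Int
sumTo 0 = 0
sumTo n = n + sumTo (n - 1)

main :: IO ()
main = print (sumTo 100)
```

The program runs normally without profiling; the aggregation only affects how costs are reported when compiled with <literal>-prof</literal>.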
- <sect2><title>Inserting cost centres by hand</title>
+ <sect2 id="scc-pragma"><title>Inserting cost centres by hand</title>
<para>Cost centres are just program annotations. When you say
- <option>-auto-all</option> to the compiler, it automatically
- inserts a cost centre annotation around every top-level function
- not marked INLINE in your program, but you are entirely free to
- add the cost centre annotations yourself.</para>
+ <option>-fprof-auto</option> to the compiler, it automatically
+ inserts a cost centre annotation around every binding not marked
+ INLINE in your program, but you are entirely free to add cost
+ centre annotations yourself.</para>
<para>The syntax of a cost centre annotation is</para>
- <programlisting>
+<programlisting>
{-# SCC "name" #-} &lt;expression&gt;
</programlisting>
@@ -270,92 +293,72 @@ MAIN MAIN 0 0.0 0.0 100.0 100.0
<programlisting>
main :: IO ()
-main = do let xs = {-# SCC "X" #-} [1..1000000]
- let ys = {-# SCC "Y" #-} [1..2000000]
- print $ last xs
- print $ last $ init xs
- print $ last ys
- print $ last $ init ys
+main = do let xs = [1..1000000]
+ let ys = [1..2000000]
+ print $ {-# SCC "last_xs" #-} last xs
+ print $ {-# SCC "last_init_xs" #-} last $ init xs
+ print $ {-# SCC "last_ys" #-} last ys
+ print $ {-# SCC "last_init_ys" #-} last $ init ys
</programlisting>
- <para>which gives this heap profile when run:</para>
+ <para>which gives this profile when run:</para>
- <!-- contentwidth/contentheight don't appear to have any effect
- other than making the PS file generation work, rather than
- falling over. The result seems to be broken PS on the page
- with the image. -->
- <imagedata fileref="prof_scc" contentwidth="645px"
- contentdepth="428px"/>
+<screen>
+COST CENTRE MODULE no. entries %time %alloc %time %alloc
+
+MAIN MAIN 102 0 0.0 0.0 100.0 100.0
+ CAF GHC.IO.Handle.FD 130 0 0.0 0.0 0.0 0.0
+ CAF GHC.IO.Encoding.Iconv 122 0 0.0 0.0 0.0 0.0
+ CAF GHC.Conc.Signal 111 0 0.0 0.0 0.0 0.0
+ CAF Main 108 0 0.0 0.0 100.0 100.0
+ main Main 204 1 0.0 0.0 100.0 100.0
+ last_init_ys Main 210 1 25.0 27.4 25.0 27.4
+ main.ys Main 209 1 25.0 39.2 25.0 39.2
+ last_ys Main 208 1 12.5 0.0 12.5 0.0
+ last_init_xs Main 207 1 12.5 13.7 12.5 13.7
+ main.xs Main 206 1 18.8 19.6 18.8 19.6
+ last_xs Main 205 1 6.2 0.0 6.2 0.0
+</screen>
</sect2>
<sect2 id="prof-rules">
<title>Rules for attributing costs</title>
- <para>The cost of evaluating any expression in your program is
- attributed to a cost-centre stack using the following rules:</para>
-
- <itemizedlist>
- <listitem>
- <para>If the expression is part of the
- <firstterm>one-off</firstterm> costs of evaluating the
- enclosing top-level definition, then costs are attributed to
- the stack of lexically enclosing <literal>SCC</literal>
- annotations on top of the special <literal>CAF</literal>
- cost-centre. </para>
- </listitem>
-
- <listitem>
- <para>Otherwise, costs are attributed to the stack of
- lexically-enclosing <literal>SCC</literal> annotations,
- appended to the cost-centre stack in effect at the
- <firstterm>call site</firstterm> of the current top-level
- definition<footnote> <para>The call-site is just the place
- in the source code which mentions the particular function or
- variable.</para></footnote>. Notice that this is a recursive
- definition.</para>
- </listitem>
-
- <listitem>
- <para>Time spent in foreign code (see <xref linkend="ffi"/>)
- is always attributed to the cost centre in force at the
- Haskell call-site of the foreign function.</para>
- </listitem>
- </itemizedlist>
-
- <para>What do we mean by one-off costs? Well, Haskell is a lazy
- language, and certain expressions are only ever evaluated once.
- For example, if we write:</para>
-
- <programlisting>
-x = nfib 25
-</programlisting>
-
- <para>then <varname>x</varname> will only be evaluated once (if
- at all), and subsequent demands for <varname>x</varname> will
- immediately get to see the cached result. The definition
- <varname>x</varname> is called a CAF (Constant Applicative
- Form), because it has no arguments.</para>
-
- <para>For the purposes of profiling, we say that the expression
- <literal>nfib 25</literal> belongs to the one-off costs of
- evaluating <varname>x</varname>.</para>
-
- <para>Since one-off costs aren't strictly speaking part of the
- call-graph of the program, they are attributed to a special
- top-level cost centre, <literal>CAF</literal>. There may be one
- <literal>CAF</literal> cost centre for each module (the
- default), or one for each top-level definition with any one-off
- costs (this behaviour can be selected by giving GHC the
- <option>-caf-all</option> flag).</para>
-
- <indexterm><primary><literal>-caf-all</literal></primary>
- </indexterm>
-
- <para>If you think you have a weird profile, or the call-graph
- doesn't look like you expect it to, feel free to send it (and
- your program) to us at
- <email>glasgow-haskell-bugs@haskell.org</email>.</para>
+ <para>While running a program with profiling turned on, GHC
+ maintains a cost-centre stack behind the scenes, and attributes
+ any costs (memory allocation and time) to whatever the current
+ cost-centre stack is at the time the cost is incurred.</para>
+
+ <para>The mechanism is simple: whenever the program evaluates an
+ expression with an SCC annotation, <literal>{-# SCC c #-}
+ E</literal>, the cost centre <literal>c</literal> is pushed on
+ the current stack, and the entry count for this stack is
+ incremented by one. The stack also sometimes has to be saved
+ and restored; in particular when the program creates a
+ <firstterm>thunk</firstterm> (a lazy suspension), the current
+ cost-centre stack is stored in the thunk, and restored when the
+ thunk is evaluated. In this way, the cost-centre stack is
+ independent of the actual evaluation order used by GHC at
+ runtime.</para>
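A small sketch (our own, with hypothetical cost-centre names) makes the thunk rule concrete: the thunk for <literal>xs</literal> below is created under the <literal>"build"</literal> cost centre, so when <literal>sum</literal> later forces it under <literal>"use"</literal>, the list's allocation is still charged to <literal>"build"</literal>, independent of evaluation order. Since SCC annotations are ignored without <literal>-prof</literal>, the program compiles and runs either way:

```haskell
-- The thunk for xs is created under the "build" cost centre; the
-- current cost-centre stack is stored in the thunk, so forcing it
-- later under "use" still attributes the allocation to "build".
main :: IO ()
main = do
  let xs = {-# SCC "build" #-} map (* 2) [1 .. 1000 :: Int]
  print ({-# SCC "use" #-} sum xs)
```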
+
+ <para>At a function call, GHC takes the stack stored in the
+ function being called (which for a top-level function will be
+ empty), and <emphasis>appends</emphasis> it to the current
+ stack, ignoring any prefix that is identical to a prefix of the
+ current stack.</para>
+
+ <para>We mentioned earlier that lazy computations, i.e. thunks,
+ capture the current stack when they are created, and restore
+ this stack when they are evaluated. What about top-level
+ thunks? They are "created" when the program is compiled, so
+ what stack should we give them? The technical name for a
+ top-level thunk is a CAF ("Constant Applicative Form"). GHC
+ assigns every CAF in a module a stack consisting of the single
+ cost centre <literal>M.CAF</literal>, where <literal>M</literal>
+ is the name of the module. It is also possible to give each CAF
+ a different stack, using the option
+ <option>-fprof-cafs</option><indexterm><primary><option>-fprof-cafs</option></primary></indexterm>.</para>
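As a concrete illustration (our own example, not from the manual), <literal>table</literal> below is a CAF: a top-level thunk with no arguments, evaluated at most once. By default its one-off evaluation cost would be charged to the <literal>Main.CAF</literal> cost centre; with <literal>-fprof-cafs</literal> it would get its own:

```haskell
-- table is a CAF ("Constant Applicative Form"): a top-level thunk,
-- evaluated at most once and cached thereafter. Its evaluation cost is
-- attributed to Main.CAF by default, or to an individual cost centre
-- when compiled with -fprof-cafs.
table :: [Int]
table = map (^ 2) [1 .. 100]

main :: IO ()
main = print (sum table)
```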
</sect2>
</sect1>
@@ -372,13 +375,13 @@ x = nfib 25
<indexterm><primary><option>-prof</option></primary></indexterm>
</term>
<listitem>
- <para> To make use of the profiling system
+ <para>To make use of the profiling system
<emphasis>all</emphasis> modules must be compiled and linked
with the <option>-prof</option> option. Any
<literal>SCC</literal> annotations you've put in your source
will spring to life.</para>
- <para> Without a <option>-prof</option> option, your
+ <para>Without a <option>-prof</option> option, your
<literal>SCC</literal>s are ignored; so you can compile
<literal>SCC</literal>-laden code without changing
it.</para>
@@ -394,40 +397,52 @@ x = nfib 25
<variablelist>
<varlistentry>
<term>
- <option>-auto</option>:
- <indexterm><primary><option>-auto</option></primary></indexterm>
+ <option>-fprof-auto</option>:
+ <indexterm><primary><option>-fprof-auto</option></primary></indexterm>
+ </term>
+ <listitem>
+ <para><emphasis>All</emphasis> bindings not marked INLINE,
+ whether exported or not, top level or nested, will be given
+ automatic <literal>SCC</literal> annotations. Functions
+ marked INLINE must be given a cost centre manually.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option>-fprof-auto-top</option>:
+ <indexterm><primary><option>-fprof-auto-top</option></primary></indexterm>
<indexterm><primary>cost centres</primary><secondary>automatically inserting</secondary></indexterm>
</term>
<listitem>
- <para> GHC will automatically add
- <function>&lowbar;scc&lowbar;</function> constructs for all
- top-level, exported functions not marked INLINE. If you
- want a cost centre on an INLINE function, you have to add
- it manually.</para>
+ <para>GHC will automatically add <literal>SCC</literal>
+ annotations for all top-level bindings not marked INLINE. If
+ you want a cost centre on an INLINE function, you have to
+ add it manually.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
- <option>-auto-all</option>:
- <indexterm><primary><option>-auto-all</option></primary></indexterm>
+ <option>-fprof-auto-exported</option>:
+ <indexterm><primary><option>-fprof-auto-exported</option></primary></indexterm>
+ <indexterm><primary>cost centres</primary><secondary>automatically inserting</secondary></indexterm>
</term>
<listitem>
- <para> <emphasis>All</emphasis> top-level functions
- not marked INLINE, exported or not, will be automatically
- <function>&lowbar;scc&lowbar;</function>'d.
- The functions marked INLINE must be given a cost centre
- manually.</para>
+ <para>GHC will automatically add <literal>SCC</literal>
+ annotations for all exported functions not marked
+ INLINE. If you want a cost centre on an INLINE function, you
+ have to add it manually.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
- <option>-caf-all</option>:
- <indexterm><primary><option>-caf-all</option></primary></indexterm>
+ <option>-fprof-cafs</option>:
+ <indexterm><primary><option>-fprof-cafs</option></primary></indexterm>
</term>
<listitem>
- <para> The costs of all CAFs in a module are usually
+ <para>The costs of all CAFs in a module are usually
attributed to one &ldquo;big&rdquo; CAF cost-centre. With
this option, all CAFs get their own cost-centre. An
&ldquo;if all else fails&rdquo; option&hellip;</para>
@@ -436,17 +451,46 @@ x = nfib 25
<varlistentry>
<term>
- <option>-ignore-scc</option>:
- <indexterm><primary><option>-ignore-scc</option></primary></indexterm>
+ <option>-fno-prof-auto</option>:
+ <indexterm><primary><option>-fno-prof-auto</option></primary></indexterm>
</term>
<listitem>
- <para>Ignore any <function>&lowbar;scc&lowbar;</function>
- constructs, so a module which already has
- <function>&lowbar;scc&lowbar;</function>s can be compiled
- for profiling with the annotations ignored.</para>
- </listitem>
+ <para>Disables any previous <option>-fprof-auto</option>,
+ <option>-fprof-auto-top</option>, or
+ <option>-fprof-auto-exported</option> options.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option>-fno-prof-cafs</option>:
+ <indexterm><primary><option>-fno-prof-cafs</option></primary></indexterm>
+ </term>
+ <listitem>
+ <para>Disables any previous <option>-fprof-cafs</option> option.
+ </para>
+ </listitem>
</varlistentry>
+ <varlistentry>
+ <term>
+ <option>-fno-prof-count-entries</option>:
+ <indexterm><primary><option>-fno-prof-count-entries</option></primary></indexterm>
+ </term>
+ <listitem>
+ <para>Tells GHC not to collect information about how often
+ functions are entered at runtime (the "entries" column of
+ the time profile), for this module. This tends to make the
+ profiled code run faster, and hence closer to the speed of
+ the unprofiled code, because GHC is able to optimise more
+ aggressively if it doesn't have to maintain correct entry
+ counts. This option can be useful if you aren't interested
+ in the entry counts (for example, if you only intend to do
+ heap profiling).
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</sect1>
@@ -491,7 +535,7 @@ x = nfib 25
<listitem>
<para>Sets the interval that the RTS clock ticks at, which is
also the sampling interval of the time and allocation profile.
- The default is 0.02&nbsp;second.</para>
+ The default is 0.02&nbsp;seconds.</para>
</listitem>
</varlistentry>
@@ -501,13 +545,14 @@ x = nfib 25
<indexterm><primary><option>-xc</option></primary><secondary>RTS option</secondary></indexterm>
</term>
<listitem>
- <para>This option makes use of the extra information
- maintained by the cost-centre-stack profiler to provide
- useful information about the location of runtime errors.
- See <xref linkend="rts-options-debugging"/>.</para>
- </listitem>
+ <para>This option causes the runtime to print out the
+ current cost-centre stack whenever an exception is raised.
+ This can be particularly useful for debugging the location
+ of exceptions, such as the notorious <literal>Prelude.head:
+ empty list</literal> error. See <xref
+ linkend="rts-options-debugging"/>.</para>
+ </listitem>
</varlistentry>
-
</variablelist>
</sect1>
@@ -520,7 +565,7 @@ x = nfib 25
over time. This is useful for detecting the causes of
<firstterm>space leaks</firstterm>, when your program holds on to
more memory at run-time than it needs to. Space leaks lead to
- longer run-times due to heavy garbage collector activity, and may
+ slower execution due to heavy garbage collector activity, and may
even cause the program to run out of memory altogether.</para>
<para>To generate a heap profile from your program:</para>
@@ -532,7 +577,7 @@ x = nfib 25
</listitem>
<listitem>
<para>Run it with one of the heap profiling options described
- below (eg. <option>-hc</option> for a basic producer profile).
+ below (eg. <option>-h</option> for a basic producer profile).
This generates the file
<filename><replaceable>prog</replaceable>.hp</filename>.</para>
</listitem>
@@ -550,6 +595,16 @@ x = nfib 25
</listitem>
</orderedlist>
+ <para>For example, here is a heap profile produced for the program given above in <xref linkend="scc-pragma" />:</para>
+
+ <!--
+ contentwidth/contentheight don't appear to have any effect
+ other than making the PS file generation work, rather than
+ falling over. The result seems to be broken PS on the page
+ with the image. -->
+ <imagedata fileref="prof_scc.png" contentwidth="645px"
+ contentdepth="428px"/>
+
<para>You might also want to take a look
at <ulink url="http://www.haskell.org/haskellwiki/Hp2any">hp2any</ulink>,
a more advanced suite of tools (not distributed with GHC) for
@@ -571,7 +626,7 @@ x = nfib 25
<indexterm><primary><option>-hc</option></primary><secondary>RTS option</secondary></indexterm>
</term>
<listitem>
- <para>Breaks down the graph by the cost-centre stack which
+ <para>(can be shortened to <option>-h</option>). Breaks down the graph by the cost-centre stack which
produced the data.</para>
</listitem>
</varlistentry>
@@ -782,8 +837,9 @@ x = nfib 25
space the program is using.</para>
<para>Memory occupied by threads and their stacks is
- labelled as &ldquo;TSO&rdquo; when displaying the profile
- by closure description or type description.</para>
+ labelled as &ldquo;TSO&rdquo; and &ldquo;STACK&rdquo;
+ respectively when displaying the profile by closure
+ description or type description.</para>
</listitem>
</varlistentry>
@@ -1305,14 +1361,13 @@ to re-read its input file:
<indexterm><primary>hpc</primary></indexterm>
<para>
- Code coverage tools allow a programmer to determine what parts of
- their code have been actually executed, and which parts have
+ Code coverage tools allow a programmer to determine what parts
+ of their code have actually been executed, and which parts have
never actually been invoked. GHC has an option for generating
instrumented code that records code coverage as part of the
- <ulink url="http://www.haskell.org/hpc">Haskell Program Coverage
- </ulink>(HPC) toolkit, which is included with GHC. HPC tools can
- be used to render the generated code coverage information into
- human understandable format. </para>
+ Haskell Program Coverage (HPC) toolkit, which is included with
+ GHC. HPC tools can be used to render the generated code coverage
+ information into a human-understandable format.</para>
<para>
Correctly instrumented code provides coverage information of two
@@ -1327,8 +1382,8 @@ to re-read its input file:
<para>
HPC displays both kinds of information in two primary ways:
- textual reports with summary statistics (hpc report) and sources
- with color mark-up (hpc markup). For boolean coverage, there
+ textual reports with summary statistics (<literal>hpc report</literal>) and sources
+ with color mark-up (<literal>hpc markup</literal>). For boolean coverage, there
are four possible outcomes for each guard, condition or
qualifier: both True and False values occur; only True; only
False; never evaluated. In hpc-markup output, highlighting with
@@ -1340,7 +1395,7 @@ to re-read its input file:
<sect2><title>A small example: Reciprocation</title>
<para>
- For an example we have a program, called Recip.hs, which computes exact decimal
+ As an example, consider a program <filename>Recip.hs</filename> that computes exact decimal
representations of reciprocals, with recurring parts indicated in
brackets.
</para>
@@ -1377,21 +1432,35 @@ main = do
main
</programlisting>
- <para>The HPC instrumentation is enabled using the -fhpc flag.
+ <para>HPC instrumentation is enabled with the <option>-fhpc</option> flag:
</para>
<screen>
-$ ghc -fhpc Recip.hs --make
+$ ghc -fhpc Recip.hs
</screen>
- <para>HPC index (.mix) files are placed in .hpc subdirectory. These can be considered like
- the .hi files for HPC.
- </para>
+ <para>GHC creates a subdirectory <filename>.hpc</filename> in the
+ current directory, and puts HPC index (<filename>.mix</filename>)
+ files in there, one for each module compiled. You don't need to
+ worry about these files: they contain information needed by the
+ <literal>hpc</literal> tool to generate the coverage data for
+ compiled modules after the program is run.</para>
<screen>
$ ./Recip
1/3
= 0.(3)
</screen>
- <para>We can generate a textual summary of coverage:</para>
+ <para>Running the program generates a file with the
+ <literal>.tix</literal> suffix, in this case
+ <filename>Recip.tix</filename>, which contains the coverage data
+ for this run of the program. The program may be run multiple
+ times (e.g. with different test data), and the coverage data from
+ the separate runs is accumulated in the <filename>.tix</filename>
+ file. To reset the coverage data and start again, just remove the
+ <filename>.tix</filename> file.
+ </para>
+
+ <para>Having run the program, we can generate a textual summary of
+ coverage:</para>
<screen>
$ hpc report Recip
80% expressions used (81/101)
@@ -1418,20 +1487,32 @@ writing Recip.hs.html
</sect2>
<sect2><title>Options for instrumenting code for coverage</title>
- <para>
- Turning on code coverage is easy, use the -fhpc flag.
- Instrumented and non-instrumented can be freely mixed.
- When compiling the Main module GHC automatically detects when there
- is an hpc compiled file, and adds the correct initialization code.
- </para>
+
+ <variablelist>
+ <varlistentry>
+ <term><option>-fhpc</option></term>
+ <indexterm><primary><option>-fhpc</option></primary></indexterm>
+ <listitem>
+ <para>Enable code coverage for the current module or modules
+ being compiled.</para>
+
+ <para>Modules compiled with this option can be freely mixed
+ with modules compiled without it; indeed, most libraries
+ will typically be compiled without <option>-fhpc</option>.
+ When the program is run, coverage data will only be
+ generated for those modules that were compiled with
+ <option>-fhpc</option>, and the <literal>hpc</literal> tool
+ will only show information about those modules.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
</sect2>
<sect2><title>The hpc toolkit</title>
- <para>
- The hpc toolkit uses a cvs/svn/darcs-like interface, where a
- single binary contains many function units.</para>
+ <para>The hpc command has several sub-commands:</para>
<screen>
$ hpc
Usage: hpc COMMAND ...
@@ -1453,18 +1534,17 @@ Others:
version Display version for hpc
</screen>
- <para>In general, these options act on .tix file after an
- instrumented binary has generated it, which hpc acting as a
- conduit between the raw .tix file, and the more detailed reports
- produced.
+ <para>In general, these commands act on a
+ <filename>.tix</filename> file after an instrumented binary has
+ generated it.
</para>
<para>
The hpc tool assumes you are in the top-level directory of
- the location where you built your application, and the .tix
+ the location where you built your application, and the <filename>.tix</filename>
file is in the same top-level directory. You can use the
- flag --srcdir to use hpc for any other directory, and use
- --srcdir multiple times to analyse programs compiled from
+ flag <option>--srcdir</option> to use <literal>hpc</literal> for any other directory, and use
+ <option>--srcdir</option> multiple times to analyse programs compiled from
different locations, as is typical for packages.
</para>
@@ -1473,10 +1553,10 @@ Others:
</para>
<sect3><title>hpc report</title>
- <para>hpc report gives a textual report of coverage. By default,
+ <para><literal>hpc report</literal> gives a textual report of coverage. By default,
all modules and packages are considered in generating the report,
unless include or exclude are used. The report is a summary
- unless the --per-module flag is used. The --xml-output option
+ unless the <option>--per-module</option> flag is used. The <option>--xml-output</option> option
allows for tools to use hpc to glean coverage.
</para>
<screen>
@@ -1497,7 +1577,7 @@ Options:
</screen>
</sect3>
<sect3><title>hpc markup</title>
- <para>hpc markup marks up source files into colored html.
+ <para><literal>hpc markup</literal> marks up source files into colored HTML.
</para>
<screen>
$ hpc help markup
@@ -1518,8 +1598,8 @@ Options:
</sect3>
<sect3><title>hpc sum</title>
- <para>hpc sum adds together any number of .tix files into a single
- .tix file. hpc sum does not change the original .tix file; it generates a new .tix file.
+ <para><literal>hpc sum</literal> adds together any number of <filename>.tix</filename> files into a single
+ <filename>.tix</filename> file. <literal>hpc sum</literal> does not change the original <filename>.tix</filename> file; it generates a new <filename>.tix</filename> file.
</para>
<screen>
$ hpc help sum
@@ -1535,10 +1615,10 @@ Options:
</screen>
</sect3>
<sect3><title>hpc combine</title>
- <para>hpc combine is the swiss army knife of hpc. It can be
- used to take the difference between .tix files, to subtract one
- .tix file from another, or to add two .tix files. hpc combine does not
- change the original .tix file; it generates a new .tix file.
+ <para><literal>hpc combine</literal> is the Swiss army knife of <literal>hpc</literal>. It can be
+ used to take the difference between <filename>.tix</filename> files, to subtract one
+ <filename>.tix</filename> file from another, or to add two <filename>.tix</filename> files. <literal>hpc combine</literal> does not
+ change the original <filename>.tix</filename> file; it generates a new <filename>.tix</filename> file.
</para>
<screen>
$ hpc help combine
@@ -1556,8 +1636,8 @@ Options:
</screen>
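Conceptually, these operations are pointwise arithmetic on the per-tick entry counts recorded in a <filename>.tix</filename> file. A rough Haskell sketch of that idea (illustrative only: the names and list representation here are made up, not hpc's actual data format):

```haskell
-- A .tix file, very roughly, records one entry count per tick box.
type TickCounts = [Integer]

-- Adding two runs accumulates their counts.
addTix :: TickCounts -> TickCounts -> TickCounts
addTix = zipWith (+)

-- Subtracting removes one run's counts from another, flooring at zero.
subTix :: TickCounts -> TickCounts -> TickCounts
subTix = zipWith (\a b -> max 0 (a - b))

main :: IO ()
main = print (addTix [1, 0, 2] [3, 4, 0])
```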
</sect3>
<sect3><title>hpc map</title>
- <para>hpc map inverts or zeros a .tix file. hpc map does not
- change the original .tix file; it generates a new .tix file.
+ <para><literal>hpc map</literal> inverts or zeros a <filename>.tix</filename> file. <literal>hpc map</literal> does not
+ change the original <filename>.tix</filename> file; it generates a new <filename>.tix</filename> file.
</para>
<screen>
$ hpc help map
@@ -1608,9 +1688,9 @@ Options:
</sect2>
<sect2><title>Caveats and Shortcomings of Haskell Program Coverage</title>
<para>
- HPC does not attempt to lock the .tix file, so multiple concurrently running
+ HPC does not attempt to lock the <filename>.tix</filename> file, so multiple concurrently running
binaries in the same directory will exhibit a race condition. There is no way
- to change the name of the .tix file generated, apart from renaming the binary.
+ to change the name of the <filename>.tix</filename> file generated, apart from renaming the binary.
HPC does not work with GHCi.
</para>
</sect2>
diff --git a/docs/users_guide/runtime_control.xml b/docs/users_guide/runtime_control.xml
index 2af87bcc41..20aca07cd3 100644
--- a/docs/users_guide/runtime_control.xml
+++ b/docs/users_guide/runtime_control.xml
@@ -1068,7 +1068,7 @@ char *ghc_rts_opts = "-H128m -K1m";
option</secondary></indexterm>
</term>
<listitem>
- <para>Generates a basic heap profile, in the
+ <para>(can be shortened to <option>-h</option>). Generates a basic heap profile, in the
file <literal><replaceable>prog</replaceable>.hp</literal>.
To produce the heap profile graph,
use <command>hp2ps</command> (see <xref linkend="hp2ps"
@@ -1101,7 +1101,7 @@ char *ghc_rts_opts = "-H128m -K1m";
<para>
In binary format to a file for later analysis by a
variety of tools. One such tool
- is <ulink url="http://hackage.haskell.org/package/ThreadScope">ThreadScope</ulink><indexterm><primary>ThreadScope</primary></indexterm>,
+ is <ulink url="http://www.haskell.org/haskellwiki/ThreadScope">ThreadScope</ulink><indexterm><primary>ThreadScope</primary></indexterm>,
which interprets the event log to produce a visual parallel
execution profile of the program.
</para>
@@ -1314,33 +1314,53 @@ char *ghc_rts_opts = "-H128m -K1m";
<para>(Only available when the program is compiled for
profiling.) When an exception is raised in the program,
this option causes the current cost-centre-stack to be
- dumped to <literal>stderr</literal>.</para>
+ dumped to <literal>stderr</literal>.</para>
<para>This can be particularly useful for debugging: if your
program is complaining about a <literal>head []</literal>
error and you haven't got a clue which bit of code is
causing it, compiling with <literal>-prof
- -auto-all</literal> and running with <literal>+RTS -xc
+ -fprof-auto</literal> and running with <literal>+RTS -xc
-RTS</literal> will tell you exactly the call stack at the
point the error was raised.</para>
- <para>The output contains one line for each exception raised
- in the program (the program might raise and catch several
- exceptions during its execution), where each line is of the
- form:</para>
+ <para>The output contains one report for each exception
+ raised in the program (the program might raise and catch
+ several exceptions during its execution), where each report
+ looks something like this:
+ </para>
<screen>
-&lt; cc<subscript>1</subscript>, ..., cc<subscript>n</subscript> &gt;
+*** Exception raised (reporting due to +RTS -xc), stack trace:
+ GHC.List.CAF
+ --> evaluated by: Main.polynomial.table_search,
+ called from Main.polynomial.theta_index,
+ called from Main.polynomial,
+ called from Main.zonal_pressure,
+ called from Main.make_pressure.p,
+ called from Main.make_pressure,
+ called from Main.compute_initial_state.p,
+ called from Main.compute_initial_state,
+ called from Main.CAF
+ ...
</screen>
- <para>each <literal>cc</literal><subscript>i</subscript> is
- a cost centre in the program (see <xref
- linkend="cost-centres"/>), and the sequence represents the
- &ldquo;call stack&rdquo; at the point the exception was
- raised. The leftmost item is the innermost function in the
- call stack, and the rightmost item is the outermost
- function.</para>
-
- </listitem>
+ <para>The stack trace may often begin with something
+ uninformative like <literal>GHC.List.CAF</literal>; this is
+ an artifact of GHC's optimiser, which lifts out exceptions
+ to the top-level where the profiling system assigns them to
+ the cost centre &ldquo;CAF&rdquo;. However, <literal>+RTS -xc</literal>
+ doesn't just print the current stack; it looks deeper and
+ reports the stack at the time the CAF was evaluated, and it
+ may report further stacks until a non-CAF stack is found. In
+ the example above, the next stack (after <literal>-->
+ evaluated by</literal>) contains plenty of information about
+ what the program was doing when it evaluated <literal>head
+ []</literal>.</para>
+
+ <para>Implementation details aside, the function names in
+ the stack should hopefully give you enough clues to track
+ down the bug.</para>
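As a concrete, hypothetical example of the kind of bug such a trace pins down, consider a partial function hiding behind an innocuous wrapper (the names below are invented for illustration):

```haskell
-- firstWord is partial: `head` fails when `words` returns [].
-- Built with `ghc -prof -fprof-auto` and run with `+RTS -xc`, the
-- resulting report would name the enclosing cost centres, pointing
-- at the call chain through main and firstWord.
firstWord :: String -> String
firstWord = head . words

main :: IO ()
main = putStrLn (firstWord "hello world")
```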
+ </listitem>
</varlistentry>
<varlistentry>