summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorMatthew Pickering <matthewtpickering@gmail.com>2022-06-27 13:04:07 +0100
committersheaf <sam.derbyshire@gmail.com>2023-02-13 11:36:41 +0100
commit1e9eac1c01077c2cd3ce5bf0e4a77c0affaf88e9 (patch)
tree6c5338340c06f6ddb937fe8e1d339606d1b46a52
parent133516af8426d775fa0dc75c787edd56299ee6cf (diff)
downloadhaskell-1e9eac1c01077c2cd3ce5bf0e4a77c0affaf88e9.tar.gz
Refresh profiling docs
I went through the whole of the profiling docs and tried to amend them to reflect current best practices and tooling. In particular I removed some old references to tools such as hp2any and replaced them with references to eventlog2html.
-rw-r--r--docs/users_guide/images/eventlog_profile.pngbin0 -> 210110 bytes
-rw-r--r--docs/users_guide/profiling.rst207
2 files changed, 57 insertions, 150 deletions
diff --git a/docs/users_guide/images/eventlog_profile.png b/docs/users_guide/images/eventlog_profile.png
new file mode 100644
index 0000000000..f94827a6d2
--- /dev/null
+++ b/docs/users_guide/images/eventlog_profile.png
Binary files differ
diff --git a/docs/users_guide/profiling.rst b/docs/users_guide/profiling.rst
index 1c2f458f10..d8199adfd7 100644
--- a/docs/users_guide/profiling.rst
+++ b/docs/users_guide/profiling.rst
@@ -10,17 +10,13 @@ Profiling
GHC comes with a time and space profiling system, so that you can answer
questions like "why is my program so slow?", or "why is my program using
-so much memory?".
+so much memory?". We'll start by describing how to do time profiling.
-Profiling a program is a three-step process:
+Time profiling a program is a three-step process:
1. Re-compile your program for profiling with the :ghc-flag:`-prof` option, and
probably one of the options for adding automatic annotations:
- :ghc-flag:`-fprof-auto` is the most common [1]_.
-
- If you are using external packages with :command:`cabal`, you may need to
- reinstall these packages with profiling support; typically this is
- done with ``cabal install -p package --reinstall``.
+ :ghc-flag:`-fprof-late` is the recommended option.
2. Having compiled the program for profiling, you now need to run it to
generate the profile. For example, a simple time profile can be
@@ -197,7 +193,10 @@ Inserting cost centres by hand
Cost centres are just program annotations. When you say ``-fprof-auto``
to the compiler, it automatically inserts a cost centre annotation
around every binding not marked INLINE in your program, but you are
-entirely free to add cost centre annotations yourself.
+entirely free to add cost centre annotations yourself. Be careful adding too many
+cost-centre annotations as the optimiser is careful to not move them around or
+remove them, which can severly affect how your program is optimised and hence the
+runtime performance!
The syntax of a cost centre annotation for expressions is ::
@@ -356,7 +355,9 @@ Automatically placing cost-centres
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
GHC has a number of flags for automatically inserting cost-centres into the
-compiled program.
+compiled program. Use these options carefully because inserting too many cost-centres
+in the wrong places will mean the optimiser will be less effective and the runtime behaviour
+of your profiled program will be different to that of the unprofiled one.
.. ghc-flag:: -fprof-callers=⟨name⟩
:shortdesc: Auto-add ``SCC``\\ s to all call-sites of the named function.
@@ -618,8 +619,10 @@ enclosed between ``+RTS ... -RTS`` as usual):
JSON profile format
~~~~~~~~~~~~~~~~~~~
-When invoked with the :rts-flag:`-pj` flag the runtime will emit the cost-centre
-profile in a machine-readable JSON format. The top-level object of this format
+profile in a machine-readable JSON format. The JSON file can be directly loaded
+into `speedscope.app <https://www.speedscope.app/>`_ to interactively view the profile.
+
+The top-level object of this format
has the following properties,
``program`` (string)
@@ -758,8 +761,12 @@ For instance, a simple profile might look like this,
}
}
+Eventlog profile format
+~~~~~~~~~~~~~~~~~~~~~~~
-
+In addition to the ``.prof`` and ``.json`` formats the cost centre definitions
+and samples are also emitted to the :ref:`eventlog <rts-eventlog>`. The format
+of the events is specified in the :ref:`eventlog encodings <eventlog-encodings>` section.
.. _prof-heap:
@@ -774,18 +781,35 @@ program holds on to more memory at run-time that it needs to. Space
leaks lead to slower execution due to heavy garbage collector activity,
and may even cause the program to run out of memory altogether.
+Heap profiling differs from time profiling in the fact that is not always
+necessary to use the profiling runtime to generate a heap profile. There
+are two heap profiling modes (:rts-flag:`-hT` and :rts-flag:`-hi` [1]_) which are always
+available.
+
To generate a heap profile from your program:
-1. Compile the program for profiling (:ref:`prof-compiler-options`).
+1. Assuming you need the profiling runtime, compile the program for profiling (:ref:`prof-compiler-options`).
2. Run it with one of the heap profiling options described below (eg.
- :rts-flag:`-hc` for a basic producer profile). This generates the file
- :file:`{prog}.hp`.
+ :rts-flag:`-hc` for a basic producer profile) and enable the eventlog using :rts-flag:`-l <-l ⟨flags⟩>`.
- If the :ref:`event log <rts-eventlog>` is enabled (with the :rts-flag:`-l ⟨flags⟩`
- runtime system flag) heap samples will additionally be emitted to the GHC
+ Heap samples will be emitted to the GHC
event log (see :ref:`heap-profiler-events` for details about event format).
+3. Render the heap profile using `eventlog2html <https://hackage.haskell.org/package/eventlog2html>`_.
+ This produces an HTML file which contains the visualised profile.
+
+4. Open the rendered interactive profile in your web browser.
+
+For example, here is a heap profile produced of using eventlog profiling on GHC
+compiling the Cabal library. You can read a lot more about eventlog2html on the website.
+
+.. image:: images/eventlog_profile.*
+
+Note that there is the legacy :file:`{prog}.hp` format which has been deprecated
+in favour of eventlog based profiling. In order to render the legacy format, the
+steps are as follows.
+
3. Run :command:`hp2ps` to produce a Postscript file, :file:`{prog}.ps`. The
:command:`hp2ps` utility is described in detail in :ref:`hp2ps`.
@@ -797,10 +821,6 @@ from GHC's ``nofib`` benchmark suite,
.. image:: images/prof_scc.*
-You might also want to take a look at
-`hp2any <https://www.haskell.org/haskellwiki/Hp2any>`__, a more advanced
-suite of tools (not distributed with GHC) for displaying heap profiles.
-
Note that there might be a big difference between the OS reported memory usage
of your program and the amount of live data as reported by heap profiling.
The reasons for the difference are explained in :ref:`hints-os-memory`.
@@ -817,20 +837,14 @@ following RTS options select which break-down to use:
.. rts-flag:: -hT
- Breaks down the graph by heap closure type.
+ Breaks down the graph by heap closure type. This does not require the profiling
+ runtime.
.. rts-flag:: -hc
- -h
*Requires* :ghc-flag:`-prof`. Breaks down the graph by the cost-centre stack
which produced the data.
- .. note:: The meaning of the shortened :rts-flag:`-h` is dependent on whether
- your program was compiled for profiling. When compiled for profiling,
- :rts-flag:`-h` is equivalent to :rts-flag:`-hc`, but otherwise is
- equivalent to :rts-flag:`-hT` (see :ref:`rts-profiling`). The :rts-flag:`-h`
- is deprecated and will be removed in a future release.
-
.. rts-flag:: -hm
*Requires* :ghc-flag:`-prof`. Break down the live heap by the module
@@ -863,7 +877,7 @@ following RTS options select which break-down to use:
Break down the graph by the address of the info table of a closure. For this
to produce useful output the program must have been compiled with
- :ghc-flag:`-finfo-table-map`.
+ :ghc-flag:`-finfo-table-map` but it does not require the profiling runtime.
.. rts-flag:: -l
:noindex:
@@ -1041,6 +1055,14 @@ This trick isn't foolproof, because there might be other ``B`` closures in
the heap which aren't the retainers we are interested in, but we've
found this to be a useful technique in most cases.
+Precise Retainer Analysis
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If you want to precisely answer questions about why a certain type of closure is
+retained then it is worthwhile using `ghc-debug <https://gitlab.haskell.org/ghc/ghc-debug>`_ which
+has a terminal interface which can be used to easily answer queries such as, what is retaining
+a certain closure.
+
.. _biography-prof:
Biographical Profiling
@@ -1120,6 +1142,9 @@ reasons for this:
allocated by foreign libraries, and data allocated by the RTS), and
``mmap()``\'d memory are not counted in the heap profile.
+For more discussion about understanding how understanding process residency see
+:ref:`hints-os-memory`.
+
.. _hp2ps:
``hp2ps`` -- Rendering heap profiles to PostScript
@@ -1242,123 +1267,6 @@ The flags are:
Print out usage information.
-.. _manipulating-hp:
-
-Manipulating the ``hp`` file
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-(Notes kindly offered by Jan-Willem Maessen.)
-
-The ``FOO.hp`` file produced when you ask for the heap profile of a
-program ``FOO`` is a text file with a particularly simple structure.
-Here's a representative example, with much of the actual data omitted:
-
-.. code-block:: none
-
- JOB "FOO -hC"
- DATE "Thu Dec 26 18:17 2002"
- SAMPLE_UNIT "seconds"
- VALUE_UNIT "bytes"
- BEGIN_SAMPLE 0.00
- END_SAMPLE 0.00
- BEGIN_SAMPLE 15.07
- ... sample data ...
- END_SAMPLE 15.07
- BEGIN_SAMPLE 30.23
- ... sample data ...
- END_SAMPLE 30.23
- ... etc.
- BEGIN_SAMPLE 11695.47
- END_SAMPLE 11695.47
-
-The first four lines (``JOB``, ``DATE``, ``SAMPLE_UNIT``,
-``VALUE_UNIT``) form a header. Each block of lines starting with
-``BEGIN_SAMPLE`` and ending with ``END_SAMPLE`` forms a single sample
-(you can think of this as a vertical slice of your heap profile). The
-hp2ps utility should accept any input with a properly-formatted header
-followed by a series of *complete* samples.
-
-Zooming in on regions of your profile
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-You can look at particular regions of your profile simply by loading a
-copy of the ``.hp`` file into a text editor and deleting the unwanted
-samples. The resulting ``.hp`` file can be run through ``hp2ps`` and
-viewed or printed.
-
-Viewing the heap profile of a running program
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The ``.hp`` file is generated incrementally as your program runs. In
-principle, running :command:`hp2ps` on the incomplete file should produce a
-snapshot of your program's heap usage. However, the last sample in the
-file may be incomplete, causing :command:`hp2ps` to fail. If you are using a
-machine with UNIX utilities installed, it's not too hard to work around
-this problem (though the resulting command line looks rather Byzantine):
-
-.. code-block:: sh
-
- head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
- | hp2ps > FOO.ps
-
-The command ``fgrep -n END_SAMPLE FOO.hp`` finds the end of every
-complete sample in ``FOO.hp``, and labels each sample with its ending
-line number. We then select the line number of the last complete sample
-using :command:`tail` and :command:`cut`. This is used as a parameter to :command:`head`; the
-result is as if we deleted the final incomplete sample from :file:`FOO.hp`.
-This results in a properly-formatted .hp file which we feed directly to
-:command:`hp2ps`.
-
-Viewing a heap profile in real time
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The :command:`gv` and :command:`ghostview` programs have a "watch file" option
-can be used to view an up-to-date heap profile of your program as it runs.
-Simply generate an incremental heap profile as described in the previous
-section. Run :command:`gv` on your profile:
-
-.. code-block:: sh
-
- gv -watch -orientation=seascape FOO.ps
-
-If you forget the ``-watch`` flag you can still select "Watch file" from
-the "State" menu. Now each time you generate a new profile ``FOO.ps``
-the view will update automatically.
-
-This can all be encapsulated in a little script:
-
-.. code-block:: sh
-
- #!/bin/sh
- head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
- | hp2ps > FOO.ps
- gv -watch -orientation=seascape FOO.ps &
- while [ 1 ] ; do
- sleep 10 # We generate a new profile every 10 seconds.
- head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
- | hp2ps > FOO.ps
- done
-
-Occasionally :command:`gv` will choke as it tries to read an incomplete copy of
-:file:`FOO.ps` (because :command:`hp2ps` is still running as an update occurs). A
-slightly more complicated script works around this problem, by using the
-fact that sending a SIGHUP to gv will cause it to re-read its input
-file:
-
-.. code-block:: sh
-
- #!/bin/sh
- head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
- | hp2ps > FOO.ps
- gv FOO.ps &
- gvpsnum=$!
- while [ 1 ] ; do
- sleep 10
- head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
- | hp2ps > FOO.ps
- kill -HUP $gvpsnum
- done
-
.. _prof-threaded:
Profiling Parallel and Concurrent Programs
@@ -1968,10 +1876,9 @@ Notes about ticky profiling
in some columns. For this reason using an eventlog-based approach should be prefered if
possible.
-
.. [1]
- :ghc-flag:`-fprof-auto` was known as ``-auto-all`` prior to
- GHC 7.4.1.
+ :rts-flag:`-hi` profiling is avaible with the normal runtime but you will need to
+ compile with :ghc-flag:`-finfo-table-map` to interpret the results.
.. [2]
Note that this policy has changed slightly in GHC 7.4.1 relative to