author Matthew Pickering <matthewtpickering@gmail.com> 2019-10-01 15:04:31 +0100
committer Matthew Pickering <matthewtpickering@gmail.com> 2019-10-01 15:04:31 +0100
commit 0c25edfbeb66a0d4ea120428a9634a494d2cbf35
tree 319ebf0ee9f97ccf72f195ea7613ff8f69d66bf3
parent 0717c800eff5f8631695f62c5d61ffc51ada0d9e
branch wip/DanielG/ghc-rts-root-profiler

Docs and library function WIP

-rw-r--r-- docs/users_guide/profiling.rst 115
-rw-r--r-- libraries/base/GHC/Profiling.hs 59
2 files changed, 173 insertions, 1 deletions
diff --git a/docs/users_guide/profiling.rst b/docs/users_guide/profiling.rst
index 1a083eb711..c20def0780 100644
--- a/docs/users_guide/profiling.rst
+++ b/docs/users_guide/profiling.rst
@@ -730,6 +730,12 @@ following RTS options select which break-down to use:
Biographical profiling is described in more detail below
(:ref:`biography-prof`).
+.. rts-flag:: -ho
+
+ *Requires* :ghc-flag:`-prof`. Break down the graph by specific retainer
+ root. The root profiling mode is described in detail below
+ (:ref:`root-prof`).
+
.. rts-flag:: -l
:noindex:
@@ -949,6 +955,115 @@ states, the next step is usually to find the retainer(s):
This two stage process is required because GHC cannot currently
profile using both biographical and retainer information simultaneously.
+.. _root-prof:
+
+Root Profiling
+~~~~~~~~~~~~~~~~~~~~~~
+
+The root profiling mode answers questions about the total size of specific
+"roots" in a program. You specify a set of roots in your program, and the
+profiler reports how much memory is reachable from each of them. Other
+profiling modes such as ``-hy`` do not track the relationships between
+closures, so with them you are left to work out for yourself which data
+structure the allocations belong to.
+
+Root profiling differs from cost centre profiling because the memory reachable
+from a root may have been allocated in many different places. For example,
+consider a cache which is updated incrementally over the course of the program.
+In the root profiling mode you can attach a root to each cache and see how its
+size grows as the program runs. In other profiling modes you can usually only
+observe this indirectly, as an abundance of certain constructors, and it is
+hard to be precise about what is causing the allocation.
+
+The ``setHeapRoots`` function, exported from ``GHC.Profiling``, registers the
+roots of the profile. Consider the following example program, in which the
+goal is to work out which part of ``HscEnv`` accounts for the most memory.
+Realistic applications often contain data structures like ``HscEnv`` that are
+very deep and contribute significantly to the total allocation of the program.
+The problem is that other profiling modes attribute this memory to more
+primitive types such as ``[]``, so it is hard to pinpoint where the problem
+actually is.
+
+To investigate the ``HscEnv`` we therefore create three roots: one pointing at
+the ``HscEnv`` itself, one at its ``EPS`` and one at its ``FC``.
+
+
+.. code-block:: none
+
+    module Main where
+
+    import GHC.Profiling
+
+    data HscEnv = HscEnv { hsc_EPS :: EPS, hsc_FC :: FC, hsc_OTHER :: OTHER }
+    data EPS = EPS { eps_A, eps_B, eps_C :: Word }
+    data FC = FC { fc_A, fc_B, fc_C, fc_D :: Word }
+    data OTHER = OTHER { o_A, o_B, o_C :: Word }
+
+    main = do
+      setHeapRoots
+        [ Root "hsc" hsc
+        , Root "eps" (hsc_EPS hsc)
+        , Root "fc" (hsc_FC hsc)
+        ]
+
+    hsc = HscEnv
+      { hsc_EPS = EPS x y e
+      , hsc_FC = FC x y z f
+      , hsc_OTHER = OTHER x z t
+      }
+
+
+    x = 1
+    y = 2
+    z = 3
+    e = 4
+    f = 5
+    t = 6
+
+A single sample would then look like the following:
+
+.. code-block:: none
+
+    hsc 80
+    eps 0
+    fc 0
+    hsc-eps 48
+    hsc-fc 72
+    eps-fc 0
+    hsc-eps-fc 32
+
+- "hsc" uses 80 bytes of memory that are not shared with any of the other
+  roots (all sizes assume 8-byte words):
+
+  - 1 word info ptr + 3 words payload for the ``HscEnv``
+  - 1 word info ptr + 3 words payload for the ``OTHER``
+  - 1 word info ptr + 1 word payload for the ``Word`` ``t``
+  - in total (1 + 3) + (1 + 3) + (1 + 1) = 10 words, or 10 * 8 = 80 bytes
+
+- "eps" and "fc" have no unshared memory, since both are fully contained in,
+  and thus reachable from, "hsc". For the same reason the "eps-fc" bin is empty.
+
+- "hsc" and "eps" share 48 bytes, i.e. 48 bytes worth of heap objects are
+  reachable from both the "hsc" and "eps" roots but not from "fc". Of the
+  fields of ``EPS``, only ``e`` is not also reachable from "fc", hence:
+
+  - 1 word info ptr + 3 words payload for the ``EPS``
+  - 1 word info ptr + 1 word payload for the ``Word`` ``e``
+  - in total (1 + 3) + (1 + 1) = 6 words, or 6 * 8 = 48 bytes
+
+- "hsc" and "fc" share 72 bytes exclusively: the ``FC`` itself (1 + 4 words)
+  plus ``z`` and ``f`` (1 + 1 words each), or 9 * 8 = 72 bytes.
+
+- "hsc", "eps" and "fc" share 32 bytes exclusively: ``x`` and ``y``
+  (1 + 1 words each), or 4 * 8 = 32 bytes.
+
+
+The sum of all these "bins", ``80 + 48 + 72 + 32 = 232`` bytes, is the total
+amount of memory reachable from the set of roots, so it makes sense to display
+them stacked in a diagram, as ``hp2ps`` does.
+
+
+
+.. NOTE::
+ The number of roots is currently limited to 20.
+
+
+
.. _mem-residency:
Actual memory residency
diff --git a/libraries/base/GHC/Profiling.hs b/libraries/base/GHC/Profiling.hs
index 917a208b30..19d83e1d4d 100644
--- a/libraries/base/GHC/Profiling.hs
+++ b/libraries/base/GHC/Profiling.hs
@@ -1,9 +1,40 @@
+
{-# LANGUAGE Trustworthy #-}
+{-# LANGUAGE CPP #-}
{-# LANGUAGE NoImplicitPrelude #-}
+{-# LANGUAGE ForeignFunctionInterface, ExistentialQuantification #-}
--- | @since 4.7.0.0
module GHC.Profiling where
+import Data.List
+import Prelude (fromIntegral)
+import Control.Monad ( forM )
+
+
+import Control.Exception (evaluate)
+
+
+
+import Foreign.C.String
+
+import Foreign.C.Types
+
+import Foreign.Marshal.Array
+
+import Foreign.Ptr
+
+import Foreign.StablePtr
+
+import Foreign.Storable
+
+
+import System.IO
+import System.Mem
+
+import Unsafe.Coerce
+
+-- | @since 4.7.0.0
+
import GHC.Base
-- | Stop attributing ticks to cost centres. Allocations will still be
@@ -17,3 +48,29 @@ foreign import ccall stopProfTimer :: IO ()
--
-- @since 4.7.0.0
foreign import ccall startProfTimer :: IO ()
+
+#if defined(PROFILING)
+foreign import ccall unsafe "setRootProfPtrs" c_setRootProfPtrs
+ :: CInt -> Ptr (StablePtr a) -> Ptr CString -> IO ()
+
+foreign import ccall "&g_rootProfileDebugLevel" g_rootProfileDebugLevel
+ :: Ptr CInt
+
+data Root = forall a. Root
+ { rootDescr :: String
+ , rootClosure :: a
+ }
+
+
+setHeapRoots :: [Root] -> IO ()
+setHeapRoots xs = do
+ descs <- mapM (newCString . rootDescr) xs
+ sps <- forM xs $ \(Root _ a) ->
+ newStablePtr =<< evaluate (unsafeCoerce a :: a)
+ withArray descs $ \descs_arr ->
+ withArray sps $ \sps_arr ->
+ c_setRootProfPtrs (fromIntegral (length xs)) sps_arr descs_arr
+
+#endif
+
+
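
The bin arithmetic in the root-profiling documentation added by this patch can be checked with a small stand-alone model. The sketch below is not the RTS implementation and does not call ``setHeapRoots``. It describes the example heap by hand as a map from object names to a size in words and a list of pointers, assuming 8-byte words and one info-pointer word per object as the documentation does. The names ``Obj``, ``heap``, ``reachable`` and ``bins`` are invented for illustration. Each object is assigned to the bin named after exactly those roots that can reach it.

```haskell
module Main where

import Data.List (intercalate)
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- | A toy heap object: its size in words (info pointer included)
-- and the names of the objects it points to.
data Obj = Obj { objWords :: Int, objPtrs :: [String] }

-- | Hand-written model of the heap from the documentation example
-- (hypothetical; the real layout is decided by GHC).
heap :: Map.Map String Obj
heap = Map.fromList
  [ ("hsc",   Obj 4 ["eps", "fc", "other"])
  , ("eps",   Obj 4 ["x", "y", "e"])
  , ("fc",    Obj 5 ["x", "y", "z", "f"])
  , ("other", Obj 4 ["x", "z", "t"])
  , ("x", Obj 2 []), ("y", Obj 2 []), ("z", Obj 2 [])
  , ("e", Obj 2 []), ("f", Obj 2 []), ("t", Obj 2 [])
  ]

-- | The three roots; here each root is named after the object it points at.
roots :: [String]
roots = ["hsc", "eps", "fc"]

-- | All objects reachable from a starting object (depth-first traversal).
reachable :: String -> Set.Set String
reachable start = go Set.empty [start]
  where
    go seen [] = seen
    go seen (o : os)
      | o `Set.member` seen = go seen os
      | otherwise =
          let ptrs = maybe [] objPtrs (Map.lookup o heap)
          in go (Set.insert o seen) (ptrs ++ os)

-- | Bytes per bin, keyed by the exact list of roots that reach each object.
bins :: Map.Map [String] Int
bins = Map.fromListWith (+)
  [ (owners, 8 * objWords obj)
  | (name, obj) <- Map.toList heap
  , let owners = [ r | r <- roots, name `Set.member` reachable r ]
  , not (null owners)
  ]

main :: IO ()
main = mapM_ report (Map.toList bins)
  where
    report (owners, bytes) =
      putStrLn (intercalate "-" owners ++ " " ++ show bytes)
```

Only non-empty bins are produced, so "eps", "fc" and "eps-fc" do not appear in the output. The sketch needs the ``containers`` package, which ships with GHC.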