From 0c25edfbeb66a0d4ea120428a9634a494d2cbf35 Mon Sep 17 00:00:00 2001 From: Matthew Pickering Date: Tue, 1 Oct 2019 15:04:31 +0100 Subject: Docs and library function WIP --- docs/users_guide/profiling.rst | 115 ++++++++++++++++++++++++++++++++++++++++ libraries/base/GHC/Profiling.hs | 59 ++++++++++++++++++++- 2 files changed, 173 insertions(+), 1 deletion(-) diff --git a/docs/users_guide/profiling.rst b/docs/users_guide/profiling.rst index 1a083eb711..c20def0780 100644 --- a/docs/users_guide/profiling.rst +++ b/docs/users_guide/profiling.rst @@ -730,6 +730,12 @@ following RTS options select which break-down to use: Biographical profiling is described in more detail below (:ref:`biography-prof`). +.. rts-flag:: -ho + + *Requires* :ghc-flag:`-prof`. Break down the graph by specific retainer + root. The root profiling mode is described in detail below + (:ref:`root-prof`). + .. rts-flag:: -l :noindex: @@ -949,6 +955,115 @@ states, the next step is usually to find the retainer(s): This two stage process is required because GHC cannot currently profile using both biographical and retainer information simultaneously. +.. _root-prof: + +Root Profiling +~~~~~~~~~~~~~~~~~~~~~~ + +The root profiling mode aims to answer questions about the total size of +specific "roots" in a program. In your program you specify a certain set of +roots, the profiler will report how much memory is accessible from each root. +Typical profiling modes such as ``-hy`` don't +understand the relationship between different closures so you may end up trying +to understand where, more generally allocations come from. + +A root is different from cost centre profiling because the allocations under +a root may come from lots of different places. For example, consider a cache which +is updated incrementally over the course of the program. In the root profiling mode +you can attach a root to each cache and see how their size grows over the course +of a program. In other profiling modes you can usually observe this indirectly +by seeing an abundance of constructors but it's hard to be precise about what +is causing the allocation. + +The ``setHeapRoots`` function can be imported from ``GHC.Profiling``. It is used +to create the roots of the profile. Consider the following example program, +the ultimate goal is to work out which part of ``HscEnv`` contributes the +most allocations in total. In realistic applications you end up with data structures +like ``HscEnv`` which are very deep and contribute significantly to total allocation +of the program. The problem, the allocations are only identified by allocation +of more primitive types such as ``[]`` so it's hard to pinpoint where the +problem actually is. + +Therefore in order to investigate the ``HscEnv`` we create three different +roots. One which points at the ``HscEnv``, one for ``EPS`` and one for ``FC``. + + +.. code-block:: none + module Main where + + import GHC.Profiling + + data HscEnv = HscEnv { hsc_EPS :: EPS, hsc_FC :: FC, hsc_OTHER :: OTHER } + data EPS = EPS { eps_A, eps_B, eps_C :: Word } + data FC = FC { fc_A, fc_B, fc_C, fc_D :: Word } + data OTHER = OTHER { o_A, o_B, o_C :: Word } + + main = do + setHeapRoots + [ Root "hsc" hsc + , Root "eps" (hsc_EPS hsc) + , Root "fc" (hsc_FC hsc) + ] + + hsc = HscEnv + { hsc_EPS = EPS x y e + , hsc_FC = FC x y z f + , hsc_OTHER = OTHER x z t + } + + + x = 1 + y = 2 + z = 3 + e = 4 + f = 5 + t = 6 + +One sample then would look like the following: + +.. code-block:: + hsc 80 + eps 0 + fc 0 + hsc-eps 48 + hsc-fc 72 + eps-fc 0 + hsc-eps-fc 32 + +- "hsc" is using 80 bytes of memory which are not shared with any of the + other roots. + - 1 word info ptr + 3 words payload of ``HscEnv`` + - 1 word info ptr + 3 words payload of OTHER + - 1 word info ptr + 1 word payload of Word of `t` + - or (1 + 3 + 1 + 3 + 1 + 1 = 10 words) * 8 bytes = 80 bytes + +- "eps" and "fc" do not have any unshared memory usage since they are fully + contained in, and thus reachable from, "hsc", + +- "hsc" and "eps" share 48 bytes, i.e. 48 bytes worth of heap objects are + reachable from both the "hsc" and "eps" roots but not from "fc". Looking + at the definition, only `e` is not shared with anything else hence: + + - 1 word info ptr + 3 words payload of EPS + - 1 word info ptr + 1 words payload of Word32 of ``e`` + - or (1 + 3 + (1 + 1) = 6 words) * 8 bytes = 48 bytes. You get the idea. + +- "hsc" and "fc" share 72 bytes exclusively and + +- "hsc", "eps" and "fc" share 32 bytes exclusively. + + +The sum of all these "bins" ``80 + 48 + 72 + 32 = 232`` is the total amount +of memory reachable from the set of roots, hence it makes sense to display +them stacked in a diagram as ``hp2ps`` does. + + + +.. NOTE:: + The number of roots is currently limited to 20. + + + .. _mem-residency: Actual memory residency diff --git a/libraries/base/GHC/Profiling.hs b/libraries/base/GHC/Profiling.hs index 917a208b30..19d83e1d4d 100644 --- a/libraries/base/GHC/Profiling.hs +++ b/libraries/base/GHC/Profiling.hs @@ -1,9 +1,40 @@ + {-# LANGUAGE Trustworthy #-} +{-# LANGUAGE CPP #-} {-# LANGUAGE NoImplicitPrelude #-} +{-# LANGUAGE ForeignFunctionInterface, ExistentialQuantification #-} --- | @since 4.7.0.0 module GHC.Profiling where +import Data.List +import Prelude (fromIntegral) +import Control.Monad ( forM ) + + +import Control.Exception (evaluate) + + + +import Foreign.C.String + +import Foreign.C.Types + +import Foreign.Marshal.Array + +import Foreign.Ptr + +import Foreign.StablePtr + +import Foreign.Storable + + +import System.IO +import System.Mem + +import Unsafe.Coerce + +-- | @since 4.7.0.0 + import GHC.Base -- | Stop attributing ticks to cost centres. Allocations will still be @@ -17,3 +48,29 @@ foreign import ccall stopProfTimer :: IO () -- -- @since 4.7.0.0 foreign import ccall startProfTimer :: IO () + +#if defined(PROFILING) +foreign import ccall unsafe "setRootProfPtrs" c_setRootProfPtrs + :: CInt -> Ptr (StablePtr a) -> Ptr CString -> IO () + +foreign import ccall "&g_rootProfileDebugLevel" g_rootProfileDebugLevel + :: Ptr CInt + +data Root = forall a. Root + { rootDescr :: String + , rootClosure :: a + } + + +setHeapRoots :: [Root] -> IO () +setHeapRoots xs = do + descs <- mapM (newCString . rootDescr) xs + sps <- forM xs $ \(Root _ a) -> + newStablePtr =<< evaluate (unsafeCoerce a :: a) + withArray descs $ \descs_arr -> + withArray sps $ \sps_arr -> + c_setRootProfPtrs (fromIntegral (length xs)) sps_arr descs_arr + +#endif + + -- cgit v1.2.1