author | Matthew Pickering <matthewtpickering@gmail.com> | 2021-02-19 14:24:40 +0000 |
---|---|---|
committer | Matthew Pickering <matthewtpickering@gmail.com> | 2021-12-06 16:27:35 +0000 |
commit | d72720f9b75fbed51a24edfb691d0c77c6e96dbe (patch) | |
tree | a8e460da6d257e0d5c739d1a7056a296fd68997f | |
parent | a9e035a430c7fdc228d56d21b27b3b8e815fd06b (diff) | |
download | haskell-d72720f9b75fbed51a24edfb691d0c77c6e96dbe.tar.gz |
Add section to the user guide about OS memory usage
-rw-r--r-- | docs/users_guide/hints.rst | 101 |
-rw-r--r-- | docs/users_guide/profiling.rst | 4 |
-rw-r--r-- | docs/users_guide/runtime_control.rst | 7 |
3 files changed, 110 insertions, 2 deletions
diff --git a/docs/users_guide/hints.rst b/docs/users_guide/hints.rst
index ea7ff0e9fb..0d011a2563 100644
--- a/docs/users_guide/hints.rst
+++ b/docs/users_guide/hints.rst
@@ -427,5 +427,102 @@ Inlining generics
 There are also flags specific to the inlining of generics:
-:ghc-flag:`-finline-generics`
-:ghc-flag:`-finline-generics-aggressively`
+* :ghc-flag:`-finline-generics`
+* :ghc-flag:`-finline-generics-aggressively`
+
+
+.. _hints-os-memory:
+
+Understanding how OS memory usage corresponds to live data
+----------------------------------------------------------
+
+A confusing aspect about the RTS is the sometimes big difference between
+OS reported memory usage and
+the amount of live data reported by heap profiling or ``GHC.Stats``.
+
+There are two main factors which determine OS memory usage.
+
+Firstly the collection strategy used by the oldest generation. By default a copying
+strategy is used which requires at least 2 times the amount of currently live
+data in order to perform a major collection. For example, if your program's live data
+is 1G then you would expect the OS to report at minimum 2G.
+
+If instead you are using the compacting (:rts-flag:`-c`) or nonmoving (:rts-flag:`-xn`) strategies
+for the
+oldest generation then less overhead is required as the strategy immediately
+reuses already allocated memory by overwriting. For a program with heap size
+1G then you might expect the OS to report at minimum a small percentage above 1G.
+
+Secondly, after doing some allocation GHC is quite reluctant to return
+the memory to the OS. This is because after performing a major collection the program might
+still be allocating a lot and it costs to have to request
+more memory. Therefore the RTS keeps an extra amount to reuse which
+depends on the :rts-flag:`-F ⟨factor⟩` option. By default
+the RTS will keep up to ``(2 + F) * live_bytes`` after performing a major collection due to
+exhausting the available heap. The default value is ``F = 2`` so you
+can see OS memory usage reported to be as high as 4 times the amount used by your
+program.
+
+Without further intervention, once your program has topped out at this high
+threshold, no more memory would be returned to the OS so memory usage would always remain
+at 4 times the live data. If you had a server with 1.5G live data, then if there was a memory
+spike up to 6G for a short period, then OS reported memory would never dip below 6G. This
+is what happened before GHC 9.2. In GHC 9.2 memory is gradually returned to the OS so OS memory
+usage returns closer to the theoretical minimums.
+
+The :rts-flag:`-Fd ⟨factor⟩` option controls the rate at which memory is returned to
+the OS. On consecutive major collections which are not triggered by heap overflows, a
+counter (``t``) is increased and the ``F`` factor is inversely scaled according to the
+value of ``t`` and ``Fd``. The factor is scaled by the equation:
+
+.. math::
+
+    \texttt{F}' = \texttt{F} \times {2 ^ \frac{- \texttt{t}}{\texttt{Fd}}}
+
+By default ``Fd = 4``, increasing ``Fd`` decreases the rate memory is returned.
+
+Major collections which are not triggered by heap overflows arise mainly in two ways.
+
+ 1. Idle collections (controlled by :rts-flag:`-I ⟨seconds⟩`)
+ 2. Explicit trigger using ``performMajorGC``.
+
+For example, idle collections happen by default after 0.3 seconds of inactivity.
+If you are running your application and have also set ``-Iw30``, so that the minimum
+period between idle GCs is 30 seconds, then say you do a small amount of work every 5 seconds,
+there will be about 10 idle collections in about 5 minutes. This number of consecutive
+idle collections will scale the ``F`` factor as follows:
+
+.. math::
+
+    \texttt{F}' = 2 \times {2^{\frac{-10}{4}}} \approx 0.35
+
+and hence we will only retain ``(0.35 + 2) * live_bytes``
+rather than the original 4 times. If you want less frequent idle collections then
+you should also decrease ``Fd`` so that more memory is returned each time
+a collection takes place.
+
+If you set ``-Fd0`` then GHC will not attempt to return memory, which corresponds
+with the behaviour from releases prior to 9.2. You probably don't want to do this as
+unless you have idle periods in your program the behaviour will be similar anyway.
+If you want to retain a specific amount of memory then it's better to set ``-H1G``
+in order to communicate that you are happy with a heap size of ``1G``. If you do this
+then OS reported memory will never decrease below this amount if it ever reaches this
+threshold.
+
+The collecting strategy also affects the fragmentation of the heap and hence how easy
+it is to return memory to a theoretical baseline. Memory is allocated firstly
+in the unit of megablocks which is then further divided into blocks. Block-level
+fragmentation is how much unused space within the allocated megablocks there is.
+In a fragmented heap there will be many megablocks which are only partially full.
+
+In theory the compacting
+strategy has a lower memory baseline but practically it can be hard to reach the
+baseline due to how compacting never defragments. On the other hand, the copying
+collector has a higher theoretical baseline but we can often get very close to
+it because the act of copying leads to lower fragmentation.
+
+There are some other flags which affect the amount of retained memory as well.
+Setting the maximum heap size using :rts-flag:`-M ⟨size⟩` will make sure we don't try
+and retain more memory than the maximum size and explicitly setting :rts-flag:`-H [⟨size⟩]`
+will mean that we will always try and retain at least ``H`` bytes irrespective of
+the amount of live data.
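The retention arithmetic described in the hints.rst text above can be checked with a small numeric sketch. This is only a model of the two formulas quoted in the patch, ``F' = F * 2^(-t / Fd)`` and ``(2 + F') * live_bytes``. The helper names ``scaled_factor`` and ``retained_bytes`` are hypothetical and are not part of GHC or its RTS, and the sketch assumes the default copying collector.

```python
# Illustrative model only: these helpers are hypothetical and are not part of
# GHC or its RTS. They restate the formulas quoted in the documentation text.


def scaled_factor(f: float = 2.0, fd: float = 4.0, t: int = 0) -> float:
    """Return F' = F * 2^(-t / Fd).

    t counts consecutive major collections that were not triggered by a
    heap overflow (for example idle collections).
    """
    if fd == 0:
        # Per the text, -Fd0 means memory is never returned, so F is unscaled.
        return f
    return f * 2.0 ** (-t / fd)


def retained_bytes(live_bytes: float, f: float = 2.0, fd: float = 4.0, t: int = 0) -> float:
    """Upper bound on retained memory, (2 + F') * live_bytes."""
    return (2.0 + scaled_factor(f, fd, t)) * live_bytes


if __name__ == "__main__":
    gib = 1024 ** 3
    live = 1.5 * gib  # the 1.5G server example from the text
    for t in (0, 4, 10):
        factor = scaled_factor(t=t)
        print(f"t={t:2d}  F'={factor:.3f}  retained={retained_bytes(live, t=t) / gib:.2f}G")
```

With the defaults ``F = 2`` and ``Fd = 4``, ``t = 0`` gives the 4 times bound (6G for 1.5G of live data), and ``t = 10`` gives a factor of roughly 0.35, in line with the worked example in the patch.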
diff --git a/docs/users_guide/profiling.rst b/docs/users_guide/profiling.rst
index f5a99c82a4..0aa437a4dc 100644
--- a/docs/users_guide/profiling.rst
+++ b/docs/users_guide/profiling.rst
@@ -746,6 +746,10 @@ You might also want to take a look at
 `hp2any <https://www.haskell.org/haskellwiki/Hp2any>`__, a more advanced suite of tools (not distributed with GHC) for displaying heap profiles.
+Note that there might be a big difference between the OS reported memory usage
+of your program and the amount of live data as reported by heap profiling.
+The reasons for the difference are explained in :ref:`hints-os-memory`.
+
 .. _rts-options-heap-prof:
 RTS options for heap profiling
diff --git a/docs/users_guide/runtime_control.rst b/docs/users_guide/runtime_control.rst
index 8f8b9b3fcb..9ebc5db7f3 100644
--- a/docs/users_guide/runtime_control.rst
+++ b/docs/users_guide/runtime_control.rst
@@ -730,6 +730,11 @@ performance.
     and too small an interval could adversely affect interactive responsiveness.
+    The idle period timer only resets after some activity
+    by a Haskell thread. If your program is doing literally nothing, then
+    after the first idle collection is triggered no further collections
+    will be scheduled until more work is performed.
+
     This is an experimental feature, please let us know if it causes problems and/or could benefit from further tuning.
@@ -961,6 +966,8 @@ performance.
     calling the ``getRTSStats()`` function from C, or ``GHC.Stats.getRTSStats`` from Haskell.
+
+
 .. _rts-options-statistics:
 RTS options to produce runtime statistics