author: Matthew Pickering <matthewtpickering@gmail.com> 2021-02-17 10:20:19 +0000
committer: Marge Bot <ben+marge-bot@smart-cactus.org> 2021-03-10 10:33:36 -0500
commit: afc357d269b6e1d56385220e78fe696c161e9bf7 (patch)
tree: e7a57944bf74accc7d242114e28238dd59d8f8bd /docs
parent: df8e8ba267ffd7b8be0702bd64b8c39532359461 (diff)
rts: Gradually return retained memory to the OS
Related to #19381 #19359 #14702

After a spike in memory usage we have been conservative about returning allocated blocks to the OS in case we are still allocating a lot and would end up just reallocating them. The result of this was that up to 4 * live_bytes of blocks would be retained once they were allocated, even if memory usage ended up a lot lower.

For a heap of size ~1.5G, this would result in the OS reporting 6G of memory, which is both misleading and worrying for users. In long-lived server applications this results in consistently high memory usage when the live data size is much more reasonable (for example ghcide).

Therefore we have a new (2021) strategy which starts by retaining up to 4 * live_bytes of blocks before gradually returning unneeded memory back to the OS on subsequent major GCs which are NOT caused by a heap overflow.

Each major GC which is NOT caused by heap overflow increases the consec_idle_gcs counter, and the amount of memory which is retained shrinks as this number grows.

By default the excess memory retained is

    oldGenFactor (controlled by -F) / 2 ^ (consec_idle_gcs / returnDecayFactor)

On a major GC caused by a heap overflow, the `consec_idle_gcs` variable is reset to 0 (as we could continue to allocate more, so retaining all the memory might make sense).

Therefore setting bigger values for `-Fd` makes the rate at which memory is returned slower. Smaller values make it get returned faster. Setting `-Fd0` disables the memory return completely, which is the behaviour of older GHC versions.

The default is `-Fd4`, which results in the following scaling:

    > mapM print [(x, 1/ (2**(x / 4))) | x <- [1 :: Double ..20]]
    (1.0,0.8408964152537146)
    (2.0,0.7071067811865475)
    (3.0,0.5946035575013605)
    (4.0,0.5)
    (5.0,0.4204482076268573)
    (6.0,0.35355339059327373)
    (7.0,0.29730177875068026)
    (8.0,0.25)
    (9.0,0.21022410381342865)
    (10.0,0.17677669529663687)
    (11.0,0.14865088937534013)
    (12.0,0.125)
    (13.0,0.10511205190671433)
    (14.0,8.838834764831843e-2)
    (15.0,7.432544468767006e-2)
    (16.0,6.25e-2)
    (17.0,5.255602595335716e-2)
    (18.0,4.4194173824159216e-2)
    (19.0,3.716272234383503e-2)
    (20.0,3.125e-2)

So after 13 consecutive idle major GCs only about 0.1 of the excess memory is still retained.

Further to this decay factor, the amount of memory we attempt to retain is also influenced by the GC strategy for the oldest generation. If we are using a copying strategy then we will need at least 2 * live_bytes for copying to take place, so we always keep that much. If using compacting or nonmoving then we need a lower number, so we just retain at least `1.2 * live_bytes` for some protection.

In future we might want to make this behaviour more aggressive; some relevant literature is

> Ulan Degenbaev, Jochen Eisinger, Manfred Ernst, Ross McIlroy, and Hannes Payer. 2016. Idle time garbage collection scheduling. SIGPLAN Not. 51, 6 (June 2016), 570–583. DOI:https://doi.org/10.1145/2980983.2908106

which describes the "memory reducer" in the V8 JavaScript engine, which on an idle collection immediately returns as much memory as possible.
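As a rough illustration of the scaling described above, the following Python sketch reproduces the `1 / 2 ** (x / 4)` table and combines it with the per-strategy minimums. It is not the RTS code: the function names are hypothetical, the default of 2 for `-F` is taken from the GHC user's guide rather than from this commit, and adding the decayed excess on top of the strategy minimum is an assumption chosen so that a copying collector starts at the 4 * live_bytes figure mentioned in the message.

```python
def scaled_factor(old_gen_factor, consec_idle_gcs, return_decay_factor):
    """Excess retention factor after some consecutive idle major GCs.

    Follows the table in the message: factor / 2 ** (idle_gcs / decay).
    A decay factor of 0 (-Fd0) means no decay at all.
    """
    if return_decay_factor <= 0:
        return old_gen_factor
    return old_gen_factor / 2 ** (consec_idle_gcs / return_decay_factor)


def unavoidable_factor(strategy):
    """Minimum multiple of live_bytes kept for the oldest generation.

    Copying needs 2 * live_bytes; compacting and nonmoving keep 1.2.
    """
    return 2.0 if strategy == "copying" else 1.2


def retention_target(live_bytes, consec_idle_gcs,
                     old_gen_factor=2.0,       # assumed default of -F
                     return_decay_factor=4.0,  # default of -Fd
                     strategy="copying"):
    """Hypothetical retention target in bytes.

    Assumption: the decayed excess is added to the strategy minimum.
    """
    excess = scaled_factor(old_gen_factor, consec_idle_gcs,
                           return_decay_factor)
    return (excess + unavoidable_factor(strategy)) * live_bytes


if __name__ == "__main__":
    # Show how the target for 1 GiB of live data shrinks over idle GCs.
    gib = 1024 ** 3
    for n in (0, 4, 8, 13, 20):
        print(n, retention_target(gib, n) / gib)
```

Under these assumptions the target approaches the strategy minimum (2 * live_bytes for copying) as the idle count grows, and any major GC caused by a heap overflow would restart the sequence from `consec_idle_gcs = 0`.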
Diffstat (limited to 'docs')
-rw-r--r-- docs/users_guide/9.2.1-notes.rst 7
-rw-r--r-- docs/users_guide/runtime_control.rst 19
2 files changed, 24 insertions, 2 deletions
diff --git a/docs/users_guide/9.2.1-notes.rst b/docs/users_guide/9.2.1-notes.rst
index de4a983001..3b0022fb8a 100644
--- a/docs/users_guide/9.2.1-notes.rst
+++ b/docs/users_guide/9.2.1-notes.rst
@@ -150,8 +150,6 @@ Runtime system
Moreover, we now correctly account for the size of the array, meaning that
space lost to fragmentation is no longer counted as live data.
-
-
- The ``-xt`` RTS flag has been removed. Now STACK and TSO closures are always
included in heap profiles. Tooling can choose to filter out these closure types
if necessary.
@@ -162,6 +160,11 @@ Runtime system
be consumed with ``eventlog2html``. This profiling mode does not require a
profiling build.
+- The RTS will now gradually return unused memory back to the OS rather than
+ retaining a large amount (up to 4 * live) indefinitely. The rate at which memory
+ is returned is controlled by the :rts-flag:`-Fd ⟨factor⟩`. Memory return
+ is triggered by consecutive idle collections.
+
``ghc-prim`` library
~~~~~~~~~~~~~~~~~~~~
diff --git a/docs/users_guide/runtime_control.rst b/docs/users_guide/runtime_control.rst
index b8da4aee01..25b27fdd1b 100644
--- a/docs/users_guide/runtime_control.rst
+++ b/docs/users_guide/runtime_control.rst
@@ -577,6 +577,25 @@ performance.
The :rts-flag:`-F ⟨factor⟩` setting will be automatically reduced by the garbage
collector when the maximum heap size (the :rts-flag:`-M ⟨size⟩` setting) is approaching.
+.. rts-flag:: -Fd ⟨factor⟩
+
+ :default: 4
+
+ .. index::
+ single: heap size, factor
+
+ The inverse rate at which unused memory is returned to the OS when it is no longer
+ needed. After a large amount of allocation the RTS will start by retaining
+ a lot of allocated blocks in case it will need them again shortly but then
+ it will gradually release them based on the :rts-flag:`-Fd ⟨factor⟩`. On
+ each subsequent major collection which is not caused by a heap overflow a little
+ more memory will attempt to be returned until the amount retained is similar to
+ the amount of live bytes.
+
+ Increasing this factor will make the rate memory is returned slower, decreasing
+ it will make memory be returned more eagerly. Setting it to 0 will disable the
+ memory return (which will emulate the behaviour in releases prior to 9.2).
+
.. rts-flag:: -G ⟨generations⟩
:default: 2