path: root/rts
Commit message | Author | Age | Files | Lines
* rts: Gradually return retained memory to the OS | Matthew Pickering | 2021-03-10 | 4 | -18/+104

Related to #19381, #19359, #14702.

After a spike in memory usage we have been conservative about returning allocated blocks to the OS in case we are still allocating a lot and would end up just reallocating them. The result of this was that up to 4 * live_bytes of blocks would be retained once they were allocated, even if memory usage ended up a lot lower. For a heap of size ~1.5G this would result in OS memory reporting 6G, which is both misleading and worrying for users. In long-lived server applications this results in consistently high memory usage when the live data size is much more reasonable (for example ghcide).

Therefore we have a new (2021) strategy which starts by retaining up to 4 * live_bytes of blocks before gradually returning unneeded memory back to the OS on subsequent major GCs which are NOT caused by a heap overflow. Each major GC which is NOT caused by heap overflow increases the consec_idle_gcs counter, and the amount of memory which is retained is inversely proportional to this number. By default the excess memory retained is

  oldGenFactor (controlled by -F) / 2 ^ (consec_idle_gcs / returnDecayFactor)

On a major GC caused by a heap overflow, the `consec_idle_gcs` variable is reset to 0 (as we could continue to allocate more, so retaining all the memory might make sense).

Therefore setting bigger values for `-Fd` makes the rate at which memory is returned slower; smaller values make it get returned faster. Setting `-Fd0` disables the memory return completely, which is the behaviour of older GHC versions.

The default is `-Fd4`, which results in the following scaling:

> mapM print [(x, 1/ (2**(x / 4))) | x <- [1 :: Double ..20]]
(1.0,0.8408964152537146)
(2.0,0.7071067811865475)
(3.0,0.5946035575013605)
(4.0,0.5)
(5.0,0.4204482076268573)
(6.0,0.35355339059327373)
(7.0,0.29730177875068026)
(8.0,0.25)
(9.0,0.21022410381342865)
(10.0,0.17677669529663687)
(11.0,0.14865088937534013)
(12.0,0.125)
(13.0,0.10511205190671433)
(14.0,8.838834764831843e-2)
(15.0,7.432544468767006e-2)
(16.0,6.25e-2)
(17.0,5.255602595335716e-2)
(18.0,4.4194173824159216e-2)
(19.0,3.716272234383503e-2)
(20.0,3.125e-2)

So after 13 consecutive idle GCs only about 0.1 of the maximum memory used will be retained.

Further to this decay factor, the amount of memory we attempt to retain is also influenced by the GC strategy for the oldest generation. If we are using a copying strategy then we will need at least 2 * live_bytes for copying to take place, so we always keep that much. If using compacting or nonmoving then we need a lower number, so we just retain at least `1.2 * live_bytes` for some protection.

In future we might want to make this behaviour more aggressive; some relevant literature is:

> Ulan Degenbaev, Jochen Eisinger, Manfred Ernst, Ross McIlroy, and Hannes Payer. 2016. Idle time garbage collection scheduling. SIGPLAN Not. 51, 6 (June 2016), 570–583. DOI:https://doi.org/10.1145/2980983.2908106

which describes the "memory reducer" in the V8 JavaScript engine, which on an idle collection immediately returns as much memory as possible.
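As a rough illustration of the policy described above, here is a minimal C sketch of the retention calculation. The function and parameter names are invented for the example; only -F, -Fd, consec_idle_gcs, and the live_bytes multiples come from the commit message, and the exact way the decayed factor combines with live_bytes is simplified here.

> #include <math.h>
> #include <stdbool.h>
> #include <stdint.h>
>
> /* Sketch of the retention policy described above; 'f' plays the role of
>  * the -F factor and 'fd' of -Fd (returnDecayFactor). Not the actual
>  * GC.c code. */
> static uint64_t retained_target(uint64_t live_bytes,
>                                 uint32_t consec_idle_gcs,
>                                 double f, double fd,
>                                 bool copying_oldest_gen)
> {
>     /* Always keep enough headroom for the next major collection:
>      * roughly 2 * live for a copying oldest generation, and
>      * 1.2 * live for compacting or nonmoving collection. */
>     double floor_bytes = (copying_oldest_gen ? 2.0 : 1.2) * (double)live_bytes;
>
>     /* -Fd0 disables the gradual return: keep the full -F excess. */
>     double decayed_f = (fd == 0.0)
>         ? f
>         : f / pow(2.0, (double)consec_idle_gcs / fd);
>
>     /* Simplified: retain the live data plus the (decayed) excess factor. */
>     double target = (1.0 + decayed_f) * (double)live_bytes;
>     return (uint64_t)(target > floor_bytes ? target : floor_bytes);
> }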
* eventlog: Repost initialisation events when eventlog restarts | Matthew Pickering | 2021-03-08 | 5 | -9/+93

If startEventlog is called after the program has already started running, then quite a few useful events are missing from the eventlog, because they are only posted when the program starts. This patch adds a mechanism to declare that an event should be reposted every time the startEventlog function is called.

Now in EventLog.c there is a global list of functions called `eventlog_header_funcs`, which stores the functions that should be called every time the eventlog starts. When calling `postInitEvent`, the event will not only be immediately posted to the eventlog but also added to the global list. When startEventLog is called, the list is traversed and the events reposted.
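The mechanism amounts to a "post now, and remember how to post again" pattern. Below is a minimal C sketch of that pattern; apart from the names `postInitEvent` and `eventlog_header_funcs`, which the commit message mentions, the types and helpers are illustrative rather than the real EventLog.c code.

> #include <stdlib.h>
>
> /* Illustrative sketch of "post now, repost on every startEventlog". */
> typedef void (*InitEventPoster)(void);
>
> typedef struct InitEventNode_ {
>     InitEventPoster post;
>     struct InitEventNode_ *next;
> } InitEventNode;
>
> static InitEventNode *eventlog_header_funcs = NULL;
>
> void postInitEvent(InitEventPoster post)
> {
>     /* Remember the poster so the event can be reposted later. */
>     InitEventNode *n = malloc(sizeof(InitEventNode));
>     n->post = post;
>     n->next = eventlog_header_funcs;
>     eventlog_header_funcs = n;
>
>     /* And post the event immediately. */
>     post();
> }
>
> /* Called from startEventlog: replay all initialisation events. */
> static void repostInitEvents(void)
> {
>     for (InitEventNode *n = eventlog_header_funcs; n != NULL; n = n->next) {
>         n->post();
>     }
> }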
* rts: Use a separate free block list for allocatePinned | Matthew Pickering | 2021-03-08 | 4 | -15/+157

The way in which allocatePinned took blocks out of the nursery was leading to horrible fragmentation in some workloads. The strategy now is that a separate free block list is reserved for each capability and blocks are taken from there. When it's empty, the global SM lock is taken and a fresh block of size PINNED_EMPTY_SIZE is allocated.

Fixes #19481
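A rough sketch of the allocation path this describes, assuming standard block-allocator helpers; the capability field name and the function itself are invented for illustration, and only PINNED_EMPTY_SIZE and the per-capability free list idea come from the commit message.

> /* Illustrative sketch, not the actual Storage.c/Capability code. */
> StgPtr allocatePinnedSketch(Capability *cap, W_ n_words)
> {
>     bdescr *bd = cap->pinned_free;   /* capability-local free block list */
>
>     if (bd == NULL || bd->free + n_words > bd->start + bd->blocks * BLOCK_SIZE_W) {
>         /* Only when the local list is exhausted do we touch the global
>          * storage-manager lock and grab a fresh group of blocks. */
>         ACQUIRE_SM_LOCK;
>         bd = allocGroup(PINNED_EMPTY_SIZE);
>         RELEASE_SM_LOCK;
>         cap->pinned_free = bd;
>     }
>
>     StgPtr p = bd->free;
>     bd->free += n_words;
>     return p;
> }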
* eventlog: Add BLOCKS_SIZE event | Matthew Pickering | 2021-03-08 | 4 | -1/+19

The BLOCKS_SIZE event reports the size of the currently allocated blocks in bytes. It is like the HEAP_SIZE event, but reports blocks rather than megablocks. You can work out the current heap fragmentation by looking at the difference between HEAP_SIZE and BLOCKS_SIZE.

Fixes #19357
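For instance, a consumer of these two events could estimate fragmentation with something like the following sketch; the function and parameter names are invented, and the events are assumed to report sizes in bytes as stated above.

> #include <stdint.h>
>
> /* Sketch: space lost between megablock-level and block-level accounting. */
> static uint64_t fragmentation_bytes(uint64_t heap_size_bytes,    /* HEAP_SIZE   */
>                                     uint64_t blocks_size_bytes)  /* BLOCKS_SIZE */
> {
>     /* HEAP_SIZE counts whole megablocks; BLOCKS_SIZE counts only the
>      * blocks allocated inside them, so the difference is fragmentation. */
>     return heap_size_bytes - blocks_size_bytes;
> }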
* eventlog: Add MEM_RETURN event to give information about fragmentation | Matthew Pickering | 2021-03-08 | 8 | -3/+81

See #19357

The event reports:

* the current number of megablocks allocated,
* the number that the RTS thinks it needs, and
* the number it managed to return to the OS.

When current > need, the difference is returned to the OS; the number of mblocks successfully returned is reported by 'returned'. In a fragmented heap, current > need but returned < current - need.
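As a worked example of the relationship described above (all quantities in megablocks; the struct and field names are invented for the sketch):

> #include <stdint.h>
>
> /* One MEM_RETURN sample: all three quantities are in megablocks. */
> typedef struct {
>     uint32_t current;   /* megablocks currently allocated */
>     uint32_t needed;    /* megablocks the RTS thinks it needs */
>     uint32_t returned;  /* megablocks actually handed back to the OS */
> } MemReturnSample;
>
> /* Megablocks the RTS wanted to return but could not, e.g. because live
>  * blocks are scattered across them (fragmentation). */
> static uint32_t stuck_megablocks(MemReturnSample s)
> {
>     uint32_t want = (s.current > s.needed) ? s.current - s.needed : 0;
>     return (want > s.returned) ? want - s.returned : 0;
> }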
* Implement riscv64 LLVM backend | Andreas Schwab | 2021-03-05 | 4 | -2/+152

This enables a registerised build for the riscv64 architecture.
* rts: Make markLiveObject thread-safe | Ben Gamari | 2021-03-04 | 2 | -3/+9

markLiveObject is called by GC worker threads and therefore must be thread-safe. This was a rather egregious oversight which the testsuite missed.

(cherry picked from commit fe28a062e47bd914a6879f2d01ff268983c075ad)
* Add whereFrom and whereFrom# primop | Matthew Pickering | 2021-03-03 | 2 | -0/+10

The `whereFrom` function provides a Haskell interface for using the information created by `-finfo-table-map`. Given a Haskell value, the info table address will be passed to the `lookupIPE` function in order to attempt to find the source location information for that particular closure. At the moment it's not possible to distinguish between the absence of the map and a failed lookup.
* Add -finfo-table-map which maps info tables to source positions | Matthew Pickering | 2021-03-03 | 9 | -0/+170

This new flag embeds a lookup table from the address of an info table to information about that info table. The main interface for consulting the map is the `lookupIPE` C function

> InfoProvEnt * lookupIPE(StgInfoTable *info)

The `InfoProvEnt` has the following structure:

> typedef struct InfoProv_ {
>   char * table_name;
>   char * closure_desc;
>   char * ty_desc;
>   char * label;
>   char * module;
>   char * srcloc;
> } InfoProv;
>
> typedef struct InfoProvEnt_ {
>   StgInfoTable * info;
>   InfoProv prov;
>   struct InfoProvEnt_ *link;
> } InfoProvEnt;

The source positions are approximated in a similar way to the source positions for DWARF debugging information. They are only approximate, but in our experience provide a good enough hint about where the problem might be. It is therefore recommended to use this flag in conjunction with `-g<n>` for more accurate locations.

The lookup table is also emitted into the eventlog when it is available, as it is intended to be used with the `-hi` profiling mode.

Using this flag will significantly increase the size of the resulting object file, but only by a factor of 2-3x in our experience.
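A short sketch of how the quoted C interface might be consulted from RTS code. Only `lookupIPE`, the `InfoProv` fields, and the NULL-on-failure behaviour come from the surrounding entries; the use of `get_itbl` and the reporting function itself are illustrative.

> #include <stdio.h>
> #include "Rts.h"
>
> /* Illustrative: print provenance information for a closure, if any. */
> static void reportProvenance(StgClosure *c)
> {
>     InfoProvEnt *ipe = lookupIPE((StgInfoTable *) get_itbl(c));
>     if (ipe == NULL) {
>         /* Either -finfo-table-map was not used, or the lookup failed;
>          * the two cases cannot currently be distinguished. */
>         fprintf(stderr, "no provenance information\n");
>     } else {
>         fprintf(stderr, "%s (%s) defined in %s at %s\n",
>                 ipe->prov.label, ipe->prov.ty_desc,
>                 ipe->prov.module, ipe->prov.srcloc);
>     }
> }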
* Profiling by info table mode (-hi) | Matthew Pickering | 2021-03-03 | 3 | -0/+19

This profiling mode creates bands by the address of the info table for each closure. This provides a much more fine-grained profiling output than any of the other profiling modes.

The `-hi` profiling mode does not require a profiling build.
* Profiling: Allow heap profiling to be controlled dynamically. | Matthew Pickering | 2021-03-03 | 6 | -11/+60

This patch exposes three new functions in `GHC.Profiling` which allow heap profiling to be enabled and disabled dynamically.

1. startHeapProfTimer - Starts heap profiling with the given RTS options
2. stopHeapProfTimer - Stops heap profiling
3. requestHeapCensus - Perform a heap census on the next context switch, regardless of whether the timer is enabled or not.
* eventlog: Fix various races | Ben Gamari | 2021-03-02 | 4 | -19/+136

Previously the eventlog infrastructure had a couple of races that could pop up when using the startEventLog/endEventLog interfaces. In particular, stopping and then later restarting logging could result in data preceding the eventlog header, breaking the integrity of the stream.

To fix this we rework the invariants regarding the eventlog and generally tighten up the concurrency control surrounding starting and stopping of logging.

We also fix an unrelated bug, wherein log events from disabled capabilities could end up never flushed.
* rts/eventlog: Flush MainCapability buffer in non-threaded RTS | Ben Gamari | 2021-03-01 | 1 | -0/+2

Previously flushEventLog failed to flush anything but the global event buffer in the non-threaded RTS.

Fixes #19436.
* rts/eventlog: Ensure that all capability buffers are flushed | Ben Gamari | 2021-03-01 | 2 | -1/+2

The previous approach performed the flush in yieldCapability. However, as pointed out in #19435, this is wrong since idle capabilities will not go through this codepath.

The fix is simple: undo the optimisation, flushing in `flushEventLog` by calling `flushAllCapsEventsBufs` after acquiring all capabilities.

Fixes #19435.
* Remove the -xt heap profiling option | Matthew Pickering | 2021-02-27 | 2 | -28/+3

It should be left to tooling to perform the filtering to remove these specific closure types from the profile if desired.

Fixes #16795
* rts: Introduce --eventlog-flush-interval flag | Ben Gamari | 2021-02-27 | 2 | -0/+32

This introduces a flag, --eventlog-flush-interval, which can be used to set an upper bound on the amount of time for which an eventlog event will remain enqueued. This can be useful in real-time monitoring settings.
* Move absentError into ghc-prim. | Andreas Klebinger | 2021-02-26 | 3 | -1/+8

When using -fdicts-strict we generate references to absentError while compiling ghc-prim. However, we always load ghc-prim before base, so this caused linker errors. We simply solve this by moving absentError into ghc-prim.

This does mean it's now a panic rather than an exception, and can no longer be caught. But given that it should only be thrown if there is a compiler error, that seems acceptable; in fact we already do this for absentSumFieldError, which has similar constraints.
* linker: Fix atexit handlers on PE | Tamar Christina | 2021-02-22 | 2 | -3/+5
* Do not CAS on slowpath of SpinLock unnecessarily | Dylan Yudaken | 2021-02-22 | 1 | -3/+35

This is a well-known technique to reduce inter-CPU bus traffic while waiting for the lock by reducing the number of writes.
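The technique is usually called "test and test-and-set": spin on plain reads and only attempt the compare-and-swap once the lock looks free. A minimal sketch in portable C11 atomics follows; the RTS SpinLock has its own types and names, so everything below is illustrative.

> #include <stdatomic.h>
>
> typedef struct { atomic_uint lock; } SpinLockSketch;
>
> static void spin_lock(SpinLockSketch *l)
> {
>     for (;;) {
>         unsigned expected = 0;
>         /* Attempt to grab the lock with a single CAS. */
>         if (atomic_compare_exchange_weak(&l->lock, &expected, 1)) {
>             return;
>         }
>         /* Slow path: wait with plain loads only. A CAS (a write) would
>          * bounce the cache line between CPUs; a relaxed load does not. */
>         while (atomic_load_explicit(&l->lock, memory_order_relaxed) != 0) {
>             /* busy-wait; a real implementation may also pause or yield */
>         }
>     }
> }
>
> static void spin_unlock(SpinLockSketch *l)
> {
>     atomic_store_explicit(&l->lock, 0, memory_order_release);
> }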
* rts: Add generic block traversal function, listAllBlocks | Matthew Pickering | 2021-02-18 | 1 | -0/+36

This function is exposed in the RtsAPI.h so that external users have a blessed way to traverse all the different `bdescr`s which are known by the RTS.

The main motivation is to use this function in ghc-debug but avoid having to expose the internal structure of a Capability in the API.
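A sketch of what a caller might look like, assuming the usual callback-plus-user-pointer shape for the API; the exact typedef in RtsAPI.h may differ, and `countBlocks`/`totalBlocks` are invented for the example.

> #include <stdint.h>
> #include "Rts.h"
> #include "RtsAPI.h"
>
> /* Callback invoked once per block descriptor known to the RTS. */
> static void countBlocks(void *user, bdescr *bd)
> {
>     uint64_t *total = user;
>     *total += bd->blocks;          /* accumulate the block count */
> }
>
> /* Sum the number of blocks across the whole heap. */
> static uint64_t totalBlocks(void)
> {
>     uint64_t total = 0;
>     listAllBlocks(countBlocks, &total);
>     return total;
> }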
* rts: TraverseHeap: Update resetStaticObjectForProfiling docs | Daniel Gröber | 2021-02-17 | 1 | -22/+18

Simon's concern in the old comment, specifically:

    So all of the calls to traverseMaybeInitClosureData() here are
    initialising retainer sets with the wrong flip.

is actually exactly what the code was intended to do. It makes the closure data valid, then at the beginning of the traversal the flip bit is flipped, resetting all closures across the heap to invalid.

Now, it used to be that the profiling code using the traversal has its own sense of valid vs. invalid beyond what the traversal code does, and indeed the retainer profiler still does this: there, a getClosureData of NULL is considered an invalid retainer set. So in effect there wasn't any difference in invalidating closure data rather than just resetting it to a valid zero, which might be what confused Simon at the time.

As the code is now, it actually uses the value of the valid/invalid bit in the form of the 'first_visit' argument to the 'visit' callback, so there is a difference.
* rts: TraverseHeap: Fix failed to inline warnings | Daniel Gröber | 2021-02-17 | 1 | -1/+1

GCC warns that variadic functions simply cannot be inlined.
* rts: ProfHeap: Move definitions for Census to new header | Daniel Gröber | 2021-02-17 | 2 | -50/+77
* rts: ProfHeap: Merge some redundant ifdefs | Daniel Gröber | 2021-02-17 | 1 | -10/+1
* rts: TraverseHeap: Allow visit_cb to be NULL | Daniel Gröber | 2021-02-17 | 1 | -2/+4
* rts: TraverseHeap: Add a basic test | Daniel Gröber | 2021-02-17 | 4 | -0/+224

For now this just tests that the order of the callbacks is what we expect for a couple of synthetic heap graphs.
* rts: TraverseHeap: Move stackElement to header | Daniel Gröber | 2021-02-17 | 2 | -69/+64

The point of this is to let user code call traversePushClosure directly instead of going through traversePushRoot. This in turn allows specifying a stackElement to be used when the traversal returns from a top-level (root) closure.
* rts: TraverseHeap: Make "flip" bit flip into it's own functionDaniel Gröber2021-02-173-11/+25
|
* rts: TraverseHeap: Move "flip" bit into traverseState structDaniel Gröber2021-02-176-57/+67
|
* rts: TraverseHeap: Make trav. data macros into functions | Daniel Gröber | 2021-02-17 | 4 | -22/+30

This allows the global 'flip' variable not to be exported, and lets a future commit also make it part of the traverseState struct.
* rts: TraverseHeap: Simplify profiling header | Daniel Gröber | 2021-02-17 | 4 | -13/+13

Having a union in the closure profiling header really just complicates things, so get back to basics: we just have a single StgWord there for now.
* rts: TraverseHeap: Update some comments | Daniel Gröber | 2021-02-17 | 1 | -4/+4

data_out was renamed to child_data at some point.
* rts: TraverseHeap: Introduce callback for subtree completion | Daniel Gröber | 2021-02-17 | 3 | -77/+185

The callback 'return_cb' allows users to perform additional accounting when the traversal of a subtree is completed. This is needed, for example, to determine the number or total size of closures reachable from a given closure.

This commit also makes the lifetime increase of stackElements from commit "rts: TraverseHeap: Increase lifetime of stackElements" optional, based on whether 'return_cb' is set or not.

Note that our definition of "subtree" here includes leaf nodes. So the invariant is that return_cb is called for all nodes in the traversal exactly once.
* rts: TraverseHeap: Link parent stackElements on the stack | Daniel Gröber | 2021-02-17 | 1 | -44/+56

The new 'sep' field links a stackElement to its "parent", that is, the stackElement containing its parent closure. Currently not all closure types create long-lived elements on the stack, so this does not cover all parents along the path to the root, but that is about to change in a future commit.
* rts: TraverseHeap: Increase lifetime of stackElements | Daniel Gröber | 2021-02-17 | 1 | -16/+26

This modifies the lifetime of stackElements such that they stay on the stack until processing of all child closures is complete. Currently the stackElement representing a set of child closures will be removed as soon as processing of the last closure _starts_.

We will use this in a future commit to allow storing information on the stack which should be accumulated in a bottom-up manner along the closure parent-child relationship.

Note that the lifetime increase does not apply to 'type == posTypeFresh' stack elements. This is because they will always be pushed right back onto the stack as regular stack elements anyway.
* rts: TraverseHeap: Rename traversePushClosure to traversePushRoot | Daniel Gröber | 2021-02-17 | 3 | -4/+10
* Fix typos | Brian Wignall | 2021-02-06 | 6 | -9/+9
* rts: Fix arguments for foreign calls of interpreter | Stefan Schulze Frielinghaus | 2021-02-05 | 1 | -2/+24

Function arguments passed to the interpreter are extended to whole words. However, the foreign function interface expects correctly typed argument pointers. Accordingly, we have to adjust argument pointers in case of a big-endian architecture.

In contrast to function arguments, where subwords are passed in the low bytes of a word, the return value is expected to reside in the high bytes of a word.
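A sketch of the kind of adjustment described above, assuming arguments live in word-sized slots; the function is invented for illustration, and WORDS_BIGENDIAN stands in for whatever big-endian define the build provides.

> #include <stdint.h>
> #include <stddef.h>
>
> /* Illustrative: given a word-sized argument slot, return a pointer that
>  * is correctly typed for an argument of 'arg_size' bytes. Subword
>  * arguments are stored in the low-order bytes of the word, which on a
>  * big-endian machine sit at the high end of the slot in memory. */
> static void *ffiArgPointer(uint64_t *slot, size_t arg_size)
> {
> #if defined(WORDS_BIGENDIAN)
>     return (uint8_t *)slot + sizeof(*slot) - arg_size;
> #else
>     return slot;
> #endif
> }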
* rts: Use properly sized pointers in e.g. rts_mkInt8 | Stefan Schulze Frielinghaus | 2021-02-05 | 1 | -26/+20

Since commit be5d74caab the payload of a closure of Int<N> or Word<N> is not extended anymore to the machine's word size. Instead, only the first N bits of the payload are written. This patch ensures that only those bits are read/written, independent of the machine's endianness.
* rts: sm/GC.c: make num_idle unsigned | Andreas Klebinger | 2021-01-28 | 1 | -1/+1

We compare it to n_gc_idle_threads, which is unsigned, so make num_idle unsigned as well to avoid a warning.
* Deprecate -h flag | Matthew Pickering | 2021-01-27 | 1 | -0/+5

It is confusing that it defaults to two different things depending on whether we are in the profiling way or not.

Use -hc if you have a profiling build.
Use -hT if you have a normal build.

Fixes #19031
* Remove ioManager{Start,Die,Wakeup} from IOManager.h | Duncan Coutts | 2021-01-25 | 6 | -15/+34

They are not part of the IOManager interface used within the rest of the RTS. They are part of the interface of specific I/O manager implementations.

They are no longer called directly elsewhere in the RTS, and are now only called by the dispatch functions in IOManager.c.
* Add a common wakeupIOManager hook | Duncan Coutts | 2021-01-25 | 3 | -1/+33

Used in the scheduler in threaded mode. Replaces the direct calls to ioManagerWakeup, which is part of a specific I/O manager implementation.
* Replace an ioManagerDie call with stopIOManager | Duncan Coutts | 2021-01-25 | 2 | -1/+14

The latter is the proper hook defined in IOManager.h. The former is part of a specific I/O manager implementation (the threaded unix one).
* Replace a direct call to ioManagerStartCap with a new hook | Duncan Coutts | 2021-01-25 | 3 | -3/+48

Replace a direct call to ioManagerStartCap in forkProcess in Schedule.c with a new hook initIOManagerAfterFork in IOManager.

This replaces a direct hook in the scheduler from a single I/O manager impl (the threaded unix one) with a generic hook.

Add some commentary on opportunities for future rationalisation.
* Move hooks for I/O manager startup / shutdown into IOManager.{c,h} | Duncan Coutts | 2021-01-25 | 3 | -20/+88
* Move ioManager{Start,Wakeup,Die} to internal IOManager.h | Duncan Coutts | 2021-01-25 | 6 | -2/+16

Move them from the external IOInterface.h to the internal IOManager.h. The functions are all in fact internal: they are not used from the base library at all.

Remove ioManagerWakeup as an exported symbol. It is not used elsewhere.
* Move setIOManagerControlFd from Capability.c to IOManager.c | Duncan Coutts | 2021-01-25 | 2 | -17/+17

This is a better home for it. It is not really an aspect of capabilities. It is specific to one of the I/O manager impls.
* Start to centralise the I/O manager hooks from other bits of the RTS | Duncan Coutts | 2021-01-25 | 3 | -0/+47

It is currently rather difficult to understand or work with the various I/O manager implementations. This is for a few reasons:

1. They do not have a clear or common API. There are some common function names, but a lot of things just get called directly.
2. They have hooks into many other parts of the RTS where they get called from.
3. There is a _lot_ of CPP involved, both THREADED_RTS vs !THREADED_RTS and also mingw32_HOST_OS vs !mingw32_HOST_OS. This doesn't really identify the I/O manager implementation.
4. They have data structures with unclear ownership, or that are co-owned with other components like the scheduler. Some data structures are used by multiple I/O managers.

One thing that would help is if the interface between the I/O managers and the rest of the RTS was clearer, even if it was not completely uniform. Centralising it would make it easier to see how to reduce any unnecessary diversity in the interfaces.

This patch makes a start by creating a new IOManager.{h,c} module. It is initially empty, but we will move things into it in subsequent patches.
* Rename includes/rts/IOManager.h to IOInterface.h | Duncan Coutts | 2021-01-25 | 3 | -3/+3

Naming is hard. Where we want to get to is to have a clear internal and external API for the IO manager within the RTS. What we have right now is just the external API (used in base for the Haskell side of the threaded IO manager impls), living in includes/rts/IOManager.h. We want to add a clear RTS-internal API, which really ought to live in rts/IOManager.h.

Several people think it's too confusing to have both:

 * includes/rts/IOManager.h for the external API
 * rts/IOManager.h for the internal API

So the plan is to add rts/IOManager.{h,c} as the internal parts, and rename the external part to be includes/rts/IOInterface.h.

It is admittedly not great to have .h files in includes/rts/ called "interface", since by definition every .h file under includes/ is an interface! Alternative naming scheme suggestions welcome!