path: root/rts/sm
Commit message | Author | Age | Files | Lines
* rts: state explicitly what evacuate and scavenge mean in the copying gc | Adam Sandberg Ericsson | 2022-04-27 | 2 files | -1/+9
* Add note about inefficiency in returnMemoryToOS | Fabian Thorand | 2022-04-27 | 1 file | -0/+8
* Defer freeing of mega block groups | Fabian Thorand | 2022-04-27 | 3 files | -35/+245

  Solves the quadratic worst-case performance of freeing megablocks that
  was described in issue #19897.

  During GC runs, we now keep a secondary free list for megablocks that
  is neither sorted nor coalesced. That way, free becomes an O(1)
  operation at the expense of not being able to reuse memory for larger
  allocations. At the end of a GC run, the secondary free list is sorted
  and then merged into the actual free list in a single pass. That way,
  our worst-case performance is O(n log(n)) rather than O(n^2).

  We postulate that temporarily losing coalescence during a single GC
  run won't have any adverse effects in practice because:

  - We would need to release enough memory during the GC, and then after
    that (but within the same GC run) allocate a megablock group of more
    than one megablock. This seems unlikely, as large objects are not
    copied during GC, and so we shouldn't need such large allocations
    during a GC run.

  - Allocations of megablock groups of more than one megablock are rare.
    They only happen when a single heap object is large enough to
    require that amount of space. Any allocation areas that are supposed
    to hold more than one heap object cannot use megablock groups,
    because only the first megablock of a megablock group has valid
    `bdescr`s. Thus, a heap object can only start in the first megablock
    of a group, not in later ones.
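  A minimal sketch of the two-list scheme described above (the names
  `Block`, `defer_free`, and `commit_deferred_frees` are illustrative,
  not the actual RTS code):

      /* During GC, frees push onto an unsorted secondary list in O(1);
       * at the end of the GC the secondary list is sorted and merged
       * into the address-ordered primary list in a single pass. */
      #include <stdint.h>
      #include <stdlib.h>

      typedef struct Block { uintptr_t addr; struct Block *next; } Block;

      static Block *free_list = NULL;  /* primary: address-sorted      */
      static Block *deferred  = NULL;  /* secondary: unsorted, GC-only */

      /* O(1) free during a GC run: no sorting, no coalescing. */
      static void defer_free(Block *b) { b->next = deferred; deferred = b; }

      static int cmp_addr(const void *x, const void *y) {
          const Block *a = *(Block *const *)x, *b = *(Block *const *)y;
          return (a->addr > b->addr) - (a->addr < b->addr);
      }

      /* End of GC: sort the deferred blocks (O(n log n)) and splice
       * them into the sorted primary list in one forward pass. */
      static void commit_deferred_frees(void) {
          size_t n = 0;
          for (Block *b = deferred; b; b = b->next) n++;
          if (n == 0) return;
          Block **v = malloc(n * sizeof *v);  /* assume success (sketch) */
          size_t i = 0;
          for (Block *b = deferred; b; b = b->next) v[i++] = b;
          qsort(v, n, sizeof *v, cmp_addr);
          Block **tail = &free_list;
          for (i = 0; i < n; i++) {
              while (*tail && (*tail)->addr < v[i]->addr)
                  tail = &(*tail)->next;
              v[i]->next = *tail;
              *tail = v[i];
              tail = &v[i]->next;
          }
          deferred = NULL;
          free(v);
      }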
* rts: Fix various #include issues | Ben Gamari | 2022-04-06 | 3 files | -5/+6

  This fixes various violations of the newly-added RTS includes linter.
* rts: Don't mark object code in markCAFs unless necessary | Ben Gamari | 2022-03-23 | 1 file | -2/+4

  Previously `markCAFs` would call `markObjectCode` even in non-major
  GCs. This is problematic since `prepareUnloadCheck` is not called in
  such GCs, meaning that the section index has not been updated.

  Fixes #21254
* rts: Untag function field in scavenge_PAP_payload | Ben Gamari | 2022-03-23 | 1 file | -1/+2

  Previously we failed to untag the function closure when scavenging the
  payload of a PAP, resulting in an invalid closure pointer being passed
  to scavenge_large_bitmap and consequently #21254. Fix this.

  Fixes #21254
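  A short sketch of the tagging scheme involved (a generic illustration;
  the real RTS uses its TAG_MASK and UNTAG_CLOSURE macros): the low bits
  of a closure pointer may carry a tag such as a function's arity, and
  must be cleared before the pointer is dereferenced.

      #include <stdint.h>

      #define TAG_MASK 7   /* low 3 bits on 64-bit platforms */

      /* Strip the pointer tag before using the closure pointer; for a
       * PAP's function field, skipping this yields a bogus pointer. */
      static inline void *untag(void *p) {
          return (void *)((uintptr_t)p & ~(uintptr_t)TAG_MASK);
      }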
* rts: Address failures to inline | Douglas Wilson | 2022-02-02 | 3 files | -11/+25
* Fix a few Note inconsistencies | Ben Gamari | 2022-02-01 | 12 files | -29/+19
* rts: Rip out SPARC support | Ben Gamari | 2022-01-29 | 1 file | -20/+0
* rts/winio: Fix #18382 | Ben Gamari | 2022-01-18 | 3 files | -3/+0

  Here we refactor WinIO's IO completion scheme, squashing a memory leak
  and fixing #18382.

  To fix #18382 we drop the special thread status introduced for IoPort
  blocking, BlockedOnIoCompletion, as well as drop the non-threaded
  RTS's special deadlock detection logic (which is redundant to the GC's
  deadlock detection logic), as proposed in #20947.

  Previously WinIO relied on `foreign import ccall "wrapper"` to create
  an adjustor thunk which can be attached to the OVERLAPPED structure
  passed to the operating system. It would then use `foreign import
  ccall "dynamic"` to back out the original continuation from the
  adjustor. This round trip is significantly more expensive than the
  alternative, using a StablePtr. Furthermore, the implementation let
  the adjustor leak, meaning that every IO request would leak a page of
  memory.

  Fixes T18382.
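  A sketch of the StablePtr approach under assumed structure names
  (`CompletionData` and `on_completion` are hypothetical, not WinIO's
  actual code): the continuation travels alongside the OVERLAPPED
  rather than inside an adjustor thunk.

      #include <windows.h>
      #include "HsFFI.h"

      typedef struct {
          OVERLAPPED  overlapped;   /* must be first: the OS hands this back */
          HsStablePtr continuation; /* stable reference to the Haskell cont  */
      } CompletionData;

      /* On IO completion, recover the enclosing struct from the
       * OVERLAPPED pointer, hand the StablePtr back to Haskell, and
       * free it so nothing leaks. */
      static void on_completion(OVERLAPPED *ov) {
          CompletionData *cd = (CompletionData *)ov;  /* first member */
          /* ... enqueue cd->continuation for the Haskell side ... */
          hs_free_stable_ptr(cd->continuation);
      }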
* rts: correct stats when running with +RTS -qn1 | Douglas Wilson | 2021-12-12 | 1 file | -28/+42

  Despite the documented care having been taken, several bugs are fixed
  here. When run with -qn1, when a SYNC_GC_PAR is requested we will have

      n_gc_threads == n_capabilities && n_gc_idle_threads == (n_gc_threads - 1)

  In this case we now:

  * Don't increment par_collections
  * Don't increment par_balanced_copied
  * Don't emit debug traces for idle threads
  * Take the fast path in scavenge_until_all_done, wakeup_gc_threads,
    and shutdown_gc_threads.

  Some ASSERTs have also been tightened.

  Fixes #19685
* Require all dirty_MUT_VAR callers to do explicit stg_MUT_VAR_CLEAN_info comparison (#20088) | nineonine | 2021-12-02 | 1 file | -7/+9
* rts: Ensure that markCAFs marks object code | Ben Gamari | 2021-11-20 | 1 file | -4/+11

  Previously `markCAFs` would only evacuate CAFs' indirectees. This
  would allow reachable object code to be unloaded by the linker, as
  `evacuate` may never be called on the CAF itself, despite it being
  reachable via the `{dyn,revertible}_caf_list`s.

  To fix this we teach `markCAFs` to explicitly call `markObjectCode`,
  ensuring that the linker is aware of objects reachable via the CAF
  lists.

  Fixes #20649.
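  A hedged sketch of the shape of the fix (the struct and the
  `evacuate`/`markObjectCode` signatures are simplified stand-ins for
  the real RTS declarations):

      typedef struct Caf {
          void       *indirectee;  /* the CAF's current value */
          struct Caf *link;        /* next CAF in the list    */
      } Caf;

      extern void evacuate(void **p);            /* GC: mark/copy value  */
      extern void markObjectCode(const void *p); /* linker: code is live */

      static void markCAFs(Caf *caf_list) {
          for (Caf *c = caf_list; c != NULL; c = c->link) {
              evacuate(&c->indirectee);  /* previously this was all we did */
              markObjectCode(c);         /* new: keep the CAF's code alive */
          }
      }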
* rts/nonmoving: Enable selector optimisation by default | Ben Gamari | 2021-10-12 | 1 file | -5/+1
* rts/nonmoving: Rename mark_* to trace_* | Ben Gamari | 2021-10-12 | 1 file | -42/+42

  These functions really do no marking; they merely trace pointers.
* nonmoving: Fix and factor out mark_trec_chunk | Ben Gamari | 2021-10-12 | 1 file | -22/+17

  We need to ensure that the TRecChunk itself is marked, in addition to
  the TRecs it contains.
* fix non-moving gc heap space requirements estimate | Teo Camarasu | 2021-10-07 | 1 file | -1/+1

  The space requirements of the non-moving gc are comparable to those of
  the compacting gc, not the copying gc. The copying gc incurs a much
  larger overhead.

  Fixes #20475
* Corrected types of thread ids obtained from the RTS | Mann mit Hut | 2021-10-06 | 1 file | -1/+1

  While the thread ids had been changed to 64-bit words in
  e57b7cc6d8b1222e0939d19c265b51d2c3c2b4c0, the return type of the
  foreign import function used to retrieve these ids - namely
  'GHC.Conc.Sync.getThreadId' - was never updated accordingly. To fix
  that, this function now returns a 'CULLong'. In addition, the types
  used in the thread labeling subsystem were adjusted as well, and
  several format strings were modified throughout the whole RTS to
  display thread ids in a consistent and correct way.

  Fixes #16761
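  For the format-string side, a small sketch of the portable way to
  print a 64-bit thread id from C (illustrative; the RTS has its own
  FMT_* format macros for this):

      #include <inttypes.h>
      #include <stdio.h>

      /* "%d" or "%lu" truncates or warns once ids are 64-bit words;
       * PRIu64 keeps the format correct on every platform. */
      static void report_thread(uint64_t thread_id) {
          printf("thread %" PRIu64 " created\n", thread_id);
      }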
* rts: Add missing write barriers in MVar wake-up paths | Ben Gamari | 2021-10-02 | 1 file | -0/+4

  Previously PerformPut failed to respect the non-moving collector's
  snapshot invariant, hiding references to an MVar and its new value by
  overwriting a stack frame without dirtying the stack. Fix this.

  PerformTake exhibited a similar bug, failing to dirty (and therefore
  mark) the blocked stack before mutating it.

  Closes #20399.
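  A minimal sketch of the dirty-before-mutate discipline the fix
  restores (a generic snapshot-at-the-beginning barrier; `Stack` and
  `mark_stack_contents` are stand-ins, not the RTS code):

      #include <stddef.h>

      typedef struct Stack { int dirty; void **frames; } Stack;

      extern void mark_stack_contents(Stack *s);  /* collector hook */

      /* Under the snapshot invariant, an object's old contents must be
       * recorded before the first mutation since the snapshot. */
      static void overwrite_frame(Stack *s, size_t i, void *new_val) {
          if (!s->dirty) {
              s->dirty = 1;
              mark_stack_contents(s);  /* preserve old references */
          }
          s->frames[i] = new_val;      /* now safe to overwrite */
      }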
* Remove special case for large objects in allocateForCompact | Fabian Thorand | 2021-09-29 | 1 file | -11/+0

  allocateForCompact() is called when the current allocation for the
  compact region does not fit in the nursery. It previously had a
  special case for objects exceeding the large object threshold. In that
  case, it would allocate a new compact region block just for that
  object. That led to a lot of small blocks being allocated in compact
  regions with a larger default block size (`autoBlockW`).

  This commit removes this special case because having a lot of small
  compact region blocks contributes significantly to memory
  fragmentation. The removal should be valid because

  - a more generic case for allocating a new compact region block
    follows at the end of allocateForCompact(), and that one takes
    `autoBlockW` into account
  - the reason for allocating separate blocks for large objects in the
    main heap seems to be to avoid copying during GCs, but once inside
    the compact region, the object will never be copied anyway.

  Fixes #18757. A regression test T18757 was added.
* Move `/includes` to `/rts/include`, sort per package better | John Ericson | 2021-08-09 | 1 file | -1/+1

  In order to make the packages in this repo "reinstallable", we need to
  associate source code with specific packages. Having a top-level
  `/includes` dir that mixes concerns (which packages' includes?) gets
  in the way of this.

  To start, I have moved everything to `rts/`, which is mostly correct.
  There are a few things, however, that really don't belong in the rts
  (like the generated constants haskell type, `CodeGen.Platform.h`).
  Those needed to be manually adjusted.

  Things of note:

  - No symlinking for sake of windows, so we hard-link at configure
    time.
  - `CodeGen.Platform.h` no longer has a `.hs` extension (in addition to
    being moved to `compiler/`) so as not to confuse anyone, since it is
    next to Haskell files.
  - Blanket `-Iincludes` is gone in both build systems; include paths
    now more strictly respect per-package dependencies.
  - `deriveConstants` has been taught to not require a `--target-os`
    flag when generating the platform-agnostic Haskell type. Make takes
    advantage of this, but Hadrian has yet to.
* Make `PosixSource.h` installed and under `rts/` | John Ericson | 2021-08-09 | 13 files | -15/+15

  `PosixSource.h` is used outside of the rts, so we do this rather than
  just fish it out of the repo in an ad-hoc way, in order to make
  packages in this repo more self-contained.
* Add configure flag to enable ASSERTs in all ways | Daniel Gröber | 2021-07-29 | 3 files | -4/+3

  Running the test suite with asserts enabled is somewhat tricky at the
  moment, as running it with a GHC compiled the DEBUG way has some
  hundred failures from the start. These seem to be unrelated to
  assertions, though. So this provides a toggle to make it easier to
  debug failing assertions using the test suite.
* rts: Drop allocateExec and friends | Ben Gamari | 2021-07-27 | 1 file | -92/+0

  All uses of these now use ExecPage.
* rts: Move libffi interfaces all to Adjustor | Ben Gamari | 2021-07-27 | 1 file | -83/+2

  Previously the libffi Adjustor implementation would use allocateExec
  to create executable mappings. However, allocateExec is also used
  elsewhere in GHC to allocate things other than ffi_closure, which is a
  use-case which libffi does not support.
* Guard Allocate Exec via LIBFFI by LIBFFI | Moritz Angermann | 2021-06-20 | 1 file | -1/+1

  We now have two darwin flavours: AArch64-Darwin and x86_64-Darwin. The
  latter has proper custom adjustor support; the former relies on
  libffi. Mixing both leads to odd crashes, as the closures might not
  fit the size of the libffi closures. Hence this needs to be guarded by
  the USE_LIBFFI_FOR_ADJUSTORS guard.

  Original patch by Hamish Mackenzie
* Fix copy+pasto in Sanity.c | Matthew Pickering | 2021-04-02 | 1 file | -1/+1
* Allocate Adjustors and mark them readable in two steps | Moritz Angermann | 2021-03-29 | 1 file | -1/+36

  This drops allocateExec for darwin and replaces it with an alloc,
  write, mark-executable strategy instead. This prevents us from trying
  to allocate an executable range and then write to it, which W^X will
  prohibit on darwin. This will *only* work if we can use mmap.
* Make traceHeapEventInfo an init event | Matthew Pickering | 2021-03-14 | 1 file | -6/+18

  This means it will be re-posted every time the eventlog is started.
* rts: Gradually return retained memory to the OS | Matthew Pickering | 2021-03-10 | 2 files | -5/+75

  Related to #19381 #19359 #14702

  After a spike in memory usage we have been conservative about
  returning allocated blocks to the OS in case we are still allocating a
  lot and would end up just reallocating them. The result of this was
  that up to 4 * live_bytes of blocks would be retained once they were
  allocated, even if memory usage ended up a lot lower.

  For a heap of size ~1.5G, this would result in OS memory reporting 6G,
  which is both misleading and worrying for users. In long-lived server
  applications this results in consistently high memory usage when the
  live data size is much more reasonable (for example ghcide).

  Therefore we have a new (2021) strategy which starts by retaining up
  to 4 * live_bytes of blocks before gradually returning unneeded memory
  back to the OS on subsequent major GCs which are NOT caused by a heap
  overflow.

  Each major GC which is NOT caused by heap overflow increases the
  consec_idle_gcs counter, and the amount of memory which is retained is
  inversely proportional to this number. By default the excess memory
  retained is

      oldGenFactor (controlled by -F) / 2 ^ (consec_idle_gcs / returnDecayFactor)

  On a major GC caused by a heap overflow, the `consec_idle_gcs`
  variable is reset to 0 (as we could continue to allocate more, so
  retaining all the memory might make sense).

  Therefore setting bigger values for `-Fd` makes the rate at which
  memory is returned slower. Smaller values make it get returned faster.
  Setting `-Fd0` disables the memory return completely, which is the
  behaviour of older GHC versions.

  The default is `-Fd4`, which results in the following scaling:

      > mapM print [(x, 1/ (2**(x / 4))) | x <- [1 :: Double ..20]]
      (1.0,0.8408964152537146)
      (2.0,0.7071067811865475)
      (3.0,0.5946035575013605)
      (4.0,0.5)
      (5.0,0.4204482076268573)
      (6.0,0.35355339059327373)
      (7.0,0.29730177875068026)
      (8.0,0.25)
      (9.0,0.21022410381342865)
      (10.0,0.17677669529663687)
      (11.0,0.14865088937534013)
      (12.0,0.125)
      (13.0,0.10511205190671433)
      (14.0,8.838834764831843e-2)
      (15.0,7.432544468767006e-2)
      (16.0,6.25e-2)
      (17.0,5.255602595335716e-2)
      (18.0,4.4194173824159216e-2)
      (19.0,3.716272234383503e-2)
      (20.0,3.125e-2)

  So after 13 consecutive GCs only 0.1 of the maximum memory used will
  be retained.

  Further to this decay factor, the amount of memory we attempt to
  retain is also influenced by the GC strategy for the oldest
  generation. If we are using a copying strategy, then we will need at
  least 2 * live_bytes for copying to take place, so we always keep that
  much. If using compacting or nonmoving, then we need a lower number,
  so we just retain at least `1.2 * live_bytes` for some protection.

  In future we might want to make this behaviour more aggressive; some
  relevant literature is

  > Ulan Degenbaev, Jochen Eisinger, Manfred Ernst, Ross McIlroy, and
  > Hannes Payer. 2016. Idle time garbage collection scheduling.
  > SIGPLAN Not. 51, 6 (June 2016), 570–583.
  > DOI: https://doi.org/10.1145/2980983.2908106

  which describes the "memory reducer" in the V8 javascript engine,
  which on an idle collection immediately returns as much memory as
  possible.
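  A small sketch of the retention schedule stated above (the function
  name is illustrative; defaults assumed from the text: -F2 and -Fd4):

      #include <math.h>
      #include <stdio.h>

      /* Fraction of live_bytes we still try to retain after a run of
       * consecutive idle major GCs. -Fd0 disables returning memory. */
      static double retained_fraction(double oldGenFactor,      /* -F  */
                                      double returnDecayFactor, /* -Fd */
                                      unsigned consec_idle_gcs) {
          if (returnDecayFactor == 0) return oldGenFactor;
          return oldGenFactor
                 / pow(2.0, (double)consec_idle_gcs / returnDecayFactor);
      }

      int main(void) {
          for (unsigned gcs = 0; gcs <= 13; gcs++)
              printf("%2u idle GCs -> retain %.3f * live_bytes\n",
                     gcs, retained_fraction(2.0, 4.0, gcs));
          return 0;
      }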
* rts: Use a separate free block list for allocatePinned | Matthew Pickering | 2021-03-08 | 2 files | -15/+154

  The way in which allocatePinned took blocks out of the nursery was
  leading to horrible fragmentation in some workloads. The strategy now
  is that a separate free block list is reserved for each capability and
  blocks are taken from there. When it's empty, the global SM lock is
  taken and a fresh block of size PINNED_EMPTY_SIZE is allocated.

  Fixes #19481
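  A generic sketch of the per-capability strategy (structures and names
  hypothetical, not the RTS code): the fast path never locks; the slow
  path takes the global SM lock to carve out a fresh block.

      #include <pthread.h>
      #include <stdlib.h>

      typedef struct Block { struct Block *next; } Block;

      typedef struct Capability {
          Block *pinned_free;  /* capability-local free list */
      } Capability;

      static pthread_mutex_t sm_lock = PTHREAD_MUTEX_INITIALIZER;

      static Block *alloc_fresh_block(void) {  /* stand-in allocator */
          return calloc(1, 4096);
      }

      static Block *alloc_pinned_block(Capability *cap) {
          Block *b = cap->pinned_free;
          if (b != NULL) {                /* fast path: no locking */
              cap->pinned_free = b->next;
              return b;
          }
          pthread_mutex_lock(&sm_lock);   /* slow path: global SM lock */
          b = alloc_fresh_block();
          pthread_mutex_unlock(&sm_lock);
          return b;
      }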
* eventlog: Add MEM_RETURN event to give information about fragmentation | Matthew Pickering | 2021-03-08 | 3 files | -3/+9

  See #19357

  The event reports:

  * the current number of megablocks allocated,
  * the number that the RTS thinks it needs, and
  * the number it managed to return to the OS.

  When current > need, the difference is returned to the OS; the number
  of successfully returned mblocks is reported by 'returned'. In a
  fragmented heap current > need but returned < current - need.
* rts: Add generic block traversal function, listAllBlocks | Matthew Pickering | 2021-02-18 | 1 file | -0/+36

  This function is exposed in RtsAPI.h so that external users have a
  blessed way to traverse all the different `bdescr`s which are known by
  the RTS.

  The main motivation is to use this function in ghc-debug, but avoid
  having to expose the internal structure of a Capability in the API.
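  A sketch of how an external tool might use it, assuming the callback
  shape `void cb(void *user, bdescr *bd)` exposed via RtsAPI.h (compile
  against the GHC RTS headers):

      #include "Rts.h"
      #include "RtsAPI.h"

      /* Callback invoked once per block descriptor known to the RTS. */
      static void count_block(void *user, bdescr *bd) {
          *(StgWord *)user += bd->blocks;  /* blocks in this group */
      }

      static StgWord total_blocks(void) {
          StgWord n = 0;
          listAllBlocks(count_block, &n);
          return n;
      }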
* rts: TraverseHeap: Move "flip" bit into traverseState struct | Daniel Gröber | 2021-02-17 | 1 file | -1/+1
* Fix typos | Brian Wignall | 2021-02-06 | 1 file | -1/+1
* rts: sm/GC.c: make num_idle unsigned | Andreas Klebinger | 2021-01-28 | 1 file | -1/+1

  We compare it to n_gc_idle_threads, which is unsigned as well. So make
  both unsigned to avoid a warning.
* rts: gc: use mutex+condvar instead of spinlocks in gc entry/exit | Douglas Wilson | 2021-01-17 | 2 files | -79/+110

  Use a timed wait on a condition variable in waitForGcThreads; fix a
  dodgy timespec calculation.
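  A sketch of the usual fix for such a timespec bug (illustrative, not
  the RTS code): compute an absolute deadline and normalise the
  nanosecond carry before calling pthread_cond_timedwait.

      #include <pthread.h>
      #include <time.h>

      static void deadline_in_ms(struct timespec *ts, long ms) {
          clock_gettime(CLOCK_REALTIME, ts);
          ts->tv_sec  += ms / 1000;
          ts->tv_nsec += (ms % 1000) * 1000000L;
          if (ts->tv_nsec >= 1000000000L) {  /* carry into seconds */
              ts->tv_sec  += 1;
              ts->tv_nsec -= 1000000000L;
          }
      }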
* rts: add max_n_todo_overflow internal counter | Douglas Wilson | 2021-01-17 | 3 files | -2/+18

  I've never observed this counter taking a non-zero value; however, I
  do think its existence is justified by the comment in
  grab_local_todo_block.

  I've not added it to RTSStats in GHC.Stats, as it doesn't seem worth
  the API churn.
* rts: remove no_work counter | Douglas Wilson | 2021-01-17 | 3 files | -12/+3

  We are no longer busy-waiting, so this counter is no longer
  meaningful.
* rts: gc: use mutex+condvar instead of sched_yield in gc main loop | Douglas Wilson | 2021-01-17 | 3 files | -134/+237

  Here we remove the schedYield loop in scavenge_until_all_done+any_work,
  replacing it with a single mutex + condition variable.

  Previously any_work would check todo_large_objects, todo_q, and
  todo_overflow of each gen for work. Comments explained that this was
  checking global work in any gen. However, these must have been out of
  date, because all of these locations are local to a gc thread.

  We've eliminated any_work entirely, instead simply looping back into
  scavenge_loop, which will quickly return if there is no work.

  shutdown_gc_threads is called slightly earlier than before. This
  ensures that n_gc_threads can never be observed to increase from 0 by
  a worker thread.

  startup_gc_threads is removed. It consisted of a single variable
  assignment, which is moved inline to its single call site.
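  A minimal sketch of the mutex+condvar pattern this moves to (generic
  pthreads code, not the RTS's implementation):

      #include <pthread.h>
      #include <stdbool.h>

      static pthread_mutex_t gc_lock = PTHREAD_MUTEX_INITIALIZER;
      static pthread_cond_t  gc_cond = PTHREAD_COND_INITIALIZER;
      static int  pending_work  = 0;
      static bool shutting_down = false;

      static void push_work(void) {
          pthread_mutex_lock(&gc_lock);
          pending_work++;
          pthread_cond_signal(&gc_cond);  /* wake one sleeping worker */
          pthread_mutex_unlock(&gc_lock);
      }

      static void worker_loop(void) {
          pthread_mutex_lock(&gc_lock);
          for (;;) {
              while (pending_work == 0 && !shutting_down)
                  pthread_cond_wait(&gc_cond, &gc_lock); /* no spinning */
              if (shutting_down) break;
              pending_work--;
              pthread_mutex_unlock(&gc_lock);
              /* ... scavenge without holding the lock ... */
              pthread_mutex_lock(&gc_lock);
          }
          pthread_mutex_unlock(&gc_lock);
      }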
* rts: Use SEQ_CST accesses when touching `wakeup` | Ben Gamari | 2021-01-09 | 2 files | -3/+3

  These are the two remaining non-atomic accesses to `wakeup` which were
  missed by the original TSAN patch.
* rts: stats: Some fixes to stats for sequential gcs | Douglas Wilson | 2021-01-09 | 1 file | -10/+25

  Solves #19147. When n_capabilities > 1 we were not correctly
  accounting for gc time for sequential collections. In this case
  par_n_gcthreads == 1; however, it is not guaranteed that the single gc
  thread is capability 0. A similar issue for copied is addressed as
  well.
* rts/Sanity: Allow DEAD_WEAKs in weak pointer list | Ben Gamari | 2021-01-07 | 1 file | -1/+1

  The weak pointer check in `checkGenWeakPtrList` previously failed to
  account for dead weak pointers. This caused `fptr01` to fail in the
  `sanity` way.

  Fixes #19162.
* rts: Zero shrunk array slop in vanilla RTS | Ben Gamari | 2021-01-07 | 1 file | -4/+9

  But only when profiling or DEBUG are enabled.

  Fixes #17572.
* Storage: Unconditionally enable zeroing of alignment slop | Ben Gamari | 2021-01-07 | 1 file | -11/+11

  This is necessary since the user may enable `+RTS -hT` at any time.
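  A sketch of what zeroing alignment slop means (a generic illustration;
  `align_and_zero` is a hypothetical helper): the gap an aligned
  allocation leaves behind is filled with zeros so the heap stays
  parsable for the profiler.

      #include <stdint.h>
      #include <string.h>

      /* align must be a power of two. */
      static uint8_t *align_and_zero(uint8_t *hp, size_t align) {
          uintptr_t a = ((uintptr_t)hp + align - 1) & ~(uintptr_t)(align - 1);
          memset(hp, 0, (size_t)((uint8_t *)a - hp));  /* zero the slop */
          return (uint8_t *)a;
      }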
* spelling: thead -> thread | Douglas Wilson | 2020-12-23 | 1 file | -2/+2
* nonmoving: Add comments to nonmovingResurrectThreads | GHC GitLab CI | 2020-12-20 | 1 file | -0/+5
* nonmoving: Don't push objects during deadlock detect GC | Ben Gamari | 2020-12-20 | 1 file | -2/+6

  Previously we would push large objects and compact regions to the mark
  queue during the deadlock-detect GC, resulting in failure to detect
  deadlocks.
* nonmoving: Refactor alloc_for_copy | GHC GitLab CI | 2020-12-20 | 1 file | -48/+79

  Pull the cold non-moving allocation path out of alloc_for_copy.
* nonmoving: Ensure deadlock detection promotion works | GHC GitLab CI | 2020-12-20 | 1 file | -18/+22

  Previously the deadlock-detection promotion logic in alloc_for_copy
  was just plain wrong: it failed to fire when gct->evac_gen_no !=
  oldest_gen->gen_no. The fix is simple: move the