path: root/rts/sm
Commit message | Author | Age | Files | Lines
...
* nonmoving: Refactor update remembered set initialization | Ben Gamari | 2022-12-23 | 5 | -34/+66
  This avoids a lock inversion between the storage manager mutex and the stable pointer table mutex by not dropping the SM_MUTEX in nonmovingCollect. This requires quite a bit of rejiggering, but it does seem like a better strategy.
* nonmoving: Make segment state updates atomic | Ben Gamari | 2022-12-23 | 1 | -1/+1
* nonmoving: Fix races in collector status tracking | Ben Gamari | 2022-12-23 | 2 | -7/+10
  Mark a number of accesses to do with tracking of the status of the concurrent collection thread as atomic. No interesting races here, merely necessary to satisfy TSAN.
* nonmoving: Ensure that mutable fields have acquire barrier | Ben Gamari | 2022-12-23 | 1 | -8/+16
* nonmoving: Eliminate race in bump_static_flag | Ben Gamari | 2022-12-23 | 1 | -8/+10
  To ensure that we don't race with a mutator entering a new CAF, we take the SM mutex before touching static_flag. The other option here would be to instead modify newCAF to use a CAS, but the present approach is a bit safer.
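A minimal sketch of the locking discipline described above, assuming pthreads and a two-valued flag. `sm_mutex`, `static_flag`, `bump_static_flag`, `new_caf` and `struct caf` here are hypothetical stand-ins, not the RTS definitions; the point is only that the collector's flip and the mutator's read-then-use of the flag are serialized by the same lock.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical two-valued flag that the collector flips between GCs. */
enum { STATIC_FLAG_A = 1, STATIC_FLAG_B = 2 };

static pthread_mutex_t sm_mutex = PTHREAD_MUTEX_INITIALIZER;
static int static_flag = STATIC_FLAG_A;

/* Hypothetical stand-in for a CAF that records the flag it was created under. */
struct caf { int flag; };

/* Collector side: flip the flag only while holding the storage-manager lock. */
void bump_static_flag(void) {
    pthread_mutex_lock(&sm_mutex);
    static_flag = (static_flag == STATIC_FLAG_A) ? STATIC_FLAG_B : STATIC_FLAG_A;
    pthread_mutex_unlock(&sm_mutex);
}

/* Mutator side: read the flag and store it into the CAF under the same lock,
 * so a concurrent bump cannot land between the read and the use. */
void new_caf(struct caf *c) {
    pthread_mutex_lock(&sm_mutex);
    c->flag = static_flag;
    pthread_mutex_unlock(&sm_mutex);
    assert(c->flag == STATIC_FLAG_A || c->flag == STATIC_FLAG_B);
}
```

The CAS alternative mentioned in the commit would instead make the mutator's update a single compare-and-swap; the lock version trades some contention for simpler reasoning.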
* nonmoving: Use atomic when looking at bd->gen | Ben Gamari | 2022-12-23 | 1 | -1/+4
  This is necessary since bd->gen may have been mutated by a moving GC.
* nonmoving: Fix segment list races | Ben Gamari | 2022-12-23 | 2 | -6/+12
* nonmoving: Fix race in marking of blackholes | Ben Gamari | 2022-12-23 | 1 | -2/+6
  We must use an acquire-fence when marking to ensure that the indirectee is visible.
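The release/acquire pairing this fix relies on can be sketched with C11 atomics. This is a generic illustration under assumed names (`struct closure`, `publish_blackhole`, `marked_indirectee` and the `*_info` sentinels are hypothetical), not the RTS's closure layout: the writer stores the indirectee and then publishes the info pointer with release ordering; the marker does a relaxed load of the info pointer followed by an acquire fence before reading the indirectee.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical closure: an atomically published info pointer plus a plain payload field. */
struct closure {
    _Atomic(const void *) info;
    void *indirectee;
};

/* Stand-ins for info tables; only their addresses matter here. */
static const char BLACKHOLE_info = 'B';
static const char OTHER_info = 'O';

void init_closure(struct closure *c) {
    atomic_init(&c->info, (const void *)&OTHER_info);
    c->indirectee = NULL;
}

/* Writer: fill in the indirectee, then publish the info pointer with release ordering. */
void publish_blackhole(struct closure *c, void *owner) {
    assert(owner != NULL);
    c->indirectee = owner;
    atomic_store_explicit(&c->info, (const void *)&BLACKHOLE_info, memory_order_release);
}

/* Marker: relaxed load of the info pointer, then an acquire fence. If the load
 * observed the writer's release store, the fence guarantees that the following
 * read of indirectee sees the value written before that store. */
void *marked_indirectee(struct closure *c) {
    const void *info = atomic_load_explicit(&c->info, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    if (info != (const void *)&BLACKHOLE_info)
        return NULL;
    return c->indirectee;
}
```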
* rts: Drop racy assertion | Ben Gamari | 2022-12-18 | 1 | -0/+3
  0e274c39bf836d5bb846f5fa08649c75f85326ac added an assertion in `dirty_MUT_VAR` checking that the MUT_VAR being dirtied was clean. However, this isn't necessarily the case since another thread may have raced us to dirty the object.
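One way to make a clean-to-dirty transition tolerate that race is sketched below with a C11 compare-and-swap. This is an illustrative alternative, not the RTS's `dirty_MUT_VAR` (the commit only drops the assertion); `struct mut_var`, `dirty_mut_var` and `remembered` are hypothetical names. The key point is that finding the object already dirty is a legitimate outcome, not an error.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical header states for a mutable variable. */
enum { MUT_VAR_CLEAN = 0, MUT_VAR_DIRTY = 1 };

struct mut_var { atomic_int state; };

/* Stand-in for "number of remembered-set entries recorded". */
static atomic_int remembered = 0;

/* Only the thread that wins the CAS records the object. A thread that loses
 * the race sees the object already dirty, which must not be treated as a bug. */
bool dirty_mut_var(struct mut_var *mv) {
    int expected = MUT_VAR_CLEAN;
    if (atomic_compare_exchange_strong(&mv->state, &expected, MUT_VAR_DIRTY)) {
        atomic_fetch_add(&remembered, 1);
        return true;
    }
    /* On failure, expected now holds the value another thread wrote. */
    assert(expected == MUT_VAR_DIRTY);
    return false;
}
```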
* rts: Style fix | Ben Gamari | 2022-12-16 | 1 | -6/+3
* rts: Encapsulate sched_state | Ben Gamari | 2022-12-16 | 1 | -3/+3
* rts: Encapsulate access to capabilities array | Ben Gamari | 2022-12-16 | 8 | -64/+64
* rts: Introduce getNumCapabilities | Ben Gamari | 2022-12-16 | 9 | -68/+68
  And ensure accesses to n_capabilities are atomic (although with relaxed ordering). This is necessary as RTS API callers may concurrently call into the RTS without holding a capability.
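A minimal sketch of a relaxed atomic accessor in the spirit of this change. The name `getNumCapabilities` mirrors the commit, but the body, the `n_capabilities` variable shown here and the hypothetical writer `store_num_capabilities` are illustrations under assumed types, not the RTS source.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the RTS's global capability count. */
static atomic_uint n_capabilities = 1;

/* Reader: may be called without holding a capability, so the load must be
 * atomic. Relaxed ordering yields an untorn value without imposing any
 * ordering on surrounding memory accesses. */
unsigned int getNumCapabilities(void) {
    return atomic_load_explicit(&n_capabilities, memory_order_relaxed);
}

/* Hypothetical writer used when the count changes. */
void store_num_capabilities(unsigned int n) {
    assert(n > 0);
    atomic_store_explicit(&n_capabilities, n, memory_order_relaxed);
}
```

Relaxed ordering is enough when callers only need the number itself; any code that must also see data published alongside a new count would need stronger ordering.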
* Improve heap memory barrier Note | Ben Gamari | 2022-12-16 | 1 | -2/+2
  Also introduce a MUT_FIELD marker in Closures.h to document mutable fields.
* Remove the now-unused markScheduler | Duncan Coutts | 2022-11-22 | 3 | -7/+0
  The global vars {blocked,sleeping}_queue are now in the Capability and so get marked there via markCapabilityIOManager.
* rts: make flushExec a no-op on wasm32 | Cheng Shao | 2022-11-11 | 1 | -0/+1
  This patch makes flushExec a no-op on wasm32, since there's no such thing as executable memory on wasm32 in the first place.
* rts: don't return memory to OS on wasm32 | Cheng Shao | 2022-11-11 | 2 | -0/+8
  This patch makes the storage manager not return any memory on wasm32. The detailed reason is described in Note [Megablock allocator on wasm].
* rts: ensure we are below maxHeapSize after returning megablocks | Teo Camarasu | 2022-10-15 | 1 | -0/+7
  When the heap is heavily block fragmented, the live byte size might be low while the memory usage is high. We want to ensure that heap overflow triggers in these cases. We do so by checking that we can return enough megablocks to get under maxHeapSize at the end of GC.
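The check can be sketched as a pure predicate over megablock counts. The function name and parameters are hypothetical, and treating a limit of 0 as "no limit" is an assumption of this sketch; the idea is that overflow is judged on what remains allocated after returning every free megablock, not on the live byte count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical end-of-GC check, all quantities in megablocks.
 *   mblocks_allocated:  megablocks currently obtained from the OS
 *   mblocks_returnable: free megablocks that could be handed back
 *   max_heap_mblocks:   the heap limit; 0 is assumed to mean "no limit" */
bool heap_overflow_after_return(size_t mblocks_allocated,
                                size_t mblocks_returnable,
                                size_t max_heap_mblocks) {
    assert(mblocks_returnable <= mblocks_allocated);
    if (max_heap_mblocks == 0)
        return false;
    /* A low live-byte count is still an overflow if fragmentation keeps us
     * above the limit after returning everything we can. */
    return mblocks_allocated - mblocks_returnable > max_heap_mblocks;
}
```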
* Refactor IPE initialization | Ben Gamari | 2022-10-11 | 1 | -6/+5
  Here we refactor the representation of info table provenance information in object code to significantly reduce its size and link-time impact. Specifically, we deduplicate strings and represent them as 32-bit offsets into a common string table.
  In addition, we rework the registration logic to eliminate allocation from the registration path, which is run from a static initializer where things like allocation are technically undefined behavior (although it did previously seem to work). For similar reasons we eliminate lock usage from the registration path, instead relying on atomic CAS.
  Closes #22077.
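Lock-free, allocation-free registration via CAS can be sketched as a push onto an intrusive singly linked list (a Treiber-style stack). `struct ipe_node`, `ipe_list`, `register_node` and `count_nodes` are hypothetical names; nodes are assumed to be statically allocated by the caller, so registration itself never allocates. The string-table deduplication half of the commit is not shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical registration node, assumed to be statically allocated. */
struct ipe_node {
    struct ipe_node *next;
    const char *name;
};

static _Atomic(struct ipe_node *) ipe_list = NULL;

/* Push without locks or allocation: link the node to the current head, then
 * CAS the head; if another registrant won, `old` is refreshed and we retry. */
void register_node(struct ipe_node *node) {
    assert(node != NULL);
    struct ipe_node *old = atomic_load_explicit(&ipe_list, memory_order_relaxed);
    do {
        node->next = old;
    } while (!atomic_compare_exchange_weak_explicit(
                 &ipe_list, &old, node,
                 memory_order_release, memory_order_relaxed));
}

/* Reader: acquire-load the head, then walk the list. */
size_t count_nodes(void) {
    size_t n = 0;
    for (struct ipe_node *p = atomic_load_explicit(&ipe_list, memory_order_acquire);
         p != NULL; p = p->next)
        n++;
    return n;
}
```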
* Add native delimited continuations to the RTS | Alexis King | 2022-09-11 | 7 | -1/+64
  This patch implements GHC proposal 313, "Delimited continuation primops", by adding native support for delimited continuations to the GHC RTS. All things considered, the patch is relatively small. It almost exclusively consists of changes to the RTS; the compiler itself is essentially unaffected. The primops come with fairly extensive Haddock documentation, and an overview of the implementation strategy is given in the Notes in rts/Continuation.c.
  This first stab at the implementation prioritizes simplicity over performance. Most notably, every continuation is always stored as a single, contiguous chunk of stack. If one of these chunks is particularly large, it can result in poor performance, as the current implementation does not attempt to cleverly squeeze a subset of the stack frames into the existing stack: it must fit all at once. If this proves to be a performance issue in practice, a cleverer strategy would be a worthwhile target for future improvements.
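The "single contiguous chunk" strategy can be illustrated with a toy word stack. Everything here (`struct stack`, `struct cont`, `capture`, `restore`, the upward-growing layout) is a hypothetical model, not the layout in rts/Continuation.c: capture copies every frame above the prompt into one buffer, and restore succeeds only if the whole buffer fits in the remaining space at once. Where the chunk does not fit, this sketch simply reports failure.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned long word;

#define STACK_WORDS 64

/* Hypothetical upward-growing stack of words; top is the next free slot. */
struct stack { word slots[STACK_WORDS]; size_t top; };

/* A captured continuation: one contiguous copy of the frames. */
struct cont { size_t n; word *frames; };

/* Capture everything above the prompt (index `prompt`) as a single chunk,
 * then unwind the stack back to the prompt. Returns 0 on success. */
int capture(struct stack *s, size_t prompt, struct cont *k) {
    assert(prompt <= s->top);
    k->n = s->top - prompt;
    k->frames = NULL;
    if (k->n > 0) {
        k->frames = malloc(k->n * sizeof(word));
        if (k->frames == NULL)
            return -1;
        memcpy(k->frames, &s->slots[prompt], k->n * sizeof(word));
    }
    s->top = prompt;
    return 0;
}

/* Restore the chunk onto the stack. The whole chunk must fit in the
 * remaining space at once; this sketch never splits it. */
int restore(struct stack *s, const struct cont *k) {
    if (STACK_WORDS - s->top < k->n)
        return -1;
    if (k->n > 0)
        memcpy(&s->slots[s->top], k->frames, k->n * sizeof(word));
    s->top += k->n;
    return 0;
}
```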
* rts: Move thread labels into TSO | Ben Gamari | 2022-08-06 | 4 | -0/+16
  This eliminates the thread label HashTable and instead tracks this information in the TSO, allowing us to use proper StgArrBytes arrays for backing the label and greatly simplifying management of object lifetimes when we expose them to the user with the coming `threadLabel#` primop.
* rts/nonmoving: Don't scavenge objects which weren't evacuated | Ben Gamari | 2022-07-25 | 3 | -5/+93
  This fixes a rather subtle bug in the logic responsible for scavenging objects evacuated to the non-moving generation. In particular, objects can be allocated into the non-moving generation in two ways:
    a. evacuation out of from-space by the garbage collector
    b. direct allocation by the mutator
  Like all evacuation, objects moved by (a) must be scavenged, since they may contain references to other objects located in from-space. To accomplish this we have the following scheme:
    * each nonmoving segment's block descriptor has a scan pointer which points to the first object which has yet to be scavenged
    * the GC tracks a set of "todo" segments which have pending scavenging work
    * to scavenge a segment, we scavenge each of the unmarked blocks between the scan pointer and the segment's `next_free` pointer. We skip marked blocks since we know the allocator wouldn't have allocated into marked blocks (since they contain presumably live data). We can stop at `next_free` since, by definition, the GC could not have evacuated any objects to blocks above `next_free` (otherwise `next_free` wouldn't be the first free block).
  However, this neglected to consider objects allocated by path (b). In short, the problem is that objects directly allocated by the mutator may become unreachable (but not swept, since the containing segment is not yet full), at which point they may contain references to swept objects. Specifically, we observed this in #21885 in the following way:
    1. the mutator (specifically in #21885, a `lockCAF`) allocates an object (specifically a blackhole, which here we will call `blkh`; see Note [Static objects under the nonmoving collector] for the reason why) on the non-moving heap. The bitmap of the allocated block remains 0 (since allocation doesn't affect the bitmap) and the containing segment's (which we will call `blkh_seg`) `next_free` is advanced.
    2. we enter the blackhole, evaluating the blackhole to produce a result (specifically a cons cell) in the nursery
    3. the blackhole gets updated into an indirection pointing to the cons cell; it is pushed to the generational remembered set
    4. we perform a GC; the cons cell is evacuated into the nonmoving heap (into segment `cons_seg`)
    5. the cons cell is marked
    6. the GC concludes
    7. the CAF and blackhole become unreachable
    8. `cons_seg` is filled
    9. we start another GC; the cons cell is swept
    10. we start a new GC
    11. something is evacuated into `blkh_seg`, adding it to the "todo" list
    12. we attempt to scavenge `blkh_seg` (namely, all unmarked blocks between `scan` and `next_free`, which includes `blkh`). We attempt to evacuate `blkh`'s indirectee, which is the previously-swept cons cell. This is unsafe, since the indirectee is no longer a valid heap object.
  The problem here was that the scavenging logic *assumed* that (a) was the only source of allocations into the non-moving heap and therefore *all* unmarked blocks between `scan` and `next_free` were evacuated. However, due to (b) this is not true.
  The solution is to ensure that the scanned region only encompasses the region of objects allocated during evacuation. We do this by updating `scan` as we push the segment to the todo-segment list to point to the block which was evacuated into.
  Doing this required changing the nonmoving scavenging implementation's update of the `scan` pointer to bump it *once*, instead of after scavenging each block as was done previously. This is because we may end up evacuating into the segment being scavenged as we scavenge it.
  This was quite tricky to discover but the result is quite simple, demonstrating yet again that global mutable state should be used exceedingly sparingly.
  Fixes #21885
  (cherry picked from commit 0b27ea23efcb08639309293faf13fdfef03f1060)
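The fixed scheme can be sketched on a deliberately simplified segment model. All names are hypothetical, and `next_free` is modelled as a plain bump index (the real allocator also reuses free blocks below it). The sketch shows the two key points: `scan` is set when the segment is first pushed to the todo list, so earlier mutator-allocated blocks are excluded, and `scan` is bumped once per pass rather than per block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SEG_BLOCKS 16

/* Hypothetical, simplified nonmoving segment. */
struct segment {
    bool marked[SEG_BLOCKS];     /* mark bitmap */
    bool scavenged[SEG_BLOCKS];  /* which blocks we scavenged, for illustration */
    size_t next_free;            /* first free block (bump index in this model) */
    size_t scan;                 /* first block still to be scavenged */
    bool on_todo;                /* already on the todo list? */
};

/* Path (b): mutator allocation advances next_free but never touches scan. */
size_t mutator_alloc(struct segment *seg) {
    assert(seg->next_free < SEG_BLOCKS);
    return seg->next_free++;
}

/* Path (a): GC evacuation. When the segment is first pushed to the todo list,
 * point scan at the block just evacuated into. */
size_t gc_evacuate(struct segment *seg) {
    assert(seg->next_free < SEG_BLOCKS);
    size_t blk = seg->next_free++;
    if (!seg->on_todo) {
        seg->scan = blk;
        seg->on_todo = true;
    }
    return blk;
}

/* Scavenge unmarked blocks in [scan, next_free), bumping scan once per pass;
 * repeat if scavenging evacuated more objects into this same segment. */
void scavenge_segment(struct segment *seg) {
    while (seg->scan < seg->next_free) {
        size_t end = seg->next_free;
        for (size_t i = seg->scan; i < end; i++)
            if (!seg->marked[i])
                seg->scavenged[i] = true;
        seg->scan = end;
    }
    seg->on_todo = false;
}
```

In the usage below, two mutator-allocated blocks precede the evacuated one, and only the evacuated block is scavenged.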
* rts/nonmoving: Track segment state | Ben Gamari | 2022-07-25 | 2 | -1/+28
  It can often be useful during debugging to be able to determine the state of a nonmoving segment. Introduce some state, enabled by DEBUG, to track this.
  (cherry picked from commit 40e797ef591ae3122ccc98ab0cc3cfcf9d17bd7f)
* Allow running memInventory when the concurrent nonmoving gc is enabled | Teo Camarasu | 2022-07-18 | 2 | -5/+14
  If the nonmoving gc is enabled and we are using a threaded RTS, we now try to grab the collector mutex to avoid memInventory and the collection racing. Previously, memInventory was disabled in this configuration.
* rts: gc stats: account properly for copied bytes in sequential collections | Douglas Wilson | 2022-07-01 | 1 | -0/+7
  We were not updating the [copied,any_work,scav_find_work, max_n_todo_overflow] counters during sequential collections. As well, we were double counting for parallel collections. To fix this we add an `else` clause to the `if (is_par_gc())`. The par_* counters do not need to be updated in the sequential case because they must be 0.
* Transcribe discussion from #21483 into a Note | Matthew Pickering | 2022-06-22 | 1 | -0/+78
  In #21483 I had a discussion with Simon Marlow about the memory retention behaviour of -Fd. I have just transcribed that conversation here as it elucidates the potentially subtle assumptions which led to the design of the memory retention behaviours of -Fd.
  Fixes #21483
* typos | Eric Lindblad | 2022-06-01 | 2 | -3/+3
* codeGen: Ensure that static datacon apps are included in SRTs | Ben Gamari | 2022-05-17 | 1 | -0/+6
  When generating an SRT for a recursive group, GHC.Cmm.Info.Build.oneSRT filters out recursive references, as described in Note [recursive SRTs]. However, doing so for static functions would be unsound, for the reason described in Note [Invalid optimisation: shortcutting]. As we discovered in #20959, the same argument applies to static data constructor applications. Fix this by ensuring that static data constructor applications are included in recursive SRTs.
  The approach here is not entirely satisfactory, but it is a starting point.
  Fixes #20959.
* rts: Drop setExecutable | Ben Gamari | 2022-05-11 | 1 | -1/+0
  Since f6e366c058b136f0789a42222b8189510a3693d1 setExecutable has been dead code. Drop it.
* rts: state explicitly what evacuate and scavenge mean in the copying gc | Adam Sandberg Ericsson | 2022-04-27 | 2 | -1/+9
* Add note about inefficiency in returnMemoryToOS | Fabian Thorand | 2022-04-27 | 1 | -0/+8
* Defer freeing of mega block groups | Fabian Thorand | 2022-04-27 | 3 | -35/+245
  Solves the quadratic worst case performance of freeing megablocks that was described in issue #19897.
  During GC runs, we now keep a secondary free list for megablocks that is neither sorted nor coalesced. That way, free becomes an O(1) operation at the expense of not being able to reuse memory for larger allocations. At the end of a GC run, the secondary free list is sorted and then merged into the actual free list in a single pass. That way, our worst case performance is O(n log(n)) rather than O(n^2).
  We postulate that temporarily losing coalescence during a single GC run won't have any adverse effects in practice because:
    - We would need to release enough memory during the GC, and then after that (but within the same GC run) allocate a megablock group of more than one megablock. This seems unlikely, as large objects are not copied during GC, and so we shouldn't need such large allocations during a GC run.
    - Allocations of megablock groups of more than one megablock are rare. They only happen when a single heap object is large enough to require that amount of space. Any allocation areas that are supposed to hold more than one heap object cannot use megablock groups, because only the first megablock of a megablock group has valid `bdescr`s. Thus, heap objects can only start in the first megablock of a group, not in later ones.
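The deferred-free scheme can be sketched with free runs stored in fixed arrays. The names (`struct run`, `defer_free`, `commit_deferred`, `MAX_RUNS`) are hypothetical and the RTS uses linked block descriptors rather than arrays, but the complexity argument is the same: appending during GC is O(1), and at the end one O(k log k) sort plus a single linear merge pass rebuilds a sorted, coalesced free list.

```c
#include <assert.h>
#include <stdlib.h>
#include <stddef.h>

/* A free run of megablocks [start, start + n), in megablock units (hypothetical model). */
struct run { size_t start, n; };

#define MAX_RUNS 64

struct free_lists {
    struct run main[MAX_RUNS];      /* sorted by start and coalesced */
    size_t n_main;
    struct run deferred[MAX_RUNS];  /* unsorted and uncoalesced */
    size_t n_deferred;
};

/* During GC: O(1) append, no sorting or coalescing. */
void defer_free(struct free_lists *fl, size_t start, size_t n) {
    assert(fl->n_deferred < MAX_RUNS);
    fl->deferred[fl->n_deferred++] = (struct run){ start, n };
}

static int cmp_run(const void *a, const void *b) {
    const struct run *x = a, *y = b;
    return (x->start > y->start) - (x->start < y->start);
}

/* Append r to out, merging it into the previous run if they are adjacent. */
static void push_coalesced(struct run *out, size_t *n_out, struct run r) {
    if (*n_out > 0 && out[*n_out - 1].start + out[*n_out - 1].n == r.start)
        out[*n_out - 1].n += r.n;
    else
        out[(*n_out)++] = r;
}

/* End of GC: sort the deferred runs, then merge both sorted lists in one pass. */
void commit_deferred(struct free_lists *fl) {
    struct run merged[2 * MAX_RUNS];
    size_t n = 0, i = 0, j = 0;
    qsort(fl->deferred, fl->n_deferred, sizeof(struct run), cmp_run);
    while (i < fl->n_main || j < fl->n_deferred) {
        struct run next;
        if (j == fl->n_deferred ||
            (i < fl->n_main && fl->main[i].start < fl->deferred[j].start))
            next = fl->main[i++];
        else
            next = fl->deferred[j++];
        push_coalesced(merged, &n, next);
    }
    assert(n <= MAX_RUNS);
    for (size_t k = 0; k < n; k++)
        fl->main[k] = merged[k];
    fl->n_main = n;
    fl->n_deferred = 0;
}
```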
* rts: Fix various #include issues | Ben Gamari | 2022-04-06 | 3 | -5/+6
  This fixes various violations of the newly-added RTS includes linter.
* rts: Don't mark object code in markCAFs unless necessary | Ben Gamari | 2022-03-23 | 1 | -2/+4
  Previously `markCAFs` would call `markObjectCode` even in non-major GCs. This is problematic since `prepareUnloadCheck` is not called in such GCs, meaning that the section index has not been updated.
  Fixes #21254
* rts: Untag function field in scavenge_PAP_payload | Ben Gamari | 2022-03-23 | 1 | -1/+2
  Previously we failed to untag the function closure when scavenging the payload of a PAP, resulting in an invalid closure pointer being passed to scavenge_large_bitmap and consequently #21254. Fix this.
  Fixes #21254
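Pointer tagging stores a small tag in the low bits of an aligned closure pointer, so a tagged pointer must be untagged before it is dereferenced or handed to code that expects a real closure address. The RTS provides an `UNTAG_CLOSURE` macro for this; the sketch below is an independent illustration that assumes 3 tag bits, as on 64-bit platforms, and uses hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 3 tag bits, i.e. closures are at least 8-byte aligned. */
#define TAG_BITS 3
#define TAG_MASK (((uintptr_t)1 << TAG_BITS) - 1)

/* Clear the tag bits to recover the real closure address. */
static inline void *untag(void *p) {
    return (void *)((uintptr_t)p & ~TAG_MASK);
}

/* Extract the tag stored in the low bits. */
static inline uintptr_t get_tag(const void *p) {
    return (uintptr_t)p & TAG_MASK;
}
```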
* rts: Address failures to inline | Douglas Wilson | 2022-02-02 | 3 | -11/+25
* Fix a few Note inconsistencies | Ben Gamari | 2022-02-01 | 12 | -29/+19
* rts: Rip out SPARC support | Ben Gamari | 2022-01-29 | 1 | -20/+0
* rts/winio: Fix #18382 | Ben Gamari | 2022-01-18 | 3 | -3/+0
  Here we refactor WinIO's IO completion scheme, squashing a memory leak and fixing #18382.
  To fix #18382 we drop the special thread status introduced for IoPort blocking, BlockedOnIoCompletion, as well as drop the non-threaded RTS's special dead-lock detection logic (which is redundant to the GC's deadlock detection logic), as proposed in #20947.
  Previously WinIO relied on foreign import ccall "wrapper" to create an adjustor thunk which can be attached to the OVERLAPPED structure passed to the operating system. It would then use foreign import ccall "dynamic" to back out the original continuation from the adjustor. This roundtrip is significantly more expensive than the alternative, using a StablePtr. Furthermore, the implementation let the adjustor leak, meaning that every IO request would leak a page of memory.
  Fixes T18382.
* rts: correct stats when running with +RTS -qn1 | Douglas Wilson | 2021-12-12 | 1 | -28/+42
  Despite the documented care having been taken, several bugs are fixed here. When run with -qn1, when a SYNC_GC_PAR is requested we will have
    n_gc_threads == n_capabilities && n_gc_idle_threads == (n_gc_threads - 1)
  In this case we now:
    * Don't increment par_collections
    * Don't increment par_balanced_copied
    * Don't emit debug traces for idle threads
    * Take the fast path in scavenge_until_all_done, wakeup_gc_threads, and shutdown_gc_threads.
  Some ASSERTs have also been tightened.
  Fixes #19685
* Require all dirty_MUT_VAR callers to do explicit stg_MUT_VAR_CLEAN_info comparison (#20088) | nineonine | 2021-12-02 | 1 | -7/+9
* rts: Ensure that markCAFs marks object code | Ben Gamari | 2021-11-20 | 1 | -4/+11
  Previously `markCAFs` would only evacuate CAFs' indirectees. This would allow reachable object code to be unloaded by the linker as `evacuate` may never be called on the CAF itself, despite it being reachable via the `{dyn,revertible}_caf_list`s. To fix this we teach `markCAFs` to explicitly call `markObjectCode`, ensuring that the linker is aware of objects reachable via the CAF lists.
  Fixes #20649.
* rts/nonmoving: Enable selector optimisation by default | Ben Gamari | 2021-10-12 | 1 | -5/+1
* rts/nonmoving: Rename mark_* to trace_* | Ben Gamari | 2021-10-12 | 1 | -42/+42
  These functions really do no marking; they merely trace pointers.
* nonmoving: Fix and factor out mark_trec_chunk | Ben Gamari | 2021-10-12 | 1 | -22/+17
  We need to ensure that the TRecChunk itself is marked, in addition to the TRecs it contains.
* fix non-moving gc heap space requirements estimate | Teo Camarasu | 2021-10-07 | 1 | -1/+1
  The space requirements of the non-moving gc are comparable to the compacting gc, not the copying gc. The copying gc requires a much larger overhead.
  Fixes #20475
* Corrected types of thread ids obtained from the RTS | Mann mit Hut | 2021-10-06 | 1 | -1/+1
  Although the thread ids had been changed to 64-bit words in e57b7cc6d8b1222e0939d19c265b51d2c3c2b4c0, the return type of the foreign import function used to retrieve these ids, namely 'GHC.Conc.Sync.getThreadId', was never updated accordingly. To fix this, the function now returns a 'CUULong'.
  In addition, the types used in the thread labeling subsystem were adjusted as well, and several format strings were modified throughout the whole RTS to display thread ids in a consistent and correct way.
  Fixes #16761
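Printing a 64-bit id portably means pairing a fixed-width type with the matching `<inttypes.h>` conversion macro, rather than `%u` or `%lu`, whose widths vary by platform. The `thread_id` typedef and `format_thread_id` helper below are hypothetical, not the RTS's own declarations.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical thread-id type matching the 64-bit width described above. */
typedef uint64_t thread_id;

/* PRIu64 expands to the correct conversion for uint64_t on every platform. */
int format_thread_id(char *buf, size_t len, thread_id id) {
    int n = snprintf(buf, len, "thread %" PRIu64, id);
    assert(n >= 0);
    return n;
}
```

For example, an id of 4294967297 (one more than 2^32) formats as `thread 4294967297`, a value that a 32-bit conversion would have truncated.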
* rts: Add missing write barriers in MVar wake-up paths | Ben Gamari | 2021-10-02 | 1 | -0/+4
  Previously PerformPut failed to respect the non-moving collector's snapshot invariant, hiding references to an MVar and its new value by overwriting a stack frame without dirtying the stack. Fix this.
  PerformTake exhibited a similar bug, failing to dirty (and therefore mark) the blocked stack before mutating it.
  Closes #20399.
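The snapshot invariant is maintained by a snapshot-at-the-beginning (SATB) pre-write barrier: while concurrent marking is active, the value about to be overwritten is recorded so the marker still traces everything that was reachable at the snapshot. The per-slot sketch below uses hypothetical names (`struct upd_rem_set`, `marking_active`, `write_ref`); for stacks the commit describes the barrier in terms of dirtying the whole stack instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define UPD_REM_SET_CAP 128

/* Hypothetical update remembered set: overwritten values the concurrent
 * marker must still trace. */
struct upd_rem_set { void *entries[UPD_REM_SET_CAP]; size_t n; };

static bool marking_active = false;

/* Pre-write barrier: before overwriting a reference during marking, record
 * the old value so it cannot be hidden from the marker. */
void write_ref(struct upd_rem_set *rs, void **slot, void *new_value) {
    if (marking_active && *slot != NULL) {
        assert(rs->n < UPD_REM_SET_CAP);
        rs->entries[rs->n++] = *slot;
    }
    *slot = new_value;
}
```

Skipping this barrier, as the fixed bug did by overwriting a frame without dirtying the stack, lets the old reference vanish before the marker sees it.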
* Remove special case for large objects in allocateForCompact | Fabian Thorand | 2021-09-29 | 1 | -11/+0
  allocateForCompact() is called when the current allocation for the compact region does not fit in the nursery. It previously had a special case for objects exceeding the large object threshold. In that case, it would allocate a new compact region block just for that object. That led to a lot of small blocks being allocated in compact regions with a larger default block size (`autoBlockW`).
  This commit removes this special case because having a lot of small compact region blocks contributes significantly to memory fragmentation. The removal should be valid because
    - a more generic case for allocating a new compact region block follows at the end of allocateForCompact(), and that one takes `autoBlockW` into account
    - the reason for allocating separate blocks for large objects in the main heap seems to be to avoid copying during GCs, but once inside the compact region, the object will never be copied anyway.
  Fixes #18757. A regression test T18757 was added.
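The effect on block sizing can be contrasted with two hypothetical sizing rules, both in words and ignoring rounding to the block size. Neither function is the RTS code: `old_compact_block_words` models the removed special case (a block of exactly the object's size for large objects), and `new_compact_block_words` models the generic path that always respects `autoBlockW`.

```c
#include <assert.h>
#include <stddef.h>

/* Removed rule (hypothetical model): large objects got a block sized just for them. */
size_t old_compact_block_words(size_t needed, size_t auto_block_w, size_t large_threshold) {
    assert(needed > 0);
    if (needed >= large_threshold)
        return needed;
    return needed > auto_block_w ? needed : auto_block_w;
}

/* Remaining generic rule (hypothetical model): a new block is at least autoBlockW words. */
size_t new_compact_block_words(size_t needed, size_t auto_block_w) {
    assert(needed > 0);
    return needed > auto_block_w ? needed : auto_block_w;
}
```

With a large default block size, the old rule produces many small blocks for moderately large objects, which is the fragmentation the commit describes; the new rule folds such objects into full-size blocks.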
* Move `/includes` to `/rts/include`, sort per package better | John Ericson | 2021-08-09 | 1 | -1/+1
  In order to make the packages in this repo "reinstallable", we need to associate source code with a specific package. Having a top level `/includes` dir that mixes concerns (which packages' includes?) gets in the way of this.
  To start, I have moved everything to `rts/`, which is mostly correct. There are a few things, however, that really don't belong in the rts (like the generated constants haskell type, `CodeGen.Platform.h`). Those needed to be manually adjusted.
  Things of note:
    - No symlinking, for the sake of Windows, so we hard-link at configure time.
    - `CodeGen.Platform.h` no longer has a `.hs` extension (in addition to being moved to `compiler/`) so as not to confuse anyone, since it is next to Haskell files.
    - Blanket `-Iincludes` is gone in both build systems; include paths now more strictly respect per-package dependencies.
    - `deriveConstants` has been taught to not require a `--target-os` flag when generating the platform-agnostic Haskell type. Make takes advantage of this, but Hadrian has yet to.