path: root/rts/PrimOps.cmm
Commit message (Author, Date, Files, Lines -/+)
* rts: fix missing dirty_MVAR argument in stg_writeIOPortzh (Cheng Shao, 2022-09-12, 1 file, -1/+1)
* rts: Ensure that Array# card arrays are initialized (Ben Gamari, 2022-08-08, 1 file, -0/+5)
  In #19143 I noticed that newArray# failed to initialize the card table of newly-allocated arrays. However, embarrassingly, I then only fixed the issue in newArrayArray# and, in so doing, introduced the potential for an integer underflow on zero-length arrays (#21962).

  Here I fix the issue in newArray#, this time ensuring that we do not underflow in pathological cases.

  Fixes #19143.
* rts: remove redundant stg_traceCcszh (Cheng Shao, 2022-08-08, 1 file, -15/+0)
  This out-of-line primop has no Haskell wrapper and hasn't been used anywhere in the tree. Furthermore, the code gets in the way of !7632, so it should be garbage collected.
* Add a primop to query the label of a thread (Ben Gamari, 2022-08-06, 1 file, -0/+11)
* Add primop to list threads (Ben Gamari, 2022-08-06, 1 file, -0/+8)
  A user came to #ghc yesterday wondering how best to check whether they were leaking threads. We ended up using the eventlog but it seems to me like it would be generally useful if Haskell programs could query their own threads.
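  As an illustration of the use case these two commits serve, a minimal sketch against the interface the primops were later exposed through in base's GHC.Conc.Sync (not code from the commits themselves):

  ```haskell
  import Control.Monad (forM_)
  import GHC.Conc.Sync (listThreads, threadLabel)

  -- Dump every live thread together with its label, if it has one.
  main :: IO ()
  main = do
    tids <- listThreads
    forM_ tids $ \tid -> do
      lbl <- threadLabel tid
      putStrLn (show tid ++ ": " ++ maybe "<unlabelled>" id lbl)
  ```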
* rts: forkOn context switches the target capability (Douglas Wilson, 2022-07-16, 1 file, -4/+0)
  Fixes #21824
* Make keepAlive# out-of-line (Ben Gamari, 2022-07-16, 1 file, -0/+20)
  This is a naive approach to fixing the unsoundness noticed in #21708. Specifically, we remove the lowering of `keepAlive#` via CorePrep and instead turn it into an out-of-line primop.

  This is simple but inefficient (since the continuation must now be heap allocated), yet good enough for 9.4.1. We will revisit this (particularly via #16098) in a future release.

  Metric Increase: T4978 T7257 T9203
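  For context, `keepAlive#` scopes the liveness of a value over a continuation. A minimal sketch of how library code uses it (the `withAlive` wrapper is hypothetical; the primop shape follows GHC.Exts in 9.4):

  ```haskell
  {-# LANGUAGE MagicHash, UnboxedTuples #-}
  import GHC.Exts (keepAlive#)
  import GHC.IO (IO (..))

  -- Run an IO continuation while guaranteeing 'v' stays alive throughout.
  -- Unlike touch#-based code, the explicit scoping makes it hard for the
  -- simplifier to let 'v' die early.
  withAlive :: v -> IO r -> IO r
  withAlive v (IO k) = IO (\s -> keepAlive# v s k)
  ```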
* rts: allow NULL to be used as an invalid StgStablePtr (Adam Sandberg Ericsson, 2022-07-07, 1 file, -2/+8)
* Fix a few Note inconsistencies (Ben Gamari, 2022-02-01, 1 file, -5/+4)
* Levity-polymorphic arrays and mutable variables (sheaf, 2022-01-26, 1 file, -47/+0)
  This patch makes the following types levity-polymorphic in their last argument:

  - Array# a, SmallArray# a, Weak# b, StablePtr# a, StableName# a
  - MutableArray# s a, SmallMutableArray# s a, MutVar# s a, TVar# s a, MVar# s a, IOPort# s a

  The corresponding primops are also made levity-polymorphic, e.g. `newArray#`, `readArray#`, `writeMutVar#`, `writeIOPort#`, etc. Additionally, exception handling functions such as `catch#`, `raise#`, `maskAsyncExceptions#`, ... are made levity/representation-polymorphic.

  Now that Array# and MutableArray# also work with unlifted types, we can simply re-define ArrayArray# and MutableArrayArray# in terms of them. This means that ArrayArray# and MutableArrayArray# are no longer primitive types, but simply unlifted newtypes around Array# and MutableArray#.

  This completes the implementation of the Pointer Rep proposal: https://github.com/ghc-proposals/ghc-proposals/pull/203

  Fixes #20911

  Metric Increase: T12545
  Metric Decrease: T12545
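  A sketch of what the change enables (assuming GHC >= 9.4, where these primops are levity-polymorphic; not code from this patch): a `MutVar#` can now hold an unlifted value directly, with no lifted box in between.

  ```haskell
  {-# LANGUAGE MagicHash, UnboxedTuples #-}
  import GHC.Exts

  -- Store a MutableByteArray# (an unlifted type) directly in a MutVar#.
  mkRef :: State# s -> (# State# s, MutVar# s (MutableByteArray# s) #)
  mkRef s0 =
    case newByteArray# 64# s0 of
      (# s1, mba #) -> newMutVar# mba s1
  ```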
* rts/winio: Fix #18382 (Ben Gamari, 2022-01-18, 1 file, -3/+3)
  Here we refactor WinIO's IO completion scheme, squashing a memory leak and fixing #18382.

  To fix #18382 we drop the special thread status introduced for IoPort blocking, BlockedOnIoCompletion, as well as drop the non-threaded RTS's special deadlock-detection logic (which is redundant to the GC's deadlock detection logic), as proposed in #20947.

  Previously WinIO relied on foreign import ccall "wrapper" to create an adjustor thunk which can be attached to the OVERLAPPED structure passed to the operating system. It would then use foreign import ccall "dynamic" to back out the original continuation from the adjustor. This roundtrip is significantly more expensive than the alternative, using a StablePtr. Furthermore, the implementation let the adjustor leak, meaning that every IO request would leak a page of memory.

  Fixes #18382.
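  For context, a sketch of the cheaper StablePtr scheme (all names here are illustrative, not WinIO's actual internals):

  ```haskell
  import Foreign.StablePtr (StablePtr, newStablePtr, deRefStablePtr, freeStablePtr)

  -- Illustrative continuation to run when an IO request completes.
  newtype Completion = Completion (Int -> IO ())

  -- Attach: no adjustor thunk, no executable page; just a table entry
  -- whose value the GC will keep alive.
  attach :: Completion -> IO (StablePtr Completion)
  attach = newStablePtr

  -- Resume: recover the continuation and free the StablePtr exactly once,
  -- so completed requests do not leak.
  resume :: StablePtr Completion -> Int -> IO ()
  resume sp bytes = do
    Completion k <- deRefStablePtr sp
    freeStablePtr sp
    k bytes
  ```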
* rts: Add optional bounds checking in out-of-line primops (Ben Gamari, 2021-12-21, 1 file, -0/+18)
* rts/primops: Fix write barrier in stg_atomicModifyMutVarzuzh (Ben Gamari, 2021-10-12, 1 file, -4/+4)
  Previously the call to dirty_MUT_VAR in stg_atomicModifyMutVarzuzh was missing its final argument. Fixes #20414.
* rts: Unify stack dirtiness check (Ben Gamari, 2021-10-02, 1 file, -3/+3)
  This fixes an inconsistency where one dirtiness check would not mask out the STACK_DIRTY flag, meaning it may also be affected by the STACK_SANE flag.
* rts: Add missing write barriers in MVar wake-up paths (Ben Gamari, 2021-10-02, 1 file, -12/+33)
  Previously PerformPut failed to respect the non-moving collector's snapshot invariant, hiding references to an MVar and its new value by overwriting a stack frame without dirtying the stack. Fix this.

  PerformTake exhibited a similar bug, failing to dirty (and therefore mark) the blocked stack before mutating it.

  Closes #20399.
* Use Info Table Provenances to decode cloned stack (#18163) (Sven Tennie, 2021-09-23, 1 file, -11/+0)
  Emit an Info Table Provenance Entry (IPE) for every stack-represented info table if -finfo-table-map is turned on. To decode a cloned stack, lookupIPE() is used. It provides a mapping between info tables and their source location.

  Please see these notes for details:
  - [Stacktraces from Info Table Provenance Entries (IPE based stack unwinding)]
  - [Mapping Info Tables to Source Positions]

  Metric Increase: T12545
* Introduce stack snapshotting / cloning (#18741) (Sven Tennie, 2021-09-23, 1 file, -0/+11)
  Add a `StackSnapshot#` primitive type that represents a cloned stack (StgStack). The cloning interface consists of two functions that clone either the thread's own stack (cloneMyStack) or another thread's stack (cloneThreadStack).

  The stack snapshot is offline/cold, i.e. it isn't evaluated any further. This is useful for analyses as it prevents concurrent modifications.

  For technical details, please see Note [Stack Cloning].

  Co-authored-by: Ben Gamari <bgamari.foss@gmail.com>
  Co-authored-by: Matthew Pickering <matthewtpickering@gmail.com>
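  A minimal usage sketch of this interface (assuming the `GHC.Stack.CloneStack` module through which base later exposed these functions):

  ```haskell
  import GHC.Stack.CloneStack (StackSnapshot, cloneMyStack)

  main :: IO ()
  main = do
    -- Freeze a copy of this thread's own stack. The snapshot is cold:
    -- it is never evaluated further, so it can be decoded at leisure
    -- without racing against the running thread.
    snap <- cloneMyStack
    snap `seq` putStrLn "captured a StackSnapshot"
  ```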
* PrimOps: Add CAS op for all int sizes (Peter Trommler, 2021-08-02, 1 file, -0/+52)
  PPC NCG: Implement CAS inline for 32 and 64 bit
  testsuite: Add tests for smaller atomic CAS
  X86 NCG: Catch calls to CAS C fallback
  Primops: Add atomicCasWord[8|16|32|64]Addr#
  Add tests for atomicCasWord[8|16|32|64]Addr#
  Add changelog entry for new primops
  X86 NCG: Fix MO_Cmpxchg W64 on 32-bit arch
  ghc-prim: 64-bit CAS C fallback on all archs
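  A hedged usage sketch for one of the new primops (the `casWord32` wrapper is mine; the primop type follows GHC.Exts on GHC >= 9.2):

  ```haskell
  {-# LANGUAGE MagicHash, UnboxedTuples #-}
  import GHC.Exts (Addr#, atomicCasWord32Addr#)
  import GHC.IO (IO (..))
  import GHC.Word (Word32 (W32#))

  -- Compare-and-swap a 32-bit word at an address, returning the value
  -- actually observed there (equal to 'old' iff the CAS succeeded).
  casWord32 :: Addr# -> Word32 -> Word32 -> IO Word32
  casWord32 addr (W32# old) (W32# new) = IO $ \s ->
    case atomicCasWord32Addr# addr old new s of
      (# s', prev #) -> (# s', W32# prev #)
  ```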
* rts: Eliminate redundant branch (GHC GitLab CI, 2021-06-26, 1 file, -3/+1)
  Previously we branched unnecessarily on IF_NONMOVING_WRITE_BARRIER_ENABLED on every trip through the array barrier push loop.
* Add whereFrom and whereFrom# primop (Matthew Pickering, 2021-03-03, 1 file, -0/+9)
  The `whereFrom` function provides a Haskell interface for using the information created by `-finfo-table-map`. Given a Haskell value, the info table address will be passed to the `lookupIPE` function in order to attempt to find the source location information for that particular closure.

  At the moment it's not possible to distinguish the absence of the map from a failed lookup.
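  A usage sketch (hedged: the wrapper module has moved between releases; recent base exposes `whereFrom` from GHC.InfoProv):

  ```haskell
  import GHC.InfoProv (whereFrom)

  main :: IO ()
  main = do
    let xs = map (* 2) [1 .. 10 :: Int]
    -- Nothing means either the program wasn't built with -finfo-table-map
    -- or the lookup failed; the two cases are indistinguishable here.
    prov <- whereFrom xs
    print prov
  ```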
* rts: Initialize card table in newArray# (Ben Gamari, 2021-01-17, 1 file, -0/+3)
  Previously we would leave the card table of new arrays uninitialized. This wasn't a soundness issue: at worst we would end up doing unnecessary scavenging during GC, after which the card table would be reset. That being said, it seems worth initializing this properly to avoid both unnecessary work and non-determinism.

  Fixes #19143.
* Maintain invariant: MVars on mut_list are dirty (Viktor Dukhovni, 2021-01-03, 1 file, -0/+2)
  The fix for #18919 was somewhat incomplete: while the MVars were correctly added to the mut_list via dirty_MVAR(), their info table remained "clean". This is mostly harmless in non-debug builds, but it trips an assertion in the debug build and may result in the MVar needlessly being added to the mut_list multiple times.

  Resolves: #19145
* rts: enable thread label table in all RTS flavours #17972 (Adam Sandberg Ericsson, 2020-12-20, 1 file, -2/+0)
* dirty MVAR after mutating TSO queue head (Viktor Dukhovni, 2020-11-30, 1 file, -10/+20)
  While the original head and tail of the TSO queue may be in the same generation as the MVAR, interior elements of the queue could be younger after a GC run and may then be exposed by a putMVar operation that updates the queue head.

  Resolves #18919
* nonmoving: Add missing write barrier in shrinkSmallByteArray (GHC GitLab CI, 2020-11-29, 1 file, -0/+15)
* ghc-heap: partial TSO/STACK decoding (David Eichmann, 2020-11-28, 1 file, -3/+3)
  Co-authored-by: Sven Tennie <sven.tennie@gmail.com>
  Co-authored-by: Matthew Pickering <matthewtpickering@gmail.com>
  Co-authored-by: Ben Gamari <bgamari.foss@gmail.com>
* Use allocate, not ALLOC_PRIM_P for unpackClosure# (Michalis Pardalos, 2020-07-27, 1 file, -5/+7)
  ALLOC_PRIM_P fails for large closures; by using allocate directly we can handle closures which are larger than the block size.

  Fixes #12492
* winio: remove dead argument to stg_newIOPortzh (Tamar Christina, 2020-07-26, 1 file, -1/+1)
* winio: A few more improvements to the IOPort primitives. (Andreas Klebinger, 2020-07-15, 1 file, -11/+41)
* winio: Clean up code surrounding IOPort primitives. (Andreas Klebinger, 2020-07-15, 1 file, -30/+67)
  According to phyx these should only be read and written once per object, not necessarily in that order. To strengthen that guarantee the primitives will now throw an exception if we violate this invariant.

  As a consequence we can eliminate some code from their primops. In particular, code dealing with multiple queued readers/writers now simply checks the invariant and throws an exception if it was violated. That is in contrast to MVars, which will do things like wake up all readers, queue multiple writers, etc.
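  A sketch of the intended one-shot discipline (GHC.IOPort is an internal, WinIO-facing module; the signatures here follow recent base and may differ by release):

  ```haskell
  import GHC.IOPort (newEmptyIOPort, readIOPort, writeIOPort)

  demo :: IO ()
  demo = do
    p <- newEmptyIOPort
    _ <- writeIOPort p (42 :: Int)  -- first write: fine
    x <- readIOPort p               -- first read: fine
    print x
    -- A second write or a second read on 'p' violates the one-shot
    -- invariant and is now reported via an exception rather than queued.
  ```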
* winio: Add IOPort synchronization primitive (Tamar Christina, 2020-07-15, 1 file, -0/+173)
* winio: Use SlimReaderLocks and ConditionVariables provided by the OS instead of emulated ones (Tamar Christina, 2020-07-15, 1 file, -2/+2)
* Fix ghc-bignum exceptions (Sylvain Henry, 2020-06-27, 1 file, -19/+0)
  We must ensure that exceptions are not simplified. Previously we used:

     case raiseDivZero of
        _ -> 0## -- dummyValue

  but it was wrong because the evaluation of `raiseDivZero` was removed and the dummy value was directly returned. See the new Note [ghc-bignum exceptions].

  I've also removed the exception-triggering primops, which were fragile. We don't need them to be primops; we can have them exported by ghc-prim.

  I've also added a test for #18359 which triggered this patch.
* Clean up file paths for new module hierarchy (Takenobu Tani, 2020-06-01, 1 file, -1/+1)
  This updates comments only. This patch replaces file references according to the new module hierarchy.

  See also:
  * https://gitlab.haskell.org/ghc/ghc/-/wikis/Make-GHC-codebase-more-modular
  * https://gitlab.haskell.org/ghc/ghc/issues/13009
* Cleanup OVERWRITING_CLOSURE logic (Daniel Gröber, 2020-06-01, 1 file, -6/+6)
  The code is just more confusing than it needs to be. We don't need to mix the threaded check with the LDV profiling check since LDV's init already checks for this. Hence they can be two separate checks. Taking the sanity checking into account is also cleaner via DebugFlags.sanity; no need for checking the DEBUG define.

  The ZERO_SLOP_FOR_LDV_PROF and ZERO_SLOP_FOR_SANITY_CHECK definitions the old code had also made things a lot more opaque IMO, so I removed those.
* Modules (#13009) (Sylvain Henry, 2020-04-18, 1 file, -1/+1)
  * SysTools
  * Parser
  * GHC.Builtin
  * GHC.Iface.Recomp
  * Settings

  Update Haddock submodule

  Metric Decrease: Naperian parsing001
* Remove call to LDV_RECORD_CREATE for array resizing (Daniel Gröber, 2020-04-14, 1 file, -15/+10)
* rts: Fix nomenclature in OVERWRITING_CLOSURE macros (Daniel Gröber, 2020-04-14, 1 file, -4/+16)
  The additional commentary introduced by commit 8916e64e5437 ("Implement shrinkSmallMutableArray# and resizeSmallMutableArray#.") unfortunately got this wrong. We set 'prim' to true in overwritingClosureOfs because we _don't_ want to call LDV_recordDead(). The reason is the "inherently used" distinction made in the LDV profiler, so I rename the variable to be more appropriate.
* Zero out pinned block alignment slop when profiling (Daniel Gröber, 2020-04-14, 1 file, -18/+3)
  The heap profiler currently cannot traverse pinned blocks because of alignment slop. This used to just be a minor annoyance as the whole block is accounted into a special cost center rather than the respective object's CCS, cf. #7275. However for the new root profiler we would like to be able to visit _every_ closure on the heap. We need to do this so we can get rid of the current 'flip' bit hack in the heap traversal code.

  Since info pointers are always non-zero we can in principle skip all the slop in the profiler if we can rely on it being zeroed. This assumption caused problems in the past though: commit a586b33f8e ("rts: Correct handling of LARGE ARR_WORDS in LDV profiler"), part of !1118, tried to use the same trick for BF_LARGE objects but neglected to take into account that shrink*Array# functions don't ensure that slop is zeroed when not compiling with profiling. Later, commit 0c114c6599 ("Handle large ARR_WORDS in heap census (fix [...]") [...] as we will only be assuming slop is zeroed when profiling is on.

  This commit also reduces the amount of slop we introduce in the first place by calculating the needed alignment before doing the allocation for small objects, where we know the next available address. For large objects we don't know how much alignment we'll have to do yet, since those details are hidden behind the allocateMightFail function, so there we continue to allocate the maximum additional words we'll need to do the alignment.

  So that we don't have to duplicate all this logic in the Cmm code, we pull it into the RTS allocatePinned function instead.

  Metric Decrease: T7257 haddock.Cabal haddock.base
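  As an aside, the alignment arithmetic being front-loaded here is straightforward; a pure model (not RTS code) of the slop computation:

  ```haskell
  -- Given the next free address and a requested alignment (both in bytes),
  -- how many slop bytes must precede the payload so that it starts aligned?
  slopBytes :: Word -> Word -> Word
  slopBytes nextFree align
    | r == 0    = 0
    | otherwise = align - r
    where r = nextFree `rem` align
  ```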
* Module hierarchy: ByteCode and Runtime (cf #13009) (Sylvain Henry, 2020-02-12, 1 file, -1/+1)
  Update haddock submodule
* Add arithmetic exception primops (#14664) (Sylvain Henry, 2020-02-11, 1 file, -0/+19)
* Module hierarchy: Cmm (cf #13009) (Sylvain Henry, 2020-01-25, 1 file, -1/+1)
* Fix more typos, via an improved Levenshtein-style corrector (Brian Wignall, 2020-01-12, 1 file, -1/+1)
* Implement shrinkSmallMutableArray# and resizeSmallMutableArray#. (Andrew Martin, 2019-10-26, 1 file, -1/+18)
  This is a part of GHC Proposal #25: "Offer more array resizing primitives". Resources related to the proposal:

  - Discussion: https://github.com/ghc-proposals/ghc-proposals/pull/121
  - Proposal: https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0025-resize-boxed.rst

  Only shrinkSmallMutableArray# is implemented as a primop since a library-space implementation of resizeSmallMutableArray# (in GHC.Exts) is no less efficient than a primop would be. This may be replaced by a primop in the future if someone devises a strategy for growing arrays in-place. The library-space implementation always copies the array when growing it.

  This commit also tweaks the documentation of the deprecated sizeofMutableByteArray#, removing the mention of concurrency. That primop is unsound even in single-threaded applications. Additionally, the non-negativity assertion on the existing shrinkMutableByteArray# primop has been removed since this predicate is trivially always true.
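  A primop-level usage sketch of the new shrink operation (the IO wrapper is mine; assuming GHC >= 8.10):

  ```haskell
  {-# LANGUAGE MagicHash, UnboxedTuples #-}
  import GHC.Exts (newSmallArray#, shrinkSmallMutableArray#)
  import GHC.IO (IO (..))

  -- Allocate an 8-element small array, then shrink it in place to 4
  -- elements: O(1), no copying, the trailing slots simply become slop.
  shrinkDemo :: IO ()
  shrinkDemo = IO $ \s0 ->
    case newSmallArray# 8# () s0 of
      (# s1, marr #) ->
        case shrinkSmallMutableArray# marr 4# s1 of
          s2 -> (# s2, () #)
  ```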
* Merge non-moving garbage collector (Ben Gamari, 2019-10-23, 1 file, -26/+86)
  This introduces a concurrent mark & sweep garbage collector to manage the old generation. The concurrent nature of this collector typically results in significantly reduced maximum and mean pause times in applications with large working sets.

  Due to the large and intricate nature of the change I have opted to preserve the fully-buildable history, including merge commits, which is described in the "Branch overview" section below.

  Collector design
  ================

  The full design of the collector implemented here is described in detail in a technical note

  > B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell
  > Compiler" (2018)

  This document can be requested from @bgamari.

  The basic heap structure used in this design is heavily inspired by

  > K. Ueno & A. Ohori. "A fully concurrent garbage collector for
  > functional programs on multicore processors." /ACM SIGPLAN Notices/
  > Vol. 51. No. 9 (presented at ICFP 2016)

  This design is intended to allow both marking and sweeping concurrent to execution of a multi-core mutator. Unlike the Ueno design, which requires no global synchronization pauses, the collector introduced here requires a stop-the-world pause at the beginning and end of the mark phase.

  To avoid heap fragmentation, the allocator consists of a number of fixed-size /sub-allocators/. Each of these sub-allocators allocates into its own set of /segments/, themselves allocated from the block allocator. Each segment is broken into a set of fixed-size allocation blocks (which back allocations) in addition to a bitmap (used to track the liveness of blocks) and some additional metadata (also used to track liveness).

  This heap structure enables collection via mark-and-sweep, which can be performed concurrently via a snapshot-at-the-beginning scheme (although concurrent collection is not implemented in this patch).

  Implementation structure
  ========================

  The majority of the collector is implemented in a handful of files:

  * `rts/Nonmoving.c` is the heart of the beast. It implements the entry-point to the nonmoving collector (`nonmoving_collect`), as well as the allocator (`nonmoving_allocate`) and a number of utilities for manipulating the heap.

  * `rts/NonmovingMark.c` implements the mark queue functionality, update remembered set, and mark loop.

  * `rts/NonmovingSweep.c` implements the sweep loop.

  * `rts/NonmovingScav.c` implements the logic necessary to scavenge the nonmoving heap.

  Branch overview
  ===============

  ```
  * wip/gc/opt-pause:
  |  A variety of small optimisations to further reduce pause times.
  |
  * wip/gc/compact-nfdata:
  |  Introduce support for compact regions into the non-moving
  |\ collector
  | \
  |  \
  | | * wip/gc/segment-header-to-bdescr:
  | | |  Another optimization that we are considering, pushing
  | | |  some segment metadata into the segment descriptor for
  | | |  the sake of locality during mark
  | | |
  | * | wip/gc/shortcutting:
  | | |  Support for indirection shortcutting and the selector optimization
  | | |  in the non-moving heap.
  | | |
  * | | wip/gc/docs:
  | |/   Work on implementation documentation.
  | /
  |/
  * wip/gc/everything:
  |  A roll-up of everything below.
  |\
  | \
  | |\
  | | \
  | | * wip/gc/optimize:
  | | |  A variety of optimizations, primarily to the mark loop.
  | | |  Some of these are microoptimizations but a few are quite
  | | |  significant. In particular, the prefetch patches have
  | | |  produced a nontrivial improvement in mark performance.
  | | |
  | | * wip/gc/aging:
  | | |  Enable support for aging in major collections.
  | | |
  | * | wip/gc/test:
  | | |  Fix up the testsuite to more or less pass.
  | | |
  * | | wip/gc/instrumentation:
  | | |  A variety of runtime instrumentation including statistics
  | |/   support, the nonmoving census, and eventlog support.
  | |
  | /
  |/
  * wip/gc/nonmoving-concurrent:
  |  The concurrent write barriers.
  |
  * wip/gc/nonmoving-nonconcurrent:
  |  The nonmoving collector without the write barriers necessary
  |  for concurrent collection.
  |
  * wip/gc/preparation:
  |  A merge of the various preparatory patches that aren't directly
  |  implementing the GC.
  |
  | * GHC HEAD
  .
  .
  .
  ```
| * Nonmoving: Ensure write barrier vanishes in non-threaded RTS (Ben Gamari, 2019-10-21, 1 file, -3/+3)
| * rts: Implement concurrent collection in the nonmoving collector (Ben Gamari, 2019-10-20, 1 file, -24/+84)
    This extends the non-moving collector to allow concurrent collection.

    The full design of the collector implemented here is described in detail in a technical note

    B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell Compiler" (2018)

    This extension involves the introduction of a capability-local remembered set, known as the /update remembered set/, which tracks objects which may no longer be visible to the collector due to mutation. To maintain this remembered set we introduce a write barrier on mutations which is enabled while a concurrent mark is underway.

    The update remembered set representation is similar to that of the nonmoving mark queue, being a chunked array of `MarkEntry`s. Each `Capability` maintains a single accumulator chunk, which it flushes when (a) it is filled, or (b) the nonmoving collector enters its post-mark synchronization phase.

    While the write barrier touches a significant amount of code it is conceptually straightforward: the mutator must ensure that the referee of any pointer it overwrites is added to the update remembered set. However, there are a few details:

    * In the case of objects with a dirty flag (e.g. `MVar`s) we can exploit the fact that only the *first* mutation requires a write barrier.

    * Weak references, as usual, complicate things. In particular, we must ensure that the referee of a weak object is marked if dereferenced by the mutator. For this we (unfortunately) must introduce a read barrier, as described in Note [Concurrent read barrier on deRefWeak#] (in `NonMovingMark.c`).

    * Stable names are also a bit tricky as described in Note [Sweeping stable names in the concurrent collector] (`NonMovingSweep.c`).

    We take quite some pains to ensure that the high thread count often seen in parallel Haskell applications doesn't affect pause times. To this end we allow thread stacks to be marked either by the thread itself (when it is executed or stack-underflows) or the concurrent mark thread (if the thread owning the stack is never scheduled). There is a non-trivial handshake to ensure that this happens without racing, which is described in Note [StgStack dirtiness flags and concurrent marking].

    Co-Authored-by: Ömer Sinan Ağacan <omer@well-typed.com>
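    To make the barrier's job concrete, a toy model in Haskell (illustrative only; the real barrier is C code operating on closure pointers):

    ```haskell
    import Control.Monad (when)
    import Data.IORef

    data Obj = Obj { children :: IORef [Obj] }

    -- Snapshot-at-the-beginning invariant: everything reachable when the
    -- mark phase started must get marked. So, while marking is underway,
    -- push the about-to-be-overwritten referees onto the update remembered
    -- set before replacing a pointer field.
    writeField :: IORef Bool -> IORef [Obj] -> Obj -> [Obj] -> IO ()
    writeField markInProgress rememberedSet obj new = do
      marking <- readIORef markInProgress
      when marking $ do
        old <- readIORef (children obj)
        modifyIORef' rememberedSet (old ++)
      writeIORef (children obj) new
    ```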
| * rts: Give stack flags proper macros (Ben Gamari, 2019-10-18, 1 file, -2/+2)
    These were previously quite unclear and will change a bit under the non-moving collector, so let's clear this up now.
* | Full abort on validate failure merging `orElse`. (Ryan Yates, 2019-10-23, 1 file, -20/+34)
  Previously, partial rollback of a branch of an `orElse` was attempted if validation failure was observed. Validation here, however, does not account for which part of the transaction observed inconsistent state. This commit fixes this by fully aborting and restarting the transaction.
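  For readers unfamiliar with the primitive involved, a small example of `orElse` at the library level (standard stm API, not code from this patch):

  ```haskell
  import Control.Concurrent.STM

  -- Take from 'a' if it holds a value, otherwise from 'b'; if neither
  -- does, the whole transaction retries. On validation failure the RTS
  -- now restarts this entire transaction, not just the failed branch.
  takeEither :: TVar (Maybe Int) -> TVar (Maybe Int) -> STM Int
  takeEither a b = grab a `orElse` grab b
    where
      grab v = readTVar v >>= maybe retry (\x -> writeTVar v Nothing >> pure x)
  ```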
* Extend argument of createIOThread to word size (Stefan Schulze Frielinghaus, 2019-10-03, 1 file, -2/+2)
  Function createIOThread expects its second argument to be of word size. The natural size of the second parameter is 32 bits. Thus, for some 64-bit architectures, where a write of the lower half of a register does not clear the upper half, the value must be zero-extended.