| Commit message | Author | Age | Files | Lines |
| |
In #19143 I noticed that newArray# failed to initialize the card table
of newly-allocated arrays. However, embarrassingly, I then only fixed
the issue in newArrayArray# and, in so doing, introduced the potential
for an integer underflow on zero-length arrays (#21962).
Here I fix the issue in newArray#, this time ensuring that we do not
underflow in pathological cases.
Fixes #19143.
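A hypothetical regression check (not part of the patch) for the pathological case above: allocating a zero-length boxed array must not underflow while its card table is initialized.
```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts (Int (I#), newArray#, sizeofMutableArray#)
import GHC.ST (ST (..), runST)

-- Allocate a zero-length boxed array; the RTS initializes its card table.
zeroLen :: Int
zeroLen = runST (ST (\s0 ->
  case newArray# 0# () s0 of
    (# s1, marr #) -> (# s1, I# (sizeofMutableArray# marr) #)))

main :: IO ()
main = print zeroLen  -- expect 0, with no crash or heap corruption
```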
|
| |
This out-of-line primop has no Haskell wrapper and hasn't been used
anywhere in the tree. Furthermore, the code gets in the way of !7632, so
it should be garbage collected.
|
| |
A user came to #ghc yesterday wondering how best to check whether they
were leaking threads. We ended up using the eventlog but it seems to me
like it would be generally useful if Haskell programs could query their
own threads.
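A sketch of the kind of self-inspection this enables, assuming the interface is exposed as `GHC.Conc.listThreads` (`threadStatus` is pre-existing API):
```haskell
import Control.Concurrent (forkIO, threadDelay)
import GHC.Conc (listThreads, threadStatus)

main :: IO ()
main = do
  _ <- forkIO (threadDelay 1000000)
  tids <- listThreads                        -- all threads known to the RTS
  mapM_ (\t -> threadStatus t >>= print) tids
```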
|
| |
Fixes #21824
|
| |
This is a naive approach to fixing the unsoundness noticed in #21708.
Specifically, we remove the lowering of `keepAlive#` via CorePrep and
instead turn it into an out-of-line primop.
This is simple, if inefficient (since the continuation must now be heap
allocated), but good enough for 9.4.1. We will revisit this
(particularly via #16098) in a future release.
Metric Increase:
T4978
T7257
T9203
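For reference, a minimal sketch of how the primop is used (cf. GHC.ForeignPtr in base); `keepingAlive` is a hypothetical name:
```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts (keepAlive#)
import GHC.IO (IO (..))

-- Keep `v` reachable for the duration of the continuation `act`.
keepingAlive :: v -> IO a -> IO a
keepingAlive v (IO act) = IO (\s -> keepAlive# v s act)
```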
|
| |
This patch makes the following types levity-polymorphic in their
last argument:
- Array# a, SmallArray# a, Weak# b, StablePtr# a, StableName# a
- MutableArray# s a, SmallMutableArray# s a,
MutVar# s a, TVar# s a, MVar# s a, IOPort# s a
The corresponding primops are also made levity-polymorphic, e.g.
`newArray#`, `readArray#`, `writeMutVar#`, `writeIOPort#`, etc.
Additionally, exception handling functions such as `catch#`, `raise#`,
`maskAsyncExceptions#`,... are made levity/representation-polymorphic.
Now that Array# and MutableArray# also work with unlifted types,
we can simply re-define ArrayArray# and MutableableArrayArray# in terms
of them. This means that ArrayArray# and MutableArrayArray# are no
longer primitive types, but simply unlifted newtypes around Array# and
MutableArray#.
This completes the implementation of the Pointer Rep proposal
https://github.com/ghc-proposals/ghc-proposals/pull/203
Fixes #20911
-------------------------
Metric Increase:
T12545
-------------------------
-------------------------
Metric Decrease:
T12545
-------------------------
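A sketch of the redefinition idea (the real, more general definitions live in GHC.Exts; the name below is illustrative):
```haskell
{-# LANGUAGE MagicHash, UnliftedNewtypes #-}
import GHC.Exts (Array#, ByteArray#)

-- Once Array# accepts unlifted element types, an array-of-arrays is just
-- an unlifted newtype around an Array# of an unlifted element.
newtype MyArrayArray# = MyArrayArray# (Array# ByteArray#)
```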
|
| |
Here we refactor WinIO's IO completion scheme, squashing a memory leak
and fixing #18382.
To fix #18382 we drop the special thread status introduced for IoPort
blocking, BlockedOnIoCompletion, as well as the non-threaded RTS's
special deadlock detection logic (which is redundant with the GC's
deadlock detection logic), as proposed in #20947.
Previously WinIO relied on foreign import ccall "wrapper" to create an
adjustor thunk which can be attached to the OVERLAPPED structure passed
to the operating system. It would then use foreign import ccall
"dynamic" to back out the original continuation from the adjustor. This
roundtrip is significantly more expensive than the alternative, using a
StablePtr. Furthermore, the implementation let the adjustor leak,
meaning that every IO request would leak a page of memory.
Fixes T18382.
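An illustrative sketch of the direction of the change (names hypothetical): attach a StablePtr to the completion data instead of an adjustor built with `foreign import ccall "wrapper"`:
```haskell
import Foreign.StablePtr (StablePtr, deRefStablePtr, freeStablePtr, newStablePtr)

-- Stored alongside the OVERLAPPED structure instead of an adjustor thunk.
newCompletionData :: IO () -> IO (StablePtr (IO ()))
newCompletionData = newStablePtr

-- On completion: recover the continuation and release the StablePtr, so
-- nothing leaks per request.
runCompletion :: StablePtr (IO ()) -> IO ()
runCompletion sp = do
  k <- deRefStablePtr sp
  freeStablePtr sp
  k
```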
|
| |
Previously the call to dirty_MUT_VAR in stg_atomicModifyMutVarzuzh was
missing its final argument.
Fixes #20414.
|
| |
This fixes an inconsistency where one dirtiness check would not mask out
the STACK_DIRTY flag, meaning it may also be affected by the STACK_SANE
flag.
|
| |
Previously PerformPut failed to respect the non-moving collector's
snapshot invariant, hiding references to an MVar and its new value by
overwriting a stack frame without dirtying the stack. Fix this.
PerformTake exhibited a similar bug, failing to dirty (and therefore
mark) the blocked stack before mutating it.
Closes #20399.
|
| |
Emit an Info Table Provenance Entry (IPE) for every stack-represented info table
if -finfo-table-map is turned on.
To decode a cloned stack, lookupIPE() is used. It provides a mapping between
info tables and their source location.
Please see these notes for details:
- [Stacktraces from Info Table Provenance Entries (IPE based stack unwinding)]
- [Mapping Info Tables to Source Positions]
Metric Increase:
T12545
|
| |
Add a `StackSnapshot#` primitive type that represents a cloned stack (StgStack).
The cloning interface consists of two functions that clone either the thread's
own stack (cloneMyStack) or another thread's stack (cloneThreadStack).
The stack snapshot is offline/cold, i.e. it isn't evaluated any further. This is
useful for analyses as it prevents concurrent modifications.
For technical details, please see Note [Stack Cloning].
Co-authored-by: Ben Gamari <bgamari.foss@gmail.com>
Co-authored-by: Matthew Pickering <matthewtpickering@gmail.com>
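A sketch of the user-facing side, assuming the wrappers are exposed from GHC.Stack.CloneStack and the program is built with -finfo-table-map:
```haskell
import GHC.Stack.CloneStack (cloneMyStack, decode)

main :: IO ()
main = do
  snapshot <- cloneMyStack     -- a cold copy of our own StgStack
  entries  <- decode snapshot  -- resolved via lookupIPE (see the IPE commit above)
  mapM_ print entries
```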
|
| |
PPC NCG: Implement CAS inline for 32 and 64 bit
testsuite: Add tests for smaller atomic CAS
X86 NCG: Catch calls to CAS C fallback
Primops: Add atomicCasWord[8|16|32|64]Addr#
Add tests for atomicCasWord[8|16|32|64]Addr#
Add changelog entry for new primops
X86 NCG: Fix MO_Cmpxchg W64 on 32-bit arch
ghc-prim: 64-bit CAS C fallback on all archs
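A sketch of wrapping one of the new fixed-width CAS primops in IO; the result is the old byte, so equality with `expected` signals success:
```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts (Ptr (..), atomicCasWord8Addr#)
import GHC.IO (IO (..))
import GHC.Word (Word8 (W8#))

casByte :: Ptr Word8 -> Word8 -> Word8 -> IO Word8
casByte (Ptr addr) (W8# expected) (W8# desired) = IO (\s ->
  case atomicCasWord8Addr# addr expected desired s of
    (# s', old #) -> (# s', W8# old #))
```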
|
| |
Previously we branched unnecessarily on
IF_NONMOVING_WRITE_BARRIER_ENABLED on every trip through the array
barrier push loop.
|
| |
The `whereFrom` function provides a Haskell interface for using the
information created by `-finfo-table-map`. Given a Haskell value, the
info table address will be passed to the `lookupIPE` function in order
to attempt to find the source location information for that particular closure.
At the moment it's not possible to distinguish between the absence of the
map and a failed lookup.
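A sketch of its use (at the time the function lived in GHC.Stack.CCS; its home module has since moved between releases). Build with -finfo-table-map to populate the map:
```haskell
import GHC.Stack.CCS (whereFrom)

main :: IO ()
main = do
  let xs = map (* 2) [1 .. 5 :: Int]
  prov <- whereFrom xs
  -- Nothing means either "no IPE map compiled in" or "lookup failed";
  -- as noted above, the two cases are not yet distinguishable.
  print prov
```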
|
| |
Previously we would leave the card table of new arrays uninitialized.
This wasn't a soundness issue: at worst we would end up doing
unnecessary scavenging during GC, after which the card table would be
reset. That being said, it seems worth initializing this properly to
avoid both unnecessary work and non-determinism.
Fixes #19143.
|
| |
The fix for #18919 was somewhat incomplete: while the MVars were
correctly added to the mut_list via dirty_MVAR(), their info table
remained "clean".
This is mostly harmless in non-debug builds, but it trips an assertion
in the debug build and may result in the MVar being needlessly added
to the mut_list multiple times.
Resolves: #19145
|
| |
While the original head and tail of the TSO queue may be in the same
generation as the MVAR, interior elements of the queue could be younger
after a GC run and may then be exposed by a putMVar operation that
updates the queue head.
Resolves #18919
|
| |
Co-authored-by: Sven Tennie <sven.tennie@gmail.com>
Co-authored-by: Matthew Pickering <matthewtpickering@gmail.com>
Co-authored-by: Ben Gamari <bgamari.foss@gmail.com>
|
| |
ALLOC_PRIM_P fails for large closures; by using allocate directly we
can handle closures which are larger than the block size.
Fixes #12492
|
| |
According to phyx these should only be read and written once per
object, not necessarily in that order.
To strengthen that guarantee the primitives will now throw an
exception if we violate this invariant.
As a consequence we can eliminate some code from their primops.
In particular, code dealing with multiple queued readers/writers
now simply checks the invariant and throws an exception if it
was violated. That is in contrast to MVars, which will do things
like wake up all readers, queue multiple writers, etc.
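A sketch of the intended write-once/read-once discipline (GHC.IOPort is an internal WinIO module; the exact API shown is an assumption):
```haskell
import GHC.IOPort (newEmptyIOPort, readIOPort, writeIOPort)

main :: IO ()
main = do
  p <- newEmptyIOPort
  True <- writeIOPort p (42 :: Int)  -- the one permitted write
  x <- readIOPort p                  -- the one permitted read
  print x
  -- A second write or read on the same port violates the invariant and
  -- now raises an exception instead of queueing as an MVar would.
```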
|
| |
instead of emulated ones
|
| |
We must ensure that exceptions are not simplified. Previously we used:
case raiseDivZero of
_ -> 0## -- dummyValue
But this was wrong because the evaluation of `raiseDivZero` was removed and
the dummy value was directly returned. See the new Note [ghc-bignum exceptions].
I've also removed the exception-triggering primops, which were fragile.
We don't need them to be primops; we can have them exported by ghc-prim.
I've also added a test for #18359 which triggered this patch.
|
| |
This updates comments only.
This patch replaces file references according to new module hierarchy.
See also:
* https://gitlab.haskell.org/ghc/ghc/-/wikis/Make-GHC-codebase-more-modular
* https://gitlab.haskell.org/ghc/ghc/issues/13009
|
| |
The code is just more confusing than it needs to be. We don't need to mix
the threaded check with the LDV profiling check since LDV's init already
checks for this; hence they can be two separate checks. Taking the sanity
checking into account is also cleaner via DebugFlags.sanity; no need for
checking the DEBUG define.
The ZERO_SLOP_FOR_LDV_PROF and ZERO_SLOP_FOR_SANITY_CHECK definitions in
the old code also made things a lot more opaque, IMO, so I removed them.
|
| |
* SysTools
* Parser
* GHC.Builtin
* GHC.Iface.Recomp
* Settings
Update Haddock submodule
Metric Decrease:
Naperian
parsing001
|
| |
The additional commentary introduced by commit 8916e64e5437 ("Implement
shrinkSmallMutableArray# and resizeSmallMutableArray#.") unfortunately got
this wrong. We set 'prim' to true in overwritingClosureOfs because we
_don't_ want to call LDV_recordDead().
The reason is the "inherently used" distinction made in the LDV profiler,
so I rename the variable to something more appropriate.
|
| |
The heap profiler currently cannot traverse pinned blocks because of
alignment slop. This used to just be a minor annoyance as the whole block
is accounted into a special cost center rather than the respective object's
CCS, cf. #7275. However for the new root profiler we would like to be able
to visit _every_ closure on the heap. We need to do this so we can get rid
of the current 'flip' bit hack in the heap traversal code.
Since info pointers are always non-zero we can in principle skip all the
slop in the profiler if we can rely on it being zeroed. This assumption
caused problems in the past though, commit a586b33f8e ("rts: Correct
handling of LARGE ARR_WORDS in LDV profiler"), part of !1118, tried to use
the same trick for BF_LARGE objects but neglected to take into account that
shrink*Array# functions don't ensure that slop is zeroed when not
compiling with profiling.
Later, commit 0c114c6599 ("Handle large ARR_WORDS in heap census (fix
[…])") revisited this area; that pitfall does not arise here, as we will
only be assuming slop is zeroed when profiling is on.
This commit also reduces the amount of slop we introduce in the first
place by calculating the needed alignment before doing the allocation for
small objects, where we know the next available address. For large objects
we don't know how much alignment we'll have to do yet, since those details
are hidden behind the allocateMightFail function, so there we continue to
allocate the maximum additional words we'll need to do the alignment.
So that we don't have to duplicate all this logic in the Cmm code, we pull
it into the RTS allocatePinned function instead.
Metric Decrease:
T7257
haddock.Cabal
haddock.base
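The alignment arithmetic alluded to above, in Haskell terms (illustrative only; the RTS does this in C inside allocatePinned):
```haskell
import Data.Bits (complement, (.&.))

-- Round addr up to the next multiple of align (a power of two).
alignUp :: Word -> Word -> Word
alignUp addr align = (addr + align - 1) .&. complement (align - 1)
-- e.g. alignUp 13 8 == 16
```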
|
| |
Update haddock submodule
|
| |
This is a part of GHC Proposal #25: "Offer more array resizing primitives".
Resources related to the proposal:
- Discussion: https://github.com/ghc-proposals/ghc-proposals/pull/121
- Proposal: https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0025-resize-boxed.rst
Only shrinkSmallMutableArray# is implemented as a primop since a
library-space implementation of resizeSmallMutableArray# (in GHC.Exts)
is no less efficient than a primop would be. This may be replaced by
a primop in the future if someone devises a strategy for growing
arrays in-place. The library-space implementation always copies the
array when growing it.
This commit also tweaks the documentation of the deprecated
sizeofMutableByteArray#, removing the mention of concurrency. That
primop is unsound even in single-threaded applications. Additionally,
the non-negativity assertion on the existing shrinkMutableByteArray#
primop has been removed since this predicate is trivially always true.
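A sketch of the resulting API as exposed from GHC.Exts: resizing copies when growing (filling fresh slots with the given element), while shrinking is in place:
```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts (newSmallArray#, readSmallArray#,
                 resizeSmallMutableArray#, shrinkSmallMutableArray#)
import GHC.ST (ST (..), runST)

demo :: Int
demo = runST (ST (\s0 ->
  case newSmallArray# 4# (0 :: Int) s0 of
    (# s1, marr #) ->
      case resizeSmallMutableArray# marr 8# 7 s1 of     -- grow: copies
        (# s2, marr' #) ->
          case shrinkSmallMutableArray# marr' 2# s2 of  -- shrink: in place
            s3 -> readSmallArray# marr' 0# s3))         -- a surviving element
```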
|
|\
| | |
This introduces a concurrent mark & sweep garbage collector to manage the old
generation. The concurrent nature of this collector typically results in
significantly reduced maximum and mean pause times in applications with large
working sets.
Due to the large and intricate nature of the change I have opted to
preserve the fully-buildable history, including merge commits, which is
described in the "Branch overview" section below.
Collector design
================
The full design of the collector implemented here is described in detail
in a technical note
> B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell
> Compiler" (2018)
This document can be requested from @bgamari.
The basic heap structure used in this design is heavily inspired by
> K. Ueno & A. Ohori. "A fully concurrent garbage collector for
> functional programs on multicore processors." /ACM SIGPLAN Notices/
> Vol. 51. No. 9 (presented at ICFP 2016)
This design is intended to allow both marking and sweeping to proceed
concurrently with the execution of a multi-core mutator. Unlike the Ueno
design, which requires no global synchronization pauses, the collector
introduced here requires a stop-the-world pause at the beginning and end
of the mark phase.
To avoid heap fragmentation, the allocator consists of a number of
fixed-size /sub-allocators/. Each of these sub-allocators allocates into
its own set of /segments/, themselves allocated from the block
allocator. Each segment is broken into a set of fixed-size allocation
blocks (which back allocations), a bitmap (used to track the liveness
of blocks), and some additional metadata (also used to track liveness).
This heap structure enables collection via mark-and-sweep, which can be
performed concurrently via a snapshot-at-the-beginning scheme (although
concurrent collection is not implemented in this patch).
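A toy model of this heap structure (illustrative only, not the RTS representation):
```haskell
data Segment = Segment
  { blockSize :: Int     -- every block in a segment has the same size
  , liveness  :: [Bool]  -- bitmap: one mark bit per block
  , nextFree  :: Int     -- allocation cursor into the segment's blocks
  }

newtype SubAllocator  = SubAllocator  [Segment]       -- one per block size
newtype NonmovingHeap = NonmovingHeap [SubAllocator]
```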
Implementation structure
========================
The majority of the collector is implemented in a handful of files:
* `rts/Nonmoving.c` is the heart of the beast. It implements the entry-point
to the nonmoving collector (`nonmoving_collect`), as well as the allocator
(`nonmoving_allocate`) and a number of utilities for manipulating the heap.
* `rts/NonmovingMark.c` implements the mark queue functionality, update
remembered set, and mark loop.
* `rts/NonmovingSweep.c` implements the sweep loop.
* `rts/NonmovingScav.c` implements the logic necessary to scavenge the
nonmoving heap.
Branch overview
===============
```
* wip/gc/opt-pause:
| A variety of small optimisations to further reduce pause times.
|
* wip/gc/compact-nfdata:
| Introduce support for compact regions into the non-moving
|\ collector
| \
| \
| | * wip/gc/segment-header-to-bdescr:
| | | Another optimization that we are considering, pushing
| | | some segment metadata into the segment descriptor for
| | | the sake of locality during mark
| | |
| * | wip/gc/shortcutting:
| | | Support for indirection shortcutting and the selector optimization
| | | in the non-moving heap.
| | |
* | | wip/gc/docs:
| |/ Work on implementation documentation.
| /
|/
* wip/gc/everything:
| A roll-up of everything below.
|\
| \
| |\
| | \
| | * wip/gc/optimize:
| | | A variety of optimizations, primarily to the mark loop.
| | | Some of these are microoptimizations but a few are quite
| | | significant. In particular, the prefetch patches have
| | | produced a nontrivial improvement in mark performance.
| | |
| | * wip/gc/aging:
| | | Enable support for aging in major collections.
| | |
| * | wip/gc/test:
| | | Fix up the testsuite to more or less pass.
| | |
* | | wip/gc/instrumentation:
| | | A variety of runtime instrumentation including statistics
| | / support, the nonmoving census, and eventlog support.
| |/
| /
|/
* wip/gc/nonmoving-concurrent:
| The concurrent write barriers.
|
* wip/gc/nonmoving-nonconcurrent:
| The nonmoving collector without the write barriers necessary
| for concurrent collection.
|
* wip/gc/preparation:
| A merge of the various preparatory patches that aren't directly
| implementing the GC.
|
|
* GHC HEAD
.
.
.
```
|
| | |
This extends the non-moving collector to allow concurrent collection.
The full design of the collector implemented here is described in detail
in a technical note
B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell
Compiler" (2018)
This extension involves the introduction of a capability-local
remembered set, known as the /update remembered set/, which tracks
objects which may no longer be visible to the collector due to mutation.
To maintain this remembered set we introduce a write barrier on
mutations which is enabled while a concurrent mark is underway.
The update remembered set representation is similar to that of the
nonmoving mark queue, being a chunked array of `MarkEntry`s. Each
`Capability` maintains a single accumulator chunk, which it flushes
when (a) it is filled, or (b) the nonmoving collector enters its
post-mark synchronization phase.
While the write barrier touches a significant amount of code it is
conceptually straightforward: the mutator must ensure that the referee
of any pointer it overwrites is added to the update remembered set.
However, there are a few details:
* In the case of objects with a dirty flag (e.g. `MVar`s) we can
exploit the fact that only the *first* mutation requires a write
barrier.
* Weak references, as usual, complicate things. In particular, we must
ensure that the referee of a weak object is marked if dereferenced by
the mutator. For this we (unfortunately) must introduce a read
barrier, as described in Note [Concurrent read barrier on deRefWeak#]
(in `NonMovingMark.c`).
* Stable names are also a bit tricky as described in Note [Sweeping
stable names in the concurrent collector] (`NonMovingSweep.c`).
We take quite some pains to ensure that the high thread count often seen
in parallel Haskell applications doesn't affect pause times. To this end
we allow thread stacks to be marked either by the thread itself (when it
is executed or stack-underflows) or the concurrent mark thread (if the
thread owning the stack is never scheduled). There is a non-trivial
handshake to ensure that this happens without racing which is described
in Note [StgStack dirtiness flags and concurrent marking].
Co-Authored-by: Ömer Sinan Ağacan <omer@well-typed.com>
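The barrier's obligation, modelled in Haskell (a sketch, not the RTS code): before overwriting a pointer, push the old referee to the update remembered set so the concurrent mark can still reach it:
```haskell
import Data.IORef

writeWithBarrier :: IORef [a] -> IORef a -> a -> IO ()
writeWithBarrier remSet ref new = do
  old <- readIORef ref
  modifyIORef' remSet (old :)  -- record the overwritten referee
  writeIORef ref new
```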
|
| | |
These were previously quite unclear and will change a bit under the
non-moving collector, so let's clear this up now.
|
|/
| |
Previously a partial rollback of a branch of an `orElse` was attempted
if a validation failure was observed. Validation here, however, does
not account for which part of the transaction observed the inconsistent
state. This commit fixes this by fully aborting and restarting the
transaction.
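The user-visible construct involved, for reference (standard stm API; the observable semantics are unchanged, only the recovery strategy on validation failure differs):
```haskell
import Control.Concurrent.STM

takeEither :: TMVar a -> TMVar a -> STM a
takeEither l r = takeTMVar l `orElse` takeTMVar r
```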
|
| |
Function createIOThread expects its second argument to be of word size.
The natural size of the second parameter is 32 bits. Thus, for some 64-bit
architectures, where a write of the lower half of a register does not
clear the upper half, the value must be zero-extended.
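The required conversion, in Haskell terms (a sketch): widening an unsigned 32-bit value must clear the upper half of the 64-bit result:
```haskell
import Data.Word (Word32, Word64)

widen :: Word32 -> Word64
widen = fromIntegral  -- zero-extends
```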
|