| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
| |
This avoids a lock inversion between the storage manager mutex and
the stable pointer table mutex by not dropping the SM_MUTEX in
nonmovingCollect. This requires quite a bit of rejiggering but it
does seem like a better strategy.
|
| |
|
|
|
|
|
|
| |
Mark a number of accesses to do with tracking of the status of the
concurrent collection thread as atomic. No interesting races here,
merely necessary to satisfy TSAN.
|
| |
|
|
|
|
|
|
|
| |
To ensure that we don't race with a mutator entering a new CAF we take
the SM mutex before touching static_flag. The other option here would be
to instead modify newCAF to use a CAS but the present approach is a bit
safer.
|
|
|
|
| |
Since it may have been mutated by a moving GC.
|
| |
|
|
|
|
|
| |
We must use an acquire-fence when marking to ensure that the indirectee
is visible.
|
|
|
|
|
|
|
| |
0e274c39bf836d5bb846f5fa08649c75f85326ac added an assertion in
`dirty_MUT_VAR` checking that the MUT_VAR being dirtied was clean.
However, this isn't necessarily the case since another thread may have
raced us to dirty the object.
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
And ensure accesses to n_capabilities are atomic (although with relaxed
ordering). This is necessary as RTS API callers may concurrently call
into the RTS without holding a capability.
|
|
|
|
|
| |
Also introduce MUT_FIELD marker in Closures.h to document mutable
fields.
|
|
|
|
|
| |
The global vars {blocked,sleeping}_queue are now in the Capability and
so get marked there via markCapabilityIOManager.
|
|
|
|
|
| |
This patch makes flushExec a no-op on wasm32, since there's no such
thing as executable memory on wasm32 in the first place.
|
|
|
|
|
|
| |
This patch makes the storage manager not return any memory on wasm32.
The detailed reason is described in Note [Megablock allocator on
wasm].
|
|
|
|
|
|
|
|
|
| |
When the heap is heavily block fragmented the live byte size might be
low while the memory usage is high. We want to ensure that heap overflow
triggers in these cases.
We do so by checking that we can return enough megablocks to
under maxHeapSize at the end of GC.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Here we refactor the representation of info table provenance information
in object code to significantly reduce its size and link-time impact.
Specifically, we deduplicate strings and represent them as 32-bit
offsets into a common string table.
In addition, we rework the registration logic to eliminate allocation
from the registration path, which is run from a static initializer where
things like allocation are technically undefined behavior (although it
did previously seem to work). For similar reasons we eliminate lock
usage from registration path, instead relying on atomic CAS.
Closes #22077.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements GHC proposal 313, "Delimited continuation
primops", by adding native support for delimited continuations to the
GHC RTS.
All things considered, the patch is relatively small. It almost
exclusively consists of changes to the RTS; the compiler itself is
essentially unaffected. The primops come with fairly extensive Haddock
documentation, and an overview of the implementation strategy is given
in the Notes in rts/Continuation.c.
This first stab at the implementation prioritizes simplicity over
performance. Most notably, every continuation is always stored as a
single, contiguous chunk of stack. If one of these chunks is
particularly large, it can result in poor performance, as the current
implementation does not attempt to cleverly squeeze a subset of the
stack frames into the existing stack: it must fit all at once. If this
proves to be a performance issue in practice, a cleverer strategy would
be a worthwhile target for future improvements.
|
|
|
|
|
|
|
| |
This eliminates the thread label HashTable and instead tracks this
information in the TSO, allowing us to use proper StgArrBytes arrays for
backing the label and greatly simplifying management of object lifetimes
when we expose them to the user with the coming `threadLabel#` primop.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes a rather subtle bug in the logic responsible for scavenging
objects evacuated to the non-moving generation. In particular, objects
can be allocated into the non-moving generation by two ways:
a. evacuation out of from-space by the garbage collector
b. direct allocation by the mutator
Like all evacuation, objects moved by (a) must be scavenged, since they
may contain references to other objects located in from-space. To
accomplish this we have the following scheme:
* each nonmoving segment's block descriptor has a scan pointer which
points to the first object which has yet to be scavenged
* the GC tracks a set of "todo" segments which have pending scavenging
work
* to scavenge a segment, we scavenge each of the unmarked blocks
between the scan pointer and segment's `next_free` pointer.
We skip marked blocks since we know the allocator wouldn't have
allocated into marked blocks (since they contain presumably live
data).
We can stop at `next_free` since, by
definition, the GC could not have evacuated any objects to blocks
above `next_free` (otherwise `next_free wouldn't be the first free
block).
However, this neglected to consider objects allocated by path (b).
In short, the problem is that objects directly allocated by the mutator
may become unreachable (but not swept, since the containing segment is
not yet full), at which point they may contain references to swept objects.
Specifically, we observed this in #21885 in the following way:
1. the mutator (specifically in #21885, a `lockCAF`) allocates an object
(specifically a blackhole, which here we will call `blkh`; see Note
[Static objects under the nonmoving collector] for the reason why) on
the non-moving heap. The bitmap of the allocated block remains 0
(since allocation doesn't affect the bitmap) and the containing
segment's (which we will call `blkh_seg`) `next_free` is advanced.
2. We enter the blackhole, evaluating the blackhole to produce a result
(specificaly a cons cell) in the nursery
3. The blackhole gets updated into an indirection pointing to the cons
cell; it is pushed to the generational remembered set
4. we perform a GC, the cons cell is evacuated into the nonmoving heap
(into segment `cons_seg`)
5. the cons cell is marked
6. the GC concludes
7. the CAF and blackhole become unreachable
8. `cons_seg` is filled
9. we start another GC; the cons cell is swept
10. we start a new GC
11. something is evacuated into `blkh_seg`, adding it to the "todo" list
12. we attempt to scavenge `blkh_seg` (namely, all unmarked blocks
between `scan` and `next_free`, which includes `blkh`). We attempt to
evacuate `blkh`'s indirectee, which is the previously-swept cons cell.
This is unsafe, since the indirectee is no longer a valid heap
object.
The problem here was that the scavenging logic *assumed* that (a) was
the only source of allocations into the non-moving heap and therefore
*all* unmarked blocks between `scan` and `next_free` were evacuated.
However, due to (b) this is not true.
The solution is to ensure that that the scanned region only encompasses
the region of objects allocated during evacuation. We do this by
updating `scan` as we push the segment to the todo-segment list to
point to the block which was evacuated into.
Doing this required changing the nonmoving scavenging implementation's
update of the `scan` pointer to bump it *once*, instead of after
scavenging each block as was done previously. This is because we may end
up evacuating into the segment being scavenged as we scavenge it. This
was quite tricky to discover but the result is quite simple,
demonstrating yet again that global mutable state should be used
exceedingly sparingly.
Fixes #21885
(cherry picked from commit 0b27ea23efcb08639309293faf13fdfef03f1060)
|
|
|
|
|
|
|
|
| |
It can often be useful during debugging to be able to determine the
state of a nonmoving segment. Introduce some state, enabled by DEBUG, to
track this.
(cherry picked from commit 40e797ef591ae3122ccc98ab0cc3cfcf9d17bd7f)
|
|
|
|
|
|
|
|
| |
If the nonmoving gc is enabled and we are using a threaded RTS,
we now try to grab the collector mutex to avoid memInventory and
the collection racing.
Before memInventory was disabled.
|
|
|
|
|
|
|
|
|
|
|
| |
We were not updating the [copied,any_work,scav_find_work, max_n_todo_overflow]
counters during sequential collections. As well, we were double counting for
parallel collections.
To fix this we add an `else` clause to the `if (is_par_gc())`.
The par_* counters do not need to be updated in the sequential case
because they must be 0.
|
|
|
|
|
|
|
|
|
| |
In #21483 I had a discussion with Simon Marlow about the memory
retention behaviour of -Fd. I have just transcribed that conversation
here as it elucidates the potentially subtle assumptions which led to
the design of the memory retention behaviours of -Fd.
Fixes #21483
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When generating an SRT for a recursive group, GHC.Cmm.Info.Build.oneSRT
filters out recursive references, as described in Note [recursive SRTs].
However, doing so for static functions would be unsound, for the reason
described in Note [Invalid optimisation: shortcutting].
However, the same argument applies to static data constructor
applications, as we discovered in #20959. Fix this by ensuring that
static data constructor applications are included in recursive SRTs.
The approach here is not entirely satisfactory, but it is a starting
point.
Fixes #20959.
|
|
|
|
|
| |
Since f6e366c058b136f0789a42222b8189510a3693d1 setExecutable has been
dead code. Drop it.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Solves the quadratic worst case performance of freeing megablocks that
was described in issue #19897.
During GC runs, we now keep a secondary free list for megablocks that is
neither sorted, nor coalesced. That way, free becomes an O(1) operation
at the expense of not being able to reuse memory for larger allocations.
At the end of a GC run, the secondary free list is sorted and then
merged into the actual free list in a single pass.
That way, our worst case performance is O(n log(n)) rather than O(n^2).
We postulate that temporarily losing coalescense during a single GC run
won't have any adverse effects in practice because:
- We would need to release enough memory during the GC, and then after
that (but within the same GC run) allocate a megablock group of more
than one megablock. This seems unlikely, as large objects are not
copied during GC, and so we shouldn't need such large allocations
during a GC run.
- Allocations of megablock groups of more than one megablock are rare.
They only happen when a single heap object is large enough to require
that amount of space. Any allocation areas that are supposed to hold
more than one heap object cannot use megablock groups, because only
the first megablock of a megablock group has valid `bdescr`s. Thus,
heap object can only start in the first megablock of a group, not in
later ones.
|
|
|
|
| |
This fixes various violations of the newly-added RTS includes linter.
|
|
|
|
|
|
|
|
| |
Previously `markCAFs` would call `markObjectCode` even in non-major GCs.
This is problematic since `prepareUnloadCheck` is not called in such
GCs, meaning that the section index has not been updated.
Fixes #21254
|
|
|
|
|
|
|
|
| |
Previously we failed to untag the function closure when scavenging the
payload of a PAP, resulting in an invalid closure pointer being passed
to scavenge_large_bitmap and consequently #21254. Fix this.
Fixes #21254
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Here we refactor WinIO's IO completion scheme, squashing a memory leak
and fixing #18382.
To fix #18382 we drop the special thread status introduced for IoPort
blocking, BlockedOnIoCompletion, as well as drop the non-threaded RTS's
special dead-lock detection logic (which is redundant to the GC's
deadlock detection logic), as proposed in #20947.
Previously WinIO relied on foreign import ccall "wrapper" to create an
adjustor thunk which can be attached to the OVERLAPPED structure passed
to the operating system. It would then use foreign import ccall
"dynamic" to back out the original continuation from the adjustor. This
roundtrip is significantly more expensive than the alternative, using a
StablePtr. Furthermore, the implementation let the adjustor leak,
meaning that every IO request would leak a page of memory.
Fixes T18382.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Despite the documented care having been taken, several bugs are fixed here.
When run with -qn1, when a SYNC_GC_PAR is requested we will have
n_gc_threads == n_capabilities && n_gc_idle_threads == (n_gc_threads - 1)
In this case we now:
* Don't increment par_collections
* Don't increment par_balanced_copied
* Don't emit debug traces for idle threads
* Take the fast path in scavenge_until_all_done, wakeup_gc_threads, and
shutdown_gc_threads.
Some ASSERTs have also been tightened.
Fixes #19685
|
|
|
|
| |
comparison (#20088)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously `markCAFs` would only evacuate CAFs' indirectees. This would
allow reachable object code to be unloaded by the linker as `evacuate`
may never be called on the CAF itself, despite it being reachable via
the `{dyn,revertible}_caf_list`s.
To fix this we teach `markCAFs` to explicit call `markObjectCode`,
ensuring that the linker is aware of objects reachable via the CAF
lists.
Fixes #20649.
|
| |
|
|
|
|
| |
These functions really do no marking; they merely trace pointers.
|
|
|
|
|
| |
We need to ensure that the TRecChunk itself is marked, in addition to
the TRecs it contains.
|
|
|
|
|
|
|
|
|
| |
The space requirements of the non-moving gc are comparable to the
compacting gc, not the copying gc.
The copying gc requires a much larger overhead.
Fixes #20475
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While the thread ids had been changed to 64 bit words in
e57b7cc6d8b1222e0939d19c265b51d2c3c2b4c0 the return type of the foreign
import function used to retrieve these ids - namely
'GHC.Conc.Sync.getThreadId' - was never updated accordingly.
In order to fix that this function returns now a 'CUULong'.
In addition to that the types used in the thread labeling subsystem were
adjusted as well and several format strings were modified throughout the
whole RTS to display thread ids in a consistent and correct way.
Fixes #16761
|
|
|
|
|
|
|
|
|
|
|
| |
Previously PerformPut failed to respect the non-moving collector's
snapshot invariant, hiding references to an MVar and its new value by
overwriting a stack frame without dirtying the stack. Fix this.
PerformTake exhibited a similar bug, failing to dirty (and therefore
mark) the blocked stack before mutating it.
Closes #20399.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
allocateForCompact() is called when the current allocation for the
compact region does not fit in the nursery. It previously had a special
case for objects exceeding the large object threshold. In that case, it
would allocate a new compact region block just for that object. That led
to a lot of small blocks being allocated in compact regions with a
larger default block size (`autoBlockW`).
This commit removes this special case because having a lot of small
compact region blocks contributes significantly to memory fragmentation.
The removal should be valid because
- a more generic case for allocating a new compact region block follows
at the end of allocateForCompact(), and that one takes `autoBlockW`
into account
- the reason for allocating separate blocks for large objects in the
main heap seems to be to avoid copying during GCs, but once inside
the compact region, the object will never be copied anyway.
Fixes #18757.
A regression test T18757 was added.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to make the packages in this repo "reinstallable", we need to
associate source code with a specific packages. Having a top level
`/includes` dir that mixes concerns (which packages' includes?) gets in
the way of this.
To start, I have moved everything to `rts/`, which is mostly correct.
There are a few things however that really don't belong in the rts (like
the generated constants haskell type, `CodeGen.Platform.h`). Those
needed to be manually adjusted.
Things of note:
- No symlinking for sake of windows, so we hard-link at configure time.
- `CodeGen.Platform.h` no longer as `.hs` extension (in addition to
being moved to `compiler/`) so as not to confuse anyone, since it is
next to Haskell files.
- Blanket `-Iincludes` is gone in both build systems, include paths now
more strictly respect per-package dependencies.
- `deriveConstants` has been taught to not require a `--target-os` flag
when generating the platform-agnostic Haskell type. Make takes
advantage of this, but Hadrian has yet to.
|