| author | Ben Gamari <ben@smart-cactus.org> | 2022-07-21 11:36:59 -0400 |
|---|---|---|
| committer | Douglas Wilson <douglas.wilson@gmail.com> | 2022-07-27 11:29:56 +0100 |
| commit | 094221ef9e5e0c50d16fafce6deffcdd72234b92 (patch) | |
| tree | 4cceeaf935bac3d40d8cbfe7a3f16ef55585ca9d | |
| parent | 4c678ffb917ba3b88c46f7085fe0811b65acedb5 (diff) | |
| download | haskell-094221ef9e5e0c50d16fafce6deffcdd72234b92.tar.gz | |
rts/nonmoving: Don't scavenge objects which weren't evacuated
This fixes a rather subtle bug in the logic responsible for scavenging
objects evacuated to the non-moving generation. In particular, objects
can be allocated into the non-moving generation in two ways:
a. evacuation out of from-space by the garbage collector
b. direct allocation by the mutator
Like all evacuation, objects moved by (a) must be scavenged, since they
may contain references to other objects located in from-space. To
accomplish this we have the following scheme:
* each nonmoving segment's block descriptor has a scan pointer which
points to the first object which has yet to be scavenged
* the GC tracks a set of "todo" segments which have pending scavenging
work
* to scavenge a segment, we scavenge each of the unmarked blocks
between the scan pointer and segment's `next_free` pointer.
We skip marked blocks since we know the allocator wouldn't have
allocated into marked blocks (since they contain presumably live
data).
We can stop at `next_free` since, by
definition, the GC could not have evacuated any objects to blocks
above `next_free` (otherwise `next_free` wouldn't be the first free
block). A sketch of this scan loop follows the list.
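
To make this concrete, here is a minimal C sketch of the scheme as it stood before this patch. It is illustrative only: `Segment`, `scavenge_block`, and `SEG_BLOCKS` are simplified stand-ins for the RTS's `NonmovingSegment`, per-object scavenging, and real segment geometry.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEG_BLOCKS 64            /* hypothetical block count per segment */

typedef struct {
    uint8_t  bitmap[SEG_BLOCKS]; /* nonzero = marked live by a prior GC  */
    unsigned next_free;          /* index of the first unallocated block */
    unsigned scan;               /* index of the first unscavenged block */
    bool     on_todo_list;       /* stands in for `todo_link != NULL`    */
} Segment;

/* Stand-in for scavenging one object: trace its fields, evacuating any
 * from-space references (body elided). */
static void scavenge_block(Segment *seg, unsigned idx)
{
    (void)seg; (void)idx;
}

/* Pre-patch scheme: scavenge every *unmarked* block between `scan` and
 * `next_free`. Marked blocks are skipped because the allocator never
 * allocates into them; blocks at or above `next_free` cannot have been
 * evacuated into, by definition of `next_free`. */
static void scavenge_segment(Segment *seg)
{
    while (seg->scan < seg->next_free) {
        if (!seg->bitmap[seg->scan])
            scavenge_block(seg, seg->scan);
        seg->scan++;             /* pre-patch: bump `scan` per block */
    }
}
```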
However, this neglected to consider objects allocated by path (b).
In short, the problem is that objects directly allocated by the mutator
may become unreachable (but not swept, since the containing segment is
not yet full), at which point they may contain references to swept objects.
Specifically, we observed this in #21885 in the following way (a toy code replay follows the list):
1. the mutator (specifically in #21885, a `lockCAF`) allocates an object
(specifically a blackhole, which here we will call `blkh`; see Note
[Static objects under the nonmoving collector] for the reason why) on
the non-moving heap. The bitmap of the allocated block remains 0
(since allocation doesn't affect the bitmap) and `next_free` of the
containing segment (which we will call `blkh_seg`) is advanced.
2. We enter the blackhole, evaluating it to produce a result
(specifically a cons cell) in the nursery
3. The blackhole gets updated into an indirection pointing to the cons
cell; it is pushed to the generational remembered set
4. we perform a GC; the cons cell is evacuated into the nonmoving heap
(into segment `cons_seg`)
5. the cons cell is marked
6. the GC concludes
7. the CAF and blackhole become unreachable
8. `cons_seg` is filled
9. we start another GC; the cons cell is swept
10. we start a new GC
11. something is evacuated into `blkh_seg`, adding it to the "todo" list
12. we attempt to scavenge `blkh_seg` (namely, all unmarked blocks
between `scan` and `next_free`, which includes `blkh`). We attempt to
evacuate `blkh`'s indirectee, which is the previously-swept cons cell.
This is unsafe, since the indirectee is no longer a valid heap
object.
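
The following toy replay (reusing the simplified `Segment` sketch above; step numbers refer to the list, and all indices and helper names are illustrative, not RTS code) shows how the scan region ends up straddling the stale allocation:

```c
#include <assert.h>

/* (b) direct allocation by the mutator: the bitmap stays 0 and only
 * `next_free` advances. */
static unsigned mutator_allocate(Segment *seg)
{
    return seg->next_free++;
}

int main(void)
{
    Segment blkh_seg = {0};

    unsigned blkh = mutator_allocate(&blkh_seg);  /* step 1 */

    /* Steps 2-9 happen elsewhere: `blkh` is updated to an indirection,
     * its indirectee is evacuated, marked, and later swept; `blkh`
     * becomes unreachable but is never swept, since `blkh_seg` is not
     * yet full. */

    unsigned evac = blkh_seg.next_free++;         /* steps 10-11 */

    /* Step 12: [scan, next_free) covers both the fresh evacuation and
     * the stale, unmarked `blkh`, whose indirectee is already gone, so
     * scavenging it would chase a dangling pointer. */
    assert(blkh_seg.scan <= blkh && blkh < blkh_seg.next_free);
    assert(blkh_seg.bitmap[blkh] == 0);
    (void)evac;
    return 0;
}
```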
The problem here was that the scavenging logic *assumed* that (a) was
the only source of allocations into the non-moving heap and therefore
*all* unmarked blocks between `scan` and `next_free` were evacuated.
However, due to (b) this is not true.
The solution is to ensure that the scanned region only encompasses
the region of objects allocated during evacuation. We do this by
updating `scan`, as we push the segment onto the todo-segment list, to
point to the block which was just evacuated into.
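
Continuing the simplified sketch, the fix lands in the GC's evacuation-side allocation path; `todo_segs` here is a hypothetical stand-in for the per-GC-thread `todo_seg` list:

```c
static Segment *todo_segs[16];  /* toy todo list; the RTS links segments */
static unsigned n_todo;

/* Evacuation-side allocation into the non-moving heap. */
static unsigned evacuate_into(Segment *seg)
{
    unsigned blk = seg->next_free++;
    if (!seg->on_todo_list) {
        seg->on_todo_list = true;
        todo_segs[n_todo++] = seg;
        /* The fix: start the scan at the block we just evacuated into,
         * rather than wherever `scan` was left, which may cover stale
         * mutator allocations made since the last GC. */
        seg->scan = blk;
    }
    return blk;
}
```

This mirrors the two lines added to `rts/sm/Evac.c` in the diff below, where `seg_bd->u.scan` is set to `to` as the segment is pushed onto `ws->todo_seg`.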
Doing this required changing the nonmoving scavenging implementation's
update of the `scan` pointer to bump it *once*, instead of after
scavenging each block as was done previously. This is because we may end
up evacuating into the segment being scavenged as we scavenge it. This
was quite tricky to discover but the result is quite simple,
demonstrating yet again that global mutable state should be used
exceedingly sparingly.
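
In sketch form (again with the simplified `Segment`), the reworked loop reads `scan` once, bumps the segment's scan pointer to the end of the region up front, and then walks a local cursor; this mirrors `scavengeNonmovingSegment` in the diff below:

```c
/* Post-patch loop: seg->scan is moved *before* scavenging, because
 * scavenge_block() may evacuate back into this very segment, re-adding
 * it to the todo list and resetting seg->scan (see evacuate_into). */
static void scavenge_segment_fixed(Segment *seg)
{
    unsigned scan     = seg->scan;
    unsigned scan_end = seg->next_free;
    if (scan == scan_end)
        return;

    seg->scan = scan_end;        /* bump once, up front */

    for (unsigned i = scan; i < scan_end; i++) {
        if (!seg->bitmap[i])
            scavenge_block(seg, i);
    }
}
```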
Fixes #21885
(cherry picked from commit 0b27ea23efcb08639309293faf13fdfef03f1060)
(cherry picked from commit 54a5c32d9d0792d8f3f80728edf6f39db5321fe5)
-rw-r--r-- | rts/sm/Evac.c          |  3 |
-rw-r--r-- | rts/sm/NonMoving.c     |  4 |
-rw-r--r-- | rts/sm/NonMovingScav.c | 91 |

3 files changed, 93 insertions, 5 deletions
diff --git a/rts/sm/Evac.c b/rts/sm/Evac.c
index 834df459b4..e427386f5c 100644
--- a/rts/sm/Evac.c
+++ b/rts/sm/Evac.c
@@ -69,6 +69,7 @@ alloc_in_nonmoving_heap (uint32_t size)
     gct->copied += size;
     StgPtr to = nonmovingAllocate(gct->cap, size);
 
+    // See Note [Scavenging the non-moving heap] in NonMovingScav.c.
     // Add segment to the todo list unless it's already there
     // current->todo_link == NULL means not in todo list
     struct NonmovingSegment *seg = nonmovingGetSegment(to);
@@ -76,6 +77,8 @@ alloc_in_nonmoving_heap (uint32_t size)
         gen_workspace *ws = &gct->gens[oldest_gen->no];
         seg->todo_link = ws->todo_seg;
         ws->todo_seg = seg;
+        bdescr *seg_bd = Bdescr((StgPtr) seg);
+        seg_bd->u.scan = to;
     }
 
     // The object which refers to this closure may have been aged (i.e.
diff --git a/rts/sm/NonMoving.c b/rts/sm/NonMoving.c
index ad140ff495..0e601026ee 100644
--- a/rts/sm/NonMoving.c
+++ b/rts/sm/NonMoving.c
@@ -197,6 +197,10 @@ Mutex concurrent_coll_finished_lock;
  * - Note [Concurrent non-moving collection] (NonMoving.c) describes
  *   concurrency control of the nonmoving collector
  *
+ * - Note [Scavenging the non-moving heap] (NonMovingScav.c) describes
+ *   how data is scavenged after having been promoted into the non-moving
+ *   heap.
+ *
  * - Note [Live data accounting in nonmoving collector] (NonMoving.c)
  *   describes how we track the quantity of live data in the nonmoving
  *   generation.
diff --git a/rts/sm/NonMovingScav.c b/rts/sm/NonMovingScav.c
index 56ebe5ffe4..b9e04ae527 100644
--- a/rts/sm/NonMovingScav.c
+++ b/rts/sm/NonMovingScav.c
@@ -10,6 +10,79 @@
 #include "Printer.h"
 #include "MarkWeak.h" // scavengeLiveWeak
 
+/*
+ * Note [Scavenging the non-moving heap]
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ * The moving GC promotes objects from the moving heap into the non-moving
+ * heap via evacuation and subsequent scavenging. This of course raises the
+ * question of how to keep track of which objects still require scavenging.
+ * The story here is as follows:
+ *
+ * - each GC thread maintains a list, `todo_seg`, of "todo" segments which
+ *   still have outstanding scavenging work
+ * - similar to the moving collector, we use the scan pointer of a segment's
+ *   block descriptor to track the first yet-to-be-scavenged object.
+ * - the first time we evacuate into a segment during a GC, we push the
+ *   segment onto the `todo_seg` list
+ * - scavenge_find_work checks `todo_seg` as a source of scavenging work
+ *
+ * The scan pointer requires very careful treatment here. Specifically, we
+ * must only scavenge objects which we evacuated in the current GC to avoid
+ * issues like #21885. In particular, objects can be allocated into the
+ * non-moving generation in two ways:
+ *
+ * a. evacuation out of from-space by the garbage collector
+ * b. direct allocation by the mutator
+ *
+ * Like all evacuation, objects moved by (a) must be scavenged, since they
+ * may contain references to other objects located in from-space.
+ * However, we must not neglect to consider objects allocated by path (b).
+ * In short, the problem is that objects directly allocated by the mutator
+ * may become unreachable (but not swept, since the containing segment is
+ * not yet full), at which point they may contain references to swept
+ * objects. Specifically, we observed this in #21885 in the following way:
+ *
+ * 1. the mutator (specifically in #21885, a `lockCAF`) allocates an object
+ *    (specifically a blackhole, which here we will call `blkh`; see Note
+ *    [Static objects under the nonmoving collector] for the reason why) on
+ *    the non-moving heap. The bitmap of the allocated block remains 0
+ *    (since allocation doesn't affect the bitmap) and `next_free` of the
+ *    containing segment (which we will call `blkh_seg`) is advanced.
+ * 2. We enter the blackhole, evaluating it to produce a result
+ *    (specifically a cons cell) in the nursery
+ * 3. The blackhole gets updated into an indirection pointing to the cons
+ *    cell; it is pushed to the generational remembered set
+ * 4. we perform a GC; the cons cell is evacuated into the nonmoving heap
+ *    (into segment `cons_seg`)
+ * 5. the cons cell is marked
+ * 6. the GC concludes
+ * 7. the CAF and blackhole become unreachable
+ * 8. `cons_seg` is filled
+ * 9. we start another GC; the cons cell is swept
+ * 10. we start a new GC
+ * 11. something is evacuated into `blkh_seg`, adding it to the "todo" list
+ * 12. we attempt to scavenge `blkh_seg` (namely, all unmarked blocks
+ *     between `scan` and `next_free`, which includes `blkh`). We attempt
+ *     to evacuate `blkh`'s indirectee, which is the previously-swept cons
+ *     cell. This is unsafe, since the indirectee is no longer a valid heap
+ *     object.
+ *
+ * The problem here was that the scavenging logic previously assumed that
+ * (a) was the only source of allocations into the non-moving heap and
+ * therefore *all* unmarked blocks between `scan` and `next_free` were
+ * evacuated. However, due to (b) this is not true, since the scan pointer
+ * was only updated (1) when the segment was initialized (to point to block
+ * 0), and (2) when an object is scavenged (by advancing it to the next
+ * block). Consequently, at the beginning of a GC `scan` may point to a
+ * block which was allocated by the mutator since the last GC.
+ *
+ * The solution is to ensure that the scanned region only encompasses the
+ * region of objects allocated for evacuation during the present GC. We do
+ * this by updating `scan`, as we push the segment to the todo-segment
+ * list, to point to the block which was evacuated into.
+ */
+
 void
 nonmovingScavengeOne (StgClosure *q)
 {
@@ -383,13 +456,21 @@ scavengeNonmovingSegment (struct NonmovingSegment *seg)
     ASSERT(seg_block->u.scan >= (P_)nonmovingSegmentGetBlock(seg, 0));
     ASSERT(seg_block->u.scan <= (P_)nonmovingSegmentGetBlock(seg, seg->next_free));
 
+    StgPtr scan = seg_block->u.scan;
     StgPtr scan_end = (P_)nonmovingSegmentGetBlock(seg, seg->next_free);
-    if (seg_block->u.scan == scan_end)
+    if (scan == scan_end)
         return;
 
-    nonmoving_block_idx p_idx = nonmovingGetBlockIdx(seg_block->u.scan);
-    while (seg_block->u.scan < scan_end) {
-        StgClosure *p = (StgClosure*)seg_block->u.scan;
+    // At this point the segment is not on the todo_seg list. Consequently,
+    // we may need to re-add it during scavenging (as we may evacuate a new
+    // object to this segment); this has the effect of updating the scan
+    // pointer. We must therefore take care to move the scan pointer to the
+    // end of the scanned region *before* doing any scavenging.
+    seg_block->u.scan = scan_end;
+
+    nonmoving_block_idx p_idx = nonmovingGetBlockIdx(scan);
+    while (scan < scan_end) {
+        StgClosure *p = (StgClosure*)scan;
 
         // bit set = was allocated in a previous GC, no need to scavenge
         // bit not set = new allocation, so scavenge
@@ -397,7 +478,7 @@ scavengeNonmovingSegment (struct NonmovingSegment *seg)
             nonmovingScavengeOne(p);
         }
 
+        scan = (StgPtr) ((uint8_t*) scan + blk_size);
         p_idx++;
-        seg_block->u.scan = (P_)(((uint8_t*)seg_block->u.scan) + blk_size);
     }
 }