summaryrefslogtreecommitdiff
path: root/includes/rts
diff options
context:
space:
mode:
authorBen Gamari <ben@smart-cactus.org>2019-10-23 14:01:45 -0400
committerBen Gamari <ben@smart-cactus.org>2019-10-23 14:56:46 -0400
commit7f72b540288bbdb32a6750dd64b9d366501ed10c (patch)
tree438203c9c0b052fb65210b5e89acfa7b1d44d5b8 /includes/rts
parent8abddac870d4b49f77b5ce56bfeb68328dd0d651 (diff)
parent984745b074c186f6058730087a4fc8156240ec76 (diff)
downloadhaskell-7f72b540288bbdb32a6750dd64b9d366501ed10c.tar.gz
Merge non-moving garbage collector
This introduces a concurrent mark & sweep garbage collector to manage the old generation. The concurrent nature of this collector typically results in significantly reduced maximum and mean pause times in applications with large working sets. Due to the large and intricate nature of the change I have opted to preserve the fully-buildable history, including merge commits, which is described in the "Branch overview" section below. Collector design ================ The full design of the collector implemented here is described in detail in a technical note > B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell > Compiler" (2018) This document can be requested from @bgamari. The basic heap structure used in this design is heavily inspired by > K. Ueno & A. Ohori. "A fully concurrent garbage collector for > functional programs on multicore processors." /ACM SIGPLAN Notices/ > Vol. 51. No. 9 (presented at ICFP 2016) This design is intended to allow both marking and sweeping concurrent to execution of a multi-core mutator. Unlike the Ueno design, which requires no global synchronization pauses, the collector introduced here requires a stop-the-world pause at the beginning and end of the mark phase. To avoid heap fragmentation, the allocator consists of a number of fixed-size /sub-allocators/. Each of these sub-allocators allocators into its own set of /segments/, themselves allocated from the block allocator. Each segment is broken into a set of fixed-size allocation blocks (which back allocations) in addition to a bitmap (used to track the liveness of blocks) and some additional metadata (used also used to track liveness). This heap structure enables collection via mark-and-sweep, which can be performed concurrently via a snapshot-at-the-beginning scheme (although concurrent collection is not implemented in this patch). Implementation structure ======================== The majority of the collector is implemented in a handful of files: * `rts/Nonmoving.c` is the heart of the beast. It implements the entry-point to the nonmoving collector (`nonmoving_collect`), as well as the allocator (`nonmoving_allocate`) and a number of utilities for manipulating the heap. * `rts/NonmovingMark.c` implements the mark queue functionality, update remembered set, and mark loop. * `rts/NonmovingSweep.c` implements the sweep loop. * `rts/NonmovingScav.c` implements the logic necessary to scavenge the nonmoving heap. Branch overview =============== ``` * wip/gc/opt-pause: | A variety of small optimisations to further reduce pause times. | * wip/gc/compact-nfdata: | Introduce support for compact regions into the non-moving |\ collector | \ | \ | | * wip/gc/segment-header-to-bdescr: | | | Another optimization that we are considering, pushing | | | some segment metadata into the segment descriptor for | | | the sake of locality during mark | | | | * | wip/gc/shortcutting: | | | Support for indirection shortcutting and the selector optimization | | | in the non-moving heap. | | | * | | wip/gc/docs: | |/ Work on implementation documentation. | / |/ * wip/gc/everything: | A roll-up of everything below. |\ | \ | |\ | | \ | | * wip/gc/optimize: | | | A variety of optimizations, primarily to the mark loop. | | | Some of these are microoptimizations but a few are quite | | | significant. In particular, the prefetch patches have | | | produced a nontrivial improvement in mark performance. | | | | | * wip/gc/aging: | | | Enable support for aging in major collections. | | | | * | wip/gc/test: | | | Fix up the testsuite to more or less pass. | | | * | | wip/gc/instrumentation: | | | A variety of runtime instrumentation including statistics | | / support, the nonmoving census, and eventlog support. | |/ | / |/ * wip/gc/nonmoving-concurrent: | The concurrent write barriers. | * wip/gc/nonmoving-nonconcurrent: | The nonmoving collector without the write barriers necessary | for concurrent collection. | * wip/gc/preparation: | A merge of the various preparatory patches that aren't directly | implementing the GC. | | * GHC HEAD . . . ```
Diffstat (limited to 'includes/rts')
-rw-r--r--includes/rts/EventLogFormat.h11
-rw-r--r--includes/rts/Flags.h11
-rw-r--r--includes/rts/NonMoving.h43
-rw-r--r--includes/rts/storage/Block.h42
-rw-r--r--includes/rts/storage/ClosureMacros.h14
-rw-r--r--includes/rts/storage/Closures.h2
-rw-r--r--includes/rts/storage/GC.h10
-rw-r--r--includes/rts/storage/InfoTables.h2
-rw-r--r--includes/rts/storage/TSO.h58
9 files changed, 177 insertions, 16 deletions
diff --git a/includes/rts/EventLogFormat.h b/includes/rts/EventLogFormat.h
index 7b989b014b..d5ed01a864 100644
--- a/includes/rts/EventLogFormat.h
+++ b/includes/rts/EventLogFormat.h
@@ -185,12 +185,21 @@
#define EVENT_USER_BINARY_MSG 181
+#define EVENT_CONC_MARK_BEGIN 200
+#define EVENT_CONC_MARK_END 201
+#define EVENT_CONC_SYNC_BEGIN 202
+#define EVENT_CONC_SYNC_END 203
+#define EVENT_CONC_SWEEP_BEGIN 204
+#define EVENT_CONC_SWEEP_END 205
+#define EVENT_CONC_UPD_REM_SET_FLUSH 206
+#define EVENT_NONMOVING_HEAP_CENSUS 207
+
/*
* The highest event code +1 that ghc itself emits. Note that some event
* ranges higher than this are reserved but not currently emitted by ghc.
* This must match the size of the EventDesc[] array in EventLog.c
*/
-#define NUM_GHC_EVENT_TAGS 182
+#define NUM_GHC_EVENT_TAGS 208
#if 0 /* DEPRECATED EVENTS: */
/* we don't actually need to record the thread, it's implicit */
diff --git a/includes/rts/Flags.h b/includes/rts/Flags.h
index b3caf13c1f..f27ce23b0b 100644
--- a/includes/rts/Flags.h
+++ b/includes/rts/Flags.h
@@ -52,6 +52,9 @@ typedef struct _GC_FLAGS {
double oldGenFactor;
double pcFreeHeap;
+ bool useNonmoving; // default = false
+ bool nonmovingSelectorOpt; // Do selector optimization in the
+ // non-moving heap, default = false
uint32_t generations;
bool squeezeUpdFrames;
@@ -95,6 +98,7 @@ typedef struct _DEBUG_FLAGS {
bool weak; /* 'w' */
bool gccafs; /* 'G' */
bool gc; /* 'g' */
+ bool nonmoving_gc; /* 'n' */
bool block_alloc; /* 'b' */
bool sanity; /* 'S' warning: might be expensive! */
bool zero_on_gc; /* 'Z' */
@@ -168,6 +172,7 @@ typedef struct _TRACE_FLAGS {
bool timestamp; /* show timestamp in stderr output */
bool scheduler; /* trace scheduler events */
bool gc; /* trace GC events */
+ bool nonmoving_gc; /* trace nonmoving GC events */
bool sparks_sampled; /* trace spark events by a sampled method */
bool sparks_full; /* trace spark events 100% accurately */
bool user; /* trace user events (emitted from Haskell code) */
@@ -268,7 +273,11 @@ typedef struct _RTS_FLAGS {
#if defined(COMPILING_RTS_MAIN)
extern DLLIMPORT RTS_FLAGS RtsFlags;
#elif IN_STG_CODE
-/* Hack because the C code generator can't generate '&label'. */
+/* Note [RtsFlags is a pointer in STG code]
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ * When compiling with IN_STG_CODE the RtsFlags symbol is defined as a pointer.
+ * This is necessary because the C code generator can't generate '&label'.
+ */
extern RTS_FLAGS RtsFlags[];
#else
extern RTS_FLAGS RtsFlags;
diff --git a/includes/rts/NonMoving.h b/includes/rts/NonMoving.h
new file mode 100644
index 0000000000..314c582a1e
--- /dev/null
+++ b/includes/rts/NonMoving.h
@@ -0,0 +1,43 @@
+/* -----------------------------------------------------------------------------
+ *
+ * (c) The GHC Team, 2018-2019
+ *
+ * Non-moving garbage collector
+ *
+ * Do not #include this file directly: #include "Rts.h" instead.
+ *
+ * To understand the structure of the RTS headers, see the wiki:
+ * http://ghc.haskell.org/trac/ghc/wiki/Commentary/SourceTree/Includes
+ *
+ * -------------------------------------------------------------------------- */
+
+#pragma once
+
+// Forward declaration for Stg.h
+struct StgClosure_;
+struct StgThunk_;
+struct Capability_;
+
+/* This is called by the code generator */
+extern DLL_IMPORT_RTS
+void updateRemembSetPushClosure_(StgRegTable *reg, struct StgClosure_ *p);
+
+extern DLL_IMPORT_RTS
+void updateRemembSetPushThunk_(StgRegTable *reg, struct StgThunk_ *p);
+
+// Forward declaration for unregisterised backend.
+EF_(stg_copyArray_barrier);
+
+// Note that RTS code should not condition on this directly by rather
+// use the IF_NONMOVING_WRITE_BARRIER_ENABLED macro to ensure that
+// the barrier is eliminated in the non-threaded RTS.
+extern StgWord DLL_IMPORT_DATA_VAR(nonmoving_write_barrier_enabled);
+
+// A similar macro is defined in includes/Cmm.h for C-- code.
+#if defined(THREADED_RTS)
+#define IF_NONMOVING_WRITE_BARRIER_ENABLED \
+ if (RTS_UNLIKELY(nonmoving_write_barrier_enabled))
+#else
+#define IF_NONMOVING_WRITE_BARRIER_ENABLED \
+ if (0)
+#endif
diff --git a/includes/rts/storage/Block.h b/includes/rts/storage/Block.h
index ecd6bf5dd8..4afc3689cb 100644
--- a/includes/rts/storage/Block.h
+++ b/includes/rts/storage/Block.h
@@ -88,15 +88,23 @@ typedef struct bdescr_ {
StgPtr start; // [READ ONLY] start addr of memory
- StgPtr free; // First free byte of memory.
- // allocGroup() sets this to the value of start.
- // NB. during use this value should lie
- // between start and start + blocks *
- // BLOCK_SIZE. Values outside this
- // range are reserved for use by the
- // block allocator. In particular, the
- // value (StgPtr)(-1) is used to
- // indicate that a block is unallocated.
+ union {
+ StgPtr free; // First free byte of memory.
+ // allocGroup() sets this to the value of start.
+ // NB. during use this value should lie
+ // between start and start + blocks *
+ // BLOCK_SIZE. Values outside this
+ // range are reserved for use by the
+ // block allocator. In particular, the
+ // value (StgPtr)(-1) is used to
+ // indicate that a block is unallocated.
+ //
+ // Unused by the non-moving allocator.
+ struct NonmovingSegmentInfo {
+ StgWord8 log_block_size;
+ StgWord16 next_free_snap;
+ } nonmoving_segment;
+ };
struct bdescr_ *link; // used for chaining blocks together
@@ -141,7 +149,8 @@ typedef struct bdescr_ {
#define BF_LARGE 2
/* Block is pinned */
#define BF_PINNED 4
-/* Block is to be marked, not copied */
+/* Block is to be marked, not copied. Also used for marked large objects in
+ * non-moving heap. */
#define BF_MARKED 8
/* Block is executable */
#define BF_EXEC 32
@@ -153,6 +162,12 @@ typedef struct bdescr_ {
#define BF_SWEPT 256
/* Block is part of a Compact */
#define BF_COMPACT 512
+/* A non-moving allocator segment (see NonMoving.c) */
+#define BF_NONMOVING 1024
+/* A large object which has been moved to off of oldest_gen->large_objects and
+ * onto nonmoving_large_objects. The mark phase ignores objects which aren't
+ * so-flagged */
+#define BF_NONMOVING_SWEEPING 2048
/* Maximum flag value (do not define anything higher than this!) */
#define BF_FLAG_MAX (1 << 15)
@@ -290,6 +305,13 @@ EXTERN_INLINE bdescr* allocBlock(void)
bdescr *allocGroupOnNode(uint32_t node, W_ n);
+// Allocate n blocks, aligned at n-block boundary. The returned bdescr will
+// have this invariant
+//
+// bdescr->start % BLOCK_SIZE*n == 0
+//
+bdescr *allocAlignedGroupOnNode(uint32_t node, W_ n);
+
EXTERN_INLINE bdescr* allocBlockOnNode(uint32_t node);
EXTERN_INLINE bdescr* allocBlockOnNode(uint32_t node)
{
diff --git a/includes/rts/storage/ClosureMacros.h b/includes/rts/storage/ClosureMacros.h
index a3873cc49d..2af50863d0 100644
--- a/includes/rts/storage/ClosureMacros.h
+++ b/includes/rts/storage/ClosureMacros.h
@@ -107,6 +107,20 @@ INLINE_HEADER const StgConInfoTable *get_con_itbl(const StgClosure *c)
return CON_INFO_PTR_TO_STRUCT((c)->header.info);
}
+/* Used when we expect another thread to be mutating the info table pointer of
+ * a closure (e.g. when busy-waiting on a WHITEHOLE).
+ */
+INLINE_HEADER const StgInfoTable *get_volatile_itbl(StgClosure *c) {
+ // The volatile here is import to ensure that the compiler does not
+ // optimise away multiple loads, e.g. in a busy-wait loop. Note that
+ // we can't use VOLATILE_LOAD here as the casts result in strict aliasing
+ // rule violations and this header may be compiled outside of the RTS
+ // (where we use -fno-strict-aliasing).
+ StgInfoTable * *volatile p = (StgInfoTable * *volatile) &c->header.info;
+ return INFO_PTR_TO_STRUCT(*p);
+}
+
+
INLINE_HEADER StgHalfWord GET_TAG(const StgClosure *con)
{
return get_itbl(con)->srt;
diff --git a/includes/rts/storage/Closures.h b/includes/rts/storage/Closures.h
index 6088fc8a10..b2b5eda407 100644
--- a/includes/rts/storage/Closures.h
+++ b/includes/rts/storage/Closures.h
@@ -94,7 +94,7 @@ typedef struct StgClosure_ {
struct StgClosure_ *payload[];
} *StgClosurePtr; // StgClosure defined in rts/Types.h
-typedef struct {
+typedef struct StgThunk_ {
StgThunkHeader header;
struct StgClosure_ *payload[];
} StgThunk;
diff --git a/includes/rts/storage/GC.h b/includes/rts/storage/GC.h
index 1571975852..7931433019 100644
--- a/includes/rts/storage/GC.h
+++ b/includes/rts/storage/GC.h
@@ -234,15 +234,23 @@ void setKeepCAFs (void);
and is put on the mutable list.
-------------------------------------------------------------------------- */
-void dirty_MUT_VAR(StgRegTable *reg, StgClosure *p);
+void dirty_MUT_VAR(StgRegTable *reg, StgMutVar *mv, StgClosure *old);
/* set to disable CAF garbage collection in GHCi. */
/* (needed when dynamic libraries are used). */
extern bool keepCAFs;
+#include "rts/Flags.h"
+
INLINE_HEADER void initBdescr(bdescr *bd, generation *gen, generation *dest)
{
bd->gen = gen;
bd->gen_no = gen->no;
bd->dest_no = dest->no;
+
+#if !IN_STG_CODE
+ /* See Note [RtsFlags is a pointer in STG code] */
+ ASSERT(gen->no < RtsFlags.GcFlags.generations);
+ ASSERT(dest->no < RtsFlags.GcFlags.generations);
+#endif
}
diff --git a/includes/rts/storage/InfoTables.h b/includes/rts/storage/InfoTables.h
index 4de5207b4d..b97e12982b 100644
--- a/includes/rts/storage/InfoTables.h
+++ b/includes/rts/storage/InfoTables.h
@@ -355,7 +355,7 @@ typedef struct StgConInfoTable_ {
*/
#if defined(TABLES_NEXT_TO_CODE)
#define GET_CON_DESC(info) \
- ((const char *)((StgWord)((info)+1) + (info->con_desc)))
+ ((const char *)((StgWord)((info)+1) + ((info)->con_desc)))
#else
#define GET_CON_DESC(info) ((const char *)(info)->con_desc)
#endif
diff --git a/includes/rts/storage/TSO.h b/includes/rts/storage/TSO.h
index 93018581fd..d706282796 100644
--- a/includes/rts/storage/TSO.h
+++ b/includes/rts/storage/TSO.h
@@ -185,10 +185,66 @@ typedef struct StgTSO_ {
} *StgTSOPtr; // StgTSO defined in rts/Types.h
+/* Note [StgStack dirtiness flags and concurrent marking]
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ *
+ * Without concurrent collection by the nonmoving collector the stack dirtiness story
+ * is quite simple: The stack is either STACK_DIRTY (meaning it has been added to mut_list)
+ * or not.
+ *
+ * However, things are considerably more complicated with concurrent collection
+ * (namely, when nonmoving_write_barrier_enabled is set): In addition to adding
+ * the stack to mut_list and flagging it as STACK_DIRTY, we also must ensure
+ * that stacks are marked in accordance with the nonmoving collector's snapshot
+ * invariant. This is: every stack alive at the time the snapshot is taken must
+ * be marked at some point after the moment the snapshot is taken and before it
+ * is mutated or the commencement of the sweep phase.
+ *
+ * This marking may be done by the concurrent mark phase (in the case of a
+ * thread that never runs during the concurrent mark) or by the mutator when
+ * dirtying the stack. However, it is unsafe for the concurrent collector to
+ * traverse the stack while it is under mutation. Consequently, the following
+ * handshake is obeyed by the mutator's write barrier and the concurrent mark to
+ * ensure this doesn't happen:
+ *
+ * 1. The entity seeking to mark first checks that the stack lives in the nonmoving
+ * generation; if not then the stack was not alive at the time the snapshot
+ * was taken and therefore we need not mark it.
+ *
+ * 2. The entity seeking to mark checks the stack's mark bit. If it is set then
+ * no mark is necessary.
+ *
+ * 3. The entity seeking to mark tries to lock the stack for marking by
+ * atomically setting its `marking` field to the current non-moving mark
+ * epoch:
+ *
+ * a. If the mutator finds the concurrent collector has already locked the
+ * stack then it waits until it is finished (indicated by the mark bit
+ * being set) before proceeding with execution.
+ *
+ * b. If the concurrent collector finds that the mutator has locked the stack
+ * then it moves on, leaving the mutator to mark it. There is no need to wait;
+ * the mark is guaranteed to finish before sweep due to the post-mark
+ * synchronization with mutators.
+ *
+ * c. Whoever succeeds in locking the stack is responsible for marking it and
+ * setting the stack's mark bit (either the BF_MARKED bit for large objects
+ * or otherwise its bit in its segment's mark bitmap).
+ *
+ * To ensure that mutation does not proceed until the stack is fully marked the
+ * mark phase must not set the mark bit until it has finished tracing.
+ *
+ */
+
+#define STACK_DIRTY 1
+// used by sanity checker to verify that all dirty stacks are on the mutable list
+#define STACK_SANE 64
+
typedef struct StgStack_ {
StgHeader header;
StgWord32 stack_size; // stack size in *words*
- StgWord32 dirty; // non-zero => dirty
+ StgWord8 dirty; // non-zero => dirty
+ StgWord8 marking; // non-zero => someone is currently marking the stack
StgPtr sp; // current stack pointer
StgWord stack[];
} StgStack;