delta/haskell.git - gitlab.haskell.org: ghc/ghc.git

	Commit message (Collapse)	Author	Age	Files	Lines
...
\| * \|	TSANUtils: Ensure that C11 atomics are supported	Ben Gamari	2020-10-24	1	-0/+4
\| \|/
\| *	rts: Introduce SET_HDR_RELEASEwip/tsan/prep	Ben Gamari	2020-10-24	1	-2/+8
\| \| \| \| \| \| \| \| \| \|	Also ensure that we also store the info table pointer last to ensure that the synchronization covers all stores.
\| *	rts/ClosureMaros: Use relaxed atomics	Ben Gamari	2020-10-24	1	-10/+13
\| \|
\| *	rts/SpinLock: Move to proper atomics	Ben Gamari	2020-10-24	1	-6/+4
\| \| \| \| \| \| \| \| \| \| \| \|	This is fairly straightforward; we just needed to use relaxed operations for the PROF_SPIN counters and a release store instead of a write barrier.
\| *	rts: Infrastructure for testing with ThreadSanitizer	Ben Gamari	2020-10-24	1	-0/+63
\| \|
* \|	[AArch64] Aarch64 Always PIC	Moritz Angermann	2020-11-06	1	-1/+1
\| \|
* \|	RtsAPI: pause and resume the RTS	David Eichmann	2020-11-02	1	-0/+4
\|/ \| \| \| \| \| \| \| \|	The `rts_pause` and `rts_resume` functions have been added to `RtsAPI.h` and allow an external process to completely pause and resume the RTS. Co-authored-by: Sven Tennie <sven.tennie@gmail.com> Co-authored-by: Matthew Pickering <matthewtpickering@gmail.com> Co-authored-by: Ben Gamari <bgamari.foss@gmail.com>
*	Remove unused global variables	Sylvain Henry	2020-09-30	1	-4/+0
\| \| \| \| \| \| \| \| \| \|	Some removed globals variables were still declared in the RTS. They were removed in the following commits: * 4fc6524a2a4a0003495a96c8b84783286f65c198 * 0dc7985663efa1739aafb480759e2e2e7fca2a36 * bbd3c399939311ec3e308721ab87ca6b9443f358
*	Remove unsafeGlobalDynFlags (#17957, #14597)	Sylvain Henry	2020-09-30	1	-1/+3
\| \| \| \| \|	There are still global variables but only 3 booleans instead of a single DynFlags.
*	rts: Refactor unloading of foreign export StablePtrs	Ben Gamari	2020-09-18	1	-0/+2
\| \| \| \| \| \|	Previously we would allocate a linked list cell for each foreign export. Now we can avoid this by taking advantage of the fact that they are already broken into groups.
*	rts: Refactor foreign export tracking	Ben Gamari	2020-09-18	1	-0/+36
\| \| \| \| \| \| \| \| \|	This avoids calling `libc` in the initializers which are responsible for registering foreign exports. We believe this should avoid the corruption observed in #18548. See Note [Tracking foreign exports] in rts/ForeignExports.c for an overview of the new scheme.
*	winio: update lockfile signature and remove mistaken symbol in rts.	Tamar Christina	2020-07-15	1	-2/+2
\|
*	winio: Expand BlockedOnIOCompletion description.	Andreas Klebinger	2020-07-15	1	-1/+3
\|
*	winio: add a note about file locking in the RTS.	Andreas Klebinger	2020-07-15	1	-0/+16
\|
*	winio: Multiple refactorings and support changes.	Tamar Christina	2020-07-15	2	-8/+17
\|
*	winio: Add new io-manager cmdline options	Tamar Christina	2020-07-15	1	-0/+5
\|
*	winio: Add IOPort synchronization primitive	Tamar Christina	2020-07-15	2	-1/+6
\|
*	winio: Use SlimReaderLocks and ConditonalVariables provided by the OS ↵	Tamar Christina	2020-07-15	1	-42/+35
\| \| \| \|	instead of emulated ones
*	rts: Remove unused GET_ENTRY closure macro	Ömer Sinan Ağacan	2020-06-10	1	-2/+0
\| \| \| \| \|	This macro is not used and got broken in the meantime, as ENTRY_CODE was deleted.
*	Always zero shrunk mutable array slop when profiling	Daniel Gröber	2020-06-01	1	-5/+26
\| \| \| \| \| \| \| \| \|	When shrinking arrays in the profiling way we currently don't always zero the leftover slop. This means we can't traverse such closures in the heap profiler. The old Note [zeroing slop] and #8402 have some rationale for why this is so but I belive the reasoning doesn't apply to mutable closures. There users already have to ensure multiple threads don't step on each other's toes so zeroing should be safe.
*	Fix OVERWRITING_CLOSURE assuming closures are not inherently used	Daniel Gröber	2020-06-01	1	-21/+17
\| \| \| \| \| \|	The new ASSERT in LDV_recordDead() was being tripped up by MVars when removeFromMVarBlockedQueue() calls OVERWRITING_CLOSURE() via OVERWRITE_INFO().
*	Cleanup OVERWRITING_CLOSURE logic	Daniel Gröber	2020-06-01	1	-32/+49
\| \| \| \| \| \| \| \| \| \| \|	The code is just more confusing than it needs to be. We don't need to mix the threaded check with the ldv profiling check since ldv's init already checks for this. Hence they can be two separate checks. Taking the sanity checking into account is also cleaner via DebugFlags.sanity. No need for checking the DEBUG define. The ZERO_SLOP_FOR_LDV_PROF and ZERO_SLOP_FOR_SANITY_CHECK definitions the old code had also make things a lot more opaque IMO so I removed those.
*	eventlog: Fix racy flushing	Ben Gamari	2020-05-27	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \|	Previously no attempt was made to avoid multiple threads writing their capability-local eventlog buffers to the eventlog writer simultaneously. This could result in multiple eventlog streams being interleaved. Fix this by documenting that the EventLogWriter's write() and flush() functions may be called reentrantly and fix the default writer to protect its FILE* by a mutex. Fixes #18210.
*	fix printf warning when using with ghc with clang on mingw	Emeka Nkurumeh	2020-05-13	1	-1/+1
\|
*	users guide: Move eventlog documentation users guide	Ben Gamari	2020-05-03	1	-49/+9
\|
*	rts: Assert LDV_recordDead is not called for inherently used closures	Daniel Gröber	2020-04-14	1	-0/+7
\| \| \| \| \|	The comments make it clear LDV_recordDead should not be called for inhererently used closures, so add an assertion to codify this fact.
*	rts: Fix nomenclature in OVERWRITING_CLOSURE macros	Daniel Gröber	2020-04-14	1	-15/+16
\| \| \| \| \| \| \| \| \| \|	The additional commentary introduced by commit 8916e64e5437 ("Implement shrinkSmallMutableArray# and resizeSmallMutableArray#.") unfortunately got this wrong. We set 'prim' to true in overwritingClosureOfs because we _don't_ want to call LDV_recordDead(). The reason is because of this "inherently used" distinction made in the LDV profiler so I rename the variable to be more appropriate.
*	rts: Underline some Notes as is conventional	Daniel Gröber	2020-04-14	1	-0/+1
\|
*	rts: Expand and add more notes regarding slop	Daniel Gröber	2020-04-14	1	-16/+23
\|
*	Zero out pinned block alignment slop when profiling	Daniel Gröber	2020-04-14	1	-3/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The heap profiler currently cannot traverse pinned blocks because of alignment slop. This used to just be a minor annoyance as the whole block is accounted into a special cost center rather than the respective object's CCS, cf. #7275. However for the new root profiler we would like to be able to visit _every_ closure on the heap. We need to do this so we can get rid of the current 'flip' bit hack in the heap traversal code. Since info pointers are always non-zero we can in principle skip all the slop in the profiler if we can rely on it being zeroed. This assumption caused problems in the past though, commit a586b33f8e ("rts: Correct handling of LARGE ARR_WORDS in LDV profiler"), part of !1118, tried to use the same trick for BF_LARGE objects but neglected to take into account that shrink*Array# functions don't ensure that slop is zeroed when not compiling with profiling. Later, commit 0c114c6599 ("Handle large ARR_WORDS in heap census (fix as we will only be assuming slop is zeroed when profiling is on. This commit also reduces the ammount of slop we introduce in the first place by calculating the needed alignment before doing the allocation for small objects where we know the next available address. For large objects we don't know how much alignment we'll have to do yet since those details are hidden behind the allocateMightFail function so there we continue to allocate the maximum additional words we'll need to do the alignment. So we don't have to duplicate all this logic in the cmm code we pull it into the RTS allocatePinned function instead. Metric Decrease: T7257 haddock.Cabal haddock.base
*	Fix CNF handling in compacting GC	Ömer Sinan Ağacan	2020-04-09	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fixes #17937 Previously compacting GC simply ignored CNFs. This is mostly fine as most (see "What about small compacts?" below) CNF objects don't have outgoing pointers, and are "large" (allocated in large blocks) and large objects are not moved or compacted. However if we do GC during sharing-preserving compaction then the CNF will have a hash table mapping objects that have been moved to the CNF to their location in the CNF, to be able to preserve sharing. This case is handled in the copying collector, in `scavenge_compact`, where we evacuate hash table entries and then rehash the table. Compacting GC ignored this case. We now visit CNFs in all generations when threading pointers to the compacted heap and thread hash table keys. A visited CNF is added to the list `nfdata_chain`. After compaction is done, we re-visit the CNFs in that list and rehash the tables. The overhead is minimal: the list is static in `Compact.c`, and link field is added to `StgCompactNFData` closure. Programs that don't use CNFs should not be affected. To test this CNF tests are now also run in a new way 'compacting_gc', which just passes `-c` to the RTS, enabling compacting GC for the oldest generation. Before this patch the result would be: Unexpected failures: compact_gc.run compact_gc [bad exit code (139)] (compacting_gc) compact_huge_array.run compact_huge_array [bad exit code (1)] (compacting_gc) With this patch all tests pass. I can also pass `-c -DS` without any failures. What about small compacts? Small CNFs are still not handled by the compacting GC. However so far I'm unable to write a test that triggers a runtime panic ("update_fwd: unknown/strange object") by allocating a small CNF in a compated heap. It's possible that I'm missing something and it's not possible to have a small CNF. NoFib Results: -------------------------------------------------------------------------------- Program Size Allocs Instrs Reads Writes -------------------------------------------------------------------------------- CS +0.1% 0.0% 0.0% +0.0% +0.0% CSD +0.1% 0.0% 0.0% 0.0% 0.0% FS +0.1% 0.0% 0.0% 0.0% 0.0% S +0.1% 0.0% 0.0% 0.0% 0.0% VS +0.1% 0.0% 0.0% 0.0% 0.0% VSD +0.1% 0.0% +0.0% +0.0% -0.0% VSM +0.1% 0.0% +0.0% -0.0% 0.0% anna +0.0% 0.0% -0.0% -0.0% -0.0% ansi +0.1% 0.0% +0.0% +0.0% +0.0% atom +0.1% 0.0% +0.0% +0.0% +0.0% awards +0.1% 0.0% +0.0% +0.0% +0.0% banner +0.1% 0.0% +0.0% +0.0% +0.0% bernouilli +0.1% 0.0% 0.0% -0.0% +0.0% binary-trees +0.1% 0.0% -0.0% -0.0% 0.0% boyer +0.1% 0.0% +0.0% +0.0% +0.0% boyer2 +0.1% 0.0% +0.0% +0.0% +0.0% bspt +0.1% 0.0% -0.0% -0.0% -0.0% cacheprof +0.1% 0.0% -0.0% -0.0% -0.0% calendar +0.1% 0.0% +0.0% +0.0% +0.0% cichelli +0.1% 0.0% +0.0% +0.0% +0.0% circsim +0.1% 0.0% +0.0% +0.0% +0.0% clausify +0.1% 0.0% -0.0% +0.0% +0.0% comp_lab_zift +0.1% 0.0% +0.0% +0.0% +0.0% compress +0.1% 0.0% +0.0% +0.0% 0.0% compress2 +0.1% 0.0% -0.0% 0.0% 0.0% constraints +0.1% 0.0% +0.0% +0.0% +0.0% cryptarithm1 +0.1% 0.0% +0.0% +0.0% +0.0% cryptarithm2 +0.1% 0.0% +0.0% +0.0% +0.0% cse +0.1% 0.0% +0.0% +0.0% +0.0% digits-of-e1 +0.1% 0.0% +0.0% -0.0% -0.0% digits-of-e2 +0.1% 0.0% -0.0% -0.0% -0.0% dom-lt +0.1% 0.0% +0.0% +0.0% +0.0% eliza +0.1% 0.0% +0.0% +0.0% +0.0% event +0.1% 0.0% +0.0% +0.0% +0.0% exact-reals +0.1% 0.0% +0.0% +0.0% +0.0% exp3_8 +0.1% 0.0% +0.0% -0.0% 0.0% expert +0.1% 0.0% +0.0% +0.0% +0.0% fannkuch-redux +0.1% 0.0% -0.0% 0.0% 0.0% fasta +0.1% 0.0% -0.0% +0.0% +0.0% fem +0.1% 0.0% -0.0% +0.0% 0.0% fft +0.1% 0.0% -0.0% +0.0% +0.0% fft2 +0.1% 0.0% +0.0% +0.0% +0.0% fibheaps +0.1% 0.0% +0.0% +0.0% +0.0% fish +0.1% 0.0% +0.0% +0.0% +0.0% fluid +0.0% 0.0% +0.0% +0.0% +0.0% fulsom +0.1% 0.0% -0.0% +0.0% 0.0% gamteb +0.1% 0.0% +0.0% +0.0% 0.0% gcd +0.1% 0.0% +0.0% +0.0% +0.0% gen_regexps +0.1% 0.0% -0.0% +0.0% 0.0% genfft +0.1% 0.0% +0.0% +0.0% +0.0% gg +0.1% 0.0% 0.0% +0.0% +0.0% grep +0.1% 0.0% -0.0% +0.0% +0.0% hidden +0.1% 0.0% +0.0% -0.0% 0.0% hpg +0.1% 0.0% -0.0% -0.0% -0.0% ida +0.1% 0.0% +0.0% +0.0% +0.0% infer +0.1% 0.0% +0.0% 0.0% -0.0% integer +0.1% 0.0% +0.0% +0.0% +0.0% integrate +0.1% 0.0% -0.0% -0.0% -0.0% k-nucleotide +0.1% 0.0% +0.0% +0.0% 0.0% kahan +0.1% 0.0% +0.0% +0.0% +0.0% knights +0.1% 0.0% -0.0% -0.0% -0.0% lambda +0.1% 0.0% +0.0% +0.0% -0.0% last-piece +0.1% 0.0% +0.0% 0.0% 0.0% lcss +0.1% 0.0% +0.0% +0.0% 0.0% life +0.1% 0.0% -0.0% +0.0% +0.0% lift +0.1% 0.0% +0.0% +0.0% +0.0% linear +0.1% 0.0% -0.0% +0.0% 0.0% listcompr +0.1% 0.0% +0.0% +0.0% +0.0% listcopy +0.1% 0.0% +0.0% +0.0% +0.0% maillist +0.1% 0.0% +0.0% -0.0% -0.0% mandel +0.1% 0.0% +0.0% +0.0% 0.0% mandel2 +0.1% 0.0% +0.0% +0.0% +0.0% mate +0.1% 0.0% +0.0% 0.0% +0.0% minimax +0.1% 0.0% -0.0% 0.0% -0.0% mkhprog +0.1% 0.0% +0.0% +0.0% +0.0% multiplier +0.1% 0.0% +0.0% 0.0% 0.0% n-body +0.1% 0.0% +0.0% +0.0% +0.0% nucleic2 +0.1% 0.0% +0.0% +0.0% +0.0% para +0.1% 0.0% 0.0% +0.0% +0.0% paraffins +0.1% 0.0% +0.0% -0.0% 0.0% parser +0.1% 0.0% -0.0% -0.0% -0.0% parstof +0.1% 0.0% +0.0% +0.0% +0.0% pic +0.1% 0.0% -0.0% -0.0% 0.0% pidigits +0.1% 0.0% +0.0% -0.0% -0.0% power +0.1% 0.0% +0.0% +0.0% +0.0% pretty +0.1% 0.0% -0.0% -0.0% -0.1% primes +0.1% 0.0% -0.0% -0.0% -0.0% primetest +0.1% 0.0% -0.0% -0.0% -0.0% prolog +0.1% 0.0% -0.0% -0.0% -0.0% puzzle +0.1% 0.0% -0.0% -0.0% -0.0% queens +0.1% 0.0% +0.0% +0.0% +0.0% reptile +0.1% 0.0% -0.0% -0.0% +0.0% reverse-complem +0.1% 0.0% +0.0% 0.0% -0.0% rewrite +0.1% 0.0% -0.0% -0.0% -0.0% rfib +0.1% 0.0% +0.0% +0.0% +0.0% rsa +0.1% 0.0% -0.0% +0.0% -0.0% scc +0.1% 0.0% -0.0% -0.0% -0.1% sched +0.1% 0.0% +0.0% +0.0% +0.0% scs +0.1% 0.0% +0.0% +0.0% +0.0% simple +0.1% 0.0% -0.0% -0.0% -0.0% solid +0.1% 0.0% +0.0% +0.0% +0.0% sorting +0.1% 0.0% -0.0% -0.0% -0.0% spectral-norm +0.1% 0.0% +0.0% +0.0% +0.0% sphere +0.1% 0.0% -0.0% -0.0% -0.0% symalg +0.1% 0.0% -0.0% -0.0% -0.0% tak +0.1% 0.0% +0.0% +0.0% +0.0% transform +0.1% 0.0% +0.0% +0.0% +0.0% treejoin +0.1% 0.0% +0.0% -0.0% -0.0% typecheck +0.1% 0.0% +0.0% +0.0% +0.0% veritas +0.0% 0.0% +0.0% +0.0% +0.0% wang +0.1% 0.0% 0.0% +0.0% +0.0% wave4main +0.1% 0.0% +0.0% +0.0% +0.0% wheel-sieve1 +0.1% 0.0% +0.0% +0.0% +0.0% wheel-sieve2 +0.1% 0.0% +0.0% +0.0% +0.0% x2n1 +0.1% 0.0% +0.0% +0.0% +0.0% -------------------------------------------------------------------------------- Min +0.0% 0.0% -0.0% -0.0% -0.1% Max +0.1% 0.0% +0.0% +0.0% +0.0% Geometric Mean +0.1% -0.0% -0.0% -0.0% -0.0% Bumping numbers of nonsensical perf tests: Metric Increase: T12150 T12234 T12425 T13035 T5837 T6048 It's simply not possible for this patch to increase allocations, and I've wasted enough time on these test in the past (see #17686). I think these tests should not be perf tests, but for now I'll bump the numbers.
*	nonmoving-gc: Track time usage of nonmoving marking	Ben Gamari	2020-03-05	1	-0/+1
\|
*	Document and refactor a few things around bitmap scavenging	Ömer Sinan Ağacan	2020-02-29	1	-0/+6
\| \| \| \| \| \| \| \|	- Added a few comments in StgPAP - Added a few comments and assertions in scavenge_small_bitmap and walk_large_bitmap - Did tiny refactor in GHC.Data.Bitmap: added some comments, deleted dead code, used PlatformWordSize type.
*	Module hierarchy: ByteCode and Runtime (cf #13009)	Sylvain Henry	2020-02-12	1	-1/+1
\| \| \| \|	Update haddock submodule
*	Make Block.h compile with c++ compilers	Matthew Pickering	2020-01-27	1	-4/+9
\|
*	Fix typos, via a Levenshtein-style corrector	Brian Wignall	2020-01-04	1	-1/+1
\|
*	Add "-Iw" RTS flag for minimum wait between idle GCs (#11134)	Kevin Buhr	2019-12-31	1	-0/+1
\|
*	rts: Consolidate spinlock implementation	Ben Gamari	2019-11-23	1	-47/+13
\| \| \| \| \|	Previously we had two distinct implementations: one with spinlock profiling and another without. This seems like needless duplication.
*	rts: Expose interface for configuring EventLogWriters	Ben Gamari	2019-11-23	1	-1/+27
\| \| \| \| \| \|	This exposes a set of interfaces from the GHC API for configuring EventLogWriters. These can be used by consumers like [ghc-eventlog-socket](https://github.com/bgamari/ghc-eventlog-socket).
*	Use pointer equality in Eq/Ord for ThreadId	Roland Zumkeller	2019-11-19	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	Changes (==) to use only pointer equality. This is safe because two threads are the same iff they have the same id. Changes `compare` to check pointer equality first and fall back on ids only in case of inequality. See discussion in #16761.
*	Changing Thread IDs from 32 bits to 64 bits.	Roland Zumkeller	2019-11-19	2	-3/+3
\|
*	nonmoving: Use correct info table pointer accessor	Ben Gamari	2019-11-19	1	-14/+0
\| \| \| \| \| \| \| \| \| \| \|	Previously we used INFO_PTR_TO_STRUCT instead of THUNK_INFO_PTR_TO_STRUCT when looking at a thunk. These two happen to be equivalent on 64-bit architectures due to alignment considerations however they are different on 32-bit platforms. This lead to #17487. To fix this we also employ a small optimization: there is only one thunk of type WHITEHOLE (namely stg_WHITEHOLE_info). Consequently, we can just use a plain pointer comparison instead of testing against info->type.
*	Add +RTS --disable-delayed-os-memory-return. Fixes #17411.	Niklas Hambüchen	2019-11-01	1	-0/+6
\| \| \| \| \| \|	Sets `MiscFlags.disableDelayedOsMemoryReturn`. See the added `Note [MADV_FREE and MADV_DONTNEED]` for details.
*	Implement shrinkSmallMutableArray# and resizeSmallMutableArray#.	Andrew Martin	2019-10-26	1	-3/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a part of GHC Proposal #25: "Offer more array resizing primitives". Resources related to the proposal: - Discussion: https://github.com/ghc-proposals/ghc-proposals/pull/121 - Proposal: https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0025-resize-boxed.rst Only shrinkSmallMutableArray# is implemented as a primop since a library-space implementation of resizeSmallMutableArray# (in GHC.Exts) is no less efficient than a primop would be. This may be replaced by a primop in the future if someone devises a strategy for growing arrays in-place. The library-space implementation always copies the array when growing it. This commit also tweaks the documentation of the deprecated sizeofMutableByteArray#, removing the mention of concurrency. That primop is unsound even in single-threaded applications. Additionally, the non-negativity assertion on the existing shrinkMutableByteArray# primop has been removed since this predicate is trivially always true.
*	Merge non-moving garbage collector	Ben Gamari	2019-10-23	9	-16/+177
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This introduces a concurrent mark & sweep garbage collector to manage the old generation. The concurrent nature of this collector typically results in significantly reduced maximum and mean pause times in applications with large working sets. Due to the large and intricate nature of the change I have opted to preserve the fully-buildable history, including merge commits, which is described in the "Branch overview" section below. Collector design ================ The full design of the collector implemented here is described in detail in a technical note > B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell > Compiler" (2018) This document can be requested from @bgamari. The basic heap structure used in this design is heavily inspired by > K. Ueno & A. Ohori. "A fully concurrent garbage collector for > functional programs on multicore processors." /ACM SIGPLAN Notices/ > Vol. 51. No. 9 (presented at ICFP 2016) This design is intended to allow both marking and sweeping concurrent to execution of a multi-core mutator. Unlike the Ueno design, which requires no global synchronization pauses, the collector introduced here requires a stop-the-world pause at the beginning and end of the mark phase. To avoid heap fragmentation, the allocator consists of a number of fixed-size /sub-allocators/. Each of these sub-allocators allocators into its own set of /segments/, themselves allocated from the block allocator. Each segment is broken into a set of fixed-size allocation blocks (which back allocations) in addition to a bitmap (used to track the liveness of blocks) and some additional metadata (used also used to track liveness). This heap structure enables collection via mark-and-sweep, which can be performed concurrently via a snapshot-at-the-beginning scheme (although concurrent collection is not implemented in this patch). Implementation structure ======================== The majority of the collector is implemented in a handful of files: * `rts/Nonmoving.c` is the heart of the beast. It implements the entry-point to the nonmoving collector (`nonmoving_collect`), as well as the allocator (`nonmoving_allocate`) and a number of utilities for manipulating the heap. * `rts/NonmovingMark.c` implements the mark queue functionality, update remembered set, and mark loop. * `rts/NonmovingSweep.c` implements the sweep loop. * `rts/NonmovingScav.c` implements the logic necessary to scavenge the nonmoving heap. Branch overview =============== ``` * wip/gc/opt-pause: \| A variety of small optimisations to further reduce pause times. \| * wip/gc/compact-nfdata: \| Introduce support for compact regions into the non-moving \|\ collector \| \ \| \ \| \| * wip/gc/segment-header-to-bdescr: \| \| \| Another optimization that we are considering, pushing \| \| \| some segment metadata into the segment descriptor for \| \| \| the sake of locality during mark \| \| \| \| * \| wip/gc/shortcutting: \| \| \| Support for indirection shortcutting and the selector optimization \| \| \| in the non-moving heap. \| \| \| * \| \| wip/gc/docs: \| \|/ Work on implementation documentation. \| / \|/ * wip/gc/everything: \| A roll-up of everything below. \|\ \| \ \| \|\ \| \| \ \| \| * wip/gc/optimize: \| \| \| A variety of optimizations, primarily to the mark loop. \| \| \| Some of these are microoptimizations but a few are quite \| \| \| significant. In particular, the prefetch patches have \| \| \| produced a nontrivial improvement in mark performance. \| \| \| \| \| * wip/gc/aging: \| \| \| Enable support for aging in major collections. \| \| \| \| * \| wip/gc/test: \| \| \| Fix up the testsuite to more or less pass. \| \| \| * \| \| wip/gc/instrumentation: \| \| \| A variety of runtime instrumentation including statistics \| \| / support, the nonmoving census, and eventlog support. \| \|/ \| / \|/ * wip/gc/nonmoving-concurrent: \| The concurrent write barriers. \| * wip/gc/nonmoving-nonconcurrent: \| The nonmoving collector without the write barriers necessary \| for concurrent collection. \| * wip/gc/preparation: \| A merge of the various preparatory patches that aren't directly \| implementing the GC. \| \| * GHC HEAD . . . ```
\| *-.	Merge branches 'wip/gc/segment-header-to-bdescr' and 'wip/gc/docs' into ↵wip/gc/everything2	Ben Gamari	2019-10-22	2	-11/+20
\| \|\ \ \| \| \| \| \| \| \| \| \| \| \| \|	wip/gc/everything2
\| \| \| *	NonMoving: Add summarizing Notewip/gc/docs	Ben Gamari	2019-10-22	1	-0/+3
\| \| \| \|
\| \| * \|	NonMoving: Move next_free_snap to block descriptorwip/gc/segment-header-to-bdescr	Ben Gamari	2019-10-22	1	-0/+1
\| \| \| \|
\| \| * \|	NonMoving: Move block size to block descriptor	Ben Gamari	2019-10-22	1	-11/+16
\| \| \|/
\| * \|	NonMoving: Implement -xns to disable selector optimizationwip/gc/shortcutting	Ömer Sinan Ağacan	2019-10-22	1	-1/+3
\| \|/