| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
By taking all_tasks_mutex in stat_exit. Also better-document the fact
that the task statistics are protected by all_tasks_mutex.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This implements the core heap structure and a serial mark/sweep
collector which can be used to manage the oldest-generation heap.
This is the first step towards a concurrent mark-and-sweep collector
aimed at low-latency applications.
The full design of the collector implemented here is described in detail
in a technical note
B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell
Compiler" (2018)
The basic heap structure used in this design is heavily inspired by
K. Ueno & A. Ohori. "A fully concurrent garbage collector for
functional programs on multicore processors." /ACM SIGPLAN Notices/
Vol. 51. No. 9 (presented by ICFP 2016)
This design is intended to allow both marking and sweeping
concurrent to execution of a multi-core mutator. Unlike the Ueno design,
which requires no global synchronization pauses, the collector
introduced here requires a stop-the-world pause at the beginning and end
of the mark phase.
To avoid heap fragmentation, the allocator consists of a number of
fixed-size /sub-allocators/. Each of these sub-allocators allocators into
its own set of /segments/, themselves allocated from the block
allocator. Each segment is broken into a set of fixed-size allocation
blocks (which back allocations) in addition to a bitmap (used to track
the liveness of blocks) and some additional metadata (used also used
to track liveness).
This heap structure enables collection via mark-and-sweep, which can be
performed concurrently via a snapshot-at-the-beginning scheme (although
concurrent collection is not implemented in this patch).
The mark queue is a fairly straightforward chunked-array structure.
The representation is a bit more verbose than a typical mark queue to
accomodate a combination of two features:
* a mark FIFO, which improves the locality of marking, reducing one of
the major overheads seen in mark/sweep allocators (see [1] for
details)
* the selector optimization and indirection shortcutting, which
requires that we track where we found each reference to an object
in case we need to update the reference at a later point (e.g. when
we find that it is an indirection). See Note [Origin references in
the nonmoving collector] (in `NonMovingMark.h`) for details.
Beyond this the mark/sweep is fairly run-of-the-mill.
[1] R. Garner, S.M. Blackburn, D. Frampton. "Effective Prefetch for
Mark-Sweep Garbage Collection." ISMM 2007.
Co-Authored-By: Ben Gamari <ben@well-typed.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Here the following changes are introduced:
- A read barrier machine op is added to Cmm.
- The order in which a closure's fields are read and written is changed.
- Memory barriers are added to RTS code to ensure correctness on
out-or-order machines with weak memory ordering.
Cmm has a new CallishMachOp called MO_ReadBarrier. On weak memory machines, this
is lowered to an instruction that ensures memory reads that occur after said
instruction in program order are not performed before reads coming before said
instruction in program order. On machines with strong memory ordering properties
(e.g. X86, SPARC in TSO mode) no such instruction is necessary, so
MO_ReadBarrier is simply erased. However, such an instruction is necessary on
weakly ordered machines, e.g. ARM and PowerPC.
Weam memory ordering has consequences for how closures are observed and mutated.
For example, consider a closure that needs to be updated to an indirection. In
order for the indirection to be safe for concurrent observers to enter, said
observers must read the indirection's info table before they read the
indirectee. Furthermore, the entering observer makes assumptions about the
closure based on its info table contents, e.g. an INFO_TYPE of IND imples the
closure has an indirectee pointer that is safe to follow.
When a closure is updated with an indirection, both its info table and its
indirectee must be written. With weak memory ordering, these two writes can be
arbitrarily reordered, and perhaps even interleaved with other threads' reads
and writes (in the absence of memory barrier instructions). Consider this
example of a bad reordering:
- An updater writes to a closure's info table (INFO_TYPE is now IND).
- A concurrent observer branches upon reading the closure's INFO_TYPE as IND.
- A concurrent observer reads the closure's indirectee and enters it. (!!!)
- An updater writes the closure's indirectee.
Here the update to the indirectee comes too late and the concurrent observer has
jumped off into the abyss. Speculative execution can also cause us issues,
consider:
- An observer is about to case on a value in closure's info table.
- The observer speculatively reads one or more of closure's fields.
- An updater writes to closure's info table.
- The observer takes a branch based on the new info table value, but with the
old closure fields!
- The updater writes to the closure's other fields, but its too late.
Because of these effects, reads and writes to a closure's info table must be
ordered carefully with respect to reads and writes to the closure's other
fields, and memory barriers must be placed to ensure that reads and writes occur
in program order. Specifically, updates to a closure must follow the following
pattern:
- Update the closure's (non-info table) fields.
- Write barrier.
- Update the closure's info table.
Observing a closure's fields must follow the following pattern:
- Read the closure's info pointer.
- Read barrier.
- Read the closure's (non-info table) fields.
This patch updates RTS code to obey this pattern. This should fix long-standing
SMP bugs on ARM (specifically newer aarch64 microarchitectures supporting
out-of-order execution) and PowerPC. This fixes issue #15449.
Co-Authored-By: Ben Gamari <ben@well-typed.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SMALL_MUT_ARR_PTRS_FROZEN0 -> SMALL_MUT_ARR_PTRS_FROZEN_DIRTY
SMALL_MUT_ARR_PTRS_FROZEN -> SMALL_MUT_ARR_PTRS_FROZEN_CLEAN
MUT_ARR_PTRS_FROZEN0 -> MUT_ARR_PTRS_FROZEN_DIRTY
MUT_ARR_PTRS_FROZEN -> MUT_ARR_PTRS_FROZEN_CLEAN
Naming is now consistent with other CLEAR/DIRTY objects (MVAR, MUT_VAR,
MUT_ARR_PTRS).
(alternatively we could rename MVAR_DIRTY/MVAR_CLEAN etc. to MVAR0/MVAR)
Removed a few comments in Scav.c about FROZEN0 being on the mut_list
because it's now clear from the closure type.
Reviewers: bgamari, simonmar, erikd
Reviewed By: simonmar
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4784
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With a large heap it's possible to build up a lot of finalizers
between GCs. We've observed GC spending up to 50% of its time running
finalizers. But there's no reason we have to run finalizers during
GC, and especially no reason we have to block *all* the mutator
threads while *one* GC thread runs finalizers one by one.
I thought about a bunch of alternative ways to handle this, which are
documented along with runSomeFinalizers() in Weak.c. The approach I
settled on is to have a capability run finalizers if it is idle. So
running finalizers is like a low-priority background thread. This
requires some minor scheduler changes, but not much. In the future we
might be able to move more GC work into here (I have my eye on freeing
large blocks, for example).
Test Plan:
* validate
* tested on our system and saw reductions in GC pauses of 40-50%.
Reviewers: bgamari, niteria, osa1, erikd
Reviewed By: bgamari, osa1
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4521
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: austin, erikd, simonmar
Reviewed By: simonmar
Subscribers: pacak, rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D4068
|
|
|
|
| |
Our new CPP linter enforces this.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[skip ci]
There ware some old file names (.lhs, ...) at comments.
* rts/win32/ThrIOManager.c
- Conc.lhs -> Conc.hs
* rts/PrimOps.cmm
- ByteCodeLink.lhs -> ByteCodeLink.hs
- StgMiscClosures.hc -> StgMiscClosures.cmm
* rts/AutoApply.h
- AutoApply.hc -> AutoApply.cmm
* rts/HeapStackCheck.cmm
- PrimOps.hc -> PrimOps.cmm
* rts/LdvProfile.h
- Updates.hc -> Updates.cmm
* rts/Schedule.c
- StgStartup.hc -> StgStartup.cmm
* rts/Weak.c
- StgMiscClosures.hc -> StgMiscClosures.cmm
Reviewers: bgamari, austin, erikd, simonmar
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D3075
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: Validate on lots of platforms
Reviewers: erikd, simonmar, austin
Reviewed By: erikd, simonmar
Subscribers: michalt, thomie
Differential Revision: https://phabricator.haskell.org/D2699
|
|
|
|
|
|
|
|
|
|
|
|
| |
The `nat` type was an alias for `unsigned int` with a comment saying
it was at least 32 bits. We keep the typedef in case client code is
using it but mark it as deprecated.
Test Plan: Validated on Linux, OS X and Windows
Reviewers: simonmar, austin, thomie, hvr, bgamari, hsyl20
Differential Revision: https://phabricator.haskell.org/D2166
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
There's a race condition like this:
# A foreign pointer gets promoted to the last generation
# It has its finalizer called manually
# We start shutting down the runtime in `hs_exit_` from the main
thread
# A minor GC starts running (`scheduleDoGC`) on one of the threads
# The minor GC notices that we're in `SCHED_INTERRUPTING` state and
advances to `SCHED_SHUTTING_DOWN`
# The main thread tries to do major GC (with `scheduleDoGC`), but it
exits early because we're in `SCHED_SHUTTING_DOWN` state
# We end up with a `DEAD_WEAK` left on the list of weak pointers of
the last generation, because it relied on major GC removing it from
that list
This change:
* Ignores DEAD_WEAK finalizers when shutting down
* Makes the major GC on shutdown more likely
* Fixes a bogus assert
Test Plan:
before this diff https://ghc.haskell.org/trac/ghc/ticket/7170#comment:5
reproduced and after it doesn't
Reviewers: ezyang, austin, simonmar
Reviewed By: simonmar
Subscribers: bgamari, thomie
Differential Revision: https://phabricator.haskell.org/D921
GHC Trac Issues: #7170
|
|
|
|
|
|
|
| |
This reverts commit 35672072b4091d6f0031417bc160c568f22d0469.
Conflicts:
compiler/main/DriverPipeline.hs
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
In preparation for indirecting all references to closures,
we rename _closure to _static_closure to ensure any old code
will get an undefined symbol error. In order to reference
a closure foobar_closure (which is now undefined), you should instead
use STATIC_CLOSURE(foobar). For convenience, a number of these
old identifiers are macro'd.
Across C-- and C (Windows and otherwise), there were differing
conventions on whether or not foobar_closure or &foobar_closure
was the address of the closure. Now, all foobar_closure references
are addresses, and no & is necessary.
CHARLIKE/INTLIKE were not changed, simply alpha-renamed.
Part of remove HEAP_ALLOCED patch set (#8199)
Depends on D265
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
Test Plan: validate
Reviewers: simonmar, austin
Subscribers: simonmar, ezyang, carter, thomie
Differential Revision: https://phabricator.haskell.org/D267
GHC Trac Issues: #8199
|
|
|
|
| |
This reverts commit 39b5c1cbd8950755de400933cecca7b8deb4ffcd.
|
|
|
|
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
|
|
|
|
| |
This will hopefully help ensure some basic consistency in the forward by
overriding buffer variables. In particular, it sets the wrap length, the
offset to 4, and turns off tabs.
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The commit replaces mkWeakForeignEnv# with addCFinalizerToWeak#.
This new primop mutates an existing Weak# object and adds a new
C finalizer to it.
This change removes an invariant in MarkWeak.c, namely that the relative
order of Weak# objects in the list needs to be preserved across GC. This
makes it easier to split the list into per-generation structures.
The patch also removes a race condition between two threads calling
finalizeWeak# on the same WEAK object at that same time.
|
|
|
|
|
| |
Fixes a bug reported by Lennart Augustsson, whereby we could get an
incorrect error from the RTS about re-entry from a finalizer,
|
|
|
|
|
|
|
|
|
|
|
|
| |
The card table is an array of bytes, placed directly following the
actual array data. This means that array reading is unaffected, but
array writing needs to read the array size from the header in order to
find the card table.
We use a bytemap rather than a bitmap, because updating the card table
must be multi-thread safe. Each byte refers to 128 entries of the
array, but this is tunable by changing the constant
MUT_ARR_PTRS_CARD_BITS in includes/Constants.h.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a batch of refactoring to remove some of the GC's global
state, as we move towards CPU-local GC.
- allocateLocal() now allocates large objects into the local
nursery, rather than taking a global lock and allocating
then in gen 0 step 0.
- allocatePinned() was still allocating from global storage and
taking a lock each time, now it uses local storage.
(mallocForeignPtrBytes should be faster with -threaded).
- We had a gen 0 step 0, distinct from the nurseries, which are
stored in a separate nurseries[] array. This is slightly strange.
I removed the g0s0 global that pointed to gen 0 step 0, and
removed all uses of it. I think now we don't use gen 0 step 0 at
all, except possibly when there is only one generation. Possibly
more tidying up is needed here.
- I removed the global allocate() function, and renamed
allocateLocal() to allocate().
- the alloc_blocks global is gone. MAYBE_GC() and
doYouWantToGC() now check the local nursery only.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The first phase of this tidyup is focussed on the header files, and in
particular making sure we are exposinng publicly exactly what we need
to, and no more.
- Rts.h now includes everything that the RTS exposes publicly,
rather than a random subset of it.
- Most of the public header files have moved into subdirectories, and
many of them have been renamed. But clients should not need to
include any of the other headers directly, just #include the main
public headers: Rts.h, HsFFI.h, RtsAPI.h.
- All the headers needed for via-C compilation have moved into the
stg subdirectory, which is self-contained. Most of the headers for
the rest of the RTS APIs have moved into the rts subdirectory.
- I left MachDeps.h where it is, because it is so widely used in
Haskell code.
- I left a deprecated stub for RtsFlags.h in place. The flag
structures are now exposed by Rts.h.
- Various internal APIs are no longer exposed by public header files.
- Various bits of dead code and declarations have been removed
- More gcc warnings are turned on, and the RTS code is more
warning-clean.
- More source files #include "PosixSource.h", and hence only use
standard POSIX (1003.1c-1995) interfaces.
There is a lot more tidying up still to do, this is just the first
pass. I also intend to standardise the names for external RTS APIs
(e.g use the rts_ prefix consistently), and declare the internal APIs
as hidden for shared libraries.
|
|
|
|
|
|
|
| |
This is illegal now, after the fix for #1364, but it turns out that
the existing check for dodgy callbacks doesn't catch finalizers
calling back, so we need another test. This will be particularly
important for 6.10.2, because the behaviour has changed.
|
|
|
|
|
|
|
|
|
|
|
| |
not longer reachable.
Patch originally by Ivan Tomac <tomac@pacific.net.au>, amended by
Simon Marlow:
- mkWeakFinalizer# commoned up with mkWeakFinalizerEnv#
- GC parameters to ALLOC_PRIM fixed
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
A simple interface for generating trace messages with timestamps and
thread IDs attached to them. Most debugging output goes through this
interface now, so it is straightforward to get timestamped debugging
traces with +RTS -vt. Also, we plan to use this to generate
parallelism profiles from the trace output.
|
|
Most of the other users of the fptools build system have migrated to
Cabal, and with the move to darcs we can now flatten the source tree
without losing history, so here goes.
The main change is that the ghc/ subdir is gone, and most of what it
contained is now at the top level. The build system now makes no
pretense at being multi-project, it is just the GHC build system.
No doubt this will break many things, and there will be a period of
instability while we fix the dependencies. A straightforward build
should work, but I haven't yet fixed binary/source distributions.
Changes to the Building Guide will follow, too.
|