| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Fixes #21824
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to make the packages in this repo "reinstallable", we need to
associate source code with a specific packages. Having a top level
`/includes` dir that mixes concerns (which packages' includes?) gets in
the way of this.
To start, I have moved everything to `rts/`, which is mostly correct.
There are a few things however that really don't belong in the rts (like
the generated constants haskell type, `CodeGen.Platform.h`). Those
needed to be manually adjusted.
Things of note:
- No symlinking for sake of windows, so we hard-link at configure time.
- `CodeGen.Platform.h` no longer as `.hs` extension (in addition to
being moved to `compiler/`) so as not to confuse anyone, since it is
next to Haskell files.
- Blanket `-Iincludes` is gone in both build systems, include paths now
more strictly respect per-package dependencies.
- `deriveConstants` has been taught to not require a `--target-os` flag
when generating the platform-agnostic Haskell type. Make takes
advantage of this, but Hadrian has yet to.
|
|
|
|
|
|
|
|
|
|
|
| |
The way in which allocatePinned took blocks out of the nursery was
leading to horrible fragmentation in some workloads.
The strategy now is that a separate free block list is reserved for each
capability and blocks are taken from there. When it's empty the global
SM lock is taken and a fresh block of size PINNED_EMPTY_SIZE is allocated.
Fixes #19481
|
|
|
|
|
|
|
|
|
|
|
|
| |
As noted in #18043, flushTrace failed flush anything beyond the writer.
This means that a significant amount of data sitting in capability-local
event buffers may never get flushed, despite the users' pleads for us to
flush.
Fix this by making flushEventLog flush all of the event buffers before
flushing the writer.
Fixes #18043.
|
|\ |
|
| | |
|
| |
| |
| |
| |
| | |
This is necessary since emptyInbox may read from to_cap->inbox without
taking cap->lock.
|
| | |
|
| | |
|
|/ |
|
|
|
|
|
|
| |
Previously this was left uninitialized.
Also clarify some comments.
|
|
|
|
|
|
|
|
|
|
|
| |
We're now correctly computing allocated bytes on 32-bit arch, so we get
huge increases.
Metric Increase:
haddock.Cabal
haddock.base
haddock.compiler
space_leak_001
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This introduces a concurrent mark & sweep garbage collector to manage the old
generation. The concurrent nature of this collector typically results in
significantly reduced maximum and mean pause times in applications with large
working sets.
Due to the large and intricate nature of the change I have opted to
preserve the fully-buildable history, including merge commits, which is
described in the "Branch overview" section below.
Collector design
================
The full design of the collector implemented here is described in detail
in a technical note
> B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell
> Compiler" (2018)
This document can be requested from @bgamari.
The basic heap structure used in this design is heavily inspired by
> K. Ueno & A. Ohori. "A fully concurrent garbage collector for
> functional programs on multicore processors." /ACM SIGPLAN Notices/
> Vol. 51. No. 9 (presented at ICFP 2016)
This design is intended to allow both marking and sweeping
concurrent to execution of a multi-core mutator. Unlike the Ueno design,
which requires no global synchronization pauses, the collector
introduced here requires a stop-the-world pause at the beginning and end
of the mark phase.
To avoid heap fragmentation, the allocator consists of a number of
fixed-size /sub-allocators/. Each of these sub-allocators allocators into
its own set of /segments/, themselves allocated from the block
allocator. Each segment is broken into a set of fixed-size allocation
blocks (which back allocations) in addition to a bitmap (used to track
the liveness of blocks) and some additional metadata (used also used
to track liveness).
This heap structure enables collection via mark-and-sweep, which can be
performed concurrently via a snapshot-at-the-beginning scheme (although
concurrent collection is not implemented in this patch).
Implementation structure
========================
The majority of the collector is implemented in a handful of files:
* `rts/Nonmoving.c` is the heart of the beast. It implements the entry-point
to the nonmoving collector (`nonmoving_collect`), as well as the allocator
(`nonmoving_allocate`) and a number of utilities for manipulating the heap.
* `rts/NonmovingMark.c` implements the mark queue functionality, update
remembered set, and mark loop.
* `rts/NonmovingSweep.c` implements the sweep loop.
* `rts/NonmovingScav.c` implements the logic necessary to scavenge the
nonmoving heap.
Branch overview
===============
```
* wip/gc/opt-pause:
| A variety of small optimisations to further reduce pause times.
|
* wip/gc/compact-nfdata:
| Introduce support for compact regions into the non-moving
|\ collector
| \
| \
| | * wip/gc/segment-header-to-bdescr:
| | | Another optimization that we are considering, pushing
| | | some segment metadata into the segment descriptor for
| | | the sake of locality during mark
| | |
| * | wip/gc/shortcutting:
| | | Support for indirection shortcutting and the selector optimization
| | | in the non-moving heap.
| | |
* | | wip/gc/docs:
| |/ Work on implementation documentation.
| /
|/
* wip/gc/everything:
| A roll-up of everything below.
|\
| \
| |\
| | \
| | * wip/gc/optimize:
| | | A variety of optimizations, primarily to the mark loop.
| | | Some of these are microoptimizations but a few are quite
| | | significant. In particular, the prefetch patches have
| | | produced a nontrivial improvement in mark performance.
| | |
| | * wip/gc/aging:
| | | Enable support for aging in major collections.
| | |
| * | wip/gc/test:
| | | Fix up the testsuite to more or less pass.
| | |
* | | wip/gc/instrumentation:
| | | A variety of runtime instrumentation including statistics
| | / support, the nonmoving census, and eventlog support.
| |/
| /
|/
* wip/gc/nonmoving-concurrent:
| The concurrent write barriers.
|
* wip/gc/nonmoving-nonconcurrent:
| The nonmoving collector without the write barriers necessary
| for concurrent collection.
|
* wip/gc/preparation:
| A merge of the various preparatory patches that aren't directly
| implementing the GC.
|
|
* GHC HEAD
.
.
.
```
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This extends the non-moving collector to allow concurrent collection.
The full design of the collector implemented here is described in detail
in a technical note
B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell
Compiler" (2018)
This extension involves the introduction of a capability-local
remembered set, known as the /update remembered set/, which tracks
objects which may no longer be visible to the collector due to mutation.
To maintain this remembered set we introduce a write barrier on
mutations which is enabled while a concurrent mark is underway.
The update remembered set representation is similar to that of the
nonmoving mark queue, being a chunked array of `MarkEntry`s. Each
`Capability` maintains a single accumulator chunk, which it flushed
when it (a) is filled, or (b) when the nonmoving collector enters its
post-mark synchronization phase.
While the write barrier touches a significant amount of code it is
conceptually straightforward: the mutator must ensure that the referee
of any pointer it overwrites is added to the update remembered set.
However, there are a few details:
* In the case of objects with a dirty flag (e.g. `MVar`s) we can
exploit the fact that only the *first* mutation requires a write
barrier.
* Weak references, as usual, complicate things. In particular, we must
ensure that the referee of a weak object is marked if dereferenced by
the mutator. For this we (unfortunately) must introduce a read
barrier, as described in Note [Concurrent read barrier on deRefWeak#]
(in `NonMovingMark.c`).
* Stable names are also a bit tricky as described in Note [Sweeping
stable names in the concurrent collector] (`NonMovingSweep.c`).
We take quite some pains to ensure that the high thread count often seen
in parallel Haskell applications doesn't affect pause times. To this end
we allow thread stacks to be marked either by the thread itself (when it
is executed or stack-underflows) or the concurrent mark thread (if the
thread owning the stack is never scheduled). There is a non-trivial
handshake to ensure that this happens without racing which is described
in Note [StgStack dirtiness flags and concurrent marking].
Co-Authored-by: Ömer Sinan Ağacan <omer@well-typed.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This implements the core heap structure and a serial mark/sweep
collector which can be used to manage the oldest-generation heap.
This is the first step towards a concurrent mark-and-sweep collector
aimed at low-latency applications.
The full design of the collector implemented here is described in detail
in a technical note
B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell
Compiler" (2018)
The basic heap structure used in this design is heavily inspired by
K. Ueno & A. Ohori. "A fully concurrent garbage collector for
functional programs on multicore processors." /ACM SIGPLAN Notices/
Vol. 51. No. 9 (presented by ICFP 2016)
This design is intended to allow both marking and sweeping
concurrent to execution of a multi-core mutator. Unlike the Ueno design,
which requires no global synchronization pauses, the collector
introduced here requires a stop-the-world pause at the beginning and end
of the mark phase.
To avoid heap fragmentation, the allocator consists of a number of
fixed-size /sub-allocators/. Each of these sub-allocators allocators into
its own set of /segments/, themselves allocated from the block
allocator. Each segment is broken into a set of fixed-size allocation
blocks (which back allocations) in addition to a bitmap (used to track
the liveness of blocks) and some additional metadata (used also used
to track liveness).
This heap structure enables collection via mark-and-sweep, which can be
performed concurrently via a snapshot-at-the-beginning scheme (although
concurrent collection is not implemented in this patch).
The mark queue is a fairly straightforward chunked-array structure.
The representation is a bit more verbose than a typical mark queue to
accomodate a combination of two features:
* a mark FIFO, which improves the locality of marking, reducing one of
the major overheads seen in mark/sweep allocators (see [1] for
details)
* the selector optimization and indirection shortcutting, which
requires that we track where we found each reference to an object
in case we need to update the reference at a later point (e.g. when
we find that it is an indirection). See Note [Origin references in
the nonmoving collector] (in `NonMovingMark.h`) for details.
Beyond this the mark/sweep is fairly run-of-the-mill.
[1] R. Garner, S.M. Blackburn, D. Frampton. "Effective Prefetch for
Mark-Sweep Garbage Collection." ISMM 2007.
Co-Authored-By: Ben Gamari <ben@well-typed.com>
|
|/
|
|
|
|
| |
This patch adds support for the s390x architecture for the LLVM code
generator. The patch includes a register mapping of STG registers onto
s390x machine registers which enables a registerised build.
|
|
|
|
|
|
|
|
| |
These are unexploded minds as far as the linter is concerned. I don't
want to hit in my MRs by mistake!
I did this with `sed`, and then rolled back some changes in the docs,
config.guess, and the linter itself.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This moves all URL references to Trac Wiki to their corresponding
GitLab counterparts.
This substitution is classified as follows:
1. Automated substitution using sed with Ben's mapping rule [1]
Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy...
New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy...
2. Manual substitution for URLs containing `#` index
Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy...#Zzz
New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy...#zzz
3. Manual substitution for strings starting with `Commentary`
Old: Commentary/XxxYyy...
New: commentary/xxx-yyy...
See also !539
[1]: https://gitlab.haskell.org/bgamari/gitlab-migration/blob/master/wiki-mapping.json
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan:
I can't validate this because of existing errors with the debug runtime. I'll
see if this introduces any new failures.
Reviewers: simonmar, bgamari, erikd
Reviewed By: simonmar
Subscribers: rwbarton, carter
Differential Revision: https://phabricator.haskell.org/D5337
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This feature has some very serious correctness issues (#14310),
introduces a great deal of complexity, and hasn't seen wide usage.
Consequently we are removing it, as proposed in Proposal #77 [1]. This
is heavily based on a patch from fryguybob.
Updates stm submodule.
[1] https://github.com/ghc-proposals/ghc-proposals/pull/77
Test Plan: Validate
Reviewers: erikd, simonmar, hvr
Reviewed By: simonmar
Subscribers: rwbarton, thomie, carter
GHC Trac Issues: #14310
Differential Revision: https://phabricator.haskell.org/D4760
|
|
|
|
| |
Our new CPP linter enforces this.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This both says what we mean and silences a bunch of spurious CPP linting
warnings. This pragma is supported by all CPP implementations which we
support.
Reviewers: austin, erikd, simonmar, hvr
Reviewed By: simonmar
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3482
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: Validate on lots of platforms
Reviewers: erikd, simonmar, austin
Reviewed By: erikd, simonmar
Subscribers: michalt, thomie
Differential Revision: https://phabricator.haskell.org/D2699
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is a fast, non-blocking, asynchronous, interface to tryPutMVar that
can be called from C/C++.
It's useful for callback-based C/C++ APIs: the idea is that the callback
invokes hs_try_putmvar(), and the Haskell code waits for the callback to
run by blocking in takeMVar.
The callback doesn't block - this is often a requirement of
callback-based APIs. The callback wakes up the Haskell thread with
minimal overhead and no unnecessary context-switches.
There are a couple of benchmarks in
testsuite/tests/concurrent/should_run. Some example results comparing
hs_try_putmvar() with using a standard foreign export:
./hs_try_putmvar003 1 64 16 100 +RTS -s -N4 0.49s
./hs_try_putmvar003 2 64 16 100 +RTS -s -N4 2.30s
hs_try_putmvar() is 4x faster for this workload (see the source for
hs_try_putmvar003.hs for details of the workload).
An alternative solution is to use the IO Manager for this. We've tried
it, but there are problems with that approach:
* Need to create a new file descriptor for each callback
* The IO Manger thread(s) become a bottleneck
* More potential for things to go wrong, e.g. throwing an exception in
an IO Manager callback kills the IO Manager thread.
Test Plan: validate; new unit tests
Reviewers: niteria, erikd, ezyang, bgamari, austin, hvr
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2501
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
ASSERT_THREADED_CAPABILITY_INVARIANTS was testing properties of the
returning_tasks queue, but that requires cap->lock to access safely.
This assertion would randomly fail if stressed enough.
Instead I've removed it from the catch-all
ASSERT_PARTIAL_CAPABILITIY_INVARIANTS and made it a separate assertion
only called under cap->lock.
Test Plan:
```
cd testsuite/tests/concurrent/should_run
make TEST=setnumcapabilities001 WAY=threaded1 EXTRA_HC_OPTS=-with-rtsopts=-DS CLEANUP=0
while true; do ./setnumcapabilities001.run/setnumcapabilities001 4 9 2000 || break; done
```
Reviewers: niteria, bgamari, ezyang, austin, erikd
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2440
GHC Trac Issues: #10860
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Knowing the length of the run queue in O(1) time is useful: for example
we don't have to traverse the run queue to know how many threads we have
to migrate in schedulePushWork().
Test Plan: validate
Reviewers: ezyang, erikd, bgamari, austin
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2437
|
|
|
|
|
| |
- Move the numaMap and nNumaNodes out of RtsFlags to Capability.c
- Add a test to tests/rts
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The aim here is to reduce the number of remote memory accesses on
systems with a NUMA memory architecture, typically multi-socket servers.
Linux provides a NUMA API for doing two things:
* Allocating memory local to a particular node
* Binding a thread to a particular node
When given the +RTS --numa flag, the runtime will
* Determine the number of NUMA nodes (N) by querying the OS
* Assign capabilities to nodes, so cap C is on node C%N
* Bind worker threads on a capability to the correct node
* Keep a separate free lists in the block layer for each node
* Allocate the nursery for a capability from node-local memory
* Allocate blocks in the GC from node-local memory
For example, using nofib/parallel/queens on a 24-core 2-socket machine:
```
$ ./Main 15 +RTS -N24 -s -A64m
Total time 173.960s ( 7.467s elapsed)
$ ./Main 15 +RTS -N24 -s -A64m --numa
Total time 150.836s ( 6.423s elapsed)
```
The biggest win here is expected to be allocating from node-local
memory, so that means programs using a large -A value (as here).
According to perf, on this program the number of remote memory accesses
were reduced by more than 50% by using `--numa`.
Test Plan:
* validate
* There's a new flag --debug-numa=<n> that pretends to do NUMA without
actually making the OS calls, which is useful for testing the code
on non-NUMA systems.
* TODO: I need to add some unit tests
Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2199
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a function takes a pointer parameter and doesn't update what
the pointer points to, we can add `const` to the parameter
declaration to document that no updates occur.
Test Plan: Validate on Linux, OS X and Windows
Reviewers: austin, Phyx, bgamari, simonmar, hsyl20
Reviewed By: bgamari, simonmar, hsyl20
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2200
|
|
|
|
|
|
|
|
|
|
|
|
| |
The `nat` type was an alias for `unsigned int` with a comment saying
it was at least 32 bits. We keep the typedef in case client code is
using it but mark it as deprecated.
Test Plan: Validated on Linux, OS X and Windows
Reviewers: simonmar, austin, thomie, hvr, bgamari, hsyl20
Differential Revision: https://phabricator.haskell.org/D2166
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows the GC to use fewer threads than the number of capabilities.
At each GC, we choose some of the capabilities to be "idle", which means
that the thread running on that capability (if any) will sleep for the
duration of the GC, and the other threads will do its work. We choose
capabilities that are already idle (if any) to be the idle capabilities.
The idea is that this helps in the following situation:
* We want to use a large -N value so as to make use of hyperthreaded
cores
* We use a large heap size, so GC is infrequent
* But we don't want to use all -N threads in the GC, because that
thrashes the memory too much.
See docs for usage.
|
|
|
|
|
|
|
|
|
|
| |
Noticed by uselex.rb:
last_free_capability: [R]: exported from:
./rts/dist/build/Capability.o
shutdownCapability: [R]: exported from:
./rts/dist/build/Capability.o
Signed-off-by: Sergei Trofimovich <siarheit@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
yieldCapability() was not prepared to be called by a Task that is not
either a worker or a bound Task. This could happen if we ended up in
yieldCapability via this call stack:
performGC()
scheduleDoGC()
requestSync()
yieldCapability()
and there were a few other ways this could happen via requestSync.
The fix is to handle this case in yieldCapability(): when the Task is
not a worker or a bound Task, we put it on the returning_workers
queue, where it will be woken up again.
Summary of changes:
* `yieldCapability`: factored out subroutine waitForWorkerCapability`
* `waitForReturnCapability` renamed to `waitForCapability`, and
factored out subroutine `waitForReturnCapability`
* `releaseCapabilityAndQueue` worker renamed to `enqueueWorker`, does
not take a lock and no longer tests if `!isBoundTask()`
* `yieldCapability` adjusted for refactorings, only change in behavior
is when it is not a worker or bound task.
Test Plan:
* new test concurrent/should_run/performGC
* validate
Reviewers: niteria, austin, ezyang, bgamari
Subscribers: thomie, bgamari
Differential Revision: https://phabricator.haskell.org/D997
GHC Trac Issues: #10545
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
clearNursery resets all the bd->free pointers of nursery blocks to
make the blocks empty. In profiles we've seen clearNursery taking
significant amounts of time particularly with large -N and -A values.
This patch moves the work of clearNursery to the point at which we
actually need the new block, thereby introducing an invariant that
blocks to the right of the CurrentNursery pointer still need their
bd->free pointer reset. This should make things faster overall,
because we don't need to clear blocks that we don't use.
Test Plan: validate
Reviewers: AndreasVoellmy, ezyang, austin
Subscribers: thomie, carter, ezyang, simonmar
Differential Revision: https://phabricator.haskell.org/D318
|
|
|
|
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
| |
This reverts commit 39b5c1cbd8950755de400933cecca7b8deb4ffcd.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This reverts commit 4748f5936fe72d96edfa17b153dbfd84f2c4c053. The fix for #9423
was reverted because this commit introduced a C function setIOManagerControlFd()
(defined in Schedule.c) defined for all OS types, while the prototype
(in includes/rts/IOManager.h) was only included when mingw32_HOST_OS is
not defined. This broke Windows builds.
This commit reverts the original commit and resolves the problem by only defining
setIOManagerControlFd() when mingw32_HOST_OS is defined. Hence the missing prototype
error should not occur on Windows.
In addition, since the io_manager_control_wr_fd field of the Capability struct is only
usd by the setIOManagerControlFd, this commit includes the io_manager_control_wr_fd
field in the Capability struct only when mingw32_HOST_OS is not defined.
Test Plan: Try to compile successfully on all platforms.
Reviewers: austin
Reviewed By: austin
Subscribers: simonmar, ezyang, carter
Differential Revision: https://phabricator.haskell.org/D174
|
|
|
|
|
|
|
|
|
| |
This should fix the Windows fallout, and hopefully this will be fixed
once that's sorted out.
This reverts commit f9f89b7884ccc8ee5047cf4fffdf2b36df6832df.
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Fix #9423.
The problem in #9423 is caused when code invoked by `hs_exit()` waits
on all foreign calls to return, but some IO managers are in `safe` foreign
calls and do not return. The previous design signaled to the timer manager
(via its control pipe) that it should "die" and when the timer manager
returned to Haskell-land, the Haskell code in timer manager then signalled
to the IO manager threads that they should return from foreign calls and
`die`. Unfortunately, in the shutdown sequence the timer manager is unable
to return to Haskell-land fast enough and so the code that signals to the
IO manager threads (via their control pipes) is never executed and the IO
manager threads remain out in the foreign calls.
This patch solves this problem by having the RTS signal to all the IO
manager threads (via their control pipes; and in addition to signalling
to the timer manager thread) that they should shutdown (in `ioManagerDie()`
in `rts/Signals.c`. To do this, we arrange for each IO manager thread to
register its control pipe with the RTS (in `GHC.Thread.startIOManagerThread`).
In addition, `GHC.Thread.startTimerManagerThread` registers its control pipe.
These are registered via C functions `setTimerManagerControlFd` (in
`rts/Signals.c`) and `setIOManagerControlFd` (in `rts/Capability.c`). The IO
manager control pipe file descriptors are stored in a new field of the
`Capability_ struct`.
Test Plan: See the notes on #9423 to recreate the problem and to verify that it no longer occurs with the fix.
Auditors: simonmar
Reviewers: simonmar, edsko, ezyang, austin
Reviewed By: austin
Subscribers: phaskell, simonmar, ezyang, carter, relrod
Differential Revision: https://phabricator.haskell.org/D129
GHC Trac Issues: #9423, #9284
|
|
|
|
|
|
|
|
| |
This will hopefully help ensure some basic consistency in the forward by
overriding buffer variables. In particular, it sets the wrap length, the
offset to 4, and turns off tabs.
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
| |
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have various problems with reallocating the array of Capabilities,
due to threads in waitForReturnCapability that are already holding a
pointer to a Capability.
Rather than add more locking to make this safer, I decided it would be
easier to ensure that we never move the Capabilities at all. The
capabilities array is now an array of pointers to Capabaility. There
are extra indirections, but it rarely matters - we don't often access
Capabilities via the array, normally we already have a pointer to
one. I ran the parallel benchmarks and didn't see any difference.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
lnat was originally "long unsigned int" but we were using it when we
wanted a 64-bit type on a 64-bit machine. This broke on Windows x64,
where long == int == 32 bits. Using types of unspecified size is bad,
but what we really wanted was a type with N bits on an N-bit machine.
StgWord is exactly that.
lnat was mentioned in some APIs that clients might be using
(e.g. StackOverflowHook()), so we leave it defined but with a comment
to say that it's deprecated.
|
|
|
|
|
|
| |
If we are interrupted to do a GC, then we do not immediately do another
one. This avoids a starvation situation where one Capability keeps
forcing a GC and the other Capabilities make no progress at all.
|
|
|
|
|
|
|
| |
In addition to the existing global method. For now we just do
it both ways and assert they give the same grand total. At some
stage we can simplify the global method to just take the sum of
the per-cap counters.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
allocator.
Prompted by a benchmark posted to parallel-haskell@haskell.org by
Andreas Voellmy <andreas.voellmy@gmail.com>. This program exhibits
contention for the block allocator when run with -N2 and greater
without the fix:
{-# LANGUAGE MagicHash, UnboxedTuples, BangPatterns #-}
module Main where
import Control.Monad
import Control.Concurrent
import System.Environment
import GHC.IO
import GHC.Exts
import GHC.Conc
main = do
[m] <- fmap (fmap read) getArgs
n <- getNumCapabilities
ms <- replicateM n newEmptyMVar
sequence [ forkIO $ busyWorkerB (m `quot` n) >> putMVar mv () | mv <- ms ]
mapM takeMVar ms
busyWorkerB :: Int -> IO ()
busyWorkerB n_loops = go 0
where go !n | n >= n_loops = return ()
| otherwise =
do p <- (IO $ \s ->
case newPinnedByteArray# 1024# s of
{ (# s', mbarr# #) ->
(# s', () #)
}
)
go (n+1)
|