summaryrefslogtreecommitdiff
path: root/rts/SMPClosureOps.h
Commit message (Collapse)AuthorAgeFilesLines
* rts: Introduce getNumCapabilitiesBen Gamari2022-12-161-3/+5
| | | | | | And ensure accesses to n_capabilities are atomic (although with relaxed ordering). This is necessary as RTS API callers may concurrently call into the RTS without holding a capability.
* rts: Use fences instead of explicit barriersBen Gamari2022-12-161-2/+1
|
* Merge branch 'wip/tsan/misc' into wip/tsan/allBen Gamari2020-11-011-2/+2
|\
| * rts: Use relaxed atomics for whitehole spin statsBen Gamari2020-10-241-2/+2
| |
* | rts: Use RELEASE ordering in unlockClosureBen Gamari2020-10-301-3/+2
|/
* Fix deadlock between STM and throwToSimon Marlow2018-07-121-9/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | There was a lock-order reversal between lockTSO() and the TVar lock, see #15136 for the details. It turns out we can fix this pretty easily by just deleting all the locking code(!). The principle for unblocking a `BlockedOnSTM` thread then becomes the same as for other kinds of blocking: if the TSO belongs to this capability then we do it directly, otherwise we send a message to the capability that owns the TSO. That is, a thread blocked on STM is owned by its capability, as it should be. The possible downside of this is that we might send multiple messages to wake up a thread when the thread is on another capability. This is safe, it's just not very efficient. I'll try to do some experiments to see if this is a problem. Test Plan: Test case from #15136 doesn't deadlock any more. Reviewers: bgamari, osa1, erikd Reviewed By: osa1 Subscribers: rwbarton, thomie, carter GHC Trac Issues: #15136 Differential Revision: https://phabricator.haskell.org/D4956
* rts: Add --internal-counters RTS flag and several countersDouglas Wilson2018-03-191-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The existing internal counters: * gc_alloc_block_sync * whitehole_spin * gen[g].sync * gen[1].sync are now not shown in the -s report unless --internal-counters is also passed. If --internal-counters is passed we now show the counters above, reformatted, as well as several other counters. In particular, we now count the yieldThread() calls that SpinLocks do as well as their spins. The added counters are: * gc_spin (spin and yield) * mut_spin (spin and yield) * whitehole_threadPaused (spin only) * whitehole_executeMessage (spin only) * whitehole_lockClosure (spin only) * waitForGcThreadsd (spin and yield) As well as the following, which are not SpinLock-like things: * any_work * do_work * scav_find_work See the Note for descriptions of what these counters are. We add busy_wait_nops in these loops along with the counter increment where it was absent. Old internal counters output: ``` gc_alloc_block_sync: 0 whitehole_gc_spin: 0 gen[0].sync: 0 gen[1].sync: 0 ``` New internal counters output: ``` Internal Counters: Spins Yields gc_alloc_block_sync 323 0 gc_spin 9016713 752 mut_spin 57360944 47716 whitehole_gc 0 n/a whitehole_threadPaused 0 n/a whitehole_executeMessage 0 n/a whitehole_lockClosure 0 0 waitForGcThreads 2 415 gen[0].sync 6 0 gen[1].sync 1 0 any_work 2017 no_work 2014 scav_find_work 1004 ``` Test Plan: ./validate Check it builds with #define PROF_SPIN removed from includes/rts/Config.h Reviewers: bgamari, erikd, simonmar, hvr Reviewed By: simonmar Subscribers: rwbarton, thomie, carter GHC Trac Issues: #3553, #9221 Differential Revision: https://phabricator.haskell.org/D4302
* Prefer #if defined to #ifdefBen Gamari2017-04-281-1/+1
| | | | Our new CPP linter enforces this.
* cpp: Use #pragma once instead of #ifndef guardsBen Gamari2017-04-231-4/+1
| | | | | | | | | | | | | | This both says what we mean and silences a bunch of spurious CPP linting warnings. This pragma is supported by all CPP implementations which we support. Reviewers: austin, erikd, simonmar, hvr Reviewed By: simonmar Subscribers: rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3482
* NUMA supportSimon Marlow2016-06-101-0/+129
Summary: The aim here is to reduce the number of remote memory accesses on systems with a NUMA memory architecture, typically multi-socket servers. Linux provides a NUMA API for doing two things: * Allocating memory local to a particular node * Binding a thread to a particular node When given the +RTS --numa flag, the runtime will * Determine the number of NUMA nodes (N) by querying the OS * Assign capabilities to nodes, so cap C is on node C%N * Bind worker threads on a capability to the correct node * Keep a separate free lists in the block layer for each node * Allocate the nursery for a capability from node-local memory * Allocate blocks in the GC from node-local memory For example, using nofib/parallel/queens on a 24-core 2-socket machine: ``` $ ./Main 15 +RTS -N24 -s -A64m Total time 173.960s ( 7.467s elapsed) $ ./Main 15 +RTS -N24 -s -A64m --numa Total time 150.836s ( 6.423s elapsed) ``` The biggest win here is expected to be allocating from node-local memory, so that means programs using a large -A value (as here). According to perf, on this program the number of remote memory accesses were reduced by more than 50% by using `--numa`. Test Plan: * validate * There's a new flag --debug-numa=<n> that pretends to do NUMA without actually making the OS calls, which is useful for testing the code on non-NUMA systems. * TODO: I need to add some unit tests Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2199