summaryrefslogtreecommitdiff
path: root/rts/Schedule.c
Commit message (Collapse)AuthorAgeFilesLines
* Finish stable splitDavid Feuer2018-08-291-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Long ago, the stable name table and stable pointer tables were one. Now, they are separate, and have significantly different implementations. I believe the time has come to finish the split that began in #7674. * Divide `rts/Stable` into `rts/StableName` and `rts/StablePtr`. * Give each table its own mutex. * Add FFI functions `hs_lock_stable_ptr_table` and `hs_unlock_stable_ptr_table` and document them. These are intended to replace the previously undocumented `hs_lock_stable_tables` and `hs_lock_stable_tables`, which are now documented as deprecated synonyms. * Make `eqStableName#` use pointer equality instead of unnecessarily comparing stable name table indices. Reviewers: simonmar, bgamari, erikd Reviewed By: bgamari Subscribers: rwbarton, carter GHC Trac Issues: #15555 Differential Revision: https://phabricator.haskell.org/D5084
* rts: Fix a var name in a comment, fix a typoÖmer Sinan Ağacan2018-06-121-1/+1
|
* Schedule.c: remove some unused parametersÖmer Sinan Ağacan2018-04-111-13/+13
|
* Schedule.c: remove unused codeÖmer Sinan Ağacan2018-04-111-13/+0
|
* Remove PARALLEL_HASKELL commentsÖmer Sinan Ağacan2018-04-101-2/+0
| | | | | | PARALLEL_HASKELL was long gone, remove references [skip ci]
* Run C finalizers incrementally during mutationSimon Marlow2018-03-251-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | With a large heap it's possible to build up a lot of finalizers between GCs. We've observed GC spending up to 50% of its time running finalizers. But there's no reason we have to run finalizers during GC, and especially no reason we have to block *all* the mutator threads while *one* GC thread runs finalizers one by one. I thought about a bunch of alternative ways to handle this, which are documented along with runSomeFinalizers() in Weak.c. The approach I settled on is to have a capability run finalizers if it is idle. So running finalizers is like a low-priority background thread. This requires some minor scheduler changes, but not much. In the future we might be able to move more GC work into here (I have my eye on freeing large blocks, for example). Test Plan: * validate * tested on our system and saw reductions in GC pauses of 40-50%. Reviewers: bgamari, niteria, osa1, erikd Reviewed By: bgamari, osa1 Subscribers: rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D4521
* Schedule.c: remove unreachable code blockÖmer Sinan Ağacan2018-03-071-7/+0
|
* Schedule.c: remove a redundant CPP guardÖmer Sinan Ağacan2018-03-061-2/+0
| | | | (the CPP guard is already wrapped with the same guard in line 1549)
* rts: Note functions which must take all_tasks_mutex.Ben Gamari2018-03-021-0/+1
|
* forkProcess: fix task mutex release orderÖmer Sinan Ağacan2018-03-021-4/+4
| | | | | | | | | | | | | | `all_tasks_mutex` should be released before calling `releaseCapability_` in the parent process as `releaseCapability_` spawns worker tasks in some cases. Reviewers: bgamari, erikd, simonmar Subscribers: rwbarton, thomie, carter GHC Trac Issues: #14538 Differential Revision: https://phabricator.haskell.org/D4460
* rts: fix some barf format specifiers.Douglas Wilson2018-02-061-1/+1
| | | | | | | | | | Reviewers: bgamari, erikd, simonmar Reviewed By: bgamari Subscribers: rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D4390
* rts: Add format attribute to barfBen Gamari2018-02-061-1/+1
| | | | | | | | | | | | Test Plan: Validate Reviewers: erikd, simonmar Reviewed By: simonmar Subscribers: rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D4374
* Fix a missing getNewNursery(), and related cleanupSimon Marlow2017-07-181-16/+3
| | | | | | | | | | | | | | | | | | | | Summary: When we use nursery chunks with +RTS -n<size>, when the current nursery runs out we have to check whether there's another chunk available with getNewNursery(). There was one place we weren't doing this: the ad-hoc heap check in scheduleProcessInbox(). The impact of the bug was that we would GC too early when using nursery chunks, especially in programs that used messages (throwTo between capabilities could do this, also hs_try_putmvar()). Test Plan: validate, also local testing in our application Reviewers: bgamari, niteria, austin, erikd Subscribers: rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3749
* rts: Ensure that new capability count is > 0Ben Gamari2017-06-181-1/+7
| | | | | The Haskell wrapper already checks this but we should also check it in the RTS to catch non-Haskell callers. See #13832.
* Fix a lost-wakeup bug in BLACKHOLE handling (#13751)Simon Marlow2017-06-081-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The problem occurred when * Threads A & B evaluate the same thunk * Thread A context-switches, so the thunk gets blackholed * Thread C enters the blackhole, creates a BLOCKING_QUEUE attached to the blackhole and thread A's `tso->bq` queue * Thread B updates the blackhole with a value, overwriting the BLOCKING_QUEUE * We GC, replacing A's update frame with stg_enter_checkbh * Throw an exception in A, which ignores the stg_enter_checkbh frame Now we have C blocked on A's tso->bq queue, but we forgot to check the queue because the stg_enter_checkbh frame has been thrown away by the exception. The solution and alternative designs are discussed in Note [upd-black-hole]. This also exposed a bug in the interpreter, whereby we were sometimes context-switching without calling `threadPaused()`. I've fixed this and added some Notes. Test Plan: * `cd testsuite/tests/concurrent && make slow` * validate Reviewers: niteria, bgamari, austin, erikd Reviewed By: erikd Subscribers: rwbarton, thomie GHC Trac Issues: #13751 Differential Revision: https://phabricator.haskell.org/D3630
* Typos [ci skip]Gabor Greif2017-05-101-1/+1
|
* Prefer #if defined to #ifdefBen Gamari2017-04-281-31/+31
| | | | Our new CPP linter enforces this.
* Enable new warning for fragile/incorrect CPP #if usageErik de Castro Lopo2017-04-281-8/+8
| | | | | | | | | | | | | | | | The C code in the RTS now gets built with `-Wundef` and the Haskell code (stages 1 and 2 only) with `-Wcpp-undef`. We now get warnings whereever `#if` is used on undefined identifiers. Test Plan: Validate on Linux and Windows Reviewers: austin, angerman, simonmar, bgamari, Phyx Reviewed By: bgamari Subscribers: thomie, snowleopard Differential Revision: https://phabricator.haskell.org/D3278
* Revert "Enable new warning for fragile/incorrect CPP #if usage"Ben Gamari2017-04-051-8/+8
| | | | | | | | This is causing too much platform dependent breakage at the moment. We will need a more rigorous testing strategy before this can be merged again. This reverts commit 7e340c2bbf4a56959bd1e95cdd1cfdb2b7e537c2.
* Enable new warning for fragile/incorrect CPP #if usageErik de Castro Lopo2017-04-051-8/+8
| | | | | | | | | | | | | | | | The C code in the RTS now gets built with `-Wundef` and the Haskell code (stages 1 and 2 only) with `-Wcpp-undef`. We now get warnings whereever `#if` is used on undefined identifiers. Test Plan: Validate on Linux and Windows Reviewers: austin, angerman, simonmar, bgamari, Phyx Reviewed By: bgamari Subscribers: thomie, snowleopard Differential Revision: https://phabricator.haskell.org/D3278
* rts: print incorrect prev_what_nextSergei Trofimovich2017-04-011-1/+1
| | | | | | | | | | Moritz Angermann reports mysterious rts crash: A: link: internal error: schedule: invalid what_next field A: (GHC version 8.3.20170321 for arm_none_linux_android) This change prints actual prev_what_next value. Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
* Fix comment (old file names) in rts/Takenobu Tani2017-02-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [skip ci] There ware some old file names (.lhs, ...) at comments. * rts/win32/ThrIOManager.c - Conc.lhs -> Conc.hs * rts/PrimOps.cmm - ByteCodeLink.lhs -> ByteCodeLink.hs - StgMiscClosures.hc -> StgMiscClosures.cmm * rts/AutoApply.h - AutoApply.hc -> AutoApply.cmm * rts/HeapStackCheck.cmm - PrimOps.hc -> PrimOps.cmm * rts/LdvProfile.h - Updates.hc -> Updates.cmm * rts/Schedule.c - StgStartup.hc -> StgStartup.cmm * rts/Weak.c - StgMiscClosures.hc -> StgMiscClosures.cmm Reviewers: bgamari, austin, erikd, simonmar Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D3075
* Throw an exception on heap overflowDemi Obenour2017-01-101-19/+49
| | | | | | | | | | | | | | | | | This changes heap overflow to throw a HeapOverflow exception instead of killing the process. Test Plan: GHC CI Reviewers: simonmar, austin, hvr, erikd, bgamari Reviewed By: simonmar, bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2790 GHC Trac Issues: #1791
* Install toplevel handler inside fork.Alexander Vershilov2016-12-021-1/+4
| | | | | | | | | | | | | | | | | | | | When rts is forked it doesn't update toplevel handler, so UserInterrupt exception is sent to Thread1 that doesn't exist in forked process. We install toplevel handler when fork so signal will be delivered to the new main thread. Fixes #12903 Reviewers: simonmar, austin, erikd, bgamari Reviewed By: bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2770 GHC Trac Issues: #12903
* Use C99's boolBen Gamari2016-11-291-77/+77
| | | | | | | | | | | | Test Plan: Validate on lots of platforms Reviewers: erikd, simonmar, austin Reviewed By: erikd, simonmar Subscribers: michalt, thomie Differential Revision: https://phabricator.haskell.org/D2699
* Spelling in comment onlyGabor Greif2016-11-181-1/+1
|
* Fix a bug in parallel GC synchronisationSimon Marlow2016-10-291-14/+16
| | | | | | | | | | | | | | | | | | | | | Summary: The problem boils down to global variables: in particular gc_threads[], which was being modified by a subsequent GC before the previous GC had finished with it. The fix is to not use global variables. This was causing setnumcapabilities001 to fail (again!). It's an old bug though. Test Plan: Ran setnumcapabilities001 in a loop for a couple of hours. Before this patch it had been failing after a few minutes. Not a very scientific test, but it's the best I have. Reviewers: bgamari, austin, fryguybob, niteria, erikd Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2654
* Make it possible to use +RTS -qn without -NSimon Marlow2016-10-281-1/+5
| | | | | | It's entirely reasonable to set +RTS -qn without setting -N, because the program might later call setNumCapabilities. If we disallow it, there's no way to use -qn on programs that use setNumCapabilities.
* Fix failure in setnumcapabilities001 (#12728)Simon Marlow2016-10-221-27/+26
| | | | | | | | | | | | | | | | | | | | | | | | | The value of enabled_capabilities can change across a call to requestSync(), and we were erroneously using an old value, causing things to go wrong later. It manifested as an assertion failure, I'm not sure whether there are worse consequences or not, but we should get this fix into 8.0.2 anyway. The failure didn't happen for me because it only shows up on machines with fewer than 4 processors, due to the new logic to enable -qn automatically. I've bumped the test parameter 8 to make it more likely to exercise that code. Test Plan: Ran setnumcapabilities001 many times Reviewers: niteria, austin, erikd, rwbarton, bgamari Reviewed By: bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2617 GHC Trac Issues: #12728
* Default +RTS -qn to the number of coresSimon Marlow2016-10-091-3/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Setting a -N value that is too large has a dramatic negative effect on performance, but the new -qn flag can mitigate the worst of the effects by limiting the number of GC threads. So now, if you don't explcitly set +RTS -qn, and you set -N larger than the number of cores (or use setNumCapabilities to do the same), we'll default -qn to the number of cores. These are the results from nofib/parallel on my 4-core (2 cores x 2 threads) i7 laptop, comparing -N8 before and after this change. ``` ------------------------------------------------------------------------ Program Size Allocs Runtime Elapsed TotalMem ------------------------------------------------------------------------ blackscholes +0.0% +0.0% -72.5% -72.0% +9.5% coins +0.0% -0.0% -73.7% -72.2% -0.8% mandel +0.0% +0.0% -76.4% -75.4% +3.3% matmult +0.0% +15.5% -26.8% -33.4% +1.0% nbody +0.0% +2.4% +0.7% 0.076 0.0% parfib +0.0% -8.5% -33.2% -31.5% +2.0% partree +0.0% -0.0% -60.4% -56.8% +5.7% prsa +0.0% -0.0% -65.4% -60.4% 0.0% queens +0.0% +0.2% -58.8% -58.8% -1.5% ray +0.0% -1.5% -88.7% -85.6% -3.6% sumeuler +0.0% -0.0% -47.8% -46.9% 0.0% ------------------------------------------------------------------------ Min +0.0% -8.5% -88.7% -85.6% -3.6% Max +0.0% +15.5% +0.7% -31.5% +9.5% Geometric Mean +0.0% +0.6% -61.4% -63.1% +1.4% ``` Test Plan: validate, nofib/parallel benchmarks Reviewers: niteria, ezyang, nh2, austin, erikd, trofi, bgamari Reviewed By: trofi, bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2580 GHC Trac Issues: #9221
* Add hs_try_putmvar()Simon Marlow2016-09-121-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This is a fast, non-blocking, asynchronous, interface to tryPutMVar that can be called from C/C++. It's useful for callback-based C/C++ APIs: the idea is that the callback invokes hs_try_putmvar(), and the Haskell code waits for the callback to run by blocking in takeMVar. The callback doesn't block - this is often a requirement of callback-based APIs. The callback wakes up the Haskell thread with minimal overhead and no unnecessary context-switches. There are a couple of benchmarks in testsuite/tests/concurrent/should_run. Some example results comparing hs_try_putmvar() with using a standard foreign export: ./hs_try_putmvar003 1 64 16 100 +RTS -s -N4 0.49s ./hs_try_putmvar003 2 64 16 100 +RTS -s -N4 2.30s hs_try_putmvar() is 4x faster for this workload (see the source for hs_try_putmvar003.hs for details of the workload). An alternative solution is to use the IO Manager for this. We've tried it, but there are problems with that approach: * Need to create a new file descriptor for each callback * The IO Manger thread(s) become a bottleneck * More potential for things to go wrong, e.g. throwing an exception in an IO Manager callback kills the IO Manager thread. Test Plan: validate; new unit tests Reviewers: niteria, erikd, ezyang, bgamari, austin, hvr Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2501
* Another try to get thread migration rightSimon Marlow2016-08-051-99/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This is surprisingly tricky. There were linked list bugs in the previous version (D2430) that showed up as a test failure in setnumcapabilities001 (that's a great stress test!). This new version uses a different strategy that doesn't suffer from the problem that @ezyang pointed out in D2430. We now pre-calculate how many threads to keep for this capability, and then migrate any surplus threads off the front of the queue, taking care to account for threads that can't be migrated. Test Plan: 1. setnumcapabilities001 stress test with sanity checking (+RTS -DS) turned on: ``` cd testsuite/tests/concurrent/should_run make TEST=setnumcapabilities001 WAY=threaded1 EXTRA_HC_OPTS=-with-rtsopts=-DS CLEANUP=0 while true; do ./setnumcapabilities001.run/setnumcapabilities001 4 9 2000 || break; done ``` 2. The test case from #12419 Reviewers: niteria, ezyang, rwbarton, austin, bgamari, erikd Subscribers: thomie, ezyang Differential Revision: https://phabricator.haskell.org/D2441 GHC Trac Issues: #12419
* Fix to thread migrationSimon Marlow2016-08-031-24/+63
| | | | | | | | | | | | | | | | | | | | | | Summary: If we had 2 threads on the run queue, say [A,B], and B is bound to the current Task, then we would fail to migrate any threads. This fixes it so that we would migrate A in that case. This will help parallelism a bit in programs that have lots of bound threads. Test Plan: Test program in #12419, which is actually not a great program but it does behave a bit better after this change. Reviewers: ezyang, niteria, bgamari, austin, erikd Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2430 GHC Trac Issues: #12419
* Track the lengths of the thread queuesSimon Marlow2016-08-031-22/+13
| | | | | | | | | | | | | | | Summary: Knowing the length of the run queue in O(1) time is useful: for example we don't have to traverse the run queue to know how many threads we have to migrate in schedulePushWork(). Test Plan: validate Reviewers: ezyang, erikd, bgamari, austin Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2437
* Move stat_startGCSyncBartosz Nitka2016-07-271-0/+2
| | | | | | | | | | | | | | @simonmar told me that it makes more sense this way. Test Plan: it still builds Reviewers: bgamari, austin, simonmar, erikd Reviewed By: simonmar, erikd Subscribers: thomie, simonmar Differential Revision: https://phabricator.haskell.org/D2428
* Fix double-free in T5644 (#12208)Simon Marlow2016-06-201-2/+2
|
* NUMA supportSimon Marlow2016-06-101-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The aim here is to reduce the number of remote memory accesses on systems with a NUMA memory architecture, typically multi-socket servers. Linux provides a NUMA API for doing two things: * Allocating memory local to a particular node * Binding a thread to a particular node When given the +RTS --numa flag, the runtime will * Determine the number of NUMA nodes (N) by querying the OS * Assign capabilities to nodes, so cap C is on node C%N * Bind worker threads on a capability to the correct node * Keep a separate free lists in the block layer for each node * Allocate the nursery for a capability from node-local memory * Allocate blocks in the GC from node-local memory For example, using nofib/parallel/queens on a 24-core 2-socket machine: ``` $ ./Main 15 +RTS -N24 -s -A64m Total time 173.960s ( 7.467s elapsed) $ ./Main 15 +RTS -N24 -s -A64m --numa Total time 150.836s ( 6.423s elapsed) ``` The biggest win here is expected to be allocating from node-local memory, so that means programs using a large -A value (as here). According to perf, on this program the number of remote memory accesses were reduced by more than 50% by using `--numa`. Test Plan: * validate * There's a new flag --debug-numa=<n> that pretends to do NUMA without actually making the OS calls, which is useful for testing the code on non-NUMA systems. * TODO: I need to add some unit tests Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2199
* rts: More const correct-ness fixesErik de Castro Lopo2016-05-181-4/+4
| | | | | | | | | | | | | | | | | | | | In addition to more const-correctness fixes this patch fixes an infelicity of the previous const-correctness patch (995cf0f356) which left `UNTAG_CLOSURE` taking a `const StgClosure` pointer parameter but returning a non-const pointer. Here we restore the original type signature of `UNTAG_CLOSURE` and add a new function `UNTAG_CONST_CLOSURE` which takes and returns a const `StgClosure` pointer and uses that wherever possible. Test Plan: Validate on Linux, OS X and Windows Reviewers: Phyx, hsyl20, bgamari, austin, simonmar, trofi Reviewed By: simonmar, trofi Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2231
* Fix ASSERT failure and re-enable setnumcapabilities001Simon Marlow2016-05-111-2/+5
| | | | | | | | | | | | The assertion failure was fairly benign, I think, but this fixes it. I've been running the test repeatedly for the last 30 mins and it hasn't triggered. There are other problems exposed by this test (see #12038), but I've worked around those in the test itself for now. I also copied the relevant bits of the parallel library here so that we don't need parallel for the test to run.
* Fix a crash in requestSync()Simon Marlow2016-05-101-9/+14
| | | | | | | | It was possible for a thread to read invalid memory after a conflict when multiple threads were synchronising. I haven't been successful in constructing a test case that triggers this, but we have some internal code that ran into it.
* rts: Replace `nat` with `uint32_t`Erik de Castro Lopo2016-05-051-26/+26
| | | | | | | | | | | | The `nat` type was an alias for `unsigned int` with a comment saying it was at least 32 bits. We keep the typedef in case client code is using it but mark it as deprecated. Test Plan: Validated on Linux, OS X and Windows Reviewers: simonmar, austin, thomie, hvr, bgamari, hsyl20 Differential Revision: https://phabricator.haskell.org/D2166
* schedulePushWork: avoid unnecessary wakeupsSimon Marlow2016-05-041-7/+25
| | | | | | | | | | | | | | | | | | | | This function had some pathalogically bad behaviour: if we had 2 threads on the current capability and 23 other idle capabilities, we would * grab all 23 capabilities * migrate one Haskell thread to one of them * wake up a worker on *all* 23 other capabilities. This lead to a lot of unnecessary wakeups when using large -N values. Now, we * Count how many capabilities we need to wake up * Start from cap->no+1, so that we don't overload low-numbered capabilities * Only wake up capabilities that we migrated a thread to (unless we have sparks to steal) This results in a pretty dramatic improvement in our production system.
* Allow limiting the number of GC threads (+RTS -qn<n>)Simon Marlow2016-05-041-85/+192
| | | | | | | | | | | | | | | | | | This allows the GC to use fewer threads than the number of capabilities. At each GC, we choose some of the capabilities to be "idle", which means that the thread running on that capability (if any) will sleep for the duration of the GC, and the other threads will do its work. We choose capabilities that are already idle (if any) to be the idle capabilities. The idea is that this helps in the following situation: * We want to use a large -N value so as to make use of hyperthreaded cores * We use a large heap size, so GC is infrequent * But we don't want to use all -N threads in the GC, because that thrashes the memory too much. See docs for usage.
* Refactor comments about shutdownSimon Marlow2016-04-101-15/+19
|
* rts: mark 'removeFromRunQueue' as staticSergei Trofimovich2016-02-071-1/+1
| | | | | | | | Noticed by uselex.rb: removeFromRunQueue: [R]: exported from: ./rts/dist/build/Schedule.o Signed-off-by: Sergei Trofimovich <siarheit@google.com>
* RTS: Rename InCall.stat struct field to .rstatHerbert Valerio Riedel2015-12-041-6/+6
| | | | | | | | | | | On AIX, C system headers can redirect the token `stat` via #define stat stat64 to provide large-file support. Simply avoiding the use of `stat` as an identifier eschews macro-replacement. Differential Revision: https://phabricator.haskell.org/D1566
* rts/Schedule.c: remove unused variableÖmer Sinan Ağacan2015-10-221-6/+0
| | | | Differential Revision: https://phabricator.haskell.org/D1348
* Don't get a new nursery if we exceeded large_alloc_limSimon Marlow2015-07-151-12/+19
| | | | | | | | | | | | | | | Summary: When using nursery chunks, if we failed a heap check due to large_alloc_lim, we would pointlessly keep grabbing new nursery chunks when we should just GC immediately. Test Plan: validate Reviewers: austin, bgamari, niteria Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1072
* Delete the WayPar wayThomas Miedema2015-07-101-25/+2
| | | | | | Also remove 't' and 's' from ALL_WAYS; they don't exist. Differential Revision: https://phabricator.haskell.org/D1055
* Fix deadlock (#10545)Simon Marlow2015-06-261-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | yieldCapability() was not prepared to be called by a Task that is not either a worker or a bound Task. This could happen if we ended up in yieldCapability via this call stack: performGC() scheduleDoGC() requestSync() yieldCapability() and there were a few other ways this could happen via requestSync. The fix is to handle this case in yieldCapability(): when the Task is not a worker or a bound Task, we put it on the returning_workers queue, where it will be woken up again. Summary of changes: * `yieldCapability`: factored out subroutine waitForWorkerCapability` * `waitForReturnCapability` renamed to `waitForCapability`, and factored out subroutine `waitForReturnCapability` * `releaseCapabilityAndQueue` worker renamed to `enqueueWorker`, does not take a lock and no longer tests if `!isBoundTask()` * `yieldCapability` adjusted for refactorings, only change in behavior is when it is not a worker or bound task. Test Plan: * new test concurrent/should_run/performGC * validate Reviewers: niteria, austin, ezyang, bgamari Subscribers: thomie, bgamari Differential Revision: https://phabricator.haskell.org/D997 GHC Trac Issues: #10545