summaryrefslogtreecommitdiff
path: root/rts/Schedule.c
Commit message (Collapse)AuthorAgeFilesLines
* Install toplevel handler inside fork.Alexander Vershilov2016-12-021-1/+4
| | | | | | | | | | | | | | | | | | | | When rts is forked it doesn't update toplevel handler, so UserInterrupt exception is sent to Thread1 that doesn't exist in forked process. We install toplevel handler when fork so signal will be delivered to the new main thread. Fixes #12903 Reviewers: simonmar, austin, erikd, bgamari Reviewed By: bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2770 GHC Trac Issues: #12903
* Use C99's boolBen Gamari2016-11-291-77/+77
| | | | | | | | | | | | Test Plan: Validate on lots of platforms Reviewers: erikd, simonmar, austin Reviewed By: erikd, simonmar Subscribers: michalt, thomie Differential Revision: https://phabricator.haskell.org/D2699
* Spelling in comment onlyGabor Greif2016-11-181-1/+1
|
* Fix a bug in parallel GC synchronisationSimon Marlow2016-10-291-14/+16
| | | | | | | | | | | | | | | | | | | | | Summary: The problem boils down to global variables: in particular gc_threads[], which was being modified by a subsequent GC before the previous GC had finished with it. The fix is to not use global variables. This was causing setnumcapabilities001 to fail (again!). It's an old bug though. Test Plan: Ran setnumcapabilities001 in a loop for a couple of hours. Before this patch it had been failing after a few minutes. Not a very scientific test, but it's the best I have. Reviewers: bgamari, austin, fryguybob, niteria, erikd Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2654
* Make it possible to use +RTS -qn without -NSimon Marlow2016-10-281-1/+5
| | | | | | It's entirely reasonable to set +RTS -qn without setting -N, because the program might later call setNumCapabilities. If we disallow it, there's no way to use -qn on programs that use setNumCapabilities.
* Fix failure in setnumcapabilities001 (#12728)Simon Marlow2016-10-221-27/+26
| | | | | | | | | | | | | | | | | | | | | | | | | The value of enabled_capabilities can change across a call to requestSync(), and we were erroneously using an old value, causing things to go wrong later. It manifested as an assertion failure, I'm not sure whether there are worse consequences or not, but we should get this fix into 8.0.2 anyway. The failure didn't happen for me because it only shows up on machines with fewer than 4 processors, due to the new logic to enable -qn automatically. I've bumped the test parameter 8 to make it more likely to exercise that code. Test Plan: Ran setnumcapabilities001 many times Reviewers: niteria, austin, erikd, rwbarton, bgamari Reviewed By: bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2617 GHC Trac Issues: #12728
* Default +RTS -qn to the number of coresSimon Marlow2016-10-091-3/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Setting a -N value that is too large has a dramatic negative effect on performance, but the new -qn flag can mitigate the worst of the effects by limiting the number of GC threads. So now, if you don't explcitly set +RTS -qn, and you set -N larger than the number of cores (or use setNumCapabilities to do the same), we'll default -qn to the number of cores. These are the results from nofib/parallel on my 4-core (2 cores x 2 threads) i7 laptop, comparing -N8 before and after this change. ``` ------------------------------------------------------------------------ Program Size Allocs Runtime Elapsed TotalMem ------------------------------------------------------------------------ blackscholes +0.0% +0.0% -72.5% -72.0% +9.5% coins +0.0% -0.0% -73.7% -72.2% -0.8% mandel +0.0% +0.0% -76.4% -75.4% +3.3% matmult +0.0% +15.5% -26.8% -33.4% +1.0% nbody +0.0% +2.4% +0.7% 0.076 0.0% parfib +0.0% -8.5% -33.2% -31.5% +2.0% partree +0.0% -0.0% -60.4% -56.8% +5.7% prsa +0.0% -0.0% -65.4% -60.4% 0.0% queens +0.0% +0.2% -58.8% -58.8% -1.5% ray +0.0% -1.5% -88.7% -85.6% -3.6% sumeuler +0.0% -0.0% -47.8% -46.9% 0.0% ------------------------------------------------------------------------ Min +0.0% -8.5% -88.7% -85.6% -3.6% Max +0.0% +15.5% +0.7% -31.5% +9.5% Geometric Mean +0.0% +0.6% -61.4% -63.1% +1.4% ``` Test Plan: validate, nofib/parallel benchmarks Reviewers: niteria, ezyang, nh2, austin, erikd, trofi, bgamari Reviewed By: trofi, bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2580 GHC Trac Issues: #9221
* Add hs_try_putmvar()Simon Marlow2016-09-121-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This is a fast, non-blocking, asynchronous, interface to tryPutMVar that can be called from C/C++. It's useful for callback-based C/C++ APIs: the idea is that the callback invokes hs_try_putmvar(), and the Haskell code waits for the callback to run by blocking in takeMVar. The callback doesn't block - this is often a requirement of callback-based APIs. The callback wakes up the Haskell thread with minimal overhead and no unnecessary context-switches. There are a couple of benchmarks in testsuite/tests/concurrent/should_run. Some example results comparing hs_try_putmvar() with using a standard foreign export: ./hs_try_putmvar003 1 64 16 100 +RTS -s -N4 0.49s ./hs_try_putmvar003 2 64 16 100 +RTS -s -N4 2.30s hs_try_putmvar() is 4x faster for this workload (see the source for hs_try_putmvar003.hs for details of the workload). An alternative solution is to use the IO Manager for this. We've tried it, but there are problems with that approach: * Need to create a new file descriptor for each callback * The IO Manger thread(s) become a bottleneck * More potential for things to go wrong, e.g. throwing an exception in an IO Manager callback kills the IO Manager thread. Test Plan: validate; new unit tests Reviewers: niteria, erikd, ezyang, bgamari, austin, hvr Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2501
* Another try to get thread migration rightSimon Marlow2016-08-051-99/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This is surprisingly tricky. There were linked list bugs in the previous version (D2430) that showed up as a test failure in setnumcapabilities001 (that's a great stress test!). This new version uses a different strategy that doesn't suffer from the problem that @ezyang pointed out in D2430. We now pre-calculate how many threads to keep for this capability, and then migrate any surplus threads off the front of the queue, taking care to account for threads that can't be migrated. Test Plan: 1. setnumcapabilities001 stress test with sanity checking (+RTS -DS) turned on: ``` cd testsuite/tests/concurrent/should_run make TEST=setnumcapabilities001 WAY=threaded1 EXTRA_HC_OPTS=-with-rtsopts=-DS CLEANUP=0 while true; do ./setnumcapabilities001.run/setnumcapabilities001 4 9 2000 || break; done ``` 2. The test case from #12419 Reviewers: niteria, ezyang, rwbarton, austin, bgamari, erikd Subscribers: thomie, ezyang Differential Revision: https://phabricator.haskell.org/D2441 GHC Trac Issues: #12419
* Fix to thread migrationSimon Marlow2016-08-031-24/+63
| | | | | | | | | | | | | | | | | | | | | | Summary: If we had 2 threads on the run queue, say [A,B], and B is bound to the current Task, then we would fail to migrate any threads. This fixes it so that we would migrate A in that case. This will help parallelism a bit in programs that have lots of bound threads. Test Plan: Test program in #12419, which is actually not a great program but it does behave a bit better after this change. Reviewers: ezyang, niteria, bgamari, austin, erikd Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2430 GHC Trac Issues: #12419
* Track the lengths of the thread queuesSimon Marlow2016-08-031-22/+13
| | | | | | | | | | | | | | | Summary: Knowing the length of the run queue in O(1) time is useful: for example we don't have to traverse the run queue to know how many threads we have to migrate in schedulePushWork(). Test Plan: validate Reviewers: ezyang, erikd, bgamari, austin Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2437
* Move stat_startGCSyncBartosz Nitka2016-07-271-0/+2
| | | | | | | | | | | | | | @simonmar told me that it makes more sense this way. Test Plan: it still builds Reviewers: bgamari, austin, simonmar, erikd Reviewed By: simonmar, erikd Subscribers: thomie, simonmar Differential Revision: https://phabricator.haskell.org/D2428
* Fix double-free in T5644 (#12208)Simon Marlow2016-06-201-2/+2
|
* NUMA supportSimon Marlow2016-06-101-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The aim here is to reduce the number of remote memory accesses on systems with a NUMA memory architecture, typically multi-socket servers. Linux provides a NUMA API for doing two things: * Allocating memory local to a particular node * Binding a thread to a particular node When given the +RTS --numa flag, the runtime will * Determine the number of NUMA nodes (N) by querying the OS * Assign capabilities to nodes, so cap C is on node C%N * Bind worker threads on a capability to the correct node * Keep a separate free lists in the block layer for each node * Allocate the nursery for a capability from node-local memory * Allocate blocks in the GC from node-local memory For example, using nofib/parallel/queens on a 24-core 2-socket machine: ``` $ ./Main 15 +RTS -N24 -s -A64m Total time 173.960s ( 7.467s elapsed) $ ./Main 15 +RTS -N24 -s -A64m --numa Total time 150.836s ( 6.423s elapsed) ``` The biggest win here is expected to be allocating from node-local memory, so that means programs using a large -A value (as here). According to perf, on this program the number of remote memory accesses were reduced by more than 50% by using `--numa`. Test Plan: * validate * There's a new flag --debug-numa=<n> that pretends to do NUMA without actually making the OS calls, which is useful for testing the code on non-NUMA systems. * TODO: I need to add some unit tests Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2199
* rts: More const correct-ness fixesErik de Castro Lopo2016-05-181-4/+4
| | | | | | | | | | | | | | | | | | | | In addition to more const-correctness fixes this patch fixes an infelicity of the previous const-correctness patch (995cf0f356) which left `UNTAG_CLOSURE` taking a `const StgClosure` pointer parameter but returning a non-const pointer. Here we restore the original type signature of `UNTAG_CLOSURE` and add a new function `UNTAG_CONST_CLOSURE` which takes and returns a const `StgClosure` pointer and uses that wherever possible. Test Plan: Validate on Linux, OS X and Windows Reviewers: Phyx, hsyl20, bgamari, austin, simonmar, trofi Reviewed By: simonmar, trofi Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2231
* Fix ASSERT failure and re-enable setnumcapabilities001Simon Marlow2016-05-111-2/+5
| | | | | | | | | | | | The assertion failure was fairly benign, I think, but this fixes it. I've been running the test repeatedly for the last 30 mins and it hasn't triggered. There are other problems exposed by this test (see #12038), but I've worked around those in the test itself for now. I also copied the relevant bits of the parallel library here so that we don't need parallel for the test to run.
* Fix a crash in requestSync()Simon Marlow2016-05-101-9/+14
| | | | | | | | It was possible for a thread to read invalid memory after a conflict when multiple threads were synchronising. I haven't been successful in constructing a test case that triggers this, but we have some internal code that ran into it.
* rts: Replace `nat` with `uint32_t`Erik de Castro Lopo2016-05-051-26/+26
| | | | | | | | | | | | The `nat` type was an alias for `unsigned int` with a comment saying it was at least 32 bits. We keep the typedef in case client code is using it but mark it as deprecated. Test Plan: Validated on Linux, OS X and Windows Reviewers: simonmar, austin, thomie, hvr, bgamari, hsyl20 Differential Revision: https://phabricator.haskell.org/D2166
* schedulePushWork: avoid unnecessary wakeupsSimon Marlow2016-05-041-7/+25
| | | | | | | | | | | | | | | | | | | | This function had some pathalogically bad behaviour: if we had 2 threads on the current capability and 23 other idle capabilities, we would * grab all 23 capabilities * migrate one Haskell thread to one of them * wake up a worker on *all* 23 other capabilities. This lead to a lot of unnecessary wakeups when using large -N values. Now, we * Count how many capabilities we need to wake up * Start from cap->no+1, so that we don't overload low-numbered capabilities * Only wake up capabilities that we migrated a thread to (unless we have sparks to steal) This results in a pretty dramatic improvement in our production system.
* Allow limiting the number of GC threads (+RTS -qn<n>)Simon Marlow2016-05-041-85/+192
| | | | | | | | | | | | | | | | | | This allows the GC to use fewer threads than the number of capabilities. At each GC, we choose some of the capabilities to be "idle", which means that the thread running on that capability (if any) will sleep for the duration of the GC, and the other threads will do its work. We choose capabilities that are already idle (if any) to be the idle capabilities. The idea is that this helps in the following situation: * We want to use a large -N value so as to make use of hyperthreaded cores * We use a large heap size, so GC is infrequent * But we don't want to use all -N threads in the GC, because that thrashes the memory too much. See docs for usage.
* Refactor comments about shutdownSimon Marlow2016-04-101-15/+19
|
* rts: mark 'removeFromRunQueue' as staticSergei Trofimovich2016-02-071-1/+1
| | | | | | | | Noticed by uselex.rb: removeFromRunQueue: [R]: exported from: ./rts/dist/build/Schedule.o Signed-off-by: Sergei Trofimovich <siarheit@google.com>
* RTS: Rename InCall.stat struct field to .rstatHerbert Valerio Riedel2015-12-041-6/+6
| | | | | | | | | | | On AIX, C system headers can redirect the token `stat` via #define stat stat64 to provide large-file support. Simply avoiding the use of `stat` as an identifier eschews macro-replacement. Differential Revision: https://phabricator.haskell.org/D1566
* rts/Schedule.c: remove unused variableÖmer Sinan Ağacan2015-10-221-6/+0
| | | | Differential Revision: https://phabricator.haskell.org/D1348
* Don't get a new nursery if we exceeded large_alloc_limSimon Marlow2015-07-151-12/+19
| | | | | | | | | | | | | | | Summary: When using nursery chunks, if we failed a heap check due to large_alloc_lim, we would pointlessly keep grabbing new nursery chunks when we should just GC immediately. Test Plan: validate Reviewers: austin, bgamari, niteria Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1072
* Delete the WayPar wayThomas Miedema2015-07-101-25/+2
| | | | | | Also remove 't' and 's' from ALL_WAYS; they don't exist. Differential Revision: https://phabricator.haskell.org/D1055
* Fix deadlock (#10545)Simon Marlow2015-06-261-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | yieldCapability() was not prepared to be called by a Task that is not either a worker or a bound Task. This could happen if we ended up in yieldCapability via this call stack: performGC() scheduleDoGC() requestSync() yieldCapability() and there were a few other ways this could happen via requestSync. The fix is to handle this case in yieldCapability(): when the Task is not a worker or a bound Task, we put it on the returning_workers queue, where it will be woken up again. Summary of changes: * `yieldCapability`: factored out subroutine waitForWorkerCapability` * `waitForReturnCapability` renamed to `waitForCapability`, and factored out subroutine `waitForReturnCapability` * `releaseCapabilityAndQueue` worker renamed to `enqueueWorker`, does not take a lock and no longer tests if `!isBoundTask()` * `yieldCapability` adjusted for refactorings, only change in behavior is when it is not a worker or bound task. Test Plan: * new test concurrent/should_run/performGC * validate Reviewers: niteria, austin, ezyang, bgamari Subscribers: thomie, bgamari Differential Revision: https://phabricator.haskell.org/D997 GHC Trac Issues: #10545
* Don't call DEAD_WEAK finalizer again on shutdown (#7170)Simon Marlow2015-06-011-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: There's a race condition like this: # A foreign pointer gets promoted to the last generation # It has its finalizer called manually # We start shutting down the runtime in `hs_exit_` from the main thread # A minor GC starts running (`scheduleDoGC`) on one of the threads # The minor GC notices that we're in `SCHED_INTERRUPTING` state and advances to `SCHED_SHUTTING_DOWN` # The main thread tries to do major GC (with `scheduleDoGC`), but it exits early because we're in `SCHED_SHUTTING_DOWN` state # We end up with a `DEAD_WEAK` left on the list of weak pointers of the last generation, because it relied on major GC removing it from that list This change: * Ignores DEAD_WEAK finalizers when shutting down * Makes the major GC on shutdown more likely * Fixes a bogus assert Test Plan: before this diff https://ghc.haskell.org/trac/ghc/ticket/7170#comment:5 reproduced and after it doesn't Reviewers: ezyang, austin, simonmar Reviewed By: simonmar Subscribers: bgamari, thomie Differential Revision: https://phabricator.haskell.org/D921 GHC Trac Issues: #7170
* fix bus errors on SPARC caused by unalignment access to alloc_limit (fixes ↵Karel Gardas2015-02-231-3/+3
| | | | | | | | | | #10043) Reviewers: austin, simonmar Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D657
* Add +RTS -n<size>: divide the nursery into chunksSimon Marlow2014-11-251-0/+6
| | | | See the documentation for details.
* Make clearNursery freeSimon Marlow2014-11-251-11/+7
| | | | | | | | | | | | | | | | | | | | | Summary: clearNursery resets all the bd->free pointers of nursery blocks to make the blocks empty. In profiles we've seen clearNursery taking significant amounts of time particularly with large -N and -A values. This patch moves the work of clearNursery to the point at which we actually need the new block, thereby introducing an invariant that blocks to the right of the CurrentNursery pointer still need their bd->free pointer reset. This should make things faster overall, because we don't need to clear blocks that we don't use. Test Plan: validate Reviewers: AndreasVoellmy, ezyang, austin Subscribers: thomie, carter, ezyang, simonmar Differential Revision: https://phabricator.haskell.org/D318
* Add missing semicolon in Schedule.cSimon Peyton Jones2014-11-181-1/+1
| | | | I think this went wrong in 2a6f193b
* Fix a bug introduced with allocation countersSimon Marlow2014-11-171-0/+3
|
* Per-thread allocation counters and limitsSimon Marlow2014-11-121-0/+19
| | | | | | | | This reverts commit f0fcc41d755876a1b02d1c7c79f57515059f6417. New changes: now works on 32-bit platforms too. I added some basic support for 64-bit subtraction and comparison operations to the x86 NCG.
* [skip ci] rts: Detabify Schedule.cAustin Seipp2014-10-211-443/+443
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* Revert "rts: add Emacs 'Local Variables' to every .c file"Simon Marlow2014-09-291-8/+0
| | | | This reverts commit 39b5c1cbd8950755de400933cecca7b8deb4ffcd.
* Terminate in forkProcess like in real_mainEdsko de Vries2014-08-041-2/+1
| | | | | | | | | | | | | | Test Plan: validate Reviewers: simonmar, austin Reviewed By: simonmar, austin Subscribers: phaskell, simonmar, relrod, carter Differential Revision: https://phabricator.haskell.org/D99 GHC Trac Issues: #9377
* rts: add Emacs 'Local Variables' to every .c fileAustin Seipp2014-07-281-0/+8
| | | | | | | | This will hopefully help ensure some basic consistency in the forward by overriding buffer variables. In particular, it sets the wrap length, the offset to 4, and turns off tabs. Signed-off-by: Austin Seipp <austin@well-typed.com>
* Acquire all_tasks_mutex in forkProcessEdsko de Vries2014-07-131-1/+12
| | | | | | | | | | | | | | Summary: (for the same reason that we acquire all the other mutexes) Test Plan: validate Reviewers: simonmar, austin, duncan Reviewed By: simonmar, austin, duncan Subscribers: simonmar, relrod, carter Differential Revision: https://phabricator.haskell.org/D60
* Revert "Per-thread allocation counters and limits"Simon Marlow2014-05-041-19/+0
| | | | | | | | Problems were found on 32-bit platforms, I'll commit again when I have a fix. This reverts the following commits: 54b31f744848da872c7c6366dea840748e01b5cf b0534f78a73f972e279eed4447a5687bd6a8308e
* Per-thread allocation counters and limitsSimon Marlow2014-05-021-0/+19
| | | | | | | | | | | | | | | | | | | | | | | This tracks the amount of memory allocation by each thread in a counter stored in the TSO. Optionally, when the counter drops below zero (it counts down), the thread can be sent an asynchronous exception: AllocationLimitExceeded. When this happens, given a small additional limit so that it can handle the exception. See documentation in GHC.Conc for more details. Allocation limits are similar to timeouts, but - timeouts use real time, not CPU time. Allocation limits do not count anything while the thread is blocked or in foreign code. - timeouts don't re-trigger if the thread catches the exception, allocation limits do. - timeouts can catch non-allocating loops, if you use -fno-omit-yields. This doesn't work for allocation limits. I couldn't measure any impact on benchmarks with these changes, even for nofib/smp.
* rts: Fix typo in commentBen Gamari2013-10-251-1/+1
|
* s/Heep/Heap/Edward Z. Yang2013-10-031-1/+1
| | | | Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
* use a nat, not StgWord8, for gc_typeSimon Marlow2013-10-011-1/+1
|
* Revert "Default to infinite stack size (#8189)"Austin Seipp2013-09-081-1/+1
| | | | This reverts commit d85044f6b201eae0a9e453b89c0433608e0778f0.
* Default to infinite stack size (#8189)Austin Seipp2013-09-081-1/+1
| | | | | | | | | | | | | When servicing a stack overflows, only throw an exception to the given thread if the user explicitly set a max stack size, using +RTS -K. Otherwise just service it normally and grow the stack. In case we actually run out of *heap* (stack chuncks are allocated on the heap), then we need to bail by calling the stackOverflow() hook and exit immediately. Authored-by: Ben Gamari <bgamari.foss@gmail.com> Signed-off-by: Austin Seipp <aseipp@pobox.com>
* Don't move Capabilities in setNumCapabilities (#8209)Simon Marlow2013-09-041-52/+35
| | | | | | | | | | | | | We have various problems with reallocating the array of Capabilities, due to threads in waitForReturnCapability that are already holding a pointer to a Capability. Rather than add more locking to make this safer, I decided it would be easier to ensure that we never move the Capabilities at all. The capabilities array is now an array of pointers to Capabaility. There are extra indirections, but it rarely matters - we don't often access Capabilities via the array, normally we already have a pointer to one. I ran the parallel benchmarks and didn't see any difference.
* Implement atomicReadMVar, fixing #4001.Edward Z. Yang2013-07-091-0/+2
| | | | | | | | | We add the invariant to the MVar blocked threads queue that threads blocked on an atomic read are always at the front of the queue. This invariant is easy to maintain, since takers are only ever added to the end of the queue. Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
* Fix segfault with STM; fixes #8035. Patch from errge.Ian Lynagh2013-07-071-1/+13
|
* Ensure gc_type is StgWord8.Austin Seipp2013-06-211-1/+1
| | | | | | | Again, the range of gc_type is actually 1-3, which is technically outside the range of rtsBool. Signed-off-by: Austin Seipp <aseipp@pobox.com>