summaryrefslogtreecommitdiff
path: root/rts/Threads.c
Commit message (Collapse)AuthorAgeFilesLines
* winio: Add IOPort synchronization primitiveTamar Christina2020-07-151-0/+5
|
* Use pointer equality in Eq/Ord for ThreadIdRoland Zumkeller2019-11-191-5/+20
| | | | | | | | | | Changes (==) to use only pointer equality. This is safe because two threads are the same iff they have the same id. Changes `compare` to check pointer equality first and fall back on ids only in case of inequality. See discussion in #16761.
* Changing Thread IDs from 32 bits to 64 bits.Roland Zumkeller2019-11-191-3/+3
|
* nonmoving: Drop redundant write barrier on stack underflowBen Gamari2019-11-191-10/+0
| | | | | | | | | | | | | Previously we would push stack-carried return values to the new stack on a stack overflow. While the precise reasoning for this barrier is unfortunately lost to history, in hindsight I suspect it was prompted by a missing barrier elsewhere (that has been since fixed). Moreover, there the redundant barrier is actively harmful: the stack may contain non-pointer values; blindly pushing these to the mark queue will result in a crash. This is precisely what happened in the `stack003` test. However, because of a (now fixed) deficiency in the test this crash did not trigger on amd64.
* Nonmoving: Ensure write barrier vanishes in non-threaded RTSBen Gamari2019-10-211-1/+1
|
* rts: Implement concurrent collection in the nonmoving collectorBen Gamari2019-10-201-4/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This extends the non-moving collector to allow concurrent collection. The full design of the collector implemented here is described in detail in a technical note B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell Compiler" (2018) This extension involves the introduction of a capability-local remembered set, known as the /update remembered set/, which tracks objects which may no longer be visible to the collector due to mutation. To maintain this remembered set we introduce a write barrier on mutations which is enabled while a concurrent mark is underway. The update remembered set representation is similar to that of the nonmoving mark queue, being a chunked array of `MarkEntry`s. Each `Capability` maintains a single accumulator chunk, which it flushed when it (a) is filled, or (b) when the nonmoving collector enters its post-mark synchronization phase. While the write barrier touches a significant amount of code it is conceptually straightforward: the mutator must ensure that the referee of any pointer it overwrites is added to the update remembered set. However, there are a few details: * In the case of objects with a dirty flag (e.g. `MVar`s) we can exploit the fact that only the *first* mutation requires a write barrier. * Weak references, as usual, complicate things. In particular, we must ensure that the referee of a weak object is marked if dereferenced by the mutator. For this we (unfortunately) must introduce a read barrier, as described in Note [Concurrent read barrier on deRefWeak#] (in `NonMovingMark.c`). * Stable names are also a bit tricky as described in Note [Sweeping stable names in the concurrent collector] (`NonMovingSweep.c`). We take quite some pains to ensure that the high thread count often seen in parallel Haskell applications doesn't affect pause times. To this end we allow thread stacks to be marked either by the thread itself (when it is executed or stack-underflows) or the concurrent mark thread (if the thread owning the stack is never scheduled). There is a non-trivial handshake to ensure that this happens without racing which is described in Note [StgStack dirtiness flags and concurrent marking]. Co-Authored-by: Ömer Sinan Ağacan <omer@well-typed.com>
* rts: Give stack flags proper macrosBen Gamari2019-10-181-2/+2
| | | | | This were previously quite unclear and will change a bit under the non-moving collector so let's clear this up now.
* Correct closure observation, construction, and mutation on weak memory machines.Travis Whitaker2019-06-281-6/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here the following changes are introduced: - A read barrier machine op is added to Cmm. - The order in which a closure's fields are read and written is changed. - Memory barriers are added to RTS code to ensure correctness on out-or-order machines with weak memory ordering. Cmm has a new CallishMachOp called MO_ReadBarrier. On weak memory machines, this is lowered to an instruction that ensures memory reads that occur after said instruction in program order are not performed before reads coming before said instruction in program order. On machines with strong memory ordering properties (e.g. X86, SPARC in TSO mode) no such instruction is necessary, so MO_ReadBarrier is simply erased. However, such an instruction is necessary on weakly ordered machines, e.g. ARM and PowerPC. Weam memory ordering has consequences for how closures are observed and mutated. For example, consider a closure that needs to be updated to an indirection. In order for the indirection to be safe for concurrent observers to enter, said observers must read the indirection's info table before they read the indirectee. Furthermore, the entering observer makes assumptions about the closure based on its info table contents, e.g. an INFO_TYPE of IND imples the closure has an indirectee pointer that is safe to follow. When a closure is updated with an indirection, both its info table and its indirectee must be written. With weak memory ordering, these two writes can be arbitrarily reordered, and perhaps even interleaved with other threads' reads and writes (in the absence of memory barrier instructions). Consider this example of a bad reordering: - An updater writes to a closure's info table (INFO_TYPE is now IND). - A concurrent observer branches upon reading the closure's INFO_TYPE as IND. - A concurrent observer reads the closure's indirectee and enters it. (!!!) - An updater writes the closure's indirectee. Here the update to the indirectee comes too late and the concurrent observer has jumped off into the abyss. Speculative execution can also cause us issues, consider: - An observer is about to case on a value in closure's info table. - The observer speculatively reads one or more of closure's fields. - An updater writes to closure's info table. - The observer takes a branch based on the new info table value, but with the old closure fields! - The updater writes to the closure's other fields, but its too late. Because of these effects, reads and writes to a closure's info table must be ordered carefully with respect to reads and writes to the closure's other fields, and memory barriers must be placed to ensure that reads and writes occur in program order. Specifically, updates to a closure must follow the following pattern: - Update the closure's (non-info table) fields. - Write barrier. - Update the closure's info table. Observing a closure's fields must follow the following pattern: - Read the closure's info pointer. - Read barrier. - Read the closure's (non-info table) fields. This patch updates RTS code to obey this pattern. This should fix long-standing SMP bugs on ARM (specifically newer aarch64 microarchitectures supporting out-of-order execution) and PowerPC. This fixes issue #15449. Co-Authored-By: Ben Gamari <ben@well-typed.com>
* Remove redundant slop zeroingÖmer Sinan Ağacan2018-09-211-6/+0
| | | | OVERWRITE_INFO already does zero slopping by calling OVERWRITING_CLOSURE
* Fix deadlock between STM and throwToSimon Marlow2018-07-121-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | There was a lock-order reversal between lockTSO() and the TVar lock, see #15136 for the details. It turns out we can fix this pretty easily by just deleting all the locking code(!). The principle for unblocking a `BlockedOnSTM` thread then becomes the same as for other kinds of blocking: if the TSO belongs to this capability then we do it directly, otherwise we send a message to the capability that owns the TSO. That is, a thread blocked on STM is owned by its capability, as it should be. The possible downside of this is that we might send multiple messages to wake up a thread when the thread is on another capability. This is safe, it's just not very efficient. I'll try to do some experiments to see if this is a problem. Test Plan: Test case from #15136 doesn't deadlock any more. Reviewers: bgamari, osa1, erikd Reviewed By: osa1 Subscribers: rwbarton, thomie, carter GHC Trac Issues: #15136 Differential Revision: https://phabricator.haskell.org/D4956
* Improve accuracy of get/setAllocationCounterBen Gamari2018-03-191-12/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: get/setAllocationCounter didn't take into account allocations in the current block. This was known at the time, but it turns out to be important to have more accuracy when using these in a fine-grained way. Test Plan: New unit test to test incrementally larger allocaitons. Before I got results like this: ``` +0 +0 +0 +0 +0 +4096 +0 +0 +0 +0 +0 +4064 +0 +0 +4088 +4056 +0 +0 +0 +4088 +4096 +4056 +4096 ``` Notice how the results aren't always monotonically increasing. After this patch: ``` +344 +416 +488 +560 +632 +704 +776 +848 +920 +992 +1064 +1136 +1208 +1280 +1352 +1424 +1496 +1568 +1640 +1712 +1784 +1856 +1928 +2000 +2072 +2144 ``` Reviewers: hvr, erikd, simonmar, jrtc27, trommler Reviewed By: simonmar Subscribers: trommler, jrtc27, rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D4363
* rts: Add format attribute to barfBen Gamari2018-02-061-1/+1
| | | | | | | | | | | | Test Plan: Validate Reviewers: erikd, simonmar Reviewed By: simonmar Subscribers: rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D4374
* Revert "Improve accuracy of get/setAllocationCounter"Ben Gamari2018-01-181-1/+12
| | | | This reverts commit a1a689dda48113f3735834350fb562bb1927a633.
* Improve accuracy of get/setAllocationCounterSimon Marlow2018-01-081-12/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: get/setAllocationCounter didn't take into account allocations in the current block. This was known at the time, but it turns out to be important to have more accuracy when using these in a fine-grained way. Test Plan: New unit test to test incrementally larger allocaitons. Before I got results like this: ``` +0 +0 +0 +0 +0 +4096 +0 +0 +0 +0 +0 +4064 +0 +0 +4088 +4056 +0 +0 +0 +4088 +4096 +4056 +4096 ``` Notice how the results aren't always monotonically increasing. After this patch: ``` +344 +416 +488 +560 +632 +704 +776 +848 +920 +992 +1064 +1136 +1208 +1280 +1352 +1424 +1496 +1568 +1640 +1712 +1784 +1856 +1928 +2000 +2072 +2144 ``` Reviewers: niteria, bgamari, hvr, erikd Subscribers: rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D4288
* updateThunk: indirectee can be taggedJames Clarke2017-10-161-1/+1
| | | | | | | | | | Reviewers: austin, bgamari, erikd, simonmar, trofi Reviewed By: trofi Subscribers: rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D4100
* Fix calculation in threadStackOverflowSimon Marlow2017-10-161-2/+2
| | | | | | | | | | | | | | | Summary: The calculation was too conservative, and could result in copying zero frames into the new stack chunk, which caused a knock-on failure in the interpreter. Test Plan: Tested on an in-house repro (not shareable, unfortunately) Reviewers: niteria, bgamari, austin, erikd Subscribers: rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D4052
* A bunch of typofixesGabor Greif2017-09-261-1/+1
|
* Prefer #if defined to #ifdefBen Gamari2017-04-281-3/+3
| | | | Our new CPP linter enforces this.
* Enable new warning for fragile/incorrect CPP #if usageErik de Castro Lopo2017-04-281-1/+1
| | | | | | | | | | | | | | | | The C code in the RTS now gets built with `-Wundef` and the Haskell code (stages 1 and 2 only) with `-Wcpp-undef`. We now get warnings whereever `#if` is used on undefined identifiers. Test Plan: Validate on Linux and Windows Reviewers: austin, angerman, simonmar, bgamari, Phyx Reviewed By: bgamari Subscribers: thomie, snowleopard Differential Revision: https://phabricator.haskell.org/D3278
* Revert "Enable new warning for fragile/incorrect CPP #if usage"Ben Gamari2017-04-051-1/+1
| | | | | | | | This is causing too much platform dependent breakage at the moment. We will need a more rigorous testing strategy before this can be merged again. This reverts commit 7e340c2bbf4a56959bd1e95cdd1cfdb2b7e537c2.
* Enable new warning for fragile/incorrect CPP #if usageErik de Castro Lopo2017-04-051-1/+1
| | | | | | | | | | | | | | | | The C code in the RTS now gets built with `-Wundef` and the Haskell code (stages 1 and 2 only) with `-Wcpp-undef`. We now get warnings whereever `#if` is used on undefined identifiers. Test Plan: Validate on Linux and Windows Reviewers: austin, angerman, simonmar, bgamari, Phyx Reviewed By: bgamari Subscribers: thomie, snowleopard Differential Revision: https://phabricator.haskell.org/D3278
* Use C99's boolBen Gamari2016-11-291-13/+13
| | | | | | | | | | | | Test Plan: Validate on lots of platforms Reviewers: erikd, simonmar, austin Reviewed By: erikd, simonmar Subscribers: michalt, thomie Differential Revision: https://phabricator.haskell.org/D2699
* Add hs_try_putmvar()Simon Marlow2016-09-121-0/+79
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This is a fast, non-blocking, asynchronous, interface to tryPutMVar that can be called from C/C++. It's useful for callback-based C/C++ APIs: the idea is that the callback invokes hs_try_putmvar(), and the Haskell code waits for the callback to run by blocking in takeMVar. The callback doesn't block - this is often a requirement of callback-based APIs. The callback wakes up the Haskell thread with minimal overhead and no unnecessary context-switches. There are a couple of benchmarks in testsuite/tests/concurrent/should_run. Some example results comparing hs_try_putmvar() with using a standard foreign export: ./hs_try_putmvar003 1 64 16 100 +RTS -s -N4 0.49s ./hs_try_putmvar003 2 64 16 100 +RTS -s -N4 2.30s hs_try_putmvar() is 4x faster for this workload (see the source for hs_try_putmvar003.hs for details of the workload). An alternative solution is to use the IO Manager for this. We've tried it, but there are problems with that approach: * Need to create a new file descriptor for each callback * The IO Manger thread(s) become a bottleneck * More potential for things to go wrong, e.g. throwing an exception in an IO Manager callback kills the IO Manager thread. Test Plan: validate; new unit tests Reviewers: niteria, erikd, ezyang, bgamari, austin, hvr Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2501
* rts: Replace `nat` with `uint32_t`Erik de Castro Lopo2016-05-051-4/+4
| | | | | | | | | | | | The `nat` type was an alias for `unsigned int` with a comment saying it was at least 32 bits. We keep the typedef in case client code is using it but mark it as deprecated. Test Plan: Validated on Linux, OS X and Windows Reviewers: simonmar, austin, thomie, hvr, bgamari, hsyl20 Differential Revision: https://phabricator.haskell.org/D2166
* rts: mark 'wakeBlockingQueue' as staticSergei Trofimovich2016-02-071-1/+1
| | | | | | | | Noticed by uselex.rb: wakeBlockingQueue: [R]: exported from: ./rts/dist/build/Threads.o Signed-off-by: Sergei Trofimovich <siarheit@google.com>
* Account for stack allocation in the thread's allocation counterSimon Marlow2015-09-141-0/+8
| | | | | | | | | | | | Summary: (see comment for details) Test Plan: validate Reviewers: bgamari, ezyang, austin Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1243
* Do not copy stack after stack overflow, refix #8435Flaviu Andrei Csernik (archblob)2015-06-121-0/+1
| | | | | | | | | | | | | | | | | | Summary: This was reverted in d70b19bfb5ed79b22c2ac31e22f46782fc47a117 and is a part of the reason for #10445. Test Plan: validate Reviewers: ezyang, simonmar, austin Reviewed By: simonmar, austin Subscribers: bgamari, thomie Differential Revision: https://phabricator.haskell.org/D938 GHC Trac Issues: #8435
* Remove comments and flag for GranSimThomas Miedema2015-03-191-2/+0
| | | | | | | | | The GranSim code was removed in dd56e9ab and 297b05a9 in 2009, and perhaps other commits I couldn't find. Reviewed By: austin Differential Revision: https://phabricator.haskell.org/D737
* fix bus errors on SPARC caused by unalignment access to alloc_limit (fixes ↵Karel Gardas2015-02-231-3/+3
| | | | | | | | | | #10043) Reviewers: austin, simonmar Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D657
* Per-thread allocation counters and limitsSimon Marlow2014-11-121-48/+29
| | | | | | | | This reverts commit f0fcc41d755876a1b02d1c7c79f57515059f6417. New changes: now works on 32-bit platforms too. I added some basic support for 64-bit subtraction and comparison operations to the x86 NCG.
* [skip ci] rts: Detabify Threads.cAustin Seipp2014-10-211-58/+58
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* Revert "rts: add Emacs 'Local Variables' to every .c file"Simon Marlow2014-09-291-8/+0
| | | | This reverts commit 39b5c1cbd8950755de400933cecca7b8deb4ffcd.
* panic message fixSimon Marlow2014-08-011-1/+1
|
* rts: add Emacs 'Local Variables' to every .c fileAustin Seipp2014-07-281-0/+8
| | | | | | | | This will hopefully help ensure some basic consistency in the forward by overriding buffer variables. In particular, it sets the wrap length, the offset to 4, and turns off tabs. Signed-off-by: Austin Seipp <austin@well-typed.com>
* Revert "Per-thread allocation counters and limits"Simon Marlow2014-05-041-29/+48
| | | | | | | | Problems were found on 32-bit platforms, I'll commit again when I have a fix. This reverts the following commits: 54b31f744848da872c7c6366dea840748e01b5cf b0534f78a73f972e279eed4447a5687bd6a8308e
* Per-thread allocation counters and limitsSimon Marlow2014-05-021-48/+29
| | | | | | | | | | | | | | | | | | | | | | | This tracks the amount of memory allocation by each thread in a counter stored in the TSO. Optionally, when the counter drops below zero (it counts down), the thread can be sent an asynchronous exception: AllocationLimitExceeded. When this happens, given a small additional limit so that it can handle the exception. See documentation in GHC.Conc for more details. Allocation limits are similar to timeouts, but - timeouts use real time, not CPU time. Allocation limits do not count anything while the thread is blocked or in foreign code. - timeouts don't re-trigger if the thread catches the exception, allocation limits do. - timeouts can catch non-allocating loops, if you use -fno-omit-yields. This doesn't work for allocation limits. I couldn't measure any impact on benchmarks with these changes, even for nofib/smp.
* rts: Allow for infinite stack sizeBen Gamari2013-10-251-1/+2
| | | | This is encoded as RtsFlags.GcFlags.maxStkSize == 0.
* Do not copy stack after stack overflow, fixes #8435Edward Z. Yang2013-10-111-0/+1
| | | | Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
* If exceptions are blocked, add stack overflow to blocked exceptions list. ↵Edward Z. Yang2013-10-111-14/+52
| | | | | | Fixes #8303. Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
* Typofix.Edward Z. Yang2013-10-101-1/+1
| | | | Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
* Clarify the TSO_SQUEEZED check.Edward Z. Yang2013-10-101-0/+8
| | | | Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
* s/pathalogical/pathological/Edward Z. Yang2013-10-031-1/+1
| | | | Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
* Revert "Default to infinite stack size (#8189)"Austin Seipp2013-09-081-8/+2
| | | | This reverts commit d85044f6b201eae0a9e453b89c0433608e0778f0.
* Default to infinite stack size (#8189)Austin Seipp2013-09-081-2/+8
| | | | | | | | | | | | | When servicing a stack overflows, only throw an exception to the given thread if the user explicitly set a max stack size, using +RTS -K. Otherwise just service it normally and grow the stack. In case we actually run out of *heap* (stack chuncks are allocated on the heap), then we need to bail by calling the stackOverflow() hook and exit immediately. Authored-by: Ben Gamari <bgamari.foss@gmail.com> Signed-off-by: Austin Seipp <aseipp@pobox.com>
* Don't move Capabilities in setNumCapabilities (#8209)Simon Marlow2013-09-041-1/+1
| | | | | | | | | | | | | We have various problems with reallocating the array of Capabilities, due to threads in waitForReturnCapability that are already holding a pointer to a Capability. Rather than add more locking to make this safer, I decided it would be easier to ensure that we never move the Capabilities at all. The capabilities array is now an array of pointers to Capabaility. There are extra indirections, but it rarely matters - we don't often access Capabilities via the array, normally we already have a pointer to one. I ran the parallel benchmarks and didn't see any difference.
* Implement atomicReadMVar, fixing #4001.Edward Z. Yang2013-07-091-0/+4
| | | | | | | | | We add the invariant to the MVar blocked threads queue that threads blocked on an atomic read are always at the front of the queue. This invariant is easy to maintain, since takers are only ever added to the end of the queue. Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
* More accurate cost attribution for stacks. Fixes #7818.Edward Z. Yang2013-04-221-2/+2
| | | | | | | | Previously, stacks were always attributed to CCCS_SYSTEM. Now, we attribute them to the CCS when the stack was allocated. If a stack grows, new stack chunks inherit the CCS of the old stack. Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
* Lots of nat -> StgWord changesSimon Marlow2012-09-071-4/+4
|
* Deprecate lnat, and use StgWord insteadSimon Marlow2012-09-071-4/+4
| | | | | | | | | | | | lnat was originally "long unsigned int" but we were using it when we wanted a 64-bit type on a 64-bit machine. This broke on Windows x64, where long == int == 32 bits. Using types of unspecified size is bad, but what we really wanted was a type with N bits on an N-bit machine. StgWord is exactly that. lnat was mentioned in some APIs that clients might be using (e.g. StackOverflowHook()), so we leave it defined but with a comment to say that it's deprecated.
* Fix crash with tiny initial stack size (#5993)Simon Marlow2012-04-121-2/+2
|