summaryrefslogtreecommitdiff
path: root/rts/STM.c
Commit message (Collapse)AuthorAgeFilesLines
* Make `PosixSource.h` installed and under `rts/`John Ericson2021-08-091-1/+1
| | | | | | is used outside of the rts so we do this rather than just fish it out of the repo in ad-hoc way, in order to make packages in this repo more self-contained.
* rts/stm: Strengthen orderings to SEQ_CST instead of volatilewip/tsan/stmBen Gamari2020-10-241-20/+20
| | | | | | | | | | Previously the `current_value`, `first_watch_queue_entry`, and `num_updates` fields of `StgTVar` were marked as `volatile` in an attempt to provide strong ordering. Of course, this isn't sufficient. We now use proper atomic operations. In most of these cases I strengthen the ordering all the way to SEQ_CST although it's possible that some could be weakened with some thought.
* rts/STM: Use atomicsBen Gamari2020-10-241-27/+45
| | | | | | | | This fixes a potentially harmful race where we failed to synchronize before looking at a TVar's current_value. Also did a bit of refactoring to avoid abstract over management of max_commits.
* rts/nonmoving: Add missing STM write barrierBen Gamari2020-09-181-0/+3
| | | | | | When updating a TRec for a TVar already part of a transaction we previously neglected to add the old value to the update remembered set. I suspect this was the cause of #18587.
* Fix more typos, via an improved Levenshtein-style correctorBrian Wignall2020-01-121-2/+2
|
* Nonmoving: Ensure write barrier vanishes in non-threaded RTSBen Gamari2019-10-211-4/+7
|
* rts: Implement concurrent collection in the nonmoving collectorBen Gamari2019-10-201-13/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This extends the non-moving collector to allow concurrent collection. The full design of the collector implemented here is described in detail in a technical note B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell Compiler" (2018) This extension involves the introduction of a capability-local remembered set, known as the /update remembered set/, which tracks objects which may no longer be visible to the collector due to mutation. To maintain this remembered set we introduce a write barrier on mutations which is enabled while a concurrent mark is underway. The update remembered set representation is similar to that of the nonmoving mark queue, being a chunked array of `MarkEntry`s. Each `Capability` maintains a single accumulator chunk, which it flushed when it (a) is filled, or (b) when the nonmoving collector enters its post-mark synchronization phase. While the write barrier touches a significant amount of code it is conceptually straightforward: the mutator must ensure that the referee of any pointer it overwrites is added to the update remembered set. However, there are a few details: * In the case of objects with a dirty flag (e.g. `MVar`s) we can exploit the fact that only the *first* mutation requires a write barrier. * Weak references, as usual, complicate things. In particular, we must ensure that the referee of a weak object is marked if dereferenced by the mutator. For this we (unfortunately) must introduce a read barrier, as described in Note [Concurrent read barrier on deRefWeak#] (in `NonMovingMark.c`). * Stable names are also a bit tricky as described in Note [Sweeping stable names in the concurrent collector] (`NonMovingSweep.c`). We take quite some pains to ensure that the high thread count often seen in parallel Haskell applications doesn't affect pause times. To this end we allow thread stacks to be marked either by the thread itself (when it is executed or stack-underflows) or the concurrent mark thread (if the thread owning the stack is never scheduled). There is a non-trivial handshake to ensure that this happens without racing which is described in Note [StgStack dirtiness flags and concurrent marking]. Co-Authored-by: Ömer Sinan Ağacan <omer@well-typed.com>
* Revert incorrect STM wakeup optimisationÖmer Sinan Ağacan2018-09-111-11/+6
| | | | | | | | | | | | Summary: (see the comments) Reviewers: simonmar, bgamari, erikd Reviewed By: simonmar Subscribers: rwbarton, carter Differential Revision: https://phabricator.haskell.org/D5144
* Optimise wakeups for STMSimon Marlow2018-07-141-1/+23
| | | | | | | | | | | | | | | | | | | Avoids repeated wakeup messages being sent when a TVar is written to multiple times. See comments for details. Test Plan: * Test from #15136 (will be added to stm shortly) * existing stm tests Reviewers: bgamari, osa1, erikd Reviewed By: bgamari Subscribers: rwbarton, thomie, carter GHC Trac Issues: #15136 Differential Revision: https://phabricator.haskell.org/D4961
* Fix deadlock between STM and throwToSimon Marlow2018-07-121-18/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | There was a lock-order reversal between lockTSO() and the TVar lock, see #15136 for the details. It turns out we can fix this pretty easily by just deleting all the locking code(!). The principle for unblocking a `BlockedOnSTM` thread then becomes the same as for other kinds of blocking: if the TSO belongs to this capability then we do it directly, otherwise we send a message to the capability that owns the TSO. That is, a thread blocked on STM is owned by its capability, as it should be. The possible downside of this is that we might send multiple messages to wake up a thread when the thread is on another capability. This is safe, it's just not very efficient. I'll try to do some experiments to see if this is a problem. Test Plan: Test case from #15136 doesn't deadlock any more. Reviewers: bgamari, osa1, erikd Reviewed By: osa1 Subscribers: rwbarton, thomie, carter GHC Trac Issues: #15136 Differential Revision: https://phabricator.haskell.org/D4956
* rts: Rip out support for STM invariantsBen Gamari2018-06-021-342/+5
| | | | | | | | | | | | | | | | | | | | | | | This feature has some very serious correctness issues (#14310), introduces a great deal of complexity, and hasn't seen wide usage. Consequently we are removing it, as proposed in Proposal #77 [1]. This is heavily based on a patch from fryguybob. Updates stm submodule. [1] https://github.com/ghc-proposals/ghc-proposals/pull/77 Test Plan: Validate Reviewers: erikd, simonmar, hvr Reviewed By: simonmar Subscribers: rwbarton, thomie, carter GHC Trac Issues: #14310 Differential Revision: https://phabricator.haskell.org/D4760
* Prefer #if defined to #ifdefBen Gamari2017-04-281-1/+1
| | | | Our new CPP linter enforces this.
* Typos in comments [ci skip]Gabor Greif2017-04-051-2/+2
|
* Use C99's boolBen Gamari2016-11-291-67/+52
| | | | | | | | | | | | Test Plan: Validate on lots of platforms Reviewers: erikd, simonmar, austin Reviewed By: erikd, simonmar Subscribers: michalt, thomie Differential Revision: https://phabricator.haskell.org/D2699
* NUMA supportSimon Marlow2016-06-101-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The aim here is to reduce the number of remote memory accesses on systems with a NUMA memory architecture, typically multi-socket servers. Linux provides a NUMA API for doing two things: * Allocating memory local to a particular node * Binding a thread to a particular node When given the +RTS --numa flag, the runtime will * Determine the number of NUMA nodes (N) by querying the OS * Assign capabilities to nodes, so cap C is on node C%N * Bind worker threads on a capability to the correct node * Keep a separate free lists in the block layer for each node * Allocate the nursery for a capability from node-local memory * Allocate blocks in the GC from node-local memory For example, using nofib/parallel/queens on a 24-core 2-socket machine: ``` $ ./Main 15 +RTS -N24 -s -A64m Total time 173.960s ( 7.467s elapsed) $ ./Main 15 +RTS -N24 -s -A64m --numa Total time 150.836s ( 6.423s elapsed) ``` The biggest win here is expected to be allocating from node-local memory, so that means programs using a large -A value (as here). According to perf, on this program the number of remote memory accesses were reduced by more than 50% by using `--numa`. Test Plan: * validate * There's a new flag --debug-numa=<n> that pretends to do NUMA without actually making the OS calls, which is useful for testing the code on non-NUMA systems. * TODO: I need to add some unit tests Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2199
* rts: More const correct-ness fixesErik de Castro Lopo2016-05-181-1/+1
| | | | | | | | | | | | | | | | | | | | In addition to more const-correctness fixes this patch fixes an infelicity of the previous const-correctness patch (995cf0f356) which left `UNTAG_CLOSURE` taking a `const StgClosure` pointer parameter but returning a non-const pointer. Here we restore the original type signature of `UNTAG_CLOSURE` and add a new function `UNTAG_CONST_CLOSURE` which takes and returns a const `StgClosure` pointer and uses that wherever possible. Test Plan: Validate on Linux, OS X and Windows Reviewers: Phyx, hsyl20, bgamari, austin, simonmar, trofi Reviewed By: simonmar, trofi Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2231
* Typos in comments, etc.Gabor Greif2016-02-261-2/+2
|
* rts: Remove space before argument list in ASSERTsBen Gamari2015-12-071-48/+48
| | | | | | | | | | Test Plan: Validate Reviewers: austin, erikd Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1569
* Revert "Rename _closure to _static_closure, apply naming consistently."Edward Z. Yang2014-10-201-2/+2
| | | | | | | This reverts commit 35672072b4091d6f0031417bc160c568f22d0469. Conflicts: compiler/main/DriverPipeline.hs
* Rename _closure to _static_closure, apply naming consistently.Edward Z. Yang2014-10-011-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: In preparation for indirecting all references to closures, we rename _closure to _static_closure to ensure any old code will get an undefined symbol error. In order to reference a closure foobar_closure (which is now undefined), you should instead use STATIC_CLOSURE(foobar). For convenience, a number of these old identifiers are macro'd. Across C-- and C (Windows and otherwise), there were differing conventions on whether or not foobar_closure or &foobar_closure was the address of the closure. Now, all foobar_closure references are addresses, and no & is necessary. CHARLIKE/INTLIKE were not changed, simply alpha-renamed. Part of remove HEAP_ALLOCED patch set (#8199) Depends on D265 Signed-off-by: Edward Z. Yang <ezyang@mit.edu> Test Plan: validate Reviewers: simonmar, austin Subscribers: simonmar, ezyang, carter, thomie Differential Revision: https://phabricator.haskell.org/D267 GHC Trac Issues: #8199
* Revert "rts: add Emacs 'Local Variables' to every .c file"Simon Marlow2014-09-291-8/+0
| | | | This reverts commit 39b5c1cbd8950755de400933cecca7b8deb4ffcd.
* rts: reflow some comments in STM.cAustin Seipp2014-08-201-34/+34
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* rts: detabify/dewhitespace STM.cAustin Seipp2014-08-201-199/+199
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* rts: add Emacs 'Local Variables' to every .c fileAustin Seipp2014-07-281-0/+8
| | | | | | | | This will hopefully help ensure some basic consistency in the forward by overriding buffer variables. In particular, it sets the wrap length, the offset to 4, and turns off tabs. Signed-off-by: Austin Seipp <austin@well-typed.com>
* Fix loop on 64bit Big-Endian platforms (#8134)Austin Seipp2013-11-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a fun one. In the RTS, `cas` expects a pointer to StgWord which will translate to unsigned long (8 bytes under LP64.) But we had previously declared token_locked as *StgBool* - which evaluates to 'int' (4 bytes under LP64.) That means we fail to provide enough storage for the cas primitive, causing it to corrupt memory on a 64bit platform. Hilariously, this somehow did not affect little-endian platforms (ARM, x86, etc) before. That's because to clear our lock token, we would say: token_locked = 0; But because token_locked is 32bits technically, this only writes to half of the 64bit quantity. On a Big-Endian machine, this won't do anything. That is, token_locked starts as 0: / token_locked | v 0x00000000 and the first cas modifies the memory to: / valid / corrupted | | v v 0x00000000 0x00000001 We then clear token_locked, but this doesn't change the corrupted 4 bytes of memory. And then we try to lock the token again, spinning until it is released - clearly a deadlock. Related: Windows (amd64) doesn't follow LP64, but LLP64, where both int and long are 4 bytes, so this shouldn't change anything on these platforms. Thanks to Reid Barton for helping the diagnosis. Also, thanks to Jens Peterson who confirmed this also fixes building GHC on Fedora/ppc64 and Fedora/s390x. Authored-by: Gustavo Luiz Duarte <gustavold@linux.vnet.ibm.com> Signed-off-by: Austin Seipp <austin@well-typed.com>
* Minor typos (fixes #8496)Kirill Boltaev2013-11-011-3/+3
|
* Check to see if TVar's are locked in check_read_only (fixes #7815)Ryan Yates2013-04-171-2/+6
|
* fix warningsSimon Marlow2013-01-301-2/+3
|
* STM: Only wake up onceBen Gamari2013-01-301-5/+11
| | | | | | | | | | | Previously, threads blocked on an STM retry would be sent a wakeup message each time an unpark was requested. This could result in the accumulation of a large number of wake-up messages, which would slow wake-up once the sleeping thread is finally scheduled. Here, we introduce a new closure type, STM_AWOKEN, which marks a TSO which has been sent a wake-up message, allowing us to send only one wakeup.
* A better fix for #7493 (see comment for details)Simon Marlow2012-12-181-20/+44
|
* Revert "Fix a bug in the handling of nested orElse"Simon Marlow2012-12-181-21/+3
| | | | | | This reverts commit f184d9caffa09750ef6a374a7987b9213d6db28e. The next commit will fix it in a better way.
* Fix a bug in the handling of nested orElseSimon Marlow2012-12-101-3/+21
| | | | | | | | | | | | | | Exposed by the following snippet, courtesy of Bas van Dijk and Patrick Palka on libraries@haskell.org: import Control.Concurrent.STM main = do x <- atomically $ do t <- newTVar 1 writeTVar t 2 ((readTVar t >> retry) `orElse` return ()) `orElse` return () readTVar t print x
* Add a write barrier for TVAR closuresSimon Marlow2012-11-161-25/+38
| | | | | | | | | | This improves GC performance when there are a lot of TVars in the heap. For instance, a TChan with a lot of elements causes a massive GC drag without this patch. There's more to do - several other STM closure types don't have write barriers, so GC performance when there are a lot of threads blocked on STM isn't great. But fixing the problem for TVar is a good start.
* small optimisation: inline stmNewTVar()Simon Marlow2012-11-051-15/+0
|
* Fix gcc 4.6 warnings; fixes #5176Ian Lynagh2011-06-251-2/+2
| | | | | | | | | | | Based on a patch from David Terei. Some parts are a little ugly (e.g. defining things that only ASSERTs use only when DEBUG is defined), so we might want to tweak things a little. I've also turned off -Werror for didn't-inline warnings, as we now get a few such warnings.
* Refactoring and tidy upSimon Marlow2011-04-111-9/+4
| | | | | | | | | | | | This is a port of some of the changes from my private local-GC branch (which is still in darcs, I haven't converted it to git yet). There are a couple of small functional differences in the GC stats: first, per-thread GC timings should now be more accurate, and secondly we now report average and maximum pause times. e.g. from minimax +RTS -N8 -s: Tot time (elapsed) Avg pause Max pause Gen 0 2755 colls, 2754 par 13.16s 0.93s 0.0003s 0.0150s Gen 1 769 colls, 769 par 3.71s 0.26s 0.0003s 0.0059s
* stmAddInvariantToCheck: add missing init of invariant->lock (#4057)Simon Marlow2010-06-151-0/+1
|
* New implementation of BLACKHOLEsSimon Marlow2010-03-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This replaces the global blackhole_queue with a clever scheme that enables us to queue up blocked threads on the closure that they are blocked on, while still avoiding atomic instructions in the common case. Advantages: - gets rid of a locked global data structure and some tricky GC code (replacing it with some per-thread data structures and different tricky GC code :) - wakeups are more prompt: parallel/concurrent performance should benefit. I haven't seen anything dramatic in the parallel benchmarks so far, but a couple of threading benchmarks do improve a bit. - waking up a thread blocked on a blackhole is now O(1) (e.g. if it is the target of throwTo). - less sharing and better separation of Capabilities: communication is done with messages, the data structures are strictly owned by a Capability and cannot be modified except by sending messages. - this change will utlimately enable us to do more intelligent scheduling when threads block on each other. This is what started off the whole thing, but it isn't done yet (#3838). I'll be documenting all this on the wiki in due course.
* Use message-passing to implement throwTo in the RTSSimon Marlow2010-03-111-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This replaces some complicated locking schemes with message-passing in the implementation of throwTo. The benefits are - previously it was impossible to guarantee that a throwTo from a thread running on one CPU to a thread running on another CPU would be noticed, and we had to rely on the GC to pick up these forgotten exceptions. This no longer happens. - the locking regime is simpler (though the code is about the same size) - threads can be unblocked from a blocked_exceptions queue without having to traverse the whole queue now. It's a rare case, but replaces an O(n) operation with an O(1). - generally we move in the direction of sharing less between Capabilities (aka HECs), which will become important with other changes we have planned. Also in this patch I replaced several STM-specific closure types with a generic MUT_PRIM closure type, which allowed a lot of code in the GC and other places to go away, hence the line-count reduction. The message-passing changes resulted in about a net zero line-count difference.
* Make allocatePinned use local storage, and other refactoringsSimon Marlow2009-12-011-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | This is a batch of refactoring to remove some of the GC's global state, as we move towards CPU-local GC. - allocateLocal() now allocates large objects into the local nursery, rather than taking a global lock and allocating then in gen 0 step 0. - allocatePinned() was still allocating from global storage and taking a lock each time, now it uses local storage. (mallocForeignPtrBytes should be faster with -threaded). - We had a gen 0 step 0, distinct from the nurseries, which are stored in a separate nurseries[] array. This is slightly strange. I removed the g0s0 global that pointed to gen 0 step 0, and removed all uses of it. I think now we don't use gen 0 step 0 at all, except possibly when there is only one generation. Possibly more tidying up is needed here. - I removed the global allocate() function, and renamed allocateLocal() to allocate(). - the alloc_blocks global is gone. MAYBE_GC() and doYouWantToGC() now check the local nursery only.
* micro-opt: replace stmGetEnclosingTRec() with a field accessSimon Marlow2009-10-141-10/+0
| | | | | While fixing #3578 I noticed that this function was just a field access to StgTRecHeader, so I inlined it manually.
* RTS tidyup sweep, first phaseSimon Marlow2009-08-021-9/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The first phase of this tidyup is focussed on the header files, and in particular making sure we are exposinng publicly exactly what we need to, and no more. - Rts.h now includes everything that the RTS exposes publicly, rather than a random subset of it. - Most of the public header files have moved into subdirectories, and many of them have been renamed. But clients should not need to include any of the other headers directly, just #include the main public headers: Rts.h, HsFFI.h, RtsAPI.h. - All the headers needed for via-C compilation have moved into the stg subdirectory, which is self-contained. Most of the headers for the rest of the RTS APIs have moved into the rts subdirectory. - I left MachDeps.h where it is, because it is so widely used in Haskell code. - I left a deprecated stub for RtsFlags.h in place. The flag structures are now exposed by Rts.h. - Various internal APIs are no longer exposed by public header files. - Various bits of dead code and declarations have been removed - More gcc warnings are turned on, and the RTS code is more warning-clean. - More source files #include "PosixSource.h", and hence only use standard POSIX (1003.1c-1995) interfaces. There is a lot more tidying up still to do, this is just the first pass. I also intend to standardise the names for external RTS APIs (e.g use the rts_ prefix consistently), and declare the internal APIs as hidden for shared libraries.
* Strip tag bits from closure pointers before trying to deference them.Ben.Lippmeier@anu.edu.au2009-02-241-2/+2
|
* Fix parse error with older gccs (#2752)Simon Marlow2008-11-111-1/+1
|
* When waking up thread blocked on TVars, wake oldest first (#2319)Josef Svenningsson2008-10-101-2/+10
| | | | | | | | StgTVarWatchQueue contains the threads blocked on a TVar in order youngest first. The list has to be traversed backwards to unpark the threads oldest first. This improves the fairness when using STM in some situations.
* fix warnings with gcc 4.3Simon Marlow2008-06-181-2/+2
|
* Fix building RTS with gcc 2.*; declare all variables at the top of a blockIan Lynagh2007-09-031-15/+23
| | | | Patch from Audrey Tang.
* Split GC.c, and move storage manager into sm/ directorySimon Marlow2006-10-241-1/+1
| | | | | | | | | | | | | | | | | In preparation for parallel GC, split up the monolithic GC.c file into smaller parts. Also in this patch (and difficult to separate, unfortunatley): - Don't include Stable.h in Rts.h, instead just include it where necessary. - consistently use STATIC_INLINE in source files, and INLINE_HEADER in header files. STATIC_INLINE is now turned off when DEBUG is on, to make debugging easier. - The GC no longer takes the get_roots function as an argument. We weren't making use of this generalisation.
* fix a printf format warningSimon Marlow2006-10-241-1/+1
|
* STM invariantstharris@microsoft.com2006-10-071-170/+573
|