path: root/rts/STM.c
Commit message (Author, Age, Files, Lines)
* Prefer #if defined to #ifdef (Ben Gamari, 2017-04-28, 1 file, -1/+1)
  Our new CPP linter enforces this.
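As a small illustration of the preferred form (the macro names `FEATURE_A` and `FEATURE_B` are hypothetical), `#if defined(X)` means the same as `#ifdef X` but, unlike it, composes with `&&`, `||` and `!` in a single condition:

```c
#include <assert.h>

/* FEATURE_A and FEATURE_B are hypothetical configuration macros. */
#define FEATURE_A 1

/* Same meaning as `#ifdef FEATURE_A`, written in the preferred form. */
#if defined(FEATURE_A)
static const int feature_a_enabled = 1;
#else
static const int feature_a_enabled = 0;
#endif

/* Unlike #ifdef, defined() can be combined with &&, || and ! in one test. */
#if defined(FEATURE_A) && !defined(FEATURE_B)
static const int only_feature_a = 1;
#else
static const int only_feature_a = 0;
#endif
```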
* Typos in comments [ci skip] (Gabor Greif, 2017-04-05, 1 file, -2/+2)
* Use C99's bool (Ben Gamari, 2016-11-29, 1 file, -67/+52)
  Test Plan: Validate on lots of platforms
  Reviewers: erikd, simonmar, austin
  Reviewed By: erikd, simonmar
  Subscribers: michalt, thomie
  Differential Revision: https://phabricator.haskell.org/D2699
* NUMA support (Simon Marlow, 2016-06-10, 1 file, -0/+1)
  Summary:
  The aim here is to reduce the number of remote memory accesses on systems with a NUMA memory architecture, typically multi-socket servers.

  Linux provides a NUMA API for doing two things:
  * Allocating memory local to a particular node
  * Binding a thread to a particular node

  When given the +RTS --numa flag, the runtime will
  * Determine the number of NUMA nodes (N) by querying the OS
  * Assign capabilities to nodes, so cap C is on node C%N
  * Bind worker threads on a capability to the correct node
  * Keep a separate free list in the block layer for each node
  * Allocate the nursery for a capability from node-local memory
  * Allocate blocks in the GC from node-local memory

  For example, using nofib/parallel/queens on a 24-core 2-socket machine:

  ```
  $ ./Main 15 +RTS -N24 -s -A64m
    Total   time  173.960s  (  7.467s elapsed)

  $ ./Main 15 +RTS -N24 -s -A64m --numa
    Total   time  150.836s  (  6.423s elapsed)
  ```

  The biggest win here is expected to be allocating from node-local memory, so that means programs using a large -A value (as here).

  According to perf, on this program the number of remote memory accesses was reduced by more than 50% by using `--numa`.

  Test Plan:
  * validate
  * There's a new flag --debug-numa=<n> that pretends to do NUMA without actually making the OS calls, which is useful for testing the code on non-NUMA systems.
  * TODO: I need to add some unit tests

  Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria
  Subscribers: thomie
  Differential Revision: https://phabricator.haskell.org/D2199
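A minimal sketch of the capability-to-node assignment rule stated above ("cap C is on node C%N"). The function names `capability_to_numa_node` and `caps_on_node` are hypothetical and this is not the RTS's actual code; it only shows how the modulo rule spreads capabilities round-robin across nodes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: capability number -> NUMA node, using C % N. */
static uint32_t capability_to_numa_node(uint32_t cap_no, uint32_t n_nodes)
{
    assert(n_nodes > 0);            /* there is always at least one node */
    return cap_no % n_nodes;
}

/* Hypothetical helper: how many of n_caps capabilities land on `node`. */
static uint32_t caps_on_node(uint32_t n_caps, uint32_t n_nodes, uint32_t node)
{
    uint32_t c, count = 0;
    for (c = 0; c < n_caps; c++)
        if (capability_to_numa_node(c, n_nodes) == node)
            count++;
    return count;
}
```

With `-N24` on a 2-node machine, for example, this places capabilities 0, 2, 4, ... on node 0 and 1, 3, 5, ... on node 1, i.e. 12 capabilities per node.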
* rts: More const correct-ness fixes (Erik de Castro Lopo, 2016-05-18, 1 file, -1/+1)
  In addition to more const-correctness fixes, this patch fixes an infelicity of the previous const-correctness patch (995cf0f356), which left `UNTAG_CLOSURE` taking a `const StgClosure` pointer parameter but returning a non-const pointer. Here we restore the original type signature of `UNTAG_CLOSURE`, add a new function `UNTAG_CONST_CLOSURE` which takes and returns a const `StgClosure` pointer, and use that wherever possible.

  Test Plan: Validate on Linux, OS X and Windows
  Reviewers: Phyx, hsyl20, bgamari, austin, simonmar, trofi
  Reviewed By: simonmar, trofi
  Subscribers: thomie
  Differential Revision: https://phabricator.haskell.org/D2231
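A minimal sketch of the const-correct pair described above. `Closure`, `TAG_MASK`, `untag_closure`, `untag_const_closure` and `tag_closure` are hypothetical stand-ins for `StgClosure`, the RTS tag mask, `UNTAG_CLOSURE` and `UNTAG_CONST_CLOSURE`; the sketch assumes closures are 8-byte aligned so the low 3 bits of a pointer are free to hold a tag. The point is that the const variant keeps `const` in its return type instead of silently dropping it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for StgClosure, forced to 8-byte alignment so the
   low 3 pointer bits are always zero and can carry a tag. */
typedef struct Closure { _Alignas(8) uintptr_t info; } Closure;
#define TAG_MASK ((uintptr_t)7)

/* Mutable in, mutable out. */
static inline Closure *untag_closure(Closure *p)
{
    return (Closure *)((uintptr_t)p & ~TAG_MASK);
}

/* Const in, const out: a read-only pointer never becomes writable. */
static inline const Closure *untag_const_closure(const Closure *p)
{
    return (const Closure *)((uintptr_t)p & ~TAG_MASK);
}

/* Demo helper: set a tag in the low bits of a closure pointer. */
static inline const Closure *tag_closure(const Closure *p, uintptr_t tag)
{
    return (const Closure *)((uintptr_t)p | (tag & TAG_MASK));
}
```

Passing the result of `untag_const_closure` to a function expecting a plain `Closure *` now produces a compiler diagnostic, which is exactly the check the old signature bypassed.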
* Typos in comments, etc. (Gabor Greif, 2016-02-26, 1 file, -2/+2)
* rts: Remove space before argument list in ASSERTs (Ben Gamari, 2015-12-07, 1 file, -48/+48)
  Test Plan: Validate
  Reviewers: austin, erikd
  Subscribers: thomie
  Differential Revision: https://phabricator.haskell.org/D1569
* Revert "Rename _closure to _static_closure, apply naming consistently." (Edward Z. Yang, 2014-10-20, 1 file, -2/+2)
  This reverts commit 35672072b4091d6f0031417bc160c568f22d0469.

  Conflicts:
  compiler/main/DriverPipeline.hs
* Rename _closure to _static_closure, apply naming consistently. (Edward Z. Yang, 2014-10-01, 1 file, -2/+2)
  Summary: In preparation for indirecting all references to closures, we rename _closure to _static_closure to ensure any old code will get an undefined symbol error. In order to reference a closure foobar_closure (which is now undefined), you should instead use STATIC_CLOSURE(foobar). For convenience, a number of these old identifiers are macro'd.

  Across C-- and C (Windows and otherwise), there were differing conventions on whether or not foobar_closure or &foobar_closure was the address of the closure. Now, all foobar_closure references are addresses, and no & is necessary.

  CHARLIKE/INTLIKE were not changed, simply alpha-renamed.

  Part of remove HEAP_ALLOCED patch set (#8199)

  Depends on D265

  Signed-off-by: Edward Z. Yang <ezyang@mit.edu>

  Test Plan: validate
  Reviewers: simonmar, austin
  Subscribers: simonmar, ezyang, carter, thomie
  Differential Revision: https://phabricator.haskell.org/D267
  GHC Trac Issues: #8199
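The `STATIC_CLOSURE(foobar)` convention can be sketched with token pasting. This is an assumed definition for illustration only (the commit message does not show the macro body), `Closure` and `foobar_static_closure` are hypothetical, and the change itself was reverted on 2014-10-20. The macro yields the closure's address, so callers never write `&`.

```c
#include <assert.h>

/* Hypothetical closure type; only the naming scheme matters here. */
typedef struct { int payload; } Closure;

/* Assumed shape: paste the _static_closure suffix onto the name and take
   its address, so STATIC_CLOSURE(foobar) is always a pointer. */
#define STATIC_CLOSURE(name) (&name##_static_closure)

static Closure foobar_static_closure = { 7 };
```

Code that still refers to `foobar_closure` would then fail to link, which is the "undefined symbol error" the rename was designed to provoke.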
* Revert "rts: add Emacs 'Local Variables' to every .c file" (Simon Marlow, 2014-09-29, 1 file, -8/+0)
  This reverts commit 39b5c1cbd8950755de400933cecca7b8deb4ffcd.
* rts: reflow some comments in STM.c (Austin Seipp, 2014-08-20, 1 file, -34/+34)
  Signed-off-by: Austin Seipp <austin@well-typed.com>
* rts: detabify/dewhitespace STM.c (Austin Seipp, 2014-08-20, 1 file, -199/+199)
  Signed-off-by: Austin Seipp <austin@well-typed.com>
* rts: add Emacs 'Local Variables' to every .c file (Austin Seipp, 2014-07-28, 1 file, -0/+8)
  This will hopefully help ensure some basic consistency going forward by overriding buffer variables. In particular, it sets the wrap length, sets the offset to 4, and turns off tabs.

  Signed-off-by: Austin Seipp <austin@well-typed.com>
* Fix loop on 64bit Big-Endian platforms (#8134) (Austin Seipp, 2013-11-02, 1 file, -1/+1)
  This is a fun one.

  In the RTS, `cas` expects a pointer to StgWord, which translates to unsigned long (8 bytes under LP64). But we had previously declared token_locked as *StgBool* - which evaluates to 'int' (4 bytes under LP64). That means we fail to provide enough storage for the cas primitive, causing it to corrupt memory on a 64bit platform.

  Hilariously, this somehow did not affect little-endian platforms (ARM, x86, etc.) before. That's because to clear our lock token, we would say:

      token_locked = 0;

  But because token_locked is technically 32 bits, this only writes to half of the 64bit quantity. On a Big-Endian machine, this won't do anything. That is, token_locked starts as 0:

      / token_locked
      |
      v
      0x00000000

  and the first cas modifies the memory to:

      / valid     / corrupted
      |           |
      v           v
      0x00000000  0x00000001

  We then clear token_locked, but this doesn't change the corrupted 4 bytes of memory. And then we try to lock the token again, spinning until it is released - clearly a deadlock.

  Related: Windows (amd64) doesn't follow LP64, but LLP64, where both int and long are 4 bytes, so this shouldn't change anything on these platforms.

  Thanks to Reid Barton for helping with the diagnosis. Also, thanks to Jens Peterson who confirmed this also fixes building GHC on Fedora/ppc64 and Fedora/s390x.

  Authored-by: Gustavo Luiz Duarte <gustavold@linux.vnet.ibm.com>
  Signed-off-by: Austin Seipp <austin@well-typed.com>
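A minimal sketch of the corrected pattern, written with C11 `<stdatomic.h>` rather than the RTS's own `cas` primitive. `Word` stands in for `StgWord` (assumed here to be `uintptr_t`), and `try_lock_token`/`unlock_token` are hypothetical helpers. The point is that the lock token is declared with exactly the type the compare-and-swap operates on, so locking and clearing always touch the same full word.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for StgWord: word-sized, 8 bytes under LP64. */
typedef uintptr_t Word;

/* The buggy shape was an int-sized token handed to a word-sized CAS,
   which under LP64 writes 8 bytes into a 4-byte object. Here the token
   has the CAS's own type, so no neighbouring memory is touched. */
static _Atomic Word token_locked = 0;

/* Try to take the lock: succeeds only if the whole word is 0. */
static bool try_lock_token(void)
{
    Word expected = 0;
    return atomic_compare_exchange_strong(&token_locked, &expected, (Word)1);
}

/* Release the lock: clears every byte of the same word the CAS wrote. */
static void unlock_token(void)
{
    atomic_store(&token_locked, (Word)0);
}
```

Because both operations use one type, the result no longer depends on endianness or on which half of the word a narrower store happens to hit.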
* Minor typos (fixes #8496) (Kirill Boltaev, 2013-11-01, 1 file, -3/+3)
* Check to see if TVars are locked in check_read_only (fixes #7815) (Ryan Yates, 2013-04-17, 1 file, -2/+6)
* fix warnings (Simon Marlow, 2013-01-30, 1 file, -2/+3)
* STM: Only wake up once (Ben Gamari, 2013-01-30, 1 file, -5/+11)
  Previously, threads blocked on an STM retry would be sent a wakeup message each time an unpark was requested. This could result in the accumulation of a large number of wake-up messages, which would slow wake-up once the sleeping thread is finally scheduled.

  Here, we introduce a new closure type, STM_AWOKEN, which marks a TSO which has been sent a wake-up message, allowing us to send only one wakeup.
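A minimal sketch of the "only wake up once" idea. The RTS marks the TSO with the STM_AWOKEN closure; here a hypothetical `awoken` flag plays that role and a counter stands in for queued wake-up messages. The `on_rescheduled` reset is an assumption about when the mark is cleared, not a description of the RTS's exact logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical blocked-thread record. */
typedef struct {
    bool awoken;          /* a wake-up message has already been sent */
    int  messages_sent;   /* stands in for queued wake-up messages */
} BlockedThread;

/* Called for every unpark request: only the first one sends a message. */
static void unpark(BlockedThread *t)
{
    if (t->awoken)
        return;           /* already notified, don't queue another message */
    t->awoken = true;
    t->messages_sent++;
}

/* Assumed reset point: once the thread runs again it may block afresh. */
static void on_rescheduled(BlockedThread *t)
{
    t->awoken = false;
}
```

However many unpark requests arrive while the thread sleeps, at most one message is queued, so the cost of waking no longer grows with the number of requests.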
* A better fix for #7493 (see comment for details) (Simon Marlow, 2012-12-18, 1 file, -20/+44)
* Revert "Fix a bug in the handling of nested orElse" (Simon Marlow, 2012-12-18, 1 file, -21/+3)
  This reverts commit f184d9caffa09750ef6a374a7987b9213d6db28e.

  The next commit will fix it in a better way.
* Fix a bug in the handling of nested orElse (Simon Marlow, 2012-12-10, 1 file, -3/+21)
  Exposed by the following snippet, courtesy of Bas van Dijk and Patrick Palka on libraries@haskell.org:

      import Control.Concurrent.STM

      main = do
        x <- atomically $ do
               t <- newTVar 1
               writeTVar t 2
               ((readTVar t >> retry) `orElse` return ()) `orElse` return ()
               readTVar t
        print x
* Add a write barrier for TVAR closures (Simon Marlow, 2012-11-16, 1 file, -25/+38)
  This improves GC performance when there are a lot of TVars in the heap. For instance, a TChan with a lot of elements causes a massive GC drag without this patch.

  There's more to do - several other STM closure types don't have write barriers, so GC performance when there are a lot of threads blocked on STM isn't great. But fixing the problem for TVar is a good start.
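A minimal sketch of a generic dirty-flag write barrier with a remembered set, assuming the usual generational scheme. All names are hypothetical, and the check for which generation the TVar lives in is omitted for brevity. The idea: the first write to a clean TVar records it once; a minor GC then scans only recorded TVars rather than every TVar in the heap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REMEMBERED 64          /* hypothetical fixed capacity */

typedef struct TVar {
    void *value;
    bool  dirty;                   /* already in the remembered set? */
} TVar;

static TVar  *remembered[MAX_REMEMBERED];
static size_t n_remembered = 0;

/* Write barrier: only the first write since the last GC records the TVar. */
static void write_tvar(TVar *tv, void *new_value)
{
    if (!tv->dirty) {
        tv->dirty = true;
        assert(n_remembered < MAX_REMEMBERED);
        remembered[n_remembered++] = tv;
    }
    tv->value = new_value;
}

/* After a minor GC has scanned the recorded TVars, they become clean. */
static void after_minor_gc(void)
{
    for (size_t i = 0; i < n_remembered; i++)
        remembered[i]->dirty = false;
    n_remembered = 0;
}
```

Without such a barrier the collector cannot know which TVars changed, so it must treat all of them as potentially pointing at young objects, which is the "GC drag" a large TChan causes.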
* small optimisation: inline stmNewTVar() (Simon Marlow, 2012-11-05, 1 file, -15/+0)
* Fix gcc 4.6 warnings; fixes #5176 (Ian Lynagh, 2011-06-25, 1 file, -2/+2)
  Based on a patch from David Terei.

  Some parts are a little ugly (e.g. defining things that only ASSERTs use only when DEBUG is defined), so we might want to tweak things a little.

  I've also turned off -Werror for didn't-inline warnings, as we now get a few such warnings.
* Refactoring and tidy up (Simon Marlow, 2011-04-11, 1 file, -9/+4)
  This is a port of some of the changes from my private local-GC branch (which is still in darcs; I haven't converted it to git yet). There are a couple of small functional differences in the GC stats: first, per-thread GC timings should now be more accurate, and secondly we now report average and maximum pause times. For example, from minimax +RTS -N8 -s:

                                      Tot time (elapsed)  Avg pause  Max pause
        Gen  0  2755 colls, 2754 par    13.16s    0.93s     0.0003s    0.0150s
        Gen  1   769 colls,  769 par     3.71s    0.26s     0.0003s    0.0059s
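A minimal sketch of how average and maximum pause times can be derived from per-collection elapsed times. `PauseStats`, `record_pause` and `avg_pause` are hypothetical and not the RTS's statistics code. The average is simply total elapsed time divided by the number of collections: for Gen 0 above, 0.93s / 2755 is about 0.00034s, consistent with the reported 0.0003s.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-generation pause statistics. */
typedef struct {
    size_t collections;
    double total_elapsed;   /* seconds, summed over all collections */
    double max_pause;       /* longest single collection, in seconds */
} PauseStats;

/* Record one collection's elapsed (wall-clock) time. */
static void record_pause(PauseStats *s, double elapsed)
{
    s->collections++;
    s->total_elapsed += elapsed;
    if (elapsed > s->max_pause)
        s->max_pause = elapsed;
}

/* Average pause = total elapsed / number of collections (0 if none). */
static double avg_pause(const PauseStats *s)
{
    return s->collections ? s->total_elapsed / (double)s->collections : 0.0;
}
```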
* stmAddInvariantToCheck: add missing init of invariant->lock (#4057) (Simon Marlow, 2010-06-15, 1 file, -0/+1)
* New implementation of BLACKHOLEs (Simon Marlow, 2010-03-29, 1 file, -1/+1)
  This replaces the global blackhole_queue with a clever scheme that enables us to queue up blocked threads on the closure that they are blocked on, while still avoiding atomic instructions in the common case.

  Advantages:

  - gets rid of a locked global data structure and some tricky GC code (replacing it with some per-thread data structures and different tricky GC code :)

  - wakeups are more prompt: parallel/concurrent performance should benefit. I haven't seen anything dramatic in the parallel benchmarks so far, but a couple of threading benchmarks do improve a bit.

  - waking up a thread blocked on a blackhole is now O(1) (e.g. if it is the target of throwTo).

  - less sharing and better separation of Capabilities: communication is done with messages, the data structures are strictly owned by a Capability and cannot be modified except by sending messages.

  - this change will ultimately enable us to do more intelligent scheduling when threads block on each other. This is what started off the whole thing, but it isn't done yet (#3838).

  I'll be documenting all this on the wiki in due course.
* Use message-passing to implement throwTo in the RTS (Simon Marlow, 2010-03-11, 1 file, -2/+1)
  This replaces some complicated locking schemes with message-passing in the implementation of throwTo. The benefits are:

  - previously it was impossible to guarantee that a throwTo from a thread running on one CPU to a thread running on another CPU would be noticed, and we had to rely on the GC to pick up these forgotten exceptions. This no longer happens.

  - the locking regime is simpler (though the code is about the same size)

  - threads can be unblocked from a blocked_exceptions queue without having to traverse the whole queue now. It's a rare case, but replaces an O(n) operation with an O(1) one.

  - generally we move in the direction of sharing less between Capabilities (aka HECs), which will become important with other changes we have planned.

  Also in this patch I replaced several STM-specific closure types with a generic MUT_PRIM closure type, which allowed a lot of code in the GC and other places to go away, hence the line-count reduction. The message-passing changes resulted in a roughly net-zero line-count difference.
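A minimal sketch of the message-passing idea: any thread may post a message to a capability's inbox, but only the owning capability drains it and acts on the target thread. All names are hypothetical. For brevity this swaps in a lock-free push (a CAS loop over a linked stack) using C11 atomics; the real RTS may well protect its message queue with a per-capability lock instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical throwTo message. */
typedef struct Message {
    struct Message *next;
    int target_thread;      /* thread the exception is aimed at */
    int exception;          /* stand-in for the exception value */
} Message;

/* Hypothetical capability: only its inbox is shared with other threads. */
typedef struct {
    _Atomic(Message *) inbox;
} Capability;

/* Sender side (any thread): push onto the inbox with a CAS loop. */
static void send_message(Capability *cap, Message *m)
{
    Message *head = atomic_load(&cap->inbox);
    do {
        m->next = head;     /* on CAS failure, head is refreshed */
    } while (!atomic_compare_exchange_weak(&cap->inbox, &head, m));
}

/* Owner side: take every pending message at once (newest first). */
static Message *drain_inbox(Capability *cap)
{
    return atomic_exchange(&cap->inbox, (Message *)NULL);
}
```

Because only the owner ever mutates the target thread's state, a sender never needs to lock the target, and a message cannot be "forgotten": it stays in the inbox until the owner drains it.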
* Make allocatePinned use local storage, and other refactorings (Simon Marlow, 2009-12-01, 1 file, -6/+6)
  This is a batch of refactoring to remove some of the GC's global state, as we move towards CPU-local GC.

  - allocateLocal() now allocates large objects into the local nursery, rather than taking a global lock and allocating them in gen 0 step 0.

  - allocatePinned() was still allocating from global storage and taking a lock each time; now it uses local storage. (mallocForeignPtrBytes should be faster with -threaded.)

  - We had a gen 0 step 0, distinct from the nurseries, which are stored in a separate nurseries[] array. This is slightly strange. I removed the g0s0 global that pointed to gen 0 step 0, and removed all uses of it. I think now we don't use gen 0 step 0 at all, except possibly when there is only one generation. Possibly more tidying up is needed here.

  - I removed the global allocate() function, and renamed allocateLocal() to allocate().

  - the alloc_blocks global is gone. MAYBE_GC() and doYouWantToGC() now check the local nursery only.
* micro-opt: replace stmGetEnclosingTRec() with a field access (Simon Marlow, 2009-10-14, 1 file, -10/+0)
  While fixing #3578 I noticed that this function was just a field access to StgTRecHeader, so I inlined it manually.
* RTS tidyup sweep, first phase (Simon Marlow, 2009-08-02, 1 file, -9/+8)
  The first phase of this tidyup is focussed on the header files, and in particular making sure we are exposing publicly exactly what we need to, and no more.

  - Rts.h now includes everything that the RTS exposes publicly, rather than a random subset of it.

  - Most of the public header files have moved into subdirectories, and many of them have been renamed. But clients should not need to include any of the other headers directly, just #include the main public headers: Rts.h, HsFFI.h, RtsAPI.h.

  - All the headers needed for via-C compilation have moved into the stg subdirectory, which is self-contained. Most of the headers for the rest of the RTS APIs have moved into the rts subdirectory.

  - I left MachDeps.h where it is, because it is so widely used in Haskell code.

  - I left a deprecated stub for RtsFlags.h in place. The flag structures are now exposed by Rts.h.

  - Various internal APIs are no longer exposed by public header files.

  - Various bits of dead code and declarations have been removed.

  - More gcc warnings are turned on, and the RTS code is more warning-clean.

  - More source files #include "PosixSource.h", and hence only use standard POSIX (1003.1c-1995) interfaces.

  There is a lot more tidying up still to do; this is just the first pass. I also intend to standardise the names for external RTS APIs (e.g. use the rts_ prefix consistently), and declare the internal APIs as hidden for shared libraries.
* Strip tag bits from closure pointers before trying to dereference them. (Ben.Lippmeier@anu.edu.au, 2009-02-24, 1 file, -2/+2)
* Fix parse error with older gccs (#2752) (Simon Marlow, 2008-11-11, 1 file, -1/+1)
* When waking up thread blocked on TVars, wake oldest first (#2319) (Josef Svenningsson, 2008-10-10, 1 file, -2/+10)
  StgTVarWatchQueue contains the threads blocked on a TVar in order, youngest first. The list has to be traversed backwards to unpark the threads oldest first.

  This improves the fairness when using STM in some situations.
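A minimal sketch of oldest-first wakeup over a youngest-first list. `WatchQueue`, `push_waiter` and `wake_oldest_first` are hypothetical names loosely modelled on StgTVarWatchQueue, and the sketch assumes the queue is doubly linked. New waiters are pushed at the head, so the head is the youngest; walking to the tail and then following `prev` pointers visits the oldest waiter first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical doubly linked watch-queue entry. */
typedef struct WatchQueue {
    struct WatchQueue *next;   /* towards older waiters */
    struct WatchQueue *prev;   /* towards younger waiters */
    int thread_id;
} WatchQueue;

/* Push a new (youngest) waiter at the head; returns the new head. */
static WatchQueue *push_waiter(WatchQueue *head, WatchQueue *w)
{
    w->next = head;
    w->prev = NULL;
    if (head != NULL)
        head->prev = w;
    return w;
}

/* Write thread ids into out[] in wake-up order (oldest first). */
static size_t wake_oldest_first(WatchQueue *head, int *out, size_t max)
{
    WatchQueue *q = head;
    size_t n = 0;
    if (q == NULL)
        return 0;
    while (q->next != NULL)        /* find the tail: the oldest waiter */
        q = q->next;
    for (; q != NULL && n < max; q = q->prev)
        out[n++] = q->thread_id;
    return n;
}
```

Waking in arrival order means a thread that has waited longest gets the first chance to re-run its transaction, which is the fairness gain the commit describes.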
* fix warnings with gcc 4.3 (Simon Marlow, 2008-06-18, 1 file, -2/+2)
* Fix building RTS with gcc 2.*; declare all variables at the top of a block (Ian Lynagh, 2007-09-03, 1 file, -15/+23)
  Patch from Audrey Tang.
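A small illustration of the C89-compatible style this patch adopts (the function `sum_upto` is hypothetical): every declaration comes before the first statement of its block. Mixing declarations and statements, including declaring a variable in a `for` initializer, is a C99 feature that gcc 2.x does not accept.

```c
#include <assert.h>

/* All declarations at the top of the block, as older compilers require. */
static int sum_upto(int n)
{
    int i;          /* declared here, not in the for-init clause */
    int total;

    total = 0;
    for (i = 1; i <= n; i++)
        total += i;
    return total;
}
```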
* Split GC.c, and move storage manager into sm/ directory (Simon Marlow, 2006-10-24, 1 file, -1/+1)
  In preparation for parallel GC, split up the monolithic GC.c file into smaller parts. Also in this patch (and difficult to separate, unfortunately):

  - Don't include Stable.h in Rts.h; instead just include it where necessary.

  - consistently use STATIC_INLINE in source files, and INLINE_HEADER in header files. STATIC_INLINE is now turned off when DEBUG is on, to make debugging easier.

  - The GC no longer takes the get_roots function as an argument. We weren't making use of this generalisation.
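A minimal sketch of a `STATIC_INLINE` macro that is switched off in debug builds, so that functions remain real, breakpointable symbols. The exact RTS definition may differ; this assumes the build selects debug mode by defining `DEBUG`, and `square` is a hypothetical example function.

```c
#include <assert.h>

/* Assumed shape: plain `static` under DEBUG, `static inline` otherwise. */
#if defined(DEBUG)
#define STATIC_INLINE static
#else
#define STATIC_INLINE static inline
#endif

/* Hypothetical example: inlined in normal builds, an ordinary function
   (easy to set breakpoints on) in debug builds. */
STATIC_INLINE int square(int x)
{
    return x * x;
}
```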
* fix a printf format warning (Simon Marlow, 2006-10-24, 1 file, -1/+1)
* STM invariants (tharris@microsoft.com, 2006-10-07, 1 file, -170/+573)
* add STM support to the new throwTo mechanism (Simon Marlow, 2006-06-16, 1 file, -9/+15)
* New tracing interface (Simon Marlow, 2006-06-08, 1 file, -10/+2)
  A simple interface for generating trace messages with timestamps and thread IDs attached to them. Most debugging output goes through this interface now, so it is straightforward to get timestamped debugging traces with +RTS -vt. Also, we plan to use this to generate parallelism profiles from the trace output.
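A minimal sketch of a trace helper that prefixes each message with a timestamp and a thread ID. The function name `format_trace` and the output format are hypothetical, not the RTS's actual trace format; the caller supplies the timestamp so the output is deterministic and easy to test.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format "<timestamp>: [thread <id>] <message>" into buf; returns the
   length snprintf reports (may exceed len if the output was truncated). */
static int format_trace(char *buf, size_t len, double timestamp,
                        unsigned thread_id, const char *msg)
{
    return snprintf(buf, len, "%.6f: [thread %u] %s", timestamp, thread_id, msg);
}
```

Routing all debug output through one such function is what makes a single flag (here `+RTS -vt`) able to turn on timestamps everywhere at once.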
* more warning fixage (Simon Marlow, 2006-06-08, 1 file, -6/+6)
* fix a warning (Simon Marlow, 2006-06-08, 1 file, -1/+1)
* fix some warnings (Simon Marlow, 2006-06-08, 1 file, -3/+6)
* Reorganisation of the source tree (Simon Marlow, 2006-04-07, 1 file, -0/+1261)
  Most of the other users of the fptools build system have migrated to Cabal, and with the move to darcs we can now flatten the source tree without losing history, so here goes.

  The main change is that the ghc/ subdir is gone, and most of what it contained is now at the top level. The build system now makes no pretense at being multi-project; it is just the GHC build system.

  No doubt this will break many things, and there will be a period of instability while we fix the dependencies. A straightforward build should work, but I haven't yet fixed binary/source distributions. Changes to the Building Guide will follow, too.