summaryrefslogtreecommitdiff
path: root/rts/Sparks.c
Commit message (Collapse)AuthorAgeFilesLines
* Correct closure observation, construction, and mutation on weak memory machines.Travis Whitaker2019-06-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here the following changes are introduced: - A read barrier machine op is added to Cmm. - The order in which a closure's fields are read and written is changed. - Memory barriers are added to RTS code to ensure correctness on out-or-order machines with weak memory ordering. Cmm has a new CallishMachOp called MO_ReadBarrier. On weak memory machines, this is lowered to an instruction that ensures memory reads that occur after said instruction in program order are not performed before reads coming before said instruction in program order. On machines with strong memory ordering properties (e.g. X86, SPARC in TSO mode) no such instruction is necessary, so MO_ReadBarrier is simply erased. However, such an instruction is necessary on weakly ordered machines, e.g. ARM and PowerPC. Weam memory ordering has consequences for how closures are observed and mutated. For example, consider a closure that needs to be updated to an indirection. In order for the indirection to be safe for concurrent observers to enter, said observers must read the indirection's info table before they read the indirectee. Furthermore, the entering observer makes assumptions about the closure based on its info table contents, e.g. an INFO_TYPE of IND imples the closure has an indirectee pointer that is safe to follow. When a closure is updated with an indirection, both its info table and its indirectee must be written. With weak memory ordering, these two writes can be arbitrarily reordered, and perhaps even interleaved with other threads' reads and writes (in the absence of memory barrier instructions). Consider this example of a bad reordering: - An updater writes to a closure's info table (INFO_TYPE is now IND). - A concurrent observer branches upon reading the closure's INFO_TYPE as IND. - A concurrent observer reads the closure's indirectee and enters it. (!!!) - An updater writes the closure's indirectee. Here the update to the indirectee comes too late and the concurrent observer has jumped off into the abyss. Speculative execution can also cause us issues, consider: - An observer is about to case on a value in closure's info table. - The observer speculatively reads one or more of closure's fields. - An updater writes to closure's info table. - The observer takes a branch based on the new info table value, but with the old closure fields! - The updater writes to the closure's other fields, but its too late. Because of these effects, reads and writes to a closure's info table must be ordered carefully with respect to reads and writes to the closure's other fields, and memory barriers must be placed to ensure that reads and writes occur in program order. Specifically, updates to a closure must follow the following pattern: - Update the closure's (non-info table) fields. - Write barrier. - Update the closure's info table. Observing a closure's fields must follow the following pattern: - Read the closure's info pointer. - Read barrier. - Read the closure's (non-info table) fields. This patch updates RTS code to obey this pattern. This should fix long-standing SMP bugs on ARM (specifically newer aarch64 microarchitectures supporting out-of-order execution) and PowerPC. This fixes issue #15449. Co-Authored-By: Ben Gamari <ben@well-typed.com>
* Schedule.c: remove unreachable code blockÖmer Sinan Ağacan2018-03-071-15/+0
|
* rts: Label all threads created by the RTSBen Gamari2017-10-161-1/+2
| | | | | | | | | | Reviewers: austin, erikd, simonmar Reviewed By: simonmar Subscribers: pacak, rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D4068
* Don't GC sparks for CAFsSimon Marlow2016-06-141-9/+6
| | | | We can't tell whether the CAF is actually garbage or not.
* rts: Replace `nat` with `uint32_t`Erik de Castro Lopo2016-05-051-3/+3
| | | | | | | | | | | | The `nat` type was an alias for `unsigned int` with a comment saying it was at least 32 bits. We keep the typedef in case client code is using it but mark it as deprecated. Test Plan: Validated on Linux, OS X and Windows Reviewers: simonmar, austin, thomie, hvr, bgamari, hsyl20 Differential Revision: https://phabricator.haskell.org/D2166
* Two step allocator for 64-bit systemsGiovanni Campagna2015-07-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | Summary: The current OS memory allocator conflates the concepts of allocating address space and allocating memory, which makes the HEAP_ALLOCED() implementation excessively complicated (as the only thing it cares about is address space layout) and slow. Instead, what we want is to allocate a single insanely large contiguous block of address space (to make HEAP_ALLOCED() checks fast), and then commit subportions of that in 1MB blocks as we did before. This is currently behind a flag, USE_LARGE_ADDRESS_SPACE, that is only enabled for certain OSes. Test Plan: validate Reviewers: simonmar, ezyang, austin Subscribers: thomie, carter Differential Revision: https://phabricator.haskell.org/D524 GHC Trac Issues: #9706
* Delete the WayPar wayThomas Miedema2015-07-101-1/+1
| | | | | | Also remove 't' and 's' from ALL_WAYS; they don't exist. Differential Revision: https://phabricator.haskell.org/D1055
* Revert "rts: add Emacs 'Local Variables' to every .c file"Simon Marlow2014-09-291-8/+0
| | | | This reverts commit 39b5c1cbd8950755de400933cecca7b8deb4ffcd.
* rts: detabify/dewhitespace Sparks.cAustin Seipp2014-08-201-15/+15
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* rts: add Emacs 'Local Variables' to every .c fileAustin Seipp2014-07-281-0/+8
| | | | | | | | This will hopefully help ensure some basic consistency in the forward by overriding buffer variables. In particular, it sets the wrap length, the offset to 4, and turns off tabs. Signed-off-by: Austin Seipp <austin@well-typed.com>
* Add new fully-accurate per-spark trace/eventlog eventsDuncan Coutts2011-07-181-1/+10
| | | | | | | | | | | | | | Replaces the existing EVENT_RUN/STEAL_SPARK events with 7 new events covering all stages of the spark lifcycle: create, dud, overflow, run, steal, fizzle, gc The sampled spark events are still available. There are now two event classes for sparks, the sampled and the fully accurate. They can be enabled/disabled independently. By default +RTS -l includes the sampled but not full detail spark events. Use +RTS -lf-p to enable the detailed 'f' and disable the sampled 'p' spark. Includes work by Mikolaj <mikolaj.konarski@gmail.com>
* Move allocation of spark pools into initCapabilityDuncan Coutts2011-07-181-7/+3
| | | | | | Rather than a separate phase of initSparkPools. It means all the spark stuff for a capability is initialisaed at the same time, which is then becomes a good place to stick an initial spark trace event.
* Classify overflowed sparks separatelyDuncan Coutts2011-07-181-2/+6
| | | | | | | | | | | When you use `par` to make a spark, if the spark pool on the current capability is full then the spark is discarded. This represents a loss of potential parallelism and it also means there are simply a lot of sparks around. Both are things that might be of concern to a programmer when tuning a parallel program that uses par. The "+RTS -s" stats command now reports overflowed sparks, e.g. SPARKS: 100001 (15521 converted, 84480 overflowed, 0 dud, 0 GC'd, 0 fizzled)
* Use a struct for the set of spark countersDuncan Coutts2011-07-181-8/+8
|
* Change tryStealSpark so it does not consume fizzled sparksDuncan Coutts2011-07-181-26/+0
| | | | | We want to count fizzled sparks accurately. Now tryStealSpark returns fizzled sparks, and the callers now update the fizzled spark count.
* Improve the newSpark dud test by using the pointer tag bitsDuncan Coutts2011-07-181-7/+1
| | | | | | | | newSpark() checks if the spark is a dud, and if so does not add it to the spark pool. Previously, newSpark would discard the pointer tag bits and just check closure_SHOULD_SPARK(p). We can take advantage of the tag bits which can tell us if the pointer points to a value. If it is, it's a dud spark and we don't need to add it to the spark pool.
* pruneSparkQueue: handle CAFsSimon Marlow2011-03-181-9/+24
|
* pruneSparkQueue: check for tagged pointersSimon Marlow2011-02-141-22/+33
| | | | | | | This was a bug in 6.12.3. I think the problem no longer occurs due to the way sparks are treated as weak pointers, but it doesn't hurt to test for tagged pointers anyway: better to do the test than have a subtle invariant.
* count fizzled and GC'd sparks separatelySimon Marlow2010-11-111-3/+3
|
* count "dud" sparks (expressions that were already evaluated when sparked)Simon Marlow2010-11-011-3/+4
|
* fix warningSimon Marlow2010-05-251-1/+1
|
* Make sparks into weak pointers (#2185)Simon Marlow2010-05-251-4/+8
| | | | | The new strategies library (parallel-2.0+, preferably 2.2+) is now required for parallel programming, otherwise parallelism will be lost.
* Windows DLLs: use DLL aware runSparks_closure instead of ↵Ben.Lippmeier@anu.edu.au2009-11-231-1/+1
| | | | base_GHCziConc_runSparks_closure directly
* Expose all EventLog events as DTrace probesManuel M T Chakravarty2009-12-121-1/+1
| | | | | | | | | | | | | | - Defines a DTrace provider, called 'HaskellEvent', that provides a probe for every event of the eventlog framework. - In contrast to the original eventlog, the DTrace probes are available in all flavours of the runtime system (DTrace probes have virtually no overhead if not enabled); when -DTRACING is defined both the regular event log as well as DTrace probes can be used. - Currently, Mac OS X only. User-space DTrace probes are implemented differently on Mac OS X than in the original DTrace implementation. Nevertheless, it shouldn't be too hard to enable these probes on other platforms, too. - Documentation is at http://hackage.haskell.org/trac/ghc/wiki/DTrace
* Unify event logging and debug tracing.Simon Marlow2009-08-291-7/+5
| | | | | | | | | | | | | | | | | | | - tracing facilities are now enabled with -DTRACING, and -DDEBUG additionally enables debug-tracing. -DEVENTLOG has been removed. - -debug now implies -eventlog - events can be printed to stderr instead of being sent to the binary .eventlog file by adding +RTS -v (which is implied by the +RTS -Dx options). - -Dx debug messages can be sent to the binary .eventlog file by adding +RTS -l. This should help debugging by reducing the impact of debug tracing on execution time. - Various debug messages that duplicated the information in events have been removed.
* RTS tidyup sweep, first phaseSimon Marlow2009-08-021-12/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The first phase of this tidyup is focussed on the header files, and in particular making sure we are exposinng publicly exactly what we need to, and no more. - Rts.h now includes everything that the RTS exposes publicly, rather than a random subset of it. - Most of the public header files have moved into subdirectories, and many of them have been renamed. But clients should not need to include any of the other headers directly, just #include the main public headers: Rts.h, HsFFI.h, RtsAPI.h. - All the headers needed for via-C compilation have moved into the stg subdirectory, which is self-contained. Most of the headers for the rest of the RTS APIs have moved into the rts subdirectory. - I left MachDeps.h where it is, because it is so widely used in Haskell code. - I left a deprecated stub for RtsFlags.h in place. The flag structures are now exposed by Rts.h. - Various internal APIs are no longer exposed by public header files. - Various bits of dead code and declarations have been removed - More gcc warnings are turned on, and the RTS code is more warning-clean. - More source files #include "PosixSource.h", and hence only use standard POSIX (1003.1c-1995) interfaces. There is a lot more tidying up still to do, this is just the first pass. I also intend to standardise the names for external RTS APIs (e.g use the rts_ prefix consistently), and declare the internal APIs as hidden for shared libraries.
* Remove old GUM/GranSim codeSimon Marlow2009-06-021-660/+2
|
* Add EVENT_CREATE_SPARK_THREAD to replace EVENT_SPARK_TO_THREADSimon Marlow2009-04-231-6/+1
| | | | Also some tidyups and renaming
* For consistency, changed access of thread id to be through capability ↵donnie@darthik.com2009-04-131-1/+1
| | | | instead of directly from StgRegTable.
* Added new EventLog event: Spark to Thread.donnie@darthik.com2009-04-131-0/+7
|
* Eventlog support for new event type: create spark.donnie@darthik.com2009-04-031-0/+3
|
* Refactor the spark queue implementation into a generic work-stealing dequeSimon Marlow2009-02-051-299/+37
| | | | So we can use this abstraction elsewhere in the RTS
* Fix some unsigned comparisions that should be signedSimon Marlow2008-11-191-3/+8
| | | | | Fixes crashes when using reclaimSpark() (not used currently, but may be in the future).
* Remove incorrect assertions in steal()Simon Marlow2008-11-191-2/+6
|
* pruneSparkQueue(): fix bug when top>bottomSimon Marlow2008-11-061-0/+6
|
* Run sparks in batches, instead of creating a new thread for each oneSimon Marlow2008-11-061-4/+6
| | | | | Signficantly reduces the overhead for par, which means that we can make use of paralellism at a much finer granularity.
* retreat the top/bottom fields of the spark pool in pruneSparkPool()Simon Marlow2008-11-051-0/+7
|
* traverse the spark pools only once during GC rather than twiceSimon Marlow2008-10-221-20/+29
|
* Refactoring and reorganisation of the schedulerSimon Marlow2008-10-221-67/+58
| | | | | | | | | | | | | | | | | Change the way we look for work in the scheduler. Previously, checking to see whether there was anything to do was a non-side-effecting operation, but this has changed now that we do work-stealing. This lead to a refactoring of the inner loop of the scheduler. Also, lots of cleanup in the new work-stealing code, but no functional changes. One new statistic is added to the +RTS -s output: SPARKS: 1430 (2 converted, 1427 pruned) lets you know something about the use of `par` in the program.
* Work stealing for sparksberthold@mathematik.uni-marburg.de2008-09-151-108/+409
| | | | | | | | | | | | | | | | | | | | | | | | | | Spark stealing support for PARALLEL_HASKELL and THREADED_RTS versions of the RTS. Spark pools are per capability, separately allocated and held in the Capability structure. The implementation uses Double-Ended Queues (deque) and cas-protected access. The write end of the queue (position bottom) can only be used with mutual exclusion, i.e. by exactly one caller at a time. Multiple readers can steal()/findSpark() from the read end (position top), and are synchronised without a lock, based on a cas of the top position. One reader wins, the others return NULL for a failure. Work stealing is called when Capabilities find no other work (inside yieldCapability), and tries all capabilities 0..n-1 twice, unless a theft succeeds. Inside schedulePushWork, all considered cap.s (those which were idle and could be grabbed) are woken up. Future versions should wake up capabilities immediately when putting a new spark in the local pool, from newSpark(). Patch has been re-recorded due to conflicting bugfixes in the sparks.c, also fixing a (strange) conflict in the scheduler.
* Separate pruning from marking of spark poolsSimon Marlow2008-09-091-7/+13
| | | | Fixes crash when using compacting GC in parallel programs
* Undo fix for #2185: sparks really should be treated as rootsSimon Marlow2008-07-231-6/+7
| | | | | Unless sparks are roots, strategies don't work at all: all the sparks get GC'd. We need to think about this some more.
* debug message tweaksSimon Marlow2008-07-231-1/+1
|
* FIX #2185: sparks should not be treated as roots by the GCSimon Marlow2008-04-241-18/+23
|
* Reorganisation to fix problems related to the gct register variableSimon Marlow2008-04-161-0/+69
| | | | | | | | | - GCAux.c contains code not compiled with the gct register enabled, it is callable from outside the GC - marking functions are moved to their relevant subsystems, outside the GC - mark_root needs to save the gct register, as it is called from outside the GC
* move markSparkQueue into GC.c, as it needs the register variable definedSimon Marlow2008-01-091-68/+0
|
* have each GC thread call GetRoots()simonmar@microsoft.com2007-12-131-46/+43
|
* Pointer TaggingSimon Marlow2007-07-271-0/+6
| | | | | | | | | | | | | | | | | | | | | | This patch implements pointer tagging as per our ICFP'07 paper "Faster laziness using dynamic pointer tagging". It improves performance by 10-15% for most workloads, including GHC itself. The original patches were by Alexey Rodriguez Yakushev <mrchebas@gmail.com>, with additions and improvements by me. I've re-recorded the development as a single patch. The basic idea is this: we use the low 2 bits of a pointer to a heap object (3 bits on a 64-bit architecture) to encode some information about the object pointed to. For a constructor, we encode the "tag" of the constructor (e.g. True vs. False), for a function closure its arity. This enables some decisions to be made without dereferencing the pointer, which speeds up some common operations. In particular it enables us to avoid costly indirect jumps in many cases. More information in the commentary: http://hackage.haskell.org/trac/ghc/wiki/Commentary/Rts/HaskellExecution/PointerTagging
* Free various things we allocateIan Lynagh2006-12-151-0/+5
|
* Split GC.c, and move storage manager into sm/ directorySimon Marlow2006-10-241-1/+1
| | | | | | | | | | | | | | | | | In preparation for parallel GC, split up the monolithic GC.c file into smaller parts. Also in this patch (and difficult to separate, unfortunatley): - Don't include Stable.h in Rts.h, instead just include it where necessary. - consistently use STATIC_INLINE in source files, and INLINE_HEADER in header files. STATIC_INLINE is now turned off when DEBUG is on, to make debugging easier. - The GC no longer takes the get_roots function as an argument. We weren't making use of this generalisation.