summaryrefslogtreecommitdiff
path: root/rts/sm/GCUtils.c
Commit message (Collapse)AuthorAgeFilesLines
* Update Wiki URLs to point to GitLabTakenobu Tani2019-03-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | This moves all URL references to Trac Wiki to their corresponding GitLab counterparts. This substitution is classified as follows: 1. Automated substitution using sed with Ben's mapping rule [1] Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy... New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy... 2. Manual substitution for URLs containing `#` index Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy...#Zzz New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy...#zzz 3. Manual substitution for strings starting with `Commentary` Old: Commentary/XxxYyy... New: commentary/xxx-yyy... See also !539 [1]: https://gitlab.haskell.org/bgamari/gitlab-migration/blob/master/wiki-mapping.json
* Fix a race between GC threads in concurrent scavengingÖmer Sinan Ağacan2018-09-061-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While debugging #15285 I realized that free block lists (free_list in BlockAlloc.c) get corrupted when multiple scavenge threads allocate and release blocks concurrently. Here's a picture of one such race: Thread 2 (Thread 32573.32601): #0 check_tail (bd=0x940d40 <stg_TSO_info>) at rts/sm/BlockAlloc.c:860 #1 0x0000000000928ef7 in checkFreeListSanity () at rts/sm/BlockAlloc.c:896 #2 0x0000000000928979 in freeGroup (p=0x7e998ce02880) at rts/sm/BlockAlloc.c:721 #3 0x0000000000928a17 in freeChain (bd=0x7e998ce02880) at rts/sm/BlockAlloc.c:738 #4 0x0000000000926911 in freeChain_sync (bd=0x7e998ce02880) at rts/sm/GCUtils.c:80 #5 0x0000000000934720 in scavenge_capability_mut_lists (cap=0x1acae80) at rts/sm/Scav.c:1665 #6 0x000000000092b411 in gcWorkerThread (cap=0x1acae80) at rts/sm/GC.c:1157 #7 0x000000000090be9a in yieldCapability (pCap=0x7f9994e69e20, task=0x7e9984000b70, gcAllowed=true) at rts/Capability.c:861 #8 0x0000000000906120 in scheduleYield (pcap=0x7f9994e69e50, task=0x7e9984000b70) at rts/Schedule.c:673 #9 0x0000000000905500 in schedule (initialCapability=0x1acae80, task=0x7e9984000b70) at rts/Schedule.c:293 #10 0x0000000000908d4f in scheduleWorker (cap=0x1acae80, task=0x7e9984000b70) at rts/Schedule.c:2554 #11 0x000000000091a30a in workerStart (task=0x7e9984000b70) at rts/Task.c:444 #12 0x00007f99937fa6db in start_thread (arg=0x7f9994e6a700) at pthread_create.c:463 #13 0x000061654d59f88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 Thread 1 (Thread 32573.32573): #0 checkFreeListSanity () at rts/sm/BlockAlloc.c:887 #1 0x0000000000928979 in freeGroup (p=0x7e998d303540) at rts/sm/BlockAlloc.c:721 #2 0x0000000000926f23 in todo_block_full (size=513, ws=0x1aa8ce0) at rts/sm/GCUtils.c:264 #3 0x00000000009583b9 in alloc_for_copy (size=513, gen_no=0) at rts/sm/Evac.c:80 #4 0x000000000095850d in copy_tag_nolock (p=0x7e998c675f28, info=0x421d98 <Main_Large_con_info>, src=0x7e998d075d80, size=513, gen_no=0, tag=1) at rts/sm/Evac.c:153 #5 0x0000000000959177 in evacuate (p=0x7e998c675f28) at rts/sm/Evac.c:715 #6 0x0000000000932388 in scavenge_small_bitmap (p=0x7e998c675f28, size=1, bitmap=0) at rts/sm/Scav.c:271 #7 0x0000000000934aaf in scavenge_stack (p=0x7e998c675f28, stack_end=0x7e998c676000) at rts/sm/Scav.c:1908 #8 0x0000000000934295 in scavenge_one (p=0x7e998c66e000) at rts/sm/Scav.c:1466 #9 0x0000000000934662 in scavenge_mutable_list (bd=0x7e998d300440, gen=0x1b1d880) at rts/sm/Scav.c:1643 #10 0x0000000000934700 in scavenge_capability_mut_lists (cap=0x1aaa340) at rts/sm/Scav.c:1664 #11 0x00000000009299b6 in GarbageCollect (collect_gen=0, do_heap_census=false, gc_type=2, cap=0x1aaa340, idle_cap=0x1b38aa0) at rts/sm/GC.c:378 #12 0x0000000000907a4a in scheduleDoGC (pcap=0x7ffdec5b5310, task=0x1b36650, force_major=false) at rts/Schedule.c:1798 #13 0x0000000000905de7 in schedule (initialCapability=0x1aaa340, task=0x1b36650) at rts/Schedule.c:546 #14 0x0000000000908bc4 in scheduleWaitThread (tso=0x7e998c0067c8, ret=0x0, pcap=0x7ffdec5b5430) at rts/Schedule.c:2537 #15 0x000000000091b5a0 in rts_evalLazyIO (cap=0x7ffdec5b5430, p=0x9c11f0, ret=0x0) at rts/RtsAPI.c:530 #16 0x000000000091ca56 in hs_main (argc=1, argv=0x7ffdec5b5628, main_closure=0x9c11f0, rts_config=...) at rts/RtsMain.c:72 #17 0x0000000000421ea0 in main () In particular, dbl_link_onto() which is used to add a freed block to a doubly-linked free list is not thread safe and corrupts the list when called concurrently. Note that thread 1 is to blame here as thread 2 is properly taking the spinlock. With this patch we now take the spinlock when freeing a todo block in GC, avoiding this race. Test Plan: - Tried slow validate locally: this patch does not introduce new failures. - circleci: https://circleci.com/gh/ghc/ghc-diffs/283 The test got killed because it took 5 hours but T7919 (which was previously failing on circleci) passed. Reviewers: simonmar, bgamari, erikd Reviewed By: simonmar Subscribers: rwbarton, carter GHC Trac Issues: #15285 Differential Revision: https://phabricator.haskell.org/D5115
* Move printMutableList to Printer.c next to other printersÖmer Sinan Ağacan2018-05-301-21/+0
|
* Prefer #if defined to #ifdefBen Gamari2017-04-281-3/+3
| | | | Our new CPP linter enforces this.
* Enable new warning for fragile/incorrect CPP #if usageErik de Castro Lopo2017-04-281-1/+1
| | | | | | | | | | | | | | | | The C code in the RTS now gets built with `-Wundef` and the Haskell code (stages 1 and 2 only) with `-Wcpp-undef`. We now get warnings whereever `#if` is used on undefined identifiers. Test Plan: Validate on Linux and Windows Reviewers: austin, angerman, simonmar, bgamari, Phyx Reviewed By: bgamari Subscribers: thomie, snowleopard Differential Revision: https://phabricator.haskell.org/D3278
* Revert "Enable new warning for fragile/incorrect CPP #if usage"Ben Gamari2017-04-051-1/+1
| | | | | | | | This is causing too much platform dependent breakage at the moment. We will need a more rigorous testing strategy before this can be merged again. This reverts commit 7e340c2bbf4a56959bd1e95cdd1cfdb2b7e537c2.
* Enable new warning for fragile/incorrect CPP #if usageErik de Castro Lopo2017-04-051-1/+1
| | | | | | | | | | | | | | | | The C code in the RTS now gets built with `-Wundef` and the Haskell code (stages 1 and 2 only) with `-Wcpp-undef`. We now get warnings whereever `#if` is used on undefined identifiers. Test Plan: Validate on Linux and Windows Reviewers: austin, angerman, simonmar, bgamari, Phyx Reviewed By: bgamari Subscribers: thomie, snowleopard Differential Revision: https://phabricator.haskell.org/D3278
* Use C99's boolBen Gamari2016-11-291-1/+1
| | | | | | | | | | | | Test Plan: Validate on lots of platforms Reviewers: erikd, simonmar, austin Reviewed By: erikd, simonmar Subscribers: michalt, thomie Differential Revision: https://phabricator.haskell.org/D2699
* NUMA supportSimon Marlow2016-06-101-8/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The aim here is to reduce the number of remote memory accesses on systems with a NUMA memory architecture, typically multi-socket servers. Linux provides a NUMA API for doing two things: * Allocating memory local to a particular node * Binding a thread to a particular node When given the +RTS --numa flag, the runtime will * Determine the number of NUMA nodes (N) by querying the OS * Assign capabilities to nodes, so cap C is on node C%N * Bind worker threads on a capability to the correct node * Keep a separate free lists in the block layer for each node * Allocate the nursery for a capability from node-local memory * Allocate blocks in the GC from node-local memory For example, using nofib/parallel/queens on a 24-core 2-socket machine: ``` $ ./Main 15 +RTS -N24 -s -A64m Total time 173.960s ( 7.467s elapsed) $ ./Main 15 +RTS -N24 -s -A64m --numa Total time 150.836s ( 6.423s elapsed) ``` The biggest win here is expected to be allocating from node-local memory, so that means programs using a large -A value (as here). According to perf, on this program the number of remote memory accesses were reduced by more than 50% by using `--numa`. Test Plan: * validate * There's a new flag --debug-numa=<n> that pretends to do NUMA without actually making the OS calls, which is useful for testing the code on non-NUMA systems. * TODO: I need to add some unit tests Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2199
* rts: Replace `nat` with `uint32_t`Erik de Castro Lopo2016-05-051-8/+8
| | | | | | | | | | | | The `nat` type was an alias for `unsigned int` with a comment saying it was at least 32 bits. We keep the typedef in case client code is using it but mark it as deprecated. Test Plan: Validated on Linux, OS X and Windows Reviewers: simonmar, austin, thomie, hvr, bgamari, hsyl20 Differential Revision: https://phabricator.haskell.org/D2166
* Allocate blocks in the GC in batchesSimon Marlow2016-04-121-27/+22
| | | | | | | | | | | | | | | | | Avoids contention for the block allocator lock in the GC; this can be seen in the gc_alloc_block_sync counter emitted by +RTS -s. I experimented with this a while ago, and there was already commented-out code for it in GCUtils.c, but I've now improved it so that it doesn't result in significantly worse memory usage. * The old method of putting spare blocks on ws->part_list was wasteful, the spare blocks are now shared between all generations and retained between GCs. * repeated allocGroup() results in fragmentation, so I switched to using allocLargeChunk() instead which is fragmentation-friendly; we already use it for the same reason in nursery allocation.
* Cache the size of part_list/scavd_list (#11783)Simon Marlow2016-04-121-0/+3
| | | | | | | | | After a parallel GC, it is possible to have a long list of blocks in ws->part_list, if we did a lot of work stealing but didn't fill up the blocks we stole. These blocks persist until the next load-balanced GC, which might be a long time, and during every GC we were traversing this list to find its size. The fix is to maintain the size all the time, so we don't have to compute it.
* Fix #7919 (again)Simon Marlow2015-07-311-13/+35
| | | | | | | | | | | | | | | | | Summary: The fix is a bit clunky, and is perhaps not the best fix, but I'm not sure how much work it would be to fix it the other way (see comments for more info). Test Plan: T7919 doesn't crash Reviewers: austin, rwbarton, ezyang, bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1113 GHC Trac Issues: #7919
* Revert "rts: add Emacs 'Local Variables' to every .c file"Simon Marlow2014-09-291-8/+0
| | | | This reverts commit 39b5c1cbd8950755de400933cecca7b8deb4ffcd.
* rts: detabify/dewhitespace sm/GCUtils.cAustin Seipp2014-08-201-16/+16
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* rts: add Emacs 'Local Variables' to every .c fileAustin Seipp2014-07-281-0/+8
| | | | | | | | This will hopefully help ensure some basic consistency in the forward by overriding buffer variables. In particular, it sets the wrap length, the offset to 4, and turns off tabs. Signed-off-by: Austin Seipp <austin@well-typed.com>
* Globally replace "hackage.haskell.org" with "ghc.haskell.org"Simon Marlow2013-10-011-1/+1
|
* Fix crash with large objects (#7919)Simon Marlow2013-05-241-14/+44
| | | | See comments for details.
* Fix a problem caused by very large objects (#7919)Simon Marlow2013-05-211-5/+12
| | | | | | As far as I can tell the bug should be harmless, apart from the failing assertion. Since the ticket reported crashes, there might be problems elsewhere that aren't triggered by this test case.
* Deprecate lnat, and use StgWord insteadSimon Marlow2012-09-071-1/+1
| | | | | | | | | | | | lnat was originally "long unsigned int" but we were using it when we wanted a 64-bit type on a 64-bit machine. This broke on Windows x64, where long == int == 32 bits. Using types of unspecified size is bad, but what we really wanted was a type with N bits on an N-bit machine. StgWord is exactly that. lnat was mentioned in some APIs that clients might be using (e.g. StackOverflowHook()), so we leave it defined but with a comment to say that it's deprecated.
* Fix gcc 4.6 warnings; fixes #5176Ian Lynagh2011-06-251-2/+2
| | | | | | | | | | | Based on a patch from David Terei. Some parts are a little ugly (e.g. defining things that only ASSERTs use only when DEBUG is defined), so we might want to tweak things a little. I've also turned off -Werror for didn't-inline warnings, as we now get a few such warnings.
* Remove unused variableIan Lynagh2011-06-241-3/+0
|
* Refactoring and tidy upSimon Marlow2011-04-111-0/+1
| | | | | | | | | | | | This is a port of some of the changes from my private local-GC branch (which is still in darcs, I haven't converted it to git yet). There are a couple of small functional differences in the GC stats: first, per-thread GC timings should now be more accurate, and secondly we now report average and maximum pause times. e.g. from minimax +RTS -N8 -s: Tot time (elapsed) Avg pause Max pause Gen 0 2755 colls, 2754 par 13.16s 0.93s 0.0003s 0.0150s Gen 1 769 colls, 769 par 3.71s 0.26s 0.0003s 0.0059s
* Remove the per-generation mutable listsSimon Marlow2011-02-021-4/+3
| | | | Now that we use the per-capability mutable lists exclusively.
* GC refactoring, remove "steps"Simon Marlow2009-12-031-16/+16
| | | | | | | | | | | | | | | | | | | | | The GC had a two-level structure, G generations each of T steps. Steps are for aging within a generation, mostly to avoid premature promotion. Measurements show that more than 2 steps is almost never worthwhile, and 1 step is usually worse than 2. In theory fractional steps are possible, so the ideal number of steps is somewhere between 1 and 3. GHC's default has always been 2. We can implement 2 steps quite straightforwardly by having each block point to the generation to which objects in that block should be promoted, so blocks in the nursery point to generation 0, and blocks in gen 0 point to gen 1, and so on. This commit removes the explicit step structures, merging generations with steps, thus simplifying a lot of code. Performance is unaffected. The tunable number of steps is now gone, although it may be replaced in the future by a way to tune the aging in generation 0.
* Refactoring onlySimon Marlow2009-12-021-0/+1
|
* Store a destination step in the block descriptorSimon Marlow2009-11-291-2/+1
| | | | | | | At the moment, this just saves a memory reference in the GC inner loop (worth a percent or two of GC time). Later, it will hopefully let me experiment with partial steps, and simplifying the generation/step infrastructure.
* Relax the assumption that all objects fit in a single block (#3424)Simon Marlow2009-08-201-11/+26
| | | | | | | | | | | | | | | It is possible for the program to allocate single object larger than a block, without going through the normal large-object mechanisms that we have for arrays and threads and so on. The GC was assuming that no object was larger than a block, but #3424 contains a program that breaks the assumption. This patch removes the assumption. The objects in question will still be copied, that is they don't get the normal large-object treatment, but this case is unlikely to occur often in practice. In the future we may improve things by generating code to allocate them as large objects in the first place.
* RTS tidyup sweep, first phaseSimon Marlow2009-08-021-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The first phase of this tidyup is focussed on the header files, and in particular making sure we are exposinng publicly exactly what we need to, and no more. - Rts.h now includes everything that the RTS exposes publicly, rather than a random subset of it. - Most of the public header files have moved into subdirectories, and many of them have been renamed. But clients should not need to include any of the other headers directly, just #include the main public headers: Rts.h, HsFFI.h, RtsAPI.h. - All the headers needed for via-C compilation have moved into the stg subdirectory, which is self-contained. Most of the headers for the rest of the RTS APIs have moved into the rts subdirectory. - I left MachDeps.h where it is, because it is so widely used in Haskell code. - I left a deprecated stub for RtsFlags.h in place. The flag structures are now exposed by Rts.h. - Various internal APIs are no longer exposed by public header files. - Various bits of dead code and declarations have been removed - More gcc warnings are turned on, and the RTS code is more warning-clean. - More source files #include "PosixSource.h", and hence only use standard POSIX (1003.1c-1995) interfaces. There is a lot more tidying up still to do, this is just the first pass. I also intend to standardise the names for external RTS APIs (e.g use the rts_ prefix consistently), and declare the internal APIs as hidden for shared libraries.
* small GC optimisationSimon Marlow2009-04-031-2/+11
|
* Add fast event loggingSimon Marlow2009-03-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Generate binary log files from the RTS containing a log of runtime events with timestamps. The log file can be visualised in various ways, for investigating runtime behaviour and debugging performance problems. See for example the forthcoming ThreadScope viewer. New GHC option: -eventlog (link-time option) Enables event logging. +RTS -l (runtime option) Generates <prog>.eventlog with the binary event information. This replaces some of the tracing machinery we already had in the RTS: e.g. +RTS -vg for GC tracing (we should do this using the new event logging instead). Event logging has almost no runtime cost when it isn't enabled, though in the future we might add more fine-grained events and this might change; hence having a link-time option and compiling a separate version of the RTS for event logging. There's a small runtime cost for enabling event-logging, for most programs it shouldn't make much difference. (Todo: docs)
* Use work-stealing for load-balancing in the GCSimon Marlow2009-03-131-32/+43
| | | | | | | | | | | | | | | | | New flag: "+RTS -qb" disables load-balancing in the parallel GC (though this is subject to change, I think we will probably want to do something more automatic before releasing this). To get the "PARGC3" configuration described in the "Runtime support for Multicore Haskell" paper, use "+RTS -qg0 -qb -RTS". The main advantage of this is that it allows us to easily disable load-balancing altogether, which turns out to be important in parallel programs. Maintaining locality is sometimes more important that spreading the work out in parallel GC. There is a side benefit in that the parallel GC should have improved locality even when load-balancing, because each processor prefers to take work from its own queue before stealing from others.
* fix gcc warnings for printf formats on 32-bitSimon Marlow2008-06-181-1/+2
|
* fix some printf formats for 64 bitsSimon Marlow2008-06-171-1/+1
|
* tmp: alloc one block at a timeSimon Marlow2008-04-161-9/+19
|
* Use the BF_EVACUATED flag to indicate to-space consistentlySimon Marlow2008-04-161-9/+1
| | | | | | | | | BF_EVACUATED is now set on all blocks except those that we are copying. This means we don't need a separate test for gen>N in evacuate(), because in generations older than N, BF_EVACUATED will be set anyway. The disadvantage is that we have to reset the BF_EVACUATED flag on the blocks of any generation we're collecting before starting GC. Results in a small speed improvement.
* update copyrights in rts/smSimon Marlow2008-04-161-1/+1
|
* Reorganisation to fix problems related to the gct register variableSimon Marlow2008-04-161-0/+1
| | | | | | | | | - GCAux.c contains code not compiled with the gct register enabled, it is callable from outside the GC - marking functions are moved to their relevant subsystems, outside the GC - mark_root needs to save the gct register, as it is called from outside the GC
* allocate more blocks in one go, to reduce contention for the block allocatorSimon Marlow2008-04-161-8/+35
|
* move the scan block pointer into the gct structureSimon Marlow2008-04-161-6/+6
|
* improvements to +RTS -s outputSimon Marlow2008-04-161-0/+1
| | | | | | | - count and report number of parallel collections - calculate bytes scanned in addition to bytes copied per thread - calculate "work balance factor" - tidy up the formatting a bit
* debug output tweaksSimon Marlow2008-04-161-2/+2
|
* Allow work units smaller than a block to improve load balancingSimon Marlow2008-04-161-62/+116
|
* small debug output improvementsSimon Marlow2008-04-161-1/+1
|
* treat the global work list as a queue rather than a stackSimon Marlow2008-04-161-2/+10
|
* Add +RTS -vg flag for requesting some GC trace messages, outside DEBUGSimon Marlow2008-04-161-1/+1
| | | | | | | DEBUG imposes a significant performance hit in the GC, yet we often want some of the debugging output, so -vg gives us the cheap trace messages without the sanity checking of DEBUG, just like -vs for the scheduler.
* use synchronised version of freeChain() in scavenge_mutable_list()simonmar@microsoft.com2007-12-131-0/+8
|
* GC: small improvement to parallelismsimonmar@microsoft.com2007-11-291-3/+13
| | | | don't cache a work block locally if the global queue is empty
* count the number of todo blocks, and add a tracesimonmar@microsoft.com2007-11-271-0/+4
|
* cache bd->todo_bd->free and the limit in the workspaceSimon Marlow2007-11-211-2/+8
| | | | | avoids cache contention: bd->todo_bd->free may clash with any cache line, so we localise it.