path: root/rts/sm
Commit message, Author, Age, Files, Lines
* Add +RTS -AL<size>Simon Marlow2016-05-041-2/+5
| | | | | | | | | | | | | | | +RTS -AL<size> controls the total size of large objects that can be allocated before a GC is triggered. Previously this was always just the value of -A, and the limit mainly existed to prevent runaway allocation in pathological programs that allocate a lot of large objects. However, since the limit is shared between all cores, on a large multicore the default becomes more restrictive, and can end up triggering GC well before it would normally have been triggered. Arguably a better default would be A*N, but this is probably excessive. Adding a flag lets you choose, and I've left the default as it was. See docs for usage.
* Allow limiting the number of GC threads (+RTS -qn<n>)Simon Marlow2016-05-041-3/+1
| | | | | | | | | | | | | | | | | | This allows the GC to use fewer threads than the number of capabilities. At each GC, we choose some of the capabilities to be "idle", which means that the thread running on that capability (if any) will sleep for the duration of the GC, and the other threads will do its work. We choose capabilities that are already idle (if any) to be the idle capabilities. The idea is that this helps in the following situation: * We want to use a large -N value so as to make use of hyperthreaded cores * We use a large heap size, so GC is infrequent * But we don't want to use all -N threads in the GC, because that thrashes the memory too much. See docs for usage.
* Cleanups related to MAX_FREE_LISTSimon Marlow2016-05-021-19/+24
| | | | | | | | | | | | | | | | - Rename to the (more correct) NUM_FREE_LISTS - NUM_FREE_LISTS should be derived from the block and mblock sizes, not defined manually. It was actually too large by one, which caused a little bit of (benign) extra work in the form of a redundant loop iteration in some cases. - Add some ASSERTs for input preconditions to log_2() and log_2_ceil() - Fix some comments - Fix usage in allocLargeChunk, to account for the fact that log_2_ceil() can return NUM_FREE_LISTS.
* Revert "Revert "Use __builtin_clz() to implement log_2()""U-THEFACEBOOK\smarlow2016-05-021-11/+27
| | | | | | | This reverts commit 546f24e4f8a7c086b1e5afcdda624176610cbcf8. And adds a fix for Windows: we need to use __builtin_clzll() rather than __builtin_clzl(), because StgWord is unsigned long long on Windows.
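The idea behind the __builtin_clz() change can be sketched as follows. This is a minimal illustration, not the RTS source: `word_t` is a stand-in for StgWord and is assumed to be a 64-bit `unsigned long long`, which is why the `ll` variant of the builtin is used. `__builtin_clzll` is a GCC/Clang builtin and is undefined for a zero argument, hence the precondition checks.

```c
#include <assert.h>
#include <limits.h>

/* Stand-in for StgWord (assumption: 64-bit unsigned long long). */
typedef unsigned long long word_t;

#define WORD_BITS ((int)(sizeof(word_t) * CHAR_BIT))

/* floor(log2(n)). n must be non-zero: __builtin_clzll(0) is undefined. */
static inline int log_2(word_t n)
{
    assert(n > 0);
    return (WORD_BITS - 1) - __builtin_clzll(n);
}

/* ceil(log2(n)): the smallest i such that 2^i >= n. */
static inline int log_2_ceil(word_t n)
{
    assert(n > 0);
    int r = log_2(n);
    /* n & (n - 1) is non-zero exactly when n is not a power of two. */
    return (n & (n - 1)) ? r + 1 : r;
}
```

Using the `ll` variant regardless of platform keeps the argument type and the builtin in agreement wherever the word type is `unsigned long long`.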
* RTS: delete BlockedOnGA* + dead codeThomas Miedema2016-04-291-4/+0
| | | | | | | | Some old stuff related to the PAR way. Reviewed by: austin, simonmar Differential Revision: https://phabricator.haskell.org/D2137
* Revert "Use __builtin_clz() to implement log_2()"Simon Peyton Jones2016-04-281-21/+11
| | | | This reverts commit 24864ba5587c1a0447beabae90529e8bb4fa117a.
* Just comments & reformattingSimon Marlow2016-04-261-35/+21
|
* Use __builtin_clz() to implement log_2()Simon Marlow2016-04-261-11/+21
| | | | A microoptimisation in the block allocator.
* Allocate blocks in the GC in batchesSimon Marlow2016-04-123-30/+29
| | | | | | | | | | | | | | | | | Avoids contention for the block allocator lock in the GC; this can be seen in the gc_alloc_block_sync counter emitted by +RTS -s. I experimented with this a while ago, and there was already commented-out code for it in GCUtils.c, but I've now improved it so that it doesn't result in significantly worse memory usage. * The old method of putting spare blocks on ws->part_list was wasteful, the spare blocks are now shared between all generations and retained between GCs. * repeated allocGroup() results in fragmentation, so I switched to using allocLargeChunk() instead which is fragmentation-friendly; we already use it for the same reason in nursery allocation.
* Cache the size of part_list/scavd_list (#11783)Simon Marlow2016-04-124-9/+20
| | | | | | | | | After a parallel GC, it is possible to have a long list of blocks in ws->part_list, if we did a lot of work stealing but didn't fill up the blocks we stole. These blocks persist until the next load-balanced GC, which might be a long time, and during every GC we were traversing this list to find its size. The fix is to maintain the size all the time, so we don't have to compute it.
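Maintaining a list's size alongside the list, as described above, might look like the following sketch. The types and function names (`block`, `block_list`, `push_block`, `pop_block`) are hypothetical and are not the RTS's own; the point is only that every mutation updates the cached count, so no traversal is needed to learn the size.

```c
#include <stddef.h>

/* Hypothetical block descriptor: only the link field matters here. */
typedef struct block_ {
    struct block_ *link;
} block;

/* A list head that carries its own length. */
typedef struct {
    block  *head;
    size_t  n_blocks;   /* kept equal to the number of blocks on the list */
} block_list;

static void push_block(block_list *l, block *b)
{
    b->link = l->head;
    l->head = b;
    l->n_blocks++;
}

static block *pop_block(block_list *l)
{
    block *b = l->head;
    if (b != NULL) {
        l->head = b->link;
        l->n_blocks--;
    }
    return b;
}

/* The O(n) traversal that the cached count makes unnecessary. */
static size_t count_blocks_slow(const block_list *l)
{
    size_t n = 0;
    for (const block *b = l->head; b != NULL; b = b->link) {
        n++;
    }
    return n;
}
```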
* Small simplification (#11777)Simon Marlow2016-04-121-5/+1
| | | | | DEAD_WEAK used to have a different layout, see d61c623ed6b2d352474a7497a65015dbf6a72e12
* Remove all mentions of IND_OLDGEN outside of docs/rtsJoachim Breitner2016-03-291-1/+1
|
* Revert "Various ticky-related work"Ben Gamari2016-03-241-1/+1
| | | | | This reverts commit 6c2c853b11fe25c106469da7b105e2be596c17de which was supposed to be merged as individual commits.
* Various ticky-related workJoachim Breitner2016-03-241-1/+1
| | | | | | | | | | | | | | | | | | This Diff contains small, self-contained changes as I work towards fixing #10613. It is mostly created to let harbormaster do its job, but feedback is welcome as well. Please do not merge this via arc; I’d like to push the individual patches as laid out here. I might push mostly trivial ones even without review, as long as the build passes. Reviewers: austin, bgamari Reviewed By: bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2014
* rts: drop unused global 'blackhole_queue'Sergei Trofimovich2016-02-271-1/+0
| | | | | | | | | | | | | | | | | | | | | | Commit 5d52d9b64c21dcf77849866584744722f8121389 removed global 'blackhole_queue' in favour of a new mechanism: when a TSO hits a blackhole, the TSO blocks waiting for 'MessageBlackhole' delivery. This patch removes the unused global and updates stale comments. Noticed by Yuras Shumovich. Signed-off-by: Sergei Trofimovich <siarheit@google.com> Test Plan: build test Reviewers: simonmar, austin, Yuras, bgamari Reviewed By: Yuras, bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1953
* rts: mark 'copied' as staticSergei Trofimovich2016-02-072-3/+1
| | | | | | | | Noticed by uselex.rb: copied: [R]: exported from: ./rts/dist/build/sm/GC.o Signed-off-by: Sergei Trofimovich <siarheit@google.com>
* rts: mark scavenge_mutable_list as staticSergei Trofimovich2016-02-072-3/+1
| | | | | | | | | | Noticed by uselex.rb: scavenge_mutable_list: [R]: exported from: ./rts/dist/build/sm/Scav.o scavenge_mutable_list1: [R]: exported from: ./rts/dist/build/sm/Scav.thr_o Signed-off-by: Sergei Trofimovich <siarheit@google.com>
* rts: drop unused calcLiveBlocks, calcLiveWordsSergei Trofimovich2016-02-072-29/+0
| | | | | | | | | | | | | | | Use of these helper functions was removed by commit 18896fa2b06844407fd1e0d3f85cd3db97a96ff4 Author: Simon Marlow <marlowsd@gmail.com> Date: Wed Feb 2 15:49:55 2011 +0000 Noticed by uselex.rb: calcLiveBlocks: [R]: exported from: ./rts/dist/build/sm/Storage.o calcLiveWords: [R]: exported from: ./rts/dist/build/sm/Storage.o Signed-off-by: Sergei Trofimovich <siarheit@google.com>
* Remove unused IND_PERMJoachim Breitner2016-01-235-14/+0
| | | | | | | | | | | | | | | | | It seems that this closure type has not been in use since 5d52d9, so all this is dead and untested code. This removes it. Some of the code might be useful for a counting indirection as described in #10613, so when implementing that, have a look at what this commit removes. Test Plan: validate on harbormaster Reviewers: austin, bgamari, simonmar Reviewed By: simonmar Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1821
* - fix gc_thread related compilation failure on Solaris/i386 platformKarel Gardas2015-12-231-1/+2
| | | | | | | | | | | Summary: This patch fixes a gc_thread-related compilation failure on the Solaris/i386 platform. From now on it uses the Linux way of declaring the gc_thread variable with __thread on register-starved i386. Reviewers: bgamari, austin, erikd Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1688
* rts: One more Clang-unfriendly CPP usageBen Gamari2015-12-071-3/+3
|
* rts: Kill PAPI supportBen Gamari2015-11-182-26/+4
| | | | | | | | | | | | | | | This hasn't been used for a very long time and will soon be superseded by perf_events support. Test Plan: validate Reviewers: austin, simonmar Reviewed By: austin, simonmar Subscribers: thomie, erikd Differential Revision: https://phabricator.haskell.org/D1493
* rts/posix: Reduce heap allocation amount on mmap failureBen Gamari2015-11-012-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | Since the introduction of the two-step allocator, the RTS asks the kernel for a large upfront mmap'd region of memory (on the order of terabytes). While we have no expectation that this entire region will be backed by physical memory, this scheme nevertheless fails on some systems with resource limits. Here we use a back-off scheme to reduce our allocation request until we find a size agreeable to the kernel. Fixes #10877. This also fixes a latent bug wherein the heap reservation retry logic would fail to free the previously reserved address space, which would likely result in a heap allocation failure. Test Plan: set address space limit with `ulimit -v 67108864` and try running a compiled program Reviewers: simonmar, austin Reviewed By: simonmar Subscribers: thomie, RyanGlScott Differential Revision: https://phabricator.haskell.org/D1405 GHC Trac Issues: #10877
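A back-off reservation of the kind described above could be sketched like this. It is an illustration under assumptions, not the RTS code: the function name `reserve_with_backoff`, the halving policy, and the `min_size` cut-off are all hypothetical, and `MAP_NORESERVE` is assumed to be available (as it is on Linux). The mapping is `PROT_NONE`, so it reserves address space without committing memory.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Try to reserve `want` bytes of address space. On failure, halve the
 * request and retry until it drops below `min_size`. On success the
 * reserved size is stored in *got; on failure *got is 0 and NULL is returned.
 */
static void *reserve_with_backoff(size_t want, size_t min_size, size_t *got)
{
    for (size_t size = want; size >= min_size && size > 0; size /= 2) {
        void *p = mmap(NULL, size, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        if (p != MAP_FAILED) {
            *got = size;
            return p;
        }
        /* A failed mmap maps nothing, so there is nothing to release here. */
    }
    *got = 0;
    return NULL;
}
```

A caller would record the size actually obtained and release the region with `munmap(p, got)` when it is no longer needed.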
* rts: Make MBLOCK_SPACE_SIZE dynamicBen Gamari2015-10-303-27/+37
| | | | | | | | | | | | | | | | | | | | | | | Previously this was introduced in D524 as a compile-time constant. Sadly, this isn't flexible enough to allow for environments where ulimits restrict the maximum address space size (see, for instance, Consequently, we are forced to make this dynamic. In principle this shouldn't be so terrible as we can place both the beginning and end addresses within the same cache line, likely incurring only one or so additional instruction in HEAP_ALLOCED. Test Plan: validate Reviewers: austin, simonmar Reviewed By: simonmar Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1353 GHC Trac Issues: #10877
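With the bounds held in variables rather than a compile-time constant, the HEAP_ALLOCED test becomes a two-sided range check. The sketch below uses hypothetical names (`address_space`, `heap_space`, `heap_alloced`); it only illustrates keeping the begin and end addresses adjacent in one small struct so that both are usually fetched together.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical holder for the reserved range [begin, end). */
struct address_space {
    uintptr_t begin;
    uintptr_t end;
};

/* Set once at start-up, after the reservation size is known. */
static struct address_space heap_space;

/* True when p lies inside the reserved heap range. */
static inline bool heap_alloced(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    return a >= heap_space.begin && a < heap_space.end;
}
```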
* Fix segfault due to reading non-existent memorySimon Marlow2015-10-301-2/+14
| | | | | | | | | | | | | | | | | | | It was possible to read non-existent memory, if we try to read the srt_offset field of an info table when there is no SRT, and the info table is right at the start of the text section. This actually happened to me, I'm not sure why it never happened before. Test Plan: validate Reviewers: rwbarton, ezyang, austin, bgamari Reviewed By: austin, bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1401
* Fix a bug with mallocForeignPtr and finalizers (#10904)Simon Marlow2015-09-241-0/+5
| | | | | | | | | | | | Summary: See Note [MallocPtr finalizers] Test Plan: validate; new test T10904 Reviewers: ezyang, bgamari, austin, hvr, rwbarton Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1275
* s/StgArrWords/StgArrBytes/Siddhanathan Shanmugam2015-09-114-4/+4
| | | | | | | | | | Rename StgArrWords to StgArrBytes (see Trac #8552) Reviewed By: austin Differential Revision: https://phabricator.haskell.org/D1233 GHC Trac Issues: #8552
* RTS: Reduce MBLOCK_SPACE_SIZE on AArch64Erik de Castro Lopo2015-08-291-0/+5
| | | | | | | | | | | | | | | | | | | Commit 0d1a8d09f4 added a two-step allocator for 64-bit systems. This allocator mmaps a huge (1 TB) chunk of memory out of which it does smaller allocations. On AArch64/Arm64 Linux, this mmap was failing due to the Arm64 Linux kernel parameter CONFIG_ARM64_VA_BITS defaulting to 39 bits. This patch therefore reduces the AArch64 value of MBLOCK_SPACE_SIZE to make this allocation 1/4 TB, while it remains 1 TB for other architectures. Reviewers: ezyang, austin, bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1171 GHC Trac Issues: #10682
* Fix #7919 (again)Simon Marlow2015-07-311-13/+35
| | | | | | | | | | | | | | | | | Summary: The fix is a bit clunky, and is perhaps not the best fix, but I'm not sure how much work it would be to fix it the other way (see comments for more info). Test Plan: T7919 doesn't crash Reviewers: austin, rwbarton, ezyang, bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1113 GHC Trac Issues: #7919
* Eliminate zero_static_objects_list()Simon Marlow2015-07-289-116/+130
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: [Revised version of D1076 that was committed and then backed out] In a workload with a large amount of code, zero_static_objects_list() takes a significant amount of time, and furthermore it is in the single-threaded part of the GC. This patch uses a slightly fiddly scheme for marking objects on the static object lists, using a flag in the low 2 bits that flips between two states to indicate whether an object has been visited during this GC or not. We also have to take into account objects that have not been visited yet, which might appear at any time due to runtime linking. Test Plan: validate Reviewers: austin, ezyang, rwbarton, bgamari, thomie Reviewed By: bgamari, thomie Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1106
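The flag-flipping scheme can be illustrated roughly as follows. This is a sketch under assumptions, not the GC's code: the names (`FLAG_A`, `FLAG_B`, `current_flag`, `start_gc`, `mark_visited`) are hypothetical, and a link field is modelled as a `uintptr_t` whose low 2 bits hold the tag. A tag of 0 stands for an object that has never been visited, such as one that appeared through runtime linking.

```c
#include <stdbool.h>
#include <stdint.h>

enum { FLAG_A = 1, FLAG_B = 2, FLAG_MASK = 3 };

/* The tag that means "visited during the current GC". */
static uintptr_t current_flag = FLAG_A;

/*
 * Called once at the start of each GC. Flipping the flag makes every
 * previously visited object look unvisited without touching any object.
 */
static void start_gc(void)
{
    current_flag = (current_flag == FLAG_A) ? FLAG_B : FLAG_A;
}

/* A link is "visited" only if its tag matches the current flag. */
static bool visited_this_gc(uintptr_t link)
{
    return (link & FLAG_MASK) == current_flag;
}

/* Store a link target tagged with the current flag, replacing any old tag. */
static uintptr_t mark_visited(uintptr_t link_target)
{
    return (link_target & ~(uintptr_t)FLAG_MASK) | current_flag;
}
```

Because stale tags and the zero tag both fail the comparison, no pass over the static object list is needed to reset it between collections.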
* Revert "Eliminate zero_static_objects_list()"Simon Marlow2015-07-279-129/+116
| | | | This reverts commit b949c96b4960168a3b399fe14485b24a2167b982.
* rts/sm: Add missing argument names in function definitionsBen Gamari2015-07-231-6/+6
| | | | | | C99 does not allow unnamed parameters in definition argument lists [1]. [1] http://stackoverflow.com/questions/8776810/parameter-name-omitted-c-vs-c
* Two step allocator for 64-bit systemsGiovanni Campagna2015-07-225-24/+656
| | | | | | | | | | | | | | | | | | | | | | | Summary: The current OS memory allocator conflates the concepts of allocating address space and allocating memory, which makes the HEAP_ALLOCED() implementation excessively complicated (as the only thing it cares about is address space layout) and slow. Instead, what we want is to allocate a single insanely large contiguous block of address space (to make HEAP_ALLOCED() checks fast), and then commit subportions of that in 1MB blocks as we did before. This is currently behind a flag, USE_LARGE_ADDRESS_SPACE, that is only enabled for certain OSes. Test Plan: validate Reviewers: simonmar, ezyang, austin Subscribers: thomie, carter Differential Revision: https://phabricator.haskell.org/D524 GHC Trac Issues: #9706
* Eliminate zero_static_objects_list()Simon Marlow2015-07-229-116/+129
| | | | | | | | | | | | | | | | | | | | | Summary: In a workload with a large amount of code, zero_static_objects_list() takes a significant amount of time, and furthermore it is in the single-threaded part of the GC. This patch uses a slightly fiddly scheme for marking objects on the static object lists, using a flag in the low 2 bits that flips between two states to indicate whether an object has been visited during this GC or not. We also have to take into account objects that have not been visited yet, which might appear at any time due to runtime linking. Test Plan: validate Reviewers: austin, bgamari, ezyang, rwbarton Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1076
* initGroup: only initialize the first and last blocks of a groupSimon Marlow2015-07-151-15/+11
| | | | | | | | | | | | Summary: Initialising the whole group is expensive and unnecessary. Test Plan: validate Reviewers: austin, bgamari, rwbarton Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1071
* Update comments around blackholesSimon Marlow2015-07-071-1/+1
| | | | | | | | | | Test Plan: validate Reviewers: austin, bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1047
* Fix for crash in setnumcapabilities001Simon Marlow2015-06-261-6/+12
| | | | | | | getNewNursery() was unconditionally incrementing next_nursery, which is normally fine but it broke an assumption in storageAddCapabilities(). This manifested as an occasional crash in the setnumcapabilities001 test.
* Fix for CAF retention when dynamically loading & unloading codeSimon Marlow2015-06-081-7/+34
| | | | | | | | | | | In a situation where we have some statically-linked code and we want to load and unload a series of objects, we need the CAFs in the statically-linked code to be retained indefinitely, while the CAFs in the dynamically-linked code should be GC'd as normal, so that we can detect when the code is unloadable. This was wrong before - we GC'd CAFs in the static code, leading to a crash in the rare case where we use a CAF, GC it, and then load a new object that uses it again. I also did some tidy up: RtsConfig now has a field keep_cafs to indicate whether we want CAFs to be retained in static code.
* Don't call DEAD_WEAK finalizer again on shutdown (#7170)Simon Marlow2015-06-011-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: There's a race condition like this: # A foreign pointer gets promoted to the last generation # It has its finalizer called manually # We start shutting down the runtime in `hs_exit_` from the main thread # A minor GC starts running (`scheduleDoGC`) on one of the threads # The minor GC notices that we're in `SCHED_INTERRUPTING` state and advances to `SCHED_SHUTTING_DOWN` # The main thread tries to do major GC (with `scheduleDoGC`), but it exits early because we're in `SCHED_SHUTTING_DOWN` state # We end up with a `DEAD_WEAK` left on the list of weak pointers of the last generation, because it relied on major GC removing it from that list This change: * Ignores DEAD_WEAK finalizers when shutting down * Makes the major GC on shutdown more likely * Fixes a bogus assert Test Plan: before this diff https://ghc.haskell.org/trac/ghc/ticket/7170#comment:5 reproduced and after it doesn't Reviewers: ezyang, austin, simonmar Reviewed By: simonmar Subscribers: bgamari, thomie Differential Revision: https://phabricator.haskell.org/D921 GHC Trac Issues: #7170
* Newline after type of allocate().Edward Z. Yang2015-06-011-1/+2
| | | | Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
* Replace hooks by callbacks in RtsConfig (#8785)Simon Marlow2015-04-072-0/+4
| | | | | | | | | | | | Summary: Hooks rely on static linking semantics, and are broken by -Bsymbolic which we need when using dynamic linking. Test Plan: Built it Reviewers: austin, hvr, tibbe Differential Revision: https://phabricator.haskell.org/D8
* fix bus errors on SPARC caused by unaligned access to alloc_limit (fixes #10043)Karel Gardas2015-02-231-2/+8
| | | | | | | | Reviewers: austin, simonmar Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D657
* comments onlySimon Marlow2015-01-201-0/+2
|
* Optimise scavenge_large_srt_bitmapSimon Marlow2015-01-131-12/+22
| | | | | | | | | | | Very large modules can sometimes contain very large SRT bitmaps (this is a separate problem that I need to look into). The large bitmaps often contain a lot of zeros, so this patch skips over empty words in the bitmap. It makes a dramatic difference in the particular example that I saw, where an old gen GC was taking 0.5s before this change and 0.07s after it.
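Skipping empty words in a large bitmap, as described above, might be sketched like this. The function name `collect_set_bits`, the 64-bit word size, and the convention that a set bit marks an entry to visit are assumptions made for the illustration; the real scavenging code evacuates SRT entries rather than collecting indices.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Write the indices of set bits in `bitmap` (n_bits long) into `out`,
 * up to `out_cap` entries, and return how many were written.
 * All-zero words are skipped with a single comparison.
 */
static size_t collect_set_bits(const uint64_t *bitmap, size_t n_bits,
                               size_t *out, size_t out_cap)
{
    size_t found = 0;
    size_t n_words = (n_bits + 63) / 64;

    for (size_t w = 0; w < n_words; w++) {
        uint64_t word = bitmap[w];
        if (word == 0) {
            continue;               /* 64 entries dismissed at once */
        }
        /* Stop scanning this word as soon as no set bits remain. */
        for (size_t b = 0; b < 64 && word != 0; b++, word >>= 1) {
            size_t index = w * 64 + b;
            if (index >= n_bits) {
                break;
            }
            if ((word & 1) && found < out_cap) {
                out[found++] = index;
            }
        }
    }
    return found;
}
```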
* Add +RTS -n<size>: divide the nursery into chunksSimon Marlow2014-11-254-57/+102
| | | | See the documentation for details.
* Make clearNursery freeSimon Marlow2014-11-252-19/+79
| | | | | | | | | | | | | | | | | | | | | Summary: clearNursery resets all the bd->free pointers of nursery blocks to make the blocks empty. In profiles we've seen clearNursery taking significant amounts of time particularly with large -N and -A values. This patch moves the work of clearNursery to the point at which we actually need the new block, thereby introducing an invariant that blocks to the right of the CurrentNursery pointer still need their bd->free pointer reset. This should make things faster overall, because we don't need to clear blocks that we don't use. Test Plan: validate Reviewers: AndreasVoellmy, ezyang, austin Subscribers: thomie, carter, ezyang, simonmar Differential Revision: https://phabricator.haskell.org/D318
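The deferred clearing described above can be sketched as follows. The types and names (`nblock`, `nursery`, `reset_nursery`, `next_block`) are hypothetical and much simpler than the RTS's block descriptors; the sketch only shows the invariant that blocks to the right of the current one may hold stale `free` pointers until they are actually taken.

```c
#include <stddef.h>

/* Hypothetical nursery block: `free` is the next unused byte. */
typedef struct {
    char *start;
    char *free;
} nblock;

/* Hypothetical nursery: an array of blocks plus the current position. */
typedef struct {
    nblock *blocks;
    size_t  n_blocks;   /* assumed to be at least 1 */
    size_t  current;
} nursery;

/* After a GC: rewind in O(1) and clear only the first block. */
static void reset_nursery(nursery *n)
{
    n->current = 0;
    n->blocks[0].free = n->blocks[0].start;
}

/*
 * Advance to the next block, resetting its free pointer only now.
 * Returns NULL when the nursery is exhausted.
 */
static nblock *next_block(nursery *n)
{
    if (n->current + 1 >= n->n_blocks) {
        return NULL;
    }
    nblock *b = &n->blocks[++n->current];
    b->free = b->start;
    return b;
}
```

Blocks that are never reached before the next collection are never cleared, which is where the saving comes from.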
* arm64: 64bit iOS and SMP support (#7942)Luke Iannini2014-11-191-2/+2
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* Per-thread allocation counters and limitsSimon Marlow2014-11-121-1/+7
| | | | | | | | This reverts commit f0fcc41d755876a1b02d1c7c79f57515059f6417. New changes: now works on 32-bit platforms too. I added some basic support for 64-bit subtraction and comparison operations to the x86 NCG.
* Fix a rare parallel GC bugSimon Marlow2014-10-231-1/+6
| | | | | | | | | When there's a conflict between two threads evacuating the same TSO, in some cases we would update the incall->tso pointer to point to the wrong copy of the TSO. This would get fixed during the next GC, but if the thread completed in the meantime, it would likely crash. We're seeing this about once per day on a heavily loaded machine (it varies a lot though).
* [skip ci] rts: Detabify sm/Compact.hAustin Seipp2014-10-211-9/+9
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>