summaryrefslogtreecommitdiff
path: root/rts/sm
Commit message (Collapse)AuthorAgeFilesLines
* Drop copy step from the rts/ghc.mkMoritz Angermann2017-02-282-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | Recently I've used a different build system for building the rts (Xcode). And in doing so, I looked through the rts/ghc.mk to figure out how to build the rts. In general it's quite straight forward to just compile all the c files with the proper flags. However there is one rather awkward copy step that copies some files for special handling for the rts way. I'm wondering if the proposed solution in this diff is better or worse than the current situation? The idea is to keep the files, but use #includes to produce identical files with just an additional define. It does however produce empty objects for non threaded ways. Reviewers: ezyang, bgamari, austin, erikd, simonmar, rwbarton Reviewed By: bgamari, simonmar, rwbarton Subscribers: rwbarton, thomie, snowleopard Differential Revision: https://phabricator.haskell.org/D3237
* rts: Correct the nursery size in the gen 1 growth computationJohn C. Carey2017-02-231-1/+13
| | | | | | | | | | | | Fixes trac issue #13288. Reviewers: austin, bgamari, erikd, simonmar Reviewed By: simonmar Subscribers: mutjida, rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3143
* Fix crashes in hash table scanning with THREADED_RTSSimon Marlow2016-12-071-3/+19
| | | | See comments.
* Overhaul of Compact Regions (#12455)Simon Marlow2016-12-077-546/+478
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This commit makes various improvements and addresses some issues with Compact Regions (aka Compact Normal Forms). This was the most important thing I wanted to fix. Compaction previously prevented GC from running until it was complete, which would be a problem in a multicore setting. Now, we compact using a hand-written Cmm routine that can be interrupted at any point. When a GC is triggered during a sharing-enabled compaction, the GC has to traverse and update the hash table, so this hash table is now stored in the StgCompactNFData object. Previously, compaction consisted of a deepseq using the NFData class, followed by a traversal in C code to copy the data. This is now done in a single pass with hand-written Cmm (see rts/Compact.cmm). We no longer use the NFData instances, instead the Cmm routine evaluates components directly as it compacts. The new compaction is about 50% faster than the old one with no sharing, and a little faster on average with sharing (the cost of the hash table dominates when we're doing sharing). Static objects that don't (transitively) refer to any CAFs don't need to be copied into the compact region. In particular this means we often avoid copying Char values and small Int values, because these are static closures in the runtime. Each Compact# object can support a single compactAdd# operation at any given time, so the Data.Compact library now enforces mutual exclusion using an MVar stored in the Compact object. We now get exceptions rather than killing everything with a barf() when we encounter an object that cannot be compacted (a function, or a mutable object). We now also detect pinned objects, which can't be compacted either. The Data.Compact API has been refactored and cleaned up. A new compactSize operation returns the size (in bytes) of the compact object. Most of the documentation is in the Haddock docs for the compact library, which I've expanded and improved here. Various comments in the code have been improved, especially the main Note [Compact Normal Forms] in rts/sm/CNF.c. I've added a few tests, and expanded a few of the tests that were there. We now also run the tests with GHCi, and in a new test way that enables sanity checking (+RTS -DS). There's a benchmark in libraries/compact/tests/compact_bench.hs for measuring compaction speed and comparing sharing vs. no sharing. The field totalDataW in StgCompactNFData was unnecessary. Test Plan: * new unit tests * validate * tested manually that we can compact Data.Aeson data Reviewers: gcampax, bgamari, ezyang, austin, niteria, hvr, erikd Subscribers: thomie, simonpj Differential Revision: https://phabricator.haskell.org/D2751 GHC Trac Issues: #12455
* Overhaul GC statsSimon Marlow2016-12-063-6/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Visible API changes: * The C struct `GCDetails` gives the stats about a single GC. This is passed to the `gcDone()` callback if one is set via the RtsConfig. (previously we just passed a collection of values, so this is more extensible, at the expense of breaking the existing API) * `RTSStats` gives cumulative stats since the start of the program, and includes the `GCDetails` for the most recent GC. This struct can be obtained via `getRTSStats()` (the old `getGCStats()` has been removed, and `getGCStatsEnabled()` has been renamed to `getRTSStatsEnabled()`) Improvements: * The per-GC stats and cumulative stats are now cleanly separated. * Inside the RTS we have a top-level `RTSStats` struct to keep all our stats in, previously this was just a collection of strangely-named variables. This struct is mostly just copied in `getRTSStats()`, so the implementation of that function is a lot shorter. * Types are more consistent. We use a uint64_t byte count for all memory values, and Time for all time values. * Names are more consistent. We use a suffix `_bytes` for all byte counts and `_ns` for all time values. * We now collect information about the amount of memory in large objects and compact objects in `GCDetails`. (the latter was the reason I started doing this patch but it seems to have ballooned a bit!) * I fixed a bug in the calculation of the elapsed MUT time, and added an ASSERT to stop the calculations going wrong in the future. For now I kept the Haskell API in `GHC.Stats` the same, by impedence-matching with the new API. We could either break that API and make it match the C API more closely, or we could add a new API and deprecate the old one. Opinions welcome. This stuff is very easy to get wrong, and it's hard to test. Reviews welcome! Test Plan: manual testing validate Reviewers: bgamari, niteria, austin, ezyang, hvr, erikd, rwbarton, Phyx Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2756
* Fix x86 Windows build and testsuiteTamar Christina2016-12-061-1/+1
| | | | | | | | | | | | | | | | Summary: Fix issues preventing x86 GHC to build on Windows and fix segfault in the testsuite. Test Plan: ./validate Reviewers: austin, erikd, simonmar, bgamari Reviewed By: bgamari Subscribers: #ghc_windows_task_force, thomie Differential Revision: https://phabricator.haskell.org/D2789
* Use C99's boolBen Gamari2016-11-2916-246/+246
| | | | | | | | | | | | Test Plan: Validate on lots of platforms Reviewers: erikd, simonmar, austin Reviewed By: erikd, simonmar Subscribers: michalt, thomie Differential Revision: https://phabricator.haskell.org/D2699
* Fix type of GarbageCollect declarationBen Gamari2016-11-291-1/+1
| | | | | | | | | | Test Plan: Validate Reviewers: simonmar, austin, erikd Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2764
* Storage.c: Pass a size to sys_icache_invalidateShea Levy2016-11-151-2/+2
| | | | | | | | | | | | | | | | | The previous code passed an end pointer, but the interface takes a size instead. Fixes #12838. Reviewers: austin, erikd, simonmar, bgamari Reviewed By: simonmar, bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2711 GHC Trac Issues: #12838
* Remove CONSTR_STATICSimon Marlow2016-11-146-31/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: We currently have two info tables for a constructor * XXX_con_info: the info table for a heap-resident instance of the constructor, It has type CONSTR, or one of the specialised types like CONSTR_1_0 * XXX_static_info: the info table for a static instance of this constructor, which has type CONSTR_STATIC or CONSTR_STATIC_NOCAF. I'm getting rid of the latter, and using the `con_info` info table for both static and dynamic constructors. For rationale and more details see Note [static constructors] in SMRep.hs. I also removed these macros: `isSTATIC()`, `ip_STATIC()`, `closure_STATIC()`, since they relied on the CONSTR/CONSTR_STATIC distinction, and anyway HEAP_ALLOCED() does the same job. Test Plan: validate Reviewers: bgamari, simonpj, austin, gcampax, hvr, niteria, erikd Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2690 GHC Trac Issues: #12455
* Fix a bug in parallel GC synchronisationSimon Marlow2016-10-293-28/+29
| | | | | | | | | | | | | | | | | | | | | Summary: The problem boils down to global variables: in particular gc_threads[], which was being modified by a subsequent GC before the previous GC had finished with it. The fix is to not use global variables. This was causing setnumcapabilities001 to fail (again!). It's an old bug though. Test Plan: Ran setnumcapabilities001 in a loop for a couple of hours. Before this patch it had been failing after a few minutes. Not a very scientific test, but it's the best I have. Reviewers: bgamari, austin, fryguybob, niteria, erikd Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2654
* Turn on -n4m with -A16m or greaterSimon Marlow2016-10-091-13/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Nursery chunks help reduce the cost of GC when capabilities are unevenly loaded, by ensuring that we use more of the available nursery. The rationale for enabling this at -A16m is that any negative effects due to loss of cache locality are less likely to be an issue at -A16m and above. It's a conservative guess. If we had a lot of benchmark data we could probably do better. Results for nofib/parallel at -N4 -A32m with and without -n4m: ``` ------------------------------------------------------------------------ Program Size Allocs Runtime Elapsed TotalMem ------------------------------------------------------------------------ blackscholes 0.0% -9.5% -9.0% -15.0% -2.2% coins 0.0% -4.7% -3.6% -0.6% -13.6% mandel 0.0% -0.3% +7.7% +13.1% +0.1% matmult 0.0% +1.5% +10.0% +7.7% +0.1% nbody 0.0% -4.1% -2.9% 0.085 0.0% parfib 0.0% -1.4% +1.0% +1.5% +0.2% partree 0.0% -0.3% +0.8% +2.9% -0.8% prsa 0.0% -0.5% -2.1% -7.6% 0.0% queens 0.0% -3.2% -1.4% +2.2% +1.3% ray 0.0% -5.6% -14.5% -7.6% +0.8% sumeuler 0.0% -0.4% +2.4% +1.1% 0.0% ------------------------------------------------------------------------ Min 0.0% -9.5% -14.5% -15.0% -13.6% Max 0.0% +1.5% +10.0% +13.1% +1.3% Geometric Mean +0.0% -2.6% -1.3% -0.5% -1.4% ``` Not conclusive, but slightly better. This matters a lot more when you have more cores. Test Plan: validate, nofib/paralel Reviewers: niteria, ezyang, nh2, trofi, austin, erikd, bgamari Reviewed By: bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2581 GHC Trac Issues: #9221
* Make start address of `osReserveHeapMemory` tunable via command line -xbFrancesco Mazzoli2016-09-092-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: We stumbled upon a case where an external library (OpenCL) does not work if a specific address (0x200000000) is taken. It so happens that `osReserveHeapMemory` starts trying to mmap at 0x200000000: ``` void *hint = (void*)((W_)8 * (1 << 30) + attempt * BLOCK_SIZE); at = osTryReserveHeapMemory(*len, hint); ``` This makes it impossible to use Haskell programs compiled with GHC 8 with C functions that use OpenCL. See this example ​https://github.com/chpatrick/oclwtf for a repro. This patch allows the user to work around this kind of behavior outside our control by letting the user override the starting address through an RTS command line flag. Reviewers: bgamari, Phyx, simonmar, erikd, austin Reviewed By: Phyx, simonmar Subscribers: rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D2513
* When in sanity mode, un-zero malloc'd memory; fix uninitialized memory bugs.Edward Z. Yang2016-08-151-0/+2
| | | | | | | | | | | | | | | | malloc'd memory is not guaranteed to be zeroed. On Linux, however, it is often zeroed, leading to latent bugs. In fact, with this patch I fix two uninitialized memory bugs stemming from this. Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu> Test Plan: validate Reviewers: simonmar, austin, Phyx, bgamari, erikd Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2455
* refactor test for __builtin_unreachable into Rts.h macro RTS_UNREACHABLEKarel Gardas2016-08-151-4/+1
| | | | | | | | | | | | | Summary: This patch refactors GNU C version test (for 4.5 and more modern) due to usage of __builtin_unreachable done in the CNF.c code directly into the new RTS_UNREACHABLE macro placed into Rts.h Reviewers: bgamari, austin, simonmar, erikd Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2457
* fix compilation failure on OpenBSD with system supplied GNU C 4.2.1Karel Gardas2016-08-141-1/+4
| | | | | | | | | | | | | | Summary: This patch fixes compilation failure on OpenBSD. The OpenBSD's GNU C compiler is of 4.2.1 version and problematic __builtin_unreachable was added in GNU C 4.5 release. Let's use pure abort() call on OpenBSD instead of __builtin_unreachable Reviewers: bgamari, austin, erikd, simonmar Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2453
* Track the lengths of the thread queuesSimon Marlow2016-08-031-2/+4
| | | | | | | | | | | | | | | Summary: Knowing the length of the run queue in O(1) time is useful: for example we don't have to traverse the run queue to know how many threads we have to migrate in schedulePushWork(). Test Plan: validate Reviewers: ezyang, erikd, bgamari, austin Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2437
* Move stat_startGCSyncBartosz Nitka2016-07-271-2/+0
| | | | | | | | | | | | | | @simonmar told me that it makes more sense this way. Test Plan: it still builds Reviewers: bgamari, austin, simonmar, erikd Reviewed By: simonmar, erikd Subscribers: thomie, simonmar Differential Revision: https://phabricator.haskell.org/D2428
* Fix the non-Linux buildErik de Castro Lopo2016-07-221-14/+5
| | | | | | | | | | | | | | | | | | | Summary: The recent Compact Regions commit (cf989ffe49) builds fine on Linux but doesn't build on OS X r Windows. * rts/sm/CNF.c: Drop un-needed #includes. * Fix parenthesis usage with CPP ASSERT macro. * Fix format string in debugBelch messages. * Use stg_max() instead hand rolled inline max() function. Test Plan: Build on Linux, OS X and Windows Reviewers: gcampax, simonmar, austin, bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2421
* Compact RegionsGiovanni Campagna2016-07-209-12/+1653
| | | | | | | | | | | | | | | | | | | | | | | | | | This brings in initial support for compact regions, as described in the ICFP 2015 paper "Efficient Communication and Collection with Compact Normal Forms" (Edward Z. Yang et.al.) and implemented by Giovanni Campagna. Some things may change before the 8.2 release, but I (Simon M.) wanted to get the main patch committed so that we can iterate. What documentation there is is in the Data.Compact module in the new compact package. We'll need to extend and polish the documentation before the release. Test Plan: validate (new test cases included) Reviewers: ezyang, simonmar, hvr, bgamari, austin Subscribers: vikraman, Yuras, RyanGlScott, qnikst, mboes, facundominguez, rrnewton, thomie, erikd Differential Revision: https://phabricator.haskell.org/D1264 GHC Trac Issues: #11493
* NUMA cleanupsSimon Marlow2016-06-173-16/+14
| | | | | - Move the numaMap and nNumaNodes out of RtsFlags to Capability.c - Add a test to tests/rts
* Rts flags cleanupSimon Marlow2016-06-102-5/+5
| | | | | | | | * Remove unused/old flags from the structs * Update old comments * Add missing flags to GHC.RTS * Simplify GHC.RTS, remove C code and use hsc2hs instead * Make ParFlags unconditional, and add support to GHC.RTS
* NUMA supportSimon Marlow2016-06-109-246/+398
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The aim here is to reduce the number of remote memory accesses on systems with a NUMA memory architecture, typically multi-socket servers. Linux provides a NUMA API for doing two things: * Allocating memory local to a particular node * Binding a thread to a particular node When given the +RTS --numa flag, the runtime will * Determine the number of NUMA nodes (N) by querying the OS * Assign capabilities to nodes, so cap C is on node C%N * Bind worker threads on a capability to the correct node * Keep a separate free lists in the block layer for each node * Allocate the nursery for a capability from node-local memory * Allocate blocks in the GC from node-local memory For example, using nofib/parallel/queens on a 24-core 2-socket machine: ``` $ ./Main 15 +RTS -N24 -s -A64m Total time 173.960s ( 7.467s elapsed) $ ./Main 15 +RTS -N24 -s -A64m --numa Total time 150.836s ( 6.423s elapsed) ``` The biggest win here is expected to be allocating from node-local memory, so that means programs using a large -A value (as here). According to perf, on this program the number of remote memory accesses were reduced by more than 50% by using `--numa`. Test Plan: * validate * There's a new flag --debug-numa=<n> that pretends to do NUMA without actually making the OS calls, which is useful for testing the code on non-NUMA systems. * TODO: I need to add some unit tests Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2199
* Runtime linker: Break m32 allocator out into its own fileErik de Castro Lopo2016-05-251-1/+16
| | | | | | | | | | | | | | | | | | | | This makes the code a little more modular and allows the removal of some CPP hackery. By providing dummy implementations of of the `m32_*` functions (which simply call `errorBelch`) it means that the call sites for these functions are syntax checked even when `RTS_LINKER_USE_MMAP` is `0`. Also changes some size parameter types from `unsigned int` to `size_t`. Test Plan: Validate on Linux, OS X and Windows Reviewers: Phyx, hsyl20, bgamari, simonmar, austin Reviewed By: simonmar, austin Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2237
* {,M}BLOCK_SIZE_W * sizeof(W_) -> {,M}BLOCK_SIZETomas Carnecky2016-05-191-4/+4
| | | | | | | | | | Reviewers: austin, erikd, simonmar, bgamari Reviewed By: bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2241
* Get types in osFreeMBlocks in sync with osGetMBlocksTomas Carnecky2016-05-191-1/+1
| | | | | | | | | | | | | The first argument of 'osFreeMBlocks' ought to have the same type as the return value from 'osGetMBlocks'. Make it so. Reviewers: austin, simonmar, bgamari Reviewed By: bgamari Subscribers: erikd, rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D2235
* rts: More const correct-ness fixesErik de Castro Lopo2016-05-184-21/+23
| | | | | | | | | | | | | | | | | | | | In addition to more const-correctness fixes this patch fixes an infelicity of the previous const-correctness patch (995cf0f356) which left `UNTAG_CLOSURE` taking a `const StgClosure` pointer parameter but returning a non-const pointer. Here we restore the original type signature of `UNTAG_CLOSURE` and add a new function `UNTAG_CONST_CLOSURE` which takes and returns a const `StgClosure` pointer and uses that wherever possible. Test Plan: Validate on Linux, OS X and Windows Reviewers: Phyx, hsyl20, bgamari, austin, simonmar, trofi Reviewed By: simonmar, trofi Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2231
* Fix comments about scavenging WEAK objectsTakano Akio2016-05-122-7/+5
| | | | | | | | | | | | | | | | | | This is a follow-up of D2189. If fixes some comments, deletes a section in the User's Guide about the bug, and updates .mailmap as suggested on the WorkinConventions wiki page. Test Plan: It compiles. Reviewers: austin, simonmar, bgamari Reviewed By: bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2202 GHC Trac Issues: #11108
* rts: Make function pointer parameters `const` where possibleErik de Castro Lopo2016-05-123-7/+7
| | | | | | | | | | | | | | | | If a function takes a pointer parameter and doesn't update what the pointer points to, we can add `const` to the parameter declaration to document that no updates occur. Test Plan: Validate on Linux, OS X and Windows Reviewers: austin, Phyx, bgamari, simonmar, hsyl20 Reviewed By: bgamari, simonmar, hsyl20 Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2200
* Handle promotion failures when scavenging a WEAK (#11108)Takano Akio2016-05-113-3/+42
| | | | | | | | | | | | | | | | | | | | | | Previously, we ignored promotion failures when evacuating fields of a WEAK object. When a failure happens, this resulted in an WEAK object pointing to another object in a younger generation, causing crashes. I used the test case from #11746 to check that the fix is working. However I haven't managed to produce a test case that quickly reproduces the issue. Test Plan: ./validate Reviewers: austin, bgamari, simonmar Reviewed By: simonmar Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2189 GHC Trac Issues: #11108
* Use stdint types for Stg{Word,Int}{8,16,32,64}Tomas Carnecky2016-05-101-1/+1
| | | | | | | | | | | | | | | | | | We can't define Stg{Int,Word} in terms of {,u}intptr_t because STG depends on them being the exact same size as void*, and {,u}intptr_t does not make that guarantee. Furthermore, we also need to define StgHalf{Int,Word}, so the preprocessor if needs to stay. But we can at least keep it in a single place instead of repeating it in various files. Also define STG_{INT,WORD}{8,16,32,64}_{MIN,MAX} and use it in HsFFI.h, further reducing the need for CPP in other files. Reviewers: austin, bgamari, simonmar, hvr, erikd Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2182
* rts: Replace `nat` with `uint32_t`Erik de Castro Lopo2016-05-0518-174/+174
| | | | | | | | | | | | The `nat` type was an alias for `unsigned int` with a comment saying it was at least 32 bits. We keep the typedef in case client code is using it but mark it as deprecated. Test Plan: Validated on Linux, OS X and Windows Reviewers: simonmar, austin, thomie, hvr, bgamari, hsyl20 Differential Revision: https://phabricator.haskell.org/D2166
* Add +RTS -AL<size>Simon Marlow2016-05-041-2/+5
| | | | | | | | | | | | | | | +RTS -AL<size> controls the total size of large objects that can be allocated before a GC is triggered. Previously this was always just the value of -A, and the limit mainly existed to prevent runaway allocation in pathalogical programs that allocate a lot of large objects. However, since the limit is shared between all cores, on a large multicore the default becomes more restrictive, and can end up triggering GC well before it would normally have been. Arguably a better default would be A*N, but this is probably excessive. Adding a flag lets you choose, and I've left the default as it was. See docs for usage.
* Allow limiting the number of GC threads (+RTS -qn<n>)Simon Marlow2016-05-041-3/+1
| | | | | | | | | | | | | | | | | | This allows the GC to use fewer threads than the number of capabilities. At each GC, we choose some of the capabilities to be "idle", which means that the thread running on that capability (if any) will sleep for the duration of the GC, and the other threads will do its work. We choose capabilities that are already idle (if any) to be the idle capabilities. The idea is that this helps in the following situation: * We want to use a large -N value so as to make use of hyperthreaded cores * We use a large heap size, so GC is infrequent * But we don't want to use all -N threads in the GC, because that thrashes the memory too much. See docs for usage.
* Cleanups related to MAX_FREE_LISTSimon Marlow2016-05-021-19/+24
| | | | | | | | | | | | | | | | - Rename to the (more correct) NUM_FREE_LISTS - NUM_FREE_LISTS should be derived from the block and mblock sizes, not defined manually. It was actually too large by one, which caused a little bit of (benign) extra work in the form of a redundant loop iteration in some cases. - Add some ASSERTs for input preconditions to log_2() and log_2_ceil() - Fix some comments - Fix usage in allocLargeChunk, to account for the fact that log_2_ceil() can return NUM_FREE_LISTS.
* Revert "Revert "Use __builtin_clz() to implement log_1()""U-THEFACEBOOK\smarlow2016-05-021-11/+27
| | | | | | | This reverts commit 546f24e4f8a7c086b1e5afcdda624176610cbcf8. And adds a fix for Windows: we need to use __builtin_clzll() rather than __builtin_clzl(), because StgWord is unsigned long long on Windows.
* RTS: delete BlockedOnGA* + dead codeThomas Miedema2016-04-291-4/+0
| | | | | | | | Some old stuff related to the PAR way. Reviewed by: austin, simonmar Differential Revision: https://phabricator.haskell.org/D2137
* Revert "Use __builtin_clz() to implement log_2()"Simon Peyton Jones2016-04-281-21/+11
| | | | This reverts commit 24864ba5587c1a0447beabae90529e8bb4fa117a.
* Just comments & reformattingSimon Marlow2016-04-261-35/+21
|
* Use __builtin_clz() to implement log_2()Simon Marlow2016-04-261-11/+21
| | | | A microoptimisation in the block allocator.
* Allocate blocks in the GC in batchesSimon Marlow2016-04-123-30/+29
| | | | | | | | | | | | | | | | | Avoids contention for the block allocator lock in the GC; this can be seen in the gc_alloc_block_sync counter emitted by +RTS -s. I experimented with this a while ago, and there was already commented-out code for it in GCUtils.c, but I've now improved it so that it doesn't result in significantly worse memory usage. * The old method of putting spare blocks on ws->part_list was wasteful, the spare blocks are now shared between all generations and retained between GCs. * repeated allocGroup() results in fragmentation, so I switched to using allocLargeChunk() instead which is fragmentation-friendly; we already use it for the same reason in nursery allocation.
* Cache the size of part_list/scavd_list (#11783)Simon Marlow2016-04-124-9/+20
| | | | | | | | | After a parallel GC, it is possible to have a long list of blocks in ws->part_list, if we did a lot of work stealing but didn't fill up the blocks we stole. These blocks persist until the next load-balanced GC, which might be a long time, and during every GC we were traversing this list to find its size. The fix is to maintain the size all the time, so we don't have to compute it.
* Small simplification (#11777)Simon Marlow2016-04-121-5/+1
| | | | | DEAD_WEAK used to have a different layout, see d61c623ed6b2d352474a7497a65015dbf6a72e12
* Remove all mentions of IND_OLDGEN outside of docs/rtsJoachim Breitner2016-03-291-1/+1
|
* Revert "Various ticky-related work"Ben Gamari2016-03-241-1/+1
| | | | | This reverts commit 6c2c853b11fe25c106469da7b105e2be596c17de which was supposed to be merged as individual commits.
* Various ticky-related workJoachim Breitner2016-03-241-1/+1
| | | | | | | | | | | | | | | | | | this Diff contains small, self-contained changes as I work towards fixing #10613. It is mostly created to let harbormaster do its job, but feedback is welcome as well. Please do not merge this via arc; I’d like to push the individual patches as layed out here. I might push mostly trivial ones even without review, as long as the build passes. Reviewers: austin, bgamari Reviewed By: bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2014
* rts: drop unused global 'blackhole_queue'Sergei Trofimovich2016-02-271-1/+0
| | | | | | | | | | | | | | | | | | | | | | Commit 5d52d9b64c21dcf77849866584744722f8121389 removed global 'blackhole_queue' in favour of new mechanism: when TSO hits blackhole TSO blocks waiting for 'MessgaeBlackhole' delivery. Patch removed unused global and updates stale comments. Noticed by Yuras Shumovich. Signed-off-by: Sergei Trofimovich <siarheit@google.com> Test Plan: build test Reviewers: simonmar, austin, Yuras, bgamari Reviewed By: Yuras, bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1953
* rts: mark 'copied' as staticSergei Trofimovich2016-02-072-3/+1
| | | | | | | | Noticed by uselex.rb: copied: [R]: exported from: ./rts/dist/build/sm/GC.o Signed-off-by: Sergei Trofimovich <siarheit@google.com>
* rts: mark scavenge_mutable_list as staticSergei Trofimovich2016-02-072-3/+1
| | | | | | | | | | Noticed by uselex.rb: scavenge_mutable_list: [R]: exported from: ./rts/dist/build/sm/Scav.o scavenge_mutable_list1: [R]: exported from: ./rts/dist/build/sm/Scav.thr_o Signed-off-by: Sergei Trofimovich <siarheit@google.com>
* rts: drop unused calcLiveBlocks, calcLiveWordsSergei Trofimovich2016-02-072-29/+0
| | | | | | | | | | | | | | | Use of these helper functions was removed by commit 18896fa2b06844407fd1e0d3f85cd3db97a96ff4 Author: Simon Marlow <marlowsd@gmail.com> Date: Wed Feb 2 15:49:55 2011 +0000 Noticed by uselex.rb: calcLiveBlocks: [R]: exported from: ./rts/dist/build/sm/Storage.o calcLiveWords: [R]: exported from: ./rts/dist/build/sm/Storage.o Signed-off-by: Sergei Trofimovich <siarheit@google.com>