delta/haskell.git - gitlab.haskell.org: ghc/ghc.git

	Commit message (Collapse)	Author	Age	Files	Lines
*	rts: forkOn context switches the target capability	Douglas Wilson	2022-07-16	1	-3/+8
\| \| \| \|	Fixes #21824
*	Move `/includes` to `/rts/include`, sort per package better	John Ericson	2021-08-09	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In order to make the packages in this repo "reinstallable", we need to associate source code with a specific packages. Having a top level `/includes` dir that mixes concerns (which packages' includes?) gets in the way of this. To start, I have moved everything to `rts/`, which is mostly correct. There are a few things however that really don't belong in the rts (like the generated constants haskell type, `CodeGen.Platform.h`). Those needed to be manually adjusted. Things of note: - No symlinking for sake of windows, so we hard-link at configure time. - `CodeGen.Platform.h` no longer as `.hs` extension (in addition to being moved to `compiler/`) so as not to confuse anyone, since it is next to Haskell files. - Blanket `-Iincludes` is gone in both build systems, include paths now more strictly respect per-package dependencies. - `deriveConstants` has been taught to not require a `--target-os` flag when generating the platform-agnostic Haskell type. Make takes advantage of this, but Hadrian has yet to.
*	rts: Use a separate free block list for allocatePinned	Matthew Pickering	2021-03-08	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \|	The way in which allocatePinned took blocks out of the nursery was leading to horrible fragmentation in some workloads. The strategy now is that a separate free block list is reserved for each capability and blocks are taken from there. When it's empty the global SM lock is taken and a fresh block of size PINNED_EMPTY_SIZE is allocated. Fixes #19481
*	rts: Flush eventlog buffers from flushEventLog	Ben Gamari	2020-11-24	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \|	As noted in #18043, flushTrace failed flush anything beyond the writer. This means that a significant amount of data sitting in capability-local event buffers may never get flushed, despite the users' pleads for us to flush. Fix this by making flushEventLog flush all of the event buffers before flushing the writer. Fixes #18043.
*	Merge branch 'wip/tsan/storage' into wip/tsan/all	Ben Gamari	2020-11-01	1	-2/+4
\|\
\| *	rts/GC: Use atomics	Ben Gamari	2020-10-30	1	-2/+4
\| \|
* \|	rts: Make write of to_cap->inbox atomicwip/tsan/sched	Ben Gamari	2020-10-24	1	-1/+0
\| \| \| \| \| \| \| \| \| \|	This is necessary since emptyInbox may read from to_cap->inbox without taking cap->lock.
* \|	rts: Avoid data races in message handling	Ben Gamari	2020-10-24	1	-2/+6
\| \|
* \|	rts: Mitigate races in capability interruption logic	Ben Gamari	2020-10-24	1	-3/+4
\| \|
* \|	rts: Add assertions for task ownership of capabilities	Ben Gamari	2020-10-24	1	-4/+4
\|/
*	rts/Capability: Intialize interrupt field	Ben Gamari	2020-10-24	1	-0/+4
\| \| \| \| \| \|	Previously this was left uninitialized. Also clarify some comments.
*	RTS: avoid overflow on 32-bit arch (#18375)	Sylvain Henry	2020-06-25	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	We're now correctly computing allocated bytes on 32-bit arch, so we get huge increases. Metric Increase: haddock.Cabal haddock.base haddock.compiler space_leak_001
*	Merge non-moving garbage collector	Ben Gamari	2019-10-23	1	-1/+6
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This introduces a concurrent mark & sweep garbage collector to manage the old generation. The concurrent nature of this collector typically results in significantly reduced maximum and mean pause times in applications with large working sets. Due to the large and intricate nature of the change I have opted to preserve the fully-buildable history, including merge commits, which is described in the "Branch overview" section below. Collector design ================ The full design of the collector implemented here is described in detail in a technical note > B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell > Compiler" (2018) This document can be requested from @bgamari. The basic heap structure used in this design is heavily inspired by > K. Ueno & A. Ohori. "A fully concurrent garbage collector for > functional programs on multicore processors." /ACM SIGPLAN Notices/ > Vol. 51. No. 9 (presented at ICFP 2016) This design is intended to allow both marking and sweeping concurrent to execution of a multi-core mutator. Unlike the Ueno design, which requires no global synchronization pauses, the collector introduced here requires a stop-the-world pause at the beginning and end of the mark phase. To avoid heap fragmentation, the allocator consists of a number of fixed-size /sub-allocators/. Each of these sub-allocators allocators into its own set of /segments/, themselves allocated from the block allocator. Each segment is broken into a set of fixed-size allocation blocks (which back allocations) in addition to a bitmap (used to track the liveness of blocks) and some additional metadata (used also used to track liveness). This heap structure enables collection via mark-and-sweep, which can be performed concurrently via a snapshot-at-the-beginning scheme (although concurrent collection is not implemented in this patch). Implementation structure ======================== The majority of the collector is implemented in a handful of files: * `rts/Nonmoving.c` is the heart of the beast. It implements the entry-point to the nonmoving collector (`nonmoving_collect`), as well as the allocator (`nonmoving_allocate`) and a number of utilities for manipulating the heap. * `rts/NonmovingMark.c` implements the mark queue functionality, update remembered set, and mark loop. * `rts/NonmovingSweep.c` implements the sweep loop. * `rts/NonmovingScav.c` implements the logic necessary to scavenge the nonmoving heap. Branch overview =============== ``` * wip/gc/opt-pause: \| A variety of small optimisations to further reduce pause times. \| * wip/gc/compact-nfdata: \| Introduce support for compact regions into the non-moving \|\ collector \| \ \| \ \| \| * wip/gc/segment-header-to-bdescr: \| \| \| Another optimization that we are considering, pushing \| \| \| some segment metadata into the segment descriptor for \| \| \| the sake of locality during mark \| \| \| \| * \| wip/gc/shortcutting: \| \| \| Support for indirection shortcutting and the selector optimization \| \| \| in the non-moving heap. \| \| \| * \| \| wip/gc/docs: \| \|/ Work on implementation documentation. \| / \|/ * wip/gc/everything: \| A roll-up of everything below. \|\ \| \ \| \|\ \| \| \ \| \| * wip/gc/optimize: \| \| \| A variety of optimizations, primarily to the mark loop. \| \| \| Some of these are microoptimizations but a few are quite \| \| \| significant. In particular, the prefetch patches have \| \| \| produced a nontrivial improvement in mark performance. \| \| \| \| \| * wip/gc/aging: \| \| \| Enable support for aging in major collections. \| \| \| \| * \| wip/gc/test: \| \| \| Fix up the testsuite to more or less pass. \| \| \| * \| \| wip/gc/instrumentation: \| \| \| A variety of runtime instrumentation including statistics \| \| / support, the nonmoving census, and eventlog support. \| \|/ \| / \|/ * wip/gc/nonmoving-concurrent: \| The concurrent write barriers. \| * wip/gc/nonmoving-nonconcurrent: \| The nonmoving collector without the write barriers necessary \| for concurrent collection. \| * wip/gc/preparation: \| A merge of the various preparatory patches that aren't directly \| implementing the GC. \| \| * GHC HEAD . . . ```
\| *	rts: Implement concurrent collection in the nonmoving collector	Ben Gamari	2019-10-20	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This extends the non-moving collector to allow concurrent collection. The full design of the collector implemented here is described in detail in a technical note B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell Compiler" (2018) This extension involves the introduction of a capability-local remembered set, known as the /update remembered set/, which tracks objects which may no longer be visible to the collector due to mutation. To maintain this remembered set we introduce a write barrier on mutations which is enabled while a concurrent mark is underway. The update remembered set representation is similar to that of the nonmoving mark queue, being a chunked array of `MarkEntry`s. Each `Capability` maintains a single accumulator chunk, which it flushed when it (a) is filled, or (b) when the nonmoving collector enters its post-mark synchronization phase. While the write barrier touches a significant amount of code it is conceptually straightforward: the mutator must ensure that the referee of any pointer it overwrites is added to the update remembered set. However, there are a few details: * In the case of objects with a dirty flag (e.g. `MVar`s) we can exploit the fact that only the first mutation requires a write barrier. * Weak references, as usual, complicate things. In particular, we must ensure that the referee of a weak object is marked if dereferenced by the mutator. For this we (unfortunately) must introduce a read barrier, as described in Note [Concurrent read barrier on deRefWeak#] (in `NonMovingMark.c`). * Stable names are also a bit tricky as described in Note [Sweeping stable names in the concurrent collector] (`NonMovingSweep.c`). We take quite some pains to ensure that the high thread count often seen in parallel Haskell applications doesn't affect pause times. To this end we allow thread stacks to be marked either by the thread itself (when it is executed or stack-underflows) or the concurrent mark thread (if the thread owning the stack is never scheduled). There is a non-trivial handshake to ensure that this happens without racing which is described in Note [StgStack dirtiness flags and concurrent marking]. Co-Authored-by: Ömer Sinan Ağacan <omer@well-typed.com>
\| *	rts: Non-concurrent mark and sweep	Ömer Sinan Ağacan	2019-10-20	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This implements the core heap structure and a serial mark/sweep collector which can be used to manage the oldest-generation heap. This is the first step towards a concurrent mark-and-sweep collector aimed at low-latency applications. The full design of the collector implemented here is described in detail in a technical note B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell Compiler" (2018) The basic heap structure used in this design is heavily inspired by K. Ueno & A. Ohori. "A fully concurrent garbage collector for functional programs on multicore processors." /ACM SIGPLAN Notices/ Vol. 51. No. 9 (presented by ICFP 2016) This design is intended to allow both marking and sweeping concurrent to execution of a multi-core mutator. Unlike the Ueno design, which requires no global synchronization pauses, the collector introduced here requires a stop-the-world pause at the beginning and end of the mark phase. To avoid heap fragmentation, the allocator consists of a number of fixed-size /sub-allocators/. Each of these sub-allocators allocators into its own set of /segments/, themselves allocated from the block allocator. Each segment is broken into a set of fixed-size allocation blocks (which back allocations) in addition to a bitmap (used to track the liveness of blocks) and some additional metadata (used also used to track liveness). This heap structure enables collection via mark-and-sweep, which can be performed concurrently via a snapshot-at-the-beginning scheme (although concurrent collection is not implemented in this patch). The mark queue is a fairly straightforward chunked-array structure. The representation is a bit more verbose than a typical mark queue to accomodate a combination of two features: * a mark FIFO, which improves the locality of marking, reducing one of the major overheads seen in mark/sweep allocators (see [1] for details) * the selector optimization and indirection shortcutting, which requires that we track where we found each reference to an object in case we need to update the reference at a later point (e.g. when we find that it is an indirection). See Note [Origin references in the nonmoving collector] (in `NonMovingMark.h`) for details. Beyond this the mark/sweep is fairly run-of-the-mill. [1] R. Garner, S.M. Blackburn, D. Frampton. "Effective Prefetch for Mark-Sweep Garbage Collection." ISMM 2007. Co-Authored-By: Ben Gamari <ben@well-typed.com>
* \|	Implement s390x LLVM backend.	Stefan Schulze Frielinghaus	2019-10-22	1	-1/+3
\|/ \| \| \| \| \|	This patch adds support for the s390x architecture for the LLVM code generator. The patch includes a register mapping of STG registers onto s390x machine registers which enables a registerised build.
*	Expunge #ifdef and #ifndef from the codebase	John Ericson	2019-07-14	1	-1/+1
\| \| \| \| \| \| \| \|	These are unexploded minds as far as the linter is concerned. I don't want to hit in my MRs by mistake! I did this with `sed`, and then rolled back some changes in the docs, config.guess, and the linter itself.
*	Update Wiki URLs to point to GitLab	Takenobu Tani	2019-03-25	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This moves all URL references to Trac Wiki to their corresponding GitLab counterparts. This substitution is classified as follows: 1. Automated substitution using sed with Ben's mapping rule [1] Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy... New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy... 2. Manual substitution for URLs containing `#` index Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy...#Zzz New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy...#zzz 3. Manual substitution for strings starting with `Commentary` Old: Commentary/XxxYyy... New: commentary/xxx-yyy... See also !539 [1]: https://gitlab.haskell.org/bgamari/gitlab-migration/blob/master/wiki-mapping.json
*	Some assertions and comments in scheduler	Ömer Sinan Ağacan	2018-11-17	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Test Plan: I can't validate this because of existing errors with the debug runtime. I'll see if this introduces any new failures. Reviewers: simonmar, bgamari, erikd Reviewed By: simonmar Subscribers: rwbarton, carter Differential Revision: https://phabricator.haskell.org/D5337
*	rts: Update some comments, minor refactoring	Ömer Sinan Ağacan	2018-06-27	1	-1/+1
\|
*	rts: Rip out support for STM invariants	Ben Gamari	2018-06-02	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This feature has some very serious correctness issues (#14310), introduces a great deal of complexity, and hasn't seen wide usage. Consequently we are removing it, as proposed in Proposal #77 [1]. This is heavily based on a patch from fryguybob. Updates stm submodule. [1] https://github.com/ghc-proposals/ghc-proposals/pull/77 Test Plan: Validate Reviewers: erikd, simonmar, hvr Reviewed By: simonmar Subscribers: rwbarton, thomie, carter GHC Trac Issues: #14310 Differential Revision: https://phabricator.haskell.org/D4760
*	Prefer #if defined to #ifdef	Ben Gamari	2017-04-28	1	-2/+2
\| \| \| \|	Our new CPP linter enforces this.
*	cpp: Use #pragma once instead of #ifndef guards	Ben Gamari	2017-04-23	1	-4/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This both says what we mean and silences a bunch of spurious CPP linting warnings. This pragma is supported by all CPP implementations which we support. Reviewers: austin, erikd, simonmar, hvr Reviewed By: simonmar Subscribers: rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3482
*	Use C99's bool	Ben Gamari	2016-11-29	1	-15/+15
\| \| \| \| \| \| \| \| \| \| \| \|	Test Plan: Validate on lots of platforms Reviewers: erikd, simonmar, austin Reviewed By: erikd, simonmar Subscribers: michalt, thomie Differential Revision: https://phabricator.haskell.org/D2699
*	Add hs_try_putmvar()	Simon Marlow	2016-09-12	1	-1/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is a fast, non-blocking, asynchronous, interface to tryPutMVar that can be called from C/C++. It's useful for callback-based C/C++ APIs: the idea is that the callback invokes hs_try_putmvar(), and the Haskell code waits for the callback to run by blocking in takeMVar. The callback doesn't block - this is often a requirement of callback-based APIs. The callback wakes up the Haskell thread with minimal overhead and no unnecessary context-switches. There are a couple of benchmarks in testsuite/tests/concurrent/should_run. Some example results comparing hs_try_putmvar() with using a standard foreign export: ./hs_try_putmvar003 1 64 16 100 +RTS -s -N4 0.49s ./hs_try_putmvar003 2 64 16 100 +RTS -s -N4 2.30s hs_try_putmvar() is 4x faster for this workload (see the source for hs_try_putmvar003.hs for details of the workload). An alternative solution is to use the IO Manager for this. We've tried it, but there are problems with that approach: * Need to create a new file descriptor for each callback * The IO Manger thread(s) become a bottleneck * More potential for things to go wrong, e.g. throwing an exception in an IO Manager callback kills the IO Manager thread. Test Plan: validate; new unit tests Reviewers: niteria, erikd, ezyang, bgamari, austin, hvr Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2501
*	Fix an assertion that could randomly fail	Simon Marlow	2016-08-05	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: ASSERT_THREADED_CAPABILITY_INVARIANTS was testing properties of the returning_tasks queue, but that requires cap->lock to access safely. This assertion would randomly fail if stressed enough. Instead I've removed it from the catch-all ASSERT_PARTIAL_CAPABILITIY_INVARIANTS and made it a separate assertion only called under cap->lock. Test Plan: ``` cd testsuite/tests/concurrent/should_run make TEST=setnumcapabilities001 WAY=threaded1 EXTRA_HC_OPTS=-with-rtsopts=-DS CLEANUP=0 while true; do ./setnumcapabilities001.run/setnumcapabilities001 4 9 2000 \|\| break; done ``` Reviewers: niteria, bgamari, ezyang, austin, erikd Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2440 GHC Trac Issues: #10860
*	Track the lengths of the thread queues	Simon Marlow	2016-08-03	1	-4/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Knowing the length of the run queue in O(1) time is useful: for example we don't have to traverse the run queue to know how many threads we have to migrate in schedulePushWork(). Test Plan: validate Reviewers: ezyang, erikd, bgamari, austin Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2437
*	NUMA cleanups	Simon Marlow	2016-06-17	1	-4/+13
\| \| \| \| \|	- Move the numaMap and nNumaNodes out of RtsFlags to Capability.c - Add a test to tests/rts
*	NUMA support	Simon Marlow	2016-06-10	1	-2/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The aim here is to reduce the number of remote memory accesses on systems with a NUMA memory architecture, typically multi-socket servers. Linux provides a NUMA API for doing two things: * Allocating memory local to a particular node * Binding a thread to a particular node When given the +RTS --numa flag, the runtime will * Determine the number of NUMA nodes (N) by querying the OS * Assign capabilities to nodes, so cap C is on node C%N * Bind worker threads on a capability to the correct node * Keep a separate free lists in the block layer for each node * Allocate the nursery for a capability from node-local memory * Allocate blocks in the GC from node-local memory For example, using nofib/parallel/queens on a 24-core 2-socket machine: ``` $ ./Main 15 +RTS -N24 -s -A64m Total time 173.960s ( 7.467s elapsed) $ ./Main 15 +RTS -N24 -s -A64m --numa Total time 150.836s ( 6.423s elapsed) ``` The biggest win here is expected to be allocating from node-local memory, so that means programs using a large -A value (as here). According to perf, on this program the number of remote memory accesses were reduced by more than 50% by using `--numa`. Test Plan: * validate * There's a new flag --debug-numa=<n> that pretends to do NUMA without actually making the OS calls, which is useful for testing the code on non-NUMA systems. * TODO: I need to add some unit tests Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2199
*	rts: Make function pointer parameters `const` where possible	Erik de Castro Lopo	2016-05-12	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a function takes a pointer parameter and doesn't update what the pointer points to, we can add `const` to the parameter declaration to document that no updates occur. Test Plan: Validate on Linux, OS X and Windows Reviewers: austin, Phyx, bgamari, simonmar, hsyl20 Reviewed By: bgamari, simonmar, hsyl20 Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2200
*	rts: Replace `nat` with `uint32_t`	Erik de Castro Lopo	2016-05-05	1	-11/+12
\| \| \| \| \| \| \| \| \| \| \| \|	The `nat` type was an alias for `unsigned int` with a comment saying it was at least 32 bits. We keep the typedef in case client code is using it but mark it as deprecated. Test Plan: Validated on Linux, OS X and Windows Reviewers: simonmar, austin, thomie, hvr, bgamari, hsyl20 Differential Revision: https://phabricator.haskell.org/D2166
*	Allow limiting the number of GC threads (+RTS -qn<n>)	Simon Marlow	2016-05-04	1	-4/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This allows the GC to use fewer threads than the number of capabilities. At each GC, we choose some of the capabilities to be "idle", which means that the thread running on that capability (if any) will sleep for the duration of the GC, and the other threads will do its work. We choose capabilities that are already idle (if any) to be the idle capabilities. The idea is that this helps in the following situation: * We want to use a large -N value so as to make use of hyperthreaded cores * We use a large heap size, so GC is infrequent * But we don't want to use all -N threads in the GC, because that thrashes the memory too much. See docs for usage.
*	rts: mark 'shutdownCapability' as static	Sergei Trofimovich	2016-02-07	1	-10/+0
\| \| \| \| \| \| \| \| \| \|	Noticed by uselex.rb: last_free_capability: [R]: exported from: ./rts/dist/build/Capability.o shutdownCapability: [R]: exported from: ./rts/dist/build/Capability.o Signed-off-by: Sergei Trofimovich <siarheit@google.com>
*	Fix deadlock (#10545)	Simon Marlow	2015-06-26	1	-7/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	yieldCapability() was not prepared to be called by a Task that is not either a worker or a bound Task. This could happen if we ended up in yieldCapability via this call stack: performGC() scheduleDoGC() requestSync() yieldCapability() and there were a few other ways this could happen via requestSync. The fix is to handle this case in yieldCapability(): when the Task is not a worker or a bound Task, we put it on the returning_workers queue, where it will be woken up again. Summary of changes: * `yieldCapability`: factored out subroutine waitForWorkerCapability` * `waitForReturnCapability` renamed to `waitForCapability`, and factored out subroutine `waitForReturnCapability` * `releaseCapabilityAndQueue` worker renamed to `enqueueWorker`, does not take a lock and no longer tests if `!isBoundTask()` * `yieldCapability` adjusted for refactorings, only change in behavior is when it is not a worker or bound task. Test Plan: * new test concurrent/should_run/performGC * validate Reviewers: niteria, austin, ezyang, bgamari Subscribers: thomie, bgamari Differential Revision: https://phabricator.haskell.org/D997 GHC Trac Issues: #10545
*	Make clearNursery free	Simon Marlow	2014-11-25	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: clearNursery resets all the bd->free pointers of nursery blocks to make the blocks empty. In profiles we've seen clearNursery taking significant amounts of time particularly with large -N and -A values. This patch moves the work of clearNursery to the point at which we actually need the new block, thereby introducing an invariant that blocks to the right of the CurrentNursery pointer still need their bd->free pointer reset. This should make things faster overall, because we don't need to clear blocks that we don't use. Test Plan: validate Reviewers: AndreasVoellmy, ezyang, austin Subscribers: thomie, carter, ezyang, simonmar Differential Revision: https://phabricator.haskell.org/D318
*	[skip ci] rts: Detabify Capability.h	Austin Seipp	2014-10-21	1	-18/+18
\| \| \| \|	Signed-off-by: Austin Seipp <austin@well-typed.com>
*	Revert "rts: add Emacs 'Local Variables' to every .c file"	Simon Marlow	2014-09-29	1	-8/+0
\| \| \| \|	This reverts commit 39b5c1cbd8950755de400933cecca7b8deb4ffcd.
*	Revert "Revert "rts/base: Fix #9423"" and resolve issue that caused the revert.	Andreas Voellmy	2014-09-16	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This reverts commit 4748f5936fe72d96edfa17b153dbfd84f2c4c053. The fix for #9423 was reverted because this commit introduced a C function setIOManagerControlFd() (defined in Schedule.c) defined for all OS types, while the prototype (in includes/rts/IOManager.h) was only included when mingw32_HOST_OS is not defined. This broke Windows builds. This commit reverts the original commit and resolves the problem by only defining setIOManagerControlFd() when mingw32_HOST_OS is defined. Hence the missing prototype error should not occur on Windows. In addition, since the io_manager_control_wr_fd field of the Capability struct is only usd by the setIOManagerControlFd, this commit includes the io_manager_control_wr_fd field in the Capability struct only when mingw32_HOST_OS is not defined. Test Plan: Try to compile successfully on all platforms. Reviewers: austin Reviewed By: austin Subscribers: simonmar, ezyang, carter Differential Revision: https://phabricator.haskell.org/D174
*	Revert "rts/base: Fix #9423"	Austin Seipp	2014-08-22	1	-3/+0
\| \| \| \| \| \| \| \| \|	This should fix the Windows fallout, and hopefully this will be fixed once that's sorted out. This reverts commit f9f89b7884ccc8ee5047cf4fffdf2b36df6832df. Signed-off-by: Austin Seipp <austin@well-typed.com>
*	rts/base: Fix #9423	Andreas Voellmy	2014-08-19	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Fix #9423. The problem in #9423 is caused when code invoked by `hs_exit()` waits on all foreign calls to return, but some IO managers are in `safe` foreign calls and do not return. The previous design signaled to the timer manager (via its control pipe) that it should "die" and when the timer manager returned to Haskell-land, the Haskell code in timer manager then signalled to the IO manager threads that they should return from foreign calls and `die`. Unfortunately, in the shutdown sequence the timer manager is unable to return to Haskell-land fast enough and so the code that signals to the IO manager threads (via their control pipes) is never executed and the IO manager threads remain out in the foreign calls. This patch solves this problem by having the RTS signal to all the IO manager threads (via their control pipes; and in addition to signalling to the timer manager thread) that they should shutdown (in `ioManagerDie()` in `rts/Signals.c`. To do this, we arrange for each IO manager thread to register its control pipe with the RTS (in `GHC.Thread.startIOManagerThread`). In addition, `GHC.Thread.startTimerManagerThread` registers its control pipe. These are registered via C functions `setTimerManagerControlFd` (in `rts/Signals.c`) and `setIOManagerControlFd` (in `rts/Capability.c`). The IO manager control pipe file descriptors are stored in a new field of the `Capability_ struct`. Test Plan: See the notes on #9423 to recreate the problem and to verify that it no longer occurs with the fix. Auditors: simonmar Reviewers: simonmar, edsko, ezyang, austin Reviewed By: austin Subscribers: phaskell, simonmar, ezyang, carter, relrod Differential Revision: https://phabricator.haskell.org/D129 GHC Trac Issues: #9423, #9284
*	rts: add Emacs 'Local Variables' to every .c file	Austin Seipp	2014-07-28	1	-0/+8
\| \| \| \| \| \| \| \|	This will hopefully help ensure some basic consistency in the forward by overriding buffer variables. In particular, it sets the wrap length, the offset to 4, and turns off tabs. Signed-off-by: Austin Seipp <austin@well-typed.com>
*	Per-capability nursery weak pointer lists, fixes #9075	Edward Z. Yang	2014-05-29	1	-0/+5
\| \| \| \|	Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
*	Globally replace "hackage.haskell.org" with "ghc.haskell.org"	Simon Marlow	2013-10-01	1	-1/+1
\|
*	Don't move Capabilities in setNumCapabilities (#8209)	Simon Marlow	2013-09-04	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \|	We have various problems with reallocating the array of Capabilities, due to threads in waitForReturnCapability that are already holding a pointer to a Capability. Rather than add more locking to make this safer, I decided it would be easier to ensure that we never move the Capabilities at all. The capabilities array is now an array of pointers to Capabaility. There are extra indirections, but it rarely matters - we don't often access Capabilities via the array, normally we already have a pointer to one. I ran the parallel benchmarks and didn't see any difference.
*	Make enabled_capabilities visible (fixes dynamic linking)	Simon Marlow	2012-12-13	1	-2/+1
\|
*	Deprecate lnat, and use StgWord instead	Simon Marlow	2012-09-07	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	lnat was originally "long unsigned int" but we were using it when we wanted a 64-bit type on a 64-bit machine. This broke on Windows x64, where long == int == 32 bits. Using types of unspecified size is bad, but what we really wanted was a type with N bits on an N-bit machine. StgWord is exactly that. lnat was mentioned in some APIs that clients might be using (e.g. StackOverflowHook()), so we leave it defined but with a comment to say that it's deprecated.
*	scheduleYield: avoid doing a GC again if we just did one	Ian Lynagh	2012-06-07	1	-1/+1
\| \| \| \| \| \|	If we are interrupted to do a GC, then we do not immediately do another one. This avoids a starvation situation where one Capability keeps forcing a GC and the other Capabilities make no progress at all.
*	Calculate the total memory allocated on a per-capability basis	Duncan Coutts	2012-04-04	1	-0/+2
\| \| \| \| \| \| \|	In addition to the existing global method. For now we just do it both ways and assert they give the same grand total. At some stage we can simplify the global method to just take the sum of the per-cap counters.
*	zap extra semi	Gabor Greif	2012-02-27	1	-1/+1
\|
*	Allocate pinned object blocks from the nursery, not the global	Simon Marlow	2012-02-13	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	allocator. Prompted by a benchmark posted to parallel-haskell@haskell.org by Andreas Voellmy <andreas.voellmy@gmail.com>. This program exhibits contention for the block allocator when run with -N2 and greater without the fix: {-# LANGUAGE MagicHash, UnboxedTuples, BangPatterns #-} module Main where import Control.Monad import Control.Concurrent import System.Environment import GHC.IO import GHC.Exts import GHC.Conc main = do [m] <- fmap (fmap read) getArgs n <- getNumCapabilities ms <- replicateM n newEmptyMVar sequence [ forkIO $ busyWorkerB (m `quot` n) >> putMVar mv () \| mv <- ms ] mapM takeMVar ms busyWorkerB :: Int -> IO () busyWorkerB n_loops = go 0 where go !n \| n >= n_loops = return () \| otherwise = do p <- (IO $ \s -> case newPinnedByteArray# 1024# s of { (# s', mbarr# #) -> (# s', () #) } ) go (n+1)