Significantly reduces the overhead for par, which means that we can
make use of parallelism at a much finer granularity.
|
Change the way we look for work in the scheduler. Previously,
checking to see whether there was anything to do was a
non-side-effecting operation, but this has changed now that we do
work-stealing. This led to a refactoring of the inner loop of the
scheduler.
Also, lots of cleanup in the new work-stealing code, but no functional
changes.
One new statistic is added to the +RTS -s output:
SPARKS: 1430 (2 converted, 1427 pruned)
lets you know something about the use of `par` in the program.
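As an illustration of that new shape, here is a minimal sketch in C; every name in it is a hypothetical stand-in, not the RTS's actual definition:

```c
#include <stddef.h>

typedef void *Spark;
typedef struct {
    Spark pending;              /* stand-in for the run queue and spark pool */
} Capability;

/* Looking for work may now steal a spark from another capability, so the
 * check itself has a side effect: whatever it removes must be handed back
 * to the caller, not rediscovered by a second check. */
static Spark findWork(Capability *cap)
{
    Spark s = cap->pending;
    cap->pending = NULL;
    return s;
}

static void scheduleLoop(Capability *cap)
{
    for (;;) {
        Spark s = findWork(cap);  /* keep the result of the check ...       */
        if (s == NULL)
            break;                /* ... rather than testing, then fetching */
        /* run the spark or thread here */
    }
}
```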
|
Spark stealing support for PARALLEL_HASKELL and THREADED_RTS versions of the RTS.
Spark pools are per capability, separately allocated and held in the Capability
structure. The implementation uses Double-Ended Queues (deque) and cas-protected
access.
The write end of the queue (position bottom) can only be used with
mutual exclusion, i.e. by exactly one caller at a time.
Multiple readers can steal()/findSpark() from the read end
(position top), and are synchronised without a lock, based on a cas
of the top position. One reader wins, the others return NULL for a
failure.
Work stealing is called when a Capability finds no other work (inside
yieldCapability); it tries all capabilities 0..n-1 twice, until a theft succeeds.
Inside schedulePushWork, all considered capabilities (those which were idle and could
be grabbed) are woken up. Future versions should wake up capabilities immediately when
putting a new spark in the local pool, from newSpark().
The patch has been re-recorded due to conflicting bugfixes in sparks.c, also
fixing a (strange) conflict in the scheduler.
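As a rough sketch of that cas-protected read end (layout and names are illustrative only; the real code also deals with wrap-around, sizing and memory ordering):

```c
#include <stddef.h>

typedef struct {
    void **elements;            /* ring buffer of sparks                    */
    unsigned long size;         /* capacity (a power of two in practice)    */
    volatile long top;          /* read end: thieves advance this with cas  */
    volatile long bottom;       /* write end: owner only, mutual exclusion  */
} SparkDeque;

/* Multiple thieves race on top; exactly one cas wins, the losers return
 * NULL for a failure, as described above. */
void *steal(SparkDeque *q)
{
    long t = q->top;
    long b = q->bottom;
    if (t >= b)
        return NULL;                          /* nothing to steal           */
    void *spark = q->elements[(unsigned long)t % q->size];
    if (__sync_bool_compare_and_swap(&q->top, t, t + 1))
        return spark;                         /* we won the race            */
    return NULL;                              /* another thief got it first */
}
```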
|
It's no longer needed, as base no longer #includes it
|
Fixes a long-standing bug that could in some cases cause sub-optimal
scheduling behaviour.
|
gcc has changed the meaning of "extern inline" when certain flags are
on (e.g. --std=gnu99), and this broke our use of it in the header
files.
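For reference, a hedged sketch of the semantic change (the workaround shown is only one option, not necessarily the one taken here):

```c
/* Under the traditional GNU89 rules, an "extern inline" definition in a
 * header is inline-only: no out-of-line symbol is emitted for it. */
extern inline int twice(int x) { return x + x; }

/* Under -std=gnu99 the same tokens mean the opposite: this is now the one
 * external definition of the function, so including the header from two
 * source files gives duplicate symbols at link time.  On newer gccs the
 * old behaviour can be pinned explicitly: */
extern inline __attribute__((gnu_inline)) int twice_gnu89(int x)
{
    return x + x;
}
```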
|
Check that all threads marked as dirty are really on the mutable list.
|
The macros were duplicating their arguments, which was normally
harmless, but in the parallel GC was actually wrong and caused
spurious assertion failures.
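The failure mode, in a schematic example (these are not the actual RTS macros):

```c
/* The argument is expanded twice, so any side effect, or any read of a
 * field that another GC thread may be mutating, happens twice: */
#define MAX_BAD(a, b)   ((a) > (b) ? (a) : (b))

/* Single-threaded that is merely wasteful; in the parallel GC the two
 * reads of a shared word can disagree, which is what produced the
 * spurious assertion failures.  Evaluating each argument exactly once
 * avoids the problem: */
static inline unsigned long max_once(unsigned long a, unsigned long b)
{
    return a > b ? a : b;
}
```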
|
This means S_ISSOCK gets defined on Linux
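The kind of use this enables, as a hedged sketch (which feature-test macro the build actually sets is not stated here; _GNU_SOURCE is just an illustration):

```c
#define _GNU_SOURCE          /* illustrative: exposes S_ISSOCK under strict -std modes */
#include <sys/stat.h>

int fd_is_socket(int fd)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return 0;
    return S_ISSOCK(st.st_mode);
}
```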
|
2301: Control-C now causes the new exception (AsyncException
UserInterrupt) to be raised in the main thread. The signal handler
is set up by GHC.TopHandler.runMainIO, and can be overridden in the
usual way by installing a new signal handler. The advantage is that
now all programs will get a chance to clean up on ^C.
When UserInterrupt is caught by the topmost handler, we now exit the
program via kill(getpid(),SIGINT), which tells the parent process that
we exited as a result of ^C, so the parent can take appropriate action
(it might want to exit too, for example).
One subtlety is that we have to use a weak reference to the ThreadId
for the main thread, so that the signal handler doesn't prevent the
main thread from being subject to deadlock detection.
1619: we now ignore SIGPIPE by default. Although POSIX says that a
SIGPIPE should terminate the process by default, I wonder if this
decision was made because many C applications failed to check the exit
code from write(). In Haskell a failed write due to a closed pipe
will generate an exception anyway, so the main difference is that we
now get a useful error message instead of silent program termination.
See #1619 for more discussion.
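A C-level sketch of the two mechanisms (the RTS's actual shutdown path is more involved; this only shows the system-call convention being relied on):

```c
#include <signal.h>
#include <unistd.h>

/* #1619: with SIGPIPE ignored, a write() to a closed pipe returns -1 with
 * errno == EPIPE, which surfaces as a Haskell exception instead of silent
 * program termination. */
static void ignore_sigpipe(void)
{
    signal(SIGPIPE, SIG_IGN);
}

/* #2301: once the topmost handler has caught UserInterrupt and cleaned up,
 * exit by re-raising SIGINT with the default disposition; the parent's
 * waitpid() then reports death-by-SIGINT and the parent can take
 * appropriate action. */
static void exit_interrupted(void)
{
    struct sigaction dfl;
    dfl.sa_handler = SIG_DFL;
    sigemptyset(&dfl.sa_mask);
    dfl.sa_flags = 0;
    sigaction(SIGINT, &dfl, NULL);
    kill(getpid(), SIGINT);
}
```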
|
available for linking
|
gcc 4.3 emits warnings for static inline functions that its heuristics
decide not to inline. The workaround is to either mark appropriate
functions as "hot" (a new attribute in gcc 4.3), or sometimes to use
"extern inline" instead.
With this fix I can validate with gcc 4.3 on Fedora 9.
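One way the workaround looks in a header, sketched with a made-up helper:

```c
/* gcc 4.3 warns when it declines to inline a static inline function;
 * declaring it "hot" biases the heuristics towards inlining (the other
 * option mentioned above is to switch to "extern inline"). */
static inline __attribute__((hot)) unsigned long
round_up(unsigned long x, unsigned long align)   /* hypothetical helper */
{
    return (x + align - 1) & ~(align - 1);
}
```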
|
Sometimes better than the default copying, enabled by +RTS -w
|
Instead of keeping a single list of all threads, keep one per step
and only look at the threads belonging to steps that we are
collecting.
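Schematically (types and field names are stand-ins, not the RTS's exact ones), each step now carries its own thread list and a collection only walks the steps it is collecting:

```c
#include <stddef.h>

typedef struct Thread_ {
    struct Thread_ *link;       /* next thread on the same step's list      */
    /* ... */
} Thread;

typedef struct {
    Thread *threads;            /* threads whose objects live in this step  */
    /* ... other per-step state ... */
} Step;

static void visit_threads(Step *steps, int n_collected)
{
    for (int s = 0; s < n_collected; s++)
        for (Thread *t = steps[s].threads; t != NULL; t = t->link) {
            /* only threads belonging to the collected steps are examined */
        }
}
```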
|
- GCAux.c contains code not compiled with the gct register enabled; it is
  callable from outside the GC
- marking functions are moved to their relevant subsystems, outside
the GC
- mark_root needs to save the gct register, as it is called from
outside the GC
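A sketch of the last point, with the surrounding declarations reduced to stand-ins:

```c
typedef struct gc_thread_ gc_thread;        /* opaque here                  */
typedef struct StgClosure_ StgClosure;      /* opaque here                  */

extern __thread gc_thread *gct;             /* stand-in for the register    */
void evacuate(StgClosure **p);              /* provided by the GC proper    */

/* mark_root can be entered from outside the GC, where gct may hold
 * something else (or nothing), so it saves and restores the register
 * around the call into the GC. */
void mark_root(void *user, StgClosure **root)
{
    gc_thread *saved_gct = gct;
    gct = (gc_thread *)user;
    evacuate(root);
    gct = saved_gct;
}
```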
|
- count and report number of parallel collections
- calculate bytes scanned in addition to bytes copied per thread
- calculate "work balance factor"
- tidy up the formatting a bit
|
This means we can calculate slop easily, and also improve
predictability of GC.
|
DEBUG imposes a significant performance hit in the GC, yet we often
want some of the debugging output, so -vg gives us the cheap trace
messages without the sanity checking of DEBUG, just like -vs for the
scheduler.
|
When a stack is occupying less than 1/4 of the memory it owns, and is
larger than a megablock, we release half of it. Shrinking is O(1); it
doesn't need to copy the stack.
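The policy in schematic form (the thresholds are as stated above; field names and the release step itself are illustrative):

```c
#define MEGABLOCK_SIZE   (1024 * 1024)      /* illustrative figure          */

typedef struct {
    unsigned long stack_size;      /* bytes of memory the stack owns        */
    unsigned long stack_in_use;    /* bytes currently occupied              */
    /* ... */
} StackDesc;

static void maybe_shrink(StackDesc *s)
{
    if (s->stack_size > MEGABLOCK_SIZE &&
        s->stack_in_use < s->stack_size / 4) {
        /* hand the unused upper half back to the block allocator; only
         * this bookkeeping changes and no stack words are copied, so the
         * operation is O(1) */
        s->stack_size /= 2;
    }
}
```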
|
in addition to checking for leaks
|
- major (multithreaded) GC is measured separately from minor GC
- events to measure can now be specified on the command line, e.g.
  prog +RTS -a+PAPI_TOT_CYC
|
e.g. use +RTS -g2 -RTS for 2 threads. Only major GCs are parallelised;
minor GCs are still sequential. Don't use more threads than you
have CPUs.
It works most of the time, although you won't see much speedup yet.
Tuning and more work on stability still required.
|
This patch localises the state of the GC into a gc_thread structure,
and reorganises the inner loop of the GC to scavenge one block at a
time from global work lists in each "step". The gc_thread structure
has a "workspace" for each step, in which it collects evacuated
objects until it has a full block to push out to the step's global
list. Details of the algorithm will be on the wiki in due course.
At the moment, THREADED_RTS does not compile, but the single-threaded
GC works (and is 10-20% slower than before).
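In outline (the names echo the commit's terminology but the layout is schematic, not the actual definitions):

```c
typedef struct bdescr_ bdescr;      /* block descriptor, opaque here        */

typedef struct {
    bdescr *todo_bd;        /* block this thread is currently filling with  */
                            /* objects evacuated into one step              */
    bdescr *done;           /* full blocks, ready to be pushed onto that    */
                            /* step's global work list                      */
} step_workspace;

typedef struct {
    int thread_id;
    step_workspace *steps;  /* one workspace per step                       */
    /* ... per-thread scan state and statistics ... */
} gc_thread;
```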