path: root/rts/RaiseAsync.c
Commit message · Author · Date · Files · Lines (-/+)

* rts: Implement concurrent collection in the nonmoving collector (Ben Gamari, 2019-10-20; 1 file, -1/+1)

  This extends the non-moving collector to allow concurrent collection.

  The full design of the collector implemented here is described in detail
  in a technical note:

      B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell
      Compiler" (2018)

  This extension involves the introduction of a capability-local
  remembered set, known as the /update remembered set/, which tracks
  objects which may no longer be visible to the collector due to mutation.
  To maintain this remembered set we introduce a write barrier on
  mutations which is enabled while a concurrent mark is underway.

  The update remembered set representation is similar to that of the
  nonmoving mark queue, being a chunked array of `MarkEntry`s. Each
  `Capability` maintains a single accumulator chunk, which it flushes when
  (a) it is filled, or (b) the nonmoving collector enters its post-mark
  synchronization phase.

  While the write barrier touches a significant amount of code it is
  conceptually straightforward: the mutator must ensure that the referee
  of any pointer it overwrites is added to the update remembered set.
  However, there are a few details:

   * In the case of objects with a dirty flag (e.g. `MVar`s) we can
     exploit the fact that only the *first* mutation requires a write
     barrier (sketched below).

   * Weak references, as usual, complicate things. In particular, we must
     ensure that the referee of a weak object is marked if dereferenced by
     the mutator. For this we (unfortunately) must introduce a read
     barrier, as described in Note [Concurrent read barrier on deRefWeak#]
     (in `NonMovingMark.c`).

   * Stable names are also a bit tricky, as described in Note [Sweeping
     stable names in the concurrent collector] (`NonMovingSweep.c`).

  We take quite some pains to ensure that the high thread count often seen
  in parallel Haskell applications doesn't affect pause times. To this end
  we allow thread stacks to be marked either by the thread itself (when it
  is executed or stack-underflows) or the concurrent mark thread (if the
  thread owning the stack is never scheduled). There is a non-trivial
  handshake to ensure that this happens without racing, which is described
  in Note [StgStack dirtiness flags and concurrent marking].

  Co-Authored-by: Ömer Sinan Ağacan <omer@well-typed.com>

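  [Editor's note] As a rough illustration of the dirty-flag optimisation
  mentioned above, a write barrier on an MVar-like object might look as
  follows. This is a minimal sketch assuming the RTS helper
  updateRemembSetPushClosure and the clean/dirty MVar info tables; it is
  not the exact code from NonMovingMark.c:

      // Sketch: push the old referee before the *first* mutation of a
      // clean object; later mutations of a dirty object need no barrier.
      void updateMVarValue(Capability *cap, StgMVar *mvar,
                           StgClosure *new_value)
      {
          if (mvar->header.info == &stg_MVAR_CLEAN_info) {
              // First mutation since the object was last cleaned: the old
              // value may become invisible to the mark, so remember it.
              updateRemembSetPushClosure(cap, mvar->value);
              SET_INFO((StgClosure *)mvar, &stg_MVAR_DIRTY_info);
          }
          mvar->value = new_value;
      }
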
* Correct closure observation, construction, and mutation on weak memory machines (Travis Whitaker, 2019-06-28; 1 file, -0/+1)

  Here the following changes are introduced:

   - A read barrier machine op is added to Cmm.
   - The order in which a closure's fields are read and written is
     changed.
   - Memory barriers are added to RTS code to ensure correctness on
     out-of-order machines with weak memory ordering.

  Cmm has a new CallishMachOp called MO_ReadBarrier. On weak memory
  machines, this is lowered to an instruction that ensures memory reads
  that occur after said instruction in program order are not performed
  before reads coming before said instruction in program order. On
  machines with strong memory ordering properties (e.g. X86, SPARC in TSO
  mode) no such instruction is necessary, so MO_ReadBarrier is simply
  erased. However, such an instruction is necessary on weakly ordered
  machines, e.g. ARM and PowerPC.

  Weak memory ordering has consequences for how closures are observed and
  mutated. For example, consider a closure that needs to be updated to an
  indirection. In order for the indirection to be safe for concurrent
  observers to enter, said observers must read the indirection's info
  table before they read the indirectee. Furthermore, the entering
  observer makes assumptions about the closure based on its info table
  contents, e.g. an INFO_TYPE of IND implies the closure has an indirectee
  pointer that is safe to follow.

  When a closure is updated with an indirection, both its info table and
  its indirectee must be written. With weak memory ordering, these two
  writes can be arbitrarily reordered, and perhaps even interleaved with
  other threads' reads and writes (in the absence of memory barrier
  instructions). Consider this example of a bad reordering:

   - An updater writes to a closure's info table (INFO_TYPE is now IND).
   - A concurrent observer branches upon reading the closure's INFO_TYPE
     as IND.
   - A concurrent observer reads the closure's indirectee and enters
     it. (!!!)
   - An updater writes the closure's indirectee.

  Here the update to the indirectee comes too late and the concurrent
  observer has jumped off into the abyss. Speculative execution can also
  cause us issues; consider:

   - An observer is about to case on a value in closure's info table.
   - The observer speculatively reads one or more of closure's fields.
   - An updater writes to closure's info table.
   - The observer takes a branch based on the new info table value, but
     with the old closure fields!
   - The updater writes to the closure's other fields, but it's too late.

  Because of these effects, reads and writes to a closure's info table
  must be ordered carefully with respect to reads and writes to the
  closure's other fields, and memory barriers must be placed to ensure
  that reads and writes occur in program order. Specifically, updates to a
  closure must follow the following pattern:

   - Update the closure's (non-info table) fields.
   - Write barrier.
   - Update the closure's info table.

  Observing a closure's fields must follow the following pattern
  (a C rendering of both patterns is sketched after this entry):

   - Read the closure's info pointer.
   - Read barrier.
   - Read the closure's (non-info table) fields.

  This patch updates RTS code to obey this pattern. This should fix
  long-standing SMP bugs on ARM (specifically newer aarch64
  microarchitectures supporting out-of-order execution) and PowerPC. This
  fixes issue #15449.

  Co-Authored-By: Ben Gamari <ben@well-typed.com>

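  [Editor's note] In C, the two disciplines above might be rendered as
  follows. A minimal sketch assuming the RTS barrier primitives
  write_barrier() and load_load_barrier() and simplified field access; it
  is not the patch's actual code:

      // Updating: write the fields, fence, then publish the info pointer.
      static void updateWithIndirection(StgClosure *p, StgClosure *ind)
      {
          ((StgInd *)p)->indirectee = ind;   // non-info fields first
          write_barrier();                   // order fields before info
          SET_INFO(p, &stg_BLACKHOLE_info);  // publishing write comes last
      }

      // Observing: read the info pointer, fence, then read the fields.
      static StgClosure *followIndirection(StgClosure *p)
      {
          const StgInfoTable *info = GET_INFO(p);
          load_load_barrier();               // order info before fields
          // Only now is it safe to trust p's other fields.
          return ((StgInd *)p)->indirectee;
      }
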
* Fix raiseAsync() UNDERFLOW_FRAME handling in profiling runtime (Ömer Sinan Ağacan, 2019-01-12; 1 file, -7/+6)

  UNDERFLOW_FRAMEs don't have profiling headers, so we have to use the
  AP_STACK's function's CCS as the new frame's CCS.

  Fixes one of the many bugs caught by concprog001 (#15508).

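  [Editor's note] Concretely, the fix plausibly amounts to a line like the
  following in the loop that builds an AP_STACK from stack frames. A
  sketch only, assuming `ap` is the StgAP_STACK under construction; the
  field spelling models the RTS profiling header rather than the exact
  diff:

      #if defined(PROFILING)
          // An UNDERFLOW_FRAME carries no CCS of its own, so inherit the
          // cost-centre stack of the AP_STACK's function instead.
          ap->header.prof.ccs = ap->fun->header.prof.ccs;
      #endif
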
* rts: Add FALLTHROUGH macro (Ben Gamari, 2018-11-02; 1 file, -1/+1)

  Instead of using the GCC `/* fallthrough */` syntax we now use
  `__attribute__((fallthrough))`, which Phyx says should be more portable
  than the former. Also adds a missing fallthrough annotation in the MachO
  linker, fixing #14613.

  Reviewers: erikd, simonmar

  Reviewed By: simonmar

  Subscribers: rwbarton, carter

  GHC Trac Issues: #14613

  Differential Revision: https://phabricator.haskell.org/D5292

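  [Editor's note] A plausible shape for such a macro, as a sketch (GHC's
  actual definition lives in an RTS header and guards on more compilers):

      #if defined(__GNUC__) && __GNUC__ >= 7
      #define FALLTHROUGH __attribute__((fallthrough))
      #else
      #define FALLTHROUGH ((void)0)  /* no-op where unsupported */
      #endif

  A `FALLTHROUGH;` statement at the end of a case label then silences
  -Wimplicit-fallthrough while documenting that the fall-through is
  intentional.
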
* Fix deadlock between STM and throwTo (Simon Marlow, 2018-07-12; 1 file, -9/+0)

  There was a lock-order reversal between lockTSO() and the TVar lock;
  see #15136 for the details.

  It turns out we can fix this pretty easily by just deleting all the
  locking code(!). The principle for unblocking a `BlockedOnSTM` thread
  then becomes the same as for other kinds of blocking: if the TSO belongs
  to this capability then we do it directly, otherwise we send a message
  to the capability that owns the TSO. That is, a thread blocked on STM is
  owned by its capability, as it should be.

  The possible downside of this is that we might send multiple messages to
  wake up a thread when the thread is on another capability. This is safe,
  it's just not very efficient. I'll try to do some experiments to see if
  this is a problem.

  Test Plan: Test case from #15136 doesn't deadlock any more.

  Reviewers: bgamari, osa1, erikd

  Reviewed By: osa1

  Subscribers: rwbarton, thomie, carter

  GHC Trac Issues: #15136

  Differential Revision: https://phabricator.haskell.org/D4956

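  [Editor's note] The ownership principle might look roughly like this in
  C. A sketch with an assumed helper name (throwToBlockedOnSTM is
  hypothetical; raiseAsync and sendMessage model real RTS functions, but
  the dispatch is simplified):

      // Only the owning capability mutates a TSO; everyone else asks it
      // to do so by message. Redundant wakeups are harmless, just wasteful.
      static void throwToBlockedOnSTM(Capability *cap, MessageThrowTo *msg,
                                      StgTSO *target)
      {
          if (target->cap == cap) {
              // We own the TSO: raise the exception directly.
              raiseAsync(cap, target, msg->exception, false, NULL);
          } else {
              // Forward to the owning capability and let it do the raise.
              sendMessage(cap, target->cap, (Message *)msg);
          }
      }
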
* Fix note references and some typos (Gabor Greif, 2017-07-26; 1 file, -1/+1)

* Eagerly blackhole AP_STACKs (Ben Gamari, 2017-07-03; 1 file, -0/+1)

  This fixes #13615. See the rather lengthy Note [AP_STACKs must be
  eagerly blackholed] for details.

  Reviewers: simonmar, austin, erikd, dfeuer

  Subscribers: duog, dfeuer, hsyl20, rwbarton, thomie

  GHC Trac Issues: #13615

  Differential Revision: https://phabricator.haskell.org/D3695

* rts: annotate switch/case with '/* fallthrough */' (Sergei Trofimovich, 2017-05-14; 1 file, -0/+1)

  Fixes gcc-7.1.0 warnings of the form:

      rts/sm/Scav.c:559:9: error:
          error: this statement may fall through [-Werror=implicit-fallthrough=]
          scavenge_fun_srt(info);
          ^~~~~~~~~~~~~~~~~~~~~~

  Many of the places are indeed unobvious and some are already annotated
  by comments.

  Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>

* Prefer #if defined to #ifdef (Ben Gamari, 2017-04-28; 1 file, -4/+4)

  Our new CPP linter enforces this.

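  [Editor's note] Beyond linting uniformity, the `defined` form composes
  with boolean operators where plain #ifdef cannot; a sketch (the guarded
  declaration is hypothetical):

      /* Preferred: explicit, and composable in a single directive. */
      #if defined(THREADED_RTS) && defined(PROFILING)
      void dumpThreadCCS(StgTSO *tso);   /* hypothetical declaration */
      #endif

  The #ifdef spelling would need two nested directives to express the same
  conjunction.
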
* Spelling only [ci skip] (Gabor Greif, 2017-02-23; 1 file, -1/+1)

* Use C99's bool (Ben Gamari, 2016-11-29; 1 file, -11/+11)

  Test Plan: Validate on lots of platforms

  Reviewers: erikd, simonmar, austin

  Reviewed By: erikd, simonmar

  Subscribers: michalt, thomie

  Differential Revision: https://phabricator.haskell.org/D2699

* rts: More const-correctness fixes (Erik de Castro Lopo, 2016-05-18; 1 file, -1/+1)

  In addition to more const-correctness fixes, this patch fixes an
  infelicity of the previous const-correctness patch (995cf0f356), which
  left `UNTAG_CLOSURE` taking a `const StgClosure` pointer parameter but
  returning a non-const pointer. Here we restore the original type
  signature of `UNTAG_CLOSURE` and add a new function
  `UNTAG_CONST_CLOSURE`, which takes and returns a const `StgClosure`
  pointer and is used wherever possible.

  Test Plan: Validate on Linux, OS X and Windows

  Reviewers: Phyx, hsyl20, bgamari, austin, simonmar, trofi

  Reviewed By: simonmar, trofi

  Subscribers: thomie

  Differential Revision: https://phabricator.haskell.org/D2231

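  [Editor's note] The resulting pair plausibly looks like this; a sketch
  modelled on the RTS's closure-tagging macros (TAG_MASK masks off the
  pointer-tag bits):

      INLINE_HEADER StgClosure *UNTAG_CLOSURE(StgClosure *p)
      {
          return (StgClosure *)((StgWord)p & ~TAG_MASK);
      }

      INLINE_HEADER const StgClosure *UNTAG_CONST_CLOSURE(const StgClosure *p)
      {
          // Same bit-twiddling, but const in and const out, so read-only
          // callers need no cast that would discard the qualifier.
          return (const StgClosure *)((StgWord)p & ~TAG_MASK);
      }
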
* rts: Replace `nat` with `uint32_t` (Erik de Castro Lopo, 2016-05-05; 1 file, -6/+6)

  The `nat` type was an alias for `unsigned int` with a comment saying it
  was at least 32 bits. We keep the typedef in case client code is using
  it, but mark it as deprecated.

  Test Plan: Validated on Linux, OS X and Windows

  Reviewers: simonmar, austin, thomie, hvr, bgamari, hsyl20

  Differential Revision: https://phabricator.haskell.org/D2166

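  [Editor's note] The compatibility shim presumably amounts to a
  deprecated alias along these lines (a sketch; the exact attribute
  spelling in the RTS header may differ):

      #include <stdint.h>

      /* Deprecated: use uint32_t directly. Kept so that client code still
       * compiles, but with a deprecation warning on use. */
      typedef uint32_t nat __attribute__((deprecated));
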
* rts: mark 'blockedThrowTo' as static (Sergei Trofimovich, 2016-02-07; 1 file, -0/+3)

  Noticed by uselex.rb:

      blockedThrowTo: [R]: exported from: ./rts/dist/build/RaiseAsync.o

  Signed-off-by: Sergei Trofimovich <siarheit@google.com>

* fix EBADF unqueueing in select backend (Trac #10590) (Sergei Trofimovich, 2015-07-07; 1 file, -7/+1)

  Alexander found an interesting case:

   1. We have a queue of two waiters in a blocked_queue.
   2. The first file descriptor changes state to RUNNABLE, the second to
      INVALID.
   3. The awaitEvent function dequeued the RUNNABLE thread to a run queue
      and attempted to dequeue the INVALID descriptor to a run queue.

  Unqueueing INVALID fails thusly:

      #3 0x000000000045cf1c in barf (s=0x4c1cb0 "removeThreadFromDeQueue: not found")
         at rts/RtsMessages.c:42
      #4 0x000000000046848b in removeThreadFromDeQueue (...) at rts/Threads.c:249
      #5 0x000000000049a120 in removeFromQueues (...) at rts/RaiseAsync.c:719
      #6 0x0000000000499502 in throwToSingleThreaded__ (...) at rts/RaiseAsync.c:67
      #7 0x0000000000499555 in throwToSingleThreaded (..) at rts/RaiseAsync.c:75
      #8 0x000000000047c27d in awaitEvent (wait=rtsFalse) at rts/posix/Select.c:415

  The problem here is the throwToSingleThreaded function, which tries to
  unqueue a TSO from blocked_queue, but the awaitEvent function leaves
  blocked_queue in an inconsistent state while it traverses over
  blocked_queue:

      case RTS_FD_IS_READY:
          IF_DEBUG(scheduler,
              debugBelch("Waking up blocked thread %lu\n",
                         (unsigned long)tso->id));
          tso->why_blocked = NotBlocked;
          tso->_link = END_TSO_QUEUE; // Here we break the queue head
          pushOnRunQueue(&MainCapability,tso);
          break;

  Signed-off-by: Sergei Trofimovich <siarheit@google.com>

  Test Plan: tested on a sample from T10590

  Reviewers: austin, bgamari, simonmar

  Reviewed By: bgamari, simonmar

  Subscribers: qnikst, thomie, bgamari

  Differential Revision: https://phabricator.haskell.org/D1024

  GHC Trac Issues: #10590, #4934

* Per-thread allocation counters and limits (Simon Marlow, 2014-11-12; 1 file, -0/+54)

  This reverts commit f0fcc41d755876a1b02d1c7c79f57515059f6417.

  New changes: now works on 32-bit platforms too. I added some basic
  support for 64-bit subtraction and comparison operations to the x86 NCG.

* [skip ci] rts: Detabify RaiseAsync.c (Austin Seipp, 2014-10-21; 1 file, -229/+227)

  Signed-off-by: Austin Seipp <austin@well-typed.com>

* Revert "Rename _closure to _static_closure, apply naming consistently."Edward Z. Yang2014-10-201-2/+2
| | | | | | | This reverts commit 35672072b4091d6f0031417bc160c568f22d0469. Conflicts: compiler/main/DriverPipeline.hs
* Rename _closure to _static_closure, apply naming consistently. (Edward Z. Yang, 2014-10-01; 1 file, -2/+2)

  Summary: In preparation for indirecting all references to closures, we
  rename _closure to _static_closure to ensure any old code will get an
  undefined symbol error. In order to reference a closure foobar_closure
  (which is now undefined), you should instead use STATIC_CLOSURE(foobar).
  For convenience, a number of these old identifiers are macro'd.

  Across C-- and C (Windows and otherwise), there were differing
  conventions on whether or not foobar_closure or &foobar_closure was the
  address of the closure. Now, all foobar_closure references are
  addresses, and no & is necessary.

  CHARLIKE/INTLIKE were not changed, simply alpha-renamed.

  Part of remove HEAP_ALLOCED patch set (#8199)

  Depends on D265

  Signed-off-by: Edward Z. Yang <ezyang@mit.edu>

  Test Plan: validate

  Reviewers: simonmar, austin

  Subscribers: simonmar, ezyang, carter, thomie

  Differential Revision: https://phabricator.haskell.org/D267

  GHC Trac Issues: #8199

* Revert "rts: add Emacs 'Local Variables' to every .c file"Simon Marlow2014-09-291-8/+0
| | | | This reverts commit 39b5c1cbd8950755de400933cecca7b8deb4ffcd.
* rts: add Emacs 'Local Variables' to every .c file (Austin Seipp, 2014-07-28; 1 file, -0/+8)

  This will hopefully help ensure some basic consistency going forward by
  overriding buffer variables. In particular, it sets the wrap length, the
  offset to 4, and turns off tabs.

  Signed-off-by: Austin Seipp <austin@well-typed.com>

* Revert "Per-thread allocation counters and limits"Simon Marlow2014-05-041-54/+0
| | | | | | | | Problems were found on 32-bit platforms, I'll commit again when I have a fix. This reverts the following commits: 54b31f744848da872c7c6366dea840748e01b5cf b0534f78a73f972e279eed4447a5687bd6a8308e
* Per-thread allocation counters and limits (Simon Marlow, 2014-05-02; 1 file, -0/+54)

  This tracks the amount of memory allocation by each thread in a counter
  stored in the TSO. Optionally, when the counter drops below zero (it
  counts down), the thread can be sent an asynchronous exception:
  AllocationLimitExceeded. When this happens, the thread is given a small
  additional limit so that it can handle the exception. See documentation
  in GHC.Conc for more details.

  Allocation limits are similar to timeouts, but

   - timeouts use real time, not CPU time. Allocation limits do not count
     anything while the thread is blocked or in foreign code.

   - timeouts don't re-trigger if the thread catches the exception;
     allocation limits do.

   - timeouts can catch non-allocating loops, if you use -fno-omit-yields.
     This doesn't work for allocation limits.

  I couldn't measure any impact on benchmarks with these changes, even for
  nofib/smp.

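  [Editor's note] The delivery check plausibly looks like the following.
  A sketch only: REFILL and the exception closure name are hypothetical
  stand-ins, while tso->alloc_limit, the TSO_ALLOC_LIMIT flag, and
  throwToSelf model real RTS pieces:

      static void checkAllocationLimit(Capability *cap, StgTSO *tso)
      {
          // The counter counts down; crossing zero means the limit is hit.
          if ((tso->flags & TSO_ALLOC_LIMIT) && tso->alloc_limit <= 0) {
              // Hypothetical headroom so the handler can itself allocate.
              tso->alloc_limit = REFILL;
              throwToSelf(cap, tso, allocationLimitExceeded_closure);
          }
      }
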
* s/excpetions/exceptions/ (Edward Z. Yang, 2013-10-21; 1 file, -1/+1)

  Signed-off-by: Edward Z. Yang <ezyang@mit.edu>

* If exceptions are blocked, add stack overflow to blocked exceptions list. Fixes #8303. (Edward Z. Yang, 2013-10-11; 1 file, -4/+1)

  Signed-off-by: Edward Z. Yang <ezyang@mit.edu>

* Implement atomicReadMVar, fixing #4001. (Edward Z. Yang, 2013-07-09; 1 file, -1/+3)

  We add the invariant to the MVar blocked threads queue that threads
  blocked on an atomic read are always at the front of the queue. This
  invariant is easy to maintain, since takers are only ever added to the
  end of the queue.

  Signed-off-by: Edward Z. Yang <ezyang@mit.edu>

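  [Editor's note] As a toy model of that invariant, with self-contained
  types deliberately simpler than the RTS's real MVar queue structures:

      typedef struct Waiter_ {
          struct Waiter_ *link;
      } Waiter;

      typedef struct { Waiter *head, *tail; } Queue;

      // Readers go on the front: every pending reader can be satisfied by
      // the current value, so they must wake before a taker removes it.
      void enqueueReader(Queue *q, Waiter *w) {
          w->link = q->head;
          q->head = w;
          if (q->tail == NULL) q->tail = w;
      }

      // Takers go on the back, as before, which preserves the invariant.
      void enqueueTaker(Queue *q, Waiter *w) {
          w->link = NULL;
          if (q->tail != NULL) q->tail->link = w; else q->head = w;
          q->tail = w;
      }
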
* ticky enhancements (Nicolas Frisby, 2013-03-29; 1 file, -3/+3)

   * the new StgCmmArgRep module breaks a dependency cycle; I also
     untabified it, but made no real changes

   * updated the documentation in the wiki and changed the user guide to
     point there

   * moved the allocation enters for ticky and CCS to after the heap check

     * I left LDV where it was, which was before the heap check at least
       once, since I have no idea what it is

   * standardized all (active?) ticky alloc totals to bytes

   * in order to avoid double counting, StgCmmLayout.adjustHpBackwards no
     longer bumps ALLOC_HEAP_ctr

   * I resurrected the SLOW_CALL counters

     * the new module StgCmmArgRep breaks the cyclic dependency between
       Layout and Ticky (which the SLOW_CALL counters cause)

     * renamed them SLOW_CALL_fast_<pattern> and VERY_SLOW_CALL

   * added ALLOC_RTS_ctr and _tot ticky counters

     * eg allocation by Storage.c:allocate or a BUILD_PAP in stg_ap_*_info

   * resurrected ticky counters for ALLOC_THK, ALLOC_PAP, and ALLOC_PRIM

   * added -ticky and -DTICKY_TICKY in ways.mk for debug ways

   * added a ticky counter for total LNE entries

   * new flags for ticky: -ticky-allocd -ticky-dyn-thunk -ticky-LNE

     * all off by default

     * -ticky-allocd: tracks allocation *of* closure in addition to
       allocation *by* that closure

     * -ticky-dyn-thunk: tracks dynamic thunks as if they were functions

     * -ticky-LNE: tracks LNEs as if they were functions

   * updated the ticky report format, including making the argument
     categories (more?) accurate again

   * the printed name for things in the report includes the unique of
     their ticky parent as well as if they are not top-level

* Produce new-style Cmm from the Cmm parser (Simon Marlow, 2012-10-08; 1 file, -1/+1)

  The main change here is that the Cmm parser now allows high-level cmm
  code with argument-passing and function calls. For example:

      foo ( gcptr a, bits32 b )
      {
          if (b > 0) {
              // we can make tail calls passing arguments:
              jump stg_ap_0_fast(a);
          }
          return (x,y);
      }

  More details on the new cmm syntax are in Note [Syntax of .cmm files] in
  CmmParse.y.

  The old syntax is still more-or-less supported for those occasional code
  fragments that really need to explicitly manipulate the stack. However
  there are a couple of differences: it is now obligatory to give a list
  of live GlobalRegs on every jump, e.g.

      jump %ENTRY_CODE(Sp(0)) [R1];

  Again, more details in Note [Syntax of .cmm files].

  I have rewritten most of the .cmm files in the RTS into the new syntax,
  except for AutoApply.cmm which is generated by the genapply program:
  this file could be generated in the new syntax instead and would
  probably be better off for it, but I ran out of enthusiasm.

  Some other changes in this batch:

   - The PrimOp calling convention is gone, primops now use the ordinary
     NativeNodeCall convention. This means that primops and "foreign
     import prim" code must be written in high-level cmm, but they can now
     take more than 10 arguments.

   - CmmSink now does constant-folding (should fix #7219)

   - .cmm files now go through the cmmPipeline, and as a result we
     generate better code in many cases. All the object files generated
     for the RTS .cmm files are now smaller. Performance should be better
     too, but I haven't measured it yet.

   - RET_DYN frames are removed from the RTS, lots of code goes away

   - we now have some more canned GC points to cover unboxed-tuples with
     2-4 pointers, which will reduce code size a little.

* Make a function for get_itbl, rather than using a CPP macro (Ian Lynagh, 2012-08-25; 1 file, -1/+1)

  This has several advantages:

   * It can be called from gdb

   * There is more type information for the user, and type checking for
     the compiler

   * Less opportunity for things to go wrong, e.g. due to missing
     parentheses or repeated execution

  The sizes of the non-debug .o files haven't changed (other than
  Inlines.o), so I'm pretty sure the compiled code is identical.

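  [Editor's note] The resulting definition is presumably along these
  lines; a sketch modelled on the RTS's ClosureMacros.h:

      INLINE_HEADER const StgInfoTable *get_itbl(const StgClosure *c)
      {
          // A real function: callable from gdb, type-checked, and the
          // argument is evaluated exactly once, unlike a CPP macro.
          return INFO_PTR_TO_STRUCT(c->header.info);
      }
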
* throwTo: unlock the MSG_THROWTO object before returning (#6103) (Simon Marlow, 2012-06-07; 1 file, -2/+8)

* raiseAsync: cope with ATOMICALLY_FRAMEs inside UPDATE_FRAMEs (#5866) (Simon Marlow, 2012-02-27; 1 file, -11/+56)

* Fix crash with +RTS -xc (occasional cgrun057(profthreaded) failure) (Simon Marlow, 2012-01-06; 1 file, -1/+1)

  Don't try to print a stack trace from raiseAsync() when there's no
  exception - we might just be deleting the thread, or suspending
  duplicate work.

* Rename the CCCS field of StgTSO so as not to conflict with the CCCS pseudo-register (Simon Marlow, 2012-01-05; 1 file, -1/+1)

  Needed by #5357

* +RTS -xc: print the closure type of the exception too (Simon Marlow, 2011-11-14; 1 file, -1/+1)

* Overhaul of infrastructure for profiling, coverage (HPC) and breakpoints (Simon Marlow, 2011-11-02; 1 file, -1/+1)

  User visible changes
  ====================

  Profiling
  ---------

  Flags renamed (the old ones are still accepted for now):

      OLD         NEW
      ---------   ---------------
      -auto-all   -fprof-auto
      -auto       -fprof-exported
      -caf-all    -fprof-cafs

  New flags:

      -fprof-auto              Annotates all bindings (not just top-level
                               ones) with SCCs
      -fprof-top               Annotates just top-level bindings with SCCs
      -fprof-exported          Annotates just exported bindings with SCCs
      -fprof-no-count-entries  Do not maintain entry counts when profiling
                               (can make profiled code go faster; useful
                               with heap profiling where entry counts are
                               not used)

  Cost-centre stacks have a new semantics, which should in most cases
  result in more useful and intuitive profiles. If you find this not to be
  the case, please let me know. This is the area where I have been
  experimenting most, and the current solution is probably not the final
  version, however it does address all the outstanding bugs and seems to
  be better than GHC 7.2.

  Stack traces
  ------------

  +RTS -xc now gives more information. If the exception originates from a
  CAF (as is common, because GHC tends to lift exceptions out to the
  top-level), then the RTS walks up the stack and reports the stack in the
  enclosing update frame(s).

  Result: +RTS -xc is much more useful now - but you still have to compile
  for profiling to get it. I've played around a little with adding
  'head []' to GHC itself, and +RTS -xc does pinpoint the problem quite
  accurately. I plan to add more facilities for stack tracing (e.g. in
  GHCi) in the future.

  Coverage (HPC)
  --------------

   * derived instances are now coloured yellow if they weren't used
   * likewise record field names
   * entry counts are more accurate (hpc --fun-entry-count)
   * tab width is now correct (markup was previously off in source with
     tabs)

  Internal changes
  ================

  In Core, the Note constructor has been replaced by

      Tick (Tickish b) (Expr b)

  which is used to represent all the kinds of source annotation we
  support: profiling SCCs, HPC ticks, and GHCi breakpoints. Depending on
  the properties of the Tickish, different transformations apply to Tick.
  See CoreUtils.mkTick for details.

  Tickets
  =======

  This commit closes the following tickets, test cases to follow:

   - Close #2552: not a bug, but the behaviour is now more intuitive
     (test is T2552)
   - Close #680 (test is T680)
   - Close #1531 (test is result001)
   - Close #949 (test is T949)
   - Close #2466: test case has bitrotted (doesn't compile against
     current version of vector-space package)

* GC refactoring and cleanup (Simon Marlow, 2011-02-02; 1 file, -3/+3)

  Now we keep any partially-full blocks in the gc_thread[] structs after
  each GC, rather than moving them to the generation. This should give us
  slightly better locality (though I wasn't able to measure any
  difference).

  Also in this patch: better sanity checking with THREADED.

* Implement stack chunks and separate TSO/STACK objects (Simon Marlow, 2010-12-15; 1 file, -55/+90)

  This patch makes two changes to the way stacks are managed:

  1. The stack is now stored in a separate object from the TSO.

     This means that it is easier to replace the stack object for a thread
     when the stack overflows or underflows; we don't have to leave behind
     the old TSO as an indirection any more. Consequently, we can remove
     ThreadRelocated and deRefTSO(), which were a pain.

     This is obviously the right thing, but the last time I tried to do it
     it made performance worse. This time I seem to have cracked it.

  2. Stacks are now represented as a chain of chunks, rather than a single
     monolithic object.

     The big advantage here is that individual chunks are marked clean or
     dirty according to whether they contain pointers to the young
     generation, and the GC can avoid traversing clean stack chunks during
     a young-generation collection. This means that programs with deep
     stacks will see a big saving in GC overhead when using the default GC
     settings.

     A secondary advantage is that there is much less copying involved as
     the stack grows. Programs that quickly grow a deep stack will see big
     improvements.

     In some ways the implementation is simpler, as nothing special needs
     to be done to reclaim stack as the stack shrinks (the GC just
     recovers the dead stack chunks). On the other hand, we have to manage
     stack underflow between chunks, so there's a new stack frame
     (UNDERFLOW_FRAME), and we now have separate TSO and STACK objects.
     The total amount of code is probably about the same as before.

  There are new RTS flags:

      -ki<size>  Sets the initial thread stack size (default 1k)
                 Egs: -ki4k -ki2m
      -kc<size>  Sets the stack chunk size (default 32k)
      -kb<size>  Sets the stack chunk buffer size (default 1k)

  -ki was previously called just -k, and the old name is still accepted
  for backwards compatibility. These new options are documented.

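  [Editor's note] A simplified picture of the split, as a sketch (the
  field layout follows the spirit of the RTS's TSO.h, not its exact
  contents):

      // One stack chunk; a thread's stack is a chain of these.
      typedef struct StgStack_ {
          StgHeader  header;
          StgWord32  stack_size;  // chunk size in words
          StgWord8   dirty;       // may this chunk point into the young gen?
          StgPtr     sp;          // stack pointer within this chunk
          StgWord    stack[];     // the stack itself
      } StgStack;

      // The TSO now just points at its current chunk; older chunks are
      // reached via the UNDERFLOW_FRAME at the bottom of each chunk.
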
* throwTo: report the why_blocked value in the barf() (Simon Marlow, 2010-12-03; 1 file, -1/+1)

* handle ThreadMigrating in throwTo() (#4811) (Simon Marlow, 2010-12-03; 1 file, -0/+12)

  If a throwTo targets a thread that has just been created with forkOnIO,
  then it is possible the exception strikes while the thread is still in
  the process of migrating. throwTo() didn't handle this case, but it's
  fairly straightforward.

* minor refactoring (Simon Marlow, 2010-09-26; 1 file, -21/+19)

* Fix for interruptible FFI handling (Simon Marlow, 2010-09-25; 1 file, -8/+0)

  Set tso->why_blocked before calling maybePerformBlockedException(), so
  that throwToSingleThreaded() doesn't try to unblock the current thread
  (it is already unblocked).

* Don't interrupt when task blocks exceptions, don't immediately start exception. (Edward Z. Yang, 2010-09-25; 1 file, -3/+13)

* Interruptible FFI calls with pthread_kill and CancelSynchronousIO. v4 (Edward Z. Yang, 2010-09-19; 1 file, -2/+23)

  This is a patch that adds support for interruptible FFI calls in the
  form of a new foreign import keyword 'interruptible', which can be used
  instead of 'safe' or 'unsafe'. Interruptible FFI calls act like safe FFI
  calls, except that the worker thread they run on may be interrupted.

  Internally, it replaces BlockedOnCCall_NoUnblockEx with
  BlockedOnCCall_Interruptible, and changes the behavior of the RTS to not
  modify the TSO_ flags on the event of an FFI call from a thread that was
  interruptible. It also modifies the bytecode format for foreign call,
  adding an extra Word16 to indicate interruptibility.

  The semantics of interruption vary from platform to platform, but the
  intent is that any blocking system calls are aborted with an error code.
  This is most useful for making function calls to system library
  functions that support interrupting. There is no support for pre-Vista
  Windows.

  There is a partner testsuite patch which adds several tests for this
  functionality.

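  [Editor's note] On POSIX systems, interrupting the worker plausibly
  reduces to a pthread_kill, as in this sketch (the function name is
  hypothetical; the commit title itself names pthread_kill as the
  mechanism):

      #include <pthread.h>
      #include <signal.h>

      // Poke the worker OS thread so a blocking syscall fails with EINTR.
      // The signal's handler must be a no-op for this to be benign.
      static void interruptWorker(pthread_t worker)
      {
          pthread_kill(worker, SIGPIPE);
      }
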
* New asynchronous exception control API (ghc parts) (Simon Marlow, 2010-07-08; 1 file, -3/+6)

  As discussed on the libraries/haskell-cafe mailing lists:

      http://www.haskell.org/pipermail/libraries/2010-April/013420.html

  This is a replacement for block/unblock in the asynchronous exceptions
  API to fix a problem whereby a function could unblock asynchronous
  exceptions even if called within a blocked context.

  The new terminology is "mask" rather than "block" (to avoid confusion
  due to overloaded meanings of the latter).

  In GHC, we changed the names of some primops:

      blockAsyncExceptions#   -> maskAsyncExceptions#
      unblockAsyncExceptions# -> unmaskAsyncExceptions#
      asyncExceptionsBlocked# -> getMaskingState#

  and added one new primop:

      maskUninterruptible#

  See the accompanying patch to libraries/base for the API changes.

* Don't raise a throwTo when the target is masking and BlockedOnBlackHole (Simon Marlow, 2010-05-05; 1 file, -8/+14)

* Fix for derefing ThreadRelocated TSOs in MVar operations (Simon Marlow, 2010-04-07; 1 file, -2/+2)

* Change the representation of the MVar blocked queue (Simon Marlow, 2010-04-01; 1 file, -78/+63)

  The list of threads blocked on an MVar is now represented as a list of
  separately allocated objects rather than being linked through the TSOs
  themselves. This lets us remove a TSO from the list in O(1) time rather
  than O(n) time, by marking the list object. Removing this linear
  component fixes some pathological performance cases where many threads
  were blocked on an MVar and became unreachable simultaneously
  (nofib/smp/threads007), or when sending an asynchronous exception to a
  TSO in a long list of threads blocked on an MVar. MVar performance has
  actually improved by a few percent as a result of this change, slightly
  to my surprise.

  This is the final cleanup in the sequence, which let me remove the old
  way of waking up threads (unblockOne(), MSG_WAKEUP) in favour of the new
  way (tryWakeupThread and MSG_TRY_WAKEUP, which is idempotent). It is now
  the case that only the Capability that owns a TSO may modify its state
  (well, almost), and this simplifies various things. More of the RTS is
  based on message-passing between Capabilities now.

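  [Editor's note] The separately allocated cell presumably looks something
  like this; a sketch simplified from the RTS's closure definitions:

      typedef struct StgMVarTSOQueue_ {
          StgHeader        header; // overwriting this marks the cell dead,
                                   // which is what makes removal O(1)
          StgClosure      *link;   // next cell in the blocked queue
          struct StgTSO_  *tso;    // the blocked thread
      } StgMVarTSOQueue;
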
* change throwTo to use tryWakeupThread rather than unblockOne (Simon Marlow, 2010-03-29; 1 file, -31/+23)

* New implementation of BLACKHOLEs (Simon Marlow, 2010-03-29; 1 file, -106/+69)

  This replaces the global blackhole_queue with a clever scheme that
  enables us to queue up blocked threads on the closure that they are
  blocked on, while still avoiding atomic instructions in the common case.

  Advantages:

   - gets rid of a locked global data structure and some tricky GC code
     (replacing it with some per-thread data structures and different
     tricky GC code :)

   - wakeups are more prompt: parallel/concurrent performance should
     benefit. I haven't seen anything dramatic in the parallel benchmarks
     so far, but a couple of threading benchmarks do improve a bit.

   - waking up a thread blocked on a blackhole is now O(1) (e.g. if it is
     the target of throwTo).

   - less sharing and better separation of Capabilities: communication is
     done with messages, the data structures are strictly owned by a
     Capability and cannot be modified except by sending messages.

   - this change will ultimately enable us to do more intelligent
     scheduling when threads block on each other. This is what started off
     the whole thing, but it isn't done yet (#3838).

  I'll be documenting all this on the wiki in due course.

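  [Editor's note] The message that queues a blocked thread on a blackhole
  plausibly has this shape; a sketch simplified from the RTS's message
  definitions:

      typedef struct MessageBlackHole_ {
          StgHeader        header;
          struct Message_ *link; // next message in the owner's queue
          struct StgTSO_  *tso;  // the thread waiting for the value
          StgClosure      *bh;   // the blackhole it is blocked on
      } MessageBlackHole;
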
* Fix a couple of bugs in the throwTo handling, exposed by conc016(threaded2) (Simon Marlow, 2010-03-11; 1 file, -8/+11)