summaryrefslogtreecommitdiff
path: root/rts/StgMiscClosures.cmm
Commit message (Collapse)AuthorAgeFilesLines
* Working towards fixing DLLs on Win64Ian Lynagh2012-05-061-3/+3
|
* Soem more Wind64 fixesIan Lynagh2012-03-161-3/+3
| | | | | We may need to do this differently once we get as far as building the RTS in the dyn ways.
* Make profiling work with multiple capabilities (+RTS -N)Simon Marlow2011-11-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | This means that both time and heap profiling work for parallel programs. Main internal changes: - CCCS is no longer a global variable; it is now another pseudo-register in the StgRegTable struct. Thus every Capability has its own CCCS. - There is a new built-in CCS called "IDLE", which records ticks for Capabilities in the idle state. If you profile a single-threaded program with +RTS -N2, you'll see about 50% of time in "IDLE". - There is appropriate locking in rts/Profiling.c to protect the shared cost-centre-stack data structures. This patch does enough to get it working, I have cut one big corner: the cost-centre-stack data structure is still shared amongst all Capabilities, which means that multiple Capabilities will race when updating the "allocations" and "entries" fields of a CCS. Not only does this give unpredictable results, but it runs very slowly due to cache line bouncing. It is strongly recommended that you use -fno-prof-count-entries to disable the "entries" count when profiling parallel programs. (I shall add a note to this effect to the docs).
* Tabs -> SpacesDavid Terei2011-11-221-24/+24
|
* Remove some old comments about the manglerDavid Terei2011-11-221-2/+0
|
* Overhaul of infrastructure for profiling, coverage (HPC) and breakpointsSimon Marlow2011-11-021-3/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | User visible changes ==================== Profilng -------- Flags renamed (the old ones are still accepted for now): OLD NEW --------- ------------ -auto-all -fprof-auto -auto -fprof-exported -caf-all -fprof-cafs New flags: -fprof-auto Annotates all bindings (not just top-level ones) with SCCs -fprof-top Annotates just top-level bindings with SCCs -fprof-exported Annotates just exported bindings with SCCs -fprof-no-count-entries Do not maintain entry counts when profiling (can make profiled code go faster; useful with heap profiling where entry counts are not used) Cost-centre stacks have a new semantics, which should in most cases result in more useful and intuitive profiles. If you find this not to be the case, please let me know. This is the area where I have been experimenting most, and the current solution is probably not the final version, however it does address all the outstanding bugs and seems to be better than GHC 7.2. Stack traces ------------ +RTS -xc now gives more information. If the exception originates from a CAF (as is common, because GHC tends to lift exceptions out to the top-level), then the RTS walks up the stack and reports the stack in the enclosing update frame(s). Result: +RTS -xc is much more useful now - but you still have to compile for profiling to get it. I've played around a little with adding 'head []' to GHC itself, and +RTS -xc does pinpoint the problem quite accurately. I plan to add more facilities for stack tracing (e.g. in GHCi) in the future. Coverage (HPC) -------------- * derived instances are now coloured yellow if they weren't used * likewise record field names * entry counts are more accurate (hpc --fun-entry-count) * tab width is now correct (markup was previously off in source with tabs) Internal changes ================ In Core, the Note constructor has been replaced by Tick (Tickish b) (Expr b) which is used to represent all the kinds of source annotation we support: profiling SCCs, HPC ticks, and GHCi breakpoints. Depending on the properties of the Tickish, different transformations apply to Tick. See CoreUtils.mkTick for details. Tickets ======= This commit closes the following tickets, test cases to follow: - Close #2552: not a bug, but the behaviour is now more intuitive (test is T2552) - Close #680 (test is T680) - Close #1531 (test is result001) - Close #949 (test is T949) - Close #2466: test case has bitrotted (doesn't compile against current version of vector-space package)
* Implement stack chunks and separate TSO/STACK objectsSimon Marlow2010-12-151-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes two changes to the way stacks are managed: 1. The stack is now stored in a separate object from the TSO. This means that it is easier to replace the stack object for a thread when the stack overflows or underflows; we don't have to leave behind the old TSO as an indirection any more. Consequently, we can remove ThreadRelocated and deRefTSO(), which were a pain. This is obviously the right thing, but the last time I tried to do it it made performance worse. This time I seem to have cracked it. 2. Stacks are now represented as a chain of chunks, rather than a single monolithic object. The big advantage here is that individual chunks are marked clean or dirty according to whether they contain pointers to the young generation, and the GC can avoid traversing clean stack chunks during a young-generation collection. This means that programs with deep stacks will see a big saving in GC overhead when using the default GC settings. A secondary advantage is that there is much less copying involved as the stack grows. Programs that quickly grow a deep stack will see big improvements. In some ways the implementation is simpler, as nothing special needs to be done to reclaim stack as the stack shrinks (the GC just recovers the dead stack chunks). On the other hand, we have to manage stack underflow between chunks, so there's a new stack frame (UNDERFLOW_FRAME), and we now have separate TSO and STACK objects. The total amount of code is probably about the same as before. There are new RTS flags: -ki<size> Sets the initial thread stack size (default 1k) Egs: -ki4k -ki2m -kc<size> Sets the stack chunk size (default 32k) -kb<size> Sets the stack chunk buffer size (default 1k) -ki was previously called just -k, and the old name is still accepted for backwards compatibility. These new options are documented.
* Change some TARGET tests to HOST tests in the RTSIan Lynagh2010-07-131-1/+1
| | | | Which was being used seemed to be random
* initialise the headers of MSG_BLACKHOLE objects properlySimon Marlow2010-04-071-1/+1
|
* Remove the IND_OLDGEN and IND_OLDGEN_PERM closure typesSimon Marlow2010-04-011-39/+0
| | | | | | | These are no longer used: once upon a time they used to have different layout from IND and IND_PERM respectively, but that is no longer the case since we changed the remembered set to be an array of addresses instead of a linked list of closures.
* Change the representation of the MVar blocked queueSimon Marlow2010-04-011-2/+7
| | | | | | | | | | | | | | | | | | | | | The list of threads blocked on an MVar is now represented as a list of separately allocated objects rather than being linked through the TSOs themselves. This lets us remove a TSO from the list in O(1) time rather than O(n) time, by marking the list object. Removing this linear component fixes some pathalogical performance cases where many threads were blocked on an MVar and became unreachable simultaneously (nofib/smp/threads007), or when sending an asynchronous exception to a TSO in a long list of thread blocked on an MVar. MVar performance has actually improved by a few percent as a result of this change, slightly to my surprise. This is the final cleanup in the sequence, which let me remove the old way of waking up threads (unblockOne(), MSG_WAKEUP) in favour of the new way (tryWakeupThread and MSG_TRY_WAKEUP, which is idempotent). It is now the case that only the Capability that owns a TSO may modify its state (well, almost), and this simplifies various things. More of the RTS is based on message-passing between Capabilities now.
* change throwTo to use tryWakeupThread rather than unblockOneSimon Marlow2010-03-291-0/+4
|
* New implementation of BLACKHOLEsSimon Marlow2010-03-291-73/+88
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This replaces the global blackhole_queue with a clever scheme that enables us to queue up blocked threads on the closure that they are blocked on, while still avoiding atomic instructions in the common case. Advantages: - gets rid of a locked global data structure and some tricky GC code (replacing it with some per-thread data structures and different tricky GC code :) - wakeups are more prompt: parallel/concurrent performance should benefit. I haven't seen anything dramatic in the parallel benchmarks so far, but a couple of threading benchmarks do improve a bit. - waking up a thread blocked on a blackhole is now O(1) (e.g. if it is the target of throwTo). - less sharing and better separation of Capabilities: communication is done with messages, the data structures are strictly owned by a Capability and cannot be modified except by sending messages. - this change will utlimately enable us to do more intelligent scheduling when threads block on each other. This is what started off the whole thing, but it isn't done yet (#3838). I'll be documenting all this on the wiki in due course.
* Use message-passing to implement throwTo in the RTSSimon Marlow2010-03-111-18/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This replaces some complicated locking schemes with message-passing in the implementation of throwTo. The benefits are - previously it was impossible to guarantee that a throwTo from a thread running on one CPU to a thread running on another CPU would be noticed, and we had to rely on the GC to pick up these forgotten exceptions. This no longer happens. - the locking regime is simpler (though the code is about the same size) - threads can be unblocked from a blocked_exceptions queue without having to traverse the whole queue now. It's a rare case, but replaces an O(n) operation with an O(1). - generally we move in the direction of sharing less between Capabilities (aka HECs), which will become important with other changes we have planned. Also in this patch I replaced several STM-specific closure types with a generic MUT_PRIM closure type, which allowed a lot of code in the GC and other places to go away, hence the line-count reduction. The message-passing changes resulted in about a net zero line-count difference.
* If a comment says "Is this correct?", it's not.Ben.Lippmeier@anu.edu.au2009-11-141-1/+1
|
* Don't share low valued Int and Char closures with Windows DLLsBen.Lippmeier@anu.edu.au2009-11-141-2/+7
|
* Remove old GUM/GranSim codeSimon Marlow2009-06-021-44/+0
|
* FIX #1364: added support for C finalizers that run as soon as the value is ↵Simon Marlow2008-12-101-3/+3
| | | | | | | | | | | not longer reachable. Patch originally by Ivan Tomac <tomac@pacific.net.au>, amended by Simon Marlow: - mkWeakFinalizer# commoned up with mkWeakFinalizerEnv# - GC parameters to ALLOC_PRIM fixed
* Add optional eager black-holing, with new flag -feager-blackholingSimon Marlow2008-11-181-5/+24
| | | | | | | | | | | | | | | Eager blackholing can improve parallel performance by reducing the chances that two threads perform the same computation. However, it has a cost: one extra memory write per thunk entry. To get the best results, any code which may be executed in parallel should be compiled with eager blackholing turned on. But since there's a cost for sequential code, we make it optional and turn it on for the parallel package only. It might be a good idea to compile applications (or modules) with parallel code in with -feager-blackholing. ToDo: document -feager-blackholing.
* Move Int, Float and Double into ghc-prim:GHC.TypesIan Lynagh2008-08-061-3/+3
|
* C# has moved to ghc-prim:GHC.TypesIan Lynagh2008-08-051-3/+3
|
* remove EVACUATED: store the forwarding pointer in the info pointerSimon Marlow2008-04-171-8/+0
|
* Add a write barrier to the TSO link field (#1589)Simon Marlow2008-04-161-2/+2
|
* Initial parallel GC supportSimon Marlow2007-10-311-4/+1
| | | | | | | | | eg. use +RTS -g2 -RTS for 2 threads. Only major GCs are parallelised, minor GCs are still sequential. Don't use more threads than you have CPUs. It works most of the time, although you won't see much speedup yet. Tuning and more work on stability still required.
* Do not #include external header files when compiling via CSimon Marlow2008-04-021-6/+4
| | | | | | | | | | | | | | | | | | | | | | | This has several advantages: - -fvia-C is consistent with -fasm with respect to FFI declarations: both bind to the ABI, not the API. - foreign calls can now be inlined freely across module boundaries, since a header file is not required when compiling the call. - bootstrapping via C will be more reliable, because this difference in behavour between the two backends has been removed. There is one disadvantage: - we get no checking by the C compiler that the FFI declaration is correct. So now, the c-includes field in a .cabal file is always ignored by GHC, as are header files specified in an FFI declaration. This was previously the case only for -fasm compilations, now it is also the case for -fvia-C too.
* Add a proper write barrier for MVarsSimon Marlow2007-10-111-4/+4
| | | | | | | | | | | | Previously MVars were always on the mutable list of the old generation, which meant every MVar was visited during every minor GC. With lots of MVars hanging around, this gets expensive. We addressed this problem for MUT_VARs (aka IORefs) a while ago, the solution is to use a traditional GC write-barrier when the object is modified. This patch does the same thing for MVars. TVars are still done the old way, they could probably benefit from the same treatment too.
* {Enter,Leave}CriticalSection imports should be outside #ifdef __PIC__Simon Marlow2007-09-051-2/+2
|
* FIX: Correct Leave/EnterCriticalSection importsManuel M T Chakravarty2007-09-051-2/+2
|
* put the @N suffix on stdcall foreign calls in .cmm codeSimon Marlow2007-09-041-0/+2
| | | | This applies to EnterCriticalSection and LeaveCriticalSection in the RTS
* Windows: remove the {Enter,Leave}CricialSection wrappersSimon Marlow2007-08-291-2/+2
| | | | | | The C-- parser was missing the "stdcall" calling convention for foreign calls, but once added we can call {Enter,Leave}CricialSection directly.
* annotate C-- calls that do not returnNorman Ramsey2007-08-201-34/+34
| | | | | | | | | * The correct definition of C-- requires that a procedure not 'fall off the end'. The 'never returns' annotation tells us if a (foreign) call is not going to return. Validated!
* Properly guard imports because they have to be precise on Windows and Darwin ↵Clemens Fruhwirth2007-08-101-0/+2
| | | | sets __PIC__ automatically
* Add explicit imports for RTS-external variablesClemens Fruhwirth2007-08-061-4/+7
|
* Build RTS as dynamic libraryClemens Fruhwirth2007-08-081-1/+1
|
* Pointer TaggingSimon Marlow2007-07-271-5/+5
| | | | | | | | | | | | | | | | | | | | | | This patch implements pointer tagging as per our ICFP'07 paper "Faster laziness using dynamic pointer tagging". It improves performance by 10-15% for most workloads, including GHC itself. The original patches were by Alexey Rodriguez Yakushev <mrchebas@gmail.com>, with additions and improvements by me. I've re-recorded the development as a single patch. The basic idea is this: we use the low 2 bits of a pointer to a heap object (3 bits on a 64-bit architecture) to encode some information about the object pointed to. For a constructor, we encode the "tag" of the constructor (e.g. True vs. False), for a function closure its arity. This enables some decisions to be made without dereferencing the pointer, which speeds up some common operations. In particular it enables us to avoid costly indirect jumps in many cases. More information in the commentary: http://hackage.haskell.org/trac/ghc/wiki/Commentary/Rts/HaskellExecution/PointerTagging
* Implemented and fixed bugs in CmmInfo handlingMichael D. Adams2007-06-271-24/+8
|
* Remove vectored returns.Simon Marlow2007-02-281-43/+11
| | | | | We recently discovered that they aren't a win any more, and just cost code size.
* Lightweight ticky-ticky profilingKirsten Chevalier2007-02-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following changes restore ticky-ticky profiling to functionality from its formerly bit-rotted state. Sort of. (It got bit-rotted as part of the switch to the C-- back-end.) The way that ticky-ticky is supposed to work is documented in Section 5.7 of the GHC manual (though the manual doesn't mention that it hasn't worked since sometime around 6.0, alas). Changes from this are as follows (which I'll document on the wiki): * In the past, you had to build all of the libraries with way=t in order to use ticky-ticky, because it entailed a different closure layout. No longer. You still need to do make way=t in rts/ in order to build the ticky RTS, but you should now be able to mix ticky and non-ticky modules. * Some of the counters that worked in the past aren't implemented yet. I was originally just trying to get entry counts to work, so those should be correct. The list of counters was never documented in the first place, so I hope it's not too much of a disaster that some don't appear anymore. Someday, someone (perhaps me) should document all the counters and what they do. For now, all of the counters are either accurate (or at least as accurate as they always were), zero, or missing from the ticky profiling report altogether. This hasn't been particularly well-tested, but these changes shouldn't affect anything except when compiling with -fticky-ticky (famous last words...) Implementation details: I got rid of StgTicky.h, which in the past had the macros and declarations for all of the ticky counters. Now, those macros are defined in Cmm.h. StgTicky.h was still there for inclusion in C code. Now, any remaining C code simply cannot call the ticky macros -- or rather, they do call those macros, but from the perspective of C code, they're defined as no-ops. (This shouldn't be too big a problem.) I added a new file TickyCounter.h that has all the declarations for ticky counters, as well as dummy macros for use in C code. Someday, these declarations should really be automatically generated, since they need to be kept consistent with the macros defined in Cmm.h. Other changes include getting rid of the header that was getting added to closures before, and getting rid of various code having to do with eager blackholing and permanent indirections (the changes under compiler/ and rts/Updates.*).
* STM invariantstharris@microsoft.com2006-10-071-5/+16
|
* change wired-in Haskell symbols to include the package nameSimon Marlow2006-07-261-2/+2
|
* Reorganisation of the source treeSimon Marlow2006-04-071-0/+953
Most of the other users of the fptools build system have migrated to Cabal, and with the move to darcs we can now flatten the source tree without losing history, so here goes. The main change is that the ghc/ subdir is gone, and most of what it contained is now at the top level. The build system now makes no pretense at being multi-project, it is just the GHC build system. No doubt this will break many things, and there will be a period of instability while we fix the dependencies. A straightforward build should work, but I haven't yet fixed binary/source distributions. Changes to the Building Guide will follow, too.