path: root/rts/Schedule.c
Commit message (Author, Age, Files, Lines)
* Better abstraction over run queues. (Edward Z. Yang, 2013-01-16, 1 file, -7/+13)
| | | | | | | | | This adds some new functions: peekRunQueue, promoteInRunQueue, singletonRunQueue and truncateRunQueue which help abstract away manual linked list manipulation, making it easier to swap in a new queue implementation. Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
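The helpers named in this commit can be pictured with a minimal sketch over a doubly linked queue. Everything here is hypothetical: the `Thread` and `RunQueue` types and the snake_case function names are illustrative stand-ins, not the RTS definitions, and "promote" is assumed to mean "move to the front of the queue".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical thread record; not GHC's StgTSO. */
typedef struct Thread {
    int id;
    struct Thread *next, *prev;
} Thread;

/* Hypothetical doubly linked run queue with head and tail pointers. */
typedef struct {
    Thread *hd, *tl;
} RunQueue;

/* Look at the head of the queue without removing it. */
static Thread *peek_run_queue(const RunQueue *q) { return q->hd; }

/* True when the queue holds exactly one thread. */
static int singleton_run_queue(const RunQueue *q) {
    return q->hd != NULL && q->hd == q->tl;
}

/* Forget every thread currently on the queue. */
static void truncate_run_queue(RunQueue *q) { q->hd = q->tl = NULL; }

/* Add a thread at the tail. */
static void append_to_run_queue(RunQueue *q, Thread *t) {
    t->next = NULL;
    t->prev = q->tl;
    if (q->tl) q->tl->next = t; else q->hd = t;
    q->tl = t;
}

/* Unlink t (which must already be on q) and push it on the front. */
static void promote_in_run_queue(RunQueue *q, Thread *t) {
    if (q->hd == t) return;            /* already at the front */
    assert(t->prev != NULL);           /* t is on the queue and not the head */
    t->prev->next = t->next;
    if (t->next) t->next->prev = t->prev; else q->tl = t->prev;
    t->prev = NULL;
    t->next = q->hd;
    q->hd->prev = t;
    q->hd = t;
}
```

Callers that go through helpers like these never touch `next`/`prev` directly, which is what makes swapping in a different queue representation cheap.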
* Add a write barrier for TVAR closures (Simon Marlow, 2012-11-16, 1 file, -1/+1)
| | | | | | | | | | This improves GC performance when there are a lot of TVars in the heap. For instance, a TChan with a lot of elements causes a massive GC drag without this patch. There's more to do - several other STM closure types don't have write barriers, so GC performance when there are a lot of threads blocked on STM isn't great. But fixing the problem for TVar is a good start.
* delete old comments (Simon Marlow, 2012-10-25, 1 file, -22/+0)
|
* remove unused sched_shutting_down (Simon Marlow, 2012-10-25, 1 file, -7/+0)
|
* fix a warning (Simon Marlow, 2012-10-23, 1 file, -2/+2)
|
* typo (Simon Marlow, 2012-10-22, 1 file, -1/+1)
|
* Another overhaul of the recent_activity / idle GC handling (#5991) (Simon Marlow, 2012-09-24, 1 file, -4/+12)
| Improvements:
|
|  - we now turn off the timer signal in the non-threaded RTS after idleGCDelay. This should make the xmonad users on #5991 happy.
|  - we now turn off the timer signal after idleGCDelay even if the idle GC is disabled with +RTS -I0.
|  - we now do *not* turn off the timer when profiling.
|  - more comments to explain the meaning of the various ACTIVITY_* values
* Deprecate lnat, and use StgWord instead (Simon Marlow, 2012-09-07, 1 file, -2/+2)
| | | | | | | | | | | | lnat was originally "long unsigned int" but we were using it when we wanted a 64-bit type on a 64-bit machine. This broke on Windows x64, where long == int == 32 bits. Using types of unspecified size is bad, but what we really wanted was a type with N bits on an N-bit machine. StgWord is exactly that. lnat was mentioned in some APIs that clients might be using (e.g. StackOverflowHook()), so we leave it defined but with a comment to say that it's deprecated.
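The size problem described here can be checked directly. The sketch below is only an illustration: `word_t` is a hypothetical stand-in built on `uintptr_t`, which is pointer-sized on mainstream platforms, and it is not how StgWord itself is defined.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical machine-word type for illustration; not GHC's StgWord. */
typedef uintptr_t word_t;

/* Width of word_t in bits: tracks the pointer size of the target. */
static unsigned word_bits(void) {
    return (unsigned)(sizeof(word_t) * CHAR_BIT);
}

/* Width of unsigned long in bits: 64 on LP64 systems such as 64-bit Linux,
 * but 32 on LLP64 systems such as Windows x64, which is the breakage
 * the commit message describes. */
static unsigned ulong_bits(void) {
    return (unsigned)(sizeof(unsigned long) * CHAR_BIT);
}
```

On an LLP64 target `ulong_bits()` would report 32 while `word_bits()` reports 64, so any code that assumed the two agree silently truncates.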
* tidy up (Simon Marlow, 2012-08-21, 1 file, -6/+11)
|
* Fix a bug in the handling of recent_activity (Simon Marlow, 2012-08-07, 1 file, -13/+22)
| | | | | | | | The problem occurred when the idle GC was turned off with +RTS -I0. Then the scheduler would go into the state ACTIVITY_DONE_GC directly without doing a GC, and a subsequent GC would put it back to ACTIVITY_YES but without turning the timer back on. Instead if the GC finds the state is ACTIVITY_DONE_GC it should leave it there.
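The rule in the last sentence amounts to a small state transition. A minimal sketch follows, assuming a reduced set of states: only ACTIVITY_YES and ACTIVITY_DONE_GC are named in the message above, while ACTIVITY_INACTIVE, the enum values and the function name are illustrative assumptions.

```c
#include <assert.h>

/* Reduced, illustrative set of activity states. */
typedef enum {
    ACTIVITY_YES,       /* the program has been active recently */
    ACTIVITY_INACTIVE,  /* assumed intermediate state, for illustration only */
    ACTIVITY_DONE_GC    /* idle, and the timer has been turned off */
} Activity;

/* State to record once a GC has finished (hypothetical helper).
 * If the timer is already off (DONE_GC), stay there: moving back to
 * ACTIVITY_YES without restarting the timer is the bug being fixed. */
static Activity activity_after_gc(Activity before) {
    return before == ACTIVITY_DONE_GC ? ACTIVITY_DONE_GC : ACTIVITY_YES;
}
```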
* Merge remote branch 'mikolaj/dcoutts' (Ian Lynagh, 2012-07-14, 1 file, -1/+12)
|\
| * Emit the task-tracking events (Duncan Coutts, 2012-07-10, 1 file, -1/+12)
| | | | | | | | | | | | | | | | | | | | | | | | Based on initial patches by Mikolaj Konarski <mikolaj@well-typed.com> Use the new task tracing functions traceTaskCreate/Migrate/Delete. There are two key places. One is for worker tasks which have a relatively simple life cycle. Worker tasks are created and deleted by the RTS. The other case is bound tasks which are either created by the RTS, or appear as foreign C threads making calls into the RTS. For bound threads we do the tracing in rts_lock/unlock, which actually covers both threads coming in from outside, and also bound threads made by the RTS.
* | The final GC should be a major one (Simon Marlow, 2012-07-10, 1 file, -1/+1)
|/ | | | | | | | We do a final GC before shutting down the system, to clean up. However, we were doing an ordinary GC rather than forcing a major GC, so especially when the allocation area is large, this final GC could be expensive. This is really just a bug - the final GC should have virtually nothing to do, because there is nothing live.
* Merge branch 'master' of http://darcs.haskell.org//ghc (Ian Lynagh, 2012-06-07, 1 file, -1/+1)
|\
| * Test USE_MINIINTERPRETER rather than GhcUnregisterised (Ian Lynagh, 2012-05-27, 1 file, -1/+1)
| |
* | scheduleYield: avoid doing a GC again if we just did one (Ian Lynagh, 2012-06-07, 1 file, -8/+19)
|/ | | | | | If we are interrupted to do a GC, then we do not immediately do another one. This avoids a starvation situation where one Capability keeps forcing a GC and the other Capabilities make no progress at all.
* Fix the timestamps in GC_START and GC_END events on the GC-initiating cap (Mikolaj, 2012-04-04, 1 file, -2/+0)
| | | | | | | | | | | There was a discrepancy between GC times reported in +RTS -s and the timestamps of GC_START and GC_END events on the cap, on which +RTS -s stats for the given GC are based. This is fixed by posting the events with exactly the same timestamp as generated for the stat calculation. The calls posting the events are moved too, so that the events are emitted close to the time instant they claim to be emitted at. The GC_STATS_GHC was moved, too, ensuring it's emitted before the moved GC_END on all caps, which simplifies tools code.
* Add eventlog/trace stuff for capabilities: create/delete/enable/disable (Duncan Coutts, 2012-04-04, 1 file, -1/+4)
| Now that we can adjust the number of capabilities on the fly, we need this reflected in the eventlog. Previously the eventlog had a single startup event that declared a static number of capabilities. Obviously that's no good anymore.
|
| For compatibility we're keeping the EVENT_STARTUP but adding new EVENT_CAP_CREATE/DELETE. The EVENT_CAP_DELETE is actually just the old EVENT_SHUTDOWN but renamed and extended (using the existing mechanism to extend eventlog events in a compatible way). So we now emit both EVENT_STARTUP and EVENT_CAP_CREATE. One day we will drop EVENT_STARTUP.
|
| Since reducing the number of capabilities at runtime does not really delete them, it just disables them, then we also have new events for disable/enable.
|
| The old EVENT_SHUTDOWN was in the scheduler class of events. The new EVENT_CAP_* events are in the unconditional class, along with the EVENT_CAPSET_* ones. Knowing when capabilities are created and deleted is crucial to making sense of eventlogs, you always want those events. In any case, they're extremely low volume.
* Use win32AllocStack on Win64 too (Ian Lynagh, 2012-03-19, 1 file, -1/+1)
|
* Fixed for unregisterised Windows builds (Ian Lynagh, 2012-03-18, 1 file, -1/+1)
|
* Another Win64 fix (Ian Lynagh, 2012-03-16, 1 file, -1/+1)
|
* typo (Gabor Greif, 2012-02-27, 1 file, -1/+1)
|
* setNumCapabilities: don't barf() if it isn't supported, just print an error (Simon Marlow, 2012-01-06, 1 file, -3/+9)
|
* Support for reducing the number of Capabilities with setNumCapabilities (Simon Marlow, 2011-12-15, 1 file, -70/+174)
| This patch allows setNumCapabilities to /reduce/ the number of active capabilities as well as increase it. This is particularly tricky to do, because a Capability is a large data structure and ties into the rest of the system in many ways. Trying to clean it all up would be extremely error prone.
|
| So instead, the solution is to mark the extra capabilities as "disabled". This has the following consequences:
|
|  - threads on a disabled capability are migrated away by the scheduler loop
|  - disabled capabilities do not participate in GC (see scheduleDoGC())
|  - No spark threads are created on this capability (see scheduleActivateSpark())
|  - We do not attempt to migrate threads *to* a disabled capability (see schedulePushWork()).
|
| So a disabled capability should do no work, and does not participate in GC, although it remains alive in other respects. For example, a blocked thread might wake up on a disabled capability, and it will get quickly migrated to a live capability. A disabled capability can still initiate GC if necessary. Indeed, it turns out to be hard to migrate bound threads, so we wait until the next GC to do this (see comments for details).
* New flag +RTS -qi<n>, avoid waking up idle Capabilities to do parallel GC (Simon Marlow, 2011-12-13, 1 file, -2/+68)
| | | | | | | | | | | | | | | | | This is an experimental tweak to the parallel GC that avoids waking up a Capability to do parallel GC if we know that the capability has been idle for a (tunable) number of GC cycles. The idea is that if you're only using a few Capabilities, there's no point waking up the ones that aren't busy. e.g. +RTS -qi3 says "A Capability will participate in parallel GC if it was running at all since the last 3 GC cycles." Results are a bit hit and miss, and I don't completely understand why yet. Hence, for now it is turned off by default, and also not documented except in the +RTS -? output.
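One way to read the -qi rule is as a per-capability idle counter compared against the tunable threshold. The sketch below is an assumption about how such a check could look; the `CapIdle` type, its field and both function names are hypothetical, and the exact boundary (idle for n cycles means "skip") is a guess from the -qi3 example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-capability bookkeeping; names are illustrative. */
typedef struct {
    unsigned idle_gcs;  /* consecutive GC cycles in which this capability did not run */
} CapIdle;

/* Update the counter at each GC: reset it if the capability ran at all
 * since the previous GC, otherwise count one more idle cycle. */
static void note_gc(CapIdle *c, bool ran_since_last_gc) {
    c->idle_gcs = ran_since_last_gc ? 0 : c->idle_gcs + 1;
}

/* With threshold n (the <n> in -qi<n>), wake the capability for parallel
 * GC only if it has run within the last n GC cycles. */
static bool should_join_gc(const CapIdle *c, unsigned n) {
    return c->idle_gcs < n;
}
```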
* Allow the number of capabilities to be increased at runtime (#3729) (Simon Marlow, 2011-12-06, 1 file, -29/+143)
| | | | | At present the number of capabilities can only be *increased*, not decreased. The latter presents a few more challenges!
* Make forkProcess work with +RTS -N (Simon Marlow, 2011-12-06, 1 file, -86/+182)
| Consider this experimental for the time being. There are a lot of things that could go wrong, but I've verified that at least it works on the test cases we have.
|
| I also did some API cleanups while I was here. Previously we had:
|
|    Capability * rts_eval (Capability *cap, HaskellObj p, /*out*/HaskellObj *ret);
|
| but this API is particularly error-prone: if you forget to discard the Capability * you passed in and use the return value instead, then you're in for subtle bugs with +RTS -N later on. So I changed all these functions to this form:
|
|    void rts_eval (/* inout */ Capability **cap,
|                   /* in    */ HaskellObj p,
|                   /* out   */ HaskellObj *ret)
|
| It's much harder to use this version incorrectly, because you have to pass the Capability in by reference.
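The difference between the two calling conventions can be shown with a toy model. This is not the RTS API: `Cap`, `eval_old` and `eval_new` are hypothetical stand-ins, the "evaluation" just adds one, and the capability is forced to change on every call to make the hazard visible.

```c
#include <assert.h>

/* Hypothetical stand-in for Capability. */
typedef struct { int no; } Cap;

static Cap caps[2] = { { 0 }, { 1 } };

/* Old style: the possibly different capability comes back as the return
 * value, and the caller must remember to stop using the one passed in. */
static Cap *eval_old(Cap *cap, int p, int *ret) {
    *ret = p + 1;                          /* toy "evaluation" */
    return &caps[(cap->no + 1) % 2];       /* simulate moving to another capability */
}

/* New style: the capability is an in/out parameter, so the caller's
 * variable is updated in place and cannot go stale by accident. */
static void eval_new(Cap **cap, int p, int *ret) {
    *ret = p + 1;
    *cap = &caps[((*cap)->no + 1) % 2];
}
```

With `eval_old`, a caller that ignores the return value keeps working with a stale pointer; with `eval_new` there is no return value to forget.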
* Fix a scheduling bug in the threaded RTS (Simon Marlow, 2011-12-01, 1 file, -7/+13)
| | | | | | | | | | | | | | | The parallel GC was using setContextSwitches() to stop all the other threads, which sets the context_switch flag on every Capability. That had the side effect of causing every Capability to also switch threads, and since GCs can be much more frequent than context switches, this increased the context switch frequency. When context switches are expensive (because the switch is between two bound threads or a bound and unbound thread), the difference is quite noticeable. The fix is to have a separate flag to indicate that a Capability should stop and return to the scheduler, but not switch threads. I've called this the "interrupt" flag.
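The two-flag idea can be sketched as follows. The struct, enum and function names are hypothetical; only the distinction between "context switch" and "interrupt" comes from the message above, and the real scheduler logic is more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-capability flags. */
typedef struct {
    bool context_switch;  /* stop and let another thread run */
    bool interrupt;       /* stop and return to the scheduler, keep the same thread */
} CapFlags;

typedef enum { KEEP_RUNNING, RETURN_SAME_THREAD, RETURN_SWITCH_THREAD } Action;

/* What the running thread should do when it next checks its flags. */
static Action check_flags(CapFlags *c) {
    if (c->context_switch) {
        c->context_switch = false;
        c->interrupt = false;
        return RETURN_SWITCH_THREAD;
    }
    if (c->interrupt) {
        c->interrupt = false;
        return RETURN_SAME_THREAD;
    }
    return KEEP_RUNNING;
}

/* A GC request only needs everyone back in the scheduler, so under the
 * fix it would set the interrupt flag rather than context_switch. */
static void request_gc_stop(CapFlags *all, int n) {
    for (int i = 0; i < n; i++) all[i].interrupt = true;
}
```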
* Make profiling work with multiple capabilities (+RTS -N) (Simon Marlow, 2011-11-29, 1 file, -2/+2)
| This means that both time and heap profiling work for parallel programs. Main internal changes:
|
|  - CCCS is no longer a global variable; it is now another pseudo-register in the StgRegTable struct. Thus every Capability has its own CCCS.
|  - There is a new built-in CCS called "IDLE", which records ticks for Capabilities in the idle state. If you profile a single-threaded program with +RTS -N2, you'll see about 50% of time in "IDLE".
|  - There is appropriate locking in rts/Profiling.c to protect the shared cost-centre-stack data structures.
|
| This patch does enough to get it working, I have cut one big corner: the cost-centre-stack data structure is still shared amongst all Capabilities, which means that multiple Capabilities will race when updating the "allocations" and "entries" fields of a CCS. Not only does this give unpredictable results, but it runs very slowly due to cache line bouncing.
|
| It is strongly recommended that you use -fno-prof-count-entries to disable the "entries" count when profiling parallel programs. (I shall add a note to this effect to the docs).
* Time handling overhaul (Simon Marlow, 2011-11-25, 1 file, -1/+1)
| | | | | | | | | | | | | | | | | | | | | Terminology cleanup: the type "Ticks" has been renamed "Time", which is an StgWord64 in units of TIME_RESOLUTION (currently nanoseconds). The terminology "tick" is now used consistently to mean the interval between timer signals. The ticker now always ticks in realtime (actually CLOCK_MONOTONIC if we have it). Before it used CPU time in the non-threaded RTS and realtime in the threaded RTS, but I've discovered that the CPU timer has terrible resolution (at least on Linux) and isn't much use for profiling. So now we always use realtime. This should also fix The default tick interval is now 10ms, except when profiling where we drop it to 1ms. This gives more accurate profiles without affecting runtime too much (<1%). Lots of cleanups - the resolution of Time is now in one place only (Rts.h) rather than having calculations that depend on the resolution scattered all over the RTS. I hope I found them all.
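Keeping the resolution in one place means every conversion is derived from a single constant. A minimal sketch, assuming a 64-bit `Time` counted in nanoseconds as described above; the helper names `ms_to_time` and `time_to_us` are hypothetical and are not the macros the RTS defines.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the RTS Time type: a 64-bit count of TIME_RESOLUTION units. */
typedef uint64_t Time;

/* Units per second; nanoseconds, per the commit message. */
#define TIME_RESOLUTION 1000000000ULL

/* Hypothetical conversion helpers, all derived from TIME_RESOLUTION. */
static Time ms_to_time(uint64_t ms) { return ms * (TIME_RESOLUTION / 1000); }
static uint64_t time_to_us(Time t)  { return t / (TIME_RESOLUTION / 1000000); }

/* The tick intervals mentioned above: 10ms by default, 1ms when profiling. */
static Time default_tick(int profiling) {
    return profiling ? ms_to_time(1) : ms_to_time(10);
}
```

If the resolution ever changed, only the `TIME_RESOLUTION` definition would need editing; callers written against the helpers stay correct.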
* fix occasional failure of numsparks001 test. (Simon Marlow, 2011-08-14, 1 file, -5/+14)
| During shutdown we discard all the sparks from each Capability, but we were forgetting to account for the discarded sparks in the stats, leading to a failure of the assertion that tests the spark invariant. I've moved the discarding of sparks to just before the GC, to avoid race conditions, and counted the discarded sparks as GC'd.
* Move the call to heapCensus() into GarbageCollect(), just before calling resurrectThreads() (fixes #5314). (Simon Marlow, 2011-07-20, 1 file, -5/+4)
| This avoids a lot of problems, because resurrectThreads() may overwrite some closures in the heap, leaving slop behind. The bug in instances, this fix avoids them all in one go.
* Add spark counter tracing (Duncan Coutts, 2011-07-18, 1 file, -0/+2)
| | | | | | | A new eventlog event containing 7 spark counters/statistics: sparks created, dud, overflowed, converted, GC'd, fizzled and remaining. These are maintained and logged separately for each capability. We log them at startup, on each GC (minor and major) and on shutdown.
* Move allocation of spark pools into initCapability (Duncan Coutts, 2011-07-18, 1 file, -4/+0)
| | | | | | Rather than a separate phase of initSparkPools. It means all the spark stuff for a capability is initialised at the same time, which then becomes a good place to stick an initial spark trace event.
* Add assertion of the invariant for the spark counters (Duncan Coutts, 2011-07-18, 1 file, -0/+10)
| | | | | | | | | The invariant is: created = converted + remaining + gcd + fizzled Since sparks move between capabilities, we have to aggregate the counters over all capabilities. This in turn means we can only check the invariant at stable points where all but one capabilities are stopped. We can do this at shutdown time and before and after a global synchronised GC.
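The invariant and the need to aggregate can be sketched directly from the formula above. The `SparkCounters` struct and function name are hypothetical and cover only the five counters that appear in the formula.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-capability counters, limited to those in the invariant. */
typedef struct {
    unsigned long created, converted, remaining, gcd, fizzled;
} SparkCounters;

/* Check created = converted + remaining + gcd + fizzled over the SUM of all
 * capabilities. A per-capability check would be wrong, because a spark
 * created on one capability may be converted on another. */
static bool spark_invariant_holds(const SparkCounters *caps, size_t n) {
    SparkCounters s = { 0, 0, 0, 0, 0 };
    for (size_t i = 0; i < n; i++) {
        s.created   += caps[i].created;
        s.converted += caps[i].converted;
        s.remaining += caps[i].remaining;
        s.gcd       += caps[i].gcd;
        s.fizzled   += caps[i].fizzled;
    }
    return s.created == s.converted + s.remaining + s.gcd + s.fizzled;
}
```

Such a check is only meaningful while the counters are not changing, which matches the stable points listed above: shutdown, and before and after a globally synchronised GC.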
* Change tryStealSpark so it does not consume fizzled sparks (Duncan Coutts, 2011-07-18, 1 file, -0/+4)
| | | | | We want to count fizzled sparks accurately. Now tryStealSpark returns fizzled sparks, and the callers now update the fizzled spark count.
* Fix Windows breakage (#5322). (Simon Marlow, 2011-07-18, 1 file, -0/+4)
| When I modified StgRun to use the pure assembly version as part of the fix for #5250, we inadvertently lost the Windows magic for extending the stack. Win32 requires that the stack is extended a page at a time, otherwise you get a segfault. The C compiler knows how to do this, so we now call a C stub to ensure there's enough stack space at each invocation of the scheduler.
* Fix gcc 4.6 warnings; fixes #5176 (Ian Lynagh, 2011-06-25, 1 file, -2/+8)
| | | | | | | | | | | Based on a patch from David Terei. Some parts are a little ugly (e.g. defining things that only ASSERTs use only when DEBUG is defined), so we might want to tweak things a little. I've also turned off -Werror for didn't-inline warnings, as we now get a few such warnings.
* Rearrange shutdownCapability code slightly (Duncan Coutts, 2011-05-26, 1 file, -10/+1)
| This is mostly for the benefit of having sensible places to put tracing code later. We want a code path that has somewhere to trace (in order):
|
|  (1) starting up all capabilities;
|  (2) N * starting up an individual capability;
|  (3) N * shutting down an individual capability;
|  (4) shutting down all capabilities.
|
| This has to work in both threaded and non-threaded modes. Locations (1) and (2) are provided by initCapabilities and initCapability respectively. Previously, there was no location for (4), and while shutdownCapability should be usable for (3), it was only called in the !THREADED_RTS case. Now, shutdownCapability is called unconditionally (and the body is conditional on THREADED_RTS) and there is a new shutdownCapabilities that calls shutdownCapability in a loop.
* Revert "Add capability sets to the event system. Contains code from Duncan Coutts." (Duncan Coutts, 2011-05-23, 1 file, -8/+8)
| This reverts commit 58532eb46041aec8d4cbb48b054cb5b001edb43c. Turns out it didn't work on Windows and it'll need some non-trivial changes to make it work on Windows. We'll get it in later once that's sorted out.
* Add capability sets to the event system. Contains code from Duncan Coutts. (Spencer Janssen, 2011-05-18, 1 file, -8/+8)
|
* scheduleDoGC: if we're doing heapCensus(), do it *before* releasing the other mutator threads (#5127) (Simon Marlow, 2011-05-11, 1 file, -6/+6)
* Refactoring and tidy up (Simon Marlow, 2011-04-11, 1 file, -0/+10)
| This is a port of some of the changes from my private local-GC branch (which is still in darcs, I haven't converted it to git yet). There are a couple of small functional differences in the GC stats: first, per-thread GC timings should now be more accurate, and secondly we now report average and maximum pause times. e.g. from minimax +RTS -N8 -s:
|
|                                     Tot time (elapsed)  Avg pause  Max pause
|   Gen  0      2755 colls,  2754 par   13.16s    0.93s     0.0003s    0.0150s
|   Gen  1       769 colls,   769 par    3.71s    0.26s     0.0003s    0.0059s
* scheduleThreadOn: use TSO_LOCKED even on the non-threaded RTS (Simon Marlow, 2011-03-30, 1 file, -1/+1)
|
* scheduleProcessInbox: use non-blocking acquire, and take the whole queue (Simon Marlow, 2011-02-02, 1 file, -4/+28)
| | | | | This is an improvement from my GC branch, that helps performance for intensive message-passing communication between Capabilities.
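The pattern named in the commit subject, a non-blocking acquire followed by detaching the whole queue, can be sketched with C11 atomics. This is an illustration under assumptions: the `Msg` and `Inbox` types and `take_inbox` are hypothetical, and an `atomic_flag` stands in for whatever lock the RTS actually uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical message and inbox types. */
typedef struct Msg {
    int payload;
    struct Msg *link;
} Msg;

typedef struct {
    atomic_flag lock;   /* stand-in for the inbox lock */
    Msg *inbox;         /* singly linked list of pending messages */
} Inbox;

/* Try once to take the lock. If it is busy, give up immediately and
 * return NULL so the caller can retry on a later pass. If it is free,
 * detach the entire list in one step and release the lock, so the
 * messages are processed without holding it. */
static Msg *take_inbox(Inbox *b) {
    if (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
        return NULL;                      /* lock busy: do not block */
    Msg *all = b->inbox;
    b->inbox = NULL;
    atomic_flag_clear_explicit(&b->lock, memory_order_release);
    return all;
}
```

Taking the whole list at once keeps the critical section to two pointer writes, which is where the gain for heavy message passing between capabilities would come from.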
* Annotate thread stop events with the owner of the black hole (Simon Marlow, 2011-01-27, 1 file, -2/+12)
| So we can now get these in ThreadScope:
|
|   19487000: cap 1: stopping thread 6 (blocked on black hole owned by thread 4)
|
| Note: needs an update to ghc-events. Older ThreadScopes will just ignore the new information.
* raiseExceptionHelper: update tso->stackobj->sp before calling threadStackOverflow (#4845) (Simon Marlow, 2010-12-21, 1 file, -0/+1)
* Implement stack chunks and separate TSO/STACK objects (Simon Marlow, 2010-12-15, 1 file, -247/+36)
| This patch makes two changes to the way stacks are managed:
|
| 1. The stack is now stored in a separate object from the TSO.
|
| This means that it is easier to replace the stack object for a thread when the stack overflows or underflows; we don't have to leave behind the old TSO as an indirection any more. Consequently, we can remove ThreadRelocated and deRefTSO(), which were a pain.
|
| This is obviously the right thing, but the last time I tried to do it it made performance worse. This time I seem to have cracked it.
|
| 2. Stacks are now represented as a chain of chunks, rather than a single monolithic object.
|
| The big advantage here is that individual chunks are marked clean or dirty according to whether they contain pointers to the young generation, and the GC can avoid traversing clean stack chunks during a young-generation collection. This means that programs with deep stacks will see a big saving in GC overhead when using the default GC settings.
|
| A secondary advantage is that there is much less copying involved as the stack grows. Programs that quickly grow a deep stack will see big improvements.
|
| In some ways the implementation is simpler, as nothing special needs to be done to reclaim stack as the stack shrinks (the GC just recovers the dead stack chunks). On the other hand, we have to manage stack underflow between chunks, so there's a new stack frame (UNDERFLOW_FRAME), and we now have separate TSO and STACK objects. The total amount of code is probably about the same as before.
|
| There are new RTS flags:
|
|    -ki<size>  Sets the initial thread stack size (default 1k)  Egs: -ki4k -ki2m
|    -kc<size>  Sets the stack chunk size (default 32k)
|    -kb<size>  Sets the stack chunk buffer size (default 1k)
|
| -ki was previously called just -k, and the old name is still accepted for backwards compatibility. These new options are documented.
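The saving from per-chunk dirty marks can be illustrated with a toy traversal. The `Chunk` type and `chunks_scanned_minor` are hypothetical; real stack chunks carry frames and the GC does far more than count, but the skip-if-clean test is the point being shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stack chunk: only the fields needed for the illustration. */
typedef struct Chunk {
    bool dirty;            /* may contain pointers into the young generation */
    struct Chunk *next;    /* next (older) chunk in the chain */
} Chunk;

/* Count how many chunks a young-generation collection would have to
 * traverse: clean chunks hold no young pointers and are skipped. */
static size_t chunks_scanned_minor(const Chunk *top) {
    size_t scanned = 0;
    for (const Chunk *c = top; c != NULL; c = c->next) {
        if (!c->dirty) continue;   /* clean chunk: nothing to do */
        scanned++;
    }
    return scanned;
}
```

For a deep stack where only the topmost chunk has been written since the last collection, the traversal cost is one chunk rather than the whole stack.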
* Only reset the event log if logging is turned on (addendum to #4512) (Simon Marlow, 2010-12-10, 1 file, -4/+4)
|
* Catch too-large allocations and emit an error message (#4505) (Simon Marlow, 2010-12-09, 1 file, -0/+4)
| This is a temporary measure until we fix the bug properly (which is somewhat tricky, and we think might be easier in the new code generator).
|
| For now we get:
|
|   ghc-stage2: sorry! (unimplemented feature or known bug)
|     (GHC version 7.1 for i386-unknown-linux):
|       Trying to allocate more than 1040384 bytes.
|
|   See: http://hackage.haskell.org/trac/ghc/ticket/4550
|
|   Suggestion: read data from a file instead of having large
|   static data structures in the code.