author | Simon Marlow <marlowsd@gmail.com> | 2011-11-28 16:48:43 +0000 |
---|---|---|
committer | Simon Marlow <marlowsd@gmail.com> | 2011-11-29 12:21:18 +0000 |
commit | 50de6034343abc93a7b01daccff34121042c0e7c (patch) | |
tree | 24496a5fc6bc39c6baaa574608e53c5d76c169f6 /rts/AutoApply.h | |
parent | 1c2b838131134d44004dfdff18c302131478390d (diff) | |

Make profiling work with multiple capabilities (+RTS -N)

This means that both time and heap profiling work for parallel
programs. Main internal changes:
- CCCS is no longer a global variable; it is now another
pseudo-register in the StgRegTable struct. Thus every
Capability has its own CCCS.
- There is a new built-in CCS called "IDLE", which records ticks for
Capabilities in the idle state. If you profile a single-threaded
program with +RTS -N2, you'll see about 50% of time in "IDLE".
- There is appropriate locking in rts/Profiling.c to protect the
shared cost-centre-stack data structures.
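
The first two changes can be pictured with a minimal C sketch. It is an illustration only: the type and function names (`RegTable`, `Capability`, `handle_prof_tick`, `CCS_IDLE_SKETCH`) and the `idle` flag are hypothetical stand-ins, not the definitions in the RTS, and the locking from the third bullet is left out. Each capability carries its own current cost-centre stack, and a timer tick is charged either to that stack or to an "IDLE" stack.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; these are NOT the RTS definitions. */
typedef struct CostCentreStack {
    const char   *label;
    unsigned long time_ticks;
} CostCentreStack;

typedef struct RegTable {
    CostCentreStack *cccs;   /* current cost-centre stack of this capability */
} RegTable;

typedef struct Capability {
    RegTable r;              /* per-capability register table */
    int      idle;           /* assumed flag: nonzero while there is no work */
} Capability;

/* Assumed name for the built-in "IDLE" stack in this sketch. */
static CostCentreStack CCS_IDLE_SKETCH = { "IDLE", 0 };

/* One profiling timer tick: charge each capability's own current CCS,
   or the IDLE stack when that capability is idle. */
static void handle_prof_tick(Capability *caps, size_t n_caps)
{
    assert(caps != NULL);
    for (size_t i = 0; i < n_caps; i++) {
        CostCentreStack *ccs =
            caps[i].idle ? &CCS_IDLE_SKETCH : caps[i].r.cccs;
        ccs->time_ticks++;
    }
}
```

With two capabilities, one busy and one idle, every tick adds one to the busy capability's stack and one to "IDLE", which is where the roughly 50% figure for a single-threaded program under +RTS -N2 comes from.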
This patch does enough to get it working, but I have cut one big corner:
the cost-centre-stack data structure is still shared amongst all
Capabilities, which means that multiple Capabilities will race when
updating the "allocations" and "entries" fields of a CCS. Not only
does this give unpredictable results, but it runs very slowly due to
cache line bouncing.
It is strongly recommended that you use -fno-prof-count-entries to
disable the "entries" count when profiling parallel programs. (I shall
add a note to this effect to the docs).
Diffstat (limited to 'rts/AutoApply.h')
-rw-r--r-- | rts/AutoApply.h | 12 |
1 files changed, 6 insertions, 6 deletions
diff --git a/rts/AutoApply.h b/rts/AutoApply.h
index 547c5d2f28..d0c5c3fe6b 100644
--- a/rts/AutoApply.h
+++ b/rts/AutoApply.h
@@ -22,7 +22,7 @@
     TICK_ALLOC_HEAP_NOCTR(BYTES_TO_WDS(size)); \
     TICK_ALLOC_PAP(n+1 /* +1 for the FUN */, 0); \
     pap = Hp + WDS(1) - size; \
-    SET_HDR(pap, stg_PAP_info, W_[CCCS]); \
+    SET_HDR(pap, stg_PAP_info, CCCS); \
     StgPAP_arity(pap) = HALF_W_(arity - m); \
     StgPAP_fun(pap) = R1; \
     StgPAP_n_args(pap) = HALF_W_(n); \
@@ -52,7 +52,7 @@
     TICK_ALLOC_HEAP_NOCTR(BYTES_TO_WDS(size)); \
     TICK_ALLOC_PAP(n+1 /* +1 for the FUN */, 0); \
     new_pap = Hp + WDS(1) - size; \
-    SET_HDR(new_pap, stg_PAP_info, W_[CCCS]); \
+    SET_HDR(new_pap, stg_PAP_info, CCCS); \
     StgPAP_arity(new_pap) = HALF_W_(arity - m); \
     W_ n_args; \
     n_args = TO_W_(StgPAP_n_args(pap)); \
@@ -78,10 +78,10 @@

 // Jump to target, saving CCCS and restoring it on return
 #if defined(PROFILING)
-#define jump_SAVE_CCCS(target) \
-    Sp(-1) = W_[CCCS]; \
-    Sp(-2) = stg_restore_cccs_info; \
-    Sp_adj(-2); \
+#define jump_SAVE_CCCS(target) \
+    Sp(-1) = CCCS; \
+    Sp(-2) = stg_restore_cccs_info; \
+    Sp_adj(-2); \
     jump (target)
 #else
 #define jump_SAVE_CCCS(target) jump (target)
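
As a rough C analogy for what jump_SAVE_CCCS does (the macro itself is Cmm and works by pushing a return frame, not by making a call), the sketch below saves the current cost-centre stack, runs a target that may change it, and restores it afterwards. All names here (`RegTable`, `call_saving_cccs`, `enter_other`, `CCS_OTHER`) are hypothetical and chosen for illustration. Note that the value being saved is read from the register table, with no load from a global address, which mirrors the change from `W_[CCCS]` to `CCCS` in the hunks above.

```c
#include <assert.h>
#include <stddef.h>

typedef struct CostCentreStack { const char *label; } CostCentreStack;

/* Hypothetical register table with a tiny explicit stack. */
typedef struct RegTable {
    CostCentreStack *cccs;       /* current cost-centre stack */
    CostCentreStack *stack[16];  /* stand-in for the STG stack */
    size_t           sp;         /* number of saved entries */
} RegTable;

typedef void (*Target)(RegTable *);

/* Example target that switches to a different cost-centre stack. */
static CostCentreStack CCS_OTHER = { "OTHER" };
static void enter_other(RegTable *r) { r->cccs = &CCS_OTHER; }

/* Save CCCS, run the target, then restore CCCS on return. */
static void call_saving_cccs(RegTable *r, Target target)
{
    assert(r != NULL && r->sp < 16);
    r->stack[r->sp++] = r->cccs;   /* analogous to Sp(-1) = CCCS */
    target(r);                     /* the target may change r->cccs */
    r->cccs = r->stack[--r->sp];   /* the job of the restore frame */
}
```

After `call_saving_cccs(&r, enter_other)` returns, `r.cccs` points at whatever stack was current before the call, even though the target changed it.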