summaryrefslogtreecommitdiff
path: root/compiler/codeGen/CgUtils.hs
diff options
context:
space:
mode:
authorSimon Marlow <marlowsd@gmail.com>2011-11-28 16:48:43 +0000
committerSimon Marlow <marlowsd@gmail.com>2011-11-29 12:21:18 +0000
commit50de6034343abc93a7b01daccff34121042c0e7c (patch)
tree24496a5fc6bc39c6baaa574608e53c5d76c169f6 /compiler/codeGen/CgUtils.hs
parent1c2b838131134d44004dfdff18c302131478390d (diff)
downloadhaskell-50de6034343abc93a7b01daccff34121042c0e7c.tar.gz
Make profiling work with multiple capabilities (+RTS -N)
This means that both time and heap profiling work for parallel programs. Main internal changes: - CCCS is no longer a global variable; it is now another pseudo-register in the StgRegTable struct. Thus every Capability has its own CCCS. - There is a new built-in CCS called "IDLE", which records ticks for Capabilities in the idle state. If you profile a single-threaded program with +RTS -N2, you'll see about 50% of time in "IDLE". - There is appropriate locking in rts/Profiling.c to protect the shared cost-centre-stack data structures. This patch does enough to get it working, I have cut one big corner: the cost-centre-stack data structure is still shared amongst all Capabilities, which means that multiple Capabilities will race when updating the "allocations" and "entries" fields of a CCS. Not only does this give unpredictable results, but it runs very slowly due to cache line bouncing. It is strongly recommended that you use -fno-prof-count-entries to disable the "entries" count when profiling parallel programs. (I shall add a note to this effect to the docs).
Diffstat (limited to 'compiler/codeGen/CgUtils.hs')
-rw-r--r--compiler/codeGen/CgUtils.hs6
1 files changed, 5 insertions, 1 deletions
diff --git a/compiler/codeGen/CgUtils.hs b/compiler/codeGen/CgUtils.hs
index 85957e81b9..5274a176a0 100644
--- a/compiler/codeGen/CgUtils.hs
+++ b/compiler/codeGen/CgUtils.hs
@@ -286,7 +286,7 @@ callerSaveVolatileRegs vols = (caller_save, caller_load)
caller_save = foldr ($!) [] (map callerSaveGlobalReg regs_to_save)
caller_load = foldr ($!) [] (map callerRestoreGlobalReg regs_to_save)
- system_regs = [Sp,SpLim,Hp,HpLim,CurrentTSO,CurrentNursery,
+ system_regs = [Sp,SpLim,Hp,HpLim,CCCS,CurrentTSO,CurrentNursery,
{-SparkHd,SparkTl,SparkBase,SparkLim,-}BaseReg ]
regs_to_save = system_regs ++ vol_list
@@ -384,6 +384,9 @@ callerSaves Hp = True
#ifdef CALLER_SAVES_HpLim
callerSaves HpLim = True
#endif
+#ifdef CALLER_SAVES_CCCS
+callerSaves CCCS = True
+#endif
#ifdef CALLER_SAVES_CurrentTSO
callerSaves CurrentTSO = True
#endif
@@ -423,6 +426,7 @@ baseRegOffset (LongReg 1) = oFFSET_StgRegTable_rL1
baseRegOffset (LongReg n) = panic ("Registers above L1 are not supported (tried to use L" ++ show n ++ ")")
baseRegOffset Hp = oFFSET_StgRegTable_rHp
baseRegOffset HpLim = oFFSET_StgRegTable_rHpLim
+baseRegOffset CCCS = oFFSET_StgRegTable_rCCCS
baseRegOffset CurrentTSO = oFFSET_StgRegTable_rCurrentTSO
baseRegOffset CurrentNursery = oFFSET_StgRegTable_rCurrentNursery
baseRegOffset HpAlloc = oFFSET_StgRegTable_rHpAlloc