diff options
author | Sylvain Henry <sylvain@haskus.fr> | 2020-03-18 10:44:56 +0100 |
---|---|---|
committer | Marge Bot <ben+marge-bot@smart-cactus.org> | 2020-03-29 17:28:51 -0400 |
commit | 1941ef4f050c0dfcb68229641fcbbde3a10f1072 (patch) | |
tree | 8e25a61af77696d3022d35cc277b5db5af540f03 /compiler/profiling-notes | |
parent | 1c446220250dcada51d4bb33a0cc7d8ce572e8b6 (diff) | |
download | haskell-1941ef4f050c0dfcb68229641fcbbde3a10f1072.tar.gz |
Modules: Types (#13009)
Update Haddock submodule
Metric Increase:
haddock.compiler
Diffstat (limited to 'compiler/profiling-notes')
-rw-r--r-- | compiler/profiling-notes | 303 |
1 files changed, 303 insertions, 0 deletions
diff --git a/compiler/profiling-notes b/compiler/profiling-notes new file mode 100644 index 0000000000..824d52b12a --- /dev/null +++ b/compiler/profiling-notes @@ -0,0 +1,303 @@ +Profiling Implementation Notes -- June/July/Sept 1994 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Simon and Will + +Pre-code-generator-ish +~~~~~~~~~~~~~~~~~~~~~~ + +* Automagic insertion of _sccs_ on... + + - If -fprof-auto-exported is specified, add _scc_ on each *exported* top-level definition. + NB this includes CAFs. Done by addAutoCostCentres (Core-to-Core pass). + + - If -fprof-auto-top is specified, add _scc_ on *all* top-level definitions. + Done by same pass. + + - If -fprof-auto is specified, add _scc_ on *all* definitions. + + - Always: just before code generation of module M, onto any CAF + which hasn't already got an explicit cost centre attached, pin + "AllCAFs-M". + + Done by finalStgMassageForProfiling (final STG-to-STG pass) + + Only the one-off costs of evaluating the CAFs will be attributed + to the AllCAFs-M cost centre. We hope that these costs will be + small; since the _scc_s are introduced automatically it's + confusing to attribute any significant costs to them. However if + there *are* significant one-off costs we'd better know about it. + + Why so late in the compilation process? We aren't *absolutely* + sure what is and isn't a CAF until *just* before code generation. + So we don't want to mark them as such until then. + + - Individual DICTs + + We do it in the desugarer, because that's the *only* point at + which we *know* exactly what bindings are introduced by + overloading. NB should include bindings for selected methods, eg + + f d = let op = _scc_ DICT op_sel d in + ...op...op...op + + The DICT CC ensures that: + (a) [minor] that the selection cost is separately attributed + (b) [major] that the cost of executing op is attributed to + its call site, eg + + ...(scc "a" op)...(scc "b" op)...(scc "c" op)... + +* Automagic "boxing" of higher-order args: + + finalStgMassageForProfiling (final STG-to-STG pass) + + This (as well as CAF stuff above) is really quite separate + from the other business of finalStgMassageForProfiling + (collecting up CostCentres that need to be + declared/registered). + + But throwing it all into the pot together means that we don't + have to have Yet Another STG Syntax Walker. + + Furthermore, these "boxes" are really just let-bindings that + many other parts of the compiler will happily substitute away! + Doing them at the very last instant prevents this. + + A down side of doing these so late is that we get lots of + "let"s, which if generated earlier and not substituted away, + could be floated outwards. Having them floated outwards would + lessen the chance of skewing profiling results (because of + gratuitous "let"s added by the compiler into the inner loop of + some program...). The allocation itself will be attributed to + profiling overhead; the only thing which'll be skewed is time measurement. + + So if we have, post-boxing-higher-order-args... + + _scc_ "foo" ( let f' = [f] \ [] f + in + map f' xs ) + + ... we want "foo" to be put in the thunk for "f'", but we want the + allocation cost (heap census stuff) to be attr to OVERHEAD. + + As an example of what could be improved + f = _scc_ "f" (g h) + To save dynamic allocation, we could have a static closure for h: + h_inf = _scc_ "f" h + f = _scc_ "f" (g h_inf) + + + + + +Code generator-ish +~~~~~~~~~~~~~~~~~~ + +(1) _Entry_ code for a closure *usually* sets CC from the closure, + at the fast entry point + + Exceptions: + + (a) Top-level subsumed functions (i.e., w/ no _scc_ on them) + + Refrain from setting CC from the closure + + (b) Constructors + + Again, refrain. (This is *new*) + + Reasons: (i) The CC will be zapped very shortly by the restore + of the enclosing CC when we return to the eval'ing "case". + (ii) Any intervening updates will indirect to this existing + constructor (...mumble... new update mechanism... mumble...) + +(2) "_scc_ cc expr" + + Set current CC to "cc". + No later "restore" of the previous CC is reqd. + +(3) "case e of { ...alts... }" expression (eval) + + Save CC before eval'ing scrutinee + Restore CC at the start of the case-alternative(s) + +(4) _Updates_ : updatee gets current CC + + (???? not sure this is OK yet 94/07/04) + + Reasons: + + * Constructors : want to be insensitive to return-in-heap vs + return-in-regs. For example, + + f x = _scc_ "f" (x, x) + + The pair (x,x) would get CC of "f" if returned-in-heap; + therefore, updatees should get CC of "f". + + * PAPs : Example: + + f x = _scc_ "f" (let g = \ y -> ... in g) + + At the moment of update (updatePAP?), CC is "f", which + is what we want to set it to if the "updatee" is entered + + When we enter the PAP ("please put the arguments back so I can + use them"), we restore the setup as at the moment the + arg-satisfaction check failed. + + Be careful! UPDATE_PAP is called from the arg-satis check, + which is before the fast entry point. So the cost centre + won't yet have been set from the closure which has just + been entered. Solution: in UPDATE_PAP see if the cost centre inside + the function closure which is being entered is "SUB"; if so, use + the current cost centre to update the updatee; otherwise use that + inside the function closure. (See the computation of cc_pap + in rule 16_l for lexical semantics.) + + +(5) CAFs + +CAFs get their own cost centre. Ie + + x = e +is transformed to + x = _scc_ "CAF:x" e + +Or sometimes we lump all the CAFs in a module together. +(Reporting issue or code-gen issue?) + + + +Hybrid stuff +~~~~~~~~~~~~ + +The problem: + + f = _scc_ "CAF:f" (let g = \xy -> ... + in (g,g)) + +Now, g has cost-centre "CAF:f", and is returned as part of +the result. So whenever the function embedded in the result +is called, the costs will accumulate to "CAF:f". This is +particularly (de)pressing for dictionaries, which contain lots +of functions. + +Solution: + + A. Whenever in case (1) above we would otherwise "set the CC from the + closure", we *refrain* from doing so if + (a) the closure is a function, not a thunk; and + (b) the cost-centre in the closure is a CAF cost centre. + + B. Whenever we enter a thunk [at least, one which might return a function] + we save the current cost centre in the update frame. Then, UPDATE_PAP + restores the saved cost centre from the update frame iff the cost + centre at the point of update (cc_pap in (4) above) is a CAF cost centre. + + It isn't necessary to save and possibly-restore the cost centre for + thunks which will certainly return a constructor, because the + cost centre is about to be restored anyway by the enclosing case. + +Both A and B are runtime tests. For A, consider: + + f = _scc_ "CAF:f" (g 2) + + h y = _scc_ "h" g (y+y) + + g x = let w = \p -> ... + in (w,w) + + +Now, in the call to g from h, the cost-centre on w will be "h", and +indeed all calls to the result of the call should be attributed to +"h". + + ... _scc_ "x1" (let (t,_) = h 2 in t 3) ... + + Costs of executing (w 3) attributed to "h". + +But in the call to g from f, the cost-centre on w will be +"CAF:f", and calls to w should be attributed to the call site. + + ..._scc_ "x2" (let (t,_) = f in t 3)... + + Costs of executing (w 3) attributed to "x2". + + + Remaining problem + +Consider + + _scc_ "CAF:f" (if expensive then g 2 else g 3) + +where g is a function with arity 2. In theory we should +restore the enclosing cost centre once we've reduced to +(g 2) or (g 3). In practice this is pretty tiresome; and pretty rare. + +A quick fix: given (_scc_ "CAF" e) where e might be function-valued +(in practice we usually know, because CAF sccs are top level), transform to + + _scc_ "CAF" (let f = e in f) + + + + + +============ + +scc cc x ===> x + + UNLESS + +(a) cc is a user-defined, non-dup'd cost + centre (so we care about entry counts) + +OR + +(b) cc is not a CAF/DICT cost centre and x is top-level subsumed + function. + [If x is lambda/let bound it'll have a cost centre + attached dynamically.] + + To repeat, the transformation is OK if + x is a not top-level subsumed function + OR + cc is a CAF/DICT cost centre and x is a top-level + subsumed function + + + +(scc cc e) x ===> (scc cc e x) + + OK????? IFF + +cc is not CAF/DICT --- remains to be proved!!!!!! +True for lex +False for eval +Can we tell which in hybrid? + +eg Is this ok? + + (scc "f" (scc "CAF" (\x.b))) y ==> (scc "f" (scc "CAF" (\x.b) y)) + + +\x -> (scc cc e) ===> (scc cc \x->e) + + OK IFF cc is not CAF/DICT + + +scc cc1 (scc cc2 e)) ===> scc cc2 e + + IFF not interested in cc1's entry count + AND cc2 is not CAF/DICT + +(scc cc1 ... (scc cc2 e) ...) ===> (scc cc1 ... e ...) + + IFF cc2 is CAF/DICT + AND e is a lambda not appearing as the RHS of a let + OR + e is a variable not bound to SUB + + |