1 files changed, 303 insertions, 0 deletions
diff --git a/compiler/profiling-notes b/compiler/profiling-notes
new file mode 100644
index 0000000000..824d52b12a
--- /dev/null
+++ b/compiler/profiling-notes
@@ -0,0 +1,303 @@
+Profiling Implementation Notes -- June/July/Sept 1994
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Simon and Will
+
+Pre-code-generator-ish
+~~~~~~~~~~~~~~~~~~~~~~
+
+* Automagic insertion of _sccs_ on...
+
+  - If -fprof-auto-exported is specified, add _scc_ on each *exported* top-level definition. 
+    NB this includes CAFs.  Done by addAutoCostCentres (Core-to-Core pass).
+
+  - If -fprof-auto-top is specified, add _scc_ on *all* top-level definitions.
+    Done by same pass.
+
+  - If -fprof-auto is specified, add _scc_ on *all* definitions.
+
+  - Always: just before code generation of module M, onto any CAF
+    which hasn't already got an explicit cost centre attached, pin
+    "AllCAFs-M".
+
+    Done by finalStgMassageForProfiling (final STG-to-STG pass)
+
+    Only the one-off costs of evaluating the CAFs will be attributed
+    to the AllCAFs-M cost centre.  We hope that these costs will be
+    small; since the _scc_s are introduced automatically it's
+    confusing to attribute any significant costs to them.  However if
+    there *are* significant one-off costs we'd better know about it.
+
+    Why so late in the compilation process?  We aren't *absolutely*
+    sure what is and isn't a CAF until *just* before code generation.
+    So we don't want to mark them as such until then.
+
+  - Individual DICTs
+
+    We do it in the desugarer, because that's the *only* point at
+    which we *know* exactly what bindings are introduced by
+    overloading.  NB should include bindings for selected methods, eg
+
+	f d = let op = _scc_ DICT op_sel d in
+	      ...op...op...op
+
+    The DICT CC ensures that:
+    (a) [minor] that the selection cost is separately attributed
+    (b) [major] that the cost of executing op is attributed to
+	its call site, eg
+
+	...(scc "a" op)...(scc "b" op)...(scc "c" op)...
+
+* Automagic "boxing" of higher-order args:
+
+	finalStgMassageForProfiling (final STG-to-STG pass)
+
+	This (as well as CAF stuff above) is really quite separate
+	from the other business of finalStgMassageForProfiling
+	(collecting up CostCentres that need to be
+	declared/registered).
+	
+	But throwing it all into the pot together means that we don't
+	have to have Yet Another STG Syntax Walker.
+
+	Furthermore, these "boxes" are really just let-bindings that
+	many other parts of the compiler will happily substitute away!
+	Doing them at the very last instant prevents this.
+
+	A down side of doing these so late is that we get lots of
+	"let"s, which if generated earlier and not substituted away,
+	could be floated outwards.  Having them floated outwards would
+	lessen the chance of skewing profiling results (because of
+	gratuitous "let"s added by the compiler into the inner loop of
+	some program...).  The allocation itself will be attributed to
+	profiling overhead; the only thing which'll be skewed is time measurement.
+
+	So if we have, post-boxing-higher-order-args...
+
+	    _scc_ "foo" ( let f' = [f] \ [] f
+			  in
+			  map f' xs )
+
+	... we want "foo" to be put in the thunk for "f'", but we want the
+	allocation cost (heap census stuff) to be attr to OVERHEAD.
+
+	As an example of what could be improved
+		f = _scc_ "f" (g h)
+	To save dynamic allocation, we could have a static closure for h:
+		h_inf = _scc_ "f" h
+		f = _scc_ "f" (g h_inf)
+	
+
+	
+
+
+Code generator-ish
+~~~~~~~~~~~~~~~~~~
+
+(1) _Entry_ code for a closure *usually* sets CC from the closure,
+		 at the fast entry point
+
+    Exceptions:
+
+    (a) Top-level subsumed functions (i.e., w/ no _scc_ on them)
+
+	Refrain from setting CC from the closure
+
+    (b) Constructors
+
+	Again, refrain.  (This is *new*)
+	
+	Reasons: (i) The CC will be zapped very shortly by the restore
+	of the enclosing CC when we return to the eval'ing "case".
+	(ii) Any intervening updates will indirect to this existing
+	constructor (...mumble... new update mechanism... mumble...)
+
+(2) "_scc_ cc expr"
+
+    Set current CC to "cc".  
+    No later "restore" of the previous CC is reqd.
+
+(3) "case e of { ...alts... }" expression (eval)
+
+    Save CC before eval'ing scrutinee
+    Restore CC at the start of the case-alternative(s)
+
+(4) _Updates_ : updatee gets current CC
+
+    (???? not sure this is OK yet 94/07/04)
+
+    Reasons:
+
+    * Constructors : want to be insensitive to return-in-heap vs
+	return-in-regs.  For example,
+
+	f x = _scc_ "f" (x, x)
+
+	The pair (x,x) would get CC of "f" if returned-in-heap;
+	therefore, updatees should get CC of "f".
+
+    * PAPs : Example:
+
+	f x = _scc_ "f" (let g = \ y -> ... in g)
+
+	At the moment of update (updatePAP?), CC is "f", which
+	is what we want to set it to if the "updatee" is entered
+
+    	When we enter the PAP ("please put the arguments back so I can
+	use them"), we restore the setup as at the moment the
+	arg-satisfaction check failed.
+
+        Be careful!  UPDATE_PAP is called from the arg-satis check,
+	which is before the fast entry point.  So the cost centre
+	won't yet have been set from the closure which has just
+	been entered.  Solution: in UPDATE_PAP see if the cost centre inside
+	the function closure which is being entered is "SUB"; if so, use
+	the current cost centre to update the updatee; otherwise use that
+	inside the function closure.  (See the computation of cc_pap
+	in rule 16_l for lexical semantics.)
+
+
+(5) CAFs
+
+CAFs get their own cost centre.  Ie
+
+			x = e
+is transformed to
+			x = _scc_ "CAF:x" e
+
+Or sometimes we lump all the CAFs in a module together.
+(Reporting issue or code-gen issue?)
+
+
+
+Hybrid stuff
+~~~~~~~~~~~~
+
+The problem:
+
+  f = _scc_ "CAF:f" (let g = \xy -> ...
+	 	   in (g,g))
+
+Now, g has cost-centre "CAF:f", and is returned as part of
+the result.  So whenever the function embedded in the result
+is called, the costs will accumulate to "CAF:f".  This is
+particularly (de)pressing for dictionaries, which contain lots
+of functions.
+
+Solution: 
+
+  A.  Whenever in case (1) above we would otherwise "set the CC from the
+  closure", we *refrain* from doing so if 
+	(a) the closure is a function, not a thunk; and
+	(b) the cost-centre in the closure is a CAF cost centre.
+
+  B.  Whenever we enter a thunk [at least, one which might return a function]
+  we save the current cost centre in the update frame.  Then, UPDATE_PAP
+  restores the saved cost centre from the update frame iff the cost
+  centre at the point of update (cc_pap in (4) above) is a CAF cost centre.
+
+  It isn't necessary to save and possibly-restore the cost centre for
+  thunks which will certainly return a constructor, because the 
+  cost centre is about to be restored anyway by the enclosing case.
+
+Both A and B are runtime tests.  For A, consider:
+
+  f = _scc_ "CAF:f" (g 2)
+
+  h y = _scc_ "h" g (y+y)
+
+  g x = let w = \p -> ...
+	in (w,w)
+
+
+Now, in the call to g from h, the cost-centre on w will be "h", and
+indeed all calls to the result of the call should be attributed to
+"h".  
+
+  ... _scc_ "x1" (let (t,_) = h 2 in t 3) ...
+
+  Costs of executing (w 3) attributed to "h".
+
+But in the call to g from f, the cost-centre on w will be
+"CAF:f", and calls to w should be attributed to the call site.
+
+  ..._scc_ "x2" (let (t,_) = f in t 3)...
+
+  Costs of executing (w 3) attributed to "x2".
+
+
+	Remaining problem
+
+Consider
+
+	_scc_ "CAF:f" (if expensive then g 2 else g 3)
+
+where g is a function with arity 2.  In theory we should
+restore the enclosing cost centre once we've reduced to
+(g 2) or (g 3).  In practice this is pretty tiresome; and pretty rare.
+
+A quick fix: given (_scc_ "CAF" e) where e might be function-valued
+(in practice we usually know, because CAF sccs are top level), transform to
+
+	_scc_ "CAF" (let f = e in f)
+
+
+
+
+
+============
+
+scc cc x  ===>   x
+
+  UNLESS
+
+(a)  cc is a user-defined, non-dup'd cost 
+     centre (so we care about entry counts)
+
+OR
+
+(b) cc is not a CAF/DICT cost centre and x is top-level subsumed
+     function.
+	[If x is lambda/let bound it'll have a cost centre
+	 attached dynamically.]
+
+	To repeat, the transformation is OK if 
+		x is a not top-level subsumed function
+	OR	
+		cc is a CAF/DICT cost centre and x is a top-level
+		subsumed function
+
+
+
+(scc cc e) x  ===>  (scc cc e x)
+
+	OK????? IFF
+
+cc is not CAF/DICT  --- remains to be proved!!!!!!
+True for lex
+False for eval
+Can we tell which in hybrid?
+
+eg  Is this ok?
+
+	(scc "f" (scc "CAF" (\x.b))) y ==>   (scc "f" (scc "CAF" (\x.b) y))
+
+
+\x -> (scc cc e)    ===>   (scc cc \x->e)
+
+	OK IFF cc is not CAF/DICT
+
+
+scc cc1 (scc cc2 e))   ===>  scc cc2 e
+
+	IFF not interested in cc1's entry count
+	AND cc2 is not CAF/DICT
+
+(scc cc1 ... (scc cc2 e) ...)   ===>  (scc cc1 ... e ...)
+
+	IFF cc2 is CAF/DICT
+	AND e is a lambda not appearing as the RHS of a let
+	    OR
+	    e is a variable not bound to SUB
+
+