compiler/profiling-notes


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303

Profiling Implementation Notes -- June/July/Sept 1994
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Simon and Will

Pre-code-generator-ish
~~~~~~~~~~~~~~~~~~~~~~

* Automagic insertion of _sccs_ on...

  - If -fprof-auto-exported is specified, add _scc_ on each *exported* top-level definition. 
    NB this includes CAFs.  Done by addAutoCostCentres (Core-to-Core pass).

  - If -fprof-auto-top is specified, add _scc_ on *all* top-level definitions.
    Done by same pass.

  - If -fprof-auto is specified, add _scc_ on *all* definitions.

  - Always: just before code generation of module M, onto any CAF
    which hasn't already got an explicit cost centre attached, pin
    "AllCAFs-M".

    Done by finalStgMassageForProfiling (final STG-to-STG pass)

    Only the one-off costs of evaluating the CAFs will be attributed
    to the AllCAFs-M cost centre.  We hope that these costs will be
    small; since the _scc_s are introduced automatically it's
    confusing to attribute any significant costs to them.  However if
    there *are* significant one-off costs we'd better know about it.

    Why so late in the compilation process?  We aren't *absolutely*
    sure what is and isn't a CAF until *just* before code generation.
    So we don't want to mark them as such until then.

  - Individual DICTs

    We do it in the desugarer, because that's the *only* point at
    which we *know* exactly what bindings are introduced by
    overloading.  NB should include bindings for selected methods, eg

	f d = let op = _scc_ DICT op_sel d in
	      ...op...op...op

    The DICT CC ensures that:
    (a) [minor] that the selection cost is separately attributed
    (b) [major] that the cost of executing op is attributed to
	its call site, eg

	...(scc "a" op)...(scc "b" op)...(scc "c" op)...

* Automagic "boxing" of higher-order args:

	finalStgMassageForProfiling (final STG-to-STG pass)

	This (as well as CAF stuff above) is really quite separate
	from the other business of finalStgMassageForProfiling
	(collecting up CostCentres that need to be
	declared/registered).
	
	But throwing it all into the pot together means that we don't
	have to have Yet Another STG Syntax Walker.

	Furthermore, these "boxes" are really just let-bindings that
	many other parts of the compiler will happily substitute away!
	Doing them at the very last instant prevents this.

	A down side of doing these so late is that we get lots of
	"let"s, which if generated earlier and not substituted away,
	could be floated outwards.  Having them floated outwards would
	lessen the chance of skewing profiling results (because of
	gratuitous "let"s added by the compiler into the inner loop of
	some program...).  The allocation itself will be attributed to
	profiling overhead; the only thing which'll be skewed is time measurement.

	So if we have, post-boxing-higher-order-args...

	    _scc_ "foo" ( let f' = [f] \ [] f
			  in
			  map f' xs )

	... we want "foo" to be put in the thunk for "f'", but we want the
	allocation cost (heap census stuff) to be attr to OVERHEAD.

	As an example of what could be improved
		f = _scc_ "f" (g h)
	To save dynamic allocation, we could have a static closure for h:
		h_inf = _scc_ "f" h
		f = _scc_ "f" (g h_inf)
	

Code generator-ish
~~~~~~~~~~~~~~~~~~

(1) _Entry_ code for a closure *usually* sets CC from the closure,
		 at the fast entry point

    Exceptions:

    (a) Top-level subsumed functions (i.e., w/ no _scc_ on them)

	Refrain from setting CC from the closure

    (b) Constructors

	Again, refrain.  (This is *new*)
	
	Reasons: (i) The CC will be zapped very shortly by the restore
	of the enclosing CC when we return to the eval'ing "case".
	(ii) Any intervening updates will indirect to this existing
	constructor (...mumble... new update mechanism... mumble...)

(2) "_scc_ cc expr"

    Set current CC to "cc".  
    No later "restore" of the previous CC is reqd.

(3) "case e of { ...alts... }" expression (eval)

    Save CC before eval'ing scrutinee
    Restore CC at the start of the case-alternative(s)

(4) _Updates_ : updatee gets current CC

    (???? not sure this is OK yet 94/07/04)

    Reasons:

    * Constructors : want to be insensitive to return-in-heap vs
	return-in-regs.  For example,

	f x = _scc_ "f" (x, x)

	The pair (x,x) would get CC of "f" if returned-in-heap;
	therefore, updatees should get CC of "f".

    * PAPs : Example:

	f x = _scc_ "f" (let g = \ y -> ... in g)

	At the moment of update (updatePAP?), CC is "f", which
	is what we want to set it to if the "updatee" is entered

    	When we enter the PAP ("please put the arguments back so I can
	use them"), we restore the setup as at the moment the
	arg-satisfaction check failed.

        Be careful!  UPDATE_PAP is called from the arg-satis check,
	which is before the fast entry point.  So the cost centre
	won't yet have been set from the closure which has just
	been entered.  Solution: in UPDATE_PAP see if the cost centre inside
	the function closure which is being entered is "SUB"; if so, use
	the current cost centre to update the updatee; otherwise use that
	inside the function closure.  (See the computation of cc_pap
	in rule 16_l for lexical semantics.)


(5) CAFs

CAFs get their own cost centre.  Ie

			x = e
is transformed to
			x = _scc_ "CAF:x" e

Or sometimes we lump all the CAFs in a module together.
(Reporting issue or code-gen issue?)


Hybrid stuff
~~~~~~~~~~~~

The problem:

  f = _scc_ "CAF:f" (let g = \xy -> ...
	 	   in (g,g))

Now, g has cost-centre "CAF:f", and is returned as part of
the result.  So whenever the function embedded in the result
is called, the costs will accumulate to "CAF:f".  This is
particularly (de)pressing for dictionaries, which contain lots
of functions.

Solution: 

  A.  Whenever in case (1) above we would otherwise "set the CC from the
  closure", we *refrain* from doing so if 
	(a) the closure is a function, not a thunk; and
	(b) the cost-centre in the closure is a CAF cost centre.

  B.  Whenever we enter a thunk [at least, one which might return a function]
  we save the current cost centre in the update frame.  Then, UPDATE_PAP
  restores the saved cost centre from the update frame iff the cost
  centre at the point of update (cc_pap in (4) above) is a CAF cost centre.

  It isn't necessary to save and possibly-restore the cost centre for
  thunks which will certainly return a constructor, because the 
  cost centre is about to be restored anyway by the enclosing case.

Both A and B are runtime tests.  For A, consider:

  f = _scc_ "CAF:f" (g 2)

  h y = _scc_ "h" g (y+y)

  g x = let w = \p -> ...
	in (w,w)


Now, in the call to g from h, the cost-centre on w will be "h", and
indeed all calls to the result of the call should be attributed to
"h".  

  ... _scc_ "x1" (let (t,_) = h 2 in t 3) ...

  Costs of executing (w 3) attributed to "h".

But in the call to g from f, the cost-centre on w will be
"CAF:f", and calls to w should be attributed to the call site.

  ..._scc_ "x2" (let (t,_) = f in t 3)...

  Costs of executing (w 3) attributed to "x2".


	Remaining problem

Consider

	_scc_ "CAF:f" (if expensive then g 2 else g 3)

where g is a function with arity 2.  In theory we should
restore the enclosing cost centre once we've reduced to
(g 2) or (g 3).  In practice this is pretty tiresome; and pretty rare.

A quick fix: given (_scc_ "CAF" e) where e might be function-valued
(in practice we usually know, because CAF sccs are top level), transform to

	_scc_ "CAF" (let f = e in f)


============

scc cc x  ===>   x

  UNLESS

(a)  cc is a user-defined, non-dup'd cost 
     centre (so we care about entry counts)

OR

(b) cc is not a CAF/DICT cost centre and x is top-level subsumed
     function.
	[If x is lambda/let bound it'll have a cost centre
	 attached dynamically.]

	To repeat, the transformation is OK if 
		x is a not top-level subsumed function
	OR	
		cc is a CAF/DICT cost centre and x is a top-level
		subsumed function


(scc cc e) x  ===>  (scc cc e x)

	OK????? IFF

cc is not CAF/DICT  --- remains to be proved!!!!!!
True for lex
False for eval
Can we tell which in hybrid?

eg  Is this ok?

	(scc "f" (scc "CAF" (\x.b))) y ==>   (scc "f" (scc "CAF" (\x.b) y))


\x -> (scc cc e)    ===>   (scc cc \x->e)

	OK IFF cc is not CAF/DICT


scc cc1 (scc cc2 e))   ===>  scc cc2 e

	IFF not interested in cc1's entry count
	AND cc2 is not CAF/DICT

(scc cc1 ... (scc cc2 e) ...)   ===>  (scc cc1 ... e ...)

	IFF cc2 is CAF/DICT
	AND e is a lambda not appearing as the RHS of a let
	    OR
	    e is a variable not bound to SUB