diff options
Diffstat (limited to 'docs/storage-mgt/rp.tex')
-rw-r--r-- | docs/storage-mgt/rp.tex | 1102 |
1 files changed, 1102 insertions, 0 deletions
diff --git a/docs/storage-mgt/rp.tex b/docs/storage-mgt/rp.tex new file mode 100644 index 0000000000..2055894282 --- /dev/null +++ b/docs/storage-mgt/rp.tex @@ -0,0 +1,1102 @@ +\documentclass{article} +\usepackage{code,a4wide} + +\usepackage{graphics,epsfig,epic,eepic,epsfig} + +\setlength{\parskip}{0.25cm} +\setlength{\parsep}{0.25cm} +\setlength{\topsep}{0cm} +\setlength{\parindent}{0cm} +\renewcommand{\textfraction}{0.2} +\renewcommand{\floatpagefraction}{0.7} + + +% Terminology +\newcommand{\block}{block} +\newcommand{\Block}{Block} +\newcommand{\segment}{segment} +\newcommand{\Segment}{Segment} +\newcommand{\step}{step} +\newcommand{\Step}{Step} + +\newcommand{\note}[1]{{\em $\spadesuit$ #1}} + +\begin{document} +\title{Implementation of Retainer Profiling} +\author{Sungwoo Park and Simon Peyton-Jones} + +\makeatactive +\maketitle + +\section{Retainer Profiling} + +Retainer profiling~\cite{CN} is a profiling technique which is based upon a +special view of production and consumption of heap objects at runtime: +while producers build heap objects to form new pieces of graph, +consumers hold pointers to these heap objects, or \emph{retain} them, so +that they are not freed during garbage collections. +On this basis, we refereed to such consumers as \emph{retainers}. +Notice that an object can have more than one retainer because it can +be pointed to by multiple objects. + +For each live object in the heap, retainer profiling computes +all its retainers, or its \emph{retainer set}. +A naive implementation of retainer profiling could consider every +immediate ancestor of an object as its retainer. +Although this approach appears to provide an accurate report on the +relationship between retainers and retainees, the result can hardly be useful. +For instance, it is pointless to scrutinize a list and treat each cons +cell as a retainer of the following cons cell. +This observation suggests that we need to have a way of designating only +certain types of objects as candidates for retainers. +In other words, we need to employ an oracle which tells whether a given +object can be a retainer or not. + +Since no retainer of a particular object needs to be using the +object actively, we can find all its retainers simply by traversing +the graph. In other words, we do not have to distinguish those retainers +actively exploiting it from other retainers just holding pointers +to it (either directly or indirectly). +Thus, retainer profiling can be accomplished simply by traversing the +graph. + +Figure~\ref{fig-retaineralgorithm} shows the algorithm for retainer +profiling. The traversal begins at every root, and proceeds +in a depth first manner (or a breadth first manner). +The function @R()@ returns the \emph{identity} of a given retainer such +as its information table address or +the name of the module which creates it. +Notice that the retainer identity function does not need to be a +one-to-one mapping: +multiple objects can share the same identity. +Such a retainer identity function reduces the cost of traversal. +For instance, when an object +is reached from several retainers which share the same identity, we need to +consider only the first visit to the object. +In other words, whichever retainer (among those sharing the same identity) +leads to the object for the first time affects the retainer set of the object +in consideration +and all the other retainers can be ignored. +Thus, the more coarse the function @R()@ is, the less +it costs to traverse the graph for retainer profiling. +The function @isRetainer()@ tells whether a given object is a retainer or not. +Hence, the behavior of the retainer profiling algorithm is parameterized +over: 1) the set of roots; 2) the function @R()@; 3) the function +@isRetainer()@. + +One important invariant on the function @R()@ is that its return value +must be consistent for a given retainer. In other words, @R()@ must return +the same value for a given retainer no matter it is invoked. +For this reason, the memory address of a retainer, for instance, cannot be used as +its retainer identity because its location may change during garbage collections. + +\begin{figure}[ht] +\begin{center} +\begin{code} +for every root r + retain(r, r) + +R(r) = + the identity of r + +isRetainer(c) = + if c is a retainer then + true + else + false + +retain(c, r) = + if R(r) is a member of c.retainerSet then + return + add R(r) to c.retainerSet + if isRetainer(c) then + r' := c + else + r' := r + for every successor c' of c + retain(c', r') +\end{code} +\caption{Retainer profiling algorithm} +\label{fig-retaineralgorithm} +\end{center} +\end{figure} + +Another way of formulating retainer profiling is in terms of fixed point +equations on retainer sets. +To be specific, given the two functions @isRetainer()@ and @R()@, +the retainer set of every object is computed as the least fixed point +solution of the following equations: +\begin{itemize} +\item For every root @r@, +\begin{center} + @R(r)@ $\in$ @r.retainerSet@. +\end{center} +\item For every reachable object @c@, +\begin{center} +$\bigcup_{\mathit{each\ ancestor\ @a@\ of\ @c@}}$ @from(a)@ $\subseteq$ +@c.retainerSet@ +\end{center} +where @from(a)@ returns retainer(s) obtainable from @a@: +\begin{center} +@from(a)@ = if @isRetainer(a)@ then $\{@a@\}$ else @a.retainerSet@. +\end{center} +\end{itemize} + +This document describes the implementation of retainer profiling on +the Glasgow Haskell Compiler runtime system. +It explains every detail of the development so that it can be (hopefully) +a complete maintenance guide. +A secondary goal is to help (hopefully) those who wish to extend the system +to implement another profiling scheme.\footnote{Unless otherwise mentioned, +all identifiers are defined in @RetainerProfile.c@ or @RetainerSet.c@.} + +\section{Installing the GHC} + +Installing the GHC is done as follows: + +\begin{enumerate} +\item Get the source code from the CVS repository. +\begin{code} +./ cvs checkout fpconfig +./fptools/ cvs update -d CONTRIB common-rts distrib docs ghc glafp-utils + hslibs literate mhms mk nofib testsuite +\end{code} + +\item Set up the basic configuration. +\begin{code} +./fptools/ autoconf +./fptools/ghc/ autoconf +./fptools/ configure +\end{code} + +\item Set up the configuration for development and debugging. +\begin{code} +./fptools/mk vi build.mk + GhcHcOpts = -O -fasm -Rghc-timing + SplitObjs = NO + GhcRtsHcOpts = + GhcRtsCcOpts = -g + STRIP =: +\end{code} +@GhcLibWays@ tells the compiler to build the code for profiling as well. +@GhcRtsHcOpts@ has additional flags for @gcc@ when compiling @.hc@ files. +@GhcRtsCcOpts@ has additional flags for @gcc@ when compiling @.c@ files. +Since we will implement retainer profiling in @.c@ files, we turn on the +debugging flag @-g@. +The empty setting for @STRIP@ tells the compiler not to remove source code +information (generated due to the @-g@ option) from executable files so that +they can be examined with @gdb@. + +\item Remove unnecessary files if needed and build everything. +\begin{code} +./fptools/ make +\end{code} +\end{enumerate} + +\section{Adding Retainer Set Fields} + +Since every Haskell closure now needs to store its retainer set at runtime, +it must be augmented with a new field, +namely, a \emph{retainer set field}. +This section explains how to add such a field to Haskell closures. +It should be clear how to generalize the idea for adding +any number of new fields.\footnote{The GHC provides two +ways of building executable programs from +source files: normal way and profiling way. +We are concerned only about profiling way, and all the pieces of code +implementing profiling way are wrapped by the @PROFILING@ +pre-processing directive (as in @\#ifdef PROFILING@). +Therefore, all the additions and changes that we make to the source code +are assumed to be wrapped by the @PROFILING@ pre-processing +directive as well unless otherwise mentioned.} + +\subsection{Adding a new field to Haskell closures} + +We want to add a retainer set field of type @retainerSet@ to every +closure, so we create a new file @includes/StgRetainerProf.h@ where +we define the type @retainerSet@. +The actual definition of @retainerSet@ will be given later. + +\begin{code} +/* includes/StgRetainerProf.h */ +typedef ... retainerSet; +\end{code} + +We make type @retainerSet@ to be publicly available by including +@includes/StgRetainerProf.h@ itself to @includes/Stg.h@ (not wrapped +by @PROFILING@). + +\begin{code} +/* includes/Stg.h */ +#include "StgRetainerProf.h" +\end{code} + +Then we add a retainer set field @rs@ to the @StgProfHeader@ structure. + +\begin{code} +/* include/Closures.h */ +typedef struct { + CostCentreStack *ccs; + retainerSet *rs; +} StgProfHeader; +\end{code} + +Now every closure @c@ (including static closures) has a retainer set field, +which can be accessed with @c->header.prof.rs@ (where @c@ denotes the +address of the closure). + +\subsection{Changing constants} + +We are ready to reflect the new size of Haskell closures to other part +of the source code. +This is accomplished by changing a few constants which specify the size +of certain types of closures and their layout. + +When building the runtime system, the @gcc@ compiler correctly figures out +the size of every structure on its own. +However, +GHC simply reads @includes/Constants.h@ to to determine the size of +closures assumed by the runtime system. +Thus, we must change the constants used by the GHC itself (as opposed to +the runtime system). They are all found in @includes/Constants.h@. +We increase each of them by 1 to reflect the retainer set field which is one +word long: +\begin{code} +/* includes/Constants.h */ +#define PROF_HDR_SIZE 2 +#define SCC_UF_SIZE 5 +#define SCC_SEQ_FRAME_SIZE 4 +\end{code} +@PROF_HDR_SIZE@ denotes the size of the structure @StgProfHeader@, which +is now two words long. +@SCC_UF_SIZE@ and @SCC_SEQ_FRAME_SIZE@ denote the size of the structures +@StgUpdateFrame@ and @StgSeqFrame@ (in @includes/Closures.h@) in +words. + +Now we must rebuild the GHC so that, when executed, the code generated by +the GHC must now allocate one more word for the retainer set field than before. + +\begin{code} +./fptools/ghc/ make boot +./fptools/ghc/ make +\end{code} + +The second command @make boot@ instructs the build system to analyze +the source code dependency so that the next execution of @make@ correctly +finds all required files. + +Next we change four bitmap constants which specify the layout of +certain types of closures. +As an example, let us consider @RET_BITMAP@, which specifies the layout +of thunk selectors (corresponding to closure type @THUNK_SELECTOR@). +Without a retainer set field, there is only one non-pointer (represented +by $1$) followed by one or more pointers (represented by $0$) in the closure +body and the bitmap representation is $0b1$, or $1$. +With a retainer set field, which is not a pointer to another closure and thus +represented by $1$, there are two non-pointers, and the bitmap representation +is $0b11$, or $3$. Notice that the bitmap is interpreted in reverse order: +the least significant bit corresponds to the first word in the closure body, +and the second least significant bit to the second word, and so forth. +The same rule applies to the other three bitmap constants: +@CATCH_FRAME_BITMAP@ (for closure type @CATCH_FRAME@ and structure +@StgCatchFrame@), +@STOP_THREAD_BITMAP@ (for closure type @STOP_FRAME@ and structure +@StgStopFrame@), and +@UPD_FRAME_BITMAP@ (for closure type @UPDATE_FRAME@ and structure +@StgUpdateFrame@). + +\begin{code} +/* rts/StgStdThunks.hc */ +#define RET_BITMAP 3 +/* rts/Exception.hc */ +#define CATCH_FRAME_BITMAP 15 +/* rts/StgStartup.hc */ +#define STOP_THREAD_BITMAP 3 +/* rts/updates.hc */ +#define UPD_FRAME_BITMAP 7 +\end{code} + +For most closure types, the new definition of @StgProfHeader@ is +automatically propagated to their corresponding structures. +However, there are six closures types which are not affected by +@StgProfHeader@. They are all stack closures: +@RET_DYN@, @RET_BCO@, @RET_SMALL@, @RET_VEC_SMALL@, @RET_BIG@, and +@RET_VEC_BIG@. +If you want a new field to be added to these closures, you may +have to modify their corresponding structures. + +\textbf{To do:} Presently the above changes introduce two bug in the +runtime system. +First, @nofib/real/symalg@ ends up with a division-by-zero +exception if we add a new field. +Second, the runtime system option @-auto-all@ clashes in some test files +in the @nofib@ testing suite (e.g., @spectral/expert@). + +\subsection{Initialization code} + +When a new closure is allocated, its retainer set field may have to be +initialized according to the way that retainer profiling is implemented. +For instance, we could use as an initial value a pointer to an empty retainer +set. +Alternatively we could assign a null pointer to indicate that its retainer +set has not been computed yet, which we adopt in our implementation. +In either case, we have to visit the new closure and execute some initialization +code on it so that its retainer set field is set to an appropriate value. + +There are three parts in the source code which need to be modified. +dynamic closure initialization, static closure initialization, +and update frame initialization. +The first is accomplished by modifying the macro @SET_PROF_HDR()@ (in +@include/ClosureMacros.h@). When a closure @c@ is created at runtime, +@SET_PROF_HDR()@ is invoked immediately with @c@ as its first argument. +Thus, the following code initializes the retainer set field of every +dynamic closure to a null pointer. + +\begin{code} +/* include/ClosureMacros.h */ +#define SET_PROF_HDR(c,ccs_) \ + ((c)->header.prof.ccs = ccs_, (c)->header.prof.rs = NULL) +\end{code} + +Similarly, the macro @SET_STATIC_PROF_HDR()@ (in the +same file) specifies how the retainer set field of every static closure +is initialized, which is rewritten as follows: + +\begin{code} +/* include/ClosureMacros.h */ +#define SET_STATIC_PROF_HDR(ccs_) \ + prof : { ccs : ccs_, rs : NULL }, +\end{code} + +\textbf{Obsolete:} Dynamic closures created through explicit C function invocations +(in @RtsAPI.c@) are now initialized by @SET_HDR()@. + +%There is another way of creating dynamic closures through explicit C +%function invocations at runtime. +%Such functions are all defined in @RtsAPI.c@: @rts_mkChar()@, @rts_mkInt()@, +%@rts_mkWord()@, and so forth. +%Each function allocates memory for a new closure, +%initializes it, and returns its address. +%Therefore, we can simply insert in each function another initialization code +%for retainer set fields. +%To this end, we define a macro @setRetainerField()@ and insert it +%in each function: +% +%\begin{code} +%#define setRetainerField(p) \ +% (p)->header.prof.rs = NULL +%\end{code} +% +%For instance, @rts_mkChar()@ is now defined as follows: +% +%\begin{code} +%/* RtsAPI.c */ +%HaskellObj +%rts_mkChar (HsChar c) +%{ +% StgClosure *p = (StgClosure *)allocate(CONSTR_sizeW(0,1)); +% ... +% setRetainerField(p); +% return p; +%} +%\end{code} + +Finally we may need to initialize the retainer set field of an update frame +(stack closure) when it is pushed onto the stack for the first time. +For instance, if we want to initialize the retainer set field of update +frames to a null pointer, we can rewrite the macro @PUSH_STD_CCCS()@ +(in @includes/Updates.h@) as follows: + +\begin{code} +/* includes/Updates.h */ +#define PUSH_STD_CCCS(frame) \ + (frame->header.prof.ccs = CCCS, frame->header.prof.rs = NULL) +\end{code} + +In our implementation of retainer profiling, however, the retainer set field is not +used for any stack closure. +Hence, the above modification is entirely unnecessary. +Also, update frames are the only exception to the standard way of creating +stack closures: all the other types of stack closures with a retainer set +field are eventually initialized by +the macro @SET\_HDR()@ (in @includes/ClosureMacros.h@), which in turn +invokes @SET\_PROF\_HDR()@. This is not the case for update frames. +Compare @PUSH\_UPD\_FRAME()@ (in @includes/Updates.h@) and +@PUSH\_SEQ\_FRAME()@ (in @includes/StgMacros.h@) for clarification. + +\section{Retainer Sets} + +At the end of retainer profiling, every live closure (except stack +closures, for which we do not compute retainer sets) is associated with +a retainer set; there can be no closure without an associated retainer set +because every live closure is visited during traversal. +Since many closures may well be associated with a common retainer set, +we want to avoid creating any particular retainer set more than once. +This section presents the details of manipulating retainer sets in our +implementation. + +\subsection{Identity of a retainer} + +The function @R()@ in Figure~\ref{fig-retaineralgorithm} returns +the identity of a retainer. In order to implement it, we need +a type for retainer identity. +The type @retainer@ (in @includes/StgRetainerProf.h@) is introduced for +this purpose. + +There are various ways of defining the type @retainer@. +For instance, we can designate the information table address of a retainer as +its identity as follows: + +\begin{code} +struct _StgInfoTable; +typedef struct _StgInfoTable *retainer; +\end{code} + +We can also use the cost centre stack associated with the retainer: + +\begin{code} +typedef CostCentreStack *retainer; +\end{code} + +The function @R()@ is embodied as the function @getRetainerFrom()@ in the +implementation, whose type is @(StgClosure *)@ $\rightarrow$ @retainer@. +It is straightforward to define @getRetainerFrom()@ according to the definition +of @retainer@, as illustrated below: + +\begin{code} +retainer getRetainerFrom(StgClosure *c) { return get_itbl(c); } +retainer getRetainerFrom(StgClosure *c) { return c->header.prof.ccs; } +\end{code} + +\subsection{Retainer sets and the cost function} + +A retainer set is stored in the structure @retainerSet@ +(in @includes/StgRetainerProf.h@): + +\begin{code} +typedef struct _retainerSet { + nat num; + nat cost; + ... + int id; + retainer element[0]; +} retainerSet; +\end{code} + +The field @num@ gives the number of retainers in the retainer set, which +are all stored in the array @element[]@. Thus, the size of @element[]@ +is assumed to be @num@. +The field @cost@ gives the sum of the \emph{costs} of those closures +associated with the retainer set: if a closure @c@ is +associated with the retainer set, that is, if @c@ is retained by each +retainer in the retainer set and none else, +the cost of @c@ is added to the field @cost@. +The field @id@ gives a unique identification number for the retainer set. + +The interface to @retainerSet@ is as follows +(see @RetainerSet.h@): + +\begin{description} +\item[@void initializeAllRetainerSet(void)@] initializes the store for retainer sets. +\item[@void refreshAllRetainerSet(void)@] refreshes each retainer set by setting +its @cost@ field to zero. This function does destroy any retainer set. +\item[@void closeAllRetainerSet(void)@] destroys all retainer sets and closes +the store for retainer sets. +\item[@retainerSet *singleton(retainer r)@] returns a singleton retainer set +consisting of @r@ alone. If such a retainer set already exists, no new retainer +set is created. Otherwise, a new retainer set is created. +\item[@retainerSet *addElement(retainer r, retainerSet *rs)@] returns a retainer set +@rs@ augmented with @r@. If such a retainer set already exists, no new retainer set +is created. Otherwise, a new retainer set is created. +\item[@rtsBool isMember(retainer r, retainerSet *rs)@] returns a boolean value +indicating whether @r@ is a member of @rs@. +\item[@void traverseAllRetainerSet(void (*f)(retainerSet *))@] invokes the function +@f@ on every retainer set created. +\item[@void printRetainerSetShort(FILE *, retainerSet *)@] prints a single retainer +set. +\item[@void outputRetainerSet(FILE *, nat *allCost, nat *numSet)@] prints all +retainer sets. Stores the sum of all their costs in @*allCost@ and the number +of them in @*numSet@. +\item[@void outputAllRetainerSet(FILE *)@] prints all retainer sets. +\end{description} + +We also define a \emph{cost function}, which returns the cost of a given closure, +in order to compute the field @cost@. +The cost function can be defined in several ways. +A typical definition is on the size of a closure, which results in +the field @cost@ accumulating the size of all the closures retained by a +retainer set. +If we just want to count the number of closures retained by the +retainer set, we can simply set the cost of every closure to one regardless +of its closure type. +Furthermore, we can define the cost function flexibly according to +the closure type. +For instance, we can set the size of any static closure to zero so that +it is not taken into account at all in computing the field @cost@. +Notice that static closures are also visited during traversal because they +may lead to other dynamic closures (e.g., static indirection closures of +closure type @IND_STATIC@). +This is especially desirable because we usually focus on the heap use. +We can also selectively choose certain dynamic closure types not to contribute +to the field @cost@. + +In our implementation, there are two functions related with the cost function: +@cost()@ and @costPure()@. +@cost()@ returns the size of the entire memory allocated for a given closure +(even including the two fields in the structure @StgProfHeader@). +It returns zero for static closures. +@costPure()@ returns the size of the memory which would be allocated for +a given closure with no profiling. +It is defined in terms of @cost()@, and it suffices to change only @cost()@ +when a new scheme for the cost function is desired. +@costPure()@ is put to actual use in computing the field @cost@ because it +effectively hides the memory overhead incurred by profiling. + +\subsection{Implementation} + +The algorithm for retainer profiling in Figure~\ref{fig-retaineralgorithm} +adds at most one retainer to an existing retainer set (or an empty retainer set) +at any moment; it does not require a retainer set union operation. +This observation simplifies the implementation, and +we employ the following two functions for creating new retainer sets: +@singleton()@, which creates a singleton retainer set, and +@addElement()@, which adds an element to an existing retainer set. + +It is a frequent operation during retainer profiling to search for a retainer +set, which may or may not exist, built from a given retainer set and a +particular retainer. +To efficiently implement this operation, +we choose to store all retainer sets in a hash table and +the structure @retainerSet@ is now extended with two new fields +@hashKey@ and @link@. +The field @hashKey@ stores the hash key which is obtained +from the retainers in a retainer set. +The field @link@ points to the next retainer set in the same bucket: + +\begin{code} +typedef struct _retainerSet { + ... + StgWord hashKey; + struct _retainerSet *link; + ... +} retainerSet; +\end{code} + +The hashing function must be defined in such a way that a retainer set +can have only one unique hash key regardless of the order its elements +are stored, i.e., the hashing function must be additive. + +It is often observed that two successive executions of retainer profiling share +a number of retainer sets in common, especially if the two executions are +close in time. +This also implies that the number of all retainer sets which can be created +at any moment does not grow indefinitely regardless of the interval at which +retainer profiling is performed; it does not grow commensurately with the +number of times retainer profiling is executed. +This observation eliminates the need to free the memory allocated for +retainer sets; we can simply set the @cost@ field of every retainer set +to zero after each retainer profiling and reuse it during the next time. + +\section{Graph Traversal} + +At the heart of retainer profiling lies \emph{graph traversal}; +the algorithm in Figure~\ref{fig-retaineralgorithm} is supposed to visit +every closure in the graph at least once and yield statistics on the heap use. +Since only live closures are reachable from the root, the algorithm +does not deal with dead closures. + +This section presents details on how to achieve an efficient implementation of +graph traversal without incurring extra memory overhead and compromising speed. + +\subsection{Goal} + +Traversing a graph itself can be done in a straightforward way; +we choose either depth first search or breadth first search, and traverse +the graph starting from a given set of roots. +After a complete traversal, each live closure @c@ (including static closures) +has an associated retainer set, whose address is stored in the field +@c->header.prof.rs@. + +A real complication arises when retainer profiling is performed once again: +all live closures which have survived all garbage collections since +the previous retainer profiling +still have an associated retainer set (indicated by +a non-null pointer in their retainer set field), which is no longer +valid. Any new closure created since then has +a null pointer in its retainer set field at the beginning of retainer +profiling and will become associated with a retainer set. +Thus, we can no longer distinguish valid retainer set fields +from invalid ones. + +A simple remedy is to linearly scan the heap at the beginning of each +retainer profiling and set all retainer set fields to a null pointer. +It resets the retainer set field of each dynamic closure, whether it is +live or not with respect to the given set of root. +This is feasible because any closure in the heap directly adjoins the +next closure, if any. +The problem is that we have no way of visiting all live static closures, +for which we compute retainer sets. + +A moment of thought, however, reveals that we can completely avoid computing +retainer sets for static closures. This is because retainer profiling is +concerned only about the heap, which consists of dynamic closures and no +static closures. In other words, we can treat every static closure as +a bridge connecting two dynamic closures. +For instance, if a dynamic closure @c@$_1$ has a pointer to a static +closure @s@ and @c@ has a pointer to another dynamic closure @c@$_2$, +we can think of the pointer in @c@$_1$ as a direct pointer to @c@$_2$. +The big problem is that if the graph has a cycle containing static closures, +an infinite loop occurs. In other words, we have no way of telling whether +a static closure has been visited or not and are forced to compute +retainer sets for static closures as well.\footnote{For instance, +a static closure is allowed to have a self-reference in its SRT, which +is also followed during retainer profiling.} + +Another remedy is to stores in every closure a time stamp for the +retainer set field. The time stamp indicates whether the retainer +set field is valid or no longer valid (i.e., it is for the previous +retainer profiling). +At the cost of one extra field in each closure, we can achieve an +elegant implementation with little complication. +However, it turns out that the memory overhead is too big.\footnote{A typical +dynamic closure is only two or three words long.} +Thus, our goal is to stick to the definition of the structure @StgProfHeader@ +given earlier and yet to achieve an elegant solution. + +\subsection{Basic plan} + +Since we visit every live object and update its retainer set field, +any retainer set field can either be valid (the corresponding retainer +set is valid) or point to a retainer set created during the previous +retainer profiling. +In order to distinguish valid retainer set fields +from invalid ones, we exploit the least significant bit of the retainer +set field: we maintain a one bit mark which flips over every time +retainer profiling is performed, and judge that a retainer set field is +valid only if its least significant bit matches the mark. +The variable @flip@ serves for this purpose. +The macros @isRetainerSetFieldValid()@ tests if the retainer set field +of a give closure @c@ is valid: + +\begin{code} +#define isRetainerSetFieldValid(c) \ + ((((StgWord)(c)->header.prof.rs & 1) ^ flip) == 0) +\end{code} + +As an example, a retainer set field can be set to a null value conforming +the current value of @flip@ by the macro @setRetainerSetToNull()@: + +\begin{code} +#define setRetainerSetToNull(c) \ + (c)->header.prof.rs = (retainerSet *)((StgWord)NULL | flip) +\end{code} + +Now, when a dynamic closure @c@ is created, its retainer set field is +initialized to a null value conforming to the current value of +@flip@:\footnote{Actually this is not mandatory: even when the null +value does not conform to the current value of @flip@, it will be replaced +by a correct null value when @c@ is visited later.} + +\begin{code} +extern StgWord flip; +#define SET_PROF_HDR(c,ccs_) \ + ((c)->header.prof.ccs = ccs_, (c)->header.prof.rs = (retainerSet *)((StgWord)NULL | flip)) +\end{code} + +We do not need to revise @SET_STATIC_PROF_HDR()@ if the initial value of +@flip@ is set to $0$.\footnote{For the same reason, an initial value $1$ +does not compromise the correctness of the implementation.} + +\subsection{Set of roots} + +The set of roots consists of all thread closures (running, sleeping, or +blocked) existing at the beginning of a retainer profiling. +It is handily obtained in an indirect way by invoking the function +@GetRoots()@ (in @Schedule.c@) with an appropriate argument, which must be +a function: +@GetRoots()@ invokes on each root known to the runtime system its argument. +Thus, we implement a function @retainClosure()@, which initiates traversal +from a given root and updates the retainer set of every closure reachable +from the root, +and invokes @GetRoots()@ with @retainClosure@ as an argument. + +In addition to the thread closures, weak pointers are also considered +as roots; they may not be reachable from any thread closure yet are still +being in used. +A weak pointer has three pointer fields: @key@, @value@, and +@finalizer@ (see the structure @StgWeak@ in @includes/Closures.h@). +It turns out that these pointers may not be valid all the time: +at a certain point during execution, for instance, the pointer @key@ may point +to a dead closure. +However, right after a major garbage collection, all the three pointers are +guaranteed to be valid, i.e., they all point to live closures. +This facilitates the handling of weak pointers if we choose to +perform retainer profiling immediately after a major garbage collection. +All weak pointers are found in the linked list @weak_ptr_list@ +(in @Weak.c@). + +See the function @computeRetainerSet()@ for details. + +\subsection{Static closures} + +When a live dynamic closure @c@ is visited for the first time during traversal, +its retainer set field is checked against the current value of @flip@. +If it was created at some point since the previous retainer profiling, +its retainer set field is already set to a correct null value. +Otherwise, it must have been visited +during the previous retainer profiling and thus its retainer set field is +invalid and will be set to a correct null value. +Therefore it is unnecessary to visit all dynamic closures and set their +retainer set field to a correct null value at the beginning of each retainer +profiling. + +However, this operation is required for static closures. +The reason is that a static closure, which is never garbage collected, +may appear alternately in the set of live closures. +In other words, a currently live static closure may become dead and +be resuscitated again. +Therefore, for a static closure, it does not help to check if its +retainer set field conforms to the current value of @flip@. +For instance, +if a static closure happens to belong to the set of live closures every other +retainer profiling, its retainer set field will never set to a null value, +which is disastrous. +Therefore, we choose to visit all live static closures at the beginning +of each retainer profiling and set their retainer set field to a +correct null value. + +In order to find all live static closures, we have each retainer +profiling preceded by a major garbage collection, which knows all live +static closures.\footnote{This is a heavy +restriction on retainer profiling, which makes retainer profiling partially +dependent on garbage collection. +However, it does not affect any retainer profiling result because +retainer profiling considers only live closures, which survive any +garbage collection.} +To be specific, the garbage collector builds a linked list +@scavenged_static_objects@ (in @GC.c@) during a major garbage collection, +which stores all live static closures of our interest. +\footnote{ +A static closure of closure type @IND\_STATIC@ may be put in the +list @mut\_once\_list@ of the oldest generation, instead of the list +@scavenged\_static\_objects@. +In our implementation, such a closure is just skipped over because it +contains only a pointer to a dynamic closure, and we do not compute +its retainer set. +Thus, there is no need to traverse the list @mut\_once\_list@ of the oldest +generation.} +Since it destroys the linked list after finishing the major garbage collection +(by invoking the function @zero_static_object_list()@ with +@scavenged_static_objects@ as its argument), +we traverse the linked list to set the retainer set field of each +live static closure to a correct null value before its destruction. +This is done by invoking the function +@resetStaticObjectForRetainerProfiling()@. + +\textbf{To do:} In the current implemenation, if a static closure has no child +(e.g., @CONSTR_NOCAF_STATIC@, @THUNK_STATIC@ with an empty SRT, and +@FUN_STATIC@ with an empty SRT), we do not compute its retainer set (because +there is no need to do). This slight optimization allows us to render +retainer profiling no longer dependent on garbage collection due to the +following propoerty: + +\begin{center} +A static closure can alternately appear and disappear in the set of live +closures across multiple executions of retainer profiling if and only if +it has an empty SRT and no child. +\end{center} + +Then we can completely eliminate the function +@resetStaticObjectForRetainerProfiling()@. + +\subsection{Traversal} + +The traversal proceeds in a depth first manner and is implemented +with two mutually recursive functions: @retainStack()@ and @retainerClosure()@. +@retainerStack()@ can be invoked on dynamic closures holding a stack chunk: +closure types @TSO@, @PAP@, and @AP_UPD@. +It in turn invokes @retainerClosure()@ on each closure reachable from +stack closures in the stack chunk. Notice that it does not invoke +@retainerClosure()@ on those stack closures because we do not compute +retainer sets for stack closures. +@retainerClosure()@ iteratively traverses all live closures reachable +from a given closure. +It maintains its own stack to record the next scan position in every closure +currently under consideration.\footnote{A recursive version of +@retainerClosure()@ could be implemented easily. +@retainerClosure()@ in our implementation is an iterative version.} +When it encounters a closure holding a stack chunk, it invokes @retainerStack()@ +on that closure. +Hence, +the traversal is triggered simply by invoking @retainerClosure()@ on every root. + +\textbf{To do:} +The correctness of retainer profiling is subject to the correctness +of the two macros @IS_ARG_TAG()@ and @LOOKS_LIKE_GHC_INFO()@ +(see @retainStack()@). Since +@LOOKS_LIKE_GHC_INFO()@ is a bit precarious macro, so I believe that +the current implementation may not be quite safe. Also, @scavenge_stack()@ +in @GC.c@ also exploits this macro in order to identify shallow pointers. +This can be a serious problem if a stack chunk contains some +word which looks like a pointer but is actually not a pointer. + +\subsection{Sanity check} + +Since we assume that a retainer profiling is preceded by a major garbage +collection, +we expect that the size of all the dynamic closures visited during +any retainer profiling adds up exactly to the total size of the heap. +In fact, this is not the case; there can be closures not reachable from +the set of roots yet residing in the heap even after a major garbage +collection. + +First, a dead weak pointer remains in the heap until its finalizer +finishes. Although its finalizer thread closure is part of the set of roots, +the dead weak pointer itself is not reachable from any root. +Since it cannot be visited during retainer profiling anyhow, we do not +need to located it and set its retainer set field +appropriately.\footnote{Dead weak pointers are identified with their +information table @stg\_DEAD\_WEAK\_info@ (in @StgMiscClosures.hc@). +Notice that their closure type is @CONSTR@, \emph{not} @WEAK@; +their information table is replaced by @stg\_DEAD\_WEAK\_info@ in the +function @scheduleFinalizers()@ (in @GC.c@).} + +Second, +mutable variables (of closure type @MUT_VAR@) may remain in the heap +even when they are not reachable from the set of roots while +dynamic closures pointed to by them must be live.\footnote{I do not +understand clearly why this happens :(} +Since such mutable variables may become live again (in the sense that +they become reachable from the set of roots), we must locate them +and set their retainer set field appropriately after each retainer +profiling. This is handily accomplished by traversing the list +@mut_once_list@ in every generation. + +\section{Retainer Profiling Schemes} + +A retainer profiling scheme specifies \emph{what} retainer profiling +yields (as opposed to \emph{how} retainer profiling computes the retainer +set for every live object). +It is determined primarily by the meaning of retainer identity, +that is, the type @retainer@ (in @includes/StgRetainerProf.h@). +The function @getRetainerFrom()@ must be defined according to the +definition of the type @retainer@. + +In order for a new retain profiling scheme to fully work, we need to follow +four steps: + +\begin{enumerate} +\item Define the type @retainer@ as desired. +\item Write @getRetainerFrom()@. +\item Write two hashing functions @hashkeySingletone()@ and + @hashKeyAddElement()@, which return the hash key from a single + retainer and a retainer set with another retainer, respectively. +\item Write two printing functions @printRetainer()@ and + @printRetainerSetShort()@. + These functions are employed when a retainer or a retainer set is + printed in the output file. +\end{enumerate} + +In our implementation, we use cost centre stacks for retainer identity: + +\begin{code} +typedef CostCentreStack *retainer; +\end{code} +\begin{code} +retainer getRetainerFrom(StgClosure *c) { return c->header.prof.ccs; } +\end{code} +\begin{code} +void printRetainer(FILE *f, retainer cc) +{ + fprintf(f,"%s[%s]", cc->label, cc->module); +} +\end{code} + +\textbf{To do:} All the closures created by @rts_mk...()@ in @RtsAPI.c@ are given +@CCS_SYSTEM@ as their cost centre stacks. This may not be accurate indeed, +and, for instance, @CCCS@ may be a better choice than @CCS_SYSTEM@. + +\section{Usage} + +Since cost centre stacks are used as retainer identity, a source program +must be given proper cost centre annotations by programmers. +Alternatively, +we can ask the compiler to automatically insert cost centre annotations. +For instance, the compiler option @-auto-all@ inserts a cost centre +annotation around every top-level function as shown below +(the @-p@ option is a must +because we must build the executable file in a profiling way): + +\begin{code} +$ ghc-inplace -o Foo.out -p -auto-all Foo.hs +\end{code} + +The runtime system option @-hR@ tells the executable program to +gather profiling statistics and report them in a @.prof@ file: + +\begin{code} +$ Foo.out +RTS -hR -RTS +\end{code} + +The option @-i@ can be used to +specify a desired interval at which retainer profiling is performed. +The default and minimum value is half a second: + +\begin{code} +$ Foo.out +RTS -hR -i2.5 -RTS +\end{code} + +Then, two text files are generated: a @.prof@ file and a @.hp@ file. +The @.prof@ file records the progress of retainer profiling: +for each retainer profiling performed during program execution, +it shows +the Haskell mutator time (as opposed to the user time) at which +the retainer profiling starts, +the average number of times a closure is visited, +the sum of costs assigned to all retainer sets (obtained from the field +@cost@ in each retainer set), +and the number of all retainer sets created \emph{since} the beginning +of program execution. +A typical entry in a @.prof@ file looks like: + +\begin{code} +Retainer Profiling: 3, at 3.530000 seconds + Average number of visits per object = 1.687765 + Current total costs = 801844 + Number of retainer sets = 118 +\end{code} + +The sum of costs assigned to all retainer sets may \emph{not} be equal to the +size of the heap. +The discrepancy is attributed to those live object which are not reachable +from the set of roots. +Still it is a good estimate of the size of the heap at the moment when +the retainer profiling was performed. + +The @.prof@ file also shows the contents of every retainer set which +has been assigned a positive cost (i.e., the field @cost@) at least once; +not every retainer set created is assigned a positive cost because quite +a few retainer sets are created as intermediate retainer sets before +creating a real retainer set. This results from the restriction on the way +retainer sets are created (only one retainer can be added to an existing +retainer set at a time). + +An example of the contents of a retainer set is: + +\begin{code} +SET 71 = {<doFile[Main],main[Main],MAIN[MAIN]>, <synth_2[Main],doFile[Main],main[Main],MAIN[MAIN]>} +\end{code} + +The retainer set has an identification number $71$. +It is associated with two retainers, whose retainer identities are shown +inside angle brackets @<...>@. +For instance, the first retainer is created when the cost centre stack +is @doFile[Main],main[Main],MAIN[MAIN]@, shown from the top to the bottom. +Each entry in angle brackets consists of a cost centre name (e.g., @doFile@) +and its module name (e.g., @Main@). + +The @.hp@ file can be supplied to the @hp2ps@ program to create a postscript +file showing the progress of retainer profiling in a graph: + +\begin{code} +$ hp2ps Foo.hs +$ gv Foo.ps +\end{code} + +An example of such a graph is shown in Figure~\ref{fig-cacheprof}. +It shows the cost assigned to each retainer set at the point +when a retainer profiling is performed (marked by a corresponding inverted +triangles on the horizontal axis). +The abbreviated contents of each retainer set is displayed in the right column. +Due to the space limitation, +it shows only topmost cost centres (without module names) +instead of printing the full contents. +For instance, @(71)doFile,synth_2@ corresponds to a retainer set shown above +(@71@ is its identification number). +The contents may be truncated if it is too long. + +Notice that the time is in the Haskell mutator time, which excludes +the runtime system time such as garbage collection time and retainer profiling +time. Thus, the actual execution takes longer than indicated in the +graph. Also, the timer employed to periodically perform retainer profiling +is not perfectly accurate. Therefore, the result may slightly vary for each +execution of retainer profiling. + +\begin{figure}[ht] +\centering +\epsfig{file=cacheprof_p.eps,width=5in} +\caption{A graph showing the progress of retainer profiling} +\label{fig-cacheprof} +\end{figure} + +\section{Comparision with nhc} + +\section{Files} + +This section gives a summary of changes made to the GHC in +implementing retainer profiling. +Only three files (@includes/StgRetainerProf.h@, @RetainerProfile.c@, and +@RetainerProfile.h@) are new, and all others exist in the GHC. + +@\includes@ directory: + +\begin{description} +\item[StgRetainerProf.h] defines types @retainer@ and @retainerSet@. +\item[Stg.h] includes the header file @StgRetainerProf.h@. +\item[Closures.h] changes structure @StgProfHeader@. +\item[Constants.h] changes constants @PROF_HDR_SIZE@, @SCC_UF_SIZE@, and + @SCC_SEQ_FRAME_SIZE@. +\item[ClosureMacros.h] changes macros @SET_PROF_HDR()@ and + @SET_STATIC_PROF_HDR()@. +\item[Updates.h] changes macro @PUSH_STD_CCCS()@. +\end{description} + +@\rts@ directory: + +\begin{description} +\item[Exception.hc] changes constant @CATCH_FRAME_BITMAP@, +\item[StgStartup.hc] changes constant @STOP_THREAD_BITMAP@. +\item[StgStdThunks.hc] changes constant @RET_BITMAP@. +\item[Updates.hc] changes constant @UPD_FRAME_BITMAP@. +\item[RetainerProfile.c] implements the retainer profiling engine. +\item[RetainerProfile.h] is the header for @RetainerProfile.c@. +\item[RetainerSet.c] implements the abstract datatype @retainerSet@. +\item[RetainerSet.h] defines the interface for @retainerSet@. +\item[GC.c] invokes @resetStaticObjectForRetainerProfiling()@ in + @GarbageCollect()@. +\item[Itimer.c] changes @handle_tick()@. +\item[ProfHeap.c] changes @initHeapProfiling()@ and @endHeapProfiling()@. +\item[Profiling.c] changes @initProfilingLogFile()@ and + @report_ccs_profiling()@. +\item[Proftimer.c] declares @ticks_to_retainer_profiling@, + @performRetainerProfiling@, and @doContextSwitch@. +\item[Proftimer.h] is the header for @Proftimer.c@. Defines @PROFILING_MIN_PERIOD@, + which specifies the minimum profiling period and the default profiling period. +%\item[RtsAPI.c] implements @setRetainerField()@. +\item[RtsFlags.c] + sets @RtsFlags.ProfFlags.doHeapProfile@ and + adds a string to @usage_text[]@ in @setupRtsFlags()@. +\item[RtsFlags.h] defines constants @HEAP_BY_RETAINER@ and @RETAINERchar@. +\item[RtsStartup.c] includes the header file @RetainerProfile.h@. + Changes @shutdownHaskell()@. +\item[Schedule.c] changes @schedule()@. +\item[Stats.c] + declares @RP_start_time@, @RP_tot_time@, @RPe_start_time@, + @RPe_tot_time@. + Changes @mut_user_time_during_GC()@, @mut_user_time()@, + @stat_startExit()@, + @stat_endExit()@, and + @stat_exit()@. + Defines + @mut_user_time_during_RP()@, + @stat_startRP()@, and + @stat_endRP()@. +\item[Stats.h] is the header for @Stats.c@. +\item[StgMiscClosures.hc] redefines @stg_DEAD_WEAK_info@. +\item[Storage.c] changes @initStorage()@, @memInventory()@. +\end{description} + +\bibliographystyle{plain} +\bibliography{reference} + +\end{document} |