summaryrefslogtreecommitdiff
path: root/docs/rts/rts.tex
diff options
context:
space:
mode:
Diffstat (limited to 'docs/rts/rts.tex')
-rw-r--r--docs/rts/rts.tex4683
1 files changed, 4683 insertions, 0 deletions
diff --git a/docs/rts/rts.tex b/docs/rts/rts.tex
new file mode 100644
index 0000000000..158ae7e79a
--- /dev/null
+++ b/docs/rts/rts.tex
@@ -0,0 +1,4683 @@
+%
+% (c) The OBFUSCATION-THROUGH-GRATUITOUS-PREPROCESSOR-ABUSE Project,
+% Glasgow University, 1990-1994
+%
+
+% TODO:
+%
+% o I (ADR) think it would be worth making the connection with CPS explicit.
+% Now that we have explicit activation records (on the stack), we can
+% explain the whole system in terms of CPS and tail calls --- with the
+% one requirement that we carefuly distinguish stack-allocated objects
+% from heap-allocated objects.
+
+% \documentstyle[preprint]{acmconf}
+\documentclass[11pt]{article}
+\oddsidemargin 0.1 in % Note that \oddsidemargin = \evensidemargin
+\evensidemargin 0.1 in
+\marginparwidth 0.85in % Narrow margins require narrower marginal notes
+\marginparsep 0 in
+\sloppy
+
+%\usepackage{epsfig}
+\usepackage{shortvrb}
+\MakeShortVerb{\@}
+
+%\newcommand{\note}[1]{{\em Note: #1}}
+\newcommand{\note}[1]{{{\bf Note:}\sl #1}}
+\newcommand{\ToDo}[1]{{{\bf ToDo:}\sl #1}}
+\newcommand{\Arg}[1]{\mbox{${\tt arg}_{#1}$}}
+\newcommand{\bottom}{\perp}
+
+\newcommand{\secref}[1]{Section~\ref{sec:#1}}
+\newcommand{\figref}[1]{Figure~\ref{fig:#1}}
+\newcommand{\Section}[2]{\section{#1}\label{sec:#2}}
+\newcommand{\Subsection}[2]{\subsection{#1}\label{sec:#2}}
+\newcommand{\Subsubsection}[2]{\subsubsection{#1}\label{sec:#2}}
+
+% DIMENSION OF TEXT:
+\textheight 8.5 in
+\textwidth 6.25 in
+
+\topmargin 0 in
+\headheight 0 in
+\headsep .25 in
+
+
+\setlength{\parskip}{0.15cm}
+\setlength{\parsep}{0.15cm}
+\setlength{\topsep}{0cm} % Reduces space before and after verbatim,
+ % which is implemented using trivlist
+\setlength{\parindent}{0cm}
+
+\renewcommand{\textfraction}{0.2}
+\renewcommand{\floatpagefraction}{0.7}
+
+\begin{document}
+
+\title{The STG runtime system (revised)}
+\author{Simon Peyton Jones \\ Microsoft Research Ltd., Cambridge \and
+Simon Marlow \\ Microsoft Research Ltd., Cambridge \and
+Alastair Reid \\ Yale University}
+
+\maketitle
+
+\tableofcontents
+\newpage
+
+\part{Introduction}
+\Section{Overview}{overview}
+
+This document describes the GHC/Hugs run-time system. It serves as
+a Glasgow/Yale/Nottingham ``contract'' about what the RTS does.
+
+\Subsection{New features compared to GHC 3.xx}{new-features}
+
+\begin{itemize}
+\item The RTS supports mixed compiled/interpreted execution, so
+that a program can consist of a mixture of GHC-compiled and Hugs-interpreted
+code.
+
+\item The RTS supports concurrency by default.
+This has some costs (eg we can't do hardware stack checks) but
+reduces the number of different configurations we need to support.
+
+\item CAFs are only retained if they are
+reachable. Since they are referred to by implicit references buried
+in code, this means that the garbage collector must traverse the whole
+accessible code tree. This feature eliminates a whole class of painful
+space leaks.
+
+\item A running thread has only one stack, which contains a mixture of
+pointers and non-pointers. \secref{TSO} describes how we find out
+which is which. (GHC has used two stacks for some while. Using one
+stack instead of two reduces register pressure, reduces the size of
+update frames, and eliminates ``stack-stubbing'' instructions.)
+
+\item The ``return in registers'' return convention has been dropped
+because it was complicated and doesn't work well on register-poor
+architectures. It has been partly replaced by unboxed tuples
+(\secref{unboxed-tuples}) which allow the programmer to
+explicitly state where results should be returned in registers (or on
+the stack) instead of on the heap.
+
+\item Exceptions are supported by the RTS.
+
+\item Weak Pointers generalise the previously available Foreign Object
+interface.
+
+\item The garbage collector supports a number of new features,
+including a dynamically resizable heap and multiple generations with
+aging within a generation.
+
+\end{itemize}
+
+\Subsection{Wish list}{wish-list}
+
+Here's a list of things we'd like to support in the future.
+\begin{itemize}
+\item Interrupts, speculative computation.
+
+\item
+The SM could tune the size of the allocation arena, the number of
+generations, etc taking into account residency, GC rate and page fault
+rate.
+
+\item
+We could trigger a GC when all threads are blocked waiting for IO if
+the allocation arena (or some of the generations) are nearly full.
+
+\end{itemize}
+
+\Subsection{Configuration}{configuration}
+
+Some of the above features are expensive or less portable, so we
+envision building a number of different configurations supporting
+different subsets of the above features.
+
+You can make the following choices:
+\begin{itemize}
+\item
+Support for parallelism. There are three mutually-exclusive choices.
+
+\begin{description}
+\item[@SEQUENTIAL@] Support for concurrency but not for parallelism.
+\item[@GRANSIM@] Concurrency support and simulated parallelism.
+\item[@PARALLEL@] Concurrency support and real parallelism.
+\end{description}
+
+\item @PROFILING@ adds cost-centre profiling.
+
+\item @TICKY@ gathers internal statistics (often known as ``ticky-ticky'' code).
+
+\item @DEBUG@ does internal consistency checks.
+
+\item Persistence. (well, not yet).
+
+\item
+Which garbage collector to use. At the moment we
+only anticipate one, however.
+\end{itemize}
+
+\Subsection{Glossary}{glossary}
+
+\ToDo{This terminology is not used consistently within the document.
+If you find something which disagrees with this terminology, fix the
+usage.}
+
+In the type system, we have boxed and unboxed types.
+
+\begin{itemize}
+
+\item A \emph{pointed} type is one that contains $\bot$. Variables with
+pointed types are the only things which can be lazily evaluated. In
+the STG machine, this means that they are the only things that can be
+\emph{entered} or \emph{updated} and it requires that they be boxed.
+
+\item An \emph{unpointed} type is one that does not contain $\bot$.
+Variables with unpointed types are never delayed --- they are always
+evaluated when they are constructed. In the STG machine, this means
+that they cannot be \emph{entered} or \emph{updated}. Unpointed objects
+may be boxed (like @Array#@) or unboxed (like @Int#@).
+
+\end{itemize}
+
+In the implementation, we have different kinds of objects:
+
+\begin{itemize}
+
+\item \emph{boxed} objects are heap objects used by the evaluators
+
+\item \emph{unboxed} objects are not heap allocated
+
+\item \emph{stack} objects are allocated on the stack
+
+\item \emph{closures} are objects which can be \emph{entered}.
+They are always boxed and always have boxed types.
+They may be in WHNF or they may be unevaluated.
+
+\item A \emph{thunk} is a (representation of) a value of a \emph{pointed}
+type which is \emph{not} in WHNF.
+
+\item A \emph{value} is an object in WHNF. It can be pointed or unpointed.
+
+\end{itemize}
+
+
+
+At the hardware level, we have \emph{word}s and \emph{pointer}s.
+
+\begin{itemize}
+
+\item A \emph{word} is (at least) 32 bits and can hold either a signed
+or an unsigned int.
+
+\item A \emph{pointer} is (at least) 32 bits and big enough to hold a
+function pointer or a data pointer.
+
+\end{itemize}
+
+Occasionally, a field of a data structure must hold either a word or a
+pointer. In such circumstances, it is \emph{not safe} to assume that
+words and pointers are the same size. \ToDo{GHC currently makes words
+the same size as pointers to reduce complexity in the code
+generator/RTS. It would be useful to relax this restriction, and have
+eg. 32-bit Ints on a 64-bit machine.}
+
+% should define terms like SRT, CAF, PAP, etc. here? --KSW 1999-03
+
+\subsection{Subtle Dependencies}
+
+Some decisions have very subtle consequences which should be written
+down in case we want to change our minds.
+
+\begin{itemize}
+
+\item
+
+If the garbage collector is allowed to shrink the stack of a thread,
+we cannot omit the stack check in return continuations
+(\secref{heap-and-stack-checks}).
+
+\item
+
+When we return to the scheduler, the top object on the stack is a closure.
+The scheduler restarts the thread by entering the closure.
+
+\secref{hugs-return-convention} discusses how Hugs returns an
+unboxed value to GHC and how GHC returns an unboxed value to Hugs.
+
+\item
+
+When we return to the scheduler, we need a few empty words on the stack
+to store a closure to reenter. \secref{heap-and-stack-checks}
+discusses who does the stack check and how much space they need.
+
+\item
+
+Heap objects never contain slop --- this is required if we want to
+support mostly-copying garbage collection.
+
+This is a big problem when updating since the updatee is usually
+bigger than an indirection object. The fix is to overwrite the end of
+the updatee with ``slop objects'' (described in
+\secref{slop-objects}). This is hard to arrange if we do
+\emph{lazy} blackholing (\secref{lazy-black-holing}) so we
+currently plan to blackhole an object when we push the update frame.
+
+% Idea: have specialised update code for various common sizes of
+% updatee, the update frame hence encodes the length of the object.
+% Specialised indirections will also encode the length of the object. A
+% generic version of the update code will overwrite the slop with a slop
+% object. We can do the same thing for blackhole objects, or just have
+% a generic version that is the same size as an indirection and
+% overwrite the slop with a slop object when blackholing. So: does this
+% avoid the need to do eager black holing?
+
+\item
+
+Info tables for constructors contain enough information to decide which
+return convention they use. This allows Hugs to use a single piece of
+entry code for all constructors and insulates Hugs from changes in the
+choice of return convention.
+
+\end{itemize}
+
+\Section{Source Language}{source-language}
+
+\Subsection{Explicit Allocation}{explicit-allocation}
+
+As in the original STG machine, (almost) all heap allocation is caused
+by executing a let(rec). Since we no longer support the return in
+registers convention for data constructors, constructors now cause heap
+allocation and so they should be let-bound.
+
+For example, we now write
+\begin{verbatim}
+> cons = \ x xs -> let r = (:) x xs in r
+@
+instead of
+\begin{verbatim}
+> cons = \ x xs -> (:) x xs
+\end{verbatim}
+
+\note{For historical reasons, GHC doesn't use this syntax --- but it should.}
+
+\Subsection{Unboxed tuples}{unboxed-tuples}
+
+Functions can take multiple arguments as easily as they can take one
+argument: there's no cost for adding another argument. But functions
+can only return one result: the cost of adding a second ``result'' is
+that the function must construct a tuple of ``results'' on the heap.
+The assymetry is rather galling and can make certain programming
+styles quite expensive. For example, consider a simple state transformer
+monad:
+\begin{verbatim}
+> type S a = State -> (a,State)
+> bindS m k s0 = case m s0 of { (a,s1) -> k a s1 }
+> returnS a s = (a,s)
+> getS s = (s,s)
+> setS s _ = ((),s)
+\end{verbatim}
+Here, every use of @returnS@, @getS@ or @setS@ constructs a new tuple
+in the heap which is instantly taken apart (and becomes garbage) by
+the case analysis in @bind@. Even a short state-transformer program
+will construct a lot of these temporary tuples.
+
+Unboxed tuples provide a way for the programmer to indicate that they
+do not expect a tuple to be shared and that they do not expect it to
+be allocated in the heap. Syntactically, unboxed tuples are just like
+single constructor datatypes except for the annotation @unboxed@.
+\begin{verbatim}
+> data unboxed AAndState# a = AnS a State
+> type S a = State -> AAndState# a
+> bindS m k s0 = case m s0 of { AnS a s1 -> k a s1 }
+> returnS a s = AnS a s
+> getS s = AnS s s
+> setS s _ = AnS () s
+\end{verbatim}
+Semantically, unboxed tuples are just unlifted tuples and are subject
+to the same restrictions as other unpointed types.
+
+Operationally, unboxed tuples are never built on the heap. When
+an unboxed tuple is returned, it is returned in multiple registers
+or multiple stack slots. At first sight, this seems a little strange
+but it's no different from passing double precision floats in two
+registers.
+
+Notes:
+\begin{itemize}
+\item
+Unboxed tuples can only have one constructor and that
+thunks never have unboxed types --- so we'll never try to update an
+unboxed constructor. The restriction to a single constructor is
+largely to avoid garbage collection complications.
+
+\item
+The core syntax does not allow variables to be bound to
+unboxed tuples (ie in default case alternatives or as function arguments)
+and does not allow unboxed tuples to be fields of other constructors.
+However, there's no harm in allowing it in the source syntax as a
+convenient, but easily removed, syntactic sugar.
+
+\item
+The compiler generates a closure of the form
+\begin{verbatim}
+> c = \ x y z -> C x y z
+\end{verbatim}
+for every constructor (whether boxed or unboxed).
+
+This closure is normally used during desugaring to ensure that
+constructors are saturated and to apply any strictness annotations.
+They are also used when returning unboxed constructors to the machine
+code evaluator from the bytecode evaluator and when a heap check fails
+in a return continuation for an unboxed-tuple scrutinee.
+
+\end{itemize}
+
+\Subsection{STG Syntax}{stg-syntax}
+
+
+\ToDo{Insert STG syntax with appropriate changes.}
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\part{System Overview}
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+This part is concerned with defining the external interfaces of the
+major components of the system; the next part is concerned with their
+inner workings.
+
+The major components of the system are:
+\begin{itemize}
+
+\item
+
+The evaluators (\secref{sm-overview}) are responsible for
+evaluating heap objects. The system supports two evaluators: the
+machine code evaluator; and the bytecode evaluator.
+
+\item
+
+The scheduler (\secref{scheduler-overview}) acts as the
+coordinator for the whole system. It is responsible for switching
+between evaluators, switching between threads, garbage collection,
+communication between multiple processors, etc.
+
+\item
+
+The storage manager (\secref{evaluators-overview}) is
+responsible for allocating blocks of contiguous memory and for garbage
+collection.
+
+\item
+
+The loader (\secref{loader-overview}) is responsible for
+loading machine code and bytecode files from the file system and for
+resolving references between separately compiled modules.
+
+\item
+
+The compilers (\secref{compilers-overview}) generate machine
+code and bytecode files which can be loaded by the loader.
+
+\end{itemize}
+
+\ToDo{Insert diagram showing all components underneath the scheduler
+and communicating only with the scheduler}
+
+
+\Section{The Evaluators}{evaluators-overview}
+
+There are two evaluators: a machine code evaluator and a bytecode
+evaluator. The evaluators task is to evaluate code within a thread
+until one of the following happens:
+
+\begin{itemize}
+\item heap overflow
+\item stack overflow
+\item it is preempted
+\item it blocks in one of the concurrency primitives
+\item it performs a safe ccall
+\item it needs to switch to the other evaluator.
+\end{itemize}
+
+The evaluators expect to find a closure on top of the thread's stack
+and terminate with a closure on top of the thread's stack.
+
+\Subsection{Evaluation Model}{evaluation-model}
+
+Whilst the evaluators differ internally, they share a common
+evaluation model and many object representations.
+
+\Subsubsection{Heap objects}{heap-objects-overview}
+
+The choice of heap and stack objects used by the evaluators is tightly
+bound to the evaluation model. This section provides an overview of
+the most important heap and stack objects; further details are given
+later.
+
+All heap objects look like this:
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+\emph{Header} & \emph{Payload} \\ \hline
+\end{tabular}
+\end{center}
+
+The headers vary between different kinds of object but they all start
+with a pointer to a pair consisting of an \emph{info table} and some
+\emph{entry code}. The info table is used both by the evaluators and
+by the storage manager and contains a @type@ field which identifies
+which kind of heap object uses it and determines the interpretation of
+the payload and of the other fields of the info table. The entry code
+is some machine code used by the machine code evaluator to evaluate
+closures and raises an error for other kinds of objects.
+
+The major kinds of heap object used are as follows. (For simplicity,
+this description omits certain optimisations and extra fields required
+by the garbage collector.)
+
+\begin{description}
+
+\item[Constructors] are used to represent data constructors. Their
+payload consists of the fields of the constructor; the tag of the
+constructor is stored in the info table.
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+@CONSTR@ & \emph{Fields} \\ \hline
+\end{tabular}
+\end{center}
+
+\item[Primitive objects] are used to represent objects with unlifted
+types which are too large to fit in a register (or stack slot) or for
+which sharing must be preserved. Primitive objects include large
+objects such as multiple precision integers and immutable arrays and
+mutable objects such as mutable arrays, mutable variables, MVar's,
+IVar's and foreign object pointers. Since primitive objects are not
+lifted, they cannot be entered. Their payload varies according to the
+kind of object.
+
+\item[Function closures] are used to represent functions. Their
+payload (if any) consists of the free variables of the function.
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+@FUN@ & \emph{Free Variables} \\ \hline
+\end{tabular}
+\end{center}
+
+Function closures are only generated by the machine code compiler.
+
+\item[Thunks] are used to represent unevaluated expressions which will
+be updated with their result. Their payload (if any) consists of the
+free variables of the function. The entry code for a thunk starts by
+pushing an \emph{update frame} onto the stack. When evaluation of the
+thunk completes, the update frame will cause the thunk to be
+overwritten again with an \emph{indirection} to the result of the
+thunk, which is always a constructor or a partial application.
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+@THUNK@ & \emph{Free Variables} \\ \hline
+\end{tabular}
+\end{center}
+
+Thunks are only generated by the machine code evaluator.
+
+\item[Byte-code Objects (@BCO@s)] are generated by the bytecode
+compiler. In conjunction with \emph{updatable applications} and
+\emph{non-updatable applications} they are used to represent
+functions, unevaluated expressions and return addresses.
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+@BCO@ & \emph{Constant Pool} & \emph{Bytecodes} \\ \hline
+\end{tabular}
+\end{center}
+
+\item[Non-updatable (Partial) Applications] are used to represent the
+application of a function to an insufficient number of arguments.
+Their payload consists of the function and the arguments received so far.
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+@PAP@ & \emph{Function Closure} & \emph{Arguments} \\ \hline
+\end{tabular}
+\end{center}
+
+@PAP@s are used when a function is applied to too few arguments and by
+code generated by the lambda-lifting phase of the bytecode compiler.
+
+\item[Updatable Applications] are used to represent the application of
+a function to a sufficient number of arguments. Their payload
+consists of the function and its arguments.
+
+Updateable applications are like thunks: on entering an updateable
+application, the evaluators push an \emph{update frame} onto the stack
+and overwrite the application with a \emph{black hole}; when
+evaluation completes, the evaluators overwrite the application with an
+\emph{indirection} to the result of the application.
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+@AP@ & \emph{Function Closure} & \emph{Arguments} \\ \hline
+\end{tabular}
+\end{center}
+
+@AP@s are only generated by the bytecode compiler.
+
+\item[Black holes] are used to mark updateable closures which are
+currently being evaluated. ``Black holing'' an object cures a
+potential space leak and detects certain classes of infinite loops.
+More imporantly, black holes act as synchronisation objects between
+separate threads: if a second thread tries to enter an updateable
+closure which is already being evaluated, the second thread is added
+to a list of blocked threads and the thread is suspended.
+
+When evaluation of the black-holed closure completes, the black hole
+is overwritten with an indirection to the result of the closure and
+any blocked threads are restored to the runnable queue.
+
+Closures are overwritten by black-holes during a ``lazy black-holing''
+phase which runs on each thread when it returns to the scheduler.
+\ToDo{section describing lazy black-holing}.
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+@BLACKHOLE@ & \emph{Blocked threads} \\ \hline
+\end{tabular}
+\end{center}
+
+\ToDo{In a single threaded system, it's trivial to detect infinite
+loops: reentering a BLACKHOLE is always an error. How easy is it in a
+multi-threaded system?}
+
+\item[Indirections] are used to update an unevaluated closure with its
+(usually fully evaluated) result in situations where it isn't possible
+to perform an update in place. (In the current system, we always
+update with an indirection to avoid duplicating the result when doing
+an update in place.)
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+@IND@ & \emph{Closure} \\ \hline
+\end{tabular}
+\end{center}
+
+Indirections needn't always point to a closure in WHNF. They can
+point to a chain of indirections which point to an evaluated closure.
+
+\item[Thread State Objects (@TSO@s)] represent Haskell threads. Their
+payload consists of some per-thread information such as the Thread ID
+and the status of the thread (runnable, blocked etc.), and the
+thread's stack. See @TSO.h@ for the full story. @TSO@s may be
+resized by the scheduler if its stack is too small or too large.
+
+The thread stack grows downwards from higher to lower addresses.
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+@TSO@ & \emph{Thread info} & \emph{Stack} \\ \hline
+\end{tabular}
+\end{center}
+
+\end{description}
+
+\Subsubsection{Stack objects}{stack-objects-overview}
+
+The stack contains a mixture of \emph{pending arguments} and
+\emph{stack objects}.
+
+Pending arguments are arguments to curried functions which have not
+yet been incorporated into an activation frame. For example, when
+evaluating @let { g x y = x + y; f x = g{x} } in f{3,4}@, the
+evaluator pushes both arguments onto the stack and enters @f@. @f@
+only requires one argument so it leaves the second argument as a
+\emph{pending argument}. The pending argument remains on the stack
+until @f@ calls @g@ which requires two arguments: the argument passed
+to it by @f@ and the pending argument which was passed to @f@.
+
+Unboxed pending arguments are always preceeded by a ``tag'' which says
+how large the argument is. This allows the garbage collector to
+locate pointers within the stack.
+
+There are three kinds of stack object: return addresses, update frames
+and seq frames. All stack objects look like this
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+\emph{Header} & \emph{Payload} \\ \hline
+\end{tabular}
+\end{center}
+
+As with heap objects, the header starts with a pointer to a pair
+consisting of an \emph{info table} and some \emph{entry code}.
+
+\begin{description}
+
+\item[Return addresses] are used to cause selection and execution of
+case alternatives when a constructor is returned. Return addresses
+generated by the machine code compiler look like this:
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+@RET_XXX@ & \emph{Free Variables of the case alternatives} \\ \hline
+\end{tabular}
+\end{center}
+
+The free variables are a mixture of pointers and non-pointers whose
+layout is described by a bitmask in the info table.
+
+There are several kinds of @RET_XXX@ return address - see
+\secref{activation-records} for the details.
+
+Return addresses generated by the bytecode compiler look like this:
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+@BCO_RET@ & \emph{BCO} & \emph{Free Variables of the case alternatives} \\ \hline
+\end{tabular}
+\end{center}
+
+There is just one @BCO_RET@ info pointer. We avoid needing different
+@BCO_RET@s for each stack layout by tagging unboxed free variables as
+though they were pending arguments.
+
+\item[Update frames] are used to trigger updates. When an update
+frame is entered, it overwrites the updatee with an indirection to the
+result, restarts any threads blocked on the @BLACKHOLE@ and returns to
+the stack object underneath the update frame.
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+@UPDATE_FRAME@ & \emph{Next Update Frame} & \emph{Updatee} \\ \hline
+\end{tabular}
+\end{center}
+
+\item[Seq frames] are used to implement the polymorphic @seq@
+primitive. They are a special kind of update frame, and are linked on
+the update frame list.
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+@SEQ_FRAME@ & \emph{Next Update Frame} \\ \hline
+\end{tabular}
+\end{center}
+
+\item[Stop frames] are put on the bottom of each thread's stack, and
+act as sentinels for the update frame list (i.e. the last update frame
+points to the stop frame). Returning to a stop frame terminates the
+thread. Stop frames have no payload:
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+@SEQ_FRAME@ \\ \hline
+\end{tabular}
+\end{center}
+
+\end{description}
+
+\Subsubsection{Case expressions}{case-expr-overview}
+
+In the STG language, all evaluation is triggered by evaluating a case
+expression. When evaluating a case expression @case e of alts@, the
+evaluators pushes a return address onto the stack and evaluate the
+expression @e@. When @e@ eventually reduces to a constructor, the
+return address on the stack is entered. The details of how the
+constructor is passed to the return address and how the appropriate
+case alternative is selected vary between evaluators.
+
+Case expressions for unboxed data types are essentially the same: the
+case expression pushes a return address onto the stack before
+evaluating the scrutinee; when a function returns an unboxed value, it
+enters the return address on top of the stack.
+
+
+\Subsubsection{Function applications}{fun-app-overview}
+
+In the STG language, all function calls are tail calls. The arguments
+are pushed onto the stack and the function closure is entered. If any
+arguments are unboxed, they must be tagged as unboxed pending
+arguments. Entering a closure is just a special case of calling a
+function with no arguments.
+
+
+\Subsubsection{Let expressions}{let-expr-overview}
+
+In the STG language, almost all heap allocation is caused by let
+expressions. Filling in the contents of a set of mutually recursive
+heap objects is simple enough; the only difficulty is that once the
+heap space has been allocated, the thread must not return to the
+scheduler until after the objects are filled in.
+
+
+\Subsubsection{Primitive operations}{primop-overview}
+
+\ToDo{}
+
+Most primops are simple, some aren't.
+
+
+
+
+
+
+\Section{Scheduler}{scheduler-overview}
+
+The Scheduler is the heart of the run-time system. A running program
+consists of a single running thread, and a list of runnable and
+blocked threads. A thread is represented by a \emph{Thread Status
+Object} (TSO), which contains a few words status information and a
+stack. Except for the running thread, all threads have a closure on
+top of their stack; the scheduler restarts a thread by entering an
+evaluator which performs some reduction and returns to the scheduler.
+
+\Subsection{The scheduler's main loop}{scheduler-main-loop}
+
+The scheduler consists of a loop which chooses a runnable thread and
+invokes one of the evaluators which performs some reduction and
+returns.
+
+The scheduler also takes care of system-wide issues such as heap
+overflow or communication with other processors (in the parallel
+system) and thread-specific problems such as stack overflow.
+
+\Subsection{Creating a thread}{create-thread}
+
+Threads are created:
+
+\begin{itemize}
+
+\item
+
+When the scheduler is first invoked.
+
+\item
+
+When a message is received from another processor (I think). (Parallel
+system only.)
+
+\item
+
+When a C program calls some Haskell code.
+
+\item
+
+By @forkIO@, @takeMVar@ and (maybe) other Concurrent Haskell primitives.
+
+\end{itemize}
+
+
+\Subsection{Restarting a thread}{thread-restart}
+
+When the scheduler decides to run a thread, it has to decide which
+evaluator to use. It does this by looking at the type of the closure
+on top of the stack.
+\begin{itemize}
+\item @BCO@ $\Rightarrow$ bytecode evaluator
+\item @FUN@ or @THUNK@ $\Rightarrow$ machine code evaluator
+\item @CONSTR@ $\Rightarrow$ machine code evaluator
+\item other $\Rightarrow$ either evaluator.
+\end{itemize}
+
+The only surprise in the above is that the scheduler must enter the
+machine code evaluator if there's a constructor on top of the stack.
+This allows the bytecode evaluator to return a constructor to a
+machine code return address by pushing the constructor on top of the
+stack and returning to the scheduler. If the return address under the
+constructor is @HUGS_RET@, the entry code for @HUGS_RET@ will
+rearrange the stack so that the return @BCO@ is on top of the stack
+and return to the scheduler which will then call the bytecode
+evaluator. There is little point in trying to shorten this slightly
+indirect route since it is will happen very rarely if at all.
+
+\note{As an optimisation, we could store the choice of evaluator in
+the TSO status whenever we leave the evaluator. This is required for
+any thread, no matter what state it is in (blocked, stack overflow,
+etc). It isn't clear whether this would accomplish anything.}
+
+\Subsection{Returning from a thread}{thread-return}
+
+The evaluators return to the scheduler when any of the following
+conditions arise:
+
+\begin{itemize}
+\item A heap check fails, and a garbage collection is required.
+
+\item A stack check fails, and the scheduler must either enlarge the
+current thread's stack, or flag an out of memory condition.
+
+\item A thread enters a closure built by the other evaluator. That
+is, when the bytecode interpreter enters a closure compiled by GHC or
+when the machine code evaluator enters a BCO.
+
+\item A thread returns to a return continuation built by the other
+evaluator. That is, when the machine code evaluator returns to a
+continuation built by Hugs or when the bytecode evaluator returns to a
+continuation built by GHC.
+
+\item The evaluator needs to perform a ``safe'' C call
+(\secref{c-calls}).
+
+\item The thread becomes blocked. This happens when a thread requires
+the result of a computation currently being performed by another
+thread, or it reads a synchronisation variable that is currently empty
+(\secref{MVAR}).
+
+\item The thread is preempted (the preemption mechanism is described
+in \secref{thread-preemption}).
+
+\item The thread terminates.
+\end{itemize}
+
+Except when the thread terminates, the thread always terminates with a
+closure on the top of the stack. The mechanism used to trigger the
+world switch and the choice of closure left on top of the stack varies
+according to which world is being left and what is being returned.
+
+\Subsubsection{Leaving the bytecode evaluator}{hugs-to-ghc-switch}
+
+\paragraph{Entering a machine code closure}
+
+When it enters a closure, the bytecode evaluator performs a switch
+based on the type of closure (@AP@, @PAP@, @Ind@, etc). On entering a
+machine code closure, it returns to the scheduler with the closure on
+top of the stack.
+
+\paragraph{Returning a constructor}
+
+When it enters a constructor, the bytecode evaluator tests the return
+continuation on top of the stack. If it is a machine code
+continuation, it returns to the scheduler with the constructor on top
+of the stack.
+
+\note{This is why the scheduler must enter the machine code evaluator
+if it finds a constructor on top of the stack.}
+
+\paragraph{Returning an unboxed value}
+
+\note{Hugs doesn't support unboxed values in source programs but they
+are used for a few complex primops.}
+
+When it returns an unboxed value, the bytecode evaluator tests the
+return continuation on top of the stack. If it is a machine code
+continuation, it returns to the scheduler with the tagged unboxed
+value and a special closure on top of the stack. When the closure is
+entered (by the machine code evaluator), it returns the unboxed value
+on top of the stack to the return continuation under it.
+
+The runtime library for GHC provides one of these closures for each unboxed
+type. Hugs cannot generate them itself since the entry code is really
+very tricky.
+
+\paragraph{Heap/Stack overflow and preemption}
+
+The bytecode evaluator tests for heap/stack overflow and preemption
+when entering a BCO and simply returns with the BCO on top of the
+stack.
+
+\Subsubsection{Leaving the machine code evaluator}{ghc-to-hugs-switch}
+
+\paragraph{Entering a BCO}
+
+The entry code for a BCO pushes the BCO onto the stack and returns to
+the scheduler.
+
+\paragraph{Returning a constructor}
+
+We avoid the need to test return addresses in the machine code
+evaluator by pushing a special return address on top of a pointer to
+the bytecode return continuation. \figref{hugs-return-stack1}
+shows the state of the stack just before evaluating the scrutinee.
+
+\begin{figure}[ht]
+\begin{center}
+\begin{verbatim}
+| stack |
++----------+
+| bco |--> BCO
++----------+
+| HUGS_RET |
++----------+
+\end{verbatim}
+%\input{hugs_return1.pstex_t}
+\end{center}
+\caption{Stack layout for evaluating a scrutinee}
+\label{fig:hugs-return-stack1}
+\end{figure}
+
+This return address rearranges the stack so that the bco pointer is
+above the constructor on the stack (as shown in
+\figref{hugs-boxed-return}) and returns to the scheduler.
+
+\begin{figure}[ht]
+\begin{center}
+\begin{verbatim}
+| stack |
++----------+
+| con |--> Constructor
++----------+
+| bco |--> BCO
++----------+
+\end{verbatim}
+%\input{hugs_return2.pstex_t}
+\end{center}
+\caption{Stack layout for entering a Hugs return address}
+\label{fig:hugs-boxed-return}
+\end{figure}
+
+\paragraph{Returning an unboxed value}
+
+We avoid the need to test return addresses in the machine code
+evaluator by pushing a special return address on top of a pointer to
+the bytecode return continuation. This return address rearranges the
+stack so that the bco pointer is above the tagged unboxed value (as
+shown in \figref{hugs-entering-unboxed-return}) and returns to the
+scheduler.
+
+\begin{figure}[ht]
+\begin{center}
+\begin{verbatim}
+| stack |
++----------+
+| 1# |
++----------+
+| I# |
++----------+
+| bco |--> BCO
++----------+
+\end{verbatim}
+%\input{hugs_return2.pstex_t}
+\end{center}
+\caption{Stack layout for returning an unboxed value}
+\label{fig:hugs-entering-unboxed-return}
+\end{figure}
+
+\paragraph{Heap/Stack overflow and preemption}
+
+\ToDo{}
+
+
+\Subsection{Preempting a thread}{thread-preemption}
+
+Strictly speaking, threads cannot be preempted --- the scheduler
+merely sets a preemption request flag which the thread must arrange to
+test on a regular basis. When an evaluator finds that the preemption
+request flag is set, it pushes an appropriate closure onto the stack
+and returns to the scheduler.
+
+In the bytecode interpreter, the flag is tested whenever we enter a
+closure. If the preemption flag is set, it leaves the closure on top
+of the stack and returns to the scheduler.
+
+In the machine code evaluator, the flag is only tested when a heap or
+stack check fails. This is less expensive than testing the flag on
+entering every closure but runs the risk that a thread will enter an
+infinite loop which does not allocate any space. If the flag is set,
+the evaluator returns to the scheduler exactly as if a heap check had
+failed.
+
+\Subsection{``Safe'' and ``unsafe'' C calls}{c-calls}
+
+There are two ways of calling C:
+
+\begin{description}
+
+\item[``Unsafe'' C calls] are used if the programer is certain that
+the C function will not do anything dangerous. Unsafe C calls are
+faster but must be hand-checked by the programmer.
+
+Dangerous things include:
+
+\begin{itemize}
+
+\item
+
+Call a system function such as @getchar@ which might block
+indefinitely. This is dangerous because we don't want the entire
+runtime system to block just because one thread blocks.
+
+\item
+
+Call an RTS function which will block on the RTS access semaphore.
+This would lead to deadlock.
+
+\item
+
+Call a Haskell function. This is just a special case of calling an
+RTS function.
+
+\end{itemize}
+
+Unsafe C calls are performed by pushing the arguments onto the C stack
+and jumping to the C function's entry point. On exit, the result of
+the function is in a register which is returned to the Haskell code as
+an unboxed value.
+
+\item[``Safe'' C calls] are used if the programmer suspects that the
+thread may do something dangerous. Safe C calls are relatively slow
+but are less problematic.
+
+Safe C calls are performed by pushing the arguments onto the Haskell
+stack, pushing a return continuation and returning a \emph{C function
+descriptor} to the scheduler. The scheduler suspends the Haskell thread,
+spawns a new operating system thread which pops the arguments off the
+Haskell stack onto the C stack, calls the C function, pushes the
+function result onto the Haskell stack and informs the scheduler that
+the C function has completed and the Haskell thread is now runnable.
+
+\end{description}
+
+The bytecode evaluator will probably treat all C calls as being safe.
+
+\ToDo{It might be good for the programmer to indicate how the program
+is unsafe. For example, if we distinguish between C functions which
+might call Haskell functions and those which might block, we could
+perform an unsafe call for blocking functions in a single-threaded
+system or, perhaps, in a multi-threaded system which only happens to
+have a single thread at the moment.}
+
+
+
+\Section{The Storage Manager}{sm-overview}
+
+The storage manager is responsible for managing the heap and all
+objects stored in it. It provides special support for lazy evaluation
+and for foreign function calls.
+
+\Subsection{SM support for lazy evaluation}{sm-lazy-evaluation}
+
+\begin{itemize}
+\item
+
+Indirections are shorted out.
+
+\item
+
+Update frames pointing to unreachable objects are squeezed out.
+
+\ToDo{Part IV suggests this doesn't happen.}
+
+\item
+
+Adjacent update frames (for different closures) are compressed to a
+single update frame pointing to a single black hole.
+
+\end{itemize}
+
+
+\Subsection{SM support for foreign function calls}{sm-foreign-calls}
+
+\begin{itemize}
+
+\item
+
+Stable pointers allow other languages to access Haskell objects.
+
+\item
+
+Weak pointers and foreign objects provide finalisation support for
+Haskell references to external objects.
+
+\end{itemize}
+
+\Subsection{Misc}{sm-misc}
+
+\begin{itemize}
+
+\item
+
+If the stack contains a large amount of free space, the storage
+manager may shrink the stack. If it shrinks the stack, it guarantees
+never to leave less than @MIN_SIZE_SHRUNKEN_STACK@ empty words on the
+stack when it does so.
+
+\item
+
+For efficiency reasons, very large objects (eg large arrays and TSOs)
+are not moved if possible.
+
+\end{itemize}
+
+
+\Section{The Compilers}{compilers-overview}
+
+Need to describe interface files, format of bytecode files, symbols
+defined by machine code files.
+
+\Subsection{Interface Files}{interface-files}
+
+Here's an example - but I don't know the grammar - ADR.
+\begin{verbatim}
+_interface_ Main 1
+_exports_
+Main main ;
+_declarations_
+1 main _:_ IOBase.IO PrelBase.();;
+\end{verbatim}
+
+\Subsection{Bytecode files}{bytecode-files}
+
+(All that matters here is what the loader sees.)
+
+\Subsection{Machine code files}{asm-files}
+
+(Again, all that matters is what the loader sees.)
+
+\Section{The Loader}{loader-overview}
+
+In a batch mode system, we can statically link all the modules
+together. In an interactive system we need a loader which will
+explicitly load and unload individual modules (or, perhaps, blocks of
+mutually dependent modules) and resolve references between modules.
+
+While many operating systems provide support for dynamic loading and
+will automatically resolve cross-module references for us, we generally
+cannot rely on being able to load mutually dependent modules.
+
+A portable solution is to perform some of the linking ourselves. Each module
+should provide three global symbols:
+\begin{itemize}
+\item
+An initialisation routine. (Might also be used for finalisation.)
+\item
+A table of symbols it exports.
+Entries in this table consist of the symbol name and the address of the
+name's value.
+\item
+A table of symbols it imports.
+Entries in this table consist of the symbol name and a list of references
+to that symbol.
+\end{itemize}
+
+On loading a group of modules, the loader adds the contents of the
+export lists to a symbol table and then fills in all the references in the
+import lists.
+
+References in import lists are of two types:
+\begin{description}
+\item[ References in machine code ]
+
+The most efficient approach is to patch the machine code directly, but
+this will be a lot of work, very painful to port and rather fragile.
+
+Alternatively, the loader could store the value of each symbol in the
+import table for each module and the compiled code can access all
+external objects through the import table. This requires that the
+import table be writable but does not require that the machine code or
+info tables be writable.
+
+\item[ References in data structures (SRTs and static data constructors) ]
+
+Either we patch the SRTs and constructors directly or we somehow use
+indirections through the symbol table. Patching the SRTs requires
+that we make them writable and prevents us from making effective use
+of virtual memories that use copy-on-write policies (this only makes a
+difference if we want to run several copies of the same program
+simultaneously). Using an indirection is possible but tricky.
+
+Note: We could avoid patching machine code if all references to
+external references went through the SRT --- then we just have one
+thing to patch. But the SRT always contains a pointer to the closure
+rather than the fast entry point (say), so we'd take a big performance
+hit for doing this.
+
+\end{description}
+
+Using the above scheme, all accesses to ``external'' objects involve a
+layer of indirection. To avoid this overhead, the machine code
+compiler might provide a way for the programmer to specify which
+modules will be statically linked and which will be dynamically linked
+--- the idea being that statically linked code and data will be
+accessed directly.
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\part{Internal details}
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+This part is concerned with the internal details of the components
+described in the previous part.
+
+The major components of the system are:
+\begin{itemize}
+\item The scheduler (\secref{scheduler-internals})
+\item The storage manager (\secref{storage-manager-internals})
+\item The evaluators
+\item The loader
+\item The compilers
+\end{itemize}
+
+\Section{The Scheduler}{scheduler-internals}
+
+\ToDo{Detailed description of scheduler}
+
+Many heap objects contain fields allowing them to be inserted onto lists
+during evaluation or during garbage collection. The lists required by
+the evaluator and storage manager are as follows.
+
+\begin{itemize}
+
+\item 4 lists of threads: runnable threads, sleeping threads, threads
+waiting for timeout and threads waiting for I/O.
+
+\item The \emph{mutables list} is a list of all objects in the old
+generation which might contain pointers into the new generation. Most
+of the objects on this list are indirections (\secref{IND})
+or ``mutable.'' (\secref{mutables}.)
+
+\item The \emph{Foreign Object list} is a list of all foreign objects
+ which have not yet been deallocated. (\secref{FOREIGN}.)
+
+\item The \emph{Spark pool} is a doubly(?) linked list of Spark objects
+maintained by the parallel system. (\secref{SPARK}.)
+
+\item The \emph{Blocked Fetch list} (or
+lists?). (\secref{BLOCKED_FETCH}.)
+
+\item For each thread, there is a list of all update frames on the
+stack. (\secref{data-updates}.)
+
+\item The Stable Pointer Table is a table of pointers to objects which
+are known to the outside world and must be retained by the garbage
+collector even if they are not accessible from within the heap.
+
+\end{itemize}
+
+\ToDo{The links for these fields are usually inserted immediately
+after the fixed header except ...}
+
+
+
+\Section{The Storage Manager}{storage-manager-internals}
+
+\subsection{Misc Text looking for a home}
+
+A \emph{value} may be:
+\begin{itemize}
+\item \emph{Boxed}, i.e.~represented indirectly by a pointer to a heap object (e.g.~foreign objects, arrays); or
+\item \emph{Unboxed}, i.e.~represented directly by a bit-pattern in one or more registers (e.g.~@Int#@ and @Float#@).
+\end{itemize}
+All \emph{pointed} values are \emph{boxed}.
+
+
+\Subsection{Heap Objects}{heap-objects}
+\label{sec:fixed-header}
+
+\begin{figure}
+\begin{center}
+\input{closure}
+\end{center}
+\ToDo{Fix this picture}
+\caption{A closure}
+\label{fig:closure}
+\end{figure}
+
+Every \emph{heap object} is a contiguous block of memory, consisting
+of a fixed-format \emph{header} followed by zero or more \emph{data
+words}.
+
+The header consists of the following fields:
+\begin{itemize}
+\item A one-word \emph{info pointer}, which points to
+the object's static \emph{info table}.
+\item Zero or more \emph{admin words} that support
+\begin{itemize}
+\item Profiling (notably a \emph{cost centre} word).
+ \note{We could possibly omit the cost centre word from some
+ administrative objects.}
+\item Parallelism (e.g. GranSim keeps the object's global address here,
+though GUM keeps a separate hash table).
+\item Statistics (e.g. a word to track how many times a thunk is entered.).
+
+We add a Ticky word to the fixed-header part of closures. This is
+used to indicate if a closure has been updated but not yet entered. It
+is set when the closure is updated and cleared when subsequently
+entered. \footnote{% NB: It is \emph{not} an ``entry count'', it is
+an ``entries-after-update count.'' The commoning up of @CONST@,
+@CHARLIKE@ and @INTLIKE@ closures is turned off(?) if this is
+required. This has only been done for 2s collection. }
+
+\end{itemize}
+\end{itemize}
+
+Most of the RTS is completely insensitive to the number of admin
+words. The total size of the fixed header is given by
+@sizeof(StgHeader)@.
+
+\Subsection{Info Tables}{info-tables}
+
+An \emph{info table} is a contiguous block of memory, laid out as follows:
+
+\begin{center}
+\begin{tabular}{|r|l|}
+ \hline Parallelism Info & variable
+\\ \hline Profile Info & variable
+\\ \hline Debug Info & variable
+\\ \hline Static reference table & pointer word (optional)
+\\ \hline Storage manager layout info & pointer word
+\\ \hline Closure flags & 8 bits
+\\ \hline Closure type & 8 bits
+\\ \hline Constructor Tag / SRT length & 16 bits
+\\ \hline entry code
+\\ \vdots
+\end{tabular}
+\end{center}
+
+On a 64-bit machine the tag, type and flags fields will all be doubled
+in size, so the info table is a multiple of 64 bits.
+
+An info table has the following contents (working backwards in memory
+addresses):
+
+\begin{itemize}
+
+\item The \emph{entry code} for the closure. This code appears
+literally as the (large) last entry in the info table, immediately
+preceded by the rest of the info table. An \emph{info pointer} always
+points to the first byte of the entry code.
+
+\item A 16-bit constructor tag / SRT length. For a constructor info
+table this field contains the tag of the constructor, in the range
+$0..n-1$ where $n$ is the number of constructors in the datatype.
+Otherwise, it contains the number of entries in this closure's Static
+Reference Table (\secref{srt}).
+
+\item An 8-bit {\em closure type field}, which identifies what kind of
+closure the object is. The various types of closure are described in
+\secref{closures}.
+
+\item an 8-bit flags field, which holds various flags pertaining to
+the closure type.
+
+\item A single pointer or word --- the {\em storage manager info
+field}, contains auxiliary information describing the closure's
+precise layout, for the benefit of the garbage collector and the code
+that stuffs graph into packets for transmission over the network.
+There are three kinds of layout information:
+
+\begin{itemize}
+\item Standard layout information is for closures which place pointers
+before non-pointers in instances of the closure (this applies to most
+heap-based and static closures, but not activation records). The
+layout information for standard closures is
+
+ \begin{itemize}
+ \item Number of pointer fields (16 bits).
+ \item Number of non-pointer fields (16 bits).
+ \end{itemize}
+
+\item Activation records don't have pointers before non-pointers,
+since stack-stubbing requires that the record has holes in it. The
+layout is therefore represented by a bitmap in which each '1' bit
+represents a non-pointer word. This kind of layout info is used for
+@RET_SMALL@ and @RET_VEC_SMALL@ closures.
+
+\item If an activation record is longer than 32 words, then the layout
+field contains a pointer to a bitmap record, consisting of a length
+field followed by two or more bitmap words. This layout information
+is used for @RET_BIG@ and @RET_VEC_BIG@ closures.
+
+\item Selector Thunks (\secref{THUNK_SELECTOR}) use the closure
+layout field to hold the selector index, since the layout is always
+known (the closure contains a single pointer field).
+\end{itemize}
+
+\item A one-word {\em Static Reference Table} field. This field
+points to the static reference table for the closure (\secref{srt}),
+and is only present for the following closure types:
+
+ \begin{itemize}
+ \item @FUN_*@
+ \item @THUNK_*@
+ \item @RET_*@
+ \end{itemize}
+
+\ToDo{Expand the following explanation.}
+
+An SRT is basically a vector of pointers to static closures. A
+top-level function or thunk will have an SRT (which might be empty),
+which points to all the static closures referenced by that function or
+thunk. Every non-top-level thunk or function also has an SRT, but
+it'll be a sub-sequence of the top-level SRT, so we just store a
+pointer and a length in the info table - the pointer points into the
+middle of the larger SRT.
+
+At GC time, the garbage collector traverses the transitive closure of
+all the SRTs reachable from the roots, and thereby discovers which
+CAFs are live.
+
+\item \emph{Profiling info\/}
+
+\ToDo{The profiling info is completely bogus. I've not deleted it
+from the document but I've commented it all out.}
+
+% change to \iftrue to uncomment this section
+\iffalse
+
+Closure category records are attached to the info table of the
+closure. They are declared with the info table. We put pointers to
+these ClCat things in info tables. We need these ClCat things because
+they are mutable, whereas info tables are immutable. Hashing will map
+similar categories to the same hash value allowing statistics to be
+grouped by closure category.
+
+Cost Centres and Closure Categories are hashed to provide indexes
+against which arbitrary information can be stored. These indexes are
+memoised in the appropriate cost centre or category record and
+subsequent hashes avoided by the index routine (it simply returns the
+memoised index).
+
+There are different features which can be hashed allowing information
+to be stored for different groupings. Cost centres have the cost
+centre recorded (using the pointer), module and group. Closure
+categories have the closure description and the type
+description. Records with the same feature will be hashed to the same
+index value.
+
+The initialisation routines, @init_index_<feature>@, allocate a hash
+table in which the cost centre / category records are stored. The
+lower bound for the table size is taken from @max_<feature>_no@. They
+return the actual table size used (the next power of 2). Unused
+locations in the hash table are indicated by a 0 entry. Successive
+@init_index_<feature>@ calls just return the actual table size.
+
+Calls to @index_<feature>@ will insert the cost centre / category
+record in the @<feature>@ hash table, if not already inserted. The hash
+index is memoised in the record and returned.
+
+CURRENTLY ONLY ONE MEMOISATION SLOT IS AVILABLE IN EACH RECORD SO
+HASHING CAN ONLY BE DONE ON ONE FEATURE FOR EACH RECORD. This can be
+easily relaxed at the expense of extra memoisation space or continued
+rehashing.
+
+The initialisation routines must be called before initialisation of
+the stacks and heap as they require to allocate storage. It is also
+expected that the caller may want to allocate additional storage in
+which to store profiling information based on the return table size
+value(s).
+
+\begin{center}
+\begin{tabular}{|l|}
+ \hline Hash Index
+\\ \hline Selected
+\\ \hline Kind
+\\ \hline Description String
+\\ \hline Type String
+\\ \hline
+\end{tabular}
+\end{center}
+
+\begin{description}
+\item[Hash Index] Memoised copy
+\item[Selected]
+ Is this category selected (-1 == not memoised, selected? 0 or 1)
+\item[Kind]
+One of the following values (defined in CostCentre.lh):
+
+\begin{description}
+\item[@CON_K@]
+A constructor.
+\item[@FN_K@]
+A literal function.
+\item[@PAP_K@]
+A partial application.
+\item[@THK_K@]
+A thunk, or suspension.
+\item[@BH_K@]
+A black hole.
+\item[@ARR_K@]
+An array.
+\item[@ForeignObj_K@]
+A Foreign object (non-Haskell heap resident).
+\item[@SPT_K@]
+The Stable Pointer table. (There should only be one of these but it
+represents a form of weak space leak since it can't shrink to meet
+non-demand so it may be worth watching separately? ADR)
+\item[@INTERNAL_KIND@]
+Something internal to the runtime system.
+\end{description}
+
+
+\item[Description] Source derived string detailing closure description.
+\item[Type] Source derived string detailing closure type.
+\end{description}
+
+\fi % end of commented out stuff
+
+\item \emph{Parallelism info\/}
+\ToDo{}
+
+\item \emph{Debugging info\/}
+\ToDo{}
+
+\end{itemize}
+
+
+%-----------------------------------------------------------------------------
+\Subsection{Kinds of Heap Object}{closures}
+
+Heap objects can be classified in several ways, but one useful one is
+this:
+\begin{itemize}
+\item
+\emph{Static closures} occupy fixed, statically-allocated memory
+locations, with globally known addresses.
+
+\item
+\emph{Dynamic closures} are individually allocated in the heap.
+
+\item
+\emph{Stack closures} are closures allocated within a thread's stack
+(which is itself a heap object). Unlike other closures, there are
+never any pointers to stack closures. Stack closures are discussed in
+\secref{TSO}.
+
+\end{itemize}
+A second useful classification is this:
+\begin{itemize}
+
+\item \emph{Executive objects}, such as thunks and data constructors,
+participate directly in a program's execution. They can be subdivided
+into three kinds of objects according to their type: \begin{itemize}
+
+\item \emph{Pointed objects}, represent values of a \emph{pointed}
+type (<.pointed types launchbury.>) --i.e.~a type that includes
+$\bottom$ such as @Int@ or @Int# -> Int#@.
+
+\item \emph{Unpointed objects}, represent values of a \emph{unpointed}
+type --i.e.~a type that does not include $\bottom$ such as @Int#@ or
+@Array#@.
+
+\item \emph{Activation frames}, represent ``continuations''. They are
+always stored on the stack and are never pointed to by heap objects or
+passed as arguments. \note{It's not clear if this will still be true
+once we support speculative evaluation.}
+
+\end{itemize}
+
+\item \emph{Administrative objects}, such as stack objects and thread
+state objects, do not represent values in the original program.
+\end{itemize}
+
+Only pointed objects can be entered. If an unpointed object is
+entered the program will usually terminate with a fatal error.
+
+This section enumerates all the kinds of heap objects in the system.
+Each is identified by a distinct closure type field in its info table.
+
+\begin{tabular}{|l|l|l|l|l|l|l|l|l|l|l|}
+\hline
+
+closure type & Section \\
+
+\hline
+\emph{Pointed} \\
+\hline
+
+@CONSTR@ & \ref{sec:CONSTR} \\
+@CONSTR_p_n@ & \ref{sec:CONSTR} \\
+@CONSTR_STATIC@ & \ref{sec:CONSTR} \\
+@CONSTR_NOCAF_STATIC@ & \ref{sec:CONSTR} \\
+
+@FUN@ & \ref{sec:FUN} \\
+@FUN_p_n@ & \ref{sec:FUN} \\
+@FUN_STATIC@ & \ref{sec:FUN} \\
+
+@THUNK@ & \ref{sec:THUNK} \\
+@THUNK_p_n@ & \ref{sec:THUNK} \\
+@THUNK_STATIC@ & \ref{sec:THUNK} \\
+@THUNK_SELECTOR@ & \ref{sec:THUNK_SELECTOR} \\
+
+@BCO@ & \ref{sec:BCO} \\
+
+@AP_UPD@ & \ref{sec:AP_UPD} \\
+@PAP@ & \ref{sec:PAP} \\
+
+@IND@ & \ref{sec:IND} \\
+@IND_OLDGEN@ & \ref{sec:IND} \\
+@IND_PERM@ & \ref{sec:IND} \\
+@IND_OLDGEN_PERM@ & \ref{sec:IND} \\
+@IND_STATIC@ & \ref{sec:IND} \\
+
+@CAF_UNENTERED@ & \ref{sec:CAF} \\
+@CAF_ENTERED@ & \ref{sec:CAF} \\
+@CAF_BLACKHOLE@ & \ref{sec:CAF} \\
+
+\hline
+\emph{Unpointed} \\
+\hline
+
+@BLACKHOLE@ & \ref{sec:BLACKHOLE} \\
+@BLACKHOLE_BQ@ & \ref{sec:BLACKHOLE_BQ} \\
+
+@MVAR@ & \ref{sec:MVAR} \\
+
+@ARR_WORDS@ & \ref{sec:ARR_WORDS} \\
+
+@MUTARR_PTRS@ & \ref{sec:MUT_ARR_PTRS} \\
+@MUTARR_PTRS_FROZEN@ & \ref{sec:MUT_ARR_PTRS_FROZEN} \\
+
+@MUT_VAR@ & \ref{sec:MUT_VAR} \\
+
+@WEAK@ & \ref{sec:WEAK} \\
+@FOREIGN@ & \ref{sec:FOREIGN} \\
+@STABLE_NAME@ & \ref{sec:STABLE_NAME} \\
+\hline
+\end{tabular}
+
+Activation frames do not live (directly) on the heap --- but they have
+a similar organisation.
+
+\begin{tabular}{|l|l|}\hline
+closure type & Section \\ \hline
+@RET_SMALL@ & \ref{sec:activation-records} \\
+@RET_VEC_SMALL@ & \ref{sec:activation-records} \\
+@RET_BIG@ & \ref{sec:activation-records} \\
+@RET_VEC_BIG@ & \ref{sec:activation-records} \\
+@UPDATE_FRAME@ & \ref{sec:activation-records} \\
+@CATCH_FRAME@ & \ref{sec:activation-records} \\
+@SEQ_FRAME@ & \ref{sec:activation-records} \\
+@STOP_FRAME@ & \ref{sec:activation-records} \\
+\hline
+\end{tabular}
+
+There are also a number of administrative objects. It is an error to
+enter one of these objects.
+
+\begin{tabular}{|l|l|}\hline
+closure type & Section \\ \hline
+@TSO@ & \ref{sec:TSO} \\
+@SPARK_OBJECT@ & \ref{sec:SPARK} \\
+@BLOCKED_FETCH@ & \ref{sec:BLOCKED_FETCH} \\
+@FETCHME@ & \ref{sec:FETCHME} \\
+\hline
+\end{tabular}
+
+\Subsection{Predicates}{closure-predicates}
+
+The runtime system sometimes needs to be able to distinguish objects
+according to their properties: is the object updateable? is it in weak
+head normal form? etc. These questions can be answered by examining
+the closure type field of the object's info table.
+
+We define the following predicates to detect families of related
+info types. They are mutually exclusive and exhaustive.
+
+\begin{itemize}
+\item @isCONSTR@ is true for @CONSTR@s.
+\item @isFUN@ is true for @FUN@s.
+\item @isTHUNK@ is true for @THUNK@s.
+\item @isBCO@ is true for @BCO@s.
+\item @isAP@ is true for @AP@s.
+\item @isPAP@ is true for @PAP@s.
+\item @isINDIRECTION@ is true for indirection objects.
+\item @isBH@ is true for black holes.
+\item @isFOREIGN_OBJECT@ is true for foreign objects.
+\item @isARRAY@ is true for array objects.
+\item @isMVAR@ is true for @MVAR@s.
+\item @isIVAR@ is true for @IVAR@s.
+\item @isFETCHME@ is true for @FETCHME@s.
+\item @isSLOP@ is true for slop objects.
+\item @isRET_ADDR@ is true for return addresses.
+\item @isUPD_ADDR@ is true for update frames.
+\item @isTSO@ is true for @TSO@s.
+\item @isSTABLE_PTR_TABLE@ is true for the stable pointer table.
+\item @isSPARK_OBJECT@ is true for spark objects.
+\item @isBLOCKED_FETCH@ is true for blocked fetch objects.
+\item @isINVALID_INFOTYPE@ is true for all other info types.
+
+\end{itemize}
+
+The following predicates detect other interesting properties:
+
+\begin{itemize}
+
+\item @isPOINTED@ is true if an object has a pointed type.
+
+If an object is pointed, the following predicates may be true
+(otherwise they are false). @isWHNF@ and @isUPDATEABLE@ are
+mutually exclusive.
+
+\begin{itemize}
+\item @isWHNF@ is true if the object is in Weak Head Normal Form.
+Note that unpointed objects are (arbitrarily) not considered to be in WHNF.
+
+@isWHNF@ is true for @PAP@s, @CONSTR@s, @FUN@s and all @BCO@s.
+
+\ToDo{Need to distinguish between whnf BCOs and non-whnf BCOs in their
+closure type}
+
+\item @isUPDATEABLE@ is true if the object may be overwritten with an
+ indirection object.
+
+@isUPDATEABLE@ is true for @THUNK@s, @AP@s and @BH@s.
+
+\end{itemize}
+
+It is possible for a pointed object to be neither updatable nor in
+WHNF. For example, indirections.
+
+\item @isUNPOINTED@ is true if an object has an unpointed type.
+All such objects are boxed since only boxed objects have info pointers.
+
+It is true for @ARR_WORDS@, @ARR_PTRS@, @MUTVAR@, @MUTARR_PTRS@,
+@MUTARR_PTRS_FROZEN@, @FOREIGN@ objects, @MVAR@s and @IVAR@s.
+
+\item @isACTIVATION_FRAME@ is true for activation frames of all sorts.
+
+It is true for return addresses and update frames.
+\begin{itemize}
+\item @isVECTORED_RETADDR@ is true for vectored return addresses.
+\item @isDIRECT_RETADDR@ is true for direct return addresses.
+\end{itemize}
+
+\item @isADMINISTRATIVE@ is true for administrative objects:
+@TSO@s, the stable pointer table, spark objects and blocked fetches.
+
+\item @hasSRT@ is true if the info table for the object contains an
+SRT pointer.
+
+@hasSRT@ is true for @THUNK@s, @FUN@s, and @RET@s.
+
+\end{itemize}
+
+\begin{itemize}
+
+\item @isSTATIC@ is true for any statically allocated closure.
+
+\item @isMUTABLE@ is true for objects with mutable pointer fields:
+ @MUT_ARR@s, @MUTVAR@s, @MVAR@s and @IVAR@s.
+
+\item @isSparkable@ is true if the object can (and should) be sparked.
+It is true of updateable objects which are not in WHNF with the
+exception of @THUNK_SELECTOR@s and black holes.
+
+\end{itemize}
+
+As a minor optimisation, we might use the top bits of the @INFO_TYPE@
+field to ``cache'' the answers to some of these predicates.
+
+An indirection either points to HNF (post update); or is result of
+overwriting a FetchMe, in which case the thing fetched is either under
+evaluation (BLACKHOLE), or by now an HNF. Thus, indirections get
+NoSpark flag.
+
+\subsection{Closures (aka Pointed Objects)}
+
+An object can be entered iff it is a closure.
+
+\Subsubsection{Function closures}{FUN}
+
+Function closures represent lambda abstractions. For example,
+consider the top-level declaration:
+\begin{verbatim}
+ f = \x -> let g = \y -> x+y
+ in g x
+\end{verbatim}
+Both @f@ and @g@ are represented by function closures. The closure
+for @f@ is \emph{static} while that for @g@ is \emph{dynamic}.
+
+The layout of a function closure is as follows:
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+\emph{Fixed header} & \emph{Pointers} & \emph{Non-pointers} \\ \hline
+\end{tabular}
+\end{center}
+
+The data words (pointers and non-pointers) are the free variables of
+the function closure. The number of pointers and number of
+non-pointers are stored in @info->layout.ptrs@ and
+@info->layout.nptrs@ respecively.
+
+There are several different sorts of function closure, distinguished
+by their closure type field:
+
+\begin{itemize}
+
+\item @FUN@: a vanilla, dynamically allocated on the heap.
+
+\item $@FUN_@p@_@np$: to speed up garbage collection a number of
+specialised forms of @FUN@ are provided, for particular $(p,np)$
+pairs, where $p$ is the number of pointers and $np$ the number of
+non-pointers.
+
+\item @FUN_STATIC@. Top-level, static, function closures (such as @f@
+above) have a different layout than dynamic ones:
+
+\begin{center}
+\begin{tabular}{|l|l|l|}\hline
+\emph{Fixed header} & \emph{Static object link} \\ \hline
+\end{tabular}
+\end{center}
+
+Static function closures have no free variables. (However they may
+refer to other static closures; these references are recorded in the
+function closure's SRT.) They have one field that is not present in
+dynamic closures, the \emph{static object link} field. This is used
+by the garbage collector in the same way that to-space is, to gather
+closures that have been determined to be live but that have not yet
+been scavenged.
+
+\note{Static function closures that have no static references, and
+hence a null SRT pointer, don't need the static object link field. We
+don't take advantage of this at the moment, but we could. See
+@CONSTR\_NOCAF\_STATIC@.}
+\end{itemize}
+
+Each lambda abstraction, $f$, in the STG program has its own private
+info table. The following labels are relevant:
+
+\begin{itemize}
+
+\item $f$@_info@ is $f$'s info table.
+
+\item $f$@_entry@ is $f$'s slow entry point (i.e. the entry code of
+its info table; so it will label the same byte as $f$@_info@).
+
+\item $f@_fast_@k$ is $f$'s fast entry point. $k$ is the number of
+arguments $f$ takes; encoding this number in the fast-entry label
+occasionally catches some nasty code-generation errors.
+
+\end{itemize}
+
+\Subsubsection{Data constructors}{CONSTR}
+
+Data-constructor closures represent values constructed with algebraic
+data type constructors. The general layout of data constructors is
+the same as that for function closures. That is
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+\emph{Fixed header} & \emph{Pointers} & \emph{Non-pointers} \\ \hline
+\end{tabular}
+\end{center}
+
+There are several different sorts of constructor:
+
+\begin{itemize}
+
+\item @CONSTR@: a vanilla, dynamically allocated constructor.
+
+\item @CONSTR_@$p$@_@$np$: just like $@FUN_@p@_@np$.
+
+\item @CONSTR_INTLIKE@. A dynamically-allocated heap object that
+looks just like an @Int@. The garbage collector checks to see if it
+can common it up with one of a fixed set of static int-like closures,
+thus getting it out of the dynamic heap altogether.
+
+\item @CONSTR_CHARLIKE@: same deal, but for @Char@.
+
+\item @CONSTR_STATIC@ is similar to @FUN_STATIC@, with the
+complication that the layout of the constructor must mimic that of a
+dynamic constructor, because a static constructor might be returned to
+some code that unpacks it. So its layout is like this:
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|l|}\hline
+\emph{Fixed header} & \emph{Pointers} & \emph{Non-pointers} & \emph{Static object link}\\ \hline
+\end{tabular}
+\end{center}
+
+The static object link, at the end of the closure, serves the same purpose
+as that for @FUN_STATIC@. The pointers in the static constructor can point
+only to other static closures.
+
+The static object link occurs last in the closure so that static
+constructors can store their data fields in exactly the same place as
+dynamic constructors.
+
+\item @CONSTR_NOCAF_STATIC@. A statically allocated data constructor
+that guarantees not to point (directly or indirectly) to any CAF
+(\secref{CAF}). This means it does not need a static object
+link field. Since we expect that there might be quite a lot of static
+constructors this optimisation makes sense. Furthermore, the @NOCAF@
+tag allows the compiler to indicate that no CAFs can be reached
+anywhere \emph{even indirectly}.
+
+\end{itemize}
+
+For each data constructor $Con$, two info tables are generated:
+
+\begin{itemize}
+\item $Con$@_con_info@ labels $Con$'s dynamic info table,
+shared by all dynamic instances of the constructor.
+\item $Con$@_static@ labels $Con$'s static info table,
+shared by all static instances of the constructor.
+\end{itemize}
+
+Each constructor also has a \emph{constructor function}, which is a
+curried function which builds an instance of the constructor. The
+constructor function has an info table labelled as @$Con$_info@, and
+entry code pointed to by @$Con$_entry@.
+
+Nullary constructors are represented by a single static info table,
+which everyone points to. Thus for a nullary constructor we can omit
+the dynamic info table and the constructor function.
+
+\subsubsection{Thunks}
+\label{sec:THUNK}
+\label{sec:THUNK_SELECTOR}
+
+A thunk represents an expression that is not obviously in head normal
+form. For example, consider the following top-level definitions:
+\begin{verbatim}
+ range = between 1 10
+ f = \x -> let ys = take x range
+ in sum ys
+\end{verbatim}
+Here the right-hand sides of @range@ and @ys@ are both thunks; the former
+is static while the latter is dynamic.
+
+The layout of a thunk is the same as that for a function closure.
+However, thunks must have a payload of at least @MIN_UPD_SIZE@
+words to allow it to be overwritten with a black hole and an
+indirection. The compiler may have to add extra non-pointer fields to
+satisfy this constraint.
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|l|}\hline
+\emph{Fixed header} & \emph{Pointers} & \emph{Non-pointers} \\ \hline
+\end{tabular}
+\end{center}
+
+The layout word in the info table contains the same information as for
+function closures; that is, number of pointers and number of
+non-pointers.
+
+A thunk differs from a function closure in that it can be updated.
+
+There are several forms of thunk:
+
+\begin{itemize}
+
+\item @THUNK@ and $@THUNK_@p@_@np$: vanilla, dynamically allocated
+thunks. Dynamic thunks are overwritten with normal indirections
+(@IND@), or old generation indirections (@IND_OLDGEN@): see
+\secref{IND}.
+
+\item @THUNK_STATIC@. A static thunk is also known as a
+\emph{constant applicative form}, or \emph{CAF}. Static thunks are
+overwritten with static indirections.
+
+\begin{center}
+\begin{tabular}{|l|l|}\hline
+\emph{Fixed header} & \emph{Static object link}\\ \hline
+\end{tabular}
+\end{center}
+
+\item @THUNK_SELECTOR@ is a (dynamically allocated) thunk whose entry
+code performs a simple selection operation from a data constructor
+drawn from a single-constructor type. For example, the thunk
+\begin{verbatim}
+ x = case y of (a,b) -> a
+\end{verbatim}
+is a selector thunk. A selector thunk is laid out like this:
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+\emph{Fixed header} & \emph{Selectee pointer} \\ \hline
+\end{tabular}
+\end{center}
+
+The layout word contains the byte offset of the desired word in the
+selectee. Note that this is different from all other thunks.
+
+The garbage collector ``peeks'' at the selectee's tag (in its info
+table). If it is evaluated, then it goes ahead and does the
+selection, and then behaves just as if the selector thunk was an
+indirection to the selected field. If it is not evaluated, it treats
+the selector thunk like any other thunk of that shape.
+[Implementation notes. Copying: only the evacuate routine needs to be
+special. Compacting: only the PRStart (marking) routine needs to be
+special.]
+
+There is a fixed set of pre-compiled selector thunks built into the
+RTS, representing offsets from 0 to @MAX_SPEC_SELECTOR_THUNK@. The
+info tables are labelled @__sel_$n$_upd_info@ where $n$ is the offset.
+Non-updating versions are also built in, with info tables labelled
+@__sel_$n$_noupd_info@.
+
+\end{itemize}
+
+The only label associated with a thunk is its info table:
+
+\begin{description}
+\item[$f$@\_info@] is $f$'s info table.
+\end{description}
+
+
+\Subsubsection{Byte-code objects}{BCO}
+
+A Byte-Code Object (BCO) is a container for a a chunk of byte-code,
+which can be executed by Hugs. The byte-code represents a
+supercombinator in the program: when Hugs compiles a module, it
+performs lambda lifting and each resulting supercombinator becomes a
+byte-code object in the heap.
+
+BCOs are not updateable; the bytecode compiler represents updatable
+thunks using a combination of @AP@s and @BCO@s.
+
+The semantics of BCOs are described in \secref{hugs-heap-objects}. A
+BCO has the following structure:
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|l|l|}
+\hline
+\emph{Fixed Header} & \emph{Layout} & \emph{Offset} & \emph{Size} &
+\emph{Literals} & \emph{Byte code} \\
+\hline
+\end{tabular}
+\end{center}
+
+\noindent where:
+\begin{itemize}
+\item The entry code is a static code fragment/info table that returns
+to the scheduler to invoke Hugs (\secref{ghc-to-hugs-switch}).
+\item \emph{Layout} contains the number of pointer literals in the
+\emph{Literals} field.
+\item \emph{Offset} is the offset to the byte code from the start of
+the object.
+\item \emph{Size} is the number of words of byte code in the object.
+\item \emph{Literals} contains any pointer and non-pointer literals used in
+the byte-codes (including jump addresses), pointers first.
+\item \emph{Byte code} contains \emph{Size} words of non-pointer byte
+code.
+\end{itemize}
+
+
+\Subsubsection{Partial applications}{PAP}
+
+A partial application (PAP) represents a function applied to too few
+arguments. It is only built as a result of updating after an
+argument-satisfaction check failure. A PAP has the following shape:
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+\emph{Fixed header} & \emph{No of words of stack} & \emph{Function closure} & \emph{Stack chunk ...} \\ \hline
+\end{tabular}
+\end{center}
+
+The ``Stack chunk'' is a copy of the chunk of stack above the update
+frame; ``No of words of stack'' tells how many words it consists of.
+The function closure is (a pointer to) the closure for the function
+whose argument-satisfaction check failed.
+
+In the normal case where a PAP is built as a result of an argument
+satisfaction check failure, the stack chunk will just contain
+``pending arguments'', ie. pointers and tagged non-pointers. It may
+in fact also contain activation records, but not update frames, seq
+frames, or catch frames. The reason is the garbage collector uses the
+same code to scavenge a stack as it does to scavenge the payload of a
+PAP, but an update frame contains a link to the next update frame in
+the chain and this link would need to be relocated during garbage
+collection. Revertible black holes and asynchronous exceptions use
+the more general form of PAPs (see Section \ref{revertible-bh}).
+
+There is just one standard form of PAP. There is just one info table
+too, called @PAP_info@. Its entry code simply copies the arg stack
+chunk back on top of the stack and enters the function closure. (It
+has to do a stack overflow test first.)
+
+There is just one way to build a PAP: by calling @stg_update_PAP@ with
+the function closure in register @R1@ and the pending arguments on the
+stack. The @stg_update_PAP@ function will build the PAP, perform the
+update, and return to the next activation record on the stack. If
+there are \emph{no} pending arguments on the stack, then no PAP need
+be built: in this case @stg_update_PAP@ just overwrites the updatee
+with an indirection to the function closure.
+
+PAPs are also used to implement Hugs functions (where the arguments
+are free variables). PAPs generated by Hugs can be static so we need
+both @PAP@ and @PAP_STATIC@.
+
+\Subsubsection{\texttt{AP\_UPD} objects}{AP_UPD}
+
+@AP_UPD@ objects are used to represent thunks built by Hugs, and to
+save the currently-active computations when performing @raiseAsync()@.
+The only
+distinction between an @AP_UPD@ and a @PAP@ is that an @AP_UPD@ is
+updateable.
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|}
+\hline
+\emph{Fixed Header} & \emph{No of stack words} & \emph{Function closure} & \emph{Stack chunk} \\
+\hline
+\end{tabular}
+\end{center}
+
+The entry code pushes an update frame, copies the arg stack chunk on
+top of the stack, and enters the function closure. (It has to do a
+stack overflow test first.)
+
+The ``stack chunk'' is a block of stack not containing update frames,
+seq frames or catch frames (just like a PAP). In the case of Hugs,
+the stack chunk will contain the free variables of the thunk, and the
+function closure is (a pointer to) the closure for the thunk. The
+argument stack may be empty if the thunk has no free variables.
+
+\note{Since @AP\_UPD@s are updateable, the @MIN\_UPD\_SIZE@ constraint applies here too.}
+
+\Subsubsection{Indirections}{IND}
+
+Indirection closures just point to other closures. They are introduced
+when a thunk is updated to point to its value. The entry code for all
+indirections simply enters the closure it points to.
+
+There are several forms of indirection:
+
+\begin{description}
+\item[@IND@] is the vanilla, dynamically-allocated indirection.
+It is removed by the garbage collector. It has the following
+shape:
+\begin{center}
+\begin{tabular}{|l|l|l|}\hline
+\emph{Fixed header} & \emph{Target closure} \\ \hline
+\end{tabular}
+\end{center}
+
+An @IND@ only exists in the youngest generation. In older
+generations, we have @IND_OLDGEN@s. The update code
+(@Upd_frame_$n$_entry@) checks whether the updatee is in the youngest
+generation before deciding which kind of indirection to use.
+
+\item[@IND\_OLDGEN@] is the vanilla, dynamically-allocated indirection.
+It is removed by the garbage collector. It has the following
+shape:
+\begin{center}
+\begin{tabular}{|l|l|l|}\hline
+\emph{Fixed header} & \emph{Target closure} & \emph{Mutable link field} \\ \hline
+\end{tabular}
+\end{center}
+It contains a \emph{mutable link field} that is used to string together
+mutable objects in each old generation.
+
+\item[@IND\_PERM@]
+For lexical profiling, it is necessary to maintain cost centre
+information in an indirection, so ``permanent indirections'' are
+retained forever. Otherwise they are just like vanilla indirections.
+\note{If a permanent indirection points to another permanent
+indirection or a @CONST@ closure, it is possible to elide the indirection
+since it will have no effect on the profiler.}
+
+\note{Do we still need @IND@ in the profiling build, or do we just
+need @IND@ but its behaviour changes when profiling is on?}
+
+\item[@IND\_OLDGEN\_PERM@]
+Just like an @IND_OLDGEN@, but sticks around like an @IND_PERM@.
+
+\item[@IND\_STATIC@] is used for overwriting CAFs when they have been
+evaluated. Static indirections are not removed by the garbage
+collector; and are statically allocated outside the heap (and should
+stay there). Their static object link field is used just as for
+@FUN_STATIC@ closures.
+
+\begin{center}
+\begin{tabular}{|l|l|l|}
+\hline
+\emph{Fixed header} & \emph{Target closure} & \emph{Static link field} \\
+\hline
+\end{tabular}
+\end{center}
+
+\end{description}
+
+\subsubsection{Black holes and blocking queues}
+\label{sec:BLACKHOLE}
+\label{sec:BLACKHOLE_BQ}
+
+Black hole closures are used to overwrite closures currently being
+evaluated. They inform the garbage collector that there are no live
+roots in the closure, thus removing a potential space leak.
+
+Black holes also become synchronization points in the concurrent
+world. When a thread attempts to enter a blackhole, it must wait for
+the result of the computation, which is presumably in progress in
+another thread.
+
+\note{In a single-threaded system, entering a black hole indicates an
+infinite loop. In a concurrent system, entering a black hole
+indicates an infinite loop only if the hole is being entered by the
+same thread that originally entered the closure. It could also bring
+about a deadlock situation where several threads are waiting
+circularly on computations in progress.}
+
+There are two types of black hole:
+
+\begin{description}
+
+\item[@BLACKHOLE@]
+A straightforward blackhole just consists of an info pointer and some
+padding to allow updating with an @IND_OLDGEN@ if necessary. This
+type of blackhole has no waiting threads.
+
+\begin{center}
+\begin{tabular}{|l|l|l|}
+\hline
+\emph{Fixed header} & \emph{Padding} & \emph{Padding} \\
+\hline
+\end{tabular}
+\end{center}
+
+If we're doing \emph{eager blackholing} then a thunk's info pointer is
+overwritten with @BLACKHOLE_info@ at the time of entry; hence the need
+for blackholes to be small, otherwise we'd be overwriting part of the
+thunk itself.
+
+\item[@BLACKHOLE\_BQ@]
+When a thread enters a @BLACKHOLE@, it is turned into a @BLACKHOLE_BQ@
+(blocking queue), which contains a linked list of blocked threads in
+addition to the info pointer.
+
+\begin{center}
+\begin{tabular}{|l|l|l|}
+\hline
+\emph{Fixed header} & \emph{Blocked thread link} & \emph{Mutable link field} \\
+\hline
+\end{tabular}
+\end{center}
+
+The \emph{Blocked thread link} points to the TSO of the first thread
+waiting for the value of this thunk. All subsequent TSOs in the list
+are linked together using their @tso->link@ field, ending in
+@END_TSO_QUEUE_closure@.
+
+Because new threads can be added to the \emph{Blocked thread link}, a
+blocking queue is \emph{mutable}, so we need a mutable link field in
+order to chain it on to a mutable list for the generational garbage
+collector.
+
+\end{description}
+
+\Subsubsection{FetchMes}{FETCHME}
+
+In the parallel systems, FetchMes are used to represent pointers into
+the global heap. When evaluated, the value they point to is read from
+the global heap.
+
+\ToDo{Describe layout}
+
+Because there may be offsets into these arrays, a primitive array
+cannot be handled as a FetchMe in the parallel system, but must be
+shipped in its entirety if its parent closure is shipped.
+
+
+
+\Subsection{Unpointed Objects}{unpointed-objects}
+
+A variable of unpointed type is always bound to a \emph{value}, never
+to a \emph{thunk}. For this reason, unpointed objects cannot be
+entered.
+
+\subsubsection{Immutable objects}
+\label{sec:ARR_WORDS}
+
+\begin{description}
+\item[@ARR\_WORDS@] is a variable-sized object consisting solely of
+non-pointers. It is used for arrays of all sorts of things (bytes,
+words, floats, doubles... it doesn't matter).
+
+Strictly speaking, an @ARR_WORDS@ could be mutable, but because it
+only contains non-pointers we don't need to track this fact.
+
+\begin{center}
+\begin{tabular}{|c|c|c|c|}
+\hline
+\emph{Fixed Hdr} & \emph{No of non-pointers} & \emph{Non-pointers\ldots} \\ \hline
+\end{tabular}
+\end{center}
+\end{description}
+
+\subsubsection{Mutable objects}
+\label{sec:mutables}
+\label{sec:MUT_VAR}
+\label{sec:MUT_ARR_PTRS}
+\label{sec:MUT_ARR_PTRS_FROZEN}
+\label{sec:MVAR}
+
+Some of these objects are \emph{mutable}; they represent objects which
+are explicitly mutated by Haskell code through the @ST@ or @IO@
+monads. They're not used for thunks which are updated precisely once.
+Depending on the garbage collector, mutable closures may contain extra
+header information which allows a generational collector to implement
+the ``write barrier.''
+
+Notice that mutable objects all have the same general layout: there is
+a mutable link field as the second word after the header. This is so
+that code to process old-generation mutable lists doesn't need to look
+at the type of the object to determine where its link field is.
+
+\begin{description}
+
+\item[@MUT\_VAR@] is a mutable variable.
+\begin{center}
+\begin{tabular}{|c|c|c|}
+\hline
+\emph{Fixed Hdr} \emph{Pointer} & \emph{Mutable link} & \\ \hline
+\end{tabular}
+\end{center}
+
+\item[@MUT\_ARR\_PTRS@] is a mutable array of pointers. Such an array
+may be \emph{frozen}, becoming an @MUT_ARR_PTRS_FROZEN@, with a
+different info-table.
+
+\begin{center}
+\begin{tabular}{|c|c|c|c|}
+\hline
+\emph{Fixed Hdr} & \emph{No of ptrs} & \emph{Mutable link} & \emph{Pointers\ldots} \\ \hline
+\end{tabular}
+\end{center}
+
+\item[@MUT\_ARR\_PTRS\_FROZEN@] This is the immutable version of
+@MUT_ARR_PTRS@. It still has a mutable link field for two reasons: we
+need to keep it on the mutable list for an old generation at least
+until the next garbage collection, and it may become mutable again via
+@thawArray@.
+
+\begin{center}
+\begin{tabular}{|c|c|c|c|}
+\hline
+\emph{Fixed Hdr} & \emph{No of ptrs} & \emph{Mutable link} & \emph{Pointers\ldots} \\ \hline
+\end{tabular}
+\end{center}
+
+\item[@MVAR@]
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|l|}
+\hline
+\emph{Fixed header} & \emph{Head} & \emph{Mutable link} & \emph{Tail}
+& \emph{Value}\\
+\hline
+\end{tabular}
+\end{center}
+
+\ToDo{MVars}
+
+\end{description}
+
+
+\Subsubsection{Foreign objects}{FOREIGN}
+
+Here's what a ForeignObj looks like:
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|}
+\hline
+\emph{Fixed header} & \emph{Data} \\
+\hline
+\end{tabular}
+\end{center}
+
+A foreign object is simple a boxed pointer to an address outside the
+Haskell heap, possible to @malloc@ed data. The only reason foreign
+objects exist is so that we can track the lifetime of one using weak
+pointers (see \secref{WEAK}) and run a finaliser when the foreign
+object is unreachable.
+
+\subsubsection{Weak pointers}
+\label{sec:WEAK}
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|l|}
+\hline
+\emph{Fixed header} & \emph{Key} & \emph{Value} & \emph{Finaliser}
+& \emph{Link}\\
+\hline
+\end{tabular}
+\end{center}
+
+\ToDo{Weak poitners}
+
+\subsubsection{Stable names}
+\label{sec:STABLE_NAME}
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|}
+\hline
+\emph{Fixed header} & \emph{Index} \\
+\hline
+\end{tabular}
+\end{center}
+
+\ToDo{Stable names}
+
+The remaining objects types are all administrative --- none of them
+may be entered.
+
+\subsection{Other weird objects}
+\label{sec:SPARK}
+\label{sec:BLOCKED_FETCH}
+
+\begin{description}
+\item[@BlockedFetch@ heap objects (`closures')] (parallel only)
+
+@BlockedFetch@s are inbound fetch messages blocked on local closures.
+They arise as entries in a local blocking queue when a fetch has been
+received for a local black hole. When awakened, we look at their
+contents to figure out where to send a resume.
+
+A @BlockedFetch@ closure has the form:
+\begin{center}
+\begin{tabular}{|l|l|l|l|l|l|}\hline
+\emph{Fixed header} & link & node & gtid & slot & weight \\ \hline
+\end{tabular}
+\end{center}
+
+\item[Spark Closures] (parallel only)
+
+Spark closures are used to link together all closures in the spark pool. When
+the current processor is idle, it may choose to speculatively evaluate some of
+the closures in the pool. It may also choose to delete sparks from the pool.
+\begin{center}
+\begin{tabular}{|l|l|l|l|l|l|}\hline
+\emph{Fixed header} & \emph{Spark pool link} & \emph{Sparked closure} \\ \hline
+\end{tabular}
+\end{center}
+
+\item[Slop Objects]\label{sec:slop-objects}
+
+Slop objects are used to overwrite the end of an updatee if it is
+larger than an indirection. Normal slop objects consist of an info
+pointer a size word and a number of slop words.
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|l|l|}\hline
+\emph{Info Pointer} & \emph{Size} & \emph{Slop Words} \\ \hline
+\end{tabular}
+\end{center}
+
+This is too large for single word slop objects which consist of a
+single info table.
+
+Note that slop objects only contain an info pointer, not a standard
+fixed header. This doesn't cause problems because slop objects are
+always unreachable --- they can only be accessed by linearly scanning
+the heap.
+
+\note{Currently we don't use slop objects because the storage manager
+isn't reliant on objects being adjacent, but if we move to a ``mostly
+copying'' style collector, this will become an issue.}
+
+\end{description}
+
+\Subsection{Thread State Objects (TSOs)}{TSO}
+
+In the multi-threaded system, the state of a suspended thread is
+packed up into a Thread State Object (TSO) which contains all the
+information needed to restart the thread and for the garbage collector
+to find all reachable objects. When a thread is running, it may be
+``unpacked'' into machine registers and various other memory locations
+to provide faster access.
+
+Single-threaded systems don't really \emph{need\/} TSOs --- but they do
+need some way to tell the storage manager about live roots so it is
+convenient to use a single TSO to store the mutator state even in
+single-threaded systems.
+
+Rather than manage TSOs' alloc/dealloc, etc., in some \emph{ad hoc}
+way, we instead alloc/dealloc/etc them in the heap; then we can use
+all the standard garbage-collection/fetching/flushing/etc machinery on
+them. So that's why TSOs are ``heap objects,'' albeit very special
+ones.
+\begin{center}
+\begin{tabular}{|l|l|}
+ \hline \emph{Fixed header}
+\\ \hline \emph{Link field}
+\\ \hline \emph{Mutable link field}
+\\ \hline \emph{What next}
+\\ \hline \emph{State}
+\\ \hline \emph{Thread Id}
+\\ \hline \emph{Exception Handlers}
+\\ \hline \emph{Ticky Info}
+\\ \hline \emph{Profiling Info}
+\\ \hline \emph{Parallel Info}
+\\ \hline \emph{GranSim Info}
+\\ \hline \emph{Stack size}
+\\ \hline \emph{Max Stack size}
+\\ \hline \emph{Sp}
+\\ \hline \emph{Su}
+\\ \hline \emph{SpLim}
+\\ \hline
+\\
+ \emph{Stack}
+\\
+\\ \hline
+\end{tabular}
+\end{center}
+The contents of a TSO are:
+\begin{description}
+
+\item[\emph{Link field}] This is a pointer used to maintain a list of
+threads with a similar state (e.g.~all runnable, all sleeping, all
+blocked on the same black hole, all blocked on the same MVar,
+etc.)
+
+\item[\emph{Mutable link field}] Because the stack is mutable by
+definition, the generational collector needs to track TSOs in older
+generations that may point into younger ones (which is just about any
+TSO for a thread that has run recently). Hence the need for a mutable
+link field (see \secref{mutables}).
+
+\item[\emph{What next}]
+This field has five values:
+\begin{description}
+\item[@ThreadEnterGHC@] The thread can be started by entering the
+closure pointed to by the word on the top of the stack.
+\item[@ThreadRunGHC@] The thread can be started by jumping to the
+address on the top of the stack.
+\item[@ThreadEnterHugs@] The stack has a pointer to a Hugs-built
+closure on top of the stack: enter the closure to run the thread.
+\item[@ThreadKilled@] The thread has been killed (by @killThread#@).
+It is probably still around because it is on some queue somewhere and
+hasn't been garbage collected yet.
+\item[@ThreadComplete@] The thread has finished. Its @TSO@ hasn't
+been garbage collected yet.
+\end{description}
+
+\item[\emph{Thread Id}]
+This field contains a (not necessarily unique) integer that identifies
+the thread. It can be used eg. for hashing.
+
+\item[\emph{Ticky Info}] Optional information for ``Ticky Ticky''
+statistics: @TSO_STK_HWM@ is the maximum number of words allocated to
+this thread.
+
+\item[\emph{Profiling Info}] Optional information for profiling:
+@TSO_CCC@ is the current cost centre.
+
+\item[\emph{Parallel Info}]
+Optional information for parallel execution.
+
+% \begin{itemize}
+%
+% \item The types of threads (@TSO_TYPE@):
+% \begin{description}
+% \item[@T_MAIN@] Must be executed locally.
+% \item[@T_REQUIRED@] A required thread -- may be exported.
+% \item[@T_ADVISORY@] An advisory thread -- may be exported.
+% \item[@T_FAIL@] A failure thread -- may be exported.
+% \end{description}
+%
+% \item I've no idea what else
+%
+% \end{itemize}
+
+\item[\emph{GranSim Info}]
+Optional information for gransim execution.
+
+% \item Optional information for GranSim execution:
+% \begin{itemize}
+% \item locked
+% \item sparkname
+% \item started at
+% \item exported
+% \item basic blocks
+% \item allocs
+% \item exectime
+% \item fetchtime
+% \item fetchcount
+% \item blocktime
+% \item blockcount
+% \item global sparks
+% \item local sparks
+% \item queue
+% \item priority
+% \item clock (gransim light only)
+% \end{itemize}
+%
+%
+% Here are the various queues for GrAnSim-type events.
+%
+% Q_RUNNING
+% Q_RUNNABLE
+% Q_BLOCKED
+% Q_FETCHING
+% Q_MIGRATING
+%
+
+\item[\emph{Stack Info}] Various fields contain information on the
+stack: its current size, its maximum size (to avoid infinite loops
+overflowing the memory), the current stack pointer (\emph{Sp}), the
+current stack update frame pointer (\emph{Su}), and the stack limit
+(\emph{SpLim}). The latter three fields are loaded into the relevant
+registers when the thread is run.
+
+\item[\emph{Stack}] This is the actual stack for the thread,
+\emph{Stack size} words long. It grows downwards from higher
+addresses to lower addresses. When the stack overflows, it will
+generally be relocated into larger premises unless \emph{Max stack
+size} is reached.
+
+\end{description}
+
+The garbage collector needs to be able to find all the
+pointers in a stack. How does it do this?
+
+\begin{itemize}
+
+\item Within the stack there are return addresses, pushed
+by @case@ expressions. Below a return address (i.e. at higher
+memory addresses, since the stack grows downwards) is a chunk
+of stack that the return address ``knows about'', namely the
+activation record of the currently running function.
+
+\item Below each such activation record is a \emph{pending-argument
+section}, a chunk of
+zero or more words that are the arguments to which the result
+of the function should be applied. The return address does not
+statically
+``know'' how many pending arguments there are, or their types.
+(For example, the function might return a result of type $\alpha$.)
+
+\item Below each pending-argument section is another return address,
+and so on. Actually, there might be an update frame instead, but we
+can consider update frames as a special case of a return address with
+a well-defined activation record.
+
+\end{itemize}
+
+The game plan is this. The garbage collector walks the stack from the
+top, traversing pending-argument sections and activation records
+alternately. Next we discuss how it finds the pointers in each of
+these two stack regions.
+
+
+\Subsubsection{Activation records}{activation-records}
+
+An \emph{activation record} is a contiguous chunk of stack,
+with a return address as its first word, followed by as many
+data words as the return address ``knows about''. The return
+address is actually a fully-fledged info pointer. It points
+to an info table, replete with:
+
+\begin{itemize}
+\item entry code (i.e. the code to return to).
+
+\item closure type is either @RET_SMALL/RET_VEC_SMALL@ or
+@RET_BIG/RET_VEC_BIG@, depending on whether the activation record has
+more than 32 data words (\note{64 for 8-byte-word architectures}) and
+on whether to use a direct or a vectored return.
+
+\item the layout info for @RET_SMALL@ is a bitmap telling the layout
+of the activation record, one bit per word. The least-significant bit
+describes the first data word of the record (adjacent to the fixed
+header) and so on. A ``@1@'' indicates a non-pointer, a ``@0@''
+indicates a pointer. We don't need to indicate exactly how many words
+there are, because when we get to all zeros we can treat the rest of
+the activation record as part of the next pending-argument region.
+
+For @RET_BIG@ the layout field points to a block of bitmap words,
+starting with a word that tells how many words are in the block.
+
+\item the info table contains a Static Reference Table pointer for the
+return address (\secref{srt}).
+\end{itemize}
+
+The activation record is a fully fledged closure too. As well as an
+info pointer, it has all the other attributes of a fixed header
+(\secref{fixed-header}) including a saved cost centre which
+is reloaded when the return address is entered.
+
+In other words, all the attributes of closures are needed for
+activation records, so it's very convenient to make them look alike.
+
+
+\Subsubsection{Pending arguments}{pending-args}
+
+So that the garbage collector can correctly identify pointers in
+pending-argument sections we explicitly tag all non-pointers. Every
+non-pointer in a pending-argument section is preceded (at the next
+lower memory word) by a one-word byte count that says how many bytes
+to skip over (excluding the tag word).
+
+The garbage collector traverses a pending argument section from the
+top (i.e. lowest memory address). It looks at each word in turn:
+
+\begin{itemize}
+\item If it is less than or equal to a small constant @ARGTAG_MAX@
+then it treats it as a tag heralding zero or more words of
+non-pointers, so it just skips over them.
+
+\item If it points to the code segment, it must be a return
+address, so we have come to the end of the pending-argument section.
+
+\item Otherwise it must be a bona fide heap pointer.
+\end{itemize}
+
+
+\Subsection{The Stable Pointer Table}{STABLEPTR_TABLE}
+
+A stable pointer is a name for a Haskell object which can be passed to
+the external world. It is ``stable'' in the sense that the name does
+not change when the Haskell garbage collector runs---in contrast to
+the address of the object which may well change.
+
+A stable pointer is represented by an index into the
+@StablePointerTable@. The Haskell garbage collector treats the
+@StablePointerTable@ as a source of roots for GC.
+
+In order to provide efficient access to stable pointers and to be able
+to cope with any number of stable pointers (eg $0 \ldots 100000$), the
+table of stable pointers is an array stored on the heap and can grow
+when it overflows. (Since we cannot compact the table by moving
+stable pointers about, it seems unlikely that a half-empty table can
+be reduced in size---this could be fixed if necessary by using a
+hash table of some sort.)
+
+In general a stable pointer table closure looks like this:
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|l|l|l|l|l|l|l|}
+\hline
+\emph{Fixed header} & \emph{No of pointers} & \emph{Free} & $SP_0$ & \ldots & $SP_{n-1}$
+\\\hline
+\end{tabular}
+\end{center}
+
+The fields are:
+\begin{description}
+
+\item[@NPtrs@:] number of (stable) pointers.
+
+\item[@Free@:] the byte offset (from the first byte of the object) of the first free stable pointer.
+
+\item[$SP_i$:] A stable pointer slot. If this entry is in use, it is
+an ``unstable'' pointer to a closure. If this entry is not in use, it
+is a byte offset of the next free stable pointer slot.
+
+\end{description}
+
+When a stable pointer table is evacuated
+\begin{enumerate}
+\item the free list entries are all set to @NULL@ so that the evacuation
+ code knows they're not pointers;
+
+\item The stable pointer slots are scanned linearly: non-@NULL@ slots
+are evacuated and @NULL@-values are chained together to form a new free list.
+\end{enumerate}
+
+There's no need to link the stable pointer table onto the mutable
+list because we always treat it as a root.
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\Subsection{Garbage Collecting CAFs}{CAF}
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+% begin{direct quote from current paper}
+A CAF (constant applicative form) is a top-level expression with no
+arguments. The expression may need a large, even unbounded, amount of
+storage when it is fully evaluated.
+
+CAFs are represented by closures in static memory that are updated
+with indirections to objects in the heap space once the expression is
+evaluated. Previous version of GHC maintained a list of all evaluated
+CAFs and traversed them during GC, the result being that the storage
+allocated by a CAF would reside in the heap until the program ended.
+% end{direct quote from current paper}
+
+% begin{elaboration on why CAFs are very very bad}
+Treating CAFs this way has two problems:
+\begin{itemize}
+\item
+It can cause a very large space leak. For example, this program
+should run in constant space but, instead, will run out of memory.
+\begin{verbatim}
+> main :: IO ()
+> main = print nats
+>
+> nats :: [Int]
+> nats = [0..maxInt]
+\end{verbatim}
+
+\item
+Expressions with no arguments have very different space behaviour
+depending on whether or not they occur at the top level. For example,
+if we make \verb+nats+ a local definition, the space leak goes away
+and the resulting program runs in constant space, as expected.
+\begin{verbatim}
+> main :: IO ()
+> main = print nats
+> where
+> nats :: [Int]
+> nats = [0..maxInt]
+\end{verbatim}
+
+This huge change in the operational behaviour of the program
+is a problem for optimising compilers and for programmers.
+For example, GHC will normally flatten a set of let bindings using
+this transformation:
+\begin{verbatim}
+let x1 = let x2 = e2 in e1 ==> let x2 = e2 in let x1 = e1
+\end{verbatim}
+but it does not do so if this would raise \verb+x2+ to the top level
+since that may create a CAF. Many Haskell programmers avoid creating
+large CAFs by adding a dummy argument to a CAF or by moving a CAF away
+from the top level.
+
+\end{itemize}
+% end{elaboration on why CAFs are very very bad}
+
+Solving the CAF problem requires different treatment in interactive
+systems such as Hugs than in batch-mode systems such as GHC
+\begin{itemize}
+\item
+In a batch-mode the program the runtime system is terminated
+after every execution of the runtime system. In such systems,
+the garbage collector can completely ``destroy'' a CAF when it
+is no longer live --- in much the same way as it ``destroys''
+normal closures when they are no longer live.
+
+\item
+In an interactive system, many expressions are evaluated without
+restarting the runtime system between each evaluation. In such
+systems, the garbage collector cannot completely ``destroy'' a CAF
+when it is no longer live because, whilst it might not be required in
+the evaluation of the current expression, it might be required in the
+next evaluation.
+
+There are two possible behaviours we might want:
+\begin{enumerate}
+\item
+When a CAF is no longer required for the current evaluation, the CAF
+should be reverted to its original form. This behaviour ensures that
+the operational behaviour of the interactive system is a reasonable
+predictor of the operational behaviour of the batch-mode system. This
+allows us to use Hugs for performance debugging (in particular, trying
+to understand and reduce the heap usage of a program) --- an area of
+increasing importance as Haskell is used more and more to solve ``real
+problems'' in ``real problem domains''.
+
+\item
+Even if a CAF is no longer required for the current evaluation, we might
+choose to hang onto it by collecting it in the normal way. This keeps
+the space leak but might be useful in a teaching environment when
+trying to teach the difference between call by name evaluation (which
+doesn't share work) and lazy evaluation (which does share work).
+
+\end{enumerate}
+
+It turns out that it is easy to support both styles of use, so the
+runtime system provides a switch which lets us turn this on and off
+during execution. \ToDo{What is this switch called?} It would also
+be easy to provide a function \verb+RevertCAF+ to let the interpreter
+revert any CAF it wanted between (but not during) executions, if we so
+desired. Running \verb+RevertCAF+ during execution would lose some sharing
+but is otherwise harmless.
+
+\end{itemize}
+
+% % begin{even more pointless observation?}
+% The simplest fix would be to remove the special treatment of
+% top level variables. This works but is very inefficient.
+% ToDo: say why.
+% (Note: delete this paragraph from final version.)
+% % end{even more pointless observation?}
+
+% begin{pointless observation?}
+An easy but inefficient fix to the CAF problem would be to make a
+complete copy of the heap before every evaluation and discard the copy
+after evaluation. This works but is inefficient.
+% end{pointless observation?}
+
+An efficient way to achieve a similar effect is to revert all
+updatable thunks to their original form as they become unnecessary for
+the current evaluation. To do this, we modify the compiler to ensure
+that the only updatable thunks generated by the compiler are CAFs and
+we modify the garbage collector to revert entered CAFs to unentered
+CAFs as their value becomes unnecessary.
+
+
+\subsubsection{New Heap Objects}
+
+We add three new kinds of heap object: unentered CAF closures, entered
+CAF objects and CAF blackholes. We first describe how they are
+evaluated and then how they are garbage collected.
+\begin{itemize}
+\item
+Unentered CAF closures contain a pointer to closure representing the
+body of the CAF. The ``body closure'' is not updatable.
+
+Unentered CAF closures contain two unused fields to make them the same
+size as entered CAF closures --- which allows us to perform an inplace
+update. \ToDo{Do we have to add another kind of inplace update operation
+to the storage manager interface or do we consider this to be internal
+to the SM?}
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+\verb+CAF_unentered+ & \emph{body closure} & \emph{unused} & \emph{unused} \\ \hline
+\end{tabular}
+\end{center}
+When an unentered CAF is entered, we do the following:
+\begin{itemize}
+\item
+allocate a CAF black hole;
+
+\item
+push an update frame (to update the CAF black hole) onto the stack;
+
+\item
+overwrite the CAF with an entered CAF object (see below) with the same
+body and whose value field points to the black hole;
+
+\item
+add the CAF to a list of all entered CAFs (called ``the CAF list'');
+and
+
+\item
+the closure representing the value of the CAF is entered.
+
+\end{itemize}
+
+When evaluation of the CAF body returns a value, the update frame
+causes the CAF black hole to be updated with the value in the normal
+way.
+
+\ToDo{Add a picture}
+
+\item
+Entered CAF closures contain two pointers: a pointer to the CAF body
+(the same as for unentered CAF closures); a pointer to the CAF value
+(this is initialised with a CAF blackhole, as previously described);
+and a link to the next CAF in the CAF list
+
+\ToDo{How is the end of the list marked? Null pointer or sentinel value?}.
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+\verb+CAF_entered+ & \emph{body closure} & \emph{value} & \emph{link} \\ \hline
+\end{tabular}
+\end{center}
+When an entered CAF is entered, it enters its value closure.
+
+\item
+CAF blackholes are identical to normal blackholes except that they
+have a different infotable. The only reason for having CAF blackholes
+is to allow an optimisation of lazy blackholing where we stop scanning
+the stack when we see the first {\em normal blackhole} but not
+when we see a {\em CAF blackhole.}
+\ToDo{The optimisation we want to allow should be described elsewhere
+so that all we have to do here is describe the difference.}
+
+Instead of allocating a blackhole to update with the value of the CAF,
+it might seem simpler to update the CAF directly. This would require
+a new kind of update frame which would update the value field of the
+CAF with a pointer to the value and wouldn't catch blackholes caused
+by CAFs that depend on themselves so we chose not to do so.
+
+\end{itemize}
+
+\subsubsection{Garbage Collection}
+
+To avoid the space leak, each run of the garbage collector must revert
+the entered CAFs which are not required to complete the current
+evaluation (that is all the closures reachable from the set of
+runnable threads and the stable pointer table).
+
+It does this by performing garbage collection in three phases:
+\begin{enumerate}
+\item
+During the first phase, we ``mark'' all closures reachable from the
+scheduler state.
+
+How we ``mark'' closures depends on the garbage collector. For
+example, in a 2-space collector, closures are ``marked'' by copying
+them into ``to-space'', overwriting them with a forwarding node and
+``marking'' all the closures reachable from the copy. The only
+requirements are that we can test whether a closure is marked and if a
+closure is marked then so are all closures reachable from it.
+
+\ToDo{At present we say that the scheduler state includes any state
+that Hugs may have. This is not true anymore.}
+
+Performing this phase first provides us with a cheap test for
+execution closures: at this stage in execution, the execution closures
+are precisely the marked closures.
+
+\item
+During the second phase, we revert all unmarked CAFs on the CAF list
+and remove them from the CAF list.
+
+Since the CAF list is exactly the set of all entered CAFs, this reverts
+all entered CAFs which are not execution closures.
+
+\item
+During the third phase, we mark all top level objects (including CAFs)
+by calling \verb+MarkHugsRoots+ which will call \verb+MarkRoot+ for
+each top level object known to Hugs.
+
+\end{enumerate}
+
+To implement the second style of interactive behaviour (where we
+deliberately keep the CAF-related space leak), we simply omit the
+second phase. Omitting the second phase causes the third phase to
+mark any unmarked CAF value closures.
+
+So far, we have been describing a pure Hugs system which contains no
+machine generated code. The main difference in a hybrid system is
+that GHC-generated code is statically allocated in memory instead of
+being dynamically allocated on the heap. We split both
+\verb+CAF_unentered+ and \verb+CAF_entered+ into two versions: a
+static and a dynamic version. The static and dynamic versions of each
+CAF differ only in whether they are moved during garbage collection.
+When reverting CAFs, we revert dynamic entered CAFs to dynamic
+unentered CAFs and static entered CAFs to static unentered CAFs.
+
+
+
+
+\Section{The Bytecode Evaluator}{bytecode-evaluator}
+
+This section describes how the Hugs interpreter interprets code in the
+same environment as compiled code executes. Both evaluation models
+use a common garbage collector, so they must agree on the form of
+objects in the heap.
+
+Hugs interprets code by converting it to byte-code and applying a
+byte-code interpreter to it. Wherever possible, we try to ensure that
+the byte-code is all that is required to interpret a section of code.
+This means not dynamically generating info tables, and hence we can
+only have a small number of possible heap objects each with a statically
+compiled info table. Similarly for stack objects: in fact we only
+have one Hugs stack object, in which all information is tagged for the
+garbage collector.
+
+There is, however, one exception to this rule. Hugs must generate
+info tables for any constructors it is asked to compile, since the
+alternative is to force a context-switch each time compiled code
+enters a Hugs-built constructor, which would be prohibitively
+expensive.
+
+We achieve this simplicity by forgoing some of the optimisations used
+by compiled code:
+\begin{itemize}
+\item
+
+Whereas compiled code has five different ways of entering a closure
+(\secref{ghc-fun-call}), interpreted code has only one.
+The entry point for interpreted code behaves like slow entry points for
+compiled code.
+
+\item
+
+We use just one info table for \emph{all\/} direct returns.
+This introduces two problems:
+\begin{enumerate}
+\item How does the interpreter know what code to execute?
+
+Instead of pushing just a return address, we push a return BCO and a
+trivial return address which just enters the return BCO.
+
+(In a purely interpreted system, we could avoid pushing the trivial
+return address.)
+
+\item How can the garbage collector follow pointers within the
+activation record?
+
+We could push a third word ---a bitmask describing the location of any
+pointers within the record--- but, since we're already tagging unboxed
+function arguments on the stack, we use the same mechanism for unboxed
+values within the activation record.
+
+\ToDo{Do we have to stub out dead variables in the activation frame?}
+
+\end{enumerate}
+
+\item
+
+We trivially support vectored returns by pushing a return vector whose
+entries are all the same.
+
+\item
+
+We avoid the need to build SRTs by putting bytecode objects on the
+heap and restricting BCOs to a single basic block.
+
+\end{itemize}
+
+\Subsection{Hugs Info Tables}{hugs-info-tables}
+
+Hugs requires the following info tables and closures:
+\begin{description}
+\item [@HUGS\_RET@].
+
+Contains both a vectored return table and a direct entry point. All
+entry points are the same: they rearrange the stack to match the Hugs
+return convention (\secref{hugs-return-convention}) and return to the
+scheduler. When the scheduler restarts the thread, it will find a BCO
+on top of the stack and will enter the Hugs interpreter.
+
+\item [@UPD\_RET@].
+
+This is just the standard info table for an update frame.
+
+\item [Constructors].
+
+The entry code for a constructor jumps to a generic entry point in the
+runtime system which decides whether to do a vectored or unvectored
+return depending on the shape of the constructor/type. This implies that
+info tables must have enough info to make that decision.
+
+\item [@AP@ and @PAP@].
+
+\item [Indirections].
+
+\item [Selectors].
+
+Hugs doesn't generate them itself but it ought to recognise them
+
+\item [Complex primops].
+
+Some of the primops are too complex for GHC to generate inline.
+Instead, these primops are hand-written and called as normal functions.
+Hugs only needs to know their names and types but doesn't care whether
+they are generated by GHC or by hand. Two things to watch:
+
+\begin{enumerate}
+\item
+Hugs must be able to enter these primops even if it is working on a
+standalone system that does not support genuine GHC generated code.
+
+\item The complex primops often involve unboxed tuple types (which
+Hugs does not support at the source level) so we cannot specify their
+types in a Haskell source file.
+
+\end{enumerate}
+
+\end{description}
+
+\Subsection{Hugs Heap Objects}{hugs-heap-objects}
+
+\subsubsection{Byte-code objects}
+
+Compiled byte code lives on the global heap, in objects called
+Byte-Code Objects (or BCOs). The layout of BCOs is described in
+detail in \secref{BCO}, in this section we will describe
+their semantics.
+
+Since byte-code lives on the heap, it can be garbage collected just
+like any other heap-resident data. Hugs arranges that any BCO's
+referred to by the Hugs symbol tables are treated as live objects by
+the garbage collector. When a module is unloaded, the pointers to its
+BCOs are removed from the symbol table, and the code will be garbage
+collected some time later.
+
+A BCO represents a basic block of code --- the (only) entry points is
+at the beginning of a BCO, and it is impossible to jump into the
+middle of one. A BCO represents not only the code for a function, but
+also its closure; a BCO can be entered just like any other closure.
+Hugs performs lambda-lifting during compilation to byte-code, and each
+top-level combinator becomes a BCO in the heap.
+
+
+\subsubsection{Thunks and partial applications}
+
+A thunk consists of a code pointer, and values for the free variables
+of that code. Since Hugs byte-code is lambda-lifted, free variables
+become arguments and are expected to be on the stack by the called
+function.
+
+Hugs represents updateable thunks with @AP_UPD@ objects applying a closure
+to a list of arguments. (As for @PAP@s, unboxed arguments should be
+preceded by a tag.) When it is entered, it pushes an update frame
+followed by its payload on the stack, and enters the first word (which
+will be a pointer to a BCO). The layout of @AP_UPD@ objects is described
+in more detail in \secref{AP_UPD}.
+
+Partial applications are represented by @PAP@ objects, which are
+non-updatable.
+
+\ToDo{Hugs Constructors}.
+
+\Subsection{Calling conventions}{hugs-calling-conventions}
+
+The calling convention for any byte-code function is straightforward:
+\begin{itemize}
+\item Push any arguments on the stack.
+\item Push a pointer to the BCO.
+\item Begin interpreting the byte code.
+\end{itemize}
+
+In a system containing both GHC and Hugs, the bytecode interpreter
+only has to be able to enter BCOs: everything else can be handled by
+returning to the compiled world (as described in
+\secref{hugs-to-ghc-switch}) and entering the closure
+there.
+
+This would work but it would obviously be very inefficient if we
+entered a @AP@ by switching worlds, entering the @AP@, pushing the
+arguments and function onto the stack, and entering the function
+which, likely as not, will be a byte-code object which we will enter
+by \emph{returning} to the byte-code interpreter. To avoid such
+gratuitious world switching, we choose to recognise certain closure
+types as being ``standard'' --- and duplicate the entry code for the
+``standard closures'' in the bytecode interpreter.
+
+A closure is said to be ``standard'' if its entry code is entirely
+determined by its info table. \emph{Standard Closures} have the
+desirable property that the byte-code interpreter can enter the
+closure by simply ``interpreting'' the info table instead of switching
+to the compiled world. The standard closures include:
+
+\begin{description}
+\item[Constructor] To enter a constructor, we simply return (see
+\secref{hugs-return-convention}).
+
+\item[Indirection]
+To enter an indirection, we simply enter the object it points to
+after possibly adjusting the current cost centre.
+
+\item[@AP@]
+
+To enter an @AP@, we push an update frame, push the
+arguments, push the function and enter the function.
+(Not forgetting a stack check at the start.)
+
+\item[@PAP@]
+
+To enter a @PAP@, we push the arguments, push the function and enter
+the function. (Not forgetting a stack check at the start.)
+
+\item[Selector]
+
+To enter a selector (\secref{THUNK_SELECTOR}), we test whether the
+selectee is a value. If so, we simply select the appropriate
+component; if not, it's simplest to treat it as a GHC-built closure
+--- though we could interpret it if we wanted.
+
+\end{description}
+
+The most obvious omissions from the above list are @BCO@s (which we
+dealt with above) and GHC-built closures (which are covered in
+\secref{hugs-to-ghc-switch}).
+
+
+\Subsection{Return convention}{hugs-return-convention}
+
+When Hugs pushes a return address, it pushes both a pointer to the BCO
+to return to, and a pointer to a static code fragment @HUGS_RET@ (this
+is described in \secref{ghc-to-hugs-switch}). The
+stack layout is shown in \figref{hugs-return-stack}.
+
+\begin{figure}[ht]
+\begin{center}
+\begin{verbatim}
+| stack |
++----------+
+| bco |--> BCO
++----------+
+| HUGS_RET |
++----------+
+\end{verbatim}
+%\input{hugs_ret.pstex_t}
+\end{center}
+\caption{Stack layout for a Hugs return address}
+\label{fig:hugs-return-stack}
+% this figure apparently duplicates {fig:hugs-return-stack1} earlier.
+\end{figure}
+
+\begin{figure}[ht]
+\begin{center}
+\begin{verbatim}
+| stack |
++----------+
+| con |--> CON
++----------+
+\end{verbatim}
+%\input{hugs_ret2.pstex_t}
+\end{center}
+\caption{Stack layout on enterings a Hugs return address}
+\label{fig:hugs-return2}
+\end{figure}
+
+\begin{figure}[ht]
+\begin{center}
+\begin{verbatim}
+| stack |
++----------+
+| 3# |
++----------+
+| I# |
++----------+
+\end{verbatim}
+%\input{hugs_ret2.pstex_t}
+\end{center}
+\caption{Stack layout on entering a Hugs return address with an unboxed value}
+\label{fig:hugs-return-int1}
+\end{figure}
+
+\begin{figure}[ht]
+\begin{center}
+\begin{verbatim}
+| stack |
++----------+
+| ghc_ret |
++----------+
+| con |--> CON
++----------+
+\end{verbatim}
+%\input{hugs_ret3.pstex_t}
+\end{center}
+\caption{Stack layout on enterings a GHC return address}
+\label{fig:hugs-return3}
+\end{figure}
+
+\begin{figure}[ht]
+\begin{center}
+\begin{verbatim}
+| stack |
++----------+
+| ghc_ret |
++----------+
+| 3# |
++----------+
+| I# |
++----------+
+| restart |--> id_Int#_closure
++----------+
+\end{verbatim}
+%\input{hugs_ret2.pstex_t}
+\end{center}
+\caption{Stack layout on enterings a GHC return address with an unboxed value}
+\label{fig:hugs-return-int}
+\end{figure}
+
+When a Hugs byte-code sequence enters a closure, it examines the
+return address on top of the stack.
+
+\begin{itemize}
+
+\item If the return address is @HUGS_RET@, pop the @HUGS_RET@ and the
+bco for the continuation off the stack, push a pointer to the constructor onto
+the stack and enter the BCO with the current object pointer set to the BCO
+(\figref{hugs-return2}).
+
+\item If the top of the stack is not @HUGS_RET@, we need to do a world
+switch as described in \secref{hugs-to-ghc-switch}.
+
+\end{itemize}
+
+\ToDo{This duplicates what we say about switching worlds
+(\secref{switching-worlds}) - kill one or t'other.}
+
+
+\ToDo{This was in the evaluation model part but it really belongs in
+this part which is about the internal details of each of the major
+sections.}
+
+\Subsection{Addressing Modes}{hugs-addressing-modes}
+
+To avoid potential alignment problems and simplify garbage collection,
+all literal constants are stored in two tables (one boxed, the other
+unboxed) within each BCO and are referred to by offsets into the tables.
+Slots in the constant tables are word aligned.
+
+\ToDo{How big can the offsets be? Is the offset specified in the
+address field or in the instruction?}
+
+Literals can have the following types: char, int, nat, float, double,
+and pointer to boxed object. There is no real difference between
+char, int, nat and float since they all occupy 32 bits --- but it
+costs almost nothing to distinguish them and may improve portability
+and simplify debugging.
+
+\Subsection{Compilation}{hugs-compilation}
+
+
+\def\is{\mbox{\it is}}
+\def\ts{\mbox{\it ts}}
+\def\as{\mbox{\it as}}
+\def\bs{\mbox{\it bs}}
+\def\cs{\mbox{\it cs}}
+\def\rs{\mbox{\it rs}}
+\def\us{\mbox{\it us}}
+\def\vs{\mbox{\it vs}}
+\def\ws{\mbox{\it ws}}
+\def\xs{\mbox{\it xs}}
+
+\def\e{\mbox{\it e}}
+\def\alts{\mbox{\it alts}}
+\def\fail{\mbox{\it fail}}
+\def\panic{\mbox{\it panic}}
+\def\ua{\mbox{\it ua}}
+\def\obj{\mbox{\it obj}}
+\def\bco{\mbox{\it bco}}
+\def\tag{\mbox{\it tag}}
+\def\entry{\mbox{\it entry}}
+\def\su{\mbox{\it su}}
+
+\def\Ind#1{{\mbox{\it Ind}\ {#1}}}
+\def\update#1{{\mbox{\it update}\ {#1}}}
+
+\def\next{$\Longrightarrow$}
+\def\append{\mathrel{+\mkern-6mu+}}
+\def\reverse{\mbox{\it reverse}}
+\def\size#1{{\vert {#1} \vert}}
+\def\arity#1{{\mbox{\it arity}{#1}}}
+
+\def\AP{\mbox{\it AP}}
+\def\PAP{\mbox{\it PAP}}
+\def\GHCRET{\mbox{\it GHCRET}}
+\def\GHCOBJ{\mbox{\it GHCOBJ}}
+
+To make sense of the instructions, we need a sense of how they will be
+used. Here is a small compiler for the STG language.
+
+\begin{verbatim}
+> cg (f{a1, ... am}) = do
+> pushAtom am; ... pushAtom a1
+> pushVar f
+> SLIDE (m+1) |env|
+> ENTER
+> cg (let {x1=rhs1; ... xm=rhsm} in e) = do
+> ALLOC x1 |rhs1|, ... ALLOC xm |rhsm|
+> build x1 rhs1, ... build xm rhsm
+> cg e
+> cg (case e of alts) = do
+> PUSHALTS (cgAlts alts)
+> cg e
+
+> cgAlts { alt1; ... altm } = cgAlt alt1 $ ... $ cgAlt altm pmFail
+>
+> cgAlt (x@C{xs} -> e) fail = do
+> TEST C fail
+> HEAPCHECK (heapUse e)
+> UNPACK xs
+> cg e
+
+> build x (C{a1, ... am}) = do
+> pushUntaggedAtom am; ... pushUntaggedAtom a1
+> PACK x C
+> -- A useful optimisation
+> build x ({v1, ... vm} \ {}. f{a1, ... am}) = do
+> pushVar am; ... pushVar a1
+> pushVar f
+> MKAP x m
+> build x ({v1, ... vm} \ {}. e) = do
+> pushVar vm; ... pushVar v1
+> PUSHBCO (cgRhs ({v1, ... vm} \ {}. e))
+> MKAP x m
+> build x ({v1, ... vm} \ {x1, ... xm}. e) = do
+> pushVar vm; ... pushVar v1
+> PUSHBCO (cgRhs ({v1, ... vm} \ {x1, ... xm}. e))
+> MKPAP x m
+
+> cgRhs (vs \ xs. e) = do
+> ARGCHECK (xs ++ vs) -- can be omitted if xs == {}
+> STACKCHECK min(stackUse e,heapOverflowSlop)
+> HEAPCHECK (heapUse e)
+> cg e
+
+> pushAtom x = pushVar x
+> pushAtom i# = PUSHINT i#
+
+> pushVar x = if isGlobalVar x then PUSHGLOBAL x else PUSHLOCAL x
+
+> pushUntaggedAtom x = pushVar x
+> pushUntaggedAtom i# = PUSHUNTAGGEDINT i#
+
+> pushVar x = if isGlobalVar x then PUSHGLOBAL x else PUSHLOCAL x
+\end{verbatim}
+
+\ToDo{Is there an easy way to add semi-tagging? Would it be that different?}
+
+\ToDo{Optimise thunks of the form @f{x1,...xm}@ so that we build an AP directly}
+
+\Subsection{Instructions}{hugs-instructions}
+
+We specify the semantics of instructions using transition rules of
+the form:
+
+\begin{tabular}{|llrrrrr|}
+\hline
+ & $\is$ & $s$ & $\su$ & $h$ & $hp$ & $\sigma$ \\
+\next & $\is'$ & $s'$ & $\su'$ & $h'$ & $hp'$ & $\sigma$ \\
+\hline
+\end{tabular}
+
+where $\is$ is an instruction stream, $s$ is the stack, $\su$ is the
+update frame pointer and $h$ is the heap.
+
+
+\Subsection{Stack manipulation}{hugs-stack-manipulation}
+
+\begin{description}
+
+\item[ Push a global variable ].
+
+\begin{tabular}{|llrrrrr|}
+\hline
+ & PUSHGLOBAL $o$ : $\is$ & $s$ & $su$ & $h$ & $hp$ & $\sigma$ \\
+\next & $\is$ & $\sigma!o:s$ & $su$ & $h$ & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
+
+\item[ Push a local variable ].
+
+\begin{tabular}{|llrrrrr|}
+\hline
+ & PUSHLOCAL $o$ : $\is$ & $s$ & $su$ & $h$ & $hp$ & $\sigma$ \\
+\next & $\is$ & $s!o : s$ & $su$ & $h$ & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
+
+\item[ Push an unboxed int ].
+
+\begin{tabular}{|llrrrrr|}
+\hline
+ & PUSHINT $o$ : $\is$ & $s$ & $su$ & $h$ & $hp$ & $\sigma$ \\
+\next & $\is$ & $I\# : \sigma!o : s$ & $su$ & $h$ & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
+
+The $I\#$ is a tag included for the benefit of the garbage collector.
+Similar rules exist for floats, doubles, chars, etc.
+
+\item[ Push an unboxed int ].
+
+\begin{tabular}{|llrrrrr|}
+\hline
+ & PUSHUNTAGGEDINT $o$ : $\is$ & $s$ & $su$ & $h$ & $hp$ & $\sigma$ \\
+\next & $\is$ & $\sigma!o : s$ & $su$ & $h$ & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
+
+Similar rules exist for floats, doubles, chars, etc.
+
+\item[ Delete environment from stack --- ready for tail call ].
+
+\begin{tabular}{|llrrrrr|}
+\hline
+ & SLIDE $m$ $n$ : $\is$ & $\as \append \bs \append \cs$ & $su$ & $h$ & $hp$ & $\sigma$ \\
+\next & $\is$ & $\as \append \cs$ & $su$ & $h$ & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
+\\
+where $\size{\as} = m$ and $\size{\bs} = n$.
+
+
+\item[ Push a return address ].
+
+\begin{tabular}{|llrrrrr|}
+\hline
+ & PUSHALTS $o$:$\is$ & $s$ & $su$ & $h$ & $hp$ & $\sigma$ \\
+\next & $\is$ & $@HUGS_RET@:\sigma!o:s$ & $su$ & $h$ & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
+
+\item[ Push a BCO ].
+
+\begin{tabular}{|llrrrrr|}
+\hline
+ & PUSHBCO $o$ : $\is$ & $s$ & $su$ & $h$ & $hp$ & $\sigma$ \\
+\next & $\is$ & $\sigma!o : s$ & $su$ & $h$ & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
+
+\end{description}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\Subsection{Heap manipulation}{hugs-heap-manipulation}
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\begin{description}
+
+\item[ Allocate a heap object ].
+
+\begin{tabular}{|llrrrrr|}
+\hline
+ & ALLOC $m$ : $\is$ & $s$ & $su$ & $h$ & $hp$ & $\sigma$ \\
+\next & $\is$ & $hp:s$ & $su$ & $h$ & $hp+m$ & $\sigma$ \\
+\hline
+\end{tabular}
+
+\item[ Build a constructor ].
+
+\begin{tabular}{|llrrrrr|}
+\hline
+ & PACK $o$ $o'$ : $\is$ & $\ws \append s$ & $su$ & $h$ & $hp$ & $\sigma$ \\
+\next & $\is$ & $s$ & $su$ & $h[s!o \mapsto Pack C\{\ws\}]$ & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
+\\
+where $C = \sigma!o'$ and $\size{\ws} = \arity{C}$.
+
+\item[ Build an AP or PAP ].
+
+\begin{tabular}{|llrrrrr|}
+\hline
+ & MKAP $o$ $m$:$\is$ & $f : \ws \append s$ & $su$ & $h$ & $hp$ & $\sigma$ \\
+\next & $\is$ & $s$ & $su$ & $h[s!o \mapsto \AP(f,\ws)]$ & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
+\\
+where $\size{\ws} = m$.
+
+\begin{tabular}{|llrrrrr|}
+\hline
+ & MKPAP $o$ $m$:$\is$ & $f : \ws \append s$ & $su$ & $h$ & $hp$ & $\sigma$ \\
+\next & $\is$ & $s$ & $su$ & $h[s!o \mapsto \PAP(f,\ws)]$ & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
+\\
+where $\size{\ws} = m$.
+
+\item[ Unpacking a constructor ].
+
+\begin{tabular}{|llrrrrr|}
+\hline
+ & UNPACK : $is$ & $a : s$ & $su$ & $h[a \mapsto C\ \ws]$ & $hp$ & $\sigma$ \\
+\next & $is'$ & $(\reverse\ \ws) \append a : s$ & $su$ & $h$ & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
+
+The $\reverse\ \ws$ looks expensive but, since the stack grows down
+and the heap grows up, that's actually the cheap way of copying from
+heap to stack. Looking at the compilation rules, you'll see that we
+always push the args in reverse order.
+
+\end{description}
+
+
+\Subsection{Entering a closure}{hugs-entering}
+
+\begin{description}
+
+\item[ Enter a BCO ].
+
+\begin{tabular}{|llrrrrr|}
+\hline
+ & [ENTER] & $a : s$ & $su$ & $h[a \mapsto BCO\{\is\} ]$ & $hp$ & $\sigma$ \\
+\next & $\is$ & $a : s$ & $su$ & $h$ & $hp$ & $a$ \\
+\hline
+\end{tabular}
+
+\item[ Enter a PAP closure ].
+
+\begin{tabular}{|llrrrrr|}
+\hline
+ & [ENTER] & $a : s$ & $su$ & $h[a \mapsto \PAP(f,\ws)]$ & $hp$ & $\sigma$ \\
+\next & [ENTER] & $f : \ws \append s$ & $su$ & $h$ & $hp$ & $???$ \\
+\hline
+\end{tabular}
+
+\item[ Entering an AP closure ].
+
+\begin{tabular}{|llrrrrr|}
+\hline
+ & [ENTER] & $a : s$ & $su$ & $h[a \mapsto \AP(f,ws)]$ & $hp$ & $\sigma$ \\
+\next & [ENTER] & $f : \ws \append @UPD_RET@:\su:a:s$ & $su'$ & $h$ & $hp$ & $???$ \\
+\hline
+\end{tabular}
+
+Optimisations:
+\begin{itemize}
+\item Instead of blindly pushing an update frame for $a$, we can first test whether there's already
+ an update frame there. If so, overwrite the existing updatee with an indirection to $a$ and
+ overwrite the updatee field with $a$. (Overwriting $a$ with an indirection to the updatee also
+ works.) This results in update chains of maximum length 2.
+\end{itemize}
+
+
+\item[ Returning a constructor ].
+
+\begin{tabular}{|llrrrrr|}
+\hline
+ & [ENTER] & $a : @HUGS_RET@ : \alts : s$ & $su$ & $h[a \mapsto C\{\ws\}]$ & $hp$ & $\sigma$ \\
+\next & $\alts.\entry$ & $a:s$ & $su$ & $h$ & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
+
+
+\item[ Entering an indirection node ].
+
+\begin{tabular}{|llrrrrr|}
+\hline
+ & [ENTER] & $a : s$ & $su$ & $h[a \mapsto \Ind{a'}]$ & $hp$ & $\sigma$ \\
+\next & [ENTER] & $a' : s$ & $su$ & $h$ & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
+
+\item[Entering GHC closure].
+
+\begin{tabular}{|llrrrrr|}
+\hline
+ & [ENTER] & $a : s$ & $su$ & $h[a \mapsto \GHCOBJ]$ & $hp$ & $\sigma$ \\
+\next & [ENTERGHC] & $a : s$ & $su$ & $h$ & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
+
+\item[Returning a constructor to GHC].
+
+\begin{tabular}{|llrrrrr|}
+\hline
+ & [ENTER] & $a : \GHCRET : s$ & $su$ & $h[a \mapsto C \ws]$ & $hp$ & $\sigma$ \\
+\next & [ENTERGHC] & $a : \GHCRET : s$ & $su$ & $h$ & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
+
+\end{description}
+
+
+\Subsection{Updates}{hugs-updates}
+
+\begin{description}
+
+\item[ Updating with a constructor].
+
+\begin{tabular}{|llrrrrr|}
+\hline
+ & [ENTER] & $a : @UPD_RET@ : ua : s$ & $su$ & $h[a \mapsto C\{\ws\}]$ & $hp$ & $\sigma$ \\
+\next & [ENTER] & $a \append s$ & $su$ & $h[au \mapsto \Ind{a}$ & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
+
+\item[ Argument checks].
+
+\begin{tabular}{|llrrrrr|}
+\hline
+ & ARGCHECK $m$:$\is$ & $a : \as \append s$ & $su$ & $h$ & $hp$ & $\sigma$ \\
+\next & $\is$ & $a : \as \append s$ & $su$ & $h'$ & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
+\\
+where $m \ge (su - sp)$
+
+\begin{tabular}{|llrrrrr|}
+\hline
+ & ARGCHECK $m$:$\is$ & $a : \as \append @UPD_RET@:su:ua:s$ & $su$ & $h$ & $hp$ & $\sigma$ \\
+\next & $\is$ & $a : \as \append s$ & $su$ & $h'$ & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
+\\
+where $m < (su - sp)$ and
+ $h' = h[ua \mapsto \Ind{a'}, a' \mapsto \PAP(a,\reverse\ \as) ]$
+
+Again, we reverse the list of values as we transfer them from the
+stack to the heap --- reflecting the fact that the stack and heap grow
+in different directions.
+
+\end{description}
+
+\Subsection{Branches}{hugs-branches}
+
+\begin{description}
+
+\item[ Testing a constructor ].
+
+\begin{tabular}{|llrrrrr|}
+\hline
+ & TEST $tag$ $is'$ : $is$ & $a : s$ & $su$ & $h[a \mapsto C\ \ws]$ & $hp$ & $\sigma$ \\
+\next & $is$ & $a : s$ & $su$ & $h$ & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
+\\
+where $C.\tag = tag$
+
+\begin{tabular}{|llrrrrr|}
+\hline
+ & TEST $tag$ $is'$ : $is$ & $a : s$ & $su$ & $h[a \mapsto C\ \ws]$ & $hp$ & $\sigma$ \\
+\next & $is'$ & $a : s$ & $su$ & $h$ & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
+\\
+where $C.\tag \neq tag$
+
+\end{description}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\Subsection{Heap and stack checks}{hugs-heap-stack-checks}
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\begin{tabular}{|llrrrrr|}
+\hline
+ & STACKCHECK $stk$:$\is$ & $s$ & $su$ & $h$ & $hp$ & $\sigma$ \\
+\next & $\is$ & $s$ & $su$ & $h$ & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
+\\
+if $s$ has $stk$ free slots.
+
+\begin{tabular}{|llrrrrr|}
+\hline
+ & HEAPCHECK $hp$:$\is$ & $s$ & $su$ & $h$ & $hp$ & $\sigma$ \\
+\next & $\is$ & $s$ & $su$ & $h$ & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
+\\
+if $h$ has $hp$ free slots.
+
+If either check fails, we push the current bco ($\sigma$) onto the
+stack and return to the scheduler. When the scheduler has fixed the
+problem, it pops the top object off the stack and reenters it.
+
+
+Optimisations:
+\begin{itemize}
+\item The bytecode CHECK1000 conservatively checks for 1000 words of heap space and 1000 words of stack space.
+ We use it to reduce code space and instruction decoding time.
+\item The bytecode HEAPCHECK1000 conservatively checks for 1000 words of heap space.
+ It is used in case alternatives.
+\end{itemize}
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\Subsection{Primops}{hugs-primops}
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\ToDo{primops take m words and return n words. The expect boxed arguments on the stack.}
+
+
+\Section{The Machine Code Evaluator}{asm-evaluator}
+
+This section describes the framework in which compiled code evaluates
+expressions. Only at certain points will compiled code need to be
+able to talk to the interpreted world; these are discussed in
+\secref{switching-worlds}.
+
+\Subsection{Calling conventions}{ghc-calling-conventions}
+
+\Subsubsection{The call/return registers}{ghc-regs}
+
+One of the problems in designing a virtual machine is that we want it
+abstract away from tedious machine details but still reveal enough of
+the underlying hardware that we can make sensible decisions about code
+generation. A major problem area is the use of registers in
+call/return conventions. On a machine with lots of registers, it's
+cheaper to pass arguments and results in registers than to pass them
+on the stack. On a machine with very few registers, it's cheaper to
+pass arguments and results on the stack than to use ``virtual
+registers'' in memory. We therefore use a hybrid system: the first
+$n$ arguments or results are passed in registers; and the remaining
+arguments or results are passed on the stack. For register-poor
+architectures, it is important that we allow $n=0$.
+
+We'll label the arguments and results \Arg{1} \ldots \Arg{m} --- with
+the understanding that \Arg{1} \ldots \Arg{n} are in registers and
+\Arg{n+1} \ldots \Arg{m} are on top of the stack.
+
+Note that the mapping of arguments \Arg{1} \ldots \Arg{n} to machine
+registers depends on the \emph{kinds} of the arguments. For example,
+if the first argument is a Float, we might pass it in a different
+register from if it is an Int. In fact, we might find that a given
+architecture lets us pass varying numbers of arguments according to
+their types. For example, if a CPU has 2 Int registers and 2 Float
+registers then we could pass between 2 and 4 arguments in machine
+registers --- depending on whether they all have the same kind or they
+have different kinds.
+
+\Subsubsection{Entering closures}{entering-closures}
+
+To evaluate a closure we jump to the entry code for the closure
+passing a pointer to the closure in \Arg{1} so that the entry code can
+access its environment.
+
+\Subsubsection{Function call}{ghc-fun-call}
+
+The function-call mechanism is obviously crucial. There are five different
+cases to consider:
+\begin{enumerate}
+
+\item \emph{Known combinator (function with no free variables) and
+enough arguments.}
+
+A fast call can be made: push excess arguments onto stack and jump to
+function's \emph{fast entry point} passing arguments in \Arg{1} \ldots
+\Arg{m}.
+
+The \emph{fast entry point} is only called with exactly the right
+number of arguments (in \Arg{1} \ldots \Arg{m}) so it can instantly
+start doing useful work without first testing whether it has enough
+registers or having to pop them off the stack first.
+
+\item \emph{Known combinator and insufficient arguments.}
+
+A slow call can be made: push all arguments onto stack and jump to
+function's \emph{slow entry point}.
+
+Any unpointed arguments which are pushed on the stack must be tagged.
+This means pushing an extra word on the stack below the unpointed
+words, containing the number of unpointed words above it.
+
+%Todo: forward ref about tagging?
+%Todo: picture?
+
+The \emph{slow entry point} might be called with insufficient arguments
+and so it must test whether there are enough arguments on the stack.
+This \emph{argument satisfaction check} consists of checking that
+@Su-Sp@ is big enough to hold all the arguments (including any tags).
+
+\begin{itemize}
+
+\item If the argument satisfaction check fails, it is because there is
+one or more update frames on the stack before the rest of the
+arguments that the function needs. In this case, we construct a PAP
+(partial application, \secref{PAP}) containing the arguments
+which are on the stack. The PAP construction code will return to the
+update frame with the address of the PAP in \Arg{1}.
+
+\item If the argument satisfaction check succeeds, we jump to the fast
+entry point with the arguments in \Arg{1} \ldots \Arg{arity}.
+
+If the fast entry point expects to receive some of \Arg{i} on the
+stack, we can reduce the amount of movement required by making the
+stack layout for the fast entry point look like the stack layout for
+the slow entry point. Since the slow entry point is entered with the
+first argument on the top of the stack and with tags in front of any
+unpointed arguments, this means that if \Arg{i} is unpointed, there
+should be space below it for a tag and that the highest numbered
+argument should be passed on the top of the stack.
+
+We usually arrange that the fast entry point is placed immediately
+after the slow entry point --- so we can just ``fall through'' to the
+fast entry point without performing a jump.
+
+\end{itemize}
+
+
+\item \emph{Known function closure (function with free variables) and
+enough arguments.}
+
+A fast call can be made: push excess arguments onto stack and jump to
+function's \emph{fast entry point} passing a pointer to closure in
+\Arg{1} and arguments in \Arg{2} \ldots \Arg{m+1}.
+
+Like the fast entry point for a combinator, the fast entry point for a
+closure is only called with appropriate values in \Arg{1} \ldots
+\Arg{m+1} so we can start work straight away. The pointer to the
+closure is used to access the free variables of the closure.
+
+
+\item \emph{Known function closure and insufficient arguments.}
+
+A slow call can be made: push all arguments onto stack and jump to the
+closure's slow entry point passing a pointer to the closure in \Arg{1}.
+
+Again, the slow entry point performs an argument satisfaction check
+and either builds a PAP or pops the arguments off the stack into
+\Arg{2} \ldots \Arg{m+1} and jumps to the fast entry point.
+
+
+\item \emph{Unknown function closure, thunk or constructor.}
+
+Sometimes, the function being called is not statically identifiable.
+Consider, for example, the @compose@ function:
+\begin{verbatim}
+ compose f g x = f (g x)
+\end{verbatim}
+Since @f@ and @g@ are passed as arguments to @compose@, the latter has
+to make a heap call. In a heap call the arguments are pushed onto the
+stack, and the closure bound to the function is entered. In the
+example, a thunk for @(g x)@ will be allocated, (a pointer to it)
+pushed on the stack, and the closure bound to @f@ will be
+entered. That is, we will jump to @f@s entry point passing @f@ in
+\Arg{1}. If \Arg{1} is passed on the stack, it is pushed on top of
+the thunk for @(g x)@.
+
+The \emph{entry code} for an updateable thunk (which must have arity 0)
+pushes an update frame on the stack and starts executing the body of
+the closure --- using \Arg{1} to access any free variables. This is
+described in more detail in \secref{data-updates}.
+
+The \emph{entry code} for a non-updateable closure is just the
+closure's slow entry point.
+
+\end{enumerate}
+
+In addition to the above considerations, if there are \emph{too many}
+arguments then the extra arguments are simply pushed on the stack with
+appropriate tags.
+
+To summarise, a closure's standard (slow) entry point performs the
+following:
+
+\begin{description}
+\item[Argument satisfaction check.] (function closure only)
+\item[Stack overflow check.]
+\item[Heap overflow check.]
+\item[Copy free variables out of closure.] %Todo: why?
+\item[Eager black holing.] (updateable thunk only) %Todo: forward ref.
+\item[Push update frame.]
+\item[Evaluate body of closure.]
+\end{description}
+
+
+\Subsection{Case expressions and return conventions}{return-conventions}
+
+The \emph{evaluation} of a thunk is always initiated by
+a @case@ expression. For example:
+\begin{verbatim}
+ case x of (a,b) -> E
+\end{verbatim}
+
+The code for a @case@ expression looks like this:
+
+\begin{itemize}
+\item Push the free variables of the branches on the stack (fv(@E@) in
+this case).
+\item Push a \emph{return address} on the stack.
+\item Evaluate the scrutinee (@x@ in this case).
+\end{itemize}
+
+Once evaluation of the scrutinee is complete, execution resumes at the
+return address, which points to the code for the expression @E@.
+
+When execution resumes at the return point, there must be some {\em
+return convention} that defines where the components of the pair, @a@
+and @b@, can be found. The return convention varies according to the
+type of the scrutinee @x@:
+
+\begin{itemize}
+
+\item
+
+(A space for) the return address is left on the top of the stack.
+Leaving the return address on the stack ensures that the top of the
+stack contains a valid activation record
+(\secref{activation-records}) --- should a garbage
+collection be required.
+
+\item If @x@ has a boxed type (e.g.~a data constructor or a function),
+a pointer to @x@ is returned in \Arg{1}.
+
+\ToDo{Warn that components of E should be extracted as soon as
+possible to avoid a space leak.}
+
+\item If @x@ is an unboxed type (e.g.~@Int#@ or @Float#@), @x@ is
+returned in \Arg{1}
+
+\item If @x@ is an unboxed tuple constructor, the components of @x@
+are returned in \Arg{1} \ldots \Arg{n} but no object is constructed in
+the heap.
+
+When passing an unboxed tuple to a function, the components are
+flattened out and passed in \Arg{1} \ldots \Arg{n} as usual.
+
+\end{itemize}
+
+\Subsection{Vectored Returns}{vectored-returns}
+
+Many algebraic data types have more than one constructor. For
+example, the @Maybe@ type is defined like this:
+\begin{verbatim}
+ data Maybe a = Nothing | Just a
+\end{verbatim}
+How does the return convention encode which of the two constructors is
+being returned? A @case@ expression scrutinising a value of @Maybe@
+type would look like this:
+\begin{verbatim}
+ case E of
+ Nothing -> ...
+ Just a -> ...
+\end{verbatim}
+Rather than pushing a return address before evaluating the scrutinee,
+@E@, the @case@ expression pushes (a pointer to) a \emph{return
+vector}, a static table consisting of two code pointers: one for the
+@Just@ alternative, and one for the @Nothing@ alternative.
+
+\begin{itemize}
+
+\item
+
+The constructor @Nothing@ returns by jumping to the first item in the
+return vector with a pointer to a (statically built) Nothing closure
+in \Arg{1}.
+
+It might seem that we could avoid loading \Arg{1} in this case since the
+first item in the return vector will know that @Nothing@ was returned
+(and can easily access the Nothing closure in the (unlikely) event
+that it needs it. The only reason we load \Arg{1} is in case we have to
+perform an update (\secref{data-updates}).
+
+\item
+
+The constructor @Just@ returns by jumping to the second element of the
+return vector with a pointer to the closure in \Arg{1}.
+
+\end{itemize}
+
+In this way no test need be made to see which constructor returns;
+instead, execution resumes immediately in the appropriate branch of
+the @case@.
+
+\Subsection{Direct Returns}{direct-returns}
+
+When a datatype has a large number of constructors, it may be
+inappropriate to use vectored returns. The vector tables may be
+large and sparse, and it may be better to identify the constructor
+using a test-and-branch sequence on the tag. For this reason, we
+provide an alternative return convention, called a \emph{direct
+return}.
+
+In a direct return, the return address pushed on the stack really is a
+code pointer. The returning code loads a pointer to the closure being
+returned in \Arg{1} as usual, and also loads the tag into \Arg{2}.
+The code at the return address will test the tag and jump to the
+appropriate code for the case branch. If \Arg{2} isn't mapped to a
+real machine register on this architecture, then we don't load it on a
+return, instead using the tag directly from the info table.
+
+The choice of whether to use a vectored return or a direct return is
+made on a type-by-type basis --- up to a certain maximum number of
+constructors imposed by the update mechanism
+(\secref{data-updates}).
+
+Single-constructor data types also use direct returns, although in
+that case there is no need to return a tag in \Arg{2}.
+
+\ToDo{for a nullary constructor we needn't return a pointer to the
+constructor in \Arg{1}.}
+
+\Subsection{Updates}{data-updates}
+
+The entry code for an updatable thunk (which must be of arity 0):
+
+\begin{itemize}
+\item copies the free variables out of the thunk into registers or
+ onto the stack.
+\item pushes an \emph{update frame} onto the stack.
+
+An update frame is a small activation record consisting of
+\begin{center}
+\begin{tabular}{|l|l|l|}
+\hline
+\emph{Fixed header} & \emph{Update Frame link} & \emph{Updatee} \\
+\hline
+\end{tabular}
+\end{center}
+
+\note{In the semantics part of the STG paper (section 5.6), an update
+frame consists of everything down to the last update frame on the
+stack. This would make sense too --- and would fit in nicely with
+what we're going to do when we add support for speculative
+evaluation.}
+\ToDo{I think update frames contain cost centres sometimes}
+
+\item If we are doing ``eager blackholing,'' we then overwrite the
+thunk with a black hole (\secref{BLACKHOLE}). Otherwise, we leave it
+to the garbage collector to black hole the thunk.
+
+\item
+Start evaluating the body of the expression.
+
+\end{itemize}
+
+When the expression finishes evaluation, it will enter the update
+frame on the top of the stack. Since the returner doesn't know
+whether it is entering a normal return address/vector or an update
+frame, we follow exactly the same conventions as return addresses and
+return vectors. That is, on entering the update frame:
+
+\begin{itemize}
+\item The value of the thunk is in \Arg{1}. (Recall that only thunks
+are updateable and that thunks return just one value.)
+
+\item If the data type is a direct-return type rather than a
+vectored-return type, then the tag is in \Arg{2}.
+
+\item The update frame is still on the stack.
+\end{itemize}
+
+We can safely share a single statically-compiled update function
+between all types. However, the code must be able to handle both
+vectored and direct-return datatypes. This is done by arranging that
+the update code looks like this:
+
+\begin{verbatim}
+ | ^ |
+ | return vector |
+ |---------------|
+ | fixed-size |
+ | info table |
+ |---------------| <- update code pointer
+ | update code |
+ | v |
+\end{verbatim}
+
+Each entry in the return vector (which is large enough to cover the
+largest vectored-return type) points to the update code.
+
+The update code:
+\begin{itemize}
+\item overwrites the \emph{updatee} with an indirection to \Arg{1};
+\item loads @Su@ from the Update Frame link;
+\item removes the update frame from the stack; and
+\item enters \Arg{1}.
+\end{itemize}
+
+We enter \Arg{1} again, having probably just come from there, because
+it knows whether to perform a direct or vectored return. This could
+be optimised by compiling special update code for each slot in the
+return vector, which performs the correct return.
+
+\Subsection{Semi-tagging}{semi-tagging}
+
+When a @case@ expression evaluates a variable that might be bound
+to a thunk it is often the case that the scrutinee is already evaluated.
+In this case we have paid the penalty of (a) pushing the return address (or
+return vector address) on the stack, (b) jumping through the info pointer
+of the scrutinee, and (c) returning by an indirect jump through the
+return address on the stack.
+
+If we knew that the scrutinee was already evaluated we could generate
+(better) code which simply jumps to the appropriate branch of the
+@case@ with a pointer to the scrutinee in \Arg{1}. (For direct
+returns to multiconstructor datatypes, we might also load the tag into
+\Arg{2}).
+
+An obvious idea, therefore, is to test dynamically whether the heap
+closure is a value (using the tag in the info table). If not, we
+enter the closure as usual; if so, we jump straight to the appropriate
+alternative. Here, for example, is pseudo-code for the expression
+@(case x of { (a,_,c) -> E }@:
+\begin{verbatim}
+ \Arg{1} = <pointer to x>;
+ tag = \Arg{1}->entry->tag;
+ if (isWHNF(tag)) {
+ Sp--; \\ insert space for return address
+ goto ret;
+ }
+ push(ret);
+ goto \Arg{1}->entry;
+
+ <info table for return address goes here>
+ret: a = \Arg{1}->data1; \\ suck out a and c to avoid space leak
+ c = \Arg{1}->data3;
+ <code for E2>
+\end{verbatim}
+and here is the code for the expression @(case x of { [] -> E1; x:xs -> E2 }@:
+\begin{verbatim}
+ \Arg{1} = <pointer to x>;
+ tag = \Arg{1}->entry->tag;
+ if (isWHNF(tag)) {
+ Sp--; \\ insert space for return address
+ goto retvec[tag];
+ }
+ push(retinfo);
+ goto \Arg{1}->entry;
+
+ .addr ret2
+ .addr ret1
+retvec: \\ reversed return vector
+ <return info table for case goes here>
+retinfo:
+ panic("Direct return into vectored case");
+
+ret1: <code for E1>
+
+ret2: x = \Arg{1}->head;
+ xs = \Arg{1}->tail;
+ <code for E2>
+\end{verbatim}
+There is an obvious cost in compiled code size (but none in the size
+of the bytecodes). There is also a cost in execution time if we enter
+more thunks than data constructors.
+
+Both the direct and vectored returns are easily modified to chase chains
+of indirections too. In the vectored case, this is most easily done by
+making sure that @IND = TAG_1 - 1@, and adding an extra field to every
+return vector. In the above example, the indirection code would be
+\begin{verbatim}
+ind: \Arg{1} = \Arg{1}->next;
+ goto ind_loop;
+\end{verbatim}
+where @ind_loop@ is the second line of code.
+
+Note that we have to leave space for a return address since the return
+address expects to find one. If the body of the expression requires a
+heap check, we will actually have to write the return address before
+entering the garbage collector.
+
+
+\Subsection{Heap and Stack Checks}{heap-and-stack-checks}
+
+The storage manager detects that it needs to garbage collect the old
+generation when the evaluator requests a garbage collection without
+having moved the heap pointer since the last garbage collection. It
+is therefore important that the GC routines \emph{not} move the heap
+pointer unless the heap check fails. This is different from what
+happens in the current STG implementation.
+
+Assuming that the stack can never shrink, we perform a stack check
+when we enter a closure but not when we return to a return
+continuation. This doesn't work for heap checks because we cannot
+predict what will happen to the heap if we call a function.
+
+If we wish to allow the stack to shrink, we need to perform a stack
+check whenever we enter a return continuation. Most of these checks
+could be eliminated if the storage manager guaranteed that a stack
+would always have 1000 words (say) of space after it was shrunk. Then
+we can omit stack checks for less than 1000 words in return
+continuations.
+
+When an argument satisfaction check fails, we need to push the closure
+(in R1) onto the stack - so we need to perform a stack check. The
+problem is that the argument satisfaction check occurs \emph{before}
+the stack check. The solution is that the caller of a slow entry
+point or closure will guarantee that there is at least one word free
+on the stack for the callee to use.
+
+Similarily, if a heap or stack check fails, we need to push the arguments
+and closure onto the stack. If we just came from the slow entry point,
+there's certainly enough space and it is the responsibility of anyone
+using the fast entry point to guarantee that there is enough space.
+
+\ToDo{Be more precise about how much space is required - document it
+in the calling convention section.}
+
+\Subsection{Handling interrupts/signals}{signals}
+
+\begin{verbatim}
+May have to keep C stack pointer in register to placate OS?
+May have to revert black holes - ouch!
+\end{verbatim}
+
+
+
+\section{The Loader}
+\section{The Compilers}
+
+\iffalse
+\part{Old stuff - needs to be mined for useful info}
+
+\section{The Scheduler}
+
+The Scheduler is the heart of the run-time system. A running program
+consists of a single running thread, and a list of runnable and
+blocked threads. The running thread returns to the scheduler when any
+of the following conditions arises:
+
+\begin{itemize}
+\item A heap check fails, and a garbage collection is required
+\item Compiled code needs to switch to interpreted code, and vice
+versa.
+\item The thread becomes blocked.
+\item The thread is preempted.
+\end{itemize}
+
+A running system has a global state, consisting of
+
+\begin{itemize}
+\item @Hp@, the current heap pointer, which points to the next
+available address in the Heap.
+\item @HpLim@, the heap limit pointer, which points to the end of the
+heap.
+\item The Thread Preemption Flag, which is set whenever the currently
+running thread should be preempted at the next opportunity.
+\item A list of runnable threads.
+\item A list of blocked threads.
+\end{itemize}
+
+Each thread is represented by a Thread State Object (TSO), which is
+described in detail in \secref{TSO}.
+
+The following is pseudo-code for the inner loop of the scheduler
+itself.
+
+\begin{verbatim}
+while (threads_exist) {
+ // handle global problems: GC, parallelism, etc
+ if (need_gc) gc();
+ if (external_message) service_message();
+ // deal with other urgent stuff
+
+ pick a runnable thread;
+ do {
+ // enter object on top of stack
+ // if the top object is a BCO, we must enter it
+ // otherwise appply any heuristic we wish.
+ if (thread->stack[thread->sp]->info.type == BCO) {
+ status = runHugs(thread,&smInfo);
+ } else {
+ status = runGHC(thread,&smInfo);
+ }
+ switch (status) { // handle local problems
+ case (StackOverflow): enlargeStack; break;
+ case (Error e) : error(thread,e); break;
+ case (ExitWith e) : exit(e); break;
+ case (Yield) : break;
+ }
+ } while (thread_runnable);
+}
+\end{verbatim}
+
+\Subsection{Invoking the garbage collector}{ghc-invoking-gc}
+
+\Subsection{Putting the thread to sleep}{ghc-thread-sleeps}
+
+\Subsection{Calling C from Haskell}{ghc-ccall}
+
+We distinguish between "safe calls" where the programmer guarantees
+that the C function will not call a Haskell function or, in a
+multithreaded system, block for a long period of time and "unsafe
+calls" where the programmer cannot make that guarantee.
+
+Safe calls are performed without returning to the scheduler and are
+discussed elsewhere (\ToDo{discuss elsewhere}).
+
+Unsafe calls are performed by returning an array (outside the Haskell
+heap) of arguments and a C function pointer to the scheduler. The
+scheduler allocates a new thread from the operating system
+(multithreaded system only), spawns a call to the function and
+continues executing another thread. When the ccall completes, the
+thread informs the scheduler and the scheduler adds the thread to the
+runnable threads list.
+
+\ToDo{Describe this in more detail.}
+
+
+\Subsection{Calling Haskell from C}{ghc-c-calls-haskell}
+
+When C calls a Haskell closure, it sends a message to the scheduler
+thread. On receiving the message, the scheduler creates a new Haskell
+thread, pushes the arguments to the C function onto the thread's stack
+(with tags for unboxed arguments) pushes the Haskell closure and adds
+the thread to the runnable list so that it can be entered in the
+normal way.
+
+When the closure returns, the scheduler sends back a message which
+awakens the (C) thread.
+
+\ToDo{Do we need to worry about the garbage collector deallocating the
+thread if it gets blocked?}
+
+\Subsection{Switching Worlds}{switching-worlds}
+
+\ToDo{This has all changed: we always leave a closure on top of the
+stack if we mean to continue executing it. The scheduler examines the
+top of the stack and tries to guess which world we want to be in. If
+it finds a @BCO@, it certainly enters Hugs, if it finds a @GHC@
+closure, it certainly enters GHC and if it finds a standard closure,
+it is free to choose either one but it's probably best to enter GHC
+for everything except @BCO@s and perhaps @AP@s.}
+
+Because this is a combined compiled/interpreted system, the
+interpreter will sometimes encounter compiled code, and vice-versa.
+
+All world-switches go via the scheduler, ensuring that the world is in
+a known state ready to enter either compiled code or the interpreter.
+When a thread is run from the scheduler, the @whatNext@ field in the
+TSO (\secref{TSO}) is checked to find out how to execute the
+thread.
+
+\begin{itemize}
+\item If @whatNext@ is set to @ReturnGHC@, we load up the required
+registers from the TSO and jump to the address at the top of the user
+stack.
+\item If @whatNext@ is set to @EnterGHC@, we load up the required
+registers from the TSO and enter the closure pointed to by the top
+word of the stack.
+\item If @whatNext@ is set to @EnterHugs@, we enter the top thing on
+the stack, using the interpreter.
+\end{itemize}
+
+There are four cases we need to consider:
+
+\begin{enumerate}
+\item A GHC thread enters a Hugs-built closure.
+\item A GHC thread returns to a Hugs-compiled return address.
+\item A Hugs thread enters a GHC-built closure.
+\item A Hugs thread returns to a Hugs-compiled return address.
+\end{enumerate}
+
+GHC-compiled modules cannot call functions in a Hugs-compiled module
+directly, because the compiler has no information about arities in the
+external module. Therefore it must assume any top-level objects are
+CAFs, and enter their closures.
+
+\ToDo{Hugs-built constructors?}
+
+We now examine the various cases one by one and describe how the
+switch happens in each situation.
+
+\subsection{A GHC thread enters a Hugs-built closure}
+\label{sec:ghc-to-hugs-switch}
+
+There is three possibilities: GHC has entered a @PAP@, or it has
+entered a @AP@, or it has entered the BCO directly (for a top-level
+function closure). @AP@s and @PAP@s are ``standard closures'' and
+so do not require us to enter the bytecode interpreter.
+
+The entry code for a BCO does the following:
+
+\begin{itemize}
+\item Push the address of the object entered on the stack.
+\item Save the current state of the thread in its TSO.
+\item Return to the scheduler, setting @whatNext@ to @EnterHugs@.
+\end{itemize}
+
+BCO's for thunks and functions have the same entry conventions as
+slow entry points: they expect to find their arguments on the stac
+with unboxed arguments preceded by appropriate tags.
+
+\subsection{A GHC thread returns to a Hugs-compiled return address}
+\label{sec:ghc-to-hugs-switch}
+
+Hugs return addresses are laid out as in \figref{hugs-return-stack}.
+If GHC is returning, it will return to the address at the top of the
+stack, namely @HUGS_RET@. The code at @HUGS_RET@ performs the
+following:
+
+\begin{itemize}
+\item pushes \Arg{1} (the return value) on the stack.
+\item saves the thread state in the TSO
+\item returns to the scheduler with @whatNext@ set to @EnterHugs@.
+\end{itemize}
+
+\noindent When Hugs runs, it will enter the return value, which will
+return using the correct Hugs convention
+(\secref{hugs-return-convention}) to the return address underneath it
+on the stack.
+
+\subsection{A Hugs thread enters a GHC-compiled closure}
+\label{sec:hugs-to-ghc-switch}
+
+Hugs can recognise a GHC-built closure as not being one of the
+following types of object:
+
+\begin{itemize}
+\item A @BCO@,
+\item A @AP@,
+\item A @PAP@,
+\item An indirection, or
+\item A constructor.
+\end{itemize}
+
+When Hugs is called on to enter a GHC closure, it executes the
+following sequence of instructions:
+
+\begin{itemize}
+\item Push the address of the closure on the stack.
+\item Save the current state of the thread in the TSO.
+\item Return to the scheduler, with the @whatNext@ field set to
+@EnterGHC@.
+\end{itemize}
+
+\subsection{A Hugs thread returns to a GHC-compiled return address}
+\label{sec:hugs-to-ghc-switch}
+
+When Hugs encounters a return address on the stack that is not
+@HUGS_RET@, it knows that a world-switch is required. At this point
+the stack contains a pointer to the return value, followed by the GHC
+return address. The following sequence is then performed:
+
+\begin{itemize}
+\item save the state of the thread in the TSO.
+\item return to the scheduler, setting @whatNext@ to @EnterGHC@.
+\end{itemize}
+
+The first thing that GHC will do is enter the object on the top of the
+stack, which is a pointer to the return value. This value will then
+return itself to the return address using the GHC return convention.
+
+
+\fi
+
+
+\part{History}
+
+We're nuking the following:
+
+\begin{itemize}
+\item
+ Two stacks
+
+\item
+ Return in registers.
+ This lets us remove update code pointers from info tables,
+ removes the need for phantom info tables, simplifies
+ semi-tagging, etc.
+
+\item
+ Threaded GC.
+ Careful analysis suggests that it doesn't buy us very much
+ and it is hard to work with.
+
+ Eliminating threaded GCs eliminates the desire to share SMReps
+ so they are (once more) part of the Info table.
+
+\item
+ RetReg.
+ Doesn't buy us anything on a register-poor architecture and
+ isn't so important if we have semi-tagging.
+
+\begin{verbatim}
+ - Probably bad on register poor architecture
+ - Can avoid need to write return address to stack on reg rich arch.
+ - when a function does a small amount of work, doesn't
+ enter any other thunks and then returns.
+ eg entering a known constructor (but semitagging will catch this)
+ - Adds complications
+\end{verbatim}
+
+\item
+ Update in place
+
+ This lets us drop CONST closures and CHARLIKE closures (assuming we
+ don't support Unicode). The only point of these closures was to
+ avoid updating with an indirection.
+
+ We also drop @MIN_UPD_SIZE@ --- all we need is space to insert an
+ indirection or a black hole.
+
+\item
+ STATIC SMReps are now called CONST
+
+\item
+ @MUTVAR@ is new
+
+\item The profiling ``kind'' field is now encoded in the @INFO_TYPE@ field.
+This identifies the general sort of the closure for profiling purposes.
+
+\item Various papers describe deleting update frames for unreachable objects.
+ This has never been implemented and we don't plan to anytime soon.
+
+\end{itemize}
+
+
+\end{document}
+
+