[project @ 1998-01-08 14:40:22 by areid]

Added an overview section, commented out some of the more bogus parts of the document (but some bits still remain)
author: areid <unknown> 1998-01-08 14:40:22 +0000
committer: areid <unknown> 1998-01-08 14:40:22 +0000
commit: ff14742cc328f19b9bf7c04d9a69408e641cf64a (patch)
tree: f66486d1f07b4808eb6098105769c9c1d851d9ac /docs/rts
parent: b3163aefcecd596fe200f9d3cb5ea82443767269 (diff)
download: haskell-ff14742cc328f19b9bf7c04d9a69408e641cf64a.tar.gz
1 files changed, 2533 insertions, 630 deletions
diff --git a/docs/rts/rts.verb b/docs/rts/rts.verb
index 7e0af3d1a0..9dced4122a 100644
--- a/docs/rts/rts.verb
+++ b/docs/rts/rts.verb
@@ -5,13 +5,12 @@
 
 % TODO:
 %
-% o I think it would be worth making the connection with CPS explicit.
+% o I (ADR) think it would be worth making the connection with CPS explicit.
 %   Now that we have explicit activation records (on the stack), we can
 %   explain the whole system in terms of CPS and tail calls --- with the
 %   one requirement that we carefuly distinguish stack-allocated objects
 %   from heap-allocated objects.
 
-
 % \documentstyle[preprint]{acmconf}
 \documentclass[11pt]{article}
 \oddsidemargin 0.1 in       %   Note that \oddsidemargin = \evensidemargin
@@ -20,7 +19,7 @@
 \marginparsep 0 in 
 \sloppy
 
-\usepackage{epsfig}
+%\usepackage{epsfig}
 
 \newcommand{\note}[1]{{\em Note: #1}}
 % DIMENSION OF TEXT:
@@ -91,6 +90,12 @@ architectures.  It has been partly replaced by unboxed tuples
 explicitly state where results should be returned in registers (or on
 the stack) instead of on the heap.
 
+\item
+
+Lazy black-holing has been replaced by eager black-holing.  The
+problem with lazy black-holing is that it leaves slop in the heap
+which conflicts with the use of a mostly-copying collector.
+
 \end{itemize} 
 
 \subsection{Wish list}
@@ -131,6 +136,11 @@ mutually-exclusive choices.
 \begin{description}
 \item[@SEQUENTIAL@] No concurrency or parallelism support.
   This configuration might not support interrupt recovery.
+
+  \note{There's probably not much point in supporting this option.  If
+    we've gone to the effort of supporting concurency, we don't gain
+    much by being able to turn it off.}
+    
 \item[@CONCURRENT@]  Support for concurrency but not for parallelism.
 \item[@CONCURRENT@+@GRANSIM@] Concurrency support and simulated parallelism.
 \item[@CONCURRENT@+@PARALLEL@]     Concurrency support and real parallelism.
@@ -172,7 +182,7 @@ pointed types are the only things which can be lazily evaluated.  In
 the STG machine, this means that they are the only things that can be 
 {\em entered} or {\em updated} and it requires that they be boxed.
 
-\item An {\em unpointed} type is one that does not contains $\bot$.
+\item An {\em unpointed} type is one that does not contain $\bot$.
 Variables with unpointed types are never delayed --- they are always
 evaluated when they are constructed.  In the STG machine, this means
 that they cannot be {\em entered} or {\em updated}.  Unpointed objects
@@ -200,7 +210,7 @@ words and pointers are the same size.
 \subsection{Subtle Dependencies}
 
 Some decisions have very subtle consequences which should be written
-down in case we want to change our minds.
+down in case we want to change our minds.  
 
 \begin{itemize}
 
@@ -234,10 +244,16 @@ discusses who does the stack check and how much space they need.
 Heap objects never contain slop --- this is required if we want to
 support mostly-copying garbage collection.
 
+This is a big problem when updating since the updatee is usually
+bigger than an indirection object.  The fix is to overwrite the end of
+the updatee with ``slop objects'' (described in
+section~\ref{sect:slop-objects}).
 This is hard to arrange if we do \emph{lazy} blackholing
 (section~\ref{sect:lazy-black-holing}) so we currently plan to
 blackhole an object when we push the update frame.
 
+
+
 \item
 
 Info tables for constructors contain enough information to decide which
@@ -268,12 +284,6 @@ instead of
 
 \subsection{Unboxed tuples}\label{sect:unboxed-tuples}
 
-\Note{We're not planning to implement this right away.  There doesn't
-seem to be any real difficulty adding it to the runtime system but
-it'll take a lot of work adding it to the compiler.  Since unboxed
-tuples do not trigger allocation, the syntax could be modified to allow
-unboxed tuples in expressions.}
-
 Functions can take multiple arguments as easily as they can take one
 argument: there's no cost for adding another argument.  But functions
 can only return one result: the cost of adding a second ``result'' is
@@ -314,20 +324,44 @@ or multiple stack slots.  At first sight, this seems a little strange
 but it's no different from passing double precision floats in two
 registers.
 
-Note that unboxed tuples can only have one constructor and that
+Notes:
+\begin{itemize}
+\item
+Unboxed tuples can only have one constructor and that
 thunks never have unboxed types --- so we'll never try to update an
 unboxed constructor.  The restriction to a single constructor is
 largely to avoid garbage collection complications.
 
+\item
+The core syntax does not allow variables to be bound to
+unboxed tuples (ie in default case alternatives or as function arguments)
+and does not allow unboxed tuples to be fields of other constructors.
+However, there's no harm in allowing it in the source syntax as a
+convenient, but easily removed, syntactic sugar.
+
+\item
+The compiler generates a closure of the form
+@
+> c = \ x y z -> C x y z
+@
+for every constructor (whether boxed or unboxed).  
+
+This closure is normally used during desugaring to ensure that
+constructors are saturated and to apply any strictness annotations.
+They are also used when returning unboxed constructors to the machine
+code evaluator from the bytecode evaluator and when a heap check fails
+in a return continuation for an unboxed-tuple scrutinee.
+
+\end{itemize}
+
 \subsection{STG Syntax}
 
 \ToDo{Insert STG syntax with appropriate changes.}
 
 
-%-----------------------------------------------------------------------------
-\part{Evaluation Model}
-
-\section{Overview}
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\part{System Overview}
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
 This part is concerned with defining the external interfaces of the
 major components of the system; the next part is concerned with their
@@ -336,255 +370,400 @@ inner workings.
 The major components of the system are:
 \begin{itemize}
 \item The scheduler
-\item The loader
 \item The storage manager
-\item The machine code evaluator (compiled code)
-\item The bytecode evaluator (interpreted code)
+\item The evaluators
+\item The loader
 \item The compilers
 \end{itemize}
 
-\section{The Compilers}
+\ToDo{Insert diagram showing all components underneath the scheduler
+and communicating only with the scheduler}
 
-Need to describe interface files.
-
-Here's an example - but I don't know the grammar - ADR.
-@
-_interface_ Main 1
-_exports_
-Main main ;
-_declarations_
-1 main _:_ IOBase.IO PrelBase.();;
-@
-
-\section{The Scheduler}
+\section{Scheduler}
 
 The Scheduler is the heart of the run-time system.  A running program
 consists of a single running thread, and a list of runnable and
-blocked threads.  The running thread returns to the scheduler when any
-of the following conditions arises:
+blocked threads.  All threads consist of a stack and a few words of
+status information.  Except for the running thread, all threads have a
+closure on top of their stack; the scheduler restarts a thread by
+entering an evaluator which performs some reduction and returns.
+
+\subsection{The scheduler's main loop}
+
+The scheduler consists of a loop which chooses a runnable thread and
+invokes one of the evaluators which performs some reduction and
+returns.
+
+The scheduler also takes care of system-wide issues such as heap
+overflow or communication with other processors (in the parallel
+system) and thread-specific problems such as stack overflow.
+
+\subsection{Creating a thread}
+
+Threads are created:
+
+\begin{itemize}
+
+\item
+
+When the scheduler is first invoked.
+
+\item
+
+When a message is received from another processor (I think). (Parallel
+system only.)
+
+\item
+
+When a C program calls some Haskell code.
+
+\end{itemize}
+
+
+\subsection{Restarting a thread}
+
+The evaluators can reduce almost all types of closure except that only
+the machine code evaluator can reduce GHC-compiled closures and only
+the bytecode evaluator can reduce Hugs-compiled closures.
+Consequently, the scheduler may use either evaluator to restart a
+thread unless the top closure is a @BCO@ or contains machine code.
+
+However, if the top of the stack contains a constructor, the scheduler
+should use the machine code evaluator to restart the thread.  This
+allows the bytecode evaluator to return a constructor to a machine
+code return address by pushing the constructor on top of the stack and
+returning to the scheduler.  If the return address under the
+constructor is @HUGS_RET@, the entry code for @HUGS_RET@ will
+rearrange the stack so that the return @BCO@ is on top of the stack
+and return to the scheduler which will then call the bytecode
+evaluator.  There is little point in trying to shorten this slightly
+indirect route since it will happen very rarely if at all.
+
+\subsection{Returning from a thread}
+
+The evaluators return to the scheduler when any of the following
+conditions arise:
 
 \begin{itemize}
 \item A heap check fails, and a garbage collection is required
-\item Compiled code needs to switch to interpreted code, and vice
-versa.
+\item Compiled code needs to switch to interpreted code, and vice versa.
+\item The evaluator needs to perform an ``unsafe'' C call.
 \item The thread becomes blocked.
 \item The thread is preempted.
+\item The thread terminates.
 \end{itemize}
 
-A running system has a global state, consisting of
+Except when the thread terminates, the thread always terminates with a
+closure on the top of the stack.  
+
+\subsection{Preempting a thread}
+
+Strictly speaking, threads cannot be preempted --- the scheduler
+merely sets a preemption request flag which the thread must arrange to
+test on a regular basis.  When an evaluator finds that the preemption
+request flag is set, it pushes an appropriate closure onto the stack
+and returns to the scheduler.
+
+In the bytecode interpreter, the flag is tested whenever we enter a
+closure.  If the preemption flag is set, it leaves the closure on top
+of the stack and returns to the scheduler.
+
+In the machine code evaluator, the flag is only tested when a heap or
+stack check fails.  This is less expensive than testing the flag on
+entering every closure but runs the risk that a thread will enter an
+infinite loop which does not allocate any space.  If the flag is set,
+the evaluator returns to the scheduler exactly as if a heap check had
+failed.
+
+\subsection{``Safe'' and ``unsafe'' C calls}
+
+There are two ways of calling C: 
+
+\begin{description}
+
+\item[``Safe'' C calls]
+are used if the programer is certain that the C function will not
+do anything dangerous such as calling a Haskell function or an
+operating system call which blocks the thread for a long period of time.
+\footnote{Warning: this use of ``safe'' and ``unsafe'' is the exact
+opposite of the usage for functions like @unsafePerformIO@.}
+Safe C calls are faster but must be hand-checked by the programmer.
+
+Safe C calls are performed by pushing the arguments onto the C stack
+and jumping to the C function's entry point.  On exit, the result of
+the function is in a register which is returned to the Haskell code as
+an unboxed value.
+
+\item[``Unsafe'' C calls] are used if the programmer suspects that the
+thread may do something dangerous like blocking or calling a Haskell
+function.  Unsafe C calls are relatively slow but are less problematic.
+
+Unsafe C calls are performed by pushing the arguments onto the Haskell
+stack, pushing a return continuation and returning a \emph{C function
+descriptor} to the scheduler.  The scheduler suspends the Haskell thread,
+spawns a new operating system thread which pops the arguments off the
+Haskell stack onto the C stack, calls the C function, pushes the
+function result onto the Haskell stack and informs the scheduler that
+the C function has completed and the Haskell thread is now runnable.
+
+\end{description}
+
+The bytecode evaluator will probably treat all C calls as being unsafe.
+
+\ToDo{It might be good for the programmer to indicate how the program is 
+unsafe.  For example, if we distinguish between C functions which might
+call Haskell functions and those which might block, we could perform a 
+safe call for blocking functions in a single-threaded system or, perhaps, in a multi-threaded system which only happens to have a single thread at the moment.}
+
 
+\section{The Evaluators}
+
+All the scheduler needs to know about evaluation is how to manipulate
+threads and how to find the closure on top of the stack.  The
+evaluators need to agree on the representations of certain objects and
+on how to return from the scheduler.
+
+\subsection{Returning to the Scheduler}
+\label{sect:switching-worlds}
+
+The evaluators return to the scheduler under three circumstances:
 \begin{itemize}
-\item @Hp@, the current heap pointer, which points to the next
-available address in the Heap.
-\item @HpLim@, the heap limit pointer, which points to the end of the
-heap.
-\item The Thread Preemption Flag, which is set whenever the currently
-running thread should be preempted at the next opportunity.
-\item A list of runnable threads. 
-\item A list of blocked threads.
-\end{itemize}
 
-Each thread is represented by a Thread State Object (TSO), which is
-described in detail in Section \ref{sect:TSO}.
+\item
 
-The following is pseudo-code for the inner loop of the scheduler
-itself.
+When they enter a closure built by the other evaluator.  That is, when
+the bytecode interpreter enters a closure compiled by GHC or when the
+machine code evaluator enters a BCO.
 
-@
-while (threads_exist) {
-  // handle global problems: GC, parallelism, etc
-  if (need_gc) gc();  
-  if (external_message) service_message();
-  // deal with other urgent stuff
+\item
 
-  pick a runnable thread;
-  do {
-    // enter object on top of stack
-    // if the top object is a BCO, we must enter it
-    // otherwise appply any heuristic we wish.
-    if (thread->stack[thread->sp]->info.type == BCO) {
-	status = runHugs(thread,&smInfo);
-    } else {
-	status = runGHC(thread,&smInfo);
-    }
-    switch (status) {  // handle local problems
-      case (StackOverflow): enlargeStack; break;
-      case (Error e)      : error(thread,e); break;
-      case (ExitWith e)   : exit(e); break;
-      case (Yield)        : break;
-    }
-  } while (thread_runnable);
-}
-@
+When they return to a return continuation built by the other
+evaluator.  That is, when the machine code evaluator returns to a
+continuation built by Hugs or when the bytecode evaluator returns to a
+continuation built by GHC.
 
-\subsection{Invoking the garbage collector}
-\subsection{Putting the thread to sleep}
+\item
 
-\subsection{Calling C from Haskell}
+When a heap or stack check fails or when the preemption flag is set.
 
-We distinguish between "safe calls" where the programmer guarantees
-that the C function will not call a Haskell function or, in a
-multithreaded system, block for a long period of time and "unsafe
-calls" where the programmer cannot make that guarantee.  
+\end{itemize}
 
-Safe calls are performed without returning to the scheduler and are
-discussed elsewhere (\ToDo{discuss elsewhere}).
+In all cases, they return to the scheduler with a closure on top of
+the stack.  The mechanism used to trigger the world switch and the
+choice of closure left on top of the stack varies according to which
+world is being left and what is being returned.
 
-Unsafe calls are performed by returning an array (outside the Haskell
-heap) of arguments and a C function pointer to the scheduler.  The
-scheduler allocates a new thread from the operating system
-(multithreaded system only), spawns a call to the function and
-continues executing another thread.  When the ccall completes, the
-thread informs the scheduler and the scheduler adds the thread to the
-runnable threads list.  
+\subsubsection{Leaving the bytecode evaluator}
+\label{sect:hugs-to-ghc-switch}
 
-\ToDo{Describe this in more detail.}
+\paragraph{Entering a machine code closure}
 
+When it enters a closure, the bytecode evaluator performs a switch
+based on the type of closure (@AP@, @PAP@, @Ind@, etc).  On entering a
+machine code closure, it returns to the scheduler with the closure on
+top of the stack.
 
-\subsection{Calling Haskell from C}
+\paragraph{Returning a constructor}
 
-When C calls a Haskell closure, it sends a message to the scheduler
-thread.  On receiving the message, the scheduler creates a new Haskell
-thread, pushes the arguments to the C function onto the thread's stack
-(with tags for unboxed arguments) pushes the Haskell closure and adds
-the thread to the runnable list so that it can be entered in the
-normal way.
+When it enters a constructor, the bytecode evaluator tests the return
+continuation on top of the stack.  If it is a machine code
+continuation, it returns to the scheduler with the constructor on top
+of the stack.
 
-When the closure returns, the scheduler sends back a message which
-awakens the (C) thread.  
+\note{This is why the scheduler must enter the machine code evaluator
+if it finds a constructor on top of the stack.}
 
-\ToDo{Do we need to worry about the garbage collector deallocating the
-thread if it gets blocked?}
+\paragraph{Returning an unboxed value}
 
-\subsection{Switching Worlds}
-\label{sect:switching-worlds}
+\note{Hugs doesn't support unboxed values in source programs but they
+are used for a few complex primops.}
 
-\ToDo{This has all changed: we always leave a closure on top of the
-stack if we mean to continue executing it.  The scheduler examines the
-top of the stack and tries to guess which world we want to be in.  If
-it finds a @BCO@, it certainly enters Hugs, if it finds a @GHC@
-closure, it certainly enters GHC and if it finds a standard closure,
-it is free to choose either one but it's probably best to enter GHC
-for everything except @BCO@s and perhaps @AP@s.}
+When it enters a constructor, the bytecode evaluator tests the return
+continuation on top of the stack.  If it is a machine code
+continuation, it returns to the scheduler with the unboxed value and a
+special closure on top of the stack.  When the closure is entered (by
+the machine code evaluator), it returns the unboxed value on top of
+the stack to the return continuation under it.
 
-Because this is a combined compiled/interpreted system, the
-interpreter will sometimes encounter compiled code, and vice-versa.
+The runtime system (or GHC?) provides one of these closures for each 
+unboxed type.  Hugs cannot generate them itself since the entry code is
+really very tricky.
 
-All world-switches go via the scheduler, ensuring that the world is in
-a known state ready to enter either compiled code or the interpreter.
-When a thread is run from the scheduler, the @whatNext@ field in the
-TSO (Section \ref{sect:TSO}) is checked to find out how to execute the
-thread.
+\paragraph{Heap/Stack overflow and preemption}
 
-\begin{itemize}
-\item If @whatNext@ is set to @ReturnGHC@, we load up the required
-registers from the TSO and jump to the address at the top of the user
+The bytecode evaluator tests for heap/stack overflow and preemption
+when entering a BCO and simply returns with the BCO on top of the
 stack.
-\item If @whatNext@ is set to @EnterGHC@, we load up the required
-registers from the TSO and enter the closure pointed to by the top
-word of the stack.
-\item If @whatNext@ is set to @EnterHugs@, we enter the top thing on
-the stack, using the interpreter.
-\end{itemize}
 
-There are four cases we need to consider:
+\subsubsection{Leaving the machine code evaluator}
+\label{sect:ghc-to-hugs-switch}
 
-\begin{enumerate}
-\item A GHC thread enters a Hugs-built closure.
-\item A GHC thread returns to a Hugs-compiled return address.
-\item A Hugs thread enters a GHC-built closure.
-\item A Hugs thread returns to a Hugs-compiled return address.
-\end{enumerate}
+\paragraph{Entering a BCO}
 
-GHC-compiled modules cannot call functions in a Hugs-compiled module
-directly, because the compiler has no information about arities in the
-external module.  Therefore it must assume any top-level objects are
-CAFs, and enter their closures.
+The entry code for a BCO pushes the BCO onto the stack and returns to
+the scheduler.
 
-\ToDo{Hugs-built constructors?}
+\paragraph{Returning a constructor}
 
-We now examine the various cases one by one and describe how the
-switch happens in each situation.
+We avoid the need to test return addresses in the machine code
+evaluator by pushing a special return address on top of a pointer to
+the bytecode return continuation.  Figure~\ref{fig:hugs-return-stack}
+shows the state of the stack just before evaluating the scrutinee.
 
-\subsection{A GHC thread enters a Hugs-built closure}
-\label{sect:ghc-to-hugs-closure}
+\begin{figure}[ht]
+\begin{center}
+@
+| stack    |
++----------+
+| bco      |--> BCO
++----------+
+| HUGS_RET |
++----------+
+@
+%\input{hugs_return1.pstex_t}
+\end{center}
+\caption{Stack layout for evaluating a scrutinee}
+\label{fig:hugs-return-stack}
+\end{figure}
 
-There is three possibilities: GHC has entered a @PAP@, or it has
-entered a @AP@, or it has entered the BCO directly (for a top-level
-function closure).  @AP@s and @PAP@s are ``standard closures'' and
-so do not require us to enter the bytecode interpreter.
+This return address rearranges the stack so that the bco pointer is
+above the constructor on the stack (as shown in
+figure~\ref{fig:hugs-boxed-return}) and returns to the scheduler.
 
-The entry code for a BCO does the following:
+\begin{figure}[ht]
+\begin{center}
+@
+| stack    |
++----------+
+| con      |--> Constructor
++----------+
+| bco      |--> BCO
++----------+
+@
+%\input{hugs_return2.pstex_t}
+\end{center}
+\caption{Stack layout for entering a Hugs return address}
+\label{fig:hugs-boxed-return}
+\end{figure}
 
-\begin{itemize}
-\item Push the address of the object entered on the stack.
-\item Save the current state of the thread in its TSO.
-\item Return to the scheduler, setting @whatNext@ to @EnterHugs@.
-\end{itemize}
+\paragraph{Returning an unboxed value}
 
-BCO's for thunks and functions have the same entry conventions as
-slow entry points: they expect to find their arguments on the stac
-with unboxed arguments preceded by appropriate tags.
+We avoid the need to test return addresses in the machine code
+evaluator by pushing a special return address on top of a pointer to
+the bytecode return continuation.  This return address rearranges the
+stack so that the bco pointer is above the unboxed value (as shown in
+figure~\ref{fig:hugs-entering-unboxed-return}) and returns to the scheduler.
 
-\subsection{A GHC thread returns to a Hugs-compiled return address}
-\label{sect:ghc-to-hugs-return}
+\begin{figure}[ht]
+\begin{center}
+@
+| stack    |
++----------+
+| 1#       |
++----------+
+| I#       |
++----------+
+| bco      |--> BCO
++----------+
+@
+%\input{hugs_return2.pstex_t}
+\end{center}
+\caption{Stack layout for returning an unboxed value}
+\label{fig:hugs-entering-unboxed-return}
+\end{figure}
 
-Hugs return addresses are laid out as in Figure
-\ref{fig:hugs-return-stack}.  If GHC is returning, it will return to
-the address at the top of the stack, namely @HUGS_RET@.  The code at
-@HUGS_RET@ performs the following:
+\paragraph{Heap/Stack overflow and preemption}
 
-\begin{itemize}
-\item pushes \Arg{1} (the return value) on the stack.
-\item saves the thread state in the TSO
-\item returns to the scheduler with @whatNext@ set to @EnterHugs@.
-\end{itemize}
+\ToDo{}
 
-\noindent When Hugs runs, it will enter the return value, which will
-return using the correct Hugs convention (Section
-\ref{sect:hugs-return-convention}) to the return address underneath it
-on the stack.
+\subsection{Shared Representations}
 
-\subsection{A Hugs thread enters a GHC-compiled closure}
-\label{sect:hugs-to-ghc-closure}
+We share @AP@s, @PAP@s, constructors, indirections, selectors(?) and
+update frames.  These are described in section~\ref{sect:heap-objects}.
 
-Hugs can recognise a GHC-built closure as not being one of the
-following types of object:
+
+\section{The Storage Manager}
+
+The storage manager is responsible for managing the heap and all
+objects stored in it.  Most objects are just copied in the normal way
+but a number receive special treatment by the storage manager:
 
 \begin{itemize}
-\item A @BCO@,
-\item A @AP@,
-\item A @PAP@,
-\item An indirection, or
-\item A constructor.
-\end{itemize}
+\item
 
-When Hugs is called on to enter a GHC closure, it executes the
-following sequence of instructions:
+Indirections are shorted out.
+
+\item
+
+Weak pointers and stable pointers are treated specially.
+
+\item
+
+Thread State Objects (TSOs) and the stacks within them are treated specially.
+In particular:
 
 \begin{itemize}
-\item Push the address of the closure on the stack.
-\item Save the current state of the thread in the TSO.
-\item Return to the scheduler, with the @whatNext@ field set to
-@EnterGHC@.
+\item
+
+Update frames pointing to unreachable objects are squeezed out.
+
+\item
+
+Adjacent update frames (for different closures) are compressed to a
+single update frame pointing to a single black hole.
+
+\item
+
+If the stack contains a large amount of free space, the storage
+manager may shrink the stack.  If it shrinks the stack, it guarantees
+never to leave less than @MIN_SIZE_SHRUNKEN_STACK@ empty words on the
+stack when it does so.
+
+\ToDo{Would it be useful for the storage manager to enlarge the stack?}
+
 \end{itemize}
 
-\subsection{A Hugs thread returns to a GHC-compiled return address}
-\label{sect:hugs-to-ghc-return}
+\item
 
-When Hugs encounters a return address on the stack that is not
-@HUGS_RET@, it knows that a world-switch is required.  At this point
-the stack contains a pointer to the return value, followed by the GHC
-return address.  The following sequence is then performed:
+Very large objects (eg large arrays and TSOs) are not moved if
+possible.
 
-\begin{itemize}
-\item save the state of the thread in the TSO.
-\item return to the scheduler, setting @whatNext@ to @EnterGHC@.
 \end{itemize}
 
-The first thing that GHC will do is enter the object on the top of the
-stack, which is a pointer to the return value.  This value will then
-return itself to the return address using the GHC return convention.
+
+\section{The Compilers}
+
+Need to describe interface files, format of bytecode files, symbols
+defined by machine code files.
+
+\subsection{Interface Files}
+
+Here's an example - but I don't know the grammar - ADR.
+@
+_interface_ Main 1
+_exports_
+Main main ;
+_declarations_
+1 main _:_ IOBase.IO PrelBase.();;
+@
+
+\subsection{Bytecode files}
+
+(All that matters here is what the loader sees.)
+
+\subsection{Machine code files}
+
+(Again, all that matters is what the loader sees.)
+
+
+\subsection{Bytecode files}
+
+(All that matters here is what the loader sees.)
+
+\subsection{Machine code files}
+
+(Again, all that matters is what the loader sees.)
+
 
 \section{The Loader}
 
@@ -647,512 +826,1458 @@ hit for doing this.
 
 \end{description}
 
-\section{Compiled Execution}
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\part{Internal details}
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
-This section describes the framework in which compiled code evaluates
-expressions.  Only at certain points will compiled code need to be
-able to talk to the interpreted world; these are discussed in Section
-\ref{sect:switching-worlds}.
+This part is concerned with the internal details of the components
+described in the previous part.
 
-\subsection{Calling conventions}
+The major components of the system are:
+\begin{itemize}
+\item The scheduler
+\item The storage manager
+\item The evaluators
+\item The loader
+\item The compilers
+\end{itemize}
 
-\subsubsection{The call/return registers}
+\section{The Scheduler}
 
-One of the problems in designing a virtual machine is that we want it
-abstract away from tedious machine details but still reveal enough of
-the underlying hardware that we can make sensible decisions about code
-generation.  A major problem area is the use of registers in
-call/return conventions.  On a machine with lots of registers, it's
-cheaper to pass arguments and results in registers than to pass them
-on the stack.  On a machine with very few registers, it's cheaper to
-pass arguments and results on the stack than to use ``virtual
-registers'' in memory.  We therefore use a hybrid system: the first
-$n$ arguments or results are passed in registers; and the remaining
-arguments or results are passed on the stack.  For register-poor
-architectures, it is important that we allow $n=0$.
+\section{The Storage Manager}
+\label{sect:storage-manager-internals}
 
-We'll label the arguments and results \Arg{1} \ldots \Arg{m} --- with
-the understanding that \Arg{1} \ldots \Arg{n} are in registers and
-\Arg{n+1} \ldots \Arg{m} are on top of the stack.
+\ToDo{Fix this picture}
 
-Note that the mapping of arguments \Arg{1} \ldots \Arg{n} to machine
-registers depends on the {\em kinds} of the arguments.  For example,
-if the first argument is a Float, we might pass it in a different
-register from if it is an Int.  In fact, we might find that a given
-architecture lets us pass varying numbers of arguments according to
-their types.  For example, if a CPU has 2 Int registers and 2 Float
-registers then we could pass between 2 and 4 arguments in machine
-registers --- depending on whether they all have the same kind or they
-have different kinds.
+\begin{figure}
+\begin{center}
+\input{closure}
+\end{center}
+\caption{A closure}
+\label{fig:closure}
+\end{figure}
 
-\subsubsection{Entering closures}
-\label{sect:entering-closures}
+Every {\em heap object} is a contiguous block
+of memory, consisting of a fixed-format {\em header} followed
+by zero or more {\em data words}.
 
-To evaluate a closure we jump to the entry code for the closure
-passing a pointer to the closure in \Arg{1} so that the entry code can
-access its environment.
+\ToDo{I absolutely do not believe that every heap object has a header
+like this - ADR.  I believe that they all have an info pointer but I
+see no readon why stack objects and unpointed heap objects would have
+an entry count since this will always be zero.}
 
-\subsubsection{Function call}
+The header consists of the following fields:
+\begin{itemize}
+\item A one-word {\em info pointer}, which points to
+the object's static {\em info table}.
+\item Zero or more {\em admin words} that support
+\begin{itemize}
+\item Profiling (notably a {\em cost centre} word).
+  \note{We could possibly omit the cost centre word from some 
+  administrative objects.}
+\item Parallelism (e.g. GranSim keeps the object's global address here,
+though GUM keeps a separate hash table).
+\item Statistics (e.g. a word to track how many times a thunk is entered.).
 
-The function-call mechanism is obviously crucial.  There are five different
-cases to consider:
-\begin{enumerate}
+We add a Ticky word to the fixed-header part of closures.  This is
+used to indicate if a closure has been updated but not yet entered. It
+is set when the closure is updated and cleared when subsequently
+entered.
 
-\item {\em Known combinator (function with no free variables) and enough arguments.}  
+NB: It is {\em not} an ``entry count'', it is an
+``entries-after-update count.''  The commoning up of @CONST@,
+@CHARLIKE@ and @INTLIKE@ closures is turned off(?) if this is
+required. This has only been done for 2s collection.
 
-A fast call can be made: push excess arguments onto stack and jump to
-function's {\em fast entry point} passing arguments in \Arg{1} \ldots
-\Arg{m}.  
+\end{itemize}
+\end{itemize}
 
-The {\em fast entry point} is only called with exactly the right
-number of arguments (in \Arg{1} \ldots \Arg{m}) so it can instantly
-start doing useful work without first testing whether it has enough
-registers or having to pop them off the stack first.
+Most of the RTS is completely insensitive to the number of admin words.
+The total size of the fixed header is @FIXED_HS@.
 
-\item {\em Known combinator and insufficient arguments.}
+Many heap objects contain fields allowing them to be inserted onto lists
+during evaluation or during garbage collection. The lists required by
+the evaluator and storage manager are as follows.
 
-A slow call can be made: push all arguments onto stack and jump to
-function's {\em slow entry point}.
+\begin{itemize}
+\item 2 lists of threads: runnable threads and sleeping threads.
 
-Any unpointed arguments which are pushed on the stack must be tagged.
-This means pushing an extra word on the stack below the unpointed
-words, containing the number of unpointed words above it.
+\item The {\em static object list} is a list of all statically
+allocated objects which might contain pointers into the heap.
+(Section~\ref{sect:static-objects}.)
 
-%Todo: forward ref about tagging?
-%Todo: picture?
+\item The {\em updated thunk list} is a list of all thunks in the old
+generation which have been updated with an indirection.  
+(Section~\ref{sect:IND_OLDGEN}.)
 
-The {\em slow entry point} might be called with insufficient arguments
-and so it must test whether there are enough arguments on the stack.
-This {\em argument satisfaction check} consists of checking that
-@Su-Sp@ is big enough to hold all the arguments (including any tags).
+\item The {\em mutables list} is a list of all other objects in the
+old generation which might contain pointers into the new generation.
+Most of the object on this list are ``mutable.''
+(Section~\ref{sect:mutables}.)
 
-\begin{itemize} 
+\item The {\em Foreign Object list} is a list of all foreign objects
+ which have not yet been deallocated. (Section~\ref{sect:FOREIGN}.)
 
-\item If the argument satisfaction check fails, it is because there is
-one or more update frames on the stack before the rest of the
-arguments that the function needs.  In this case, we construct a PAP
-(partial application, section~\ref{sect:PAP}) containing the arguments
-which are on the stack.  The PAP construction code will return to the
-update frame with the address of the PAP in \Arg{1}.
+\item The {\em Spark pool} is a doubly(?) linked list of Spark objects
+maintained by the parallel system.  (Section~\ref{sect:SPARK}.)
 
-\item If the argument satisfaction check succeeds, we jump to the fast
-entry point with the arguments in \Arg{1} \ldots \Arg{arity}.
+\item The {\em Blocked Fetch list} (or
+lists?). (Section~\ref{sect:BLOCKED_FETCH}.)
 
-If the fast entry point expects to receive some of \Arg{i} on the
-stack, we can reduce the amount of movement required by making the
-stack layout for the fast entry point look like the stack layout for
-the slow entry point.  Since the slow entry point is entered with the
-first argument on the top of the stack and with tags in front of any
-unpointed arguments, this means that if \Arg{i} is unpointed, there
-should be space below it for a tag and that the highest numbered
-argument should be passed on the top of the stack.
+\item For each thread, there is a list of all update frames on the
+stack.  (Section~\ref{sect:data-updates}.)
 
-We usually arrange that the fast entry point is placed immediately
-after the slow entry point --- so we can just ``fall through'' to the
-fast entry point without performing a jump.
 
 \end{itemize}
 
+\ToDo{The links for these fields are usually inserted immediately
+after the fixed header except ...}
 
-\item {\em Known function closure (function with free variables) and enough arguments.}
+\subsection{Info Tables}
 
-A fast call can be made: push excess arguments onto stack and jump to
-function's {\em fast entry point} passing a pointer to closure in
-\Arg{1} and arguments in \Arg{2} \ldots \Arg{m+1}.
+An {\em info table} is a contiguous block of memory, {\em laid out
+backwards}.  That is, the first field in the list that follows
+occupies the highest memory address, and the successive fields occupy
+successive decreasing memory addresses.
 
-Like the fast entry point for a combinator, the fast entry point for a
-closure is only called with appropriate values in \Arg{1} \ldots
-\Arg{m+1} so we can start work straight away.  The pointer to the
-closure is used to access the free variables of the closure.
+\begin{center}
+\begin{tabular}{|c|}
+   \hline Parallelism Info 
+\\ \hline Profile Info 
+\\ \hline Debug Info 
+\\ \hline Tag / Static reference table
+\\ \hline Storage manager layout info
+\\ \hline Closure type 
+\\ \hline entry code
+\\       \vdots
+\end{tabular}
+\end{center}
+An info table has the following contents (working backwards in memory
+addresses):
+\begin{itemize}
+\item The {\em entry code} for the closure.
+This code appears literally as the (large) last entry in the
+info table, immediately preceded by the rest of the info table.
+An {\em info pointer} always points to the first byte of the entry code.
 
+\item A one-word {\em closure type field}, @INFO_TYPE@, identifies what kind
+of closure the object is.  The various types of closure are described
+in Section~\ref{sect:closures}.
+In some configurations, some useful properties of 
+closures (is it a HNF?  can it be sparked?)
+are represented as high-order bits so they can be tested quickly.
 
-\item {\em Known function closure and insufficient arguments.}
+\item A single pointer or word --- the {\em storage manager info field},
+@INFO_SM@, contains auxiliary information describing the closure's
+precise layout, for the benefit of the garbage collector and the code
+that stuffs graph into packets for transmission over the network.
 
-A slow call can be made: push all arguments onto stack and jump to the
-closure's slow entry point passing a pointer to the closure in \Arg{1}.
+\item A one-word {\em Tag/Static Reference Table} field, @INFO_SRT@.
+For data constructors, this field contains the constructor tag, in the
+range $0..n-1$ where $n$ is the number of constructors.  For all other
+objects it contains a pointer to a table which enables the garbage
+collector to identify all accessible code and CAFs.  They are fully
+described in Section~\ref{sect:srt}.
 
-Again, the slow entry point performs an argument satisfaction check
-and either builds a PAP or pops the arguments off the stack into
-\Arg{2} \ldots \Arg{m+1} and jumps to the fast entry point.
+\item {\em Profiling info\/}
 
+\ToDo{The profiling info is completely bogus.  I've not deleted it
+from the document but I've commented it all out.}
 
-\item {\em Unknown function closure, thunk or constructor.}
+% change to \iftrue to uncomment this section
+\iffalse
 
-Sometimes, the function being called is not statically identifiable.
-Consider, for example, the @compose@ function:
-@
-  compose f g x = f (g x)
-@
-Since @f@ and @g@ are passed as arguments to @compose@, the latter has
-to make a heap call.  In a heap call the arguments are pushed onto the
-stack, and the closure bound to the function is entered.  In the
-example, a thunk for @(g x)@ will be allocated, (a pointer to it)
-pushed on the stack, and the closure bound to @f@ will be
-entered. That is, we will jump to @f@s entry point passing @f@ in
-\Arg{1}.  If \Arg{1} is passed on the stack, it is pushed on top of
-the thunk for @(g x)@.
+Closure category records are attached to the info table of the
+closure. They are declared with the info table. We put pointers to
+these ClCat things in info tables.  We need these ClCat things because
+they are mutable, whereas info tables are immutable.  Hashing will map
+similar categories to the same hash value allowing statistics to be
+grouped by closure category.
 
-The {\em entry code} for an updateable thunk (which must have arity 0)
-pushes an update frame on the stack and starts executing the body of
-the closure --- using \Arg{1} to access any free variables.  This is
-described in more detail in section~\ref{sect:data-updates}.
+Cost Centres and Closure Categories are hashed to provide indexes
+against which arbitrary information can be stored. These indexes are
+memoised in the appropriate cost centre or category record and
+subsequent hashes avoided by the index routine (it simply returns the
+memoised index).
 
-The {\em entry code} for a non-updateable closure is just the
-closure's slow entry point.
+There are different features which can be hashed allowing information
+to be stored for different groupings. Cost centres have the cost
+centre recorded (using the pointer), module and group. Closure
+categories have the closure description and the type
+description. Records with the same feature will be hashed to the same
+index value.
 
-\end{enumerate}
+The initialisation routines, @init_index_<feature>@, allocate a hash
+table in which the cost centre / category records are stored. The
+lower bound for the table size is taken from @max_<feature>_no@. They
+return the actual table size used (the next power of 2). Unused
+locations in the hash table are indicated by a 0 entry. Successive
+@init_index_<feature>@ calls just return the actual table size.
 
-In addition to the above considerations, if there are \emph{too many}
-arguments then the extra arguments are simply pushed on the stack with
-appropriate tags.
+Calls to @index_<feature>@ will insert the cost centre / category
+record in the @<feature>@ hash table, if not already inserted. The hash
+index is memoised in the record and returned. 
+
+CURRENTLY ONLY ONE MEMOISATION SLOT IS AVILABLE IN EACH RECORD SO
+HASHING CAN ONLY BE DONE ON ONE FEATURE FOR EACH RECORD. This can be
+easily relaxed at the expense of extra memoisation space or continued
+rehashing.
+
+The initialisation routines must be called before initialisation of
+the stacks and heap as they require to allocate storage. It is also
+expected that the caller may want to allocate additional storage in
+which to store profiling information based on the return table size
+value(s).
+
+\begin{center}
+\begin{tabular}{|l|}
+   \hline Hash Index
+\\ \hline Selected
+\\ \hline Kind
+\\ \hline Description String
+\\ \hline Type String
+\\ \hline
+\end{tabular}
+\end{center}
 
-To summarise, a closure's standard (slow) entry point performs the following:
 \begin{description}
-\item[Argument satisfaction check.] (function closure only)
-\item[Stack overflow check.]
-\item[Heap overflow check.]
-\item[Copy free variables out of closure.] %Todo: why?
-\item[Eager black holing.] (updateable thunk only) %Todo: forward ref.
-\item[Push update frame.]
-\item[Evaluate body of closure.]
+\item[Hash Index] Memoised copy
+\item[Selected] 
+  Is this category selected (-1 == not memoised, selected? 0 or 1)
+\item[Kind]
+One of the following values (defined in CostCentre.lh):
+
+\begin{description}
+\item[@CON_K@]
+A constructor.
+\item[@FN_K@]
+A literal function.
+\item[@PAP_K@]
+A partial application.
+\item[@THK_K@]
+A thunk, or suspension.
+\item[@BH_K@]
+A black hole.
+\item[@ARR_K@]
+An array.
+\item[@ForeignObj_K@]
+A Foreign object (non-Haskell heap resident).
+\item[@SPT_K@]
+The Stable Pointer table.  (There should only be one of these but it
+represents a form of weak space leak since it can't shrink to meet
+non-demand so it may be worth watching separately? ADR)
+\item[@INTERNAL_KIND@]
+Something internal to the runtime system.
 \end{description}
 
 
-\subsection{Case expressions and return conventions}
-\label{sect:return-conventions}
+\item[Description] Source derived string detailing closure description.
+\item[Type] Source derived string detailing closure type.
+\end{description}
 
-The {\em evaluation} of a thunk is always initiated by
-a @case@ expression.  For example:
-@
-  case x of (a,b) -> E
-@
+\fi % end of commented out stuff
 
-The code for a @case@ expression looks like this:
+\item {\em Parallelism info\/}
+\ToDo{}
+
+\item {\em Debugging info\/}
+\ToDo{}
 
-\begin{itemize}
-\item Push the free variables of the branches on the stack (fv(@E@) in
-this case).
-\item  Push a \emph{return address} on the stack.
-\item  Evaluate the scrutinee (@x@ in this case).
 \end{itemize}
 
-Once evaluation of the scrutinee is complete, execution resumes at the
-return address, which points to the code for the expression @E@.
 
-When execution resumes at the return point, there must be some {\em
-return convention} that defines where the components of the pair, @a@
-and @b@, can be found.  The return convention varies according to the
-type of the scrutinee @x@:
+%-----------------------------------------------------------------------------
+\subsection{Kinds of Heap Object}
+\label{sect:closures}
 
+Heap objects can be classified in several ways, but one useful one is
+this:
 \begin{itemize}
+\item 
+{\em Static closures} occupy fixed, statically-allocated memory
+locations, with globally known addresses.
 
 \item 
+{\em Dynamic closures} are individually allocated in the heap.
 
-(A space for) the return address is left on the top of the stack.
-Leaving the return address on the stack ensures that the top of the
-stack contains a valid activation record
-(section~\ref{sect:activation-records}) --- should a garbage collection
-be required.
+\item 
+{\em Stack closures} are closures allocated within a thread's stack
+(which is itself a heap object).  Unlike other closures, there are
+never any pointers to stack closures.  Stack closures are discussed in
+Section~\ref{sect:stacks}.
 
-\item If @x@ has a boxed type (e.g.~a data constructor or a function),
-a pointer to @x@ is returned in \Arg{1}.
+\end{itemize}
+A second useful classification is this:
+\begin{itemize}
+\item 
+{\em Executive objects}, such as thunks and data constructors,
+participate directly in a program's execution.  They can be subdivided into
+three kinds of objects according to their type:
+\begin{itemize}
+\item 
+{\em Pointed objects}, represent values of a {\em pointed} type
+(<.pointed types launchbury.>) --i.e.~a type that includes $\bottom$ such as @Int@ or @Int# -> Int#@.
 
-\ToDo{Warn that components of E should be extracted as soon as
-possible to avoid a space leak.}
+\item {\em Unpointed objects}, represent values of a {\em unpointed} type --i.e.~a type that does not include $\bottom$ such as @Int#@ or @Array#@.
 
-\item If @x@ is an unboxed type (e.g.~@Int#@ or @Float#@), @x@ is
-returned in \Arg{1}
+\item {\em Activation frames}, represent ``continuations''.  They are
+always stored on the stack and are never pointed to by heap objects or
+passed as arguments.  \note{It's not clear if this will still be true
+once we support speculative evaluation.}
 
-\item If @x@ is an unboxed tuple constructor, the components of @x@
-are returned in \Arg{1} \ldots \Arg{n} but no object is constructed in
-the heap.  
+\end{itemize}
 
-When passing an unboxed tuple to a function, the components are
-flattened out and passed in \Arg{1} \ldots \Arg{n} as usual.
+\item {\em Administrative objects}, such as stack objects and thread
+state objects, do not represent values in the original program.
+\end{itemize}
+
+Only pointed objects can be entered.  All pointed objects share a
+common header format: the ``pointed header''; while all unpointed
+objects share a common header format: the ``unpointed header''.
+\ToDo{Describe the difference and update the diagrams to mention
+an appropriate header type.}
+
+This section enumerates all the kinds of heap objects in the system.
+Each is identified by a distinct @INFO_TYPE@ tag in its info table.
+
+\begin{tabular}{|l|l|l|l|l|l|l|l|l|l|l|}
+\hline
+
+closure kind          & Section \\
+		      
+\hline                          
+{\em Pointed} \\      
+\hline 		      
+		      
+@CONSTR@              & \ref{sect:CONSTR}    \\
+@CONSTR_STATIC@       & \ref{sect:CONSTR}    \\
+@CONSTR_STATIC_NOCAF@ & \ref{sect:CONSTR}    \\
+		      
+@FUN@                 & \ref{sect:FUN}       \\
+@FUN_STATIC@          & \ref{sect:FUN}       \\
+		      
+@THUNK@               & \ref{sect:THUNK}     \\
+@THUNK_STATIC@        & \ref{sect:THUNK}     \\
+@THUNK_SELECTOR@      & \ref{sect:THUNK_SEL} \\
+		      
+@BCO@		      & \ref{sect:BCO}       \\
+@BCO_CAF@	      & \ref{sect:BCO}       \\
+		      
+@AP@	      	      & \ref{sect:AP}   	 \\
+@PAP@                 & \ref{sect:PAP}       \\
+		      
+@IND@                 & \ref{sect:IND}       \\
+@IND_OLDGEN@          & \ref{sect:IND}       \\
+@IND_PERM@            & \ref{sect:IND}       \\
+@IND_OLDGEN_PERM@     & \ref{sect:IND}       \\
+@IND_STATIC@          & \ref{sect:IND}       \\
+		      
+\hline		      
+{\em Unpointed} \\    
+\hline		      
+		      		      
+@ARR_WORDS@           & \ref{sect:ARR_WORDS1},\ref{sect:ARR_WORDS2} \\
+@ARR_PTRS@            & \ref{sect:ARR_PTRS}  \\
+@MUTVAR@              & \ref{sect:MUTVAR}    \\
+@MUTARR_PTRS@         & \ref{sect:MUTARR_PTRS} \\
+@MUTARR_PTRS_FROZEN@  & \ref{sect:MUTARR_PTRS_FROZEN} \\
+		      
+@FOREIGN@             & \ref{sect:FOREIGN}   \\
+		      
+@BH@                  & \ref{sect:BH}        \\
+@MVAR@ 		      & \ref{sect:MVAR}      \\
+@IVAR@ 		      & \ref{sect:IVAR}      \\
+@FETCHME@             & \ref{sect:FETCHME}   \\
+\hline
+\end{tabular}
+
+Activation frames do not live (directly) on the heap --- but they have
+a similar organisation.
+
+\begin{tabular}{|l|l|}\hline
+closure kind		& Section			\\ \hline
+@RET_SMALL@ 		& \ref{sect:activation-records}	\\
+@RET_VEC_SMALL@ 	& \ref{sect:activation-records}	\\
+@RET_BIG@		& \ref{sect:activation-records}	\\
+@RET_VEC_BIG@		& \ref{sect:activation-records}	\\
+@UPDATE_FRAME@ 		& \ref{sect:activation-records}	\\
+\hline
+\end{tabular}
+
+There are also a number of administrative objects.
+
+\begin{tabular}{|l|l|}\hline
+closure kind		& Section 			\\ \hline
+@TSO@                   & \ref{sect:TSO} 		\\
+@STACK_OBJECT@          & \ref{sect:STACK_OBJECT} 	\\
+@STABLEPTR_TABLE@       & \ref{sect:STABLEPTR_TABLE} 	\\
+@SPARK_OBJECT@          & \ref{sect:SPARK}		\\
+@BLOCKED_FETCH@       	& \ref{sect:BLOCKED_FETCH} 	\\
+\hline
+\end{tabular}
+
+\ToDo{I guess the parallel system has something like a stable ptr
+table.  Is there any opportunity for sharing code/data structures
+here?}
+
+
+\subsection{Predicates}
+
+\ToDo{The following is a first attempt at defining a useful set of
+predicates.  Some (such as @isWHNF@ and @isSparkable@) may need their
+definitions tweaked a little.}
+
+The runtime system sometimes needs to be able to distinguish objects
+according to their properties: is the object updateable? is it in weak
+head normal form? etc.  These questions can be answered by examining
+the @INFO_TYPE@ field of the object's info table.  
+
+We define the following predicates to detect families of related
+info types.  They are mutually exclusive and exhaustive.
+
+\begin{itemize}
+\item @isCONSTR@ is true for @CONSTR@s.
+\item @isFUN@ is true for @FUN@s.
+\item @isTHUNK@ is true for @THUNK@s.
+\item @isBCO@ is true for @BCO@s.
+\item @isAP@ is true for @AP@s.
+\item @isPAP@ is true for @PAP@s.
+\item @isINDIRECTION@ is true for indirection objects. 
+\item @isBH@ is true for black holes.
+\item @isFOREIGN_OBJECT@ is true for foreign objects.
+\item @isARRAY@ is true for array objects.
+\item @isMVAR@ is true for @MVAR@s.
+\item @isIVAR@ is true for @IVAR@s.
+\item @isFETCHME@ is true for @FETCHME@s.
+\item @isSLOP@ is true for slop objects.
+\item @isRET_ADDR@ is true for return addresses.
+\item @isUPD_ADDR@ is true for update frames.
+\item @isTSO@ is true for @TSO@s.
+\item @isSTABLE_PTR_TABLE@ is true for the stable pointer table.
+\item @isSPARK_OBJECT@ is true for spark objects.
+\item @isBLOCKED_FETCH@ is true for blocked fetch objects.
+\item @isINVALID_INFOTYPE@ is true for all other info types.
 
 \end{itemize}
 
-\subsection{Vectored Returns}
+The following predicates detect other interesting properties:
 
-Many algebraic data types have more than one constructor.  For
-example, the @Maybe@ type is defined like this:
+\begin{itemize}
+
+\item @isPOINTED@ is true if an object has a pointed type.
+
+If an object is pointed, the following predicates may be true
+(otherwise they are false).  @isWHNF@ and @isUPDATEABLE@ are
+mutually exclusive.
+
+\begin{itemize} 
+\item @isWHNF@ is true if the object is in Weak Head Normal Form.  
+Note that unpointed objects are (arbitrarily) not considered to be in WHNF.
+
+@isWHNF@ is true for @PAP@s, @CONSTR@s, @FUN@s and some @BCO@s.
+
+\ToDo{Need to distinguish between whnf BCOs and non-whnf BCOs in their
+@INFO_TYPE@}
+
+\item @isBOTTOM@ is true if the object is known to be $\bot$.  It is
+true of @BH@s.  \note{I suspect we'll want to add other kinds of
+infotype which are known to be bottom later.}
+
+\item @isUPDATEABLE@ is true if the object may be overwritten with an
+ indirection object.
+
+@isUPDATEABLE@ is true for @THUNK@s, @AP@s and @BH@s.
+
+\end{itemize}
+
+It is possible for a pointed object to be neither updatable nor in
+WHNF.  For example, indirections.
+
+\item @isUNPOINTED@ is true if an object has an unpointed type.
+All such objects are boxed since only boxed objects have info pointers.
+
+It is true for @ARR_WORDS@, @ARR_PTRS@, @MUTVAR@, @MUTARR_PTRS@,
+@MUTARR_PTRS_FROZEN@, @FOREIGN@ objects, @MVAR@s and @IVAR@s.
+
+\item @isACTIVATION_FRAME@ is true for activation frames of all sorts.
+
+It is true for return addresses and update frames.
+\begin{itemize}
+\item @isVECTORED_RETADDR@ is true for vectored return addresses.
+\item @isDIRECT_RETADDR@ is true for direct return addresses.
+\end{itemize}
+
+\item @isADMINISTRATIVE@ is true for administrative objects:
+@TSO@s, the stable pointer table, spark objects and blocked fetches.
+
+\end{itemize}
+
+\begin{itemize}
+
+\item @isSTATIC@ is true for any statically allocated closure.
+
+\item @isMUTABLE@ is true for objects with mutable pointer fields:
+  @MUT_ARR@s, @MUTVAR@s, @MVAR@s and @IVAR@s.
+
+\item @isSparkable@ is true if the object can (and should) be sparked.
+It is true of updateable objects which are not in WHNF with the
+exception of @THUNK_SELECTOR@s and black holes.
+
+\end{itemize}
+
+As a minor optimisation, we might use the top bits of the @INFO_TYPE@
+field to ``cache'' the answers to some of these predicates.
+
+An indirection either points to HNF (post update); or is result of
+overwriting a FetchMe, in which case the thing fetched is either
+under evaluation (BH), or by now an HNF.  Thus, indirections get NoSpark flag.
+
+
+\iffalse
 @
-  data Maybe a = Nothing | Just a
+#define _NF			0x0001  /* Normal form  */
+#define _NS			0x0002	/* Don't spark  */
+#define _ST			0x0004	/* Is static    */
+#define _MU 	    	    	0x0008  /* Is mutable   */
+#define _UP			0x0010	/* Is updatable (but not mutable) */
+#define _BM			0x0020	/* Is a "primitive" array */
+#define _BH			0x0040	/* Is a black hole */
+#define _IN			0x0080	/* Is an indirection */
+#define _TH			0x0100	/* Is a thunk */
+
+
+
+SPEC	
+SPEC_N		SPEC | _NF | _NS
+SPEC_S		SPEC | _TH
+SPEC_U		SPEC | _UP | _TH
+		
+GEN	
+GEN_N		GEN | _NF | _NS
+GEN_S		GEN | _TH
+GEN_U		GEN | _UP | _TH
+		
+DYN	        _NF | _NS
+TUPLE		_NF | _NS | _BM
+DATA		_NF | _NS | _BM
+MUTUPLE		_NF | _NS | _MU | _BM
+IMMUTUPLE	_NF | _NS | _BM
+STATIC		_NS | _ST
+CONST		_NF | _NS
+CHARLIKE	_NF | _NS
+INTLIKE		_NF | _NS
+
+BH		_NS | _BH
+BH_N		BH
+BH_U		BH | _UP
+		
+BQ		_NS | _MU | _BH
+IND		_NS | _IN
+CAF		_NF | _NS | _ST | _IN
+
+FM		
+FETCHME		FM | _MU
+FMBQ		FM | _MU | _BH
+
+TSO		_MU
+
+STKO	
+STKO_DYNAMIC	STKO | _MU
+STKO_STATIC	STKO | _ST
+		
+SPEC_RBH	_NS | _MU | _BH
+GEN_RBH		_NS | _MU | _BH
+BF		_NS | _MU | _BH
+INTERNAL	
+
 @
-How does the return convention encode which of the two constructors is
-being returned?  A @case@ expression scrutinising a value of @Maybe@
-type would look like this: 
+\fi
+
+
+\subsection{Pointed Objects}
+
+All pointed objects can be entered.
+
+\subsubsection{Function closures}\label{sect:FUN}
+
+Function closures represent lambda abstractions.  For example,
+consider the top-level declaration:
 @
-  case E of 
-    Nothing -> ...
-    Just a  -> ...
+  f = \x -> let g = \y -> x+y
+	    in g x
 @
-Rather than pushing a return address before evaluating the scrutinee,
-@E@, the @case@ expression pushes (a pointer to) a {\em return
-vector}, a static table consisting of two code pointers: one for the
-@Just@ alternative, and one for the @Nothing@ alternative.  
+Both @f@ and @g@ are represented by function closures.  The closure
+for @f@ is {\em static} while that for @g@ is {\em dynamic}.
+
+The layout of a function closure is as follows:
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+{\em Fixed header}  & {\em Pointers} & {\em Non-pointers} \\ \hline
+\end{tabular}
+\end{center}
+The data words (pointers and non-pointers) are the free variables of
+the function closure.  
+The number of pointers
+and number of non-pointers are stored in the @INFO_SM@ word, in the least significant
+and most significant half-word respectively.
 
+There are several different sorts of function closure, distinguished
+by their @INFO_TYPE@ field:
 \begin{itemize}
+\item @FUN@: a vanilla, dynamically allocated on the heap. 
 
-\item
+\item $@FUN_@p@_@np$: to speed up garbage collection a number of
+specialised forms of @FUN@ are provided, for particular $(p,np)$ pairs,
+where $p$ is the number of pointers and $np$ the number of non-pointers.
 
-The constructor @Nothing@ returns by jumping to the first item in the
-return vector with a pointer to a (statically built) Nothing closure
-in \Arg{1}.  
+\item @FUN_STATIC@.  Top-level, static, function closures (such as
+@f@ above) have a different
+layout than dynamic ones:
+\begin{center}
+\begin{tabular}{|l|l|l|}\hline
+{\em Fixed header}  & {\em Static object link} \\ \hline
+\end{tabular}
+\end{center}
+Static function closures have no free variables.  (However they may refer to other 
+static closures; these references are recorded in the function closure's SRT.)
+They have one field that is not present in dynamic closures, the {\em static object
+link} field.  This is used by the garbage collector in the same way that to-space
+is, to gather closures that have been determined to be live but that have not yet
+been scavenged.
+\note{Static function closures that have no static references, and hence
+a null SRT pointer, don't need the static object link field.  Is it worth
+taking advantage of this?  See @CONSTR_STATIC_NOCAF@.}
+\end{itemize}
 
-It might seem that we could avoid loading \Arg{1} in this case since the
-first item in the return vector will know that @Nothing@ was returned
-(and can easily access the Nothing closure in the (unlikely) event
-that it needs it.  The only reason we load \Arg{1} is in case we have to
-perform an update (section~\ref{sect:data-updates}).
+Each lambda abstraction, $f$, in the STG program has its own private info table.
+The following labels are relevant:
+\begin{itemize}
+\item $f$@_info@  is $f$'s info table.
+\item $f$@_entry@ is $f$'s slow entry point (i.e. the entry code of its
+info table; so it will label the same byte as $f$@_info@).
+\item $f@_fast_@k$ is $f$'s fast entry point.  $k$ is the number of arguments
+$f$ takes; encoding this number in the fast-entry label occasionally catches some nasty
+code-generation errors.
+\end{itemize}
 
-\item 
+\subsubsection{Data Constructors}\label{sect:CONSTR}
+
+Data-constructor closures represent values constructed with
+algebraic data type constructors.
+The general layout of data constructors is the same as that for function
+closures.  That is
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+{\em Fixed header}  & {\em Pointers} & {\em Non-pointers} \\ \hline
+\end{tabular}
+\end{center}
+
+The SRT pointer in a data constructor's info table is used for the
+constructor tag, since a constructor never has any static references.
+
+There are several different sorts of constructor:
+\begin{itemize}
+\item @CONSTR@: a vanilla, dynamically allocated constructor.
+\item @CONSTR_@$p$@_@$np$: just like $@FUN_@p@_@np$.
+\item @CONSTR_INTLIKE@.
+A dynamically-allocated heap object that looks just like an @Int@.  The 
+garbage collector checks to see if it can common it up with one of a fixed
+set of static int-like closures, thus getting it out of the dynamic heap
+altogether.
+
+\item @CONSTR_CHARLIKE@:  same deal, but for @Char@.
+
+\item @CONSTR_STATIC@ is similar to @FUN_STATIC@, with the complication that
+the layout of the constructor must mimic that of a dynamic constructor,
+because a static constructor might be returned to some code that unpacks it.
+So its layout is like this:
+\begin{center}
+\begin{tabular}{|l|l|l|l|l|}\hline
+{\em Fixed header}  & {\em Pointers} & {\em Non-pointers} & {\em Static object link}\\ \hline
+\end{tabular}
+\end{center}
+The static object link, at the end of the closure, serves the same purpose
+as that for @FUN_STATIC@.  The pointers in the static constructor can point
+only to other static closures.
+
+The static object link occurs last in the closure so that static
+constructors can store their data fields in exactly the same place as
+dynamic constructors.
+
+\item @CONSTR_STATIC_NOCAF@.  A statically allocated data constructor
+that guarantees not to point (directly or indirectly) to any CAF
+(section~\ref{sect:CAF}).  This means it does not need a static object
+link field.  Since we expect that there might be quite a lot of static
+constructors this optimisation makes sense.  Furthermore, the @NOCAF@
+tag allows the compiler to indicate that no CAFs can be reached
+anywhere {\em even indirectly}.
 
-The constructor @Just@ returns by jumping to the second element of the
-return vector with a pointer to the closure in \Arg{1}.  
 
 \end{itemize}
 
-In this way no test need be made to see which constructor returns;
-instead, execution resumes immediately in the appropriate branch of
-the @case@.
+For each data constructor $Con$, two
+info tables are generated:
+\begin{itemize}
+\item $Con$@_info@ labels $Con$'s dynamic info table, 
+shared by all dynamic instances of the constructor.
+\item $Con$@_static@ labels $Con$'s static info table, 
+shared by all static instances of the constructor.
+\end{itemize}
 
-\subsection{Direct Returns}
+\subsubsection{Thunks}
+\label{sect:THUNK}
+\label{sect:THUNK_SEL}
 
-When a datatype has a large number of constructors, it may be
-inappropriate to use vectored returns.  The vector tables may be
-large and sparse, and it may be better to identify the constructor
-using a test-and-branch sequence on the tag.  For this reason, we
-provide an alternative return convention, called a \emph{direct
-return}.
+A thunk represents an expression that is not obviously in head normal 
+form.  For example, consider the following top-level definitions:
+@
+  range = between 1 10
+  f = \x -> let ys = take x range
+	    in sum ys
+@
+Here the right-hand sides of @range@ and @ys@ are both thunks; the former
+is static while the latter is dynamic.
 
-In a direct return, the return address pushed on the stack really is a
-code pointer.  The returning code loads a pointer to the closure being
-returned in \Arg{1} as usual, and also loads the tag into \Arg{2}.
-The code at the return address will test the tag and jump to the
-appropriate code for the case branch.
+The layout of a thunk is the same as that for a function closure,
+except that it may have some words of ``slop'' at the end to make sure
+that it has 
+at least @MIN_UPD_PAYLOAD@ words in addition to its
+fixed header.
+\begin{center}
+\begin{tabular}{|l|l|l|l|l|}\hline
+{\em Fixed header}  & {\em Pointers} & {\em Non-pointers} & {\em Slop} \\ \hline
+\end{tabular}
+\end{center}
+The @INFO_SM@ word contains the same information as for function
+closures; that is, number of pointers and number of non-pointers (excluding slop).
 
-The choice of whether to use a vectored return or a direct return is
-made on a type-by-type basis --- up to a certain maximum number of
-constructors imposed by the update mechanism
-(section~\ref{sect:data-updates}).
+A thunk differs from a function closure in that it can be updated.
 
-Single-constructor data types also use direct returns, although in
-that case there is no need to return a tag in \Arg{2}.
+There are several forms of thunk:
+\begin{itemize}
+\item @THUNK@: a vanilla, dynamically allocated thunk.
+The garbage collection code for thunks whose
+pointer + non-pointer words is less than @MIN_UPD_PAYLOAD@ differs from
+that for function closures and data constructors, because the GC code
+has to account for the slop.
+\item $@THUNK_@p@_@np$.  Similar comments apply.
+\item @THUNK_STATIC@.  A static thunk is also known as 
+a {\em constant applicative form}, or {\em CAF}.
 
-\ToDo{Say whether we pop the return address before returning}
+\begin{center}
+\begin{tabular}{|l|l|l|l|l|}\hline
+{\em Fixed header}  & {\em Pointers} & {\em Non-pointers} & {\em Slop} & {\em Static object link}\\ \hline
+\end{tabular}
+\end{center}
 
-\ToDo{Stack stubbing?}
+\item @THUNK_SELECTOR@ is a (dynamically allocated) thunk
+whose entry code performs a simple selection operation from
+a data constructor drawn from a single-constructor type.  For example,
+the thunk
+@
+	x = case y of (a,b) -> a
+@
+is a selector thunk.  A selector thunk is laid out like this:
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+{\em Fixed header}  & {\em Selectee pointer} \\ \hline
+\end{tabular}
+\end{center}
+The @INFO_SM@ word contains the byte offset of the desired word in
+the selectee.  Note that this is different from all other thunks.
 
-\subsection{Updates}
-\label{sect:data-updates}
+The garbage collector ``peeks'' at the selectee's
+tag (in its info table).  If it is evaluated, then it goes ahead and do
+the selection, and then behaves just as if the selector thunk was an
+indirection to the selected field.
+If it is not
+evaluated, it treats the selector thunk like any other thunk of that
+shape.  [Implementation notes.  
+Copying: only the evacuate routine needs to be special.
+Compacting: only the PRStart (marking) routine needs to be special.]
+\end{itemize}
 
-The entry code for an updatable thunk (which must be of arity 0):
 
+The only label associated with a thunk is its info table:
+\begin{description}
+\item[$f$@_info@] is $f$'s info table.
+\end{description}
+
+
+\subsubsection{Byte-Code Objects}
+\label{sect:BCO}
+
+A Byte-Code Object (BCO) is a container for a a chunk of byte-code,
+which can be executed by Hugs.  The byte-code represents a
+supercombinator in the program: when hugs compiles a module, it
+performs lambda lifting and each resulting supercombinator becomes a
+byte-code object in the heap.
+
+There are two kinds of BCO: a standard @BCO@ which has an arity of one
+or more, and a @BCO_CAF@ which takes no arguments and can be updated.
+When a @BCO_CAF@ is updated, the code is thrown away!
+
+The semantics of BCOs are described in Section
+\ref{sect:hugs-heap-objects}.  A BCO has the following structure:
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|l|l|}
+\hline 
+\emph{Fixed Header} & \emph{Layout} & \emph{Offset} & \emph{Size} &
+\emph{Literals} & \emph{Byte code} \\
+\hline
+\end{tabular}
+\end{center}
+
+\noindent where:
 \begin{itemize}
-\item copies the free variables out of the thunk into registers or
-  onto the stack.
-\item pushes an {\em update frame} onto the stack.
+\item The entry code is a static code fragment/info table that
+returns to the scheduler to invoke Hugs (Section
+\ref{sect:ghc-to-hugs-switch}).
+\item \emph{Layout} contains the number of pointer literals in the
+\emph{Literals} field.
+\item \emph{Offset} is the offset to the byte code from the start of
+the object.
+\item \emph{Size} is the number of words of byte code in the object.
+\item \emph{Literals} contains any pointer and non-pointer literals used in
+the byte-codes (including jump addresses), pointers first.
+\item \emph{Byte code} contains \emph{Size} words of non-pointer byte
+code.
+\end{itemize}
+
+\subsubsection{Partial applications (PAPs)}\label{sect:PAP}
+
+A partial application (PAP) represents a function applied to too few arguments.
+It is only built as a result of updating after an argument-satisfaction
+check failure.  A PAP has the following shape:
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+{\em Fixed header}  & {\em No of arg words} & {\em Function closure} & {\em Arg stack} \\ \hline
+\end{tabular}
+\end{center}
+The ``arg stack'' is a copy of the chunk of stack above the update
+frame; ``no of arg words'' tells how many words it consists of.  The
+function closure is (a pointer to) the closure for the function whose
+argument-satisfaction check failed.
+
+There is just one standard form of PAP with @INFO_TYPE@ = @PAP@.
+There is just one info table too, called @PAP_info@.
+Its entry code simply copies the arg stack chunk back on top of the
+stack and enters the function closure.  (It has to do a stack overflow test first.)
+
+PAPs are also used to implement Hugs functions (where the arguments are free variables).
+PAPs generated by Hugs can be static.
+
+\subsubsection{@AP@ objects}
+\label{sect:AP}
+
+@AP@ objects are used to represent thunks built by Hugs.  The only distintion between
+an @AP@ and a @PAP@ is that an @AP@ is updateable.
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|}
+\hline
+\emph{Fixed Header} & {\em No of arg words} & {\em Function closure} & {\em Arg stack} \\
+\hline
+\end{tabular}
+\end{center}
+
+The entry code pushes an update frame, copies the arg stack chunk on
+top of the stack, and enters the function closure.  (It has to do a
+stack overflow test first.)
+
+The ``arg stack'' is a block of (tagged) arguments representing the
+free variables of the thunk; ``no of arg words'' tells how many words
+it consists of.  The function closure is (a pointer to) the closure
+for the thunk.  The argument stack may be empty if the thunk has no
+free variables.
+
+
+\subsubsection{Indirections}
+\label{sect:IND}
+\label{sect:IND_OLDGEN}
+
+Indirection closures just point to other closures. They are introduced
+when a thunk is updated to point to its value. 
+The entry code for all indirections simply enters the closure it points to.
+
+There are several forms of indirection:
+\begin{description}
+\item[@IND@] is the vanilla, dynamically-allocated indirection.
+It is removed by the garbage collector. It has the following
+shape:
+\begin{center}
+\begin{tabular}{|l|l|l|}\hline
+{\em Fixed header} & {\em Target closure} \\ \hline
+\end{tabular}
+\end{center}
+
+\item[@IND_OLDGEN@] is the indirection used to update an old-generation
+thunk. Its shape is like this:
+\begin{center}
+\begin{tabular}{|l|l|l|}\hline
+{\em Fixed header} & {\em Mutable link field} & {\em Target closure} \\ \hline
+\end{tabular}
+\end{center}
+It contains a {\em mutable link field} that is used to string together
+all old-generation indirections that might have a pointer into
+the new generation.
+
+
+\item[@IND_PERMANENT@ and @IND_OLDGEN_PERMANENT@.]
+for lexical profiling, it is necessary to maintain cost centre
+information in an indirection, so ``permanent indirections'' are
+retained forever.  Otherwise they are just like vanilla indirections.
+\note{If a permanent indirection points to another permanent
+indirection or a @CONST@ closure, it is possible to elide the indirection
+since it will have no effect on the profiler.}
+\note{Do we still need @IND@ and @IND_OLDGEN@
+in the profiling build, or can we just make
+do with one pair whose behaviour changes when profiling is built?}
+
+\item[@IND_STATIC@] is used for overwriting CAFs when they have been
+evaluated.  Static indirections are not removed by the garbage
+collector; and are statically allocated outside the heap (and should
+stay there).  Their static object link field is used just as for
+@FUN_STATIC@ closures.
 
-An update frame is a small activation record consisting of
 \begin{center}
 \begin{tabular}{|l|l|l|}
 \hline
-{\em Fixed header} & {\em Update Frame link} & {\em Updatee} \\
+{\em Fixed header} & {\em Target closure} & {\em Static object link} \\
 \hline
 \end{tabular}
 \end{center}
 
-\note{In the semantics part of the STG paper (section 5.6), an update
-frame consists of everything down to the last update frame on the
-stack.  This would make sense too --- and would fit in nicely with
-what we're going to do when we add support for speculative
-evaluation.}
-\ToDo{I think update frames contain cost centres sometimes}
+\end{description}
 
-\item 
-If we are doing ``eager blackholing,'' we then overwrite the thunk
-with a black hole.  Otherwise, we leave it to the garbage collector to
-black hole the thunk.
+\subsubsection{Activation Records}
 
-\item 
-Start evaluating the body of the expression.
+Activation records are parts of the stack described by return address
+info tables (closures with @INFO_TYPE@ values of @RET_SMALL@,
+@RET_VEC_SMALL@, @RET_BIG@ and @RET_VEC_BIG@). They are described in
+section~\ref{sect:activation-records}.
+
+
+\subsubsection{Black holes, MVars and IVars}
+\label{sect:BH}
+\label{sect:MVAR}
+\label{sect:IVAR}
+
+Black hole closures are used to overwrite closures currently being
+evaluated. They inform the garbage collector that there are no live
+roots in the closure, thus removing a potential space leak.  
+
+Black holes also become synchronization points in the threaded world.
+They contain a pointer to a list of blocked threads to be awakened
+when the black hole is updated (or @NULL@ if the list is empty).
+\begin{center}
+\begin{tabular}{|l|l|l|}
+\hline 
+{\em Fixed header} & {\em Mutable link} & {\em Blocked thread link} \\
+\hline
+\end{tabular}
+\end{center}
+The {\em Blocked thread link} points to the TSO of the first thread
+waiting for the value of this thunk.  All subsequent TSOs in the list
+are linked together using their @TSO_LINK@ field.
+
+When the blocking queue is non-@NULL@, the black hole must be added to
+the mutables list since the TSOs on the list may contain pointers into
+the new generation.  There is no need to clutter up the mutables list
+with black holes with empty blocking queues.
+
+\ToDo{MVars}
+
+
+\subsubsection{FetchMes}\label{sect:FETCHME}
 
+In the parallel systems, FetchMes are used to represent pointers into
+the global heap.  When evaluated, the value they point to is read from
+the global heap.
+
+\ToDo{Describe layout}
+
+Because there may be offsets into these arrays, a primitive array
+cannot be handled as a FetchMe in the parallel system, but must be
+shipped in its entirety if its parent closure is shipped.
+
+
+
+\subsection{Unpointed Objects}
+
+A variable of unpointed type is always bound to a {\em value}, never to a {\em thunk}.
+For this reason, unpointed objects cannot be entered.
+
+A {\em value} may be:
+\begin{itemize}
+\item {\em Boxed}, i.e.~represented indirectly by a pointer to a heap object (e.g.~foreign objects, arrays); or
+\item {\em Unboxed}, i.e.~represented directly by a bit-pattern in one or more registers (e.g.~@Int#@ and @Float#@).
 \end{itemize}
+All {\em pointed} values are {\em boxed}.  
 
-When the expression finishes evaluation, it will enter the update
-frame on the top of the stack.  Since the returner doesn't know
-whether it is entering a normal return address/vector or an update
-frame, we follow exactly the same conventions as return addresses and
-return vectors.  That is, on entering the update frame:
+\subsubsection{Immutable Objects}
+\label{sect:ARR_WORDS1}
+\label{sect:ARR_PTRS}
 
-\begin{itemize} 
-\item The value of the thunk is in \Arg{1}.  (Recall that only thunks
-are updateable and that thunks return just one value.)
+\begin{description}
+\item[@ARR_WORDS@] is a variable-sized object consisting solely of
+non-pointers.  It is used for arrays of all
+sorts of things (bytes, words, floats, doubles... it doesn't matter).
+\begin{center}
+\begin{tabular}{|c|c|c|c|}
+\hline
+{\em Fixed Hdr} & {\em No of non-pointers} & {\em Non-pointers\ldots}	\\ \hline
+\end{tabular}
+\end{center}
 
-\item If the data type is a direct-return type rather than a
-vectored-return type, then the tag is in \Arg{2}.
+\item[@ARR_PTRS@] is an immutable, variable sized array of pointers.
+\begin{center}
+\begin{tabular}{|c|c|c|c|}
+\hline
+{\em Fixed Hdr} & {\em Mutable link} & {\em No of pointers} & {\em Pointers\ldots}	\\ \hline
+\end{tabular}
+\end{center}
+The mutable link is present so that we can easily freeze and thaw an
+array (by changing the header and adding/removing the array to the
+mutables list).
+
+\end{description}
+
+\subsubsection{Mutable Objects}
+\label{sect:mutables}
+\label{sect:ARR_WORDS2}
+\label{sect:MUTVAR}
+\label{sect:MUTARR_PTRS}
+\label{sect:MUTARR_PTRS_FROZEN}
+
+Some of these objects are {\em mutable}; they represent objects which
+are explicitly mutated by Haskell code through the @ST@ monad.
+They're not used for thunks which are updated precisely once.
+Depending on the garbage collector, mutable closures may contain extra
+header information which allows a generational collector to implement
+the ``write barrier.''
+
+\begin{description}
+
+\item[@ARR_WORDS@] is also used to represent {\em mutable} arrays of
+bytes, words, floats, doubles, etc.  It's possible to use the same
+object type because even generational collectors don't need to
+distinguish them.
+
+\item[@MUTVAR@] is a mutable variable.
+\begin{center}
+\begin{tabular}{|c|c|c|}
+\hline
+{\em Fixed Hdr} & {\em Mutable link} & {\em Pointer} \\ \hline
+\end{tabular}
+\end{center}
+
+\item[@MUTARR_PTRS@] is a mutable array of pointers.
+Such an array may be {\em frozen}, becoming an @SM_MUTARR_PTRS_FROZEN@, with a
+different info-table.
+\begin{center}
+\begin{tabular}{|c|c|c|c|}
+\hline
+{\em Fixed Hdr} & {\em Mutable link} & {\em No of ptrs} & {\em Pointers\ldots} \\ \hline
+\end{tabular}
+\end{center}
+
+\item[@MUTARR_PTRS_FROZEN@] is a frozen @MUTARR_PTRS@ closure.
+The garbage collector converts @MUTARR_PTRS_FROZEN@ to @ARR_PTRS@ as it removes them from
+the mutables list.
+
+\end{description}
+
+
+\subsubsection{Foreign Objects}\label{sect:FOREIGN}
+
+Here's what a ForeignObj looks like:
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|}
+\hline 
+{\em Fixed header} & {\em Data} & {\em Free Routine} & {\em Foreign object link} \\
+\hline
+\end{tabular}
+\end{center}
+
+The @FreeRoutine@ is a reference to the finalisation routine to call
+when the @ForeignObj@ becomes garbage.  If @freeForeignObject@ is
+called on a Foreign Object, the @FreeRoutine@ is set to zero and the
+garbage collector will not attempt to call @FreeRoutine@ when the 
+object becomes garbage.
+
+The Foreign object link is a link to the next foreign object in the
+list.  This list is traversed at the end of garbage collection: if an
+object is about to be deallocated (e.g.~it was not marked or
+evacuated), the free routine is called and the object is deleted from
+the list.  
+
+
+The remaining objects types are all administrative --- none of them may be entered.
+
+\subsection{Other weird objects}
+\label{sect:SPARK}
+\label{sect:BLOCKED_FETCH}
+
+\begin{description}
+\item[@BlockedFetch@ heap objects (`closures')] (parallel only)
+
+@BlockedFetch@s are inbound fetch messages blocked on local closures.
+They arise as entries in a local blocking queue when a fetch has been
+received for a local black hole.  When awakened, we look at their
+contents to figure out where to send a resume.
+
+A @BlockedFetch@ closure has the form:
+\begin{center}
+\begin{tabular}{|l|l|l|l|l|l|}\hline
+{\em Fixed header} & link & node & gtid & slot & weight \\ \hline
+\end{tabular}
+\end{center}
+
+\item[Spark Closures] (parallel only)
+
+Spark closures are used to link together all closures in the spark pool.  When
+the current processor is idle, it may choose to speculatively evaluate some of
+the closures in the pool.  It may also choose to delete sparks from the pool.
+\begin{center}
+\begin{tabular}{|l|l|l|l|l|l|}\hline
+{\em Fixed header} & {\em Spark pool link} & {\em Sparked closure} \\ \hline
+\end{tabular}
+\end{center}
+
+\item[Slop Objects]\label{sect:slop-objects}
+
+Slop objects are used to overwrite the end of an updatee if it is
+larger than an indirection.  Normal slop objects consist of an info
+pointer a size word and a number of slop words.  
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|l|l|}\hline
+{\em Info Pointer} & {\em Size} & {\em Slop Words} \\ \hline
+\end{tabular}
+\end{center}
+
+This is too large for single word slop objects which consist of a
+single info table.
+
+Note that slop objects only contain an info pointer, not a standard
+fixed header.  This doesn't cause problems because slop objects are
+always unreachable --- they can only be accessed by linearly scanning
+the heap.
+
+\end{description}
+
+\subsection{Thread State Objects (TSOs)}\label{sect:TSO}
+
+\ToDo{This is very out of date.  We now embed a single stack object
+within the TSO.  TSOs include an ID number which can be used to
+generate a hash value.  The gransim, profiling and ticky info is
+surely bogus.}
+
+In the multi-threaded system, the state of a suspended thread is
+packed up into a Thread State Object (TSO) which contains all the
+information needed to restart the thread and for the garbage collector
+to find all reachable objects.  When a thread is running, it may be
+``unpacked'' into machine registers and various other memory locations
+to provide faster access.
+
+Single-threaded systems don't really {\em need\/} TSOs --- but they do
+need some way to tell the storage manager about live roots so it is
+convenient to use a single TSO to store the mutator state even in
+single-threaded systems.
+
+Rather than manage TSOs' alloc/dealloc, etc., in some {\em ad hoc}
+way, we instead alloc/dealloc/etc them in the heap; then we can use
+all the standard garbage-collection/fetching/flushing/etc machinery on
+them.  So that's why TSOs are ``heap objects,'' albeit very special
+ones.
+\begin{center}
+\begin{tabular}{|l|l|}
+   \hline {\em Fixed header}
+\\ \hline @TSO_LINK@
+\\ \hline @TSO_WHATNEXT@
+\\ \hline @TSO_WHATNEXT_INFO@ 
+\\ \hline @TSO_STACK@ 
+\\ \hline {\em Exception Handlers}
+\\ \hline {\em Ticky Info}
+\\ \hline {\em Profiling Info}
+\\ \hline {\em Parallel Info}
+\\ \hline {\em GranSim Info}
+\\ \hline
+\end{tabular}
+\end{center}
+The contents of a TSO are:
+\begin{itemize}
+
+\item A pointer (@TSO_LINK@) used to maintain a list of threads with a similar
+  state (e.g.~all runnable, all sleeping, all blocked on the same black
+  hole, all blocked on the same MVar, etc.)
+
+\item A word (@TSO_WHATNEXT@) which is in suspended threads to record
+ how to awaken it.  This typically requires a program counter which is stored
+ in the pointer @TSO_WHATNEXT_INFO@
+
+\item A pointer (@TSO_STACK@) to the top stack block.
+
+\item Optional information for ``Ticky Ticky'' statistics: @TSO_STK_HWM@ is
+  the maximum number of words allocated to this thread.
+
+\item Optional information for profiling: 
+  @TSO_CCC@ is the current cost centre.
+
+\item Optional information for parallel execution:
+\begin{itemize}
+
+\item The types of threads (@TSO_TYPE@):
+\begin{description}
+\item[@T_MAIN@]     Must be executed locally.
+\item[@T_REQUIRED@] A required thread  -- may be exported.
+\item[@T_ADVISORY@] An advisory thread -- may be exported.
+\item[@T_FAIL@]     A failure thread   -- may be exported.
+\end{description}
+
+\item I've no idea what else
 
-\item The update frame is still on the stack.
 \end{itemize}
 
-We can safely share a single statically-compiled update function
-between all types.  However, the code must be able to handle both
-vectored and direct-return datatypes.  This is done by arranging that
-the update code looks like this:
+\item Optional information for GranSim execution:
+\begin{itemize}
+\item locked         
+\item sparkname	 
+\item started at	 
+\item exported	 
+\item basic blocks	 
+\item allocs	 
+\item exectime	 
+\item fetchtime	 
+\item fetchcount	 
+\item blocktime	 
+\item blockcount	 
+\item global sparks	 
+\item local sparks	 
+\item queue		 
+\item priority	 
+\item clock          (gransim light only)
+\end{itemize}
+
 
+Here are the various queues for GrAnSim-type events.
 @
-                |       ^       |
-                | return vector |
-                |---------------|
-                |  fixed-size   |
-                |  info table   |
-                |---------------|  <- update code pointer
-                |  update code  |
-                |       v       |
+Q_RUNNING   
+Q_RUNNABLE  
+Q_BLOCKED   
+Q_FETCHING  
+Q_MIGRATING 
 @
 
-Each entry in the return vector (which is large enough to cover the
-largest vectored-return type) points to the update code.
+\end{itemize}
+
+\subsection{Stack Objects}
+\label{sect:STACK_OBJECT}
+\label{sect:stacks}
+
+These are ``stack objects,'' which are used in the threaded world as
+the stack for each thread is allocated from the heap in smallish
+chunks.  (The stack in the sequential world is allocated outside of
+the heap.)
+
+Each reduction thread has to have its own stack space.  As there may
+be many such threads, and as any given one may need quite a big stack,
+a naive give-'em-a-big-stack-and-let-'em-run approach will cost a {\em
+lot} of memory.
+
+Our approach is to give a thread a small stack space, and then link
+on/off extra ``chunks'' as the need arises.  Again, this is a
+storage-management problem, and, yet again, we choose to graft the
+whole business onto the existing heap-management machinery.  So stack
+objects will live in the heap, be garbage collected, etc., etc..
+
+A stack object is laid out like this:
+
+\begin{center}
+\begin{tabular}{|l|}
+\hline
+{\em Fixed header} 
+\\ \hline
+{\em Link to next stack object (0 for last)}
+\\ \hline
+{\em N, the payload size in words}
+\\ \hline
+{\em @Sp@ (byte offset from head of object)}
+\\ \hline
+{\em @Su@ (byte offset from head of object)}
+\\ \hline
+{\em Payload (N words)}
+\\ \hline
+\end{tabular}
+\end{center}
+
+\ToDo{Are stack objects on the mutable list?}
+
+The stack grows downwards, towards decreasing
+addresses.  This makes it easier to print out the stack
+when debugging, and it means that a return address is
+at the lowest address of the chunk of stack it ``knows about''
+just like an info pointer on a closure.
+
+The garbage collector needs to be able to find all the
+pointers in a stack.  How does it do this?
 
-The update code:
 \begin{itemize}
-\item overwrites the {\em updatee} with an indirection to \Arg{1};
-\item loads @Su@ from the Update Frame link;
-\item removes the update frame from the stack; and 
-\item enters \Arg{1}.
+
+\item Within the stack there are return addresses, pushed
+by @case@ expressions.  Below a return address (i.e. at higher
+memory addresses, since the stack grows downwards) is a chunk
+of stack that the return address ``knows about'', namely the
+activation record of the currently running function.
+
+\item Below each such activation record is a {\em pending-argument
+section}, a chunk of
+zero or more words that are the arguments to which the result
+of the function should be applied.  The return address does not
+statically
+``know'' how many pending arguments there are, or their types.
+(For example, the function might return a result of type $\alpha$.)
+
+\item Below each pending-argument section is another return address,
+and so on.  Actually, there might be an update frame instead, but we
+can consider update frames as a special case of a return address with
+a well-defined activation record.
+
+\ToDo{Doesn't it {\em have} to be an update frame?  After all, the arg
+satisfaction check is @Su - Sp >= ...@.}
+
 \end{itemize}
 
-We enter \Arg{1} again, having probably just come from there, because
-it knows whether to perform a direct or vectored return.  This could
-be optimised by compiling special update code for each slot in the
-return vector, which performs the correct return.
+The game plan is this.  The garbage collector
+walks the stack from the top, traversing pending-argument sections and
+activation records alternately.  Next we discuss how it finds
+the pointers in each of these two stack regions.
 
-\subsection{Semi-tagging}
-\label{sect:semi-tagging}
 
-When a @case@ expression evaluates a variable that might be bound
-to a thunk it is often the case that the scrutinee is already evaluated.
-In this case we have paid the penalty of (a) pushing the return address (or
-return vector address) on the stack, (b) jumping through the info pointer
-of the scrutinee, and (c) returning by an indirect jump through the
-return address on the stack.
+\subsubsection{Activation records}\label{sect:activation-records}
 
-If we knew that the scrutinee was already evaluated we could generate
-(better) code which simply jumps to the appropriate branch of the
-@case@ with a pointer to the scrutinee in \Arg{1}.  (For direct
-returns to multiconstructor datatypes, we might also load the tag into
-\Arg{2}).
+An {\em activation record} is a contiguous chunk of stack,
+with a return address as its first word, followed by as many
+data words as the return address ``knows about''.  The return
+address is actually a fully-fledged info pointer.  It points
+to an info table, replete with:
 
-An obvious idea, therefore, is to test dynamically whether the heap
-closure is a value (using the tag in the info table).  If not, we
-enter the closure as usual; if so, we jump straight to the appropriate
-alternative.  Here, for example, is pseudo-code for the expression
-@(case x of { (a,_,c) -> E }@:
-@
-      \Arg{1} = <pointer to x>;
-      tag = \Arg{1}->entry->tag;
-      if (isWHNF(tag)) {
-          Sp--;  \\ insert space for return address
-          goto ret;
-      }
-      push(ret);           
-      goto \Arg{1}->entry;
-      
-      <info table for return address goes here>
-ret:  a = \Arg{1}->data1; \\ suck out a and c to avoid space leak
-      c = \Arg{1}->data3;
-      <code for E2>
-@
-and here is the code for the expression @(case x of { [] -> E1; x:xs -> E2 }@:
-@
-      \Arg{1} = <pointer to x>;
-      tag = \Arg{1}->entry->tag;
-      if (isWHNF(tag)) {
-          Sp--;  \\ insert space for return address
-          goto retvec[tag];
-      }
-      push(retinfo);          
-      goto \Arg{1}->entry;
-      
-      .addr ret2
-      .addr ret1
-retvec:           \\ reversed return vector
-      <return info table for case goes here>
-retinfo:
-      panic("Direct return into vectored case");
-      
-ret1: <code for E1>
+\begin{itemize}
+\item entry code (i.e. the code to return to).
+\item @INFO_TYPE@ is either @RET_SMALL/RET_VEC_SMALL@ or @RET_BIG/RET_VEC_BIG@, depending
+on whether the activation record has more than 32 data words (\note{64 for 8-byte-word architectures}) and on whether 
+to use a direct or a vectored return.
+\item @INFO_SM@ for @RET_SMALL@ is a bitmap telling the layout
+of the activation record, one bit per word.  The least-significant bit
+describes the first data word of the record (adjacent to the fixed
+header) and so on.  A ``@1@'' indicates a non-pointer, a ``@0@''
+indicates
+a pointer.  We don't need to indicate exactly how many words there
+are,
+because when we get to all zeros we can treat the rest of the 
+activation record as part of the next pending-argument region.
 
-ret2: x  = \Arg{1}->head;
-      xs = \Arg{1}->tail;
-      <code for E2>
-@
-There is an obvious cost in compiled code size (but none in the size
-of the bytecodes).  There is also a cost in execution time if we enter
-more thunks than data constructors.
+For @RET_BIG@ the @INFO_SM@ field points to a block of bitmap
+words, starting with a word that tells how many words are in
+the block.
 
-Both the direct and vectored returns are easily modified to chase chains
-of indirections too.  In the vectored case, this is most easily done by
-making sure that @IND = TAG_1 - 1@, and adding an extra field to every
-return vector.  In the above example, the indirection code would be
-@
-ind:  \Arg{1} = \Arg{1}->next;
-      goto ind_loop;
-@
-where @ind_loop@ is the second line of code.
+\item @INFO_SRT@ is the Static Reference Table for the return
+address (Section~\ref{sect:srt}).
+\end{itemize}
 
-Note that we have to leave space for a return address since the return
-address expects to find one.  If the body of the expression requires a
-heap check, we will actually have to write the return address before
-entering the garbage collector.
+The activation record is a fully fledged closure too.
+As well as an info pointer, it has all the other attributes of
+a fixed header (Section~\ref{sect:fixed-header}) including a saved cost
+centre which is reloaded when the return address is entered.
 
+In other words, all the attributes of closures are needed for
+activation records, so it's very convenient to make them look alike.
 
-\subsection{Heap and Stack Checks}
-\label{sect:heap-and-stack-checks}
 
-The storage manager detects that it needs to garbage collect the old
-generation when the evaluator requests a garbage collection without
-having moved the heap pointer since the last garbage collection.  It
-is therefore important that the GC routines {\em not} move the heap
-pointer unless the heap check fails.  This is different from what
-happens in the current STG implementation.
+\subsubsection{Pending arguments}
 
-Assuming that the stack can never shrink, we perform a stack check
-when we enter a closure but not when we return to a return
-continuation.  This doesn't work for heap checks because we cannot
-predict what will happen to the heap if we call a function.
+So that the garbage collector can correctly identify pointers
+in pending-argument sections we explicitly tag all non-pointers.
+Every non-pointer in a pending-argument section is preceded
+(at the next lower memory word) by a one-word byte count that
+says how many bytes to skip over (excluding the tag word).
 
-If we wish to allow the stack to shrink, we need to perform a stack
-check whenever we enter a return continuation.  Most of these checks
-could be eliminated if the storage manager guaranteed that a stack
-would always have 1000 words (say) of space after it was shrunk.  Then
-we can omit stack checks for less than 1000 words in return
-continuations.
+The garbage collector traverses a pending argument section from 
+the top (i.e. lowest memory address).  It looks at each word in turn:
 
-When an argument satisfaction check fails, we need to push the closure
-(in R1) onto the stack - so we need to perform a stack check.  The
-problem is that the argument satisfaction check occurs \emph{before}
-the stack check.  The solution is that the caller of a slow entry
-point or closure will guarantee that there is at least one word free
-on the stack for the callee to use.  
+\begin{itemize}
+\item If it is less than or equal to a small constant @MAX_STACK_TAG@
+then
+it treats it as a tag heralding zero or more words of non-pointers,
+so it just skips over them.
 
-Similarily, if a heap or stack check fails, we need to push the arguments
-and closure onto the stack.  If we just came from the slow entry point, 
-there's certainly enough space and it is the responsibility of anyone
-using the fast entry point to guarantee that there is enough space.
+\item If it points to the code segment, it must be a return
+address, so we have come to the end of the pending-argument section.
 
-\ToDo{Be more precise about how much space is required - document it
-in the calling convention section.}
+\item Otherwise it must be a bona fide heap pointer.
+\end{itemize}
 
-\subsection{Handling interrupts/signals}
 
-@
-May have to keep C stack pointer in register to placate OS?
-May have to revert black holes - ouch!
-@
+\subsection{The Stable Pointer Table}\label{sect:STABLEPTR_TABLE}
+
+A stable pointer is a name for a Haskell object which can be passed to
+the external world.  It is ``stable'' in the sense that the name does
+not change when the Haskell garbage collector runs---in contrast to
+the address of the object which may well change.
 
-\section{Interpreted Execution}
+A stable pointer is represented by an index into the
+@StablePointerTable@.  The Haskell garbage collector treats the
+@StablePointerTable@ as a source of roots for GC.
+
+In order to provide efficient access to stable pointers and to be able
+to cope with any number of stable pointers (eg $0 \ldots 100000$), the
+table of stable pointers is an array stored on the heap and can grow
+when it overflows.  (Since we cannot compact the table by moving
+stable pointers about, it seems unlikely that a half-empty table can
+be reduced in size---this could be fixed if necessary by using a
+hash table of some sort.)
+
+In general a stable pointer table closure looks like this:
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|l|l|l|l|l|l|l|}
+\hline
+{\em Fixed header} & {\em No of pointers} & {\em Free} & $SP_0$ & \ldots & $SP_{n-1}$ 
+\\\hline
+\end{tabular}
+\end{center}
+
+The fields are:
+\begin{description}
+
+\item[@NPtrs@:] number of (stable) pointers.
+
+\item[@Free@:] the byte offset (from the first byte of the object) of the first free stable pointer.
+
+\item[$SP_i$:] A stable pointer slot.  If this entry is in use, it is
+an ``unstable'' pointer to a closure.  If this entry is not in use, it
+is a byte offset of the next free stable pointer slot.
+
+\end{description}
+
+When a stable pointer table is evacuated
+\begin{enumerate}
+\item the free list entries are all set to @NULL@ so that the evacuation
+  code knows they're not pointers;
+
+\item The stable pointer slots are scanned linearly: non-@NULL@ slots
+are evacuated and @NULL@-values are chained together to form a new free list.
+\end{enumerate}
+
+There's no need to link the stable pointer table onto the mutable
+list because we always treat it as a root.
+
+
+
+\section{The Bytecode Evaluator}
 
 This section describes how the Hugs interpreter interprets code in the
 same environment as compiled code executes.  Both evaluation models
@@ -1221,7 +2346,7 @@ heap and restricting BCOs to a single basic block.
 
 \end{itemize}
 
-\subsubsection{Hugs Info Tables}
+\subsection{Hugs Info Tables}
 
 Hugs requires the following info tables and closures:
 \begin{description}
@@ -1229,12 +2354,14 @@ Hugs requires the following info tables and closures:
 
 Contains both a vectored return table and a direct entry point.  All
 entry points are the same: they rearrange the stack to match the Hugs
-return convention (section~{sect:hugs-return-convention}) and return
+return convention (section~\label{sect:hugs-return-convention}) and return
 to the scheduler.  When the scheduler restarts the thread, it will
 find a BCO on top of the stack and will enter the Hugs interpreter.
 
 \item [@UPD_RET@].
 
+This is just the standard info table for an update frame.
+
 \item [Constructors].
 
 The entry code for a constructor jumps to a generic entry point in the
@@ -1248,12 +2375,27 @@ info tables must have enough info to make that decision.
 
 \item [Selectors].
 
--- doesn't generate them itself but it ought to recognise them
+Hugs doesn't generate them itself but it ought to recognise them
 
 \item [Complex primops].
 
-\end{description}
+Some of the primops are too complex for GHC to generate inline.
+Instead, these primops are hand-written and called as normal functions.
+Hugs only needs to know their names and types but doesn't care whether
+they are generated by GHC or by hand.  Two things to watch:
+
+\begin{enumerate}
+\item
+Hugs must be able to enter these primops even if it is working on a
+standalone system that does not support genuine GHC generated code.
+
+\item The complex primops often involve unboxed tuple types (which
+Hugs does not support at the source level) so we cannot specify their
+types in a Haskell source file.
 
+\end{enumerate}
+
+\end{description}
 
 \subsection{Hugs Heap Objects}
 \label{sect:hugs-heap-objects}
@@ -1268,7 +2410,7 @@ their semantics.
 Since byte-code lives on the heap, it can be garbage collected just
 like any other heap-resident data.  Hugs arranges that any BCO's
 referred to by the Hugs symbol tables are treated as live objects by
-the garbage collectr.  When a module is unloaded, the pointers to its
+the garbage collector.  When a module is unloaded, the pointers to its
 BCOs are removed from the symbol table, and the code will be garbage
 collected some time later.
 
@@ -1315,7 +2457,7 @@ The calling convention for any byte-code function is straightforward:
 In a system containing both GHC and Hugs, the bytecode interpreter
 only has to be able to enter BCOs: everything else can be handled by
 returning to the compiled world (as described in
-Section~\ref{sect:hugs-to-ghc-closure}) and entering the closure
+Section~\ref{sect:hugs-to-ghc-switch}) and entering the closure
 there.
 
 This would work but it would obviously be very inefficient if
@@ -1363,7 +2505,7 @@ wanted.
 
 The most obvious omissions from the above list are @BCO@s (which we
 dealt with above) and GHC-built closures (which are covered in Section
-\ref{sect:hugs-to-ghc-closure}).
+\ref{sect:hugs-to-ghc-switch}).
 
 
 \subsection{Return convention}
@@ -1371,10 +2513,10 @@ dealt with above) and GHC-built closures (which are covered in Section
 
 When Hugs pushes a return address, it pushes both a pointer to the BCO
 to return to, and a pointer to a static code fragment @HUGS_RET@ (this
-will be described in Section \ref{sect:ghc-to-hugs-return}).  The
+is described in Section \ref{sect:ghc-to-hugs-switch}).  The
 stack layout is shown in Figure \ref{fig:hugs-return-stack}.
 
-\begin{figure}
+\begin{figure}[ht]
 \begin{center}
 @
 | stack    |
@@ -1390,7 +2532,7 @@ stack layout is shown in Figure \ref{fig:hugs-return-stack}.
 \label{fig:hugs-return-stack}
 \end{figure}
 
-\begin{figure}
+\begin{figure}[ht]
 \begin{center}
 @
 | stack    |
@@ -1404,7 +2546,7 @@ stack layout is shown in Figure \ref{fig:hugs-return-stack}.
 \label{fig:hugs-return2}
 \end{figure}
 
-\begin{figure}
+\begin{figure}[ht]
 \begin{center}
 @
 | stack    |
@@ -1416,11 +2558,11 @@ stack layout is shown in Figure \ref{fig:hugs-return-stack}.
 @
 %\input{hugs_ret2.pstex_t}
 \end{center}
-\caption{Stack layout on enterings a Hugs return address with an unboxed value}
+\caption{Stack layout on entering a Hugs return address with an unboxed value}
 \label{fig:hugs-return-int}
 \end{figure}
 
-\begin{figure}
+\begin{figure}[ht]
 \begin{center}
 @
 | stack    |
@@ -1436,7 +2578,7 @@ stack layout is shown in Figure \ref{fig:hugs-return-stack}.
 \label{fig:hugs-return3}
 \end{figure}
 
-\begin{figure}
+\begin{figure}[ht]
 \begin{center}
 @
 | stack    |
@@ -1467,37 +2609,19 @@ the stack and enter the BCO with the current object pointer set to the BCO
 (Figure \ref{fig:hugs-return2}).
 
 \item If the top of the stack is not @HUGS_RET@, we need to do a world
-switch as described in Section \ref{sect:hugs-to-ghc-return}.
+switch as described in Section \ref{sect:hugs-to-ghc-switch}.
 
 \end{itemize}
 
+\ToDo{This duplicates what we say about switching worlds
+(section~\ref{sect:switching-worlds}) - kill one or t'other.}
 
-\part{Implementation}
-
-\section{Overview}
-
-This part describes the inner workings of the major components of the system.
-Their external interfaces are described in the previous part.
-
-The major components of the system are:
-\begin{itemize}
-\item The scheduler
-\item The loader
-\item The storage manager
-\item The machine code evaluator (compiled code)
-\item The bytecode evaluator (interpreted code)
-\end{itemize}
-
-
-
-\section{Hugs Bytecodes}
-\label{sect:hugs-bytecodes}
 
 \ToDo{This was in the evaluation model part but it really belongs in
 this part which is about the internal details of each of the major
 sections.}
 
-\subsubsection{Addressing Modes}
+\subsection{Addressing Modes}
 
 To avoid potential alignment problems and simplify garbage collection,
 all literal constants are stored in two tables (one boxed, the other
@@ -1513,7 +2637,7 @@ char, int, nat and float since they all occupy 32 bits --- but it
 costs almost nothing to distinguish them and may improve portability
 and simplify debugging.
 
-\subsubsection{Compilation}
+\subsection{Compilation}
 
 
 \def\is{\mbox{\it is}}
@@ -1610,7 +2734,7 @@ used.  Here is a small compiler for the STG language.
 
 \ToDo{Optimise thunks of the form @f{x1,...xm}@ so that we build an AP directly}
 
-\subsubsection{Instructions}
+\subsection{Instructions}
 
 We specify the semantics of instructions using transition rules of
 the form:
@@ -1626,7 +2750,7 @@ where $\is$ is an instruction stream, $s$ is the stack, $\su$ is the
 update frame pointer and $h$ is the heap.
 
 
-\subsubsection{Stack manipulation}
+\subsection{Stack manipulation}
 
 \begin{description}
 
@@ -1704,7 +2828,7 @@ where $\size{\as} = m$ and $\size{\bs} = n$.
 \end{description}
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\subsubsection{Heap manipulation}
+\subsection{Heap manipulation}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
 \begin{description}
@@ -1766,7 +2890,7 @@ always push the args in reverse order.
 \end{description}
 
 
-\subsubsection{Entering a closure}
+\subsection{Entering a closure}
 
 \begin{description}
 
@@ -1846,7 +2970,7 @@ Optimisations:
 \end{description}
 
 
-\subsubsection{Updates}
+\subsection{Updates}
 
 \begin{description}
 
@@ -1886,7 +3010,7 @@ in different directions.
 
 \end{description}
 
-\subsubsection{Branches}
+\subsection{Branches}
 
 \begin{description}
 
@@ -1913,7 +3037,7 @@ where $C.\tag \neq tag$
 \end{description}
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\subsubsection{Heap and stack checks}
+\subsection{Heap and stack checks}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
 \begin{tabular}{|llrrrrr|}
@@ -1949,16 +3073,778 @@ Optimisations:
 
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\subsubsection{Primops}
+\subsection{Primops}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
 \ToDo{primops take m words and return n words. The expect boxed arguments on the stack.}
 
 
+\section{The Machine Code Evaluator}
+
+This section describes the framework in which compiled code evaluates
+expressions.  Only at certain points will compiled code need to be
+able to talk to the interpreted world; these are discussed in Section
+\ref{sect:switching-worlds}.
+
+\subsection{Calling conventions}
+
+\subsubsection{The call/return registers}
+
+One of the problems in designing a virtual machine is that we want it
+abstract away from tedious machine details but still reveal enough of
+the underlying hardware that we can make sensible decisions about code
+generation.  A major problem area is the use of registers in
+call/return conventions.  On a machine with lots of registers, it's
+cheaper to pass arguments and results in registers than to pass them
+on the stack.  On a machine with very few registers, it's cheaper to
+pass arguments and results on the stack than to use ``virtual
+registers'' in memory.  We therefore use a hybrid system: the first
+$n$ arguments or results are passed in registers; and the remaining
+arguments or results are passed on the stack.  For register-poor
+architectures, it is important that we allow $n=0$.
+
+We'll label the arguments and results \Arg{1} \ldots \Arg{m} --- with
+the understanding that \Arg{1} \ldots \Arg{n} are in registers and
+\Arg{n+1} \ldots \Arg{m} are on top of the stack.
+
+Note that the mapping of arguments \Arg{1} \ldots \Arg{n} to machine
+registers depends on the {\em kinds} of the arguments.  For example,
+if the first argument is a Float, we might pass it in a different
+register from if it is an Int.  In fact, we might find that a given
+architecture lets us pass varying numbers of arguments according to
+their types.  For example, if a CPU has 2 Int registers and 2 Float
+registers then we could pass between 2 and 4 arguments in machine
+registers --- depending on whether they all have the same kind or they
+have different kinds.
+
+\subsubsection{Entering closures}
+\label{sect:entering-closures}
+
+To evaluate a closure we jump to the entry code for the closure
+passing a pointer to the closure in \Arg{1} so that the entry code can
+access its environment.
+
+\subsubsection{Function call}
+
+The function-call mechanism is obviously crucial.  There are five different
+cases to consider:
+\begin{enumerate}
+
+\item {\em Known combinator (function with no free variables) and enough arguments.}  
+
+A fast call can be made: push excess arguments onto stack and jump to
+function's {\em fast entry point} passing arguments in \Arg{1} \ldots
+\Arg{m}.  
+
+The {\em fast entry point} is only called with exactly the right
+number of arguments (in \Arg{1} \ldots \Arg{m}) so it can instantly
+start doing useful work without first testing whether it has enough
+registers or having to pop them off the stack first.
+
+\item {\em Known combinator and insufficient arguments.}
+
+A slow call can be made: push all arguments onto stack and jump to
+function's {\em slow entry point}.
+
+Any unpointed arguments which are pushed on the stack must be tagged.
+This means pushing an extra word on the stack below the unpointed
+words, containing the number of unpointed words above it.
+
+%Todo: forward ref about tagging?
+%Todo: picture?
+
+The {\em slow entry point} might be called with insufficient arguments
+and so it must test whether there are enough arguments on the stack.
+This {\em argument satisfaction check} consists of checking that
+@Su-Sp@ is big enough to hold all the arguments (including any tags).
+
+\begin{itemize} 
+
+\item If the argument satisfaction check fails, it is because there is
+one or more update frames on the stack before the rest of the
+arguments that the function needs.  In this case, we construct a PAP
+(partial application, section~\ref{sect:PAP}) containing the arguments
+which are on the stack.  The PAP construction code will return to the
+update frame with the address of the PAP in \Arg{1}.
+
+\item If the argument satisfaction check succeeds, we jump to the fast
+entry point with the arguments in \Arg{1} \ldots \Arg{arity}.
+
+If the fast entry point expects to receive some of \Arg{i} on the
+stack, we can reduce the amount of movement required by making the
+stack layout for the fast entry point look like the stack layout for
+the slow entry point.  Since the slow entry point is entered with the
+first argument on the top of the stack and with tags in front of any
+unpointed arguments, this means that if \Arg{i} is unpointed, there
+should be space below it for a tag and that the highest numbered
+argument should be passed on the top of the stack.
+
+We usually arrange that the fast entry point is placed immediately
+after the slow entry point --- so we can just ``fall through'' to the
+fast entry point without performing a jump.
+
+\end{itemize}
+
+
+\item {\em Known function closure (function with free variables) and enough arguments.}
+
+A fast call can be made: push excess arguments onto stack and jump to
+function's {\em fast entry point} passing a pointer to closure in
+\Arg{1} and arguments in \Arg{2} \ldots \Arg{m+1}.
+
+Like the fast entry point for a combinator, the fast entry point for a
+closure is only called with appropriate values in \Arg{1} \ldots
+\Arg{m+1} so we can start work straight away.  The pointer to the
+closure is used to access the free variables of the closure.
+
+
+\item {\em Known function closure and insufficient arguments.}
+
+A slow call can be made: push all arguments onto stack and jump to the
+closure's slow entry point passing a pointer to the closure in \Arg{1}.
+
+Again, the slow entry point performs an argument satisfaction check
+and either builds a PAP or pops the arguments off the stack into
+\Arg{2} \ldots \Arg{m+1} and jumps to the fast entry point.
+
+
+\item {\em Unknown function closure, thunk or constructor.}
+
+Sometimes, the function being called is not statically identifiable.
+Consider, for example, the @compose@ function:
+@
+  compose f g x = f (g x)
+@
+Since @f@ and @g@ are passed as arguments to @compose@, the latter has
+to make a heap call.  In a heap call the arguments are pushed onto the
+stack, and the closure bound to the function is entered.  In the
+example, a thunk for @(g x)@ will be allocated, (a pointer to it)
+pushed on the stack, and the closure bound to @f@ will be
+entered. That is, we will jump to @f@s entry point passing @f@ in
+\Arg{1}.  If \Arg{1} is passed on the stack, it is pushed on top of
+the thunk for @(g x)@.
+
+The {\em entry code} for an updateable thunk (which must have arity 0)
+pushes an update frame on the stack and starts executing the body of
+the closure --- using \Arg{1} to access any free variables.  This is
+described in more detail in section~\ref{sect:data-updates}.
+
+The {\em entry code} for a non-updateable closure is just the
+closure's slow entry point.
+
+\end{enumerate}
+
+In addition to the above considerations, if there are \emph{too many}
+arguments then the extra arguments are simply pushed on the stack with
+appropriate tags.
+
+To summarise, a closure's standard (slow) entry point performs the following:
+\begin{description}
+\item[Argument satisfaction check.] (function closure only)
+\item[Stack overflow check.]
+\item[Heap overflow check.]
+\item[Copy free variables out of closure.] %Todo: why?
+\item[Eager black holing.] (updateable thunk only) %Todo: forward ref.
+\item[Push update frame.]
+\item[Evaluate body of closure.]
+\end{description}
+
+
+\subsection{Case expressions and return conventions}
+\label{sect:return-conventions}
+
+The {\em evaluation} of a thunk is always initiated by
+a @case@ expression.  For example:
+@
+  case x of (a,b) -> E
+@
+
+The code for a @case@ expression looks like this:
+
+\begin{itemize}
+\item Push the free variables of the branches on the stack (fv(@E@) in
+this case).
+\item  Push a \emph{return address} on the stack.
+\item  Evaluate the scrutinee (@x@ in this case).
+\end{itemize}
+
+Once evaluation of the scrutinee is complete, execution resumes at the
+return address, which points to the code for the expression @E@.
+
+When execution resumes at the return point, there must be some {\em
+return convention} that defines where the components of the pair, @a@
+and @b@, can be found.  The return convention varies according to the
+type of the scrutinee @x@:
+
+\begin{itemize}
+
+\item 
+
+(A space for) the return address is left on the top of the stack.
+Leaving the return address on the stack ensures that the top of the
+stack contains a valid activation record
+(section~\ref{sect:activation-records}) --- should a garbage collection
+be required.
+
+\item If @x@ has a boxed type (e.g.~a data constructor or a function),
+a pointer to @x@ is returned in \Arg{1}.
+
+\ToDo{Warn that components of E should be extracted as soon as
+possible to avoid a space leak.}
+
+\item If @x@ is an unboxed type (e.g.~@Int#@ or @Float#@), @x@ is
+returned in \Arg{1}
+
+\item If @x@ is an unboxed tuple constructor, the components of @x@
+are returned in \Arg{1} \ldots \Arg{n} but no object is constructed in
+the heap.  
+
+When passing an unboxed tuple to a function, the components are
+flattened out and passed in \Arg{1} \ldots \Arg{n} as usual.
+
+\end{itemize}
+
+\subsection{Vectored Returns}
+
+Many algebraic data types have more than one constructor.  For
+example, the @Maybe@ type is defined like this:
+@
+  data Maybe a = Nothing | Just a
+@
+How does the return convention encode which of the two constructors is
+being returned?  A @case@ expression scrutinising a value of @Maybe@
+type would look like this: 
+@
+  case E of 
+    Nothing -> ...
+    Just a  -> ...
+@
+Rather than pushing a return address before evaluating the scrutinee,
+@E@, the @case@ expression pushes (a pointer to) a {\em return
+vector}, a static table consisting of two code pointers: one for the
+@Just@ alternative, and one for the @Nothing@ alternative.  
+
+\begin{itemize}
+
+\item
+
+The constructor @Nothing@ returns by jumping to the first item in the
+return vector with a pointer to a (statically built) Nothing closure
+in \Arg{1}.  
+
+It might seem that we could avoid loading \Arg{1} in this case since the
+first item in the return vector will know that @Nothing@ was returned
+(and can easily access the Nothing closure in the (unlikely) event
+that it needs it.  The only reason we load \Arg{1} is in case we have to
+perform an update (section~\ref{sect:data-updates}).
+
+\item 
+
+The constructor @Just@ returns by jumping to the second element of the
+return vector with a pointer to the closure in \Arg{1}.  
+
+\end{itemize}
+
+In this way no test need be made to see which constructor returns;
+instead, execution resumes immediately in the appropriate branch of
+the @case@.
+
+\subsection{Direct Returns}
+
+When a datatype has a large number of constructors, it may be
+inappropriate to use vectored returns.  The vector tables may be
+large and sparse, and it may be better to identify the constructor
+using a test-and-branch sequence on the tag.  For this reason, we
+provide an alternative return convention, called a \emph{direct
+return}.
+
+In a direct return, the return address pushed on the stack really is a
+code pointer.  The returning code loads a pointer to the closure being
+returned in \Arg{1} as usual, and also loads the tag into \Arg{2}.
+The code at the return address will test the tag and jump to the
+appropriate code for the case branch.
+
+The choice of whether to use a vectored return or a direct return is
+made on a type-by-type basis --- up to a certain maximum number of
+constructors imposed by the update mechanism
+(section~\ref{sect:data-updates}).
+
+Single-constructor data types also use direct returns, although in
+that case there is no need to return a tag in \Arg{2}.
+
+\ToDo{Say whether we pop the return address before returning}
+
+\ToDo{Stack stubbing?}
+
+\subsection{Updates}
+\label{sect:data-updates}
+
+The entry code for an updatable thunk (which must be of arity 0):
+
+\begin{itemize}
+\item copies the free variables out of the thunk into registers or
+  onto the stack.
+\item pushes an {\em update frame} onto the stack.
+
+An update frame is a small activation record consisting of
+\begin{center}
+\begin{tabular}{|l|l|l|}
+\hline
+{\em Fixed header} & {\em Update Frame link} & {\em Updatee} \\
+\hline
+\end{tabular}
+\end{center}
+
+\note{In the semantics part of the STG paper (section 5.6), an update
+frame consists of everything down to the last update frame on the
+stack.  This would make sense too --- and would fit in nicely with
+what we're going to do when we add support for speculative
+evaluation.}
+\ToDo{I think update frames contain cost centres sometimes}
+
+\item 
+If we are doing ``eager blackholing,'' we then overwrite the thunk
+with a black hole.  Otherwise, we leave it to the garbage collector to
+black hole the thunk.
+
+\item 
+Start evaluating the body of the expression.
+
+\end{itemize}
+
+When the expression finishes evaluation, it will enter the update
+frame on the top of the stack.  Since the returner doesn't know
+whether it is entering a normal return address/vector or an update
+frame, we follow exactly the same conventions as return addresses and
+return vectors.  That is, on entering the update frame:
+
+\begin{itemize} 
+\item The value of the thunk is in \Arg{1}.  (Recall that only thunks
+are updateable and that thunks return just one value.)
+
+\item If the data type is a direct-return type rather than a
+vectored-return type, then the tag is in \Arg{2}.
+
+\item The update frame is still on the stack.
+\end{itemize}
+
+We can safely share a single statically-compiled update function
+between all types.  However, the code must be able to handle both
+vectored and direct-return datatypes.  This is done by arranging that
+the update code looks like this:
+
+@
+                |       ^       |
+                | return vector |
+                |---------------|
+                |  fixed-size   |
+                |  info table   |
+                |---------------|  <- update code pointer
+                |  update code  |
+                |       v       |
+@
+
+Each entry in the return vector (which is large enough to cover the
+largest vectored-return type) points to the update code.
+
+The update code:
+\begin{itemize}
+\item overwrites the {\em updatee} with an indirection to \Arg{1};
+\item loads @Su@ from the Update Frame link;
+\item removes the update frame from the stack; and 
+\item enters \Arg{1}.
+\end{itemize}
+
+We enter \Arg{1} again, having probably just come from there, because
+it knows whether to perform a direct or vectored return.  This could
+be optimised by compiling special update code for each slot in the
+return vector, which performs the correct return.
+
+\subsection{Semi-tagging}
+\label{sect:semi-tagging}
+
+When a @case@ expression evaluates a variable that might be bound
+to a thunk it is often the case that the scrutinee is already evaluated.
+In this case we have paid the penalty of (a) pushing the return address (or
+return vector address) on the stack, (b) jumping through the info pointer
+of the scrutinee, and (c) returning by an indirect jump through the
+return address on the stack.
+
+If we knew that the scrutinee was already evaluated we could generate
+(better) code which simply jumps to the appropriate branch of the
+@case@ with a pointer to the scrutinee in \Arg{1}.  (For direct
+returns to multiconstructor datatypes, we might also load the tag into
+\Arg{2}).
+
+An obvious idea, therefore, is to test dynamically whether the heap
+closure is a value (using the tag in the info table).  If not, we
+enter the closure as usual; if so, we jump straight to the appropriate
+alternative.  Here, for example, is pseudo-code for the expression
+@(case x of { (a,_,c) -> E }@:
+@
+      \Arg{1} = <pointer to x>;
+      tag = \Arg{1}->entry->tag;
+      if (isWHNF(tag)) {
+          Sp--;  \\ insert space for return address
+          goto ret;
+      }
+      push(ret);           
+      goto \Arg{1}->entry;
+      
+      <info table for return address goes here>
+ret:  a = \Arg{1}->data1; \\ suck out a and c to avoid space leak
+      c = \Arg{1}->data3;
+      <code for E2>
+@
+and here is the code for the expression @(case x of { [] -> E1; x:xs -> E2 }@:
+@
+      \Arg{1} = <pointer to x>;
+      tag = \Arg{1}->entry->tag;
+      if (isWHNF(tag)) {
+          Sp--;  \\ insert space for return address
+          goto retvec[tag];
+      }
+      push(retinfo);          
+      goto \Arg{1}->entry;
+      
+      .addr ret2
+      .addr ret1
+retvec:           \\ reversed return vector
+      <return info table for case goes here>
+retinfo:
+      panic("Direct return into vectored case");
+      
+ret1: <code for E1>
+
+ret2: x  = \Arg{1}->head;
+      xs = \Arg{1}->tail;
+      <code for E2>
+@
+There is an obvious cost in compiled code size (but none in the size
+of the bytecodes).  There is also a cost in execution time if we enter
+more thunks than data constructors.
+
+Both the direct and vectored returns are easily modified to chase chains
+of indirections too.  In the vectored case, this is most easily done by
+making sure that @IND = TAG_1 - 1@, and adding an extra field to every
+return vector.  In the above example, the indirection code would be
+@
+ind:  \Arg{1} = \Arg{1}->next;
+      goto ind_loop;
+@
+where @ind_loop@ is the second line of code.
+
+Note that we have to leave space for a return address since the return
+address expects to find one.  If the body of the expression requires a
+heap check, we will actually have to write the return address before
+entering the garbage collector.
+
+
+\subsection{Heap and Stack Checks}
+\label{sect:heap-and-stack-checks}
+
+The storage manager detects that it needs to garbage collect the old
+generation when the evaluator requests a garbage collection without
+having moved the heap pointer since the last garbage collection.  It
+is therefore important that the GC routines {\em not} move the heap
+pointer unless the heap check fails.  This is different from what
+happens in the current STG implementation.
+
+Assuming that the stack can never shrink, we perform a stack check
+when we enter a closure but not when we return to a return
+continuation.  This doesn't work for heap checks because we cannot
+predict what will happen to the heap if we call a function.
+
+If we wish to allow the stack to shrink, we need to perform a stack
+check whenever we enter a return continuation.  Most of these checks
+could be eliminated if the storage manager guaranteed that a stack
+would always have 1000 words (say) of space after it was shrunk.  Then
+we can omit stack checks for less than 1000 words in return
+continuations.
+
+When an argument satisfaction check fails, we need to push the closure
+(in R1) onto the stack - so we need to perform a stack check.  The
+problem is that the argument satisfaction check occurs \emph{before}
+the stack check.  The solution is that the caller of a slow entry
+point or closure will guarantee that there is at least one word free
+on the stack for the callee to use.  
+
+Similarily, if a heap or stack check fails, we need to push the arguments
+and closure onto the stack.  If we just came from the slow entry point, 
+there's certainly enough space and it is the responsibility of anyone
+using the fast entry point to guarantee that there is enough space.
+
+\ToDo{Be more precise about how much space is required - document it
+in the calling convention section.}
+
+\subsection{Handling interrupts/signals}
+
+@
+May have to keep C stack pointer in register to placate OS?
+May have to revert black holes - ouch!
+@
+
+
+
+\section{The Loader}
+\section{The Compilers}
+
+\iffalse
+\part{Old stuff - needs to be mined for useful info}
+
+\section{The Scheduler}
+
+The Scheduler is the heart of the run-time system.  A running program
+consists of a single running thread, and a list of runnable and
+blocked threads.  The running thread returns to the scheduler when any
+of the following conditions arises:
+
+\begin{itemize}
+\item A heap check fails, and a garbage collection is required
+\item Compiled code needs to switch to interpreted code, and vice
+versa.
+\item The thread becomes blocked.
+\item The thread is preempted.
+\end{itemize}
+
+A running system has a global state, consisting of
+
+\begin{itemize}
+\item @Hp@, the current heap pointer, which points to the next
+available address in the Heap.
+\item @HpLim@, the heap limit pointer, which points to the end of the
+heap.
+\item The Thread Preemption Flag, which is set whenever the currently
+running thread should be preempted at the next opportunity.
+\item A list of runnable threads. 
+\item A list of blocked threads.
+\end{itemize}
+
+Each thread is represented by a Thread State Object (TSO), which is
+described in detail in Section \ref{sect:TSO}.
+
+The following is pseudo-code for the inner loop of the scheduler
+itself.
+
+@
+while (threads_exist) {
+  // handle global problems: GC, parallelism, etc
+  if (need_gc) gc();  
+  if (external_message) service_message();
+  // deal with other urgent stuff
+
+  pick a runnable thread;
+  do {
+    // enter object on top of stack
+    // if the top object is a BCO, we must enter it
+    // otherwise appply any heuristic we wish.
+    if (thread->stack[thread->sp]->info.type == BCO) {
+	status = runHugs(thread,&smInfo);
+    } else {
+	status = runGHC(thread,&smInfo);
+    }
+    switch (status) {  // handle local problems
+      case (StackOverflow): enlargeStack; break;
+      case (Error e)      : error(thread,e); break;
+      case (ExitWith e)   : exit(e); break;
+      case (Yield)        : break;
+    }
+  } while (thread_runnable);
+}
+@
+
+\subsection{Invoking the garbage collector}
+\subsection{Putting the thread to sleep}
+
+\subsection{Calling C from Haskell}
+
+We distinguish between "safe calls" where the programmer guarantees
+that the C function will not call a Haskell function or, in a
+multithreaded system, block for a long period of time and "unsafe
+calls" where the programmer cannot make that guarantee.  
+
+Safe calls are performed without returning to the scheduler and are
+discussed elsewhere (\ToDo{discuss elsewhere}).
+
+Unsafe calls are performed by returning an array (outside the Haskell
+heap) of arguments and a C function pointer to the scheduler.  The
+scheduler allocates a new thread from the operating system
+(multithreaded system only), spawns a call to the function and
+continues executing another thread.  When the ccall completes, the
+thread informs the scheduler and the scheduler adds the thread to the
+runnable threads list.  
+
+\ToDo{Describe this in more detail.}
 
 
+\subsection{Calling Haskell from C}
+
+When C calls a Haskell closure, it sends a message to the scheduler
+thread.  On receiving the message, the scheduler creates a new Haskell
+thread, pushes the arguments to the C function onto the thread's stack
+(with tags for unboxed arguments) pushes the Haskell closure and adds
+the thread to the runnable list so that it can be entered in the
+normal way.
+
+When the closure returns, the scheduler sends back a message which
+awakens the (C) thread.  
+
+\ToDo{Do we need to worry about the garbage collector deallocating the
+thread if it gets blocked?}
+
+\subsection{Switching Worlds}
+\label{sect:switching-worlds}
+
+\ToDo{This has all changed: we always leave a closure on top of the
+stack if we mean to continue executing it.  The scheduler examines the
+top of the stack and tries to guess which world we want to be in.  If
+it finds a @BCO@, it certainly enters Hugs, if it finds a @GHC@
+closure, it certainly enters GHC and if it finds a standard closure,
+it is free to choose either one but it's probably best to enter GHC
+for everything except @BCO@s and perhaps @AP@s.}
+
+Because this is a combined compiled/interpreted system, the
+interpreter will sometimes encounter compiled code, and vice-versa.
+
+All world-switches go via the scheduler, ensuring that the world is in
+a known state ready to enter either compiled code or the interpreter.
+When a thread is run from the scheduler, the @whatNext@ field in the
+TSO (Section \ref{sect:TSO}) is checked to find out how to execute the
+thread.
+
+\begin{itemize}
+\item If @whatNext@ is set to @ReturnGHC@, we load up the required
+registers from the TSO and jump to the address at the top of the user
+stack.
+\item If @whatNext@ is set to @EnterGHC@, we load up the required
+registers from the TSO and enter the closure pointed to by the top
+word of the stack.
+\item If @whatNext@ is set to @EnterHugs@, we enter the top thing on
+the stack, using the interpreter.
+\end{itemize}
+
+There are four cases we need to consider:
+
+\begin{enumerate}
+\item A GHC thread enters a Hugs-built closure.
+\item A GHC thread returns to a Hugs-compiled return address.
+\item A Hugs thread enters a GHC-built closure.
+\item A Hugs thread returns to a Hugs-compiled return address.
+\end{enumerate}
+
+GHC-compiled modules cannot call functions in a Hugs-compiled module
+directly, because the compiler has no information about arities in the
+external module.  Therefore it must assume any top-level objects are
+CAFs, and enter their closures.
+
+\ToDo{Hugs-built constructors?}
+
+We now examine the various cases one by one and describe how the
+switch happens in each situation.
+
+\subsection{A GHC thread enters a Hugs-built closure}
+\label{sect:ghc-to-hugs-switch}
+
+There is three possibilities: GHC has entered a @PAP@, or it has
+entered a @AP@, or it has entered the BCO directly (for a top-level
+function closure).  @AP@s and @PAP@s are ``standard closures'' and
+so do not require us to enter the bytecode interpreter.
+
+The entry code for a BCO does the following:
+
+\begin{itemize}
+\item Push the address of the object entered on the stack.
+\item Save the current state of the thread in its TSO.
+\item Return to the scheduler, setting @whatNext@ to @EnterHugs@.
+\end{itemize}
+
+BCO's for thunks and functions have the same entry conventions as
+slow entry points: they expect to find their arguments on the stac
+with unboxed arguments preceded by appropriate tags.
+
+\subsection{A GHC thread returns to a Hugs-compiled return address}
+\label{sect:ghc-to-hugs-switch}
+
+Hugs return addresses are laid out as in Figure
+\ref{fig:hugs-return-stack}.  If GHC is returning, it will return to
+the address at the top of the stack, namely @HUGS_RET@.  The code at
+@HUGS_RET@ performs the following:
+
+\begin{itemize}
+\item pushes \Arg{1} (the return value) on the stack.
+\item saves the thread state in the TSO
+\item returns to the scheduler with @whatNext@ set to @EnterHugs@.
+\end{itemize}
+
+\noindent When Hugs runs, it will enter the return value, which will
+return using the correct Hugs convention (Section
+\ref{sect:hugs-return-convention}) to the return address underneath it
+on the stack.
+
+\subsection{A Hugs thread enters a GHC-compiled closure}
+\label{sect:hugs-to-ghc-switch}
+
+Hugs can recognise a GHC-built closure as not being one of the
+following types of object:
+
+\begin{itemize}
+\item A @BCO@,
+\item A @AP@,
+\item A @PAP@,
+\item An indirection, or
+\item A constructor.
+\end{itemize}
+
+When Hugs is called on to enter a GHC closure, it executes the
+following sequence of instructions:
+
+\begin{itemize}
+\item Push the address of the closure on the stack.
+\item Save the current state of the thread in the TSO.
+\item Return to the scheduler, with the @whatNext@ field set to
+@EnterGHC@.
+\end{itemize}
+
+\subsection{A Hugs thread returns to a GHC-compiled return address}
+\label{sect:hugs-to-ghc-switch}
+
+When Hugs encounters a return address on the stack that is not
+@HUGS_RET@, it knows that a world-switch is required.  At this point
+the stack contains a pointer to the return value, followed by the GHC
+return address.  The following sequence is then performed:
+
+\begin{itemize}
+\item save the state of the thread in the TSO.
+\item return to the scheduler, setting @whatNext@ to @EnterGHC@.
+\end{itemize}
+
+The first thing that GHC will do is enter the object on the top of the
+stack, which is a pointer to the return value.  This value will then
+return itself to the return address using the GHC return convention.
+
+
+\fi
+
+\part{Implementation}
+
+\section{Overview}
+
+This part describes the inner workings of the major components of the system.
+Their external interfaces are described in the previous part.
+
+The major components of the system are:
+\begin{itemize}
+\item The scheduler
+\item The loader
+\item The storage manager
+\item The machine code evaluator (compiled code)
+\item The bytecode evaluator (interpreted code)
+\end{itemize}
+
+\iffalse
 
 \section{Heap objects}
+\label{sect:heap-objects}
 \label{sect:fixed-header}
 
 \ToDo{Fix this picture}
@@ -2460,7 +4346,7 @@ The semantics of BCOs are described in Section
 \begin{itemize}
 \item The entry code is a static code fragment/info table that
 returns to the scheduler to invoke Hugs (Section
-\ref{sect:ghc-to-hugs-closure}).
+\ref{sect:ghc-to-hugs-switch}).
 \item \emph{Layout} contains the number of pointer literals in the
 \emph{Literals} field.
 \item \emph{Offset} is the offset to the byte code from the start of
@@ -3284,7 +5170,7 @@ are evacuated and @NULL@-values are chained together to form a new free list.
 There's no need to link the stable pointer table onto the mutable
 list because we always treat it as a root.
 
-
+\fi
 
 \section{The Storage Manager}
 
@@ -3409,13 +5295,16 @@ which are still live.
 
 \end{description}
 
-
 %************************************************************************
 %*									*
 \subsection{``What really happens in a garbage collection?''}
 %*									*
 %************************************************************************
 
+\ToDo{I commented out this long, out of date section - ADR}
+
+\iffalse
+
 This is a brief tutorial on ``what really happens'' going to/from the
 storage manager in a garbage collection.
 
@@ -3526,7 +5415,7 @@ parallel system, we will now loop back around, and make sure there is
 enough space before continuing.
 \end{description}
 
-
+\fi % end of commented out part
 
 \subsection{Static Reference Tables (SRTs)}
 \label{sect:srt}
@@ -3718,6 +5607,11 @@ place between each.
 
 \subsection{Squeezing identical updates}
 
+\Note{This can also be done by testing whether @Sp == Su@ when we push
+an update frame.  If so, we can overwrite the updatee with an
+indirection to the existing updatee (and some slop objects) and avoid
+pushing an update frame.}
+
 \iffalse
 
 \ToDo{Insert text stolen from update paper}
@@ -3883,8 +5777,6 @@ to implement than Sparud's.
 \subsection{Internal workings of the Generational Collector}
 
 
-\section{Dynamic Linking}
-
 \section{Profiling}
 
 Registering costs centres looks awkward - can we tidy it up?
@@ -3919,7 +5811,7 @@ Measure what proportion of ...:
 
 %************************************************************************
 %*									*
-\subsubsection[ticky-stk-heap-use]{Stack and heap usage}
+\subsection[ticky-stk-heap-use]{Stack and heap usage}
 %*									*
 %************************************************************************
 
@@ -3943,7 +5835,7 @@ Re-use of stack slots, and stubbing of stack slots:
 
 %************************************************************************
 %*									*
-\subsubsection[ticky-allocs]{Allocations}
+\subsection[ticky-allocs]{Allocations}
 %*									*
 %************************************************************************
 
@@ -3977,7 +5869,7 @@ admin/goods/slop breakdown.)
 
 %************************************************************************
 %*									*
-\subsubsection[ticky-enters]{Enters}
+\subsection[ticky-enters]{Enters}
 %*									*
 %************************************************************************
 
@@ -4002,7 +5894,7 @@ struct ent_counter {
 
 %************************************************************************
 %*									*
-\subsubsection[ticky-returns]{Returns}
+\subsection[ticky-returns]{Returns}
 %*									*
 %************************************************************************
 
@@ -4042,7 +5934,7 @@ vectored?  (The rest were obviously unvectored).
 
 %************************************************************************
 %*									*
-\subsubsection[ticky-update-frames]{Update frames}
+\subsection[ticky-update-frames]{Update frames}
 %*									*
 %************************************************************************
 
@@ -4062,7 +5954,7 @@ Macro			&	Counts					\\ \hline
 
 %************************************************************************
 %*									*
-\subsubsection[ticky-updates]{Updates}
+\subsection[ticky-updates]{Updates}
 %*									*
 %************************************************************************
 
@@ -4086,7 +5978,7 @@ Macro			&	Where						\\ \hline
 
 %************************************************************************
 %*									*
-\subsubsection[ticky-selectors]{Doing selectors at GC time}
+\subsection[ticky-selectors]{Doing selectors at GC time}
 %*									*
 %************************************************************************
 
@@ -4188,6 +6080,10 @@ collection code.
 
 \section{Old stuff about SRTs}
 
+\ToDo{Commented out}
+
+\iffalse
+
 Garbage collection of @CAF@s is tricky.  We have to cope with explicit
 collection from the @CAFlist@ as well as potential references from the
 stack and heap which will cause the @CAF@ evacuation code to be
@@ -4304,8 +6200,14 @@ shorted out.
 This scheme has not been adopted but has been implemented. The code is
 commented out with @#if 0@.
 
+\fi
+
 \subsection{The virtual register set}
 
+\ToDo{Commented out}
+
+\iffalse
+
 We refer to any (atomic) part of the virtual machine state as a ``register.''
 These ``registers'' may be shared between all threads in the system or may be
 specific to each thread.
@@ -4356,6 +6258,7 @@ basis:
   allow faster access to all the other non-machine registers.
 @
 
+\fi
 
 \end{document}
author	areid <unknown>	1998-01-08 14:40:22 +0000
committer	areid <unknown>	1998-01-08 14:40:22 +0000
commit	ff14742cc328f19b9bf7c04d9a69408e641cf64a (patch)
tree	f66486d1f07b4808eb6098105769c9c1d851d9ac /docs/rts
parent	b3163aefcecd596fe200f9d3cb5ea82443767269 (diff)
download	haskell-ff14742cc328f19b9bf7c04d9a69408e641cf64a.tar.gz