[project @ 2000-09-26 16:29:59 by sewardj]

Rearrange the Linker section a bit, in line with upcoming rearrangement of Compiler and CM sections.
author: sewardj <unknown> 2000-09-26 16:29:59 +0000
committer: sewardj <unknown> 2000-09-26 16:29:59 +0000
commit: 603eccf76ca5df1732dd0744ec941f014e6e3805 (patch)
tree: 6e557c6feeeacaba21e4984f02ed9e9c0cb0f5b6 /ghc/docs/ghci/ghci.tex
parent: 1fb4b05b4e8e7e744a62595af6669af1a57c8779 (diff)
download: haskell-603eccf76ca5df1732dd0744ec941f014e6e3805.tar.gz
1 files changed, 152 insertions, 136 deletions
diff --git a/ghc/docs/ghci/ghci.tex b/ghc/docs/ghci/ghci.tex
index 62a1fb1ac9..6fab8a3fab 100644
--- a/ghc/docs/ghci/ghci.tex
+++ b/ghc/docs/ghci/ghci.tex
@@ -423,56 +423,167 @@ What @compile@ does: \ToDo{A bit vague ... needs refining.  How does
    boot interface against the inferred interface.}
 \end{itemize}
 
-\section{Linking}
 
-\subsection{External API}
+
+\subsection{Linking}
+\label{sec:linker}
+
+\subsubsection{The linker's private data structures}
+
+In the same way that @compile@ has a persistent compiler state (PCS),
+the linker has a persistent (session-lifetime) state, LPS, the
+Linker's Persistent State.  In batch mode LPS is entirely irrelevant,
+because there is only a single link step, and can be a unit value
+ignored by everybody.  In interactive mode LPS is composed of the
+following three parts:
+
+\begin{itemize}
+\item 
+\textbf{The Source Symbol Table (SST)}@ :: FiniteMap RdrName HValue@   
+  The source symbol table is used when linking interpreted code.
+  Unlinked interpreted code consists of an STG  tree where
+  the leaves are @RdrNames@.  The linker's job is to resolve these to
+  actual addresses (the alternative is to resolve these lazily when
+  the code is run, but this requires passing the full symbol table
+  through the interpreter and the repeated lookups will probably be
+  expensive).
+
+  The source symbol table therefore maps @RdrName@s to @HValue@s, for
+  every @RdrName@ that currently \emph{has} an @HValue@, including all
+  exported functions from object code modules that are currently
+  linked in.  Linking therefore turns a @StgTree RdrName@ into an
+  @StgTree HValue@.
+
+  It is important that we can prune this symbol table by throwing away
+  the mappings for an entire module, whenever we recompile/relink a
+  given module.  The representation is therefore probably a two-level
+  mapping, from module names, to function/constructor names, to
+  @HValue@s.
+
+\item \textbf{The Object Symbol Table (OST)}@ :: FiniteMap String Addr@
+  This is a lower level symbol table, mapping symbol names in object
+  modules to their addresses in memory.  It is used only when
+  resolving the external references in an object module, and contains
+  only entries that are defined in object modules.
+
+  Why have two symbol tables?  Well, there is a clear distinction
+  between the two: the source symbol table maps Haskell symbols to
+  Haskell values, and the object symbol table maps object symbols to
+  addresses.  There is some overlap, in that Haskell symbols certainly
+  have addresses, and we could look up a Haskell symbol's address by
+  manufacturing the right object symbol and looking that up in the
+  object symbol table, but this is likely to be slow and would force
+  us to extend the object symbol table with all the symbols
+  ``exported'' by interpreted code.  Doing it this way enables us to
+  decouple the object management subsystem from the rest of the linker
+  with a minimal interface; something like
+
+  \begin{verbatim}
+  loadObject   :: Unlinked -> IO Object
+  unloadModule :: Unlinked -> IO ()
+  lookupSymbol :: String   -> IO Addr
+  \end{verbatim}
+
+  Rather unfortunately we need @lookupSymbol@ in order to populate the
+  source symbol table when linking in a new compiled module.  Our
+  object management subsystem is currently written in C, so decoupling
+  this interface as much as possible is highly desirable.
+
+\item
+   {\bf Linked Image (LI)} @:: no-explicit-representation@
+
+   LI isn't explicitly represented in the system, but we record it
+   here for completeness anyway.  LI is the current set of
+   linked-together module, package and other library fragments
+   constituting the current executable mass.  LI comprises:
+   \begin{itemize}
+   \item Machine code (@.o@, @.a@, @.DLL@ file images) in memory.
+         These are loaded from disk when needed, and stored in
+         @malloc@ville.  To simplify storage management, they are
+         never freed or reused, since this creates serious
+         complications for storage management.  When no longer needed,
+         they are simply abandoned.  New linkings of the same object
+         code produces new copies in memory.  We hope this not to be
+         too much of a space leak.
+   \item STG trees, which live in the GHCI heap and are managed by the
+         storage manager in the usual way.  They are held alive (are 
+         reachable) via the @HValue@s in the OST.  Such @HValue@s are
+         applications of the interpreter function to the trees
+         themselves.  Linking a tree comprises travelling over the 
+         tree, replacing all the @Id@s with pointers directly to the
+         relevant @_closure@ labels, as determined by searching the
+         OST.  Once the leaves are linked, trees are wrapped with the
+         interpreter function.  The resulting @HValue@s then behave
+         indistinguishably from compiled versions of the same code.
+   \end{itemize}
+   Because object code is outside the heap and never deallocated,
+   whilst interpreted code is held alive by the OST, there's no need
+   to have a data structure which ``is'' the linked image.
+
+   For batch compilation, LI doesn't exist because OST doesn't exist,
+   and because @link@ doesn't load code into memory, instead just
+   invokes the system linker.
+
+   \ToDo{Do we need to say anything about CAFs and SRTs?  Probably ...}
+\end{itemize}
+
+
+\subsubsection{The linker's interface}
+
+In practice, the PLS might be hidden in the I/O monad rather
+than passed around explicitly.  (The same might be true for PCS).
+Anyway:
 
 \begin{verbatim}
-   data LinkState  -- abstract
+   data PCS -- as described above; opaque to everybody except the linker
 
-   link :: [[Linkable]] -> LinkState -> IO LinkResult
+   link :: PCI -> ??? -> [[Linkable]] -> LinkState -> IO LinkResult
 
-   data LinkResult = LinkOK LinkState
-                   | LinkErrs [SDoc] LinkState
+   data LinkResult = LinkOK   LinkState
+                   | LinkErrs LinkState [SDoc]
 \end{verbatim}
 
-In practice, the @LinkState@ might be hidden in the I/O monad rather
-than passed around explicitly.
+CM uses @link@ as follows:
 
-The linker is used by the compilation manager as follows after
-repeatedly calling the compiler to compile all modules which are
-out-of-date, the linker is invoked.  The @[[Linkable]]@ argument to
-@link@ represents the list of (recursive groups of) modules which have
-been newly compiled, along with @Linkable@s representing each of the
-packages in use (the compilation manager knows which external packages
-are referenced by the home package).  The order of the list is
-important: it is sorted in such a way that linking any prefix of the
-list will result in an image with no unresolved references.  Note that
-for batch linking there may be further restrictions; for example it
-may not be possible to link recursive groups containing libraries.
+After repeatedly using @compile@ to compile all modules which are
+out-of-date, the @link@ is invoked.  The @[[Linkable]]@ argument to
+@link@ represents the list of (recursive groups of) home modules which
+have been newly compiled, along with @Linkable@s for each of
+the packages in use (the compilation manager knows which external
+packages are referenced by the home package).  The order of the list
+is important: it is sorted in such a way that linking any prefix of
+the list will result in an image with no unresolved references.  Note
+that for batch linking there may be further restrictions; for example
+it may not be possible to link recursive groups containing libraries.
 
-The linker must do the following when invoked via @link@:
+@link@ does the following:
 
 \begin{itemize}
-  \item Unlink any objects already in memory which correspond to
-  modules which have just been recompiled (interactive system only).
-  The objects which correspond to a module are obtained from the
-  @Linkable@ (see below).
-
-  \item Link the objects representing the newly compiled modules into
-  memory, along with any packages which haven't already been brought
-  in.  In the batch system, this just means invoking the external
-  linker to link everything in one go.
+  \item 
+  In batch mode, do nothing.  In interactive mode,
+  examine the supplied @[[Linkable]]@ to determine which home 
+  module @Unlinked@s are new.  Remove precisely these @Linkable@s 
+  from PLS.  (In fact we really need to remove their upwards
+  transitive closure, but I think it is an invariant that CM will
+  supply an upwards transitive closure of new modules).
+  See below for descriptions of @Linkable@ and @Unlinked@.
+
+  \item 
+  Batch system: invoke the external linker to link everything in one go.
+  Interactive: bind the @Unlinked@s for the newly compiled modules,
+  plus those for any newly required packages, into PLS.
 
   Note that it is the linker's responsibility to remember which
-  objects and packages have already been linked.
+  objects and packages have already been linked.  By comparing this
+  with the @Linkable@s supplied to @link@, it can determine which
+  of the linkables in LI are out of date
 \end{itemize}
 
-If linking in of a group should fail for some reason, it is @link@'s
-responsibility to not modify its @LinkState@ at all.  In other words,
-linking each group is atomic; it either succeeds or fails.
+If linking in of a group should fail for some reason, @link@ should
+not modify its @LinkState@ at all.  In other words, linking each group
+is atomic; it either succeeds or fails.
 
-\subsection{Internal Data Structures}
+\subsubsection*{\mbox{\tt Unlinked} and \mbox{\tt Linkable}}
 
 Two important types: @Unlinked@ and @Linkable@.  The latter is a 
 higher-level representation involving multiple of the former.
@@ -486,7 +597,7 @@ a linker could take as input:
                  | Trees  [StgTree RdrName]
 \end{verbatim}
 
-\noindent The first three describe the location of a file (presumably)
+The first three describe the location of a file (presumably)
 containing the code to link.  @Trees@, which only exists in
 interactive mode, gives a list of @StgTrees@, in which the unresolved
 references are @RdrNames@ -- hence it's non-linkedness.  Once linked,
@@ -501,115 +612,20 @@ with either a module or package:
                  | LP PkgName              -- a package
 \end{verbatim}
 
-\noindent The order of the @Unlinked@s in the list is important, as
+The order of the @Unlinked@s in the list is important, as
 they are linked in left-to-right order.  The @Unlinked@ objects for a
 particular package can be obtained from the package configuration (see
 Section \ref{sec:package-config}).
 
-\subsubsection{Contents of \texttt{LinkState}}
+\ToDo{When adding @Addr@s from an object module to SST, we need to
+      somehow find out the @RdrName@s of the symbols exported by that
+      module. 
+      So we'd need to pass in the @ModDetails@ or @ModIFace@ or some such?}
 
-The @LinkState@ is empty for batch compilation, where the linker
-doesn't need andy persistent state because there is only a single link
-step.
 
-In the interactive system, the @LinkState@ contains two symbol tables:
-
-\begin{itemize}
-\item \textbf{The Source Symbol Table}@ :: FiniteMap RdrName HValue@
-
-The source symbol table is used when linking interpreted code.
-Unlinked interpreted code consists of an abstract syntax tree where
-the leaves are @RdrNames@; the linker's job is to resolve these to
-actual addresses (the alternative is to resolve these lazily when the
-code is run, but this requires passing the full symbol table through
-the interpreter and the repeated lookups will probably be expensive).
-
-The source symbol table therefore maps @RdrName@s to @HValue@s, for
-every @RdrName@ that currently \emph{has} an @HValue@, including all
-exported functions from object code modules that are currently linked
-in.
-
-It is important that we can prune this symbol table by throwing away
-the mappings for an entire module, whenever we recompile/relink a
-given module.  The representation is therefore probably a two-level
-mapping, from module names, to function/constructor names, to
-@HValue@s.
-
-\item \textbf{The Object Symbol Table}@ :: FiniteMap String Addr@
-
-This is a lower level symbol table, mapping symbol names in object
-modules to their addresses in memory.  It is used only when resolving
-the external references in an object module, and contains only entries
-that are defined in object modules.
-\end{itemize}
-
-Why have two symbol tables?  Well, there is a clear distinction
-between the two: the source symbol table is mapping Haskell symbols to
-Haskell values, and the object symbol table is mapping object symbols
-to addresses.  There is some overlap, in that Haskell symbols
-certainly have addresses, and we could look up a Haskell symbol's
-address by manufacturing the right object symbol and looking that up
-in the object symbol table, but this is likely to be slow and would
-force us to extend the object symbol table with all the symbols
-``exported'' by interpreted code.  Doing it this way enables us to
-decouple the object management subsystem from the rest of the linker
-with a minimal interface; something like
-
-\begin{verbatim}
-  loadObject   :: Unlinked -> IO Object
-  unloadModule :: Unlinked -> IO ()
-  lookupSymbol :: String   -> IO Addr
-\end{verbatim}
-
-\noindent Rather unfortunately we need @lookupSymbol@ in order to
-populate the source symbol table when linking in a new compiled
-module.
-
-Our object management subsystem is currently written in C, so
-decoupling this interface as much as possible is highly desirable.
-
-The @LinkState@ also notionally contains the currently linked image:
-
-\begin{itemize}
-\item
-   {\bf Linked Image (LI)} @:: no-explicit-representation@
-
-   LI isn't explicitly represented in the system, but we record it
-   here for completeness anyway.  LI is the current set of
-   linked-together module, package and other library fragments
-   constituting the current executable mass.  LI comprises:
-   \begin{itemize}
-   \item Machine code (@.o@, @.a@, @.DLL@ file images) in memory.
-         These are loaded from disk when needed, and stored in
-         @malloc@ville.  To simplify storage management, they are
-         never freed or reused, since this creates serious
-         complications for storage management.  When no longer needed,
-         they are simply abandoned.  New linkings of the same object
-         code produces new copies in memory.  We hope this not to be
-         too much of a space leak.
-   \item STG trees, which live in the GHCI heap and are managed by the
-         storage manager in the usual way.  They are held alive (are 
-         reachable) via the @HValue@s in the OST.  Such @HValue@s are
-         applications of the interpreter function to the trees
-         themselves.  Linking a tree comprises travelling over the 
-         tree, replacing all the @Id@s with pointers directly to the
-         relevant @_closure@ labels, as determined by searching the
-         OST.  Once the leaves are linked, trees are wrapped with the
-         interpreter function.  The resulting @HValue@s then behave
-         indistinguishably from compiled versions of the same code.
-   \end{itemize}
-   Because object code is outside the heap and never deallocated,
-   whilst interpreted code is held alive by the OST, there's no need
-   to have a data structure which ``is'' the linked image.
-
-   For batch compilation, LI doesn't exist because OST doesn't exist,
-   and because @link@ doesn't load code into memory, instead just
-   invokes the system linker.
-
-   \ToDo{Do we need to say anything about CAFs and SRTs?  Probably ...}
-\end{itemize}
 
 \subsection{What CM does}
+\label{sec:compilation-manager}
 Pretty much as before.  
 
 Plus: detect module cycles during the downsweep.  During the upsweep,
author	sewardj <unknown>	2000-09-26 16:29:59 +0000
committer	sewardj <unknown>	2000-09-26 16:29:59 +0000
commit	603eccf76ca5df1732dd0744ec941f014e6e3805 (patch)
tree	6e557c6feeeacaba21e4984f02ed9e9c0cb0f5b6 /ghc/docs/ghci/ghci.tex
parent	1fb4b05b4e8e7e744a62595af6669af1a57c8779 (diff)
download	haskell-603eccf76ca5df1732dd0744ec941f014e6e3805.tar.gz