summaryrefslogtreecommitdiff
path: root/ghc/docs/ghci/ghci.tex
diff options
context:
space:
mode:
authorsewardj <unknown>2000-09-26 16:29:59 +0000
committersewardj <unknown>2000-09-26 16:29:59 +0000
commit603eccf76ca5df1732dd0744ec941f014e6e3805 (patch)
tree6e557c6feeeacaba21e4984f02ed9e9c0cb0f5b6 /ghc/docs/ghci/ghci.tex
parent1fb4b05b4e8e7e744a62595af6669af1a57c8779 (diff)
downloadhaskell-603eccf76ca5df1732dd0744ec941f014e6e3805.tar.gz
[project @ 2000-09-26 16:29:59 by sewardj]
Rearrange the Linker section a bit, in line with upcoming rearrangement of Compiler and CM sections.
Diffstat (limited to 'ghc/docs/ghci/ghci.tex')
-rw-r--r--ghc/docs/ghci/ghci.tex288
1 files changed, 152 insertions, 136 deletions
diff --git a/ghc/docs/ghci/ghci.tex b/ghc/docs/ghci/ghci.tex
index 62a1fb1ac9..6fab8a3fab 100644
--- a/ghc/docs/ghci/ghci.tex
+++ b/ghc/docs/ghci/ghci.tex
@@ -423,56 +423,167 @@ What @compile@ does: \ToDo{A bit vague ... needs refining. How does
boot interface against the inferred interface.}
\end{itemize}
-\section{Linking}
-\subsection{External API}
+
+\subsection{Linking}
+\label{sec:linker}
+
+\subsubsection{The linker's private data structures}
+
+In the same way that @compile@ has a persistent compiler state (PCS),
+the linker has a persistent (session-lifetime) state, LPS, the
+Linker's Persistent State. In batch mode LPS is entirely irrelevant,
+because there is only a single link step, and can be a unit value
+ignored by everybody. In interactive mode LPS is composed of the
+following three parts:
+
+\begin{itemize}
+\item
+\textbf{The Source Symbol Table (SST)}@ :: FiniteMap RdrName HValue@
+ The source symbol table is used when linking interpreted code.
+ Unlinked interpreted code consists of an STG tree where
+ the leaves are @RdrNames@. The linker's job is to resolve these to
+ actual addresses (the alternative is to resolve these lazily when
+ the code is run, but this requires passing the full symbol table
+ through the interpreter and the repeated lookups will probably be
+ expensive).
+
+ The source symbol table therefore maps @RdrName@s to @HValue@s, for
+ every @RdrName@ that currently \emph{has} an @HValue@, including all
+ exported functions from object code modules that are currently
+ linked in. Linking therefore turns a @StgTree RdrName@ into an
+ @StgTree HValue@.
+
+ It is important that we can prune this symbol table by throwing away
+ the mappings for an entire module, whenever we recompile/relink a
+ given module. The representation is therefore probably a two-level
+ mapping, from module names, to function/constructor names, to
+ @HValue@s.
+
+\item \textbf{The Object Symbol Table (OST)}@ :: FiniteMap String Addr@
+ This is a lower level symbol table, mapping symbol names in object
+ modules to their addresses in memory. It is used only when
+ resolving the external references in an object module, and contains
+ only entries that are defined in object modules.
+
+ Why have two symbol tables? Well, there is a clear distinction
+ between the two: the source symbol table maps Haskell symbols to
+ Haskell values, and the object symbol table maps object symbols to
+ addresses. There is some overlap, in that Haskell symbols certainly
+ have addresses, and we could look up a Haskell symbol's address by
+ manufacturing the right object symbol and looking that up in the
+ object symbol table, but this is likely to be slow and would force
+ us to extend the object symbol table with all the symbols
+ ``exported'' by interpreted code. Doing it this way enables us to
+ decouple the object management subsystem from the rest of the linker
+ with a minimal interface; something like
+
+ \begin{verbatim}
+ loadObject :: Unlinked -> IO Object
+ unloadModule :: Unlinked -> IO ()
+ lookupSymbol :: String -> IO Addr
+ \end{verbatim}
+
+ Rather unfortunately we need @lookupSymbol@ in order to populate the
+ source symbol table when linking in a new compiled module. Our
+ object management subsystem is currently written in C, so decoupling
+ this interface as much as possible is highly desirable.
+
+\item
+ {\bf Linked Image (LI)} @:: no-explicit-representation@
+
+ LI isn't explicitly represented in the system, but we record it
+ here for completeness anyway. LI is the current set of
+ linked-together module, package and other library fragments
+ constituting the current executable mass. LI comprises:
+ \begin{itemize}
+ \item Machine code (@.o@, @.a@, @.DLL@ file images) in memory.
+ These are loaded from disk when needed, and stored in
+ @malloc@ville. To simplify storage management, they are
+ never freed or reused, since this creates serious
+ complications for storage management. When no longer needed,
+ they are simply abandoned. New linkings of the same object
+ code produces new copies in memory. We hope this not to be
+ too much of a space leak.
+ \item STG trees, which live in the GHCI heap and are managed by the
+ storage manager in the usual way. They are held alive (are
+ reachable) via the @HValue@s in the OST. Such @HValue@s are
+ applications of the interpreter function to the trees
+ themselves. Linking a tree comprises travelling over the
+ tree, replacing all the @Id@s with pointers directly to the
+ relevant @_closure@ labels, as determined by searching the
+ OST. Once the leaves are linked, trees are wrapped with the
+ interpreter function. The resulting @HValue@s then behave
+ indistinguishably from compiled versions of the same code.
+ \end{itemize}
+ Because object code is outside the heap and never deallocated,
+ whilst interpreted code is held alive by the OST, there's no need
+ to have a data structure which ``is'' the linked image.
+
+ For batch compilation, LI doesn't exist because OST doesn't exist,
+ and because @link@ doesn't load code into memory, instead just
+ invokes the system linker.
+
+ \ToDo{Do we need to say anything about CAFs and SRTs? Probably ...}
+\end{itemize}
+
+
+\subsubsection{The linker's interface}
+
+In practice, the PLS might be hidden in the I/O monad rather
+than passed around explicitly. (The same might be true for PCS).
+Anyway:
\begin{verbatim}
- data LinkState -- abstract
+ data PCS -- as described above; opaque to everybody except the linker
- link :: [[Linkable]] -> LinkState -> IO LinkResult
+ link :: PCI -> ??? -> [[Linkable]] -> LinkState -> IO LinkResult
- data LinkResult = LinkOK LinkState
- | LinkErrs [SDoc] LinkState
+ data LinkResult = LinkOK LinkState
+ | LinkErrs LinkState [SDoc]
\end{verbatim}
-In practice, the @LinkState@ might be hidden in the I/O monad rather
-than passed around explicitly.
+CM uses @link@ as follows:
-The linker is used by the compilation manager as follows after
-repeatedly calling the compiler to compile all modules which are
-out-of-date, the linker is invoked. The @[[Linkable]]@ argument to
-@link@ represents the list of (recursive groups of) modules which have
-been newly compiled, along with @Linkable@s representing each of the
-packages in use (the compilation manager knows which external packages
-are referenced by the home package). The order of the list is
-important: it is sorted in such a way that linking any prefix of the
-list will result in an image with no unresolved references. Note that
-for batch linking there may be further restrictions; for example it
-may not be possible to link recursive groups containing libraries.
+After repeatedly using @compile@ to compile all modules which are
+out-of-date, the @link@ is invoked. The @[[Linkable]]@ argument to
+@link@ represents the list of (recursive groups of) home modules which
+have been newly compiled, along with @Linkable@s for each of
+the packages in use (the compilation manager knows which external
+packages are referenced by the home package). The order of the list
+is important: it is sorted in such a way that linking any prefix of
+the list will result in an image with no unresolved references. Note
+that for batch linking there may be further restrictions; for example
+it may not be possible to link recursive groups containing libraries.
-The linker must do the following when invoked via @link@:
+@link@ does the following:
\begin{itemize}
- \item Unlink any objects already in memory which correspond to
- modules which have just been recompiled (interactive system only).
- The objects which correspond to a module are obtained from the
- @Linkable@ (see below).
-
- \item Link the objects representing the newly compiled modules into
- memory, along with any packages which haven't already been brought
- in. In the batch system, this just means invoking the external
- linker to link everything in one go.
+ \item
+ In batch mode, do nothing. In interactive mode,
+ examine the supplied @[[Linkable]]@ to determine which home
+ module @Unlinked@s are new. Remove precisely these @Linkable@s
+ from PLS. (In fact we really need to remove their upwards
+ transitive closure, but I think it is an invariant that CM will
+ supply an upwards transitive closure of new modules).
+ See below for descriptions of @Linkable@ and @Unlinked@.
+
+ \item
+ Batch system: invoke the external linker to link everything in one go.
+ Interactive: bind the @Unlinked@s for the newly compiled modules,
+ plus those for any newly required packages, into PLS.
Note that it is the linker's responsibility to remember which
- objects and packages have already been linked.
+ objects and packages have already been linked. By comparing this
+ with the @Linkable@s supplied to @link@, it can determine which
+ of the linkables in LI are out of date
\end{itemize}
-If linking in of a group should fail for some reason, it is @link@'s
-responsibility to not modify its @LinkState@ at all. In other words,
-linking each group is atomic; it either succeeds or fails.
+If linking in of a group should fail for some reason, @link@ should
+not modify its @LinkState@ at all. In other words, linking each group
+is atomic; it either succeeds or fails.
-\subsection{Internal Data Structures}
+\subsubsection*{\mbox{\tt Unlinked} and \mbox{\tt Linkable}}
Two important types: @Unlinked@ and @Linkable@. The latter is a
higher-level representation involving multiple of the former.
@@ -486,7 +597,7 @@ a linker could take as input:
| Trees [StgTree RdrName]
\end{verbatim}
-\noindent The first three describe the location of a file (presumably)
+The first three describe the location of a file (presumably)
containing the code to link. @Trees@, which only exists in
interactive mode, gives a list of @StgTrees@, in which the unresolved
references are @RdrNames@ -- hence it's non-linkedness. Once linked,
@@ -501,115 +612,20 @@ with either a module or package:
| LP PkgName -- a package
\end{verbatim}
-\noindent The order of the @Unlinked@s in the list is important, as
+The order of the @Unlinked@s in the list is important, as
they are linked in left-to-right order. The @Unlinked@ objects for a
particular package can be obtained from the package configuration (see
Section \ref{sec:package-config}).
-\subsubsection{Contents of \texttt{LinkState}}
+\ToDo{When adding @Addr@s from an object module to SST, we need to
+ somehow find out the @RdrName@s of the symbols exported by that
+ module.
+ So we'd need to pass in the @ModDetails@ or @ModIFace@ or some such?}
-The @LinkState@ is empty for batch compilation, where the linker
-doesn't need andy persistent state because there is only a single link
-step.
-In the interactive system, the @LinkState@ contains two symbol tables:
-
-\begin{itemize}
-\item \textbf{The Source Symbol Table}@ :: FiniteMap RdrName HValue@
-
-The source symbol table is used when linking interpreted code.
-Unlinked interpreted code consists of an abstract syntax tree where
-the leaves are @RdrNames@; the linker's job is to resolve these to
-actual addresses (the alternative is to resolve these lazily when the
-code is run, but this requires passing the full symbol table through
-the interpreter and the repeated lookups will probably be expensive).
-
-The source symbol table therefore maps @RdrName@s to @HValue@s, for
-every @RdrName@ that currently \emph{has} an @HValue@, including all
-exported functions from object code modules that are currently linked
-in.
-
-It is important that we can prune this symbol table by throwing away
-the mappings for an entire module, whenever we recompile/relink a
-given module. The representation is therefore probably a two-level
-mapping, from module names, to function/constructor names, to
-@HValue@s.
-
-\item \textbf{The Object Symbol Table}@ :: FiniteMap String Addr@
-
-This is a lower level symbol table, mapping symbol names in object
-modules to their addresses in memory. It is used only when resolving
-the external references in an object module, and contains only entries
-that are defined in object modules.
-\end{itemize}
-
-Why have two symbol tables? Well, there is a clear distinction
-between the two: the source symbol table is mapping Haskell symbols to
-Haskell values, and the object symbol table is mapping object symbols
-to addresses. There is some overlap, in that Haskell symbols
-certainly have addresses, and we could look up a Haskell symbol's
-address by manufacturing the right object symbol and looking that up
-in the object symbol table, but this is likely to be slow and would
-force us to extend the object symbol table with all the symbols
-``exported'' by interpreted code. Doing it this way enables us to
-decouple the object management subsystem from the rest of the linker
-with a minimal interface; something like
-
-\begin{verbatim}
- loadObject :: Unlinked -> IO Object
- unloadModule :: Unlinked -> IO ()
- lookupSymbol :: String -> IO Addr
-\end{verbatim}
-
-\noindent Rather unfortunately we need @lookupSymbol@ in order to
-populate the source symbol table when linking in a new compiled
-module.
-
-Our object management subsystem is currently written in C, so
-decoupling this interface as much as possible is highly desirable.
-
-The @LinkState@ also notionally contains the currently linked image:
-
-\begin{itemize}
-\item
- {\bf Linked Image (LI)} @:: no-explicit-representation@
-
- LI isn't explicitly represented in the system, but we record it
- here for completeness anyway. LI is the current set of
- linked-together module, package and other library fragments
- constituting the current executable mass. LI comprises:
- \begin{itemize}
- \item Machine code (@.o@, @.a@, @.DLL@ file images) in memory.
- These are loaded from disk when needed, and stored in
- @malloc@ville. To simplify storage management, they are
- never freed or reused, since this creates serious
- complications for storage management. When no longer needed,
- they are simply abandoned. New linkings of the same object
- code produces new copies in memory. We hope this not to be
- too much of a space leak.
- \item STG trees, which live in the GHCI heap and are managed by the
- storage manager in the usual way. They are held alive (are
- reachable) via the @HValue@s in the OST. Such @HValue@s are
- applications of the interpreter function to the trees
- themselves. Linking a tree comprises travelling over the
- tree, replacing all the @Id@s with pointers directly to the
- relevant @_closure@ labels, as determined by searching the
- OST. Once the leaves are linked, trees are wrapped with the
- interpreter function. The resulting @HValue@s then behave
- indistinguishably from compiled versions of the same code.
- \end{itemize}
- Because object code is outside the heap and never deallocated,
- whilst interpreted code is held alive by the OST, there's no need
- to have a data structure which ``is'' the linked image.
-
- For batch compilation, LI doesn't exist because OST doesn't exist,
- and because @link@ doesn't load code into memory, instead just
- invokes the system linker.
-
- \ToDo{Do we need to say anything about CAFs and SRTs? Probably ...}
-\end{itemize}
\subsection{What CM does}
+\label{sec:compilation-manager}
Pretty much as before.
Plus: detect module cycles during the downsweep. During the upsweep,