author: Simon Marlow <simonmar@microsoft.com>, 2006-04-07 02:05:11 +0000
committer: Simon Marlow <simonmar@microsoft.com>, 2006-04-07 02:05:11 +0000
commit: 0065d5ab628975892cea1ec7303f968c3338cbe1
tree: 8e2afe0ab48ee33cf95009809d67c9649573ef92 /docs/ghci/ghci.tex
parent: 28a464a75e14cece5db40f2765a29348273ff2d2
Reorganisation of the source tree
Most of the other users of the fptools build system have migrated to Cabal, and with the move to darcs we can now flatten the source tree without losing history, so here goes. The main change is that the ghc/ subdir is gone, and most of what it contained is now at the top level. The build system now makes no pretense at being multi-project, it is just the GHC build system. No doubt this will break many things, and there will be a period of instability while we fix the dependencies. A straightforward build should work, but I haven't yet fixed binary/source distributions. Changes to the Building Guide will follow, too.
Diffstat (limited to 'docs/ghci/ghci.tex')
-rw-r--r-- docs/ghci/ghci.tex 1598
1 files changed, 1598 insertions, 0 deletions
diff --git a/docs/ghci/ghci.tex b/docs/ghci/ghci.tex
new file mode 100644
index 0000000000..c4638a6719
--- /dev/null
+++ b/docs/ghci/ghci.tex
@@ -0,0 +1,1598 @@
+%
+% (c) The OBFUSCATION-THROUGH-GRATUITOUS-PREPROCESSOR-ABUSE Project,
+% Glasgow University, 1990-2000
+%
+
+% \documentstyle[preprint]{acmconf}
+\documentclass[11pt]{article}
+\oddsidemargin 0.1 in % Note that \oddsidemargin = \evensidemargin
+\evensidemargin 0.1 in
+\marginparwidth 0.85in % Narrow margins require narrower marginal notes
+\marginparsep 0 in
+\sloppy
+
+%\usepackage{epsfig}
+\usepackage{shortvrb}
+\MakeShortVerb{\@}
+
+%\newcommand{\note}[1]{{\em Note: #1}}
+\newcommand{\note}[1]{{{\bf Note:}\sl #1}}
+\newcommand{\ToDo}[1]{{{\bf ToDo:}\sl #1}}
+\newcommand{\Arg}[1]{\mbox{${\tt arg}_{#1}$}}
+\newcommand{\bottom}{\perp}
+
+\newcommand{\secref}[1]{Section~\ref{sec:#1}}
+\newcommand{\figref}[1]{Figure~\ref{fig:#1}}
+\newcommand{\Section}[2]{\section{#1}\label{sec:#2}}
+\newcommand{\Subsection}[2]{\subsection{#1}\label{sec:#2}}
+\newcommand{\Subsubsection}[2]{\subsubsection{#1}\label{sec:#2}}
+
+% DIMENSION OF TEXT:
+\textheight 8.5 in
+\textwidth 6.25 in
+
+\topmargin 0 in
+\headheight 0 in
+\headsep .25 in
+
+
+\setlength{\parskip}{0.15cm}
+\setlength{\parsep}{0.15cm}
+\setlength{\topsep}{0cm} % Reduces space before and after verbatim,
+ % which is implemented using trivlist
+\setlength{\parindent}{0cm}
+
+\renewcommand{\textfraction}{0.2}
+\renewcommand{\floatpagefraction}{0.7}
+
+\begin{document}
+
+\title{The GHCi Draft Design, round 2}
+\author{MSR Cambridge Haskell Crew \\
+ Microsoft Research Ltd., Cambridge}
+
+\maketitle
+
+%%%\tableofcontents
+%%%\newpage
+
+%%-----------------------------------------------------------------%%
+\section{Details}
+
+\subsection{Outline of the design}
+\label{sec:details-intro}
+
+The design falls into three major parts:
+\begin{itemize}
+\item The compilation manager (CM), which coordinates the
+ system and supplies a HEP-like interface to clients.
+\item The module compiler (@compile@), which translates individual
+ modules to interpretable or machine code.
+\item The linker (@link@),
+ which maintains the executable image in interpreted mode.
+\end{itemize}
+
+There are also three auxiliary parts: the finder, which locates
+source, object and interface files; the summariser, which quickly
+finds dependency information for modules; and the static info
+(compiler flags and package details), which is unchanged over the
+course of a session.
+
+This section continues with an overview of the session-lifetime data
+structures. Then follows the finder (section~\ref{sec:finder}),
+summariser (section~\ref{sec:summariser}),
+static info (section~\ref{sec:staticinfo}),
+and finally the three big sections
+(\ref{sec:manager},~\ref{sec:compiler},~\ref{sec:linker})
+on the compilation manager, compiler and linker respectively.
+
+\subsubsection*{Some terminology}
+
+Lifetimes: the phrase {\bf session lifetime} covers a complete run of
+GHCI, encompassing multiple recompilation runs. {\bf Module lifetime}
+is a lot shorter, being that of data needed to translate a single
+module, but then discarded, for example Core, AbstractC, Stix trees.
+
+Data structures with module lifetime are well documented and understood.
+This document is mostly concerned with session-lifetime data.
+Most of these structures are ``owned'' by CM, since that's
+the only major component of GHCI which deals with session-lifetime
+issues.
+
+Modules and packages: {\bf home} refers to modules in this package,
+precisely the ones tracked and updated by the compilation manager.
+{\bf Package} refers to all other packages, which are assumed static.
+
+\subsubsection*{A summary of all session-lifetime data structures}
+
+These structures have session lifetime but not necessarily global
+visibility. Subsequent sections elaborate who can see what.
+\begin{itemize}
+\item {\bf Home Symbol Table (HST)} (owner: CM) holds the post-renaming
+ environments created by compiling each home module.
+\item {\bf Home Interface Table (HIT)} (owner: CM) holds in-memory
+ representations of the interface file created by compiling
+ each home module.
+\item {\bf Unlinked Images (UI)} (owner: CM) are executable but as-yet
+ unlinked translations of home modules only.
+\item {\bf Module Graph (MG)} (owner: CM) is the current module graph.
+\item {\bf Static Info (SI)} (owner: CM) is the package configuration
+ information (PCI) and compiler flags (FLAGS).
+\item {\bf Persistent Compiler State (PCS)} (owner: @compile@)
+ is @compile@'s private cache of information about package
+ modules.
+\item {\bf Persistent Linker State (PLS)} (owner: @link@) is
+ @link@'s private information concerning the current
+ state of the (in-memory) executable image.
+\end{itemize}
+
+
+%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
+\subsection{The finder (\mbox{\tt type Finder})}
+\label{sec:finder}
+
+@Path@ could be an indication of a location in a filesystem, or it
+could be some more generic kind of resource identifier, a URL for
+example.
+\begin{verbatim}
+ data Path = ...
+\end{verbatim}
+
+And some names. @Module@s are now used as primary keys for various
+maps, so they are given a @Unique@.
+\begin{verbatim}
+ type ModName = String -- a module name
+ type PkgName = String -- a package name
+ type Module = -- contains ModName and a Unique, at least
+\end{verbatim}
+
+A @ModLocation@ says where a module is, what it's called and in what
+form it is.
+\begin{verbatim}
+ data ModLocation = SourceOnly Module Path -- .hs
+ | ObjectCode Module Path Path -- .o, .hi
+ | InPackage Module PkgName
+ -- examine PCI to determine package Path
+\end{verbatim}
+
+The module finder generates @ModLocation@s from @ModName@s. We expect
+it will assume packages to be static, but we want to be able to track
+changes in home modules during the session. Specifically, we want to
+be able to notice that a module's object and interface have been
+updated, presumably by a compile run outside of the GHCI session.
+Hence the two-stage type:
+\begin{verbatim}
+ type Finder = ModName -> IO ModLocation
+ newFinder :: PCI -> IO Finder
+\end{verbatim}
+@newFinder@ examines the package information right at the start, but
+returns an @IO@-typed function which can inspect home module changes
+later in the session.
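+
+The two-stage shape can be sketched as follows. This is only an
+illustration under stated assumptions: the types are simplified
+stand-ins (no @Unique@, no @Unlinked@), the finder returns a @Maybe@
+so that a missing module is not an error, the file-existence probe
+@exists@ is a hypothetical parameter, home modules are looked up by
+flat file name, and preferring source over object files is a guess
+rather than a design decision.

```haskell
import qualified Data.Map as Map

type ModName = String
type PkgName = String
type Path    = FilePath

-- Simplified stand-in for PkgInfo: name, base path, modules supplied.
data PkgInfo = PkgInfo PkgName Path [ModName]

data ModLocation
  = SourceOnly ModName Path        -- .hs
  | ObjectCode ModName Path Path   -- .o, .hi
  | InPackage  ModName PkgName
  deriving (Eq, Show)

-- Assumption: Nothing means "not found"; the text's Finder has no Maybe.
type Finder = ModName -> IO (Maybe ModLocation)

-- Stage 1 runs once and indexes the (static) package modules.
-- Stage 2 is the returned function, which re-probes home files on
-- every call and so notices changes made outside the session.
newFinder :: (Path -> IO Bool) -> [PkgInfo] -> IO Finder
newFinder exists pci = do
  let pkgIndex = Map.fromList [ (m, p) | PkgInfo p _ ms <- pci, m <- ms ]
  return $ \m -> case Map.lookup m pkgIndex of
    Just p  -> return (Just (InPackage m p))
    Nothing -> do
      haveHs <- exists (m ++ ".hs")
      haveO  <- exists (m ++ ".o")
      haveHi <- exists (m ++ ".hi")
      return $
        if haveHs then Just (SourceOnly m (m ++ ".hs"))
        else if haveO && haveHi then Just (ObjectCode m (m ++ ".o") (m ++ ".hi"))
        else Nothing
```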
+
+
+%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
+\subsection{The summariser (\mbox{\tt summarise})}
+\label{sec:summariser}
+
+A @ModSummary@ records the minimum information needed to establish the
+module graph and determine whose source has changed. @ModSummary@s
+can be created quickly.
+\begin{verbatim}
+ data ModSummary = ModSummary
+ ModLocation -- location and kind
+ (Maybe (String, Fingerprint))
+ -- source and fingerprint if .hs
+ (Maybe [ModName]) -- imports if .hs or .hi
+
+ type Fingerprint = ... -- file timestamp, or source checksum?
+
+ summarise :: ModLocation -> IO ModSummary
+\end{verbatim}
+
+The summary contains the location and source text, and the location
+contains the name. We would like to remove the assumption that
+sources live on disk, but I'm not sure this is good enough yet.
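+
+To give a feel for why summaries are cheap, here is a rough sketch of
+the two pieces of information a summary needs from source text. It is
+not the intended implementation: @importsOf@ and @fingerprint@ are
+hypothetical names, the import scan assumes one import declaration per
+line (a real summariser would presumably use the lexer), and the
+checksum is a toy stand-in for whatever @Fingerprint@ turns out to be.

```haskell
type ModName     = String
type Fingerprint = Int

-- Collect the module names of import declarations, assuming each
-- import sits on a line of its own. An optional "qualified" keyword
-- is skipped and an attached import list such as "E(f)" is cut off.
importsOf :: String -> [ModName]
importsOf src =
  [ takeWhile (/= '(') name
  | l <- lines src
  , ("import" : rest) <- [words l]
  , name : _ <- [dropWhile (== "qualified") rest]
  ]

-- A toy polynomial checksum over the source text.
fingerprint :: String -> Fingerprint
fingerprint = foldl (\h c -> (h * 31 + fromEnum c) `mod` 1000000007) 7

-- The source has changed exactly when the fingerprints differ
-- (ignoring checksum collisions).
sourceChanged :: Fingerprint -> String -> Bool
sourceChanged old src = fingerprint src /= old
```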
+
+\ToDo{Should @ModSummary@ contain source text for interface files too?}
+\ToDo{Also say that @ModIFace@ contains its module's @ModSummary@ (why?).}
+
+
+%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
+\subsection{Static information (SI)}
+\label{sec:staticinfo}
+
+PCI, the package configuration information, is a list of @PkgInfo@,
+each containing at least the following:
+\begin{verbatim}
+ data PkgInfo
+ = PkgInfo PkgName -- my name
+ Path -- path to my base location
+ [PkgName] -- who I depend on
+ [ModName] -- modules I supply
+ [Unlinked] -- paths to my object files
+
+ type PCI = [PkgInfo]
+\end{verbatim}
+The @Path@s in it, including those in the @Unlinked@s, are set up
+when GHCI starts.
+
+FLAGS is a bunch of compiler options. We haven't figured out yet how
+to partition them into those for the whole session vs those for
+specific source files, so currently the best we can do is:
+\begin{verbatim}
+ data FLAGS = ...
+\end{verbatim}
+
+The static information (SI) is both of these:
+\begin{verbatim}
+ data SI = SI PCI
+ FLAGS
+\end{verbatim}
+
+
+
+%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
+\subsection{The Compilation Manager (CM)}
+\label{sec:manager}
+
+\subsubsection{Data structures owned by CM}
+
+CM maintains two maps (HST, HIT) and a set (UI). It's important to
+realise that CM only knows about the map/set-ness, and has no idea
+what a @ModDetails@, @ModIFace@ or @Linkable@ is. Only @compile@ and
+@link@ know that, and CM passes these types around without
+inspecting them.
+
+\begin{itemize}
+\item
+ {\bf Home Symbol Table (HST)} @:: FiniteMap Module ModDetails@
+
+ The @ModDetails@ (a couple of layers down) contain tycons, classes,
+ instances, etc, collectively known as ``entities''. References from
+ other modules to these entities are direct, with no intervening
+ indirections of any kind; conversely, these entities refer directly
+ to other entities, regardless of module boundaries. HST only holds
+ information for home modules; the corresponding wired-up details
+ for package (non-home) modules are created on demand in the package
+ symbol table (PST) inside the persistent compiler's state (PCS).
+
+ CM maintains the HST, which is passed to, but not modified by,
+ @compile@. If compilation of a module is successful, @compile@
+ returns the resulting @ModDetails@ (inside the @CompResult@) which
+ CM then adds to HST.
+
+ CM throws away arbitrarily large parts of HST at the start of a
+ rebuild, and uses @compile@ to incrementally reconstruct it.
+
+\item
+ {\bf Home Interface Table (HIT)} @:: FiniteMap Module ModIFace@
+
+ (Completely private to CM; nobody else sees this).
+
+ Compilation of a module always creates a @ModIFace@, which contains
+ the unlinked symbol table entries. CM maintains this @FiniteMap@
+ @Module@ @ModIFace@, with session lifetime. CM never throws away
+ @ModIFace@s, but it does update them, by passing old ones to
+ @compile@ if they exist, and getting new ones back.
+
+ CM acquires @ModIFace@s from @compile@, which it only applies
+ to modules in the home package. As a result, HIT only contains
+ @ModIFace@s for modules in the home package. Those from other
+ packages reside in the package interface table (PIT) which is a
+ component of PCS.
+
+\item
+ {\bf Unlinked Images (UI)} @:: Set Linkable@
+
+ The @Linkable@s in UI represent executable but as-yet unlinked
+ module translations. A @Linkable@ can contain the name of an
+ object, archive or DLL file. In interactive mode, it may also be
+ the STG trees derived from translating a module. So @compile@
+ returns a @Linkable@ from each successful run, namely that of
+ translating the module at hand.
+
+ At link-time, CM supplies @link@ with the @Linkable@s for the upwards
+ closure of all modules which have changed. It also examines the
+ @ModSummary@s for all home modules, and by examining their imports
+ and the SI.PCI (package configuration info) it can determine the
+ @Linkable@s from all required imported packages too.
+
+ @Linkable@s and @ModIFace@s have a close relationship. Each
+ translated module has a corresponding @Linkable@ somewhere.
+ However, there may be @Linkable@s with no corresponding modules
+ (the RTS, for example). Conversely, multiple modules may share a
+ single @Linkable@ -- as is the case for any module from a
+ multi-module package. For these reasons it seems appropriate to
+ keep the two concepts distinct. @Linkable@s also provide
+ information about the sequence in which individual package
+ components should be linked, and that isn't the business of any
+ specific module to know.
+
+ CM passes @compile@ a module's old @ModIFace@, if it has one, in
+ the hope that the module won't need recompiling. In that case, @compile@
+ can just return the new @ModDetails@ created from it, and CM will
+ re-use the old @ModIFace@. If the module {\em is} recompiled (or
+ scheduled to be loaded from disk), @compile@ returns both the
+ new @ModIFace@ and new @Linkable@.
+
+\item
+ {\bf Module Graph (MG)} @:: known-only-to-CM@
+
+ Records, for CM's purposes, the current module graph,
+ up-to-dateness and summaries. More details when I get to them.
+ Only contains home modules.
+\end{itemize}
+Probably all this stuff is rolled together into the Persistent CM
+State (PCMS):
+\begin{verbatim}
+ data PCMS = PCMS HST HIT UI MG
+ emptyPCMS :: IO PCMS
+\end{verbatim}
+
+\subsubsection{What CM implements}
+It pretty much implements the HEP interface. First, though, define a
+containing structure for the state of the entire CM system and its
+subsystems @compile@ and @link@:
+\begin{verbatim}
+ data CmState
+ = CmState PCMS -- CM's stuff
+ PCS -- compile's stuff
+ PLS -- link's stuff
+ SI -- the static info, never changes
+ Finder -- the finder
+\end{verbatim}
+
+The @CmState@ is threaded through the HEP interface. In reality
+this might be done using @IORef@s, but for clarity:
+\begin{verbatim}
+ type ModHandle = ... (opaque to CM/HEP clients) ...
+ type HValue = ... (opaque to CM/HEP clients) ...
+
+ cmInit :: FLAGS
+ -> [PkgInfo]
+ -> IO CmState
+
+ cmLoadModule :: CmState
+ -> ModName
+ -> IO (CmState, Either [SDoc] ModHandle)
+
+ cmGetExpr :: ModHandle
+ -> CmState
+ -> String -> IO (CmState, Either [SDoc] HValue)
+
+ cmRunExpr :: HValue -> IO () -- don't need CmState here
+\end{verbatim}
+Almost all the huff and puff in this document pertains to @cmLoadModule@.
+
+
+\subsubsection{Implementing \mbox{\tt cmInit}}
+@cmInit@ creates an empty @CmState@ using @emptyPCMS@, @emptyPCS@ and
+@emptyPLS@. It makes SI from the supplied flags and package info, and
+creates the finder by passing the package info to @newFinder@.
+
+
+\subsubsection{Implementing \mbox{\tt cmLoadModule}}
+
+\begin{enumerate}
+\item {\bf Downsweep:} using @finder@ and @summarise@, chase from
+ the given module to
+ establish the new home module graph (MG). Do not chase into
+ package modules.
+\item Remove from HIT, HST, UI any modules in the old MG which are
+ not in the new one. The old MG is then replaced by the new one.
+\item Topologically sort MG to generate a bottom-to-top traversal
+ order, giving a worklist.
+\item {\bf Upsweep:} call @compile@ on each module in the worklist in
+ turn, passing it
+ the ``correct'' HST, PCS, the old @ModIFace@ if
+ available, and the summary. ``Correct'' HST in the sense that
+ HST contains only the modules in this module's downward
+ closure, so that @compile@ can construct the correct instance
+ and rule environments simply as the union of those in
+ the module's downward closure.
+
+ If @compile@ doesn't return a new interface/linkable pair,
+ compilation wasn't necessary. Either way, update HST with
+ the new @ModDetails@, and UI and HIT respectively if a
+ compilation {\em did} occur.
+
+ Keep going until the root module is successfully done, or
+ compilation fails.
+
+\item If the previous step terminated because compilation failed,
+ define the successful set as those modules in successfully
+ completed SCCs, i.e. all @Linkable@s returned by @compile@ excluding
+ those from modules in any cycle which includes the module which failed.
+ Remove from HST, HIT, UI and MG all modules mentioned in MG which
+ are not in the successful set. Call @link@ with the successful
+ set,
+ which should succeed. The net effect is to back off to a point
+ in which those modules which are still aboard are correctly
+ compiled and linked.
+
+ If the previous step terminated successfully,
+ call @link@ passing it the @Linkable@s in the upward closure of
+ all those modules for which @compile@ produced a new @Linkable@.
+\end{enumerate}
+As a small optimisation, do this:
+\begin{enumerate}
+\item[3a.] Remove from the worklist any module M where M's source
+ hasn't changed and neither has the source of any module in M's
+ downward closure. This has the effect of not starting the upsweep
+ right at the bottom of the graph when that's not needed.
+ Source-change checking can be done quickly by CM by comparing
+ summaries of modules in MG against corresponding
+ summaries from the old MG.
+\end{enumerate}
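+
+Steps 3 and 3a can be sketched with @Data.Graph@ from the containers
+library. This is an illustration only: the module graph is
+represented here as a plain map from home module names to their
+imports, and @worklist@, @downwardClosure@ and @pruneUnchanged@ are
+hypothetical names. Cyclic import groups come out as single SCCs,
+which is also the granularity at which step 5 backs off after a
+failure.

```haskell
import Data.Graph (SCC, flattenSCC, stronglyConnComp)
import qualified Data.Map as Map
import qualified Data.Set as Set

type ModName     = String
type ModuleGraph = Map.Map ModName [ModName]   -- module -> its imports

-- Step 3: bottom-to-top traversal order. stronglyConnComp returns the
-- components in reverse topological order, i.e. dependencies first.
-- Imports that are not home modules are dropped (no chasing into packages).
worklist :: ModuleGraph -> [SCC ModName]
worklist mg =
  stronglyConnComp
    [ (m, m, filter (`Map.member` mg) imps) | (m, imps) <- Map.toList mg ]

-- All home modules reachable from a module through imports, itself included.
downwardClosure :: ModuleGraph -> ModName -> Set.Set ModName
downwardClosure mg root = go Set.empty [root]
  where
    go seen [] = seen
    go seen (m : ms)
      | m `Set.member` seen = go seen ms
      | otherwise           = go (Set.insert m seen) (Map.findWithDefault [] m mg ++ ms)

-- Step 3a: keep only groups containing a module whose downward closure
-- includes at least one module with changed source.
pruneUnchanged :: ModuleGraph -> Set.Set ModName -> [SCC ModName] -> [SCC ModName]
pruneUnchanged mg changed = filter (any dirty . flattenSCC)
  where
    dirty m = not (Set.null (downwardClosure mg m `Set.intersection` changed))
```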
+
+
+%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
+\subsection{The compiler (\mbox{\tt compile})}
+\label{sec:compiler}
+
+\subsubsection{Data structures owned by \mbox{\tt compile}}
+
+{\bf Persistent Compiler State (PCS)} @:: known-only-to-compile@
+
+This contains info about foreign packages only, acting as a cache,
+which is private to @compile@. The cache never becomes out of
+date. There are three parts to it:
+
+ \begin{itemize}
+ \item
+ {\bf Package Interface Table (PIT)} @:: FiniteMap Module ModIFace@
+
+ @compile@ reads interfaces from modules in foreign packages, and
+ caches them in the PIT. Subsequent imports of the same module get
+ them directly out of the PIT, avoiding slow lexing/parsing phases.
+ Because foreign packages are assumed never to become out of date,
+ all contents of PIT remain valid forever. @compile@ of course
+ tries to find package interfaces in PIT in preference to reading
+ them from files.
+
+ Both successful and failed runs of @compile@ can add arbitrary
+ numbers of new interfaces to the PIT. The failed runs don't matter
+ because we assume that packages are static, so the data cached even
+ by a failed run is valid forever (i.e. for the rest of the session).
+
+ \item
+ {\bf Package Symbol Table (PST)} @:: FiniteMap Module ModDetails@
+
+ Adding a package interface to PIT doesn't make it directly usable
+ to @compile@, because it first needs to be wired (renamed +
+ typechecked) into the spaghetti of the HST. On the other hand,
+ most modules only use a few entities from any imported interface,
+ so wiring-in the interface at PIT-entry time might be a big time
+ waster. Also, wiring in an interface could mean reading other
+ interfaces, and we don't want to do that unnecessarily.
+
+ The PST avoids these problems by allowing incremental wiring-in to
+ happen. Pieces of foreign interfaces are copied out of the holding
+ pen (HP), renamed, typechecked, and placed in the PST, but only as
+ @compile@ discovers it needs them. In the process of incremental
+ renaming/typechecking, @compile@ may need to read more package
+ interfaces, which are added to the PIT and hence to
+ HP.~\ToDo{How? When?}
+
+ CM passes the PST to @compile@ and is returned an updated version
+ on both success and failure.
+
+ \item
+ {\bf Holding Pen (HP)} @:: HoldingPen@
+
+ HP holds parsed but not-yet renamed-or-typechecked fragments of
+ package interfaces. As typechecking of other modules progresses,
+ fragments are removed (``slurped'') from HP, renamed and
+ typechecked, and placed in PCS.PST (see above). Slurping a
+ fragment may require new interfaces to be read into HP. The hope
+ is, though, that many fragments will never get slurped, reducing
+ the total number of interfaces read (as compared to eager slurping).
+
+ \end{itemize}
+
+ PCS is opaque to CM; only @compile@ knows what's in it, and how to
+ update it. Because packages are assumed static, PCS never becomes
+ out of date. So CM only needs to be able to create an empty PCS,
+ with @emptyPCS@, and thence just passes it through @compile@ with
+ no further ado.
+
+ In return, @compile@ must promise not to store in PCS any
+ information pertaining to the home modules. If it did so, CM would
+ need to have a way to remove this information prior to commencing a
+ rebuild, which conflicts with PCS's opaqueness to CM.
+
+
+
+
+\subsubsection{What {\tt compile} does}
+@compile@ is necessarily somewhat complex. We've decided to do away
+with private global variables -- they make the design specification
+less clear, although the implementation might use them. Without
+further ado:
+\begin{verbatim}
+ compile :: SI -- obvious
+ -> Finder -- to find modules
+ -> ModSummary -- summary, including source
+ -> Maybe ModIFace
+ -- former summary, if avail
+ -> HST -- for home module ModDetails
+ -> PCS -- IN: the persistent compiler state
+
+ -> IO CompResult
+
+ data CompResult
+ = CompOK ModDetails -- new details (== HST additions)
+ (Maybe (ModIFace, Linkable))
+ -- summary and code; Nothing => compilation
+ -- not needed (old summary and code are still valid)
+ PCS -- updated PCS
+ [SDoc] -- warnings
+
+ | CompErrs PCS -- updated PCS
+ [SDoc] -- warnings and errors
+
+ data PCS
+ = MkPCS PIT -- package interfaces
+ PST -- post slurping global symtab contribs
+ HoldingPen -- pre slurping interface bits and pieces
+
+ emptyPCS :: IO PCS -- since CM has no other way to make one
+\end{verbatim}
+Although @compile@ is passed three of the global structures (FLAGS,
+HST and PCS), it only modifies PCS. The rest are modified by CM as it
+sees fit, from the stuff returned in the @CompResult@.
+
+@compile@ is allowed to return an updated PCS even if compilation
+errors occur, since the information in it pertains only to foreign
+packages and is assumed to be always-correct.
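+
+The way CM might consume a @CompResult@ during the upsweep can be
+sketched as below. The types are opaque placeholders, UI is keyed by
+module here purely so that an old @Linkable@ is easy to replace, and
+@CmTables@ and @applyResult@ are hypothetical names; the point is only
+which table changes in which case.

```haskell
import qualified Data.Map as Map

type Module = String

-- Placeholders: CM never looks inside these.
newtype ModDetails = ModDetails String deriving (Eq, Show)
newtype ModIFace   = ModIFace String   deriving (Eq, Show)
newtype Linkable   = Linkable String   deriving (Eq, Show)
newtype PCS        = PCS Int           deriving (Eq, Show)

data CompResult
  = CompOK ModDetails (Maybe (ModIFace, Linkable)) PCS [String]
  | CompErrs PCS [String]

data CmTables = CmTables
  { hst :: Map.Map Module ModDetails
  , hit :: Map.Map Module ModIFace
  , ui  :: Map.Map Module Linkable
  , pcs :: PCS
  } deriving (Eq, Show)

-- Fold one result into CM's tables; the Bool says whether to carry on.
--  * PCS is always replaced, even after errors.
--  * HST always receives the new details on success.
--  * HIT and UI change only if a compilation actually happened.
applyResult :: Module -> CompResult -> CmTables -> (CmTables, Bool)
applyResult _ (CompErrs pcs' _) t = (t { pcs = pcs' }, False)
applyResult m (CompOK details new pcs' _) t = (t', True)
  where
    t0 = t { hst = Map.insert m details (hst t), pcs = pcs' }
    t' = case new of
      Nothing         -> t0
      Just (ifc, lnk) -> t0 { hit = Map.insert m ifc (hit t0)
                            , ui  = Map.insert m lnk (ui t0) }
```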
+
+What @compile@ does: \ToDo{A bit vague ... needs refining. How does
+ @finder@ come into the game?}
+\begin{itemize}
+\item Figure out if this module needs recompilation.
+ \begin{itemize}
+ \item If there's no old @ModIFace@, it does. Else:
+ \item Compare the @ModSummary@ supplied with that in the
+ old @ModIFace@. If the source has changed, recompilation
+ is needed. Else:
+ \item Compare the usage version numbers in the old @ModIFace@ with
+ those in the imported @ModIFace@s. All needed interfaces
+ for this should be in either HIT or PIT. If any version
+ numbers differ, recompilation is needed.
+ \item Otherwise it isn't needed.
+ \end{itemize}
+
+\item
+ If recompilation is not needed, create a new @ModDetails@ from the
+ old @ModIFace@, looking up information in HST and PCS.PST as
+ necessary. Return the new details, a @Nothing@ denoting
+ compilation was not needed, the PCS \ToDo{I don't think the PCS
+ should be updated, but who knows?}, and an empty warning list.
+
+\item
+ Otherwise, compilation is needed.
+
+ If the module is only available in object+interface form, read the
+ interface, make up details, create a linkable pointing at the
+ object code. \ToDo{Does this involve reading any more interfaces? Does
+ it involve updating PST?}
+
+ Otherwise, translate from source, then create and return: new
+ details, interface, linkable, updated PST, and warnings.
+
+ When looking for a new interface, search HST, then PCS.PIT, and only
+ then read from disk. In which case add the new interface(s) to
+ PCS.PIT.
+
+ \ToDo{If compiling a module with a boot-interface file, check the
+ boot interface against the inferred interface.}
+\end{itemize}
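+
+The recompilation check at the top of this list amounts to a short
+decision procedure, sketched below. @OldIface@ is a hypothetical slice
+of an old @ModIFace@ (the fingerprint of the source it was built from,
+plus the versions of the imports it was compiled against), and the map
+of current versions stands for whatever is found in HIT or PIT.
+Treating an import with no known current version as changed is an
+assumption.

```haskell
import qualified Data.Map as Map

type ModName     = String
type Version     = Int
type Fingerprint = Int

-- Hypothetical slice of an old interface.
data OldIface = OldIface
  { builtFrom :: Fingerprint            -- source it was compiled from
  , usages    :: [(ModName, Version)]   -- import versions it was compiled against
  }

data Verdict
  = NoOldIface            -- recompile: nothing to compare against
  | SourceChanged         -- recompile: the source text differs
  | UsageChanged ModName  -- recompile: this import's version differs
  | UpToDate              -- no recompilation needed
  deriving (Eq, Show)

-- The checks run in the order given in the text; the first hit wins.
checkRecomp :: Maybe OldIface -> Fingerprint -> Map.Map ModName Version -> Verdict
checkRecomp Nothing _ _ = NoOldIface
checkRecomp (Just old) srcFp current
  | builtFrom old /= srcFp = SourceChanged
  | otherwise =
      case [ m | (m, v) <- usages old, Map.lookup m current /= Just v ] of
        (m : _) -> UsageChanged m
        []      -> UpToDate
```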
+
+
+\subsubsection{Contents of \mbox{\tt ModDetails},
+ \mbox{\tt ModIFace} and \mbox{\tt HoldingPen}}
+Only @compile@ can see inside these three types -- they are opaque to
+everyone else. @ModDetails@ holds the post-renaming,
+post-typechecking environment created by compiling a module.
+
+\begin{verbatim}
+ data ModDetails
+ = ModDetails {
+ moduleExports :: Avails,
+ moduleEnv :: GlobalRdrEnv, -- == FM RdrName [Name]
+ typeEnv :: FM Name TyThing, -- TyThing is in TcEnv.lhs
+ instEnv :: InstEnv,
+ fixityEnv :: FM Name Fixity,
+ ruleEnv :: FM Id [Rule]
+ }
+\end{verbatim}
+
+@ModIFace@ is nearly the same as @ParsedIface@ from @RnMonad.lhs@:
+\begin{verbatim}
+ type ModIFace = ParsedIface -- not really, but ...
+ data ParsedIface
+ = ParsedIface {
+ pi_mod :: Module, -- Complete with package info
+ pi_vers :: Version, -- Module version number
+ pi_orphan :: WhetherHasOrphans, -- Whether this module has orphans
+ pi_usages :: [ImportVersion OccName], -- Usages
+ pi_exports :: [ExportItem], -- Exports
+ pi_insts :: [RdrNameInstDecl], -- Local instance declarations
+ pi_decls :: [(Version, RdrNameHsDecl)], -- Local definitions
+ pi_fixity :: (Version, [RdrNameFixitySig]), -- Local fixity declarations,
+ -- with their version
+ pi_rules :: (Version, [RdrNameRuleDecl]), -- Rules, with their version
+ pi_deprecs :: [RdrNameDeprecation] -- Deprecations
+ }
+\end{verbatim}
+
+@HoldingPen@ is a cleaned-up version of that found in @RnMonad.lhs@,
+retaining just the 3 pieces actually comprising the holding pen:
+\begin{verbatim}
+ data HoldingPen
+ = HoldingPen {
+ iDecls :: DeclsMap, -- A single, global map of Names to decls
+
+ iInsts :: IfaceInsts,
+ -- The as-yet un-slurped instance decls; this bag is depleted when we
+ -- slurp an instance decl so that we don't slurp the same one twice.
+ -- Each is 'gated' by the names that must be available before
+ -- this instance decl is needed.
+
+ iRules :: IfaceRules
+ -- Similar to instance decls, only for rules
+ }
+\end{verbatim}
+
+%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
+\subsection{The linker (\mbox{\tt link})}
+\label{sec:linker}
+
+\subsubsection{Data structures owned by the linker}
+
+In the same way that @compile@ has a persistent compiler state (PCS),
+the linker has a persistent (session-lifetime) state, PLS, the
+Linker's Persistent State. In batch mode PLS is entirely irrelevant,
+because there is only a single link step, and can be a unit value
+ignored by everybody. In interactive mode PLS is composed of the
+following three parts:
+
+\begin{itemize}
+\item
+\textbf{The Source Symbol Table (SST)}@ :: FiniteMap RdrName HValue@
+ The source symbol table is used when linking interpreted code.
+ Unlinked interpreted code consists of an STG tree where
+ the leaves are @RdrName@s. The linker's job is to resolve these to
+ actual addresses (the alternative is to resolve these lazily when
+ the code is run, but this requires passing the full symbol table
+ through the interpreter and the repeated lookups will probably be
+ expensive).
+
+ The source symbol table therefore maps @RdrName@s to @HValue@s, for
+ every @RdrName@ that currently \emph{has} an @HValue@, including all
+ exported functions from object code modules that are currently
+ linked in. Linking therefore turns a @StgTree RdrName@ into an
+ @StgTree HValue@.
+
+ It is important that we can prune this symbol table by throwing away
+ the mappings for an entire module, whenever we recompile/relink a
+ given module. The representation is therefore probably a two-level
+ mapping, from module names, to function/constructor names, to
+ @HValue@s.
+
+\item \textbf{The Object Symbol Table (OST)}@ :: FiniteMap String Addr@
+ This is a lower level symbol table, mapping symbol names in object
+ modules to their addresses in memory. It is used only when
+ resolving the external references in an object module, and contains
+ only entries that are defined in object modules.
+
+ Why have two symbol tables? Well, there is a clear distinction
+ between the two: the source symbol table maps Haskell symbols to
+ Haskell values, and the object symbol table maps object symbols to
+ addresses. There is some overlap, in that Haskell symbols certainly
+ have addresses, and we could look up a Haskell symbol's address by
+ manufacturing the right object symbol and looking that up in the
+ object symbol table, but this is likely to be slow and would force
+ us to extend the object symbol table with all the symbols
+ ``exported'' by interpreted code. Doing it this way enables us to
+ decouple the object management subsystem from the rest of the linker
+ with a minimal interface; something like
+
+ \begin{verbatim}
+ loadObject :: Unlinked -> IO Object
+ unloadModule :: Unlinked -> IO ()
+ lookupSymbol :: String -> IO Addr
+ \end{verbatim}
+
+ Rather unfortunately we need @lookupSymbol@ in order to populate the
+ source symbol table when linking in a new compiled module. Our
+ object management subsystem is currently written in C, so decoupling
+ this interface as much as possible is highly desirable.
+
+\item
+ {\bf Linked Image (LI)} @:: no-explicit-representation@
+
+ LI isn't explicitly represented in the system, but we record it
+ here for completeness anyway. LI is the current set of
+ linked-together module, package and other library fragments
+ constituting the current executable mass. LI comprises:
+ \begin{itemize}
+ \item Machine code (@.o@, @.a@, @.DLL@ file images) in memory.
+ These are loaded from disk when needed, and stored in
+ @malloc@ville. They are never freed or reused, since doing
+ so would seriously complicate storage management. When no
+ longer needed, they are simply abandoned. Each new linking of
+ the same object code produces a new copy in memory. We hope
+ this is not too much of a space leak.
+ \item STG trees, which live in the GHCI heap and are managed by the
+ storage manager in the usual way. They are held alive (are
+ reachable) via the @HValue@s in the SST. Such @HValue@s are
+ applications of the interpreter function to the trees
+ themselves. Linking a tree comprises travelling over the
+ tree, replacing all the @Id@s with pointers directly to the
+ relevant @_closure@ labels, as determined by searching the
+ OST. Once the leaves are linked, trees are wrapped with the
+ interpreter function. The resulting @HValue@s then behave
+ indistinguishably from compiled versions of the same code.
+ \end{itemize}
+ Because object code is outside the heap and never deallocated,
+ whilst interpreted code is held alive via the SST, there's no need
+ to have a data structure which ``is'' the linked image.
+
+ For batch compilation, LI doesn't exist because OST doesn't exist,
+ and because @link@ doesn't load code into memory, instead just
+ invokes the system linker.
+
+ \ToDo{Do we need to say anything about CAFs and SRTs? Probably ...}
+\end{itemize}
+As with PCS, CM has no way to create an initial PLS, so we supply
+@emptyPLS@ for that purpose.
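+
+The two-level representation suggested for the source symbol table
+could look roughly like this. The sketch is polymorphic in the value
+type so that it need not define @HValue@, and the names @SST@, @bind@,
+@lookupSST@ and @pruneModule@ are hypothetical. The outer map is what
+makes throwing away a whole module's mappings a single deletion.

```haskell
import qualified Data.Map as Map

type ModName = String
type OccName = String   -- function or constructor name within a module

-- Module name -> (entity name -> value).
newtype SST v = SST (Map.Map ModName (Map.Map OccName v))

emptySST :: SST v
emptySST = SST Map.empty

-- Add or overwrite one binding; Map.union is left-biased, so the
-- new binding wins over an existing one with the same name.
bind :: ModName -> OccName -> v -> SST v -> SST v
bind m n v (SST t) = SST (Map.insertWith Map.union m (Map.singleton n v) t)

lookupSST :: ModName -> OccName -> SST v -> Maybe v
lookupSST m n (SST t) = Map.lookup m t >>= Map.lookup n

-- Drop every mapping belonging to one module, as needed before relinking it.
pruneModule :: ModName -> SST v -> SST v
pruneModule m (SST t) = SST (Map.delete m t)
```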
+
+\subsubsection{The linker's interface}
+
+In practice, the PLS might be hidden in the I/O monad rather
+than passed around explicitly. (The same might be true for PCS).
+Anyway:
+
+\begin{verbatim}
+ data PLS -- as described above; opaque to everybody except the linker
+
+ link :: PCI -> ??? -> [[Linkable]] -> PLS -> IO LinkResult
+
+ data LinkResult = LinkOK PLS
+ | LinkErrs PLS [SDoc]
+
+ emptyPLS :: IO PLS -- since CM has no other way to make one
+\end{verbatim}
+
+CM uses @link@ as follows:
+
+After repeatedly using @compile@ to compile all modules which are
+out-of-date, @link@ is invoked. The @[[Linkable]]@ argument to
+@link@ represents the list of (recursive groups of) home modules which
+have been newly compiled, along with @Linkable@s for each of
+the packages in use (the compilation manager knows which external
+packages are referenced by the home package). The order of the list
+is important: it is sorted in such a way that linking any prefix of
+the list will result in an image with no unresolved references. Note
+that for batch linking there may be further restrictions; for example
+it may not be possible to link recursive groups containing libraries.
+
+@link@ does the following:
+
+\begin{itemize}
+ \item
+ In batch mode, do nothing. In interactive mode,
+ examine the supplied @[[Linkable]]@ to determine which home
+ module @Unlinked@s are new. Remove precisely these @Linkable@s
+ from PLS. (In fact we really need to remove their upwards
+ transitive closure, but I think it is an invariant that CM will
+ supply an upwards transitive closure of new modules).
+ See below for descriptions of @Linkable@ and @Unlinked@.
+
+ \item
+ Batch system: invoke the external linker to link everything in one go.
+ Interactive: bind the @Unlinked@s for the newly compiled modules,
+ plus those for any newly required packages, into PLS.
+
+ Note that it is the linker's responsibility to remember which
+ objects and packages have already been linked. By comparing this
+ with the @Linkable@s supplied to @link@, it can determine which
+ of the linkables in LI are out of date.
+\end{itemize}
+
+If linking of a group should fail for some reason, @link@ should
+not modify its PLS at all. In other words, linking each group
+is atomic; it either succeeds or fails.
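+
+The atomicity requirement can be sketched as follows. This is only an
+illustration under simplifying assumptions, not the linker's actual
+code: the types are toy stand-ins, and @linkGroup@ and @linkAll@ are
+hypothetical names.
+
+\begin{verbatim}
+import qualified Data.Set as Set
+
+-- Toy stand-ins (assumptions): the real PLS and Linkable are richer.
+type Linkable = String
+type PLS      = Set.Set Linkable
+
+-- Hypothetical: link one group. For the sake of the example, any
+-- name starting with '!' is treated as unlinkable.
+linkGroup :: PLS -> [Linkable] -> Either String PLS
+linkGroup pls grp
+  | any bad grp = Left ("cannot link group " ++ show grp)
+  | otherwise   = Right (foldr Set.insert pls grp)
+  where bad l = take 1 l == "!"
+
+-- Link groups in order; stop at the first failure and return the
+-- state as it was before the failing group was attempted.
+linkAll :: PLS -> [[Linkable]] -> (PLS, Maybe String)
+linkAll pls []         = (pls, Nothing)
+linkAll pls (g : rest) =
+  case linkGroup pls g of
+    Left err   -> (pls, Just err)
+    Right pls' -> linkAll pls' rest
+\end{verbatim}
+
+A failing group leaves the state exactly as it was after the last
+successful group; presumably that is the @PLS@ which @LinkErrs@ would
+carry.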
+
+\subsubsection*{\mbox{\tt Unlinked} and \mbox{\tt Linkable}}
+
+Two important types: @Unlinked@ and @Linkable@. The latter is a
+higher-level representation comprising several of the former.
+An @Unlinked@ is a reference to unlinked executable code, something
+a linker could take as input:
+
+\begin{verbatim}
+ data Unlinked = DotO Path
+ | DotA Path
+ | DotDLL Path
+ | Trees [StgTree RdrName]
+\end{verbatim}
+
+The first three describe the location of a file (presumably)
+containing the code to link. @Trees@, which only exists in
+interactive mode, gives a list of @StgTree@s, in which the unresolved
+references are @RdrName@s -- hence its non-linkedness. Once linked,
+those @RdrName@s are replaced with pointers to the machine code
+implementing them.
+
+A @Linkable@ gathers together several @Unlinked@s and associates them
+with either a module or package:
+
+\begin{verbatim}
+ data Linkable = LM Module [Unlinked] -- a module
+ | LP PkgName -- a package
+\end{verbatim}
+
+The order of the @Unlinked@s in the list is important, as
+they are linked in left-to-right order. The @Unlinked@ objects for a
+particular package can be obtained from the package configuration (see
+Section \ref{sec:staticinfo}).
+
+\ToDo{When adding @Addr@s from an object module to SST, we need to
+ somehow find out the @RdrName@s of the symbols exported by that
+ module.
+ So we'd need to pass in the @ModDetails@ or @ModIFace@ or some such?}
+
+
+
+%%-----------------------------------------------------------------%%
+\section{Background ideas}
+\subsubsection*{Out of date, but correct in spirit}
+
+\subsection{Restructuring the system}
+
+At the moment @hsc@ compiles one source module into C or assembly.
+This functionality is pushed inside a function called @compile@,
+introduced shortly. The main new chunk of code is CM, the compilation manager,
+which supervises multiple runs of @compile@ so as to create up-to-date
+translations of a whole bunch of modules, as quickly as possible.
+CM also employs some minor helper functions, @finder@, @summarise@ and
+@link@, to do its work.
+
+Our intent is to allow CM to be used as the basis either of a
+multi-module, batch mode compilation system, or to supply an
+interactive environment similar to that of Hugs.
+Only minor modifications to the behaviour of @compile@ and @link@
+are needed to give these different behaviours.
+
+CM and @compile@, and, for interactive use, an interpreter, are the
+main code components. The most important data structure is the global
+symbol table; much design effort has been expended thereupon.
+
+
+\subsection{How the global symbol table is implemented}
+
+The top level symbol table is a @FiniteMap@ @ModuleName@
+@ModuleDetails@. @ModuleDetails@ contains essentially the environment
+created by compiling a module. CM manages this finite map, adding and
+deleting module entries as required.
+
+The @ModuleDetails@ for a module @M@ contains descriptions of all
+tycons, classes, instances, values, unfoldings, etc (henceforth
+referred to as ``entities''), available from @M@. These are just
+trees in the GHCI heap. References from other modules to these
+entities are direct -- when you have a @TyCon@ in your hand, you really
+have a pointer directly to the @TyCon@ structure in the defining module,
+rather than some kind of index into a global symbol table. So there
+is a global symbol table, but it has a distributed (spaghetti-like?)
+nature.
+
+This gives fast and convenient access to tycon, class, instance,
+etc, information. But because there are no levels of indirection,
+there's a problem when we replace @M@ with an updated version of @M@.
+We then need to find all references to entities in the old @M@'s
+spaghetti, and replace them with pointers to the new @M@'s spaghetti.
+This problem motivates a large part of the design.
+
+
+
+\subsection{Implementing incremental recompilation -- simple version}
+Given the following module graph
+\begin{verbatim}
+           D
+          / \
+         /   \
+        B     C
+         \   /
+          \ /
+           A
+\end{verbatim}
+(@D@ imports @B@ and @C@, @B@ imports @A@, @C@ imports @A@) the aim is to do the
+least possible amount of compilation to bring @D@ back up to date. The
+simplest scheme we can think of is:
+\begin{itemize}
+\item {\bf Downsweep}:
+ starting with @D@, re-establish what the current module graph is
+ (it might have changed since last time). This means getting a
+ @ModuleSummary@ of @D@. The summary can be quickly generated,
+ contains @D@'s import lists, and gives some way of knowing whether
+ @D@'s source has changed since the last time it was summarised.
+
+ Transitively follow summaries from @D@, thereby establishing the
+ module graph.
+\item
+ Remove from the global symbol table (the @FiniteMap@ @ModuleName@
+ @ModuleDetails@) the upwards closure of all modules in this package
+ which are out-of-date with respect to their previous versions. Also
+ remove all modules no longer reachable from @D@.
+\item {\bf Upsweep}:
+ Starting at the lowest point in the still-in-date module graph,
+ start compiling upwards, towards @D@. At each module, call
+ @compile@, passing it a @FiniteMap@ @ModuleName@ @ModuleDetails@,
+ and getting a new @ModuleDetails@ for the module, which is added to
+ the map.
+
+ When compiling a module, the compiler must be able to know which
+ entries in the map are for modules in its strict downwards closure,
+ and which aren't, so that it can manufacture the instance
+ environment correctly (as union of instances in its downwards
+ closure).
+\item
+ Once @D@ has been compiled, invoke some kind of linking phase
+ if doing batch compilation. For interactive use, linking can be
+ done either all at the end, or incrementally as compilation proceeds.
+\end{itemize}
+In this simple world, recompilation visits the upwards closure of
+all changed modules. That means when a module @M@ is recompiled,
+we can be sure no-one has any references to entities in the old @M@,
+because modules importing @M@ will have already been removed from the
+top-level finite map in the second step above.
+
+The upshot is that we don't need to worry about updating links to @M@ in
+the global symbol table -- there shouldn't be any to update.
+\ToDo{What about mutually recursive modules?}
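+
+The ``upwards closure'' used above can be sketched as follows. This is
+a minimal illustration, assuming the module graph is held as a map from
+each module to its direct imports; @Graph@, @example@ and
+@upwardClosure@ are hypothetical names, not part of CM.
+
+\begin{verbatim}
+import qualified Data.Map as Map
+import qualified Data.Set as Set
+
+type ModuleName = String
+
+-- Assumed representation: each module mapped to its direct imports.
+type Graph = Map.Map ModuleName [ModuleName]
+
+-- The example graph: D imports B and C; B and C import A.
+example :: Graph
+example = Map.fromList
+  [("D", ["B", "C"]), ("B", ["A"]), ("C", ["A"]), ("A", [])]
+
+-- The changed modules plus everything that imports them,
+-- transitively. Iterates until no new importers are found.
+upwardClosure :: Graph -> Set.Set ModuleName -> Set.Set ModuleName
+upwardClosure g changed
+  | next == changed = changed
+  | otherwise       = upwardClosure g next
+  where
+    importers = Set.fromList
+      [ m | (m, imps) <- Map.toList g
+          , any (`Set.member` changed) imps ]
+    next = Set.union changed importers
+\end{verbatim}
+
+With this graph, a change to @B@ alone gives the closure @B@, @D@,
+whereas a change to @A@ gives all four modules.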
+
+CM will happily chase through module interfaces in other packages in
+the downsweep. But it will only process modules in this package
+during the upsweep. So it assumes that modules in other packages
+never become out of date. This is a design decision -- we could have
+decided otherwise.
+
+In fact we go further, and require other packages to be compiled,
+i.e. to consist of a collection of interface files, and one or more
+object files. CM will never apply @compile@ to a foreign package
+module, so there's no way a package can be built on the fly from source.
+
+We require @compile@ to cache foreign package interfaces it reads, so
+that subsequent uses don't have to re-read them. The cache never
+becomes out of date, since we've assumed that the source of foreign
+packages doesn't change during the course of a session (run of GHCI).
+As well as caching interfaces, @compile@ must cache, in some sense,
+the linkable code for modules. In batch compilation this might simply
+mean remembering the names of object files to link, whereas in
+interactive mode @compile@ probably needs to load object code into
+memory in preparation for in-memory linking.
+
+Important signatures for this simple scheme are:
+\begin{verbatim}
+ finder :: ModuleName -> ModLocation
+
+ summarise :: ModLocation -> IO ModSummary
+
+ compile :: ModSummary
+ -> FM ModName ModDetails
+ -> IO CompileResult
+
+ data CompileResult = CompOK ModDetails
+ | CompErr [ErrMsg]
+
+ link :: [ModLocation] -> [PackageLocation] -> IO Bool -- linked ok?
+\end{verbatim}
+
+
+\subsection{Implementing incremental recompilation -- clever version}
+
+So far, our upsweep, which is the computationally expensive bit,
+recompiles a module if either its source is out of date, or it
+imports a module which has been recompiled. Sometimes we know
+we can do better than this:
+\begin{verbatim}
+   module B where            module A where
+   import A ( f )            {-# NOINLINE f #-}
+   ... f ...                 f x = x + 42
+\end{verbatim}
+If the definition of @f@ is changed to @f x = x + 43@, the simple
+upsweep would recompile @B@ unnecessarily. We would like to detect
+this situation and avoid propagating recompilation all the way to the
+top. There are two parts to this: detecting when a module doesn't
+need recompilation, and managing inter-module references in the
+global symbol table.
+
+\subsubsection*{Detecting when a module doesn't need recompilation}
+
+To do this, we introduce a new concept: the @ModuleIFace@. This is
+effectively an in-memory interface file. References to entities in
+other modules are done via strings, rather than being pointers
+directly to those entities. Recall that, by comparison,
+@ModuleDetails@ do contain pointers directly to the entities they
+refer to. So a @ModuleIFace@ is not part of the global symbol table.
+
+As before, compiling a module produces a @ModuleDetails@ (inside the
+@CompileResult@), but it also produces a @ModuleIFace@. The latter
+records, amongst other things, the version numbers of all imported entities
+needed for the compilation of that module. @compile@ optionally also
+takes the old @ModuleIFace@ as input during compilation:
+\begin{verbatim}
+ data CompileResult = CompOK ModDetails ModIFace
+ | CompErr [ErrMsg]
+
+ compile :: ModSummary
+ -> FM ModName ModDetails
+ -> Maybe ModIFace
+ -> IO CompileResult
+\end{verbatim}
+Now, if the @ModuleSummary@ indicates this module's source hasn't
+changed, we only need to recompile it if something it depends on has
+changed. @compile@ can detect this by inspecting the imported entity
+version numbers in the module's old @ModuleIFace@, and comparing them
+with the version numbers from the entities in the modules being
+imported. If they are all the same, nothing it depends on has
+changed, so there's no point in recompiling.
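+
+A minimal sketch of this check, assuming version numbers are plain
+integers keyed by (module, entity) pairs. The type names and
+@importsChanged@ are hypothetical; the real @ModuleIFace@ holds much
+more than this.
+
+\begin{verbatim}
+import qualified Data.Map as Map
+
+type Entity  = (String, String)   -- (module name, entity name)
+type Version = Int
+
+-- Assumed view of the relevant part of the old ModuleIFace: the
+-- version of each imported entity seen at the last compilation.
+type UsedVersions = Map.Map Entity Version
+
+-- Current versions of the entities in the imported modules.
+type CurrentVersions = Map.Map Entity Version
+
+-- True if any entity used last time has since disappeared or
+-- acquired a different version number.
+importsChanged :: UsedVersions -> CurrentVersions -> Bool
+importsChanged used current = any differs (Map.toList used)
+  where
+    differs (ent, v) = Map.lookup ent current /= Just v
+\end{verbatim}
+
+Only entities actually used last time are examined, so a change to an
+unused entity in an imported module does not force recompilation.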
+
+\subsubsection*{Managing inter-module references in the global symbol table}
+
+In the above example with @A@, @B@ and @f@, the specified change to @f@ would
+require @A@ but not @B@ to be recompiled. That generates a new
+@ModuleDetails@ for @A@. Problem is, if we leave @B@'s @ModuleDetails@
+unchanged, they continue to refer (directly) to the @f@ in @A@'s old
+@ModuleDetails@. This is not good, especially if equality between
+entities is implemented using pointer equality.
+
+One solution is to throw away @B@'s @ModuleDetails@ and recompile @B@.
+But this is precisely what we're trying to avoid, as it's expensive.
+Instead, a cheaper mechanism achieves the same thing: recreate @B@'s
+details directly from the old @ModuleIFace@. The @ModuleIFace@ will
+(textually) mention @f@; @compile@ can then find a pointer to the
+up-to-date global symbol table entry for @f@, and place that pointer
+in @B@'s @ModuleDetails@. The @ModuleDetails@ are, therefore,
+regenerated just by a quick lookup pass over the module's former
+@ModuleIFace@. All this applies, of course, only when @compile@ has
+concluded it doesn't need to recompile @B@.
+
+Now @compile@'s signature becomes a little clearer. @compile@ has to
+recompile the module, generating a fresh @ModuleDetails@ and
+@ModuleIFace@, if any of the following hold:
+\begin{itemize}
+\item
+ The old @ModuleIFace@ wasn't supplied, for some reason (perhaps
+ we've never compiled this module before?)
+\item
+ The module's source has changed.
+\item
+ The module's source hasn't changed, but inspection of @ModuleIFace@s
+ for this and its imports indicates that an imported entity has
+ changed.
+\end{itemize}
+If none of those are true, we're in luck: quickly knock up a new
+@ModuleDetails@ from the old @ModuleIFace@, and return them both.
+
+As a result, the upsweep still visits all modules in the upwards
+closure of those whose sources have changed. However, at some point
+we hopefully make a transition from generating new @ModuleDetails@ the
+expensive way (recompilation) to a cheap way (recycling old
+@ModuleIFace@s). Either way, all modules still get new
+@ModuleDetails@, so the global symbol table is correctly
+reconstructed.
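+
+The three conditions can be summarised as a small decision function.
+This is a sketch only: @Action@ and @decide@ are hypothetical names,
+and the three booleans are assumed to have been computed elsewhere.
+
+\begin{verbatim}
+data Action = Recompile | RecycleIFace
+  deriving (Eq, Show)
+
+-- Arguments (all assumed to be computed elsewhere):
+--   haveOldIFace  : was an old ModuleIFace supplied?
+--   sourceChanged : does the summary say the source has changed?
+--   importChanged : has an imported entity's version changed?
+decide :: Bool -> Bool -> Bool -> Action
+decide haveOldIFace sourceChanged importChanged
+  | not haveOldIFace = Recompile
+  | sourceChanged    = Recompile
+  | importChanged    = Recompile
+  | otherwise        = RecycleIFace
+\end{verbatim}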
+
+
+\subsection{How linking works, roughly}
+
+When @compile@ translates a module, it produces a @ModuleDetails@,
+@ModuleIFace@ and a @Linkable@. The @Linkable@ contains the
+translated but un-linked code for the module. And when @compile@
+ventures into an interface in a package it hasn't seen so far, it
+copies the package's object code into memory, producing one or more
+@Linkable@s. CM keeps track of these linkables.
+
+Once all modules have been @compile@d, CM invokes @link@, supplying
+all the @Linkable@s it knows about. If @compile@ had also been
+linking incrementally as it went along, @link@ doesn't have to do
+anything. On the other hand, @compile@ could choose not to be
+incremental, and leave @link@ to do all the work.
+
+@Linkable@s are opaque to CM. For batch compilation, a @Linkable@
+can record just the name of an object file, DLL, archive, or whatever,
+in which case the CM's call to @link@ supplies exactly the set of
+file names to be linked. @link@ can pass these verbatim to the
+standard system linker.
+
+
+
+
+%%-----------------------------------------------------------------%%
+\section{Ancient stuff}
+\subsubsection*{Should be selectively merged into ``Background ideas''}
+
+\subsection{Overall}
+Top level structure is:
+\begin{itemize}
+\item The Compilation Manager (CM) calculates and maintains module
+ dependencies, and knows how to create up-to-date object or bytecode
+ for a given module. In doing so it may need to recompile
+ arbitrary other modules, based on its knowledge of the module
+ dependencies.
+\item On top of the CM are the ``user-level'' services. We envisage
+ both a HEP-like interface, for interactive use, and an
+ @hmake@ style batch compiler facility.
+\item The CM only deals with inter-module issues. It knows nothing
+ about how to recompile an individual module, nor where the compiled
+ result for a module lives, nor how to tell if
+ a module is up to date, nor how to find the dependencies of a module.
+ Instead, these services are supplied abstractly to CM via a
+ @Compiler@ record. To a first approximation, a @Compiler@
+ contains
+ the same functionality as @hsc@ has had until now -- the ability to
+ translate a single Haskell module to C/assembly/object/bytecode.
+
+ Different clients of CM (HEP vs @hmake@) may supply different
+ @Compiler@s, since they need slightly different behaviours.
+ Specifically, HEP needs a @Compiler@ which creates bytecode
+ in memory, and knows how to link it, whereas @hmake@ wants
+ the traditional behaviour of emitting assembly code to disk,
+ and making no attempt at linkage.
+\end{itemize}
+
+\subsection{Open questions}
+\begin{itemize}
+\item
+ Error reporting from @open@ and @compile@.
+\item
+ Instance environment management
+\item
+ We probably need to make interface files say what
+ packages they depend on (so that we can figure out
+ which packages to load/link).
+\item
+ CM is parameterised both by the client uses and the @Compiler@
+ supplied. But it doesn't make sense to have a HEP-style client
+ attached to a @hmake@-style @Compiler@. So, really, the
+ parameterising entity should contain both aspects, not just the
+ current @Compiler@ contents.
+\end{itemize}
+
+\subsection{Assumptions}
+
+\begin{itemize}
+\item Packages other than the ``current'' one are assumed to be
+ already compiled.
+\item
+ The ``current'' package is usually ``MAIN'',
+ but we can set it with a command-line flag.
+ One invocation of ghci has only one ``current'' package.
+\item
+ Packages are not mutually recursive
+\item
+ All the object code for a package P is in libP.a or libP.dll
+\end{itemize}
+
+\subsection{Stuff we need to be able to do}
+\begin{itemize}
+\item Create the environment in which a module has been translated,
+ so that interactive queries can be satisfied as if ``in'' that
+ module.
+\end{itemize}
+
+%%-----------------------------------------------------------------%%
+\section{The Compilation Manager}
+
+CM (@compilationManager@) is a functor, thus:
+\begin{verbatim}
+compilationManager :: Compiler -> IO HEP -- IO so that it can create
+ -- global vars (IORefs)
+
+data HEP = HEP {
+ load :: ModuleName -> IO (),
+ compileString :: ModuleName -> String -> IO HValue,
+ ....
+ }
+
+newCompiler :: IO Compiler -- ??? this is a peer of compilationManager?
+
+run :: HValue -> IO () -- Run an HValue of type IO ()
+ -- In HEP?
+\end{verbatim}
+
+@load@ is the central action of CM: its job is to bring a module and
+all its descendants into an executable state, by doing the following:
+\begin{enumerate}
+\item
+ Use @summarise@ to descend the module hierarchy, starting from the
+ nominated root, creating @ModuleSummary@s, and
+ building a map @ModuleName@ @->@ @ModuleSummary@. @summarise@
+ expects to be passed absolute paths to files. Use @finder@ to
+ convert module names to file paths.
+\item
+ Topologically sort the map,
+ using dependency info in the @ModuleSummary@s.
+\item
+ Clean up the symbol table by deleting the upward closure of
+ changed modules.
+\item
+ Working bottom to top, call @compile@ on the upward closure of
+ all modules whose source has changed. A module's source has
+ changed when @sourceHasChanged@ indicates there is a difference
+ between old and new summaries for the module. Update the running
+ @FiniteMap@ @ModuleName@ @ModuleDetails@ with the new details
+ for this module. Ditto for the running
+ @FiniteMap@ @ModuleName@ @ModuleIFace@.
+\item
+ Call @compileDone@ to signify that we've reached the top, so
+ that the batch system can now link.
+\end{enumerate}
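+
+The topological sort in the second step could be sketched with
+@Data.Graph@ from the @containers@ library. This is an illustration,
+not CM's code: @compileOrder@ is a hypothetical name, its input stands
+in for what @depsFromSummary@ would yield for each summary, and the
+module graph is assumed to be acyclic.
+
+\begin{verbatim}
+import Data.Graph (graphFromEdges, topSort)
+
+type ModuleName = String
+
+-- Order modules so that each appears after everything it imports.
+-- Input: (module, its direct imports). Imports that are not
+-- themselves listed as modules are ignored by graphFromEdges.
+compileOrder :: [(ModuleName, [ModuleName])] -> [ModuleName]
+compileOrder deps = reverse (map name (topSort graph))
+  where
+    (graph, fromVertex, _) =
+      graphFromEdges [ (m, m, imps) | (m, imps) <- deps ]
+    name v = case fromVertex v of (m, _, _) -> m
+\end{verbatim}
+
+Edges run from a module to its imports, so @topSort@ lists importers
+first; reversing the result gives the bottom-to-top order needed for
+the upsweep.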
+
+
+%%-----------------------------------------------------------------%%
+\section{A compiler}
+
+Most of the system's complexity is hidden inside the functions
+supplied in the @Compiler@ record:
+\begin{verbatim}
+data Compiler = Compiler {
+
+ finder :: PackageConf -> [Path] -> IO (ModuleName -> ModuleLocation)
+
+ summarise :: ModuleLocation -> IO ModuleSummary
+
+ compile :: ModuleSummary
+ -> Maybe ModuleIFace
+ -> FiniteMap ModuleName ModuleDetails
+ -> IO CompileResult
+
+ compileDone :: IO ()
+ compileStarting :: IO () -- still needed? I don't think so.
+ }
+
+type ModuleName = String -- or some such
+type Path = String -- an absolute file name
+\end{verbatim}
+
+\subsection{The module \mbox{\tt finder}}
+The @finder@, given a package configuration file and a list of
+directories to look in, will map module names to @ModuleLocation@s,
+in which the @Path@s are filenames, probably with an absolute path
+to them.
+\begin{verbatim}
+data ModuleLocation = SourceOnly Path -- .hs
+ | ObjectCode Path Path -- .o & .hi
+ | InPackage Path -- .hi
+\end{verbatim}
+@SourceOnly@ and @ObjectCode@ are unremarkable. For sanity,
+we require that a module's object and interface be in the same
+directory. @InPackage@ indicates that the module is in a
+different package.
+
+@Module@ values -- perhaps all @Name@ish things -- contain the name of
+their package. That's so that
+\begin{itemize}
+\item Correct code can be generated for in-DLL vs out-of-DLL refs.
+\item We don't have version number dependencies for symbols
+ imported from different packages.
+\end{itemize}
+
+Somehow or other, it will be possible to know all the packages
+required, so that the linker can load them.
+We could detect package dependencies by recording them in the
+@compile@r's @ModuleIFace@ cache, and with that and the
+package config info, figure out the complete set of packages
+to link. Or look at the command line args on startup.
+
+\ToDo{Need some way to tell incremental linkers about packages,
+ since in general we'll need to load and link them before
+ linking any modules in the current package.}
+
+
+\subsection{The module \mbox{\tt summarise}r}
+Given a filename of a module (\ToDo{presumably source or iface}),
+create a summary of it. A @ModuleSummary@ should contain only enough
+information for CM to construct an up-to-date picture of the
+dependency graph. Rather than expose CM to details of timestamps,
+etc, @summarise@ merely provides an up-to-date summary of any module.
+CM can extract the list of dependencies from a @ModuleSummary@, but
+other than that has no idea what's inside it.
+\begin{verbatim}
+data ModuleSummary = ... (abstract) ...
+
+depsFromSummary :: ModuleSummary -> [ModuleName] -- module names imported
+sourceHasChanged :: ModuleSummary -> ModuleSummary -> Bool
+\end{verbatim}
+@summarise@ is intended to be fast -- a @stat@ of the source or
+interface to see if it has changed, and, if so, a quick semi-parse to
+determine the new imports.
+
+\subsection{The module \mbox{\tt compile}r}
+@compile@ traffics in @ModuleIFace@s and @ModuleDetails@.
+
+A @ModuleIFace@ is an in-memory representation of the contents of an
+interface file, including version numbers, unfoldings and pragmas, and
+the linkable code for the module. @ModuleIFace@s are un-renamed,
+using @HsSym@/@RdrNames@ rather than (globally distinct) @Names@.
+
+@ModuleDetails@, by contrast, is an in-memory representation of the
+static environment created by compiling a module. It is phrased in
+terms of post-renaming @Names@, @TyCon@s, etc, so it's basically a
+renamed-to-global-uniqueness rendition of a @ModuleIFace@.
+
+In an interactive session, we'll want to be able to evaluate
+expressions as if they had been compiled in the scope of some
+specified module. This means that the @ModuleDetails@ must contain
+the type of everything defined in the module, rather than just the
+types of exported stuff. As a consequence, @ModuleIFace@ must also
+contain the type of everything, because it should always be possible
+to generate a module's @ModuleDetails@ from its @ModuleIFace@.
+
+CM maintains two mappings, one from @ModuleName@s to @ModuleIFace@s,
+the other from @ModuleName@s to @ModuleDetails@. It passes the latter
+to each call of @compile@. This is used to supply information about
+modules compiled prior to this one (lower down in the graph). The
+returned @CompileResult@ supplies a new @ModuleDetails@ for the module
+if compilation succeeded, and CM adds this to the mapping. The
+@CompileResult@ also supplies a new @ModuleIFace@, which is either the
+same as that supplied to @compile@, if @compile@ decided not to
+retranslate the module, or is the result of a fresh translation (from
+source). So these mappings are an explicitly-passed-around part of
+the global system state.
+
+@compile@ may also {\em optionally} accumulate @ModuleIFace@s for
+modules in different packages -- that is, interfaces which we read,
+but never attempt to recompile source for. Such interfaces, being
+from foreign packages, never change, so @compile@ can accumulate them
+in perpetuity in a private global variable. Indeed, a major motivator
+of this design is to facilitate this caching of interface files,
+reading of which is a serious bottleneck for the current compiler.
+
+When CM restarts compilation down at the bottom of the module graph,
+it first needs to throw away all \ToDo{all?} @ModuleDetails@ in the
+upward closure of the out-of-date modules. So @ModuleDetails@ don't
+persist across recompilations. But @ModuleIFace@s do, since they
+are conceptually equivalent to interface files.
+
+
+\subsubsection*{What @compile@ returns}
+@compile@ returns a @CompileResult@ to CM.
+Note that @compile@'s foreign-package interface cache can
+become augmented even as a result of reading interfaces for a
+compilation attempt which ultimately fails, although it will not be
+augmented with a new @ModuleIFace@ for the failed module.
+\begin{verbatim}
+-- CompileResult is not abstract to the Compilation Manager
+data CompileResult
+ = CompOK ModuleIFace
+ ModuleDetails -- compiled ok, here are new details
+ -- and new iface
+
+ | CompErr [SDoc] -- compilation gave errors
+
+ | NoChange -- no change required, meaning:
+ -- exports, unfoldings, strictness, etc,
+ -- unchanged, and executable code unchanged
+\end{verbatim}
+
+
+
+\subsubsection*{Re-establishing local-to-global name mappings}
+Consider
+\begin{verbatim}
+module Upper where          module Lower ( f ) where
+import Lower ( f )          f = ...
+g = ... f ...
+\end{verbatim}
+When @Lower@ is first compiled, @f@ is allocated a @Unique@
+(presumably inside an @Id@ or @Name@?). When @Upper@ is then
+compiled, its reference to @f@ is attached directly to the
+@Id@ created when compiling @Lower@.
+
+If the definition of @f@ is now changed, but not the type,
+unfolding, strictness, or any other thing which affects the way
+it should be called, we will have to recompile @Lower@, but not
+@Upper@. This creates a problem -- @g@ will then refer to the
+old @Id@ for @f@, not the new one. This may or may not
+matter, but it seems safer to ensure that all @Unique@-based
+references into child modules are always up to date.
+
+So @compile@ recreates the @ModuleDetails@ for @Upper@ from
+the @ModuleIFace@ of @Upper@ and the @ModuleDetails@ of @Lower@.
+
+The rule is: if a module is up to date with respect to its
+source, but a child @C@ has changed, then either:
+\begin{itemize}
+\item On examination of the version numbers in @C@'s
+ interface/@ModuleIFace@ that we used last time, we discover that
+ an @Id@/@TyCon@/class/instance we depend on has changed. So
+ we need to retranslate the module from its source, generating
+ a new @ModuleIFace@ and @ModuleDetails@.
+\item Or: there's nothing in @C@'s interface that we depend on.
+ So we quickly recreate a new @ModuleDetails@ from the existing
+ @ModuleIFace@, creating fresh links to the new @Unique@-world
+ entities in @C@'s new @ModuleDetails@.
+\end{itemize}
+
+Upshot: we need to redo @compile@ on all modules all the way up,
+rather than just the ones that need retranslation. However, we hope
+that most modules won't need retranslation -- just regeneration of the
+@ModuleDetails@ from the @ModuleIFace@. In effect, the @ModuleIFace@
+is a quickly-compilable representation of the module's contents, just
+enough to create the @ModuleDetails@.
+
+\ToDo{Is there anything in @ModuleDetails@ which can't be
+ recreated from @ModuleIFace@ ?}
+
+So the @ModuleIFace@s persist across calls to @HEP.load@, whereas
+@ModuleDetails@ are reconstructed on every compilation pass. This
+means that @ModuleIFace@s have the same lifetime as the byte/object
+code, and so should somehow contain their code.
+
+The behind-the-scenes @ModuleIFace@ cache has some kind of holding-pen
+arrangement, to lazify the copying-out of stuff from it, and thus to
+minimise redundant interface reading. \ToDo{Burble burble. More
+details.}
+
+When CM starts working back up the module graph with @compile@, it
+needs to remove from the travelling @FiniteMap@ @ModuleName@
+@ModuleDetails@ the details for all modules in the upward closure of
+the compilation start points. However, since we're going to visit
+precisely those modules and no others on the way back up, we might as
+well just zap the old @ModuleDetails@ incrementally. This does
+mean that the @FiniteMap@ @ModuleName@ @ModuleDetails@ will be
+inconsistent until we reach the top.
+
+In interactive mode, each @compile@ call on a module for which no
+object code is available, or for which it is out of date wrt source,
+emits bytecode into memory, updates the resulting @ModuleIFace@ with the
+address of the bytecode image, and links the image.
+
+In batch mode, emit assembly or object code onto disk. Record
+somewhere \ToDo{where?} that this object file needs to go into the
+final link.
+
+When we reach the top, @compileDone@ is called, to signify that batch
+linking can now proceed, if need be.
+
+Modules in other packages never get a @ModuleIFace@ or @ModuleDetails@
+entry in CM's maps -- those maps are only for modules in this package.
+As previously mentioned, @compile@ may optionally cache @ModuleIFace@s
+for foreign package modules. When reading such an interface, we don't
+need to read the version info for individual symbols, since foreign
+packages are assumed static.
+
+\subsubsection*{What's in a \mbox{\tt ModuleIFace}?}
+
+Current interface file contents?
+
+
+\subsubsection*{What's in a \mbox{\tt ModuleDetails}?}
+
+There is no global symbol table @:: Name -> ???@. To look up a
+@Name@, first extract the @ModuleName@ from it, look that up in
+the passed-in @FiniteMap@ @ModuleName@ @ModuleDetails@,
+and finally look in the relevant @Env@.
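+
+The two-stage lookup can be sketched as below. The types here are toy
+stand-ins chosen for the illustration (in particular the @mdEnv@ field
+and the use of strings for entities are assumptions); only the shape of
+the lookup is the point.
+
+\begin{verbatim}
+import qualified Data.Map as Map
+
+type ModuleName = String
+
+-- Toy stand-ins (assumptions): a Name carries its defining module,
+-- and each module's environment maps occurrence names to a
+-- description of the entity.
+data Name = Name { nameModule :: ModuleName, nameOcc :: String }
+type Env  = Map.Map String String
+newtype ModuleDetails = ModuleDetails { mdEnv :: Env }
+
+-- Two-stage lookup: find the module, then look in its Env.
+lookupName :: Map.Map ModuleName ModuleDetails
+           -> Name -> Maybe String
+lookupName details n = do
+  md <- Map.lookup (nameModule n) details
+  Map.lookup (nameOcc n) (mdEnv md)
+\end{verbatim}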
+
+\ToDo{Do we still have the @HoldingPen@, or is it now composed from
+per-module bits too?}
+\begin{verbatim}
+data ModuleDetails = ModuleDetails {
+
+ moduleExports :: what it exports (Names)
+ -- roughly a subset of the .hi file contents
+
+ moduleEnv :: RdrName -> Name
+ -- maps top-level entities in this module to
+ -- globally distinct (Uniq-ified) Names
+
+ moduleDefs :: Bag Name -- All the things in the global symbol table
+ -- defined by this module
+
+ package :: Package -- what package am I in?
+
+ lastCompile :: Date -- of last compilation
+
+ instEnv :: InstEnv -- local inst env
+ typeEnv :: Name -> TyThing -- local tycon env?
+ }
+
+-- A (globally unique) symbol table entry. Note that Ids contain
+-- unfoldings.
+data TyThing = AClass Class
+ | ATyCon TyCon
+ | AnId Id
+\end{verbatim}
+What's the stuff in @ModuleDetails@ used for?
+\begin{itemize}
+\item @moduleExports@ so that the stuff which is visible from outside
+ the module can be calculated.
+\item @moduleEnv@: \ToDo{umm err}
+\item @moduleDefs@: one reason we want this is so that we can nuke the
+ global symbol table contribs from this module when it leaves the
+ system. \ToDo{except ... we don't have a global symbol table any
+ more.}
+\item @package@: we will need to chase arbitrarily deep into the
+ interfaces of other packages. Of course we don't want to
+ recompile those, but as we've read their interfaces, we may
+ as well cache that info. So @package@ indicates whether this
+ module is in the default package, or, if not, which it is in.
+
+ Also, when we come to linking, we'll need to know which
+ packages are demanded, so we know to load their objects.
+
+\item @lastCompile@: When the module was last compiled. If the
+ source is older than that, then a recompilation can only be
+ required if children have changed.
+\item @typeEnv@: obvious??
+\item @instEnv@: the instances contributed by this module only. The
+ Report allegedly says that when a module is translated, the
+ available instance env is all the instances in the downward
+ closure of itself in the module graph.
+
+ We choose to use this simple representation (each module
+ holds just its own instances) and do the naive thing when
+ creating an inst env to compile with. If this turns out
+ to be a performance problem we'll revisit the design.
+\end{itemize}
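+
+As a rough illustration of the naive scheme for @instEnv@, here is a
+minimal sketch. It assumes that modules are named by strings, that
+an instance is an opaque string, and that the module graph is a map
+from each module to its direct imports. The names @ModGraph@,
+@downwardClosure@ and @instEnvFor@ are hypothetical and are not part
+of the design above.
+\begin{verbatim}
+import qualified Data.Map as Map
+import qualified Data.Set as Set
+
+-- Assumptions for illustration: modules are named by strings and an
+-- instance is just an opaque string.
+type ModName  = String
+type Inst     = String
+type ModGraph = Map.Map ModName [ModName]  -- module -> direct imports
+
+-- Every module reachable from m by following imports, m included.
+downwardClosure :: ModGraph -> ModName -> Set.Set ModName
+downwardClosure g m = go Set.empty [m]
+  where
+    go seen []     = seen
+    go seen (x:xs)
+      | x `Set.member` seen = go seen xs
+      | otherwise =
+          go (Set.insert x seen) (Map.findWithDefault [] x g ++ xs)
+
+-- The naive thing: each module holds just its own instances, and we
+-- concatenate them over the downward closure when compiling m.
+instEnvFor :: ModGraph -> Map.Map ModName [Inst] -> ModName -> [Inst]
+instEnvFor g insts m =
+  concat [ Map.findWithDefault [] x insts
+         | x <- Set.toList (downwardClosure g m) ]
+\end{verbatim}
+In this sketch the closure is recomputed for every compilation,
+which is where the naive approach might become a performance
+problem.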
+
+
+
+%%-----------------------------------------------------------------%%
+\section{Misc text looking for a home}
+
+\subsection*{Linking}
+
+\ToDo{All this linking stuff is now bogus.}
+
+There's an abstract @LinkState@, which is threaded through the linkery
+bits. CM can call @addpkgs@ to notify the linker of packages
+required, and it can call @addmods@ to announce modules which need to
+be linked. Finally, CM calls @endlink@, after which an executable
+image should be ready. The linker may link incrementally, during each
+call of @addpkgs@ and @addmods@, or it can just store up names and do
+all the linking when @endlink@ is called.
+
+In order that incremental linking is possible, CM should specify
+packages and module groups in dependency order, i.e., from the bottom up.
+
+\subsection*{In-memory linking of bytecode}
+When being HEP-like, @compile@ will translate sources to bytecodes
+in memory, with all the bytecode for a module as a contiguous lump
+outside the heap. It needs to communicate the addresses of these
+lumps to the linker. The linker also needs to know whether a
+given module is available as in-memory bytecode, or whether it
+needs to load machine code from a file.
+
+I guess @LinkState@ needs to map module names to the base addresses
+of their loaded images, plus the nature of the image and whether or
+not the image has been linked.
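+
+One possible shape for such a mapping is sketched below, purely as
+an illustration. The types @ImageKind@ and @Image@, the helper
+functions, and the use of a @Word@ for the base address are all
+assumptions and do not describe an existing interface.
+\begin{verbatim}
+import qualified Data.Map as Map
+
+-- Hypothetical: the nature of a loaded image.
+data ImageKind = Bytecode | MachineCode
+  deriving (Eq, Show)
+
+-- Hypothetical: what we record about each loaded module.
+data Image = Image
+  { imageBase   :: Word       -- base address of the loaded image
+  , imageKind   :: ImageKind  -- in-memory bytecode or machine code
+  , imageLinked :: Bool       -- has this image been linked yet?
+  } deriving (Eq, Show)
+
+-- Module names (plain strings here) mapped to their images.
+newtype LinkState = LinkState (Map.Map String Image)
+
+emptyLinkState :: LinkState
+emptyLinkState = LinkState Map.empty
+
+-- Record a freshly loaded, not yet linked, image for a module.
+registerImage :: String -> Word -> ImageKind -> LinkState -> LinkState
+registerImage m base kind (LinkState tbl) =
+  LinkState (Map.insert m (Image base kind False) tbl)
+
+-- Note that a module's image has now been linked.
+markLinked :: String -> LinkState -> LinkState
+markLinked m (LinkState tbl) =
+  LinkState (Map.adjust (\i -> i { imageLinked = True }) m tbl)
+
+-- Modules whose images are loaded but still waiting to be linked.
+unlinked :: LinkState -> [String]
+unlinked (LinkState tbl) =
+  [ m | (m, i) <- Map.toList tbl, not (imageLinked i) ]
+\end{verbatim}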
+
+\subsection*{On-disk linking of object code, to give an executable}
+The @LinkState@ in this case is just a list of module and package
+names, which @addpkgs@ and @addmods@ add to. The final @endlink@
+call can invoke the system linker.
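+
+A minimal sketch of this on-disk variant follows. It is only an
+illustration: the record fields, @startLink@, and the argument
+format produced by @endlink@ (a @.o@ suffix for modules and a @-l@
+prefix for packages) are assumptions, and the sketch returns the
+arguments rather than actually invoking the system linker.
+\begin{verbatim}
+-- The link state is just the names announced so far, kept in the
+-- order in which CM supplied them (bottom up).
+data LinkState = LinkState
+  { lsPackages :: [String]
+  , lsModules  :: [String]
+  } deriving (Eq, Show)
+
+-- Hypothetical starting state: nothing announced yet.
+startLink :: LinkState
+startLink = LinkState [] []
+
+-- Notify the linker of required packages.
+addpkgs :: [String] -> LinkState -> LinkState
+addpkgs ps st = st { lsPackages = lsPackages st ++ ps }
+
+-- Announce modules which need to be linked.
+addmods :: [String] -> LinkState -> LinkState
+addmods ms st = st { lsModules = lsModules st ++ ms }
+
+-- Build an (assumed) argument list for the system linker: object
+-- files for the modules first, then the packages.
+endlink :: LinkState -> [String]
+endlink st =
+  map (++ ".o") (lsModules st) ++ map ("-l" ++) (lsPackages st)
+\end{verbatim}
+For example, announcing package @base@ and then modules @Utils@ and
+@Main@ would give the three arguments @Utils.o@, @Main.o@ and
+@-lbase@ in this sketch.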
+
+\subsection{Finding out about packages, dependencies, and auxiliary
+ objects}
+
+Ask the @packages.conf@ file that lives with the driver at the moment.
+
+\ToDo{policy about upward closure?}
+
+
+
+\ToDo{record story about how in-memory linking is done.}
+
+\ToDo{linker start/stop/initialisation/persistence. Need to
+ say more about @LinkState@.}
+
+
+\end{document}
+
+