Diffstat (limited to 'docs/ghci')
1 files changed, 1598 insertions, 0 deletions
diff --git a/docs/ghci/ghci.tex b/docs/ghci/ghci.tex
new file mode 100644
index 0000000000..c4638a6719
--- /dev/null
+++ b/docs/ghci/ghci.tex
@@ -0,0 +1,1598 @@
+% Glasgow University, 1990-2000
+% \documentstyle[preprint]{acmconf}
+\oddsidemargin 0.1 in % Note that \oddsidemargin = \evensidemargin
+\evensidemargin 0.1 in
+\marginparwidth 0.85in % Narrow margins require narrower marginal notes
+\marginparsep 0 in
+%\newcommand{\note}[1]{{\em Note: #1}}
+\newcommand{\note}[1]{{{\bf Note:}\sl #1}}
+\newcommand{\ToDo}[1]{{{\bf ToDo:}\sl #1}}
+\newcommand{\Arg}[1]{\mbox{${\tt arg}_{#1}$}}
+\textheight 8.5 in
+\textwidth 6.25 in
+\topmargin 0 in
+\headheight 0 in
+\headsep .25 in
+\setlength{\topsep}{0cm} % Reduces space before and after verbatim,
+ % which is implemented using trivlist
+\title{The GHCi Draft Design, round 2}
+\author{MSR Cambridge Haskell Crew \\
+ Microsoft Research Ltd., Cambridge}
+\subsection{Outline of the design}
+The design falls into three major parts:
+\begin{itemize}
+\item The compilation manager (CM), which coordinates the
+ system and supplies a HEP-like interface to clients.
+\item The module compiler (@compile@), which translates individual
+ modules to interpretable or machine code.
+\item The linker (@link@),
+ which maintains the executable image in interpreted mode.
+\end{itemize}
+There are also three auxiliary parts: the finder, which locates
+source, object and interface files, the summariser, which quickly
+finds dependency information for modules, and the static info
+(compiler flags and package details), which is unchanged over the
+course of a session.
+This section continues with an overview of the session-lifetime data
+structures. Then follows the finder (section~\ref{sec:finder}),
+summariser (section~\ref{sec:summariser}),
+static info (section~\ref{sec:staticinfo}),
+and finally the three big sections
+on the compilation manager, compiler and linker respectively.
+\subsubsection*{Some terminology}
+Lifetimes: the phrase {\bf session lifetime} covers a complete run of
+GHCI, encompassing multiple recompilation runs. {\bf Module lifetime}
+is a lot shorter, being that of data needed to translate a single
+module, but then discarded, for example Core, AbstractC, Stix trees.
+Data structures with module lifetime are well documented and understood.
+This document is mostly concerned with session-lifetime data.
+Most of these structures are ``owned'' by CM, since that's
+the only major component of GHCI which deals with session-lifetime data.
+Modules and packages: {\bf home} refers to modules in this package,
+precisely the ones tracked and updated by the compilation manager.
+{\bf Package} refers to all other packages, which are assumed static.
+\subsubsection*{A summary of all session-lifetime data structures}
+These structures have session lifetime but not necessarily global
+visibility. Subsequent sections elaborate who can see what.
+\item {\bf Home Symbol Table (HST)} (owner: CM) holds the post-renaming
+ environments created by compiling each home module.
+\item {\bf Home Interface Table (HIT)} (owner: CM) holds in-memory
+ representations of the interface file created by compiling
+ each home module.
+\item {\bf Unlinked Images (UI)} (owner: CM) are executable but as-yet
+ unlinked translations of home modules only.
+\item {\bf Module Graph (MG)} (owner: CM) is the current module graph.
+\item {\bf Static Info (SI)} (owner: CM) is the package configuration
+ information (PCI) and compiler flags (FLAGS).
+\item {\bf Persistent Compiler State (PCS)} (owner: @compile@)
+ is @compile@'s private cache of information about package
+ modules.
+\item {\bf Persistent Linker State (PLS)} (owner: @link@) is
+ @link@'s private information concerning the current
+ state of the (in-memory) executable image.
+%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
+\subsection{The finder (\mbox{\tt type Finder})}
+@Path@ could be an indication of a location in a filesystem, or it
+could be some more generic kind of resource identifier, such as a URL:
+ data Path = ...
+And some names. @Module@s are now used as primary keys for various
+maps, so they are given a @Unique@.
+ type ModName = String -- a module name
+ type PkgName = String -- a package name
+ type Module = -- contains ModName and a Unique, at least
+A @ModLocation@ says where a module is, what it's called and in what
+form it is.
+ data ModLocation = SourceOnly Module Path -- .hs
+ | ObjectCode Module Path Path -- .o, .hi
+ | InPackage Module PkgName
+ -- examine PCI to determine package Path
+The module finder generates @ModLocation@s from @ModName@s. We expect
+it will assume packages to be static, but we want to be able to track
+changes in home modules during the session. Specifically, we want to
+be able to notice that a module's object and interface have been
+updated, presumably by a compile run outside of the GHCI session.
+Hence the two-stage type:
+ type Finder = ModName -> IO ModLocation
+ newFinder :: PCI -> IO Finder
+@newFinder@ examines the package information right at the start, but
+returns an @IO@-typed function which can inspect home module changes
+later in the session.
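+A minimal sketch of this two-stage shape, under simplifying
+assumptions that are not part of the design: @Module@ is replaced by a
+bare @ModName@, @Path@ by @FilePath@, PCI by a list of (package,
+modules) pairs, and home modules are assumed to sit in the current
+directory, with an existing object and interface pair preferred over
+source. The helper @findIn@ is hypothetical.
+\begin{verbatim}
+import qualified Data.Map as Map
+import System.Directory (doesFileExist)
+
+type ModName = String
+type PkgName = String
+type Path    = FilePath   -- assumption: a Path is a plain file path
+
+data ModLocation
+  = SourceOnly ModName Path         -- .hs
+  | ObjectCode ModName Path Path    -- .o, .hi
+  | InPackage  ModName PkgName
+  deriving (Eq, Show)
+
+type Finder = ModName -> IO ModLocation
+
+-- Stage one indexes the static package modules once; stage two is the
+-- returned function, which probes the filesystem afresh on every call.
+newFinder :: [(PkgName, [ModName])] -> IO Finder
+newFinder pkgs = do
+  let pkgIndex = Map.fromList [ (m, p) | (p, ms) <- pkgs, m <- ms ]
+  return (findIn pkgIndex)
+
+findIn :: Map.Map ModName PkgName -> Finder
+findIn pkgIndex m =
+  case Map.lookup m pkgIndex of
+    Just p  -> return (InPackage m p)
+    Nothing -> do
+      haveO  <- doesFileExist (m ++ ".o")
+      haveHi <- doesFileExist (m ++ ".hi")
+      return (if haveO && haveHi
+                then ObjectCode m (m ++ ".o") (m ++ ".hi")
+                else SourceOnly m (m ++ ".hs"))
+
+main :: IO ()
+main = do
+  finder <- newFinder [("base", ["Prelude", "Data.List"])]
+  finder "Prelude" >>= print
+\end{verbatim}
+The package index is built once, whereas the filesystem probes are
+repeated on every call, which is what lets the finder notice home
+module changes made outside the session.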
+%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
+\subsection{The summariser (\mbox{\tt summarise})}
+A @ModSummary@ records the minimum information needed to establish the
+module graph and determine whose source has changed. @ModSummary@s
+can be created quickly.
+ data ModSummary = ModSummary
+ ModLocation -- location and kind
+ (Maybe (String, Fingerprint))
+ -- source and fingerprint if .hs
+ (Maybe [ModName]) -- imports if .hs or .hi
+ type Fingerprint = ... -- file timestamp, or source checksum?
+ summarise :: ModLocation -> IO ModSummary
+The summary contains the location and source text, and the location
+contains the name. We would like to remove the assumption that
+sources live on disk, but I'm not sure this is good enough yet.
+\ToDo{Should @ModSummary@ contain source text for interface files too?}
+\ToDo{Also say that @ModIFace@ contains its module's @ModSummary@ (why?).}
+%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
+\subsection{Static information (SI)}
+PCI, the package configuration information, is a list of @PkgInfo@,
+each containing at least the following:
+ data PkgInfo
+ = PkgInfo PkgName -- my name
+ Path -- path to my base location
+ [PkgName] -- who I depend on
+ [ModName] -- modules I supply
+ [Unlinked] -- paths to my object files
+ type PCI = [PkgInfo]
+The @Path@s in it, including those in the @Unlinked@s, are set up
+when GHCI starts.
+FLAGS is a bunch of compiler options. We haven't figured out yet how
+to partition them into those for the whole session vs those for
+specific source files, so currently the best we can do is:
+ data FLAGS = ...
+The static information (SI) comprises both of these:
+ data SI = SI PCI
+%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
+\subsection{The Compilation Manager (CM)}
+\subsubsection{Data structures owned by CM}
+CM maintains two maps (HST, HIT) and a set (UI). It's important to
+realise that CM only knows about the map/set-ness, and has no idea
+what a @ModDetails@, @ModIFace@ or @Linkable@ is. Only @compile@ and
+@link@ know that, and CM passes these types around without
+inspecting them.
+ {\bf Home Symbol Table (HST)} @:: FiniteMap Module ModDetails@
+ The @ModDetails@ (a couple of layers down) contain tycons, classes,
+ instances, etc, collectively known as ``entities''. References from
+ other modules to these entities are direct, with no intervening
+ indirections of any kind; conversely, these entities refer directly
+ to other entities, regardless of module boundaries. HST only holds
+ information for home modules; the corresponding wired-up details
+ for package (non-home) modules are created on demand in the package
+ symbol table (PST) inside the persistent compiler's state (PCS).
+ CM maintains the HST, which is passed to, but not modified by,
+ @compile@. If compilation of a module is successful, @compile@
+ returns the resulting @ModDetails@ (inside the @CompResult@) which
+ CM then adds to HST.
+ CM throws away arbitrarily large parts of HST at the start of a
+ rebuild, and uses @compile@ to incrementally reconstruct it.
+ {\bf Home Interface Table (HIT)} @:: FiniteMap Module ModIFace@
+ (Completely private to CM; nobody else sees this).
+ Compilation of a module always creates a @ModIFace@, which contains
+ the unlinked symbol table entries. CM maintains this @FiniteMap@
+ @Module@ @ModIFace@, with session lifetime. CM never throws away
+ @ModIFace@s, but it does update them, by passing old ones to
+ @compile@ if they exist, and getting new ones back.
+ CM acquires @ModIFace@s from @compile@, which it only applies
+ to modules in the home package. As a result, HIT only contains
+ @ModIFace@s for modules in the home package. Those from other
+ packages reside in the package interface table (PIT) which is a
+ component of PCS.
+ {\bf Unlinked Images (UI)} @:: Set Linkable@
+ The @Linkable@s in UI represent executable but as-yet unlinked
+ module translations. A @Linkable@ can contain the name of an
+ object, archive or DLL file. In interactive mode, it may also be
+ the STG trees derived from translating a module. So @compile@
+ returns a @Linkable@ from each successful run, namely that of
+ translating the module at hand.
+ At link-time, CM supplies @Linkable@s for the upwards closure of
+ all modules which have changed, to @link@. It also examines the
+ @ModSummary@s for all home modules, and by examining their imports
+ and the SI.PCI (package configuration info) it can determine the
+ @Linkable@s from all required imported packages too.
+ @Linkable@s and @ModIFace@s have a close relationship. Each
+ translated module has a corresponding @Linkable@ somewhere.
+ However, there may be @Linkable@s with no corresponding modules
+ (the RTS, for example). Conversely, multiple modules may share a
+ single @Linkable@ -- as is the case for any module from a
+ multi-module package. For these reasons it seems appropriate to
+ keep the two concepts distinct. @Linkable@s also provide
+ information about the sequence in which individual package
+ components should be linked, and that isn't the business of any
+ specific module to know.
+ CM passes @compile@ a module's old @ModIFace@, if it has one, in
+ the hope that the module won't need recompiling. If so, @compile@
+ can just return the new @ModDetails@ created from it, and CM will
+ re-use the old @ModIFace@. If the module {\em is} recompiled (or
+ scheduled to be loaded from disk), @compile@ returns both the
+ new @ModIFace@ and new @Linkable@.
+ {\bf Module Graph (MG)} @:: known-only-to-CM@
+ Records, for CM's purposes, the current module graph,
+ up-to-dateness and summaries. More details when I get to them.
+ Only contains home modules.
+Probably all this stuff is rolled together into the Persistent CM
+State (PCMS):
+ emptyPCMS :: IO PCMS
+\subsubsection{What CM implements}
+It pretty much implements the HEP interface. First, though, define a
+containing structure for the state of the entire CM system and its
+subsystems @compile@ and @link@:
+ data CmState
+ = CmState PCMS -- CM's stuff
+ PCS -- compile's stuff
+ PLS -- link's stuff
+ SI -- the static info, never changes
+ Finder -- the finder
+The @CmState@ is threaded through the HEP interface. In reality
+this might be done using @IORef@s, but for clarity:
+ type ModHandle = ... (opaque to CM/HEP clients) ...
+ type HValue = ... (opaque to CM/HEP clients) ...
+ cmInit :: FLAGS
+ -> [PkgInfo]
+ -> IO CmState
+ cmLoadModule :: CmState
+ -> ModName
+ -> IO (CmState, Either [SDoc] ModHandle)
+ cmGetExpr :: ModHandle
+ -> CmState
+ -> String -> IO (CmState, Either [SDoc] HValue)
+ cmRunExpr :: HValue -> IO () -- don't need CmState here
+Almost all the huff and puff in this document pertains to @cmLoadModule@.
+\subsubsection{Implementing \mbox{\tt cmInit}}
+@cmInit@ creates an empty @CmState@ using @emptyPCMS@, @emptyPCS@ and
+@emptyPLS@, making SI from the supplied flags and package info, and
+creating the finder by supplying the package info to @newFinder@.
+\subsubsection{Implementing \mbox{\tt cmLoadModule}}
+\item {\bf Downsweep:} using @finder@ and @summarise@, chase from
+ the given module to
+ establish the new home module graph (MG). Do not chase into
+ package modules.
+\item Remove from HIT, HST, UI any modules in the old MG which are
+ not in the new one. The old MG is then replaced by the new one.
+\item Topologically sort MG to generate a bottom-to-top traversal
+ order, giving a worklist.
+\item {\bf Upsweep:} call @compile@ on each module in the worklist in
+ turn, passing it
+ the ``correct'' HST, PCS, the old @ModIFace@ if
+ available, and the summary. ``Correct'' HST in the sense that
+ HST contains only the modules in this module's downward
+ closure, so that @compile@ can construct the correct instance
+ and rule environments simply as the union of those in
+ the module's downward closure.
+ If @compile@ doesn't return a new interface/linkable pair,
+ compilation wasn't necessary. Either way, update HST with
+ the new @ModDetails@, and UI and HIT respectively if a
+ compilation {\em did} occur.
+ Keep going until the root module is successfully done, or
+ compilation fails.
+\item If the previous step terminated because compilation failed,
+ define the successful set as those modules in successfully
+ completed SCCs, i.e. all @Linkable@s returned by @compile@ excluding
+ those from modules in any cycle which includes the module which failed.
+ Remove from HST, HIT, UI and MG all modules mentioned in MG which
+ are not in the successful set. Call @link@ with the successful
+ set,
+ which should succeed. The net effect is to back off to a point
+ in which those modules which are still aboard are correctly
+ compiled and linked.
+ If the previous step terminated successfully,
+ call @link@ passing it the @Linkable@s in the upward closure of
+ all those modules for which @compile@ produced a new @Linkable@.
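+The traversal order and the back-off on failure can be sketched as
+follows. This is only an illustration under assumptions: the module
+graph is a plain association list, @compile@ is replaced by a
+predicate saying whether a module compiles, and the names @worklist@
+and @upsweep@ are hypothetical. It relies on @stronglyConnComp@ from
+@Data.Graph@, which returns components with dependencies first.
+\begin{verbatim}
+import Data.Graph (SCC, flattenSCC, stronglyConnComp)
+
+type ModName = String
+type MG = [(ModName, [ModName])]   -- each home module with its home imports
+
+-- Bottom-to-top order: every group appears after the groups it imports.
+worklist :: MG -> [SCC ModName]
+worklist mg = stronglyConnComp [ (m, m, imports) | (m, imports) <- mg ]
+
+-- Visit the groups in order using a stand-in for compile. Returns the
+-- modules of the successfully completed groups and, if a group failed,
+-- the modules of that group.
+upsweep :: (ModName -> Bool) -> [SCC ModName] -> ([ModName], Maybe [ModName])
+upsweep compileOK = go []
+  where
+    go done []         = (reverse done, Nothing)
+    go done (scc:rest)
+      | all compileOK ms = go (reverse ms ++ done) rest
+      | otherwise        = (reverse done, Just ms)
+      where ms = flattenSCC scc
+
+main :: IO ()
+main = do
+  let mg = [("D",["B","C"]), ("B",["A"]), ("C",["A"]), ("A",[])]
+  print (upsweep (/= "C") (worklist mg))
+\end{verbatim}
+The first component of the result plays the role of the successful
+set: a failure in @C@ leaves @A@ aboard and excludes both @C@ and
+everything above it.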
+As a small optimisation, do this:
+\item[3a.] Remove from the worklist any module M where M's source
+ hasn't changed and neither has the source of any module in M's
+ downward closure. This has the effect of not starting the upsweep
+ right at the bottom of the graph when that's not needed.
+ Source-change checking can be done quickly by CM by comparing
+ summaries of modules in MG against corresponding
+ summaries from the old MG.
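+A sketch of this pruning step, assuming fingerprints are plain
+integers and the import graph is a finite map; @downClosure@,
+@changed@ and @pruneWorklist@ are hypothetical names.
+\begin{verbatim}
+import qualified Data.Map as Map
+import qualified Data.Set as Set
+
+type ModName     = String
+type Fingerprint = Int                      -- assumption: any Eq type would do
+type Imports     = Map.Map ModName [ModName]
+type Prints      = Map.Map ModName Fingerprint
+
+-- A module together with everything it transitively imports.
+downClosure :: Imports -> ModName -> Set.Set ModName
+downClosure imps m = go Set.empty [m]
+  where
+    go seen []     = seen
+    go seen (x:xs)
+      | x `Set.member` seen = go seen xs
+      | otherwise =
+          go (Set.insert x seen) (Map.findWithDefault [] x imps ++ xs)
+
+-- A module counts as changed if it is new or its fingerprint differs.
+changed :: Prints -> Prints -> ModName -> Bool
+changed old new m = Map.lookup m old /= Map.lookup m new
+
+-- Keep only modules with a changed source somewhere in their closure.
+pruneWorklist :: Prints -> Prints -> Imports -> [ModName] -> [ModName]
+pruneWorklist old new imps = filter needsWork
+  where
+    needsWork m = any (changed old new) (Set.toList (downClosure imps m))
+
+main :: IO ()
+main = do
+  let imps = Map.fromList [("D",["B","C"]), ("B",["A"]), ("C",["A"]), ("A",[])]
+      old  = Map.fromList [("A",1), ("B",1), ("C",1), ("D",1)]
+      new  = Map.insert "C" 2 old
+  print (pruneWorklist old new imps ["A","B","C","D"])   -- ["C","D"]
+\end{verbatim}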
+%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
+\subsection{The compiler (\mbox{\tt compile})}
+\subsubsection{Data structures owned by \mbox{\tt compile}}
+{\bf Persistent Compiler State (PCS)} @:: known-only-to-compile@
+This contains info about foreign packages only, acting as a cache,
+which is private to @compile@. The cache never becomes out of
+date. There are three parts to it:
+ \begin{itemize}
+ \item
+ {\bf Package Interface Table (PIT)} @:: FiniteMap Module ModIFace@
+ @compile@ reads interfaces from modules in foreign packages, and
+ caches them in the PIT. Subsequent imports of the same module get
+ them directly out of the PIT, avoiding slow lexing/parsing phases.
+ Because foreign packages are assumed never to become out of date,
+ all contents of PIT remain valid forever. @compile@ of course
+ tries to find package interfaces in PIT in preference to reading
+ them from files.
+ Both successful and failed runs of @compile@ can add arbitrary
+ numbers of new interfaces to the PIT. The failed runs don't matter
+ because we assume that packages are static, so the data cached even
+ by a failed run is valid forever (ie for the rest of the session).
+ \item
+ {\bf Package Symbol Table (PST)} @:: FiniteMap Module ModDetails@
+ Adding a package interface to PIT doesn't make it directly usable
+ to @compile@, because it first needs to be wired (renamed +
+ typechecked) into the spaghetti of the HST. On the other hand,
+ most modules only use a few entities from any imported interface,
+ so wiring-in the interface at PIT-entry time might be a big time
+ waster. Also, wiring in an interface could mean reading other
+ interfaces, and we don't want to do that unnecessarily.
+ The PST avoids these problems by allowing incremental wiring-in to
+ happen. Pieces of foreign interfaces are copied out of the holding
+ pen (HP), renamed, typechecked, and placed in the PST, but only as
+ @compile@ discovers it needs them. In the process of incremental
+ renaming/typechecking, @compile@ may need to read more package
+ interfaces, which are added to the PIT and hence to
+ HP.~\ToDo{How? When?}
+ CM passes the PST to @compile@ and is returned an updated version
+ on both success and failure.
+ \item
+ {\bf Holding Pen (HP)} @:: HoldingPen@
+ HP holds parsed but not-yet renamed-or-typechecked fragments of
+ package interfaces. As typechecking of other modules progresses,
+ fragments are removed (``slurped'') from HP, renamed and
+ typechecked, and placed in PCS.PST (see above). Slurping a
+ fragment may require new interfaces to be read into HP. The hope
+ is, though, that many fragments will never get slurped, reducing
+ the total number of interfaces read (as compared to eager slurping).
+ \end{itemize}
+ PCS is opaque to CM; only @compile@ knows what's in it, and how to
+ update it. Because packages are assumed static, PCS never becomes
+ out of date. So CM only needs to be able to create an empty PCS,
+ with @emptyPCS@, and thence just passes it through @compile@ with
+ no further ado.
+ In return, @compile@ must promise not to store in PCS any
+ information pertaining to the home modules. If it did so, CM would
+ need to have a way to remove this information prior to commencing a
+ rebuild, which conflicts with PCS's opaqueness to CM.
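+The caching behaviour of the PIT can be sketched as below. The
+representation of @ModIFace@ as a string and the names @getPkgIface@
+and @slowRead@ are assumptions made purely for illustration.
+\begin{verbatim}
+import qualified Data.Map as Map
+import Data.IORef (newIORef, modifyIORef, readIORef)
+
+type ModName = String
+newtype ModIFace = ModIFace String deriving (Eq, Show)   -- stand-in
+type PIT = Map.Map ModName ModIFace
+
+-- Consult the PIT first; only read (and cache) the interface on a miss.
+getPkgIface :: (ModName -> IO ModIFace) -> ModName -> PIT
+            -> IO (ModIFace, PIT)
+getPkgIface readIface m pit =
+  case Map.lookup m pit of
+    Just iface -> return (iface, pit)
+    Nothing    -> do
+      iface <- readIface m
+      return (iface, Map.insert m iface pit)
+
+main :: IO ()
+main = do
+  counter <- newIORef (0 :: Int)
+  let slowRead m = do
+        modifyIORef counter (+ 1)
+        return (ModIFace ("iface of " ++ m))
+  (_, pit1) <- getPkgIface slowRead "Prelude" Map.empty
+  _         <- getPkgIface slowRead "Prelude" pit1
+  readIORef counter >>= print     -- 1: the second lookup hits the PIT
+\end{verbatim}
+Because the updated table is returned whether or not the surrounding
+compilation succeeds, entries cached during a failed run stay usable.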
+\subsubsection{What {\tt compile} does}
+@compile@ is necessarily somewhat complex. We've decided to do away
+with private global variables -- they make the design specification
+less clear, although the implementation might use them. Without
+further ado:
+ compile :: SI -- obvious
+ -> Finder -- to find modules
+ -> ModSummary -- summary, including source
+ -> Maybe ModIFace
+ -- former interface, if avail
+ -> HST -- for home module ModDetails
+ -> PCS -- IN: the persistent compiler state
+ -> IO CompResult
+ data CompResult
+ = CompOK ModDetails -- new details (== HST additions)
+ (Maybe (ModIFace, Linkable))
+ -- interface and code; Nothing => compilation
+ -- not needed (old interface and code are still valid)
+ PCS -- updated PCS
+ [SDoc] -- warnings
+ | CompErrs PCS -- updated PCS
+ [SDoc] -- warnings and errors
+ data PCS
+ = MkPCS PIT -- package interfaces
+ PST -- post slurping global symtab contribs
+ HoldingPen -- pre slurping interface bits and pieces
+ emptyPCS :: IO PCS -- since CM has no other way to make one
+Although @compile@ is passed three of the global structures (FLAGS,
+HST and PCS), it only modifies PCS. The rest are modified by CM as it
+sees fit, from the stuff returned in the @CompResult@.
+@compile@ is allowed to return an updated PCS even if compilation
+errors occur, since the information in it pertains only to foreign
+packages and is assumed to be always-correct.
+What @compile@ does: \ToDo{A bit vague ... needs refining. How does
+ @finder@ come into the game?}
+\item Figure out if this module needs recompilation.
+ \begin{itemize}
+ \item If there's no old @ModIFace@, it does. Else:
+ \item Compare the @ModSummary@ supplied with that in the
+ old @ModIFace@. If the source has changed, recompilation
+ is needed. Else:
+ \item Compare the usage version numbers in the old @ModIFace@ with
+ those in the imported @ModIFace@s. All needed interfaces
+ for this should be in either HIT or PIT. If any version
+ numbers differ, recompilation is needed.
+ \item Otherwise it isn't needed.
+ \end{itemize}
+ If recompilation is not needed, create a new @ModDetails@ from the
+ old @ModIFace@, looking up information in HST and PCS.PST as
+ necessary. Return the new details, a @Nothing@ denoting
+ compilation was not needed, the PCS \ToDo{I don't think the PCS
+ should be updated, but who knows?}, and an empty warning list.
+ Otherwise, compilation is needed.
+ If the module is only available in object+interface form, read the
+ interface, make up details, create a linkable pointing at the
+ object code. \ToDo{Does this involve reading any more interfaces? Does
+ it involve updating PST?}
+ Otherwise, translate from source, then create and return: new
+ details, interface, linkable, updated PST, and warnings.
+ When looking for a new interface, search HST, then PCS.PIT, and only
+ then read from disk. In which case add the new interface(s) to
+ \ToDo{If compiling a module with a boot-interface file, check the
+ boot interface against the inferred interface.}
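+The recompilation check at the start of this list can be sketched as
+a pure decision function. The record @OldIface@, the @Decision@ type
+and @checkRecomp@ are hypothetical; versions and fingerprints are
+assumed to be integers, and the current versions of all imported
+interfaces (from HIT or PIT) are assumed to be available in one map.
+\begin{verbatim}
+import qualified Data.Map as Map
+
+type ModName     = String
+type Version     = Int
+type Fingerprint = Int
+
+-- The parts of an old interface that the check needs.
+data OldIface = OldIface
+  { oiFingerprint :: Fingerprint            -- source it was built from
+  , oiUsages      :: [(ModName, Version)]   -- import versions it saw
+  }
+
+data Decision = Recompile String | UpToDate
+  deriving (Eq, Show)
+
+checkRecomp :: Fingerprint -> Maybe OldIface -> Map.Map ModName Version
+            -> Decision
+checkRecomp _  Nothing    _ = Recompile "no old interface"
+checkRecomp fp (Just old) current
+  | fp /= oiFingerprint old  = Recompile "source changed"
+  | any stale (oiUsages old) = Recompile "imported interface changed"
+  | otherwise                = UpToDate
+  where
+    -- A usage is stale if the import is missing or its version moved.
+    stale (m, v) = Map.lookup m current /= Just v
+
+main :: IO ()
+main = do
+  let old = OldIface 7 [("A", 1)]
+  print (checkRecomp 7 (Just old) (Map.fromList [("A", 1)]))
+  print (checkRecomp 7 (Just old) (Map.fromList [("A", 2)]))
+\end{verbatim}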
+\subsubsection{Contents of \mbox{\tt ModDetails},
+ \mbox{\tt ModIFace} and \mbox{\tt HoldingPen}}
+Only @compile@ can see inside these three types -- they are opaque to
+everyone else. @ModDetails@ holds the post-renaming,
+post-typechecking environment created by compiling a module.
+ data ModDetails
+ = ModDetails {
+ moduleExports :: Avails,
+ moduleEnv :: GlobalRdrEnv, -- == FM RdrName [Name]
+ typeEnv :: FM Name TyThing, -- TyThing is in TcEnv.lhs
+ instEnv :: InstEnv,
+ fixityEnv :: FM Name Fixity,
+ ruleEnv :: FM Id [Rule]
+ }
+@ModIFace@ is nearly the same as @ParsedIface@ from @RnMonad.lhs@:
+ type ModIFace = ParsedIface -- not really, but ...
+ data ParsedIface
+ = ParsedIface {
+ pi_mod :: Module, -- Complete with package info
+ pi_vers :: Version, -- Module version number
+ pi_orphan :: WhetherHasOrphans, -- Whether this module has orphans
+ pi_usages :: [ImportVersion OccName], -- Usages
+ pi_exports :: [ExportItem], -- Exports
+ pi_insts :: [RdrNameInstDecl], -- Local instance declarations
+ pi_decls :: [(Version, RdrNameHsDecl)], -- Local definitions
+ pi_fixity :: (Version, [RdrNameFixitySig]), -- Local fixity declarations,
+ -- with their version
+ pi_rules :: (Version, [RdrNameRuleDecl]), -- Rules, with their version
+ pi_deprecs :: [RdrNameDeprecation] -- Deprecations
+ }
+@HoldingPen@ is a cleaned-up version of that found in @RnMonad.lhs@,
+retaining just the 3 pieces actually comprising the holding pen:
+ data HoldingPen
+ = HoldingPen {
+ iDecls :: DeclsMap, -- A single, global map of Names to decls
+ iInsts :: IfaceInsts,
+ -- The as-yet un-slurped instance decls; this bag is depleted when we
+ -- slurp an instance decl so that we don't slurp the same one twice.
+ -- Each is 'gated' by the names that must be available before
+ -- this instance decl is needed.
+ iRules :: IfaceRules
+ -- Similar to instance decls, only for rules
+ }
+%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
+\subsection{The linker (\mbox{\tt link})}
+\subsubsection{Data structures owned by the linker}
+In the same way that @compile@ has a persistent compiler state (PCS),
+the linker has a persistent (session-lifetime) state, PLS, the
+Linker's Persistent State. In batch mode PLS is entirely irrelevant,
+because there is only a single link step, and can be a unit value
+ignored by everybody. In interactive mode PLS is composed of the
+following three parts:
+\textbf{The Source Symbol Table (SST)}@ :: FiniteMap RdrName HValue@
+ The source symbol table is used when linking interpreted code.
+ Unlinked interpreted code consists of an STG tree where
+ the leaves are @RdrNames@. The linker's job is to resolve these to
+ actual addresses (the alternative is to resolve these lazily when
+ the code is run, but this requires passing the full symbol table
+ through the interpreter and the repeated lookups will probably be
+ expensive).
+ The source symbol table therefore maps @RdrName@s to @HValue@s, for
+ every @RdrName@ that currently \emph{has} an @HValue@, including all
+ exported functions from object code modules that are currently
+ linked in. Linking therefore turns a @StgTree RdrName@ into an
+ @StgTree HValue@.
+ It is important that we can prune this symbol table by throwing away
+ the mappings for an entire module, whenever we recompile/relink a
+ given module. The representation is therefore probably a two-level
+ mapping, from module names, to function/constructor names, to
+ @HValue@s.
+\item \textbf{The Object Symbol Table (OST)}@ :: FiniteMap String Addr@
+ This is a lower level symbol table, mapping symbol names in object
+ modules to their addresses in memory. It is used only when
+ resolving the external references in an object module, and contains
+ only entries that are defined in object modules.
+ Why have two symbol tables? Well, there is a clear distinction
+ between the two: the source symbol table maps Haskell symbols to
+ Haskell values, and the object symbol table maps object symbols to
+ addresses. There is some overlap, in that Haskell symbols certainly
+ have addresses, and we could look up a Haskell symbol's address by
+ manufacturing the right object symbol and looking that up in the
+ object symbol table, but this is likely to be slow and would force
+ us to extend the object symbol table with all the symbols
+ ``exported'' by interpreted code. Doing it this way enables us to
+ decouple the object management subsystem from the rest of the linker
+ with a minimal interface; something like
+ \begin{verbatim}
+ loadObject :: Unlinked -> IO Object
+ unloadModule :: Unlinked -> IO ()
+ lookupSymbol :: String -> IO Addr
+ \end{verbatim}
+ Rather unfortunately we need @lookupSymbol@ in order to populate the
+ source symbol table when linking in a new compiled module. Our
+ object management subsystem is currently written in C, so decoupling
+ this interface as much as possible is highly desirable.
+ {\bf Linked Image (LI)} @:: no-explicit-representation@
+ LI isn't explicitly represented in the system, but we record it
+ here for completeness anyway. LI is the current set of
+ linked-together module, package and other library fragments
+ constituting the current executable mass. LI comprises:
+ \begin{itemize}
+ \item Machine code (@.o@, @.a@, @.DLL@ file images) in memory.
+ These are loaded from disk when needed, and stored in
+ @malloc@ville. To simplify storage management, they are
+ never freed or reused, since doing so would create serious
+ complications. When no longer needed,
+ they are simply abandoned. New linkings of the same object
+ code produce new copies in memory. We hope this will not be
+ too much of a space leak.
+ \item STG trees, which live in the GHCI heap and are managed by the
+ storage manager in the usual way. They are held alive (are
+ reachable) via the @HValue@s in the SST. Such @HValue@s are
+ applications of the interpreter function to the trees
+ themselves. Linking a tree comprises travelling over the
+ tree, replacing all the @Id@s with pointers directly to the
+ relevant @_closure@ labels, as determined by searching the
+ OST. Once the leaves are linked, trees are wrapped with the
+ interpreter function. The resulting @HValue@s then behave
+ indistinguishably from compiled versions of the same code.
+ \end{itemize}
+ Because object code is outside the heap and never deallocated,
+ whilst interpreted code is held alive via the SST, there's no need
+ to have a data structure which ``is'' the linked image.
+ For batch compilation, LI doesn't exist because OST doesn't exist,
+ and because @link@ doesn't load code into memory, instead just
+ invokes the system linker.
+ \ToDo{Do we need to say anything about CAFs and SRTs? Probably ...}
+As with PCS, CM has no way to create an initial PLS, so we supply
+@emptyPLS@ for that purpose.
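+The two-level representation suggested above for the source symbol
+table might look like the following sketch. @HValue@ is a stand-in
+newtype, @OccName@ is an assumed alias for the function or constructor
+name, and the three operations are hypothetical names.
+\begin{verbatim}
+import qualified Data.Map as Map
+
+type ModName = String
+type OccName = String                  -- function or constructor name
+newtype HValue = HValue Int            -- stand-in for a real heap value
+  deriving (Eq, Show)
+
+-- Outer level keyed by module, inner level by name within the module.
+type SST = Map.Map ModName (Map.Map OccName HValue)
+
+addBinding :: ModName -> OccName -> HValue -> SST -> SST
+addBinding m n v = Map.insertWith Map.union m (Map.singleton n v)
+
+lookupName :: ModName -> OccName -> SST -> Maybe HValue
+lookupName m n sst = Map.lookup m sst >>= Map.lookup n
+
+-- Dropping every mapping for a module is a single outer-level delete.
+pruneModule :: ModName -> SST -> SST
+pruneModule = Map.delete
+
+main :: IO ()
+main = do
+  let sst = addBinding "A" "f" (HValue 1)
+          $ addBinding "B" "g" (HValue 2) Map.empty
+  print (lookupName "A" "f" sst)                     -- Just (HValue 1)
+  print (lookupName "A" "f" (pruneModule "A" sst))   -- Nothing
+\end{verbatim}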
+\subsubsection{The linker's interface}
+In practice, the PLS might be hidden in the I/O monad rather
+than passed around explicitly. (The same might be true for PCS).
+ data PLS -- as described above; opaque to everybody except the linker
+ link :: PCI -> ??? -> [[Linkable]] -> PLS -> IO LinkResult
+ data LinkResult = LinkOK PLS
+ | LinkErrs PLS [SDoc]
+ emptyPLS :: IO PLS -- since CM has no other way to make one
+CM uses @link@ as follows:
+After repeatedly using @compile@ to compile all modules which are
+out-of-date, @link@ is invoked. The @[[Linkable]]@ argument to
+@link@ represents the list of (recursive groups of) home modules which
+have been newly compiled, along with @Linkable@s for each of
+the packages in use (the compilation manager knows which external
+packages are referenced by the home package). The order of the list
+is important: it is sorted in such a way that linking any prefix of
+the list will result in an image with no unresolved references. Note
+that for batch linking there may be further restrictions; for example
+it may not be possible to link recursive groups containing libraries.
+@link@ does the following:
+ \item
+ In batch mode, do nothing. In interactive mode,
+ examine the supplied @[[Linkable]]@ to determine which home
+ module @Unlinked@s are new. Remove precisely these @Linkable@s
+ from PLS. (In fact we really need to remove their upwards
+ transitive closure, but I think it is an invariant that CM will
+ supply an upwards transitive closure of new modules).
+ See below for descriptions of @Linkable@ and @Unlinked@.
+ \item
+ Batch system: invoke the external linker to link everything in one go.
+ Interactive: bind the @Unlinked@s for the newly compiled modules,
+ plus those for any newly required packages, into PLS.
+ Note that it is the linker's responsibility to remember which
+ objects and packages have already been linked. By comparing this
+ with the @Linkable@s supplied to @link@, it can determine which
+ of the linkables in LI are out of date.
+If linking a group fails for some reason, @link@ should
+not modify its PLS at all. In other words, linking each group
+is atomic; it either succeeds or fails.
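+One reading of this atomicity requirement is sketched below. The
+model is an assumption throughout: a @Linkable@ is reduced to the
+symbols it defines and needs, PLS to the set of symbols linked so far,
+and the functions are pure rather than in @IO@. @linkGroup@ and
+@linkGroups@ are hypothetical names.
+\begin{verbatim}
+import qualified Data.Set as Set
+
+type Symbol = String
+
+-- Simplified stand-ins for the real types.
+data Linkable = Linkable { defines :: [Symbol], needs :: [Symbol] }
+
+newtype PLS = PLS (Set.Set Symbol) deriving (Eq, Show)
+
+data LinkResult = LinkOK PLS | LinkErrs PLS [String]
+  deriving (Eq, Show)
+
+-- Link one (possibly recursive) group: members may refer to each other
+-- and to anything already linked. Either the whole group goes in, or
+-- the errors are reported and no new state is produced.
+linkGroup :: PLS -> [Linkable] -> Either [String] PLS
+linkGroup (PLS syms) grp
+  | null missing = Right (PLS syms')
+  | otherwise    = Left [ "unresolved: " ++ s | s <- missing ]
+  where
+    syms'   = foldr Set.insert syms (concatMap defines grp)
+    missing = [ s | l <- grp, s <- needs l, s `Set.notMember` syms' ]
+
+-- Link groups in list order; on failure hand back the state as it was
+-- before the failing group.
+linkGroups :: [[Linkable]] -> PLS -> LinkResult
+linkGroups []     pls = LinkOK pls
+linkGroups (g:gs) pls =
+  case linkGroup pls g of
+    Right pls' -> linkGroups gs pls'
+    Left errs  -> LinkErrs pls errs
+
+main :: IO ()
+main = do
+  let a = Linkable ["A.f"] []
+      b = Linkable ["B.g"] ["A.f"]
+      c = Linkable ["C.h"] ["Missing.x"]
+  print (linkGroups [[a], [b]] (PLS Set.empty))
+  print (linkGroups [[a], [c]] (PLS Set.empty))
+\end{verbatim}
+Checking each group against the symbols linked so far also enforces
+the ordering property that every prefix of the list is free of
+unresolved references.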
+\subsubsection*{\mbox{\tt Unlinked} and \mbox{\tt Linkable}}
+Two important types: @Unlinked@ and @Linkable@. The latter is a
+higher-level representation built from several of the former.
+An @Unlinked@ is a reference to unlinked executable code, something
+a linker could take as input:
+ data Unlinked = DotO Path
+ | DotA Path
+ | DotDLL Path
+ | Trees [StgTree RdrName]
+The first three describe the location of a file (presumably)
+containing the code to link. @Trees@, which only exists in
+interactive mode, gives a list of @StgTrees@, in which the unresolved
+references are @RdrNames@ -- hence its non-linkedness. Once linked,
+those @RdrNames@ are replaced with pointers to the machine code
+implementing them.
+A @Linkable@ gathers together several @Unlinked@s and associates them
+with either a module or package:
+ data Linkable = LM Module [Unlinked] -- a module
+ | LP PkgName -- a package
+The order of the @Unlinked@s in the list is important, as
+they are linked in left-to-right order. The @Unlinked@ objects for a
+particular package can be obtained from the package configuration (see
+Section \ref{sec:staticinfo}).
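+A sketch of how the @Unlinked@s behind a @Linkable@ might be
+recovered, assuming a cut-down @PkgInfo@ with only a name and its
+object files, a bare @ModName@ in place of @Module@, and omitting the
+@Trees@ constructor. The function @unlinkedOf@ is hypothetical.
+\begin{verbatim}
+type Path    = FilePath
+type PkgName = String
+type ModName = String
+
+data Unlinked = DotO Path | DotA Path | DotDLL Path   -- Trees omitted
+  deriving (Eq, Show)
+
+data Linkable = LM ModName [Unlinked]   -- a module
+              | LP PkgName              -- a package
+  deriving (Eq, Show)
+
+-- Cut-down package entry: just the name and the object files.
+data PkgInfo = PkgInfo { pkgName :: PkgName, pkgObjects :: [Unlinked] }
+
+-- A module carries its own Unlinkeds; a package is looked up in PCI.
+-- List order is preserved, since linking proceeds left to right.
+unlinkedOf :: [PkgInfo] -> Linkable -> Either String [Unlinked]
+unlinkedOf _   (LM _ us) = Right us
+unlinkedOf pci (LP p)    =
+  case [ pkgObjects info | info <- pci, pkgName info == p ] of
+    (us:_) -> Right us
+    []     -> Left ("unknown package: " ++ p)
+
+main :: IO ()
+main = do
+  let pci = [PkgInfo "std" [DotA "libHSstd.a"]]
+  print (unlinkedOf pci (LP "std"))
+  print (unlinkedOf pci (LM "Main" [DotO "Main.o"]))
+  print (unlinkedOf pci (LP "nosuch"))
+\end{verbatim}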
+\ToDo{When adding @Addr@s from an object module to SST, we need to
+ somehow find out the @RdrName@s of the symbols exported by that
+ module.
+ So we'd need to pass in the @ModDetails@ or @ModIFace@ or some such?}
+\section{Background ideas}
+\subsubsection*{Out of date, but correct in spirit}
+\subsection{Restructuring the system}
+At the moment @hsc@ compiles one source module into C or assembly.
+This functionality is pushed inside a function called @compile@,
+introduced shortly. The main new chunk of code is CM, the compilation manager,
+which supervises multiple runs of @compile@ so as to create up-to-date
+translations of a whole bunch of modules, as quickly as possible.
+CM also employs some minor helper functions, @finder@, @summarise@ and
+@link@, to do its work.
+Our intent is to allow CM to be used as the basis either of a
+multi-module, batch mode compilation system, or of an
+interactive environment similar to that of Hugs.
+Only minor modifications to the behaviour of @compile@ and @link@
+are needed to give these different behaviours.
+CM and @compile@, and, for interactive use, an interpreter, are the
+main code components. The most important data structure is the global
+symbol table; much design effort has been expended thereupon.
+\subsection{How the global symbol table is implemented}
+The top level symbol table is a @FiniteMap@ @ModuleName@
+@ModuleDetails@. @ModuleDetails@ contains essentially the environment
+created by compiling a module. CM manages this finite map, adding and
+deleting module entries as required.
+The @ModuleDetails@ for a module @M@ contains descriptions of all
+tycons, classes, instances, values, unfoldings, etc (henceforth
+referred to as ``entities''), available from @M@. These are just
+trees in the GHCI heap. References from other modules to these
+entities are direct -- when you have a @TyCon@ in your hand, you really
+have a pointer directly to the @TyCon@ structure in the defining module,
+rather than some kind of index into a global symbol table. So there
+is a global symbol table, but it has a distributed (spaghetti-like?)
+structure.
+This gives fast and convenient access to tycon, class, instance,
+etc, information. But because there are no levels of indirection,
+there's a problem when we replace @M@ with an updated version of @M@.
+We then need to find all references to entities in the old @M@'s
+spaghetti, and replace them with pointers to the new @M@'s spaghetti.
+This problem motivates a large part of the design.
+\subsection{Implementing incremental recompilation -- simple version}
+Given the following module graph
+ D
+ / \
+ / \
+ B C
+ \ /
+ \ /
+ A
+(@D@ imports @B@ and @C@, @B@ imports @A@, @C@ imports @A@) the aim is to do the
+least possible amount of compilation to bring @D@ back up to date. The
+simplest scheme we can think of is:
+\item {\bf Downsweep}:
+ starting with @D@, re-establish what the current module graph is
+ (it might have changed since last time). This means getting a
+ @ModuleSummary@ of @D@. The summary can be quickly generated,
+ contains @D@'s import lists, and gives some way of knowing whether
+ @D@'s source has changed since the last time it was summarised.
+ Transitively follow summaries from @D@, thereby establishing the
+ module graph.
+ Remove from the global symbol table (the @FiniteMap@ @ModuleName@
+ @ModuleDetails@) the upwards closure of all modules in this package
+ which are out-of-date with respect to their previous versions. Also
+ remove all modules no longer reachable from @D@.
+\item {\bf Upsweep}:
+ Starting at the lowest point in the still-in-date module graph,
+ start compiling upwards, towards @D@. At each module, call
+ @compile@, passing it a @FiniteMap@ @ModuleName@ @ModuleDetails@,
+ and getting a new @ModuleDetails@ for the module, which is added to
+ the map.
+ When compiling a module, the compiler must be able to know which
+ entries in the map are for modules in its strict downwards closure,
+ and which aren't, so that it can manufacture the instance
+ environment correctly (as union of instances in its downwards
+ closure).
+ Once @D@ has been compiled, invoke some kind of linking phase
+ if compiling in batch mode. For interactive use, linking can either
+ be done all at the end, or as you go along.
+In this simple world, recompilation visits the upwards closure of
+all changed modules. That means when a module @M@ is recompiled,
+we can be sure no-one has any references to entities in the old @M@,
+because modules importing @M@ will have already been removed from the
+top-level finite map in the second step above.
+The upshot is that we don't need to worry about updating links to @M@ in
+the global symbol table -- there shouldn't be any to update.
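+The upwards closure used by both sweeps can be sketched as follows.
+This is only an illustration: the association-list @Graph@ type and the
+names @importers@ and @upwardClosure@ are invented for the example, and
+nothing here is meant as CM's real representation.

```haskell
type ModuleName = String

-- Import graph: each module paired with the modules it imports.
type Graph = [(ModuleName, [ModuleName])]

-- Modules that directly import m.
importers :: Graph -> ModuleName -> [ModuleName]
importers g m = [ n | (n, imps) <- g, m `elem` imps ]

-- Upward closure of a set of changed modules: the modules themselves plus
-- everything that imports them, directly or transitively.
-- The 'seen' check also makes this terminate on cyclic graphs.
upwardClosure :: Graph -> [ModuleName] -> [ModuleName]
upwardClosure g = go []
  where
    go seen []     = reverse seen
    go seen (m:ms)
      | m `elem` seen = go seen ms
      | otherwise     = go (m : seen) (ms ++ importers g m)

-- The diamond from the text: D imports B and C, which both import A.
diamond :: Graph
diamond = [("D", ["B", "C"]), ("B", ["A"]), ("C", ["A"]), ("A", [])]
```

+With this graph, a change to @B@ gives the closure @B@, @D@, so @C@ and
+@A@ keep their entries in the finite map.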
+\ToDo{What about mutually recursive modules?}
+CM will happily chase through module interfaces in other packages in
+the downsweep. But it will only process modules in this package
+during the upsweep. So it assumes that modules in other packages
+never become out of date. This is a design decision -- we could have
+decided otherwise.
+In fact we go further, and require other packages to be compiled,
+i.e. to consist of a collection of interface files, and one or more
+object files. CM will never apply @compile@ to a foreign package
+module, so there's no way a package can be built on the fly from source.
+We require @compile@ to cache foreign package interfaces it reads, so
+that subsequent uses don't have to re-read them. The cache never
+becomes out of date, since we've assumed that the source of foreign
+packages doesn't change during the course of a session (run of GHCI).
+As well as caching interfaces, @compile@ must cache, in some sense,
+the linkable code for modules. In batch compilation this might simply
+mean remembering the names of object files to link, whereas in
+interactive mode @compile@ probably needs to load object code into
+memory in preparation for in-memory linking.
+Important signatures for this simple scheme are:
+ finder :: ModuleName -> ModLocation
+ summarise :: ModLocation -> IO ModSummary
+ compile :: ModSummary
+ -> FM ModName ModDetails
+ -> IO CompileResult
+ data CompileResult = CompOK ModDetails
+ | CompErr [ErrMsg]
+ link :: [ModLocation] -> [PackageLocation] -> IO Bool -- linked ok?
+\subsection{Implementing incremental recompilation -- clever version}
+So far, our upsweep, which is the computationally expensive bit,
+recompiles a module if either its source is out of date, or it
+imports a module which has been recompiled. Sometimes we know
+we can do better than this:
+   module B where            module A where
+   import A ( f )            {-# NOINLINE f #-}
+   ... f ...                 f x = x + 42
+If the definition of @f@ is changed to @f x = x + 43@, the simple
+upsweep would recompile @B@ unnecessarily. We would like to detect
+this situation and avoid propagating recompilation all the way to the
+top. There are two parts to this: detecting when a module doesn't
+need recompilation, and managing inter-module references in the
+global symbol table.
+\subsubsection*{Detecting when a module doesn't need recompilation}
+To do this, we introduce a new concept: the @ModuleIFace@. This is
+effectively an in-memory interface file. References to entities in
+other modules are done via strings, rather than being pointers
+directly to those entities. Recall that, by comparison,
+@ModuleDetails@ do contain pointers directly to the entities they
+refer to. So a @ModuleIFace@ is not part of the global symbol table.
+As before, compiling a module produces a @ModuleDetails@ (inside the
+@CompileResult@), but it also produces a @ModuleIFace@. The latter
+records, amongst other things, the version numbers of all imported entities
+needed for the compilation of that module. @compile@ optionally also
+takes the old @ModuleIFace@ as input during compilation:
+ data CompileResult = CompOK ModDetails ModIFace
+ | CompErr [ErrMsg]
+ compile :: ModSummary
+ -> FM ModName ModDetails
+ -> Maybe ModIFace
+ -> IO CompileResult
+Now, if the @ModuleSummary@ indicates this module's source hasn't
+changed, we only need to recompile it if something it depends on has
+changed. @compile@ can detect this by inspecting the imported entity
+version numbers in the module's old @ModuleIFace@, and comparing them
+with the version numbers from the entities in the modules being
+imported. If they are all the same, nothing it depends on has
+changed, so there's no point in recompiling.
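+A minimal sketch of that comparison, assuming version numbers are plain
+integers and that the old interface records one @Usage@ per imported
+entity. The types @Usage@ and @CurrentVersions@ and the function
+@importsUnchanged@ are hypothetical names for this example only.

```haskell
type ModuleName = String
type EntityName = String
type Version    = Int

-- What an old interface recorded about one imported entity (assumed shape).
data Usage = Usage ModuleName EntityName Version

-- Current versions of the entities exported by the imported modules
-- (assumed shape).
type CurrentVersions = [((ModuleName, EntityName), Version)]

-- True if every entity the module used last time still has the same version.
-- An entity that has disappeared counts as changed.
importsUnchanged :: CurrentVersions -> [Usage] -> Bool
importsUnchanged cur = all same
  where
    same (Usage m e v) = lookup (m, e) cur == Just v
```

+In the @A@/@B@ example, changing only the body of the @NOINLINE@
+function @f@ would leave its version number alone, so the check for @B@
+succeeds.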
+\subsubsection*{Managing inter-module references in the global symbol table}
+In the above example with @A@, @B@ and @f@, the specified change to @f@ would
+require @A@ but not @B@ to be recompiled. That generates a new
+@ModuleDetails@ for @A@. The problem is that, if we leave @B@'s @ModuleDetails@
+unchanged, they continue to refer (directly) to the @f@ in @A@'s old
+@ModuleDetails@. This is not good, especially if equality between
+entities is implemented using pointer equality.
+One solution is to throw away @B@'s @ModuleDetails@ and recompile @B@.
+But this is precisely what we're trying to avoid, as it's expensive.
+Instead, a cheaper mechanism achieves the same thing: recreate @B@'s
+details directly from the old @ModuleIFace@. The @ModuleIFace@ will
+(textually) mention @f@; @compile@ can then find a pointer to the
+up-to-date global symbol table entry for @f@, and place that pointer
+in @B@'s @ModuleDetails@. The @ModuleDetails@ are, therefore,
+regenerated just by a quick lookup pass over the module's former
+@ModuleIFace@. All this applies, of course, only when @compile@ has
+concluded it doesn't need to recompile @B@.
+Now @compile@'s signature becomes a little clearer. @compile@ has to
+recompile the module, generating a fresh @ModuleDetails@ and
+@ModuleIFace@, if any of the following hold:
+ The old @ModuleIFace@ wasn't supplied, for some reason (perhaps
+ we've never compiled this module before?)
+ The module's source has changed.
+ The module's source hasn't changed, but inspection of @ModuleIFaces@
+ for this and its imports indicates that an imported entity has
+ changed.
+If none of those are true, we're in luck: quickly knock up a new
+@ModuleDetails@ from the old @ModuleIFace@, and return them both.
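+The three conditions can be written down as a small decision function.
+This is a sketch under simplifying assumptions: the three inputs are
+reduced to Booleans, and @Decision@ and @decide@ are names made up for
+the example.

```haskell
data Decision
  = Recompile String   -- retranslate from source; the String says why
  | Recycle            -- rebuild ModuleDetails from the old ModuleIFace
  deriving (Eq, Show)

-- Inputs are deliberately simplified to Booleans for this sketch:
--   haveOldIFace  : an old ModuleIFace was supplied
--   sourceChanged : the summary says the source has changed
--   importChanged : an imported entity's version number differs
decide :: Bool -> Bool -> Bool -> Decision
decide haveOldIFace sourceChanged importChanged
  | not haveOldIFace = Recompile "no old interface"
  | sourceChanged    = Recompile "source changed"
  | importChanged    = Recompile "imported entity changed"
  | otherwise        = Recycle
```

+The guards are tried in the order the conditions are listed, so the
+cheapest test comes first and the version comparison is only reached
+when the source is unchanged.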
+As a result, the upsweep still visits all modules in the upwards
+closure of those whose sources have changed. However, at some point
+we hopefully make a transition from generating new @ModuleDetails@ the
+expensive way (recompilation) to a cheap way (recycling old
+@ModuleIFaces@). Either way, all modules still get new
+@ModuleDetails@, so the global symbol table is correctly
+reconstructed.
+\subsection{How linking works, roughly}
+When @compile@ translates a module, it produces a @ModuleDetails@,
+@ModuleIFace@ and a @Linkable@. The @Linkable@ contains the
+translated but un-linked code for the module. And when @compile@
+ventures into an interface in a package it hasn't seen so far, it
+copies the package's object code into memory, producing one or more
+@Linkable@s. CM keeps track of these linkables.
+Once all modules have been @compile@d, CM invokes @link@, supplying
+all the @Linkable@s it knows about. If @compile@ had also been
+linking incrementally as it went along, @link@ doesn't have to do
+anything. On the other hand, @compile@ could choose not to be
+incremental, and leave @link@ to do all the work.
+@Linkable@s are opaque to CM. For batch compilation, a @Linkable@
+can record just the name of an object file, DLL, archive, or whatever,
+in which case the CM's call to @link@ supplies exactly the set of
+file names to be linked. @link@ can pass these verbatim to the
+standard system linker.
+\section{Ancient stuff}
+\subsubsection*{Should be selectively merged into ``Background ideas''}
+Top level structure is:
+\item The Compilation Manager (CM) calculates and maintains module
+ dependencies, and knows how to create up-to-date object or bytecode
+ for a given module. In doing so it may need to recompile
+ arbitrary other modules, based on its knowledge of the module
+ dependencies.
+\item On top of the CM are the ``user-level'' services. We envisage
+ both a HEP-like interface, for interactive use, and an
+ @hmake@ style batch compiler facility.
+\item The CM only deals with inter-module issues. It knows nothing
+ about how to recompile an individual module, nor where the compiled
+ result for a module lives, nor how to tell if
+ a module is up to date, nor how to find the dependencies of a module.
+ Instead, these services are supplied abstractly to CM via a
+ @Compiler@ record. To a first approximation, a @Compiler@
+ contains
+ the same functionality as @hsc@ has had until now -- the ability to
+ translate a single Haskell module to C/assembly/object/bytecode.
+ Different clients of CM (HEP vs @hmake@) may supply different
+ @Compiler@s, since they need slightly different behaviours.
+ Specifically, HEP needs a @Compiler@ which creates bytecode
+ in memory, and knows how to link it, whereas @hmake@ wants
+ the traditional behaviour of emitting assembly code to disk,
+ and making no attempt at linkage.
+\subsection{Open questions}
+ Error reporting from @open@ and @compile@.
+ Instance environment management
+ We probably need to make interface files say what
+ packages they depend on (so that we can figure out
+ which packages to load/link).
+ CM is parameterised both by the client that uses it and by the @Compiler@
+ supplied. But it doesn't make sense to have a HEP-style client
+ attached to a @hmake@-style @Compiler@. So, really, the
+ parameterising entity should contain both aspects, not just the
+ current @Compiler@ contents.
+\item Packages other than the ``current'' one are assumed to be
+ already compiled.
+ The ``current'' package is usually ``MAIN'',
+ but we can set it with a command-line flag.
+ One invocation of ghci has only one ``current'' package.
+ Packages are not mutually recursive.
+ All the object code for a package P is in libP.a or libP.dll.
+\subsection{Stuff we need to be able to do}
+\item Create the environment in which a module has been translated,
+ so that interactive queries can be satisfied as if ``in'' that
+ module.
+\section{The Compilation Manager}
+CM (@compilationManager@) is a functor, thus:
+compilationManager :: Compiler -> IO HEP -- IO so that it can create
+ -- global vars (IORefs)
+data HEP = HEP {
+ load :: ModuleName -> IO (),
+ compileString :: ModuleName -> String -> IO HValue,
+ ....
+ }
+newCompiler :: IO Compiler -- ??? this is a peer of compilationManager?
+run :: HValue -> IO () -- Run an HValue of type IO ()
+ -- In HEP?
+@load@ is the central action of CM: its job is to bring a module and
+all its descendants into an executable state, by doing the following:
+ Use @summarise@ to descend the module hierarchy, starting from the
+ nominated root, creating @ModuleSummary@s, and
+ building a map @ModuleName@ @->@ @ModuleSummary@. @summarise@
+ expects to be passed absolute paths to files. Use @finder@ to
+ convert module names to file paths.
+ Topologically sort the map,
+ using dependency info in the @ModuleSummary@s.
+ Clean up the symbol table by deleting the upward closure of
+ changed modules.
+ Working bottom to top, call @compile@ on the upward closure of
+ all modules whose source has changed. A module's source has
+ changed when @sourceHasChanged@ indicates there is a difference
+ between old and new summaries for the module. Update the running
+ @FiniteMap@ @ModuleName@ @ModuleDetails@ with the new details
+ for this module. Ditto for the running
+ @FiniteMap@ @ModuleName@ @ModuleIFace@.
+ Call @compileDone@ to signify that we've reached the top, so
+ that the batch system can now link.
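+The topological sort step might look like the sketch below: a
+depth-first post-order that lists every module after the modules it
+imports, which is the bottom-to-top order the upsweep needs. The
+@Graph@ type and the name @bottomUp@ are assumptions for the example.
+It also assumes the import graph is acyclic, so mutually recursive
+modules are not handled.

```haskell
type ModuleName = String

-- Import graph as built from the summaries: module -> modules it imports.
type Graph = [(ModuleName, [ModuleName])]

-- Depth-first post-order over the graph. Imports that are not keys in the
-- graph (for example modules from other packages) are skipped.
-- Assumes the graph has no cycles; a cycle would make this loop forever.
bottomUp :: Graph -> [ModuleName]
bottomUp g = reverse (foldl visit [] (map fst g))
  where
    -- 'done' holds the modules emitted so far, most recent first.
    visit done m
      | m `elem` done = done
      | otherwise =
          case lookup m g of
            Nothing   -> done
            Just imps -> m : foldl visit done imps
```

+For the diamond graph used earlier (@D@ imports @B@ and @C@, which both
+import @A@) this yields @A@, @B@, @C@, @D@.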
+\section{A compiler}
+Most of the system's complexity is hidden inside the functions
+supplied in the @Compiler@ record:
+data Compiler = Compiler {
+   finder          :: PackageConf -> [Path] -> IO (ModuleName -> ModuleLocation),
+   summarise       :: ModuleLocation -> IO ModuleSummary,
+   compile         :: ModuleSummary
+                   -> Maybe ModuleIFace
+                   -> FiniteMap ModuleName ModuleDetails
+                   -> IO CompileResult,
+   compileDone     :: IO (),
+   compileStarting :: IO () -- still needed? I don't think so.
+   }
+type ModuleName = String -- or some such
+type Path = String -- an absolute file name
+\subsection{The module \mbox{\tt finder}}
+The @finder@, given a package configuration file and a list of
+directories to look in, will map module names to @ModuleLocation@s,
+in which the @Path@s are filenames, probably with an absolute path
+to them.
+data ModuleLocation = SourceOnly Path -- .hs
+ | ObjectCode Path Path -- .o & .hi
+ | InPackage Path -- .hi
+@SourceOnly@ and @ObjectCode@ are unremarkable. For sanity,
+we require that a module's object and interface be in the same
+directory. @InPackage@ indicates that the module is in a
+different package.
+@Module@ values -- perhaps all @Name@ish things -- contain the name of
+their package. That's so that
+\item Correct code can be generated for in-DLL vs out-of-DLL refs.
+\item We don't have version number dependencies for symbols
+ imported from different packages.
+Somehow or other, it will be possible to know all the packages
+required, so that the linker can load them.
+We could detect package dependencies by recording them in the
+@compile@r's @ModuleIFace@ cache, and with that and the
+package config info, figure out the complete set of packages
+to link. Or look at the command line args on startup.
+\ToDo{Need some way to tell incremental linkers about packages,
+ since in general we'll need to load and link them before
+ linking any modules in the current package.}
+\subsection{The module \mbox{\tt summarise}r}
+Given a filename of a module (\ToDo{presumably source or iface}),
+create a summary of it. A @ModuleSummary@ should contain only enough
+information for CM to construct an up-to-date picture of the
+dependency graph. Rather than expose CM to details of timestamps,
+etc, @summarise@ merely provides an up-to-date summary of any module.
+CM can extract the list of dependencies from a @ModuleSummary@, but
+other than that has no idea what's inside it.
+data ModuleSummary = ... (abstract) ...
+depsFromSummary :: ModuleSummary -> [ModuleName] -- module names imported
+sourceHasChanged :: ModuleSummary -> ModuleSummary -> Bool
+@summarise@ is intended to be fast -- a @stat@ of the source or
+interface to see if it has changed, and, if so, a quick semi-parse to
+determine the new imports.
+\subsection{The module \mbox{\tt compile}r}
+@compile@ traffics in @ModuleIFace@s and @ModuleDetails@.
+A @ModuleIFace@ is an in-memory representation of the contents of an
+interface file, including version numbers, unfoldings and pragmas, and
+the linkable code for the module. @ModuleIFace@s are un-renamed,
+using @HsSym@/@RdrNames@ rather than (globally distinct) @Names@.
+@ModuleDetails@, by contrast, is an in-memory representation of the
+static environment created by compiling a module. It is phrased in
+terms of post-renaming @Names@, @TyCon@s, etc, so it's basically a
+renamed-to-global-uniqueness rendition of a @ModuleIFace@.
+In an interactive session, we'll want to be able to evaluate
+expressions as if they had been compiled in the scope of some
+specified module. This means that the @ModuleDetails@ must contain
+the type of everything defined in the module, rather than just the
+types of exported stuff. As a consequence, @ModuleIFace@ must also
+contain the type of everything, because it should always be possible
+to generate a module's @ModuleDetails@ from its @ModuleIFace@.
+CM maintains two mappings, one from @ModuleName@s to @ModuleIFace@s,
+the other from @ModuleName@s to @ModuleDetails@. It passes the latter
+to each call of @compile@. This is used to supply information about
+modules compiled prior to this one (lower down in the graph). The
+returned @CompileResult@ supplies a new @ModuleDetails@ for the module
+if compilation succeeded, and CM adds this to the mapping. The
+@CompileResult@ also supplies a new @ModuleIFace@, which is either the
+same as that supplied to @compile@, if @compile@ decided not to
+retranslate the module, or is the result of a fresh translation (from
+source). So these mappings are an explicitly-passed-around part of
+the global system state.
+@compile@ may {\em optionally} also accumulate @ModuleIFace@s for
+modules in different packages -- that is, interfaces which we read,
+but never attempt to recompile source for. Such interfaces, being
+from foreign packages, never change, so @compile@ can accumulate them
+in perpetuity in a private global variable. Indeed, a major motivator
+of this design is to facilitate this caching of interface files,
+reading of which is a serious bottleneck for the current compiler.
+When CM restarts compilation down at the bottom of the module graph,
+it first needs to throw away all \ToDo{all?} @ModuleDetails@ in the
+upward closure of the out-of-date modules. So @ModuleDetails@ don't
+persist across recompilations. But @ModuleIFace@s do, since they
+are conceptually equivalent to interface files.
+\subsubsection*{What @compile@ returns}
+@compile@ returns a @CompileResult@ to CM.
+Note that @compile@'s foreign-package interface cache can
+become augmented even as a result of reading interfaces for a
+compilation attempt which ultimately fails, although it will not be
+augmented with a new @ModuleIFace@ for the failed module.
+-- CompileResult is not abstract to the Compilation Manager
+data CompileResult
+ = CompOK ModuleIFace
+ ModuleDetails -- compiled ok, here are new details
+ -- and new iface
+ | CompErr [SDoc] -- compilation gave errors
+ | NoChange -- no change required, meaning:
+ -- exports, unfoldings, strictness, etc,
+ -- unchanged, and executable code unchanged
+\subsubsection*{Re-establishing local-to-global name mappings}
+module Upper where            module Lower ( f ) where
+import Lower ( f )            f = ...
+g = ... f ...
+When @Lower@ is first compiled, @f@ is allocated a @Unique@
+(presumably inside an @Id@ or @Name@?). When @Upper@ is then
+compiled, its reference to @f@ is attached directly to the
+@Id@ created when compiling @Lower@.
+If the definition of @f@ is now changed, but not the type,
+unfolding, strictness, or any other thing which affects the way
+it should be called, we will have to recompile @Lower@, but not
+@Upper@. This creates a problem -- @g@ will then refer to the
+old @Id@ for @f@, not the new one. This may or may not
+matter, but it seems safer to ensure that all @Unique@-based
+references into child modules are always up to date.
+So @compile@ recreates the @ModuleDetails@ for @Upper@ from
+the @ModuleIFace@ of @Upper@ and the @ModuleDetails@ of @Lower@.
+The rule is: if a module is up to date with respect to its
+source, but a child @C@ has changed, then either:
+\item On examination of the version numbers in @C@'s
+ interface/@ModuleIFace@ that we used last time, we discover that
+ an @Id@/@TyCon@/class/instance we depend on has changed. So
+ we need to retranslate the module from its source, generating
+ a new @ModuleIFace@ and @ModuleDetails@.
+\item Or: there's nothing in @C@'s interface that we depend on.
+ So we quickly recreate a new @ModuleDetails@ from the existing
+ @ModuleIFace@, creating fresh links to the new @Unique@-world
+ entities in @C@'s new @ModuleDetails@.
+Upshot: we need to redo @compile@ on all modules all the way up,
+rather than just the ones that need retranslation. However, we hope
+that most modules won't need retranslation -- just regeneration of the
+@ModuleDetails@ from the @ModuleIFace@. In effect, the @ModuleIFace@
+is a quickly-compilable representation of the module's contents, just
+enough to create the @ModuleDetails@.
+\ToDo{Is there anything in @ModuleDetails@ which can't be
+ recreated from @ModuleIFace@ ?}
+So the @ModuleIFace@s persist across calls to @HEP.load@, whereas
+@ModuleDetails@ are reconstructed on every compilation pass. This
+means that @ModuleIFace@s have the same lifetime as the byte/object
+code, and so should somehow contain their code.
+The behind-the-scenes @ModuleIFace@ cache has some kind of holding-pen
+arrangement, to lazify the copying-out of stuff from it, and thus to
+minimise redundant interface reading. \ToDo{Burble burble. More
+detail needed.}
+When CM starts working back up the module graph with @compile@, it
+needs to remove from the travelling @FiniteMap@ @ModuleName@
+@ModuleDetails@ the details for all modules in the upward closure of
+the compilation start points. However, since we're going to visit
+precisely those modules and no others on the way back up, we might as
+well just zap the old @ModuleDetails@ incrementally. This does
+mean that the @FiniteMap@ @ModuleName@ @ModuleDetails@ will be
+inconsistent until we reach the top.
+In interactive mode, each @compile@ call on a module for which no
+object code is available, or for which it is out of date wrt source,
+emits bytecode into memory, updates the resulting @ModuleIFace@ with the
+address of the bytecode image, and links the image.
+In batch mode, emit assembly or object code onto disk. Record
+somewhere \ToDo{where?} that this object file needs to go into the
+final link.
+When we reach the top, @compileDone@ is called, to signify that batch
+linking can now proceed, if need be.
+Modules in other packages never get a @ModuleIFace@ or @ModuleDetails@
+entry in CM's maps -- those maps are only for modules in this package.
+As previously mentioned, @compile@ may optionally cache @ModuleIFace@s
+for foreign package modules. When reading such an interface, we don't
+need to read the version info for individual symbols, since foreign
+packages are assumed static.
+\subsubsection*{What's in a \mbox{\tt ModuleIFace}?}
+Current interface file contents?
+\subsubsection*{What's in a \mbox{\tt ModuleDetails}?}
+There is no global symbol table @:: Name -> ???@. To look up a
+@Name@, first extract the @ModuleName@ from it, look that up in
+the passed-in @FiniteMap@ @ModuleName@ @ModuleDetails@,
+and finally look in the relevant @Env@.
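+The two-stage lookup might be sketched like this. It is a simplified
+model, not the real types: @Name@ carries no @Unique@, @TyThing@ holds
+strings, only the type environment of @ModuleDetails@ is modelled, and
+association lists stand in for @FiniteMap@. The function @lookupName@
+is a name invented for the example.

```haskell
type ModuleName = String

-- A Name carries the module that defines it (simplified: no Unique).
data Name = Name { nameModule :: ModuleName, nameOcc :: String }
  deriving (Eq, Show)

-- Stand-in for the symbol table entry type.
data TyThing = AClass String | ATyCon String | AnId String
  deriving (Eq, Show)

-- Only the type environment of ModuleDetails is modelled here.
newtype ModuleDetails = ModuleDetails { typeEnv :: [(Name, TyThing)] }

-- Two-stage lookup: find the defining module's details, then look in its Env.
lookupName :: [(ModuleName, ModuleDetails)] -> Name -> Maybe TyThing
lookupName table n = lookup (nameModule n) table >>= lookup n . typeEnv
```

+A @Name@ whose module has been removed from the map simply fails to
+resolve, which is the behaviour the upsweep relies on while the map is
+temporarily incomplete.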
+\ToDo{Do we still have the @HoldingPen@, or is it now composed from
+per-module bits too?}
+data ModuleDetails = ModuleDetails {
+ moduleExports :: what it exports (Names)
+ -- roughly a subset of the .hi file contents
+ moduleEnv :: RdrName -> Name
+ -- maps top-level entities in this module to
+ -- globally distinct (Uniq-ified) Names
+ moduleDefs :: Bag Name -- All the things in the global symbol table
+ -- defined by this module
+ package :: Package -- what package am I in?
+ lastCompile :: Date -- of last compilation
+ instEnv :: InstEnv -- local inst env
+ typeEnv :: Name -> TyThing -- local tycon env?
+ }
+-- A (globally unique) symbol table entry. Note that Ids contain
+-- unfoldings.
+data TyThing = AClass Class
+ | ATyCon TyCon
+ | AnId Id
+What's the stuff in @ModuleDetails@ used for?
+\item @moduleExports@ so that the stuff which is visible from outside
+ the module can be calculated.
+\item @moduleEnv@: \ToDo{umm err}
+\item @moduleDefs@: one reason we want this is so that we can nuke the
+ global symbol table contribs from this module when it leaves the
+ system. \ToDo{except ... we don't have a global symbol table any
+ more.}
+\item @package@: we will need to chase arbitrarily deep into the
+ interfaces of other packages. Of course we don't want to
+ recompile those, but as we've read their interfaces, we may
+ as well cache that info. So @package@ indicates whether this
+ module is in the default package, or, if not, which it is in.
+ Also, when we come to linking, we'll need to know which
+ packages are demanded, so we know to load their objects.
+\item @lastCompile@: When the module was last compiled. If the
+ source is older than that, then a recompilation can only be
+ required if children have changed.
+\item @typeEnv@: obvious??
+\item @instEnv@: the instances contributed by this module only. The
+ Report allegedly says that when a module is translated, the
+ available
+ instance env is all the instances in the downward closure of
+ itself in the module graph.
+ We choose to use this simple representation -- each module
+ holds just its own instances -- and do the naive thing when
+ creating an inst env for compilation with. If this turns out
+ to be a performance problem we'll revisit the design.
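+The ``naive thing'' for instance environments could look like the
+following sketch: walk the downward closure of a module and concatenate
+each module's own instances. The @Mod@ record, the use of strings for
+instances, and the names @downwardClosure@ and @instEnvFor@ are all
+assumptions made for illustration.

```haskell
type ModuleName = String
type Instance   = String   -- e.g. "Eq T"; a stand-in for a real instance

-- Per-module information needed for this sketch only.
data Mod = Mod { imports :: [ModuleName], ownInsts :: [Instance] }

-- Downward closure of m: m itself plus everything it imports, transitively.
downwardClosure :: [(ModuleName, Mod)] -> ModuleName -> [ModuleName]
downwardClosure table = go [] . (: [])
  where
    go seen []     = reverse seen
    go seen (m:ms)
      | m `elem` seen = go seen ms
      | otherwise     = go (m : seen) (ms ++ maybe [] imports (lookup m table))

-- The naive instance environment: concatenate each module's own instances.
instEnvFor :: [(ModuleName, Mod)] -> ModuleName -> [Instance]
instEnvFor table m =
  concat [ ownInsts d | n <- downwardClosure table m, Just d <- [lookup n table] ]
```

+The cost is a graph walk per compiled module, which is the potential
+performance problem mentioned above.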
+\section{Misc text looking for a home}
+\ToDo{All this linking stuff is now bogus.}
+There's an abstract @LinkState@, which is threaded through the linkery
+bits. CM can call @addpkgs@ to notify the linker of packages
+required, and it can call @addmods@ to announce modules which need to
+be linked. Finally, CM calls @endlink@, after which an executable
+image should be ready. The linker may link incrementally, during each
+call of @addpkgs@ and @addmods@, or it can just store up names and do
+all the linking when @endlink@ is called.
+In order that incremental linking is possible, CM should specify
+packages and module groups in dependency order, ie, from the bottom up.
+\subsection*{In-memory linking of bytecode}
+When being HEP-like, @compile@ will translate sources to bytecodes
+in memory, with all the bytecode for a module as a contiguous lump
+outside the heap. It needs to communicate the addresses of these
+lumps to the linker. The linker also needs to know whether a
+given module is available as in-memory bytecode, or whether it
+needs to load machine code from a file.
+I guess @LinkState@ needs to map module names to base addresses
+of their loaded images, + the nature of the image, + whether or not
+the image has been linked.
+\subsection*{On disk linking of object code, to give an executable}
+The @LinkState@ in this case is just a list of module and package
+names, which @addpkgs@ and @addmods@ add to. The final @endlink@
+call can invoke the system linker.
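+For the batch case, a @LinkState@ of this kind could be as small as the
+sketch below. The record fields, the pure (non-@IO@) signatures, and
+the file-naming scheme used by @endlink@ (@M.o@ for modules, @-lP@ for
+packages) are assumptions for the example; here @endlink@ only builds
+the argument list that would be handed to the system linker.

```haskell
type ModuleName = String
type PkgName    = String

-- Batch-mode LinkState: just the names accumulated so far, newest first.
data LinkState = LinkState { lsPkgs :: [PkgName], lsMods :: [ModuleName] }

emptyLinkState :: LinkState
emptyLinkState = LinkState [] []

-- Record packages, preserving the order in which they were announced.
addpkgs :: [PkgName] -> LinkState -> LinkState
addpkgs ps st = st { lsPkgs = reverse ps ++ lsPkgs st }

-- Record modules, preserving the order in which they were announced.
addmods :: [ModuleName] -> LinkState -> LinkState
addmods ms st = st { lsMods = reverse ms ++ lsMods st }

-- Build the linker argument list: objects first, then package libraries.
-- The naming scheme is an assumption made for this sketch.
endlink :: LinkState -> [String]
endlink st = [ m ++ ".o" | m <- reverse (lsMods st) ]
          ++ [ "-l" ++ p | p <- reverse (lsPkgs st) ]
```

+Because the state only stores names, all real work is deferred to
+@endlink@, which matches the non-incremental option described earlier.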
+\subsection{Finding out about packages, dependencies, and auxiliary
+ objects}
+Ask the @packages.conf@ file that lives with the driver at the mo.
+\ToDo{policy about upward closure?}
+\ToDo{record story about how in memory linking is done.}
+\ToDo{linker start/stop/initialisation/persistence. Need to
+ say more about @LinkState@.}