Diffstat (limited to 'docs/ghci')
-rw-r--r-- | docs/ghci/ghci.tex | 1598 |
1 files changed, 1598 insertions, 0 deletions
diff --git a/docs/ghci/ghci.tex b/docs/ghci/ghci.tex
new file mode 100644
index 0000000000..c4638a6719
--- /dev/null
+++ b/docs/ghci/ghci.tex
@@ -0,0 +1,1598 @@
+%
+% (c) The OBFUSCATION-THROUGH-GRATUITOUS-PREPROCESSOR-ABUSE Project,
+%     Glasgow University, 1990-2000
+%
+
+% \documentstyle[preprint]{acmconf}
+\documentclass[11pt]{article}
+\oddsidemargin 0.1 in       % Note that \oddsidemargin = \evensidemargin
+\evensidemargin 0.1 in
+\marginparwidth 0.85in      % Narrow margins require narrower marginal notes
+\marginparsep 0 in
+\sloppy
+
+%\usepackage{epsfig}
+\usepackage{shortvrb}
+\MakeShortVerb{\@}
+
+%\newcommand{\note}[1]{{\em Note: #1}}
+\newcommand{\note}[1]{{{\bf Note:}\sl #1}}
+\newcommand{\ToDo}[1]{{{\bf ToDo:}\sl #1}}
+\newcommand{\Arg}[1]{\mbox{${\tt arg}_{#1}$}}
+\newcommand{\bottom}{\perp}
+
+\newcommand{\secref}[1]{Section~\ref{sec:#1}}
+\newcommand{\figref}[1]{Figure~\ref{fig:#1}}
+\newcommand{\Section}[2]{\section{#1}\label{sec:#2}}
+\newcommand{\Subsection}[2]{\subsection{#1}\label{sec:#2}}
+\newcommand{\Subsubsection}[2]{\subsubsection{#1}\label{sec:#2}}
+
+% DIMENSION OF TEXT:
+\textheight 8.5 in
+\textwidth 6.25 in
+
+\topmargin 0 in
+\headheight 0 in
+\headsep .25 in
+
+
+\setlength{\parskip}{0.15cm}
+\setlength{\parsep}{0.15cm}
+\setlength{\topsep}{0cm}    % Reduces space before and after verbatim,
+                            % which is implemented using trivlist
+\setlength{\parindent}{0cm}
+
+\renewcommand{\textfraction}{0.2}
+\renewcommand{\floatpagefraction}{0.7}
+
+\begin{document}
+
+\title{The GHCi Draft Design, round 2}
+\author{MSR Cambridge Haskell Crew \\
+        Microsoft Research Ltd., Cambridge}
+
+\maketitle
+
+%%%\tableofcontents
+%%%\newpage
+
+%%-----------------------------------------------------------------%%
+\section{Details}
+
+\subsection{Outline of the design}
+\label{sec:details-intro}
+
+The design falls into three major parts:
+\begin{itemize}
+\item The compilation manager (CM), which coordinates the
+      system and supplies a HEP-like interface to clients.
+\item The module compiler (@compile@), which translates individual
+      modules to interpretable or machine code.
+\item The linker (@link@),
+      which maintains the executable image in interpreted mode.
+\end{itemize}
+
+There are also three auxiliary parts: the finder, which locates
+source, object and interface files; the summariser, which quickly
+finds dependency information for modules; and the static info
+(compiler flags and package details), which is unchanged over the
+course of a session.
+
+This section continues with an overview of the session-lifetime data
+structures. Then follow the finder (\secref{finder}),
+the summariser (\secref{summariser}),
+the static info (\secref{staticinfo}),
+and finally the three big sections
+(Sections~\ref{sec:manager},~\ref{sec:compiler} and~\ref{sec:linker})
+on the compilation manager, compiler and linker respectively.
+
+\subsubsection*{Some terminology}
+
+Lifetimes: the phrase {\bf session lifetime} covers a complete run of
+GHCI, encompassing multiple recompilation runs. {\bf Module lifetime}
+is much shorter: it is the lifetime of data which is needed to
+translate a single module and then discarded, for example Core,
+AbstractC and Stix trees.
+
+Data structures with module lifetime are well documented and understood.
+This document is mostly concerned with session-lifetime data.
+Most of these structures are ``owned'' by CM, since that's
+the only major component of GHCI which deals with session-lifetime
+issues.
+
+Modules and packages: {\bf home} refers to modules in this package,
+precisely the ones tracked and updated by the compilation manager.
+{\bf Package} refers to modules in all other packages, which are
+assumed static.
+
+\subsubsection*{A summary of all session-lifetime data structures}
+
+These structures have session lifetime but not necessarily global
+visibility. Subsequent sections describe which components can see what.
+\begin{itemize}
+\item {\bf Home Symbol Table (HST)} (owner: CM) holds the post-renaming
+      environments created by compiling each home module.
+\item {\bf Home Interface Table (HIT)} (owner: CM) holds in-memory
+      representations of the interface files created by compiling
+      each home module.
+\item {\bf Unlinked Images (UI)} (owner: CM) are executable but as-yet
+      unlinked translations of home modules only.
+\item {\bf Module Graph (MG)} (owner: CM) is the current module graph.
+\item {\bf Static Info (SI)} (owner: CM) is the package configuration
+      information (PCI) and compiler flags (FLAGS).
+\item {\bf Persistent Compiler State (PCS)} (owner: @compile@)
+      is @compile@'s private cache of information about package
+      modules.
+\item {\bf Persistent Linker State (PLS)} (owner: @link@) is
+      @link@'s private information concerning the current
+      state of the (in-memory) executable image.
+\end{itemize}
+
+
+%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
+\subsection{The finder (\mbox{\tt type Finder})}
+\label{sec:finder}
+
+@Path@ could indicate a location in a filesystem, or it could be
+some more generic kind of resource identifier, for example a URL.
+\begin{verbatim}
+   data Path = ...
+\end{verbatim}
+
+We also need some names. @Module@s are now used as primary keys for
+various maps, so they are given a @Unique@.
+\begin{verbatim}
+   type ModName = String   -- a module name
+   type PkgName = String   -- a package name
+   type Module  = ...      -- contains a ModName and a Unique, at least
+\end{verbatim}
+
+A @ModLocation@ says where a module is, what it's called and in what
+form it is available.
+\begin{verbatim}
+   data ModLocation = SourceOnly Module Path        -- .hs
+                    | ObjectCode Module Path Path   -- .o, .hi
+                    | InPackage  Module PkgName
+                         -- examine PCI to determine package Path
+\end{verbatim}
+
+The module finder generates @ModLocation@s from @ModName@s.
+We expect it to assume that packages are static, but we want to be
+able to track changes in home modules during the session. Specifically,
+we want to be able to notice that a module's object and interface have
+been updated, presumably by a compile run outside of the GHCI session.
+Hence the two-stage type:
+\begin{verbatim}
+   type Finder = ModName -> IO ModLocation
+   newFinder :: PCI -> IO Finder
+\end{verbatim}
+@newFinder@ examines the package information right at the start, but
+returns an @IO@-typed function which can inspect home module changes
+later in the session.
+
+
+%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
+\subsection{The summariser (\mbox{\tt summarise})}
+\label{sec:summariser}
+
+A @ModSummary@ records the minimum information needed to establish the
+module graph and determine which modules' sources have changed.
+@ModSummary@s can be created quickly.
+\begin{verbatim}
+   data ModSummary = ModSummary
+                        ModLocation        -- location and kind
+                        (Maybe (String, Fingerprint))
+                                           -- source and fingerprint if .hs
+                        (Maybe [ModName])  -- imports if .hs or .hi
+
+   type Fingerprint = ...  -- file timestamp, or source checksum?
+
+   summarise :: ModLocation -> IO ModSummary
+\end{verbatim}
+
+The summary contains the location and source text, and the location
+contains the name. We would like to remove the assumption that
+sources live on disk, but I'm not sure this design is good enough
+for that yet.
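To illustrate why a summary is cheap to build and to compare, here is a minimal Haskell sketch. It is not part of the design: @ModSummary@ is reduced to a name, a fingerprint and an import list; the fingerprint is assumed to be a source checksum (one of the two options left open above); and the helpers @checksum@, @scanImports@, @summariseSource@ and @sourceChanged@ are hypothetical. The import scan only looks at lines beginning with @import@, which is roughly the shallow parse a summariser needs.

```haskell
import Data.Char (ord)
import Data.List (isPrefixOf)

-- Hypothetical, simplified stand-ins for the design's types: a summary is
-- reduced to a module name, a fingerprint and the imported module names.
type ModName     = String
type Fingerprint = Int

data ModSummary = ModSummary
  { msName        :: ModName
  , msFingerprint :: Fingerprint  -- assumed here to be a source checksum
  , msImports     :: [ModName]    -- found by a shallow scan, not a full parse
  } deriving (Eq, Show)

-- A toy djb2-style checksum over the source text; a timestamp or a
-- stronger hash could be used instead.
checksum :: String -> Fingerprint
checksum = foldl step 5381
  where step h c = (h * 33 + ord c) `mod` 1000000007

-- Collect the module named on each line that starts with "import",
-- skipping an optional "qualified" keyword. Nothing else is parsed.
scanImports :: String -> [ModName]
scanImports src =
  [ name
  | l <- lines src
  , "import " `isPrefixOf` l
  , name : _ <- [filter (/= "qualified") (drop 1 (words l))]
  ]

-- Build a summary from a module's name and source text.
summariseSource :: ModName -> String -> ModSummary
summariseSource m src = ModSummary m (checksum src) (scanImports src)

-- A module's source has changed if it had no summary in the old module
-- graph, or if its fingerprint differs from the old one.
sourceChanged :: [(ModName, ModSummary)] -> ModSummary -> Bool
sourceChanged oldMG new =
  case lookup (msName new) oldMG of
    Nothing  -> True
    Just old -> msFingerprint old /= msFingerprint new
```

Comparing new summaries against those from the old module graph in this way is the kind of cheap source-change check that optimisation step 3a of @cmLoadModule@ relies on; a timestamp-based @Fingerprint@ would simply replace @checksum@.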
+
+\ToDo{Should @ModSummary@ contain source text for interface files too?}
+\ToDo{Also say that @ModIFace@ contains its module's @ModSummary@ (why?).}
+
+
+%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
+\subsection{Static information (SI)}
+\label{sec:staticinfo}
+
+PCI, the package configuration information, is a list of @PkgInfo@,
+each containing at least the following:
+\begin{verbatim}
+   data PkgInfo
+      = PkgInfo PkgName     -- my name
+                Path        -- path to my base location
+                [PkgName]   -- who I depend on
+                [ModName]   -- modules I supply
+                [Unlinked]  -- paths to my object files
+
+   type PCI = [PkgInfo]
+\end{verbatim}
+The @Path@s in it, including those in the @Unlinked@s, are set up
+when GHCI starts.
+
+FLAGS is a collection of compiler options. We haven't yet figured out
+how to partition them into those for the whole session versus those
+for specific source files, so currently the best we can do is:
+\begin{verbatim}
+   data FLAGS = ...
+\end{verbatim}
+
+The static information (SI) combines both of these:
+\begin{verbatim}
+   data SI = SI PCI
+                FLAGS
+\end{verbatim}
+
+
+
+%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
+\subsection{The Compilation Manager (CM)}
+\label{sec:manager}
+
+\subsubsection{Data structures owned by CM}
+
+CM maintains two maps (HST, HIT) and a set (UI). It's important to
+realise that CM only knows about the map/set-ness, and has no idea
+what a @ModDetails@, @ModIFace@ or @Linkable@ is. Only @compile@ and
+@link@ know that, and CM passes these types around without
+inspecting them.
+
+\begin{itemize}
+\item
+   {\bf Home Symbol Table (HST)} @:: FiniteMap Module ModDetails@
+
+   The @ModDetails@ (a couple of layers down) contain tycons, classes,
+   instances, etc., collectively known as ``entities''. References from
+   other modules to these entities are direct, with no intervening
+   indirections of any kind; conversely, these entities refer directly
+   to other entities, regardless of module boundaries.
+   HST only holds information for home modules; the corresponding
+   wired-up details for package (non-home) modules are created on
+   demand in the package symbol table (PST) inside the persistent
+   compiler state (PCS).
+
+   CM maintains the HST, which is passed to, but not modified by,
+   @compile@. If compilation of a module is successful, @compile@
+   returns the resulting @ModDetails@ (inside the @CompResult@) which
+   CM then adds to HST.
+
+   CM throws away arbitrarily large parts of HST at the start of a
+   rebuild, and uses @compile@ to incrementally reconstruct it.
+
+\item
+   {\bf Home Interface Table (HIT)} @:: FiniteMap Module ModIFace@
+
+   (Completely private to CM; nobody else sees it.)
+
+   Compilation of a module always creates a @ModIFace@, which contains
+   the unlinked symbol table entries. CM maintains this @FiniteMap@
+   @Module@ @ModIFace@, with session lifetime. CM never throws away
+   @ModIFace@s, but it does update them, by passing old ones to
+   @compile@ if they exist, and getting new ones back.
+
+   CM acquires @ModIFace@s from @compile@, which it only applies
+   to modules in the home package. As a result, HIT only contains
+   @ModIFace@s for modules in the home package. Those from other
+   packages reside in the package interface table (PIT), which is a
+   component of PCS.
+
+\item
+   {\bf Unlinked Images (UI)} @:: Set Linkable@
+
+   The @Linkable@s in UI represent executable but as-yet unlinked
+   module translations. A @Linkable@ can contain the name of an
+   object, archive or DLL file. In interactive mode, it may also be
+   the STG trees derived from translating a module. So each run of
+   @compile@ which translates a module returns a @Linkable@ for that
+   module.
+
+   At link time, CM supplies @link@ with the @Linkable@s for the
+   upwards closure of all modules which have changed.
+   It also examines the
+   @ModSummary@s for all home modules, and by examining their imports
+   and the SI.PCI (package configuration info) it can determine the
+   @Linkable@s from all required imported packages too.
+
+   @Linkable@s and @ModIFace@s have a close relationship. Each
+   translated module has a corresponding @Linkable@ somewhere.
+   However, there may be @Linkable@s with no corresponding modules
+   (the RTS, for example). Conversely, multiple modules may share a
+   single @Linkable@ -- as is the case for any module from a
+   multi-module package. For these reasons it seems appropriate to
+   keep the two concepts distinct. @Linkable@s also provide
+   information about the sequence in which individual package
+   components should be linked, and that isn't the business of any
+   specific module to know.
+
+   CM passes @compile@ a module's old @ModIFace@, if it has one, in
+   the hope that the module won't need recompiling. If so, @compile@
+   can just return the new @ModDetails@ created from it, and CM will
+   re-use the old @ModIFace@. If the module {\em is} recompiled (or
+   scheduled to be loaded from disk), @compile@ returns both the
+   new @ModIFace@ and the new @Linkable@.
+
+\item
+   {\bf Module Graph (MG)} @:: known-only-to-CM@
+
+   Records, for CM's purposes, the current module graph,
+   up-to-dateness and summaries. More details when I get to them.
+   It contains only home modules.
+\end{itemize}
+Probably all of this is rolled together into the Persistent CM
+State (PCMS):
+\begin{verbatim}
+   data PCMS = PCMS HST HIT UI MG
+   emptyPCMS :: IO PCMS
+\end{verbatim}
+
+\subsubsection{What CM implements}
+CM more or less implements the HEP interface.
+First, though, define a
+containing structure for the state of the entire CM system and its
+subsystems @compile@ and @link@:
+\begin{verbatim}
+   data CmState
+      = CmState PCMS     -- CM's stuff
+                PCS      -- compile's stuff
+                PLS      -- link's stuff
+                SI       -- the static info, never changes
+                Finder   -- the finder
+\end{verbatim}
+
+The @CmState@ is threaded through the HEP interface. In reality
+this might be done using @IORef@s, but for clarity:
+\begin{verbatim}
+   type ModHandle = ... (opaque to CM/HEP clients) ...
+   type HValue    = ... (opaque to CM/HEP clients) ...
+
+   cmInit       :: FLAGS
+                -> [PkgInfo]
+                -> IO CmState
+
+   cmLoadModule :: CmState
+                -> ModName
+                -> IO (CmState, Either [SDoc] ModHandle)
+
+   cmGetExpr    :: ModHandle
+                -> CmState
+                -> String
+                -> IO (CmState, Either [SDoc] HValue)
+
+   cmRunExpr    :: HValue -> IO ()   -- don't need CmState here
+\end{verbatim}
+Almost all the huff and puff in this document pertains to @cmLoadModule@.
+
+
+\subsubsection{Implementing \mbox{\tt cmInit}}
+@cmInit@ creates an empty @CmState@ using @emptyPCMS@, @emptyPCS@ and
+@emptyPLS@, making SI from the supplied flags and package info, and
+creating the finder by passing the package info to @newFinder@.
+
+
+\subsubsection{Implementing \mbox{\tt cmLoadModule}}
+
+\begin{enumerate}
+\item {\bf Downsweep:} using @finder@ and @summarise@, chase from
+      the given module to
+      establish the new home module graph (MG). Do not chase into
+      package modules.
+\item Remove from HIT, HST and UI any modules in the old MG which are
+      not in the new one. The old MG is then replaced by the new one.
+\item Topologically sort MG to generate a bottom-to-top traversal
+      order, giving a worklist.
+\item {\bf Upsweep:} call @compile@ on each module in the worklist in
+      turn, passing it
+      the ``correct'' HST, PCS, the old @ModIFace@ if
+      available, and the summary.
+      ``Correct'' HST in the sense that
+      HST contains only the modules in this module's downward
+      closure, so that @compile@ can construct the correct instance
+      and rule environments simply as the union of those in
+      the module's downward closure.
+
+      If @compile@ doesn't return a new interface/linkable pair,
+      compilation wasn't necessary. Either way, update HST with
+      the new @ModDetails@, and also update UI and HIT if a
+      compilation {\em did} occur.
+
+      Keep going until the root module is successfully done, or
+      compilation fails.
+
+\item If the previous step terminated because compilation failed,
+      define the successful set as those modules in successfully
+      completed SCCs, i.e.\ all @Linkable@s returned by @compile@ excluding
+      those from modules in any cycle which includes the module which failed.
+      Remove from HST, HIT, UI and MG all modules mentioned in MG which
+      are not in the successful set. Call @link@ with the successful
+      set, which should succeed. The net effect is to back off to a
+      point at which those modules which are still aboard are correctly
+      compiled and linked.
+
+      If the previous step terminated successfully,
+      call @link@, passing it the @Linkable@s in the upward closure of
+      all those modules for which @compile@ produced a new @Linkable@.
+\end{enumerate}
+As a small optimisation, do this:
+\begin{enumerate}
+\item[3a.] Remove from the worklist any module M where M's source
+      hasn't changed and neither has the source of any module in M's
+      downward closure. This has the effect of not starting the upsweep
+      right at the bottom of the graph when that's not needed.
+      Source-change checking can be done quickly by CM by comparing
+      summaries of modules in MG against corresponding
+      summaries from the old MG.
+\end{enumerate}
+
+
+%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
+\subsection{The compiler (\mbox{\tt compile})}
+\label{sec:compiler}
+
+\subsubsection{Data structures owned by \mbox{\tt compile}}
+
+{\bf Persistent Compiler State (PCS)} @:: known-only-to-compile@
+
+This contains info about foreign packages only, acting as a cache,
+which is private to @compile@. The cache never becomes out of
+date. There are three parts to it:
+
+   \begin{itemize}
+   \item
+   {\bf Package Interface Table (PIT)} @:: FiniteMap Module ModIFace@
+
+   @compile@ reads interfaces from modules in foreign packages, and
+   caches them in the PIT. Subsequent imports of the same module get
+   them directly out of the PIT, avoiding slow lexing/parsing phases.
+   Because foreign packages are assumed never to become out of date,
+   all contents of PIT remain valid forever. @compile@ of course
+   tries to find package interfaces in PIT in preference to reading
+   them from files.
+
+   Both successful and failed runs of @compile@ can add arbitrary
+   numbers of new interfaces to the PIT. The failed runs don't matter
+   because we assume that packages are static, so the data cached even
+   by a failed run is valid forever (i.e.\ for the rest of the session).
+
+   \item
+   {\bf Package Symbol Table (PST)} @:: FiniteMap Module ModDetails@
+
+   Adding a package interface to PIT doesn't make it directly usable
+   by @compile@, because it first needs to be wired (renamed and
+   typechecked) into the spaghetti of the HST. On the other hand,
+   most modules only use a few entities from any imported interface,
+   so wiring in the interface at PIT-entry time might be a big time
+   waster. Also, wiring in an interface could mean reading other
+   interfaces, and we don't want to do that unnecessarily.
+
+   The PST avoids these problems by allowing incremental wiring-in to
+   happen.
+   Pieces of foreign interfaces are copied out of the holding
+   pen (HP), renamed, typechecked, and placed in the PST, but only as
+   @compile@ discovers it needs them. In the process of incremental
+   renaming/typechecking, @compile@ may need to read more package
+   interfaces, which are added to the PIT and hence to
+   HP.~\ToDo{How? When?}
+
+   CM passes the PST to @compile@ and gets back an updated version
+   on both success and failure.
+
+   \item
+   {\bf Holding Pen (HP)} @:: HoldingPen@
+
+   HP holds parsed but not yet renamed or typechecked fragments of
+   package interfaces. As typechecking of other modules progresses,
+   fragments are removed (``slurped'') from HP, renamed and
+   typechecked, and placed in PCS.PST (see above). Slurping a
+   fragment may require new interfaces to be read into HP. The hope
+   is, though, that many fragments will never get slurped, reducing
+   the total number of interfaces read (as compared to eager slurping).
+
+   \end{itemize}
+
+   PCS is opaque to CM; only @compile@ knows what's in it, and how to
+   update it. Because packages are assumed static, PCS never becomes
+   out of date. So CM only needs to be able to create an empty PCS,
+   with @emptyPCS@, and thereafter just passes it through @compile@
+   with no further ado.
+
+   In return, @compile@ must promise not to store in PCS any
+   information pertaining to the home modules. If it did so, CM would
+   need to have a way to remove this information prior to commencing a
+   rebuild, which conflicts with PCS's opaqueness to CM.
+
+
+
+\subsubsection{What {\tt compile} does}
+@compile@ is necessarily somewhat complex. We've decided to do away
+with private global variables -- they make the design specification
+less clear, although the implementation might use them.
+Without further ado:
+\begin{verbatim}
+   compile :: SI              -- obvious
+           -> Finder          -- to find modules
+           -> ModSummary      -- summary, including source
+           -> Maybe ModIFace  -- old interface, if available
+           -> HST             -- for home module ModDetails
+           -> PCS             -- IN: the persistent compiler state
+           -> IO CompResult
+
+   data CompResult
+      = CompOK   ModDetails   -- new details (== HST additions)
+                 (Maybe (ModIFace, Linkable))
+                    -- interface and code; Nothing => compilation
+                    -- not needed (old interface and code are still valid)
+                 PCS          -- updated PCS
+                 [SDoc]       -- warnings
+
+      | CompErrs PCS          -- updated PCS
+                 [SDoc]       -- warnings and errors
+
+   data PCS
+      = MkPCS PIT         -- package interfaces
+              PST         -- post slurping global symtab contribs
+              HoldingPen  -- pre slurping interface bits and pieces
+
+   emptyPCS :: IO PCS     -- since CM has no other way to make one
+\end{verbatim}
+Although @compile@ is passed three of the global structures (FLAGS,
+HST and PCS), it only modifies PCS. The rest are modified by CM as it
+sees fit, based on what is returned in the @CompResult@.
+
+@compile@ is allowed to return an updated PCS even if compilation
+errors occur, since the information in it pertains only to foreign
+packages and is assumed to be always correct.
+
+What @compile@ does: \ToDo{A bit vague ... needs refining. How does
+                            @finder@ come into the game?}
+\begin{itemize}
+\item Figure out if this module needs recompilation.
+   \begin{itemize}
+   \item If there's no old @ModIFace@, it does. Else:
+   \item Compare the @ModSummary@ supplied with that in the
+         old @ModIFace@. If the source has changed, recompilation
+         is needed. Else:
+   \item Compare the usage version numbers in the old @ModIFace@ with
+         those in the imported @ModIFace@s. All needed interfaces
+         for this should be in either HIT or PIT. If any version
+         numbers differ, recompilation is needed.
+   \item Otherwise it isn't needed.
+   \end{itemize}
+
+\item
+  If recompilation is not needed, create a new @ModDetails@ from the
+  old @ModIFace@, looking up information in HST and PCS.PST as
+  necessary. Return the new details, a @Nothing@ denoting that
+  compilation was not needed, the PCS \ToDo{I don't think the PCS
+  should be updated, but who knows?}, and an empty warning list.
+
+\item
+  Otherwise, compilation is needed.
+
+  If the module is only available in object+interface form, read the
+  interface, make up details, and create a linkable pointing at the
+  object code. \ToDo{Does this involve reading any more interfaces? Does
+  it involve updating PST?}
+
+  Otherwise, translate from source, then create and return new
+  details, a new interface and linkable, the updated PST, and any
+  warnings.
+
+  When looking for a new interface, search HST, then PCS.PIT, and only
+  then read from disk, in which case add the new interface(s) to
+  PCS.PIT.
+
+  \ToDo{If compiling a module with a boot-interface file, check the
+  boot interface against the inferred interface.}
+\end{itemize}
+
+
+\subsubsection{Contents of \mbox{\tt ModDetails},
+               \mbox{\tt ModIFace} and \mbox{\tt HoldingPen}}
+Only @compile@ can see inside these three types -- they are opaque to
+everyone else. @ModDetails@ holds the post-renaming,
+post-typechecking environment created by compiling a module.
+
+\begin{verbatim}
+   data ModDetails
+      = ModDetails {
+           moduleExports :: Avails,
+           moduleEnv     :: GlobalRdrEnv,     -- == FM RdrName [Name]
+           typeEnv       :: FM Name TyThing,  -- TyThing is in TcEnv.lhs
+           instEnv       :: InstEnv,
+           fixityEnv     :: FM Name Fixity,
+           ruleEnv       :: FM Id [Rule]
+        }
+\end{verbatim}
+
+@ModIFace@ is nearly the same as @ParsedIface@ from @RnMonad.lhs@:
+\begin{verbatim}
+   type ModIFace = ParsedIface    -- not really, but ...
+   data ParsedIface
+      = ParsedIface {
+           pi_mod     :: Module,                      -- Complete with package info
+           pi_vers    :: Version,                     -- Module version number
+           pi_orphan  :: WhetherHasOrphans,           -- Whether this module has orphans
+           pi_usages  :: [ImportVersion OccName],     -- Usages
+           pi_exports :: [ExportItem],                -- Exports
+           pi_insts   :: [RdrNameInstDecl],           -- Local instance declarations
+           pi_decls   :: [(Version, RdrNameHsDecl)],  -- Local definitions
+           pi_fixity  :: (Version, [RdrNameFixitySig]),
+                                    -- Local fixity declarations, with their version
+           pi_rules   :: (Version, [RdrNameRuleDecl]),
+                                    -- Rules, with their version
+           pi_deprecs :: [RdrNameDeprecation]         -- Deprecations
+        }
+\end{verbatim}
+
+@HoldingPen@ is a cleaned-up version of that found in @RnMonad.lhs@,
+retaining just the three pieces actually comprising the holding pen:
+\begin{verbatim}
+   data HoldingPen
+      = HoldingPen {
+           iDecls :: DeclsMap,     -- A single, global map of Names to decls
+
+           iInsts :: IfaceInsts,
+           -- The as-yet un-slurped instance decls; this bag is depleted when we
+           -- slurp an instance decl so that we don't slurp the same one twice.
+           -- Each is 'gated' by the names that must be available before
+           -- this instance decl is needed.
+
+           iRules :: IfaceRules
+           -- Similar to instance decls, only for rules
+        }
+\end{verbatim}
+
+%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
+\subsection{The linker (\mbox{\tt link})}
+\label{sec:linker}
+
+\subsubsection{Data structures owned by the linker}
+
+In the same way that @compile@ has a persistent compiler state (PCS),
+the linker has a persistent (session-lifetime) state, the Persistent
+Linker State (PLS). In batch mode PLS is entirely irrelevant, because
+there is only a single link step; it can be a unit value ignored by
+everybody. In interactive mode PLS is composed of the following three
+parts:
+
+\begin{itemize}
+\item
+\textbf{The Source Symbol Table (SST)} @:: FiniteMap RdrName HValue@
+
+  The source symbol table is used when linking interpreted code.
+  Unlinked interpreted code consists of an STG tree where
+  the leaves are @RdrName@s. The linker's job is to resolve these to
+  actual addresses (the alternative is to resolve these lazily when
+  the code is run, but this requires passing the full symbol table
+  through the interpreter, and the repeated lookups will probably be
+  expensive).
+
+  The source symbol table therefore maps @RdrName@s to @HValue@s, for
+  every @RdrName@ that currently \emph{has} an @HValue@, including all
+  exported functions from object code modules that are currently
+  linked in. Linking therefore turns a @StgTree RdrName@ into an
+  @StgTree HValue@.
+
+  It is important that we can prune this symbol table by throwing away
+  the mappings for an entire module whenever we recompile/relink a
+  given module. The representation is therefore probably a two-level
+  mapping from module names to function/constructor names to
+  @HValue@s.
+
+\item \textbf{The Object Symbol Table (OST)} @:: FiniteMap String Addr@
+
+  This is a lower-level symbol table, mapping symbol names in object
+  modules to their addresses in memory. It is used only when
+  resolving the external references in an object module, and contains
+  only entries that are defined in object modules.
+
+  Why have two symbol tables? Well, there is a clear distinction
+  between the two: the source symbol table maps Haskell symbols to
+  Haskell values, and the object symbol table maps object symbols to
+  addresses. There is some overlap, in that Haskell symbols certainly
+  have addresses, and we could look up a Haskell symbol's address by
+  manufacturing the right object symbol and looking that up in the
+  object symbol table, but this is likely to be slow and would force
+  us to extend the object symbol table with all the symbols
+  ``exported'' by interpreted code.
+  Doing it this way enables us to
+  decouple the object management subsystem from the rest of the linker
+  with a minimal interface; something like
+
+  \begin{verbatim}
+  loadObject   :: Unlinked -> IO Object
+  unloadModule :: Unlinked -> IO ()
+  lookupSymbol :: String -> IO Addr
+  \end{verbatim}
+
+  Rather unfortunately, we need @lookupSymbol@ in order to populate the
+  source symbol table when linking in a newly compiled module. Our
+  object management subsystem is currently written in C, so decoupling
+  this interface as much as possible is highly desirable.
+
+\item
+   {\bf Linked Image (LI)} @:: no-explicit-representation@
+
+   LI isn't explicitly represented in the system, but we record it
+   here for completeness anyway. LI is the current set of
+   linked-together module, package and other library fragments
+   constituting the current executable image. LI comprises:
+   \begin{itemize}
+   \item Machine code (@.o@, @.a@, @.DLL@ file images) in memory.
+         These are loaded from disk when needed, and stored in
+         @malloc@ville. They are never freed or reused, since doing
+         so would create serious complications for storage management.
+         When no longer needed, they are simply abandoned. Linking the
+         same object code again produces a new copy in memory. We hope
+         this will not be too much of a space leak.
+   \item STG trees, which live in the GHCI heap and are managed by the
+         storage manager in the usual way. They are held alive (are
+         reachable) via the @HValue@s in the SST. Such @HValue@s are
+         applications of the interpreter function to the trees
+         themselves. Linking a tree comprises traversing the
+         tree, replacing all the @Id@s with pointers directly to the
+         relevant @_closure@ labels, as determined by searching the
+         OST. Once the leaves are linked, trees are wrapped with the
+         interpreter function. The resulting @HValue@s then behave
+         indistinguishably from compiled versions of the same code.
+   \end{itemize}
+   Because object code is outside the heap and never deallocated,
+   whilst interpreted code is held alive via the SST, there's no need
+   to have a data structure which ``is'' the linked image.
+
+   For batch compilation, LI doesn't exist, because OST doesn't exist
+   and @link@ doesn't load code into memory; it just invokes the
+   system linker.
+
+   \ToDo{Do we need to say anything about CAFs and SRTs? Probably ...}
+\end{itemize}
+As with PCS, CM has no way to create an initial PLS, so we supply
+@emptyPLS@ for that purpose.
+
+\subsubsection{The linker's interface}
+
+In practice, the PLS might be hidden in the I/O monad rather
+than passed around explicitly. (The same might be true for PCS.)
+Anyway:
+
+\begin{verbatim}
+   data PLS -- as described above; opaque to everybody except the linker
+
+   link :: PCI -> ??? -> [[Linkable]] -> PLS -> IO LinkResult
+
+   data LinkResult = LinkOK   PLS
+                   | LinkErrs PLS [SDoc]
+
+   emptyPLS :: IO PLS     -- since CM has no other way to make one
+\end{verbatim}
+
+CM uses @link@ as follows:
+
+After repeatedly using @compile@ to compile all modules which are
+out of date, CM invokes @link@. The @[[Linkable]]@ argument to
+@link@ represents the list of (recursive groups of) home modules which
+have been newly compiled, along with @Linkable@s for each of
+the packages in use (the compilation manager knows which external
+packages are referenced by the home package). The order of the list
+is important: it is sorted in such a way that linking any prefix of
+the list will result in an image with no unresolved references. Note
+that for batch linking there may be further restrictions; for example,
+it may not be possible to link recursive groups containing libraries.
+
+@link@ does the following:
+
+\begin{itemize}
+  \item
+  In batch mode, do nothing. In interactive mode,
+  examine the supplied @[[Linkable]]@ to determine which home
+  module @Unlinked@s are new. Remove precisely these @Linkable@s
+  from PLS.
+  (In fact we really need to remove their upwards
+  transitive closure, but I think it is an invariant that CM will
+  supply an upwards transitive closure of new modules.)
+  See below for descriptions of @Linkable@ and @Unlinked@.
+
+  \item
+  Batch system: invoke the external linker to link everything in one go.
+  Interactive: bind the @Unlinked@s for the newly compiled modules,
+  plus those for any newly required packages, into PLS.
+
+  Note that it is the linker's responsibility to remember which
+  objects and packages have already been linked. By comparing this
+  with the @Linkable@s supplied to @link@, it can determine which
+  of the linkables in LI are out of date.
+\end{itemize}
+
+If linking a group fails for any reason, @link@ should
+not modify its PLS at all. In other words, linking each group
+is atomic: it either succeeds completely or leaves PLS unchanged.
+
+\subsubsection*{\mbox{\tt Unlinked} and \mbox{\tt Linkable}}
+
+Two important types: @Unlinked@ and @Linkable@. The latter is a
+higher-level representation involving several of the former.
+An @Unlinked@ is a reference to unlinked executable code, something
+a linker could take as input:
+
+\begin{verbatim}
+   data Unlinked = DotO   Path
+                 | DotA   Path
+                 | DotDLL Path
+                 | Trees  [StgTree RdrName]
+\end{verbatim}
+
+The first three describe the location of a file (presumably)
+containing the code to link. @Trees@, which only exists in
+interactive mode, gives a list of @StgTree@s, in which the unresolved
+references are @RdrName@s -- hence their non-linkedness. Once linked,
+those @RdrName@s are replaced with pointers to the machine code
+implementing them.
+
+A @Linkable@ gathers together several @Unlinked@s and associates them
+with either a module or a package:
+
+\begin{verbatim}
+   data Linkable = LM Module [Unlinked]   -- a module
+                 | LP PkgName             -- a package
+\end{verbatim}
+
+The order of the @Unlinked@s in the list is important, as
+they are linked in left-to-right order.
The @Unlinked@ objects for a
particular package can be obtained from the package configuration (see
Section \ref{sec:staticinfo}).

\ToDo{When adding @Addr@s from an object module to SST, we need to
      somehow find out the @RdrName@s of the symbols exported by that
      module.
      So we'd need to pass in the @ModDetails@ or @ModIFace@ or some such?}



%%-----------------------------------------------------------------%%
\section{Background ideas}
\subsubsection*{Out of date, but correct in spirit}

\subsection{Restructuring the system}

At the moment @hsc@ compiles one source module into C or assembly.
This functionality is pushed inside a function called @compile@,
introduced shortly.  The main new chunk of code is CM, the compilation manager,
which supervises multiple runs of @compile@ so as to create up-to-date
translations of a whole bunch of modules, as quickly as possible.
CM also employs some minor helper functions, @finder@, @summarise@ and
@link@, to do its work.

Our intent is to allow CM to be used as the basis either of a
multi-module, batch mode compilation system, or of an
interactive environment similar to that of Hugs.
Only minor modifications to the behaviour of @compile@ and @link@
are needed to give these different behaviours.

CM and @compile@, and, for interactive use, an interpreter, are the
main code components.  The most important data structure is the global
symbol table; much design effort has been expended thereupon.


\subsection{How the global symbol table is implemented}

The top level symbol table is a @FiniteMap@ @ModuleName@
@ModuleDetails@.  @ModuleDetails@ contains essentially the environment
created by compiling a module.  CM manages this finite map, adding and
deleting module entries as required.

The @ModuleDetails@ for a module @M@ contains descriptions of all
tycons, classes, instances, values, unfoldings, etc.\ (henceforth
referred to as ``entities''), available from @M@.
These are just
trees in the GHCI heap.  References from other modules to these
entities are direct -- when you have a @TyCon@ in your hand, you really
have a pointer directly to the @TyCon@ structure in the defining module,
rather than some kind of index into a global symbol table.  So there
is a global symbol table, but it has a distributed (spaghetti-like?)
nature.

This gives fast and convenient access to tycon, class, instance,
etc, information.  But because there are no levels of indirection,
there's a problem when we replace @M@ with an updated version of @M@.
We then need to find all references to entities in the old @M@'s
spaghetti, and replace them with pointers to the new @M@'s spaghetti.
This problem motivates a large part of the design.



\subsection{Implementing incremental recompilation -- simple version}
Given the following module graph
\begin{verbatim}
      D
     / \
    /   \
   B     C
    \   /
     \ /
      A
\end{verbatim}
(@D@ imports @B@ and @C@, @B@ imports @A@, @C@ imports @A@), the aim is to do the
least possible amount of compilation to bring @D@ back up to date.  The
simplest scheme we can think of is:
\begin{itemize}
\item {\bf Downsweep}:
  starting with @D@, re-establish what the current module graph is
  (it might have changed since last time).  This means getting a
  @ModuleSummary@ of @D@.  The summary can be quickly generated,
  contains @D@'s import lists, and gives some way of knowing whether
  @D@'s source has changed since the last time it was summarised.

  Transitively follow summaries from @D@, thereby establishing the
  module graph.
\item
  Remove from the global symbol table (the @FiniteMap@ @ModuleName@
  @ModuleDetails@) the upwards closure of all modules in this package
  which are out-of-date with respect to their previous versions.  Also
  remove all modules no longer reachable from @D@.
\item {\bf Upsweep}:
  Starting at the lowest point in the still-in-date module graph,
  start compiling upwards, towards @D@.
At each module, call
  @compile@, passing it a @FiniteMap@ @ModuleName@ @ModuleDetails@,
  and getting a new @ModuleDetails@ for the module, which is added to
  the map.

  When compiling a module, the compiler must be able to know which
  entries in the map are for modules in its strict downwards closure,
  and which aren't, so that it can manufacture the instance
  environment correctly (as the union of the instances in its downwards
  closure).
\item
  Once @D@ has been compiled, invoke some kind of linking phase
  if this is a batch compilation.  For interactive use, linking can
  either be done all at the end, or as we go along.
\end{itemize}
In this simple world, recompilation visits the upwards closure of
all changed modules.  That means when a module @M@ is recompiled,
we can be sure no-one has any references to entities in the old @M@,
because modules importing @M@ will have already been removed from the
top-level finite map in the second step above.

The upshot is that we don't need to worry about updating links to @M@ in
the global symbol table -- there shouldn't be any to update.
\ToDo{What about mutually recursive modules?}

CM will happily chase through module interfaces in other packages in
the downsweep.  But it will only process modules in this package
during the upsweep.  So it assumes that modules in other packages
never become out of date.  This is a design decision -- we could have
decided otherwise.

In fact we go further, and require other packages to be compiled,
i.e.\ to consist of a collection of interface files, and one or more
object-code files.  CM will never apply @compile@ to a foreign package
module, so there's no way a package can be built on the fly from source.

We require @compile@ to cache foreign package interfaces it reads, so
that subsequent uses don't have to re-read them.  The cache never
becomes out of date, since we've assumed that the source of foreign
packages doesn't change during the course of a session (run of GHCI).
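
To make the simple scheme concrete, here is a minimal Haskell sketch of
the bookkeeping behind its second and third steps.  The assumptions are
ours, not the design's: the module graph produced by the downsweep is a
plain @Map@ from each module name to the names it imports, the graph is
acyclic (the mutual-recursion question above is ignored), and
@upwardClosure@, @topoOrder@, @pruneDetails@ and @upsweepPlan@ are
hypothetical helper names.  The upsweep compiles exactly the upward
closure of the changed modules, bottom-up.  With the diamond-shaped graph
above, a change to @A@ gives the plan @["A","B","C","D"]@, while a change
to @B@ alone gives @["B","D"]@, so @C@ keeps its @ModuleDetails@.

\begin{verbatim}
import qualified Data.Map as Map
import qualified Data.Set as Set

-- Hypothetical representation of the downsweep's result: each module
-- mapped to the modules it imports.
type ModName  = String
type ModGraph = Map.Map ModName [ModName]

-- The changed modules plus everything that transitively imports them.
upwardClosure :: ModGraph -> [ModName] -> Set.Set ModName
upwardClosure g = go Set.empty
  where
    importers m = [n | (n, imps) <- Map.toList g, m `elem` imps]
    go seen [] = seen
    go seen (m : ms)
      | m `Set.member` seen = go seen ms
      | otherwise           = go (Set.insert m seen) (importers m ++ ms)

-- Bottom-up order (depth-first post-order): every module comes after the
-- modules it imports.  Assumes the graph is acyclic.
topoOrder :: ModGraph -> [ModName]
topoOrder g = reverse (foldl visit [] (Map.keys g))
  where
    visit done m
      | m `elem` done = done
      | otherwise     = m : foldl visit done (Map.findWithDefault [] m g)

-- Step 2: drop the stale entries from the symbol table (any payload d).
pruneDetails :: Set.Set ModName -> Map.Map ModName d -> Map.Map ModName d
pruneDetails stale = Map.filterWithKey (\m _ -> not (m `Set.member` stale))

-- Step 3: the modules the upsweep must compile, in compilation order.
upsweepPlan :: ModGraph -> [ModName] -> [ModName]
upsweepPlan g changed = filter (`Set.member` stale) (topoOrder g)
  where stale = upwardClosure g changed
\end{verbatim}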

As well as caching interfaces, @compile@ must cache, in some sense,
the linkable code for modules.  In batch compilation this might simply
mean remembering the names of object files to link, whereas in
interactive mode @compile@ probably needs to load object code into
memory in preparation for in-memory linking.

Important signatures for this simple scheme are:
\begin{verbatim}
   finder :: ModuleName -> ModLocation

   summarise :: ModLocation -> IO ModSummary

   compile :: ModSummary
              -> FM ModName ModDetails
              -> IO CompileResult

   data CompileResult = CompOK  ModDetails
                      | CompErr [ErrMsg]

   link :: [ModLocation] -> [PackageLocation] -> IO Bool -- linked ok?
\end{verbatim}


\subsection{Implementing incremental recompilation -- clever version}

So far, our upsweep, which is the computationally expensive bit,
recompiles a module if either its source is out of date, or it
imports a module which has been recompiled.  Sometimes we know
we can do better than this:
\begin{verbatim}
   module B where                   module A where
   import A ( f )                   {-# NOINLINE f #-}
   ... f ...                        f x = x + 42
\end{verbatim}
If the definition of @f@ is changed to @f x = x + 43@, the simple
upsweep would recompile @B@ unnecessarily.  We would like to detect
this situation and avoid propagating recompilation all the way to the
top.  There are two parts to this: detecting when a module doesn't
need recompilation, and managing inter-module references in the
global symbol table.

\subsubsection*{Detecting when a module doesn't need recompilation}

To do this, we introduce a new concept: the @ModuleIFace@.  This is
effectively an in-memory interface file.  References to entities in
other modules are made via strings, rather than being pointers
directly to those entities.  Recall that, by comparison,
@ModuleDetails@ do contain pointers directly to the entities they
refer to.  So a @ModuleIFace@ is not part of the global symbol table.
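
As a rough illustration of what a string-referencing interface buys us,
the sketch below models just the recompilation decision.  Everything in
it is hypothetical: entities are named by @(module, name)@ string pairs,
versions are plain @Int@s, a single @Int@ source stamp stands in for
whatever the @ModuleSummary@ uses to detect source changes, and
@needsRecompile@ is our name, not part of the design.  A module must be
recompiled if there is no old interface, if its source changed, or if any
imported entity it used last time now has a different version (or has
disappeared).  In the @A@/@B@ example, @B@'s old interface would record
@("A","f")@ at some version; assuming that recompiling @A@ leaves that
version unchanged (the type of @f@ is the same and, thanks to
@NOINLINE@, no unfolding is exposed), @needsRecompile@ returns @False@
for @B@.

\begin{verbatim}
import qualified Data.Map as Map

-- Hypothetical simplified types.
type ModName   = String
type Version   = Int
type EntityRef = (ModName, String)   -- an entity named by strings, not a pointer

-- Toy in-memory interface: only what the recompilation check needs.
data ModIFace = ModIFace
  { ifaceSourceStamp :: Int                         -- stand-in for source identity
  , ifaceUsages      :: Map.Map EntityRef Version   -- imported entities used, with versions
  }

-- Current versions of the entities exported by already-compiled modules.
type VersionEnv = Map.Map EntityRef Version

-- True when the module must be recompiled from source.
needsRecompile :: Int -> VersionEnv -> Maybe ModIFace -> Bool
needsRecompile _ _ Nothing = True                   -- no old interface
needsRecompile stamp env (Just old) =
  ifaceSourceStamp old /= stamp                     -- source has changed
    || any changed (Map.toList (ifaceUsages old))   -- an import has changed
  where
    changed (ref, v) = Map.lookup ref env /= Just v
\end{verbatim}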

As before, compiling a module produces a @ModuleDetails@ (inside the
@CompileResult@), but it also produces a @ModuleIFace@.  The latter
records, amongst other things, the version numbers of all imported entities
needed for the compilation of that module.  @compile@ optionally also
takes the old @ModuleIFace@ as input during compilation:
\begin{verbatim}
   data CompileResult = CompOK  ModDetails ModIFace
                      | CompErr [ErrMsg]

   compile :: ModSummary
              -> FM ModName ModDetails
              -> Maybe ModuleIFace
              -> IO CompileResult
\end{verbatim}
Now, if the @ModuleSummary@ indicates this module's source hasn't
changed, we only need to recompile it if something it depends on has
changed.  @compile@ can detect this by inspecting the imported entity
version numbers in the module's old @ModuleIFace@, and comparing them
with the version numbers from the entities in the modules being
imported.  If they are all the same, nothing it depends on has
changed, so there's no point in recompiling.

\subsubsection*{Managing inter-module references in the global symbol table}

In the above example with @A@, @B@ and @f@, the specified change to @f@ would
require @A@ but not @B@ to be recompiled.  That generates a new
@ModuleDetails@ for @A@.  The problem is that if we leave @B@'s @ModuleDetails@
unchanged, they continue to refer (directly) to the @f@ in @A@'s old
@ModuleDetails@.  This is not good, especially if equality between
entities is implemented using pointer equality.

One solution is to throw away @B@'s @ModuleDetails@ and recompile @B@.
But this is precisely what we're trying to avoid, as it's expensive.
Instead, a cheaper mechanism achieves the same thing: recreate @B@'s
details directly from the old @ModuleIFace@.  The @ModuleIFace@ will
(textually) mention @f@; @compile@ can then find a pointer to the
up-to-date global symbol table entry for @f@, and place that pointer
in @B@'s @ModuleDetails@.
The @ModuleDetails@ are, therefore,
regenerated just by a quick lookup pass over the module's former
@ModuleIFace@.  All this applies, of course, only when @compile@ has
concluded it doesn't need to recompile @B@.

Now @compile@'s signature becomes a little clearer.  @compile@ has to
recompile the module, generating a fresh @ModuleDetails@ and
@ModuleIFace@, if any of the following hold:
\begin{itemize}
\item
  The old @ModuleIFace@ wasn't supplied, for some reason (perhaps
  we've never compiled this module before?)
\item
  The module's source has changed.
\item
  The module's source hasn't changed, but inspection of @ModuleIFaces@
  for this and its imports indicates that an imported entity has
  changed.
\end{itemize}
If none of those are true, we're in luck: quickly knock up a new
@ModuleDetails@ from the old @ModuleIFace@, and return them both.

As a result, the upsweep still visits all modules in the upwards
closure of those whose sources have changed.  However, at some point
we hopefully make a transition from generating new @ModuleDetails@ the
expensive way (recompilation) to a cheap way (recycling old
@ModuleIFaces@).  Either way, all modules still get new
@ModuleDetails@, so the global symbol table is correctly
reconstructed.


\subsection{How linking works, roughly}

When @compile@ translates a module, it produces a @ModuleDetails@,
a @ModuleIFace@ and a @Linkable@.  The @Linkable@ contains the
translated but un-linked code for the module.  And when @compile@
ventures into an interface in a package it hasn't seen so far, it
copies the package's object code into memory, producing one or more
@Linkable@s.  CM keeps track of these linkables.

Once all modules have been @compile@d, CM invokes @link@, supplying
all the @Linkable@s it knows about.  If @compile@ has also been
linking incrementally as it went along, @link@ doesn't have to do
anything.
On the other hand, @compile@ could choose not to be
incremental, and leave @link@ to do all the work.

@Linkable@s are opaque to CM.  For batch compilation, a @Linkable@
can record just the name of an object file, DLL, archive, or whatever,
in which case CM's call to @link@ supplies exactly the set of
file names to be linked.  @link@ can pass these verbatim to the
standard system linker.




%%-----------------------------------------------------------------%%
\section{Ancient stuff}
\subsubsection*{Should be selectively merged into ``Background ideas''}

\subsection{Overall}
Top level structure is:
\begin{itemize}
\item The Compilation Manager (CM) calculates and maintains module
      dependencies, and knows how to create up-to-date object or bytecode
      for a given module.  In doing so it may need to recompile
      arbitrary other modules, based on its knowledge of the module
      dependencies.
\item On top of the CM are the ``user-level'' services.  We envisage
      both a HEP-like interface, for interactive use, and an
      @hmake@ style batch compiler facility.
\item The CM only deals with inter-module issues.  It knows nothing
      about how to recompile an individual module, nor where the compiled
      result for a module lives, nor how to tell if
      a module is up to date, nor how to find the dependencies of a module.
      Instead, these services are supplied abstractly to CM via a
      @Compiler@ record.  To a first approximation, a @Compiler@
      contains
      the same functionality as @hsc@ has had until now -- the ability to
      translate a single Haskell module to C/assembly/object/bytecode.

      Different clients of CM (HEP vs @hmake@) may supply different
      @Compiler@s, since they need slightly different behaviours.
      Specifically, HEP needs a @Compiler@ which creates bytecode
      in memory, and knows how to link it, whereas @hmake@ wants
      the traditional behaviour of emitting assembly code to disk,
      and making no attempt at linkage.
\end{itemize}

\subsection{Open questions}
\begin{itemize}
\item
  Error reporting from @open@ and @compile@.
\item
  Instance environment management.
\item
  We probably need to make interface files say what
  packages they depend on (so that we can figure out
  which packages to load/link).
\item
  CM is parameterised both by the client that uses it and by the
  @Compiler@ supplied.  But it doesn't make sense to have a HEP-style
  client attached to an @hmake@-style @Compiler@.  So, really, the
  parameterising entity should contain both aspects, not just the
  current @Compiler@ contents.
\end{itemize}

\subsection{Assumptions}

\begin{itemize}
\item Packages other than the ``current'' one are assumed to be
      already compiled.
\item
      The ``current'' package is usually ``MAIN'',
      but we can set it with a command-line flag.
      One invocation of ghci has only one ``current'' package.
\item
      Packages are not mutually recursive.
\item
      All the object code for a package P is in @libP.a@ or @libP.dll@.
\end{itemize}

\subsection{Stuff we need to be able to do}
\begin{itemize}
\item Create the environment in which a module has been translated,
      so that interactive queries can be satisfied as if ``in'' that
      module.
\end{itemize}

%%-----------------------------------------------------------------%%
\section{The Compilation Manager}

CM (@compilationManager@) is a functor, thus:
\begin{verbatim}
compilationManager :: Compiler -> IO HEP   -- IO so that it can create
                                           -- global vars (IORefs)

data HEP = HEP {
              load          :: ModuleName -> IO (),
              compileString :: ModuleName -> String -> IO HValue,
              ....
           }

newCompiler :: IO Compiler   -- ??? this is a peer of compilationManager?

run :: HValue -> IO ()    -- Run an HValue of type IO ()
                          -- In HEP?
\end{verbatim}

@load@ is the central action of CM: its job is to bring a module and
all its descendants into an executable state, by doing the following:
\begin{enumerate}
\item
   Use @summarise@ to descend the module hierarchy, starting from the
   nominated root, creating @ModuleSummary@s, and
   building a map @ModuleName@ @->@ @ModuleSummary@.  @summarise@
   expects to be passed absolute paths to files.  Use @finder@ to
   convert module names to file paths.
\item
   Topologically sort the map,
   using dependency info in the @ModuleSummary@s.
\item
   Clean up the symbol table by deleting the upward closure of
   changed modules.
\item
   Working bottom to top, call @compile@ on the upward closure of
   all modules whose source has changed.  A module's source has
   changed when @sourceHasChanged@ indicates there is a difference
   between old and new summaries for the module.  Update the running
   @FiniteMap@ @ModuleName@ @ModuleDetails@ with the new details
   for this module.  Ditto for the running
   @FiniteMap@ @ModuleName@ @ModuleIFace@.
\item
   Call @compileDone@ to signify that we've reached the top, so
   that the batch system can now link.
\end{enumerate}


%%-----------------------------------------------------------------%%
\section{A compiler}

Most of the system's complexity is hidden inside the functions
supplied in the @Compiler@ record:
\begin{verbatim}
data Compiler = Compiler {

   finder :: PackageConf -> [Path] -> IO (ModuleName -> ModuleLocation)

   summarise :: ModuleLocation -> IO ModuleSummary

   compile :: ModuleSummary
              -> Maybe ModuleIFace
              -> FiniteMap ModuleName ModuleDetails
              -> IO CompileResult

   compileDone :: IO ()
   compileStarting :: IO ()  -- still needed?  I don't think so.
+ } + +type ModuleName = String (or some such) +type Path = String -- an absolute file name +\end{verbatim} + +\subsection{The module \mbox{\tt finder}} +The @finder@, given a package configuration file and a list of +directories to look in, will map module names to @ModuleLocation@s, +in which the @Path@s are filenames, probably with an absolute path +to them. +\begin{verbatim} +data ModuleLocation = SourceOnly Path -- .hs + | ObjectCode Path Path -- .o & .hi + | InPackage Path -- .hi +\end{verbatim} +@SourceOnly@ and @ObjectCode@ are unremarkable. For sanity, +we require that a module's object and interface be in the same +directory. @InPackage@ indicates that the module is in a +different package. + +@Module@ values -- perhaps all @Name@ish things -- contain the name of +their package. That's so that +\begin{itemize} +\item Correct code can be generated for in-DLL vs out-of-DLL refs. +\item We don't have version number dependencies for symbols + imported from different packages. +\end{itemize} + +Somehow or other, it will be possible to know all the packages +required, so that the for the linker can load them. +We could detect package dependencies by recording them in the +@compile@r's @ModuleIFace@ cache, and with that and the +package config info, figure out the complete set of packages +to link. Or look at the command line args on startup. + +\ToDo{Need some way to tell incremental linkers about packages, + since in general we'll need to load and link them before + linking any modules in the current package.} + + +\subsection{The module \mbox{\tt summarise}r} +Given a filename of a module (\ToDo{presumably source or iface}), +create a summary of it. A @ModuleSummary@ should contain only enough +information for CM to construct an up-to-date picture of the +dependency graph. Rather than expose CM to details of timestamps, +etc, @summarise@ merely provides an up-to-date summary of any module. 
CM can extract the list of dependencies from a @ModuleSummary@, but
other than that has no idea what's inside it.
\begin{verbatim}
data ModuleSummary = ... (abstract) ...

depsFromSummary :: ModuleSummary -> [ModuleName]   -- module names imported
sourceHasChanged :: ModuleSummary -> ModuleSummary -> Bool
\end{verbatim}
@summarise@ is intended to be fast -- a @stat@ of the source or
interface to see if it has changed, and, if so, a quick semi-parse to
determine the new imports.

\subsection{The module \mbox{\tt compile}r}
@compile@ traffics in @ModuleIFace@s and @ModuleDetails@.

A @ModuleIFace@ is an in-memory representation of the contents of an
interface file, including version numbers, unfoldings and pragmas, and
the linkable code for the module.  @ModuleIFace@s are un-renamed,
using @HsSym@/@RdrNames@ rather than (globally distinct) @Names@.

@ModuleDetails@, by contrast, is an in-memory representation of the
static environment created by compiling a module.  It is phrased in
terms of post-renaming @Names@, @TyCon@s, etc, so it's basically a
renamed-to-global-uniqueness rendition of a @ModuleIFace@.

In an interactive session, we'll want to be able to evaluate
expressions as if they had been compiled in the scope of some
specified module.  This means that the @ModuleDetails@ must contain
the type of everything defined in the module, rather than just the
types of exported stuff.  As a consequence, @ModuleIFace@ must also
contain the type of everything, because it should always be possible
to generate a module's @ModuleDetails@ from its @ModuleIFace@.

CM maintains two mappings, one from @ModuleName@s to @ModuleIFace@s,
the other from @ModuleName@s to @ModuleDetails@.  It passes the former
to each call of @compile@.  This is used to supply information about
modules compiled prior to this one (lower down in the graph).  The
The +returned @CompileResult@ supplies a new @ModuleDetails@ for the module +if compilation succeeded, and CM adds this to the mapping. The +@CompileResult@ also supplies a new @ModuleIFace@, which is either the +same as that supplied to @compile@, if @compile@ decided not to +retranslate the module, or is the result of a fresh translation (from +source). So these mappings are an explicitly-passed-around part of +the global system state. + +@compile@ may also {\em optionally} also accumulate @ModuleIFace@s for +modules in different packages -- that is, interfaces which we read, +but never attempt to recompile source for. Such interfaces, being +from foreign packages, never change, so @compile@ can accumulate them +in perpetuity in a private global variable. Indeed, a major motivator +of this design is to facilitate this caching of interface files, +reading of which is a serious bottleneck for the current compiler. + +When CM restarts compilation down at the bottom of the module graph, +it first needs to throw away all \ToDo{all?} @ModuleDetails@ in the +upward closure of the out-of-date modules. So @ModuleDetails@ don't +persist across recompilations. But @ModuleIFace@s do, since they +are conceptually equivalent to interface files. + + +\subsubsection*{What @compile@ returns} +@compile@ returns a @CompileResult@ to CM. +Note that the @compile@'s foreign-package interface cache can +become augmented even as a result of reading interfaces for a +compilation attempt which ultimately fails, although it will not be +augmented with a new @ModuleIFace@ for the failed module. 
\begin{verbatim}
-- CompileResult is not abstract to the Compilation Manager
data CompileResult
   = CompOK   ModuleIFace
              ModuleDetails    -- compiled ok, here are new details
                               -- and new iface

   | CompErr  [SDoc]           -- compilation gave errors

   | NoChange                  -- no change required, meaning:
                               -- exports, unfoldings, strictness, etc,
                               -- unchanged, and executable code unchanged
\end{verbatim}



\subsubsection*{Re-establishing local-to-global name mappings}
Consider
\begin{verbatim}
module Upper where                 module Lower ( f ) where
import Lower ( f )                 f = ...
g = ... f ...
\end{verbatim}
When @Lower@ is first compiled, @f@ is allocated a @Unique@
(presumably inside an @Id@ or @Name@?).  When @Upper@ is then
compiled, its reference to @f@ is attached directly to the
@Id@ created when compiling @Lower@.

If the definition of @f@ is now changed, but not the type,
unfolding, strictness, or any other thing which affects the way
it should be called, we will have to recompile @Lower@, but not
@Upper@.  This creates a problem -- @g@ will then refer to
the old @Id@ for @f@, not the new one.  This may or may not
matter, but it seems safer to ensure that all @Unique@-based
references into child modules are always up to date.

So @compile@ recreates the @ModuleDetails@ for @Upper@ from
the @ModuleIFace@ of @Upper@ and the @ModuleDetails@ of @Lower@.

The rule is: if a module is up to date with respect to its
source, but a child @C@ has changed, then either:
\begin{itemize}
\item On examination of the version numbers in @C@'s
      interface/@ModuleIFace@ that we used last time, we discover that
      an @Id@/@TyCon@/class/instance we depend on has changed.  So
      we need to retranslate the module from its source, generating
      a new @ModuleIFace@ and @ModuleDetails@.
\item Or: nothing in @C@'s interface that we depend on has changed.
      So we quickly recreate a new @ModuleDetails@ from the existing
      @ModuleIFace@, creating fresh links to the new @Unique@-world
      entities in @C@'s new @ModuleDetails@.
\end{itemize}

Upshot: we need to redo @compile@ on all modules all the way up,
rather than just the ones that need retranslation.  However, we hope
that most modules won't need retranslation -- just regeneration of the
@ModuleDetails@ from the @ModuleIFace@.  In effect, the @ModuleIFace@
is a quickly-compilable representation of the module's contents, just
enough to create the @ModuleDetails@.

\ToDo{Is there anything in @ModuleDetails@ which can't be
      recreated from @ModuleIFace@ ?}

So the @ModuleIFace@s persist across calls to @HEP.load@, whereas
@ModuleDetails@ are reconstructed on every compilation pass.  This
means that @ModuleIFace@s have the same lifetime as the byte/object
code, and so should somehow contain their code.

The behind-the-scenes @ModuleIFace@ cache has some kind of holding-pen
arrangement, to lazify the copying-out of stuff from it, and thus to
minimise redundant interface reading.  \ToDo{Burble burble.  More
details.}

When CM starts working back up the module graph with @compile@, it
needs to remove from the travelling @FiniteMap@ @ModuleName@
@ModuleDetails@ the details for all modules in the upward closure of
the compilation start points.  However, since we're going to visit
precisely those modules and no others on the way back up, we might as
well just zap the old @ModuleDetails@ incrementally.  This does
mean that the @FiniteMap@ @ModuleName@ @ModuleDetails@ will be
inconsistent until we reach the top.

In interactive mode, each @compile@ call on a module for which no
object code is available, or for which it is out of date wrt source,
emits bytecode into memory, updates the resulting @ModuleIFace@ with the
address of the bytecode image, and links the image.

In batch mode, emit assembly or object code onto disk.
Record +somewhere \ToDo{where?} that this object file needs to go into the +final link. + +When we reach the top, @compileDone@ is called, to signify that batch +linking can now proceed, if need be. + +Modules in other packages never get a @ModuleIFace@ or @ModuleDetails@ +entry in CM's maps -- those maps are only for modules in this package. +As previously mentioned, @compile@ may optionally cache @ModuleIFace@s +for foreign package modules. When reading such an interface, we don't +need to read the version info for individual symbols, since foreign +packages are assumed static. + +\subsubsection*{What's in a \mbox{\tt ModuleIFace}?} + +Current interface file contents? + + +\subsubsection*{What's in a \mbox{\tt ModuleDetails}?} + +There is no global symbol table @:: Name -> ???@. To look up a +@Name@, first extract the @ModuleName@ from it, look that up in +the passed-in @FiniteMap@ @ModuleName@ @ModuleDetails@, +and finally look in the relevant @Env@. + +\ToDo{Do we still have the @HoldingPen@, or is it now composed from +per-module bits too?} +\begin{verbatim} +data ModuleDetails = ModuleDetails { + + moduleExports :: what it exports (Names) + -- roughly a subset of the .hi file contents + + moduleEnv :: RdrName -> Name + -- maps top-level entities in this module to + -- globally distinct (Uniq-ified) Names + + moduleDefs :: Bag Name -- All the things in the global symbol table + -- defined by this module + + package :: Package -- what package am I in? + + lastCompile :: Date -- of last compilation + + instEnv :: InstEnv -- local inst env + typeEnv :: Name -> TyThing -- local tycon env? + } + +-- A (globally unique) symbol table entry. Note that Ids contain +-- unfoldings. +data TyThing = AClass Class + | ATyCon TyCon + | AnId Id +\end{verbatim} +What's the stuff in @ModuleDetails@ used for? +\begin{itemize} +\item @moduleExports@ so that the stuff which is visible from outside + the module can be calculated. 
\item @moduleEnv@: \ToDo{umm err}
\item @moduleDefs@: one reason we want this is so that we can nuke the
      global symbol table contribs from this module when it leaves the
      system.  \ToDo{except ... we don't have a global symbol table any
      more.}
\item @package@: we will need to chase arbitrarily deep into the
      interfaces of other packages.  Of course we don't want to
      recompile those, but as we've read their interfaces, we may
      as well cache that info.  So @package@ indicates whether this
      module is in the default package, or, if not, which it is in.

      Also, when we come to linking, we'll need to know which
      packages are demanded, so we know to load their objects.

\item @lastCompile@: When the module was last compiled.  If the
      source is older than that, then a recompilation can only be
      required if children have changed.
\item @typeEnv@: obvious??
\item @instEnv@: the instances contributed by this module only.  The
      Report allegedly says that when a module is translated, the
      available
      instance env is all the instances in the downward closure of
      itself in the module graph.

      We choose to use this simple representation -- each module
      holds just its own instances -- and do the naive thing when
      creating the instance env to compile a module with, namely
      taking the union of the @instEnv@s over its downward closure.
      If this turns out to be a performance problem we'll revisit
      the design.
\end{itemize}



%%-----------------------------------------------------------------%%
\section{Misc text looking for a home}

\subsection*{Linking}

\ToDo{All this linking stuff is now bogus.}

There's an abstract @LinkState@, which is threaded through the linkery
bits.  CM can call @addpkgs@ to notify the linker of packages
required, and it can call @addmods@ to announce modules which need to
be linked.  Finally, CM calls @endlink@, after which an executable
image should be ready.
The linker may link incrementally, during each
call of @addpkgs@ and @addmods@, or it can just store up names and do
all the linking when @endlink@ is called.

In order that incremental linking is possible, CM should specify
packages and module groups in dependency order, i.e., from the bottom up.

\subsection*{In-memory linking of bytecode}
When being HEP-like, @compile@ will translate sources to bytecodes
in memory, with all the bytecode for a module as a contiguous lump
outside the heap.  It needs to communicate the addresses of these
lumps to the linker.  The linker also needs to know whether a
given module is available as in-memory bytecode, or whether it
needs to load machine code from a file.

I guess @LinkState@ needs to map module names to base addresses
of their loaded images, plus the nature of the image, plus whether or not
the image has been linked.

\subsection*{On disk linking of object code, to give an executable}
The @LinkState@ in this case is just a list of module and package
names, which @addpkgs@ and @addmods@ add to.  The final @endlink@
call can invoke the system linker.

\subsection{Finding out about packages, dependencies, and auxiliary
            objects}

Ask the @packages.conf@ file that lives with the driver at the moment.

\ToDo{policy about upward closure?}



\ToDo{record story about how in memory linking is done.}

\ToDo{linker start/stop/initialisation/persistence.  Need to
      say more about @LinkState@.}


\end{document}