diff options
author | Edward Z. Yang <ezyang@cs.stanford.edu> | 2014-10-07 23:20:33 -0600 |
---|---|---|
committer | Edward Z. Yang <ezyang@cs.stanford.edu> | 2014-10-07 23:20:36 -0600 |
commit | 21dff57244376131c902501f447e52cad1aaaf74 (patch) | |
tree | 321ea66e0d5fb56463a3745d5907efc8ac515fc9 /docs | |
parent | 6f2eca11b064c8e888badb8942a8fa4ba0fa7524 (diff) | |
download | haskell-21dff57244376131c902501f447e52cad1aaaf74.tar.gz |
Initial commit of the Backpack manual [skip ci]
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
Diffstat (limited to 'docs')
-rw-r--r-- | docs/backpack/Makefile | 5 | ||||
-rw-r--r-- | docs/backpack/backpack-manual.tex | 325 |
2 files changed, 330 insertions, 0 deletions
diff --git a/docs/backpack/Makefile b/docs/backpack/Makefile index 0dd7a9dad5..641889bce0 100644 --- a/docs/backpack/Makefile +++ b/docs/backpack/Makefile @@ -1,2 +1,7 @@ +all: backpack-impl.pdf backpack-manual.pdf + backpack-impl.pdf: backpack-impl.tex latexmk -pdf -latexoption=-halt-on-error -latexoption=-file-line-error -latexoption=-synctex=1 backpack-impl.tex && touch paper.dvi || ! rm -f $@ + +backpack-manual.pdf: backpack-manual.tex + latexmk -pdf -latexoption=-halt-on-error -latexoption=-file-line-error -latexoption=-synctex=1 backpack-manual.tex && touch paper.dvi || ! rm -f $@ diff --git a/docs/backpack/backpack-manual.tex b/docs/backpack/backpack-manual.tex new file mode 100644 index 0000000000..bcb33b4b9b --- /dev/null +++ b/docs/backpack/backpack-manual.tex @@ -0,0 +1,325 @@ +\documentclass{article} + +\usepackage{color} +\usepackage{fullpage} + +\newcommand{\Red}[1]{{\color{red} #1}} + +\title{The Backpack Manual} + +\begin{document} + +\maketitle + +\section{GHC} + +\subsection{Conventions} + +In this subsection, we describe some recurrent patterns you will see in the +Backpack implementation. + +\paragraph{Package key} A package key is an opaque identifier +identifying a package which serves as the basis for type identity and +linker symbols. Informally, you might think of a package as a package +name and its version, e.g., \texttt{containers-0.9}; however, sometimes, +it is necessary to distinguish between installed instances of a package +with the same name and version which were compiled with different +dependencies. Thus, in practice, a package key looks something like +\texttt{conta\_GtvvBIboSRuDmyUQfSZoAx}. In this document, we'll use +\texttt{containers\_KEY} as a convenient shorthand to refer to some +package key for the \texttt{containers} package. The package key is +programatically generated by Cabal (consisting of a hash of the +dependencies), but as far as GHC is concerned, the details of this +generation are not important. + +The package key is different from an \emph{installed package ID}, which +is a more fine-grained identifier for a package. Identical installed +package IDs imply identical package keys, but not vice versa. However, +in this subsection, we will treat them as equivalent, since within a single +run of GHC we enforce that package keys and (non-shadowed) installed +package IDs are in one-to-one correspondence. + +\paragraph{Syntax of original names} In a few command flags +(\texttt{-sig-of} and \texttt{-shape-of}), we need to specify an +original name and/or original module. These are qualified names which +unambiguously specify some entity, and are the combination of a +\emph{package key}, a \emph{module name} and, in the case of original +names, the unqualified \emph{name} itself. We use the convention of +representing such names as \texttt{package:The.Module} or +\texttt{package:The.Module.name} in the case of original names.% +\footnote{As a forward reference: Cabal adopts a different convention which +utilizes installed package IDs rather than package keys (and a +correspondingly different syntactic convention, since installed package +IDs are permitted to contain colons). Why does GHC prefer package keys? +It is the preferred internal format (since it is used for type +identity), and converting from installed package IDs to package keys is +not possible until the package database is loaded, which occurs after +package flag parsing.} + +\subsection{Signatures} + +Backpack introduces a new type of file, \texttt{hsig} files, which +represent signatures for Haskell modules. A signature can be used +to type-check client code which uses a module (without the module +implementation), or to verify that an implementation upholds some +signature (without a client implementation.) + +The syntax of an \texttt{hsig} file is similar to an \texttt{hs-boot} +file. Value definitions are not permitted, but you can write type +signatures, data declarations, type classes and type class instances. +At the moment, data families and type families are not supported. Here +is an example of a module signature representing an abstract map type: + +\begin{verbatim} +type role Map nominal representational +data Map k v +instance Functor (Map k) +empty :: Map k a +\end{verbatim} + +An \texttt{hsig} file can be processed by GHC with or without a backing +implementation. + +\paragraph{Typechecking only} When no backing implementation is provided, we are +only able to generate an interface file for the module, as in the +following command: + +\begin{verbatim} +$ ghc -c Map.hsig -fno-code -fwrite-interface +\end{verbatim} + +By default, this generated interface file has a \emph{fresh} original +names for everything in the signature. For example, in our +previous example, the default original name of \texttt{Map} compiled as +a signature is \texttt{main:Map.Map} (even if, say, eventually, you +discover that in fact the original name is +\texttt{containers\_KEY:Data.Map.Map}.) You can also explicitly specify what +original name should be assigned to some data types, as follows: + +\begin{verbatim} +$ ghc -c Map.hsig -shape-of "Map is containers_KEY:Data.Map.Map" \ + -fno-code -fwrite-interface +\end{verbatim} + +\texttt{-shape-of} is comma separated list of \texttt{name is origname} +pairs, which overrides the original name specified in the new interface +file. \Red{At the moment, \texttt{-shape-of} is not implemented, and the syntax +is not finalized.} + + +\paragraph{Compilation} A backing implementation can be specified +using the \texttt{-sig-of} flag: + +\begin{verbatim} +$ ghc -c Map.hsig -sig-of "containers_KEY:Data.Map" +\end{verbatim} + +GHC does two things in this case. First, it does a consistency check +between the signature and the original implementation and makes sure +that the implementation accurately implements all of the types in the +signature. This is a simple, width-subtyping check, nearly identical to +the check used by \texttt{hs-boot}. Second, it generates an interface +file which reexports the set of identifiers from the original +implementation that were specified in the signature. Any files which +import the signature (as opposed to the implementation) will use this +interface file to find relevant implementations, in the same way that +modules today can reexport identifiers from other modules. + +\paragraph{Imports} An import of a signature operates in much the same +way as ordinary imports. When a source file imports two modules with +conflicting identifiers, we report an error when a user attempts to use +such an ambiguous identifier. However, if the identifiers have the same +original name, the import is unambiguous. Similarly for signatures, if +a user imports two signature files exporting the same name, we will only +report an error if the identifiers have conflicting original names.\footnote{This implements ``signature merging'' as described by the original Backpack paper; notably, it was not necessary to synthesize a new interface file for the combined signature.} + +\begin{verbatim} +$ cat > AImpl.hsig +module AImpl where + foo = 2 +$ cat > ASig.hsig +module ASig where + foo :: Int +$ cat > BSig.hsig +module BSig where + foo :: Int +$ cat > Main.hs +module Main where + import ASig + import BSig + main = print foo +$ ghc -c AImpl.hs +$ ghc -c ASig.hsig -sig-of "main:AImpl" +$ ghc -c BSig.hsig -sig-of "main:AImpl" +$ ghc -c Main.hs +SUCCEEDS + +$ cat > BImpl.hs +module BImpl where + foo = 4 +$ ghc -c BImpl.hs +$ ghc -c BSig.hsig -sig-of "main:BImpl" # recompile +$ ghc -c Main.hs +FAILS due to ambiguous original name +\end{verbatim} + +There is a twist, however: when we are only type-checking, GHC +performs an \emph{additional} check and ensure that identifiers with the +same original names originating from different signature files have +compatible types: e.g. \texttt{foo} from \texttt{ASig} and \texttt{foo} +from \texttt{BSig} have the same type. (This also applies to type +definitions, etc.) This is because the \texttt{-shape-of} flag could be +used to cause two incompatible definitions to be given the same original +name. This check is not necessary during compilation, because +\texttt{-sig-of} will ensure that the signatures are compatible with a +common, unique backing implementation. \Red{The additional check is not implemented yet.} + +\begin{verbatim} +$ ghc -c ASig.hsig -fno-code -fwrite-interface +$ ghc -c BSig.hsig -fno-code -fwrite-interface +$ ghc -c Main.hs -fno-code -fwrite-interface +FAILS due to ambiguous original name (default used) + +$ ghc -c ASig.hsig -shape-of "foo is main:AImpl" -fno-code -fwrite-interface +$ ghc -c BSig.hsig -shape-of "foo is main:AImpl" -fno-code -fwrite-interface +$ ghc -c Main.hs -fno-code -fwrite-interface +SUCCEEDS + +$ ghc -c ASig.hsig -shape-of "foo is main:AImpl" -fno-code -fwrite-interface +$ ghc -c BSig.hsig -shape-of "foo is main:BImpl" -fno-code -fwrite-interface +$ ghc -c Main.hs -fno-code -fwrite-interface +FAILS due to ambiguous original name + +$ cat > BSig.hsig +module BSig where + foo :: Bool + +$ ghc -c ASig.hsig -shape-of "foo is main:AImpl" -fno-code -fwrite-interface +$ ghc -c BSig.hsig -shape-of "foo is main:AImpl" -fno-code -fwrite-interface +$ ghc -c Main.hs -fno-code -fwrite-interface +FAILS due to mismatchng types +\end{verbatim} + +\subsection{Compiled external packages} + +This subsection concerns external packages which have been fully compiled. +They may have had some abstract signatures, but those signatures were +filled with implementations before installation to the package database. + +\paragraph{Installed package database} +The format of packages in the installed package database has been +generalized to support exposed signatures and module/signature +reexports. Previously, installed package info recorded +\texttt{exposed-modules} and \texttt{hidden-modules}: exposed modules +provided a list of modules that should be added to the module import map +for processing \texttt{import} statements, whereas hidden modules was a +purely documentary mechanism to let users know if they attempted to +import a hidden module (rather than just saying the module didn't exist +at all.) + +With Backpack, \texttt{exposed-modules} has been generalized to record +two extra, optional pieces of information about the module in question: +what the \emph{original module/signature} is (if it is a reexport), and +what the \emph{backing implementation} is (if it is a signature). + +\begin{verbatim} +exposed-modules: + A, # original module + B from ipid:B, # reexported module + C is ipid:CImpl, # exposed signature + D from ipid:D is ipid:DImpl, # reexported signature + D from ipid:D2 is ipid:DImpl # duplicates can be OK +\end{verbatim} + +If no reexports or signatures are used, the commas can be omitted +(making this syntax backwards compatible with the original syntax.) + +\texttt{ghc-pkg}, when processing this field, does a simple validation +that, for any duplicate exposed modules, their backing implementation +or original module if there is no backing implementation must be equal. + +\Red{All of this is not implemented. What is currently implemented is +that \texttt{reexported-modules} has a seperate field, where the +original module is always set and backing implementation is always empty. +I came to this generalization when I found I needed to add support for +signatures and reexported signatures. An alternate design is just to +have a separate list for every parameter: however, we end up with a lot +of duplication in the validation and handling code GHC side. I do like +the parametric approach better, but since the original +\texttt{exposed-modules} was space separated, there's not an easy way to +extend the syntax in a backwards-compatible way. The current proposal +suggests we add the comma variant because it is unambiguous with the old +syntax.} + +\Red{Breadcrumb for abandoned design path: at one point, simonpj proposed +that we not only allow duplicate signatures, but also duplicate implementations. +We decided later that there was no compelling use-case for this functionality +at the moment.} + +\paragraph{External imports} +In order to support imports of modules which are not in the home package +(modules that are currently being compiled), GHC builds a mapping from module +names to implementations based on the exposed packages. Previously, if +two exposed packages attempted to create a mapping for the same name, +this resulted in ambiguity. + +With signatures and reexports, some collisions are now no longer +considered ambiguous and trigger different behavior when their name is +imported. Specifically, define the \emph{true module} of a package export +to be the backing implementation, if the module is a signature, or the +original module otherwise. For any module name, we consider the set of +mappings for it to be \emph{unambiguous} if all of their true modules +are equal. Here is an example of an unambiguous set of mappings: + +\begin{verbatim} +exposed-modules: + A from pkg:AImpl, + A is pkg:AImpl, + A from other-pkg:Sig is pkg:AImpl +\end{verbatim} + +This mapping says that this package reexports \texttt{pkg:AImpl} as +\texttt{A}, has an \texttt{A.hsig} which was compiled against +\texttt{pkg:AImpl}, and reexports a signature from \texttt{other-pkg} +which itself was compiled against \texttt{pkg:AImpl}. + +When there are multiple, unambiguous mappings for a module name, which +interface do we load when we actually \texttt{import} it? If any +mapping in the set is an implementation (\texttt{hs} file), we just use +the interface of the true module (it doesn't matter which mapping, since +they are all equivalent). Otherwise, we load the interface files of +\emph{all} the signatures, as if there were multiple import statements. +For example, if package \texttt{a} and package \texttt{b} both offer +exposed signatures for \texttt{ASig}, then if I \texttt{import ASig} +with both packages exposed, it is equivalent to having said: + +\begin{verbatim} +import "a" ASig +import "b" ASig +\end{verbatim} + +\Red{Behavior for signatures is not implemented yet.} + +\subsection{Indefinite external packages} + +\Red{This is not implemented yet.} + +\section{Cabal} + +\subsection{Cabal file syntax} + +\subsection{Setup new flags} + +\subsection{Package key generation} + +\subsection{Metadata in the installed package database} + +Cabal records + +\texttt{instantiated-with} + +\section{cabal-install} + +\subsection{Indefinite package instantiation} + +\end{document} |