summaryrefslogtreecommitdiff
path: root/docs
diff options
context:
space:
mode:
authorEdward Z. Yang <ezyang@cs.stanford.edu>2014-10-07 23:20:33 -0600
committerEdward Z. Yang <ezyang@cs.stanford.edu>2014-10-07 23:20:36 -0600
commit21dff57244376131c902501f447e52cad1aaaf74 (patch)
tree321ea66e0d5fb56463a3745d5907efc8ac515fc9 /docs
parent6f2eca11b064c8e888badb8942a8fa4ba0fa7524 (diff)
downloadhaskell-21dff57244376131c902501f447e52cad1aaaf74.tar.gz
Initial commit of the Backpack manual [skip ci]
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
Diffstat (limited to 'docs')
-rw-r--r--docs/backpack/Makefile5
-rw-r--r--docs/backpack/backpack-manual.tex325
2 files changed, 330 insertions, 0 deletions
diff --git a/docs/backpack/Makefile b/docs/backpack/Makefile
index 0dd7a9dad5..641889bce0 100644
--- a/docs/backpack/Makefile
+++ b/docs/backpack/Makefile
@@ -1,2 +1,7 @@
+all: backpack-impl.pdf backpack-manual.pdf
+
backpack-impl.pdf: backpack-impl.tex
latexmk -pdf -latexoption=-halt-on-error -latexoption=-file-line-error -latexoption=-synctex=1 backpack-impl.tex && touch paper.dvi || ! rm -f $@
+
+backpack-manual.pdf: backpack-manual.tex
+ latexmk -pdf -latexoption=-halt-on-error -latexoption=-file-line-error -latexoption=-synctex=1 backpack-manual.tex && touch paper.dvi || ! rm -f $@
diff --git a/docs/backpack/backpack-manual.tex b/docs/backpack/backpack-manual.tex
new file mode 100644
index 0000000000..bcb33b4b9b
--- /dev/null
+++ b/docs/backpack/backpack-manual.tex
@@ -0,0 +1,325 @@
+\documentclass{article}
+
+\usepackage{color}
+\usepackage{fullpage}
+
+\newcommand{\Red}[1]{{\color{red} #1}}
+
+\title{The Backpack Manual}
+
+\begin{document}
+
+\maketitle
+
+\section{GHC}
+
+\subsection{Conventions}
+
+In this subsection, we describe some recurrent patterns you will see in the
+Backpack implementation.
+
+\paragraph{Package key} A package key is an opaque identifier
+identifying a package which serves as the basis for type identity and
+linker symbols. Informally, you might think of a package as a package
+name and its version, e.g., \texttt{containers-0.9}; however, sometimes,
+it is necessary to distinguish between installed instances of a package
+with the same name and version which were compiled with different
+dependencies. Thus, in practice, a package key looks something like
+\texttt{conta\_GtvvBIboSRuDmyUQfSZoAx}. In this document, we'll use
+\texttt{containers\_KEY} as a convenient shorthand to refer to some
+package key for the \texttt{containers} package. The package key is
+programatically generated by Cabal (consisting of a hash of the
+dependencies), but as far as GHC is concerned, the details of this
+generation are not important.
+
+The package key is different from an \emph{installed package ID}, which
+is a more fine-grained identifier for a package. Identical installed
+package IDs imply identical package keys, but not vice versa. However,
+in this subsection, we will treat them as equivalent, since within a single
+run of GHC we enforce that package keys and (non-shadowed) installed
+package IDs are in one-to-one correspondence.
+
+\paragraph{Syntax of original names} In a few command flags
+(\texttt{-sig-of} and \texttt{-shape-of}), we need to specify an
+original name and/or original module. These are qualified names which
+unambiguously specify some entity, and are the combination of a
+\emph{package key}, a \emph{module name} and, in the case of original
+names, the unqualified \emph{name} itself. We use the convention of
+representing such names as \texttt{package:The.Module} or
+\texttt{package:The.Module.name} in the case of original names.%
+\footnote{As a forward reference: Cabal adopts a different convention which
+utilizes installed package IDs rather than package keys (and a
+correspondingly different syntactic convention, since installed package
+IDs are permitted to contain colons). Why does GHC prefer package keys?
+It is the preferred internal format (since it is used for type
+identity), and converting from installed package IDs to package keys is
+not possible until the package database is loaded, which occurs after
+package flag parsing.}
+
+\subsection{Signatures}
+
+Backpack introduces a new type of file, \texttt{hsig} files, which
+represent signatures for Haskell modules. A signature can be used
+to type-check client code which uses a module (without the module
+implementation), or to verify that an implementation upholds some
+signature (without a client implementation.)
+
+The syntax of an \texttt{hsig} file is similar to an \texttt{hs-boot}
+file. Value definitions are not permitted, but you can write type
+signatures, data declarations, type classes and type class instances.
+At the moment, data families and type families are not supported. Here
+is an example of a module signature representing an abstract map type:
+
+\begin{verbatim}
+type role Map nominal representational
+data Map k v
+instance Functor (Map k)
+empty :: Map k a
+\end{verbatim}
+
+An \texttt{hsig} file can be processed by GHC with or without a backing
+implementation.
+
+\paragraph{Typechecking only} When no backing implementation is provided, we are
+only able to generate an interface file for the module, as in the
+following command:
+
+\begin{verbatim}
+$ ghc -c Map.hsig -fno-code -fwrite-interface
+\end{verbatim}
+
+By default, this generated interface file has a \emph{fresh} original
+names for everything in the signature. For example, in our
+previous example, the default original name of \texttt{Map} compiled as
+a signature is \texttt{main:Map.Map} (even if, say, eventually, you
+discover that in fact the original name is
+\texttt{containers\_KEY:Data.Map.Map}.) You can also explicitly specify what
+original name should be assigned to some data types, as follows:
+
+\begin{verbatim}
+$ ghc -c Map.hsig -shape-of "Map is containers_KEY:Data.Map.Map" \
+ -fno-code -fwrite-interface
+\end{verbatim}
+
+\texttt{-shape-of} is comma separated list of \texttt{name is origname}
+pairs, which overrides the original name specified in the new interface
+file. \Red{At the moment, \texttt{-shape-of} is not implemented, and the syntax
+is not finalized.}
+
+
+\paragraph{Compilation} A backing implementation can be specified
+using the \texttt{-sig-of} flag:
+
+\begin{verbatim}
+$ ghc -c Map.hsig -sig-of "containers_KEY:Data.Map"
+\end{verbatim}
+
+GHC does two things in this case. First, it does a consistency check
+between the signature and the original implementation and makes sure
+that the implementation accurately implements all of the types in the
+signature. This is a simple, width-subtyping check, nearly identical to
+the check used by \texttt{hs-boot}. Second, it generates an interface
+file which reexports the set of identifiers from the original
+implementation that were specified in the signature. Any files which
+import the signature (as opposed to the implementation) will use this
+interface file to find relevant implementations, in the same way that
+modules today can reexport identifiers from other modules.
+
+\paragraph{Imports} An import of a signature operates in much the same
+way as ordinary imports. When a source file imports two modules with
+conflicting identifiers, we report an error when a user attempts to use
+such an ambiguous identifier. However, if the identifiers have the same
+original name, the import is unambiguous. Similarly for signatures, if
+a user imports two signature files exporting the same name, we will only
+report an error if the identifiers have conflicting original names.\footnote{This implements ``signature merging'' as described by the original Backpack paper; notably, it was not necessary to synthesize a new interface file for the combined signature.}
+
+\begin{verbatim}
+$ cat > AImpl.hsig
+module AImpl where
+ foo = 2
+$ cat > ASig.hsig
+module ASig where
+ foo :: Int
+$ cat > BSig.hsig
+module BSig where
+ foo :: Int
+$ cat > Main.hs
+module Main where
+ import ASig
+ import BSig
+ main = print foo
+$ ghc -c AImpl.hs
+$ ghc -c ASig.hsig -sig-of "main:AImpl"
+$ ghc -c BSig.hsig -sig-of "main:AImpl"
+$ ghc -c Main.hs
+SUCCEEDS
+
+$ cat > BImpl.hs
+module BImpl where
+ foo = 4
+$ ghc -c BImpl.hs
+$ ghc -c BSig.hsig -sig-of "main:BImpl" # recompile
+$ ghc -c Main.hs
+FAILS due to ambiguous original name
+\end{verbatim}
+
+There is a twist, however: when we are only type-checking, GHC
+performs an \emph{additional} check and ensure that identifiers with the
+same original names originating from different signature files have
+compatible types: e.g. \texttt{foo} from \texttt{ASig} and \texttt{foo}
+from \texttt{BSig} have the same type. (This also applies to type
+definitions, etc.) This is because the \texttt{-shape-of} flag could be
+used to cause two incompatible definitions to be given the same original
+name. This check is not necessary during compilation, because
+\texttt{-sig-of} will ensure that the signatures are compatible with a
+common, unique backing implementation. \Red{The additional check is not implemented yet.}
+
+\begin{verbatim}
+$ ghc -c ASig.hsig -fno-code -fwrite-interface
+$ ghc -c BSig.hsig -fno-code -fwrite-interface
+$ ghc -c Main.hs -fno-code -fwrite-interface
+FAILS due to ambiguous original name (default used)
+
+$ ghc -c ASig.hsig -shape-of "foo is main:AImpl" -fno-code -fwrite-interface
+$ ghc -c BSig.hsig -shape-of "foo is main:AImpl" -fno-code -fwrite-interface
+$ ghc -c Main.hs -fno-code -fwrite-interface
+SUCCEEDS
+
+$ ghc -c ASig.hsig -shape-of "foo is main:AImpl" -fno-code -fwrite-interface
+$ ghc -c BSig.hsig -shape-of "foo is main:BImpl" -fno-code -fwrite-interface
+$ ghc -c Main.hs -fno-code -fwrite-interface
+FAILS due to ambiguous original name
+
+$ cat > BSig.hsig
+module BSig where
+ foo :: Bool
+
+$ ghc -c ASig.hsig -shape-of "foo is main:AImpl" -fno-code -fwrite-interface
+$ ghc -c BSig.hsig -shape-of "foo is main:AImpl" -fno-code -fwrite-interface
+$ ghc -c Main.hs -fno-code -fwrite-interface
+FAILS due to mismatchng types
+\end{verbatim}
+
+\subsection{Compiled external packages}
+
+This subsection concerns external packages which have been fully compiled.
+They may have had some abstract signatures, but those signatures were
+filled with implementations before installation to the package database.
+
+\paragraph{Installed package database}
+The format of packages in the installed package database has been
+generalized to support exposed signatures and module/signature
+reexports. Previously, installed package info recorded
+\texttt{exposed-modules} and \texttt{hidden-modules}: exposed modules
+provided a list of modules that should be added to the module import map
+for processing \texttt{import} statements, whereas hidden modules was a
+purely documentary mechanism to let users know if they attempted to
+import a hidden module (rather than just saying the module didn't exist
+at all.)
+
+With Backpack, \texttt{exposed-modules} has been generalized to record
+two extra, optional pieces of information about the module in question:
+what the \emph{original module/signature} is (if it is a reexport), and
+what the \emph{backing implementation} is (if it is a signature).
+
+\begin{verbatim}
+exposed-modules:
+ A, # original module
+ B from ipid:B, # reexported module
+ C is ipid:CImpl, # exposed signature
+ D from ipid:D is ipid:DImpl, # reexported signature
+ D from ipid:D2 is ipid:DImpl # duplicates can be OK
+\end{verbatim}
+
+If no reexports or signatures are used, the commas can be omitted
+(making this syntax backwards compatible with the original syntax.)
+
+\texttt{ghc-pkg}, when processing this field, does a simple validation
+that, for any duplicate exposed modules, their backing implementation
+or original module if there is no backing implementation must be equal.
+
+\Red{All of this is not implemented. What is currently implemented is
+that \texttt{reexported-modules} has a seperate field, where the
+original module is always set and backing implementation is always empty.
+I came to this generalization when I found I needed to add support for
+signatures and reexported signatures. An alternate design is just to
+have a separate list for every parameter: however, we end up with a lot
+of duplication in the validation and handling code GHC side. I do like
+the parametric approach better, but since the original
+\texttt{exposed-modules} was space separated, there's not an easy way to
+extend the syntax in a backwards-compatible way. The current proposal
+suggests we add the comma variant because it is unambiguous with the old
+syntax.}
+
+\Red{Breadcrumb for abandoned design path: at one point, simonpj proposed
+that we not only allow duplicate signatures, but also duplicate implementations.
+We decided later that there was no compelling use-case for this functionality
+at the moment.}
+
+\paragraph{External imports}
+In order to support imports of modules which are not in the home package
+(modules that are currently being compiled), GHC builds a mapping from module
+names to implementations based on the exposed packages. Previously, if
+two exposed packages attempted to create a mapping for the same name,
+this resulted in ambiguity.
+
+With signatures and reexports, some collisions are now no longer
+considered ambiguous and trigger different behavior when their name is
+imported. Specifically, define the \emph{true module} of a package export
+to be the backing implementation, if the module is a signature, or the
+original module otherwise. For any module name, we consider the set of
+mappings for it to be \emph{unambiguous} if all of their true modules
+are equal. Here is an example of an unambiguous set of mappings:
+
+\begin{verbatim}
+exposed-modules:
+ A from pkg:AImpl,
+ A is pkg:AImpl,
+ A from other-pkg:Sig is pkg:AImpl
+\end{verbatim}
+
+This mapping says that this package reexports \texttt{pkg:AImpl} as
+\texttt{A}, has an \texttt{A.hsig} which was compiled against
+\texttt{pkg:AImpl}, and reexports a signature from \texttt{other-pkg}
+which itself was compiled against \texttt{pkg:AImpl}.
+
+When there are multiple, unambiguous mappings for a module name, which
+interface do we load when we actually \texttt{import} it? If any
+mapping in the set is an implementation (\texttt{hs} file), we just use
+the interface of the true module (it doesn't matter which mapping, since
+they are all equivalent). Otherwise, we load the interface files of
+\emph{all} the signatures, as if there were multiple import statements.
+For example, if package \texttt{a} and package \texttt{b} both offer
+exposed signatures for \texttt{ASig}, then if I \texttt{import ASig}
+with both packages exposed, it is equivalent to having said:
+
+\begin{verbatim}
+import "a" ASig
+import "b" ASig
+\end{verbatim}
+
+\Red{Behavior for signatures is not implemented yet.}
+
+\subsection{Indefinite external packages}
+
+\Red{This is not implemented yet.}
+
+\section{Cabal}
+
+\subsection{Cabal file syntax}
+
+\subsection{Setup new flags}
+
+\subsection{Package key generation}
+
+\subsection{Metadata in the installed package database}
+
+Cabal records
+
+\texttt{instantiated-with}
+
+\section{cabal-install}
+
+\subsection{Indefinite package instantiation}
+
+\end{document}