diff options
author | Edward Z. Yang <ezyang@cs.stanford.edu> | 2015-08-06 22:31:17 -0700 |
---|---|---|
committer | Edward Z. Yang <ezyang@cs.stanford.edu> | 2015-08-06 22:31:24 -0700 |
commit | 6cab3afe90264b00e2a92f1aef09d6ca4e7a1680 (patch) | |
tree | 1e83aacbd4d1740433b0c79cd945b88d1edd1072 /docs | |
parent | a1c934c1b97a09db841d20da4811e0e1310f7511 (diff) | |
download | haskell-6cab3afe90264b00e2a92f1aef09d6ca4e7a1680.tar.gz |
Big batch of Backpack documentation edits.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
Diffstat (limited to 'docs')
-rw-r--r-- | docs/backpack/algorithm.tex | 1395 | ||||
-rw-r--r-- | docs/backpack/backpack-manual.tex | 265 |
2 files changed, 898 insertions, 762 deletions
diff --git a/docs/backpack/algorithm.tex b/docs/backpack/algorithm.tex index 588f10e8cc..6cb316952f 100644 --- a/docs/backpack/algorithm.tex +++ b/docs/backpack/algorithm.tex @@ -15,6 +15,7 @@ \usepackage{algpseudocode} \usepackage{bigfoot} \usepackage{amssymb} +\usepackage{amsmath} \usepackage{framed} % Alter some LaTeX defaults for better treatment of figures: @@ -67,9 +68,8 @@ Here are the steps a unit goes through: site (unit declaration), or an external unit declaration from the indefinite unit database. - \item The \textbf{dependency solver} takes a \I{unit} - and converts it into a - directed acyclic graph representing the + \item The \textbf{dependency solver} takes a unit + and converts it into a directed acyclic graph representing the dependency structure of the declarations in a unit. It also computes the \I{Module} of each module/signature declaration, the \I{UnitKey} of each include, and the overall @@ -79,16 +79,16 @@ Here are the steps a unit goes through: and computes its \I{Shape}. A \I{Shape} describes precisely what a unit provides and requires at the Haskell declaration level (\I{AvailInfo}). - \item The \textbf{indefinite pipeline} takes the DAG and the shape + \item The \textbf{indefinite type checker} takes the DAG and the shape and typechecks each module and signature against the indefinite unit database. The type checking results are saved in the indefinite - unit database under the \I{IndefiniteUnitId}. + unit database under the \I{IndefUnitId}. - \item The \textbf{definite pipeline} takes the DAG as well as a + \item The \textbf{definite type checker and compiler} takes the DAG as well as a \emph{hole mapping} specifying how each requirement in the unit - is to be filled, and type-checks and compiles the unit against the - installed unit database. The type checking results and object files - are saved to the installed unit database under the \I{InstalledUnitId}. + is filled, and type-checks and compiles the unit against the + installed unit database. The results are saved to the installed unit database + under the \I{InstalledUnitId}. \end{enumerate} \section{Front-end syntax} @@ -120,61 +120,112 @@ $$ A (slightly simplified) syntax of Backpack is given in Figure~\ref{fig:syntax}. \newpage -\section{Unit renamer} +\section{Unit databases and the unit renamer} \begin{figure}[htpb] $$ \begin{array}{rcll} - \I{InstalledPackageId} & & \mbox{Installed package IDs} \\ - \I{IndefiniteUnitId} & ::= & \I{InstalledPackageId}\, \verb|-|\, p \\ - \I{InstalledUnitId} & ::= & \I{IndefiniteUnitId} \verb|(| \, m \; \verb|->| \; \I{Module} \verb|,|\, \ldots\, \verb|)| & \mbox{Also known as \I{UnitKey}} \\ - \I{Module} & ::= & \I{InstalledUnitId} \verb|:| m \\ +\multicolumn{3}{l}{\mbox{\bf Identifiers}} \\ + \I{InstalledPackageId} & & \mbox{Opaque unique identifier} \\ + \I{IndefUnitId} & ::= & \I{InstalledPackageId}\, \verb|-|\, p \\ + \I{UnitKey} & ::= & \I{IndefUnitId} \verb|(|\, \I{HoleMap}\, \verb|)| \\ + & | & \verb|HOLE| \\ + \I{HoleMap} & ::= & \I{ModuleName}\; \verb|->|\; \I{Module}\; \verb|,|\, \ldots \\ + \I{Module} & ::= & \I{UnitKey} \verb|:| \I{ModuleName} \\ + \I{InstalledUnitId} & ::= & \I{UnitKey} \quad \mbox{(with no occurrence of \texttt{HOLE})} \\[1em] +\multicolumn{3}{l}{\mbox{\bf Unit databases}} \\ + \I{IndefUnitDb} & ::= & \I{IndefUnitId} \; \verb|->| \; \I{IndefUnitRecord} \verb|,|\, \ldots \\ + \I{InstalledUnitDb} & ::= & \I{InstalledUnitId} \; \verb|->| \; \I{InstalledUnitRecord} \verb|,|\, \ldots \\ \end{array} $$ \caption{Unit identification} \label{fig:ids} \end{figure} -The unit renamer is responsible for transforming unit names $p$ into \I{IndefiniteUnitId}s, given some current \I{InstalledPackageId} (\I{ThisInstalledPackageId}) and a mapping from $p$ to -\I{IndefiniteUnitId} (\I{UnitNameMap}). Its operation on a Backpack file (collection of units) is very simple: +\subsection{The unit databases} -\begin{itemize} - \item Every unit declaration $\verb|unit|\; p$ is renamed to $\I{ThisInstalledPackageId}\, \verb|-|\, p$ - \item Every unit include $\verb|include|\; p$ is renamed to $\I{ThisInstalledPackageId}\, \verb|-|\, p$ if $p$ was declared in the Backpack file; otherwise it is renamed according to \I{UnitNameMap}. -\end{itemize} +To install a package so that it is available when other packages are compiled, +we must record it in some sort of database (which the compiler will query later). +The obvious design for such a database is for it to record \emph{installed packages}, +each of which has a collection of object files and interfaces from a compilation. +However, with Backpack, we have to refine this picture in two ways: + +\begin{enumerate} + \item A package is always a unit of distribution: something that has + single authorship and is uploaded to Hackage. It would seriously hamper + modular programming in the small, however, if you always had to create + a new package to abstract over some requirements. Thus, while packages + continue to be a unit of distribution, the unit of modularity that the + compiler deals with is a unit; a package can contain multiple units. + Consequently, the database is a database of \emph{units}, not packages. + \item In Backpack, it may not be possible to \emph{compile} a unit + at install time: it may depend on some (as yet) unspecified holes. + Thus, we must maintain two databases: the \emph{indefinite unit + database}, recording type-checked but uncompiled units, and the + \emph{installed unit database}, recording compiled units. unit + database. +\end{enumerate} % -The purpose of an \emph{IndefiniteUnitId} is to uniquely identify the results of typechecking -an indefinite unit; similarly, an \emph{InstalledUnitId} uniquely identifies -the results of compiling a unit with all its holes -filled. It thus records a \emph{hole mapping} which specifies how each -hole was filled. - -An \emph{installed package ID} (IPID) is an opaque string provided by -Cabal which uniquely identifies an installed package. A recipe for -computing an IPID would incorporate both the source info, such as the hash of the -source code distribution tarball, as well as build info, such -as the selected Cabal and GHC flags, what the provided mapping from -$p$ to $IndefiniteUnitId$ was, etc. - -\paragraph{The difference between units and packages} -Cabal packages are: +Thus, there are four closely related types of identifier to be aware of: + +\begin{description} \item[Installed package IDs] Although the unit +databases record compile products per unit, it is still useful to be +able to assign an identifier to an installed package as a whole. Thus, +the \I{InstalledPackageId} is a hash of the source code, as well as +build info for the package, such as the Cabal and GHC flags, the +resolution of textual dependencies, etc. The IPID is opaque to GHC +and selected by Cabal, which operates at the level of packages (rather +than units). + +\item[Indefinite unit IDs] A package contains multiple units; so if an +installed package ID represents a package with all of its dependencies +resolved, an indefinite unit ID simply singles out a specific unit in +this installed package. Indefinite unit IDs identify entries in the +\textbf{indefinite unit database}, which contain the results of +type-checking a unit, but no actual object code. Instead, it has enough +information so that the indefinite unit can be built into actual code +when a requirement is filled (which means it includes source code and +build flags as well.) + +\item[Unit keys] A unit key is an indefinite unit ID, but augmented +with a \I{HoleMap}, which says how the requirements of the unit are +instantiated. Unit keys are assigned when typecheckign both indefinite +definite units: a fictitious unit \verb|HOLE| is specified to define +all unfilled requirements in the unit. Units instantiated with holes +are never installed; they are strictly for type-checking (although +we could generate code for them, which could be linked if the +\verb|HOLE| symbols are rewritten to their true destinations). + +\item[Installed unit IDs] An installed unit ID is a unit key which has +no \verb|HOLE|s; it identifies a unit that can be compiled. The +\textbf{installed unit database} caches the compilation results of these +units, so that if a unit is compiled multiple times with the same +instantiation, this code can be reused. (This database most closely +resembles the existing installed package database with GHC today.) +\end{description} + +\subsection{The unit renamer} + +The unit renamer is responsible for transforming unit names in a +Backpack file into \I{IndefUnitId}s, so that they can be uniquely +identified in the indefinite unit database. +Given the \I{InstalledPackageId} of the package we are compiling (\I{ThisInstalledPackageId}) +and a mapping from $p$ to \I{IndefUnitId} (\I{UnitNameMap}), we rename +as follows: \begin{itemize} - \item The unit of distribution - \item The unit that Hackage handles - \item The unit of versioning - \item The unit of ownership (who maintains it etc) + \item Every unit declaration $\verb|unit|\; p$ is renamed to $\I{ThisInstalledPackageId}\, \verb|-|\, p$ + \item Every unit include $\verb|include|\; p$ is renamed to $\I{ThisInstalledPackageId}\, \verb|-|\, p$ if $p$ was declared in the Backpack file; otherwise it is renamed according to \I{UnitNameMap}. \end{itemize} -Backpack units are the building blocks of modular development; -there may be multiple units per a Cabal package. While in theory -Cabal could do sophisticated things with multiple units in a -package, we expect Cabal -to pick a distinguished unit (with the same unit name $p$ as -the package) which serves as the publically visible unit. +The \I{UnitNameMap} is entirely user specified, so there is a great deal +of flexibility on how it can be created, but the convention we expect to +be used by Cabal is that a unit name $p$ corresponds to the same-named +unit in a \emph{package} named $p$. Packages that don't use Backpack +only have on unit, which has the same name as package. -\paragraph{Notational convention} +\paragraph{Notational conventions} In the rest of this document, when it is unambiguous, we will use $p$, $q$ and $r$ -interchangeably with \I{IndefiniteUnitId}, as after unit renaming, there +interchangeably with \I{IndefUnitId}, as after unit renaming, there are no occurrences of unit names. \newpage @@ -185,114 +236,496 @@ $$ \begin{array}{rcll} \tilde{d} & ::= & \verb|module|\; Module \; [exports]\; \verb|where|\; \I{body} \\ & | & \verb|signature|\; \I{Module} \; [exports]\; \verb|where|\; \I{body} \\ - & | & \verb|include|\; p\,\verb|(|\, m\; \verb|->|\; \I{Module}\,\verb|,|\;\ldots\,\verb|)| \\ - & | & \verb|merge|\; \I{Module}\; \verb|of|\; (\I{Module}, \I{IsSource?}) \ldots + & | & \verb|merge|\; \I{Module} \\ + & | & \verb|include|\; \I{UnitKey} \\ + \I{IndefUnitRecord}^{\mathsf{dep}} & ::= & \verb|provides:|\; m\; \verb|->|\; \I{Module}\verb|,|\, \ldots\\ + & & \verb|requires:|\; m\verb|,|\, \ldots \end{array} $$ \caption{Resolved declarations} \label{fig:resolved} \end{figure} -The first phase of compilation is defines a directed acyclic graph on -the source syntax representing the dependency structure of the -modules/signatures/includes in the unit. This DAG has a node per: +The dependency solver computes the unfilled requirements of a unit, a +dependency DAG on the modules/signatures in the unit, and a dependency +DAG on the includes in the unit. We assume every referenced $p$ in the +unit must be recorded in the indefinite unit database, such that we can +look up $\I{IndefUnitRecord}^{\mathsf{dep}}$. + +\paragraph{Computing unfilled requirements} The unfilled requirements are $R-P$, where $R$ and $P$ are sets of module names computed from the declarations in the following manner: \begin{itemize} - \item Each source-level module, signature and include, and - \item Each unfilled requirement (called a ``signature merge'' node). + \item $\verb|include|\; p$: union the (domain of the) provisions with $P$ and the requirements with $R$. + \item $\verb|module|\; m$: add $m$ to $P$. + \item $\verb|signature|\; m$: add $m$ to $R$. \end{itemize} + +\paragraph{Declaration dependency graph} +We define a graph where the nodes are as described in Figure~\ref{fig:resolved}: +there is a node per +for each module and signature, as well as an extra ``merge'' node for +every unfilled requirement, merges the interfaces of a local signature and +any requirements brought in from includes. % -Each module, signature and signature merge node can be identified -with the tuple $\left(\I{Module}, \I{IsSource?}\right)$, where \I{IsSource?} -is true for signatures and false for modules and signature merges. -The four nodes are described in Figure~\ref{fig:resolved}. +Each node is identified by the tuple $\left(\I{Module}, \I{IsSource?}\right)$, where +the \I{Module} of a declaration $m$ in unit $p$ is \verb|p(H):m|, where $H$ maps +each unfilled requirement $n$ to \verb|HOLE:n|, and \I{IsSource?} is true only for signatures. The edges of the directed graph signify a ``depends on'' relation, and are defined as follows: \begin{itemize} - \item A module/signature $m$ depends on $\verb|include|\; p$ if $m$ imports a module provided by $p$. \item A module/signature $m$ depends on a module/signature merge $n$ if $m$ imports $n$. \item A module/signature $m$ depends on a signature $n$ if $m$ \verb|{-# SOURCE #-}| imports $n$. \item A module/signature merge $m$ depends on a local signature $m$ (if it exists). - \item A module/signature merge $m$ depends on a $\verb|include|\; p$, if the include requires $m$. -\end{itemize} -% -For compilation, these extra edges can also be defined if they -do not introduce a cycle: - -\begin{itemize} - \item An $\verb|include|\; p$ depends on $\verb|include|\; q$ if, for some module name $m$, $p$ requires $m$ and $q$ provides $m$. - \item An $\verb|include|\; p$ depends on a module $m$ if $p$ requires a module named $m$. \end{itemize} % If the resulting graph has a cycle, this is an error. -\paragraph{Computing unfilled requirements} To compute unfilled requirements, -maintain two sets of module names: the provisions $P$ and the possible requirements $R'$. For each declaration: +\paragraph{Include dependency graph} We define an dependency graph +between includes, where an $\verb|include|\; p$ depends on +$\verb|include|\; q$ if, for some module name $m$, $p$ requires $m$ and +$q$ provides $m$. If there is a cyclic, then there is cross-unit +mutual recursion: for now, this is an error. -\begin{itemize} - \item $\verb|include|\; p$: union provisions with $P$ and requirements with $R'$. - \item $\verb|module|\; m$: add $m$ to $P$ - \item $\verb|signature|\; m$: add $m$ to $R'$ -\end{itemize} -% -The unfilled requirements $R=R'-P$. - -\paragraph{Computing the \I{Module} of declarations} -The \I{Module} of any declaration $m$ in a unit $p$ is simply -\verb|p(A -> HOLE:A, ...):m|, where the hole map is a map -from each unfilled requirement $n$ to \verb|HOLE:n|. +Assuming an acyclic graph, we can compute the \I{UnitKey} of each +key as follows. Initialize $\Gamma$, a substitution from holes to \I{Module}, +to the identity substitution. For each $\verb|include|\; p$ in topological +order, define its \I{UnitKey} to be $p$ with the mapping $\Gamma$ with its +domain restricted to the requirements of $p$. Then, for each provision +$m\; \verb|->|\; \I{Module}$, update $\Gamma$ so that +$\Gamma(m) = \operatorname{subst} (\Gamma, \I{Module})$ +(where $\operatorname{subst}$ recursively applies the substitution $\Gamma$ in \I{Module}). -\paragraph{Computing the hole mapping of includes} In absence of mutual -recursion of includes, the DAG is acyclic with include-include edges. Process includes -in this topological order, maintaining a mapping of provided modules $\Gamma$, accumulating -provisions of includes as we go along. For each $\verb|include|; p$, the hole map is -simply the requirements of $p$, mapping $m$ to $\Gamma(m)$ if it is defined, and \verb|HOLE:m| otherwise. - -With mutual recursion, we have to use the regular tree unification algorithm described in the Backpack paper. We omit it from here for now. +During compilation, the include dependency graph is useful for +determining a compilation order for included units. \newpage -\section{Shaping} +\section{Shaping pass} \begin{figure}[htpb] $$ \begin{array}{rcll} -\I{Shape} & ::= & \verb|provides:|\; m \; \verb|->|\; \I{Module}\; \verb|{|\, \I{AvailInfo} \verb|,|\, \ldots \, \verb|};| \ldots \\ - & & \verb|requires:| \; m \; \verb|->|\; \textcolor{white}{\I{Module}}\; \verb|{| \, \I{AvailInfo} \verb|,| \, \ldots \, \verb|}| \verb|;| \ldots \\ +\I{Shape} & ::= & \verb|provides:|\; m \; \verb|->|\; \I{Module}\; \I{ModShape} \verb|;|\; \ldots \\ + & & \verb|requires:| \; m \; \verb|->|\; \textcolor{white}{\I{Module}}\; \I{ModShape} \verb|;|\; \ldots \\ +\I{ModShape} & ::= & \I{AvailInfo}_0 \verb|,|\, \ldots \verb|,|\, \I{AvailInfo}_n \\ \I{AvailInfo} & ::= & \I{Name} & \mbox{Plain identifiers} \\ & | & \I{Name} \, \verb|{| \, \I{Name}_0\verb|,| \, \ldots\verb|,| \, \I{Name}_n \, \verb|}| & \mbox{Type constructors} \\ \I{Name} & ::= & \I{Module} \verb|.| \I{OccName} \\ -\I{OccName} & & \mbox{Unqualified name in a namespace} +\I{OccName} & & \mbox{Unqualified name in a namespace} \\ +\I{IndefUnitRecord}^{\mathsf{shape}} & ::= & \I{Shape} \end{array} $$ -\caption{Shaping} \label{fig:semantic} +\caption{Shaping} \label{fig:shaping} \end{figure} -Shaping computes a \I{Shape}, whose form is described in Figure~\ref{fig:semantic}. -A shape describes what modules a unit implements and exports (the \emph{provides}) -and what signatures a unit needs to have filled in (the \emph{requires}). Both -provisions and requires are available for import by units which include this -unit. +The shaping pass computes the export \I{AvailInfo}s for each node in +the dependency graph; collectively, these form the \I{Shape} of +the unit described in Figure~\ref{fig:shaping}. Equivalently, +the \I{Shape} of unit specifies what a unit requires and provides +at the Haskell declaration level. + +An \I{AvailInfo} names a Haskell declaration that may be exported. +It may be a plain identifier \I{Name}, or it may be a type constructor, +in which case it has children \I{Name}s representing the names of the +data constructors, record selectors, etc. This level of hierarchy +makes it possible to use ellipses in an import list, e.g. \verb|TyCon(..)|, +to selectively import just the logical children of a type constructor. +Children names have the invariant that they have the same \I{Module} as the parent name. +In a \I{ModShape}/export list, +the \I{OccName}s of the plain identifier \I{AvailInfo}s and the \emph{children} +of type constructor are unique +(although the top-level \I{Name}s may not have unique \I{OccName}s). + +The compilation of every node is associated with a ``shape context'', +which represents the modules which are transitively depended upon. Let the +environment shape context is the merge of the shapes of all includes; to +shape a node: + +\begin{enumerate} + \item Merge the environment shape contexts with the shape contexts + of all direct dependencies, resulting in the initial shape context. + \item Rename the module/signature according to the initial shape context, + getting a \I{ModShape}. Importantly, when renaming the signature \verb|M|, + any declarations defined in the signature are assigned a \I{Name}s with the \I{Module} \verb|HOLE:M| (rather than a \I{Module} based on the current unit $p$). + \item Merge this \I{ModShape} into the initial shape context + (modules go in provisions while signatures go in provisions), the + result defining the shape context of this node. +\end{enumerate} + +We now elaborate on these steps in more detail. -We incrementally build a shape by starting with an empty -shape context and adding to it as follows: +\subsection{Shapes of includes} +Given an \verb|include p (X) requires (Y)|, we can look up the shape +for $p$ from the indefinite package database. However, an include can +also rename provisions and requires (where $X$, $Y$ are partial maps +from module name to module name), which requires transforms the shape +in the following way: + +\begin{itemize} + \item For each original provision $m\; \verb|->|\; \ldots$, provide + $X (m)\; \verb|->|\; \ldots$ if $X (m)$ is defined. + \item For each original requirement $m\; \verb|->|\; \ldots$, require + $Y (m)\; \verb|->|\; \ldots$ if $Y (m)$ is defined, and $m$ if it is not. + (Non-mentioned requirements are always passed through). + \item For each requirement renaming from \verb|M| to \verb|N| in $Y$, substitute + all occurrences of \verb|HOLE:M| to \verb|HOLE:N| in the \I{ModShape} + of all provisions and requirements. +\end{itemize} + +\subsection{Shape merging} +Before specifying the how to merge shapes algorithm, we must define some subprocedures +for unifying and merging lower-level entities such as \I{AvailInfo}s and \I{Name}s, +which produce \I{Name} substitution that are applied to shapes. + +\begin{description} + \item[Unify two \textit{Name}s] + (produces a \I{Name} and a \I{Name} substitution) \\ + Error if the names do not have matching \I{OccName}s. Error if neither name + is a hole name. Otherwise, without loss of generality let $m$ be the hole name + and $n$ the other name, return $n$ and the substitution of $m$ to $n$. + \item[Merge two sets of \textit{Name}s] + (produces a set of \I{Name}s and a \I{Name} substitution) \\ + Let two \I{Name}s be related if they have the same \I{OccName}. + Union the two sets, unifying related names. + \item[Unify two \textit{AvailInfo}s] + (produces an \I{AvailInfo} and a \I{Name} substitution) \\ + If both \I{AvailInfo}s are simply + a \I{Name}, unify the two \I{Name}s. If both \I{AvailInfo}s are + $\I{Name}\, \verb|{|\, \I{Name}_0\verb|,|\, \ldots\verb|,|\, \I{Name}_n\, \verb|}|$, + unify the top-level \I{Name}, apply the substitution to both \I{AvailInfo}s, + and return the unified \I{Name} with the union of the child names of the + substituted \I{AvailInfo}s. + Otherwise, error. + \item[Merge two sets of \textit{AvailInfo}s] + (produces a set of \I{AvailInfo}s and a \I{Name} substitution) \\ + Let two \I{AvailInfo}s be related if they both are of the form + \I{Name} and have matching \I{OccName}s, or if they both are of + the form + $\I{Name}\, \verb|{|\, \I{Name}_0\verb|,|\, \ldots\verb|,|\, \I{Name}_n\, \verb|}|$ + and there exists a child name in each which have matching \I{OccName}s. + Union the two sets, unifying related \I{AvailInfo}s. + \item[Apply a name substitution on an \textit{AvailInfo}] (produces an \I{AvailInfo}) \\ + Substitute the top-level \I{Name}, which induces a substitution from + \I{Module} to $\I{Module}'$. Apply this module substitution to each child + \I{Name} in the \I{AvailInfo}. + +\end{description} +% +Shape merging takes two units with inputs (requirements) and outputs +(provisions) and ``wires'' them up so that outputs feed into inputs. To +merge the shape of $p$ with the shape of $q$: \begin{enumerate} - \item Calculate the shape of a declaration, with respect to the - current shape context. (e.g., by renaming a module/signature, - or using the shape from an included unit.) - \item Merge this shape into the shape context. + \item \emph{Fill every requirement of $q$ with provided modules from + $p$.} For each requirement $M$ of $q$ that is provided by $p$, + substitute each \I{Module} occurrence of \verb|HOLE:M| with the + provided $p\verb|(|M\verb|)|$ (however, do \textbf{NOT} substitute the + top-level \I{Module} in a \I{Name}s), merge the \I{AvailInfo}s and apply + the resulting \I{Name} substitution, and + remove the requirement from $q$. If the \I{AvailInfo}s of the + provision are not a superset of the required \I{AvailInfo}s, + error. + \item If mutual recursion is supported, \emph{fill every requirement + of $p$ with provided modules from $q$.} + \item \emph{Merge leftover requirements.} For each requirement $M$ + of $q$ that is not provided by $p$ but required by $p$, and let + the new requirement be the merge of + \I{AvailInfo}s, applying the resulting \I{Name} substitution. + \item \emph{Add provisions of $q$.} Union the provisions of $p$ and $q$, (lazily) erroring + if there is a duplicate that doesn't have the same \I{Module}. \end{enumerate} -The final shape context is the shape of the unit as a whole. -Optionally, we can also compute the renamed syntax trees of -modules and signatures. +\newpage +\section{Indefinite type checker} + +\begin{figure}[htpb] +$$ +\begin{array}{rcll} +\I{IndefUnitRecord} & ::= & \I{ModIface}_0 \verb|;|\, \ldots\verb|;|\, \I{ModIface}_n \\[1em] +\multicolumn{3}{l}{\mbox{\bf Module interface}} \\ +\I{ModIface} & ::= & \verb|module| \; \I{Module} \; \verb|(| \I{mi\_exports} \verb|)| \; \verb|where| \\ +& & \qquad \I{mi\_decls} \\ +& & \qquad \I{mi\_insts} \\ +& & \qquad \I{dep\_orphs} \\ +\I{mi\_exports} & ::= & \I{AvailInfo}_0 \verb|,|\, \ldots \verb|,|\, \I{AvailInfo}_n & \mbox{Export list} \\ +\I{mi\_decls} & ::= & \I{IfaceDecl}_0 \verb|;|\, \ldots \verb|;|\, \I{IfaceDecl}_n & \mbox{Defined declarations} \\ +\I{mi\_insts} & ::= & \I{IfaceClsInst}_0 \verb|;|\, \ldots \verb|;|\, \I{IfaceClsInst}_n & \mbox{Defined instances} \\ +\I{dep\_orphs} & ::= & \I{Module}_0 \verb|;|\, \ldots \verb|;|\, \I{Module}_n & \mbox{Transitive orphan dependencies} \\[1em] +\multicolumn{3}{l}{\mbox{\bf Interface declarations}} \\ +\I{IfaceDecl} & ::= & \I{OccName} \; \verb|::| \; \I{IfaceId} \\ + & | & \verb|data| \; \I{OccName} \; \verb|=| \;\ \I{IfaceData} \\ + & | & \ldots \\ +\I{IfaceClsInst} & & \mbox{A type-class instance} \\ +\I{IfaceId} & & \mbox{Interface of top-level binder} \\ +\I{IfaceData} & & \mbox{Interface of type constructor} \\ +\end{array} +$$ +\caption{Module interfaces in GHC} \label{fig:typecheck} +\end{figure} + +\Red{This needs updating.} + +In general terms, +type checking an indefinite unit (a unit with holes) involves +calculating, for every module, a \I{ModIface} representing the +type/interface of the module in question (which is serialized +to disk). The general form of these +interface files are described in Figure~\ref{fig:typecheck}; notably, +the interfaces \I{IfaceId}, \I{IfaceData}, etc. contain \I{Name} references, +which must be resolved by +looking up a \I{ModIface} corresponding to the \I{Module} associated +with the \I{Name}. (We will say more about this lookup process shortly.) +For example, given: + +\begin{verbatim} + unit p where + signature H where + data T + module A(S, T) where + import H + data S = S T +\end{verbatim} +% +the \I{PkgType} is: + +\begin{verbatim} + module HOLE:H (HOLE:H.T) where + data T -- abstract type constructor + module THIS:A (THIS:A.S, HOLE:H.T) where + data S = S HOLE:H.T + -- where THIS = p(H -> HOLE:H) +\end{verbatim} +% +However, while it is true that the \I{ModIface} is the final result +of type checking, we actually are conflating two distinct concepts: the user-visible +notion of a \I{ModuleName}, which, when imported, brings some \I{Name}s +into scope (or could trigger a deprecation warning, or pull in some +orphan instances\ldots), versus the actual declarations, which, while recorded +in the \I{ModIface}, have an independent existence: even if a declaration +is not visible for an import, we may internally refer to its \I{Name}, and +need to look it up to find out type information. (A simple case when +this can occur is if a module exports a function with type \verb|T -> T|, +but doesn't export \verb|T|). + +\begin{figure}[htpb] +$$ +\begin{array}{rcll} +\I{ModDetails} & ::= & \langle\I{md\_types} \verb|;|\; \I{md\_insts}\rangle \\ +\I{md\_types} & ::= & \I{TyThing}_0 \verb|,|\, \ldots\verb|,|\, \I{TyThing}_n \\ +\I{md\_insts} & ::= & \I{ClsInst}_0 \verb|,|\, \ldots\verb|,|\, \I{ClsInst}_n \\[1em] +\multicolumn{3}{l}{\mbox{\bf Type-checked declarations}} \\ +\I{TyThing} & & \mbox{Type-checked thing with a \I{Name}} \\ +\I{ClsInst} & & \mbox{Type-checked type class instance} \\ +\end{array} +$$ +\caption{Semantic objects in GHC} \label{fig:typecheck-more} +\end{figure} + +Thus, a \I{ModIface} can be type-checked into a \I{ModDetails}, described in +Figure~\ref{fig:typecheck-more}. Notice that a \I{ModDetails} is just +a bag of type-checkable entities which GHC knows about. We +define the \emph{external package state (EPT)} to +simply be the union of the \I{ModDetails} +of all external modules. + +Type checking is a delicate balancing act between module +interfaces and our semantic objects. A \I{ModIface} may get +type-checked multiple times with different hole instantiations +to provide multiple \I{ModDetails}. +Furthermore complicating matters +is that GHC does this resolution \emph{lazily}: a \I{ModIface} +is only converted to a \I{ModDetails} when we are looking up +the type of a \I{Name} that is described by the interface; +thus, unlike usual theoretical treatments of type checking, we can't +eagerly go ahead and perform substitutions on \I{ModIface}s when +they get included. + +In a separate compiler like GHC, there are two primary functions we must provide: + +\paragraph{\textit{ModuleName} to \textit{ModIface}} Given a \I{ModuleName} which +was explicitly imported by a user, we must produce a \I{ModIface} +that, among other things, specifies what \I{Name}s are brought +into scope. This is used by the renamer to resolve plain references +to identifiers to real \I{Name}s. (By the way, if shaping produced +renamed trees, it would not be necessary to do this step!) + +\paragraph{\textit{Module} to \textit{ModDetails}/EPT} Given a \I{Module} which may be +a part of a \I{Name}, we must be able to type check it into +a \I{ModDetails} (usually by reading and typechecking the \I{ModIface} +associated with the \I{Module}, but this process is involved). This +is used by the type checker to find out type information on things. \\ + +There are two points in the type checker where these capabilities are exercised: + +\paragraph{Source-level imports} When a user explicitly imports a +module, the \textit{ModuleName} is mapped to a \textit{ModIface} +to find out what exports are brought into scope (\I{mi\_exports}) +and what orphan instances must be loaded (\I{dep\_orphs}). Additionally, +the \textit{Module} is loaded to the EPT to bring instances from +the module into scope. + +\paragraph{Internal name lookup} During type checking, we may have +a \I{Name} for which we need type information (\I{TyThing}). If it's not already in the +EPT, we type check and load +into the EPT the \I{ModDetails} of the \I{Module} in the \I{Name}, +and then check the EPT again. (\verb|importDecl|) + +\subsection{\textit{ModName} to \textit{ModIface}} + +In all cases, the \I{mi\_exports} can be calculated directly from the +shaping process, which specifies exactly for each \I{ModName} in scope +what will be brought into scope. + +\paragraph{Modules} Modules are straightforward, as for any +\I{Module} there is only one possibly \I{ModIface} associated +with it (the \I{ModIface} for when we type-checked the (unique) \verb|module| +declaration.) -In the description below, we'll assume \verb|THIS| is the unit key -of the unit being processed. +\paragraph{Signatures} For signatures, there may be multiple \I{ModIface}s +associated with a \I{ModName} in scope, e.g. in this situation: + +\begin{verbatim} + unit p where + signature S where + data A + unit q where + include p + signature S where + data B + module M where + import S +\end{verbatim} +% +Each literal \verb|signature| has a \I{ModIface} associated with it; and +the import of \verb|S| in \verb|M|, we want to see the \emph{merged} +\I{ModIface}s. We can determine the \I{mi\_exports} from the shape, +but we also need to pull in orphan instances for each signature, and +produce a warning for each deprecated signature. \begin{aside} -\textbf{\textit{OccName} is implied by \textit{Name}.} +\textbf{Does hiding a signature hide its orphans.} Suppose that we have +extended Backpack to allow hiding signatures from import. + +\begin{verbatim} + unit p requires (H) where -- H is hidden from import + module A where + instance Eq (a -> b) where -- orphan + signature H {-# DEPRECATED "Don't use me" #-} where + import A + + unit q where + include p + signature H where + data T + module M where + import H -- warn deprecated? + instance Eq (a -> b) -- overlap? +\end{verbatim} + +It is probably the most consistent to not pull in orphan instances +and not give the deprecated warning: this corresponds to merging +visible \I{ModIface}s, and ignoring invisible ones. +\end{aside} + +\subsection{\textit{Module} to \textit{ModDetails}} + +\paragraph{Modules} For modules, we have a \I{Module} of +the form $\I{p}\verb|(|m\; \verb|->|\; \I{Module}\verb|,|\, \ldots\verb|)|$, +and we also have a unique \I{ModIface}, where each hole instantiation +is $\verb|HOLE:|m$. + +To generate the \I{ModDetails} associated with the specific instantiation, +we have to type-check the \I{ModIface} with the following adjustments: + +\begin{enumerate} + \item Perform a \I{Module} substitution according to the instantiation + of the \I{ModIface}'s \I{Module}. (NB: we \emph{do} + substitute \verb|HOLE:A.x| to \verb|HOLE:B.x| if we instantiated + \verb|A -> HOLE:B|, \emph{unlike} the disjoint + substitutions applied by shaping.) + \item Perform a \I{Name} substitution as follows: for any name + with a unit key that is a $\verb|HOLE|$, + substitute with the recorded \I{Name} in the requirements of the shape. + Otherwise, look up the (unique) \I{ModIface} for the \I{Module}, + and subsitute with the corresponding \I{Name} in the \I{mi\_exports}. +\end{enumerate} + +\paragraph{Signatures} For signatures, we have a \I{Module} of the form +$\verb|HOLE:|m$. Unlike modules, there are multiple \I{ModIface}s associated with a hole. +We distinguish each separate \I{ModIface} by considering the full \I{UnitKey} +it was defined in, e.g. \verb|p(A -> HOLE:C, B -> q():B)|; call this +the hole's \emph{defining unit key}; the set of \I{ModIface}s for a hole +and their defining unit keys can easily be calculated during shaping. + +To generate the \I{ModDetails} associated with a hole, we type-check each +\I{ModIface}, with the following adjustments: + +\begin{enumerate} + \item Perform a \I{Module} substitution according to the instantiation + of the defining unit key. (NB: This may rename the hole itself!) + \item Perform a \I{Name} substitution as follows, in the same manner + as would be done in the case of modules. + \item When these \I{ModDetails} are merged into the EPT, some merging + of duplicate types may occur; a type + may be defined multiple times, in which case we check that each + definition is compatible with the previous ones. A concrete + type is always compatible with an abstract type. +\end{enumerate} + +\paragraph{Invariants} When we perform \I{Name} substitutions, we must be +sure that we can always find out the correct \I{Name} to substitute to. +This isn't obviously true, consider: + +\begin{verbatim} + unit p where + signature S(foo) where + data T + foo :: T + module M(bar) where + import S + bar = foo + unit q where + module A(T(..)) where + data T = T + foo = T + module S(foo) where + import A + include p + module A where + import M + ... bar ... +\end{verbatim} +% +When we type check \verb|p|, we get the \I{ModIface}s: + +\begin{verbatim} + module HOLE:S(HOLE:S.foo) where + data T + foo :: HOLE:S.T + module THIS:M(THIS:M.bar) where + bar :: HOLE:S.T +\end{verbatim} +% +Now, when we type check \verb|A|, we pull on the \I{Name} \verb|p(S -> q():S):M.bar|, +which means we have to type check the \I{ModIface} for \verb|p(S -> q():S):M|. +The un-substituted type of \verb|bar| has a reference to \verb|HOLE:S.T|; +this should be substituted to \verb|q():S.T|. But how do we discover this? +We know that \verb|HOLE:S| was instantiated to \verb|q():S|, so we might try +and look for \verb|q():S.T|. However, this \I{Name} does not exist because +the \verb|module S| reexports the selector from \verb|A|! Nor can we consult +the (unique) \I{ModIface} for the module, as it doesn't reexport the relevant +type. + +The conclusion, then, is that a module written this way should be disallowed. +Specifically, the correctness condition for a signature is this: \emph{Any \I{Name} +mentioned in the \I{ModIface} of a signature must either be from an external module, or be +exported by the signature}. + +\section{Appendix: Shaping} + +This section clarifies some subtle aspects about shaping. + +\subsection{\textit{OccName} is implied by \textit{Name}} In Haskell, the following is not valid syntax: \begin{verbatim} @@ -305,15 +738,15 @@ in such a world, we need a different definition of shape: \begin{verbatim} Shape ::= - provided: ModName -> { OccName -> Name } - required: ModName -> { OccName -> Name } + provided: ModName -> Module { OccName -> Name } + required: ModName -> { OccName -> Name } \end{verbatim} Presently, however, such an \I{OccName} annotation would be redundant: it can be inferred from the \I{Name}. -\end{aside} -\begin{aside} -\textbf{Holes of a unit are a mapping, not a set.} Why can't the \I{UnitKey} just record a -set of \I{Module}s, e.g. $\I{UnitKey}\;::= \; \I{SrcUnitKey} \; \verb|{| \; \I{Module} \; \verb|}|$? Consider: +\subsection{Holes of a unit are a mapping, not a set.} + +Why can't the \I{UnitKey} just record a +set of \I{Module}s, e.g. $\I{UnitKey}\;::= \; p \; \verb|{| \; \I{Module} \; \verb|}|$? Consider: \begin{verbatim} unit p (A) requires (H1, H2) where @@ -336,12 +769,10 @@ set of \I{Module}s, e.g. $\I{UnitKey}\;::= \; \I{SrcUnitKey} \; \verb|{| \; \I{M \end{verbatim} With a mapping, the first instance of \verb|p| has key \verb|p(H1 -> q():I1, H2 -> q():I2)| while the second instance has key \verb|p(H1 -> q():I2, H2 -> q():I1)|; with -a set, both would have the key \verb|p(q():I1, q():I2)|. -\end{aside} - +a set, both would have the key \verb|p{q():I1, q():I2}| and be +indistinguishable. -\begin{aside} -\textbf{Signatures can require a specific entity.} +\subsection{Signatures can require a specific entity.} With requirements like \verb|A -> { HOLE:A.T, HOLE:A.foo }|, why not specify it as \verb|A -> { T, foo }|, e.g., \verb|required: { ModName -> { OccName } }|? Consider: @@ -360,12 +791,12 @@ can be expressed with \I{Name}s as A -> { HOLE:A.T } B -> { HOLE:A.T } \end{verbatim} -But, without \I{Name}s, the sharing constraint is impossible: \verb|A -> { T }; B -> { T }|. (NB: \verb|A| and \verb|B| don't have to be implemented with the same module.) -\end{aside} +But, without \I{Name}s, the sharing constraint is impossible: \verb|A -> { T }; B -> { T }|. (NB: \verb|A| and \verb|B| could be filled with different modules, they just have +to both export the same \verb|T|.) -\begin{aside} -\textbf{The \textit{Name} of a value is used to avoid -ambiguous identifier errors.} We state that two types +\subsection{The \textit{Name} of a value is used to avoid +ambiguous identifier errors.} +We state that two types are equal when their \I{Name}s are the same; however, for values, it is less clear why we care. But consider this example: @@ -385,10 +816,8 @@ that \verb|x| from \verb|H1| and \verb|x| from \verb|H2| are the same (have the same \I{Name}.) If they were not the same, it would be ambiguous and should cause an error. Knowing the \I{Name} of a value distinguishes between these two cases. -\end{aside} -\begin{aside} -\textbf{Holes are linear} +\subsection{Holes are linear} Requirements do not record what \I{Module} represents the identity of a requirement, which means that it's not possible to assert that hole \verb|A| and hole \verb|B| should be implemented with the same module, @@ -408,55 +837,15 @@ looking for in one of $n$ names, accepting any name as an acceptable linkage. If aliasing was allowed, we'd need a separate physical shaping context, to make sure multiple mentions of the same hole were consistent. -\end{aside} - -%\newpage - -\subsection{\texttt{module M}} - -A module declaration provides a module \verb|THIS:M| at module name \verb|M|. It -has the shape: - -\begin{verbatim} - provides: M -> THIS:M { exports of renamed M under THIS:M } - requires: (nothing) -\end{verbatim} -Example: - -\begin{verbatim} - module A(T) where - data T = T - - -- provides: A -> THIS:A { THIS:A.T } - -- requires: (nothing) -\end{verbatim} - -\newpage -\subsection{\texttt{signature M}} - -A signature declaration creates a requirement at module name \verb|M|. It has the shape: - -\begin{verbatim} - provides: (nothing) - requires: M -> { exports of renamed M under HOLE:M } -\end{verbatim} - -\noindent Example: - -\begin{verbatim} - signature H(T) where - data T - - -- provides: H -> (nothing) - -- requires: H -> { HOLE:H.T } -\end{verbatim} - -\begin{aside} -\textbf{In-scope signatures are not provisions}. We enforce the invariant that +\subsection{A unit does not ``provide'' its signatures} +We enforce the invariant that a provision is always (syntactically) a \verb|module| and a requirement is always a \verb|signature|. This means that if you have a requirement and a provision of the same name, the requirement can \emph{always} be filled -with the provision. Without this invariant, it's not clear if a provision +with the provision. + +The alternate design, where a unit both requires and provides +its signatures, makes it unclear if a provision will actually fill a signature. Consider this example, where a signature is required and exposed: @@ -484,7 +873,7 @@ actually provide enough declarations to satisfy \verb|a-user|'s requirement: the intended semantics \emph{merges} the requirements of \verb|a-sigs| and \verb|a-user|. - +What about this example? \begin{verbatim} unit a-sigs (M as A) requires (H as A) where @@ -508,236 +897,21 @@ be under the name \verb|H| or \verb|A|? It may still be possible to use the \verb|(A) requires (A)| syntax to indicate exposed signatures, but this would be a mere syntactic alternative to \verb|() requires (exposed A)|. -\end{aside} -% - -\newpage -\subsection{\texttt{include pkg (X) requires (Y)}} - -We merge with the transformed shape of unit \verb|pkg|, where this -shape is transformed by: - -\begin{itemize} - \item Renaming and thinning the provisions according to \verb|(X)| - \item Renaming requirements according to \verb|(Y)| (requirements cannot - be thinned, so non-mentioned requirements are implicitly passed through.) - For each renamed requirement from \verb|Y| to \verb|Y'|, - substitute \verb|HOLE:Y| with \verb|HOLE:Y'| in the - \I{Module}s and \I{Name}s of the provides and requires. -\end{itemize} -% -If there are no thinnings/renamings, you just merge the -shape unchanged! Here is an example: - -\begin{verbatim} - unit p (M) requires (H) where - signature H where - data T - module M where - import H - data S = S T - - unit q (A) where - module X where - data T = T - include p (M as A) requires (H as X) -\end{verbatim} -% -The shape of unit \verb|p| is: -\begin{verbatim} - requires: M -> { p(H -> HOLE:H):M.S } - provides: H -> { HOLE:H.T } -\end{verbatim} -% -Thus, when we process the \verb|include| in unit \verb|q|, -we make the following two changes: we rename the provisions, -and we rename the requirements, substituting \verb|HOLE|s. -The resulting shape to be merged in is: - -\begin{verbatim} - provides: A -> { p(H -> HOLE:X):M.S } - requires: X -> { HOLE:X.T } -\end{verbatim} -% -After merging this in, the final shape of \verb|q| is: - -\begin{verbatim} - provides: X -> { q():X.T } -- from shaping 'module X' - A -> { p(H -> q():X):M.S } - requires: (nothing) -- discharged by provided X -\end{verbatim} - -\newpage - -\subsection{Merging} - -The shapes we've given for individual declarations have been quite -simple. Merging combines two shapes, filling requirements with -implementations, unifying \I{Name}s, and unioning requirements; it is -the most complicated part of the shaping process. - -The best way to think about merging is that we take two units with -inputs (requirements) and outputs (provisions) and ``wiring'' them up so -that outputs feed into inputs. In the absence -of mutual recursion, this wiring process is \emph{directed}: the provisions -of the first unit feed into the requirements of the second unit, -but never vice versa. (With mutual recursion, things can go in the opposite -direction as well.) - -Suppose we are merging shape $p$ with shape $q$ (e.g., $p; q$). Merging -proceeds as follows: - -\begin{enumerate} - \item \emph{Fill every requirement of $q$ with provided modules from - $p$.} For each requirement $M$ of $q$ that is provided by $p$ (in particular, - all of its required \verb|Name|s are provided), - substitute each \I{Module} occurrence of \verb|HOLE:M| with the - provided $p\verb|(|M\verb|)|$, unify the names, and remove the requirement from $q$. - If the names of the provision are not a superset of the required names, error. - \item If mutual recursion is supported, \emph{fill every requirement of $p$ with provided modules from $q$.} - \item \emph{Merge leftover requirements.} For each requirement $M$ of $q$ that is not - provided by $p$ but required by $p$, unify the names, and union them together to form the new requirement. (It's not - necessary to substitute \I{Module}s, since they are guaranteed to be the same.) - \item \emph{Add provisions of $q$.} Union the provisions of $p$ and $q$, erroring - if there is a duplicate that doesn't have the same identity. -\end{enumerate} -% -To unify two sets of names, find each pair of names with matching \I{OccName}s $n$ and $m$ and do the following: - -\begin{enumerate} - \item If both are from holes, pick a canonical representative $m$ and substitute $n$ with $m$. - \item If one $n$ is from a hole, substitute $n$ with $m$. - \item Otherwise, error if the names are not the same. -\end{enumerate} -% -It is important to note that substitutions on \I{Module}s and substitutions on -\I{Name}s are disjoint: a substitution from \verb|HOLE:A| to \verb|HOLE:B| -does \emph{not} substitute inside the name \verb|HOLE:A.T|. - -Since merging is the most complicated step of shaping, here are a big -pile of examples of it in action. - -\subsubsection{A simple example} - -In the following set of units: - -\begin{verbatim} - unit p(M) requires (A) where - signature A(T) where - data T - module M(T, S) where - import A(T) - data S = S T - - unit q where - module A where - data T = T - include p -\end{verbatim} - -When we \verb|include p|, we need to merge the partial shape -of \verb|q| (with just provides \verb|A|) with the shape -of \verb|p|. Here is each step of the merging process: - -\begin{verbatim} - shape 1 shape 2 - -------------------------------------------------------------------------------- -(initial shapes) -provides: A -> THIS:A { q():A.T } M -> p(A -> HOLE:A) { HOLE:A.T, p(A -> HOLE:A).S } -requires: (nothing) A -> { HOLE:A.T } - -(after filling requirements) -provides: A -> THIS:A { q():A.T } M -> p(A -> THIS:A) { q():A.T, p(A -> THIS:A).S } -requires: (nothing) (nothing) - -(after adding provides) -provides: A -> THIS:A { q():A.T } - M -> p(A -> THIS:A) { q():A.T, p(A -> THIS:A).S } -requires: (nothing) -\end{verbatim} - -Notice that we substituted \verb|HOLE:A| with \verb|THIS:A|, but \verb|HOLE:A.T| with \verb|q():A.T|. - -\subsubsection{Requirements merging can affect provisions} - -When a merge results in a substitution, we substitute over both -requirements and provisions: - -\begin{verbatim} - signature H(T) where - data T - module A(T) where - import H(T) - module B(T) where - data T = T - - -- provides: A -> THIS:A { HOLE:H.T } - -- B -> THIS:B { THIS:B.T } - -- requires: H -> { HOLE:H.T } - - signature H(T, f) where - import B(T) - f :: a -> a - - -- provides: A -> THIS:A { THIS:B.T } -- UPDATED - -- B -> THIS:B { THIS:B.T } - -- requires: H -> { THIS:B.T, HOLE:H.f } -- UPDATED -\end{verbatim} - -\subsubsection{Sharing constraints} - -Suppose you have two signature which both independently define a type, -and you would like to assert that these two types are the same. In the -ML world, such a constraint is known as a sharing constraint. Sharing -constraints can be encoded in Backpacks via clever use of reexports; -they are also an instructive example for signature merging. - -\begin{verbatim} - signature A(T) where - data T - signature B(T) where - data T - - -- requires: A -> { HOLE:A.T } - B -> { HOLE:B.T } - - -- the sharing constraint! - signature A(T) where - import B(T) - -- (shape to merge) - -- requires: A -> { HOLE:B.T } - - -- (after merge) - -- requires: A -> { HOLE:A.T } - -- B -> { HOLE:A.T } -\end{verbatim} -% -\Red{I'm pretty sure any choice of \textit{Name} is OK, since the -subsequent substitution will make it alpha-equivalent.} - -\subsection{Export declarations} - -If an explicit export declaration is given, the final shape is the -computed shape, minus any provisions not mentioned in the list, with the -appropriate renaming applied to provisions and requirements. (Requirements -are implicitly passed through if they are not named.) -If no explicit export declaration is given, the final shape is -the computed shape, including only provisions which were defined -in the declarations of the unit. - -\begin{aside} -\textbf{Signature visibility, and defaulting} +\subsection{Signature visibility, and defaulting} The simplest formulation of requirements is to have them always be -visible. Signature visibility could be controlled by associating -every requirement with a flag indicating if it is importable or -not: a signature declaration sets a requirement to be visible, and -an explicit export list can specify if a requirement is to be visible -or not. +importable. One proposed enhancement, however, is to allow some +requirements to be ``non-importable''; that is, they are not visible +to people who include packages. + +One simple way of modeling this is to associate each required module +with a flag indicating whether or not it is importable or not. Then, we +might imagine that an explicit export list could be used to toggle +whether or not a requirement is visible or not. -When an export list is absent, we have to pick a default visibility -for a signature. If we use the same behavior as with modules, -a strange situation can occur: +However, when an export list is absent, we have to pick a default +visibility for a signature. If we use the same behavior as with +modules, a strange situation can occur: \begin{verbatim} unit p where -- S is visible @@ -779,44 +953,13 @@ unit q where include p signature S where \end{verbatim} -\end{aside} % % SPJ: This would be too complicated (if there's yet a third way) -\clearpage -\newpage +This is pretty complicated, so signature visibility is not currently +planned to be implemented. -\subsection{Merging AvailInfos} - -We describe how to take two sets of \I{AvailInfo}s and merges them -into one set. In the degenerate case where every \I{AvailInfo} is a -$Name$, this algorithm operates the same as the original algorithm. -Merging proceeds in two steps: unification and then simple union. - -Unification proceeds as follows: for each pair of \I{Name}s with -matching \I{OccName}s, unify the names. For each pair of $\I{Name}\, \verb|{|\, -\I{Name}_0\verb|,|\, \ldots\verb|,|\, \I{Name}_n\, \verb|}|$, where there -exists some pair of child names with matching \I{OccName}s, unify the -parent \I{Name}s. (A single \I{AvailInfo} may participate in multiple such -pairs.) A simple identifier and a type constructor \I{AvailInfo} with -overlapping in-scope names fails to unify. After unification, -the simple union combines entries with matching \verb|availName|s (parent -name in the case of a type constructor), recursively unioning the child -names of type constructor \I{AvailInfo}s. - -Unification of \I{Name}s results in a substitution, and a \I{Name} substitution -on \I{AvailInfo} is a little unconventional. Specifically, substitution on $\I{Name}\, \verb|{|\, -\I{Name}_0\verb|,|\, \ldots\verb|,|\, \I{Name}_n\, \verb|}|$ proceeds specially: -a substitution from \I{Name} to $\I{Name}'$ induces a substitution from -\I{Module} to $Module'$ (as the \I{OccName}s of the \I{Name}s are guaranteed -to be equal), so for each child $\I{Name}_i$, perform the \I{Module} -substitution. So for example, the substitution \verb|HOLE:A.T| to \verb|THIS:A.T| -takes the \I{AvailInfo} \verb|HOLE:A.T { HOLE:A.B, HOLE:A.foo }| to -\verb|THIS:A.T { THIS:A.B, THIS:A.foo }|. In particular, substitution -on children \I{Name}s is \emph{only} carried out by substituting on the outer name; -we will never directly substitute children. - -Unfortunately, there are a number of tricky scenarios: +\subsection{Tricky \textit{AvailInfo} merging scenarios} \paragraph{Merging when type constructors are not in scope} @@ -1002,315 +1145,151 @@ more about \verb|A| when the sharing constraint performs unification; however, the \I{AvailInfo} will only tell us about what is in-scope, which is \emph{not} enough information. -%\newpage - -\section{Type checking} +\subsection{Some examples} -\begin{figure}[htpb] -$$ -\begin{array}{rcll} -\I{PkgType} & ::= & \I{ModIface}_0 \verb|;|\, \ldots\verb|;|\, \I{ModIface}_n \\[1em] -\multicolumn{3}{l}{\mbox{\bf Module interface}} \\ -\I{ModIface} & ::= & \verb|module| \; \I{Module} \; \verb|(| \I{mi\_exports} \verb|)| \; \verb|where| \\ -& & \qquad \I{mi\_decls} \\ -& & \qquad \I{mi\_insts} \\ -& & \qquad \I{dep\_orphs} \\ -\I{mi\_exports} & ::= & \I{AvailInfo}_0 \verb|,|\, \ldots \verb|,|\, \I{AvailInfo}_n & \mbox{Export list} \\ -\I{mi\_decls} & ::= & \I{IfaceDecl}_0 \verb|;|\, \ldots \verb|;|\, \I{IfaceDecl}_n & \mbox{Defined declarations} \\ -\I{mi\_insts} & ::= & \I{IfaceClsInst}_0 \verb|;|\, \ldots \verb|;|\, \I{IfaceClsInst}_n & \mbox{Defined instances} \\ -\I{dep\_orphs} & ::= & \I{Module}_0 \verb|;|\, \ldots \verb|;|\, \I{Module}_n & \mbox{Transitive orphan dependencies} \\[1em] -\multicolumn{3}{l}{\mbox{\bf Interface declarations}} \\ -\I{IfaceDecl} & ::= & \I{OccName} \; \verb|::| \; \I{IfaceId} \\ - & | & \verb|data| \; \I{OccName} \; \verb|=| \;\ \I{IfaceData} \\ - & | & \ldots \\ -\I{IfaceClsInst} & & \mbox{A type-class instance} \\ -\I{IfaceId} & & \mbox{Interface of top-level binder} \\ -\I{IfaceData} & & \mbox{Interface of type constructor} \\ -\end{array} -$$ -\caption{Module interfaces in GHC} \label{fig:typecheck} -\end{figure} +\subsubsection{A simple example} -In general terms, -type checking an indefinite unit (a unit with holes) involves -calculating, for every module, a \I{ModIface} representing the -type/interface of the module in question (which is serialized -to disk). The general form of these -interface files are described in Figure~\ref{fig:typecheck}; notably, -the interfaces \I{IfaceId}, \I{IfaceData}, etc. contain \I{Name} references, -which must be resolved by -looking up a \I{ModIface} corresponding to the \I{Module} associated -with the \I{Name}. (We will say more about this lookup process shortly.) -For example, given: +In the following set of units: \begin{verbatim} - unit p where - signature H where + unit p(M) requires (A) where + signature A(T) where data T - module A(S, T) where - import H + module M(T, S) where + import A(T) data S = S T -\end{verbatim} -% -the \I{PkgType} is: -\begin{verbatim} - module HOLE:H (HOLE:H.T) where - data T -- abstract type constructor - module THIS:A (THIS:A.S, HOLE:H.T) where - data S = S HOLE:H.T - -- where THIS = p(H -> HOLE:H) + unit q where + module A where + data T = T + include p \end{verbatim} -% -However, while it is true that the \I{ModIface} is the final result -of type checking, we actually are conflating two distinct concepts: the user-visible -notion of a \I{ModuleName}, which, when imported, brings some \I{Name}s -into scope (or could trigger a deprecation warning, or pull in some -orphan instances\ldots), versus the actual declarations, which, while recorded -in the \I{ModIface}, have an independent existence: even if a declaration -is not visible for an import, we may internally refer to its \I{Name}, and -need to look it up to find out type information. (A simple case when -this can occur is if a module exports a function with type \verb|T -> T|, -but doesn't export \verb|T|). -\begin{figure}[htpb] -$$ -\begin{array}{rcll} -\I{ModDetails} & ::= & \langle\I{md\_types} \verb|;|\; \I{md\_insts}\rangle \\ -\I{md\_types} & ::= & \I{TyThing}_0 \verb|,|\, \ldots\verb|,|\, \I{TyThing}_n \\ -\I{md\_insts} & ::= & \I{ClsInst}_0 \verb|,|\, \ldots\verb|,|\, \I{ClsInst}_n \\[1em] -\multicolumn{3}{l}{\mbox{\bf Type-checked declarations}} \\ -\I{TyThing} & & \mbox{Type-checked thing with a \I{Name}} \\ -\I{ClsInst} & & \mbox{Type-checked type class instance} \\ -\end{array} -$$ -\caption{Semantic objects in GHC} \label{fig:typecheck-more} -\end{figure} +When we \verb|include p|, we need to merge the partial shape +of \verb|q| (with just provides \verb|A|) with the shape +of \verb|p|. Here is each step of the merging process: -Thus, a \I{ModIface} can be type-checked into a \I{ModDetails}, described in -Figure~\ref{fig:typecheck-more}. Notice that a \I{ModDetails} is just -a bag of type-checkable entities which GHC knows about. We -define the \emph{external package state (EPT)} to -simply be the union of the \I{ModDetails} -of all external modules. +\begin{verbatim} + shape 1 shape 2 + -------------------------------------------------------------------------------- +(initial shapes) +provides: A -> THIS:A { q():A.T } M -> p(A -> HOLE:A) { HOLE:A.T, p(A -> HOLE:A).S } +requires: (nothing) A -> { HOLE:A.T } -Type checking is a delicate balancing act between module -interfaces and our semantic objects. A \I{ModIface} may get -type-checked multiple times with different hole instantiations -to provide multiple \I{ModDetails}. -Furthermore complicating matters -is that GHC does this resolution \emph{lazily}: a \I{ModIface} -is only converted to a \I{ModDetails} when we are looking up -the type of a \I{Name} that is described by the interface; -thus, unlike usual theoretical treatments of type checking, we can't -eagerly go ahead and perform substitutions on \I{ModIface}s when -they get included. +(after filling requirements) +provides: A -> THIS:A { q():A.T } M -> p(A -> THIS:A) { q():A.T, p(A -> THIS:A).S } +requires: (nothing) (nothing) -In a separate compiler like GHC, there are two primary functions we must provide: +(after adding provides) +provides: A -> THIS:A { q():A.T } + M -> p(A -> THIS:A) { q():A.T, p(A -> THIS:A).S } +requires: (nothing) +\end{verbatim} -\paragraph{\textit{ModuleName} to \textit{ModIface}} Given a \I{ModuleName} which -was explicitly imported by a user, we must produce a \I{ModIface} -that, among other things, specifies what \I{Name}s are brought -into scope. This is used by the renamer to resolve plain references -to identifiers to real \I{Name}s. (By the way, if shaping produced -renamed trees, it would not be necessary to do this step!) +Notice that we substituted \verb|HOLE:A| with \verb|THIS:A|, but \verb|HOLE:A.T| with \verb|q():A.T|. -\paragraph{\textit{Module} to \textit{ModDetails}/EPT} Given a \I{Module} which may be -a part of a \I{Name}, we must be able to type check it into -a \I{ModDetails} (usually by reading and typechecking the \I{ModIface} -associated with the \I{Module}, but this process is involved). This -is used by the type checker to find out type information on things. \\ +\subsubsection{Requirements merging can affect provisions} -There are two points in the type checker where these capabilities are exercised: +When a merge results in a substitution, we substitute over both +requirements and provisions: -\paragraph{Source-level imports} When a user explicitly imports a -module, the \textit{ModuleName} is mapped to a \textit{ModIface} -to find out what exports are brought into scope (\I{mi\_exports}) -and what orphan instances must be loaded (\I{dep\_orphs}). Additionally, -the \textit{Module} is loaded to the EPT to bring instances from -the module into scope. +\begin{verbatim} + signature H(T) where + data T + module A(T) where + import H(T) + module B(T) where + data T = T -\paragraph{Internal name lookup} During type checking, we may have -a \I{Name} for which we need type information (\I{TyThing}). If it's not already in the -EPT, we type check and load -into the EPT the \I{ModDetails} of the \I{Module} in the \I{Name}, -and then check the EPT again. (\verb|importDecl|) + -- provides: A -> THIS:A { HOLE:H.T } + -- B -> THIS:B { THIS:B.T } + -- requires: H -> { HOLE:H.T } -\subsection{\textit{ModName} to \textit{ModIface}} + signature H(T, f) where + import B(T) + f :: a -> a -In all cases, the \I{mi\_exports} can be calculated directly from the -shaping process, which specifies exactly for each \I{ModName} in scope -what will be brought into scope. + -- provides: A -> THIS:A { THIS:B.T } -- UPDATED + -- B -> THIS:B { THIS:B.T } + -- requires: H -> { THIS:B.T, HOLE:H.f } -- UPDATED +\end{verbatim} -\paragraph{Modules} Modules are straightforward, as for any -\I{Module} there is only one possibly \I{ModIface} associated -with it (the \I{ModIface} for when we type-checked the (unique) \verb|module| -declaration.) +\subsubsection{Sharing constraints} -\paragraph{Signatures} For signatures, there may be multiple \I{ModIface}s -associated with a \I{ModName} in scope, e.g. in this situation: +Suppose you have two signature which both independently define a type, +and you would like to assert that these two types are the same. In the +ML world, such a constraint is known as a sharing constraint. Sharing +constraints can be encoded in Backpacks via clever use of reexports; +they are also an instructive example for signature merging. \begin{verbatim} - unit p where - signature S where - data A - unit q where - include p - signature S where - data B - module M where - import S -\end{verbatim} -% -Each literal \verb|signature| has a \I{ModIface} associated with it; and -the import of \verb|S| in \verb|M|, we want to see the \emph{merged} -\I{ModIface}s. We can determine the \I{mi\_exports} from the shape, -but we also need to pull in orphan instances for each signature, and -produce a warning for each deprecated signature. + signature A(T) where + data T + signature B(T) where + data T -\begin{aside} -\textbf{Does hiding a signature hide its orphans.} Suppose that we have -extended Backpack to allow hiding signatures from import. + -- requires: A -> { HOLE:A.T } + B -> { HOLE:B.T } -\begin{verbatim} - unit p requires (H) where -- H is hidden from import - module A where - instance Eq (a -> b) where -- orphan - signature H {-# DEPRECATED "Don't use me" #-} where - import A + -- the sharing constraint! + signature A(T) where + import B(T) + -- (shape to merge) + -- requires: A -> { HOLE:B.T } - unit q where - include p - signature H where - data T - module M where - import H -- warn deprecated? - instance Eq (a -> b) -- overlap? + -- (after merge) + -- requires: A -> { HOLE:A.T } + -- B -> { HOLE:A.T } \end{verbatim} +% +\Red{I'm pretty sure any choice of \textit{Name} is OK, since the +subsequent substitution will make it alpha-equivalent.} -It is probably the most consistent to not pull in orphan instances -and not give the deprecated warning: this corresponds to merging -visible \I{ModIface}s, and ignoring invisible ones. -\end{aside} +\subsection{Export declarations} -\subsection{\textit{Module} to \textit{ModDetails}} +If an explicit export declaration is given, the final shape is the +computed shape, minus any provisions not mentioned in the list, with the +appropriate renaming applied to provisions and requirements. (Requirements +are implicitly passed through if they are not named.) +If no explicit export declaration is given, the final shape is +the computed shape, including only provisions which were defined +in the declarations of the unit. -\paragraph{Modules} For modules, we have a \I{Module} of -the form $\I{p}\verb|(|m\; \verb|->|\; \I{Module}\verb|,|\, \ldots\verb|)|$, -and we also have a unique \I{ModIface}, where each hole instantiation -is $\verb|HOLE:|m$. -To generate the \I{ModDetails} associated with the specific instantiation, -we have to type-check the \I{ModIface} with the following adjustments: +\section{Cabal} -\begin{enumerate} - \item Perform a \I{Module} substitution according to the instantiation - of the \I{ModIface}'s \I{Module}. (NB: we \emph{do} - substitute \verb|HOLE:A.x| to \verb|HOLE:B.x| if we instantiated - \verb|A -> HOLE:B|, \emph{unlike} the disjoint - substitutions applied by shaping.) - \item Perform a \I{Name} substitution as follows: for any name - with a unit key that is a $\verb|HOLE|$, - substitute with the recorded \I{Name} in the requirements of the shape. - Otherwise, look up the (unique) \I{ModIface} for the \I{Module}, - and subsitute with the corresponding \I{Name} in the \I{mi\_exports}. -\end{enumerate} -\paragraph{Signatures} For signatures, we have a \I{Module} of the form -$\verb|HOLE:|m$. Unlike modules, there are multiple \I{ModIface}s associated with a hole. -We distinguish each separate \I{ModIface} by considering the full \I{UnitKey} -it was defined in, e.g. \verb|p(A -> HOLE:C, B -> q():B)|; call this -the hole's \emph{defining unit key}; the set of \I{ModIface}s for a hole -and their defining unit keys can easily be calculated during shaping. +% \I{InstalledUnitId} & ::= & \I{IndefUnitId} \verb|(| \, m \; \verb|->| \; \I{Module} \verb|,|\, \ldots\, \verb|)| & \mbox{Also known as \I{UnitKey}} \\ +% \I{Module} & ::= & \I{InstalledUnitId} \verb|:| m \\ -To generate the \I{ModDetails} associated with a hole, we type-check each -\I{ModIface}, with the following adjustments: +\paragraph{Indefinite versus installed units} +The purpose of an \I{IndefUnitId} is to uniquely identify the results of \textbf{typechecking} +an indefinite unit; whereas an \I{InstalledUnitId} uniquely identifies +the results of \textbf{compiling} a unit with all its holes +filled. Thus, an \I{InstalledUnitId} also records a \emph{hole mapping} +which specifies how each hole was filled. If an \I{InstalledUnitId} +is only partially filled, we may refer to it as a \I{UnitKey} (as these +are never installed.) -\begin{enumerate} - \item Perform a \I{Module} substitution according to the instantiation - of the defining unit key. (NB: This may rename the hole itself!) - \item Perform a \I{Name} substitution as follows, in the same manner - as would be done in the case of modules. - \item When these \I{ModDetails} are merged into the EPT, some merging - of duplicate types may occur; a type - may be defined multiple times, in which case we check that each - definition is compatible with the previous ones. A concrete - type is always compatible with an abstract type. -\end{enumerate} - -\paragraph{Invariants} When we perform \I{Name} substitutions, we must be -sure that we can always find out the correct \I{Name} to substitute to. -This isn't obviously true, consider: +\paragraph{Units versus packages} -\begin{verbatim} - unit p where - signature S(foo) where - data T - foo :: T - module M(bar) where - import S - bar = foo - unit q where - module A(T(..)) where - data T = T - foo = T - module S(foo) where - import A - include p - module A where - import M - ... bar ... -\end{verbatim} -% -When we type check \verb|p|, we get the \I{ModIface}s: +Cabal packages are: -\begin{verbatim} - module HOLE:S(HOLE:S.foo) where - data T - foo :: HOLE:S.T - module THIS:M(THIS:M.bar) where - bar :: HOLE:S.T -\end{verbatim} -% -Now, when we type check \verb|A|, we pull on the \I{Name} \verb|p(S -> q():S):M.bar|, -which means we have to type check the \I{ModIface} for \verb|p(S -> q():S):M|. -The un-substituted type of \verb|bar| has a reference to \verb|HOLE:S.T|; -this should be substituted to \verb|q():S.T|. But how do we discover this? -We know that \verb|HOLE:S| was instantiated to \verb|q():S|, so we might try -and look for \verb|q():S.T|. However, this \I{Name} does not exist because -the \verb|module S| reexports the selector from \verb|A|! Nor can we consult -the (unique) \I{ModIface} for the module, as it doesn't reexport the relevant -type. +\begin{itemize} + \item The unit of distribution + \item The unit that Hackage handles + \item The unit of versioning + \item The unit of ownership (who maintains it etc) +\end{itemize} -The conclusion, then, is that a module written this way should be disallowed. -Specifically, the correctness condition for a signature is this: \emph{Any \I{Name} -mentioned in the \I{ModIface} of a signature must either be from an external module, or be -exported by the signature}. +Backpack units are the building blocks of modular development; +there may be multiple units per a Cabal package. While in theory +Cabal could do sophisticated things with multiple units in a +package, we expect Cabal +to pick a distinguished unit (with the same unit name $p$ as +the package) which serves as the publically visible unit. -\begin{aside} -\textbf{Special case export rule for record selectors.} Here is the analogous case for -record selectors: -\begin{verbatim} - unit p where - signature S(foo) where - data T = T { foo :: Int } - module M(bar) where - import S - bar = foo - unit q where - module A(T(..)) where - data T = T { foo :: Int } - module S(foo) where - import A - include p - module A where - import M - ... bar ... -\end{verbatim} -We could reject this, but technically we can find the right substitution -for \verb|T|, because the export of \verb|foo| is an \I{AvailTC} which -does mention \verb|T|. -\end{aside} +%\newpage \end{document} % chktex 16 diff --git a/docs/backpack/backpack-manual.tex b/docs/backpack/backpack-manual.tex index 89826789a5..658df1badc 100644 --- a/docs/backpack/backpack-manual.tex +++ b/docs/backpack/backpack-manual.tex @@ -13,72 +13,134 @@ \maketitle Backpack is a new module system for Haskell, intended to enable -``separate modular development'': a style of development where -application-writers can develop against abstract interfaces or -\emph{signatures}, while separately library-writers write software which -implement these interfaces. Backpack was originally described in a -POPL'14 paper \url{http://plv.mpi-sws.org/backpack/}, but the point of -this document is to describe the syntax of a language you might actually -be able to \emph{write}, as well as describe some of the changes to the -design we've made since this paper. - -\paragraph{Examples} - -Before we dive in, here are some examples of Backpack files to whet -your appetite: +``separate modular development'', where application-writers depend on +libraries by programming against abstract interfaces instead of specific +implementations. Our goal is to reduce software coupling, and let the +type system and compiler assist developers when they are developing +large software systems. Backpack was originally described in a POPL'14 +paper \url{http://plv.mpi-sws.org/backpack/}, but this document is +intended to describe the syntax of a language you might actually be able +to \emph{write}, i.e., from the perspective of a software developer. + +\paragraph{Examples of Backpack ``in the large''} +In the standard practice of large-scale software development today, +users organize code into libraries. Here is a very simple example of +some Haskell code structured in this way: \begin{verbatim} -package p(A) where - module A(a) where - a = True +unit p where + module A where + x = True + y = False -package q(B, C) where +unit q where include p - module B(b) where + module B where import A - b = a - module C(c) where - import B - c = b + b = x \end{verbatim} -In this example package \verb|p| exports a single module \verb|A|; package \verb|q| exports -two modules, and includes package \verb|p| so that its modules are in -scope. The form of a \verb|package| and a \verb|module| are quite similar: -a package can express an explicit export list of modules in much the same -way a module exports identifiers. \\ +\verb|p| is a reusable unit (or package/library if you like) which +provides a single module \verb|A|. \verb|q| depends on \verb|p| by +directly including it, bringing all of the modules of \verb|p| into scope for +import. -Here is a more complicated Backpack file taking advantage of the availability -of signatures in Backpack. This example omits the actual module bodies: GHC -will automatically look for them in appropriate files (e.g. \verb|p/Map.hsig|, -\verb|p/P/Types.hs|, \verb|p/P.hs|, \verb|q/QMap.hs| and \verb|q/Q.hs|). +There is a downside to writing code this way: without editing your +source code, you can't actually swap out \verb|p| with another version +(maybe \verb|p-2.0|, or perhaps even a version of \verb|p| written by someone else). +Traditionally, this is overcome by relying on an extralinguistic mechanism, +the \emph{package manager}, which indirects \verb|include p| so that +it can refer to some other piece of code. + +Backpack offers a different, in-language way of expressing dependency +without committing to an implementation: \begin{verbatim} -package p (P) requires (Map) where - include base - signature Map - module P.Types - module P +unit q where + require p + module B where + import A + b = x +\end{verbatim} -package q (Q) where - include base - module QMap - include p (P) requires (Map as QMap) - module Q +Or, equivalently: + +\begin{verbatim} +unit p-interface where + signature A where + x :: Bool + y :: Bool +unit q where + include p-interface + module B where + import A + b = x \end{verbatim} -Package \verb|p| provides a module \verb|P|, but it also \emph{requires} -a module \verb|Map| (which must fulfill the specification described -in \verb|signature Map|). This makes it an \emph{indefinite package}: -a package which isn't fully implemented, and cannot be compiled into -executable code until its ``hole'' (\verb|Map|) is filled. -It also sports an internal module named \verb|P.Types| which -is missing from the export list, and thus not exported. +With the \verb|require| keyword, Backpack automatically computes +\verb|p-interface|, which expresses the abstract interface of \verb|p|. +\verb|q| utilizes a subset of this interface (unused requirements +don't ``count''), resulting in this final version of \verb|q|: + +\begin{verbatim} +unit q where + signature A where + x :: Bool + module B where + import A + b = x +\end{verbatim} + +The technical innovation of Backpack is that the indefinite version of \verb|q|, which +does not depend on a specific implementation of \verb|p|, still composes +naturally and in the \emph{same} way that the definite version of +\verb|q|. That is to say, you can take a hierarchy of libraries that \verb|include| one +another (today's situation) and replace each \verb|include| with a +\verb|require|. The result is a reusable set of components with +explicitly defined requirements. If the inferred requirements are not +general enough, a signature can be written explicitly. + +Finally, the package manager serves a new role: it can be used to select +a combination of components which fulfill all the requirements. Unlike +dependency resolution with version numbers, with interfaces the package +manager can do this \emph{completely} precisely. + +\paragraph{Examples of Backpack ``in the small''} +Although Backpack was originally designed for library-scale development, +it can also be used for small-scale modular development, +similar to ML modules. Here is a simple example of writing an +associated list implementation that is parametrized over the key: + +\begin{verbatim} + unit assoc-map where + signature H where + data T + eq :: T -> T -> Bool + module Assoc where + import H + mylookup :: T -> [(T, a)] -> Maybe a + mylookup x xs = fmap snd (find (eq x) xs) + unit main where + module AbsInt where + type T = Int + eq x y = abs x == abs y + include assoc-map (Assoc as AbsIntAssoc) requires (T as AbsInt) + module Main where + import AbsIntAssoc + import Prelude + main = print (mylookup -1 [(1,"yes"),(-2,"no")]) +\end{verbatim} + +For example, in Haskell there are many different kinds of ``string''-like +data types: \verb|String| (a linked list of characters), \verb|ByteString| +(an unstructured raw bytestring) and \verb|Text| (a UTF-8 encoded bytestring). +Many libraries, e.g., parsers, need to work on any string-like input which +a user might want to give them. + +A more full example of a Backpack version of some real library +code can be found in Appendix~\ref{sec:tagstream-example}. + -Package \verb|q|, on the other hand, is a \emph{definite package}; as it -has no requirements, it can be compiled. When it includes \verb|p| -into scope, it also fills the \verb|Map| requirement with an -implementation \verb|QMap|. \section{The Backpack file} @@ -162,7 +224,7 @@ modules and signatures.} A signature, denoted with \verb|signature| in a Backpack file and a file with the \verb|hsig| extension on the file system, represents a (type) signature for a Haskell module. It can contain type signatures, data declarations, type -classes, type class instances and reexports(!), but it cannot contain any +classes, type class instances and reexports, but it cannot contain any value definitions.\footnote{Signatures are the backbone of the Backpack module system. A signature can be used to type-check client code which uses a module (without the module implementation), or to verify that an @@ -417,7 +479,7 @@ and, symmetrically, ascription in an include hides identifiers: ... private ... -- ERROR \end{verbatim} -\Red{OBSERVATION: thinning is subsumed by transparent signature ascription, but NOT renaming. Thus RENAMING does not commute across signature ascription; you must do it either before or after. Syntax for this is tricky.} +\Red{Observation: thinning is subsumed by transparent signature ascription, but NOT renaming. Thus RENAMING does not commute across signature ascription; you must do it either before or after. Syntax for this is tricky.} \paragraph{Syntactic sugar for anonymous packages} @@ -463,4 +525,99 @@ pkgexp ::= pkgname | path \end{verbatim} +\newpage +\label{sec:tagstream-example} +\section{\texttt{tagstream-conduit} example} + +When someone recently asked on Haskell Reddit what the ``precise problem Backpack +addresses'' was, Chris Doner offered a nice example from the +\verb|tagstream-conduit|. The existing implementation, at \url{http://hackage.haskell.org/package/tagstream-conduit-0.5.5.1/docs/Text-HTML-TagStream-Entities.html}, uses a data type +to define a ``module'' which is then used to implement two modules in the +library, a variant for \verb|ByteString| and a variant for \verb|Text|. + +Here is how this would be done in Backpack: + +\Red{This still contains some syntax which I haven't fully explained.} + +\begin{verbatim} + -- | This is an ordinary module which defines some types + -- which are exported by the package. + module Text.HTML.TagStream.Types where + data Token' s = Text s + | ... + + -- | This provides a 'Decoder' implementation for 'ByteString's. + -- We define 'Str' to be 'ByteString' and implement a few + -- functions manually, as well as reexport existing functions + -- from some libraries. We don't plan to publically export + -- these from the package. + module Decoder.ByteString ( + module Decoder.ByteString, + Builder, break, drop, uncons + ) where + import Data.ByteString + import Data.ByteString.Builder + type Str = ByteString + toStr = toByteString + fromStr = fromByteString + decodeEntity = ... + + -- | This provides a 'Decoder' implementation for 'Text', much + -- the same as 'Decoder.ByteString'. + module Decoder.Text ( + module Decoder.Text, + Builder, break, drop, uncons + ) where + import Data.Text + import Data.Text.Lazy.Builder + type Str = Text + toStr = toText + fromStr = fromText + decodeEntity = ... + + -- | This defines the "functor"; the module implementing entity + -- decoding, 'Entities', is parametrized by an abstract interface + -- 'Decoder' which describes two types, 'Str' and 'Builder', as + -- well as operations on them which are avaiable for the implementation. + unit decoder where + signature Decoder where + data Str + data Builder + toStr :: Builder -> Str + break :: (Char -> Bool) -> Str -> (Str, Str) + fromStr :: Str -> Builder + drop :: Int -> Str -> Str + decodeEntity :: Str -> Maybe Str + uncons :: Str -> Maybe (Char, Str) + module Entities where + import Text.HTML.TagStream.Types + import Data.Conduit + import Decoder + decodeEntities :: Monad m => Conduit (Token' Str) m (Token' Str) + decodeEntities = ... + + -- | Finally, these two lines instantiate 'Entities' twice; + -- once with 'Decoder.ByteString', and once as 'Decoder.Text'. + include decoder (Entities as Text.HTML.TagStream.ByteString) + requires (Decoder as Decoder.ByteString) + include decoder (Entities as Text.HTML.TagStream.Text) + requires (Decoder as Decoder.Text) +\end{verbatim} + +Without having the source-code inline, the out-of-line Backpack file +would look something like this: + +\begin{verbatim} + module Text.HTML.TagStream.Types -- Text/HTML/TagStream/Types.hs + module Decoder.ByteString -- Decoder/ByteString.hs + module Decoder.Text -- Decoder/Text.hs + unit decoder where + signature Decoder -- decoder/Decoder.hsig + module Entities -- decoder/Entities.hs + include decoder (Entities as Text.HTML.TagStream.ByteString) + requires (Decoder as Decoder.ByteString) + include decoder (Entities as Text.HTML.TagStream.Text) + requires (Decoder as Decoder.Text) +\end{verbatim} + \end{document} |