summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorEdward Z. Yang <ezyang@cs.stanford.edu>2015-08-06 22:31:17 -0700
committerEdward Z. Yang <ezyang@cs.stanford.edu>2015-08-06 22:31:24 -0700
commit6cab3afe90264b00e2a92f1aef09d6ca4e7a1680 (patch)
tree1e83aacbd4d1740433b0c79cd945b88d1edd1072
parenta1c934c1b97a09db841d20da4811e0e1310f7511 (diff)
downloadhaskell-6cab3afe90264b00e2a92f1aef09d6ca4e7a1680.tar.gz
Big batch of Backpack documentation edits.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
-rw-r--r--docs/backpack/algorithm.tex1395
-rw-r--r--docs/backpack/backpack-manual.tex265
2 files changed, 898 insertions, 762 deletions
diff --git a/docs/backpack/algorithm.tex b/docs/backpack/algorithm.tex
index 588f10e8cc..6cb316952f 100644
--- a/docs/backpack/algorithm.tex
+++ b/docs/backpack/algorithm.tex
@@ -15,6 +15,7 @@
\usepackage{algpseudocode}
\usepackage{bigfoot}
\usepackage{amssymb}
+\usepackage{amsmath}
\usepackage{framed}
% Alter some LaTeX defaults for better treatment of figures:
@@ -67,9 +68,8 @@ Here are the steps a unit goes through:
site (unit declaration), or an external unit declaration from
the indefinite unit database.
- \item The \textbf{dependency solver} takes a \I{unit}
- and converts it into a
- directed acyclic graph representing the
+ \item The \textbf{dependency solver} takes a unit
+ and converts it into a directed acyclic graph representing the
dependency structure of the declarations in a unit.
It also computes the \I{Module} of each module/signature
declaration, the \I{UnitKey} of each include, and the overall
@@ -79,16 +79,16 @@ Here are the steps a unit goes through:
and computes its \I{Shape}. A \I{Shape} describes precisely what
a unit provides and requires at the Haskell declaration level (\I{AvailInfo}).
- \item The \textbf{indefinite pipeline} takes the DAG and the shape
+ \item The \textbf{indefinite type checker} takes the DAG and the shape
and typechecks each module and signature against the indefinite unit
database. The type checking results are saved in the indefinite
- unit database under the \I{IndefiniteUnitId}.
+ unit database under the \I{IndefUnitId}.
- \item The \textbf{definite pipeline} takes the DAG as well as a
+ \item The \textbf{definite type checker and compiler} takes the DAG as well as a
\emph{hole mapping} specifying how each requirement in the unit
- is to be filled, and type-checks and compiles the unit against the
- installed unit database. The type checking results and object files
- are saved to the installed unit database under the \I{InstalledUnitId}.
+ is filled, and type-checks and compiles the unit against the
+ installed unit database. The results are saved to the installed unit database
+ under the \I{InstalledUnitId}.
\end{enumerate}
\section{Front-end syntax}
@@ -120,61 +120,112 @@ $$
A (slightly simplified) syntax of Backpack is given in Figure~\ref{fig:syntax}.
\newpage
-\section{Unit renamer}
+\section{Unit databases and the unit renamer}
\begin{figure}[htpb]
$$
\begin{array}{rcll}
- \I{InstalledPackageId} & & \mbox{Installed package IDs} \\
- \I{IndefiniteUnitId} & ::= & \I{InstalledPackageId}\, \verb|-|\, p \\
- \I{InstalledUnitId} & ::= & \I{IndefiniteUnitId} \verb|(| \, m \; \verb|->| \; \I{Module} \verb|,|\, \ldots\, \verb|)| & \mbox{Also known as \I{UnitKey}} \\
- \I{Module} & ::= & \I{InstalledUnitId} \verb|:| m \\
+\multicolumn{3}{l}{\mbox{\bf Identifiers}} \\
+ \I{InstalledPackageId} & & \mbox{Opaque unique identifier} \\
+ \I{IndefUnitId} & ::= & \I{InstalledPackageId}\, \verb|-|\, p \\
+ \I{UnitKey} & ::= & \I{IndefUnitId} \verb|(|\, \I{HoleMap}\, \verb|)| \\
+ & | & \verb|HOLE| \\
+ \I{HoleMap} & ::= & \I{ModuleName}\; \verb|->|\; \I{Module}\; \verb|,|\, \ldots \\
+ \I{Module} & ::= & \I{UnitKey} \verb|:| \I{ModuleName} \\
+ \I{InstalledUnitId} & ::= & \I{UnitKey} \quad \mbox{(with no occurrence of \texttt{HOLE})} \\[1em]
+\multicolumn{3}{l}{\mbox{\bf Unit databases}} \\
+ \I{IndefUnitDb} & ::= & \I{IndefUnitId} \; \verb|->| \; \I{IndefUnitRecord} \verb|,|\, \ldots \\
+ \I{InstalledUnitDb} & ::= & \I{InstalledUnitId} \; \verb|->| \; \I{InstalledUnitRecord} \verb|,|\, \ldots \\
\end{array}
$$
\caption{Unit identification} \label{fig:ids}
\end{figure}
-The unit renamer is responsible for transforming unit names $p$ into \I{IndefiniteUnitId}s, given some current \I{InstalledPackageId} (\I{ThisInstalledPackageId}) and a mapping from $p$ to
-\I{IndefiniteUnitId} (\I{UnitNameMap}). Its operation on a Backpack file (collection of units) is very simple:
+\subsection{The unit databases}
-\begin{itemize}
- \item Every unit declaration $\verb|unit|\; p$ is renamed to $\I{ThisInstalledPackageId}\, \verb|-|\, p$
- \item Every unit include $\verb|include|\; p$ is renamed to $\I{ThisInstalledPackageId}\, \verb|-|\, p$ if $p$ was declared in the Backpack file; otherwise it is renamed according to \I{UnitNameMap}.
-\end{itemize}
+To install a package so that it is available when other packages are compiled,
+we must record it in some sort of database (which the compiler will query later).
+The obvious design for such a database is for it to record \emph{installed packages},
+each of which has a collection of object files and interfaces from a compilation.
+However, with Backpack, we have to refine this picture in two ways:
+
+\begin{enumerate}
+ \item A package is always a unit of distribution: something that has
+ single authorship and is uploaded to Hackage. It would seriously hamper
+ modular programming in the small, however, if you always had to create
+ a new package to abstract over some requirements. Thus, while packages
+ continue to be a unit of distribution, the unit of modularity that the
+ compiler deals with is a unit; a package can contain multiple units.
+ Consequently, the database is a database of \emph{units}, not packages.
+ \item In Backpack, it may not be possible to \emph{compile} a unit
+ at install time: it may depend on some (as yet) unspecified holes.
+ Thus, we must maintain two databases: the \emph{indefinite unit
+ database}, recording type-checked but uncompiled units, and the
+ \emph{installed unit database}, recording compiled units. unit
+ database.
+\end{enumerate}
%
-The purpose of an \emph{IndefiniteUnitId} is to uniquely identify the results of typechecking
-an indefinite unit; similarly, an \emph{InstalledUnitId} uniquely identifies
-the results of compiling a unit with all its holes
-filled. It thus records a \emph{hole mapping} which specifies how each
-hole was filled.
-
-An \emph{installed package ID} (IPID) is an opaque string provided by
-Cabal which uniquely identifies an installed package. A recipe for
-computing an IPID would incorporate both the source info, such as the hash of the
-source code distribution tarball, as well as build info, such
-as the selected Cabal and GHC flags, what the provided mapping from
-$p$ to $IndefiniteUnitId$ was, etc.
-
-\paragraph{The difference between units and packages}
-Cabal packages are:
+Thus, there are four closely related types of identifier to be aware of:
+
+\begin{description} \item[Installed package IDs] Although the unit
+databases record compile products per unit, it is still useful to be
+able to assign an identifier to an installed package as a whole. Thus,
+the \I{InstalledPackageId} is a hash of the source code, as well as
+build info for the package, such as the Cabal and GHC flags, the
+resolution of textual dependencies, etc. The IPID is opaque to GHC
+and selected by Cabal, which operates at the level of packages (rather
+than units).
+
+\item[Indefinite unit IDs] A package contains multiple units; so if an
+installed package ID represents a package with all of its dependencies
+resolved, an indefinite unit ID simply singles out a specific unit in
+this installed package. Indefinite unit IDs identify entries in the
+\textbf{indefinite unit database}, which contain the results of
+type-checking a unit, but no actual object code. Instead, it has enough
+information so that the indefinite unit can be built into actual code
+when a requirement is filled (which means it includes source code and
+build flags as well.)
+
+\item[Unit keys] A unit key is an indefinite unit ID, but augmented
+with a \I{HoleMap}, which says how the requirements of the unit are
+instantiated. Unit keys are assigned when typecheckign both indefinite
+definite units: a fictitious unit \verb|HOLE| is specified to define
+all unfilled requirements in the unit. Units instantiated with holes
+are never installed; they are strictly for type-checking (although
+we could generate code for them, which could be linked if the
+\verb|HOLE| symbols are rewritten to their true destinations).
+
+\item[Installed unit IDs] An installed unit ID is a unit key which has
+no \verb|HOLE|s; it identifies a unit that can be compiled. The
+\textbf{installed unit database} caches the compilation results of these
+units, so that if a unit is compiled multiple times with the same
+instantiation, this code can be reused. (This database most closely
+resembles the existing installed package database with GHC today.)
+\end{description}
+
+\subsection{The unit renamer}
+
+The unit renamer is responsible for transforming unit names in a
+Backpack file into \I{IndefUnitId}s, so that they can be uniquely
+identified in the indefinite unit database.
+Given the \I{InstalledPackageId} of the package we are compiling (\I{ThisInstalledPackageId})
+and a mapping from $p$ to \I{IndefUnitId} (\I{UnitNameMap}), we rename
+as follows:
\begin{itemize}
- \item The unit of distribution
- \item The unit that Hackage handles
- \item The unit of versioning
- \item The unit of ownership (who maintains it etc)
+ \item Every unit declaration $\verb|unit|\; p$ is renamed to $\I{ThisInstalledPackageId}\, \verb|-|\, p$
+ \item Every unit include $\verb|include|\; p$ is renamed to $\I{ThisInstalledPackageId}\, \verb|-|\, p$ if $p$ was declared in the Backpack file; otherwise it is renamed according to \I{UnitNameMap}.
\end{itemize}
-Backpack units are the building blocks of modular development;
-there may be multiple units per a Cabal package. While in theory
-Cabal could do sophisticated things with multiple units in a
-package, we expect Cabal
-to pick a distinguished unit (with the same unit name $p$ as
-the package) which serves as the publically visible unit.
+The \I{UnitNameMap} is entirely user specified, so there is a great deal
+of flexibility on how it can be created, but the convention we expect to
+be used by Cabal is that a unit name $p$ corresponds to the same-named
+unit in a \emph{package} named $p$. Packages that don't use Backpack
+only have on unit, which has the same name as package.
-\paragraph{Notational convention}
+\paragraph{Notational conventions}
In the rest of this document, when it is unambiguous, we will use $p$, $q$ and $r$
-interchangeably with \I{IndefiniteUnitId}, as after unit renaming, there
+interchangeably with \I{IndefUnitId}, as after unit renaming, there
are no occurrences of unit names.
\newpage
@@ -185,114 +236,496 @@ $$
\begin{array}{rcll}
\tilde{d} & ::= & \verb|module|\; Module \; [exports]\; \verb|where|\; \I{body} \\
& | & \verb|signature|\; \I{Module} \; [exports]\; \verb|where|\; \I{body} \\
- & | & \verb|include|\; p\,\verb|(|\, m\; \verb|->|\; \I{Module}\,\verb|,|\;\ldots\,\verb|)| \\
- & | & \verb|merge|\; \I{Module}\; \verb|of|\; (\I{Module}, \I{IsSource?}) \ldots
+ & | & \verb|merge|\; \I{Module} \\
+ & | & \verb|include|\; \I{UnitKey} \\
+ \I{IndefUnitRecord}^{\mathsf{dep}} & ::= & \verb|provides:|\; m\; \verb|->|\; \I{Module}\verb|,|\, \ldots\\
+ & & \verb|requires:|\; m\verb|,|\, \ldots
\end{array}
$$
\caption{Resolved declarations} \label{fig:resolved}
\end{figure}
-The first phase of compilation is defines a directed acyclic graph on
-the source syntax representing the dependency structure of the
-modules/signatures/includes in the unit. This DAG has a node per:
+The dependency solver computes the unfilled requirements of a unit, a
+dependency DAG on the modules/signatures in the unit, and a dependency
+DAG on the includes in the unit. We assume every referenced $p$ in the
+unit must be recorded in the indefinite unit database, such that we can
+look up $\I{IndefUnitRecord}^{\mathsf{dep}}$.
+
+\paragraph{Computing unfilled requirements} The unfilled requirements are $R-P$, where $R$ and $P$ are sets of module names computed from the declarations in the following manner:
\begin{itemize}
- \item Each source-level module, signature and include, and
- \item Each unfilled requirement (called a ``signature merge'' node).
+ \item $\verb|include|\; p$: union the (domain of the) provisions with $P$ and the requirements with $R$.
+ \item $\verb|module|\; m$: add $m$ to $P$.
+ \item $\verb|signature|\; m$: add $m$ to $R$.
\end{itemize}
+
+\paragraph{Declaration dependency graph}
+We define a graph where the nodes are as described in Figure~\ref{fig:resolved}:
+there is a node per
+for each module and signature, as well as an extra ``merge'' node for
+every unfilled requirement, merges the interfaces of a local signature and
+any requirements brought in from includes.
%
-Each module, signature and signature merge node can be identified
-with the tuple $\left(\I{Module}, \I{IsSource?}\right)$, where \I{IsSource?}
-is true for signatures and false for modules and signature merges.
-The four nodes are described in Figure~\ref{fig:resolved}.
+Each node is identified by the tuple $\left(\I{Module}, \I{IsSource?}\right)$, where
+the \I{Module} of a declaration $m$ in unit $p$ is \verb|p(H):m|, where $H$ maps
+each unfilled requirement $n$ to \verb|HOLE:n|, and \I{IsSource?} is true only for signatures.
The edges of the directed graph signify a ``depends on'' relation, and are
defined as follows:
\begin{itemize}
- \item A module/signature $m$ depends on $\verb|include|\; p$ if $m$ imports a module provided by $p$.
\item A module/signature $m$ depends on a module/signature merge $n$ if $m$ imports $n$.
\item A module/signature $m$ depends on a signature $n$ if $m$ \verb|{-# SOURCE #-}| imports $n$.
\item A module/signature merge $m$ depends on a local signature $m$ (if it exists).
- \item A module/signature merge $m$ depends on a $\verb|include|\; p$, if the include requires $m$.
-\end{itemize}
-%
-For compilation, these extra edges can also be defined if they
-do not introduce a cycle:
-
-\begin{itemize}
- \item An $\verb|include|\; p$ depends on $\verb|include|\; q$ if, for some module name $m$, $p$ requires $m$ and $q$ provides $m$.
- \item An $\verb|include|\; p$ depends on a module $m$ if $p$ requires a module named $m$.
\end{itemize}
%
If the resulting graph has a cycle, this is an error.
-\paragraph{Computing unfilled requirements} To compute unfilled requirements,
-maintain two sets of module names: the provisions $P$ and the possible requirements $R'$. For each declaration:
+\paragraph{Include dependency graph} We define an dependency graph
+between includes, where an $\verb|include|\; p$ depends on
+$\verb|include|\; q$ if, for some module name $m$, $p$ requires $m$ and
+$q$ provides $m$. If there is a cyclic, then there is cross-unit
+mutual recursion: for now, this is an error.
-\begin{itemize}
- \item $\verb|include|\; p$: union provisions with $P$ and requirements with $R'$.
- \item $\verb|module|\; m$: add $m$ to $P$
- \item $\verb|signature|\; m$: add $m$ to $R'$
-\end{itemize}
-%
-The unfilled requirements $R=R'-P$.
-
-\paragraph{Computing the \I{Module} of declarations}
-The \I{Module} of any declaration $m$ in a unit $p$ is simply
-\verb|p(A -> HOLE:A, ...):m|, where the hole map is a map
-from each unfilled requirement $n$ to \verb|HOLE:n|.
+Assuming an acyclic graph, we can compute the \I{UnitKey} of each
+key as follows. Initialize $\Gamma$, a substitution from holes to \I{Module},
+to the identity substitution. For each $\verb|include|\; p$ in topological
+order, define its \I{UnitKey} to be $p$ with the mapping $\Gamma$ with its
+domain restricted to the requirements of $p$. Then, for each provision
+$m\; \verb|->|\; \I{Module}$, update $\Gamma$ so that
+$\Gamma(m) = \operatorname{subst} (\Gamma, \I{Module})$
+(where $\operatorname{subst}$ recursively applies the substitution $\Gamma$ in \I{Module}).
-\paragraph{Computing the hole mapping of includes} In absence of mutual
-recursion of includes, the DAG is acyclic with include-include edges. Process includes
-in this topological order, maintaining a mapping of provided modules $\Gamma$, accumulating
-provisions of includes as we go along. For each $\verb|include|; p$, the hole map is
-simply the requirements of $p$, mapping $m$ to $\Gamma(m)$ if it is defined, and \verb|HOLE:m| otherwise.
-
-With mutual recursion, we have to use the regular tree unification algorithm described in the Backpack paper. We omit it from here for now.
+During compilation, the include dependency graph is useful for
+determining a compilation order for included units.
\newpage
-\section{Shaping}
+\section{Shaping pass}
\begin{figure}[htpb]
$$
\begin{array}{rcll}
-\I{Shape} & ::= & \verb|provides:|\; m \; \verb|->|\; \I{Module}\; \verb|{|\, \I{AvailInfo} \verb|,|\, \ldots \, \verb|};| \ldots \\
- & & \verb|requires:| \; m \; \verb|->|\; \textcolor{white}{\I{Module}}\; \verb|{| \, \I{AvailInfo} \verb|,| \, \ldots \, \verb|}| \verb|;| \ldots \\
+\I{Shape} & ::= & \verb|provides:|\; m \; \verb|->|\; \I{Module}\; \I{ModShape} \verb|;|\; \ldots \\
+ & & \verb|requires:| \; m \; \verb|->|\; \textcolor{white}{\I{Module}}\; \I{ModShape} \verb|;|\; \ldots \\
+\I{ModShape} & ::= & \I{AvailInfo}_0 \verb|,|\, \ldots \verb|,|\, \I{AvailInfo}_n \\
\I{AvailInfo} & ::= & \I{Name} & \mbox{Plain identifiers} \\
& | & \I{Name} \, \verb|{| \, \I{Name}_0\verb|,| \, \ldots\verb|,| \, \I{Name}_n \, \verb|}| & \mbox{Type constructors} \\
\I{Name} & ::= & \I{Module} \verb|.| \I{OccName} \\
-\I{OccName} & & \mbox{Unqualified name in a namespace}
+\I{OccName} & & \mbox{Unqualified name in a namespace} \\
+\I{IndefUnitRecord}^{\mathsf{shape}} & ::= & \I{Shape}
\end{array}
$$
-\caption{Shaping} \label{fig:semantic}
+\caption{Shaping} \label{fig:shaping}
\end{figure}
-Shaping computes a \I{Shape}, whose form is described in Figure~\ref{fig:semantic}.
-A shape describes what modules a unit implements and exports (the \emph{provides})
-and what signatures a unit needs to have filled in (the \emph{requires}). Both
-provisions and requires are available for import by units which include this
-unit.
+The shaping pass computes the export \I{AvailInfo}s for each node in
+the dependency graph; collectively, these form the \I{Shape} of
+the unit described in Figure~\ref{fig:shaping}. Equivalently,
+the \I{Shape} of unit specifies what a unit requires and provides
+at the Haskell declaration level.
+
+An \I{AvailInfo} names a Haskell declaration that may be exported.
+It may be a plain identifier \I{Name}, or it may be a type constructor,
+in which case it has children \I{Name}s representing the names of the
+data constructors, record selectors, etc. This level of hierarchy
+makes it possible to use ellipses in an import list, e.g. \verb|TyCon(..)|,
+to selectively import just the logical children of a type constructor.
+Children names have the invariant that they have the same \I{Module} as the parent name.
+In a \I{ModShape}/export list,
+the \I{OccName}s of the plain identifier \I{AvailInfo}s and the \emph{children}
+of type constructor are unique
+(although the top-level \I{Name}s may not have unique \I{OccName}s).
+
+The compilation of every node is associated with a ``shape context'',
+which represents the modules which are transitively depended upon. Let the
+environment shape context is the merge of the shapes of all includes; to
+shape a node:
+
+\begin{enumerate}
+ \item Merge the environment shape contexts with the shape contexts
+ of all direct dependencies, resulting in the initial shape context.
+ \item Rename the module/signature according to the initial shape context,
+ getting a \I{ModShape}. Importantly, when renaming the signature \verb|M|,
+ any declarations defined in the signature are assigned a \I{Name}s with the \I{Module} \verb|HOLE:M| (rather than a \I{Module} based on the current unit $p$).
+ \item Merge this \I{ModShape} into the initial shape context
+ (modules go in provisions while signatures go in provisions), the
+ result defining the shape context of this node.
+\end{enumerate}
+
+We now elaborate on these steps in more detail.
-We incrementally build a shape by starting with an empty
-shape context and adding to it as follows:
+\subsection{Shapes of includes}
+Given an \verb|include p (X) requires (Y)|, we can look up the shape
+for $p$ from the indefinite package database. However, an include can
+also rename provisions and requires (where $X$, $Y$ are partial maps
+from module name to module name), which requires transforms the shape
+in the following way:
+
+\begin{itemize}
+ \item For each original provision $m\; \verb|->|\; \ldots$, provide
+ $X (m)\; \verb|->|\; \ldots$ if $X (m)$ is defined.
+ \item For each original requirement $m\; \verb|->|\; \ldots$, require
+ $Y (m)\; \verb|->|\; \ldots$ if $Y (m)$ is defined, and $m$ if it is not.
+ (Non-mentioned requirements are always passed through).
+ \item For each requirement renaming from \verb|M| to \verb|N| in $Y$, substitute
+ all occurrences of \verb|HOLE:M| to \verb|HOLE:N| in the \I{ModShape}
+ of all provisions and requirements.
+\end{itemize}
+
+\subsection{Shape merging}
+Before specifying the how to merge shapes algorithm, we must define some subprocedures
+for unifying and merging lower-level entities such as \I{AvailInfo}s and \I{Name}s,
+which produce \I{Name} substitution that are applied to shapes.
+
+\begin{description}
+ \item[Unify two \textit{Name}s]
+ (produces a \I{Name} and a \I{Name} substitution) \\
+ Error if the names do not have matching \I{OccName}s. Error if neither name
+ is a hole name. Otherwise, without loss of generality let $m$ be the hole name
+ and $n$ the other name, return $n$ and the substitution of $m$ to $n$.
+ \item[Merge two sets of \textit{Name}s]
+ (produces a set of \I{Name}s and a \I{Name} substitution) \\
+ Let two \I{Name}s be related if they have the same \I{OccName}.
+ Union the two sets, unifying related names.
+ \item[Unify two \textit{AvailInfo}s]
+ (produces an \I{AvailInfo} and a \I{Name} substitution) \\
+ If both \I{AvailInfo}s are simply
+ a \I{Name}, unify the two \I{Name}s. If both \I{AvailInfo}s are
+ $\I{Name}\, \verb|{|\, \I{Name}_0\verb|,|\, \ldots\verb|,|\, \I{Name}_n\, \verb|}|$,
+ unify the top-level \I{Name}, apply the substitution to both \I{AvailInfo}s,
+ and return the unified \I{Name} with the union of the child names of the
+ substituted \I{AvailInfo}s.
+ Otherwise, error.
+ \item[Merge two sets of \textit{AvailInfo}s]
+ (produces a set of \I{AvailInfo}s and a \I{Name} substitution) \\
+ Let two \I{AvailInfo}s be related if they both are of the form
+ \I{Name} and have matching \I{OccName}s, or if they both are of
+ the form
+ $\I{Name}\, \verb|{|\, \I{Name}_0\verb|,|\, \ldots\verb|,|\, \I{Name}_n\, \verb|}|$
+ and there exists a child name in each which have matching \I{OccName}s.
+ Union the two sets, unifying related \I{AvailInfo}s.
+ \item[Apply a name substitution on an \textit{AvailInfo}] (produces an \I{AvailInfo}) \\
+ Substitute the top-level \I{Name}, which induces a substitution from
+ \I{Module} to $\I{Module}'$. Apply this module substitution to each child
+ \I{Name} in the \I{AvailInfo}.
+
+\end{description}
+%
+Shape merging takes two units with inputs (requirements) and outputs
+(provisions) and ``wires'' them up so that outputs feed into inputs. To
+merge the shape of $p$ with the shape of $q$:
\begin{enumerate}
- \item Calculate the shape of a declaration, with respect to the
- current shape context. (e.g., by renaming a module/signature,
- or using the shape from an included unit.)
- \item Merge this shape into the shape context.
+ \item \emph{Fill every requirement of $q$ with provided modules from
+ $p$.} For each requirement $M$ of $q$ that is provided by $p$,
+ substitute each \I{Module} occurrence of \verb|HOLE:M| with the
+ provided $p\verb|(|M\verb|)|$ (however, do \textbf{NOT} substitute the
+ top-level \I{Module} in a \I{Name}s), merge the \I{AvailInfo}s and apply
+ the resulting \I{Name} substitution, and
+ remove the requirement from $q$. If the \I{AvailInfo}s of the
+ provision are not a superset of the required \I{AvailInfo}s,
+ error.
+ \item If mutual recursion is supported, \emph{fill every requirement
+ of $p$ with provided modules from $q$.}
+ \item \emph{Merge leftover requirements.} For each requirement $M$
+ of $q$ that is not provided by $p$ but required by $p$, and let
+ the new requirement be the merge of
+ \I{AvailInfo}s, applying the resulting \I{Name} substitution.
+ \item \emph{Add provisions of $q$.} Union the provisions of $p$ and $q$, (lazily) erroring
+ if there is a duplicate that doesn't have the same \I{Module}.
\end{enumerate}
-The final shape context is the shape of the unit as a whole.
-Optionally, we can also compute the renamed syntax trees of
-modules and signatures.
+\newpage
+\section{Indefinite type checker}
+
+\begin{figure}[htpb]
+$$
+\begin{array}{rcll}
+\I{IndefUnitRecord} & ::= & \I{ModIface}_0 \verb|;|\, \ldots\verb|;|\, \I{ModIface}_n \\[1em]
+\multicolumn{3}{l}{\mbox{\bf Module interface}} \\
+\I{ModIface} & ::= & \verb|module| \; \I{Module} \; \verb|(| \I{mi\_exports} \verb|)| \; \verb|where| \\
+& & \qquad \I{mi\_decls} \\
+& & \qquad \I{mi\_insts} \\
+& & \qquad \I{dep\_orphs} \\
+\I{mi\_exports} & ::= & \I{AvailInfo}_0 \verb|,|\, \ldots \verb|,|\, \I{AvailInfo}_n & \mbox{Export list} \\
+\I{mi\_decls} & ::= & \I{IfaceDecl}_0 \verb|;|\, \ldots \verb|;|\, \I{IfaceDecl}_n & \mbox{Defined declarations} \\
+\I{mi\_insts} & ::= & \I{IfaceClsInst}_0 \verb|;|\, \ldots \verb|;|\, \I{IfaceClsInst}_n & \mbox{Defined instances} \\
+\I{dep\_orphs} & ::= & \I{Module}_0 \verb|;|\, \ldots \verb|;|\, \I{Module}_n & \mbox{Transitive orphan dependencies} \\[1em]
+\multicolumn{3}{l}{\mbox{\bf Interface declarations}} \\
+\I{IfaceDecl} & ::= & \I{OccName} \; \verb|::| \; \I{IfaceId} \\
+ & | & \verb|data| \; \I{OccName} \; \verb|=| \;\ \I{IfaceData} \\
+ & | & \ldots \\
+\I{IfaceClsInst} & & \mbox{A type-class instance} \\
+\I{IfaceId} & & \mbox{Interface of top-level binder} \\
+\I{IfaceData} & & \mbox{Interface of type constructor} \\
+\end{array}
+$$
+\caption{Module interfaces in GHC} \label{fig:typecheck}
+\end{figure}
+
+\Red{This needs updating.}
+
+In general terms,
+type checking an indefinite unit (a unit with holes) involves
+calculating, for every module, a \I{ModIface} representing the
+type/interface of the module in question (which is serialized
+to disk). The general form of these
+interface files are described in Figure~\ref{fig:typecheck}; notably,
+the interfaces \I{IfaceId}, \I{IfaceData}, etc. contain \I{Name} references,
+which must be resolved by
+looking up a \I{ModIface} corresponding to the \I{Module} associated
+with the \I{Name}. (We will say more about this lookup process shortly.)
+For example, given:
+
+\begin{verbatim}
+ unit p where
+ signature H where
+ data T
+ module A(S, T) where
+ import H
+ data S = S T
+\end{verbatim}
+%
+the \I{PkgType} is:
+
+\begin{verbatim}
+ module HOLE:H (HOLE:H.T) where
+ data T -- abstract type constructor
+ module THIS:A (THIS:A.S, HOLE:H.T) where
+ data S = S HOLE:H.T
+ -- where THIS = p(H -> HOLE:H)
+\end{verbatim}
+%
+However, while it is true that the \I{ModIface} is the final result
+of type checking, we actually are conflating two distinct concepts: the user-visible
+notion of a \I{ModuleName}, which, when imported, brings some \I{Name}s
+into scope (or could trigger a deprecation warning, or pull in some
+orphan instances\ldots), versus the actual declarations, which, while recorded
+in the \I{ModIface}, have an independent existence: even if a declaration
+is not visible for an import, we may internally refer to its \I{Name}, and
+need to look it up to find out type information. (A simple case when
+this can occur is if a module exports a function with type \verb|T -> T|,
+but doesn't export \verb|T|).
+
+\begin{figure}[htpb]
+$$
+\begin{array}{rcll}
+\I{ModDetails} & ::= & \langle\I{md\_types} \verb|;|\; \I{md\_insts}\rangle \\
+\I{md\_types} & ::= & \I{TyThing}_0 \verb|,|\, \ldots\verb|,|\, \I{TyThing}_n \\
+\I{md\_insts} & ::= & \I{ClsInst}_0 \verb|,|\, \ldots\verb|,|\, \I{ClsInst}_n \\[1em]
+\multicolumn{3}{l}{\mbox{\bf Type-checked declarations}} \\
+\I{TyThing} & & \mbox{Type-checked thing with a \I{Name}} \\
+\I{ClsInst} & & \mbox{Type-checked type class instance} \\
+\end{array}
+$$
+\caption{Semantic objects in GHC} \label{fig:typecheck-more}
+\end{figure}
+
+Thus, a \I{ModIface} can be type-checked into a \I{ModDetails}, described in
+Figure~\ref{fig:typecheck-more}. Notice that a \I{ModDetails} is just
+a bag of type-checkable entities which GHC knows about. We
+define the \emph{external package state (EPT)} to
+simply be the union of the \I{ModDetails}
+of all external modules.
+
+Type checking is a delicate balancing act between module
+interfaces and our semantic objects. A \I{ModIface} may get
+type-checked multiple times with different hole instantiations
+to provide multiple \I{ModDetails}.
+Furthermore complicating matters
+is that GHC does this resolution \emph{lazily}: a \I{ModIface}
+is only converted to a \I{ModDetails} when we are looking up
+the type of a \I{Name} that is described by the interface;
+thus, unlike usual theoretical treatments of type checking, we can't
+eagerly go ahead and perform substitutions on \I{ModIface}s when
+they get included.
+
+In a separate compiler like GHC, there are two primary functions we must provide:
+
+\paragraph{\textit{ModuleName} to \textit{ModIface}} Given a \I{ModuleName} which
+was explicitly imported by a user, we must produce a \I{ModIface}
+that, among other things, specifies what \I{Name}s are brought
+into scope. This is used by the renamer to resolve plain references
+to identifiers to real \I{Name}s. (By the way, if shaping produced
+renamed trees, it would not be necessary to do this step!)
+
+\paragraph{\textit{Module} to \textit{ModDetails}/EPT} Given a \I{Module} which may be
+a part of a \I{Name}, we must be able to type check it into
+a \I{ModDetails} (usually by reading and typechecking the \I{ModIface}
+associated with the \I{Module}, but this process is involved). This
+is used by the type checker to find out type information on things. \\
+
+There are two points in the type checker where these capabilities are exercised:
+
+\paragraph{Source-level imports} When a user explicitly imports a
+module, the \textit{ModuleName} is mapped to a \textit{ModIface}
+to find out what exports are brought into scope (\I{mi\_exports})
+and what orphan instances must be loaded (\I{dep\_orphs}). Additionally,
+the \textit{Module} is loaded to the EPT to bring instances from
+the module into scope.
+
+\paragraph{Internal name lookup} During type checking, we may have
+a \I{Name} for which we need type information (\I{TyThing}). If it's not already in the
+EPT, we type check and load
+into the EPT the \I{ModDetails} of the \I{Module} in the \I{Name},
+and then check the EPT again. (\verb|importDecl|)
+
+\subsection{\textit{ModName} to \textit{ModIface}}
+
+In all cases, the \I{mi\_exports} can be calculated directly from the
+shaping process, which specifies exactly for each \I{ModName} in scope
+what will be brought into scope.
+
+\paragraph{Modules} Modules are straightforward, as for any
+\I{Module} there is only one possibly \I{ModIface} associated
+with it (the \I{ModIface} for when we type-checked the (unique) \verb|module|
+declaration.)
-In the description below, we'll assume \verb|THIS| is the unit key
-of the unit being processed.
+\paragraph{Signatures} For signatures, there may be multiple \I{ModIface}s
+associated with a \I{ModName} in scope, e.g. in this situation:
+
+\begin{verbatim}
+ unit p where
+ signature S where
+ data A
+ unit q where
+ include p
+ signature S where
+ data B
+ module M where
+ import S
+\end{verbatim}
+%
+Each literal \verb|signature| has a \I{ModIface} associated with it; and
+the import of \verb|S| in \verb|M|, we want to see the \emph{merged}
+\I{ModIface}s. We can determine the \I{mi\_exports} from the shape,
+but we also need to pull in orphan instances for each signature, and
+produce a warning for each deprecated signature.
\begin{aside}
-\textbf{\textit{OccName} is implied by \textit{Name}.}
+\textbf{Does hiding a signature hide its orphans.} Suppose that we have
+extended Backpack to allow hiding signatures from import.
+
+\begin{verbatim}
+ unit p requires (H) where -- H is hidden from import
+ module A where
+ instance Eq (a -> b) where -- orphan
+ signature H {-# DEPRECATED "Don't use me" #-} where
+ import A
+
+ unit q where
+ include p
+ signature H where
+ data T
+ module M where
+ import H -- warn deprecated?
+ instance Eq (a -> b) -- overlap?
+\end{verbatim}
+
+It is probably the most consistent to not pull in orphan instances
+and not give the deprecated warning: this corresponds to merging
+visible \I{ModIface}s, and ignoring invisible ones.
+\end{aside}
+
+\subsection{\textit{Module} to \textit{ModDetails}}
+
+\paragraph{Modules} For modules, we have a \I{Module} of
+the form $\I{p}\verb|(|m\; \verb|->|\; \I{Module}\verb|,|\, \ldots\verb|)|$,
+and we also have a unique \I{ModIface}, where each hole instantiation
+is $\verb|HOLE:|m$.
+
+To generate the \I{ModDetails} associated with the specific instantiation,
+we have to type-check the \I{ModIface} with the following adjustments:
+
+\begin{enumerate}
+ \item Perform a \I{Module} substitution according to the instantiation
+ of the \I{ModIface}'s \I{Module}. (NB: we \emph{do}
+ substitute \verb|HOLE:A.x| to \verb|HOLE:B.x| if we instantiated
+ \verb|A -> HOLE:B|, \emph{unlike} the disjoint
+ substitutions applied by shaping.)
+ \item Perform a \I{Name} substitution as follows: for any name
+ with a unit key that is a $\verb|HOLE|$,
+ substitute with the recorded \I{Name} in the requirements of the shape.
+ Otherwise, look up the (unique) \I{ModIface} for the \I{Module},
+ and subsitute with the corresponding \I{Name} in the \I{mi\_exports}.
+\end{enumerate}
+
+\paragraph{Signatures} For signatures, we have a \I{Module} of the form
+$\verb|HOLE:|m$. Unlike modules, there are multiple \I{ModIface}s associated with a hole.
+We distinguish each separate \I{ModIface} by considering the full \I{UnitKey}
+it was defined in, e.g. \verb|p(A -> HOLE:C, B -> q():B)|; call this
+the hole's \emph{defining unit key}; the set of \I{ModIface}s for a hole
+and their defining unit keys can easily be calculated during shaping.
+
+To generate the \I{ModDetails} associated with a hole, we type-check each
+\I{ModIface}, with the following adjustments:
+
+\begin{enumerate}
+ \item Perform a \I{Module} substitution according to the instantiation
+ of the defining unit key. (NB: This may rename the hole itself!)
+ \item Perform a \I{Name} substitution as follows, in the same manner
+ as would be done in the case of modules.
+ \item When these \I{ModDetails} are merged into the EPT, some merging
+ of duplicate types may occur; a type
+ may be defined multiple times, in which case we check that each
+ definition is compatible with the previous ones. A concrete
+ type is always compatible with an abstract type.
+\end{enumerate}
+
+\paragraph{Invariants} When we perform \I{Name} substitutions, we must be
+sure that we can always find out the correct \I{Name} to substitute to.
+This isn't obviously true, consider:
+
+\begin{verbatim}
+ unit p where
+ signature S(foo) where
+ data T
+ foo :: T
+ module M(bar) where
+ import S
+ bar = foo
+ unit q where
+ module A(T(..)) where
+ data T = T
+ foo = T
+ module S(foo) where
+ import A
+ include p
+ module A where
+ import M
+ ... bar ...
+\end{verbatim}
+%
+When we type check \verb|p|, we get the \I{ModIface}s:
+
+\begin{verbatim}
+ module HOLE:S(HOLE:S.foo) where
+ data T
+ foo :: HOLE:S.T
+ module THIS:M(THIS:M.bar) where
+ bar :: HOLE:S.T
+\end{verbatim}
+%
+Now, when we type check \verb|A|, we pull on the \I{Name} \verb|p(S -> q():S):M.bar|,
+which means we have to type check the \I{ModIface} for \verb|p(S -> q():S):M|.
+The un-substituted type of \verb|bar| has a reference to \verb|HOLE:S.T|;
+this should be substituted to \verb|q():S.T|. But how do we discover this?
+We know that \verb|HOLE:S| was instantiated to \verb|q():S|, so we might try
+and look for \verb|q():S.T|. However, this \I{Name} does not exist because
+the \verb|module S| reexports the selector from \verb|A|! Nor can we consult
+the (unique) \I{ModIface} for the module, as it doesn't reexport the relevant
+type.
+
+The conclusion, then, is that a module written this way should be disallowed.
+Specifically, the correctness condition for a signature is this: \emph{Any \I{Name}
+mentioned in the \I{ModIface} of a signature must either be from an external module, or be
+exported by the signature}.
+
+\section{Appendix: Shaping}
+
+This section clarifies some subtle aspects about shaping.
+
+\subsection{\textit{OccName} is implied by \textit{Name}}
In Haskell, the following is not valid syntax:
\begin{verbatim}
@@ -305,15 +738,15 @@ in such a world, we need a different definition of shape:
\begin{verbatim}
Shape ::=
- provided: ModName -> { OccName -> Name }
- required: ModName -> { OccName -> Name }
+ provided: ModName -> Module { OccName -> Name }
+ required: ModName -> { OccName -> Name }
\end{verbatim}
Presently, however, such an \I{OccName} annotation would be redundant: it can be inferred from the \I{Name}.
-\end{aside}
-\begin{aside}
-\textbf{Holes of a unit are a mapping, not a set.} Why can't the \I{UnitKey} just record a
-set of \I{Module}s, e.g. $\I{UnitKey}\;::= \; \I{SrcUnitKey} \; \verb|{| \; \I{Module} \; \verb|}|$? Consider:
+\subsection{Holes of a unit are a mapping, not a set.}
+
+Why can't the \I{UnitKey} just record a
+set of \I{Module}s, e.g. $\I{UnitKey}\;::= \; p \; \verb|{| \; \I{Module} \; \verb|}|$? Consider:
\begin{verbatim}
unit p (A) requires (H1, H2) where
@@ -336,12 +769,10 @@ set of \I{Module}s, e.g. $\I{UnitKey}\;::= \; \I{SrcUnitKey} \; \verb|{| \; \I{M
\end{verbatim}
With a mapping, the first instance of \verb|p| has key \verb|p(H1 -> q():I1, H2 -> q():I2)|
while the second instance has key \verb|p(H1 -> q():I2, H2 -> q():I1)|; with
-a set, both would have the key \verb|p(q():I1, q():I2)|.
-\end{aside}
-
+a set, both would have the key \verb|p{q():I1, q():I2}| and be
+indistinguishable.
-\begin{aside}
-\textbf{Signatures can require a specific entity.}
+\subsection{Signatures can require a specific entity.}
With requirements like \verb|A -> { HOLE:A.T, HOLE:A.foo }|,
why not specify it as \verb|A -> { T, foo }|,
e.g., \verb|required: { ModName -> { OccName } }|? Consider:
@@ -360,12 +791,12 @@ can be expressed with \I{Name}s as
A -> { HOLE:A.T }
B -> { HOLE:A.T }
\end{verbatim}
-But, without \I{Name}s, the sharing constraint is impossible: \verb|A -> { T }; B -> { T }|. (NB: \verb|A| and \verb|B| don't have to be implemented with the same module.)
-\end{aside}
+But, without \I{Name}s, the sharing constraint is impossible: \verb|A -> { T }; B -> { T }|. (NB: \verb|A| and \verb|B| could be filled with different modules, they just have
+to both export the same \verb|T|.)
-\begin{aside}
-\textbf{The \textit{Name} of a value is used to avoid
-ambiguous identifier errors.} We state that two types
+\subsection{The \textit{Name} of a value is used to avoid
+ambiguous identifier errors.}
+We state that two types
are equal when their \I{Name}s are the same; however,
for values, it is less clear why we care. But consider this example:
@@ -385,10 +816,8 @@ that \verb|x| from \verb|H1| and \verb|x| from \verb|H2| are the same (have
the same \I{Name}.) If they were not the same, it would be ambiguous and
should cause an error. Knowing the \I{Name} of a value distinguishes
between these two cases.
-\end{aside}
-\begin{aside}
-\textbf{Holes are linear}
+\subsection{Holes are linear}
Requirements do not record what \I{Module} represents
the identity of a requirement, which means that it's not possible to assert
that hole \verb|A| and hole \verb|B| should be implemented with the same module,
@@ -408,55 +837,15 @@ looking for in one of $n$ names, accepting any name as an acceptable linkage.
If aliasing was allowed, we'd need a separate physical shaping context,
to make sure multiple mentions of the same hole were consistent.
-\end{aside}
-
-%\newpage
-
-\subsection{\texttt{module M}}
-
-A module declaration provides a module \verb|THIS:M| at module name \verb|M|. It
-has the shape:
-
-\begin{verbatim}
- provides: M -> THIS:M { exports of renamed M under THIS:M }
- requires: (nothing)
-\end{verbatim}
-Example:
-
-\begin{verbatim}
- module A(T) where
- data T = T
-
- -- provides: A -> THIS:A { THIS:A.T }
- -- requires: (nothing)
-\end{verbatim}
-
-\newpage
-\subsection{\texttt{signature M}}
-
-A signature declaration creates a requirement at module name \verb|M|. It has the shape:
-
-\begin{verbatim}
- provides: (nothing)
- requires: M -> { exports of renamed M under HOLE:M }
-\end{verbatim}
-
-\noindent Example:
-
-\begin{verbatim}
- signature H(T) where
- data T
-
- -- provides: H -> (nothing)
- -- requires: H -> { HOLE:H.T }
-\end{verbatim}
-
-\begin{aside}
-\textbf{In-scope signatures are not provisions}. We enforce the invariant that
+\subsection{A unit does not ``provide'' its signatures}
+We enforce the invariant that
a provision is always (syntactically) a \verb|module| and a requirement
is always a \verb|signature|. This means that if you have a requirement
and a provision of the same name, the requirement can \emph{always} be filled
-with the provision. Without this invariant, it's not clear if a provision
+with the provision.
+
+The alternate design, where a unit both requires and provides
+its signatures, makes it unclear if a provision
will actually fill a signature. Consider this example, where
a signature is required and exposed:
@@ -484,7 +873,7 @@ actually provide enough declarations to satisfy \verb|a-user|'s
requirement: the intended semantics \emph{merges} the requirements
of \verb|a-sigs| and \verb|a-user|.
-
+What about this example?
\begin{verbatim}
unit a-sigs (M as A) requires (H as A) where
@@ -508,236 +897,21 @@ be under the name \verb|H| or \verb|A|?
It may still be possible to use the \verb|(A) requires (A)| syntax to
indicate exposed signatures, but this would be a mere syntactic
alternative to \verb|() requires (exposed A)|.
-\end{aside}
-%
-
-\newpage
-\subsection{\texttt{include pkg (X) requires (Y)}}
-
-We merge with the transformed shape of unit \verb|pkg|, where this
-shape is transformed by:
-
-\begin{itemize}
- \item Renaming and thinning the provisions according to \verb|(X)|
- \item Renaming requirements according to \verb|(Y)| (requirements cannot
- be thinned, so non-mentioned requirements are implicitly passed through.)
- For each renamed requirement from \verb|Y| to \verb|Y'|,
- substitute \verb|HOLE:Y| with \verb|HOLE:Y'| in the
- \I{Module}s and \I{Name}s of the provides and requires.
-\end{itemize}
-%
-If there are no thinnings/renamings, you just merge the
-shape unchanged! Here is an example:
-
-\begin{verbatim}
- unit p (M) requires (H) where
- signature H where
- data T
- module M where
- import H
- data S = S T
-
- unit q (A) where
- module X where
- data T = T
- include p (M as A) requires (H as X)
-\end{verbatim}
-%
-The shape of unit \verb|p| is:
-\begin{verbatim}
- requires: M -> { p(H -> HOLE:H):M.S }
- provides: H -> { HOLE:H.T }
-\end{verbatim}
-%
-Thus, when we process the \verb|include| in unit \verb|q|,
-we make the following two changes: we rename the provisions,
-and we rename the requirements, substituting \verb|HOLE|s.
-The resulting shape to be merged in is:
-
-\begin{verbatim}
- provides: A -> { p(H -> HOLE:X):M.S }
- requires: X -> { HOLE:X.T }
-\end{verbatim}
-%
-After merging this in, the final shape of \verb|q| is:
-
-\begin{verbatim}
- provides: X -> { q():X.T } -- from shaping 'module X'
- A -> { p(H -> q():X):M.S }
- requires: (nothing) -- discharged by provided X
-\end{verbatim}
-
-\newpage
-
-\subsection{Merging}
-
-The shapes we've given for individual declarations have been quite
-simple. Merging combines two shapes, filling requirements with
-implementations, unifying \I{Name}s, and unioning requirements; it is
-the most complicated part of the shaping process.
-
-The best way to think about merging is that we take two units with
-inputs (requirements) and outputs (provisions) and ``wiring'' them up so
-that outputs feed into inputs. In the absence
-of mutual recursion, this wiring process is \emph{directed}: the provisions
-of the first unit feed into the requirements of the second unit,
-but never vice versa. (With mutual recursion, things can go in the opposite
-direction as well.)
-
-Suppose we are merging shape $p$ with shape $q$ (e.g., $p; q$). Merging
-proceeds as follows:
-
-\begin{enumerate}
- \item \emph{Fill every requirement of $q$ with provided modules from
- $p$.} For each requirement $M$ of $q$ that is provided by $p$ (in particular,
- all of its required \verb|Name|s are provided),
- substitute each \I{Module} occurrence of \verb|HOLE:M| with the
- provided $p\verb|(|M\verb|)|$, unify the names, and remove the requirement from $q$.
- If the names of the provision are not a superset of the required names, error.
- \item If mutual recursion is supported, \emph{fill every requirement of $p$ with provided modules from $q$.}
- \item \emph{Merge leftover requirements.} For each requirement $M$ of $q$ that is not
- provided by $p$ but required by $p$, unify the names, and union them together to form the new requirement. (It's not
- necessary to substitute \I{Module}s, since they are guaranteed to be the same.)
- \item \emph{Add provisions of $q$.} Union the provisions of $p$ and $q$, erroring
- if there is a duplicate that doesn't have the same identity.
-\end{enumerate}
-%
-To unify two sets of names, find each pair of names with matching \I{OccName}s $n$ and $m$ and do the following:
-
-\begin{enumerate}
- \item If both are from holes, pick a canonical representative $m$ and substitute $n$ with $m$.
- \item If one $n$ is from a hole, substitute $n$ with $m$.
- \item Otherwise, error if the names are not the same.
-\end{enumerate}
-%
-It is important to note that substitutions on \I{Module}s and substitutions on
-\I{Name}s are disjoint: a substitution from \verb|HOLE:A| to \verb|HOLE:B|
-does \emph{not} substitute inside the name \verb|HOLE:A.T|.
-
-Since merging is the most complicated step of shaping, here are a big
-pile of examples of it in action.
-
-\subsubsection{A simple example}
-
-In the following set of units:
-
-\begin{verbatim}
- unit p(M) requires (A) where
- signature A(T) where
- data T
- module M(T, S) where
- import A(T)
- data S = S T
-
- unit q where
- module A where
- data T = T
- include p
-\end{verbatim}
-
-When we \verb|include p|, we need to merge the partial shape
-of \verb|q| (with just provides \verb|A|) with the shape
-of \verb|p|. Here is each step of the merging process:
-
-\begin{verbatim}
- shape 1 shape 2
- --------------------------------------------------------------------------------
-(initial shapes)
-provides: A -> THIS:A { q():A.T } M -> p(A -> HOLE:A) { HOLE:A.T, p(A -> HOLE:A).S }
-requires: (nothing) A -> { HOLE:A.T }
-
-(after filling requirements)
-provides: A -> THIS:A { q():A.T } M -> p(A -> THIS:A) { q():A.T, p(A -> THIS:A).S }
-requires: (nothing) (nothing)
-
-(after adding provides)
-provides: A -> THIS:A { q():A.T }
- M -> p(A -> THIS:A) { q():A.T, p(A -> THIS:A).S }
-requires: (nothing)
-\end{verbatim}
-
-Notice that we substituted \verb|HOLE:A| with \verb|THIS:A|, but \verb|HOLE:A.T| with \verb|q():A.T|.
-
-\subsubsection{Requirements merging can affect provisions}
-
-When a merge results in a substitution, we substitute over both
-requirements and provisions:
-
-\begin{verbatim}
- signature H(T) where
- data T
- module A(T) where
- import H(T)
- module B(T) where
- data T = T
-
- -- provides: A -> THIS:A { HOLE:H.T }
- -- B -> THIS:B { THIS:B.T }
- -- requires: H -> { HOLE:H.T }
-
- signature H(T, f) where
- import B(T)
- f :: a -> a
-
- -- provides: A -> THIS:A { THIS:B.T } -- UPDATED
- -- B -> THIS:B { THIS:B.T }
- -- requires: H -> { THIS:B.T, HOLE:H.f } -- UPDATED
-\end{verbatim}
-
-\subsubsection{Sharing constraints}
-
-Suppose you have two signature which both independently define a type,
-and you would like to assert that these two types are the same. In the
-ML world, such a constraint is known as a sharing constraint. Sharing
-constraints can be encoded in Backpacks via clever use of reexports;
-they are also an instructive example for signature merging.
-
-\begin{verbatim}
- signature A(T) where
- data T
- signature B(T) where
- data T
-
- -- requires: A -> { HOLE:A.T }
- B -> { HOLE:B.T }
-
- -- the sharing constraint!
- signature A(T) where
- import B(T)
- -- (shape to merge)
- -- requires: A -> { HOLE:B.T }
-
- -- (after merge)
- -- requires: A -> { HOLE:A.T }
- -- B -> { HOLE:A.T }
-\end{verbatim}
-%
-\Red{I'm pretty sure any choice of \textit{Name} is OK, since the
-subsequent substitution will make it alpha-equivalent.}
-
-\subsection{Export declarations}
-
-If an explicit export declaration is given, the final shape is the
-computed shape, minus any provisions not mentioned in the list, with the
-appropriate renaming applied to provisions and requirements. (Requirements
-are implicitly passed through if they are not named.)
-If no explicit export declaration is given, the final shape is
-the computed shape, including only provisions which were defined
-in the declarations of the unit.
-
-\begin{aside}
-\textbf{Signature visibility, and defaulting}
+\subsection{Signature visibility, and defaulting}
The simplest formulation of requirements is to have them always be
-visible. Signature visibility could be controlled by associating
-every requirement with a flag indicating if it is importable or
-not: a signature declaration sets a requirement to be visible, and
-an explicit export list can specify if a requirement is to be visible
-or not.
+importable. One proposed enhancement, however, is to allow some
+requirements to be ``non-importable''; that is, they are not visible
+to people who include packages.
+
+One simple way of modeling this is to associate each required module
+with a flag indicating whether or not it is importable or not. Then, we
+might imagine that an explicit export list could be used to toggle
+whether or not a requirement is visible or not.
-When an export list is absent, we have to pick a default visibility
-for a signature. If we use the same behavior as with modules,
-a strange situation can occur:
+However, when an export list is absent, we have to pick a default
+visibility for a signature. If we use the same behavior as with
+modules, a strange situation can occur:
\begin{verbatim}
unit p where -- S is visible
@@ -779,44 +953,13 @@ unit q where
include p
signature S where
\end{verbatim}
-\end{aside}
%
% SPJ: This would be too complicated (if there's yet a third way)
-\clearpage
-\newpage
+This is pretty complicated, so signature visibility is not currently
+planned to be implemented.
-\subsection{Merging AvailInfos}
-
-We describe how to take two sets of \I{AvailInfo}s and merges them
-into one set. In the degenerate case where every \I{AvailInfo} is a
-$Name$, this algorithm operates the same as the original algorithm.
-Merging proceeds in two steps: unification and then simple union.
-
-Unification proceeds as follows: for each pair of \I{Name}s with
-matching \I{OccName}s, unify the names. For each pair of $\I{Name}\, \verb|{|\,
-\I{Name}_0\verb|,|\, \ldots\verb|,|\, \I{Name}_n\, \verb|}|$, where there
-exists some pair of child names with matching \I{OccName}s, unify the
-parent \I{Name}s. (A single \I{AvailInfo} may participate in multiple such
-pairs.) A simple identifier and a type constructor \I{AvailInfo} with
-overlapping in-scope names fails to unify. After unification,
-the simple union combines entries with matching \verb|availName|s (parent
-name in the case of a type constructor), recursively unioning the child
-names of type constructor \I{AvailInfo}s.
-
-Unification of \I{Name}s results in a substitution, and a \I{Name} substitution
-on \I{AvailInfo} is a little unconventional. Specifically, substitution on $\I{Name}\, \verb|{|\,
-\I{Name}_0\verb|,|\, \ldots\verb|,|\, \I{Name}_n\, \verb|}|$ proceeds specially:
-a substitution from \I{Name} to $\I{Name}'$ induces a substitution from
-\I{Module} to $Module'$ (as the \I{OccName}s of the \I{Name}s are guaranteed
-to be equal), so for each child $\I{Name}_i$, perform the \I{Module}
-substitution. So for example, the substitution \verb|HOLE:A.T| to \verb|THIS:A.T|
-takes the \I{AvailInfo} \verb|HOLE:A.T { HOLE:A.B, HOLE:A.foo }| to
-\verb|THIS:A.T { THIS:A.B, THIS:A.foo }|. In particular, substitution
-on children \I{Name}s is \emph{only} carried out by substituting on the outer name;
-we will never directly substitute children.
-
-Unfortunately, there are a number of tricky scenarios:
+\subsection{Tricky \textit{AvailInfo} merging scenarios}
\paragraph{Merging when type constructors are not in scope}
@@ -1002,315 +1145,151 @@ more about \verb|A| when the sharing constraint performs unification;
however, the \I{AvailInfo} will only tell us about what is in-scope, which
is \emph{not} enough information.
-%\newpage
-
-\section{Type checking}
+\subsection{Some examples}
-\begin{figure}[htpb]
-$$
-\begin{array}{rcll}
-\I{PkgType} & ::= & \I{ModIface}_0 \verb|;|\, \ldots\verb|;|\, \I{ModIface}_n \\[1em]
-\multicolumn{3}{l}{\mbox{\bf Module interface}} \\
-\I{ModIface} & ::= & \verb|module| \; \I{Module} \; \verb|(| \I{mi\_exports} \verb|)| \; \verb|where| \\
-& & \qquad \I{mi\_decls} \\
-& & \qquad \I{mi\_insts} \\
-& & \qquad \I{dep\_orphs} \\
-\I{mi\_exports} & ::= & \I{AvailInfo}_0 \verb|,|\, \ldots \verb|,|\, \I{AvailInfo}_n & \mbox{Export list} \\
-\I{mi\_decls} & ::= & \I{IfaceDecl}_0 \verb|;|\, \ldots \verb|;|\, \I{IfaceDecl}_n & \mbox{Defined declarations} \\
-\I{mi\_insts} & ::= & \I{IfaceClsInst}_0 \verb|;|\, \ldots \verb|;|\, \I{IfaceClsInst}_n & \mbox{Defined instances} \\
-\I{dep\_orphs} & ::= & \I{Module}_0 \verb|;|\, \ldots \verb|;|\, \I{Module}_n & \mbox{Transitive orphan dependencies} \\[1em]
-\multicolumn{3}{l}{\mbox{\bf Interface declarations}} \\
-\I{IfaceDecl} & ::= & \I{OccName} \; \verb|::| \; \I{IfaceId} \\
- & | & \verb|data| \; \I{OccName} \; \verb|=| \;\ \I{IfaceData} \\
- & | & \ldots \\
-\I{IfaceClsInst} & & \mbox{A type-class instance} \\
-\I{IfaceId} & & \mbox{Interface of top-level binder} \\
-\I{IfaceData} & & \mbox{Interface of type constructor} \\
-\end{array}
-$$
-\caption{Module interfaces in GHC} \label{fig:typecheck}
-\end{figure}
+\subsubsection{A simple example}
-In general terms,
-type checking an indefinite unit (a unit with holes) involves
-calculating, for every module, a \I{ModIface} representing the
-type/interface of the module in question (which is serialized
-to disk). The general form of these
-interface files are described in Figure~\ref{fig:typecheck}; notably,
-the interfaces \I{IfaceId}, \I{IfaceData}, etc. contain \I{Name} references,
-which must be resolved by
-looking up a \I{ModIface} corresponding to the \I{Module} associated
-with the \I{Name}. (We will say more about this lookup process shortly.)
-For example, given:
+In the following set of units:
\begin{verbatim}
- unit p where
- signature H where
+ unit p(M) requires (A) where
+ signature A(T) where
data T
- module A(S, T) where
- import H
+ module M(T, S) where
+ import A(T)
data S = S T
-\end{verbatim}
-%
-the \I{PkgType} is:
-\begin{verbatim}
- module HOLE:H (HOLE:H.T) where
- data T -- abstract type constructor
- module THIS:A (THIS:A.S, HOLE:H.T) where
- data S = S HOLE:H.T
- -- where THIS = p(H -> HOLE:H)
+ unit q where
+ module A where
+ data T = T
+ include p
\end{verbatim}
-%
-However, while it is true that the \I{ModIface} is the final result
-of type checking, we actually are conflating two distinct concepts: the user-visible
-notion of a \I{ModuleName}, which, when imported, brings some \I{Name}s
-into scope (or could trigger a deprecation warning, or pull in some
-orphan instances\ldots), versus the actual declarations, which, while recorded
-in the \I{ModIface}, have an independent existence: even if a declaration
-is not visible for an import, we may internally refer to its \I{Name}, and
-need to look it up to find out type information. (A simple case when
-this can occur is if a module exports a function with type \verb|T -> T|,
-but doesn't export \verb|T|).
-\begin{figure}[htpb]
-$$
-\begin{array}{rcll}
-\I{ModDetails} & ::= & \langle\I{md\_types} \verb|;|\; \I{md\_insts}\rangle \\
-\I{md\_types} & ::= & \I{TyThing}_0 \verb|,|\, \ldots\verb|,|\, \I{TyThing}_n \\
-\I{md\_insts} & ::= & \I{ClsInst}_0 \verb|,|\, \ldots\verb|,|\, \I{ClsInst}_n \\[1em]
-\multicolumn{3}{l}{\mbox{\bf Type-checked declarations}} \\
-\I{TyThing} & & \mbox{Type-checked thing with a \I{Name}} \\
-\I{ClsInst} & & \mbox{Type-checked type class instance} \\
-\end{array}
-$$
-\caption{Semantic objects in GHC} \label{fig:typecheck-more}
-\end{figure}
+When we \verb|include p|, we need to merge the partial shape
+of \verb|q| (with just provides \verb|A|) with the shape
+of \verb|p|. Here is each step of the merging process:
-Thus, a \I{ModIface} can be type-checked into a \I{ModDetails}, described in
-Figure~\ref{fig:typecheck-more}. Notice that a \I{ModDetails} is just
-a bag of type-checkable entities which GHC knows about. We
-define the \emph{external package state (EPT)} to
-simply be the union of the \I{ModDetails}
-of all external modules.
+\begin{verbatim}
+ shape 1 shape 2
+ --------------------------------------------------------------------------------
+(initial shapes)
+provides: A -> THIS:A { q():A.T } M -> p(A -> HOLE:A) { HOLE:A.T, p(A -> HOLE:A).S }
+requires: (nothing) A -> { HOLE:A.T }
-Type checking is a delicate balancing act between module
-interfaces and our semantic objects. A \I{ModIface} may get
-type-checked multiple times with different hole instantiations
-to provide multiple \I{ModDetails}.
-Furthermore complicating matters
-is that GHC does this resolution \emph{lazily}: a \I{ModIface}
-is only converted to a \I{ModDetails} when we are looking up
-the type of a \I{Name} that is described by the interface;
-thus, unlike usual theoretical treatments of type checking, we can't
-eagerly go ahead and perform substitutions on \I{ModIface}s when
-they get included.
+(after filling requirements)
+provides: A -> THIS:A { q():A.T } M -> p(A -> THIS:A) { q():A.T, p(A -> THIS:A).S }
+requires: (nothing) (nothing)
-In a separate compiler like GHC, there are two primary functions we must provide:
+(after adding provides)
+provides: A -> THIS:A { q():A.T }
+ M -> p(A -> THIS:A) { q():A.T, p(A -> THIS:A).S }
+requires: (nothing)
+\end{verbatim}
-\paragraph{\textit{ModuleName} to \textit{ModIface}} Given a \I{ModuleName} which
-was explicitly imported by a user, we must produce a \I{ModIface}
-that, among other things, specifies what \I{Name}s are brought
-into scope. This is used by the renamer to resolve plain references
-to identifiers to real \I{Name}s. (By the way, if shaping produced
-renamed trees, it would not be necessary to do this step!)
+Notice that we substituted \verb|HOLE:A| with \verb|THIS:A|, but \verb|HOLE:A.T| with \verb|q():A.T|.
-\paragraph{\textit{Module} to \textit{ModDetails}/EPT} Given a \I{Module} which may be
-a part of a \I{Name}, we must be able to type check it into
-a \I{ModDetails} (usually by reading and typechecking the \I{ModIface}
-associated with the \I{Module}, but this process is involved). This
-is used by the type checker to find out type information on things. \\
+\subsubsection{Requirements merging can affect provisions}
-There are two points in the type checker where these capabilities are exercised:
+When a merge results in a substitution, we substitute over both
+requirements and provisions:
-\paragraph{Source-level imports} When a user explicitly imports a
-module, the \textit{ModuleName} is mapped to a \textit{ModIface}
-to find out what exports are brought into scope (\I{mi\_exports})
-and what orphan instances must be loaded (\I{dep\_orphs}). Additionally,
-the \textit{Module} is loaded to the EPT to bring instances from
-the module into scope.
+\begin{verbatim}
+ signature H(T) where
+ data T
+ module A(T) where
+ import H(T)
+ module B(T) where
+ data T = T
-\paragraph{Internal name lookup} During type checking, we may have
-a \I{Name} for which we need type information (\I{TyThing}). If it's not already in the
-EPT, we type check and load
-into the EPT the \I{ModDetails} of the \I{Module} in the \I{Name},
-and then check the EPT again. (\verb|importDecl|)
+ -- provides: A -> THIS:A { HOLE:H.T }
+ -- B -> THIS:B { THIS:B.T }
+ -- requires: H -> { HOLE:H.T }
-\subsection{\textit{ModName} to \textit{ModIface}}
+ signature H(T, f) where
+ import B(T)
+ f :: a -> a
-In all cases, the \I{mi\_exports} can be calculated directly from the
-shaping process, which specifies exactly for each \I{ModName} in scope
-what will be brought into scope.
+ -- provides: A -> THIS:A { THIS:B.T } -- UPDATED
+ -- B -> THIS:B { THIS:B.T }
+ -- requires: H -> { THIS:B.T, HOLE:H.f } -- UPDATED
+\end{verbatim}
-\paragraph{Modules} Modules are straightforward, as for any
-\I{Module} there is only one possibly \I{ModIface} associated
-with it (the \I{ModIface} for when we type-checked the (unique) \verb|module|
-declaration.)
+\subsubsection{Sharing constraints}
-\paragraph{Signatures} For signatures, there may be multiple \I{ModIface}s
-associated with a \I{ModName} in scope, e.g. in this situation:
+Suppose you have two signature which both independently define a type,
+and you would like to assert that these two types are the same. In the
+ML world, such a constraint is known as a sharing constraint. Sharing
+constraints can be encoded in Backpacks via clever use of reexports;
+they are also an instructive example for signature merging.
\begin{verbatim}
- unit p where
- signature S where
- data A
- unit q where
- include p
- signature S where
- data B
- module M where
- import S
-\end{verbatim}
-%
-Each literal \verb|signature| has a \I{ModIface} associated with it; and
-the import of \verb|S| in \verb|M|, we want to see the \emph{merged}
-\I{ModIface}s. We can determine the \I{mi\_exports} from the shape,
-but we also need to pull in orphan instances for each signature, and
-produce a warning for each deprecated signature.
+ signature A(T) where
+ data T
+ signature B(T) where
+ data T
-\begin{aside}
-\textbf{Does hiding a signature hide its orphans.} Suppose that we have
-extended Backpack to allow hiding signatures from import.
+ -- requires: A -> { HOLE:A.T }
+ B -> { HOLE:B.T }
-\begin{verbatim}
- unit p requires (H) where -- H is hidden from import
- module A where
- instance Eq (a -> b) where -- orphan
- signature H {-# DEPRECATED "Don't use me" #-} where
- import A
+ -- the sharing constraint!
+ signature A(T) where
+ import B(T)
+ -- (shape to merge)
+ -- requires: A -> { HOLE:B.T }
- unit q where
- include p
- signature H where
- data T
- module M where
- import H -- warn deprecated?
- instance Eq (a -> b) -- overlap?
+ -- (after merge)
+ -- requires: A -> { HOLE:A.T }
+ -- B -> { HOLE:A.T }
\end{verbatim}
+%
+\Red{I'm pretty sure any choice of \textit{Name} is OK, since the
+subsequent substitution will make it alpha-equivalent.}
-It is probably the most consistent to not pull in orphan instances
-and not give the deprecated warning: this corresponds to merging
-visible \I{ModIface}s, and ignoring invisible ones.
-\end{aside}
+\subsection{Export declarations}
-\subsection{\textit{Module} to \textit{ModDetails}}
+If an explicit export declaration is given, the final shape is the
+computed shape, minus any provisions not mentioned in the list, with the
+appropriate renaming applied to provisions and requirements. (Requirements
+are implicitly passed through if they are not named.)
+If no explicit export declaration is given, the final shape is
+the computed shape, including only provisions which were defined
+in the declarations of the unit.
-\paragraph{Modules} For modules, we have a \I{Module} of
-the form $\I{p}\verb|(|m\; \verb|->|\; \I{Module}\verb|,|\, \ldots\verb|)|$,
-and we also have a unique \I{ModIface}, where each hole instantiation
-is $\verb|HOLE:|m$.
-To generate the \I{ModDetails} associated with the specific instantiation,
-we have to type-check the \I{ModIface} with the following adjustments:
+\section{Cabal}
-\begin{enumerate}
- \item Perform a \I{Module} substitution according to the instantiation
- of the \I{ModIface}'s \I{Module}. (NB: we \emph{do}
- substitute \verb|HOLE:A.x| to \verb|HOLE:B.x| if we instantiated
- \verb|A -> HOLE:B|, \emph{unlike} the disjoint
- substitutions applied by shaping.)
- \item Perform a \I{Name} substitution as follows: for any name
- with a unit key that is a $\verb|HOLE|$,
- substitute with the recorded \I{Name} in the requirements of the shape.
- Otherwise, look up the (unique) \I{ModIface} for the \I{Module},
- and subsitute with the corresponding \I{Name} in the \I{mi\_exports}.
-\end{enumerate}
-\paragraph{Signatures} For signatures, we have a \I{Module} of the form
-$\verb|HOLE:|m$. Unlike modules, there are multiple \I{ModIface}s associated with a hole.
-We distinguish each separate \I{ModIface} by considering the full \I{UnitKey}
-it was defined in, e.g. \verb|p(A -> HOLE:C, B -> q():B)|; call this
-the hole's \emph{defining unit key}; the set of \I{ModIface}s for a hole
-and their defining unit keys can easily be calculated during shaping.
+% \I{InstalledUnitId} & ::= & \I{IndefUnitId} \verb|(| \, m \; \verb|->| \; \I{Module} \verb|,|\, \ldots\, \verb|)| & \mbox{Also known as \I{UnitKey}} \\
+% \I{Module} & ::= & \I{InstalledUnitId} \verb|:| m \\
-To generate the \I{ModDetails} associated with a hole, we type-check each
-\I{ModIface}, with the following adjustments:
+\paragraph{Indefinite versus installed units}
+The purpose of an \I{IndefUnitId} is to uniquely identify the results of \textbf{typechecking}
+an indefinite unit; whereas an \I{InstalledUnitId} uniquely identifies
+the results of \textbf{compiling} a unit with all its holes
+filled. Thus, an \I{InstalledUnitId} also records a \emph{hole mapping}
+which specifies how each hole was filled. If an \I{InstalledUnitId}
+is only partially filled, we may refer to it as a \I{UnitKey} (as these
+are never installed.)
-\begin{enumerate}
- \item Perform a \I{Module} substitution according to the instantiation
- of the defining unit key. (NB: This may rename the hole itself!)
- \item Perform a \I{Name} substitution as follows, in the same manner
- as would be done in the case of modules.
- \item When these \I{ModDetails} are merged into the EPT, some merging
- of duplicate types may occur; a type
- may be defined multiple times, in which case we check that each
- definition is compatible with the previous ones. A concrete
- type is always compatible with an abstract type.
-\end{enumerate}
-
-\paragraph{Invariants} When we perform \I{Name} substitutions, we must be
-sure that we can always find out the correct \I{Name} to substitute to.
-This isn't obviously true, consider:
+\paragraph{Units versus packages}
-\begin{verbatim}
- unit p where
- signature S(foo) where
- data T
- foo :: T
- module M(bar) where
- import S
- bar = foo
- unit q where
- module A(T(..)) where
- data T = T
- foo = T
- module S(foo) where
- import A
- include p
- module A where
- import M
- ... bar ...
-\end{verbatim}
-%
-When we type check \verb|p|, we get the \I{ModIface}s:
+Cabal packages are:
-\begin{verbatim}
- module HOLE:S(HOLE:S.foo) where
- data T
- foo :: HOLE:S.T
- module THIS:M(THIS:M.bar) where
- bar :: HOLE:S.T
-\end{verbatim}
-%
-Now, when we type check \verb|A|, we pull on the \I{Name} \verb|p(S -> q():S):M.bar|,
-which means we have to type check the \I{ModIface} for \verb|p(S -> q():S):M|.
-The un-substituted type of \verb|bar| has a reference to \verb|HOLE:S.T|;
-this should be substituted to \verb|q():S.T|. But how do we discover this?
-We know that \verb|HOLE:S| was instantiated to \verb|q():S|, so we might try
-and look for \verb|q():S.T|. However, this \I{Name} does not exist because
-the \verb|module S| reexports the selector from \verb|A|! Nor can we consult
-the (unique) \I{ModIface} for the module, as it doesn't reexport the relevant
-type.
+\begin{itemize}
+ \item The unit of distribution
+ \item The unit that Hackage handles
+ \item The unit of versioning
+ \item The unit of ownership (who maintains it etc)
+\end{itemize}
-The conclusion, then, is that a module written this way should be disallowed.
-Specifically, the correctness condition for a signature is this: \emph{Any \I{Name}
-mentioned in the \I{ModIface} of a signature must either be from an external module, or be
-exported by the signature}.
+Backpack units are the building blocks of modular development;
+there may be multiple units per a Cabal package. While in theory
+Cabal could do sophisticated things with multiple units in a
+package, we expect Cabal
+to pick a distinguished unit (with the same unit name $p$ as
+the package) which serves as the publically visible unit.
-\begin{aside}
-\textbf{Special case export rule for record selectors.} Here is the analogous case for
-record selectors:
-\begin{verbatim}
- unit p where
- signature S(foo) where
- data T = T { foo :: Int }
- module M(bar) where
- import S
- bar = foo
- unit q where
- module A(T(..)) where
- data T = T { foo :: Int }
- module S(foo) where
- import A
- include p
- module A where
- import M
- ... bar ...
-\end{verbatim}
-We could reject this, but technically we can find the right substitution
-for \verb|T|, because the export of \verb|foo| is an \I{AvailTC} which
-does mention \verb|T|.
-\end{aside}
+%\newpage
\end{document} % chktex 16
diff --git a/docs/backpack/backpack-manual.tex b/docs/backpack/backpack-manual.tex
index 89826789a5..658df1badc 100644
--- a/docs/backpack/backpack-manual.tex
+++ b/docs/backpack/backpack-manual.tex
@@ -13,72 +13,134 @@
\maketitle
Backpack is a new module system for Haskell, intended to enable
-``separate modular development'': a style of development where
-application-writers can develop against abstract interfaces or
-\emph{signatures}, while separately library-writers write software which
-implement these interfaces. Backpack was originally described in a
-POPL'14 paper \url{http://plv.mpi-sws.org/backpack/}, but the point of
-this document is to describe the syntax of a language you might actually
-be able to \emph{write}, as well as describe some of the changes to the
-design we've made since this paper.
-
-\paragraph{Examples}
-
-Before we dive in, here are some examples of Backpack files to whet
-your appetite:
+``separate modular development'', where application-writers depend on
+libraries by programming against abstract interfaces instead of specific
+implementations. Our goal is to reduce software coupling, and let the
+type system and compiler assist developers when they are developing
+large software systems. Backpack was originally described in a POPL'14
+paper \url{http://plv.mpi-sws.org/backpack/}, but this document is
+intended to describe the syntax of a language you might actually be able
+to \emph{write}, i.e., from the perspective of a software developer.
+
+\paragraph{Examples of Backpack ``in the large''}
+In the standard practice of large-scale software development today,
+users organize code into libraries. Here is a very simple example of
+some Haskell code structured in this way:
\begin{verbatim}
-package p(A) where
- module A(a) where
- a = True
+unit p where
+ module A where
+ x = True
+ y = False
-package q(B, C) where
+unit q where
include p
- module B(b) where
+ module B where
import A
- b = a
- module C(c) where
- import B
- c = b
+ b = x
\end{verbatim}
-In this example package \verb|p| exports a single module \verb|A|; package \verb|q| exports
-two modules, and includes package \verb|p| so that its modules are in
-scope. The form of a \verb|package| and a \verb|module| are quite similar:
-a package can express an explicit export list of modules in much the same
-way a module exports identifiers. \\
+\verb|p| is a reusable unit (or package/library if you like) which
+provides a single module \verb|A|. \verb|q| depends on \verb|p| by
+directly including it, bringing all of the modules of \verb|p| into scope for
+import.
-Here is a more complicated Backpack file taking advantage of the availability
-of signatures in Backpack. This example omits the actual module bodies: GHC
-will automatically look for them in appropriate files (e.g. \verb|p/Map.hsig|,
-\verb|p/P/Types.hs|, \verb|p/P.hs|, \verb|q/QMap.hs| and \verb|q/Q.hs|).
+There is a downside to writing code this way: without editing your
+source code, you can't actually swap out \verb|p| with another version
+(maybe \verb|p-2.0|, or perhaps even a version of \verb|p| written by someone else).
+Traditionally, this is overcome by relying on an extralinguistic mechanism,
+the \emph{package manager}, which indirects \verb|include p| so that
+it can refer to some other piece of code.
+
+Backpack offers a different, in-language way of expressing dependency
+without committing to an implementation:
\begin{verbatim}
-package p (P) requires (Map) where
- include base
- signature Map
- module P.Types
- module P
+unit q where
+ require p
+ module B where
+ import A
+ b = x
+\end{verbatim}
-package q (Q) where
- include base
- module QMap
- include p (P) requires (Map as QMap)
- module Q
+Or, equivalently:
+
+\begin{verbatim}
+unit p-interface where
+ signature A where
+ x :: Bool
+ y :: Bool
+unit q where
+ include p-interface
+ module B where
+ import A
+ b = x
\end{verbatim}
-Package \verb|p| provides a module \verb|P|, but it also \emph{requires}
-a module \verb|Map| (which must fulfill the specification described
-in \verb|signature Map|). This makes it an \emph{indefinite package}:
-a package which isn't fully implemented, and cannot be compiled into
-executable code until its ``hole'' (\verb|Map|) is filled.
-It also sports an internal module named \verb|P.Types| which
-is missing from the export list, and thus not exported.
+With the \verb|require| keyword, Backpack automatically computes
+\verb|p-interface|, which expresses the abstract interface of \verb|p|.
+\verb|q| utilizes a subset of this interface (unused requirements
+don't ``count''), resulting in this final version of \verb|q|:
+
+\begin{verbatim}
+unit q where
+ signature A where
+ x :: Bool
+ module B where
+ import A
+ b = x
+\end{verbatim}
+
+The technical innovation of Backpack is that the indefinite version of \verb|q|, which
+does not depend on a specific implementation of \verb|p|, still composes
+naturally and in the \emph{same} way that the definite version of
+\verb|q|. That is to say, you can take a hierarchy of libraries that \verb|include| one
+another (today's situation) and replace each \verb|include| with a
+\verb|require|. The result is a reusable set of components with
+explicitly defined requirements. If the inferred requirements are not
+general enough, a signature can be written explicitly.
+
+Finally, the package manager serves a new role: it can be used to select
+a combination of components which fulfill all the requirements. Unlike
+dependency resolution with version numbers, with interfaces the package
+manager can do this \emph{completely} precisely.
+
+\paragraph{Examples of Backpack ``in the small''}
+Although Backpack was originally designed for library-scale development,
+it can also be used for small-scale modular development,
+similar to ML modules. Here is a simple example of writing an
+associated list implementation that is parametrized over the key:
+
+\begin{verbatim}
+ unit assoc-map where
+ signature H where
+ data T
+ eq :: T -> T -> Bool
+ module Assoc where
+ import H
+ mylookup :: T -> [(T, a)] -> Maybe a
+ mylookup x xs = fmap snd (find (eq x) xs)
+ unit main where
+ module AbsInt where
+ type T = Int
+ eq x y = abs x == abs y
+ include assoc-map (Assoc as AbsIntAssoc) requires (T as AbsInt)
+ module Main where
+ import AbsIntAssoc
+ import Prelude
+ main = print (mylookup -1 [(1,"yes"),(-2,"no")])
+\end{verbatim}
+
+For example, in Haskell there are many different kinds of ``string''-like
+data types: \verb|String| (a linked list of characters), \verb|ByteString|
+(an unstructured raw bytestring) and \verb|Text| (a UTF-8 encoded bytestring).
+Many libraries, e.g., parsers, need to work on any string-like input which
+a user might want to give them.
+
+A more full example of a Backpack version of some real library
+code can be found in Appendix~\ref{sec:tagstream-example}.
+
-Package \verb|q|, on the other hand, is a \emph{definite package}; as it
-has no requirements, it can be compiled. When it includes \verb|p|
-into scope, it also fills the \verb|Map| requirement with an
-implementation \verb|QMap|.
\section{The Backpack file}
@@ -162,7 +224,7 @@ modules and signatures.}
A signature, denoted with \verb|signature| in a Backpack file and a file
with the \verb|hsig| extension on the file system, represents a (type) signature for a
Haskell module. It can contain type signatures, data declarations, type
-classes, type class instances and reexports(!), but it cannot contain any
+classes, type class instances and reexports, but it cannot contain any
value definitions.\footnote{Signatures are the backbone of the Backpack
module system. A signature can be used to type-check client code which
uses a module (without the module implementation), or to verify that an
@@ -417,7 +479,7 @@ and, symmetrically, ascription in an include hides identifiers:
... private ... -- ERROR
\end{verbatim}
-\Red{OBSERVATION: thinning is subsumed by transparent signature ascription, but NOT renaming. Thus RENAMING does not commute across signature ascription; you must do it either before or after. Syntax for this is tricky.}
+\Red{Observation: thinning is subsumed by transparent signature ascription, but NOT renaming. Thus RENAMING does not commute across signature ascription; you must do it either before or after. Syntax for this is tricky.}
\paragraph{Syntactic sugar for anonymous packages}
@@ -463,4 +525,99 @@ pkgexp ::= pkgname
| path
\end{verbatim}
+\newpage
+\label{sec:tagstream-example}
+\section{\texttt{tagstream-conduit} example}
+
+When someone recently asked on Haskell Reddit what the ``precise problem Backpack
+addresses'' was, Chris Doner offered a nice example from the
+\verb|tagstream-conduit|. The existing implementation, at \url{http://hackage.haskell.org/package/tagstream-conduit-0.5.5.1/docs/Text-HTML-TagStream-Entities.html}, uses a data type
+to define a ``module'' which is then used to implement two modules in the
+library, a variant for \verb|ByteString| and a variant for \verb|Text|.
+
+Here is how this would be done in Backpack:
+
+\Red{This still contains some syntax which I haven't fully explained.}
+
+\begin{verbatim}
+ -- | This is an ordinary module which defines some types
+ -- which are exported by the package.
+ module Text.HTML.TagStream.Types where
+ data Token' s = Text s
+ | ...
+
+ -- | This provides a 'Decoder' implementation for 'ByteString's.
+ -- We define 'Str' to be 'ByteString' and implement a few
+ -- functions manually, as well as reexport existing functions
+ -- from some libraries. We don't plan to publically export
+ -- these from the package.
+ module Decoder.ByteString (
+ module Decoder.ByteString,
+ Builder, break, drop, uncons
+ ) where
+ import Data.ByteString
+ import Data.ByteString.Builder
+ type Str = ByteString
+ toStr = toByteString
+ fromStr = fromByteString
+ decodeEntity = ...
+
+ -- | This provides a 'Decoder' implementation for 'Text', much
+ -- the same as 'Decoder.ByteString'.
+ module Decoder.Text (
+ module Decoder.Text,
+ Builder, break, drop, uncons
+ ) where
+ import Data.Text
+ import Data.Text.Lazy.Builder
+ type Str = Text
+ toStr = toText
+ fromStr = fromText
+ decodeEntity = ...
+
+ -- | This defines the "functor"; the module implementing entity
+ -- decoding, 'Entities', is parametrized by an abstract interface
+ -- 'Decoder' which describes two types, 'Str' and 'Builder', as
+ -- well as operations on them which are avaiable for the implementation.
+ unit decoder where
+ signature Decoder where
+ data Str
+ data Builder
+ toStr :: Builder -> Str
+ break :: (Char -> Bool) -> Str -> (Str, Str)
+ fromStr :: Str -> Builder
+ drop :: Int -> Str -> Str
+ decodeEntity :: Str -> Maybe Str
+ uncons :: Str -> Maybe (Char, Str)
+ module Entities where
+ import Text.HTML.TagStream.Types
+ import Data.Conduit
+ import Decoder
+ decodeEntities :: Monad m => Conduit (Token' Str) m (Token' Str)
+ decodeEntities = ...
+
+ -- | Finally, these two lines instantiate 'Entities' twice;
+ -- once with 'Decoder.ByteString', and once as 'Decoder.Text'.
+ include decoder (Entities as Text.HTML.TagStream.ByteString)
+ requires (Decoder as Decoder.ByteString)
+ include decoder (Entities as Text.HTML.TagStream.Text)
+ requires (Decoder as Decoder.Text)
+\end{verbatim}
+
+Without having the source-code inline, the out-of-line Backpack file
+would look something like this:
+
+\begin{verbatim}
+ module Text.HTML.TagStream.Types -- Text/HTML/TagStream/Types.hs
+ module Decoder.ByteString -- Decoder/ByteString.hs
+ module Decoder.Text -- Decoder/Text.hs
+ unit decoder where
+ signature Decoder -- decoder/Decoder.hsig
+ module Entities -- decoder/Entities.hs
+ include decoder (Entities as Text.HTML.TagStream.ByteString)
+ requires (Decoder as Decoder.ByteString)
+ include decoder (Entities as Text.HTML.TagStream.Text)
+ requires (Decoder as Decoder.Text)
+\end{verbatim}
+
\end{document}