\documentclass{article} \usepackage{pifont} \usepackage{graphicx} %[pdftex] OR [dvips] \usepackage{fullpage} \usepackage{wrapfig} \usepackage{float} \usepackage{titling} \usepackage{hyperref} \usepackage{tikz} \usepackage{color} \usepackage{footnote} \usepackage{float} \usepackage{algorithm} \usepackage{algpseudocode} \usetikzlibrary{arrows} \usetikzlibrary{positioning} \setlength{\droptitle}{-6em} \newcommand{\Red}[1]{{\color{red} #1}} \input{commands-new-new.tex} \newcommand{\nuAA}{\nu_\mathit{AA}} \newcommand{\nuAB}{\nu_\mathit{AB}} \newcommand{\nuGA}{\nu_\mathit{GA}} \newcommand{\nuGB}{\nu_\mathit{GB}} \newcommand{\betaPL}{\beta_\mathit{PL}} \newcommand{\betaAA}{\beta_\mathit{AA}} \newcommand{\betaAS}{\beta_\mathit{AS}} \newcommand{\thinandalso}{\hspace{.45cm}} \newcommand{\thinnerandalso}{\hspace{.38cm}} \input{commands-rebindings.tex} \newcommand{\var}[1]{\textsf{#1}} \newcommand{\ghcfile}[1]{\textsl{#1}} \title{$\mu$Backpack} \begin{document} \maketitle $\mu$Backpack is a simplified version of Backpack which is easier to implement and explain. Changes: \begin{itemize} \item A reexport may not be used to implement a declaration in a signature, \emph{unless} the implementation comes textually before all signature, including included signatures (i.e., absent mutual recursion.) \item Type identity is computed by considering \emph{all} of the holes in the package. \item No aliases, unless shaping is used. \item Mutual recursion is not permitted, unless shaping is used. \end{itemize} Generally speaking, you are subject to \verb|hs-boot| style restrictions when mutual recursion is used, and no restrictions otherwise. \section{Things that don't work} \paragraph{No re-exports} Signatures cannot be implemented using a module which reexports the entity in question, unless the implementation preceeds the signature. This example is rejected: \begin{example} \Pdef{p}{ \Psig{A}{x :: Bool} \Pmod{B}{x = True} \Pmod{A}{\Mimp{B}; \Mexp{x}} } \end{example} This restriction is the same as the restriction in \verb|hs-boot| files. \paragraph{No module-level granularity} In the following example, \verb|B1.B| and \verb|B2.B| are not type-equal: \begin{example} \Pdef{p}{ \Psig{A}{data A} \Pmod{B}{data B = B} } & \Pdef{q}{ \Pmod{A1}{data A = A1} \Pmod{A2}{data A = A2} \Pinct{p}{\Emvp{A}{A1}, \Emvp{B}{B1}} \Pinct{p}{\Emvp{A}{A2}, \Emvp{B}{B2}} } \end{example} \paragraph{No inter-package mutual recursion} Inter-package mutual recursion requires a separate shaping pass to determine module identities (for same-package mutual recursion, the identity is known in advance.) \section{Worked example} \begin{example} \Pdef{p}{ \Psig{A}{data A; x :: A} \Pmod{B}{\Mimp{A}; data B = B A} } & \Pdef{q}{ \Psig{A}{data A; y :: A} \Pmod{C}{\Mimp{A}; data C = C A} } \\ \Pdef{r}{ \Pmod{P}{data A = A; x = A} \Pmod{A2}{\Mimp{P}; \Mexp{A}} \Pinc{p} \Pinc{q} % \Pref{A3}{A} \Pinct{p}{\Emvp{A}{A2}, \Emvp{B}{B2}} \Pmodbig{D}{\ldots \textit{stuff} \ldots} } \end{example} This example exercises a few features that Backpack provides: \begin{itemize} \item \verb|A| is never instantiated, so we can only typecheck these packages. \item \verb|A| from \verb|p| and \verb|q| are linked together. \item A fresh instance of \verb|A| from \verb|p| is linked against \verb|A2|. %\item \verb|A| is aliased with \verb|A3|, so the same hole is exported from a package under different names. \end{itemize} At the end of type-checking \verb|r|, we have the following mapping of module names to identities: \begin{itemize} \item \mname{A} has identity $\alpha_A$. When we import \mname{A}, we actually see \emph{two} interfaces: one which exports $\alpha_{A}.$\verb|A| and $\alpha_{A}.$\verb|x| (from \pname{p}), and one which exports $\alpha_{A}.$\verb|A| and $\alpha_{A}.$\verb|y| (from \pname{q}). An import of \mname{A} effectively imports the merge of these two interfaces, but this merging process is left to be handled at the Haskell source level.\footnote{From an implementation perspective, this is helpful because otherwise, it's not really clear who should actually do signature merging. Deferring merges to import time is convenient to implement.} The mapping of interfaces is a local property: for example, when type-checking \mname{B}, only one interface is available (the one from \pname{p}). \item \mname{B} has identity $\nu_B$, where $\nu_B=$ \pname{p}(\mname{A} $\mapsto\alpha_A$):\mname{B} (read this as, ``module \mname{B} from package \pname{p} with \mname{A} instantiated with $\alpha_A$''). Its interface exports $\nu_{B}.$\verb|B| with a single constructor $\nu_{B}.$\verb|B| $\cc$ $\alpha_{A}.$\verb|A| $\rightarrow$ $\nu_{B}.$\verb|B| \item \mname{C} has identity $\nu_C$, where $\nu_C=$ \pname{q}(\mname{A} $\mapsto\alpha_A$):\mname{C}. Its interface exports $\nu_{C}.$\verb|C| with a single constructor $\nu_{C}.$\verb|C| $\cc$ $\alpha_{A}.$\verb|A| $\rightarrow$ $\nu_{C}.$\verb|C| \item \mname{P} has identity $\nu_P$, where $\nu_P=$ \pname{r}(\mname{A} $\mapsto\alpha_A$):\mname{P}. Its interface exports $\nu_{P}.$\verb|A| and $\nu_{P}.$\verb|x|. \item \mname{A2} has identity $\nu_{A2}$, where $\nu_{A2}=$ \pname{r}(\mname{A} $\mapsto\alpha_A$):\mname{A2}. Its interface exports $\nu_{P}.$\verb|A| (nota bene, this is \emph{not} $\nu_{A2}.$\verb|A|!) \item \mname{B2} has identity $\nu_{B2}$, where $\nu_{B2}=$ \pname{p}(\mname{A} $\mapsto\nu_{A2}$):\mname{B}. Its interface exports $\nu_{B2}.$\verb|B| with a single constructor $\nu_{B2}.$\verb|B| $\cc$ $\alpha_{A}.$\verb|A| $\rightarrow$ $\nu_{B2}.$\verb|B| %\item \mname{A3} is $\alpha_A$ \item \mname{D} has identity $\nu_{D}$, where $\nu_{D}=$ \pname{r}(\mname{A} $\mapsto\alpha_A$):\mname{D}. Its interface exports stuff. \end{itemize} \section{The rules} \paragraph{Nomenclature} We use \emph{interface} ($\tau$) to refer to the type of a module. \paragraph{The environment} While type-checking a package, we have the following pieces of information: \begin{itemize} \item A package environment $\Delta$, which holds the package definitions of all packages in scope. \item The current package name $P$ and hole instantiation $\mathcal{H}$, such that for any module $m$ defined in the package, $P(\mathcal{H})$:$m$ is the identity of that module (package level granularity). \item The package context $\Gamma$. This is a mapping from a module name to its identity and interfaces (plural!), accumulated from the modules/signatures we have type-checked and put into scope under this name. \item \textbf{Optional.} An implementation cache $\Phi$. This is a mapping from module identity to interface for each identity that has a final implementation type-checked. This context is used to determine if an implementation comes textually before a signature (if so, when type checking the signature, the module will be present in the implementation cache). In the absence of this cache, reexports are never allowed in implementations (the \verb|hs-boot| restriction always applies). \end{itemize} To begin type-checking of a particular indefinite package, generate fresh module variables $\alpha$ for each hole of the package (this is the only time module variables are generated). Set $P$ to the name of the package, and $\mathcal{H}$ to be a mapping of each hole to the fresh module identity variables and begin type-checking each binding starting with an empty package context and implementation cache. \paragraph{Calculate holes} As a pre-processing step, we elaborate the syntax with the holes each package requires. This proceeds recursively: the holes of a package are any signatures it defines, as well as the renamed holes of any packages it includes, minus any implemented modules. We assume for any package definition in the package environment $\Delta$, we can tell what holes it requires. In the worked example, the set of holes for each of the packages is just \mname{A}. \paragraph{Typing modules} A module definition $m = [M]$ is straightforwardly type-checked by using the Haskell level typing judgment $\Gamma; \nu_0 \vdash M : \tau$, where $\nu_0 = P(\mathcal{H})$:$m$. The resulting interface $\tau$ is brought into the package context as the logical binding $m \mapsto \nu_0@\tau$; we also record $\nu_0 : \tau$ in the implementation cache. \paragraph{Typing signatures} A signature definition $m :: [S]$ is type-checked with the Haskell level typing judgment $\Gamma; \nu_0 \vdash S : \tau$, where $\nu_0 = \mathcal{H}(m)$ if this hole was instantiated externally, or $\nu_0 = \Gamma(m)$ if it was instantiated internally. If the identity $\mathcal{H}(m)$ is already in the physical cache $\Phi$, then the identities of the declarations in the signature are taken from the interface $\Phi(\mathcal{H}(m))$ (with the types checked for consistency); otherwise, they are given \emph{fresh} identities originating from $\mathcal{H}(m)$ (in the same manner as an hs-boot file). The resulting type $\tau$ is brought into the package context as the logical binding $p \mapsto \mathcal{H}(m)@\tau$. From the worked example, when \mname{A} is type-checked from \pname{p} of the first inclusion, $\mathcal{H}($\mname{A}$) = \alpha_A$, there is no implementation in $\Phi$, so the type \verb|T| is assigned the original name $\alpha_A.$\verb|T|. In the second inclusion, $\mathcal{H}($\mname{A}$) =$ \pname{r}(\mname{A} $\mapsto\alpha_A$):\mname{A2}, which is implemented, so we look at the original name of \verb|T| in \mname{A2} and assign our type \verb|T| the same: \pname{r}(\mname{A} $\mapsto\alpha_A$):\mname{P}.\verb|T| (notably, \mname{A2} is no where to be seen in this identifier, as it would be under the rule for holes.) \paragraph{Typing includes} An include of package $P$ with renaming $r$ is typed by recursively typechecking the source pointed to by $P$ ($\Delta(P)$), but with an adjusted $\mathcal{H}'$ given thinning and renaming (and an empty context $\Gamma'$. In particular: the new $\mathcal{H}'$ is computed by taking the holes of $P$, renaming them according to $r$, and then finding the identities in $\mathcal{H}$ or $\Gamma$. In the case of cross-package mutual recursion, this lookup would fail (the shaping pass serves as an oracle which provides the correct identity). In the worked example, we type-check with the following parameters: \begin{itemize} \item Top-level: $P=$ \pname{r}, $\mathcal{H}=A\mapsto\alpha_A$ \item \verb|include p|: $P=$ \pname{p}, $\mathcal{H}=A\mapsto\alpha_A$ \item \verb|include q|: $P=$ \pname{q}, $\mathcal{H}=A\mapsto\alpha_A$ \item \verb|include p (A as A2, B as B2)|: $P=$ \pname{p}, $\mathcal{H}=A\mapsto\nu_{A2}$ \end{itemize} After recursively typechecking, we are left with a new context $\Gamma'$. The logical mapping of this context is renamed according to $r$, and then added to our current context (along with the implementation cache). This is a non-compositional typing rule. \paragraph{Merging package contexts} If you have two bindings $m \mapsto \nu:\overline{\tau}$; check that the identities $\nu$ agree, and that all of the types are consistent with one another. If so, create a new binding with the lists of $\tau$ concatenated together. As an optimization, if one $\tau$ is from the final implementation, all other $\tau$s can be discarded (as they are guaranteed to be subsets of the final $\tau$). \paragraph{Merging implementation caches} Cache merging is a trivial union; given two equal physical module identities their interfaces are guaranteed to be the same. \section{Complications} \subsection{Signature consistency} Suppose that you want to use a signature, but you don't want to export it: \begin{example} \Pdeft{p}{\m{P}}{ \Psig{A}{data A = A Bool} \Pmod{P}{\Mimp{A}; \ldots} } & \Pdef{q}{ \Pinc{p} \Pmod{Q}{\Mimp{P}; \ldots} } \end{example} Evidently, we can't \emph{completely} eliminate \m{A}, since eventually we'll need to fill it in with an implementation. Thus, the thinning must actually be shorthand for making a logical mapping in the context to an \emph{empty} list of interfaces. In this example, in the context visible from \m{Q}, there is a binding for \m{A} which has module identity $\alpha_A$ but no interfaces. \paragraph{The bug} If we modify \pname{q} to be: \begin{example} \Pdef{q}{ \Pinc{p} \Psig{A}{data A = A Int} \Pmod{Q}{\Mimp{P}; \ldots} } \end{example} Both \m{A} from \pname{p} and \m{A} from \pname{q} have the same module identity, but their signatures are incompatible. However, while type-checking, we only check for signature consistency when (1) there is a backing implementation, and (2) when merging logical contexts. The first check is inapplicable in this case (\m{A} is a hole), and the second check does not see the inconsistency, because $\tau$ from \pname{p} was dropped. \paragraph{The fix} The correct approach is to \emph{keep} the types in the mapping (for consistency checking), but mark them as hidden (so they are not loaded on import). \subsection{Hole duplication} Hole duplication is when an indefinite module identity $\alpha$ is available from a package under multiple names. While this is technically disallowed under the current syntax in the absence of aliases, there are easy to imagine syntactic extensions which achieve this, e.g. \begin{example} \Pdeft{p}{\m{P}}{ \Psig{A}{data A} } & \Pdef{q}{ \Pinct{p}{\Emvp{A}{A1}, \Emvp{A}{A2}} \Pinct{p}{\Emvp{A}{A3}} } \end{example} In this example, \m{A1} and \m{A2} have the same identity, while \m{A3} has a different identity. Another example would be to have the multiple renaming occur in the thinning specification of a package. \paragraph{The bug} If the holes of \pname{q} are specified to be \m{A1}, \m{A2} and \m{A3}, then when type-checking \pname{q} we will create three fresh identities, and \m{A1}.\verb|A| and \m{A2}.\verb|A| will not unify, even though they should. \paragraph{The fix} For every module identity, there must be a module name which is nominated as the \emph{principal} hole: this is the only module which is included in the list of holes. This is the \emph{only} hole which can be linked against to set the identity of the hole; any other occurrence of the module identity is ineligible. Thus, in the previous example, if \m{A1} is the principal hole, then the holes of \pname{q} are \m{A1} and \m{A3}, and there is one valid linking: \begin{example} \Pdef{r-ok}{ \Pmod{A1}{data A = A} \Pinc{q} } & \Pdef{r-failed}{ \Pmod{A2}{data A = A} \Pinc{q} } \end{example} \subsection{Module binding reordering} We stated that re-exports are not allowed if the implementation preceeds the signature, as in this example: \begin{example} \Pdef{p}{ \Psig{A}{x :: Bool} \Pmod{C}{\Mimp{A}; y = True} \Pmod{B}{x = True} \Pmod{A}{\Mimp{B}; y = False; \Mexp{x, y}} \Pmod{D}{\Mimp{A}; \Mimp{C}; \ldots} } \end{example} However, in this particular example, \m{B} does not depend on the signature of \m{A}, so the bindings in this package could be reordered to restore ordering: \begin{example} \Pdef{p}{ \Pmod{B}{x = True} \Pmod{A}{\Mimp{B}; y = False; \Mexp{x, y}} \Psig{A}{x :: Bool} \Pmod{C}{\Mimp{A}; y = True} \Pmod{D}{\Mimp{A}; \Mimp{C}; \ldots} } \end{example} This reordering is always possible as long as there is no mutual recursion. \paragraph{The bug} We've changed the semantics of the package: in the original example, only \verb|x| was visible in \m{C}; in the reordered example, both \verb|x| and \verb|y| are visible. \paragraph{The fix} Float implementations to be type-checked as early as possible, but do not add the resulting interface to the package context until you arrive at their original definition site: \begin{example} \Pdef{p}{ \Pmod{B}{\textit{-- hidden}; x = True} \Pmod{A}{\textit{-- hidden}; \Mimp{B}; y = False; \Mexp{x, y}} \Psig{A}{x :: Bool} \Pmod{C}{\Mimp{A}; y = True} \Pmod{B}{\textit{-- visible }} \Pmod{A}{\textit{-- visible }} \Pmod{D}{\Mimp{A}; \Mimp{C}; \ldots} } \end{example} In this example, the \m{A} signature is type-checked with $\Gamma($\m{A}$) = $\pname{p}:\m{A}, which is present in the implementation cache; however, \m{C} is type-checked in a package context that only has $\tau$ from the signature, and not the implementation. \end{document}