summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorEdward Z. Yang <ezyang@cs.stanford.edu>2015-05-13 16:16:39 -0700
committerEdward Z. Yang <ezyang@cs.stanford.edu>2015-05-14 15:43:05 -0700
commit59720370849c0e15f207f059e54b21e4f8e05d82 (patch)
tree938932d0f834677b92b70dda305bb9041a093157
parent3ef7fcedfa1ad47968ca5fa107d51a6ab7051ed7 (diff)
downloadhaskell-59720370849c0e15f207f059e54b21e4f8e05d82.tar.gz
Backpack docs: Rewrite type checking section to have a more concrete plan.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
-rw-r--r--docs/backpack/algorithm.pdfbin272423 -> 279771 bytes
-rw-r--r--docs/backpack/algorithm.tex425
2 files changed, 225 insertions, 200 deletions
diff --git a/docs/backpack/algorithm.pdf b/docs/backpack/algorithm.pdf
index 05738bc3d9..557bdf23f4 100644
--- a/docs/backpack/algorithm.pdf
+++ b/docs/backpack/algorithm.pdf
Binary files differ
diff --git a/docs/backpack/algorithm.tex b/docs/backpack/algorithm.tex
index 1d074a5fc5..f3828b2adc 100644
--- a/docs/backpack/algorithm.tex
+++ b/docs/backpack/algorithm.tex
@@ -43,7 +43,7 @@
\newenvironment{aside}
{\begin{figure}\def\FrameCommand{\hspace{2em}}
- \MakeFramed {\advance\hsize-\width}\optionrule\small}
+ \MakeFramed{\advance\hsize-\width}\optionrule\small}
{\par\vskip-\smallskipamount\optionrule\endMakeFramed\end{figure}}
\setlength{\droptitle}{-6em}
@@ -168,7 +168,7 @@ Presently, however, such an \I{OccName} annotation would be redundant: it can be
\begin{aside}
\textbf{Holes of a package are a mapping, not a set.} Why can't the \I{PkgKey} just record a
-set of \I{Module}s, e.g. $\I{PkgKey} ::= \I{SrcPkgKey} \; \verb|{| \; \I{Module} \; \verb|}|$? Consider:
+set of \I{Module}s, e.g. $\I{PkgKey}\;::= \; \I{SrcPkgKey} \; \verb|{| \; \I{Module} \; \verb|}|$? Consider:
\begin{verbatim}
package p (A) requires (H1, H2) where
@@ -273,7 +273,7 @@ A module declaration provides a module \verb|THIS:M| at module name \verb|M|. It
has the shape:
\begin{verbatim}
- provides: M -> THIS:M { exports of renamed M }
+ provides: M -> THIS:M { exports of renamed M under THIS:M }
requires: (nothing)
\end{verbatim}
Example:
@@ -293,7 +293,7 @@ A signature declaration creates a requirement at module name \verb|M|. It has t
\begin{verbatim}
provides: (nothing)
- requires: M -> { exports of renamed M }
+ requires: M -> { exports of renamed M under HOLE:M }
\end{verbatim}
\noindent Example:
@@ -448,7 +448,7 @@ proceeds as follows:
$p$.} For each requirement $M$ of $q$ that is provided by $p$ (in particular,
all of its required \verb|Name|s are provided),
substitute each \I{Module} occurrence of \verb|HOLE:M| with the
- provided $p(M)$, unify the names, and remove the requirement from $q$.
+ provided $p\verb|(|M\verb|)|$, unify the names, and remove the requirement from $q$.
If the names of the provision are not a superset of the required names, error.
\item If mutual recursion is supported, \emph{fill every requirement of $p$ with provided modules from $q$.}
\item \emph{Merge leftover requirements.} For each requirement $M$ of $q$ that is not
@@ -972,9 +972,11 @@ $$
\I{ModIface} & ::= & \verb|module| \; \I{Module} \; \verb|(| \I{mi\_exports} \verb|)| \; \verb|where| \\
& & \qquad \I{mi\_decls} \\
& & \qquad \I{mi\_insts} \\
-\I{mi\_exports} & ::= & \I{AvailInfo}_0 \verb|,|\, \ldots \verb|,|\, \I{AvailInfo}_n \\
-\I{mi\_decls} & ::= & \I{IfaceDecl}_0 \verb|;|\, \ldots \verb|;|\, \I{IfaceDecl}_n \\
-\I{mi\_insts} & ::= & \I{IfaceClsInst}_0 \verb|;|\, \ldots \verb|;|\, \I{IfaceClsInst}_n \\[1em]
+& & \qquad \I{dep\_orphs} \\
+\I{mi\_exports} & ::= & \I{AvailInfo}_0 \verb|,|\, \ldots \verb|,|\, \I{AvailInfo}_n & \mbox{Export list} \\
+\I{mi\_decls} & ::= & \I{IfaceDecl}_0 \verb|;|\, \ldots \verb|;|\, \I{IfaceDecl}_n & \mbox{Defined declarations} \\
+\I{mi\_insts} & ::= & \I{IfaceClsInst}_0 \verb|;|\, \ldots \verb|;|\, \I{IfaceClsInst}_n & \mbox{Defined instances} \\
+\I{dep\_orphs} & ::= & \I{Module}_0 \verb|;|\, \ldots \verb|;|\, \I{Module}_n & \mbox{Transitive orphan dependencies} \\[1em]
\multicolumn{3}{l}{\mbox{\bf Interface declarations}} \\
\I{IfaceDecl} & ::= & \I{OccName} \; \verb|::| \; \I{IfaceId} \\
& | & \verb|data| \; \I{OccName} \; \verb|=| \;\ \I{IfaceData} \\
@@ -987,7 +989,8 @@ $$
\caption{Module interfaces in GHC} \label{fig:typecheck}
\end{figure}
-Type checking an indefinite package (a package with holes) involves
+In general terms,
+type checking an indefinite package (a package with holes) involves
calculating, for every module, a \I{ModIface} representing the
type/interface of the module in question (which is serialized
to disk). The general form of these
@@ -1017,33 +1020,16 @@ the \I{PkgType} is:
-- where THIS = p(H -> HOLE:H)
\end{verbatim}
%
-However, a \I{PkgType} of \I{ModIface}s is not the whole story:
-when a package has holes, a \I{PkgType} specified in this manner
-defines a family of possible types, based on how the holes are
-shaped and instantiated. A package which writes \verb|include p requires (H as S)|
-and has a sharing constraint of \verb|H.T| with \verb|q():B.T| may
-end up with this ``type'':
-
-\begin{verbatim}
- module HOLE:S (q():B.T) where
- module THIS:A (THIS:A.S, q():B.T) where
- data S = S q():B.T
- -- where THIS = p(H -> HOLE:S)
-\end{verbatim}
-%
-Furthermore, for ease of implementation, GHC prefers to resolve all
-indirect \I{Name} references (which are just strings) into a
-\I{ModIface} into direct \I{TyThing} references (which are data
-structures that have type information). This resolution is done lazily
-with some hackery!
-
-Thus, given a shaping and a hole instantiation, a \I{ModIface} can be
-converted into an in-memory \I{ModDetails}, described in
-Figure~\ref{fig:typecheck-more}. (Technically, the \I{Module} does not
-have to be recorded as it is recorded in the \I{Name} associated with a
-\I{TyThing}; so you can think of a \I{ModDetails} as a big bag of type-checkable
-entities which GHC knows about; in the source code this
-is referred to as the \emph{external package state (EPT)}).
+However, while it is true that the \I{ModIface} is the final result
+of type checking, we actually are conflating two distinct concepts: the user-visible
+notion of a \I{ModuleName}, which, when imported, brings some \I{Name}s
+into scope (or could trigger a deprecation warning, or pull in some
+orphan instances\ldots), versus the actual declarations, which, while recorded
+in the \I{ModIface}, have an independent existence: even if a declaration
+is not visible for an import, we may internally refer to its \I{Name}, and
+need to look it up to find out type information. (A simple case when
+this can occur is if a module exports a function with type \verb|T -> T|,
+but doesn't export \verb|T|).
\begin{figure}[htpb]
$$
@@ -1059,191 +1045,230 @@ $$
\caption{Semantic objects in GHC} \label{fig:typecheck-more}
\end{figure}
-Type checking, thus, is a delicate balancing act between module
-interfaces and our semantic objects. Given a shaping, here
-are some important operations we have to do in type checking:
-
-\paragraph{Imports} When a module/signature imports a module name,
-we must consult the exports of \I{ModIface}s associated with this
-module name, modulo what we calculated into the shaping pass.
-(In fact, we can get all of this information directly from the
-shaping pass.)
+Thus, a \I{ModIface} can be type-checked into a \I{ModDetails}, described in
+Figure~\ref{fig:typecheck-more}. Notice that a \I{ModDetails} is just
+a bag of type-checkable entities which GHC knows about. We
+define the \emph{external package state (EPT)} to
+simply be the union of the \I{ModDetails}
+of all external modules.
+
+Type checking is a delicate balancing act between module
+interfaces and our semantic objects. A \I{ModIface} may get
+type-checked multiple times with different hole instantiations
+to provide multiple \I{ModDetails}.
+Furthermore complicating matters
+is that GHC does this resolution \emph{lazily}: a \I{ModIface}
+is only converted to a \I{ModDetails} when we are looking up
+the type of a \I{Name} that is described by the interface;
+thus, unlike usual theoretical treatments of type checking, we can't
+eagerly go ahead and perform substitutions on \I{ModIface}s when
+they get included.
+
+In a separate compiler like GHC, there are two primary functions we must provide:
+
+\paragraph{\textit{ModuleName} to \textit{ModIface}} Given a \I{ModuleName} which
+was explicitly imported by a user, we must produce a \I{ModIface}
+that, among other things, specifies what \I{Name}s are brought
+into scope. This is used by the renamer to resolve plain references
+to identifiers to real \I{Name}s. (By the way, if shaping produced
+renamed trees, it would not be necessary to do this step!)
+
+\paragraph{\textit{Module} to \textit{ModDetails}/EPT} Given a \I{Module} which may be
+a part of a \I{Name}, we must be able to type check it into
+a \I{ModDetails} (usually by reading and typechecking the \I{ModIface}
+associated with the \I{Module}, but this process is involved). This
+is used by the type checker to find out type information on things. \\
+
+There are two points in the type checker where these capabilities are exercised:
+
+\paragraph{Source-level imports} When a user explicitly imports a
+module, the \textit{ModuleName} is mapped to a \textit{ModIface}
+to find out what exports are brought into scope (\I{mi\_exports})
+and what orphan instances must be loaded (\I{dep\_orphs}). Additionally,
+the \textit{Module} is loaded to the EPT to bring instances from
+the module into scope.
+
+\paragraph{Internal name lookup} During type checking, we may have
+a \I{Name} for which we need type information (\I{TyThing}). If it's not already in the
+EPT, we type check and load
+into the EPT the \I{ModDetails} of the \I{Module} in the \I{Name},
+and then check the EPT again. (\verb|importDecl|)
+
+\subsection{\textit{ModName} to \textit{ModIface}}
+
+In all cases, the \I{mi\_exports} can be calculated directly from the
+shaping process, which specifies exactly for each \I{ModName} in scope
+what will be brought into scope.
+
+\paragraph{Modules} Modules are straightforward, as for any
+\I{Module} there is only one possibly \I{ModIface} associated
+with it (the \I{ModIface} for when we type-checked the (unique) \verb|module|
+declaration.)
+
+\paragraph{Signatures} For signatures, there may be multiple \I{ModIface}s
+associated with a \I{ModName} in scope, e.g. in this situation:
-\paragraph{Includes and name lookups} When we include a package,
-take all of the \I{ModIface}s it brings into scope, type-check
-them with respect to the instantiation of holes and shaping, and add
-them to the EPT.
-
-Even better, this process should be done lazily on name lookup.
-When we have a renamed identifier, e.g. a \I{Name},
-we first check if we know about this object in EPT, and if not,
-we must find the \I{ModIface}(s) (plural!) that would have brought the object into
-scope, add them to EPT and try again. The process of adding semantic
-objects to the EPT may cause us to learn \emph{more} information
-about an object than we knew previously.
-
-\paragraph{Cross-package compilation} When we begin type-checking a new
-indefinite package, we must \emph{clear} all \I{ModDetails} which depend on
-holes. This is because shaping may cause the type-checked entities to refer
-to different semantic objects.
-
-\subsection{The basic plan}
+\begin{verbatim}
+ package p where
+ signature S where
+ data A
+ package q where
+ include p
+ signature S where
+ data B
+ module M where
+ import S
+\end{verbatim}
+%
+Each literal \verb|signature| has a \I{ModIface} associated with it; and
+the import of \verb|S| in \verb|M|, we want to see the \emph{merged}
+\I{ModIface}s. We can determine the \I{mi\_exports} from the shape,
+but we also need to pull in orphan instances for each signature, and
+produce a warning for each deprecated signature.
-Given a module or signature of a package, we can type check given these two assumptions:
+\begin{aside}
+\textbf{Does hiding a signature hide its orphans.} Suppose that we have
+extended Backpack to allow hiding signatures from import.
-\begin{itemize}
- \item We have a renamed syntax tree, whose identifiers have been
- resolved as according to the result of the shaping pass.
- \item For any \I{Name} in the renamed tree, the corresponding
- \I{ModDetails} for the \I{Module} has been loaded
- (or can be lazily loaded).
-\end{itemize}
+\begin{verbatim}
+ package p requires (H) where -- H is hidden from import
+ module A where
+ instance Eq (a -> b) where -- orphan
+ signature H {-# DEPRECATED "Don't use me" #-} where
+ import A
-The result of type checking is a \I{ModDetails} which can then be
-converted into a \I{ModIface}, because we assumed each signature
-to serve as an uninstantiated hole (thus, the computed \I{ModDetails} is
-in its most general form).
-Arranging for these two assumptions to be true is the bulk of the
-complexity of type checking.
+ package q where
+ include p
+ signature H where
+ data T
+ module M where
+ import H -- warn deprecated?
+ instance Eq (a -> b) -- overlap?
+\end{verbatim}
-\subsection{Loading modules from indefinite packages}
+It is probably the most consistent to not pull in orphan instances
+and not give the deprecated warning: this corresponds to merging
+visible \I{ModIface}s, and ignoring invisible ones.
+\end{aside}
-\paragraph{Everything is done modulo a shape} Consider
-this package:
+\subsection{\textit{Module} to \textit{ModDetails}}
-\begin{verbatim}
-package p where
- signature H(T) where
- data T = T
- module A(T) where
- data T = T
- signature H(T) where
- import A(T)
+\paragraph{Modules} For modules, we have a \I{Module} of
+the form $\I{p}\verb|(|m\; \verb|->|\; \I{Module}\verb|,|\, \ldots\verb|)|$,
+and we also have a unique \I{ModIface}, where each hole instantiation
+is $\verb|HOLE:|m$.
--- provides: A -> THIS:A { THIS:A.T }
--- H -> HOLE:H { THIS:A.T }
--- requires: H -> { THIS:A.T }
-\end{verbatim}
+To generate the \I{ModDetails} associated with the specific instantiation,
+we have to type-check the \I{ModIface} with the following adjustments:
-With this shaping information, when we are type-checking the first
-signature for \verb|H|, it is completely wrong to try to create
-a definition for \verb|HOLE:H.T|, since we know that it refers
-to \verb|THIS:A.T| via the requirements of the shape. This applies
-even if \verb|H| is included from another package. Thus, when
-we are loading \I{ModDetails} into memory, it is always done
-\emph{with respect to some shaping}. Whenever you reshape,
-you must clear the module environment.
+\begin{enumerate}
+ \item Perform a \I{Module} substitution according to the instantiation
+ of the \I{ModIface}'s \I{Module}. (NB: we \emph{do}
+ substitute \verb|HOLE:A.x| to \verb|HOLE:B.x| if we instantiated
+ \verb|A -> HOLE:B|, \emph{unlike} the disjoint
+ substitutions applied by shaping.)
+ \item Perform a \I{Name} substitution as follows: for any name
+ with a package key that is a $\verb|HOLE|$,
+ substitute with the recorded \I{Name} in the requirements of the shape.
+ Otherwise, look up the (unique) \I{ModIface} for the \I{Module},
+ and subsitute with the corresponding \I{Name} in the \I{mi\_exports}.
+\end{enumerate}
-\paragraph{Figuring out where to consult for shape information}
+\paragraph{Signatures} For signatures, we have a \I{Module} of the form
+$\verb|HOLE:|m$. Unlike modules, there are multiple \I{ModIface}s associated with a hole.
+We distinguish each separate \I{ModIface} by considering the full \I{PkgKey}
+it was defined in, e.g. \verb|p(A -> HOLE:C, B -> q():B)|; call this
+the hole's \emph{defining package key}; the set of \I{ModIface}s for a hole
+and their defining package keys can easily be calculated during shaping.
-For this example, let's suppose we have already type-checked
-this package \verb|p|:
+To generate the \I{ModDetails} associated with a hole, we type-check each
+\I{ModIface}, with the following adjustments:
-\begin{verbatim}
-package p (A) requires (S) where
- signature S where
- data S
- data T
- module A(A) where
- import S
- data A = A S T
-\end{verbatim}
+\begin{enumerate}
+ \item Perform a \I{Module} substitution according to the instantiation
+ of the defining package key. (NB: This may rename the hole itself!)
+ \item Perform a \I{Name} substitution as follows, in the same manner
+ as would be done in the case of modules.
+ \item When these \I{ModDetails} are merged into the EPT, some merging
+ of duplicate types may occur; a type
+ may be defined multiple times, in which case we check that each
+ definition is compatible with the previous ones. A concrete
+ type is always compatible with an abstract type.
+\end{enumerate}
-giving us the following \I{ModIface}s:
+\paragraph{Invariants} When we perform \I{Name} substitutions, we must be
+sure that we can always find out the correct \I{Name} to substitute to.
+This isn't obviously true, consider:
\begin{verbatim}
-module HOLE:S.S where
- data S
- data T
-module THIS:A where
- data A = A HOLE:S.S HOLE:S.T
--- where THIS = p(S -> HOLE:S)
+ package p where
+ signature S(foo) where
+ data T
+ foo :: T
+ module M(bar) where
+ import S
+ bar = foo
+ package q where
+ module A(T(..)) where
+ data T = T
+ foo = T
+ module S(foo) where
+ import A
+ include p
+ module A where
+ import M
+ ... bar ...
\end{verbatim}
-
-Next, we'd like to type check a package which includes \verb|p|:
+%
+When we type check \verb|p|, we get the \I{ModIface}s:
\begin{verbatim}
-package q (T, A, B) requires (H) where
- include p (A) requires (S as H)
- module T(T) where
- data T = T
- signature H(T) where
- import T(T)
- module B(B) where
- import A
- data B = B A
+ module HOLE:S(HOLE:S.foo) where
+ data T
+ foo :: HOLE:S.T
+ module THIS:M(THIS:M.bar) where
+ bar :: HOLE:S.T
\end{verbatim}
+%
+Now, when we type check \verb|A|, we pull on the \I{Name} \verb|p(S -> q():S):M.bar|,
+which means we have to type check the \I{ModIface} for \verb|p(S -> q():S):M|.
+The un-substituted type of \verb|bar| has a reference to \verb|HOLE:S.T|;
+this should be substituted to \verb|q():S.T|. But how do we discover this?
+We know that \verb|HOLE:S| was instantiated to \verb|q():S|, so we might try
+and look for \verb|q():S.T|. However, this \I{Name} does not exist because
+the \verb|module S| reexports the selector from \verb|A|! Nor can we consult
+the (unique) \I{ModIface} for the module, as it doesn't reexport the relevant
+type.
+
+The conclusion, then, is that a module written this way should be disallowed.
+Specifically, the correctness condition for a signature is this: \emph{Any \I{Name}
+mentioned in the \I{ModIface} of a signature must either be from an external module, or be
+exported by the signature}.
-% package r where
-% include q
-% module H(S,T) where
-% import T(T)
-% data S = S
-% module C where
-% import A
-% import B
-% ...
-
-Prior to typechecking, we compute its shape:
-
+\begin{aside}
+\textbf{Special case export rule for record selectors.} Here is the analogous case for
+record selectors:
\begin{verbatim}
-provides: (elided)
-requires: H -> { HOLE:H.S, THIS:T.T }
--- where THIS = q(H -> HOLE:H)
+ package p where
+ signature S(foo) where
+ data T = T { foo :: Int }
+ module M(bar) where
+ import S
+ bar = foo
+ package q where
+ module A(T(..)) where
+ data T = T { foo :: Int }
+ module S(foo) where
+ import A
+ include p
+ module A where
+ import M
+ ... bar ...
\end{verbatim}
-Our goal is to get the following type:
-
-\begin{verbatim}
-module THIS:T where
- data T = T
-module THIS:B where
- data B = B p(S -> HOLE:H):A.A
- -- where data A = A HOLE:H.S THIS:T.T
--- where THIS = q(H -> HOLE:H)
-\end{verbatim}
+We could reject this, but technically we can find the right substitution
+for \verb|T|, because the export of \verb|foo| is an \I{AvailTC} which
+does mention \verb|T|.
+\end{aside}
-This type information does \emph{not} match the pre-existing
-type information from \verb|p|: when we translate the \I{ModIface} for
-\verb|A| in the context into a \I{ModDetails} from this typechecking,
-we need to substitute \I{Name}s and \I{Module}s
-as specified by shaping. Specifically, when we load \verb|p(S -> HOLE:H):A|
-to find out the type of \verb|p(S -> HOLE:H):A.A|,
-we need to take \verb|HOLE:S.S| to \verb|HOLE:H.S| and \verb|HOLE:S.T| to \verb|THIS:T.T|.
-In both cases, we can determine the right translation by looking at how \verb|S| is
-instantiated in the package key for \verb|p| (it is instantiated with \verb|HOLE:H|),
-and then consulting the shape in the requirements.
-
-This process is done lazily, as we may not have typechecked the original
-\I{Name} in question when doing this. \verb|hs-boot| considerations apply
-if things are loopy: we have to treat the type abstractly and re-typecheck it
-to the right type later.
-
-
-\subsection{Re-renaming}
-
-Theoretically, the cleanest way to do shaping and typechecking is to have shaping
-result in a fully renamed syntax tree, which we then typecheck: when done this way,
-we don't have to worry about logical contexts (i.e., what is in scope) because
-shaping will already have complained if things were not in scope.
-
-However, for practical purposes, it's better if we don't try to keep
-around renamed syntax trees, because this could result in very large
-memory use; additionally, whenever a substitution occurs, we would have
-to substitute over all of the renamed syntax trees. Thus, while
-type-checking, we'll also re-compute what is in scope (i.e., just the
-\I{OccName} bits of \verb|provided|). Nota bene: we still use the
-\I{Name}s from the shape as the destinations of these
-\I{OccName}s! Note that we can't just use the final shape, because
-this may report more things in scope than we actually want. (It's also
-worth noting that if we could reduce the set of provided things in
-scope in a single package, just the \I{Shape} would not be enough.)
-
-\subsection{Merging \texttt{ModDetails}}
-
-After type-checking a signature, we may turn to add it to our module
-environment and discover there is already an entry for it! In that case,
-we simply merge it with the existing entry, erroring if there are incompatible
-entries.
-
-\end{document}
+\end{document} % chktex 16