summaryrefslogtreecommitdiff
path: root/docs
diff options
context:
space:
mode:
authorEdward Z. Yang <ezyang@cs.stanford.edu>2014-07-14 10:59:47 +0100
committerEdward Z. Yang <ezyang@cs.stanford.edu>2014-07-14 10:59:53 +0100
commitc85a3b0bc190fb065be92357b264e932a8423388 (patch)
tree4222f0786b769255c47f4a5c5882bf66651447a3 /docs
parent22e992e24abc5690d794a1b1a913511845c61046 (diff)
downloadhaskell-c85a3b0bc190fb065be92357b264e932a8423388.tar.gz
Finish TCs section
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
Diffstat (limited to 'docs')
-rw-r--r--docs/backpack/backpack-impl.tex323
1 files changed, 238 insertions, 85 deletions
diff --git a/docs/backpack/backpack-impl.tex b/docs/backpack/backpack-impl.tex
index e032cd193a..4775a5a6cb 100644
--- a/docs/backpack/backpack-impl.tex
+++ b/docs/backpack/backpack-impl.tex
@@ -384,9 +384,13 @@ package foo-0.3(bar-0.1) where
Notably, we must \emph{rename} the package to include information about
how we resolved all of the inner package references, and if these inner
package references had dependencies, those must be included too! In
-effect, the \emph{dependency resolution} must be encoded into the package ID,
-along with the existing Backpack \emph{physical identity regular tree}.
-Phew!
+effect, the \emph{dependency resolution} must be encoded into the package ID\@.
+
+When we are operating at the package granularity with definite packages,
+this is, in fact, the only information we need to record. However,
+in other cases, we may need to record more information, e.g., the
+\emph{physical identity regular tree}. This, however, depends on the
+choices made in Section~\ref{sec:flatten}.
\paragraph{Free variables (or, what is a non-concrete physical
identity?)} Physical identities in their full generality are permitted
@@ -1785,14 +1789,49 @@ As it turns out, there is already another feature in Haskell which
must enforce global uniqueness, to prevent segfaults.
We now turn to type classes' close cousin: type families.
-\paragraph{Type families}
+\paragraph{Type families} With type families, confluence is the primary
+property of interest. (Coherence is not of much interest because type
+families are elaborated into coercions, which don't have any
+computational content.) Rather than considering what the set of
+constraints we reduce to, confluence for type families considers the
+reduction of type families. The overlap checks for type families
+can be quite sophisticated, especially in the case of closed type
+families.
-\subsection{Explicit instance imports}
+Unlike type classes, however, GHC \emph{does} check the non-overlap
+of type families eagerly. The analogous program does \emph{not} type check:
-Backpack applies thinning behavior to types and values, so that an
-errant declaration doesn't cause a module which typechecked against
-a signature to stop typechecking against a concrete implementation.
-The same situation applies for type class instances:
+\begin{verbatim}
+-- F.hs
+type family F a :: *
+-- A.hs
+import F
+type instance F Bool = Int
+-- B.hs
+import F
+type instance F Bool = Bool
+-- C.hs
+import A
+import B
+\end{verbatim}
+
+The reason is that it is \emph{unsound} to ever allow any overlap
+(unlike in the case of type classes where it just leads to incoherence.)
+Thus, whereas one might imagine dropping the global uniqueness of instances
+invariant for type classes, it is absolutely necessary to perform global
+enforcement here. There's no way around it!
+
+\subsection{Local type classes}
+
+Here, we say \textbf{NO} to global uniqueness.
+
+This design is perhaps best discussed in relation to modular type
+classes, which shares many similar properties. Instances are now
+treated as first class objects (in MTCs, they are simply modules)---we
+may explicitly hide or include instances for type class resolution (in
+MTCs, this is done via the \verb|using| top-level declaration). This is
+essentially what was sketched in Section 5 of the original Backpack
+paper. As a simple example:
\begin{verbatim}
package p where
@@ -1804,108 +1843,222 @@ package q where
include p
\end{verbatim}
-The instance defined in \verb|A| should be hidden from \verb|B|. Note
-that this does \emph{not} cause incoherence: it is still unambiguous
-which instance is referred to in the text of both modules. However,
-it does make it a lot easier to observe different choices of an
-instance downstream.
+Here, \verb|B| does not see the extra instance declared by \verb|A|,
+because it was thinned from its signature of \verb|A| (and thus never
+declared canonical.) To declare an instance without making it
+canonical, it must placed in a separate (unimported) module.
-\subsection{Module inequalities}
+Like modular type classes, Backpack does not give rise to incoherence,
+because instance visibility can only be changed at the top level module
+language, where it is already considered best practice to provide
+explicit signatures. Here is the example used in the Modular Type
+Classes paper to demonstrate the problem:
+
+\begin{verbatim}
+structure A = using EqInt1 in
+ struct ...fun f x = eq(x,x)... end
+structure B = using EqInt2 in
+ struct ...val y = A.f(3)... end
+\end{verbatim}
-So what is the big problem we have to deal with? Here is the canonical
-example, from the Backpack paper:
+Is the type of f \verb|int -> bool|, or does it have a type-class
+constraint? Because type checking proceeds over the entire program, ML
+could hypothetically pick either. However, ported to Haskell, the
+example looks like this:
\begin{verbatim}
-P = [ class Eq a where ... ]
-A :: [ data T ]
-B :: [ data T ]
-C :: [
- import P
- import qualified A
- import qualified B
- instance Eq A.T where ...
- instance Eq B.T where ...
+EqInt1 :: [ instance Eq Int ]
+EqInt2 :: [ instance Eq Int ]
+A = [
+ import EqInt1
+ f x = x == x
+]
+B = [
+ import EqInt2
+ import A hiding (instance Eq Int)
+ y = f 3
]
\end{verbatim}
-Signature \verb|C| is only well typed if \verb|A.T != B.T|. There have
-been two proposed solutions for this problem:
-
-\paragraph{Link-time check} This is the solution that was described in
-the paper. When some instances are typechecked initially, we type check
-them as if all of variable module identities were distinct. Then, when
-we perform linking (we \verb|include| or we unify some
-module identities), we check again if to see if we've discovered some
-instance overlap.
+There may be ambiguity, yes, but it can be easily resolved by the
+addition of a top-level type signature to \verb|f|, which is considered
+best-practice anyway. Additionally, Haskell users are trained to expect
+a particular inference for \verb|f| in any case (the polymorphic one).
-This solution is extremely inefficient, unfortunately. Consider the following
-set of modules, as compiled by GHC today:
+Here is another example which might be considered surprising:
\begin{verbatim}
-A = [ data T ]
-B = [ import A; instance Eq T ]
-C = [ import A; instance Eq T ]
-D = [ import B; import C ]
+package p where
+ A :: [ data T = T ]
+ B :: [ data T = T ]
+ C = [
+ import qualified A
+ import qualified B
+ instance Show A.T where show T = "A"
+ instance Show B.T where show T = "B"
+ x :: String
+ x = show A.T ++ show B.T
+ ]
\end{verbatim}
-Currently, \emph{no overlapping instance} error is thrown in module \verb|D|.
-Why? Because in order to discover overlap, GHC must compare every instance
-exported by \verb|B| with every instance against \verb|C|.
+In the original Backpack paper, it was implied that module \verb|C|
+should not type check if \verb|A.T = B.T| (failing at link time).
+However, if we set aside, for a moment, the issue that anyone who
+imports \verb|C| in such a context will now have overlapping instances,
+there is no reason in principle why the module itself should be
+problematic. Here is the example in MTCs, which I have good word from
+Derek does type check.
-\paragraph{Inequality constraints}
-
-\subsection{Type families}
+\begin{verbatim}
+signature SIG = sig
+ type t
+ val mk : t
+end
+signature SHOW = sig
+ type t
+ val show : t -> string
+end
+functor Example (A : SIG) (B : SIG) =
+ let structure ShowA : SHOW = struct
+ type t = A.t
+ fun show _ = "A"
+ end in
+ let structure ShowB : SHOW = struct
+ type t = B.t
+ fun show _ = "B"
+ end in
+ using ShowA, ShowB in
+ struct
+ val x = show A.mk ++ show B.mk
+ end : sig val x : string end
+\end{verbatim}
-Like type classes, type families must not overlap (and this is a question of
-type safety!)
+The moral of the story is, even though in a later context the instances
+are overlapping, inside the functor, the type-class resolution is unambiguous
+and should be done (so \verb|x = "AB"|).
-A more subtle question is compatibility and apartness of type family
-equations. Under these checks, aliasing of modules can fail if it causes
-two type families to be identified, but their definitions are not apart.
-Here is a simple example:
+Up until this point, we've argued why MTCs and this Backpack design are similar.
+However, there is an important sociological difference between modular type-classes
+and this proposed scheme for Backpack. In the presentation ``Why Applicative
+Functors Matter'', Derek mentions the canonical example of defining a set:
\begin{verbatim}
-A :: [
- type family F a :: *
- type instance F Int = Char
-]
-B :: [
- type family F a :: *
- type instance F Int = Bool
-]
+signature ORD = sig type t; val cmp : t -> t -> bool end
+signature SET = sig type t; type elem;
+ val empty : t;
+ val insert : elem -> t -> t ...
+end
+functor MkSet (X : ORD) :> SET where type elem = X.t
+ = struct ... end
\end{verbatim}
-Now it is illegal for \verb|A = B|, because when the type families are
-unified, the instances now fail the apartness check. However, if the second
-instance was \verb|F Int = Char|, the families would be able to link together.
-
-To make matters worse, an implementation may define more axioms than are
-visible in the signature:
+This is actually very different from how sets tend to be defined in
+Haskell today. If we directly encoded this in Backpack, it would
+look like this:
\begin{verbatim}
-package a where
- A :: [
- type family F a :: *
- type instance F Int = Bool
+package mk-set where
+ X :: [
+ data T
+ cmp :: T -> T-> Bool
]
-package b where
- include a
- B = [
- import A
- type instance F Bool = Bool
- ...
+ Set :: [
+ data Set
+ empty :: Set
+ insert :: T -> Set -> Set
]
-package c where
- A = [
- type family F a :: *
- type instance F Int = Bool
- type instance F Bool = Int
+ Set = [
+ import X
+ ...
]
- include b
\end{verbatim}
-It would seem that private axioms cannot be naively supported. Is
-there any way that thinning axioms could be made to work?
+It's also informative to consider how MTCs would encode set as it is written
+today in Haskell:
+
+\begin{verbatim}
+signature ORD = sig type t; val cmp : t -> t -> bool end
+signature SET = sig type 'a t;
+ val empty : 'a t;
+ val insert : (X : ORD) => X.t -> X.t t -> X.t t
+end
+struct MkSet :> SET = struct ... end
+\end{verbatim}
+
+Here, it is clear to see that while functor instantiation occurs for
+implementation, it is not occuring for types. This is a big limitation
+with the Haskell approach, and it explains why Haskellers, in practice,
+find global uniqueness of instances so desirable.
+
+Implementation-wise, this requires some subtle modifications to how we
+do type class resolution. Type checking of indefinite modules works as
+before, but when go to actually compile them against explicit
+implementations, we need to ``forget'' that two types are equal when
+doing instance resolution. This could probably be implemented by
+associating type class instances with the original name that was
+utilized when typechecking, so that we can resolve ambiguous matches
+against types which have the same original name now that we are
+compiling.
+
+As we've mentioned previously, this strategy is unsound for type families.
+
+\subsection{Globally unique}
+
+Here, we say \textbf{YES} to global uniqueness.
+
+When we require the global uniqueness of instances (either because
+that's the type class design we chose, or because we're considering
+the problem of type families), we will need to reject declarations like the
+one cited above when \verb|A.T = B.T|:
+
+\begin{verbatim}
+A :: [ data T ]
+B :: [ data T ]
+C :: [
+ import qualified A
+ import qualified B
+ instance Show A.T where show T = "A"
+ instance Show B.T where show T = "B"
+]
+\end{verbatim}
+
+The paper mentions that a link-time check is sufficient to prevent this
+case from arising. While in the previous section, we've argued why this
+is actually unnecessary when local instances are allowed, the link-time
+check is a good match in the case of global instances, because any
+instance \emph{must} be declared in the signature. The scheme proceeds
+as follows: when some instances are typechecked initially, we type check
+them as if all of variable module identities were distinct. Then, when
+we perform linking (we \verb|include| or we unify some module
+identities), we check again if to see if we've discovered some instance
+overlap. This linking check is akin to the eager check that is
+performed today for type families; it would need to be implemented for
+type classes as well: however, there is a twist: we are \emph{redoing}
+the overlap check now that some identities have been unified.
+
+As an implementation trick, one could deferring the check until \verb|C|
+is compiled, keeping in line with GHC's lazy ``don't check for overlap
+until the use site.'' (Once again, unsound for type families.)
+
+\paragraph{What about module inequalities?} An older proposal was for
+signatures to contain ``module inequalities'', i.e., assertions that two
+modules are not equal. (Technically: we need to be able to apply this
+assertion to $\beta$ module variables, since \verb|A != B| while
+\verb|A.T = B.T|). Currently, Edward thinks that module inequalities
+are only marginal utility with local instances (i.e., not enough to
+justify the implementation cost) and not useful at all in the world of
+global instances!
+
+With local instances, module inequalities could be useful to statically
+rule out examples like \verb|show A.T ++ show B.T|. Because such uses
+are not necessarily reflected in the signature, it would be a violation
+of separate module development to try to divine the constraint from the
+implementation itself. I claim this is of limited utility, however, because,
+as we mentioned earlier, we can compile these ``incoherent'' modules perfectly
+coherently. With global instances, all instances must be in the signature, so
+while it might be aesthetically displeasing to have the signature impose
+extra restrictions on linking identities, we can carry this out without
+violating the linking restriction.
\section{Bits and bobs}