diff options
author | Manuel M T Chakravarty <chak@cse.unsw.edu.au> | 2008-10-08 06:19:27 +0000 |
---|---|---|
committer | Manuel M T Chakravarty <chak@cse.unsw.edu.au> | 2008-10-08 06:19:27 +0000 |
commit | dd82b49ad6f719bd324de7f2d63f3341c0e87694 (patch) | |
tree | c06a4c7fc54ed1ffc58341ec2ac861d3d77e3cab /docs | |
parent | 5f68f5880068404476c21b00be4aa2fc1a27a5a9 (diff) | |
download | haskell-dd82b49ad6f719bd324de7f2d63f3341c0e87694.tar.gz |
Users Guide: added type family documentation
MERGE TO 6.10
Diffstat (limited to 'docs')
-rw-r--r-- | docs/users_guide/glasgow_exts.xml | 668 |
1 files changed, 641 insertions, 27 deletions
diff --git a/docs/users_guide/glasgow_exts.xml b/docs/users_guide/glasgow_exts.xml index 0eb0f7d480..e19f5d0f28 100644 --- a/docs/users_guide/glasgow_exts.xml +++ b/docs/users_guide/glasgow_exts.xml @@ -3694,6 +3694,647 @@ to work since it gets translated into an equality comparison. </sect1> +<sect1 id="type-families"> +<title>Type families</title> + +<para> + <firstterm>Indexed type families</firstterm> are a new GHC extension to + facilitate type-level + programming. Type families are a generalisation of <firstterm>associated + data types</firstterm> + (“<ulink url="http://www.cse.unsw.edu.au/~chak/papers/CKPM05.html">Associated + Types with Class</ulink>”, M. Chakravarty, G. Keller, S. Peyton Jones, + and S. Marlow. In Proceedings of “The 32nd Annual ACM SIGPLAN-SIGACT + Symposium on Principles of Programming Languages (POPL'05)”, pages + 1-13, ACM Press, 2005) and <firstterm>associated type synonyms</firstterm> + (“<ulink url="http://www.cse.unsw.edu.au/~chak/papers/CKP05.html">Type + Associated Type Synonyms</ulink>”. M. Chakravarty, G. Keller, and + S. Peyton Jones. + In Proceedings of “The Tenth ACM SIGPLAN International Conference on + Functional Programming”, ACM Press, pages 241-253, 2005). Type families + themselves are described in the paper “<ulink + url="http://www.cse.unsw.edu.au/~chak/papers/SPCS08.html">Type + Checking with Open Type Functions</ulink>”, T. Schrijvers, + S. Peyton-Jones, + M. Chakravarty, and M. Sulzmann, in Proceedings of “ICFP 2008: The + 13th ACM SIGPLAN International Conference on Functional + Programming”, ACM Press, pages 51-62, 2008. Type families + essentially provide type-indexed data types and named functions on types, + which are useful for generic programming and highly parameterised library + interfaces as well as interfaces with enhanced static information, much like + dependent types. They might also be regarded as an alternative to functional + dependencies, but provide a more functional style of type-level programming + than the relational style of functional dependencies. +</para> +<para> + Indexed type families, or type families for short, are type constructors that + represent sets of types. Set members are denoted by supplying the type family + constructor with type parameters, which are called <firstterm>type + indices</firstterm>. The + difference between vanilla parametrised type constructors and family + constructors is much like between parametrically polymorphic functions and + (ad-hoc polymorphic) methods of type classes. Parametric polymorphic functions + behave the same at all type instances, whereas class methods can change their + behaviour in dependence on the class type parameters. Similarly, vanilla type + constructors imply the same data representation for all type instances, but + family constructors can have varying representation types for varying type + indices. +</para> +<para> + Indexed type families come in two flavours: <firstterm>data + families</firstterm> and <firstterm>type synonym + families</firstterm>. They are the indexed family variants of algebraic + data types and type synonyms, respectively. The instances of data families + can be data types and newtypes. +</para> +<para> + Type families are enabled by the flag <option>-XTypeFamilies</option>. + Additional information on the use of type families in GHC is available on + <ulink url="http://www.haskell.org/haskellwiki/GHC/Indexed_types">the + Haskell wiki page on type families</ulink>. +</para> + +<sect2 id="data-families"> + <title>Data families</title> + + <para> + Data families appear in two flavours: (1) they can be defined on the + toplevel + or (2) they can appear inside type classes (in which case they are known as + associated types). The former is the more general variant, as it lacks the + requirement for the type-indexes to coincide with the class + parameters. However, the latter can lead to more clearly structured code and + compiler warnings if some type instances were - possibly accidentally - + omitted. In the following, we always discuss the general toplevel form first + and then cover the additional constraints placed on associated types. + </para> + + <sect3 id="data-family-declarations"> + <title>Data family declarations</title> + + <para> + Indexed data families are introduced by a signature, such as +<programlisting> +data family GMap k :: * -> * +</programlisting> + The special <literal>family</literal> distinguishes family from standard + data declarations. The result kind annotation is optional and, as + usual, defaults to <literal>*</literal> if omitted. An example is +<programlisting> +data family Array e +</programlisting> + Named arguments can also be given explicit kind signatures if needed. + Just as with + [http://www.haskell.org/ghc/docs/latest/html/users_guide/gadt.html GADT + declarations] named arguments are entirely optional, so that we can + declare <literal>Array</literal> alternatively with +<programlisting> +data family Array :: * -> * +</programlisting> + </para> + + <sect4 id="assoc-data-family-decl"> + <title>Associated data family declarations</title> + <para> + When a data family is declared as part of a type class, we drop + the <literal>family</literal> special. The <literal>GMap</literal> + declaration takes the following form +<programlisting> +class GMapKey k where + data GMap k :: * -> * + ... +</programlisting> + In contrast to toplevel declarations, named arguments must be used for + all type parameters that are to be used as type-indexes. Moreover, + the argument names must be class parameters. Each class parameter may + only be used at most once per associated type, but some may be omitted + and they may be in an order other than in the class head. Hence, the + following contrived example is admissible: +<programlisting> + class C a b c where + data T c a :: * +</programlisting> + </para> + </sect4> + </sect3> + + <sect3 id="data-instance-declarations"> + <title>Data instance declarations</title> + + <para> + Instance declarations of data and newtype families are very similar to + standard data and newtype declarations. The only two differences are + that the keyword <literal>data</literal> or <literal>newtype</literal> + is followed by <literal>instance</literal> and that some or all of the + type arguments can be non-variable types, but may not contain forall + types or type synonym families. However, data families are generally + allowed in type parameters, and type synonyms are allowed as long as + they are fully applied and expand to a type that is itself admissible - + exactly as this is required for occurrences of type synonyms in class + instance parameters. For example, the <literal>Either</literal> + instance for <literal>GMap</literal> is +<programlisting> +data instance GMap (Either a b) v = GMapEither (GMap a v) (GMap b v) +</programlisting> + In this example, the declaration has only one variant. In general, it + can be any number. + </para> + <para> + Data and newtype instance declarations are only legit when an + appropriate family declaration is in scope - just like class instances + require the class declaration to be visible. Moreover, each instance + declaration has to conform to the kind determined by its family + declaration. This implies that the number of parameters of an instance + declaration matches the arity determined by the kind of the family. + Although, all data families are declared with + the <literal>data</literal> keyword, instances can be + either <literal>data</literal> or <literal>newtype</literal>s, or a mix + of both. + </para> + <para> + Even if type families are defined as toplevel declarations, functions + that perform different computations for different family instances still + need to be defined as methods of type classes. In particular, the + following is not possible: +<programlisting> +data family T a +data instance T Int = A +data instance T Char = B +nonsence :: T a -> Int +nonsence A = 1 -- WRONG: These two equations together... +nonsence B = 2 -- ...will produce a type error. +</programlisting> + Given the functionality provided by GADTs (Generalised Algebraic Data + Types), it might seem as if a definition, such as the above, should be + feasible. However, type families are - in contrast to GADTs - are + <emphasis>open;</emphasis> i.e., new instances can always be added, + possibly in other + modules. Supporting pattern matching across different data instances + would require a form of extensible case construct. + </para> + + <sect4 id="assoc-data-inst"> + <title>Associated data instances</title> + <para> + When an associated data family instance is declared within a type + class instance, we drop the <literal>instance</literal> keyword in the + family instance. So, the <literal>Either</literal> instance + for <literal>GMap</literal> becomes: +<programlisting> +instance (GMapKey a, GMapKey b) => GMapKey (Either a b) where + data GMap (Either a b) v = GMapEither (GMap a v) (GMap b v) + ... +</programlisting> + The most important point about associated family instances is that the + type indexes corresponding to class parameters must be identical to + the type given in the instance head; here this is the first argument + of <literal>GMap</literal>, namely <literal>Either a b</literal>, + which coincides with the only class parameter. Any parameters to the + family constructor that do not correspond to class parameters, need to + be variables in every instance; here this is the + variable <literal>v</literal>. + </para> + <para> + Instances for an associated family can only appear as part of + instances declarations of the class in which the family was declared - + just as with the equations of the methods of a class. Also in + correspondence to how methods are handled, declarations of associated + types can be omitted in class instances. If an associated family + instance is omitted, the corresponding instance type is not inhabited; + i.e., only diverging expressions, such + as <literal>undefined</literal>, can assume the type. + </para> + </sect4> + + <sect4 id="scoping-class-params"> + <title>Scoping of class parameters</title> + <para> + In the case of multi-parameter type classes, the visibility of class + parameters in the right-hand side of associated family instances + depends <emphasis>solely</emphasis> on the parameters of the data + family. As an example, consider the simple class declaration +<programlisting> +class C a b where + data T a +</programlisting> + Only one of the two class parameters is a parameter to the data + family. Hence, the following instance declaration is invalid: +<programlisting> +instance C [c] d where + data T [c] = MkT (c, d) -- WRONG!! 'd' is not in scope +</programlisting> + Here, the right-hand side of the data instance mentions the type + variable <literal>d</literal> that does not occur in its left-hand + side. We cannot admit such data instances as they would compromise + type safety. + </para> + </sect4> + + <sect4 id="family-class-inst"> + <title>Type class instances of family instances</title> + <para> + Type class instances of instances of data families can be defined as + usual, and in particular data instance declarations can + have <literal>deriving</literal> clauses. For example, we can write +<programlisting> +data GMap () v = GMapUnit (Maybe v) + deriving Show +</programlisting> + which implicitly defines an instance of the form +<programlisting> +instance Show v => Show (GMap () v) where ... +</programlisting> + </para> + <para> + Note that class instances are always for + particular <emphasis>instances</emphasis> of a data family and never + for an entire family as a whole. This is for essentially the same + reasons that we cannot define a toplevel function that performs + pattern matching on the data constructors + of <emphasis>different</emphasis> instances of a single type family. + It would require a form of extensible case construct. + </para> + </sect4> + + <sect4 id="data-family-overlap"> + <title>Overlap of data instances</title> + <para> + The instance declarations of a data family used in a single program + may not overlap at all, independent of whether they are associated or + not. In contrast to type class instances, this is not only a matter + of consistency, but one of type safety. + </para> + </sect4> + + </sect3> + + <sect3 id="data-family-import-export"> + <title>Import and export</title> + + <para> + The association of data constructors with type families is more dynamic + than that is the case with standard data and newtype declarations. In + the standard case, the notation <literal>T(..)</literal> in an import or + export list denotes the type constructor and all the data constructors + introduced in its declaration. However, a family declaration never + introduces any data constructors; instead, data constructors are + introduced by family instances. As a result, which data constructors + are associated with a type family depends on the currently visible + instance declarations for that family. Consequently, an import or + export item of the form <literal>T(..)</literal> denotes the family + constructor and all currently visible data constructors - in the case of + an export item, these may be either imported or defined in the current + module. The treatment of import and export items that explicitly list + data constructors, such as <literal>GMap(GMapEither)</literal>, is + analogous. + </para> + + <sect4 id="data-family-impexp-assoc"> + <title>Associated families</title> + <para> + As expected, an import or export item of the + form <literal>C(..)</literal> denotes all of the class' methods and + associated types. However, when associated types are explicitly + listed as subitems of a class, we need some new syntax, as uppercase + identifiers as subitems are usually data constructors, not type + constructors. To clarify that we denote types here, each associated + type name needs to be prefixed by the keyword <literal>type</literal>. + So for example, when explicitly listing the components of + the <literal>GMapKey</literal> class, we write <literal>GMapKey(type + GMap, empty, lookup, insert)</literal>. + </para> + </sect4> + + <sect4 id="data-family-impexp-examples"> + <title>Examples</title> + <para> + Assuming our running <literal>GMapKey</literal> class example, let us + look at some export lists and their meaning: + <itemizedlist> + <listitem> + <para><literal>module GMap (GMapKey) where...</literal>: Exports + just the class name.</para> + </listitem> + <listitem> + <para><literal>module GMap (GMapKey(..)) where...</literal>: + Exports the class, the associated type <literal>GMap</literal> + and the member + functions <literal>empty</literal>, <literal>lookup</literal>, + and <literal>insert</literal>. None of the data constructors is + exported.</para> + </listitem> + <listitem> + <para><literal>module GMap (GMapKey(..), GMap(..)) + where...</literal>: As before, but also exports all the data + constructors <literal>GMapInt</literal>, + <literal>GMapChar</literal>, + <literal>GMapUnit</literal>, <literal>GMapPair</literal>, + and <literal>GMapUnit</literal>.</para> + </listitem> + <listitem> + <para><literal>module GMap (GMapKey(empty, lookup, insert), + GMap(..)) where...</literal>: As before.</para> + </listitem> + <listitem> + <para><literal>module GMap (GMapKey, empty, lookup, insert, GMap(..)) + where...</literal>: As before.</para> + </listitem> + </itemizedlist> + </para> + <para> + Finally, you can write <literal>GMapKey(type GMap)</literal> to denote + both the class <literal>GMapKey</literal> as well as its associated + type <literal>GMap</literal>. However, you cannot + write <literal>GMapKey(type GMap(..))</literal> — i.e., + sub-component specifications cannot be nested. To + specify <literal>GMap</literal>'s data constructors, you have to list + it separately. + </para> + </sect4> + + <sect4 id="data-family-impexp-instances"> + <title>Instances</title> + <para> + Family instances are implicitly exported, just like class instances. + However, this applies only to the heads of instances, not to the data + constructors an instance defines. + </para> + </sect4> + + </sect3> + +</sect2> + +<sect2 id="synonym-families"> + <title>Synonym families</title> + + <para> + Type families appear in two flavours: (1) they can be defined on the + toplevel or (2) they can appear inside type classes (in which case they + are known as associated type synonyms). The former is the more general + variant, as it lacks the requirement for the type-indexes to coincide with + the class parameters. However, the latter can lead to more clearly + structured code and compiler warnings if some type instances were - + possibly accidentally - omitted. In the following, we always discuss the + general toplevel form first and then cover the additional constraints + placed on associated types. + </para> + + <sect3 id="type-family-declarations"> + <title>Type family declarations</title> + + <para> + Indexed type families are introduced by a signature, such as +<programlisting> +type family Elem c :: * +</programlisting> + The special <literal>family</literal> distinguishes family from standard + type declarations. The result kind annotation is optional and, as + usual, defaults to <literal>*</literal> if omitted. An example is +<programlisting> +type family Elem c +</programlisting> + Parameters can also be given explicit kind signatures if needed. We + call the number of parameters in a type family declaration, the family's + arity, and all applications of a type family must be fully saturated + w.r.t. to that arity. This requirement is unlike ordinary type synonyms + and it implies that the kind of a type family is not sufficient to + determine a family's arity, and hence in general, also insufficient to + determine whether a type family application is well formed. As an + example, consider the following declaration: +<programlisting> +type family F a b :: * -> * -- F's arity is 2, + -- although it's overall kind is * -> * -> * -> * +</programlisting> + Given this declaration the following are examples of well-formed and + malformed types: +<programlisting> +F Char [Int] -- OK! Kind: * -> * +F Char [Int] Bool -- OK! Kind: * +F IO Bool -- WRONG: kind mismatch in the first argument +F Bool -- WRONG: unsaturated application +</programlisting> + </para> + + <sect4 id="assoc-type-family-decl"> + <title>Associated type family declarations</title> + <para> + When a type family is declared as part of a type class, we drop + the <literal>family</literal> special. The <literal>Elem</literal> + declaration takes the following form +<programlisting> +class Collects ce where + type Elem ce :: * + ... +</programlisting> + The argument names of the type family must be class parameters. Each + class parameter may only be used at most once per associated type, but + some may be omitted and they may be in an order other than in the + class head. Hence, the following contrived example is admissible: +<programlisting> +class C a b c where + type T c a :: * +</programlisting> + These rules are exactly as for associated data families. + </para> + </sect4> + </sect3> + + <sect3 id="type-instance-declarations"> + <title>Type instance declarations</title> + <para> + Instance declarations of type families are very similar to standard type + synonym declarations. The only two differences are that the + keyword <literal>type</literal> is followed + by <literal>instance</literal> and that some or all of the type + arguments can be non-variable types, but may not contain forall types or + type synonym families. However, data families are generally allowed, and + type synonyms are allowed as long as they are fully applied and expand + to a type that is admissible - these are the exact same requirements as + for data instances. For example, the <literal>[e]</literal> instance + for <literal>Elem</literal> is +<programlisting> +type instance Elem [e] = e +</programlisting> + </para> + <para> + Type family instance declarations are only legitimate when an + appropriate family declaration is in scope - just like class instances + require the class declaration to be visible. Moreover, each instance + declaration has to conform to the kind determined by its family + declaration, and the number of type parameters in an instance + declaration must match the number of type parameters in the family + declaration. Finally, the right-hand side of a type instance must be a + monotype (i.e., it may not include foralls) and after the expansion of + all saturated vanilla type synonyms, no synonyms, except family synonyms + may remain. Here are some examples of admissible and illegal type + instances: +<programlisting> +type family F a :: * +type instance F [Int] = Int -- OK! +type instance F String = Char -- OK! +type instance F (F a) = a -- WRONG: type parameter mentions a type family +type instance F (forall a. (a, b)) = b -- WRONG: a forall type appears in a type parameter +type instance F Float = forall a.a -- WRONG: right-hand side may not be a forall type + +type family G a b :: * -> * +type instance G Int = (,) -- WRONG: must be two type parameters +type instance G Int Char Float = Double -- WRONG: must be two type parameters +</programlisting> + </para> + + <sect4 id="assoc-type-instance"> + <title>Associated type instance declarations</title> + <para> + When an associated family instance is declared within a type class + instance, we drop the <literal>instance</literal> keyword in the family + instance. So, the <literal>[e]</literal> instance + for <literal>Elem</literal> becomes: +<programlisting> +instance (Eq (Elem [e])) => Collects ([e]) where + type Elem [e] = e + ... +</programlisting> + The most important point about associated family instances is that the + type indexes corresponding to class parameters must be identical to the + type given in the instance head; here this is <literal>[e]</literal>, + which coincides with the only class parameter. + </para> + <para> + Instances for an associated family can only appear as part of instances + declarations of the class in which the family was declared - just as + with the equations of the methods of a class. Also in correspondence to + how methods are handled, declarations of associated types can be omitted + in class instances. If an associated family instance is omitted, the + corresponding instance type is not inhabited; i.e., only diverging + expressions, such as <literal>undefined</literal>, can assume the type. + </para> + </sect4> + + <sect4 id="type-family-overlap"> + <title>Overlap of type synonym instances</title> + <para> + The instance declarations of a type family used in a single program + may only overlap if the right-hand sides of the overlapping instances + coincide for the overlapping types. More formally, two instance + declarations overlap if there is a substitution that makes the + left-hand sides of the instances syntactically the same. Whenever + that is the case, the right-hand sides of the instances must also be + syntactically equal under the same substitution. This condition is + independent of whether the type family is associated or not, and it is + not only a matter of consistency, but one of type safety. + </para> + <para> + Here are two example to illustrate the condition under which overlap + is permitted. +<programlisting> +type instance F (a, Int) = [a] +type instance F (Int, b) = [b] -- overlap permitted + +type instance G (a, Int) = [a] +type instance G (Char, a) = [a] -- ILLEGAL overlap, as [Char] /= [Int] +</programlisting> + </para> + </sect4> + + <sect4 id="type-family-decidability"> + <title>Decidability of type synonym instances</title> + <para> + In order to guarantee that type inference in the presence of type + families decidable, we need to place a number of additional + restrictions on the formation of type instance declarations (c.f., + Definition 5 (Relaxed Conditions) of “<ulink + url="http://www.cse.unsw.edu.au/~chak/papers/SPCS08.html">Type + Checking with Open Type Functions</ulink>”). Instance + declarations have the general form +<programlisting> +type instance F t1 .. tn = t +</programlisting> + where we require that for every type family application <literal>(G s1 + .. sm)</literal> in <literal>t</literal>, + <orderedlist> + <listitem> + <para><literal>s1 .. sm</literal> do not contain any type family + constructors,</para> + </listitem> + <listitem> + <para>the total number of symbols (data type constructors and type + variables) in <literal>s1 .. sm</literal> is strictly smaller than + in <literal>t1 .. tn</literal>, and</para> + </listitem> + <listitem> + <para>for every type + variable <literal>a</literal>, <literal>a</literal> occurs + in <literal>s1 .. sm</literal> at most as often as in <literal>t1 + .. tn</literal>.</para> + </listitem> + </orderedlist> + These restrictions are easily verified and ensure termination of type + inference. However, they are not sufficient to guarantee completeness + of type inference in the presence of, so called, ''loopy equalities'', + such as <literal>a ~ [F a]</literal>, where a recursive occurrence of + a type variable is underneath a family application and data + constructor application - see the above mentioned paper for details. + </para> + <para> + If the option <option>-XUndecidableInstances</option> is passed to the + compiler, the above restrictions are not enforced and it is on the + programmer to ensure termination of the normalisation of type families + during type inference. + </para> + </sect4> + </sect3> + + <sect3 id-="equality-constraints"> + <title>Equality constraints</title> + <para> + Type context can include equality constraints of the form <literal>t1 ~ + t2</literal>, which denote that the types <literal>t1</literal> + and <literal>t2</literal> need to be the same. In the presence of type + families, whether two types are equal cannot generally be decided + locally. Hence, the contexts of function signatures may include + equality constraints, as in the following example: +<programlisting> +sumCollects :: (Collects c1, Collects c2, Elem c1 ~ Elem c2) => c1 -> c2 -> c2 +</programlisting> + where we require that the element type of <literal>c1</literal> + and <literal>c2</literal> are the same. In general, the + types <literal>t1</literal> and <literal>t2</literal> of an equality + constraint may be arbitrary monotypes; i.e., they may not contain any + quantifiers, independent of whether higher-rank types are otherwise + enabled. + </para> + <para> + Equality constraints can also appear in class and instance contexts. + The former enable a simple translation of programs using functional + dependencies into programs using family synonyms instead. The general + idea is to rewrite a class declaration of the form +<programlisting> +class C a b | a -> b +</programlisting> + to +<programlisting> +class (F a ~ b) => C a b where + type F a +</programlisting> + That is, we represent every functional dependency (FD) <literal>a1 .. an + -> b</literal> by an FD type family <literal>F a1 .. an</literal> and a + superclass context equality <literal>F a1 .. an ~ b</literal>, + essentially giving a name to the functional dependency. In class + instances, we define the type instances of FD families in accordance + with the class head. Method signatures are not affected by that + process. + </para> + <para> + NB: Equalities in superclass contexts are not fully implemented in + GHC 6.10. + </para> + </sect3> + +</sect2> + +</sect1> + <sect1 id="other-type-extensions"> <title>Other type system extensions</title> @@ -4975,33 +5616,6 @@ pattern binding must have the same context. For example, this is fine: </para> </sect2> -<sect2 id="type-families"> -<title>Type families -</title> - -<para> -GHC supports the definition of type families indexed by types. They may be -seen as an extension of Haskell 98's class-based overloading of values to -types. When type families are declared in classes, they are also known as -associated types. -</para> -<para> -There are two forms of type families: data families and type synonym families. -Currently, only the former are fully implemented, while we are still working -on the latter. As a result, the specification of the language extension is -also still to some degree in flux. Hence, a more detailed description of -the language extension and its use is currently available -from <ulink url="http://www.haskell.org/haskellwiki/GHC/Indexed_types">the Haskell -wiki page on type families</ulink>. The material will be moved to this user's -guide when it has stabilised. -</para> -<para> -Type families are enabled by the flag <option>-XTypeFamilies</option>. -</para> - - -</sect2> - </sect1> <!-- ==================== End of type system extensions ================= --> |