diff options
Diffstat (limited to 'docs/users_guide/glasgow_exts.xml')
-rw-r--r-- | docs/users_guide/glasgow_exts.xml | 6264 |
1 files changed, 6264 insertions, 0 deletions
diff --git a/docs/users_guide/glasgow_exts.xml b/docs/users_guide/glasgow_exts.xml new file mode 100644 index 0000000000..beaaad616a --- /dev/null +++ b/docs/users_guide/glasgow_exts.xml @@ -0,0 +1,6264 @@ +<?xml version="1.0" encoding="iso-8859-1"?> +<para> +<indexterm><primary>language, GHC</primary></indexterm> +<indexterm><primary>extensions, GHC</primary></indexterm> +As with all known Haskell systems, GHC implements some extensions to +the language. They are all enabled by options; by default GHC +understands only plain Haskell 98. +</para> + +<para> +Some of the Glasgow extensions serve to give you access to the +underlying facilities with which we implement Haskell. Thus, you can +get at the Raw Iron, if you are willing to write some non-portable +code at a more primitive level. You need not be “stuck” +on performance because of the implementation costs of Haskell's +“high-level” features—you can always code +“under” them. In an extreme case, you can write all your +time-critical code in C, and then just glue it together with Haskell! +</para> + +<para> +Before you get too carried away working at the lowest level (e.g., +sloshing <literal>MutableByteArray#</literal>s around your +program), you may wish to check if there are libraries that provide a +“Haskellised veneer” over the features you want. The +separate <ulink url="../libraries/index.html">libraries +documentation</ulink> describes all the libraries that come with GHC. +</para> + +<!-- LANGUAGE OPTIONS --> + <sect1 id="options-language"> + <title>Language options</title> + + <indexterm><primary>language</primary><secondary>option</secondary> + </indexterm> + <indexterm><primary>options</primary><secondary>language</secondary> + </indexterm> + <indexterm><primary>extensions</primary><secondary>options controlling</secondary> + </indexterm> + + <para>These flags control what variation of the language are + permitted. Leaving out all of them gives you standard Haskell + 98.</para> + + <para>NB. turning on an option that enables special syntax + <emphasis>might</emphasis> cause working Haskell 98 code to fail + to compile, perhaps because it uses a variable name which has + become a reserved word. So, together with each option below, we + list the special syntax which is enabled by this option. We use + notation and nonterminal names from the Haskell 98 lexical syntax + (see the Haskell 98 Report). There are two classes of special + syntax:</para> + + <itemizedlist> + <listitem> + <para>New reserved words and symbols: character sequences + which are no longer available for use as identifiers in the + program.</para> + </listitem> + <listitem> + <para>Other special syntax: sequences of characters that have + a different meaning when this particular option is turned + on.</para> + </listitem> + </itemizedlist> + + <para>We are only listing syntax changes here that might affect + existing working programs (i.e. "stolen" syntax). Many of these + extensions will also enable new context-free syntax, but in all + cases programs written to use the new syntax would not be + compilable without the option enabled.</para> + + <variablelist> + + <varlistentry> + <term> + <option>-fglasgow-exts</option>: + <indexterm><primary><option>-fglasgow-exts</option></primary></indexterm> + </term> + <listitem> + <para>This simultaneously enables all of the extensions to + Haskell 98 described in <xref + linkend="ghc-language-features"/>, except where otherwise + noted. </para> + + <para>New reserved words: <literal>forall</literal> (only in + types), <literal>mdo</literal>.</para> + + <para>Other syntax stolen: + <replaceable>varid</replaceable>{<literal>#</literal>}, + <replaceable>char</replaceable><literal>#</literal>, + <replaceable>string</replaceable><literal>#</literal>, + <replaceable>integer</replaceable><literal>#</literal>, + <replaceable>float</replaceable><literal>#</literal>, + <replaceable>float</replaceable><literal>##</literal>, + <literal>(#</literal>, <literal>#)</literal>, + <literal>|)</literal>, <literal>{|</literal>.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-ffi</option> and <option>-fffi</option>: + <indexterm><primary><option>-ffi</option></primary></indexterm> + <indexterm><primary><option>-fffi</option></primary></indexterm> + </term> + <listitem> + <para>This option enables the language extension defined in the + Haskell 98 Foreign Function Interface Addendum plus deprecated + syntax of previous versions of the FFI for backwards + compatibility.</para> + + <para>New reserved words: <literal>foreign</literal>.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-fno-monomorphism-restriction</option>: + <indexterm><primary><option>-fno-monomorphism-restriction</option></primary></indexterm> + </term> + <listitem> + <para> Switch off the Haskell 98 monomorphism restriction. + Independent of the <option>-fglasgow-exts</option> + flag. </para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-fallow-overlapping-instances</option> + <indexterm><primary><option>-fallow-overlapping-instances</option></primary></indexterm> + </term> + <term> + <option>-fallow-undecidable-instances</option> + <indexterm><primary><option>-fallow-undecidable-instances</option></primary></indexterm> + </term> + <term> + <option>-fallow-incoherent-instances</option> + <indexterm><primary><option>-fallow-incoherent-instances</option></primary></indexterm> + </term> + <term> + <option>-fcontext-stack</option> + <indexterm><primary><option>-fcontext-stack</option></primary></indexterm> + </term> + <listitem> + <para> See <xref linkend="instance-decls"/>. Only relevant + if you also use <option>-fglasgow-exts</option>.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-finline-phase</option> + <indexterm><primary><option>-finline-phase</option></primary></indexterm> + </term> + <listitem> + <para>See <xref linkend="rewrite-rules"/>. Only relevant if + you also use <option>-fglasgow-exts</option>.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-farrows</option> + <indexterm><primary><option>-farrows</option></primary></indexterm> + </term> + <listitem> + <para>See <xref linkend="arrow-notation"/>. Independent of + <option>-fglasgow-exts</option>.</para> + + <para>New reserved words/symbols: <literal>rec</literal>, + <literal>proc</literal>, <literal>-<</literal>, + <literal>>-</literal>, <literal>-<<</literal>, + <literal>>>-</literal>.</para> + + <para>Other syntax stolen: <literal>(|</literal>, + <literal>|)</literal>.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <option>-fgenerics</option> + <indexterm><primary><option>-fgenerics</option></primary></indexterm> + </term> + <listitem> + <para>See <xref linkend="generic-classes"/>. Independent of + <option>-fglasgow-exts</option>.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-fno-implicit-prelude</option></term> + <listitem> + <para><indexterm><primary>-fno-implicit-prelude + option</primary></indexterm> GHC normally imports + <filename>Prelude.hi</filename> files for you. If you'd + rather it didn't, then give it a + <option>-fno-implicit-prelude</option> option. The idea is + that you can then import a Prelude of your own. (But don't + call it <literal>Prelude</literal>; the Haskell module + namespace is flat, and you must not conflict with any + Prelude module.)</para> + + <para>Even though you have not imported the Prelude, most of + the built-in syntax still refers to the built-in Haskell + Prelude types and values, as specified by the Haskell + Report. For example, the type <literal>[Int]</literal> + still means <literal>Prelude.[] Int</literal>; tuples + continue to refer to the standard Prelude tuples; the + translation for list comprehensions continues to use + <literal>Prelude.map</literal> etc.</para> + + <para>However, <option>-fno-implicit-prelude</option> does + change the handling of certain built-in syntax: see <xref + linkend="rebindable-syntax"/>.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-fimplicit-params</option></term> + <listitem> + <para>Enables implicit parameters (see <xref + linkend="implicit-parameters"/>). Currently also implied by + <option>-fglasgow-exts</option>.</para> + + <para>Syntax stolen: + <literal>?<replaceable>varid</replaceable></literal>, + <literal>%<replaceable>varid</replaceable></literal>.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-fscoped-type-variables</option></term> + <listitem> + <para>Enables lexically-scoped type variables (see <xref + linkend="scoped-type-variables"/>). Implied by + <option>-fglasgow-exts</option>.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-fth</option></term> + <listitem> + <para>Enables Template Haskell (see <xref + linkend="template-haskell"/>). Currently also implied by + <option>-fglasgow-exts</option>.</para> + + <para>Syntax stolen: <literal>[|</literal>, + <literal>[e|</literal>, <literal>[p|</literal>, + <literal>[d|</literal>, <literal>[t|</literal>, + <literal>$(</literal>, + <literal>$<replaceable>varid</replaceable></literal>.</para> + </listitem> + </varlistentry> + + </variablelist> + </sect1> + +<!-- UNBOXED TYPES AND PRIMITIVE OPERATIONS --> +<!-- included from primitives.sgml --> +<!-- &primitives; --> +<sect1 id="primitives"> + <title>Unboxed types and primitive operations</title> + +<para>GHC is built on a raft of primitive data types and operations. +While you really can use this stuff to write fast code, + we generally find it a lot less painful, and more satisfying in the + long run, to use higher-level language features and libraries. With + any luck, the code you write will be optimised to the efficient + unboxed version in any case. And if it isn't, we'd like to know + about it.</para> + +<para>We do not currently have good, up-to-date documentation about the +primitives, perhaps because they are mainly intended for internal use. +There used to be a long section about them here in the User Guide, but it +became out of date, and wrong information is worse than none.</para> + +<para>The Real Truth about what primitive types there are, and what operations +work over those types, is held in the file +<filename>fptools/ghc/compiler/prelude/primops.txt.pp</filename>. +This file is used directly to generate GHC's primitive-operation definitions, so +it is always correct! It is also intended for processing into text.</para> + +<para> Indeed, +the result of such processing is part of the description of the + <ulink + url="http://haskell.cs.yale.edu/ghc/docs/papers/core.ps.gz">External + Core language</ulink>. +So that document is a good place to look for a type-set version. +We would be very happy if someone wanted to volunteer to produce an SGML +back end to the program that processes <filename>primops.txt</filename> so that +we could include the results here in the User Guide.</para> + +<para>What follows here is a brief summary of some main points.</para> + +<sect2 id="glasgow-unboxed"> +<title>Unboxed types +</title> + +<para> +<indexterm><primary>Unboxed types (Glasgow extension)</primary></indexterm> +</para> + +<para>Most types in GHC are <firstterm>boxed</firstterm>, which means +that values of that type are represented by a pointer to a heap +object. The representation of a Haskell <literal>Int</literal>, for +example, is a two-word heap object. An <firstterm>unboxed</firstterm> +type, however, is represented by the value itself, no pointers or heap +allocation are involved. +</para> + +<para> +Unboxed types correspond to the “raw machine” types you +would use in C: <literal>Int#</literal> (long int), +<literal>Double#</literal> (double), <literal>Addr#</literal> +(void *), etc. The <emphasis>primitive operations</emphasis> +(PrimOps) on these types are what you might expect; e.g., +<literal>(+#)</literal> is addition on +<literal>Int#</literal>s, and is the machine-addition that we all +know and love—usually one instruction. +</para> + +<para> +Primitive (unboxed) types cannot be defined in Haskell, and are +therefore built into the language and compiler. Primitive types are +always unlifted; that is, a value of a primitive type cannot be +bottom. We use the convention that primitive types, values, and +operations have a <literal>#</literal> suffix. +</para> + +<para> +Primitive values are often represented by a simple bit-pattern, such +as <literal>Int#</literal>, <literal>Float#</literal>, +<literal>Double#</literal>. But this is not necessarily the case: +a primitive value might be represented by a pointer to a +heap-allocated object. Examples include +<literal>Array#</literal>, the type of primitive arrays. A +primitive array is heap-allocated because it is too big a value to fit +in a register, and would be too expensive to copy around; in a sense, +it is accidental that it is represented by a pointer. If a pointer +represents a primitive value, then it really does point to that value: +no unevaluated thunks, no indirections…nothing can be at the +other end of the pointer than the primitive value. +A numerically-intensive program using unboxed types can +go a <emphasis>lot</emphasis> faster than its “standard” +counterpart—we saw a threefold speedup on one example. +</para> + +<para> +There are some restrictions on the use of primitive types: +<itemizedlist> +<listitem><para>The main restriction +is that you can't pass a primitive value to a polymorphic +function or store one in a polymorphic data type. This rules out +things like <literal>[Int#]</literal> (i.e. lists of primitive +integers). The reason for this restriction is that polymorphic +arguments and constructor fields are assumed to be pointers: if an +unboxed integer is stored in one of these, the garbage collector would +attempt to follow it, leading to unpredictable space leaks. Or a +<function>seq</function> operation on the polymorphic component may +attempt to dereference the pointer, with disastrous results. Even +worse, the unboxed value might be larger than a pointer +(<literal>Double#</literal> for instance). +</para> +</listitem> +<listitem><para> You cannot bind a variable with an unboxed type +in a <emphasis>top-level</emphasis> binding. +</para></listitem> +<listitem><para> You cannot bind a variable with an unboxed type +in a <emphasis>recursive</emphasis> binding. +</para></listitem> +<listitem><para> You may bind unboxed variables in a (non-recursive, +non-top-level) pattern binding, but any such variable causes the entire +pattern-match +to become strict. For example: +<programlisting> + data Foo = Foo Int Int# + + f x = let (Foo a b, w) = ..rhs.. in ..body.. +</programlisting> +Since <literal>b</literal> has type <literal>Int#</literal>, the entire pattern +match +is strict, and the program behaves as if you had written +<programlisting> + data Foo = Foo Int Int# + + f x = case ..rhs.. of { (Foo a b, w) -> ..body.. } +</programlisting> +</para> +</listitem> +</itemizedlist> +</para> + +</sect2> + +<sect2 id="unboxed-tuples"> +<title>Unboxed Tuples +</title> + +<para> +Unboxed tuples aren't really exported by <literal>GHC.Exts</literal>, +they're available by default with <option>-fglasgow-exts</option>. An +unboxed tuple looks like this: +</para> + +<para> + +<programlisting> +(# e_1, ..., e_n #) +</programlisting> + +</para> + +<para> +where <literal>e_1..e_n</literal> are expressions of any +type (primitive or non-primitive). The type of an unboxed tuple looks +the same. +</para> + +<para> +Unboxed tuples are used for functions that need to return multiple +values, but they avoid the heap allocation normally associated with +using fully-fledged tuples. When an unboxed tuple is returned, the +components are put directly into registers or on the stack; the +unboxed tuple itself does not have a composite representation. Many +of the primitive operations listed in <literal>primops.txt.pp</literal> return unboxed +tuples. +In particular, the <literal>IO</literal> and <literal>ST</literal> monads use unboxed +tuples to avoid unnecessary allocation during sequences of operations. +</para> + +<para> +There are some pretty stringent restrictions on the use of unboxed tuples: +<itemizedlist> +<listitem> + +<para> +Values of unboxed tuple types are subject to the same restrictions as +other unboxed types; i.e. they may not be stored in polymorphic data +structures or passed to polymorphic functions. + +</para> +</listitem> +<listitem> + +<para> +No variable can have an unboxed tuple type, nor may a constructor or function +argument have an unboxed tuple type. The following are all illegal: + + +<programlisting> + data Foo = Foo (# Int, Int #) + + f :: (# Int, Int #) -> (# Int, Int #) + f x = x + + g :: (# Int, Int #) -> Int + g (# a,b #) = a + + h x = let y = (# x,x #) in ... +</programlisting> +</para> +</listitem> +</itemizedlist> +</para> +<para> +The typical use of unboxed tuples is simply to return multiple values, +binding those multiple results with a <literal>case</literal> expression, thus: +<programlisting> + f x y = (# x+1, y-1 #) + g x = case f x x of { (# a, b #) -> a + b } +</programlisting> +You can have an unboxed tuple in a pattern binding, thus +<programlisting> + f x = let (# p,q #) = h x in ..body.. +</programlisting> +If the types of <literal>p</literal> and <literal>q</literal> are not unboxed, +the resulting binding is lazy like any other Haskell pattern binding. The +above example desugars like this: +<programlisting> + f x = let t = case h x o f{ (# p,q #) -> (p,q) + p = fst t + q = snd t + in ..body.. +</programlisting> +Indeed, the bindings can even be recursive. +</para> + +</sect2> +</sect1> + + +<!-- ====================== SYNTACTIC EXTENSIONS ======================= --> + +<sect1 id="syntax-extns"> +<title>Syntactic extensions</title> + + <!-- ====================== HIERARCHICAL MODULES ======================= --> + + <sect2 id="hierarchical-modules"> + <title>Hierarchical Modules</title> + + <para>GHC supports a small extension to the syntax of module + names: a module name is allowed to contain a dot + <literal>‘.’</literal>. This is also known as the + “hierarchical module namespace” extension, because + it extends the normally flat Haskell module namespace into a + more flexible hierarchy of modules.</para> + + <para>This extension has very little impact on the language + itself; modules names are <emphasis>always</emphasis> fully + qualified, so you can just think of the fully qualified module + name as <quote>the module name</quote>. In particular, this + means that the full module name must be given after the + <literal>module</literal> keyword at the beginning of the + module; for example, the module <literal>A.B.C</literal> must + begin</para> + +<programlisting>module A.B.C</programlisting> + + + <para>It is a common strategy to use the <literal>as</literal> + keyword to save some typing when using qualified names with + hierarchical modules. For example:</para> + +<programlisting> +import qualified Control.Monad.ST.Strict as ST +</programlisting> + + <para>For details on how GHC searches for source and interface + files in the presence of hierarchical modules, see <xref + linkend="search-path"/>.</para> + + <para>GHC comes with a large collection of libraries arranged + hierarchically; see the accompanying library documentation. + There is an ongoing project to create and maintain a stable set + of <quote>core</quote> libraries used by several Haskell + compilers, and the libraries that GHC comes with represent the + current status of that project. For more details, see <ulink + url="http://www.haskell.org/~simonmar/libraries/libraries.html">Haskell + Libraries</ulink>.</para> + + </sect2> + + <!-- ====================== PATTERN GUARDS ======================= --> + +<sect2 id="pattern-guards"> +<title>Pattern guards</title> + +<para> +<indexterm><primary>Pattern guards (Glasgow extension)</primary></indexterm> +The discussion that follows is an abbreviated version of Simon Peyton Jones's original <ulink url="http://research.microsoft.com/~simonpj/Haskell/guards.html">proposal</ulink>. (Note that the proposal was written before pattern guards were implemented, so refers to them as unimplemented.) +</para> + +<para> +Suppose we have an abstract data type of finite maps, with a +lookup operation: + +<programlisting> +lookup :: FiniteMap -> Int -> Maybe Int +</programlisting> + +The lookup returns <function>Nothing</function> if the supplied key is not in the domain of the mapping, and <function>(Just v)</function> otherwise, +where <varname>v</varname> is the value that the key maps to. Now consider the following definition: +</para> + +<programlisting> +clunky env var1 var2 | ok1 && ok2 = val1 + val2 +| otherwise = var1 + var2 +where + m1 = lookup env var1 + m2 = lookup env var2 + ok1 = maybeToBool m1 + ok2 = maybeToBool m2 + val1 = expectJust m1 + val2 = expectJust m2 +</programlisting> + +<para> +The auxiliary functions are +</para> + +<programlisting> +maybeToBool :: Maybe a -> Bool +maybeToBool (Just x) = True +maybeToBool Nothing = False + +expectJust :: Maybe a -> a +expectJust (Just x) = x +expectJust Nothing = error "Unexpected Nothing" +</programlisting> + +<para> +What is <function>clunky</function> doing? The guard <literal>ok1 && +ok2</literal> checks that both lookups succeed, using +<function>maybeToBool</function> to convert the <function>Maybe</function> +types to booleans. The (lazily evaluated) <function>expectJust</function> +calls extract the values from the results of the lookups, and binds the +returned values to <varname>val1</varname> and <varname>val2</varname> +respectively. If either lookup fails, then clunky takes the +<literal>otherwise</literal> case and returns the sum of its arguments. +</para> + +<para> +This is certainly legal Haskell, but it is a tremendously verbose and +un-obvious way to achieve the desired effect. Arguably, a more direct way +to write clunky would be to use case expressions: +</para> + +<programlisting> +clunky env var1 var1 = case lookup env var1 of + Nothing -> fail + Just val1 -> case lookup env var2 of + Nothing -> fail + Just val2 -> val1 + val2 +where + fail = val1 + val2 +</programlisting> + +<para> +This is a bit shorter, but hardly better. Of course, we can rewrite any set +of pattern-matching, guarded equations as case expressions; that is +precisely what the compiler does when compiling equations! The reason that +Haskell provides guarded equations is because they allow us to write down +the cases we want to consider, one at a time, independently of each other. +This structure is hidden in the case version. Two of the right-hand sides +are really the same (<function>fail</function>), and the whole expression +tends to become more and more indented. +</para> + +<para> +Here is how I would write clunky: +</para> + +<programlisting> +clunky env var1 var1 + | Just val1 <- lookup env var1 + , Just val2 <- lookup env var2 + = val1 + val2 +...other equations for clunky... +</programlisting> + +<para> +The semantics should be clear enough. The qualifiers are matched in order. +For a <literal><-</literal> qualifier, which I call a pattern guard, the +right hand side is evaluated and matched against the pattern on the left. +If the match fails then the whole guard fails and the next equation is +tried. If it succeeds, then the appropriate binding takes place, and the +next qualifier is matched, in the augmented environment. Unlike list +comprehensions, however, the type of the expression to the right of the +<literal><-</literal> is the same as the type of the pattern to its +left. The bindings introduced by pattern guards scope over all the +remaining guard qualifiers, and over the right hand side of the equation. +</para> + +<para> +Just as with list comprehensions, boolean expressions can be freely mixed +with among the pattern guards. For example: +</para> + +<programlisting> +f x | [y] <- x + , y > 3 + , Just z <- h y + = ... +</programlisting> + +<para> +Haskell's current guards therefore emerge as a special case, in which the +qualifier list has just one element, a boolean expression. +</para> +</sect2> + + <!-- ===================== Recursive do-notation =================== --> + +<sect2 id="mdo-notation"> +<title>The recursive do-notation +</title> + +<para> The recursive do-notation (also known as mdo-notation) is implemented as described in +"A recursive do for Haskell", +Levent Erkok, John Launchbury", +Haskell Workshop 2002, pages: 29-37. Pittsburgh, Pennsylvania. +</para> +<para> +The do-notation of Haskell does not allow <emphasis>recursive bindings</emphasis>, +that is, the variables bound in a do-expression are visible only in the textually following +code block. Compare this to a let-expression, where bound variables are visible in the entire binding +group. It turns out that several applications can benefit from recursive bindings in +the do-notation, and this extension provides the necessary syntactic support. +</para> +<para> +Here is a simple (yet contrived) example: +</para> +<programlisting> +import Control.Monad.Fix + +justOnes = mdo xs <- Just (1:xs) + return xs +</programlisting> +<para> +As you can guess <literal>justOnes</literal> will evaluate to <literal>Just [1,1,1,...</literal>. +</para> + +<para> +The Control.Monad.Fix library introduces the <literal>MonadFix</literal> class. It's definition is: +</para> +<programlisting> +class Monad m => MonadFix m where + mfix :: (a -> m a) -> m a +</programlisting> +<para> +The function <literal>mfix</literal> +dictates how the required recursion operation should be performed. If recursive bindings are required for a monad, +then that monad must be declared an instance of the <literal>MonadFix</literal> class. +For details, see the above mentioned reference. +</para> +<para> +The following instances of <literal>MonadFix</literal> are automatically provided: List, Maybe, IO. +Furthermore, the Control.Monad.ST and Control.Monad.ST.Lazy modules provide the instances of the MonadFix class +for Haskell's internal state monad (strict and lazy, respectively). +</para> +<para> +There are three important points in using the recursive-do notation: +<itemizedlist> +<listitem><para> +The recursive version of the do-notation uses the keyword <literal>mdo</literal> (rather +than <literal>do</literal>). +</para></listitem> + +<listitem><para> +You should <literal>import Control.Monad.Fix</literal>. +(Note: Strictly speaking, this import is required only when you need to refer to the name +<literal>MonadFix</literal> in your program, but the import is always safe, and the programmers +are encouraged to always import this module when using the mdo-notation.) +</para></listitem> + +<listitem><para> +As with other extensions, ghc should be given the flag <literal>-fglasgow-exts</literal> +</para></listitem> +</itemizedlist> +</para> + +<para> +The web page: <ulink url="http://www.cse.ogi.edu/PacSoft/projects/rmb">http://www.cse.ogi.edu/PacSoft/projects/rmb</ulink> +contains up to date information on recursive monadic bindings. +</para> + +<para> +Historical note: The old implementation of the mdo-notation (and most +of the existing documents) used the name +<literal>MonadRec</literal> for the class and the corresponding library. +This name is not supported by GHC. +</para> + +</sect2> + + + <!-- ===================== PARALLEL LIST COMPREHENSIONS =================== --> + + <sect2 id="parallel-list-comprehensions"> + <title>Parallel List Comprehensions</title> + <indexterm><primary>list comprehensions</primary><secondary>parallel</secondary> + </indexterm> + <indexterm><primary>parallel list comprehensions</primary> + </indexterm> + + <para>Parallel list comprehensions are a natural extension to list + comprehensions. List comprehensions can be thought of as a nice + syntax for writing maps and filters. Parallel comprehensions + extend this to include the zipWith family.</para> + + <para>A parallel list comprehension has multiple independent + branches of qualifier lists, each separated by a `|' symbol. For + example, the following zips together two lists:</para> + +<programlisting> + [ (x, y) | x <- xs | y <- ys ] +</programlisting> + + <para>The behavior of parallel list comprehensions follows that of + zip, in that the resulting list will have the same length as the + shortest branch.</para> + + <para>We can define parallel list comprehensions by translation to + regular comprehensions. Here's the basic idea:</para> + + <para>Given a parallel comprehension of the form: </para> + +<programlisting> + [ e | p1 <- e11, p2 <- e12, ... + | q1 <- e21, q2 <- e22, ... + ... + ] +</programlisting> + + <para>This will be translated to: </para> + +<programlisting> + [ e | ((p1,p2), (q1,q2), ...) <- zipN [(p1,p2) | p1 <- e11, p2 <- e12, ...] + [(q1,q2) | q1 <- e21, q2 <- e22, ...] + ... + ] +</programlisting> + + <para>where `zipN' is the appropriate zip for the given number of + branches.</para> + + </sect2> + +<sect2 id="rebindable-syntax"> +<title>Rebindable syntax</title> + + + <para>GHC allows most kinds of built-in syntax to be rebound by + the user, to facilitate replacing the <literal>Prelude</literal> + with a home-grown version, for example.</para> + + <para>You may want to define your own numeric class + hierarchy. It completely defeats that purpose if the + literal "1" means "<literal>Prelude.fromInteger + 1</literal>", which is what the Haskell Report specifies. + So the <option>-fno-implicit-prelude</option> flag causes + the following pieces of built-in syntax to refer to + <emphasis>whatever is in scope</emphasis>, not the Prelude + versions: + + <itemizedlist> + <listitem> + <para>An integer literal <literal>368</literal> means + "<literal>fromInteger (368::Integer)</literal>", rather than + "<literal>Prelude.fromInteger (368::Integer)</literal>". +</para> </listitem> + + <listitem><para>Fractional literals are handed in just the same way, + except that the translation is + <literal>fromRational (3.68::Rational)</literal>. +</para> </listitem> + + <listitem><para>The equality test in an overloaded numeric pattern + uses whatever <literal>(==)</literal> is in scope. +</para> </listitem> + + <listitem><para>The subtraction operation, and the + greater-than-or-equal test, in <literal>n+k</literal> patterns + use whatever <literal>(-)</literal> and <literal>(>=)</literal> are in scope. + </para></listitem> + + <listitem> + <para>Negation (e.g. "<literal>- (f x)</literal>") + means "<literal>negate (f x)</literal>", both in numeric + patterns, and expressions. + </para></listitem> + + <listitem> + <para>"Do" notation is translated using whatever + functions <literal>(>>=)</literal>, + <literal>(>>)</literal>, and <literal>fail</literal>, + are in scope (not the Prelude + versions). List comprehensions, mdo (<xref linkend="mdo-notation"/>), and parallel array + comprehensions, are unaffected. </para></listitem> + + <listitem> + <para>Arrow + notation (see <xref linkend="arrow-notation"/>) + uses whatever <literal>arr</literal>, + <literal>(>>>)</literal>, <literal>first</literal>, + <literal>app</literal>, <literal>(|||)</literal> and + <literal>loop</literal> functions are in scope. But unlike the + other constructs, the types of these functions must match the + Prelude types very closely. Details are in flux; if you want + to use this, ask! + </para></listitem> + </itemizedlist> +In all cases (apart from arrow notation), the static semantics should be that of the desugared form, +even if that is a little unexpected. For emample, the +static semantics of the literal <literal>368</literal> +is exactly that of <literal>fromInteger (368::Integer)</literal>; it's fine for +<literal>fromInteger</literal> to have any of the types: +<programlisting> +fromInteger :: Integer -> Integer +fromInteger :: forall a. Foo a => Integer -> a +fromInteger :: Num a => a -> Integer +fromInteger :: Integer -> Bool -> Bool +</programlisting> +</para> + + <para>Be warned: this is an experimental facility, with + fewer checks than usual. Use <literal>-dcore-lint</literal> + to typecheck the desugared program. If Core Lint is happy + you should be all right.</para> + +</sect2> +</sect1> + + +<!-- TYPE SYSTEM EXTENSIONS --> +<sect1 id="type-extensions"> +<title>Type system extensions</title> + + +<sect2> +<title>Data types and type synonyms</title> + +<sect3 id="nullary-types"> +<title>Data types with no constructors</title> + +<para>With the <option>-fglasgow-exts</option> flag, GHC lets you declare +a data type with no constructors. For example:</para> + +<programlisting> + data S -- S :: * + data T a -- T :: * -> * +</programlisting> + +<para>Syntactically, the declaration lacks the "= constrs" part. The +type can be parameterised over types of any kind, but if the kind is +not <literal>*</literal> then an explicit kind annotation must be used +(see <xref linkend="sec-kinding"/>).</para> + +<para>Such data types have only one value, namely bottom. +Nevertheless, they can be useful when defining "phantom types".</para> +</sect3> + +<sect3 id="infix-tycons"> +<title>Infix type constructors, classes, and type variables</title> + +<para> +GHC allows type constructors, classes, and type variables to be operators, and +to be written infix, very much like expressions. More specifically: +<itemizedlist> +<listitem><para> + A type constructor or class can be an operator, beginning with a colon; e.g. <literal>:*:</literal>. + The lexical syntax is the same as that for data constructors. + </para></listitem> +<listitem><para> + Data type and type-synonym declarations can be written infix, parenthesised + if you want further arguments. E.g. +<screen> + data a :*: b = Foo a b + type a :+: b = Either a b + class a :=: b where ... + + data (a :**: b) x = Baz a b x + type (a :++: b) y = Either (a,b) y +</screen> + </para></listitem> +<listitem><para> + Types, and class constraints, can be written infix. For example + <screen> + x :: Int :*: Bool + f :: (a :=: b) => a -> b + </screen> + </para></listitem> +<listitem><para> + A type variable can be an (unqualified) operator e.g. <literal>+</literal>. + The lexical syntax is the same as that for variable operators, excluding "(.)", + "(!)", and "(*)". In a binding position, the operator must be + parenthesised. For example: +<programlisting> + type T (+) = Int + Int + f :: T Either + f = Left 3 + + liftA2 :: Arrow (~>) + => (a -> b -> c) -> (e ~> a) -> (e ~> b) -> (e ~> c) + liftA2 = ... +</programlisting> + </para></listitem> +<listitem><para> + Back-quotes work + as for expressions, both for type constructors and type variables; e.g. <literal>Int `Either` Bool</literal>, or + <literal>Int `a` Bool</literal>. Similarly, parentheses work the same; e.g. <literal>(:*:) Int Bool</literal>. + </para></listitem> +<listitem><para> + Fixities may be declared for type constructors, or classes, just as for data constructors. However, + one cannot distinguish between the two in a fixity declaration; a fixity declaration + sets the fixity for a data constructor and the corresponding type constructor. For example: +<screen> + infixl 7 T, :*: +</screen> + sets the fixity for both type constructor <literal>T</literal> and data constructor <literal>T</literal>, + and similarly for <literal>:*:</literal>. + <literal>Int `a` Bool</literal>. + </para></listitem> +<listitem><para> + Function arrow is <literal>infixr</literal> with fixity 0. (This might change; I'm not sure what it should be.) + </para></listitem> + +</itemizedlist> +</para> +</sect3> + +<sect3 id="type-synonyms"> +<title>Liberalised type synonyms</title> + +<para> +Type synonyms are like macros at the type level, and +GHC does validity checking on types <emphasis>only after expanding type synonyms</emphasis>. +That means that GHC can be very much more liberal about type synonyms than Haskell 98: +<itemizedlist> +<listitem> <para>You can write a <literal>forall</literal> (including overloading) +in a type synonym, thus: +<programlisting> + type Discard a = forall b. Show b => a -> b -> (a, String) + + f :: Discard a + f x y = (x, show y) + + g :: Discard Int -> (Int,Bool) -- A rank-2 type + g f = f Int True +</programlisting> +</para> +</listitem> + +<listitem><para> +You can write an unboxed tuple in a type synonym: +<programlisting> + type Pr = (# Int, Int #) + + h :: Int -> Pr + h x = (# x, x #) +</programlisting> +</para></listitem> + +<listitem><para> +You can apply a type synonym to a forall type: +<programlisting> + type Foo a = a -> a -> Bool + + f :: Foo (forall b. b->b) +</programlisting> +After expanding the synonym, <literal>f</literal> has the legal (in GHC) type: +<programlisting> + f :: (forall b. b->b) -> (forall b. b->b) -> Bool +</programlisting> +</para></listitem> + +<listitem><para> +You can apply a type synonym to a partially applied type synonym: +<programlisting> + type Generic i o = forall x. i x -> o x + type Id x = x + + foo :: Generic Id [] +</programlisting> +After expanding the synonym, <literal>foo</literal> has the legal (in GHC) type: +<programlisting> + foo :: forall x. x -> [x] +</programlisting> +</para></listitem> + +</itemizedlist> +</para> + +<para> +GHC currently does kind checking before expanding synonyms (though even that +could be changed.) +</para> +<para> +After expanding type synonyms, GHC does validity checking on types, looking for +the following mal-formedness which isn't detected simply by kind checking: +<itemizedlist> +<listitem><para> +Type constructor applied to a type involving for-alls. +</para></listitem> +<listitem><para> +Unboxed tuple on left of an arrow. +</para></listitem> +<listitem><para> +Partially-applied type synonym. +</para></listitem> +</itemizedlist> +So, for example, +this will be rejected: +<programlisting> + type Pr = (# Int, Int #) + + h :: Pr -> Int + h x = ... +</programlisting> +because GHC does not allow unboxed tuples on the left of a function arrow. +</para> +</sect3> + + +<sect3 id="existential-quantification"> +<title>Existentially quantified data constructors +</title> + +<para> +The idea of using existential quantification in data type declarations +was suggested by Perry, and implemented in Hope+ (Nigel Perry, <emphasis>The Implementation +of Practical Functional Programming Languages</emphasis>, PhD Thesis, University of +London, 1991). It was later formalised by Laufer and Odersky +(<emphasis>Polymorphic type inference and abstract data types</emphasis>, +TOPLAS, 16(5), pp1411-1430, 1994). +It's been in Lennart +Augustsson's <command>hbc</command> Haskell compiler for several years, and +proved very useful. Here's the idea. Consider the declaration: +</para> + +<para> + +<programlisting> + data Foo = forall a. MkFoo a (a -> Bool) + | Nil +</programlisting> + +</para> + +<para> +The data type <literal>Foo</literal> has two constructors with types: +</para> + +<para> + +<programlisting> + MkFoo :: forall a. a -> (a -> Bool) -> Foo + Nil :: Foo +</programlisting> + +</para> + +<para> +Notice that the type variable <literal>a</literal> in the type of <function>MkFoo</function> +does not appear in the data type itself, which is plain <literal>Foo</literal>. +For example, the following expression is fine: +</para> + +<para> + +<programlisting> + [MkFoo 3 even, MkFoo 'c' isUpper] :: [Foo] +</programlisting> + +</para> + +<para> +Here, <literal>(MkFoo 3 even)</literal> packages an integer with a function +<function>even</function> that maps an integer to <literal>Bool</literal>; and <function>MkFoo 'c' +isUpper</function> packages a character with a compatible function. These +two things are each of type <literal>Foo</literal> and can be put in a list. +</para> + +<para> +What can we do with a value of type <literal>Foo</literal>?. In particular, +what happens when we pattern-match on <function>MkFoo</function>? +</para> + +<para> + +<programlisting> + f (MkFoo val fn) = ??? +</programlisting> + +</para> + +<para> +Since all we know about <literal>val</literal> and <function>fn</function> is that they +are compatible, the only (useful) thing we can do with them is to +apply <function>fn</function> to <literal>val</literal> to get a boolean. For example: +</para> + +<para> + +<programlisting> + f :: Foo -> Bool + f (MkFoo val fn) = fn val +</programlisting> + +</para> + +<para> +What this allows us to do is to package heterogenous values +together with a bunch of functions that manipulate them, and then treat +that collection of packages in a uniform manner. You can express +quite a bit of object-oriented-like programming this way. +</para> + +<sect4 id="existential"> +<title>Why existential? +</title> + +<para> +What has this to do with <emphasis>existential</emphasis> quantification? +Simply that <function>MkFoo</function> has the (nearly) isomorphic type +</para> + +<para> + +<programlisting> + MkFoo :: (exists a . (a, a -> Bool)) -> Foo +</programlisting> + +</para> + +<para> +But Haskell programmers can safely think of the ordinary +<emphasis>universally</emphasis> quantified type given above, thereby avoiding +adding a new existential quantification construct. +</para> + +</sect4> + +<sect4> +<title>Type classes</title> + +<para> +An easy extension is to allow +arbitrary contexts before the constructor. For example: +</para> + +<para> + +<programlisting> +data Baz = forall a. Eq a => Baz1 a a + | forall b. Show b => Baz2 b (b -> b) +</programlisting> + +</para> + +<para> +The two constructors have the types you'd expect: +</para> + +<para> + +<programlisting> +Baz1 :: forall a. Eq a => a -> a -> Baz +Baz2 :: forall b. Show b => b -> (b -> b) -> Baz +</programlisting> + +</para> + +<para> +But when pattern matching on <function>Baz1</function> the matched values can be compared +for equality, and when pattern matching on <function>Baz2</function> the first matched +value can be converted to a string (as well as applying the function to it). +So this program is legal: +</para> + +<para> + +<programlisting> + f :: Baz -> String + f (Baz1 p q) | p == q = "Yes" + | otherwise = "No" + f (Baz2 v fn) = show (fn v) +</programlisting> + +</para> + +<para> +Operationally, in a dictionary-passing implementation, the +constructors <function>Baz1</function> and <function>Baz2</function> must store the +dictionaries for <literal>Eq</literal> and <literal>Show</literal> respectively, and +extract it on pattern matching. +</para> + +<para> +Notice the way that the syntax fits smoothly with that used for +universal quantification earlier. +</para> + +</sect4> + +<sect4> +<title>Record Constructors</title> + +<para> +GHC allows existentials to be used with records syntax as well. For example: + +<programlisting> +data Counter a = forall self. NewCounter + { _this :: self + , _inc :: self -> self + , _display :: self -> IO () + , tag :: a + } +</programlisting> +Here <literal>tag</literal> is a public field, with a well-typed selector +function <literal>tag :: Counter a -> a</literal>. The <literal>self</literal> +type is hidden from the outside; any attempt to apply <literal>_this</literal>, +<literal>_inc</literal> or <literal>_output</literal> as functions will raise a +compile-time error. In other words, <emphasis>GHC defines a record selector function +only for fields whose type does not mention the existentially-quantified variables</emphasis>. +(This example used an underscore in the fields for which record selectors +will not be defined, but that is only programming style; GHC ignores them.) +</para> + +<para> +To make use of these hidden fields, we need to create some helper functions: + +<programlisting> +inc :: Counter a -> Counter a +inc (NewCounter x i d t) = NewCounter + { _this = i x, _inc = i, _display = d, tag = t } + +display :: Counter a -> IO () +display NewCounter{ _this = x, _display = d } = d x +</programlisting> + +Now we can define counters with different underlying implementations: + +<programlisting> +counterA :: Counter String +counterA = NewCounter + { _this = 0, _inc = (1+), _display = print, tag = "A" } + +counterB :: Counter String +counterB = NewCounter + { _this = "", _inc = ('#':), _display = putStrLn, tag = "B" } + +main = do + display (inc counterA) -- prints "1" + display (inc (inc counterB)) -- prints "##" +</programlisting> + +In GADT declarations (see <xref linkend="gadt"/>), the explicit +<literal>forall</literal> may be omitted. For example, we can express +the same <literal>Counter a</literal> using GADT: + +<programlisting> +data Counter a where + NewCounter { _this :: self + , _inc :: self -> self + , _display :: self -> IO () + , tag :: a + } + :: Counter a +</programlisting> + +At the moment, record update syntax is only supported for Haskell 98 data types, +so the following function does <emphasis>not</emphasis> work: + +<programlisting> +-- This is invalid; use explicit NewCounter instead for now +setTag :: Counter a -> a -> Counter a +setTag obj t = obj{ tag = t } +</programlisting> + +</para> + +</sect4> + + +<sect4> +<title>Restrictions</title> + +<para> +There are several restrictions on the ways in which existentially-quantified +constructors can be use. +</para> + +<para> + +<itemizedlist> +<listitem> + +<para> + When pattern matching, each pattern match introduces a new, +distinct, type for each existential type variable. These types cannot +be unified with any other type, nor can they escape from the scope of +the pattern match. For example, these fragments are incorrect: + + +<programlisting> +f1 (MkFoo a f) = a +</programlisting> + + +Here, the type bound by <function>MkFoo</function> "escapes", because <literal>a</literal> +is the result of <function>f1</function>. One way to see why this is wrong is to +ask what type <function>f1</function> has: + + +<programlisting> + f1 :: Foo -> a -- Weird! +</programlisting> + + +What is this "<literal>a</literal>" in the result type? Clearly we don't mean +this: + + +<programlisting> + f1 :: forall a. Foo -> a -- Wrong! +</programlisting> + + +The original program is just plain wrong. Here's another sort of error + + +<programlisting> + f2 (Baz1 a b) (Baz1 p q) = a==q +</programlisting> + + +It's ok to say <literal>a==b</literal> or <literal>p==q</literal>, but +<literal>a==q</literal> is wrong because it equates the two distinct types arising +from the two <function>Baz1</function> constructors. + + +</para> +</listitem> +<listitem> + +<para> +You can't pattern-match on an existentially quantified +constructor in a <literal>let</literal> or <literal>where</literal> group of +bindings. So this is illegal: + + +<programlisting> + f3 x = a==b where { Baz1 a b = x } +</programlisting> + +Instead, use a <literal>case</literal> expression: + +<programlisting> + f3 x = case x of Baz1 a b -> a==b +</programlisting> + +In general, you can only pattern-match +on an existentially-quantified constructor in a <literal>case</literal> expression or +in the patterns of a function definition. + +The reason for this restriction is really an implementation one. +Type-checking binding groups is already a nightmare without +existentials complicating the picture. Also an existential pattern +binding at the top level of a module doesn't make sense, because it's +not clear how to prevent the existentially-quantified type "escaping". +So for now, there's a simple-to-state restriction. We'll see how +annoying it is. + +</para> +</listitem> +<listitem> + +<para> +You can't use existential quantification for <literal>newtype</literal> +declarations. So this is illegal: + + +<programlisting> + newtype T = forall a. Ord a => MkT a +</programlisting> + + +Reason: a value of type <literal>T</literal> must be represented as a +pair of a dictionary for <literal>Ord t</literal> and a value of type +<literal>t</literal>. That contradicts the idea that +<literal>newtype</literal> should have no concrete representation. +You can get just the same efficiency and effect by using +<literal>data</literal> instead of <literal>newtype</literal>. If +there is no overloading involved, then there is more of a case for +allowing an existentially-quantified <literal>newtype</literal>, +because the <literal>data</literal> version does carry an +implementation cost, but single-field existentially quantified +constructors aren't much use. So the simple restriction (no +existential stuff on <literal>newtype</literal>) stands, unless there +are convincing reasons to change it. + + +</para> +</listitem> +<listitem> + +<para> + You can't use <literal>deriving</literal> to define instances of a +data type with existentially quantified data constructors. + +Reason: in most cases it would not make sense. For example:# + +<programlisting> +data T = forall a. MkT [a] deriving( Eq ) +</programlisting> + +To derive <literal>Eq</literal> in the standard way we would need to have equality +between the single component of two <function>MkT</function> constructors: + +<programlisting> +instance Eq T where + (MkT a) == (MkT b) = ??? +</programlisting> + +But <varname>a</varname> and <varname>b</varname> have distinct types, and so can't be compared. +It's just about possible to imagine examples in which the derived instance +would make sense, but it seems altogether simpler simply to prohibit such +declarations. Define your own instances! +</para> +</listitem> + +</itemizedlist> + +</para> + +</sect4> +</sect3> + +</sect2> + + + +<sect2 id="multi-param-type-classes"> +<title>Class declarations</title> + +<para> +This section, and the next one, documents GHC's type-class extensions. +There's lots of background in the paper <ulink +url="http://research.microsoft.com/~simonpj/Papers/type-class-design-space" >Type +classes: exploring the design space</ulink > (Simon Peyton Jones, Mark +Jones, Erik Meijer). +</para> +<para> +All the extensions are enabled by the <option>-fglasgow-exts</option> flag. +</para> + +<sect3> +<title>Multi-parameter type classes</title> +<para> +Multi-parameter type classes are permitted. For example: + + +<programlisting> + class Collection c a where + union :: c a -> c a -> c a + ...etc. +</programlisting> + +</para> +</sect3> + +<sect3> +<title>The superclasses of a class declaration</title> + +<para> +There are no restrictions on the context in a class declaration +(which introduces superclasses), except that the class hierarchy must +be acyclic. So these class declarations are OK: + + +<programlisting> + class Functor (m k) => FiniteMap m k where + ... + + class (Monad m, Monad (t m)) => Transform t m where + lift :: m a -> (t m) a +</programlisting> + + +</para> +<para> +As in Haskell 98, The class hierarchy must be acyclic. However, the definition +of "acyclic" involves only the superclass relationships. For example, +this is OK: + + +<programlisting> + class C a where { + op :: D b => a -> b -> b + } + + class C a => D a where { ... } +</programlisting> + + +Here, <literal>C</literal> is a superclass of <literal>D</literal>, but it's OK for a +class operation <literal>op</literal> of <literal>C</literal> to mention <literal>D</literal>. (It +would not be OK for <literal>D</literal> to be a superclass of <literal>C</literal>.) +</para> +</sect3> + + + + +<sect3 id="class-method-types"> +<title>Class method types</title> + +<para> +Haskell 98 prohibits class method types to mention constraints on the +class type variable, thus: +<programlisting> + class Seq s a where + fromList :: [a] -> s a + elem :: Eq a => a -> s a -> Bool +</programlisting> +The type of <literal>elem</literal> is illegal in Haskell 98, because it +contains the constraint <literal>Eq a</literal>, constrains only the +class type variable (in this case <literal>a</literal>). +GHC lifts this restriction. +</para> + + +</sect3> +</sect2> + +<sect2 id="functional-dependencies"> +<title>Functional dependencies +</title> + +<para> Functional dependencies are implemented as described by Mark Jones +in “<ulink url="http://www.cse.ogi.edu/~mpj/pubs/fundeps.html">Type Classes with Functional Dependencies</ulink>”, Mark P. Jones, +In Proceedings of the 9th European Symposium on Programming, +ESOP 2000, Berlin, Germany, March 2000, Springer-Verlag LNCS 1782, +. +</para> +<para> +Functional dependencies are introduced by a vertical bar in the syntax of a +class declaration; e.g. +<programlisting> + class (Monad m) => MonadState s m | m -> s where ... + + class Foo a b c | a b -> c where ... +</programlisting> +There should be more documentation, but there isn't (yet). Yell if you need it. +</para> + +<sect3><title>Rules for functional dependencies </title> +<para> +In a class declaration, all of the class type variables must be reachable (in the sense +mentioned in <xref linkend="type-restrictions"/>) +from the free variables of each method type. +For example: + +<programlisting> + class Coll s a where + empty :: s + insert :: s -> a -> s +</programlisting> + +is not OK, because the type of <literal>empty</literal> doesn't mention +<literal>a</literal>. Functional dependencies can make the type variable +reachable: +<programlisting> + class Coll s a | s -> a where + empty :: s + insert :: s -> a -> s +</programlisting> + +Alternatively <literal>Coll</literal> might be rewritten + +<programlisting> + class Coll s a where + empty :: s a + insert :: s a -> a -> s a +</programlisting> + + +which makes the connection between the type of a collection of +<literal>a</literal>'s (namely <literal>(s a)</literal>) and the element type <literal>a</literal>. +Occasionally this really doesn't work, in which case you can split the +class like this: + + +<programlisting> + class CollE s where + empty :: s + + class CollE s => Coll s a where + insert :: s -> a -> s +</programlisting> +</para> +</sect3> + + +<sect3> +<title>Background on functional dependencies</title> + +<para>The following description of the motivation and use of functional dependencies is taken +from the Hugs user manual, reproduced here (with minor changes) by kind +permission of Mark Jones. +</para> +<para> +Consider the following class, intended as part of a +library for collection types: +<programlisting> + class Collects e ce where + empty :: ce + insert :: e -> ce -> ce + member :: e -> ce -> Bool +</programlisting> +The type variable e used here represents the element type, while ce is the type +of the container itself. Within this framework, we might want to define +instances of this class for lists or characteristic functions (both of which +can be used to represent collections of any equality type), bit sets (which can +be used to represent collections of characters), or hash tables (which can be +used to represent any collection whose elements have a hash function). Omitting +standard implementation details, this would lead to the following declarations: +<programlisting> + instance Eq e => Collects e [e] where ... + instance Eq e => Collects e (e -> Bool) where ... + instance Collects Char BitSet where ... + instance (Hashable e, Collects a ce) + => Collects e (Array Int ce) where ... +</programlisting> +All this looks quite promising; we have a class and a range of interesting +implementations. Unfortunately, there are some serious problems with the class +declaration. First, the empty function has an ambiguous type: +<programlisting> + empty :: Collects e ce => ce +</programlisting> +By "ambiguous" we mean that there is a type variable e that appears on the left +of the <literal>=></literal> symbol, but not on the right. The problem with +this is that, according to the theoretical foundations of Haskell overloading, +we cannot guarantee a well-defined semantics for any term with an ambiguous +type. +</para> +<para> +We can sidestep this specific problem by removing the empty member from the +class declaration. However, although the remaining members, insert and member, +do not have ambiguous types, we still run into problems when we try to use +them. For example, consider the following two functions: +<programlisting> + f x y = insert x . insert y + g = f True 'a' +</programlisting> +for which GHC infers the following types: +<programlisting> + f :: (Collects a c, Collects b c) => a -> b -> c -> c + g :: (Collects Bool c, Collects Char c) => c -> c +</programlisting> +Notice that the type for f allows the two parameters x and y to be assigned +different types, even though it attempts to insert each of the two values, one +after the other, into the same collection. If we're trying to model collections +that contain only one type of value, then this is clearly an inaccurate +type. Worse still, the definition for g is accepted, without causing a type +error. As a result, the error in this code will not be flagged at the point +where it appears. Instead, it will show up only when we try to use g, which +might even be in a different module. +</para> + +<sect4><title>An attempt to use constructor classes</title> + +<para> +Faced with the problems described above, some Haskell programmers might be +tempted to use something like the following version of the class declaration: +<programlisting> + class Collects e c where + empty :: c e + insert :: e -> c e -> c e + member :: e -> c e -> Bool +</programlisting> +The key difference here is that we abstract over the type constructor c that is +used to form the collection type c e, and not over that collection type itself, +represented by ce in the original class declaration. This avoids the immediate +problems that we mentioned above: empty has type <literal>Collects e c => c +e</literal>, which is not ambiguous. +</para> +<para> +The function f from the previous section has a more accurate type: +<programlisting> + f :: (Collects e c) => e -> e -> c e -> c e +</programlisting> +The function g from the previous section is now rejected with a type error as +we would hope because the type of f does not allow the two arguments to have +different types. +This, then, is an example of a multiple parameter class that does actually work +quite well in practice, without ambiguity problems. +There is, however, a catch. This version of the Collects class is nowhere near +as general as the original class seemed to be: only one of the four instances +for <literal>Collects</literal> +given above can be used with this version of Collects because only one of +them---the instance for lists---has a collection type that can be written in +the form c e, for some type constructor c, and element type e. +</para> +</sect4> + +<sect4><title>Adding functional dependencies</title> + +<para> +To get a more useful version of the Collects class, Hugs provides a mechanism +that allows programmers to specify dependencies between the parameters of a +multiple parameter class (For readers with an interest in theoretical +foundations and previous work: The use of dependency information can be seen +both as a generalization of the proposal for `parametric type classes' that was +put forward by Chen, Hudak, and Odersky, or as a special case of Mark Jones's +later framework for "improvement" of qualified types. The +underlying ideas are also discussed in a more theoretical and abstract setting +in a manuscript [implparam], where they are identified as one point in a +general design space for systems of implicit parameterization.). + +To start with an abstract example, consider a declaration such as: +<programlisting> + class C a b where ... +</programlisting> +which tells us simply that C can be thought of as a binary relation on types +(or type constructors, depending on the kinds of a and b). Extra clauses can be +included in the definition of classes to add information about dependencies +between parameters, as in the following examples: +<programlisting> + class D a b | a -> b where ... + class E a b | a -> b, b -> a where ... +</programlisting> +The notation <literal>a -> b</literal> used here between the | and where +symbols --- not to be +confused with a function type --- indicates that the a parameter uniquely +determines the b parameter, and might be read as "a determines b." Thus D is +not just a relation, but actually a (partial) function. Similarly, from the two +dependencies that are included in the definition of E, we can see that E +represents a (partial) one-one mapping between types. +</para> +<para> +More generally, dependencies take the form <literal>x1 ... xn -> y1 ... ym</literal>, +where x1, ..., xn, and y1, ..., yn are type variables with n>0 and +m>=0, meaning that the y parameters are uniquely determined by the x +parameters. Spaces can be used as separators if more than one variable appears +on any single side of a dependency, as in <literal>t -> a b</literal>. Note that a class may be +annotated with multiple dependencies using commas as separators, as in the +definition of E above. Some dependencies that we can write in this notation are +redundant, and will be rejected because they don't serve any useful +purpose, and may instead indicate an error in the program. Examples of +dependencies like this include <literal>a -> a </literal>, +<literal>a -> a a </literal>, +<literal>a -> </literal>, etc. There can also be +some redundancy if multiple dependencies are given, as in +<literal>a->b</literal>, + <literal>b->c </literal>, <literal>a->c </literal>, and +in which some subset implies the remaining dependencies. Examples like this are +not treated as errors. Note that dependencies appear only in class +declarations, and not in any other part of the language. In particular, the +syntax for instance declarations, class constraints, and types is completely +unchanged. +</para> +<para> +By including dependencies in a class declaration, we provide a mechanism for +the programmer to specify each multiple parameter class more precisely. The +compiler, on the other hand, is responsible for ensuring that the set of +instances that are in scope at any given point in the program is consistent +with any declared dependencies. For example, the following pair of instance +declarations cannot appear together in the same scope because they violate the +dependency for D, even though either one on its own would be acceptable: +<programlisting> + instance D Bool Int where ... + instance D Bool Char where ... +</programlisting> +Note also that the following declaration is not allowed, even by itself: +<programlisting> + instance D [a] b where ... +</programlisting> +The problem here is that this instance would allow one particular choice of [a] +to be associated with more than one choice for b, which contradicts the +dependency specified in the definition of D. More generally, this means that, +in any instance of the form: +<programlisting> + instance D t s where ... +</programlisting> +for some particular types t and s, the only variables that can appear in s are +the ones that appear in t, and hence, if the type t is known, then s will be +uniquely determined. +</para> +<para> +The benefit of including dependency information is that it allows us to define +more general multiple parameter classes, without ambiguity problems, and with +the benefit of more accurate types. To illustrate this, we return to the +collection class example, and annotate the original definition of <literal>Collects</literal> +with a simple dependency: +<programlisting> + class Collects e ce | ce -> e where + empty :: ce + insert :: e -> ce -> ce + member :: e -> ce -> Bool +</programlisting> +The dependency <literal>ce -> e</literal> here specifies that the type e of elements is uniquely +determined by the type of the collection ce. Note that both parameters of +Collects are of kind *; there are no constructor classes here. Note too that +all of the instances of Collects that we gave earlier can be used +together with this new definition. +</para> +<para> +What about the ambiguity problems that we encountered with the original +definition? The empty function still has type Collects e ce => ce, but it is no +longer necessary to regard that as an ambiguous type: Although the variable e +does not appear on the right of the => symbol, the dependency for class +Collects tells us that it is uniquely determined by ce, which does appear on +the right of the => symbol. Hence the context in which empty is used can still +give enough information to determine types for both ce and e, without +ambiguity. More generally, we need only regard a type as ambiguous if it +contains a variable on the left of the => that is not uniquely determined +(either directly or indirectly) by the variables on the right. +</para> +<para> +Dependencies also help to produce more accurate types for user defined +functions, and hence to provide earlier detection of errors, and less cluttered +types for programmers to work with. Recall the previous definition for a +function f: +<programlisting> + f x y = insert x y = insert x . insert y +</programlisting> +for which we originally obtained a type: +<programlisting> + f :: (Collects a c, Collects b c) => a -> b -> c -> c +</programlisting> +Given the dependency information that we have for Collects, however, we can +deduce that a and b must be equal because they both appear as the second +parameter in a Collects constraint with the same first parameter c. Hence we +can infer a shorter and more accurate type for f: +<programlisting> + f :: (Collects a c) => a -> a -> c -> c +</programlisting> +In a similar way, the earlier definition of g will now be flagged as a type error. +</para> +<para> +Although we have given only a few examples here, it should be clear that the +addition of dependency information can help to make multiple parameter classes +more useful in practice, avoiding ambiguity problems, and allowing more general +sets of instance declarations. +</para> +</sect4> +</sect3> +</sect2> + +<sect2 id="instance-decls"> +<title>Instance declarations</title> + +<sect3 id="instance-rules"> +<title>Relaxed rules for instance declarations</title> + +<para>An instance declaration has the form +<screen> + instance ( <replaceable>assertion</replaceable><subscript>1</subscript>, ..., <replaceable>assertion</replaceable><subscript>n</subscript>) => <replaceable>class</replaceable> <replaceable>type</replaceable><subscript>1</subscript> ... <replaceable>type</replaceable><subscript>m</subscript> where ... +</screen> +The part before the "<literal>=></literal>" is the +<emphasis>context</emphasis>, while the part after the +"<literal>=></literal>" is the <emphasis>head</emphasis> of the instance declaration. +</para> + +<para> +In Haskell 98 the head of an instance declaration +must be of the form <literal>C (T a1 ... an)</literal>, where +<literal>C</literal> is the class, <literal>T</literal> is a type constructor, +and the <literal>a1 ... an</literal> are distinct type variables. +Furthermore, the assertions in the context of the instance declaration +must be of the form <literal>C a</literal> where <literal>a</literal> +is a type variable that occurs in the head. +</para> +<para> +The <option>-fglasgow-exts</option> flag loosens these restrictions +considerably. Firstly, multi-parameter type classes are permitted. Secondly, +the context and head of the instance declaration can each consist of arbitrary +(well-kinded) assertions <literal>(C t1 ... tn)</literal> subject only to the +following rules: +<orderedlist> +<listitem><para> +For each assertion in the context: +<orderedlist> +<listitem><para>No type variable has more occurrences in the assertion than in the head</para></listitem> +<listitem><para>The assertion has fewer constructors and variables (taken together + and counting repetitions) than the head</para></listitem> +</orderedlist> +</para></listitem> + +<listitem><para>The coverage condition. For each functional dependency, +<replaceable>tvs</replaceable><subscript>left</subscript> <literal>-></literal> +<replaceable>tvs</replaceable><subscript>right</subscript>, of the class, +every type variable in +S(<replaceable>tvs</replaceable><subscript>right</subscript>) must appear in +S(<replaceable>tvs</replaceable><subscript>left</subscript>), where S is the +substitution mapping each type variable in the class declaration to the +corresponding type in the instance declaration. +</para></listitem> +</orderedlist> +These restrictions ensure that context reduction terminates: each reduction +step makes the problem smaller by at least one +constructor. For example, the following would make the type checker +loop if it wasn't excluded: +<programlisting> + instance C a => C a where ... +</programlisting> +For example, these are OK: +<programlisting> + instance C Int [a] -- Multiple parameters + instance Eq (S [a]) -- Structured type in head + + -- Repeated type variable in head + instance C4 a a => C4 [a] [a] + instance Stateful (ST s) (MutVar s) + + -- Head can consist of type variables only + instance C a + instance (Eq a, Show b) => C2 a b + + -- Non-type variables in context + instance Show (s a) => Show (Sized s a) + instance C2 Int a => C3 Bool [a] + instance C2 Int a => C3 [a] b +</programlisting> +But these are not: +<programlisting> + -- Context assertion no smaller than head + instance C a => C a where ... + -- (C b b) has more more occurrences of b than the head + instance C b b => Foo [b] where ... +</programlisting> +</para> + +<para> +The same restrictions apply to instances generated by +<literal>deriving</literal> clauses. Thus the following is accepted: +<programlisting> + data MinHeap h a = H a (h a) + deriving (Show) +</programlisting> +because the derived instance +<programlisting> + instance (Show a, Show (h a)) => Show (MinHeap h a) +</programlisting> +conforms to the above rules. +</para> + +<para> +A useful idiom permitted by the above rules is as follows. +If one allows overlapping instance declarations then it's quite +convenient to have a "default instance" declaration that applies if +something more specific does not: +<programlisting> + instance C a where + op = ... -- Default +</programlisting> +</para> +</sect3> + +<sect3 id="undecidable-instances"> +<title>Undecidable instances</title> + +<para> +Sometimes even the rules of <xref linkend="instance-rules"/> are too onerous. +For example, sometimes you might want to use the following to get the +effect of a "class synonym": +<programlisting> + class (C1 a, C2 a, C3 a) => C a where { } + + instance (C1 a, C2 a, C3 a) => C a where { } +</programlisting> +This allows you to write shorter signatures: +<programlisting> + f :: C a => ... +</programlisting> +instead of +<programlisting> + f :: (C1 a, C2 a, C3 a) => ... +</programlisting> +The restrictions on functional dependencies (<xref +linkend="functional-dependencies"/>) are particularly troublesome. +It is tempting to introduce type variables in the context that do not appear in +the head, something that is excluded by the normal rules. For example: +<programlisting> + class HasConverter a b | a -> b where + convert :: a -> b + + data Foo a = MkFoo a + + instance (HasConverter a b,Show b) => Show (Foo a) where + show (MkFoo value) = show (convert value) +</programlisting> +This is dangerous territory, however. Here, for example, is a program that would make the +typechecker loop: +<programlisting> + class D a + class F a b | a->b + instance F [a] [[a]] + instance (D c, F a c) => D [a] -- 'c' is not mentioned in the head +</programlisting> +Similarly, it can be tempting to lift the coverage condition: +<programlisting> + class Mul a b c | a b -> c where + (.*.) :: a -> b -> c + + instance Mul Int Int Int where (.*.) = (*) + instance Mul Int Float Float where x .*. y = fromIntegral x * y + instance Mul a b c => Mul a [b] [c] where x .*. v = map (x.*.) v +</programlisting> +The third instance declaration does not obey the coverage condition; +and indeed the (somewhat strange) definition: +<programlisting> + f = \ b x y -> if b then x .*. [y] else y +</programlisting> +makes instance inference go into a loop, because it requires the constraint +<literal>(Mul a [b] b)</literal>. +</para> +<para> +Nevertheless, GHC allows you to experiment with more liberal rules. If you use +the experimental flag <option>-fallow-undecidable-instances</option> +<indexterm><primary>-fallow-undecidable-instances +option</primary></indexterm>, you can use arbitrary +types in both an instance context and instance head. Termination is ensured by having a +fixed-depth recursion stack. If you exceed the stack depth you get a +sort of backtrace, and the opportunity to increase the stack depth +with <option>-fcontext-stack</option><emphasis>N</emphasis>. +</para> + +</sect3> + + +<sect3 id="instance-overlap"> +<title>Overlapping instances</title> +<para> +In general, <emphasis>GHC requires that that it be unambiguous which instance +declaration +should be used to resolve a type-class constraint</emphasis>. This behaviour +can be modified by two flags: <option>-fallow-overlapping-instances</option> +<indexterm><primary>-fallow-overlapping-instances +</primary></indexterm> +and <option>-fallow-incoherent-instances</option> +<indexterm><primary>-fallow-incoherent-instances +</primary></indexterm>, as this section discusses.</para> +<para> +When GHC tries to resolve, say, the constraint <literal>C Int Bool</literal>, +it tries to match every instance declaration against the +constraint, +by instantiating the head of the instance declaration. For example, consider +these declarations: +<programlisting> + instance context1 => C Int a where ... -- (A) + instance context2 => C a Bool where ... -- (B) + instance context3 => C Int [a] where ... -- (C) + instance context4 => C Int [Int] where ... -- (D) +</programlisting> +The instances (A) and (B) match the constraint <literal>C Int Bool</literal>, +but (C) and (D) do not. When matching, GHC takes +no account of the context of the instance declaration +(<literal>context1</literal> etc). +GHC's default behaviour is that <emphasis>exactly one instance must match the +constraint it is trying to resolve</emphasis>. +It is fine for there to be a <emphasis>potential</emphasis> of overlap (by +including both declarations (A) and (B), say); an error is only reported if a +particular constraint matches more than one. +</para> + +<para> +The <option>-fallow-overlapping-instances</option> flag instructs GHC to allow +more than one instance to match, provided there is a most specific one. For +example, the constraint <literal>C Int [Int]</literal> matches instances (A), +(C) and (D), but the last is more specific, and hence is chosen. If there is no +most-specific match, the program is rejected. +</para> +<para> +However, GHC is conservative about committing to an overlapping instance. For example: +<programlisting> + f :: [b] -> [b] + f x = ... +</programlisting> +Suppose that from the RHS of <literal>f</literal> we get the constraint +<literal>C Int [b]</literal>. But +GHC does not commit to instance (C), because in a particular +call of <literal>f</literal>, <literal>b</literal> might be instantiate +to <literal>Int</literal>, in which case instance (D) would be more specific still. +So GHC rejects the program. If you add the flag <option>-fallow-incoherent-instances</option>, +GHC will instead pick (C), without complaining about +the problem of subsequent instantiations. +</para> +<para> +The willingness to be overlapped or incoherent is a property of +the <emphasis>instance declaration</emphasis> itself, controlled by the +presence or otherwise of the <option>-fallow-overlapping-instances</option> +and <option>-fallow-incoherent-instances</option> flags when that mdodule is +being defined. Neither flag is required in a module that imports and uses the +instance declaration. Specifically, during the lookup process: +<itemizedlist> +<listitem><para> +An instance declaration is ignored during the lookup process if (a) a more specific +match is found, and (b) the instance declaration was compiled with +<option>-fallow-overlapping-instances</option>. The flag setting for the +more-specific instance does not matter. +</para></listitem> +<listitem><para> +Suppose an instance declaration does not matche the constraint being looked up, but +does unify with it, so that it might match when the constraint is further +instantiated. Usually GHC will regard this as a reason for not committing to +some other constraint. But if the instance declaration was compiled with +<option>-fallow-incoherent-instances</option>, GHC will skip the "does-it-unify?" +check for that declaration. +</para></listitem> +</itemizedlist> +All this makes it possible for a library author to design a library that relies on +overlapping instances without the library client having to know. +</para> +<para>The <option>-fallow-incoherent-instances</option> flag implies the +<option>-fallow-overlapping-instances</option> flag, but not vice versa. +</para> +</sect3> + +<sect3> +<title>Type synonyms in the instance head</title> + +<para> +<emphasis>Unlike Haskell 98, instance heads may use type +synonyms</emphasis>. (The instance "head" is the bit after the "=>" in an instance decl.) +As always, using a type synonym is just shorthand for +writing the RHS of the type synonym definition. For example: + + +<programlisting> + type Point = (Int,Int) + instance C Point where ... + instance C [Point] where ... +</programlisting> + + +is legal. However, if you added + + +<programlisting> + instance C (Int,Int) where ... +</programlisting> + + +as well, then the compiler will complain about the overlapping +(actually, identical) instance declarations. As always, type synonyms +must be fully applied. You cannot, for example, write: + + +<programlisting> + type P a = [[a]] + instance Monad P where ... +</programlisting> + + +This design decision is independent of all the others, and easily +reversed, but it makes sense to me. + +</para> +</sect3> + + +</sect2> + +<sect2 id="type-restrictions"> +<title>Type signatures</title> + +<sect3><title>The context of a type signature</title> +<para> +Unlike Haskell 98, constraints in types do <emphasis>not</emphasis> have to be of +the form <emphasis>(class type-variable)</emphasis> or +<emphasis>(class (type-variable type-variable ...))</emphasis>. Thus, +these type signatures are perfectly OK +<programlisting> + g :: Eq [a] => ... + g :: Ord (T a ()) => ... +</programlisting> +</para> +<para> +GHC imposes the following restrictions on the constraints in a type signature. +Consider the type: + +<programlisting> + forall tv1..tvn (c1, ...,cn) => type +</programlisting> + +(Here, we write the "foralls" explicitly, although the Haskell source +language omits them; in Haskell 98, all the free type variables of an +explicit source-language type signature are universally quantified, +except for the class type variables in a class declaration. However, +in GHC, you can give the foralls if you want. See <xref linkend="universal-quantification"/>). +</para> + +<para> + +<orderedlist> +<listitem> + +<para> + <emphasis>Each universally quantified type variable +<literal>tvi</literal> must be reachable from <literal>type</literal></emphasis>. + +A type variable <literal>a</literal> is "reachable" if it it appears +in the same constraint as either a type variable free in in +<literal>type</literal>, or another reachable type variable. +A value with a type that does not obey +this reachability restriction cannot be used without introducing +ambiguity; that is why the type is rejected. +Here, for example, is an illegal type: + + +<programlisting> + forall a. Eq a => Int +</programlisting> + + +When a value with this type was used, the constraint <literal>Eq tv</literal> +would be introduced where <literal>tv</literal> is a fresh type variable, and +(in the dictionary-translation implementation) the value would be +applied to a dictionary for <literal>Eq tv</literal>. The difficulty is that we +can never know which instance of <literal>Eq</literal> to use because we never +get any more information about <literal>tv</literal>. +</para> +<para> +Note +that the reachability condition is weaker than saying that <literal>a</literal> is +functionally dependent on a type variable free in +<literal>type</literal> (see <xref +linkend="functional-dependencies"/>). The reason for this is there +might be a "hidden" dependency, in a superclass perhaps. So +"reachable" is a conservative approximation to "functionally dependent". +For example, consider: +<programlisting> + class C a b | a -> b where ... + class C a b => D a b where ... + f :: forall a b. D a b => a -> a +</programlisting> +This is fine, because in fact <literal>a</literal> does functionally determine <literal>b</literal> +but that is not immediately apparent from <literal>f</literal>'s type. +</para> +</listitem> +<listitem> + +<para> + <emphasis>Every constraint <literal>ci</literal> must mention at least one of the +universally quantified type variables <literal>tvi</literal></emphasis>. + +For example, this type is OK because <literal>C a b</literal> mentions the +universally quantified type variable <literal>b</literal>: + + +<programlisting> + forall a. C a b => burble +</programlisting> + + +The next type is illegal because the constraint <literal>Eq b</literal> does not +mention <literal>a</literal>: + + +<programlisting> + forall a. Eq b => burble +</programlisting> + + +The reason for this restriction is milder than the other one. The +excluded types are never useful or necessary (because the offending +context doesn't need to be witnessed at this point; it can be floated +out). Furthermore, floating them out increases sharing. Lastly, +excluding them is a conservative choice; it leaves a patch of +territory free in case we need it later. + +</para> +</listitem> + +</orderedlist> + +</para> +</sect3> + +<sect3 id="hoist"> +<title>For-all hoisting</title> +<para> +It is often convenient to use generalised type synonyms (see <xref linkend="type-synonyms"/>) at the right hand +end of an arrow, thus: +<programlisting> + type Discard a = forall b. a -> b -> a + + g :: Int -> Discard Int + g x y z = x+y +</programlisting> +Simply expanding the type synonym would give +<programlisting> + g :: Int -> (forall b. Int -> b -> Int) +</programlisting> +but GHC "hoists" the <literal>forall</literal> to give the isomorphic type +<programlisting> + g :: forall b. Int -> Int -> b -> Int +</programlisting> +In general, the rule is this: <emphasis>to determine the type specified by any explicit +user-written type (e.g. in a type signature), GHC expands type synonyms and then repeatedly +performs the transformation:</emphasis> +<programlisting> + <emphasis>type1</emphasis> -> forall a1..an. <emphasis>context2</emphasis> => <emphasis>type2</emphasis> +==> + forall a1..an. <emphasis>context2</emphasis> => <emphasis>type1</emphasis> -> <emphasis>type2</emphasis> +</programlisting> +(In fact, GHC tries to retain as much synonym information as possible for use in +error messages, but that is a usability issue.) This rule applies, of course, whether +or not the <literal>forall</literal> comes from a synonym. For example, here is another +valid way to write <literal>g</literal>'s type signature: +<programlisting> + g :: Int -> Int -> forall b. b -> Int +</programlisting> +</para> +<para> +When doing this hoisting operation, GHC eliminates duplicate constraints. For +example: +<programlisting> + type Foo a = (?x::Int) => Bool -> a + g :: Foo (Foo Int) +</programlisting> +means +<programlisting> + g :: (?x::Int) => Bool -> Bool -> Int +</programlisting> +</para> +</sect3> + + +</sect2> + +<sect2 id="implicit-parameters"> +<title>Implicit parameters</title> + +<para> Implicit parameters are implemented as described in +"Implicit parameters: dynamic scoping with static types", +J Lewis, MB Shields, E Meijer, J Launchbury, +27th ACM Symposium on Principles of Programming Languages (POPL'00), +Boston, Jan 2000. +</para> + +<para>(Most of the following, stil rather incomplete, documentation is +due to Jeff Lewis.)</para> + +<para>Implicit parameter support is enabled with the option +<option>-fimplicit-params</option>.</para> + +<para> +A variable is called <emphasis>dynamically bound</emphasis> when it is bound by the calling +context of a function and <emphasis>statically bound</emphasis> when bound by the callee's +context. In Haskell, all variables are statically bound. Dynamic +binding of variables is a notion that goes back to Lisp, but was later +discarded in more modern incarnations, such as Scheme. Dynamic binding +can be very confusing in an untyped language, and unfortunately, typed +languages, in particular Hindley-Milner typed languages like Haskell, +only support static scoping of variables. +</para> +<para> +However, by a simple extension to the type class system of Haskell, we +can support dynamic binding. Basically, we express the use of a +dynamically bound variable as a constraint on the type. These +constraints lead to types of the form <literal>(?x::t') => t</literal>, which says "this +function uses a dynamically-bound variable <literal>?x</literal> +of type <literal>t'</literal>". For +example, the following expresses the type of a sort function, +implicitly parameterized by a comparison function named <literal>cmp</literal>. +<programlisting> + sort :: (?cmp :: a -> a -> Bool) => [a] -> [a] +</programlisting> +The dynamic binding constraints are just a new form of predicate in the type class system. +</para> +<para> +An implicit parameter occurs in an expression using the special form <literal>?x</literal>, +where <literal>x</literal> is +any valid identifier (e.g. <literal>ord ?x</literal> is a valid expression). +Use of this construct also introduces a new +dynamic-binding constraint in the type of the expression. +For example, the following definition +shows how we can define an implicitly parameterized sort function in +terms of an explicitly parameterized <literal>sortBy</literal> function: +<programlisting> + sortBy :: (a -> a -> Bool) -> [a] -> [a] + + sort :: (?cmp :: a -> a -> Bool) => [a] -> [a] + sort = sortBy ?cmp +</programlisting> +</para> + +<sect3> +<title>Implicit-parameter type constraints</title> +<para> +Dynamic binding constraints behave just like other type class +constraints in that they are automatically propagated. Thus, when a +function is used, its implicit parameters are inherited by the +function that called it. For example, our <literal>sort</literal> function might be used +to pick out the least value in a list: +<programlisting> + least :: (?cmp :: a -> a -> Bool) => [a] -> a + least xs = fst (sort xs) +</programlisting> +Without lifting a finger, the <literal>?cmp</literal> parameter is +propagated to become a parameter of <literal>least</literal> as well. With explicit +parameters, the default is that parameters must always be explicit +propagated. With implicit parameters, the default is to always +propagate them. +</para> +<para> +An implicit-parameter type constraint differs from other type class constraints in the +following way: All uses of a particular implicit parameter must have +the same type. This means that the type of <literal>(?x, ?x)</literal> +is <literal>(?x::a) => (a,a)</literal>, and not +<literal>(?x::a, ?x::b) => (a, b)</literal>, as would be the case for type +class constraints. +</para> + +<para> You can't have an implicit parameter in the context of a class or instance +declaration. For example, both these declarations are illegal: +<programlisting> + class (?x::Int) => C a where ... + instance (?x::a) => Foo [a] where ... +</programlisting> +Reason: exactly which implicit parameter you pick up depends on exactly where +you invoke a function. But the ``invocation'' of instance declarations is done +behind the scenes by the compiler, so it's hard to figure out exactly where it is done. +Easiest thing is to outlaw the offending types.</para> +<para> +Implicit-parameter constraints do not cause ambiguity. For example, consider: +<programlisting> + f :: (?x :: [a]) => Int -> Int + f n = n + length ?x + + g :: (Read a, Show a) => String -> String + g s = show (read s) +</programlisting> +Here, <literal>g</literal> has an ambiguous type, and is rejected, but <literal>f</literal> +is fine. The binding for <literal>?x</literal> at <literal>f</literal>'s call site is +quite unambiguous, and fixes the type <literal>a</literal>. +</para> +</sect3> + +<sect3> +<title>Implicit-parameter bindings</title> + +<para> +An implicit parameter is <emphasis>bound</emphasis> using the standard +<literal>let</literal> or <literal>where</literal> binding forms. +For example, we define the <literal>min</literal> function by binding +<literal>cmp</literal>. +<programlisting> + min :: [a] -> a + min = let ?cmp = (<=) in least +</programlisting> +</para> +<para> +A group of implicit-parameter bindings may occur anywhere a normal group of Haskell +bindings can occur, except at top level. That is, they can occur in a <literal>let</literal> +(including in a list comprehension, or do-notation, or pattern guards), +or a <literal>where</literal> clause. +Note the following points: +<itemizedlist> +<listitem><para> +An implicit-parameter binding group must be a +collection of simple bindings to implicit-style variables (no +function-style bindings, and no type signatures); these bindings are +neither polymorphic or recursive. +</para></listitem> +<listitem><para> +You may not mix implicit-parameter bindings with ordinary bindings in a +single <literal>let</literal> +expression; use two nested <literal>let</literal>s instead. +(In the case of <literal>where</literal> you are stuck, since you can't nest <literal>where</literal> clauses.) +</para></listitem> + +<listitem><para> +You may put multiple implicit-parameter bindings in a +single binding group; but they are <emphasis>not</emphasis> treated +as a mutually recursive group (as ordinary <literal>let</literal> bindings are). +Instead they are treated as a non-recursive group, simultaneously binding all the implicit +parameter. The bindings are not nested, and may be re-ordered without changing +the meaning of the program. +For example, consider: +<programlisting> + f t = let { ?x = t; ?y = ?x+(1::Int) } in ?x + ?y +</programlisting> +The use of <literal>?x</literal> in the binding for <literal>?y</literal> does not "see" +the binding for <literal>?x</literal>, so the type of <literal>f</literal> is +<programlisting> + f :: (?x::Int) => Int -> Int +</programlisting> +</para></listitem> +</itemizedlist> +</para> + +</sect3> + +<sect3><title>Implicit parameters and polymorphic recursion</title> + +<para> +Consider these two definitions: +<programlisting> + len1 :: [a] -> Int + len1 xs = let ?acc = 0 in len_acc1 xs + + len_acc1 [] = ?acc + len_acc1 (x:xs) = let ?acc = ?acc + (1::Int) in len_acc1 xs + + ------------ + + len2 :: [a] -> Int + len2 xs = let ?acc = 0 in len_acc2 xs + + len_acc2 :: (?acc :: Int) => [a] -> Int + len_acc2 [] = ?acc + len_acc2 (x:xs) = let ?acc = ?acc + (1::Int) in len_acc2 xs +</programlisting> +The only difference between the two groups is that in the second group +<literal>len_acc</literal> is given a type signature. +In the former case, <literal>len_acc1</literal> is monomorphic in its own +right-hand side, so the implicit parameter <literal>?acc</literal> is not +passed to the recursive call. In the latter case, because <literal>len_acc2</literal> +has a type signature, the recursive call is made to the +<emphasis>polymoprhic</emphasis> version, which takes <literal>?acc</literal> +as an implicit parameter. So we get the following results in GHCi: +<programlisting> + Prog> len1 "hello" + 0 + Prog> len2 "hello" + 5 +</programlisting> +Adding a type signature dramatically changes the result! This is a rather +counter-intuitive phenomenon, worth watching out for. +</para> +</sect3> + +<sect3><title>Implicit parameters and monomorphism</title> + +<para>GHC applies the dreaded Monomorphism Restriction (section 4.5.5 of the +Haskell Report) to implicit parameters. For example, consider: +<programlisting> + f :: Int -> Int + f v = let ?x = 0 in + let y = ?x + v in + let ?x = 5 in + y +</programlisting> +Since the binding for <literal>y</literal> falls under the Monomorphism +Restriction it is not generalised, so the type of <literal>y</literal> is +simply <literal>Int</literal>, not <literal>(?x::Int) => Int</literal>. +Hence, <literal>(f 9)</literal> returns result <literal>9</literal>. +If you add a type signature for <literal>y</literal>, then <literal>y</literal> +will get type <literal>(?x::Int) => Int</literal>, so the occurrence of +<literal>y</literal> in the body of the <literal>let</literal> will see the +inner binding of <literal>?x</literal>, so <literal>(f 9)</literal> will return +<literal>14</literal>. +</para> +</sect3> +</sect2> + +<sect2 id="linear-implicit-parameters"> +<title>Linear implicit parameters</title> +<para> +Linear implicit parameters are an idea developed by Koen Claessen, +Mark Shields, and Simon PJ. They address the long-standing +problem that monads seem over-kill for certain sorts of problem, notably: +</para> +<itemizedlist> +<listitem> <para> distributing a supply of unique names </para> </listitem> +<listitem> <para> distributing a supply of random numbers </para> </listitem> +<listitem> <para> distributing an oracle (as in QuickCheck) </para> </listitem> +</itemizedlist> + +<para> +Linear implicit parameters are just like ordinary implicit parameters, +except that they are "linear" -- that is, they cannot be copied, and +must be explicitly "split" instead. Linear implicit parameters are +written '<literal>%x</literal>' instead of '<literal>?x</literal>'. +(The '/' in the '%' suggests the split!) +</para> +<para> +For example: +<programlisting> + import GHC.Exts( Splittable ) + + data NameSupply = ... + + splitNS :: NameSupply -> (NameSupply, NameSupply) + newName :: NameSupply -> Name + + instance Splittable NameSupply where + split = splitNS + + + f :: (%ns :: NameSupply) => Env -> Expr -> Expr + f env (Lam x e) = Lam x' (f env e) + where + x' = newName %ns + env' = extend env x x' + ...more equations for f... +</programlisting> +Notice that the implicit parameter %ns is consumed +<itemizedlist> +<listitem> <para> once by the call to <literal>newName</literal> </para> </listitem> +<listitem> <para> once by the recursive call to <literal>f</literal> </para></listitem> +</itemizedlist> +</para> +<para> +So the translation done by the type checker makes +the parameter explicit: +<programlisting> + f :: NameSupply -> Env -> Expr -> Expr + f ns env (Lam x e) = Lam x' (f ns1 env e) + where + (ns1,ns2) = splitNS ns + x' = newName ns2 + env = extend env x x' +</programlisting> +Notice the call to 'split' introduced by the type checker. +How did it know to use 'splitNS'? Because what it really did +was to introduce a call to the overloaded function 'split', +defined by the class <literal>Splittable</literal>: +<programlisting> + class Splittable a where + split :: a -> (a,a) +</programlisting> +The instance for <literal>Splittable NameSupply</literal> tells GHC how to implement +split for name supplies. But we can simply write +<programlisting> + g x = (x, %ns, %ns) +</programlisting> +and GHC will infer +<programlisting> + g :: (Splittable a, %ns :: a) => b -> (b,a,a) +</programlisting> +The <literal>Splittable</literal> class is built into GHC. It's exported by module +<literal>GHC.Exts</literal>. +</para> +<para> +Other points: +<itemizedlist> +<listitem> <para> '<literal>?x</literal>' and '<literal>%x</literal>' +are entirely distinct implicit parameters: you + can use them together and they won't intefere with each other. </para> +</listitem> + +<listitem> <para> You can bind linear implicit parameters in 'with' clauses. </para> </listitem> + +<listitem> <para>You cannot have implicit parameters (whether linear or not) + in the context of a class or instance declaration. </para></listitem> +</itemizedlist> +</para> + +<sect3><title>Warnings</title> + +<para> +The monomorphism restriction is even more important than usual. +Consider the example above: +<programlisting> + f :: (%ns :: NameSupply) => Env -> Expr -> Expr + f env (Lam x e) = Lam x' (f env e) + where + x' = newName %ns + env' = extend env x x' +</programlisting> +If we replaced the two occurrences of x' by (newName %ns), which is +usually a harmless thing to do, we get: +<programlisting> + f :: (%ns :: NameSupply) => Env -> Expr -> Expr + f env (Lam x e) = Lam (newName %ns) (f env e) + where + env' = extend env x (newName %ns) +</programlisting> +But now the name supply is consumed in <emphasis>three</emphasis> places +(the two calls to newName,and the recursive call to f), so +the result is utterly different. Urk! We don't even have +the beta rule. +</para> +<para> +Well, this is an experimental change. With implicit +parameters we have already lost beta reduction anyway, and +(as John Launchbury puts it) we can't sensibly reason about +Haskell programs without knowing their typing. +</para> + +</sect3> + +<sect3><title>Recursive functions</title> +<para>Linear implicit parameters can be particularly tricky when you have a recursive function +Consider +<programlisting> + foo :: %x::T => Int -> [Int] + foo 0 = [] + foo n = %x : foo (n-1) +</programlisting> +where T is some type in class Splittable.</para> +<para> +Do you get a list of all the same T's or all different T's +(assuming that split gives two distinct T's back)? +</para><para> +If you supply the type signature, taking advantage of polymorphic +recursion, you get what you'd probably expect. Here's the +translated term, where the implicit param is made explicit: +<programlisting> + foo x 0 = [] + foo x n = let (x1,x2) = split x + in x1 : foo x2 (n-1) +</programlisting> +But if you don't supply a type signature, GHC uses the Hindley +Milner trick of using a single monomorphic instance of the function +for the recursive calls. That is what makes Hindley Milner type inference +work. So the translation becomes +<programlisting> + foo x = let + foom 0 = [] + foom n = x : foom (n-1) + in + foom +</programlisting> +Result: 'x' is not split, and you get a list of identical T's. So the +semantics of the program depends on whether or not foo has a type signature. +Yikes! +</para><para> +You may say that this is a good reason to dislike linear implicit parameters +and you'd be right. That is why they are an experimental feature. +</para> +</sect3> + +</sect2> + +<sect2 id="sec-kinding"> +<title>Explicitly-kinded quantification</title> + +<para> +Haskell infers the kind of each type variable. Sometimes it is nice to be able +to give the kind explicitly as (machine-checked) documentation, +just as it is nice to give a type signature for a function. On some occasions, +it is essential to do so. For example, in his paper "Restricted Data Types in Haskell" (Haskell Workshop 1999) +John Hughes had to define the data type: +<screen> + data Set cxt a = Set [a] + | Unused (cxt a -> ()) +</screen> +The only use for the <literal>Unused</literal> constructor was to force the correct +kind for the type variable <literal>cxt</literal>. +</para> +<para> +GHC now instead allows you to specify the kind of a type variable directly, wherever +a type variable is explicitly bound. Namely: +<itemizedlist> +<listitem><para><literal>data</literal> declarations: +<screen> + data Set (cxt :: * -> *) a = Set [a] +</screen></para></listitem> +<listitem><para><literal>type</literal> declarations: +<screen> + type T (f :: * -> *) = f Int +</screen></para></listitem> +<listitem><para><literal>class</literal> declarations: +<screen> + class (Eq a) => C (f :: * -> *) a where ... +</screen></para></listitem> +<listitem><para><literal>forall</literal>'s in type signatures: +<screen> + f :: forall (cxt :: * -> *). Set cxt Int +</screen></para></listitem> +</itemizedlist> +</para> + +<para> +The parentheses are required. Some of the spaces are required too, to +separate the lexemes. If you write <literal>(f::*->*)</literal> you +will get a parse error, because "<literal>::*->*</literal>" is a +single lexeme in Haskell. +</para> + +<para> +As part of the same extension, you can put kind annotations in types +as well. Thus: +<screen> + f :: (Int :: *) -> Int + g :: forall a. a -> (a :: *) +</screen> +The syntax is +<screen> + atype ::= '(' ctype '::' kind ') +</screen> +The parentheses are required. +</para> +</sect2> + + +<sect2 id="universal-quantification"> +<title>Arbitrary-rank polymorphism +</title> + +<para> +Haskell type signatures are implicitly quantified. The new keyword <literal>forall</literal> +allows us to say exactly what this means. For example: +</para> +<para> +<programlisting> + g :: b -> b +</programlisting> +means this: +<programlisting> + g :: forall b. (b -> b) +</programlisting> +The two are treated identically. +</para> + +<para> +However, GHC's type system supports <emphasis>arbitrary-rank</emphasis> +explicit universal quantification in +types. +For example, all the following types are legal: +<programlisting> + f1 :: forall a b. a -> b -> a + g1 :: forall a b. (Ord a, Eq b) => a -> b -> a + + f2 :: (forall a. a->a) -> Int -> Int + g2 :: (forall a. Eq a => [a] -> a -> Bool) -> Int -> Int + + f3 :: ((forall a. a->a) -> Int) -> Bool -> Bool +</programlisting> +Here, <literal>f1</literal> and <literal>g1</literal> are rank-1 types, and +can be written in standard Haskell (e.g. <literal>f1 :: a->b->a</literal>). +The <literal>forall</literal> makes explicit the universal quantification that +is implicitly added by Haskell. +</para> +<para> +The functions <literal>f2</literal> and <literal>g2</literal> have rank-2 types; +the <literal>forall</literal> is on the left of a function arrow. As <literal>g2</literal> +shows, the polymorphic type on the left of the function arrow can be overloaded. +</para> +<para> +The function <literal>f3</literal> has a rank-3 type; +it has rank-2 types on the left of a function arrow. +</para> +<para> +GHC allows types of arbitrary rank; you can nest <literal>forall</literal>s +arbitrarily deep in function arrows. (GHC used to be restricted to rank 2, but +that restriction has now been lifted.) +In particular, a forall-type (also called a "type scheme"), +including an operational type class context, is legal: +<itemizedlist> +<listitem> <para> On the left of a function arrow </para> </listitem> +<listitem> <para> On the right of a function arrow (see <xref linkend="hoist"/>) </para> </listitem> +<listitem> <para> As the argument of a constructor, or type of a field, in a data type declaration. For +example, any of the <literal>f1,f2,f3,g1,g2</literal> above would be valid +field type signatures.</para> </listitem> +<listitem> <para> As the type of an implicit parameter </para> </listitem> +<listitem> <para> In a pattern type signature (see <xref linkend="scoped-type-variables"/>) </para> </listitem> +</itemizedlist> +There is one place you cannot put a <literal>forall</literal>: +you cannot instantiate a type variable with a forall-type. So you cannot +make a forall-type the argument of a type constructor. So these types are illegal: +<programlisting> + x1 :: [forall a. a->a] + x2 :: (forall a. a->a, Int) + x3 :: Maybe (forall a. a->a) +</programlisting> +Of course <literal>forall</literal> becomes a keyword; you can't use <literal>forall</literal> as +a type variable any more! +</para> + + +<sect3 id="univ"> +<title>Examples +</title> + +<para> +In a <literal>data</literal> or <literal>newtype</literal> declaration one can quantify +the types of the constructor arguments. Here are several examples: +</para> + +<para> + +<programlisting> +data T a = T1 (forall b. b -> b -> b) a + +data MonadT m = MkMonad { return :: forall a. a -> m a, + bind :: forall a b. m a -> (a -> m b) -> m b + } + +newtype Swizzle = MkSwizzle (Ord a => [a] -> [a]) +</programlisting> + +</para> + +<para> +The constructors have rank-2 types: +</para> + +<para> + +<programlisting> +T1 :: forall a. (forall b. b -> b -> b) -> a -> T a +MkMonad :: forall m. (forall a. a -> m a) + -> (forall a b. m a -> (a -> m b) -> m b) + -> MonadT m +MkSwizzle :: (Ord a => [a] -> [a]) -> Swizzle +</programlisting> + +</para> + +<para> +Notice that you don't need to use a <literal>forall</literal> if there's an +explicit context. For example in the first argument of the +constructor <function>MkSwizzle</function>, an implicit "<literal>forall a.</literal>" is +prefixed to the argument type. The implicit <literal>forall</literal> +quantifies all type variables that are not already in scope, and are +mentioned in the type quantified over. +</para> + +<para> +As for type signatures, implicit quantification happens for non-overloaded +types too. So if you write this: + +<programlisting> + data T a = MkT (Either a b) (b -> b) +</programlisting> + +it's just as if you had written this: + +<programlisting> + data T a = MkT (forall b. Either a b) (forall b. b -> b) +</programlisting> + +That is, since the type variable <literal>b</literal> isn't in scope, it's +implicitly universally quantified. (Arguably, it would be better +to <emphasis>require</emphasis> explicit quantification on constructor arguments +where that is what is wanted. Feedback welcomed.) +</para> + +<para> +You construct values of types <literal>T1, MonadT, Swizzle</literal> by applying +the constructor to suitable values, just as usual. For example, +</para> + +<para> + +<programlisting> + a1 :: T Int + a1 = T1 (\xy->x) 3 + + a2, a3 :: Swizzle + a2 = MkSwizzle sort + a3 = MkSwizzle reverse + + a4 :: MonadT Maybe + a4 = let r x = Just x + b m k = case m of + Just y -> k y + Nothing -> Nothing + in + MkMonad r b + + mkTs :: (forall b. b -> b -> b) -> a -> [T a] + mkTs f x y = [T1 f x, T1 f y] +</programlisting> + +</para> + +<para> +The type of the argument can, as usual, be more general than the type +required, as <literal>(MkSwizzle reverse)</literal> shows. (<function>reverse</function> +does not need the <literal>Ord</literal> constraint.) +</para> + +<para> +When you use pattern matching, the bound variables may now have +polymorphic types. For example: +</para> + +<para> + +<programlisting> + f :: T a -> a -> (a, Char) + f (T1 w k) x = (w k x, w 'c' 'd') + + g :: (Ord a, Ord b) => Swizzle -> [a] -> (a -> b) -> [b] + g (MkSwizzle s) xs f = s (map f (s xs)) + + h :: MonadT m -> [m a] -> m [a] + h m [] = return m [] + h m (x:xs) = bind m x $ \y -> + bind m (h m xs) $ \ys -> + return m (y:ys) +</programlisting> + +</para> + +<para> +In the function <function>h</function> we use the record selectors <literal>return</literal> +and <literal>bind</literal> to extract the polymorphic bind and return functions +from the <literal>MonadT</literal> data structure, rather than using pattern +matching. +</para> +</sect3> + +<sect3> +<title>Type inference</title> + +<para> +In general, type inference for arbitrary-rank types is undecidable. +GHC uses an algorithm proposed by Odersky and Laufer ("Putting type annotations to work", POPL'96) +to get a decidable algorithm by requiring some help from the programmer. +We do not yet have a formal specification of "some help" but the rule is this: +</para> +<para> +<emphasis>For a lambda-bound or case-bound variable, x, either the programmer +provides an explicit polymorphic type for x, or GHC's type inference will assume +that x's type has no foralls in it</emphasis>. +</para> +<para> +What does it mean to "provide" an explicit type for x? You can do that by +giving a type signature for x directly, using a pattern type signature +(<xref linkend="scoped-type-variables"/>), thus: +<programlisting> + \ f :: (forall a. a->a) -> (f True, f 'c') +</programlisting> +Alternatively, you can give a type signature to the enclosing +context, which GHC can "push down" to find the type for the variable: +<programlisting> + (\ f -> (f True, f 'c')) :: (forall a. a->a) -> (Bool,Char) +</programlisting> +Here the type signature on the expression can be pushed inwards +to give a type signature for f. Similarly, and more commonly, +one can give a type signature for the function itself: +<programlisting> + h :: (forall a. a->a) -> (Bool,Char) + h f = (f True, f 'c') +</programlisting> +You don't need to give a type signature if the lambda bound variable +is a constructor argument. Here is an example we saw earlier: +<programlisting> + f :: T a -> a -> (a, Char) + f (T1 w k) x = (w k x, w 'c' 'd') +</programlisting> +Here we do not need to give a type signature to <literal>w</literal>, because +it is an argument of constructor <literal>T1</literal> and that tells GHC all +it needs to know. +</para> + +</sect3> + + +<sect3 id="implicit-quant"> +<title>Implicit quantification</title> + +<para> +GHC performs implicit quantification as follows. <emphasis>At the top level (only) of +user-written types, if and only if there is no explicit <literal>forall</literal>, +GHC finds all the type variables mentioned in the type that are not already +in scope, and universally quantifies them.</emphasis> For example, the following pairs are +equivalent: +<programlisting> + f :: a -> a + f :: forall a. a -> a + + g (x::a) = let + h :: a -> b -> b + h x y = y + in ... + g (x::a) = let + h :: forall b. a -> b -> b + h x y = y + in ... +</programlisting> +</para> +<para> +Notice that GHC does <emphasis>not</emphasis> find the innermost possible quantification +point. For example: +<programlisting> + f :: (a -> a) -> Int + -- MEANS + f :: forall a. (a -> a) -> Int + -- NOT + f :: (forall a. a -> a) -> Int + + + g :: (Ord a => a -> a) -> Int + -- MEANS the illegal type + g :: forall a. (Ord a => a -> a) -> Int + -- NOT + g :: (forall a. Ord a => a -> a) -> Int +</programlisting> +The latter produces an illegal type, which you might think is silly, +but at least the rule is simple. If you want the latter type, you +can write your for-alls explicitly. Indeed, doing so is strongly advised +for rank-2 types. +</para> +</sect3> +</sect2> + + + + +<sect2 id="scoped-type-variables"> +<title>Scoped type variables +</title> + +<para> +A <emphasis>lexically scoped type variable</emphasis> can be bound by: +<itemizedlist> +<listitem><para>A declaration type signature (<xref linkend="decl-type-sigs"/>)</para></listitem> +<listitem><para>A pattern type signature (<xref linkend="pattern-type-sigs"/>)</para></listitem> +<listitem><para>A result type signature (<xref linkend="result-type-sigs"/>)</para></listitem> +</itemizedlist> +For example: +<programlisting> +f (xs::[a]) = ys ++ ys + where + ys :: [a] + ys = reverse xs +</programlisting> +The pattern <literal>(xs::[a])</literal> includes a type signature for <varname>xs</varname>. +This brings the type variable <literal>a</literal> into scope; it scopes over +all the patterns and right hand sides for this equation for <function>f</function>. +In particular, it is in scope at the type signature for <varname>y</varname>. +</para> + +<para> +At ordinary type signatures, such as that for <varname>ys</varname>, any type variables +mentioned in the type signature <emphasis>that are not in scope</emphasis> are +implicitly universally quantified. (If there are no type variables in +scope, all type variables mentioned in the signature are universally +quantified, which is just as in Haskell 98.) In this case, since <varname>a</varname> +is in scope, it is not universally quantified, so the type of <varname>ys</varname> is +the same as that of <varname>xs</varname>. In Haskell 98 it is not possible to declare +a type for <varname>ys</varname>; a major benefit of scoped type variables is that +it becomes possible to do so. +</para> + +<para> +Scoped type variables are implemented in both GHC and Hugs. Where the +implementations differ from the specification below, those differences +are noted. +</para> + +<para> +So much for the basic idea. Here are the details. +</para> + +<sect3> +<title>What a scoped type variable means</title> +<para> +A lexically-scoped type variable is simply +the name for a type. The restriction it expresses is that all occurrences +of the same name mean the same type. For example: +<programlisting> + f :: [Int] -> Int -> Int + f (xs::[a]) (y::a) = (head xs + y) :: a +</programlisting> +The pattern type signatures on the left hand side of +<literal>f</literal> express the fact that <literal>xs</literal> +must be a list of things of some type <literal>a</literal>; and that <literal>y</literal> +must have this same type. The type signature on the expression <literal>(head xs)</literal> +specifies that this expression must have the same type <literal>a</literal>. +<emphasis>There is no requirement that the type named by "<literal>a</literal>" is +in fact a type variable</emphasis>. Indeed, in this case, the type named by "<literal>a</literal>" is +<literal>Int</literal>. (This is a slight liberalisation from the original rather complex +rules, which specified that a pattern-bound type variable should be universally quantified.) +For example, all of these are legal:</para> + +<programlisting> + t (x::a) (y::a) = x+y*2 + + f (x::a) (y::b) = [x,y] -- a unifies with b + + g (x::a) = x + 1::Int -- a unifies with Int + + h x = let k (y::a) = [x,y] -- a is free in the + in k x -- environment + + k (x::a) True = ... -- a unifies with Int + k (x::Int) False = ... + + w :: [b] -> [b] + w (x::a) = x -- a unifies with [b] +</programlisting> + +</sect3> + +<sect3> +<title>Scope and implicit quantification</title> + +<para> + +<itemizedlist> +<listitem> + +<para> +All the type variables mentioned in a pattern, +that are not already in scope, +are brought into scope by the pattern. We describe this set as +the <emphasis>type variables bound by the pattern</emphasis>. +For example: +<programlisting> + f (x::a) = let g (y::(a,b)) = fst y + in + g (x,True) +</programlisting> +The pattern <literal>(x::a)</literal> brings the type variable +<literal>a</literal> into scope, as well as the term +variable <literal>x</literal>. The pattern <literal>(y::(a,b))</literal> +contains an occurrence of the already-in-scope type variable <literal>a</literal>, +and brings into scope the type variable <literal>b</literal>. +</para> +</listitem> + +<listitem> +<para> +The type variable(s) bound by the pattern have the same scope +as the term variable(s) bound by the pattern. For example: +<programlisting> + let + f (x::a) = <...rhs of f...> + (p::b, q::b) = (1,2) + in <...body of let...> +</programlisting> +Here, the type variable <literal>a</literal> scopes over the right hand side of <literal>f</literal>, +just like <literal>x</literal> does; while the type variable <literal>b</literal> scopes over the +body of the <literal>let</literal>, and all the other definitions in the <literal>let</literal>, +just like <literal>p</literal> and <literal>q</literal> do. +Indeed, the newly bound type variables also scope over any ordinary, separate +type signatures in the <literal>let</literal> group. +</para> +</listitem> + + +<listitem> +<para> +The type variables bound by the pattern may be +mentioned in ordinary type signatures or pattern +type signatures anywhere within their scope. + +</para> +</listitem> + +<listitem> +<para> + In ordinary type signatures, any type variable mentioned in the +signature that is in scope is <emphasis>not</emphasis> universally quantified. + +</para> +</listitem> + +<listitem> + +<para> + Ordinary type signatures do not bring any new type variables +into scope (except in the type signature itself!). So this is illegal: + +<programlisting> + f :: a -> a + f x = x::a +</programlisting> + +It's illegal because <varname>a</varname> is not in scope in the body of <function>f</function>, +so the ordinary signature <literal>x::a</literal> is equivalent to <literal>x::forall a.a</literal>; +and that is an incorrect typing. + +</para> +</listitem> + +<listitem> +<para> +The pattern type signature is a monotype: +</para> + +<itemizedlist> +<listitem> <para> +A pattern type signature cannot contain any explicit <literal>forall</literal> quantification. +</para> </listitem> + +<listitem> <para> +The type variables bound by a pattern type signature can only be instantiated to monotypes, +not to type schemes. +</para> </listitem> + +<listitem> <para> +There is no implicit universal quantification on pattern type signatures (in contrast to +ordinary type signatures). +</para> </listitem> + +</itemizedlist> + +</listitem> + +<listitem> +<para> + +The type variables in the head of a <literal>class</literal> or <literal>instance</literal> declaration +scope over the methods defined in the <literal>where</literal> part. For example: + + +<programlisting> + class C a where + op :: [a] -> a + + op xs = let ys::[a] + ys = reverse xs + in + head ys +</programlisting> + + +(Not implemented in Hugs yet, Dec 98). +</para> +</listitem> + +</itemizedlist> + +</para> + +</sect3> + +<sect3 id="decl-type-sigs"> +<title>Declaration type signatures</title> +<para>A declaration type signature that has <emphasis>explicit</emphasis> +quantification (using <literal>forall</literal>) brings into scope the +explicitly-quantified +type variables, in the definition of the named function(s). For example: +<programlisting> + f :: forall a. [a] -> [a] + f (x:xs) = xs ++ [ x :: a ] +</programlisting> +The "<literal>forall a</literal>" brings "<literal>a</literal>" into scope in +the definition of "<literal>f</literal>". +</para> +<para>This only happens if the quantification in <literal>f</literal>'s type +signature is explicit. For example: +<programlisting> + g :: [a] -> [a] + g (x:xs) = xs ++ [ x :: a ] +</programlisting> +This program will be rejected, because "<literal>a</literal>" does not scope +over the definition of "<literal>f</literal>", so "<literal>x::a</literal>" +means "<literal>x::forall a. a</literal>" by Haskell's usual implicit +quantification rules. +</para> +</sect3> + +<sect3 id="pattern-type-sigs"> +<title>Where a pattern type signature can occur</title> + +<para> +A pattern type signature can occur in any pattern. For example: +<itemizedlist> + +<listitem> +<para> +A pattern type signature can be on an arbitrary sub-pattern, not +just on a variable: + + +<programlisting> + f ((x,y)::(a,b)) = (y,x) :: (b,a) +</programlisting> + + +</para> +</listitem> +<listitem> + +<para> + Pattern type signatures, including the result part, can be used +in lambda abstractions: + +<programlisting> + (\ (x::a, y) :: a -> x) +</programlisting> +</para> +</listitem> +<listitem> + +<para> + Pattern type signatures, including the result part, can be used +in <literal>case</literal> expressions: + +<programlisting> + case e of { ((x::a, y) :: (a,b)) -> x } +</programlisting> + +Note that the <literal>-></literal> symbol in a case alternative +leads to difficulties when parsing a type signature in the pattern: in +the absence of the extra parentheses in the example above, the parser +would try to interpret the <literal>-></literal> as a function +arrow and give a parse error later. + +</para> + +</listitem> + +<listitem> +<para> +To avoid ambiguity, the type after the “<literal>::</literal>” in a result +pattern signature on a lambda or <literal>case</literal> must be atomic (i.e. a single +token or a parenthesised type of some sort). To see why, +consider how one would parse this: + + +<programlisting> + \ x :: a -> b -> x +</programlisting> + + +</para> +</listitem> + +<listitem> + +<para> + Pattern type signatures can bind existential type variables. +For example: + + +<programlisting> + data T = forall a. MkT [a] + + f :: T -> T + f (MkT [t::a]) = MkT t3 + where + t3::[a] = [t,t,t] +</programlisting> + + +</para> +</listitem> + + +<listitem> + +<para> +Pattern type signatures +can be used in pattern bindings: + +<programlisting> + f x = let (y, z::a) = x in ... + f1 x = let (y, z::Int) = x in ... + f2 (x::(Int,a)) = let (y, z::a) = x in ... + f3 :: (b->b) = \x -> x +</programlisting> + +In all such cases, the binding is not generalised over the pattern-bound +type variables. Thus <literal>f3</literal> is monomorphic; <literal>f3</literal> +has type <literal>b -> b</literal> for some type <literal>b</literal>, +and <emphasis>not</emphasis> <literal>forall b. b -> b</literal>. +In contrast, the binding +<programlisting> + f4 :: b->b + f4 = \x -> x +</programlisting> +makes a polymorphic function, but <literal>b</literal> is not in scope anywhere +in <literal>f4</literal>'s scope. + +</para> +</listitem> +</itemizedlist> +</para> +<para>Pattern type signatures are completely orthogonal to ordinary, separate +type signatures. The two can be used independently or together.</para> + +</sect3> + +<sect3 id="result-type-sigs"> +<title>Result type signatures</title> + +<para> +The result type of a function can be given a signature, thus: + + +<programlisting> + f (x::a) :: [a] = [x,x,x] +</programlisting> + + +The final <literal>:: [a]</literal> after all the patterns gives a signature to the +result type. Sometimes this is the only way of naming the type variable +you want: + + +<programlisting> + f :: Int -> [a] -> [a] + f n :: ([a] -> [a]) = let g (x::a, y::a) = (y,x) + in \xs -> map g (reverse xs `zip` xs) +</programlisting> + +</para> +<para> +The type variables bound in a result type signature scope over the right hand side +of the definition. However, consider this corner-case: +<programlisting> + rev1 :: [a] -> [a] = \xs -> reverse xs + + foo ys = rev (ys::[a]) +</programlisting> +The signature on <literal>rev1</literal> is considered a pattern type signature, not a result +type signature, and the type variables it binds have the same scope as <literal>rev1</literal> +itself (i.e. the right-hand side of <literal>rev1</literal> and the rest of the module too). +In particular, the expression <literal>(ys::[a])</literal> is OK, because the type variable <literal>a</literal> +is in scope (otherwise it would mean <literal>(ys::forall a.[a])</literal>, which would be rejected). +</para> +<para> +As mentioned above, <literal>rev1</literal> is made monomorphic by this scoping rule. +For example, the following program would be rejected, because it claims that <literal>rev1</literal> +is polymorphic: +<programlisting> + rev1 :: [b] -> [b] + rev1 :: [a] -> [a] = \xs -> reverse xs +</programlisting> +</para> + +<para> +Result type signatures are not yet implemented in Hugs. +</para> + +</sect3> + +</sect2> + +<sect2 id="deriving-typeable"> +<title>Deriving clause for classes <literal>Typeable</literal> and <literal>Data</literal></title> + +<para> +Haskell 98 allows the programmer to add "<literal>deriving( Eq, Ord )</literal>" to a data type +declaration, to generate a standard instance declaration for classes specified in the <literal>deriving</literal> clause. +In Haskell 98, the only classes that may appear in the <literal>deriving</literal> clause are the standard +classes <literal>Eq</literal>, <literal>Ord</literal>, +<literal>Enum</literal>, <literal>Ix</literal>, <literal>Bounded</literal>, <literal>Read</literal>, and <literal>Show</literal>. +</para> +<para> +GHC extends this list with two more classes that may be automatically derived +(provided the <option>-fglasgow-exts</option> flag is specified): +<literal>Typeable</literal>, and <literal>Data</literal>. These classes are defined in the library +modules <literal>Data.Typeable</literal> and <literal>Data.Generics</literal> respectively, and the +appropriate class must be in scope before it can be mentioned in the <literal>deriving</literal> clause. +</para> +<para>An instance of <literal>Typeable</literal> can only be derived if the +data type has seven or fewer type parameters, all of kind <literal>*</literal>. +The reason for this is that the <literal>Typeable</literal> class is derived using the scheme +described in +<ulink url="http://research.microsoft.com/%7Esimonpj/papers/hmap/gmap2.ps"> +Scrap More Boilerplate: Reflection, Zips, and Generalised Casts +</ulink>. +(Section 7.4 of the paper describes the multiple <literal>Typeable</literal> classes that +are used, and only <literal>Typeable1</literal> up to +<literal>Typeable7</literal> are provided in the library.) +In other cases, there is nothing to stop the programmer writing a <literal>TypableX</literal> +class, whose kind suits that of the data type constructor, and +then writing the data type instance by hand. +</para> +</sect2> + +<sect2 id="newtype-deriving"> +<title>Generalised derived instances for newtypes</title> + +<para> +When you define an abstract type using <literal>newtype</literal>, you may want +the new type to inherit some instances from its representation. In +Haskell 98, you can inherit instances of <literal>Eq</literal>, <literal>Ord</literal>, +<literal>Enum</literal> and <literal>Bounded</literal> by deriving them, but for any +other classes you have to write an explicit instance declaration. For +example, if you define + +<programlisting> + newtype Dollars = Dollars Int +</programlisting> + +and you want to use arithmetic on <literal>Dollars</literal>, you have to +explicitly define an instance of <literal>Num</literal>: + +<programlisting> + instance Num Dollars where + Dollars a + Dollars b = Dollars (a+b) + ... +</programlisting> +All the instance does is apply and remove the <literal>newtype</literal> +constructor. It is particularly galling that, since the constructor +doesn't appear at run-time, this instance declaration defines a +dictionary which is <emphasis>wholly equivalent</emphasis> to the <literal>Int</literal> +dictionary, only slower! +</para> + + +<sect3> <title> Generalising the deriving clause </title> +<para> +GHC now permits such instances to be derived instead, so one can write +<programlisting> + newtype Dollars = Dollars Int deriving (Eq,Show,Num) +</programlisting> + +and the implementation uses the <emphasis>same</emphasis> <literal>Num</literal> dictionary +for <literal>Dollars</literal> as for <literal>Int</literal>. Notionally, the compiler +derives an instance declaration of the form + +<programlisting> + instance Num Int => Num Dollars +</programlisting> + +which just adds or removes the <literal>newtype</literal> constructor according to the type. +</para> +<para> + +We can also derive instances of constructor classes in a similar +way. For example, suppose we have implemented state and failure monad +transformers, such that + +<programlisting> + instance Monad m => Monad (State s m) + instance Monad m => Monad (Failure m) +</programlisting> +In Haskell 98, we can define a parsing monad by +<programlisting> + type Parser tok m a = State [tok] (Failure m) a +</programlisting> + +which is automatically a monad thanks to the instance declarations +above. With the extension, we can make the parser type abstract, +without needing to write an instance of class <literal>Monad</literal>, via + +<programlisting> + newtype Parser tok m a = Parser (State [tok] (Failure m) a) + deriving Monad +</programlisting> +In this case the derived instance declaration is of the form +<programlisting> + instance Monad (State [tok] (Failure m)) => Monad (Parser tok m) +</programlisting> + +Notice that, since <literal>Monad</literal> is a constructor class, the +instance is a <emphasis>partial application</emphasis> of the new type, not the +entire left hand side. We can imagine that the type declaration is +``eta-converted'' to generate the context of the instance +declaration. +</para> +<para> + +We can even derive instances of multi-parameter classes, provided the +newtype is the last class parameter. In this case, a ``partial +application'' of the class appears in the <literal>deriving</literal> +clause. For example, given the class + +<programlisting> + class StateMonad s m | m -> s where ... + instance Monad m => StateMonad s (State s m) where ... +</programlisting> +then we can derive an instance of <literal>StateMonad</literal> for <literal>Parser</literal>s by +<programlisting> + newtype Parser tok m a = Parser (State [tok] (Failure m) a) + deriving (Monad, StateMonad [tok]) +</programlisting> + +The derived instance is obtained by completing the application of the +class to the new type: + +<programlisting> + instance StateMonad [tok] (State [tok] (Failure m)) => + StateMonad [tok] (Parser tok m) +</programlisting> +</para> +<para> + +As a result of this extension, all derived instances in newtype + declarations are treated uniformly (and implemented just by reusing +the dictionary for the representation type), <emphasis>except</emphasis> +<literal>Show</literal> and <literal>Read</literal>, which really behave differently for +the newtype and its representation. +</para> +</sect3> + +<sect3> <title> A more precise specification </title> +<para> +Derived instance declarations are constructed as follows. Consider the +declaration (after expansion of any type synonyms) + +<programlisting> + newtype T v1...vn = T' (S t1...tk vk+1...vn) deriving (c1...cm) +</programlisting> + +where + <itemizedlist> +<listitem><para> + <literal>S</literal> is a type constructor, +</para></listitem> +<listitem><para> + The <literal>t1...tk</literal> are types, +</para></listitem> +<listitem><para> + The <literal>vk+1...vn</literal> are type variables which do not occur in any of + the <literal>ti</literal>, and +</para></listitem> +<listitem><para> + The <literal>ci</literal> are partial applications of + classes of the form <literal>C t1'...tj'</literal>, where the arity of <literal>C</literal> + is exactly <literal>j+1</literal>. That is, <literal>C</literal> lacks exactly one type argument. +</para></listitem> +<listitem><para> + None of the <literal>ci</literal> is <literal>Read</literal>, <literal>Show</literal>, + <literal>Typeable</literal>, or <literal>Data</literal>. These classes + should not "look through" the type or its constructor. You can still + derive these classes for a newtype, but it happens in the usual way, not + via this new mechanism. +</para></listitem> +</itemizedlist> +Then, for each <literal>ci</literal>, the derived instance +declaration is: +<programlisting> + instance ci (S t1...tk vk+1...v) => ci (T v1...vp) +</programlisting> +where <literal>p</literal> is chosen so that <literal>T v1...vp</literal> is of the +right <emphasis>kind</emphasis> for the last parameter of class <literal>Ci</literal>. +</para> +<para> + +As an example which does <emphasis>not</emphasis> work, consider +<programlisting> + newtype NonMonad m s = NonMonad (State s m s) deriving Monad +</programlisting> +Here we cannot derive the instance +<programlisting> + instance Monad (State s m) => Monad (NonMonad m) +</programlisting> + +because the type variable <literal>s</literal> occurs in <literal>State s m</literal>, +and so cannot be "eta-converted" away. It is a good thing that this +<literal>deriving</literal> clause is rejected, because <literal>NonMonad m</literal> is +not, in fact, a monad --- for the same reason. Try defining +<literal>>>=</literal> with the correct type: you won't be able to. +</para> +<para> + +Notice also that the <emphasis>order</emphasis> of class parameters becomes +important, since we can only derive instances for the last one. If the +<literal>StateMonad</literal> class above were instead defined as + +<programlisting> + class StateMonad m s | m -> s where ... +</programlisting> + +then we would not have been able to derive an instance for the +<literal>Parser</literal> type above. We hypothesise that multi-parameter +classes usually have one "main" parameter for which deriving new +instances is most interesting. +</para> +<para>Lastly, all of this applies only for classes other than +<literal>Read</literal>, <literal>Show</literal>, <literal>Typeable</literal>, +and <literal>Data</literal>, for which the built-in derivation applies (section +4.3.3. of the Haskell Report). +(For the standard classes <literal>Eq</literal>, <literal>Ord</literal>, +<literal>Ix</literal>, and <literal>Bounded</literal> it is immaterial whether +the standard method is used or the one described here.) +</para> +</sect3> + +</sect2> + +<sect2 id="typing-binds"> +<title>Generalised typing of mutually recursive bindings</title> + +<para> +The Haskell Report specifies that a group of bindings (at top level, or in a +<literal>let</literal> or <literal>where</literal>) should be sorted into +strongly-connected components, and then type-checked in dependency order +(<ulink url="http://haskell.org/onlinereport/decls.html#sect4.5.1">Haskell +Report, Section 4.5.1</ulink>). +As each group is type-checked, any binders of the group that +have +an explicit type signature are put in the type environment with the specified +polymorphic type, +and all others are monomorphic until the group is generalised +(<ulink url="http://haskell.org/onlinereport/decls.html#sect4.5.2">Haskell Report, Section 4.5.2</ulink>). +</para> + +<para>Following a suggestion of Mark Jones, in his paper +<ulink url="http://www.cse.ogi.edu/~mpj/thih/">Typing Haskell in +Haskell</ulink>, +GHC implements a more general scheme. If <option>-fglasgow-exts</option> is +specified: +<emphasis>the dependency analysis ignores references to variables that have an explicit +type signature</emphasis>. +As a result of this refined dependency analysis, the dependency groups are smaller, and more bindings will +typecheck. For example, consider: +<programlisting> + f :: Eq a => a -> Bool + f x = (x == x) || g True || g "Yes" + + g y = (y <= y) || f True +</programlisting> +This is rejected by Haskell 98, but under Jones's scheme the definition for +<literal>g</literal> is typechecked first, separately from that for +<literal>f</literal>, +because the reference to <literal>f</literal> in <literal>g</literal>'s right +hand side is ingored by the dependency analysis. Then <literal>g</literal>'s +type is generalised, to get +<programlisting> + g :: Ord a => a -> Bool +</programlisting> +Now, the defintion for <literal>f</literal> is typechecked, with this type for +<literal>g</literal> in the type environment. +</para> + +<para> +The same refined dependency analysis also allows the type signatures of +mutually-recursive functions to have different contexts, something that is illegal in +Haskell 98 (Section 4.5.2, last sentence). With +<option>-fglasgow-exts</option> +GHC only insists that the type signatures of a <emphasis>refined</emphasis> group have identical +type signatures; in practice this means that only variables bound by the same +pattern binding must have the same context. For example, this is fine: +<programlisting> + f :: Eq a => a -> Bool + f x = (x == x) || g True + + g :: Ord a => a -> Bool + g y = (y <= y) || f True +</programlisting> +</para> +</sect2> + +</sect1> +<!-- ==================== End of type system extensions ================= --> + +<!-- ====================== Generalised algebraic data types ======================= --> + +<sect1 id="gadt"> +<title>Generalised Algebraic Data Types</title> + +<para>Generalised Algebraic Data Types (GADTs) generalise ordinary algebraic data types by allowing you +to give the type signatures of constructors explicitly. For example: +<programlisting> + data Term a where + Lit :: Int -> Term Int + Succ :: Term Int -> Term Int + IsZero :: Term Int -> Term Bool + If :: Term Bool -> Term a -> Term a -> Term a + Pair :: Term a -> Term b -> Term (a,b) +</programlisting> +Notice that the return type of the constructors is not always <literal>Term a</literal>, as is the +case with ordinary vanilla data types. Now we can write a well-typed <literal>eval</literal> function +for these <literal>Terms</literal>: +<programlisting> + eval :: Term a -> a + eval (Lit i) = i + eval (Succ t) = 1 + eval t + eval (IsZero t) = eval t == 0 + eval (If b e1 e2) = if eval b then eval e1 else eval e2 + eval (Pair e1 e2) = (eval e1, eval e2) +</programlisting> +These and many other examples are given in papers by Hongwei Xi, and Tim Sheard. +</para> +<para> The extensions to GHC are these: +<itemizedlist> +<listitem><para> + Data type declarations have a 'where' form, as exemplified above. The type signature of +each constructor is independent, and is implicitly universally quantified as usual. Unlike a normal +Haskell data type declaration, the type variable(s) in the "<literal>data Term a where</literal>" header +have no scope. Indeed, one can write a kind signature instead: +<programlisting> + data Term :: * -> * where ... +</programlisting> +or even a mixture of the two: +<programlisting> + data Foo a :: (* -> *) -> * where ... +</programlisting> +The type variables (if given) may be explicitly kinded, so we could also write the header for <literal>Foo</literal> +like this: +<programlisting> + data Foo a (b :: * -> *) where ... +</programlisting> +</para></listitem> + +<listitem><para> +There are no restrictions on the type of the data constructor, except that the result +type must begin with the type constructor being defined. For example, in the <literal>Term</literal> data +type above, the type of each constructor must end with <literal> ... -> Term ...</literal>. +</para></listitem> + +<listitem><para> +You can use record syntax on a GADT-style data type declaration: + +<programlisting> + data Term a where + Lit { val :: Int } :: Term Int + Succ { num :: Term Int } :: Term Int + Pred { num :: Term Int } :: Term Int + IsZero { arg :: Term Int } :: Term Bool + Pair { arg1 :: Term a + , arg2 :: Term b + } :: Term (a,b) + If { cnd :: Term Bool + , tru :: Term a + , fls :: Term a + } :: Term a +</programlisting> +For every constructor that has a field <literal>f</literal>, (a) the type of +field <literal>f</literal> must be the same; and (b) the +result type of the constructor must be the same; both modulo alpha conversion. +Hence, in our example, we cannot merge the <literal>num</literal> and <literal>arg</literal> +fields above into a +single name. Although their field types are both <literal>Term Int</literal>, +their selector functions actually have different types: + +<programlisting> + num :: Term Int -> Term Int + arg :: Term Bool -> Term Int +</programlisting> + +At the moment, record updates are not yet possible with GADT, so support is +limited to record construction, selection and pattern matching: + +<programlisting> + someTerm :: Term Bool + someTerm = IsZero { arg = Succ { num = Lit { val = 0 } } } + + eval :: Term a -> a + eval Lit { val = i } = i + eval Succ { num = t } = eval t + 1 + eval Pred { num = t } = eval t - 1 + eval IsZero { arg = t } = eval t == 0 + eval Pair { arg1 = t1, arg2 = t2 } = (eval t1, eval t2) + eval t@If{} = if eval (cnd t) then eval (tru t) else eval (fls t) +</programlisting> + +</para></listitem> + +<listitem><para> +You can use strictness annotations, in the obvious places +in the constructor type: +<programlisting> + data Term a where + Lit :: !Int -> Term Int + If :: Term Bool -> !(Term a) -> !(Term a) -> Term a + Pair :: Term a -> Term b -> Term (a,b) +</programlisting> +</para></listitem> + +<listitem><para> +You can use a <literal>deriving</literal> clause on a GADT-style data type +declaration, but only if the data type could also have been declared in +Haskell-98 syntax. For example, these two declarations are equivalent +<programlisting> + data Maybe1 a where { + Nothing1 :: Maybe a ; + Just1 :: a -> Maybe a + } deriving( Eq, Ord ) + + data Maybe2 a = Nothing2 | Just2 a + deriving( Eq, Ord ) +</programlisting> +This simply allows you to declare a vanilla Haskell-98 data type using the +<literal>where</literal> form without losing the <literal>deriving</literal> clause. +</para></listitem> + +<listitem><para> +Pattern matching causes type refinement. For example, in the right hand side of the equation +<programlisting> + eval :: Term a -> a + eval (Lit i) = ... +</programlisting> +the type <literal>a</literal> is refined to <literal>Int</literal>. (That's the whole point!) +A precise specification of the type rules is beyond what this user manual aspires to, but there is a paper +about the ideas: "Wobbly types: practical type inference for generalised algebraic data types", on Simon PJ's home page.</para> + +<para> The general principle is this: <emphasis>type refinement is only carried out based on user-supplied type annotations</emphasis>. +So if no type signature is supplied for <literal>eval</literal>, no type refinement happens, and lots of obscure error messages will +occur. However, the refinement is quite general. For example, if we had: +<programlisting> + eval :: Term a -> a -> a + eval (Lit i) j = i+j +</programlisting> +the pattern match causes the type <literal>a</literal> to be refined to <literal>Int</literal> (because of the type +of the constructor <literal>Lit</literal>, and that refinement also applies to the type of <literal>j</literal>, and +the result type of the <literal>case</literal> expression. Hence the addition <literal>i+j</literal> is legal. +</para> +</listitem> +</itemizedlist> +</para> + +<para>Notice that GADTs generalise existential types. For example, these two declarations are equivalent: +<programlisting> + data T a = forall b. MkT b (b->a) + data T' a where { MKT :: b -> (b->a) -> T' a } +</programlisting> +</para> +</sect1> + +<!-- ====================== End of Generalised algebraic data types ======================= --> + +<!-- ====================== TEMPLATE HASKELL ======================= --> + +<sect1 id="template-haskell"> +<title>Template Haskell</title> + +<para>Template Haskell allows you to do compile-time meta-programming in Haskell. There is a "home page" for +Template Haskell at <ulink url="http://www.haskell.org/th/"> +http://www.haskell.org/th/</ulink>, while +the background to +the main technical innovations is discussed in "<ulink +url="http://research.microsoft.com/~simonpj/papers/meta-haskell"> +Template Meta-programming for Haskell</ulink>" (Proc Haskell Workshop 2002). +The details of the Template Haskell design are still in flux. Make sure you +consult the <ulink url="http://www.haskell.org/ghc/docs/latest/html/libraries/index.html">online library reference material</ulink> +(search for the type ExpQ). +[Temporary: many changes to the original design are described in + <ulink url="http://research.microsoft.com/~simonpj/tmp/notes2.ps">"http://research.microsoft.com/~simonpj/tmp/notes2.ps"</ulink>. +Not all of these changes are in GHC 6.2.] +</para> + +<para> The first example from that paper is set out below as a worked example to help get you started. +</para> + +<para> +The documentation here describes the realisation in GHC. (It's rather sketchy just now; +Tim Sheard is going to expand it.) +</para> + + <sect2> + <title>Syntax</title> + + <para> Template Haskell has the following new syntactic + constructions. You need to use the flag + <option>-fth</option><indexterm><primary><option>-fth</option></primary> + </indexterm>to switch these syntactic extensions on + (<option>-fth</option> is currently implied by + <option>-fglasgow-exts</option>, but you are encouraged to + specify it explicitly).</para> + + <itemizedlist> + <listitem><para> + A splice is written <literal>$x</literal>, where <literal>x</literal> is an + identifier, or <literal>$(...)</literal>, where the "..." is an arbitrary expression. + There must be no space between the "$" and the identifier or parenthesis. This use + of "$" overrides its meaning as an infix operator, just as "M.x" overrides the meaning + of "." as an infix operator. If you want the infix operator, put spaces around it. + </para> + <para> A splice can occur in place of + <itemizedlist> + <listitem><para> an expression; the spliced expression must + have type <literal>Q Exp</literal></para></listitem> + <listitem><para> a list of top-level declarations; ; the spliced expression must have type <literal>Q [Dec]</literal></para></listitem> + <listitem><para> [Planned, but not implemented yet.] a + type; the spliced expression must have type <literal>Q Typ</literal>.</para></listitem> + </itemizedlist> + (Note that the syntax for a declaration splice uses "<literal>$</literal>" not "<literal>splice</literal>" as in + the paper. Also the type of the enclosed expression must be <literal>Q [Dec]</literal>, not <literal>[Q Dec]</literal> + as in the paper.) + </para></listitem> + + + <listitem><para> + A expression quotation is written in Oxford brackets, thus: + <itemizedlist> + <listitem><para> <literal>[| ... |]</literal>, where the "..." is an expression; + the quotation has type <literal>Expr</literal>.</para></listitem> + <listitem><para> <literal>[d| ... |]</literal>, where the "..." is a list of top-level declarations; + the quotation has type <literal>Q [Dec]</literal>.</para></listitem> + <listitem><para> [Planned, but not implemented yet.] <literal>[t| ... |]</literal>, where the "..." is a type; + the quotation has type <literal>Type</literal>.</para></listitem> + </itemizedlist></para></listitem> + + <listitem><para> + Reification is written thus: + <itemizedlist> + <listitem><para> <literal>reifyDecl T</literal>, where <literal>T</literal> is a type constructor; this expression + has type <literal>Dec</literal>. </para></listitem> + <listitem><para> <literal>reifyDecl C</literal>, where <literal>C</literal> is a class; has type <literal>Dec</literal>.</para></listitem> + <listitem><para> <literal>reifyType f</literal>, where <literal>f</literal> is an identifier; has type <literal>Typ</literal>.</para></listitem> + <listitem><para> Still to come: fixities </para></listitem> + + </itemizedlist></para> + </listitem> + + + </itemizedlist> +</sect2> + +<sect2> <title> Using Template Haskell </title> +<para> +<itemizedlist> + <listitem><para> + The data types and monadic constructor functions for Template Haskell are in the library + <literal>Language.Haskell.THSyntax</literal>. + </para></listitem> + + <listitem><para> + You can only run a function at compile time if it is imported from another module. That is, + you can't define a function in a module, and call it from within a splice in the same module. + (It would make sense to do so, but it's hard to implement.) + </para></listitem> + + <listitem><para> + The flag <literal>-ddump-splices</literal> shows the expansion of all top-level splices as they happen. + </para></listitem> + <listitem><para> + If you are building GHC from source, you need at least a stage-2 bootstrap compiler to + run Template Haskell. A stage-1 compiler will reject the TH constructs. Reason: TH + compiles and runs a program, and then looks at the result. So it's important that + the program it compiles produces results whose representations are identical to + those of the compiler itself. + </para></listitem> +</itemizedlist> +</para> +<para> Template Haskell works in any mode (<literal>--make</literal>, <literal>--interactive</literal>, + or file-at-a-time). There used to be a restriction to the former two, but that restriction + has been lifted. +</para> +</sect2> + +<sect2> <title> A Template Haskell Worked Example </title> +<para>To help you get over the confidence barrier, try out this skeletal worked example. + First cut and paste the two modules below into "Main.hs" and "Printf.hs":</para> + +<programlisting> + +{- Main.hs -} +module Main where + +-- Import our template "pr" +import Printf ( pr ) + +-- The splice operator $ takes the Haskell source code +-- generated at compile time by "pr" and splices it into +-- the argument of "putStrLn". +main = putStrLn ( $(pr "Hello") ) + + +{- Printf.hs -} +module Printf where + +-- Skeletal printf from the paper. +-- It needs to be in a separate module to the one where +-- you intend to use it. + +-- Import some Template Haskell syntax +import Language.Haskell.TH + +-- Describe a format string +data Format = D | S | L String + +-- Parse a format string. This is left largely to you +-- as we are here interested in building our first ever +-- Template Haskell program and not in building printf. +parse :: String -> [Format] +parse s = [ L s ] + +-- Generate Haskell source code from a parsed representation +-- of the format string. This code will be spliced into +-- the module which calls "pr", at compile time. +gen :: [Format] -> ExpQ +gen [D] = [| \n -> show n |] +gen [S] = [| \s -> s |] +gen [L s] = stringE s + +-- Here we generate the Haskell code for the splice +-- from an input format string. +pr :: String -> ExpQ +pr s = gen (parse s) +</programlisting> + +<para>Now run the compiler (here we are a Cygwin prompt on Windows): +</para> +<programlisting> +$ ghc --make -fth main.hs -o main.exe +</programlisting> + +<para>Run "main.exe" and here is your output:</para> + +<programlisting> +$ ./main +Hello +</programlisting> + +</sect2> + +</sect1> + +<!-- ===================== Arrow notation =================== --> + +<sect1 id="arrow-notation"> +<title>Arrow notation +</title> + +<para>Arrows are a generalization of monads introduced by John Hughes. +For more details, see +<itemizedlist> + +<listitem> +<para> +“Generalising Monads to Arrows”, +John Hughes, in <citetitle>Science of Computer Programming</citetitle> 37, +pp67–111, May 2000. +</para> +</listitem> + +<listitem> +<para> +“<ulink url="http://www.soi.city.ac.uk/~ross/papers/notation.html">A New Notation for Arrows</ulink>”, +Ross Paterson, in <citetitle>ICFP</citetitle>, Sep 2001. +</para> +</listitem> + +<listitem> +<para> +“<ulink url="http://www.soi.city.ac.uk/~ross/papers/fop.html">Arrows and Computation</ulink>”, +Ross Paterson, in <citetitle>The Fun of Programming</citetitle>, +Palgrave, 2003. +</para> +</listitem> + +</itemizedlist> +and the arrows web page at +<ulink url="http://www.haskell.org/arrows/"><literal>http://www.haskell.org/arrows/</literal></ulink>. +With the <option>-farrows</option> flag, GHC supports the arrow +notation described in the second of these papers. +What follows is a brief introduction to the notation; +it won't make much sense unless you've read Hughes's paper. +This notation is translated to ordinary Haskell, +using combinators from the +<ulink url="../libraries/base/Control-Arrow.html"><literal>Control.Arrow</literal></ulink> +module. +</para> + +<para>The extension adds a new kind of expression for defining arrows: +<screen> +<replaceable>exp</replaceable><superscript>10</superscript> ::= ... + | proc <replaceable>apat</replaceable> -> <replaceable>cmd</replaceable> +</screen> +where <literal>proc</literal> is a new keyword. +The variables of the pattern are bound in the body of the +<literal>proc</literal>-expression, +which is a new sort of thing called a <firstterm>command</firstterm>. +The syntax of commands is as follows: +<screen> +<replaceable>cmd</replaceable> ::= <replaceable>exp</replaceable><superscript>10</superscript> -< <replaceable>exp</replaceable> + | <replaceable>exp</replaceable><superscript>10</superscript> -<< <replaceable>exp</replaceable> + | <replaceable>cmd</replaceable><superscript>0</superscript> +</screen> +with <replaceable>cmd</replaceable><superscript>0</superscript> up to +<replaceable>cmd</replaceable><superscript>9</superscript> defined using +infix operators as for expressions, and +<screen> +<replaceable>cmd</replaceable><superscript>10</superscript> ::= \ <replaceable>apat</replaceable> ... <replaceable>apat</replaceable> -> <replaceable>cmd</replaceable> + | let <replaceable>decls</replaceable> in <replaceable>cmd</replaceable> + | if <replaceable>exp</replaceable> then <replaceable>cmd</replaceable> else <replaceable>cmd</replaceable> + | case <replaceable>exp</replaceable> of { <replaceable>calts</replaceable> } + | do { <replaceable>cstmt</replaceable> ; ... <replaceable>cstmt</replaceable> ; <replaceable>cmd</replaceable> } + | <replaceable>fcmd</replaceable> + +<replaceable>fcmd</replaceable> ::= <replaceable>fcmd</replaceable> <replaceable>aexp</replaceable> + | ( <replaceable>cmd</replaceable> ) + | (| <replaceable>aexp</replaceable> <replaceable>cmd</replaceable> ... <replaceable>cmd</replaceable> |) + +<replaceable>cstmt</replaceable> ::= let <replaceable>decls</replaceable> + | <replaceable>pat</replaceable> <- <replaceable>cmd</replaceable> + | rec { <replaceable>cstmt</replaceable> ; ... <replaceable>cstmt</replaceable> [;] } + | <replaceable>cmd</replaceable> +</screen> +where <replaceable>calts</replaceable> are like <replaceable>alts</replaceable> +except that the bodies are commands instead of expressions. +</para> + +<para> +Commands produce values, but (like monadic computations) +may yield more than one value, +or none, and may do other things as well. +For the most part, familiarity with monadic notation is a good guide to +using commands. +However the values of expressions, even monadic ones, +are determined by the values of the variables they contain; +this is not necessarily the case for commands. +</para> + +<para> +A simple example of the new notation is the expression +<screen> +proc x -> f -< x+1 +</screen> +We call this a <firstterm>procedure</firstterm> or +<firstterm>arrow abstraction</firstterm>. +As with a lambda expression, the variable <literal>x</literal> +is a new variable bound within the <literal>proc</literal>-expression. +It refers to the input to the arrow. +In the above example, <literal>-<</literal> is not an identifier but an +new reserved symbol used for building commands from an expression of arrow +type and an expression to be fed as input to that arrow. +(The weird look will make more sense later.) +It may be read as analogue of application for arrows. +The above example is equivalent to the Haskell expression +<screen> +arr (\ x -> x+1) >>> f +</screen> +That would make no sense if the expression to the left of +<literal>-<</literal> involves the bound variable <literal>x</literal>. +More generally, the expression to the left of <literal>-<</literal> +may not involve any <firstterm>local variable</firstterm>, +i.e. a variable bound in the current arrow abstraction. +For such a situation there is a variant <literal>-<<</literal>, as in +<screen> +proc x -> f x -<< x+1 +</screen> +which is equivalent to +<screen> +arr (\ x -> (f x, x+1)) >>> app +</screen> +so in this case the arrow must belong to the <literal>ArrowApply</literal> +class. +Such an arrow is equivalent to a monad, so if you're using this form +you may find a monadic formulation more convenient. +</para> + +<sect2> +<title>do-notation for commands</title> + +<para> +Another form of command is a form of <literal>do</literal>-notation. +For example, you can write +<screen> +proc x -> do + y <- f -< x+1 + g -< 2*y + let z = x+y + t <- h -< x*z + returnA -< t+z +</screen> +You can read this much like ordinary <literal>do</literal>-notation, +but with commands in place of monadic expressions. +The first line sends the value of <literal>x+1</literal> as an input to +the arrow <literal>f</literal>, and matches its output against +<literal>y</literal>. +In the next line, the output is discarded. +The arrow <function>returnA</function> is defined in the +<ulink url="../libraries/base/Control-Arrow.html"><literal>Control.Arrow</literal></ulink> +module as <literal>arr id</literal>. +The above example is treated as an abbreviation for +<screen> +arr (\ x -> (x, x)) >>> + first (arr (\ x -> x+1) >>> f) >>> + arr (\ (y, x) -> (y, (x, y))) >>> + first (arr (\ y -> 2*y) >>> g) >>> + arr snd >>> + arr (\ (x, y) -> let z = x+y in ((x, z), z)) >>> + first (arr (\ (x, z) -> x*z) >>> h) >>> + arr (\ (t, z) -> t+z) >>> + returnA +</screen> +Note that variables not used later in the composition are projected out. +After simplification using rewrite rules (see <xref linkend="rewrite-rules"/>) +defined in the +<ulink url="../libraries/base/Control-Arrow.html"><literal>Control.Arrow</literal></ulink> +module, this reduces to +<screen> +arr (\ x -> (x+1, x)) >>> + first f >>> + arr (\ (y, x) -> (2*y, (x, y))) >>> + first g >>> + arr (\ (_, (x, y)) -> let z = x+y in (x*z, z)) >>> + first h >>> + arr (\ (t, z) -> t+z) +</screen> +which is what you might have written by hand. +With arrow notation, GHC keeps track of all those tuples of variables for you. +</para> + +<para> +Note that although the above translation suggests that +<literal>let</literal>-bound variables like <literal>z</literal> must be +monomorphic, the actual translation produces Core, +so polymorphic variables are allowed. +</para> + +<para> +It's also possible to have mutually recursive bindings, +using the new <literal>rec</literal> keyword, as in the following example: +<programlisting> +counter :: ArrowCircuit a => a Bool Int +counter = proc reset -> do + rec output <- returnA -< if reset then 0 else next + next <- delay 0 -< output+1 + returnA -< output +</programlisting> +The translation of such forms uses the <function>loop</function> combinator, +so the arrow concerned must belong to the <literal>ArrowLoop</literal> class. +</para> + +</sect2> + +<sect2> +<title>Conditional commands</title> + +<para> +In the previous example, we used a conditional expression to construct the +input for an arrow. +Sometimes we want to conditionally execute different commands, as in +<screen> +proc (x,y) -> + if f x y + then g -< x+1 + else h -< y+2 +</screen> +which is translated to +<screen> +arr (\ (x,y) -> if f x y then Left x else Right y) >>> + (arr (\x -> x+1) >>> f) ||| (arr (\y -> y+2) >>> g) +</screen> +Since the translation uses <function>|||</function>, +the arrow concerned must belong to the <literal>ArrowChoice</literal> class. +</para> + +<para> +There are also <literal>case</literal> commands, like +<screen> +case input of + [] -> f -< () + [x] -> g -< x+1 + x1:x2:xs -> do + y <- h -< (x1, x2) + ys <- k -< xs + returnA -< y:ys +</screen> +The syntax is the same as for <literal>case</literal> expressions, +except that the bodies of the alternatives are commands rather than expressions. +The translation is similar to that of <literal>if</literal> commands. +</para> + +</sect2> + +<sect2> +<title>Defining your own control structures</title> + +<para> +As we're seen, arrow notation provides constructs, +modelled on those for expressions, +for sequencing, value recursion and conditionals. +But suitable combinators, +which you can define in ordinary Haskell, +may also be used to build new commands out of existing ones. +The basic idea is that a command defines an arrow from environments to values. +These environments assign values to the free local variables of the command. +Thus combinators that produce arrows from arrows +may also be used to build commands from commands. +For example, the <literal>ArrowChoice</literal> class includes a combinator +<programlisting> +ArrowChoice a => (<+>) :: a e c -> a e c -> a e c +</programlisting> +so we can use it to build commands: +<programlisting> +expr' = proc x -> do + returnA -< x + <+> do + symbol Plus -< () + y <- term -< () + expr' -< x + y + <+> do + symbol Minus -< () + y <- term -< () + expr' -< x - y +</programlisting> +(The <literal>do</literal> on the first line is needed to prevent the first +<literal><+> ...</literal> from being interpreted as part of the +expression on the previous line.) +This is equivalent to +<programlisting> +expr' = (proc x -> returnA -< x) + <+> (proc x -> do + symbol Plus -< () + y <- term -< () + expr' -< x + y) + <+> (proc x -> do + symbol Minus -< () + y <- term -< () + expr' -< x - y) +</programlisting> +It is essential that this operator be polymorphic in <literal>e</literal> +(representing the environment input to the command +and thence to its subcommands) +and satisfy the corresponding naturality property +<screen> +arr k >>> (f <+> g) = (arr k >>> f) <+> (arr k >>> g) +</screen> +at least for strict <literal>k</literal>. +(This should be automatic if you're not using <function>seq</function>.) +This ensures that environments seen by the subcommands are environments +of the whole command, +and also allows the translation to safely trim these environments. +The operator must also not use any variable defined within the current +arrow abstraction. +</para> + +<para> +We could define our own operator +<programlisting> +untilA :: ArrowChoice a => a e () -> a e Bool -> a e () +untilA body cond = proc x -> + if cond x then returnA -< () + else do + body -< x + untilA body cond -< x +</programlisting> +and use it in the same way. +Of course this infix syntax only makes sense for binary operators; +there is also a more general syntax involving special brackets: +<screen> +proc x -> do + y <- f -< x+1 + (|untilA (increment -< x+y) (within 0.5 -< x)|) +</screen> +</para> + +</sect2> + +<sect2> +<title>Primitive constructs</title> + +<para> +Some operators will need to pass additional inputs to their subcommands. +For example, in an arrow type supporting exceptions, +the operator that attaches an exception handler will wish to pass the +exception that occurred to the handler. +Such an operator might have a type +<screen> +handleA :: ... => a e c -> a (e,Ex) c -> a e c +</screen> +where <literal>Ex</literal> is the type of exceptions handled. +You could then use this with arrow notation by writing a command +<screen> +body `handleA` \ ex -> handler +</screen> +so that if an exception is raised in the command <literal>body</literal>, +the variable <literal>ex</literal> is bound to the value of the exception +and the command <literal>handler</literal>, +which typically refers to <literal>ex</literal>, is entered. +Though the syntax here looks like a functional lambda, +we are talking about commands, and something different is going on. +The input to the arrow represented by a command consists of values for +the free local variables in the command, plus a stack of anonymous values. +In all the prior examples, this stack was empty. +In the second argument to <function>handleA</function>, +this stack consists of one value, the value of the exception. +The command form of lambda merely gives this value a name. +</para> + +<para> +More concretely, +the values on the stack are paired to the right of the environment. +So operators like <function>handleA</function> that pass +extra inputs to their subcommands can be designed for use with the notation +by pairing the values with the environment in this way. +More precisely, the type of each argument of the operator (and its result) +should have the form +<screen> +a (...(e,t1), ... tn) t +</screen> +where <replaceable>e</replaceable> is a polymorphic variable +(representing the environment) +and <replaceable>ti</replaceable> are the types of the values on the stack, +with <replaceable>t1</replaceable> being the <quote>top</quote>. +The polymorphic variable <replaceable>e</replaceable> must not occur in +<replaceable>a</replaceable>, <replaceable>ti</replaceable> or +<replaceable>t</replaceable>. +However the arrows involved need not be the same. +Here are some more examples of suitable operators: +<screen> +bracketA :: ... => a e b -> a (e,b) c -> a (e,c) d -> a e d +runReader :: ... => a e c -> a' (e,State) c +runState :: ... => a e c -> a' (e,State) (c,State) +</screen> +We can supply the extra input required by commands built with the last two +by applying them to ordinary expressions, as in +<screen> +proc x -> do + s <- ... + (|runReader (do { ... })|) s +</screen> +which adds <literal>s</literal> to the stack of inputs to the command +built using <function>runReader</function>. +</para> + +<para> +The command versions of lambda abstraction and application are analogous to +the expression versions. +In particular, the beta and eta rules describe equivalences of commands. +These three features (operators, lambda abstraction and application) +are the core of the notation; everything else can be built using them, +though the results would be somewhat clumsy. +For example, we could simulate <literal>do</literal>-notation by defining +<programlisting> +bind :: Arrow a => a e b -> a (e,b) c -> a e c +u `bind` f = returnA &&& u >>> f + +bind_ :: Arrow a => a e b -> a e c -> a e c +u `bind_` f = u `bind` (arr fst >>> f) +</programlisting> +We could simulate <literal>if</literal> by defining +<programlisting> +cond :: ArrowChoice a => a e b -> a e b -> a (e,Bool) b +cond f g = arr (\ (e,b) -> if b then Left e else Right e) >>> f ||| g +</programlisting> +</para> + +</sect2> + +<sect2> +<title>Differences with the paper</title> + +<itemizedlist> + +<listitem> +<para>Instead of a single form of arrow application (arrow tail) with two +translations, the implementation provides two forms +<quote><literal>-<</literal></quote> (first-order) +and <quote><literal>-<<</literal></quote> (higher-order). +</para> +</listitem> + +<listitem> +<para>User-defined operators are flagged with banana brackets instead of +a new <literal>form</literal> keyword. +</para> +</listitem> + +</itemizedlist> + +</sect2> + +<sect2> +<title>Portability</title> + +<para> +Although only GHC implements arrow notation directly, +there is also a preprocessor +(available from the +<ulink url="http://www.haskell.org/arrows/">arrows web page</ulink>) +that translates arrow notation into Haskell 98 +for use with other Haskell systems. +You would still want to check arrow programs with GHC; +tracing type errors in the preprocessor output is not easy. +Modules intended for both GHC and the preprocessor must observe some +additional restrictions: +<itemizedlist> + +<listitem> +<para> +The module must import +<ulink url="../libraries/base/Control-Arrow.html"><literal>Control.Arrow</literal></ulink>. +</para> +</listitem> + +<listitem> +<para> +The preprocessor cannot cope with other Haskell extensions. +These would have to go in separate modules. +</para> +</listitem> + +<listitem> +<para> +Because the preprocessor targets Haskell (rather than Core), +<literal>let</literal>-bound variables are monomorphic. +</para> +</listitem> + +</itemizedlist> +</para> + +</sect2> + +</sect1> + +<!-- ==================== ASSERTIONS ================= --> + +<sect1 id="sec-assertions"> +<title>Assertions +<indexterm><primary>Assertions</primary></indexterm> +</title> + +<para> +If you want to make use of assertions in your standard Haskell code, you +could define a function like the following: +</para> + +<para> + +<programlisting> +assert :: Bool -> a -> a +assert False x = error "assertion failed!" +assert _ x = x +</programlisting> + +</para> + +<para> +which works, but gives you back a less than useful error message -- +an assertion failed, but which and where? +</para> + +<para> +One way out is to define an extended <function>assert</function> function which also +takes a descriptive string to include in the error message and +perhaps combine this with the use of a pre-processor which inserts +the source location where <function>assert</function> was used. +</para> + +<para> +Ghc offers a helping hand here, doing all of this for you. For every +use of <function>assert</function> in the user's source: +</para> + +<para> + +<programlisting> +kelvinToC :: Double -> Double +kelvinToC k = assert (k >= 0.0) (k+273.15) +</programlisting> + +</para> + +<para> +Ghc will rewrite this to also include the source location where the +assertion was made, +</para> + +<para> + +<programlisting> +assert pred val ==> assertError "Main.hs|15" pred val +</programlisting> + +</para> + +<para> +The rewrite is only performed by the compiler when it spots +applications of <function>Control.Exception.assert</function>, so you +can still define and use your own versions of +<function>assert</function>, should you so wish. If not, import +<literal>Control.Exception</literal> to make use +<function>assert</function> in your code. +</para> + +<para> +GHC ignores assertions when optimisation is turned on with the + <option>-O</option><indexterm><primary><option>-O</option></primary></indexterm> flag. That is, expressions of the form +<literal>assert pred e</literal> will be rewritten to +<literal>e</literal>. You can also disable assertions using the + <option>-fignore-asserts</option> + option<indexterm><primary><option>-fignore-asserts</option></primary> + </indexterm>.</para> + +<para> +Assertion failures can be caught, see the documentation for the +<literal>Control.Exception</literal> library for the details. +</para> + +</sect1> + + +<!-- =============================== PRAGMAS =========================== --> + + <sect1 id="pragmas"> + <title>Pragmas</title> + + <indexterm><primary>pragma</primary></indexterm> + + <para>GHC supports several pragmas, or instructions to the + compiler placed in the source code. Pragmas don't normally affect + the meaning of the program, but they might affect the efficiency + of the generated code.</para> + + <para>Pragmas all take the form + +<literal>{-# <replaceable>word</replaceable> ... #-}</literal> + + where <replaceable>word</replaceable> indicates the type of + pragma, and is followed optionally by information specific to that + type of pragma. Case is ignored in + <replaceable>word</replaceable>. The various values for + <replaceable>word</replaceable> that GHC understands are described + in the following sections; any pragma encountered with an + unrecognised <replaceable>word</replaceable> is (silently) + ignored.</para> + + <sect2 id="deprecated-pragma"> + <title>DEPRECATED pragma</title> + <indexterm><primary>DEPRECATED</primary> + </indexterm> + + <para>The DEPRECATED pragma lets you specify that a particular + function, class, or type, is deprecated. There are two + forms. + + <itemizedlist> + <listitem> + <para>You can deprecate an entire module thus:</para> +<programlisting> + module Wibble {-# DEPRECATED "Use Wobble instead" #-} where + ... +</programlisting> + <para>When you compile any module that import + <literal>Wibble</literal>, GHC will print the specified + message.</para> + </listitem> + + <listitem> + <para>You can deprecate a function, class, type, or data constructor, with the + following top-level declaration:</para> +<programlisting> + {-# DEPRECATED f, C, T "Don't use these" #-} +</programlisting> + <para>When you compile any module that imports and uses any + of the specified entities, GHC will print the specified + message.</para> + <para> You can only depecate entities declared at top level in the module + being compiled, and you can only use unqualified names in the list of + entities being deprecated. A capitalised name, such as <literal>T</literal> + refers to <emphasis>either</emphasis> the type constructor <literal>T</literal> + <emphasis>or</emphasis> the data constructor <literal>T</literal>, or both if + both are in scope. If both are in scope, there is currently no way to deprecate + one without the other (c.f. fixities <xref linkend="infix-tycons"/>).</para> + </listitem> + </itemizedlist> + Any use of the deprecated item, or of anything from a deprecated + module, will be flagged with an appropriate message. However, + deprecations are not reported for + (a) uses of a deprecated function within its defining module, and + (b) uses of a deprecated function in an export list. + The latter reduces spurious complaints within a library + in which one module gathers together and re-exports + the exports of several others. + </para> + <para>You can suppress the warnings with the flag + <option>-fno-warn-deprecations</option>.</para> + </sect2> + + <sect2 id="include-pragma"> + <title>INCLUDE pragma</title> + + <para>The <literal>INCLUDE</literal> pragma is for specifying the names + of C header files that should be <literal>#include</literal>'d into + the C source code generated by the compiler for the current module (if + compiling via C). For example:</para> + +<programlisting> +{-# INCLUDE "foo.h" #-} +{-# INCLUDE <stdio.h> #-}</programlisting> + + <para>The <literal>INCLUDE</literal> pragma(s) must appear at the top of + your source file with any <literal>OPTIONS_GHC</literal> + pragma(s).</para> + + <para>An <literal>INCLUDE</literal> pragma is the preferred alternative + to the <option>-#include</option> option (<xref + linkend="options-C-compiler" />), because the + <literal>INCLUDE</literal> pragma is understood by other + compilers. Yet another alternative is to add the include file to each + <literal>foreign import</literal> declaration in your code, but we + don't recommend using this approach with GHC.</para> + </sect2> + + <sect2 id="inline-noinline-pragma"> + <title>INLINE and NOINLINE pragmas</title> + + <para>These pragmas control the inlining of function + definitions.</para> + + <sect3 id="inline-pragma"> + <title>INLINE pragma</title> + <indexterm><primary>INLINE</primary></indexterm> + + <para>GHC (with <option>-O</option>, as always) tries to + inline (or “unfold”) functions/values that are + “small enough,” thus avoiding the call overhead + and possibly exposing other more-wonderful optimisations. + Normally, if GHC decides a function is “too + expensive” to inline, it will not do so, nor will it + export that unfolding for other modules to use.</para> + + <para>The sledgehammer you can bring to bear is the + <literal>INLINE</literal><indexterm><primary>INLINE + pragma</primary></indexterm> pragma, used thusly:</para> + +<programlisting> +key_function :: Int -> String -> (Bool, Double) + +#ifdef __GLASGOW_HASKELL__ +{-# INLINE key_function #-} +#endif +</programlisting> + + <para>(You don't need to do the C pre-processor carry-on + unless you're going to stick the code through HBC—it + doesn't like <literal>INLINE</literal> pragmas.)</para> + + <para>The major effect of an <literal>INLINE</literal> pragma + is to declare a function's “cost” to be very low. + The normal unfolding machinery will then be very keen to + inline it.</para> + + <para>Syntactically, an <literal>INLINE</literal> pragma for a + function can be put anywhere its type signature could be + put.</para> + + <para><literal>INLINE</literal> pragmas are a particularly + good idea for the + <literal>then</literal>/<literal>return</literal> (or + <literal>bind</literal>/<literal>unit</literal>) functions in + a monad. For example, in GHC's own + <literal>UniqueSupply</literal> monad code, we have:</para> + +<programlisting> +#ifdef __GLASGOW_HASKELL__ +{-# INLINE thenUs #-} +{-# INLINE returnUs #-} +#endif +</programlisting> + + <para>See also the <literal>NOINLINE</literal> pragma (<xref + linkend="noinline-pragma"/>).</para> + </sect3> + + <sect3 id="noinline-pragma"> + <title>NOINLINE pragma</title> + + <indexterm><primary>NOINLINE</primary></indexterm> + <indexterm><primary>NOTINLINE</primary></indexterm> + + <para>The <literal>NOINLINE</literal> pragma does exactly what + you'd expect: it stops the named function from being inlined + by the compiler. You shouldn't ever need to do this, unless + you're very cautious about code size.</para> + + <para><literal>NOTINLINE</literal> is a synonym for + <literal>NOINLINE</literal> (<literal>NOINLINE</literal> is + specified by Haskell 98 as the standard way to disable + inlining, so it should be used if you want your code to be + portable).</para> + </sect3> + + <sect3 id="phase-control"> + <title>Phase control</title> + + <para> Sometimes you want to control exactly when in GHC's + pipeline the INLINE pragma is switched on. Inlining happens + only during runs of the <emphasis>simplifier</emphasis>. Each + run of the simplifier has a different <emphasis>phase + number</emphasis>; the phase number decreases towards zero. + If you use <option>-dverbose-core2core</option> you'll see the + sequence of phase numbers for successive runs of the + simplifier. In an INLINE pragma you can optionally specify a + phase number, thus:</para> + + <itemizedlist> + <listitem> + <para>You can say "inline <literal>f</literal> in Phase 2 + and all subsequent phases": +<programlisting> + {-# INLINE [2] f #-} +</programlisting> + </para> + </listitem> + + <listitem> + <para>You can say "inline <literal>g</literal> in all + phases up to, but not including, Phase 3": +<programlisting> + {-# INLINE [~3] g #-} +</programlisting> + </para> + </listitem> + + <listitem> + <para>If you omit the phase indicator, you mean "inline in + all phases".</para> + </listitem> + </itemizedlist> + + <para>You can use a phase number on a NOINLINE pragma too:</para> + + <itemizedlist> + <listitem> + <para>You can say "do not inline <literal>f</literal> + until Phase 2; in Phase 2 and subsequently behave as if + there was no pragma at all": +<programlisting> + {-# NOINLINE [2] f #-} +</programlisting> + </para> + </listitem> + + <listitem> + <para>You can say "do not inline <literal>g</literal> in + Phase 3 or any subsequent phase; before that, behave as if + there was no pragma": +<programlisting> + {-# NOINLINE [~3] g #-} +</programlisting> + </para> + </listitem> + + <listitem> + <para>If you omit the phase indicator, you mean "never + inline this function".</para> + </listitem> + </itemizedlist> + + <para>The same phase-numbering control is available for RULES + (<xref linkend="rewrite-rules"/>).</para> + </sect3> + </sect2> + + <sect2 id="language-pragma"> + <title>LANGUAGE pragma</title> + + <indexterm><primary>LANGUAGE</primary><secondary>pragma</secondary></indexterm> + <indexterm><primary>pragma</primary><secondary>LANGUAGE</secondary></indexterm> + + <para>This allows language extensions to be enabled in a portable way. + It is the intention that all Haskell compilers support the + <literal>LANGUAGE</literal> pragma with the same syntax, although not + all extensions are supported by all compilers, of + course. The <literal>LANGUAGE</literal> pragma should be used instead + of <literal>OPTIONS_GHC</literal>, if possible.</para> + + <para>For example, to enable the FFI and preprocessing with CPP:</para> + +<programlisting>{-# LANGUAGE ForeignFunctionInterface, CPP #-}</programlisting> + + <para>Any extension from the <literal>Extension</literal> type defined in + <ulink + url="../libraries/Cabal/Language-Haskell-Extension.html"><literal>Language.Haskell.Extension</literal></ulink> may be used. GHC will report an error if any of the requested extensions are not supported.</para> + </sect2> + + + <sect2 id="line-pragma"> + <title>LINE pragma</title> + + <indexterm><primary>LINE</primary><secondary>pragma</secondary></indexterm> + <indexterm><primary>pragma</primary><secondary>LINE</secondary></indexterm> + <para>This pragma is similar to C's <literal>#line</literal> + pragma, and is mainly for use in automatically generated Haskell + code. It lets you specify the line number and filename of the + original code; for example</para> + +<programlisting>{-# LINE 42 "Foo.vhs" #-}</programlisting> + + <para>if you'd generated the current file from something called + <filename>Foo.vhs</filename> and this line corresponds to line + 42 in the original. GHC will adjust its error messages to refer + to the line/file named in the <literal>LINE</literal> + pragma.</para> + </sect2> + + <sect2 id="options-pragma"> + <title>OPTIONS_GHC pragma</title> + <indexterm><primary>OPTIONS_GHC</primary> + </indexterm> + <indexterm><primary>pragma</primary><secondary>OPTIONS_GHC</secondary> + </indexterm> + + <para>The <literal>OPTIONS_GHC</literal> pragma is used to specify + additional options that are given to the compiler when compiling + this source file. See <xref linkend="source-file-options"/> for + details.</para> + + <para>Previous versions of GHC accepted <literal>OPTIONS</literal> rather + than <literal>OPTIONS_GHC</literal>, but that is now deprecated.</para> + </sect2> + + <sect2 id="rules"> + <title>RULES pragma</title> + + <para>The RULES pragma lets you specify rewrite rules. It is + described in <xref linkend="rewrite-rules"/>.</para> + </sect2> + + <sect2 id="specialize-pragma"> + <title>SPECIALIZE pragma</title> + + <indexterm><primary>SPECIALIZE pragma</primary></indexterm> + <indexterm><primary>pragma, SPECIALIZE</primary></indexterm> + <indexterm><primary>overloading, death to</primary></indexterm> + + <para>(UK spelling also accepted.) For key overloaded + functions, you can create extra versions (NB: more code space) + specialised to particular types. Thus, if you have an + overloaded function:</para> + +<programlisting> + hammeredLookup :: Ord key => [(key, value)] -> key -> value +</programlisting> + + <para>If it is heavily used on lists with + <literal>Widget</literal> keys, you could specialise it as + follows:</para> + +<programlisting> + {-# SPECIALIZE hammeredLookup :: [(Widget, value)] -> Widget -> value #-} +</programlisting> + + <para>A <literal>SPECIALIZE</literal> pragma for a function can + be put anywhere its type signature could be put.</para> + + <para>A <literal>SPECIALIZE</literal> has the effect of generating + (a) a specialised version of the function and (b) a rewrite rule + (see <xref linkend="rewrite-rules"/>) that rewrites a call to the + un-specialised function into a call to the specialised one.</para> + + <para>The type in a SPECIALIZE pragma can be any type that is less + polymorphic than the type of the original function. In concrete terms, + if the original function is <literal>f</literal> then the pragma +<programlisting> + {-# SPECIALIZE f :: <type> #-} +</programlisting> + is valid if and only if the defintion +<programlisting> + f_spec :: <type> + f_spec = f +</programlisting> + is valid. Here are some examples (where we only give the type signature + for the original function, not its code): +<programlisting> + f :: Eq a => a -> b -> b + {-# SPECIALISE f :: Int -> b -> b #-} + + g :: (Eq a, Ix b) => a -> b -> b + {-# SPECIALISE g :: (Eq a) => a -> Int -> Int #-} + + h :: Eq a => a -> a -> a + {-# SPECIALISE h :: (Eq a) => [a] -> [a] -> [a] #-} +</programlisting> +The last of these examples will generate a +RULE with a somewhat-complex left-hand side (try it yourself), so it might not fire very +well. If you use this kind of specialisation, let us know how well it works. +</para> + +<para>A <literal>SPECIALIZE</literal> pragma can optionally be followed with a +<literal>INLINE</literal> or <literal>NOINLINE</literal> pragma, optionally +followed by a phase, as described in <xref linkend="inline-noinline-pragma"/>. +The <literal>INLINE</literal> pragma affects the specialised verison of the +function (only), and applies even if the function is recursive. The motivating +example is this: +<programlisting> +-- A GADT for arrays with type-indexed representation +data Arr e where + ArrInt :: !Int -> ByteArray# -> Arr Int + ArrPair :: !Int -> Arr e1 -> Arr e2 -> Arr (e1, e2) + +(!:) :: Arr e -> Int -> e +{-# SPECIALISE INLINE (!:) :: Arr Int -> Int -> Int #-} +{-# SPECIALISE INLINE (!:) :: Arr (a, b) -> Int -> (a, b) #-} +(ArrInt _ ba) !: (I# i) = I# (indexIntArray# ba i) +(ArrPair _ a1 a2) !: i = (a1 !: i, a2 !: i) +</programlisting> +Here, <literal>(!:)</literal> is a recursive function that indexes arrays +of type <literal>Arr e</literal>. Consider a call to <literal>(!:)</literal> +at type <literal>(Int,Int)</literal>. The second specialisation will fire, and +the specialised function will be inlined. It has two calls to +<literal>(!:)</literal>, +both at type <literal>Int</literal>. Both these calls fire the first +specialisation, whose body is also inlined. The result is a type-based +unrolling of the indexing function.</para> +<para>Warning: you can make GHC diverge by using <literal>SPECIALISE INLINE</literal> +on an ordinarily-recursive function.</para> + + <para>Note: In earlier versions of GHC, it was possible to provide your own + specialised function for a given type: + +<programlisting> +{-# SPECIALIZE hammeredLookup :: [(Int, value)] -> Int -> value = intLookup #-} +</programlisting> + + This feature has been removed, as it is now subsumed by the + <literal>RULES</literal> pragma (see <xref linkend="rule-spec"/>).</para> + + </sect2> + +<sect2 id="specialize-instance-pragma"> +<title>SPECIALIZE instance pragma +</title> + +<para> +<indexterm><primary>SPECIALIZE pragma</primary></indexterm> +<indexterm><primary>overloading, death to</primary></indexterm> +Same idea, except for instance declarations. For example: + +<programlisting> +instance (Eq a) => Eq (Foo a) where { + {-# SPECIALIZE instance Eq (Foo [(Int, Bar)]) #-} + ... usual stuff ... + } +</programlisting> +The pragma must occur inside the <literal>where</literal> part +of the instance declaration. +</para> +<para> +Compatible with HBC, by the way, except perhaps in the placement +of the pragma. +</para> + +</sect2> + + <sect2 id="unpack-pragma"> + <title>UNPACK pragma</title> + + <indexterm><primary>UNPACK</primary></indexterm> + + <para>The <literal>UNPACK</literal> indicates to the compiler + that it should unpack the contents of a constructor field into + the constructor itself, removing a level of indirection. For + example:</para> + +<programlisting> +data T = T {-# UNPACK #-} !Float + {-# UNPACK #-} !Float +</programlisting> + + <para>will create a constructor <literal>T</literal> containing + two unboxed floats. This may not always be an optimisation: if + the <function>T</function> constructor is scrutinised and the + floats passed to a non-strict function for example, they will + have to be reboxed (this is done automatically by the + compiler).</para> + + <para>Unpacking constructor fields should only be used in + conjunction with <option>-O</option>, in order to expose + unfoldings to the compiler so the reboxing can be removed as + often as possible. For example:</para> + +<programlisting> +f :: T -> Float +f (T f1 f2) = f1 + f2 +</programlisting> + + <para>The compiler will avoid reboxing <function>f1</function> + and <function>f2</function> by inlining <function>+</function> + on floats, but only when <option>-O</option> is on.</para> + + <para>Any single-constructor data is eligible for unpacking; for + example</para> + +<programlisting> +data T = T {-# UNPACK #-} !(Int,Int) +</programlisting> + + <para>will store the two <literal>Int</literal>s directly in the + <function>T</function> constructor, by flattening the pair. + Multi-level unpacking is also supported:</para> + +<programlisting> +data T = T {-# UNPACK #-} !S +data S = S {-# UNPACK #-} !Int {-# UNPACK #-} !Int +</programlisting> + + <para>will store two unboxed <literal>Int#</literal>s + directly in the <function>T</function> constructor. The + unpacker can see through newtypes, too.</para> + + <para>If a field cannot be unpacked, you will not get a warning, + so it might be an idea to check the generated code with + <option>-ddump-simpl</option>.</para> + + <para>See also the <option>-funbox-strict-fields</option> flag, + which essentially has the effect of adding + <literal>{-# UNPACK #-}</literal> to every strict + constructor field.</para> + </sect2> + +</sect1> + +<!-- ======================= REWRITE RULES ======================== --> + +<sect1 id="rewrite-rules"> +<title>Rewrite rules + +<indexterm><primary>RULES pragma</primary></indexterm> +<indexterm><primary>pragma, RULES</primary></indexterm> +<indexterm><primary>rewrite rules</primary></indexterm></title> + +<para> +The programmer can specify rewrite rules as part of the source program +(in a pragma). GHC applies these rewrite rules wherever it can, provided (a) +the <option>-O</option> flag (<xref linkend="options-optimise"/>) is on, +and (b) the <option>-frules-off</option> flag +(<xref linkend="options-f"/>) is not specified. +</para> + +<para> +Here is an example: + +<programlisting> + {-# RULES + "map/map" forall f g xs. map f (map g xs) = map (f.g) xs + #-} +</programlisting> + +</para> + +<sect2> +<title>Syntax</title> + +<para> +From a syntactic point of view: + +<itemizedlist> +<listitem> + +<para> + There may be zero or more rules in a <literal>RULES</literal> pragma. +</para> +</listitem> + +<listitem> + +<para> + Each rule has a name, enclosed in double quotes. The name itself has +no significance at all. It is only used when reporting how many times the rule fired. +</para> +</listitem> + +<listitem> +<para> +A rule may optionally have a phase-control number (see <xref linkend="phase-control"/>), +immediately after the name of the rule. Thus: +<programlisting> + {-# RULES + "map/map" [2] forall f g xs. map f (map g xs) = map (f.g) xs + #-} +</programlisting> +The "[2]" means that the rule is active in Phase 2 and subsequent phases. The inverse +notation "[~2]" is also accepted, meaning that the rule is active up to, but not including, +Phase 2. +</para> +</listitem> + + +<listitem> + +<para> + Layout applies in a <literal>RULES</literal> pragma. Currently no new indentation level +is set, so you must lay out your rules starting in the same column as the +enclosing definitions. +</para> +</listitem> + +<listitem> + +<para> + Each variable mentioned in a rule must either be in scope (e.g. <function>map</function>), +or bound by the <literal>forall</literal> (e.g. <function>f</function>, <function>g</function>, <function>xs</function>). The variables bound by +the <literal>forall</literal> are called the <emphasis>pattern</emphasis> variables. They are separated +by spaces, just like in a type <literal>forall</literal>. +</para> +</listitem> +<listitem> + +<para> + A pattern variable may optionally have a type signature. +If the type of the pattern variable is polymorphic, it <emphasis>must</emphasis> have a type signature. +For example, here is the <literal>foldr/build</literal> rule: + +<programlisting> +"fold/build" forall k z (g::forall b. (a->b->b) -> b -> b) . + foldr k z (build g) = g k z +</programlisting> + +Since <function>g</function> has a polymorphic type, it must have a type signature. + +</para> +</listitem> +<listitem> + +<para> +The left hand side of a rule must consist of a top-level variable applied +to arbitrary expressions. For example, this is <emphasis>not</emphasis> OK: + +<programlisting> +"wrong1" forall e1 e2. case True of { True -> e1; False -> e2 } = e1 +"wrong2" forall f. f True = True +</programlisting> + +In <literal>"wrong1"</literal>, the LHS is not an application; in <literal>"wrong2"</literal>, the LHS has a pattern variable +in the head. +</para> +</listitem> +<listitem> + +<para> + A rule does not need to be in the same module as (any of) the +variables it mentions, though of course they need to be in scope. +</para> +</listitem> +<listitem> + +<para> + Rules are automatically exported from a module, just as instance declarations are. +</para> +</listitem> + +</itemizedlist> + +</para> + +</sect2> + +<sect2> +<title>Semantics</title> + +<para> +From a semantic point of view: + +<itemizedlist> +<listitem> + +<para> +Rules are only applied if you use the <option>-O</option> flag. +</para> +</listitem> + +<listitem> +<para> + Rules are regarded as left-to-right rewrite rules. +When GHC finds an expression that is a substitution instance of the LHS +of a rule, it replaces the expression by the (appropriately-substituted) RHS. +By "a substitution instance" we mean that the LHS can be made equal to the +expression by substituting for the pattern variables. + +</para> +</listitem> +<listitem> + +<para> + The LHS and RHS of a rule are typechecked, and must have the +same type. + +</para> +</listitem> +<listitem> + +<para> + GHC makes absolutely no attempt to verify that the LHS and RHS +of a rule have the same meaning. That is undecidable in general, and +infeasible in most interesting cases. The responsibility is entirely the programmer's! + +</para> +</listitem> +<listitem> + +<para> + GHC makes no attempt to make sure that the rules are confluent or +terminating. For example: + +<programlisting> + "loop" forall x,y. f x y = f y x +</programlisting> + +This rule will cause the compiler to go into an infinite loop. + +</para> +</listitem> +<listitem> + +<para> + If more than one rule matches a call, GHC will choose one arbitrarily to apply. + +</para> +</listitem> +<listitem> +<para> + GHC currently uses a very simple, syntactic, matching algorithm +for matching a rule LHS with an expression. It seeks a substitution +which makes the LHS and expression syntactically equal modulo alpha +conversion. The pattern (rule), but not the expression, is eta-expanded if +necessary. (Eta-expanding the expression can lead to laziness bugs.) +But not beta conversion (that's called higher-order matching). +</para> + +<para> +Matching is carried out on GHC's intermediate language, which includes +type abstractions and applications. So a rule only matches if the +types match too. See <xref linkend="rule-spec"/> below. +</para> +</listitem> +<listitem> + +<para> + GHC keeps trying to apply the rules as it optimises the program. +For example, consider: + +<programlisting> + let s = map f + t = map g + in + s (t xs) +</programlisting> + +The expression <literal>s (t xs)</literal> does not match the rule <literal>"map/map"</literal>, but GHC +will substitute for <varname>s</varname> and <varname>t</varname>, giving an expression which does match. +If <varname>s</varname> or <varname>t</varname> was (a) used more than once, and (b) large or a redex, then it would +not be substituted, and the rule would not fire. + +</para> +</listitem> +<listitem> + +<para> + In the earlier phases of compilation, GHC inlines <emphasis>nothing +that appears on the LHS of a rule</emphasis>, because once you have substituted +for something you can't match against it (given the simple minded +matching). So if you write the rule + +<programlisting> + "map/map" forall f,g. map f . map g = map (f.g) +</programlisting> + +this <emphasis>won't</emphasis> match the expression <literal>map f (map g xs)</literal>. +It will only match something written with explicit use of ".". +Well, not quite. It <emphasis>will</emphasis> match the expression + +<programlisting> +wibble f g xs +</programlisting> + +where <function>wibble</function> is defined: + +<programlisting> +wibble f g = map f . map g +</programlisting> + +because <function>wibble</function> will be inlined (it's small). + +Later on in compilation, GHC starts inlining even things on the +LHS of rules, but still leaves the rules enabled. This inlining +policy is controlled by the per-simplification-pass flag <option>-finline-phase</option><emphasis>n</emphasis>. + +</para> +</listitem> +<listitem> + +<para> + All rules are implicitly exported from the module, and are therefore +in force in any module that imports the module that defined the rule, directly +or indirectly. (That is, if A imports B, which imports C, then C's rules are +in force when compiling A.) The situation is very similar to that for instance +declarations. +</para> +</listitem> + +</itemizedlist> + +</para> + +</sect2> + +<sect2> +<title>List fusion</title> + +<para> +The RULES mechanism is used to implement fusion (deforestation) of common list functions. +If a "good consumer" consumes an intermediate list constructed by a "good producer", the +intermediate list should be eliminated entirely. +</para> + +<para> +The following are good producers: + +<itemizedlist> +<listitem> + +<para> + List comprehensions +</para> +</listitem> +<listitem> + +<para> + Enumerations of <literal>Int</literal> and <literal>Char</literal> (e.g. <literal>['a'..'z']</literal>). +</para> +</listitem> +<listitem> + +<para> + Explicit lists (e.g. <literal>[True, False]</literal>) +</para> +</listitem> +<listitem> + +<para> + The cons constructor (e.g <literal>3:4:[]</literal>) +</para> +</listitem> +<listitem> + +<para> + <function>++</function> +</para> +</listitem> + +<listitem> +<para> + <function>map</function> +</para> +</listitem> + +<listitem> +<para> + <function>filter</function> +</para> +</listitem> +<listitem> + +<para> + <function>iterate</function>, <function>repeat</function> +</para> +</listitem> +<listitem> + +<para> + <function>zip</function>, <function>zipWith</function> +</para> +</listitem> + +</itemizedlist> + +</para> + +<para> +The following are good consumers: + +<itemizedlist> +<listitem> + +<para> + List comprehensions +</para> +</listitem> +<listitem> + +<para> + <function>array</function> (on its second argument) +</para> +</listitem> +<listitem> + +<para> + <function>length</function> +</para> +</listitem> +<listitem> + +<para> + <function>++</function> (on its first argument) +</para> +</listitem> + +<listitem> +<para> + <function>foldr</function> +</para> +</listitem> + +<listitem> +<para> + <function>map</function> +</para> +</listitem> +<listitem> + +<para> + <function>filter</function> +</para> +</listitem> +<listitem> + +<para> + <function>concat</function> +</para> +</listitem> +<listitem> + +<para> + <function>unzip</function>, <function>unzip2</function>, <function>unzip3</function>, <function>unzip4</function> +</para> +</listitem> +<listitem> + +<para> + <function>zip</function>, <function>zipWith</function> (but on one argument only; if both are good producers, <function>zip</function> +will fuse with one but not the other) +</para> +</listitem> +<listitem> + +<para> + <function>partition</function> +</para> +</listitem> +<listitem> + +<para> + <function>head</function> +</para> +</listitem> +<listitem> + +<para> + <function>and</function>, <function>or</function>, <function>any</function>, <function>all</function> +</para> +</listitem> +<listitem> + +<para> + <function>sequence_</function> +</para> +</listitem> +<listitem> + +<para> + <function>msum</function> +</para> +</listitem> +<listitem> + +<para> + <function>sortBy</function> +</para> +</listitem> + +</itemizedlist> + +</para> + + <para> +So, for example, the following should generate no intermediate lists: + +<programlisting> +array (1,10) [(i,i*i) | i <- map (+ 1) [0..9]] +</programlisting> + +</para> + +<para> +This list could readily be extended; if there are Prelude functions that you use +a lot which are not included, please tell us. +</para> + +<para> +If you want to write your own good consumers or producers, look at the +Prelude definitions of the above functions to see how to do so. +</para> + +</sect2> + +<sect2 id="rule-spec"> +<title>Specialisation +</title> + +<para> +Rewrite rules can be used to get the same effect as a feature +present in earlier versions of GHC. +For example, suppose that: + +<programlisting> +genericLookup :: Ord a => Table a b -> a -> b +intLookup :: Table Int b -> Int -> b +</programlisting> + +where <function>intLookup</function> is an implementation of +<function>genericLookup</function> that works very fast for +keys of type <literal>Int</literal>. You might wish +to tell GHC to use <function>intLookup</function> instead of +<function>genericLookup</function> whenever the latter was called with +type <literal>Table Int b -> Int -> b</literal>. +It used to be possible to write + +<programlisting> +{-# SPECIALIZE genericLookup :: Table Int b -> Int -> b = intLookup #-} +</programlisting> + +This feature is no longer in GHC, but rewrite rules let you do the same thing: + +<programlisting> +{-# RULES "genericLookup/Int" genericLookup = intLookup #-} +</programlisting> + +This slightly odd-looking rule instructs GHC to replace +<function>genericLookup</function> by <function>intLookup</function> +<emphasis>whenever the types match</emphasis>. +What is more, this rule does not need to be in the same +file as <function>genericLookup</function>, unlike the +<literal>SPECIALIZE</literal> pragmas which currently do (so that they +have an original definition available to specialise). +</para> + +<para>It is <emphasis>Your Responsibility</emphasis> to make sure that +<function>intLookup</function> really behaves as a specialised version +of <function>genericLookup</function>!!!</para> + +<para>An example in which using <literal>RULES</literal> for +specialisation will Win Big: + +<programlisting> +toDouble :: Real a => a -> Double +toDouble = fromRational . toRational + +{-# RULES "toDouble/Int" toDouble = i2d #-} +i2d (I# i) = D# (int2Double# i) -- uses Glasgow prim-op directly +</programlisting> + +The <function>i2d</function> function is virtually one machine +instruction; the default conversion—via an intermediate +<literal>Rational</literal>—is obscenely expensive by +comparison. +</para> + +</sect2> + +<sect2> +<title>Controlling what's going on</title> + +<para> + +<itemizedlist> +<listitem> + +<para> + Use <option>-ddump-rules</option> to see what transformation rules GHC is using. +</para> +</listitem> +<listitem> + +<para> + Use <option>-ddump-simpl-stats</option> to see what rules are being fired. +If you add <option>-dppr-debug</option> you get a more detailed listing. +</para> +</listitem> +<listitem> + +<para> + The definition of (say) <function>build</function> in <filename>GHC/Base.lhs</filename> looks llike this: + +<programlisting> + build :: forall a. (forall b. (a -> b -> b) -> b -> b) -> [a] + {-# INLINE build #-} + build g = g (:) [] +</programlisting> + +Notice the <literal>INLINE</literal>! That prevents <literal>(:)</literal> from being inlined when compiling +<literal>PrelBase</literal>, so that an importing module will “see” the <literal>(:)</literal>, and can +match it on the LHS of a rule. <literal>INLINE</literal> prevents any inlining happening +in the RHS of the <literal>INLINE</literal> thing. I regret the delicacy of this. + +</para> +</listitem> +<listitem> + +<para> + In <filename>libraries/base/GHC/Base.lhs</filename> look at the rules for <function>map</function> to +see how to write rules that will do fusion and yet give an efficient +program even if fusion doesn't happen. More rules in <filename>GHC/List.lhs</filename>. +</para> +</listitem> + +</itemizedlist> + +</para> + +</sect2> + +<sect2 id="core-pragma"> + <title>CORE pragma</title> + + <indexterm><primary>CORE pragma</primary></indexterm> + <indexterm><primary>pragma, CORE</primary></indexterm> + <indexterm><primary>core, annotation</primary></indexterm> + +<para> + The external core format supports <quote>Note</quote> annotations; + the <literal>CORE</literal> pragma gives a way to specify what these + should be in your Haskell source code. Syntactically, core + annotations are attached to expressions and take a Haskell string + literal as an argument. The following function definition shows an + example: + +<programlisting> +f x = ({-# CORE "foo" #-} show) ({-# CORE "bar" #-} x) +</programlisting> + + Semantically, this is equivalent to: + +<programlisting> +g x = show x +</programlisting> +</para> + +<para> + However, when external for is generated (via + <option>-fext-core</option>), there will be Notes attached to the + expressions <function>show</function> and <varname>x</varname>. + The core function declaration for <function>f</function> is: +</para> + +<programlisting> + f :: %forall a . GHCziShow.ZCTShow a -> + a -> GHCziBase.ZMZN GHCziBase.Char = + \ @ a (zddShow::GHCziShow.ZCTShow a) (eta::a) -> + (%note "foo" + %case zddShow %of (tpl::GHCziShow.ZCTShow a) + {GHCziShow.ZCDShow + (tpl1::GHCziBase.Int -> + a -> + GHCziBase.ZMZN GHCziBase.Char -> GHCziBase.ZMZN GHCziBase.Cha +r) + (tpl2::a -> GHCziBase.ZMZN GHCziBase.Char) + (tpl3::GHCziBase.ZMZN a -> + GHCziBase.ZMZN GHCziBase.Char -> GHCziBase.ZMZN GHCziBase.Cha +r) -> + tpl2}) + (%note "foo" + eta); +</programlisting> + +<para> + Here, we can see that the function <function>show</function> (which + has been expanded out to a case expression over the Show dictionary) + has a <literal>%note</literal> attached to it, as does the + expression <varname>eta</varname> (which used to be called + <varname>x</varname>). +</para> + +</sect2> + +</sect1> + +<sect1 id="generic-classes"> +<title>Generic classes</title> + + <para>(Note: support for generic classes is currently broken in + GHC 5.02).</para> + +<para> +The ideas behind this extension are described in detail in "Derivable type classes", +Ralf Hinze and Simon Peyton Jones, Haskell Workshop, Montreal Sept 2000, pp94-105. +An example will give the idea: +</para> + +<programlisting> + import Generics + + class Bin a where + toBin :: a -> [Int] + fromBin :: [Int] -> (a, [Int]) + + toBin {| Unit |} Unit = [] + toBin {| a :+: b |} (Inl x) = 0 : toBin x + toBin {| a :+: b |} (Inr y) = 1 : toBin y + toBin {| a :*: b |} (x :*: y) = toBin x ++ toBin y + + fromBin {| Unit |} bs = (Unit, bs) + fromBin {| a :+: b |} (0:bs) = (Inl x, bs') where (x,bs') = fromBin bs + fromBin {| a :+: b |} (1:bs) = (Inr y, bs') where (y,bs') = fromBin bs + fromBin {| a :*: b |} bs = (x :*: y, bs'') where (x,bs' ) = fromBin bs + (y,bs'') = fromBin bs' +</programlisting> +<para> +This class declaration explains how <literal>toBin</literal> and <literal>fromBin</literal> +work for arbitrary data types. They do so by giving cases for unit, product, and sum, +which are defined thus in the library module <literal>Generics</literal>: +</para> +<programlisting> + data Unit = Unit + data a :+: b = Inl a | Inr b + data a :*: b = a :*: b +</programlisting> +<para> +Now you can make a data type into an instance of Bin like this: +<programlisting> + instance (Bin a, Bin b) => Bin (a,b) + instance Bin a => Bin [a] +</programlisting> +That is, just leave off the "where" clause. Of course, you can put in the +where clause and over-ride whichever methods you please. +</para> + + <sect2> + <title> Using generics </title> + <para>To use generics you need to</para> + <itemizedlist> + <listitem> + <para>Use the flags <option>-fglasgow-exts</option> (to enable the extra syntax), + <option>-fgenerics</option> (to generate extra per-data-type code), + and <option>-package lang</option> (to make the <literal>Generics</literal> library + available. </para> + </listitem> + <listitem> + <para>Import the module <literal>Generics</literal> from the + <literal>lang</literal> package. This import brings into + scope the data types <literal>Unit</literal>, + <literal>:*:</literal>, and <literal>:+:</literal>. (You + don't need this import if you don't mention these types + explicitly; for example, if you are simply giving instance + declarations.)</para> + </listitem> + </itemizedlist> + </sect2> + +<sect2> <title> Changes wrt the paper </title> +<para> +Note that the type constructors <literal>:+:</literal> and <literal>:*:</literal> +can be written infix (indeed, you can now use +any operator starting in a colon as an infix type constructor). Also note that +the type constructors are not exactly as in the paper (Unit instead of 1, etc). +Finally, note that the syntax of the type patterns in the class declaration +uses "<literal>{|</literal>" and "<literal>|}</literal>" brackets; curly braces +alone would ambiguous when they appear on right hand sides (an extension we +anticipate wanting). +</para> +</sect2> + +<sect2> <title>Terminology and restrictions</title> +<para> +Terminology. A "generic default method" in a class declaration +is one that is defined using type patterns as above. +A "polymorphic default method" is a default method defined as in Haskell 98. +A "generic class declaration" is a class declaration with at least one +generic default method. +</para> + +<para> +Restrictions: +<itemizedlist> +<listitem> +<para> +Alas, we do not yet implement the stuff about constructor names and +field labels. +</para> +</listitem> + +<listitem> +<para> +A generic class can have only one parameter; you can't have a generic +multi-parameter class. +</para> +</listitem> + +<listitem> +<para> +A default method must be defined entirely using type patterns, or entirely +without. So this is illegal: +<programlisting> + class Foo a where + op :: a -> (a, Bool) + op {| Unit |} Unit = (Unit, True) + op x = (x, False) +</programlisting> +However it is perfectly OK for some methods of a generic class to have +generic default methods and others to have polymorphic default methods. +</para> +</listitem> + +<listitem> +<para> +The type variable(s) in the type pattern for a generic method declaration +scope over the right hand side. So this is legal (note the use of the type variable ``p'' in a type signature on the right hand side: +<programlisting> + class Foo a where + op :: a -> Bool + op {| p :*: q |} (x :*: y) = op (x :: p) + ... +</programlisting> +</para> +</listitem> + +<listitem> +<para> +The type patterns in a generic default method must take one of the forms: +<programlisting> + a :+: b + a :*: b + Unit +</programlisting> +where "a" and "b" are type variables. Furthermore, all the type patterns for +a single type constructor (<literal>:*:</literal>, say) must be identical; they +must use the same type variables. So this is illegal: +<programlisting> + class Foo a where + op :: a -> Bool + op {| a :+: b |} (Inl x) = True + op {| p :+: q |} (Inr y) = False +</programlisting> +The type patterns must be identical, even in equations for different methods of the class. +So this too is illegal: +<programlisting> + class Foo a where + op1 :: a -> Bool + op1 {| a :*: b |} (x :*: y) = True + + op2 :: a -> Bool + op2 {| p :*: q |} (x :*: y) = False +</programlisting> +(The reason for this restriction is that we gather all the equations for a particular type consructor +into a single generic instance declaration.) +</para> +</listitem> + +<listitem> +<para> +A generic method declaration must give a case for each of the three type constructors. +</para> +</listitem> + +<listitem> +<para> +The type for a generic method can be built only from: + <itemizedlist> + <listitem> <para> Function arrows </para> </listitem> + <listitem> <para> Type variables </para> </listitem> + <listitem> <para> Tuples </para> </listitem> + <listitem> <para> Arbitrary types not involving type variables </para> </listitem> + </itemizedlist> +Here are some example type signatures for generic methods: +<programlisting> + op1 :: a -> Bool + op2 :: Bool -> (a,Bool) + op3 :: [Int] -> a -> a + op4 :: [a] -> Bool +</programlisting> +Here, op1, op2, op3 are OK, but op4 is rejected, because it has a type variable +inside a list. +</para> +<para> +This restriction is an implementation restriction: we just havn't got around to +implementing the necessary bidirectional maps over arbitrary type constructors. +It would be relatively easy to add specific type constructors, such as Maybe and list, +to the ones that are allowed.</para> +</listitem> + +<listitem> +<para> +In an instance declaration for a generic class, the idea is that the compiler +will fill in the methods for you, based on the generic templates. However it can only +do so if + <itemizedlist> + <listitem> + <para> + The instance type is simple (a type constructor applied to type variables, as in Haskell 98). + </para> + </listitem> + <listitem> + <para> + No constructor of the instance type has unboxed fields. + </para> + </listitem> + </itemizedlist> +(Of course, these things can only arise if you are already using GHC extensions.) +However, you can still give an instance declarations for types which break these rules, +provided you give explicit code to override any generic default methods. +</para> +</listitem> + +</itemizedlist> +</para> + +<para> +The option <option>-ddump-deriv</option> dumps incomprehensible stuff giving details of +what the compiler does with generic declarations. +</para> + +</sect2> + +<sect2> <title> Another example </title> +<para> +Just to finish with, here's another example I rather like: +<programlisting> + class Tag a where + nCons :: a -> Int + nCons {| Unit |} _ = 1 + nCons {| a :*: b |} _ = 1 + nCons {| a :+: b |} _ = nCons (bot::a) + nCons (bot::b) + + tag :: a -> Int + tag {| Unit |} _ = 1 + tag {| a :*: b |} _ = 1 + tag {| a :+: b |} (Inl x) = tag x + tag {| a :+: b |} (Inr y) = nCons (bot::a) + tag y +</programlisting> +</para> +</sect2> +</sect1> + + + +<!-- Emacs stuff: + ;;; Local Variables: *** + ;;; mode: xml *** + ;;; sgml-parent-document: ("users_guide.xml" "book" "chapter" "sect1") *** + ;;; End: *** + --> + |