summaryrefslogtreecommitdiff
path: root/docs/comm/the-beast/typecheck.html
blob: 8d22784b8a6b0e9eeaffd6f39cc56f9af482cb1b (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
  <head>
    <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
    <title>The GHC Commentary - Checking Types</title>
  </head>

  <body BGCOLOR="FFFFFF">
    <h1>The GHC Commentary - Checking Types</h1>
    <p>
      Probably the most important phase in the frontend is the type checker,
      which is located at <a
	href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/typecheck/"><code>fptools/ghc/compiler/typecheck/</code>.</a>
      GHC type checks programs in their original Haskell form before the
      desugarer converts them into Core code.  This complicates the type
      checker as it has to handle the much more verbose Haskell AST, but it
      improves error messages, as those message are based on the same
      structure that the user sees.
    </p>
    <p>
      GHC defines the abstract syntax of Haskell programs in <a
      href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/hsSyn/HsSyn.lhs"><code>HsSyn</code></a>
      using a structure that abstracts over the concrete representation of
      bound occurences of identifiers and patterns.  The module <a
      href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/typecheck/TcHsSyn.lhs"><code>TcHsSyn</code></a>
      defines a number of helper function required by the type checker.  Note
      that the type <a
      href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/typecheck/TcRnTypes.lhs"><code>TcRnTypes</code></a>.<code>TcId</code>
      used to represent identifiers in some signatures during type checking
      is, in fact, nothing but a synonym for a <a href="vars.html">plain
      <code>Id</code>.</a>  
    </p>
    <p>
      It is also noteworthy, that the representations of types changes during
      type checking from <code>HsType</code> to <code>TypeRep.Type</code>.
      The latter is a <a href="types.html">hybrid type representation</a> that
      is used to type Core, but still contains sufficient information to
      recover source types.  In particular, the type checker maintains and
      compares types in their <code>Type</code> form.
    </p>

    <h2>The Overall Flow of Things</h2>

    <h4>Entry Points Into the Type Checker</h4>
    <p>
      The interface of the type checker (and <a
      href="renamer.html">renamer</a>) to the rest of the compiler is provided
      by <a
      href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/typecheck/TcRnDriver.lhs"><code>TcRnDriver</code></a>.
      Entire modules are processed by calling <code>tcRnModule</code> and GHCi
      uses <code>tcRnStmt</code>, <code>tcRnExpr</code>, and
      <code>tcRnType</code> to typecheck statements and expressions, and to
      kind check types, respectively.  Moreover, <code>tcRnExtCore</code> is
      provided to typecheck external Core code.  Moreover,
      <code>tcTopSrcDecls</code> is used by Template Haskell - more
      specifically by <code>TcSplice.tc_bracket</code>
      - to type check the contents of declaration brackets.
    </p>

    <h4>Renaming and Type Checking a Module</h4>
    <p>
      The function <code>tcRnModule</code> controls the complete static
      analysis of a Haskell module.  It sets up the combined renamer and type
      checker monad, resolves all import statements, initiates the actual
      renaming and type checking process, and finally, wraps off by processing
      the export list.
    </p>
    <p>
      The actual type checking and renaming process is initiated via
      <code>TcRnDriver.tcRnSrcDecls</code>, which uses a helper called
      <code>tc_rn_src_decls</code> to implement the iterative renaming and
      type checking process required by <a href="../exts/th.html">Template
      Haskell</a>.  However, before it invokes <code>tc_rn_src_decls</code>,
      it takes care of hi-boot files; afterwards, it simplifies type
      constraints and zonking (see below regarding the later).
    </p>
    <p>
      The function <code>tc_rn_src_decls</code> partitions static analysis of
      a whole module into multiple rounds, where the initial round is followed
      by an additional one for each toplevel splice.  It collects all
      declarations up to the next splice into an <code>HsDecl.HsGroup</code>
      to rename and type check that <em>declaration group</em> by calling
      <code>TcRnDriver.tcRnGroup</code>.  Afterwards, it executes the
      splice (if there are any left) and proceeds to the next group, which
      includes the declarations produced by the splice.
    </p>
    <p>
      The function <code>tcRnGroup</code>, finally, gets down to invoke the
      actual renaming and type checking via
      <code>TcRnDriver.rnTopSrcDecls</code> and
      <code>TcRnDriver.tcTopSrcDecls</code>, respectively.  The renamer, apart
      from renaming, computes the global type checking environment, of type
      <code>TcRnTypes.TcGblEnv</code>, which is stored in the type checking
      monad before type checking commences.
    </p>

    <h2>Type Checking a Declaration Group</h2>
    <p>
      The type checking of a declaration group, performed by
      <code>tcTopSrcDecls</code> starts by processing of the type and class
      declarations of the current module, using the function
      <code>TcTyClsDecls.tcTyAndClassDecls</code>.  This is followed by a
      first round over instance declarations using
      <code>TcInstDcls.tcInstDecls1</code>, which in particular generates all
      additional bindings due to the deriving process.  Then come foreign
      import declarations (<code>TcForeign.tcForeignImports</code>) and
      default declarations (<code>TcDefaults.tcDefaults</code>).
    </p>
    <p>
      Now, finally, toplevel value declarations (including derived ones) are
      type checked using <code>TcBinds.tcTopBinds</code>.  Afterwards,
      <code>TcInstDcls.tcInstDecls2</code> traverses instances for the second
      time.  Type checking concludes with processing foreign exports
      (<code>TcForeign.tcForeignExports</code>) and rewrite rules
      (<code>TcRules.tcRules</code>).  Finally, the global environment is
      extended with the new bindings.
    </p>

    <h2>Type checking Type and Class Declarations</h2>
    <p>
      Type and class declarations are type checked in a couple of phases that
      contain recursive dependencies - aka <em>knots.</em> The first knot
      encompasses almost the whole type checking of these declarations and
      forms the main piece of <code>TcTyClsDecls.tcTyAndClassDecls</code>.
    </p>
    <p>
      Inside this big knot, the first main operation is kind checking, which
      again involves a knot.  It is implemented by <code>kcTyClDecls</code>,
      which performs kind checking of potentially recursively-dependent type
      and class declarations using kind variables for initially unknown kinds.
      During processing the individual declarations some of these variables
      will be instantiated depending on the context; the rest gets by default
      kind <code>*</code> (during <em>zonking</em> of the kind signatures).
      Type synonyms are treated specially in this process, because they can
      have an unboxed type, but they cannot be recursive.  Hence, their kinds
      are inferred in dependency order.  Moreover, in contrast to class
      declarations and other type declarations, synonyms are not entered into
      the global environment as a global <code>TyThing</code>.
      (<code>TypeRep.TyThing</code> is a sum type that combines the various
      flavours of typish entities, such that they can be stuck into type
      environments and similar.)
    </p>

    <h2>More Details</h2>

    <h4>Types Variables and Zonking</h4>
    <p>
      During type checking type variables are represented by mutable variables
      - cf. the <a href="vars.html#TyVar">variable story.</a>  Consequently,
      unification can instantiate type variables by updating those mutable
      variables.  This process of instantiation is (for reasons that elude me)
      called <a
      href="http://www.dictionary.com/cgi-bin/dict.pl?term=zonk&db=*">zonking</a>
      in GHC's sources.  The zonking routines for the various forms of Haskell
      constructs are responsible for most of the code in the module <a
      href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/typecheck/TcHsSyn.lhs"><code>TcHsSyn</code>,</a>
      whereas the routines that actually operate on mutable types are defined
      in <a
      href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/typecheck/TcMType.lhs"><code>TcMType</code></a>;
      this includes the zonking of type variables and type terms, routines to
      create mutable structures and update them as well as routines that check
      constraints, such as that type variables in function signatures have not
      been instantiated during type checking.  The actual type unification
      routine is <code>uTys</code> in the module <a
      href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/typecheck/TcUnify.lhs"><code>TcUnify</code></a>.
    </p>
    <p>
      All type variables that may be instantiated (those in signatures
      may not), but haven't been instantiated during type checking, are zonked
      to <code>()</code>, so that after type checking all mutable variables
      have been eliminated.
    </p>

    <h4>Type Representation</h4>
    <p>
      The representation of types is fixed in the module <a
      href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/typecheck/TcRep.lhs"><code>TcRep</code></a>
      and exported as the data type <code>Type</code>.  As explained in <a
      href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/typecheck/TcType.lhs"><code>TcType</code></a>,
      GHC supports rank-N types, but, in the type checker, maintains the
      restriction that type variables cannot be instantiated to quantified
      types (i.e., the type system is predicative).  The type checker floats
      universal quantifiers outside and maintains types in prenex form.
      (However, quantifiers can, of course, not float out of negative
      positions.)  Overall, we have
    </p>
    <blockquote>
      <pre>
sigma -> forall tyvars. phi
phi   -> theta => rho
rho   -> sigma -> rho
       | tau
tau   -> tyvar
       | tycon tau_1 .. tau_n
       | tau_1 tau_2
       | tau_1 -> tau_2</pre>
    </blockquote>
    <p>
      where <code>sigma</code> is in prenex form; i.e., there is never a
      forall to the right of an arrow in a <code>phi</code> type.  Moreover, a
      type of the form <code>tau</code> never contains a quantifier (which
      includes arguments to type constructors).
    </p>
    <p>
      Of particular interest are the variants <code>SourceTy</code> and
      <code>NoteTy</code> of <a
      href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/typecheck/TypeRep.lhs"><code>TypeRep</code></a>.<code>Type</code>.
      The constructor <code>SourceTy :: SourceType -> Type</code> represents a
      type constraint; that is, a predicate over types represented by a
      dictionary.  The type checker treats a <code>SourceTy</code> as opaque,
      but during the translation to core it will be expanded into its concrete
      representation (i.e., a dictionary type) by the function <a
      href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/types/Type.lhs"><code>Type</code></a>.<code>sourceTypeRep</code>.  
      Note that newtypes are not covered by <code>SourceType</code>s anymore,
      even if some comments in GHC still suggest this.  Instead, all newtype
      applications are initially represented as a <code>NewTcApp</code>, until
      they are eliminated by calls to <a
      href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/types/Type.lhs"><code>Type</code></a>.<code>newTypeRep</code>.
    </p>
    <p>
      The <code>NoteTy</code> constructor is used to add non-essential
      information to a type term.  Such information has the type
      <code>TypeRep.TyNote</code> and is either the set of free type variables
      of the annotated expression or the unexpanded version of a type synonym.
      Free variables sets are cached as notes to save the overhead of
      repeatedly computing the same set for a given term.  Unexpanded type
      synonyms are useful for generating comprehensible error messages, but
      have no influence on the process of type checking.
    </p>

    <h4>Type Checking Environment</h4>
    <p>
      During type checking, GHC maintains a <em>type environment</em> whose
      type definitions are fixed in the module <a
      href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/typecheck/TcRnTypes.lhs"><code>TcRnTypes</code></a> with the operations defined in
<a
      href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/typecheck/TcEnv.lhs"><code>TcEnv</code></a>.
      Among other things, the environment contains all imported and local
      instances as well as a list of <em>global</em> entities (imported and
      local types and classes together with imported identifiers) and
      <em>local</em> entities (locally defined identifiers).  This environment
      is threaded through the type checking monad, whose support functions
      including initialisation can be found in the module <a
      href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/typecheck/TcRnMonad.lhs"><code>TcRnMonad</code>.</a>

    <h4>Expressions</h4>
    <p>
      Expressions are type checked by <a
      href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/typecheck/TcExpr.lhs"><code>TcExpr</code>.</a>  
    <p>
      Usage occurences of identifiers are processed by the function
      <code>tcId</code> whose main purpose is to <a href="#inst">instantiate
      overloaded identifiers.</a> It essentially calls
      <code>TcInst.instOverloadedFun</code> once for each universally
      quantified set of type constraints.  It should be noted that overloaded
      identifiers are replaced by new names that are first defined in the LIE
      (Local Instance Environment?) and later promoted into top-level
      bindings.
      
    <h4><a name="inst">Handling of Dictionaries and Method Instances</a></h4>
    <p>
      GHC implements overloading using so-called <em>dictionaries.</em> A
      dictionary is a tuple of functions -- one function for each method in
      the class of which the dictionary implements an instance.  During type
      checking, GHC replaces each type constraint of a function with one
      additional argument.  At runtime, the extended function gets passed a
      matching class dictionary by way of these additional arguments.
      Whenever the function needs to call a method of such a class, it simply
      extracts it from the dictionary.
    <p>
      This sounds simple enough; however, the actual implementation is a bit
      more tricky as it wants to keep track of all the instances at which
      overloaded functions are used in a module.  This information is useful
      to optimise the code.  The implementation is the module <a
      href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/typecheck/Inst.lhs"><code>Inst.lhs</code>.</a>
    <p>
      The function <code>instOverloadedFun</code> is invoked for each
      overloaded usage occurence of an identifier, where overloaded means that
      the type of the idendifier contains a non-trivial type constraint.  It
      proceeds in two steps: (1) Allocation of a method instance
      (<code>newMethodWithGivenTy</code>) and (2) instantiation of functional
      dependencies.  The former implies allocating a new unique identifier,
      which replaces the original (overloaded) identifier at the currently
      type-checked usage occurrence.
    <p>
      The new identifier (after being threaded through the LIE) eventually
      will be bound by a top-level binding whose rhs contains a partial
      application of the original overloaded identifier.  This papp applies
      the overloaded function to the dictionaries needed for the current
      instance.  In GHC lingo, this is called a <em>method.</em>  Before
      becoming a top-level binding, the method is first represented as a value
      of type <code>Inst.Inst</code>, which makes it easy to fold multiple
      instances of the same identifier at the same types into one global
      definition.  (And probably other things, too, which I haven't
      investigated yet.)

    <p>
      <strong>Note:</strong> As of 13 January 2001 (wrt. to the code in the
      CVS HEAD), the above mechanism interferes badly with RULES pragmas
      defined over overloaded functions.  During instantiation, a new name is
      created for an overloaded function partially applied to the dictionaries
      needed in a usage position of that function.  As the rewrite rule,
      however, mentions the original overloaded name, it won't fire anymore
      -- unless later phases remove the intermediate definition again.  The
      latest CVS version of GHC has an option
      <code>-fno-method-sharing</code>, which avoids sharing instantiation
      stubs.  This is usually/often/sometimes sufficient to make the rules
      fire again.

    <p><small>
<!-- hhmts start -->
Last modified: Thu May 12 22:52:46 EST 2005
<!-- hhmts end -->
    </small>
  </body>
</html>