1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
|
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
<title>The GHC Commentary - Sugar Free: From Haskell To Core</title>
</head>
<body BGCOLOR="FFFFFF">
<h1>The GHC Commentary - Sugar Free: From Haskell To Core</h1>
<p>
Up until after type checking, GHC keeps the source program in an
abstract representation of Haskell source without removing any of the
syntactic sugar (such as, list comprehensions) that could easily be
represented by more primitive Haskell. This complicates part of the
front-end considerably as the abstract syntax of Haskell (as exported by
the module <a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/hsSyn/HsSyn.lhs"><code>HsSyn</code></a>)
is much more complex than a simplified representation close to, say, the
<a href="http://haskell.org/onlinereport/intro.html#sect1.2">Haskell
Kernel</a> would be. However, having a representation that is as close
as possible to the surface syntax simplifies the generation of clear
error messages. As GHC (quite in contrast to "conventional" compilers)
prints code fragments as part of error messages, the choice of
representation is especially important.
<p>
Nonetheless, as soon as the input has passed all static checks, it is
transformed into GHC's principal intermediate language that goes by the
name of <em>Core</em> and whose representation is exported by the
module <a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/coreSyn/CoreSyn.lhs"><code>CoreSyn</code></a>.
All following compiler phases, except code generation operate on Core.
Due to Andrew Tolmach's effort, there is also an <a
href="http://www.haskell.org/ghc/docs/papers/core.ps.gz">external
representation for Core.</a>
<p>
The conversion of the compiled module from <code>HsSyn</code> into that
of <code>CoreSyn</code> is performed by a phase called the
<em>desugarer</em>, which is located in
<a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/deSugar/"><code>fptools/ghc/compiler/deSugar/</code></a>.
It's operation is detailed in the following.
</p>
<h2>Auxilliary Functions</h2>
<p>
The modules <a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/deSugar/DsMonad.lhs"><code>DsMonad</code></a>
defines the desugarer monad (of type <code>DsM</code>) which maintains
the environment needed for desugaring. In particular, it encapsulates a
unique supply for generating new variables, a map to lookup standard
names (such as functions from the prelude), a source location for error
messages, and a pool to collect warning messages generated during
desugaring. Initialisation of the environment happens in the function <a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/deSugar/Desugar.lhs"><code>Desugar</code></a><code>.desugar</code>,
which is also the main entry point into the desugarer.
<p>
The generation of Core code often involves the use of standard functions
for which proper identifiers (i.e., values of type <code>Id</code> that
actually refer to the definition in the right Prelude) need to be
obtained. This is supported by the function
<code>DsMonad.dsLookupGlobalValue :: Name -> DsM Id</code>.
<h2><a name="patmat">Pattern Matching</a></h2>
<p>
Nested pattern matching with guards and everything is translated into
the simple, flat case expressions of Core by the following modules:
<dl>
<dt><a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/deSugar/Match.lhs"><code>Match</code></a>:
<dd>This modules contains the main pattern-matching compiler in the form
of a function called <code>match</code>. There is some documentation
as to how <code>match</code> works contained in the module itself.
Generally, the implemented algorithm is similar to the one described
in Phil Wadler's Chapter ? of Simon Peyton Jones' <em>The
Implementation of Functional Programming Languages</em>.
<code>Match</code> exports a couple of functions with not really
intuitive names. In particular, it exports <code>match</code>,
<code>matchWrapper</code>, <code>matchExport</code>, and
<code>matchSimply</code>. The function <code>match</code>, which is
the main work horse, is only used by the other matching modules. The
function <code>matchExport</code> - despite it's name - is merely used
internally in <code>Match</code> and handles warning messages (see
below for more details). The actual interface to the outside is
<code>matchWrapper</code>, which converts the output of the type
checker into the form needed by the pattern matching compiler (i.e., a
list of <code>EquationInfo</code>). Similar in function to
<code>matchWrapper</code> is <code>matchSimply</code>, which provides
an interface for the case where a single expression is to be matched
against a single pattern (as, for example, is the case in bindings in
a <code>do</code> expression).
<dt><a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/deSugar/MatchCon.lhs"><code>MatchCon</code></a>:
<dd>This module generates code for a set of alternative constructor
patterns that belong to a single type by means of the routine
<code>matchConFamily</code>. More precisely, the routine gets a set
of equations where the left-most pattern of each equation is a
constructor pattern with a head symbol from the same type as that of
all the other equations. A Core case expression is generated that
distinguihes between all these constructors. The routine is clever
enough to generate a sparse case expression and to add a catch-all
default case only when needed (i.e., if the case expression isn't
exhaustive already). There is also an explanation at the start of the
modules.
<dt><a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/deSugar/MatchLit.lhs"><code>MatchLit</code></a>:
<dd>Generates code for a set of alternative literal patterns by means of
the routine <code>matchLiterals</code>. The principle is similar to
that of <code>matchConFamily</code>, but all left-most patterns are
literals of the same type.
<dt><a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/deSugar/DsUtils.lhs"><code>DsUtils</code></a>:
<dd>This module provides a set of auxilliary definitions as well as the
data types <code>EquationInfo</code> and <code>MatchResult</code> that
form the input and output, respectively, of the pattern matching
compiler.
<dt><a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/deSugar/Check.lhs"><code>Check</code></a>:
<dd>This module does not really contribute the compiling pattern
matching, but it inspects sets of equations to find whether there are
any overlapping patterns or non-exhaustive pattern sets. This task is
implemented by the function <code>check</code>, which returns a list of
patterns that are part of a non-exhaustive case distinction as well as a
set of equation labels that can be reached during execution of the code;
thus, the remaining equations are shadowed due to overlapping patterns.
The function <code>check</code> is invoked and its result converted into
suitable warning messages by the function <code>Match.matchExport</code>
(which is a wrapper for <code>Match.match</code>).
</dl>
<p>
The central function <code>match</code>, given a set of equations,
proceeds in a number of steps:
<ol>
<li>It starts by desugaring the left-most pattern of each equation using
the function <code>tidy1</code> (indirectly via
<code>tidyEqnInfo</code>). During this process, non-elementary
pattern (e.g., those using explicit list syntax <code>[x, y, ...,
z]</code>) are converted to a standard constructor pattern and also
irrefutable pattern are removed.
<li>Then, a process called <em>unmixing</em> clusters the equations into
blocks (without re-ordering them), such that the left-most pattern of
all equations in a block are either all variables, all literals, or
all constructors.
<li>Each block is, then, compiled by <code>matchUnmixedEqns</code>,
which forwards the handling of literal pattern blocks to
<code>MatchLit.matchLiterals</code>, of constructor pattern blocks to
<code>MatchCon.matchConFamily</code>, and hands variable pattern
blocks back to <code>match</code>.
</ol>
<p><hr><small>
<!-- hhmts start -->
Last modified: Mon Feb 11 22:35:25 EST 2002
<!-- hhmts end -->
</small>
</body>
</html>
|