ghc/docs/comm/the-beast/desugar.html


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
  <head>
    <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
    <title>The GHC Commentary - Sugar Free: From Haskell To Core</title>
  </head>

  <body BGCOLOR="FFFFFF">
    <h1>The GHC Commentary - Sugar Free: From Haskell To Core</h1>
    <p>
      Up until after type checking, GHC keeps the source program in an
      abstract representation of Haskell source without removing any of the
      syntactic sugar (such as, list comprehensions) that could easily be
      represented by more primitive Haskell.  This complicates part of the
      front-end considerably as the abstract syntax of Haskell (as exported by
      the module <a
      href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/hsSyn/HsSyn.lhs"><code>HsSyn</code></a>)
      is much more complex than a simplified representation close to, say, the
      <a href="http://haskell.org/onlinereport/intro.html#sect1.2">Haskell
      Kernel</a> would be.  However, having a representation that is as close
      as possible to the surface syntax simplifies the generation of clear
      error messages.  As GHC (quite in contrast to "conventional" compilers)
      prints code fragments as part of error messages, the choice of
      representation is especially important.
    <p>
      Nonetheless, as soon as the input has passed all static checks, it is
      transformed into GHC's principal intermediate language that goes by the
      name of <em>Core</em> and whose representation is exported by the
      module <a
      href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/coreSyn/CoreSyn.lhs"><code>CoreSyn</code></a>.  
      All following compiler phases, except code generation operate on Core.
      Due to Andrew Tolmach's effort, there is also an <a
      href="http://www.haskell.org/ghc/docs/papers/core.ps.gz">external
      representation for Core.</a>
    <p>
      The conversion of the compiled module from <code>HsSyn</code> into that
      of <code>CoreSyn</code> is performed by a phase called the
      <em>desugarer</em>, which is located in
      <a
      href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/deSugar/"><code>fptools/ghc/compiler/deSugar/</code></a>.
      It's operation is detailed in the following.
    </p>

    <h2>Auxilliary Functions</h2>
    <p>
      The modules <a
      href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/deSugar/DsMonad.lhs"><code>DsMonad</code></a>
      defines the desugarer monad (of type <code>DsM</code>) which maintains
      the environment needed for desugaring.  In particular, it encapsulates a
      unique supply for generating new variables, a map to lookup standard
      names (such as functions from the prelude), a source location for error
      messages, and a pool to collect warning messages generated during
      desugaring.  Initialisation of the environment happens in the function <a
      href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/deSugar/Desugar.lhs"><code>Desugar</code></a><code>.desugar</code>, 
      which is also the main entry point into the desugarer.
    <p>
      The generation of Core code often involves the use of standard functions
      for which proper identifiers (i.e., values of type <code>Id</code> that
      actually refer to the definition in the right Prelude) need to be
      obtained.  This is supported by the function
      <code>DsMonad.dsLookupGlobalValue :: Name -> DsM Id</code>.

    <h2><a name="patmat">Pattern Matching</a></h2>
    <p>
      Nested pattern matching with guards and everything is translated into
      the simple, flat case expressions of Core by the following modules:
    <dl>
      <dt><a
      href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/deSugar/Match.lhs"><code>Match</code></a>:
      <dd>This modules contains the main pattern-matching compiler in the form
	of a function called <code>match</code>.  There is some documentation
	as to how <code>match</code> works contained in the module itself.
	Generally, the implemented algorithm is similar to the one described
	in Phil Wadler's Chapter ? of Simon Peyton Jones' <em>The
	Implementation of Functional Programming Languages</em>.
	<code>Match</code> exports a couple of functions with not really
	intuitive names.  In particular, it exports <code>match</code>,
	<code>matchWrapper</code>, <code>matchExport</code>, and
	<code>matchSimply</code>.  The function <code>match</code>, which is
	the main work horse, is only used by the other matching modules.  The
	function <code>matchExport</code> - despite it's name - is merely used
	internally in <code>Match</code> and handles warning messages (see
	below for more details).  The actual interface to the outside is
	<code>matchWrapper</code>, which converts the output of the type
	checker into the form needed by the pattern matching compiler (i.e., a
	list of <code>EquationInfo</code>).  Similar in function to
	<code>matchWrapper</code> is <code>matchSimply</code>, which provides
	an interface for the case where a single expression is to be matched
	against a single pattern (as, for example, is the case in bindings in
	a <code>do</code> expression).
      <dt><a
      href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/deSugar/MatchCon.lhs"><code>MatchCon</code></a>:
      <dd>This module generates code for a set of alternative constructor
	patterns that belong to a single type by means of the routine
	<code>matchConFamily</code>.  More precisely, the routine gets a set
	of equations where the left-most pattern of each equation is a
	constructor pattern with a head symbol from the same type as that of
	all the other equations.  A Core case expression is generated that
	distinguihes between all these constructors.  The routine is clever
	enough to generate a sparse case expression and to add a catch-all
	default case only when needed (i.e., if the case expression isn't
	exhaustive already).  There is also an explanation at the start of the
	modules.
      <dt><a
      href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/deSugar/MatchLit.lhs"><code>MatchLit</code></a>:
      <dd>Generates code for a set of alternative literal patterns by means of
	the routine <code>matchLiterals</code>.  The principle is similar to
	that of <code>matchConFamily</code>, but all left-most patterns are
	literals of the same type.
      <dt><a
      href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/deSugar/DsUtils.lhs"><code>DsUtils</code></a>:
      <dd>This module provides a set of auxilliary definitions as well as the
      data types <code>EquationInfo</code> and <code>MatchResult</code> that
      form the input and output, respectively, of the pattern matching
      compiler.
      <dt><a
      href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/deSugar/Check.lhs"><code>Check</code></a>:
      <dd>This module does not really contribute the compiling pattern
      matching, but it inspects sets of equations to find whether there are
      any overlapping patterns or non-exhaustive pattern sets.  This task is
      implemented by the function <code>check</code>, which returns a list of
      patterns that are part of a non-exhaustive case distinction as well as a
      set of equation labels that can be reached during execution of the code;
      thus, the remaining equations are shadowed due to overlapping patterns.
      The function <code>check</code> is invoked and its result converted into
      suitable warning messages by the function <code>Match.matchExport</code>
      (which is a wrapper for <code>Match.match</code>).
    </dl>
    <p>
      The central function <code>match</code>, given a set of equations,
      proceeds in a number of steps:
      <ol>
      <li>It starts by desugaring the left-most pattern of each equation using
	the function <code>tidy1</code> (indirectly via
	<code>tidyEqnInfo</code>).  During this process, non-elementary
	pattern (e.g., those using explicit list syntax <code>[x, y, ...,
	z]</code>) are converted to a standard constructor pattern and also
	irrefutable pattern are removed.
      <li>Then, a process called <em>unmixing</em> clusters the equations into
	blocks (without re-ordering them), such that the left-most pattern of
	all equations in a block are either all variables, all literals, or
	all constructors.
      <li>Each block is, then, compiled by <code>matchUnmixedEqns</code>,
	which forwards the handling of literal pattern blocks to
	<code>MatchLit.matchLiterals</code>, of constructor pattern blocks to
	<code>MatchCon.matchConFamily</code>, and hands variable pattern
	blocks back to <code>match</code>.
      </ol>

    <p><hr><small>
<!-- hhmts start -->
Last modified: Mon Feb 11 22:35:25 EST 2002
<!-- hhmts end -->
    </small>
  </body>
</html>