summaryrefslogtreecommitdiff
path: root/docs/comm/the-beast/vars.html
blob: 9bbd310c605a66c2b5ed0db6ca7f1a71018c009d (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
  <head>
    <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
    <title>The GHC Commentary - The Real Story about Variables, Ids, TyVars, and the like</title>
  </head>

  <body BGCOLOR="FFFFFF">
    <h1>The GHC Commentary - The Real Story about Variables, Ids, TyVars, and the like</h1>
    <p>


<h2>Variables</h2>

The <code>Var</code> type, defined in <code>basicTypes/Var.lhs</code>,
represents variables, both term variables and type variables:
<pre>
    data Var
      = Var {
	    varName    :: Name,
	    realUnique :: FastInt,
	    varType    :: Type,
	    varDetails :: VarDetails,
	    varInfo    :: IdInfo		
	}
</pre>
<ul>
<li> The <code>varName</code> field contains the identity of the variable:
its unique number, and its print-name.  See "<a href="names.html">The truth about names</a>".

<p><li> The <code>realUnique</code> field caches the unique number in the
<code>varName</code> field, just to make comparison of <code>Var</code>s a little faster.

<p><li> The <code>varType</code> field gives the type of a term variable, or the kind of a
type variable.  (Types and kinds are both represented by a <code>Type</code>.)

<p><li> The <code>varDetails</code> field distinguishes term variables from type variables,
and makes some further distinctions (see below).

<p><li> For term variables (only) the <code>varInfo</code> field contains lots of useful
information: strictness, unfolding, etc.  However, this information is all optional;
you can always throw away the <code>IdInfo</code>.  In contrast, you can't safely throw away
the <code>VarDetails</code> of a <code>Var</code>
</ul>
<p>
It's often fantastically convenient to have term variables and type variables
share a single data type.  For example, 
<pre>
  exprFreeVars :: CoreExpr -> VarSet
</pre>
If there were two types, we'd need to return two sets.  Simiarly, big lambdas and
little lambdas use the same constructor in Core, which is extremely convenient.
<p>
We define a couple of type synonyms:
<pre>
  type Id    = Var  -- Term variables
  type TyVar = Var  -- Type variables
</pre>
just to help us document the occasions when we are expecting only term variables,
or only type variables.


<h2> The <code>VarDetails</code> field </h2>

The <code>VarDetails</code> field tells what kind of variable this is:
<pre>
data VarDetails
  = LocalId 		-- Used for locally-defined Ids (see NOTE below)
	LocalIdDetails

  | GlobalId 		-- Used for imported Ids, dict selectors etc
	GlobalIdDetails

  | TyVar
  | MutTyVar (IORef (Maybe Type)) 	-- Used during unification;
	     TyVarDetails
</pre>

<a name="TyVar">
<h2>Type variables (<code>TyVar</code>)</h2>
</a>
<p>
The <code>TyVar</code> case is self-explanatory.  The <code>MutTyVar</code>
case is used only during type checking.  Then a type variable can be unified,
using an imperative update, with a type, and that is what the
<code>IORef</code> is for.  The <code>TcType.TyVarDetails</code> field records
the sort of type variable we are dealing with.  It is defined as
<pre>
data TyVarDetails = SigTv | ClsTv | InstTv | VanillaTv
</pre>
<code>SigTv</code> marks type variables that were introduced when
instantiating a type signature prior to matching it against the inferred type
of a definition.  The variants <code>ClsTv</code> and <code>InstTv</code> mark
scoped type variables introduced by class and instance heads, respectively.
These first three sorts of type variables are skolem variables (tested by the
predicate <code>isSkolemTyVar</code>); i.e., they must <em>not</em> be
instantiated. All other type variables are marked as <code>VanillaTv</code>.
<p>
For a long time I tried to keep mutable Vars statically type-distinct
from immutable Vars, but I've finally given up.   It's just too painful.
After type checking there are no MutTyVars left, but there's no static check
of that fact.

<h2>Term variables (<code>Id</code>)</h2>

A term variable (of type <code>Id</code>) is represented either by a
<code>LocalId</code> or a <code>GlobalId</code>:
<p>
A <code>GlobalId</code> is
<ul>
<li> Always bound at top-level.
<li> Always has a <code>GlobalName</code>, and hence has 
     a <code>Unique</code> that is globally unique across the whole
     GHC invocation (a single invocation may compile multiple modules).
<li> Has <code>IdInfo</code> that is absolutely fixed, forever.
</ul>

<p>
A <code>LocalId</code> is:
<ul> 
<li> Always bound in the module being compiled:
<ul>
<li> <em>either</em> bound within an expression (lambda, case, local let(rec))
<li> <em>or</em> defined at top level in the module being compiled.
</ul>
<li> Has IdInfo that changes as the simpifier bashes repeatedly on it.
</ul>
<p>
The key thing about <code>LocalId</code>s is that the free-variable finder
typically treats them as candidate free variables. That is, it ignores
<code>GlobalId</code>s such as imported constants, data contructors, etc.
<p>
An important invariant is this: <em>All the bindings in the module
being compiled (whether top level or not) are <code>LocalId</code>s
until the CoreTidy phase.</em> In the CoreTidy phase, all
externally-visible top-level bindings are made into GlobalIds.  This
is the point when a <code>LocalId</code> becomes "frozen" and becomes
a fixed, immutable <code>GlobalId</code>.
<p>
(A binding is <em>"externally-visible"</em> if it is exported, or
mentioned in the unfolding of an externally-visible Id.  An
externally-visible Id may not have an unfolding, either because it is
too big, or because it is the loop-breaker of a recursive group.)

<h3>Global Ids and implicit Ids</h3>

<code>GlobalId</code>s are further categorised by their <code>GlobalIdDetails</code>.
This type is defined in <code>basicTypes/IdInfo</code>, because it mentions other
structured types like <code>DataCon</code>.  Unfortunately it is *used* in <code>Var.lhs</code>
so there's a <code>hi-boot</code> knot to get it there.  Anyway, here's the declaration:
<pre>
data GlobalIdDetails
  = NotGlobalId			-- Used as a convenient extra return value 
                                -- from globalIdDetails

  | VanillaGlobal		-- Imported from elsewhere

  | PrimOpId PrimOp		-- The Id for a primitive operator
  | FCallId ForeignCall		-- The Id for a foreign call

  -- These next ones are all "implicit Ids"
  | RecordSelId FieldLabel	-- The Id for a record selector
  | DataConId DataCon		-- The Id for a data constructor *worker*
  | DataConWrapId DataCon	-- The Id for a data constructor *wrapper*
				-- [the only reasons we need to know is so that
				--  a) we can  suppress printing a definition in the interface file
				--  b) when typechecking a pattern we can get from the
				--     Id back to the data con]
</pre>
The <code>GlobalIdDetails</code> allows us to go from the <code>Id</code> for 
a record selector, say, to its field name; or the <code>Id</code> for a primitive
operator to the <code>PrimOp</code> itself.
<p>
Certain <code>GlobalId</code>s are called <em>"implicit"</em> Ids.  An implicit
Id is derived by implication from some other declaration.  So a record selector is
derived from its data type declaration, for example.  An implicit Ids is always 
a <code>GlobalId</code>.  For most of the compilation, the implicit Ids are just
that: implicit.  If you do -ddump-simpl you won't see their definition.  (That's
why it's true to say that until CoreTidy all Ids in this compilation unit are
LocalIds.)  But at CorePrep, a binding is added for each implicit Id defined in
this module, so that the code generator will generate code for the (curried) function.
<p>
Implicit Ids carry their unfolding inside them, of course, so they may well have
been inlined much earlier; but we generate the curried top-level defn just in
case its ever needed.

<h3>LocalIds</h3>

The <code>LocalIdDetails</code> gives more info about a <code>LocalId</code>:
<pre>
data LocalIdDetails 
  = NotExported	-- Not exported
  | Exported	-- Exported
  | SpecPragma	-- Not exported, but not to be discarded either
		-- It's unclean that this is so deeply built in
</pre>
From this we can tell whether the <code>LocalId</code> is exported, and that
tells us whether we can drop an unused binding as dead code. 
<p>
The <code>SpecPragma</code> thing is a HACK.  Suppose you write a SPECIALIZE pragma:
<pre>
   foo :: Num a => a -> a
   {-# SPECIALIZE foo :: Int -> Int #-}
   foo = ...
</pre>
The type checker generates a dummy call to <code>foo</code> at the right types:
<pre>
   $dummy = foo Int dNumInt
</pre>
The Id <code>$dummy</code> is marked <code>SpecPragma</code>.  Its role is to hang
onto that call to <code>foo</code> so that the specialiser can see it, but there
are no calls to <code>$dummy</code>.
The simplifier is careful not to discard <code>SpecPragma</code> Ids, so that it
reaches the specialiser.  The specialiser processes the right hand side of a <code>SpecPragma</code> Id
to find calls to overloaded functions, <em>and then discards the <code>SpecPragma</code> Id</em>.
So <code>SpecPragma</code> behaves a like <code>Exported</code>, at least until the specialiser.


<h3> ExternalNames and InternalNames </h3>

Notice that whether an Id is a <code>LocalId</code> or <code>GlobalId</code> is 
not the same as whether the Id has an <code>ExternaName</code> or an <code>InternalName</code>
(see "<a href="names.html#sort">The truth about Names</a>"):
<ul>
<li> Every <code>GlobalId</code> has an <code>ExternalName</code>.
<li> A <code>LocalId</code> might have either kind of <code>Name</code>.
</ul>

<!-- hhmts start -->
Last modified: Fri Sep 12 15:17:18 BST 2003
<!-- hhmts end -->
    </small>
  </body>
</html>