diff options
Diffstat (limited to 'docs/comm/the-beast/vars.html')
-rw-r--r-- | docs/comm/the-beast/vars.html | 235 |
1 files changed, 235 insertions, 0 deletions
diff --git a/docs/comm/the-beast/vars.html b/docs/comm/the-beast/vars.html new file mode 100644 index 0000000000..9bbd310c60 --- /dev/null +++ b/docs/comm/the-beast/vars.html @@ -0,0 +1,235 @@ +<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> +<html> + <head> + <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1"> + <title>The GHC Commentary - The Real Story about Variables, Ids, TyVars, and the like</title> + </head> + + <body BGCOLOR="FFFFFF"> + <h1>The GHC Commentary - The Real Story about Variables, Ids, TyVars, and the like</h1> + <p> + + +<h2>Variables</h2> + +The <code>Var</code> type, defined in <code>basicTypes/Var.lhs</code>, +represents variables, both term variables and type variables: +<pre> + data Var + = Var { + varName :: Name, + realUnique :: FastInt, + varType :: Type, + varDetails :: VarDetails, + varInfo :: IdInfo + } +</pre> +<ul> +<li> The <code>varName</code> field contains the identity of the variable: +its unique number, and its print-name. See "<a href="names.html">The truth about names</a>". + +<p><li> The <code>realUnique</code> field caches the unique number in the +<code>varName</code> field, just to make comparison of <code>Var</code>s a little faster. + +<p><li> The <code>varType</code> field gives the type of a term variable, or the kind of a +type variable. (Types and kinds are both represented by a <code>Type</code>.) + +<p><li> The <code>varDetails</code> field distinguishes term variables from type variables, +and makes some further distinctions (see below). + +<p><li> For term variables (only) the <code>varInfo</code> field contains lots of useful +information: strictness, unfolding, etc. However, this information is all optional; +you can always throw away the <code>IdInfo</code>. In contrast, you can't safely throw away +the <code>VarDetails</code> of a <code>Var</code> +</ul> +<p> +It's often fantastically convenient to have term variables and type variables +share a single data type. For example, +<pre> + exprFreeVars :: CoreExpr -> VarSet +</pre> +If there were two types, we'd need to return two sets. Simiarly, big lambdas and +little lambdas use the same constructor in Core, which is extremely convenient. +<p> +We define a couple of type synonyms: +<pre> + type Id = Var -- Term variables + type TyVar = Var -- Type variables +</pre> +just to help us document the occasions when we are expecting only term variables, +or only type variables. + + +<h2> The <code>VarDetails</code> field </h2> + +The <code>VarDetails</code> field tells what kind of variable this is: +<pre> +data VarDetails + = LocalId -- Used for locally-defined Ids (see NOTE below) + LocalIdDetails + + | GlobalId -- Used for imported Ids, dict selectors etc + GlobalIdDetails + + | TyVar + | MutTyVar (IORef (Maybe Type)) -- Used during unification; + TyVarDetails +</pre> + +<a name="TyVar"> +<h2>Type variables (<code>TyVar</code>)</h2> +</a> +<p> +The <code>TyVar</code> case is self-explanatory. The <code>MutTyVar</code> +case is used only during type checking. Then a type variable can be unified, +using an imperative update, with a type, and that is what the +<code>IORef</code> is for. The <code>TcType.TyVarDetails</code> field records +the sort of type variable we are dealing with. It is defined as +<pre> +data TyVarDetails = SigTv | ClsTv | InstTv | VanillaTv +</pre> +<code>SigTv</code> marks type variables that were introduced when +instantiating a type signature prior to matching it against the inferred type +of a definition. The variants <code>ClsTv</code> and <code>InstTv</code> mark +scoped type variables introduced by class and instance heads, respectively. +These first three sorts of type variables are skolem variables (tested by the +predicate <code>isSkolemTyVar</code>); i.e., they must <em>not</em> be +instantiated. All other type variables are marked as <code>VanillaTv</code>. +<p> +For a long time I tried to keep mutable Vars statically type-distinct +from immutable Vars, but I've finally given up. It's just too painful. +After type checking there are no MutTyVars left, but there's no static check +of that fact. + +<h2>Term variables (<code>Id</code>)</h2> + +A term variable (of type <code>Id</code>) is represented either by a +<code>LocalId</code> or a <code>GlobalId</code>: +<p> +A <code>GlobalId</code> is +<ul> +<li> Always bound at top-level. +<li> Always has a <code>GlobalName</code>, and hence has + a <code>Unique</code> that is globally unique across the whole + GHC invocation (a single invocation may compile multiple modules). +<li> Has <code>IdInfo</code> that is absolutely fixed, forever. +</ul> + +<p> +A <code>LocalId</code> is: +<ul> +<li> Always bound in the module being compiled: +<ul> +<li> <em>either</em> bound within an expression (lambda, case, local let(rec)) +<li> <em>or</em> defined at top level in the module being compiled. +</ul> +<li> Has IdInfo that changes as the simpifier bashes repeatedly on it. +</ul> +<p> +The key thing about <code>LocalId</code>s is that the free-variable finder +typically treats them as candidate free variables. That is, it ignores +<code>GlobalId</code>s such as imported constants, data contructors, etc. +<p> +An important invariant is this: <em>All the bindings in the module +being compiled (whether top level or not) are <code>LocalId</code>s +until the CoreTidy phase.</em> In the CoreTidy phase, all +externally-visible top-level bindings are made into GlobalIds. This +is the point when a <code>LocalId</code> becomes "frozen" and becomes +a fixed, immutable <code>GlobalId</code>. +<p> +(A binding is <em>"externally-visible"</em> if it is exported, or +mentioned in the unfolding of an externally-visible Id. An +externally-visible Id may not have an unfolding, either because it is +too big, or because it is the loop-breaker of a recursive group.) + +<h3>Global Ids and implicit Ids</h3> + +<code>GlobalId</code>s are further categorised by their <code>GlobalIdDetails</code>. +This type is defined in <code>basicTypes/IdInfo</code>, because it mentions other +structured types like <code>DataCon</code>. Unfortunately it is *used* in <code>Var.lhs</code> +so there's a <code>hi-boot</code> knot to get it there. Anyway, here's the declaration: +<pre> +data GlobalIdDetails + = NotGlobalId -- Used as a convenient extra return value + -- from globalIdDetails + + | VanillaGlobal -- Imported from elsewhere + + | PrimOpId PrimOp -- The Id for a primitive operator + | FCallId ForeignCall -- The Id for a foreign call + + -- These next ones are all "implicit Ids" + | RecordSelId FieldLabel -- The Id for a record selector + | DataConId DataCon -- The Id for a data constructor *worker* + | DataConWrapId DataCon -- The Id for a data constructor *wrapper* + -- [the only reasons we need to know is so that + -- a) we can suppress printing a definition in the interface file + -- b) when typechecking a pattern we can get from the + -- Id back to the data con] +</pre> +The <code>GlobalIdDetails</code> allows us to go from the <code>Id</code> for +a record selector, say, to its field name; or the <code>Id</code> for a primitive +operator to the <code>PrimOp</code> itself. +<p> +Certain <code>GlobalId</code>s are called <em>"implicit"</em> Ids. An implicit +Id is derived by implication from some other declaration. So a record selector is +derived from its data type declaration, for example. An implicit Ids is always +a <code>GlobalId</code>. For most of the compilation, the implicit Ids are just +that: implicit. If you do -ddump-simpl you won't see their definition. (That's +why it's true to say that until CoreTidy all Ids in this compilation unit are +LocalIds.) But at CorePrep, a binding is added for each implicit Id defined in +this module, so that the code generator will generate code for the (curried) function. +<p> +Implicit Ids carry their unfolding inside them, of course, so they may well have +been inlined much earlier; but we generate the curried top-level defn just in +case its ever needed. + +<h3>LocalIds</h3> + +The <code>LocalIdDetails</code> gives more info about a <code>LocalId</code>: +<pre> +data LocalIdDetails + = NotExported -- Not exported + | Exported -- Exported + | SpecPragma -- Not exported, but not to be discarded either + -- It's unclean that this is so deeply built in +</pre> +From this we can tell whether the <code>LocalId</code> is exported, and that +tells us whether we can drop an unused binding as dead code. +<p> +The <code>SpecPragma</code> thing is a HACK. Suppose you write a SPECIALIZE pragma: +<pre> + foo :: Num a => a -> a + {-# SPECIALIZE foo :: Int -> Int #-} + foo = ... +</pre> +The type checker generates a dummy call to <code>foo</code> at the right types: +<pre> + $dummy = foo Int dNumInt +</pre> +The Id <code>$dummy</code> is marked <code>SpecPragma</code>. Its role is to hang +onto that call to <code>foo</code> so that the specialiser can see it, but there +are no calls to <code>$dummy</code>. +The simplifier is careful not to discard <code>SpecPragma</code> Ids, so that it +reaches the specialiser. The specialiser processes the right hand side of a <code>SpecPragma</code> Id +to find calls to overloaded functions, <em>and then discards the <code>SpecPragma</code> Id</em>. +So <code>SpecPragma</code> behaves a like <code>Exported</code>, at least until the specialiser. + + +<h3> ExternalNames and InternalNames </h3> + +Notice that whether an Id is a <code>LocalId</code> or <code>GlobalId</code> is +not the same as whether the Id has an <code>ExternaName</code> or an <code>InternalName</code> +(see "<a href="names.html#sort">The truth about Names</a>"): +<ul> +<li> Every <code>GlobalId</code> has an <code>ExternalName</code>. +<li> A <code>LocalId</code> might have either kind of <code>Name</code>. +</ul> + +<!-- hhmts start --> +Last modified: Fri Sep 12 15:17:18 BST 2003 +<!-- hhmts end --> + </small> + </body> +</html> + |