diff options
Diffstat (limited to 'docs/comm/the-beast/stg.html')
-rw-r--r-- | docs/comm/the-beast/stg.html | 164 |
1 files changed, 164 insertions, 0 deletions
diff --git a/docs/comm/the-beast/stg.html b/docs/comm/the-beast/stg.html new file mode 100644 index 0000000000..4581da7d1f --- /dev/null +++ b/docs/comm/the-beast/stg.html @@ -0,0 +1,164 @@ +<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> +<html> + <head> + <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1"> + <title>The GHC Commentary - You Got Control: The STG-language</title> + </head> + + <body BGCOLOR="FFFFFF"> + <h1>The GHC Commentary - You Got Control: The STG-language</h1> + <p> + GHC contains two completely independent backends: the byte code + generator and the machine code generator. The decision over which of + the two is invoked is made in <a + href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/main/HscMain.lhs"><code>HscMain</code></a><code>.hscCodeGen</code>. + The machine code generator proceeds itself in a number of phases: First, + the <a href="desugar.html">Core</a> intermediate language is translated + into <em>STG-language</em>; second, STG-language is transformed into a + GHC-internal variant of <a href="http://www.cminusminus.org/">C--</a>; + and thirdly, this is either emitted as concrete C--, converted to GNU C, + or translated to native code (by the <a href="ncg.html">native code + generator</a> which targets IA32, Sparc, and PowerPC [as of March '5]). + </p> + <p> + In the following, we will have a look at the first step of machine code + generation, namely the translation steps involving the STG-language. + Details about the underlying abstract machine, the <em>Spineless Tagless + G-machine</em>, are in <a + href="http://research.microsoft.com/copyright/accept.asp?path=/users/simonpj/papers/spineless-tagless-gmachine.ps.gz&pub=34">Implementing + lazy functional languages on stock hardware: the Spineless Tagless + G-machine</a>, SL Peyton Jones, Journal of Functional Programming 2(2), + Apr 1992, pp127-202. (Some details have changed since the publication of + this article, but it still gives a good introduction to the main + concepts.) + </p> + + <h2>The STG Language</h2> + <p> + The AST of the STG-language and the generation of STG code from Core is + both located in the <a + href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/stgSyn/"><code>stgSyn/</code></a> + directory; in the modules <a + href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/stgSyn/StgSyn.lhs"><code>StgSyn</code></a> + and <a + href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/stgSyn/CoreToStg.lhs"><code>CoreToStg</code></a>, + respectively. + </p> + <p> + Conceptually, the STG-language is a lambda calculus (including data + constructors and case expressions) whose syntax is restricted to make + all control flow explicit. As such, it can be regarded as a variant of + <em>administrative normal form (ANF).</em> (C.f., <a + href="http://doi.acm.org/10.1145/173262.155113">The essence of compiling + with continuations.</a> Cormac Flanagan, Amr Sabry, Bruce F. Duba, and + Matthias Felleisen. <em>ACM SIGPLAN Conference on Programming Language + Design and Implementation,</em> ACM Press, 1993.) Each syntactic from + has a precise operational interpretation, in addition to the + denotational interpretation inherited from the lambda calculus. The + concrete representation of the STG language inside GHC also includes + auxiliary attributes, such as <em>static reference tables (SRTs),</em> + which determine the top-level bindings referenced by each let binding + and case expression. + </p> + <p> + As usual in ANF, arguments to functions etc. are restricted to atoms + (i.e., constants or variables), which implies that all sub-expressions + are explicitly named and evaluation order is explicit. Specific to the + STG language is that all let bindings correspond to closure allocation + (thunks, function closures, and data constructors) and that case + expressions encode both computation and case selection. There are two + flavours of case expressions scrutinising boxed and unboxed values, + respectively. The former perform function calls including demanding the + evaluation of thunks, whereas the latter execute primitive operations + (such as arithmetic on fixed size integers and floating-point numbers). + </p> + <p> + The representation of STG language defined in <a + href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/stgSyn/StgSyn.lhs"><code>StgSyn</code></a> + abstracts over both binders and occurences of variables. The type names + involved in this generic definition all carry the prefix + <code>Gen</code> (such as in <code>GenStgBinding</code>). Instances of + these generic definitions, where both binders and occurences are of type + <a + href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/basicTypes/Id.lhs"><code>Id</code></a><code>.Id</code> + are defined as type synonyms and use type names that drop the + <code>Gen</code> prefix (i.e., becoming plain <code>StgBinding</code>). + Complete programs in STG form are represented by values of type + <code>[StgBinding]</code>. + </p> + + <h2>From Core to STG</h2> + <p> + Although, the actual translation from Core AST into STG AST is performed + by the function <a + href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/stgSyn/CoreToStg.lhs"><code>CoreToStg</code></a><code>.coreToStg</code> + (or <a + href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/stgSyn/CoreToStg.lhs"><code>CoreToStg</code></a><code>.coreExprToStg</code> + for individual expressions), the translation crucial depends on <a + href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/coreSyn/CorePrep.lhs"><code>CorePrep</code></a><code>.corePrepPgm</code> + (resp. <a + href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/coreSyn/CorePrep.lhs"><code>CorePrep</code></a><code>.corePrepExpr</code>), + which prepares Core code for code generation (for both byte code and + machine code generation). <code>CorePrep</code> saturates primitive and + constructor applications, turns the code into A-normal form, renames all + identifiers into globally unique names, generates bindings for + constructor workers, constructor wrappers, and record selectors plus + some further cleanup. + </p> + <p> + In other words, after Core code is prepared for code generation it is + structurally already in the form required by the STG language. The main + work performed by the actual transformation from Core to STG, as + performed by <a + href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/stgSyn/CoreToStg.lhs"><code>CoreToStg</code></a><code>.coreToStg</code>, + is to compute the live and free variables as well as live CAFs (constant + applicative forms) at each let binding and case alternative. In + subsequent phases, the live CAF information is used to compute SRTs. + The live variable information is used to determine which stack slots + need to be zapped (to avoid space leaks) and the free variable + information is need to construct closures. Moreover, hints for + optimised code generation are computed, such as whether a closure needs + to be updated after is has been evaluated. + </p> + + <h2>STG Passes</h2> + <p> + These days little actual work is performed on programs in STG form; in + particular, the code is not further optimised. All serious optimisation + (except low-level optimisations which are performed during native code + generation) has already been done on Core. The main task of <a + href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/stgSyn/CoreToStg.lhs"><code>CoreToStg</code></a><code>.stg2stg</code> + is to compute SRTs from the live CAF information determined during STG + generation. Other than that, <a + href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/profiling/SCCfinal.lhs"><code>SCCfinal</code></a><code>.stgMassageForProfiling</code> + is executed when compiling for profiling and information may be dumped + for debugging purposes. + </p> + + <h2>Towards C--</h2> + <p> + GHC's internal form of C-- is defined in the module <a + href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/cmm/Cmm.hs"><code>Cmm</code></a>. + The definition is generic in that it abstracts over the type of static + data and of the contents of basic blocks (i.e., over the concrete + representation of constant data and instructions). These generic + definitions have names carrying the prefix <code>Gen</code> (such as + <code>GenCmm</code>). The same module also instantiates the generic + form to a concrete form where data is represented by + <code>CmmStatic</code> and instructions are represented by + <code>CmmStmt</code> (giving us, e.g., <code>Cmm</code> from + <code>GenCmm</code>). The concrete form more or less follows the + external <a href="http://www.cminusminus.org/">C--</a> language. + </p> + <p> + Programs in STG form are translated to <code>Cmm</code> by <a + href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/codeGen/CodeGen.lhs"><code>CodeGen</code></a><code>.codeGen</code> + </p> + + <p><hr><small> +<!-- hhmts start --> +Last modified: Sat Mar 5 22:55:25 EST 2005 +<!-- hhmts end --> + </small> + </body> +</html> |