summaryrefslogtreecommitdiff
path: root/docs/comm/the-beast/stg.html
diff options
context:
space:
mode:
authorSimon Marlow <simonmar@microsoft.com>2006-04-07 02:05:11 +0000
committerSimon Marlow <simonmar@microsoft.com>2006-04-07 02:05:11 +0000
commit0065d5ab628975892cea1ec7303f968c3338cbe1 (patch)
tree8e2afe0ab48ee33cf95009809d67c9649573ef92 /docs/comm/the-beast/stg.html
parent28a464a75e14cece5db40f2765a29348273ff2d2 (diff)
downloadhaskell-0065d5ab628975892cea1ec7303f968c3338cbe1.tar.gz
Reorganisation of the source tree
Most of the other users of the fptools build system have migrated to Cabal, and with the move to darcs we can now flatten the source tree without losing history, so here goes. The main change is that the ghc/ subdir is gone, and most of what it contained is now at the top level. The build system now makes no pretense at being multi-project, it is just the GHC build system. No doubt this will break many things, and there will be a period of instability while we fix the dependencies. A straightforward build should work, but I haven't yet fixed binary/source distributions. Changes to the Building Guide will follow, too.
Diffstat (limited to 'docs/comm/the-beast/stg.html')
-rw-r--r--docs/comm/the-beast/stg.html164
1 files changed, 164 insertions, 0 deletions
diff --git a/docs/comm/the-beast/stg.html b/docs/comm/the-beast/stg.html
new file mode 100644
index 0000000000..4581da7d1f
--- /dev/null
+++ b/docs/comm/the-beast/stg.html
@@ -0,0 +1,164 @@
+<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
+<html>
+ <head>
+ <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
+ <title>The GHC Commentary - You Got Control: The STG-language</title>
+ </head>
+
+ <body BGCOLOR="FFFFFF">
+ <h1>The GHC Commentary - You Got Control: The STG-language</h1>
+ <p>
+ GHC contains two completely independent backends: the byte code
+ generator and the machine code generator. The decision over which of
+ the two is invoked is made in <a
+ href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/main/HscMain.lhs"><code>HscMain</code></a><code>.hscCodeGen</code>.
+ The machine code generator proceeds itself in a number of phases: First,
+ the <a href="desugar.html">Core</a> intermediate language is translated
+ into <em>STG-language</em>; second, STG-language is transformed into a
+ GHC-internal variant of <a href="http://www.cminusminus.org/">C--</a>;
+ and thirdly, this is either emitted as concrete C--, converted to GNU C,
+ or translated to native code (by the <a href="ncg.html">native code
+ generator</a> which targets IA32, Sparc, and PowerPC [as of March '5]).
+ </p>
+ <p>
+ In the following, we will have a look at the first step of machine code
+ generation, namely the translation steps involving the STG-language.
+ Details about the underlying abstract machine, the <em>Spineless Tagless
+ G-machine</em>, are in <a
+ href="http://research.microsoft.com/copyright/accept.asp?path=/users/simonpj/papers/spineless-tagless-gmachine.ps.gz&pub=34">Implementing
+ lazy functional languages on stock hardware: the Spineless Tagless
+ G-machine</a>, SL Peyton Jones, Journal of Functional Programming 2(2),
+ Apr 1992, pp127-202. (Some details have changed since the publication of
+ this article, but it still gives a good introduction to the main
+ concepts.)
+ </p>
+
+ <h2>The STG Language</h2>
+ <p>
+ The AST of the STG-language and the generation of STG code from Core is
+ both located in the <a
+ href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/stgSyn/"><code>stgSyn/</code></a>
+ directory; in the modules <a
+ href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/stgSyn/StgSyn.lhs"><code>StgSyn</code></a>
+ and <a
+ href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/stgSyn/CoreToStg.lhs"><code>CoreToStg</code></a>,
+ respectively.
+ </p>
+ <p>
+ Conceptually, the STG-language is a lambda calculus (including data
+ constructors and case expressions) whose syntax is restricted to make
+ all control flow explicit. As such, it can be regarded as a variant of
+ <em>administrative normal form (ANF).</em> (C.f., <a
+ href="http://doi.acm.org/10.1145/173262.155113">The essence of compiling
+ with continuations.</a> Cormac Flanagan, Amr Sabry, Bruce F. Duba, and
+ Matthias Felleisen. <em>ACM SIGPLAN Conference on Programming Language
+ Design and Implementation,</em> ACM Press, 1993.) Each syntactic from
+ has a precise operational interpretation, in addition to the
+ denotational interpretation inherited from the lambda calculus. The
+ concrete representation of the STG language inside GHC also includes
+ auxiliary attributes, such as <em>static reference tables (SRTs),</em>
+ which determine the top-level bindings referenced by each let binding
+ and case expression.
+ </p>
+ <p>
+ As usual in ANF, arguments to functions etc. are restricted to atoms
+ (i.e., constants or variables), which implies that all sub-expressions
+ are explicitly named and evaluation order is explicit. Specific to the
+ STG language is that all let bindings correspond to closure allocation
+ (thunks, function closures, and data constructors) and that case
+ expressions encode both computation and case selection. There are two
+ flavours of case expressions scrutinising boxed and unboxed values,
+ respectively. The former perform function calls including demanding the
+ evaluation of thunks, whereas the latter execute primitive operations
+ (such as arithmetic on fixed size integers and floating-point numbers).
+ </p>
+ <p>
+ The representation of STG language defined in <a
+ href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/stgSyn/StgSyn.lhs"><code>StgSyn</code></a>
+ abstracts over both binders and occurences of variables. The type names
+ involved in this generic definition all carry the prefix
+ <code>Gen</code> (such as in <code>GenStgBinding</code>). Instances of
+ these generic definitions, where both binders and occurences are of type
+ <a
+ href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/basicTypes/Id.lhs"><code>Id</code></a><code>.Id</code>
+ are defined as type synonyms and use type names that drop the
+ <code>Gen</code> prefix (i.e., becoming plain <code>StgBinding</code>).
+ Complete programs in STG form are represented by values of type
+ <code>[StgBinding]</code>.
+ </p>
+
+ <h2>From Core to STG</h2>
+ <p>
+ Although, the actual translation from Core AST into STG AST is performed
+ by the function <a
+ href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/stgSyn/CoreToStg.lhs"><code>CoreToStg</code></a><code>.coreToStg</code>
+ (or <a
+ href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/stgSyn/CoreToStg.lhs"><code>CoreToStg</code></a><code>.coreExprToStg</code>
+ for individual expressions), the translation crucial depends on <a
+ href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/coreSyn/CorePrep.lhs"><code>CorePrep</code></a><code>.corePrepPgm</code>
+ (resp. <a
+ href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/coreSyn/CorePrep.lhs"><code>CorePrep</code></a><code>.corePrepExpr</code>),
+ which prepares Core code for code generation (for both byte code and
+ machine code generation). <code>CorePrep</code> saturates primitive and
+ constructor applications, turns the code into A-normal form, renames all
+ identifiers into globally unique names, generates bindings for
+ constructor workers, constructor wrappers, and record selectors plus
+ some further cleanup.
+ </p>
+ <p>
+ In other words, after Core code is prepared for code generation it is
+ structurally already in the form required by the STG language. The main
+ work performed by the actual transformation from Core to STG, as
+ performed by <a
+ href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/stgSyn/CoreToStg.lhs"><code>CoreToStg</code></a><code>.coreToStg</code>,
+ is to compute the live and free variables as well as live CAFs (constant
+ applicative forms) at each let binding and case alternative. In
+ subsequent phases, the live CAF information is used to compute SRTs.
+ The live variable information is used to determine which stack slots
+ need to be zapped (to avoid space leaks) and the free variable
+ information is need to construct closures. Moreover, hints for
+ optimised code generation are computed, such as whether a closure needs
+ to be updated after is has been evaluated.
+ </p>
+
+ <h2>STG Passes</h2>
+ <p>
+ These days little actual work is performed on programs in STG form; in
+ particular, the code is not further optimised. All serious optimisation
+ (except low-level optimisations which are performed during native code
+ generation) has already been done on Core. The main task of <a
+ href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/stgSyn/CoreToStg.lhs"><code>CoreToStg</code></a><code>.stg2stg</code>
+ is to compute SRTs from the live CAF information determined during STG
+ generation. Other than that, <a
+ href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/profiling/SCCfinal.lhs"><code>SCCfinal</code></a><code>.stgMassageForProfiling</code>
+ is executed when compiling for profiling and information may be dumped
+ for debugging purposes.
+ </p>
+
+ <h2>Towards C--</h2>
+ <p>
+ GHC's internal form of C-- is defined in the module <a
+ href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/cmm/Cmm.hs"><code>Cmm</code></a>.
+ The definition is generic in that it abstracts over the type of static
+ data and of the contents of basic blocks (i.e., over the concrete
+ representation of constant data and instructions). These generic
+ definitions have names carrying the prefix <code>Gen</code> (such as
+ <code>GenCmm</code>). The same module also instantiates the generic
+ form to a concrete form where data is represented by
+ <code>CmmStatic</code> and instructions are represented by
+ <code>CmmStmt</code> (giving us, e.g., <code>Cmm</code> from
+ <code>GenCmm</code>). The concrete form more or less follows the
+ external <a href="http://www.cminusminus.org/">C--</a> language.
+ </p>
+ <p>
+ Programs in STG form are translated to <code>Cmm</code> by <a
+ href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/codeGen/CodeGen.lhs"><code>CodeGen</code></a><code>.codeGen</code>
+ </p>
+
+ <p><hr><small>
+<!-- hhmts start -->
+Last modified: Sat Mar 5 22:55:25 EST 2005
+<!-- hhmts end -->
+ </small>
+ </body>
+</html>