summaryrefslogtreecommitdiff
path: root/docs
diff options
context:
space:
mode:
authorIan Lynagh <igloo@earth.li>2012-02-14 14:43:08 +0000
committerIan Lynagh <igloo@earth.li>2012-02-14 15:27:04 +0000
commit8c0196b48d043fe16eb5b2d343f5544b7fdd5004 (patch)
tree0f1c47a7d0ba5a40b00c0449806abaeafde5d1f6 /docs
parent92779798ceb6f3414ed378d51dc8be5c9ca4c52d (diff)
downloadhaskell-8c0196b48d043fe16eb5b2d343f5544b7fdd5004.tar.gz
Add docbook-ised external-core doc; from #5844, by James H. Fisher
Diffstat (limited to 'docs')
-rw-r--r--docs/users_guide/external_core.xml1807
-rw-r--r--docs/users_guide/ug-book.xml.in1
-rw-r--r--docs/users_guide/ug-ent.xml.in2
3 files changed, 1810 insertions, 0 deletions
diff --git a/docs/users_guide/external_core.xml b/docs/users_guide/external_core.xml
new file mode 100644
index 0000000000..faf7148f20
--- /dev/null
+++ b/docs/users_guide/external_core.xml
@@ -0,0 +1,1807 @@
+<?xml version="1.0" encoding="utf-8"?>
+
+<!--
+This document is a semi-automatic conversion of docs/ext-core/core.tex to DocBook using
+
+1. `htlatex` to convert LaTeX to HTML
+2. `pandoc` to convert HTML to DocBook
+3. extensive manual work by James H. Fisher (jameshfisher@gmail.com)
+-->
+
+<!--
+TODO:
+
+* Replace "java" programlisting with "ghccore"
+("ghccore" is not recognized by highlighters,
+causing some generators to fail).
+
+* Complete bibliography entries with journal titles;
+I am unsure of the proper DocBook elements.
+
+* Integrate this file with the rest of the Users' Guide.
+-->
+
+
+<chapter id="an-external-representation-for-the-ghc-core-language-for-ghc-6.10">
+ <title>An External Representation for the GHC Core Language (For GHC 6.10)</title>
+
+ <para>Andrew Tolmach, Tim Chevalier ({apt,tjc}@cs.pdx.edu) and The GHC Team</para>
+
+ <abstract>
+ <para>This document provides a precise definition for the GHC Core
+ language, so that it can be used to communicate between GHC and new
+ stand-alone compilation tools such as back-ends or
+ optimizers.<footnote>
+ <para>This is a draft document, which attempts
+ to describe GHC’s current behavior as precisely as possible. Working
+ notes scattered throughout indicate areas where further work is
+ needed. Constructive comments are very welcome, both on the
+ presentation, and on ways in which GHC could be improved in order to
+ simplify the Core story.</para>
+
+ <para>Support for generating external Core (post-optimization) was
+ originally introduced in GHC 5.02. The definition of external Core in
+ this document reflects the version of external Core generated by the
+ HEAD (unstable) branch of GHC as of May 3, 2008 (version 6.9), using
+ the compiler flag <code>-fext-core</code>. We expect that GHC 6.10 will be
+ consistent with this definition.</para>
+ </footnote>
+ The definition includes a formal grammar and an informal semantics.
+ An executable typechecker and interpreter (in Haskell), which
+ formally embody the static and dynamic semantics, are available
+ separately.</para>
+
+ </abstract>
+
+ <section id="introduction">
+ <title>Introduction</title>
+
+ <para>The Glasgow Haskell Compiler (GHC) uses an intermediate language,
+ called <quote>Core,</quote> as its internal program representation within the
+ compiler’s simplification phase. Core resembles a subset of
+ Haskell, but with explicit type annotations in the style of the
+ polymorphic lambda calculus (F<subscript>ω</subscript>).</para>
+
+ <para>GHC’s front-end translates full Haskell 98 (plus some extensions)
+ into Core. The GHC optimizer then repeatedly transforms Core
+ programs while preserving their meaning. A <quote>Core Lint</quote> pass in GHC
+ typechecks Core in between transformation passes (at least when
+ the user enables linting by setting a compiler flag), verifying
+ that transformations preserve type-correctness. Finally, GHC’s
+ back-end translates Core into STG-machine code <citation>stg-machine</citation> and then into C
+ or native code.</para>
+
+ <para>Two existing papers discuss the original rationale for the design
+ and use of Core <citation>ghc-inliner,comp-by-trans-scp</citation>, although the (two different) idealized
+ versions of Core described therein differ in significant ways from
+ the actual Core language in current GHC. In particular, with the
+ advent of GHC support for generalized algebraic datatypes (GADTs)
+ <citation>gadts</citation> Core was extended beyond its previous
+ F<subscript>ω</subscript>-style incarnation to support type
+ equality constraints and safe coercions, and is now based on a
+ system known as F<subscript>C</subscript> <citation>system-fc</citation>.</para>
+
+ <para>Researchers interested in writing just <emphasis>part</emphasis> of a Haskell compiler,
+ such as a new back-end or a new optimizer pass, might like to use
+ GHC to provide the other parts of the compiler. For example, they
+ might like to use GHC’s front-end to parse, desugar, and
+ type-check source Haskell, then feeding the resulting code to
+ their own back-end tool. As another example, they might like to
+ use Core as the target language for a front-end compiler of their
+ own design, feeding externally synthesized Core into GHC in order
+ to take advantage of GHC’s optimizer, code generator, and run-time
+ system. Without external Core, there are two ways for compiler
+ writers to do this: they can link their code into the GHC
+ executable, which is an arduous process, or they can use the GHC
+ API <citation>ghc-api</citation> to do the same task more cleanly. Both ways require new
+ code to be written in Haskell.</para>
+
+ <para>We present a precisely specified external format for Core files.
+ The external format is text-based and human-readable, to promote
+ interoperability and ease of use. We hope this format will make it
+ easier for external developers to use GHC in a modular way.</para>
+
+ <para>It has long been true that GHC prints an ad-hoc textual
+ representation of Core if you set certain compiler flags. But this
+ representation is intended to be read by people who are debugging
+ the compiler, not by other programs. Making Core into a
+ machine-readable, bi-directional communication format requires:
+
+ <orderedlist>
+ <listitem>
+ precisely specifying the external format of Core;
+ </listitem>
+ <listitem>
+ modifying GHC to generate external Core files
+ (post-simplification; as always, users can control the exact
+ transformations GHC does with command-line flags);
+ </listitem>
+ <listitem>
+ modifying GHC to accept external Core files in place of
+ Haskell source files (users will also be able to control what
+ GHC does to those files with command-line flags).
+ </listitem>
+ </orderedlist>
+ </para>
+
+ <para>The first two facilities will let developers couple GHC’s
+ front-end (parser, type-checker, desugarer), and optionally its
+ optimizer, with new back-end tools. The last facility will let
+ developers write new Core-to-Core transformations as an external
+ tool and integrate them into GHC. It will also allow new
+ front-ends to generate Core that can be fed into GHC’s optimizer
+ or back-end.</para>
+
+ <para>However, because there are many (undocumented) idiosyncracies in
+ the way GHC produces Core from source Haskell, it will be hard for
+ an external tool to produce Core that can be integrated with
+ GHC-produced Core (e.g., for the Prelude), and we don’t aim to
+ support this. Indeed, for the time being, we aim to support only
+ the first two facilities and not the third: we define and
+ implement Core as an external format that GHC can use to
+ communicate with external back-end tools, and defer the larger
+ task of extending GHC to support reading this external format back
+ in.</para>
+
+ <para>This document addresses the first requirement, a formal Core
+ definition, by proposing a formal grammar for an
+ <link linkend="external-grammar-of-core">external representation of Core</link>,
+ and an <link linkend="informal-semantics">informal semantics</link>.</para>
+
+ <para>GHC supports many type system extensions; the External Core
+ printer built into GHC only supports some of them. However,
+ External Core should be capable of representing any Haskell 98
+ program, and may be able to represent programs that require
+ certain type system extensions as well. If a program uses
+ unsupported features, GHC may fail to compile it to Core when the
+ -fext-core flag is set, or GHC may successfully compile it to
+ Core, but the external tools will not be able to typecheck or
+ interpret it.</para>
+
+ <para>Formal static and dynamic semantics in the form of an executable
+ typechecker and interpreter are available separately in the GHC
+ source tree
+ <footnote><ulink url="http://darcs.haskell.org/ghc">http://darcs.haskell.org/ghc</ulink></footnote>
+ under <code>utils/ext-core</code>.</para>
+
+ </section>
+ <section id="external-grammar-of-core">
+ <title>External Grammar of Core</title>
+
+ <para>In designing the external grammar, we have tried to strike a
+ balance among a number of competing goals, including easy
+ parseability by machines, easy readability by humans, and adequate
+ structural simplicity to allow straightforward presentations of
+ the semantics. Thus, we had to make some compromises.
+ Specifically:</para>
+
+ <itemizedlist>
+ <listitem>In order to avoid explosion of parentheses, we support
+ standard precedences and short-cuts for expressions, types,
+ and kinds. Thus we had to introduce multiple non-terminals for
+ each of these syntactic categories, and as a result, the
+ concrete grammar is longer and more complex than the
+ underlying abstract syntax.</listitem>
+
+ <listitem>On the other hand, we have kept the grammar simpler by
+ avoiding special syntax for tuple types and terms. Tuples
+ (both boxed and unboxed) are treated as ordinary constructors.</listitem>
+
+ <listitem>All type abstractions and applications are given in full, even
+ though some of them (e.g., for tuples) could be reconstructed;
+ this means a parser for Core does not have to reconstruct
+ types.<footnote>
+ These choices are certainly debatable. In
+ particular, keeping type applications on tuples and case arms
+ considerably increases the size of Core files and makes them less
+ human-readable, though it allows a Core parser to be simpler.
+ </footnote></listitem>
+
+ <listitem>The syntax of identifiers is heavily restricted (to just
+ alphanumerics and underscores); this again makes Core easier
+ to parse but harder to read.</listitem>
+ </itemizedlist>
+
+ <para>We use the following notational conventions for syntax:
+
+ <informaltable frame="none">
+ <tgroup cols='2' align='left' colsep="0" rowsep="0">
+ <tbody>
+ <row>
+ <entry>[ pat ]</entry>
+ <entry>optional</entry>
+ </row>
+
+ <row>
+ <entry>{ pat }</entry>
+ <entry>zero or more repetitions</entry>
+ </row>
+
+ <row>
+ <entry>
+ { pat }<superscript>+</superscript>
+ </entry>
+ <entry>one or more repetitions</entry>
+ </row>
+
+ <row>
+ <entry>
+ pat<subscript>1</subscript> ∣ pat<subscript>2</subscript>
+ </entry>
+ <entry>choice</entry>
+ </row>
+
+ <row>
+ <entry>
+ <code>fibonacci</code>
+ </entry>
+ <entry>terminal syntax in typewriter font</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </informaltable>
+ </para>
+
+ <informaltable frame="none" colsep="0" rowsep="0">
+ <tgroup cols='5'>
+ <colspec colname="cat" align="left" colwidth="3*" />
+ <colspec colname="lhs" align="right" colwidth="2*" />
+ <colspec align="center" colwidth="*" />
+ <colspec colname="rhs" align="left" colwidth="10*" />
+ <colspec colname="name" align="right" colwidth="6*" />
+ <tbody>
+ <row rowsep="1">
+ <entry>Module</entry>
+ <entry>module</entry>
+ <entry>→</entry>
+ <entry>
+ <code>%module</code> mident { tdef ; }{ vdefg ; }
+ </entry>
+ <entry></entry>
+ </row>
+
+ <row>
+ <entry morerows="1" valign="top">Type defn.</entry>
+ <entry morerows="1" valign="top">tdef</entry>
+ <entry>→</entry>
+ <entry>
+ <code>%data</code> qtycon { tbind } <code>= {</code> [ cdef {<code>;</code> cdef } ] <code>}</code>
+ </entry>
+ <entry>algebraic type</entry>
+ </row>
+ <row rowsep="1">
+ <entry>∣</entry>
+ <entry>
+ <code>%newtype</code> qtycon qtycon { tbind } <code>=</code> ty
+ </entry>
+ <entry>newtype</entry>
+ </row>
+
+ <row rowsep="1">
+ <entry>Constr. defn.</entry>
+ <entry>cdef</entry>
+ <entry>→</entry>
+ <entry>
+ qdcon { <code>@</code> tbind }{ aty }<superscript>+</superscript>
+ </entry>
+ </row>
+
+ <row>
+ <entry morerows="2" valign="top">Value defn.</entry>
+ <entry morerows="1" valign="top">vdefg</entry>
+ <entry>→</entry>
+ <entry><code>%rec {</code> vdef { <code>;</code> vdef } <code>}</code></entry>
+ <entry>recursive</entry>
+ </row>
+
+ <row>
+ <entry>∣</entry>
+ <entry>vdef</entry>
+ <entry>non-recursive</entry>
+ </row>
+
+ <row rowsep="1">
+ <entry>vdef</entry>
+ <entry>→</entry>
+ <entry>qvar <code>::</code> ty <code>=</code> exp</entry>
+ <entry></entry>
+ </row>
+
+ <row>
+ <entry morerows="3" valign="top">Atomic expr.</entry>
+ <entry morerows="3" valign="top">aexp</entry>
+ <entry>→</entry>
+ <entry>qvar</entry>
+ <entry>variable</entry>
+ </row>
+
+ <row>
+ <entry>∣</entry>
+ <entry>qdcon</entry>
+ <entry>data constructor</entry>
+ </row>
+
+ <row>
+ <entry>∣</entry>
+ <entry>lit</entry>
+ <entry>literal</entry>
+ </row>
+
+ <row rowsep="1">
+ <entry>∣</entry>
+ <entry><code>(</code> exp <code>)</code></entry>
+ <entry>nested expr.</entry>
+ </row>
+
+ <row>
+ <entry morerows="9" valign="top">Expression</entry>
+ <entry morerows="9" valign="top">exp</entry>
+ <entry>→</entry>
+ <entry>aexp</entry>
+ <entry>atomic expresion</entry>
+ </row>
+
+ <row>
+ <entry>∣</entry>
+ <entry>aexp { arg }<superscript>+</superscript></entry>
+ <entry>application</entry>
+ </row>
+
+ <row>
+ <entry>∣</entry>
+ <entry><code>\</code> { binder }<superscript>+</superscript> &arw; exp</entry>
+ <entry>abstraction</entry>
+ </row>
+
+ <row>
+ <entry>∣</entry>
+ <entry><code>%let</code> vdefg <code>%in</code> exp</entry>
+ <entry>local definition</entry>
+ </row>
+
+ <row>
+ <entry>∣</entry>
+ <entry><code>%case (</code> aty <code>)</code> exp <code>%of</code> vbind <code>{</code> alt { <code>;</code> alt } <code>}</code></entry>
+ <entry>case expression</entry>
+ </row>
+
+ <row>
+ <entry>∣</entry>
+ <entry><code>%cast</code> exp aty</entry>
+ <entry>type coercion</entry>
+ </row>
+
+ <row>
+ <entry>∣</entry>
+ <entry><code>%note</code> &quot; { char } &quot; exp</entry>
+ <entry>expression note</entry>
+ </row>
+
+ <row>
+ <entry>∣</entry>
+ <entry><code>%external ccall &quot;</code> { char } <code>&quot;</code> aty</entry>
+ <entry>external reference</entry>
+ </row>
+
+ <row>
+ <entry>∣</entry>
+ <entry><code>%dynexternal ccall</code> aty</entry>
+ <entry>external reference (dynamic)</entry>
+ </row>
+
+ <row rowsep="1">
+ <entry>∣</entry>
+ <entry><code>%label &quot;</code> { char } <code>&quot;</code></entry>
+ <entry>external label</entry>
+ </row>
+
+ <row>
+ <entry morerows="1" valign="top">Argument</entry>
+ <entry morerows="1" valign="top">arg</entry>
+ <entry>→</entry>
+ <entry><code>@</code> aty</entry>
+ <entry>type argument</entry>
+ </row>
+
+ <row rowsep="1">
+ <entry>∣</entry>
+ <entry>aexp</entry>
+ <entry>value argument</entry>
+ </row>
+
+ <row>
+ <entry morerows="2" valign="top">Case alt.</entry>
+ <entry morerows="2" valign="top">alt</entry>
+ <entry>→</entry>
+ <entry>qdcon { <code>@</code> tbind }{ vbind } <code>&arw;</code> exp</entry>
+ <entry>constructor alternative</entry>
+ </row>
+
+ <row>
+ <entry>∣</entry>
+ <entry>lit <code>&arw;</code> exp</entry>
+ <entry>literal alternative</entry>
+ </row>
+
+ <row rowsep="1">
+ <entry>∣</entry>
+ <entry><code>%_ &arw;</code> exp</entry>
+ <entry>default alternative</entry>
+ </row>
+
+ <row>
+ <entry morerows="1" valign="top">Binder</entry>
+ <entry morerows="1" valign="top">binder</entry>
+ <entry>→</entry>
+ <entry><code>@</code> tbind</entry>
+ <entry>type binder</entry>
+ </row>
+
+ <row rowsep="1">
+ <entry>∣</entry>
+ <entry>vbind</entry>
+ <entry>value binder</entry>
+ </row>
+
+ <row>
+ <entry morerows="1" valign="top">Type binder</entry>
+ <entry morerows="1" valign="top">tbind</entry>
+ <entry>→</entry>
+ <entry>tyvar</entry>
+ <entry>implicitly of kind *</entry>
+ </row>
+
+ <row rowsep="1">
+ <entry>∣</entry>
+ <entry><code>(</code> tyvar <code>::</code> kind <code>)</code></entry>
+ <entry>explicitly kinded</entry>
+ </row>
+
+ <row rowsep="1">
+ <entry>Value binder</entry>
+ <entry>vbind</entry>
+ <entry>→</entry>
+ <entry><code>(</code> var <code>::</code> ty <code>)</code></entry>
+ <entry></entry>
+ </row>
+
+ <row>
+ <entry morerows="3" valign="top">Literal</entry>
+ <entry morerows="3" valign="top">lit</entry>
+ <entry>→</entry>
+ <entry><code>(</code> [<code>-</code>] { digit }<superscript>+</superscript> <code>::</code> ty <code>)</code></entry>
+ <entry>integer</entry>
+ </row>
+
+ <row>
+ <entry>∣</entry>
+ <entry><code>(</code> [<code>-</code>] { digit }<superscript>+</superscript> <code>%</code> { digit }<superscript>+</superscript> <code>::</code> ty <code>)</code></entry>
+ <entry>rational</entry>
+ </row>
+
+ <row>
+ <entry>∣</entry>
+ <entry><code>( '</code> char <code>' ::</code> ty <code>)</code></entry>
+ <entry>character</entry>
+ </row>
+
+ <row rowsep="1">
+ <entry>∣</entry>
+ <entry><code>( &quot;</code> { char } <code>&quot; ::</code> ty <code>)</code></entry>
+ <entry>string</entry>
+ </row>
+
+ <row>
+ <entry morerows="2" valign="top">Character</entry>
+ <entry morerows="1" valign="top">char</entry>
+ <entry>→</entry>
+ <entry namest="rhs" nameend="name"><emphasis>any ASCII character in range 0x20-0x7E except 0x22,0x27,0x5c</emphasis></entry>
+ </row>
+
+ <row>
+ <entry>∣</entry>
+ <entry><code>\x</code> hex hex</entry>
+ <entry>ASCII code escape sequence</entry>
+ </row>
+
+ <row rowsep="1">
+ <entry>hex</entry>
+ <entry>→</entry>
+ <entry>0∣…∣9 ∣a ∣…∣f</entry>
+ <entry></entry>
+ </row>
+
+ <row>
+ <entry morerows="2" valign="top">Atomic type</entry>
+ <entry morerows="2" valign="top">aty</entry>
+ <entry>→</entry>
+ <entry>tyvar</entry>
+ <entry>type variable</entry>
+ </row>
+
+ <row>
+ <entry>∣</entry>
+ <entry>qtycon</entry>
+ <entry>type constructor</entry>
+ </row>
+
+ <row rowsep="1">
+ <entry>∣</entry>
+ <entry><code>(</code> ty <code>)</code></entry>
+ <entry>nested type</entry>
+ </row>
+
+ <row>
+ <entry morerows="7" valign="top">Basic type</entry>
+ <entry morerows="7" valign="top">bty</entry>
+ <entry>→</entry>
+ <entry>aty</entry>
+ <entry>atomic type</entry>
+ </row>
+
+ <row>
+ <entry>∣</entry>
+ <entry>bty aty</entry>
+ <entry>type application</entry>
+ </row>
+
+ <row>
+ <entry>∣</entry>
+ <entry><code>%trans</code> aty aty</entry>
+ <entry>transitive coercion</entry>
+ </row>
+
+ <row>
+ <entry>∣</entry>
+ <entry><code>%sym</code> aty</entry>
+ <entry>symmetric coercion</entry>
+ </row>
+
+ <row>
+ <entry>∣</entry>
+ <entry><code>%unsafe</code> aty aty</entry>
+ <entry>unsafe coercion</entry>
+ </row>
+
+ <row>
+ <entry>∣</entry>
+ <entry><code>%left</code> aty</entry>
+ <entry>left coercion</entry>
+ </row>
+
+ <row>
+ <entry>∣</entry>
+ <entry><code>%right</code> aty</entry>
+ <entry>right coercion</entry>
+ </row>
+
+ <row rowsep="1">
+ <entry>∣</entry>
+ <entry><code>%inst</code> aty aty</entry>
+ <entry>instantiation coercion</entry>
+ </row>
+
+ <row>
+ <entry morerows="2" valign="top">Type</entry>
+ <entry morerows="2" valign="top">ty</entry>
+ <entry>→</entry>
+ <entry>bty</entry>
+ <entry>basic type</entry>
+ </row>
+
+ <row>
+ <entry>∣</entry>
+ <entry><code>%forall</code> { tbind }<superscript>+</superscript> <code>.</code> ty</entry>
+ <entry>type abstraction</entry>
+ </row>
+
+ <row rowsep="1">
+ <entry>∣</entry>
+ <entry>bty <code>&arw;</code> ty</entry>
+ <entry>arrow type construction</entry>
+ </row>
+
+ <row>
+ <entry morerows="4" valign="top">Atomic kind</entry>
+ <entry morerows="4" valign="top">akind</entry>
+ <entry>→</entry>
+ <entry><code>*</code></entry>
+ <entry>lifted kind</entry>
+ </row>
+
+ <row>
+ <entry>∣</entry>
+ <entry><code>#</code></entry>
+ <entry>unlifted kind</entry>
+ </row>
+
+ <row>
+ <entry>∣</entry>
+ <entry><code>?</code></entry>
+ <entry>open kind</entry>
+ </row>
+
+ <row>
+ <entry>∣</entry>
+ <entry>bty <code>:=:</code> bty</entry>
+ <entry>equality kind</entry>
+ </row>
+
+ <row rowsep="1">
+ <entry>∣</entry>
+ <entry><code>(</code> kind <code>)</code></entry>
+ <entry>nested kind</entry>
+ </row>
+
+ <row>
+ <entry morerows="1" valign="top">Kind</entry>
+ <entry morerows="1" valign="top">kind</entry>
+ <entry>→</entry>
+ <entry>akind</entry>
+ <entry>atomic kind</entry>
+ </row>
+
+ <row rowsep="1">
+ <entry>∣</entry>
+ <entry>akind <code>&arw;</code> kind</entry>
+ <entry>arrow kind</entry>
+ </row>
+
+ <row>
+ <entry morerows="7" valign="top">Identifier</entry>
+ <entry>mident</entry>
+ <entry>→</entry>
+ <entry>pname <code>:</code> uname</entry>
+ <entry>module</entry>
+ </row>
+
+ <row>
+ <entry>tycon</entry>
+ <entry>→</entry>
+ <entry>uname</entry>
+ <entry>type constr.</entry>
+ </row>
+
+ <row>
+ <entry>qtycon</entry>
+ <entry>→</entry>
+ <entry>mident <code>.</code> tycon</entry>
+ <entry>qualified type constr.</entry>
+ </row>
+
+ <row>
+ <entry>tyvar</entry>
+ <entry>→</entry>
+ <entry>lname</entry>
+ <entry>type variable</entry>
+ </row>
+
+ <row>
+ <entry>dcon</entry>
+ <entry>→</entry>
+ <entry>uname</entry>
+ <entry>data constr.</entry>
+ </row>
+
+ <row>
+ <entry>qdcon</entry>
+ <entry>→</entry>
+ <entry>mident <code>.</code> dcon</entry>
+ <entry>qualified data constr.</entry>
+ </row>
+
+ <row>
+ <entry>var</entry>
+ <entry>→</entry>
+ <entry>lname</entry>
+ <entry>variable</entry>
+ </row>
+
+ <row rowsep="1">
+ <entry>qvar</entry>
+ <entry>→</entry>
+ <entry>[ mident <code>.</code> ] var</entry>
+ <entry>optionally qualified variable</entry>
+ </row>
+
+ <row>
+ <entry morerows="6" valign="top">Name</entry>
+ <entry>lname</entry>
+ <entry>→</entry>
+ <entry>lower { namechar }</entry>
+ <entry></entry>
+ </row>
+
+ <row>
+ <entry>uname</entry>
+ <entry>→</entry>
+ <entry>upper { namechar }</entry>
+ <entry></entry>
+ </row>
+
+ <row>
+ <entry>pname</entry>
+ <entry>→</entry>
+ <entry>{ namechar }<superscript>+</superscript></entry>
+ <entry></entry>
+ </row>
+
+ <row>
+ <entry>namechar</entry>
+ <entry>→</entry>
+ <entry>lower ∣ upper ∣ digit</entry>
+ <entry></entry>
+ </row>
+
+ <row>
+ <entry>lower</entry>
+ <entry>→</entry>
+ <entry><code>a</code> ∣ <code>b</code> ∣ … ∣ <code>z</code> ∣ <code>_</code></entry>
+ <entry></entry>
+ </row>
+
+ <row>
+ <entry>upper</entry>
+ <entry>→</entry>
+ <entry><code>A</code> ∣ <code>B</code> ∣ … ∣ <code>Z</code></entry>
+ <entry></entry>
+ </row>
+
+ <row>
+ <entry>digit</entry>
+ <entry>→</entry>
+ <entry><code>0</code> ∣ <code>1</code> ∣ … ∣ <code>9</code></entry>
+ <entry></entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </informaltable>
+ </section>
+
+ <section id="informal-semantics">
+ <title>Informal Semantics</title>
+
+ <para>At the term level, Core resembles a explicitly-typed polymorphic
+ lambda calculus (F<subscript>ω</subscript>), with the addition of
+ local <code>let</code> bindings, algebraic type definitions, constructors, and
+ <code>case</code> expressions, and primitive types, literals and operators. Its
+ type system is richer than that of System F, supporting explicit
+ type equality coercions and type functions.<citation>system-fc</citation></para>
+
+ <para>In this section we concentrate on the less obvious points about
+ Core.</para>
+
+ <section id="program-organization-and-modules">
+ <title>Program Organization and Modules</title>
+
+ <para>Core programs are organized into <emphasis>modules</emphasis>, corresponding directly
+ to source-level Haskell modules. Each module has a identifying
+ name <emphasis>mident</emphasis>. A module identifier consists of a <emphasis>package name</emphasis>
+ followed by a module name, which may be hierarchical: for
+ example, <code>base:GHC.Base</code> is the module identifier for GHC’s Base
+ module. Its name is <code>Base</code>, and it lives in the GHC hierarchy
+ within the <code>base</code> package. Section 5.8 of the GHC users’ guide
+ explains package names <citation>ghc-user-guide</citation>. In particular, note that a Core
+ program may contain multiple modules with the same (possibly
+ hierarchical) module name that differ in their package names. In
+ some of the code examples that follow, we will omit package
+ names and possibly full hierarchical module names from
+ identifiers for brevity, but be aware that they are always
+ required.<footnote>
+ A possible improvement to the Core syntax
+ would be to add explicit import lists to Core modules, which could be
+ used to specify abbrevations for long qualified names. This would make
+ the code more human-readable.
+ </footnote></para>
+
+ <para>Each module may contain the following kinds of top-level
+ declarations:
+
+ <itemizedlist>
+ <listitem>
+ Algebraic data type declarations, each defining a type
+ constructor and one or more data constructors;
+ </listitem>
+ <listitem>
+ Newtype declarations, corresponding to Haskell <code>newtype</code>
+ declarations, each defining a type constructor and a
+ coercion name; and
+ </listitem>
+ <listitem>
+ Value declarations, defining the types and values of
+ top-level variables.
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>No type constructor, data constructor, or top-level value may be
+ declared more than once within a given module. All the type
+ declarations are (potentially) mutually recursive. Value
+ declarations must be in dependency order, with explicit grouping
+ of potentially mutually recursive declarations.</para>
+
+ <para>Identifiers defined in top-level declarations may be <emphasis>external</emphasis> or
+ <emphasis>internal</emphasis>. External identifiers can be referenced from any other
+ module in the program, using conventional dot notation (e.g.,
+ <code>base:GHC.Base.Bool</code>, <code>base:GHC.Base.True</code>). Internal identifiers
+ are visible only within the defining module. All type and data
+ constructors are external, and are always defined and referenced
+ using fully qualified names (with dots).</para>
+
+ <para>A top-level value is external if it is defined and referenced
+ using a fully qualified name with a dot (e.g., <code>main:MyModule.foo = ...</code>);
+ otherwise, it is internal (e.g., <code>bar = ...</code>). Note that
+ Core’s notion of an external identifier does not necessarily
+ coincide with that of <quote>exported</quote> identifier in a Haskell source
+ module. An identifier can be an external identifier in Core, but
+ not be exported by the original Haskell source
+ module.<footnote>
+ Two examples of such identifiers are: data
+ constructors, and values that potentially appear in an unfolding. For an
+ example of the latter, consider <code>Main.foo = ... Main.bar ...</code>, where
+ <code>Main.foo</code> is inlineable. Since <code>bar</code> appears in <code>foo</code>’s unfolding, it is
+ defined and referenced with an external name, even if <code>bar</code> was not
+ exported by the original source module.
+ </footnote>
+ However, if an identifier was exported by the Haskell source
+ module, it will appear as an external name in Core.</para>
+
+ <para>Core modules have no explicit import or export lists. Modules
+ may be mutually recursive. Note that because of the latter fact,
+ GHC currently prints out the top-level bindings for every module
+ as a single recursive group, in order to avoid keeping track of
+ dependencies between top-level values within a module. An
+ external Core tool could reconstruct dependencies later, of
+ course.</para>
+
+ <para>There is also an implicitly-defined module <code>ghc-prim:GHC.Prim</code>,
+ which exports the <quote>built-in</quote> types and values that must be
+ provided by any implementation of Core (including GHC). Details
+ of this module are in the <link linkend="primitive-module">Primitive Module section</link>.</para>
+
+ <para>A Core <emphasis>program</emphasis> is a collection of distinctly-named modules that
+ includes a module called main:Main having an exported value
+ called <code>main:ZCMain.main</code> of type <code>base:GHC.IOBase.IO a</code> (for some
+ type <code>a</code>). (Note that the strangely named wrapper for <code>main</code> is the
+ one exception to the rule that qualified names defined within a
+ module <code>m</code> must have module name <code>m</code>.)</para>
+
+ <para>Many Core programs will contain library modules, such as
+ <code>base:GHC.Base</code>, which implement parts of the Haskell standard
+ library. In principle, these modules are ordinary Haskell
+ modules, with no special status. In practice, the requirement on
+ the type of <code>main:Main.main</code> implies that every program will
+ contain a large subset of the standard library modules.</para>
+
+ </section>
+ <section id="namespaces">
+ <title>Namespaces</title>
+
+ <para>There are five distinct namespaces:
+ <orderedlist>
+ <listitem>module identifiers (<code>mident</code>),</listitem>
+ <listitem>type constructors (<code>tycon</code>),</listitem>
+ <listitem>type variables (<code>tyvar</code>),</listitem>
+ <listitem>data constructors (<code>dcon</code>),</listitem>
+ <listitem>term variables (<code>var</code>).</listitem>
+ </orderedlist>
+ </para>
+
+ <para>Spaces (1), (2+3), and (4+5) can be distinguished from each
+ other by context. To distinguish (2) from (3) and (4) from (5),
+ we require that data and type constructors begin with an
+ upper-case character, and that term and type variables begin
+ with a lower-case character.</para>
+
+ <para>Primitive types and operators are not syntactically
+ distinguished.</para>
+
+ <para>Primitive <emphasis>coercion</emphasis> operators, of which there are six, <emphasis>are</emphasis>
+ syntactically distinguished in the grammar. This is because
+ these coercions must be fully applied, and because
+ distinguishing their applications in the syntax makes
+ typechecking easier.</para>
+
+ <para>A given variable (type or term) may have multiple definitions
+ within a module. However, definitions of term variables never
+ <quote>shadow</quote> one another: the scope of the definition of a given
+ variable never contains a redefinition of the same variable.
+ Type variables may be shadowed. Thus, if a term variable has
+ multiple definitions within a module, all those definitions must
+ be local (let-bound). The only exception to this rule is that
+ (necessarily closed) types labelling <code>%external</code> expressions may
+ contain <code>tyvar</code> bindings that shadow outer bindings.</para>
+
+ <para>Core generated by GHC makes heavy use of encoded names, in which
+ the characters <code>Z</code> and <code>z</code> are used to introduce escape sequences
+ for non-alphabetic characters such as dollar sign <code>$</code> (<code>zd</code>), hash <code>#</code>
+ (<code>zh</code>), plus <code>+</code> (<code>zp</code>), etc. This is the same encoding used in <code>.hi</code>
+ files and in the back-end of GHC itself, except that we
+ sometimes change an initial <code>z</code> to <code>Z</code>, or vice-versa, in order to
+ maintain case distinctions.</para>
+
+ <para>Finally, note that hierarchical module names are z-encoded in
+ Core: for example, <code>base:GHC.Base.foo</code> is rendered as
+ <code>base:GHCziBase.foo</code>. A parser may reconstruct the module
+ hierarchy, or regard <code>GHCziBase</code> as a flat name.</para>
+
+ </section>
+ <section id="types-and-kinds">
+ <title>Types and Kinds</title>
+
+ <para>In Core, all type abstractions and applications are explicit.
+ This make it easy to typecheck any (closed) fragment of Core
+ code. An full executable typechecker is available separately.</para>
+
+ <section id="types">
+ <title>Types</title>
+
+ <para>Types are described by type expressions, which are built from
+ named type constructors and type variables using type
+ application and universal quantification. Each type
+ constructor has a fixed arity ≥ 0. Because it is so widely
+ used, there is special infix syntax for the fully-applied
+ function type constructor (<code>&arw;</code>). (The prefix identifier for
+ this constructor is <code>ghc-prim:GHC.Prim.ZLzmzgZR</code>; this should
+ only appear in unapplied or partially applied form.)</para>
+
+ <para>There are also a number of other primitive type constructors
+ (e.g., <code>Intzh</code>) that are predefined in the <code>GHC.Prim</code> module, but
+ have no special syntax. <code>%data</code> and <code>%newtype</code> declarations
+ introduce additional type constructors, as described below.
+ Type constructors are distinguished solely by name.</para>
+
+ </section>
+ <section id="coercions">
+ <title>Coercions</title>
+
+ <para>A type may also be built using one of the primitive coercion
+ operators, as described in <link linkend="namespaces">the Namespaces section</link>. For details on the
+ meanings of these operators, see the System FC paper <citation>system-fc</citation>. Also
+ see <link linkend="newtypes">the Newtypes section</link> for
+ examples of how GHC uses coercions in Core code.</para>
+
+ </section>
+ <section id="kinds">
+ <title>Kinds</title>
+ <para>As described in the Haskell definition, it is necessary to
+ distinguish well-formed type-expressions by classifying them
+ into different <emphasis>kinds</emphasis> <citation>haskell98, p. 41</citation><!-- TODO -->. In particular, Core
+ explicitly records the kind of every bound type variable.</para>
+
+ <para>In addition, Core’s kind system includes equality kinds, as in
+ System FC <citation>system-fc</citation>. An application of a built-in coercion, or of a
+ user-defined coercion as introduced by a <code>newtype</code> declaration,
+ has an equality kind.</para>
+
+ </section>
+ <section id="lifted-and-unlifted-types">
+ <title>Lifted and Unlifted Types</title>
+
+ <para>Semantically, a type is <emphasis>lifted</emphasis> if and only if it has bottom as
+ an element. We need to distinguish them because operationally,
+ terms with lifted types may be represented by closures; terms
+ with unlifted types must not be represented by closures, which
+ implies that any unboxed value is necessarily unlifted. We
+ distinguish between lifted and unlifted types by ascribing
+ them different kinds.</para>
+
+ <para>Currently, all the primitive types are unlifted (including a
+ few boxed primitive types such as <code>ByteArrayzh</code>). Peyton-Jones
+ and Launchbury <citation>pj:unboxed</citation> described the ideas behind unboxed and
+ unlifted types.</para>
+
+ </section>
+ <section id="type-constructors-base-kinds-and-higher-kinds">
+ <title>Type Constructors; Base Kinds and Higher Kinds</title>
+
+ <para>Every type constructor has a kind, depending on its arity and
+ whether it or its arguments are lifted.</para>
+
+ <para>Term variables can only be assigned types that have base
+ kinds: the base kinds are <code>*</code>, <code>#</code>, and <code>?</code>. The three base kinds
+ distinguish the liftedness of the types they classify: <code>*</code>
+ represents lifted types; <code>#</code> represents unlifted types; and <code>?</code> is
+ the <quote>open</quote> kind, representing a type that may be either lifted
+ or unlifted. Of these, only <code>*</code> ever appears in Core type
+ declarations generated from user code; the other two are
+ needed to describe certain types in primitive (or otherwise
+ specially-generated) code (which, after optimization, could
+ potentially appear anywhere).</para>
+
+ <para>In particular, no top-level identifier (except in
+ <code>ghc-prim:GHC.Prim</code>) has a type of kind <code>#</code> or <code>?</code>.</para>
+
+ <para>Nullary type constructors have base kinds: for example, the
+ type <code>Int</code> has kind <code>*</code>, and <code>Int#</code> has kind <code>#</code>.</para>
+
+ <para>Non-nullary type constructors have higher kinds: kinds that
+ have the form
+ k<subscript>1</subscript><code>&arw;</code>k<subscript>2</subscript>, where
+ k<subscript>1</subscript> and k<subscript>2</subscript> are
+ kinds. For example, the function type constructor <code>&arw;</code> has
+ kind <code>* &arw; (* &arw; *)</code>. Since Haskell allows abstracting
+ over type constructors, type variables may have higher kinds;
+ however, much more commonly they have kind <code>*</code>, so that is the
+ default if a type binder omits a kind.</para>
+
+ </section>
+
+ <section id="type-synonyms-and-type-equivalence">
+ <title>Type Synonyms and Type Equivalence</title>
+
+ <para>There is no mechanism for defining type synonyms
+ (corresponding to Haskell <code>type</code> declarations).</para>
+
+ <para>Type equivalence is just syntactic equivalence on type
+ expressions (of base kinds) modulo:</para>
+
+ <itemizedlist>
+ <listitem>alpha-renaming of variables bound in <code>%forall</code> types;</listitem>
+ <listitem>the identity a <code>&arw;</code> b ≡ <code>ghc-prim:GHC.Prim.ZLzmzgZR</code> a b</listitem>
+ </itemizedlist>
+
+ </section>
+ </section>
+ <section id="algebraic-data-types">
+ <title>Algebraic data types</title>
+
+ <para>Each data declaration introduces a new type constructor and a
+ set of one or more data constructors, normally corresponding
+ directly to a source Haskell <code>data</code> declaration. For example, the
+ source declaration
+
+ <programlisting language="haskell">
+data Bintree a =
+ Fork (Bintree a) (Bintree a)
+ | Leaf a
+ </programlisting>
+
+ might induce the following Core declaration
+
+ <programlisting language="java">
+%data Bintree a = {
+ Fork (Bintree a) (Bintree a);
+ Leaf a)}
+ </programlisting>
+
+ which introduces the unary type constructor Bintree of kind
+ <code>*&arw;*</code> and two data constructors with types
+
+ <programlisting language="java">
+Fork :: %forall a . Bintree a &arw; Bintree a &arw; Bintree a
+Leaf :: %forall a . a &arw; Bintree a
+ </programlisting>
+
+ We define the <emphasis>arity</emphasis> of each data constructor to be the number of
+ value arguments it takes; e.g. <code>Fork</code> has arity 2 and <code>Leaf</code> has
+ arity 1.</para>
+
+ <para>For a less conventional example illustrating the possibility of
+ higher-order kinds, the Haskell source declaration
+
+ <programlisting language="haskell">
+data A f a = MkA (f a)
+ </programlisting>
+
+ might induce the Core declaration
+
+ <programlisting language="java">
+%data A (f::*&arw;*) a = { MkA (f a) }
+ </programlisting>
+
+ which introduces the constructor
+
+ <programlisting language="java">
+MkA :: %forall (f::*&arw;*) a . (f a) &arw; (A f) a
+ </programlisting></para>
+
+ <para>GHC (like some other Haskell implementations) supports an
+ extension to Haskell98 for existential types such as
+
+ <programlisting language="haskell">
+data T = forall a . MkT a (a &arw; Bool)
+ </programlisting>
+
+ This is represented by the Core declaration
+
+ <programlisting language="java">
+%data T = {MkT @a a (a &arw; Bool)}
+ </programlisting>
+
+ which introduces the nullary type constructor T and the data
+ constructor
+
+ <programlisting language="java">
+MkT :: %forall a . a &arw; (a &arw; Bool) &arw; T
+ </programlisting>
+
+ In general, existentially quantified variables appear as extra
+ universally quantified variables in the data contructor types. An
+ example of how to construct and deconstruct values of type <code>T</code> is
+ shown in <link linkend="expression-forms">the Expression Forms section</link>.</para>
+
+ </section>
+ <section id="newtypes">
+ <title>Newtypes</title>
+
+ <para>Each Core <code>%newtype</code> declaration introduces a new type constructor
+ and an associated representation type, corresponding to a source
+ Haskell <code>newtype</code> declaration. However, unlike in source Haskell,
+ a <code>%newtype</code> declaration does not introduce any data constructors.</para>
+
+ <para>Each <code>%newtype</code> declaration also introduces a new coercion
+ (syntactically, just another type constructor) that implies an
+ axiom equating the type constructor, applied to any type
+ variables bound by the <code>%newtype</code>, to the representation type.</para>
+
+ <para>For example, the Haskell fragment
+
+ <programlisting language="haskell">
+newtype U = MkU Bool
+u = MkU True
+v = case u of
+ MkU b &arw; not b
+ </programlisting>
+
+ might induce the Core fragment
+
+ <programlisting language="java">
+%newtype U ZCCoU = Bool;
+u :: U = %cast (True)
+ ((%sym ZCCoU));
+v :: Bool = not (%cast (u) ZCCoU);
+ </programlisting></para>
+
+ <para>The <code>newtype</code> declaration implies that the types <code>U</code> and <code>Bool</code> have
+ equivalent representations, and the coercion axiom <code>ZCCoU</code>
+ provides evidence that <code>U</code> is equivalent to <code>Bool</code>. Notice that in
+ the body of <code>u</code>, the boolean value <code>True</code> is cast to type <code>U</code> using
+ the primitive symmetry rule applied to <code>ZCCoU</code>: that is, using a
+ coercion of kind <code>Bool :=: U</code>. And in the body of <code>v</code>, <code>u</code> is cast
+ back to type <code>Bool</code> using the axiom <code>ZCCoU</code>.</para>
+
+ <para>Notice that the <code>case</code> in the Haskell source code above translates
+ to a <code>cast</code> in the corresponding Core code. That is because
+ operationally, a <code>case</code> on a value whose type is declared by a
+ <code>newtype</code> declaration is a no-op. Unlike a <code>case</code> on any other
+ value, such a <code>case</code> does no evaluation: its only function is to
+ coerce its scrutinee’s type.</para>
+
+ <para>Also notice that unlike in a previous draft version of External
+ Core, there is no need to handle recursive newtypes specially.</para>
+
+ </section>
+
+ <section id="expression-forms">
+ <title>Expression Forms</title>
+
+ <para>Variables and data constructors are straightforward.</para>
+
+ <para>Literal (<emphasis role="variable">lit</emphasis>) expressions consist of a literal value, in one of
+ four different formats, and a (primitive) type annotation. Only
+ certain combinations of format and type are permitted;
+ see <link linkend="primitive-module">the Primitive Module section</link>.
+ The character and string formats can describe only 8-bit ASCII characters.</para>
+
+ <para>Moreover, because the operational semantics for Core interprets
+ strings as C-style null-terminated strings, strings should not
+ contain embedded nulls.</para>
+
+ <para>In Core, value applications, type applications, value
+ abstractions, and type abstractions are all explicit. To tell
+ them apart, type arguments in applications and formal type
+ arguments in abstractions are preceded by an <code>@ symbol</code>. (In
+ abstractions, the <code>@</code> plays essentially the same role as the more
+ usual Λ symbol.) For example, the Haskell source declaration
+
+ <programlisting language="haskell">
+f x = Leaf (Leaf x)
+ </programlisting>
+
+ might induce the Core declaration
+
+ <programlisting language="java">
+f :: %forall a . a &arw; BinTree (BinTree a) =
+ \ @a (x::a) &arw; Leaf @(Bintree a) (Leaf @a x)
+ </programlisting></para>
+
+ <para>Value applications may be of user-defined functions, data
+ constructors, or primitives. None of these sorts of applications
+ are necessarily saturated.</para>
+
+ <para>Note that the arguments of type applications are not always of
+ kind <code>*</code>. For example, given our previous definition of type <code>A</code>:
+
+ <programlisting language="haskell">
+data A f a = MkA (f a)
+ </programlisting>
+
+ the source code
+
+ <programlisting language="haskell">
+MkA (Leaf True)
+ </programlisting>
+
+ becomes
+
+ <programlisting language="java">
+(MkA @Bintree @Bool) (Leaf @Bool True)
+ </programlisting></para>
+
+ <para>Local bindings, of a single variable or of a set of mutually
+ recursive variables, are represented by <code>%let</code> expressions in the
+ usual way.</para>
+
+ <para>By far the most complicated expression form is <code>%case</code>. <code>%case</code>
+ expressions are permitted over values of any type, although they
+ will normally be algebraic or primitive types (with literal
+ values). Evaluating a <code>%case</code> forces the evaluation of the
+ expression being tested (the <quote>scrutinee</quote>). The value of the
+ scrutinee is bound to the variable following the <code>%of</code> keyword,
+ which is in scope in all alternatives; this is useful when the
+ scrutinee is a non-atomic expression (see next example). The
+ scrutinee is preceded by the type of the entire <code>%case</code>
+ expression: that is, the result type that all of the <code>%case</code>
+ alternatives have (this is intended to make type reconstruction
+ easier in the presence of type equality coercions).</para>
+
+ <para>In an algebraic <code>%case</code>, all the case alternatives must be labeled
+ with distinct data constructors from the algebraic type,
+ followed by any existential type variable bindings (see below),
+ and typed term variable bindings corresponding to the data
+ constructor’s arguments. The number of variables must match the
+ data constructor’s arity.</para>
+
+ <para>For example, the following Haskell source expression
+
+ <programlisting language="haskell">
+case g x of
+ Fork l r &arw; Fork r l
+ t@(Leaf v) &arw; Fork t t
+ </programlisting>
+
+ might induce the Core expression
+
+ <programlisting language="java">
+%case ((Bintree a)) g x %of (t::Bintree a)
+ Fork (l::Bintree a) (r::Bintree a) &arw;
+ Fork @a r l
+ Leaf (v::a) &arw;
+ Fork @a t t
+ </programlisting></para>
+
+ <para>When performing a <code>%case</code> over a value of an
+ existentially-quantified algebraic type, the alternative must
+ include extra local type bindings for the
+ existentially-quantified variables. For example, given
+
+ <programlisting language="haskell">
+data T = forall a . MkT a (a &arw; Bool)
+ </programlisting>
+
+ the source
+
+ <programlisting language="haskell">
+case x of
+ MkT w g &arw; g w
+ </programlisting>
+
+ becomes
+
+ <programlisting language="java">
+%case x %of (x’::T)
+ MkT @b (w::b) (g::b&arw;Bool) &arw; g w
+ </programlisting></para>
+
+ <para>In a <code>%case</code> over literal alternatives, all the case alternatives
+ must be distinct literals of the same primitive type.</para>
+
+ <para>The list of alternatives may begin with a default alternative
+ labeled with an underscore (<code>%_</code>), whose right-hand side will be
+ evaluated if none of the other alternatives match. The default
+ is optional except for in a case over a primitive type, or when
+ there are no other alternatives. If the case is over neither an
+ algebraic type nor a primitive type, then the list of
+ alternatives must contain a default alternative and nothing
+ else. For algebraic cases, the set of alternatives need not be
+ exhaustive, even if no default is given; if alternatives are
+ missing, this implies that GHC has deduced that they cannot
+ occur.</para>
+
+ <para><code>%cast</code> is used to manipulate newtypes, as described in
+ <link linkend="newtypes">the Newtype section</link>. The <code>%cast</code> expression
+ takes an expression and a coercion: syntactically, the coercion
+ is an arbitrary type, but it must have an equality kind. In an
+ expression <code>(cast e co)</code>, if <code>e :: T</code> and <code>co</code> has kind <code>T :=: U</code>, then
+ the overall expression has type <code>U</code> <citation>ghc-fc-commentary</citation>. Here, <code>co</code> must be a
+ coercion whose left-hand side is <code>T</code>.</para>
+
+ <para>Note that unlike the <code>%coerce</code> expression that existed in previous
+ versions of Core, this means that <code>%cast</code> is (almost) type-safe:
+ the coercion argument provides evidence that can be verified by
+ a typechecker. There are still unsafe <code>%cast</code>s, corresponding to
+ the unsafe <code>%coerce</code> construct that existed in old versions of
+ Core, because there is a primitive unsafe coercion type that can
+ be used to cast arbitrary types to each other. GHC uses this for
+ such purposes as coercing the return type of a function (such as
+ error) which is guaranteed to never return:
+
+ <programlisting language="haskell">
+case (error &quot;&quot;) of
+ True &arw; 1
+ False &arw; 2
+ </programlisting>
+
+ becomes:
+
+ <programlisting language="java">
+%cast (error @ Bool (ZMZN @ Char))
+(%unsafe Bool Integer);
+ </programlisting>
+
+ <code>%cast</code> has no operational meaning and is only used in
+ typechecking.</para>
+
+ <para>A <code>%note</code> expression carries arbitrary internal information that
+ GHC finds interesting. The information is encoded as a string.
+ Expression notes currently generated by GHC include the inlining
+ pragma (<code>InlineMe</code>) and cost-center labels for profiling.</para>
+
+ <para>A <code>%external</code> expression denotes an external identifier, which has
+ the indicated type (always expressed in terms of Haskell
+ primitive types). External Core supports two kinds of external
+ calls: <code>%external</code> and <code>%dynexternal</code>. Only the former is supported
+ by the current set of stand-alone Core tools. In addition, there
+ is a <code>%label</code> construct which GHC may generate but which the Core
+ tools do not support.</para>
+
+ <para>The present syntax for externals is sufficient for describing C
+ functions and labels. Interfacing to other languages may require
+ additional information or a different interpretation of the name
+ string.</para>
+
+ </section>
+
+ <section id="expression-evaluation">
+ <title>Expression Evaluation</title>
+ <para>The dynamic semantics of Core are defined on the type-erasure of
+ the program: for example, we ignore all type abstractions and
+ applications. The denotational semantics of the resulting
+ type-free program are just the conventional ones for a
+ call-by-name language, in which expressions are only evaluated
+ on demand. But Core is intended to be a call-by-<emphasis>need</emphasis> language,
+ in which expressions are only evaluated once. To express the
+ sharing behavior of call-by-need, we give an operational model
+ in the style of Launchbury <citation>launchbury93natural</citation>.</para>
+
+ <para>This section describes the model informally; a more formal
+ semantics is separately available as an executable interpreter.</para>
+
+ <para>To simplify the semantics, we consider only <quote>well-behaved</quote> Core
+ programs in which constructor and primitive applications are
+ fully saturated, and in which non-trivial expresssions of
+ unlifted kind (<code>#</code>) appear only as scrutinees in <code>%case</code>
+ expressions. Any program can easily be put into this form; a
+ separately available preprocessor illustrates how. In the
+ remainder of this section, we use <quote>Core</quote> to mean <quote>well-behaved</quote>
+ Core.</para>
+
+ <para>Evaluating a Core expression means reducing it to <emphasis>weak-head normal form (WHNF)</emphasis>,
+ i.e., a primitive value, lambda abstraction,
+ or fully-applied data constructor. Evaluating a program means
+ evaluating the expression <code>main:ZCMain.main</code>.</para>
+
+ <para>To make sure that expression evaluation is shared, we make use
+ of a <emphasis>heap</emphasis>, which contains <emphasis>heap entries</emphasis>. A heap entry can be:
+
+ <itemizedlist>
+ <listitem>
+ A <emphasis>thunk</emphasis>, representing an unevaluated expression, also known
+ as a suspension.
+ </listitem>
+ <listitem>
+ A <emphasis>WHNF</emphasis>, representing an evaluated expression. The result of
+ evaluating a thunk is a WHNF. A WHNF is always a closure
+ (corresponding to a lambda abstraction in the source
+ program) or a data constructor application: computations
+ over primitive types are never suspended.
+ </listitem>
+ </itemizedlist></para>
+
+ <para><emphasis>Heap pointers</emphasis> point to heap entries: at different times, the
+ same heap pointer can point to either a thunk or a WHNF, because
+ the run-time system overwrites thunks with WHNFs as computation
+ proceeds.</para>
+
+ <para>The suspended computation that a thunk represents might
+ represent evaluating one of three different kinds of expression.
+ The run-time system allocates a different kind of thunk
+ depending on what kind of expression it is:
+
+ <itemizedlist>
+ <listitem>
+ A thunk for a value definition has a group of suspended
+ defining expressions, along with a list of bindings between
+ defined names and heap pointers to those suspensions. (A
+ value definition may be a recursive group of definitions or
+ a single non-recursive definition, and it may be top-level
+ (global) or <code>let</code>-bound (local)).
+ </listitem>
+ <listitem>
+ A thunk for a function application (where the function is
+ user-defined) has a suspended actual argument expression,
+ and a binding between the formal argument and a heap pointer
+ to that suspension.
+ </listitem>
+ <listitem>
+ A thunk for a constructor application has a suspended actual
+ argument expression; the entire constructed value has a heap
+ pointer to that suspension embedded in it.
+ </listitem>
+ </itemizedlist></para>
+
+ <para>As computation proceeds, copies of the heap pointer for a given
+ thunk propagate through the executing program. When another
+ computation demands the result of that thunk, the thunk is
+ <emphasis>forced</emphasis>: the run-time system computes the thunk’s result,
+ yielding a WHNF, and overwrites the heap entry for the thunk
+ with the WHNF. Now, all copies of the heap pointer point to the
+ new heap entry: a WHNF. Forcing occurs only in the context of
+
+ <itemizedlist>
+ <listitem>evaluating the operator expression of an application;</listitem>
+ <listitem>evaluating the scrutinee of a <code>case</code> expression; or</listitem>
+ <listitem>evaluating an argument to a primitive or external function application</listitem>
+ </itemizedlist>
+ </para>
+
+ <para>When no pointers to a heap entry (whether it is a thunk or WHNF)
+ remain, the garbage collector can reclaim the space it uses. We
+ assume this happens implicitly.</para>
+
+ <para>With the exception of functions, arrays, and mutable variables,
+ we intend that values of all primitive types should be held
+ <emphasis>unboxed</emphasis>: they should not be heap-allocated. This does not
+ violate call-by-need semantics: all primitive types are
+ <emphasis>unlifted</emphasis>, which means that values of those types must be
+ evaluated strictly. Unboxed tuple types are not heap-allocated
+ either.</para>
+
+ <para>Certain primitives and <code>%external</code> functions cause side-effects to
+ state threads or to the real world. Where the ordering of these
+ side-effects matters, Core already forces this order with data
+ dependencies on the pseudo-values representing the threads.</para>
+
+ <para>An implementation must specially support the <code>raisezh</code> and
+ <code>handlezh</code> primitives: for example, by using a handler stack.
+ Again, real-world threading guarantees that they will execute in
+ the correct order.</para>
+
+ </section>
+ </section>
+ <section id="primitive-module">
+ <title>Primitive Module</title>
+
+ <para>The semantics of External Core rely on the contents and informal
+ semantics of the primitive module <code>ghc-prim:GHC.Prim</code>. Nearly all
+ the primitives are required in order to cover GHC’s implementation
+ of the Haskell98 standard prelude; the only operators that can be
+ completely omitted are those supporting the byte-code interpreter,
+ parallelism, and foreign objects. Some of the concurrency
+ primitives are needed, but can be given degenerate implementations
+ if it desired to target a purely sequential backend (see Section
+ <link linkend="non-concurrent-back-end">the Non-concurrent Back End section</link>).</para>
+
+ <para>In addition to these primitives, a large number of C library
+ functions are required to implement the full standard Prelude,
+ particularly to handle I/O and arithmetic on less usual types.</para>
+
+ <para>For a full listing of the names and types of the primitive
+ operators, see the GHC library documentation <citation>ghcprim</citation>.</para>
+
+ <section id="non-concurrent-back-end">
+ <title>Non-concurrent Back End</title>
+
+ <para>The Haskell98 standard prelude doesn’t include any concurrency
+ support, but GHC’s implementation of it relies on the existence
+ of some concurrency primitives. However, it never actually forks
+ multiple threads. Hence, the concurrency primitives can be given
+ degenerate implementations that will work in a non-concurrent
+ setting, as follows:</para>
+
+ <itemizedlist>
+ <listitem>
+ <code>ThreadIdzh</code> can be represented by a singleton type, whose
+ (unique) value is returned by <code>myThreadIdzh</code>.
+ </listitem>
+ <listitem>
+ <code>forkzh</code> can just die with an <quote>unimplemented</quote> message.
+ </listitem>
+ <listitem>
+ <code>killThreadzh</code> and <code>yieldzh</code> can also just die <quote>unimplemented</quote>
+ since in a one-thread world, the only thread a thread can
+ kill is itself, and if a thread yields the program hangs.
+ </listitem>
+ <listitem>
+ <code>MVarzh a</code> can be represented by <code>MutVarzh (Maybe a)</code>; where a
+ concurrent implementation would block, the sequential
+ implementation can just die with a suitable message (since
+ no other thread exists to unblock it).
+ </listitem>
+ <listitem>
+ <code>waitReadzh</code> and <code>waitWritezh</code> can be implemented using a <code>select</code>
+ with no timeout.
+ </listitem>
+ </itemizedlist>
+ </section>
+
+ <section id="literals">
+ <title>Literals</title>
+
+ <para>Only the following combination of literal forms and types are
+ permitted:</para>
+
+ <informaltable frame="none" colsep="0" rowsep="0">
+ <tgroup cols='3'>
+ <colspec colname="literal" align="left" colwidth="*" />
+ <colspec colname="type" align="left" colwidth="*" />
+ <colspec colname="description" align="left" colwidth="4*" />
+ <thead>
+ <row>
+ <entry>Literal form</entry>
+ <entry>Type</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry morerows="3" valign="top">integer</entry>
+ <entry><code>Intzh</code></entry>
+ <entry>Int</entry>
+ </row>
+ <row>
+ <entry><code>Wordzh</code></entry>
+ <entry>Word</entry>
+ </row>
+ <row>
+ <entry><code>Addrzh</code></entry>
+ <entry>Address</entry>
+ </row>
+ <row>
+ <entry><code>Charzh</code></entry>
+ <entry>Unicode character code</entry>
+ </row>
+
+ <row>
+ <entry morerows="1" valign="top">rational</entry>
+ <entry><code>Floatzh</code></entry>
+ <entry>Float</entry>
+ </row>
+ <row>
+ <entry><code>Doublezh</code></entry>
+ <entry>Double</entry>
+ </row>
+
+ <row>
+ <entry>character</entry>
+ <entry><code>Charzh</code></entry>
+ <entry>Unicode character specified by ASCII character</entry>
+ </row>
+
+ <row>
+ <entry>string</entry>
+ <entry><code>Addrzh</code></entry>
+ <entry>Address of specified C-format string</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </informaltable>
+ </section>
+ </section>
+
+
+ <bibliolist>
+ <!-- This bibliography was semi-automatically converted by JabRef from core.bib. -->
+
+ <title>References</title>
+
+ <biblioentry>
+ <abbrev>ghc-user-guide</abbrev>
+ <authorgroup>
+ <author><surname>The GHC Team</surname></author>
+ </authorgroup>
+ <citetitle pubwork="article">The Glorious Glasgow Haskell Compilation System User's Guide, Version 6.8.2</citetitle>
+ <pubdate>2008</pubdate>
+ <bibliomisc><ulink url="http://www.haskell.org/ghc/docs/latest/html/users_guide/index.html">http://www.haskell.org/ghc/docs/latest/html/users_guide/index.html</ulink></bibliomisc>
+ </biblioentry>
+
+ <biblioentry>
+ <abbrev>ghc-fc-commentary</abbrev>
+ <authorgroup>
+ <author><surname>GHC Wiki</surname></author>
+ </authorgroup>
+ <citetitle pubwork="article">System FC: equality constraints and coercions</citetitle>
+ <pubdate>2006</pubdate>
+ <bibliomisc><ulink url="http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/FC">http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/FC</ulink></bibliomisc>
+ </biblioentry>
+
+ <biblioentry>
+ <abbrev>ghc-api</abbrev>
+ <authorgroup>
+ <author><surname>Haskell Wiki</surname></author>
+ </authorgroup>
+ <citetitle pubwork="article">Using GHC as a library</citetitle>
+ <pubdate>2007</pubdate>
+ <bibliomisc><ulink url="http://haskell.org/haskellwiki/GHC/As_a_library">http://haskell.org/haskellwiki/GHC/As_a_library</ulink></bibliomisc>
+ </biblioentry>
+
+ <biblioentry>
+ <abbrev>haskell98</abbrev>
+ <authorgroup>
+ <editor><firstname>Simon</firstname><surname>Peyton-Jones</surname></editor>
+ </authorgroup>
+ <citetitle pubwork="article">Haskell 98 Language and Libraries: The Revised Report</citetitle>
+ <publisher>
+ <publishername>Cambridge University Press</publishername>
+ <address>
+ <city>Cambridge></city>
+ <state>UK</state>
+ </address>
+ </publisher>
+ <pubdate>2003</pubdate>
+ </biblioentry>
+
+ <biblioentry>
+ <abbrev>system-fc</abbrev>
+ <authorgroup>
+ <author><firstname>Martin</firstname><surname>Sulzmann</surname></author>
+ <author><firstname>Manuel M.T.</firstname><surname>Chakravarty</surname></author>
+ <author><firstname>Simon</firstname><surname>Peyton-Jones</surname></author>
+ <author><firstname>Kevin</firstname><surname>Donnelly</surname></author>
+ </authorgroup>
+ <citetitle pubwork="article">System F with type equality coercions</citetitle>
+ <publisher>
+ <publishername>ACM</publishername>
+ <address>
+ <city>New York</city>
+ <state>NY</state>
+ <country>USA</country>
+ </address>
+ </publisher>
+ <artpagenums>53-66</artpagenums>
+ <pubdate>2007</pubdate>
+ <bibliomisc><ulink url="http://portal.acm.org/citation.cfm?id=1190324">http://portal.acm.org/citation.cfm?id=1190324</ulink></bibliomisc>
+ <!-- booktitle = {{TLDI '07: Proceedings of the 2007 ACM SIGPLAN International Workshop on Types in Language Design and Implementation}}, -->
+ </biblioentry>
+
+ <biblioentry>
+ <abbrev>gadts</abbrev>
+ <authorgroup>
+ <author><firstname>Simon</firstname><surname>Peyton-Jones</surname></author>
+ <author><firstname>Dimitrios</firstname><surname>Vytiniotis</surname></author>
+ <author><firstname>Stephanie</firstname><surname>Weirich</surname></author>
+ <author><firstname>Geoffrey</firstname><surname>Washburn</surname></author>
+ </authorgroup>
+ <citetitle pubwork="article">Simple unification-based type inference for GADTs</citetitle>
+ <publisher>
+ <publishername>ACM</publishername>
+ <address>
+ <city>New York</city>
+ <state>NY</state>
+ <country>USA</country>
+ </address>
+ </publisher>
+ <artpagenums>50-61</artpagenums>
+ <pubdate>2006</pubdate>
+ <bibliomisc><ulink url="http://research.microsoft.com/Users/simonpj/papers/gadt/index.htm">http://research.microsoft.com/Users/simonpj/papers/gadt/index.htm</ulink></bibliomisc>
+ </biblioentry>
+
+ <biblioentry>
+ <abbrev>Launchbury94</abbrev>
+ <authorgroup>
+ <author><firstname>John</firstname><surname>Launchbury</surname></author>
+ <author><firstname>Simon L.</firstname><surname>Peyton-Jones</surname></author>
+ </authorgroup>
+ <citetitle pubwork="article">Lazy Functional State Threads</citetitle>
+ <artpagenums>24-35</artpagenums>
+ <pubdate>1994</pubdate>
+ <bibliomisc><ulink url="http://citeseer.ist.psu.edu/article/launchbury93lazy.html">http://citeseer.ist.psu.edu/article/launchbury93lazy.html</ulink></bibliomisc>
+ <!-- booktitle = "{SIGPLAN} {Conference} on {Programming Language Design and Implementation}", -->
+ </biblioentry>
+
+ <biblioentry>
+ <abbrev>pj:unboxed</abbrev>
+ <authorgroup>
+ <author><firstname>Simon L.</firstname><surname>Peyton-Jones</surname></author>
+ <author><firstname>John</firstname><surname>Launchbury</surname></author>
+ <editor><firstname>J.</firstname><surname>Hughes</surname></editor>
+ </authorgroup>
+ <citetitle pubwork="article">Unboxed Values as First Class Citizens in a Non-strict Functional Language</citetitle>
+ <publisher>
+ <publishername>Springer-Verlag LNCS523</publishername>
+ <address>
+ <city>Cambridge</city>
+ <state>Massachussetts</state>
+ <country>USA</country>
+ </address>
+ </publisher>
+ <artpagenums>636-666</artpagenums>
+ <pubdate>1991, August 26-28</pubdate>
+ <bibliomisc><ulink url="http://citeseer.ist.psu.edu/jones91unboxed.html">http://citeseer.ist.psu.edu/jones91unboxed.html</ulink></bibliomisc>
+ <!-- booktitle = "Proceedings of the Conference on Functional Programming and Computer Architecture", -->
+ </biblioentry>
+
+ <biblioentry>
+ <abbrev>ghc-inliner</abbrev>
+ <authorgroup>
+ <author><firstname>Simon</firstname><surname>Peyton-Jones</surname></author>
+ <author><firstname>Simon</firstname><surname>Marlow</surname></author>
+ </authorgroup>
+ <citetitle pubwork="article">Secrets of the Glasgow Haskell Compiler inliner</citetitle>
+ <pubdate>1999</pubdate>
+ <address>
+ <city>Paris</city>
+ <country>France</country>
+ </address>
+ <bibliomisc><ulink url="http://research.microsoft.com/Users/simonpj/Papers/inlining/inline.pdf">http://research.microsoft.com/Users/simonpj/Papers/inlining/inline.pdf</ulink></bibliomisc>
+ <!-- booktitle = "Workshop on Implementing Declarative Languages", -->
+ </biblioentry>
+
+ <biblioentry>
+ <abbrev>comp-by-trans-scp</abbrev>
+ <authorgroup>
+ <author><firstname>Simon L.</firstname><surname>Peyton-Jones</surname></author>
+ <author><firstname>A. L. M.</firstname><surname>Santos</surname></author>
+ </authorgroup>
+ <citetitle pubwork="article">A transformation-based optimiser for Haskell</citetitle>
+ <citetitle pubwork="journal">Science of Computer Programming</citetitle>
+ <volumenum>32</volumenum>
+ <issuenum>1-3</issuenum>
+ <artpagenums>3-47</artpagenums>
+ <pubdate>1998</pubdate>
+ <bibliomisc><ulink url="http://citeseer.ist.psu.edu/peytonjones98transformationbased.html">http://citeseer.ist.psu.edu/peytonjones98transformationbased.html</ulink></bibliomisc>
+ </biblioentry>
+
+ <biblioentry>
+ <abbrev>stg-machine</abbrev>
+ <authorgroup>
+ <author><firstname>Simon L.</firstname><surname>Peyton-Jones</surname></author>
+ </authorgroup>
+ <citetitle pubwork="article">Implementing Lazy Functional Languages on Stock Hardware: The Spineless Tagless G-Machine</citetitle>
+ <citetitle pubwork="journal">Journal of Functional Programming</citetitle>
+ <volumenum>2</volumenum>
+ <issuenum>2</issuenum>
+ <artpagenums>127-202</artpagenums>
+ <pubdate>1992</pubdate>
+ <bibliomisc><ulink url="http://citeseer.ist.psu.edu/peytonjones92implementing.html">http://citeseer.ist.psu.edu/peytonjones92implementing.html</ulink></bibliomisc>
+ </biblioentry>
+
+ <biblioentry>
+ <abbrev>launchbury93natural</abbrev>
+ <authorgroup>
+ <author><firstname>John</firstname><surname>Launchbury</surname></author>
+ </authorgroup>
+ <citetitle pubwork="article">A Natural Semantics for Lazy Evaluation</citetitle>
+ <artpagenums>144-154</artpagenums>
+ <address>
+ <city>Charleston</city>
+ <state>South Carolina</state>
+ </address>
+ <pubdate>1993</pubdate>
+ <bibliomisc><ulink url="citeseer.ist.psu.edu/launchbury93natural.html">citeseer.ist.psu.edu/launchbury93natural.html</ulink></bibliomisc>
+ <!-- booktitle = "Conference Record of the Twentieth Annual {ACM} {SIGPLAN}-{SIGACT} Symposium on Principles of Programming Languages", -->
+ </biblioentry>
+
+ <biblioentry>
+ <abbrev>ghcprim</abbrev>
+ <authorgroup>
+ <author><surname>The GHC Team</surname></author>
+ </authorgroup>
+ <citetitle pubwork="article">Library documentation: GHC.Prim</citetitle>
+ <pubdate>2008</pubdate>
+ <bibliomisc><ulink url="http://www.haskell.org/ghc/docs/latest/html/libraries/base/GHC-Prim.html">http://www.haskell.org/ghc/docs/latest/html/libraries/base/GHC-Prim.html</ulink></bibliomisc>
+ </biblioentry>
+ </bibliolist>
+
+</chapter>
diff --git a/docs/users_guide/ug-book.xml.in b/docs/users_guide/ug-book.xml.in
index 1ff487c2ed..4d4489f80f 100644
--- a/docs/users_guide/ug-book.xml.in
+++ b/docs/users_guide/ug-book.xml.in
@@ -17,6 +17,7 @@
&lang-features;
&ffi-chap;
&extending-ghc;
+&external-core;
&wrong;
&utils;
&win32-dll;
diff --git a/docs/users_guide/ug-ent.xml.in b/docs/users_guide/ug-ent.xml.in
index c83abaad52..37df56889b 100644
--- a/docs/users_guide/ug-ent.xml.in
+++ b/docs/users_guide/ug-ent.xml.in
@@ -12,6 +12,7 @@
<!ENTITY sooner SYSTEM "sooner.xml" >
<!ENTITY lang-features SYSTEM "lang.xml" >
<!ENTITY glasgowexts SYSTEM "glasgow_exts.xml" >
+<!ENTITY external-core SYSTEM "external_core.xml" >
<!ENTITY packages SYSTEM "packages.xml" >
<!ENTITY parallel SYSTEM "parallel.xml" >
<!ENTITY safehaskell SYSTEM "safe_haskell.xml" >
@@ -28,3 +29,4 @@
<!ENTITY libraryBaseLocation "../libraries/base-@LIBRARY_base_VERSION@">
<!ENTITY libraryCabalLocation "../libraries/Cabal-@LIBRARY_Cabal_VERSION@">
<!ENTITY libraryGhcPrimLocation "../libraries/ghc-prim-@LIBRARY_ghc_prim_VERSION@">
+<!ENTITY arw "-&gt;">