diff options
author | Ian Lynagh <igloo@earth.li> | 2012-02-14 14:43:08 +0000 |
---|---|---|
committer | Ian Lynagh <igloo@earth.li> | 2012-02-14 15:27:04 +0000 |
commit | 8c0196b48d043fe16eb5b2d343f5544b7fdd5004 (patch) | |
tree | 0f1c47a7d0ba5a40b00c0449806abaeafde5d1f6 /docs | |
parent | 92779798ceb6f3414ed378d51dc8be5c9ca4c52d (diff) | |
download | haskell-8c0196b48d043fe16eb5b2d343f5544b7fdd5004.tar.gz |
Add docbook-ised external-core doc; from #5844, by James H. Fisher
Diffstat (limited to 'docs')
-rw-r--r-- | docs/users_guide/external_core.xml | 1807 | ||||
-rw-r--r-- | docs/users_guide/ug-book.xml.in | 1 | ||||
-rw-r--r-- | docs/users_guide/ug-ent.xml.in | 2 |
3 files changed, 1810 insertions, 0 deletions
diff --git a/docs/users_guide/external_core.xml b/docs/users_guide/external_core.xml new file mode 100644 index 0000000000..faf7148f20 --- /dev/null +++ b/docs/users_guide/external_core.xml @@ -0,0 +1,1807 @@ +<?xml version="1.0" encoding="utf-8"?> + +<!-- +This document is a semi-automatic conversion of docs/ext-core/core.tex to DocBook using + +1. `htlatex` to convert LaTeX to HTML +2. `pandoc` to convert HTML to DocBook +3. extensive manual work by James H. Fisher (jameshfisher@gmail.com) +--> + +<!-- +TODO: + +* Replace "java" programlisting with "ghccore" +("ghccore" is not recognized by highlighters, +causing some generators to fail). + +* Complete bibliography entries with journal titles; +I am unsure of the proper DocBook elements. + +* Integrate this file with the rest of the Users' Guide. +--> + + +<chapter id="an-external-representation-for-the-ghc-core-language-for-ghc-6.10"> + <title>An External Representation for the GHC Core Language (For GHC 6.10)</title> + + <para>Andrew Tolmach, Tim Chevalier ({apt,tjc}@cs.pdx.edu) and The GHC Team</para> + + <abstract> + <para>This document provides a precise definition for the GHC Core + language, so that it can be used to communicate between GHC and new + stand-alone compilation tools such as back-ends or + optimizers.<footnote> + <para>This is a draft document, which attempts + to describe GHC’s current behavior as precisely as possible. Working + notes scattered throughout indicate areas where further work is + needed. Constructive comments are very welcome, both on the + presentation, and on ways in which GHC could be improved in order to + simplify the Core story.</para> + + <para>Support for generating external Core (post-optimization) was + originally introduced in GHC 5.02. The definition of external Core in + this document reflects the version of external Core generated by the + HEAD (unstable) branch of GHC as of May 3, 2008 (version 6.9), using + the compiler flag <code>-fext-core</code>. We expect that GHC 6.10 will be + consistent with this definition.</para> + </footnote> + The definition includes a formal grammar and an informal semantics. + An executable typechecker and interpreter (in Haskell), which + formally embody the static and dynamic semantics, are available + separately.</para> + + </abstract> + + <section id="introduction"> + <title>Introduction</title> + + <para>The Glasgow Haskell Compiler (GHC) uses an intermediate language, + called <quote>Core,</quote> as its internal program representation within the + compiler’s simplification phase. Core resembles a subset of + Haskell, but with explicit type annotations in the style of the + polymorphic lambda calculus (F<subscript>ω</subscript>).</para> + + <para>GHC’s front-end translates full Haskell 98 (plus some extensions) + into Core. The GHC optimizer then repeatedly transforms Core + programs while preserving their meaning. A <quote>Core Lint</quote> pass in GHC + typechecks Core in between transformation passes (at least when + the user enables linting by setting a compiler flag), verifying + that transformations preserve type-correctness. Finally, GHC’s + back-end translates Core into STG-machine code <citation>stg-machine</citation> and then into C + or native code.</para> + + <para>Two existing papers discuss the original rationale for the design + and use of Core <citation>ghc-inliner,comp-by-trans-scp</citation>, although the (two different) idealized + versions of Core described therein differ in significant ways from + the actual Core language in current GHC. In particular, with the + advent of GHC support for generalized algebraic datatypes (GADTs) + <citation>gadts</citation> Core was extended beyond its previous + F<subscript>ω</subscript>-style incarnation to support type + equality constraints and safe coercions, and is now based on a + system known as F<subscript>C</subscript> <citation>system-fc</citation>.</para> + + <para>Researchers interested in writing just <emphasis>part</emphasis> of a Haskell compiler, + such as a new back-end or a new optimizer pass, might like to use + GHC to provide the other parts of the compiler. For example, they + might like to use GHC’s front-end to parse, desugar, and + type-check source Haskell, then feeding the resulting code to + their own back-end tool. As another example, they might like to + use Core as the target language for a front-end compiler of their + own design, feeding externally synthesized Core into GHC in order + to take advantage of GHC’s optimizer, code generator, and run-time + system. Without external Core, there are two ways for compiler + writers to do this: they can link their code into the GHC + executable, which is an arduous process, or they can use the GHC + API <citation>ghc-api</citation> to do the same task more cleanly. Both ways require new + code to be written in Haskell.</para> + + <para>We present a precisely specified external format for Core files. + The external format is text-based and human-readable, to promote + interoperability and ease of use. We hope this format will make it + easier for external developers to use GHC in a modular way.</para> + + <para>It has long been true that GHC prints an ad-hoc textual + representation of Core if you set certain compiler flags. But this + representation is intended to be read by people who are debugging + the compiler, not by other programs. Making Core into a + machine-readable, bi-directional communication format requires: + + <orderedlist> + <listitem> + precisely specifying the external format of Core; + </listitem> + <listitem> + modifying GHC to generate external Core files + (post-simplification; as always, users can control the exact + transformations GHC does with command-line flags); + </listitem> + <listitem> + modifying GHC to accept external Core files in place of + Haskell source files (users will also be able to control what + GHC does to those files with command-line flags). + </listitem> + </orderedlist> + </para> + + <para>The first two facilities will let developers couple GHC’s + front-end (parser, type-checker, desugarer), and optionally its + optimizer, with new back-end tools. The last facility will let + developers write new Core-to-Core transformations as an external + tool and integrate them into GHC. It will also allow new + front-ends to generate Core that can be fed into GHC’s optimizer + or back-end.</para> + + <para>However, because there are many (undocumented) idiosyncracies in + the way GHC produces Core from source Haskell, it will be hard for + an external tool to produce Core that can be integrated with + GHC-produced Core (e.g., for the Prelude), and we don’t aim to + support this. Indeed, for the time being, we aim to support only + the first two facilities and not the third: we define and + implement Core as an external format that GHC can use to + communicate with external back-end tools, and defer the larger + task of extending GHC to support reading this external format back + in.</para> + + <para>This document addresses the first requirement, a formal Core + definition, by proposing a formal grammar for an + <link linkend="external-grammar-of-core">external representation of Core</link>, + and an <link linkend="informal-semantics">informal semantics</link>.</para> + + <para>GHC supports many type system extensions; the External Core + printer built into GHC only supports some of them. However, + External Core should be capable of representing any Haskell 98 + program, and may be able to represent programs that require + certain type system extensions as well. If a program uses + unsupported features, GHC may fail to compile it to Core when the + -fext-core flag is set, or GHC may successfully compile it to + Core, but the external tools will not be able to typecheck or + interpret it.</para> + + <para>Formal static and dynamic semantics in the form of an executable + typechecker and interpreter are available separately in the GHC + source tree + <footnote><ulink url="http://darcs.haskell.org/ghc">http://darcs.haskell.org/ghc</ulink></footnote> + under <code>utils/ext-core</code>.</para> + + </section> + <section id="external-grammar-of-core"> + <title>External Grammar of Core</title> + + <para>In designing the external grammar, we have tried to strike a + balance among a number of competing goals, including easy + parseability by machines, easy readability by humans, and adequate + structural simplicity to allow straightforward presentations of + the semantics. Thus, we had to make some compromises. + Specifically:</para> + + <itemizedlist> + <listitem>In order to avoid explosion of parentheses, we support + standard precedences and short-cuts for expressions, types, + and kinds. Thus we had to introduce multiple non-terminals for + each of these syntactic categories, and as a result, the + concrete grammar is longer and more complex than the + underlying abstract syntax.</listitem> + + <listitem>On the other hand, we have kept the grammar simpler by + avoiding special syntax for tuple types and terms. Tuples + (both boxed and unboxed) are treated as ordinary constructors.</listitem> + + <listitem>All type abstractions and applications are given in full, even + though some of them (e.g., for tuples) could be reconstructed; + this means a parser for Core does not have to reconstruct + types.<footnote> + These choices are certainly debatable. In + particular, keeping type applications on tuples and case arms + considerably increases the size of Core files and makes them less + human-readable, though it allows a Core parser to be simpler. + </footnote></listitem> + + <listitem>The syntax of identifiers is heavily restricted (to just + alphanumerics and underscores); this again makes Core easier + to parse but harder to read.</listitem> + </itemizedlist> + + <para>We use the following notational conventions for syntax: + + <informaltable frame="none"> + <tgroup cols='2' align='left' colsep="0" rowsep="0"> + <tbody> + <row> + <entry>[ pat ]</entry> + <entry>optional</entry> + </row> + + <row> + <entry>{ pat }</entry> + <entry>zero or more repetitions</entry> + </row> + + <row> + <entry> + { pat }<superscript>+</superscript> + </entry> + <entry>one or more repetitions</entry> + </row> + + <row> + <entry> + pat<subscript>1</subscript> ∣ pat<subscript>2</subscript> + </entry> + <entry>choice</entry> + </row> + + <row> + <entry> + <code>fibonacci</code> + </entry> + <entry>terminal syntax in typewriter font</entry> + </row> + </tbody> + </tgroup> + </informaltable> + </para> + + <informaltable frame="none" colsep="0" rowsep="0"> + <tgroup cols='5'> + <colspec colname="cat" align="left" colwidth="3*" /> + <colspec colname="lhs" align="right" colwidth="2*" /> + <colspec align="center" colwidth="*" /> + <colspec colname="rhs" align="left" colwidth="10*" /> + <colspec colname="name" align="right" colwidth="6*" /> + <tbody> + <row rowsep="1"> + <entry>Module</entry> + <entry>module</entry> + <entry>→</entry> + <entry> + <code>%module</code> mident { tdef ; }{ vdefg ; } + </entry> + <entry></entry> + </row> + + <row> + <entry morerows="1" valign="top">Type defn.</entry> + <entry morerows="1" valign="top">tdef</entry> + <entry>→</entry> + <entry> + <code>%data</code> qtycon { tbind } <code>= {</code> [ cdef {<code>;</code> cdef } ] <code>}</code> + </entry> + <entry>algebraic type</entry> + </row> + <row rowsep="1"> + <entry>∣</entry> + <entry> + <code>%newtype</code> qtycon qtycon { tbind } <code>=</code> ty + </entry> + <entry>newtype</entry> + </row> + + <row rowsep="1"> + <entry>Constr. defn.</entry> + <entry>cdef</entry> + <entry>→</entry> + <entry> + qdcon { <code>@</code> tbind }{ aty }<superscript>+</superscript> + </entry> + </row> + + <row> + <entry morerows="2" valign="top">Value defn.</entry> + <entry morerows="1" valign="top">vdefg</entry> + <entry>→</entry> + <entry><code>%rec {</code> vdef { <code>;</code> vdef } <code>}</code></entry> + <entry>recursive</entry> + </row> + + <row> + <entry>∣</entry> + <entry>vdef</entry> + <entry>non-recursive</entry> + </row> + + <row rowsep="1"> + <entry>vdef</entry> + <entry>→</entry> + <entry>qvar <code>::</code> ty <code>=</code> exp</entry> + <entry></entry> + </row> + + <row> + <entry morerows="3" valign="top">Atomic expr.</entry> + <entry morerows="3" valign="top">aexp</entry> + <entry>→</entry> + <entry>qvar</entry> + <entry>variable</entry> + </row> + + <row> + <entry>∣</entry> + <entry>qdcon</entry> + <entry>data constructor</entry> + </row> + + <row> + <entry>∣</entry> + <entry>lit</entry> + <entry>literal</entry> + </row> + + <row rowsep="1"> + <entry>∣</entry> + <entry><code>(</code> exp <code>)</code></entry> + <entry>nested expr.</entry> + </row> + + <row> + <entry morerows="9" valign="top">Expression</entry> + <entry morerows="9" valign="top">exp</entry> + <entry>→</entry> + <entry>aexp</entry> + <entry>atomic expresion</entry> + </row> + + <row> + <entry>∣</entry> + <entry>aexp { arg }<superscript>+</superscript></entry> + <entry>application</entry> + </row> + + <row> + <entry>∣</entry> + <entry><code>\</code> { binder }<superscript>+</superscript> &arw; exp</entry> + <entry>abstraction</entry> + </row> + + <row> + <entry>∣</entry> + <entry><code>%let</code> vdefg <code>%in</code> exp</entry> + <entry>local definition</entry> + </row> + + <row> + <entry>∣</entry> + <entry><code>%case (</code> aty <code>)</code> exp <code>%of</code> vbind <code>{</code> alt { <code>;</code> alt } <code>}</code></entry> + <entry>case expression</entry> + </row> + + <row> + <entry>∣</entry> + <entry><code>%cast</code> exp aty</entry> + <entry>type coercion</entry> + </row> + + <row> + <entry>∣</entry> + <entry><code>%note</code> " { char } " exp</entry> + <entry>expression note</entry> + </row> + + <row> + <entry>∣</entry> + <entry><code>%external ccall "</code> { char } <code>"</code> aty</entry> + <entry>external reference</entry> + </row> + + <row> + <entry>∣</entry> + <entry><code>%dynexternal ccall</code> aty</entry> + <entry>external reference (dynamic)</entry> + </row> + + <row rowsep="1"> + <entry>∣</entry> + <entry><code>%label "</code> { char } <code>"</code></entry> + <entry>external label</entry> + </row> + + <row> + <entry morerows="1" valign="top">Argument</entry> + <entry morerows="1" valign="top">arg</entry> + <entry>→</entry> + <entry><code>@</code> aty</entry> + <entry>type argument</entry> + </row> + + <row rowsep="1"> + <entry>∣</entry> + <entry>aexp</entry> + <entry>value argument</entry> + </row> + + <row> + <entry morerows="2" valign="top">Case alt.</entry> + <entry morerows="2" valign="top">alt</entry> + <entry>→</entry> + <entry>qdcon { <code>@</code> tbind }{ vbind } <code>&arw;</code> exp</entry> + <entry>constructor alternative</entry> + </row> + + <row> + <entry>∣</entry> + <entry>lit <code>&arw;</code> exp</entry> + <entry>literal alternative</entry> + </row> + + <row rowsep="1"> + <entry>∣</entry> + <entry><code>%_ &arw;</code> exp</entry> + <entry>default alternative</entry> + </row> + + <row> + <entry morerows="1" valign="top">Binder</entry> + <entry morerows="1" valign="top">binder</entry> + <entry>→</entry> + <entry><code>@</code> tbind</entry> + <entry>type binder</entry> + </row> + + <row rowsep="1"> + <entry>∣</entry> + <entry>vbind</entry> + <entry>value binder</entry> + </row> + + <row> + <entry morerows="1" valign="top">Type binder</entry> + <entry morerows="1" valign="top">tbind</entry> + <entry>→</entry> + <entry>tyvar</entry> + <entry>implicitly of kind *</entry> + </row> + + <row rowsep="1"> + <entry>∣</entry> + <entry><code>(</code> tyvar <code>::</code> kind <code>)</code></entry> + <entry>explicitly kinded</entry> + </row> + + <row rowsep="1"> + <entry>Value binder</entry> + <entry>vbind</entry> + <entry>→</entry> + <entry><code>(</code> var <code>::</code> ty <code>)</code></entry> + <entry></entry> + </row> + + <row> + <entry morerows="3" valign="top">Literal</entry> + <entry morerows="3" valign="top">lit</entry> + <entry>→</entry> + <entry><code>(</code> [<code>-</code>] { digit }<superscript>+</superscript> <code>::</code> ty <code>)</code></entry> + <entry>integer</entry> + </row> + + <row> + <entry>∣</entry> + <entry><code>(</code> [<code>-</code>] { digit }<superscript>+</superscript> <code>%</code> { digit }<superscript>+</superscript> <code>::</code> ty <code>)</code></entry> + <entry>rational</entry> + </row> + + <row> + <entry>∣</entry> + <entry><code>( '</code> char <code>' ::</code> ty <code>)</code></entry> + <entry>character</entry> + </row> + + <row rowsep="1"> + <entry>∣</entry> + <entry><code>( "</code> { char } <code>" ::</code> ty <code>)</code></entry> + <entry>string</entry> + </row> + + <row> + <entry morerows="2" valign="top">Character</entry> + <entry morerows="1" valign="top">char</entry> + <entry>→</entry> + <entry namest="rhs" nameend="name"><emphasis>any ASCII character in range 0x20-0x7E except 0x22,0x27,0x5c</emphasis></entry> + </row> + + <row> + <entry>∣</entry> + <entry><code>\x</code> hex hex</entry> + <entry>ASCII code escape sequence</entry> + </row> + + <row rowsep="1"> + <entry>hex</entry> + <entry>→</entry> + <entry>0∣…∣9 ∣a ∣…∣f</entry> + <entry></entry> + </row> + + <row> + <entry morerows="2" valign="top">Atomic type</entry> + <entry morerows="2" valign="top">aty</entry> + <entry>→</entry> + <entry>tyvar</entry> + <entry>type variable</entry> + </row> + + <row> + <entry>∣</entry> + <entry>qtycon</entry> + <entry>type constructor</entry> + </row> + + <row rowsep="1"> + <entry>∣</entry> + <entry><code>(</code> ty <code>)</code></entry> + <entry>nested type</entry> + </row> + + <row> + <entry morerows="7" valign="top">Basic type</entry> + <entry morerows="7" valign="top">bty</entry> + <entry>→</entry> + <entry>aty</entry> + <entry>atomic type</entry> + </row> + + <row> + <entry>∣</entry> + <entry>bty aty</entry> + <entry>type application</entry> + </row> + + <row> + <entry>∣</entry> + <entry><code>%trans</code> aty aty</entry> + <entry>transitive coercion</entry> + </row> + + <row> + <entry>∣</entry> + <entry><code>%sym</code> aty</entry> + <entry>symmetric coercion</entry> + </row> + + <row> + <entry>∣</entry> + <entry><code>%unsafe</code> aty aty</entry> + <entry>unsafe coercion</entry> + </row> + + <row> + <entry>∣</entry> + <entry><code>%left</code> aty</entry> + <entry>left coercion</entry> + </row> + + <row> + <entry>∣</entry> + <entry><code>%right</code> aty</entry> + <entry>right coercion</entry> + </row> + + <row rowsep="1"> + <entry>∣</entry> + <entry><code>%inst</code> aty aty</entry> + <entry>instantiation coercion</entry> + </row> + + <row> + <entry morerows="2" valign="top">Type</entry> + <entry morerows="2" valign="top">ty</entry> + <entry>→</entry> + <entry>bty</entry> + <entry>basic type</entry> + </row> + + <row> + <entry>∣</entry> + <entry><code>%forall</code> { tbind }<superscript>+</superscript> <code>.</code> ty</entry> + <entry>type abstraction</entry> + </row> + + <row rowsep="1"> + <entry>∣</entry> + <entry>bty <code>&arw;</code> ty</entry> + <entry>arrow type construction</entry> + </row> + + <row> + <entry morerows="4" valign="top">Atomic kind</entry> + <entry morerows="4" valign="top">akind</entry> + <entry>→</entry> + <entry><code>*</code></entry> + <entry>lifted kind</entry> + </row> + + <row> + <entry>∣</entry> + <entry><code>#</code></entry> + <entry>unlifted kind</entry> + </row> + + <row> + <entry>∣</entry> + <entry><code>?</code></entry> + <entry>open kind</entry> + </row> + + <row> + <entry>∣</entry> + <entry>bty <code>:=:</code> bty</entry> + <entry>equality kind</entry> + </row> + + <row rowsep="1"> + <entry>∣</entry> + <entry><code>(</code> kind <code>)</code></entry> + <entry>nested kind</entry> + </row> + + <row> + <entry morerows="1" valign="top">Kind</entry> + <entry morerows="1" valign="top">kind</entry> + <entry>→</entry> + <entry>akind</entry> + <entry>atomic kind</entry> + </row> + + <row rowsep="1"> + <entry>∣</entry> + <entry>akind <code>&arw;</code> kind</entry> + <entry>arrow kind</entry> + </row> + + <row> + <entry morerows="7" valign="top">Identifier</entry> + <entry>mident</entry> + <entry>→</entry> + <entry>pname <code>:</code> uname</entry> + <entry>module</entry> + </row> + + <row> + <entry>tycon</entry> + <entry>→</entry> + <entry>uname</entry> + <entry>type constr.</entry> + </row> + + <row> + <entry>qtycon</entry> + <entry>→</entry> + <entry>mident <code>.</code> tycon</entry> + <entry>qualified type constr.</entry> + </row> + + <row> + <entry>tyvar</entry> + <entry>→</entry> + <entry>lname</entry> + <entry>type variable</entry> + </row> + + <row> + <entry>dcon</entry> + <entry>→</entry> + <entry>uname</entry> + <entry>data constr.</entry> + </row> + + <row> + <entry>qdcon</entry> + <entry>→</entry> + <entry>mident <code>.</code> dcon</entry> + <entry>qualified data constr.</entry> + </row> + + <row> + <entry>var</entry> + <entry>→</entry> + <entry>lname</entry> + <entry>variable</entry> + </row> + + <row rowsep="1"> + <entry>qvar</entry> + <entry>→</entry> + <entry>[ mident <code>.</code> ] var</entry> + <entry>optionally qualified variable</entry> + </row> + + <row> + <entry morerows="6" valign="top">Name</entry> + <entry>lname</entry> + <entry>→</entry> + <entry>lower { namechar }</entry> + <entry></entry> + </row> + + <row> + <entry>uname</entry> + <entry>→</entry> + <entry>upper { namechar }</entry> + <entry></entry> + </row> + + <row> + <entry>pname</entry> + <entry>→</entry> + <entry>{ namechar }<superscript>+</superscript></entry> + <entry></entry> + </row> + + <row> + <entry>namechar</entry> + <entry>→</entry> + <entry>lower ∣ upper ∣ digit</entry> + <entry></entry> + </row> + + <row> + <entry>lower</entry> + <entry>→</entry> + <entry><code>a</code> ∣ <code>b</code> ∣ … ∣ <code>z</code> ∣ <code>_</code></entry> + <entry></entry> + </row> + + <row> + <entry>upper</entry> + <entry>→</entry> + <entry><code>A</code> ∣ <code>B</code> ∣ … ∣ <code>Z</code></entry> + <entry></entry> + </row> + + <row> + <entry>digit</entry> + <entry>→</entry> + <entry><code>0</code> ∣ <code>1</code> ∣ … ∣ <code>9</code></entry> + <entry></entry> + </row> + </tbody> + </tgroup> + </informaltable> + </section> + + <section id="informal-semantics"> + <title>Informal Semantics</title> + + <para>At the term level, Core resembles a explicitly-typed polymorphic + lambda calculus (F<subscript>ω</subscript>), with the addition of + local <code>let</code> bindings, algebraic type definitions, constructors, and + <code>case</code> expressions, and primitive types, literals and operators. Its + type system is richer than that of System F, supporting explicit + type equality coercions and type functions.<citation>system-fc</citation></para> + + <para>In this section we concentrate on the less obvious points about + Core.</para> + + <section id="program-organization-and-modules"> + <title>Program Organization and Modules</title> + + <para>Core programs are organized into <emphasis>modules</emphasis>, corresponding directly + to source-level Haskell modules. Each module has a identifying + name <emphasis>mident</emphasis>. A module identifier consists of a <emphasis>package name</emphasis> + followed by a module name, which may be hierarchical: for + example, <code>base:GHC.Base</code> is the module identifier for GHC’s Base + module. Its name is <code>Base</code>, and it lives in the GHC hierarchy + within the <code>base</code> package. Section 5.8 of the GHC users’ guide + explains package names <citation>ghc-user-guide</citation>. In particular, note that a Core + program may contain multiple modules with the same (possibly + hierarchical) module name that differ in their package names. In + some of the code examples that follow, we will omit package + names and possibly full hierarchical module names from + identifiers for brevity, but be aware that they are always + required.<footnote> + A possible improvement to the Core syntax + would be to add explicit import lists to Core modules, which could be + used to specify abbrevations for long qualified names. This would make + the code more human-readable. + </footnote></para> + + <para>Each module may contain the following kinds of top-level + declarations: + + <itemizedlist> + <listitem> + Algebraic data type declarations, each defining a type + constructor and one or more data constructors; + </listitem> + <listitem> + Newtype declarations, corresponding to Haskell <code>newtype</code> + declarations, each defining a type constructor and a + coercion name; and + </listitem> + <listitem> + Value declarations, defining the types and values of + top-level variables. + </listitem> + </itemizedlist> + </para> + + <para>No type constructor, data constructor, or top-level value may be + declared more than once within a given module. All the type + declarations are (potentially) mutually recursive. Value + declarations must be in dependency order, with explicit grouping + of potentially mutually recursive declarations.</para> + + <para>Identifiers defined in top-level declarations may be <emphasis>external</emphasis> or + <emphasis>internal</emphasis>. External identifiers can be referenced from any other + module in the program, using conventional dot notation (e.g., + <code>base:GHC.Base.Bool</code>, <code>base:GHC.Base.True</code>). Internal identifiers + are visible only within the defining module. All type and data + constructors are external, and are always defined and referenced + using fully qualified names (with dots).</para> + + <para>A top-level value is external if it is defined and referenced + using a fully qualified name with a dot (e.g., <code>main:MyModule.foo = ...</code>); + otherwise, it is internal (e.g., <code>bar = ...</code>). Note that + Core’s notion of an external identifier does not necessarily + coincide with that of <quote>exported</quote> identifier in a Haskell source + module. An identifier can be an external identifier in Core, but + not be exported by the original Haskell source + module.<footnote> + Two examples of such identifiers are: data + constructors, and values that potentially appear in an unfolding. For an + example of the latter, consider <code>Main.foo = ... Main.bar ...</code>, where + <code>Main.foo</code> is inlineable. Since <code>bar</code> appears in <code>foo</code>’s unfolding, it is + defined and referenced with an external name, even if <code>bar</code> was not + exported by the original source module. + </footnote> + However, if an identifier was exported by the Haskell source + module, it will appear as an external name in Core.</para> + + <para>Core modules have no explicit import or export lists. Modules + may be mutually recursive. Note that because of the latter fact, + GHC currently prints out the top-level bindings for every module + as a single recursive group, in order to avoid keeping track of + dependencies between top-level values within a module. An + external Core tool could reconstruct dependencies later, of + course.</para> + + <para>There is also an implicitly-defined module <code>ghc-prim:GHC.Prim</code>, + which exports the <quote>built-in</quote> types and values that must be + provided by any implementation of Core (including GHC). Details + of this module are in the <link linkend="primitive-module">Primitive Module section</link>.</para> + + <para>A Core <emphasis>program</emphasis> is a collection of distinctly-named modules that + includes a module called main:Main having an exported value + called <code>main:ZCMain.main</code> of type <code>base:GHC.IOBase.IO a</code> (for some + type <code>a</code>). (Note that the strangely named wrapper for <code>main</code> is the + one exception to the rule that qualified names defined within a + module <code>m</code> must have module name <code>m</code>.)</para> + + <para>Many Core programs will contain library modules, such as + <code>base:GHC.Base</code>, which implement parts of the Haskell standard + library. In principle, these modules are ordinary Haskell + modules, with no special status. In practice, the requirement on + the type of <code>main:Main.main</code> implies that every program will + contain a large subset of the standard library modules.</para> + + </section> + <section id="namespaces"> + <title>Namespaces</title> + + <para>There are five distinct namespaces: + <orderedlist> + <listitem>module identifiers (<code>mident</code>),</listitem> + <listitem>type constructors (<code>tycon</code>),</listitem> + <listitem>type variables (<code>tyvar</code>),</listitem> + <listitem>data constructors (<code>dcon</code>),</listitem> + <listitem>term variables (<code>var</code>).</listitem> + </orderedlist> + </para> + + <para>Spaces (1), (2+3), and (4+5) can be distinguished from each + other by context. To distinguish (2) from (3) and (4) from (5), + we require that data and type constructors begin with an + upper-case character, and that term and type variables begin + with a lower-case character.</para> + + <para>Primitive types and operators are not syntactically + distinguished.</para> + + <para>Primitive <emphasis>coercion</emphasis> operators, of which there are six, <emphasis>are</emphasis> + syntactically distinguished in the grammar. This is because + these coercions must be fully applied, and because + distinguishing their applications in the syntax makes + typechecking easier.</para> + + <para>A given variable (type or term) may have multiple definitions + within a module. However, definitions of term variables never + <quote>shadow</quote> one another: the scope of the definition of a given + variable never contains a redefinition of the same variable. + Type variables may be shadowed. Thus, if a term variable has + multiple definitions within a module, all those definitions must + be local (let-bound). The only exception to this rule is that + (necessarily closed) types labelling <code>%external</code> expressions may + contain <code>tyvar</code> bindings that shadow outer bindings.</para> + + <para>Core generated by GHC makes heavy use of encoded names, in which + the characters <code>Z</code> and <code>z</code> are used to introduce escape sequences + for non-alphabetic characters such as dollar sign <code>$</code> (<code>zd</code>), hash <code>#</code> + (<code>zh</code>), plus <code>+</code> (<code>zp</code>), etc. This is the same encoding used in <code>.hi</code> + files and in the back-end of GHC itself, except that we + sometimes change an initial <code>z</code> to <code>Z</code>, or vice-versa, in order to + maintain case distinctions.</para> + + <para>Finally, note that hierarchical module names are z-encoded in + Core: for example, <code>base:GHC.Base.foo</code> is rendered as + <code>base:GHCziBase.foo</code>. A parser may reconstruct the module + hierarchy, or regard <code>GHCziBase</code> as a flat name.</para> + + </section> + <section id="types-and-kinds"> + <title>Types and Kinds</title> + + <para>In Core, all type abstractions and applications are explicit. + This make it easy to typecheck any (closed) fragment of Core + code. An full executable typechecker is available separately.</para> + + <section id="types"> + <title>Types</title> + + <para>Types are described by type expressions, which are built from + named type constructors and type variables using type + application and universal quantification. Each type + constructor has a fixed arity ≥ 0. Because it is so widely + used, there is special infix syntax for the fully-applied + function type constructor (<code>&arw;</code>). (The prefix identifier for + this constructor is <code>ghc-prim:GHC.Prim.ZLzmzgZR</code>; this should + only appear in unapplied or partially applied form.)</para> + + <para>There are also a number of other primitive type constructors + (e.g., <code>Intzh</code>) that are predefined in the <code>GHC.Prim</code> module, but + have no special syntax. <code>%data</code> and <code>%newtype</code> declarations + introduce additional type constructors, as described below. + Type constructors are distinguished solely by name.</para> + + </section> + <section id="coercions"> + <title>Coercions</title> + + <para>A type may also be built using one of the primitive coercion + operators, as described in <link linkend="namespaces">the Namespaces section</link>. For details on the + meanings of these operators, see the System FC paper <citation>system-fc</citation>. Also + see <link linkend="newtypes">the Newtypes section</link> for + examples of how GHC uses coercions in Core code.</para> + + </section> + <section id="kinds"> + <title>Kinds</title> + <para>As described in the Haskell definition, it is necessary to + distinguish well-formed type-expressions by classifying them + into different <emphasis>kinds</emphasis> <citation>haskell98, p. 41</citation><!-- TODO -->. In particular, Core + explicitly records the kind of every bound type variable.</para> + + <para>In addition, Core’s kind system includes equality kinds, as in + System FC <citation>system-fc</citation>. An application of a built-in coercion, or of a + user-defined coercion as introduced by a <code>newtype</code> declaration, + has an equality kind.</para> + + </section> + <section id="lifted-and-unlifted-types"> + <title>Lifted and Unlifted Types</title> + + <para>Semantically, a type is <emphasis>lifted</emphasis> if and only if it has bottom as + an element. We need to distinguish them because operationally, + terms with lifted types may be represented by closures; terms + with unlifted types must not be represented by closures, which + implies that any unboxed value is necessarily unlifted. We + distinguish between lifted and unlifted types by ascribing + them different kinds.</para> + + <para>Currently, all the primitive types are unlifted (including a + few boxed primitive types such as <code>ByteArrayzh</code>). Peyton-Jones + and Launchbury <citation>pj:unboxed</citation> described the ideas behind unboxed and + unlifted types.</para> + + </section> + <section id="type-constructors-base-kinds-and-higher-kinds"> + <title>Type Constructors; Base Kinds and Higher Kinds</title> + + <para>Every type constructor has a kind, depending on its arity and + whether it or its arguments are lifted.</para> + + <para>Term variables can only be assigned types that have base + kinds: the base kinds are <code>*</code>, <code>#</code>, and <code>?</code>. The three base kinds + distinguish the liftedness of the types they classify: <code>*</code> + represents lifted types; <code>#</code> represents unlifted types; and <code>?</code> is + the <quote>open</quote> kind, representing a type that may be either lifted + or unlifted. Of these, only <code>*</code> ever appears in Core type + declarations generated from user code; the other two are + needed to describe certain types in primitive (or otherwise + specially-generated) code (which, after optimization, could + potentially appear anywhere).</para> + + <para>In particular, no top-level identifier (except in + <code>ghc-prim:GHC.Prim</code>) has a type of kind <code>#</code> or <code>?</code>.</para> + + <para>Nullary type constructors have base kinds: for example, the + type <code>Int</code> has kind <code>*</code>, and <code>Int#</code> has kind <code>#</code>.</para> + + <para>Non-nullary type constructors have higher kinds: kinds that + have the form + k<subscript>1</subscript><code>&arw;</code>k<subscript>2</subscript>, where + k<subscript>1</subscript> and k<subscript>2</subscript> are + kinds. For example, the function type constructor <code>&arw;</code> has + kind <code>* &arw; (* &arw; *)</code>. Since Haskell allows abstracting + over type constructors, type variables may have higher kinds; + however, much more commonly they have kind <code>*</code>, so that is the + default if a type binder omits a kind.</para> + + </section> + + <section id="type-synonyms-and-type-equivalence"> + <title>Type Synonyms and Type Equivalence</title> + + <para>There is no mechanism for defining type synonyms + (corresponding to Haskell <code>type</code> declarations).</para> + + <para>Type equivalence is just syntactic equivalence on type + expressions (of base kinds) modulo:</para> + + <itemizedlist> + <listitem>alpha-renaming of variables bound in <code>%forall</code> types;</listitem> + <listitem>the identity a <code>&arw;</code> b ≡ <code>ghc-prim:GHC.Prim.ZLzmzgZR</code> a b</listitem> + </itemizedlist> + + </section> + </section> + <section id="algebraic-data-types"> + <title>Algebraic data types</title> + + <para>Each data declaration introduces a new type constructor and a + set of one or more data constructors, normally corresponding + directly to a source Haskell <code>data</code> declaration. For example, the + source declaration + + <programlisting language="haskell"> +data Bintree a = + Fork (Bintree a) (Bintree a) + | Leaf a + </programlisting> + + might induce the following Core declaration + + <programlisting language="java"> +%data Bintree a = { + Fork (Bintree a) (Bintree a); + Leaf a)} + </programlisting> + + which introduces the unary type constructor Bintree of kind + <code>*&arw;*</code> and two data constructors with types + + <programlisting language="java"> +Fork :: %forall a . Bintree a &arw; Bintree a &arw; Bintree a +Leaf :: %forall a . a &arw; Bintree a + </programlisting> + + We define the <emphasis>arity</emphasis> of each data constructor to be the number of + value arguments it takes; e.g. <code>Fork</code> has arity 2 and <code>Leaf</code> has + arity 1.</para> + + <para>For a less conventional example illustrating the possibility of + higher-order kinds, the Haskell source declaration + + <programlisting language="haskell"> +data A f a = MkA (f a) + </programlisting> + + might induce the Core declaration + + <programlisting language="java"> +%data A (f::*&arw;*) a = { MkA (f a) } + </programlisting> + + which introduces the constructor + + <programlisting language="java"> +MkA :: %forall (f::*&arw;*) a . (f a) &arw; (A f) a + </programlisting></para> + + <para>GHC (like some other Haskell implementations) supports an + extension to Haskell98 for existential types such as + + <programlisting language="haskell"> +data T = forall a . MkT a (a &arw; Bool) + </programlisting> + + This is represented by the Core declaration + + <programlisting language="java"> +%data T = {MkT @a a (a &arw; Bool)} + </programlisting> + + which introduces the nullary type constructor T and the data + constructor + + <programlisting language="java"> +MkT :: %forall a . a &arw; (a &arw; Bool) &arw; T + </programlisting> + + In general, existentially quantified variables appear as extra + universally quantified variables in the data contructor types. An + example of how to construct and deconstruct values of type <code>T</code> is + shown in <link linkend="expression-forms">the Expression Forms section</link>.</para> + + </section> + <section id="newtypes"> + <title>Newtypes</title> + + <para>Each Core <code>%newtype</code> declaration introduces a new type constructor + and an associated representation type, corresponding to a source + Haskell <code>newtype</code> declaration. However, unlike in source Haskell, + a <code>%newtype</code> declaration does not introduce any data constructors.</para> + + <para>Each <code>%newtype</code> declaration also introduces a new coercion + (syntactically, just another type constructor) that implies an + axiom equating the type constructor, applied to any type + variables bound by the <code>%newtype</code>, to the representation type.</para> + + <para>For example, the Haskell fragment + + <programlisting language="haskell"> +newtype U = MkU Bool +u = MkU True +v = case u of + MkU b &arw; not b + </programlisting> + + might induce the Core fragment + + <programlisting language="java"> +%newtype U ZCCoU = Bool; +u :: U = %cast (True) + ((%sym ZCCoU)); +v :: Bool = not (%cast (u) ZCCoU); + </programlisting></para> + + <para>The <code>newtype</code> declaration implies that the types <code>U</code> and <code>Bool</code> have + equivalent representations, and the coercion axiom <code>ZCCoU</code> + provides evidence that <code>U</code> is equivalent to <code>Bool</code>. Notice that in + the body of <code>u</code>, the boolean value <code>True</code> is cast to type <code>U</code> using + the primitive symmetry rule applied to <code>ZCCoU</code>: that is, using a + coercion of kind <code>Bool :=: U</code>. And in the body of <code>v</code>, <code>u</code> is cast + back to type <code>Bool</code> using the axiom <code>ZCCoU</code>.</para> + + <para>Notice that the <code>case</code> in the Haskell source code above translates + to a <code>cast</code> in the corresponding Core code. That is because + operationally, a <code>case</code> on a value whose type is declared by a + <code>newtype</code> declaration is a no-op. Unlike a <code>case</code> on any other + value, such a <code>case</code> does no evaluation: its only function is to + coerce its scrutinee’s type.</para> + + <para>Also notice that unlike in a previous draft version of External + Core, there is no need to handle recursive newtypes specially.</para> + + </section> + + <section id="expression-forms"> + <title>Expression Forms</title> + + <para>Variables and data constructors are straightforward.</para> + + <para>Literal (<emphasis role="variable">lit</emphasis>) expressions consist of a literal value, in one of + four different formats, and a (primitive) type annotation. Only + certain combinations of format and type are permitted; + see <link linkend="primitive-module">the Primitive Module section</link>. + The character and string formats can describe only 8-bit ASCII characters.</para> + + <para>Moreover, because the operational semantics for Core interprets + strings as C-style null-terminated strings, strings should not + contain embedded nulls.</para> + + <para>In Core, value applications, type applications, value + abstractions, and type abstractions are all explicit. To tell + them apart, type arguments in applications and formal type + arguments in abstractions are preceded by an <code>@ symbol</code>. (In + abstractions, the <code>@</code> plays essentially the same role as the more + usual Λ symbol.) For example, the Haskell source declaration + + <programlisting language="haskell"> +f x = Leaf (Leaf x) + </programlisting> + + might induce the Core declaration + + <programlisting language="java"> +f :: %forall a . a &arw; BinTree (BinTree a) = + \ @a (x::a) &arw; Leaf @(Bintree a) (Leaf @a x) + </programlisting></para> + + <para>Value applications may be of user-defined functions, data + constructors, or primitives. None of these sorts of applications + are necessarily saturated.</para> + + <para>Note that the arguments of type applications are not always of + kind <code>*</code>. For example, given our previous definition of type <code>A</code>: + + <programlisting language="haskell"> +data A f a = MkA (f a) + </programlisting> + + the source code + + <programlisting language="haskell"> +MkA (Leaf True) + </programlisting> + + becomes + + <programlisting language="java"> +(MkA @Bintree @Bool) (Leaf @Bool True) + </programlisting></para> + + <para>Local bindings, of a single variable or of a set of mutually + recursive variables, are represented by <code>%let</code> expressions in the + usual way.</para> + + <para>By far the most complicated expression form is <code>%case</code>. <code>%case</code> + expressions are permitted over values of any type, although they + will normally be algebraic or primitive types (with literal + values). Evaluating a <code>%case</code> forces the evaluation of the + expression being tested (the <quote>scrutinee</quote>). The value of the + scrutinee is bound to the variable following the <code>%of</code> keyword, + which is in scope in all alternatives; this is useful when the + scrutinee is a non-atomic expression (see next example). The + scrutinee is preceded by the type of the entire <code>%case</code> + expression: that is, the result type that all of the <code>%case</code> + alternatives have (this is intended to make type reconstruction + easier in the presence of type equality coercions).</para> + + <para>In an algebraic <code>%case</code>, all the case alternatives must be labeled + with distinct data constructors from the algebraic type, + followed by any existential type variable bindings (see below), + and typed term variable bindings corresponding to the data + constructor’s arguments. The number of variables must match the + data constructor’s arity.</para> + + <para>For example, the following Haskell source expression + + <programlisting language="haskell"> +case g x of + Fork l r &arw; Fork r l + t@(Leaf v) &arw; Fork t t + </programlisting> + + might induce the Core expression + + <programlisting language="java"> +%case ((Bintree a)) g x %of (t::Bintree a) + Fork (l::Bintree a) (r::Bintree a) &arw; + Fork @a r l + Leaf (v::a) &arw; + Fork @a t t + </programlisting></para> + + <para>When performing a <code>%case</code> over a value of an + existentially-quantified algebraic type, the alternative must + include extra local type bindings for the + existentially-quantified variables. For example, given + + <programlisting language="haskell"> +data T = forall a . MkT a (a &arw; Bool) + </programlisting> + + the source + + <programlisting language="haskell"> +case x of + MkT w g &arw; g w + </programlisting> + + becomes + + <programlisting language="java"> +%case x %of (x’::T) + MkT @b (w::b) (g::b&arw;Bool) &arw; g w + </programlisting></para> + + <para>In a <code>%case</code> over literal alternatives, all the case alternatives + must be distinct literals of the same primitive type.</para> + + <para>The list of alternatives may begin with a default alternative + labeled with an underscore (<code>%_</code>), whose right-hand side will be + evaluated if none of the other alternatives match. The default + is optional except for in a case over a primitive type, or when + there are no other alternatives. If the case is over neither an + algebraic type nor a primitive type, then the list of + alternatives must contain a default alternative and nothing + else. For algebraic cases, the set of alternatives need not be + exhaustive, even if no default is given; if alternatives are + missing, this implies that GHC has deduced that they cannot + occur.</para> + + <para><code>%cast</code> is used to manipulate newtypes, as described in + <link linkend="newtypes">the Newtype section</link>. The <code>%cast</code> expression + takes an expression and a coercion: syntactically, the coercion + is an arbitrary type, but it must have an equality kind. In an + expression <code>(cast e co)</code>, if <code>e :: T</code> and <code>co</code> has kind <code>T :=: U</code>, then + the overall expression has type <code>U</code> <citation>ghc-fc-commentary</citation>. Here, <code>co</code> must be a + coercion whose left-hand side is <code>T</code>.</para> + + <para>Note that unlike the <code>%coerce</code> expression that existed in previous + versions of Core, this means that <code>%cast</code> is (almost) type-safe: + the coercion argument provides evidence that can be verified by + a typechecker. There are still unsafe <code>%cast</code>s, corresponding to + the unsafe <code>%coerce</code> construct that existed in old versions of + Core, because there is a primitive unsafe coercion type that can + be used to cast arbitrary types to each other. GHC uses this for + such purposes as coercing the return type of a function (such as + error) which is guaranteed to never return: + + <programlisting language="haskell"> +case (error "") of + True &arw; 1 + False &arw; 2 + </programlisting> + + becomes: + + <programlisting language="java"> +%cast (error @ Bool (ZMZN @ Char)) +(%unsafe Bool Integer); + </programlisting> + + <code>%cast</code> has no operational meaning and is only used in + typechecking.</para> + + <para>A <code>%note</code> expression carries arbitrary internal information that + GHC finds interesting. The information is encoded as a string. + Expression notes currently generated by GHC include the inlining + pragma (<code>InlineMe</code>) and cost-center labels for profiling.</para> + + <para>A <code>%external</code> expression denotes an external identifier, which has + the indicated type (always expressed in terms of Haskell + primitive types). External Core supports two kinds of external + calls: <code>%external</code> and <code>%dynexternal</code>. Only the former is supported + by the current set of stand-alone Core tools. In addition, there + is a <code>%label</code> construct which GHC may generate but which the Core + tools do not support.</para> + + <para>The present syntax for externals is sufficient for describing C + functions and labels. Interfacing to other languages may require + additional information or a different interpretation of the name + string.</para> + + </section> + + <section id="expression-evaluation"> + <title>Expression Evaluation</title> + <para>The dynamic semantics of Core are defined on the type-erasure of + the program: for example, we ignore all type abstractions and + applications. The denotational semantics of the resulting + type-free program are just the conventional ones for a + call-by-name language, in which expressions are only evaluated + on demand. But Core is intended to be a call-by-<emphasis>need</emphasis> language, + in which expressions are only evaluated once. To express the + sharing behavior of call-by-need, we give an operational model + in the style of Launchbury <citation>launchbury93natural</citation>.</para> + + <para>This section describes the model informally; a more formal + semantics is separately available as an executable interpreter.</para> + + <para>To simplify the semantics, we consider only <quote>well-behaved</quote> Core + programs in which constructor and primitive applications are + fully saturated, and in which non-trivial expresssions of + unlifted kind (<code>#</code>) appear only as scrutinees in <code>%case</code> + expressions. Any program can easily be put into this form; a + separately available preprocessor illustrates how. In the + remainder of this section, we use <quote>Core</quote> to mean <quote>well-behaved</quote> + Core.</para> + + <para>Evaluating a Core expression means reducing it to <emphasis>weak-head normal form (WHNF)</emphasis>, + i.e., a primitive value, lambda abstraction, + or fully-applied data constructor. Evaluating a program means + evaluating the expression <code>main:ZCMain.main</code>.</para> + + <para>To make sure that expression evaluation is shared, we make use + of a <emphasis>heap</emphasis>, which contains <emphasis>heap entries</emphasis>. A heap entry can be: + + <itemizedlist> + <listitem> + A <emphasis>thunk</emphasis>, representing an unevaluated expression, also known + as a suspension. + </listitem> + <listitem> + A <emphasis>WHNF</emphasis>, representing an evaluated expression. The result of + evaluating a thunk is a WHNF. A WHNF is always a closure + (corresponding to a lambda abstraction in the source + program) or a data constructor application: computations + over primitive types are never suspended. + </listitem> + </itemizedlist></para> + + <para><emphasis>Heap pointers</emphasis> point to heap entries: at different times, the + same heap pointer can point to either a thunk or a WHNF, because + the run-time system overwrites thunks with WHNFs as computation + proceeds.</para> + + <para>The suspended computation that a thunk represents might + represent evaluating one of three different kinds of expression. + The run-time system allocates a different kind of thunk + depending on what kind of expression it is: + + <itemizedlist> + <listitem> + A thunk for a value definition has a group of suspended + defining expressions, along with a list of bindings between + defined names and heap pointers to those suspensions. (A + value definition may be a recursive group of definitions or + a single non-recursive definition, and it may be top-level + (global) or <code>let</code>-bound (local)). + </listitem> + <listitem> + A thunk for a function application (where the function is + user-defined) has a suspended actual argument expression, + and a binding between the formal argument and a heap pointer + to that suspension. + </listitem> + <listitem> + A thunk for a constructor application has a suspended actual + argument expression; the entire constructed value has a heap + pointer to that suspension embedded in it. + </listitem> + </itemizedlist></para> + + <para>As computation proceeds, copies of the heap pointer for a given + thunk propagate through the executing program. When another + computation demands the result of that thunk, the thunk is + <emphasis>forced</emphasis>: the run-time system computes the thunk’s result, + yielding a WHNF, and overwrites the heap entry for the thunk + with the WHNF. Now, all copies of the heap pointer point to the + new heap entry: a WHNF. Forcing occurs only in the context of + + <itemizedlist> + <listitem>evaluating the operator expression of an application;</listitem> + <listitem>evaluating the scrutinee of a <code>case</code> expression; or</listitem> + <listitem>evaluating an argument to a primitive or external function application</listitem> + </itemizedlist> + </para> + + <para>When no pointers to a heap entry (whether it is a thunk or WHNF) + remain, the garbage collector can reclaim the space it uses. We + assume this happens implicitly.</para> + + <para>With the exception of functions, arrays, and mutable variables, + we intend that values of all primitive types should be held + <emphasis>unboxed</emphasis>: they should not be heap-allocated. This does not + violate call-by-need semantics: all primitive types are + <emphasis>unlifted</emphasis>, which means that values of those types must be + evaluated strictly. Unboxed tuple types are not heap-allocated + either.</para> + + <para>Certain primitives and <code>%external</code> functions cause side-effects to + state threads or to the real world. Where the ordering of these + side-effects matters, Core already forces this order with data + dependencies on the pseudo-values representing the threads.</para> + + <para>An implementation must specially support the <code>raisezh</code> and + <code>handlezh</code> primitives: for example, by using a handler stack. + Again, real-world threading guarantees that they will execute in + the correct order.</para> + + </section> + </section> + <section id="primitive-module"> + <title>Primitive Module</title> + + <para>The semantics of External Core rely on the contents and informal + semantics of the primitive module <code>ghc-prim:GHC.Prim</code>. Nearly all + the primitives are required in order to cover GHC’s implementation + of the Haskell98 standard prelude; the only operators that can be + completely omitted are those supporting the byte-code interpreter, + parallelism, and foreign objects. Some of the concurrency + primitives are needed, but can be given degenerate implementations + if it desired to target a purely sequential backend (see Section + <link linkend="non-concurrent-back-end">the Non-concurrent Back End section</link>).</para> + + <para>In addition to these primitives, a large number of C library + functions are required to implement the full standard Prelude, + particularly to handle I/O and arithmetic on less usual types.</para> + + <para>For a full listing of the names and types of the primitive + operators, see the GHC library documentation <citation>ghcprim</citation>.</para> + + <section id="non-concurrent-back-end"> + <title>Non-concurrent Back End</title> + + <para>The Haskell98 standard prelude doesn’t include any concurrency + support, but GHC’s implementation of it relies on the existence + of some concurrency primitives. However, it never actually forks + multiple threads. Hence, the concurrency primitives can be given + degenerate implementations that will work in a non-concurrent + setting, as follows:</para> + + <itemizedlist> + <listitem> + <code>ThreadIdzh</code> can be represented by a singleton type, whose + (unique) value is returned by <code>myThreadIdzh</code>. + </listitem> + <listitem> + <code>forkzh</code> can just die with an <quote>unimplemented</quote> message. + </listitem> + <listitem> + <code>killThreadzh</code> and <code>yieldzh</code> can also just die <quote>unimplemented</quote> + since in a one-thread world, the only thread a thread can + kill is itself, and if a thread yields the program hangs. + </listitem> + <listitem> + <code>MVarzh a</code> can be represented by <code>MutVarzh (Maybe a)</code>; where a + concurrent implementation would block, the sequential + implementation can just die with a suitable message (since + no other thread exists to unblock it). + </listitem> + <listitem> + <code>waitReadzh</code> and <code>waitWritezh</code> can be implemented using a <code>select</code> + with no timeout. + </listitem> + </itemizedlist> + </section> + + <section id="literals"> + <title>Literals</title> + + <para>Only the following combination of literal forms and types are + permitted:</para> + + <informaltable frame="none" colsep="0" rowsep="0"> + <tgroup cols='3'> + <colspec colname="literal" align="left" colwidth="*" /> + <colspec colname="type" align="left" colwidth="*" /> + <colspec colname="description" align="left" colwidth="4*" /> + <thead> + <row> + <entry>Literal form</entry> + <entry>Type</entry> + <entry>Description</entry> + </row> + </thead> + <tbody> + <row> + <entry morerows="3" valign="top">integer</entry> + <entry><code>Intzh</code></entry> + <entry>Int</entry> + </row> + <row> + <entry><code>Wordzh</code></entry> + <entry>Word</entry> + </row> + <row> + <entry><code>Addrzh</code></entry> + <entry>Address</entry> + </row> + <row> + <entry><code>Charzh</code></entry> + <entry>Unicode character code</entry> + </row> + + <row> + <entry morerows="1" valign="top">rational</entry> + <entry><code>Floatzh</code></entry> + <entry>Float</entry> + </row> + <row> + <entry><code>Doublezh</code></entry> + <entry>Double</entry> + </row> + + <row> + <entry>character</entry> + <entry><code>Charzh</code></entry> + <entry>Unicode character specified by ASCII character</entry> + </row> + + <row> + <entry>string</entry> + <entry><code>Addrzh</code></entry> + <entry>Address of specified C-format string</entry> + </row> + </tbody> + </tgroup> + </informaltable> + </section> + </section> + + + <bibliolist> + <!-- This bibliography was semi-automatically converted by JabRef from core.bib. --> + + <title>References</title> + + <biblioentry> + <abbrev>ghc-user-guide</abbrev> + <authorgroup> + <author><surname>The GHC Team</surname></author> + </authorgroup> + <citetitle pubwork="article">The Glorious Glasgow Haskell Compilation System User's Guide, Version 6.8.2</citetitle> + <pubdate>2008</pubdate> + <bibliomisc><ulink url="http://www.haskell.org/ghc/docs/latest/html/users_guide/index.html">http://www.haskell.org/ghc/docs/latest/html/users_guide/index.html</ulink></bibliomisc> + </biblioentry> + + <biblioentry> + <abbrev>ghc-fc-commentary</abbrev> + <authorgroup> + <author><surname>GHC Wiki</surname></author> + </authorgroup> + <citetitle pubwork="article">System FC: equality constraints and coercions</citetitle> + <pubdate>2006</pubdate> + <bibliomisc><ulink url="http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/FC">http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/FC</ulink></bibliomisc> + </biblioentry> + + <biblioentry> + <abbrev>ghc-api</abbrev> + <authorgroup> + <author><surname>Haskell Wiki</surname></author> + </authorgroup> + <citetitle pubwork="article">Using GHC as a library</citetitle> + <pubdate>2007</pubdate> + <bibliomisc><ulink url="http://haskell.org/haskellwiki/GHC/As_a_library">http://haskell.org/haskellwiki/GHC/As_a_library</ulink></bibliomisc> + </biblioentry> + + <biblioentry> + <abbrev>haskell98</abbrev> + <authorgroup> + <editor><firstname>Simon</firstname><surname>Peyton-Jones</surname></editor> + </authorgroup> + <citetitle pubwork="article">Haskell 98 Language and Libraries: The Revised Report</citetitle> + <publisher> + <publishername>Cambridge University Press</publishername> + <address> + <city>Cambridge></city> + <state>UK</state> + </address> + </publisher> + <pubdate>2003</pubdate> + </biblioentry> + + <biblioentry> + <abbrev>system-fc</abbrev> + <authorgroup> + <author><firstname>Martin</firstname><surname>Sulzmann</surname></author> + <author><firstname>Manuel M.T.</firstname><surname>Chakravarty</surname></author> + <author><firstname>Simon</firstname><surname>Peyton-Jones</surname></author> + <author><firstname>Kevin</firstname><surname>Donnelly</surname></author> + </authorgroup> + <citetitle pubwork="article">System F with type equality coercions</citetitle> + <publisher> + <publishername>ACM</publishername> + <address> + <city>New York</city> + <state>NY</state> + <country>USA</country> + </address> + </publisher> + <artpagenums>53-66</artpagenums> + <pubdate>2007</pubdate> + <bibliomisc><ulink url="http://portal.acm.org/citation.cfm?id=1190324">http://portal.acm.org/citation.cfm?id=1190324</ulink></bibliomisc> + <!-- booktitle = {{TLDI '07: Proceedings of the 2007 ACM SIGPLAN International Workshop on Types in Language Design and Implementation}}, --> + </biblioentry> + + <biblioentry> + <abbrev>gadts</abbrev> + <authorgroup> + <author><firstname>Simon</firstname><surname>Peyton-Jones</surname></author> + <author><firstname>Dimitrios</firstname><surname>Vytiniotis</surname></author> + <author><firstname>Stephanie</firstname><surname>Weirich</surname></author> + <author><firstname>Geoffrey</firstname><surname>Washburn</surname></author> + </authorgroup> + <citetitle pubwork="article">Simple unification-based type inference for GADTs</citetitle> + <publisher> + <publishername>ACM</publishername> + <address> + <city>New York</city> + <state>NY</state> + <country>USA</country> + </address> + </publisher> + <artpagenums>50-61</artpagenums> + <pubdate>2006</pubdate> + <bibliomisc><ulink url="http://research.microsoft.com/Users/simonpj/papers/gadt/index.htm">http://research.microsoft.com/Users/simonpj/papers/gadt/index.htm</ulink></bibliomisc> + </biblioentry> + + <biblioentry> + <abbrev>Launchbury94</abbrev> + <authorgroup> + <author><firstname>John</firstname><surname>Launchbury</surname></author> + <author><firstname>Simon L.</firstname><surname>Peyton-Jones</surname></author> + </authorgroup> + <citetitle pubwork="article">Lazy Functional State Threads</citetitle> + <artpagenums>24-35</artpagenums> + <pubdate>1994</pubdate> + <bibliomisc><ulink url="http://citeseer.ist.psu.edu/article/launchbury93lazy.html">http://citeseer.ist.psu.edu/article/launchbury93lazy.html</ulink></bibliomisc> + <!-- booktitle = "{SIGPLAN} {Conference} on {Programming Language Design and Implementation}", --> + </biblioentry> + + <biblioentry> + <abbrev>pj:unboxed</abbrev> + <authorgroup> + <author><firstname>Simon L.</firstname><surname>Peyton-Jones</surname></author> + <author><firstname>John</firstname><surname>Launchbury</surname></author> + <editor><firstname>J.</firstname><surname>Hughes</surname></editor> + </authorgroup> + <citetitle pubwork="article">Unboxed Values as First Class Citizens in a Non-strict Functional Language</citetitle> + <publisher> + <publishername>Springer-Verlag LNCS523</publishername> + <address> + <city>Cambridge</city> + <state>Massachussetts</state> + <country>USA</country> + </address> + </publisher> + <artpagenums>636-666</artpagenums> + <pubdate>1991, August 26-28</pubdate> + <bibliomisc><ulink url="http://citeseer.ist.psu.edu/jones91unboxed.html">http://citeseer.ist.psu.edu/jones91unboxed.html</ulink></bibliomisc> + <!-- booktitle = "Proceedings of the Conference on Functional Programming and Computer Architecture", --> + </biblioentry> + + <biblioentry> + <abbrev>ghc-inliner</abbrev> + <authorgroup> + <author><firstname>Simon</firstname><surname>Peyton-Jones</surname></author> + <author><firstname>Simon</firstname><surname>Marlow</surname></author> + </authorgroup> + <citetitle pubwork="article">Secrets of the Glasgow Haskell Compiler inliner</citetitle> + <pubdate>1999</pubdate> + <address> + <city>Paris</city> + <country>France</country> + </address> + <bibliomisc><ulink url="http://research.microsoft.com/Users/simonpj/Papers/inlining/inline.pdf">http://research.microsoft.com/Users/simonpj/Papers/inlining/inline.pdf</ulink></bibliomisc> + <!-- booktitle = "Workshop on Implementing Declarative Languages", --> + </biblioentry> + + <biblioentry> + <abbrev>comp-by-trans-scp</abbrev> + <authorgroup> + <author><firstname>Simon L.</firstname><surname>Peyton-Jones</surname></author> + <author><firstname>A. L. M.</firstname><surname>Santos</surname></author> + </authorgroup> + <citetitle pubwork="article">A transformation-based optimiser for Haskell</citetitle> + <citetitle pubwork="journal">Science of Computer Programming</citetitle> + <volumenum>32</volumenum> + <issuenum>1-3</issuenum> + <artpagenums>3-47</artpagenums> + <pubdate>1998</pubdate> + <bibliomisc><ulink url="http://citeseer.ist.psu.edu/peytonjones98transformationbased.html">http://citeseer.ist.psu.edu/peytonjones98transformationbased.html</ulink></bibliomisc> + </biblioentry> + + <biblioentry> + <abbrev>stg-machine</abbrev> + <authorgroup> + <author><firstname>Simon L.</firstname><surname>Peyton-Jones</surname></author> + </authorgroup> + <citetitle pubwork="article">Implementing Lazy Functional Languages on Stock Hardware: The Spineless Tagless G-Machine</citetitle> + <citetitle pubwork="journal">Journal of Functional Programming</citetitle> + <volumenum>2</volumenum> + <issuenum>2</issuenum> + <artpagenums>127-202</artpagenums> + <pubdate>1992</pubdate> + <bibliomisc><ulink url="http://citeseer.ist.psu.edu/peytonjones92implementing.html">http://citeseer.ist.psu.edu/peytonjones92implementing.html</ulink></bibliomisc> + </biblioentry> + + <biblioentry> + <abbrev>launchbury93natural</abbrev> + <authorgroup> + <author><firstname>John</firstname><surname>Launchbury</surname></author> + </authorgroup> + <citetitle pubwork="article">A Natural Semantics for Lazy Evaluation</citetitle> + <artpagenums>144-154</artpagenums> + <address> + <city>Charleston</city> + <state>South Carolina</state> + </address> + <pubdate>1993</pubdate> + <bibliomisc><ulink url="citeseer.ist.psu.edu/launchbury93natural.html">citeseer.ist.psu.edu/launchbury93natural.html</ulink></bibliomisc> + <!-- booktitle = "Conference Record of the Twentieth Annual {ACM} {SIGPLAN}-{SIGACT} Symposium on Principles of Programming Languages", --> + </biblioentry> + + <biblioentry> + <abbrev>ghcprim</abbrev> + <authorgroup> + <author><surname>The GHC Team</surname></author> + </authorgroup> + <citetitle pubwork="article">Library documentation: GHC.Prim</citetitle> + <pubdate>2008</pubdate> + <bibliomisc><ulink url="http://www.haskell.org/ghc/docs/latest/html/libraries/base/GHC-Prim.html">http://www.haskell.org/ghc/docs/latest/html/libraries/base/GHC-Prim.html</ulink></bibliomisc> + </biblioentry> + </bibliolist> + +</chapter> diff --git a/docs/users_guide/ug-book.xml.in b/docs/users_guide/ug-book.xml.in index 1ff487c2ed..4d4489f80f 100644 --- a/docs/users_guide/ug-book.xml.in +++ b/docs/users_guide/ug-book.xml.in @@ -17,6 +17,7 @@ &lang-features; &ffi-chap; &extending-ghc; +&external-core; &wrong; &utils; &win32-dll; diff --git a/docs/users_guide/ug-ent.xml.in b/docs/users_guide/ug-ent.xml.in index c83abaad52..37df56889b 100644 --- a/docs/users_guide/ug-ent.xml.in +++ b/docs/users_guide/ug-ent.xml.in @@ -12,6 +12,7 @@ <!ENTITY sooner SYSTEM "sooner.xml" > <!ENTITY lang-features SYSTEM "lang.xml" > <!ENTITY glasgowexts SYSTEM "glasgow_exts.xml" > +<!ENTITY external-core SYSTEM "external_core.xml" > <!ENTITY packages SYSTEM "packages.xml" > <!ENTITY parallel SYSTEM "parallel.xml" > <!ENTITY safehaskell SYSTEM "safe_haskell.xml" > @@ -28,3 +29,4 @@ <!ENTITY libraryBaseLocation "../libraries/base-@LIBRARY_base_VERSION@"> <!ENTITY libraryCabalLocation "../libraries/Cabal-@LIBRARY_Cabal_VERSION@"> <!ENTITY libraryGhcPrimLocation "../libraries/ghc-prim-@LIBRARY_ghc_prim_VERSION@"> +<!ENTITY arw "->"> |