summaryrefslogtreecommitdiff
path: root/ghc/docs/add_to_compiler/front-end.verb
diff options
context:
space:
mode:
Diffstat (limited to 'ghc/docs/add_to_compiler/front-end.verb')
-rw-r--r--ghc/docs/add_to_compiler/front-end.verb297
1 files changed, 0 insertions, 297 deletions
diff --git a/ghc/docs/add_to_compiler/front-end.verb b/ghc/docs/add_to_compiler/front-end.verb
deleted file mode 100644
index 5c3c41dd38..0000000000
--- a/ghc/docs/add_to_compiler/front-end.verb
+++ /dev/null
@@ -1,297 +0,0 @@
-%************************************************************************
-%* *
-\subsection{The front end of the compiler}
-\label{sec:front-end}
-%* *
-%************************************************************************
-
-The previous section covered the main points about the front end of
-the compiler: it is dominated by a ``renamer'' and a typechecker
-working directly at the Haskell source level. In this section, we
-will look at some basic datatypes used or introduced in the front
-end---ones that you may see as input to your optimisation pass.
-
-\subsubsection{``Abstract syntax'', a source-level datatype}
-\label{sec:AbsSyntax}
-
-As Figure~\ref{fig:overview} shows, the typechecker both reads and
-writes a collection of datatypes we call ``Abstract syntax.''
-This is misleading in that what
-goes into the typechecker is quite different from what comes out.
-
-Let's first consider this fragment of the abstract-syntax
-definition\srcloc{abstractSyn/HsExpr.lhs}, for Haskell explicit-list
-expressions (Haskell report, section~3.5
-\cite{hudak91a}):\nopagebreak[4]
-\begin{mytightcode}
-data Expr var pat =
- ...
- | ExplicitList [Expr var pat]
- | ExplicitListOut UniType [Expr var pat]
- ...
-
-type ProtoNameExpr = Expr ProtoName ProtoNamePat
-type RenamedExpr = Expr Name RenamedPat
-type TypecheckedExpr = Expr Id TypecheckedPat\end{mytightcode}
-an @ExplicitList@ appears only in typechecker input; an @ExplicitListOut@
-is the corresponding construct that appears
-only in the output, with the inferred type information attached.
-
-The fragment above also shows how abstract syntax is {\em parameterised}
-with respect to variables and patterns. The idea is the same for
-both; let's just consider variables.
-
-The renamer converts @ProtoNameExprs@ (expressions with
-@ProtoNames@\srcloc{basicTypes/ProtoName.lhs} as variables---little
-more than just strings) into @RenamedExprs@, which have all naming sorted out
-(using @Names@\srcloc{abstractSyn/Name.lhs}). A @Name@ is known to be
-properly bound, isn't duplicated, etc.; it's known if it's bound to a
-built-in standard-prelude entity.
-
-(The renamer also does dependency analysis, which is required to
-maximise polymorphism in a Hindley-Milner type system.)
-
-The typechecker reads the @RenamedExprs@, sorts out the types, and
-spits out @TypecheckedExprs@, with variables represented by
-@Ids@\srcloc{basicTypes/Id.lhs}. You can find out just about anything
-you want to know about a variable from its @Id@.
-
-To see what GHC makes of some Haskell, in a file @Foo.hs@, say:
-try @ghc -noC -ddump-rn4 Foo.hs@, to see what comes out of the renamer [pass~4];
-try @ghc -noC -ddump-tc Foo.hs@, to see what comes out of the typechecker.
-
-\subsubsection{Basic datatypes in the compiler}
-
-None of the internal datatypes in the example just given are
-particularly interesting except @Ids@\srcloc{basicTypes/Id.lhs}. A
-program variable, which enters the typechecker as a string, emerges as
-an @Id@.
-
-The important thing about @Id@---and the datatypes representing other
-basic program entities (type variables, type constructors, classes,
-etc.)---is that they often include {\em memoised} information that can
-be used throughout the rest of the compiler.
-
-Let us take a cursory look at @Ids@, as a representative example of
-these basic data types. (Don't be too scared---@Ids@ are the hairiest
-entities in the whole compiler!)
-Here we go:
-\begin{mytightcode}
-\codeallowbreaks{}data Id
- = Id Unique {\dcd\rm-- key for fast comparison}
- UniType {\dcd\rm-- Id's type; used all the time;}
- IdInfo {\dcd\rm-- non-essential info about this Id;}
- PragmaInfo {\dcd\rm-- user-specified pragmas about this Id;}
- IdDetails {\dcd\rm-- stuff about individual kinds of Ids.}\end{mytightcode}
-
-So, every @Id@ comes with:
-\begin{enumerate}
-\item
-A @Unique@\srcloc{basicTypes/Unique.lhs}, essentially a unique
-@Int@, for fast comparison;
-\item
-A @UniType@ (more on them below... section~\ref{sec:UniType}) giving the variable's
-type---this is the most useful memoised information.
-\item
-A @PragmaInfo@, which is pragmatic stuff the user specified for
-this @Id@; e.g., @INLINE@ it; GHC does not promise to honour these
-pragma requests, but this is where it keeps track of them.
-\item
-An @IdInfo@ (more on {\em them} below... section~\ref{sec:IdInfo}),
-which tells you all the extra things
-that the compiler doesn't {\em have} to know about an @Id@, but it's jolly nice...
-This corresponds pretty closely to the @GHC_PRAGMA@ cross-module info that you will
-see in any interface produced using @ghc -O@.
-An example of some @IdInfo@
-would be: that @Id@'s unfolding; or its arity.
-\end{enumerate}
-
-Then the fun begins with @IdDetails@...
-\begin{mytightcode}
-\codeallowbreaks{}data IdDetails
-
- {\dcd\rm---------------- Local values}
-
- = LocalId ShortName {\dcd\rm-- mentioned by the user}
-
- | SysLocalId ShortName {\dcd\rm-- made up by the compiler}
-
- {\dcd\rm---------------- Global values}
-
- | ImportedId FullName {\dcd\rm-- Id imported from an interface}
-
- | PreludeId FullName {\dcd\rm-- Prelude things the compiler ``knows'' about}
-
- | TopLevId FullName {\dcd\rm-- Top-level in the orig source pgm}
- {\dcd\rm-- (not moved there by transformations).}
-
- {\dcd\rm---------------- Data constructors}
-
- | DataConId FullName
- ConTag
- [TyVarTemplate] ThetaType [UniType] TyCon
- {\dcd\rm-- split-up type: the constructor's type is:}
- {\dcd\rm-- $\forall~tyvars . theta\_ty \Rightarrow$}
- {\dcd\rm-- $unitype_1 \rightarrow~ ... \rightarrow~ unitype_n \rightarrow~ tycon tyvars$}
-
- | TupleCon Int {\dcd\rm-- its arity}
-
- {\dcd\rm-- There are quite a few more flavours of {\tt IdDetails}...}\end{mytightcode}
-
-% A @ShortName@,\srcloc{basicTypes/NameTypes.lhs} which includes a name string
-% and a source-line location for the name's binding occurrence;
-
-In short: everything that later parts of the compiler might want to
-know about a local @Id@ is readily at hand. The same principle holds
-true for imported-variable and data-constructor @Ids@ (tuples are an
-important enough case to merit special pleading), as well as for other
-basic program entities. Here are a few further notes about the @Id@
-fragments above:
-\begin{itemize}
-\item
-A @FullName@\srcloc{basicTypes/NameTypes.lhs} is one that may be
-globally visible, with a module-name as well; it may have been
-renamed.
-\item
-@DataConKey@\srcloc{prelude/PrelUniqs.lhs} is a specialised
-fast-comparison key for data constructors; there are several of these
-kinds of things.
-\item
-The @UniType@ for @DataConIds@ is given in {\em two} ways: once, just as
-a plain type; secondly, split up into its component parts. This is to
-save frequently re-splitting such types.
-\item
-Similarly, a @TupleCon@ has a type attached, even though we could
-construct it just from the arity.
-\end{itemize}
-
-\subsubsection{@UniTypes@, representing types in the compiler}
-\label{sec:UniType}
-
-Let us look further at @UniTypes@\srcloc{uniType/}. Their definition
-is:
-\begin{mytightcode}
-\codeallowbreaks{}data UniType
- = UniTyVar TyVar
-
- | UniFun UniType {\dcd\rm-- function type}
- UniType
-
- | UniData TyCon {\dcd\rm-- non-synonym datatype}
- [UniType]
-
- | UniSyn TyCon {\dcd\rm-- type synonym}
- [UniType] {\dcd\rm--\ \ unexpanded form}
- UniType {\dcd\rm--\ \ expanded form}
-
- | UniDict Class {\dcd\rm-- for types with overloading}
- UniType
-
- {\dcd\rm-- The next two are to do with universal quantification.}
- | UniTyVarTemplate TyVarTemplate
-
- | UniForall TyVarTemplate
- UniType\end{mytightcode}
-When the typechecker processes a source module, it adds @UniType@
-information to all the basic entities (e.g., @Ids@), among other
-places (see Section~\ref{sec:second-order} for more details). These
-types are used throughout the compiler.
-
-The following example shows several things about @UniTypes@.
-If a programmer wrote @(Eq a) => a -> [a]@, it would be represented
-as:\footnote{The internal structures of @Ids@,
-@Classes@, @TyVars@, and @TyCons@ are glossed over here...}
-\begin{mytightcode}
-\codeallowbreaks{}UniForall {\dcd$\alpha$}
- (UniFun (UniDict {\dcd\em Eq} (UniTyVar {\dcd$\alpha$}))
- (UniFun (UniTyVarTemplate {\dcd$\alpha$})
- (UniData {\dcd\em listTyCon}
- [UniTyVarTemplate {\dcd$\alpha$}])))\end{mytightcode}
-From this example we see:
-\begin{itemize}
-\item
-The universal quantification of the type variable $\alpha$ is made explicit
-(with a @UniForall@).
-\item
-The class assertion @(Eq a)@ is represented with a @UniDict@ whose
-second component is a @UniType@---this
-reflects the fact that we expect @UniType@ to be used in a stylized
-way, avoiding nonsensical constructs (e.g.,
-@(UniDict f (UniDict g (UniDict h ...)))@).
-\item
-The ``double arrow'' (@=>@) of the Haskell source, indicating an
-overloaded type, is represented by the usual
-@UniFun@ ``single arrow'' (@->@), again in a stylized way.
-This change reflects the fact that each class assertion in a
-function's type is implemented by adding an extra dictionary argument.
-\item
-In keeping with the memoising tradition we saw with @Ids@, type
-synonyms (@UniSyns@) keep both the unexpanded and expanded forms handy.
-\end{itemize}
-
-\subsubsection{@IdInfo@: extra pragmatic info about an @Id@}
-\label{sec:IdInfo}
-
-[New in 0.16.] All the nice-to-have-but-not-essential information
-about @Ids@ is now hidden in the
-@IdInfo@\srcloc{basicTypes/IdInfo.lhs} datatype. It looks something
-like:
-\begin{mytightcode}
-\codeallowbreaks{}data IdInfo
- = NoIdInfo {\dcd\rm-- OK, we know nothing...}
-
- | MkIdInfo
- ArityInfo {\dcd\rm-- its arity}
- DemandInfo {\dcd\rm-- whether or not it is definitely demanded}
- InliningInfo {\dcd\rm-- whether or not we should inline it}
- SpecialisationInfo {\dcd\rm-- specialisations of this overloaded}
- {\dcd\rm-- function which exist}
- StrictnessInfo {\dcd\rm-- strictness properties, notably}
- {\dcd\rm-- how to conjure up ``worker'' functions}
- WrapperInfo {\dcd\rm-- ({\em informational only}) if an Id is}
- {\dcd\rm-- a ``worker,'' this says what Id it's}
- {\dcd\rm-- a worker for, i.e., ``who is my wrapper''}
- {\dcd\rm-- (used to match up workers/wrappers)}
- UnfoldingInfo {\dcd\rm-- its unfolding}
- UpdateInfo {\dcd\rm-- which args should be updated}
- SrcLocation {\dcd\rm-- source location of definition}\end{mytightcode}
-As you can see, we may accumulate a lot of information about an Id!
-(The types for all the sub-bits are given in the same place...)
-
-\subsubsection{Introducing dictionaries for overloading}
-
-The major novel feature of the Haskell language is its systematic
-overloading using {\em type classes}; Wadler and Blott's paper is the
-standard reference \cite{wadler89a}.
-
-To implement type classes, the typechecker not only checks the Haskell
-source program, it also {\em translates} it---by inserting code to
-pass around in {\em dictionaries}. These dictionaries
-are essentially tuples of functions, from which the correct code may
-be plucked at run-time to give the desired effect. Kevin Hammond
-wrote and described the first working implementation of type
-classes \cite{hammond89a}, and the ever-growing static-semantics paper
-by Peyton Jones and Wadler is replete with the glories of dictionary
-handling \cite{peyton-jones90a}. (By the way, the typechecker's
-structure closely follows the static semantics paper; inquirers into
-the former will become devoted students of the latter.)
-
-Much of the abstract-syntax datatypes are given
-over to output-only translation machinery. Here are a few more
-fragments of the @Expr@ type, all of which appear only in typechecker
-output:
-\begin{mytightcode}
-data Expr var pat =
- ...
- | DictLam [DictVar] (Expr var pat)
- | DictApp (Expr var pat) [DictVar]
- | Dictionary [DictVar] [Id]
- | SingleDict DictVar
- ...\end{mytightcode}
-You needn't worry about this stuff:
-After the desugarer gets through with such constructs, there's nothing
-left but @Ids@, tuples, tupling functions, etc.,---that is, ``plain
-simple stuff'' that should make the potential optimiser's heart throb.
-Optimisation passes don't deal with dictionaries explicitly but, in
-some cases, quite a bit of the code passed through to them will be for
-dictionary-handling.