diff options
Diffstat (limited to 'ghc/docs/add_to_compiler/front-end.verb')
-rw-r--r-- | ghc/docs/add_to_compiler/front-end.verb | 297 |
1 files changed, 0 insertions, 297 deletions
diff --git a/ghc/docs/add_to_compiler/front-end.verb b/ghc/docs/add_to_compiler/front-end.verb deleted file mode 100644 index 5c3c41dd38..0000000000 --- a/ghc/docs/add_to_compiler/front-end.verb +++ /dev/null @@ -1,297 +0,0 @@ -%************************************************************************ -%* * -\subsection{The front end of the compiler} -\label{sec:front-end} -%* * -%************************************************************************ - -The previous section covered the main points about the front end of -the compiler: it is dominated by a ``renamer'' and a typechecker -working directly at the Haskell source level. In this section, we -will look at some basic datatypes used or introduced in the front -end---ones that you may see as input to your optimisation pass. - -\subsubsection{``Abstract syntax'', a source-level datatype} -\label{sec:AbsSyntax} - -As Figure~\ref{fig:overview} shows, the typechecker both reads and -writes a collection of datatypes we call ``Abstract syntax.'' -This is misleading in that what -goes into the typechecker is quite different from what comes out. - -Let's first consider this fragment of the abstract-syntax -definition\srcloc{abstractSyn/HsExpr.lhs}, for Haskell explicit-list -expressions (Haskell report, section~3.5 -\cite{hudak91a}):\nopagebreak[4] -\begin{mytightcode} -data Expr var pat = - ... - | ExplicitList [Expr var pat] - | ExplicitListOut UniType [Expr var pat] - ... - -type ProtoNameExpr = Expr ProtoName ProtoNamePat -type RenamedExpr = Expr Name RenamedPat -type TypecheckedExpr = Expr Id TypecheckedPat\end{mytightcode} -an @ExplicitList@ appears only in typechecker input; an @ExplicitListOut@ -is the corresponding construct that appears -only in the output, with the inferred type information attached. - -The fragment above also shows how abstract syntax is {\em parameterised} -with respect to variables and patterns. The idea is the same for -both; let's just consider variables. - -The renamer converts @ProtoNameExprs@ (expressions with -@ProtoNames@\srcloc{basicTypes/ProtoName.lhs} as variables---little -more than just strings) into @RenamedExprs@, which have all naming sorted out -(using @Names@\srcloc{abstractSyn/Name.lhs}). A @Name@ is known to be -properly bound, isn't duplicated, etc.; it's known if it's bound to a -built-in standard-prelude entity. - -(The renamer also does dependency analysis, which is required to -maximise polymorphism in a Hindley-Milner type system.) - -The typechecker reads the @RenamedExprs@, sorts out the types, and -spits out @TypecheckedExprs@, with variables represented by -@Ids@\srcloc{basicTypes/Id.lhs}. You can find out just about anything -you want to know about a variable from its @Id@. - -To see what GHC makes of some Haskell, in a file @Foo.hs@, say: -try @ghc -noC -ddump-rn4 Foo.hs@, to see what comes out of the renamer [pass~4]; -try @ghc -noC -ddump-tc Foo.hs@, to see what comes out of the typechecker. - -\subsubsection{Basic datatypes in the compiler} - -None of the internal datatypes in the example just given are -particularly interesting except @Ids@\srcloc{basicTypes/Id.lhs}. A -program variable, which enters the typechecker as a string, emerges as -an @Id@. - -The important thing about @Id@---and the datatypes representing other -basic program entities (type variables, type constructors, classes, -etc.)---is that they often include {\em memoised} information that can -be used throughout the rest of the compiler. - -Let us take a cursory look at @Ids@, as a representative example of -these basic data types. (Don't be too scared---@Ids@ are the hairiest -entities in the whole compiler!) -Here we go: -\begin{mytightcode} -\codeallowbreaks{}data Id - = Id Unique {\dcd\rm-- key for fast comparison} - UniType {\dcd\rm-- Id's type; used all the time;} - IdInfo {\dcd\rm-- non-essential info about this Id;} - PragmaInfo {\dcd\rm-- user-specified pragmas about this Id;} - IdDetails {\dcd\rm-- stuff about individual kinds of Ids.}\end{mytightcode} - -So, every @Id@ comes with: -\begin{enumerate} -\item -A @Unique@\srcloc{basicTypes/Unique.lhs}, essentially a unique -@Int@, for fast comparison; -\item -A @UniType@ (more on them below... section~\ref{sec:UniType}) giving the variable's -type---this is the most useful memoised information. -\item -A @PragmaInfo@, which is pragmatic stuff the user specified for -this @Id@; e.g., @INLINE@ it; GHC does not promise to honour these -pragma requests, but this is where it keeps track of them. -\item -An @IdInfo@ (more on {\em them} below... section~\ref{sec:IdInfo}), -which tells you all the extra things -that the compiler doesn't {\em have} to know about an @Id@, but it's jolly nice... -This corresponds pretty closely to the @GHC_PRAGMA@ cross-module info that you will -see in any interface produced using @ghc -O@. -An example of some @IdInfo@ -would be: that @Id@'s unfolding; or its arity. -\end{enumerate} - -Then the fun begins with @IdDetails@... -\begin{mytightcode} -\codeallowbreaks{}data IdDetails - - {\dcd\rm---------------- Local values} - - = LocalId ShortName {\dcd\rm-- mentioned by the user} - - | SysLocalId ShortName {\dcd\rm-- made up by the compiler} - - {\dcd\rm---------------- Global values} - - | ImportedId FullName {\dcd\rm-- Id imported from an interface} - - | PreludeId FullName {\dcd\rm-- Prelude things the compiler ``knows'' about} - - | TopLevId FullName {\dcd\rm-- Top-level in the orig source pgm} - {\dcd\rm-- (not moved there by transformations).} - - {\dcd\rm---------------- Data constructors} - - | DataConId FullName - ConTag - [TyVarTemplate] ThetaType [UniType] TyCon - {\dcd\rm-- split-up type: the constructor's type is:} - {\dcd\rm-- $\forall~tyvars . theta\_ty \Rightarrow$} - {\dcd\rm-- $unitype_1 \rightarrow~ ... \rightarrow~ unitype_n \rightarrow~ tycon tyvars$} - - | TupleCon Int {\dcd\rm-- its arity} - - {\dcd\rm-- There are quite a few more flavours of {\tt IdDetails}...}\end{mytightcode} - -% A @ShortName@,\srcloc{basicTypes/NameTypes.lhs} which includes a name string -% and a source-line location for the name's binding occurrence; - -In short: everything that later parts of the compiler might want to -know about a local @Id@ is readily at hand. The same principle holds -true for imported-variable and data-constructor @Ids@ (tuples are an -important enough case to merit special pleading), as well as for other -basic program entities. Here are a few further notes about the @Id@ -fragments above: -\begin{itemize} -\item -A @FullName@\srcloc{basicTypes/NameTypes.lhs} is one that may be -globally visible, with a module-name as well; it may have been -renamed. -\item -@DataConKey@\srcloc{prelude/PrelUniqs.lhs} is a specialised -fast-comparison key for data constructors; there are several of these -kinds of things. -\item -The @UniType@ for @DataConIds@ is given in {\em two} ways: once, just as -a plain type; secondly, split up into its component parts. This is to -save frequently re-splitting such types. -\item -Similarly, a @TupleCon@ has a type attached, even though we could -construct it just from the arity. -\end{itemize} - -\subsubsection{@UniTypes@, representing types in the compiler} -\label{sec:UniType} - -Let us look further at @UniTypes@\srcloc{uniType/}. Their definition -is: -\begin{mytightcode} -\codeallowbreaks{}data UniType - = UniTyVar TyVar - - | UniFun UniType {\dcd\rm-- function type} - UniType - - | UniData TyCon {\dcd\rm-- non-synonym datatype} - [UniType] - - | UniSyn TyCon {\dcd\rm-- type synonym} - [UniType] {\dcd\rm--\ \ unexpanded form} - UniType {\dcd\rm--\ \ expanded form} - - | UniDict Class {\dcd\rm-- for types with overloading} - UniType - - {\dcd\rm-- The next two are to do with universal quantification.} - | UniTyVarTemplate TyVarTemplate - - | UniForall TyVarTemplate - UniType\end{mytightcode} -When the typechecker processes a source module, it adds @UniType@ -information to all the basic entities (e.g., @Ids@), among other -places (see Section~\ref{sec:second-order} for more details). These -types are used throughout the compiler. - -The following example shows several things about @UniTypes@. -If a programmer wrote @(Eq a) => a -> [a]@, it would be represented -as:\footnote{The internal structures of @Ids@, -@Classes@, @TyVars@, and @TyCons@ are glossed over here...} -\begin{mytightcode} -\codeallowbreaks{}UniForall {\dcd$\alpha$} - (UniFun (UniDict {\dcd\em Eq} (UniTyVar {\dcd$\alpha$})) - (UniFun (UniTyVarTemplate {\dcd$\alpha$}) - (UniData {\dcd\em listTyCon} - [UniTyVarTemplate {\dcd$\alpha$}])))\end{mytightcode} -From this example we see: -\begin{itemize} -\item -The universal quantification of the type variable $\alpha$ is made explicit -(with a @UniForall@). -\item -The class assertion @(Eq a)@ is represented with a @UniDict@ whose -second component is a @UniType@---this -reflects the fact that we expect @UniType@ to be used in a stylized -way, avoiding nonsensical constructs (e.g., -@(UniDict f (UniDict g (UniDict h ...)))@). -\item -The ``double arrow'' (@=>@) of the Haskell source, indicating an -overloaded type, is represented by the usual -@UniFun@ ``single arrow'' (@->@), again in a stylized way. -This change reflects the fact that each class assertion in a -function's type is implemented by adding an extra dictionary argument. -\item -In keeping with the memoising tradition we saw with @Ids@, type -synonyms (@UniSyns@) keep both the unexpanded and expanded forms handy. -\end{itemize} - -\subsubsection{@IdInfo@: extra pragmatic info about an @Id@} -\label{sec:IdInfo} - -[New in 0.16.] All the nice-to-have-but-not-essential information -about @Ids@ is now hidden in the -@IdInfo@\srcloc{basicTypes/IdInfo.lhs} datatype. It looks something -like: -\begin{mytightcode} -\codeallowbreaks{}data IdInfo - = NoIdInfo {\dcd\rm-- OK, we know nothing...} - - | MkIdInfo - ArityInfo {\dcd\rm-- its arity} - DemandInfo {\dcd\rm-- whether or not it is definitely demanded} - InliningInfo {\dcd\rm-- whether or not we should inline it} - SpecialisationInfo {\dcd\rm-- specialisations of this overloaded} - {\dcd\rm-- function which exist} - StrictnessInfo {\dcd\rm-- strictness properties, notably} - {\dcd\rm-- how to conjure up ``worker'' functions} - WrapperInfo {\dcd\rm-- ({\em informational only}) if an Id is} - {\dcd\rm-- a ``worker,'' this says what Id it's} - {\dcd\rm-- a worker for, i.e., ``who is my wrapper''} - {\dcd\rm-- (used to match up workers/wrappers)} - UnfoldingInfo {\dcd\rm-- its unfolding} - UpdateInfo {\dcd\rm-- which args should be updated} - SrcLocation {\dcd\rm-- source location of definition}\end{mytightcode} -As you can see, we may accumulate a lot of information about an Id! -(The types for all the sub-bits are given in the same place...) - -\subsubsection{Introducing dictionaries for overloading} - -The major novel feature of the Haskell language is its systematic -overloading using {\em type classes}; Wadler and Blott's paper is the -standard reference \cite{wadler89a}. - -To implement type classes, the typechecker not only checks the Haskell -source program, it also {\em translates} it---by inserting code to -pass around in {\em dictionaries}. These dictionaries -are essentially tuples of functions, from which the correct code may -be plucked at run-time to give the desired effect. Kevin Hammond -wrote and described the first working implementation of type -classes \cite{hammond89a}, and the ever-growing static-semantics paper -by Peyton Jones and Wadler is replete with the glories of dictionary -handling \cite{peyton-jones90a}. (By the way, the typechecker's -structure closely follows the static semantics paper; inquirers into -the former will become devoted students of the latter.) - -Much of the abstract-syntax datatypes are given -over to output-only translation machinery. Here are a few more -fragments of the @Expr@ type, all of which appear only in typechecker -output: -\begin{mytightcode} -data Expr var pat = - ... - | DictLam [DictVar] (Expr var pat) - | DictApp (Expr var pat) [DictVar] - | Dictionary [DictVar] [Id] - | SingleDict DictVar - ...\end{mytightcode} -You needn't worry about this stuff: -After the desugarer gets through with such constructs, there's nothing -left but @Ids@, tuples, tupling functions, etc.,---that is, ``plain -simple stuff'' that should make the potential optimiser's heart throb. -Optimisation passes don't deal with dictionaries explicitly but, in -some cases, quite a bit of the code passed through to them will be for -dictionary-handling. |