diff options
Diffstat (limited to 'docs/core-spec/core-spec.mng')
-rw-r--r-- | docs/core-spec/core-spec.mng | 112 |
1 files changed, 97 insertions, 15 deletions
diff --git a/docs/core-spec/core-spec.mng b/docs/core-spec/core-spec.mng index d1d89055e7..952a172e6f 100644 --- a/docs/core-spec/core-spec.mng +++ b/docs/core-spec/core-spec.mng @@ -30,7 +30,7 @@ System FC, as implemented in GHC\footnote{This document was originally prepared by Richard Eisenberg (\texttt{eir@cis.upenn.edu}), but it should be maintained by anyone who edits the functions or data structures mentioned in this file. Please feel free to contact Richard for more information.}\\ -\Large 23 October, 2015 +\Large 26 January, 2018 \end{center} \section{Introduction} @@ -89,6 +89,10 @@ variables: \ottn } +\gram{ +\ottl +} + We sometimes omit the type/kind annotation to a variable when it is obvious from context. \subsection{Expressions} @@ -112,13 +116,39 @@ is ok-for-speculation. See \verb|#let_app_invariant#| in \ghcfile{coreSyn/CoreSy \item Types and coercions can only appear on the right-hand-side of an application. \item The $[[t]]$ form of an expression must not then turn out to be a coercion. In other words, the payload inside of a \texttt{Type} constructor must not turn out -to be built with \texttt{CoercionTy}. +to be built with \texttt{CoercionTy}. +\item Join points (introduced by $[[join]]$ expressions) follow the invariants +laid out in \verb|Note [Invariants on join points]| in +\ghcfile{coreSyn/CoreSyn.lhs}: +\begin{enumerate} + \item All occurences must be tail calls. This is enforced in our typing + rules using the label environment $[[D]]$. + \item Each join point has a \emph{join arity}. In this document, we write + each label as $[[p/I_t]]$ for the name $[[p]]$, the type $[[t]]$, and the + join arity $[[I]]$. The right-hand side of the binding must begin with at + least $[[I]]$ lambdas. This is enforced implicitly in + \ottdrulename{Tm\_JoinNonRec} and \ottdrulename{Tm\_JoinRec} by the use of + $[[split]]$ meta-function. + \item If the binding is recursive, then all other bindings in the recursive + group must be join points. We enforce this in our reformulation of the + grammar; in the actual AST, a $[[join]]$ is simply a $[[let]]$ where each + identifier is flagged as a join id, so this invariant requires that this + flag must be consistent across a recursive binding. + \item The binding's type must not be polymorphic in its return type. This + is expressed in \ottdrulename{Label\_Label}; see + Section~\ref{sec:name_consistency}. +\end{enumerate} + \end{itemize} Bindings for $[[let]]$ statements: \gram{\ottbinding} +Bindings for $[[join]]$ statements: + +\gram{\ottjbinding} + Case alternatives: \gram{\ottalt} @@ -139,9 +169,10 @@ A program is just a list of bindings: \gram{\ottt} -\ctor{ForAllTy}s are represented in two different ways, depending on whether -the \ctor{ForAllTy} is anonymous (written $[[t1 -> t2]]$) or -named (written $[[forall n . t]]$). +\ctor{FunTy} is the special case for non-dependent function type. The +\ctor{TyBinder} in \ghcfile{types/TyCoRep.lhs} distinguishes whether a binder is +anonymous (\ctor{FunTy}) or named (\ctor{ForAllTy}). See +\verb|Note [TyBinders]| in \ghcfile{types/TyCoRep.lhs}. There are some invariants on types: \begin{itemize} @@ -152,7 +183,7 @@ $[[T]]$. It should be another application or a type variable. does \emph{not} need to be saturated. \item A saturated application of $[[(->) t1 t2]]$ should be represented as $[[t1 -> t2]]$. This is a different point in the grammar, not just pretty-printing. -The constructor for a saturated $[[(->)]]$ is \texttt{ForAllTy}. +The constructor for a saturated $[[(->)]]$ is \texttt{FunTy}. \item A type-level literal is represented in GHC with a different datatype than a term-level literal, but we are ignoring this distinction here. \item A coercion used as a type should appear only in the right-hand side of @@ -164,7 +195,7 @@ are purely representational. The metatheory would remain the same if these forms were removed in favor of $[[t1 t2]]$. Nevertheless, we keep all three forms in this documentation to accurately reflect the implementation. -The \texttt{Named} variant of a \texttt{Binder} (the first argument to a +The \texttt{ArgFlag} field of a \texttt{TyVarBinder} (the first argument to a \texttt{ForAllTy}) also tracks visibility of arguments. Visibility affects only source Haskell, and is omitted from this presentation. @@ -176,14 +207,14 @@ We use the notation $[[t1 k1~#k2 t2]]$ to stand for $[[(~#) k1 k2 t1 t2]]$. Invariants on coercions: \begin{itemize} -\item $[[<t1 t2>_R]]$ is used; never $[[<t1>_R <t2>_Nom]]$. -\item If $[[<T>_R]]$ is applied to some coercions, at least one of which is not -reflexive, use $[[T_R </ gi // i />]]$, never $[[<T>_R g1 g2]] \ldots$. +\item $[[<t1 t2>]]$ is used; never $[[<t1> <t2> ]]$. +\item If $[[<T>]]$ is applied to some coercions, at least one of which is not +reflexive, use $[[T_R </ gi // i />]]$, never $[[<T> g1 g2]] \ldots$. \item The $[[T]]$ in $[[T_R </gi//i/>]]$ is never a type synonym, though it could be a type function. \item Every non-reflexive coercion coerces between two distinct types. \item The name in a coercion must be a term-level name (\ctor{Id}). -\item The contents of $[[<t>_R]]$ must not be a coercion. In other words, +\item The contents of $[[<t>]]$ must not be a coercion. In other words, the payload in a \texttt{Refl} must not be built with \texttt{CoercionTy}. \end{itemize} @@ -201,6 +232,20 @@ for more background. \gram{\ottR} +The \texttt{GRefl} constructor taks an $[[mg]]$. It wraps a kind coercion, which +might be reflexive or any coercion: + +\gram{\ottmg} + +A nominal reflexive coercion is quite common, so we keep the special form +\texttt{Refl}. Invariants on reflexive coercions: + +\begin{itemize} +\item Always use $[[<t>]]$; never $[[<t> Nom MRefl]]$. +\item All invariants on $[[<t>]]$ hold for $[[<t> R MRefl]]$. +\item Use $[[<t> Rep MRefl]]$; never $[[sub <t>]]$. +\end{itemize} + Is it a left projection or a right projection? \gram{\ottLorR} @@ -275,15 +320,21 @@ synonym for $[[TYPE 'Unlifted]]$. \section{Contexts} The functions in \ghcfile{coreSyn/CoreLint.lhs} use the \texttt{LintM} monad. -This monad contains a context with a set of bound variables $[[G]]$. The -formalism treats $[[G]]$ as an ordered list, but GHC uses a set as its +This monad contains a context with a set of bound variables $[[G]]$ and a set +of bound labels $[[D]]$. The formalism treats $[[G]]$ and $[[D]]$ as ordered +lists, but GHC uses sets as its representation. \gram{ \ottG } -We assume the Barendregt variable convention that all new variables are +\gram{ +\ottD +} + +We assume the Barendregt variable convention that all new variables and labels +are fresh in the context. In the implementation, of course, some work is done to guarantee this freshness. In particular, adding a new type variable to the context sometimes requires creating a new, fresh variable name and then @@ -313,12 +364,29 @@ Here is the definition of $[[vars_of]]$, taken from \coderef{coreSyn/CoreSyn.lhs \end{array} \] +Here is the definition of $[[split]]$, which has no direct analogue in the +source but simplifies the presentation here: + +\[ +\begin{array}{ll} +[[split]]_0 [[t]] &= [[t]] \\ +[[split]]_n ([[s -> t]]) &= [[s]] \; ([[split]]_{n-1} [[t]]) \\ +[[split]]_n ([[forall alpha _ k . t]]) &= [[k]] \; ([[split]]_{n-1} [[t]]) +\end{array} +\] + +The idea is simply to peel off the leading $i$ argument types (which may be +kinds for type arguments) from a given type and return them in a sequence, with +the return type (given $i$ arguments) as the final element of the sequence. + \subsection{Binding consistency} \ottdefnlintXXbind{} \ottdefnlintSingleBinding{} +\ottdefnlintSingleBindingXXjoins{} + In the GHC source, this function contains a number of other checks, such as for strictness and exportability. See the source code for further information. @@ -376,11 +444,19 @@ See Section~\ref{sec:tyconroles} for more information about $[[tyConRolesX]]$, a see Section~\ref{sec:axiom_rules} for more information about $[[coaxrProves]]$. \subsection{Name consistency} +\label{sec:name_consistency} -There are two very similar checks for names, one declared as a local function: +There are three very similar checks for names, two performed as part of +\coderef{coreSyn/CoreLint.hs}{lintSingleBinding}: \ottdefnlintSingleBindingXXlintBinder{} +\ottdefnlintSingleBindingXXlintBinderXXjoin{} + +The point of the extra checks on $[[t']]$ is that a join point's type cannot be +polymorphic in its return type; see \texttt{Note [The polymorphism rule of join +points]} in \ghcfile{coreSyn/CoreSyn.hs}. + \ottdefnlintBinder{} \subsection{Substitution consistency} @@ -477,6 +553,12 @@ Also note that this semantics implements call-by-name, not call-by-need. So while it describes the operational meaning of a term, it does not describe what subexpressions are shared, and when. +\subsection{Join points} +Dealing with join points properly here would be cumbersome and pointless, since +by design they work no differently than functions as far as FC is concerned. +Reading $[[join]]$ as $[[let]]$ and $[[jump]]$ as application should tell you +all need to know. + \subsection{Operational semantics rules} \ottdefnstep{} |