author | Edward Z. Yang <ezyang@cs.stanford.edu> | 2015-07-07 19:49:32 -0700 |
---|---|---|
committer | Edward Z. Yang <ezyang@cs.stanford.edu> | 2015-07-07 19:49:49 -0700 |
commit | 8800a73a03c70357c81f24ef53f4644de6668d9d (patch) | |
tree | 79872b23d1dc7ce335e152bf5725ae385cf05e19 /docs | |
parent | 1967a52df5bea5539e46393fa0da0a1ebd6d9db8 (diff) | |
Backpack: Flesh out more Cabal details
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
Diffstat (limited to 'docs')
-rw-r--r-- | docs/backpack/algorithm.pdf | bin | 288880 -> 299029 bytes | |||
-rw-r--r-- | docs/backpack/algorithm.tex | 328 |
2 files changed, 282 insertions, 46 deletions
diff --git a/docs/backpack/algorithm.pdf b/docs/backpack/algorithm.pdf Binary files differ index b8da93ce65..9d72314d19 100644 --- a/docs/backpack/algorithm.pdf +++ b/docs/backpack/algorithm.pdf diff --git a/docs/backpack/algorithm.tex b/docs/backpack/algorithm.tex index 79ddccfa44..2de50f5f8f 100644 --- a/docs/backpack/algorithm.tex +++ b/docs/backpack/algorithm.tex @@ -1291,7 +1291,7 @@ Design goals: \item Backpack files are user-written. (In an earlier design, we had the idea that Cabal would generate Backpack files; however, we've since made Backpack files more user-friendly and reasonable to - write by hand.) + write by hand since they are reasonably designed for user development.) \item Backpack files are optional. A package can add a Backpack file to replace some (but not all) of the fields in a Cabal description. @@ -1307,70 +1307,86 @@ Design goals: \subsection{Versioning} -In this section, we discuss how Cabal's version numbers factor into -Backpack, namely how we specify \I{PkgKey}s. - -\paragraph{History} -Prior to GHC 7.10, GHC has allowed an arbitrary combination of libraries -to be linked together, assuming that the package IDs (e.g. -\verb|foo-0.1|) were all unique. Cabal enforces a stronger restriction, -which is that there exists some unique mapping from package name to -package version which is consistent with all transitive dependencies. +In this section, we discuss how version numbers from Cabal factor into +Backpack. In particular, versioning impacts the specification of \I{PkgKey}s. +See \url{https://ghc.haskell.org/trac/ghc/wiki/Commentary/Packages/Concepts} +for more background, and \url{https://ghc.haskell.org/trac/ghc/ticket/10566} +for implementation progress. \paragraph{Design goals} Here are some design goals for versioning: \begin{enumerate} - \item GHC only tests for equality on versioning; Cabal is - responsible for determining the version of a package. 
For example, - pre-7.10 the linker symbols were prefixed using a package name and - version, but GHC simply represented this internally as an opaque - string. As another example, package qualified imports only allow - qualification by package name, and not by version. - - \item Cabal only tests for equality on package keys; GHC is - responsible for calculating the package key of a package. (This is + \item GHC doesn't know anything about version numbers: this is Cabal + specific information. There are a few cases in GHC today where + this design goal is already in force: pre-7.10, linker + symbols were prefixed using a package name and version, but GHC + simply represented this internally as an opaque string. And in + today's GHC, package qualified imports only allow qualification by + package name, and not by version. + + \item Cabal doesn't know anything about package keys: GHC is + responsible for calculating the package key of a package. This is because GHC must be able to maintain a mapping between the unhashed and hashed versions of a key, and the hashing process must be - deterministic.) If Cabal needs to generate a new package key, it - must do so through GHC. + deterministic. If Cabal needs to generate a new package key, it + must do so through GHC. (This is NOT how this is happening in GHC 7.10.) \item Our design should, in principle, support mutual recursion - between packages, even if the implementation does not (presently). + between packages, even if the implementation does not at the moment. \item GHC should not lose functionality, i.e. it should still be possible to link together the same package with different versions; however, Cabal may arrange for this to not occur by default unless a user explicitly asks for it. + + \item A Cabal source package identifier (e.g. 
\verb|foo-0.1|), which is + a unit of distribution, is a distinct + concept from a Backpack package (which we have referred to previously + in the document as a mere package name), because a single Cabal file may + ship a Backpack file that defines multiple internal packages. \end{enumerate} These goals imply a few things: \begin{enumerate} \item Backpack files should not contain any version numbers, - and should be agnostic to versioning. + and should be agnostic to versioning. Backpack files are parsed + and interpreted by GHC, and version numbers are Cabal's province! + + \item As a corollary, if you want to refer to a specific version of + a package from a Backpack file, this has to be done by giving the + alternate version a different package name, e.g. \verb|network-old|. + (It is tempting to conclude that we should simply allow + version numbers into GHC, but consider a more complicated situation: if + you want to refer to two instances of \verb|foo|, one compiled + with \verb|bar-0.1| and the other compiled with \verb|bar-0.2|, then + your description of which package to pick up becomes considerably more + complicated than just a package name and version. Better to defer + this decision to Cabal.) \item Package keys must record versioning information, otherwise we can't link together two different versions of the same package. + This is due to our backwards-compatibility requirement. \end{enumerate} \paragraph{Package keys} -Earlier, we specified \I{PkgKey} as a package name $p$ and then a list -of hole instantiations. To allow linking together multiple versions of +To allow linking together multiple versions of the same package, we must record versioning information into the \I{PkgKey}. To do this, we include in the \I{PkgKey} a \I{VersionHash}. 
-Cabal is responsible for defining \I{VersionHash}, but we give two possible +Cabal is responsible for defining \I{VersionHash} and may do whatever it +wants, but we give two possible definitions in Figure~\ref{fig:version}. \begin{figure}[htpb] $$ \begin{array}{rcll} p && \mbox{Package name} \\ -v && \mbox{Version number} \\[1em] -\I{VersionHash} & ::= & p \verb|-| v\; \verb|{| \, p_0 \; \verb|->| \; \I{VersionHash}_0 \verb|,|\, \ldots\, p_n \; \verb|->| \; \I{VersionHash}_n \, \verb|}| & \mbox{Full version hash} \\ -\I{VersionHash'} & ::= & p \; \verb|{| \, p_0\verb|-|v_0 \verb|,|\, \ldots\, p_n\verb|-|v_n \, \verb|}| & \mbox{Simplified version hash} \\ -\I{PkgKey} & ::= & \I{VersionHash} \verb|(| \, m \; \verb|->| \; \I{Module} \verb|,|\, \ldots\, \verb|)| \\ +\I{SrcPkgId} && \mbox{Cabal source package ID, e.g. } \verb|foo-0.1| \\[1em] +\I{VersionHash} & ::= & \I{SrcPkgId}\; \verb|{| \, p_0 \; \verb|->| \; \I{VersionHash}_0 \verb|,|\, \ldots\, p_n \; \verb|->| \; \I{VersionHash}_n \, \verb|}| & \mbox{Full version hash} \\ +\I{VersionHash'} & ::= & \I{SrcPkgId} \; \verb|{| \, \I{SrcPkgId}_0 \verb|,|\, \ldots\, \verb|,|\, \I{SrcPkgId}_n \, \verb|}| & \mbox{Simplified version hash} \\ +\I{PkgKey} & ::= & p\verb|-|\I{VersionHash} \verb|(| \, m \; \verb|->| \; \I{Module} \verb|,|\, \ldots\, \verb|)| \\ \end{array} $$ \caption{Version hash} \label{fig:version} @@ -1387,7 +1403,7 @@ The full version hash has some subtleties: \begin{itemize} \item Each sub-\I{VersionHash} recorded in a \I{VersionHash} is identified by a package name, which may not necessarily equal the - package name in the \I{VersionHash}. This permits us to calculate + package name embedded in the \I{SrcPkgId} in the \I{VersionHash}. 
This permits us to calculate a \I{VersionHash} for a package like: \begin{verbatim} package p where @@ -1397,15 +1413,15 @@ The full version hash has some subtleties: \end{verbatim} if we want \verb|network| to refer to \verb|network-2.0| and \verb|network-old| to refer to \verb|network-1.0|. Without - identifying each subdependency by package name, we wouldn't know - what \verb|network-old| would refer to. + identifying each subdependency by package name, we could not + distinguish the recorded \I{VersionHash}s for \verb|network-old| and \verb|network|. - \item If a package is locally specified in a Backpack - file, it does not occur in the \I{VersionHash}. This is because - we always refer to the same package; there are no different versions! + \item If a package name is locally specified in a Backpack + file, it does not occur in the \I{VersionHash}: \I{VersionHash} + strictly operates over Cabal's notion of package identity. \item You might wonder why we need a \I{VersionHash} as well as a \I{PkgKey}; - why not just specify \I{PkgKey} as $p-v \; \verb|{| \, p \; \verb|->| \; \I{PkgKey} \verb|,|\, \ldots\, \verb|}| \verb|(| \, m \; \verb|->| \; \I{Module} \verb|,|\, \ldots\, \verb|)|$? However, there is ``too much'' information in the \I{PkgKey}, causing the scheme to not work with mutual recursion: + why not just specify \I{PkgKey} as $\I{SrcPkgId} \; \verb|{| \, p \; \verb|->| \; \I{PkgKey} \verb|,|\, \ldots\, \verb|}| \verb|(| \, m \; \verb|->| \; \I{Module} \verb|,|\, \ldots\, \verb|)|$? However, there is ``too much'' information in the \I{PkgKey}, causing the scheme to not work with mutual recursion: \begin{verbatim} package p where @@ -1419,15 +1435,235 @@ The full version hash has some subtleties: version hash does not have this problem as it is not recursive.) \end{itemize} -\paragraph{Cabal to GHC} +\subsection{Distribution and installation} + +How are Backpack files installed so other people can use them? 
+ +\paragraph{Challenges} + +\begin{itemize} + \item Prior to Backpack, when a Cabal package (i.e. a unit of + distribution) was compiled and installed, it resulted in a single + entry in the installed package database. With Backpack, compiling a + package could result in multiple entries in the installed package + database: (1) when indefinite packages are instantiated, and + (2) when there are multiple packages in a Backpack file. + + \item Relatedly, when we include an indefinite package, we may need + to rebuild it with our specific dependencies. This makes compiling + a Backpack file much more similar to \verb|cabal-install| than to + \verb|Cabal|; however, the dependency structure is something that + only GHC can calculate. +\end{itemize} + +\paragraph{Why distribute Backpack files?} + +Backpack files offer a convenient mechanism for defining multiple packages +with inline syntax for modules. Further syntax extensions could allow us +to give people a MixML style of programming in Haskell. + +A Backpack file is not a replacement for a Cabal file: +\verb|exposed-modules| and similar fields are not necessary, but we still +need a \verb|build-depends| to provide version bounds (until Backpack +can also be used to handle version dependencies). This makes it easy +for cabal-install to do its job. + +This means we distinguish a package name $p$ which occurs in a Backpack +file and a Cabal \I{SrcPkgId}: Cabal creates a mapping between these. +So to refer to an old version of a package, you would refer to it with +a different name $q$, and then tell Cabal about the version bound constraints +you want. + +\paragraph{Definite packages} + +Suppose we have written a Backpack file that looks like: + +\begin{verbatim} + package helper where + include base + module P + package mypackage where + include containers + include helper + module Q +\end{verbatim} + +and have written a Cabal file for it intending to distribute it on +Hackage under the name \verb|mypackage-1.0|. 
In the end, we will end +up with the following entries in our installed package database: + +\begin{verbatim} + name: "mypackage" + id: mypackage-1.0-IPID + version: 1.0 + key: XXX + # e.g. mypackage-AAA {} + version-hash: AAA + # e.g. mypackage-1.0 { base -> base-4.7 , containers -> containers-0.5 } + depends: mypackage$helper-1.0-IPID, containers-0.5-IPID + --- + name: "mypackage$helper" + version: 1.0 + id: mypackage$helper-1.0-IPID + key: YYY + # e.g. helper-AAA {} + version-hash: AAA + depends: base-4.7-IPID +\end{verbatim} +% +Things to note: + +\begin{enumerate} + \item The package in the Backpack file with the same name as the Cabal + package has special status: this is the package which is registered + in the installed package database under the same name. All other packages + are \emph{qualified} under the Cabal package name, e.g. \verb|mypackage$helper|. + + \item The version hash, as described previously, is computed once for all + packages in the Backpack file, and the \verb|version| and \verb|version-hash| + are the same across all of them. + + \item The key varies between the packages, since the $p$ parameter is different + in each one. + + \item The installed package ID incorporates information about the package name. + + \item Dependencies are only recorded for the packages directly \verb|include|d in a Backpack package. (GHC has to communicate to Cabal what the includes of every subpackage are.) +\end{enumerate} +% +A more complex example with instantiated packages looks similar: + +\begin{verbatim} + package helper where + signature Data.Map + module P + package mypackage where + include containers (Data.Map) + include helper + module Q +\end{verbatim} +% +however, now the instantiation is recorded in the database as well. + +\begin{verbatim} + name: "mypackage" + id: mypackage-1.0-IPID + version: 1.0 + key: XXX + # e.g. mypackage-AAA {} + version-hash: AAA + # e.g. 
mypackage-1.0 { containers -> containers-0.5 } + depends: mypackage$helper-1.0-IPID, containers-0.5-IPID + --- + name: "mypackage$helper" + version: 1.0 + id: mypackage$helper-1.0-IPID + key: YYY + # e.g. helper-AAA { Data.Map -> containers-KEY:Data.Map } + version-hash: AAA + depends: (none) + instantiated-with: + Data.Map -> Data.Map@containers-0.5-IPID +\end{verbatim} +% +More remarks: + +\begin{enumerate} + \item Cabal's recorded \verb|instantiated-with| records installed + package IDs, so that the used implementation is uniquely determined. + + \item Conversely, \verb|depends| does NOT record non-textual dependencies + such as instantiated holes. \Red{is this necessary} + + \item IPID includes information about how holes were instantiated. +\end{enumerate} + +\paragraph{GHC to Cabal} + +When GHC compiles a Backpack file, it is the only entity which knows +about the subpackages of a package. In order to make sure they are +all correctly installed, GHC has to communicate back some meta-data to +Cabal: for each package, + +\begin{itemize} + \item The (computed) package keys + \item The dependencies + \item The instantiation +\end{itemize} + +I guess we have to define some format to do this. GHC can't directly +write to the package database, because it doesn't know how to write in +the Cabal-specific portion of the information. + +\Red{This is clunky, is there a way to eliminate this? It's not possible +for Cabal out of the box to handle this, since it assumes no module name conflicts +but there definitely may be some in Backpack.} + +\paragraph{Indefinite package database} + +The indefinite package database records indefinite packages (with holes) +that have been typechecked. An indefinite package is associated with a +(possibly unlimited) number of instantiated versions of the package, +which have been fully instantiated and compiled. + +An indefinite package is a new type of entry in the existing installed package +database. 
\Red{or maybe another entry in a different database} Here are the important things to keep track of for an +indefinite package: + +\begin{itemize} + \item Where do the (indefinite) interface files live? (NB: there are no + libraries since we haven't compiled the package.) + \item Where does the shape information live? (We could put it with the + interface files, it's a pretty similar binary file.) + \item Where does the source live, so we can recompile it when we instantiate it? + (If it's empty, we'll have to refetch it from Hackage or something.) + \item Where does the Cabal configuration (result of running + \verb|cabal configure|) live, so that we build it with the same dependencies, flags, etc.? +\end{itemize} + +Associated with an indefinite package is some number of instantiated versions +of this package. These are identified by package key (the installed package ID +is the same) and are morally ``sub''-packages of the indefinite package, +although they get their own entries. \Red{Alternate plan: put them together. +Distinction between Cabal package and Backpack package.} + +What makes installed indefinite packages difficult is that GHC may need to +recompile them on the fly depending on an include. + +\paragraph{The plan} + +\Red{To be worked out} + +% Description: cabal-install only computes package-name edge labeling, +% then attempts to compile. If the package is indefinite, Cabal +% type checks and installs the interface files, source code and +% configuration information (TODO: this is something GHC has +% to understand\ldots) to the package database. If the package +% is definite, Cabal goes ahead and builds it. During compilation, +% when processing an include GHC may notice that a package depends on an +% instantiation of an indefinite package that is not compiled; GHC +% goes ahead and builds it using the saved information. 
+ +% Con: We need to install indefinite packages, including all of +% the source and information we'd need to actually build it +% (the result of a configure? Only Cabal really knows how +% to understand that; so it should be like a Cabal configured +% package? If GHC calls in that's annoying.) It would be nice +% if this was done cabal-install style, but there are many downsides +% to deferring all of this processing to cabal-install. + + +% Model: GHC compiles everything itself +% GHC needs to report multiple distinct compile products to Cabal +% GHC needs to ``reset'' the EPS (but only for type checking) + +% Model: Cabal pre-compiles dependencies, and then GHC handles the rest +% Trouble: Cabal needs to be able to read the bkp file to find out what the instantiation is +% Fix: Have a GHC mode to output this information. Also, if Cabal is doing an old style it already knows. +% Trouble: seems wrong for normal Cabal to install it +% Think about it like a CACHE + + -Prior to GHC-7.10, Cabal passed versioning information to GHC using the -\verb|-package-name| flag. In GHC 7.10, this flag was renamed to -\verb|-this-package-key|. We propose that this flag be renamed once -again to \verb|-this-version-hash|, to which Cabal passes a hash (or string) -describing the versioning of the package which is then incorporated -into the package key. Cabal no longer needs to calculate package keys. -In the absence of Backpack, there will be no semantic difference if we -switch to full version hashes. \end{document} % chktex 16 |
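The version-hash grammar introduced by this patch (the "Version hash" figure) can be modelled in a few lines of Python. This is only an illustrative sketch, not how GHC or Cabal compute anything: the names `VersionHash`, `render`, `simplified` and `pkg_key` are hypothetical, the version `p-1.0` is made up for the example, and the sketch assumes that sub-hashes are printed in package-name order, that leaves are printed without braces (as in the `base -> base-4.7` comment in the patch), and that the simplified hash lists only the direct entries. The actual hashing step is left out; the string `AAA` stands in for a hashed value, as in the patch's own examples.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class VersionHash:
    """Full version hash: SrcPkgId { p_0 -> VersionHash_0, ..., p_n -> VersionHash_n }."""
    src_pkg_id: str                                   # Cabal source package ID, e.g. "foo-0.1"
    deps: Tuple[Tuple[str, "VersionHash"], ...] = ()  # (Backpack package name, sub-hash) pairs

    def render(self) -> str:
        if not self.deps:
            return self.src_pkg_id  # assumption: leaves are printed without braces
        ordered = sorted(self.deps, key=lambda d: d[0])  # assumption: order by package name
        inner = ", ".join(f"{p} -> {vh.render()}" for p, vh in ordered)
        return f"{self.src_pkg_id} {{ {inner} }}"

    def simplified(self) -> str:
        """Simplified version hash: SrcPkgId { SrcPkgId_0, ..., SrcPkgId_n }."""
        ids = sorted(vh.src_pkg_id for _, vh in self.deps)  # assumption: direct entries only
        return f"{self.src_pkg_id} {{ {', '.join(ids)} }}"


def pkg_key(p: str, version_hash: str, holes: Dict[str, str]) -> str:
    """PkgKey: package name p, a (hashed) version hash, and the hole instantiation."""
    inst = ", ".join(f"{m} -> {mod}" for m, mod in sorted(holes.items()))
    body = f"{{ {inst} }}" if inst else "{}"
    return f"{p}-{version_hash} {body}"


# The network / network-old example: the same two source packages,
# bound to the two Backpack package names in opposite ways.
net2 = VersionHash("network-2.0")
net1 = VersionHash("network-1.0")
p = VersionHash("p-1.0", (("network", net2), ("network-old", net1)))
p_swapped = VersionHash("p-1.0", (("network", net1), ("network-old", net2)))

print(p.render())
print(p_swapped.render())
print(p.simplified())
print(pkg_key("helper", "AAA", {"Data.Map": "containers-KEY:Data.Map"}))
```

Under these assumptions the two full hashes differ while the two simplified hashes coincide, which is the reason the patch gives for labelling each sub-hash with a package name. The last line reproduces the shape of the `helper-AAA { Data.Map -> containers-KEY:Data.Map }` key shown in the installed package database example.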