author | Edward Z. Yang <ezyang@cs.stanford.edu> | 2014-07-31 14:05:02 +0100 |
---|---|---|
committer | Edward Z. Yang <ezyang@cs.stanford.edu> | 2014-07-31 14:05:07 +0100 |
commit | a0ff1eb3c0745230cd70525853ca741e08a1f34d (patch) | |
tree | 5b85198270207395498647a760b1738f60e19c6a /docs/backpack | |
parent | 6fa6caad0cb4ba99b2c0b444b0583190e743dd63 (diff) | |
[backpack] Package selection
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
Diffstat (limited to 'docs/backpack')
-rw-r--r-- | docs/backpack/backpack-impl.tex | 625 |
1 files changed, 412 insertions, 213 deletions
diff --git a/docs/backpack/backpack-impl.tex b/docs/backpack/backpack-impl.tex index a921fc38df..0d4a5807ba 100644 --- a/docs/backpack/backpack-impl.tex +++ b/docs/backpack/backpack-impl.tex @@ -44,218 +44,222 @@ The purpose of this document is to describe an implementation path for Backpack in GHC\@. -We start off by outlining the current architecture of GHC, ghc-pkg and Cabal, -which constitute the existing packaging system. We then state what our subgoals -are, since there are many similar sounding but different problems to solve. Next, -we describe the ``probably correct'' implementation plan, and finish off with -some open design questions. This is intended to be an evolving design document, -so please contribute! - \tableofcontents -\section{Current packaging architecture} - -The overall architecture is described in Figure~\ref{fig:arch}. - -\begin{figure}[H] - \center{\scalebox{0.8}{\includegraphics{arch.png}}} -\label{fig:arch}\caption{Architecture of GHC, ghc-pkg and Cabal. Green bits indicate additions from upcoming IHG work, red bits indicate additions from Backpack. Orange indicates a Haskell library.} -\end{figure} - -Here, arrows indicate dependencies from one component to another. Color -coding is as follows: orange components are libaries, green components -are to be added with the IHG work, red components are to be added with -Backpack. (Thus, black and orange can be considered the current) - -\subsection{Installed package database} - -Starting from the bottom, we have the \emph{installed package database} -(actually a collection of such databases), which stores information -about what packages have been installed are thus available to be -compiled against. There is both a global database (for the system -administrator) and a local database (for end users), which can be -updated independently. One way to think about the package database -is as a \emph{cache of object code}. 
In principle, one could compile -any piece of code by repeatedly recompiling all of its dependencies; -the installed package database describes when this can be bypassed. - -\begin{figure}[H] - \center{\scalebox{0.8}{\includegraphics{pkgdb.png}}} -\label{fig:pkgdb}\caption{Anatomy of a package database.} -\end{figure} - -In Figure~\ref{fig:pkgdb}, we show the structure of a package database. -The installed package are created from a Cabal file through the process -of dependency resolution and compilation. In database terms, the primary key -of a package database is the InstalledPackageId -(Figure~\ref{fig:current-pkgid}). This ID uniquely identifies an -instance of an installed package. The PackageId omits the ABI hash and -is used to qualify linker exported symbols: the current value of this -parameter is communicated to GHC using the \verb|-package-id| flag. - -In principle, packages with different PackageIds should be linkable -together in the same compiled program, whereas packages with the same -PackageId are not (even if they have different InstalledPackageIds). In -practice, GHC is currently only able to select one version of a package, -as it clears out all old versions of the package in -\ghcfile{compiler/main/Package.lhs}:applyPackageFlag. - -\begin{figure} - \center{\begin{tabular}{r l} - PackageId & package name, package version \\ - InstalledPackageId & PackageId, ABI hash \\ - \end{tabular}} -\label{fig:current-pkgid}\caption{Current structure of package identifiers.} -\end{figure} - -The database entry itself contains the information from the installed package ID, -as well as information such as what dependencies it was linked against, where -its compiled code and interface files live, its compilation flags, what modules -it exposes, etc. Much of this information is only relevant to Cabal; GHC -uses a subset of the information in the package database. 
- -\subsection{GHC} +\section{What we are trying to solve} -The two programs which access the package database directly are GHC -proper (for compilation) and ghc-pkg (which is a general purpose -command line tool for manipulating the database.) GHC relies on -the package database in the following ways: - -\begin{itemize} - \item It imports the local and global package databases into - its runtime database, and applies modifications to the exposed - and trusted status of the entries via the flags \verb|-package| - and others (\ghcfile{compiler/main/Packages.lhs}). The internal - package state can be seen at \verb|-v4| or higher. - \item It uses this package database to find the location of module - interfaces when it attempts to load the module info of an external - module (\ghcfile{compiler/iface/LoadIface.hs}). -\end{itemize} +While the current ecosystem has proved itself serviceable for many years, +there are a number of major problems which cause significant headaches +for many users. Here are some of them: -GHC itself performs a type checking phase, which generates an interface -file representing the module (so that later invocations of GHC can load the type -of a module), and then after compilation projects object files and linked archives -for programs to use. +\subsection{Package reinstalls are destructive}\label{sec:destructive} -\paragraph{Original names} Original names are an important design pattern -in GHC\@. -Sometimes, a name can be exposed in an hi file even if its module -wasn't exposed. Here is an example (compiled in package R): +When attempting to install a new package, you might get an error like +this: \begin{verbatim} -module X where - import Internal (f) - g = f - -module Internal where - import Internal.Total (f) +$ cabal install hakyll +cabal: The following packages are likely to be broken by the reinstalls: +pandoc-1.9.4.5 +Graphalyze-0.14.0.0 +Use --force-reinstalls if you want to install anyway. 
\end{verbatim} -Then in X.hi: +While this error message is understandable if you're really trying to +reinstall a package, it is quite surprising that it can occur even if +you didn't ask for any reinstalls! + +The underlying cause of this problem is related to an invariant Cabal +currently enforces on a package database: there can only be one instance +of a package for any given package name and version. This means that it +is not possible to install a package multiple times, compiled against +different dependencies. However, sometimes, reinstalling a package with +different dependencies is the only way to fulfill version bounds of a +package! For example: say we have three packages \pname{a}, \pname{b} +and \pname{c}. \pname{b-1.0} is the only version of \pname{b} +available, and it has been installed and compiled against \pname{c-1.0}. +Later, the user installs an updated version \pname{c-1.1} and then +attempts to install \pname{a}, which depends on the specific versions +\pname{c-1.1} and \pname{b-1.0}. We \emph{cannot} use the already +installed version of \pname{b-1.0}, which depends on the wrong version +of \pname{c}, so our only choice is to reinstall \pname{b-1.0} compiled +against \pname{c-1.1}. This will break any packages, e.g. \pname{d}, +which were built against the old version of \pname{b-1.0}. + +Our solution to this problem is to \emph{abolish} destructive package +installs, and allow a package to be installed multiple times with the same +package name and version. However, allowing this poses some interesting +user interface problems, since package IDs are now no longer unambiguous +identifiers. + +\subsection{Version bounds are often over/under-constrained} + +When attempting to install a new package, Cabal might fail in this way: \begin{verbatim} -g = <R.id, Internal.Total, f> (this is the original name) +$ cabal install hledger-0.18 +Resolving dependencies... 
+cabal: Could not resolve dependencies: +# pile of output \end{verbatim} -(The reason we refer to the package as R.id is because it's the -full package ID, and not just R). - -\subsection{hs-boot} - -\verb|hs-boot| is a special mechanism used to support recursive linking -of modules within a package, today. Suppose I have a recursive module -dependency between modules and A and B. I break one of\ldots +There are a number of possible reasons why this could occur, but usually +it's because some of the packages involved have over-constrained version +bounds, which are resulting in an unsatisfiable set of constraints (or, +at least, Cabal gave up backtracking before it found a solution.) To +add insult to injury, most of the time the bound is nonsense and removing +it would result in a working compilation. In fact, this situation is +so common that Cabal has a flag \verb|--allow-newer| which lets you +override the package upper bounds. + +However, the flip-side is when Cabal finds a satisfying set, but your +compilation fails with a type error. Here, you had an under-constrained +set of version bounds which didn't actually reflect the compatible +versions of a package, and Cabal picked a version of the package which +was incompatible. + +Our solution to this problem is to use signatures instead of version +numbers as the primary mechanism by which compatibility is determined: +e.g., if it typechecks, it's a valid choice. Version numbers can still +be used to reflect semantic changes not seen in the types (in +particular, ruling out buggy versions of a package is a useful +operation), but these bounds are empirical observations and can be +collected after-the-fact. + +\subsection{It is difficult to support multiple implementations of a type} + +This problem is perhaps best described by referring to a particular +instance of it in Haskell's ecosystem: the \texttt{String} data type. 
Haskell, +by default, implements strings as linked lists of integers (representing +characters). Many libraries use \texttt{String}, because it's very +convenient to program against. However, this representation is also +very \emph{slow}, so there are alternative implementations such as +\texttt{Text} which implement efficient, UTF-8 encoded packed byte +arrays. + +Now, suppose you are writing a library and you don't care if the user of +your library is using \texttt{String} or \texttt{Text}. However, you +don't want to rewrite your library twice to support both data types: +rather, you'd like to rely on some \emph{common interface} between the +two types, and let the user instantiate the implementation. The only +way to do this in today's Haskell is using type classes; however, this +necessitates rewriting all type signatures from a nice \texttt{String -> +String} to \texttt{StringLike s => s -> s}. The result is less readable, +requires a large number of trivial edits to type signatures, and might +even be less efficient, if GHC does not appropriately specialize your code +written in this style. + +Our solution to this problem is to introduce a new mechanism of +pluggability: module holes, which let us use types and functions from a +module \texttt{Data.String} as before, but defer choosing \emph{what} +module should be used in the implementation to some later point (or +instantiate the code multiple times with different choices.) + +\subsection{Fast moving APIs are difficult to develop/develop against} + +Most packages that are uploaded to Hackage have package authors who pay +some amount of attention to backwards compatibility and avoid making egregious +breaking changes. However, a package like the \verb|ghc-api| follows a +very different model: the library is treated by its developers as an +internal component of an application (GHC), and is frequently refactored +in a way that changes its outwards facing interface. 
+ +Arguably, an application like GHC should design a stable API and +maintain backwards compatibility against it. However, this is a lot of +work (including refactoring) which is only being done slowly, and in the +meantime, the dump of all the modules gives users the functionality they +want (even if it keeps breaking every version.) + +One could say that the core problem is there is no way for users to +easily communicate to GHC authors what parts of the API they rely on. A +developer of GHC who is refactoring an interface will often rely on the +typechecker to let them know which parts of the codebase they need to +follow and update, and often could say precisely how to update code to +use the new interface. User applications, which live out of tree, don't +receive this level of attention. + +Our solution is to make it possible to typecheck the GHC API against a +signature. Important consumers can publish what subsets of the GHC API +they rely on, and developers of GHC, as part of their normal build +process, type-check against these signatures. If the signature breaks, +a developer can either do the refactoring differently to avoid the +compatibility-break, or document how to update code to use the new API\@. + +\section{Backpack in a nutshell} + +For a more in-depth tutorial about Backpack's features, check out Section 2 +of the original Backpack paper. In this section, we briefly review the +most important points of Backpack's design. + +\paragraph{Thinning and renaming at the module level} +A user can specify a build dependency which only exposes a subset of +modules (possibly under different names.) By itself, it's a way for the +user to resolve ambiguous module imports at the package level, without +having to use the \texttt{PackageImports} syntax extension. 
+ +\paragraph{Holes (abstract module definitions)} The core +component of Backpack's support for \emph{separate modular development} +is the ability to specify abstract module bindings, or holes, which give +users of the module an obligation to provide an implementation which +fulfills the signature of the hole. In this example: -(ToDo: describe how hs-boot mechanism works) +\begin{verbatim} +package p where + A :: [ ... ] + B = [ import A; ... ] +\end{verbatim} -\subsection{Cabal} +\verb|p| is an \emph{indefinite package}, which cannot be compiled until +an implementation of \m{A} is provided. However, we can still type check +\m{B} without any implementation of \m{A}, by type checking it against +the signature. Holes can be put into signature packages and included +(depended upon) by other packages to reuse definitions of signatures. -Cabal is the build system for GHC, we can think of it as parsing a Cabal -file describing a package, and then making (possibly multiple) -invocations to GHC to perform the appropriate compilation. What -information does Cabal pass onto GHC\@? One can get an idea for this by -looking at a prototypical command line that Cabal invokes GHC with: +\paragraph{Filling in holes with an implementation} +A hole in an indefinite package can be instantiated in a \emph{mix-in} +style: namely, if a signature and an implementation have the same name, +they are linked together: \begin{verbatim} -ghc --make - -package-name myapp-0.1 - -hide-all-packages - -package-id containers-0.9-ABCD - Module1 Module2 +package q where + A = [ ... ] + include p -- has signature A \end{verbatim} -There are a few things going on here. First, Cabal has to tell GHC -what the name of the package it's compiling (otherwise, GHC can't appropriately -generate symbols that other code referring to this package might generate). 
-There are also a number of commands which configure its in-memory view of -the package database (GHC's view of the package database may not directly -correspond to what is on disk). There's also an optimization here: in principle, -GHC can compile each module one-by-one, but instead we use the \verb|--make| flag -because this allows GHC to reuse some data structures, resulting in a nontrivial -speedup. - -(ToDo: describe cabal-install/sandbox) +Renaming is often useful to rename a module (or a hole) so that a signature +and implementation have the same name and are linked together. +An indefinite package can be instantiated multiple times with different +implementations: the \emph{applicativity} of Backpack means that if +a package is instantiated separately with the same module, the results +are type equal: -\section{Goals} +\begin{verbatim} +package q' where + A = [ ... ] + include p (A, B as B1) + include p (A, B as B2) + -- B1 and B2 are equivalent +\end{verbatim} -Here are some of the high-level goals which motivate our improvements to -the module system. +\paragraph{Combining signatures together} +Unlike implementations, it's valid for multiple signatures with the +same name to be in scope. -\begin{itemize} - \item Solve \emph{Cabal hell}, a situation which occurs when conflicting - version ranges on a wide range of dependencies leads to a situation - where it is impossible to satisfy the constraints. We're seeking - to solve this problem in two ways: first, we want to support - multiple instances of containers-2.9 in the database which are - compiled with different dependencies (and even link them - together), and second, we want to abolish (often inaccurate) - version ranges and move to a regime where packages depend on - signatures. 
Version ranges may still be used to indicate important - semantic changes (e.g., bugs or bad behavior on the part of package - authors), but they should no longer drive dependency resolution - and often only be recorded after the fact. - - \item Support \emph{hermetic builds with sharing}. A hermetic build - system is one which simulates rebuilding every package whenever - it is built; on the other hand, actually rebuilding every time - is extremely inefficient (but what happens in practice with - Cabal sandboxes). We seek to solve this problem with the IHG work, - by allowing multiple instances of a package in the database, where - the only difference is compilation parameters. We don't care - about being able to link these together in a single program. - - \item Support \emph{module-level pluggability} as an alternative to - existing (poor) usage of type classes. The canonical example are - strings, where a library might want to either use the convenient - but inefficient native strings, or the efficient packed Text data - type, but would really like to avoid having to say \verb|StringLike s => ...| - in all of their type signatures. While we do not plan on supporting - separate compilation, Cabal should understand how to automatically - recompile these ``indefinite'' packages when they are instantiated - with a new plugin. - - \item Support \emph{separate modular development}, where a library and - an application using the library can be developed and type-checked - separately, intermediated by an interface. The canonical example - here is the \verb|ghc-api|, which is a large, complex API that - the library writers (GHC developers) frequently change---the ability - for downstream projects to say, ``Here is the API I'm relying on'' - without requiring these projects to actually be built would greatly - assist in keeping the API stable. 
This is applicable in - the pluggability example as well, where we want to ensure that all - of the $M \times N$ configurations of libraries versus applications - type check, by only running the typechecker $M + N$ times. A closely - related concern is related toolchain support for extracting a signature - from an existing implementation, as current Haskell culture is averse - to explicitly writing separate signature files. - - \item Subsume existing support for \emph{mutually recursive modules}, - without the double vision problem. -\end{itemize} +\begin{verbatim} +package a-sig where + A :: [ ... ] +package a-sig2 where + A :: [ ... ] +package q where + include a-sig + include a-sig2 + B = [ import A; ... ] +\end{verbatim} -A \emph{non-goal} is to allow users to upgrade upstream libraries -without recompiling downstream. This is an ABI concern and we're not -going to worry about it. +These signatures \emph{merge} together, providing the union of the +functionality (assuming the types of individual entities are +compatible.) Backpack has a very simple merging algorithm: types must +match exactly to be compatible (\emph{width} subtyping). \clearpage @@ -718,38 +722,233 @@ but we must record this tree \emph{even} when our package has no holes. % As a final example, the full module % identity of \m{B1} in Figure~\ref{fig:granularity} may actually be $\pname{p-0.9(q-1.0[p-0.9]:A1)}$:\m{B}. - -\subsection{Implementation} - -In GHC's current packaging system, a single package compiles into a -single entry in the installed package database, indexed by the package -key. This property is preserved by package-level granularity, as we -assign the same package key to all modules. Package keys provide an -easy mechanism for sharing to occur: when an indefinite package is fully -instantiated, we can check if we already have its package key installed -in the installed package database. 
(At the end of this section, we'll -briefly discuss some of the problems actually implementing Paper Backpack.) -It is also important to note that we are \emph{willing to duplicate code}; -processes like this already happen in other parts of the compiler -(such as inlining.) - -\paragraph{Relaxing package selection restrictions} As mentioned -previously, GHC is unable to select multiple packages with the same -package name (but different package keys). This restriction needs to be -lifted. We should add a new flag \verb|-package-key|. GHC also knows -about version numbers and will mask out old versions of a library when -you make another version visible; this behavior needs to be modified. - \paragraph{Linker symbols} As we increase the amount of information in PackageId, it's important to be careful about the length of these IDs, as they are used for exported linker symbols (e.g. \verb|base_TextziReadziLex_zdwvalDig_info|). Very long symbol names hurt compile and link time, object file sizes, GHCi startup time, -dynamic linking, and make gdb hard to use. As such, the current plan is -to do away with full package names and versions, and instead use just a -base-62 encoded hash, perhaps with the first four characters of the package +dynamic linking, and make gdb hard to use. As such, we are going to +do away with full package names and versions and instead use just a +base-62 encoded hash, with the first five characters of the package name for user-friendliness. +\subsection{Package selection} + +When I fire up \texttt{ghci} with no arguments, GHC somehow creates +out of thin air some consistent set of packages, whose modules I can +load using \texttt{:m}. This functionality is extremely handy for +exploratory work, but actually GHC has to work quite hard in order +to generate this set of packages, the contents of which are all +dumped into a global namespace. 
For example, GHC doesn't have access +to Cabal's dependency solver, nor does it know \emph{which} packages +the user is going to ask for, so it can't just run a constraint solver, +get a set of consistent packages to offer and provide them to the user.\footnote{Some might +argue that depending on a global environment in this fashion is wrong, because +when you perform a build in this way, you have absolutely no idea what +dependencies you actually ended up using. But the fact remains that for +end users, this functionality is very useful.} + +To make matters worse, while in the current design of the package database, +a package is uniquely identified by its package name and version, in +the Backpack design, it is \emph{mandatory} that we support multiple +packages installed in the database with the same package name and version, +and this can result in complications in the user model. This further +complicates GHC's default package selection algorithm. + +In this section, we describe how the current algorithm operates (including +what invariants it tries to uphold and where it goes wrong), and how +to replace the algorithm to handle generalization to +multiple instances in the package database. We'll also try to tease +apart the relationship between package keys and installed package IDs in +the database. + +\paragraph{The current algorithm} Abstractly, GHC's current package +selection algorithm operates as follows. For every package name, select +the package with the latest version (recall that this is unique) which +is also \emph{valid}. 
A package is valid if: + +\begin{itemize} + \item It exists in the package database, + \item All of its dependencies are valid, + \item It is not shadowed by a package with the same package ID\footnote{Recall that currently, a package ID uniquely identifies a package in the package database} in + another package database (unless it is in the transitive closure + of a package named by \texttt{-package-id}), and + \item It is not ignored with \texttt{-ignore-package}. +\end{itemize} + +Package validity is probably the minimal criterion for GHC to ensure +that it can actually \emph{use} a package. If the package is missing, +GHC can't find the interface files or object code associated with the +package. Ignoring packages is a way of pretending that a package is +missing from the database. + +Package validity is also a very weak criterion. Another criterion we +might hope holds is \emph{consistency}: when we consider the transitive +closure of all selected packages, for any given package name, there +should only be one instance providing that package. It is trivially +easy to break this property: suppose that I have packages \pname{a-1.0}, +\pname{b-1.0} compiled against \pname{a-1.0}, and \pname{a-1.1}. GHC +will happily load \pname{b-1.0} and \pname{a-1.1} together in the same +interactive session (they are both valid and the latest versions), even +though \pname{b-1.0}'s dependency is inconsistent with another package +that was loaded. The user will notice if they attempt to treat entities +from \pname{a} reexported by \pname{b-1.0} and entities from +\pname{a-1.1} as type equal. Here is one user who had this problem: +\url{http://stackoverflow.com/questions/12576817/}. In some cases, the +problem is easy to work around (there is only one offending package +which just needs to be hidden), but if the divergence is deep in two +separate dependency hierarchies, it is often easier to just blow away +the package database and try again. 
+ +Perversely, destructive reinstallation helps prevent these sorts of +inconsistent databases. While inconsistencies can arise when multiple +versions of a package are installed, multiple versions will frequently +lead to the necessity of reinstalls. In the previous example, if a user +attempts to Cabal install a package which depends on \pname{a-1.1} and +\pname{b-1.0}, Cabal's dependency solver will propose reinstalling +\pname{b-1.0} compiled against \pname{a-1.1}, in order to get a +consistent set of dependencies. If this reinstall is accepted, we +invalidate all packages in the database which were previously installed +against \pname{b-1.0} and \pname{a-1.0}, excluding them from GHC's +selection process and making it more likely that the user will see a +consistent view of the database. + +\paragraph{Enforcing consistent dependencies} From the user's +perspective, it would be desirable if GHC never loaded a set of packages +whose dependencies were inconsistent. +There are two ways we can go +about doing this. First, we can improve GHC's logic so that it doesn't +pick an inconsistent set. However, as a point of design, we'd like to +keep whatever resolution GHC does as simple as possible (in an ideal +world, we'd skip the validity checks entirely, but they ended up being +necessary to prevent a broken database from stopping GHC from starting up +at all). In particular, GHC should \emph{not} learn how to do +backtracking constraint solving: that's in the domain of Cabal. Second, +we can modify the logic of Cabal to enforce that the package database is +always kept in a consistent state, similar to the consistency check +Cabal applies to sandboxes, where it refuses to install a package to a +sandbox if the resulting dependencies would not be consistent. 
+ +The second alternative is appealing, but Cabal sandboxes are currently +designed for small, self-contained single projects, as opposed to the +global ``universe'' that a default environment is intended to provide. +For example, with a Cabal sandbox environment, it's impossible to +upgrade a dependency to a new version without blowing away the sandbox +and starting again. To support upgrades, Cabal needs to do some work: +when a new version is put in the default set, all of the +reverse-dependencies of the old version are now inconsistent. Cabal +should offer to hide these packages or reinstall them compiled against +the latest version. Cabal should also be able to snapshot the older +environment which captures the state of the universe prior to the +installation, in case the user wants to revert back. + +\paragraph{Modifying the default environment} Currently, after GHC +calculates the default package environment, a user may further modify +the environment by passing package flags to GHC, which can be used to +explicitly hide or expose packages. How do these flags interact with +our Cabal-managed environments? Hiding packages is simple enough, +but exposing packages is a bit dicier. If a user asks for a different +version of a package than in the default set, it will probably be +inconsistent with the rest of the dependencies. Cabal would have to +be consulted to figure out a maximal set of consistent packages with +the constraints given. + +However, this use-case is rare. Usually, it's not because they want a +specific version: the package is hidden simply because we're not +interested in loading it by default (\pname{ghc-api} is the canonical +example, since it dumps a lot of modules in the top level namespace). +If we distinguish packages which are consistent but hidden, their +loads can be handled appropriately. 
+ +\paragraph{Consistency in Backpack} We have stated as an implicit +assumption that if we have both \pname{foo-1.0} and \pname{foo-1.1} +available, only one should be loaded at a time. What are the +consequences if both of these packages are loaded at the same time? An +import of \m{Data.Foo} provided by both packages would be ambiguous and +the user might find some type equalities they expect to hold would not. +However, the result is not \emph{unsound}: indeed, we might imagine a +user purposely wanting two different versions of a library in the same +program, renaming the modules they provided so that they could be +referred to unambiguously. As another example, suppose that we have an +indefinite package with a hole that is instantiated multiple times. In +this case, a user absolutely may want to refer to both instantiations, +once again renaming modules so that they have unique names. + +There are two consequences of this. First, while the default package +set may enforce consistency, a user should still be able to explicitly +ask for a package instance, renamed so that its modules don't conflict, +and then use it in their program. Second, instantiated indefinite packages +should \emph{never} be placed in the default set, since it's impossible +to know which instantiation is the one the user prefers. A definite package +can reexport an instantiated module under an unambiguous name if the user +so pleases. + +\paragraph{Shadowing, installed package IDs, ABI hashes and package +keys} Shadowing plays an important role for maintaining the soundness of +compilation; call this the \emph{compatibility} of the package set. The +problem it addresses is when there are two distinct implementations of a +module, but because their package ID (or package key, in the new world +order) is the same, they are considered type equal. 
It is absolutely +wrong for a single program to include both implementations +simultaneously (the symbols would conflict and GHC would incorrectly +conclude things were type equal when they're not), so \emph{shadowing}'s +job is to ensure that only one instance is picked, and all the other +instances are considered invalid (along with their reverse-dependencies, etc.). +Recall that in current GHC, within a package database, a package +instance is uniquely identified by its package ID\@; thus, shadowing +only needs to take place between package databases. An interesting +corner case is when the same package ID occurs in both databases, but +the installed package IDs are the \emph{same}. Because the installed +package ID is currently simply an ABI hash, we skip shadowing, because +the packages are---in principle---interchangeable. + +There are currently a number of proposed changes to this state of affairs: + +\begin{itemize} + \item Change installed package IDs to not be based on ABI hashes. + ABI hashes have a number of disadvantages as identifiers for + packages in the database. First, they cannot be computed until + after compilation, which gave the multi-instance GSoC project a + few years ago some headaches. Second, it's not really true that + programs with identical ABI hashes are interchangeable: a new + package may be ABI compatible but have different semantics. + Thus, ABI-hash-based installed package IDs are a poor unique identifier for + packages in the package database. However, because GHC does not + give ABI stability guarantees, it would not be possible to + assume after this change that packages with the same installed package + ID are ABI compatible. + + \item Relax the uniqueness constraint on package IDs. There are + actually two things that could be done here. First, since we + have augmented package IDs with dependency resolution + information to form package keys, we could simply state that + package keys uniquely identify a package in a database. 
+ Shadowing rules can be implemented in the same way as before, by + preferring the instance topmost on the stack. Second, we could + also allow \emph{same-database} shadowing: that is, not even + package keys are guaranteed to be unique in a database; instead, + installed package IDs are the sole unique identifier of a + package. The motivation behind this architecture is to treat + the package database more like a cache than a database: + information about shadowing is separately maintained and used. +\end{itemize} + +Edward thinks same-database shadowing is wrong. What same-database +shadowing implies is that there are multiple incompatible ``package +hierarchies'' (possibly with a shared root), one of which shadows the +other hierarchy. It is now absolutely essential to somehow identify +which hierarchy should be visible (the rest being shadowed). It seems +better to me to explicitly reify this hierarchy as a hierarchy of +package databases. For example, instead of having (installed package +IDs) \texttt{foo-1.0-hash1} and \texttt{foo-1.0-hash2} in the same +database, have a separate database for each, together with the respective dependencies +which are built against those packages. (Notice that all of these dependencies +are incompatible with one another.) Furthermore, because of the precedence +of shadowing, we can store one of these installed package IDs in the primary +database, and then layer the second on top of it (as it takes precedence, +it automatically invalidates all of the packages depending on \texttt{foo-1.0-hash1}, +while keeping packages which are otherwise compatible.) + \section{Shapeless Backpack}\label{sec:simplifying-backpack} Backpack as currently defined always requires a \emph{shaping} pass, |