summaryrefslogtreecommitdiff
path: root/docs/backpack
diff options
context:
space:
mode:
authorEdward Z. Yang <ezyang@cs.stanford.edu>2014-07-31 14:05:02 +0100
committerEdward Z. Yang <ezyang@cs.stanford.edu>2014-07-31 14:05:07 +0100
commita0ff1eb3c0745230cd70525853ca741e08a1f34d (patch)
tree5b85198270207395498647a760b1738f60e19c6a /docs/backpack
parent6fa6caad0cb4ba99b2c0b444b0583190e743dd63 (diff)
downloadhaskell-a0ff1eb3c0745230cd70525853ca741e08a1f34d.tar.gz
[backpack] Package selection
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
Diffstat (limited to 'docs/backpack')
-rw-r--r--docs/backpack/backpack-impl.tex625
1 files changed, 412 insertions, 213 deletions
diff --git a/docs/backpack/backpack-impl.tex b/docs/backpack/backpack-impl.tex
index a921fc38df..0d4a5807ba 100644
--- a/docs/backpack/backpack-impl.tex
+++ b/docs/backpack/backpack-impl.tex
@@ -44,218 +44,222 @@
The purpose of this document is to describe an implementation path
for Backpack in GHC\@.
-We start off by outlining the current architecture of GHC, ghc-pkg and Cabal,
-which constitute the existing packaging system. We then state what our subgoals
-are, since there are many similar sounding but different problems to solve. Next,
-we describe the ``probably correct'' implementation plan, and finish off with
-some open design questions. This is intended to be an evolving design document,
-so please contribute!
-
\tableofcontents
-\section{Current packaging architecture}
-
-The overall architecture is described in Figure~\ref{fig:arch}.
-
-\begin{figure}[H]
- \center{\scalebox{0.8}{\includegraphics{arch.png}}}
-\label{fig:arch}\caption{Architecture of GHC, ghc-pkg and Cabal. Green bits indicate additions from upcoming IHG work, red bits indicate additions from Backpack. Orange indicates a Haskell library.}
-\end{figure}
-
-Here, arrows indicate dependencies from one component to another. Color
-coding is as follows: orange components are libaries, green components
-are to be added with the IHG work, red components are to be added with
-Backpack. (Thus, black and orange can be considered the current)
-
-\subsection{Installed package database}
-
-Starting from the bottom, we have the \emph{installed package database}
-(actually a collection of such databases), which stores information
-about what packages have been installed are thus available to be
-compiled against. There is both a global database (for the system
-administrator) and a local database (for end users), which can be
-updated independently. One way to think about the package database
-is as a \emph{cache of object code}. In principle, one could compile
-any piece of code by repeatedly recompiling all of its dependencies;
-the installed package database describes when this can be bypassed.
-
-\begin{figure}[H]
- \center{\scalebox{0.8}{\includegraphics{pkgdb.png}}}
-\label{fig:pkgdb}\caption{Anatomy of a package database.}
-\end{figure}
-
-In Figure~\ref{fig:pkgdb}, we show the structure of a package database.
-The installed package are created from a Cabal file through the process
-of dependency resolution and compilation. In database terms, the primary key
-of a package database is the InstalledPackageId
-(Figure~\ref{fig:current-pkgid}). This ID uniquely identifies an
-instance of an installed package. The PackageId omits the ABI hash and
-is used to qualify linker exported symbols: the current value of this
-parameter is communicated to GHC using the \verb|-package-id| flag.
-
-In principle, packages with different PackageIds should be linkable
-together in the same compiled program, whereas packages with the same
-PackageId are not (even if they have different InstalledPackageIds). In
-practice, GHC is currently only able to select one version of a package,
-as it clears out all old versions of the package in
-\ghcfile{compiler/main/Package.lhs}:applyPackageFlag.
-
-\begin{figure}
- \center{\begin{tabular}{r l}
- PackageId & package name, package version \\
- InstalledPackageId & PackageId, ABI hash \\
- \end{tabular}}
-\label{fig:current-pkgid}\caption{Current structure of package identifiers.}
-\end{figure}
-
-The database entry itself contains the information from the installed package ID,
-as well as information such as what dependencies it was linked against, where
-its compiled code and interface files live, its compilation flags, what modules
-it exposes, etc. Much of this information is only relevant to Cabal; GHC
-uses a subset of the information in the package database.
-
-\subsection{GHC}
+\section{What we are trying to solve}
-The two programs which access the package database directly are GHC
-proper (for compilation) and ghc-pkg (which is a general purpose
-command line tool for manipulating the database.) GHC relies on
-the package database in the following ways:
-
-\begin{itemize}
- \item It imports the local and global package databases into
- its runtime database, and applies modifications to the exposed
- and trusted status of the entries via the flags \verb|-package|
- and others (\ghcfile{compiler/main/Packages.lhs}). The internal
- package state can be seen at \verb|-v4| or higher.
- \item It uses this package database to find the location of module
- interfaces when it attempts to load the module info of an external
- module (\ghcfile{compiler/iface/LoadIface.hs}).
-\end{itemize}
+While the current ecosystem has proved itself serviceable for many years,
+there are a number of major problems which causes significant headaches
+for many users. Here are some of them:
-GHC itself performs a type checking phase, which generates an interface
-file representing the module (so that later invocations of GHC can load the type
-of a module), and then after compilation projects object files and linked archives
-for programs to use.
+\subsection{Package reinstalls are destructive}\label{sec:destructive}
-\paragraph{Original names} Original names are an important design pattern
-in GHC\@.
-Sometimes, a name can be exposed in an hi file even if its module
-wasn't exposed. Here is an example (compiled in package R):
+When attempting to install a new package, you might get an error like
+this:
\begin{verbatim}
-module X where
- import Internal (f)
- g = f
-
-module Internal where
- import Internal.Total (f)
+$ cabal install hakyll
+cabal: The following packages are likely to be broken by the reinstalls:
+pandoc-1.9.4.5
+Graphalyze-0.14.0.0
+Use --force-reinstalls if you want to install anyway.
\end{verbatim}
-Then in X.hi:
+While this error message is understandable if you're really trying to
+reinstall a package, it is quite surprising that it can occur even if
+you didn't ask for any reinstalls!
+
+The underlying cause of this problem is related to an invariant Cabal
+currently enforces on a package database: there can only be one instance
+of a package for any given package name and version. This means that it
+is not possible to install a package multiple times, compiled against
+different dependencies. However, sometimes, reinstalling a package with
+different dependencies is the only way to fulfill version bounds of a
+package! For example: say we have three packages \pname{a}, \pname{b}
+and \pname{c}. \pname{b-1.0} is the only version of \pname{b}
+available, and it has been installed and compiled against \pname{c-1.0}.
+Later, the user installs an updated version \pname{c-1.1} and then
+attempts to install \pname{a}, which depends on the specific versions
+\pname{c-1.1} and \pname{b-1.0}. We \emph{cannot} use the already
+installed version of \pname{b-1.0}, which depends on the wrong version
+of \pname{c}, so our only choice is to reinstall \pname{b-1.0} compiled
+against \pname{c-1.1}. This will break any packages, e.g. \pname{d},
+which were built against the old version of \pname{b-1.0}.
+
+Our solution to this problem is to \emph{abolish} destructive package
+installs, and allow a package to be installed multiple times with the same
+package name and version. However, allowing this poses some interesting
+user interface problems, since package IDs are now no longer unambiguous
+identifiers.
+
+\subsection{Version bounds are often over/under-constrained}
+
+When attempting to install a new package, Cabal might fail in this way:
\begin{verbatim}
-g = <R.id, Internal.Total, f> (this is the original name)
+$ cabal install hledger-0.18
+Resolving dependencies...
+cabal: Could not resolve dependencies:
+# pile of output
\end{verbatim}
-(The reason we refer to the package as R.id is because it's the
-full package ID, and not just R).
-
-\subsection{hs-boot}
-
-\verb|hs-boot| is a special mechanism used to support recursive linking
-of modules within a package, today. Suppose I have a recursive module
-dependency between modules and A and B. I break one of\ldots
+There are a number of possible reasons why this could occur, but usually
+it's because some of the packages involved have over-constrained version
+bounds, which are resulting in an unsatisfiable set of constraints (or,
+at least, Cabal gave up backtracking before it found a solution.) To
+add insult to injury, most of the time the bound is nonsense and removing
+it would result in a working compilation. In fact, this situation is
+so common that Cabal has a flag \verb|--allow-newer| which lets you
+override the package upper bounds.
+
+However, the flip-side is when Cabal finds a satisfying set, but your
+compilation fails with a type error. Here, you had an under-constrained
+set of version bounds which didn't actually reflect the compatible
+versions of a package, and Cabal picked a version of the package which
+was incompatible.
+
+Our solution to this problem is to use signatures instead of version
+numbers as the primary mechanism by which compatibility is determined:
+e.g., if it typechecks, it's a valid choice. Version numbers can still
+be used to reflect semantic changes not seen in the types (in
+particular, ruling out buggy versions of a package is a useful
+operation), but these bounds are empirical observations and can be
+collected after-the-fact.
+
+\subsection{It is difficult to support multiple implementations of a type}
+
+This problem is perhaps best described by referring to a particular
+instance of it Haskell's ecosystem: the \texttt{String} data type. Haskell,
+by default, implements strings as linked lists of integers (representing
+characters). Many libraries use \texttt{String}, because it's very
+convenient to program against. However, this representation is also
+very \emph{slow}, so there are alternative implementations such as
+\texttt{Text} which implement efficient, UTF-8 encoded packed byte
+arrays.
+
+Now, suppose you are writing a library and you don't care if the user of
+your library is using \texttt{String} or \texttt{Text}. However, you
+don't want to rewrite your library twice to support both data types:
+rather, you'd like to rely on some \emph{common interface} between the
+two types, and let the user instantiate the implementation. The only
+way to do this in today's Haskell is using type classes; however, this
+necessitates rewriting all type signatures from a nice \texttt{String ->
+String} to \texttt{StringLike s => s -> s}. The result is less readable,
+required a large number of trivial edits to type signatures, and might
+even be less efficient, if GHC does not appropriately specialize your code
+written in this style.
+
+Our solution to this problem is to introduce a new mechanism of
+pluggability: module holes, which let us use types and functions from a
+module \texttt{Data.String} as before, but defer choosing \emph{what}
+module should be used in the implementation to some later point (or
+instantiate the code multiple times with different choices.)
+
+\subsection{Fast moving APIs are difficult to develop/develop against}
+
+Most packages that are uploaded to Hackage have package authors which pay
+some amount of attention to backwards compatibility and avoid making egregious
+breaking changes. However, a package like the \verb|ghc-api| follows a
+very different model: the library is a treated by its developers as an
+internal component of an application (GHC), and is frequently refactored
+in a way that changes its outwards facing interface.
+
+Arguably, an application like GHC should design a stable API and
+maintain backwards compatibility against it. However, this is a lot of
+work (including refactoring) which is only being done slowly, and in the
+meantime, the dump of all the modules gives users the functionality they
+want (even if it keeps breaking every version.)
+
+One could say that the core problem is there is no way for users to
+easily communicate to GHC authors what parts of the API they rely on. A
+developer of GHC who is refactoring an interface will often rely on the
+typechecker to let them know which parts of the codebase they need to
+follow and update, and often could say precisely how to update code to
+use the new interface. User applications, which live out of tree, don't
+receive this level of attention.
+
+Our solution is to make it possible to typecheck the GHC API against a
+signature. Important consumers can publish what subsets of the GHC API
+they rely against, and developers of GHC, as part of their normal build
+process, type-check against these signatures. If the signature breaks,
+a developer can either do the refactoring differently to avoid the
+compatibility-break, or document how to update code to use the new API\@.
+
+\section{Backpack in a nutshell}
+
+For a more in-depth tutorial about Backpack's features, check out Section 2
+of the original Backpack paper. In this section, we briefly review the
+most important points of Backpack's design.
+
+\paragraph{Thinning and renaming at the module level}
+A user can specify a build dependency which only exposes a subset of
+modules (possibly under different names.) By itself, it's a way for the
+user to resolve ambiguous module imports at the package level, without
+having to use the \texttt{PackageImports} syntax extension.
+
+\paragraph{Holes (abstract module definitions)} The core
+component of Backpack's support for \emph{separate modular development}
+is the ability to specify abstract module bindings, or holes, which give
+users of the module an obligation to provide an implementation which
+fulfills the signature of the hole. In this example:
-(ToDo: describe how hs-boot mechanism works)
+\begin{verbatim}
+package p where
+ A :: [ ... ]
+ B = [ import A; ... ]
+\end{verbatim}
-\subsection{Cabal}
+\verb|p| is an \emph{indefinite package}, which cannot be compiled until
+an implementation of \m{A} is provided. However, we can still type check
+\m{B} without any implementation of \m{A}, by type checking it against
+the signature. Holes can be put into signature packages and included
+(depended upon) by other packages to reuse definitions of signatures.
-Cabal is the build system for GHC, we can think of it as parsing a Cabal
-file describing a package, and then making (possibly multiple)
-invocations to GHC to perform the appropriate compilation. What
-information does Cabal pass onto GHC\@? One can get an idea for this by
-looking at a prototypical command line that Cabal invokes GHC with:
+\paragraph{Filling in holes with an implementation}
+A hole in an indefinite package can be instantiated in a \emph{mix-in}
+style: namely, if a signature and an implementation have the same name,
+they are linked together:
\begin{verbatim}
-ghc --make
- -package-name myapp-0.1
- -hide-all-packages
- -package-id containers-0.9-ABCD
- Module1 Module2
+package q where
+ A = [ ... ]
+ include p -- has signature A
\end{verbatim}
-There are a few things going on here. First, Cabal has to tell GHC
-what the name of the package it's compiling (otherwise, GHC can't appropriately
-generate symbols that other code referring to this package might generate).
-There are also a number of commands which configure its in-memory view of
-the package database (GHC's view of the package database may not directly
-correspond to what is on disk). There's also an optimization here: in principle,
-GHC can compile each module one-by-one, but instead we use the \verb|--make| flag
-because this allows GHC to reuse some data structures, resulting in a nontrivial
-speedup.
-
-(ToDo: describe cabal-install/sandbox)
+Renaming is often useful to rename a module (or a hole) so that a signature
+and implementation have the same name and are linked together.
+An indefinite package can be instantiated multiple times with different
+implementations: the \emph{applicativity} of Backpack means that if
+a package is instantiated separately with the same module, the results
+are type equal:
-\section{Goals}
+\begin{verbatim}
+package q' where
+ A = [ ... ]
+ include p (A, B as B1)
+ include p (A, B as B2)
+ -- B1 and B2 are equivalent
+\end{verbatim}
-Here are some of the high-level goals which motivate our improvements to
-the module system.
+\paragraph{Combining signatures together}
+Unlike implementations, it's valid for a multiple signatures with the
+same name to be in scope.
-\begin{itemize}
- \item Solve \emph{Cabal hell}, a situation which occurs when conflicting
- version ranges on a wide range of dependencies leads to a situation
- where it is impossible to satisfy the constraints. We're seeking
- to solve this problem in two ways: first, we want to support
- multiple instances of containers-2.9 in the database which are
- compiled with different dependencies (and even link them
- together), and second, we want to abolish (often inaccurate)
- version ranges and move to a regime where packages depend on
- signatures. Version ranges may still be used to indicate important
- semantic changes (e.g., bugs or bad behavior on the part of package
- authors), but they should no longer drive dependency resolution
- and often only be recorded after the fact.
-
- \item Support \emph{hermetic builds with sharing}. A hermetic build
- system is one which simulates rebuilding every package whenever
- it is built; on the other hand, actually rebuilding every time
- is extremely inefficient (but what happens in practice with
- Cabal sandboxes). We seek to solve this problem with the IHG work,
- by allowing multiple instances of a package in the database, where
- the only difference is compilation parameters. We don't care
- about being able to link these together in a single program.
-
- \item Support \emph{module-level pluggability} as an alternative to
- existing (poor) usage of type classes. The canonical example are
- strings, where a library might want to either use the convenient
- but inefficient native strings, or the efficient packed Text data
- type, but would really like to avoid having to say \verb|StringLike s => ...|
- in all of their type signatures. While we do not plan on supporting
- separate compilation, Cabal should understand how to automatically
- recompile these ``indefinite'' packages when they are instantiated
- with a new plugin.
-
- \item Support \emph{separate modular development}, where a library and
- an application using the library can be developed and type-checked
- separately, intermediated by an interface. The canonical example
- here is the \verb|ghc-api|, which is a large, complex API that
- the library writers (GHC developers) frequently change---the ability
- for downstream projects to say, ``Here is the API I'm relying on''
- without requiring these projects to actually be built would greatly
- assist in keeping the API stable. This is applicable in
- the pluggability example as well, where we want to ensure that all
- of the $M \times N$ configurations of libraries versus applications
- type check, by only running the typechecker $M + N$ times. A closely
- related concern is related toolchain support for extracting a signature
- from an existing implementation, as current Haskell culture is averse
- to explicitly writing separate signature files.
-
- \item Subsume existing support for \emph{mutually recursive modules},
- without the double vision problem.
-\end{itemize}
+\begin{verbatim}
+package a-sig where
+ A :: [ ... ]
+package a-sig2 where
+ A :: [ ... ]
+package q where
+ include a-sig
+ include a-sig2
+ B = [ import A; ... ]
+\end{verbatim}
-A \emph{non-goal} is to allow users to upgrade upstream libraries
-without recompiling downstream. This is an ABI concern and we're not
-going to worry about it.
+These signatures \emph{merge} together, providing the union of the
+functionality (assuming the types of individual entities are
+compatible.) Backpack has a very simple merging algorithm: types must
+match exactly to be compatible (\emph{width} subtyping).
\clearpage
@@ -718,38 +722,233 @@ but we must record this tree \emph{even} when our package has no holes.
% As a final example, the full module
% identity of \m{B1} in Figure~\ref{fig:granularity} may actually be $\pname{p-0.9(q-1.0[p-0.9]:A1)}$:\m{B}.
-
-\subsection{Implementation}
-
-In GHC's current packaging system, a single package compiles into a
-single entry in the installed package database, indexed by the package
-key. This property is preserved by package-level granularity, as we
-assign the same package key to all modules. Package keys provide an
-easy mechanism for sharing to occur: when an indefinite package is fully
-instantiated, we can check if we already have its package key installed
-in the installed package database. (At the end of this section, we'll
-briefly discuss some of the problems actually implementing Paper Backpack.)
-It is also important to note that we are \emph{willing to duplicate code};
-processes like this already happen in other parts of the compiler
-(such as inlining.)
-
-\paragraph{Relaxing package selection restrictions} As mentioned
-previously, GHC is unable to select multiple packages with the same
-package name (but different package keys). This restriction needs to be
-lifted. We should add a new flag \verb|-package-key|. GHC also knows
-about version numbers and will mask out old versions of a library when
-you make another version visible; this behavior needs to be modified.
-
\paragraph{Linker symbols} As we increase the amount of information in
PackageId, it's important to be careful about the length of these IDs,
as they are used for exported linker symbols (e.g.
\verb|base_TextziReadziLex_zdwvalDig_info|). Very long symbol names
hurt compile and link time, object file sizes, GHCi startup time,
-dynamic linking, and make gdb hard to use. As such, the current plan is
-to do away with full package names and versions, and instead use just a
-base-62 encoded hash, perhaps with the first four characters of the package
+dynamic linking, and make gdb hard to use. As such, we are going to
+do away with full package names and versions and instead use just a
+base-62 encoded hash, with the first five characters of the package
name for user-friendliness.
+\subsection{Package selection}
+
+When I fire up \texttt{ghci} with no arguments, GHC somehow creates
+out of thin air some consistent set of packages, whose modules I can
+load using \texttt{:m}. This functionality is extremely handy for
+exploratory work, but actually GHC has to work quite hard in order
+to generate this set of packages, the contents of which are all
+dumped into a global namespace. For example, GHC doesn't have access
+to Cabal's dependency solver, nor does it know \emph{which} packages
+the user is going to ask for, so it can't just run a constraint solver,
+get a set of consistent packages to offer and provide them to the user.\footnote{Some might
+argue that depending on a global environment in this fashion is wrong, because
+when you perform a build in this way, you have absolutely no ideas what
+dependencies you actually ended up using. But the fact remains that for
+end users, this functionality is very useful.}
+
+To make matters worse, while in the current design of the package database,
+a package is uniquely identified by its package name and version, in
+the Backpack design, it is \emph{mandatory} that we support multiple
+packages installed in the database with the same package name and version,
+and this can result in complications in the user model. This further
+complicates GHC's default package selection algorithm.
+
+In this section, we describe how the current algorithm operates (including
+what invariants it tries to uphold and where it goes wrong), and how
+to replace the algorithm to handle generalization to
+multiple instances in the package database. We'll also try to tease
+apart the relationship between package keys and installed package IDs in
+the database.
+
+\paragraph{The current algorithm} Abstractly, GHC's current package
+selection algorithm operates as follows. For every package name, select
+the package with the latest version (recall that this is unique) which
+is also \emph{valid}. A package is valid if:
+
+\begin{itemize}
+ \item It exists in the package database,
+ \item All of its dependencies are valid,
+ \item It is not shadowed by a package with the same package ID\footnote{Recall that currently, a package ID uniquely identifies a package in the package database} in
+ another package database (unless it is in the transitive closure
+ of a package named by \texttt{-package-id}), and
+ \item It is not ignored with \texttt{-ignore-package}.
+\end{itemize}
+
+Package validity is probably the minimal criterion for to GHC to ensure
+that it can actually \emph{use} a package. If the package is missing,
+GHC can't find the interface files or object code associated with the
+package. Ignoring packages is a way of pretending that a package is
+missing from the database.
+
+Package validity is also a very weak criterion. Another criterion we
+might hope holds is \emph{consistency}: when we consider the transitive
+closure of all selected packages, for any given package name, there
+should only be one instance providing that package. It is trivially
+easy to break this property: suppose that I have packages \pname{a-1.0},
+\pname{b-1.0} compiled against \pname{a-1.0}, and \pname{a-1.1}. GHC
+will happily load \pname{b-1.0} and \pname{a-1.1} together in the same
+interactive session (they are both valid and the latest versions), even
+though \pname{b-1.0}'s dependency is inconsistent with another package
+that was loaded. The user will notice if they attempt to treat entities
+from \pname{a} reexported by \pname{b-1.0} and entities from
+\pname{a-1.1} as type equal. Here is one user who had this problem:
+\url{http://stackoverflow.com/questions/12576817/}. In some cases, the
+problem is easy to work around (there is only one offending package
+which just needs to be hidden), but if the divergence is deep in two
+separate dependency hierarchies, it is often easier to just blow away
+the package database and try again.
+
+Perversely, destructive reinstallation helps prevent these sorts of
+inconsistent databases. While inconsistencies can arise when multiple
+versions of a package are installed, multiple versions will frequently
+lead to the necessity of reinstalls. In the previous example, if a user
+attempts to Cabal install a package which depends on \pname{a-1.1} and
+\pname{b-1.0}, Cabal's dependency solver will propose reinstalling
+\pname{b-1.0} compiled against \pname{a-1.1}, in order to get a
+consistent set of dependencies. If this reinstall is accepted, we
+invalidate all packages in the database which were previously installed
+against \pname{b-1.0} and \pname{a-1.0}, excluding them from GHC's
+selection process and making it more likely that the user will see a
+consistent view of the database.
+
+\paragraph{Enforcing consistent dependencies} From the user's
+perspective, it would be desirable if GHC never loaded a set of packages
+whose dependencies were inconsistent.
+There are two ways we can go
+about doing this. First, we can improve GHC's logic so that it doesn't
+pick an inconsistent set. However, as a point of design, we'd like to
+keep whatever resolution GHC does as simple as possible (in an ideal
+world, we'd skip the validity checks entirely, but they ended up being
+necessary to prevent broken database from stopping GHC from starting up
+at all). In particular, GHC should \emph{not} learn how to do
+backtracking constraint solving: that's in the domain of Cabal. Second,
+we can modify the logic of Cabal to enforce that the package database is
+always kept in a consistent state, similar to the consistency check
+Cabal applies to sandboxes, where it refuses to install a package to a
+sandbox if the resulting dependencies would not be consistent.
+
+The second alternative is a appealing, but Cabal sandboxes are currently
+designed for small, self-contained single projects, as opposed to the
+global ``universe'' that a default environment is intended to provide.
+For example, with a Cabal sandbox environment, it's impossible to
+upgrade a dependency to a new version without blowing away the sandbox
+and starting again. To support upgrades, Cabal needs to do some work:
+when a new version is put in the default set, all of the
+reverse-dependencies of the old version are now inconsistent. Cabal
+should offer to hide these packages or reinstall them compiled against
+the latest version. Cabal should also be able to snapshot the older
+environment which captures the state of the universe prior to the
+installation, in case the user wants to revert back.
+
+\paragraph{Modifying the default environment} Currently, after GHC
+calculates the default package environment, a user may further modify
+the environment by passing package flags to GHC, which can be used to
+explicitly hide or expose packages. How do these flags interact with
+our Cabal-managed environments? Hiding packages is simple enough,
+but exposing packages is a bit dicier. If a user asks for a different
+version of a package than in the default set, it will probably be
+inconsistent with the rest of the dependencies. Cabal would have to
+be consulted to figure out a maximal set of consistent packages with
+the constraints given.
+
+However, this use-case is rare. Usually, it's not because they want a
+specific version: the package is hidden simply because we're not
+interested in loading it by default (\pname{ghc-api} is the canonical
+example, since it dumps a lot of modules in the top level namespace).
+If we distinguish packages which are consistent but hidden, their
+loads can be handled appropriately.
+
+\paragraph{Consistency in Backpack} We have stated as an implicit
+assumption that if we have both \pname{foo-1.0} and \pname{foo-1.1}
+available, only one should be loaded at a time. What are the
+consequences if both of these packages are loaded at the same time? An
+import of \m{Data.Foo} provided by both packages would be ambiguous and
+the user might find some type equalities they expect to hold would not.
+However, the result is not \emph{unsound}: indeed, we might imagine a
+user purposely wanting two different versions of a library in the same
+program, renaming the modules they provided so that they could be
+referred to unambiguously. As another example, suppose that we have an
+indefinite package with a hole that is instantiated multiple times. In
+this case, a user absolutely may want to refer to both instantiations,
+once again renaming modules so that they have unique names.
+
+There are two consequences of this. First, while the default package
+set may enforce consistency, a user should still be able to explicitly
+ask for a package instance, renamed so that its modules don't conflict,
+and then use it in their program. Second, instantiated indefinite packages
+should \emph{never} be placed in the default set, since it's impossible
+to know which instantiation is the one the user prefers. A definite package
+can reexport an instantiated module under an unambiguous name if the user
+so pleases.
+
+\paragraph{Shadowing, installed package IDs, ABI hashes and package
+keys} Shadowing plays an important role for maintaining the soundness of
+compilation; call this the \emph{compatibility} of the package set. The
+problem it addresses is when there are two distinct implementations of a
+module, but because their package ID (or package key, in the new world
+order) are the same, they are considered type equal. It is absolutely
+wrong for a single program to include both implementations
+simultaneously (the symbols would conflict and GHC would incorrectly
+conclude things were type equal when they're not), so \emph{shadowing}'s
+job is to ensure that only one instance is picked, and all the other
+instances considered invalid (and their reverse-dependencies, etc.)
+Recall that in current GHC, within a package database, a package
+instance is uniquely identified by its package ID\@; thus, shadowing
+only needs to take place between package databases. An interesting
+corner case is when the same package ID occurs in both databases, but
+the installed package IDs are the \emph{same}. Because the installed
+package ID is currently simply an ABI hash, we skip shadowing, because
+the packages are---in principle---interchangeable.
+
+There are currently a number of proposed changes to this state of affairs:
+
+\begin{itemize}
+ \item Change installed package IDs to not be based on ABI hashes.
+ ABI hashes have a number of disadvantages as identifiers for
+ packages in the database. First, they cannot be computed until
+ after compilation, which gave the multi-instance GSoC project a
+ few years some headaches. Second, it's not really true that
+ programs with identical ABI hashes are interchangeable: a new
+ package may be ABI compatible but have different semantics.
+ Thus, installed package IDs are a poor unique identifier for
+ packages in the package database. However, because GHC does not
+ give ABI stability guarantees, it would not be possible to
+ assume from here that packages with the same installed package
+ ID are ABI compatible.
+
+ \item Relaxing the uniqueness constraint on package IDs. There are
+ actually two things that could be done here. First, since we
+ have augmented package IDs with dependency resolution
+ information to form package keys, we could simply state that
+ package keys uniquely identify a package in a database.
+ Shadowing rules can be implemented in the same way as before, by
+ preferring the instance topmost on the stack. Second, we could
+ also allow \emph{same-database} shadowing: that is, not even
+ package keys are guaranteed to be unique in a database: instead,
+ installed package IDs are the sole unique identifier of a
+ package. The motivation behind this architecture is to treat
+ the package database more like a cache rather than a database:
+ information about shadowing is separately maintained and used.
+\end{itemize}
+
+Edward thinks same-database shadowing is wrong. What same-database
+shadowing implies is that there are multiple incompatible ``package
+hierarchies'' (possibly with a shared root), one of which shadows the
+other hierarchy. It is now absolutely essential to somehow identify
+which hierarchy should be visible (the rest being shadowed). It seems
+better to me to explicitly reify this hierarchy as a hierarchy of
+package databases. For example, instead of having (installed package
+IDs) \texttt{foo-1.0-hash1} and \texttt{foo-1.0-hash2} in the same
+database, have a separate database for each, and the respective dependencies
+which are built against those packages. (Notice that all of these dependencies
+are incompatible with one another.) Furthermore, because of the precedence
+of shadowing, we can store one of these installed package IDs in the primary
+database, and then layer the second on top of it (as it takes precedence,
+it automatically invalidates all of the packages depending on \texttt{foo-1.0-hash1},
+while keeping packages which are otherwise compatible.)
+
\section{Shapeless Backpack}\label{sec:simplifying-backpack}
Backpack as currently defined always requires a \emph{shaping} pass,