summaryrefslogtreecommitdiff
path: root/docs
diff options
context:
space:
mode:
authorEdward Z. Yang <ezyang@cs.stanford.edu>2014-07-09 19:01:11 +0100
committerEdward Z. Yang <ezyang@cs.stanford.edu>2014-07-09 19:01:11 +0100
commita77e07970a89f2101e3c4f0429c6f426f06f1faf (patch)
treec7a131c92f9cc9e9d97bff7bdf65cd8011ff40c7 /docs
parentafe7bc188804dabf4c35c77e5442c0fe10b32afd (diff)
downloadhaskell-a77e07970a89f2101e3c4f0429c6f426f06f1faf.tar.gz
Start expanding out linking text
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
Diffstat (limited to 'docs')
-rw-r--r--docs/backpack/backpack-impl.tex331
1 files changed, 239 insertions, 92 deletions
diff --git a/docs/backpack/backpack-impl.tex b/docs/backpack/backpack-impl.tex
index e7d32c3137..ecf047c8ca 100644
--- a/docs/backpack/backpack-impl.tex
+++ b/docs/backpack/backpack-impl.tex
@@ -5,6 +5,9 @@
\usepackage{float}
\usepackage{titling}
\usepackage{hyperref}
+\usepackage{tikz}
+\usetikzlibrary{arrows}
+\usetikzlibrary{positioning}
\setlength{\droptitle}{-6em}
\newcommand{\ghcfile}[1]{\textsl{#1}}
@@ -170,39 +173,53 @@ speedup.
\section{Goals}
-There are actually a number of different goals we have for modifying the
-packaging system, some of which are subsets of the Backpack system.
+Here are some of the high-level goals which motivate our improvements to
+the module system.
\begin{itemize}
- \item As a prerequisite, support multiple instances of containers-2.9 \emph{in the
- package database}. These instances may be compiled against
- different dependencies, have the same dependencies but different
- source files (as when a package is being developed), or be
- compiled with different options. It is less important to allow
- these instances to be linkable together.\footnote{Actually, I think
- this is completely orthogonal to Backpack, since we are going to treat
- dependencies as part of the package ID, so they would be considered
- separate entries in the package database anyway.}
-
- \item Support typechecking a library against a module interface
- as opposed to an actual implementation. This includes not
- only support proper, but also toolchain support for generating
- and keeping interface files up-to-date, and taking advantage
- of this discipline to augment Cabal package dependency resolution
- with proper signature packages. % See Section~\ref{sec:signature-packages} for more details.
-
- \item Support compiling the aforementioned libraries with actual implementations.
- It is \emph{not} a goal to be able to compile a library while only
- partially providing dependencies, and it is low priority to support
- mutually recursive implementations of these implementations.
+ \item Solve \emph{Cabal hell}, a situation which occurs when conflicting
+ version ranges on a wide range of dependencies leads to a situation
+ where it is impossible to satisfy the constraints. We're seeking
+ to solve this problem in two ways: first, we want to support
+ multiple instances of containers-2.9 in the database which are
+ compiled with different dependencies (and even link them
+ together), and second, we want to abolish (often inaccurate)
+ version ranges and move to a regime where packages depend on
+ signatures.
+
+ \item Support \emph{hermetic builds with sharing}. A hermetic build
+ system is one which simulates rebuilding every package whenever
+ it is built; on the other hand, actually rebuilding every time
+ is extremely inefficient (but what happens in practice with
+ Cabal sandboxes). We seek to solve this problem with the IHG work,
+ by allowing multiple instances of a package in the database, where
+ the only difference is compilation parameters. We don't care
+ about being able to link these together in a single program.
+
+ \item Support \emph{module-level pluggability} as an alternative to
+ existing (poor) usage of type classes. The canonical example are
+ strings, where a library might want to either use the convenient
+ but inefficient native strings, or the efficient packed Text data
+ type, but would really like to avoid having to say \verb|StringLike s => ...|
+ in all of their type signatures. While we do not plan on supporting
+ separate compilation, Cabal should understand how to automatically
+ recompile these ``indefinite'' packages when they are instantiated
+ with a new plugin.
+
+ \item Support \emph{separate modular development}, where a library and
+ an application using the library can be developed and type-checked
+ separately, intermediated by an interface. This is applicable in
+ the pluggability example as well, where we want to ensure that all
+ of the $M \times N$ configurations of libraries versus applications
+ type check, by only running the typechecker $M + N$ times. A closely
+ related concern is related toolchain support for extracting a signature
+ from an existing implementation, as current Haskell culture is averse
+ to explicitly writing separate signature files.
+
+ \item Subsume existing support for \emph{mutually recursive modules},
+ without the double vision problem.
\end{itemize}
-A lower priority goal is to actually allow multiple instances of
-containers-2.9 to be linked together in the same executable
-program.\footnote{In particular, this requires changes to how linker
-symbols are assigned. However, this feature is important to
-implement a number of Backpack features.}
-
A \emph{non-goal} is to allow users to upgrade upstream libraries
without recompiling downstream. This is an ABI concern and we're not
going to worry about it.
@@ -269,14 +286,7 @@ of a package to be installed at the same time, use of \emph{random
numbers} was used to workaround the inability to get an ABI early
enough. We are not using this plan.
-\section{Infrastructural improvements}
-
-There are some infrastructural improvements that must be made before
-Backpack proper can be implemented. These additions are described in
-red in the architectural diagrams. The current structure of this
-section is to describe the additions bottom up.
-
-\subsection{Concrete physical identity = PackageId + Module name + Module dependencies}\label{sec:ipi}
+\section{Concrete physical identity = PackageId + Module name + Module dependencies}\label{sec:ipi}
\begin{figure}
\center{\begin{tabular}{r l}
@@ -339,10 +349,6 @@ to do away with full package names and versions, and instead use just a
base-62 encoded hash, perhaps with the first four characters of the package
name for user-friendliness.
-Edward: I'm still partial to a short hash of the dependency bits (or
-even Simon's registry of short strings to dependency trees), and keeping
-everything else the same.
-
\paragraph{Wired-in names} One annoying thing to remember is that GHC
has wired-in names, which refer to packages without any version. Now
these wired names also have to accomodate dependency trees. A
@@ -388,16 +394,17 @@ to have free variables, which represent holes. Handling this is a
tricky question, and we defer it to Section~\ref{sec:typechecking-indefinite}, when
we talk about packages with holes.
-\subsection{Flatten the package database}\label{sec:flatten}
+\section{The granularity of packages versus system libraries}\label{sec:flatten}
-In the previous section, I implied that the \emph{module} dependencies
-could be recorded with the \emph{package} ID\@. Type error? Yes!
+In the original design of GHC's packaging system, the intent was that
+a package be compiled into a system library, e.g. \verb|libHSbase-4.7.1.0.so|
+or \verb|libHSbase-4.7.1.0.a|. Dependencies between packages translate simply
+into dependencies between system libraries.
-In the old world order, we think of dependencies as being associated with
-packages as a whole. However, Paper Backpack specifies dependency tracking
-at the \emph{module} level, which means that different compilations of
-an indefinite package may result in modules which should be identified.
-Here is a Backpack package which demonstrates the issue:
+Interestingly enough, Paper Backpack specifies dependency tracking at
+the \emph{module} level, which is a finer granularity than we have been
+tracking previously. Here is a Backpack package which demonstrates the
+issue:
\begin{verbatim}
package a where
@@ -417,48 +424,188 @@ package y where
include a
\end{verbatim}
-Here, we have a package \verb|a| which probably should have been defined as
-two separate packages (since \verb|A| only relies on \verb|P| and \verb|B|
-only relies on \verb|Q|), but the functionality has been glommed together.
-Then, the downstream packages \verb|x| and \verb|y| fill in the holes using the
-same implementation of \verb|P|, but differing implementations of \verb|Q|.
-(For more explanation about how we would go about compiling this set of
-packages, please see Section~\ref{sec:compiling-definite}.)
-
-Is module \verb|A| from package \verb|x| the same as module \verb|A|
-from package \verb|y|? In Paper Backpack, the answer is yes.
-However, presently, the installed package database acts as a cache at the \emph{package}
-level; code is only shared if it comes from the same package. Can we share
-packages \verb|x| and \verb|y|? No!
-\verb|x:B| is \emph{not} the same module as \verb|y:B| (they are using differing
-versions of \verb|Q|). The upshot is that we are in an awkward position,
-where package \verb|a| contains some modules which must be distinct, and other
-modules which must be unified over several installs.
-
-The theory, thus, demands that we flatten the package database, so that
-we no longer insist that all compiled code that is produced when a
-package is installed live together in a single directory: instead,
-\emph{the installed package database is a directory of physical module
-identities to objects/interfaces}. Installed packages are represented
-as (possibly overlapping) sets over this store of modules.
-
-\paragraph{Do we really care about this use case?} Scott gave me one
-good argument why this is a nice feature to have: it permits users to
-modularize third-party packages which were not packaged with modularity
-in mind, but happen to be modular. For example, when libraries ship
-with test-cases, they currently have to split these test-cases to separate
-packages, so as to not introduce spurious dependencies with various
-test frameworks, which the user may not have or care about. If dependencies
-are done on a per-module basis, as long as the user doesn't import a test
-module, they never gain the extra dependency. Another situation is when
-a library also happens to have a helper utility module which doesn't have
-any of the dependencies of the primary library.
-
-One could argue that people already understand it is good practice to
-separate such cases into separate packages, and there is no pressing
-need to allow for sloppy habits. The counterargument, however, is that
-you are often not in control of these third-party libraries, and the
-more control in the end-user's hands, the better.
+Here, we have a package \verb|a| which probably should have been defined
+as two separate packages (since \verb|A| only relies on \verb|P| and
+\verb|B| only relies on \verb|Q|), but the functionality has been
+glommed together. Then, the downstream packages \verb|x| and \verb|y|
+fill in the holes using the same implementation of \verb|P|, but
+differing implementations of \verb|Q|. (For more explanation about how
+we would go about compiling this set of packages, please see
+Section~\ref{sec:compiling-definite}.) The operant question: is module
+\verb|A| from package \verb|x| the same as module \verb|A| from package
+\verb|y|? In Paper Backpack, the answer is yes.
+
+From an end-user perspective, why is this desirable? One answer is that
+it permits users to modularize third-party packages which were not
+packaged with modularity in mind, but happen to be modular. For
+example, when libraries ship with test-cases, they currently have to
+split these test-cases to separate packages, so as to not introduce
+spurious dependencies with various test frameworks, which the user may
+not have or care about. If dependencies are done on a per-module basis,
+as long as the user doesn't import a test module, they never gain the
+extra dependency. Another situation is when a library also happens to
+have a helper utility module which doesn't have any of the dependencies
+of the primary library. And yet another situation is when we want to
+define type class instances for data types without orphan instances, but
+without pulling in a pile of packages defining type classes for things
+an end user doesn't care about.
+
+From an implementation perspective, however, this answer is quite distressing:
+
+\begin{enumerate}
+ \item When I install package \verb|x| and then install package
+ \verb|y|, what should happen to the installed module files for
+ \verb|A|? Currently, the installed package database organizes
+ these files at the package level, so it is not clear where these
+ files should live---we might even accidentally duplicate them!
+ Similarly, when I typecheck, I need to ensure that the original
+ name for \verb|x:A| and \verb|y:A| are the same: thus, the
+ PackageId I assign cannot be \verb|x| or \verb|y|.
+
+ \item When I create a \verb|libHSx.so| and a \verb|libHSy.so|, which
+ dynamic library contains the object files for \verb|A|? Either
+ of these dynamic libraries could be used alone, so both must
+ contain \verb|A|. But an application could also use both at the
+ same time, at which point a program will see two copies of the
+ program text for \verb|A|.
+\end{enumerate}
+
+The first problem can be solved by ``flattening'' the package database,
+so that instead of organizing object files by library, the database
+is just a directory of physical module identities to objects/interfaces.
+Instaled packages are now represented as (possibly overlapping) sets over
+this store of modules. However, the second problem is far more persistent:
+if a package is a dynamic library, we still are unable to get the sharing of
+object files necessary. (This problem isn't present for static linking, where
+one doesn't actually have to link against \verb|libHSx.a|, the object files
+are just fine.)
+
+In the next few sections, we describe a few different designs for Backpack which
+vary the granularity of packages. In general, the coarser the granularity,
+the closer to the current package-library correspondence the design is, but
+at the cost of giving up more sharing. Finer granularity requires smaller
+and smaller corresponding libraries. We'll use the following running example.
+
+\begin{verbatim}
+package p where
+ A :: [ a :: Bool ]
+ B = [ import A; b = a ]
+ C = [ c = True ]
+
+package q where
+ A = [ a = False ]
+ include p
+ D = [ import C; d = c ]
+ E = [ import C; e = c ]
+\end{verbatim}
+
+As a notational convenience, we'll omit the full physical identity of a
+module when it's clear from context. We'll start by recapping the
+finest granularity, and coarsen.
+
+\subsection{One library per physical module identity}
+
+In this world, we generate a library per physical module identity.
+Briefly, the physical module identities for our running example
+are \verb|q:A|, \verb|p:B(q:A)|, \verb|p:C|, \verb|q:D(p:C)| and \verb|q:E(p:C)|.
+This results in the following libraries (and their respective
+dependencies): \\
+
+\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,node distance=6cm,
+ thick,m/.style={rectangle,draw,font=\sffamily\Large\bfseries}]
+ \node[m] (1) {libHSq:A.so};
+ \node[m] (2) [left of=1] {libHSp:B.so};
+ \node[m] (3) [below=0.3cm of 1] {libHSp:C.so};
+ \node[m] (4) [left of=3] {libHSq:D.so};
+ \node[m] (5) [below=0.3cm of 4] {libHSq:E.so};
+ \path
+ (2) edge node {} (1)
+ (4) edge node {} (3)
+ (5) edge node {} (3)
+ ;
+\end{tikzpicture}
+
+Notice that there is no ``ordering'' that the packages could have been
+built in; rather, we walk the package tree, building modules as we encounter
+them (thus, the instantiated package \verb|p| is fully built before \verb|q|).
+
+\subsection{One library per package identity}
+
+In this world, we simplify physical module identities in the following
+way: we the module names and only consider the package components of
+physical module identities to identify what library a package should
+live in. In the running example, the simplified physical module identities
+are as follows: \verb|q|, \verb|p(q)|, \verb|p| and \verb|q(p)|. This
+results in a very similar module graph: \\
+
+\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,node distance=6cm,
+ thick,m/.style={rectangle,draw,font=\sffamily\Large\bfseries}]
+ \node[m] (1) {libHSq.so (A)};
+ \node[m] (2) [left of=1] {libHSp(q).so (B)};
+ \node[m] (3) [below=0.5cm] {libHSp.so (C)};
+ \node[m] (4) [left of=3] {libHSq(p).so (D,E)};
+ \path
+ (2) edge node [right] {} (1)
+ (4) edge node [right] {} (3);
+\end{tikzpicture}
+
+The primary difference seen here is that modules \verb|q:D(p:C)| and \verb|q:E(p:C)| have
+been placed in the same library, \verb|libHSq(p).so|, since they both erase
+to \verb|q(p)|.
+
+This scheme preserves all of the type-level sharing properties that Backpack
+requires, since modules with identical physical identities are guaranteed
+to be placed into the same library. However, there are a few cases where
+one might need to add extra code to a library ``after the fact'', usually
+when renaming is involved:
+
+\begin{verbatim}
+package p where
+ A :: [ a :: Bool ]
+ B = [ import A; b = a ]
+package q where
+ A1 = [ a = True ]
+ A2 = [ a = False ]
+package linker1 where
+ include q
+ include p (A as A1, B as B1)
+\end{verbatim}
+
+Installing \verb|linker1| results in libraries \verb|libHSq.so| (A1 and A2)
+as well as \verb|libHSp(q).so| (physical identity \verb|p:B(q:A1)|---note the 1).
+
+However, now, suppose we install this package:
+
+\begin{verbatim}
+package linker2 where
+ include q
+ include p (A as A2, B as B2)
+\end{verbatim}
+
+We get a new module B with physical identity \verb|p:B(q:A2)|.
+Unfortunately, this erases to \verb|p(q)|, so we are obligated to also
+place it in \verb|libHSq(p).so|.
+
+\subsection{One library per package identity (more aggressive)}
+
+(ToDo: Remember what this did)
+
+\subsection{One library per definite package}
+
+\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,node distance=6cm,
+ thick,m/.style={rectangle,draw,font=\sffamily\Large\bfseries}]
+ \node[m] (1) {libHSq.so (A,B,C,D,E)};
+\end{tikzpicture}
+
+In this world, we create a dynamic library per definite package (package with
+no holes). Indefinite packages don't get compiled into libraries, the code
+is duplicated and type equality is only seen if a type came from the same
+definite package. In the running example, we only generate \verb|libHSq.so|
+which exports all of the modules (\verb|p| is indefinite), and if another
+package instantiates \verb|p|, the instances of \verb|C| will be considered
+different.
+
+(ToDo: Why is this world problematic?)
\paragraph{Operating system linking} When the package database is flattened
into a collection of modules, it becomes less clear how to generate library
@@ -519,7 +666,7 @@ Backpack semantics, and rules out modularization without splitting a package
into multiple packages. Maybe Scott can give other reasons why this
would not be so good. SPJ is quite keen on this plan.
-\subsection{Exposed modules should allow external modules}\label{sec:reexport}
+\section{Exposed modules should allow external modules}\label{sec:reexport}
In Backpack, the definition of a package consists of a logical context,
which maps logical module names to physical module names. These do not