summaryrefslogtreecommitdiff
path: root/docs
diff options
context:
space:
mode:
authorEdward Z. Yang <ezyang@cs.stanford.edu>2014-06-18 22:59:23 +0100
committerEdward Z. Yang <ezyang@cs.stanford.edu>2014-06-18 22:59:23 +0100
commit2a41db332d9738198418513fe99abb04a0619928 (patch)
treeb9b7b116e1dce252e53886f2423b973aaf760323 /docs
parent2ba1a560a92f4de0938d03c48b0f2e00e382d6b6 (diff)
downloadhaskell-2a41db332d9738198418513fe99abb04a0619928.tar.gz
In progress Backpack implementation docs.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
Diffstat (limited to 'docs')
-rw-r--r--docs/backpack/.gitignore10
-rw-r--r--docs/backpack/Makefile2
-rw-r--r--docs/backpack/arch.pngbin0 -> 107562 bytes
-rw-r--r--docs/backpack/backpack-impl.bib17
-rw-r--r--docs/backpack/backpack-impl.tex298
-rw-r--r--docs/backpack/diagrams.pdfbin0 -> 145951 bytes
-rw-r--r--docs/backpack/diagrams.xojbin0 -> 141272 bytes
-rw-r--r--docs/backpack/pkgdb.pngbin0 -> 96706 bytes
8 files changed, 327 insertions, 0 deletions
diff --git a/docs/backpack/.gitignore b/docs/backpack/.gitignore
new file mode 100644
index 0000000000..c3eb46ecd6
--- /dev/null
+++ b/docs/backpack/.gitignore
@@ -0,0 +1,10 @@
+*.aux
+*.bak
+*.bbl
+*.blg
+*.dvi
+*.fdb_latexmk
+*.fls
+*.log
+*.synctex.gz
+backpack-impl.pdf
diff --git a/docs/backpack/Makefile b/docs/backpack/Makefile
new file mode 100644
index 0000000000..0dd7a9dad5
--- /dev/null
+++ b/docs/backpack/Makefile
@@ -0,0 +1,2 @@
+backpack-impl.pdf: backpack-impl.tex
+ latexmk -pdf -latexoption=-halt-on-error -latexoption=-file-line-error -latexoption=-synctex=1 backpack-impl.tex && touch paper.dvi || ! rm -f $@
diff --git a/docs/backpack/arch.png b/docs/backpack/arch.png
new file mode 100644
index 0000000000..d8b8fd21f9
--- /dev/null
+++ b/docs/backpack/arch.png
Binary files differ
diff --git a/docs/backpack/backpack-impl.bib b/docs/backpack/backpack-impl.bib
new file mode 100644
index 0000000000..6bda35a8ea
--- /dev/null
+++ b/docs/backpack/backpack-impl.bib
@@ -0,0 +1,17 @@
+@inproceedings{Kilpatrick:2014:BRH:2535838.2535884,
+ author = {Kilpatrick, Scott and Dreyer, Derek and Peyton Jones, Simon and Marlow, Simon},
+ title = {Backpack: Retrofitting Haskell with Interfaces},
+ booktitle = {Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages},
+ series = {POPL '14},
+ year = {2014},
+ isbn = {978-1-4503-2544-8},
+ location = {San Diego, California, USA},
+ pages = {19--31},
+ numpages = {13},
+ url = {http://doi.acm.org/10.1145/2535838.2535884},
+ doi = {10.1145/2535838.2535884},
+ acmid = {2535884},
+ publisher = {ACM},
+ address = {New York, NY, USA},
+ keywords = {applicative instantiation, haskell modules, mixin modules, module systems, packages, recursive modules, separate modular development, type systems},
+}
diff --git a/docs/backpack/backpack-impl.tex b/docs/backpack/backpack-impl.tex
new file mode 100644
index 0000000000..14942ce92c
--- /dev/null
+++ b/docs/backpack/backpack-impl.tex
@@ -0,0 +1,298 @@
+\documentclass{article}
+
+\usepackage{graphicx} %[pdftex] OR [dvips]
+\usepackage{fullpage}
+\usepackage{float}
+\usepackage{titling}
+\setlength{\droptitle}{-6em}
+
+\newcommand{\ghcfile}[1]{\textsl{#1}}
+
+\title{Implementing Backpack}
+
+\begin{document}
+
+\maketitle
+
+The purpose of this document is to describe an implementation path
+for Backpack~\cite{Kilpatrick:2014:BRH:2535838.2535884} in GHC\@.
+
+We start off by outlining the current architecture of GHC, ghc-pkg and Cabal,
+which constitute the existing packaging system. We then state what our subgoals
+are, since there are many similar sounding but different problems to solve. Next,
+we describe the ``probably correct'' implementation plan, and finish off with
+some open design questions. This is intended to be an evolving design document,
+so please contribute!
+
+\section{Current packaging architecture}
+
+The overall architecture is described in Figure~\ref{fig:arch} (ignore
+the red and green bits for now).
+
+\begin{figure}[H]
+ \center{\scalebox{0.8}{\includegraphics{arch.png}}}
+\label{fig:arch}\caption{Architecture of GHC, ghc-pkg and Cabal. Green bits indicate additions from upcoming IHG work, red bits indicate additions from Backpack. Orange indicates a Haskell library.}
+\end{figure}
+
+Here, arrows indicate dependencies from one component to another.
+(insert architectural description here)
+
+A particularly important component of this picture is the package
+database, sketched in Figure~\ref{fig:pkgdb}.
+
+\begin{figure}[H]
+ \center{\scalebox{0.8}{\includegraphics{pkgdb.png}}}
+\label{fig:pkgdb}\caption{Anatomy of a package database and a package identifier.}
+\end{figure}
+
+An installed package is calculated from a Cabal file through the process
+of dependency resolution and compilation. We can think of it a
+database, whose primary key is the InstalledPackageId, which presently
+is the package name, the package version, and the ABI hash (nota bene,
+the diagram disagrees with this text: it shows the version of installed
+package IDs which we'd like to move towards.) These IDs uniquely
+identify an instance of an installed package. A mere PackageId omits
+the ABI hash.
+
+The database entry itself contains the information from the installed package ID,
+as well as information such as what dependencies it was linked against, where
+its compiled code and interface files live, its compilation flags, what modules
+it exposes, etc. Much of this information is only relevant to Cabal; GHC
+uses a subset of the information in the package database.
+
+\section{Goals}
+
+There are actually a number of different goals we have for modifying the
+packaging system.
+
+\begin{itemize}
+ \item Support multiple instances of containers-2.9 \emph{in the
+ package database}. These instances may be compiled against
+ different dependencies, have the same dependencies but different
+ source files (as when a package is being developed), or be
+ compiled with different options. It is less important to allow
+ these instances to be linkable together.
+
+ \item Some easy-to-implement subset of the functionality provided by
+ packages with holes (Backpack proper).
+\end{itemize}
+
+A lower priority goal is to actually allow multiple instances of
+containers-2.9 to be linked together in the same executable
+program.\footnote{In particular, this requires changes to how linker symbols
+are assigned. However, this feature is important to implement a number
+of Backpack features.}
+
+A \emph{non-goal} is to allow users to upgrade upstream libraries
+without recompiling downstream. This is an ABI concern and we're not
+going to worry about it.
+
+\section{Aside: Recent IHG work}
+
+The IHG project has allocated some funds to relax the package instance
+constraint in the package database, so that multiple instances can be
+stored, but now the user of GHC must explicitly list package-IDs to be
+linked against. In the far future, it would be expected that tools like
+Cabal automatically handle instance selection among a large number of
+instances, but this is subtle and so this work is only to do some
+foundational work, allowing a package database to optionally relax the
+unique package-version requirement, and utilize environment files to
+select which packages should be used. See Duncan's email for more
+details on the proposal.
+
+For the purpose of Backpack, the only relevant part of this proposal
+is the relaxation of package databases to allow multiple
+
+\section{Adding Backpack to GHC}
+
+Backpack additions are described in red in the architectural diagrams.
+The current structure of this section is to describe the additions bottom up.
+
+\subsection{Use InstalledPackageId instead PackageId in typechecking}
+
+In Backpack, there needs to be some mechanism for assigning
+\emph{physical module identities} to modules, which are essential for
+typechecking Backpack packages, since they let us tell if two types are
+equal or not. In the paper, the physical identity was represented as the
+package that constructed it as well as some representation of the module
+source. We can simplify this slightly: in current Cabal packages, we
+require that modules always be given a package-unique logical name;
+thus, physical identities can be simply represented as a PackageId plus
+module name. (See \ghcfile{compiler/basicTypes/Module.lhs:Module})
+
+However, this is not enough if we allow multiple instances of a package
+in the package database: now we may incorrectly conclude that types
+defined by different instances of the same version of a package are
+equal: this would be especially fatal if the two packages were linked
+against different underlying libraries. Thus, a physical module name
+should be represented as an InstalledPackageId (which uniquely
+identifies an installed package) as well as the original logical name.
+
+\paragraph{Note about linker symbols} Module is currently used for
+typechecking, and then once again
+
+Currently, InstalledPackageId is constructed of a package, version and ABI
+hash. The use of an ABI hash is a bit of hack, mostly to make sure these
+installed package IDs are unique. In Figure~\ref{fig:pkgdb}, an alternate
+logical representation of InstalledPackageId is suggested using
+
+----
+
+Simon gave some nice explanations of original names in GHC
+
+Sometimes, a name can be exposed in an hi file even if its module
+wasn't exposed. Example in package R:
+
+ module X where
+ import Internal (f)
+ g = f
+
+ module Internal where
+ import Internal.Total (f)
+
+Then in X.hi:
+
+ g = <R.id, Internal.Total, f> (this is the original name)
+
+(The reason we refer to the package as R.id is because it's the
+full package ID, and not just R).
+
+How might internal names work with Backpack?
+
+ package P where
+ M = ...
+ N = ...
+ package Q (M, R, T)
+ include P (N -> R)
+ T = ...
+
+ Q; exposed modules
+ M -> <P.id, M>
+ R -> <P.id, N>
+ T -> <Q.id, T>
+
+When we compile Q, and the interface file gets generated, we have
+to generate identifiers for each of the exposed modules. These should
+be calculated to directly refer to the "original name" of each them;
+so for example M and R point directly to package P, but they also
+include the original name they had in the original definition.
+
+----
+
+Differing intuitions: GHC internals versus Backpack abstraction
+
+----
+
+Refactoring necessary:
+
+ - PackageId in GHC needs to be InstalledPackageId. I get these IDs
+ from -package-name when I build a package, and these are baked
+ into the hi files and linker names. To the type checker, this
+ IS exactly what a package is. But see open question about linker
+ names...
+
+ There appears to already be a conversion, probably a newtype,
+ from package name to linker names, according to Duncan.
+
+ THE PLAN: To remain BC, we have a flag named -package-name which
+ is used for both. So now I maintain two different values,
+ -package-name sets both, and then I have another flag for setting
+ one separately.
+
+ Watch out: GHC plays tricks with wired-in names. Suggested is a
+ table from wired names to package IDs (constructed with the
+ package environment)
+
+ghc-pkg
+
+ - Remove the "removal step" when registering a package (with a flag)
+
+ - Check main/Packages.lhs:mkPackagesState to look out for shadowing
+ within a database, it might already do the right thing (key idea
+ is that we already do something sensible merging package databases
+ together, reuse that)
+
+ - Experiment using -hide-all-packages -package-id ... flags explicitly
+
+\section{Open questions}
+
+Here are open problems about the implementation which still require
+hashing out.
+
+ - Aliasing of signatures means that it is no longer the case that
+ original name means type equality. We were not able to convince
+ Simon of any satisfactory resolution. Strawman proposal is to
+ extending original names to also be variables probably won't work
+ because it is so deeply wired, but it's difficult to construct hi
+ files so that everything works out (quadratic).
+
+ - Relationship between linker names and InstalledPackageId? The reason
+ the obvious thing to do is use all of InstalledPackageId for linker
+ name, but this breaks recompilation. So some of these things
+ should go in the linker name, and not others (yes package, yes
+ version, yes some compile flags (definitely ways), eventually
+ dependency package IDs, what about cabal build flags). This is
+ approximately an ABI hash, but it's computable before compilation.
+ This worries Simon because now there are two names, but maybe
+ the DB can solve that problem--unfortunately, GHC doesn't ever
+ register during compilation; only later.
+
+ Simon also thought we should use shorter names for linker
+ names and InstallPkgIds. This appears to be orthogonal.
+
+ - In this example:
+
+ package A where
+ A = [ ... ]
+ package A2 where
+ A2 = [ ... ]
+ package B (B)
+ ...
+
+ Do the seperate instantiations of B exist as seperate artifacts
+ in the database, or do they get constructed on the fly by
+ definite packages which include them? The former is conceptually
+ nicer (identity of types work, closer to Backpack semantics), but
+ the latter is easier for linker names. (Simon's first inclination
+ is to construct it on the fly.)
+
+ There was another example, showing that we need to solve this
+ problem even for indefinite combinations of indefinite modules.
+ You can get to it by modifying the earlier example so that C and
+ D still have holes, which E does not fill.
+
+ - We have to store the preprocessed sources for indefinite packages.
+ This is hard when we're constructing packages on the fly.
+
+ - What is the impact on dependency solving in Cabal? Old questions
+ of what to prefer when multiple package-versions are available
+ (Cabal originally only needed to solve this between different
+ versions of the same package, preferring the oldest version), but
+ with signatures, there are more choices. Should there be a
+ complex solver that does all signature solving, or a preprocessing
+ step that puts things back into the original Cabal version.
+ Authors may want to suggest policy for what packages should actually
+ link against signatures (so a crypto library doesn't accidentally
+ link against a null cipher package).
+
+\section{Immediate tasks}
+
+At this point in time, it seems we can identify the following set
+of non-controversial tasks which can be started immediately.
+
+\begin{itemize}
+ \item Relax the package database constraint to allow multiple
+ instances of package-version.
+ \item Propagate the use of \verb|InstalledPackageId| instead of
+ package IDs for typechecking (but not for linking, yet).
+ \item Implement export declarations in package format, so
+ packages can reexport modules from other packages.
+\end{itemize}
+
+The aliasing problem is probably the most important open problem
+blocking the rest of the system.
+
+\bibliographystyle{plain}
+\bibliography{backpack-impl}
+
+\end{document}
diff --git a/docs/backpack/diagrams.pdf b/docs/backpack/diagrams.pdf
new file mode 100644
index 0000000000..a50916b234
--- /dev/null
+++ b/docs/backpack/diagrams.pdf
Binary files differ
diff --git a/docs/backpack/diagrams.xoj b/docs/backpack/diagrams.xoj
new file mode 100644
index 0000000000..71d7c45166
--- /dev/null
+++ b/docs/backpack/diagrams.xoj
Binary files differ
diff --git a/docs/backpack/pkgdb.png b/docs/backpack/pkgdb.png
new file mode 100644
index 0000000000..c1cf9632cb
--- /dev/null
+++ b/docs/backpack/pkgdb.png
Binary files differ