1 files changed, 0 insertions, 228 deletions
diff --git a/doc/impl.html b/doc/impl.html
deleted file mode 100644
index b190d2c..0000000
--- a/doc/impl.html
+++ /dev/null
@@ -1,228 +0,0 @@
-<!DOCTYPE html>
-<html>
-<head>
-<link rel="stylesheet" type="text/css" href="doc.css" />
-<title>Leveldb file layout and compactions</title>
-</head>
-
-<body>
-
-<h1>Files</h1>
-
-The implementation of leveldb is similar in spirit to the
-representation of a single
-<a href="http://labs.google.com/papers/bigtable.html">
-Bigtable tablet (section 5.3)</a>.
-However the organization of the files that make up the representation
-is somewhat different and is explained below.
-
-<p>
-Each database is represented by a set of file stored in a directory.
-There are several different types of files as documented below:
-<p>
-<h2>Log files</h2>
-<p>
-A log file (*.log) stores a sequence of recent updates.  Each update
-is appended to the current log file.  When the log file reaches a
-pre-determined size (approximately 1MB by default), it is converted
-to a sorted table (see below) and a new log file is created for future
-updates.
-<p>
-A copy of the current log file is kept in an in-memory structure (the
-<code>memtable</code>).  This copy is consulted on every read so that read
-operations reflect all logged updates.
-<p>
-<h2>Sorted tables</h2>
-<p>
-A sorted table (*.sst) stores a sequence of entries sorted by key.
-Each entry is either a value for the key, or a deletion marker for the
-key.  (Deletion markers are kept around to hide obsolete values
-present in older sorted tables).
-<p>
-The set of sorted tables are organized into a sequence of levels.  The
-sorted table generated from a log file is placed in a special <code>young</code>
-level (also called level-0).  When the number of young files exceeds a
-certain threshold (currently four), all of the young files are merged
-together with all of the overlapping level-1 files to produce a
-sequence of new level-1 files (we create a new level-1 file for every
-2MB of data.)
-<p>
-Files in the young level may contain overlapping keys.  However files
-in other levels have distinct non-overlapping key ranges.  Consider
-level number L where L >= 1.  When the combined size of files in
-level-L exceeds (10^L) MB (i.e., 10MB for level-1, 100MB for level-2,
-...), one file in level-L, and all of the overlapping files in
-level-(L+1) are merged to form a set of new files for level-(L+1).
-These merges have the effect of gradually migrating new updates from
-the young level to the largest level using only bulk reads and writes
-(i.e., minimizing expensive seeks).
-
-<h2>Large value files</h2>
-<p>
-Each large value (greater than 64KB by default) is placed in a large
-value file (*.val) of its own.  An entry is maintained in the log
-and/or sorted tables that maps from the corresponding key to the
-name of this large value file.  The name of the large value file
-is derived from a SHA1 hash of the value and its length so that
-identical values share the same file.
-<p>
-<h2>Manifest</h2>
-<p>
-A MANIFEST file lists the set of sorted tables that make up each
-level, the corresponding key ranges, and other important metadata.
-A new MANIFEST file (with a new number embedded in the file name)
-is created whenever the database is reopened.  The MANIFEST file is
-formatted as a log, and changes made to the serving state (as files
-are added or removed) are appended to this log.
-<p>
-<h2>Current</h2>
-<p>
-CURRENT is a simple text file that contains the name of the latest
-MANIFEST file.
-<p>
-<h2>Info logs</h2>
-<p>
-Informational messages are printed to files named LOG and LOG.old.
-<p>
-<h2>Others</h2>
-<p>
-Other files used for miscellaneous purposes may also be present
-(LOCK, *.dbtmp).
-
-<h1>Level 0</h1>
-When the log file grows above a certain size (1MB by default):
-<ul>
-<li>Write the contents of the current memtable to an sstable
-<li>Replace the current memtable by a brand new empty memtable
-<li>Switch to a new log file
-<li>Delete the old log file and the old memtable
-</ul>
-Experimental measurements show that generating an sstable from a 1MB
-log file takes ~12ms, which seems like an acceptable latency hiccup to
-add infrequently to a log write.
-
-<p>
-The new sstable is added to a special level-0 level.  level-0 contains
-a set of files (up to 4 by default).  However unlike other levels,
-these files do not cover disjoint ranges, but may overlap each other.
-
-<h1>Compactions</h1>
-
-<p>
-When the size of level L exceeds its limit, we compact it in a
-background thread.  The compaction picks a file from level L and all
-overlapping files from the next level L+1.  Note that if a level-L
-file overlaps only part of a level-(L+1) file, the entire file at
-level-(L+1) is used as an input to the compaction and will be
-discarded after the compaction.  Aside: because level-0 is special
-(files in it may overlap each other), we treat compactions from
-level-0 to level-1 specially: a level-0 compaction may pick more than
-one level-0 file in case some of these files overlap each other.
-
-<p>
-A compaction merges the contents of the picked files to produce a
-sequence of level-(L+1) files.  We switch to producing a new
-level-(L+1) file after the current output file has reached the target
-file size (2MB).  We also switch to a new output file when the key
-range of the current output file has grown enough to overlap more then
-ten level-(L+2) files.  This last rule ensures that a later compaction
-of a level-(L+1) file will not pick up too much data from level-(L+2).
-
-<p>
-The old files are discarded and the new files are added to the serving
-state.
-
-<p>
-Compactions for a particular level rotate through the key space.  In
-more detail, for each level L, we remember the ending key of the last
-compaction at level L.  The next compaction for level L will pick the
-first file that starts after this key (wrapping around to the
-beginning of the key space if there is no such file).
-
-<p>
-Compactions drop overwritten values.  They also drop deletion markers
-if there are no higher numbered levels that contain a file whose range
-overlaps the current key.
-
-<h2>Timing</h2>
-
-Level-0 compactions will read up to four 1MB files from level-0, and
-at worst all the level-1 files (10MB).  I.e., we will read 14MB and
-write 14MB.
-
-<p>
-Other than the special level-0 compactions, we will pick one 2MB file
-from level L.  In the worst case, this will overlap ~ 12 files from
-level L+1 (10 because level-(L+1) is ten times the size of level-L,
-and another two at the boundaries since the file ranges at level-L
-will usually not be aligned with the file ranges at level-L+1).  The
-compaction will therefore read 26MB and write 26MB.  Assuming a disk
-IO rate of 100MB/s (ballpark range for modern drives), the worst
-compaction cost will be approximately 0.5 second.
-
-<p>
-If we throttle the background writing to something small, say 10% of
-the full 100MB/s speed, a compaction may take up to 5 seconds.  If the
-user is writing at 10MB/s, we might build up lots of level-0 files
-(~50 to hold the 5*10MB).  This may signficantly increase the cost of
-reads due to the overhead of merging more files together on every
-read.
-
-<p>
-Solution 1: To reduce this problem, we might want to increase the log
-switching threshold when the number of level-0 files is large.  Though
-the downside is that the larger this threshold, the larger the delay
-that we will add to write latency when a write triggers a log switch.
-
-<p>
-Solution 2: We might want to decrease write rate artificially when the
-number of level-0 files goes up.
-
-<p>
-Solution 3: We work on reducing the cost of very wide merges.
-Perhaps most of the level-0 files will have their blocks sitting
-uncompressed in the cache and we will only need to worry about the
-O(N) complexity in the merging iterator.
-
-<h2>Number of files</h2>
-
-Instead of always making 2MB files, we could make larger files for
-larger levels to reduce the total file count, though at the expense of
-more bursty compactions.  Alternatively, we could shard the set of
-files into multiple directories.
-
-<p>
-An experiment on an <code>ext3</code> filesystem on Feb 04, 2011 shows
-the following timings to do 100K file opens in directories with
-varying number of files:
-<table class="datatable">
-<tr><th>Files in directory</th><th>Microseconds to open a file</th></tr>
-<tr><td>1000</td><td>9</td>
-<tr><td>10000</td><td>10</td>
-<tr><td>100000</td><td>16</td>
-</table>
-So maybe even the sharding is not necessary on modern filesystems?
-
-<h1>Recovery</h1>
-
-<ul>
-<li> Read CURRENT to find name of the latest committed MANIFEST
-<li> Read the named MANIFEST file
-<li> Clean up stale files
-<li> We could open all sstables here, but it is probably better to be lazy...
-<li> Convert log chunk to a new level-0 sstable
-<li> Start directing new writes to a new log file with recovered sequence#
-</ul>
-
-<h1>Garbage collection of files</h1>
-
-<code>DeleteObsoleteFiles()</code> is called at the end of every
-compaction and at the end of recovery.  It finds the names of all
-files in the database.  It deletes all log files that are not the
-current log file.  It deletes all table files that are not referenced
-from some level and are not the output of an active compaction.  It
-deletes all large value files that are not referenced from any live
-table or log file.
-
-</body>
-</html>