summaryrefslogtreecommitdiff
path: root/doc
diff options
context:
space:
mode:
authorSanjay Ghemawat <sanjay@google.com>2012-04-17 08:36:46 -0700
committerSanjay Ghemawat <sanjay@google.com>2012-04-17 08:36:46 -0700
commit85584d497e7b354853b72f450683d59fcf6b9c5c (patch)
treeb6aad4f973f87487c3caa5600e7596219c79f645 /doc
parentbc1ee4d25e09b04e074db330a41f54ef4af0e31b (diff)
downloadleveldb-85584d497e7b354853b72f450683d59fcf6b9c5c.tar.gz
Added bloom filter support.v1.4
In particular, we add a new FilterPolicy class. An instance of this class can be supplied in Options when opening a database. If supplied, the instance is used to generate summaries of keys (e.g., a bloom filter) which are placed in sstables. These summaries are consulted by DB::Get() so we can avoid reading sstable blocks that are guaranteed to not contain the key we are looking for. This change provides one implementation of FilterPolicy based on bloom filters. Other changes: - Updated version number to 1.4. - Some build tweaks. - C binding for CompactRange. - A few more benchmarks: deleteseq, deleterandom, readmissing, seekrandom. - Minor .gitignore update.
Diffstat (limited to 'doc')
-rw-r--r--doc/index.html63
-rw-r--r--doc/table_format.txt41
2 files changed, 104 insertions, 0 deletions
diff --git a/doc/index.html b/doc/index.html
index 472f7cd..521d2ba 100644
--- a/doc/index.html
+++ b/doc/index.html
@@ -400,6 +400,69 @@ We might want to prefix <code>filename</code> keys with one letter (say '/') and
over just the metadata do not force us to fetch and cache bulky file
contents.
<p>
+<h2>Filters</h2>
+<p>
+Because of the way <code>leveldb</code> data is organized on disk,
+a single <code>Get()</code> call may involve multiple reads from disk.
+The optional <code>FilterPolicy</code> mechanism can be used to reduce
+the number of disk reads substantially.
+<pre>
+ leveldb::Options options;
+ options.filter_policy = NewBloomFilter(10);
+ leveldb::DB* db;
+ leveldb::DB::Open(options, "/tmp/testdb", &amp;db);
+ ... use the database ...
+ delete db;
+ delete options.filter_policy;
+</pre>
+The preceding code associates a
+<a href="http://en.wikipedia.org/wiki/Bloom_filter">Bloom filter</a>
+based filtering policy with the database. Bloom filter based
+filtering relies on keeping some number of bits of data in memory per
+key (in this case 10 bits per key since that is the argument we passed
+to NewBloomFilter). This filter will reduce the number of unnecessary
+disk reads needed for <code>Get()</code> calls by a factor of
+approximately a 100. Increasing the bits per key will lead to a
+larger reduction at the cost of more memory usage. We recommend that
+applications whose working set does not fit in memory and that do a
+lot of random reads set a filter policy.
+<p>
+If you are using a custom comparator, you should ensure that the filter
+policy you are using is compatible with your comparator. For example,
+consider a comparator that ignores trailing spaces when comparing keys.
+<code>NewBloomFilter</code> must not be used with such a comparator.
+Instead, the application should provide a custom filter policy that
+also ignores trailing spaces. For example:
+<pre>
+ class CustomFilterPolicy : public leveldb::FilterPolicy {
+ private:
+ FilterPolicy* builtin_policy_;
+ public:
+ CustomFilterPolicy() : builtin_policy_(NewBloomFilter(10)) { }
+ ~CustomFilterPolicy() { delete builtin_policy_; }
+
+ const char* Name() const { return "IgnoreTrailingSpacesFilter"; }
+
+ void CreateFilter(const Slice* keys, int n, std::string* dst) const {
+ // Use builtin bloom filter code after removing trailing spaces
+ std::vector&lt;Slice&gt; trimmed(n);
+ for (int i = 0; i &lt; n; i++) {
+ trimmed[i] = RemoveTrailingSpaces(keys[i]);
+ }
+ return builtin_policy_-&gt;CreateFilter(&amp;trimmed[i], n, dst);
+ }
+
+ bool KeyMayMatch(const Slice& key, const Slice& filter) const {
+ // Use builtin bloom filter code after removing trailing spaces
+ return builtin_policy_-&gt;KeyMayMatch(RemoveTrailingSpaces(key), filter);
+ }
+ };
+</pre>
+<p>
+Advanced applications may provide a filter policy that does not use
+a bloom filter but uses some other mechanism for summarizing a set
+of keys. See <code>leveldb/filter_policy.h</code> for detail.
+<p>
<h1>Checksums</h1>
<p>
<code>leveldb</code> associates checksums with all data it stores in the file system.
diff --git a/doc/table_format.txt b/doc/table_format.txt
index ad5aa4b..d0f3065 100644
--- a/doc/table_format.txt
+++ b/doc/table_format.txt
@@ -47,6 +47,47 @@ the BlockHandle of the metaindex and index blocks as well as a magic number.
// (40==2*BlockHandle::kMaxEncodedLength)
magic: fixed64; // == 0xdb4775248b80fb57
+"filter" Meta Block
+-------------------
+
+If a "FilterPolicy" was specified when the database was opened, a
+filter block is stored in each table. The "metaindex" block contains
+an entry that maps from "filter.<N>" to the BlockHandle for the filter
+block where "<N>" is the string returned by the filter policy's
+"Name()" method.
+
+The filter block stores a sequence of filters, where filter i contains
+the output of FilterPolicy::CreateFilter() on all keys that are stored
+in a block whose file offset falls within the range
+
+ [ i*base ... (i+1)*base-1 ]
+
+Currently, "base" is 2KB. So for example, if blocks X and Y start in
+the range [ 0KB .. 2KB-1 ], all of the keys in X and Y will be
+converted to a filter by calling FilterPolicy::CreateFilter(), and the
+resulting filter will be stored as the first filter in the filter
+block.
+
+The filter block is formatted as follows:
+
+ [filter 0]
+ [filter 1]
+ [filter 2]
+ ...
+ [filter N-1]
+
+ [offset of filter 0] : 4 bytes
+ [offset of filter 1] : 4 bytes
+ [offset of filter 2] : 4 bytes
+ ...
+ [offset of filter N-1] : 4 bytes
+
+ [offset of beginning of offset array] : 4 bytes
+ lg(base) : 1 byte
+
+The offset array at the end of the filter block allows efficient
+mapping from a data block offset to the corresponding filter.
+
"stats" Meta Block
------------------