From 85584d497e7b354853b72f450683d59fcf6b9c5c Mon Sep 17 00:00:00 2001 From: Sanjay Ghemawat Date: Tue, 17 Apr 2012 08:36:46 -0700 Subject: Added bloom filter support. In particular, we add a new FilterPolicy class. An instance of this class can be supplied in Options when opening a database. If supplied, the instance is used to generate summaries of keys (e.g., a bloom filter) which are placed in sstables. These summaries are consulted by DB::Get() so we can avoid reading sstable blocks that are guaranteed to not contain the key we are looking for. This change provides one implementation of FilterPolicy based on bloom filters. Other changes: - Updated version number to 1.4. - Some build tweaks. - C binding for CompactRange. - A few more benchmarks: deleteseq, deleterandom, readmissing, seekrandom. - Minor .gitignore update. --- doc/index.html | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++ doc/table_format.txt | 41 ++++++++++++++++++++++++++++++++++ 2 files changed, 104 insertions(+) (limited to 'doc') diff --git a/doc/index.html b/doc/index.html index 472f7cd..521d2ba 100644 --- a/doc/index.html +++ b/doc/index.html @@ -400,6 +400,69 @@ We might want to prefix filename keys with one letter (say '/') and over just the metadata do not force us to fetch and cache bulky file contents.

+

Filters

+

+Because of the way leveldb data is organized on disk, +a single Get() call may involve multiple reads from disk. +The optional FilterPolicy mechanism can be used to reduce +the number of disk reads substantially. +

+   leveldb::Options options;
+   options.filter_policy = NewBloomFilter(10);
+   leveldb::DB* db;
+   leveldb::DB::Open(options, "/tmp/testdb", &db);
+   ... use the database ...
+   delete db;
+   delete options.filter_policy;
+
+The preceding code associates a +Bloom filter +based filtering policy with the database. Bloom filter based +filtering relies on keeping some number of bits of data in memory per +key (in this case 10 bits per key since that is the argument we passed +to NewBloomFilter). This filter will reduce the number of unnecessary +disk reads needed for Get() calls by a factor of +approximately a 100. Increasing the bits per key will lead to a +larger reduction at the cost of more memory usage. We recommend that +applications whose working set does not fit in memory and that do a +lot of random reads set a filter policy. +

+If you are using a custom comparator, you should ensure that the filter +policy you are using is compatible with your comparator. For example, +consider a comparator that ignores trailing spaces when comparing keys. +NewBloomFilter must not be used with such a comparator. +Instead, the application should provide a custom filter policy that +also ignores trailing spaces. For example: +

+  class CustomFilterPolicy : public leveldb::FilterPolicy {
+   private:
+    FilterPolicy* builtin_policy_;
+   public:
+    CustomFilterPolicy() : builtin_policy_(NewBloomFilter(10)) { }
+    ~CustomFilterPolicy() { delete builtin_policy_; }
+
+    const char* Name() const { return "IgnoreTrailingSpacesFilter"; }
+
+    void CreateFilter(const Slice* keys, int n, std::string* dst) const {
+      // Use builtin bloom filter code after removing trailing spaces
+      std::vector<Slice> trimmed(n);
+      for (int i = 0; i < n; i++) {
+        trimmed[i] = RemoveTrailingSpaces(keys[i]);
+      }
+      return builtin_policy_->CreateFilter(&trimmed[i], n, dst);
+    }
+
+    bool KeyMayMatch(const Slice& key, const Slice& filter) const {
+      // Use builtin bloom filter code after removing trailing spaces
+      return builtin_policy_->KeyMayMatch(RemoveTrailingSpaces(key), filter);
+    }
+  };
+
+

+Advanced applications may provide a filter policy that does not use +a bloom filter but uses some other mechanism for summarizing a set +of keys. See leveldb/filter_policy.h for detail. +

Checksums

leveldb associates checksums with all data it stores in the file system. diff --git a/doc/table_format.txt b/doc/table_format.txt index ad5aa4b..d0f3065 100644 --- a/doc/table_format.txt +++ b/doc/table_format.txt @@ -47,6 +47,47 @@ the BlockHandle of the metaindex and index blocks as well as a magic number. // (40==2*BlockHandle::kMaxEncodedLength) magic: fixed64; // == 0xdb4775248b80fb57 +"filter" Meta Block +------------------- + +If a "FilterPolicy" was specified when the database was opened, a +filter block is stored in each table. The "metaindex" block contains +an entry that maps from "filter." to the BlockHandle for the filter +block where "" is the string returned by the filter policy's +"Name()" method. + +The filter block stores a sequence of filters, where filter i contains +the output of FilterPolicy::CreateFilter() on all keys that are stored +in a block whose file offset falls within the range + + [ i*base ... (i+1)*base-1 ] + +Currently, "base" is 2KB. So for example, if blocks X and Y start in +the range [ 0KB .. 2KB-1 ], all of the keys in X and Y will be +converted to a filter by calling FilterPolicy::CreateFilter(), and the +resulting filter will be stored as the first filter in the filter +block. + +The filter block is formatted as follows: + + [filter 0] + [filter 1] + [filter 2] + ... + [filter N-1] + + [offset of filter 0] : 4 bytes + [offset of filter 1] : 4 bytes + [offset of filter 2] : 4 bytes + ... + [offset of filter N-1] : 4 bytes + + [offset of beginning of offset array] : 4 bytes + lg(base) : 1 byte + +The offset array at the end of the filter block allows efficient +mapping from a data block offset to the corresponding filter. + "stats" Meta Block ------------------ -- cgit v1.2.1