author Garima Singh <garima.singh@microsoft.com> 2020-04-06 16:59:52 +0000
committer Junio C Hamano <gitster@pobox.com> 2020-04-06 11:08:37 -0700
commit a56b9464cd0a49317fafde080ae4e73c5430ac9b
tree 73a69c869e0dc5dabc5655940479510fafe09426
parent d38e07b8c44ffdb73e7eba1b7f6a73eb7eb0d5f9
revision.c: use Bloom filters to speed up path based revision walks
Revision walk will now use Bloom filters for commits to speed up revision walks for a particular path (for computing history for that path), if they are present in the commit-graph file.

We load the Bloom filters during the prepare_revision_walk step, currently only when dealing with a single pathspec. Extending it to work with multiple pathspecs can be explored and built on top of this series in the future.

While comparing trees in rev_compare_trees(), if the Bloom filter says that the file is not different between the two trees, we don't need to compute the expensive diff. This is where we get our performance gains. The other response of the Bloom filter is 'maybe', in which case we fall back to the full diff calculation to determine if the path was changed in the commit.

We do not try to use Bloom filters when the '--walk-reflogs' option is specified. The '--walk-reflogs' option does not walk the commit ancestry chain like the rest of the options. Incorporating the performance gains when walking reflog entries would add more complexity, and can be explored in a later series.

Performance Gains:
We tested the performance of `git log -- <path>` on the git repo, the linux and some internal large repos, with a variety of paths of varying depths.

On the git and linux repos:
- we observed a 2x to 5x speed up.

On a large internal repo with files seated 6-10 levels deep in the tree:
- we observed 10x to 20x speed ups, with some paths going up to 28 times faster.

Helped-by: Derrick Stolee <dstolee@microsoft.com>
Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Garima Singh <garima.singh@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
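The two possible filter answers described in the message ("definitely not changed" and "maybe changed") can be illustrated with a minimal sketch. This is not the code from revision.c or bloom.c: the 64-bit filter, the seeded FNV-1a hash, and every toy_* name are assumptions made for illustration, and the "full diff" is replaced by a scan over a list of changed paths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy filter: 64 bits, 2 probes. Not Git's on-disk format. */
struct toy_filter {
	uint64_t bits;
};

/* Seeded FNV-1a; a stand-in for whatever hash the real code uses. */
static uint32_t toy_hash(const char *s, uint32_t seed)
{
	uint32_t h = 2166136261u ^ seed;
	for (; *s; s++) {
		h ^= (unsigned char)*s;
		h *= 16777619u;
	}
	return h;
}

/* Bit mask of the two probe positions for a path. */
static uint64_t toy_mask(const char *path)
{
	return (UINT64_C(1) << (toy_hash(path, 0) % 64)) |
	       (UINT64_C(1) << (toy_hash(path, 1) % 64));
}

/* Record that a commit changed `path`. */
void toy_filter_add(struct toy_filter *f, const char *path)
{
	assert(f && path);
	f->bits |= toy_mask(path);
}

/* Returns 0 for "definitely not changed", 1 for "maybe changed". */
static int toy_filter_maybe_contains(const struct toy_filter *f,
				     const char *path)
{
	uint64_t need = toy_mask(path);
	return (f->bits & need) == need;
}

/* Stand-in for the expensive tree diff: scan the commit's changed paths. */
static int toy_full_diff(const char *const *changed, size_t n,
			 const char *path)
{
	for (size_t i = 0; i < n; i++)
		if (!strcmp(changed[i], path))
			return 1;
	return 0;
}

/*
 * Decide whether a commit touches `path`. A NULL filter means no Bloom
 * data is available, so the full diff always runs. `diffs_run` counts
 * how often the expensive path was taken.
 */
int toy_commit_touches_path(const struct toy_filter *f,
			    const char *const *changed, size_t n,
			    const char *path, int *diffs_run)
{
	if (f && !toy_filter_maybe_contains(f, path))
		return 0; /* definite "no": skip the diff entirely */
	(*diffs_run)++;   /* "maybe" (or no filter): fall back */
	return toy_full_diff(changed, n, path);
}
```

A filter can report "maybe" for a path the commit never changed (a false positive), which is why that answer still needs the diff; it never reports "no" for a path that was added to it, which is what makes skipping the diff safe.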
Diffstat (limited to 'revision.h')
-rw-r--r-- revision.h 11
1 file changed, 11 insertions, 0 deletions
diff --git a/revision.h b/revision.h
index 475f048fb6..7c026fe41f 100644
--- a/revision.h
+++ b/revision.h
@@ -56,6 +56,8 @@ struct repository;
struct rev_info;
struct string_list;
struct saved_parents;
+struct bloom_key;
+struct bloom_filter_settings;
define_shared_commit_slab(revision_sources, char *);
struct rev_cmdline_info {
@@ -291,6 +293,15 @@ struct rev_info {
struct revision_sources *sources;
struct topo_walk_info *topo_walk_info;
+
+ /* Commit graph bloom filter fields */
+ /* The bloom filter key for the pathspec */
+ struct bloom_key *bloom_key;
+ /*
+ * The bloom filter settings used to generate the key.
+ * This is loaded from the commit-graph being used.
+ */
+ struct bloom_filter_settings *bloom_filter_settings;
};
int ref_excluded(struct string_list *, const char *path);
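The commit message says filters are set up during the prepare step, only for a single pathspec, and never with '--walk-reflogs'. A minimal sketch of that gating follows, under the assumption that a NULL key pointer is what tells the rest of the walk not to consult filters. The struct, its fields, the hash, and the function names are all hypothetical stand-ins, not the real struct rev_info or the code added to revision.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a precomputed pathspec key. */
struct toy_key {
	unsigned int hash;
};

/* Hypothetical stand-in for the walk state; not the real struct rev_info. */
struct toy_walk {
	const char **pathspecs;     /* paths given after "--" */
	size_t nr_pathspecs;
	int walk_reflogs;           /* set when --walk-reflogs is given */
	int graph_has_filters;      /* commit-graph carries Bloom data */
	struct toy_key key_storage;
	struct toy_key *key;        /* NULL: do not consult filters */
};

/* Simple multiplicative string hash, used only for this illustration. */
static unsigned int sketch_hash(const char *s)
{
	unsigned int h = 5381;
	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h;
}

/*
 * Compute the key once, before the walk starts, and only when every
 * condition from the commit message holds. Otherwise leave key NULL
 * so each commit falls back to the full diff.
 */
void toy_prepare_filters(struct toy_walk *w)
{
	assert(w != NULL);
	w->key = NULL;
	if (w->walk_reflogs)
		return; /* reflog walks do not follow the ancestry chain */
	if (w->nr_pathspecs != 1)
		return; /* only a single pathspec is supported for now */
	if (!w->graph_has_filters)
		return; /* nothing to look up */
	w->key_storage.hash = sketch_hash(w->pathspecs[0]);
	w->key = &w->key_storage;
}
```

Hashing the pathspec once and keeping the result in the walk state means the per-commit check only has to probe each commit's filter, rather than rehash the path every time.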