author Derrick Stolee <dstolee@microsoft.com> 2018-07-20 16:33:28 +0000
committer Junio C Hamano <gitster@pobox.com> 2018-07-20 15:38:56 -0700
commit 4fbcca4effc1c6f8431120f88f5a4bd1c8e38ca3
tree 88cefac812dd03d8261c5d697c3ee2a12499eb55
parent 1e3497a24cf13fe907b247d1b93a997d6537cca1
commit-reach: make can_all_from_reach... linear

The can_all_from_reach_with_flag() algorithm is currently quadratic in the worst case, because it calls the reachable() method for every 'from' without tracking which commits have already been walked or which can already reach a commit in 'to'.

Rewrite the algorithm to walk each commit a constant number of times.

We also add some optimizations that should work for the main consumer of this method: fetch negotiation (haves/wants).

The first step includes using a depth-first-search (DFS) from each 'from' commit, sorted by ascending generation number. We do not walk beyond the minimum generation number or the minimum commit date. This DFS is likely to be faster than the existing reachable() method because we expect previous ref values to be along the first-parent history.

If we find a target commit, then we mark everything in the DFS stack as a RESULT. This expands the set of targets for the other 'from' commits. We also mark the visited commits using 'assign_flag' to prevent re-walking the same commits.

We still need to clear our flags at the end, which is why we will have a total of three visits to each commit.

Performance was measured on the Linux repository using 'test-tool reach can_all_from_reach'. The input included rows seeded by tag values. The "small" case included X-rows as v4.[0-9]* and Y-rows as v3.[0-9]*. This mimics a (very large) fetch that says "I have all major v3 releases and want all major v4 releases." The "large" case included X-rows as "v4.*" and Y-rows as "v3.*". This adds all release-candidate tags to the set, which does not greatly increase the number of objects that are considered, but does increase the number of 'from' commits, demonstrating the quadratic nature of the previous code.

Small Case:

    Before: 1.52 s
     After: 0.26 s

Large Case:

    Before: 3.50 s
     After: 0.27 s

Note how the time increases between the two cases in the two versions. The new code's running time grows with the number of commits that need to be walked, but not directly with the number of 'from' commits.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

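
The walk described in the message above can be illustrated with a small, self-contained sketch. This is not git's code: `struct toy_commit`, the `TARGET`/`VISITED`/`RESULT` bit values, `MAX_STACK`, and `toy_can_all_from_reach()` are hypothetical names made up for this example, and the commit-date cutoff and the final flag-clearing pass are left out. It only shows the idea of sorting the 'from' commits by generation, running a DFS that stops at `min_generation`, and marking the whole DFS stack as RESULT once a target is found.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flag bits for this sketch; git's real flags differ. */
#define TARGET  (1u << 0) /* commit is in the 'to' set */
#define VISITED (1u << 1) /* already walked (plays the 'assign_flag' role) */
#define RESULT  (1u << 2) /* known to reach some TARGET commit */

#define MAX_STACK 64

/* Toy commit for illustration only; not git's struct commit. */
struct toy_commit {
	uint32_t generation;
	unsigned flags;
	int nr_parents;
	struct toy_commit *parents[2];
};

static int by_generation(const void *a, const void *b)
{
	const struct toy_commit *x = *(struct toy_commit *const *)a;
	const struct toy_commit *y = *(struct toy_commit *const *)b;

	return (x->generation > y->generation) - (x->generation < y->generation);
}

/* Returns 1 if every commit in from[] can reach a TARGET commit, else 0. */
static int toy_can_all_from_reach(struct toy_commit **from, int nr,
				  uint32_t min_generation)
{
	struct toy_commit *stack[MAX_STACK];
	int i, j, k;

	/* Low generations first, so their RESULT marks help later walks. */
	qsort(from, nr, sizeof(*from), by_generation);

	for (i = 0; i < nr; i++) {
		int top = 0, found = 0;

		if (from[i]->flags & (TARGET | RESULT))
			continue;
		if (from[i]->flags & VISITED)
			return 0; /* fully walked before, no target found */

		from[i]->flags |= VISITED;
		stack[top++] = from[i];

		while (top > 0 && !found) {
			struct toy_commit *c = stack[top - 1];
			struct toy_commit *next = NULL;

			for (j = 0; j < c->nr_parents && !next && !found; j++) {
				struct toy_commit *p = c->parents[j];

				if (p->flags & (TARGET | RESULT)) {
					found = 1;
				} else if (!(p->flags & VISITED)) {
					p->flags |= VISITED;
					/* Do not walk below the cutoff. */
					if (p->generation >= min_generation)
						next = p;
				}
			}

			if (found) {
				/* Every commit on the stack reaches the target. */
				for (k = 0; k < top; k++)
					stack[k]->flags |= RESULT;
			} else if (next) {
				assert(top < MAX_STACK);
				stack[top++] = next;
			} else {
				top--; /* all parents exhausted */
			}
		}

		if (!found)
			return 0;
	}
	return 1;
}
```

In this sketch a commit is pushed at most once, because `VISITED` is set before the push, and a later DFS that meets a `RESULT` commit stops immediately. That is the property the commit message relies on to avoid one full walk per 'from' commit.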
Diffstat (limited to 'upload-pack.c')
-rw-r--r-- upload-pack.c | 5
1 files changed, 4 insertions, 1 deletions
diff --git a/upload-pack.c b/upload-pack.c
index 11c426685d..1e498f1188 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -338,11 +338,14 @@ static int got_oid(const char *hex, struct object_id *oid)
 
 static int ok_to_give_up(void)
 {
+	uint32_t min_generation = GENERATION_NUMBER_ZERO;
+
 	if (!have_obj.nr)
 		return 0;
 
 	return can_all_from_reach_with_flag(&want_obj, THEY_HAVE,
-					    COMMON_KNOWN, oldest_have);
+					    COMMON_KNOWN, oldest_have,
+					    min_generation);
 }
 
 static int get_common_commits(void)