summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorJeff King <peff@peff.net>2016-07-25 14:50:10 -0400
committerJunio C Hamano <gitster@pobox.com>2016-07-26 13:34:49 -0700
commit1ceddba29036f2efd4f243ce30158a3152c39b11 (patch)
tree92508b4156a85f91cbb8d5762e43c571a9665ead
parent08bb3500a2a718c3c78b0547c68601cafa7a8fd9 (diff)
downloadgit-1ceddba29036f2efd4f243ce30158a3152c39b11.tar.gz
pack-objects: break out of want_object loop early
When pack-objects collects the list of objects to pack (either from stdin, or via its internal rev-list), it filters each one through want_object_in_pack(). This function loops through each existing packfile, looking for the object. When we find it, we mark the pack/offset combo for later use. However, we can't just return "yes, we want it" at that point. If --honor-pack-keep is in effect, we must keep looking to find it in _all_ packs, to make sure none of them has a .keep. Likewise, if --local is in effect, we must make sure it is not present in any non-local pack. As a result, the sum effort of these calls is effectively O(nr_objects * nr_packs). In an ordinary repository, we have only a handful of packs, and this doesn't make a big difference. But in pathological cases, it can slow the counting phase to a crawl. This patch notices the case that we have neither "--local" nor "--honor-pack-keep" in effect and breaks out of the loop early, after finding the first instance. Note that our worst case is still "objects * packs" (i.e., we might find each object in the last pack we look in), but in practice we will often break out early. On an "average" repo, my git.git with 8 packs, this shows a modest 2% (a few dozen milliseconds) improvement in the counting-objects phase of "git pack-objects --all <foo" (hackily instrumented by sticking exit(0) right after list_objects). But in a much more pathological case, it makes a bigger difference. I ran the same command on a real-world example with ~9 million objects across 1300 packs. The counting time dropped from 413s to 45s, an improvement of about 89%. Note that this patch won't do anything by itself for a normal "git gc", as it uses both --honor-pack-keep and --local. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
-rw-r--r--builtin/pack-objects.c15
1 files changed, 15 insertions, 0 deletions
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index a2f8cfdec0..a46bf5b7e4 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -977,6 +977,21 @@ static int want_object_in_pack(const unsigned char *sha1,
return 1;
if (incremental)
return 0;
+
+ /*
+ * When asked to do --local (do not include an
+ * object that appears in a pack we borrow
+ * from elsewhere) or --honor-pack-keep (do not
+ * include an object that appears in a pack marked
+ * with .keep), we need to make sure no copy of this
+ * object come from in _any_ pack that causes us to
+ * omit it, and need to complete this loop. When
+ * neither option is in effect, we know the object
+ * we just found is going to be packed, so break
+ * out of the loop to return 1 now.
+ */
+ if (!ignore_packed_keep && !local)
+ break;
if (local && !p->pack_local)
return 0;
if (ignore_packed_keep && p->pack_local && p->pack_keep)