From f4f19fb63449e1beee02b0ec845319f7115fa9d0 Mon Sep 17 00:00:00 2001
From: Jeff King <peff@peff.net>
Date: Mon, 16 Nov 2009 10:56:25 -0500
Subject: diffcore-break: free filespec data as we go

As we look at each changed file and consider breaking it, we
load the blob data and make a decision about whether to
break, which is independent of any other blobs that might
have changed. However, we keep the data in memory while we
consider breaking all of the other files. Which means that
both versions of every file you are diffing are in memory at
the same time.

This patch instead frees the blob data as we finish with
each file pair, leading to much lower memory usage.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diffcore-break.c | 4 ++++
 1 file changed, 4 insertions(+)

(limited to 'diffcore-break.c')

diff --git a/diffcore-break.c b/diffcore-break.c
index d7097bb576..15562e4556 100644
--- a/diffcore-break.c
+++ b/diffcore-break.c
@@ -204,12 +204,16 @@ void diffcore_break(int break_score)
 				dp->score = score;
 				dp->broken_pair = 1;
 
+				diff_free_filespec_data(p->one);
+				diff_free_filespec_data(p->two);
 				free(p); /* not diff_free_filepair(), we are
 					  * reusing one and two here.
 					  */
 				continue;
 			}
 		}
+		diff_free_filespec_data(p->one);
+		diff_free_filespec_data(p->two);
 		diff_q(&outq, p);
 	}
 	free(q->queue);
-- 
cgit v1.2.1


From 8282de94bc76360e0bf76da4076755696b049d23 Mon Sep 17 00:00:00 2001
From: Jeff King <peff@peff.net>
Date: Mon, 16 Nov 2009 11:02:02 -0500
Subject: diffcore-break: save cnt_data for other phases

The "break" phase works by counting changes between two
blobs with the same path. We do this by splitting the file
into chunks (or lines for text oriented files) and then
keeping a count of chunk hashes.

The "rename" phase counts changes between blobs at two
different paths. However, it uses the exact same set of
chunk hashes (which are immutable for a given sha1).

The rename phase can therefore use the same hash data as
break. Unfortunately, we were throwing this data away after
computing it in the break phase. This patch instead attaches
it to the filespec and lets it live through the rename
phase, working under the assumption that most of the time
that breaks are being computed, renames will be too.

We only do this optimization for files which have actually
been broken, as those ones will be candidates for rename
detection (and it is a time-space tradeoff, so we don't want
to waste space keeping useless data).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diffcore-break.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

(limited to 'diffcore-break.c')

diff --git a/diffcore-break.c b/diffcore-break.c
index 15562e4556..3a7b60a037 100644
--- a/diffcore-break.c
+++ b/diffcore-break.c
@@ -69,7 +69,7 @@ static int should_break(struct diff_filespec *src,
 		return 0; /* we do not break too small filepair */
 
 	if (diffcore_count_changes(src, dst,
-				   NULL, NULL,
+				   &src->cnt_data, &dst->cnt_data,
 				   0,
 				   &src_copied, &literal_added))
 		return 0;
@@ -204,8 +204,8 @@ void diffcore_break(int break_score)
 				dp->score = score;
 				dp->broken_pair = 1;
 
-				diff_free_filespec_data(p->one);
-				diff_free_filespec_data(p->two);
+				diff_free_filespec_blob(p->one);
+				diff_free_filespec_blob(p->two);
 				free(p); /* not diff_free_filepair(), we are
 					  * reusing one and two here.
 					  */
-- 
cgit v1.2.1