diff options
author | Stefan Beller <sbeller@google.com> | 2018-07-27 15:23:56 -0700 |
---|---|---|
committer | Junio C Hamano <gitster@pobox.com> | 2018-08-01 13:36:22 -0700 |
commit | 301ef8540155316cb87c896866dd1cab3108807b (patch) | |
tree | 487204f9308324c012996d70c546a304ecf5545d /xdiff/xdiffi.c | |
parent | 53f9a3e157dbbc901a02ac2c73346d375e24978c (diff) | |
download | git-301ef8540155316cb87c896866dd1cab3108807b.tar.gz |
xdiff: reduce indent heuristic overhead
Skip searching for better indentation heuristics if we'd slide a hunk more
than its size. This is the easiest fix proposed in the analysis[1] in
response to a patch that mercurial took for xdiff to limit searching
by a constant. Using a performance test as:
#!python
open('a', 'w').write(" \n" * 1000000)
open('b', 'w').write(" \n" * 1000001)
This patch reduces the execution of "git diff --no-index a b" from
0.70s to 0.31s. However limiting the sliding to the size of the diff hunk,
which was proposed as a solution (that I found easiest to implement for
now) is not optimal for cases like
open('a', 'w').write(" \n" * 1000000)
open('b', 'w').write(" \n" * 2000000)
as then we'd still slide 1000000 times.
In addition to limiting the sliding to size of the hunk, also limit by a
constant. Choose 100 lines as the constant as that fits more than a screen,
which really means that the diff sliding is probably not providing a lot
of benefit anyway.
[1] https://public-inbox.org/git/72ac1ac2-f567-f241-41d6-d0f83072e0b3@alum.mit.edu/
Reported-by: Jun Wu <quark@fb.com>
Analysis-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'xdiff/xdiffi.c')
-rw-r--r-- | xdiff/xdiffi.c | 12 |
1 files changed, 11 insertions, 1 deletions
diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c index 0de1ef463b..91e98ee986 100644 --- a/xdiff/xdiffi.c +++ b/xdiff/xdiffi.c @@ -592,6 +592,11 @@ static void measure_split(const xdfile_t *xdf, long split, #define INDENT_WEIGHT 60 /* + * How far do we slide a hunk at most? + */ +#define INDENT_HEURISTIC_MAX_SLIDING 100 + +/* * Compute a badness score for the hypothetical split whose measurements are * stored in m. The weight factors were determined empirically using the tools and * corpus described in @@ -903,7 +908,12 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) { long shift, best_shift = -1; struct split_score best_score; - for (shift = earliest_end; shift <= g.end; shift++) { + shift = earliest_end; + if (g.end - groupsize - 1 > shift) + shift = g.end - groupsize - 1; + if (g.end - INDENT_HEURISTIC_MAX_SLIDING > shift) + shift = g.end - INDENT_HEURISTIC_MAX_SLIDING; + for (; shift <= g.end; shift++) { struct split_measurement m; struct split_score score = {0, 0}; |