From 30e12b924b57b15e707f1749f2e5af15f1c7fe09 Mon Sep 17 00:00:00 2001
From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Sun, 27 Apr 2014 21:15:44 +0300
Subject: patch-id: make it stable against hunk reordering

Patch id changes if users reorder file diffs that make up a patch.

As the result is functionally equivalent, a different patch id is
surprising to many users.
In particular, reordering files using diff -O is helpful to make patches
more readable (e.g. API header diff before implementation diff).

Add an option that makes patch-id stable against this kind of
change: calculate a SHA-1 hash for each hunk separately and sum all
the hashes (using a symmetrical sum) to get the patch id.
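
As a rough illustration of the idea only (not the code in this patch),
the sketch below hashes each file diff on its own and adds the digests
modulo 2**160. The helper names, the little-endian byte order and the
sample diff strings are assumptions made for the example, and the
whitespace and line-number normalisation that patch-id performs is left
out, so the values will not match real patch IDs.

```python
import hashlib

MOD = 1 << 160  # a SHA-1 digest is 20 bytes, i.e. 160 bits


def diff_hash(text: str) -> int:
    """SHA-1 of one file diff, read as a 160-bit integer (byte order assumed)."""
    return int.from_bytes(hashlib.sha1(text.encode()).digest(), "little")


def stable_id(file_diffs) -> str:
    """Add the per-diff hashes modulo 2**160; addition ignores ordering."""
    total = 0
    for d in file_diffs:
        total = (total + diff_hash(d)) % MOD
    return total.to_bytes(20, "little").hex()


def unstable_id(file_diffs) -> str:
    """A single hash over the concatenated diffs, which depends on ordering."""
    return hashlib.sha1("".join(file_diffs).encode()).hexdigest()


# Hypothetical file diffs: an API header change and its implementation.
header = "+int api(void);\n"
impl = "+int api(void) { return 0; }\n"

print(stable_id([header, impl]) == stable_id([impl, header]))
print(unstable_id([header, impl]) == unstable_id([impl, header]))
```

Because addition is commutative, swapping the two diffs leaves the
summed ID unchanged, while the single hash over the concatenation
changes.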

We use a 20-byte sum rather than xor, since xor would give an all-zero
output for patches that contain two identical diffs, which isn't all
that unlikely (e.g. appending the same line in two places).
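
The difference can be checked with a small sketch. The function names
and the sample diff text are hypothetical, and the byte order used for
the sum is an assumption; only the contrast between the two combining
operations matters here.

```python
import hashlib


def digest(text: str) -> bytes:
    """SHA-1 digest of one diff (20 bytes)."""
    return hashlib.sha1(text.encode()).digest()


def xor_combine(digests) -> bytes:
    """Combine digests with byte-wise xor."""
    acc = bytes(20)
    for d in digests:
        acc = bytes(a ^ b for a, b in zip(acc, d))
    return acc


def sum_combine(digests) -> bytes:
    """Combine digests with a 20-byte sum, i.e. addition modulo 2**160."""
    total = sum(int.from_bytes(d, "little") for d in digests) % (1 << 160)
    return total.to_bytes(20, "little")


# Two identical diffs, e.g. the same line appended in two places.
same = digest("+extra line\n")

print(xor_combine([same, same]).hex())
print(sum_combine([same, same]).hex())
```

With xor the two identical digests cancel and the result is all zeros,
whatever the diff contained; the sum still depends on the diff.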

The new behaviour is enabled
- when patchid.stable is true
- when the --stable flag is present

Using the new --unstable flag or setting patchid.stable to false forces
the historical behaviour.

In the documentation, clarify that a patch ID can now be a sum of
hashes, not a single hash.  Document how the command line and config
options affect the behaviour.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/git-patch-id.txt | 37 ++++++++++++++++++++++++++++++++-----
 1 file changed, 32 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-patch-id.txt b/Documentation/git-patch-id.txt
index 312c3b1fe5..31efc587ee 100644
--- a/Documentation/git-patch-id.txt
+++ b/Documentation/git-patch-id.txt
@@ -8,14 +8,14 @@ git-patch-id - Compute unique ID for a patch
 SYNOPSIS
 --------
 [verse]
-'git patch-id' < <patch>
+'git patch-id' [--stable | --unstable] < <patch>
 
 DESCRIPTION
 -----------
-A "patch ID" is nothing but a SHA-1 of the diff associated with a patch, with
-whitespace and line numbers ignored.  As such, it's "reasonably stable", but at
-the same time also reasonably unique, i.e., two patches that have the same "patch
-ID" are almost guaranteed to be the same thing.
+A "patch ID" is nothing but a sum of SHA-1 of the file diffs associated with a
+patch, with whitespace and line numbers ignored.  As such, it's "reasonably
+stable", but at the same time also reasonably unique, i.e., two patches that
+have the same "patch ID" are almost guaranteed to be the same thing.
 
 IOW, you can use this thing to look for likely duplicate commits.
 
@@ -27,6 +27,33 @@ This can be used to make a mapping from patch ID to commit ID.
 
 OPTIONS
 -------
+
+--stable::
+	Use a "stable" sum of hashes as the patch ID. With this option:
+	 - Reordering file diffs that make up a patch does not affect the ID.
+	   In particular, two patches produced by comparing the same two trees
+	   with two different settings for "-O<orderfile>" result in the same
+	   patch ID signature, thereby allowing the computed result to be used
+	   as a key to index some meta-information about the change between
+	   the two trees;
+
+	 - Result is different from the value produced by git 1.9 and older
+	   or produced when an "unstable" hash (see --unstable below) is
+	   configured - even when used on a diff output taken without any use
+	   of "-O<orderfile>", thereby making existing databases storing such
+	   "unstable" or historical patch-ids unusable.
+
+	This is the default if patchid.stable is set to true.
+
+--unstable::
+	Use an "unstable" hash as the patch ID. With this option,
+	the result produced is compatible with the patch-id value produced
+	by git 1.9 and older.  Users with pre-existing databases storing
+	patch-ids produced by git 1.9 and older (who do not deal with reordered
+	patches) may want to use this option.
+
+	This is the default.
+
 <patch>::
 	The diff to create the ID of.
 