summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorJunio C Hamano <gitster@pobox.com>2014-02-27 14:01:48 -0800
committerJunio C Hamano <gitster@pobox.com>2014-02-27 14:01:48 -0800
commit0f9e62e0847c075678a7a5a748567d1e881d16f8 (patch)
tree8ef8989069ae40eef891b0964fc2cb8036a74e48
parent6784fab0ac1f973f22f1d4252f0e513d61be6c6b (diff)
parent6b5b3a27b7faf9d72efec28fa017408daf45cd00 (diff)
downloadgit-0f9e62e0847c075678a7a5a748567d1e881d16f8.tar.gz
Merge branch 'jk/pack-bitmap'
Borrow the bitmap index into packfiles from JGit to speed up enumeration of objects involved in a commit range without having to fully traverse the history. * jk/pack-bitmap: (26 commits) ewah: unconditionally ntohll ewah data ewah: support platforms that require aligned reads read-cache: use get_be32 instead of hand-rolled ntoh_l block-sha1: factor out get_be and put_be wrappers do not discard revindex when re-preparing packfiles pack-bitmap: implement optional name_hash cache t/perf: add tests for pack bitmaps t: add basic bitmap functionality tests count-objects: recognize .bitmap in garbage-checking repack: consider bitmaps when performing repacks repack: handle optional files created by pack-objects repack: turn exts array into array-of-struct repack: stop using magic number for ARRAY_SIZE(exts) pack-objects: implement bitmap writing rev-list: add bitmap mode to speed up object lists pack-objects: use bitmaps when packing objects pack-objects: split add_object_entry pack-bitmap: add support for bitmap indexes documentation: add documentation for the bitmap format ewah: compressed bitmap implementation ...
-rw-r--r--Documentation/config.txt25
-rw-r--r--Documentation/git-repack.txt9
-rw-r--r--Documentation/git-rev-list.txt1
-rw-r--r--Documentation/rev-list-options.txt8
-rw-r--r--Documentation/technical/bitmap-format.txt164
-rw-r--r--Makefile16
-rw-r--r--block-sha1/sha1.c32
-rw-r--r--builtin/pack-objects.c451
-rw-r--r--builtin/repack.c41
-rw-r--r--builtin/rev-list.c39
-rw-r--r--cache.h1
-rw-r--r--compat/bswap.h112
-rw-r--r--ewah/bitmap.c221
-rw-r--r--ewah/ewah_bitmap.c714
-rw-r--r--ewah/ewah_io.c204
-rw-r--r--ewah/ewah_rlw.c115
-rw-r--r--ewah/ewok.h233
-rw-r--r--ewah/ewok_rlw.h114
-rw-r--r--khash.h338
-rw-r--r--pack-bitmap-write.c552
-rw-r--r--pack-bitmap.c1073
-rw-r--r--pack-bitmap.h64
-rw-r--r--pack-objects.c111
-rw-r--r--pack-objects.h68
-rw-r--r--pack-revindex.c43
-rw-r--r--pack-revindex.h9
-rw-r--r--pack-write.c2
-rw-r--r--read-cache.c44
-rw-r--r--revision.c4
-rw-r--r--revision.h2
-rw-r--r--sha1_file.c6
-rwxr-xr-xt/perf/p5310-pack-bitmaps.sh57
-rwxr-xr-xt/t5310-pack-bitmaps.sh139
33 files changed, 4736 insertions, 276 deletions
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 7eec746950..040197b10e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1870,6 +1870,31 @@ pack.packSizeLimit::
Common unit suffixes of 'k', 'm', or 'g' are
supported.
+pack.useBitmaps::
+ When true, git will use pack bitmaps (if available) when packing
+ to stdout (e.g., during the server side of a fetch). Defaults to
+ true. You should not generally need to turn this off unless
+ you are debugging pack bitmaps.
+
+pack.writebitmaps::
+ When true, git will write a bitmap index when packing all
+ objects to disk (e.g., when `git repack -a` is run). This
+ index can speed up the "counting objects" phase of subsequent
+ packs created for clones and fetches, at the cost of some disk
+ space and extra time spent on the initial repack. Defaults to
+ false.
+
+pack.writeBitmapHashCache::
+ When true, git will include a "hash cache" section in the bitmap
+ index (if one is written). This cache can be used to feed git's
+ delta heuristics, potentially leading to better deltas between
+ bitmapped and non-bitmapped objects (e.g., when serving a fetch
+ between an older, bitmapped pack and objects that have been
+ pushed since the last gc). The downside is that it consumes 4
+ bytes per object of disk space, and that JGit's bitmap
+ implementation does not understand it, causing it to complain if
+ Git and JGit are used on the same repository. Defaults to false.
+
pager.<cmd>::
If the value is boolean, turns on or off pagination of the
output of a particular Git subcommand when writing to a tty.
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index 509cf73e50..002cfd5eb9 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -9,7 +9,7 @@ git-repack - Pack unpacked objects in a repository
SYNOPSIS
--------
[verse]
-'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [--window=<n>] [--depth=<n>]
+'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [--window=<n>] [--depth=<n>]
DESCRIPTION
-----------
@@ -110,6 +110,13 @@ other objects in that pack they already have locally.
The default is unlimited, unless the config variable
`pack.packSizeLimit` is set.
+-b::
+--write-bitmap-index::
+ Write a reachability bitmap index as part of the repack. This
+ only makes sense when used with `-a` or `-A`, as the bitmaps
+ must be able to refer to all reachable objects. This option
+ overrides the setting of `pack.writebitmaps`.
+
Configuration
-------------
diff --git a/Documentation/git-rev-list.txt b/Documentation/git-rev-list.txt
index 045b37b82e..7a1585def0 100644
--- a/Documentation/git-rev-list.txt
+++ b/Documentation/git-rev-list.txt
@@ -55,6 +55,7 @@ SYNOPSIS
[ \--reverse ]
[ \--walk-reflogs ]
[ \--no-walk ] [ \--do-walk ]
+ [ \--use-bitmap-index ]
<commit>... [ \-- <paths>... ]
DESCRIPTION
diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 03533af715..9a3da3646e 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -257,6 +257,14 @@ See also linkgit:git-reflog[1].
Output excluded boundary commits. Boundary commits are
prefixed with `-`.
+ifdef::git-rev-list[]
+--use-bitmap-index::
+
+ Try to speed up the traversal using the pack bitmap index (if
+ one is available). Note that when traversing with `--objects`,
+ trees and blobs will not have their associated path printed.
+endif::git-rev-list[]
+
--
History Simplification
diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
new file mode 100644
index 0000000000..f8c18a0f7a
--- /dev/null
+++ b/Documentation/technical/bitmap-format.txt
@@ -0,0 +1,164 @@
+GIT bitmap v1 format
+====================
+
+ - A header appears at the beginning:
+
+ 4-byte signature: {'B', 'I', 'T', 'M'}
+
+ 2-byte version number (network byte order)
+ The current implementation only supports version 1
+ of the bitmap index (the same one as JGit).
+
+ 2-byte flags (network byte order)
+
+ The following flags are supported:
+
+ - BITMAP_OPT_FULL_DAG (0x1) REQUIRED
+ This flag must always be present. It implies that the bitmap
+ index has been generated for a packfile with full closure
+ (i.e. where every single object in the packfile can find
+ its parent links inside the same packfile). This is a
+ requirement for the bitmap index format, also present in JGit,
+ that greatly reduces the complexity of the implementation.
+
+ - BITMAP_OPT_HASH_CACHE (0x4)
+ If present, the end of the bitmap file contains
+ `N` 32-bit name-hash values, one per object in the
+ pack. The format and meaning of the name-hash is
+ described below.
+
+ 4-byte entry count (network byte order)
+
+ The total count of entries (bitmapped commits) in this bitmap index.
+
+ 20-byte checksum
+
+ The SHA1 checksum of the pack this bitmap index belongs to.
+
+ - 4 EWAH bitmaps that act as type indexes
+
+ Type indexes are serialized after the hash cache in the shape
+ of four EWAH bitmaps stored consecutively (see Appendix A for
+ the serialization format of an EWAH bitmap).
+
+ There is a bitmap for each Git object type, stored in the following
+ order:
+
+ - Commits
+ - Trees
+ - Blobs
+ - Tags
+
+ In each bitmap, the `n`th bit is set to true if the `n`th object
+ in the packfile is of that type.
+
+ The obvious consequence is that the OR of all 4 bitmaps will result
+ in a full set (all bits set), and the AND of all 4 bitmaps will
+ result in an empty bitmap (no bits set).
+
+ - N entries with compressed bitmaps, one for each indexed commit
+
+ Where `N` is the total amount of entries in this bitmap index.
+ Each entry contains the following:
+
+ - 4-byte object position (network byte order)
+ The position **in the index for the packfile** where the
+ bitmap for this commit is found.
+
+ - 1-byte XOR-offset
+ The xor offset used to compress this bitmap. For an entry
+ in position `x`, a XOR offset of `y` means that the actual
+ bitmap representing this commit is composed by XORing the
+ bitmap for this entry with the bitmap in entry `x-y` (i.e.
+ the bitmap `y` entries before this one).
+
+ Note that this compression can be recursive. In order to
+ XOR this entry with a previous one, the previous entry needs
+ to be decompressed first, and so on.
+
+ The hard-limit for this offset is 160 (an entry can only be
+ xor'ed against one of the 160 entries preceding it). This
+ number is always positive, and hence entries are always xor'ed
+ with **previous** bitmaps, not bitmaps that will come afterwards
+ in the index.
+
+ - 1-byte flags for this bitmap
+ At the moment the only available flag is `0x1`, which hints
+ that this bitmap can be re-used when rebuilding bitmap indexes
+ for the repository.
+
+ - The compressed bitmap itself, see Appendix A.
+
+== Appendix A: Serialization format for an EWAH bitmap
+
+Ewah bitmaps are serialized in the same protocol as the JAVAEWAH
+library, making them backwards compatible with the JGit
+implementation:
+
+ - 4-byte number of bits of the resulting UNCOMPRESSED bitmap
+
+ - 4-byte number of words of the COMPRESSED bitmap, when stored
+
+ - N x 8-byte words, as specified by the previous field
+
+ This is the actual content of the compressed bitmap.
+
+ - 4-byte position of the current RLW for the compressed
+ bitmap
+
+All words are stored in network byte order for their corresponding
+sizes.
+
+The compressed bitmap is stored in a form of run-length encoding, as
+follows. It consists of a concatenation of an arbitrary number of
+chunks. Each chunk consists of one or more 64-bit words
+
+ H L_1 L_2 L_3 .... L_M
+
+H is called RLW (run length word). It consists of (from lower to higher
+order bits):
+
+ - 1 bit: the repeated bit B
+
+ - 32 bits: repetition count K (unsigned)
+
+ - 31 bits: literal word count M (unsigned)
+
+The bitstream represented by the above chunk is then:
+
+ - K repetitions of B
+
+ - The bits stored in `L_1` through `L_M`. Within a word, bits at
+ lower order come earlier in the stream than those at higher
+ order.
+
+The next word after `L_M` (if any) must again be a RLW, for the next
+chunk. For efficient appending to the bitstream, the EWAH stores a
+pointer to the last RLW in the stream.
+
+
+== Appendix B: Optional Bitmap Sections
+
+These sections may or may not be present in the `.bitmap` file; their
+presence is indicated by the header flags section described above.
+
+Name-hash cache
+---------------
+
+If the BITMAP_OPT_HASH_CACHE flag is set, the end of the bitmap contains
+a cache of 32-bit values, one per object in the pack. The value at
+position `i` is the hash of the pathname at which the `i`th object
+(counting in index order) in the pack can be found. This can be fed
+into the delta heuristics to compare objects with similar pathnames.
+
+The hash algorithm used is:
+
+ hash = 0;
+ while ((c = *name++))
+ if (!isspace(c))
+ hash = (hash >> 2) + (c << 24);
+
+Note that this hashing scheme is tied to the BITMAP_OPT_HASH_CACHE flag.
+If implementations want to choose a different hashing scheme, they are
+free to do so, but MUST allocate a new header flag (because comparing
+hashes made under two different schemes would be pointless).
diff --git a/Makefile b/Makefile
index c3213bd54b..d4ce53a71b 100644
--- a/Makefile
+++ b/Makefile
@@ -664,6 +664,8 @@ LIB_H += diff.h
LIB_H += diffcore.h
LIB_H += dir.h
LIB_H += exec_cmd.h
+LIB_H += ewah/ewok.h
+LIB_H += ewah/ewok_rlw.h
LIB_H += fetch-pack.h
LIB_H += fmt-merge-msg.h
LIB_H += fsck.h
@@ -691,8 +693,10 @@ LIB_H += notes-merge.h
LIB_H += notes-utils.h
LIB_H += notes.h
LIB_H += object.h
+LIB_H += pack-objects.h
LIB_H += pack-revindex.h
LIB_H += pack.h
+LIB_H += pack-bitmap.h
LIB_H += parse-options.h
LIB_H += patch-ids.h
LIB_H += pathspec.h
@@ -796,6 +800,10 @@ LIB_OBJS += dir.o
LIB_OBJS += editor.o
LIB_OBJS += entry.o
LIB_OBJS += environment.o
+LIB_OBJS += ewah/bitmap.o
+LIB_OBJS += ewah/ewah_bitmap.o
+LIB_OBJS += ewah/ewah_io.o
+LIB_OBJS += ewah/ewah_rlw.o
LIB_OBJS += exec_cmd.o
LIB_OBJS += fetch-pack.o
LIB_OBJS += fsck.o
@@ -827,7 +835,10 @@ LIB_OBJS += notes-cache.o
LIB_OBJS += notes-merge.o
LIB_OBJS += notes-utils.o
LIB_OBJS += object.o
+LIB_OBJS += pack-bitmap.o
+LIB_OBJS += pack-bitmap-write.o
LIB_OBJS += pack-check.o
+LIB_OBJS += pack-objects.o
LIB_OBJS += pack-revindex.o
LIB_OBJS += pack-write.o
LIB_OBJS += pager.o
@@ -2480,8 +2491,9 @@ profile-clean:
$(RM) $(addsuffix *.gcno,$(addprefix $(PROFILE_DIR)/, $(object_dirs)))
clean: profile-clean coverage-clean
- $(RM) *.o *.res block-sha1/*.o ppc/*.o compat/*.o compat/*/*.o xdiff/*.o vcs-svn/*.o \
- builtin/*.o $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB)
+ $(RM) *.o *.res block-sha1/*.o ppc/*.o compat/*.o compat/*/*.o
+ $(RM) xdiff/*.o vcs-svn/*.o ewah/*.o builtin/*.o
+ $(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB)
$(RM) $(ALL_PROGRAMS) $(SCRIPT_LIB) $(BUILT_INS) git$X
$(RM) $(TEST_PROGRAMS) $(NO_INSTALL)
$(RM) -r bin-wrappers $(dep_dirs)
diff --git a/block-sha1/sha1.c b/block-sha1/sha1.c
index e1a1eb6097..22b125cf8c 100644
--- a/block-sha1/sha1.c
+++ b/block-sha1/sha1.c
@@ -62,38 +62,6 @@
#define setW(x, val) (W(x) = (val))
#endif
-/*
- * Performance might be improved if the CPU architecture is OK with
- * unaligned 32-bit loads and a fast ntohl() is available.
- * Otherwise fall back to byte loads and shifts which is portable,
- * and is faster on architectures with memory alignment issues.
- */
-
-#if defined(__i386__) || defined(__x86_64__) || \
- defined(_M_IX86) || defined(_M_X64) || \
- defined(__ppc__) || defined(__ppc64__) || \
- defined(__powerpc__) || defined(__powerpc64__) || \
- defined(__s390__) || defined(__s390x__)
-
-#define get_be32(p) ntohl(*(unsigned int *)(p))
-#define put_be32(p, v) do { *(unsigned int *)(p) = htonl(v); } while (0)
-
-#else
-
-#define get_be32(p) ( \
- (*((unsigned char *)(p) + 0) << 24) | \
- (*((unsigned char *)(p) + 1) << 16) | \
- (*((unsigned char *)(p) + 2) << 8) | \
- (*((unsigned char *)(p) + 3) << 0) )
-#define put_be32(p, v) do { \
- unsigned int __v = (v); \
- *((unsigned char *)(p) + 0) = __v >> 24; \
- *((unsigned char *)(p) + 1) = __v >> 16; \
- *((unsigned char *)(p) + 2) = __v >> 8; \
- *((unsigned char *)(p) + 3) = __v >> 0; } while (0)
-
-#endif
-
/* This "rolls" over the 512-bit array */
#define W(x) (array[(x)&15])
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 541667f102..c73337931b 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -14,10 +14,12 @@
#include "diff.h"
#include "revision.h"
#include "list-objects.h"
+#include "pack-objects.h"
#include "progress.h"
#include "refs.h"
#include "streaming.h"
#include "thread-utils.h"
+#include "pack-bitmap.h"
static const char *pack_usage[] = {
N_("git pack-objects --stdout [options...] [< ref-list | < object-list]"),
@@ -25,42 +27,15 @@ static const char *pack_usage[] = {
NULL
};
-struct object_entry {
- struct pack_idx_entry idx;
- unsigned long size; /* uncompressed size */
- struct packed_git *in_pack; /* already in pack */
- off_t in_pack_offset;
- struct object_entry *delta; /* delta base object */
- struct object_entry *delta_child; /* deltified objects who bases me */
- struct object_entry *delta_sibling; /* other deltified objects who
- * uses the same base as me
- */
- void *delta_data; /* cached delta (uncompressed) */
- unsigned long delta_size; /* delta data size (uncompressed) */
- unsigned long z_delta_size; /* delta data size (compressed) */
- enum object_type type;
- enum object_type in_pack_type; /* could be delta */
- uint32_t hash; /* name hint hash */
- unsigned char in_pack_header_size;
- unsigned preferred_base:1; /*
- * we do not pack this, but is available
- * to be used as the base object to delta
- * objects against.
- */
- unsigned no_try_delta:1;
- unsigned tagged:1; /* near the very tip of refs */
- unsigned filled:1; /* assigned write-order */
-};
-
/*
- * Objects we are going to pack are collected in objects array (dynamically
- * expanded). nr_objects & nr_alloc controls this array. They are stored
- * in the order we see -- typically rev-list --objects order that gives us
- * nice "minimum seek" order.
+ * Objects we are going to pack are collected in the `to_pack` structure.
+ * It contains an array (dynamically expanded) of the object data, and a map
+ * that can resolve SHA1s to their position in the array.
*/
-static struct object_entry *objects;
+static struct packing_data to_pack;
+
static struct pack_idx_entry **written_list;
-static uint32_t nr_objects, nr_alloc, nr_result, nr_written;
+static uint32_t nr_result, nr_written;
static int non_empty;
static int reuse_delta = 1, reuse_object = 1;
@@ -83,6 +58,14 @@ static struct progress *progress_state;
static int pack_compression_level = Z_DEFAULT_COMPRESSION;
static int pack_compression_seen;
+static struct packed_git *reuse_packfile;
+static uint32_t reuse_packfile_objects;
+static off_t reuse_packfile_offset;
+
+static int use_bitmap_index = 1;
+static int write_bitmap_index;
+static uint16_t write_bitmap_options;
+
static unsigned long delta_cache_size = 0;
static unsigned long max_delta_cache_size = 256 * 1024 * 1024;
static unsigned long cache_max_small_delta_size = 1000;
@@ -90,20 +73,28 @@ static unsigned long cache_max_small_delta_size = 1000;
static unsigned long window_memory_limit = 0;
/*
- * The object names in objects array are hashed with this hashtable,
- * to help looking up the entry by object name.
- * This hashtable is built after all the objects are seen.
- */
-static int *object_ix;
-static int object_ix_hashsz;
-static struct object_entry *locate_object_entry(const unsigned char *sha1);
-
-/*
* stats
*/
static uint32_t written, written_delta;
static uint32_t reused, reused_delta;
+/*
+ * Indexed commits
+ */
+static struct commit **indexed_commits;
+static unsigned int indexed_commits_nr;
+static unsigned int indexed_commits_alloc;
+
+static void index_commit_for_bitmap(struct commit *commit)
+{
+ if (indexed_commits_nr >= indexed_commits_alloc) {
+ indexed_commits_alloc = (indexed_commits_alloc + 32) * 2;
+ indexed_commits = xrealloc(indexed_commits,
+ indexed_commits_alloc * sizeof(struct commit *));
+ }
+
+ indexed_commits[indexed_commits_nr++] = commit;
+}
static void *get_delta(struct object_entry *entry)
{
@@ -553,12 +544,12 @@ static int mark_tagged(const char *path, const unsigned char *sha1, int flag,
void *cb_data)
{
unsigned char peeled[20];
- struct object_entry *entry = locate_object_entry(sha1);
+ struct object_entry *entry = packlist_find(&to_pack, sha1, NULL);
if (entry)
entry->tagged = 1;
if (!peel_ref(path, peeled)) {
- entry = locate_object_entry(peeled);
+ entry = packlist_find(&to_pack, peeled, NULL);
if (entry)
entry->tagged = 1;
}
@@ -633,9 +624,10 @@ static struct object_entry **compute_write_order(void)
{
unsigned int i, wo_end, last_untagged;
- struct object_entry **wo = xmalloc(nr_objects * sizeof(*wo));
+ struct object_entry **wo = xmalloc(to_pack.nr_objects * sizeof(*wo));
+ struct object_entry *objects = to_pack.objects;
- for (i = 0; i < nr_objects; i++) {
+ for (i = 0; i < to_pack.nr_objects; i++) {
objects[i].tagged = 0;
objects[i].filled = 0;
objects[i].delta_child = NULL;
@@ -647,7 +639,7 @@ static struct object_entry **compute_write_order(void)
* Make sure delta_sibling is sorted in the original
* recency order.
*/
- for (i = nr_objects; i > 0;) {
+ for (i = to_pack.nr_objects; i > 0;) {
struct object_entry *e = &objects[--i];
if (!e->delta)
continue;
@@ -665,7 +657,7 @@ static struct object_entry **compute_write_order(void)
* Give the objects in the original recency order until
* we see a tagged tip.
*/
- for (i = wo_end = 0; i < nr_objects; i++) {
+ for (i = wo_end = 0; i < to_pack.nr_objects; i++) {
if (objects[i].tagged)
break;
add_to_write_order(wo, &wo_end, &objects[i]);
@@ -675,7 +667,7 @@ static struct object_entry **compute_write_order(void)
/*
* Then fill all the tagged tips.
*/
- for (; i < nr_objects; i++) {
+ for (; i < to_pack.nr_objects; i++) {
if (objects[i].tagged)
add_to_write_order(wo, &wo_end, &objects[i]);
}
@@ -683,7 +675,7 @@ static struct object_entry **compute_write_order(void)
/*
* And then all remaining commits and tags.
*/
- for (i = last_untagged; i < nr_objects; i++) {
+ for (i = last_untagged; i < to_pack.nr_objects; i++) {
if (objects[i].type != OBJ_COMMIT &&
objects[i].type != OBJ_TAG)
continue;
@@ -693,7 +685,7 @@ static struct object_entry **compute_write_order(void)
/*
* And then all the trees.
*/
- for (i = last_untagged; i < nr_objects; i++) {
+ for (i = last_untagged; i < to_pack.nr_objects; i++) {
if (objects[i].type != OBJ_TREE)
continue;
add_to_write_order(wo, &wo_end, &objects[i]);
@@ -702,17 +694,57 @@ static struct object_entry **compute_write_order(void)
/*
* Finally all the rest in really tight order
*/
- for (i = last_untagged; i < nr_objects; i++) {
+ for (i = last_untagged; i < to_pack.nr_objects; i++) {
if (!objects[i].filled)
add_family_to_write_order(wo, &wo_end, &objects[i]);
}
- if (wo_end != nr_objects)
- die("ordered %u objects, expected %"PRIu32, wo_end, nr_objects);
+ if (wo_end != to_pack.nr_objects)
+ die("ordered %u objects, expected %"PRIu32, wo_end, to_pack.nr_objects);
return wo;
}
+static off_t write_reused_pack(struct sha1file *f)
+{
+ unsigned char buffer[8192];
+ off_t to_write;
+ int fd;
+
+ if (!is_pack_valid(reuse_packfile))
+ die("packfile is invalid: %s", reuse_packfile->pack_name);
+
+ fd = git_open_noatime(reuse_packfile->pack_name);
+ if (fd < 0)
+ die_errno("unable to open packfile for reuse: %s",
+ reuse_packfile->pack_name);
+
+ if (lseek(fd, sizeof(struct pack_header), SEEK_SET) == -1)
+ die_errno("unable to seek in reused packfile");
+
+ if (reuse_packfile_offset < 0)
+ reuse_packfile_offset = reuse_packfile->pack_size - 20;
+
+ to_write = reuse_packfile_offset - sizeof(struct pack_header);
+
+ while (to_write) {
+ int read_pack = xread(fd, buffer, sizeof(buffer));
+
+ if (read_pack <= 0)
+ die_errno("unable to read from reused packfile");
+
+ if (read_pack > to_write)
+ read_pack = to_write;
+
+ sha1write(f, buffer, read_pack);
+ to_write -= read_pack;
+ }
+
+ close(fd);
+ written += reuse_packfile_objects;
+ return reuse_packfile_offset - sizeof(struct pack_header);
+}
+
static void write_pack_file(void)
{
uint32_t i = 0, j;
@@ -724,7 +756,7 @@ static void write_pack_file(void)
if (progress > pack_to_stdout)
progress_state = start_progress("Writing objects", nr_result);
- written_list = xmalloc(nr_objects * sizeof(*written_list));
+ written_list = xmalloc(to_pack.nr_objects * sizeof(*written_list));
write_order = compute_write_order();
do {
@@ -737,8 +769,17 @@ static void write_pack_file(void)
f = create_tmp_packfile(&pack_tmp_name);
offset = write_pack_header(f, nr_remaining);
+
+ if (reuse_packfile) {
+ off_t packfile_size;
+ assert(pack_to_stdout);
+
+ packfile_size = write_reused_pack(f);
+ offset += packfile_size;
+ }
+
nr_written = 0;
- for (; i < nr_objects; i++) {
+ for (; i < to_pack.nr_objects; i++) {
struct object_entry *e = write_order[i];
if (write_one(f, e, &offset) == WRITE_ONE_BREAK)
break;
@@ -789,9 +830,31 @@ static void write_pack_file(void)
if (sizeof(tmpname) <= strlen(base_name) + 50)
die("pack base name '%s' too long", base_name);
snprintf(tmpname, sizeof(tmpname), "%s-", base_name);
+
+ if (write_bitmap_index) {
+ bitmap_writer_set_checksum(sha1);
+ bitmap_writer_build_type_index(written_list, nr_written);
+ }
+
finish_tmp_packfile(tmpname, pack_tmp_name,
written_list, nr_written,
&pack_idx_opts, sha1);
+
+ if (write_bitmap_index) {
+ char *end_of_name_prefix = strrchr(tmpname, 0);
+ sprintf(end_of_name_prefix, "%s.bitmap", sha1_to_hex(sha1));
+
+ stop_progress(&progress_state);
+
+ bitmap_writer_show_progress(progress);
+ bitmap_writer_reuse_bitmaps(&to_pack);
+ bitmap_writer_select_commits(indexed_commits, indexed_commits_nr, -1);
+ bitmap_writer_build(&to_pack);
+ bitmap_writer_finish(written_list, nr_written,
+ tmpname, write_bitmap_options);
+ write_bitmap_index = 0;
+ }
+
free(pack_tmp_name);
puts(sha1_to_hex(sha1));
}
@@ -801,7 +864,7 @@ static void write_pack_file(void)
written_list[j]->offset = (off_t)-1;
}
nr_remaining -= nr_written;
- } while (nr_remaining && i < nr_objects);
+ } while (nr_remaining && i < to_pack.nr_objects);
free(written_list);
free(write_order);
@@ -811,73 +874,6 @@ static void write_pack_file(void)
written, nr_result);
}
-static int locate_object_entry_hash(const unsigned char *sha1)
-{
- int i;
- unsigned int ui;
- memcpy(&ui, sha1, sizeof(unsigned int));
- i = ui % object_ix_hashsz;
- while (0 < object_ix[i]) {
- if (!hashcmp(sha1, objects[object_ix[i] - 1].idx.sha1))
- return i;
- if (++i == object_ix_hashsz)
- i = 0;
- }
- return -1 - i;
-}
-
-static struct object_entry *locate_object_entry(const unsigned char *sha1)
-{
- int i;
-
- if (!object_ix_hashsz)
- return NULL;
-
- i = locate_object_entry_hash(sha1);
- if (0 <= i)
- return &objects[object_ix[i]-1];
- return NULL;
-}
-
-static void rehash_objects(void)
-{
- uint32_t i;
- struct object_entry *oe;
-
- object_ix_hashsz = nr_objects * 3;
- if (object_ix_hashsz < 1024)
- object_ix_hashsz = 1024;
- object_ix = xrealloc(object_ix, sizeof(int) * object_ix_hashsz);
- memset(object_ix, 0, sizeof(int) * object_ix_hashsz);
- for (i = 0, oe = objects; i < nr_objects; i++, oe++) {
- int ix = locate_object_entry_hash(oe->idx.sha1);
- if (0 <= ix)
- continue;
- ix = -1 - ix;
- object_ix[ix] = i + 1;
- }
-}
-
-static uint32_t name_hash(const char *name)
-{
- uint32_t c, hash = 0;
-
- if (!name)
- return 0;
-
- /*
- * This effectively just creates a sortable number from the
- * last sixteen non-whitespace characters. Last characters
- * count "most", so things that end in ".c" sort together.
- */
- while ((c = *name++) != 0) {
- if (isspace(c))
- continue;
- hash = (hash >> 2) + (c << 24);
- }
- return hash;
-}
-
static void setup_delta_attr_check(struct git_attr_check *check)
{
static struct git_attr *attr_delta;
@@ -900,42 +896,69 @@ static int no_try_delta(const char *path)
return 0;
}
-static int add_object_entry(const unsigned char *sha1, enum object_type type,
- const char *name, int exclude)
+/*
+ * When adding an object, check whether we have already added it
+ * to our packing list. If so, we can skip. However, if we are
+ * being asked to excludei t, but the previous mention was to include
+ * it, make sure to adjust its flags and tweak our numbers accordingly.
+ *
+ * As an optimization, we pass out the index position where we would have
+ * found the item, since that saves us from having to look it up again a
+ * few lines later when we want to add the new entry.
+ */
+static int have_duplicate_entry(const unsigned char *sha1,
+ int exclude,
+ uint32_t *index_pos)
{
struct object_entry *entry;
- struct packed_git *p, *found_pack = NULL;
- off_t found_offset = 0;
- int ix;
- uint32_t hash = name_hash(name);
-
- ix = nr_objects ? locate_object_entry_hash(sha1) : -1;
- if (ix >= 0) {
- if (exclude) {
- entry = objects + object_ix[ix] - 1;
- if (!entry->preferred_base)
- nr_result--;
- entry->preferred_base = 1;
- }
+
+ entry = packlist_find(&to_pack, sha1, index_pos);
+ if (!entry)
return 0;
+
+ if (exclude) {
+ if (!entry->preferred_base)
+ nr_result--;
+ entry->preferred_base = 1;
}
+ return 1;
+}
+
+/*
+ * Check whether we want the object in the pack (e.g., we do not want
+ * objects found in non-local stores if the "--local" option was used).
+ *
+ * As a side effect of this check, we will find the packed version of this
+ * object, if any. We therefore pass out the pack information to avoid having
+ * to look it up again later.
+ */
+static int want_object_in_pack(const unsigned char *sha1,
+ int exclude,
+ struct packed_git **found_pack,
+ off_t *found_offset)
+{
+ struct packed_git *p;
+
if (!exclude && local && has_loose_object_nonlocal(sha1))
return 0;
+ *found_pack = NULL;
+ *found_offset = 0;
+
for (p = packed_git; p; p = p->next) {
off_t offset = find_pack_entry_one(sha1, p);
if (offset) {
- if (!found_pack) {
+ if (!*found_pack) {
if (!is_pack_valid(p)) {
warning("packfile %s cannot be accessed", p->pack_name);
continue;
}
- found_offset = offset;
- found_pack = p;
+ *found_offset = offset;
+ *found_pack = p;
}
if (exclude)
- break;
+ return 1;
if (incremental)
return 0;
if (local && !p->pack_local)
@@ -945,14 +968,21 @@ static int add_object_entry(const unsigned char *sha1, enum object_type type,
}
}
- if (nr_objects >= nr_alloc) {
- nr_alloc = (nr_alloc + 1024) * 3 / 2;
- objects = xrealloc(objects, nr_alloc * sizeof(*entry));
- }
+ return 1;
+}
+
+static void create_object_entry(const unsigned char *sha1,
+ enum object_type type,
+ uint32_t hash,
+ int exclude,
+ int no_try_delta,
+ uint32_t index_pos,
+ struct packed_git *found_pack,
+ off_t found_offset)
+{
+ struct object_entry *entry;
- entry = objects + nr_objects++;
- memset(entry, 0, sizeof(*entry));
- hashcpy(entry->idx.sha1, sha1);
+ entry = packlist_alloc(&to_pack, sha1, index_pos);
entry->hash = hash;
if (type)
entry->type = type;
@@ -965,16 +995,43 @@ static int add_object_entry(const unsigned char *sha1, enum object_type type,
entry->in_pack_offset = found_offset;
}
- if (object_ix_hashsz * 3 <= nr_objects * 4)
- rehash_objects();
- else
- object_ix[-1 - ix] = nr_objects;
+ entry->no_try_delta = no_try_delta;
+}
+
+static int add_object_entry(const unsigned char *sha1, enum object_type type,
+ const char *name, int exclude)
+{
+ struct packed_git *found_pack;
+ off_t found_offset;
+ uint32_t index_pos;
- display_progress(progress_state, nr_objects);
+ if (have_duplicate_entry(sha1, exclude, &index_pos))
+ return 0;
+
+ if (!want_object_in_pack(sha1, exclude, &found_pack, &found_offset))
+ return 0;
- if (name && no_try_delta(name))
- entry->no_try_delta = 1;
+ create_object_entry(sha1, type, pack_name_hash(name),
+ exclude, name && no_try_delta(name),
+ index_pos, found_pack, found_offset);
+ display_progress(progress_state, to_pack.nr_objects);
+ return 1;
+}
+
+static int add_object_entry_from_bitmap(const unsigned char *sha1,
+ enum object_type type,
+ int flags, uint32_t name_hash,
+ struct packed_git *pack, off_t offset)
+{
+ uint32_t index_pos;
+
+ if (have_duplicate_entry(sha1, 0, &index_pos))
+ return 0;
+
+ create_object_entry(sha1, type, name_hash, 0, 0, index_pos, pack, offset);
+
+ display_progress(progress_state, to_pack.nr_objects);
return 1;
}
@@ -1175,7 +1232,7 @@ static void add_preferred_base_object(const char *name)
{
struct pbase_tree *it;
int cmplen;
- unsigned hash = name_hash(name);
+ unsigned hash = pack_name_hash(name);
if (!num_preferred_base || check_pbase_path(hash))
return;
@@ -1327,7 +1384,7 @@ static void check_object(struct object_entry *entry)
break;
}
- if (base_ref && (base_entry = locate_object_entry(base_ref))) {
+ if (base_ref && (base_entry = packlist_find(&to_pack, base_ref, NULL))) {
/*
* If base_ref was set above that means we wish to
* reuse delta data, and we even found that base
@@ -1401,12 +1458,12 @@ static void get_object_details(void)
uint32_t i;
struct object_entry **sorted_by_offset;
- sorted_by_offset = xcalloc(nr_objects, sizeof(struct object_entry *));
- for (i = 0; i < nr_objects; i++)
- sorted_by_offset[i] = objects + i;
- qsort(sorted_by_offset, nr_objects, sizeof(*sorted_by_offset), pack_offset_sort);
+ sorted_by_offset = xcalloc(to_pack.nr_objects, sizeof(struct object_entry *));
+ for (i = 0; i < to_pack.nr_objects; i++)
+ sorted_by_offset[i] = to_pack.objects + i;
+ qsort(sorted_by_offset, to_pack.nr_objects, sizeof(*sorted_by_offset), pack_offset_sort);
- for (i = 0; i < nr_objects; i++) {
+ for (i = 0; i < to_pack.nr_objects; i++) {
struct object_entry *entry = sorted_by_offset[i];
check_object(entry);
if (big_file_threshold < entry->size)
@@ -2032,7 +2089,7 @@ static int add_ref_tag(const char *path, const unsigned char *sha1, int flag, vo
if (starts_with(path, "refs/tags/") && /* is a tag? */
!peel_ref(path, peeled) && /* peelable? */
- locate_object_entry(peeled)) /* object packed? */
+ packlist_find(&to_pack, peeled, NULL)) /* object packed? */
add_object_entry(sha1, OBJ_TAG, NULL, 0);
return 0;
}
@@ -2055,14 +2112,14 @@ static void prepare_pack(int window, int depth)
if (!pack_to_stdout)
do_check_packed_object_crc = 1;
- if (!nr_objects || !window || !depth)
+ if (!to_pack.nr_objects || !window || !depth)
return;
- delta_list = xmalloc(nr_objects * sizeof(*delta_list));
+ delta_list = xmalloc(to_pack.nr_objects * sizeof(*delta_list));
nr_deltas = n = 0;
- for (i = 0; i < nr_objects; i++) {
- struct object_entry *entry = objects + i;
+ for (i = 0; i < to_pack.nr_objects; i++) {
+ struct object_entry *entry = to_pack.objects + i;
if (entry->delta)
/* This happens if we decided to reuse existing
@@ -2140,6 +2197,20 @@ static int git_pack_config(const char *k, const char *v, void *cb)
cache_max_small_delta_size = git_config_int(k, v);
return 0;
}
+ if (!strcmp(k, "pack.writebitmaps")) {
+ write_bitmap_index = git_config_bool(k, v);
+ return 0;
+ }
+ if (!strcmp(k, "pack.writebitmaphashcache")) {
+ if (git_config_bool(k, v))
+ write_bitmap_options |= BITMAP_OPT_HASH_CACHE;
+ else
+ write_bitmap_options &= ~BITMAP_OPT_HASH_CACHE;
+ }
+ if (!strcmp(k, "pack.usebitmaps")) {
+ use_bitmap_index = git_config_bool(k, v);
+ return 0;
+ }
if (!strcmp(k, "pack.threads")) {
delta_search_threads = git_config_int(k, v);
if (delta_search_threads < 0)
@@ -2198,6 +2269,9 @@ static void show_commit(struct commit *commit, void *data)
{
add_object_entry(commit->object.sha1, OBJ_COMMIT, NULL, 0);
commit->object.flags |= OBJECT_ADDED;
+
+ if (write_bitmap_index)
+ index_commit_for_bitmap(commit);
}
static void show_object(struct object *obj,
@@ -2340,7 +2414,7 @@ static void loosen_unused_packed_objects(struct rev_info *revs)
for (i = 0; i < p->num_objects; i++) {
sha1 = nth_packed_object_sha1(p, i);
- if (!locate_object_entry(sha1) &&
+ if (!packlist_find(&to_pack, sha1, NULL) &&
!has_sha1_pack_kept_or_nonlocal(sha1))
if (force_object_loose(sha1, p->mtime))
die("unable to force loose object");
@@ -2348,6 +2422,29 @@ static void loosen_unused_packed_objects(struct rev_info *revs)
}
}
+static int get_object_list_from_bitmap(struct rev_info *revs)
+{
+ if (prepare_bitmap_walk(revs) < 0)
+ return -1;
+
+ if (!reuse_partial_packfile_from_bitmap(
+ &reuse_packfile,
+ &reuse_packfile_objects,
+ &reuse_packfile_offset)) {
+ assert(reuse_packfile_objects);
+ nr_result += reuse_packfile_objects;
+
+ if (progress) {
+ fprintf(stderr, "Reusing existing pack: %d, done.\n",
+ reuse_packfile_objects);
+ fflush(stderr);
+ }
+ }
+
+ traverse_bitmap_commit_list(&add_object_entry_from_bitmap);
+ return 0;
+}
+
static void get_object_list(int ac, const char **av)
{
struct rev_info revs;
@@ -2367,6 +2464,7 @@ static void get_object_list(int ac, const char **av)
if (*line == '-') {
if (!strcmp(line, "--not")) {
flags ^= UNINTERESTING;
+ write_bitmap_index = 0;
continue;
}
die("not a rev '%s'", line);
@@ -2375,6 +2473,9 @@ static void get_object_list(int ac, const char **av)
die("bad revision '%s'", line);
}
+ if (use_bitmap_index && !get_object_list_from_bitmap(&revs))
+ return;
+
if (prepare_revision_walk(&revs))
die("revision walk setup failed");
mark_edges_uninteresting(&revs, show_edge);
@@ -2504,6 +2605,10 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
N_("pack compression level")),
OPT_SET_INT(0, "keep-true-parents", &grafts_replace_parents,
N_("do not hide commits by grafts"), 0),
+ OPT_BOOL(0, "use-bitmap-index", &use_bitmap_index,
+ N_("use a bitmap index if available to speed up counting objects")),
+ OPT_BOOL(0, "write-bitmap-index", &write_bitmap_index,
+ N_("write a bitmap index together with the pack index")),
OPT_END(),
};
@@ -2570,6 +2675,12 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
if (keep_unreachable && unpack_unreachable)
die("--keep-unreachable and --unpack-unreachable are incompatible.");
+ if (!use_internal_rev_list || !pack_to_stdout || is_repository_shallow())
+ use_bitmap_index = 0;
+
+ if (pack_to_stdout || !rev_list_all)
+ write_bitmap_index = 0;
+
if (progress && all_progress_implied)
progress = 2;
diff --git a/builtin/repack.c b/builtin/repack.c
index bb2314c9cb..49f5857627 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -94,7 +94,7 @@ static void get_non_kept_pack_filenames(struct string_list *fname_list)
static void remove_redundant_pack(const char *dir_name, const char *base_name)
{
- const char *exts[] = {".pack", ".idx", ".keep"};
+ const char *exts[] = {".pack", ".idx", ".keep", ".bitmap"};
int i;
struct strbuf buf = STRBUF_INIT;
size_t plen;
@@ -115,7 +115,14 @@ static void remove_redundant_pack(const char *dir_name, const char *base_name)
int cmd_repack(int argc, const char **argv, const char *prefix)
{
- const char *exts[2] = {".pack", ".idx"};
+ struct {
+ const char *name;
+ unsigned optional:1;
+ } exts[] = {
+ {".pack"},
+ {".idx"},
+ {".bitmap", 1},
+ };
struct child_process cmd;
struct string_list_item *item;
struct argv_array cmd_args = ARGV_ARRAY_INIT;
@@ -137,6 +144,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
int no_update_server_info = 0;
int quiet = 0;
int local = 0;
+ int write_bitmap = -1;
struct option builtin_repack_options[] = {
OPT_BIT('a', NULL, &pack_everything,
@@ -155,6 +163,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
OPT__QUIET(&quiet, N_("be quiet")),
OPT_BOOL('l', "local", &local,
N_("pass --local to git-pack-objects")),
+ OPT_BOOL('b', "write-bitmap-index", &write_bitmap,
+ N_("write bitmap index")),
OPT_STRING(0, "unpack-unreachable", &unpack_unreachable, N_("approxidate"),
N_("with -A, do not loosen objects older than this")),
OPT_STRING(0, "window", &window, N_("n"),
@@ -196,6 +206,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
argv_array_pushf(&cmd_args, "--no-reuse-delta");
if (no_reuse_object)
argv_array_pushf(&cmd_args, "--no-reuse-object");
+ if (write_bitmap >= 0)
+ argv_array_pushf(&cmd_args, "--%swrite-bitmap-index",
+ write_bitmap ? "" : "no-");
if (pack_everything & ALL_INTO_ONE) {
get_non_kept_pack_filenames(&existing_packs);
@@ -256,17 +269,17 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
*/
failed = 0;
for_each_string_list_item(item, &names) {
- for (ext = 0; ext < 2; ext++) {
+ for (ext = 0; ext < ARRAY_SIZE(exts); ext++) {
char *fname, *fname_old;
fname = mkpathdup("%s/pack-%s%s", packdir,
- item->string, exts[ext]);
+ item->string, exts[ext].name);
if (!file_exists(fname)) {
free(fname);
continue;
}
fname_old = mkpath("%s/old-%s%s", packdir,
- item->string, exts[ext]);
+ item->string, exts[ext].name);
if (file_exists(fname_old))
if (unlink(fname_old))
failed = 1;
@@ -313,19 +326,23 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
/* Now the ones with the same name are out of the way... */
for_each_string_list_item(item, &names) {
- for (ext = 0; ext < 2; ext++) {
+ for (ext = 0; ext < ARRAY_SIZE(exts); ext++) {
char *fname, *fname_old;
struct stat statbuffer;
+ int exists = 0;
fname = mkpathdup("%s/pack-%s%s",
- packdir, item->string, exts[ext]);
+ packdir, item->string, exts[ext].name);
fname_old = mkpathdup("%s-%s%s",
- packtmp, item->string, exts[ext]);
+ packtmp, item->string, exts[ext].name);
if (!stat(fname_old, &statbuffer)) {
statbuffer.st_mode &= ~(S_IWUSR | S_IWGRP | S_IWOTH);
chmod(fname_old, statbuffer.st_mode);
+ exists = 1;
+ }
+ if (exists || !exts[ext].optional) {
+ if (rename(fname_old, fname))
+ die_errno(_("renaming '%s' failed"), fname_old);
}
- if (rename(fname_old, fname))
- die_errno(_("renaming '%s' failed"), fname_old);
free(fname);
free(fname_old);
}
@@ -333,12 +350,12 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
/* Remove the "old-" files */
for_each_string_list_item(item, &names) {
- for (ext = 0; ext < 2; ext++) {
+ for (ext = 0; ext < ARRAY_SIZE(exts); ext++) {
char *fname;
fname = mkpath("%s/old-%s%s",
packdir,
item->string,
- exts[ext]);
+ exts[ext].name);
if (remove_path(fname))
warning(_("removing '%s' failed"), fname);
}
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 0745e2d053..9f92905379 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -3,6 +3,8 @@
#include "diff.h"
#include "revision.h"
#include "list-objects.h"
+#include "pack.h"
+#include "pack-bitmap.h"
#include "builtin.h"
#include "log-tree.h"
#include "graph.h"
@@ -257,6 +259,18 @@ static int show_bisect_vars(struct rev_list_info *info, int reaches, int all)
return 0;
}
+static int show_object_fast(
+ const unsigned char *sha1,
+ enum object_type type,
+ int exclude,
+ uint32_t name_hash,
+ struct packed_git *found_pack,
+ off_t found_offset)
+{
+ fprintf(stdout, "%s\n", sha1_to_hex(sha1));
+ return 1;
+}
+
int cmd_rev_list(int argc, const char **argv, const char *prefix)
{
struct rev_info revs;
@@ -265,6 +279,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
int bisect_list = 0;
int bisect_show_vars = 0;
int bisect_find_all = 0;
+ int use_bitmap_index = 0;
git_config(git_default_config, NULL);
init_revisions(&revs, prefix);
@@ -306,6 +321,14 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
bisect_show_vars = 1;
continue;
}
+ if (!strcmp(arg, "--use-bitmap-index")) {
+ use_bitmap_index = 1;
+ continue;
+ }
+ if (!strcmp(arg, "--test-bitmap")) {
+ test_bitmap_walk(&revs);
+ return 0;
+ }
usage(rev_list_usage);
}
@@ -333,6 +356,22 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
if (bisect_list)
revs.limited = 1;
+ if (use_bitmap_index) {
+ if (revs.count && !revs.left_right && !revs.cherry_mark) {
+ uint32_t commit_count;
+ if (!prepare_bitmap_walk(&revs)) {
+ count_bitmap_commit_list(&commit_count, NULL, NULL, NULL);
+ printf("%d\n", commit_count);
+ return 0;
+ }
+ } else if (revs.tag_objects && revs.tree_objects && revs.blob_objects) {
+ if (!prepare_bitmap_walk(&revs)) {
+ traverse_bitmap_commit_list(&show_object_fast);
+ return 0;
+ }
+ }
+ }
+
if (prepare_revision_walk(&revs))
die("revision walk setup failed");
if (revs.tree_objects)
diff --git a/cache.h b/cache.h
index 9f15744218..db223e8048 100644
--- a/cache.h
+++ b/cache.h
@@ -809,6 +809,7 @@ extern int hash_sha1_file(const void *buf, unsigned long len, const char *type,
extern int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *return_sha1);
extern int pretend_sha1_file(void *, unsigned long, enum object_type, unsigned char *);
extern int force_object_loose(const unsigned char *sha1, time_t mtime);
+extern int git_open_noatime(const char *name);
extern void *map_sha1_file(const unsigned char *sha1, unsigned long *size);
extern int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
extern int parse_sha1_header(const char *hdr, unsigned long *sizep);
diff --git a/compat/bswap.h b/compat/bswap.h
index 5061214f73..120c6c1d37 100644
--- a/compat/bswap.h
+++ b/compat/bswap.h
@@ -17,7 +17,20 @@ static inline uint32_t default_swab32(uint32_t val)
((val & 0x000000ff) << 24));
}
+static inline uint64_t default_bswap64(uint64_t val)
+{
+ return (((val & (uint64_t)0x00000000000000ffULL) << 56) |
+ ((val & (uint64_t)0x000000000000ff00ULL) << 40) |
+ ((val & (uint64_t)0x0000000000ff0000ULL) << 24) |
+ ((val & (uint64_t)0x00000000ff000000ULL) << 8) |
+ ((val & (uint64_t)0x000000ff00000000ULL) >> 8) |
+ ((val & (uint64_t)0x0000ff0000000000ULL) >> 24) |
+ ((val & (uint64_t)0x00ff000000000000ULL) >> 40) |
+ ((val & (uint64_t)0xff00000000000000ULL) >> 56));
+}
+
#undef bswap32
+#undef bswap64
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
@@ -32,15 +45,42 @@ static inline uint32_t git_bswap32(uint32_t x)
return result;
}
+#define bswap64 git_bswap64
+#if defined(__x86_64__)
+static inline uint64_t git_bswap64(uint64_t x)
+{
+ uint64_t result;
+ if (__builtin_constant_p(x))
+ result = default_bswap64(x);
+ else
+ __asm__("bswap %q0" : "=r" (result) : "0" (x));
+ return result;
+}
+#else
+static inline uint64_t git_bswap64(uint64_t x)
+{
+ union { uint64_t i64; uint32_t i32[2]; } tmp, result;
+ if (__builtin_constant_p(x))
+ result.i64 = default_bswap64(x);
+ else {
+ tmp.i64 = x;
+ result.i32[0] = git_bswap32(tmp.i32[1]);
+ result.i32[1] = git_bswap32(tmp.i32[0]);
+ }
+ return result.i64;
+}
+#endif
+
#elif defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <stdlib.h>
#define bswap32(x) _byteswap_ulong(x)
+#define bswap64(x) _byteswap_uint64(x)
#endif
-#ifdef bswap32
+#if defined(bswap32)
#undef ntohl
#undef htonl
@@ -48,3 +88,73 @@ static inline uint32_t git_bswap32(uint32_t x)
#define htonl(x) bswap32(x)
#endif
+
+#if defined(bswap64)
+
+#undef ntohll
+#undef htonll
+#define ntohll(x) bswap64(x)
+#define htonll(x) bswap64(x)
+
+#else
+
+#undef ntohll
+#undef htonll
+
+#if !defined(__BYTE_ORDER)
+# if defined(BYTE_ORDER) && defined(LITTLE_ENDIAN) && defined(BIG_ENDIAN)
+# define __BYTE_ORDER BYTE_ORDER
+# define __LITTLE_ENDIAN LITTLE_ENDIAN
+# define __BIG_ENDIAN BIG_ENDIAN
+# endif
+#endif
+
+#if !defined(__BYTE_ORDER)
+# error "Cannot determine endianness"
+#endif
+
+#if __BYTE_ORDER == __BIG_ENDIAN
+# define ntohll(n) (n)
+# define htonll(n) (n)
+#else
+# define ntohll(n) default_bswap64(n)
+# define htonll(n) default_bswap64(n)
+#endif
+
+#endif
+
+/*
+ * Performance might be improved if the CPU architecture is OK with
+ * unaligned 32-bit loads and a fast ntohl() is available.
+ * Otherwise fall back to byte loads and shifts which is portable,
+ * and is faster on architectures with memory alignment issues.
+ */
+
+#if defined(__i386__) || defined(__x86_64__) || \
+ defined(_M_IX86) || defined(_M_X64) || \
+ defined(__ppc__) || defined(__ppc64__) || \
+ defined(__powerpc__) || defined(__powerpc64__) || \
+ defined(__s390__) || defined(__s390x__)
+
+#define get_be16(p) ntohs(*(unsigned short *)(p))
+#define get_be32(p) ntohl(*(unsigned int *)(p))
+#define put_be32(p, v) do { *(unsigned int *)(p) = htonl(v); } while (0)
+
+#else
+
+#define get_be16(p) ( \
+ (*((unsigned char *)(p) + 0) << 8) | \
+ (*((unsigned char *)(p) + 1) << 0) )
+#define get_be32(p) ( \
+ (*((unsigned char *)(p) + 0) << 24) | \
+ (*((unsigned char *)(p) + 1) << 16) | \
+ (*((unsigned char *)(p) + 2) << 8) | \
+ (*((unsigned char *)(p) + 3) << 0) )
+#define put_be32(p, v) do { \
+ unsigned int __v = (v); \
+ *((unsigned char *)(p) + 0) = __v >> 24; \
+ *((unsigned char *)(p) + 1) = __v >> 16; \
+ *((unsigned char *)(p) + 2) = __v >> 8; \
+ *((unsigned char *)(p) + 3) = __v >> 0; } while (0)
+
+#endif
diff --git a/ewah/bitmap.c b/ewah/bitmap.c
new file mode 100644
index 0000000000..710e58c1bf
--- /dev/null
+++ b/ewah/bitmap.c
@@ -0,0 +1,221 @@
+/**
+ * Copyright 2013, GitHub, Inc
+ * Copyright 2009-2013, Daniel Lemire, Cliff Moon,
+ * David McIntosh, Robert Becho, Google Inc. and Veronika Zenz
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+#include "git-compat-util.h"
+#include "ewok.h"
+
+#define MASK(x) ((eword_t)1 << (x % BITS_IN_WORD))
+#define BLOCK(x) (x / BITS_IN_WORD)
+
+struct bitmap *bitmap_new(void)
+{
+ struct bitmap *bitmap = ewah_malloc(sizeof(struct bitmap));
+ bitmap->words = ewah_calloc(32, sizeof(eword_t));
+ bitmap->word_alloc = 32;
+ return bitmap;
+}
+
+void bitmap_set(struct bitmap *self, size_t pos)
+{
+ size_t block = BLOCK(pos);
+
+ if (block >= self->word_alloc) {
+ size_t old_size = self->word_alloc;
+ self->word_alloc = block * 2;
+ self->words = ewah_realloc(self->words,
+ self->word_alloc * sizeof(eword_t));
+
+ memset(self->words + old_size, 0x0,
+ (self->word_alloc - old_size) * sizeof(eword_t));
+ }
+
+ self->words[block] |= MASK(pos);
+}
+
+void bitmap_clear(struct bitmap *self, size_t pos)
+{
+ size_t block = BLOCK(pos);
+
+ if (block < self->word_alloc)
+ self->words[block] &= ~MASK(pos);
+}
+
+int bitmap_get(struct bitmap *self, size_t pos)
+{
+ size_t block = BLOCK(pos);
+ return block < self->word_alloc &&
+ (self->words[block] & MASK(pos)) != 0;
+}
+
+struct ewah_bitmap *bitmap_to_ewah(struct bitmap *bitmap)
+{
+ struct ewah_bitmap *ewah = ewah_new();
+ size_t i, running_empty_words = 0;
+ eword_t last_word = 0;
+
+ for (i = 0; i < bitmap->word_alloc; ++i) {
+ if (bitmap->words[i] == 0) {
+ running_empty_words++;
+ continue;
+ }
+
+ if (last_word != 0)
+ ewah_add(ewah, last_word);
+
+ if (running_empty_words > 0) {
+ ewah_add_empty_words(ewah, 0, running_empty_words);
+ running_empty_words = 0;
+ }
+
+ last_word = bitmap->words[i];
+ }
+
+ ewah_add(ewah, last_word);
+ return ewah;
+}
+
+struct bitmap *ewah_to_bitmap(struct ewah_bitmap *ewah)
+{
+ struct bitmap *bitmap = bitmap_new();
+ struct ewah_iterator it;
+ eword_t blowup;
+ size_t i = 0;
+
+ ewah_iterator_init(&it, ewah);
+
+ while (ewah_iterator_next(&blowup, &it)) {
+ if (i >= bitmap->word_alloc) {
+ bitmap->word_alloc *= 1.5;
+ bitmap->words = ewah_realloc(
+ bitmap->words, bitmap->word_alloc * sizeof(eword_t));
+ }
+
+ bitmap->words[i++] = blowup;
+ }
+
+ bitmap->word_alloc = i;
+ return bitmap;
+}
+
+void bitmap_and_not(struct bitmap *self, struct bitmap *other)
+{
+ const size_t count = (self->word_alloc < other->word_alloc) ?
+ self->word_alloc : other->word_alloc;
+
+ size_t i;
+
+ for (i = 0; i < count; ++i)
+ self->words[i] &= ~other->words[i];
+}
+
+void bitmap_or_ewah(struct bitmap *self, struct ewah_bitmap *other)
+{
+ size_t original_size = self->word_alloc;
+ size_t other_final = (other->bit_size / BITS_IN_WORD) + 1;
+ size_t i = 0;
+ struct ewah_iterator it;
+ eword_t word;
+
+ if (self->word_alloc < other_final) {
+ self->word_alloc = other_final;
+ self->words = ewah_realloc(self->words,
+ self->word_alloc * sizeof(eword_t));
+ memset(self->words + original_size, 0x0,
+ (self->word_alloc - original_size) * sizeof(eword_t));
+ }
+
+ ewah_iterator_init(&it, other);
+
+ while (ewah_iterator_next(&word, &it))
+ self->words[i++] |= word;
+}
+
+void bitmap_each_bit(struct bitmap *self, ewah_callback callback, void *data)
+{
+ size_t pos = 0, i;
+
+ for (i = 0; i < self->word_alloc; ++i) {
+ eword_t word = self->words[i];
+ uint32_t offset;
+
+ if (word == (eword_t)~0) {
+ for (offset = 0; offset < BITS_IN_WORD; ++offset)
+ callback(pos++, data);
+ } else {
+ for (offset = 0; offset < BITS_IN_WORD; ++offset) {
+ if ((word >> offset) == 0)
+ break;
+
+ offset += ewah_bit_ctz64(word >> offset);
+ callback(pos + offset, data);
+ }
+ pos += BITS_IN_WORD;
+ }
+ }
+}
+
+size_t bitmap_popcount(struct bitmap *self)
+{
+ size_t i, count = 0;
+
+ for (i = 0; i < self->word_alloc; ++i)
+ count += ewah_bit_popcount64(self->words[i]);
+
+ return count;
+}
+
+int bitmap_equals(struct bitmap *self, struct bitmap *other)
+{
+ struct bitmap *big, *small;
+ size_t i;
+
+ if (self->word_alloc < other->word_alloc) {
+ small = self;
+ big = other;
+ } else {
+ small = other;
+ big = self;
+ }
+
+ for (i = 0; i < small->word_alloc; ++i) {
+ if (small->words[i] != big->words[i])
+ return 0;
+ }
+
+ for (; i < big->word_alloc; ++i) {
+ if (big->words[i] != 0)
+ return 0;
+ }
+
+ return 1;
+}
+
+void bitmap_reset(struct bitmap *bitmap)
+{
+ memset(bitmap->words, 0x0, bitmap->word_alloc * sizeof(eword_t));
+}
+
+void bitmap_free(struct bitmap *bitmap)
+{
+ if (bitmap == NULL)
+ return;
+
+ free(bitmap->words);
+ free(bitmap);
+}
diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c
new file mode 100644
index 0000000000..9ced2dadfe
--- /dev/null
+++ b/ewah/ewah_bitmap.c
@@ -0,0 +1,714 @@
+/**
+ * Copyright 2013, GitHub, Inc
+ * Copyright 2009-2013, Daniel Lemire, Cliff Moon,
+ * David McIntosh, Robert Becho, Google Inc. and Veronika Zenz
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+#include "git-compat-util.h"
+#include "ewok.h"
+#include "ewok_rlw.h"
+
+static inline size_t min_size(size_t a, size_t b)
+{
+ return a < b ? a : b;
+}
+
+static inline size_t max_size(size_t a, size_t b)
+{
+ return a > b ? a : b;
+}
+
+static inline void buffer_grow(struct ewah_bitmap *self, size_t new_size)
+{
+ size_t rlw_offset = (uint8_t *)self->rlw - (uint8_t *)self->buffer;
+
+ if (self->alloc_size >= new_size)
+ return;
+
+ self->alloc_size = new_size;
+ self->buffer = ewah_realloc(self->buffer,
+ self->alloc_size * sizeof(eword_t));
+ self->rlw = self->buffer + (rlw_offset / sizeof(size_t));
+}
+
+static inline void buffer_push(struct ewah_bitmap *self, eword_t value)
+{
+ if (self->buffer_size + 1 >= self->alloc_size)
+ buffer_grow(self, self->buffer_size * 3 / 2);
+
+ self->buffer[self->buffer_size++] = value;
+}
+
+static void buffer_push_rlw(struct ewah_bitmap *self, eword_t value)
+{
+ buffer_push(self, value);
+ self->rlw = self->buffer + self->buffer_size - 1;
+}
+
+static size_t add_empty_words(struct ewah_bitmap *self, int v, size_t number)
+{
+ size_t added = 0;
+ eword_t runlen, can_add;
+
+ if (rlw_get_run_bit(self->rlw) != v && rlw_size(self->rlw) == 0) {
+ rlw_set_run_bit(self->rlw, v);
+ } else if (rlw_get_literal_words(self->rlw) != 0 ||
+ rlw_get_run_bit(self->rlw) != v) {
+ buffer_push_rlw(self, 0);
+ if (v) rlw_set_run_bit(self->rlw, v);
+ added++;
+ }
+
+ runlen = rlw_get_running_len(self->rlw);
+ can_add = min_size(number, RLW_LARGEST_RUNNING_COUNT - runlen);
+
+ rlw_set_running_len(self->rlw, runlen + can_add);
+ number -= can_add;
+
+ while (number >= RLW_LARGEST_RUNNING_COUNT) {
+ buffer_push_rlw(self, 0);
+ added++;
+ if (v) rlw_set_run_bit(self->rlw, v);
+ rlw_set_running_len(self->rlw, RLW_LARGEST_RUNNING_COUNT);
+ number -= RLW_LARGEST_RUNNING_COUNT;
+ }
+
+ if (number > 0) {
+ buffer_push_rlw(self, 0);
+ added++;
+
+ if (v) rlw_set_run_bit(self->rlw, v);
+ rlw_set_running_len(self->rlw, number);
+ }
+
+ return added;
+}
+
+size_t ewah_add_empty_words(struct ewah_bitmap *self, int v, size_t number)
+{
+ if (number == 0)
+ return 0;
+
+ self->bit_size += number * BITS_IN_WORD;
+ return add_empty_words(self, v, number);
+}
+
+static size_t add_literal(struct ewah_bitmap *self, eword_t new_data)
+{
+ eword_t current_num = rlw_get_literal_words(self->rlw);
+
+ if (current_num >= RLW_LARGEST_LITERAL_COUNT) {
+ buffer_push_rlw(self, 0);
+
+ rlw_set_literal_words(self->rlw, 1);
+ buffer_push(self, new_data);
+ return 2;
+ }
+
+ rlw_set_literal_words(self->rlw, current_num + 1);
+
+ /* sanity check */
+ assert(rlw_get_literal_words(self->rlw) == current_num + 1);
+
+ buffer_push(self, new_data);
+ return 1;
+}
+
+void ewah_add_dirty_words(
+ struct ewah_bitmap *self, const eword_t *buffer,
+ size_t number, int negate)
+{
+ size_t literals, can_add;
+
+ while (1) {
+ literals = rlw_get_literal_words(self->rlw);
+ can_add = min_size(number, RLW_LARGEST_LITERAL_COUNT - literals);
+
+ rlw_set_literal_words(self->rlw, literals + can_add);
+
+ if (self->buffer_size + can_add >= self->alloc_size)
+ buffer_grow(self, (self->buffer_size + can_add) * 3 / 2);
+
+ if (negate) {
+ size_t i;
+ for (i = 0; i < can_add; ++i)
+ self->buffer[self->buffer_size++] = ~buffer[i];
+ } else {
+ memcpy(self->buffer + self->buffer_size,
+ buffer, can_add * sizeof(eword_t));
+ self->buffer_size += can_add;
+ }
+
+ self->bit_size += can_add * BITS_IN_WORD;
+
+ if (number - can_add == 0)
+ break;
+
+ buffer_push_rlw(self, 0);
+ buffer += can_add;
+ number -= can_add;
+ }
+}
+
+static size_t add_empty_word(struct ewah_bitmap *self, int v)
+{
+ int no_literal = (rlw_get_literal_words(self->rlw) == 0);
+ eword_t run_len = rlw_get_running_len(self->rlw);
+
+ if (no_literal && run_len == 0) {
+ rlw_set_run_bit(self->rlw, v);
+ assert(rlw_get_run_bit(self->rlw) == v);
+ }
+
+ if (no_literal && rlw_get_run_bit(self->rlw) == v &&
+ run_len < RLW_LARGEST_RUNNING_COUNT) {
+ rlw_set_running_len(self->rlw, run_len + 1);
+ assert(rlw_get_running_len(self->rlw) == run_len + 1);
+ return 0;
+ } else {
+ buffer_push_rlw(self, 0);
+
+ assert(rlw_get_running_len(self->rlw) == 0);
+ assert(rlw_get_run_bit(self->rlw) == 0);
+ assert(rlw_get_literal_words(self->rlw) == 0);
+
+ rlw_set_run_bit(self->rlw, v);
+ assert(rlw_get_run_bit(self->rlw) == v);
+
+ rlw_set_running_len(self->rlw, 1);
+ assert(rlw_get_running_len(self->rlw) == 1);
+ assert(rlw_get_literal_words(self->rlw) == 0);
+ return 1;
+ }
+}
+
+size_t ewah_add(struct ewah_bitmap *self, eword_t word)
+{
+ self->bit_size += BITS_IN_WORD;
+
+ if (word == 0)
+ return add_empty_word(self, 0);
+
+ if (word == (eword_t)(~0))
+ return add_empty_word(self, 1);
+
+ return add_literal(self, word);
+}
+
+void ewah_set(struct ewah_bitmap *self, size_t i)
+{
+ const size_t dist =
+ (i + BITS_IN_WORD) / BITS_IN_WORD -
+ (self->bit_size + BITS_IN_WORD - 1) / BITS_IN_WORD;
+
+ assert(i >= self->bit_size);
+
+ self->bit_size = i + 1;
+
+ if (dist > 0) {
+ if (dist > 1)
+ add_empty_words(self, 0, dist - 1);
+
+ add_literal(self, (eword_t)1 << (i % BITS_IN_WORD));
+ return;
+ }
+
+ if (rlw_get_literal_words(self->rlw) == 0) {
+ rlw_set_running_len(self->rlw,
+ rlw_get_running_len(self->rlw) - 1);
+ add_literal(self, (eword_t)1 << (i % BITS_IN_WORD));
+ return;
+ }
+
+ self->buffer[self->buffer_size - 1] |=
+ ((eword_t)1 << (i % BITS_IN_WORD));
+
+ /* check if we just completed a stream of 1s */
+ if (self->buffer[self->buffer_size - 1] == (eword_t)(~0)) {
+ self->buffer[--self->buffer_size] = 0;
+ rlw_set_literal_words(self->rlw,
+ rlw_get_literal_words(self->rlw) - 1);
+ add_empty_word(self, 1);
+ }
+}
+
+void ewah_each_bit(struct ewah_bitmap *self, void (*callback)(size_t, void*), void *payload)
+{
+ size_t pos = 0;
+ size_t pointer = 0;
+ size_t k;
+
+ while (pointer < self->buffer_size) {
+ eword_t *word = &self->buffer[pointer];
+
+ if (rlw_get_run_bit(word)) {
+ size_t len = rlw_get_running_len(word) * BITS_IN_WORD;
+ for (k = 0; k < len; ++k, ++pos)
+ callback(pos, payload);
+ } else {
+ pos += rlw_get_running_len(word) * BITS_IN_WORD;
+ }
+
+ ++pointer;
+
+ for (k = 0; k < rlw_get_literal_words(word); ++k) {
+ int c;
+
+ /* todo: zero count optimization */
+ for (c = 0; c < BITS_IN_WORD; ++c, ++pos) {
+ if ((self->buffer[pointer] & ((eword_t)1 << c)) != 0)
+ callback(pos, payload);
+ }
+
+ ++pointer;
+ }
+ }
+}
+
+struct ewah_bitmap *ewah_new(void)
+{
+ struct ewah_bitmap *self;
+
+ self = ewah_malloc(sizeof(struct ewah_bitmap));
+ if (self == NULL)
+ return NULL;
+
+ self->buffer = ewah_malloc(32 * sizeof(eword_t));
+ self->alloc_size = 32;
+
+ ewah_clear(self);
+ return self;
+}
+
+void ewah_clear(struct ewah_bitmap *self)
+{
+ self->buffer_size = 1;
+ self->buffer[0] = 0;
+ self->bit_size = 0;
+ self->rlw = self->buffer;
+}
+
+void ewah_free(struct ewah_bitmap *self)
+{
+ if (!self)
+ return;
+
+ if (self->alloc_size)
+ free(self->buffer);
+
+ free(self);
+}
+
+static void read_new_rlw(struct ewah_iterator *it)
+{
+ const eword_t *word = NULL;
+
+ it->literals = 0;
+ it->compressed = 0;
+
+ while (1) {
+ word = &it->buffer[it->pointer];
+
+ it->rl = rlw_get_running_len(word);
+ it->lw = rlw_get_literal_words(word);
+ it->b = rlw_get_run_bit(word);
+
+ if (it->rl || it->lw)
+ return;
+
+ if (it->pointer < it->buffer_size - 1) {
+ it->pointer++;
+ } else {
+ it->pointer = it->buffer_size;
+ return;
+ }
+ }
+}
+
+int ewah_iterator_next(eword_t *next, struct ewah_iterator *it)
+{
+ if (it->pointer >= it->buffer_size)
+ return 0;
+
+ if (it->compressed < it->rl) {
+ it->compressed++;
+ *next = it->b ? (eword_t)(~0) : 0;
+ } else {
+ assert(it->literals < it->lw);
+
+ it->literals++;
+ it->pointer++;
+
+ assert(it->pointer < it->buffer_size);
+
+ *next = it->buffer[it->pointer];
+ }
+
+ if (it->compressed == it->rl && it->literals == it->lw) {
+ if (++it->pointer < it->buffer_size)
+ read_new_rlw(it);
+ }
+
+ return 1;
+}
+
+void ewah_iterator_init(struct ewah_iterator *it, struct ewah_bitmap *parent)
+{
+ it->buffer = parent->buffer;
+ it->buffer_size = parent->buffer_size;
+ it->pointer = 0;
+
+ it->lw = 0;
+ it->rl = 0;
+ it->compressed = 0;
+ it->literals = 0;
+ it->b = 0;
+
+ if (it->pointer < it->buffer_size)
+ read_new_rlw(it);
+}
+
+void ewah_not(struct ewah_bitmap *self)
+{
+ size_t pointer = 0;
+
+ while (pointer < self->buffer_size) {
+ eword_t *word = &self->buffer[pointer];
+ size_t literals, k;
+
+ rlw_xor_run_bit(word);
+ ++pointer;
+
+ literals = rlw_get_literal_words(word);
+ for (k = 0; k < literals; ++k) {
+ self->buffer[pointer] = ~self->buffer[pointer];
+ ++pointer;
+ }
+ }
+}
+
+void ewah_xor(
+ struct ewah_bitmap *ewah_i,
+ struct ewah_bitmap *ewah_j,
+ struct ewah_bitmap *out)
+{
+ struct rlw_iterator rlw_i;
+ struct rlw_iterator rlw_j;
+ size_t literals;
+
+ rlwit_init(&rlw_i, ewah_i);
+ rlwit_init(&rlw_j, ewah_j);
+
+ while (rlwit_word_size(&rlw_i) > 0 && rlwit_word_size(&rlw_j) > 0) {
+ while (rlw_i.rlw.running_len > 0 || rlw_j.rlw.running_len > 0) {
+ struct rlw_iterator *prey, *predator;
+ size_t index;
+ int negate_words;
+
+ if (rlw_i.rlw.running_len < rlw_j.rlw.running_len) {
+ prey = &rlw_i;
+ predator = &rlw_j;
+ } else {
+ prey = &rlw_j;
+ predator = &rlw_i;
+ }
+
+ negate_words = !!predator->rlw.running_bit;
+ index = rlwit_discharge(prey, out,
+ predator->rlw.running_len, negate_words);
+
+ ewah_add_empty_words(out, negate_words,
+ predator->rlw.running_len - index);
+
+ rlwit_discard_first_words(predator,
+ predator->rlw.running_len);
+ }
+
+ literals = min_size(
+ rlw_i.rlw.literal_words,
+ rlw_j.rlw.literal_words);
+
+ if (literals) {
+ size_t k;
+
+ for (k = 0; k < literals; ++k) {
+ ewah_add(out,
+ rlw_i.buffer[rlw_i.literal_word_start + k] ^
+ rlw_j.buffer[rlw_j.literal_word_start + k]
+ );
+ }
+
+ rlwit_discard_first_words(&rlw_i, literals);
+ rlwit_discard_first_words(&rlw_j, literals);
+ }
+ }
+
+ if (rlwit_word_size(&rlw_i) > 0)
+ rlwit_discharge(&rlw_i, out, ~0, 0);
+ else
+ rlwit_discharge(&rlw_j, out, ~0, 0);
+
+ out->bit_size = max_size(ewah_i->bit_size, ewah_j->bit_size);
+}
+
+void ewah_and(
+ struct ewah_bitmap *ewah_i,
+ struct ewah_bitmap *ewah_j,
+ struct ewah_bitmap *out)
+{
+ struct rlw_iterator rlw_i;
+ struct rlw_iterator rlw_j;
+ size_t literals;
+
+ rlwit_init(&rlw_i, ewah_i);
+ rlwit_init(&rlw_j, ewah_j);
+
+ while (rlwit_word_size(&rlw_i) > 0 && rlwit_word_size(&rlw_j) > 0) {
+ while (rlw_i.rlw.running_len > 0 || rlw_j.rlw.running_len > 0) {
+ struct rlw_iterator *prey, *predator;
+
+ if (rlw_i.rlw.running_len < rlw_j.rlw.running_len) {
+ prey = &rlw_i;
+ predator = &rlw_j;
+ } else {
+ prey = &rlw_j;
+ predator = &rlw_i;
+ }
+
+ if (predator->rlw.running_bit == 0) {
+ ewah_add_empty_words(out, 0,
+ predator->rlw.running_len);
+ rlwit_discard_first_words(prey,
+ predator->rlw.running_len);
+ rlwit_discard_first_words(predator,
+ predator->rlw.running_len);
+ } else {
+ size_t index = rlwit_discharge(prey, out,
+ predator->rlw.running_len, 0);
+ ewah_add_empty_words(out, 0,
+ predator->rlw.running_len - index);
+ rlwit_discard_first_words(predator,
+ predator->rlw.running_len);
+ }
+ }
+
+ literals = min_size(
+ rlw_i.rlw.literal_words,
+ rlw_j.rlw.literal_words);
+
+ if (literals) {
+ size_t k;
+
+ for (k = 0; k < literals; ++k) {
+ ewah_add(out,
+ rlw_i.buffer[rlw_i.literal_word_start + k] &
+ rlw_j.buffer[rlw_j.literal_word_start + k]
+ );
+ }
+
+ rlwit_discard_first_words(&rlw_i, literals);
+ rlwit_discard_first_words(&rlw_j, literals);
+ }
+ }
+
+ if (rlwit_word_size(&rlw_i) > 0)
+ rlwit_discharge_empty(&rlw_i, out);
+ else
+ rlwit_discharge_empty(&rlw_j, out);
+
+ out->bit_size = max_size(ewah_i->bit_size, ewah_j->bit_size);
+}
+
+void ewah_and_not(
+ struct ewah_bitmap *ewah_i,
+ struct ewah_bitmap *ewah_j,
+ struct ewah_bitmap *out)
+{
+ struct rlw_iterator rlw_i;
+ struct rlw_iterator rlw_j;
+ size_t literals;
+
+ rlwit_init(&rlw_i, ewah_i);
+ rlwit_init(&rlw_j, ewah_j);
+
+ while (rlwit_word_size(&rlw_i) > 0 && rlwit_word_size(&rlw_j) > 0) {
+ while (rlw_i.rlw.running_len > 0 || rlw_j.rlw.running_len > 0) {
+ struct rlw_iterator *prey, *predator;
+
+ if (rlw_i.rlw.running_len < rlw_j.rlw.running_len) {
+ prey = &rlw_i;
+ predator = &rlw_j;
+ } else {
+ prey = &rlw_j;
+ predator = &rlw_i;
+ }
+
+ if ((predator->rlw.running_bit && prey == &rlw_i) ||
+ (!predator->rlw.running_bit && prey != &rlw_i)) {
+ ewah_add_empty_words(out, 0,
+ predator->rlw.running_len);
+ rlwit_discard_first_words(prey,
+ predator->rlw.running_len);
+ rlwit_discard_first_words(predator,
+ predator->rlw.running_len);
+ } else {
+ size_t index;
+ int negate_words;
+
+ negate_words = (&rlw_i != prey);
+ index = rlwit_discharge(prey, out,
+ predator->rlw.running_len, negate_words);
+ ewah_add_empty_words(out, negate_words,
+ predator->rlw.running_len - index);
+ rlwit_discard_first_words(predator,
+ predator->rlw.running_len);
+ }
+ }
+
+ literals = min_size(
+ rlw_i.rlw.literal_words,
+ rlw_j.rlw.literal_words);
+
+ if (literals) {
+ size_t k;
+
+ for (k = 0; k < literals; ++k) {
+ ewah_add(out,
+ rlw_i.buffer[rlw_i.literal_word_start + k] &
+ ~(rlw_j.buffer[rlw_j.literal_word_start + k])
+ );
+ }
+
+ rlwit_discard_first_words(&rlw_i, literals);
+ rlwit_discard_first_words(&rlw_j, literals);
+ }
+ }
+
+ if (rlwit_word_size(&rlw_i) > 0)
+ rlwit_discharge(&rlw_i, out, ~0, 0);
+ else
+ rlwit_discharge_empty(&rlw_j, out);
+
+ out->bit_size = max_size(ewah_i->bit_size, ewah_j->bit_size);
+}
+
+void ewah_or(
+ struct ewah_bitmap *ewah_i,
+ struct ewah_bitmap *ewah_j,
+ struct ewah_bitmap *out)
+{
+ struct rlw_iterator rlw_i;
+ struct rlw_iterator rlw_j;
+ size_t literals;
+
+ rlwit_init(&rlw_i, ewah_i);
+ rlwit_init(&rlw_j, ewah_j);
+
+ while (rlwit_word_size(&rlw_i) > 0 && rlwit_word_size(&rlw_j) > 0) {
+ while (rlw_i.rlw.running_len > 0 || rlw_j.rlw.running_len > 0) {
+ struct rlw_iterator *prey, *predator;
+
+ if (rlw_i.rlw.running_len < rlw_j.rlw.running_len) {
+ prey = &rlw_i;
+ predator = &rlw_j;
+ } else {
+ prey = &rlw_j;
+ predator = &rlw_i;
+ }
+
+ if (predator->rlw.running_bit) {
+ ewah_add_empty_words(out, 0,
+ predator->rlw.running_len);
+ rlwit_discard_first_words(prey,
+ predator->rlw.running_len);
+ rlwit_discard_first_words(predator,
+ predator->rlw.running_len);
+ } else {
+ size_t index = rlwit_discharge(prey, out,
+ predator->rlw.running_len, 0);
+ ewah_add_empty_words(out, 0,
+ predator->rlw.running_len - index);
+ rlwit_discard_first_words(predator,
+ predator->rlw.running_len);
+ }
+ }
+
+ literals = min_size(
+ rlw_i.rlw.literal_words,
+ rlw_j.rlw.literal_words);
+
+ if (literals) {
+ size_t k;
+
+ for (k = 0; k < literals; ++k) {
+ ewah_add(out,
+ rlw_i.buffer[rlw_i.literal_word_start + k] |
+ rlw_j.buffer[rlw_j.literal_word_start + k]
+ );
+ }
+
+ rlwit_discard_first_words(&rlw_i, literals);
+ rlwit_discard_first_words(&rlw_j, literals);
+ }
+ }
+
+ if (rlwit_word_size(&rlw_i) > 0)
+ rlwit_discharge(&rlw_i, out, ~0, 0);
+ else
+ rlwit_discharge(&rlw_j, out, ~0, 0);
+
+ out->bit_size = max_size(ewah_i->bit_size, ewah_j->bit_size);
+}
+
+
+#define BITMAP_POOL_MAX 16
+static struct ewah_bitmap *bitmap_pool[BITMAP_POOL_MAX];
+static size_t bitmap_pool_size;
+
+struct ewah_bitmap *ewah_pool_new(void)
+{
+ if (bitmap_pool_size)
+ return bitmap_pool[--bitmap_pool_size];
+
+ return ewah_new();
+}
+
+void ewah_pool_free(struct ewah_bitmap *self)
+{
+ if (self == NULL)
+ return;
+
+ if (bitmap_pool_size == BITMAP_POOL_MAX ||
+ self->alloc_size == 0) {
+ ewah_free(self);
+ return;
+ }
+
+ ewah_clear(self);
+ bitmap_pool[bitmap_pool_size++] = self;
+}
+
+uint32_t ewah_checksum(struct ewah_bitmap *self)
+{
+ const uint8_t *p = (uint8_t *)self->buffer;
+ uint32_t crc = (uint32_t)self->bit_size;
+ size_t size = self->buffer_size * sizeof(eword_t);
+
+ while (size--)
+ crc = (crc << 5) - crc + (uint32_t)*p++;
+
+ return crc;
+}
diff --git a/ewah/ewah_io.c b/ewah/ewah_io.c
new file mode 100644
index 0000000000..f7f700ef51
--- /dev/null
+++ b/ewah/ewah_io.c
@@ -0,0 +1,204 @@
+/**
+ * Copyright 2013, GitHub, Inc
+ * Copyright 2009-2013, Daniel Lemire, Cliff Moon,
+ * David McIntosh, Robert Becho, Google Inc. and Veronika Zenz
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+#include "git-compat-util.h"
+#include "ewok.h"
+
+int ewah_serialize_native(struct ewah_bitmap *self, int fd)
+{
+ uint32_t write32;
+ size_t to_write = self->buffer_size * 8;
+
+ /* 32 bit -- bit size for the map */
+ write32 = (uint32_t)self->bit_size;
+ if (write(fd, &write32, 4) != 4)
+ return -1;
+
+ /** 32 bit -- number of compressed 64-bit words */
+ write32 = (uint32_t)self->buffer_size;
+ if (write(fd, &write32, 4) != 4)
+ return -1;
+
+ if (write(fd, self->buffer, to_write) != to_write)
+ return -1;
+
+ /** 32 bit -- position for the RLW */
+ write32 = self->rlw - self->buffer;
+ if (write(fd, &write32, 4) != 4)
+ return -1;
+
+ return (3 * 4) + to_write;
+}
+
+int ewah_serialize_to(struct ewah_bitmap *self,
+ int (*write_fun)(void *, const void *, size_t),
+ void *data)
+{
+ size_t i;
+ eword_t dump[2048];
+ const size_t words_per_dump = sizeof(dump) / sizeof(eword_t);
+ uint32_t bitsize, word_count, rlw_pos;
+
+ const eword_t *buffer;
+ size_t words_left;
+
+ /* 32 bit -- bit size for the map */
+ bitsize = htonl((uint32_t)self->bit_size);
+ if (write_fun(data, &bitsize, 4) != 4)
+ return -1;
+
+ /** 32 bit -- number of compressed 64-bit words */
+ word_count = htonl((uint32_t)self->buffer_size);
+ if (write_fun(data, &word_count, 4) != 4)
+ return -1;
+
+ /** 64 bit x N -- compressed words */
+ buffer = self->buffer;
+ words_left = self->buffer_size;
+
+ while (words_left >= words_per_dump) {
+ for (i = 0; i < words_per_dump; ++i, ++buffer)
+ dump[i] = htonll(*buffer);
+
+ if (write_fun(data, dump, sizeof(dump)) != sizeof(dump))
+ return -1;
+
+ words_left -= words_per_dump;
+ }
+
+ if (words_left) {
+ for (i = 0; i < words_left; ++i, ++buffer)
+ dump[i] = htonll(*buffer);
+
+ if (write_fun(data, dump, words_left * 8) != words_left * 8)
+ return -1;
+ }
+
+ /** 32 bit -- position for the RLW */
+ rlw_pos = (uint8_t*)self->rlw - (uint8_t *)self->buffer;
+ rlw_pos = htonl(rlw_pos / sizeof(eword_t));
+
+ if (write_fun(data, &rlw_pos, 4) != 4)
+ return -1;
+
+ return (3 * 4) + (self->buffer_size * 8);
+}
+
+static int write_helper(void *fd, const void *buf, size_t len)
+{
+ return write((intptr_t)fd, buf, len);
+}
+
+int ewah_serialize(struct ewah_bitmap *self, int fd)
+{
+ return ewah_serialize_to(self, write_helper, (void *)(intptr_t)fd);
+}
+
+int ewah_read_mmap(struct ewah_bitmap *self, void *map, size_t len)
+{
+ uint8_t *ptr = map;
+ size_t i;
+
+ self->bit_size = get_be32(ptr);
+ ptr += sizeof(uint32_t);
+
+ self->buffer_size = self->alloc_size = get_be32(ptr);
+ ptr += sizeof(uint32_t);
+
+ self->buffer = ewah_realloc(self->buffer,
+ self->alloc_size * sizeof(eword_t));
+
+ if (!self->buffer)
+ return -1;
+
+ /*
+ * Copy the raw data for the bitmap as a whole chunk;
+ * if we're in a little-endian platform, we'll perform
+ * the endianness conversion in a separate pass to ensure
+ * we're loading 8-byte aligned words.
+ */
+ memcpy(self->buffer, ptr, self->buffer_size * sizeof(uint64_t));
+ ptr += self->buffer_size * sizeof(uint64_t);
+
+ for (i = 0; i < self->buffer_size; ++i)
+ self->buffer[i] = ntohll(self->buffer[i]);
+
+ self->rlw = self->buffer + get_be32(ptr);
+
+ return (3 * 4) + (self->buffer_size * 8);
+}
+
+int ewah_deserialize(struct ewah_bitmap *self, int fd)
+{
+ size_t i;
+ eword_t dump[2048];
+ const size_t words_per_dump = sizeof(dump) / sizeof(eword_t);
+ uint32_t bitsize, word_count, rlw_pos;
+
+ eword_t *buffer = NULL;
+ size_t words_left;
+
+ ewah_clear(self);
+
+ /* 32 bit -- bit size for the map */
+ if (read(fd, &bitsize, 4) != 4)
+ return -1;
+
+ self->bit_size = (size_t)ntohl(bitsize);
+
+ /** 32 bit -- number of compressed 64-bit words */
+ if (read(fd, &word_count, 4) != 4)
+ return -1;
+
+ self->buffer_size = self->alloc_size = (size_t)ntohl(word_count);
+ self->buffer = ewah_realloc(self->buffer,
+ self->alloc_size * sizeof(eword_t));
+
+ if (!self->buffer)
+ return -1;
+
+ /** 64 bit x N -- compressed words */
+ buffer = self->buffer;
+ words_left = self->buffer_size;
+
+ while (words_left >= words_per_dump) {
+ if (read(fd, dump, sizeof(dump)) != sizeof(dump))
+ return -1;
+
+ for (i = 0; i < words_per_dump; ++i, ++buffer)
+ *buffer = ntohll(dump[i]);
+
+ words_left -= words_per_dump;
+ }
+
+ if (words_left) {
+ if (read(fd, dump, words_left * 8) != words_left * 8)
+ return -1;
+
+ for (i = 0; i < words_left; ++i, ++buffer)
+ *buffer = ntohll(dump[i]);
+ }
+
+ /** 32 bit -- position for the RLW */
+ if (read(fd, &rlw_pos, 4) != 4)
+ return -1;
+
+ self->rlw = self->buffer + ntohl(rlw_pos);
+ return 0;
+}
diff --git a/ewah/ewah_rlw.c b/ewah/ewah_rlw.c
new file mode 100644
index 0000000000..c723f1aefd
--- /dev/null
+++ b/ewah/ewah_rlw.c
@@ -0,0 +1,115 @@
+/**
+ * Copyright 2013, GitHub, Inc
+ * Copyright 2009-2013, Daniel Lemire, Cliff Moon,
+ * David McIntosh, Robert Becho, Google Inc. and Veronika Zenz
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+#include "git-compat-util.h"
+#include "ewok.h"
+#include "ewok_rlw.h"
+
+static inline int next_word(struct rlw_iterator *it)
+{
+ if (it->pointer >= it->size)
+ return 0;
+
+ it->rlw.word = &it->buffer[it->pointer];
+ it->pointer += rlw_get_literal_words(it->rlw.word) + 1;
+
+ it->rlw.literal_words = rlw_get_literal_words(it->rlw.word);
+ it->rlw.running_len = rlw_get_running_len(it->rlw.word);
+ it->rlw.running_bit = rlw_get_run_bit(it->rlw.word);
+ it->rlw.literal_word_offset = 0;
+
+ return 1;
+}
+
+void rlwit_init(struct rlw_iterator *it, struct ewah_bitmap *from_ewah)
+{
+ it->buffer = from_ewah->buffer;
+ it->size = from_ewah->buffer_size;
+ it->pointer = 0;
+
+ next_word(it);
+
+ it->literal_word_start = rlwit_literal_words(it) +
+ it->rlw.literal_word_offset;
+}
+
+void rlwit_discard_first_words(struct rlw_iterator *it, size_t x)
+{
+ while (x > 0) {
+ size_t discard;
+
+ if (it->rlw.running_len > x) {
+ it->rlw.running_len -= x;
+ return;
+ }
+
+ x -= it->rlw.running_len;
+ it->rlw.running_len = 0;
+
+ discard = (x > it->rlw.literal_words) ? it->rlw.literal_words : x;
+
+ it->literal_word_start += discard;
+ it->rlw.literal_words -= discard;
+ x -= discard;
+
+ if (x > 0 || rlwit_word_size(it) == 0) {
+ if (!next_word(it))
+ break;
+
+ it->literal_word_start =
+ rlwit_literal_words(it) + it->rlw.literal_word_offset;
+ }
+ }
+}
+
+size_t rlwit_discharge(
+ struct rlw_iterator *it, struct ewah_bitmap *out, size_t max, int negate)
+{
+ size_t index = 0;
+
+ while (index < max && rlwit_word_size(it) > 0) {
+ size_t pd, pl = it->rlw.running_len;
+
+ if (index + pl > max)
+ pl = max - index;
+
+ ewah_add_empty_words(out, it->rlw.running_bit ^ negate, pl);
+ index += pl;
+
+ pd = it->rlw.literal_words;
+ if (pd + index > max)
+ pd = max - index;
+
+ ewah_add_dirty_words(out,
+ it->buffer + it->literal_word_start, pd, negate);
+
+ rlwit_discard_first_words(it, pd + pl);
+ index += pd;
+ }
+
+ return index;
+}
+
+void rlwit_discharge_empty(struct rlw_iterator *it, struct ewah_bitmap *out)
+{
+ while (rlwit_word_size(it) > 0) {
+ ewah_add_empty_words(out, 0, rlwit_word_size(it));
+ rlwit_discard_first_words(it, rlwit_word_size(it));
+ }
+}
diff --git a/ewah/ewok.h b/ewah/ewok.h
new file mode 100644
index 0000000000..43adeb5c68
--- /dev/null
+++ b/ewah/ewok.h
@@ -0,0 +1,233 @@
+/**
+ * Copyright 2013, GitHub, Inc
+ * Copyright 2009-2013, Daniel Lemire, Cliff Moon,
+ * David McIntosh, Robert Becho, Google Inc. and Veronika Zenz
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+#ifndef __EWOK_BITMAP_H__
+#define __EWOK_BITMAP_H__
+
+#ifndef ewah_malloc
+# define ewah_malloc xmalloc
+#endif
+#ifndef ewah_realloc
+# define ewah_realloc xrealloc
+#endif
+#ifndef ewah_calloc
+# define ewah_calloc xcalloc
+#endif
+
+typedef uint64_t eword_t;
+#define BITS_IN_WORD (sizeof(eword_t) * 8)
+
+/**
+ * Do not use __builtin_popcountll. The GCC implementation
+ * is notoriously slow on all platforms.
+ *
+ * See: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041
+ */
+static inline uint32_t ewah_bit_popcount64(uint64_t x)
+{
+ x = (x & 0x5555555555555555ULL) + ((x >> 1) & 0x5555555555555555ULL);
+ x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
+ x = (x & 0x0F0F0F0F0F0F0F0FULL) + ((x >> 4) & 0x0F0F0F0F0F0F0F0FULL);
+ return (x * 0x0101010101010101ULL) >> 56;
+}
+
+#ifdef __GNUC__
+#define ewah_bit_ctz64(x) __builtin_ctzll(x)
+#else
+static inline int ewah_bit_ctz64(uint64_t x)
+{
+ int n = 0;
+ if ((x & 0xffffffff) == 0) { x >>= 32; n += 32; }
+ if ((x & 0xffff) == 0) { x >>= 16; n += 16; }
+ if ((x & 0xff) == 0) { x >>= 8; n += 8; }
+ if ((x & 0xf) == 0) { x >>= 4; n += 4; }
+ if ((x & 0x3) == 0) { x >>= 2; n += 2; }
+ if ((x & 0x1) == 0) { x >>= 1; n += 1; }
+ return n + !x;
+}
+#endif
+
+struct ewah_bitmap {
+ eword_t *buffer;
+ size_t buffer_size;
+ size_t alloc_size;
+ size_t bit_size;
+ eword_t *rlw;
+};
+
+typedef void (*ewah_callback)(size_t pos, void *);
+
+struct ewah_bitmap *ewah_pool_new(void);
+void ewah_pool_free(struct ewah_bitmap *self);
+
+/**
+ * Allocate a new EWAH Compressed bitmap
+ */
+struct ewah_bitmap *ewah_new(void);
+
+/**
+ * Clear all the bits in the bitmap. Does not free or resize
+ * memory.
+ */
+void ewah_clear(struct ewah_bitmap *self);
+
+/**
+ * Free all the memory of the bitmap
+ */
+void ewah_free(struct ewah_bitmap *self);
+
+int ewah_serialize_to(struct ewah_bitmap *self,
+ int (*write_fun)(void *out, const void *buf, size_t len),
+ void *out);
+int ewah_serialize(struct ewah_bitmap *self, int fd);
+int ewah_serialize_native(struct ewah_bitmap *self, int fd);
+
+int ewah_deserialize(struct ewah_bitmap *self, int fd);
+int ewah_read_mmap(struct ewah_bitmap *self, void *map, size_t len);
+int ewah_read_mmap_native(struct ewah_bitmap *self, void *map, size_t len);
+
+uint32_t ewah_checksum(struct ewah_bitmap *self);
+
+/**
+ * Logical not (bitwise negation) in-place on the bitmap
+ *
+ * This operation is linear time based on the size of the bitmap.
+ */
+void ewah_not(struct ewah_bitmap *self);
+
+/**
+ * Call the given callback with the position of every single bit
+ * that has been set on the bitmap.
+ *
+ * This is an efficient operation that does not fully decompress
+ * the bitmap.
+ */
+void ewah_each_bit(struct ewah_bitmap *self, ewah_callback callback, void *payload);
+
+/**
+ * Set a given bit on the bitmap.
+ *
+ * The bit at position `pos` will be set to true. Because of the
+ * way that the bitmap is compressed, a set bit cannot be unset
+ * later on.
+ *
+ * Furthermore, since the bitmap uses streaming compression, bits
+ * can only set incrementally.
+ *
+ * E.g.
+ * ewah_set(bitmap, 1); // ok
+ * ewah_set(bitmap, 76); // ok
+ * ewah_set(bitmap, 77); // ok
+ * ewah_set(bitmap, 8712800127); // ok
+ * ewah_set(bitmap, 25); // failed, assert raised
+ */
+void ewah_set(struct ewah_bitmap *self, size_t i);
+
+struct ewah_iterator {
+ const eword_t *buffer;
+ size_t buffer_size;
+
+ size_t pointer;
+ eword_t compressed, literals;
+ eword_t rl, lw;
+ int b;
+};
+
+/**
+ * Initialize a new iterator to run through the bitmap in uncompressed form.
+ *
+ * The iterator can be stack allocated. The underlying bitmap must not be freed
+ * before the iteration is over.
+ *
+ * E.g.
+ *
+ * struct ewah_bitmap *bitmap = ewah_new();
+ * struct ewah_iterator it;
+ *
+ * ewah_iterator_init(&it, bitmap);
+ */
+void ewah_iterator_init(struct ewah_iterator *it, struct ewah_bitmap *parent);
+
+/**
+ * Yield every single word in the bitmap in uncompressed form. This is:
+ * yield single words (32-64 bits) where each bit represents an actual
+ * bit from the bitmap.
+ *
+ * Return: true if a word was yield, false if there are no words left
+ */
+int ewah_iterator_next(eword_t *next, struct ewah_iterator *it);
+
+void ewah_or(
+ struct ewah_bitmap *ewah_i,
+ struct ewah_bitmap *ewah_j,
+ struct ewah_bitmap *out);
+
+void ewah_and_not(
+ struct ewah_bitmap *ewah_i,
+ struct ewah_bitmap *ewah_j,
+ struct ewah_bitmap *out);
+
+void ewah_xor(
+ struct ewah_bitmap *ewah_i,
+ struct ewah_bitmap *ewah_j,
+ struct ewah_bitmap *out);
+
+void ewah_and(
+ struct ewah_bitmap *ewah_i,
+ struct ewah_bitmap *ewah_j,
+ struct ewah_bitmap *out);
+
+/**
+ * Direct word access
+ */
+size_t ewah_add_empty_words(struct ewah_bitmap *self, int v, size_t number);
+void ewah_add_dirty_words(
+ struct ewah_bitmap *self, const eword_t *buffer, size_t number, int negate);
+size_t ewah_add(struct ewah_bitmap *self, eword_t word);
+
+
+/**
+ * Uncompressed, old-school bitmap that can be efficiently compressed
+ * into an `ewah_bitmap`.
+ */
+struct bitmap {
+ eword_t *words;
+ size_t word_alloc;
+};
+
+struct bitmap *bitmap_new(void);
+void bitmap_set(struct bitmap *self, size_t pos);
+void bitmap_clear(struct bitmap *self, size_t pos);
+int bitmap_get(struct bitmap *self, size_t pos);
+void bitmap_reset(struct bitmap *self);
+void bitmap_free(struct bitmap *self);
+int bitmap_equals(struct bitmap *self, struct bitmap *other);
+int bitmap_is_subset(struct bitmap *self, struct bitmap *super);
+
+struct ewah_bitmap * bitmap_to_ewah(struct bitmap *bitmap);
+struct bitmap *ewah_to_bitmap(struct ewah_bitmap *ewah);
+
+void bitmap_and_not(struct bitmap *self, struct bitmap *other);
+void bitmap_or_ewah(struct bitmap *self, struct ewah_bitmap *other);
+void bitmap_or(struct bitmap *self, const struct bitmap *other);
+
+void bitmap_each_bit(struct bitmap *self, ewah_callback callback, void *data);
+size_t bitmap_popcount(struct bitmap *self);
+
+#endif
diff --git a/ewah/ewok_rlw.h b/ewah/ewok_rlw.h
new file mode 100644
index 0000000000..63efdf9698
--- /dev/null
+++ b/ewah/ewok_rlw.h
@@ -0,0 +1,114 @@
+/**
+ * Copyright 2013, GitHub, Inc
+ * Copyright 2009-2013, Daniel Lemire, Cliff Moon,
+ * David McIntosh, Robert Becho, Google Inc. and Veronika Zenz
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+#ifndef __EWOK_RLW_H__
+#define __EWOK_RLW_H__
+
+#define RLW_RUNNING_BITS (sizeof(eword_t) * 4)
+#define RLW_LITERAL_BITS (sizeof(eword_t) * 8 - 1 - RLW_RUNNING_BITS)
+
+#define RLW_LARGEST_RUNNING_COUNT (((eword_t)1 << RLW_RUNNING_BITS) - 1)
+#define RLW_LARGEST_LITERAL_COUNT (((eword_t)1 << RLW_LITERAL_BITS) - 1)
+
+#define RLW_LARGEST_RUNNING_COUNT_SHIFT (RLW_LARGEST_RUNNING_COUNT << 1)
+
+#define RLW_RUNNING_LEN_PLUS_BIT (((eword_t)1 << (RLW_RUNNING_BITS + 1)) - 1)
+
+static int rlw_get_run_bit(const eword_t *word)
+{
+ return *word & (eword_t)1;
+}
+
+static inline void rlw_set_run_bit(eword_t *word, int b)
+{
+ if (b) {
+ *word |= (eword_t)1;
+ } else {
+ *word &= (eword_t)(~1);
+ }
+}
+
+static inline void rlw_xor_run_bit(eword_t *word)
+{
+ if (*word & 1) {
+ *word &= (eword_t)(~1);
+ } else {
+ *word |= (eword_t)1;
+ }
+}
+
+static inline void rlw_set_running_len(eword_t *word, eword_t l)
+{
+ *word |= RLW_LARGEST_RUNNING_COUNT_SHIFT;
+ *word &= (l << 1) | (~RLW_LARGEST_RUNNING_COUNT_SHIFT);
+}
+
+static inline eword_t rlw_get_running_len(const eword_t *word)
+{
+ return (*word >> 1) & RLW_LARGEST_RUNNING_COUNT;
+}
+
+static inline eword_t rlw_get_literal_words(const eword_t *word)
+{
+ return *word >> (1 + RLW_RUNNING_BITS);
+}
+
+static inline void rlw_set_literal_words(eword_t *word, eword_t l)
+{
+ *word |= ~RLW_RUNNING_LEN_PLUS_BIT;
+ *word &= (l << (RLW_RUNNING_BITS + 1)) | RLW_RUNNING_LEN_PLUS_BIT;
+}
+
+static inline eword_t rlw_size(const eword_t *self)
+{
+ return rlw_get_running_len(self) + rlw_get_literal_words(self);
+}
+
+struct rlw_iterator {
+ const eword_t *buffer;
+ size_t size;
+ size_t pointer;
+ size_t literal_word_start;
+
+ struct {
+ const eword_t *word;
+ int literal_words;
+ int running_len;
+ int literal_word_offset;
+ int running_bit;
+ } rlw;
+};
+
+void rlwit_init(struct rlw_iterator *it, struct ewah_bitmap *bitmap);
+void rlwit_discard_first_words(struct rlw_iterator *it, size_t x);
+size_t rlwit_discharge(
+ struct rlw_iterator *it, struct ewah_bitmap *out, size_t max, int negate);
+void rlwit_discharge_empty(struct rlw_iterator *it, struct ewah_bitmap *out);
+
+static inline size_t rlwit_word_size(struct rlw_iterator *it)
+{
+ return it->rlw.running_len + it->rlw.literal_words;
+}
+
+static inline size_t rlwit_literal_words(struct rlw_iterator *it)
+{
+ return it->pointer - it->rlw.literal_words;
+}
+
+#endif
diff --git a/khash.h b/khash.h
new file mode 100644
index 0000000000..57ff6038c5
--- /dev/null
+++ b/khash.h
@@ -0,0 +1,338 @@
+/* The MIT License
+
+ Copyright (c) 2008, 2009, 2011 by Attractive Chaos <attractor@live.co.uk>
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
+*/
+
+#ifndef __AC_KHASH_H
+#define __AC_KHASH_H
+
+#define AC_VERSION_KHASH_H "0.2.8"
+
+typedef uint32_t khint32_t;
+typedef uint64_t khint64_t;
+
+typedef khint32_t khint_t;
+typedef khint_t khiter_t;
+
+#define __ac_isempty(flag, i) ((flag[i>>4]>>((i&0xfU)<<1))&2)
+#define __ac_isdel(flag, i) ((flag[i>>4]>>((i&0xfU)<<1))&1)
+#define __ac_iseither(flag, i) ((flag[i>>4]>>((i&0xfU)<<1))&3)
+#define __ac_set_isdel_false(flag, i) (flag[i>>4]&=~(1ul<<((i&0xfU)<<1)))
+#define __ac_set_isempty_false(flag, i) (flag[i>>4]&=~(2ul<<((i&0xfU)<<1)))
+#define __ac_set_isboth_false(flag, i) (flag[i>>4]&=~(3ul<<((i&0xfU)<<1)))
+#define __ac_set_isdel_true(flag, i) (flag[i>>4]|=1ul<<((i&0xfU)<<1))
+
+#define __ac_fsize(m) ((m) < 16? 1 : (m)>>4)
+
+#define kroundup32(x) (--(x), (x)|=(x)>>1, (x)|=(x)>>2, (x)|=(x)>>4, (x)|=(x)>>8, (x)|=(x)>>16, ++(x))
+
+static inline khint_t __ac_X31_hash_string(const char *s)
+{
+ khint_t h = (khint_t)*s;
+ if (h) for (++s ; *s; ++s) h = (h << 5) - h + (khint_t)*s;
+ return h;
+}
+
+#define kh_str_hash_func(key) __ac_X31_hash_string(key)
+#define kh_str_hash_equal(a, b) (strcmp(a, b) == 0)
+
+static const double __ac_HASH_UPPER = 0.77;
+
+#define __KHASH_TYPE(name, khkey_t, khval_t) \
+ typedef struct { \
+ khint_t n_buckets, size, n_occupied, upper_bound; \
+ khint32_t *flags; \
+ khkey_t *keys; \
+ khval_t *vals; \
+ } kh_##name##_t;
+
+#define __KHASH_PROTOTYPES(name, khkey_t, khval_t) \
+ extern kh_##name##_t *kh_init_##name(void); \
+ extern void kh_destroy_##name(kh_##name##_t *h); \
+ extern void kh_clear_##name(kh_##name##_t *h); \
+ extern khint_t kh_get_##name(const kh_##name##_t *h, khkey_t key); \
+ extern int kh_resize_##name(kh_##name##_t *h, khint_t new_n_buckets); \
+ extern khint_t kh_put_##name(kh_##name##_t *h, khkey_t key, int *ret); \
+ extern void kh_del_##name(kh_##name##_t *h, khint_t x);
+
+#define __KHASH_IMPL(name, SCOPE, khkey_t, khval_t, kh_is_map, __hash_func, __hash_equal) \
+ SCOPE kh_##name##_t *kh_init_##name(void) { \
+ return (kh_##name##_t*)xcalloc(1, sizeof(kh_##name##_t)); \
+ } \
+ SCOPE void kh_destroy_##name(kh_##name##_t *h) \
+ { \
+ if (h) { \
+ free((void *)h->keys); free(h->flags); \
+ free((void *)h->vals); \
+ free(h); \
+ } \
+ } \
+ SCOPE void kh_clear_##name(kh_##name##_t *h) \
+ { \
+ if (h && h->flags) { \
+ memset(h->flags, 0xaa, __ac_fsize(h->n_buckets) * sizeof(khint32_t)); \
+ h->size = h->n_occupied = 0; \
+ } \
+ } \
+ SCOPE khint_t kh_get_##name(const kh_##name##_t *h, khkey_t key) \
+ { \
+ if (h->n_buckets) { \
+ khint_t k, i, last, mask, step = 0; \
+ mask = h->n_buckets - 1; \
+ k = __hash_func(key); i = k & mask; \
+ last = i; \
+ while (!__ac_isempty(h->flags, i) && (__ac_isdel(h->flags, i) || !__hash_equal(h->keys[i], key))) { \
+ i = (i + (++step)) & mask; \
+ if (i == last) return h->n_buckets; \
+ } \
+ return __ac_iseither(h->flags, i)? h->n_buckets : i; \
+ } else return 0; \
+ } \
+ SCOPE int kh_resize_##name(kh_##name##_t *h, khint_t new_n_buckets) \
+ { /* This function uses 0.25*n_buckets bytes of working space instead of [sizeof(key_t+val_t)+.25]*n_buckets. */ \
+ khint32_t *new_flags = NULL; \
+ khint_t j = 1; \
+ { \
+ kroundup32(new_n_buckets); \
+ if (new_n_buckets < 4) new_n_buckets = 4; \
+ if (h->size >= (khint_t)(new_n_buckets * __ac_HASH_UPPER + 0.5)) j = 0; /* requested size is too small */ \
+ else { /* hash table size to be changed (shrink or expand); rehash */ \
+ new_flags = (khint32_t*)xmalloc(__ac_fsize(new_n_buckets) * sizeof(khint32_t)); \
+ if (!new_flags) return -1; \
+ memset(new_flags, 0xaa, __ac_fsize(new_n_buckets) * sizeof(khint32_t)); \
+ if (h->n_buckets < new_n_buckets) { /* expand */ \
+ khkey_t *new_keys = (khkey_t*)xrealloc((void *)h->keys, new_n_buckets * sizeof(khkey_t)); \
+ if (!new_keys) return -1; \
+ h->keys = new_keys; \
+ if (kh_is_map) { \
+ khval_t *new_vals = (khval_t*)xrealloc((void *)h->vals, new_n_buckets * sizeof(khval_t)); \
+ if (!new_vals) return -1; \
+ h->vals = new_vals; \
+ } \
+ } /* otherwise shrink */ \
+ } \
+ } \
+ if (j) { /* rehashing is needed */ \
+ for (j = 0; j != h->n_buckets; ++j) { \
+ if (__ac_iseither(h->flags, j) == 0) { \
+ khkey_t key = h->keys[j]; \
+ khval_t val; \
+ khint_t new_mask; \
+ new_mask = new_n_buckets - 1; \
+ if (kh_is_map) val = h->vals[j]; \
+ __ac_set_isdel_true(h->flags, j); \
+ while (1) { /* kick-out process; sort of like in Cuckoo hashing */ \
+ khint_t k, i, step = 0; \
+ k = __hash_func(key); \
+ i = k & new_mask; \
+ while (!__ac_isempty(new_flags, i)) i = (i + (++step)) & new_mask; \
+ __ac_set_isempty_false(new_flags, i); \
+ if (i < h->n_buckets && __ac_iseither(h->flags, i) == 0) { /* kick out the existing element */ \
+ { khkey_t tmp = h->keys[i]; h->keys[i] = key; key = tmp; } \
+ if (kh_is_map) { khval_t tmp = h->vals[i]; h->vals[i] = val; val = tmp; } \
+ __ac_set_isdel_true(h->flags, i); /* mark it as deleted in the old hash table */ \
+ } else { /* write the element and jump out of the loop */ \
+ h->keys[i] = key; \
+ if (kh_is_map) h->vals[i] = val; \
+ break; \
+ } \
+ } \
+ } \
+ } \
+ if (h->n_buckets > new_n_buckets) { /* shrink the hash table */ \
+ h->keys = (khkey_t*)xrealloc((void *)h->keys, new_n_buckets * sizeof(khkey_t)); \
+ if (kh_is_map) h->vals = (khval_t*)xrealloc((void *)h->vals, new_n_buckets * sizeof(khval_t)); \
+ } \
+ free(h->flags); /* free the working space */ \
+ h->flags = new_flags; \
+ h->n_buckets = new_n_buckets; \
+ h->n_occupied = h->size; \
+ h->upper_bound = (khint_t)(h->n_buckets * __ac_HASH_UPPER + 0.5); \
+ } \
+ return 0; \
+ } \
+ SCOPE khint_t kh_put_##name(kh_##name##_t *h, khkey_t key, int *ret) \
+ { \
+ khint_t x; \
+ if (h->n_occupied >= h->upper_bound) { /* update the hash table */ \
+ if (h->n_buckets > (h->size<<1)) { \
+ if (kh_resize_##name(h, h->n_buckets - 1) < 0) { /* clear "deleted" elements */ \
+ *ret = -1; return h->n_buckets; \
+ } \
+ } else if (kh_resize_##name(h, h->n_buckets + 1) < 0) { /* expand the hash table */ \
+ *ret = -1; return h->n_buckets; \
+ } \
+ } /* TODO: to implement automatically shrinking; resize() already support shrinking */ \
+ { \
+ khint_t k, i, site, last, mask = h->n_buckets - 1, step = 0; \
+ x = site = h->n_buckets; k = __hash_func(key); i = k & mask; \
+ if (__ac_isempty(h->flags, i)) x = i; /* for speed up */ \
+ else { \
+ last = i; \
+ while (!__ac_isempty(h->flags, i) && (__ac_isdel(h->flags, i) || !__hash_equal(h->keys[i], key))) { \
+ if (__ac_isdel(h->flags, i)) site = i; \
+ i = (i + (++step)) & mask; \
+ if (i == last) { x = site; break; } \
+ } \
+ if (x == h->n_buckets) { \
+ if (__ac_isempty(h->flags, i) && site != h->n_buckets) x = site; \
+ else x = i; \
+ } \
+ } \
+ } \
+ if (__ac_isempty(h->flags, x)) { /* not present at all */ \
+ h->keys[x] = key; \
+ __ac_set_isboth_false(h->flags, x); \
+ ++h->size; ++h->n_occupied; \
+ *ret = 1; \
+ } else if (__ac_isdel(h->flags, x)) { /* deleted */ \
+ h->keys[x] = key; \
+ __ac_set_isboth_false(h->flags, x); \
+ ++h->size; \
+ *ret = 2; \
+ } else *ret = 0; /* Don't touch h->keys[x] if present and not deleted */ \
+ return x; \
+ } \
+ SCOPE void kh_del_##name(kh_##name##_t *h, khint_t x) \
+ { \
+ if (x != h->n_buckets && !__ac_iseither(h->flags, x)) { \
+ __ac_set_isdel_true(h->flags, x); \
+ --h->size; \
+ } \
+ }
+
+#define KHASH_DECLARE(name, khkey_t, khval_t) \
+ __KHASH_TYPE(name, khkey_t, khval_t) \
+ __KHASH_PROTOTYPES(name, khkey_t, khval_t)
+
+#define KHASH_INIT2(name, SCOPE, khkey_t, khval_t, kh_is_map, __hash_func, __hash_equal) \
+ __KHASH_TYPE(name, khkey_t, khval_t) \
+ __KHASH_IMPL(name, SCOPE, khkey_t, khval_t, kh_is_map, __hash_func, __hash_equal)
+
+#define KHASH_INIT(name, khkey_t, khval_t, kh_is_map, __hash_func, __hash_equal) \
+ KHASH_INIT2(name, static inline, khkey_t, khval_t, kh_is_map, __hash_func, __hash_equal)
+
+/* Other convenient macros... */
+
+/*! @function
+ @abstract Test whether a bucket contains data.
+ @param h Pointer to the hash table [khash_t(name)*]
+ @param x Iterator to the bucket [khint_t]
+ @return 1 if containing data; 0 otherwise [int]
+ */
+#define kh_exist(h, x) (!__ac_iseither((h)->flags, (x)))
+
+/*! @function
+ @abstract Get key given an iterator
+ @param h Pointer to the hash table [khash_t(name)*]
+ @param x Iterator to the bucket [khint_t]
+ @return Key [type of keys]
+ */
+#define kh_key(h, x) ((h)->keys[x])
+
+/*! @function
+ @abstract Get value given an iterator
+ @param h Pointer to the hash table [khash_t(name)*]
+ @param x Iterator to the bucket [khint_t]
+ @return Value [type of values]
+ @discussion For hash sets, calling this results in segfault.
+ */
+#define kh_val(h, x) ((h)->vals[x])
+
+/*! @function
+ @abstract Alias of kh_val()
+ */
+#define kh_value(h, x) ((h)->vals[x])
+
+/*! @function
+ @abstract Get the start iterator
+ @param h Pointer to the hash table [khash_t(name)*]
+ @return The start iterator [khint_t]
+ */
+#define kh_begin(h) (khint_t)(0)
+
+/*! @function
+ @abstract Get the end iterator
+ @param h Pointer to the hash table [khash_t(name)*]
+ @return The end iterator [khint_t]
+ */
+#define kh_end(h) ((h)->n_buckets)
+
+/*! @function
+ @abstract Get the number of elements in the hash table
+ @param h Pointer to the hash table [khash_t(name)*]
+ @return Number of elements in the hash table [khint_t]
+ */
+#define kh_size(h) ((h)->size)
+
+/*! @function
+ @abstract Get the number of buckets in the hash table
+ @param h Pointer to the hash table [khash_t(name)*]
+ @return Number of buckets in the hash table [khint_t]
+ */
+#define kh_n_buckets(h) ((h)->n_buckets)
+
+/*! @function
+ @abstract Iterate over the entries in the hash table
+ @param h Pointer to the hash table [khash_t(name)*]
+ @param kvar Variable to which key will be assigned
+ @param vvar Variable to which value will be assigned
+ @param code Block of code to execute
+ */
+#define kh_foreach(h, kvar, vvar, code) { khint_t __i; \
+ for (__i = kh_begin(h); __i != kh_end(h); ++__i) { \
+ if (!kh_exist(h,__i)) continue; \
+ (kvar) = kh_key(h,__i); \
+ (vvar) = kh_val(h,__i); \
+ code; \
+ } }
+
+/*! @function
+ @abstract Iterate over the values in the hash table
+ @param h Pointer to the hash table [khash_t(name)*]
+ @param vvar Variable to which value will be assigned
+ @param code Block of code to execute
+ */
+#define kh_foreach_value(h, vvar, code) { khint_t __i; \
+ for (__i = kh_begin(h); __i != kh_end(h); ++__i) { \
+ if (!kh_exist(h,__i)) continue; \
+ (vvar) = kh_val(h,__i); \
+ code; \
+ } }
+
+static inline khint_t __kh_oid_hash(const unsigned char *oid)
+{
+ khint_t hash;
+ memcpy(&hash, oid, sizeof(hash));
+ return hash;
+}
+
+#define __kh_oid_cmp(a, b) (hashcmp(a, b) == 0)
+
+KHASH_INIT(sha1, const unsigned char *, void *, 1, __kh_oid_hash, __kh_oid_cmp)
+typedef kh_sha1_t khash_sha1;
+
+KHASH_INIT(sha1_pos, const unsigned char *, int, 1, __kh_oid_hash, __kh_oid_cmp)
+typedef kh_sha1_pos_t khash_sha1_pos;
+
+#endif /* __AC_KHASH_H */
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
new file mode 100644
index 0000000000..1218befaf2
--- /dev/null
+++ b/pack-bitmap-write.c
@@ -0,0 +1,552 @@
+#include "cache.h"
+#include "commit.h"
+#include "tag.h"
+#include "diff.h"
+#include "revision.h"
+#include "list-objects.h"
+#include "progress.h"
+#include "pack-revindex.h"
+#include "pack.h"
+#include "pack-bitmap.h"
+#include "sha1-lookup.h"
+#include "pack-objects.h"
+
+struct bitmapped_commit {
+ struct commit *commit;
+ struct ewah_bitmap *bitmap;
+ struct ewah_bitmap *write_as;
+ int flags;
+ int xor_offset;
+ uint32_t commit_pos;
+};
+
+struct bitmap_writer {
+ struct ewah_bitmap *commits;
+ struct ewah_bitmap *trees;
+ struct ewah_bitmap *blobs;
+ struct ewah_bitmap *tags;
+
+ khash_sha1 *bitmaps;
+ khash_sha1 *reused;
+ struct packing_data *to_pack;
+
+ struct bitmapped_commit *selected;
+ unsigned int selected_nr, selected_alloc;
+
+ struct progress *progress;
+ int show_progress;
+ unsigned char pack_checksum[20];
+};
+
+static struct bitmap_writer writer;
+
+void bitmap_writer_show_progress(int show)
+{
+ writer.show_progress = show;
+}
+
+/**
+ * Build the initial type index for the packfile
+ */
+void bitmap_writer_build_type_index(struct pack_idx_entry **index,
+ uint32_t index_nr)
+{
+ uint32_t i;
+
+ writer.commits = ewah_new();
+ writer.trees = ewah_new();
+ writer.blobs = ewah_new();
+ writer.tags = ewah_new();
+
+ for (i = 0; i < index_nr; ++i) {
+ struct object_entry *entry = (struct object_entry *)index[i];
+ enum object_type real_type;
+
+ entry->in_pack_pos = i;
+
+ switch (entry->type) {
+ case OBJ_COMMIT:
+ case OBJ_TREE:
+ case OBJ_BLOB:
+ case OBJ_TAG:
+ real_type = entry->type;
+ break;
+
+ default:
+ real_type = sha1_object_info(entry->idx.sha1, NULL);
+ break;
+ }
+
+ switch (real_type) {
+ case OBJ_COMMIT:
+ ewah_set(writer.commits, i);
+ break;
+
+ case OBJ_TREE:
+ ewah_set(writer.trees, i);
+ break;
+
+ case OBJ_BLOB:
+ ewah_set(writer.blobs, i);
+ break;
+
+ case OBJ_TAG:
+ ewah_set(writer.tags, i);
+ break;
+
+ default:
+ die("Missing type information for %s (%d/%d)",
+ sha1_to_hex(entry->idx.sha1), real_type, entry->type);
+ }
+ }
+}
+
+/**
+ * Compute the actual bitmaps
+ */
+static struct object **seen_objects;
+static unsigned int seen_objects_nr, seen_objects_alloc;
+
+static inline void push_bitmapped_commit(struct commit *commit, struct ewah_bitmap *reused)
+{
+ if (writer.selected_nr >= writer.selected_alloc) {
+ writer.selected_alloc = (writer.selected_alloc + 32) * 2;
+ writer.selected = xrealloc(writer.selected,
+ writer.selected_alloc * sizeof(struct bitmapped_commit));
+ }
+
+ writer.selected[writer.selected_nr].commit = commit;
+ writer.selected[writer.selected_nr].bitmap = reused;
+ writer.selected[writer.selected_nr].flags = 0;
+
+ writer.selected_nr++;
+}
+
+static inline void mark_as_seen(struct object *object)
+{
+ ALLOC_GROW(seen_objects, seen_objects_nr + 1, seen_objects_alloc);
+ seen_objects[seen_objects_nr++] = object;
+}
+
+static inline void reset_all_seen(void)
+{
+ unsigned int i;
+ for (i = 0; i < seen_objects_nr; ++i) {
+ seen_objects[i]->flags &= ~(SEEN | ADDED | SHOWN);
+ }
+ seen_objects_nr = 0;
+}
+
+static uint32_t find_object_pos(const unsigned char *sha1)
+{
+ struct object_entry *entry = packlist_find(writer.to_pack, sha1, NULL);
+
+ if (!entry) {
+ die("Failed to write bitmap index. Packfile doesn't have full closure "
+ "(object %s is missing)", sha1_to_hex(sha1));
+ }
+
+ return entry->in_pack_pos;
+}
+
+static void show_object(struct object *object, const struct name_path *path,
+ const char *last, void *data)
+{
+ struct bitmap *base = data;
+ bitmap_set(base, find_object_pos(object->sha1));
+ mark_as_seen(object);
+}
+
+static void show_commit(struct commit *commit, void *data)
+{
+ mark_as_seen((struct object *)commit);
+}
+
+static int
+add_to_include_set(struct bitmap *base, struct commit *commit)
+{
+ khiter_t hash_pos;
+ uint32_t bitmap_pos = find_object_pos(commit->object.sha1);
+
+ if (bitmap_get(base, bitmap_pos))
+ return 0;
+
+ hash_pos = kh_get_sha1(writer.bitmaps, commit->object.sha1);
+ if (hash_pos < kh_end(writer.bitmaps)) {
+ struct bitmapped_commit *bc = kh_value(writer.bitmaps, hash_pos);
+ bitmap_or_ewah(base, bc->bitmap);
+ return 0;
+ }
+
+ bitmap_set(base, bitmap_pos);
+ return 1;
+}
+
+static int
+should_include(struct commit *commit, void *_data)
+{
+ struct bitmap *base = _data;
+
+ if (!add_to_include_set(base, commit)) {
+ struct commit_list *parent = commit->parents;
+
+ mark_as_seen((struct object *)commit);
+
+ while (parent) {
+ parent->item->object.flags |= SEEN;
+ mark_as_seen((struct object *)parent->item);
+ parent = parent->next;
+ }
+
+ return 0;
+ }
+
+ return 1;
+}
+
+static void compute_xor_offsets(void)
+{
+ static const int MAX_XOR_OFFSET_SEARCH = 10;
+
+ int i, next = 0;
+
+ while (next < writer.selected_nr) {
+ struct bitmapped_commit *stored = &writer.selected[next];
+
+ int best_offset = 0;
+ struct ewah_bitmap *best_bitmap = stored->bitmap;
+ struct ewah_bitmap *test_xor;
+
+ for (i = 1; i <= MAX_XOR_OFFSET_SEARCH; ++i) {
+ int curr = next - i;
+
+ if (curr < 0)
+ break;
+
+ test_xor = ewah_pool_new();
+ ewah_xor(writer.selected[curr].bitmap, stored->bitmap, test_xor);
+
+ if (test_xor->buffer_size < best_bitmap->buffer_size) {
+ if (best_bitmap != stored->bitmap)
+ ewah_pool_free(best_bitmap);
+
+ best_bitmap = test_xor;
+ best_offset = i;
+ } else {
+ ewah_pool_free(test_xor);
+ }
+ }
+
+ stored->xor_offset = best_offset;
+ stored->write_as = best_bitmap;
+
+ next++;
+ }
+}
+
+void bitmap_writer_build(struct packing_data *to_pack)
+{
+ static const double REUSE_BITMAP_THRESHOLD = 0.2;
+
+ int i, reuse_after, need_reset;
+ struct bitmap *base = bitmap_new();
+ struct rev_info revs;
+
+ writer.bitmaps = kh_init_sha1();
+ writer.to_pack = to_pack;
+
+ if (writer.show_progress)
+ writer.progress = start_progress("Building bitmaps", writer.selected_nr);
+
+ init_revisions(&revs, NULL);
+ revs.tag_objects = 1;
+ revs.tree_objects = 1;
+ revs.blob_objects = 1;
+ revs.no_walk = 0;
+
+ revs.include_check = should_include;
+ reset_revision_walk();
+
+ reuse_after = writer.selected_nr * REUSE_BITMAP_THRESHOLD;
+ need_reset = 0;
+
+ for (i = writer.selected_nr - 1; i >= 0; --i) {
+ struct bitmapped_commit *stored;
+ struct object *object;
+
+ khiter_t hash_pos;
+ int hash_ret;
+
+ stored = &writer.selected[i];
+ object = (struct object *)stored->commit;
+
+ if (stored->bitmap == NULL) {
+ if (i < writer.selected_nr - 1 &&
+ (need_reset ||
+ !in_merge_bases(writer.selected[i + 1].commit,
+ stored->commit))) {
+ bitmap_reset(base);
+ reset_all_seen();
+ }
+
+ add_pending_object(&revs, object, "");
+ revs.include_check_data = base;
+
+ if (prepare_revision_walk(&revs))
+ die("revision walk setup failed");
+
+ traverse_commit_list(&revs, show_commit, show_object, base);
+
+ revs.pending.nr = 0;
+ revs.pending.alloc = 0;
+ revs.pending.objects = NULL;
+
+ stored->bitmap = bitmap_to_ewah(base);
+ need_reset = 0;
+ } else
+ need_reset = 1;
+
+ if (i >= reuse_after)
+ stored->flags |= BITMAP_FLAG_REUSE;
+
+ hash_pos = kh_put_sha1(writer.bitmaps, object->sha1, &hash_ret);
+ if (hash_ret == 0)
+ die("Duplicate entry when writing index: %s",
+ sha1_to_hex(object->sha1));
+
+ kh_value(writer.bitmaps, hash_pos) = stored;
+ display_progress(writer.progress, writer.selected_nr - i);
+ }
+
+ bitmap_free(base);
+ stop_progress(&writer.progress);
+
+ compute_xor_offsets();
+}
+
+/**
+ * Select the commits that will be bitmapped
+ */
+static inline unsigned int next_commit_index(unsigned int idx)
+{
+ static const unsigned int MIN_COMMITS = 100;
+ static const unsigned int MAX_COMMITS = 5000;
+
+ static const unsigned int MUST_REGION = 100;
+ static const unsigned int MIN_REGION = 20000;
+
+ unsigned int offset, next;
+
+ if (idx <= MUST_REGION)
+ return 0;
+
+ if (idx <= MIN_REGION) {
+ offset = idx - MUST_REGION;
+ return (offset < MIN_COMMITS) ? offset : MIN_COMMITS;
+ }
+
+ offset = idx - MIN_REGION;
+ next = (offset < MAX_COMMITS) ? offset : MAX_COMMITS;
+
+ return (next > MIN_COMMITS) ? next : MIN_COMMITS;
+}
+
+static int date_compare(const void *_a, const void *_b)
+{
+ struct commit *a = *(struct commit **)_a;
+ struct commit *b = *(struct commit **)_b;
+ return (long)b->date - (long)a->date;
+}
+
+void bitmap_writer_reuse_bitmaps(struct packing_data *to_pack)
+{
+ if (prepare_bitmap_git() < 0)
+ return;
+
+ writer.reused = kh_init_sha1();
+ rebuild_existing_bitmaps(to_pack, writer.reused, writer.show_progress);
+}
+
+static struct ewah_bitmap *find_reused_bitmap(const unsigned char *sha1)
+{
+ khiter_t hash_pos;
+
+ if (!writer.reused)
+ return NULL;
+
+ hash_pos = kh_get_sha1(writer.reused, sha1);
+ if (hash_pos >= kh_end(writer.reused))
+ return NULL;
+
+ return kh_value(writer.reused, hash_pos);
+}
+
+void bitmap_writer_select_commits(struct commit **indexed_commits,
+ unsigned int indexed_commits_nr,
+ int max_bitmaps)
+{
+ unsigned int i = 0, j, next;
+
+ qsort(indexed_commits, indexed_commits_nr, sizeof(indexed_commits[0]),
+ date_compare);
+
+ if (writer.show_progress)
+ writer.progress = start_progress("Selecting bitmap commits", 0);
+
+ if (indexed_commits_nr < 100) {
+ for (i = 0; i < indexed_commits_nr; ++i)
+ push_bitmapped_commit(indexed_commits[i], NULL);
+ return;
+ }
+
+ for (;;) {
+ struct ewah_bitmap *reused_bitmap = NULL;
+ struct commit *chosen = NULL;
+
+ next = next_commit_index(i);
+
+ if (i + next >= indexed_commits_nr)
+ break;
+
+ if (max_bitmaps > 0 && writer.selected_nr >= max_bitmaps) {
+ writer.selected_nr = max_bitmaps;
+ break;
+ }
+
+ if (next == 0) {
+ chosen = indexed_commits[i];
+ reused_bitmap = find_reused_bitmap(chosen->object.sha1);
+ } else {
+ chosen = indexed_commits[i + next];
+
+ for (j = 0; j <= next; ++j) {
+ struct commit *cm = indexed_commits[i + j];
+
+ reused_bitmap = find_reused_bitmap(cm->object.sha1);
+ if (reused_bitmap || (cm->object.flags & NEEDS_BITMAP) != 0) {
+ chosen = cm;
+ break;
+ }
+
+ if (cm->parents && cm->parents->next)
+ chosen = cm;
+ }
+ }
+
+ push_bitmapped_commit(chosen, reused_bitmap);
+
+ i += next + 1;
+ display_progress(writer.progress, i);
+ }
+
+ stop_progress(&writer.progress);
+}
+
+
+static int sha1write_ewah_helper(void *f, const void *buf, size_t len)
+{
+ /* sha1write will die on error */
+ sha1write(f, buf, len);
+ return len;
+}
+
+/**
+ * Write the bitmap index to disk
+ */
+static inline void dump_bitmap(struct sha1file *f, struct ewah_bitmap *bitmap)
+{
+ if (ewah_serialize_to(bitmap, sha1write_ewah_helper, f) < 0)
+ die("Failed to write bitmap index");
+}
+
+static const unsigned char *sha1_access(size_t pos, void *table)
+{
+ struct pack_idx_entry **index = table;
+ return index[pos]->sha1;
+}
+
+static void write_selected_commits_v1(struct sha1file *f,
+ struct pack_idx_entry **index,
+ uint32_t index_nr)
+{
+ int i;
+
+ for (i = 0; i < writer.selected_nr; ++i) {
+ struct bitmapped_commit *stored = &writer.selected[i];
+ struct bitmap_disk_entry on_disk;
+
+ int commit_pos =
+ sha1_pos(stored->commit->object.sha1, index, index_nr, sha1_access);
+
+ if (commit_pos < 0)
+ die("BUG: trying to write commit not in index");
+
+ on_disk.object_pos = htonl(commit_pos);
+ on_disk.xor_offset = stored->xor_offset;
+ on_disk.flags = stored->flags;
+
+ sha1write(f, &on_disk, sizeof(on_disk));
+ dump_bitmap(f, stored->write_as);
+ }
+}
+
+static void write_hash_cache(struct sha1file *f,
+ struct pack_idx_entry **index,
+ uint32_t index_nr)
+{
+ uint32_t i;
+
+ for (i = 0; i < index_nr; ++i) {
+ struct object_entry *entry = (struct object_entry *)index[i];
+ uint32_t hash_value = htonl(entry->hash);
+ sha1write(f, &hash_value, sizeof(hash_value));
+ }
+}
+
+void bitmap_writer_set_checksum(unsigned char *sha1)
+{
+ hashcpy(writer.pack_checksum, sha1);
+}
+
+void bitmap_writer_finish(struct pack_idx_entry **index,
+ uint32_t index_nr,
+ const char *filename,
+ uint16_t options)
+{
+ static char tmp_file[PATH_MAX];
+ static uint16_t default_version = 1;
+ static uint16_t flags = BITMAP_OPT_FULL_DAG;
+ struct sha1file *f;
+
+ struct bitmap_disk_header header;
+
+ int fd = odb_mkstemp(tmp_file, sizeof(tmp_file), "pack/tmp_bitmap_XXXXXX");
+
+ if (fd < 0)
+ die_errno("unable to create '%s'", tmp_file);
+ f = sha1fd(fd, tmp_file);
+
+ memcpy(header.magic, BITMAP_IDX_SIGNATURE, sizeof(BITMAP_IDX_SIGNATURE));
+ header.version = htons(default_version);
+ header.options = htons(flags | options);
+ header.entry_count = htonl(writer.selected_nr);
+ memcpy(header.checksum, writer.pack_checksum, 20);
+
+ sha1write(f, &header, sizeof(header));
+ dump_bitmap(f, writer.commits);
+ dump_bitmap(f, writer.trees);
+ dump_bitmap(f, writer.blobs);
+ dump_bitmap(f, writer.tags);
+ write_selected_commits_v1(f, index, index_nr);
+
+ if (options & BITMAP_OPT_HASH_CACHE)
+ write_hash_cache(f, index, index_nr);
+
+ sha1close(f, NULL, CSUM_FSYNC);
+
+ if (adjust_shared_perm(tmp_file))
+ die_errno("unable to make temporary bitmap file readable");
+
+ if (rename(tmp_file, filename))
+ die_errno("unable to rename temporary bitmap file to '%s'", filename);
+}
diff --git a/pack-bitmap.c b/pack-bitmap.c
new file mode 100644
index 0000000000..ae0b57b950
--- /dev/null
+++ b/pack-bitmap.c
@@ -0,0 +1,1073 @@
+#include "cache.h"
+#include "commit.h"
+#include "tag.h"
+#include "diff.h"
+#include "revision.h"
+#include "progress.h"
+#include "list-objects.h"
+#include "pack.h"
+#include "pack-bitmap.h"
+#include "pack-revindex.h"
+#include "pack-objects.h"
+
+/*
+ * An entry on the bitmap index, representing the bitmap for a given
+ * commit.
+ */
+struct stored_bitmap {
+ unsigned char sha1[20];
+ struct ewah_bitmap *root;
+ struct stored_bitmap *xor;
+ int flags;
+};
+
+/*
+ * The currently active bitmap index. By design, repositories only have
+ * a single bitmap index available (the index for the biggest packfile in
+ * the repository), since bitmap indexes need full closure.
+ *
+ * If there is more than one bitmap index available (e.g. because of alternates),
+ * the active bitmap index is the largest one.
+ */
+static struct bitmap_index {
+ /* Packfile to which this bitmap index belongs to */
+ struct packed_git *pack;
+
+ /* reverse index for the packfile */
+ struct pack_revindex *reverse_index;
+
+ /*
+ * Mark the first `reuse_objects` in the packfile as reused:
+ * they will be sent as-is without using them for repacking
+ * calculations
+ */
+ uint32_t reuse_objects;
+
+ /* mmapped buffer of the whole bitmap index */
+ unsigned char *map;
+ size_t map_size; /* size of the mmaped buffer */
+ size_t map_pos; /* current position when loading the index */
+
+ /*
+ * Type indexes.
+ *
+ * Each bitmap marks which objects in the packfile are of the given
+ * type. This provides type information when yielding the objects from
+ * the packfile during a walk, which allows for better delta bases.
+ */
+ struct ewah_bitmap *commits;
+ struct ewah_bitmap *trees;
+ struct ewah_bitmap *blobs;
+ struct ewah_bitmap *tags;
+
+ /* Map from SHA1 -> `stored_bitmap` for all the bitmapped comits */
+ khash_sha1 *bitmaps;
+
+ /* Number of bitmapped commits */
+ uint32_t entry_count;
+
+ /* Name-hash cache (or NULL if not present). */
+ uint32_t *hashes;
+
+ /*
+ * Extended index.
+ *
+ * When trying to perform bitmap operations with objects that are not
+ * packed in `pack`, these objects are added to this "fake index" and
+ * are assumed to appear at the end of the packfile for all operations
+ */
+ struct eindex {
+ struct object **objects;
+ uint32_t *hashes;
+ uint32_t count, alloc;
+ khash_sha1_pos *positions;
+ } ext_index;
+
+ /* Bitmap result of the last performed walk */
+ struct bitmap *result;
+
+ /* Version of the bitmap index */
+ unsigned int version;
+
+ unsigned loaded : 1;
+
+} bitmap_git;
+
+static struct ewah_bitmap *lookup_stored_bitmap(struct stored_bitmap *st)
+{
+ struct ewah_bitmap *parent;
+ struct ewah_bitmap *composed;
+
+ if (st->xor == NULL)
+ return st->root;
+
+ composed = ewah_pool_new();
+ parent = lookup_stored_bitmap(st->xor);
+ ewah_xor(st->root, parent, composed);
+
+ ewah_pool_free(st->root);
+ st->root = composed;
+ st->xor = NULL;
+
+ return composed;
+}
+
+/*
+ * Read a bitmap from the current read position on the mmaped
+ * index, and increase the read position accordingly
+ */
+static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
+{
+ struct ewah_bitmap *b = ewah_pool_new();
+
+ int bitmap_size = ewah_read_mmap(b,
+ index->map + index->map_pos,
+ index->map_size - index->map_pos);
+
+ if (bitmap_size < 0) {
+ error("Failed to load bitmap index (corrupted?)");
+ ewah_pool_free(b);
+ return NULL;
+ }
+
+ index->map_pos += bitmap_size;
+ return b;
+}
+
+static int load_bitmap_header(struct bitmap_index *index)
+{
+ struct bitmap_disk_header *header = (void *)index->map;
+
+ if (index->map_size < sizeof(*header) + 20)
+ return error("Corrupted bitmap index (missing header data)");
+
+ if (memcmp(header->magic, BITMAP_IDX_SIGNATURE, sizeof(BITMAP_IDX_SIGNATURE)) != 0)
+ return error("Corrupted bitmap index file (wrong header)");
+
+ index->version = ntohs(header->version);
+ if (index->version != 1)
+ return error("Unsupported version for bitmap index file (%d)", index->version);
+
+ /* Parse known bitmap format options */
+ {
+ uint32_t flags = ntohs(header->options);
+
+ if ((flags & BITMAP_OPT_FULL_DAG) == 0)
+ return error("Unsupported options for bitmap index file "
+ "(Git requires BITMAP_OPT_FULL_DAG)");
+
+ if (flags & BITMAP_OPT_HASH_CACHE) {
+ unsigned char *end = index->map + index->map_size - 20;
+ index->hashes = ((uint32_t *)end) - index->pack->num_objects;
+ }
+ }
+
+ index->entry_count = ntohl(header->entry_count);
+ index->map_pos += sizeof(*header);
+ return 0;
+}
+
+static struct stored_bitmap *store_bitmap(struct bitmap_index *index,
+ struct ewah_bitmap *root,
+ const unsigned char *sha1,
+ struct stored_bitmap *xor_with,
+ int flags)
+{
+ struct stored_bitmap *stored;
+ khiter_t hash_pos;
+ int ret;
+
+ stored = xmalloc(sizeof(struct stored_bitmap));
+ stored->root = root;
+ stored->xor = xor_with;
+ stored->flags = flags;
+ hashcpy(stored->sha1, sha1);
+
+ hash_pos = kh_put_sha1(index->bitmaps, stored->sha1, &ret);
+
+ /* a 0 return code means the insertion succeeded with no changes,
+ * because the SHA1 already existed on the map. this is bad, there
+ * shouldn't be duplicated commits in the index */
+ if (ret == 0) {
+ error("Duplicate entry in bitmap index: %s", sha1_to_hex(sha1));
+ return NULL;
+ }
+
+ kh_value(index->bitmaps, hash_pos) = stored;
+ return stored;
+}
+
+static int load_bitmap_entries_v1(struct bitmap_index *index)
+{
+ static const size_t MAX_XOR_OFFSET = 160;
+
+ uint32_t i;
+ struct stored_bitmap **recent_bitmaps;
+ struct bitmap_disk_entry *entry;
+
+ recent_bitmaps = xcalloc(MAX_XOR_OFFSET, sizeof(struct stored_bitmap));
+
+ for (i = 0; i < index->entry_count; ++i) {
+ int xor_offset, flags;
+ struct ewah_bitmap *bitmap = NULL;
+ struct stored_bitmap *xor_bitmap = NULL;
+ uint32_t commit_idx_pos;
+ const unsigned char *sha1;
+
+ entry = (struct bitmap_disk_entry *)(index->map + index->map_pos);
+ index->map_pos += sizeof(struct bitmap_disk_entry);
+
+ commit_idx_pos = ntohl(entry->object_pos);
+ sha1 = nth_packed_object_sha1(index->pack, commit_idx_pos);
+
+ xor_offset = (int)entry->xor_offset;
+ flags = (int)entry->flags;
+
+ bitmap = read_bitmap_1(index);
+ if (!bitmap)
+ return -1;
+
+ if (xor_offset > MAX_XOR_OFFSET || xor_offset > i)
+ return error("Corrupted bitmap pack index");
+
+ if (xor_offset > 0) {
+ xor_bitmap = recent_bitmaps[(i - xor_offset) % MAX_XOR_OFFSET];
+
+ if (xor_bitmap == NULL)
+ return error("Invalid XOR offset in bitmap pack index");
+ }
+
+ recent_bitmaps[i % MAX_XOR_OFFSET] = store_bitmap(
+ index, bitmap, sha1, xor_bitmap, flags);
+ }
+
+ return 0;
+}
+
+static int open_pack_bitmap_1(struct packed_git *packfile)
+{
+ int fd;
+ struct stat st;
+ char *idx_name;
+
+ if (open_pack_index(packfile))
+ return -1;
+
+ idx_name = pack_bitmap_filename(packfile);
+ fd = git_open_noatime(idx_name);
+ free(idx_name);
+
+ if (fd < 0)
+ return -1;
+
+ if (fstat(fd, &st)) {
+ close(fd);
+ return -1;
+ }
+
+ if (bitmap_git.pack) {
+ warning("ignoring extra bitmap file: %s", packfile->pack_name);
+ close(fd);
+ return -1;
+ }
+
+ bitmap_git.pack = packfile;
+ bitmap_git.map_size = xsize_t(st.st_size);
+ bitmap_git.map = xmmap(NULL, bitmap_git.map_size, PROT_READ, MAP_PRIVATE, fd, 0);
+ bitmap_git.map_pos = 0;
+ close(fd);
+
+ if (load_bitmap_header(&bitmap_git) < 0) {
+ munmap(bitmap_git.map, bitmap_git.map_size);
+ bitmap_git.map = NULL;
+ bitmap_git.map_size = 0;
+ return -1;
+ }
+
+ return 0;
+}
+
+static int load_pack_bitmap(void)
+{
+ assert(bitmap_git.map && !bitmap_git.loaded);
+
+ bitmap_git.bitmaps = kh_init_sha1();
+ bitmap_git.ext_index.positions = kh_init_sha1_pos();
+ bitmap_git.reverse_index = revindex_for_pack(bitmap_git.pack);
+
+ if (!(bitmap_git.commits = read_bitmap_1(&bitmap_git)) ||
+ !(bitmap_git.trees = read_bitmap_1(&bitmap_git)) ||
+ !(bitmap_git.blobs = read_bitmap_1(&bitmap_git)) ||
+ !(bitmap_git.tags = read_bitmap_1(&bitmap_git)))
+ goto failed;
+
+ if (load_bitmap_entries_v1(&bitmap_git) < 0)
+ goto failed;
+
+ bitmap_git.loaded = 1;
+ return 0;
+
+failed:
+ munmap(bitmap_git.map, bitmap_git.map_size);
+ bitmap_git.map = NULL;
+ bitmap_git.map_size = 0;
+ return -1;
+}
+
+char *pack_bitmap_filename(struct packed_git *p)
+{
+ char *idx_name;
+ int len;
+
+ len = strlen(p->pack_name) - strlen(".pack");
+ idx_name = xmalloc(len + strlen(".bitmap") + 1);
+
+ memcpy(idx_name, p->pack_name, len);
+ memcpy(idx_name + len, ".bitmap", strlen(".bitmap") + 1);
+
+ return idx_name;
+}
+
+static int open_pack_bitmap(void)
+{
+ struct packed_git *p;
+ int ret = -1;
+
+ assert(!bitmap_git.map && !bitmap_git.loaded);
+
+ prepare_packed_git();
+ for (p = packed_git; p; p = p->next) {
+ if (open_pack_bitmap_1(p) == 0)
+ ret = 0;
+ }
+
+ return ret;
+}
+
+int prepare_bitmap_git(void)
+{
+ if (bitmap_git.loaded)
+ return 0;
+
+ if (!open_pack_bitmap())
+ return load_pack_bitmap();
+
+ return -1;
+}
+
+struct include_data {
+ struct bitmap *base;
+ struct bitmap *seen;
+};
+
+static inline int bitmap_position_extended(const unsigned char *sha1)
+{
+ khash_sha1_pos *positions = bitmap_git.ext_index.positions;
+ khiter_t pos = kh_get_sha1_pos(positions, sha1);
+
+ if (pos < kh_end(positions)) {
+ int bitmap_pos = kh_value(positions, pos);
+ return bitmap_pos + bitmap_git.pack->num_objects;
+ }
+
+ return -1;
+}
+
+static inline int bitmap_position_packfile(const unsigned char *sha1)
+{
+ off_t offset = find_pack_entry_one(sha1, bitmap_git.pack);
+ if (!offset)
+ return -1;
+
+ return find_revindex_position(bitmap_git.reverse_index, offset);
+}
+
+static int bitmap_position(const unsigned char *sha1)
+{
+ int pos = bitmap_position_packfile(sha1);
+ return (pos >= 0) ? pos : bitmap_position_extended(sha1);
+}
+
+static int ext_index_add_object(struct object *object, const char *name)
+{
+ struct eindex *eindex = &bitmap_git.ext_index;
+
+ khiter_t hash_pos;
+ int hash_ret;
+ int bitmap_pos;
+
+ hash_pos = kh_put_sha1_pos(eindex->positions, object->sha1, &hash_ret);
+ if (hash_ret > 0) {
+ if (eindex->count >= eindex->alloc) {
+ eindex->alloc = (eindex->alloc + 16) * 3 / 2;
+ eindex->objects = xrealloc(eindex->objects,
+ eindex->alloc * sizeof(struct object *));
+ eindex->hashes = xrealloc(eindex->hashes,
+ eindex->alloc * sizeof(uint32_t));
+ }
+
+ bitmap_pos = eindex->count;
+ eindex->objects[eindex->count] = object;
+ eindex->hashes[eindex->count] = pack_name_hash(name);
+ kh_value(eindex->positions, hash_pos) = bitmap_pos;
+ eindex->count++;
+ } else {
+ bitmap_pos = kh_value(eindex->positions, hash_pos);
+ }
+
+ return bitmap_pos + bitmap_git.pack->num_objects;
+}
+
+static void show_object(struct object *object, const struct name_path *path,
+ const char *last, void *data)
+{
+ struct bitmap *base = data;
+ int bitmap_pos;
+
+ bitmap_pos = bitmap_position(object->sha1);
+
+ if (bitmap_pos < 0) {
+ char *name = path_name(path, last);
+ bitmap_pos = ext_index_add_object(object, name);
+ free(name);
+ }
+
+ bitmap_set(base, bitmap_pos);
+}
+
+static void show_commit(struct commit *commit, void *data)
+{
+}
+
+static int add_to_include_set(struct include_data *data,
+ const unsigned char *sha1,
+ int bitmap_pos)
+{
+ khiter_t hash_pos;
+
+ if (data->seen && bitmap_get(data->seen, bitmap_pos))
+ return 0;
+
+ if (bitmap_get(data->base, bitmap_pos))
+ return 0;
+
+ hash_pos = kh_get_sha1(bitmap_git.bitmaps, sha1);
+ if (hash_pos < kh_end(bitmap_git.bitmaps)) {
+ struct stored_bitmap *st = kh_value(bitmap_git.bitmaps, hash_pos);
+ bitmap_or_ewah(data->base, lookup_stored_bitmap(st));
+ return 0;
+ }
+
+ bitmap_set(data->base, bitmap_pos);
+ return 1;
+}
+
+static int should_include(struct commit *commit, void *_data)
+{
+ struct include_data *data = _data;
+ int bitmap_pos;
+
+ bitmap_pos = bitmap_position(commit->object.sha1);
+ if (bitmap_pos < 0)
+ bitmap_pos = ext_index_add_object((struct object *)commit, NULL);
+
+ if (!add_to_include_set(data, commit->object.sha1, bitmap_pos)) {
+ struct commit_list *parent = commit->parents;
+
+ while (parent) {
+ parent->item->object.flags |= SEEN;
+ parent = parent->next;
+ }
+
+ return 0;
+ }
+
+ return 1;
+}
+
+static struct bitmap *find_objects(struct rev_info *revs,
+ struct object_list *roots,
+ struct bitmap *seen)
+{
+ struct bitmap *base = NULL;
+ int needs_walk = 0;
+
+ struct object_list *not_mapped = NULL;
+
+ /*
+ * Go through all the roots for the walk. The ones that have bitmaps
+ * on the bitmap index will be `or`ed together to form an initial
+ * global reachability analysis.
+ *
+ * The ones without bitmaps in the index will be stored in the
+ * `not_mapped_list` for further processing.
+ */
+ while (roots) {
+ struct object *object = roots->item;
+ roots = roots->next;
+
+ if (object->type == OBJ_COMMIT) {
+ khiter_t pos = kh_get_sha1(bitmap_git.bitmaps, object->sha1);
+
+ if (pos < kh_end(bitmap_git.bitmaps)) {
+ struct stored_bitmap *st = kh_value(bitmap_git.bitmaps, pos);
+ struct ewah_bitmap *or_with = lookup_stored_bitmap(st);
+
+ if (base == NULL)
+ base = ewah_to_bitmap(or_with);
+ else
+ bitmap_or_ewah(base, or_with);
+
+ object->flags |= SEEN;
+ continue;
+ }
+ }
+
+ object_list_insert(object, &not_mapped);
+ }
+
+ /*
+ * Best case scenario: We found bitmaps for all the roots,
+ * so the resulting `or` bitmap has the full reachability analysis
+ */
+ if (not_mapped == NULL)
+ return base;
+
+ roots = not_mapped;
+
+ /*
+ * Let's iterate through all the roots that don't have bitmaps to
+ * check if we can determine them to be reachable from the existing
+ * global bitmap.
+ *
+ * If we cannot find them in the existing global bitmap, we'll need
+ * to push them to an actual walk and run it until we can confirm
+ * they are reachable
+ */
+ while (roots) {
+ struct object *object = roots->item;
+ int pos;
+
+ roots = roots->next;
+ pos = bitmap_position(object->sha1);
+
+ if (pos < 0 || base == NULL || !bitmap_get(base, pos)) {
+ object->flags &= ~UNINTERESTING;
+ add_pending_object(revs, object, "");
+ needs_walk = 1;
+ } else {
+ object->flags |= SEEN;
+ }
+ }
+
+ if (needs_walk) {
+ struct include_data incdata;
+
+ if (base == NULL)
+ base = bitmap_new();
+
+ incdata.base = base;
+ incdata.seen = seen;
+
+ revs->include_check = should_include;
+ revs->include_check_data = &incdata;
+
+ if (prepare_revision_walk(revs))
+ die("revision walk setup failed");
+
+ traverse_commit_list(revs, show_commit, show_object, base);
+ }
+
+ return base;
+}
+
+static void show_extended_objects(struct bitmap *objects,
+ show_reachable_fn show_reach)
+{
+ struct eindex *eindex = &bitmap_git.ext_index;
+ uint32_t i;
+
+ for (i = 0; i < eindex->count; ++i) {
+ struct object *obj;
+
+ if (!bitmap_get(objects, bitmap_git.pack->num_objects + i))
+ continue;
+
+ obj = eindex->objects[i];
+ show_reach(obj->sha1, obj->type, 0, eindex->hashes[i], NULL, 0);
+ }
+}
+
+static void show_objects_for_type(
+ struct bitmap *objects,
+ struct ewah_bitmap *type_filter,
+ enum object_type object_type,
+ show_reachable_fn show_reach)
+{
+ size_t pos = 0, i = 0;
+ uint32_t offset;
+
+ struct ewah_iterator it;
+ eword_t filter;
+
+ if (bitmap_git.reuse_objects == bitmap_git.pack->num_objects)
+ return;
+
+ ewah_iterator_init(&it, type_filter);
+
+ while (i < objects->word_alloc && ewah_iterator_next(&filter, &it)) {
+ eword_t word = objects->words[i] & filter;
+
+ for (offset = 0; offset < BITS_IN_WORD; ++offset) {
+ const unsigned char *sha1;
+ struct revindex_entry *entry;
+ uint32_t hash = 0;
+
+ if ((word >> offset) == 0)
+ break;
+
+ offset += ewah_bit_ctz64(word >> offset);
+
+ if (pos + offset < bitmap_git.reuse_objects)
+ continue;
+
+ entry = &bitmap_git.reverse_index->revindex[pos + offset];
+ sha1 = nth_packed_object_sha1(bitmap_git.pack, entry->nr);
+
+ if (bitmap_git.hashes)
+ hash = ntohl(bitmap_git.hashes[entry->nr]);
+
+ show_reach(sha1, object_type, 0, hash, bitmap_git.pack, entry->offset);
+ }
+
+ pos += BITS_IN_WORD;
+ i++;
+ }
+}
+
+static int in_bitmapped_pack(struct object_list *roots)
+{
+ while (roots) {
+ struct object *object = roots->item;
+ roots = roots->next;
+
+ if (find_pack_entry_one(object->sha1, bitmap_git.pack) > 0)
+ return 1;
+ }
+
+ return 0;
+}
+
+int prepare_bitmap_walk(struct rev_info *revs)
+{
+ unsigned int i;
+ unsigned int pending_nr = revs->pending.nr;
+ struct object_array_entry *pending_e = revs->pending.objects;
+
+ struct object_list *wants = NULL;
+ struct object_list *haves = NULL;
+
+ struct bitmap *wants_bitmap = NULL;
+ struct bitmap *haves_bitmap = NULL;
+
+ if (!bitmap_git.loaded) {
+ /* try to open a bitmapped pack, but don't parse it yet
+ * because we may not need to use it */
+ if (open_pack_bitmap() < 0)
+ return -1;
+ }
+
+ for (i = 0; i < pending_nr; ++i) {
+ struct object *object = pending_e[i].item;
+
+ if (object->type == OBJ_NONE)
+ parse_object_or_die(object->sha1, NULL);
+
+ while (object->type == OBJ_TAG) {
+ struct tag *tag = (struct tag *) object;
+
+ if (object->flags & UNINTERESTING)
+ object_list_insert(object, &haves);
+ else
+ object_list_insert(object, &wants);
+
+ if (!tag->tagged)
+ die("bad tag");
+ object = parse_object_or_die(tag->tagged->sha1, NULL);
+ }
+
+ if (object->flags & UNINTERESTING)
+ object_list_insert(object, &haves);
+ else
+ object_list_insert(object, &wants);
+ }
+
+ /*
+ * if we have a HAVES list, but none of those haves is contained
+ * in the packfile that has a bitmap, we don't have anything to
+ * optimize here
+ */
+ if (haves && !in_bitmapped_pack(haves))
+ return -1;
+
+ /* if we don't want anything, we're done here */
+ if (!wants)
+ return -1;
+
+ /*
+ * now we're going to use bitmaps, so load the actual bitmap entries
+ * from disk. this is the point of no return; after this the rev_list
+ * becomes invalidated and we must perform the revwalk through bitmaps
+ */
+ if (!bitmap_git.loaded && load_pack_bitmap() < 0)
+ return -1;
+
+ revs->pending.nr = 0;
+ revs->pending.alloc = 0;
+ revs->pending.objects = NULL;
+
+ if (haves) {
+ haves_bitmap = find_objects(revs, haves, NULL);
+ reset_revision_walk();
+
+ if (haves_bitmap == NULL)
+ die("BUG: failed to perform bitmap walk");
+ }
+
+ wants_bitmap = find_objects(revs, wants, haves_bitmap);
+
+ if (!wants_bitmap)
+ die("BUG: failed to perform bitmap walk");
+
+ if (haves_bitmap)
+ bitmap_and_not(wants_bitmap, haves_bitmap);
+
+ bitmap_git.result = wants_bitmap;
+
+ bitmap_free(haves_bitmap);
+ return 0;
+}
+
+int reuse_partial_packfile_from_bitmap(struct packed_git **packfile,
+ uint32_t *entries,
+ off_t *up_to)
+{
+ /*
+ * Reuse the packfile content if we need more than
+ * 90% of its objects
+ */
+ static const double REUSE_PERCENT = 0.9;
+
+ struct bitmap *result = bitmap_git.result;
+ uint32_t reuse_threshold;
+ uint32_t i, reuse_objects = 0;
+
+ assert(result);
+
+ for (i = 0; i < result->word_alloc; ++i) {
+ if (result->words[i] != (eword_t)~0) {
+ reuse_objects += ewah_bit_ctz64(~result->words[i]);
+ break;
+ }
+
+ reuse_objects += BITS_IN_WORD;
+ }
+
+#ifdef GIT_BITMAP_DEBUG
+ {
+ const unsigned char *sha1;
+ struct revindex_entry *entry;
+
+ entry = &bitmap_git.reverse_index->revindex[reuse_objects];
+ sha1 = nth_packed_object_sha1(bitmap_git.pack, entry->nr);
+
+ fprintf(stderr, "Failed to reuse at %d (%016llx)\n",
+ reuse_objects, result->words[i]);
+ fprintf(stderr, " %s\n", sha1_to_hex(sha1));
+ }
+#endif
+
+ if (!reuse_objects)
+ return -1;
+
+ if (reuse_objects >= bitmap_git.pack->num_objects) {
+ bitmap_git.reuse_objects = *entries = bitmap_git.pack->num_objects;
+ *up_to = -1; /* reuse the full pack */
+ *packfile = bitmap_git.pack;
+ return 0;
+ }
+
+ reuse_threshold = bitmap_popcount(bitmap_git.result) * REUSE_PERCENT;
+
+ if (reuse_objects < reuse_threshold)
+ return -1;
+
+ bitmap_git.reuse_objects = *entries = reuse_objects;
+ *up_to = bitmap_git.reverse_index->revindex[reuse_objects].offset;
+ *packfile = bitmap_git.pack;
+
+ return 0;
+}
+
+void traverse_bitmap_commit_list(show_reachable_fn show_reachable)
+{
+ assert(bitmap_git.result);
+
+ show_objects_for_type(bitmap_git.result, bitmap_git.commits,
+ OBJ_COMMIT, show_reachable);
+ show_objects_for_type(bitmap_git.result, bitmap_git.trees,
+ OBJ_TREE, show_reachable);
+ show_objects_for_type(bitmap_git.result, bitmap_git.blobs,
+ OBJ_BLOB, show_reachable);
+ show_objects_for_type(bitmap_git.result, bitmap_git.tags,
+ OBJ_TAG, show_reachable);
+
+ show_extended_objects(bitmap_git.result, show_reachable);
+
+ bitmap_free(bitmap_git.result);
+ bitmap_git.result = NULL;
+}
+
+static uint32_t count_object_type(struct bitmap *objects,
+ enum object_type type)
+{
+ struct eindex *eindex = &bitmap_git.ext_index;
+
+ uint32_t i = 0, count = 0;
+ struct ewah_iterator it;
+ eword_t filter;
+
+ switch (type) {
+ case OBJ_COMMIT:
+ ewah_iterator_init(&it, bitmap_git.commits);
+ break;
+
+ case OBJ_TREE:
+ ewah_iterator_init(&it, bitmap_git.trees);
+ break;
+
+ case OBJ_BLOB:
+ ewah_iterator_init(&it, bitmap_git.blobs);
+ break;
+
+ case OBJ_TAG:
+ ewah_iterator_init(&it, bitmap_git.tags);
+ break;
+
+ default:
+ return 0;
+ }
+
+ while (i < objects->word_alloc && ewah_iterator_next(&filter, &it)) {
+ eword_t word = objects->words[i++] & filter;
+ count += ewah_bit_popcount64(word);
+ }
+
+ for (i = 0; i < eindex->count; ++i) {
+ if (eindex->objects[i]->type == type &&
+ bitmap_get(objects, bitmap_git.pack->num_objects + i))
+ count++;
+ }
+
+ return count;
+}
+
+void count_bitmap_commit_list(uint32_t *commits, uint32_t *trees,
+ uint32_t *blobs, uint32_t *tags)
+{
+ assert(bitmap_git.result);
+
+ if (commits)
+ *commits = count_object_type(bitmap_git.result, OBJ_COMMIT);
+
+ if (trees)
+ *trees = count_object_type(bitmap_git.result, OBJ_TREE);
+
+ if (blobs)
+ *blobs = count_object_type(bitmap_git.result, OBJ_BLOB);
+
+ if (tags)
+ *tags = count_object_type(bitmap_git.result, OBJ_TAG);
+}
+
+struct bitmap_test_data {
+ struct bitmap *base;
+ struct progress *prg;
+ size_t seen;
+};
+
+static void test_show_object(struct object *object,
+ const struct name_path *path,
+ const char *last, void *data)
+{
+ struct bitmap_test_data *tdata = data;
+ int bitmap_pos;
+
+ bitmap_pos = bitmap_position(object->sha1);
+ if (bitmap_pos < 0)
+ die("Object not in bitmap: %s\n", sha1_to_hex(object->sha1));
+
+ bitmap_set(tdata->base, bitmap_pos);
+ display_progress(tdata->prg, ++tdata->seen);
+}
+
+static void test_show_commit(struct commit *commit, void *data)
+{
+ struct bitmap_test_data *tdata = data;
+ int bitmap_pos;
+
+ bitmap_pos = bitmap_position(commit->object.sha1);
+ if (bitmap_pos < 0)
+ die("Object not in bitmap: %s\n", sha1_to_hex(commit->object.sha1));
+
+ bitmap_set(tdata->base, bitmap_pos);
+ display_progress(tdata->prg, ++tdata->seen);
+}
+
+void test_bitmap_walk(struct rev_info *revs)
+{
+ struct object *root;
+ struct bitmap *result = NULL;
+ khiter_t pos;
+ size_t result_popcnt;
+ struct bitmap_test_data tdata;
+
+ if (prepare_bitmap_git())
+ die("failed to load bitmap indexes");
+
+ if (revs->pending.nr != 1)
+ die("you must specify exactly one commit to test");
+
+ fprintf(stderr, "Bitmap v%d test (%d entries loaded)\n",
+ bitmap_git.version, bitmap_git.entry_count);
+
+ root = revs->pending.objects[0].item;
+ pos = kh_get_sha1(bitmap_git.bitmaps, root->sha1);
+
+ if (pos < kh_end(bitmap_git.bitmaps)) {
+ struct stored_bitmap *st = kh_value(bitmap_git.bitmaps, pos);
+ struct ewah_bitmap *bm = lookup_stored_bitmap(st);
+
+ fprintf(stderr, "Found bitmap for %s. %d bits / %08x checksum\n",
+ sha1_to_hex(root->sha1), (int)bm->bit_size, ewah_checksum(bm));
+
+ result = ewah_to_bitmap(bm);
+ }
+
+ if (result == NULL)
+ die("Commit %s doesn't have an indexed bitmap", sha1_to_hex(root->sha1));
+
+ revs->tag_objects = 1;
+ revs->tree_objects = 1;
+ revs->blob_objects = 1;
+
+ result_popcnt = bitmap_popcount(result);
+
+ if (prepare_revision_walk(revs))
+ die("revision walk setup failed");
+
+ tdata.base = bitmap_new();
+ tdata.prg = start_progress("Verifying bitmap entries", result_popcnt);
+ tdata.seen = 0;
+
+ traverse_commit_list(revs, &test_show_commit, &test_show_object, &tdata);
+
+ stop_progress(&tdata.prg);
+
+ if (bitmap_equals(result, tdata.base))
+ fprintf(stderr, "OK!\n");
+ else
+ fprintf(stderr, "Mismatch!\n");
+}
+
+static int rebuild_bitmap(uint32_t *reposition,
+ struct ewah_bitmap *source,
+ struct bitmap *dest)
+{
+ uint32_t pos = 0;
+ struct ewah_iterator it;
+ eword_t word;
+
+ ewah_iterator_init(&it, source);
+
+ while (ewah_iterator_next(&word, &it)) {
+ uint32_t offset, bit_pos;
+
+ for (offset = 0; offset < BITS_IN_WORD; ++offset) {
+ if ((word >> offset) == 0)
+ break;
+
+ offset += ewah_bit_ctz64(word >> offset);
+
+ bit_pos = reposition[pos + offset];
+ if (bit_pos > 0)
+ bitmap_set(dest, bit_pos - 1);
+ else /* can't reuse, we don't have the object */
+ return -1;
+ }
+
+ pos += BITS_IN_WORD;
+ }
+ return 0;
+}
+
+int rebuild_existing_bitmaps(struct packing_data *mapping,
+ khash_sha1 *reused_bitmaps,
+ int show_progress)
+{
+ uint32_t i, num_objects;
+ uint32_t *reposition;
+ struct bitmap *rebuild;
+ struct stored_bitmap *stored;
+ struct progress *progress = NULL;
+
+ khiter_t hash_pos;
+ int hash_ret;
+
+ if (prepare_bitmap_git() < 0)
+ return -1;
+
+ num_objects = bitmap_git.pack->num_objects;
+ reposition = xcalloc(num_objects, sizeof(uint32_t));
+
+ for (i = 0; i < num_objects; ++i) {
+ const unsigned char *sha1;
+ struct revindex_entry *entry;
+ struct object_entry *oe;
+
+ entry = &bitmap_git.reverse_index->revindex[i];
+ sha1 = nth_packed_object_sha1(bitmap_git.pack, entry->nr);
+ oe = packlist_find(mapping, sha1, NULL);
+
+ if (oe)
+ reposition[i] = oe->in_pack_pos + 1;
+ }
+
+ rebuild = bitmap_new();
+ i = 0;
+
+ if (show_progress)
+ progress = start_progress("Reusing bitmaps", 0);
+
+ kh_foreach_value(bitmap_git.bitmaps, stored, {
+ if (stored->flags & BITMAP_FLAG_REUSE) {
+ if (!rebuild_bitmap(reposition,
+ lookup_stored_bitmap(stored),
+ rebuild)) {
+ hash_pos = kh_put_sha1(reused_bitmaps,
+ stored->sha1,
+ &hash_ret);
+ kh_value(reused_bitmaps, hash_pos) =
+ bitmap_to_ewah(rebuild);
+ }
+ bitmap_reset(rebuild);
+ display_progress(progress, ++i);
+ }
+ });
+
+ stop_progress(&progress);
+
+ free(reposition);
+ bitmap_free(rebuild);
+ return 0;
+}
diff --git a/pack-bitmap.h b/pack-bitmap.h
new file mode 100644
index 0000000000..8b7f4e9f0d
--- /dev/null
+++ b/pack-bitmap.h
@@ -0,0 +1,64 @@
+#ifndef PACK_BITMAP_H
+#define PACK_BITMAP_H
+
+#include "ewah/ewok.h"
+#include "khash.h"
+#include "pack-objects.h"
+
+struct bitmap_disk_entry {
+ uint32_t object_pos;
+ uint8_t xor_offset;
+ uint8_t flags;
+} __attribute__((packed));
+
+struct bitmap_disk_header {
+ char magic[4];
+ uint16_t version;
+ uint16_t options;
+ uint32_t entry_count;
+ unsigned char checksum[20];
+};
+
+static const char BITMAP_IDX_SIGNATURE[] = {'B', 'I', 'T', 'M'};
+
+#define NEEDS_BITMAP (1u<<22)
+
+enum pack_bitmap_opts {
+ BITMAP_OPT_FULL_DAG = 1,
+ BITMAP_OPT_HASH_CACHE = 4,
+};
+
+enum pack_bitmap_flags {
+ BITMAP_FLAG_REUSE = 0x1
+};
+
+typedef int (*show_reachable_fn)(
+ const unsigned char *sha1,
+ enum object_type type,
+ int flags,
+ uint32_t hash,
+ struct packed_git *found_pack,
+ off_t found_offset);
+
+int prepare_bitmap_git(void);
+void count_bitmap_commit_list(uint32_t *commits, uint32_t *trees, uint32_t *blobs, uint32_t *tags);
+void traverse_bitmap_commit_list(show_reachable_fn show_reachable);
+void test_bitmap_walk(struct rev_info *revs);
+char *pack_bitmap_filename(struct packed_git *p);
+int prepare_bitmap_walk(struct rev_info *revs);
+int reuse_partial_packfile_from_bitmap(struct packed_git **packfile, uint32_t *entries, off_t *up_to);
+int rebuild_existing_bitmaps(struct packing_data *mapping, khash_sha1 *reused_bitmaps, int show_progress);
+
+void bitmap_writer_show_progress(int show);
+void bitmap_writer_set_checksum(unsigned char *sha1);
+void bitmap_writer_build_type_index(struct pack_idx_entry **index, uint32_t index_nr);
+void bitmap_writer_reuse_bitmaps(struct packing_data *to_pack);
+void bitmap_writer_select_commits(struct commit **indexed_commits,
+ unsigned int indexed_commits_nr, int max_bitmaps);
+void bitmap_writer_build(struct packing_data *to_pack);
+void bitmap_writer_finish(struct pack_idx_entry **index,
+ uint32_t index_nr,
+ const char *filename,
+ uint16_t options);
+
+#endif
diff --git a/pack-objects.c b/pack-objects.c
new file mode 100644
index 0000000000..d01d851ce9
--- /dev/null
+++ b/pack-objects.c
@@ -0,0 +1,111 @@
+#include "cache.h"
+#include "object.h"
+#include "pack.h"
+#include "pack-objects.h"
+
+static uint32_t locate_object_entry_hash(struct packing_data *pdata,
+ const unsigned char *sha1,
+ int *found)
+{
+ uint32_t i, hash, mask = (pdata->index_size - 1);
+
+ memcpy(&hash, sha1, sizeof(uint32_t));
+ i = hash & mask;
+
+ while (pdata->index[i] > 0) {
+ uint32_t pos = pdata->index[i] - 1;
+
+ if (!hashcmp(sha1, pdata->objects[pos].idx.sha1)) {
+ *found = 1;
+ return i;
+ }
+
+ i = (i + 1) & mask;
+ }
+
+ *found = 0;
+ return i;
+}
+
+static inline uint32_t closest_pow2(uint32_t v)
+{
+ v = v - 1;
+ v |= v >> 1;
+ v |= v >> 2;
+ v |= v >> 4;
+ v |= v >> 8;
+ v |= v >> 16;
+ return v + 1;
+}
+
+static void rehash_objects(struct packing_data *pdata)
+{
+ uint32_t i;
+ struct object_entry *entry;
+
+ pdata->index_size = closest_pow2(pdata->nr_objects * 3);
+ if (pdata->index_size < 1024)
+ pdata->index_size = 1024;
+
+ pdata->index = xrealloc(pdata->index, sizeof(uint32_t) * pdata->index_size);
+ memset(pdata->index, 0, sizeof(int) * pdata->index_size);
+
+ entry = pdata->objects;
+
+ for (i = 0; i < pdata->nr_objects; i++) {
+ int found;
+ uint32_t ix = locate_object_entry_hash(pdata, entry->idx.sha1, &found);
+
+ if (found)
+ die("BUG: Duplicate object in hash");
+
+ pdata->index[ix] = i + 1;
+ entry++;
+ }
+}
+
+struct object_entry *packlist_find(struct packing_data *pdata,
+ const unsigned char *sha1,
+ uint32_t *index_pos)
+{
+ uint32_t i;
+ int found;
+
+ if (!pdata->index_size)
+ return NULL;
+
+ i = locate_object_entry_hash(pdata, sha1, &found);
+
+ if (index_pos)
+ *index_pos = i;
+
+ if (!found)
+ return NULL;
+
+ return &pdata->objects[pdata->index[i] - 1];
+}
+
+struct object_entry *packlist_alloc(struct packing_data *pdata,
+ const unsigned char *sha1,
+ uint32_t index_pos)
+{
+ struct object_entry *new_entry;
+
+ if (pdata->nr_objects >= pdata->nr_alloc) {
+ pdata->nr_alloc = (pdata->nr_alloc + 1024) * 3 / 2;
+ pdata->objects = xrealloc(pdata->objects,
+ pdata->nr_alloc * sizeof(*new_entry));
+ }
+
+ new_entry = pdata->objects + pdata->nr_objects++;
+
+ memset(new_entry, 0, sizeof(*new_entry));
+ hashcpy(new_entry->idx.sha1, sha1);
+
+ if (pdata->index_size * 3 <= pdata->nr_objects * 4)
+ rehash_objects(pdata);
+ else
+ pdata->index[index_pos] = pdata->nr_objects;
+
+ return new_entry;
+}
diff --git a/pack-objects.h b/pack-objects.h
new file mode 100644
index 0000000000..d1b98b30ff
--- /dev/null
+++ b/pack-objects.h
@@ -0,0 +1,68 @@
+#ifndef PACK_OBJECTS_H
+#define PACK_OBJECTS_H
+
+struct object_entry {
+ struct pack_idx_entry idx;
+ unsigned long size; /* uncompressed size */
+ struct packed_git *in_pack; /* already in pack */
+ off_t in_pack_offset;
+ struct object_entry *delta; /* delta base object */
+ struct object_entry *delta_child; /* deltified objects who bases me */
+ struct object_entry *delta_sibling; /* other deltified objects who
+ * uses the same base as me
+ */
+ void *delta_data; /* cached delta (uncompressed) */
+ unsigned long delta_size; /* delta data size (uncompressed) */
+ unsigned long z_delta_size; /* delta data size (compressed) */
+ enum object_type type;
+ enum object_type in_pack_type; /* could be delta */
+ uint32_t hash; /* name hint hash */
+ unsigned int in_pack_pos;
+ unsigned char in_pack_header_size;
+ unsigned preferred_base:1; /*
+ * we do not pack this, but is available
+ * to be used as the base object to delta
+ * objects against.
+ */
+ unsigned no_try_delta:1;
+ unsigned tagged:1; /* near the very tip of refs */
+ unsigned filled:1; /* assigned write-order */
+};
+
+struct packing_data {
+ struct object_entry *objects;
+ uint32_t nr_objects, nr_alloc;
+
+ int32_t *index;
+ uint32_t index_size;
+};
+
+struct object_entry *packlist_alloc(struct packing_data *pdata,
+ const unsigned char *sha1,
+ uint32_t index_pos);
+
+struct object_entry *packlist_find(struct packing_data *pdata,
+ const unsigned char *sha1,
+ uint32_t *index_pos);
+
+static inline uint32_t pack_name_hash(const char *name)
+{
+ uint32_t c, hash = 0;
+
+ if (!name)
+ return 0;
+
+ /*
+ * This effectively just creates a sortable number from the
+ * last sixteen non-whitespace characters. Last characters
+ * count "most", so things that end in ".c" sort together.
+ */
+ while ((c = *name++) != 0) {
+ if (isspace(c))
+ continue;
+ hash = (hash >> 2) + (c << 24);
+ }
+ return hash;
+}
+
+#endif
diff --git a/pack-revindex.c b/pack-revindex.c
index b4d2b35bb3..5bd7c61980 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -16,11 +16,6 @@
* get the object sha1 from the main index.
*/
-struct pack_revindex {
- struct packed_git *p;
- struct revindex_entry *revindex;
-};
-
static struct pack_revindex *pack_revindex;
static int pack_revindex_hashsz;
@@ -201,15 +196,14 @@ static void create_pack_revindex(struct pack_revindex *rix)
sort_revindex(rix->revindex, num_ent, p->pack_size);
}
-struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs)
+struct pack_revindex *revindex_for_pack(struct packed_git *p)
{
int num;
- unsigned lo, hi;
struct pack_revindex *rix;
- struct revindex_entry *revindex;
if (!pack_revindex_hashsz)
init_pack_revindex();
+
num = pack_revindex_ix(p);
if (num < 0)
die("internal error: pack revindex fubar");
@@ -217,30 +211,37 @@ struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs)
rix = &pack_revindex[num];
if (!rix->revindex)
create_pack_revindex(rix);
- revindex = rix->revindex;
- lo = 0;
- hi = p->num_objects + 1;
+ return rix;
+}
+
+int find_revindex_position(struct pack_revindex *pridx, off_t ofs)
+{
+ int lo = 0;
+ int hi = pridx->p->num_objects + 1;
+ struct revindex_entry *revindex = pridx->revindex;
+
do {
unsigned mi = lo + (hi - lo) / 2;
if (revindex[mi].offset == ofs) {
- return revindex + mi;
+ return mi;
} else if (ofs < revindex[mi].offset)
hi = mi;
else
lo = mi + 1;
} while (lo < hi);
+
error("bad offset for revindex");
- return NULL;
+ return -1;
}
-void discard_revindex(void)
+struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs)
{
- if (pack_revindex_hashsz) {
- int i;
- for (i = 0; i < pack_revindex_hashsz; i++)
- free(pack_revindex[i].revindex);
- free(pack_revindex);
- pack_revindex_hashsz = 0;
- }
+ struct pack_revindex *pridx = revindex_for_pack(p);
+ int pos = find_revindex_position(pridx, ofs);
+
+ if (pos < 0)
+ return NULL;
+
+ return pridx->revindex + pos;
}
diff --git a/pack-revindex.h b/pack-revindex.h
index 8d5027ad91..d737f98965 100644
--- a/pack-revindex.h
+++ b/pack-revindex.h
@@ -6,7 +6,14 @@ struct revindex_entry {
unsigned int nr;
};
+struct pack_revindex {
+ struct packed_git *p;
+ struct revindex_entry *revindex;
+};
+
+struct pack_revindex *revindex_for_pack(struct packed_git *p);
+int find_revindex_position(struct pack_revindex *pridx, off_t ofs);
+
struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs);
-void discard_revindex(void);
#endif
diff --git a/pack-write.c b/pack-write.c
index 605d01b25c..9b8308b759 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -364,5 +364,7 @@ void finish_tmp_packfile(char *name_buffer,
if (rename(idx_tmp_name, name_buffer))
die_errno("unable to rename temporary index file");
+ *end_of_name_prefix = '\0';
+
free((void *)idx_tmp_name);
}
diff --git a/read-cache.c b/read-cache.c
index 6d8ee3a792..fb440b4d9d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1314,26 +1314,6 @@ int read_index(struct index_state *istate)
return read_index_from(istate, get_index_file());
}
-#ifndef NEEDS_ALIGNED_ACCESS
-#define ntoh_s(var) ntohs(var)
-#define ntoh_l(var) ntohl(var)
-#else
-static inline uint16_t ntoh_s_force_align(void *p)
-{
- uint16_t x;
- memcpy(&x, p, sizeof(x));
- return ntohs(x);
-}
-static inline uint32_t ntoh_l_force_align(void *p)
-{
- uint32_t x;
- memcpy(&x, p, sizeof(x));
- return ntohl(x);
-}
-#define ntoh_s(var) ntoh_s_force_align(&(var))
-#define ntoh_l(var) ntoh_l_force_align(&(var))
-#endif
-
static struct cache_entry *cache_entry_from_ondisk(struct ondisk_cache_entry *ondisk,
unsigned int flags,
const char *name,
@@ -1341,16 +1321,16 @@ static struct cache_entry *cache_entry_from_ondisk(struct ondisk_cache_entry *on
{
struct cache_entry *ce = xmalloc(cache_entry_size(len));
- ce->ce_stat_data.sd_ctime.sec = ntoh_l(ondisk->ctime.sec);
- ce->ce_stat_data.sd_mtime.sec = ntoh_l(ondisk->mtime.sec);
- ce->ce_stat_data.sd_ctime.nsec = ntoh_l(ondisk->ctime.nsec);
- ce->ce_stat_data.sd_mtime.nsec = ntoh_l(ondisk->mtime.nsec);
- ce->ce_stat_data.sd_dev = ntoh_l(ondisk->dev);
- ce->ce_stat_data.sd_ino = ntoh_l(ondisk->ino);
- ce->ce_mode = ntoh_l(ondisk->mode);
- ce->ce_stat_data.sd_uid = ntoh_l(ondisk->uid);
- ce->ce_stat_data.sd_gid = ntoh_l(ondisk->gid);
- ce->ce_stat_data.sd_size = ntoh_l(ondisk->size);
+ ce->ce_stat_data.sd_ctime.sec = get_be32(&ondisk->ctime.sec);
+ ce->ce_stat_data.sd_mtime.sec = get_be32(&ondisk->mtime.sec);
+ ce->ce_stat_data.sd_ctime.nsec = get_be32(&ondisk->ctime.nsec);
+ ce->ce_stat_data.sd_mtime.nsec = get_be32(&ondisk->mtime.nsec);
+ ce->ce_stat_data.sd_dev = get_be32(&ondisk->dev);
+ ce->ce_stat_data.sd_ino = get_be32(&ondisk->ino);
+ ce->ce_mode = get_be32(&ondisk->mode);
+ ce->ce_stat_data.sd_uid = get_be32(&ondisk->uid);
+ ce->ce_stat_data.sd_gid = get_be32(&ondisk->gid);
+ ce->ce_stat_data.sd_size = get_be32(&ondisk->size);
ce->ce_flags = flags & ~CE_NAMEMASK;
ce->ce_namelen = len;
hashcpy(ce->sha1, ondisk->sha1);
@@ -1390,14 +1370,14 @@ static struct cache_entry *create_from_disk(struct ondisk_cache_entry *ondisk,
unsigned int flags;
/* On-disk flags are just 16 bits */
- flags = ntoh_s(ondisk->flags);
+ flags = get_be16(&ondisk->flags);
len = flags & CE_NAMEMASK;
if (flags & CE_EXTENDED) {
struct ondisk_cache_entry_extended *ondisk2;
int extended_flags;
ondisk2 = (struct ondisk_cache_entry_extended *)ondisk;
- extended_flags = ntoh_s(ondisk2->flags2) << 16;
+ extended_flags = get_be16(&ondisk2->flags2) << 16;
/* We do not yet understand any bit out of CE_EXTENDED_FLAGS */
if (extended_flags & ~CE_EXTENDED_FLAGS)
die("Unknown index entry format %08x", extended_flags);
diff --git a/revision.c b/revision.c
index 61be08754a..bd027bc015 100644
--- a/revision.c
+++ b/revision.c
@@ -774,6 +774,10 @@ static int add_parents_to_list(struct rev_info *revs, struct commit *commit,
return 0;
commit->object.flags |= ADDED;
+ if (revs->include_check &&
+ !revs->include_check(commit, revs->include_check_data))
+ return 0;
+
/*
* If the commit is uninteresting, don't try to
* prune parents - we want the maximal uninteresting
diff --git a/revision.h b/revision.h
index 88967d6a24..1eb94c1548 100644
--- a/revision.h
+++ b/revision.h
@@ -172,6 +172,8 @@ struct rev_info {
unsigned long min_age;
int min_parents;
int max_parents;
+ int (*include_check)(struct commit *, void *);
+ void *include_check_data;
/* diff info for patches and for paths limiting */
struct diff_options diffopt;
diff --git a/sha1_file.c b/sha1_file.c
index 6e8c05d108..019628add5 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -252,8 +252,6 @@ char *sha1_pack_index_name(const unsigned char *sha1)
struct alternate_object_database *alt_odb_list;
static struct alternate_object_database **alt_odb_tail;
-static int git_open_noatime(const char *name);
-
/*
* Prepare alternate object database registry.
*
@@ -1232,6 +1230,7 @@ static void prepare_packed_git_one(char *objdir, int local)
if (has_extension(de->d_name, ".idx") ||
has_extension(de->d_name, ".pack") ||
+ has_extension(de->d_name, ".bitmap") ||
has_extension(de->d_name, ".keep"))
string_list_append(&garbage, path);
else
@@ -1316,7 +1315,6 @@ void prepare_packed_git(void)
void reprepare_packed_git(void)
{
- discard_revindex();
prepare_packed_git_run_once = 0;
prepare_packed_git();
}
@@ -1393,7 +1391,7 @@ int check_sha1_signature(const unsigned char *sha1, void *map,
return hashcmp(sha1, real_sha1) ? -1 : 0;
}
-static int git_open_noatime(const char *name)
+int git_open_noatime(const char *name)
{
static int sha1_file_open_flag = O_NOATIME;
diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
new file mode 100755
index 0000000000..685d46f8b7
--- /dev/null
+++ b/t/perf/p5310-pack-bitmaps.sh
@@ -0,0 +1,57 @@
+#!/bin/sh
+
+test_description='Tests pack performance using bitmaps'
+. ./perf-lib.sh
+
+test_perf_large_repo
+
+# note that we do everything through config,
+# since we want to be able to compare bitmap-aware
+# git versus non-bitmap git
+test_expect_success 'setup bitmap config' '
+ git config pack.writebitmaps true &&
+ git config pack.writebitmaphashcache true
+'
+
+test_perf 'repack to disk' '
+ git repack -ad
+'
+
+test_perf 'simulated clone' '
+ git pack-objects --stdout --all </dev/null >/dev/null
+'
+
+test_perf 'simulated fetch' '
+ have=$(git rev-list HEAD~100 -1) &&
+ {
+ echo HEAD &&
+ echo ^$have
+ } | git pack-objects --revs --stdout >/dev/null
+'
+
+test_expect_success 'create partial bitmap state' '
+ # pick a commit to represent the repo tip in the past
+ cutoff=$(git rev-list HEAD~100 -1) &&
+ orig_tip=$(git rev-parse HEAD) &&
+
+ # now kill off all of the refs and pretend we had
+ # just the one tip
+ rm -rf .git/logs .git/refs/* .git/packed-refs
+ git update-ref HEAD $cutoff
+
+ # and then repack, which will leave us with a nice
+ # big bitmap pack of the "old" history, and all of
+ # the new history will be loose, as if it had been pushed
+ # up incrementally and exploded via unpack-objects
+ git repack -Ad
+
+ # and now restore our original tip, as if the pushes
+ # had happened
+ git update-ref HEAD $orig_tip
+'
+
+test_perf 'partial bitmap' '
+ git pack-objects --stdout --all </dev/null >/dev/null
+'
+
+test_done
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
new file mode 100755
index 0000000000..d3a3afaba8
--- /dev/null
+++ b/t/t5310-pack-bitmaps.sh
@@ -0,0 +1,139 @@
+#!/bin/sh
+
+test_description='exercise basic bitmap functionality'
+. ./test-lib.sh
+
+test_expect_success 'setup repo with moderate-sized history' '
+ for i in $(test_seq 1 10); do
+ test_commit $i
+ done &&
+ git checkout -b other HEAD~5 &&
+ for i in $(test_seq 1 10); do
+ test_commit side-$i
+ done &&
+ git checkout master &&
+ blob=$(echo tagged-blob | git hash-object -w --stdin) &&
+ git tag tagged-blob $blob &&
+ git config pack.writebitmaps true &&
+ git config pack.writebitmaphashcache true
+'
+
+test_expect_success 'full repack creates bitmaps' '
+ git repack -ad &&
+ ls .git/objects/pack/ | grep bitmap >output &&
+ test_line_count = 1 output
+'
+
+test_expect_success 'rev-list --test-bitmap verifies bitmaps' '
+ git rev-list --test-bitmap HEAD
+'
+
+rev_list_tests() {
+ state=$1
+
+ test_expect_success "counting commits via bitmap ($state)" '
+ git rev-list --count HEAD >expect &&
+ git rev-list --use-bitmap-index --count HEAD >actual &&
+ test_cmp expect actual
+ '
+
+ test_expect_success "counting partial commits via bitmap ($state)" '
+ git rev-list --count HEAD~5..HEAD >expect &&
+ git rev-list --use-bitmap-index --count HEAD~5..HEAD >actual &&
+ test_cmp expect actual
+ '
+
+ test_expect_success "counting non-linear history ($state)" '
+ git rev-list --count other...master >expect &&
+ git rev-list --use-bitmap-index --count other...master >actual &&
+ test_cmp expect actual
+ '
+
+ test_expect_success "enumerate --objects ($state)" '
+ git rev-list --objects --use-bitmap-index HEAD >tmp &&
+ cut -d" " -f1 <tmp >tmp2 &&
+ sort <tmp2 >actual &&
+ git rev-list --objects HEAD >tmp &&
+ cut -d" " -f1 <tmp >tmp2 &&
+ sort <tmp2 >expect &&
+ test_cmp expect actual
+ '
+
+ test_expect_success "bitmap --objects handles non-commit objects ($state)" '
+ git rev-list --objects --use-bitmap-index HEAD tagged-blob >actual &&
+ grep $blob actual
+ '
+}
+
+rev_list_tests 'full bitmap'
+
+test_expect_success 'clone from bitmapped repository' '
+ git clone --no-local --bare . clone.git &&
+ git rev-parse HEAD >expect &&
+ git --git-dir=clone.git rev-parse HEAD >actual &&
+ test_cmp expect actual
+'
+
+test_expect_success 'setup further non-bitmapped commits' '
+ for i in $(test_seq 1 10); do
+ test_commit further-$i
+ done
+'
+
+rev_list_tests 'partial bitmap'
+
+test_expect_success 'fetch (partial bitmap)' '
+ git --git-dir=clone.git fetch origin master:master &&
+ git rev-parse HEAD >expect &&
+ git --git-dir=clone.git rev-parse HEAD >actual &&
+ test_cmp expect actual
+'
+
+test_expect_success 'incremental repack cannot create bitmaps' '
+ test_commit more-1 &&
+ test_must_fail git repack -d
+'
+
+test_expect_success 'incremental repack can disable bitmaps' '
+ test_commit more-2 &&
+ git repack -d --no-write-bitmap-index
+'
+
+test_expect_success 'full repack, reusing previous bitmaps' '
+ git repack -ad &&
+ ls .git/objects/pack/ | grep bitmap >output &&
+ test_line_count = 1 output
+'
+
+test_expect_success 'fetch (full bitmap)' '
+ git --git-dir=clone.git fetch origin master:master &&
+ git rev-parse HEAD >expect &&
+ git --git-dir=clone.git rev-parse HEAD >actual &&
+ test_cmp expect actual
+'
+
+test_lazy_prereq JGIT '
+ type jgit
+'
+
+test_expect_success JGIT 'we can read jgit bitmaps' '
+ git clone . compat-jgit &&
+ (
+ cd compat-jgit &&
+ rm -f .git/objects/pack/*.bitmap &&
+ jgit gc &&
+ git rev-list --test-bitmap HEAD
+ )
+'
+
+test_expect_success JGIT 'jgit can read our bitmaps' '
+ git clone . compat-us &&
+ (
+ cd compat-us &&
+ git repack -adb &&
+ # jgit gc will barf if it does not like our bitmaps
+ jgit gc
+ )
+'
+
+test_done