diff options
author | Michael Haggerty <mhagger@alum.mit.edu> | 2016-06-18 06:15:15 +0200 |
---|---|---|
committer | Junio C Hamano <gitster@pobox.com> | 2016-06-20 11:38:20 -0700 |
commit | 3bc581b9406e1e9a3f879d379106ee1e3bc48f5c (patch) | |
tree | dbdb5d7b0792434acb2b839eaf638265bce076ee /refs/files-backend.c | |
parent | a873924483ec6ec548e94d723fe62875b1a4ec93 (diff) | |
download | git-3bc581b9406e1e9a3f879d379106ee1e3bc48f5c.tar.gz |
refs: introduce an iterator interface
Currently, the API for iterating over references is via a family of
for_each_ref()-type functions that invoke a callback function for each
selected reference. All of these eventually call do_for_each_ref(),
which knows how to do one thing: iterate in parallel through two
ref_caches, one for loose and one for packed refs, giving loose
references precedence over packed refs. This is rather complicated code,
and is quite specialized to the files backend. It also requires callers
to encapsulate their work into a callback function, which often means
that they have to define and use a "cb_data" struct to manage their
context.
The current design is already bursting at the seams, and will become
even more awkward in the upcoming world of multiple reference storage
backends:
* Per-worktree vs. shared references are currently handled via a kludge
in git_path() rather than iterating over each part of the reference
namespace separately and merging the results. This kludge will cease
to work when we have multiple reference storage backends.
* The current scheme is inflexible. What if we sometimes want to bypass
the ref_cache, or use it only for packed or only for loose refs? What
if we want to store symbolic refs in one type of storage backend and
non-symbolic ones in another?
In the future, each reference backend will need to define its own way of
iterating over references. The crux of the problem with the current
design is that it is impossible to compose for_each_ref()-style
iterations, because the flow of control is owned by the for_each_ref()
function. There is nothing that a caller can do but iterate through all
references in a single burst, so there is no way for it to interleave
references from multiple backends and present the result to the rest of
the world as a single compound backend.
This commit introduces a new iteration primitive for references: a
ref_iterator. A ref_iterator is a polymorphic object that a reference
storage backend can be asked to instantiate. There are three functions
that can be applied to a ref_iterator:
* ref_iterator_advance(): move to the next reference in the iteration
* ref_iterator_abort(): end the iteration before it is exhausted
* ref_iterator_peel(): peel the reference currently being looked at
Iterating using a ref_iterator leaves the flow of control in the hands
of the caller, which means that ref_iterators from multiple
sources (e.g., loose and packed refs) can be composed and presented to
the world as a single compound ref_iterator.
It also means that the backend code for implementing reference iteration
will sometimes be more complicated. For example, the
cache_ref_iterator (which iterates over a ref_cache) can't use the C
stack to recurse; instead, it must manage its own stack internally as
explicit data structures. There is also a lot of boilerplate connected
with object-oriented programming in C.
Eventually, end-user callers will be able to be written in a more
natural way—managing their own flow of control rather than having to
work via callbacks. Since there will only be a few reference backends
but there are many consumers of this API, this is a good tradeoff.
More importantly, we gain composability, and especially the possibility
of writing interchangeable parts that can work with any ref_iterator.
For example, merge_ref_iterator implements a generic way of merging the
contents of any two ref_iterators. It is used to merge loose + packed
refs as part of the implementation of the files_ref_iterator. But it
will also be possible to use it to merge other pairs of reference
sources (e.g., per-worktree vs. shared refs).
Another example is prefix_ref_iterator, which can be used to trim a
prefix off the front of reference names before presenting them to the
caller (e.g., "refs/heads/master" -> "master").
In this patch, we introduce the iterator abstraction and many utilities,
and implement a reference iterator for the files ref storage backend.
(I've written several other obvious utilities, for example a generic way
to filter references being iterated over. These will probably be useful
in the future. But they are not needed for this patch series, so I am
not including them at this time.)
In a moment we will rewrite do_for_each_ref() to work via reference
iterators (allowing some special-purpose code to be discarded), and do
something similar for reflogs. In future patch series, we will expose
the ref_iterator abstraction in the public refs API so that callers can
use it directly.
Implementation note: I tried abstracting this a layer further to allow
generic iterators (over arbitrary types of objects) and generic
utilities like a generic merge_iterator. But the implementation in C was
very cumbersome, involving (in my opinion) too much boilerplate and too
much unsafe casting, some of which would have had to be done on the
caller side. However, I did put a few iterator-related constants in a
top-level header file, iterator.h, as they will be useful in a moment to
implement iteration over directory trees and possibly other types of
iterators in the future.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'refs/files-backend.c')
-rw-r--r-- | refs/files-backend.c | 281 |
1 files changed, 281 insertions, 0 deletions
diff --git a/refs/files-backend.c b/refs/files-backend.c index 62280b5c70..395056b675 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -1,6 +1,7 @@ #include "../cache.h" #include "../refs.h" #include "refs-internal.h" +#include "../iterator.h" #include "../lockfile.h" #include "../object.h" #include "../dir.h" @@ -704,6 +705,153 @@ static void prime_ref_dir(struct ref_dir *dir) } } +/* + * A level in the reference hierarchy that is currently being iterated + * through. + */ +struct cache_ref_iterator_level { + /* + * The ref_dir being iterated over at this level. The ref_dir + * is sorted before being stored here. + */ + struct ref_dir *dir; + + /* + * The index of the current entry within dir (which might + * itself be a directory). If index == -1, then the iteration + * hasn't yet begun. If index == dir->nr, then the iteration + * through this level is over. + */ + int index; +}; + +/* + * Represent an iteration through a ref_dir in the memory cache. The + * iteration recurses through subdirectories. + */ +struct cache_ref_iterator { + struct ref_iterator base; + + /* + * The number of levels currently on the stack. This is always + * at least 1, because when it becomes zero the iteration is + * ended and this struct is freed. + */ + size_t levels_nr; + + /* The number of levels that have been allocated on the stack */ + size_t levels_alloc; + + /* + * A stack of levels. levels[0] is the uppermost level that is + * being iterated over in this iteration. (This is not + * necessary the top level in the references hierarchy. If we + * are iterating through a subtree, then levels[0] will hold + * the ref_dir for that subtree, and subsequent levels will go + * on from there.) + */ + struct cache_ref_iterator_level *levels; +}; + +static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator) +{ + struct cache_ref_iterator *iter = + (struct cache_ref_iterator *)ref_iterator; + + while (1) { + struct cache_ref_iterator_level *level = + &iter->levels[iter->levels_nr - 1]; + struct ref_dir *dir = level->dir; + struct ref_entry *entry; + + if (level->index == -1) + sort_ref_dir(dir); + + if (++level->index == level->dir->nr) { + /* This level is exhausted; pop up a level */ + if (--iter->levels_nr == 0) + return ref_iterator_abort(ref_iterator); + + continue; + } + + entry = dir->entries[level->index]; + + if (entry->flag & REF_DIR) { + /* push down a level */ + ALLOC_GROW(iter->levels, iter->levels_nr + 1, + iter->levels_alloc); + + level = &iter->levels[iter->levels_nr++]; + level->dir = get_ref_dir(entry); + level->index = -1; + } else { + iter->base.refname = entry->name; + iter->base.oid = &entry->u.value.oid; + iter->base.flags = entry->flag; + return ITER_OK; + } + } +} + +static enum peel_status peel_entry(struct ref_entry *entry, int repeel); + +static int cache_ref_iterator_peel(struct ref_iterator *ref_iterator, + struct object_id *peeled) +{ + struct cache_ref_iterator *iter = + (struct cache_ref_iterator *)ref_iterator; + struct cache_ref_iterator_level *level; + struct ref_entry *entry; + + level = &iter->levels[iter->levels_nr - 1]; + + if (level->index == -1) + die("BUG: peel called before advance for cache iterator"); + + entry = level->dir->entries[level->index]; + + if (peel_entry(entry, 0)) + return -1; + hashcpy(peeled->hash, entry->u.value.peeled.hash); + return 0; +} + +static int cache_ref_iterator_abort(struct ref_iterator *ref_iterator) +{ + struct cache_ref_iterator *iter = + (struct cache_ref_iterator *)ref_iterator; + + free(iter->levels); + base_ref_iterator_free(ref_iterator); + return ITER_DONE; +} + +static struct ref_iterator_vtable cache_ref_iterator_vtable = { + cache_ref_iterator_advance, + cache_ref_iterator_peel, + cache_ref_iterator_abort +}; + +static struct ref_iterator *cache_ref_iterator_begin(struct ref_dir *dir) +{ + struct cache_ref_iterator *iter; + struct ref_iterator *ref_iterator; + struct cache_ref_iterator_level *level; + + iter = xcalloc(1, sizeof(*iter)); + ref_iterator = &iter->base; + base_ref_iterator_init(ref_iterator, &cache_ref_iterator_vtable); + ALLOC_GROW(iter->levels, 10, iter->levels_alloc); + + iter->levels_nr = 1; + level = &iter->levels[0]; + level->index = -1; + level->dir = dir; + + return ref_iterator; +} + struct nonmatching_ref_data { const struct string_list *skip; const char *conflicting_refname; @@ -1843,6 +1991,139 @@ int peel_ref(const char *refname, unsigned char *sha1) return peel_object(base, sha1); } +struct files_ref_iterator { + struct ref_iterator base; + + struct packed_ref_cache *packed_ref_cache; + struct ref_iterator *iter0; + unsigned int flags; +}; + +static int files_ref_iterator_advance(struct ref_iterator *ref_iterator) +{ + struct files_ref_iterator *iter = + (struct files_ref_iterator *)ref_iterator; + int ok; + + while ((ok = ref_iterator_advance(iter->iter0)) == ITER_OK) { + if (!(iter->flags & DO_FOR_EACH_INCLUDE_BROKEN) && + !ref_resolves_to_object(iter->iter0->refname, + iter->iter0->oid, + iter->iter0->flags)) + continue; + + iter->base.refname = iter->iter0->refname; + iter->base.oid = iter->iter0->oid; + iter->base.flags = iter->iter0->flags; + return ITER_OK; + } + + iter->iter0 = NULL; + if (ref_iterator_abort(ref_iterator) != ITER_DONE) + ok = ITER_ERROR; + + return ok; +} + +static int files_ref_iterator_peel(struct ref_iterator *ref_iterator, + struct object_id *peeled) +{ + struct files_ref_iterator *iter = + (struct files_ref_iterator *)ref_iterator; + + return ref_iterator_peel(iter->iter0, peeled); +} + +static int files_ref_iterator_abort(struct ref_iterator *ref_iterator) +{ + struct files_ref_iterator *iter = + (struct files_ref_iterator *)ref_iterator; + int ok = ITER_DONE; + + if (iter->iter0) + ok = ref_iterator_abort(iter->iter0); + + release_packed_ref_cache(iter->packed_ref_cache); + base_ref_iterator_free(ref_iterator); + return ok; +} + +static struct ref_iterator_vtable files_ref_iterator_vtable = { + files_ref_iterator_advance, + files_ref_iterator_peel, + files_ref_iterator_abort +}; + +struct ref_iterator *files_ref_iterator_begin( + const char *submodule, + const char *prefix, unsigned int flags) +{ + struct ref_cache *refs = get_ref_cache(submodule); + struct ref_dir *loose_dir, *packed_dir; + struct ref_iterator *loose_iter, *packed_iter; + struct files_ref_iterator *iter; + struct ref_iterator *ref_iterator; + + if (!refs) + return empty_ref_iterator_begin(); + + if (ref_paranoia < 0) + ref_paranoia = git_env_bool("GIT_REF_PARANOIA", 0); + if (ref_paranoia) + flags |= DO_FOR_EACH_INCLUDE_BROKEN; + + iter = xcalloc(1, sizeof(*iter)); + ref_iterator = &iter->base; + base_ref_iterator_init(ref_iterator, &files_ref_iterator_vtable); + + /* + * We must make sure that all loose refs are read before + * accessing the packed-refs file; this avoids a race + * condition if loose refs are migrated to the packed-refs + * file by a simultaneous process, but our in-memory view is + * from before the migration. We ensure this as follows: + * First, we call prime_ref_dir(), which pre-reads the loose + * references for the subtree into the cache. (If they've + * already been read, that's OK; we only need to guarantee + * that they're read before the packed refs, not *how much* + * before.) After that, we call get_packed_ref_cache(), which + * internally checks whether the packed-ref cache is up to + * date with what is on disk, and re-reads it if not. + */ + + loose_dir = get_loose_refs(refs); + + if (prefix && *prefix) + loose_dir = find_containing_dir(loose_dir, prefix, 0); + + if (loose_dir) { + prime_ref_dir(loose_dir); + loose_iter = cache_ref_iterator_begin(loose_dir); + } else { + /* There's nothing to iterate over. */ + loose_iter = empty_ref_iterator_begin(); + } + + iter->packed_ref_cache = get_packed_ref_cache(refs); + acquire_packed_ref_cache(iter->packed_ref_cache); + packed_dir = get_packed_ref_dir(iter->packed_ref_cache); + + if (prefix && *prefix) + packed_dir = find_containing_dir(packed_dir, prefix, 0); + + if (packed_dir) { + packed_iter = cache_ref_iterator_begin(packed_dir); + } else { + /* There's nothing to iterate over. */ + packed_iter = empty_ref_iterator_begin(); + } + + iter->iter0 = overlay_ref_iterator_begin(loose_iter, packed_iter); + iter->flags = flags; + + return ref_iterator; +} + /* * Call fn for each reference in the specified ref_cache, omitting * references not in the containing_dir of prefix. Call fn for all |