summaryrefslogtreecommitdiff
Commit message (Collapse)AuthorAgeFilesLines
* Documentation: what does "git log --indexed-objects" even mean?jc/doc-log-rev-list-optionsJunio C Hamano2015-01-231-5/+7
| | | | | | | | | | | | | | | | 4fe10219 (rev-list: add --indexed-objects option, 2014-10-16) adds "--indexed-objects" option to "rev-list", and it is only useful in the context of "git rev-list" and not "git log". There are other object traversal options that do not make sense for "git log" that are shown in the manual page. Move the description of "--indexed-objects" to the object traversal section so that it sits together with its friends "--objects", "--objects-edge", etc. and then show them only in "git rev-list" documentation. Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* rev-list: add --indexed-objects optionJeff King2014-10-194-0/+80
| | | | | | | | | | There is currently no easy way to ask the revision traversal machinery to include objects reachable from the index (e.g., blobs and trees that have not yet been committed). This patch adds an option to do so. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* rev-list: document --reflog optionJeff King2014-10-191-0/+4
| | | | | | | | This is mostly used internally, but it does not hurt to explain it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* t5516: test pushing a tag of an otherwise unreferenced blobJeff King2014-10-191-0/+13
| | | | | | | | | | | | | | | | It's not unreasonable to have a tag that points to a blob that is not part of the normal history. We do this in git.git to distribute gpg keys. However, we never explicitly checked in our test suite that this actually works (i.e., that pack-objects actually sends the blob because of the tag mentioning it). It does in fact work fine, but a recent patch under discussion broke this, and the test suite didn't notice. Let's make the test suite more complete. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* traverse_commit_list: support pending blobs/trees with pathsJeff King2014-10-192-9/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we call traverse_commit_list, we may have trees and blobs in the pending array. As we process these, we pass the "name" field from the pending entry as the path of the object within the tree (which then becomes the root path if we recurse into subtrees). When we set up the traversal in prepare_revision_walk, though, the "name" field of any pending trees and blobs is likely to be the ref at which we found the object. We would not want to make this part of the path (e.g., doing so would make "git rev-list --objects v2.6.11-tree" in linux.git show paths like "v2.6.11-tree/Makefile", which is nonsensical). Therefore prepare_revision_walk sets the name field of each pending tree and blobs to the empty string. However, this leaves no room for a caller who does know the correct path of a pending object to propagate that information to the revision walker. We can fix this by making two related changes: 1. Use the "path" field as the path instead of the "name" field in traverse_commit_list. If the path is not set, default to "" (which is what we always ended up with in the current code, because of prepare_revision_walk). 2. In prepare_revision_walk, make a complete copy of the entry. This makes the path field available to the walker (if there is one), solving our problem. Leaving the name field intact is now OK, as we do not use it as a path due to point (1) above (and we can use it to make more meaningful error messages if we want). We also make the original "mode" field available to the walker, though it does not actually use it. Note that we still re-add the pending objects and free the old ones (so we may strdup the path and name only to free the old ones). This could be made more efficient by simply copying the object_array entries that we are keeping. However, that would require more restructuring of the code, and is not done here. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* make add_object_array_with_context interface more saneJeff King2014-10-163-20/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When you resolve a sha1, you can optionally keep any context found during the resolution, including the path and mode of a tree entry (e.g., when looking up "HEAD:subdir/file.c"). The add_object_array_with_context function lets you then attach that context to an entry in a list. Unfortunately, the interface for doing so is horrible. The object_context structure is large and most object_array users do not use it. Therefore we keep a pointer to the structure to avoid burdening other users too much. But that means when we do use it that we must allocate the struct ourselves. And the struct contains a fixed PATH_MAX-sized buffer, which makes this wholly unsuitable for any large arrays. We can observe that there is only a single user of the "with_context" variant: builtin/grep.c. And in that use case, the only element we care about is the path. We can therefore store only the path as a pointer (the context's mode field was redundant with the object_array_entry itself, and nobody actually cared about the surrounding tree). This still requires a strdup of the pathname, but at least we are only consuming the minimum amount of memory for each string. We can also handle the copying ourselves in add_object_array_*, and free it as appropriate in object_array_release_entry. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* write_sha1_file: freshen existing objectsJeff King2014-10-162-7/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | When we try to write a loose object file, we first check whether that object already exists. If so, we skip the write as an optimization. However, this can interfere with prune's strategy of using mtimes to mark files in progress. For example, if a branch contains a particular tree object and is deleted, that tree object may become unreachable, and have an old mtime. If a new operation then tries to write the same tree, this ends up as a noop; we notice we already have the object and do nothing. A prune running simultaneously with this operation will see the object as old, and may delete it. We can solve this by "freshening" objects that we avoid writing by updating their mtime. The algorithm for doing so is essentially the same as that of has_sha1_file. Therefore we provide a new (static) interface "check_and_freshen", which finds and optionally freshens the object. It's trivial to implement freshening and simple checking by tweaking a single parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* pack-objects: match prune logic for discarding objectsJeff King2014-10-164-40/+98
| | | | | | | | | | | | | | | A recent commit taught git-prune to keep non-recent objects that are reachable from recent ones. However, pack-objects, when loosening unreachable objects, tries to optimize out the write in the case that the object will be immediately pruned. It now gets this wrong, since its rule does not reflect the new prune code (and this can be seen by running t6501 with a strategically placed repack). Let's teach pack-objects similar logic. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* pack-objects: refactor unpack-unreachable expiration checkJeff King2014-10-161-5/+12
| | | | | | | | | | | | | | | | | | | | | | | | | When we are loosening unreachable packed objects, we do not bother to process objects that would simply be pruned immediately anyway. The "would be pruned" check is a simple comparison, but is about to get more complicated. Let's pull it out into a separate function. Note that this is slightly less efficient than the original, which avoided even opening old packs, since no object in them could pass the current check, which cares only about the pack mtime. But the new rules will depend on the exact object, so we need to perform the check even for old packs. Note also that we fix a minor buglet when the pack mtime is exactly the same as the expiration time. The prune code considers that worth pruning, whereas our check here considered it worth keeping. This wasn't a big deal. Besides being unlikely to happen, the result was simply that the object was loosened and then pruned, missing the optimization. Still, we can easily fix it while we are here. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* prune: keep objects reachable from recent objectsJeff King2014-10-165-3/+204
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Our current strategy with prune is that an object falls into one of three categories: 1. Reachable (from ref tips, reflogs, index, etc). 2. Not reachable, but recent (based on the --expire time). 3. Not reachable and not recent. We keep objects from (1) and (2), but prune objects in (3). The point of (2) is that these objects may be part of an in-progress operation that has not yet updated any refs. However, it is not always the case that objects for an in-progress operation will have a recent mtime. For example, the object database may have an old copy of a blob (from an abandoned operation, a branch that was deleted, etc). If we create a new tree that points to it, a simultaneous prune will leave our tree, but delete the blob. Referencing that tree with a commit will then work (we check that the tree is in the object database, but not that all of its referred objects are), as will mentioning the commit in a ref. But the resulting repo is corrupt; we are missing the blob reachable from a ref. One way to solve this is to be more thorough when referencing a sha1: make sure that not only do we have that sha1, but that we have objects it refers to, and so forth recursively. The problem is that this is very expensive. Creating a parent link would require traversing the entire object graph! Instead, this patch pushes the extra work onto prune, which runs less frequently (and has to look at the whole object graph anyway). It creates a new category of objects: objects which are not recent, but which are reachable from a recent object. We do not prune these objects, just like the reachable and recent ones. This lets us avoid the recursive check above, because if we have an object, even if it is unreachable, we should have its referent. We can make a simple inductive argument that with this patch, this property holds (that there are no objects with missing referents in the repository): 0. When we have no objects, we have nothing to refer or be referred to, so the property holds. 1. If we add objects to the repository, their direct referents must generally exist (e.g., if you create a tree, the blobs it references must exist; if you create a commit to point at the tree, the tree must exist). This is already the case before this patch. And it is not 100% foolproof (you can make bogus objects using `git hash-object`, for example), but it should be the case for normal usage. Therefore for any sequence of object additions, the property will continue to hold. 2. If we remove objects from the repository, then we will not remove a child object (like a blob) if an object that refers to it is being kept. That is the part implemented by this patch. Note, however, that our reachability check and the actual pruning are not atomic. So it _is_ still possible to violate the property (e.g., an object becomes referenced just as we are deleting it). This patch is shooting for eliminating problems where the mtimes of dependent objects differ by hours or days, and one is dropped without the other. It does nothing to help with short races. Naively, the simplest way to implement this would be to add all recent objects as tips to the reachability traversal. However, this does not perform well. In a recently-packed repository, all reachable objects will also be recent, and therefore we have to look at each object twice. This patch instead performs the reachability traversal, then follows up with a second traversal for recent objects, skipping any that have already been marked. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* sha1_file: add for_each iterators for loose and packed objectsJeff King2014-10-162-0/+73
| | | | | | | | | | We typically iterate over the reachable objects in a repository by starting at the tips and walking the graph. There's no easy way to iterate over all of the objects, including unreachable ones. Let's provide a way of doing so. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* count-objects: use for_each_loose_file_in_objdirJeff King2014-10-161-71/+30
| | | | | | | | | This drops our line count considerably, and should make things more readable by keeping the counting logic separate from the traversal. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* count-objects: do not use xsize_t when counting object sizeJeff King2014-10-161-1/+1
| | | | | | | | | | | | The point of xsize_t is to safely cast an off_t into a size_t (because we are about to mmap). But in count-objects, we are summing the sizes in an off_t. Using xsize_t means that count-objects could fail on a 32-bit system with a 4G object (not likely, as other parts of git would fail, but we should at least be correct here). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* prune-packed: use for_each_loose_file_in_objdirJeff King2014-10-161-46/+23
| | | | | | | | This saves us from manually traversing the directory structure ourselves. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* reachable: mark index blobs as SEENJeff King2014-10-161-1/+6
| | | | | | | | | | | | | | | | | When we mark all reachable objects for pruning, that includes blobs mentioned by the index. However, we do not mark these with the SEEN flag, as we do for objects that we find by traversing (we also do not add them to the pending list, but that is because there is nothing further to traverse with them). This doesn't cause any problems with prune, because it checks only that the object exists in the global object hash, and not its flags. However, let's mark these objects to be consistent and avoid any later surprises. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* prune: factor out loose-object directory traversalJeff King2014-10-163-61/+143
| | | | | | | | | | | | | | | | | | | | | | | | Prune has to walk $GIT_DIR/objects/?? in order to find the set of loose objects to prune. Other parts of the code (e.g., count-objects) want to do the same. Let's factor it out into a reusable for_each-style function. Note that this is not quite a straight code movement. The original code had strange behavior when it found a file of the form "[0-9a-f]{2}/.{38}" that did _not_ contain all hex digits. It executed a "break" from the loop, meaning that we stopped pruning in that directory (but still pruned other directories!). This was probably a bug; we do not want to process the file as an object, but we should keep going otherwise (and that is how the new code handles it). We are also a little more careful with loose object directories which fail to open. The original code silently ignored any failures, but the new code will complain about any problems besides ENOENT. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* reachable: reuse revision.c "add all reflogs" codeJeff King2014-10-163-25/+4
| | | | | | | | | | We want to add all reflog entries as tips for finding reachable objects. The revision machinery can already do this (to support "rev-list --reflog"); we can reuse that code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* reachable: use traverse_commit_list instead of custom walkJeff King2014-10-161-113/+17
| | | | | | | | | | | | | To find the set of reachable objects, we add a bunch of possible sources to our rev_info, call prepare_revision_walk, and then launch into a custom walker that handles each object top. This is a subset of what traverse_commit_list does, so we can just reuse that code (it can also handle more complex cases like UNINTERESTING commits and pathspecs, but we don't use those features). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* clean up name allocation in prepare_revision_walkJeff King2014-10-161-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | When we enter prepare_revision_walk, we have zero or more entries in our "pending" array. We disconnect that array from the rev_info, and then process each entry: 1. If the entry is a commit and the --source option is in effect, we keep a pointer to the object name. 2. Otherwise, we re-add the item to the pending list with a blank name. We then throw away the old array by freeing the array itself, but do not touch the "name" field of each entry. For any items of type (2), we leak the memory associated with the name. This commit fixes that by calling object_array_clear, which handles the cleanup for us. That breaks (1), though, because it depends on the memory pointed to by the name to last forever. We can solve that by making a copy of the name. This is slightly less efficient, but it shouldn't matter in practice, as we do it only for the tip commits of the traversal. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* object_array: add a "clear" functionJeff King2014-10-163-6/+17
| | | | | | | | | | | | | | There's currently no easy way to free the memory associated with an object_array (and in most cases, we simply leak the memory in a rev_info's pending array). Let's provide a helper to make this easier to handle. We can make use of it in list-objects.c, which does the same thing by hand (but fails to free the "name" field of each entry, potentially leaking memory). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* object_array: factor out slopbuf-freeing logicJeff King2014-10-161-4/+12
| | | | | | | | | This is not a lot of code, but it's a logical construct that should not need to be repeated (and we are about to add a third repetition). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* isxdigit: cast input to unsigned charJeff King2014-10-162-5/+5
| | | | | | | | | | | | | | | | | Otherwise, callers must do so or risk triggering warnings -Wchar-subscript (and rightfully so; a signed char might cause us to use a bogus negative index into the hexval_table). While we are dropping the now-unnecessary casts from the caller in urlmatch.c, we can get rid of similar casts in actually parsing the hex by using the hexval() helper, which implicitly casts to unsigned (but note that we cannot implement isxdigit in terms of hexval(), as it also casts its return value to unsigned). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* foreach_alt_odb: propagate return value from callbackJeff King2014-10-162-5/+9
| | | | | | | | | | | | We check the return value of the callback and stop iterating if it is non-zero. However, we do not make the non-zero return value available to the caller, so they have no way of knowing whether the operation succeeded or not (technically they can keep their own error flag in the callback data, but that is unlike our other for_each functions). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* test-lib.sh: support -x option for shell-tracingjk/test-shell-traceJeff King2014-10-132-4/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Usually running a test under "-v" makes it clear which command is failing. However, sometimes it can be useful to also see a complete trace of the shell commands being run in the test. You can do so without any support from the test suite by running "sh -x tXXXX-foo.sh". However, this produces quite a large bit of output, as we see a trace of the entire test suite. This patch instead introduces a "-x" option to the test scripts (i.e., "./tXXXX-foo.sh -x"). When enabled, this turns on "set -x" only for the tests themselves. This can still be a bit verbose, but should keep things to a more manageable level. You can even use "--verbose-only" to see the trace only for a specific test. The implementation is a little invasive. We turn on the "set -x" inside the "eval" of the test code. This lets the eval itself avoid being reported in the trace (which would be long, and redundant with the verbose listing we already showed). And then after the eval runs, we do some trickery with stderr to avoid showing the "set +x" to the user. We also show traces for test_cleanup functions (since they can impact the test outcome, too). However, we do avoid running the noop ":" cleanup (the default if the test does not use test_cleanup at all), as it creates unnecessary noise in the "set -x" output. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* t5304: use helper to report failure of "test foo = bar"Jeff King2014-10-132-8/+17
| | | | | | | | | | | | | | For small outputs, we sometimes use: test "$(some_cmd)" = "something we expect" instead of a full test_cmp. The downside of this is that when it fails, there is no output at all from the script. Let's introduce a small helper to make tests easier to debug. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* t5304: use test_path_is_* instead of "test -f"Jeff King2014-10-131-23/+23
| | | | | | | | | | This is slightly more robust (checking "! test -f" would not notice a directory of the same name, though that is not likely to happen here). It also makes debugging easier, as the test script will output a message on failure. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Update draft release notes to 2.2Junio C Hamano2014-10-081-0/+13
| | | | Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 'sp/stream-clean-filter'Junio C Hamano2014-10-0810-52/+164
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | When running a required clean filter, we do not have to mmap the original before feeding the filter. Instead, stream the file contents directly to the filter and process its output. * sp/stream-clean-filter: sha1_file: don't convert off_t to size_t too early to avoid potential die() convert: stream from fd to required clean filter to reduce used address space copy_fd(): do not close the input file descriptor mmap_limit: introduce GIT_MMAP_LIMIT to allow testing expected mmap size memory_limit: use git_env_ulong() to parse GIT_ALLOC_LIMIT config.c: add git_env_ulong() to parse environment variable convert: drop arguments other than 'path' from would_convert_to_git()
| * sha1_file: don't convert off_t to size_t too early to avoid potential die()sp/stream-clean-filterSteffen Prohaska2014-09-221-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | xsize_t() checks if an off_t argument can be safely converted to a size_t return value. If the check is executed too early, it could fail for large files on 32-bit architectures even if the size_t code path is not taken. Other paths might be able to handle the large file. Specifically, index_stream_convert_blob() is able to handle a large file if a filter is configured that returns a small result. Signed-off-by: Steffen Prohaska <prohaska@zib.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * convert: stream from fd to required clean filter to reduce used address spaceSteffen Prohaska2014-08-284-12/+99
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The data is streamed to the filter process anyway. Better avoid mapping the file if possible. This is especially useful if a clean filter reduces the size, for example if it computes a sha1 for binary data, like git media. The file size that the previous implementation could handle was limited by the available address space; large files for example could not be handled with (32-bit) msysgit. The new implementation can filter files of any size as long as the filter output is small enough. The new code path is only taken if the filter is required. The filter consumes data directly from the fd. If it fails, the original data is not immediately available. The condition can easily be handled as a fatal error, which is expected for a required filter anyway. If the filter was not required, the condition would need to be handled in a different way, like seeking to 0 and reading the data. But this would require more restructuring of the code and is probably not worth it. The obvious approach of falling back to reading all data would not help achieving the main purpose of this patch, which is to handle large files with limited address space. If reading all data is an option, we can simply take the old code path right away and mmap the entire file. The environment variable GIT_MMAP_LIMIT, which has been introduced in a previous commit is used to test that the expected code path is taken. A related test that exercises required filters is modified to verify that the data actually has been modified on its way from the file system to the object store. Signed-off-by: Steffen Prohaska <prohaska@zib.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * copy_fd(): do not close the input file descriptorSteffen Prohaska2014-08-282-21/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The caller, not this function, opened the file descriptor; it is selfish for the callee to close it when it is done reading from it. The caller may want an option to rewind and re-read the contents after it returns. Simplify the loop to copy the input in full to the output; its body essentially is what a call to write_in_full() helper does. Signed-off-by: Steffen Prohaska <prohaska@zib.de> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * mmap_limit: introduce GIT_MMAP_LIMIT to allow testing expected mmap sizeSteffen Prohaska2014-08-281-1/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to test expectations about mmap in a way similar to testing expectations about malloc with GIT_ALLOC_LIMIT introduced by d41489a6 (Add more large blob test cases, 2012-03-07), introduce a new environment variable GIT_MMAP_LIMIT to limit the largest allowed mmap length. xmmap() is modified to check the size of the requested region and fail it if it is beyond the limit. Together with GIT_ALLOC_LIMIT tests can now confirm expectations about memory consumption. Signed-off-by: Steffen Prohaska <prohaska@zib.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * memory_limit: use git_env_ulong() to parse GIT_ALLOC_LIMITSteffen Prohaska2014-08-282-8/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GIT_ALLOC_LIMIT limits xmalloc()'s size, which is of type size_t. Better use git_env_ulong() to parse the environment variable, so that the postfixes 'k', 'm', and 'g' can be used; and use size_t to store the limit for consistency. The change to size_t has no direct practical impact, because the environment variable is only meant to be used for our own tests, and we use it to test small sizes. The cast of size in the call to die() is changed to uintmax_t to match the format string PRIuMAX. Signed-off-by: Steffen Prohaska <prohaska@zib.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * config.c: add git_env_ulong() to parse environment variableSteffen Prohaska2014-08-282-0/+17
| | | | | | | | | | | | | | | | | | | | The new function parses an integeral value that fits in unsigned long in human readable form, i.e. possibly with unit suffix, e.g. 10k = 10240, etc., from an environment variable. Parsing of GIT_MMAP_LIMIT and GIT_ALLOC_LIMIT will use it in later patches. Signed-off-by: Steffen Prohaska <prohaska@zib.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * convert: drop arguments other than 'path' from would_convert_to_git()Steffen Prohaska2014-08-212-4/+3
| | | | | | | | | | | | | | | | | | It is only the path that matters in the decision whether to filter or not. Clarify this by making path the only argument of would_convert_to_git(). Signed-off-by: Steffen Prohaska <prohaska@zib.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | Merge branch 'bw/use-write-script-in-tests'Junio C Hamano2014-10-081-3/+1
|\ \ | | | | | | | | | | | | * bw/use-write-script-in-tests: t/lib-credential: use write_script
| * | t/lib-credential: use write_scriptbw/use-write-script-in-testsBen Walton2014-09-291-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Use write_script to create the helper "askpass" script, instead of hand-creating it with hardcoded "#!/bin/sh" to make sure we use the shell the user told us to use. Signed-off-by: Ben Walton <bdwalton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | Merge branch 'nd/archive-pathspec'Junio C Hamano2014-10-082-3/+108
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | "git archive" learned to filter what gets archived with pathspec. * nd/archive-pathspec: archive: support filtering paths with glob
| * | | archive: support filtering paths with globnd/archive-pathspecNguyễn Thái Ngọc Duy2014-09-222-3/+108
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes two problems with using :(glob) (or even "*.c" without ":(glob)"). The first one is we forgot to turn on the 'recursive' flag in struct pathspec. Without that, tree_entry_interesting() will not mark potential directories "interesting" so that it can confirm whether those directories have anything matching the pathspec. The marking directories interesting has a side effect that we need to walk inside a directory to realize that there's nothing interested in there. By that time, 'archive' code has already written the (empty) directory down. That means lots of empty directories in the result archive. This problem is fixed by lazily writing directories down when we know they are actually needed. There is a theoretical bug in this implementation: we can't write empty trees/directories that match that pathspec. path_exists() is also made stricter in order to detect non-matching pathspec because when this 'recursive' flag is on, we most likely match some directories. The easiest way is not consider any directories "matched". Noticed-by: Peter Wu <peter@lekensteyn.nl> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | | Merge branch 'jc/push-cert'Junio C Hamano2014-10-0823-159/+932
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow "git push" request to be signed, so that it can be verified and audited, using the GPG signature of the person who pushed, that the tips of branches at a public repository really point the commits the pusher wanted to, without having to "trust" the server. * jc/push-cert: (24 commits) receive-pack::hmac_sha1(): copy the entire SHA-1 hash out signed push: allow stale nonce in stateless mode signed push: teach smart-HTTP to pass "git push --signed" around signed push: fortify against replay attacks signed push: add "pushee" header to push certificate signed push: remove duplicated protocol info send-pack: send feature request on push-cert packet receive-pack: GPG-validate push certificates push: the beginning of "git push --signed" pack-protocol doc: typofix for PKT-LINE gpg-interface: move parse_signature() to where it should be gpg-interface: move parse_gpg_output() to where it should be send-pack: clarify that cmds_sent is a boolean send-pack: refactor inspecting and resetting status and sending commands send-pack: rename "new_refs" to "need_pack_data" receive-pack: factor out capability string generation send-pack: factor out capability string generation send-pack: always send capabilities send-pack: refactor decision to send update per ref send-pack: move REF_STATUS_REJECT_NODELETE logic a bit higher ...
| * | | | receive-pack::hmac_sha1(): copy the entire SHA-1 hash outBrian Gernhardt2014-09-251-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | clang gives the following warning: builtin/receive-pack.c:327:35: error: sizeof on array function parameter will return size of 'unsigned char *' instead of 'unsigned char [20]' [-Werror,-Wsizeof-array-argument] git_SHA1_Update(&ctx, out, sizeof(out)); ^ builtin/receive-pack.c:292:37: note: declared here static void hmac_sha1(unsigned char out[20], ^ Signed-off-by: Brian Gernhardt <brian@gernhardtsoftware.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | signed push: allow stale nonce in stateless modeJunio C Hamano2014-09-174-12/+112
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When operating with the stateless RPC mode, we will receive a nonce issued by another instance of us that advertised our capability and refs some time ago. Update the logic to check received nonce to detect this case, compute how much time has passed since the nonce was issued and report the status with a new environment variable GIT_PUSH_CERT_NONCE_SLOP to the hooks. GIT_PUSH_CERT_NONCE_STATUS will report "SLOP" in such a case. The hooks are free to decide how large a slop it is willing to accept. Strictly speaking, the "nonce" is not really a "nonce" anymore in the stateless RPC mode, as it will happily take any "nonce" issued by it (which is protected by HMAC and its secret key) as long as it is fresh enough. The degree of this security degradation, relative to the native protocol, is about the same as the "we make sure that the 'git push' decided to update our refs with new objects based on the freshest observation of our refs by making sure the values they claim the original value of the refs they ask us to update exactly match the current state" security is loosened to accomodate the stateless RPC mode in the existing code without this series, so there is no need for those who are already using smart HTTP to push to their repositories to be alarmed any more than they already are. In addition, the server operator can set receive.certnonceslop configuration variable to specify how stale a nonce can be (in seconds). When this variable is set, and if the nonce received in the certificate that passes the HMAC check was less than that many seconds old, hooks are given "OK" in GIT_PUSH_CERT_NONCE_STATUS (instead of "SLOP") and the received nonce value is given in GIT_PUSH_CERT_NONCE, which makes it easier for a simple-minded hook to check if the certificate we received is recent enough. Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | signed push: teach smart-HTTP to pass "git push --signed" aroundJunio C Hamano2014-09-176-3/+63
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The "--signed" option received by "git push" is first passed to the transport layer, which the native transport directly uses to notice that a push certificate needs to be sent. When the transport-helper is involved, however, the option needs to be told to the helper with set_helper_option(), and the helper needs to take necessary action. For the smart-HTTP helper, the "necessary action" involves spawning the "git send-pack" subprocess with the "--signed" option. Once the above all gets wired in, the smart-HTTP transport now can use the push certificate mechanism to authenticate its pushes. Add a test that is modeled after tests for the native transport in t5534-push-signed.sh to t5541-http-push-smart.sh. Update the test Apache configuration to pass GNUPGHOME environment variable through. As PassEnv would trigger warnings for an environment variable that is not set, export it from test-lib.sh set to a harmless value when GnuPG is not being used in the tests. Note that the added test is deliberately loose and does not check the nonce in this step. This is because the stateless RPC mode is inevitably flaky and a nonce that comes back in the actual push processing is one issued by a different process; if the two interactions with the server crossed a second boundary, the nonces will not match and such a check will fail. A later patch in the series will work around this shortcoming. Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | signed push: fortify against replay attacksJunio C Hamano2014-09-177-29/+187
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to prevent a valid push certificate for pushing into an repository from getting replayed in a different push operation, send a nonce string from the receive-pack process and have the signer include it in the push certificate. The receiving end uses an HMAC hash of the path to the repository it serves and the current time stamp, hashed with a secret seed (the secret seed does not have to be per-repository but can be defined in /etc/gitconfig) to generate the nonce, in order to ensure that a random third party cannot forge a nonce that looks like it originated from it. The original nonce is exported as GIT_PUSH_CERT_NONCE for the hooks to examine and match against the value on the "nonce" header in the certificate to notice a replay, but returned "nonce" header in the push certificate is examined by receive-pack and the result is exported as GIT_PUSH_CERT_NONCE_STATUS, whose value would be "OK" if the nonce recorded in the certificate matches what we expect, so that the hooks can more easily check. Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | signed push: add "pushee" header to push certificateJunio C Hamano2014-09-154-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Record the URL of the intended recipient for a push (after anonymizing it if it has authentication material) on a new "pushee URL" header. Because the networking configuration (SSH-tunnels, proxies, etc.) on the pushing user's side varies, the receiving repository may not know the single canonical URL all the pushing users would refer it as (besides, many sites allow pushing over ssh://host/path and https://host/path protocols to the same repository but with different local part of the path). So this value may not be reliably used for replay-attack prevention purposes, but this will still serve as a human readable hint to identify the repository the certificate refers to. Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | signed push: remove duplicated protocol infoJunio C Hamano2014-09-154-4/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the interim protocol, we used to send the update commands even though we already send a signed copy of the same information when push certificate is in use. Update the send-pack/receive-pack pair not to do so. The notable thing on the receive-pack side is that it makes sure that there is no command sent over the traditional protocol packet outside the push certificate. Otherwise a pusher can claim to be pushing one set of ref updates in the signed certificate while issuing commands to update unrelated refs, and such an update will evade later audits. Finally, start documenting the protocol. Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | send-pack: send feature request on push-cert packetJunio C Hamano2014-09-152-5/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We would want to update the interim protocol so that we do not send the usual update commands when the push certificate feature is in use, as the same information is in the certificate. Once that happens, the push-cert packet may become the only protocol command, but then there is no packet to put the feature request behind, like we always did. As we have prepared the receiving end that understands the push-cert feature to accept the feature request on the first protocol packet (other than "shallow ", which was an unfortunate historical mistake that has to come before everything else), we can give the feature request on the push-cert packet instead of the first update protocol packet, in preparation for the next step to actually update to the final protocol. Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | receive-pack: GPG-validate push certificatesJunio C Hamano2014-09-153-7/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reusing the GPG signature check helpers we already have, verify the signature in receive-pack and give the results to the hooks via GIT_PUSH_CERT_{SIGNER,KEY,STATUS} environment variables. Policy decisions, such as accepting or rejecting a good signature by a key that is not fully trusted, is left to the hook and kept outside of the core. Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | push: the beginning of "git push --signed"Junio C Hamano2014-09-1510-2/+253
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While signed tags and commits assert that the objects thusly signed came from you, who signed these objects, there is not a good way to assert that you wanted to have a particular object at the tip of a particular branch. My signing v2.0.1 tag only means I want to call the version v2.0.1, and it does not mean I want to push it out to my 'master' branch---it is likely that I only want it in 'maint', so the signature on the object alone is insufficient. The only assurance to you that 'maint' points at what I wanted to place there comes from your trust on the hosting site and my authentication with it, which cannot easily audited later. Introduce a mechanism that allows you to sign a "push certificate" (for the lack of better name) every time you push, asserting that what object you are pushing to update which ref that used to point at what other object. Think of it as a cryptographic protection for ref updates, similar to signed tags/commits but working on an orthogonal axis. The basic flow based on this mechanism goes like this: 1. You push out your work with "git push --signed". 2. The sending side learns where the remote refs are as usual, together with what protocol extension the receiving end supports. If the receiving end does not advertise the protocol extension "push-cert", an attempt to "git push --signed" fails. Otherwise, a text file, that looks like the following, is prepared in core: certificate version 0.1 pusher Junio C Hamano <gitster@pobox.com> 1315427886 -0700 7339ca65... 21580ecb... refs/heads/master 3793ac56... 12850bec... refs/heads/next The file begins with a few header lines, which may grow as we gain more experience. The 'pusher' header records the name of the signer (the value of user.signingkey configuration variable, falling back to GIT_COMMITTER_{NAME|EMAIL}) and the time of the certificate generation. After the header, a blank line follows, followed by a copy of the protocol message lines. Each line shows the old and the new object name at the tip of the ref this push tries to update, in the way identical to how the underlying "git push" protocol exchange tells the ref updates to the receiving end (by recording the "old" object name, the push certificate also protects against replaying). It is expected that new command packet types other than the old-new-refname kind will be included in push certificate in the same way as would appear in the plain vanilla command packets in unsigned pushes. The user then is asked to sign this push certificate using GPG, formatted in a way similar to how signed tag objects are signed, and the result is sent to the other side (i.e. receive-pack). In the protocol exchange, this step comes immediately before the sender tells what the result of the push should be, which in turn comes before it sends the pack data. 3. When the receiving end sees a push certificate, the certificate is written out as a blob. The pre-receive hook can learn about the certificate by checking GIT_PUSH_CERT environment variable, which, if present, tells the object name of this blob, and make the decision to allow or reject this push. Additionally, the post-receive hook can also look at the certificate, which may be a good place to log all the received certificates for later audits. Because a push certificate carry the same information as the usual command packets in the protocol exchange, we can omit the latter when a push certificate is in use and reduce the protocol overhead. This however is not included in this patch to make it easier to review (in other words, the series at this step should never be released without the remainder of the series, as it implements an interim protocol that will be incompatible with the final one). As such, the documentation update for the protocol is left out of this step. Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | pack-protocol doc: typofix for PKT-LINEJunio C Hamano2014-09-151-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Everywhere else we use PKT-LINE to denote the pkt-line formatted data, but "shallow/deepen" messages are described with PKT_LINE(). Fix them. Signed-off-by: Junio C Hamano <gitster@pobox.com>