summaryrefslogtreecommitdiff
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'nd/no-the-index'Junio C Hamano2018-08-2055-245/+337
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The more library-ish parts of the codebase learned to work on the in-core index-state instance that is passed in by their callers, instead of always working on the singleton "the_index" instance. * nd/no-the-index: (24 commits) blame.c: remove implicit dependency on the_index apply.c: remove implicit dependency on the_index apply.c: make init_apply_state() take a struct repository apply.c: pass struct apply_state to more functions resolve-undo.c: use the right index instead of the_index archive-*.c: use the right repository archive.c: avoid access to the_index grep: use the right index instead of the_index attr: remove index from git_attr_set_direction() entry.c: use the right index instead of the_index submodule.c: use the right index instead of the_index pathspec.c: use the right index instead of the_index unpack-trees: avoid the_index in verify_absent() unpack-trees: convert clear_ce_flags* to avoid the_index unpack-trees: don't shadow global var the_index unpack-trees: add a note about path invalidation unpack-trees: remove 'extern' on function declaration ls-files: correct index argument to get_convert_attr_ascii() preload-index.c: use the right index instead of the_index dir.c: remove an implicit dependency on the_index in pathspec code ...
| * blame.c: remove implicit dependency on the_indexNguyễn Thái Ngọc Duy2018-08-133-21/+33
| | | | | | | | | | | | | | | | Side note, since we gain access to the right repository, we can stop rely on the_repository in this code as well. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * apply.c: remove implicit dependency on the_indexNguyễn Thái Ngọc Duy2018-08-131-21/+25
| | | | | | | | | | | | | | | | | | | | Use apply_state->repo->index instead of the_index (in most cases, unless we need to use a temporary index in some functions). Let the callers (am and apply) tell us what to use, instead of always assuming to operate on the_index. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * apply.c: make init_apply_state() take a struct repositoryNguyễn Thái Ngọc Duy2018-08-134-2/+8
| | | | | | | | | | | | | | | | | | | | | | We're moving away from the_index in this code. "struct index_state *" could be added to struct apply_state. But let's aim long term and put struct repository here instead so that we could even avoid more global states in the future. The index will be available via apply_state->repo->index. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * apply.c: pass struct apply_state to more functionsNguyễn Thái Ngọc Duy2018-08-131-7/+11
| | | | | | | | | | | | | | | | | | | | we're going to remove the dependency on the_index by moving 'struct index_state *' to somewhere inside struct apply_state. Let's make sure relevant functions have access to this struct now and reduce the diff noise when the actual conversion happens. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * resolve-undo.c: use the right index instead of the_indexNguyễn Thái Ngọc Duy2018-08-131-1/+1
| | | | | | | | | | Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * archive-*.c: use the right repositoryNguyễn Thái Ngọc Duy2018-08-132-2/+2
| | | | | | | | | | | | | | | | With 'struct archive_args' gaining new repository pointer, we don't have to assume the_repository in the archive backends anymore. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * archive.c: avoid access to the_indexNguyễn Thái Ngọc Duy2018-08-134-21/+45
| | | | | | | | | | Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * grep: use the right index instead of the_indexNguyễn Thái Ngọc Duy2018-08-131-2/+2
| | | | | | | | | | Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * attr: remove index from git_attr_set_direction()Nguyễn Thái Ngọc Duy2018-08-135-18/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since attr checking API now take the index, there's no need to set an index in advance with this call. Most call sites are straightforward because they either pass the_index or NULL (which defaults back to the_index previously). There's only one suspicious call site in unpack-trees.c where it sets a different index. This code in unpack-trees is about to check out entries from the new/temporary index after merging is done in it. The attributes will be used by entry.c code to do crlf conversion if needed. entry.c now respects struct checkout's istate field, and this field is correctly set in unpack-trees.c, there should be no regression from this change. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * entry.c: use the right index instead of the_indexNguyễn Thái Ngọc Duy2018-08-132-4/+6
| | | | | | | | | | | | | | | | | | checkout-index.c needs update because if checkout->istate is NULL, ie_match_stat() will crash. Previously this is ie_match_stat(&the_index, ..) so it will not crash, but it is not technically correct either. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * submodule.c: use the right index instead of the_indexNguyễn Thái Ngọc Duy2018-08-131-4/+4
| | | | | | | | | | Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * pathspec.c: use the right index instead of the_indexNguyễn Thái Ngọc Duy2018-08-131-1/+1
| | | | | | | | | | Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * unpack-trees: avoid the_index in verify_absent()Nguyễn Thái Ngọc Duy2018-08-131-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Both functions that are updated in this commit are called by verify_absent(), which is part of the "unpack-trees" operation that is supposed to work on any index file specified by the caller. Thanks to Brandon [1] [2], an implicit dependency on the_index is exposed. This commit fixes it. In both functions, it makes sense to use src_index to check for exclusion because it's almost unchanged and should give us the same outcome as if running the exclude check before the unpack. It's "almost unchanged" because we do invalidate cache-tree and untracked cache in the source index. But this should not affect how exclude machinery uses the index: to see if a file is tracked, and to read a blob from the index instead of worktree if it's marked skip-worktree (i.e. it's not available in worktree) [1] a0bba65b10 (dir: convert is_excluded to take an index - 2017-05-05 [2] 2c1eb10454 (dir: convert read_directory to take an index - 2017-05-05) Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * unpack-trees: convert clear_ce_flags* to avoid the_indexNguyễn Thái Ngọc Duy2018-08-131-13/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prior to fba92be8f7, this code implicitly (and incorrectly) assumes the_index when running the exclude machinery. fba92be8f7 helps show this problem clearer because unpack-trees operation is supposed to work on whatever index the caller specifies... not specifically the_index. Update the code to use "istate" argument that's originally from mark_new_skip_worktree(). From the call sites, both in unpack_trees(), you can see that this function works on two separate indexes: o->src_index and o->result. The second mark_new_skip_worktree() so far has incorecctly applied exclude rules on o->src_index instead of o->result. It's unclear what is the consequences of this, but it's definitely wrong. [1] fba92be8f7 (dir: convert is_excluded_from_list to take an index - 2017-05-05) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * unpack-trees: don't shadow global var the_indexNguyễn Thái Ngọc Duy2018-08-131-5/+4
| | | | | | | | | | | | | | | | | | | | | | This function mark_new_skip_worktree() has an argument named the_index which is also the name of a global variable. While they have different types (the global the_index is not a pointer) mistakes can easily happen and it's also confusing for readers. Rename the function argument to something other than the_index. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * unpack-trees: add a note about path invalidationNguyễn Thái Ngọc Duy2018-08-131-0/+11
| | | | | | | | | | Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * unpack-trees: remove 'extern' on function declarationNguyễn Thái Ngọc Duy2018-08-131-2/+2
| | | | | | | | | | Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * ls-files: correct index argument to get_convert_attr_ascii()Nguyễn Thái Ngọc Duy2018-08-131-8/+9
| | | | | | | | | | | | | | | | write_eolinfo() does take an istate as function argument and it should be used instead of the_index. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * preload-index.c: use the right index instead of the_indexNguyễn Thái Ngọc Duy2018-08-131-1/+1
| | | | | | | | | | Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * dir.c: remove an implicit dependency on the_index in pathspec codeNguyễn Thái Ngọc Duy2018-08-1321-45/+54
| | | | | | | | | | | | | | | | | | | | | | Make the match_patchspec API and friends take an index_state instead of assuming the_index in dir.c. All external call sites are converted blindly to keep the patch simple and retain current behavior. Individual call sites may receive further updates to use the right index instead of the_index. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * convert.c: remove an implicit dependency on the_indexNguyễn Thái Ngọc Duy2018-08-1310-33/+45
| | | | | | | | | | | | | | | | | | | | | | Make the convert API take an index_state instead of assuming the_index in convert.c. All external call sites are converted blindly to keep the patch simple and retain current behavior. Individual call sites may receive further updates to use the right index instead of the_index. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * attr: remove an implicit dependency on the_indexNguyễn Thái Ngọc Duy2018-08-1310-32/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make the attr API take an index_state instead of assuming the_index in attr code. All call sites are converted blindly to keep the patch simple and retain current behavior. Individual call sites may receive further updates to use the right index instead of the_index. There is one ugly temporary workaround added in attr.c that needs some more explanation. Commit c24f3abace (apply: file commited with CRLF should roundtrip diff and apply - 2017-08-19) forces one convert_to_git() call to NOT read the index at all. But what do you know, we read it anyway by falling back to the_index. When "istate" from convert_to_git is now propagated down to read_attr_from_array() we will hit segfault somewhere inside read_blob_data_from_index. The right way of dealing with this is to kill "use_index" variable and only follow "istate" but at this stage we are not ready for that: while most git_attr_set_direction() calls just passes the_index to be assigned to use_index, unpack-trees passes a different one which is used by entry.c code, which has no way to know what index to use if we delete use_index. So this has to be done later. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * cache-tree: wrap the_index based wrappers with #ifdefNguyễn Thái Ngọc Duy2018-08-133-17/+16
| | | | | | | | | | | | | | | | | | | | | | | | This puts update_main_cache_tree() and write_cache_as_tree() in the same group of "index compat" functions that assume the_index implicitly, which should only be used within builtin/ or t/helper. sequencer.c is also updated to not use these functions. As of now, no files outside builtin/ use these functions anymore. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * diff.c: move read_index() code back to the callerNguyễn Thái Ngọc Duy2018-08-133-14/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This code is only needed for diff-tree (since f0c6b2a2fd ([PATCH] Optimize diff-tree -[CM] --stdin - 2005-05-27)). Let the caller do the preparation instead and avoid read_index() in diff.c code. read_index() should be avoided (in addition to the_index) because it uses get_index_file() underneath to get the path $GIT_DIR/index. This effectively pulls the_repository in and may become the only reason to pull a 'struct repository *' in diff.c. Let's keep the dependencies as few as possible and kick it back to diff-tree.c Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | Merge branch 'es/chain-lint-more'Junio C Hamano2018-08-2019-41/+213
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Improve built-in facility to catch broken &&-chain in the tests. * es/chain-lint-more: chainlint: add test of pathological case which triggered false positive chainlint: recognize multi-line quoted strings more robustly chainlint: let here-doc and multi-line string commence on same line chainlint: recognize multi-line $(...) when command cuddled with "$(" chainlint: match 'quoted' here-doc tags chainlint: match arbitrary here-docs tags rather than hard-coded names
| * | chainlint: add test of pathological case which triggered false positiveEric Sunshine2018-08-132-0/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This extract from contrib/subtree/t7900 triggered a false positive due to three chainlint limitations: * recognizing only a "blessed" set of here-doc tag names in a subshell ("EOF", "EOT", "INPUT_END"), of which "TXT" is not a member * inability to recognize multi-line $(...) when the first statement of the body is cuddled with the opening "$(" * inability to recognize multiple constructs on a single line, such as opening a multi-line $(...) and starting a here-doc Now that all of these shortcomings have been addressed, turn this rather pathological bit of shell coding into a chainlint test case. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | chainlint: recognize multi-line quoted strings more robustlyEric Sunshine2018-08-134-13/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | chainlint.sed recognizes multi-line quoted strings within subshells: echo "abc def" >out && so it can avoid incorrectly classifying lines internal to the string as breaking the &&-chain. To identify the first line of a multi-line string, it checks if the line contains a single quote. However, this is fragile and can be easily fooled by a line containing multiple strings: echo "xyz" "abc def" >out && Make detection more robust by checking for an odd number of quotes rather than only a single one. (Escaped quotes are not handled, but support may be added later.) The original multi-line string recognizer rather cavalierly threw away all but the final quote, whereas the new one is careful to retain all quotes, so the "expected" output of a couple existing chainlint tests is updated to account for this new behavior. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | chainlint: let here-doc and multi-line string commence on same lineEric Sunshine2018-08-137-3/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After swallowing a here-doc, chainlint.sed assumes that no other processing needs to be done on the line aside from checking for &&-chain breakage; likewise, after folding a multi-line quoted string. However, it's conceivable (even if unlikely in practice) that both a here-doc and a multi-line quoted string might commence on the same line: cat <<\EOF && echo "foo bar" data EOF Support this case by sending the line (after swallowing and folding) through the normal processing sequence rather than jumping directly to the check for broken &&-chain. This change also allows other somewhat pathological cases to be handled, such as closing a subshell on the same line starting a here-doc: ( cat <<-\INPUT) data INPUT or, for instance, opening a multi-line $(...) expression on the same line starting a here-doc: x=$(cat <<-\END && data END echo "x") among others. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | chainlint: recognize multi-line $(...) when command cuddled with "$("Eric Sunshine2018-08-133-3/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For multi-line $(...) expressions nested within subshells, chainlint.sed only recognizes: x=$( echo foo && ... but it is not unlikely that test authors may also cuddle the command with the opening "$(", so support that style, as well: x=$(echo foo && ... The closing ")" is already correctly recognized when cuddled or not. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | chainlint: match 'quoted' here-doc tagsEric Sunshine2018-08-135-4/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A here-doc tag can be quoted ('EOF') or escaped (\EOF) to suppress interpolation within the body. Although, chainlint recognizes escaped tags, it does not know about quoted tags. For completeness, teach it to recognize quoted tags, as well. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | chainlint: match arbitrary here-docs tags rather than hard-coded namesEric Sunshine2018-08-137-23/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | chainlint.sed swallows top-level here-docs to avoid being fooled by content which might look like start-of-subshell. It likewise swallows here-docs in subshells to avoid marking content lines as breaking the &&-chain, and to avoid being fooled by content which might look like end-of-subshell, start-of-nested-subshell, or other specially-recognized constructs. At the time of implementation, it was believed that it was not possible to support arbitrary here-doc tag names since 'sed' provides no way to stash the opening tag name in a variable for later comparison against a line signaling end-of-here-doc. Consequently, tag names are hard-coded, with "EOF" being the only tag recognized at the top-level, and only "EOF", "EOT", and "INPUT_END" being recognized within subshells. Also, special care was taken to avoid being confused by here-docs nested within other here-docs. In practice, this limited number of hard-coded tag names has been "good enough" for the 13000+ existing Git test, despite many of those tests using tags other than the recognized ones, since the bodies of those here-docs do not contain content which would fool the linter. Nevertheless, the situation is not ideal since someone writing new tests, and choosing a name not in the "blessed" set could potentially trigger a false-positive. To address this shortcoming, upgrade chainlint.sed to handle arbitrary here-doc tag names, both at the top-level and within subshells. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | Merge branch 'sg/t5310-empty-input-fix'Junio C Hamano2018-08-201-3/+4
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | Test fix. * sg/t5310-empty-input-fix: t5310-pack-bitmaps: fix bogus 'pack-objects to file can use bitmap' test
| * | | t5310-pack-bitmaps: fix bogus 'pack-objects to file can use bitmap' testSZEDER Gábor2018-08-141-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The test 'pack-objects to file can use bitmap' added in 645c432d61 (pack-objects: use reachability bitmap index when generating non-stdout pack, 2016-09-10) is silently buggy and doesn't check what it's supposed to. In 't5310-pack-bitmaps.sh', the 'list_packed_objects' helper function does what its name implies by running: git show-index <"$1" | cut -d' ' -f2 The test in question invokes this function like this: list_packed_objects <packa-$packasha1.idx >packa.objects && list_packed_objects <packb-$packbsha1.idx >packb.objects && test_cmp packa.objects packb.objects Note how these two callsites don't specify the name of the pack index file as the function's parameter, but redirect the function's standard input from it. This triggers an error message from the shell, as it has no filename to redirect from in the function, but this error is ignored, because it happens upstream of a pipe. Consequently, both invocations produce empty 'pack{a,b}.objects' files, and the subsequent 'test_cmp' happily finds those two empty files identical. Fix these two 'list_packed_objects' invocations by specifying the pack index files as parameters. Furthermore, eliminate the pipe in that function by replacing it with an &&-chained pair of commands using an intermediate file, so a failure of 'git show-index' or the shell redirection will fail the test. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | | Merge branch 'js/mingw-o-append'Junio C Hamano2018-08-201-2/+39
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Among the three codepaths we use O_APPEND to open a file for appending, one used for writing GIT_TRACE output requires O_APPEND implementation that behaves sensibly when multiple processes are writing to the same file. POSIX emulation used in the Windows port has been updated to improve in this area. * js/mingw-o-append: mingw: enable atomic O_APPEND
| * | | | mingw: enable atomic O_APPENDJohannes Sixt2018-08-131-2/+39
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Windows CRT implements O_APPEND "manually": on write() calls, the file pointer is set to EOF before the data is written. Clearly, this is not atomic. And in fact, this is the root cause of failures observed in t5552-skipping-fetch-negotiator.sh and t5503-tagfollow.sh, where different processes write to the same trace file simultanously; it also occurred in t5400-send-pack.sh, but there it was worked around in 71406ed4d6 ("t5400: avoid concurrent writes into a trace file", 2017-05-18). Fortunately, Windows does support atomic O_APPEND semantics using the file access mode FILE_APPEND_DATA. Provide an implementation that does. This implementation is minimal in such a way that it only implements the open modes that are actually used in the Git code base. Emulation for other modes can be added as necessary later. To become aware of the necessity early, the unusal error ENOSYS is reported if an unsupported mode is encountered. Diagnosed-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Helped-by: Jeff Hostetler <git@jeffhostetler.com> Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | | Merge branch 'jk/for-each-object-iteration'Junio C Hamano2018-08-2010-111/+218
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The API to iterate over all objects learned to optionally list objects in the order they appear in packfiles, which helps locality of access if the caller accesses these objects while as objects are enumerated. * jk/for-each-object-iteration: for_each_*_object: move declarations to object-store.h cat-file: use a single strbuf for all output cat-file: split batch "buf" into two variables cat-file: use oidset check-and-insert cat-file: support "unordered" output for --batch-all-objects cat-file: rename batch_{loose,packed}_object callbacks t1006: test cat-file --batch-all-objects with duplicates for_each_packed_object: support iterating in pack-order for_each_*_object: give more comprehensive docstrings for_each_*_object: take flag arguments as enum for_each_*_object: store flag definitions in a single location
| * | | | for_each_*_object: move declarations to object-store.hJeff King2018-08-144-95/+91
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The for_each_loose_object() and for_each_packed_object() functions are meant to be part of a unified interface: they use the same set of for_each_object_flags, and it's not inconceivable that we might one day add a single for_each_object() wrapper around them. Let's put them together in a single file, so we can avoid awkwardness like saying "the flags for this function are over in cache.h". Moving the loose functions to packfile.h is silly. Moving the packed functions to cache.h works, but makes the "cache.h is a kitchen sink" problem worse. The best place is the recently-created object-store.h, since these are quite obviously related to object storage. The for_each_*_in_objdir() functions do not use the same flags, but they are logically part of the same interface as for_each_loose_object(), and share callback signatures. So we'll move those, as well, as they also make sense in object-store.h. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | cat-file: use a single strbuf for all outputJeff King2018-08-141-11/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we're in batch mode, we end up in batch_object_write() for each object, which allocates its own strbuf for each call. Instead, we can provide a single "scratch" buffer that gets reused for each output. When running: git cat-file --batch-all-objects --batch-check='%(objectname)' on git.git, my best-of-five time drops from: real 0m0.171s user 0m0.159s sys 0m0.012s to: real 0m0.133s user 0m0.121s sys 0m0.012s Note that we could do this just by putting the "scratch" pointer into "struct expand_data", but I chose instead to add an extra parameter to the callstack. That's more verbose, but it makes it a bit more obvious what is going on, which in turn makes it easy to see where we need to be releasing the string in the caller (right after the loop which uses it in each case). Based-on-a-patch-by: René Scharfe <l.s.r@web.de> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | cat-file: split batch "buf" into two variablesJeff King2018-08-141-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We use the "buf" strbuf for two things: to read incoming lines, and as a scratch space for test-expanding the user-provided format. Let's split this into two variables with descriptive names, which makes their purpose and lifetime more clear. It will also help in a future patch when we start using the "output" buffer for more expansions. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | cat-file: use oidset check-and-insertJeff King2018-08-141-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We don't need to check if the oidset has our object before we insert it; that's done as part of the insertion. We can just rely on the return value from oidset_insert(), which saves one hash lookup per object. This measurable speedup is tiny and within the run-to-run noise, but the result is simpler to read, too. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | cat-file: support "unordered" output for --batch-all-objectsJeff King2018-08-133-5/+72
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If you're going to access the contents of every object in a packfile, it's generally much more efficient to do so in pack order, rather than in hash order. That increases the locality of access within the packfile, which in turn is friendlier to the delta base cache, since the packfile puts related deltas next to each other. By contrast, hash order is effectively random, since the sha1 has no discernible relationship to the content. This patch introduces an "--unordered" option to cat-file which iterates over packs in pack-order under the hood. You can see the results when dumping all of the file content: $ time ./git cat-file --batch-all-objects --buffer --batch | wc -c 6883195596 real 0m44.491s user 0m42.902s sys 0m5.230s $ time ./git cat-file --unordered \ --batch-all-objects --buffer --batch | wc -c 6883195596 real 0m6.075s user 0m4.774s sys 0m3.548s Same output, different order, way faster. The same speed-up applies even if you end up accessing the object content in a different process, like: git cat-file --batch-all-objects --buffer --batch-check | grep blob | git cat-file --batch='%(objectname) %(rest)' | wc -c Adding "--unordered" to the first command drops the runtime in git.git from 24s to 3.5s. Side note: there are actually further speedups available for doing it all in-process now. Since we are outputting the object content during the actual pack iteration, we know where to find the object and could skip the extra lookup done by oid_object_info(). This patch stops short of that optimization since the underlying API isn't ready for us to make those sorts of direct requests. So if --unordered is so much better, why not make it the default? Two reasons: 1. We've promised in the documentation that --batch-all-objects outputs in hash order. Since cat-file is plumbing, people may be relying on that default, and we can't change it. 2. It's actually _slower_ for some cases. We have to compute the pack revindex to walk in pack order. And our de-duplication step uses an oidset, rather than a sort-and-dedup, which can end up being more expensive. If we're just accessing the type and size of each object, for example, like: git cat-file --batch-all-objects --buffer --batch-check my best-of-five warm cache timings go from 900ms to 1100ms using --unordered. Though it's possible in a cold-cache or under memory pressure that we could do better, since we'd have better locality within the packfile. And one final question: why is it "--unordered" and not "--pack-order"? The answer is again two-fold: 1. "pack order" isn't a well-defined thing across the whole set of objects. We're hitting loose objects, as well as objects in multiple packs, and the only ordering we're promising is _within_ a single pack. The rest is apparently random. 2. The point here is optimization. So we don't want to promise any particular ordering, but only to say that we will choose an ordering which is likely to be efficient for accessing the object content. That leaves the door open for further changes in the future without having to add another compatibility option. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | cat-file: rename batch_{loose,packed}_object callbacksJeff King2018-08-131-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We're not really doing the batch-show operation in these callbacks, but just collecting the set of objects. That distinction will become more important in a future patch, so let's rename them now to avoid cluttering that diff. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | t1006: test cat-file --batch-all-objects with duplicatesJeff King2018-08-131-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The test for --batch-all-objects in t1006 covers a variety of object storage situations, but one thing it doesn't cover is that we avoid mentioning duplicate objects. We won't have any because running "git repack -ad" will have packed them all and deleted the loose ones. This does work (because we sort and de-dup the output list), but it's good to include it in our test. And doubly so for when we add an unordered mode which has to de-dup in a different way. Note that we cannot just re-create one of the objects, as Git will omit the write of an object that is already present. However, we can create a new pack with one of the objects, which forces the duplication. One alternative would be to just use "git repack -a" instead of "-ad". But then _every_ object would be duplicated as loose and packed, and we might miss a bug that omits packed objects (because we'd show their loose counterparts). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | for_each_packed_object: support iterating in pack-orderJeff King2018-08-134-9/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently iterate over objects within a pack in .idx order, which uses the object hashes. That means that it is effectively random with respect to the location of the object within the pack. If you're going to access the actual object data, there are two reasons to move linearly through the pack itself: 1. It improves the locality of access in the packfile. In the cold-cache case, this may mean fewer disk seeks, or better usage of disk cache. 2. We store related deltas together in the packfile. Which means that the delta base cache can operate much more efficiently if we visit all of those related deltas in sequence, as the earlier items are likely to still be in the cache. Whereas if we visit the objects in random order, our cache entries are much more likely to have been evicted by unrelated deltas in the meantime. So in general, if you're going to access the object contents pack order is generally going to end up more efficient. But if you're simply generating a list of object names, or if you're going to end up sorting the result anyway, you're better off just using the .idx order, as finding the pack order means generating the in-memory pack-revindex. According to the numbers in 8b8dfd5132 (pack-revindex: radix-sort the revindex, 2013-07-11), that takes about 200ms for linux.git, and 20ms for git.git (those numbers are a few years old but are still a good ballpark). That makes it a good optimization for some cases (we can save tens of seconds in git.git by having good locality of delta access, for a 20ms cost), but a bad one for others (e.g., right now "cat-file --batch-all-objects --batch-check="%(objectname)" is 170ms in git.git, so adding 20ms to that is noticeable). Hence this patch makes it an optional flag. You can't actually do any interesting timings yet, as it's not plumbed through to any user-facing tools like cat-file. That will come in a later patch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | for_each_*_object: give more comprehensive docstringsJeff King2018-08-132-7/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We already mention the local/alternate behavior of these functions, but we can help clarify a few other behaviors: - there's no need to mention LOCAL_ONLY specifically, since we already reference the flags by type (and as we add more flags, we don't want to have to mention each) - clarify that reachability doesn't matter here; this is all accessible objects - what ordering/uniqueness guarantees we give - how pack-specific flags are handled for the loose case Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | for_each_*_object: take flag arguments as enumJeff King2018-08-134-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's not wrong to pass our flags in an "unsigned", as we know it will be at least as large as the enum. However, using the enum in the declaration makes it more obvious where to find the list of flags. While we're here, let's also drop the "extern" noise-words from the declarations, per our modern coding style. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | for_each_*_object: store flag definitions in a single locationJeff King2018-08-132-7/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These flags were split between cache.h and packfile.h, because some of the flags apply only to packs. However, they share a single numeric namespace, since both are respected for the packed variant. Let's make sure they're defined together so that nobody accidentally adds a new flag in one location that duplicates the other. While we're here, let's also put them in an enum (which helps debugger visibility) and use "(1<<n)" rather than counting powers of 2 manually. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | | | Merge branch 'ab/fetch-tags-noclobber'Junio C Hamano2018-08-203-22/+47
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Test and doc clean-ups. * ab/fetch-tags-noclobber: pull doc: fix a long-standing grammar error fetch tests: correct a comment "remove it" -> "remove them" push tests: assert re-pushing annotated tags push tests: add more testing for forced tag pushing push tests: fix logic error in "push" test assertion push tests: remove redundant 'git push' invocation fetch tests: change "Tag" test tag to "testTag"
| * | | | | pull doc: fix a long-standing grammar errorÆvar Arnfjörð Bjarmason2018-08-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It should be "is not an empty string" not "is not empty string". This fixes wording originally introduced in ab9b31386b ("Documentation: multi-head fetch.", 2005-08-24). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>