path: root/src/odb.c
Commit message [branch] (Author, Age, Files, Lines)
* odb_read_prefix: reset error in backends loop [ethomson/read_prefix] (Edward Thomson, 2017-06-12, 1 file, -1/+4)
| | | | | | | | | When looking for an object by prefix, we query all the backends so that we can ensure that there is no ambiguity. We need to reset the `error` value between backends; otherwise the first backend may find an object by prefix, but subsequent backends may not. If we do not reset the `error` value then it will remain at `GIT_ENOTFOUND` and `read_prefix_1` will fail, despite having actually found an object.
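The failure mode described in this message can be reproduced with a reduced model of the loop. This is only a sketch under stated assumptions, not the code from `odb.c`: `fake_backend`, `backend_read_prefix`, `lookup_prefix`, and the `ERR_NOTFOUND` value are hypothetical stand-ins, and the `reset_error` flag exists only to show the behavior with and without the fix.

```c
#include <assert.h>
#include <stddef.h>

#define ERR_NOTFOUND (-3) /* hypothetical stand-in for GIT_ENOTFOUND */

/* Hypothetical backend that either holds the object or does not. */
struct fake_backend { int has_object; };

static int backend_read_prefix(const struct fake_backend *b)
{
    return b->has_object ? 0 : ERR_NOTFOUND;
}

/* Query every backend so that ambiguity could be detected; succeed if
 * any backend found the object. */
static int lookup_prefix(const struct fake_backend *backends, size_t n,
                         int reset_error)
{
    int error = 0, found = 0;
    size_t i;

    assert(backends != NULL || n == 0);

    for (i = 0; i < n; i++) {
        error = backend_read_prefix(&backends[i]);

        if (error == ERR_NOTFOUND) {
            if (reset_error)
                error = 0; /* a miss in one backend is not a failure */
            continue;
        }
        if (error)
            return error;
        found = 1;
    }

    if (!found)
        return ERR_NOTFOUND;

    /* Without the reset, a miss in the last backend is still stored in
     * `error` here, although an earlier backend found the object. */
    return error;
}
```

With one backend that has the object followed by one that does not, the call succeeds only when `reset_error` is set.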
* odb: fix printf formatter for git_off_t (Patrick Steinhardt, 2017-05-15, 1 file, -3/+3)
| | | | | | | | | | The fields `declared_size` and `received_bytes` of the `git_odb_stream` are both of type `git_off_t` which is defined as a signed integer. When passing these values to a printf-style string in `git_odb_stream__invalid_length`, though, we format these as PRIuZ, which is unsigned. Fix the issue by using PRIdZ instead, silencing warnings on macOS.
* odb: shut up gcc warnings regarding uninitialized variables (Patrick Steinhardt, 2017-05-15, 1 file, -2/+2)
| | | | | | | | | | | | The `error` variable is used as a return value in the out-section of both `odb_read_1` and `read_prefix_1`. While the value will actually always be initialized inside of this section, GCC fails to realize this due to interactions with the `found` variable: if `found` is set, the error will always be initialized. If it is not, we return early without reaching the out-statements. Shut up the warnings by initializing the error variable, even though it is unnecessary.
* odb: verify hashes in read_prefix_1 (Patrick Steinhardt, 2017-04-28, 1 file, -0/+12)
| | | | | | While the function reading an object from the complete OID already verifies OIDs, we do not yet do so for reading objects from a partial OID. Do so when strict OID verification is enabled.
* odb: improve error handling in read_prefix_1 (Patrick Steinhardt, 2017-04-28, 1 file, -7/+20)
| | | | | | | | | | | | The read_prefix_1 function has several return statements sprinkled throughout the code. As we have to free memory upon getting an error, the free code has to be repeated at every single return -- which it is not, so we have a memory leak here. Refactor the code to use the typical `goto out` pattern, which will free data when an error has occurred. While we're at it, we can also improve the error message thrown when multiple ambiguous prefixes are found. It will now include the colliding prefixes.
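The `goto out` pattern named in this message can be illustrated in isolation. The sketch below is not `read_prefix_1`; `copy_nonempty` and its error values are hypothetical, chosen only to show a single cleanup label that frees the buffer on every error path while handing ownership to the caller on success.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy `src` into a fresh buffer, rejecting the
 * empty string. Returns 0 on success, a negative value on error. */
static int copy_nonempty(char **out, const char *src)
{
    char *buf = NULL;
    int error = 0;
    size_t len;

    assert(out && src);
    *out = NULL;

    len = strlen(src);
    buf = malloc(len + 1);
    if (!buf) {
        error = -1;
        goto out;
    }
    memcpy(buf, src, len + 1);

    if (len == 0) {
        error = -2; /* validation fails after the allocation */
        goto out;
    }

    *out = buf; /* success: ownership moves to the caller */
    buf = NULL;

out:
    free(buf); /* runs on every path; free(NULL) is a no-op */
    return error;
}
```

Because every exit goes through `out`, adding another failure case later cannot reintroduce the leak as long as it uses `goto out` rather than a bare `return`.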
* odb: add option to turn off hash verification (Patrick Steinhardt, 2017-04-28, 1 file, -5/+9)
| | | | | | | | | | | Verifying hashsums of objects we are reading from the ODB may be costly as we have to perform an additional hashsum calculation on the object. Especially when reading large objects, the penalty can be as high as 35%, as can be seen when executing the equivalent of `git cat-file` with and without verification enabled. To mitigate this, we add a global option for libgit2 which enables the developer to turn off the verification, e.g. when they can be reasonably sure that the objects on disk won't be corrupted.
* odb: verify object hashes (Patrick Steinhardt, 2017-04-28, 1 file, -3/+30)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The upstream git.git project verifies objects when looking them up from disk. This avoids scenarios where objects have somehow become corrupt on disk, e.g. due to hardware failures or bit flips. While our mantra is usually to follow upstream behavior, we do not do so in this case, as we never check hashes of objects we have just read from disk. To fix this, we create a new error class `GIT_EMISMATCH` which denotes that we have looked up an object with a hashsum mismatch. `odb_read_1` will then, after having read the object from its backend, hash the object and compare the resulting hash to the expected hash. If hashes do not match, it will return an error. This obviously introduces another computation of checksums and could potentially impact performance. Note though that we usually perform I/O operations directly before doing this computation, and as such the actual overhead should be drowned out by I/O. Running our test suite seems to confirm this guess. On a Linux system with best-of-five timings, we had 21.592s with the check enabled and 21.590s with the check disabled. Note though that our test suite mostly contains very small blobs only. It is expected that repositories with bigger blobs may notice an increased hit by this check. In addition to a new test, we also had to change the odb::backend::nonrefreshing test suite, which now triggers a hashsum mismatch when looking up the commit "deadbeef...". This is expected, as the fake backend allocated inside of the test will return an empty object for the OID "deadbeef...", which will obviously not hash back to "deadbeef..." again. We can simply adjust the hash to equal the hash of the empty object here to fix this test.
* Merge pull request #4030 from libgit2/ethomson/fsync (Edward Thomson, 2017-03-22, 1 file, -5/+23)
|\ | | | | fsync all the things
| * Honor `core.fsyncObjectFiles` [ethomson/fsync] (Edward Thomson, 2017-03-02, 1 file, -5/+23)
| |
* | git_commit_create: freshen tree objects in commit [ethomson/freshen_trees] (Edward Thomson, 2017-03-03, 1 file, -3/+3)
|/ | | | Freshen the tree object that a commit points to during commit time.
* giterr_set: consistent error messages (Edward Thomson, 2016-12-29, 1 file, -12/+12)
| | | | | | | | Error messages should be sentence fragments, and therefore: 1. Should not begin with a capital letter, 2. Should not conclude with punctuation, and 3. Should not end a sentence and begin a new one
* common: cast precision specifiers to int (Patrick Steinhardt, 2016-11-14, 1 file, -1/+1)
|
* odb: only provide the empty tree (Edward Thomson, 2016-08-05, 1 file, -5/+0)
| | | | | | | Only provide the empty tree internally, which matches git's behavior. If we provide the empty blob then any users trying to write it with libgit2 would omit it from actually landing in the odb, which would appear to git proper as a broken repository (missing that object).
* odb: freshen existing objects when writing (Edward Thomson, 2016-08-04, 1 file, -3/+44)
| | | | | | When writing an object, we calculate its OID and see if it exists in the object database. If it does, we need to freshen the file that contains it.
* Merge pull request #3223 from ethomson/apply (Edward Thomson, 2016-06-25, 1 file, -1/+1)
|\ | | | | Reading patch files
| * delta: move delta application to delta.c (Edward Thomson, 2016-05-26, 1 file, -1/+1)
| | | | | | | | | | | | | | Move the delta application functions into `delta.c`, next to the similar delta creation functions. Make the `git__delta_apply` functions adhere to other naming and parameter style within the library.
* | fix error message SHA truncation in git_odb__error_notfound() (Sim Domingo, 2016-06-20, 1 file, -1/+1)
|/
* odb: Try to lookup headers in all backends before passthrough [vmg/expand-fixes] (Vicent Marti, 2016-03-09, 1 file, -5/+20)
|
* odb: Refactor `git_odb_expand_ids` (Vicent Marti, 2016-03-09, 1 file, -21/+26)
|
* odb: Implement new helper to read types without refreshing (Vicent Marti, 2016-03-09, 1 file, -45/+104)
|
* odb: Handle corner cases in `git_odb_expand_ids` (Vicent Marti, 2016-03-09, 1 file, -22/+27)
| | | | | | | | | | The old implementation had two issues: 1. OIDs that were so short as to be ambiguous were not being handled properly. 2. If the last OID to expand in the array was missing from the ODB, we would leak a `GIT_ENOTFOUND` error code from the function.
* git_odb_expand_ids: accept git_odb_expand_id array (Edward Thomson, 2016-03-08, 1 file, -32/+19)
| | | | Take (and write to) an array of a struct, `git_odb_expand_id`.
* git_odb_expand_ids: rename func, return the type (Edward Thomson, 2016-03-08, 1 file, -5/+11)
|
* git_odb_exists_many_prefixes: query odb for multiple short ids (Edward Thomson, 2016-03-07, 1 file, -12/+68)
| | | | | Query the object database for multiple objects at a time, given their object ID (which may be abbreviated) and optional type.
* odb: improved not found error messages (Edward Thomson, 2016-03-07, 1 file, -7/+10)
| | | | | When looking up an abbreviated oid, show the actual (abbreviated) oid the caller passed instead of a full (but ambiguously truncated) oid.
* odb: Prioritize alternate backends [vmg/odb-lookups] (Vicent Marti, 2015-10-14, 1 file, -4/+8)
| | | | | | | | | | | | For most real use cases, repositories with alternates use them as main object storage. Checking the alternate for objects before the main repository should result in measurable speedups. Because of this, we're changing the sorting algorithm to prioritize alternates *in cases where two backends have the same priority*. This means that the pack backend for the alternate will be checked before the pack backend for the main repository *but* both of them will be checked before any loose backends.
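The ordering this message describes (higher priority first, alternates first among equal priorities) can be written as a comparator. The following is a sketch rather than the comparator in `odb.c`: `backend_entry` is a hypothetical struct reduced to the two sort keys, and the priority values 2 for packed and 1 for loose are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical entry, reduced to the two keys the ordering depends on. */
struct backend_entry {
    int priority;     /* higher values are queried first */
    int is_alternate; /* 1 if the backend belongs to an alternate */
};

/* qsort comparator: descending priority, alternates first on a tie. */
static int backend_cmp(const void *a, const void *b)
{
    const struct backend_entry *x = a, *y = b;

    assert(x && y);

    if (x->priority != y->priority)
        return (y->priority > x->priority) - (y->priority < x->priority);

    return y->is_alternate - x->is_alternate;
}
```

Sorting the four combinations with `qsort(v, 4, sizeof(v[0]), backend_cmp)` yields alternate pack, main pack, alternate loose, main loose, which matches the order described in the message.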
* odb: Be smarter when refreshing backends (Vicent Marti, 2015-10-14, 1 file, -75/+156)
| | | | | | | | | | | | | | | | | | | | | | | | | | | In the current implementation of ODB backends, each backend is tasked with refreshing itself after a failed lookup. This is standard Git behavior: we want to e.g. reload the packfiles on disk in case they have changed and that's the reason we can't find the object we're looking for. This behavior, however, becomes pathological in repositories where multiple alternates have been loaded. Given that each alternate counts as a separate backend, a miss in the main repository (which can potentially be very frequent in cases where object storage comes from the alternate) will result in refreshing all its packfiles before we move on to the alternate backend where the object will most likely be found. To fix this, the code in `odb.c` has been refactored to perform the refresh of all the backends externally, once we've verified that the object is nowhere to be found. If the refresh is successful, we then perform the lookup sequentially through all the backends, skipping the ones that we know for sure weren't refreshed (because they have no refresh API). The on-disk pack backend has been adjusted accordingly: it no longer performs refreshes internally.
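The control flow described here (look everywhere first, refresh once, then retry only the backends that could have changed) can be modeled with mock backends. Everything in this sketch is hypothetical: `mock_backend`, its fields, and `odb_exists_sketch` are not libgit2 types or functions, and the refresh is modeled as always succeeding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical backend model for illustration only. */
struct mock_backend {
    int present_after_refresh; /* object becomes visible after a refresh */
    int can_refresh;           /* models having a refresh callback */
    int refreshed;
    int lookups;               /* counts how often the backend was queried */
};

static int mock_exists(struct mock_backend *b, int only_refreshed)
{
    if (only_refreshed && !b->can_refresh)
        return 0; /* nothing changed for this backend, skip it */
    b->lookups++;
    return b->refreshed && b->present_after_refresh;
}

static int lookup_any(struct mock_backend *v, size_t n, int only_refreshed)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (mock_exists(&v[i], only_refreshed))
            return 1;
    return 0;
}

static int odb_exists_sketch(struct mock_backend *v, size_t n)
{
    size_t i;

    assert(v != NULL || n == 0);

    /* First pass: no backend refreshes itself on a miss. */
    if (lookup_any(v, n, 0))
        return 1;

    /* The object is nowhere to be found: refresh once, from the outside. */
    for (i = 0; i < n; i++)
        if (v[i].can_refresh)
            v[i].refreshed = 1;

    /* Second pass: only backends that were actually refreshed. */
    return lookup_any(v, n, 1);
}
```

In this model a backend without a refresh callback is queried exactly once, while a refreshable backend is queried at most twice per lookup.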
* refdb and odb backends must provide `free` function (Arthur Schreiber, 2015-10-01, 1 file, -2/+1)
| | | | | | | | | As refdb and odb backends can be allocated by client code, libgit2 can’t know whether an alternative memory allocator was used, and thus should not try to call `git__free` on those objects. Instead, odb and refdb backend implementations must always provide their own `free` functions to ensure memory gets freed correctly.
* odb: cast to long long for printf (Edward Thomson, 2015-06-29, 1 file, -1/+1)
|
* Fixed build warnings on Xcode 6.1 (Pierre-Olivier Latour, 2015-06-02, 1 file, -1/+1)
|
* Merge pull request #3118 from libgit2/cmn/stream-size (Edward Thomson, 2015-05-13, 1 file, -4/+9)
|\ | | | | odb: make the writestream's size a git_off_t
| * odb: make the writestream's size a git_off_t [cmn/stream-size] (Carlos Martín Nieto, 2015-05-13, 1 file, -4/+9)
| | | | | | | | | | | | | | | | | | | | Restricting files to size_t is a silly limitation. The loose backend writes to a file directly, so there is no issue in using 63 bits for the size. We still assume that the header is going to fit in 64 bytes, which does mean quite a bit smaller files due to the run-length encoding, but it's still a much larger size than you would want Git to handle.
* | odb: reverse the default backend priorities [cmn/backends-prio] (Carlos Martín Nieto, 2015-05-13, 1 file, -3/+6)
|/ | | | | | | | | | | | | | | | | | | | | | | | We currently first look in the loose object dir and then in the packs for objects. When performing operations on recent history this has a higher likelihood of hitting, but when we deal with operations which look further back into the past, we start spending a large amount of time getting ENOENT from `access`. Reversing the priorities means that long-running operations can get to their objects faster, as we can look at the index data we have in memory (or rather mapped) to figure out whether we have an object, which is faster than going out to the filesystem. The packed backend already implements an optimistic read algorithm by first looking at the packs we know about and only going out to disk to refresh if the object is not found, which means that in the case where we do have the object (which will be in the majority for anything that traverses the graph) we can avoid going to disk entirely to determine whether an object exists. Operations which look at recent history may take a slight impact, but these would be operations which look at far fewer objects and thus take less time regardless.
* centralizing all IO buffer size values (J Wyman, 2015-05-11, 1 file, -1/+1)
|
* Make our overflow check look more like gcc/clang's (Edward Thomson, 2015-02-13, 1 file, -7/+8)
| | | | | | | | | Make our overflow checking look more like gcc and clang's, so that we can substitute it out with the compiler intrinsics on platforms that support them. This means dropping the ability to pass `NULL` as an out parameter. As a result, the macros also get updated to reflect this as well.
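The shape of such a check can be sketched as follows. The helper names `add_sizet_overflow` and `mul_sizet_overflow` are hypothetical and are not claimed to be the ones used in libgit2. The sketch only mirrors the convention of the GCC and clang builtins such as `__builtin_add_overflow`: the function returns true on overflow and writes the result through a mandatory, non-`NULL` out parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if a + b overflows size_t; otherwise stores the sum in
 * *out. `out` must not be NULL. */
static bool add_sizet_overflow(size_t *out, size_t a, size_t b)
{
    assert(out);
    if (a > SIZE_MAX - b)
        return true;
    *out = a + b;
    return false;
}

/* Returns true if a * b overflows size_t; otherwise stores the product
 * in *out. `out` must not be NULL. */
static bool mul_sizet_overflow(size_t *out, size_t a, size_t b)
{
    assert(out);
    if (a != 0 && b > SIZE_MAX / a)
        return true;
    *out = a * b;
    return false;
}
```

Requiring the out parameter is what allows a portable fallback like this to be swapped for a compiler builtin on platforms that provide one, since the builtins always store a result.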
* odb__hashlink: check st.st_size before casting (Edward Thomson, 2015-02-12, 1 file, -9/+9)
|
* allocations: test for overflow of requested size (Edward Thomson, 2015-02-12, 1 file, -0/+1)
| | | | | Introduce some helper macros to test integer overflow from arithmetic and set error message appropriately.
* win32: remember to cleanup our hash_ctx (Edward Thomson, 2014-12-09, 1 file, -0/+1)
|
* odb: `git_odb_object` contents are never NULL [vmg/empty] (Vicent Marti, 2014-11-21, 1 file, -2/+2)
| | | | | | | This is a contract that we made in the library and that we need to uphold. The contents of a blob can never be NULL because several parts of the library (including the filter and attributes code) expect `git_blob_rawcontent` to always return a valid pointer.
* odb: hardcode the empty blob and tree [cmn/empty-objects] (Carlos Martín Nieto, 2014-11-08, 1 file, -1/+23)
| | | | | | | | | | | | | | | | | | | | | git hardcodes these as objects which exist regardless of whether they are in the odb and uses them in the shell interface as a way of expressing the lack of a blob or tree for one side of e.g. a diff. In the library we use each language's natural way of declaring a lack of value which makes a workaround like this unnecessary. Since git uses it, it does however mean each shell application would need to perform this check themselves. This makes it common work across a range of applications and an issue with compatibility with git, which fits right into what the library aims to provide. Thus we introduce the hard-coded empty blob and tree in the odb frontend. These hard-coded objects are checked for before going to the backends, but after the cache check, which means the second time they're used, they will be treated as normal cached objects instead of creating new ones.
* odb: clear backend errors on successful read (Carlos Martín Nieto, 2014-05-23, 1 file, -0/+1)
| | | | | We go through the different backends in order, so it's not an error if at least one of the backends has the data we want.
* Fix remaining init_options inconsistencies (Russell Belfer, 2014-05-02, 1 file, -9/+4)
| | | | | There were a couple of "init_opts()" functions and a few more cases of structure initialization that I somehow missed.
* Don't redefine the same callback types, their signatures may change (Jacques Germishuys, 2014-04-21, 1 file, -1/+1)
|
* Merge pull request #2178 from libgit2/rb/fix-short-id (Edward Thomson, 2014-03-31, 1 file, -7/+17)
|\ | | | | Fix git_odb_short_id and git_odb_exists_prefix bugs
| * Fix a number of git_odb_exists_prefix bugs (Russell Belfer, 2014-03-10, 1 file, -7/+17)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The git_odb_exists_prefix API was not behaving correctly when a later backend returned GIT_ENOTFOUND even if an earlier backend had found the object. Additionally, the unit tests were not properly exercising the API and had a couple of mistakes in checking the results. Lastly, since the backends are not expected to behave correctly unless all bytes of the short id are zero except for the prefix, this makes the ODB prefix APIs explicitly clear out the extra bytes so the user doesn't have to be as careful.
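Clearing the bytes past the prefix, as the last point describes, could look like the sketch below. It assumes a 20-byte raw SHA-1 id and a prefix length counted in hex digits; `clear_past_prefix` and `OID_RAWSZ` are hypothetical names for this example, not the identifiers used in `odb.c`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OID_RAWSZ 20 /* raw SHA-1 size in bytes (assumed for this sketch) */

/* Zero everything in `id` after the first `hexlen` hex digits. */
static void clear_past_prefix(unsigned char id[OID_RAWSZ], size_t hexlen)
{
    size_t full = hexlen / 2; /* bytes fully covered by the prefix */

    assert(hexlen <= OID_RAWSZ * 2);

    if (full == OID_RAWSZ)
        return; /* full-length id, nothing to clear */

    if (hexlen & 1) {
        id[full] &= 0xf0; /* odd length: keep only the high nibble */
        full++;
    }
    memset(id + full, 0, OID_RAWSZ - full);
}
```

The odd-length case matters because a hex digit is half a byte: a 5-digit prefix covers two whole bytes plus the high nibble of the third.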
* | Fix wrong assertion (Linquize, 2014-03-21, 1 file, -1/+1)
|/ | | | Fixes issue #2196
* Added function-based initializers for every options struct. (Matthew Bowen, 2014-03-05, 1 file, -0/+11)
| | | | The basic structure of each function is courtesy of arrbee.
* Merge pull request #2159 from libgit2/rb/odb-exists-prefix (Vicent Marti, 2014-03-06, 1 file, -1/+55)
|\ | | | | Add ODB API to check for existence by prefix and object id shortener
| * Check short OID len in odb, not in backends (Russell Belfer, 2014-03-05, 1 file, -1/+0)
| |
| * Add exists_prefix to ODB backend and ODB API (Russell Belfer, 2014-03-04, 1 file, -0/+55)
| |