path: root/src/odb.c
Commit message; Author; Age; Files; Lines
* tests: reset odb backend priority [ethomson/odb_tests_priority]; Edward Thomson; 2021-07-30; 1; -2/+2
|
* odb: Implement option for overriding of default odb backend priority; Tony De La Nuez; 2021-07-30; 1; -6/+6
|   Introduce GIT_OPT_SET_ODB_LOOSE_PRIORITY and GIT_OPT_SET_ODB_PACKED_PRIORITY to allow overriding the default priority values for the default ODB backends. Libgit2 has historically assumed that most objects for long-running operations will be packed, therefore GIT_LOOSE_PRIORITY is set to 1 by default, and GIT_PACKED_PRIORITY to 2. When a client allows libgit2 to set the default backends, they can specify an override for the two priority values in order to change the order in which each ODB backend is accessed.
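The effect of the two priority values can be pictured with a small standalone model. This is only a sketch: `example_backend`, `example_backend_cmp` and `example_first_backend` are hypothetical names, not libgit2 internals, and it assumes that the backend with the higher priority value is consulted first, which is what the default values (packed 2, loose 1) suggest.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an ODB backend entry; not libgit2's own type. */
typedef struct {
	const char *name;
	int priority;
} example_backend;

/* Sort in descending priority so the highest value comes first (assumption). */
static int example_backend_cmp(const void *a, const void *b)
{
	const example_backend *x = a, *y = b;
	return (y->priority > x->priority) - (y->priority < x->priority);
}

/* Returns the name of the backend that would be consulted first. */
static const char *example_first_backend(int loose_priority, int packed_priority)
{
	example_backend backends[2];

	assert(loose_priority != packed_priority);

	backends[0].name = "loose";
	backends[0].priority = loose_priority;
	backends[1].name = "packed";
	backends[1].priority = packed_priority;

	qsort(backends, 2, sizeof(backends[0]), example_backend_cmp);
	return backends[0].name;
}
```

With the defaults described above, `example_first_backend(1, 2)` picks the packed backend; swapping the two values makes the loose backend come first.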
* Merge pull request #5765 from lhchavez/cgraph-revwalks; Edward Thomson; 2021-07-26; 1; -0/+53
|\
| |   commit-graph: Use the commit-graph in revwalks
| * commit-graph: Create `git_commit_graph` as an abstraction for the file; lhchavez; 2021-03-10; 1; -60/+39
| |   This change does a medium-size refactor of the git_commit_graph_file and the interaction with the ODB. Now instead of the ODB owning a direct reference to the git_commit_graph_file, there will be an intermediate git_commit_graph. The main advantage of that is that now end users can explicitly set a git_commit_graph that is eagerly checked for errors, while still being able to lazily use the commit-graph in a regular ODB, if the file is present.
| * commit-graph: Use the commit-graph in revwalks; lhchavez; 2021-03-10; 1; -0/+74
| |   This change makes revwalks a bit faster by using the `commit-graph` file (if present). This is thanks to the `commit-graph` allowing much faster parsing of the commit information by requiring near-zero I/O (aside from reading a few dozen bytes off of a `mmap(2)`-ed file) for each commit, instead of having to read the ODB, inflate the commit, and parse it. This is done by modifying `git_commit_list_parse()` and letting it use the ODB-owned commit-graph file.
| |
| |   Part of: #5757
* | Tolerate readlink size less than st_size; David Tolnay; 2021-05-30; 1; -3/+4
| |
* | filter: internal git_buf filter handling function; Edward Thomson; 2021-05-06; 1; -3/+1
|/
|   Introduce `git_filter_list__convert_buf` which behaves like the old implementation of `git_filter_list__apply_data`, where it might move the input data buffer over into the output data buffer space for efficiency. This new implementation will do so in a more predictable way, always freeing the given input buffer (either moving it to the output buffer or filtering it into the output buffer first). Convert internal users to it.
* Make the odb race-free; lhchavez; 2020-11-28; 1; -17/+153
|   This change adds all the necessary locking to the odb to avoid races in the backends.
|
|   Part of: #5592
* odb: use GIT_ASSERT; Edward Thomson; 2020-11-27; 1; -20/+41
|
* Improve the support of atomics; lhchavez; 2020-10-08; 1; -3/+3
|   This change:
|
|   * Starts using GCC's and clang's `__atomic_*` intrinsics instead of the `__sync_*` ones, since the former supersede the latter (and can be safely replaced by their equivalent `__atomic_*` version with the sequentially consistent model).
|   * Makes `git_atomic64`'s value `volatile`. Otherwise, this will make ThreadSanitizer complain.
|   * Adds ways to load the values from atomics. As it turns out, unsynchronized reads are okay only in some architectures, but if we want to be correct (and make ThreadSanitizer happy), those loads should also be performed with the atomic builtins.
|   * Fixes two ThreadSanitizer warnings, as a proof-of-concept that this works:
|     - Avoid accessing `git_refcount`'s `owner` directly, and instead makes all callers go through the `GIT_REFCOUNT_*()` macros, which also use the atomic utilities.
|     - Makes `pool_system_page_size()` race-free.
|
|   Part of: #5592
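As a rough illustration of the first three bullets, the sketch below performs an atomic add and an atomic load with the GCC/clang `__atomic_*` builtins under the sequentially consistent model. The type and function names (`example_atomic64`, `example_atomic64_add`, `example_atomic64_get`) are made up for this example and are not libgit2's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counterpart of a 64-bit atomic counter with a volatile value. */
typedef struct {
	volatile int64_t val;
} example_atomic64;

/* Atomically adds `addend` and returns the new value. */
static int64_t example_atomic64_add(example_atomic64 *a, int64_t addend)
{
	assert(a != NULL);
	return __atomic_add_fetch(&a->val, addend, __ATOMIC_SEQ_CST);
}

/* Loads the value through the atomic builtin instead of a plain read. */
static int64_t example_atomic64_get(example_atomic64 *a)
{
	assert(a != NULL);
	return __atomic_load_n(&a->val, __ATOMIC_SEQ_CST);
}
```

Routing reads through `example_atomic64_get()` rather than touching `val` directly is the same idea as sending callers through accessor macros: every access then goes through one audited, race-free path.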
* Make the tests pass cleanly with MemorySanitizer; lhchavez; 2020-06-30; 1; -3/+4
|   This change:
|
|   * Initializes a few variables that were being read before being initialized.
|   * Includes https://github.com/madler/zlib/pull/393. As such, it only works reliably with `-DUSE_BUNDLED_ZLIB=ON`.
* tree-wide: do not compile deprecated functions with hard deprecation; Patrick Steinhardt; 2020-06-09; 1; -0/+2
|   When compiling libgit2 with -DDEPRECATE_HARD, we add a preprocessor definition `GIT_DEPRECATE_HARD` which causes the "git2/deprecated.h" header to be empty. As a result, no function declarations are made available to callers, but the implementations are still available to link against. This has the problem that function declarations also aren't visible to the implementations, meaning that the symbol's visibility will not be set up correctly. As a result, the resulting library may not expose those deprecated symbols at all on some platforms and thus cause linking errors.
|
|   Fix the issue by conditionally compiling deprecated functions, only. While it becomes impossible to link against such a library in case one uses deprecated functions, distributors of libgit2 aren't expected to pass -DDEPRECATE_HARD anyway. Instead, users of libgit2 should manually define GIT_DEPRECATE_HARD to hide deprecated functions. Using "real" hard deprecation still makes sense in the context of CI to test we don't use deprecated symbols ourselves and in case a dependent uses libgit2 in a vendored way and knows it won't ever use any of the deprecated symbols anyway.
* futils_filesize: use `uint64_t` for object size; Edward Thomson; 2019-11-22; 1; -8/+14
|   Instead of using a signed type (`off_t`) use `uint64_t` for the maximum size of files.
* odb: use `git_object_size_t` for object size; Edward Thomson; 2019-11-22; 1; -5/+5
|   Instead of using a signed type (`off_t`) use a new `git_object_size_t` for the sizes of objects.
* fileops: rename to "futils.h" to match function signatures; Patrick Steinhardt; 2019-07-20; 1; -1/+1
|   Our file utils functions all have a "futils" prefix, e.g. `git_futils_touch`. One would thus naturally guess that their definitions and implementation would live in files "futils.h" and "futils.c", respectively, but in fact they live in "fileops.h". Rename the files to match expectations.
* configuration: cvar -> configmap; Patrick Steinhardt; 2019-07-18; 1; -1/+1
|   `cvar` is an unhelpful name. Refactor its usage to `configmap` for more clarity.
* docs: fixups; Etienne Samson; 2019-06-26; 1; -0/+1
|
* oid: `is_zero` instead of `iszero`; Edward Thomson; 2019-06-16; 1; -5/+5
|   The only function that is named `issomething` (without underscore) was `git_oid_iszero`. Rename it to `git_oid_is_zero` for consistency with the rest of the library.
* odb: provide a free function for custom backends [ethomson/odb_backend_allocations]; Edward Thomson; 2019-02-23; 1; -0/+6
|   Custom backends can allocate memory when reading objects and providing them to libgit2. However, if an error occurs in the custom backend after the memory has been allocated for the custom object but before it's returned to libgit2, the custom backend has no way to free that memory and it must be leaked. Provide a free function that corresponds to the alloc function so that custom backends have an opportunity to free memory before they return an error.
* odb: rename git_odb_backend_malloc for consistency; Edward Thomson; 2019-02-23; 1; -1/+6
|   The `git_odb_backend_malloc` name is a system function that is provided for custom ODB backends and allows them to allocate memory for an ODB object in the read callback. This is important so that libgit2 can later free the memory used by an ODB object that was read from the custom backend. However, the name _suggests_ that it actually allocates a `git_odb_backend`. It does not; rename it to make it clear that it actually allocates backend _data_.
* indexer: use git_indexer_progress throughout; Edward Thomson; 2019-02-22; 1; -1/+1
|   Update internal usage of `git_transfer_progress` to `git_indexer_progress`.
* cache: fix misnaming of `git_cache_free`; Patrick Steinhardt; 2019-02-21; 1; -2/+2
|   Functions that free a structure's contents but not the structure itself shall be named `dispose` in the libgit2 project, but the function `git_cache_free` does not follow this naming pattern. Fix this by renaming it to `git_cache_dispose` and adjusting all callers to make use of the new name.
* Fix a memory leak in odb_otype_fast(); lhchavez; 2019-02-20; 1; -0/+1
|   This change frees a copy of a cached object in odb_otype_fast().
* Fix a _very_ improbable memory leak in git_odb_new(); lhchavez; 2019-02-16; 1; -2/+6
|   This change fixes a mostly theoretical memory leak in git_odb_new() that can only manifest if git_cache_init() fails due to running out of memory or not being able to acquire its lock.
* blob: validate that blob sizes fit in a size_t; Edward Thomson; 2019-01-25; 1; -6/+6
|   Our blob size is a `git_off_t`, which is a signed 64 bit int. This may be erroneously negative or larger than `SIZE_MAX`. Ensure that the blob size fits into a `size_t` before casting.
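The range check described here can be sketched in isolation as follows. `example_size_from_off` is a hypothetical helper, and `int64_t` stands in for `git_off_t` on the assumption that it is a signed 64-bit integer, as the message says.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Converts a signed 64-bit size to size_t.
 * Returns 0 on success, or -1 if the value is negative or does not fit.
 */
static int example_size_from_off(size_t *out, int64_t size)
{
	assert(out != NULL);

	/* Reject negative values first so the unsigned comparison is meaningful. */
	if (size < 0 || (uint64_t)size > SIZE_MAX)
		return -1;

	*out = (size_t)size;
	return 0;
}
```

On platforms where `size_t` is 64 bits wide the upper-bound test can never fail, but it matters on 32-bit targets, where a large blob would otherwise be silently truncated by the cast.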
* git_error: use new names in internal APIs and usage; Edward Thomson; 2019-01-22; 1; -29/+29
|   Move to the `git_error` name in the internal API for error-related functions.
* Fix odb foreach to also close on positive error code; Marijan Šuflaj; 2019-01-20; 1; -1/+1
|   In include/git2/odb.h it states that the callback can also return a positive value, which should break the loop. Implementations of git_odb_foreach() and pack_backend__foreach() did not respect that.
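A minimal model of the corrected loop is shown below: iteration stops on any non-zero callback result, not only on negative ones. The names (`example_foreach`, `example_foreach_cb`, `example_stop_at_three`) are hypothetical, and handing the callback's value straight back to the caller is an assumption of this sketch rather than a statement about libgit2's return values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: return 0 to continue, non-zero to stop. */
typedef int (*example_foreach_cb)(int item, void *payload);

static int example_foreach(const int *items, size_t count,
	example_foreach_cb cb, void *payload)
{
	size_t i;

	assert(cb != NULL);

	for (i = 0; i < count; i++) {
		int error = cb(items[i], payload);

		/* Testing `error < 0` here would ignore positive stop requests. */
		if (error != 0)
			return error;
	}

	return 0;
}

/* Example callback: counts visited items and asks to stop at item 3. */
static int example_stop_at_three(int item, void *payload)
{
	int *visited = payload;

	(*visited)++;
	return item == 3 ? 1 : 0;
}
```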
* Merge pull request #4940 from libgit2/ethomson/git_obj; Edward Thomson; 2019-01-19; 1; -3/+3
|\
| |   More `git_obj` to `git_object` updates
| * object_type: GIT_OBJECT_BAD is now GIT_OBJECT_INVALID; Edward Thomson; 2019-01-17; 1; -3/+3
| |   We use the term "invalid" to refer to bad or malformed data, e.g. `GIT_REF_INVALID` and `GIT_EINVALIDSPEC`. Since we're changing the names of the `git_object_t`s in this release, update it to be `GIT_OBJECT_INVALID` instead of `BAD`.
* | Fix a bunch of warnings; lhchavez; 2019-01-05; 1; -1/+1
|/
|   This change fixes a bunch of warnings that were discovered by compiling with `clang -target=i386-pc-linux-gnu`. It turned out that the intrinsics were not necessarily being used in all platforms! Especially in GCC, since it does not support __has_builtin. Some more warnings were gleaned from the Windows build, but I stopped when I saw that some third-party dependencies (e.g. zlib) have warnings of their own, so we might never be able to enable -Werror there.
* object_type: use new enumeration names [ethomson/index_fixes]; Edward Thomson; 2018-12-01; 1; -29/+29
|   Use the new object_type enumeration names within the codebase.
* odb: fix use of wrong printf formatters; Patrick Steinhardt; 2018-08-06; 1; -2/+2
|   The `git_odb_stream` members `declared_size` and `received_bytes` are both of the type `git_off_t`, which we usually define to be a 64 bit signed integer. Thus, passing these members to "PRIdZ" formatters is not correct, as they are not guaranteed to accept big enough numbers. Instead, use the "PRId64" formatter, which is able to represent 64 bit signed integers.
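For reference, this is how the standard `PRId64` macro from `<inttypes.h>` is typically combined with a format string. `example_format_length` and the message text are invented for this sketch; `int64_t` stands in for `git_off_t` under the assumption stated in the message.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/*
 * Formats two signed 64-bit sizes into `buf`.
 * Returns the number of characters snprintf would have written.
 */
static int example_format_length(char *buf, size_t buflen,
	int64_t declared, int64_t received)
{
	assert(buf != NULL);

	/* PRId64 expands to the correct conversion for int64_t on every platform. */
	return snprintf(buf, buflen,
		"declared %" PRId64 ", received %" PRId64, declared, received);
}
```

Because `PRId64` is a string literal, it has to sit outside the quotes and relies on adjacent string literal concatenation.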
* Convert usage of `git_buf_free` to new `git_buf_dispose`; Patrick Steinhardt; 2018-06-10; 1; -7/+7
|
* odb: fix writing to fake write streams; Patrick Steinhardt; 2018-03-23; 1; -1/+1
|   In commit 7ec7aa4a7 (odb: assert on logic errors when writing objects, 2018-02-01), the check for whether we are trying to overflow the fake stream buffer was changed from returning an error to raising an assert. The conversion forgot though that the logic around `assert`s is basically inverted. Previously, if the statement
|
|       stream->written + len > stream->size
|
|   evaluated to true, we would return a `-1`. Now we are asserting that this statement is true, and in case it is not we will raise an error. So the conversion to the `assert` in fact changed the behaviour to the complete opposite intention.
|
|   Fix the assert by inverting its condition again and add a regression test.
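The inversion is easy to get wrong, so here is a small standalone sketch of the two forms side by side. `example_stream` and its helpers are hypothetical and only mirror the field names quoted in the message; the 16-byte buffer is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size write stream. */
typedef struct {
	char buffer[16];
	size_t size;
	size_t written;
} example_stream;

/* Returns non-zero when `len` more bytes still fit into the stream. */
static int example_write_fits(const example_stream *stream, size_t len)
{
	return stream->written + len <= stream->size;
}

static int example_stream_write(example_stream *stream, const char *data, size_t len)
{
	/*
	 * Error-return form tests for the failure:
	 *     if (stream->written + len > stream->size) return -1;
	 * The assert form must state the condition that has to hold,
	 * which is the negation of that test.
	 */
	assert(example_write_fits(stream, len));

	memcpy(stream->buffer + stream->written, data, len);
	stream->written += len;
	return 0;
}
```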
* odb: fix memory leaks due to not freeing hash context; Patrick Steinhardt; 2018-02-09; 1; -0/+2
|
* odb: error when we can't create object header; Edward Thomson; 2018-02-09; 1; -18/+38
|   Return an error to the caller when we can't create an object header for some reason (printf failure) instead of simply asserting.
* odb: assert on logic errors when writing objects; Edward Thomson; 2018-02-09; 1; -2/+1
|   There's no recovery possible if we're so confused or corrupted that we're trying to overwrite our memory. Simply assert.
* git_odb__hashfd: propagate error on failures; Edward Thomson; 2018-02-09; 1; -1/+1
|
* git_odb__hashobj: provide error messages on failures; Edward Thomson; 2018-02-09; 1; -4/+8
|   Provide error messages on hash failures: assert when given invalid input instead of failing with a user error; provide error messages on program errors.
* odb: check for alloc errors on hardcoded objects; Edward Thomson; 2018-02-09; 1; -6/+14
|   It's unlikely that we'll fail to allocate a single byte, but let's check for allocation failures for good measure. Untangle `-1` being a marker of not having found the hardcoded odb object; use that to reflect actual errors.
* odb: error when we can't alloc an object; Edward Thomson; 2018-02-09; 1; -2/+6
|   At the moment, we're swallowing the allocation failure. We need to return the error to the caller.
* odb: provide length and type with streaming read; Edward Thomson; 2018-02-01; 1; -2/+7
|   The streaming read functionality should provide the length and the type of the object, like the normal read functionality does.
* odb: reject reading and writing null OIDs; Patrick Steinhardt; 2018-01-26; 1; -1/+25
|   The null OID (hash with all zeroes) indicates a missing object in upstream git and is thus not a valid object ID. Add defensive measures to avoid writing such a hash to the object database in the very unlikely case where some data results in the null OID. Furthermore, add shortcuts when reading the null OID from the ODB to avoid ever returning an object when a faulty repository may contain the null OID.
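The read-side shortcut can be pictured with the sketch below. It is a simplified model: `example_oid`, `example_oid_is_zero` and `example_odb_read` are hypothetical, the 20-byte length assumes a SHA-1 sized ID, and `-1` is a placeholder error code rather than the value libgit2 returns.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_OID_RAWSZ 20 /* assumption: SHA-1 sized object IDs */

/* Hypothetical raw object ID. */
typedef struct {
	unsigned char id[EXAMPLE_OID_RAWSZ];
} example_oid;

/* Returns 1 if every byte of the ID is zero, 0 otherwise. */
static int example_oid_is_zero(const example_oid *oid)
{
	size_t i;

	assert(oid != NULL);

	for (i = 0; i < EXAMPLE_OID_RAWSZ; i++) {
		if (oid->id[i] != 0)
			return 0;
	}

	return 1;
}

/* Refuses the null OID up front, before any backend would be consulted. */
static int example_odb_read(const example_oid *oid)
{
	if (example_oid_is_zero(oid))
		return -1; /* placeholder error code */

	/* A real implementation would query the backends here. */
	return 0;
}
```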
* Make sure to always include "common.h" first; Patrick Steinhardt; 2017-07-03; 1; -2/+2
|   Next to including several files, our "common.h" header also declares various macros which are then used throughout the project. As such, we have to make sure to always include this file first in all implementation files. Otherwise, we might encounter problems or even silent behavioural differences due to macros or defines not being defined as they should be. So in fact, our header and implementation files should make sure to always include "common.h" first.
|
|   This commit does so by establishing a common include pattern. Header files inside of "src" will now always include "common.h" as its first other file, separated by a newline from all the other includes to make it stand out as special. There are two cases for the implementation files. If they do have a matching header file, they will always include this one first, leading to "common.h" being transitively included as first file. If they do not have a matching header file, they instead include "common.h" as first file themselves.
|
|   This fixes the outlined problems and will become our standard practice for header and source files inside of the "src/" from now on.
* odb_read_prefix: reset error in backends loop [ethomson/read_prefix]; Edward Thomson; 2017-06-12; 1; -1/+4
|   When looking for an object by prefix, we query all the backends so that we can ensure that there is no ambiguity. We need to reset the `error` value between backends; otherwise the first backend may find an object by prefix, but subsequent backends may not. If we do not reset the `error` value then it will remain at `GIT_ENOTFOUND` and `read_prefix_1` will fail, despite having actually found an object.
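The failure mode can be reproduced with a toy loop over backends. Everything here is hypothetical (`EXAMPLE_ENOTFOUND`, `example_backend_lookup`, `example_read_prefix`); each backend is reduced to a flag saying whether it holds the object, and the ambiguity check is left out.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_ENOTFOUND (-3) /* hypothetical stand-in for GIT_ENOTFOUND */

/* A backend is modelled as a flag: 1 = holds the object, 0 = does not. */
static int example_backend_lookup(int has_object)
{
	return has_object ? 0 : EXAMPLE_ENOTFOUND;
}

static int example_read_prefix(const int *backends, size_t count)
{
	size_t i;
	int error = 0, found = 0;

	assert(backends != NULL);

	for (i = 0; i < count; i++) {
		error = example_backend_lookup(backends[i]);

		if (error == EXAMPLE_ENOTFOUND) {
			/* A miss in one backend is not a failure: reset and go on. */
			error = 0;
			continue;
		}

		if (error)
			return error;

		found = 1;
	}

	return found ? error : EXAMPLE_ENOTFOUND;
}
```

Without the `error = 0` reset, a lookup that succeeds in the first backend and misses in the second would end the loop with `error` still set to the not-found code.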
* odb: fix printf formatter for git_off_t; Patrick Steinhardt; 2017-05-15; 1; -3/+3
|   The fields `declared_size` and `received_bytes` of the `git_odb_stream` are both of type `git_off_t` which is defined as a signed integer. When passing these values to a printf-style string in `git_odb_stream__invalid_length`, though, we format these as PRIuZ, which is unsigned. Fix the issue by using PRIdZ instead, silencing warnings on macOS.
* odb: shut up gcc warnings regarding uninitialized variables; Patrick Steinhardt; 2017-05-15; 1; -2/+2
|   The `error` variable is used as a return value in the out-section of both `odb_read_1` and `read_prefix_1`. While the value will actually always be initialized inside of this section, GCC fails to realize this due to interactions with the `found` variable: if `found` is set, the error will always be initialized. If it is not, we return early without reaching the out-statements. Shut up the warnings by initializing the error variable, even though it is unnecessary.
* odb: verify hashes in read_prefix_1; Patrick Steinhardt; 2017-04-28; 1; -0/+12
|   While the function reading an object from the complete OID already verifies OIDs, we do not yet do so for reading objects from a partial OID. Do so when strict OID verification is enabled.
* odb: improve error handling in read_prefix_1; Patrick Steinhardt; 2017-04-28; 1; -7/+20
|   The read_prefix_1 function has several return statements sprinkled throughout the code. As we have to free memory upon getting an error, the free code has to be repeated at every single return -- which it is not, so we have a memory leak here. Refactor the code to use the typical `goto out` pattern, which will free data when an error has occurred. While we're at it, we can also improve the error message thrown when multiple ambiguous prefixes are found. It will now include the colliding prefixes.
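For readers unfamiliar with the `goto out` pattern, the sketch below shows its general shape with a single cleanup point. The function `example_copy_checked` and its length limit are invented for illustration and have nothing to do with the actual body of `read_prefix_1`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copies `src` into a newly allocated buffer if it is at most `max_len` long.
 * On success the caller owns `*out`; on error nothing is leaked.
 */
static int example_copy_checked(char **out, const char *src, size_t max_len)
{
	int error = 0;
	char *buf = NULL;
	size_t len;

	assert(out != NULL && src != NULL);

	len = strlen(src);

	if ((buf = malloc(len + 1)) == NULL) {
		error = -1;
		goto out;
	}
	memcpy(buf, src, len + 1);

	if (len > max_len) {
		error = -1;
		goto out;
	}

	*out = buf;
	buf = NULL; /* ownership moved to the caller */

out:
	free(buf); /* frees only when an error path left the buffer behind */
	return error;
}
```

Every failure jumps to the same label, so adding another check later cannot forget the cleanup.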
* odb: add option to turn off hash verification; Patrick Steinhardt; 2017-04-28; 1; -5/+9
|   Verifying hashsums of objects we are reading from the ODB may be costly as we have to perform an additional hashsum calculation on the object. Especially when reading large objects, the penalty can be as high as 35%, as can be seen when executing the equivalent of `git cat-file` with and without verification enabled. To mitigate for this, we add a global option for libgit2 which enables the developer to turn off the verification, e.g. when he can be reasonably sure that the objects on disk won't be corrupted.