path: root/src/index.c
Commit message    Author    Age    Files    Lines
* index functions: return an intEdward Thomson2020-01-241-6/+17
| | | | | Stop returning a void for functions, future-proofing them to allow them to fail.
* index: fix resizing index map twice on case-insensitive systemsPatrick Steinhardt2020-01-141-17/+14
| Depending on whether the index map is case-sensitive or insensitive, we need to call either `git_idxmap_icase_resize` or `git_idxmap_resize`. There are multiple locations where we thus use the following pattern:
|
|     if (index->ignore_case && git_idxmap_icase_resize(map, length) < 0)
|         return -1;
|     else if (git_idxmap_resize(map, length) < 0)
|         return -1;
|
| The funny thing is: on case-insensitive systems, we will try to resize the map twice whenever `git_idxmap_icase_resize()` doesn't error. While this will still use the correct hashing function, as both map types use the same one, this bug will at least cause us to resize the map twice in a row.
|
| Fix the issue by introducing a new function `index_map_resize` that handles case-sensitivity, similar to how `index_map_set` and `index_map_delete` do. Convert all call sites where we were previously resizing the map to use that new function.
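The double resize is easy to reproduce outside of libgit2. The following is a minimal sketch, assuming a toy map that only counts how often it was resized; `toy_map`, `resize_buggy` and `resize_fixed` are hypothetical names for illustration and are not the library's code.

```c
#include <stddef.h>

/* Hypothetical stand-in for both map flavours; NOT libgit2's idxmap. */
struct toy_map {
    size_t capacity;
    int resize_calls; /* how many times a resize function ran */
};

static int toy_map_resize(struct toy_map *m, size_t n)
{
    m->capacity = n;
    m->resize_calls++;
    return 0;
}

static int toy_map_icase_resize(struct toy_map *m, size_t n)
{
    m->capacity = n;
    m->resize_calls++;
    return 0;
}

/*
 * The pattern quoted in the commit message. When ignore_case is set and
 * the first resize succeeds, the whole `&&` condition is false, so the
 * `else if` branch runs and resizes a second time.
 */
int resize_buggy(struct toy_map *m, int ignore_case, size_t n)
{
    if (ignore_case && toy_map_icase_resize(m, n) < 0)
        return -1;
    else if (toy_map_resize(m, n) < 0)
        return -1;
    return 0;
}

/* One possible shape of the fix: branch on case-sensitivity first. */
int resize_fixed(struct toy_map *m, int ignore_case, size_t n)
{
    if (ignore_case)
        return toy_map_icase_resize(m, n);
    return toy_map_resize(m, n);
}
```

With `ignore_case` set, `resize_buggy` leaves `resize_calls` at 2 while `resize_fixed` leaves it at 1.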
* index: replace map macros with inline functionsPatrick Steinhardt2020-01-141-43/+34
| | | | | | | | | | | Traditionally, our maps were mostly implemented via macros that had weird call semantics. This shows in our index code, where we have macros that insert into an index map case-sensitively or insensitively, as they still return error codes via an error parameter. This is unwieldy and, most importantly, not necessary anymore, due to the introduction of our high-level map API and removal of macros. Replace them with inlined functions to make code easier to read.
* configuration: cvar -> configmapPatrick Steinhardt2019-07-181-3/+3
| | | | | `cvar` is an unhelpful name. Refactor its usage to `configmap` for more clarity.
* index: safely cast file sizeEdward Thomson2019-06-241-1/+6
|
* index: rename `frombuffer` to `from_buffer`Edward Thomson2019-06-161-1/+10
| | | | | | The majority of functions are named `from_something` (with an underscore) instead of `fromsomething`. Update the index functions for consistency with the rest of the library.
* blob: add underscore to `from` functionsEdward Thomson2019-06-161-1/+1
| | | | | | The majority of functions are named `from_something` (with an underscore) instead of `fromsomething`. Update the blob functions for consistency with the rest of the library.
* idxmap: have `resize` functions return proper error codePatrick Steinhardt2019-02-151-14/+22
| The currently existing functions `git_idxmap_resize` and `git_idxmap_icase_resize` do not return any error codes at all due to their previous implementation making use of a macro. Due to that, it is impossible to see whether the resize operation might have failed due to an out-of-memory situation. Fix this by providing a proper error code. Adjust callers to make use of it.
* idxmap: introduce high-level setter for key/value pairsPatrick Steinhardt2019-02-151-7/+7
| | | | | | | | | | | | Currently, one would use the function `git_idxmap_insert` to insert key/value pairs into a map. This function has historically been a macro, which is why its syntax is kind of weird: instead of returning an error code directly, it instead has to be passed a pointer to where the return value shall be stored. This does not match libgit2's common idiom of directly returning error codes. Introduce a new function `git_idxmap_set`, which takes as parameters the map, key and value and directly returns an error code. Convert all callers of `git_idxmap_insert` to make use of it.
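The difference between the two calling conventions can be sketched with a toy fixed-size map. Everything below (`toy_map`, `toy_map_insert`, `toy_map_set`, the meaning of the `rval` values) is a hypothetical illustration, not the actual `git_idxmap` implementation.

```c
#include <stddef.h>
#include <string.h>

#define TOY_CAP 4

/* Hypothetical fixed-capacity map; NOT libgit2's idxmap. */
struct toy_map {
    const char *keys[TOY_CAP];
    void *values[TOY_CAP];
    size_t count;
};

/*
 * Macro-era shape: nothing is returned, the status is written to *rval.
 * Assumed convention for this sketch: 1 = new key, 0 = replaced, -1 = full.
 */
void toy_map_insert(struct toy_map *m, const char *key, void *value, int *rval)
{
    size_t i;

    for (i = 0; i < m->count; i++) {
        if (strcmp(m->keys[i], key) == 0) {
            m->values[i] = value;
            *rval = 0;
            return;
        }
    }
    if (m->count == TOY_CAP) {
        *rval = -1;
        return;
    }
    m->keys[m->count] = key;
    m->values[m->count] = value;
    m->count++;
    *rval = 1;
}

/* Setter in the common idiom: 0 on success, negative on error. */
int toy_map_set(struct toy_map *m, const char *key, void *value)
{
    int rval;

    toy_map_insert(m, key, value, &rval);
    return rval < 0 ? -1 : 0;
}
```

A caller of `toy_map_set` can write `if (toy_map_set(&m, key, value) < 0) return -1;` without declaring a separate status variable.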
* idxmap: introduce high-level getter for valuesPatrick Steinhardt2019-02-151-9/+10
| The current way of looking up an entry from a map is tightly coupled with the map implementation, as one first has to look up the index of the key and then retrieve the associated value by using the index. As a caller, you usually do not care about any indices at all, though, so this is more complicated than really necessary. Furthermore, it invites errors if the correct error checking sequence is not being followed. Introduce new high-level functions `git_idxmap_get` and `git_idxmap_icase_get` that take a map and a key and return a pointer to the associated value if such a key exists. Otherwise, a `NULL` pointer is returned. Adjust all callers that can trivially be converted.
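As a rough illustration of why the high-level getter is harder to misuse, here is a sketch with a toy array-backed map. `toy_map`, `toy_map_lookup_index` and `toy_map_get` are assumed names, not libgit2 functions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical array-backed map; NOT libgit2's idxmap. */
struct toy_entry {
    const char *key;
    void *value;
};

struct toy_map {
    struct toy_entry entries[8];
    size_t count;
};

/*
 * Low-level step: find the slot of a key. Returns m->count when the key
 * is absent, which the caller must remember to check before indexing.
 */
static size_t toy_map_lookup_index(const struct toy_map *m, const char *key)
{
    size_t i;

    for (i = 0; i < m->count; i++)
        if (strcmp(m->entries[i].key, key) == 0)
            return i;
    return m->count;
}

/* High-level getter: the value for `key`, or NULL if there is none. */
void *toy_map_get(const struct toy_map *m, const char *key)
{
    size_t idx = toy_map_lookup_index(m, key);

    if (idx == m->count)
        return NULL;
    return m->entries[idx].value;
}
```

The "is the slot valid" check now lives in exactly one place instead of at every call site.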
* maps: use uniform lifecycle management functionsPatrick Steinhardt2019-02-151-7/+7
| Currently, the lifecycle functions for maps (allocation, deallocation, resize) are not named in a uniform way and do not have a uniform function signature. Rename the functions to fix that, and stick to libgit2's naming scheme of saying `git_foo_new`. This results in the following new interface for allocation:
|
| - `int git_<t>map_new(git_<t>map **out)` to allocate a new map, returning an error code if we ran out of memory
| - `void git_<t>map_free(git_<t>map *map)` to free a map
| - `void git_<t>map_clear(git_<t>map *map)` to remove all entries from a map
|
| This commit also fixes all existing callers.
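The three signatures above can be mirrored by a small stand-alone sketch. The `toy_map` type and its functions are assumptions made for illustration; only the shape of the interface follows the commit message.

```c
#include <stdlib.h>

/* Hypothetical map type; only the lifecycle signatures matter here. */
struct toy_map {
    size_t count;
};

/* Allocate a new map; returns 0 on success, -1 if out of memory. */
int toy_map_new(struct toy_map **out)
{
    struct toy_map *m = calloc(1, sizeof(*m));

    if (m == NULL)
        return -1;
    *out = m;
    return 0;
}

/* Remove all entries but keep the map itself usable. */
void toy_map_clear(struct toy_map *m)
{
    if (m != NULL)
        m->count = 0;
}

/* Release the map; passing NULL is harmless, as with free(). */
void toy_map_free(struct toy_map *m)
{
    free(m);
}
```

Returning the error code and handing the object back through `out` keeps allocation failures visible to the caller.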
* index: explicitly cast down to a size_tEdward Thomson2019-01-251-1/+1
| | | | | | Quiet down a warning from MSVC about how we're potentially losing data. This cast is safe since we've explicitly tested that `strip_len` <= `last_len`.
* index: preserve extension parsing errorsEtienne Samson2019-01-241-14/+15
| | | | | | | Previously, we would clobber any extension-specific error message with an "extension is truncated" message. This makes `read_extension` correctly preserve those errors, takes responsibility for truncation errors, and adds a new message with the actual extension signature for unsupported mandatory extensions.
* git_error: use new names in internal APIs and usageEdward Thomson2019-01-221-48/+48
| | | | | Move to the `git_error` name in the internal API for error-related functions.
* index: use new enum and structure namesEdward Thomson2018-12-011-47/+47
| | | | Use the new-style index names throughout our own codebase.
* khash: remove intricate knowledge of khash typesPatrick Steinhardt2018-11-281-8/+8
| | | | | | | Instead of using the `khiter_t`, `git_strmap_iter` and `khint_t` types, simply use `size_t` instead. This decouples code from the khash stuff and makes it possible to move the khash includes into the implementation files.
* Merge pull request #4884 from libgit2/ethomson/index_iteratorPatrick Steinhardt2018-11-211-0/+45
|\ | | | | index: introduce git_index_iterator
| * index: introduce git_index_iteratorethomson/index_iteratorEdward Thomson2018-11-141-0/+45
| | | | | | | | | | | | Provide a public git_index_iterator API that is backed by an index snapshot. This allows consumers to provide a stable iteration even while manipulating the index during iteration.
* | Merge pull request #4818 from pks-t/pks/index-collisionPatrick Steinhardt2018-11-131-45/+40
|\ \ | |/ |/| Index collision fixes
| * index: fix adding index entries with conflicting filesPatrick Steinhardt2018-10-191-14/+8
| | When adding an index entry "a/b/c" while an index entry "a/b" already exists, git will happily remove "a/b" and only add the new index entry:
| |
| |     $ git init test
| |     Initialized empty Git repository in /tmp/test.repo/test/.git/
| |     $ touch x
| |     $ git add x
| |     $ rm x
| |     $ mkdir x
| |     $ touch x/y
| |     $ git add x/y
| |     $ git status
| |     A x/y
| |
| | The other way round, adding an index entry "a/b" with an entry "a/b/c" already existing is equivalent, where git will remove "a/b/c" and add "a/b".
| |
| | In contrast, libgit2 will currently fail to add these properly and instead complain about the entry appearing as both a file and a directory. This is a programming error, though: our current code already tries to detect and, in the case of `git_index_add`, to automatically replace such index entries. Funnily enough, we already remove the conflicting index entries, but instead of adding the new entry we then bail out afterwards. This leaves callers with the worst of both worlds: we both remove the old entry but fail to add the new one.
| |
| | The root cause is weird semantics of the `has_file_name` and `has_dir_name` functions. While these functions only sound like they are responsible for detecting such conflicts, they will also already remove them in case their `ok_to_replace` parameter is set. But even if we tell them to replace such entries, they will return an error code.
| |
| | Fix the error by returning success in case the entries have been replaced. Fix an already existing test which tested for wrong behaviour. Note that the test didn't notice that the resulting tree had no entries. Thus it is fine to change existing behaviour here, as the previous result could've led to silently losing data. Also add a new test that verifies behaviour in the reverse conflicting case.
| * index: modernize error handling of `index_insert`Patrick Steinhardt2018-10-191-31/+32
| | The error handling of the function `index_insert` is currently very fragile. Instead of erroring out in case an error has happened, it will instead verify that no error has happened for each statement. This makes adding new code to that function an adventurous task. Improve the situation by converting the function to use our typical `goto out` pattern.
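For readers unfamiliar with the `goto out` idiom, the sketch below shows its general shape on an unrelated toy function. `copy_path` is a hypothetical helper and has nothing to do with the real `index_insert`.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: duplicate a non-empty path into *out.
 * Every failure jumps to a single cleanup label instead of each later
 * statement re-checking whether an earlier one failed.
 */
int copy_path(char **out, const char *path)
{
    char *copy = NULL;
    int error = 0;

    if (path == NULL || *path == '\0') {
        error = -1;
        goto out;
    }

    if ((copy = malloc(strlen(path) + 1)) == NULL) {
        error = -1;
        goto out;
    }

    strcpy(copy, path);
    *out = copy;
    copy = NULL; /* ownership moved to the caller */

out:
    free(copy); /* frees only what was not handed over */
    return error;
}
```

New steps can be added between the checks without touching the cleanup code, which is the property the commit is after.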
* | index: avoid out-of-bounds read when reading reuc entry stagePatrick Steinhardt2018-10-181-1/+1
|/
| We use `git__strtol64` to parse file modes of the index entries, which does not limit the parsed buffer length. As the index can be essentially treated as "untrusted" in that the data stems from the file system, it may be malformed and may not contain terminating `NUL` bytes. This may lead to out-of-bounds reads when trying to parse index entries with such malformed modes. Fix the issue by using `git__strntol64` instead.
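The idea of a length-limited parse can be sketched as follows. `parse_octal_bounded` is a hypothetical function written for this example; it does not reproduce the signature or behaviour of `git__strntol64`.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical bounded parser: reads octal digits from buf but never
 * touches more than len bytes, so a missing NUL terminator is harmless.
 * Returns 0 on success, -1 if no digit was found or the value overflows.
 */
int parse_octal_bounded(unsigned long *out, const char *buf, size_t len)
{
    unsigned long value = 0;
    size_t i;

    for (i = 0; i < len && buf[i] >= '0' && buf[i] <= '7'; i++) {
        if (value > (ULONG_MAX >> 3))
            return -1; /* the next shift would overflow */
        value = (value << 3) | (unsigned long)(buf[i] - '0');
    }

    if (i == 0)
        return -1;

    *out = value;
    return 0;
}
```

The loop condition checks `i < len` before reading `buf[i]`, which is what keeps the read inside the buffer.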
* index: release the snapshot instead of freeing the indexEtienne Samson2018-09-111-1/+1
| | | | | Previously we would assert in index_free because the reader incrementation would not be balanced. Release the snapshot normally, so the variable gets decremented before the index is freed.
* Fix leak in index.cabyss72018-08-161-1/+2
|
* settings: optional unsaved index safetyEdward Thomson2018-06-291-1/+3
| Add the `GIT_OPT_ENABLE_UNSAVED_INDEX_SAFETY` option, which will cause commands that reload the on-disk index to fail if the current `git_index` has changes that have not been saved. This will prevent users from - for example - adding a file to the index then calling a function like `git_checkout` and having that file be silently removed from the index since it was re-read from disk. Now calls that would re-read the index will fail if the index is "dirty", meaning changes have been made to it but have not been written. Users can either `git_index_read` to discard those changes explicitly, or `git_index_write` to write them.
* index: return a unique error code on dirty indexEdward Thomson2018-06-291-1/+1
| When the index is dirty, return GIT_EINDEXDIRTY so that consumers can identify the exact problem programmatically.
* index: commit the changes to the index properlyEdward Thomson2018-06-291-0/+11
| | | | | | | Now that the index has a "dirty" state, where it has changes that have not yet been committed or rolled back, our tests need to be adapted to actually commit or rollback the changes instead of assuming that the index can be operated on in its indeterminate state.
* index: add a dirty bit reflecting unsaved changesEdward Thomson2018-06-291-6/+34
| | | | | | | | | | | Teach the index when it is "dirty", and has unsaved changes. Consider the index dirty whenever a caller has added or removed an entry from the main index, REUC or NAME section, including when the index is completely cleared. Similarly, consider the index _not_ dirty immediately after it is written, or when it is read from the on-disk index. This allows us to ensure that unsaved changes are not lost when we automatically refresh the index.
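The dirty-bit bookkeeping, together with the opt-in reload guard from the entries above, can be modelled in a few lines. The struct, the function names and the error value below are hypothetical; the sketch only mirrors the rules stated in the commit messages.

```c
/* Hypothetical error code; the value is arbitrary for this sketch. */
#define TOY_EINDEXDIRTY (-2)

/* Hypothetical index model; NOT libgit2's git_index. */
struct toy_index {
    int entry_count;
    int dirty;          /* 1 while there are unsaved in-memory changes */
    int unsaved_safety; /* models the opt-in safety setting */
};

/* Adding or clearing entries marks the index dirty. */
void toy_index_add(struct toy_index *idx)
{
    idx->entry_count++;
    idx->dirty = 1;
}

void toy_index_clear(struct toy_index *idx)
{
    idx->entry_count = 0;
    idx->dirty = 1;
}

/* Writing makes memory and disk agree again. */
int toy_index_write(struct toy_index *idx)
{
    idx->dirty = 0;
    return 0;
}

/* An automatic reload refuses to throw away unsaved changes. */
int toy_index_reload(struct toy_index *idx, int on_disk_count)
{
    if (idx->unsaved_safety && idx->dirty)
        return TOY_EINDEXDIRTY;
    idx->entry_count = on_disk_count;
    idx->dirty = 0;
    return 0;
}
```

After an add, a reload fails with the dedicated error code until the caller either writes the index or explicitly discards the changes.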
* Convert usage of `git_buf_free` to new `git_buf_dispose`Patrick Steinhardt2018-06-101-9/+9
|
* index: Fix alignment issues in write_disk_entry()John Paul Adrian Glaubitz2018-06-011-21/+21
| | | | | | In order to avoid alignment issues on certain target architectures, it is necessary to use memcpy() when modifying elements of a struct inside a buffer returned by git_filebuf_reserve().
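A minimal sketch of the technique, assuming a plain byte buffer rather than the real `git_filebuf_reserve()` result: `store_u32` and `load_u32` are hypothetical helpers that go through `memcpy()` instead of casting the buffer pointer to a wider type.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helpers. Writing `*(uint32_t *)(buf + offset) = value`
 * is undefined behaviour when buf + offset is not suitably aligned and
 * can trap on strict-alignment architectures; memcpy() has no such
 * requirement.
 */
void store_u32(unsigned char *buf, size_t offset, uint32_t value)
{
    memcpy(buf + offset, &value, sizeof(value));
}

uint32_t load_u32(const unsigned char *buf, size_t offset)
{
    uint32_t value;

    memcpy(&value, buf + offset, sizeof(value));
    return value;
}
```

Compilers typically turn these small fixed-size copies into ordinary loads and stores where the target allows it, so the safe form usually costs nothing.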
* path: reject .gitmodules as a symlinkCarlos Martín Nieto2018-05-231-4/+5
| | | | | | | | Any part of the library which asks the question can pass in the mode to have it checked against `.gitmodules` being a symlink. This is particularly relevant for adding entries to the index from the worktree and for checking out files.
* index: stat before creating the entryCarlos Martín Nieto2018-05-231-7/+30
| | | | | This is so we have it available for the path validity checking. In a later commit we will start rejecting `.gitmodules` files as symlinks.
* index: error out on unreasonable prefix-compressed path lengthsPatrick Steinhardt2018-03-101-0/+4
| When computing the complete path length from the encoded prefix-compressed path, we end up just allocating the complete path without ever checking what the encoded path length actually is. This can easily lead to a denial of service by just encoding an unreasonably long path name inside of the index. Git already enforces a maximum path length of 4096 bytes. As we also have that enforcement ready in some places, just make sure that the resulting path is smaller than GIT_PATH_MAX.
|
| Reported-by: Krishna Ram Prakash R <krp@gtux.in>
| Reported-by: Vivek Parikh <viv0411.parikh@gmail.com>
* index: fix out-of-bounds read with invalid index entry prefix lengthPatrick Steinhardt2018-03-101-9/+10
| The index format in version 4 has prefix-compressed entries, where every index entry can compress its path by using a path prefix of the previous entry. Since implementing support for this index format version in commit 5625d86b9 (index: support index v4, 2016-05-17), though, we do not correctly verify that the prefix length that we want to reuse is actually smaller than or equal to the length of the previous index entry's path. This can lead to an integer underflow and subsequently to an out-of-bounds read. Fix this by verifying that the prefix is actually no longer than the previous entry's path.
|
| Reported-by: Krishna Ram Prakash R <krp@gtux.in>
| Reported-by: Vivek Parikh <viv0411.parikh@gmail.com>
* index: convert `read_entry` to return entry size via an out-paramPatrick Steinhardt2018-03-101-9/+13
| The function `read_entry` does not conform to our usual coding style of returning stuff via the out parameter and to use the return value for reporting errors. Due to most of our code conforming to that pattern, it has become quite natural for us to actually return `-1` in case there is any error, which has also slipped in with commit 5625d86b9 (index: support index v4, 2016-05-17). As the function returns a `size_t` only, though, the return value is wrapped around, causing the caller of `read_tree` to continue with an invalid index entry. Ultimately, this can lead to a double-free. Improve code and fix the bug by converting the function to return the index entry size via an out parameter and only using the return value to indicate errors.
|
| Reported-by: Krishna Ram Prakash R <krp@gtux.in>
| Reported-by: Vivek Parikh <viv0411.parikh@gmail.com>
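The wrap-around is a plain consequence of C's conversion rules and can be shown in isolation. Both functions below are hypothetical reductions; they only check a minimum length and are not the real `read_entry`.

```c
#include <stddef.h>

/*
 * Old shape (hypothetical): the size is the return value. Returning -1
 * from a function typed size_t converts it to SIZE_MAX, which a caller
 * cannot tell apart from a (huge) valid size.
 */
size_t read_entry_old(const unsigned char *buf, size_t len)
{
    (void)buf;
    if (len < 4)
        return -1; /* silently becomes SIZE_MAX */
    return 4;
}

/*
 * New shape (hypothetical): the return value carries only the error,
 * the size travels through an out parameter.
 */
int read_entry_new(size_t *out_size, const unsigned char *buf, size_t len)
{
    (void)buf;
    if (len < 4)
        return -1;
    *out_size = 4;
    return 0;
}
```

With the second form, the usual `if (read_entry_new(&size, buf, len) < 0)` check works as expected.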
* Merge pull request #4529 from libgit2/ethomson/index_add_requires_filesEdward Thomson2018-02-181-5/+9
|\ | | | | git_index_add_frombuffer: only accept files/links
| * git_index_add_frombuffer: only accept files/linksethomson/index_add_requires_filesEdward Thomson2018-02-181-5/+9
| | | | | | | | | | | | | | Ensure that the buffer given to `git_index_add_frombuffer` represents a regular blob, an executable blob, or a link. Explicitly reject commit entries (submodules) - it makes little sense to allow users to add a submodule from a string; there's no possible path to success.
* | index: shut up warning on uninitialized variablePatrick Steinhardt2018-02-161-1/+1
|/ | | | | | | | Even though the `entry` variable will always be initialized when `read_entry` returns success and even though we never dereference `entry` in case `read_entry` fails, GCC prints a warning about uninitialized use. Just initialize the pointer to `NULL` in order to shut GCC up.
* Make sure to always include "common.h" firstPatrick Steinhardt2017-07-031-2/+2
| | | | | | | | | | | | | | | | | | | | | | Next to including several files, our "common.h" header also declares various macros which are then used throughout the project. As such, we have to make sure to always include this file first in all implementation files. Otherwise, we might encounter problems or even silent behavioural differences due to macros or defines not being defined as they should be. So in fact, our header and implementation files should make sure to always include "common.h" first. This commit does so by establishing a common include pattern. Header files inside of "src" will now always include "common.h" as its first other file, separated by a newline from all the other includes to make it stand out as special. There are two cases for the implementation files. If they do have a matching header file, they will always include this one first, leading to "common.h" being transitively included as first file. If they do not have a matching header file, they instead include "common.h" as first file themselves. This fixes the outlined problems and will become our standard practice for header and source files inside of the "src/" from now on.
* index: verify we have enough space left when writing index entriesPatrick Steinhardt2017-06-061-4/+23
| | | | | | | | | | In our code writing index entries, we carry around a `disk_size` representing how much memory we have in total and pass this value to `git_encode_varint` to do bounds checks. This does not make much sense, as at the time when passing on this variable it is already out of date. Fix this by subtracting used memory from `disk_size` as we go along. Furthermore, assert we've actually got enough space left to do the final path memcpy.
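The "subtract as we go" bookkeeping might look roughly like this. `put_bytes` is a hypothetical helper, and the sketch leaves out the varint encoding itself; it only shows the remaining-space counter staying in sync with the write cursor.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical writer: copies len bytes to *cursor if they fit into
 * *remaining, then advances the cursor and shrinks the remaining space.
 * Returns 0 on success, -1 if the data would not fit.
 */
int put_bytes(unsigned char **cursor, size_t *remaining,
              const void *src, size_t len)
{
    if (len > *remaining)
        return -1;

    memcpy(*cursor, src, len);
    *cursor += len;
    *remaining -= len; /* keeps the bound accurate for the next write */
    return 0;
}
```

Because every write updates `remaining`, a later bounds check compares against the space that is actually left rather than the total size of the buffer.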
* index: fix shared prefix computation when writing index entryPatrick Steinhardt2017-06-061-2/+1
| | | | | | | | | | | When using compressed index entries, each entry's path is preceded by a varint encoding how long the shared prefix with the previous index entry actually is. We currently encode a length of `(path_len - same_len)`, which is doubly wrong. First, `path_len` is already set to `path_len - same_len` previously. Second, we want to encode the shared prefix rather than the un-shared suffix length. Fix this by using `same_len` as the varint value instead.
* index: also sanity check entry size with compressed entriesPatrick Steinhardt2017-06-061-4/+3
| | | | | | | We have a check in place whether the index has enough data left for the required footer after reading an index entry, but this was only used for uncompressed entries. Move the check down a bit so that it is executed for both compressed and uncompressed index entries.
* index: remove file-scope entry size macrosPatrick Steinhardt2017-06-061-6/+4
| | | | | | | All index entry size computations are now performed in `index_entry_size`. As such, we do not need the file-scope macros for computing these sizes anymore. Remove them and move the `entry_size` macro into the `index_entry_size` function.
* index: don't right-pad paths when writing compressed entriesPatrick Steinhardt2017-06-061-4/+3
| | | | | | | | | | Our code to write index entries to disk does not check whether the entry that is to be written should use prefix compression for the path. As such, we were overallocating memory and added bogus right-padding into the resulting index entries. As there is no padding allowed in the index version 4 format, this should actually result in an invalid index. Fix this by re-using the newly extracted `index_entry_size` function.
* index: move index entry size computation into its own functionPatrick Steinhardt2017-06-061-5/+17
| | | | | | | Create a new function `index_entry_size` which encapsulates the logic to calculate how much space is needed for an index entry, whether it is simple/extended or compressed/uncompressed. This can later be re-used by our code writing index entries.
* index: set last written index entry in foreach-entry-loopPatrick Steinhardt2017-06-061-7/+8
| The last written disk entry is currently being written inside of the function `write_disk_entry`. Make behavior a bit more obvious by instead setting it inside of `write_entries` while iterating all entries.
* index: set last entry when reading compressed entriesPatrick Steinhardt2017-06-061-4/+7
| | | | | | | | | | | To calculate the path of a compressed index entry, we need to know the preceding entry's path. While we do actually set the first predecessor correctly to "", we fail to update this while reading the entries. Fix the issue by updating `last` inside of the loop. Previously, we've been passing a double-pointer to `read_entry`, which it didn't update. As it is more obvious to update the pointer inside the loop itself, though, we can simply convert it to a normal pointer.
* index: fix confusion with shared prefix in compressed path namesPatrick Steinhardt2017-06-061-9/+12
| The index version 4 introduced compressed path names for the entries. From the git.git index-format documentation:
|
|     At the beginning of an entry, an integer N in the variable width encoding [...] is stored, followed by a NUL-terminated string S. Removing N bytes from the end of the path name for the previous entry, and replacing it with the string S yields the path name for this entry.
|
| But instead of stripping N bytes from the previous path's string and using the remaining prefix, we were instead simply concatenating the previous path with the current entry path, which is obviously wrong. Fix the issue by correctly copying only the part of the previous entry's path that remains after stripping N bytes and concatenating the result with our current entry's path.
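The decoding rule quoted from the index-format documentation can be written down directly. `decode_path` is a hypothetical function for illustration; it also includes a check that N does not exceed the previous path's length and that the result fits the caller's buffer, both of which are assumptions of this sketch rather than a copy of libgit2's code.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical decoder for one prefix-compressed path.
 *   last      path of the previous entry ("" for the first entry)
 *   strip_len N, the number of bytes to remove from the end of `last`
 *   suffix    S, the NUL-terminated string stored in the entry
 * Returns 0 on success, -1 if N is too large or the result does not fit.
 */
int decode_path(char *out, size_t out_size,
                const char *last, size_t strip_len, const char *suffix)
{
    size_t last_len = strlen(last);
    size_t suffix_len = strlen(suffix);
    size_t prefix_len;

    if (strip_len > last_len)
        return -1; /* would underflow the subtraction below */

    prefix_len = last_len - strip_len;
    if (prefix_len + suffix_len + 1 > out_size)
        return -1;

    memcpy(out, last, prefix_len);                    /* kept prefix */
    memcpy(out + prefix_len, suffix, suffix_len + 1); /* S plus its NUL */
    return 0;
}
```

For example, with a previous path of `src/index.c`, N = 7 and S = `indexer.c`, the kept prefix is `src/` and the decoded path is `src/indexer.c`.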
* idxmap: remove GIT__USE_IDXMAPPatrick Steinhardt2017-02-171-3/+0
|
* khash: avoid using `kh_resize` directlyPatrick Steinhardt2017-02-171-6/+6
|