path: root/src/blob.c
Commit message  Author  Age  Files  Lines
* blob: identify binary content  [ethomson/blob_data_is_binary]  Edward Thomson  2021-12-10  1  -0/+9
| | | | | Introduce `git_blob_data_is_binary` to examine a blob's data, instead of the blob itself. A replacement for `git_buf_is_binary`.
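A rough sketch of what examining raw data (rather than a blob object) for binary content can look like. It is deliberately simplified to a NUL-byte scan over a bounded prefix; the name `data_looks_binary` and the 8000-byte window are assumptions for illustration and are not taken from libgit2.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed inspection window; the real limit in libgit2 may differ. */
#define SNIFF_LIMIT 8000

/* Hypothetical helper: returns 1 if a NUL byte appears in the first
 * SNIFF_LIMIT bytes of `data`, 0 otherwise. */
static int data_looks_binary(const char *data, size_t len)
{
	size_t i, n = len < SNIFF_LIMIT ? len : SNIFF_LIMIT;

	assert(data != NULL || len == 0);

	for (i = 0; i < n; i++) {
		if (data[i] == '\0')
			return 1;
	}
	return 0;
}
```

Because the function takes a pointer and a length, it can be applied to any buffer, not only to data that already lives in a blob.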
* path: separate git-specific path functions from util  Edward Thomson  2021-11-09  1  -2/+2
| | | | | | Introduce `git_fs_path`, which operates on generic filesystem paths. `git_path` will be kept for only git-specific path functionality (for example, checking for `.git` in a path).
* str: introduce `git_str` for internal, `git_buf` is external  [ethomson/gitstr]  Edward Thomson  2021-10-17  1  -24/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | libgit2 has two distinct requirements that were previously solved by `git_buf`. We require: 1. A general purpose string class that provides a number of utility APIs for manipulating data (eg, concatenating, truncating, etc). 2. A structure that we can use to return strings to callers that they can take ownership of. By using a single class (`git_buf`) for both of these purposes, we have confused the API to the point that refactorings are difficult and reasoning about correctness is also difficult. Move the utility class `git_buf` to be called `git_str`: this represents its general purpose, as an internal string buffer class. The name also is an homage to Junio Hamano ("gitstr"). The public API remains `git_buf`, and has a much smaller footprint. It is generally only used as an "out" param with strict requirements that follow the documentation. (Exceptions exist for some legacy APIs to avoid breaking callers unnecessarily.) Utility functions exist to convert a user-specified `git_buf` to a `git_str` so that we can call internal functions, then converting it back again.
* blob: improve `create_from_disk` attribute lookups  Edward Thomson  2021-09-25  1  -5/+4
| | | | | | Resolve absolute paths to be working directory relative when looking up attributes. Importantly, now we will _never_ pass an absolute path down to attribute lookup functions.
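The idea of turning an absolute path into a working-directory-relative one before attribute lookup can be sketched as follows. This is a minimal illustration, not the libgit2 implementation: `workdir_relative` is a hypothetical helper, and it assumes the working directory path ends in `/`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a libgit2 function. Assumes `workdir` is an
 * absolute path that ends in '/'. Returns the part of `path` below the
 * working directory, or NULL if `path` is not inside it. */
static const char *workdir_relative(const char *workdir, const char *path)
{
	size_t n;

	assert(workdir != NULL && path != NULL);

	n = strlen(workdir);
	if (n == 0 || workdir[n - 1] != '/')
		return NULL; /* violates the stated assumption */
	if (strncmp(path, workdir, n) != 0)
		return NULL; /* outside the working directory */
	return path + n;
}
```

A caller would pass only the returned relative path to the attribute lookup and treat `NULL` as an error, so no absolute path reaches that layer.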
* filter: use a `git_oid` in filter options, not a pointer  [ethomson/filter_commit_id]  Edward Thomson  2021-09-21  1  -1/+7
| | | | | | | Using a `git_oid *` in filter options was a mistake; it is a deviation from our typical pattern, and callers in some languages that GC may need very special treatment in order to pass both an options structure and a pointer outside of it.
* If longpaths is true and filters are enabled, pass git_repository through ↵  Laurence McGlashan  2021-09-14  1  -3/+4
| | | | | | the filtering code to ensure the cached longpath setting is returned. Fixes: #6054
* filter: introduce GIT_BLOB_FILTER_ATTRIBUTES_FROM_COMMIT  Edward Thomson  2021-07-22  1  -5/+10
| | | | | Provide a mechanism to filter using attribute data from a specific commit (making use of `GIT_ATTR_CHECK_INCLUDE_COMMIT`).
* buf: remove internal `git_buf_text` namespace  Edward Thomson  2021-05-11  1  -2/+1
| | | | | The `git_buf_text` namespace is unnecessary and strange. Remove it, just keep the functions prefixed with `git_buf`.
* use git_repository_workdir_path to generate paths  Edward Thomson  2021-04-28  1  -5/+1
| | | | | Use `git_repository_workdir_path` to generate workdir paths since it will validate the length.
* Merge pull request #5760 from libgit2/ethomson/tttoo_many_ttts  Edward Thomson  2021-01-07  1  -1/+1
|\ | | | | blob: fix name of `GIT_BLOB_FILTER_ATTRIBUTES_FROM_HEAD`
| * blob: fix name of `GIT_BLOB_FILTER_ATTRIBUTES_FROM_HEAD`  [ethomson/tttoo_many_ttts]  Edward Thomson  2021-01-05  1  -1/+1
| | | | | | | | | | | | `GIT_BLOB_FILTER_ATTTRIBUTES_FROM_HEAD` is misspelled, it should be `GIT_BLOB_FILTER_ATTRIBUTES_FROM_HEAD`, and it would be if it were not for the MacBook Pro keyboard and my inattentiveness.
* | blob: add git_blob_filter_options_init  Edward Thomson  2021-01-05  1  -0/+9
|/ | | | | | The `git_blob_filter_options_init` function should be included, to allow callers in FFI environments to let us initialize an options structure for them.
* buffer: git_buf_sanitize should return a value  Edward Thomson  2020-11-25  1  -2/+3
| | | | | | `git_buf_sanitize` is called with user-input, and wants to sanity-check that input. Allow it to return a value if the input was malformed in a way that we cannot cope.
* blob: use GIT_ASSERT  Edward Thomson  2020-11-25  1  -9/+19
|
* tree-wide: do not compile deprecated functions with hard deprecation  Patrick Steinhardt  2020-06-09  1  -0/+2
| | | | | | | | | | | | | | | | | | | | | | When compiling libgit2 with -DDEPRECATE_HARD, we add a preprocessor definition `GIT_DEPRECATE_HARD` which causes the "git2/deprecated.h" header to be empty. As a result, no function declarations are made available to callers, but the implementations are still available to link against. This has the problem that function declarations also aren't visible to the implementations, meaning that the symbol's visibility will not be set up correctly. As a result, the resulting library may not expose those deprecated symbols at all on some platforms and thus cause linking errors. Fix the issue by conditionally compiling deprecated functions, only. While it becomes impossible to link against such a library in case one uses deprecated functions, distributors of libgit2 aren't expected to pass -DDEPRECATE_HARD anyway. Instead, users of libgit2 should manually define GIT_DEPRECATE_HARD to hide deprecated functions. Using "real" hard deprecation still makes sense in the context of CI to test we don't use deprecated symbols ourselves and in case a dependant uses libgit2 in a vendored way and knows it won't ever use any of the deprecated symbols anyway.
* blob: use `git_object_size_t` for object size  Edward Thomson  2019-11-22  1  -8/+8
| | | | | Instead of using a signed type (`off_t`) use a new `git_object_size_t` for the sizes of objects.
* blob: optionally read attributes from repository  Edward Thomson  2019-08-11  1  -0/+3
| | | | | | | When `GIT_BLOB_FILTER_ATTTRIBUTES_FROM_HEAD` is passed to `git_blob_filter`, read attributes from `gitattributes` files that are checked in to the repository at the HEAD revision. This passes the flag `GIT_FILTER_ATTRIBUTES_FROM_HEAD` to the filter functions.
* blob: allow blob filtering to ignore system gitattributes  Edward Thomson  2019-08-11  1  -0/+3
| | | | | | | | Introduce `GIT_BLOB_FILTER_NO_SYSTEM_ATTRIBUTES`, which tells `git_blob_filter` to ignore the system-wide attributes file, usually `/etc/gitattributes`. This simply passes the appropriate flag to the attribute loading code.
* blob: deprecate `git_blob_filtered_content`  Edward Thomson  2019-08-11  1  -16/+16
| | | | Users should now use `git_blob_filter`.
* blob: introduce git_blob_filter  Edward Thomson  2019-08-11  1  -4/+29
| | | | | Provide a function to filter blobs that allows for more functionality than the existing `git_blob_filtered_content` function.
* blob: add underscore to `from` functions  Edward Thomson  2019-06-16  1  -5/+38
| | | | | | The majority of functions are named `from_something` (with an underscore) instead of `fromsomething`. Update the blob functions for consistency with the rest of the library.
* blob: validate that blob sizes fit in a size_t  Edward Thomson  2019-01-25  1  -6/+8
| | | | | | Our blob size is a `git_off_t`, which is a signed 64 bit int. This may be erroneously negative or larger than `SIZE_MAX`. Ensure that the blob size fits into a `size_t` before casting.
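The check described above can be sketched as a range test performed before the cast. The helper name `blob_size_to_size_t` is hypothetical, and `int64_t` stands in for `git_off_t` as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 and stores the converted value in *out
 * when a signed 64-bit size is representable as a size_t; returns 0 and
 * leaves *out untouched otherwise. */
static int blob_size_to_size_t(size_t *out, int64_t size)
{
	assert(out != NULL);

	if (size < 0)
		return 0; /* erroneously negative */
	if ((uint64_t)size > (uint64_t)SIZE_MAX)
		return 0; /* too large, e.g. on a 32-bit platform */

	*out = (size_t)size;
	return 1;
}
```

On platforms where `size_t` is 64 bits wide the second test can never fail, but keeping it makes the conversion safe where `size_t` is narrower.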
* git_error: use new names in internal APIs and usage  Edward Thomson  2019-01-22  1  -6/+6
| | | | | Move to the `git_error` name in the internal API for error-related functions.
* object_type: use new enumeration names  [ethomson/index_fixes]  Edward Thomson  2018-12-01  1  -4/+4
| | | | Use the new object_type enumeration names within the codebase.
* blob: implement function to parse raw data  Patrick Steinhardt  2018-06-22  1  -6/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, parsing objects is strictly tied to having an ODB object available. This makes it hard to parse an object when all that is available is its raw object and size. Furthermore, hacking around that limitation by directly creating an ODB structure either on stack or on heap does not really work that well due to ODB objects being reference counted and then automatically free'd when reaching a reference count of zero. In some occasions parsing raw objects without touching the ODB is actually required, though. One use case is for example object verification, where we want to assure that an object is valid before inserting it into the ODB or writing it into the git repository. As a first step towards that, introduce a distinction between raw and ODB objects for blobs. Creation of ODB objects stays the same by simply using `git_blob__parse`, but a new function `git_blob__parse_raw` has been added that creates a blob from a pair of data and size. By setting a new flag inside of the blob, we can now distinguish whether it is a raw or ODB object and treat it accordingly in several places. Note that the blob data passed in is not being copied. Because of that, callers need to make sure to keep it alive during the blob's life time. This is being used to avoid unnecessarily increasing the memory footprint when parsing largish blobs.
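The raw/ODB distinction and the getters that hide it can be sketched as a flag plus a union. Everything here is a simplified assumption: the struct layouts and the unprefixed function names are hypothetical stand-ins and do not reproduce libgit2's internal definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for an object owned by the object database. */
struct odb_object {
	const void *data;
	size_t len;
};

/* Hypothetical blob layout: `is_raw` selects the valid union member. */
struct blob {
	unsigned int is_raw;
	union {
		const struct odb_object *odb;
		struct {
			const void *data;
			size_t size;
		} raw;
	} u;
};

/* The data pointer is borrowed, not copied: the caller keeps it alive. */
static void blob_parse_raw(struct blob *b, const void *data, size_t size)
{
	assert(b != NULL);
	b->is_raw = 1;
	b->u.raw.data = data;
	b->u.raw.size = size;
}

static void blob_parse_odb(struct blob *b, const struct odb_object *obj)
{
	assert(b != NULL && obj != NULL);
	b->is_raw = 0;
	b->u.odb = obj;
}

/* Getters hide the distinction from the rest of the code. */
static const void *blob_rawcontent(const struct blob *b)
{
	return b->is_raw ? b->u.raw.data : b->u.odb->data;
}

static size_t blob_rawsize(const struct blob *b)
{
	return b->is_raw ? b->u.raw.size : b->u.odb->len;
}
```

Code that only goes through the two getters does not need to know which kind of blob it holds, which is why direct field accesses were replaced by getter calls first.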
* blob: use getters to get raw blob content and size  Patrick Steinhardt  2018-06-22  1  -4/+4
| | | | | | | | | Going forward, we will have to change how blob sizes are calculated based on whether the blob is a cached object that is part of the ODB or not. In order to not have to distinguish between those two object types repeatedly when accessing the blob's data or size, encapsulate all existing direct uses of those fields by instead using `git_blob_rawcontent` and `git_blob_rawsize`.
* Convert usage of `git_buf_free` to new `git_buf_dispose`  Patrick Steinhardt  2018-06-10  1  -5/+5
|
* Make sure to always include "common.h" first  Patrick Steinhardt  2017-07-03  1  -2/+2
| | | | | | | | | | | | | | | | | | | | | | Next to including several files, our "common.h" header also declares various macros which are then used throughout the project. As such, we have to make sure to always include this file first in all implementation files. Otherwise, we might encounter problems or even silent behavioural differences due to macros or defines not being defined as they should be. So in fact, our header and implementation files should make sure to always include "common.h" first. This commit does so by establishing a common include pattern. Header files inside of "src" will now always include "common.h" as its first other file, separated by a newline from all the other includes to make it stand out as special. There are two cases for the implementation files. If they do have a matching header file, they will always include this one first, leading to "common.h" being transitively included as first file. If they do not have a matching header file, they instead include "common.h" as first file themselves. This fixes the outlined problems and will become our standard practice for header and source files inside of the "src/" from now on.
* repository: use `git_repository_item_path`  Patrick Steinhardt  2017-02-13  1  -2/+2
| | | | | | | | | | | | | | The recent introduction of the commondir variable of a repository requires callers to distinguish whether their files are part of the dot-git directory or the common directory shared between multiple worktrees. In order to take the burden from callers and unify knowledge on which files reside where, the `git_repository_item_path` function has been introduced which encapsulates this knowledge. Modify most existing callers of `git_repository_path` to use `git_repository_item_path` instead, thus making them implicitly aware of the common directory.
* giterr_set: consistent error messages  Edward Thomson  2016-12-29  1  -3/+3
| | | | | | | | Error messages should be sentence fragments, and therefore: 1. Should not begin with a capital letter, 2. Should not conclude with punctuation, and 3. Should not end a sentence and begin a new one
* blob: remove _fromchunks()  [cmn/createblob-stream]  Carlos Martín Nieto  2016-03-22  1  -60/+0
| | | | | | The callback mechanism makes it awkward to write data from an IO source; move to `_fromstream()` which lets the caller remain in control, in the same vein as we prefer iterators over foreach callbacks.
* blob: introduce creating a blob by writing into a stream  Carlos Martín Nieto  2016-03-22  1  -0/+92
| | | | | | | | | | | | | | | | | | | | | The pair of `git_blob_create_frombuffer()` and `git_blob_create_frombuffer_commit()` is meant to replace `git_blob_create_fromchunks()` by providing a way for a user to write a new blob when they want filtering or they do not know the size. This approach allows the caller to retain control over when to add data to this buffer and a more natural fit into higher-level language's own stream abstractions instead of having to handle IO wait in the callback. The in-memory buffer size of 2MB is chosen somewhat arbitrarily to be a round multiple of usual page sizes and a value where most blobs seem likely to be either going to be way below or way over that size. It's also a round number of pages. This implementation re-uses the helper we have from `_fromchunks()` so we end up writing everything to disk, but hopefully more efficiently than with a default filebuf. A later optimisation can be to avoid writing the in-memory contents to disk, with some extra complexity.
* blob: fail to create a blob from a dir with EDIRECTORY  Carlos Martín Nieto  2015-07-12  1  -0/+6
| | | | | This also affects `git_index_add_bypath()` by providing a better error message and a specific error code when a directory is passed.
* odb: make the writestream's size a git_off_t  [cmn/stream-size]  Carlos Martín Nieto  2015-05-13  1  -2/+3
| | | | | | | | | | Restricting files to size_t is a silly limitation. The loose backend writes to a file directly, so there is no issue in using 63 bits for the size. We still assume that the header is going to fit in 64 bytes, which does mean quite a bit smaller files due to the run-length encoding, but it's still a much larger size than you would want Git to handle.
* centralizing all IO buffer size values  J Wyman  2015-05-11  1  -1/+1
|
* git_filter_opt_t -> git_filter_flag_t  Edward Thomson  2015-02-19  1  -2/+2
| | | | | For consistency with the rest of the library, where an opt is an options *structure*.
* buffer: introduce git_buf_attach_notowned  Edward Thomson  2015-02-19  1  -6/+4
| | | | | | Provide a convenience function that creates a buffer that can be provided to callers but will not be freed via `git_buf_free`, so the buffer creator maintains the allocation lifecycle of the buffer's contents.
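The ownership rule described above, where the buffer's creator keeps control of the allocation, can be illustrated with a small sketch. The `struct buf` layout and function names are hypothetical; using an allocated size of zero to mark borrowed data is an assumption made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical buffer: asize == 0 with a non-NULL ptr means "borrowed". */
struct buf {
	char *ptr;
	size_t asize; /* bytes allocated by the buffer itself */
	size_t size;  /* bytes in use */
};

static void buf_attach_notowned(struct buf *b, const char *data, size_t size)
{
	assert(b != NULL);
	b->ptr = (char *)data; /* borrowed; never freed by buf_dispose */
	b->asize = 0;
	b->size = size;
}

static void buf_dispose(struct buf *b)
{
	assert(b != NULL);
	if (b->asize > 0)
		free(b->ptr); /* only free memory the buffer allocated */
	b->ptr = NULL;
	b->asize = 0;
	b->size = 0;
}
```

A caller can dispose of such a buffer unconditionally; the borrowed data stays valid because the dispose function never frees it.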
* Increase binary detection len to 8k  Russell Belfer  2014-05-16  1  -1/+2
|
* Add filter options and ALLOW_UNSAFE  Russell Belfer  2014-05-06  1  -2/+4
| | | | | | | | | Diff and status do not want core.safecrlf to actually raise an error regardless of the setting, so this extends the filter API with an additional options flags parameter and adds a flag so that filters can be applied with GIT_FILTER_OPT_ALLOW_UNSAFE, indicating that unsafe filter application should be downgraded from a failure to a warning.
* Const correctness!  Jacques Germishuys  2014-04-03  1  -1/+1
|
* Some missing oid to id renames  Russell Belfer  2014-01-30  1  -19/+22
|
* Handle git_buf's from users more liberally  Edward Thomson  2014-01-08  1  -0/+2
|
* Update git_blob_create_fromchunks callback behavr  Russell Belfer  2013-12-11  1  -13/+21
| | | | | | | The callback to supply data chunks could return a negative value to stop creation of the blob, but we were neither using GIT_EUSER nor propagating the return value. This makes things use the new behavior of returning the negative value back to the user.
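The behaviour described above, where a negative callback result stops the operation and is handed back to the caller unchanged, can be sketched like this. The callback type, the driver `read_chunks`, and both example callbacks are hypothetical and only illustrate the propagation rule.

```c
#include <assert.h>
#include <stddef.h>

/* Returns bytes written into buf (> 0), 0 at end of data, < 0 to abort. */
typedef int (*chunk_cb)(char *buf, size_t max, void *payload);

/* Hypothetical driver: counts the bytes supplied by the callback. */
static int read_chunks(chunk_cb cb, void *payload, size_t *total)
{
	char buf[64];
	int n;

	assert(cb != NULL && total != NULL);
	*total = 0;

	while ((n = cb(buf, sizeof(buf), payload)) > 0) {
		if ((size_t)n > sizeof(buf))
			return -1; /* callback claimed more than it was given */
		*total += (size_t)n;
	}
	return n; /* 0 on success, or the callback's own negative value */
}

/* Example callback; the payload is an int counting remaining chunks. */
static int three_byte_chunks(char *buf, size_t max, void *payload)
{
	int *remaining = payload;

	if (*remaining == 0 || max < 3)
		return 0;
	buf[0] = buf[1] = buf[2] = 'x';
	(*remaining)--;
	return 3;
}

/* Example callback that aborts immediately with a user-chosen code. */
static int aborting_source(char *buf, size_t max, void *payload)
{
	(void)buf;
	(void)max;
	(void)payload;
	return -7;
}
```

Returning the callback's value as-is lets the user tell their own abort codes apart from errors raised by the library.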
* move mode_t to filebuf_open instead of _commit  Edward Thomson  2013-11-04  1  -1/+1
|
* Merge git_buf and git_buffer  Russell Belfer  2013-09-17  1  -3/+3
| | | | | | | | | | | This makes the git_buf struct that was used internally into an externally available structure and eliminates the git_buffer. As part of that, some of the special cases that arose with the externally used git_buffer were blended into the git_buf, such as being careful about git_buf objects that may have a NULL ptr and allowing for bufs with a valid ptr and size but zero asize as a way of referring to externally owned data.
* Add ident filter  Russell Belfer  2013-09-17  1  -4/+4
| | | | | | | This adds the ident filter (that knows how to replace $Id$) and tweaks the filter APIs and code so that git_filter_source objects actually have the updated OID of the object being filtered when it is a known value.
* Extend public filter api with filter lists  Russell Belfer  2013-09-17  1  -45/+12
| | | | | | | | | | | This moves the git_filter_list into the public API so that users can create, apply, and dispose of filter lists. This allows more granular application of filters to user data outside of libgit2 internals. This also converts all the internal usage of filters to the public APIs along with a few small tweaks to make it easier to use the public git_buffer stuff alongside the internal git_buf.
* Create public filter object and use it  Russell Belfer  2013-09-17  1  -26/+22
| | | | | | | This creates include/sys/filter.h with a basic definition of a git_filter and then converts the internal code to use it. There are related internal objects (git_filter_list) that we will want to publish at some point, but this is a first step.
* Start of filter API + git_blob_filtered_content  Russell Belfer  2013-09-17  1  -0/+51
| | | | | | | | | | This begins the process of exposing git_filter objects to the public API. This includes: * new public type and API for `git_buffer` through which an allocated buffer can be passed to the user * new API `git_blob_filtered_content` * make the git_filter type and GIT_FILTER_TO_... constants public
* odb: wrap the stream reading and writing functions  Carlos Martín Nieto  2013-08-15  1  -7/+7
| | | | | | This is in preparation for moving the hashing to the frontend, which requires us to handle the incoming data before passing it to the backend's stream.