| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Return `GIT_EINVALID` on parse errors so that direct callers of parse
functions can determine when there was a failure to parse the object.
The object parser functions will swallow this error code to prevent it
from propagating down the chain to end-users. (`git_merge` should not
return `GIT_EINVALID` when a commit it tries to look up is not valid,
this would be too vague to be useful.)
The only public function that this affects is
`git_signature_from_buffer`, which is now documented as returning
`GIT_EINVALID` when appropriate.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
libgit2 has two distinct requirements that were previously solved by
`git_buf`. We require:
1. A general purpose string class that provides a number of utility APIs
for manipulating data (eg, concatenating, truncating, etc).
2. A structure that we can use to return strings to callers that they
can take ownership of.
By using a single class (`git_buf`) for both of these purposes, we have
confused the API to the point that refactorings are difficult and
reasoning about correctness is also difficult.
Move the utility class `git_buf` to be called `git_str`: this represents
its general purpose, as an internal string buffer class. The name also
is an homage to Junio Hamano ("gitstr").
The public API remains `git_buf`, and has a much smaller footprint. It
is generally only used as an "out" param with strict requirements that
follow the documentation. (Exceptions exist for some legacy APIs to
avoid breaking callers unnecessarily.)
Utility functions exist to convert a user-specified `git_buf` to a
`git_str` so that we can call internal functions, then converting it
back again.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
There can be a significant difference between the system where we created the
buffer (if at all) and when the caller provides us with the contents of a
commit.
Verify that the commit we are being asked to create references objects which do
exist in the target repository.
|
|\
| |
| | |
DRY commit parsing
|
| |
| |
| |
| |
| |
| | |
This allows us to pick which data from a commit we're interested in.
This will be used by the revwalk code, which is only interested in
parents' and committer data.
|
| | |
|
|/
|
|
| |
If provided with a null signature, skip adding the signature header and create the commit anyway.
|
|
|
|
|
| |
Move to the `git_error` name in the internal API for error-related
functions.
|
|
|
|
| |
Use the new object_type enumeration names within the codebase.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While commit objects usually should have only one author field, our commit
parser actually handles the case where a commit has multiple author fields
because some tools that exist in the wild actually write them. Detection of
those additional author fields is done by using a simple `git__prefixcmp`,
checking whether the current line starts with the string "author ". In case
where we are handed a non-NUL-terminated string that ends directly after the
space, though, we may have an out-of-bounds read of one byte when trying to
compare the expected final NUL byte.
Fix the issue by using `git__prefixncmp` instead of `git_prefixcmp`.
Unfortunately, a test cannot be easily written to catch this case. While we
could test the last error message and verify that it didn't in fact fail parsing
a signature (because that would indicate that it has in fact tried to parse the
additional "author " field, which it shouldn't be able to detect in the first
place), this doesn't work as the next line needs to be the "committer" field,
which would error out with the same error message even if we hadn't done an
out-of-bounds read.
As objects read from the object database are always NUL terminated, this issue
cannot be triggered in normal code and thus it's not security critical.
|
|
|
|
|
|
|
|
|
|
|
| |
The commit message encoding is currently being parsed by the
`git__prefixcmp` function. As this function does not accept a buffer
length, it will happily skip over a buffer's end if it is not `NUL`
terminated.
Fix the issue by using `git__prefixncmp` instead. Add a test that
verifies that we are unable to parse the encoding field if it's cut off
by the supplied buffer length.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, parsing objects is strictly tied to having an ODB object
available. This makes it hard to parse an object when all that is
available is its raw object and size. Furthermore, hacking around that
limitation by directly creating an ODB structure either on stack or on
heap does not really work that well due to ODB objects being reference
counted and then automatically free'd when reaching a reference count of
zero.
Implement a function `git_commit__parse_raw` to parse a commit object
from a pair of `data` and `size`.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Next to including several files, our "common.h" header also declares
various macros which are then used throughout the project. As such, we
have to make sure to always include this file first in all
implementation files. Otherwise, we might encounter problems or even
silent behavioural differences due to macros or defines not being
defined as they should be. So in fact, our header and implementation
files should make sure to always include "common.h" first.
This commit does so by establishing a common include pattern. Header
files inside of "src" will now always include "common.h" as its first
other file, separated by a newline from all the other includes to make
it stand out as special. There are two cases for the implementation
files. If they do have a matching header file, they will always include
this one first, leading to "common.h" being transitively included as
first file. If they do not have a matching header file, they instead
include "common.h" as first file themselves.
This fixes the outlined problems and will become our standard practice
for header and source files inside of the "src/" from now on.
|
|
|
|
| |
Freshen the tree object that a commit points to during commit time.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When extracting a commit's signature, we first free the object and only
afterwards put its signature contents into the result buffer. This works
in most cases - the free'd object will normally be cached anyway, so we
only end up decrementing its reference count without actually freeing
its contents. But in some more exotic setups, where caching is disabled,
this can definitly be a problem, as we might be the only instance
currently holding a reference to this object.
Fix this issue by first extracting the contents and freeing the object
afterwards only.
|
|
|
|
|
|
|
|
|
|
| |
The functions `git_commit_header_field` and
`git_commit_extract_signature` both receive buffers used to hand back
the results to the user. While these functions called `git_buf_sanitize`
on these buffers, this is not the right thing to do, as it will simply
initialize or zero-terminate passed buffers. As we want to overwrite
contents, we instead have to call `git_buf_clear` to completely reset
them.
|
|
|
|
|
|
|
|
| |
Error messages should be sentence fragments, and therefore:
1. Should not begin with a capital letter,
2. Should not conclude with punctuation, and
3. Should not end a sentence and begin a new one
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When parsing a commit, we will treat all bytes left after parsing
the headers as the commit message. When no bytes are left, we
leave the commit's message uninitialized. While uncommon to have
a commit without message, this is the right behavior as Git
unfortunately allows for empty commit messages.
Given that this scenario is so uncommon, most programs acting on
the commit message will never check if the message is actually
set, which may lead to errors. To work around the error and not
lay the burden of checking for empty commit messages to the
developer, initialize the commit message with an empty string
when no commit message is given.
|
| |
|
|
|
|
|
|
|
| |
When calling `git_commit_create` with an empty array of `parents` and `parent_count == 0`
the call will segfault at https://github.com/libgit2/libgit2/blob/master/src/commit.c#L107
when it's trying to compare `current_id` to a null parent oid.
This just puts in a check to stop that segfault.
|
| |
|
|\
| |
| | |
commit: add function to attach a signature to a commit
|
| |
| |
| |
| |
| | |
In combination with the function which creates a commit into a buffer,
this allows us to more easily create signed commits.
|
|/
|
|
|
|
|
|
| |
The function to extract signatures suffers from a similar bug to the
header field finding one by having an unecessary line feed check as a
break condition of its loop.
Fix that and add a test for this single-line signature situation.
|
|
|
|
|
|
| |
Sometimes you want to create a commit but not write it out to the
objectdb immediately. For these cases, provide a new function to
retrieve the buffer instead of having to go through the db.
|
|
|
|
|
| |
When `GIT_OPT_ENABLE_STRICT_OBJECT_CREATION` is turned on, validate
the tree and parent ids given to commit creation functions.
|
|
|
|
|
|
| |
We should be checking whether the object we're looking up is a commit,
and we should let the caller know whether the not-found return code
comes from a bad object type or just a missing signature.
|
|
|
|
|
|
|
|
|
| |
When we moved the logic to handle the first one, wrong loop logic was
kept in place which meant we still finished early. But we now notice it
because we're not reading past the last LF we find.
This was not noticed before as the last field in the tested commit was
multi-line which does not trigger the early break.
|
|\
| |
| | |
Introduce git_commit_extract_signature
|
| |
| |
| |
| |
| |
| | |
This returns the GPG signature for a commit and its contents without the
signature block, allowing for the verification of the commit's
signature.
|
|/
|
|
|
|
|
|
| |
We were searching only past the first header field, which meant we were
unable to find e.g. `tree` which is the first field.
While here, make sure to set an error message in case we cannot find the
field.
|
|
|
|
|
|
|
|
|
| |
It is already possible to get a commit's summary with the
`git_commit_summary` function. It is not possible to get the
remaining part of the commit message, that is the commit
message's body.
Fix this by introducing a new function `git_commit_body`.
|
|
|
|
| |
whitespace. Collapse spaces around newlines for the summary.
|
|
|
|
|
|
| |
This allows the user to look up fields which we don't parse in libgit2,
and allows them to access gpgsig or mergetag fields if they wish to
check the signature.
|
|
|
|
|
|
|
|
|
|
| |
Some tools create multiple author fields. git is rather lax when parsing
them, although fsck does complain about them. This means that they exist
in the wild.
As it's not too taxing to check for them, and there shouldn't be a
noticeable slowdown when dealing with correct commits, add logic to skip
over these extra fields when parsing the commit.
|
|
|
|
|
|
|
|
|
|
| |
The signature for the reflog is not something which changes
dynamically. Almost all uses will be NULL, since we want for the
repository's default identity to be used, making it noise.
In order to allow for changing the identity, we instead provide
git_repository_set_ident() and git_repository_ident() which allow a user
to override the choice of signature.
|
|
|
|
|
| |
Without this change, compiling with gcc and pedantic generates warning:
ISO C does not allow extra ‘;’ outside of a function.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current version of the commit creation and amend function are unsafe
to use when passing the update_ref parameter, as they do not check that
the reference at the moment of update points to what the user expects.
Make sure that we're moving history forward when we ask the library to
update the reference for us by checking that the first parent of the new
commit is the current value of the reference. We also make sure that the
ref we're updating hasn't moved between the read and the write.
Similarly, when amending a commit, make sure that the current tip of the
branch is the commit we're amending.
|
|
|
|
|
|
|
|
| |
We can make use of git_object_dup to use refcounting instead of pointer
comparison to make sure we don't free the caller's object.
This also lets us simplify the case for '~0' which is now just an
assignment instead of looking up the object we have at hand.
|
| |
|
|
|
|
|
|
|
|
|
| |
This adds an API to amend an existing commit, basically a shorthand
for creating a new commit filling in missing parameters from the
values of an existing commit. As part of this, I also added a new
"sys" API to create a commit using a callback to get the parents.
This allowed me to rewrite all the other commit creation APIs so
that temporary allocations are no longer needed.
|
|\ |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The current code issues a lot of strncmp() calls in order to check for
the end of the header, simply in order to copy it and start going
through it again. These are a lot of calls for something we can check as
we go along. Knowing the amount of parents beforehand to reduce
allocations in extreme cases does not make up for them.
Instead start parsing immediately and check for the double-newline after
each header field, leaving the raw_header allocation for the end, which
lets us go through the header once and reduces the amount of strncmp()
calls significantly.
In unscientific testing, this has reduced a shortlog-like usage (walking
though the whole history of a branch and extracting data from the
commits) of git.git from ~830ms to ~700ms and makes the time we spend in
strncmp() negligible.
|
|/ |
|