summaryrefslogtreecommitdiff
path: root/src/util.c
Commit message (Collapse)AuthorAgeFilesLines
* refactor: `src` is now `src/libgit2`Edward Thomson2022-02-221-819/+0
|
* cmake: qsort detection in features.hEdward Thomson2021-11-111-8/+8
|
* str: introduce `git_str` for internal, `git_buf` is externalethomson/gitstrEdward Thomson2021-10-171-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | libgit2 has two distinct requirements that were previously solved by `git_buf`. We require: 1. A general purpose string class that provides a number of utility APIs for manipulating data (eg, concatenating, truncating, etc). 2. A structure that we can use to return strings to callers that they can take ownership of. By using a single class (`git_buf`) for both of these purposes, we have confused the API to the point that refactorings are difficult and reasoning about correctness is also difficult. Move the utility class `git_buf` to be called `git_str`: this represents its general purpose, as an internal string buffer class. The name also is an homage to Junio Hamano ("gitstr"). The public API remains `git_buf`, and has a much smaller footprint. It is generally only used as an "out" param with strict requirements that follow the documentation. (Exceptions exist for some legacy APIs to avoid breaking callers unnecessarily.) Utility functions exist to convert a user-specified `git_buf` to a `git_str` so that we can call internal functions, then converting it back again.
* win32: allow empty environment variablesEdward Thomson2021-09-201-1/+1
| | | | | A length of 0 indicates an error and GetLastError() will be set. If GetLastError() is unset then the environment variable has a length of 0.
* utf8: refactor utf8 functionsEdward Thomson2021-04-141-117/+0
| | | | | | | | Move the utf8 functions into a proper namespace `git_utf8` instead of being in the namespaceless `git__` function group. Update them to have out-params first and use `char *` instead of `uint8_t *` to match our API treating strings as `char *` (even if they truly contain `uchar`s inside).
* Make git__strntol64() ~70%* fasterlhchavez2020-12-121-11/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | This change uses compiler intrinsics to detect overflows instead of using divisions to detect potential overflow. This makes the code faster and makes it easier to read as a bonus side-effect! Some of the things this quickens: * Config parsing. * Tree parsing. * Smart protocol negotiation. \* Measured by running `libgit2_clar` with `-fno-optimize-sibling-calls -fno-omit-frame-pointer` under `perf(1)`: ```shell $ perf diff --symbols=git__strntol64 --compute=ratio \ --percentage=absolute baseline.data perf.data \# Event 'cycles' \# \# Baseline Ratio Shared Object \# ........ .............. ............. \# 0.25% 0.321836 libgit2_clar ```
* win32: provide macros for calling conventionsEdward Thomson2020-12-081-1/+1
| | | | | | | Provide more clarity for Win32 calling conventions - now that we always compile to __cdecl on Win32, we'll define that as the the libgit2 calling convention. Also offer NTAPI (__stdcall) calling conventions for things that need callbacks from NTAPI code (eg fiber-local storage).
* util: move git_online_cpus into utilEdward Thomson2020-12-061-0/+49
| | | | | | The number of CPUs is useful information for creating a thread pool or a number of workers, but it's not really about threading directly. Evict it from the thread file
* tree-wide: mark local functions as staticPatrick Steinhardt2020-06-091-3/+3
| | | | | | | We've accumulated quite some functions which are never used outside of their respective code unit, but which are lacking the `static` keyword. Add it to reduce their linkage scope and allow the compiler to optimize better.
* strarray: move to its own fileEdward Thomson2020-06-011-46/+0
|
* git__hexdump: better mimic `hexdump -C`ethomson/hexdumpEdward Thomson2020-04-011-9/+22
|
* util: hide helper qsort code to silence unused functions warningEtienne Samson2019-10-131-0/+5
|
* util: do not perform allocations in insertsortPatrick Steinhardt2019-08-231-24/+27
| | | | | | | | | | | | | | | | | | | | | Our hand-rolled fallback sorting function `git__insertsort_r` does an in-place sort of the given array. As elements may not necessarily be pointers, it needs a way of swapping two values of arbitrary size, which is currently implemented by allocating a temporary buffer of the element's size. This is problematic, though, as the emulated `qsort` interface doesn't provide any return values and thus cannot signal an error if allocation of that temporary buffer has failed. Convert the function to swap via a temporary buffer allocated on the stack. Like this, it can `memcpy` contents of both elements in small batches without requiring a heap allocation. The buffer size has been chosen such that in most cases, a single iteration of copying will suffice. Most importantly, it can fully contain `git_oid` structures and pointers. Add a bunch of tests for the `git__qsort_r` interface to verify nothing breaks. Furthermore, this removes the declaration of `git__insertsort_r` and makes it static as it is not used anywhere else.
* strtol: cast error message length to intEdward Thomson2019-06-241-1/+1
|
* utf8: use size_t for length of bufferEdward Thomson2019-06-241-6/+6
| | | | | | | | The `git__utf8_charlen` now takes `size_t` as the buffer length, since it contains the full length of the buffer at the current position. It now returns `-1` in all cases where utf8 codepoints are invalid, since callers only care about a valid length of a sequence of codepoints, or if the current position is not valid utf8.
* optimize string comparisonsromkatv2019-03-291-17/+12
|
* Remove `git_time_monotonic`remove_time_monotonicEdward Thomson2019-02-171-5/+0
| | | | | | | | | | `git_time_monotonic` was added so that non-native bindings like rugged could get high-resolution timing for benchmarking. However, this is outside the scope of libgit2 *and* rugged decided not to use this function in the first place. Google suggests that absolutely _nobody_ is using this function and we don't want to be in the benchmarking business. Remove the function.
* git_error: use new names in internal APIs and usageEdward Thomson2019-01-221-6/+6
| | | | | Move to the `git_error` name in the internal API for error-related functions.
* strntol: fix out-of-bounds reads when parsing numbers with leading signPatrick Steinhardt2018-11-141-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When parsing a number, we accept a leading plus or minus sign to return a positive or negative number. When the parsed string has such a leading sign, we set up a flag indicating that the number is negative and advance the pointer to the next character in that string. This misses updating the number of bytes in the string, though, which is why the parser may later on do an out-of-bounds read. Fix the issue by correctly updating both the pointer and the number of remaining bytes. Furthermore, we need to check whether we actually have any bytes left after having advanced the pointer, as otherwise the auto-detection of the base may do an out-of-bonuds access. Add a test that detects the out-of-bound read. Note that this is not actually security critical. While there are a lot of places where the function is called, all of these places are guarded or irrelevant: - commit list: this operates on objects from the ODB, which are always NUL terminated any may thus not trigger the off-by-one OOB read. - config: the configuration is NUL terminated. - curl stream: user input is being parsed that is always NUL terminated - index: the index is read via `git_futils_readbuffer`, which always NUL terminates it. - loose objects: used to parse the length from the object's header. As we check previously that the buffer contains a NUL byte, this is safe. - rebase: this parses numbers from the rebase instruction sheet. As the rebase code uses `git_futils_readbuffer`, the buffer is always NUL terminated. - revparse: this parses a user provided buffer that is NUL terminated. - signature: this parser the header information of objects. As objects read from the ODB are always NUL terminated, this is a non-issue. The constructor `git_signature_from_buffer` does not accept a length parameter for the buffer, so the buffer needs to be NUL terminated, as well. - smart transport: the buffer that is parsed is NUL terminated - tree cache: this parses the tree cache from the index extension. The index itself is read via `git_futils_readbuffer`, which always NUL terminates it. - winhttp transport: user input is being parsed that is always NUL terminated
* strntol: fix detection and skipping of base prefixesPatrick Steinhardt2018-11-021-11/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The `git__strntol` family of functions has the ability to auto-detect a number's base if the string has either the common '0x' prefix for hexadecimal numbers or '0' prefix for octal numbers. The detection of such prefixes and following handling has two major issues though that are being fixed in one go now. - We do not do any bounds checking previous to verifying the '0x' base. While we do verify that there is at least one digit available previously, we fail to verify that there are two digits available and thus may do an out-of-bounds read when parsing this two-character-prefix. - When skipping the prefix of such numbers, we only update the pointer length without also updating the number of remaining bytes. Thus if we try to parse a number '0x1' of total length 3, we will first skip the first two bytes and then try to read 3 bytes starting at '1'. Fix both issues by disentangling the logic. Instead of doing the detection and skipping of such prefixes in one go, we will now first try to detect the base while also honoring how many bytes are left. Only if we have a valid base that is either 8 or 16 and have one of the known prefixes, we will now advance the pointer and update the remaining bytes in one step. Add some tests that verify that no out-of-bounds parsing happens and that autodetection works as advertised.
* strntol: fix out-of-bounds read when skipping leading spacesPatrick Steinhardt2018-11-021-2/+5
| | | | | | | | | | | | | | The `git__strntol` family of functions accepts leading spaces and will simply skip them. The skipping will not honor the provided buffer's length, though, which may lead it to read outside of the provided buffer's bounds if it is not a simple NUL-terminated string. Furthermore, if leading space is trimmed, the function will further advance the pointer but not update the number of remaining bytes, which may also lead to out-of-bounds reads. Fix the issue by properly paying attention to the buffer length and updating it when stripping leading whitespace characters. Add a test that verifies that we won't read past the provided buffer length.
* Merge pull request #4864 from pks-t/pks/object-parse-fixesPatrick Steinhardt2018-10-261-0/+41
|\ | | | | Object parse fixes
| * util: provide `git__memmem` functionPatrick Steinhardt2018-10-251-0/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unfortunately, neither the `memmem` nor the `strnstr` functions are part of any C standard but are merely extensions of C that are implemented by e.g. glibc. Thus, there is no standardized way to search for a string in a block of memory with a limited size, and using `strstr` is to be considered unsafe in case where the buffer has not been sanitized. In fact, there are some uses of `strstr` in exactly that unsafe way in our codebase. Provide a new function `git__memmem` that implements the `memmem` semantics. That is in a given haystack of `n` bytes, search for the occurrence of a byte sequence of `m` bytes and return a pointer to the first occurrence. The implementation chosen is the "Not So Naive" algorithm from [1]. It was chosen as the implementation is comparably simple while still being reasonably efficient in most cases. Preprocessing happens in constant time and space, searching has a time complexity of O(n*m) with a slightly sub-linear average case. [1]: http://www-igm.univ-mlv.fr/~lecroq/string/
* | util: fix out of bounds read in error messagePatrick Steinhardt2018-10-191-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When an integer that is parsed with `git__strntol32` is too big to fit into an int32, we will generate an error message that includes the actual string that failed to parse. This does not acknowledge the fact that the string may either not be NUL terminated or alternative include additional characters after the number that is to be parsed. We may thus end up printing characters into the buffer that aren't the number or, worse, read out of bounds. Fix the issue by utilizing the `endptr` that was set by `git__strntol64`. This pointer is guaranteed to be set to the first character following the number, and we can thus use it to compute the width of the number that shall be printed. Create a test to verify that we correctly truncate the number.
* | util: avoid signed integer overflows in `git__strntol64`Patrick Steinhardt2018-10-181-3/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While `git__strntol64` tries to detect integer overflows when doing the necessary arithmetics to come up with the final result, it does the detection only after the fact. This check thus relies on undefined behavior of signed integer overflows. Fix this by instead checking up-front whether the multiplications or additions will overflow. Note that a detected overflow will not cause us to abort parsing the current sequence of digits. In the case of an overflow, previous behavior was to still set up the end pointer correctly to point to the first character immediately after the currently parsed number. We do not want to change this now as code may rely on the end pointer being set up correctly even if the parsed number is too big to be represented as 64 bit integer.
* | util: remove `git__strtol32`Patrick Steinhardt2018-10-181-6/+0
| | | | | | | | | | | | | | The function `git__strtol32` can easily be misused when untrusted data is passed to it that may not have been sanitized with trailing `NUL` bytes. As all usages of this function have now been removed, we can remove this function altogether to avoid future misuse of it.
* | util: remove unsafe `git__strtol64` functionPatrick Steinhardt2018-10-181-6/+0
|/ | | | | | | | The function `git__strtol64` does not take a maximum buffer length as parameter. This has led to some unsafe usages of this function, and as such we may consider it as being unsafe to use. As we have now eradicated all usages of this function, let's remove it completely to avoid future misuse.
* util: make the qsort_r check work on macOSEtienne Samson2018-08-241-3/+3
| | | | | | | | This performs a compile-check by using CMake support, to differentiate the GNU version from the BSD version of qsort_r. Module taken from 4f252abea5f1d17c60f6ff115c9c44cc0b6f1df6, which I've checked against CMake 2.8.11.
* Sanitize the hunk header to ensure it contains UTF-8 valid dataStan Hu2018-05-051-0/+16
| | | | | | | | | | The diff driver truncates the hunk header text to 80 bytes, which can truncate 4-byte Unicode characters and introduce garbage characters in the diff output. This change sanitizes the hunk header before it is displayed. This mirrors the test in git: https://github.com/git/git/blob/master/t/t4025-hunk-header.sh Closes https://github.com/libgit2/rugged/issues/716
* util: fix missing headers for MinGW environmentsPatrick Steinhardt2018-03-291-0/+5
| | | | | There are multiple references to undefined functions in the Microsoft builds. Add headers to make them known.
* util: clean up header includesPatrick Steinhardt2018-02-161-4/+1
| | | | | | | | | | | While "util.h" declares the macro `git__tolower`, which simply resorts to tolower(3P) on Unix-like systems, the <ctype.h> header is only being included in "util.c". Thus, anybody who has included "util.h" without having <ctype.h> included will fail to compile as soon as the macro is in use. Furthermore, we can clean up additional includes in "util.c" and simply replace them with an include for "common.h".
* Explicitly mark fallthrough cases with commentsPatrick Steinhardt2018-02-161-1/+3
| | | | | | | | | | | | | | | | | | A lot of compilers nowadays generate warnings when there are cases in a switch statement which implicitly fall through to the next case. To avoid this warning, the last line in the case that is falling through can have a comment matching a regular expression, where one possible comment body would be `/* fall through */`. An alternative to the comment would be an explicit attribute like e.g. `[[clang::fallthrough]` or `__attribute__ ((fallthrough))`. But GCC only introduced support for such an attribute recently with GCC 7. Thus, and also because the fallthrough comment is supported by most compilers, we settle for using comments instead. One shortcoming of that method is that compilers are very strict about that. Most interestingly, that comment _really_ has to be the last line. In case a closing brace follows the comment, the heuristic will fail.
* util: introduce `git__prefixncmp` and consolidate implementationsEdward Thomson2017-12-201-16/+28
| | | | | | | | | | | Introduce `git_prefixncmp` that will search up to the first `n` characters of a string to see if it is prefixed by another string. This is useful for examining if a non-null terminated character array is prefixed by a particular substring. Consolidate the various implementations of `git__prefixcmp` around a single core implementation and add some test cases to validate its behavior.
* Make sure to always include "common.h" firstPatrick Steinhardt2017-07-031-2/+4
| | | | | | | | | | | | | | | | | | | | | | Next to including several files, our "common.h" header also declares various macros which are then used throughout the project. As such, we have to make sure to always include this file first in all implementation files. Otherwise, we might encounter problems or even silent behavioural differences due to macros or defines not being defined as they should be. So in fact, our header and implementation files should make sure to always include "common.h" first. This commit does so by establishing a common include pattern. Header files inside of "src" will now always include "common.h" as its first other file, separated by a newline from all the other includes to make it stand out as special. There are two cases for the implementation files. If they do have a matching header file, they will always include this one first, leading to "common.h" being transitively included as first file. If they do not have a matching header file, they instead include "common.h" as first file themselves. This fixes the outlined problems and will become our standard practice for header and source files inside of the "src/" from now on.
* giterr_set: consistent error messagesEdward Thomson2016-12-291-3/+3
| | | | | | | | Error messages should be sentence fragments, and therefore: 1. Should not begin with a capital letter, 2. Should not conclude with punctuation, and 3. Should not end a sentence and begin a new one
* time: Export `git_time_monotonic`vmg/timeVicent Marti2016-09-131-0/+5
|
* Fixed bug while parsing INT64_MINKrishna Ram Prakash R2016-06-291-3/+3
|
* Patch parsing from patch filesEdward Thomson2016-05-261-2/+20
|
* qsort_r/qsort_s: detect their supportEdward Thomson2015-09-301-13/+8
|
* Fix duplicate basenames to support older VSJohn Haley2015-08-051-1/+1
| | | | | | | With Visual Studio versions 2008 and older they ignore the full path to files and only check the basename of the file to find a collision. Additionally, having duplicate basenames can break other build tools like GYP. This fixes https://github.com/libgit2/libgit2/issues/3356
* git__getenv: utf-8 aware env readerEdward Thomson2015-07-021-0/+48
| | | | | | Introduce `git__getenv` which is a UTF-8 aware `getenv` everywhere. Make `cl_getenv` use this to keep consistent memory handling around return values (free everywhere, as opposed to only some platforms).
* git__tolower: a tolower() that isn't dumbEdward Thomson2015-05-291-9/+9
| | | | | | | | Some brain damaged tolower() implementations appear to want to take the locale into account, and this may require taking some insanely aggressive lock on the locale and slowing down what should be the most trivial of trivial calls for people who just want to downcase ASCII.
* git__strcasecmp: treat input bytes as unsignedEdward Thomson2015-05-291-2/+2
| | | | | | Treat input bytes as unsigned before doing arithmetic on them, lest we look at some non-ASCII byte (like a UTF-8 character) as a negative value and perform the comparison incorrectly.
* Credit utf8proc for utf8 iteratorEdward Thomson2015-02-101-0/+25
|
* path: Use UTF8 iteration for HFS charsVicent Marti2014-12-161-0/+76
|
* checkout: disallow bad paths on win32Edward Thomson2014-12-161-0/+15
| | | | | | | | | | | | | | | Disallow: 1. paths with trailing dot 2. paths with trailing space 3. paths with trailing colon 4. paths that are 8.3 short names of .git folders ("GIT~1") 5. paths that are reserved path names (COM1, LPT1, etc). 6. paths with reserved DOS characters (colons, asterisks, etc) These paths would (without \\?\ syntax) be elided to other paths - for example, ".git." would be written as ".git". As a result, writing these paths literally (using \\?\ syntax) makes them hard to operate with from the shell, Windows Explorer or other tools. Disallow these.
* qsort_r is only available from Visual Studio 2005+Jacques Germishuys2014-07-061-1/+2
|
* cygwin also doesn't have qsort_rJacques Germishuys2014-04-301-1/+1
|
* Decouple index iterator sort from indexRussell Belfer2014-04-171-3/+5
| | | | | | | | This makes the index iterator honor the GIT_ITERATOR_IGNORE_CASE and GIT_ITERATOR_DONT_IGNORE_CASE flags without modifying the index data itself. To take advantage of this, I had to export a number of the internal index entry comparison functions. I also wrote some new tests to exercise the capability.
* Move libgit2 settings out of utilEdward Thomson2014-01-141-123/+0
|