summaryrefslogtreecommitdiff
path: root/src/ignore.c
Commit message (Collapse)AuthorAgeFilesLines
* path: use new length validation functionsEdward Thomson2021-11-091-2/+3
|
* path: separate git-specific path functions from utilEdward Thomson2021-11-091-11/+11
| | | | | | Introduce `git_fs_path`, which operates on generic filesystem paths. `git_path` will be kept for only git-specific path functionality (for example, checking for `.git` in a path).
* str: introduce `git_str` for internal, `git_buf` is externalethomson/gitstrEdward Thomson2021-10-171-22/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | libgit2 has two distinct requirements that were previously solved by `git_buf`. We require: 1. A general purpose string class that provides a number of utility APIs for manipulating data (eg, concatenating, truncating, etc). 2. A structure that we can use to return strings to callers that they can take ownership of. By using a single class (`git_buf`) for both of these purposes, we have confused the API to the point that refactorings are difficult and reasoning about correctness is also difficult. Move the utility class `git_buf` to be called `git_str`: this represents its general purpose, as an internal string buffer class. The name also is an homage to Junio Hamano ("gitstr"). The public API remains `git_buf`, and has a much smaller footprint. It is generally only used as an "out" param with strict requirements that follow the documentation. (Exceptions exist for some legacy APIs to avoid breaking callers unnecessarily.) Utility functions exist to convert a user-specified `git_buf` to a `git_str` so that we can call internal functions, then converting it back again.
* attr_file: don't take the `repo` as an argethomson/attr_longpathsEdward Thomson2021-09-261-2/+2
| | | | The `repo` argument is now unnecessary. Remove it.
* attr: include the filename in the attr sourceEdward Thomson2021-07-221-7/+5
| | | | The attribute source object is now the type and the path.
* attr: rename internal attr file source enumEdward Thomson2021-07-221-4/+7
| | | | | | The enum `git_attr_file_source` is better suffixed with a `_t` since it's a type-of source. Similarly, its members should have a matching name.
* Merge pull request #5824 from palmin/fix-ignore-negateEdward Thomson2021-07-141-3/+11
|\ | | | | fix check for ignoring of negate rules
| * Apply suggestions from code reviewEdward Thomson2021-06-151-4/+4
| |
| * Update src/ignore.cAnders Borum2021-06-131-2/+2
| | | | | | Co-authored-by: lhchavez <lhchavez@lhchavez.com>
| * Update src/ignore.cAnders Borum2021-06-131-2/+1
| | | | | | Co-authored-by: lhchavez <lhchavez@lhchavez.com>
| * fix check for ignoring of negate rulesAnders Borum2021-03-201-2/+11
| | | | | | | | | | | | gitignore rule like "*.pdf" should be negated by "!dir/test.pdf" even though one is inside a directory since *.pdf isn't restricted to any directory this is fixed by making wildcard match treat * like ** by excluding WM_PATHNAME flag when the rule isn't for a full path
* | ignore: validate workdir paths for ignore filesEdward Thomson2021-04-281-7/+10
| |
* | attr: validate workdir paths for attribute filesEdward Thomson2021-04-281-2/+2
|/ | | | | | We should allow attribute files - inside working directories - to have names longer than MAX_PATH when core.longpaths is set. `git_attr_path__init` takes a repository to validate the path with.
* ignore: use GIT_ASSERTEdward Thomson2020-11-271-2/+6
|
* ignore: correct handling of nested rules overriding wild card unignorebuddyspike2019-08-281-3/+6
| | | | | | | | | | | | | | problem: filesystem_iterator loads .gitignore files in top-down order. subsequently, ignore module evaluates them in the order they are loaded. this creates a problem if we have unignored a rule (using a wild card) in a sub dir and ignored it again in a level further below (see the test included in this patch). solution: process ignores in reverse order. closes #4963
* Merge pull request #5173 from pks-t/pks/gitignore-wildmatch-errorEdward Thomson2019-07-201-6/+1
|\ | | | | ignore: fix determining whether a shorter pattern negates another
| * ignore: fix determining whether a shorter pattern negates anotherPatrick Steinhardt2019-07-181-6/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When computing whether we need to store a negative pattern, we iterate through all previously known patterns and check whether the negative pattern undoes any of the previous ones. In doing so we call `wildmatch` and check it's return for any negative error values. If there was a negative return, we will abort and bubble up that error to the caller. In fact, this check for negative values stems from the time where we still used `fnmatch` instead of `wildmatch`. For `fnmatch`, negative values indicate a "real" error, while for `wildmatch` a negative value may be returned if the matching was prematurely aborted. A premature abort may for example also happen if the pattern matches a prefix of the haystack if the pattern is shorter. Returning an error in that case is the wrong thing to do. Fix the code to compare for equality with `WM_MATCH`, only. Negative values returned by `wildmatch` are perfectly fine and thus should be ignored. Add a test that verifies we do not see the error.
* | configuration: cvar -> configmapPatrick Steinhardt2019-07-181-3/+3
| | | | | | | | | | `cvar` is an unhelpful name. Refactor its usage to `configmap` for more clarity.
* | attr_file: ignore macros defined in subdirectoriesPatrick Steinhardt2019-07-121-9/+10
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | Right now, we are unconditionally applying all macros found in a gitatttributes file. But quoting gitattributes(5): Custom macro attributes can be defined only in top-level gitattributes files ($GIT_DIR/info/attributes, the .gitattributes file at the top level of the working tree, or the global or system-wide gitattributes files), not in .gitattributes files in working tree subdirectories. The built-in macro attribute "binary" is equivalent to: So gitattribute files in subdirectories of the working tree may explicitly _not_ contain macro definitions, but we do not currently enforce this limitation. This patch introduces a new parameter to the gitattributes parser that tells whether macros are allowed in the current file or not. If set to `false`, we will still parse macros, but silently ignore them instead of adding them to the list of defined macros. Update all callers to correctly determine whether the to-be-parsed file may contain macros or not. Most importantly, when walking up the directory hierarchy, we will only set it to `true` once it reaches the root directory of the repo itself. Add a test that verifies that we are indeed not applying macros from subdirectories. Previous to these changes, the test would've failed.
* ignore: fix a missing commondir causing failuresEtienne Samson2019-06-261-10/+7
| | | | | As with the preceding commit, the ignore code tries to load code from info/exclude, and we fail to ignore a non-existent file here.
* attr_file: convert to use `wildmatch`Patrick Steinhardt2019-06-151-3/+1
| | | | | | | | | | | | | | | | | | Upstream git has converted to use `wildmatch` instead of `fnmatch`. Convert our gitattributes logic to use `wildmatch` as the last user of `fnmatch`. Please, don't expect I know what I'm doing here: the fnmatch parser is one of the most fun things to play around with as it has a sh*tload of weird cases. In all honesty, I'm simply relying on our tests that are by now rather comprehensive in that area. The conversion actually fixes compatibility with how git.git parser "**" patterns when the given path does not contain any directory separators. Previously, a pattern "**.foo" erroneously wouldn't match a file "x.foo", while git.git would match. Remove the new-unused LEADINGDIR/NOLEADINGDIR flags for `git_attr_fnmatch`.
* global: convert trivial `fnmatch` users to use `wildcard`Patrick Steinhardt2019-06-151-6/+6
| | | | | | | | | | Upstream git.git has converted its codebase to use wildcard in favor of fnmatch in commit 70a8fc999d (stop using fnmatch (either native or compat), 2014-02-15). To keep our own regex-matching in line with what git does, convert all trivial instances of `fnmatch` usage to use `wildcard`, instead. Trivial usage is defined to be use of `fnmatch` with either no flags or flags that have a 1:1 equivalent in wildmatch (PATHNAME, IGNORECASE).
* ignore: treat paths with trailing "/" as directoriesPatrick Steinhardt2019-04-051-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | The function `git_ignore_path_is_ignored` is there to test the ignore status of paths that need not necessarily exist inside of a repository. This has the implication that for a given path, we cannot always decide whether it references a directory or a file, and we need to distinguish those cases because ignore rules may treat those differently. E.g. given the following gitignore file: * !/**/ we'd only want to unignore directories, while keeping files ignored. But still, calling `git_ignore_path_is_ignored("dir/")` will say that this directory is ignored because it treats "dir/" as a file path. As said, the `is_ignored` function cannot always decide whether the given path is a file or directory, and thus it may produce wrong results in some cases. While this is unfixable in the general case, we can do better when we are being passed a path name with a trailing path separator (e.g. "dir/") and always treat them as directories.
* git_error: use new names in internal APIs and usageEdward Thomson2019-01-221-4/+4
| | | | | Move to the `git_error` name in the internal API for error-related functions.
* Merge pull request #4436 from pks-t/pks/packfile-stream-freeEdward Thomson2018-06-111-5/+5
|\ | | | | pack: rename `git_packfile_stream_free`
| * Convert usage of `git_buf_free` to new `git_buf_dispose`Patrick Steinhardt2018-06-101-5/+5
| |
* | ignore: remove now-useless check for LEADINGDIRPatrick Steinhardt2018-06-061-14/+3
| | | | | | | | | | | | | | | | | | | | | | When checking whether a rule negates another rule, we were checking whether a rule had the `GIT_ATTR_FNMATCH_LEADINGDIR` flag set and, if so, added a "/*" to its end before passing it to `fnmatch`. Our code now sets `GIT_ATTR_FNMATCH_NOLEADINGDIR`, thus the `LEADINGDIR` flag shall never be set. Furthermore, due to the `NOLEADINGDIR` flag, trailing globs do not get consumed by our ignore parser anymore. Clean up code by just dropping this now useless logic.
* | ignore: fix negative leading directory rules unignoring subdirectory filesPatrick Steinhardt2018-06-061-1/+7
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When computing whether a file is ignored, we simply search for the first matching rule and return whether it is a positive ignore rule (the file is really ignored) or whether it is a negative ignore rule (the file is being unignored). Each rule has a set of flags which are being passed to `fnmatch`, depending on what kind of rule it is. E.g. in case it is a negative ignore we add a flag `GIT_ATTR_FNMATCH_NEGATIVE`, in case it contains a glob we set the `GIT_ATTR_FNMATCH_HASGLOB` flag. One of these flags is the `GIT_ATTR_FNMATCH_LEADINGDIR` flag, which is always set in case the pattern has a trailing "/*" or in case the pattern is negative. The flag causes the `fnmatch` function to return a match in case a string is a leading directory of another, e.g. "dir/" matches "dir/foo/bar.c". In case of negative patterns, this is wrong in certain cases. Take the following simple example of a gitignore: dir/ !dir/ The `LEADINGDIR` flag causes "!dir/" to match "dir/foo/bar.c", and we correctly unignore the directory. But take this example: *.test !dir/* We expect everything in "dir/" to be unignored, but e.g. a file in a subdirectory of dir should be ignored, as the "*" does not cross directory hierarchies. With `LEADINGDIR`, though, we would just see that "dir/" matches and return that the file is unignored, even if it is contained in a subdirectory. Instead, we want to ignore leading directories here and check "*.test". Afterwards, we have to iterate up to the parent directory and do the same checks. To fix the issue, disallow matching against leading directories in gitignore files. This can be trivially done by just adding the `GIT_ATTR_FNMATCH_NOLEADINGDIR` to the spec passed to `git_attr_fnmatch__parse`. Due to a bug in that function, though, this flag is being ignored for negative patterns, which is fixed in this commit, as well. As a last fix, we need to ignore rules that are supposed to match a directory when our path itself is a file. All together, these changes fix the described error case.
* attr_file: fix handling of directory patterns with trailing spacesPatrick Steinhardt2018-04-121-10/+0
| | | | | | | | | | | | | | | | When comparing whether a path matches a directory rule, we pass the both the path and directory name to `fnmatch` with `GIT_ATTR_FNMATCH_DIRECTORY` being set. `fnmatch` expects the pattern to contain no trailing directory '/', which is why we try to always strip patterns of trailing slashes. We do not handle that case correctly though when the pattern itself has trailing spaces, causing the match to fail. Fix the issue by stripping trailing spaces and tabs for a rule previous to checking whether the pattern is a directory pattern with a trailing '/'. This replaces the whitespace-stripping in our ignore file parsing code, which was stripping whitespaces too late. Add a test to catch future breakage.
* win32: strncmp -> git__strncmpethomson/strncmp_stdcallEdward Thomson2018-02-281-1/+1
| | | | | | | | The win32 C library is compiled cdecl, however when configured with `STDCALL=ON`, our functions (and function pointers) will use the stdcall calling convention. You cannot set a `__stdcall` function pointer to a `__cdecl` function, so it's easier to just use our `git__strncmp` instead of sorting that mess out.
* attr: avoid stat'ting files for bare repositoriesPatrick Steinhardt2018-02-011-1/+5
| | | | | | | | | | | | | | | | | | | | | Depending on whether the path we want to look up an attribute for is a file or a directory, the fnmatch function will be called with different flags. Because of this, we have to first stat(3) the path to determine whether it is a file or directory in `git_attr_path__init`. This is wasteful though in bare repositories, where we can already be assured that the path will never exist at all due to there being no worktree. In this case, we will execute an unnecessary syscall, which might be noticeable on networked file systems. What happens right now is that we always pass the `GIT_DIR_FLAG_UNKOWN` flag to `git_attr_path__init`, which causes it to `stat` the file itself to determine its type. As it is calling `git_path_isdir` on the path, which will always return `false` in case the path does not exist, we end up with the path always being treated as a file in case of a bare repository. As such, we can just check the bare-repository case in all callers and then pass in `GIT_DIR_FLAG_FALSE` ourselves, avoiding the need to `stat`. While this may not always be correct, it at least is no different from our current behavior.
* Ignore trailing whitespace in .gitignore files (as git itself does)David Turner2017-10-291-0/+10
|
* ignore: honor case insensitivity for negative ignoresPatrick Steinhardt2017-08-251-4/+14
| | | | | | | | | | | | | | | | When computing negative ignores, we throw away any rule which does not undo a previous rule to optimize. But on case insensitive file systems, we need to keep in mind that a negative ignore can also undo a previous rule with different case, which we did not yet honor while determining whether a rule undoes a previous one. So in the following example, we fail to unignore the "/Case" directory: /case !/Case Make both paths checking whether a plain- or wildcard-based rule undo a previous rule aware of case-insensitivity. This fixes the described issue.
* ignore: keep negative rules containing wildcardsPatrick Steinhardt2017-08-251-2/+8
| | | | | | | | | | | | | | | | | | | | | | Ignore rules allow for reverting a previously ignored rule by prefixing it with an exclamation mark. As such, a negative rule can only override previously ignored files. While computing all ignore patterns, we try to use this fact to optimize away some negative rules which do not override any previous patterns, as they won't change the outcome anyway. In some cases, though, this optimization causes us to get the actual ignores wrong for some files. This may happen whenever the pattern contains a wildcard, as we are unable to reason about whether a pattern overrides a previous pattern in a sane way. This happens for example in the case where a gitignore file contains "*.c" and "!src/*.c", where we wouldn't un-ignore files inside of the "src/" subdirectory. In this case, the first solution coming to mind may be to just strip the "src/" prefix and simply compare the basenames. While that would work here, it would stop working as soon as the basename pattern itself is different, like for example with "*x.c" and "!src/*.c. As such, we settle for the easier fix of just not optimizing away rules that contain a wildcard.
* ignore: return early to avoid useless indentationPatrick Steinhardt2017-08-251-26/+24
|
* ignore: fix indentation of comment blockPatrick Steinhardt2017-08-251-6/+6
|
* Make sure to always include "common.h" firstPatrick Steinhardt2017-07-031-1/+2
| | | | | | | | | | | | | | | | | | | | | | Next to including several files, our "common.h" header also declares various macros which are then used throughout the project. As such, we have to make sure to always include this file first in all implementation files. Otherwise, we might encounter problems or even silent behavioural differences due to macros or defines not being defined as they should be. So in fact, our header and implementation files should make sure to always include "common.h" first. This commit does so by establishing a common include pattern. Header files inside of "src" will now always include "common.h" as its first other file, separated by a newline from all the other includes to make it stand out as special. There are two cases for the implementation files. If they do have a matching header file, they will always include this one first, leading to "common.h" being transitively included as first file. If they do not have a matching header file, they instead include "common.h" as first file themselves. This fixes the outlined problems and will become our standard practice for header and source files inside of the "src/" from now on.
* Add missing license headersPatrick Steinhardt2017-07-031-0/+7
| | | | | Some implementation files were missing the license headers. This commit adds them.
* repository: use `git_repository_item_path`Patrick Steinhardt2017-02-131-1/+7
| | | | | | | | | | | | | | The recent introduction of the commondir variable of a repository requires callers to distinguish whether their files are part of the dot-git directory or the common directory shared between multpile worktrees. In order to take the burden from callers and unify knowledge on which files reside where, the `git_repository_item_path` function has been introduced which encapsulate this knowledge. Modify most existing callers of `git_repository_path` to use `git_repository_item_path` instead, thus making them implicitly aware of the common directory.
* ignore: there must be a repositoryEtienne Samson2017-01-131-3/+3
| | | Otherwise we'll NULL-dereference in git_attr_cache__init
* giterr_set: consistent error messagesEdward Thomson2016-12-291-1/+1
| | | | | | | | Error messages should be sentence fragments, and therefore: 1. Should not begin with a capital letter, 2. Should not conclude with punctuation, and 3. Should not end a sentence and begin a new one
* ignore: allow unignoring basenames in subdirectoriesPatrick Steinhardt2016-08-121-16/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The .gitignore file allows for patterns which unignore previous ignore patterns. When unignoring a previous pattern, there are basically three cases how this is matched when no globbing is used: 1. when a previous file has been ignored, it can be unignored by using its exact name, e.g. foo/bar !foo/bar 2. when a file in a subdirectory has been ignored, it can be unignored by using its basename, e.g. foo/bar !bar 3. when all files with a basename are ignored, a specific file can be unignored again by specifying its path in a subdirectory, e.g. bar !foo/bar The first problem in libgit2 is that we did not correctly treat the second case. While we verified that the negative pattern matches the tail of the positive one, we did not verify if it only matches the basename of the positive pattern. So e.g. we would have also negated a pattern like foo/fruz_bar !bar Furthermore, we did not check for the third case, where a basename is being unignored in a certain subdirectory again. Both issues are fixed with this commit.
* ignore: don't use realpath to canonicalize pathcmn/ignore-symlinkCarlos Martín Nieto2016-04-021-3/+11
| | | | | | If we're looking for a symlink, realpath will give us the resolved path, which is not what we're after, but a canonicalized version of the path the user asked for.
* ignore: add test and adjust style and comment for dir with wildmatchCarlos Martín Nieto2015-09-131-7/+9
| | | | | | The previous commit left the comment referencing the earlier state of the code, change it to explain the current logic. While here, change the logic to avoid repeating the copy of the base pattern.
* Fix 'If we're dealing with a directory' checkVsevolod Parfenov2015-08-241-1/+1
|
* ignore: clear the error when matching a pattern negationcmn/ignored-ignoreCarlos Martín Nieto2015-05-201-0/+1
| | | | | | When we discover that we want to keep a negative rule, make sure to clear the error variable, as it we otherwise return whatever was left by the previous loop iteration.
* Improvements to ignore performance on Windows.J Wyman2015-04-281-3/+3
| | | | Minimizing the number directory and file opens, minimizes the amount of IO thus reducing the overall cost of performing ignore operations.
* ignore: fix negative ignores without wildcards.Patrick Steinhardt2015-04-171-5/+45
|
* attrcache: don't re-read attrs during checkoutEdward Thomson2015-02-031-2/+2
| | | | | | | | | | | During checkout, assume that the .gitattributes files aren't modified during the checkout. Instead, create an "attribute session" during checkout. Assume that attribute data read in the same checkout "session" hasn't been modified since the checkout started. (But allow subsequent checkouts to invalidate the cache.) Further, cache nonexistent git_attr_file data even when .gitattributes files are not found to prevent re-scanning for nonexistent files.
* ignore: match git's rule negation rulescmn/neg-ignore-dirCarlos Martín Nieto2014-12-051-3/+86
| | | | | | | | | | | A rule can only negate something which was explicitly mentioned in the rules before it. Change our parsing to ignore a negative rule which does not negate something mentioned in the rules above it. While here, fix a wrong allocator usage. The memory for the match string comes from pool allocator. We must not free it with the general allocator. We can instead simply forget the string and it will be cleaned up.