summaryrefslogtreecommitdiff
path: root/dir.c
Commit message (Collapse)AuthorAgeFilesLines
* Avoid using 'lstat()' to figure out directoriesLinus Torvalds2009-07-091-5/+42
| | | | | | | | | If we have an up-to-date index entry for a file in that directory, we can know that the directories leading up to that file must be directories. No need to do an lstat() on the directory. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Avoid doing extra 'lstat()'s for d_type if we have an up-to-date cache entryLinus Torvalds2009-07-091-5/+9
| | | | | | | | | | On filesystems without d_type, we can look at the cache entry first. Doing an lstat() can be expensive. Reported by Dmitry Potapov for Cygwin. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Simplify read_directory[_recursive]() argumentsLinus Torvalds2009-07-091-29/+28
| | | | | | | | | Stop the insanity with separate 'path' and 'base' arguments that must match. We don't need that crazy interface any more, since we cleaned up handling of 'path' in commit da4b3e8c28b1dc2b856d2555ac7bb47ab712598c. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Add 'fill_directory()' helper function for directory traversalLinus Torvalds2009-07-091-1/+22
| | | | | | | | | | | | | | | | | | | | | | Most of the users of "read_directory()" actually want a much simpler interface than the whole complex (but rather powerful) one. In fact 'git add' had already largely abstracted out the core interface issues into a private "fill_directory()" function that was largely applicable almost as-is to a number of callers. Yes, 'git add' wants to do some extra work of its own, specific to the add semantics, but we can easily split that out, and use the core as a generic function. This function does exactly that, and now that much simplified 'fill_directory()' function can be shared with a number of callers, while also ensuring that the rather more complex calling conventions of read_directory() are used by fewer call-sites. This also makes the 'common_prefix()' helper function private to dir.c, since all callers are now in that file. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Convert existing die(..., strerror(errno)) to die_errno()Thomas Rast2009-06-271-1/+1
| | | | | | | | | | | Change calls to die(..., strerror(errno)) to use the new die_errno(). In the process, also make slight style adjustments: at least state _something_ about the function that failed (instead of just printing the pathname), and put paths in single quotes. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* git-add: no need for -f when resolving a conflict in already tracked pathJeff King2009-05-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | When a path F that matches ignore pattern has a conflict, "git add F" insisted the -f option be given, which did not make sense. It would have required -f when the path was originally added, but when resolving a conflict, it already is tracked. So this should work (and does): $ echo file >.gitignore $ echo content >file $ git add -f file ;# need -f because we are adding new path $ echo more content >>file $ git add file ;# don't need -f; it is not actually an "other" file This is handled under the hood by the COLLECT_IGNORED option to read_directory. When that code finds an ignored file, it checks the index to make sure it is not actually a tracked file. However, the test it uses does not take into account unmerged entries, and considers them to still be ignored. "git ls-files" uses a more elaborate test and gets the right answer and the same test should be used here. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* dir.c: clean up handling of 'path' parameter in read_directory_recursive()Linus Torvalds2009-05-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Right now we pass two different pathnames ('path' and 'base') down to read_directory_recursive(), and the only real reason for that is that we want to allow an empty 'base' parameter, but when we do so, we need the pathname to "opendir()" to be "." rather than the empty string. And rather than handle that confusion in the caller, we can just fix read_directory_recursive() to handle the case of an empty path itself, by just passing opendir() a "." ourselves if the path is empty. This would allow us to then drop one of the pathnames entirely from the calling convention, but rather than do that, we'll start separating them out as a "filesystem pathname" (the one we use for filesystem accesses) and a "git internal base name" (which is the name that we use for git internally). That will eventually allow us to do things like handle different encodings (eg the filesystem pathnames might be Latin1, while git itself would use UTF-8 for filename information). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 'maint'Junio C Hamano2009-05-051-2/+2
|\ | | | | | | | | | | | | | | | | | | | | | | * maint: improve error message in config.c t4018-diff-funcname: add cpp xfuncname pattern to syntax test Work around BSD whose typeof(tv.tv_sec) != time_t git-am.txt: reword extra headers in message body git-am.txt: Use date or value instead of time or timestamp git-am.txt: add an 'a', say what 'it' is, simplify a sentence dir.c: Fix two minor grammatical errors in comments git-svn: fix a sloppy Getopt::Long usage
| * Merge branch 'maint-1.6.0' into maintJunio C Hamano2009-05-051-2/+2
| |\ | | | | | | | | | | | | * maint-1.6.0: dir.c: Fix two minor grammatical errors in comments
| | * dir.c: Fix two minor grammatical errors in commentsAllan Caffee2009-05-051-2/+2
| | | | | | | | | | | | | | | Signed-off-by: Allan Caffee <allan.caffee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | Merge branch 'fg/maint-exclude-bq' into maintJunio C Hamano2009-03-111-1/+1
| |\ \ | | |/ | | | | | | | | | * fg/maint-exclude-bq: Support "\" in non-wildcard exclusion entries
* | | Fix a bunch of pointer declarations (codestyle)Felipe Contreras2009-05-011-1/+1
| | | | | | | | | | | | | | | | | | | | | Essentially; s/type* /type */ as per the coding guidelines. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | Merge branch 'mv/parseopt-ls-files'Junio C Hamano2009-03-201-8/+9
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * mv/parseopt-ls-files: ls-files: fix broken --no-empty-directory t3000: use test_cmp instead of diff parse-opt: migrate builtin-ls-files. Turn the flags in struct dir_struct into a single variable Conflicts: builtin-ls-files.c t/t3000-ls-files-others.sh
| * | | Turn the flags in struct dir_struct into a single variableJohannes Schindelin2009-02-181-8/+9
| |/ / | | | | | | | | | | | | | | | | | | | | | By having flags represented as bits in the new member variable 'flags', it will be easier to use parse_options when dir_struct is involved. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | Merge branch 'kb/checkout-optim'Junio C Hamano2009-03-171-1/+1
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * kb/checkout-optim: Revert "lstat_cache(): print a warning if doing ping-pong between cache types" checkout bugfix: use stat.mtime instead of stat.ctime in two places Makefile: Set compiler switch for USE_NSEC Create USE_ST_TIMESPEC and turn it on for Darwin Not all systems use st_[cm]tim field for ns resolution file timestamp Record ns-timestamps if possible, but do not use it without USE_NSEC write_index(): update index_state->timestamp after flushing to disk verify_uptodate(): add ce_uptodate(ce) test make USE_NSEC work as expected fix compile error when USE_NSEC is defined check_updates(): effective removal of cache entries marked CE_REMOVE lstat_cache(): print a warning if doing ping-pong between cache types show_patch_diff(): remove a call to fstat() write_entry(): use fstat() instead of lstat() when file is open write_entry(): cleanup of some duplicated code create_directories(): remove some memcpy() and strchr() calls unlink_entry(): introduce schedule_dir_for_removal() lstat_cache(): swap func(length, string) into func(string, length) lstat_cache(): generalise longest_match_lstat_cache() lstat_cache(): small cleanup and optimisation
| * | | lstat_cache(): swap func(length, string) into func(string, length)Kjetil Barvik2009-02-091-1/+1
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | Swap function argument pair (length, string) into (string, length) to conform with the commonly used order inside the GIT source code. Also, add a note about this fact into the coding guidelines. Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | Merge branch 'fg/exclude-bq'Junio C Hamano2009-03-051-1/+1
|\ \ \ | |/ / |/| / | |/ | | * fg/exclude-bq: Support "\" in non-wildcard exclusion entries
| * Support "\" in non-wildcard exclusion entriesFinn Arne Gangstad2009-02-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "\" was treated differently in exclude rules depending on whether a wildcard match was done. For wildcard rules, "\" was de-escaped in fnmatch, but this was not done for other rules since they used strcmp instead. A file named "#foo" would not be excluded by "\#foo", but would be excluded by "\#foo*". We now treat all rules with "\" as wildcard rules. Another solution could be to de-escape all non-wildcard rules as we read them, but we would have to do the de-escaping exactly as fnmatch does it to avoid inconsistencies. Signed-off-by: Finn Arne Gangstad <finnag@pvv.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | Merge branch 'cb/add-pathspec'Junio C Hamano2009-01-251-8/+11
|\ \ | | | | | | | | | | | | | | | * cb/add-pathspec: remove pathspec_match, use match_pathspec instead clean up pathspec matching
| * | remove pathspec_match, use match_pathspec insteadClemens Buchacher2009-01-141-8/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Both versions have the same functionality. This removes any redundancy. This also adds makes two extensions to match_pathspec: - If pathspec is NULL, return 1. This reflects the behavior of git commands, for which no paths usually means "match all paths". - If seen is NULL, do not use it. Signed-off-by: Clemens Buchacher <drizzd@aon.at> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | clean up pathspec matchingClemens Buchacher2009-01-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | If pathspec already matched exactly, it cannot match any more. Originally, we had to continue anyways, because we did not differentiate between exact, recursive and globbing matches. Signed-off-by: Clemens Buchacher <drizzd@aon.at> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | Merge branch 'rs/ctype'Junio C Hamano2009-01-211-2/+2
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * rs/ctype: Add is_regex_special() Change NUL char handling of isspecial() Reformat ctype.c Add ctype test Conflicts: Makefile
| * | | Change NUL char handling of isspecial()René Scharfe2009-01-171-2/+2
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace isspecial() by the new macro is_glob_special(), which is more, well, specialized. The former included the NUL char in its character class, while the letter only included characters that are special to file name globbing. The new name contains underscores because they enhance readability considerably now that it's made up of three words. Renaming the function is necessary to document its changed scope. The call sites of isspecial() are updated to check explicitly for NUL. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | Allow cloning to an existing empty directoryAlexander Potashev2009-01-111-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The die() message updated accordingly. The previous behaviour was to only allow cloning when the destination directory doesn't exist. [jc: added trivial tests] Signed-off-by: Alexander Potashev <aspotashev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | add is_dot_or_dotdot inline functionAlexander Potashev2009-01-111-8/+4
|/ / | | | | | | | | | | | | | | | | | | A new inline function is_dot_or_dotdot is used to check if the directory name is either "." or "..". It returns a non-zero value if the given string is "." or "..". It's applicable to a lot of Git source code. Signed-off-by: Alexander Potashev <aspotashev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | dir.c: make dir_add_name() and dir_add_ignored() staticNanako Shiraishi2008-10-021-2/+2
| | | | | | | | | | | | | | These functions are not used by any other file. Signed-off-by: Nanako Shiraishi <nanako3@lavabit.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* | Merge branch 'maint' into bc/master-diff-hunk-header-fixShawn O. Pearce2008-09-291-0/+20
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * maint: (41 commits) Clarify commit error message for unmerged files Use strchrnul() instead of strchr() plus manual workaround Use remove_path from dir.c instead of own implementation Add remove_path: a function to remove as much as possible of a path git-submodule: Fix "Unable to checkout" for the initial 'update' Clarify how the user can satisfy stash's 'dirty state' check. Remove empty directories in recursive merge Documentation: clarify the details of overriding LESS via core.pager Update release notes for 1.6.0.3 checkout: Do not show local changes when in quiet mode for-each-ref: Fix --format=%(subject) for log message without newlines git-stash.sh: don't default to refs/stash if invalid ref supplied maint: check return of split_cmdline to avoid bad config strings builtin-prune.c: prune temporary packs in <object_dir>/pack directory Do not perform cross-directory renames when creating packs Use dashless git commands in setgitperms.perl git-remote: do not use user input in a printf format string make "git remote" report multiple URLs Start draft release notes for 1.6.0.3 git-repack uses --no-repack-object, not --no-repack-delta. ... Conflicts: RelNotes
| * Add remove_path: a function to remove as much as possible of a pathAlex Riesen2008-09-291-0/+20
| | | | | | | | | | | | | | | | | | The function has two potential users which both managed to get wrong their implementations (the one in builtin-rm.c one has a memleak, and builtin-merge-recursive.c scribles over its const argument). Signed-off-by: Alex Riesen <raa.lkml@gmail.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* | dir.c: Avoid c99 array initializationBrandon Casey2008-08-281-12/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following syntax: char foo[] = { [0] = 1, [7] = 2, [15] = 3 }; is a c99 construct which some compilers do not support even though they support other c99 constructs. This construct can be avoided by folding these 'special' test cases into the sane_ctype array and making use of the related infrastructure. Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | Merge branch 'jc/add-stop-at-symlink'Junio C Hamano2008-08-201-1/+5
|\ \ | |/ |/| | | | | | | * jc/add-stop-at-symlink: add: refuse to add working tree items beyond symlinks update-index: refuse to add working tree items beyond symlinks
| * add: refuse to add working tree items beyond symlinksJunio C Hamano2008-08-041-1/+5
| | | | | | | | | | | | | | This is the same fix for the issue of adding "sym/path" when "sym" is a symblic link that points at a directory "dir" with "path" in it. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | Fix escaping of glob special characters in pathspecsKevin Ballard2008-08-131-1/+1
|/ | | | | | | | | | match_one implements an optimized pathspec match where it only uses fnmatch if it detects glob special characters in the pattern. Unfortunately it didn't treat \ as a special character, so attempts to escape a glob special character would fail even though fnmatch() supports it. Signed-off-by: Kevin Ballard <kevin@sb.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 'lt/case-insensitive'Junio C Hamano2008-05-101-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | * lt/case-insensitive: Make git-add behave more sensibly in a case-insensitive environment When adding files to the index, add support for case-independent matches Make unpack-tree update removed files before any updated files Make branch merging aware of underlying case-insensitive filsystems Add 'core.ignorecase' option Make hash_name_lookup able to do case-independent lookups Make "index_name_exists()" return the cache_entry it found Move name hashing functions into a file of its own Make unpack_trees_options bit flags actual bitfields
| * Add 'core.ignorecase' optionLinus Torvalds2008-04-091-1/+1
| | | | | | | | | | | | | | | | | | ..and start using it for directory entry traversal (ie "git status" will not consider entries that match an existing entry case-insensitively to be a new file) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * Make hash_name_lookup able to do case-independent lookupsLinus Torvalds2008-04-091-1/+1
| | | | | | | | | | | | | | | | Right now nobody uses it, but "index_name_exists()" gets a flag so you can enable it on a case-by-case basis. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | Optimize match_pathspec() to avoid fnmatch()Linus Torvalds2008-04-261-2/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "git add *" is actually fundamentally different from "git add .", and yeah, you should generally use the latter. The reason? The argument list is actually something different from what you think it is. For git, it's a "pathspec", so what actualy happens is that in *both* cases, it will really traverse the whole tree, and then match every file it finds against the pathspec. So think of the arguments not as a file list, but as a random bunch of patterns to match against the files you have! Which is why the cost is actually approximately O(n*m), where "n" is the size of the working tree, and "m" is the number of pathspecs. So the reason "git add ." is fast is actually that "m" in that case is just 1 (just one trivial pattern), and then "git add *" is slow because "m" is large (lots of complicated patterns). In both cases, 'n' is the same (== the whole set of files in your working tree). Anyway, here's a trivial patch that doesn't change this fundamental fact, but that avoids doing anything *expensive* until we've done some cheap initial tests. It may or may not help your test-case, but it's pretty simple and it matches the other git optimizations in this area (ie "conceptually handle the general case, but optimize the simple cases where we can exit early") Notice how this patch doesn' actually change the fundamental O(n^2) behaviour, but it makes it much cheaper by generally avoiding the expensive 'fnmatch' and 'strlen/strncmp' when they are obviously not needed. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | git clean: Don't automatically remove directories when run within subdirectoryShawn Bohrer2008-04-141-1/+1
|/ | | | | | | | | | | | | | | When git clean is run from a subdirectory it should follow the normal policy and only remove directories if they are passed in as a pathspec, or -d is specified. The fix is to send len which could be shorter than ent->len because we have stripped the trailing '/' that read_directory adds. Additionaly match_one() was modified to allow a name[] that is not NUL terminated. This allows us to check if the name matched the pathspec exactly instead of recursively. Signed-off-by: Shawn Bohrer <shawn.bohrer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Avoid unnecessary "if-before-free" tests.Jim Meyering2008-02-221-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change removes all obvious useless if-before-free tests. E.g., it replaces code like this: if (some_expression) free (some_expression); with the now-equivalent: free (some_expression); It is equivalent not just because POSIX has required free(NULL) to work for a long time, but simply because it has worked for so long that no reasonable porting target fails the test. Here's some evidence from nearly 1.5 years ago: http://www.winehq.org/pipermail/wine-patches/2006-October/031544.html FYI, the change below was prepared by running the following: git ls-files -z | xargs -0 \ perl -0x3b -pi -e \ 's/\bif\s*\(\s*(\S+?)(?:\s*!=\s*NULL)?\s*\)\s+(free\s*\(\s*\1\s*\))/$2/s' Note however, that it doesn't handle brace-enclosed blocks like "if (x) { free (x); }". But that's ok, since there were none like that in git sources. Beware: if you do use the above snippet, note that it can produce syntactically invalid C code. That happens when the affected "if"-statement has a matching "else". E.g., it would transform this if (x) free (x); else foo (); into this: free (x); else foo (); There were none of those here, either. If you're interested in automating detection of the useless tests, you might like the useless-if-before-free script in gnulib: [it *does* detect brace-enclosed free statements, and has a --name=S option to make it detect free-like functions with different names] http://git.sv.gnu.org/gitweb/?p=gnulib.git;a=blob;f=build-aux/useless-if-before-free Addendum: Remove one more (in imap-send.c), spotted by Jean-Luc Herren <jlh@gmx.ch>. Signed-off-by: Jim Meyering <meyering@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 'jc/gitignore-ends-with-slash'Junio C Hamano2008-02-161-11/+38
|\ | | | | | | | | | | * jc/gitignore-ends-with-slash: gitignore: lazily find dtype gitignore(5): Allow "foo/" in ignore list to match directory "foo"
| * gitignore: lazily find dtypeJunio C Hamano2008-02-051-9/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we process "foo/" entries in gitignore files on a system that does not have d_type member in "struct dirent", the earlier implementation ran lstat(2) separately when matching with entries that came from the command line, in-tree .gitignore files, and $GIT_DIR/info/excludes file. This optimizes it by delaying the lstat(2) call until it becomes absolutely necessary. The initial idea for this change was by Jeff King, but I optimized it further to pass pointers to around. Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * gitignore(5): Allow "foo/" in ignore list to match directory "foo"Junio C Hamano2008-02-051-11/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A pattern "foo/" in the exclude list did not match directory "foo", but a pattern "foo" did. This attempts to extend the exclude mechanism so that it would while not matching a regular file or a symbolic link "foo". In order to differentiate a directory and non directory, this passes down the type of path being checked to excluded() function. A downside is that the recursive directory walk may need to run lstat(2) more often on systems whose "struct dirent" do not give the type of the entry; earlier it did not have to do so for an excluded path, but we now need to figure out if a path is a directory before deciding to exclude it. This is especially bad because an idea similar to the earlier CE_UPTODATE optimization to reduce number of lstat(2) calls would by definition not apply to the codepaths involved, as (1) directories will not be registered in the index, and (2) excluded paths will not be in the index anyway. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | Create pathname-based hash-table lookup into indexLinus Torvalds2008-01-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This creates a hash index of every single file added to the index. Right now that hash index isn't actually used for much: I implemented a "cache_name_exists()" function that uses it to efficiently look up a filename in the index without having to do the O(logn) binary search, but quite frankly, that's not why this patch is interesting. No, the whole and only reason to create the hash of the filenames in the index is that by modifying the hash function, you can fairly easily do things like making it always hash equivalent names into the same bucket. That, in turn, means that suddenly questions like "does this name exist in the index under an _equivalent_ name?" becomes much much cheaper. Guiding principles behind this patch: - it shouldn't be too costly. In fact, my primary goal here was to actually speed up "git commit" with a fully populated kernel tree, by being faster at checking whether a file already existed in the index. I did succeed, but only barely: Best before: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.255s user 0m0.168s sys 0m0.088s Best after: [torvalds@woody linux]$ time ~/git/git commit > /dev/null real 0m0.233s user 0m0.144s sys 0m0.088s so some things are actually faster (~8%). Caveat: that's really the best case. Other things are invariably going to be slightly slower, since we populate that index cache, and quite frankly, few things really use it to look things up. That said, the cost is really quite small. The worst case is probably doing a "git ls-files", which will do very little except puopulate the index, and never actually looks anything up in it, just lists it. Before: [torvalds@woody linux]$ time git ls-files > /dev/null real 0m0.016s user 0m0.016s sys 0m0.000s After: [torvalds@woody linux]$ time ~/git/git ls-files > /dev/null real 0m0.021s user 0m0.012s sys 0m0.008s and while the thing has really gotten relatively much slower, we're still talking about something almost unmeasurable (eg 5ms). And that really should be pretty much the worst case. So we lose 5ms on one "benchmark", but win 22ms on another. Pick your poison - this patch has the advantage that it will _likely_ speed up the cases that are complex and expensive more than it slows down the cases that are already so fast that nobody cares. But if you look at relative speedups/slowdowns, it doesn't look so good. - It should be simple and clean The code may be a bit subtle (the reasons I do hash removal the way I do etc), but it re-uses the existing hash.c files, so it really is fairly small and straightforward apart from a few odd details. Now, this patch on its own doesn't really do much, but I think it's worth looking at, if only because if done correctly, the name hashing really can make an improvement to the whole issue of "do we have a filename that looks like this in the index already". And at least it gets real testing by being used even by default (ie there is a real use-case for it even without any insane filesystems). NOTE NOTE NOTE! The current hash is a joke. I'm ashamed of it, I'm just not ashamed of it enough to really care. I took all the numbers out of my nether regions - I'm sure it's good enough that it works in practice, but the whole point was that you can make a really much fancier hash that hashes characters not directly, but by their upper-case value or something like that, and thus you get a case-insensitive hash, while still keeping the name and the index itself totally case sensitive. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | Make on-disk index representation separate from in-core oneLinus Torvalds2008-01-211-1/+1
|/ | | | | | | | | | | | | | | | This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Fix a memory leak李鸿2007-12-161-0/+3
| | | | | Signed-off-by: Li Hong <leehong@pku.edu.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 'kh/commit'Junio C Hamano2007-12-041-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * kh/commit: (33 commits) git-commit --allow-empty git-commit: Allow to amend a merge commit that does not change the tree quote_path: fix collapsing of relative paths Make git status usage say git status instead of git commit Fix --signoff in builtin-commit differently. git-commit: clean up die messages Do not generate full commit log message if it is not going to be used Remove git-status from list of scripts as it is builtin Fix off-by-one error when truncating the diff out of the commit message. builtin-commit.c: export GIT_INDEX_FILE for launch_editor as well. Add a few more tests for git-commit builtin-commit: Include the diff in the commit message when verbose. builtin-commit: fix partial-commit support Fix add_files_to_cache() to take pathspec, not user specified list of files Export three helper functions from ls-files builtin-commit: run commit-msg hook with correct message file builtin-commit: do not color status output shown in the message template file_exists(): dangling symlinks do exist Replace "runstatus" with "status" in the tests t7501-commit: Add test for git commit <file> with dirty index. ...
| * file_exists(): dangling symlinks do existJunio C Hamano2007-11-221-4/+3
| | | | | | | | | | | | | | | | | | This function is used to see if a path given by the user does exist on the filesystem. A symbolic link that does not point anywhere does exist but running stat() on it would yield an error, and it incorrectly said it does not exist. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | per-directory-exclude: lazily read .gitignore filesJunio C Hamano2007-11-291-50/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Operations that walk directories or trees, which potentially need to consult the .gitignore files, used to always try to open the .gitignore file every time they entered a new directory, even when they ended up not needing to call excluded() function to see if a path in the directory is ignored. This was done by push/pop exclude_per_directory() functions that managed the data in a stack. This changes the directory walking API to remove the need to call these two functions. Instead, the directory walk data structure caches the data used by excluded() function the last time, and lazily reuses it as much as possible. Among the data the last check used, the ones from deeper directories that the path we are checking is outside are discarded, data from the common leading directories are reused, and then the directories between the common directory and the directory the path being checked is in are checked for .gitignore file. This is very similar to the way gitattributes are handled. This API change also fixes "ls-files -c -i", which called excluded() without setting up the gitignore data via the old push/pop functions. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | dir.c: minor clean-upJunio C Hamano2007-11-291-9/+4
|/ | | | | | | Replace handcrafted reallocation with ALLOC_GROW(). Reindent "file_exists()" helper function. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Fix per-directory exclude handing for "git add"Junio C Hamano2007-11-161-2/+4
| | | | | | | | | | | | | | | | | | | In "dir_struct", each exclusion element in the exclusion stack records a base string (pointer to the beginning with length) so that we can tell where it came from, but this pointer is just pointing at the parameter that is given by the caller to the push_exclude_per_directory() function. While read_directory_recursive() runs, calls to excluded() makes use the data in the exclusion elements, including this base string. The caller of read_directory_recursive() is not supposed to free the buffer it gave to push_exclude_per_directory() earlier, until it returns. The test case Bruce Stephens gave in the mailing list discussion was simplified and added to the t3700 test. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* core.excludesfile clean-upJunio C Hamano2007-11-141-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are inconsistencies in the way commands currently handle the core.excludesfile configuration variable. The problem is the variable is too new to be noticed by anything other than git-add and git-status. * git-ls-files does not notice any of the "ignore" files by default, as it predates the standardized set of ignore files. The calling scripts established the convention to use .git/info/exclude, .gitignore, and later core.excludesfile. * git-add and git-status know about it because they call add_excludes_from_file() directly with their own notion of which standard set of ignore files to use. This is just a stupid duplication of code that need to be updated every time the definition of the standard set of ignore files is changed. * git-read-tree takes --exclude-per-directory=<gitignore>, not because the flexibility was needed. Again, this was because the option predates the standardization of the ignore files. * git-merge-recursive uses hardcoded per-directory .gitignore and nothing else. git-clean (scripted version) does not honor core.* because its call to underlying ls-files does not know about it. git-clean in C (parked in 'pu') doesn't either. We probably could change git-ls-files to use the standard set when no excludes are specified on the command line and ignore processing was asked, or something like that, but that will be a change in semantics and might break people's scripts in a subtle way. I am somewhat reluctant to make such a change. On the other hand, I think it makes perfect sense to fix git-read-tree, git-merge-recursive and git-clean to follow the same rule as other commands. I do not think of a valid use case to give an exclude-per-directory that is nonstandard to read-tree command, outside a "negative" test in the t1004 test script. This patch is the first step to untangle this mess. The next step would be to teach read-tree, merge-recursive and clean (in C) to use setup_standard_excludes(). Signed-off-by: Junio C Hamano <gitster@pobox.com>