| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
Now that its easier to craft test cases (thanks to 'data <<')
we should start to verify fast-import works as expected.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
During testing its nice to not have to feed the length of a data
chunk to the 'data' command of fast-import. Instead we would
prefer to be able to establish a data chunk much like shell's <<
operator and use a line delimiter to denote the end of the input.
So now if a data command is started as 'data <<EOF' we will look
for a terminator line containing only the string EOF on that line.
Once found, we stop the data command. Everything between the two
lines is used as the data value.
The 'data <<' syntax is slower than 'data n', as we don't know how
many bytes to expect and instead must grow our buffer on the fly.
It also has the problem that the frontend must use a string which
will not appear on a line by itself in the input, and the data
region will always end in an LF. For these reasons real import
frontends are encouraged to continue to use _only_ 'data n'.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The --objects command line option is rather unnecessary. Internally
we allocate objects in 5000 unit blocks, ensuring that any sort
of malloc overhead is ammortized over the individual objects to
almost nothing. Since most frontends don't know how many objects
they will need for a given import run (and its hard for them to
predict without just doing the run) we probably won't see anyone
using --objects. Further since there's really no major benefit
to using the option, most frontends won't even bother supplying
it even if they could estimate the number of objects. So I'm
removing it.
The --max-objects-per-pack option was probably a mistake to even
have added in the first place. The packfile format is limited
to 4 GiB today; given that objects need at least 3 bytes of data
(and probably need even more) there's no way we are going to exceed
the limit of 1<<32-1 objects before we reach the file size limit.
So I'm removing it (to slightly reduce the complexity of the code)
before anyone gets any wise ideas and tries to use it.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
| |
Currently the pack .idx file format uses 32-bit unsigned integers
for the fan-out table and the object offsets. We had previously
defined these as 'unsigned int', but not every system will define
that type to be a 32 bit value. To ensure maximum portability we
should always use 'uint32_t'.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we were using 'unsigned int' to update the hdr_entries
field of the pack header after the file had been completed and
was being hashed. This may not be 32 bits on all platforms.
Instead we want to always uint32_t.
I'm actually cheating here by just using the pack_header like the
rest of Git and letting the struct definition declare the correct
type. Right now that field is still 'unsigned int' (wrong) but a
pending change submitted by Simon 'corecode' Schubert changes it
to uint32_t. After that change is merged in fast-import will do
the right thing all of the time.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Branches are only contained by a packfile if the branch actually
had its most recent commit in that packfile. So new branches are
set to MAX_PACK_ID to ensure they don't cause their commit to list
as part of the first packfile when it closes out if the commit was
actually in existance before fast-import started.
Also corrected the type of last_commit to be umaxint_t to prevent
overflow and wraparound on very large imports. Though that is
highly unlikely to occur as we're talking 4 billion commits, which
no real project has right now.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
| |
Apparently the git convention is to declare any function which
takes no arguments as taking void. I did not do this during the
early fast-import development, but should have.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
| |
The length of an atom string cannot be negative. So make it
explicit and declare it as an unsigned value.
The shift width in a mark table node also cannot be negative.
I'm also moving it to after the pointer arrays to prevent any
possible alignment problems on a 64 bit system.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
| |
Now that fast-import uses uintmax_t (the largest available unsigned
integer type) for marks we don't want to say its an unsigned 32
bit integer in ASCII base 10 notation. It could be much larger,
especially on 64 bit systems, and especially if a frontend uses
a very large number of marks (1 per file revision on a very, very
large import).
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To help callers repack very large repositories into a series of
packfiles fast-import now outputs the last commits/tags it wrote to
a packfile when it prints out the packfile name. This information
can be feed to pack-objects --revs to repack. For the first pack
of an initial import this is pretty easy (just feed those SHA1s on
stdin) but for subsequent packs you want to feed the subsequent
pack's final SHA1s but also all prior pack's SHA1s prefixed with
the negation operator. This way the prior pack's data does not
get included into the subsequent pack.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since object_count is limited to 'unsigned long' (really an
unsigned 32 bit integer value) by the pack file format we may as
well use exactly that type here in fast-import for that counter.
An earlier change by me incorrectly made it uintmax_t.
But since object_count is a counter for the current packfile only,
we don't want to output its value at the end. Instead we should
sum up the individual type counters and report that total, as that
will cover all of the packfiles.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
| |
Apparently amd64 has defined 'unsigned long' to be a 64 bit value,
which means -1 was way over the 4 GiB packfile limit. Whoops.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Much like the pack_sha1 the pack_fd is an unnecessary global
variable, we already have the fd stored in our struct packed_git
*pack_data so that the core library functions in sha1_file.c are
able to lookup and decompress object data that we have previously
written. Keeping an extra copy of this value in our own variable
is just a hold-over from earlier versions of fast-import and is
now completely unnecessary.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
| |
Because we are renaming the packfile into its file destination we
need to be sure its not open when the rename is called, otherwise
some operating systems (e.g. Windows) may prevent the rename from
occurring.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Because fast-import automatically updates all references (heads
and tags) at the end of its run the repository is corrupt unless
the objects are available in the .git/objects/pack directory prior
to the refs being modified. The easiest way to ensure that is true
is to move the packfile and its associated index directly into the
.git/objects/pack directory as soon as we have finished output to it.
But the only safe way to do this is to create the a temporary .keep
file for that pack, so we use the same tricks that index-pack uses
when its being invoked by receive-pack.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
| |
Rather than maintaing our own packfile level sha1 variable we
can make use of the one already available in struct packed_git.
Its meant for the SHA1 of the index but it can also hold the
SHA1 of the packfile itself between final checksumming of the
packfile and creation of the index.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
| |
Prior to git having read_in_full() fast-import used its own private
function yread to perform the header reading task. No sense in
keeping that around now that read_in_full is a public, stable
function.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a frontend wants to use a mark per file revision and per commit
and is doing a truly huge import (such as a 32 GiB SVN repository)
we may need more than 2**32 unique mark values, especially if the
frontend is unable (or unwilling) to recycle mark values. For mark
idnums we should use the largest unsigned integer type available,
hoping that will be at least 64 bits when we are compiled as a 64
bit executable. This way we may consume huge amounts of memory
storing our mark table, but we'll at least be able to process
the entire import without failing.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we previously were using a delta but we needed to checkpoint the
current packfile and switch to a new packfile we need to throw away
the delta and compress the raw object by itself, as delta chains
cannot span non-thin packfiles. Unfortunately the output buffer
in this case needs to grow, as the size of the compressed object
may be quite a bit larger than the size of the compressed delta.
I've also avoided recompressing the object if we are checkpointing
and we didn't use a delta. In this case the output buffer is the
correct size and has already been populated with the right data,
we just need to close out the current packfile and open a new one.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
| |
Caller scripts may want to know what packfiles the fast-import
process just wrote out for them. This is now output to stdout,
one packfile name per line, after we checkpoint each packfile.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the number of objects or number of bytes gets close to the limit
allowed by the packfile format (or configured on the command line by
our caller) we should automatically checkpoint the current packfile
and start a new one before writing the object out. This does however
require that we abandon the delta (if we had one) as its not valid
in a new packfile.
I also added the simple rule that if we got a delta back but the
delta itself is the same size as or larger than the uncompressed
object to ignore the delta and just store the object data. This
should avoid some really bad behavior caused by our current delta
strategy.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we are generating multiple packfiles at once we only need
to scan the blocks of object_entry structs which contain objects
for the current packfile. Because the most recent blocks are at
the front of the linked list, and because all new objects going
into the current file are allocated from the front of that list,
we can stop scanning for objects as soon as we identify one which
doesn't belong to the current packfile.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
| |
If the last packfile is going to be empty (has 0 objects) then it
shouldn't be kept after the import has terminated, as there is no
point to the packfile. So rather than hashing it and making the
index file, just delete the packfile.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
To help importers which are dealing with massive amounts of data
fast-import needs to be able to close the packfile it is currently
writing to and open a new packfile for any additional data that
will be received. A new 'checkpoint' command has been introduced
which can be used by the frontend import process to force this
to occur at any time. This may be useful to ensure a very long
running import doesn't lose any work due to unexpected failures.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
| |
There is little reason to be keeping a global duplicate_count
value when we also keep it per object type. The global counter can
easily be computed at the end, once all processing has completed.
This saves us a couple of machine instructions in an unimportant
part of code. But it looks slightly better to me to not keep
two counters around.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that we are starting to see some really large projects (such
as KDE or a fork of FreeBSD) get imported into Git we're running
into the upper limit on packfile object count as well as overall
byte length. The KDE and FreeBSD projects are both likely to
require more than 4 GiB to store their current history, which means
we really need multiple packfiles to handle their content.
This is a fairly simple restructuring of the internal code to help
us support creating multiple packfiles from within fast-import.
We are now adding a 5 digit incrementing suffix to the end of the
basename supplied to us by the caller, permitting up to 99,999
packs to be generated in a single fast-import run.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
| |
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that the sha1_file.c library routines use the sliding mmap
routines to perform efficient access to portions of a packfile
I can remove that code from fast-import.c and just invoke it.
One benefit is we now have reloading support for any packfile which
uses OBJ_OFS_DELTA. Another is we have significantly less code
to maintain.
This code reuse change *requires* that fast-import generate only
an OBJ_OFS_DELTA format packfile, as there is absolutely no index
available to perform OBJ_REF_DELTA lookup in while unpacking
an object. This is probably reasonable to require as the delta
offsets result in smaller packfiles and are faster to unpack,
as no index searching is required. Its also only a temporary
requirement as users could always repack without offsets before
making the import available to older versions of Git.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
I'm bringing master in early so that the OBJ_OFS_DELTA implementation
is available as part of the topic. This way git-fast-import can
learn about this new slightly smaller and faster packfile format,
and can generate them directly rather than needing to have them be
repacked with git-pack-objects.
Due to the API changes in master during the period of development
of git-fast-import, a few minor tweaks to fast-import.c are needed
to produce a working merge. I've done them here as part of the
merge to ensure bisection always works.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
| |
| |
| |
| | |
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
| |
| |
| |
| |
| | |
Signed-off-by: Quy Tonthat <qtonthat@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Some of the recent changes and shortcuts to the tests broke
things for people using older versions of svn:
t9104-git-svn-follow-parent.sh:
v1.2.3 (from SuSE 10.0 as reported by riddochc on #git
(thanks!)) required an extra 'svn up'. I was also able to
reproduce this with v1.1.4 (Debian Sarge).
lib-git-svn.sh:
SVN::Repos bindings in versions up to and including 1.1.4
(Sarge again) do not pass fs-config options to the underlying
library. BerkeleyDB repositories also seem completely broken
on all my Sarge machines; so not using FSFS does not seem to
be an option for most people.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Steven Grimm noticed that git-repack's verbosity is inconsistent
because pack-objects is chatty and prune-packed is not. This
makes the latter a bit more chatty and gives -q option to
squelch it.
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
| |
| |
| |
| |
| |
| | |
Pointed out by Paul Witt <paul.witt@oxix.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
While 'init-db' still is and probably will always remain a valid git
command for obvious backward compatibility reasons, it would be a good
idea to move shipped tools and docs to using 'init' instead.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Andy Parkins noticed that the error message some "whole tree"
oriented commands emit is stated misleadingly when they refused
to run from a subdirectory.
We could probably allow some of them to work from a subdirectory
but that is a semantic change that could have unintended side
effects, so let's start at first by rewording the error message
to be easier to read without doing anything else to be safe.
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
It is not available in the outermost merge, and it is only
useful for debugging merge-recursive in the inner merges.
Sergey Vlasov noticed that the old code accesses an
uninitialized location.
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The code previously checked it's own name and called 'die' upon
an error. However 'die' was not yet defined because git-sh-setup
had not been sourced yet. Instead simply write the error message
to stderr and exit with an error as was originally desired.
Signed-off-by: Bob Proulx <bob@proulx.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Despite what the documentation claims, git-commit does not check commit
for suspicious lines: all hooks are disabled by default,
and the pre-comit hook could be changed to do something else.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The "read_or_die()" function would silently NOT die for a partial read,
and since it was of type "void" it obviously couldn't even return the
partial number of bytes read.
IOW, it was totally broken. This hopefully fixes it up.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
With the new-and-improved write_in_full() semantics, where a partial write
simply always returns a real error (and always sets 'errno' when that
happens, including for the disk full case), a lot of the callers of
write_in_full() were just unnecessarily complex.
In particular, there's no reason to ever check for a zero length or
return: if the length was zero, we'll return zero, otherwise, if a disk
full resulted in the actual write() system call returning zero the
write_in_full() logic would have correctly turned that into a negative
return value, with 'errno' set to ENOSPC.
I really wish every "write_in_full()" user would just check against "<0"
now, but this fixes the nasty and stupid ones.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
| |
| |
| |
| |
| |
| |
| | |
When --stale-fix is not passed, the code did not initialize the
two commit objects properly.
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
| |
| |
| |
| | |
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
| |
| |
| |
| | |
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Recently git.git itself encountered a situation on its master and
next branches where git-describe stopped reporting 'v1.5.0-rc0-gN'
and instead started reporting 'v1.4.4.4-gN'. This appeared to be
a backward jump in version numbering.
maint o-------------------4
\ \
master o-o-o-o-o-o-o-5-o-C-o-W
The issue is that commit C in the diagram claims it is version
1.5.0, as the tag v1.5.0 is placed on commit 5. Yet commit W
claims it is version 1.4.4.4 as the tag v1.5.0 has an older tag
date than the v1.4.4.4 tag.
As it turns out this situation is very common. A bug fix applied
to maint and later merged into master occurs frequently enough that
it should Just Work Right(tm).
Rather than taking the first tag that gets found git-describe will
now generate a list of all possible tags and select the one which
has the most number of commits in common with HEAD (or whatever
revision the user requested the description of).
This rule is based on the principle shown in the diagram above.
There are a large number of commits on the primary development branch
'master' which do not appear in the 'maint' branch, and many of
these are already tagged as part of v1.5.0-rc0. Additionally these
commits are not in v1.4.4.4, as they are part of the v1.5.0 release
still being developed. The v1.5.0-rc0 tag is more descriptive of
W than v1.4.4.4 is, and therefore should be used.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
* jc/bare:
Disallow working directory commands in a bare repository.
git-fetch: allow updating the current branch in a bare repository.
Introduce is_bare_repository() and core.bare configuration variable
Move initialization of log_all_ref_updates
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If the user tries to run a porcelainish command which requires
a working directory in a bare repository they may get unexpected
results which are difficult to predict and may differ from command
to command.
Instead we should detect that the current repository is a bare
repository and refuse to run the command there, as there is no
working directory associated with it.
[jc: updated Shawn's original somewhat -- bugs are mine.]
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Sometimes, people have only fetch access into a bare repository
that is used as a back-up location (or a distribution point) but
does not have a push access for networking reasons, e.g. one end
being behind a firewall, and updating the "current branch" in
such a case is perfectly fine.
This allows such a fetch without --update-head-ok, which is a
flag that should never be used by end users otherwise.
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This removes the old is_bare_git_dir(const char *) to ask if a
directory, if it is a GIT_DIR, is a bare repository, and
replaces it with is_bare_repository(void *). The function looks
at core.bare configuration variable if exists but uses the old
heuristics: if it is ".git" or ends with "/.git", then it does
not look like a bare repository, otherwise it does.
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The patches to prevent Porcelainish that require working tree
from doing any damage in a bare repository make a lot of sense,
and I want to make the is_bare_git_dir() function more reliable.
In order to allow the repository owner override the heuristic
implemented in is_bare_git_dir() if/when it misidentifies a
particular repository, it would make sense to introduce a new
configuration variable "[core] bare = true/false", and make
is_bare_git_dir() notice it.
The scripts would do a 'repo-config --bool --get core.bare' and
iff the command fails (i.e. there is no such variable in the
configuration file), it would use the heuristic implemented at
the script level [*1*].
However, setup_git_env() which is called a lot earlier than we
even read from the repository configuration currently makes a
call to is_bare_git_dir(), in order to change the default
setting for log_all_ref_updates. It somehow feels that this is
a hack.
By the way, [*1*] is another thing I hate about the current
config mechanism. "git-repo-config --get" does not know what
the possible configuration variables are, let alone what the
default values for them are. It allows us not to maintain a
centralized configuration table, which makes it easy to
introduce ad-hoc variables and gives a warm fuzzy feeling of
being modular, but my feeling is that it is turning out to be a
rather high price to pay for scripts.
Signed-off-by: Junio C Hamano <junkio@cox.net>
|