# Git annex

The biggest limitation of git compared to some older centralized version control systems has been the maximum size of the repositories.
The general recommendation is to not have git repositories larger than 1GB to preserve performance.
Although GitLab has no limit (some repositories in GitLab are over 50GB!) we subscribe to the advice to keep repositories as small as you can.

Not being able to version control large binaries is a big problem for many larger organizations.
Video, photos, audio, compiled binaries and many other types of files are too large.
As a workaround, people keep artwork-in-progress in a Dropbox folder and only check in the final result.
This results in using outdated files, not having a complete history and the risk of losing work.

This problem is solved by integrating the awesome [git-annex](https://git-annex.branchable.com/).
Git-annex allows managing large binaries with git, without checking the contents into git.
You check in only a symlink that contains a checksum of the large binary (SHA-256 with git-annex's default backend).
If you need the large binary you can sync it from the GitLab server over rsync, a very fast file copying tool.
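
To see what is actually checked in, you can inspect the symlink yourself. The following is a minimal sketch, not part of GitLab or git-annex: the helper name `annex_key_of` is hypothetical, and it assumes the file is an annexed symlink whose target path ends in the git-annex key (backend name, file size and checksum).

```bash
# Hypothetical helper: print the git-annex key a symlink points at.
# Assumes the file is an annexed symlink; fails for regular files.
annex_key_of() {
  target="$(readlink "$1")" || return 1   # readlink fails if $1 is not a symlink
  basename "$target"                      # the last path component is the key
}

# Usage, inside a repository with an annexed file:
#   annex_key_of debian.iso
```

git-annex also ships a `git annex lookupkey <file>` command for the same purpose.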

<!-- more -->

## Using GitLab Annex

For example, if you want to upload a very large file and check it into your Git repository:

```bash
git clone git@gitlab.example.com:group/project.git
cd project                            # enter the cloned repository
git annex init 'My Laptop'            # initialize the annex project
cp ~/tmp/debian.iso ./                # copy a large file into the current directory
git annex add .                       # add the large file to git annex
git commit -am "Added Debian iso"     # commit the file metadata
git annex sync --content              # sync the git repo and large file to the GitLab server
```

Downloading a single large file is also very simple:

```bash
git clone git@gitlab.example.com:group/project.git
cd project                            # enter the cloned repository
git annex sync                        # sync git branches but not the large file
git annex get debian.iso              # download the large file
```

To download all files:

```bash
git clone git@gitlab.example.com:group/project.git
cd project                            # enter the cloned repository
git annex sync --content              # sync git branches and download all the large files
```

You don't have to set up git-annex on a separate server or add annex remotes to the repository.
Git-annex without GitLab gives everyone that can access the server access to the files of all projects.
GitLab Annex ensures you can only access files of projects you work on (developer, master or owner role).

## How it works

Internally GitLab uses [GitLab Shell](https://gitlab.com/gitlab-org/gitlab-shell) to handle ssh access and this was a great integration point for git-annex.
We've added a setting to GitLab Shell so you can disable GitLab Annex support if you don't want it.

You'll have to use SSH-style URLs for the git remote of your GitLab server instead of HTTPS-style URLs.
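
If a repository was cloned over HTTPS, its remote can be switched to the SSH form. The following is a minimal sketch, not something GitLab provides: the helper name `https_to_ssh_url` is hypothetical, and it assumes the common `https://host/group/project.git` layout, the default `git` SSH user, and no port or credentials in the URL.

```bash
# Hypothetical helper: convert https://host/path to git@host:path.
# URLs that do not start with http:// or https:// are printed unchanged.
https_to_ssh_url() {
  printf '%s\n' "$1" | sed -E 's#^https?://([^/]+)/(.+)$#git@\1:\2#'
}

# Usage, inside an existing clone:
#   git remote set-url origin "$(https_to_ssh_url "$(git remote get-url origin)")"
```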

## Troubleshooting tips

Differences between the `git-annex` version on the GitLab server and the one on your local machine can cause `git-annex` to raise unexpected warnings and errors.
Although there is no general guide for `git-annex` errors, there are a few tips on how to work around the warnings.

### git-annex-shell: Not a git-annex or gcrypt repository.

This warning can appear on the initial `git annex sync --content`. It is caused by differences in `git-annex-shell`; read more about it in [this git-annex issue](https://git-annex.branchable.com/forum/Error_from_git-annex-shell_on_creation_of_gcrypt_special_remote/).

The important thing to note is that the `sync` succeeds and the files are pushed to the GitLab repository. After this warning you need to run:

```
git config remote.origin.annex-ignore false
```

in the repository that was pushed.
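
To check the current value before changing it, the flag can be read back first. This is a small sketch under stated assumptions: the function name `reset_annex_ignore` is made up for illustration, and the remote is assumed to be called `origin` as in the examples above.

```bash
# Hypothetical helper: clear the annex-ignore flag on a remote if it is set.
# Run from inside the repository; the remote name defaults to "origin".
reset_annex_ignore() {
  remote="${1:-origin}"
  # An unset key makes `git config --get` exit non-zero, so tolerate that.
  current="$(git config --get "remote.${remote}.annex-ignore" || true)"
  if [ "$current" = "true" ]; then
    git config "remote.${remote}.annex-ignore" false
    echo "annex-ignore reset for ${remote}"
  else
    echo "annex-ignore is not enabled for ${remote}"
  fi
}

# Usage:
#   reset_annex_ignore origin
```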

Subsequent runs of `git annex sync --content` **should not** produce this warning, and the output should look like this:

```
commit  ok
pull origin
ok
pull origin
ok
push origin
```