Diffstat (limited to 'doc/user/project/repository/reducing_the_repo_size_using_git.md')
-rw-r--r-- doc/user/project/repository/reducing_the_repo_size_using_git.md | 274
1 files changed, 184 insertions, 90 deletions
diff --git a/doc/user/project/repository/reducing_the_repo_size_using_git.md b/doc/user/project/repository/reducing_the_repo_size_using_git.md
index 16bffe5417d..124150c441a 100644
--- a/doc/user/project/repository/reducing_the_repo_size_using_git.md
+++ b/doc/user/project/repository/reducing_the_repo_size_using_git.md
@@ -1,150 +1,244 @@
---
+stage: Create
+group: Gitaly
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
type: howto
---
-# Reducing the repository size using Git
-
-A GitLab Enterprise Edition administrator can set a [repository size limit](../../admin_area/settings/account_and_limit_settings.md)
-which will prevent you from exceeding it.
-
-When a project has reached its size limit, you will not be able to push to it,
-create a new merge request, or merge existing ones. You will still be able to
-create new issues, and clone the project though. Uploading LFS objects will
-also be denied.
-
-If you exceed the repository size limit, your first thought might be to remove
-some data, make a new commit and push back to the repository. Perhaps you can
-move some blobs to LFS, or remove some old dependency updates from history.
-Unfortunately, it's not so easy and that workflow won't work. Deleting files in
-a commit doesn't actually reduce the size of the repo since the earlier commits
-and blobs are still around. What you need to do is rewrite history with Git's
-[`filter-branch` option](https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History#The-Nuclear-Option:-filter-branch),
-or an open source community-maintained tool like the
-[BFG](https://rtyley.github.io/bfg-repo-cleaner/).
-
-Note that even with that method, until `git gc` runs on the GitLab side, the
-"removed" commits and blobs will still be around. You also need to be able to
-push the rewritten history to GitLab, which may be impossible if you've already
-exceeded the maximum size limit.
+# Reduce repository size
-In order to lift these restrictions, the administrator of the GitLab instance
-needs to increase the limit on the particular project that exceeded it, so it's
-always better to spot that you're approaching the limit and act proactively to
-stay underneath it. If you hit the limit, and your admin can't - or won't -
-temporarily increase it for you, your only option is to prune all the unneeded
-stuff locally, and then create a new project on GitLab and start using that
-instead.
+Git repositories become larger over time. When large files are added to a Git repository:
-If you can continue to use the original project, we recommend [using
-BFG](#using-the-bfg-repo-cleaner), a tool that's built and
-maintained by the open source community. It's faster and simpler than
-`git filter-branch`, and GitLab can use its account of what has changed to clean
-up its own internal state, maximizing the space saved.
+- Fetching the repository becomes slower because everyone must download the files.
+- They take up a large amount of storage space on the server.
+- Git repository storage limits [can be reached](#storage-limits).
-CAUTION: **Caution:**
-Make sure to first make a copy of your repository since rewriting history will
-purge the files and information you are about to delete. Also make sure to
-inform any collaborators to not use `pull` after your changes, but use `rebase`.
+Rewriting a repository can remove unwanted history to make the repository smaller.
+[`git filter-repo`](https://github.com/newren/git-filter-repo) is a tool for quickly rewriting Git
+repository history, and is recommended over both:
-CAUTION: **Caution:**
-This process is not suitable for removing sensitive data like password or keys
-from your repository. Information about commits, including file content, is
-cached in the database, and will remain visible even after they have been
-removed from the repository.
+- [`git filter-branch`](https://git-scm.com/docs/git-filter-branch).
+- [BFG](https://rtyley.github.io/bfg-repo-cleaner/).
+
+DANGER: **Danger:**
+Rewriting repository history is a destructive operation. Make sure to back up your repository before
+you begin. The best way to back up a repository is to
+[export the project](../settings/import_export.md#exporting-a-project-and-its-data).
-## Using the BFG Repo-Cleaner
+## Purge files from repository history
-> [Introduced](https://gitlab.com/gitlab-org/gitlab-foss/issues/19376) in GitLab 11.6.
+To make cloning your project faster, rewrite branches and tags to remove unwanted files.
-1. [Install BFG](https://rtyley.github.io/bfg-repo-cleaner/) from its open source community repository.
+1. [Install `git filter-repo`](https://github.com/newren/git-filter-repo/blob/master/INSTALL.md)
+ using a supported package manager or from source.
-1. Navigate to your repository:
+1. Clone a fresh copy of the repository using `--bare`:
```shell
- cd my_repository/
+ git clone --bare https://example.gitlab.com/my/project.git
```
-1. Change to the branch you want to remove the big file from:
+1. Using `git filter-repo`, purge any files from the history of your repository.
+
+ To purge all large files, the `--strip-blobs-bigger-than` option can be used:
```shell
- git checkout master
+ git filter-repo --strip-blobs-bigger-than 10M
```
-1. Create a commit removing the large file from the branch, if it still exists:
+ To purge specific large files by path, the `--path` and `--invert-paths` options can be combined:
```shell
- git rm path/to/big_file.mpg
- git commit -m 'Remove unneeded large file'
+ git filter-repo --path path/to/big/file.m4v --invert-paths
```
-1. Rewrite history:
+ See the
+ [`git filter-repo` documentation](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#EXAMPLES)
+ for more examples and the complete documentation.
+
+1. Running `git filter-repo` removes all remotes. To restore the remote for your project, run:
```shell
- bfg --delete-files path/to/big_file.mpg
+ git remote add origin https://example.gitlab.com/<namespace>/<project_name>.git
```
- An object map file will be written to `object-id-map.old-new.txt`. Keep it
- around - you'll need it for the final step!
+1. Force push your changes to overwrite all branches on GitLab:
-1. Force-push the changes to GitLab:
+ ```shell
+ git push origin --force --all
+ ```
+
+ [Protected branches](../protected_branches.md) will cause this to fail. To proceed, you must
+ remove branch protection, push, and then re-enable protected branches.
+
+1. To remove large files from tagged releases, force push your changes to all tags on GitLab:
```shell
- git push --force-with-lease origin master
+ git push origin --force --tags
```
- If this step fails, someone has changed the `master` branch while you were
- rewriting history. You could restore the branch and re-run BFG to preserve
- their changes, or use `git push --force` to overwrite their changes.
+ [Protected tags](../protected_tags.md) will cause this to fail. To proceed, you must remove tag
+ protection, push, and then re-enable protected tags.
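Choosing a threshold for `--strip-blobs-bigger-than`, or a path for `--path`, is easier when you know which blobs are largest. The following is a minimal sketch and not part of `git filter-repo`: `list_largest_blobs` is a hypothetical helper name, and the sketch assumes `git`, `awk`, `sort`, and `head` are available. It prints the largest blobs reachable from any ref, with sizes in bytes.

```shell
# Hypothetical helper: list the N largest blobs reachable from any ref.
# Usage: list_largest_blobs <path-to-repo> [count]
list_largest_blobs() {
  repo="$1"
  count="${2:-10}"
  # rev-list prints "<object id> <path>" for every reachable object;
  # cat-file adds the type and size, and %(rest) carries the path through.
  git -C "$repo" rev-list --objects --all |
    git -C "$repo" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk '$1 == "blob" { sub(/^blob /, ""); print }' |
    sort -rn |
    head -n "$count"
}
```

For example, `list_largest_blobs project.git 20` could be run against the bare clone from the steps above before deciding what to purge.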
-1. Navigate to **Project > Settings > Repository > Repository Cleanup**:
+## Purge files from GitLab storage
- ![Repository settings cleanup form](img/repository_cleanup.png)
+To reduce the size of your repository in GitLab, you must remove GitLab internal references to
+commits that contain large files. Before completing these steps,
+[purge files from your repository history](#purge-files-from-repository-history).
- Upload the `object-id-map.old-new.txt` file and press **Start cleanup**.
- This will remove any internal Git references to the old commits, and run
- `git gc` against the repository. You will receive an email once it has
- completed.
+As well as [branches](branches/index.md) and tags, which are a type of Git ref, GitLab automatically
+creates other refs. These refs prevent dead links to commits, or missing diffs when viewing merge
+requests. [Repository cleanup](#repository-cleanup) can be used to remove these from GitLab.
-NOTE: **Note:**
-This process will remove some copies of the rewritten commits from GitLab's
-cache and database, but there are still numerous gaps in coverage - at present,
-some of the copies may persist indefinitely. [Clearing the instance cache](../../../administration/raketasks/maintenance.md#clear-redis-cache)
-may help to remove some of them, but it should not be depended on for security
-purposes!
+The following internal refs are not advertised:
-## Using `git filter-branch`
+- `refs/merge-requests/*` for merge requests.
+- `refs/pipelines/*` for
+ [pipelines](../../../ci/pipelines/index.md#troubleshooting-fatal-reference-is-not-a-tree).
+- `refs/environments/*` for environments.
-1. Navigate to your repository:
+This means they are not usually included when fetching, which makes fetching faster. In addition,
+`refs/keep-around/*` are hidden refs to prevent commits with discussion from being deleted and
+cannot be fetched at all.
- ```shell
- cd my_repository/
- ```
+However, these refs can be accessed from the Git bundle inside a project export.
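To check how many of these internal refs a repository actually holds, for example a mirror clone made from that bundle, a sketch like the following may help. `count_internal_refs` is a hypothetical helper name, and the namespaces are the ones listed above. It relies only on `git for-each-ref`.

```shell
# Hypothetical helper: count refs in each GitLab-internal namespace.
# Usage: count_internal_refs <path-to-repo>
count_internal_refs() {
  repo="$1"
  for ns in refs/merge-requests refs/pipelines refs/environments refs/keep-around; do
    # for-each-ref matches every ref under the given prefix.
    n=$(git -C "$repo" for-each-ref --format='%(refname)' "$ns" | wc -l | tr -d ' ')
    printf '%s %s\n' "$ns" "$n"
  done
}
```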
-1. Change to the branch you want to remove the big file from:
+1. [Install `git filter-repo`](https://github.com/newren/git-filter-repo/blob/master/INSTALL.md)
+ using a supported package manager or from source.
+
+1. Generate a fresh [export from the
+ project](../settings/import_export.md#exporting-a-project-and-its-data) and download it.
+
+1. Decompress the backup using `tar`:
```shell
- git checkout master
+ tar xzf project-backup.tar.gz
```
-1. Use `filter-branch` to remove the big file:
+ This will contain a `project.bundle` file, which was created by
+ [`git bundle`](https://git-scm.com/docs/git-bundle).
+
+1. Clone a fresh copy of the repository from the bundle:
```shell
- git filter-branch --force --tree-filter 'rm -f path/to/big_file.mpg' HEAD
+ git clone --bare --mirror /path/to/project.bundle
```
-1. Instruct Git to purge the unwanted data:
+1. Using `git filter-repo`, purge any files from the history of your repository. Because we are
+ trying to remove internal refs, we will rely on the `commit-map` produced by each run to tell us
+ which internal refs to remove.
+
+ NOTE: **Note:**
+ `git filter-repo` creates a new `commit-map` file every run, and overwrites the `commit-map` from
+ the previous run. You will need this file from **every** run. Do the next step every time you run
+ `git filter-repo`.
+
+ To purge all large files, the `--strip-blobs-bigger-than` option can be used:
```shell
- git reflog expire --expire=now --all && git gc --prune=now --aggressive
+ git filter-repo --strip-blobs-bigger-than 10M
```
-1. Lastly, force push to the repository:
+ To purge specific large files by path, the `--path` and `--invert-paths` options can be combined:
```shell
- git push --force origin master
+ git filter-repo --path path/to/big/file.m4v --invert-paths
```
-Your repository should now be below the size limit.
+ See the
+ [`git filter-repo` documentation](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#EXAMPLES)
+ for more examples and the complete documentation.
+
+1. Run a [repository cleanup](#repository-cleanup).
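Because each run overwrites the previous `commit-map`, it can help to copy the file aside after every run. The following is a minimal sketch, assuming `git filter-repo` writes the file to `filter-repo/commit-map` inside the Git directory; `save_commit_map` is a hypothetical helper name, and the destination directory is your choice.

```shell
# Hypothetical helper: copy the commit-map from the latest run to a unique file.
# Run it from inside the repository, after each `git filter-repo` run.
# Usage: save_commit_map <destination-directory>
save_commit_map() {
  dest_dir="$1"
  git_dir=$(git rev-parse --git-dir) || return 1
  # Assumed location of the map written by git filter-repo.
  src="$git_dir/filter-repo/commit-map"
  if [ ! -f "$src" ]; then
    echo "no commit-map found at $src" >&2
    return 1
  fi
  # mktemp gives each saved copy a unique name, so earlier copies are kept.
  dest=$(mktemp "$dest_dir/commit-map.XXXXXX") || return 1
  cp "$src" "$dest" && echo "$dest"
}
```

The helper prints the path of the saved copy, which is the file to upload during repository cleanup for that run.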
+
+## Repository cleanup
+
+> [Introduced](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/19376) in GitLab 11.6.
+
+Repository cleanup lets you upload a text file listing objects, and GitLab then removes internal Git
+references to these objects. You can use
+[`git filter-repo`](https://github.com/newren/git-filter-repo) to produce a list of objects (in a
+`commit-map` file) that can be used with repository cleanup.
+
+To clean up a repository:
+
+1. Go to the project for the repository.
+1. Navigate to **{settings}** **Settings > Repository**.
+1. Upload a list of objects. For example, a `commit-map` file.
+1. Click **Start cleanup**.
+
+This will:
+
+- Remove any internal Git references to old commits.
+- Run `git gc` against the repository.
+
+You will receive an email once it has completed.
+
+When using repository cleanup, note:
+
+- Housekeeping prunes loose objects older than 2 weeks. This means objects added in the last 2 weeks
+ will not be removed immediately. If you have access to the
+ [Gitaly](../../../administration/gitaly/index.md) server, you may run `git gc --prune=now` to
+ prune all loose objects immediately.
+- This process will remove some copies of the rewritten commits from GitLab's cache and database,
+ but there are still numerous gaps in coverage and some of the copies may persist indefinitely.
+ [Clearing the instance cache](../../../administration/raketasks/maintenance.md#clear-redis-cache)
+ may help to remove some of them, but it should not be depended on for security purposes!
+
+## Storage limits
+
+Repository size limits:
+
+- Can [be set by an administrator](../../admin_area/settings/account_and_limit_settings.md#repository-size-limit-starter-only)
+ on self-managed instances. **(STARTER ONLY)**
+- Are [set for GitLab.com](../../gitlab_com/index.md#repository-size-limit).
+
+When a project has reached its size limit, you cannot:
+
+- Push to the project.
+- Create a new merge request.
+- Merge existing merge requests.
+- Upload LFS objects.
+
+You can still:
+
+- Create new issues.
+- Clone the project.
+
+If you exceed the repository size limit, you might try to:
+
+1. Remove some data.
+1. Make a new commit.
+1. Push back to the repository.
+
+Perhaps you might also:
+
+- Move some blobs to LFS.
+- Remove some old dependency updates from history.
+
+Unfortunately, this workflow won't work. Deleting files in a commit doesn't actually reduce the size
+of the repository because the earlier commits and blobs still exist.
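One way to observe this locally is to compare the size of the object store before and after a change. The sketch below is an illustration only: `repo_size_kib` is a hypothetical helper name, and it sums the `size` and `size-pack` fields reported by `git count-objects -v`, both of which are in KiB. This is a local measurement and may differ from the size GitLab reports.

```shell
# Hypothetical helper: print the on-disk size of a repository's objects in KiB.
# Usage: repo_size_kib <path-to-repo>
repo_size_kib() {
  # "size" covers loose objects and "size-pack" covers packfiles;
  # the pattern deliberately skips "size-garbage".
  git -C "$1" count-objects -v |
    awk '/^size(-pack)?: / { total += $2 } END { print total + 0 }'
}
```

Running it before and after a commit that only deletes a large file should show no reduction, because the blob is still reachable from the earlier commit.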
+
+What you need to do is rewrite history. We recommend the open-source community-maintained tool
+[`git filter-repo`](https://github.com/newren/git-filter-repo).
+
+NOTE: **Note:**
+Until `git gc` runs on the GitLab side, the "removed" commits and blobs will still exist. You also
+must be able to push the rewritten history to GitLab, which may be impossible if you've already
+exceeded the maximum size limit.
+
+In order to lift these restrictions, the administrator of the self-managed GitLab instance must
+increase the limit on the particular project that exceeded it. Therefore, it's always better to
+proactively stay underneath the limit. If you hit the limit, and can't have it temporarily
+increased, your only option is to:
+
+1. Prune all the unneeded stuff locally.
+1. Create a new project on GitLab and start using that instead.
+
+CAUTION: **Caution:**
+This process is not suitable for removing sensitive data like passwords or keys from your repository.
+Information about commits, including file content, is cached in the database, and will remain
+visible even after it has been removed from the repository.
<!-- ## Troubleshooting