author | GitLab Bot <gitlab-bot@gitlab.com> | 2020-06-18 11:18:50 +0000 |
---|---|---|
committer | GitLab Bot <gitlab-bot@gitlab.com> | 2020-06-18 11:18:50 +0000 |
commit | 8c7f4e9d5f36cff46365a7f8c4b9c21578c1e781 (patch) | |
tree | a77e7fe7a93de11213032ed4ab1f33a3db51b738 /doc/user/project/repository/reducing_the_repo_size_using_git.md | |
parent | 00b35af3db1abfe813a778f643dad221aad51fca (diff) | |
download | gitlab-ce-8c7f4e9d5f36cff46365a7f8c4b9c21578c1e781.tar.gz |
Add latest changes from gitlab-org/gitlab@13-1-stable-ee
Diffstat (limited to 'doc/user/project/repository/reducing_the_repo_size_using_git.md')
-rw-r--r-- | doc/user/project/repository/reducing_the_repo_size_using_git.md | 274 |
1 files changed, 184 insertions, 90 deletions
````diff
diff --git a/doc/user/project/repository/reducing_the_repo_size_using_git.md b/doc/user/project/repository/reducing_the_repo_size_using_git.md
index 16bffe5417d..124150c441a 100644
--- a/doc/user/project/repository/reducing_the_repo_size_using_git.md
+++ b/doc/user/project/repository/reducing_the_repo_size_using_git.md
@@ -1,150 +1,244 @@
 ---
+stage: Create
+group: Gitaly
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
 type: howto
 ---

-# Reducing the repository size using Git
-
-A GitLab Enterprise Edition administrator can set a [repository size limit](../../admin_area/settings/account_and_limit_settings.md)
-which will prevent you from exceeding it.
-
-When a project has reached its size limit, you will not be able to push to it,
-create a new merge request, or merge existing ones. You will still be able to
-create new issues, and clone the project though. Uploading LFS objects will
-also be denied.
-
-If you exceed the repository size limit, your first thought might be to remove
-some data, make a new commit and push back to the repository. Perhaps you can
-move some blobs to LFS, or remove some old dependency updates from history.
-Unfortunately, it's not so easy and that workflow won't work. Deleting files in
-a commit doesn't actually reduce the size of the repo since the earlier commits
-and blobs are still around. What you need to do is rewrite history with Git's
-[`filter-branch` option](https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History#The-Nuclear-Option:-filter-branch),
-or an open source community-maintained tool like the
-[BFG](https://rtyley.github.io/bfg-repo-cleaner/).
-
-Note that even with that method, until `git gc` runs on the GitLab side, the
-"removed" commits and blobs will still be around. You also need to be able to
-push the rewritten history to GitLab, which may be impossible if you've already
-exceeded the maximum size limit.
+# Reduce repository size

-In order to lift these restrictions, the administrator of the GitLab instance
-needs to increase the limit on the particular project that exceeded it, so it's
-always better to spot that you're approaching the limit and act proactively to
-stay underneath it. If you hit the limit, and your admin can't - or won't -
-temporarily increase it for you, your only option is to prune all the unneeded
-stuff locally, and then create a new project on GitLab and start using that
-instead.
+Git repositories become larger over time. When large files are added to a Git repository:

-If you can continue to use the original project, we recommend [using
-BFG](#using-the-bfg-repo-cleaner), a tool that's built and
-maintained by the open source community. It's faster and simpler than
-`git filter-branch`, and GitLab can use its account of what has changed to clean
-up its own internal state, maximizing the space saved.
+- Fetching the repository becomes slower because everyone must download the files.
+- They take up a large amount of storage space on the server.
+- Git repository storage limits [can be reached](#storage-limits).

-CAUTION: **Caution:**
-Make sure to first make a copy of your repository since rewriting history will
-purge the files and information you are about to delete. Also make sure to
-inform any collaborators to not use `pull` after your changes, but use `rebase`.
+Rewriting a repository can remove unwanted history to make the repository smaller.
+[`git filter-repo`](https://github.com/newren/git-filter-repo) is a tool for quickly rewriting Git
+repository history, and is recommended over both:

-CAUTION: **Caution:**
-This process is not suitable for removing sensitive data like password or keys
-from your repository. Information about commits, including file content, is
-cached in the database, and will remain visible even after they have been
-removed from the repository.
+- [`git filter-branch`](https://git-scm.com/docs/git-filter-branch).
+- [BFG](https://rtyley.github.io/bfg-repo-cleaner/).
+
+DANGER: **Danger:**
+Rewriting repository history is a destructive operation. Make sure to backup your repository before
+you begin. The best way back up a repository is to
+[export the project](../settings/import_export.md#exporting-a-project-and-its-data).

-## Using the BFG Repo-Cleaner
+## Purge files from repository history

-> [Introduced](https://gitlab.com/gitlab-org/gitlab-foss/issues/19376) in GitLab 11.6.
+To make cloning your project faster, rewrite branches and tags to remove unwanted files.

-1. [Install BFG](https://rtyley.github.io/bfg-repo-cleaner/) from its open source community repository.
+1. [Install `git filter-repo`](https://github.com/newren/git-filter-repo/blob/master/INSTALL.md)
+   using a supported package manager or from source.

-1. Navigate to your repository:
+1. Clone a fresh copy of the repository using `--bare`:

    ```shell
-   cd my_repository/
+   git clone --bare https://example.gitlab.com/my/project.git
    ```

-1. Change to the branch you want to remove the big file from:
+1. Using `git filter-repo`, purge any files from the history of your repository.
+
+   To purge all large files, the `--strip-blobs-bigger-than` option can be used:

    ```shell
-   git checkout master
+   git filter-repo --strip-blobs-bigger-than 10M
    ```

-1. Create a commit removing the large file from the branch, if it still exists:
+   To purge specific large files by path, the `--path` and `--invert-paths` options can be combined:

    ```shell
-   git rm path/to/big_file.mpg
-   git commit -m 'Remove unneeded large file'
+   git filter-repo --path path/to/big/file.m4v --invert-paths
    ```

-1. Rewrite history:
+   See the
+   [`git filter-repo` documentation](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#EXAMPLES)
+   for more examples and the complete documentation.
+
+1. Running `git filter-repo` removes all remotes. To restore the remote for your project, run:

    ```shell
-   bfg --delete-files path/to/big_file.mpg
+   git remote add origin https://example.gitlab.com/<namespace>/<project_name>.git
    ```

-   An object map file will be written to `object-id-map.old-new.txt`. Keep it
-   around - you'll need it for the final step!
+1. Force push your changes to overwrite all branches on GitLab:

-1. Force-push the changes to GitLab:
+   ```shell
+   git push origin --force --all
+   ```
+
+   [Protected branches](../protected_branches.md) will cause this to fail. To proceed, you must
+   remove branch protection, push, and then re-enable protected branches.
+
+1. To remove large files from tagged releases, force push your changes to all tags on GitLab:

    ```shell
-   git push --force-with-lease origin master
+   git push origin --force --tags
    ```

-   If this step fails, someone has changed the `master` branch while you were
-   rewriting history. You could restore the branch and re-run BFG to preserve
-   their changes, or use `git push --force` to overwrite their changes.
+   [Protected tags](../protected_tags.md) will cause this to fail. To proceed, you must remove tag
+   protection, push, and then re-enable protected tags.

-1. Navigate to **Project > Settings > Repository > Repository Cleanup**:
+## Purge files from GitLab storage

-   ![Repository settings cleanup form](img/repository_cleanup.png)
+To reduce the size of your repository in GitLab, you must remove GitLab internal references to
+commits that contain large files. Before completing these steps,
+[purge files from your repository history](#purge-files-from-repository-history).

-   Upload the `object-id-map.old-new.txt` file and press **Start cleanup**.
-   This will remove any internal Git references to the old commits, and run
-   `git gc` against the repository. You will receive an email once it has
-   completed.
+As well as [branches](branches/index.md) and tags, which are a type of Git ref, GitLab automatically
+creates other refs. These refs prevent dead links to commits, or missing diffs when viewing merge
+requests. [Repository cleanup](#repository-cleanup) can be used to remove these from GitLab.

-NOTE: **Note:**
-This process will remove some copies of the rewritten commits from GitLab's
-cache and database, but there are still numerous gaps in coverage - at present,
-some of the copies may persist indefinitely. [Clearing the instance cache](../../../administration/raketasks/maintenance.md#clear-redis-cache)
-may help to remove some of them, but it should not be depended on for security
-purposes!
+The following internal refs are not advertised:

-## Using `git filter-branch`
+- `refs/merge-requests/*` for merge requests.
+- `refs/pipelines/*` for
+  [pipelines](../../../ci/pipelines/index.md#troubleshooting-fatal-reference-is-not-a-tree).
+- `refs/environments/*` for environments.

-1. Navigate to your repository:
+This means they are not usually included when fetching, which makes fetching faster. In addition,
+`refs/keep-around/*` are hidden refs to prevent commits with discussion from being deleted and
+cannot be fetched at all.

-   ```shell
-   cd my_repository/
-   ```
+However, these refs can be accessed from the Git bundle inside a project export.

-1. Change to the branch you want to remove the big file from:
+1. [Install `git filter-repo`](https://github.com/newren/git-filter-repo/blob/master/INSTALL.md)
+   using a supported package manager or from source.
+
+1. Generate a fresh [export from the
+   project](../settings/import_export.html#exporting-a-project-and-its-data) and download it.
+
+1. Decompress the backup using `tar`:

    ```shell
-   git checkout master
+   tar xzf project-backup.tar.gz
    ```

-1. Use `filter-branch` to remove the big file:
+   This will contain a `project.bundle` file, which was created by
+   [`git bundle`](https://git-scm.com/docs/git-bundle).
+
+1. Clone a fresh copy of the repository from the bundle:

    ```shell
-   git filter-branch --force --tree-filter 'rm -f path/to/big_file.mpg' HEAD
+   git clone --bare --mirror /path/to/project.bundle
    ```

-1. Instruct Git to purge the unwanted data:
+1. Using `git filter-repo`, purge any files from the history of your repository. Because we are
+   trying to remove internal refs, we will rely on the `commit-map` produced by each run to tell us
+   which internal refs to remove.
+
+   NOTE:**Note:**
+   `git filter-repo` creates a new `commit-map` file every run, and overwrite the `commit-map` from
+   the previous run. You will need this file from **every** run. Do the next step every time you run
+   `git filter-repo`.
+
+   To purge all large files, the `--strip-blobs-bigger-than` option can be used:

    ```shell
-   git reflog expire --expire=now --all && git gc --prune=now --aggressive
+   git filter-repo --strip-blobs-bigger-than 10M
    ```

-1. Lastly, force push to the repository:
+   To purge specific large files by path, the `--path` and `--invert-paths` options can be combined.

    ```shell
-   git push --force origin master
+   git filter-repo --path path/to/big/file.m4v --invert-paths
    ```

-Your repository should now be below the size limit.
+   See the
+   [`git filter-repo` documentation](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#EXAMPLES)
+   for more examples and the complete documentation.
+
+1. Run a [repository cleanup](#repository-cleanup).
+
+## Repository cleanup
+
+> [Introduced](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/19376) in GitLab 11.6.
+
+Repository cleanup allows you to upload a text file of objects and GitLab will remove internal Git
+references to these objects. You can use
+[`git filter-repo`](https://github.com/newren/git-filter-repo) to produce a list of objects (in a
+`commit-map` file) that can be used with repository cleanup.
+
+To clean up a repository:
+
+1. Go to the project for the repository.
+1. Navigate to **{settings}** **Settings > Repository**.
+1. Upload a list of objects. For example, a `commit-map` file.
+1. Click **Start cleanup**.
+
+This will:
+
+- Remove any internal Git references to old commits.
+- Run `git gc` against the repository.
+
+You will receive an email once it has completed.
+
+When using repository cleanup, note:
+
+- Housekeeping prunes loose objects older than 2 weeks. This means objects added in the last 2 weeks
+  will not be removed immediately. If you have access to the
+  [Gitaly](../../../administration/gitaly/index.md) server, you may run `git gc --prune=now` to
+  prune all loose objects immediately.
+- This process will remove some copies of the rewritten commits from GitLab's cache and database,
+  but there are still numerous gaps in coverage and some of the copies may persist indefinitely.
+  [Clearing the instance cache](../../../administration/raketasks/maintenance.md#clear-redis-cache)
+  may help to remove some of them, but it should not be depended on for security purposes!
+
+## Storage limits
+
+Repository size limits:
+
+- Can [be set by an administrator](../../admin_area/settings/account_and_limit_settings.md#repository-size-limit-starter-only)
+  on self-managed instances. **(STARTER ONLY)**
+- Are [set for GitLab.com](../../gitlab_com/index.md#repository-size-limit).
+
+When a project has reached its size limit, you cannot:
+
+- Push to the project.
+- Create a new merge request.
+- Merge existing merge requests.
+- Upload LFS objects.
+
+You can still:
+
+- Create new issues.
+- Clone the project.
+
+If you exceed the repository size limit, you might try to:
+
+1. Remove some data.
+1. Make a new commit.
+1. Push back to the repository.
+
+Perhaps you might also:
+
+- Move some blobs to LFS.
+- Remove some old dependency updates from history.
+
+Unfortunately, this workflow won't work. Deleting files in a commit doesn't actually reduce the size
+of the repository because the earlier commits and blobs still exist.
+
+What you need to do is rewrite history. We recommend the open-source community-maintained tool
+[`git filter-repo`](https://github.com/newren/git-filter-repo).
+
+NOTE: **Note:**
+Until `git gc` runs on the GitLab side, the "removed" commits and blobs will still exist. You also
+must be able to push the rewritten history to GitLab, which may be impossible if you've already
+exceeded the maximum size limit.
+
+In order to lift these restrictions, the administrator of the self-managed GitLab instance must
+increase the limit on the particular project that exceeded it. Therefore, it's always better to
+proactively stay underneath the limit. If you hit the limit, and can't have it temporarily
+increased, your only option is to:
+
+1. Prune all the unneeded stuff locally.
+1. Create a new project on GitLab and start using that instead.
+
+CAUTION: **Caution:**
+This process is not suitable for removing sensitive data like password or keys from your repository.
+Information about commits, including file content, is cached in the database, and will remain
+visible even after they have been removed from the repository.

 <!-- ## Troubleshooting
````
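The new page recommends `git filter-repo --strip-blobs-bigger-than 10M` but does not show how to choose the threshold. The following is a minimal sketch, not part of this commit, that lists the blobs in a repository's full history at or above a given size, using only `git rev-list` and `git cat-file`. The function name `large_blobs` and the byte-valued argument are hypothetical, chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not from the commit): print "<size-in-bytes> <path>"
# for every blob reachable from any ref whose size is >= the threshold,
# largest first. Run it inside the repository you intend to rewrite.
large_blobs() {
  threshold="$1"
  # rev-list emits "<object-id> <path>"; cat-file resolves the type and size
  # of each object and passes the path through unchanged as %(rest).
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk -v min="$threshold" '
      $1 == "blob" && $2 + 0 >= min + 0 {
        # Keep the whole path, even if it contains spaces.
        print $2, substr($0, length($1) + length($2) + 3)
      }' |
    sort -rn
}
```

For example, `large_blobs 10485760` lists blobs of 10 MiB or more, which is a reasonable way to preview what a size-based purge would touch before running it on a fresh clone.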