Diffstat (limited to 'doc/administration/repository_storage_types.md')
-rw-r--r-- | doc/administration/repository_storage_types.md | 266 |
1 files changed, 135 insertions, 131 deletions
diff --git a/doc/administration/repository_storage_types.md b/doc/administration/repository_storage_types.md
index a5c323be4ce..29e31fcb6ef 100644
--- a/doc/administration/repository_storage_types.md
+++ b/doc/administration/repository_storage_types.md
@@ -7,51 +7,53 @@ type: reference, howto
 
 # Repository storage types **(FREE SELF)**
 
-> - [Introduced](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/28283) in GitLab 10.0.
-> - Hashed storage became the default for new installations in GitLab 12.0
-> - Hashed storage is enabled by default for new and renamed projects in GitLab 13.0.
+GitLab can be configured to use one or multiple repository storages. These storages can be:
 
-GitLab can be configured to use one or multiple repository storage paths/shard
-locations that can be:
+- Accessed via [Gitaly](gitaly/index.md), optionally on
+  [its own server](gitaly/configure_gitaly.md#run-gitaly-on-its-own-server).
+- Mounted to the local disk. This [method](repository_storage_paths.md#configure-repository-storage-paths)
+  is deprecated and [scheduled to be removed](https://gitlab.com/groups/gitlab-org/-/epics/2320) in
+  GitLab 14.0.
+- Exposed as an NFS shared volume. This method is deprecated and
+  [scheduled to be removed](https://gitlab.com/groups/gitlab-org/-/epics/3371) in GitLab 14.0.
 
-- Mounted to the local disk
-- Exposed as an NFS shared volume
-- Accessed via [Gitaly](gitaly/index.md) on its own machine.
+In GitLab:
 
-In GitLab, this is configured in `/etc/gitlab/gitlab.rb` by the `git_data_dirs({})`
-configuration hash. The storage layouts discussed here apply to any shard
-defined in it.
+- Repository storages are configured in:
+  - `/etc/gitlab/gitlab.rb` by the `git_data_dirs({})` configuration hash for Omnibus GitLab
+    installations.
+  - `gitlab.yml` by the `repositories.storages` key for installations from source.
+- The `default` repository storage is available in any installations that haven't customized it. By
+  default, it points to a Gitaly node.
 
-The `default` repository shard that is available in any installations
-that haven't customized it, points to the local folder: `/var/opt/gitlab/git-data`.
-Anything discussed below is expected to be part of that folder.
+The repository storage types documented here apply to any repository storage defined in
+`git_data_dirs({})` or `repositories.storages`.
 
 ## Hashed storage
 
-NOTE:
-In GitLab 13.0, hashed storage is enabled by default and the legacy storage is
-deprecated. Support for legacy storage is scheduled to be removed in GitLab 14.0.
-If you haven't migrated yet, check the
-[migration instructions](raketasks/storage.md#migrate-to-hashed-storage).
-The option to choose between hashed and legacy storage in the admin area has
-been disabled.
-
-Hashed storage is the storage behavior we rolled out with 10.0. Instead
-of coupling project URL and the folder structure where the repository is
-stored on disk, we couple a hash based on the project's ID. This makes
-the folder structure immutable, and therefore eliminates any requirement to
-synchronize state from URLs to disk structure. This means that renaming a group,
-user, or project costs only the database transaction, and takes effect
-immediately.
-
-The hash also helps spread the repositories more evenly on the disk. The
-top-level directory contains fewer folders than the total number of top-level
-namespaces.
-
-The hash format is based on the hexadecimal representation of SHA256:
-`SHA256(project.id)`. The top-level folder uses the first 2 characters, followed
-by another folder with the next 2 characters. They are both stored in a special
-`@hashed` folder, to be able to co-exist with existing Legacy Storage projects:
+> - [Introduced](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/28283) in GitLab 10.0.
+> - Made the default for new installations in GitLab 12.0.
+> - Enabled by default for new and renamed projects in GitLab 13.0.
+
+Hashed storage stores projects on disk in a location based on a hash of the project's ID. Hashed
+storage is different to [legacy storage](#legacy-storage) where a project is stored based on:
+
+- The project's URL.
+- The folder structure where the repository is stored on disk.
+
+This makes the folder structure immutable and eliminates the need to synchronize state from URLs to
+disk structure. This means that renaming a group, user, or project:
+
+- Costs only the database transaction.
+- Takes effect immediately.
+
+The hash also helps spread the repositories more evenly on the disk. The top-level directory
+contains fewer folders than the total number of top-level namespaces.
+
+The hash format is based on the hexadecimal representation of a SHA256, calculated with
+`SHA256(project.id)`. The top-level folder uses the first two characters, followed by another folder
+with the next two characters. They are both stored in a special `@hashed` folder so they can
+co-exist with existing legacy storage projects. For example:
 
 ```ruby
 # Project's repository:
@@ -61,53 +63,59 @@ by another folder with the next 2 characters. They are both stored in a special
 "@hashed/#{hash[0..1]}/#{hash[2..3]}/#{hash}.wiki.git"
 ```
 
-### Translating hashed storage paths
+### Translate hashed storage paths
+
+Troubleshooting problems with the Git repositories, adding hooks, and other tasks requires you
+translate between the human-readable project name and the hashed storage path. You can translate:
 
-Troubleshooting problems with the Git repositories, adding hooks, and other
-tasks requires you translate between the human readable project name
-and the hashed storage path.
+- From a [project's name to its hashed path](#from-project-name-to-hashed-path).
+- From a [hashed path to a project's name](#from-hashed-path-to-project-name).
 
 #### From project name to hashed path
 
-The hashed path is shown on the project's page in the [admin area](../user/admin_area/index.md#administering-projects).
+Administrators can look up a project's hashed path from its name or ID using:
+
+- The [Admin area](../user/admin_area/index.md#administering-projects).
+- A Rails console.
+
+To look up a project's hash path in the Admin Area:
 
-To access the Projects page, go to **Admin Area > Overview > Projects** and then
-open up the page for the project.
+1. Go to the **Admin Area** (**{admin}**).
+1. Go to **Overview > Projects** and select the project.
 
-The "Gitaly relative path" is shown there, for example:
+The **Gitaly relative path** is displayed there and looks similar to:
 
 ```plaintext
 "@hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9.git"
 ```
 
-This is the path under `/var/opt/gitlab/git-data/repositories/` on a
-default Omnibus installation.
+To look up a project's hash path using a Rails console:
 
-In a [Rails console](operations/rails_console.md#starting-a-rails-console-session),
-get this information using either the numeric project ID or the full path:
+1. Start a [Rails console](operations/rails_console.md#starting-a-rails-console-session).
+1. Run a command similar to this example (use either the project's ID or its name):
 
-```ruby
-Project.find(16).disk_path
-Project.find_by_full_path('group/project').disk_path
-```
+   ```ruby
+   Project.find(16).disk_path
+   Project.find_by_full_path('group/project').disk_path
+   ```
 
 #### From hashed path to project name
 
-To translate from a hashed storage path to a project name:
+Administrators can look up a project's name from its hashed storage path using a Rails console. To
+look up a project's name from its hashed storage path:
 
 1. Start a [Rails console](operations/rails_console.md#starting-a-rails-console-session).
-1. Run the following:
+1. Run a command similar to this example:
 
-```ruby
-ProjectRepository.find_by(disk_path: '@hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9').project
-```
+   ```ruby
+   ProjectRepository.find_by(disk_path: '@hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9').project
+   ```
 
-The quoted string in that command is the directory tree you can find on your
-GitLab server. For example, on a default Omnibus installation this would be
-`/var/opt/gitlab/git-data/repositories/@hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9.git`
+The quoted string in that command is the directory tree you can find on your GitLab server. For
+example, on a default Omnibus installation this would be `/var/opt/gitlab/git-data/repositories/@hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9.git`
 with `.git` from the end of the directory name removed.
 
-The output includes the project ID and the project name:
+The output includes the project ID and the project name. For example:
 
 ```plaintext
 => #<Project id:16 it/supportteam/ticketsystem>
@@ -117,57 +125,61 @@ The output includes the project ID and the project name:
 
 > [Introduced](https://gitlab.com/gitlab-org/gitaly/-/issues/1606) in GitLab 12.1.
 
-WARNING:
-Do not run `git prune` or `git gc` in pool repositories! This can
-cause data loss in "real" repositories that depend on the pool in
-question.
+Object pools are repositories used to deduplicate forks of public and internal projects and
+contain the objects from the source project. Using `objects/info/alternates`, the source project and
+forks use the object pool for shared objects. For more information, see
+[How Git object deduplication works in GitLab](../development/git_object_deduplication.md).
 
-Forks of public and internal projects are deduplicated by creating a third repository, the
-object pool, containing the objects from the source project. Using
-`objects/info/alternates`, the source project and forks use the object pool for
-shared objects. Objects are moved from the source project to the object pool
-when housekeeping is run on the source project.
+Objects are moved from the source project to the object pool when housekeeping is run on the source
+project. Object pool repositories are stored similarly to regular repositories:
 
 ```ruby
 # object pool paths
 "@pools/#{hash[0..1]}/#{hash[2..3]}/#{hash}.git"
 ```
 
-### Hashed storage coverage migration
-
-Files stored in an S3-compatible endpoint do not have the downsides
-mentioned earlier, if they are not prefixed with `#{namespace}/#{project_name}`.
-This is true for CI Cache and LFS Objects.
-
-In the table below, you can find the coverage of the migration to the hashed storage.
-
-| Storable Object | Legacy storage | Hashed storage | S3 Compatible | GitLab Version |
-| --------------- | -------------- | -------------- | ------------- | -------------- |
-| Repository      | Yes            | Yes            | -             | 10.0           |
-| Attachments     | Yes            | Yes            | -             | 10.2           |
-| Avatars         | Yes            | No             | -             | -              |
-| Pages           | Yes            | No             | -             | -              |
-| Docker Registry | Yes            | No             | -             | -              |
-| CI Build Logs   | No             | No             | -             | -              |
-| CI Artifacts    | No             | No             | Yes           | 9.4 / 10.6     |
-| CI Cache        | No             | No             | Yes           | -              |
-| LFS Objects     | Yes            | Similar        | Yes           | 10.0 / 10.7    |
-| Repository pools| No             | Yes            | -             | 11.6           |
+WARNING:
+Do not run `git prune` or `git gc` in object pool repositories. This can cause data loss in the
+regular repositories that depend on the object pool.
+
+### Object storage support
+
+This table shows which storable objects are storable in each storage type:
+
+| Storable object  | Legacy storage | Hashed storage | S3 compatible | GitLab version |
+|:-----------------|:---------------|:---------------|:--------------|:---------------|
+| Repository       | Yes            | Yes            | -             | 10.0           |
+| Attachments      | Yes            | Yes            | -             | 10.2           |
+| Avatars          | Yes            | No             | -             | -              |
+| Pages            | Yes            | No             | -             | -              |
+| Docker Registry  | Yes            | No             | -             | -              |
+| CI/CD job logs   | No             | No             | -             | -              |
+| CI/CD artifacts  | No             | No             | Yes           | 9.4 / 10.6     |
+| CI/CD cache      | No             | No             | Yes           | -              |
+| LFS objects      | Yes            | Similar        | Yes           | 10.0 / 10.7    |
+| Repository pools | No             | Yes            | -             | 11.6           |
+
+Files stored in an S3-compatible endpoint can have the same advantages as
+[hashed storage](#hashed-storage), as long as they are not prefixed with
+`#{namespace}/#{project_name}`. This is true for CI/CD cache and LFS objects.
 
 #### Avatars
 
-Each file is stored in a folder with its `id` from the database. The filename is always `avatar.png` for user avatars.
-When avatar is replaced, `Upload` model is destroyed and a new one takes place with different `id`.
+Each file is stored in a directory that matches the `id` assigned to it in the database. The
+filename is always `avatar.png` for user avatars. When an avatar is replaced, the `Upload` model is
+destroyed and a new one takes place with a different `id`.
+
+#### CI/CD artifacts
 
-#### CI artifacts
+CI/CD artifacts are:
 
-CI Artifacts are S3 compatible since **9.4** (GitLab Premium), and available in GitLab Free since
-**10.6**.
+- S3-compatible since GitLab 9.4, initially available in [GitLab Premium](https://about.gitlab.com/pricing/).
+- Available in [GitLab Free](https://about.gitlab.com/pricing/) since GitLab 10.6.
 
 #### LFS objects
 
 [LFS Objects in GitLab](../topics/git/lfs/index.md) implement a similar
-storage pattern using 2 chars, 2 level folders, following Git's own implementation:
+storage pattern using two characters and two-level folders, following Git's own implementation:
 
 ```ruby
 "shared/lfs-objects/#{oid[0..1}/#{oid[2..3]}/#{oid[4..-1]}"
@@ -176,40 +188,32 @@ storage pattern using 2 chars, 2 level folders, following Git's own implementati
 "shared/lfs-objects/89/09/029eb962194cfb326259411b22ae3f4a814b5be4f80651735aeef9f3229c"
 ```
 
-LFS objects are also [S3 compatible](lfs/index.md#storing-lfs-objects-in-remote-object-storage).
+LFS objects are also [S3-compatible](lfs/index.md#storing-lfs-objects-in-remote-object-storage).
 
 ## Legacy storage
 
 WARNING:
-In GitLab 13.0, hashed storage is enabled by default and the legacy storage is
-deprecated. If you haven't migrated yet, check the
-[migration instructions](raketasks/storage.md#migrate-to-hashed-storage).
-Support for legacy storage is scheduled to be removed in GitLab 14.0. If you're on GitLab
-13.0 and later, switching new projects to legacy storage is not possible.
-The option to choose between hashed and legacy storage in the admin area has
-been disabled.
-
-Legacy storage is the storage behavior prior to version 10.0. For historical
-reasons, GitLab replicated the same mapping structure from the projects URLs:
-
-- Project's repository: `#{namespace}/#{project_name}.git`
-- Project's wiki: `#{namespace}/#{project_name}.wiki.git`
-
-This structure enables you to migrate from existing solutions to GitLab, and
-for Administrators to find where the repository is stored.
-
-This approach also has some drawbacks:
-
-Storage location concentrates a huge number of top-level namespaces. The
-impact can be reduced by the introduction of
-[multiple storage paths](repository_storage_paths.md).
-
-Because backups are a snapshot of the same URL mapping, if you try to recover a
-very old backup, you need to verify whether any project has taken the place of
-an old removed or renamed project sharing the same URL. This means that
-`mygroup/myproject` from your backup may not be the same original project that
-is at that same URL today.
-
-Any change in the URL needs to be reflected on disk (when groups / users or
-projects are renamed). This can add a lot of load in big installations,
-especially if using any type of network based file system.
+In GitLab 13.0, legacy storage is deprecated. If you haven't migrated to hashed storage yet, check
+the [migration instructions](raketasks/storage.md#migrate-to-hashed-storage). Support for legacy
+storage is [scheduled to be removed](https://gitlab.com/gitlab-org/gitaly/-/issues/1690) in GitLab
+14.0. In GitLab 13.0 and later, switching new projects to legacy storage is not possible. The
+option to choose between hashed and legacy storage in the Admin Area is disabled.
+
+Legacy storage was the storage behavior prior to version GitLab 10.0. For historical reasons,
+GitLab replicated the same mapping structure from the projects URLs:
+
+- Project's repository: `#{namespace}/#{project_name}.git`.
+- Project's wiki: `#{namespace}/#{project_name}.wiki.git`.
+
+This structure enabled you to migrate from existing solutions to GitLab, and for Administrators to
+find where the repository was stored. This approach also had some drawbacks:
+
+- Storage location concentrated a large number of top-level namespaces. The impact could be
+  reduced by [multiple repository storage paths](repository_storage_paths.md).
+- Because backups were a snapshot of the same URL mapping, if you tried to recover a very old
+  backup, you needed to verify whether any project had taken the place of an old removed or renamed
+  project sharing the same URL. This meant that `mygroup/myproject` from your backup may not have
+  been the same original project that was at that same URL today.
+- Any change in the URL needed to be reflected on disk (when groups, users, or projects were
+  renamed. This could add a lot of load in big installations, especially if using any type of
+  network-based file system.
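
As a reading aid for the hashed storage layout this diff documents, the following is a minimal sketch of how a relative hashed path could be derived outside of a Rails console. The helper name `hashed_disk_path` and its `suffix:` parameter are hypothetical and not part of GitLab, and the sketch assumes that `SHA256(project.id)` means hashing the ID's decimal string representation. On a real instance, `Project.find(id).disk_path` remains the authoritative source.

```ruby
require 'digest/sha2'

# Hypothetical helper (not a GitLab API): builds the relative hashed storage
# path for a project ID, following the "@hashed/<2 chars>/<2 chars>/<hash>" layout.
# Assumption: the ID is hashed as its decimal string representation.
def hashed_disk_path(project_id, suffix: '.git')
  hash = Digest::SHA256.hexdigest(project_id.to_s)
  "@hashed/#{hash[0..1]}/#{hash[2..3]}/#{hash}#{suffix}"
end

# Repository and wiki paths for the same (example) project ID.
puts hashed_disk_path(16)
puts hashed_disk_path(16, suffix: '.wiki.git')
```

Because the path depends only on the project ID, renaming the project or moving it to another namespace would leave the computed path unchanged, which is the property the hashed storage section describes.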