diff options
Diffstat (limited to 'doc/administration/gitaly/index.md')
-rw-r--r-- | doc/administration/gitaly/index.md | 295 |
1 files changed, 250 insertions, 45 deletions
diff --git a/doc/administration/gitaly/index.md b/doc/administration/gitaly/index.md index 0af248e0573..bca83e903ac 100644 --- a/doc/administration/gitaly/index.md +++ b/doc/administration/gitaly/index.md @@ -30,8 +30,8 @@ repository storage is either: - A Gitaly storage with direct access to repositories using [storage paths](../repository_storage_paths.md), where each repository is stored on a single Gitaly node. All requests are routed to this node. -- A virtual storage provided by [Gitaly Cluster](#gitaly-cluster), where each repository can be - stored on multiple Gitaly nodes for fault tolerance. In a Gitaly Cluster: +- A [virtual storage](#virtual-storage) provided by [Gitaly Cluster](#gitaly-cluster), where each + repository can be stored on multiple Gitaly nodes for fault tolerance. In a Gitaly Cluster: - Read requests are distributed between multiple Gitaly nodes, which can improve performance. - Write requests are broadcast to repository replicas. @@ -39,32 +39,6 @@ WARNING: Engineering support for NFS for Git repositories is deprecated. Read the [deprecation notice](#nfs-deprecation-notice). -## Virtual storage - -Virtual storage makes it viable to have a single repository storage in GitLab to simplify repository -management. - -Virtual storage with Gitaly Cluster can usually replace direct Gitaly storage configurations. -However, this is at the expense of additional storage space needed to store each repository on multiple -Gitaly nodes. The benefit of using Gitaly Cluster virtual storage over direct Gitaly storage is: - -- Improved fault tolerance, because each Gitaly node has a copy of every repository. -- Improved resource utilization, reducing the need for over-provisioning for shard-specific peak - loads, because read loads are distributed across Gitaly nodes. -- Manual rebalancing for performance is not required, because read loads are distributed across - Gitaly nodes. -- Simpler management, because all Gitaly nodes are identical. - -The number of repository replicas can be configured using a -[replication factor](praefect.md#replication-factor). - -It can -be uneconomical to have the same replication factor for all repositories. -[Variable replication factor](https://gitlab.com/groups/gitlab-org/-/epics/3372) is planned to -provide greater flexibility for extremely large GitLab instances. - -As with normal Gitaly storages, virtual storages can be sharded. - ## Gitaly The following shows GitLab set up to use direct access to Gitaly: @@ -160,7 +134,7 @@ In this example: - Repositories are stored on a virtual storage called `storage-1`. - Three Gitaly nodes provide `storage-1` access: `gitaly-1`, `gitaly-2`, and `gitaly-3`. - The three Gitaly nodes share data in three separate hashed storage locations. -- The [replication factor](praefect.md#replication-factor) is `3`. There are three copies maintained +- The [replication factor](#replication-factor) is `3`. There are three copies maintained of each repository. The availability objectives for Gitaly clusters are: @@ -170,7 +144,7 @@ The availability objectives for Gitaly clusters are: Writes are replicated asynchronously. Any writes that have not been replicated to the newly promoted primary are lost. - [Strong consistency](praefect.md#strong-consistency) can be used to avoid loss in some + [Strong consistency](#strong-consistency) can be used to avoid loss in some circumstances. - **Recovery Time Objective (RTO):** Less than 10 seconds. @@ -178,20 +152,34 @@ The availability objectives for Gitaly clusters are: second. Failover requires ten consecutive failed health checks on each Praefect node. - [Faster outage detection](https://gitlab.com/gitlab-org/gitaly/-/issues/2608) - is planned to improve this to less than 1 second. + Faster outage detection, to improve this speed to less than 1 second, + is tracked [in this issue](https://gitlab.com/gitlab-org/gitaly/-/issues/2608). + +### Virtual storage + +Virtual storage makes it viable to have a single repository storage in GitLab to simplify repository +management. + +Virtual storage with Gitaly Cluster can usually replace direct Gitaly storage configurations. +However, this is at the expense of additional storage space needed to store each repository on multiple +Gitaly nodes. The benefit of using Gitaly Cluster virtual storage over direct Gitaly storage is: -Gitaly Cluster supports: +- Improved fault tolerance, because each Gitaly node has a copy of every repository. +- Improved resource utilization, reducing the need for over-provisioning for shard-specific peak + loads, because read loads are distributed across Gitaly nodes. +- Manual rebalancing for performance is not required, because read loads are distributed across + Gitaly nodes. +- Simpler management, because all Gitaly nodes are identical. -- [Strong consistency](praefect.md#strong-consistency) of the secondary replicas. -- [Automatic failover](praefect.md#automatic-failover-and-primary-election-strategies) from the primary to the secondary. -- Reporting of possible data loss if replication queue is non-empty. -- From GitLab 13.0 to GitLab 14.0, marking repositories as [read-only](praefect.md#read-only-mode) - if data loss is detected to prevent data inconsistencies. +The number of repository replicas can be configured using a +[replication factor](#replication-factor). + +It can +be uneconomical to have the same replication factor for all repositories. +To provide greater flexibility for extremely large GitLab instances, +variable replication factor is tracked in [this issue](https://gitlab.com/groups/gitlab-org/-/epics/3372). -Follow the [Gitaly Cluster epic](https://gitlab.com/groups/gitlab-org/-/epics/1489) -for improvements including -[horizontally distributing reads](https://gitlab.com/groups/gitlab-org/-/epics/2013). +As with normal Gitaly storages, virtual storages can be sharded. ### Moving beyond NFS @@ -220,7 +208,7 @@ Further reading: - Blog post: [The road to Gitaly v1.0 (aka, why GitLab doesn't require NFS for storing Git data anymore)](https://about.gitlab.com/blog/2018/09/12/the-road-to-gitaly-1-0/) - Blog post: [How we spent two weeks hunting an NFS bug in the Linux kernel](https://about.gitlab.com/blog/2018/11/14/how-we-spent-two-weeks-hunting-an-nfs-bug/) -### Components of Gitaly Cluster +### Components Gitaly Cluster consists of multiple components: @@ -240,10 +228,227 @@ component for running a Gitaly Cluster. For more information, see [Gitaly High Availability (HA) Design](https://gitlab.com/gitlab-org/gitaly/-/blob/master/doc/design_ha.md). +### Features + +Gitaly Cluster provides the following features: + +- [Distributed reads](#distributed-reads) among Gitaly nodes. +- [Strong consistency](#strong-consistency) of the secondary replicas. +- [Replication factor](#replication-factor) of repositories for increased redundancy. +- [Automatic failover](praefect.md#automatic-failover-and-primary-election-strategies) from the + primary Gitaly node to secondary Gitaly nodes. +- Reporting of possible [data loss](praefect.md#check-for-data-loss) if replication queue is + non-empty. + +Follow the [Gitaly Cluster epic](https://gitlab.com/groups/gitlab-org/-/epics/1489) for improvements +including [horizontally distributing reads](https://gitlab.com/groups/gitlab-org/-/epics/2013). + +#### Distributed reads + +> - Introduced in GitLab 13.1 in [beta](https://about.gitlab.com/handbook/product/gitlab-the-product/#alpha-beta-ga) with feature flag `gitaly_distributed_reads` set to disabled. +> - [Made generally available and enabled by default](https://gitlab.com/gitlab-org/gitaly/-/issues/2951) in GitLab 13.3. +> - [Disabled by default](https://gitlab.com/gitlab-org/gitaly/-/issues/3178) in GitLab 13.5. +> - [Enabled by default](https://gitlab.com/gitlab-org/gitaly/-/issues/3334) in GitLab 13.8. +> - [Feature flag removed](https://gitlab.com/gitlab-org/gitaly/-/issues/3383) in GitLab 13.11. + +Gitaly Cluster supports distribution of read operations across Gitaly nodes that are configured for +the [virtual storage](#virtual-storage). + +All RPCs marked with the `ACCESSOR` option are redirected to an up to date and healthy Gitaly node. +For example, [`GetBlob`](https://gitlab.com/gitlab-org/gitaly/-/blob/v12.10.6/proto/blob.proto#L16). + +_Up to date_ in this context means that: + +- There is no replication operations scheduled for this Gitaly node. +- The last replication operation is in _completed_ state. + +The primary node is chosen to serve the request if: + +- There are no up to date nodes. +- Any other error occurs during node selection. + +You can [monitor distribution of reads](#monitor-gitaly-cluster) using Prometheus. + +#### Strong consistency + +> - Introduced in GitLab 13.1 in [alpha](https://about.gitlab.com/handbook/product/gitlab-the-product/#alpha-beta-ga), disabled by default. +> - Entered [beta](https://about.gitlab.com/handbook/product/gitlab-the-product/#alpha-beta-ga) in GitLab 13.2, disabled by default. +> - In GitLab 13.3, disabled unless primary-wins voting strategy is disabled. +> - From GitLab 13.4, enabled by default. +> - From GitLab 13.5, you must use Git v2.28.0 or higher on Gitaly nodes to enable strong consistency. +> - From GitLab 13.6, primary-wins voting strategy and `gitaly_reference_transactions_primary_wins` feature flag were removed from the source code. + +By default, Gitaly Cluster guarantees eventual consistency by replicating all writes to secondary +Gitaly nodes after the write to the primary Gitaly node has happened. + +Praefect can instead provide strong consistency by creating a transaction and writing changes to all +Gitaly nodes at once. + +If enabled, transactions are only available for a subset of RPCs. For more information, see the +[strong consistency epic](https://gitlab.com/groups/gitlab-org/-/epics/1189). + +For configuration information, see [Configure strong consistency](praefect.md#configure-strong-consistency). + +#### Replication factor + +Replication factor is the number of copies Gitaly Cluster maintains of a given repository. A higher +replication factor: + +- Offers better redundancy and distribution of read workload. +- Results in higher storage cost. + +By default, Gitaly Cluster replicates repositories to every storage in a +[virtual storage](#virtual-storage). + +For configuration information, see [Configure replication factor](praefect.md#configure-replication-factor). + ### Configure Gitaly Cluster For more information on configuring Gitaly Cluster, see [Configure Gitaly Cluster](praefect.md). +### Migrate to Gitaly Cluster + +Whether migrating to Gitaly Cluster because of [NFS support deprecation](index.md#nfs-deprecation-notice) +or to move from single Gitaly nodes, the basic process involves: + +1. Create the required storage. Refer to + [repository storage recommendations](faq.md#what-are-some-repository-storage-recommendations). +1. Create and configure [Gitaly Cluster](praefect.md). +1. [Move the repositories](../operations/moving_repositories.md#move-repositories). To migrate to + Gitaly Cluster, existing repositories stored outside Gitaly Cluster must be moved. There is no + automatic migration but the moves can be scheduled with the GitLab API. + +## Monitor Gitaly and Gitaly Cluster + +You can use the available logs and [Prometheus metrics](../monitoring/prometheus/index.md) to +monitor Gitaly and Gitaly Cluster (Praefect). + +Metric definitions are available: + +- Directly from Prometheus `/metrics` endpoint configured for Gitaly. +- Using [Grafana Explore](https://grafana.com/docs/grafana/latest/explore/) on a + Grafana instance configured against Prometheus. + +### Monitor Gitaly + +You can observe the behavior of [queued requests](configure_gitaly.md#limit-rpc-concurrency) using +the Gitaly logs and Prometheus: + +- In the [Gitaly logs](../logs.md#gitaly-logs), look for the string (or structured log field) + `acquire_ms`. Messages that have this field are reporting about the concurrency limiter. +- In Prometheus, look for the following metrics: + - `gitaly_rate_limiting_in_progress`. + - `gitaly_rate_limiting_queued`. + - `gitaly_rate_limiting_seconds`. + + Although the name of the Prometheus metric contains `rate_limiting`, it's a concurrency limiter, + not a rate limiter. If a Gitaly client makes 1,000 requests in a row very quickly, concurrency + doesn't exceed 1, and the concurrency limiter has no effect. + +The following [pack-objects cache](configure_gitaly.md#pack-objects-cache) metrics are available: + +- `gitaly_pack_objects_cache_enabled`, a gauge set to `1` when the cache is enabled. Available + labels: `dir` and `max_age`. +- `gitaly_pack_objects_cache_lookups_total`, a counter for cache lookups. Available label: `result`. +- `gitaly_pack_objects_generated_bytes_total`, a counter for the number of bytes written into the + cache. +- `gitaly_pack_objects_served_bytes_total`, a counter for the number of bytes read from the cache. +- `gitaly_streamcache_filestore_disk_usage_bytes`, a gauge for the total size of cache files. + Available label: `dir`. +- `gitaly_streamcache_index_entries`, a gauge for the number of entries in the cache. Available + label: `dir`. + +Some of these metrics start with `gitaly_streamcache` because they are generated by the +`streamcache` internal library package in Gitaly. + +Example: + +```plaintext +gitaly_pack_objects_cache_enabled{dir="/var/opt/gitlab/git-data/repositories/+gitaly/PackObjectsCache",max_age="300"} 1 +gitaly_pack_objects_cache_lookups_total{result="hit"} 2 +gitaly_pack_objects_cache_lookups_total{result="miss"} 1 +gitaly_pack_objects_generated_bytes_total 2.618649e+07 +gitaly_pack_objects_served_bytes_total 7.855947e+07 +gitaly_streamcache_filestore_disk_usage_bytes{dir="/var/opt/gitlab/git-data/repositories/+gitaly/PackObjectsCache"} 2.6200152e+07 +gitaly_streamcache_filestore_removed_total{dir="/var/opt/gitlab/git-data/repositories/+gitaly/PackObjectsCache"} 1 +gitaly_streamcache_index_entries{dir="/var/opt/gitlab/git-data/repositories/+gitaly/PackObjectsCache"} 1 +``` + +#### Useful queries + +The following are useful queries for monitoring Gitaly: + +- Use the following Prometheus query to observe the + [type of connections](configure_gitaly.md#enable-tls-support) Gitaly is serving a production + environment: + + ```prometheus + sum(rate(gitaly_connections_total[5m])) by (type) + ``` + +- Use the following Prometheus query to monitor the + [authentication behavior](configure_gitaly.md#observe-type-of-gitaly-connections) of your GitLab + installation: + + ```prometheus + sum(rate(gitaly_authentications_total[5m])) by (enforced, status) + ``` + + In a system where authentication is configured correctly and where you have live traffic, you + see something like this: + + ```prometheus + {enforced="true",status="ok"} 4424.985419441742 + ``` + + There may also be other numbers with rate 0, but you only have to take note of the non-zero numbers. + + The only non-zero number should have `enforced="true",status="ok"`. If you have other non-zero + numbers, something is wrong in your configuration. + + The `status="ok"` number reflects your current request rate. In the example above, Gitaly is + handling about 4000 requests per second. + +- Use the following Prometheus query to observe the [Git protocol versions](../git_protocol.md) + being used in a production environment: + + ```prometheus + sum(rate(gitaly_git_protocol_requests_total[1m])) by (grpc_method,git_protocol,grpc_service) + ``` + +### Monitor Gitaly Cluster + +To monitor Gitaly Cluster (Praefect), you can use these Prometheus metrics: + +- `gitaly_praefect_read_distribution`, a counter to track [distribution of reads](#distributed-reads). + It has two labels: + + - `virtual_storage`. + - `storage`. + + They reflect configuration defined for this instance of Praefect. + +- `gitaly_praefect_replication_latency_bucket`, a histogram measuring the amount of time it takes + for replication to complete once the replication job starts. Available in GitLab 12.10 and later. +- `gitaly_praefect_replication_delay_bucket`, a histogram measuring how much time passes between + when the replication job is created and when it starts. Available in GitLab 12.10 and later. +- `gitaly_praefect_node_latency_bucket`, a histogram measuring the latency in Gitaly returning + health check information to Praefect. This indicates Praefect connection saturation. Available in + GitLab 12.10 and later. + +To monitor [strong consistency](#strong-consistency), you can use the following Prometheus metrics: + +- `gitaly_praefect_transactions_total`, the number of transactions created and voted on. +- `gitaly_praefect_subtransactions_per_transaction_total`, the number of times nodes cast a vote for + a single transaction. This can happen multiple times if multiple references are getting updated in + a single transaction. +- `gitaly_praefect_voters_per_transaction_total`: the number of Gitaly nodes taking part in a + transaction. +- `gitaly_praefect_transactions_delay_seconds`, the server-side delay introduced by waiting for the + transaction to be committed. +- `gitaly_hook_transaction_voting_delay_seconds`, the client-side delay introduced by waiting for + the transaction to be committed. + ## Do not bypass Gitaly GitLab doesn't advise directly accessing Gitaly repositories stored on disk with a Git client, @@ -253,8 +458,8 @@ your assumptions, resulting in performance degradation, instability, and even da - Gitaly has optimizations such as the [`info/refs` advertisement cache](https://gitlab.com/gitlab-org/gitaly/blob/master/doc/design_diskcache.md), that rely on Gitaly controlling and monitoring access to repositories by using the official gRPC interface. -- [Gitaly Cluster](praefect.md) has optimizations, such as fault tolerance and - [distributed reads](praefect.md#distributed-reads), that depend on the gRPC interface and database +- [Gitaly Cluster](#gitaly-cluster) has optimizations, such as fault tolerance and + [distributed reads](#distributed-reads), that depend on the gRPC interface and database to determine repository state. WARNING: @@ -367,7 +572,7 @@ Additional information: GitLab recommends: - Creating a [Gitaly Cluster](#gitaly-cluster) as soon as possible. -- [Moving your repositories](praefect.md#migrate-to-gitaly-cluster) from NFS-based storage to Gitaly +- [Moving your repositories](#migrate-to-gitaly-cluster) from NFS-based storage to Gitaly Cluster. We welcome your feedback on this process. You can: |