Diffstat (limited to 'doc/administration/gitaly')
-rw-r--r-- | doc/administration/gitaly/configure_gitaly.md | 105
-rw-r--r-- | doc/administration/gitaly/faq.md | 6
-rw-r--r-- | doc/administration/gitaly/index.md | 295
-rw-r--r-- | doc/administration/gitaly/praefect.md | 254
-rw-r--r-- | doc/administration/gitaly/reference.md | 2
-rw-r--r-- | doc/administration/gitaly/troubleshooting.md | 30
6 files changed, 331 insertions, 361 deletions
diff --git a/doc/administration/gitaly/configure_gitaly.md b/doc/administration/gitaly/configure_gitaly.md index 0b22df5a115..5e8cbac42c1 100644 --- a/doc/administration/gitaly/configure_gitaly.md +++ b/doc/administration/gitaly/configure_gitaly.md @@ -217,10 +217,7 @@ disable enforcement. For more information, see the documentation on configuring 1. Edit `/etc/gitlab/gitlab.rb`: - <!-- - updates to following example must also be made at - https://gitlab.com/gitlab-org/charts/gitlab/blob/master/doc/advanced/external-gitaly/external-omnibus-gitaly.md#configure-omnibus-gitlab - --> + <!-- Updates to following example must also be made at https://gitlab.com/gitlab-org/charts/gitlab/blob/master/doc/advanced/external-gitaly/external-omnibus-gitaly.md#configure-omnibus-gitlab --> ```ruby # Avoid running unnecessary services on the Gitaly server @@ -267,10 +264,7 @@ disable enforcement. For more information, see the documentation on configuring 1. Append the following to `/etc/gitlab/gitlab.rb` for each respective Gitaly server: - <!-- - updates to following example must also be made at - https://gitlab.com/gitlab-org/charts/gitlab/blob/master/doc/advanced/external-gitaly/external-omnibus-gitaly.md#configure-omnibus-gitlab - --> + <!-- Updates to following example must also be made at https://gitlab.com/gitlab-org/charts/gitlab/blob/master/doc/advanced/external-gitaly/external-omnibus-gitaly.md#configure-omnibus-gitlab --> On `gitaly1.internal`: @@ -595,10 +589,7 @@ To configure Gitaly with TLS: 1. 
Edit `/etc/gitlab/gitlab.rb` and add: - <!-- - updates to following example must also be made at - https://gitlab.com/gitlab-org/charts/gitlab/blob/master/doc/advanced/external-gitaly/external-omnibus-gitaly.md#configure-omnibus-gitlab - --> + <!-- Updates to following example must also be made at https://gitlab.com/gitlab-org/charts/gitlab/blob/master/doc/advanced/external-gitaly/external-omnibus-gitaly.md#configure-omnibus-gitlab --> ```ruby gitaly['tls_listen_addr'] = "0.0.0.0:9999" @@ -693,12 +684,8 @@ To configure Gitaly with TLS: ### Observe type of Gitaly connections -[Prometheus](../monitoring/prometheus/index.md) can be used observe what type of connections Gitaly -is serving a production environment. Use the following Prometheus query: - -```prometheus -sum(rate(gitaly_connections_total[5m])) by (type) -``` +For information on observing the type of Gitaly connections being served, see the +[relevant documentation](index.md#useful-queries). ## `gitaly-ruby` @@ -790,20 +777,8 @@ repository. In the example above: - If another request comes in for a repository that has used up its 20 slots, that request gets queued. -You can observe the behavior of this queue using the Gitaly logs and Prometheus: - -- In the Gitaly logs, look for the string (or structured log field) `acquire_ms`. Messages that have - this field are reporting about the concurrency limiter. -- In Prometheus, look for the following metrics: - - - `gitaly_rate_limiting_in_progress`. - - `gitaly_rate_limiting_queued`. - - `gitaly_rate_limiting_seconds`. - -NOTE: -Although the name of the Prometheus metric contains `rate_limiting`, it's a concurrency limiter, not -a rate limiter. If a Gitaly client makes 1,000 requests in a row very quickly, concurrency doesn't -exceed 1, and the concurrency limiter has no effect. +You can observe the behavior of this queue using the Gitaly logs and Prometheus. For more +information, see the [relevant documentation](index.md#monitor-gitaly). 
## Background Repository Optimization @@ -857,30 +832,11 @@ server" and "Gitaly client" refers to the same machine. ### Verify authentication monitoring -Before rotating a Gitaly authentication token, verify that you can monitor the authentication -behavior of your GitLab installation using Prometheus. Use the following Prometheus query: +Before rotating a Gitaly authentication token, verify that you can +[monitor the authentication behavior](index.md#useful-queries) of your GitLab installation using +Prometheus. -```prometheus -sum(rate(gitaly_authentications_total[5m])) by (enforced, status) -``` - -In a system where authentication is configured correctly and where you have live traffic, you -see something like this: - -```prometheus -{enforced="true",status="ok"} 4424.985419441742 -``` - -There may also be other numbers with rate 0. We care only about the non-zero numbers. - -The only non-zero number should have `enforced="true",status="ok"`. If you have other non-zero -numbers, something is wrong in your configuration. - -The `status="ok"` number reflects your current request rate. In the example above, Gitaly is -handling about 4000 requests per second. - -Now that you have established that you can monitor the Gitaly authentication behavior of your GitLab -installation, you can begin the rest of the procedure. +You can then continue the rest of the procedure. ### Enable "auth transitioning" mode @@ -955,7 +911,7 @@ result as you did at the start. For example: {enforced="true",status="ok"} 4424.985419441742 ``` -Note that `enforced="true"` means that authentication is being enforced. +`enforced="true"` means that authentication is being enforced. ## Pack-objects cache **(FREE SELF)** @@ -1079,7 +1035,7 @@ cache hit and the average amount of storage used by cache files. Entries older than `max_age` get evicted from the in-memory metadata store, and deleted from disk. 
-Note that eviction does not interfere with ongoing requests, so it is OK +Eviction does not interfere with ongoing requests, so it is OK for `max_age` to be less than the time it takes to do a fetch over a slow connection. This is because Unix filesystems do not truly delete a file until all processes that are reading the deleted file have @@ -1087,9 +1043,8 @@ closed it. ### Observe the cache -The cache can be observed in logs and using metrics. - -#### Logs +The cache can be observed [using metrics](index.md#monitor-gitaly) and in the following logged +information: |Message|Fields|Description| |:---|:---|:---| @@ -1149,33 +1104,3 @@ Example: "time":"2021-03-25T14:57:53.543Z" } ``` - -#### Metrics - -The following cache metrics are available. - -|Metric|Type|Labels|Description| -|:---|:---|:---|:---| -|`gitaly_pack_objects_cache_enabled`|gauge|`dir`,`max_age`|Set to `1` when the cache is enabled via the Gitaly configuration file| -|`gitaly_pack_objects_cache_lookups_total`|counter|`result`|Hit/miss counter for cache lookups| -|`gitaly_pack_objects_generated_bytes_total`|counter||Number of bytes written into the cache| -|`gitaly_pack_objects_served_bytes_total`|counter||Number of bytes read from the cache| -|`gitaly_streamcache_filestore_disk_usage_bytes`|gauge|`dir`|Total size of cache files| -|`gitaly_streamcache_index_entries`|gauge|`dir`|Number of entries in the cache| - -Some of these metrics start with `gitaly_streamcache` -because they are generated by the "streamcache" internal library -package in Gitaly. 
- -Example: - -```plaintext -gitaly_pack_objects_cache_enabled{dir="/var/opt/gitlab/git-data/repositories/+gitaly/PackObjectsCache",max_age="300"} 1 -gitaly_pack_objects_cache_lookups_total{result="hit"} 2 -gitaly_pack_objects_cache_lookups_total{result="miss"} 1 -gitaly_pack_objects_generated_bytes_total 2.618649e+07 -gitaly_pack_objects_served_bytes_total 7.855947e+07 -gitaly_streamcache_filestore_disk_usage_bytes{dir="/var/opt/gitlab/git-data/repositories/+gitaly/PackObjectsCache"} 2.6200152e+07 -gitaly_streamcache_filestore_removed_total{dir="/var/opt/gitlab/git-data/repositories/+gitaly/PackObjectsCache"} 1 -gitaly_streamcache_index_entries{dir="/var/opt/gitlab/git-data/repositories/+gitaly/PackObjectsCache"} 1 -``` diff --git a/doc/administration/gitaly/faq.md b/doc/administration/gitaly/faq.md index a5964b7a2eb..c7ecaa020e0 100644 --- a/doc/administration/gitaly/faq.md +++ b/doc/administration/gitaly/faq.md @@ -25,7 +25,7 @@ The following table outlines the major differences between Gitaly Cluster and Ge | Tool | Nodes | Locations | Latency tolerance | Failover | Consistency | Provides redundancy for | |:---------------|:---------|:----------|:-------------------|:----------------------------------------------------------------------------|:-----------------------------------------|:------------------------| -| Gitaly Cluster | Multiple | Single | Approximately 1 ms | [Automatic](praefect.md#automatic-failover-and-primary-election-strategies) | [Strong](praefect.md#strong-consistency) | Data storage in Git | +| Gitaly Cluster | Multiple | Single | Approximately 1 ms | [Automatic](praefect.md#automatic-failover-and-primary-election-strategies) | [Strong](index.md#strong-consistency) | Data storage in Git | | Geo | Multiple | Multiple | Up to one minute | [Manual](../geo/disaster_recovery/index.md) | Eventual | Entire GitLab instance | For more information, see: @@ -35,12 +35,12 @@ For more information, see: ## Are there instructions for migrating to Gitaly 
Cluster? -Yes! For more information, see [Migrate to Gitaly Cluster](praefect.md#migrate-to-gitaly-cluster). +Yes! For more information, see [Migrate to Gitaly Cluster](index.md#migrate-to-gitaly-cluster). ## What are some repository storage recommendations? The size of the required storage can vary between instances and depends on the set -[replication factor](praefect.md#replication-factor). You might want to include implementing +[replication factor](index.md#replication-factor). You might want to include implementing repository storage redundancy. For a replication factor: diff --git a/doc/administration/gitaly/index.md b/doc/administration/gitaly/index.md index 0af248e0573..bca83e903ac 100644 --- a/doc/administration/gitaly/index.md +++ b/doc/administration/gitaly/index.md @@ -30,8 +30,8 @@ repository storage is either: - A Gitaly storage with direct access to repositories using [storage paths](../repository_storage_paths.md), where each repository is stored on a single Gitaly node. All requests are routed to this node. -- A virtual storage provided by [Gitaly Cluster](#gitaly-cluster), where each repository can be - stored on multiple Gitaly nodes for fault tolerance. In a Gitaly Cluster: +- A [virtual storage](#virtual-storage) provided by [Gitaly Cluster](#gitaly-cluster), where each + repository can be stored on multiple Gitaly nodes for fault tolerance. In a Gitaly Cluster: - Read requests are distributed between multiple Gitaly nodes, which can improve performance. - Write requests are broadcast to repository replicas. @@ -39,32 +39,6 @@ WARNING: Engineering support for NFS for Git repositories is deprecated. Read the [deprecation notice](#nfs-deprecation-notice). -## Virtual storage - -Virtual storage makes it viable to have a single repository storage in GitLab to simplify repository -management. - -Virtual storage with Gitaly Cluster can usually replace direct Gitaly storage configurations. 
-However, this is at the expense of additional storage space needed to store each repository on multiple -Gitaly nodes. The benefit of using Gitaly Cluster virtual storage over direct Gitaly storage is: - -- Improved fault tolerance, because each Gitaly node has a copy of every repository. -- Improved resource utilization, reducing the need for over-provisioning for shard-specific peak - loads, because read loads are distributed across Gitaly nodes. -- Manual rebalancing for performance is not required, because read loads are distributed across - Gitaly nodes. -- Simpler management, because all Gitaly nodes are identical. - -The number of repository replicas can be configured using a -[replication factor](praefect.md#replication-factor). - -It can -be uneconomical to have the same replication factor for all repositories. -[Variable replication factor](https://gitlab.com/groups/gitlab-org/-/epics/3372) is planned to -provide greater flexibility for extremely large GitLab instances. - -As with normal Gitaly storages, virtual storages can be sharded. - ## Gitaly The following shows GitLab set up to use direct access to Gitaly: @@ -160,7 +134,7 @@ In this example: - Repositories are stored on a virtual storage called `storage-1`. - Three Gitaly nodes provide `storage-1` access: `gitaly-1`, `gitaly-2`, and `gitaly-3`. - The three Gitaly nodes share data in three separate hashed storage locations. -- The [replication factor](praefect.md#replication-factor) is `3`. There are three copies maintained +- The [replication factor](#replication-factor) is `3`. There are three copies maintained of each repository. The availability objectives for Gitaly clusters are: @@ -170,7 +144,7 @@ The availability objectives for Gitaly clusters are: Writes are replicated asynchronously. Any writes that have not been replicated to the newly promoted primary are lost. 
- [Strong consistency](praefect.md#strong-consistency) can be used to avoid loss in some + [Strong consistency](#strong-consistency) can be used to avoid loss in some circumstances. - **Recovery Time Objective (RTO):** Less than 10 seconds. @@ -178,20 +152,34 @@ The availability objectives for Gitaly clusters are: second. Failover requires ten consecutive failed health checks on each Praefect node. - [Faster outage detection](https://gitlab.com/gitlab-org/gitaly/-/issues/2608) - is planned to improve this to less than 1 second. + Faster outage detection, to improve this speed to less than 1 second, + is tracked [in this issue](https://gitlab.com/gitlab-org/gitaly/-/issues/2608). + +### Virtual storage + +Virtual storage makes it viable to have a single repository storage in GitLab to simplify repository +management. + +Virtual storage with Gitaly Cluster can usually replace direct Gitaly storage configurations. +However, this is at the expense of additional storage space needed to store each repository on multiple +Gitaly nodes. The benefit of using Gitaly Cluster virtual storage over direct Gitaly storage is: -Gitaly Cluster supports: +- Improved fault tolerance, because each Gitaly node has a copy of every repository. +- Improved resource utilization, reducing the need for over-provisioning for shard-specific peak + loads, because read loads are distributed across Gitaly nodes. +- Manual rebalancing for performance is not required, because read loads are distributed across + Gitaly nodes. +- Simpler management, because all Gitaly nodes are identical. -- [Strong consistency](praefect.md#strong-consistency) of the secondary replicas. -- [Automatic failover](praefect.md#automatic-failover-and-primary-election-strategies) from the primary to the secondary. -- Reporting of possible data loss if replication queue is non-empty. 
-- From GitLab 13.0 to GitLab 14.0, marking repositories as [read-only](praefect.md#read-only-mode) - if data loss is detected to prevent data inconsistencies. +The number of repository replicas can be configured using a +[replication factor](#replication-factor). + +It can +be uneconomical to have the same replication factor for all repositories. +To provide greater flexibility for extremely large GitLab instances, +variable replication factor is tracked in [this issue](https://gitlab.com/groups/gitlab-org/-/epics/3372). -Follow the [Gitaly Cluster epic](https://gitlab.com/groups/gitlab-org/-/epics/1489) -for improvements including -[horizontally distributing reads](https://gitlab.com/groups/gitlab-org/-/epics/2013). +As with normal Gitaly storages, virtual storages can be sharded. ### Moving beyond NFS @@ -220,7 +208,7 @@ Further reading: - Blog post: [The road to Gitaly v1.0 (aka, why GitLab doesn't require NFS for storing Git data anymore)](https://about.gitlab.com/blog/2018/09/12/the-road-to-gitaly-1-0/) - Blog post: [How we spent two weeks hunting an NFS bug in the Linux kernel](https://about.gitlab.com/blog/2018/11/14/how-we-spent-two-weeks-hunting-an-nfs-bug/) -### Components of Gitaly Cluster +### Components Gitaly Cluster consists of multiple components: @@ -240,10 +228,227 @@ component for running a Gitaly Cluster. For more information, see [Gitaly High Availability (HA) Design](https://gitlab.com/gitlab-org/gitaly/-/blob/master/doc/design_ha.md). +### Features + +Gitaly Cluster provides the following features: + +- [Distributed reads](#distributed-reads) among Gitaly nodes. +- [Strong consistency](#strong-consistency) of the secondary replicas. +- [Replication factor](#replication-factor) of repositories for increased redundancy. +- [Automatic failover](praefect.md#automatic-failover-and-primary-election-strategies) from the + primary Gitaly node to secondary Gitaly nodes. 
+- Reporting of possible [data loss](praefect.md#check-for-data-loss) if replication queue is + non-empty. + +Follow the [Gitaly Cluster epic](https://gitlab.com/groups/gitlab-org/-/epics/1489) for improvements +including [horizontally distributing reads](https://gitlab.com/groups/gitlab-org/-/epics/2013). + +#### Distributed reads + +> - Introduced in GitLab 13.1 in [beta](https://about.gitlab.com/handbook/product/gitlab-the-product/#alpha-beta-ga) with feature flag `gitaly_distributed_reads` set to disabled. +> - [Made generally available and enabled by default](https://gitlab.com/gitlab-org/gitaly/-/issues/2951) in GitLab 13.3. +> - [Disabled by default](https://gitlab.com/gitlab-org/gitaly/-/issues/3178) in GitLab 13.5. +> - [Enabled by default](https://gitlab.com/gitlab-org/gitaly/-/issues/3334) in GitLab 13.8. +> - [Feature flag removed](https://gitlab.com/gitlab-org/gitaly/-/issues/3383) in GitLab 13.11. + +Gitaly Cluster supports distribution of read operations across Gitaly nodes that are configured for +the [virtual storage](#virtual-storage). + +All RPCs marked with the `ACCESSOR` option are redirected to an up to date and healthy Gitaly node. +For example, [`GetBlob`](https://gitlab.com/gitlab-org/gitaly/-/blob/v12.10.6/proto/blob.proto#L16). + +_Up to date_ in this context means that: + +- There is no replication operations scheduled for this Gitaly node. +- The last replication operation is in _completed_ state. + +The primary node is chosen to serve the request if: + +- There are no up to date nodes. +- Any other error occurs during node selection. + +You can [monitor distribution of reads](#monitor-gitaly-cluster) using Prometheus. + +#### Strong consistency + +> - Introduced in GitLab 13.1 in [alpha](https://about.gitlab.com/handbook/product/gitlab-the-product/#alpha-beta-ga), disabled by default. +> - Entered [beta](https://about.gitlab.com/handbook/product/gitlab-the-product/#alpha-beta-ga) in GitLab 13.2, disabled by default. 
+> - In GitLab 13.3, disabled unless primary-wins voting strategy is disabled. +> - From GitLab 13.4, enabled by default. +> - From GitLab 13.5, you must use Git v2.28.0 or higher on Gitaly nodes to enable strong consistency. +> - From GitLab 13.6, primary-wins voting strategy and `gitaly_reference_transactions_primary_wins` feature flag were removed from the source code. + +By default, Gitaly Cluster guarantees eventual consistency by replicating all writes to secondary +Gitaly nodes after the write to the primary Gitaly node has happened. + +Praefect can instead provide strong consistency by creating a transaction and writing changes to all +Gitaly nodes at once. + +If enabled, transactions are only available for a subset of RPCs. For more information, see the +[strong consistency epic](https://gitlab.com/groups/gitlab-org/-/epics/1189). + +For configuration information, see [Configure strong consistency](praefect.md#configure-strong-consistency). + +#### Replication factor + +Replication factor is the number of copies Gitaly Cluster maintains of a given repository. A higher +replication factor: + +- Offers better redundancy and distribution of read workload. +- Results in higher storage cost. + +By default, Gitaly Cluster replicates repositories to every storage in a +[virtual storage](#virtual-storage). + +For configuration information, see [Configure replication factor](praefect.md#configure-replication-factor). + ### Configure Gitaly Cluster For more information on configuring Gitaly Cluster, see [Configure Gitaly Cluster](praefect.md). +### Migrate to Gitaly Cluster + +Whether migrating to Gitaly Cluster because of [NFS support deprecation](index.md#nfs-deprecation-notice) +or to move from single Gitaly nodes, the basic process involves: + +1. Create the required storage. Refer to + [repository storage recommendations](faq.md#what-are-some-repository-storage-recommendations). +1. Create and configure [Gitaly Cluster](praefect.md). +1. 
[Move the repositories](../operations/moving_repositories.md#move-repositories). To migrate to + Gitaly Cluster, existing repositories stored outside Gitaly Cluster must be moved. There is no + automatic migration but the moves can be scheduled with the GitLab API. + +## Monitor Gitaly and Gitaly Cluster + +You can use the available logs and [Prometheus metrics](../monitoring/prometheus/index.md) to +monitor Gitaly and Gitaly Cluster (Praefect). + +Metric definitions are available: + +- Directly from Prometheus `/metrics` endpoint configured for Gitaly. +- Using [Grafana Explore](https://grafana.com/docs/grafana/latest/explore/) on a + Grafana instance configured against Prometheus. + +### Monitor Gitaly + +You can observe the behavior of [queued requests](configure_gitaly.md#limit-rpc-concurrency) using +the Gitaly logs and Prometheus: + +- In the [Gitaly logs](../logs.md#gitaly-logs), look for the string (or structured log field) + `acquire_ms`. Messages that have this field are reporting about the concurrency limiter. +- In Prometheus, look for the following metrics: + - `gitaly_rate_limiting_in_progress`. + - `gitaly_rate_limiting_queued`. + - `gitaly_rate_limiting_seconds`. + + Although the name of the Prometheus metric contains `rate_limiting`, it's a concurrency limiter, + not a rate limiter. If a Gitaly client makes 1,000 requests in a row very quickly, concurrency + doesn't exceed 1, and the concurrency limiter has no effect. + +The following [pack-objects cache](configure_gitaly.md#pack-objects-cache) metrics are available: + +- `gitaly_pack_objects_cache_enabled`, a gauge set to `1` when the cache is enabled. Available + labels: `dir` and `max_age`. +- `gitaly_pack_objects_cache_lookups_total`, a counter for cache lookups. Available label: `result`. +- `gitaly_pack_objects_generated_bytes_total`, a counter for the number of bytes written into the + cache. +- `gitaly_pack_objects_served_bytes_total`, a counter for the number of bytes read from the cache. 
+- `gitaly_streamcache_filestore_disk_usage_bytes`, a gauge for the total size of cache files. + Available label: `dir`. +- `gitaly_streamcache_index_entries`, a gauge for the number of entries in the cache. Available + label: `dir`. + +Some of these metrics start with `gitaly_streamcache` because they are generated by the +`streamcache` internal library package in Gitaly. + +Example: + +```plaintext +gitaly_pack_objects_cache_enabled{dir="/var/opt/gitlab/git-data/repositories/+gitaly/PackObjectsCache",max_age="300"} 1 +gitaly_pack_objects_cache_lookups_total{result="hit"} 2 +gitaly_pack_objects_cache_lookups_total{result="miss"} 1 +gitaly_pack_objects_generated_bytes_total 2.618649e+07 +gitaly_pack_objects_served_bytes_total 7.855947e+07 +gitaly_streamcache_filestore_disk_usage_bytes{dir="/var/opt/gitlab/git-data/repositories/+gitaly/PackObjectsCache"} 2.6200152e+07 +gitaly_streamcache_filestore_removed_total{dir="/var/opt/gitlab/git-data/repositories/+gitaly/PackObjectsCache"} 1 +gitaly_streamcache_index_entries{dir="/var/opt/gitlab/git-data/repositories/+gitaly/PackObjectsCache"} 1 +``` + +#### Useful queries + +The following are useful queries for monitoring Gitaly: + +- Use the following Prometheus query to observe the + [type of connections](configure_gitaly.md#enable-tls-support) Gitaly is serving a production + environment: + + ```prometheus + sum(rate(gitaly_connections_total[5m])) by (type) + ``` + +- Use the following Prometheus query to monitor the + [authentication behavior](configure_gitaly.md#observe-type-of-gitaly-connections) of your GitLab + installation: + + ```prometheus + sum(rate(gitaly_authentications_total[5m])) by (enforced, status) + ``` + + In a system where authentication is configured correctly and where you have live traffic, you + see something like this: + + ```prometheus + {enforced="true",status="ok"} 4424.985419441742 + ``` + + There may also be other numbers with rate 0, but you only have to take note of the non-zero numbers. 
+ + The only non-zero number should have `enforced="true",status="ok"`. If you have other non-zero + numbers, something is wrong in your configuration. + + The `status="ok"` number reflects your current request rate. In the example above, Gitaly is + handling about 4000 requests per second. + +- Use the following Prometheus query to observe the [Git protocol versions](../git_protocol.md) + being used in a production environment: + + ```prometheus + sum(rate(gitaly_git_protocol_requests_total[1m])) by (grpc_method,git_protocol,grpc_service) + ``` + +### Monitor Gitaly Cluster + +To monitor Gitaly Cluster (Praefect), you can use these Prometheus metrics: + +- `gitaly_praefect_read_distribution`, a counter to track [distribution of reads](#distributed-reads). + It has two labels: + + - `virtual_storage`. + - `storage`. + + They reflect configuration defined for this instance of Praefect. + +- `gitaly_praefect_replication_latency_bucket`, a histogram measuring the amount of time it takes + for replication to complete once the replication job starts. Available in GitLab 12.10 and later. +- `gitaly_praefect_replication_delay_bucket`, a histogram measuring how much time passes between + when the replication job is created and when it starts. Available in GitLab 12.10 and later. +- `gitaly_praefect_node_latency_bucket`, a histogram measuring the latency in Gitaly returning + health check information to Praefect. This indicates Praefect connection saturation. Available in + GitLab 12.10 and later. + +To monitor [strong consistency](#strong-consistency), you can use the following Prometheus metrics: + +- `gitaly_praefect_transactions_total`, the number of transactions created and voted on. +- `gitaly_praefect_subtransactions_per_transaction_total`, the number of times nodes cast a vote for + a single transaction. This can happen multiple times if multiple references are getting updated in + a single transaction. 
+- `gitaly_praefect_voters_per_transaction_total`: the number of Gitaly nodes taking part in a + transaction. +- `gitaly_praefect_transactions_delay_seconds`, the server-side delay introduced by waiting for the + transaction to be committed. +- `gitaly_hook_transaction_voting_delay_seconds`, the client-side delay introduced by waiting for + the transaction to be committed. + ## Do not bypass Gitaly GitLab doesn't advise directly accessing Gitaly repositories stored on disk with a Git client, @@ -253,8 +458,8 @@ your assumptions, resulting in performance degradation, instability, and even da - Gitaly has optimizations such as the [`info/refs` advertisement cache](https://gitlab.com/gitlab-org/gitaly/blob/master/doc/design_diskcache.md), that rely on Gitaly controlling and monitoring access to repositories by using the official gRPC interface. -- [Gitaly Cluster](praefect.md) has optimizations, such as fault tolerance and - [distributed reads](praefect.md#distributed-reads), that depend on the gRPC interface and database +- [Gitaly Cluster](#gitaly-cluster) has optimizations, such as fault tolerance and + [distributed reads](#distributed-reads), that depend on the gRPC interface and database to determine repository state. WARNING: @@ -367,7 +572,7 @@ Additional information: GitLab recommends: - Creating a [Gitaly Cluster](#gitaly-cluster) as soon as possible. -- [Moving your repositories](praefect.md#migrate-to-gitaly-cluster) from NFS-based storage to Gitaly +- [Moving your repositories](#migrate-to-gitaly-cluster) from NFS-based storage to Gitaly Cluster. We welcome your feedback on this process. 
You can: diff --git a/doc/administration/gitaly/praefect.md b/doc/administration/gitaly/praefect.md index e483bcc944a..4af7f1a58a5 100644 --- a/doc/administration/gitaly/praefect.md +++ b/doc/administration/gitaly/praefect.md @@ -24,7 +24,7 @@ NOTE: Upgrade instructions for Omnibus GitLab installations [are available](https://docs.gitlab.com/omnibus/update/#gitaly-cluster). -## Requirements for configuring a Gitaly Cluster +## Requirements The minimum recommended configuration for a Gitaly Cluster requires: @@ -33,14 +33,33 @@ The minimum recommended configuration for a Gitaly Cluster requires: - 3 Praefect nodes - 3 Gitaly nodes (1 primary, 2 secondary) -See the [design -document](https://gitlab.com/gitlab-org/gitaly/-/blob/master/doc/design_ha.md) +See the [design document](https://gitlab.com/gitlab-org/gitaly/-/blob/master/doc/design_ha.md) for implementation details. NOTE: If not set in GitLab, feature flags are read as false from the console and Praefect uses their default value. The default value depends on the GitLab version. +### Network connectivity + +Gitaly Cluster [components](index.md#components) need to communicate with each other over many +routes. Your firewall rules must allow the following for Gitaly Cluster to function properly: + +| From | To | Default port | TLS port | +|:-----------------------|:-----------------------|:-------------|:---------| +| GitLab | Praefect load balancer | `2305` | `3305` | +| Praefect load balancer | Praefect | `2305` | `3305` | +| Praefect | Gitaly | `8075` | `9999` | +| Gitaly | GitLab (internal API) | `80` | `443` | +| Gitaly | Praefect load balancer | `2305` | `3305` | +| Gitaly | Praefect | `2305` | `3305` | +| Gitaly | Gitaly | `8075` | `9999` | + +NOTE: +Gitaly does not directly connect to Praefect. However, requests from Gitaly to the Praefect +load balancer may still be blocked unless firewalls on the Praefect nodes allow traffic from +the Gitaly nodes. 
+
 ## Setup Instructions

 If you [installed](https://about.gitlab.com/install/) GitLab using the Omnibus GitLab package
@@ -129,7 +148,7 @@ The following options are available:

 - For non-Geo installations, either:
   - Use one of the documented [PostgreSQL setups](../postgresql/index.md).
-  - Use your own third-party database setup. This will require [manual setup](#manual-database-setup).
+  - Use your own third-party database setup. This requires [manual setup](#manual-database-setup).
 - For Geo instances, either:
   - Set up a separate [PostgreSQL instance](https://www.postgresql.org/docs/11/high-availability.html).
   - Use a cloud-managed PostgreSQL service. AWS
@@ -176,7 +195,7 @@ instructions only work on Omnibus-provided PostgreSQL:
    ```

    Replace `<PRAEFECT_SQL_PASSWORD_HASH>` with the hash of the password you generated in the
-   preparation step. Note that it is prefixed with `md5` literal.
+   preparation step. It is prefixed with `md5` literal.

 1. The PgBouncer that is shipped with Omnibus is configured to use [`auth_query`](https://www.pgbouncer.org/config.html#generic-settings)
    and uses `pg_shadow_lookup` function. You need to create this function in `praefect_production`
@@ -413,7 +432,7 @@ On the **Praefect** node:
    WARNING:
    If you have data on an already existing storage called
    `default`, you should configure the virtual storage with another name and
-   [migrate the data to the Gitaly Cluster storage](#migrate-to-gitaly-cluster)
+   [migrate the data to the Gitaly Cluster storage](index.md#migrate-to-gitaly-cluster)
    afterwards.

    Replace `PRAEFECT_INTERNAL_TOKEN` with a strong secret, which is used by
@@ -457,7 +476,7 @@ On the **Praefect** node:
    In [GitLab 13.8 and earlier](https://gitlab.com/gitlab-org/omnibus-gitlab/-/merge_requests/4988),
    Gitaly nodes were configured directly under the virtual storage, and not under the `nodes` key.

-1. [Introduced](https://gitlab.com/groups/gitlab-org/-/epics/2013) in GitLab 13.1 and later, enable [distribution of reads](#distributed-reads).
+1. [Introduced](https://gitlab.com/groups/gitlab-org/-/epics/2013) in GitLab 13.1 and later, enable [distribution of reads](index.md#distributed-reads).

 1. Save the changes to `/etc/gitlab/gitlab.rb` and
    [reconfigure Praefect](../restart_gitlab.md#omnibus-gitlab-reconfigure):
@@ -877,7 +896,7 @@ Particular attention should be shown to:

    WARNING:
    If you have existing data stored on the default Gitaly storage,
-   you should [migrate the data your Gitaly Cluster storage](#migrate-to-gitaly-cluster)
+   you should [migrate the data your Gitaly Cluster storage](index.md#migrate-to-gitaly-cluster)
    first.

    ```ruby
@@ -1053,75 +1072,9 @@ To get started quickly:

 Congratulations! You've configured an observable fault-tolerant Praefect cluster.

-## Network connectivity requirements
-
-Gitaly Cluster components need to communicate with each other over many routes.
-Your firewall rules must allow the following for Gitaly Cluster to function properly:
-
-| From                   | To                      | Default port / TLS port |
-|:-----------------------|:------------------------|:------------------------|
-| GitLab                 | Praefect load balancer  | `2305` / `3305`         |
-| Praefect load balancer | Praefect                | `2305` / `3305`         |
-| Praefect               | Gitaly                  | `8075` / `9999`         |
-| Gitaly                 | GitLab (internal API)   | `80` / `443`            |
-| Gitaly                 | Praefect load balancer  | `2305` / `3305`         |
-| Gitaly                 | Praefect                | `2305` / `3305`         |
-| Gitaly                 | Gitaly                  | `8075` / `9999`         |
-
-NOTE:
-Gitaly does not directly connect to Praefect. However, requests from Gitaly to the Praefect
-load balancer may still be blocked unless firewalls on the Praefect nodes allow traffic from
-the Gitaly nodes.
-
-## Distributed reads
-
-> - Introduced in GitLab 13.1 in [beta](https://about.gitlab.com/handbook/product/gitlab-the-product/#alpha-beta-ga) with feature flag `gitaly_distributed_reads` set to disabled.
-> - [Made generally available and enabled by default](https://gitlab.com/gitlab-org/gitaly/-/issues/2951) in GitLab 13.3.
-> - [Disabled by default](https://gitlab.com/gitlab-org/gitaly/-/issues/3178) in GitLab 13.5.
-> - [Enabled by default](https://gitlab.com/gitlab-org/gitaly/-/issues/3334) in GitLab 13.8.
-> - [Feature flag removed](https://gitlab.com/gitlab-org/gitaly/-/issues/3383) in GitLab 13.11.
-
-Praefect supports distribution of read operations across Gitaly nodes that are
-configured for the virtual node.
-
-All RPCs marked with `ACCESSOR` option like
-[GetBlob](https://gitlab.com/gitlab-org/gitaly/-/blob/v12.10.6/proto/blob.proto#L16)
-are redirected to an up to date and healthy Gitaly node.
-
-_Up to date_ in this context means that:
+## Configure strong consistency

-- There is no replication operations scheduled for this node.
-- The last replication operation is in _completed_ state.
-
-If there is no such nodes, or any other error occurs during node selection, the primary
-node is chosen to serve the request.
-
-To track distribution of read operations, you can use the `gitaly_praefect_read_distribution`
-Prometheus counter metric. It has two labels:
-
-- `virtual_storage`.
-- `storage`.
-
-They reflect configuration defined for this instance of Praefect.
-
-## Strong consistency
-
-> - Introduced in GitLab 13.1 in [alpha](https://about.gitlab.com/handbook/product/gitlab-the-product/#alpha-beta-ga), disabled by default.
-> - Entered [beta](https://about.gitlab.com/handbook/product/gitlab-the-product/#alpha-beta-ga) in GitLab 13.2, disabled by default.
-> - In GitLab 13.3, disabled unless primary-wins voting strategy is disabled.
-> - From GitLab 13.4, enabled by default.
-> - From GitLab 13.5, you must use Git v2.28.0 or higher on Gitaly nodes to enable strong consistency.
-> - From GitLab 13.6, primary-wins voting strategy and `gitaly_reference_transactions_primary_wins` feature flag were removed from the source code.
-
-Praefect guarantees eventual consistency by replicating all writes to secondary nodes
-after the write to the primary Gitaly node has happened.
-
-Praefect can instead provide strong consistency by creating a transaction and writing
-changes to all Gitaly nodes at once.
-If enabled, transactions are only available for a subset of RPCs. For more
-information, see the [strong consistency epic](https://gitlab.com/groups/gitlab-org/-/epics/1189).
-
-To enable strong consistency:
+To enable [strong consistency](index.md#strong-consistency):

 - In GitLab 13.5, you must use Git v2.28.0 or higher on Gitaly nodes to enable strong consistency.
 - In GitLab 13.4 and later, the strong consistency voting strategy has been improved and enabled by default.
@@ -1141,28 +1094,10 @@ Feature.enable(:gitaly_reference_transactions)
 Feature.disable(:gitaly_reference_transactions_primary_wins)
 ```

-To monitor strong consistency, you can use the following Prometheus metrics:
-
-- `gitaly_praefect_transactions_total`: Number of transactions created and
-  voted on.
-- `gitaly_praefect_subtransactions_per_transaction_total`: Number of times
-  nodes cast a vote for a single transaction. This can happen multiple times if
-  multiple references are getting updated in a single transaction.
-- `gitaly_praefect_voters_per_transaction_total`: Number of Gitaly nodes taking
-  part in a transaction.
-- `gitaly_praefect_transactions_delay_seconds`: Server-side delay introduced by
-  waiting for the transaction to be committed.
-- `gitaly_hook_transaction_voting_delay_seconds`: Client-side delay introduced
-  by waiting for the transaction to be committed.
+For information on monitoring strong consistency, see the
+[relevant documentation](index.md#monitor-gitaly-cluster).

-## Replication factor
-
-Replication factor is the number of copies Praefect maintains of a given repository. A higher
-replication factor offers better redundancy and distribution of read workload, but also results
-in a higher storage cost. By default, Praefect replicates repositories to every storage in a
-virtual storage.
-
-### Configure replication factor
+## Configure replication factor

 WARNING:
 Configurable replication factors require [repository-specific primary nodes](#repository-specific-primary-nodes) to be used.
@@ -1639,128 +1574,3 @@ sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.t
 - Replace the placeholder `<virtual-storage>` with the virtual storage containing the Gitaly node storage to be checked.
 - Replace the placeholder `<up-to-date-storage>` with the Gitaly storage name containing up to date repositories.
 - Replace the placeholder `<outdated-storage>` with the Gitaly storage name containing outdated repositories.
-
-## Migrate to Gitaly Cluster
-
-Whether migrating to Gitaly Cluster because of [NFS support deprecation](index.md#nfs-deprecation-notice)
-or to move from single Gitaly nodes, the basic process involves:
-
-1. Create the required storage.
-1. Create and configure Gitaly Cluster.
-1. [Move the repositories](#move-repositories).
-
-When creating the storage, see some
-[repository storage recommendations](faq.md#what-are-some-repository-storage-recommendations).
-
-### Move Repositories
-
-To migrate to Gitaly Cluster, existing repositories stored outside Gitaly Cluster must be
-moved. There is no automatic migration but the moves can be scheduled with the GitLab API.
-
-GitLab repositories can be associated with projects, groups, and snippets. Each of these types
-have a separate API to schedule the respective repositories to move. To move all repositories
-on a GitLab instance, each of these types must be scheduled to move for each storage.
-
-Each repository is made read-only for the duration of the move. The repository is not writable
-until the move has completed.
-
-After creating and configuring Gitaly Cluster:
-
-1. Ensure all storages are accessible to the GitLab instance. In this example, these are
-   `<original_storage_name>` and `<cluster_storage_name>`.
-1. [Configure repository storage weights](../repository_storage_paths.md#configure-where-new-repositories-are-stored)
-   so that the Gitaly Cluster receives all new projects. This stops new projects from being created
-   on existing Gitaly nodes while the migration is in progress.
-1. Schedule repository moves for:
-   - [Projects](#bulk-schedule-project-moves).
-   - [Snippets](#bulk-schedule-snippet-moves).
-   - [Groups](#bulk-schedule-group-moves). **(PREMIUM SELF)**
-
-#### Bulk schedule project moves
-
-1. [Schedule repository storage moves for all projects on a storage shard](../../api/project_repository_storage_moves.md#schedule-repository-storage-moves-for-all-projects-on-a-storage-shard) using the API. For example:
-
-   ```shell
-   curl --request POST --header "Private-Token: <your_access_token>" \
-        --header "Content-Type: application/json" \
-        --data '{"source_storage_name":"<original_storage_name>","destination_storage_name":"<cluster_storage_name>"}' \
-        "https://gitlab.example.com/api/v4/project_repository_storage_moves"
-   ```
-
-1. [Query the most recent repository moves](../../api/project_repository_storage_moves.md#retrieve-all-project-repository-storage-moves)
-   using the API. The query indicates either:
-   - The moves have completed successfully. The `state` field is `finished`.
-   - The moves are in progress. Re-query the repository move until it completes successfully.
-   - The moves have failed. Most failures are temporary and are solved by rescheduling the move.
-
-1. After the moves are complete, [query projects](../../api/projects.md#list-all-projects)
-   using the API to confirm that all projects have moved. No projects should be returned
-   with `repository_storage` field set to the old storage.
-
-   ```shell
-   curl --header "Private-Token: <your_access_token>" --header "Content-Type: application/json" \
-        "https://gitlab.example.com/api/v4/projects?repository_storage=<original_storage_name>"
-   ```
-
-   Alternatively use [the rails console](../operations/rails_console.md) to
-   confirm that all projects have moved. Run the following in the rails console:
-
-   ```ruby
-   ProjectRepository.for_repository_storage('<original_storage_name>')
-   ```
-
-1. Repeat for each storage as required.
-
-#### Bulk schedule snippet moves
-
-1. [Schedule repository storage moves for all snippets on a storage shard](../../api/snippet_repository_storage_moves.md#schedule-repository-storage-moves-for-all-snippets-on-a-storage-shard) using the API. For example:
-
-   ```shell
-   curl --request POST --header "PRIVATE-TOKEN: <your_access_token>" \
-        --header "Content-Type: application/json" \
-        --data '{"source_storage_name":"<original_storage_name>","destination_storage_name":"<cluster_storage_name>"}' \
-        "https://gitlab.example.com/api/v4/snippet_repository_storage_moves"
-   ```
-
-1. [Query the most recent repository moves](../../api/snippet_repository_storage_moves.md#retrieve-all-snippet-repository-storage-moves)
-   using the API. The query indicates either:
-   - The moves have completed successfully. The `state` field is `finished`.
-   - The moves are in progress. Re-query the repository move until it completes successfully.
-   - The moves have failed. Most failures are temporary and are solved by rescheduling the move.
-
-1. After the moves are complete, use [the rails console](../operations/rails_console.md) to
-   confirm that all snippets have moved. No snippets should be returned for the original
-   storage. Run the following in the rails console:
-
-   ```ruby
-   SnippetRepository.for_repository_storage('<original_storage_name>')
-   ```
-
-1. Repeat for each storage as required.
-
-#### Bulk schedule group moves **(PREMIUM SELF)**
-
-1. [Schedule repository storage moves for all groups on a storage shard](../../api/group_repository_storage_moves.md#schedule-repository-storage-moves-for-all-groups-on-a-storage-shard) using the API.
-
-   ```shell
-   curl --request POST --header "PRIVATE-TOKEN: <your_access_token>" \
-        --header "Content-Type: application/json" \
-        --data '{"source_storage_name":"<original_storage_name>","destination_storage_name":"<cluster_storage_name>"}' \
-        "https://gitlab.example.com/api/v4/group_repository_storage_moves"
-   ```
-
-1. [Query the most recent repository moves](../../api/group_repository_storage_moves.md#retrieve-all-group-repository-storage-moves)
-   using the API. The query indicates either:
-   - The moves have completed successfully. The `state` field is `finished`.
-   - The moves are in progress. Re-query the repository move until it completes successfully.
-   - The moves have failed. Most failures are temporary and are solved by rescheduling the move.
-
-1. After the moves are complete, use [the rails console](../operations/rails_console.md) to
-   confirm that all groups have moved. No groups should be returned for the original
-   storage. Run the following in the rails console:
-
-   ```ruby
-   GroupWikiRepository.for_repository_storage('<original_storage_name>')
-   ```
-
-1. Repeat for each storage as required.
diff --git a/doc/administration/gitaly/reference.md b/doc/administration/gitaly/reference.md
index ec5a8d47ae2..9fe09be10a3 100644
--- a/doc/administration/gitaly/reference.md
+++ b/doc/administration/gitaly/reference.md
@@ -71,7 +71,7 @@ Remember to disable `transitioning` when you are done
 changing your token settings.

 All authentication attempts are counted in Prometheus under
-the `gitaly_authentications_total` metric.
+the [`gitaly_authentications_total` metric](index.md#useful-queries).
 ### TLS

diff --git a/doc/administration/gitaly/troubleshooting.md b/doc/administration/gitaly/troubleshooting.md
index ab6f493cf0f..3dd700968f9 100644
--- a/doc/administration/gitaly/troubleshooting.md
+++ b/doc/administration/gitaly/troubleshooting.md
@@ -223,6 +223,28 @@ on the Gitaly server matches the one on Gitaly client. If it doesn't match,
 update the secrets file on the Gitaly server to match the Gitaly client, then
 [reconfigure](../restart_gitlab.md#omnibus-gitlab-reconfigure).

+### Repository pushes fail with a `deny updating a hidden ref` error
+
+Due to [a change](https://gitlab.com/gitlab-org/gitaly/-/merge_requests/3426)
+introduced in GitLab 13.12, Gitaly has read-only, internal GitLab references that users are not
+permitted to update. If you attempt to update internal references with `git push --mirror`, Git
+returns the rejection error, `deny updating a hidden ref`.
+
+The following references are read-only:
+
+- refs/environments/
+- refs/keep-around/
+- refs/merge-requests/
+- refs/pipelines/
+
+To mirror-push branches and tags only, and avoid attempting to mirror-push protected refs, run:
+
+```shell
+git push origin +refs/heads/*:refs/heads/* +refs/tags/*:refs/tags/*
+```
+
+Any other namespaces that the admin wants to push can be included there as well via additional patterns.
+
 ### Command line tools cannot connect to Gitaly

 gRPC cannot reach your Gitaly server if:
@@ -370,3 +392,11 @@ the `praefect` command:
 $ sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml sql-migrate
 praefect sql-migrate: OK (applied 21 migrations)
 ```
+
+### Requests fail with 'repo scoped: invalid Repository' errors
+
+This indicates that the virtual storage name used in the
+[Praefect configuration](praefect.md#praefect) does not match the storage name used in
+[`git_data_dirs` setting](praefect.md#gitaly) for GitLab.
+
+Resolve this by matching the virtual storage names used in Praefect and GitLab configuration.
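
Reviewer note on the `deny updating a hidden ref` entry added to `troubleshooting.md` above: the list of read-only namespaces can be illustrated with a small Ruby sketch that checks which refs a mirror push would have rejected. This is only an illustration under stated assumptions. The prefix list is copied from the diff, while the constant `READ_ONLY_REF_PREFIXES` and the helpers `read_only_ref?` and `partition_refs` are hypothetical names that do not exist in GitLab or Gitaly. The actual enforcement happens on the Gitaly server and may differ in detail.

```ruby
# Prefixes copied from the troubleshooting entry in the diff above.
# The constant name is hypothetical and not part of GitLab or Gitaly.
READ_ONLY_REF_PREFIXES = %w[
  refs/environments/
  refs/keep-around/
  refs/merge-requests/
  refs/pipelines/
].freeze

# Hypothetical helper: returns true when a fully qualified ref name
# falls under one of the read-only namespaces listed above.
def read_only_ref?(ref)
  READ_ONLY_REF_PREFIXES.any? { |prefix| ref.start_with?(prefix) }
end

# Hypothetical helper: splits refs into [pushable, rejected], where
# "rejected" holds the refs that sit in a read-only namespace.
def partition_refs(refs)
  refs.partition { |ref| !read_only_ref?(ref) }
end

if __FILE__ == $PROGRAM_NAME
  refs = %w[
    refs/heads/main
    refs/tags/v1.0
    refs/merge-requests/7/head
    refs/keep-around/abc123
  ]
  pushable, rejected = partition_refs(refs)
  puts "pushable: #{pushable.join(', ')}"
  puts "rejected: #{rejected.join(', ')}"
end
```

A check like this could be run over the output of `git for-each-ref --format='%(refname)'` before mirroring, to see which refs fall outside the `refs/heads/*` and `refs/tags/*` refspecs that the entry recommends pushing.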