summaryrefslogtreecommitdiff
path: root/doc/administration/gitaly
diff options
context:
space:
mode:
Diffstat (limited to 'doc/administration/gitaly')
-rw-r--r--doc/administration/gitaly/faq.md5
-rw-r--r--doc/administration/gitaly/index.md574
-rw-r--r--doc/administration/gitaly/praefect.md522
-rw-r--r--doc/administration/gitaly/troubleshooting.md372
4 files changed, 819 insertions, 654 deletions
diff --git a/doc/administration/gitaly/faq.md b/doc/administration/gitaly/faq.md
index 98a90925d32..a5964b7a2eb 100644
--- a/doc/administration/gitaly/faq.md
+++ b/doc/administration/gitaly/faq.md
@@ -7,7 +7,8 @@ type: reference
# Frequently asked questions **(FREE SELF)**
-The following are answers to frequently asked questions about Gitaly and Gitaly Cluster.
+The following are answers to frequently asked questions about Gitaly and Gitaly Cluster. For
+troubleshooting information, see [Troubleshooting Gitaly and Gitaly Cluster](troubleshooting.md).
## How does Gitaly Cluster compare to Geo?
@@ -87,4 +88,4 @@ There are no special requirements. Gitaly Cluster requires PostgreSQL version 11
These tables are created per the [specific configuration section](praefect.md#postgresql).
If you find you have an empty Praefect database table, see the
-[relevant troubleshooting section](index.md#relation-does-not-exist-errors).
+[relevant troubleshooting section](troubleshooting.md#relation-does-not-exist-errors).
diff --git a/doc/administration/gitaly/index.md b/doc/administration/gitaly/index.md
index eaf9e21780d..0af248e0573 100644
--- a/doc/administration/gitaly/index.md
+++ b/doc/administration/gitaly/index.md
@@ -19,6 +19,67 @@ Gitaly implements a client-server architecture:
- [GitLab Shell](https://gitlab.com/gitlab-org/gitlab-shell).
- [GitLab Workhorse](https://gitlab.com/gitlab-org/gitlab-workhorse).
+Gitaly manages only Git repository access for GitLab. Other types of GitLab data aren't accessed
+using Gitaly.
+
+GitLab accesses [repositories](../../user/project/repository/index.md) through the configured
+[repository storages](../repository_storage_paths.md). Each new repository is stored on one of the
+repository storages based on their
+[configured weights](../repository_storage_paths.md#configure-where-new-repositories-are-stored). Each
+repository storage is either:
+
+- A Gitaly storage with direct access to repositories using [storage paths](../repository_storage_paths.md),
+ where each repository is stored on a single Gitaly node. All requests are routed to this node.
+- A virtual storage provided by [Gitaly Cluster](#gitaly-cluster), where each repository can be
+ stored on multiple Gitaly nodes for fault tolerance. In a Gitaly Cluster:
+ - Read requests are distributed between multiple Gitaly nodes, which can improve performance.
+ - Write requests are broadcast to repository replicas.
+
+WARNING:
+Engineering support for NFS for Git repositories is deprecated. Read the
+[deprecation notice](#nfs-deprecation-notice).
+
+## Virtual storage
+
+Virtual storage makes it viable to have a single repository storage in GitLab to simplify repository
+management.
+
+Virtual storage with Gitaly Cluster can usually replace direct Gitaly storage configurations.
+However, this is at the expense of additional storage space needed to store each repository on multiple
+Gitaly nodes. The benefit of using Gitaly Cluster virtual storage over direct Gitaly storage is:
+
+- Improved fault tolerance, because each Gitaly node has a copy of every repository.
+- Improved resource utilization, reducing the need for over-provisioning for shard-specific peak
+ loads, because read loads are distributed across Gitaly nodes.
+- Manual rebalancing for performance is not required, because read loads are distributed across
+ Gitaly nodes.
+- Simpler management, because all Gitaly nodes are identical.
+
+The number of repository replicas can be configured using a
+[replication factor](praefect.md#replication-factor).
+
+It can
+be uneconomical to have the same replication factor for all repositories.
+[Variable replication factor](https://gitlab.com/groups/gitlab-org/-/epics/3372) is planned to
+provide greater flexibility for extremely large GitLab instances.
+
+As with normal Gitaly storages, virtual storages can be sharded.
+
+## Gitaly
+
+The following shows GitLab set up to use direct access to Gitaly:
+
+![Shard example](img/shard_example_v13_3.png)
+
+In this example:
+
+- Each repository is stored on one of three Gitaly storages: `storage-1`, `storage-2`, or
+ `storage-3`.
+- Each storage is serviced by a Gitaly node.
+- The three Gitaly nodes store data on their file systems.
+
+### Gitaly architecture
+
The following illustrates the Gitaly client-server architecture:
```mermaid
@@ -44,19 +105,7 @@ D -- gRPC --> Gitaly
E --> F
```
-End users do not have direct access to Gitaly. Gitaly manages only Git repository access for GitLab.
-Other types of GitLab data aren't accessed using Gitaly.
-
-<!-- vale gitlab.FutureTense = NO -->
-
-WARNING:
-From GitLab 14.0, enhancements and bug fixes for NFS for Git repositories will no longer be
-considered and customer technical support will be considered out of scope.
-[Read more about Gitaly and NFS](#nfs-deprecation-notice).
-
-<!-- vale gitlab.FutureTense = YES -->
-
-## Configure Gitaly
+### Configure Gitaly
Gitaly comes pre-configured with Omnibus GitLab, which is a configuration
[suitable for up to 1000 users](../reference_architectures/1k_users.md). For:
@@ -72,10 +121,24 @@ default value. The default value depends on the GitLab version.
## Gitaly Cluster
-Gitaly, the service that provides storage for Git repositories, can
-be run in a clustered configuration to scale the Gitaly service and increase
-fault tolerance. In this configuration, every Git repository is stored on every
-Gitaly node in the cluster.
+Git storage is provided through the Gitaly service in GitLab, and is essential to the operation of
+GitLab. When the number of users, repositories, and activity grows, it is important to scale Gitaly
+appropriately by:
+
+- Increasing the available CPU and memory resources available to Git before
+ resource exhaustion degrades Git, Gitaly, and GitLab application performance.
+- Increasing available storage before storage limits are reached causing write
+ operations to fail.
+- Removing single points of failure to improve fault tolerance. Git should be
+ considered mission critical if a service degradation would prevent you from
+ deploying changes to production.
+
+Gitaly can be run in a clustered configuration to:
+
+- Scale the Gitaly service.
+- Increase fault tolerance.
+
+In this configuration, every Git repository can be stored on multiple Gitaly nodes in the cluster.
Using a Gitaly Cluster increases fault tolerance by:
@@ -87,6 +150,19 @@ NOTE:
Technical support for Gitaly clusters is limited to GitLab Premium and Ultimate
customers.
+The following shows GitLab set up to access `storage-1`, a virtual storage provided by Gitaly
+Cluster:
+
+![Cluster example](img/cluster_example_v13_3.png)
+
+In this example:
+
+- Repositories are stored on a virtual storage called `storage-1`.
+- Three Gitaly nodes provide `storage-1` access: `gitaly-1`, `gitaly-2`, and `gitaly-3`.
+- The three Gitaly nodes share data in three separate hashed storage locations.
+- The [replication factor](praefect.md#replication-factor) is `3`. There are three copies maintained
+ of each repository.
+
The availability objectives for Gitaly clusters are:
- **Recovery Point Objective (RPO):** Less than 1 minute.
@@ -110,33 +186,18 @@ Gitaly Cluster supports:
- [Strong consistency](praefect.md#strong-consistency) of the secondary replicas.
- [Automatic failover](praefect.md#automatic-failover-and-primary-election-strategies) from the primary to the secondary.
- Reporting of possible data loss if replication queue is non-empty.
-- Marking repositories as [read-only](praefect.md#read-only-mode) if data loss is detected to prevent data inconsistencies.
+- From GitLab 13.0 to GitLab 14.0, marking repositories as [read-only](praefect.md#read-only-mode)
+ if data loss is detected to prevent data inconsistencies.
Follow the [Gitaly Cluster epic](https://gitlab.com/groups/gitlab-org/-/epics/1489)
for improvements including
[horizontally distributing reads](https://gitlab.com/groups/gitlab-org/-/epics/2013).
-### Overview
-
-Git storage is provided through the Gitaly service in GitLab, and is essential
-to the operation of the GitLab application. When the number of
-users, repositories, and activity grows, it is important to scale Gitaly
-appropriately by:
-
-- Increasing the available CPU and memory resources available to Git before
- resource exhaustion degrades Git, Gitaly, and GitLab application performance.
-- Increase available storage before storage limits are reached causing write
- operations to fail.
-- Improve fault tolerance by removing single points of failure. Git should be
- considered mission critical if a service degradation would prevent you from
- deploying changes to production.
-
### Moving beyond NFS
WARNING:
-From GitLab 13.0, using NFS for Git repositories is deprecated. In GitLab 14.0,
-support for NFS for Git repositories is scheduled to be removed. Upgrade to
-Gitaly Cluster as soon as possible.
+Engineering support for NFS for Git repositories is deprecated. Technical support is planned to be
+unavailable from GitLab 15.0. No further enhancements are planned for this feature.
[Network File System (NFS)](https://en.wikipedia.org/wiki/Network_File_System)
is not well suited to Git workloads which are CPU and IOPS sensitive.
@@ -159,22 +220,6 @@ Further reading:
- Blog post: [The road to Gitaly v1.0 (aka, why GitLab doesn't require NFS for storing Git data anymore)](https://about.gitlab.com/blog/2018/09/12/the-road-to-gitaly-1-0/)
- Blog post: [How we spent two weeks hunting an NFS bug in the Linux kernel](https://about.gitlab.com/blog/2018/11/14/how-we-spent-two-weeks-hunting-an-nfs-bug/)
-### Where Gitaly Cluster fits
-
-GitLab accesses [repositories](../../user/project/repository/index.md) through the configured
-[repository storages](../repository_storage_paths.md). Each new repository is stored on one of the
-repository storages based on their configured weights. Each repository storage is either:
-
-- A Gitaly storage served directly by Gitaly. These map to a directory on the file system of a
- Gitaly node.
-- A [virtual storage](#virtual-storage-or-direct-gitaly-storage) served by Praefect. A virtual
- storage is a cluster of Gitaly storages that appear as a single repository storage.
-
-Virtual storages are a feature of Gitaly Cluster. They support replicating the repositories to
-multiple storages for fault tolerance. Virtual storages can improve performance by distributing
-requests across Gitaly nodes. Their distributed nature makes it viable to have a single repository
-storage in GitLab to simplify repository management.
-
### Components of Gitaly Cluster
Gitaly Cluster consists of multiple components:
@@ -182,59 +227,10 @@ Gitaly Cluster consists of multiple components:
- [Load balancer](praefect.md#load-balancer) for distributing requests and providing fault-tolerant access to
Praefect nodes.
- [Praefect](praefect.md#praefect) nodes for managing the cluster and routing requests to Gitaly nodes.
-- [PostgreSQL database](praefect.md#postgresql) for persisting cluster metadata and [PgBouncer](praefect.md#pgbouncer),
+- [PostgreSQL database](praefect.md#postgresql) for persisting cluster metadata and [PgBouncer](praefect.md#use-pgbouncer),
recommended for pooling Praefect's database connections.
- Gitaly nodes to provide repository storage and Git access.
-![Cluster example](img/cluster_example_v13_3.png)
-
-In this example:
-
-- Repositories are stored on a virtual storage called `storage-1`.
-- Three Gitaly nodes provide `storage-1` access: `gitaly-1`, `gitaly-2`, and `gitaly-3`.
-- The three Gitaly nodes store data on their file systems.
-
-### Virtual storage or direct Gitaly storage
-
-Gitaly supports multiple models of scaling:
-
-- Clustering using Gitaly Cluster, where each repository is stored on multiple Gitaly nodes in the
- cluster. Read requests are distributed between repository replicas and write requests are
- broadcast to repository replicas. GitLab accesses virtual storage.
-- Direct access to Gitaly storage using [repository storage paths](../repository_storage_paths.md),
- where each repository is stored on the assigned Gitaly node. All requests are routed to this node.
-
-The following is Gitaly set up to use direct access to Gitaly instead of Gitaly Cluster:
-
-![Shard example](img/shard_example_v13_3.png)
-
-In this example:
-
-- Each repository is stored on one of three Gitaly storages: `storage-1`, `storage-2`,
- or `storage-3`.
-- Each storage is serviced by a Gitaly node.
-- The three Gitaly nodes share data in three separate hashed storage locations.
-- The [replication factor](praefect.md#replication-factor) is `3`. There are three copies maintained
- of each repository.
-
-Generally, virtual storage with Gitaly Cluster can replace direct Gitaly storage configurations, at
-the expense of additional storage needed to store each repository on multiple Gitaly nodes. The
-benefit of using Gitaly Cluster over direct Gitaly storage is:
-
-- Improved fault tolerance, because each Gitaly node has a copy of every repository.
-- Improved resource utilization, reducing the need for over-provisioning for shard-specific peak
- loads, because read loads are distributed across replicas.
-- Manual rebalancing for performance is not required, because read loads are distributed across
- replicas.
-- Simpler management, because all Gitaly nodes are identical.
-
-Under some workloads, CPU and memory requirements may require a large fleet of Gitaly nodes. It
-can be uneconomical to have one to one replication factor.
-
-A hybrid approach can be used in these instances, where each shard is configured as a smaller
-cluster. [Variable replication factor](https://gitlab.com/groups/gitlab-org/-/epics/3372) is planned
-to provide greater flexibility for extremely large GitLab instances.
-
### Architecture
Praefect is a router and transaction manager for Gitaly, and a required
@@ -360,385 +356,21 @@ The second facet presents the only real solution. For this, we developed
## NFS deprecation notice
-<!-- vale gitlab.FutureTense = NO -->
-
-From GitLab 14.0, enhancements and bug fixes for NFS for Git repositories will no longer be
-considered and customer technical support will be considered out of scope.
+Engineering support for NFS for Git repositories is deprecated. Technical support is planned to be
+unavailable from GitLab 15.0. No further enhancements are planned for this feature.
Additional information:
- [Recommended NFS mount options and known issues with Gitaly and NFS](../nfs.md#upgrade-to-gitaly-cluster-or-disable-caching-if-experiencing-data-loss).
- [GitLab statement of support](https://about.gitlab.com/support/statement-of-support.html#gitaly-and-nfs).
-<!-- vale gitlab.FutureTense = YES -->
-
GitLab recommends:
- Creating a [Gitaly Cluster](#gitaly-cluster) as soon as possible.
- [Moving your repositories](praefect.md#migrate-to-gitaly-cluster) from NFS-based storage to Gitaly
Cluster.
-We welcome your feedback on this process: raise a support ticket, or [comment on the epic](https://gitlab.com/groups/gitlab-org/-/epics/4916).
-
-## Troubleshooting
-
-Refer to the information below when troubleshooting Gitaly and Gitaly Cluster.
-
-Before troubleshooting, see the Gitaly and Gitaly Cluster
-[frequently asked questions](faq.md).
-
-### Troubleshoot Gitaly
-
-The following sections provide possible solutions to Gitaly errors.
-
-See also [Gitaly timeout](../../user/admin_area/settings/gitaly_timeouts.md) settings.
-
-#### Check versions when using standalone Gitaly servers
-
-When using standalone Gitaly servers, you must make sure they are the same version
-as GitLab to ensure full compatibility:
-
-1. On the top bar, select **Menu >** **{admin}** **Admin** on your GitLab instance.
-1. On the left sidebar, select **Overview > Gitaly Servers**.
-1. Confirm all Gitaly servers indicate that they are up to date.
-
-#### Use `gitaly-debug`
-
-The `gitaly-debug` command provides "production debugging" tools for Gitaly and Git
-performance. It is intended to help production engineers and support
-engineers investigate Gitaly performance problems.
-
-If you're using GitLab 11.6 or newer, this tool should be installed on
-your GitLab or Gitaly server already at `/opt/gitlab/embedded/bin/gitaly-debug`.
-If you're investigating an older GitLab version you can compile this
-tool offline and copy the executable to your server:
-
-```shell
-git clone https://gitlab.com/gitlab-org/gitaly.git
-cd cmd/gitaly-debug
-GOOS=linux GOARCH=amd64 go build -o gitaly-debug
-```
-
-To see the help page of `gitaly-debug` for a list of supported sub-commands, run:
-
-```shell
-gitaly-debug -h
-```
-
-#### Commits, pushes, and clones return a 401
-
-```plaintext
-remote: GitLab: 401 Unauthorized
-```
-
-You need to sync your `gitlab-secrets.json` file with your GitLab
-application nodes.
-
-#### Client side gRPC logs
-
-Gitaly uses the [gRPC](https://grpc.io/) RPC framework. The Ruby gRPC
-client has its own log file which may contain useful information when
-you are seeing Gitaly errors. You can control the log level of the
-gRPC client with the `GRPC_LOG_LEVEL` environment variable. The
-default level is `WARN`.
-
-You can run a gRPC trace with:
-
-```shell
-sudo GRPC_TRACE=all GRPC_VERBOSITY=DEBUG gitlab-rake gitlab:gitaly:check
-```
-
-#### Server side gRPC logs
-
-gRPC tracing can also be enabled in Gitaly itself with the `GODEBUG=http2debug`
-environment variable. To set this in an Omnibus GitLab install:
-
-1. Add the following to your `gitlab.rb` file:
-
- ```ruby
- gitaly['env'] = {
- "GODEBUG=http2debug" => "2"
- }
- ```
-
-1. [Reconfigure](../restart_gitlab.md#omnibus-gitlab-reconfigure) GitLab.
-
-#### Correlating Git processes with RPCs
-
-Sometimes you need to find out which Gitaly RPC created a particular Git process.
-
-One method for doing this is by using `DEBUG` logging. However, this needs to be enabled
-ahead of time and the logs produced are quite verbose.
-
-A lightweight method for doing this correlation is by inspecting the environment
-of the Git process (using its `PID`) and looking at the `CORRELATION_ID` variable:
-
-```shell
-PID=<Git process ID>
-sudo cat /proc/$PID/environ | tr '\0' '\n' | grep ^CORRELATION_ID=
-```
-
-This method isn't reliable for `git cat-file` processes, because Gitaly
-internally pools and re-uses those across RPCs.
-
-#### Observing `gitaly-ruby` traffic
+We welcome your feedback on this process. You can:
-[`gitaly-ruby`](configure_gitaly.md#gitaly-ruby) is an internal implementation detail of Gitaly,
-so, there's not that much visibility into what goes on inside
-`gitaly-ruby` processes.
-
-If you have Prometheus set up to scrape your Gitaly process, you can see
-request rates and error codes for individual RPCs in `gitaly-ruby` by
-querying `grpc_client_handled_total`.
-
-- In theory, this metric does not differentiate between `gitaly-ruby` and other RPCs.
-- In practice from GitLab 11.9, all gRPC calls made by Gitaly itself are internal calls from the
- main Gitaly process to one of its `gitaly-ruby` sidecars.
-
-Assuming your `grpc_client_handled_total` counter only observes Gitaly,
-the following query shows you RPCs are (most likely) internally
-implemented as calls to `gitaly-ruby`:
-
-```prometheus
-sum(rate(grpc_client_handled_total[5m])) by (grpc_method) > 0
-```
-
-#### Repository changes fail with a `401 Unauthorized` error
-
-If you run Gitaly on its own server and notice these conditions:
-
-- Users can successfully clone and fetch repositories by using both SSH and HTTPS.
-- Users can't push to repositories, or receive a `401 Unauthorized` message when attempting to
- make changes to them in the web UI.
-
-Gitaly may be failing to authenticate with the Gitaly client because it has the
-[wrong secrets file](configure_gitaly.md#configure-gitaly-servers).
-
-Confirm the following are all true:
-
-- When any user performs a `git push` to any repository on this Gitaly server, it
- fails with a `401 Unauthorized` error:
-
- ```shell
- remote: GitLab: 401 Unauthorized
- To <REMOTE_URL>
- ! [remote rejected] branch-name -> branch-name (pre-receive hook declined)
- error: failed to push some refs to '<REMOTE_URL>'
- ```
-
-- When any user adds or modifies a file from the repository using the GitLab
- UI, it immediately fails with a red `401 Unauthorized` banner.
-- Creating a new project and [initializing it with a README](../../user/project/working_with_projects.md#blank-projects)
- successfully creates the project but doesn't create the README.
-- When [tailing the logs](https://docs.gitlab.com/omnibus/settings/logs.html#tail-logs-in-a-console-on-the-server)
- on a Gitaly client and reproducing the error, you get `401` errors
- when reaching the [`/api/v4/internal/allowed`](../../development/internal_api.md) endpoint:
-
- ```shell
- # api_json.log
- {
- "time": "2019-07-18T00:30:14.967Z",
- "severity": "INFO",
- "duration": 0.57,
- "db": 0,
- "view": 0.57,
- "status": 401,
- "method": "POST",
- "path": "\/api\/v4\/internal\/allowed",
- "params": [
- {
- "key": "action",
- "value": "git-receive-pack"
- },
- {
- "key": "changes",
- "value": "REDACTED"
- },
- {
- "key": "gl_repository",
- "value": "REDACTED"
- },
- {
- "key": "project",
- "value": "\/path\/to\/project.git"
- },
- {
- "key": "protocol",
- "value": "web"
- },
- {
- "key": "env",
- "value": "{\"GIT_ALTERNATE_OBJECT_DIRECTORIES\":[],\"GIT_ALTERNATE_OBJECT_DIRECTORIES_RELATIVE\":[],\"GIT_OBJECT_DIRECTORY\":null,\"GIT_OBJECT_DIRECTORY_RELATIVE\":null}"
- },
- {
- "key": "user_id",
- "value": "2"
- },
- {
- "key": "secret_token",
- "value": "[FILTERED]"
- }
- ],
- "host": "gitlab.example.com",
- "ip": "REDACTED",
- "ua": "Ruby",
- "route": "\/api\/:version\/internal\/allowed",
- "queue_duration": 4.24,
- "gitaly_calls": 0,
- "gitaly_duration": 0,
- "correlation_id": "XPUZqTukaP3"
- }
-
- # nginx_access.log
- [IP] - - [18/Jul/2019:00:30:14 +0000] "POST /api/v4/internal/allowed HTTP/1.1" 401 30 "" "Ruby"
- ```
-
-To fix this problem, confirm that your [`gitlab-secrets.json` file](configure_gitaly.md#configure-gitaly-servers)
-on the Gitaly server matches the one on Gitaly client. If it doesn't match,
-update the secrets file on the Gitaly server to match the Gitaly client, then
-[reconfigure](../restart_gitlab.md#omnibus-gitlab-reconfigure).
-
-#### Command line tools cannot connect to Gitaly
-
-gRPC cannot reach your Gitaly server if:
-
-- You can't connect to a Gitaly server with command-line tools.
-- Certain actions result in a `14: Connect Failed` error message.
-
-Verify you can reach Gitaly by using TCP:
-
-```shell
-sudo gitlab-rake gitlab:tcp_check[GITALY_SERVER_IP,GITALY_LISTEN_PORT]
-```
-
-If the TCP connection:
-
-- Fails, check your network settings and your firewall rules.
-- Succeeds, your networking and firewall rules are correct.
-
-If you use proxy servers in your command line environment such as Bash, these can interfere with
-your gRPC traffic.
-
-If you use Bash or a compatible command line environment, run the following commands to determine
-whether you have proxy servers configured:
-
-```shell
-echo $http_proxy
-echo $https_proxy
-```
-
-If either of these variables have a value, your Gitaly CLI connections may be getting routed through
-a proxy which cannot connect to Gitaly.
-
-To remove the proxy setting, run the following commands (depending on which variables had values):
-
-```shell
-unset http_proxy
-unset https_proxy
-```
-
-#### Permission denied errors appearing in Gitaly or Praefect logs when accessing repositories
-
-You might see the following in Gitaly and Praefect logs:
-
-```shell
-{
- ...
- "error":"rpc error: code = PermissionDenied desc = permission denied",
- "grpc.code":"PermissionDenied",
- "grpc.meta.client_name":"gitlab-web",
- "grpc.request.fullMethod":"/gitaly.ServerService/ServerInfo",
- "level":"warning",
- "msg":"finished unary call with code PermissionDenied",
- ...
-}
-```
-
-This is a GRPC call
-[error response code](https://grpc.github.io/grpc/core/md_doc_statuscodes.html).
-
-If this error occurs, even though
-[the Gitaly auth tokens are set up correctly](#praefect-errors-in-logs),
-it's likely that the Gitaly servers are experiencing
-[clock drift](https://en.wikipedia.org/wiki/Clock_drift).
-
-Ensure the Gitaly clients and servers are synchronized, and use an NTP time
-server to keep them synchronized.
-
-#### Gitaly not listening on new address after reconfiguring
-
-When updating the `gitaly['listen_addr']` or `gitaly['prometheus_listen_addr']` values, Gitaly may
-continue to listen on the old address after a `sudo gitlab-ctl reconfigure`.
-
-When this occurs, run `sudo gitlab-ctl restart` to resolve the issue. This should no longer be
-necessary because [this issue](https://gitlab.com/gitlab-org/gitaly/-/issues/2521) is resolved.
-
-#### Permission denied errors appearing in Gitaly logs when accessing repositories from a standalone Gitaly node
-
-If this error occurs even though file permissions are correct, it's likely that the Gitaly node is
-experiencing [clock drift](https://en.wikipedia.org/wiki/Clock_drift).
-
-Please ensure that the GitLab and Gitaly nodes are synchronized and use an NTP time
-server to keep them synchronized if possible.
-
-### Troubleshoot Praefect (Gitaly Cluster)
-
-The following sections provide possible solutions to Gitaly Cluster errors.
-
-#### Praefect errors in logs
-
-If you receive an error, check `/var/log/gitlab/gitlab-rails/production.log`.
-
-Here are common errors and potential causes:
-
-- 500 response code
- - **ActionView::Template::Error (7:permission denied)**
- - `praefect['auth_token']` and `gitlab_rails['gitaly_token']` do not match on the GitLab server.
- - **Unable to save project. Error: 7:permission denied**
- - Secret token in `praefect['storage_nodes']` on GitLab server does not match the
- value in `gitaly['auth_token']` on one or more Gitaly servers.
-- 503 response code
- - **GRPC::Unavailable (14:failed to connect to all addresses)**
- - GitLab was unable to reach Praefect.
- - **GRPC::Unavailable (14:all SubCons are in TransientFailure...)**
- - Praefect cannot reach one or more of its child Gitaly nodes. Try running
- the Praefect connection checker to diagnose.
-
-#### Determine primary Gitaly node
-
-To determine the current primary Gitaly node for a specific Praefect node:
-
-- Use the `Shard Primary Election` [Grafana chart](praefect.md#grafana) on the
- [`Gitlab Omnibus - Praefect` dashboard](https://gitlab.com/gitlab-org/grafana-dashboards/-/blob/master/omnibus/praefect.json).
- This is recommended.
-- If you do not have Grafana set up, use the following command on each host of each
- Praefect node:
-
- ```shell
- curl localhost:9652/metrics | grep gitaly_praefect_primaries`
- ```
-
-#### Relation does not exist errors
-
-By default Praefect database tables are created automatically by `gitlab-ctl reconfigure` task.
-However, if the `gitlab-ctl reconfigure` command isn't executed or there are errors during the
-execution, the Praefect database tables are not created on initial reconfigure and can throw
-errors that relations do not exist.
-
-For example:
-
-- `ERROR: relation "node_status" does not exist at character 13`
-- `ERROR: relation "replication_queue_lock" does not exist at character 40`
-- This error:
-
- ```json
- {"level":"error","msg":"Error updating node: pq: relation \"node_status\" does not exist","pid":210882,"praefectName":"gitlab1x4m:0.0.0.0:2305","time":"2021-04-01T19:26:19.473Z","virtual_storage":"praefect-cluster-1"}
- ```
-
-To solve this, the database schema migration can be done using `sql-migrate` sub-command of
-the `praefect` command:
-
-```shell
-$ sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml sql-migrate
-praefect sql-migrate: OK (applied 21 migrations)
-```
+- Raise a support ticket.
+- [Comment on the epic](https://gitlab.com/groups/gitlab-org/-/epics/4916).
diff --git a/doc/administration/gitaly/praefect.md b/doc/administration/gitaly/praefect.md
index 21e5360e27b..e483bcc944a 100644
--- a/doc/administration/gitaly/praefect.md
+++ b/doc/administration/gitaly/praefect.md
@@ -43,8 +43,8 @@ default value. The default value depends on the GitLab version.
## Setup Instructions
-If you [installed](https://about.gitlab.com/install/) GitLab using the Omnibus
-package (highly recommended), follow the steps below:
+If you [installed](https://about.gitlab.com/install/) GitLab using the Omnibus GitLab package
+(highly recommended), follow the steps below:
1. [Preparation](#preparation)
1. [Configuring the Praefect database](#postgresql)
@@ -59,25 +59,27 @@ package (highly recommended), follow the steps below:
Before beginning, you should already have a working GitLab instance. [Learn how
to install GitLab](https://about.gitlab.com/install/).
-Provision a PostgreSQL server (PostgreSQL 11 or newer).
+Provision a PostgreSQL server. We recommend using the PostgreSQL that is shipped
+with Omnibus GitLab and use it to configure the PostgreSQL database. You can use an
+external PostgreSQL server (version 11 or newer) but you must set it up [manually](#manual-database-setup).
-Prepare all your new nodes by [installing
-GitLab](https://about.gitlab.com/install/).
+Prepare all your new nodes by [installing GitLab](https://about.gitlab.com/install/). You need:
+- 1 PostgreSQL node
+- 1 PgBouncer node (optional)
- At least 1 Praefect node (minimal storage required)
- 3 Gitaly nodes (high CPU, high memory, fast storage)
- 1 GitLab server
-You need the IP/host address for each node.
+You also need the IP/host address for each node:
-1. `LOAD_BALANCER_SERVER_ADDRESS`: the IP/host address of the load balancer
-1. `POSTGRESQL_SERVER_ADDRESS`: the IP/host address of the PostgreSQL server
+1. `PRAEFECT_LOADBALANCER_HOST`: the IP/host address of Praefect load balancer
+1. `POSTGRESQL_HOST`: the IP/host address of the PostgreSQL server
+1. `PGBOUNCER_HOST`: the IP/host address of the PostgreSQL server
1. `PRAEFECT_HOST`: the IP/host address of the Praefect server
1. `GITALY_HOST_*`: the IP or host address of each Gitaly server
1. `GITLAB_HOST`: the IP/host address of the GitLab server
-If you are using a cloud provider, you can look up the addresses for each server through your cloud provider's management console.
-
If you are using Google Cloud Platform, SoftLayer, or any other vendor that provides a virtual private cloud (VPC) you can use the private addresses for each cloud instance (corresponds to "internal address" for Google Cloud Platform) for `PRAEFECT_HOST`, `GITALY_HOST_*`, and `GITLAB_HOST`.
#### Secrets
@@ -98,6 +100,14 @@ with secure tokens as you complete the setup process.
Praefect cluster directly; that could lead to data loss.
1. `PRAEFECT_SQL_PASSWORD`: this password is used by Praefect to connect to
PostgreSQL.
+1. `PRAEFECT_SQL_PASSWORD_HASH`: the hash of password of the Praefect user.
+ Use `gitlab-ctl pg-password-md5 praefect` to generate the hash. The command
+ asks for the password for `praefect` user. Enter `PRAEFECT_SQL_PASSWORD`
+ plaintext password. By default, Praefect uses `praefect` user, but you can
+ change it.
+1. `PGBOUNCER_SQL_PASSWORD_HASH`: the hash of password of the PgBouncer user.
+ PgBouncer uses this password to connect to PostgreSQL. For more details
+ see [bundled PgBouncer](../postgresql/pgbouncer.md) documentation.
We note in the instructions below where these secrets are required.
@@ -108,127 +118,210 @@ Omnibus GitLab installations can use `gitlab-secrets.json` for `GITLAB_SHELL_SEC
NOTE:
Do not store the GitLab application database and the Praefect
-database on the same PostgreSQL server if using
-[Geo](../geo/index.md). The replication state is internal to each instance
-of GitLab and should not be replicated.
+database on the same PostgreSQL server if using [Geo](../geo/index.md).
+The replication state is internal to each instance of GitLab and should
+not be replicated.
These instructions help set up a single PostgreSQL database, which creates a single point of
-failure. The following options are available:
+failure. Alternatively, [you can use PostgreSQL replication and failover](../postgresql/replication_and_failover.md).
+
+The following options are available:
- For non-Geo installations, either:
- Use one of the documented [PostgreSQL setups](../postgresql/index.md).
- - Use your own third-party database setup, if fault tolerance is required.
+ - Use your own third-party database setup. This will require [manual setup](#manual-database-setup).
- For Geo instances, either:
- Set up a separate [PostgreSQL instance](https://www.postgresql.org/docs/11/high-availability.html).
- Use a cloud-managed PostgreSQL service. AWS
[Relational Database Service](https://aws.amazon.com/rds/) is recommended.
-To complete this section you need:
+#### Manual database setup
-- 1 Praefect node
-- 1 PostgreSQL server (PostgreSQL 11 or newer)
- - An SQL user with permissions to create databases
+To complete this section you need:
-During this section, we configure the PostgreSQL server, from the Praefect
-node, using `psql` which is installed by Omnibus GitLab.
+- One Praefect node
+- One PostgreSQL node (version 11 or newer)
+ - A PostgreSQL user with permissions to manage the database server
-1. SSH into the **Praefect** node and login as root:
+In this section, we configure the PostgreSQL database. This can be used for both external
+and Omnibus-provided PostgreSQL server.
- ```shell
- sudo -i
- ```
+To run the following instructions, you can use the Praefect node, where `psql` is installed
+by Omnibus GitLab (`/opt/gitlab/embedded/bin/psql`). If you are using the Omnibus-provided
+PostgreSQL you can use `gitlab-psql` on the PostgreSQL node instead:
-1. Connect to the PostgreSQL server with administrative access. This is likely
- the `postgres` user. The database `template1` is used because it is created
- by default on all PostgreSQL servers.
+1. Create a new user `praefect` to be used by Praefect:
- ```shell
- /opt/gitlab/embedded/bin/psql -U postgres -d template1 -h POSTGRESQL_SERVER_ADDRESS
+ ```sql
+ CREATE ROLE praefect WITH LOGIN PASSWORD 'PRAEFECT_SQL_PASSWORD';
```
- Create a new user `praefect` to be used by Praefect. Replace
- `PRAEFECT_SQL_PASSWORD` with the strong password you generated in the
- preparation step.
+ Replace `PRAEFECT_SQL_PASSWORD` with the strong password you generated in the preparation step.
+
+1. Create a new database `praefect_production` that is owned by `praefect` user.
```sql
- CREATE ROLE praefect WITH LOGIN CREATEDB PASSWORD 'PRAEFECT_SQL_PASSWORD';
+ CREATE DATABASE praefect_production WITH OWNER praefect ENCODING UTF8;
```
-1. Reconnect to the PostgreSQL server, this time as the `praefect` user:
+For using Omnibus-provided PgBouncer you need to take the following additional steps. We strongly
+recommend using the PostgreSQL that is shipped with Omnibus as the backend. The following
+instructions only work on Omnibus-provided PostgreSQL:
- ```shell
- /opt/gitlab/embedded/bin/psql -U praefect -d template1 -h POSTGRESQL_SERVER_ADDRESS
+1. For Omnibus-provided PgBouncer, you need to use the hash of `praefect` user instead the of the
+ actual password:
+
+ ```sql
+ ALTER ROLE praefect WITH PASSWORD 'md5<PRAEFECT_SQL_PASSWORD_HASH>';
```
- Create a new database `praefect_production`. By creating the database while
- connected as the `praefect` user, we are confident they have access.
+ Replace `<PRAEFECT_SQL_PASSWORD_HASH>` with the hash of the password you generated in the
+ preparation step. Note that it is prefixed with `md5` literal.
+
+1. The PgBouncer that is shipped with Omnibus is configured to use [`auth_query`](https://www.pgbouncer.org/config.html#generic-settings)
+ and uses `pg_shadow_lookup` function. You need to create this function in `praefect_production`
+ database:
```sql
- CREATE DATABASE praefect_production WITH ENCODING=UTF8;
+ CREATE OR REPLACE FUNCTION public.pg_shadow_lookup(in i_username text, out username text, out password text) RETURNS record AS $$
+ BEGIN
+ SELECT usename, passwd FROM pg_catalog.pg_shadow
+ WHERE usename = i_username INTO username, password;
+ RETURN;
+ END;
+ $$ LANGUAGE plpgsql SECURITY DEFINER;
+
+ REVOKE ALL ON FUNCTION public.pg_shadow_lookup(text) FROM public, pgbouncer;
+ GRANT EXECUTE ON FUNCTION public.pg_shadow_lookup(text) TO pgbouncer;
```
The database used by Praefect is now configured.
If you see Praefect database errors after configuring PostgreSQL, see
-[troubleshooting steps](index.md#relation-does-not-exist-errors).
+[troubleshooting steps](troubleshooting.md#relation-does-not-exist-errors).
-#### PgBouncer
+#### Use PgBouncer
To reduce PostgreSQL resource consumption, we recommend setting up and configuring
[PgBouncer](https://www.pgbouncer.org/) in front of the PostgreSQL instance. To do
-this, set the corresponding IP or host address of the PgBouncer instance in
-`/etc/gitlab/gitlab.rb` by changing the following settings:
+this, you must point Praefect to PgBouncer by setting Praefect database parameters:
-- `praefect['database_host']`, for the address.
-- `praefect['database_port']`, for the port.
+```ruby
+praefect['database_host'] = PGBOUNCER_HOST
+praefect['database_port'] = 6432
+praefect['database_user'] = 'praefect'
+praefect['database_password'] = PRAEFECT_SQL_PASSWORD
+praefect['database_dbname'] = 'praefect_production'
+#praefect['database_sslmode'] = '...'
+#praefect['database_sslcert'] = '...'
+#praefect['database_sslkey'] = '...'
+#praefect['database_sslrootcert'] = '...'
+```
-Because PgBouncer manages resources more efficiently, Praefect still requires a
-direct connection to the PostgreSQL database. It uses the
-[LISTEN](https://www.postgresql.org/docs/11/sql-listen.html)
-feature that is [not supported](https://www.pgbouncer.org/features.html) by
-PgBouncer with `pool_mode = transaction`.
-Set `praefect['database_host_no_proxy']` and `praefect['database_port_no_proxy']`
-to a direct connection, and not a PgBouncer connection.
+Praefect requires an additional connection to the PostgreSQL that supports the
+[LISTEN](https://www.postgresql.org/docs/11/sql-listen.html) feature. With PgBouncer
+this feature is only available with `session` pool mode (`pool_mode = session`).
+It is not supported in `transaction` pool mode (`pool_mode = transaction`).
-Save the changes to `/etc/gitlab/gitlab.rb` and
-[reconfigure Praefect](../restart_gitlab.md#omnibus-gitlab-reconfigure).
+For the additional connection, you must either:
-This documentation doesn't provide PgBouncer installation instructions,
-but you can:
+- Connect Praefect directly to PostgreSQL and bypass PgBouncer.
+- Configure a new PgBouncer database that uses to the same PostgreSQL database endpoint,
+ but with different pool mode. That is, `pool_mode = session`.
-- Find instructions on the [official website](https://www.pgbouncer.org/install.html).
-- Use a [Docker image](https://hub.docker.com/r/edoburu/pgbouncer/).
+Praefect can be configured to use different connection parameters for direct access
+to PostgreSQL. This is the connection that supports the `LISTEN` feature.
-In addition to the base PgBouncer configuration options, set the following values in
-your `pgbouncer.ini` file:
+Here is an example of Praefect that bypasses PgBouncer and directly connects to PostgreSQL:
-- The [Praefect PostgreSQL database](#postgresql) in the `[databases]` section:
+```ruby
+praefect['database_direct_host'] = POSTGRESQL_HOST
+praefect['database_direct_port'] = 5432
+
+# Use the following to override parameters of direct database connection.
+# Comment out where the parameters are the same for both connections.
+
+praefect['database_direct_user'] = 'praefect'
+praefect['database_direct_password'] = PRAEFECT_SQL_PASSWORD
+praefect['database_direct_dbname'] = 'praefect_production'
+#praefect['database_direct_sslmode'] = '...'
+#praefect['database_direct_sslcert'] = '...'
+#praefect['database_direct_sslkey'] = '...'
+#praefect['database_direct_sslrootcert'] = '...'
+```
- ```ini
- [databases]
- * = host=POSTGRESQL_SERVER_ADDRESS port=5432 auth_user=praefect
- ```
+We recommend using PgBouncer with `session` pool mode instead. You can use the [bundled
+PgBouncer](../postgresql/pgbouncer.md) or use an external PgBouncer and [configure it
+manually](https://www.pgbouncer.org/config.html).
-- [`pool_mode`](https://www.pgbouncer.org/config.html#pool_mode)
- and [`ignore_startup_parameters`](https://www.pgbouncer.org/config.html#ignore_startup_parameters)
- in the `[pgbouncer]` section:
+The following example uses the bundled PgBouncer and sets up two separate connection pools,
+one in `session` pool mode and the other in `transaction` pool mode. For this example to work,
+you need to prepare PostgreSQL server with [setup instruction](#manual-database-setup):
- ```ini
- [pgbouncer]
- pool_mode = transaction
- ignore_startup_parameters = extra_float_digits
- ```
+```ruby
+pgbouncer['databases'] = {
+ # Other database configuation including gitlabhq_production
+ ...
+
+ praefect_production: {
+ host: POSTGRESQL_HOST,
+ # Use `pgbouncer` user to connect to database backend.
+ user: 'pgbouncer',
+ password: PGBOUNCER_SQL_PASSWORD_HASH,
+ pool_mode: 'transaction'
+ }
+ praefect_production_direct: {
+ host: POSTGRESQL_HOST,
+ # Use `pgbouncer` user to connect to database backend.
+ user: 'pgbouncer',
+ password: PGBOUNCER_SQL_PASSWORD_HASH,
+ dbname: 'praefect_production',
+ pool_mode: 'session'
+ },
+
+ ...
+}
+```
+
+Both `praefect_production` and `praefect_production_direct` use the same database endpoint
+(`praefect_production`), but with different pool modes. This translates to the following
+`databases` section of PgBouncer:
-The `praefect` user and its password should be included in the file (default is
-`userlist.txt`) used by PgBouncer if the [`auth_file`](https://www.pgbouncer.org/config.html#auth_file)
-configuration option is set.
+```ini
+[databases]
+praefect_production = host=POSTGRESQL_HOST auth_user=pgbouncer pool_mode=transaction
+praefect_production_direct = host=POSTGRESQL_HOST auth_user=pgbouncer dbname=praefect_production pool_mode=session
+```
+
+Now you can configure Praefect to use PgBouncer for both connections:
+
+```ruby
+praefect['database_host'] = PGBOUNCER_HOST
+praefect['database_port'] = 6432
+praefect['database_user'] = 'praefect'
+# `PRAEFECT_SQL_PASSWORD` is the plain-text password of
+# Praefect user. Not to be confused with `PRAEFECT_SQL_PASSWORD_HASH`.
+praefect['database_password'] = PRAEFECT_SQL_PASSWORD
+
+praefect['database_dbname'] = 'praefect_production'
+praefect['database_direct_dbname'] = 'praefect_production_direct'
+
+# There is no need to repeat the following. Parameters of direct
+# database connection will fall back to the values above.
+
+#praefect['database_direct_host'] = PGBOUNCER_HOST
+#praefect['database_direct_port'] = 6432
+#praefect['database_direct_user'] = 'praefect'
+#praefect['database_direct_password'] = PRAEFECT_SQL_PASSWORD
+```
+
+With this configuration, Praefect uses PgBouncer for both connection types.
NOTE:
-By default PgBouncer uses port `6432` to accept incoming
-connections. You can change it by setting the [`listen_port`](https://www.pgbouncer.org/config.html#listen_port)
-configuration option. We recommend setting it to the default port value (`5432`) used by
-PostgreSQL instances. Otherwise you should change the configuration parameter
-`praefect['database_port']` for each Praefect instance to the correct value.
+Omnibus GitLab handles the authentication requirements (using `auth_query`), but if you are preparing
+your databases manually and configuring an external PgBouncer, you must include `praefect` user and
+its password in the file used by PgBouncer. For example, `userlist.txt` if the [`auth_file`](https://www.pgbouncer.org/config.html#auth_file)
+configuration option is set. For more details, consult the PgBouncer documentation.
### Praefect
@@ -241,17 +334,10 @@ If there are multiple Praefect nodes:
To complete this section you need a [configured PostgreSQL server](#postgresql), including:
-- IP/host address (`POSTGRESQL_SERVER_ADDRESS`)
-- Password (`PRAEFECT_SQL_PASSWORD`)
-
Praefect should be run on a dedicated node. Do not run Praefect on the
application server, or a Gitaly node.
-1. SSH into the **Praefect** node and login as root:
-
- ```shell
- sudo -i
- ```
+On the **Praefect** node:
1. Disable all other services by editing `/etc/gitlab/gitlab.rb`:
@@ -295,22 +381,8 @@ application server, or a Gitaly node.
praefect['auth_token'] = 'PRAEFECT_EXTERNAL_TOKEN'
```
-1. Configure **Praefect** to connect to the PostgreSQL database by editing
- `/etc/gitlab/gitlab.rb`.
-
- You need to replace `POSTGRESQL_SERVER_ADDRESS` with the IP/host address
- of the database, and `PRAEFECT_SQL_PASSWORD` with the strong password set
- above.
-
- ```ruby
- praefect['database_host'] = 'POSTGRESQL_SERVER_ADDRESS'
- praefect['database_port'] = 5432
- praefect['database_user'] = 'praefect'
- praefect['database_password'] = 'PRAEFECT_SQL_PASSWORD'
- praefect['database_dbname'] = 'praefect_production'
- praefect['database_host_no_proxy'] = 'POSTGRESQL_SERVER_ADDRESS'
- praefect['database_port_no_proxy'] = 5432
- ```
+1. Configure **Praefect** to [connect to the PostgreSQL database](#postgresql). We
+ highly recommend using [PgBouncer](#use-pgbouncer) as well.
If you want to use a TLS client certificate, the options below can be used:
@@ -507,7 +579,7 @@ To configure Praefect with TLS:
```ruby
git_data_dirs({
"default" => {
- "gitaly_address" => 'tls://LOAD_BALANCER_SERVER_ADDRESS:2305',
+ "gitaly_address" => 'tls://PRAEFECT_LOADBALANCER_HOST:2305',
"gitaly_token" => 'PRAEFECT_EXTERNAL_TOKEN'
}
})
@@ -544,7 +616,7 @@ To configure Praefect with TLS:
repositories:
storages:
default:
- gitaly_address: tls://LOAD_BALANCER_SERVER_ADDRESS:3305
+ gitaly_address: tls://PRAEFECT_LOADBALANCER_HOST:3305
path: /some/local/path
```
@@ -817,7 +889,7 @@ Particular attention should be shown to:
You need to replace:
- - `LOAD_BALANCER_SERVER_ADDRESS` with the IP address or hostname of the load
+ - `PRAEFECT_LOADBALANCER_HOST` with the IP address or hostname of the load
balancer.
- `PRAEFECT_EXTERNAL_TOKEN` with the real secret
@@ -826,7 +898,7 @@ Particular attention should be shown to:
```ruby
git_data_dirs({
"default" => {
- "gitaly_address" => "tcp://LOAD_BALANCER_SERVER_ADDRESS:2305",
+ "gitaly_address" => "tcp://PRAEFECT_LOADBALANCER_HOST:2305",
"gitaly_token" => 'PRAEFECT_EXTERNAL_TOKEN'
}
})
@@ -926,7 +998,7 @@ For example:
git_data_dirs({
'default' => { 'gitaly_address' => 'tcp://old-gitaly.internal:8075' },
'cluster' => {
- 'gitaly_address' => 'tcp://<load_balancer_server_address>:2305',
+ 'gitaly_address' => 'tcp://<PRAEFECT_LOADBALANCER_HOST>:2305',
'gitaly_token' => '<praefect_external_token>'
}
})
@@ -981,6 +1053,26 @@ To get started quickly:
Congratulations! You've configured an observable fault-tolerant Praefect
cluster.
+## Network connectivity requirements
+
+Gitaly Cluster components need to communicate with each other over many routes.
+Your firewall rules must allow the following for Gitaly Cluster to function properly:
+
+| From | To | Default port / TLS port |
+|:-----------------------|:------------------------|:------------------------|
+| GitLab | Praefect load balancer | `2305` / `3305` |
+| Praefect load balancer | Praefect | `2305` / `3305` |
+| Praefect | Gitaly | `8075` / `9999` |
+| Gitaly | GitLab (internal API) | `80` / `443` |
+| Gitaly | Praefect load balancer | `2305` / `3305` |
+| Gitaly | Praefect | `2305` / `3305` |
+| Gitaly | Gitaly | `8075` / `9999` |
+
+NOTE:
+Gitaly does not directly connect to Praefect. However, requests from Gitaly to the Praefect
+load balancer may still be blocked unless firewalls on the Praefect nodes allow traffic from
+the Gitaly nodes.
+
## Distributed reads
> - Introduced in GitLab 13.1 in [beta](https://about.gitlab.com/handbook/product/gitlab-the-product/#alpha-beta-ga) with feature flag `gitaly_distributed_reads` set to disabled.
@@ -1147,24 +1239,30 @@ The `per_repository` election strategy solves this problem by electing a primary
repository. Combined with [configurable replication factors](#configure-replication-factor), you can
horizontally scale storage capacity and distribute write load across Gitaly nodes.
-Primary elections are run when:
+Primary elections are run:
-- Praefect starts up.
-- The cluster's consensus of a Gitaly node's health changes.
+- In GitLab 14.1 and later, lazily. This means that Praefect doesn't immediately elect
+ a new primary node if the current one is unhealthy. A new primary is elected if it is
+ necessary to serve a request while the current primary is unavailable.
+- In GitLab 13.12 to GitLab 14.0 when:
+ - Praefect starts up.
+ - The cluster's consensus of a Gitaly node's health changes.
-A Gitaly node is considered:
+A valid primary node candidate is a Gitaly node that:
-- Healthy if `>=50%` Praefect nodes have successfully health checked the Gitaly node in the
- previous ten seconds.
-- Unhealthy otherwise.
+- Is healthy. A Gitaly node is considered healthy if `>=50%` Praefect nodes have
+ successfully health checked the Gitaly node in the previous ten seconds.
+- Has a fully up to date copy of the repository.
-During an election run, Praefect elects a new primary Gitaly node for each repository that has
-an unhealthy primary Gitaly node. The election is made:
+If there are multiple primary node candidates, Praefect:
-- Randomly from healthy secondary Gitaly nodes that are the most up to date.
-- Only from Gitaly nodes assigned to the host repository.
+- Picks one of them randomly.
+- Prioritizes promoting a Gitaly node that is assigned to host the repository. If
+ there are no assigned Gitaly nodes to elect as the primary, Praefect may temporarily
+ elect an unassigned one. The unassigned primary is demoted in favor of an assigned
+ one when one becomes available.
-If there are no healthy secondary nodes for a repository:
+If there are no valid primary candidates for a repository:
- The unhealthy primary node is demoted and the repository is left without a primary node.
- Operations that require a primary node fail until a primary is successfully elected.
@@ -1212,7 +1310,7 @@ To migrate existing clusters:
- If downtime is unacceptable:
- 1. Determine which Gitaly node is [the current primary](index.md#determine-primary-gitaly-node).
+ 1. Determine which Gitaly node is [the current primary](troubleshooting.md#determine-primary-gitaly-node).
1. Comment out the secondary Gitaly nodes from the virtual storage's configuration in `/etc/gitlab/gitlab.rb`
on all Praefect nodes. This ensures there's only one Gitaly node configured, causing both of the election
@@ -1259,23 +1357,37 @@ Migrate to [repository-specific primary nodes](#repository-specific-primary-node
Gitaly Cluster recovers from a failing primary Gitaly node by promoting a healthy secondary as the
new primary.
-To minimize data loss, Gitaly Cluster:
+In GitLab 14.1 and later, Gitaly Cluster:
+
+- Elects a healthy secondary with a fully up to date copy of the repository as the new primary.
+- Repository becomes unavailable if there are no fully up to date copies of it on healthy secondaries.
+
+To minimize data loss in GitLab 13.0 to 14.0, Gitaly Cluster:
- Switches repositories that are outdated on the new primary to [read-only mode](#read-only-mode).
-- Elects the secondary with the least unreplicated writes from the primary to be the new primary.
- Because there can still be some unreplicated writes, [data loss can occur](#check-for-data-loss).
+- Elects the secondary with the least unreplicated writes from the primary to be the new
+ primary. Because there can still be some unreplicated writes,
+ [data loss can occur](#check-for-data-loss).
### Read-only mode
> - Introduced in GitLab 13.0 as [generally available](https://about.gitlab.com/handbook/product/gitlab-the-product/#generally-available-ga).
> - Between GitLab 13.0 and GitLab 13.2, read-only mode applied to the whole virtual storage and occurred whenever failover occurred.
> - [In GitLab 13.3 and later](https://gitlab.com/gitlab-org/gitaly/-/issues/2862), read-only mode applies on a per-repository basis and only occurs if a new primary is out of date.
+new primary. If the failed primary contained unreplicated writes, [data loss can occur](#check-for-data-loss).
+> - Removed in GitLab 14.1. Instead, repositories [become unavailable](#unavailable-repositories).
+
+In GitLab 13.0 to 14.0, when Gitaly Cluster switches to a new primary, repositories enter
+read-only mode if they are out of date. This can happen after failing over to an outdated
+secondary. Read-only mode eases data recovery efforts by preventing writes that may conflict
+with the unreplicated writes on other nodes.
-When Gitaly Cluster switches to a new primary, repositories enter read-only mode if they are out of
-date. This can happen after failing over to an outdated secondary. Read-only mode eases data
-recovery efforts by preventing writes that may conflict with the unreplicated writes on other nodes.
+When Gitaly Cluster switches to a new primary In GitLab 13.0 to 14.0, repositories enter
+read-only mode if they are out of date. This can happen after failing over to an outdated
+secondary. Read-only mode eases data recovery efforts by preventing writes that may conflict
+with the unreplicated writes on other nodes.
-To enable writes again, an administrator can:
+To enable writes again in GitLab 13.0 to 14.0, an administrator can:
1. [Check](#check-for-data-loss) for data loss.
1. Attempt to [recover](#data-recovery) missing data.
@@ -1283,21 +1395,38 @@ To enable writes again, an administrator can:
[accept data loss](#enable-writes-or-accept-data-loss) if necessary, depending on the version of
GitLab.
+## Unavailable repositories
+
+> - From GitLab 13.0 through 14.0, repositories became read-only if they were outdated on the primary but fully up to date on a healthy secondary. `dataloss` sub-command displays read-only repositories by default through these versions.
+> - Since GitLab 14.1, Praefect contains more responsive failover logic which immediately fails over to one of the fully up to date secondaries rather than placing the repository in read-only mode. Since GitLab 14.1, the `dataloss` sub-command displays repositories which are unavailable due to having no fully up to date copies on healthy Gitaly nodes.
+
+A repository is unavailable if all of its up to date replicas are unavailable. Unavailable repositories are
+not accessible through Praefect to prevent serving stale data that may break automated tooling.
+
### Check for data loss
-The Praefect `dataloss` sub-command identifies replicas that are likely to be outdated. This can help
-identify potential data loss after a failover. The following parameters are
-available:
+The Praefect `dataloss` subcommand identifies:
+
+- Copies of repositories in GitLab 13.0 to GitLab 14.0 that at are likely to be outdated.
+ This can help identify potential data loss after a failover.
+- Repositories in GitLab 14.1 and later that are unavailable. This helps identify potential
+ data loss and repositories which are no longer accessible because all of their up-to-date
+ replicas copies are unavailable.
+
+The following parameters are available:
-- `-virtual-storage` that specifies which virtual storage to check. The default behavior is to
- display outdated replicas of read-only repositories as they might require administrator action.
-- In GitLab 13.3 and later, `-partially-replicated` that specifies whether to display a list of
- [outdated replicas of writable repositories](#outdated-replicas-of-writable-repositories).
+- `-virtual-storage` that specifies which virtual storage to check. Because they might require
+ an administrator to intervene, the default behavior is to display:
+ - In GitLab 13.0 to 14.0, copies of read-only repositories.
+ - In GitLab 14.1 and later, unavailable repositories.
+- In GitLab 14.1 and later, [`-partially-unavailable`](#unavailable-replicas-of-available-repositories)
+ that specifies whether to include in the output repositories that are available but have
+ some assigned copies that are not available.
NOTE:
`dataloss` is still in beta and the output format is subject to change.
-To check for repositories with outdated primaries, run:
+To check for repositories with outdated primaries or for unavailable repositories, run:
```shell
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dataloss [-virtual-storage <virtual-storage>]
@@ -1309,13 +1438,20 @@ Every configured virtual storage is checked if none is specified:
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dataloss
```
-Repositories which have assigned storage nodes that contain an outdated copy of the repository are listed
-in the output. This information is printed for each repository:
+Repositories are listed in the output that have either:
+
+- An outdated copy of the repository on the primary, in GitLab 13.0 to GitLab 14.0.
+- No healthy and fully up-to-date copies available, in GitLab 14.1 and later.
+
+The following information is printed for each repository:
- A repository's relative path to the storage directory identifies each repository and groups the related
information.
-- The repository's current status is printed in parentheses next to the disk path. If the repository's primary
- is outdated, the repository is in `read-only` mode and can't accept writes. Otherwise, the mode is `writable`.
+- The repository's current status is printed in parentheses next to the disk path:
+ - In GitLab 13.0 to 14.0, either `(read-only)` if the repository's primary node is outdated
+ and can't accept writes. Otherwise, `(writable)`.
+ - In GitLab 14.1 and later, `(unavailable)` is printed next to the disk path if the
+ repository is unavailable.
- The primary field lists the repository's current primary. If the repository has no primary, the field shows
`No Primary`.
- The In-Sync Storages lists replicas which have replicated the latest successful write and all writes
@@ -1325,44 +1461,51 @@ in the output. This information is printed for each repository:
is listed next to replica. It's important to notice that the outdated replicas may be fully up to date or contain
later changes but Praefect can't guarantee it.
-Whether a replica is assigned to host the repository is listed with each replica's status. `assigned host` is printed
-next to replicas which are assigned to store the repository. The text is omitted if the replica contains a copy of
-the repository but is not assigned to store the repository. Such replicas aren't kept in-sync by Praefect, but may
-act as replication sources to bring assigned replicas up to date.
+Additional information includes:
+
+- Whether a node is assigned to host the repository is listed with each node's status.
+ `assigned host` is printed next to nodes that are assigned to store the repository. The
+ text is omitted if the node contains a copy of the repository but is not assigned to store
+ the repository. Such copies aren't kept in sync by Praefect, but may act as replication
+ sources to bring assigned copies up to date.
+- In GitLab 14.1 and later, `unhealthy` is printed next to the copies that are located
+ on unhealthy Gitaly nodes.
Example output:
```shell
Virtual storage: default
Outdated repositories:
- @hashed/3f/db/3fdba35f04dc8c462986c992bcf875546257113072a909c162f7e470e581e278.git (read-only):
+ @hashed/3f/db/3fdba35f04dc8c462986c992bcf875546257113072a909c162f7e470e581e278.git (unavailable):
Primary: gitaly-1
In-Sync Storages:
- gitaly-2, assigned host
+ gitaly-2, assigned host, unhealthy
Outdated Storages:
gitaly-1 is behind by 3 changes or less, assigned host
gitaly-3 is behind by 3 changes or less
```
-A confirmation is printed out when every repository is writable. For example:
+A confirmation is printed out when every repository is available. For example:
```shell
Virtual storage: default
- All repositories are writable!
+ All repositories are available!
```
-#### Outdated replicas of writable repositories
+#### Unavailable replicas of available repositories
-> [Introduced](https://gitlab.com/gitlab-org/gitaly/-/issues/3019) in GitLab 13.3.
+NOTE:
+In GitLab 14.0 and earlier, the flag is `-partially-replicated` and the output shows any repositories with assigned nodes with outdated
+copies.
-To also list information of repositories whose primary is up to date but one or more assigned
-replicas are outdated, use the `-partially-replicated` flag.
+To also list information of repositories which are available but are unavailable from some of the assigned nodes,
+use the `-partially-unavailable` flag.
-A repository is writable if the primary has the latest changes. Secondaries might be temporarily
-outdated while they are waiting to replicate the latest changes.
+A repository is available if there is a healthy, up to date replica available. Some of the assigned secondary
+replicas may be temporarily unavailable for access while they are waiting to replicate the latest changes.
```shell
-sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dataloss [-virtual-storage <virtual-storage>] [-partially-replicated]
+sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dataloss [-virtual-storage <virtual-storage>] [-partially-unavailable]
```
Example output:
@@ -1370,7 +1513,7 @@ Example output:
```shell
Virtual storage: default
Outdated repositories:
- @hashed/3f/db/3fdba35f04dc8c462986c992bcf875546257113072a909c162f7e470e581e278.git (writable):
+ @hashed/3f/db/3fdba35f04dc8c462986c992bcf875546257113072a909c162f7e470e581e278.git:
Primary: gitaly-1
In-Sync Storages:
gitaly-1, assigned host
@@ -1379,14 +1522,14 @@ Virtual storage: default
gitaly-3 is behind by 3 changes or less
```
-With the `-partially-replicated` flag set, a confirmation is printed out if every assigned replica is fully up to
-date.
+With the `-partially-unavailable` flag set, a confirmation is printed out if every assigned replica is fully up to
+date and healthy.
For example:
```shell
Virtual storage: default
- All repositories are up to date!
+ All repositories are fully available on all assigned storages!
```
### Check repository checksums
@@ -1394,30 +1537,50 @@ Virtual storage: default
To check a project's repository checksums across on all Gitaly nodes, run the
[replicas Rake task](../raketasks/praefect.md#replica-checksums) on the main GitLab node.
+### Accept data loss
+
+WARNING:
+`accept-dataloss` causes permanent data loss by overwriting other versions of the repository. Data
+[recovery efforts](#data-recovery) must be performed before using it.
+
+If it is not possible to bring one of the up to date replicas back online, you may have to accept data
+loss. When accepting data loss, Praefect marks the chosen replica of the repository as the latest version
+and replicates it to the other assigned Gitaly nodes. This process overwrites any other version of the
+repository so care must be taken.
+
+```shell
+sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml accept-dataloss
+-virtual-storage <virtual-storage> -repository <relative-path> -authoritative-storage <storage-name>
+```
+
### Enable writes or accept data loss
-Praefect provides the following sub-commands to re-enable writes:
+WARNING:
+`accept-dataloss` causes permanent data loss by overwriting other versions of the repository.
+Data [recovery efforts](#data-recovery) must be performed before using it.
-- In GitLab 13.2 and earlier, `enable-writes` to re-enable virtual storage for writes after data
- recovery attempts.
+Praefect provides the following subcommands to re-enable writes or accept data loss:
- ```shell
- sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml enable-writes -virtual-storage <virtual-storage>
- ```
+- In GitLab 13.2 and earlier, `enable-writes` to re-enable virtual storage for writes after
+ data recovery attempts:
-- [In GitLab 13.3](https://gitlab.com/gitlab-org/gitaly/-/merge_requests/2415) and later,
- `accept-dataloss` to accept data loss and re-enable writes for repositories after data recovery
- attempts have failed. Accepting data loss causes current version of the repository on the
- authoritative storage to be considered latest. Other storages are brought up to date with the
- authoritative storage by scheduling replication jobs.
+ ```shell
+ sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml enable-writes -virtual-storage <virtual-storage>
+ ```
+
+- In GitLab 13.3 and later, if it is not possible to bring one of the up to date nodes back
+ online, you may have to accept data loss:
```shell
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml accept-dataloss -virtual-storage <virtual-storage> -repository <relative-path> -authoritative-storage <storage-name>
```
-WARNING:
-`accept-dataloss` causes permanent data loss by overwriting other versions of the repository. Data
-[recovery efforts](#data-recovery) must be performed before using it.
+ When accepting data loss, Praefect:
+
+ 1. Marks the chosen copy of the repository as the latest version.
+ 1. Replicates the copy to the other assigned Gitaly nodes.
+
+ This process overwrites any other copy of the repository so care must be taken.
## Data recovery
@@ -1463,10 +1626,7 @@ praefect['reconciliation_scheduling_interval'] = '0' # disable the feature
### Manual reconciliation
WARNING:
-The `reconcile` sub-command is deprecated and scheduled for removal in GitLab 14.0. Use
-[automatic reconciliation](#automatic-reconciliation) instead. Manual reconciliation may
-produce excess replication jobs and is limited in functionality. Manual reconciliation does
-not work when [repository-specific primary nodes](#repository-specific-primary-nodes) are
+The `reconcile` sub-command was removed in GitLab 14.1. Use [automatic reconciliation](#automatic-reconciliation) instead. Manual reconciliation may produce excess replication jobs and is limited in functionality. Manual reconciliation does not work when [repository-specific primary nodes](#repository-specific-primary-nodes) are
enabled.
The Praefect `reconcile` sub-command allows for the manual reconciliation between two Gitaly nodes. The
@@ -1509,7 +1669,7 @@ After creating and configuring Gitaly Cluster:
1. Ensure all storages are accessible to the GitLab instance. In this example, these are
`<original_storage_name>` and `<cluster_storage_name>`.
1. [Configure repository storage weights](../repository_storage_paths.md#configure-where-new-repositories-are-stored)
- so that the Gitaly Cluster receives all new projects. This stops new projects being created
+ so that the Gitaly Cluster receives all new projects. This stops new projects from being created
on existing Gitaly nodes while the migration is in progress.
1. Schedule repository moves for:
- [Projects](#bulk-schedule-project-moves).
diff --git a/doc/administration/gitaly/troubleshooting.md b/doc/administration/gitaly/troubleshooting.md
new file mode 100644
index 00000000000..ab6f493cf0f
--- /dev/null
+++ b/doc/administration/gitaly/troubleshooting.md
@@ -0,0 +1,372 @@
+---
+stage: Create
+group: Gitaly
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+type: reference
+---
+
+# Troubleshooting Gitaly and Gitaly Cluster **(FREE SELF)**
+
+Refer to the information below when troubleshooting Gitaly and Gitaly Cluster.
+
+Before troubleshooting, see the Gitaly and Gitaly Cluster
+[frequently asked questions](faq.md).
+
+## Troubleshoot Gitaly
+
+The following sections provide possible solutions to Gitaly errors.
+
+See also [Gitaly timeout](../../user/admin_area/settings/gitaly_timeouts.md) settings.
+
+### Check versions when using standalone Gitaly servers
+
+When using standalone Gitaly servers, you must make sure they are the same version
+as GitLab to ensure full compatibility:
+
+1. On the top bar, select **Menu >** **{admin}** **Admin** on your GitLab instance.
+1. On the left sidebar, select **Overview > Gitaly Servers**.
+1. Confirm all Gitaly servers indicate that they are up to date.
+
+### Use `gitaly-debug`
+
+The `gitaly-debug` command provides "production debugging" tools for Gitaly and Git
+performance. It is intended to help production engineers and support
+engineers investigate Gitaly performance problems.
+
+If you're using GitLab 11.6 or newer, this tool should be installed on
+your GitLab or Gitaly server already at `/opt/gitlab/embedded/bin/gitaly-debug`.
+If you're investigating an older GitLab version you can compile this
+tool offline and copy the executable to your server:
+
+```shell
+git clone https://gitlab.com/gitlab-org/gitaly.git
+cd cmd/gitaly-debug
+GOOS=linux GOARCH=amd64 go build -o gitaly-debug
+```
+
+To see the help page of `gitaly-debug` for a list of supported sub-commands, run:
+
+```shell
+gitaly-debug -h
+```
+
+### Commits, pushes, and clones return a 401
+
+```plaintext
+remote: GitLab: 401 Unauthorized
+```
+
+You need to sync your `gitlab-secrets.json` file with your GitLab
+application nodes.
+
+### Client side gRPC logs
+
+Gitaly uses the [gRPC](https://grpc.io/) RPC framework. The Ruby gRPC
+client has its own log file which may contain useful information when
+you are seeing Gitaly errors. You can control the log level of the
+gRPC client with the `GRPC_LOG_LEVEL` environment variable. The
+default level is `WARN`.
+
+You can run a gRPC trace with:
+
+```shell
+sudo GRPC_TRACE=all GRPC_VERBOSITY=DEBUG gitlab-rake gitlab:gitaly:check
+```
+
+### Server side gRPC logs
+
+gRPC tracing can also be enabled in Gitaly itself with the `GODEBUG=http2debug`
+environment variable. To set this in an Omnibus GitLab install:
+
+1. Add the following to your `gitlab.rb` file:
+
+ ```ruby
+ gitaly['env'] = {
+ "GODEBUG=http2debug" => "2"
+ }
+ ```
+
+1. [Reconfigure](../restart_gitlab.md#omnibus-gitlab-reconfigure) GitLab.
+
+### Correlating Git processes with RPCs
+
+Sometimes you need to find out which Gitaly RPC created a particular Git process.
+
+One method for doing this is by using `DEBUG` logging. However, this needs to be enabled
+ahead of time and the logs produced are quite verbose.
+
+A lightweight method for doing this correlation is by inspecting the environment
+of the Git process (using its `PID`) and looking at the `CORRELATION_ID` variable:
+
+```shell
+PID=<Git process ID>
+sudo cat /proc/$PID/environ | tr '\0' '\n' | grep ^CORRELATION_ID=
+```
+
+This method isn't reliable for `git cat-file` processes, because Gitaly
+internally pools and re-uses those across RPCs.
+
+### Observing `gitaly-ruby` traffic
+
+[`gitaly-ruby`](configure_gitaly.md#gitaly-ruby) is an internal implementation detail of Gitaly,
+so, there's not that much visibility into what goes on inside
+`gitaly-ruby` processes.
+
+If you have Prometheus set up to scrape your Gitaly process, you can see
+request rates and error codes for individual RPCs in `gitaly-ruby` by
+querying `grpc_client_handled_total`.
+
+- In theory, this metric does not differentiate between `gitaly-ruby` and other RPCs.
+- In practice from GitLab 11.9, all gRPC calls made by Gitaly itself are internal calls from the
+ main Gitaly process to one of its `gitaly-ruby` sidecars.
+
+Assuming your `grpc_client_handled_total` counter only observes Gitaly,
+the following query shows you RPCs are (most likely) internally
+implemented as calls to `gitaly-ruby`:
+
+```prometheus
+sum(rate(grpc_client_handled_total[5m])) by (grpc_method) > 0
+```
+
+### Repository changes fail with a `401 Unauthorized` error
+
+If you run Gitaly on its own server and notice these conditions:
+
+- Users can successfully clone and fetch repositories by using both SSH and HTTPS.
+- Users can't push to repositories, or receive a `401 Unauthorized` message when attempting to
+ make changes to them in the web UI.
+
+Gitaly may be failing to authenticate with the Gitaly client because it has the
+[wrong secrets file](configure_gitaly.md#configure-gitaly-servers).
+
+Confirm the following are all true:
+
+- When any user performs a `git push` to any repository on this Gitaly server, it
+ fails with a `401 Unauthorized` error:
+
+ ```shell
+ remote: GitLab: 401 Unauthorized
+ To <REMOTE_URL>
+ ! [remote rejected] branch-name -> branch-name (pre-receive hook declined)
+ error: failed to push some refs to '<REMOTE_URL>'
+ ```
+
+- When any user adds or modifies a file from the repository using the GitLab
+ UI, it immediately fails with a red `401 Unauthorized` banner.
+- Creating a new project and [initializing it with a README](../../user/project/working_with_projects.md#blank-projects)
+ successfully creates the project but doesn't create the README.
+- When [tailing the logs](https://docs.gitlab.com/omnibus/settings/logs.html#tail-logs-in-a-console-on-the-server)
+ on a Gitaly client and reproducing the error, you get `401` errors
+ when reaching the [`/api/v4/internal/allowed`](../../development/internal_api.md) endpoint:
+
+ ```shell
+ # api_json.log
+ {
+ "time": "2019-07-18T00:30:14.967Z",
+ "severity": "INFO",
+ "duration": 0.57,
+ "db": 0,
+ "view": 0.57,
+ "status": 401,
+ "method": "POST",
+ "path": "\/api\/v4\/internal\/allowed",
+ "params": [
+ {
+ "key": "action",
+ "value": "git-receive-pack"
+ },
+ {
+ "key": "changes",
+ "value": "REDACTED"
+ },
+ {
+ "key": "gl_repository",
+ "value": "REDACTED"
+ },
+ {
+ "key": "project",
+ "value": "\/path\/to\/project.git"
+ },
+ {
+ "key": "protocol",
+ "value": "web"
+ },
+ {
+ "key": "env",
+ "value": "{\"GIT_ALTERNATE_OBJECT_DIRECTORIES\":[],\"GIT_ALTERNATE_OBJECT_DIRECTORIES_RELATIVE\":[],\"GIT_OBJECT_DIRECTORY\":null,\"GIT_OBJECT_DIRECTORY_RELATIVE\":null}"
+ },
+ {
+ "key": "user_id",
+ "value": "2"
+ },
+ {
+ "key": "secret_token",
+ "value": "[FILTERED]"
+ }
+ ],
+ "host": "gitlab.example.com",
+ "ip": "REDACTED",
+ "ua": "Ruby",
+ "route": "\/api\/:version\/internal\/allowed",
+ "queue_duration": 4.24,
+ "gitaly_calls": 0,
+ "gitaly_duration": 0,
+ "correlation_id": "XPUZqTukaP3"
+ }
+
+ # nginx_access.log
+ [IP] - - [18/Jul/2019:00:30:14 +0000] "POST /api/v4/internal/allowed HTTP/1.1" 401 30 "" "Ruby"
+ ```
+
+To fix this problem, confirm that your [`gitlab-secrets.json` file](configure_gitaly.md#configure-gitaly-servers)
+on the Gitaly server matches the one on Gitaly client. If it doesn't match,
+update the secrets file on the Gitaly server to match the Gitaly client, then
+[reconfigure](../restart_gitlab.md#omnibus-gitlab-reconfigure).
+
+### Command line tools cannot connect to Gitaly
+
+gRPC cannot reach your Gitaly server if:
+
+- You can't connect to a Gitaly server with command-line tools.
+- Certain actions result in a `14: Connect Failed` error message.
+
+Verify you can reach Gitaly by using TCP:
+
+```shell
+sudo gitlab-rake gitlab:tcp_check[GITALY_SERVER_IP,GITALY_LISTEN_PORT]
+```
+
+If the TCP connection:
+
+- Fails, check your network settings and your firewall rules.
+- Succeeds, your networking and firewall rules are correct.
+
+If you use proxy servers in your command line environment such as Bash, these can interfere with
+your gRPC traffic.
+
+If you use Bash or a compatible command line environment, run the following commands to determine
+whether you have proxy servers configured:
+
+```shell
+echo $http_proxy
+echo $https_proxy
+```
+
+If either of these variables have a value, your Gitaly CLI connections may be getting routed through
+a proxy which cannot connect to Gitaly.
+
+To remove the proxy setting, run the following commands (depending on which variables had values):
+
+```shell
+unset http_proxy
+unset https_proxy
+```
+
+### Permission denied errors appearing in Gitaly or Praefect logs when accessing repositories
+
+You might see the following in Gitaly and Praefect logs:
+
+```shell
+{
+ ...
+ "error":"rpc error: code = PermissionDenied desc = permission denied",
+ "grpc.code":"PermissionDenied",
+ "grpc.meta.client_name":"gitlab-web",
+ "grpc.request.fullMethod":"/gitaly.ServerService/ServerInfo",
+ "level":"warning",
+ "msg":"finished unary call with code PermissionDenied",
+ ...
+}
+```
+
+This is a GRPC call
+[error response code](https://grpc.github.io/grpc/core/md_doc_statuscodes.html).
+
+If this error occurs, even though
+[the Gitaly auth tokens are set up correctly](#praefect-errors-in-logs),
+it's likely that the Gitaly servers are experiencing
+[clock drift](https://en.wikipedia.org/wiki/Clock_drift).
+
+Ensure the Gitaly clients and servers are synchronized, and use an NTP time
+server to keep them synchronized.
+
+### Gitaly not listening on new address after reconfiguring
+
+When updating the `gitaly['listen_addr']` or `gitaly['prometheus_listen_addr']` values, Gitaly may
+continue to listen on the old address after a `sudo gitlab-ctl reconfigure`.
+
+When this occurs, run `sudo gitlab-ctl restart` to resolve the issue. This should no longer be
+necessary because [this issue](https://gitlab.com/gitlab-org/gitaly/-/issues/2521) is resolved.
+
+### Permission denied errors appearing in Gitaly logs when accessing repositories from a standalone Gitaly node
+
+If this error occurs even though file permissions are correct, it's likely that the Gitaly node is
+experiencing [clock drift](https://en.wikipedia.org/wiki/Clock_drift).
+
+Please ensure that the GitLab and Gitaly nodes are synchronized and use an NTP time
+server to keep them synchronized if possible.
+
+## Troubleshoot Praefect (Gitaly Cluster)
+
+The following sections provide possible solutions to Gitaly Cluster errors.
+
+### Praefect errors in logs
+
+If you receive an error, check `/var/log/gitlab/gitlab-rails/production.log`.
+
+Here are common errors and potential causes:
+
+- 500 response code
+ - **ActionView::Template::Error (7:permission denied)**
+ - `praefect['auth_token']` and `gitlab_rails['gitaly_token']` do not match on the GitLab server.
+ - **Unable to save project. Error: 7:permission denied**
+ - Secret token in `praefect['storage_nodes']` on GitLab server does not match the
+ value in `gitaly['auth_token']` on one or more Gitaly servers.
+- 503 response code
+ - **GRPC::Unavailable (14:failed to connect to all addresses)**
+ - GitLab was unable to reach Praefect.
+ - **GRPC::Unavailable (14:all SubCons are in TransientFailure...)**
+ - Praefect cannot reach one or more of its child Gitaly nodes. Try running
+ the Praefect connection checker to diagnose.
+
+### Determine primary Gitaly node
+
+To determine the current primary Gitaly node for a specific Praefect node:
+
+- Use the `Shard Primary Election` [Grafana chart](praefect.md#grafana) on the
+ [`Gitlab Omnibus - Praefect` dashboard](https://gitlab.com/gitlab-org/grafana-dashboards/-/blob/master/omnibus/praefect.json).
+ This is recommended.
+- If you do not have Grafana set up, use the following command on each host of each
+ Praefect node:
+
+ ```shell
+ curl localhost:9652/metrics | grep gitaly_praefect_primaries`
+ ```
+
+### Relation does not exist errors
+
+By default Praefect database tables are created automatically by `gitlab-ctl reconfigure` task.
+
+However, the Praefect database tables are not created on initial reconfigure and can throw
+errors that relations do not exist if either:
+
+- The `gitlab-ctl reconfigure` command isn't executed.
+- There are errors during the execution.
+
+For example:
+
+- `ERROR: relation "node_status" does not exist at character 13`
+- `ERROR: relation "replication_queue_lock" does not exist at character 40`
+- This error:
+
+ ```json
+ {"level":"error","msg":"Error updating node: pq: relation \"node_status\" does not exist","pid":210882,"praefectName":"gitlab1x4m:0.0.0.0:2305","time":"2021-04-01T19:26:19.473Z","virtual_storage":"praefect-cluster-1"}
+ ```
+
+To solve this, the database schema migration can be done using `sql-migrate` sub-command of
+the `praefect` command:
+
+```shell
+$ sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml sql-migrate
+praefect sql-migrate: OK (applied 21 migrations)
+```