author: GitLab Bot <gitlab-bot@gitlab.com> 2021-02-18 10:34:06 +0000
committer: GitLab Bot <gitlab-bot@gitlab.com> 2021-02-18 10:34:06 +0000
commit: 859a6fb938bb9ee2a317c46dfa4fcc1af49608f0
tree: d7f2700abe6b4ffcb2dcfc80631b2d87d0609239 /doc/administration/gitaly
parent: 446d496a6d000c73a304be52587cd9bbc7493136
Add latest changes from gitlab-org/gitlab@13-9-stable-ee (tag: v13.9.0-rc42)
Diffstat (limited to 'doc/administration/gitaly')
-rw-r--r-- doc/administration/gitaly/index.md | 119
-rw-r--r-- doc/administration/gitaly/praefect.md | 241
-rw-r--r-- doc/administration/gitaly/reference.md | 14
3 files changed, 234 insertions, 140 deletions
diff --git a/doc/administration/gitaly/index.md b/doc/administration/gitaly/index.md
index 9577fb40abe..f02b9b8fc1a 100644
--- a/doc/administration/gitaly/index.md
+++ b/doc/administration/gitaly/index.md
@@ -19,7 +19,7 @@ In the Gitaly documentation:
- [GitLab Shell](https://gitlab.com/gitlab-org/gitlab-shell).
- [GitLab Workhorse](https://gitlab.com/gitlab-org/gitlab-workhorse).
-GitLab end users do not have direct access to Gitaly. Gitaly only manages Git
+GitLab end users do not have direct access to Gitaly. Gitaly manages only Git
repository access for GitLab. Other types of GitLab data aren't accessed using Gitaly.
<!-- vale gitlab.FutureTense = NO -->
@@ -40,7 +40,7 @@ The following is a high-level architecture overview of how Gitaly is used.
## Configure Gitaly
-The Gitaly service itself is configured via a [TOML configuration file](reference.md).
+The Gitaly service itself is configured by using a [TOML configuration file](reference.md).
To change Gitaly settings:
@@ -91,8 +91,8 @@ When running Gitaly on its own server, note the following regarding GitLab versi
- From GitLab 11.4, Gitaly was able to serve all Git requests without requiring a shared NFS mount
for Git repository data, except for the
[Elasticsearch indexer](https://gitlab.com/gitlab-org/gitlab-elasticsearch-indexer).
-- From GitLab 11.8, the Elasticsearch indexer uses Gitaly for data access as well. NFS can still be
- leveraged for redundancy on block-level Git data, but only has to be mounted on the Gitaly
+- From GitLab 11.8, the Elasticsearch indexer also uses Gitaly for data access. NFS can still be
+ leveraged for redundancy on block-level Git data, but should be mounted only on the Gitaly
servers.
- From GitLab 11.8 to 12.2, it is possible to use Elasticsearch in a Gitaly setup that doesn't use
NFS. To use Elasticsearch in these versions, the
@@ -121,7 +121,7 @@ The following list depicts the network architecture of Gitaly:
- GitLab Shell.
- Elasticsearch indexer.
- Gitaly itself.
-- A Gitaly server must be able to make RPC calls **to itself** via its own
+- A Gitaly server must be able to make RPC calls **to itself** by using its own
`(Gitaly address, Gitaly token)` pair as specified in `/config/gitlab.yml`.
- Authentication is done through a static token which is shared among the Gitaly and GitLab Rails
nodes.
@@ -497,16 +497,16 @@ gitaly['certificate_path'] = "/etc/gitlab/ssl/cert.pem"
gitaly['key_path'] = "/etc/gitlab/ssl/key.pem"
```
-`path` can only be included for storage shards on the local Gitaly server.
+`path` can be included only for storage shards on the local Gitaly server.
If it's excluded, the default Git storage directory is used for that storage shard.
### Disable Gitaly where not required (optional)
-If you are running Gitaly [as a remote service](#run-gitaly-on-its-own-server), you may want to
-disable the local Gitaly service that runs on your GitLab server by default and have it only running
-where required.
+If you run Gitaly [as a remote service](#run-gitaly-on-its-own-server), consider
+disabling the local Gitaly service that runs on your GitLab server by default, and run it
+only where required.
-Disabling Gitaly on the GitLab instance only makes sense when you run GitLab in a custom cluster configuration, where
+Disabling Gitaly on the GitLab instance makes sense only when you run GitLab in a custom cluster configuration, where
Gitaly runs on a separate machine from the GitLab instance. Disabling Gitaly on all machines in the cluster is not
a valid configuration (some machines must act as Gitaly servers).
@@ -538,7 +538,7 @@ To disable Gitaly on a GitLab server:
> - [Introduced](https://gitlab.com/gitlab-org/gitaly/-/issues/3160) in GitLab 13.6, outgoing TLS connections to GitLab provide client certificates if configured.
Gitaly supports TLS encryption. To communicate with a Gitaly instance that listens for secure
-connections, you must use `tls://` URL scheme in the `gitaly_address` of the corresponding
+connections, use the `tls://` URL scheme in the `gitaly_address` of the corresponding
storage entry in the GitLab configuration.
Gitaly provides the same server certificates as client certificates in TLS
@@ -724,7 +724,7 @@ Gitaly Go process. Some examples of things that are implemented in `gitaly-ruby`
We recommend:
-- At least 300MB memory per worker.
+- At least 300 MB memory per worker.
- No more than one worker per core.
NOTE:
@@ -752,7 +752,7 @@ settings:
gitaly['ruby_num_workers'] = 4
```
-1. Save the file and [reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure).
+1. Save the file, and then [reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure).
**For installations from source**
@@ -810,9 +810,42 @@ You can observe the behavior of this queue using the Gitaly logs and Prometheus:
- `gitaly_rate_limiting_seconds`.
NOTE:
-Though the name of the Prometheus metric contains `rate_limiting`, it is a concurrency limiter, not
-a rate limiter. If a Gitaly client makes 1000 requests in a row very quickly, concurrency does not
-exceed 1 and the concurrency limiter has no effect.
+Although the name of the Prometheus metric contains `rate_limiting`, it's a concurrency limiter, not
+a rate limiter. If a Gitaly client makes 1,000 requests in a row very quickly, concurrency doesn't
+exceed 1, and the concurrency limiter has no effect.
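
The difference between a concurrency limiter and a rate limiter can be illustrated with a minimal Ruby sketch. The `ConcurrencyTracker` class and its limit of 2 are hypothetical and are not Gitaly's implementation. The sketch only shows that requests issued one after another never overlap, so the limit is never reached.

```ruby
# Hypothetical illustration; this is not Gitaly's implementation.
class ConcurrencyTracker
  attr_reader :peak

  def initialize(limit)
    @limit = limit
    @in_flight = 0
    @peak = 0
    @mutex = Mutex.new
    @slot_freed = ConditionVariable.new
  end

  # Runs the block once a slot is free and always releases the slot afterwards.
  def handle
    acquire
    begin
      yield
    ensure
      release
    end
  end

  private

  # Waits for a free slot, then records the highest number of
  # requests that were ever in flight at the same time.
  def acquire
    @mutex.synchronize do
      @slot_freed.wait(@mutex) while @in_flight >= @limit
      @in_flight += 1
      @peak = [@peak, @in_flight].max
    end
  end

  def release
    @mutex.synchronize do
      @in_flight -= 1
      @slot_freed.signal
    end
  end
end

tracker = ConcurrencyTracker.new(2)
# 1,000 requests made one after another by a single client.
1000.times { tracker.handle { :ok } }
puts tracker.peak # prints 1
```

Because each request finishes before the next one starts, the peak stays at 1 regardless of how quickly the requests arrive.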
+
+## Background Repository Optimization
+
+Empty directories and unneeded config settings may accumulate in a repository and
+slow down Git operations. Gitaly can schedule a daily background task with a maximum duration
+to clean up these items and improve performance.
+
+WARNING:
+This is an experimental feature and may place significant load on the host while running.
+Make sure to schedule this during off-peak hours and keep the duration short (for example, 30-60 minutes).
+
+**For Omnibus GitLab**
+
+Edit `/etc/gitlab/gitlab.rb` and add:
+
+```ruby
+gitaly['daily_maintenance_start_hour'] = 4
+gitaly['daily_maintenance_start_minute'] = 30
+gitaly['daily_maintenance_duration'] = '30m'
+gitaly['daily_maintenance_storages'] = ["default"]
+```
+
+**For installations from source**
+
+Edit `/home/git/gitaly/config.toml` and add:
+
+```toml
+[daily_maintenance]
+start_hour = 4
+start_minute = 30
+duration = '30m'
+storages = ["default"]
+```
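
To sanity-check a schedule before applying it, the window implied by these settings can be computed with a short Ruby sketch. The `maintenance_window` helper is hypothetical and is not part of Gitaly or Omnibus GitLab. It assumes the duration is written as a whole number followed by `m` or `h`, which covers the `'30m'` value shown above but not every duration format Gitaly might accept.

```ruby
# Hypothetical helper, not part of Gitaly or Omnibus GitLab.
# Assumption: only "<number>m" and "<number>h" durations are supported.
def maintenance_window(start_hour, start_minute, duration)
  match = /\A(\d+)([mh])\z/.match(duration)
  raise ArgumentError, "unsupported duration: #{duration}" unless match

  minutes = match[1].to_i * (match[2] == 'h' ? 60 : 1)
  start_total = start_hour * 60 + start_minute
  # Wrap around midnight so late-night windows are reported correctly.
  end_total = (start_total + minutes) % (24 * 60)
  format('%02d:%02d-%02d:%02d', start_hour, start_minute, end_total / 60, end_total % 60)
end

puts maintenance_window(4, 30, '30m') # prints 04:30-05:00
```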
## Rotate Gitaly authentication token
@@ -847,7 +880,7 @@ see something like this:
{enforced="true",status="ok"} 4424.985419441742
```
-There may also be other numbers with rate 0. We only care about the non-zero numbers.
+There may also be other numbers with rate 0. We care only about the non-zero numbers.
The only non-zero number should have `enforced="true",status="ok"`. If you have other non-zero
numbers, something is wrong in your configuration.
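
The check described above can be sketched in Ruby. The sample data is hypothetical apart from the rate shown in the output above, and `unexpected_nonzero` is an illustrative helper, not a GitLab or Prometheus API.

```ruby
# Hypothetical sample data: label set => request rate.
rates = {
  { enforced: 'true', status: 'ok' } => 4424.985419441742,
  { enforced: 'true', status: 'denied' } => 0.0,
  { enforced: 'false', status: 'would be ok' } => 0.0
}

# Returns the label sets that have a non-zero rate but are not the expected one.
def unexpected_nonzero(rates, expected)
  rates.select { |labels, rate| rate > 0 && labels != expected }.keys
end

expected = { enforced: 'true', status: 'ok' }
problems = unexpected_nonzero(rates, expected)
puts problems.empty? ? 'token configuration looks consistent' : "check: #{problems.inspect}"
```

Any label set returned by `unexpected_nonzero` points at clients or servers that are still using a different token or enforcement setting.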
@@ -906,7 +939,7 @@ After the new token is set, and all services involved have been restarted, you w
- `status="would be ok"`.
- `status="denied"`.
-After the new token has been picked up by all Gitaly clients and Gitaly servers, the
+After the new token is picked up by all Gitaly clients and Gitaly servers, the
**only non-zero rate** should be `enforced="false",status="would be ok"`.
### Disable "auth transitioning" mode
@@ -935,12 +968,13 @@ Note that `enforced="true"` means that authentication is being enforced.
## Direct Git access bypassing Gitaly
-While it is possible to access Gitaly repositories stored on disk directly with a Git client,
-it is not advisable because Gitaly is being continuously improved and changed. Theses improvements may invalidate assumptions, resulting in performance degradation, instability, and even data loss.
+GitLab doesn't advise directly accessing Gitaly repositories stored on disk with
+a Git client, because Gitaly is being continuously improved and changed. These
+improvements may invalidate assumptions, resulting in performance degradation, instability, and even data loss.
Gitaly has optimizations, such as the
[`info/refs` advertisement cache](https://gitlab.com/gitlab-org/gitaly/blob/master/doc/design_diskcache.md),
-that rely on Gitaly controlling and monitoring access to repositories via the
+that rely on Gitaly controlling and monitoring access to repositories by using the
official gRPC interface. Likewise, Praefect has optimizations, such as fault
tolerance and distributed reads, that depend on the gRPC interface and
database to determine repository state.
@@ -979,11 +1013,11 @@ lookup. Even when Gitaly is able to re-use an already-running `git` process (for
a commit), you still have:
- The cost of a network roundtrip to Gitaly.
-- Within Gitaly, a write/read roundtrip on the Unix pipes that connect Gitaly to the `git` process.
+- Inside Gitaly, a write/read roundtrip on the Unix pipes that connect Gitaly to the `git` process.
Using GitLab.com to measure, we reduced the number of Gitaly calls per request until the loss of
Rugged's efficiency was no longer felt. It also helped that we run Gitaly itself directly on the Git
-file servers, rather than via NFS mounts. This gave us a speed boost that counteracted the negative
+file servers, rather than by using NFS mounts. This gave us a speed boost that counteracted the negative
effect of not using Rugged anymore.
Unfortunately, other deployments of GitLab could not remove NFS like we did on GitLab.com, and they
@@ -1018,7 +1052,7 @@ The result of these checks is cached.
To see if GitLab can access the repository file system directly, we use the following heuristic:
- Gitaly ensures that the file system has a metadata file in its root with a UUID in it.
-- Gitaly reports this UUID to GitLab via the `ServerInfo` RPC.
+- Gitaly reports this UUID to GitLab by using the `ServerInfo` RPC.
- GitLab Rails tries to read the metadata file directly. If it exists, and if the UUIDs match,
assume we have direct access.
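
The heuristic above can be sketched in Ruby as follows. The metadata file name (`.gitaly-metadata`), the JSON key (`gitaly_filesystem_id`), and the `direct_access?` helper are assumptions made for illustration. The text above states only that a metadata file with a UUID exists in the storage root.

```ruby
require 'json'
require 'tmpdir'

# Assumptions: the file name and JSON key are illustrative placeholders.
METADATA_FILE = '.gitaly-metadata'
UUID_KEY = 'gitaly_filesystem_id'

# Returns true only if the metadata file is readable locally and its UUID
# matches the UUID that Gitaly reported over RPC.
def direct_access?(storage_root, reported_uuid)
  path = File.join(storage_root, METADATA_FILE)
  return false unless File.file?(path)

  JSON.parse(File.read(path))[UUID_KEY] == reported_uuid
rescue JSON::ParserError
  false
end

Dir.mktmpdir do |root|
  File.write(File.join(root, METADATA_FILE), JSON.generate({ UUID_KEY => 'abc-123' }))
  puts direct_access?(root, 'abc-123') # prints true
  puts direct_access?(root, 'other')   # prints false
end
```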
@@ -1085,7 +1119,7 @@ app nodes).
### Client side gRPC logs
Gitaly uses the [gRPC](https://grpc.io/) RPC framework. The Ruby gRPC
-client has its own log file which may contain useful information when
+client has its own log file which may contain debugging information when
you are seeing Gitaly errors. You can control the log level of the
gRPC client with the `GRPC_LOG_LEVEL` environment variable. The
default level is `WARN`.
@@ -1100,7 +1134,7 @@ sudo GRPC_TRACE=all GRPC_VERBOSITY=DEBUG gitlab-rake gitlab:gitaly:check
Sometimes you need to find out which Gitaly RPC created a particular Git process.
-One method for doing this is via `DEBUG` logging. However, this needs to be enabled
+One method for doing this is by using `DEBUG` logging. However, this needs to be enabled
ahead of time and the logs produced are quite verbose.
A lightweight method for doing this correlation is by inspecting the environment
@@ -1111,7 +1145,7 @@ PID=<Git process ID>
sudo cat /proc/$PID/environ | tr '\0' '\n' | grep ^CORRELATION_ID=
```
-Please note that this method is not reliable for `git cat-file` processes because Gitaly
+This method isn't reliable for `git cat-file` processes, because Gitaly
internally pools and re-uses those across RPCs.
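
The same lookup can be sketched in Ruby, which may be convenient when scripting over several processes. The `correlation_id` helper and the sample string are hypothetical. On a Linux host, the input would come from reading `/proc/<pid>/environ`, which usually requires root privileges.

```ruby
# Extracts CORRELATION_ID from the NUL-separated contents of an environ file.
def correlation_id(environ)
  environ.split("\0").each do |entry|
    key, value = entry.split('=', 2)
    return value if key == 'CORRELATION_ID'
  end
  nil
end

# Hypothetical sample in the same format as /proc/<pid>/environ.
sample = "PATH=/usr/bin\0CORRELATION_ID=example-id\0GIT_DIR=/tmp/repo\0"
puts correlation_id(sample) # prints example-id
```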
### Observing `gitaly-ruby` traffic
@@ -1127,7 +1161,7 @@ not differentiate between `gitaly-ruby` and other RPCs, but in practice
(as of GitLab 11.9), all gRPC calls made by Gitaly itself are internal
calls from the main Gitaly process to one of its `gitaly-ruby` sidecars.
-Assuming your `grpc_client_handled_total` counter only observes Gitaly,
+Assuming your `grpc_client_handled_total` counter observes only Gitaly,
the following query shows you which RPCs are (most likely) internally
implemented as calls to `gitaly-ruby`:
@@ -1137,16 +1171,19 @@ sum(rate(grpc_client_handled_total[5m])) by (grpc_method) > 0
### Repository changes fail with a `401 Unauthorized` error
-If you're running Gitaly on its own server and notice that users can
-successfully clone and fetch repositories (via both SSH and HTTPS), but can't
-push to them or make changes to the repository in the web UI without getting a
-`401 Unauthorized` message, then it's possible Gitaly is failing to authenticate
-with the Gitaly client due to having the [wrong secrets file](#configure-gitaly-servers).
+If you run Gitaly on its own server and notice these conditions:
+
+- Users can successfully clone and fetch repositories by using both SSH and HTTPS.
+- Users can't push to repositories, or receive a `401 Unauthorized` message when attempting to
+ make changes to them in the web UI.
+
+Gitaly may be failing to authenticate with the Gitaly client because it has the
+[wrong secrets file](#configure-gitaly-servers).
Confirm the following are all true:
- When any user performs a `git push` to any repository on this Gitaly server, it
- fails with the following error (note the `401 Unauthorized`):
+ fails with a `401 Unauthorized` error:
```shell
remote: GitLab: 401 Unauthorized
@@ -1157,8 +1194,8 @@ Confirm the following are all true:
- When any user adds or modifies a file from the repository using the GitLab
UI, it immediately fails with a red `401 Unauthorized` banner.
-- Creating a new project and [initializing it with a README](../../gitlab-basics/create-project.md#blank-projects)
- successfully creates the project but doesn't create the README.
+- Creating a new project and [initializing it with a README](../../user/project/working_with_projects.md#blank-projects)
+ successfully creates the project, but doesn't create the README.
- When [tailing the logs](https://docs.gitlab.com/omnibus/settings/logs.html#tail-logs-in-a-console-on-the-server)
on a Gitaly client and reproducing the error, you get `401` errors
when reaching the `/api/v4/internal/allowed` endpoint:
@@ -1229,11 +1266,11 @@ update the secrets file on the Gitaly server to match the Gitaly client, then
### Command line tools cannot connect to Gitaly
-If you are having trouble connecting to a Gitaly server with command line (CLI) tools,
+If you can't connect to a Gitaly server with command line (CLI) tools,
and certain actions result in a `14: Connect Failed` error message,
-it means that gRPC cannot reach your Gitaly server.
+gRPC cannot reach your Gitaly server.
-Verify that you can reach Gitaly via TCP:
+Verify you can reach Gitaly by using TCP:
```shell
sudo gitlab-rake gitlab:tcp_check[GITALY_SERVER_IP,GITALY_LISTEN_PORT]
@@ -1269,8 +1306,8 @@ If this error occurs even though file permissions are correct, it's likely that
the Gitaly server is experiencing
[clock drift](https://en.wikipedia.org/wiki/Clock_drift).
-Please ensure that the Gitaly clients and servers are synchronized and use an NTP time
-server to keep them synchronized if possible.
+Ensure the Gitaly clients and servers are synchronized, and use an NTP time
+server to keep them synchronized, if possible.
### Praefect
diff --git a/doc/administration/gitaly/praefect.md b/doc/administration/gitaly/praefect.md
index fe8b3e5f566..45f478b8d16 100644
--- a/doc/administration/gitaly/praefect.md
+++ b/doc/administration/gitaly/praefect.md
@@ -5,12 +5,12 @@ info: To determine the technical writer assigned to the Stage/Group associated w
type: reference
---
-# Gitaly Cluster **(CORE ONLY)**
+# Gitaly Cluster **(FREE SELF)**
[Gitaly](index.md), the service that provides storage for Git repositories, can
be run in a clustered configuration to increase fault tolerance. In this
configuration, every Git repository is stored on every Gitaly node in the
-cluster. Multiple clusters (or shards) can be configured.
+cluster. Multiple clusters (or storage shards) can be configured.
NOTE:
Technical support for Gitaly clusters is limited to GitLab Premium and Ultimate
@@ -21,7 +21,7 @@ component for running a Gitaly Cluster.
![Architecture diagram](img/praefect_architecture_v12_10.png)
-Using a Gitaly Cluster increase fault tolerance by:
+Using a Gitaly Cluster increases fault tolerance by:
- Replicating write operations to warm standby Gitaly nodes.
- Detecting Gitaly node failures.
@@ -53,7 +53,7 @@ Gitaly Cluster supports:
- Reporting of possible data loss if replication queue is non-empty.
- Marking repositories as [read only](#read-only-mode) if data loss is detected to prevent data inconsistencies.
-Follow the [HA Gitaly epic](https://gitlab.com/groups/gitlab-org/-/epics/1489)
+Follow the [Gitaly Cluster epic](https://gitlab.com/groups/gitlab-org/-/epics/1489)
for improvements including
[horizontally distributing reads](https://gitlab.com/groups/gitlab-org/-/epics/2013).
@@ -65,7 +65,7 @@ Gitaly Cluster and [Geo](../geo/index.md) both provide redundancy. However the r
not aware when Gitaly Cluster is used.
- Geo provides [replication](../geo/index.md) and [disaster recovery](../geo/disaster_recovery/index.md) for
an entire instance of GitLab. Users know when they are using Geo for
- [replication](../geo/index.md). Geo [replicates multiple datatypes](../geo/replication/datatypes.md#limitations-on-replicationverification),
+ [replication](../geo/index.md). Geo [replicates multiple data types](../geo/replication/datatypes.md#limitations-on-replicationverification),
including Git data.
The following table outlines the major differences between Gitaly Cluster and Geo:
@@ -80,23 +80,65 @@ For more information, see:
- [Gitaly architecture](index.md#architecture).
- Geo [use cases](../geo/index.md#use-cases) and [architecture](../geo/index.md#architecture).
-## Cluster or shard
+## Where Gitaly Cluster fits
+
+GitLab accesses [repositories](../../user/project/repository/index.md) through the configured
+[repository storages](../repository_storage_paths.md). Each new repository is stored on one of the
+repository storages based on their configured weights. Each repository storage is either:
+
+- A Gitaly storage served directly by Gitaly. These map to a directory on the file system of a
+ Gitaly node.
+- A [virtual storage](#virtual-storage-or-direct-gitaly-storage) served by Praefect. A virtual
+ storage is a cluster of Gitaly storages that appear as a single repository storage.
+
+Virtual storages are a feature of Gitaly Cluster. They support replicating the repositories to
+multiple storages for fault tolerance. Virtual storages can improve performance by distributing
+requests across Gitaly nodes. Their distributed nature makes it viable to have a single repository
+storage in GitLab to simplify repository management.
+
+## Components of Gitaly Cluster
+
+Gitaly Cluster consists of multiple components:
+
+- [Load balancer](#load-balancer) for distributing requests and providing fault-tolerant access to
+ Praefect nodes.
+- [Praefect](#praefect) nodes for managing the cluster and routing requests to Gitaly nodes.
+- [PostgreSQL database](#postgresql) for persisting cluster metadata and [PgBouncer](#pgbouncer),
+ recommended for pooling Praefect's database connections.
+- [Gitaly](index.md) nodes to provide repository storage and Git access.
+
+![Cluster example](img/cluster_example_v13_3.png)
+
+In this example:
+
+- Repositories are stored on a virtual storage called `storage-1`.
+- Three Gitaly nodes provide `storage-1` access: `gitaly-1`, `gitaly-2`, and `gitaly-3`.
+- The three Gitaly nodes store data on their file systems.
+
+### Virtual storage or direct Gitaly storage
Gitaly supports multiple models of scaling:
- Clustering using Gitaly Cluster, where each repository is stored on multiple Gitaly nodes in the
cluster. Read requests are distributed between repository replicas and write requests are
- broadcast to repository replicas.
-- Sharding using [repository storage paths](../repository_storage_paths.md), where each repository
- is stored on the assigned Gitaly node. All requests are routed to this node.
+ broadcast to repository replicas. GitLab accesses virtual storage.
+- Direct access to Gitaly storage using [repository storage paths](../repository_storage_paths.md),
+ where each repository is stored on the assigned Gitaly node. All requests are routed to this node.
+
+The following is Gitaly set up to use direct access to Gitaly instead of Gitaly Cluster:
+
+![Shard example](img/shard_example_v13_3.png)
-| Cluster | Shard |
-|:--------------------------------------------------|:----------------------------------------------|
-| ![Cluster example](img/cluster_example_v13_3.png) | ![Shard example](img/shard_example_v13_3.png) |
+In this example:
-Generally, Gitaly Cluster can replace sharded configurations, at the expense of additional storage
-needed to store each repository on multiple Gitaly nodes. The benefit of using Gitaly Cluster over
-sharding is:
+- Each repository is stored on one of three Gitaly storages: `storage-1`, `storage-2`,
+ or `storage-3`.
+- Each storage is serviced by a Gitaly node.
+- The three Gitaly nodes store data in three separate hashed storage locations.
+
+Generally, virtual storage with Gitaly Cluster can replace direct Gitaly storage configurations, at
+the expense of additional storage needed to store each repository on multiple Gitaly nodes. The
+benefit of using Gitaly Cluster over direct Gitaly storage is:
- Improved fault tolerance, because each Gitaly node has a copy of every repository.
- Improved resource utilization, reducing the need for over-provisioning for shard-specific peak
@@ -105,7 +147,7 @@ sharding is:
replicas.
- Simpler management, because all Gitaly nodes are identical.
-Under some workloads, CPU and memory requirements may require a large fleet of Gitaly nodes and it
+Under some workloads, CPU and memory requirements may require a large fleet of Gitaly nodes. It
can be uneconomical to have a one-to-one replication factor.
A hybrid approach can be used in these instances, where each shard is configured as a smaller
@@ -157,18 +199,18 @@ You need the IP/host address for each node.
1. `LOAD_BALANCER_SERVER_ADDRESS`: the IP/host address of the load balancer
1. `POSTGRESQL_SERVER_ADDRESS`: the IP/host address of the PostgreSQL server
1. `PRAEFECT_HOST`: the IP/host address of the Praefect server
-1. `GITALY_HOST`: the IP/host address of each Gitaly server
+1. `GITALY_HOST_*`: the IP or host address of each Gitaly server
1. `GITLAB_HOST`: the IP/host address of the GitLab server
If you are using a cloud provider, you can look up the addresses for each server through your cloud provider's management console.
-If you are using Google Cloud Platform, SoftLayer, or any other vendor that provides a virtual private cloud (VPC) you can use the private addresses for each cloud instance (corresponds to “internal address” for Google Cloud Platform) for `PRAEFECT_HOST`, `GITALY_HOST`, and `GITLAB_HOST`.
+If you are using Google Cloud Platform, SoftLayer, or any other vendor that provides a virtual private cloud (VPC) you can use the private addresses for each cloud instance (corresponds to “internal address” for Google Cloud Platform) for `PRAEFECT_HOST`, `GITALY_HOST_*`, and `GITLAB_HOST`.
#### Secrets
The communication between components is secured with different secrets, which
are described below. Before you begin, generate a unique secret for each, and
-make note of it. This makes it easy to replace these placeholder tokens
+make note of it. This enables you to replace these placeholder tokens
with secure tokens as you complete the setup process.
1. `GITLAB_SHELL_SECRET_TOKEN`: this is used by Git hooks to make callback HTTP
@@ -260,13 +302,12 @@ this, set the corresponding IP or host address of the PgBouncer instance in
- `praefect['database_port']`, for the port.
Although PgBouncer manages resources more efficiently, Praefect still requires a
-direct connection to the PostgreSQL database because it uses
+direct connection to the PostgreSQL database. It uses the
[LISTEN](https://www.postgresql.org/docs/11/sql-listen.html)
-functionality that is [not supported](https://www.pgbouncer.org/features.html) by
+feature that is [not supported](https://www.pgbouncer.org/features.html) by
PgBouncer with `pool_mode = transaction`.
-
-Therefore, `praefect['database_host_no_proxy']` and `praefect['database_port_no_proxy']`
-should be set to a direct connection and not a PgBouncer connection.
+Set `praefect['database_host_no_proxy']` and `praefect['database_port_no_proxy']`
+to a direct connection, and not a PgBouncer connection.
Save the changes to `/etc/gitlab/gitlab.rb` and
[reconfigure Praefect](../restart_gitlab.md#omnibus-gitlab-reconfigure).
@@ -424,32 +465,43 @@ application server, or a Gitaly node.
Praefect when communicating with Gitaly nodes in the cluster. This token is
distinct from the `PRAEFECT_EXTERNAL_TOKEN`.
- Replace `GITALY_HOST` with the IP/host address of the each Gitaly node.
+ Replace `GITALY_HOST_*` with the IP or host address of each Gitaly node.
More Gitaly nodes can be added to the cluster to increase the number of
replicas. More clusters can also be added for very large GitLab instances.
+ NOTE:
+ When adding additional Gitaly nodes to a virtual storage, all storage names
+ within that virtual storage must be unique. Additionally, all Gitaly node
+ addresses referenced in the Praefect configuration must be unique.
+
```ruby
# Name of storage hash must match storage name in git_data_dirs on GitLab
- # server ('praefect') and in git_data_dirs on Gitaly nodes ('gitaly-1')
+ # server ('default') and in git_data_dirs on Gitaly nodes ('gitaly-1')
praefect['virtual_storages'] = {
'default' => {
- 'gitaly-1' => {
- 'address' => 'tcp://GITALY_HOST:8075',
- 'token' => 'PRAEFECT_INTERNAL_TOKEN',
- },
- 'gitaly-2' => {
- 'address' => 'tcp://GITALY_HOST:8075',
- 'token' => 'PRAEFECT_INTERNAL_TOKEN'
- },
- 'gitaly-3' => {
- 'address' => 'tcp://GITALY_HOST:8075',
- 'token' => 'PRAEFECT_INTERNAL_TOKEN'
+ 'nodes' => {
+ 'gitaly-1' => {
+ 'address' => 'tcp://GITALY_HOST_1:8075',
+ 'token' => 'PRAEFECT_INTERNAL_TOKEN',
+ },
+ 'gitaly-2' => {
+ 'address' => 'tcp://GITALY_HOST_2:8075',
+ 'token' => 'PRAEFECT_INTERNAL_TOKEN'
+ },
+ 'gitaly-3' => {
+ 'address' => 'tcp://GITALY_HOST_3:8075',
+ 'token' => 'PRAEFECT_INTERNAL_TOKEN'
+ }
}
}
}
```
+ NOTE:
+ In [GitLab 13.8 and earlier](https://gitlab.com/gitlab-org/omnibus-gitlab/-/merge_requests/4988),
+ Gitaly nodes were configured directly under the virtual storage, and not under the `nodes` key.
+
1. [Introduced](https://gitlab.com/groups/gitlab-org/-/epics/2013) in GitLab 13.1 and later, enable [distribution of reads](#distributed-reads).
1. Save the changes to `/etc/gitlab/gitlab.rb` and [reconfigure
@@ -640,7 +692,7 @@ because we rely on Praefect to route operations correctly.
Particular attention should be shown to:
- The `gitaly['auth_token']` configured in this section must match the `token`
- value under `praefect['virtual_storages']` on the Praefect node. This was set
+ value under `praefect['virtual_storages']['nodes']` on the Praefect node. This was set
in the [previous section](#praefect). This document uses the placeholder
`PRAEFECT_INTERNAL_TOKEN` throughout.
- The storage names in `git_data_dirs` configured in this section must match the
@@ -774,7 +826,7 @@ configuration.
### Load Balancer
-In a highly available Gitaly configuration, a load balancer is needed to route
+In a fault-tolerant Gitaly configuration, a load balancer is needed to route
internal traffic from the GitLab application to the Praefect nodes. The
specifics on which load balancer to use or the exact configuration is beyond the
scope of the GitLab documentation.
@@ -786,7 +838,7 @@ addition to the GitLab nodes. Some requests handled by
process. `gitaly-ruby` uses the Gitaly address set in the GitLab server's
`git_data_dirs` setting to make this connection.
-We hope that if you’re managing HA systems like GitLab, you have a load balancer
+We hope that if you’re managing fault-tolerant systems like GitLab, you have a load balancer
of choice already. Some examples include [HAProxy](https://www.haproxy.org/)
(open-source), [Google Internal Load Balancer](https://cloud.google.com/load-balancing/docs/internal/),
[AWS Elastic Load Balancer](https://aws.amazon.com/elasticloadbalancing/), F5
@@ -878,7 +930,7 @@ Particular attention should be shown to:
You need to replace:
- `PRAEFECT_HOST` with the IP address or hostname of the Praefect node
- - `GITALY_HOST` with the IP address or hostname of each Gitaly node
+ - `GITALY_HOST_*` with the IP address or hostname of each Gitaly node
```ruby
prometheus['scrape_configs'] = [
@@ -896,9 +948,9 @@ Particular attention should be shown to:
'job_name' => 'praefect-gitaly',
'static_configs' => [
'targets' => [
- 'GITALY_HOST:9236', # gitaly-1
- 'GITALY_HOST:9236', # gitaly-2
- 'GITALY_HOST:9236', # gitaly-3
+ 'GITALY_HOST_1:9236', # gitaly-1
+ 'GITALY_HOST_2:9236', # gitaly-2
+ 'GITALY_HOST_3:9236', # gitaly-3
]
]
}
@@ -960,14 +1012,14 @@ To get started quickly:
gitlab-ctl reconfigure
```
-1. Set the Grafana admin password. This command prompts you to enter a new
+1. Set the Grafana administrator password. This command prompts you to enter a new
password:
```shell
gitlab-ctl set-grafana-password
```
-1. In your web browser, open `/-/grafana` (e.g.
+1. In your web browser, open `/-/grafana` (such as
`https://gitlab.example.com/-/grafana`) on your GitLab server.
Log in using the password you set, and the username `admin`.
@@ -975,7 +1027,7 @@ To get started quickly:
1. Go to **Explore** and query `gitlab_build_info` to verify that you are
getting metrics from all your machines.
-Congratulations! You've configured an observable highly available Praefect
+Congratulations! You've configured an observable fault-tolerant Praefect
cluster.
## Distributed reads
@@ -983,18 +1035,12 @@ cluster.
> - Introduced in GitLab 13.1 in [beta](https://about.gitlab.com/handbook/product/gitlab-the-product/#alpha-beta-ga) with feature flag `gitaly_distributed_reads` set to disabled.
> - [Made generally available and enabled by default](https://gitlab.com/gitlab-org/gitaly/-/issues/2951) in GitLab 13.3.
> - [Disabled by default](https://gitlab.com/gitlab-org/gitaly/-/issues/3178) in GitLab 13.5.
+> - [Enabled by default](https://gitlab.com/gitlab-org/gitaly/-/issues/3334) in GitLab 13.8.
Praefect supports distribution of read operations across Gitaly nodes that are
configured for the virtual node.
-The feature is disabled by default. To enable distributed reads, the `gitaly_distributed_reads`
-[feature flag](../feature_flags.md) must be enabled in a Ruby console:
-
-```ruby
-Feature.enable(:gitaly_distributed_reads)
-```
-
-If enabled, all RPCs marked with `ACCESSOR` option like
+All RPCs marked with `ACCESSOR` option like
[GetBlob](https://gitlab.com/gitlab-org/gitaly/-/blob/v12.10.6/proto/blob.proto#L16)
are redirected to an up to date and healthy Gitaly node.
@@ -1025,9 +1071,8 @@ Praefect guarantees eventual consistency by replicating all writes to secondary
after the write to the primary Gitaly node has happened.
Praefect can instead provide strong consistency by creating a transaction and writing
-changes to all Gitaly nodes at once. Strong consistency is currently in
-[alpha](https://about.gitlab.com/handbook/product/gitlab-the-product/#alpha-beta-ga) and not enabled by
-default. If enabled, transactions are only available for a subset of RPCs. For more
+changes to all Gitaly nodes at once.
+If enabled, transactions are only available for a subset of RPCs. For more
information, see the [strong consistency epic](https://gitlab.com/groups/gitlab-org/-/epics/1189).
To enable strong consistency:
@@ -1077,7 +1122,7 @@ replication factor offers better redundancy and distribution of read workload, b
in a higher storage cost. By default, Praefect replicates repositories to every storage in a
virtual storage.
-### Variable replication factor
+### Configure replication factors
WARNING:
The feature is not production ready yet. After you set a replication factor, you can't unset it
@@ -1088,36 +1133,46 @@ strategy is not production ready yet.
Praefect supports configuring a replication factor on a per-repository basis, by assigning
specific storage nodes to host a repository.
-[In an upcoming release](https://gitlab.com/gitlab-org/gitaly/-/issues/3362), we intend to
-support configuring a default replication factor for a virtual storage. The default replication factor
-is applied to every newly-created repository.
-
-Prafect does not store the actual replication factor, but assigns enough storages to host the repository
+Praefect does not store the actual replication factor, but assigns enough storages to host the repository
so the desired replication factor is met. If a storage node is later removed from the virtual storage,
the replication factor of repositories assigned to the storage is decreased accordingly.
-The only way to configure a repository's replication factor is the `set-replication-factor`
-sub-command. `set-replication-factor` automatically assigns or unassigns random storage nodes as necessary to
-reach the desired replication factor. The repository's primary node is always assigned
-first and is never unassigned.
+You can configure:
-```shell
-sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml set-replication-factor -virtual-storage <virtual-storage> -repository <relative-path> -replication-factor <replication-factor>
-```
+- A default replication factor for each virtual storage that is applied to newly created repositories.
+ The configuration is added to the `/etc/gitlab/gitlab.rb` file:
-- `-virtual-storage` is the virtual storage the repository is located in.
-- `-repository` is the repository's relative path in the storage.
-- `-replication-factor` is the desired replication factor of the repository. The minimum value is
- `1`, as the primary needs a copy of the repository. The maximum replication factor is the number of
- storages in the virtual storage.
+ ```ruby
+ praefect['virtual_storages'] = {
+ 'default' => {
+ 'default_replication_factor' => 1,
+ # ...
+ }
+ }
+ ```
-On success, the assigned host storages are printed. For example:
+- A replication factor for an existing repository using the `set-replication-factor` sub-command.
+ `set-replication-factor` automatically assigns or unassigns random storage nodes as
+ necessary to reach the desired replication factor. The repository's primary node is
+ always assigned first and is never unassigned.
-```shell
-$ sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml set-replication-factor -virtual-storage default -repository @hashed/3f/db/3fdba35f04dc8c462986c992bcf875546257113072a909c162f7e470e581e278.git -replication-factor 2
+ ```shell
+ sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml set-replication-factor -virtual-storage <virtual-storage> -repository <relative-path> -replication-factor <replication-factor>
+ ```
-current assignments: gitaly-1, gitaly-2
-```
+ - `-virtual-storage` is the virtual storage the repository is located in.
+ - `-repository` is the repository's relative path in the storage.
+ - `-replication-factor` is the desired replication factor of the repository. The minimum value is
+ `1`, as the primary needs a copy of the repository. The maximum replication factor is the number of
+ storages in the virtual storage.
+
+ On success, the assigned host storages are printed. For example:
+
+ ```shell
+ $ sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml set-replication-factor -virtual-storage default -repository @hashed/3f/db/3fdba35f04dc8c462986c992bcf875546257113072a909c162f7e470e581e278.git -replication-factor 2
+
+ current assignments: gitaly-1, gitaly-2
+ ```
## Automatic failover and leader election
@@ -1171,8 +1226,8 @@ To enable writes again, an administrator can:
### Check for data loss
-The Praefect `dataloss` sub-command identifies replicas that are likely to be outdated. This is
-useful for identifying potential data loss after a failover. The following parameters are
+The Praefect `dataloss` sub-command identifies replicas that are likely to be outdated. This can help
+identify potential data loss after a failover. The following parameters are
available:
- `-virtual-storage` that specifies which virtual storage to check. The default behavior is to
@@ -1196,7 +1251,7 @@ sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.t
```
Repositories which have assigned storage nodes that contain an outdated copy of the repository are listed
-in the output. A number of useful information is printed for each repository:
+in the output. This information is printed for each repository:
- A repository's relative path to the storage directory identifies each repository and groups the related
information.
@@ -1213,7 +1268,7 @@ in the output. A number of useful information is printed for each repository:
Whether a replica is assigned to host the repository is listed with each replica's status. `assigned host` is printed
next to replicas which are assigned to store the repository. The text is omitted if the replica contains a copy of
-the repository but is not assigned to store the repository. Such replicas won't be kept in-sync by Praefect but may
+the repository but is not assigned to store the repository. Such replicas aren't kept in sync by Praefect, but may
act as replication sources to bring assigned replicas up to date.
Example output:
@@ -1282,7 +1337,7 @@ To check a project's repository checksums across on all Gitaly nodes, run the
### Enable writes or accept data loss
-Praefect provides the following subcommands to re-enable writes:
+Praefect provides the following sub-commands to re-enable writes:
- In GitLab 13.2 and earlier, `enable-writes` to re-enable virtual storage for writes after data
recovery attempts.
@@ -1324,7 +1379,7 @@ These tools reconcile the outdated repositories to bring them fully up to date a
Praefect automatically reconciles repositories that are not up to date. By default, this is done every
five minutes. For each outdated repository on a healthy Gitaly node, the Praefect picks a
-random, fully up to date replica of the repository on another healthy Gitaly node to replicate from. A
+random, fully up-to-date replica of the repository on another healthy Gitaly node to replicate from. A
replication job is scheduled only if there are no other replication jobs pending for the target
repository.
@@ -1363,10 +1418,10 @@ sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.t
If your GitLab instance already has repositories on single Gitaly nodes, these aren't migrated to
Gitaly Cluster automatically.
-Project repositories may be moved from one storage location using the [Project repository storage moves API](../../api/project_repository_storage_moves.md):
+Project repositories may be moved from one storage location to another using the [Project repository storage moves API](../../api/project_repository_storage_moves.md). This API cannot move all repository types. To move other repository types, see:
-NOTE:
-The Project repository storage moves API [cannot move all repository types](../../api/project_repository_storage_moves.md#limitations).
+- [Snippet repository storage moves API](../../api/snippet_repository_storage_moves.md).
+- [Group repository storage moves API](../../api/group_repository_storage_moves.md).
To move repositories to Gitaly Cluster:
@@ -1383,11 +1438,13 @@ To move repositories to Gitaly Cluster:
- The moves are in progress. Re-query the repository move until it completes successfully.
- The moves have failed. Most failures are temporary and are solved by rescheduling the move.
-1. Once the moves are complete, [query projects](../../api/projects.md#list-all-projects)
+1. After the moves are complete, [query projects](../../api/projects.md#list-all-projects)
using the API to confirm that all projects have moved. No projects should be returned
with `repository_storage` field set to the old storage.
-In a similar way, you can move Snippet repositories using the [Snippet repository storage moves API](../../api/snippet_repository_storage_moves.md):
+In a similar way, you can move other repository types by using the
+[Snippet repository storage moves API](../../api/snippet_repository_storage_moves.md) **(FREE SELF)**
+or the [Group repository storage moves API](../../api/group_repository_storage_moves.md) **(PREMIUM SELF)**.
## Debugging Praefect
diff --git a/doc/administration/gitaly/reference.md b/doc/administration/gitaly/reference.md
index 5a004d97220..5105b9ed0d4 100644
--- a/doc/administration/gitaly/reference.md
+++ b/doc/administration/gitaly/reference.md
@@ -95,13 +95,13 @@ key_path = '/home/git/key.pem'
### Storage
-GitLab repositories are grouped into directories known as "storages"
-(e.g., `/home/git/repositories`) containing bare repositories managed
-by GitLab with names (e.g., `default`).
+GitLab repositories are grouped into directories known as storages, such as
+`/home/git/repositories`. They contain bare repositories managed
+by GitLab with names, such as `default`.
These names and paths are also defined in the `gitlab.yml` configuration file of
-GitLab. When you run Gitaly on the same machine as GitLab, which is the default
-and recommended configuration, storage paths defined in Gitaly's `config.toml`
+GitLab. When you run Gitaly on the same machine as GitLab (the default
+and recommended configuration), storage paths defined in Gitaly's `config.toml`
must match those in `gitlab.yml`.
| Name | Type | Required | Description |
@@ -232,9 +232,9 @@ The following values configure logging in Gitaly under the `[logging]` section.
| ---- | ---- | -------- | ----------- |
| `format` | string | no | Log format: `text` or `json`. Default: `text`. |
| `level` | string | no | Log level: `debug`, `info`, `warn`, `error`, `fatal`, or `panic`. Default: `info`. |
-| `sentry_dsn` | string | no | Sentry DSN for exception monitoring. |
+| `sentry_dsn` | string | no | Sentry DSN (Data Source Name) for exception monitoring. |
| `sentry_environment` | string | no | [Sentry Environment](https://docs.sentry.io/product/sentry-basics/environments/) for exception monitoring. |
-| `ruby_sentry_dsn` | string | no | Sentry DSN for `gitaly-ruby` exception monitoring. |
+| `ruby_sentry_dsn` | string | no | Sentry DSN (Data Source Name) for `gitaly-ruby` exception monitoring. |
While the main Gitaly application logs go to `stdout`, there are some extra log
files that go to a configured directory, like the GitLab Shell logs.
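
As an illustration of the `[logging]` table above, the following is a minimal sketch of how these keys might appear in Gitaly's `config.toml`. Only the key names and the listed defaults come from the table. The DSN URLs and the environment name are placeholders assumed for this example.

```toml
# Hypothetical example values; only the key names come from the table above.
[logging]
# Log format: `text` (the default) or `json`.
format = 'json'
# Log level; `info` is the default.
level = 'warn'
# Placeholder Sentry DSN for Gitaly exception monitoring.
sentry_dsn = 'https://<key>@sentry.example.com/<project>'
# Placeholder Sentry environment name.
sentry_environment = 'production'
# Placeholder Sentry DSN for `gitaly-ruby` exception monitoring.
ruby_sentry_dsn = 'https://<key>@sentry.example.com/<project>'
```

All of these keys are optional according to the table, so a configuration that omits the section entirely would presumably fall back to `text` format at `info` level.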