summaryrefslogtreecommitdiff
path: root/doc/administration/gitaly/praefect.md
diff options
context:
space:
mode:
Diffstat (limited to 'doc/administration/gitaly/praefect.md')
-rw-r--r--doc/administration/gitaly/praefect.md522
1 files changed, 341 insertions, 181 deletions
diff --git a/doc/administration/gitaly/praefect.md b/doc/administration/gitaly/praefect.md
index 21e5360e27b..e483bcc944a 100644
--- a/doc/administration/gitaly/praefect.md
+++ b/doc/administration/gitaly/praefect.md
@@ -43,8 +43,8 @@ default value. The default value depends on the GitLab version.
## Setup Instructions
-If you [installed](https://about.gitlab.com/install/) GitLab using the Omnibus
-package (highly recommended), follow the steps below:
+If you [installed](https://about.gitlab.com/install/) GitLab using the Omnibus GitLab package
+(highly recommended), follow the steps below:
1. [Preparation](#preparation)
1. [Configuring the Praefect database](#postgresql)
@@ -59,25 +59,27 @@ package (highly recommended), follow the steps below:
Before beginning, you should already have a working GitLab instance. [Learn how
to install GitLab](https://about.gitlab.com/install/).
-Provision a PostgreSQL server (PostgreSQL 11 or newer).
+Provision a PostgreSQL server. We recommend using the PostgreSQL that is shipped
+with Omnibus GitLab and use it to configure the PostgreSQL database. You can use an
+external PostgreSQL server (version 11 or newer) but you must set it up [manually](#manual-database-setup).
-Prepare all your new nodes by [installing
-GitLab](https://about.gitlab.com/install/).
+Prepare all your new nodes by [installing GitLab](https://about.gitlab.com/install/). You need:
+- 1 PostgreSQL node
+- 1 PgBouncer node (optional)
- At least 1 Praefect node (minimal storage required)
- 3 Gitaly nodes (high CPU, high memory, fast storage)
- 1 GitLab server
-You need the IP/host address for each node.
+You also need the IP/host address for each node:
-1. `LOAD_BALANCER_SERVER_ADDRESS`: the IP/host address of the load balancer
-1. `POSTGRESQL_SERVER_ADDRESS`: the IP/host address of the PostgreSQL server
+1. `PRAEFECT_LOADBALANCER_HOST`: the IP/host address of Praefect load balancer
+1. `POSTGRESQL_HOST`: the IP/host address of the PostgreSQL server
+1. `PGBOUNCER_HOST`: the IP/host address of the PostgreSQL server
1. `PRAEFECT_HOST`: the IP/host address of the Praefect server
1. `GITALY_HOST_*`: the IP or host address of each Gitaly server
1. `GITLAB_HOST`: the IP/host address of the GitLab server
-If you are using a cloud provider, you can look up the addresses for each server through your cloud provider's management console.
-
If you are using Google Cloud Platform, SoftLayer, or any other vendor that provides a virtual private cloud (VPC) you can use the private addresses for each cloud instance (corresponds to "internal address" for Google Cloud Platform) for `PRAEFECT_HOST`, `GITALY_HOST_*`, and `GITLAB_HOST`.
#### Secrets
@@ -98,6 +100,14 @@ with secure tokens as you complete the setup process.
Praefect cluster directly; that could lead to data loss.
1. `PRAEFECT_SQL_PASSWORD`: this password is used by Praefect to connect to
PostgreSQL.
+1. `PRAEFECT_SQL_PASSWORD_HASH`: the hash of password of the Praefect user.
+ Use `gitlab-ctl pg-password-md5 praefect` to generate the hash. The command
+ asks for the password for `praefect` user. Enter `PRAEFECT_SQL_PASSWORD`
+ plaintext password. By default, Praefect uses `praefect` user, but you can
+ change it.
+1. `PGBOUNCER_SQL_PASSWORD_HASH`: the hash of password of the PgBouncer user.
+ PgBouncer uses this password to connect to PostgreSQL. For more details
+ see [bundled PgBouncer](../postgresql/pgbouncer.md) documentation.
We note in the instructions below where these secrets are required.
@@ -108,127 +118,210 @@ Omnibus GitLab installations can use `gitlab-secrets.json` for `GITLAB_SHELL_SEC
NOTE:
Do not store the GitLab application database and the Praefect
-database on the same PostgreSQL server if using
-[Geo](../geo/index.md). The replication state is internal to each instance
-of GitLab and should not be replicated.
+database on the same PostgreSQL server if using [Geo](../geo/index.md).
+The replication state is internal to each instance of GitLab and should
+not be replicated.
These instructions help set up a single PostgreSQL database, which creates a single point of
-failure. The following options are available:
+failure. Alternatively, [you can use PostgreSQL replication and failover](../postgresql/replication_and_failover.md).
+
+The following options are available:
- For non-Geo installations, either:
- Use one of the documented [PostgreSQL setups](../postgresql/index.md).
- - Use your own third-party database setup, if fault tolerance is required.
+ - Use your own third-party database setup. This will require [manual setup](#manual-database-setup).
- For Geo instances, either:
- Set up a separate [PostgreSQL instance](https://www.postgresql.org/docs/11/high-availability.html).
- Use a cloud-managed PostgreSQL service. AWS
[Relational Database Service](https://aws.amazon.com/rds/) is recommended.
-To complete this section you need:
+#### Manual database setup
-- 1 Praefect node
-- 1 PostgreSQL server (PostgreSQL 11 or newer)
- - An SQL user with permissions to create databases
+To complete this section you need:
-During this section, we configure the PostgreSQL server, from the Praefect
-node, using `psql` which is installed by Omnibus GitLab.
+- One Praefect node
+- One PostgreSQL node (version 11 or newer)
+ - A PostgreSQL user with permissions to manage the database server
-1. SSH into the **Praefect** node and login as root:
+In this section, we configure the PostgreSQL database. This can be used for both external
+and Omnibus-provided PostgreSQL server.
- ```shell
- sudo -i
- ```
+To run the following instructions, you can use the Praefect node, where `psql` is installed
+by Omnibus GitLab (`/opt/gitlab/embedded/bin/psql`). If you are using the Omnibus-provided
+PostgreSQL you can use `gitlab-psql` on the PostgreSQL node instead:
-1. Connect to the PostgreSQL server with administrative access. This is likely
- the `postgres` user. The database `template1` is used because it is created
- by default on all PostgreSQL servers.
+1. Create a new user `praefect` to be used by Praefect:
- ```shell
- /opt/gitlab/embedded/bin/psql -U postgres -d template1 -h POSTGRESQL_SERVER_ADDRESS
+ ```sql
+ CREATE ROLE praefect WITH LOGIN PASSWORD 'PRAEFECT_SQL_PASSWORD';
```
- Create a new user `praefect` to be used by Praefect. Replace
- `PRAEFECT_SQL_PASSWORD` with the strong password you generated in the
- preparation step.
+ Replace `PRAEFECT_SQL_PASSWORD` with the strong password you generated in the preparation step.
+
+1. Create a new database `praefect_production` that is owned by `praefect` user.
```sql
- CREATE ROLE praefect WITH LOGIN CREATEDB PASSWORD 'PRAEFECT_SQL_PASSWORD';
+ CREATE DATABASE praefect_production WITH OWNER praefect ENCODING UTF8;
```
-1. Reconnect to the PostgreSQL server, this time as the `praefect` user:
+For using Omnibus-provided PgBouncer you need to take the following additional steps. We strongly
+recommend using the PostgreSQL that is shipped with Omnibus as the backend. The following
+instructions only work on Omnibus-provided PostgreSQL:
- ```shell
- /opt/gitlab/embedded/bin/psql -U praefect -d template1 -h POSTGRESQL_SERVER_ADDRESS
+1. For Omnibus-provided PgBouncer, you need to use the hash of `praefect` user instead the of the
+ actual password:
+
+ ```sql
+ ALTER ROLE praefect WITH PASSWORD 'md5<PRAEFECT_SQL_PASSWORD_HASH>';
```
- Create a new database `praefect_production`. By creating the database while
- connected as the `praefect` user, we are confident they have access.
+ Replace `<PRAEFECT_SQL_PASSWORD_HASH>` with the hash of the password you generated in the
+ preparation step. Note that it is prefixed with `md5` literal.
+
+1. The PgBouncer that is shipped with Omnibus is configured to use [`auth_query`](https://www.pgbouncer.org/config.html#generic-settings)
+ and uses `pg_shadow_lookup` function. You need to create this function in `praefect_production`
+ database:
```sql
- CREATE DATABASE praefect_production WITH ENCODING=UTF8;
+ CREATE OR REPLACE FUNCTION public.pg_shadow_lookup(in i_username text, out username text, out password text) RETURNS record AS $$
+ BEGIN
+ SELECT usename, passwd FROM pg_catalog.pg_shadow
+ WHERE usename = i_username INTO username, password;
+ RETURN;
+ END;
+ $$ LANGUAGE plpgsql SECURITY DEFINER;
+
+ REVOKE ALL ON FUNCTION public.pg_shadow_lookup(text) FROM public, pgbouncer;
+ GRANT EXECUTE ON FUNCTION public.pg_shadow_lookup(text) TO pgbouncer;
```
The database used by Praefect is now configured.
If you see Praefect database errors after configuring PostgreSQL, see
-[troubleshooting steps](index.md#relation-does-not-exist-errors).
+[troubleshooting steps](troubleshooting.md#relation-does-not-exist-errors).
-#### PgBouncer
+#### Use PgBouncer
To reduce PostgreSQL resource consumption, we recommend setting up and configuring
[PgBouncer](https://www.pgbouncer.org/) in front of the PostgreSQL instance. To do
-this, set the corresponding IP or host address of the PgBouncer instance in
-`/etc/gitlab/gitlab.rb` by changing the following settings:
+this, you must point Praefect to PgBouncer by setting Praefect database parameters:
-- `praefect['database_host']`, for the address.
-- `praefect['database_port']`, for the port.
+```ruby
+praefect['database_host'] = PGBOUNCER_HOST
+praefect['database_port'] = 6432
+praefect['database_user'] = 'praefect'
+praefect['database_password'] = PRAEFECT_SQL_PASSWORD
+praefect['database_dbname'] = 'praefect_production'
+#praefect['database_sslmode'] = '...'
+#praefect['database_sslcert'] = '...'
+#praefect['database_sslkey'] = '...'
+#praefect['database_sslrootcert'] = '...'
+```
-Because PgBouncer manages resources more efficiently, Praefect still requires a
-direct connection to the PostgreSQL database. It uses the
-[LISTEN](https://www.postgresql.org/docs/11/sql-listen.html)
-feature that is [not supported](https://www.pgbouncer.org/features.html) by
-PgBouncer with `pool_mode = transaction`.
-Set `praefect['database_host_no_proxy']` and `praefect['database_port_no_proxy']`
-to a direct connection, and not a PgBouncer connection.
+Praefect requires an additional connection to the PostgreSQL that supports the
+[LISTEN](https://www.postgresql.org/docs/11/sql-listen.html) feature. With PgBouncer
+this feature is only available with `session` pool mode (`pool_mode = session`).
+It is not supported in `transaction` pool mode (`pool_mode = transaction`).
-Save the changes to `/etc/gitlab/gitlab.rb` and
-[reconfigure Praefect](../restart_gitlab.md#omnibus-gitlab-reconfigure).
+For the additional connection, you must either:
-This documentation doesn't provide PgBouncer installation instructions,
-but you can:
+- Connect Praefect directly to PostgreSQL and bypass PgBouncer.
+- Configure a new PgBouncer database that uses to the same PostgreSQL database endpoint,
+ but with different pool mode. That is, `pool_mode = session`.
-- Find instructions on the [official website](https://www.pgbouncer.org/install.html).
-- Use a [Docker image](https://hub.docker.com/r/edoburu/pgbouncer/).
+Praefect can be configured to use different connection parameters for direct access
+to PostgreSQL. This is the connection that supports the `LISTEN` feature.
-In addition to the base PgBouncer configuration options, set the following values in
-your `pgbouncer.ini` file:
+Here is an example of Praefect that bypasses PgBouncer and directly connects to PostgreSQL:
-- The [Praefect PostgreSQL database](#postgresql) in the `[databases]` section:
+```ruby
+praefect['database_direct_host'] = POSTGRESQL_HOST
+praefect['database_direct_port'] = 5432
+
+# Use the following to override parameters of direct database connection.
+# Comment out where the parameters are the same for both connections.
+
+praefect['database_direct_user'] = 'praefect'
+praefect['database_direct_password'] = PRAEFECT_SQL_PASSWORD
+praefect['database_direct_dbname'] = 'praefect_production'
+#praefect['database_direct_sslmode'] = '...'
+#praefect['database_direct_sslcert'] = '...'
+#praefect['database_direct_sslkey'] = '...'
+#praefect['database_direct_sslrootcert'] = '...'
+```
- ```ini
- [databases]
- * = host=POSTGRESQL_SERVER_ADDRESS port=5432 auth_user=praefect
- ```
+We recommend using PgBouncer with `session` pool mode instead. You can use the [bundled
+PgBouncer](../postgresql/pgbouncer.md) or use an external PgBouncer and [configure it
+manually](https://www.pgbouncer.org/config.html).
-- [`pool_mode`](https://www.pgbouncer.org/config.html#pool_mode)
- and [`ignore_startup_parameters`](https://www.pgbouncer.org/config.html#ignore_startup_parameters)
- in the `[pgbouncer]` section:
+The following example uses the bundled PgBouncer and sets up two separate connection pools,
+one in `session` pool mode and the other in `transaction` pool mode. For this example to work,
+you need to prepare PostgreSQL server with [setup instruction](#manual-database-setup):
- ```ini
- [pgbouncer]
- pool_mode = transaction
- ignore_startup_parameters = extra_float_digits
- ```
+```ruby
+pgbouncer['databases'] = {
+ # Other database configuation including gitlabhq_production
+ ...
+
+ praefect_production: {
+ host: POSTGRESQL_HOST,
+ # Use `pgbouncer` user to connect to database backend.
+ user: 'pgbouncer',
+ password: PGBOUNCER_SQL_PASSWORD_HASH,
+ pool_mode: 'transaction'
+ }
+ praefect_production_direct: {
+ host: POSTGRESQL_HOST,
+ # Use `pgbouncer` user to connect to database backend.
+ user: 'pgbouncer',
+ password: PGBOUNCER_SQL_PASSWORD_HASH,
+ dbname: 'praefect_production',
+ pool_mode: 'session'
+ },
+
+ ...
+}
+```
+
+Both `praefect_production` and `praefect_production_direct` use the same database endpoint
+(`praefect_production`), but with different pool modes. This translates to the following
+`databases` section of PgBouncer:
-The `praefect` user and its password should be included in the file (default is
-`userlist.txt`) used by PgBouncer if the [`auth_file`](https://www.pgbouncer.org/config.html#auth_file)
-configuration option is set.
+```ini
+[databases]
+praefect_production = host=POSTGRESQL_HOST auth_user=pgbouncer pool_mode=transaction
+praefect_production_direct = host=POSTGRESQL_HOST auth_user=pgbouncer dbname=praefect_production pool_mode=session
+```
+
+Now you can configure Praefect to use PgBouncer for both connections:
+
+```ruby
+praefect['database_host'] = PGBOUNCER_HOST
+praefect['database_port'] = 6432
+praefect['database_user'] = 'praefect'
+# `PRAEFECT_SQL_PASSWORD` is the plain-text password of
+# Praefect user. Not to be confused with `PRAEFECT_SQL_PASSWORD_HASH`.
+praefect['database_password'] = PRAEFECT_SQL_PASSWORD
+
+praefect['database_dbname'] = 'praefect_production'
+praefect['database_direct_dbname'] = 'praefect_production_direct'
+
+# There is no need to repeat the following. Parameters of direct
+# database connection will fall back to the values above.
+
+#praefect['database_direct_host'] = PGBOUNCER_HOST
+#praefect['database_direct_port'] = 6432
+#praefect['database_direct_user'] = 'praefect'
+#praefect['database_direct_password'] = PRAEFECT_SQL_PASSWORD
+```
+
+With this configuration, Praefect uses PgBouncer for both connection types.
NOTE:
-By default PgBouncer uses port `6432` to accept incoming
-connections. You can change it by setting the [`listen_port`](https://www.pgbouncer.org/config.html#listen_port)
-configuration option. We recommend setting it to the default port value (`5432`) used by
-PostgreSQL instances. Otherwise you should change the configuration parameter
-`praefect['database_port']` for each Praefect instance to the correct value.
+Omnibus GitLab handles the authentication requirements (using `auth_query`), but if you are preparing
+your databases manually and configuring an external PgBouncer, you must include `praefect` user and
+its password in the file used by PgBouncer. For example, `userlist.txt` if the [`auth_file`](https://www.pgbouncer.org/config.html#auth_file)
+configuration option is set. For more details, consult the PgBouncer documentation.
### Praefect
@@ -241,17 +334,10 @@ If there are multiple Praefect nodes:
To complete this section you need a [configured PostgreSQL server](#postgresql), including:
-- IP/host address (`POSTGRESQL_SERVER_ADDRESS`)
-- Password (`PRAEFECT_SQL_PASSWORD`)
-
Praefect should be run on a dedicated node. Do not run Praefect on the
application server, or a Gitaly node.
-1. SSH into the **Praefect** node and login as root:
-
- ```shell
- sudo -i
- ```
+On the **Praefect** node:
1. Disable all other services by editing `/etc/gitlab/gitlab.rb`:
@@ -295,22 +381,8 @@ application server, or a Gitaly node.
praefect['auth_token'] = 'PRAEFECT_EXTERNAL_TOKEN'
```
-1. Configure **Praefect** to connect to the PostgreSQL database by editing
- `/etc/gitlab/gitlab.rb`.
-
- You need to replace `POSTGRESQL_SERVER_ADDRESS` with the IP/host address
- of the database, and `PRAEFECT_SQL_PASSWORD` with the strong password set
- above.
-
- ```ruby
- praefect['database_host'] = 'POSTGRESQL_SERVER_ADDRESS'
- praefect['database_port'] = 5432
- praefect['database_user'] = 'praefect'
- praefect['database_password'] = 'PRAEFECT_SQL_PASSWORD'
- praefect['database_dbname'] = 'praefect_production'
- praefect['database_host_no_proxy'] = 'POSTGRESQL_SERVER_ADDRESS'
- praefect['database_port_no_proxy'] = 5432
- ```
+1. Configure **Praefect** to [connect to the PostgreSQL database](#postgresql). We
+ highly recommend using [PgBouncer](#use-pgbouncer) as well.
If you want to use a TLS client certificate, the options below can be used:
@@ -507,7 +579,7 @@ To configure Praefect with TLS:
```ruby
git_data_dirs({
"default" => {
- "gitaly_address" => 'tls://LOAD_BALANCER_SERVER_ADDRESS:2305',
+ "gitaly_address" => 'tls://PRAEFECT_LOADBALANCER_HOST:2305',
"gitaly_token" => 'PRAEFECT_EXTERNAL_TOKEN'
}
})
@@ -544,7 +616,7 @@ To configure Praefect with TLS:
repositories:
storages:
default:
- gitaly_address: tls://LOAD_BALANCER_SERVER_ADDRESS:3305
+ gitaly_address: tls://PRAEFECT_LOADBALANCER_HOST:3305
path: /some/local/path
```
@@ -817,7 +889,7 @@ Particular attention should be shown to:
You need to replace:
- - `LOAD_BALANCER_SERVER_ADDRESS` with the IP address or hostname of the load
+ - `PRAEFECT_LOADBALANCER_HOST` with the IP address or hostname of the load
balancer.
- `PRAEFECT_EXTERNAL_TOKEN` with the real secret
@@ -826,7 +898,7 @@ Particular attention should be shown to:
```ruby
git_data_dirs({
"default" => {
- "gitaly_address" => "tcp://LOAD_BALANCER_SERVER_ADDRESS:2305",
+ "gitaly_address" => "tcp://PRAEFECT_LOADBALANCER_HOST:2305",
"gitaly_token" => 'PRAEFECT_EXTERNAL_TOKEN'
}
})
@@ -926,7 +998,7 @@ For example:
git_data_dirs({
'default' => { 'gitaly_address' => 'tcp://old-gitaly.internal:8075' },
'cluster' => {
- 'gitaly_address' => 'tcp://<load_balancer_server_address>:2305',
+ 'gitaly_address' => 'tcp://<PRAEFECT_LOADBALANCER_HOST>:2305',
'gitaly_token' => '<praefect_external_token>'
}
})
@@ -981,6 +1053,26 @@ To get started quickly:
Congratulations! You've configured an observable fault-tolerant Praefect
cluster.
+## Network connectivity requirements
+
+Gitaly Cluster components need to communicate with each other over many routes.
+Your firewall rules must allow the following for Gitaly Cluster to function properly:
+
+| From | To | Default port / TLS port |
+|:-----------------------|:------------------------|:------------------------|
+| GitLab | Praefect load balancer | `2305` / `3305` |
+| Praefect load balancer | Praefect | `2305` / `3305` |
+| Praefect | Gitaly | `8075` / `9999` |
+| Gitaly | GitLab (internal API) | `80` / `443` |
+| Gitaly | Praefect load balancer | `2305` / `3305` |
+| Gitaly | Praefect | `2305` / `3305` |
+| Gitaly | Gitaly | `8075` / `9999` |
+
+NOTE:
+Gitaly does not directly connect to Praefect. However, requests from Gitaly to the Praefect
+load balancer may still be blocked unless firewalls on the Praefect nodes allow traffic from
+the Gitaly nodes.
+
## Distributed reads
> - Introduced in GitLab 13.1 in [beta](https://about.gitlab.com/handbook/product/gitlab-the-product/#alpha-beta-ga) with feature flag `gitaly_distributed_reads` set to disabled.
@@ -1147,24 +1239,30 @@ The `per_repository` election strategy solves this problem by electing a primary
repository. Combined with [configurable replication factors](#configure-replication-factor), you can
horizontally scale storage capacity and distribute write load across Gitaly nodes.
-Primary elections are run when:
+Primary elections are run:
-- Praefect starts up.
-- The cluster's consensus of a Gitaly node's health changes.
+- In GitLab 14.1 and later, lazily. This means that Praefect doesn't immediately elect
+ a new primary node if the current one is unhealthy. A new primary is elected if it is
+ necessary to serve a request while the current primary is unavailable.
+- In GitLab 13.12 to GitLab 14.0 when:
+ - Praefect starts up.
+ - The cluster's consensus of a Gitaly node's health changes.
-A Gitaly node is considered:
+A valid primary node candidate is a Gitaly node that:
-- Healthy if `>=50%` Praefect nodes have successfully health checked the Gitaly node in the
- previous ten seconds.
-- Unhealthy otherwise.
+- Is healthy. A Gitaly node is considered healthy if `>=50%` Praefect nodes have
+ successfully health checked the Gitaly node in the previous ten seconds.
+- Has a fully up to date copy of the repository.
-During an election run, Praefect elects a new primary Gitaly node for each repository that has
-an unhealthy primary Gitaly node. The election is made:
+If there are multiple primary node candidates, Praefect:
-- Randomly from healthy secondary Gitaly nodes that are the most up to date.
-- Only from Gitaly nodes assigned to the host repository.
+- Picks one of them randomly.
+- Prioritizes promoting a Gitaly node that is assigned to host the repository. If
+ there are no assigned Gitaly nodes to elect as the primary, Praefect may temporarily
+ elect an unassigned one. The unassigned primary is demoted in favor of an assigned
+ one when one becomes available.
-If there are no healthy secondary nodes for a repository:
+If there are no valid primary candidates for a repository:
- The unhealthy primary node is demoted and the repository is left without a primary node.
- Operations that require a primary node fail until a primary is successfully elected.
@@ -1212,7 +1310,7 @@ To migrate existing clusters:
- If downtime is unacceptable:
- 1. Determine which Gitaly node is [the current primary](index.md#determine-primary-gitaly-node).
+ 1. Determine which Gitaly node is [the current primary](troubleshooting.md#determine-primary-gitaly-node).
1. Comment out the secondary Gitaly nodes from the virtual storage's configuration in `/etc/gitlab/gitlab.rb`
on all Praefect nodes. This ensures there's only one Gitaly node configured, causing both of the election
@@ -1259,23 +1357,37 @@ Migrate to [repository-specific primary nodes](#repository-specific-primary-node
Gitaly Cluster recovers from a failing primary Gitaly node by promoting a healthy secondary as the
new primary.
-To minimize data loss, Gitaly Cluster:
+In GitLab 14.1 and later, Gitaly Cluster:
+
+- Elects a healthy secondary with a fully up to date copy of the repository as the new primary.
+- Repository becomes unavailable if there are no fully up to date copies of it on healthy secondaries.
+
+To minimize data loss in GitLab 13.0 to 14.0, Gitaly Cluster:
- Switches repositories that are outdated on the new primary to [read-only mode](#read-only-mode).
-- Elects the secondary with the least unreplicated writes from the primary to be the new primary.
- Because there can still be some unreplicated writes, [data loss can occur](#check-for-data-loss).
+- Elects the secondary with the least unreplicated writes from the primary to be the new
+ primary. Because there can still be some unreplicated writes,
+ [data loss can occur](#check-for-data-loss).
### Read-only mode
> - Introduced in GitLab 13.0 as [generally available](https://about.gitlab.com/handbook/product/gitlab-the-product/#generally-available-ga).
> - Between GitLab 13.0 and GitLab 13.2, read-only mode applied to the whole virtual storage and occurred whenever failover occurred.
> - [In GitLab 13.3 and later](https://gitlab.com/gitlab-org/gitaly/-/issues/2862), read-only mode applies on a per-repository basis and only occurs if a new primary is out of date.
+new primary. If the failed primary contained unreplicated writes, [data loss can occur](#check-for-data-loss).
+> - Removed in GitLab 14.1. Instead, repositories [become unavailable](#unavailable-repositories).
+
+In GitLab 13.0 to 14.0, when Gitaly Cluster switches to a new primary, repositories enter
+read-only mode if they are out of date. This can happen after failing over to an outdated
+secondary. Read-only mode eases data recovery efforts by preventing writes that may conflict
+with the unreplicated writes on other nodes.
-When Gitaly Cluster switches to a new primary, repositories enter read-only mode if they are out of
-date. This can happen after failing over to an outdated secondary. Read-only mode eases data
-recovery efforts by preventing writes that may conflict with the unreplicated writes on other nodes.
+When Gitaly Cluster switches to a new primary In GitLab 13.0 to 14.0, repositories enter
+read-only mode if they are out of date. This can happen after failing over to an outdated
+secondary. Read-only mode eases data recovery efforts by preventing writes that may conflict
+with the unreplicated writes on other nodes.
-To enable writes again, an administrator can:
+To enable writes again in GitLab 13.0 to 14.0, an administrator can:
1. [Check](#check-for-data-loss) for data loss.
1. Attempt to [recover](#data-recovery) missing data.
@@ -1283,21 +1395,38 @@ To enable writes again, an administrator can:
[accept data loss](#enable-writes-or-accept-data-loss) if necessary, depending on the version of
GitLab.
+## Unavailable repositories
+
+> - From GitLab 13.0 through 14.0, repositories became read-only if they were outdated on the primary but fully up to date on a healthy secondary. `dataloss` sub-command displays read-only repositories by default through these versions.
+> - Since GitLab 14.1, Praefect contains more responsive failover logic which immediately fails over to one of the fully up to date secondaries rather than placing the repository in read-only mode. Since GitLab 14.1, the `dataloss` sub-command displays repositories which are unavailable due to having no fully up to date copies on healthy Gitaly nodes.
+
+A repository is unavailable if all of its up to date replicas are unavailable. Unavailable repositories are
+not accessible through Praefect to prevent serving stale data that may break automated tooling.
+
### Check for data loss
-The Praefect `dataloss` sub-command identifies replicas that are likely to be outdated. This can help
-identify potential data loss after a failover. The following parameters are
-available:
+The Praefect `dataloss` subcommand identifies:
+
+- Copies of repositories in GitLab 13.0 to GitLab 14.0 that at are likely to be outdated.
+ This can help identify potential data loss after a failover.
+- Repositories in GitLab 14.1 and later that are unavailable. This helps identify potential
+ data loss and repositories which are no longer accessible because all of their up-to-date
+ replicas copies are unavailable.
+
+The following parameters are available:
-- `-virtual-storage` that specifies which virtual storage to check. The default behavior is to
- display outdated replicas of read-only repositories as they might require administrator action.
-- In GitLab 13.3 and later, `-partially-replicated` that specifies whether to display a list of
- [outdated replicas of writable repositories](#outdated-replicas-of-writable-repositories).
+- `-virtual-storage` that specifies which virtual storage to check. Because they might require
+ an administrator to intervene, the default behavior is to display:
+ - In GitLab 13.0 to 14.0, copies of read-only repositories.
+ - In GitLab 14.1 and later, unavailable repositories.
+- In GitLab 14.1 and later, [`-partially-unavailable`](#unavailable-replicas-of-available-repositories)
+ that specifies whether to include in the output repositories that are available but have
+ some assigned copies that are not available.
NOTE:
`dataloss` is still in beta and the output format is subject to change.
-To check for repositories with outdated primaries, run:
+To check for repositories with outdated primaries or for unavailable repositories, run:
```shell
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dataloss [-virtual-storage <virtual-storage>]
@@ -1309,13 +1438,20 @@ Every configured virtual storage is checked if none is specified:
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dataloss
```
-Repositories which have assigned storage nodes that contain an outdated copy of the repository are listed
-in the output. This information is printed for each repository:
+Repositories are listed in the output that have either:
+
+- An outdated copy of the repository on the primary, in GitLab 13.0 to GitLab 14.0.
+- No healthy and fully up-to-date copies available, in GitLab 14.1 and later.
+
+The following information is printed for each repository:
- A repository's relative path to the storage directory identifies each repository and groups the related
information.
-- The repository's current status is printed in parentheses next to the disk path. If the repository's primary
- is outdated, the repository is in `read-only` mode and can't accept writes. Otherwise, the mode is `writable`.
+- The repository's current status is printed in parentheses next to the disk path:
+ - In GitLab 13.0 to 14.0, either `(read-only)` if the repository's primary node is outdated
+ and can't accept writes. Otherwise, `(writable)`.
+ - In GitLab 14.1 and later, `(unavailable)` is printed next to the disk path if the
+ repository is unavailable.
- The primary field lists the repository's current primary. If the repository has no primary, the field shows
`No Primary`.
- The In-Sync Storages lists replicas which have replicated the latest successful write and all writes
@@ -1325,44 +1461,51 @@ in the output. This information is printed for each repository:
is listed next to replica. It's important to notice that the outdated replicas may be fully up to date or contain
later changes but Praefect can't guarantee it.
-Whether a replica is assigned to host the repository is listed with each replica's status. `assigned host` is printed
-next to replicas which are assigned to store the repository. The text is omitted if the replica contains a copy of
-the repository but is not assigned to store the repository. Such replicas aren't kept in-sync by Praefect, but may
-act as replication sources to bring assigned replicas up to date.
+Additional information includes:
+
+- Whether a node is assigned to host the repository is listed with each node's status.
+ `assigned host` is printed next to nodes that are assigned to store the repository. The
+ text is omitted if the node contains a copy of the repository but is not assigned to store
+ the repository. Such copies aren't kept in sync by Praefect, but may act as replication
+ sources to bring assigned copies up to date.
+- In GitLab 14.1 and later, `unhealthy` is printed next to the copies that are located
+ on unhealthy Gitaly nodes.
Example output:
```shell
Virtual storage: default
Outdated repositories:
- @hashed/3f/db/3fdba35f04dc8c462986c992bcf875546257113072a909c162f7e470e581e278.git (read-only):
+ @hashed/3f/db/3fdba35f04dc8c462986c992bcf875546257113072a909c162f7e470e581e278.git (unavailable):
Primary: gitaly-1
In-Sync Storages:
- gitaly-2, assigned host
+ gitaly-2, assigned host, unhealthy
Outdated Storages:
gitaly-1 is behind by 3 changes or less, assigned host
gitaly-3 is behind by 3 changes or less
```
-A confirmation is printed out when every repository is writable. For example:
+A confirmation is printed out when every repository is available. For example:
```shell
Virtual storage: default
- All repositories are writable!
+ All repositories are available!
```
-#### Outdated replicas of writable repositories
+#### Unavailable replicas of available repositories
-> [Introduced](https://gitlab.com/gitlab-org/gitaly/-/issues/3019) in GitLab 13.3.
+NOTE:
+In GitLab 14.0 and earlier, the flag is `-partially-replicated` and the output shows any repositories with assigned nodes with outdated
+copies.
-To also list information of repositories whose primary is up to date but one or more assigned
-replicas are outdated, use the `-partially-replicated` flag.
+To also list information of repositories which are available but are unavailable from some of the assigned nodes,
+use the `-partially-unavailable` flag.
-A repository is writable if the primary has the latest changes. Secondaries might be temporarily
-outdated while they are waiting to replicate the latest changes.
+A repository is available if there is a healthy, up to date replica available. Some of the assigned secondary
+replicas may be temporarily unavailable for access while they are waiting to replicate the latest changes.
```shell
-sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dataloss [-virtual-storage <virtual-storage>] [-partially-replicated]
+sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dataloss [-virtual-storage <virtual-storage>] [-partially-unavailable]
```
Example output:
@@ -1370,7 +1513,7 @@ Example output:
```shell
Virtual storage: default
Outdated repositories:
- @hashed/3f/db/3fdba35f04dc8c462986c992bcf875546257113072a909c162f7e470e581e278.git (writable):
+ @hashed/3f/db/3fdba35f04dc8c462986c992bcf875546257113072a909c162f7e470e581e278.git:
Primary: gitaly-1
In-Sync Storages:
gitaly-1, assigned host
@@ -1379,14 +1522,14 @@ Virtual storage: default
gitaly-3 is behind by 3 changes or less
```
-With the `-partially-replicated` flag set, a confirmation is printed out if every assigned replica is fully up to
-date.
+With the `-partially-unavailable` flag set, a confirmation is printed out if every assigned replica is fully up to
+date and healthy.
For example:
```shell
Virtual storage: default
- All repositories are up to date!
+ All repositories are fully available on all assigned storages!
```
### Check repository checksums
@@ -1394,30 +1537,50 @@ Virtual storage: default
To check a project's repository checksums across on all Gitaly nodes, run the
[replicas Rake task](../raketasks/praefect.md#replica-checksums) on the main GitLab node.
+### Accept data loss
+
+WARNING:
+`accept-dataloss` causes permanent data loss by overwriting other versions of the repository. Data
+[recovery efforts](#data-recovery) must be performed before using it.
+
+If it is not possible to bring one of the up to date replicas back online, you may have to accept data
+loss. When accepting data loss, Praefect marks the chosen replica of the repository as the latest version
+and replicates it to the other assigned Gitaly nodes. This process overwrites any other version of the
+repository so care must be taken.
+
+```shell
+sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml accept-dataloss
+-virtual-storage <virtual-storage> -repository <relative-path> -authoritative-storage <storage-name>
+```
+
### Enable writes or accept data loss
-Praefect provides the following sub-commands to re-enable writes:
+WARNING:
+`accept-dataloss` causes permanent data loss by overwriting other versions of the repository.
+Data [recovery efforts](#data-recovery) must be performed before using it.
-- In GitLab 13.2 and earlier, `enable-writes` to re-enable virtual storage for writes after data
- recovery attempts.
+Praefect provides the following subcommands to re-enable writes or accept data loss:
- ```shell
- sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml enable-writes -virtual-storage <virtual-storage>
- ```
+- In GitLab 13.2 and earlier, `enable-writes` to re-enable virtual storage for writes after
+ data recovery attempts:
-- [In GitLab 13.3](https://gitlab.com/gitlab-org/gitaly/-/merge_requests/2415) and later,
- `accept-dataloss` to accept data loss and re-enable writes for repositories after data recovery
- attempts have failed. Accepting data loss causes current version of the repository on the
- authoritative storage to be considered latest. Other storages are brought up to date with the
- authoritative storage by scheduling replication jobs.
+ ```shell
+ sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml enable-writes -virtual-storage <virtual-storage>
+ ```
+
+- In GitLab 13.3 and later, if it is not possible to bring one of the up to date nodes back
+ online, you may have to accept data loss:
```shell
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml accept-dataloss -virtual-storage <virtual-storage> -repository <relative-path> -authoritative-storage <storage-name>
```
-WARNING:
-`accept-dataloss` causes permanent data loss by overwriting other versions of the repository. Data
-[recovery efforts](#data-recovery) must be performed before using it.
+ When accepting data loss, Praefect:
+
+ 1. Marks the chosen copy of the repository as the latest version.
+ 1. Replicates the copy to the other assigned Gitaly nodes.
+
+ This process overwrites any other copy of the repository so care must be taken.
## Data recovery
@@ -1463,10 +1626,7 @@ praefect['reconciliation_scheduling_interval'] = '0' # disable the feature
### Manual reconciliation
WARNING:
-The `reconcile` sub-command is deprecated and scheduled for removal in GitLab 14.0. Use
-[automatic reconciliation](#automatic-reconciliation) instead. Manual reconciliation may
-produce excess replication jobs and is limited in functionality. Manual reconciliation does
-not work when [repository-specific primary nodes](#repository-specific-primary-nodes) are
+The `reconcile` sub-command was removed in GitLab 14.1. Use [automatic reconciliation](#automatic-reconciliation) instead. Manual reconciliation may produce excess replication jobs and is limited in functionality. Manual reconciliation does not work when [repository-specific primary nodes](#repository-specific-primary-nodes) are
enabled.
The Praefect `reconcile` sub-command allows for the manual reconciliation between two Gitaly nodes. The
@@ -1509,7 +1669,7 @@ After creating and configuring Gitaly Cluster:
1. Ensure all storages are accessible to the GitLab instance. In this example, these are
`<original_storage_name>` and `<cluster_storage_name>`.
1. [Configure repository storage weights](../repository_storage_paths.md#configure-where-new-repositories-are-stored)
- so that the Gitaly Cluster receives all new projects. This stops new projects being created
+ so that the Gitaly Cluster receives all new projects. This stops new projects from being created
on existing Gitaly nodes while the migration is in progress.
1. Schedule repository moves for:
- [Projects](#bulk-schedule-project-moves).