From 983a0bba5d2a042c4a3bbb22432ec192c7501d82 Mon Sep 17 00:00:00 2001 From: GitLab Bot Date: Mon, 20 Apr 2020 18:38:24 +0000 Subject: Add latest changes from gitlab-org/gitlab@12-10-stable-ee --- doc/administration/gitaly/praefect.md | 79 +++++++++++++++++++++++++++++++++++ 1 file changed, 79 insertions(+) (limited to 'doc/administration/gitaly') diff --git a/doc/administration/gitaly/praefect.md b/doc/administration/gitaly/praefect.md index d1d0c358dc6..19a69dc348c 100644 --- a/doc/administration/gitaly/praefect.md +++ b/doc/administration/gitaly/praefect.md @@ -300,6 +300,60 @@ application server, or a Gitaly node. edit `/etc/gitlab/gitlab.rb`, remember to run `sudo gitlab-ctl reconfigure` again before trying the `sql-ping` command. +#### Automatic failover + +When automatic failover is enabled, Praefect will do automatic detection of the health of internal Gitaly nodes. If the +primary has a certain amount of health checks fail, it will decide to promote one of the secondaries to be primary, and +demote the primary to be a secondary. + +1. To enable automatic failover, edit `/etc/gitlab/gitlab.rb`: + + ```ruby + # failover_enabled turns on automatic failover + praefect['failover_enabled'] = true + praefect['virtual_storages'] = { + 'praefect' => { + 'gitaly-1' => { + 'address' => 'tcp://GITALY_HOST:8075', + 'token' => 'PRAEFECT_INTERNAL_TOKEN', + 'primary' => true + }, + 'gitaly-2' => { + 'address' => 'tcp://GITALY_HOST:8075', + 'token' => 'PRAEFECT_INTERNAL_TOKEN' + }, + 'gitaly-3' => { + 'address' => 'tcp://GITALY_HOST:8075', + 'token' => 'PRAEFECT_INTERNAL_TOKEN' + } + } + } + ``` + +1. Save the file and [reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure). + +Below is the picture when Praefect starts up with the config.toml above: + +```mermaid +graph TD + A[Praefect] -->|Mutator RPC| B(internal_storage_0) + B --> |Replication|C[internal_storage_1] +``` + +Let's say suddenly `internal_storage_0` goes down. Praefect will detect this and +automatically switch over to `internal_storage_1`, and `internal_storage_0` will serve as a secondary: + +```mermaid +graph TD + A[Praefect] -->|Mutator RPC| B(internal_storage_1) + B --> |Replication|C[internal_storage_0] +``` + +NOTE: **Note:**: Currently this feature is supported for setups that only have 1 Praefect instance. Praefect instances running, +for example behind a load balancer, `failover_enabled` should be disabled. The reason is The reason is because there +is no coordination that currently happens across different Praefect instances, so there could be a situation where +two Praefect instances think two different Gitaly nodes are the primary. + ### Gitaly NOTE: **Note:** Complete these steps for **each** Gitaly node. @@ -697,6 +751,31 @@ during a failover. Follow issue It is likely that we will implement support for Consul, and a cloud native strategy in the future. +## Identifying Impact of a Primary Node Failure + +When a primary Gitaly node fails, there is a chance of dataloss. Dataloss can occur if there were outstanding replication jobs the secondaries did not manage to process before the failure. The Praefect `dataloss` subcommand helps identify these cases by counting the number of dead replication jobs for each repository within a given timeframe. + +```shell +sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dataloss -from -to +``` + +If the timeframe is not specified, dead replication jobs from the last six hours are counted: + +```shell +sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dataloss + +Failed replication jobs between [2020-01-02 00:00:00 +0000 UTC, 2020-01-02 06:00:00 +0000 UTC): +example/repository-1: 1 jobs +example/repository-2: 4 jobs +example/repository-3: 2 jobs +``` + +To specify a timeframe in UTC, run: + +```shell +sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dataloss -from 2020-01-02T00:00:00+00:00 -to 2020-01-02T00:02:00+00:00 +``` + ## Backend Node Recovery When a Praefect backend node fails and is no longer able to -- cgit v1.2.1