summaryrefslogtreecommitdiff
path: root/doc/administration/gitaly/praefect.md
diff options
context:
space:
mode:
Diffstat (limited to 'doc/administration/gitaly/praefect.md')
-rw-r--r--doc/administration/gitaly/praefect.md79
1 files changed, 79 insertions, 0 deletions
diff --git a/doc/administration/gitaly/praefect.md b/doc/administration/gitaly/praefect.md
index d1d0c358dc6..19a69dc348c 100644
--- a/doc/administration/gitaly/praefect.md
+++ b/doc/administration/gitaly/praefect.md
@@ -300,6 +300,60 @@ application server, or a Gitaly node.
edit `/etc/gitlab/gitlab.rb`, remember to run `sudo gitlab-ctl reconfigure`
again before trying the `sql-ping` command.
+#### Automatic failover
+
+When automatic failover is enabled, Praefect will do automatic detection of the health of internal Gitaly nodes. If the
+primary has a certain amount of health checks fail, it will decide to promote one of the secondaries to be primary, and
+demote the primary to be a secondary.
+
+1. To enable automatic failover, edit `/etc/gitlab/gitlab.rb`:
+
+ ```ruby
+ # failover_enabled turns on automatic failover
+ praefect['failover_enabled'] = true
+ praefect['virtual_storages'] = {
+ 'praefect' => {
+ 'gitaly-1' => {
+ 'address' => 'tcp://GITALY_HOST:8075',
+ 'token' => 'PRAEFECT_INTERNAL_TOKEN',
+ 'primary' => true
+ },
+ 'gitaly-2' => {
+ 'address' => 'tcp://GITALY_HOST:8075',
+ 'token' => 'PRAEFECT_INTERNAL_TOKEN'
+ },
+ 'gitaly-3' => {
+ 'address' => 'tcp://GITALY_HOST:8075',
+ 'token' => 'PRAEFECT_INTERNAL_TOKEN'
+ }
+ }
+ }
+ ```
+
+1. Save the file and [reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure).
+
+Below is the picture when Praefect starts up with the config.toml above:
+
+```mermaid
+graph TD
+ A[Praefect] -->|Mutator RPC| B(internal_storage_0)
+ B --> |Replication|C[internal_storage_1]
+```
+
+Let's say suddenly `internal_storage_0` goes down. Praefect will detect this and
+automatically switch over to `internal_storage_1`, and `internal_storage_0` will serve as a secondary:
+
+```mermaid
+graph TD
+ A[Praefect] -->|Mutator RPC| B(internal_storage_1)
+ B --> |Replication|C[internal_storage_0]
+```
+
+NOTE: **Note:**: Currently this feature is supported for setups that only have 1 Praefect instance. Praefect instances running,
+for example behind a load balancer, `failover_enabled` should be disabled. The reason is The reason is because there
+is no coordination that currently happens across different Praefect instances, so there could be a situation where
+two Praefect instances think two different Gitaly nodes are the primary.
+
### Gitaly
NOTE: **Note:** Complete these steps for **each** Gitaly node.
@@ -697,6 +751,31 @@ during a failover. Follow issue
It is likely that we will implement support for Consul, and a cloud native
strategy in the future.
+## Identifying Impact of a Primary Node Failure
+
+When a primary Gitaly node fails, there is a chance of dataloss. Dataloss can occur if there were outstanding replication jobs the secondaries did not manage to process before the failure. The Praefect `dataloss` subcommand helps identify these cases by counting the number of dead replication jobs for each repository within a given timeframe.
+
+```shell
+sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dataloss -from <rfc3339-time> -to <rfc3339-time>
+```
+
+If the timeframe is not specified, dead replication jobs from the last six hours are counted:
+
+```shell
+sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dataloss
+
+Failed replication jobs between [2020-01-02 00:00:00 +0000 UTC, 2020-01-02 06:00:00 +0000 UTC):
+example/repository-1: 1 jobs
+example/repository-2: 4 jobs
+example/repository-3: 2 jobs
+```
+
+To specify a timeframe in UTC, run:
+
+```shell
+sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dataloss -from 2020-01-02T00:00:00+00:00 -to 2020-01-02T00:02:00+00:00
+```
+
## Backend Node Recovery
When a Praefect backend node fails and is no longer able to