diff options
Diffstat (limited to 'doc/administration/gitaly/praefect.md')
-rw-r--r-- | doc/administration/gitaly/praefect.md | 72 |
1 files changed, 21 insertions, 51 deletions
diff --git a/doc/administration/gitaly/praefect.md b/doc/administration/gitaly/praefect.md index 0ea83f0e090..3d4e606205e 100644 --- a/doc/administration/gitaly/praefect.md +++ b/doc/administration/gitaly/praefect.md @@ -1,4 +1,7 @@ --- +stage: Create +group: Gitaly +info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers type: reference --- @@ -89,7 +92,7 @@ to install GitLab](https://about.gitlab.com/install/). Provision a PostgreSQL server (PostgreSQL 11 or newer). Configuration through the Omnibus GitLab distribution is not yet supported. Follow this -[issue](https://gitlab.com/gitlab-org/gitaly/issues/2476) for updates. +[issue](https://gitlab.com/gitlab-org/gitaly/-/issues/2476) for updates. Prepare all your new nodes by [installing GitLab](https://about.gitlab.com/install/). @@ -319,37 +322,6 @@ application server, or a Gitaly node. } ``` -1. Enable the database replication queue: - - ```ruby - praefect['postgres_queue_enabled'] = true - ``` - - In the next release, database replication queue will be enabled by default. - See [issue #2615](https://gitlab.com/gitlab-org/gitaly/-/issues/2615). - -1. Enable automatic failover by editing `/etc/gitlab/gitlab.rb`: - - ```ruby - praefect['failover_enabled'] = true - praefect['failover_election_strategy'] = 'sql' - ``` - - When automatic failover is enabled, Praefect checks the health of internal - Gitaly nodes. If the primary has a certain amount of health checks fail, it - will promote one of the secondaries to be primary, and demote the primary to - be a secondary. - - NOTE: **Note:** Database leader election will be [enabled by default in the - future](https://gitlab.com/gitlab-org/gitaly/-/issues/2682). - - Caution, **automatic failover** favors availability over consistency and will - cause data loss if changes have not been replicated to the newly elected - primary. In the next release, leader election will [prefer to promote up to - date replicas](https://gitlab.com/gitlab-org/gitaly/-/issues/2642), and it - will be an option to favor consistency by marking [out-of-date repositories - read-only](https://gitlab.com/gitlab-org/gitaly/-/issues/2630). - 1. Save the changes to `/etc/gitlab/gitlab.rb` and [reconfigure Praefect](../restart_gitlab.md#omnibus-gitlab-reconfigure): @@ -409,7 +381,7 @@ Particular attention should be shown to: `gitaly-2`, and `gitaly-3` as Gitaly storage names. For more information on Gitaly server configuration, see our [Gitaly -documentation](index.md#3-gitaly-server-configuration). +documentation](index.md#configure-gitaly-servers). 1. SSH into the **Gitaly** node and login as root: @@ -424,16 +396,18 @@ documentation](index.md#3-gitaly-server-configuration). postgresql['enable'] = false redis['enable'] = false nginx['enable'] = false - prometheus['enable'] = false grafana['enable'] = false puma['enable'] = false sidekiq['enable'] = false gitlab_workhorse['enable'] = false prometheus_monitoring['enable'] = false - # Enable only the Praefect service + # Enable only the Gitaly service gitaly['enable'] = true + # Enable Prometheus if needed + prometheus['enable'] = true + # Prevent database connections during 'gitlab-ctl reconfigure' gitlab_rails['rake_cache_clear'] = false gitlab_rails['auto_migrate'] = false @@ -682,8 +656,8 @@ Particular attention should be shown to: Repository > Repository storage** to make the newly configured Praefect cluster the storage location for new Git repositories. - - Deselect the **default** storage location - - Select the **praefect** storage location + - The default option is unchecked. + - The Praefect option is checked. ![Update repository storage](img/praefect_storage_v12_10.png) @@ -744,7 +718,7 @@ Praefect regularly checks the health of each backend Gitaly node. This information can be used to automatically failover to a new primary node if the current primary node is found to be unhealthy. -- **PostgreSQL (recommended):** Enabled by setting +- **PostgreSQL (recommended):** Enabled by default, and equivalent to: `praefect['failover_election_strategy'] = sql`. This configuration option will allow multiple Praefect nodes to coordinate via the PostgreSQL database to elect a primary Gitaly node. This configuration @@ -755,24 +729,21 @@ current primary node is found to be unhealthy. reconfigured in `/etc/gitlab/gitlab.rb` on the Praefect node. Modify the `praefect['virtual_storages']` field by moving the `primary = true` to promote a different Gitaly node to primary. In the steps above, `gitaly-1` was set to - the primary. -- **Memory:** Enabled by setting `praefect['failover_enabled'] = true` in - `/etc/gitlab/gitlab.rb` on the Praefect node. If a sufficient number of health + the primary. Requires `praefect['failover_enabled'] = false` in the configuration. +- **Memory:** Enabled by setting `praefect['failover_election_strategy'] = 'local'` + in `/etc/gitlab/gitlab.rb` on the Praefect node. If a sufficient number of health checks fail for the current primary backend Gitaly node, and new primary will be elected. **Do not use with multiple Praefect nodes!** Using with multiple Praefect nodes is likely to result in a split brain. -NOTE: **Note:**: Praefect does not yet account for replication lag on -the secondaries during the election process, so data loss can occur -during a failover. Follow issue -[#2642](https://gitlab.com/gitlab-org/gitaly/-/issues/2642) for updates. - It is likely that we will implement support for Consul, and a cloud native strategy in the future. ## Identifying Impact of a Primary Node Failure -When a primary Gitaly node fails, there is a chance of data loss. Data loss can occur if there were outstanding replication jobs the secondaries did not manage to process before the failure. The Praefect `dataloss` sub-command helps identify these cases by counting the number of dead replication jobs for each repository within a given time frame. +When a primary Gitaly node fails, there is a chance of data loss. Data loss can occur if there were outstanding replication jobs the secondaries did not manage to process before the failure. The `dataloss` Praefect sub-command helps identify these cases by counting the number of dead replication jobs for each repository. This command must be executed on a Praefect node. + +A time frame to search can be specified with `-from` and `-to`: ```shell sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dataloss -from <rfc3339-time> -to <rfc3339-time> @@ -784,9 +755,7 @@ If the time frame is not specified, dead replication jobs from the last six hour sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dataloss Failed replication jobs between [2020-01-02 00:00:00 +0000 UTC, 2020-01-02 06:00:00 +0000 UTC): -example/repository-1: 1 jobs -example/repository-2: 4 jobs -example/repository-3: 2 jobs +@hashed/fa/53/fa539965395b8382145f8370b34eab249cf610d2d6f2943c95b9b9d08a63d4a3.git: 2 jobs ``` To specify a time frame in UTC, run: @@ -797,7 +766,8 @@ sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.t ### Checking repository checksums -To check a project's checksums across all nodes, the Praefect replicas Rake task can be used: +To check a project's repository checksums across on all Gitaly nodes, the +replicas Rake task can be run on the main GitLab node: ```shell sudo gitlab-rake "gitlab:praefect:replicas[project_id]" |