Diffstat (limited to 'doc/administration/geo/disaster_recovery/background_verification.md')
-rw-r--r-- | doc/administration/geo/disaster_recovery/background_verification.md | 172 |
1 files changed, 172 insertions, 0 deletions
diff --git a/doc/administration/geo/disaster_recovery/background_verification.md b/doc/administration/geo/disaster_recovery/background_verification.md
new file mode 100644
index 00000000000..7d2fd51f834
--- /dev/null
+++ b/doc/administration/geo/disaster_recovery/background_verification.md
@@ -0,0 +1,172 @@
+# Automatic background verification **[PREMIUM ONLY]**
+
+NOTE: **Note:**
+Automatic background verification of repositories and wikis was added in
+GitLab EE 10.6 but is enabled by default only on GitLab EE 11.1. You can
+disable or enable this feature manually by following
+[these instructions](#disabling-or-enabling-the-automatic-background-verification).
+
+Automatic background verification ensures that the transferred data matches a
+calculated checksum. If the checksum of the data on the **primary** node matches the checksum of the
+data on the **secondary** node, the data was transferred successfully. Following a planned failover,
+any corrupted data may be **lost**, depending on the extent of the corruption.
+
+If verification fails on the **primary** node, this indicates that Geo is
+successfully replicating a corrupted object; restore it from backup or remove it
+from the **primary** node to resolve the issue.
+
+If verification succeeds on the **primary** node but fails on the **secondary** node,
+this indicates that the object was corrupted during the replication process.
+Geo actively tries to correct verification failures by marking the repository to
+be resynced with a backoff period. If you want to reset the verification for
+these failures, follow [these instructions][reset-verification].
+
+If verification is lagging significantly behind replication, consider giving
+the node more time before scheduling a planned failover.
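The two failure cases described above can be sketched in a few lines of Ruby. This is only an illustration of the decision logic, not Geo's implementation: `diagnose_verification` is a hypothetical helper, and treating a `nil` checksum as "verification failed on that node" is an assumption made for this sketch.

```ruby
require 'digest'

# Hypothetical helper, not a GitLab API. A nil checksum stands in for
# "verification failed on that node" (an assumption made for this sketch).
def diagnose_verification(primary_checksum, secondary_checksum)
  # Failure on the primary: Geo would be replicating a corrupted object,
  # so it has to be restored from backup or removed on the primary.
  return :corrupted_on_primary if primary_checksum.nil?

  # The primary is fine but the secondary failed or disagrees: the object
  # was corrupted during replication and needs to be resynced.
  return :corrupted_during_replication if secondary_checksum != primary_checksum

  :verified
end

good = Digest::SHA256.hexdigest('repository data')
bad  = Digest::SHA256.hexdigest('repository dat') # simulated truncated copy

puts diagnose_verification(good, good) # prints "verified"
puts diagnose_verification(nil, good)  # prints "corrupted_on_primary"
puts diagnose_verification(good, bad)  # prints "corrupted_during_replication"
```

The primary is checked first because a mismatch on the **secondary** node only points at replication once the **primary** copy is known to be good.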
+
+## Disabling or enabling the automatic background verification
+
+Run the following commands in a Rails console on the **primary** node:
+
+```sh
+# Omnibus GitLab
+gitlab-rails console
+
+# Installation from source
+cd /home/git/gitlab
+sudo -u git -H bin/rails console RAILS_ENV=production
+```
+
+To check if automatic background verification is enabled:
+
+```ruby
+Gitlab::Geo.repository_verification_enabled?
+```
+
+To disable automatic background verification:
+
+```ruby
+Feature.disable('geo_repository_verification')
+```
+
+To enable automatic background verification:
+
+```ruby
+Feature.enable('geo_repository_verification')
+```
+
+## Repository verification
+
+Navigate to the **Admin Area > Geo** dashboard on the **primary** node and expand
+the **Verification information** tab for that node to view automatic checksumming
+status for repositories and wikis. Successes are shown in green, pending work
+in grey, and failures in red.
+
+![Verification status](img/verification-status-primary.png)
+
+Navigate to the **Admin Area > Geo** dashboard on the **secondary** node and expand
+the **Verification information** tab for that node to view automatic verification
+status for repositories and wikis. As with checksumming, successes are shown in
+green, pending work in grey, and failures in red.
+
+![Verification status](img/verification-status-secondary.png)
+
+## Using checksums to compare Geo nodes
+
+To check the health of Geo **secondary** nodes, we use a checksum over the list of
+Git references and their values. The checksum includes `HEAD`, `heads`, `tags`,
+`notes`, and GitLab-specific references to ensure true consistency. If two nodes
+have the same checksum, then they definitely hold the same references. We compute
+the checksum for every node after every update to make sure that they are all
+in sync.
+
+## Repository re-verification
+
+> [Introduced](https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/8550) in GitLab Enterprise Edition 11.6.
+Available in [GitLab Premium](https://about.gitlab.com/pricing/).
+
+Due to bugs or transient infrastructure failures, it is possible for Git
+repositories to change unexpectedly without being marked for verification.
+Geo constantly reverifies the repositories to ensure the integrity of the
+data. The default and recommended re-verification interval is 7 days, though
+an interval as short as 1 day can be set. Shorter intervals reduce risk but
+increase load and vice versa.
+
+Navigate to the **Admin Area > Geo** dashboard on the **primary** node, and
+click the **Edit** button for the **primary** node to customize the minimum
+re-verification interval:
+
+![Re-verification interval](img/reverification-interval.png)
+
+The automatic background re-verification is enabled by default, but you can
+disable it if needed. Run the following commands in a Rails console on the
+**primary** node:
+
+```sh
+# Omnibus GitLab
+gitlab-rails console
+
+# Installation from source
+cd /home/git/gitlab
+sudo -u git -H bin/rails console RAILS_ENV=production
+```
+
+To disable automatic background re-verification:
+
+```ruby
+Feature.disable('geo_repository_reverification')
+```
+
+To enable automatic background re-verification:
+
+```ruby
+Feature.enable('geo_repository_reverification')
+```
+
+## Reset verification for projects where verification has failed
+
+Geo actively tries to correct verification failures by marking the repository to
+be resynced with a backoff period.
+If you want to reset them manually, this
+rake task marks projects where verification has failed or the checksum does not match
+to be resynced without the backoff period:
+
+For repositories:
+
+- Omnibus Installation
+
+  ```sh
+  sudo gitlab-rake geo:verification:repository:reset
+  ```
+
+- Source Installation
+
+  ```sh
+  sudo -u git -H bundle exec rake geo:verification:repository:reset RAILS_ENV=production
+  ```
+
+For wikis:
+
+- Omnibus Installation
+
+  ```sh
+  sudo gitlab-rake geo:verification:wiki:reset
+  ```
+
+- Source Installation
+
+  ```sh
+  sudo -u git -H bundle exec rake geo:verification:wiki:reset RAILS_ENV=production
+  ```
+
+## Current limitations
+
+Until [issue #5064][ee-5064] is completed, background verification doesn't cover
+CI job artifacts and traces, LFS objects, or user uploads in file storage.
+Verify their integrity manually by following [these instructions][foreground-verification]
+on both nodes, and comparing the output between them.
+
+Data in object storage is **not verified**, as the object store is responsible
+for ensuring the integrity of the data.
+
+[reset-verification]: background_verification.md#reset-verification-for-projects-where-verification-has-failed
+[foreground-verification]: ../../raketasks/check.md
+[ee-5064]: https://gitlab.com/gitlab-org/gitlab-ee/issues/5064
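As a rough illustration of the idea in "Using checksums to compare Geo nodes", the sketch below computes one checksum over a list of Git references and their values. It is not Geo's actual algorithm: `refs_checksum`, the use of SHA-256, the sorted line format, and the sample reference names and object IDs are all assumptions chosen for the example.

```ruby
require 'digest'

# Hypothetical helper, not a GitLab API. `refs` maps a reference name to the
# object ID it points at. Entries are sorted first so the result does not
# depend on the order in which the references were listed.
def refs_checksum(refs)
  digest = Digest::SHA256.new
  refs.sort.each do |name, oid|
    digest << "#{oid} #{name}\n"
  end
  digest.hexdigest
end

# Made-up sample data for two nodes.
primary = {
  'HEAD'              => 'a1b2c3',
  'refs/heads/master' => 'a1b2c3',
  'refs/tags/v1.0'    => 'd4e5f6'
}
in_sync   = primary.to_a.reverse.to_h                  # same refs, different order
out_of_sync = primary.merge('refs/heads/master' => '0f9e8d') # one ref differs

puts refs_checksum(primary) == refs_checksum(in_sync)     # prints "true"
puts refs_checksum(primary) == refs_checksum(out_of_sync) # prints "false"
```

Comparing a single checksum per repository keeps the check cheap: two nodes only need to exchange one short string to detect that any reference differs.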