diff options
Diffstat (limited to 'doc/administration/geo/replication/troubleshooting.md')
-rw-r--r-- | doc/administration/geo/replication/troubleshooting.md | 47 |
1 files changed, 39 insertions, 8 deletions
diff --git a/doc/administration/geo/replication/troubleshooting.md b/doc/administration/geo/replication/troubleshooting.md index b8172322c10..f6d6f39fb19 100644 --- a/doc/administration/geo/replication/troubleshooting.md +++ b/doc/administration/geo/replication/troubleshooting.md @@ -251,7 +251,7 @@ sudo gitlab-rake gitlab:geo:check When performing a PostgreSQL major version (9 > 10) update this is expected. Follow: - - [initiate-the-replication-process](database.md#step-3-initiate-the-replication-process) + - [initiate-the-replication-process](../setup/database.md#step-3-initiate-the-replication-process) ## Fixing replication errors @@ -268,7 +268,7 @@ default to 1. You may need to increase this value if you have more Be sure to restart PostgreSQL for this to take effect. See the [PostgreSQL replication -setup](database.md#postgresql-replication) guide for more details. +setup](../setup/database.md#postgresql-replication) guide for more details. ### Message: `FATAL: could not start WAL streaming: ERROR: replication slot "geo_secondary_my_domain_com" does not exist`? @@ -276,11 +276,11 @@ This occurs when PostgreSQL does not have a replication slot for the **secondary** node by that name. You may want to rerun the [replication -process](database.md) on the **secondary** node . +process](../setup/database.md) on the **secondary** node . ### Message: "Command exceeded allowed execution time" when setting up replication? -This may happen while [initiating the replication process](database.md#step-3-initiate-the-replication-process) on the **secondary** node, +This may happen while [initiating the replication process](../setup/database.md#step-3-initiate-the-replication-process) on the **secondary** node, and indicates that your initial dataset is too large to be replicated in the default timeout (30 minutes). Re-run `gitlab-ctl replicate-geo-database`, but include a larger value for @@ -603,9 +603,9 @@ or `gitlab-ctl promote-to-primary-node`, either: bug](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/22021) was fixed. -If the above does not work, another possible reason is that you have paused replication -from the original primary node before attempting to promote this node. +### Message: ActiveRecord::RecordInvalid: Validation failed: Enabled Geo primary node cannot be disabled +This error may occur if you have paused replication from the original primary node before attempting to promote this node. To double check this, you can do the following: - Get the current secondary node's ID using: @@ -632,6 +632,23 @@ To double check this, you can do the following: UPDATE geo_nodes SET enabled = 't' WHERE id = ID_FROM_ABOVE; ``` +### While Promoting the secondary, I got an error `ActiveRecord::RecordInvalid` + +If you disabled a secondary node, either with the [replication pause task](../index.md#pausing-and-resuming-replication) +(13.2) or via the UI (13.1 and earlier), you must first re-enable the +node before you can continue. This is fixed in 13.4. + +From `gitlab-psql`, execute the following, replacing `<your secondary url>` +with the URL for your secondary server starting with `http` or `https` and ending with a `/`. + +```shell +SECONDARY_URL="https://<secondary url>/" +DATABASE_NAME="gitlabhq_production" +sudo gitlab-psql -d "$DATABASE_NAME" -c "UPDATE geo_nodes SET enabled = true WHERE url = '$SECONDARY_URL';" +``` + +This should update 1 row. + ### Message: ``NoMethodError: undefined method `secondary?' for nil:NilClass`` When [promoting a **secondary** node](../disaster_recovery/index.md#step-3-promoting-a-secondary-node), @@ -674,6 +691,20 @@ sudo /opt/gitlab/embedded/bin/gitlab-pg-ctl promote GitLab 12.9 and later are [unaffected by this error](https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/5147). +### Two-factor authentication is broken after a failover + +The setup instructions for Geo prior to 10.5 failed to replicate the +`otp_key_base` secret, which is used to encrypt the two-factor authentication +secrets stored in the database. If it differs between **primary** and **secondary** +nodes, users with two-factor authentication enabled won't be able to log in +after a failover. + +If you still have access to the old **primary** node, you can follow the +instructions in the +[Upgrading to GitLab 10.5](../replication/version_specific_updates.md#updating-to-gitlab-105) +section to resolve the error. Otherwise, the secret is lost and you'll need to +[reset two-factor authentication for all users](../../../security/two_factor_authentication.md#disabling-2fa-for-everyone). + ## Expired artifacts If you notice for some reason there are more artifacts on the Geo @@ -723,7 +754,7 @@ This error refers to a problem with the database replica on a **secondary** node which Geo expects to have access to. It usually means, either: - An unsupported replication method was used (for example, logical replication). -- The instructions to setup a [Geo database replication](database.md) were not followed correctly. +- The instructions to setup a [Geo database replication](../setup/database.md) were not followed correctly. - Your database connection details are incorrect, that is you have specified the wrong user in your `/etc/gitlab/gitlab.rb` file. @@ -743,7 +774,7 @@ The most common problems that prevent the database from replicating correctly ar - Database replication slot is misconfigured. - Database is not using a replication slot or another alternative and cannot catch-up because WAL files were purged. -Make sure you follow the [Geo database replication](database.md) instructions for supported configuration. +Make sure you follow the [Geo database replication](../setup/database.md) instructions for supported configuration. ### Geo database version (...) does not match latest migration (...) |