Diffstat (limited to 'doc/administration/geo/replication/troubleshooting.md')
-rw-r--r-- | doc/administration/geo/replication/troubleshooting.md | 85 |
1 files changed, 79 insertions, 6 deletions
diff --git a/doc/administration/geo/replication/troubleshooting.md b/doc/administration/geo/replication/troubleshooting.md
index 871d6041066..5a29c5a3c54 100644
--- a/doc/administration/geo/replication/troubleshooting.md
+++ b/doc/administration/geo/replication/troubleshooting.md
@@ -419,6 +419,21 @@ sudo gitlab-ctl reconfigure
 To help us resolve this problem, consider commenting on
 [the issue](https://gitlab.com/gitlab-org/gitlab/-/issues/4489).
 
+### Message: `FATAL: could not connect to the primary server: server certificate for "PostgreSQL" does not match host name`
+
+This happens because the PostgreSQL certificate that the Omnibus GitLab package automatically creates contains
+the Common Name `PostgreSQL`, but the replication is connecting to a different host and GitLab attempts to use
+the `verify-full` SSL mode by default.
+
+In order to fix this, you can either:
+
+- Use the `--sslmode=verify-ca` argument with the `replicate-geo-database` command.
+- For an already replicated database, change `sslmode=verify-full` to `sslmode=verify-ca`
+  in `/var/opt/gitlab/postgresql/data/gitlab-geo.conf` and run `gitlab-ctl restart postgresql`.
+- [Configure SSL for PostgreSQL](https://docs.gitlab.com/omnibus/settings/database.html#configuring-ssl)
+  with a custom certificate (including the host name that's used to connect to the database in the CN or SAN)
+  instead of using the automatically generated certificate.
+
 ### Message: `LOG: invalid CIDR mask in address`
 
 This happens on wrongly-formatted addresses in `postgresql['md5_auth_cidr_addresses']`.
@@ -637,9 +652,9 @@ to start again from scratch, there are a few steps that can help you:
 1. Reset the Tracking Database.
 
    ```shell
-   gitlab-rake geo:db:drop  # on a secondary app node
-   gitlab-ctl reconfigure   # on the tracking database node
-   gitlab-rake geo:db:setup # on a secondary app node
+   gitlab-rake db:drop:geo     # on a secondary app node
+   gitlab-ctl reconfigure      # on the tracking database node
+   gitlab-rake db:migrate:geo  # on a secondary app node
    ```
 
 1. Restart previously stopped services.
@@ -977,7 +992,7 @@ On the **primary** node:
 1. On the left sidebar, select **Geo > Nodes**.
 1. Find the affected **secondary** site and select **Edit**.
 1. Ensure the **URL** field matches the value found in `/etc/gitlab/gitlab.rb`
-   in `external_url "https://gitlab.example.com"` on the frontend server(s) of
+   in `external_url "https://gitlab.example.com"` on the frontend servers of
    the **secondary** node.
 
 ## Fixing common errors
@@ -1042,7 +1057,7 @@ Make sure you follow the [Geo database replication](../setup/database.md) instru
 If you are using Omnibus GitLab installation, something might have failed during upgrade. You can:
 
 - Run `sudo gitlab-ctl reconfigure`.
-- Manually trigger the database migration by running: `sudo gitlab-rake geo:db:migrate` as root on the **secondary** node.
+- Manually trigger the database migration by running: `sudo gitlab-rake db:migrate:geo` as root on the **secondary** node.
 
 ### GitLab indicates that more than 100% of repositories were synced
 
@@ -1101,12 +1116,70 @@ This is due to [Pages data not being managed by Geo](datatypes.md#limitations-on
 Find advice to resolve those error messages in the
 [Pages administration documentation](../../../administration/pages/index.md#404-error-after-promoting-a-geo-secondary-to-a-primary-node).
 
+### Primary site returns 500 error when accessing `/admin/geo/replication/projects`
+
+Navigating to **Admin > Geo > Replication** (or `/admin/geo/replication/projects`) on a primary Geo site, shows a 500 error, while that same link on the secondary works fine. The primary's `production.log` has a similar entry to the following:
+
+```plaintext
+Geo::TrackingBase::SecondaryNotConfigured: Geo secondary database is not configured
+        from ee/app/models/geo/tracking_base.rb:26:in `connection'
+        [..]
+        from ee/app/views/admin/geo/projects/_all.html.haml:1
+```
+
+On a Geo primary site this error can be ignored.
+
+This happens because GitLab is attempting to display registries from the [Geo tracking database](../../../administration/geo/#geo-tracking-database) which doesn't exist on the primary site (only the original projects exist on the primary; no replicated projects are present, therefore no tracking database exists).
+
 ## Fixing client errors
 
-### Authorization errors from LFS HTTP(s) client requests
+### Authorization errors from LFS HTTP(S) client requests
 
 You may have problems if you're running a version of [Git LFS](https://git-lfs.github.com/) before 2.4.2.
 As noted in [this authentication issue](https://github.com/git-lfs/git-lfs/issues/3025),
 requests redirected from the secondary to the primary node do not properly send the
 Authorization header. This may result in either an infinite `Authorization <-> Redirect`
 loop, or Authorization error messages.
+
+## Recovering from a partial failover
+
+The partial failover to a secondary Geo *site* may be the result of a temporary/transient issue. Therefore, first attempt to run the promote command again.
+
+1. SSH into every Sidekiq, PostgresSQL, Gitaly, and Rails node in the **secondary** site and run one of the following commands:
+
+   - To promote the secondary node to primary:
+
+     ```shell
+     sudo gitlab-ctl geo promote
+     ```
+
+   - To promote the secondary node to primary **without any further confirmation**:
+
+     ```shell
+     sudo gitlab-ctl geo promote --force
+     ```
+
+1. Verify you can connect to the newly-promoted **primary** site using the URL used previously for the **secondary** site.
+1. If **successful**, the **secondary** site is now promoted to the **primary** site.
+
+If the above steps are **not successful**, proceed through the next steps:
+
+1. SSH to every Sidekiq, PostgresSQL, Gitaly and Rails node in the **secondary** site and perform the following operations:
+
+   - Create a `/etc/gitlab/gitlab-cluster.json` file with the following content:
+
+     ```shell
+     {
+       "primary": true,
+       "secondary": false
+     }
+     ```
+
+   - Reconfigure GitLab for the changes to take effect:
+
+     ```shell
+     sudo gitlab-ctl reconfigure
+     ```
+
+1. Verify you can connect to the newly-promoted **primary** site using the URL used previously for the **secondary** site.
+1. If successful, the **secondary** site is now promoted to the **primary** site.
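
Reviewer-side sketch of the fallback step this diff adds (creating `/etc/gitlab/gitlab-cluster.json`, then reconfiguring). The snippet writes the same JSON content shown in the added lines. The function name `write_cluster_json` and its optional path argument are hypothetical, introduced only so the step can be tried against a scratch file first. The default path and the `sudo gitlab-ctl reconfigure` command come from the diff itself.

```shell
#!/bin/sh
# Sketch only: write_cluster_json is a hypothetical helper, not part of GitLab.
# It writes the JSON content shown in the diff to the given path
# (default: /etc/gitlab/gitlab-cluster.json, the path named in the diff).
# Note: it overwrites any file that already exists at that path.
write_cluster_json() {
  target="${1:-/etc/gitlab/gitlab-cluster.json}"
  cat > "$target" <<'EOF'
{
  "primary": true,
  "secondary": false
}
EOF
}

# On each node of the secondary site, as root, one would then run:
#   write_cluster_json
#   sudo gitlab-ctl reconfigure
```

Passing a scratch path first (for example `write_cluster_json /tmp/gitlab-cluster.json`) lets the content be inspected before anything under `/etc/gitlab` is touched.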