summaryrefslogtreecommitdiff
path: root/doc/administration/geo/replication/troubleshooting.md
diff options
context:
space:
mode:
Diffstat (limited to 'doc/administration/geo/replication/troubleshooting.md')
-rw-r--r--doc/administration/geo/replication/troubleshooting.md91
1 files changed, 77 insertions, 14 deletions
diff --git a/doc/administration/geo/replication/troubleshooting.md b/doc/administration/geo/replication/troubleshooting.md
index 2c0160ed02a..3b1f60a0f3f 100644
--- a/doc/administration/geo/replication/troubleshooting.md
+++ b/doc/administration/geo/replication/troubleshooting.md
@@ -5,7 +5,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
type: howto
---
-# Troubleshooting Geo **(PREMIUM ONLY)**
+# Troubleshooting Geo **(PREMIUM SELF)**
Setting up Geo requires careful attention to details and sometimes it's easy to
miss a step.
@@ -219,7 +219,7 @@ sudo gitlab-rake gitlab:geo:check
```
- Ensure that you have added the secondary node in the Admin Area of the **primary** node.
- - Ensure that you entered the `external_url` or `gitlab_rails['geo_node_name']` when adding the secondary node in the admin are of the **primary** node.
+ - Ensure that you entered the `external_url` or `gitlab_rails['geo_node_name']` when adding the secondary node in the Admin Area of the **primary** node.
- Prior to GitLab 12.4, edit the secondary node in the Admin Area of the **primary** node and ensure that there is a trailing `/` in the `Name` field.
1. Check returns `Exception: PG::UndefinedTable: ERROR: relation "geo_nodes" does not exist`
@@ -396,6 +396,69 @@ In GitLab 13.4, a seed project is added when GitLab is first installed. This mak
on a new Geo secondary node. There is an [issue to account for seed projects](https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/5618)
when checking the database.
+### Message: `Synchronization failed - Error syncing repository`
+
+WARNING:
+If large repositories are affected by this problem,
+their resync may take a long time and cause significant load on your Geo nodes,
+storage and network systems.
+
+If you get the error `Synchronization failed - Error syncing repository` along with the following log messages, this indicates that the expected `geo` remote is not present in the `.git/config` file
+of a repository on the secondary Geo node's filesystem:
+
+```json
+{
+ "created": "@1603481145.084348757",
+ "description": "Error received from peer unix:/var/opt/gitlab/gitaly/gitaly.socket",
+ …
+ "grpc_message": "exit status 128",
+ "grpc_status": 13
+}
+{ …
+ "grpc.request.fullMethod": "/gitaly.RemoteService/FindRemoteRootRef",
+ "grpc.request.glProjectPath": "<namespace>/<project>",
+ …
+ "level": "error",
+ "msg": "fatal: 'geo' does not appear to be a git repository
+ fatal: Could not read from remote repository. …",
+}
+```
+
+To solve this:
+
+1. Log into the secondary Geo node.
+
+1. Back up [the `.git` folder](../../repository_storage_types.md#translating-hashed-storage-paths).
+
+1. Optional: [Spot-check](../../troubleshooting/log_parsing.md#find-all-projects-affected-by-a-fatal-git-problem))
+ a few of those IDs whether they indeed correspond
+ to a project with known Geo replication failures.
+ Use `fatal: 'geo'` as the `grep` term and the following API call:
+
+ ```shell
+ curl --request GET --header "PRIVATE-TOKEN: <your_access_token>" "https://gitlab.example.com/api/v4/projects/<first_failed_geo_sync_ID>"
+ ```
+
+1. Enter the [Rails console](../../troubleshooting/navigating_gitlab_via_rails_console.md) and run:
+
+ ```ruby
+ failed_geo_syncs = Geo::ProjectRegistry.failed.pluck(:id)
+ failed_geo_syncs.each do |fgs|
+ puts Geo::ProjectRegistry.failed.find(fgs).project_id
+ end
+ ```
+
+1. Run the following commands to reset each project's
+ Geo-related attributes and execute a new sync:
+
+ ```ruby
+ failed_geo_syncs.each do |fgs|
+ registry = Geo::ProjectRegistry.failed.find(fgs)
+ registry.update(resync_repository: true, force_to_redownload_repository: false, repository_retry_count: 0)
+ Geo::RepositorySyncService.new(registry.project).execute
+ end
+ ```
+
### Very large repositories never successfully synchronize on the **secondary** node
GitLab places a timeout on all repository clones, including project imports
@@ -678,19 +741,19 @@ sudo /opt/gitlab/embedded/bin/gitlab-pg-ctl promote
GitLab 12.9 and later are [unaffected by this error](https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/5147).
-### Two-factor authentication is broken after a failover
+### Message: `ERROR - Replication is not up-to-date` during `gitlab-ctl promotion-preflight-checks`
+
+In GitLab 13.7 and earlier, if you have a data type with zero items to sync,
+this command reports `ERROR - Replication is not up-to-date` even if
+replication is actually up-to-date. This bug was fixed in GitLab 13.8 and
+later.
-The setup instructions for Geo prior to 10.5 failed to replicate the
-`otp_key_base` secret, which is used to encrypt the two-factor authentication
-secrets stored in the database. If it differs between **primary** and **secondary**
-nodes, users with two-factor authentication enabled won't be able to log in
-after a failover.
+### Message: `ERROR - Replication is not up-to-date` during `gitlab-ctl promote-to-primary-node`
-If you still have access to the old **primary** node, you can follow the
-instructions in the
-[Upgrading to GitLab 10.5](../replication/version_specific_updates.md#updating-to-gitlab-105)
-section to resolve the error. Otherwise, the secret is lost and you'll need to
-[reset two-factor authentication for all users](../../../security/two_factor_authentication.md#disabling-2fa-for-everyone).
+In GitLab 13.7 and earlier, if you have a data type with zero items to sync,
+this command reports `ERROR - Replication is not up-to-date` even if
+replication is actually up-to-date. If replication and verification output
+shows that it is complete, you can add `--skip-preflight-checks` to make the command complete promotion. This bug was fixed in GitLab 13.8 and later.
## Expired artifacts
@@ -717,7 +780,7 @@ node's URL matches its external URL.
## Fixing common errors
-This section documents common errors reported in the Admin UI and how to fix them.
+This section documents common errors reported in the Admin Area and how to fix them.
### Geo database configuration file is missing