summaryrefslogtreecommitdiff
path: root/doc/administration/geo/replication/troubleshooting.md
diff options
context:
space:
mode:
Diffstat (limited to 'doc/administration/geo/replication/troubleshooting.md')
-rw-r--r--doc/administration/geo/replication/troubleshooting.md78
1 files changed, 74 insertions, 4 deletions
diff --git a/doc/administration/geo/replication/troubleshooting.md b/doc/administration/geo/replication/troubleshooting.md
index c00f523957c..d63e927627a 100644
--- a/doc/administration/geo/replication/troubleshooting.md
+++ b/doc/administration/geo/replication/troubleshooting.md
@@ -327,7 +327,7 @@ Slots where `active` is `f` are not active.
- When this slot should be active, because you have a **secondary** node configured using that slot,
log in to that **secondary** node and check the PostgreSQL logs why the replication is not running.
-- If you are no longer using the slot (e.g. you no longer have Geo enabled), you can remove it with in the
+- If you are no longer using the slot (for example, you no longer have Geo enabled), you can remove it with in the
PostgreSQL console session:
```sql
@@ -378,7 +378,7 @@ This happens on wrongly-formatted addresses in `postgresql['md5_auth_cidr_addres
```
To fix this, update the IP addresses in `/etc/gitlab/gitlab.rb` under `postgresql['md5_auth_cidr_addresses']`
-to respect the CIDR format (i.e. `1.2.3.4/32`).
+to respect the CIDR format (that is, `1.2.3.4/32`).
### Message: `LOG: invalid IP mask "md5": Name or service not known`
@@ -390,7 +390,7 @@ This happens when you have added IP addresses without a subnet mask in `postgres
```
To fix this, add the subnet mask in `/etc/gitlab/gitlab.rb` under `postgresql['md5_auth_cidr_addresses']`
-to respect the CIDR format (i.e. `1.2.3.4/32`).
+to respect the CIDR format (that is, `1.2.3.4/32`).
### Message: `Found data in the gitlabhq_production database!` when running `gitlab-ctl replicate-geo-database`
@@ -588,6 +588,75 @@ to start again from scratch, there are a few steps that can help you:
gitlab-ctl start
```
+### Design repository failures on mirrored projects and project imports
+
+On the top bar, under **Menu >** **{admin}** **Admin > Geo > Nodes**,
+if the Design repositories progress bar shows
+`Synced` and `Failed` greater than 100%, and negative `Queued`, then the instance
+is likely affected by
+[a bug in GitLab 13.2 and 13.3](https://gitlab.com/gitlab-org/gitlab/-/issues/241668).
+It was [fixed in 13.4+](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/40643).
+
+To determine the actual replication status of design repositories in
+a [Rails console](../../operations/rails_console.md):
+
+```ruby
+secondary = Gitlab::Geo.current_node
+counts = {}
+secondary.designs.select("projects.id").find_each do |p|
+ registry = Geo::DesignRegistry.find_by(project_id: p.id)
+ state = registry ? "#{registry.state}" : "registry does not exist yet"
+ # puts "Design ID##{p.id}: #{state}" # uncomment this for granular information
+ counts[state] ||= 0
+ counts[state] += 1
+end
+puts "\nCounts:", counts
+```
+
+Example output:
+
+```plaintext
+Design ID#5: started
+Design ID#6: synced
+Design ID#7: failed
+Design ID#8: pending
+Design ID#9: synced
+
+Counts:
+{"started"=>1, "synced"=>2, "failed"=>1, "pending"=>1}
+```
+
+Example output if there are actually zero design repository replication failures:
+
+```plaintext
+Design ID#5: synced
+Design ID#6: synced
+Design ID#7: synced
+
+Counts:
+{"synced"=>3}
+```
+
+#### If you are promoting a Geo secondary site running on a single server
+
+`gitlab-ctl promotion-preflight-checks` will fail due to the existence of
+`failed` rows in the `geo_design_registry` table. Use the
+[previous snippet](#design-repository-failures-on-mirrored-projects-and-project-imports) to
+determine the actual replication status of Design repositories.
+
+`gitlab-ctl promote-to-primary-node` will fail since it runs preflight checks.
+If the [previous snippet](#design-repository-failures-on-mirrored-projects-and-project-imports)
+shows that all designs are synced, then you can use the
+`--skip-preflight-checks` option or the `--force` option to move forward with
+promotion.
+
+#### If you are promoting a Geo secondary site running on multiple servers
+
+`gitlab-ctl promotion-preflight-checks` will fail due to the existence of
+`failed` rows in the `geo_design_registry` table. Use the
+[previous snippet](#design-repository-failures-on-mirrored-projects-and-project-imports) to
+determine the actual replication status of Design repositories.
+
## Fixing errors during a failover or when promoting a secondary to a primary node
The following are possible errors that might be encountered during failover or
@@ -726,6 +795,7 @@ sudo gitlab-ctl promotion-preflight-checks
sudo /opt/gitlab/embedded/bin/gitlab-pg-ctl promote
sudo gitlab-ctl reconfigure
sudo gitlab-rake geo:set_secondary_as_primary
+```
## Expired artifacts
@@ -794,7 +864,7 @@ PostgreSQL instances:
The most common problems that prevent the database from replicating correctly are:
-- **Secondary** nodes cannot reach the **primary** node. Check credentials, firewall rules, etc.
+- **Secondary** nodes cannot reach the **primary** node. Check credentials, firewall rules, and so on.
- SSL certificate problems. Make sure you copied `/etc/gitlab/gitlab-secrets.json` from the **primary** node.
- Database storage disk is full.
- Database replication slot is misconfigured.