Diffstat (limited to 'doc/administration/geo/replication')
11 files changed, 130 insertions, 37 deletions
diff --git a/doc/administration/geo/replication/configuration.md b/doc/administration/geo/replication/configuration.md
index 926c4c565aa..e8ffa1ae91a 100644
--- a/doc/administration/geo/replication/configuration.md
+++ b/doc/administration/geo/replication/configuration.md
@@ -196,9 +196,9 @@ keys must be manually replicated to the **secondary** node.
    gitlab-ctl reconfigure
    ```
 
-1. On the top bar, select **Menu >** **{admin}** **Admin**.
+1. On the top bar of the primary node, select **Menu >** **{admin}** **Admin**.
 1. On the left sidebar, select **Geo > Nodes**.
-1. Select **New node**.
+1. Select **Add site**.
 
    ![Add secondary node](img/adding_a_secondary_node_v13_3.png)
 
 1. Fill in **Name** with the `gitlab_rails['geo_node_name']` in
    `/etc/gitlab/gitlab.rb`. These values must always match *exactly*, character
diff --git a/doc/administration/geo/replication/datatypes.md b/doc/administration/geo/replication/datatypes.md
index 6989765dbad..a56d9dc813c 100644
--- a/doc/administration/geo/replication/datatypes.md
+++ b/doc/administration/geo/replication/datatypes.md
@@ -209,6 +209,6 @@ successfully, you must replicate their data using some other means.
 
 #### Limitation of verification for files in Object Storage
 
-GitLab managed Object Storage replication support [is in beta](object_storage.md#enabling-gitlab-managed-object-storage-replication).
+GitLab managed Object Storage replication support [is in beta](object_storage.md#enabling-gitlab-managed-object-storage-replication). Locally stored files are verified but remote stored files are not.
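The configuration.md hunk above renames the UI action to **Add site** and stresses that the **Name** field must match `gitlab_rails['geo_node_name']` exactly. As a sketch of the setting being referenced, with a placeholder value that is not taken from the diff:

```ruby
# /etc/gitlab/gitlab.rb on the Geo site being added -- the value below is
# hypothetical. The **Name** field entered under Geo > Nodes must match this
# string exactly, character for character.
gitlab_rails['geo_node_name'] = 'paris-office-secondary'
```

As in the hunk, `gitlab-ctl reconfigure` must be run afterwards for the setting to take effect.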
diff --git a/doc/administration/geo/replication/docker_registry.md b/doc/administration/geo/replication/docker_registry.md
index cc0719442a1..5cc4f66017b 100644
--- a/doc/administration/geo/replication/docker_registry.md
+++ b/doc/administration/geo/replication/docker_registry.md
@@ -53,7 +53,7 @@ We need to make Docker Registry send notification events to the
    registry['notifications'] = [
      {
        'name' => 'geo_event',
-       'url' => 'https://example.com/api/v4/container_registry_event/events',
+       'url' => 'https://<example.com>/api/v4/container_registry_event/events',
        'timeout' => '500ms',
        'threshold' => 5,
        'backoff' => '1s',
@@ -65,7 +65,8 @@ We need to make Docker Registry send notification events to the
    ```
 
    NOTE:
-   Replace `<replace_with_a_secret_token>` with a case sensitive alphanumeric string
+   Replace `<example.com>` with the `external_url` defined in your primary site's `/etc/gitlab/gitlab.rb` file, and
+   replace `<replace_with_a_secret_token>` with a case sensitive alphanumeric string
    that starts with a letter. You can generate one with `< /dev/urandom tr -dc _A-Z-a-z-0-9 | head -c 32 | sed "s/^[0-9]*//"; echo`
 
    NOTE:
@@ -109,11 +110,14 @@ For each application and Sidekiq node on the **secondary** site:
 1. Copy `/var/opt/gitlab/gitlab-rails/etc/gitlab-registry.key` from the **primary** to the node.
 
-1. Edit `/etc/gitlab/gitlab.rb`:
+1. Edit `/etc/gitlab/gitlab.rb` and add:
 
    ```ruby
    gitlab_rails['geo_registry_replication_enabled'] = true
-   gitlab_rails['geo_registry_replication_primary_api_url'] = 'https://primary.example.com:5050/' # Primary registry address, it will be used by the secondary node to directly communicate to primary registry
+
+   # Primary registry's hostname and port, it will be used by
+   # the secondary node to directly communicate to primary registry
+   gitlab_rails['geo_registry_replication_primary_api_url'] = 'https://primary.example.com:5050/'
    ```
 
 1. Reconfigure the node for the change to take effect:
diff --git a/doc/administration/geo/replication/faq.md b/doc/administration/geo/replication/faq.md
index ef41b2ff172..28030dccb3b 100644
--- a/doc/administration/geo/replication/faq.md
+++ b/doc/administration/geo/replication/faq.md
@@ -23,7 +23,7 @@ For each project to sync:
 1. Geo issues a `git fetch geo --mirror` to get the latest information from the **primary** site. If there are no changes, the sync is fast. Otherwise, it has to pull the latest commits.
-1. The **secondary** site updates the tracking database to store the fact that it has synced projects A, B, C, etc.
+1. The **secondary** site updates the tracking database to store the fact that it has synced projects A, B, C, and so on.
 1. Repeat until all projects are synced.
 
 When someone pushes a commit to the **primary** site, it generates an event in the GitLab database that the repository has changed.
@@ -46,8 +46,8 @@ Read the documentation for [Disaster Recovery](../disaster_recovery/index.md).
 ## What data is replicated to a **secondary** site?
 
 We currently replicate project repositories, LFS objects, generated
-attachments / avatars and the whole database. This means user accounts,
-issues, merge requests, groups, project data, etc., will be available for
+attachments and avatars, and the whole database. This means user accounts,
+issues, merge requests, groups, project data, and so on, will be available for
 query.
 
 ## Can I `git push` to a **secondary** site?
@@ -58,7 +58,7 @@ Yes! Pushing directly to a **secondary** site (for both HTTP and SSH, including
 All replication operations are asynchronous and are queued to be dispatched. Therefore, it depends on a lot of factors including the amount of traffic, how big your commit is, the
-connectivity between your sites, your hardware, etc.
+connectivity between your sites, your hardware, and so on.
 
 ## What if the SSH server runs at a different port?
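The docker_registry.md hunk above generates the notification secret with a `/dev/urandom` shell pipeline. A rough Ruby equivalent, not part of the diff, using only the standard library's `SecureRandom` and mirroring the `sed` step that strips leading digits:

```ruby
require 'securerandom'

# Generate 32 case-sensitive alphanumeric characters, then drop any leading
# digits so the token starts with a letter, as the NOTE in the hunk requires.
token = SecureRandom.alphanumeric(32).sub(/\A[0-9]+/, '')
puts token
```

Either approach produces a value suitable for `<replace_with_a_secret_token>`; the shell pipeline is what the documentation itself recommends.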
diff --git a/doc/administration/geo/replication/location_aware_git_url.md b/doc/administration/geo/replication/location_aware_git_url.md
index 014ca59e571..a80c293149e 100644
--- a/doc/administration/geo/replication/location_aware_git_url.md
+++ b/doc/administration/geo/replication/location_aware_git_url.md
@@ -88,7 +88,7 @@ routing configurations.
 
    ![Created policy record](img/single_git_created_policy_record.png)
 
-You have successfully set up a single host, e.g. `git.example.com` which
+You have successfully set up a single host, for example, `git.example.com` which
 distributes traffic to your Geo sites by geolocation!
 
 ## Configure Git clone URLs to use the special Git URL
diff --git a/doc/administration/geo/replication/remove_geo_node.md b/doc/administration/geo/replication/remove_geo_node.md
deleted file mode 100644
index b72cd3cbb95..00000000000
--- a/doc/administration/geo/replication/remove_geo_node.md
+++ /dev/null
@@ -1,9 +0,0 @@
----
-redirect_to: '../../geo/replication/remove_geo_site.md'
-remove_date: '2021-06-01'
----
-
-This document was moved to [another location](../../geo/replication/remove_geo_site.md).
-
-<!-- This redirect file can be deleted after 2021-06-01 -->
-<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/#move-or-rename-a-page -->
diff --git a/doc/administration/geo/replication/security_review.md b/doc/administration/geo/replication/security_review.md
index ae41599311b..966902a3d74 100644
--- a/doc/administration/geo/replication/security_review.md
+++ b/doc/administration/geo/replication/security_review.md
@@ -60,7 +60,7 @@ from [owasp.org](https://owasp.org/).
   access), but is constrained to read-only activities. The principal use case is envisioned to be cloning Git repositories from the **secondary** site in favor of the **primary** site, but end-users may use the GitLab web interface to view projects,
-  issues, merge requests, snippets, etc.
+  issues, merge requests, snippets, and so on.
 
 ### What security expectations do the end-users have?
@@ -203,7 +203,7 @@ from [owasp.org](https://owasp.org/).
 ### What data entry paths does the application support?
 
 - Data is entered via the web application exposed by GitLab itself. Some data is
-  also entered using system administration commands on the GitLab servers (e.g.,
+  also entered using system administration commands on the GitLab servers (for example
   `gitlab-ctl set-primary-node`).
 - **Secondary** sites also receive inputs via PostgreSQL streaming replication from the **primary** site.
@@ -247,7 +247,7 @@ from [owasp.org](https://owasp.org/).
 ### What encryption requirements have been defined for data in transit - including transmission over WAN, LAN, SecureFTP, or publicly accessible protocols such as http: and https:?
 
 - Data must have the option to be encrypted in transit, and be secure against
-  both passive and active attack (e.g., MITM attacks should not be possible).
+  both passive and active attack (for example, MITM attacks should not be possible).
 
 ## Access
diff --git a/doc/administration/geo/replication/troubleshooting.md b/doc/administration/geo/replication/troubleshooting.md
index c00f523957c..d63e927627a 100644
--- a/doc/administration/geo/replication/troubleshooting.md
+++ b/doc/administration/geo/replication/troubleshooting.md
@@ -327,7 +327,7 @@ Slots where `active` is `f` are not active.
 - When this slot should be active, because you have a **secondary** node configured using that slot, log in to that **secondary** node and check the PostgreSQL logs why the replication is not running.
-- If you are no longer using the slot (e.g. you no longer have Geo enabled), you can remove it with in the
+- If you are no longer using the slot (for example, you no longer have Geo enabled), you can remove it with in the
   PostgreSQL console session:
 
   ```sql
@@ -378,7 +378,7 @@ This happens on wrongly-formatted addresses in `postgresql['md5_auth_cidr_addresses']`
   ```
 
 To fix this, update the IP addresses in `/etc/gitlab/gitlab.rb` under `postgresql['md5_auth_cidr_addresses']`
-to respect the CIDR format (i.e. `1.2.3.4/32`).
+to respect the CIDR format (that is, `1.2.3.4/32`).
 
 ### Message: `LOG: invalid IP mask "md5": Name or service not known`
@@ -390,7 +390,7 @@ This happens when you have added IP addresses without a subnet mask in `postgresql['md5_auth_cidr_addresses']`
   ```
 
 To fix this, add the subnet mask in `/etc/gitlab/gitlab.rb` under `postgresql['md5_auth_cidr_addresses']`
-to respect the CIDR format (i.e. `1.2.3.4/32`).
+to respect the CIDR format (that is, `1.2.3.4/32`).
 
 ### Message: `Found data in the gitlabhq_production database!` when running `gitlab-ctl replicate-geo-database`
@@ -588,6 +588,75 @@ to start again from scratch, there are a few steps that can help you:
    gitlab-ctl start
    ```
 
+### Design repository failures on mirrored projects and project imports
+
+On the top bar, under **Menu >** **{admin}** **Admin > Geo > Nodes**,
+if the Design repositories progress bar shows
+`Synced` and `Failed` greater than 100%, and negative `Queued`, then the instance
+is likely affected by
+[a bug in GitLab 13.2 and 13.3](https://gitlab.com/gitlab-org/gitlab/-/issues/241668).
+It was [fixed in 13.4+](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/40643).
+
+To determine the actual replication status of design repositories in
+a [Rails console](../../operations/rails_console.md):
+
+```ruby
+secondary = Gitlab::Geo.current_node
+counts = {}
+secondary.designs.select("projects.id").find_each do |p|
+  registry = Geo::DesignRegistry.find_by(project_id: p.id)
+  state = registry ? "#{registry.state}" : "registry does not exist yet"
+  # puts "Design ID##{p.id}: #{state}" # uncomment this for granular information
+  counts[state] ||= 0
+  counts[state] += 1
+end
+puts "\nCounts:", counts
+```
+
+Example output:
+
+```plaintext
+Design ID#5: started
+Design ID#6: synced
+Design ID#7: failed
+Design ID#8: pending
+Design ID#9: synced
+
+Counts:
+{"started"=>1, "synced"=>2, "failed"=>1, "pending"=>1}
+```
+
+Example output if there are actually zero design repository replication failures:
+
+```plaintext
+Design ID#5: synced
+Design ID#6: synced
+Design ID#7: synced
+
+Counts:
+{"synced"=>3}
+```
+
+#### If you are promoting a Geo secondary site running on a single server
+
+`gitlab-ctl promotion-preflight-checks` will fail due to the existence of
+`failed` rows in the `geo_design_registry` table. Use the
+[previous snippet](#design-repository-failures-on-mirrored-projects-and-project-imports) to
+determine the actual replication status of Design repositories.
+
+`gitlab-ctl promote-to-primary-node` will fail since it runs preflight checks.
+If the [previous snippet](#design-repository-failures-on-mirrored-projects-and-project-imports)
+shows that all designs are synced, then you can use the
+`--skip-preflight-checks` option or the `--force` option to move forward with
+promotion.
+
+#### If you are promoting a Geo secondary site running on multiple servers
+
+`gitlab-ctl promotion-preflight-checks` will fail due to the existence of
+`failed` rows in the `geo_design_registry` table. Use the
+[previous snippet](#design-repository-failures-on-mirrored-projects-and-project-imports) to
+determine the actual replication status of Design repositories.
+
 ## Fixing errors during a failover or when promoting a secondary to a primary node
 
 The following are possible errors that might be encountered during failover or
@@ -726,6 +795,7 @@ sudo gitlab-ctl promotion-preflight-checks
 sudo /opt/gitlab/embedded/bin/gitlab-pg-ctl promote
 sudo gitlab-ctl reconfigure
 sudo gitlab-rake geo:set_secondary_as_primary
+```
 
 ## Expired artifacts
 
@@ -794,7 +864,7 @@ PostgreSQL instances:
 The most common problems that prevent the database from replicating correctly are:
 
-- **Secondary** nodes cannot reach the **primary** node. Check credentials, firewall rules, etc.
+- **Secondary** nodes cannot reach the **primary** node. Check credentials, firewall rules, and so on.
 - SSL certificate problems. Make sure you copied `/etc/gitlab/gitlab-secrets.json` from the **primary** node.
 - Database storage disk is full.
 - Database replication slot is misconfigured.
diff --git a/doc/administration/geo/replication/updating_the_geo_nodes.md b/doc/administration/geo/replication/updating_the_geo_nodes.md
index 0c68adf162d..03570048071 100644
--- a/doc/administration/geo/replication/updating_the_geo_nodes.md
+++ b/doc/administration/geo/replication/updating_the_geo_nodes.md
@@ -28,9 +28,9 @@ and all **secondary** nodes:
 1. **Optional:** [Pause replication on each **secondary** node.](../index.md#pausing-and-resuming-replication)
 1. Log into the **primary** node.
-1. [Update GitLab on the **primary** node using Omnibus's Geo-specific steps](https://docs.gitlab.com/omnibus/update/README.html#geo-deployment).
+1. [Update GitLab on the **primary** node using Omnibus](https://docs.gitlab.com/omnibus/update/#update-using-the-official-repositories).
 1. Log into each **secondary** node.
-1. [Update GitLab on each **secondary** node using Omnibus's Geo-specific steps](https://docs.gitlab.com/omnibus/update/README.html#geo-deployment).
+1. [Update GitLab on each **secondary** node using Omnibus](https://docs.gitlab.com/omnibus/update/#update-using-the-official-repositories).
 1. If you paused replication in step 1, [resume replication on each **secondary**](../index.md#pausing-and-resuming-replication)
 1. [Test](#check-status-after-updating) **primary** and **secondary** nodes, and check version in each.
diff --git a/doc/administration/geo/replication/usage.md b/doc/administration/geo/replication/usage.md
index 1491aa3427e..7fe8eec467e 100644
--- a/doc/administration/geo/replication/usage.md
+++ b/doc/administration/geo/replication/usage.md
@@ -27,7 +27,7 @@ Everything up-to-date
 ```
 
 NOTE:
-If you're using HTTPS instead of [SSH](../../../ssh/README.md) to push to the secondary,
+If you're using HTTPS instead of [SSH](../../../ssh/index.md) to push to the secondary,
 you can't store credentials in the URL like `user:password@URL`. Instead, you can use a
 [`.netrc` file](https://www.gnu.org/software/inetutils/manual/html_node/The-_002enetrc-file.html)
 for Unix-like operating systems or `_netrc` for Windows. In that case, the credentials
diff --git a/doc/administration/geo/replication/version_specific_updates.md b/doc/administration/geo/replication/version_specific_updates.md
index 301be931b29..e193fc630b9 100644
--- a/doc/administration/geo/replication/version_specific_updates.md
+++ b/doc/administration/geo/replication/version_specific_updates.md
@@ -11,16 +11,35 @@ Review this page for update instructions for your version.
 
 These steps accompany the
 [general steps](updating_the_geo_nodes.md#general-update-steps)
 for updating Geo nodes.
 
+## Updating to GitLab 13.12
+
+We found an issue where [secondary nodes re-download all LFS files](https://gitlab.com/gitlab-org/gitlab/-/issues/334550) upon update. This bug:
+
+- Only applies to Geo secondary sites that have replicated LFS objects.
+- Is _not_ a data loss risk.
+- Causes churn and wasted bandwidth re-downloading all LFS objects.
+- May impact performance for GitLab installations with a large number of LFS files.
+
+If you don't have many LFS objects or can stand a bit of churn, then it is safe to let the secondary sites re-download LFS objects.
+If you do have many LFS objects, or many Geo secondary sites, or limited bandwidth, or a combination of them all, then we recommend you skip GitLab 13.12.0 through 13.12.6 and update to GitLab 13.12.7 or newer.
+
+### If you have already updated to an affected version, and the re-sync is ongoing
+
+You can manually migrate the legacy sync state to the new state column by running the following command in a [Rails console](../../operations/rails_console.md). It should take under a minute:
+
+```ruby
+Geo::LfsObjectRegistry.where(state: 0, success: true).update_all(state: 2)
+```
+
 ## Updating to GitLab 13.11
 
-We found an [issue with Git clone/pull through HTTP(s)](https://gitlab.com/gitlab-org/gitlab/-/issues/330787) on Geo secondaries and on any GitLab instance if maintenance mode is enabled. This was caused by a regression in GitLab Workhorse. This is fixed in the [GitLab 13.11.4 patch release](https://about.gitlab.com/releases/2021/05/14/gitlab-13-11-4-released/). To avoid this issue, upgrade to GitLab 13.11.4 or later.
+We found an [issue with Git clone/pull through HTTP(s)](https://gitlab.com/gitlab-org/gitlab/-/issues/330787) on Geo secondaries and on any GitLab instance if maintenance mode is enabled. This was caused by a regression in GitLab Workhorse. This is fixed in the [GitLab 13.11.4 patch release](https://about.gitlab.com/releases/2021/05/14/gitlab-13-11-4-released/). To avoid this issue, upgrade to GitLab 13.11.4 or later.
 
 ## Updating to GitLab 13.9
 
 We've detected an issue [with a column rename](https://gitlab.com/gitlab-org/gitlab/-/issues/324160)
-that may prevent upgrades to GitLab 13.9.0, 13.9.1, 13.9.2 and 13.9.3.
-We are working on a patch, but until a fixed version is released, you can manually complete
-the zero-downtime upgrade:
+that will prevent upgrades to GitLab 13.9.0, 13.9.1, 13.9.2 and 13.9.3 when following the zero-downtime steps. It is necessary
+to perform the following additional steps for the zero-downtime upgrade:
 
 1. Before running the final `sudo gitlab-rake db:migrate` command on the deploy node,
    execute the following queries using the PostgreSQL console (or `sudo gitlab-psql`)
@@ -40,9 +59,18 @@ the zero-downtime upgrade:
    ```
 
 If you have already run the final `sudo gitlab-rake db:migrate` command on the deploy node and have
-encountered the [column rename issue](https://gitlab.com/gitlab-org/gitlab/-/issues/324160), you can still
-follow the previous steps to complete the update.
+encountered the [column rename issue](https://gitlab.com/gitlab-org/gitlab/-/issues/324160), you will
+see the following error:
+
+```shell
+-- remove_column(:application_settings, :asset_proxy_whitelist)
+rake aborted!
+StandardError: An error has occurred, all later migrations canceled:
+PG::DependentObjectsStillExist: ERROR: cannot drop column asset_proxy_whitelist of table application_settings because other objects depend on it
+DETAIL: trigger trigger_0d588df444c8 on table application_settings depends on column asset_proxy_whitelist of table application_settings
+```
+
+To work around this bug, follow the previous steps to complete the update.
 More details are available [in this issue](https://gitlab.com/gitlab-org/gitlab/-/issues/324160).
 
 ## Updating to GitLab 13.7
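The 13.12 hunk above migrates legacy LFS registry rows with `Geo::LfsObjectRegistry.where(state: 0, success: true).update_all(state: 2)`. A standalone sketch of the same selection logic over plain hashes, for understanding which rows are touched; the rows are made up, and the state numbering follows the hunk's one-liner (`0` for the legacy value, `2` for synced):

```ruby
# Stand-in registry rows; only rows still in the legacy state (0) that had
# already synced successfully (success: true) are flipped to the new
# synced state (2). Everything else is left for the normal re-sync path.
rows = [
  { id: 1, state: 0, success: true },   # legacy row, already synced -> migrate
  { id: 2, state: 0, success: false },  # never synced -> leave as-is
  { id: 3, state: 2, success: true },   # already in new format -> untouched
]
rows.each { |row| row[:state] = 2 if row[:state] == 0 && row[:success] }
```

On a real instance the `update_all` from the hunk performs this in a single SQL `UPDATE` against the tracking database, which is why it completes in under a minute.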