summaryrefslogtreecommitdiff
path: root/doc/administration/geo/replication
diff options
context:
space:
mode:
Diffstat (limited to 'doc/administration/geo/replication')
-rw-r--r--doc/administration/geo/replication/configuration.md26
-rw-r--r--doc/administration/geo/replication/datatypes.md30
-rw-r--r--doc/administration/geo/replication/docker_registry.md12
-rw-r--r--doc/administration/geo/replication/faq.md36
-rw-r--r--doc/administration/geo/replication/geo_validation_tests.md9
-rw-r--r--doc/administration/geo/replication/location_aware_git_url.md30
-rw-r--r--doc/administration/geo/replication/multiple_servers.md20
-rw-r--r--doc/administration/geo/replication/object_storage.md18
-rw-r--r--doc/administration/geo/replication/remove_geo_node.md2
-rw-r--r--doc/administration/geo/replication/security_review.md68
-rw-r--r--doc/administration/geo/replication/troubleshooting.md7
-rw-r--r--doc/administration/geo/replication/usage.md8
-rw-r--r--doc/administration/geo/replication/using_a_geo_server.md2
-rw-r--r--doc/administration/geo/replication/version_specific_updates.md25
14 files changed, 184 insertions, 109 deletions
diff --git a/doc/administration/geo/replication/configuration.md b/doc/administration/geo/replication/configuration.md
index 7dbb0c78166..6d5f3e61ba0 100644
--- a/doc/administration/geo/replication/configuration.md
+++ b/doc/administration/geo/replication/configuration.md
@@ -24,7 +24,7 @@ You are encouraged to first read through all the steps before executing them
in your testing/production environment.
NOTE:
-**Do not** set up any custom authentication for the **secondary** nodes. This will be handled by the **primary** node.
+**Do not** set up any custom authentication for the **secondary** nodes. This is handled by the **primary** node.
Any change that requires access to the **Admin Area** needs to be done in the
**primary** node because the **secondary** node is a read-only replica.
@@ -41,7 +41,7 @@ they must be manually replicated to the **secondary** node.
sudo cat /etc/gitlab/gitlab-secrets.json
```
- This will display the secrets that need to be replicated, in JSON format.
+ This displays the secrets that need to be replicated, in JSON format.
1. SSH into the **secondary** node and login as the `root` user:
@@ -85,11 +85,11 @@ GitLab integrates with the system-installed SSH daemon, designating a user
(typically named `git`) through which all access requests are handled.
In a [Disaster Recovery](../disaster_recovery/index.md) situation, GitLab system
-administrators will promote a **secondary** node to the **primary** node. DNS records for the
+administrators promote a **secondary** node to the **primary** node. DNS records for the
**primary** domain should also be updated to point to the new **primary** node
-(previously a **secondary** node). Doing so will avoid the need to update Git remotes and API URLs.
+(previously a **secondary** node). Doing so avoids the need to update Git remotes and API URLs.
-This will cause all SSH requests to the newly promoted **primary** node to
+This causes all SSH requests to the newly promoted **primary** node to
fail due to SSH host key mismatch. To prevent this, the primary SSH host
keys must be manually replicated to the **secondary** node.
@@ -183,7 +183,7 @@ keys must be manually replicated to the **secondary** node.
sudo -i
```
-1. Edit `/etc/gitlab/gitlab.rb` and add a **unique** name for your node. You will need this in the next steps:
+1. Edit `/etc/gitlab/gitlab.rb` and add a **unique** name for your node. You need this in the next steps:
```ruby
# The unique identifier for the Geo node.
@@ -229,9 +229,9 @@ keys must be manually replicated to the **secondary** node.
gitlab-rake gitlab:geo:check
```
-Once added to the Geo administration page and restarted, the **secondary** node will automatically start
+Once added to the Geo administration page and restarted, the **secondary** node automatically starts
replicating missing data from the **primary** node in a process known as **backfill**.
-Meanwhile, the **primary** node will start to notify each **secondary** node of any changes, so
+Meanwhile, the **primary** node starts to notify each **secondary** node of any changes, so
that the **secondary** node can act on those notifications immediately.
Be sure the _secondary_ node is running and accessible. You can sign in to the
@@ -241,7 +241,7 @@ _secondary_ node with the same credentials as were used with the _primary_ node.
You can safely skip this step if your **primary** node uses a CA-issued HTTPS certificate.
-If your **primary** node is using a self-signed certificate for *HTTPS* support, you will
+If your **primary** node is using a self-signed certificate for *HTTPS* support, you
need to add that certificate to the **secondary** node's trust store. Retrieve the
certificate from the **primary** node and follow
[these instructions](https://docs.gitlab.com/omnibus/settings/ssl.html)
@@ -265,7 +265,7 @@ the _primary_ node. Visit the _secondary_ node's **Admin Area > Geo**
(`/admin/geo/nodes`) in your browser to determine if it's correctly identified
as a _secondary_ Geo node, and if Geo is enabled.
-The initial replication, or 'backfill', will probably still be in progress. You
+The initial replication, or 'backfill', is probably still in progress. You
can monitor the synchronization process on each Geo node from the **primary**
node's **Geo Nodes** dashboard in your browser.
@@ -282,12 +282,12 @@ The two most obvious issues that can become apparent in the dashboard are:
- You are using a custom certificate or custom CA (see the [troubleshooting document](troubleshooting.md)).
- The instance is firewalled (check your firewall rules).
-Please note that disabling a **secondary** node will stop the synchronization process.
+Please note that disabling a **secondary** node stops the synchronization process.
Please note that if `git_data_dirs` is customized on the **primary** node for multiple
repository shards you must duplicate the same configuration on each **secondary** node.
-Point your users to the ["Using a Geo Server" guide](using_a_geo_server.md).
+Point your users to the [Using a Geo Site guide](usage.md).
Currently, this is what is synced:
@@ -312,7 +312,7 @@ It is important to note that selective synchronization:
1. Does not hide project metadata from **secondary** nodes.
- Since Geo currently relies on PostgreSQL replication, all project metadata
gets replicated to **secondary** nodes, but repositories that have not been
- selected will be empty.
+ selected are empty.
1. Does not reduce the number of events generated for the Geo event log.
- The **primary** node generates events as long as any **secondary** nodes are present.
Selective synchronization restrictions are implemented on the **secondary** nodes,
diff --git a/doc/administration/geo/replication/datatypes.md b/doc/administration/geo/replication/datatypes.md
index 61f99257844..e2f12cbd8dc 100644
--- a/doc/administration/geo/replication/datatypes.md
+++ b/doc/administration/geo/replication/datatypes.md
@@ -205,3 +205,33 @@ successfully, you must replicate their data using some other means.
|[GitLab Pages](../../pages/index.md) | [No](https://gitlab.com/groups/gitlab-org/-/epics/589) | No | No | |
|[Dependency proxy images](../../../user/packages/dependency_proxy/index.md) | [No](https://gitlab.com/gitlab-org/gitlab/-/issues/259694) | No | No | Blocked on [Geo: Secondary Mimicry](https://gitlab.com/groups/gitlab-org/-/epics/1528). Note that replication of this cache is not needed for Disaster Recovery purposes because it can be recreated from external sources. |
|[Vulnerability Export](../../../user/application_security/vulnerability_report/#export-vulnerability-details) | [Not planned](https://gitlab.com/groups/gitlab-org/-/epics/3111) | No | Via Object Storage provider if supported. Native Geo support (Beta). | Not planned because they are ephemeral and sensitive. They can be regenerated on demand. |
+
+#### LFS object replication using the self service framework
+
+> - [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/276696) in GitLab 13.12.
+> - [Deployed behind a feature flag](../../../user/feature_flags.md), enabled by default.
+> - Not enabled on GitLab.com as Geo is not enabled.
+> - Recommended for production use.
+> - For GitLab self-managed instances, GitLab administrators can opt to [disable it](#enable-or-disable-lfs-object-replication-using-the-self-service-framework).
+
+There can be [risks when disabling released features](../../../user/feature_flags.md#risks-when-disabling-released-features).
+Refer to this feature's version history for more details.
+
+##### Enable or disable LFS object replication using the self service framework
+
+LFS object replication using the self service framework is under development but ready for production use. It is
+deployed behind a feature flag that is **enabled by default**.
+[GitLab administrators with access to the GitLab Rails console](../../../administration/feature_flags.md)
+can opt to disable it.
+
+To enable it:
+
+```ruby
+Feature.enable(:geo_lfs_object_replication)
+```
+
+To disable it:
+
+```ruby
+Feature.disable(:geo_lfs_object_replication)
+```
diff --git a/doc/administration/geo/replication/docker_registry.md b/doc/administration/geo/replication/docker_registry.md
index ea73614511f..a8628481ba7 100644
--- a/doc/administration/geo/replication/docker_registry.md
+++ b/doc/administration/geo/replication/docker_registry.md
@@ -24,7 +24,7 @@ integrated [Container Registry](../../packages/container_registry.md#use-object-
You can enable a storage-agnostic replication so it
can be used for cloud or local storage. Whenever a new image is pushed to the
-**primary** site, each **secondary** site will pull it to its own container
+**primary** site, each **secondary** site pulls it to its own container
repository.
To configure Docker Registry replication:
@@ -70,12 +70,12 @@ We need to make Docker Registry send notification events to the
NOTE:
If you use an external Registry (not the one integrated with GitLab), you must add
- these settings to its configuration yourself. In this case, you will also have to specify
+ these settings to its configuration yourself. In this case, you also have to specify
notification secret in `registry.notification_secret` section of
`/etc/gitlab/gitlab.rb` file.
NOTE:
- If you use GitLab HA, you will also have to specify
+ If you use GitLab HA, you also have to specify
the notification secret in `registry.notification_secret` section of
`/etc/gitlab/gitlab.rb` file for every web node.
@@ -95,11 +95,11 @@ expecting to see the Docker images replicated.
Because we need to allow the **secondary** site to communicate securely with
the **primary** site Container Registry, we need to have a single key
-pair for all the sites. The **secondary** site will use this key to
+pair for all the sites. The **secondary** site uses this key to
generate a short-lived JWT that is pull-only-capable to access the
**primary** site Container Registry.
-For each application and Sidekiq node on the **secondary** site:
+For each application and Sidekiq node on the **secondary** site:
1. SSH into the node and login as the `root` user:
@@ -126,5 +126,5 @@ For each application and Sidekiq node on the **secondary** site:
To verify Container Registry replication is working, go to **Admin Area > Geo**
(`/admin/geo/nodes`) on the **secondary** site.
-The initial replication, or "backfill", will probably still be in progress.
+The initial replication, or "backfill", is probably still in progress.
You can monitor the synchronization process on each Geo site from the **primary** site's **Geo Nodes** dashboard in your browser.
diff --git a/doc/administration/geo/replication/faq.md b/doc/administration/geo/replication/faq.md
index 73a36f5e674..a83a1c22db6 100644
--- a/doc/administration/geo/replication/faq.md
+++ b/doc/administration/geo/replication/faq.md
@@ -13,61 +13,61 @@ The requirements are listed [on the index page](../index.md#requirements-for-run
## How does Geo know which projects to sync?
-On each **secondary** node, there is a read-only replicated copy of the GitLab database.
-A **secondary** node also has a tracking database where it stores which projects have been synced.
+On each **secondary** site, there is a read-only replicated copy of the GitLab database.
+A **secondary** site also has a tracking database where it stores which projects have been synced.
Geo compares the two databases to find projects that are not yet tracked.
At the start, this tracking database is empty, so Geo will start trying to update from every project that it can see in the GitLab database.
For each project to sync:
-1. Geo will issue a `git fetch geo --mirror` to get the latest information from the **primary** node.
+1. Geo will issue a `git fetch geo --mirror` to get the latest information from the **primary** site.
If there are no changes, the sync will be fast and end quickly. Otherwise, it will pull the latest commits.
-1. The **secondary** node will update the tracking database to store the fact that it has synced projects A, B, C, etc.
+1. The **secondary** site will update the tracking database to store the fact that it has synced projects A, B, C, etc.
1. Repeat until all projects are synced.
-When someone pushes a commit to the **primary** node, it generates an event in the GitLab database that the repository has changed.
-The **secondary** node sees this event, marks the project in question as dirty, and schedules the project to be resynced.
+When someone pushes a commit to the **primary** site, it generates an event in the GitLab database that the repository has changed.
+The **secondary** site sees this event, marks the project in question as dirty, and schedules the project to be resynced.
To ensure that problems with pipelines (for example, syncs failing too many times or jobs being lost) don't permanently stop projects syncing, Geo also periodically checks the tracking database for projects that are marked as dirty. This check happens when
the number of concurrent syncs falls below `repos_max_capacity` and there are no new projects waiting to be synced.
Geo also has a checksum feature which runs a SHA256 sum across all the Git references to the SHA values.
-If the refs don't match between the **primary** node and the **secondary** node, then the **secondary** node will mark that project as dirty and try to resync it.
+If the refs don't match between the **primary** site and the **secondary** site, then the **secondary** site will mark that project as dirty and try to resync it.
So even if we have an outdated tracking database, the validation should activate and find discrepancies in the repository state and resync.
## Can I use Geo in a disaster recovery situation?
Yes, but there are limitations to what we replicate (see
-[What data is replicated to a **secondary** node?](#what-data-is-replicated-to-a-secondary-node)).
+[What data is replicated to a **secondary** site?](#what-data-is-replicated-to-a-secondary-site)).
Read the documentation for [Disaster Recovery](../disaster_recovery/index.md).
-## What data is replicated to a **secondary** node?
+## What data is replicated to a **secondary** site?
We currently replicate project repositories, LFS objects, generated
attachments / avatars and the whole database. This means user accounts,
issues, merge requests, groups, project data, etc., will be available for
query.
-## Can I `git push` to a **secondary** node?
+## Can I `git push` to a **secondary** site?
-Yes! Pushing directly to a **secondary** node (for both HTTP and SSH, including Git LFS) was [introduced](https://about.gitlab.com/releases/2018/09/22/gitlab-11-3-released/) in [GitLab Premium](https://about.gitlab.com/pricing/#self-managed) 11.3.
+Yes! Pushing directly to a **secondary** site (for both HTTP and SSH, including Git LFS) was [introduced](https://about.gitlab.com/releases/2018/09/22/gitlab-11-3-released/) in [GitLab Premium](https://about.gitlab.com/pricing/#self-managed) 11.3.
-## How long does it take to have a commit replicated to a **secondary** node?
+## How long does it take to have a commit replicated to a **secondary** site?
All replication operations are asynchronous and are queued to be dispatched. Therefore, it depends on a lot of
factors including the amount of traffic, how big your commit is, the
-connectivity between your nodes, your hardware, etc.
+connectivity between your sites, your hardware, etc.
## What if the SSH server runs at a different port?
-That's totally fine. We use HTTP(s) to fetch repository changes from the **primary** node to all **secondary** nodes.
+That's totally fine. We use HTTP(s) to fetch repository changes from the **primary** site to all **secondary** sites.
-## Is this possible to set up a Docker Registry for a **secondary** node that mirrors the one on the **primary** node?
+## Is this possible to set up a Docker Registry for a **secondary** site that mirrors the one on the **primary** site?
-Yes. See [Docker Registry for a **secondary** node](docker_registry.md).
+Yes. See [Docker Registry for a **secondary** site](docker_registry.md).
-## Can I login to a secondary node?
+## Can I login to a secondary site?
-Yes, but secondary nodes receive all authentication data (like user accounts and logins) from the primary instance. This means you will be re-directed to the primary for authentication and routed back afterwards.
+Yes, but secondary sites receive all authentication data (like user accounts and logins) from the primary instance. This means you will be re-directed to the primary for authentication and routed back afterwards.
diff --git a/doc/administration/geo/replication/geo_validation_tests.md b/doc/administration/geo/replication/geo_validation_tests.md
index 06fd3cd70be..8f67e70c9e2 100644
--- a/doc/administration/geo/replication/geo_validation_tests.md
+++ b/doc/administration/geo/replication/geo_validation_tests.md
@@ -179,6 +179,15 @@ The following are PostgreSQL upgrade validation tests we performed.
The following are additional validation tests we performed.
+### May 2021
+
+[Test failover with object storage replication enabled](https://gitlab.com/gitlab-org/gitlab/-/issues/330362):
+
+- Description: At the time of testing, Geo's object storage replication functionality was in beta. We tested that object storage replication works as intended and that the data was present on the new primary after a failover.
+- Outcome: The test was successful. Data in object storage was replicated and present after a failover.
+- Follow up issues:
+ - [Geo: Failing to replicate initial Monitoring project](https://gitlab.com/gitlab-org/gitlab/-/issues/330485)
+
### August 2020
[Test Gitaly Cluster on a Geo Deployment](https://gitlab.com/gitlab-org/gitlab/-/issues/223210):
diff --git a/doc/administration/geo/replication/location_aware_git_url.md b/doc/administration/geo/replication/location_aware_git_url.md
index 272b746015b..014ca59e571 100644
--- a/doc/administration/geo/replication/location_aware_git_url.md
+++ b/doc/administration/geo/replication/location_aware_git_url.md
@@ -8,11 +8,11 @@ type: howto
# Location-aware Git remote URL with AWS Route53 **(PREMIUM SELF)**
You can provide GitLab users with a single remote URL that automatically uses
-the Geo node closest to them. This means users don't need to update their Git
-configuration to take advantage of closer Geo nodes as they move.
+the Geo site closest to them. This means users don't need to update their Git
+configuration to take advantage of closer Geo sites as they move.
This is possible because, Git push requests can be automatically redirected
-(HTTP) or proxied (SSH) from **secondary** nodes to the **primary** node.
+(HTTP) or proxied (SSH) from **secondary** sites to the **primary** site.
Though these instructions use [AWS Route53](https://aws.amazon.com/route53/),
other services such as [Cloudflare](https://www.cloudflare.com/) could be used
@@ -20,30 +20,30 @@ as well.
NOTE:
You can also use a load balancer to distribute web UI or API traffic to
-[multiple Geo **secondary** nodes](../../../user/admin_area/geo_nodes.md#multiple-secondary-nodes-behind-a-load-balancer).
-Importantly, the **primary** node cannot yet be included. See the feature request
+[multiple Geo **secondary** sites](../../../user/admin_area/geo_nodes.md#multiple-secondary-nodes-behind-a-load-balancer).
+Importantly, the **primary** site cannot yet be included. See the feature request
[Support putting the **primary** behind a Geo node load balancer](https://gitlab.com/gitlab-org/gitlab/-/issues/10888).
## Prerequisites
In this example, we have already set up:
-- `primary.example.com` as a Geo **primary** node.
-- `secondary.example.com` as a Geo **secondary** node.
+- `primary.example.com` as a Geo **primary** site.
+- `secondary.example.com` as a Geo **secondary** site.
We will create a `git.example.com` subdomain that will automatically direct
requests:
-- From Europe to the **secondary** node.
-- From all other locations to the **primary** node.
+- From Europe to the **secondary** site.
+- From all other locations to the **primary** site.
In any case, you require:
-- A working GitLab **primary** node that is accessible at its own address.
-- A working GitLab **secondary** node.
+- A working GitLab **primary** site that is accessible at its own address.
+- A working GitLab **secondary** site.
- A Route53 Hosted Zone managing your domain.
-If you haven't yet set up a Geo _primary_ node and _secondary_ node, see the
+If you haven't yet set up a Geo _primary_ site and _secondary_ site, see the
[Geo setup instructions](../index.md#setup-instructions).
## Create a traffic policy
@@ -89,7 +89,7 @@ routing configurations.
![Created policy record](img/single_git_created_policy_record.png)
You have successfully set up a single host, e.g. `git.example.com` which
-distributes traffic to your Geo nodes by geolocation!
+distributes traffic to your Geo sites by geolocation!
## Configure Git clone URLs to use the special Git URL
@@ -114,10 +114,10 @@ You can customize the:
After following the configuration steps above, handling for Git requests is now location aware.
For requests:
-- Outside Europe, all requests are directed to the **primary** node.
+- Outside Europe, all requests are directed to the **primary** site.
- Within Europe, over:
- HTTP:
- - `git clone http://git.example.com/foo/bar.git` is directed to the **secondary** node.
+ - `git clone http://git.example.com/foo/bar.git` is directed to the **secondary** site.
- `git push` is initially directed to the **secondary**, which automatically
redirects to `primary.example.com`.
- SSH:
diff --git a/doc/administration/geo/replication/multiple_servers.md b/doc/administration/geo/replication/multiple_servers.md
index f83e0e14e54..59bb3884a02 100644
--- a/doc/administration/geo/replication/multiple_servers.md
+++ b/doc/administration/geo/replication/multiple_servers.md
@@ -53,14 +53,14 @@ It is possible to use cloud hosted services for PostgreSQL and Redis, but this i
## Prerequisites: Two working GitLab multi-node clusters
-One cluster will serve as the **primary** node. Use the
+One cluster serves as the **primary** node. Use the
[GitLab multi-node documentation](../../reference_architectures/index.md) to set this up. If
you already have a working GitLab instance that is in-use, it can be used as a
**primary**.
-The second cluster will serve as the **secondary** node. Again, use the
+The second cluster serves as the **secondary** node. Again, use the
[GitLab multi-node documentation](../../reference_architectures/index.md) to set this up.
-It's a good idea to log in and test it, however, note that its data will be
+It's a good idea to log in and test it, however, note that its data is
wiped out as part of the process of replicating from the **primary**.
## Configure the GitLab cluster to be the **primary** node
@@ -120,7 +120,7 @@ major differences:
called the "tracking database", which tracks the synchronization state of
various resources.
-Therefore, we will set up the multi-node components one-by-one, and include deviations
+Therefore, we set up the multi-node components one-by-one, and include deviations
from the normal multi-node setup. However, we highly recommend first configuring a
brand-new cluster as if it were not part of a Geo setup so that it can be
tested and verified as a working cluster. And only then should it be modified
@@ -133,7 +133,7 @@ Configure the following services, again using the non-Geo multi-node
documentation:
- [Configuring Redis for GitLab](../../redis/replication_and_failover.md#example-configuration-for-the-gitlab-application) for multiple nodes.
-- [Gitaly](../../gitaly/index.md), which will store data that is
+- [Gitaly](../../gitaly/index.md), which stores data that is
synchronized from the **primary** node.
NOTE:
@@ -143,7 +143,7 @@ recommended.
### Step 2: Configure the main read-only replica PostgreSQL database on the **secondary** node
NOTE:
-The following documentation assumes the database will be run on
+The following documentation assumes the database runs on
a single node only. Multi-node PostgreSQL on **secondary** nodes is
[not currently supported](https://gitlab.com/groups/gitlab-org/-/epics/2536).
@@ -151,7 +151,7 @@ Configure the [**secondary** database](../setup/database.md) as a read-only repl
the **primary** database. Use the following as a guide.
1. Generate an MD5 hash of the desired password for the database user that the
- GitLab application will use to access the read-replica database:
+ GitLab application uses to access the read-replica database:
Note that the username (`gitlab` by default) is incorporated into the hash.
@@ -233,13 +233,13 @@ If using an external PostgreSQL instance, refer also to
### Step 3: Configure the tracking database on the **secondary** node
NOTE:
-This documentation assumes the tracking database will be run on
+This documentation assumes the tracking database runs on
only a single machine, rather than as a PostgreSQL cluster.
Configure the tracking database.
1. Generate an MD5 hash of the desired password for the database user that the
- GitLab application will use to access the tracking database:
+ GitLab application uses to access the tracking database:
Note that the username (`gitlab_geo` by default) is incorporated into the
hash.
@@ -377,7 +377,7 @@ Make sure that current node IP is listed in `postgresql['md5_auth_cidr_addresses
After making these changes [Reconfigure GitLab](../../restart_gitlab.md#omnibus-gitlab-reconfigure) so the changes take effect.
-On the secondary the following GitLab frontend services will be enabled:
+On the secondary the following GitLab frontend services are enabled:
- `geo-logcursor`
- `gitlab-pages`
diff --git a/doc/administration/geo/replication/object_storage.md b/doc/administration/geo/replication/object_storage.md
index ad419f999b3..7dd831092a3 100644
--- a/doc/administration/geo/replication/object_storage.md
+++ b/doc/administration/geo/replication/object_storage.md
@@ -9,9 +9,9 @@ type: howto
Geo can be used in combination with Object Storage (AWS S3, or other compatible object storage).
-Currently, **secondary** nodes can use either:
+Currently, **secondary** sites can use either:
-- The same storage bucket as the **primary** node.
+- The same storage bucket as the **primary** site.
- A replicated storage bucket.
To have:
@@ -28,13 +28,13 @@ To have:
WARNING:
This is a [**beta** feature](https://about.gitlab.com/handbook/product/#beta) and is not ready yet for production use at any scale. The main limitations are a lack of testing at scale and no verification of any replicated data.
-**Secondary** nodes can replicate files stored on the **primary** node regardless of
+**Secondary** sites can replicate files stored on the **primary** site regardless of
whether they are stored on the local file system or in object storage.
To enable GitLab replication, you must:
1. Go to **Admin Area > Geo**.
-1. Press **Edit** on the **secondary** node.
+1. Press **Edit** on the **secondary** site.
1. In the **Synchronization Settings** section, find the **Allow this secondary node to replicate content on Object Storage**
checkbox to enable it.
@@ -46,7 +46,7 @@ For CI job artifacts, there is similar documentation to configure
For user uploads, there is similar documentation to configure [upload object storage](../../uploads.md#using-object-storage)
-If you want to migrate the **primary** node's files to object storage, you can
+If you want to migrate the **primary** site's files to object storage, you can
configure the **secondary** in a few ways:
- Use the exact same object storage.
@@ -57,15 +57,15 @@ configure the **secondary** in a few ways:
GitLab does not currently support the case where both:
-- The **primary** node uses local storage.
-- A **secondary** node uses object storage.
+- The **primary** site uses local storage.
+- A **secondary** site uses object storage.
## Third-party replication services
When using Amazon S3, you can use
[CRR](https://docs.aws.amazon.com/AmazonS3/latest/dev/crr.html) to
-have automatic replication between the bucket used by the **primary** node and
-the bucket used by **secondary** nodes.
+have automatic replication between the bucket used by the **primary** site and
+the bucket used by **secondary** sites.
If you are using Google Cloud Storage, consider using
[Multi-Regional Storage](https://cloud.google.com/storage/docs/storage-classes#multi-regional).
diff --git a/doc/administration/geo/replication/remove_geo_node.md b/doc/administration/geo/replication/remove_geo_node.md
index 09ea84b6c4b..697d8c6ae38 100644
--- a/doc/administration/geo/replication/remove_geo_node.md
+++ b/doc/administration/geo/replication/remove_geo_node.md
@@ -4,5 +4,5 @@ redirect_to: '../../geo/replication/remove_geo_site.md'
This document was moved to [another location](../../geo/replication/remove_geo_site.md).
-<!-- This redirect file can be deleted after 2022-04-01 -->
+<!-- This redirect file can be deleted after 2021-06-01 -->
<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/#move-or-rename-a-page -->
diff --git a/doc/administration/geo/replication/security_review.md b/doc/administration/geo/replication/security_review.md
index abb84b95623..f84d7a2171d 100644
--- a/doc/administration/geo/replication/security_review.md
+++ b/doc/administration/geo/replication/security_review.md
@@ -36,7 +36,7 @@ from [owasp.org](https://owasp.org/).
- The GitLab model of sensitivity is centered around public vs. internal vs.
private projects. Geo replicates them all indiscriminately. "Selective sync"
exists for files and repositories (but not database content), which would permit
- only less-sensitive projects to be replicated to a **secondary** node if desired.
+ only less-sensitive projects to be replicated to a **secondary** site if desired.
- See also: [GitLab data classification policy](https://about.gitlab.com/handbook/engineering/security/data-classification-standard.html).
### What data backup and retention requirements have been defined for the application?
@@ -48,18 +48,18 @@ from [owasp.org](https://owasp.org/).
### Who are the application's end‐users?
-- **Secondary** nodes are created in regions that are distant (in terms of
- Internet latency) from the main GitLab installation (the **primary** node). They are
- intended to be used by anyone who would ordinarily use the **primary** node, who finds
- that the **secondary** node is closer to them (in terms of Internet latency).
+- **Secondary** sites are created in regions that are distant (in terms of
+ Internet latency) from the main GitLab installation (the **primary** site). They are
+ intended to be used by anyone who would ordinarily use the **primary** site, who finds
+ that the **secondary** site is closer to them (in terms of Internet latency).
### How do the end‐users interact with the application?
-- **Secondary** nodes provide all the interfaces a **primary** node does
+- **Secondary** sites provide all the interfaces a **primary** site does
(notably a HTTP/HTTPS web application, and HTTP/HTTPS or SSH Git repository
access), but is constrained to read-only activities. The principal use case is
- envisioned to be cloning Git repositories from the **secondary** node in favor of the
- **primary** node, but end-users may use the GitLab web interface to view projects,
+ envisioned to be cloning Git repositories from the **secondary** site in favor of the
+ **primary** site, but end-users may use the GitLab web interface to view projects,
issues, merge requests, snippets, etc.
### What security expectations do the end‐users have?
@@ -67,10 +67,10 @@ from [owasp.org](https://owasp.org/).
- The replication process must be secure. It would typically be unacceptable to
transmit the entire database contents or all files and repositories across the
public Internet in plaintext, for instance.
-- **Secondary** nodes must have the same access controls over its content as the
- **primary** node - unauthenticated users must not be able to gain access to privileged
- information on the **primary** node by querying the **secondary** node.
-- Attackers must not be able to impersonate the **secondary** node to the **primary** node, and
+- **Secondary** sites must have the same access controls over its content as the
+ **primary** site - unauthenticated users must not be able to gain access to privileged
+ information on the **primary** site by querying the **secondary** site.
+- Attackers must not be able to impersonate the **secondary** site to the **primary** site, and
thus gain access to privileged information.
## Administrators
@@ -86,7 +86,7 @@ from [owasp.org](https://owasp.org/).
### What administrative capabilities does the application offer?
-- **Secondary** nodes may be added, modified, or removed by users with
+- **Secondary** sites may be added, modified, or removed by users with
administrative access.
- The replication process may be controlled (start/stop) via the Sidekiq
administrative controls.
@@ -95,9 +95,9 @@ from [owasp.org](https://owasp.org/).
### What details regarding routing, switching, firewalling, and load‐balancing have been defined?
-- Geo requires the **primary** node and **secondary** node to be able to communicate with each
- other across a TCP/IP network. In particular, the **secondary** nodes must be able to
- access HTTP/HTTPS and PostgreSQL services on the **primary** node.
+- Geo requires the **primary** site and **secondary** site to be able to communicate with each
+ other across a TCP/IP network. In particular, the **secondary** sites must be able to
+ access HTTP/HTTPS and PostgreSQL services on the **primary** site.
### What core network devices support the application?
@@ -105,9 +105,9 @@ from [owasp.org](https://owasp.org/).
### What network performance requirements exist?
-- Maximum replication speeds between **primary** node and **secondary** node is limited by the
+- Maximum replication speeds between **primary** site and **secondary** site is limited by the
available bandwidth between sites. No hard requirements exist - time to complete
- replication (and ability to keep up with changes on the **primary** node) is a function
+ replication (and ability to keep up with changes on the **primary** site) is a function
of the size of the data set, tolerance for latency, and available network
capacity.
@@ -189,9 +189,9 @@ from [owasp.org](https://owasp.org/).
### How will database connection strings, encryption keys, and other sensitive components be stored, accessed, and protected from unauthorized detection?
- There are some Geo-specific values. Some are shared secrets which must be
- securely transmitted from the **primary** node to the **secondary** node at setup time. Our
- documentation recommends transmitting them from the **primary** node to the system
- administrator via SSH, and then back out to the **secondary** node in the same manner.
+ securely transmitted from the **primary** site to the **secondary** site at setup time. Our
+ documentation recommends transmitting them from the **primary** site to the system
+ administrator via SSH, and then back out to the **secondary** site in the same manner.
In particular, this includes the PostgreSQL replication credentials and a secret
key (`db_key_base`) which is used to decrypt certain columns in the database.
The `db_key_base` secret is stored unencrypted on the file system, in
@@ -205,25 +205,25 @@ from [owasp.org](https://owasp.org/).
- Data is entered via the web application exposed by GitLab itself. Some data is
also entered using system administration commands on the GitLab servers (e.g.,
`gitlab-ctl set-primary-node`).
-- **Secondary** nodes also receive inputs via PostgreSQL streaming replication from the **primary** node.
+- **Secondary** sites also receive inputs via PostgreSQL streaming replication from the **primary** site.
### What data output paths does the application support?
-- **Primary** nodes output via PostgreSQL streaming replication to the **secondary** node.
+- **Primary** sites output via PostgreSQL streaming replication to the **secondary** site.
Otherwise, principally via the web application exposed by GitLab itself, and via
SSH `git clone` operations initiated by the end-user.
### How does data flow across the application's internal components?
-- **Secondary** nodes and **primary** nodes interact via HTTP/HTTPS (secured with JSON web
+- **Secondary** sites and **primary** sites interact via HTTP/HTTPS (secured with JSON web
tokens) and via PostgreSQL streaming replication.
-- Within a **primary** node or **secondary** node, the SSOT is the file system and the database
- (including Geo tracking database on **secondary** node). The various internal components
+- Within a **primary** site or **secondary** site, the SSOT is the file system and the database
+ (including Geo tracking database on **secondary** site). The various internal components
are orchestrated to make alterations to these stores.
### What data input validation requirements have been defined?
-- **Secondary** nodes must have a faithful replication of the **primary** node's data.
+- **Secondary** sites must have a faithful replication of the **primary** site's data.
### What data does the application store and how?
@@ -231,11 +231,11 @@ from [owasp.org](https://owasp.org/).
### What data is or may need to be encrypted and what key management requirements have been defined?
-- Neither **primary** nodes or **secondary** nodes encrypt Git repository or file system data at
+- Neither **primary** sites or **secondary** sites encrypt Git repository or file system data at
rest. A subset of database columns are encrypted at rest using the `db_otp_key`.
- A static secret shared across all hosts in a GitLab deployment.
- In transit, data should be encrypted, although the application does permit
- communication to proceed unencrypted. The two main transits are the **secondary** node's
+ communication to proceed unencrypted. The two main transits are the **secondary** site's
replication process for PostgreSQL, and for Git repositories/files. Both should
be protected using TLS, with the keys for that managed via Omnibus per existing
configuration for end-user access to GitLab.
@@ -253,19 +253,19 @@ from [owasp.org](https://owasp.org/).
### What user privilege levels does the application support?
-- Geo adds one type of privilege: **secondary** nodes can access a special Geo API to
+- Geo adds one type of privilege: **secondary** sites can access a special Geo API to
download files over HTTP/HTTPS, and to clone repositories using HTTP/HTTPS.
### What user identification and authentication requirements have been defined?
-- **Secondary** nodes identify to Geo **primary** nodes via OAuth or JWT authentication
+- **Secondary** sites identify to Geo **primary** sites via OAuth or JWT authentication
based on the shared database (HTTP access) or a PostgreSQL replication user (for
database replication). The database replication also requires IP-based access
controls to be defined.
### What user authorization requirements have been defined?
-- **Secondary** nodes must only be able to *read* data. They are not currently able to mutate data on the **primary** node.
+- **Secondary** sites must only be able to *read* data. They are not currently able to mutate data on the **primary** site.
### What session management requirements have been defined?
@@ -279,9 +279,9 @@ from [owasp.org](https://owasp.org/).
### What access requirements have been defined for URI and Service calls?
-- **Secondary** nodes make many calls to the **primary** node's API. This is how file
+- **Secondary** sites make many calls to the **primary** site's API. This is how file
replication proceeds, for instance. This endpoint is only accessible with a JWT token.
-- The **primary** node also makes calls to the **secondary** node to get status information.
+- The **primary** site also makes calls to the **secondary** site to get status information.
## Application Monitoring
diff --git a/doc/administration/geo/replication/troubleshooting.md b/doc/administration/geo/replication/troubleshooting.md
index 079a3713c73..6d990fd12ba 100644
--- a/doc/administration/geo/replication/troubleshooting.md
+++ b/doc/administration/geo/replication/troubleshooting.md
@@ -558,6 +558,7 @@ to start again from scratch, there are a few steps that can help you:
mv /var/opt/gitlab/gitlab-rails/uploads /var/opt/gitlab/gitlab-rails/uploads.old
mkdir -p /var/opt/gitlab/gitlab-rails/uploads
+ gitlab-ctl start postgresql
gitlab-ctl start geo-postgresql
```
@@ -853,6 +854,12 @@ To resolve this issue:
the **primary** node using IPv4 in the `/etc/hosts` file. Alternatively, you should
[enable IPv6 on the **primary** node](https://docs.gitlab.com/omnibus/settings/nginx.html#setting-the-nginx-listen-address-or-addresses).
+### GitLab Pages return 404 errors after promoting
+
+This is due to [Pages data not being managed by Geo](datatypes.md#limitations-on-replicationverification).
+Find advice to resolve those errors in the
+[Pages administration documentation](../../../administration/pages/index.md#404-error-after-promoting-a-geo-secondary-to-a-primary-node).
+
## Fixing client errors
### Authorization errors from LFS HTTP(s) client requests
diff --git a/doc/administration/geo/replication/usage.md b/doc/administration/geo/replication/usage.md
index 2fcc0565567..1491aa3427e 100644
--- a/doc/administration/geo/replication/usage.md
+++ b/doc/administration/geo/replication/usage.md
@@ -25,3 +25,11 @@ remote: ssh://git@primary.geo/user/repo.git
remote:
Everything up-to-date
```
+
+NOTE:
+If you're using HTTPS instead of [SSH](../../../ssh/README.md) to push to the secondary,
+you can't store credentials in the URL like `user:password@URL`. Instead, you can use a
+[`.netrc` file](https://www.gnu.org/software/inetutils/manual/html_node/The-_002enetrc-file.html)
+for Unix-like operating systems or `_netrc` for Windows. In that case, the credentials
+will be stored as a plain text. If you're looking for a more secure way to store credentials,
+you can use [Git Credential Storage](https://git-scm.com/book/en/v2/Git-Tools-Credential-Storage).
diff --git a/doc/administration/geo/replication/using_a_geo_server.md b/doc/administration/geo/replication/using_a_geo_server.md
index e48e750f710..f8ce72ac3f8 100644
--- a/doc/administration/geo/replication/using_a_geo_server.md
+++ b/doc/administration/geo/replication/using_a_geo_server.md
@@ -4,5 +4,5 @@ redirect_to: '../../geo/replication/usage.md'
This document was moved to [another location](../../geo/replication/usage.md).
-<!-- This redirect file can be deleted after 2022-04-01 -->
+<!-- This redirect file can be deleted after 2022-06-01 -->
<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/#move-or-rename-a-page -->
diff --git a/doc/administration/geo/replication/version_specific_updates.md b/doc/administration/geo/replication/version_specific_updates.md
index 4f0a4dc638c..4a101c52325 100644
--- a/doc/administration/geo/replication/version_specific_updates.md
+++ b/doc/administration/geo/replication/version_specific_updates.md
@@ -15,8 +15,29 @@ for updating Geo nodes.
We've detected an issue [with a column rename](https://gitlab.com/gitlab-org/gitlab/-/issues/324160)
that may prevent upgrades to GitLab 13.9.0, 13.9.1, 13.9.2 and 13.9.3.
-We are working on a patch and recommend delaying any upgrade attempt until a fixed version
-is released.
+We are working on a patch, but until a fixed version is released, you can manually complete
+the zero-downtime upgrade:
+
+1. Before running the final `sudo gitlab-rake db:migrate` command on the deploy node,
+ execute the following queries using the PostgreSQL console (or `sudo gitlab-psql`)
+ to drop the problematic triggers:
+
+ ```sql
+ drop trigger trigger_e40a6f1858e6 on application_settings;
+ drop trigger trigger_0d588df444c8 on application_settings;
+ drop trigger trigger_1572cbc9a15f on application_settings;
+ drop trigger trigger_22a39c5c25f3 on application_settings;
+ ```
+
+1. Run the final migrations:
+
+ ```shell
+ sudo gitlab-rake db:migrate
+ ```
+
+If you have already run the final `sudo gitlab-rake db:migrate` command on the deploy node and have
+encountered the [column rename issue](https://gitlab.com/gitlab-org/gitlab/-/issues/324160), you can still
+follow the previous steps to complete the update.
More details are available [in this issue](https://gitlab.com/gitlab-org/gitlab/-/issues/324160).