Diffstat (limited to 'doc/administration/geo')
-rw-r--r-- doc/administration/geo/disaster_recovery/index.md | 22
-rw-r--r-- doc/administration/geo/disaster_recovery/planned_failover.md | 5
-rw-r--r-- doc/administration/geo/index.md | 122
-rw-r--r-- doc/administration/geo/replication/configuration.md | 26
-rw-r--r-- doc/administration/geo/replication/datatypes.md | 30
-rw-r--r-- doc/administration/geo/replication/docker_registry.md | 12
-rw-r--r-- doc/administration/geo/replication/faq.md | 36
-rw-r--r-- doc/administration/geo/replication/geo_validation_tests.md | 9
-rw-r--r-- doc/administration/geo/replication/location_aware_git_url.md | 30
-rw-r--r-- doc/administration/geo/replication/multiple_servers.md | 20
-rw-r--r-- doc/administration/geo/replication/object_storage.md | 18
-rw-r--r-- doc/administration/geo/replication/remove_geo_node.md | 2
-rw-r--r-- doc/administration/geo/replication/security_review.md | 68
-rw-r--r-- doc/administration/geo/replication/troubleshooting.md | 7
-rw-r--r-- doc/administration/geo/replication/usage.md | 8
-rw-r--r-- doc/administration/geo/replication/using_a_geo_server.md | 2
-rw-r--r-- doc/administration/geo/replication/version_specific_updates.md | 25
-rw-r--r-- doc/administration/geo/setup/database.md | 272
-rw-r--r-- doc/administration/geo/setup/index.md | 2
19 files changed, 518 insertions, 198 deletions
diff --git a/doc/administration/geo/disaster_recovery/index.md b/doc/administration/geo/disaster_recovery/index.md
index d1ea2978202..7c6f4a32b57 100644
--- a/doc/administration/geo/disaster_recovery/index.md
+++ b/doc/administration/geo/disaster_recovery/index.md
@@ -62,8 +62,7 @@ must disable the **primary** node.
- If you do not have SSH access to the **primary** node, take the machine offline and
prevent it from rebooting by any means at your disposal.
- Since there are many ways you may prefer to accomplish this, we will avoid a
- single recommendation. You may need to:
+ You might need to:
- Reconfigure the load balancers.
- Change DNS records (for example, point the primary DNS record to the
@@ -240,11 +239,11 @@ an external PostgreSQL database, as it can only perform changes on a **secondary
node with GitLab and the database on the same machine. As a result, a manual process is
required:
-1. Promote the replica database associated with the **secondary** site. This will
- set the database to read-write. The instructions vary depending on where your database is hosted:
+1. Promote the replica database associated with the **secondary** site. This
+ sets the database to read-write. The instructions vary depending on where your database is hosted:
- [Amazon RDS](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ReadRepl.html#USER_ReadRepl.Promote)
- [Azure PostgreSQL](https://docs.microsoft.com/en-us/azure/postgresql/howto-read-replicas-portal#stop-replication)
- - For other external PostgreSQL databases, save the following script in you
+ - For other external PostgreSQL databases, save the following script in your
secondary node, for example `/tmp/geo_promote.sh`, and modify the connection
parameters to match your environment. Then, execute it to promote the replica:
@@ -292,7 +291,7 @@ required:
### Step 4. (Optional) Updating the primary domain DNS record
Updating the DNS records for the primary domain to point to the **secondary** node
-will prevent the need to update all references to the primary domain to the
+prevents the need to update all references to the primary domain to the
secondary domain, like changing Git remotes and API URLs.
1. SSH into the **secondary** node and login as root:
@@ -311,7 +310,7 @@ secondary domain, like changing Git remotes and API URLs.
```
NOTE:
- Changing `external_url` won't prevent access via the old secondary URL, as
+ Changing `external_url` does not prevent access via the old secondary URL, as
long as the secondary DNS records are still intact.
1. Reconfigure the **secondary** node for the change to take effect:
@@ -326,7 +325,7 @@ secondary domain, like changing Git remotes and API URLs.
gitlab-rake geo:update_primary_node_url
```
- This command will use the changed `external_url` configuration defined
+ This command uses the changed `external_url` configuration defined
in `/etc/gitlab/gitlab.rb`.
1. For GitLab 11.11 through 12.7 only, you may need to update the **primary**
@@ -334,7 +333,7 @@ secondary domain, like changing Git remotes and API URLs.
To determine if you need to do this, search for the
`gitlab_rails["geo_node_name"]` setting in your `/etc/gitlab/gitlab.rb`
- file. If it is commented out with `#` or not found at all, then you will
+ file. If it is commented out with `#` or not found at all, then you
need to update the **primary** node's name in the database. You can search for it
like so:
@@ -444,7 +443,7 @@ and after that you also need two extra steps.
Now we need to make each **secondary** node listen to changes on the new **primary** node. To do that you need
to [initiate the replication process](../setup/database.md#step-3-initiate-the-replication-process) again but this time
-for another **primary** node. All the old replication settings will be overwritten.
+for another **primary** node. All the old replication settings are overwritten.
## Promoting a secondary Geo cluster in GitLab Cloud Native Helm Charts
@@ -479,8 +478,7 @@ must disable the **primary** site:
- If you do not have access to the **primary** Kubernetes cluster, take the cluster offline and
prevent it from coming back online by any means at your disposal.
- Since there are many ways you may prefer to accomplish this, we will avoid a
- single recommendation. You may need to:
+ You might need to:
- Reconfigure the load balancers.
- Change DNS records (for example, point the primary DNS record to the
diff --git a/doc/administration/geo/disaster_recovery/planned_failover.md b/doc/administration/geo/disaster_recovery/planned_failover.md
index 96c6482e3db..bd8467f5437 100644
--- a/doc/administration/geo/disaster_recovery/planned_failover.md
+++ b/doc/administration/geo/disaster_recovery/planned_failover.md
@@ -27,7 +27,7 @@ have a high degree of confidence in being able to perform them accurately.
## Not all data is automatically replicated
-If you are using any GitLab features that Geo [doesn't support](../index.md#limitations),
+If you are using any GitLab features that Geo [doesn't support](../replication/datatypes.md#limitations-on-replicationverification),
you must make separate provisions to ensure that the **secondary** node has an
up-to-date copy of any data associated with that feature. This may extend the
required scheduled maintenance period significantly.
@@ -40,8 +40,7 @@ final transfer inside the maintenance window) will then transfer only the
Repository-centric strategies for using `rsync` effectively can be found in the
[moving repositories](../../operations/moving_repositories.md) documentation; these strategies can
-be adapted for use with any other file-based data, such as GitLab Pages (to
-be found in `/var/opt/gitlab/gitlab-rails/shared/pages` if using Omnibus).
+be adapted for use with any other file-based data, such as [GitLab Pages](../../pages/index.md#change-storage-path).
## Preflight checks
diff --git a/doc/administration/geo/index.md b/doc/administration/geo/index.md
index f1884f297e8..780e391973c 100644
--- a/doc/administration/geo/index.md
+++ b/doc/administration/geo/index.md
@@ -22,12 +22,14 @@ Geo undergoes significant changes from release to release. Upgrades **are** supp
Fetching large repositories can take a long time for teams located far from a single GitLab instance.
-Geo provides local, read-only instances of your GitLab instances. This can reduce the time it takes
+Geo provides local, read-only sites of your GitLab instances. This can reduce the time it takes
to clone and fetch large repositories, speeding up development.
For a video introduction to Geo, see [Introduction to GitLab Geo - GitLab Features](https://www.youtube.com/watch?v=-HDLxSjEh6w).
-To make sure you're using the right version of the documentation, navigate to [the source version of this page on GitLab.com](https://gitlab.com/gitlab-org/gitlab/blob/master/doc/administration/geo/index.md) and choose the appropriate release from the **Switch branch/tag** dropdown. For example, [`v11.2.3-ee`](https://gitlab.com/gitlab-org/gitlab/blob/v11.2.3-ee/doc/administration/geo/index.md).
+To make sure you're using the right version of the documentation, navigate to [the Geo page on GitLab.com](https://gitlab.com/gitlab-org/gitlab/blob/master/doc/administration/geo/index.md) and choose the appropriate release from the **Switch branch/tag** dropdown. For example, [`v13.7.6-ee`](https://gitlab.com/gitlab-org/gitlab/-/blob/v13.7.6-ee/doc/administration/geo/index.md).
+
+Geo uses a set of defined terms that are described in the [Geo Glossary](glossary.md). Be sure to familiarize yourself with those terms.
## Use cases
@@ -35,21 +37,21 @@ Implementing Geo provides the following benefits:
- Reduce from minutes to seconds the time taken for your distributed developers to clone and fetch large repositories and projects.
- Enable all of your developers to contribute ideas and work in parallel, no matter where they are.
-- Balance the read-only load between your **primary** and **secondary** nodes.
+- Balance the read-only load between your **primary** and **secondary** sites.
In addition, it:
- Can be used for cloning and fetching projects, in addition to reading any data available in the GitLab web interface (see [limitations](#limitations)).
- Overcomes slow connections between distant offices, saving time by improving speed for distributed teams.
- Helps reducing the loading time for automated tasks, custom integrations, and internal workflows.
-- Can quickly fail over to a **secondary** node in a [disaster recovery](disaster_recovery/index.md) scenario.
-- Allows [planned failover](disaster_recovery/planned_failover.md) to a **secondary** node.
+- Can quickly fail over to a **secondary** site in a [disaster recovery](disaster_recovery/index.md) scenario.
+- Allows [planned failover](disaster_recovery/planned_failover.md) to a **secondary** site.
Geo provides:
-- Read-only **secondary** nodes: Maintain one **primary** GitLab node while still enabling read-only **secondary** nodes for each of your distributed teams.
-- Authentication system hooks: **Secondary** nodes receives all authentication data (like user accounts and logins) from the **primary** instance.
-- An intuitive UI: **Secondary** nodes use the same web interface your team has grown accustomed to. In addition, there are visual notifications that block write operations and make it clear that a user is on a **secondary** node.
+- Read-only **secondary** sites: Maintain one **primary** GitLab site while still enabling read-only **secondary** sites for each of your distributed teams.
+- Authentication system hooks: **Secondary** sites receive all authentication data (like user accounts and logins) from the **primary** instance.
+- An intuitive UI: **Secondary** sites use the same web interface your team has grown accustomed to. In addition, there are visual notifications that block write operations and make it clear that a user is on a **secondary** site.
### Gitaly Cluster
@@ -64,16 +66,16 @@ Your Geo instance can be used for cloning and fetching projects, in addition to
When Geo is enabled, the:
-- Original instance is known as the **primary** node.
-- Replicated read-only nodes are known as **secondary** nodes.
+- Original instance is known as the **primary** site.
+- Replicated read-only sites are known as **secondary** sites.
Keep in mind that:
-- **Secondary** nodes talk to the **primary** node to:
+- **Secondary** sites talk to the **primary** site to:
- Get user data for logins (API).
- Replicate repositories, LFS Objects, and Attachments (HTTPS + JWT).
-- In GitLab Premium 10.0 and later, the **primary** node no longer talks to **secondary** nodes to notify for changes (API).
-- Pushing directly to a **secondary** node (for both HTTP and SSH, including Git LFS) was [introduced](https://about.gitlab.com/releases/2018/09/22/gitlab-11-3-released/) in [GitLab Premium](https://about.gitlab.com/pricing/#self-managed) 11.3.
+- In GitLab Premium 10.0 and later, the **primary** site no longer talks to **secondary** sites to notify for changes (API).
+- Pushing directly to a **secondary** site (for both HTTP and SSH, including Git LFS) was [introduced](https://about.gitlab.com/releases/2018/09/22/gitlab-11-3-released/) in [GitLab Premium](https://about.gitlab.com/pricing/#self-managed) 11.3.
- There are [limitations](#limitations) when using Geo.
### Architecture
@@ -84,31 +86,31 @@ The following diagram illustrates the underlying architecture of Geo.
In this diagram:
-- There is the **primary** node and the details of one **secondary** node.
-- Writes to the database can only be performed on the **primary** node. A **secondary** node receives database
+- There is the **primary** site and the details of one **secondary** site.
+- Writes to the database can only be performed on the **primary** site. A **secondary** site receives database
updates via PostgreSQL streaming replication.
- If present, the [LDAP server](#ldap) should be configured to replicate for [Disaster Recovery](disaster_recovery/index.md) scenarios.
-- A **secondary** node performs different type of synchronizations against the **primary** node, using a special
+- A **secondary** site performs different types of synchronization against the **primary** site, using a special
authorization protected by JWT:
- Repositories are cloned/updated via Git over HTTPS.
- Attachments, LFS objects, and other files are downloaded via HTTPS using a private API endpoint.
From the perspective of a user performing Git operations:
-- The **primary** node behaves as a full read-write GitLab instance.
-- **Secondary** nodes are read-only but proxy Git push operations to the **primary** node. This makes **secondary** nodes appear to support push operations themselves.
+- The **primary** site behaves as a full read-write GitLab instance.
+- **Secondary** sites are read-only but proxy Git push operations to the **primary** site. This makes **secondary** sites appear to support push operations themselves.
To simplify the diagram, some necessary components are omitted. Note that:
- Git over SSH requires [`gitlab-shell`](https://gitlab.com/gitlab-org/gitlab-shell) and OpenSSH.
- Git over HTTPS required [`gitlab-workhorse`](https://gitlab.com/gitlab-org/gitlab-workhorse).
-Note that a **secondary** node needs two different PostgreSQL databases:
+Note that a **secondary** site needs two different PostgreSQL databases:
- A read-only database instance that streams data from the main GitLab database.
-- [Another database instance](#geo-tracking-database) used internally by the **secondary** node to record what data has been replicated.
+- [Another database instance](#geo-tracking-database) used internally by the **secondary** site to record what data has been replicated.
-In **secondary** nodes, there is an additional daemon: [Geo Log Cursor](#geo-log-cursor).
+In **secondary** sites, there is an additional daemon: [Geo Log Cursor](#geo-log-cursor).
## Requirements for running Geo
@@ -122,7 +124,7 @@ The following are required to run Geo:
- PostgreSQL 11+ with [Streaming Replication](https://wiki.postgresql.org/wiki/Streaming_Replication)
- Git 2.9+
- Git-lfs 2.4.2+ on the user side when using LFS
-- All nodes must run the same GitLab version.
+- All sites must run the same GitLab version.
Additionally, check the GitLab [minimum requirements](../../install/requirements.md),
and we recommend you use:
@@ -132,9 +134,9 @@ and we recommend you use:
### Firewall rules
-The following table lists basic ports that must be open between the **primary** and **secondary** nodes for Geo.
+The following table lists basic ports that must be open between the **primary** and **secondary** sites for Geo.
-| **Primary** node | **Secondary** node | Protocol |
+| **Primary** site | **Secondary** site | Protocol |
|:-----------------|:-------------------|:-------------|
| 80 | 80 | HTTP |
| 443 | 443 | TCP or HTTPS |
@@ -153,10 +155,10 @@ If you wish to terminate SSL at the GitLab application server instead, use TCP p
### LDAP
-We recommend that if you use LDAP on your **primary** node, you also set up secondary LDAP servers on each **secondary** node. Otherwise, users will not be able to perform Git operations over HTTP(s) on the **secondary** node using HTTP Basic Authentication. However, Git via SSH and personal access tokens will still work.
+We recommend that if you use LDAP on your **primary** site, you also set up secondary LDAP servers on each **secondary** site. Otherwise, users will not be able to perform Git operations over HTTP(s) on the **secondary** site using HTTP Basic Authentication. However, Git via SSH and personal access tokens will still work.
NOTE:
-It is possible for all **secondary** nodes to share an LDAP server, but additional latency can be an issue. Also, consider what LDAP server will be available in a [disaster recovery](disaster_recovery/index.md) scenario if a **secondary** node is promoted to be a **primary** node.
+It is possible for all **secondary** sites to share an LDAP server, but additional latency can be an issue. Also, consider what LDAP server will be available in a [disaster recovery](disaster_recovery/index.md) scenario if a **secondary** site is promoted to be a **primary** site.
Check for instructions on how to set up replication in your LDAP service. Instructions will be different depending on the software or service used. For example, OpenLDAP provides [these instructions](https://www.openldap.org/doc/admin24/replication.html).
@@ -168,18 +170,37 @@ The tracking database instance is used as metadata to control what needs to be u
- Fetch new LFS Objects.
- Fetch changes from a repository that has recently been updated.
-Because the replicated database instance is read-only, we need this additional database instance for each **secondary** node.
+Because the replicated database instance is read-only, we need this additional database instance for each **secondary** site.
### Geo Log Cursor
This daemon:
-- Reads a log of events replicated by the **primary** node to the **secondary** database instance.
+- Reads a log of events replicated by the **primary** site to the **secondary** database instance.
- Updates the Geo Tracking Database instance with changes that need to be executed.
-When something is marked to be updated in the tracking database instance, asynchronous jobs running on the **secondary** node will execute the required operations and update the state.
+When something is marked to be updated in the tracking database instance, asynchronous jobs running on the **secondary** site will execute the required operations and update the state.
+
+This new architecture allows GitLab to be resilient to connectivity issues between the sites. It doesn't matter how long the **secondary** site is disconnected from the **primary** site as it will be able to replay all the events in the correct order and become synchronized with the **primary** site again.
+
+## Limitations
+
+WARNING:
+This list of limitations only reflects the latest version of GitLab. If you are using an older version, extra limitations may be in place.
-This new architecture allows GitLab to be resilient to connectivity issues between the nodes. It doesn't matter how long the **secondary** node is disconnected from the **primary** node as it will be able to replay all the events in the correct order and become synchronized with the **primary** node again.
+- Pushing directly to a **secondary** site redirects (for HTTP) or proxies (for SSH) the request to the **primary** site instead of [handling it directly](https://gitlab.com/gitlab-org/gitlab/-/issues/1381), except when using Git over HTTP with credentials embedded within the URI. For example, `https://user:password@secondary.tld`.
+- The **primary** site has to be online for OAuth login to happen. Existing sessions and Git are not affected. Support for the **secondary** site to use an OAuth provider independent from the primary is [being planned](https://gitlab.com/gitlab-org/gitlab/-/issues/208465).
+- The installation takes multiple manual steps that together can take about an hour depending on circumstances. We are working on improving this experience. See [Omnibus GitLab issue #2978](https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/2978) for details.
+- Real-time updates of issues/merge requests (for example, via long polling) don't work on the **secondary** site.
+- [Selective synchronization](replication/configuration.md#selective-synchronization) applies only to files and repositories. Other datasets are replicated to the **secondary** site in full, making it inappropriate for use as an access control mechanism.
+- Object pools for forked project deduplication work only on the **primary** site, and are duplicated on the **secondary** site.
+- GitLab Runners cannot register with a **secondary** site. Support for this is [planned for the future](https://gitlab.com/gitlab-org/gitlab/-/issues/3294).
+- Configuring Geo **secondary** sites to [use high-availability configurations of PostgreSQL](https://gitlab.com/groups/gitlab-org/-/epics/2536) is currently in **alpha** support.
+- [Selective synchronization](replication/configuration.md#selective-synchronization) only limits what repositories are replicated. The entire PostgreSQL data is still replicated. Selective synchronization is not built to accommodate compliance / export control use cases.
+
+### Limitations on replication/verification
+
+There is a complete list of all GitLab [data types](replication/datatypes.md) and [existing support for replication and verification](replication/datatypes.md#limitations-on-replicationverification).
## Setup instructions
@@ -187,7 +208,7 @@ For setup instructions, see [Setting up Geo](setup/index.md).
## Post-installation documentation
-After installing GitLab on the **secondary** nodes and performing the initial configuration, see the following documentation for post-installation information.
+After installing GitLab on the **secondary** site(s) and performing the initial configuration, see the following documentation for post-installation information.
### Configuring Geo
@@ -195,16 +216,16 @@ For information on configuring Geo, see [Geo configuration](replication/configur
### Updating Geo
-For information on how to update your Geo nodes to the latest GitLab version, see [Updating the Geo nodes](replication/updating_the_geo_nodes.md).
+For information on how to update your Geo site(s) to the latest GitLab version, see [Updating the Geo sites](replication/updating_the_geo_nodes.md).
### Pausing and resuming replication
> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/35913) in [GitLab Premium](https://about.gitlab.com/pricing/) 13.2.
WARNING:
-In GitLab 13.2 and 13.3, promoting a secondary node to a primary while the
+In GitLab 13.2 and 13.3, promoting a secondary site to a primary while the
secondary is paused fails. Do not pause replication before promoting a
-secondary. If the node is paused, be sure to resume before promoting. This
+secondary. If the site is paused, be sure to resume before promoting. This
issue has been fixed in GitLab 13.4 and later.
WARNING:
@@ -213,7 +234,7 @@ Omnibus GitLab-managed database. External databases are currently not supported.
In some circumstances, like during [upgrades](replication/updating_the_geo_nodes.md) or a [planned failover](disaster_recovery/planned_failover.md), it is desirable to pause replication between the primary and secondary.
-Pausing and resuming replication is done via a command line tool from the secondary node where the `postgresql` service is enabled.
+Pausing and resuming replication is done via a command line tool from a node in the secondary site where the `postgresql` service is enabled.
If `postgresql` is on a standalone database node, ensure that `gitlab.rb` on that node contains the configuration line `gitlab_rails['geo_node_name'] = 'node_name'`, where `node_name` is the same as the `geo_name_name` on the application node.
@@ -243,7 +264,7 @@ For information on using Geo in disaster recovery situations to mitigate data-lo
### Replicating the Container Registry
-For more information on how to replicate the Container Registry, see [Docker Registry for a **secondary** node](replication/docker_registry.md).
+For more information on how to replicate the Container Registry, see [Docker Registry for a **secondary** site](replication/docker_registry.md).
### Security Review
@@ -259,41 +280,22 @@ For an example of how to set up a location-aware Git remote URL with AWS Route53
### Backfill
-Once a **secondary** node is set up, it will start replicating missing data from
-the **primary** node in a process known as **backfill**. You can monitor the
-synchronization process on each Geo node from the **primary** node's **Geo Nodes**
+Once a **secondary** site is set up, it will start replicating missing data from
+the **primary** site in a process known as **backfill**. You can monitor the
+synchronization process on each Geo site from the **primary** site's **Geo Nodes**
dashboard in your browser.
Failures that happen during a backfill are scheduled to be retried at the end
of the backfill.
-## Remove Geo node
+## Remove Geo site
-For more information on removing a Geo node, see [Removing **secondary** Geo nodes](replication/remove_geo_node.md).
+For more information on removing a Geo site, see [Removing **secondary** Geo sites](replication/remove_geo_site.md).
## Disable Geo
To find out how to disable Geo, see [Disabling Geo](replication/disable_geo.md).
-## Limitations
-
-WARNING:
-This list of limitations only reflects the latest version of GitLab. If you are using an older version, extra limitations may be in place.
-
-- Pushing directly to a **secondary** node redirects (for HTTP) or proxies (for SSH) the request to the **primary** node instead of [handling it directly](https://gitlab.com/gitlab-org/gitlab/-/issues/1381), except when using Git over HTTP with credentials embedded within the URI. For example, `https://user:password@secondary.tld`.
-- The **primary** node has to be online for OAuth login to happen. Existing sessions and Git are not affected. Support for the **secondary** node to use an OAuth provider independent from the primary is [being planned](https://gitlab.com/gitlab-org/gitlab/-/issues/208465).
-- The installation takes multiple manual steps that together can take about an hour depending on circumstances. We are working on improving this experience. See [Omnibus GitLab issue #2978](https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/2978) for details.
-- Real-time updates of issues/merge requests (for example, via long polling) doesn't work on the **secondary** node.
-- [Selective synchronization](replication/configuration.md#selective-synchronization) applies only to files and repositories. Other datasets are replicated to the **secondary** node in full, making it inappropriate for use as an access control mechanism.
-- Object pools for forked project deduplication work only on the **primary** node, and are duplicated on the **secondary** node.
-- GitLab Runners cannot register with a **secondary** node. Support for this is [planned for the future](https://gitlab.com/gitlab-org/gitlab/-/issues/3294).
-- Geo **secondary** nodes can not be configured to [use high-availability configurations of PostgreSQL](https://gitlab.com/groups/gitlab-org/-/epics/2536).
-- [Selective synchronization](replication/configuration.md#selective-synchronization) only limits what repositories are replicated. The entire PostgreSQL data is still replicated. Selective synchronization is not built to accomodate compliance / export control use cases.
-
-### Limitations on replication/verification
-
-There is a complete list of all GitLab [data types](replication/datatypes.md) and [existing support for replication and verification](replication/datatypes.md#limitations-on-replicationverification).
-
## Frequently Asked Questions
For answers to common questions, see the [Geo FAQ](replication/faq.md).
diff --git a/doc/administration/geo/replication/configuration.md b/doc/administration/geo/replication/configuration.md
index 7dbb0c78166..6d5f3e61ba0 100644
--- a/doc/administration/geo/replication/configuration.md
+++ b/doc/administration/geo/replication/configuration.md
@@ -24,7 +24,7 @@ You are encouraged to first read through all the steps before executing them
in your testing/production environment.
NOTE:
-**Do not** set up any custom authentication for the **secondary** nodes. This will be handled by the **primary** node.
+**Do not** set up any custom authentication for the **secondary** nodes. This is handled by the **primary** node.
Any change that requires access to the **Admin Area** needs to be done in the
**primary** node because the **secondary** node is a read-only replica.
@@ -41,7 +41,7 @@ they must be manually replicated to the **secondary** node.
sudo cat /etc/gitlab/gitlab-secrets.json
```
- This will display the secrets that need to be replicated, in JSON format.
+ This displays the secrets that need to be replicated, in JSON format.
1. SSH into the **secondary** node and login as the `root` user:
@@ -85,11 +85,11 @@ GitLab integrates with the system-installed SSH daemon, designating a user
(typically named `git`) through which all access requests are handled.
In a [Disaster Recovery](../disaster_recovery/index.md) situation, GitLab system
-administrators will promote a **secondary** node to the **primary** node. DNS records for the
+administrators promote a **secondary** node to the **primary** node. DNS records for the
**primary** domain should also be updated to point to the new **primary** node
-(previously a **secondary** node). Doing so will avoid the need to update Git remotes and API URLs.
+(previously a **secondary** node). Doing so avoids the need to update Git remotes and API URLs.
-This will cause all SSH requests to the newly promoted **primary** node to
+This causes all SSH requests to the newly promoted **primary** node to
fail due to SSH host key mismatch. To prevent this, the primary SSH host
keys must be manually replicated to the **secondary** node.
@@ -183,7 +183,7 @@ keys must be manually replicated to the **secondary** node.
sudo -i
```
-1. Edit `/etc/gitlab/gitlab.rb` and add a **unique** name for your node. You will need this in the next steps:
+1. Edit `/etc/gitlab/gitlab.rb` and add a **unique** name for your node. You need this in the next steps:
```ruby
# The unique identifier for the Geo node.
@@ -229,9 +229,9 @@ keys must be manually replicated to the **secondary** node.
gitlab-rake gitlab:geo:check
```
-Once added to the Geo administration page and restarted, the **secondary** node will automatically start
+Once added to the Geo administration page and restarted, the **secondary** node automatically starts
replicating missing data from the **primary** node in a process known as **backfill**.
-Meanwhile, the **primary** node will start to notify each **secondary** node of any changes, so
+Meanwhile, the **primary** node starts to notify each **secondary** node of any changes, so
that the **secondary** node can act on those notifications immediately.
Be sure the _secondary_ node is running and accessible. You can sign in to the
@@ -241,7 +241,7 @@ _secondary_ node with the same credentials as were used with the _primary_ node.
You can safely skip this step if your **primary** node uses a CA-issued HTTPS certificate.
-If your **primary** node is using a self-signed certificate for *HTTPS* support, you will
+If your **primary** node is using a self-signed certificate for *HTTPS* support, you
need to add that certificate to the **secondary** node's trust store. Retrieve the
certificate from the **primary** node and follow
[these instructions](https://docs.gitlab.com/omnibus/settings/ssl.html)
@@ -265,7 +265,7 @@ the _primary_ node. Visit the _secondary_ node's **Admin Area > Geo**
(`/admin/geo/nodes`) in your browser to determine if it's correctly identified
as a _secondary_ Geo node, and if Geo is enabled.
-The initial replication, or 'backfill', will probably still be in progress. You
+The initial replication, or 'backfill', is probably still in progress. You
can monitor the synchronization process on each Geo node from the **primary**
node's **Geo Nodes** dashboard in your browser.
@@ -282,12 +282,12 @@ The two most obvious issues that can become apparent in the dashboard are:
- You are using a custom certificate or custom CA (see the [troubleshooting document](troubleshooting.md)).
- The instance is firewalled (check your firewall rules).
-Please note that disabling a **secondary** node will stop the synchronization process.
+Please note that disabling a **secondary** node stops the synchronization process.
Please note that if `git_data_dirs` is customized on the **primary** node for multiple
repository shards you must duplicate the same configuration on each **secondary** node.
-Point your users to the ["Using a Geo Server" guide](using_a_geo_server.md).
+Point your users to the [Using a Geo Site guide](usage.md).
Currently, this is what is synced:
@@ -312,7 +312,7 @@ It is important to note that selective synchronization:
1. Does not hide project metadata from **secondary** nodes.
- Since Geo currently relies on PostgreSQL replication, all project metadata
gets replicated to **secondary** nodes, but repositories that have not been
- selected will be empty.
+ selected are empty.
1. Does not reduce the number of events generated for the Geo event log.
- The **primary** node generates events as long as any **secondary** nodes are present.
Selective synchronization restrictions are implemented on the **secondary** nodes,
diff --git a/doc/administration/geo/replication/datatypes.md b/doc/administration/geo/replication/datatypes.md
index 61f99257844..e2f12cbd8dc 100644
--- a/doc/administration/geo/replication/datatypes.md
+++ b/doc/administration/geo/replication/datatypes.md
@@ -205,3 +205,33 @@ successfully, you must replicate their data using some other means.
|[GitLab Pages](../../pages/index.md) | [No](https://gitlab.com/groups/gitlab-org/-/epics/589) | No | No | |
|[Dependency proxy images](../../../user/packages/dependency_proxy/index.md) | [No](https://gitlab.com/gitlab-org/gitlab/-/issues/259694) | No | No | Blocked on [Geo: Secondary Mimicry](https://gitlab.com/groups/gitlab-org/-/epics/1528). Note that replication of this cache is not needed for Disaster Recovery purposes because it can be recreated from external sources. |
|[Vulnerability Export](../../../user/application_security/vulnerability_report/#export-vulnerability-details) | [Not planned](https://gitlab.com/groups/gitlab-org/-/epics/3111) | No | Via Object Storage provider if supported. Native Geo support (Beta). | Not planned because they are ephemeral and sensitive. They can be regenerated on demand. |
+
+#### LFS object replication using the self service framework
+
+> - [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/276696) in GitLab 13.12.
+> - [Deployed behind a feature flag](../../../user/feature_flags.md), enabled by default.
+> - Not enabled on GitLab.com as Geo is not enabled.
+> - Recommended for production use.
+> - For GitLab self-managed instances, GitLab administrators can opt to [disable it](#enable-or-disable-lfs-object-replication-using-the-self-service-framework).
+
+There can be [risks when disabling released features](../../../user/feature_flags.md#risks-when-disabling-released-features).
+Refer to this feature's version history for more details.
+
+##### Enable or disable LFS object replication using the self service framework
+
+LFS object replication using the self service framework is under development but ready for production use. It is
+deployed behind a feature flag that is **enabled by default**.
+[GitLab administrators with access to the GitLab Rails console](../../../administration/feature_flags.md)
+can opt to disable it.
+
+To enable it:
+
+```ruby
+Feature.enable(:geo_lfs_object_replication)
+```
+
+To disable it:
+
+```ruby
+Feature.disable(:geo_lfs_object_replication)
+```
diff --git a/doc/administration/geo/replication/docker_registry.md b/doc/administration/geo/replication/docker_registry.md
index ea73614511f..a8628481ba7 100644
--- a/doc/administration/geo/replication/docker_registry.md
+++ b/doc/administration/geo/replication/docker_registry.md
@@ -24,7 +24,7 @@ integrated [Container Registry](../../packages/container_registry.md#use-object-
You can enable a storage-agnostic replication so it
can be used for cloud or local storage. Whenever a new image is pushed to the
-**primary** site, each **secondary** site will pull it to its own container
+**primary** site, each **secondary** site pulls it to its own container
repository.
To configure Docker Registry replication:
@@ -70,12 +70,12 @@ We need to make Docker Registry send notification events to the
NOTE:
If you use an external Registry (not the one integrated with GitLab), you must add
- these settings to its configuration yourself. In this case, you will also have to specify
+ these settings to its configuration yourself. In this case, you also have to specify
notification secret in `registry.notification_secret` section of
`/etc/gitlab/gitlab.rb` file.
NOTE:
- If you use GitLab HA, you will also have to specify
+ If you use GitLab HA, you also have to specify
the notification secret in `registry.notification_secret` section of
`/etc/gitlab/gitlab.rb` file for every web node.
@@ -95,11 +95,11 @@ expecting to see the Docker images replicated.
Because we need to allow the **secondary** site to communicate securely with
the **primary** site Container Registry, we need to have a single key
-pair for all the sites. The **secondary** site will use this key to
+pair for all the sites. The **secondary** site uses this key to
generate a short-lived JWT that is pull-only-capable to access the
**primary** site Container Registry.
-For each application and Sidekiq node on the **secondary** site:
+For each application and Sidekiq node on the **secondary** site:
1. SSH into the node and log in as the `root` user:
@@ -126,5 +126,5 @@ For each application and Sidekiq node on the **secondary** site:
To verify Container Registry replication is working, go to **Admin Area > Geo**
(`/admin/geo/nodes`) on the **secondary** site.
-The initial replication, or "backfill", will probably still be in progress.
+The initial replication, or "backfill", is probably still in progress.
You can monitor the synchronization process on each Geo site from the **primary** site's **Geo Nodes** dashboard in your browser.
diff --git a/doc/administration/geo/replication/faq.md b/doc/administration/geo/replication/faq.md
index 73a36f5e674..a83a1c22db6 100644
--- a/doc/administration/geo/replication/faq.md
+++ b/doc/administration/geo/replication/faq.md
@@ -13,61 +13,61 @@ The requirements are listed [on the index page](../index.md#requirements-for-run
## How does Geo know which projects to sync?
-On each **secondary** node, there is a read-only replicated copy of the GitLab database.
-A **secondary** node also has a tracking database where it stores which projects have been synced.
+On each **secondary** site, there is a read-only replicated copy of the GitLab database.
+A **secondary** site also has a tracking database where it stores which projects have been synced.
Geo compares the two databases to find projects that are not yet tracked.
At the start, this tracking database is empty, so Geo will start trying to update from every project that it can see in the GitLab database.
For each project to sync:
-1. Geo will issue a `git fetch geo --mirror` to get the latest information from the **primary** node.
+1. Geo will issue a `git fetch geo --mirror` to get the latest information from the **primary** site.
If there are no changes, the sync will be fast and end quickly. Otherwise, it will pull the latest commits.
-1. The **secondary** node will update the tracking database to store the fact that it has synced projects A, B, C, etc.
+1. The **secondary** site will update the tracking database to store the fact that it has synced projects A, B, C, etc.
1. Repeat until all projects are synced.
-When someone pushes a commit to the **primary** node, it generates an event in the GitLab database that the repository has changed.
-The **secondary** node sees this event, marks the project in question as dirty, and schedules the project to be resynced.
+When someone pushes a commit to the **primary** site, it generates an event in the GitLab database that the repository has changed.
+The **secondary** site sees this event, marks the project in question as dirty, and schedules the project to be resynced.
To ensure that problems with pipelines (for example, syncs failing too many times or jobs being lost) don't permanently stop projects syncing, Geo also periodically checks the tracking database for projects that are marked as dirty. This check happens when
the number of concurrent syncs falls below `repos_max_capacity` and there are no new projects waiting to be synced.
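
The tracking behavior described above can be modeled roughly as follows. This is an illustrative Ruby sketch only, not Geo's implementation: `TrackingSketch`, `record_push`, `next_batch`, and `mark_synced` are hypothetical names, and the `capacity` argument only loosely stands in for `repos_max_capacity`.

```ruby
require 'set'

# Hypothetical in-memory model of a secondary site's tracking state.
class TrackingSketch
  def initialize(project_ids)
    @all_projects = Set.new(project_ids) # projects visible in the replicated database
    @synced = Set.new                    # tracking database: projects already synced
    @dirty = Set.new                     # tracked projects that need a resync
  end

  # A push on the primary generates an event; the secondary marks the project dirty.
  def record_push(project_id)
    @all_projects << project_id
    @dirty << project_id if @synced.include?(project_id)
  end

  # Untracked projects are scheduled first, then dirty ones, up to `capacity`.
  def next_batch(capacity)
    untracked = @all_projects - @synced
    (untracked.to_a + @dirty.to_a).first(capacity)
  end

  # Record a successful sync in the tracking state.
  def mark_synced(project_id)
    @synced << project_id
    @dirty.delete(project_id)
  end
end
```

In this model, a project that exists in the database but not in the tracking state is always picked up, which mirrors the initial backfill from an empty tracking database.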
Geo also has a checksum feature which runs a SHA256 sum across all the Git references to the SHA values.
-If the refs don't match between the **primary** node and the **secondary** node, then the **secondary** node will mark that project as dirty and try to resync it.
+If the refs don't match between the **primary** site and the **secondary** site, then the **secondary** site will mark that project as dirty and try to resync it.
So even if we have an outdated tracking database, the validation should activate and find discrepancies in the repository state and resync.
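
As a rough illustration of the checksum idea, the sketch below hashes a canonical listing of ref names and their SHA values with SHA256 and compares the results from two sites. The exact input format Geo uses is not given here, so the `"name sha\n"` layout and the helper names `refs_checksum` and `needs_resync?` are assumptions made for this example.

```ruby
require 'digest'

# refs: Hash of ref name => commit SHA, for example
#   { "refs/heads/main" => "a1b2c3" }
# Sorting makes the checksum independent of the order refs are listed in.
def refs_checksum(refs)
  canonical = refs.sort.map { |name, sha| "#{name} #{sha}\n" }.join
  Digest::SHA256.hexdigest(canonical)
end

# A mismatch means the secondary copy should be marked dirty and resynced.
def needs_resync?(primary_refs, secondary_refs)
  refs_checksum(primary_refs) != refs_checksum(secondary_refs)
end
```

Because the comparison is made on repository state rather than on tracking records, it can catch drift even when the tracking database is outdated.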
## Can I use Geo in a disaster recovery situation?
Yes, but there are limitations to what we replicate (see
-[What data is replicated to a **secondary** node?](#what-data-is-replicated-to-a-secondary-node)).
+[What data is replicated to a **secondary** site?](#what-data-is-replicated-to-a-secondary-site)).
Read the documentation for [Disaster Recovery](../disaster_recovery/index.md).
-## What data is replicated to a **secondary** node?
+## What data is replicated to a **secondary** site?
We currently replicate project repositories, LFS objects, generated
attachments / avatars and the whole database. This means user accounts,
issues, merge requests, groups, project data, etc., will be available for
query.
-## Can I `git push` to a **secondary** node?
+## Can I `git push` to a **secondary** site?
-Yes! Pushing directly to a **secondary** node (for both HTTP and SSH, including Git LFS) was [introduced](https://about.gitlab.com/releases/2018/09/22/gitlab-11-3-released/) in [GitLab Premium](https://about.gitlab.com/pricing/#self-managed) 11.3.
+Yes! Pushing directly to a **secondary** site (for both HTTP and SSH, including Git LFS) was [introduced](https://about.gitlab.com/releases/2018/09/22/gitlab-11-3-released/) in [GitLab Premium](https://about.gitlab.com/pricing/#self-managed) 11.3.
-## How long does it take to have a commit replicated to a **secondary** node?
+## How long does it take to have a commit replicated to a **secondary** site?
All replication operations are asynchronous and are queued to be dispatched. Therefore, it depends on a lot of
factors including the amount of traffic, how big your commit is, the
-connectivity between your nodes, your hardware, etc.
+connectivity between your sites, your hardware, etc.
## What if the SSH server runs at a different port?
-That's totally fine. We use HTTP(s) to fetch repository changes from the **primary** node to all **secondary** nodes.
+That's totally fine. We use HTTP(s) to fetch repository changes from the **primary** site to all **secondary** sites.
-## Is it possible to set up a Docker Registry for a **secondary** node that mirrors the one on the **primary** node?
+## Is it possible to set up a Docker Registry for a **secondary** site that mirrors the one on the **primary** site?
-Yes. See [Docker Registry for a **secondary** node](docker_registry.md).
+Yes. See [Docker Registry for a **secondary** site](docker_registry.md).
-## Can I log in to a secondary node?
+## Can I log in to a secondary site?
-Yes, but secondary nodes receive all authentication data (like user accounts and logins) from the primary instance. This means you will be redirected to the primary for authentication and routed back afterwards.
+Yes, but secondary sites receive all authentication data (like user accounts and logins) from the primary instance. This means you will be redirected to the primary for authentication and routed back afterwards.
diff --git a/doc/administration/geo/replication/geo_validation_tests.md b/doc/administration/geo/replication/geo_validation_tests.md
index 06fd3cd70be..8f67e70c9e2 100644
--- a/doc/administration/geo/replication/geo_validation_tests.md
+++ b/doc/administration/geo/replication/geo_validation_tests.md
@@ -179,6 +179,15 @@ The following are PostgreSQL upgrade validation tests we performed.
The following are additional validation tests we performed.
+### May 2021
+
+[Test failover with object storage replication enabled](https://gitlab.com/gitlab-org/gitlab/-/issues/330362):
+
+- Description: At the time of testing, Geo's object storage replication functionality was in beta. We tested that object storage replication works as intended and that the data was present on the new primary after a failover.
+- Outcome: The test was successful. Data in object storage was replicated and present after a failover.
+- Follow up issues:
+ - [Geo: Failing to replicate initial Monitoring project](https://gitlab.com/gitlab-org/gitlab/-/issues/330485)
+
### August 2020
[Test Gitaly Cluster on a Geo Deployment](https://gitlab.com/gitlab-org/gitlab/-/issues/223210):
diff --git a/doc/administration/geo/replication/location_aware_git_url.md b/doc/administration/geo/replication/location_aware_git_url.md
index 272b746015b..014ca59e571 100644
--- a/doc/administration/geo/replication/location_aware_git_url.md
+++ b/doc/administration/geo/replication/location_aware_git_url.md
@@ -8,11 +8,11 @@ type: howto
# Location-aware Git remote URL with AWS Route53 **(PREMIUM SELF)**
You can provide GitLab users with a single remote URL that automatically uses
-the Geo node closest to them. This means users don't need to update their Git
-configuration to take advantage of closer Geo nodes as they move.
+the Geo site closest to them. This means users don't need to update their Git
+configuration to take advantage of closer Geo sites as they move.
This is possible because Git push requests can be automatically redirected
-(HTTP) or proxied (SSH) from **secondary** nodes to the **primary** node.
+(HTTP) or proxied (SSH) from **secondary** sites to the **primary** site.
Though these instructions use [AWS Route53](https://aws.amazon.com/route53/),
other services such as [Cloudflare](https://www.cloudflare.com/) could be used
@@ -20,30 +20,30 @@ as well.
NOTE:
You can also use a load balancer to distribute web UI or API traffic to
-[multiple Geo **secondary** nodes](../../../user/admin_area/geo_nodes.md#multiple-secondary-nodes-behind-a-load-balancer).
-Importantly, the **primary** node cannot yet be included. See the feature request
+[multiple Geo **secondary** sites](../../../user/admin_area/geo_nodes.md#multiple-secondary-nodes-behind-a-load-balancer).
+Importantly, the **primary** site cannot yet be included. See the feature request
[Support putting the **primary** behind a Geo node load balancer](https://gitlab.com/gitlab-org/gitlab/-/issues/10888).
## Prerequisites
In this example, we have already set up:
-- `primary.example.com` as a Geo **primary** node.
-- `secondary.example.com` as a Geo **secondary** node.
+- `primary.example.com` as a Geo **primary** site.
+- `secondary.example.com` as a Geo **secondary** site.
We will create a `git.example.com` subdomain that will automatically direct
requests:
-- From Europe to the **secondary** node.
-- From all other locations to the **primary** node.
+- From Europe to the **secondary** site.
+- From all other locations to the **primary** site.
In any case, you require:
-- A working GitLab **primary** node that is accessible at its own address.
-- A working GitLab **secondary** node.
+- A working GitLab **primary** site that is accessible at its own address.
+- A working GitLab **secondary** site.
- A Route53 Hosted Zone managing your domain.
-If you haven't yet set up a Geo _primary_ node and _secondary_ node, see the
+If you haven't yet set up a Geo _primary_ site and _secondary_ site, see the
[Geo setup instructions](../index.md#setup-instructions).
## Create a traffic policy
@@ -89,7 +89,7 @@ routing configurations.
![Created policy record](img/single_git_created_policy_record.png)
You have successfully set up a single host, e.g. `git.example.com` which
-distributes traffic to your Geo nodes by geolocation!
+distributes traffic to your Geo sites by geolocation!
## Configure Git clone URLs to use the special Git URL
@@ -114,10 +114,10 @@ You can customize the:
After following the configuration steps above, handling for Git requests is now location aware.
For requests:
-- Outside Europe, all requests are directed to the **primary** node.
+- Outside Europe, all requests are directed to the **primary** site.
- Within Europe, over:
- HTTP:
- - `git clone http://git.example.com/foo/bar.git` is directed to the **secondary** node.
+ - `git clone http://git.example.com/foo/bar.git` is directed to the **secondary** site.
- `git push` is initially directed to the **secondary**, which automatically
redirects to `primary.example.com`.
- SSH:
diff --git a/doc/administration/geo/replication/multiple_servers.md b/doc/administration/geo/replication/multiple_servers.md
index f83e0e14e54..59bb3884a02 100644
--- a/doc/administration/geo/replication/multiple_servers.md
+++ b/doc/administration/geo/replication/multiple_servers.md
@@ -53,14 +53,14 @@ It is possible to use cloud hosted services for PostgreSQL and Redis, but this i
## Prerequisites: Two working GitLab multi-node clusters
-One cluster will serve as the **primary** node. Use the
+One cluster serves as the **primary** node. Use the
[GitLab multi-node documentation](../../reference_architectures/index.md) to set this up. If
you already have a working GitLab instance that is in-use, it can be used as a
**primary**.
-The second cluster will serve as the **secondary** node. Again, use the
+The second cluster serves as the **secondary** node. Again, use the
[GitLab multi-node documentation](../../reference_architectures/index.md) to set this up.
-It's a good idea to log in and test it. However, note that its data will be
+It's a good idea to log in and test it. However, note that its data is
wiped out as part of the process of replicating from the **primary**.
## Configure the GitLab cluster to be the **primary** node
@@ -120,7 +120,7 @@ major differences:
called the "tracking database", which tracks the synchronization state of
various resources.
-Therefore, we will set up the multi-node components one-by-one, and include deviations
+Therefore, we set up the multi-node components one-by-one, and include deviations
from the normal multi-node setup. However, we highly recommend first configuring a
brand-new cluster as if it were not part of a Geo setup so that it can be
tested and verified as a working cluster. And only then should it be modified
@@ -133,7 +133,7 @@ Configure the following services, again using the non-Geo multi-node
documentation:
- [Configuring Redis for GitLab](../../redis/replication_and_failover.md#example-configuration-for-the-gitlab-application) for multiple nodes.
-- [Gitaly](../../gitaly/index.md), which will store data that is
+- [Gitaly](../../gitaly/index.md), which stores data that is
synchronized from the **primary** node.
NOTE:
@@ -143,7 +143,7 @@ recommended.
### Step 2: Configure the main read-only replica PostgreSQL database on the **secondary** node
NOTE:
-The following documentation assumes the database will be run on
+The following documentation assumes the database runs on
a single node only. Multi-node PostgreSQL on **secondary** nodes is
[not currently supported](https://gitlab.com/groups/gitlab-org/-/epics/2536).
@@ -151,7 +151,7 @@ Configure the [**secondary** database](../setup/database.md) as a read-only repl
the **primary** database. Use the following as a guide.
1. Generate an MD5 hash of the desired password for the database user that the
- GitLab application will use to access the read-replica database:
+ GitLab application uses to access the read-replica database:
Note that the username (`gitlab` by default) is incorporated into the hash.
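
To show why the username matters, here is a minimal Ruby sketch of PostgreSQL's `md5` password format, which is the string `md5` followed by the MD5 hex digest of the password concatenated with the username. The helper name `pg_md5_password` and the sample password are placeholders for illustration; on an Omnibus installation you would normally generate the hash with the tooling the setup steps point to rather than by hand.

```ruby
require 'digest'

# Build a PostgreSQL-style md5 password hash:
#   "md5" + md5(password + username)
def pg_md5_password(password, username)
  "md5" + Digest::MD5.hexdigest(password + username)
end

# The same password yields a different hash for a different username.
hash_for_gitlab = pg_md5_password("example-password", "gitlab")
hash_for_geo = pg_md5_password("example-password", "gitlab_geo")
puts hash_for_gitlab
puts hash_for_geo
```

Because the username is part of the hashed input, a hash generated for one database user cannot be reused for another.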
@@ -233,13 +233,13 @@ If using an external PostgreSQL instance, refer also to
### Step 3: Configure the tracking database on the **secondary** node
NOTE:
-This documentation assumes the tracking database will be run on
+This documentation assumes the tracking database runs on
only a single machine, rather than as a PostgreSQL cluster.
Configure the tracking database.
1. Generate an MD5 hash of the desired password for the database user that the
- GitLab application will use to access the tracking database:
+ GitLab application uses to access the tracking database:
Note that the username (`gitlab_geo` by default) is incorporated into the
hash.
@@ -377,7 +377,7 @@ Make sure that current node IP is listed in `postgresql['md5_auth_cidr_addresses
After making these changes [Reconfigure GitLab](../../restart_gitlab.md#omnibus-gitlab-reconfigure) so the changes take effect.
-On the secondary the following GitLab frontend services will be enabled:
+On the secondary the following GitLab frontend services are enabled:
- `geo-logcursor`
- `gitlab-pages`
diff --git a/doc/administration/geo/replication/object_storage.md b/doc/administration/geo/replication/object_storage.md
index ad419f999b3..7dd831092a3 100644
--- a/doc/administration/geo/replication/object_storage.md
+++ b/doc/administration/geo/replication/object_storage.md
@@ -9,9 +9,9 @@ type: howto
Geo can be used in combination with Object Storage (AWS S3, or other compatible object storage).
-Currently, **secondary** nodes can use either:
+Currently, **secondary** sites can use either:
-- The same storage bucket as the **primary** node.
+- The same storage bucket as the **primary** site.
- A replicated storage bucket.
To have:
@@ -28,13 +28,13 @@ To have:
WARNING:
This is a [**beta** feature](https://about.gitlab.com/handbook/product/#beta) and is not ready yet for production use at any scale. The main limitations are a lack of testing at scale and no verification of any replicated data.
-**Secondary** nodes can replicate files stored on the **primary** node regardless of
+**Secondary** sites can replicate files stored on the **primary** site regardless of
whether they are stored on the local file system or in object storage.
To enable GitLab replication, you must:
1. Go to **Admin Area > Geo**.
-1. Press **Edit** on the **secondary** node.
+1. Press **Edit** on the **secondary** site.
1. In the **Synchronization Settings** section, find the **Allow this secondary node to replicate content on Object Storage**
checkbox to enable it.
@@ -46,7 +46,7 @@ For CI job artifacts, there is similar documentation to configure
For user uploads, there is similar documentation to configure [upload object storage](../../uploads.md#using-object-storage)
-If you want to migrate the **primary** node's files to object storage, you can
+If you want to migrate the **primary** site's files to object storage, you can
configure the **secondary** in a few ways:
- Use the exact same object storage.
@@ -57,15 +57,15 @@ configure the **secondary** in a few ways:
GitLab does not currently support the case where both:
-- The **primary** node uses local storage.
-- A **secondary** node uses object storage.
+- The **primary** site uses local storage.
+- A **secondary** site uses object storage.
## Third-party replication services
When using Amazon S3, you can use
[CRR](https://docs.aws.amazon.com/AmazonS3/latest/dev/crr.html) to
-have automatic replication between the bucket used by the **primary** node and
-the bucket used by **secondary** nodes.
+have automatic replication between the bucket used by the **primary** site and
+the bucket used by **secondary** sites.
If you are using Google Cloud Storage, consider using
[Multi-Regional Storage](https://cloud.google.com/storage/docs/storage-classes#multi-regional).
diff --git a/doc/administration/geo/replication/remove_geo_node.md b/doc/administration/geo/replication/remove_geo_node.md
index 09ea84b6c4b..697d8c6ae38 100644
--- a/doc/administration/geo/replication/remove_geo_node.md
+++ b/doc/administration/geo/replication/remove_geo_node.md
@@ -4,5 +4,5 @@ redirect_to: '../../geo/replication/remove_geo_site.md'
This document was moved to [another location](../../geo/replication/remove_geo_site.md).
-<!-- This redirect file can be deleted after 2022-04-01 -->
+<!-- This redirect file can be deleted after 2021-06-01 -->
<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/#move-or-rename-a-page -->
diff --git a/doc/administration/geo/replication/security_review.md b/doc/administration/geo/replication/security_review.md
index abb84b95623..f84d7a2171d 100644
--- a/doc/administration/geo/replication/security_review.md
+++ b/doc/administration/geo/replication/security_review.md
@@ -36,7 +36,7 @@ from [owasp.org](https://owasp.org/).
- The GitLab model of sensitivity is centered around public vs. internal vs.
private projects. Geo replicates them all indiscriminately. "Selective sync"
exists for files and repositories (but not database content), which would permit
- only less-sensitive projects to be replicated to a **secondary** node if desired.
+ only less-sensitive projects to be replicated to a **secondary** site if desired.
- See also: [GitLab data classification policy](https://about.gitlab.com/handbook/engineering/security/data-classification-standard.html).
### What data backup and retention requirements have been defined for the application?
@@ -48,18 +48,18 @@ from [owasp.org](https://owasp.org/).
### Who are the application's end‐users?
-- **Secondary** nodes are created in regions that are distant (in terms of
- Internet latency) from the main GitLab installation (the **primary** node). They are
- intended to be used by anyone who would ordinarily use the **primary** node, who finds
- that the **secondary** node is closer to them (in terms of Internet latency).
+- **Secondary** sites are created in regions that are distant (in terms of
+ Internet latency) from the main GitLab installation (the **primary** site). They are
+ intended to be used by anyone who would ordinarily use the **primary** site, who finds
+ that the **secondary** site is closer to them (in terms of Internet latency).
### How do the end‐users interact with the application?
-- **Secondary** nodes provide all the interfaces a **primary** node does
+- **Secondary** sites provide all the interfaces a **primary** site does
(notably a HTTP/HTTPS web application, and HTTP/HTTPS or SSH Git repository
 access), but are constrained to read-only activities. The principal use case is
- envisioned to be cloning Git repositories from the **secondary** node in favor of the
- **primary** node, but end-users may use the GitLab web interface to view projects,
+ envisioned to be cloning Git repositories from the **secondary** site in favor of the
+ **primary** site, but end-users may use the GitLab web interface to view projects,
issues, merge requests, snippets, etc.
### What security expectations do the end‐users have?
@@ -67,10 +67,10 @@ from [owasp.org](https://owasp.org/).
- The replication process must be secure. It would typically be unacceptable to
transmit the entire database contents or all files and repositories across the
public Internet in plaintext, for instance.
-- **Secondary** nodes must have the same access controls over their content as the
- **primary** node - unauthenticated users must not be able to gain access to privileged
- information on the **primary** node by querying the **secondary** node.
-- Attackers must not be able to impersonate the **secondary** node to the **primary** node, and
+- **Secondary** sites must have the same access controls over their content as the
+ **primary** site - unauthenticated users must not be able to gain access to privileged
+ information on the **primary** site by querying the **secondary** site.
+- Attackers must not be able to impersonate the **secondary** site to the **primary** site, and
thus gain access to privileged information.
## Administrators
@@ -86,7 +86,7 @@ from [owasp.org](https://owasp.org/).
### What administrative capabilities does the application offer?
-- **Secondary** nodes may be added, modified, or removed by users with
+- **Secondary** sites may be added, modified, or removed by users with
administrative access.
- The replication process may be controlled (start/stop) via the Sidekiq
administrative controls.
@@ -95,9 +95,9 @@ from [owasp.org](https://owasp.org/).
### What details regarding routing, switching, firewalling, and load‐balancing have been defined?
-- Geo requires the **primary** node and **secondary** node to be able to communicate with each
- other across a TCP/IP network. In particular, the **secondary** nodes must be able to
- access HTTP/HTTPS and PostgreSQL services on the **primary** node.
+- Geo requires the **primary** site and **secondary** site to be able to communicate with each
+ other across a TCP/IP network. In particular, the **secondary** sites must be able to
+ access HTTP/HTTPS and PostgreSQL services on the **primary** site.
### What core network devices support the application?
@@ -105,9 +105,9 @@ from [owasp.org](https://owasp.org/).
### What network performance requirements exist?
-- Maximum replication speed between **primary** node and **secondary** node is limited by the
+- Maximum replication speed between **primary** site and **secondary** site is limited by the
available bandwidth between sites. No hard requirements exist - time to complete
- replication (and ability to keep up with changes on the **primary** node) is a function
+ replication (and ability to keep up with changes on the **primary** site) is a function
of the size of the data set, tolerance for latency, and available network
capacity.
@@ -189,9 +189,9 @@ from [owasp.org](https://owasp.org/).
### How will database connection strings, encryption keys, and other sensitive components be stored, accessed, and protected from unauthorized detection?
- There are some Geo-specific values. Some are shared secrets which must be
- securely transmitted from the **primary** node to the **secondary** node at setup time. Our
- documentation recommends transmitting them from the **primary** node to the system
- administrator via SSH, and then back out to the **secondary** node in the same manner.
+ securely transmitted from the **primary** site to the **secondary** site at setup time. Our
+ documentation recommends transmitting them from the **primary** site to the system
+ administrator via SSH, and then back out to the **secondary** site in the same manner.
In particular, this includes the PostgreSQL replication credentials and a secret
key (`db_key_base`) which is used to decrypt certain columns in the database.
The `db_key_base` secret is stored unencrypted on the file system, in
@@ -205,25 +205,25 @@ from [owasp.org](https://owasp.org/).
- Data is entered via the web application exposed by GitLab itself. Some data is
also entered using system administration commands on the GitLab servers (e.g.,
`gitlab-ctl set-primary-node`).
-- **Secondary** nodes also receive inputs via PostgreSQL streaming replication from the **primary** node.
+- **Secondary** sites also receive inputs via PostgreSQL streaming replication from the **primary** site.
### What data output paths does the application support?
-- **Primary** nodes output via PostgreSQL streaming replication to the **secondary** node.
+- **Primary** sites output via PostgreSQL streaming replication to the **secondary** site.
Otherwise, principally via the web application exposed by GitLab itself, and via
SSH `git clone` operations initiated by the end-user.
### How does data flow across the application's internal components?
-- **Secondary** nodes and **primary** nodes interact via HTTP/HTTPS (secured with JSON web
+- **Secondary** sites and **primary** sites interact via HTTP/HTTPS (secured with JSON web
tokens) and via PostgreSQL streaming replication.
-- Within a **primary** node or **secondary** node, the SSOT is the file system and the database
- (including Geo tracking database on **secondary** node). The various internal components
+- Within a **primary** site or **secondary** site, the SSOT is the file system and the database
+ (including Geo tracking database on **secondary** site). The various internal components
are orchestrated to make alterations to these stores.
### What data input validation requirements have been defined?
-- **Secondary** nodes must have a faithful replication of the **primary** node's data.
+- **Secondary** sites must have a faithful replication of the **primary** site's data.
### What data does the application store and how?
@@ -231,11 +231,11 @@ from [owasp.org](https://owasp.org/).
### What data is or may need to be encrypted and what key management requirements have been defined?
-- Neither **primary** nodes or **secondary** nodes encrypt Git repository or file system data at
+- Neither **primary** sites nor **secondary** sites encrypt Git repository or file system data at
rest. A subset of database columns are encrypted at rest using the `db_otp_key`.
- A static secret shared across all hosts in a GitLab deployment.
- In transit, data should be encrypted, although the application does permit
- communication to proceed unencrypted. The two main transits are the **secondary** node's
+ communication to proceed unencrypted. The two main transits are the **secondary** site's
replication process for PostgreSQL, and for Git repositories/files. Both should
be protected using TLS, with the keys for that managed via Omnibus per existing
configuration for end-user access to GitLab.
@@ -253,19 +253,19 @@ from [owasp.org](https://owasp.org/).
### What user privilege levels does the application support?
-- Geo adds one type of privilege: **secondary** nodes can access a special Geo API to
+- Geo adds one type of privilege: **secondary** sites can access a special Geo API to
download files over HTTP/HTTPS, and to clone repositories using HTTP/HTTPS.
### What user identification and authentication requirements have been defined?
-- **Secondary** nodes identify to Geo **primary** nodes via OAuth or JWT authentication
+- **Secondary** sites identify to Geo **primary** sites via OAuth or JWT authentication
based on the shared database (HTTP access) or a PostgreSQL replication user (for
database replication). The database replication also requires IP-based access
controls to be defined.
### What user authorization requirements have been defined?
-- **Secondary** nodes must only be able to *read* data. They are not currently able to mutate data on the **primary** node.
+- **Secondary** sites must only be able to *read* data. They are not currently able to mutate data on the **primary** site.
### What session management requirements have been defined?
@@ -279,9 +279,9 @@ from [owasp.org](https://owasp.org/).
### What access requirements have been defined for URI and Service calls?
-- **Secondary** nodes make many calls to the **primary** node's API. This is how file
+- **Secondary** sites make many calls to the **primary** site's API. This is how file
replication proceeds, for instance. This endpoint is only accessible with a JWT token.
-- The **primary** node also makes calls to the **secondary** node to get status information.
+- The **primary** site also makes calls to the **secondary** site to get status information.
## Application Monitoring
diff --git a/doc/administration/geo/replication/troubleshooting.md b/doc/administration/geo/replication/troubleshooting.md
index 079a3713c73..6d990fd12ba 100644
--- a/doc/administration/geo/replication/troubleshooting.md
+++ b/doc/administration/geo/replication/troubleshooting.md
@@ -558,6 +558,7 @@ to start again from scratch, there are a few steps that can help you:
mv /var/opt/gitlab/gitlab-rails/uploads /var/opt/gitlab/gitlab-rails/uploads.old
mkdir -p /var/opt/gitlab/gitlab-rails/uploads
+ gitlab-ctl start postgresql
gitlab-ctl start geo-postgresql
```
@@ -853,6 +854,12 @@ To resolve this issue:
the **primary** node using IPv4 in the `/etc/hosts` file. Alternatively, you should
[enable IPv6 on the **primary** node](https://docs.gitlab.com/omnibus/settings/nginx.html#setting-the-nginx-listen-address-or-addresses).
+### GitLab Pages return 404 errors after promoting
+
+This is due to [Pages data not being managed by Geo](datatypes.md#limitations-on-replicationverification).
+Find advice to resolve those errors in the
+[Pages administration documentation](../../../administration/pages/index.md#404-error-after-promoting-a-geo-secondary-to-a-primary-node).
+
## Fixing client errors
### Authorization errors from LFS HTTP(s) client requests
diff --git a/doc/administration/geo/replication/usage.md b/doc/administration/geo/replication/usage.md
index 2fcc0565567..1491aa3427e 100644
--- a/doc/administration/geo/replication/usage.md
+++ b/doc/administration/geo/replication/usage.md
@@ -25,3 +25,11 @@ remote: ssh://git@primary.geo/user/repo.git
remote:
Everything up-to-date
```
+
+NOTE:
+If you're using HTTPS instead of [SSH](../../../ssh/README.md) to push to the secondary,
+you can't store credentials in the URL like `user:password@URL`. Instead, you can use a
+[`.netrc` file](https://www.gnu.org/software/inetutils/manual/html_node/The-_002enetrc-file.html)
+for Unix-like operating systems or `_netrc` for Windows. In that case, the credentials
+will be stored as plain text. If you're looking for a more secure way to store credentials,
+you can use [Git Credential Storage](https://git-scm.com/book/en/v2/Git-Tools-Credential-Storage).
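As an illustration of the `.netrc` format mentioned in the note, each entry takes the form `machine <host> login <user> password <secret>`. The Ruby sketch below only renders such a line. The helper name `netrc_entry`, the host `secondary.geo`, the user name, and the token are hypothetical placeholders, not values from this documentation.

```ruby
# Render a single .netrc entry.
# All values passed in below are hypothetical placeholders.
def netrc_entry(host:, login:, password:)
  [host, login, password].each do |value|
    # Keep the sketch simple: reject values that would need quoting.
    raise ArgumentError, "whitespace is not supported: #{value.inspect}" if value =~ /\s/
  end
  "machine #{host} login #{login} password #{password}"
end

puts netrc_entry(host: 'secondary.geo', login: 'alice', password: 'EXAMPLE_TOKEN')
```

Because the password is stored in plain text, restricting the file to its owner (for example, `chmod 600 ~/.netrc`) is a sensible precaution.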
diff --git a/doc/administration/geo/replication/using_a_geo_server.md b/doc/administration/geo/replication/using_a_geo_server.md
index e48e750f710..f8ce72ac3f8 100644
--- a/doc/administration/geo/replication/using_a_geo_server.md
+++ b/doc/administration/geo/replication/using_a_geo_server.md
@@ -4,5 +4,5 @@ redirect_to: '../../geo/replication/usage.md'
This document was moved to [another location](../../geo/replication/usage.md).
-<!-- This redirect file can be deleted after 2022-04-01 -->
+<!-- This redirect file can be deleted after 2022-06-01 -->
<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/#move-or-rename-a-page -->
diff --git a/doc/administration/geo/replication/version_specific_updates.md b/doc/administration/geo/replication/version_specific_updates.md
index 4f0a4dc638c..4a101c52325 100644
--- a/doc/administration/geo/replication/version_specific_updates.md
+++ b/doc/administration/geo/replication/version_specific_updates.md
@@ -15,8 +15,29 @@ for updating Geo nodes.
We've detected an issue [with a column rename](https://gitlab.com/gitlab-org/gitlab/-/issues/324160)
that may prevent upgrades to GitLab 13.9.0, 13.9.1, 13.9.2 and 13.9.3.
-We are working on a patch and recommend delaying any upgrade attempt until a fixed version
-is released.
+We are working on a patch, but until a fixed version is released, you can manually complete
+the zero-downtime upgrade:
+
+1. Before running the final `sudo gitlab-rake db:migrate` command on the deploy node,
+ execute the following queries using the PostgreSQL console (or `sudo gitlab-psql`)
+ to drop the problematic triggers:
+
+ ```sql
+ drop trigger trigger_e40a6f1858e6 on application_settings;
+ drop trigger trigger_0d588df444c8 on application_settings;
+ drop trigger trigger_1572cbc9a15f on application_settings;
+ drop trigger trigger_22a39c5c25f3 on application_settings;
+ ```
+
+1. Run the final migrations:
+
+ ```shell
+ sudo gitlab-rake db:migrate
+ ```
+
+If you have already run the final `sudo gitlab-rake db:migrate` command on the deploy node and have
+encountered the [column rename issue](https://gitlab.com/gitlab-org/gitlab/-/issues/324160), you can still
+follow the previous steps to complete the update.
More details are available [in this issue](https://gitlab.com/gitlab-org/gitlab/-/issues/324160).
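The four `drop trigger` statements in the steps above can also be generated rather than typed. This is a minimal Ruby sketch, assuming you prefer `DROP TRIGGER IF EXISTS` (standard PostgreSQL syntax, but not part of the instructions above) so that re-running the statements is harmless. The helper name `drop_trigger_statements` is hypothetical.

```ruby
# Trigger names copied from the steps above.
TRIGGERS = %w[
  trigger_e40a6f1858e6
  trigger_0d588df444c8
  trigger_1572cbc9a15f
  trigger_22a39c5c25f3
].freeze

# Hypothetical helper: builds one DROP statement per trigger.
def drop_trigger_statements(triggers, table: 'application_settings')
  triggers.map { |name| "DROP TRIGGER IF EXISTS #{name} ON #{table};" }
end

puts drop_trigger_statements(TRIGGERS)
```

The printed statements can be pasted into the PostgreSQL console (or `sudo gitlab-psql`) before running the final migrations.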
diff --git a/doc/administration/geo/setup/database.md b/doc/administration/geo/setup/database.md
index 9c2cc8fc62e..b87a606e349 100644
--- a/doc/administration/geo/setup/database.md
+++ b/doc/administration/geo/setup/database.md
@@ -466,15 +466,16 @@ The replication process is now complete.
[PgBouncer](https://www.pgbouncer.org/) may be used with GitLab Geo to pool
PostgreSQL connections. We recommend using PgBouncer if you use GitLab in a
high-availability configuration with a cluster of nodes supporting a Geo
-**primary** node and another cluster of nodes supporting a Geo **secondary** node. For more
-information, see [High Availability with Omnibus GitLab](../../postgresql/replication_and_failover.md).
+**primary** site and two other clusters of nodes supporting a Geo **secondary** site:
+one for the main database and the other for the tracking database. For more information,
+see [High Availability with Omnibus GitLab](../../postgresql/replication_and_failover.md).
## Patroni support
Support for Patroni is intended to replace `repmgr` as a
[highly available PostgreSQL solution](../../postgresql/replication_and_failover.md)
on the primary node, but it can also be used for PostgreSQL HA on a secondary
-site.
+site. Similar to `repmgr`, using Patroni on a secondary node is optional.
Starting with GitLab 13.5, Patroni is available for _experimental_ use with Geo
primary and secondary sites. Due to its experimental nature, Patroni support is
@@ -490,6 +491,10 @@ This experimental implementation has the following limitations:
For instructions about how to set up Patroni on the primary site, see the
[PostgreSQL replication and failover with Omnibus GitLab](../../postgresql/replication_and_failover.md#patroni) page.
+### Configuring Patroni cluster for a Geo secondary site
+
+In a Geo secondary site, the main PostgreSQL database is a read-only replica of the primary site’s PostgreSQL database.
+
If you are currently using `repmgr` on your Geo primary site, see [these instructions](#migrating-from-repmgr-to-patroni) for migrating from `repmgr` to Patroni.
A production-ready and secure setup requires at least three Consul nodes, three
@@ -498,9 +503,7 @@ configuration for the secondary site. The internal load balancer provides a sing
endpoint for connecting to the Patroni cluster's leader whenever a new leader is
elected. Be sure to use [password credentials](../../postgresql/replication_and_failover.md#database-authorization-for-patroni) and other database best practices.
-Similar to `repmgr`, using Patroni on a secondary node is optional.
-
-### Step 1. Configure Patroni permanent replication slot on the primary site
+#### Step 1. Configure Patroni permanent replication slot on the primary site
To set up database replication with Patroni on a secondary node, we need to
configure a _permanent replication slot_ on the primary node's Patroni cluster,
@@ -520,7 +523,7 @@ Leader instance**:
```ruby
consul['enable'] = true
consul['configuration'] = {
- retry_join: %w[CONSUL_PRIMARY1_IP CONSULT_PRIMARY2_IP CONSULT_PRIMARY3_IP]
+ retry_join: %w[CONSUL_PRIMARY1_IP CONSUL_PRIMARY2_IP CONSUL_PRIMARY3_IP]
}
repmgr['enable'] = false
@@ -553,7 +556,7 @@ Leader instance**:
gitlab-ctl reconfigure
```
-### Step 2. Configure the internal load balancer on the primary site
+#### Step 2. Configure the internal load balancer on the primary site
To avoid reconfiguring the Standby Leader on the secondary site whenever a new
Leader is elected on the primary site, we'll need to set up a TCP internal load
@@ -597,7 +600,65 @@ backend postgresql
Refer to your preferred Load Balancer's documentation for further guidance.
-### Step 3. Configure a Standby cluster on the secondary site
+#### Step 3. Configure a PgBouncer node on the secondary site
+
+A production-ready and highly available configuration requires at least
+three Consul nodes and a minimum of one PgBouncer node, although one PgBouncer node
+per database node is recommended. An internal load balancer (TCP) is required when there is
+more than one PgBouncer service node. The internal load balancer provides a single
+endpoint for connecting to the PgBouncer cluster. For more information,
+see [High Availability with Omnibus GitLab](../../postgresql/replication_and_failover.md).
+
+Follow the minimal configuration for the PgBouncer node:
+
+1. SSH into your PgBouncer node and log in as root:
+
+ ```shell
+ sudo -i
+ ```
+
+1. Edit `/etc/gitlab/gitlab.rb` and add the following:
+
+ ```ruby
+ # Disable all components except Pgbouncer and Consul agent
+ roles ['pgbouncer_role']
+
+ # PgBouncer configuration
+ pgbouncer['users'] = {
+ 'pgbouncer': {
+ password: 'PGBOUNCER_PASSWORD_HASH'
+ }
+ }
+
+ # Consul configuration
+ consul['watchers'] = %w(postgresql)
+
+ consul['configuration'] = {
+ retry_join: %w[CONSUL_SECONDARY1_IP CONSUL_SECONDARY2_IP CONSUL_SECONDARY3_IP]
+ }
+
+ consul['monitoring_service_discovery'] = true
+ ```
+
+1. Reconfigure GitLab for the changes to take effect:
+
+ ```shell
+ gitlab-ctl reconfigure
+ ```
+
+1. Create a `.pgpass` file so Consul is able to reload PgBouncer. Enter the `PLAIN_TEXT_PGBOUNCER_PASSWORD` twice when asked:
+
+ ```shell
+ gitlab-ctl write-pgpass --host 127.0.0.1 --database pgbouncer --user pgbouncer --hostuser gitlab-consul
+ ```
+
+1. Restart the PgBouncer service:
+
+ ```shell
+ gitlab-ctl restart pgbouncer
+ ```
+
+#### Step 4. Configure a Standby cluster on the secondary site
NOTE:
If you are converting a secondary site to a Patroni Cluster, you must start
@@ -619,7 +680,7 @@ For each Patroni instance on the secondary site:
consul['enable'] = true
consul['configuration'] = {
- retry_join: %w[CONSUL_SECONDARY1_IP CONSULT_SECONDARY2_IP CONSULT_SECONDARY3_IP]
+ retry_join: %w[CONSUL_SECONDARY1_IP CONSUL_SECONDARY2_IP CONSUL_SECONDARY3_IP]
}
repmgr['enable'] = false
@@ -669,14 +730,14 @@ For each Patroni instance on the secondary site:
gitlab-ctl reconfigure
```
-## Migrating from repmgr to Patroni
+### Migrating from repmgr to Patroni
1. Before migrating, it is recommended that there is no replication lag between the primary and secondary sites and that replication is paused. In GitLab 13.2 and later, you can pause and resume replication with `gitlab-ctl geo-replication-pause` and `gitlab-ctl geo-replication-resume` on a Geo secondary database node.
1. Follow the [instructions to migrate repmgr to Patroni](../../postgresql/replication_and_failover.md#switching-from-repmgr-to-patroni). When configuring Patroni on each primary site database node, add `patroni['replication_slots'] = { '<slot_name>' => 'physical' }`
to `gitlab.rb` where `<slot_name>` is the name of the replication slot for your Geo secondary. This will ensure that Patroni recognizes the replication slot as permanent and will not drop it upon restarting.
1. If database replication to the secondary was paused before migration, resume replication once Patroni is confirmed working on the primary.
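The `patroni['replication_slots']` value from the steps above is an ordinary Ruby hash, so it can be built from a list of slot names when there is more than one Geo secondary. This is a sketch under stated assumptions: the slot names and both helper names are hypothetical, and the name check reflects the PostgreSQL rule that slot names may contain only lowercase letters, digits, and underscores.

```ruby
# Hypothetical slot names, one per Geo secondary site.
slot_names = %w[geo_secondary_1 geo_secondary_2]

# PostgreSQL slot names may contain only lowercase letters, digits, and underscores.
def valid_slot_name?(name)
  name.match?(/\A[a-z0-9_]+\z/)
end

# Hypothetical helper: maps every slot name to the 'physical' slot type.
def permanent_slots(names)
  invalid = names.reject { |name| valid_slot_name?(name) }
  raise ArgumentError, "invalid slot names: #{invalid.join(', ')}" unless invalid.empty?

  names.map { |name| [name, 'physical'] }.to_h
end

replication_slots = permanent_slots(slot_names)
p replication_slots
```

In `gitlab.rb`, the resulting hash would be assigned as `patroni['replication_slots'] = replication_slots`, which is equivalent to writing the hash literal by hand.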
-## Migrating a single PostgreSQL node to Patroni
+### Migrating a single PostgreSQL node to Patroni
Before the introduction of Patroni, Geo had no Omnibus support for HA setups on the secondary node.
@@ -685,12 +746,197 @@ With Patroni it's now possible to support that. In order to migrate the existing
1. Make sure you have a Consul cluster setup on the secondary (similar to how you set it up on the primary).
1. [Configure a permanent replication slot](#step-1-configure-patroni-permanent-replication-slot-on-the-primary-site).
1. [Configure the internal load balancer](#step-2-configure-the-internal-load-balancer-on-the-primary-site).
-1. [Configure a Standby Cluster](#step-3-configure-a-standby-cluster-on-the-secondary-site)
+1. [Configure a PgBouncer node](#step-3-configure-a-pgbouncer-node-on-the-secondary-site)
+1. [Configure a Standby Cluster](#step-4-configure-a-standby-cluster-on-the-secondary-site)
on that single node machine.
You will end up with a "Standby Cluster" with a single node. That allows you to later on add additional Patroni nodes
by following the same instructions above.
+### Configuring Patroni cluster for the tracking PostgreSQL database
+
+Secondary sites use a separate PostgreSQL installation as a tracking database to
+keep track of replication status and automatically recover from potential replication issues.
+Omnibus automatically configures a tracking database when `roles ['geo_secondary_role']` is set.
+If you want to run this database in a highly available configuration, follow the instructions below.
+
+A production-ready and secure setup requires at least three Consul nodes and three
+Patroni nodes on the secondary site. Be sure to use [password credentials](../../postgresql/replication_and_failover.md#database-authorization-for-patroni) and other database best practices.
+
+#### Step 1. Configure a PgBouncer node on the secondary site
+
+A production-ready and highly available configuration requires at least
+three Consul nodes, three PgBouncer nodes, and one internal load-balancing node.
+The internal load balancer provides a single endpoint for connecting to the
+PgBouncer cluster. For more information, see [High Availability with Omnibus GitLab](../../postgresql/replication_and_failover.md).
+
+Follow the minimal configuration for the PgBouncer node for the tracking database:
+
+1. SSH into your PgBouncer node and log in as root:
+
+ ```shell
+ sudo -i
+ ```
+
+1. Edit `/etc/gitlab/gitlab.rb` and add the following:
+
+ ```ruby
+ # Disable all components except Pgbouncer and Consul agent
+ roles ['pgbouncer_role']
+
+ # PgBouncer configuration
+ pgbouncer['users'] = {
+ 'pgbouncer': {
+ password: 'PGBOUNCER_PASSWORD_HASH'
+ }
+ }
+
+ pgbouncer['databases'] = {
+ gitlabhq_geo_production: {
+ user: 'pgbouncer',
+ password: 'PGBOUNCER_PASSWORD_HASH'
+ }
+ }
+
+ # Consul configuration
+ consul['watchers'] = %w(postgresql)
+
+ consul['configuration'] = {
+ retry_join: %w[CONSUL_TRACKINGDB1_IP CONSUL_TRACKINGDB2_IP CONSUL_TRACKINGDB3_IP]
+ }
+
+ consul['monitoring_service_discovery'] = true
+
+ # GitLab database settings
+ gitlab_rails['db_database'] = 'gitlabhq_geo_production'
+ gitlab_rails['db_username'] = 'gitlab_geo'
+ ```
+
+1. Reconfigure GitLab for the changes to take effect:
+
+ ```shell
+ gitlab-ctl reconfigure
+ ```
+
+1. Create a `.pgpass` file so Consul is able to reload PgBouncer. Enter the `PLAIN_TEXT_PGBOUNCER_PASSWORD` twice when asked:
+
+ ```shell
+ gitlab-ctl write-pgpass --host 127.0.0.1 --database pgbouncer --user pgbouncer --hostuser gitlab-consul
+ ```
+
+1. Restart the PgBouncer service:
+
+ ```shell
+ gitlab-ctl restart pgbouncer
+ ```
+
+#### Step 2. Configure a Patroni cluster
+
+For each Patroni instance on the secondary site for the tracking database:
+
+1. SSH into your Patroni node and log in as root:
+
+ ```shell
+ sudo -i
+ ```
+
+1. Edit `/etc/gitlab/gitlab.rb` and add the following:
+
+ ```ruby
+ # Disable all components except PostgreSQL, Patroni, and Consul
+ roles ['patroni_role']
+
+ # Consul configuration
+ consul['services'] = %w(postgresql)
+
+ consul['configuration'] = {
+ server: true,
+ retry_join: %w[CONSUL_TRACKINGDB1_IP CONSUL_TRACKINGDB2_IP CONSUL_TRACKINGDB3_IP]
+ }
+
+ # PostgreSQL configuration
+ postgresql['listen_address'] = '0.0.0.0'
+ postgresql['hot_standby'] = 'on'
+ postgresql['wal_level'] = 'replica'
+
+ postgresql['pgbouncer_user_password'] = 'PGBOUNCER_PASSWORD_HASH'
+ postgresql['sql_replication_password'] = 'POSTGRESQL_REPLICATION_PASSWORD_HASH'
+ postgresql['sql_user_password'] = 'POSTGRESQL_PASSWORD_HASH'
+
+ postgresql['md5_auth_cidr_addresses'] = [
+ 'PATRONI_TRACKINGDB1_IP/32', 'PATRONI_TRACKINGDB2_IP/32', 'PATRONI_TRACKINGDB3_IP/32', 'PATRONI_TRACKINGDB_PGBOUNCER/32',
+ # Any other instance that needs access to the database as per documentation
+ ]
+
+ # Patroni configuration
+ patroni['replication_password'] = 'PLAIN_TEXT_POSTGRESQL_REPLICATION_PASSWORD'
+ patroni['postgresql']['max_wal_senders'] = 5 # A minimum of three for one replica, plus two for each additional replica
+
+ # GitLab database settings
+ gitlab_rails['db_database'] = 'gitlabhq_geo_production'
+ gitlab_rails['db_username'] = 'gitlab_geo'
+
+ # Disable automatic database migrations
+ gitlab_rails['auto_migrate'] = false
+ ```
+
+1. Reconfigure GitLab for the changes to take effect.
+ This is required to bootstrap PostgreSQL users and settings:
+
+ ```shell
+ gitlab-ctl reconfigure
+ ```
+
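The comment next to `max_wal_senders` in the configuration above gives a sizing rule: three senders for one replica, plus two for each additional replica. The rule can be sketched in Ruby as follows. The helper name is hypothetical, and the sketch assumes that one Patroni node is the leader and every other node is a replica.

```ruby
# Rule from the configuration comment: three senders for one replica,
# plus two for each additional replica.
def max_wal_senders(replica_count)
  raise ArgumentError, 'at least one replica is required' if replica_count < 1

  3 + 2 * (replica_count - 1)
end

# Hypothetical cluster sizes: total Patroni nodes, one of which is the leader.
[2, 3, 5].each do |nodes|
  puts "#{nodes} Patroni nodes -> max_wal_senders = #{max_wal_senders(nodes - 1)}"
end
```

Under that assumption, a three-node cluster has two replicas, which gives the value of 5 used in the configuration above.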
+#### Step 3. Configure the tracking database on the secondary nodes
+
+For each node running the `gitlab-rails`, `sidekiq`, and `geo-logcursor` services:
+
+1. SSH into your node and log in as root:
+
+ ```shell
+ sudo -i
+ ```
+
+1. Edit `/etc/gitlab/gitlab.rb` and add the following attributes. You may have other attributes set, but the following must be set:
+
+ ```ruby
+ # Tracking database settings
+ geo_secondary['db_username'] = 'gitlab_geo'
+ geo_secondary['db_password'] = 'PLAIN_TEXT_PGBOUNCER_PASSWORD'
+ geo_secondary['db_database'] = 'gitlabhq_geo_production'
+ geo_secondary['db_host'] = 'PATRONI_TRACKINGDB_PGBOUNCER_IP'
+ geo_secondary['db_port'] = 6432
+ geo_secondary['auto_migrate'] = false
+
+ # Disable the tracking database service
+ geo_postgresql['enable'] = false
+ ```
+
+1. Reconfigure GitLab for the changes to take effect:
+
+ ```shell
+ gitlab-ctl reconfigure
+ ```
+
+1. Run the tracking database migrations:
+
+ ```shell
+ gitlab-rake geo:db:migrate
+ ```
+
+### Migrating a single tracking database node to Patroni
+
+Before the introduction of Patroni, Geo had no Omnibus support for HA setups on
+the secondary node.
+
+With Patroni, it's now possible to support that. Due to some restrictions on the
+Patroni implementation on Omnibus that do not allow us to manage two different
+clusters on the same machine, we recommend setting up a new Patroni cluster for
+the tracking database by following the same instructions above.
+
+The secondary nodes will backfill the new tracking database, and no data
+synchronization will be required.
+
## Troubleshooting
Read the [troubleshooting document](../replication/troubleshooting.md).
diff --git a/doc/administration/geo/setup/index.md b/doc/administration/geo/setup/index.md
index 5ec18e29f21..1afa4360cbc 100644
--- a/doc/administration/geo/setup/index.md
+++ b/doc/administration/geo/setup/index.md
@@ -25,7 +25,7 @@ If you installed GitLab using the Omnibus packages (highly recommended):
1. [Configure fast lookup of authorized SSH keys in the database](../../operations/fast_ssh_key_lookup.md). This step is required and needs to be done on **both** the **primary** and **secondary** nodes.
1. [Configure GitLab](../replication/configuration.md) to set the **primary** and **secondary** nodes.
1. Optional: [Configure a secondary LDAP server](../../auth/ldap/index.md) for the **secondary** node. See [notes on LDAP](../index.md#ldap).
-1. [Follow the "Using a Geo Server" guide](../replication/using_a_geo_server.md).
+1. Follow the [Using a Geo Site](../replication/usage.md) guide.
## Post-installation documentation