Diffstat (limited to 'doc/administration/postgresql/replication_and_failover.md')
-rw-r--r--  doc/administration/postgresql/replication_and_failover.md  |  1186
1 file changed, 586 insertions(+), 600 deletions(-)
diff --git a/doc/administration/postgresql/replication_and_failover.md b/doc/administration/postgresql/replication_and_failover.md
index 303246c9c82..75d0c558962 100644
--- a/doc/administration/postgresql/replication_and_failover.md
+++ b/doc/administration/postgresql/replication_and_failover.md
@@ -46,22 +46,19 @@ Each database node runs three services:
`PostgreSQL` - The database itself.
-`repmgrd` - Communicates with other repmgrd services in the cluster and handles
-failover when issues with the master server occurs. The failover procedure
+`Patroni` - Communicates with other Patroni services in the cluster and handles
+failover when issues with the leader server occur. The failover procedure
consists of:
-- Selecting a new master for the cluster.
-- Promoting the new node to master.
-- Instructing remaining servers to follow the new master node.
-- The old master node is automatically evicted from the cluster and should be
- rejoined manually once recovered.
+- Selecting a new leader for the cluster.
+- Promoting the new node to leader.
+- Instructing remaining servers to follow the new leader node.
-`Consul` agent - Monitors the status of each node in the database cluster and
-tracks its health in a service definition on the Consul cluster.
+`Consul` agent - Communicates with the Consul cluster, which stores the current Patroni state. The agent monitors the status of each node in the database cluster and tracks its health in a service definition on the Consul cluster.
### Consul server node
-The Consul server node runs the Consul server service.
+The Consul server node runs the Consul server service. These nodes must have reached quorum and elected a leader _before_ the Patroni cluster is bootstrapped; otherwise, database nodes wait until a Consul leader is elected.
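+
+A hedged way to confirm that the Consul servers have reached quorum before bootstrapping Patroni, using the Consul binary bundled with Omnibus GitLab (the path is an assumption based on the standard Omnibus layout):
+
+```shell
+# All Consul server nodes should be listed as "alive"
+/opt/gitlab/embedded/bin/consul members
+# The raft peer list marks the elected leader
+/opt/gitlab/embedded/bin/consul operator raft list-peers
+```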
### PgBouncer node
@@ -80,7 +77,7 @@ Each service in the package comes with a set of [default ports](https://docs.git
- Application servers connect to either PgBouncer directly via its [default port](https://docs.gitlab.com/omnibus/package-information/defaults.html#pgbouncer) or via a configured Internal Load Balancer (TCP) that serves multiple PgBouncers.
- PgBouncer connects to the primary database server's [PostgreSQL default port](https://docs.gitlab.com/omnibus/package-information/defaults.html#postgresql)
-- Repmgr connects to the database servers [PostgreSQL default port](https://docs.gitlab.com/omnibus/package-information/defaults.html#postgresql)
+- Patroni actively manages the running PostgreSQL processes and configuration.
- PostgreSQL secondaries connect to the primary database server's [PostgreSQL default port](https://docs.gitlab.com/omnibus/package-information/defaults.html#postgresql)
- Consul servers and agents connect to each other's [Consul default ports](https://docs.gitlab.com/omnibus/package-information/defaults.html#consul)
@@ -141,7 +138,7 @@ available database connections.
In this document, we assume 3 database nodes, which makes this configuration:
```ruby
-postgresql['max_wal_senders'] = 4
+patroni['postgresql']['max_wal_senders'] = 4
```
As previously mentioned, you'll have to prepare the network subnets that will
@@ -186,18 +183,6 @@ Few notes on the service itself:
- `/etc/gitlab/gitlab.rb`: hashed, and in plain text
- `/var/opt/gitlab/pgbouncer/pg_auth`: hashed
-#### Repmgr information
-
-When using default setup, you will only have to prepare the network subnets that will
-be allowed to authenticate with the service.
-
-Few notes on the service itself:
-
-- The service runs under the same system account as the database
- - In the package, this is by default `gitlab-psql`
-- The service will have a superuser database user account generated for it
- - This defaults to `gitlab_repmgr`
-
### Installing Omnibus GitLab
First, make sure to [download/install](https://about.gitlab.com/install/)
@@ -212,72 +197,80 @@ When installing the GitLab package, do not supply `EXTERNAL_URL` value.
1. Make sure to [configure the Consul nodes](../consul.md).
1. Make sure you collect [`CONSUL_SERVER_NODES`](#consul-information), [`PGBOUNCER_PASSWORD_HASH`](#pgbouncer-information), [`POSTGRESQL_PASSWORD_HASH`](#postgresql-information), the [number of db nodes](#postgresql-information), and the [network address](#network-information) before executing the next step.
-1. On the master database node, edit `/etc/gitlab/gitlab.rb` replacing values noted in the `# START user configuration` section:
+#### Configuring Patroni cluster
- ```ruby
- # Disable all components except PostgreSQL and Repmgr and Consul
- roles ['postgres_role']
+You must enable Patroni explicitly to be able to use it (with `patroni['enable'] = true`). When Patroni is enabled,
+repmgr is automatically disabled.
- # PostgreSQL configuration
- postgresql['listen_address'] = '0.0.0.0'
- postgresql['hot_standby'] = 'on'
- postgresql['wal_level'] = 'replica'
- postgresql['shared_preload_libraries'] = 'repmgr_funcs'
+Any PostgreSQL configuration item that controls replication, for example `wal_level`, `max_wal_senders`, and so on, is strictly
+controlled by Patroni and overrides the original settings that you make with the `postgresql[...]` configuration key.
+Hence, these items are all separated and placed under `patroni['postgresql'][...]`. This behavior is limited to replication;
+Patroni honors any other PostgreSQL configuration that was made with the `postgresql[...]` configuration key. For example,
+`max_wal_senders` is set to `5` by default. If you want to change this, you must set it with the `patroni['postgresql']['max_wal_senders']`
+configuration key.
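+
+For instance, a minimal sketch of this split (the values here are illustrative only):
+
+```ruby
+# Replication-related settings must go under patroni['postgresql'][...]
+patroni['postgresql']['max_wal_senders'] = 7
+# Other PostgreSQL settings keep using postgresql[...] and are honored by Patroni
+postgresql['shared_buffers'] = '2GB'
+```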
- # Disable automatic database migrations
- gitlab_rails['auto_migrate'] = false
+NOTE:
+The configuration of a Patroni node is very similar to that of a repmgr node, but shorter. When Patroni is enabled, you can first ignore
+any PostgreSQL replication settings (they will be overwritten anyway). Then you can remove any `repmgr[...]` or
+repmgr-specific configuration. In particular, make sure that you remove `postgresql['shared_preload_libraries'] = 'repmgr_funcs'`.
- # Configure the Consul agent
- consul['services'] = %w(postgresql)
+Here is an example similar to [the one that was done with repmgr](#configuring-repmgr-nodes):
- # START user configuration
- # Please set the real values as explained in Required Information section
- #
- # Replace PGBOUNCER_PASSWORD_HASH with a generated md5 value
- postgresql['pgbouncer_user_password'] = 'PGBOUNCER_PASSWORD_HASH'
- # Replace POSTGRESQL_PASSWORD_HASH with a generated md5 value
- postgresql['sql_user_password'] = 'POSTGRESQL_PASSWORD_HASH'
- # Replace X with value of number of db nodes + 1
- postgresql['max_wal_senders'] = X
- postgresql['max_replication_slots'] = X
+```ruby
+# Disable all components except PostgreSQL, Patroni (or Repmgr), and Consul
+roles ['postgres_role']
- # Replace XXX.XXX.XXX.XXX/YY with Network Address
- postgresql['trust_auth_cidr_addresses'] = %w(XXX.XXX.XXX.XXX/YY)
- repmgr['trust_auth_cidr_addresses'] = %w(127.0.0.1/32 XXX.XXX.XXX.XXX/YY)
+# Enable Patroni (which automatically disables Repmgr).
+patroni['enable'] = true
- # Replace placeholders:
- #
- # Y.Y.Y.Y consul1.gitlab.example.com Z.Z.Z.Z
- # with the addresses gathered for CONSUL_SERVER_NODES
- consul['configuration'] = {
- retry_join: %w(Y.Y.Y.Y consul1.gitlab.example.com Z.Z.Z.Z)
- }
- #
- # END user configuration
- ```
+# PostgreSQL configuration
+postgresql['listen_address'] = '0.0.0.0'
- > `postgres_role` was introduced with GitLab 10.3
+# Disable automatic database migrations
+gitlab_rails['auto_migrate'] = false
-1. On secondary nodes, add all the configuration specified above for primary node
- to `/etc/gitlab/gitlab.rb`. In addition, append the following configuration
- to inform `gitlab-ctl` that they are standby nodes initially and it need not
- attempt to register them as primary node
+# Configure the Consul agent
+consul['services'] = %w(postgresql)
- ```ruby
- # Specify if a node should attempt to be master on initialization
- repmgr['master_on_initialization'] = false
- ```
+# START user configuration
+# Please set the real values as explained in Required Information section
+#
+# Replace PGBOUNCER_PASSWORD_HASH with a generated md5 value
+postgresql['pgbouncer_user_password'] = 'PGBOUNCER_PASSWORD_HASH'
+# Replace POSTGRESQL_PASSWORD_HASH with a generated md5 value
+postgresql['sql_user_password'] = 'POSTGRESQL_PASSWORD_HASH'
-1. [Reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect.
-1. [Enable Monitoring](#enable-monitoring)
+# Replace X with the number of db nodes + 1 (optional; the default value is 5)
+patroni['postgresql']['max_wal_senders'] = X
+patroni['postgresql']['max_replication_slots'] = X
-> Please note:
->
-> - If you want your database to listen on a specific interface, change the configuration:
-> `postgresql['listen_address'] = '0.0.0.0'`.
-> - If your PgBouncer service runs under a different user account,
-> you also need to specify: `postgresql['pgbouncer_user'] = PGBOUNCER_USERNAME` in
-> your configuration.
+# Replace XXX.XXX.XXX.XXX/YY with Network Address
+postgresql['trust_auth_cidr_addresses'] = %w(XXX.XXX.XXX.XXX/YY)
+
+# Replace placeholders:
+#
+# Y.Y.Y.Y consul1.gitlab.example.com Z.Z.Z.Z
+# with the addresses gathered for CONSUL_SERVER_NODES
+consul['configuration'] = {
+ retry_join: %w(Y.Y.Y.Y consul1.gitlab.example.com Z.Z.Z.Z)
+}
+#
+# END user configuration
+```
+
+You do not need any additional or different configuration for replica nodes. In fact, you do not have to have
+a predetermined primary node. Therefore, all database nodes use the same configuration.
+
+Once the configuration of a node is done, you must [reconfigure Omnibus GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure)
+on each node for the changes to take effect.
+
+Generally, when the Consul cluster is ready, the first node that [reconfigures](../restart_gitlab.md#omnibus-gitlab-reconfigure)
+becomes the leader. You do not need to sequence the nodes' reconfiguration. You can run them in parallel or in any order.
+If you choose an arbitrary order, you do not have any predetermined leader.
+
+NOTE:
+As opposed to repmgr, once the nodes are reconfigured, you do not need any further action or additional command for
+the replicas to join the cluster.
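+
+To verify that all nodes have joined, a quick check (a sketch, assuming the `gitlab-ctl patroni members` sub-command is available in your GitLab version):
+
+```shell
+sudo gitlab-ctl patroni members
+```
+
+Every database node should be listed, with exactly one acting as the leader.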
#### Enable Monitoring
@@ -298,129 +291,6 @@ If you enable Monitoring, it must be enabled on **all** database servers.
1. Run `sudo gitlab-ctl reconfigure` to compile the configuration.
-#### Database nodes post-configuration
-
-##### Primary node
-
-Select one node as a primary node.
-
-1. Open a database prompt:
-
- ```shell
- gitlab-psql -d gitlabhq_production
- ```
-
-1. Enable the `pg_trgm` extension:
-
- ```shell
- CREATE EXTENSION pg_trgm;
- ```
-
-1. Enable the `btree_gist` extension:
-
- ```shell
- CREATE EXTENSION btree_gist;
- ```
-
-1. Exit the database prompt by typing `\q` and Enter.
-
-1. Verify the cluster is initialized with one node:
-
- ```shell
- gitlab-ctl repmgr cluster show
- ```
-
- The output should be similar to the following:
-
- ```plaintext
- Role | Name | Upstream | Connection String
- ----------+----------|----------|----------------------------------------
- * master | HOSTNAME | | host=HOSTNAME user=gitlab_repmgr dbname=gitlab_repmgr
- ```
-
-1. Note down the hostname or IP address in the connection string: `host=HOSTNAME`. We will
- refer to the hostname in the next section as `MASTER_NODE_NAME`. If the value
- is not an IP address, it will need to be a resolvable name (via DNS or
- `/etc/hosts`)
-
-##### Secondary nodes
-
-1. Set up the repmgr standby:
-
- ```shell
- gitlab-ctl repmgr standby setup MASTER_NODE_NAME
- ```
-
- Do note that this will remove the existing data on the node. The command
- has a wait time.
-
- The output should be similar to the following:
-
- ```console
- # gitlab-ctl repmgr standby setup MASTER_NODE_NAME
- Doing this will delete the entire contents of /var/opt/gitlab/postgresql/data
- If this is not what you want, hit Ctrl-C now to exit
- To skip waiting, rerun with the -w option
- Sleeping for 30 seconds
- Stopping the database
- Removing the data
- Cloning the data
- Starting the database
- Registering the node with the cluster
- ok: run: repmgrd: (pid 19068) 0s
- ```
-
-1. Verify the node now appears in the cluster:
-
- ```shell
- gitlab-ctl repmgr cluster show
- ```
-
- The output should be similar to the following:
-
- ```plaintext
- Role | Name | Upstream | Connection String
- ----------+---------|-----------|------------------------------------------------
- * master | MASTER | | host=MASTER_NODE_NAME user=gitlab_repmgr dbname=gitlab_repmgr
- standby | STANDBY | MASTER | host=STANDBY_HOSTNAME user=gitlab_repmgr dbname=gitlab_repmgr
- ```
-
-Repeat the above steps on all secondary nodes.
-
-#### Database checkpoint
-
-Before moving on, make sure the databases are configured correctly. Run the
-following command on the **primary** node to verify that replication is working
-properly:
-
-```shell
-gitlab-ctl repmgr cluster show
-```
-
-The output should be similar to:
-
-```plaintext
-Role | Name | Upstream | Connection String
-----------+--------------|--------------|--------------------------------------------------------------------
-* master | MASTER | | host=MASTER port=5432 user=gitlab_repmgr dbname=gitlab_repmgr
- standby | STANDBY | MASTER | host=STANDBY port=5432 user=gitlab_repmgr dbname=gitlab_repmgr
-```
-
-If the 'Role' column for any node says "FAILED", check the
-[Troubleshooting section](#troubleshooting) before proceeding.
-
-Also, check that the check master command works successfully on each node:
-
-```shell
-su - gitlab-consul
-gitlab-ctl repmgr-check-master || echo 'This node is a standby repmgr node'
-```
-
-This command relies on exit codes to tell Consul whether a particular node is a master
-or secondary. The most important thing here is that this command does not produce errors.
-If there are errors it's most likely due to incorrect `gitlab-consul` database user permissions.
-Check the [Troubleshooting section](#troubleshooting) before proceeding.
-
### Configuring the PgBouncer node
1. Make sure you collect [`CONSUL_SERVER_NODES`](#consul-information), [`CONSUL_PASSWORD_HASH`](#consul-information), and [`PGBOUNCER_PASSWORD_HASH`](#pgbouncer-information) before executing the next step.
@@ -577,6 +447,12 @@ PostgreSQL's `trust_auth_cidr_addresses` in `gitlab.rb` on your database nodes.
[PgBouncer error `ERROR: pgbouncer cannot connect to server`](#pgbouncer-error-error-pgbouncer-cannot-connect-to-server)
in the Troubleshooting section before proceeding.
+### Backups
+
+Do not back up or restore GitLab through a PgBouncer connection: doing so causes a GitLab outage.
+
+[Read more about this and how to reconfigure backups](../../raketasks/backup_restore.md#backup-and-restore-for-installations-using-pgbouncer).
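+
+As a hedged sketch, the linked page documents environment-variable overrides (such as `GITLAB_BACKUP_PGHOST` and `GITLAB_BACKUP_PGPORT`) that point the backup task directly at a database node instead of PgBouncer:
+
+```shell
+# 10.6.0.31 is an illustrative database node address
+sudo GITLAB_BACKUP_PGHOST=10.6.0.31 GITLAB_BACKUP_PGPORT=5432 gitlab-backup create
+```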
+
### Ensure GitLab is running
At this point, your GitLab instance should be up and running. Verify you're able
@@ -605,9 +481,9 @@ Here is a list and description of each machine and the assigned IP:
- `10.6.0.21`: PgBouncer 1
- `10.6.0.22`: PgBouncer 2
- `10.6.0.23`: PgBouncer 3
-- `10.6.0.31`: PostgreSQL master
-- `10.6.0.32`: PostgreSQL secondary
-- `10.6.0.33`: PostgreSQL secondary
+- `10.6.0.31`: PostgreSQL 1
+- `10.6.0.32`: PostgreSQL 2
+- `10.6.0.33`: PostgreSQL 3
- `10.6.0.41`: GitLab application
All passwords are set to `toomanysecrets`; please do not use this password or derived hashes. The `external_url` for GitLab is `http://gitlab.example.com`.
@@ -667,29 +543,28 @@ An internal load balancer (TCP) is then required to be setup to serve each PgBou
#### Example recommended setup for PostgreSQL servers
-##### Primary node
-
-On primary node edit `/etc/gitlab/gitlab.rb`:
+On database nodes edit `/etc/gitlab/gitlab.rb`:
```ruby
-# Disable all components except PostgreSQL and Repmgr and Consul
+# Disable all components except PostgreSQL, Patroni (or Repmgr), and Consul
roles ['postgres_role']
# PostgreSQL configuration
postgresql['listen_address'] = '0.0.0.0'
postgresql['hot_standby'] = 'on'
postgresql['wal_level'] = 'replica'
-postgresql['shared_preload_libraries'] = 'repmgr_funcs'
+
+# Enable Patroni (which automatically disables Repmgr).
+patroni['enable'] = true
# Disable automatic database migrations
gitlab_rails['auto_migrate'] = false
postgresql['pgbouncer_user_password'] = '771a8625958a529132abe6f1a4acb19c'
postgresql['sql_user_password'] = '450409b85a0223a214b5fb1484f34d0f'
-postgresql['max_wal_senders'] = 4
+patroni['postgresql']['max_wal_senders'] = 4
postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.0/16)
-repmgr['trust_auth_cidr_addresses'] = %w(10.6.0.0/16)
# Configure the Consul agent
consul['services'] = %w(postgresql)
@@ -702,85 +577,27 @@ consul['monitoring_service_discovery'] = true
[Reconfigure Omnibus GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect.
-##### Secondary nodes
-
-On secondary nodes, edit `/etc/gitlab/gitlab.rb` and add all the configuration
-added to primary node, noted above. In addition, append the following
-configuration:
-
-```ruby
-# Specify if a node should attempt to be master on initialization
-repmgr['master_on_initialization'] = false
-```
-
-[Reconfigure Omnibus GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect.
-
-###### Example recommended setup for application server
-
-On the server edit `/etc/gitlab/gitlab.rb`:
-
-```ruby
-external_url 'http://gitlab.example.com'
-
-gitlab_rails['db_host'] = '10.6.0.20' # Internal Load Balancer for PgBouncer nodes
-gitlab_rails['db_port'] = 6432
-gitlab_rails['db_password'] = 'toomanysecrets'
-gitlab_rails['auto_migrate'] = false
-
-postgresql['enable'] = false
-pgbouncer['enable'] = false
-consul['enable'] = true
-
-# Configure Consul agent
-consul['watchers'] = %w(postgresql)
-
-pgbouncer['users'] = {
- 'gitlab-consul': {
- password: '5e0e3263571e3704ad655076301d6ebe'
- },
- 'pgbouncer': {
- password: '771a8625958a529132abe6f1a4acb19c'
- }
-}
-
-consul['configuration'] = {
- retry_join: %w(10.6.0.11 10.6.0.12 10.6.0.13)
-}
-```
-
-[Reconfigure Omnibus GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect.
-
#### Example recommended setup manual steps
After deploying the configuration follow these steps:
-1. On `10.6.0.31`, our primary database:
-
- Enable the `pg_trgm` and `btree_gist` extensions:
-
- ```shell
- gitlab-psql -d gitlabhq_production
- ```
+1. Find the primary database node:
```shell
- CREATE EXTENSION pg_trgm;
- CREATE EXTENSION btree_gist;
+ gitlab-ctl get-postgresql-primary
```
-1. On `10.6.0.32`, our first standby database:
+1. On the primary database node:
- Make this node a standby of the primary:
+ Enable the `pg_trgm` and `btree_gist` extensions:
```shell
- gitlab-ctl repmgr standby setup 10.6.0.21
+ gitlab-psql -d gitlabhq_production
```
-1. On `10.6.0.33`, our second standby database:
-
- Make this node a standby of the primary:
-
```shell
- gitlab-ctl repmgr standby setup 10.6.0.21
+ CREATE EXTENSION pg_trgm;
+ CREATE EXTENSION btree_gist;
```
1. On `10.6.0.41`, our application server:
@@ -802,15 +619,15 @@ After deploying the configuration follow these steps:
This example uses 3 PostgreSQL servers, and 1 application node (with PgBouncer setup alongside).
It differs from the [recommended setup](#example-recommended-setup) by moving the Consul servers into the same servers we use for PostgreSQL.
-The trade-off is between reducing server counts, against the increased operational complexity of needing to deal with PostgreSQL [failover](#failover-procedure) and [restore](#restore-procedure) procedures in addition to [Consul outage recovery](../consul.md#outage-recovery) on the same set of machines.
+The trade-off is between reducing server counts, against the increased operational complexity of needing to deal with PostgreSQL [failover](#manual-failover-procedure-for-patroni) procedures in addition to [Consul outage recovery](../consul.md#outage-recovery) on the same set of machines.
In this example we start with all servers on the same 10.6.0.0/16 private network range; they can connect to each other freely on those addresses.
Here is a list and description of each machine and the assigned IP:
-- `10.6.0.21`: PostgreSQL master
-- `10.6.0.22`: PostgreSQL secondary
-- `10.6.0.23`: PostgreSQL secondary
+- `10.6.0.21`: PostgreSQL 1
+- `10.6.0.22`: PostgreSQL 2
+- `10.6.0.23`: PostgreSQL 3
- `10.6.0.31`: GitLab application
All passwords are set to `toomanysecrets`; please do not use this password or derived hashes.
@@ -821,9 +638,7 @@ Please note that after the initial configuration, if a failover occurs, the Post
#### Example minimal configuration for database servers
-##### Primary node
-
-On primary database node edit `/etc/gitlab/gitlab.rb`:
+On database nodes edit `/etc/gitlab/gitlab.rb`:
```ruby
-# Disable all components except PostgreSQL, Repmgr, and Consul
+# Disable all components except PostgreSQL, Patroni (or Repmgr), and Consul
@@ -833,7 +648,9 @@ roles ['postgres_role']
postgresql['listen_address'] = '0.0.0.0'
postgresql['hot_standby'] = 'on'
postgresql['wal_level'] = 'replica'
-postgresql['shared_preload_libraries'] = 'repmgr_funcs'
+
+# Enable Patroni (which automatically disables Repmgr).
+patroni['enable'] = true
# Disable automatic database migrations
gitlab_rails['auto_migrate'] = false
@@ -843,10 +660,9 @@ consul['services'] = %w(postgresql)
postgresql['pgbouncer_user_password'] = '771a8625958a529132abe6f1a4acb19c'
postgresql['sql_user_password'] = '450409b85a0223a214b5fb1484f34d0f'
-postgresql['max_wal_senders'] = 4
+patroni['postgresql']['max_wal_senders'] = 4
postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.0/16)
-repmgr['trust_auth_cidr_addresses'] = %w(10.6.0.0/16)
consul['configuration'] = {
server: true,
@@ -856,16 +672,6 @@ consul['configuration'] = {
[Reconfigure Omnibus GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect.
-##### Secondary nodes
-
-On secondary nodes, edit `/etc/gitlab/gitlab.rb` and add all the information added
-to primary node, noted above. In addition, append the following configuration
-
-```ruby
-# Specify if a node should attempt to be master on initialization
-repmgr['master_on_initialization'] = false
-```
-
#### Example minimal configuration for application server
On the server edit `/etc/gitlab/gitlab.rb`:
@@ -908,249 +714,10 @@ consul['configuration'] = {
The manual steps for this configuration are the same as for the [example recommended setup](#example-recommended-setup-manual-steps).
-### Failover procedure
-
-By default, if the master database fails, `repmgrd` should promote one of the
-standby nodes to master automatically, and Consul will update PgBouncer with
-the new master.
-
-If you need to failover manually, you have two options:
-
-**Shutdown the current master database**
-
-Run:
-
-```shell
-gitlab-ctl stop postgresql
-```
-
-The automated failover process will see this and failover to one of the
-standby nodes.
-
-**Or perform a manual failover**
-
-1. Ensure the old master node is not still active.
-1. Login to the server that should become the new master and run:
-
- ```shell
- gitlab-ctl repmgr standby promote
- ```
-
-1. If there are any other standby servers in the cluster, have them follow
- the new master server:
-
- ```shell
- gitlab-ctl repmgr standby follow NEW_MASTER
- ```
-
-### Restore procedure
-
-If a node fails, it can be removed from the cluster, or added back as a standby
-after it has been restored to service.
-
-#### Remove a standby from the cluster
-
- From any other node in the cluster, run:
-
- ```shell
- gitlab-ctl repmgr standby unregister --node=X
- ```
-
- where X is the value of node in `repmgr.conf` on the old server.
-
- To find this, you can use:
-
- ```shell
- awk -F = '$1 == "node" { print $2 }' /var/opt/gitlab/postgresql/repmgr.conf
- ```
-
- It will output something like:
-
- ```plaintext
- 959789412
- ```
-
- Then you will use this ID to unregister the node:
-
- ```shell
- gitlab-ctl repmgr standby unregister --node=959789412
- ```
-
-#### Add a node as a standby server
-
- From the standby node, run:
-
- ```shell
- gitlab-ctl repmgr standby follow NEW_MASTER
- gitlab-ctl restart repmgrd
- ```
-
- WARNING:
- When the server is brought back online, and before
- you switch it to a standby node, repmgr will report that there are two masters.
- If there are any clients that are still attempting to write to the old master,
- this will cause a split, and the old master will need to be resynced from
- scratch by performing a `gitlab-ctl repmgr standby setup NEW_MASTER`.
-
-#### Add a failed master back into the cluster as a standby node
-
- Once `repmgrd` and PostgreSQL are running, the node will need to follow the new
- as a standby node.
-
- ```shell
- gitlab-ctl repmgr standby follow NEW_MASTER
- ```
-
- Once the node is following the new master as a standby, the node needs to be
- [unregistered from the cluster on the new master node](#remove-a-standby-from-the-cluster).
-
- Once the old master node has been unregistered from the cluster, it will need
- to be setup as a new standby:
-
- ```shell
- gitlab-ctl repmgr standby setup NEW_MASTER
- ```
-
- Failure to unregister and read the old master node can lead to subsequent failovers
- not working.
-
-### Alternate configurations
-
-#### Database authorization
-
-By default, we give any host on the database network the permission to perform
-repmgr operations using PostgreSQL's `trust` method. If you do not want this
-level of trust, there are alternatives.
-
-You can trust only the specific nodes that will be database clusters, or you
-can require md5 authentication.
-
-#### Trust specific addresses
-
-If you know the IP address, or FQDN of all database and PgBouncer nodes in the
-cluster, you can trust only those nodes.
-
-In `/etc/gitlab/gitlab.rb` on all of the database nodes, set
-`repmgr['trust_auth_cidr_addresses']` to an array of strings containing all of
-the addresses.
-
-If setting to a node's FQDN, they must have a corresponding PTR record in DNS.
-If setting to a node's IP address, specify it as `XXX.XXX.XXX.XXX/32`.
-
-For example:
-
-```ruby
-repmgr['trust_auth_cidr_addresses'] = %w(192.168.1.44/32 db2.example.com)
-```
-
-#### MD5 Authentication
-
-If you are running on an untrusted network, repmgr can use md5 authentication
-with a [`.pgpass` file](https://www.postgresql.org/docs/11/libpq-pgpass.html)
-to authenticate.
-
-You can specify by IP address, FQDN, or by subnet, using the same format as in
-the previous section:
-
-1. On the current master node, create a password for the `gitlab` and
- `gitlab_repmgr` user:
-
- ```shell
- gitlab-psql -d template1
- template1=# \password gitlab_repmgr
- Enter password: ****
- Confirm password: ****
- template1=# \password gitlab
- ```
-
-1. On each database node:
-
- 1. Edit `/etc/gitlab/gitlab.rb`:
- 1. Ensure `repmgr['trust_auth_cidr_addresses']` is **not** set
- 1. Set `postgresql['md5_auth_cidr_addresses']` to the desired value
- 1. Set `postgresql['sql_replication_user'] = 'gitlab_repmgr'`
- 1. Reconfigure with `gitlab-ctl reconfigure`
- 1. Restart PostgreSQL with `gitlab-ctl restart postgresql`
-
- 1. Create a `.pgpass` file. Enter the `gitlab_repmgr` password twice to
- when asked:
-
- ```shell
- gitlab-ctl write-pgpass --user gitlab_repmgr --hostuser gitlab-psql --database '*'
- ```
-
-1. On each PgBouncer node, edit `/etc/gitlab/gitlab.rb`:
- 1. Ensure `gitlab_rails['db_password']` is set to the plaintext password for
- the `gitlab` database user
- 1. [Reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect
-
-## Troubleshooting
-
-### Consul and PostgreSQL changes not taking effect
-
-Due to the potential impacts, `gitlab-ctl reconfigure` only reloads Consul and PostgreSQL, it will not restart the services. However, not all changes can be activated by reloading.
-
-To restart either service, run `gitlab-ctl restart SERVICE`
-
-For PostgreSQL, it is usually safe to restart the master node by default. Automatic failover defaults to a 1 minute timeout. Provided the database returns before then, nothing else needs to be done. To be safe, you can stop `repmgrd` on the standby nodes first with `gitlab-ctl stop repmgrd`, then start afterwards with `gitlab-ctl start repmgrd`.
-
-On the Consul server nodes, it is important to [restart the Consul service](../consul.md#restart-consul) in a controlled manner.
-
-### `gitlab-ctl repmgr-check-master` command produces errors
-
-If this command displays errors about database permissions it is likely that something failed during
-install, resulting in the `gitlab-consul` database user getting incorrect permissions. Follow these
-steps to fix the problem:
-
-1. On the master database node, connect to the database prompt - `gitlab-psql -d template1`
-1. Delete the `gitlab-consul` user - `DROP USER "gitlab-consul";`
-1. Exit the database prompt - `\q`
-1. [Reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) and the user will be re-added with the proper permissions.
-1. Change to the `gitlab-consul` user - `su - gitlab-consul`
-1. Try the check command again - `gitlab-ctl repmgr-check-master`.
-
-Now there should not be errors. If errors still occur then there is another problem.
-
-### PgBouncer error `ERROR: pgbouncer cannot connect to server`
-
-You may get this error when running `gitlab-rake gitlab:db:configure` or you
-may see the error in the PgBouncer log file.
-
-```plaintext
-PG::ConnectionBad: ERROR: pgbouncer cannot connect to server
-```
-
-The problem may be that your PgBouncer node's IP address is not included in the
-`trust_auth_cidr_addresses` setting in `/etc/gitlab/gitlab.rb` on the database nodes.
-
-You can confirm that this is the issue by checking the PostgreSQL log on the master
-database node. If you see the following error then `trust_auth_cidr_addresses`
-is the problem.
-
-```plaintext
-2018-03-29_13:59:12.11776 FATAL: no pg_hba.conf entry for host "123.123.123.123", user "pgbouncer", database "gitlabhq_production", SSL off
-```
-
-To fix the problem, add the IP address to `/etc/gitlab/gitlab.rb`.
-
-```ruby
-postgresql['trust_auth_cidr_addresses'] = %w(123.123.123.123/32 <other_cidrs>)
-```
-
-[Reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect.
-
-### Issues with other components
-
-If you're running into an issue with a component not outlined here, be sure to check the troubleshooting section of their specific documentation page:
-
-- [Consul](../consul.md#troubleshooting-consul)
-- [PostgreSQL](https://docs.gitlab.com/omnibus/settings/database.html#troubleshooting)
-
## Patroni
NOTE:
-Starting from GitLab 13.1, Patroni is available for **experimental** use to replace repmgr. Due to its
-experimental nature, Patroni support is **subject to change without notice.**
+Using Patroni instead of Repmgr is supported for PostgreSQL 11 and required for PostgreSQL 12.
Patroni is an opinionated solution for PostgreSQL high-availability. It takes control of PostgreSQL, overrides its
configuration, and manages its lifecycle (start, stop, restart). This is a more active approach when compared to repmgr.
@@ -1174,80 +741,44 @@ functional or does not have a leader, Patroni and by extension PostgreSQL will n
API which can be accessed via its [default port](https://docs.gitlab.com/omnibus/package-information/defaults.html#patroni)
on each node.
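A hedged example of querying a node's role through this API, assuming the default port `8008`:
```shell
# Returns HTTP 200 when this node holds the leader lock, 503 otherwise
curl http://localhost:8008/leader
# Returns HTTP 200 only when this node is a running replica
curl http://localhost:8008/replica
```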
-### Configuring Patroni cluster
-
-You must enable Patroni explicitly to be able to use it (with `patroni['enable'] = true`). When Patroni is enabled
-repmgr will be disabled automatically.
+### Selecting the appropriate Patroni replication method
-Any PostgreSQL configuration item that controls replication, for example `wal_level`, `max_wal_senders`, etc, are strictly
-controlled by Patroni and will override the original settings that you make with the `postgresql[...]` configuration key.
-Hence, they are all separated and placed under `patroni['postgresql'][...]`. This behavior is limited to replication.
-Patroni honours any other PostgreSQL configuration that was made with the `postgresql[...]` configuration key. For example,
-`max_wal_senders` by default is set to `5`. If you wish to change this you must set it with the `patroni['postgresql']['max_wal_senders']`
-configuration key.
+[Review the Patroni documentation carefully](https://patroni.readthedocs.io/en/latest/SETTINGS.html#postgresql)
+before making changes, as **_some of the options carry a risk of data
+loss if not fully understood_**. The [replication mode](https://patroni.readthedocs.io/en/latest/replication_modes.html)
+configured determines the amount of tolerable data loss.
-The configuration of Patroni node is very similar to a repmgr but shorter. When Patroni is enabled, first you can ignore
-any replication setting of PostgreSQL (it will be overwritten anyway). Then you can remove any `repmgr[...]` or
-repmgr-specific configuration as well. Especially, make sure that you remove `postgresql['shared_preload_libraries'] = 'repmgr_funcs'`.
+WARNING:
+Replication is not a backup strategy! There is no replacement for a well-considered and tested backup solution.
-Here is an example similar to [the one that was done with repmgr](#configuring-the-database-nodes):
+Omnibus GitLab defaults [`synchronous_commit`](https://www.postgresql.org/docs/11/runtime-config-wal.html#GUC-SYNCHRONOUS-COMMIT) to `on`.
```ruby
-# Disable all components except PostgreSQL and Repmgr and Consul
-roles['postgres_role']
+postgresql['synchronous_commit'] = 'on'
+gitlab['geo-postgresql']['synchronous_commit'] = 'on'
+```
-# Enable Patroni
-patroni['enable'] = true
+#### Customizing Patroni failover behavior
-# PostgreSQL configuration
-postgresql['listen_address'] = '0.0.0.0'
+Omnibus GitLab exposes several options allowing more control over the [Patroni restoration process](#recovering-the-patroni-cluster).
-# Disable automatic database migrations
-gitlab_rails['auto_migrate'] = false
+Each option is shown below with its default value in `/etc/gitlab/gitlab.rb`.
-# Configure the Consul agent
-consul['services'] = %w(postgresql)
-
-# START user configuration
-# Please set the real values as explained in Required Information section
-#
-# Replace PGBOUNCER_PASSWORD_HASH with a generated md5 value
-postgresql['pgbouncer_user_password'] = 'PGBOUNCER_PASSWORD_HASH'
-# Replace POSTGRESQL_PASSWORD_HASH with a generated md5 value
-postgresql['sql_user_password'] = 'POSTGRESQL_PASSWORD_HASH'
-
-# Replace X with value of number of db nodes + 1 (OPTIONAL the default value is 5)
-patroni['postgresql']['max_wal_senders'] = X
-patroni['postgresql']['max_replication_slots'] = X
-
-# Replace XXX.XXX.XXX.XXX/YY with Network Address
-postgresql['trust_auth_cidr_addresses'] = %w(XXX.XXX.XXX.XXX/YY)
-
-# Replace placeholders:
-#
-# Y.Y.Y.Y consul1.gitlab.example.com Z.Z.Z.Z
-# with the addresses gathered for CONSUL_SERVER_NODES
-consul['configuration'] = {
- retry_join: %w(Y.Y.Y.Y consul1.gitlab.example.com Z.Z.Z.Z)
-}
-#
-# END user configuration
+```ruby
+patroni['use_pg_rewind'] = true
+patroni['remove_data_directory_on_rewind_failure'] = false
+patroni['remove_data_directory_on_diverged_timelines'] = false
```
-You do not need an additional or different configuration for replica nodes. As a matter of fact, you don't have to have
-a predetermined primary node. Therefore all database nodes use the same configuration.
+[The upstream documentation will always be more up to date](https://patroni.readthedocs.io/en/latest/SETTINGS.html#postgresql), but the table below should provide a minimal overview of functionality.
-Once the configuration of a node is done, you must [reconfigure Omnibus GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure)
-on each node for the changes to take effect.
+|Setting|Overview|
+|-|-|
+|`use_pg_rewind`|Try running `pg_rewind` on the former cluster leader before it rejoins the database cluster.|
+|`remove_data_directory_on_rewind_failure`|If `pg_rewind` fails, remove the local PostgreSQL data directory and re-replicate from the current cluster leader.|
+|`remove_data_directory_on_diverged_timelines`|If `pg_rewind` cannot be used and the former leader's timeline has diverged from the current one, then delete the local data directory and re-replicate from the current cluster leader.|
-Generally, when Consul cluster is ready, the first node that [reconfigures](../restart_gitlab.md#omnibus-gitlab-reconfigure)
-becomes the leader. You do not need to sequence the nodes reconfiguration. You can run them in parallel or in any order.
-If you choose an arbitrary order you do not have any predetermined master.
-
-As opposed to repmgr, once the nodes are reconfigured you do not need any further action or additional command to join
-the replicas.
-
-#### Database authorization for Patroni
+### Database authorization for Patroni
Patroni uses a Unix socket to manage the PostgreSQL instance. Therefore, a connection from the `local` socket must be trusted.
@@ -1297,6 +828,16 @@ a manual one, where you have two slightly different options:
For further details on this subject, see the
[Patroni documentation](https://patroni.readthedocs.io/en/latest/rest_api.html#switchover-and-failover-endpoints).
+#### Geo secondary site considerations
+
+Similar to `repmgr`, when a Geo secondary site is replicating from a primary site that uses `Patroni` and `PgBouncer`, [replicating through PgBouncer is not supported](https://github.com/pgbouncer/pgbouncer/issues/382#issuecomment-517911529) and the secondary must replicate directly from the leader node in the `Patroni` cluster. Therefore, when there is an automatic or manual failover in the `Patroni` cluster, you will need to manually re-point your secondary site to replicate from the new leader with:
+
+```shell
+sudo gitlab-ctl replicate-geo-database --host=<new_leader_ip> --replication-slot=<slot_name>
+```
+
+Otherwise, replication will not resume, even if the original node gets re-added as a follower node. Running this command re-syncs your secondary site database, which may take a long time depending on the amount of data to sync. You may also need to run `gitlab-ctl reconfigure` if replication is still not working after re-syncing.
+
### Recovering the Patroni cluster
To recover the old primary and rejoin it to the cluster as a replica, you can simply start Patroni with:
@@ -1363,9 +904,9 @@ You can switch an existing database cluster to use Patroni instead of repmgr with
### Upgrading PostgreSQL major version in a Patroni cluster
-As of GitLab 13.3, PostgreSQL 11.7 and 12.3 are both shipped with Omnibus GitLab. GitLab still
-uses PostgreSQL 11 by default. Therefore `gitlab-ctl pg-upgrade` does not automatically upgrade
-to PostgreSQL 12. If you want to upgrade to PostgreSQL 12, you must ask for it explicitly.
+As of GitLab 13.3, PostgreSQL 11.7 and 12.3 are both shipped with Omnibus GitLab, and as of GitLab 13.7
+PostgreSQL 12 is used by default. If you want to upgrade to PostgreSQL 12 in versions prior to GitLab 13.7,
+you must ask for it explicitly.
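+
+For example, a hedged sketch of that explicit request, where `-V` selects the target version:
+
+```shell
+sudo gitlab-ctl pg-upgrade -V 12
+```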
WARNING:
The procedure for upgrading PostgreSQL in a Patroni cluster is different than when upgrading using repmgr.
@@ -1450,3 +991,448 @@ NOTE:
Reverting PostgreSQL upgrade with `gitlab-ctl revert-pg-upgrade` has the same considerations as
`gitlab-ctl pg-upgrade`. You should follow the same procedure by first stopping the replicas,
then reverting the leader, and finally reverting the replicas.
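As a sketch of that order, assuming `revert-pg-upgrade` accepts the same `-V` target-version flag as `gitlab-ctl pg-upgrade`:
```shell
# 1. Stop Patroni on each replica
sudo gitlab-ctl stop patroni
# 2. Revert the leader, then each replica
sudo gitlab-ctl revert-pg-upgrade -V 11
```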
+
+## Repmgr
+
+NOTE:
+Using Patroni instead of Repmgr is supported for PostgreSQL 11 and required for PostgreSQL 12.
+
+### Configuring Repmgr Nodes
+
+1. On the master database node, edit `/etc/gitlab/gitlab.rb` replacing values noted in the `# START user configuration` section:
+
+ ```ruby
+ # Disable all components except PostgreSQL and Repmgr and Consul
+ roles ['postgres_role']
+
+ # PostgreSQL configuration
+ postgresql['listen_address'] = '0.0.0.0'
+ postgresql['hot_standby'] = 'on'
+ postgresql['wal_level'] = 'replica'
+ postgresql['shared_preload_libraries'] = 'repmgr_funcs'
+
+ # Disable automatic database migrations
+ gitlab_rails['auto_migrate'] = false
+
+ # Configure the Consul agent
+ consul['services'] = %w(postgresql)
+
+ # START user configuration
+ # Please set the real values as explained in Required Information section
+ #
+ # Replace PGBOUNCER_PASSWORD_HASH with a generated md5 value
+ postgresql['pgbouncer_user_password'] = 'PGBOUNCER_PASSWORD_HASH'
+ # Replace POSTGRESQL_PASSWORD_HASH with a generated md5 value
+ postgresql['sql_user_password'] = 'POSTGRESQL_PASSWORD_HASH'
+ # Replace X with value of number of db nodes + 1
+ postgresql['max_wal_senders'] = X
+ postgresql['max_replication_slots'] = X
+
+ # Replace XXX.XXX.XXX.XXX/YY with Network Address
+ postgresql['trust_auth_cidr_addresses'] = %w(XXX.XXX.XXX.XXX/YY)
+ repmgr['trust_auth_cidr_addresses'] = %w(127.0.0.1/32 XXX.XXX.XXX.XXX/YY)
+
+ # Replace placeholders:
+ #
+ # Y.Y.Y.Y consul1.gitlab.example.com Z.Z.Z.Z
+ # with the addresses gathered for CONSUL_SERVER_NODES
+ consul['configuration'] = {
+ retry_join: %w(Y.Y.Y.Y consul1.gitlab.example.com Z.Z.Z.Z)
+ }
+ #
+ # END user configuration
+ ```
+
+ > `postgres_role` was introduced with GitLab 10.3
+
+1. On secondary nodes, add all the configuration specified above for the primary node
+ to `/etc/gitlab/gitlab.rb`. In addition, append the following configuration
+ to inform `gitlab-ctl` that they are standby nodes initially and that it need not
+ attempt to register them as a primary node:
+
+ ```ruby
+ # Specify if a node should attempt to be master on initialization
+ repmgr['master_on_initialization'] = false
+ ```
+
+1. [Reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect.
+1. [Enable Monitoring](#enable-monitoring)
+
+> Please note:
+>
+> - If you want your database to listen on a specific interface, change the configuration:
+> `postgresql['listen_address'] = '0.0.0.0'`.
+> - If your PgBouncer service runs under a different user account,
+> you also need to specify: `postgresql['pgbouncer_user'] = PGBOUNCER_USERNAME` in
+> your configuration.
+
+#### Database nodes post-configuration
+
+##### Primary node
+
+Select one node as a primary node.
+
+1. Open a database prompt:
+
+ ```shell
+ gitlab-psql -d gitlabhq_production
+ ```
+
+1. Enable the `pg_trgm` extension:
+
+ ```shell
+ CREATE EXTENSION pg_trgm;
+ ```
+
+1. Enable the `btree_gist` extension:
+
+ ```shell
+ CREATE EXTENSION btree_gist;
+ ```
+
+1. Exit the database prompt by typing `\q` and Enter.
+
+1. Verify the cluster is initialized with one node:
+
+ ```shell
+ gitlab-ctl repmgr cluster show
+ ```
+
+ The output should be similar to the following:
+
+ ```plaintext
+ Role | Name | Upstream | Connection String
+ ----------+----------|----------|----------------------------------------
+ * master | HOSTNAME | | host=HOSTNAME user=gitlab_repmgr dbname=gitlab_repmgr
+ ```
+
+1. Note down the hostname or IP address in the connection string: `host=HOSTNAME`. We will
+ refer to the hostname in the next section as `MASTER_NODE_NAME`. If the value
+ is not an IP address, it will need to be a resolvable name (via DNS or
+ `/etc/hosts`).
+
+##### Secondary nodes
+
+1. Set up the repmgr standby:
+
+ ```shell
+ gitlab-ctl repmgr standby setup MASTER_NODE_NAME
+ ```
+
+ Do note that this will remove the existing data on the node. The command
+ has a wait time.
+
+ The output should be similar to the following:
+
+ ```console
+ # gitlab-ctl repmgr standby setup MASTER_NODE_NAME
+ Doing this will delete the entire contents of /var/opt/gitlab/postgresql/data
+ If this is not what you want, hit Ctrl-C now to exit
+ To skip waiting, rerun with the -w option
+ Sleeping for 30 seconds
+ Stopping the database
+ Removing the data
+ Cloning the data
+ Starting the database
+ Registering the node with the cluster
+ ok: run: repmgrd: (pid 19068) 0s
+ ```
+
+1. Verify the node now appears in the cluster:
+
+ ```shell
+ gitlab-ctl repmgr cluster show
+ ```
+
+ The output should be similar to the following:
+
+ ```plaintext
+ Role | Name | Upstream | Connection String
+ ----------+---------|-----------|------------------------------------------------
+ * master | MASTER | | host=MASTER_NODE_NAME user=gitlab_repmgr dbname=gitlab_repmgr
+ standby | STANDBY | MASTER | host=STANDBY_HOSTNAME user=gitlab_repmgr dbname=gitlab_repmgr
+ ```
+
+Repeat the above steps on all secondary nodes.
+
+#### Database checkpoint
+
+Before moving on, make sure the databases are configured correctly. Run the
+following command on the **primary** node to verify that replication is working
+properly:
+
+```shell
+gitlab-ctl repmgr cluster show
+```
+
+The output should be similar to:
+
+```plaintext
+Role | Name | Upstream | Connection String
+----------+--------------|--------------|--------------------------------------------------------------------
+* master | MASTER | | host=MASTER port=5432 user=gitlab_repmgr dbname=gitlab_repmgr
+ standby | STANDBY | MASTER | host=STANDBY port=5432 user=gitlab_repmgr dbname=gitlab_repmgr
+```
+
+If the 'Role' column for any node says "FAILED", check the
+[Troubleshooting section](#troubleshooting) before proceeding.
+
+Also, check that the `repmgr-check-master` command works successfully on each node:
+
+```shell
+su - gitlab-consul
+gitlab-ctl repmgr-check-master || echo 'This node is a standby repmgr node'
+```
+
+This command relies on exit codes to tell Consul whether a particular node is a master
+or secondary. The most important thing here is that this command does not produce errors.
+If there are errors it's most likely due to incorrect `gitlab-consul` database user permissions.
+Check the [Troubleshooting section](#troubleshooting) before proceeding.
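+
+Because only the exit code matters, you can inspect it directly (a small sketch):
+
+```shell
+# Prints 0 on the master node and a non-zero code on standby nodes
+sudo gitlab-ctl repmgr-check-master; echo $?
+```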
+
+### Repmgr failover procedure
+
+By default, if the master database fails, `repmgrd` should promote one of the
+standby nodes to master automatically, and Consul will update PgBouncer with
+the new master.
+
+If you need to failover manually, you have two options:
+
+**Shutdown the current master database**
+
+Run:
+
+```shell
+gitlab-ctl stop postgresql
+```
+
+The automated failover process will see this and failover to one of the
+standby nodes.
+
+**Or perform a manual failover**
+
+1. Ensure the old master node is not still active.
+1. Log in to the server that should become the new master and run:
+
+ ```shell
+ gitlab-ctl repmgr standby promote
+ ```
+
+1. If there are any other standby servers in the cluster, have them follow
+ the new master server:
+
+ ```shell
+ gitlab-ctl repmgr standby follow NEW_MASTER
+ ```
+
+#### Geo secondary site considerations
+
+When a Geo secondary site is replicating from a primary site that uses `repmgr` and `PgBouncer`, [replicating through PgBouncer is not supported](https://github.com/pgbouncer/pgbouncer/issues/382#issuecomment-517911529) and the secondary must replicate directly from the leader node in the `repmgr` cluster. Therefore, when there is a failover in the `repmgr` cluster, you will need to manually re-point your secondary site to replicate from the new leader with:
+
+```shell
+sudo gitlab-ctl replicate-geo-database --host=<new_leader_ip> --replication-slot=<slot_name>
+```
+
+Otherwise, replication will not resume, even if the original node gets re-added as a follower node. Running this command re-syncs your secondary site database, which may take a long time depending on the amount of data to sync. You may also need to run `gitlab-ctl reconfigure` if replication is still not working after re-syncing.
+
+### Repmgr Restore procedure
+
+If a node fails, it can be removed from the cluster, or added back as a standby
+after it has been restored to service.
+
+#### Remove a standby from the cluster
+
+ From any other node in the cluster, run:
+
+ ```shell
+ gitlab-ctl repmgr standby unregister --node=X
+ ```
+
+ where X is the value of node in `repmgr.conf` on the old server.
+
+ To find this, you can use:
+
+ ```shell
+ awk -F = '$1 == "node" { print $2 }' /var/opt/gitlab/postgresql/repmgr.conf
+ ```
+
+ It will output something like:
+
+ ```plaintext
+ 959789412
+ ```
+
+ Then you will use this ID to unregister the node:
+
+ ```shell
+ gitlab-ctl repmgr standby unregister --node=959789412
+ ```
+
+#### Add a node as a standby server
+
+ From the standby node, run:
+
+ ```shell
+ gitlab-ctl repmgr standby follow NEW_MASTER
+ gitlab-ctl restart repmgrd
+ ```
+
+ WARNING:
+ When the server is brought back online, and before
+ you switch it to a standby node, repmgr will report that there are two masters.
+ If there are any clients that are still attempting to write to the old master,
+ this will cause a split, and the old master will need to be resynced from
+ scratch by performing a `gitlab-ctl repmgr standby setup NEW_MASTER`.
+
+#### Add a failed master back into the cluster as a standby node
+
+ Once `repmgrd` and PostgreSQL are running, the node will need to follow the new
+ master as a standby node.
+
+ ```shell
+ gitlab-ctl repmgr standby follow NEW_MASTER
+ ```
+
+ Once the node is following the new master as a standby, the node needs to be
+ [unregistered from the cluster on the new master node](#remove-a-standby-from-the-cluster).
+
+ Once the old master node has been unregistered from the cluster, it will need
+ to be setup as a new standby:
+
+ ```shell
+ gitlab-ctl repmgr standby setup NEW_MASTER
+ ```
+
+ Failure to unregister and re-add the old master node can lead to subsequent failovers
+ not working.
+
+### Alternate configurations
+
+#### Database authorization
+
+By default, we give any host on the database network the permission to perform
+repmgr operations using PostgreSQL's `trust` method. If you do not want this
+level of trust, there are alternatives.
+
+You can trust only the specific nodes that will be database clusters, or you
+can require md5 authentication.
+
+#### Trust specific addresses
+
+If you know the IP address, or FQDN of all database and PgBouncer nodes in the
+cluster, you can trust only those nodes.
+
+In `/etc/gitlab/gitlab.rb` on all of the database nodes, set
+`repmgr['trust_auth_cidr_addresses']` to an array of strings containing all of
+the addresses.
+
+If setting to a node's FQDN, it must have a corresponding PTR record in DNS.
+If setting to a node's IP address, specify it as `XXX.XXX.XXX.XXX/32`.
+
+For example:
+
+```ruby
+repmgr['trust_auth_cidr_addresses'] = %w(192.168.1.44/32 db2.example.com)
+```
+
+#### MD5 Authentication
+
+If you are running on an untrusted network, repmgr can use md5 authentication
+with a [`.pgpass` file](https://www.postgresql.org/docs/11/libpq-pgpass.html)
+to authenticate.
+
+You can specify by IP address, FQDN, or by subnet, using the same format as in
+the previous section:
+
+1. On the current master node, create a password for the `gitlab` and
+ `gitlab_repmgr` user:
+
+ ```shell
+ gitlab-psql -d template1
+ template1=# \password gitlab_repmgr
+ Enter password: ****
+ Confirm password: ****
+ template1=# \password gitlab
+ ```
+
+1. On each database node:
+
+ 1. Edit `/etc/gitlab/gitlab.rb`:
+ 1. Ensure `repmgr['trust_auth_cidr_addresses']` is **not** set
+ 1. Set `postgresql['md5_auth_cidr_addresses']` to the desired value
+ 1. Set `postgresql['sql_replication_user'] = 'gitlab_repmgr'`
+ 1. Reconfigure with `gitlab-ctl reconfigure`
+ 1. Restart PostgreSQL with `gitlab-ctl restart postgresql`
+
+ 1. Create a `.pgpass` file. Enter the `gitlab_repmgr` password twice
+ when asked:
+
+ ```shell
+ gitlab-ctl write-pgpass --user gitlab_repmgr --hostuser gitlab-psql --database '*'
+ ```
+
+1. On each PgBouncer node, edit `/etc/gitlab/gitlab.rb`:
+ 1. Ensure `gitlab_rails['db_password']` is set to the plaintext password for
+ the `gitlab` database user
+ 1. [Reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect
+
+## Troubleshooting
+
+### Consul and PostgreSQL changes not taking effect
+
+Due to the potential impacts, `gitlab-ctl reconfigure` only reloads Consul and PostgreSQL; it does not restart the services. However, not all changes can be activated by reloading.
+
+To restart either service, run `gitlab-ctl restart SERVICE`.
+
+For PostgreSQL, it is usually safe to restart the master node by default. Automatic failover defaults to a 1-minute timeout. Provided the database returns before then, nothing else needs to be done. To be safe, you can stop `repmgrd` on the standby nodes first with `gitlab-ctl stop repmgrd`, then start it afterwards with `gitlab-ctl start repmgrd`.
+
+On the Consul server nodes, it is important to [restart the Consul service](../consul.md#restart-consul) in a controlled manner.
+
+### `gitlab-ctl repmgr-check-master` command produces errors
+
+If this command displays errors about database permissions it is likely that something failed during
+install, resulting in the `gitlab-consul` database user getting incorrect permissions. Follow these
+steps to fix the problem:
+
+1. On the master database node, connect to the database prompt - `gitlab-psql -d template1`
+1. Delete the `gitlab-consul` user - `DROP USER "gitlab-consul";`
+1. Exit the database prompt - `\q`
+1. [Reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) and the user will be re-added with the proper permissions.
+1. Change to the `gitlab-consul` user - `su - gitlab-consul`
+1. Try the check command again - `gitlab-ctl repmgr-check-master`.
+
+Now there should not be errors. If errors still occur then there is another problem.
+
+### PgBouncer error `ERROR: pgbouncer cannot connect to server`
+
+You may get this error when running `gitlab-rake gitlab:db:configure` or you
+may see the error in the PgBouncer log file.
+
+```plaintext
+PG::ConnectionBad: ERROR: pgbouncer cannot connect to server
+```
+
+The problem may be that your PgBouncer node's IP address is not included in the
+`trust_auth_cidr_addresses` setting in `/etc/gitlab/gitlab.rb` on the database nodes.
+
+You can confirm that this is the issue by checking the PostgreSQL log on the master
+database node. If you see the following error, then `trust_auth_cidr_addresses`
+is the problem.
+
+```plaintext
+2018-03-29_13:59:12.11776 FATAL: no pg_hba.conf entry for host "123.123.123.123", user "pgbouncer", database "gitlabhq_production", SSL off
+```
+
+To fix the problem, add the IP address to `/etc/gitlab/gitlab.rb`.
+
+```ruby
+postgresql['trust_auth_cidr_addresses'] = %w(123.123.123.123/32 <other_cidrs>)
+```
+
+[Reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect.
+
+### Issues with other components
+
+If you're running into an issue with a component not outlined here, be sure to check the troubleshooting section of their specific documentation page:
+
+- [Consul](../consul.md#troubleshooting-consul)
+- [PostgreSQL](https://docs.gitlab.com/omnibus/settings/database.html#troubleshooting)