summaryrefslogtreecommitdiff
path: root/doc/administration/geo/replication/external_database.md
diff options
context:
space:
mode:
Diffstat (limited to 'doc/administration/geo/replication/external_database.md')
-rw-r--r--doc/administration/geo/replication/external_database.md219
1 files changed, 219 insertions, 0 deletions
diff --git a/doc/administration/geo/replication/external_database.md b/doc/administration/geo/replication/external_database.md
new file mode 100644
index 00000000000..dae5ed911b0
--- /dev/null
+++ b/doc/administration/geo/replication/external_database.md
@@ -0,0 +1,219 @@
+# Geo with external PostgreSQL instances **[PREMIUM ONLY]**
+
+This document is relevant if you are using a PostgreSQL instance that is *not
+managed by Omnibus*. This includes cloud-managed instances like AWS RDS, or
+manually installed and configured PostgreSQL instances.
+
+NOTE: **Note**:
+We strongly recommend running Omnibus-managed instances as they are actively
+developed and tested. We aim to be compatible with most external
+(not managed by Omnibus) databases but we do not guarantee compatibility.
+
+## **Primary** node
+
+1. SSH into a GitLab **primary** application server and login as root:
+
+ ```sh
+ sudo -i
+ ```
+
+1. Execute the command below to define the node as **primary** node:
+
+ ```sh
+ gitlab-ctl set-geo-primary-node
+ ```
+
+ This command will use your defined `external_url` in `/etc/gitlab/gitlab.rb`.
+
+
+### Configure the external database to be replicated
+
+To set up an external database, you can either:
+
+- Set up streaming replication yourself (for example, in AWS RDS).
+- Perform the Omnibus configuration manually as follows.
+
+#### Leverage your cloud provider's tools to replicate the primary database
+
+Given you have a primary node set up on AWS EC2 that uses RDS.
+You can now just create a read-only replica in a different region and the
+replication process will be managed by AWS. Make sure you've set Network ACL, Subnet, and
+Security Group according to your needs, so the secondary application node can access the database.
+Skip to the [Configure secondary application node](#configure-secondary-application-nodes-to-use-the-external-read-replica) section below.
+
+#### Manually configure the primary database for replication
+
+The [geo_primary_role](https://docs.gitlab.com/omnibus/roles/#gitlab-geo-roles)
+configures the **primary** node's database to be replicated by making changes to
+`pg_hba.conf` and `postgresql.conf`. Make the following configuration changes
+manually to your external database configuration:
+
+```
+##
+## Geo Primary Role
+## - pg_hba.conf
+##
+host replication gitlab_replicator <trusted secondary IP>/32 md5
+```
+
+```
+##
+## Geo Primary Role
+## - postgresql.conf
+##
+sql_replication_user = gitlab_replicator
+wal_level = hot_standby
+max_wal_senders = 10
+wal_keep_segments = 50
+max_replication_slots = 1 # number of secondary instances
+hot_standby = on
+```
+
+## **Secondary** nodes
+
+### Manually configure the replica database
+
+Make the following configuration changes manually to your `postgresql.conf`
+of external replica database:
+
+```
+##
+## Geo Secondary Role
+## - postgresql.conf
+##
+wal_level = hot_standby
+max_wal_senders = 10
+wal_keep_segments = 10
+hot_standby = on
+```
+
+### Configure **secondary** application nodes to use the external read-replica
+
+With Omnibus, the
+[geo_secondary_role](https://docs.gitlab.com/omnibus/roles/#gitlab-geo-roles)
+has three main functions:
+
+1. Configure the replica database.
+1. Configure the tracking database.
+1. Enable the [Geo Log Cursor](index.md#geo-log-cursor) (not covered in this section).
+
+To configure the connection to the external read-replica database and enable Log Cursor:
+
+1. SSH into a GitLab **secondary** application server and login as root:
+
+ ```bash
+ sudo -i
+ ```
+
+1. Edit `/etc/gitlab/gitlab.rb` and add the following
+
+ ```ruby
+ ##
+ ## Geo Secondary role
+ ## - configure dependent flags automatically to enable Geo
+ ##
+ roles ['geo_secondary_role']
+
+ # note this is shared between both databases,
+ # make sure you define the same password in both
+ gitlab_rails['db_password'] = '<your_password_here>'
+
+ gitlab_rails['db_username'] = 'gitlab'
+ gitlab_rails['db_host'] = '<database_read_replica_host>'
+ ```
+1. Save the file and [reconfigure GitLab](../../restart_gitlab.md#omnibus-gitlab-reconfigure)
+
+### Configure the tracking database
+
+**Secondary** nodes use a separate PostgreSQL installation as a tracking
+database to keep track of replication status and automatically recover from
+potential replication issues. Omnibus automatically configures a tracking database
+when `roles ['geo_secondary_role']` is set. For high availability,
+refer to [Geo High Availability](https://docs.gitlab.com/ee/administration/high_availability).
+If you want to run this database external to Omnibus, please follow the instructions below.
+
+The tracking database requires an [FDW](https://www.postgresql.org/docs/9.6/static/postgres-fdw.html)
+connection with the **secondary** replica database for improved performance.
+
+If you have an external database ready to be used as the tracking database,
+follow the instructions below to use it:
+
+NOTE: **Note:**
+If you want to use AWS RDS as a tracking database, make sure it has access to
+the secondary database. Unfortunately, just assigning the same security group is not enough as
+outbound rules do not apply to RDS PostgreSQL databases. Therefore, you need to explicitly add an inbound
+rule to the read-replica's security group allowing any TCP traffic from
+the tracking database on port 5432.
+
+1. SSH into a GitLab **secondary** server and login as root:
+
+ ```bash
+ sudo -i
+ ```
+
+1. Edit `/etc/gitlab/gitlab.rb` with the connection params and credentials for
+ the machine with the PostgreSQL instance:
+
+ ```ruby
+ geo_secondary['db_username'] = 'gitlab_geo'
+ geo_secondary['db_password'] = '<your_password_here>'
+
+ geo_secondary['db_host'] = '<tracking_database_host>'
+ geo_secondary['db_port'] = <tracking_database_port> # change to the correct port
+ geo_secondary['db_fdw'] = true # enable FDW
+ geo_postgresql['enable'] = false # don't use internal managed instance
+ ```
+
+1. Save the file and [reconfigure GitLab](../../restart_gitlab.md#omnibus-gitlab-reconfigure)
+
+1. Run the tracking database migrations:
+
+ ```bash
+ gitlab-rake geo:db:create
+ gitlab-rake geo:db:migrate
+ ```
+
+1. Configure the
+ [PostgreSQL FDW](https://www.postgresql.org/docs/9.6/static/postgres-fdw.html)
+ connection and credentials:
+
+ Save the script below in a file, ex. `/tmp/geo_fdw.sh` and modify the connection
+ params to match your environment. Execute it to set up the FDW connection.
+
+ ```bash
+ #!/bin/bash
+
+ # Secondary Database connection params:
+ DB_HOST="<public_ip_or_vpc_private_ip>"
+ DB_NAME="gitlabhq_production"
+ DB_USER="gitlab"
+ DB_PASS="<your_password_here>"
+ DB_PORT="5432"
+
+ # Tracking Database connection params:
+ GEO_DB_HOST="<public_ip_or_vpc_private_ip>"
+ GEO_DB_NAME="gitlabhq_geo_production"
+ GEO_DB_USER="gitlab_geo"
+ GEO_DB_PORT="5432"
+
+ query_exec () {
+ gitlab-psql -h $GEO_DB_HOST -d $GEO_DB_NAME -p $GEO_DB_PORT -c "${1}"
+ }
+
+ query_exec "CREATE EXTENSION postgres_fdw;"
+ query_exec "CREATE SERVER gitlab_secondary FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host '${DB_HOST}', dbname '${DB_NAME}', port '${DB_PORT}');"
+ query_exec "CREATE USER MAPPING FOR ${GEO_DB_USER} SERVER gitlab_secondary OPTIONS (user '${DB_USER}', password '${DB_PASS}');"
+ query_exec "CREATE SCHEMA gitlab_secondary;"
+ query_exec "GRANT USAGE ON FOREIGN SERVER gitlab_secondary TO ${GEO_DB_USER};"
+ ```
+
+ NOTE: **Note:** The script template above uses `gitlab-psql` as it's intended to be executed from the Geo machine,
+ but you can change it to `psql` and run it from any machine that has access to the database. We also recommend using
+ `psql` for AWS RDS.
+
+1. Save the file and [restart GitLab](../../restart_gitlab.md#omnibus-gitlab-restart)
+1. Populate the FDW tables:
+
+ ```bash
+ gitlab-rake geo:db:refresh_foreign_tables
+ ```