summaryrefslogtreecommitdiff
path: root/doc/administration/geo/replication/database_source.md
diff options
context:
space:
mode:
Diffstat (limited to 'doc/administration/geo/replication/database_source.md')
-rw-r--r--doc/administration/geo/replication/database_source.md439
1 files changed, 439 insertions, 0 deletions
diff --git a/doc/administration/geo/replication/database_source.md b/doc/administration/geo/replication/database_source.md
new file mode 100644
index 00000000000..67cf8b6535f
--- /dev/null
+++ b/doc/administration/geo/replication/database_source.md
@@ -0,0 +1,439 @@
+# Geo database replication (source) **[PREMIUM ONLY]**
+
+NOTE: **Note:**
+This documentation applies to GitLab source installations. In GitLab 11.5, this documentation was deprecated and will be removed in a future release.
+Please consider [migrating to GitLab Omnibus install](https://docs.gitlab.com/omnibus/update/convert_to_omnibus.html). For installations
+using the Omnibus GitLab packages, follow the
+[**database replication for Omnibus GitLab**][database] guide.
+
+NOTE: **Note:**
+The stages of the setup process must be completed in the documented order.
+Before attempting the steps in this stage, [complete all prior stages](index.md#using-gitlab-installed-from-source-deprecated).
+
+This document describes the minimal steps you have to take in order to
+replicate your **primary** GitLab database to a **secondary** node's database. You may
+have to change some values according to your database setup, how big it is, etc.
+
+You are encouraged to first read through all the steps before executing them
+in your testing/production environment.
+
+## PostgreSQL replication
+
+The GitLab **primary** node where the write operations happen will connect to
+**primary** database server, and the **secondary** ones which are read-only will
+connect to **secondary** database servers (which are read-only too).
+
+NOTE: **Note:**
+In many databases' documentation, you will see "**primary**" being referenced as "master"
+and "**secondary**" as either "slave" or "standby" server (read-only).
+
+We recommend using [PostgreSQL replication slots][replication-slots-article]
+to ensure the **primary** node retains all the data necessary for the secondaries to
+recover. See below for more details.
+
+The following guide assumes that:
+
+- You are using PostgreSQL 9.6 or later which includes the
+ [`pg_basebackup` tool][pgback] and improved [Foreign Data Wrapper][FDW] support.
+- You have a **primary** node already set up (the GitLab server you are
+ replicating from), running PostgreSQL 9.6 or later, and
+ you have a new **secondary** server set up with the same versions of the OS,
+ PostgreSQL, and GitLab on all nodes.
+- The IP of the **primary** server for our examples is `198.51.100.1`, whereas the
+ **secondary** node's IP is `198.51.100.2`. Note that the **primary** and **secondary** servers
+ **must** be able to communicate over these addresses. These IP addresses can either
+ be public or private.
+
+CAUTION: **Warning:**
+Geo works with streaming replication. Logical replication is not supported at this time.
+There is an [issue where support is being discussed](https://gitlab.com/gitlab-org/gitlab-ee/issues/7420).
+
+### Step 1. Configure the **primary** server
+
+1. SSH into your GitLab **primary** server and login as root:
+
+ ```sh
+ sudo -i
+ ```
+
+1. Add this node as the Geo **primary** by running:
+
+ ```sh
+ bundle exec rake geo:set_primary_node
+ ```
+
+1. Create a [replication user] named `gitlab_replicator`:
+
+ ```sql
+ --- Create a new user 'replicator'
+ CREATE USER gitlab_replicator;
+
+ --- Set/change a password and grants replication privilege
+ ALTER USER gitlab_replicator WITH REPLICATION ENCRYPTED PASSWORD '<replication_password>';
+ ```
+
+1. Make sure your the `gitlab` database user has a password defined:
+
+ ```sh
+ sudo \
+ -u postgres psql \
+ -d template1 \
+ -c "ALTER USER gitlab WITH ENCRYPTED PASSWORD '<database_password>';"
+ ```
+
+1. Edit the content of `database.yml` in `production:` and add the password like the example below:
+
+ ```yaml
+ #
+ # PRODUCTION
+ #
+ production:
+ adapter: postgresql
+ encoding: unicode
+ database: gitlabhq_production
+ pool: 10
+ username: gitlab
+ password: <database_password>
+ host: /var/opt/gitlab/geo-postgresql
+ ```
+
+1. Set up TLS support for the PostgreSQL **primary** server:
+
+ CAUTION: **Warning**:
+ Only skip this step if you **know** that PostgreSQL traffic
+ between the **primary** and **secondary** nodes will be secured through some other
+ means, e.g., a known-safe physical network path or a site-to-site VPN that
+ you have configured.
+
+ If you are replicating your database across the open Internet, it is
+ **essential** that the connection is TLS-secured. Correctly configured, this
+ provides protection against both passive eavesdroppers and active
+ "man-in-the-middle" attackers.
+
+ To generate a self-signed certificate and key, run this command:
+
+ ```sh
+ openssl req \
+ -nodes \
+ -batch \
+ -x509 \
+ -newkey rsa:4096 \
+ -keyout server.key \
+ -out server.crt \
+ -days 3650
+ ```
+
+ This will create two files - `server.key` and `server.crt` - that you can
+ use for authentication.
+
+ Copy them to the correct location for your PostgreSQL installation:
+
+ ```sh
+ # Copying a self-signed certificate and key
+ install -o postgres -g postgres -m 0400 -T server.crt ~postgres/9.x/main/data/server.crt
+ install -o postgres -g postgres -m 0400 -T server.key ~postgres/9.x/main/data/server.key
+ ```
+
+ Add this configuration to `postgresql.conf`, removing any existing
+ configuration for `ssl_cert_file` or `ssl_key_file`:
+
+ ```
+ ssl = on
+ ssl_cert_file='server.crt'
+ ssl_key_file='server.key'
+ ```
+
+1. Edit `postgresql.conf` to configure the **primary** server for streaming replication
+ (for Debian/Ubuntu that would be `/etc/postgresql/9.x/main/postgresql.conf`):
+
+ ```
+ listen_address = '<primary_node_ip>'
+ wal_level = hot_standby
+ max_wal_senders = 5
+ min_wal_size = 80MB
+ max_wal_size = 1GB
+ max_replicaton_slots = 1 # Number of Geo secondary nodes
+ wal_keep_segments = 10
+ hot_standby = on
+ ```
+
+ NOTE: **Note**:
+ Be sure to set `max_replication_slots` to the number of Geo **secondary**
+ nodes that you may potentially have (at least 1).
+
+ For security reasons, PostgreSQL by default only listens on the local
+ interface (e.g. 127.0.0.1). However, Geo needs to communicate
+ between the **primary** and **secondary** nodes over a common network, such as a
+ corporate LAN or the public Internet. For this reason, we need to
+ configure PostgreSQL to listen on more interfaces.
+
+ The `listen_address` option opens PostgreSQL up to external connections
+ with the interface corresponding to the given IP. See [the PostgreSQL
+ documentation][pg-docs-runtime-conn] for more details.
+
+ You may also want to edit the `wal_keep_segments` and `max_wal_senders` to
+ match your database replication requirements. Consult the
+ [PostgreSQL - Replication documentation][pg-docs-runtime-replication] for more information.
+
+1. Set the access control on the **primary** node to allow TCP connections using the
+ server's public IP and set the connection from the **secondary** node to require a
+ password. Edit `pg_hba.conf` (for Debian/Ubuntu that would be
+ `/etc/postgresql/9.x/main/pg_hba.conf`):
+
+ ```sh
+ host all all <primary_node_ip>/32 md5
+ host replication gitlab_replicator <secondary_node_ip>/32 md5
+ ```
+
+ If you want to add another secondary, add one more row like the replication
+ one and change the IP address:
+
+ ```sh
+ host all all <primary_node_ip>/32 md5
+ host replication gitlab_replicator <secondary_node_ip>/32 md5
+ host replication gitlab_replicator <another_secondary_node_ip>/32 md5
+ ```
+
+1. Restart PostgreSQL for the changes to take effect.
+
+1. Choose a database-friendly name to use for your secondary to use as the
+ replication slot name. For example, if your domain is
+ `secondary.geo.example.com`, you may use `secondary_example` as the slot
+ name.
+
+1. Create the replication slot on the **primary** node:
+
+ ```sh
+ $ sudo -u postgres psql -c "SELECT * FROM pg_create_physical_replication_slot('secondary_example');"
+ slot_name | xlog_position
+ ------------------+---------------
+ secondary_example |
+ (1 row)
+ ```
+
+1. Now that the PostgreSQL server is set up to accept remote connections, run
+ `netstat -plnt` to make sure that PostgreSQL is listening to the server's
+ public IP.
+
+### Step 2. Configure the secondary server
+
+Follow the first steps in ["configure the secondary server"][database-replication] and note that since you are installing from source, the username and
+group listed as `gitlab-psql` in those steps should be replaced by `postgres`
+instead. After completing the "Test that the `gitlab-psql` user can connect to
+the **primary** node's database" step, continue here:
+
+1. Edit `postgresql.conf` to configure the secondary for streaming replication
+ (for Debian/Ubuntu that would be `/etc/postgresql/9.*/main/postgresql.conf`):
+
+ ```sh
+ wal_level = hot_standby
+ max_wal_senders = 5
+ checkpoint_segments = 10
+ wal_keep_segments = 10
+ hot_standby = on
+ ```
+
+1. Restart PostgreSQL for the changes to take effect.
+
+#### Enable tracking database on the secondary server
+
+Geo secondary nodes use a tracking database to keep track of replication status
+and recover automatically from some replication issues. Follow the steps below to create
+the tracking database.
+
+1. On the secondary node, run the following command to create `database_geo.yml` with the
+ information of your secondary PostgreSQL instance:
+
+ ```sh
+ sudo cp /home/git/gitlab/config/database_geo.yml.postgresql /home/git/gitlab/config/database_geo.yml
+ ```
+
+1. Edit the content of `database_geo.yml` in `production:` as in the example below:
+
+ ```yaml
+ #
+ # PRODUCTION
+ #
+ production:
+ adapter: postgresql
+ encoding: unicode
+ database: gitlabhq_geo_production
+ pool: 10
+ username: gitlab_geo
+ # password:
+ host: /var/opt/gitlab/geo-postgresql
+ ```
+
+1. Create the database `gitlabhq_geo_production` on the PostgreSQL instance of the **secondary** node.
+
+1. Set up the Geo tracking database:
+
+ ```sh
+ bundle exec rake geo:db:migrate
+ ```
+
+1. Configure the [PostgreSQL FDW][FDW] connection and credentials:
+
+ Save the script below in a file, ex. `/tmp/geo_fdw.sh` and modify the connection
+ params to match your environment. Execute it to set up the FDW connection.
+
+ ```sh
+ #!/bin/bash
+
+ # Secondary Database connection params:
+ DB_HOST="/var/opt/gitlab/postgresql" # change to the public IP or VPC private IP if its an external server
+ DB_NAME="gitlabhq_production"
+ DB_USER="gitlab"
+ DB_PORT="5432"
+
+ # Tracking Database connection params:
+ GEO_DB_HOST="/var/opt/gitlab/geo-postgresql" # change to the public IP or VPC private IP if its an external server
+ GEO_DB_NAME="gitlabhq_geo_production"
+ GEO_DB_USER="gitlab_geo"
+ GEO_DB_PORT="5432"
+
+ query_exec () {
+ gitlab-psql -h $GEO_DB_HOST -d $GEO_DB_NAME -p $GEO_DB_PORT -c "${1}"
+ }
+
+ query_exec "CREATE EXTENSION postgres_fdw;"
+ query_exec "CREATE SERVER gitlab_secondary FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host '${DB_HOST}', dbname '${DB_NAME}', port '${DB_PORT}');"
+ query_exec "CREATE USER MAPPING FOR ${GEO_DB_USER} SERVER gitlab_secondary OPTIONS (user '${DB_USER}');"
+ query_exec "CREATE SCHEMA gitlab_secondary;"
+ query_exec "GRANT USAGE ON FOREIGN SERVER gitlab_secondary TO ${GEO_DB_USER};"
+ ```
+
+ And edit the content of `database_geo.yml` and to add `fdw: true` to
+ the `production:` block.
+
+### Step 3. Initiate the replication process
+
+Below we provide a script that connects the database on the **secondary** node to
+the database on the **primary** node, replicates the database, and creates the
+needed files for streaming replication.
+
+The directories used are the defaults for Debian/Ubuntu. If you have changed
+any defaults, configure it as you see fit replacing the directories and paths.
+
+CAUTION: **Warning:**
+Make sure to run this on the **secondary** server as it removes all PostgreSQL's
+data before running `pg_basebackup`.
+
+1. SSH into your GitLab **secondary** server and login as root:
+
+ ```sh
+ sudo -i
+ ```
+
+1. Save the snippet below in a file, let's say `/tmp/replica.sh`. Modify the
+ embedded paths if necessary:
+
+ ```
+ #!/bin/bash
+
+ PORT="5432"
+ USER="gitlab_replicator"
+ echo ---------------------------------------------------------------
+ echo WARNING: Make sure this script is run from the secondary server
+ echo ---------------------------------------------------------------
+ echo
+ echo Enter the IP or FQDN of the primary PostgreSQL server
+ read HOST
+ echo Enter the password for $USER@$HOST
+ read -s PASSWORD
+ echo Enter the required sslmode
+ read SSLMODE
+
+ echo Stopping PostgreSQL and all GitLab services
+ sudo service gitlab stop
+ sudo service postgresql stop
+
+ echo Backing up postgresql.conf
+ sudo -u postgres mv /var/opt/gitlab/postgresql/data/postgresql.conf /var/opt/gitlab/postgresql/
+
+ echo Cleaning up old cluster directory
+ sudo -u postgres rm -rf /var/opt/gitlab/postgresql/data
+
+ echo Starting base backup as the replicator user
+ echo Enter the password for $USER@$HOST
+ sudo -u postgres /opt/gitlab/embedded/bin/pg_basebackup -h $HOST -D /var/opt/gitlab/postgresql/data -U gitlab_replicator -v -x -P
+
+ echo Writing recovery.conf file
+ sudo -u postgres bash -c "cat > /var/opt/gitlab/postgresql/data/recovery.conf <<- _EOF1_
+ standby_mode = 'on'
+ primary_conninfo = 'host=$HOST port=$PORT user=$USER password=$PASSWORD sslmode=$SSLMODE'
+ _EOF1_
+ "
+
+ echo Restoring postgresql.conf
+ sudo -u postgres mv /var/opt/gitlab/postgresql/postgresql.conf /var/opt/gitlab/postgresql/data/
+
+ echo Starting PostgreSQL
+ sudo service postgresql start
+ ```
+
+1. Run it with:
+
+ ```sh
+ bash /tmp/replica.sh
+ ```
+
+ When prompted, enter the IP/FQDN of the **primary** node, and the password you set up
+ for the `gitlab_replicator` user in the first step.
+
+ You should use `verify-ca` for the `sslmode`. You can use `disable` if you
+ are happy to skip PostgreSQL TLS authentication altogether (e.g., you know
+ the network path is secure, or you are using a site-to-site VPN). This is
+ **not** safe over the public Internet!
+
+ You can read more details about each `sslmode` in the
+ [PostgreSQL documentation][pg-docs-ssl];
+ the instructions above are carefully written to ensure protection against
+ both passive eavesdroppers and active "man-in-the-middle" attackers.
+
+The replication process is now over.
+
+## PGBouncer support (optional)
+
+1. First, enter the PostgreSQL console as an admin user.
+
+1. Then create the read-only user:
+
+ ```sql
+ -- NOTE: Use the password defined earlier
+ CREATE USER gitlab_geo_fdw WITH password '<your_password_here>';
+ GRANT CONNECT ON DATABASE gitlabhq_production to gitlab_geo_fdw;
+ GRANT USAGE ON SCHEMA public TO gitlab_geo_fdw;
+ GRANT SELECT ON ALL TABLES IN SCHEMA public TO gitlab_geo_fdw;
+ GRANT SELECT ON ALL SEQUENCES IN SCHEMA public TO gitlab_geo_fdw;
+
+ -- Tables created by "gitlab" should be made read-only for "gitlab_geo_fdw"
+ -- automatically.
+ ALTER DEFAULT PRIVILEGES FOR USER gitlab IN SCHEMA public GRANT SELECT ON TABLES TO gitlab_geo_fdw;
+ ALTER DEFAULT PRIVILEGES FOR USER gitlab IN SCHEMA public GRANT SELECT ON SEQUENCES TO gitlab_geo_fdw;
+ ```
+
+1. Enter the PostgreSQL console on the **secondary** tracking database and change the user mapping to this new user:
+
+ ```
+ ALTER USER MAPPING FOR gitlab_geo SERVER gitlab_secondary OPTIONS (SET user 'gitlab_geo_fdw')
+ ```
+
+## MySQL replication
+
+MySQL replication is not supported for Geo.
+
+## Troubleshooting
+
+Read the [troubleshooting document](troubleshooting.md).
+
+[replication-slots-article]: https://medium.com/@tk512/replication-slots-in-postgresql-b4b03d277c75
+[pgback]: http://www.postgresql.org/docs/9.6/static/app-pgbasebackup.html
+[replication user]:https://wiki.postgresql.org/wiki/Streaming_Replication
+[FDW]: https://www.postgresql.org/docs/9.6/static/postgres-fdw.html
+[database]: database.md
+[add-geo-node]: configuration.md#step-3-add-the-secondary-gitlab-node
+[database-replication]: database.md#step-2-configure-the-secondary-server
+[pg-docs-ssl]: https://www.postgresql.org/docs/9.6/static/libpq-ssl.html#LIBPQ-SSL-PROTECTION
+[pg-docs-runtime-conn]: https://www.postgresql.org/docs/9.6/static/runtime-config-connection.html
+[pg-docs-runtime-replication]: https://www.postgresql.org/docs/9.6/static/runtime-config-replication.html