diff options
Diffstat (limited to 'doc/administration/geo/replication/database_source.md')
-rw-r--r-- | doc/administration/geo/replication/database_source.md | 439 |
1 files changed, 0 insertions, 439 deletions
diff --git a/doc/administration/geo/replication/database_source.md b/doc/administration/geo/replication/database_source.md deleted file mode 100644 index 67cf8b6535f..00000000000 --- a/doc/administration/geo/replication/database_source.md +++ /dev/null @@ -1,439 +0,0 @@ -# Geo database replication (source) **[PREMIUM ONLY]** - -NOTE: **Note:** -This documentation applies to GitLab source installations. In GitLab 11.5, this documentation was deprecated and will be removed in a future release. -Please consider [migrating to GitLab Omnibus install](https://docs.gitlab.com/omnibus/update/convert_to_omnibus.html). For installations -using the Omnibus GitLab packages, follow the -[**database replication for Omnibus GitLab**][database] guide. - -NOTE: **Note:** -The stages of the setup process must be completed in the documented order. -Before attempting the steps in this stage, [complete all prior stages](index.md#using-gitlab-installed-from-source-deprecated). - -This document describes the minimal steps you have to take in order to -replicate your **primary** GitLab database to a **secondary** node's database. You may -have to change some values according to your database setup, how big it is, etc. - -You are encouraged to first read through all the steps before executing them -in your testing/production environment. - -## PostgreSQL replication - -The GitLab **primary** node where the write operations happen will connect to -**primary** database server, and the **secondary** ones which are read-only will -connect to **secondary** database servers (which are read-only too). - -NOTE: **Note:** -In many databases' documentation, you will see "**primary**" being referenced as "master" -and "**secondary**" as either "slave" or "standby" server (read-only). - -We recommend using [PostgreSQL replication slots][replication-slots-article] -to ensure the **primary** node retains all the data necessary for the secondaries to -recover. See below for more details. - -The following guide assumes that: - -- You are using PostgreSQL 9.6 or later which includes the - [`pg_basebackup` tool][pgback] and improved [Foreign Data Wrapper][FDW] support. -- You have a **primary** node already set up (the GitLab server you are - replicating from), running PostgreSQL 9.6 or later, and - you have a new **secondary** server set up with the same versions of the OS, - PostgreSQL, and GitLab on all nodes. -- The IP of the **primary** server for our examples is `198.51.100.1`, whereas the - **secondary** node's IP is `198.51.100.2`. Note that the **primary** and **secondary** servers - **must** be able to communicate over these addresses. These IP addresses can either - be public or private. - -CAUTION: **Warning:** -Geo works with streaming replication. Logical replication is not supported at this time. -There is an [issue where support is being discussed](https://gitlab.com/gitlab-org/gitlab-ee/issues/7420). - -### Step 1. Configure the **primary** server - -1. SSH into your GitLab **primary** server and login as root: - - ```sh - sudo -i - ``` - -1. Add this node as the Geo **primary** by running: - - ```sh - bundle exec rake geo:set_primary_node - ``` - -1. Create a [replication user] named `gitlab_replicator`: - - ```sql - --- Create a new user 'replicator' - CREATE USER gitlab_replicator; - - --- Set/change a password and grants replication privilege - ALTER USER gitlab_replicator WITH REPLICATION ENCRYPTED PASSWORD '<replication_password>'; - ``` - -1. Make sure your the `gitlab` database user has a password defined: - - ```sh - sudo \ - -u postgres psql \ - -d template1 \ - -c "ALTER USER gitlab WITH ENCRYPTED PASSWORD '<database_password>';" - ``` - -1. Edit the content of `database.yml` in `production:` and add the password like the example below: - - ```yaml - # - # PRODUCTION - # - production: - adapter: postgresql - encoding: unicode - database: gitlabhq_production - pool: 10 - username: gitlab - password: <database_password> - host: /var/opt/gitlab/geo-postgresql - ``` - -1. Set up TLS support for the PostgreSQL **primary** server: - - CAUTION: **Warning**: - Only skip this step if you **know** that PostgreSQL traffic - between the **primary** and **secondary** nodes will be secured through some other - means, e.g., a known-safe physical network path or a site-to-site VPN that - you have configured. - - If you are replicating your database across the open Internet, it is - **essential** that the connection is TLS-secured. Correctly configured, this - provides protection against both passive eavesdroppers and active - "man-in-the-middle" attackers. - - To generate a self-signed certificate and key, run this command: - - ```sh - openssl req \ - -nodes \ - -batch \ - -x509 \ - -newkey rsa:4096 \ - -keyout server.key \ - -out server.crt \ - -days 3650 - ``` - - This will create two files - `server.key` and `server.crt` - that you can - use for authentication. - - Copy them to the correct location for your PostgreSQL installation: - - ```sh - # Copying a self-signed certificate and key - install -o postgres -g postgres -m 0400 -T server.crt ~postgres/9.x/main/data/server.crt - install -o postgres -g postgres -m 0400 -T server.key ~postgres/9.x/main/data/server.key - ``` - - Add this configuration to `postgresql.conf`, removing any existing - configuration for `ssl_cert_file` or `ssl_key_file`: - - ``` - ssl = on - ssl_cert_file='server.crt' - ssl_key_file='server.key' - ``` - -1. Edit `postgresql.conf` to configure the **primary** server for streaming replication - (for Debian/Ubuntu that would be `/etc/postgresql/9.x/main/postgresql.conf`): - - ``` - listen_address = '<primary_node_ip>' - wal_level = hot_standby - max_wal_senders = 5 - min_wal_size = 80MB - max_wal_size = 1GB - max_replicaton_slots = 1 # Number of Geo secondary nodes - wal_keep_segments = 10 - hot_standby = on - ``` - - NOTE: **Note**: - Be sure to set `max_replication_slots` to the number of Geo **secondary** - nodes that you may potentially have (at least 1). - - For security reasons, PostgreSQL by default only listens on the local - interface (e.g. 127.0.0.1). However, Geo needs to communicate - between the **primary** and **secondary** nodes over a common network, such as a - corporate LAN or the public Internet. For this reason, we need to - configure PostgreSQL to listen on more interfaces. - - The `listen_address` option opens PostgreSQL up to external connections - with the interface corresponding to the given IP. See [the PostgreSQL - documentation][pg-docs-runtime-conn] for more details. - - You may also want to edit the `wal_keep_segments` and `max_wal_senders` to - match your database replication requirements. Consult the - [PostgreSQL - Replication documentation][pg-docs-runtime-replication] for more information. - -1. Set the access control on the **primary** node to allow TCP connections using the - server's public IP and set the connection from the **secondary** node to require a - password. Edit `pg_hba.conf` (for Debian/Ubuntu that would be - `/etc/postgresql/9.x/main/pg_hba.conf`): - - ```sh - host all all <primary_node_ip>/32 md5 - host replication gitlab_replicator <secondary_node_ip>/32 md5 - ``` - - If you want to add another secondary, add one more row like the replication - one and change the IP address: - - ```sh - host all all <primary_node_ip>/32 md5 - host replication gitlab_replicator <secondary_node_ip>/32 md5 - host replication gitlab_replicator <another_secondary_node_ip>/32 md5 - ``` - -1. Restart PostgreSQL for the changes to take effect. - -1. Choose a database-friendly name to use for your secondary to use as the - replication slot name. For example, if your domain is - `secondary.geo.example.com`, you may use `secondary_example` as the slot - name. - -1. Create the replication slot on the **primary** node: - - ```sh - $ sudo -u postgres psql -c "SELECT * FROM pg_create_physical_replication_slot('secondary_example');" - slot_name | xlog_position - ------------------+--------------- - secondary_example | - (1 row) - ``` - -1. Now that the PostgreSQL server is set up to accept remote connections, run - `netstat -plnt` to make sure that PostgreSQL is listening to the server's - public IP. - -### Step 2. Configure the secondary server - -Follow the first steps in ["configure the secondary server"][database-replication] and note that since you are installing from source, the username and -group listed as `gitlab-psql` in those steps should be replaced by `postgres` -instead. After completing the "Test that the `gitlab-psql` user can connect to -the **primary** node's database" step, continue here: - -1. Edit `postgresql.conf` to configure the secondary for streaming replication - (for Debian/Ubuntu that would be `/etc/postgresql/9.*/main/postgresql.conf`): - - ```sh - wal_level = hot_standby - max_wal_senders = 5 - checkpoint_segments = 10 - wal_keep_segments = 10 - hot_standby = on - ``` - -1. Restart PostgreSQL for the changes to take effect. - -#### Enable tracking database on the secondary server - -Geo secondary nodes use a tracking database to keep track of replication status -and recover automatically from some replication issues. Follow the steps below to create -the tracking database. - -1. On the secondary node, run the following command to create `database_geo.yml` with the - information of your secondary PostgreSQL instance: - - ```sh - sudo cp /home/git/gitlab/config/database_geo.yml.postgresql /home/git/gitlab/config/database_geo.yml - ``` - -1. Edit the content of `database_geo.yml` in `production:` as in the example below: - - ```yaml - # - # PRODUCTION - # - production: - adapter: postgresql - encoding: unicode - database: gitlabhq_geo_production - pool: 10 - username: gitlab_geo - # password: - host: /var/opt/gitlab/geo-postgresql - ``` - -1. Create the database `gitlabhq_geo_production` on the PostgreSQL instance of the **secondary** node. - -1. Set up the Geo tracking database: - - ```sh - bundle exec rake geo:db:migrate - ``` - -1. Configure the [PostgreSQL FDW][FDW] connection and credentials: - - Save the script below in a file, ex. `/tmp/geo_fdw.sh` and modify the connection - params to match your environment. Execute it to set up the FDW connection. - - ```sh - #!/bin/bash - - # Secondary Database connection params: - DB_HOST="/var/opt/gitlab/postgresql" # change to the public IP or VPC private IP if its an external server - DB_NAME="gitlabhq_production" - DB_USER="gitlab" - DB_PORT="5432" - - # Tracking Database connection params: - GEO_DB_HOST="/var/opt/gitlab/geo-postgresql" # change to the public IP or VPC private IP if its an external server - GEO_DB_NAME="gitlabhq_geo_production" - GEO_DB_USER="gitlab_geo" - GEO_DB_PORT="5432" - - query_exec () { - gitlab-psql -h $GEO_DB_HOST -d $GEO_DB_NAME -p $GEO_DB_PORT -c "${1}" - } - - query_exec "CREATE EXTENSION postgres_fdw;" - query_exec "CREATE SERVER gitlab_secondary FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host '${DB_HOST}', dbname '${DB_NAME}', port '${DB_PORT}');" - query_exec "CREATE USER MAPPING FOR ${GEO_DB_USER} SERVER gitlab_secondary OPTIONS (user '${DB_USER}');" - query_exec "CREATE SCHEMA gitlab_secondary;" - query_exec "GRANT USAGE ON FOREIGN SERVER gitlab_secondary TO ${GEO_DB_USER};" - ``` - - And edit the content of `database_geo.yml` and to add `fdw: true` to - the `production:` block. - -### Step 3. Initiate the replication process - -Below we provide a script that connects the database on the **secondary** node to -the database on the **primary** node, replicates the database, and creates the -needed files for streaming replication. - -The directories used are the defaults for Debian/Ubuntu. If you have changed -any defaults, configure it as you see fit replacing the directories and paths. - -CAUTION: **Warning:** -Make sure to run this on the **secondary** server as it removes all PostgreSQL's -data before running `pg_basebackup`. - -1. SSH into your GitLab **secondary** server and login as root: - - ```sh - sudo -i - ``` - -1. Save the snippet below in a file, let's say `/tmp/replica.sh`. Modify the - embedded paths if necessary: - - ``` - #!/bin/bash - - PORT="5432" - USER="gitlab_replicator" - echo --------------------------------------------------------------- - echo WARNING: Make sure this script is run from the secondary server - echo --------------------------------------------------------------- - echo - echo Enter the IP or FQDN of the primary PostgreSQL server - read HOST - echo Enter the password for $USER@$HOST - read -s PASSWORD - echo Enter the required sslmode - read SSLMODE - - echo Stopping PostgreSQL and all GitLab services - sudo service gitlab stop - sudo service postgresql stop - - echo Backing up postgresql.conf - sudo -u postgres mv /var/opt/gitlab/postgresql/data/postgresql.conf /var/opt/gitlab/postgresql/ - - echo Cleaning up old cluster directory - sudo -u postgres rm -rf /var/opt/gitlab/postgresql/data - - echo Starting base backup as the replicator user - echo Enter the password for $USER@$HOST - sudo -u postgres /opt/gitlab/embedded/bin/pg_basebackup -h $HOST -D /var/opt/gitlab/postgresql/data -U gitlab_replicator -v -x -P - - echo Writing recovery.conf file - sudo -u postgres bash -c "cat > /var/opt/gitlab/postgresql/data/recovery.conf <<- _EOF1_ - standby_mode = 'on' - primary_conninfo = 'host=$HOST port=$PORT user=$USER password=$PASSWORD sslmode=$SSLMODE' - _EOF1_ - " - - echo Restoring postgresql.conf - sudo -u postgres mv /var/opt/gitlab/postgresql/postgresql.conf /var/opt/gitlab/postgresql/data/ - - echo Starting PostgreSQL - sudo service postgresql start - ``` - -1. Run it with: - - ```sh - bash /tmp/replica.sh - ``` - - When prompted, enter the IP/FQDN of the **primary** node, and the password you set up - for the `gitlab_replicator` user in the first step. - - You should use `verify-ca` for the `sslmode`. You can use `disable` if you - are happy to skip PostgreSQL TLS authentication altogether (e.g., you know - the network path is secure, or you are using a site-to-site VPN). This is - **not** safe over the public Internet! - - You can read more details about each `sslmode` in the - [PostgreSQL documentation][pg-docs-ssl]; - the instructions above are carefully written to ensure protection against - both passive eavesdroppers and active "man-in-the-middle" attackers. - -The replication process is now over. - -## PGBouncer support (optional) - -1. First, enter the PostgreSQL console as an admin user. - -1. Then create the read-only user: - - ```sql - -- NOTE: Use the password defined earlier - CREATE USER gitlab_geo_fdw WITH password '<your_password_here>'; - GRANT CONNECT ON DATABASE gitlabhq_production to gitlab_geo_fdw; - GRANT USAGE ON SCHEMA public TO gitlab_geo_fdw; - GRANT SELECT ON ALL TABLES IN SCHEMA public TO gitlab_geo_fdw; - GRANT SELECT ON ALL SEQUENCES IN SCHEMA public TO gitlab_geo_fdw; - - -- Tables created by "gitlab" should be made read-only for "gitlab_geo_fdw" - -- automatically. - ALTER DEFAULT PRIVILEGES FOR USER gitlab IN SCHEMA public GRANT SELECT ON TABLES TO gitlab_geo_fdw; - ALTER DEFAULT PRIVILEGES FOR USER gitlab IN SCHEMA public GRANT SELECT ON SEQUENCES TO gitlab_geo_fdw; - ``` - -1. Enter the PostgreSQL console on the **secondary** tracking database and change the user mapping to this new user: - - ``` - ALTER USER MAPPING FOR gitlab_geo SERVER gitlab_secondary OPTIONS (SET user 'gitlab_geo_fdw') - ``` - -## MySQL replication - -MySQL replication is not supported for Geo. - -## Troubleshooting - -Read the [troubleshooting document](troubleshooting.md). - -[replication-slots-article]: https://medium.com/@tk512/replication-slots-in-postgresql-b4b03d277c75 -[pgback]: http://www.postgresql.org/docs/9.6/static/app-pgbasebackup.html -[replication user]:https://wiki.postgresql.org/wiki/Streaming_Replication -[FDW]: https://www.postgresql.org/docs/9.6/static/postgres-fdw.html -[database]: database.md -[add-geo-node]: configuration.md#step-3-add-the-secondary-gitlab-node -[database-replication]: database.md#step-2-configure-the-secondary-server -[pg-docs-ssl]: https://www.postgresql.org/docs/9.6/static/libpq-ssl.html#LIBPQ-SSL-PROTECTION -[pg-docs-runtime-conn]: https://www.postgresql.org/docs/9.6/static/runtime-config-connection.html -[pg-docs-runtime-replication]: https://www.postgresql.org/docs/9.6/static/runtime-config-replication.html |