commit 00bfd2d81d2539e16829585f203169bdd0274bec (patch)
author    GitLab Bot <gitlab-bot@gitlab.com>  2019-11-16 21:06:10 +0000
committer GitLab Bot <gitlab-bot@gitlab.com>  2019-11-16 21:06:10 +0000
tree      5d201485a5cda4505131396ac0c8155ae812ba8f /doc/administration
parent    409c3cb076e500968ec4c283cb388b56f3e7c9e6 (diff)
download  gitlab-ce-00bfd2d81d2539e16829585f203169bdd0274bec.tar.gz

    Add latest changes from gitlab-org/gitlab@master

Diffstat (limited to 'doc/administration'):

 doc/administration/high_availability/README.md    | 58 +-
 doc/administration/high_availability/database.md  | 87 +-
 doc/administration/high_availability/pgbouncer.md | 47 +-
 3 files changed, 132 insertions(+), 60 deletions(-)
diff --git a/doc/administration/high_availability/README.md b/doc/administration/high_availability/README.md
index 81c12279898..7f0b4056acc 100644
--- a/doc/administration/high_availability/README.md
+++ b/doc/administration/high_availability/README.md
@@ -82,12 +82,12 @@ Complete the following installation steps in order. A link at the end of each
 section will bring you back to the Scalable Architecture Examples section so
 you can continue with the next step.
 
-1. [PostgreSQL](database.md#postgresql-in-a-scaled-environment)
+1. [PostgreSQL](database.md#postgresql-in-a-scaled-environment) with [PGBouncer](https://docs.gitlab.com/ee/administration/high_availability/pgbouncer.html)
 1. [Redis](redis.md#redis-in-a-scaled-environment)
 1. [Gitaly](gitaly.md) (recommended) and / or [NFS](nfs.md)[^4]
 1. [GitLab application nodes](gitlab.md)
    - With [Object Storage service enabled](../gitaly/index.md#eliminating-nfs-altogether)[^3]
-1. [Load Balancer](load_balancer.md)[^2]
+1. [Load Balancer(s)](load_balancer.md)[^2]
 1. [Monitoring node (Prometheus and Grafana)](monitoring_node.md)
@@ -98,8 +98,8 @@ is split into separate Sidekiq and Unicorn/Workhorse nodes. One indication that
 this architecture is required is if Sidekiq queues begin to periodically increase
 in size, indicating that there is contention or there are not enough resources.
 
-- 1 or more PostgreSQL node
-- 1 or more Redis node
+- 1 or more PostgreSQL nodes
+- 1 or more Redis nodes
 - 1 or more Gitaly storage servers
 - 1 or more Object Storage services[^3] and / or NFS storage server[^4]
 - 2 or more Sidekiq nodes
@@ -182,6 +182,7 @@ the basis of the GitLab.com architecture. While this scales well it also comes
 with the added complexity of many more nodes to configure, manage, and monitor.
 
 - 3 PostgreSQL nodes
+- 1 or more PgBouncer nodes (with associated internal load balancers)
 - 4 or more Redis nodes (2 separate clusters for persistent and cache data)
 - 3 Consul nodes
 - 3 Sentinel nodes
@@ -228,16 +229,17 @@ users are, how much automation you use, mirroring, and repo/change size.
 | ----------------------------|-------|-----------------------|---------------|
 | GitLab Rails <br> - Puma workers on each node set to 90% of available CPUs with 16 threads | 3 | 32 vCPU, 28.8GB Memory | n1-highcpu-32 |
 | PostgreSQL | 3 | 4 vCPU, 15GB Memory | n1-standard-4 |
-| PgBouncer | 1 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 |
+| PgBouncer | 3 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 |
 | Gitaly <br> - Gitaly Ruby workers on each node set to 20% of available CPUs | X[^1] . | 16 vCPU, 60GB Memory | n1-standard-16 |
 | Redis Cache + Sentinel <br> - Cache maxmemory set to 90% of available memory | 3 | 4 vCPU, 15GB Memory | n1-standard-4 |
 | Redis Persistent + Sentinel | 3 | 4 vCPU, 15GB Memory | n1-standard-4 |
 | Sidekiq | 4 | 4 vCPU, 15GB Memory | n1-standard-4 |
 | Consul | 3 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 |
-| NFS Server[^4] . | 1 | 4 CPU, 3.6GB Memory | n1-highcpu-4 |
+| NFS Server[^4] . | 1 | 4 vCPU, 3.6GB Memory | n1-highcpu-4 |
 | S3 Object Storage[^3] . | - | - | - |
-| Monitoring node | 1 | 4 CPU, 3.6GB Memory | n1-highcpu-4 |
-| Load Balancing node[^2] . | 1 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 |
+| Monitoring node | 1 | 4 vCPU, 3.6GB Memory | n1-highcpu-4 |
+| External load balancing node[^2] . | 1 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 |
+| Internal load balancing node[^2] . | 1 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 |
 
 NOTE: **Note:** Memory values are given directly by GCP machine sizes. On different cloud
 vendors a best effort like for like can be used.
@@ -255,16 +257,17 @@ vendors a best effort like for like can be used.
 | ----------------------------|-------|-----------------------|---------------|
 | GitLab Rails <br> - Puma workers on each node set to 90% of available CPUs with 16 threads | 7 | 32 vCPU, 28.8GB Memory | n1-highcpu-32 |
 | PostgreSQL | 3 | 8 vCPU, 30GB Memory | n1-standard-8 |
-| PgBouncer | 1 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 |
+| PgBouncer | 3 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 |
 | Gitaly <br> - Gitaly Ruby workers on each node set to 20% of available CPUs | X[^1] . | 32 vCPU, 120GB Memory | n1-standard-32 |
 | Redis Cache + Sentinel <br> - Cache maxmemory set to 90% of available memory | 3 | 4 vCPU, 15GB Memory | n1-standard-4 |
 | Redis Persistent + Sentinel | 3 | 4 vCPU, 15GB Memory | n1-standard-4 |
 | Sidekiq | 4 | 4 vCPU, 15GB Memory | n1-standard-4 |
 | Consul | 3 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 |
-| NFS Server[^4] . | 1 | 4 CPU, 3.6GB Memory | n1-highcpu-4 |
+| NFS Server[^4] . | 1 | 4 vCPU, 3.6GB Memory | n1-highcpu-4 |
 | S3 Object Storage[^3] . | - | - | - |
-| Monitoring node | 1 | 4 CPU, 3.6GB Memory | n1-highcpu-4 |
-| Load Balancing node[^2] . | 1 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 |
+| Monitoring node | 1 | 4 vCPU, 3.6GB Memory | n1-highcpu-4 |
+| External load balancing node[^2] . | 1 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 |
+| Internal load balancing node[^2] . | 1 | 4 vCPU, 3.6GB Memory | n1-highcpu-4 |
 
 NOTE: **Note:** Memory values are given directly by GCP machine sizes. On different cloud
 vendors a best effort like for like can be used.
@@ -284,16 +287,17 @@ may be adjusted prior to certification based on performance testing.
 | ----------------------------|-------|-----------------------|---------------|
 | GitLab Rails <br> - Puma workers on each node set to 90% of available CPUs with 16 threads | 15 | 32 vCPU, 28.8GB Memory | n1-highcpu-32 |
 | PostgreSQL | 3 | 8 vCPU, 30GB Memory | n1-standard-8 |
-| PgBouncer | 1 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 |
+| PgBouncer | 3 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 |
 | Gitaly <br> - Gitaly Ruby workers on each node set to 20% of available CPUs | X[^1] . | 64 vCPU, 240GB Memory | n1-standard-64 |
 | Redis Cache + Sentinel <br> - Cache maxmemory set to 90% of available memory | 3 | 4 vCPU, 15GB Memory | n1-standard-4 |
 | Redis Persistent + Sentinel | 3 | 4 vCPU, 15GB Memory | n1-standard-4 |
 | Sidekiq | 4 | 4 vCPU, 15GB Memory | n1-standard-4 |
 | Consul | 3 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 |
-| NFS Server[^4] . | 1 | 4 CPU, 3.6GB Memory | n1-highcpu-4 |
+| NFS Server[^4] . | 1 | 4 vCPU, 3.6GB Memory | n1-highcpu-4 |
 | S3 Object Storage[^3] . | - | - | - |
-| Monitoring node | 1 | 4 CPU, 3.6GB Memory | n1-highcpu-4 |
-| Load Balancing node[^2] . | 1 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 |
+| Monitoring node | 1 | 4 vCPU, 3.6GB Memory | n1-highcpu-4 |
+| External load balancing node[^2] . | 1 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 |
+| Internal load balancing node[^2] . | 1 | 8 vCPU, 7.2GB Memory | n1-highcpu-8 |
 
 NOTE: **Note:** Memory values are given directly by GCP machine sizes. On different cloud
 vendors a best effort like for like can be used.
@@ -305,18 +309,18 @@ vendors a best effort like for like can be used.
 project counts and sizes.
 
 [^2]: Our architectures have been tested and validated with [HAProxy](https://www.haproxy.org/)
-as the load balancer. However other reputable load balancers with similar feature sets
-should also work here but be aware these aren't validated.
+    as the load balancer. However other reputable load balancers with similar feature sets
+    should also work here but be aware these aren't validated.
 
 [^3]: For data objects such as LFS, Uploads, Artifacts, etc... We recommend a S3 Object Storage
-where possible over NFS due to better performance and availability. Several types of objects
-are supported for S3 storage - [Job artifacts](../job_artifacts.md#using-object-storage),
-[LFS](../lfs/lfs_administration.md#storing-lfs-objects-in-remote-object-storage),
-[Uploads](../uploads.md#using-object-storage-core-only),
-[Merge Request Diffs](../merge_request_diffs.md#using-object-storage),
-[Packages](../packages/index.md#using-object-storage) (Optional Feature),
-[Dependency Proxy](../packages/dependency_proxy.md#using-object-storage) (Optional Feature).
+    where possible over NFS due to better performance and availability. Several types of objects
+    are supported for S3 storage - [Job artifacts](../job_artifacts.md#using-object-storage),
+    [LFS](../lfs/lfs_administration.md#storing-lfs-objects-in-remote-object-storage),
+    [Uploads](../uploads.md#using-object-storage-core-only),
+    [Merge Request Diffs](../merge_request_diffs.md#using-object-storage),
+    [Packages](../packages/index.md#using-object-storage) (Optional Feature),
+    [Dependency Proxy](../packages/dependency_proxy.md#using-object-storage) (Optional Feature).
 
 [^4]: NFS storage server is still required for [GitLab Pages](https://gitlab.com/gitlab-org/gitlab-pages/issues/196)
-and optionally for CI Job Incremental Logging
-([can be switched to use Redis instead](https://docs.gitlab.com/ee/administration/job_logs.html#new-incremental-logging-architecture)).
+    and optionally for CI Job Incremental Logging
+    ([can be switched to use Redis instead](https://docs.gitlab.com/ee/administration/job_logs.html#new-incremental-logging-architecture)).
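The sizing tables above repeatedly quote the rule "Puma workers on each node set to 90% of available CPUs". As a quick sanity check of the node sizes chosen, that rule can be sketched as a small calculation. This is an editorial illustration only; the helper name is not part of GitLab's tooling, and the 90% figure is the only part taken from the tables.

```ruby
# Illustrative helper for the sizing rule quoted in the tables above:
# Puma workers per node = 90% of available vCPUs, rounded down, minimum 1.
# The method name is hypothetical; only the 90% guidance comes from the doc.
def puma_workers(available_cpus, fraction = 0.9)
  [(available_cpus * fraction).floor, 1].max
end

# For the n1-highcpu-32 GitLab Rails nodes (32 vCPUs) the rule suggests:
puts puma_workers(32) # => 28
```

This is why the 28.8GB-memory `n1-highcpu-32` machines line up with roughly 28 workers per Rails node under the stated rule.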
diff --git a/doc/administration/high_availability/database.md b/doc/administration/high_availability/database.md
index a50cc0cbd03..02684f575d4 100644
--- a/doc/administration/high_availability/database.md
+++ b/doc/administration/high_availability/database.md
@@ -135,7 +135,8 @@ The recommended configuration for a PostgreSQL HA requires:
 - `repmgrd` - A service to monitor, and handle failover in case of a failure
 - `Consul` agent - Used for service discovery, to alert other nodes when failover occurs
 - A minimum of three `Consul` server nodes
-- A minimum of one `pgbouncer` service node
+- A minimum of one `pgbouncer` service node, but it's recommended to have one per database node
+  - An internal load balancer (TCP) is required when there is more than one `pgbouncer` service node
 
 You also need to take into consideration the underlying network topology, making
 sure you have redundant connectivity between all Database and GitLab instances,
@@ -155,13 +156,13 @@ Database nodes run two services with PostgreSQL:
   On failure, the old master node is automatically evicted from the cluster, and
   should be rejoined manually once recovered.
 - Consul. Monitors the status of each node in the database cluster and tracks
   its health in a service definition on the Consul cluster.
-Alongside PgBouncer, there is a Consul agent that watches the status of the PostgreSQL service. If that status changes, Consul runs a script which updates the configuration and reloads PgBouncer
+Alongside each PgBouncer, there is a Consul agent that watches the status of the PostgreSQL service. If that status changes, Consul runs a script which updates the configuration and reloads PgBouncer
 
 ##### Connection flow
 
 Each service in the package comes with a set of [default ports](https://docs.gitlab.com/omnibus/package-information/defaults.html#ports).
 You may need to make specific firewall rules for the connections listed below:
 
-- Application servers connect to [PgBouncer default port](https://docs.gitlab.com/omnibus/package-information/defaults.html#pgbouncer)
+- Application servers connect to either PgBouncer directly via its [default port](https://docs.gitlab.com/omnibus/package-information/defaults.html#pgbouncer) or via a configured Internal Load Balancer (TCP) that serves multiple PgBouncers.
 - PgBouncer connects to the primary database servers [PostgreSQL default port](https://docs.gitlab.com/omnibus/package-information/defaults.html#postgresql)
 - Repmgr connects to the database servers [PostgreSQL default port](https://docs.gitlab.com/omnibus/package-information/defaults.html#postgresql)
 - Postgres secondaries connect to the primary database servers [PostgreSQL default port](https://docs.gitlab.com/omnibus/package-information/defaults.html#postgresql)
@@ -499,7 +500,7 @@ attributes set, but the following need to be set.
   # Disable PostgreSQL on the application node
   postgresql['enable'] = false
 
-  gitlab_rails['db_host'] = 'PGBOUNCER_NODE'
+  gitlab_rails['db_host'] = 'PGBOUNCER_NODE' or 'INTERNAL_LOAD_BALANCER'
   gitlab_rails['db_port'] = 6432
   gitlab_rails['db_password'] = 'POSTGRESQL_USER_PASSWORD'
   gitlab_rails['auto_migrate'] = false
@@ -533,7 +534,8 @@ Here we'll show you some fully expanded example configurations.
 
 ##### Example recommended setup
 
-This example uses 3 Consul servers, 3 PostgreSQL servers, and 1 application node.
+This example uses 3 Consul servers, 3 PgBouncer servers (with associated internal load balancer),
+3 PostgreSQL servers, and 1 application node.
 
 We start with all servers on the same 10.6.0.0/16 private network range, they
 can connect to each freely other on those addresses.
@@ -543,14 +545,16 @@ Here is a list and description of each machine and the assigned IP:
 
 - `10.6.0.11`: Consul 1
 - `10.6.0.12`: Consul 2
 - `10.6.0.13`: Consul 3
-- `10.6.0.21`: PostgreSQL master
-- `10.6.0.22`: PostgreSQL secondary
-- `10.6.0.23`: PostgreSQL secondary
-- `10.6.0.31`: GitLab application
-
-All passwords are set to `toomanysecrets`, please do not use this password or derived hashes.
+- `10.6.0.20`: Internal Load Balancer
+- `10.6.0.21`: PgBouncer 1
+- `10.6.0.22`: PgBouncer 2
+- `10.6.0.23`: PgBouncer 3
+- `10.6.0.31`: PostgreSQL master
+- `10.6.0.32`: PostgreSQL secondary
+- `10.6.0.33`: PostgreSQL secondary
+- `10.6.0.41`: GitLab application
 
-The external_url for GitLab is `http://gitlab.example.com`
+All passwords are set to `toomanysecrets`, please do not use this password or derived hashes, and the external_url for GitLab is `http://gitlab.example.com`.
 
 Please note that after the initial configuration, if a failover occurs, the PostgresSQL master will change to one of the available secondaries until it is failed back.
@@ -566,10 +570,45 @@ consul['configuration'] = {
   server: true,
   retry_join: %w(10.6.0.11 10.6.0.12 10.6.0.13)
 }
+consul['monitoring_service_discovery'] = true
+```
+
+[Reconfigure Omnibus GitLab][reconfigure GitLab] for the changes to take effect.
+
+##### Example recommended setup for PgBouncer servers
+
+On each server edit `/etc/gitlab/gitlab.rb`:
+
+```ruby
+# Disable all components except PgBouncer and Consul agent
+roles ['pgbouncer_role']
+
+# Configure PgBouncer
+pgbouncer['admin_users'] = %w(pgbouncer gitlab-consul)
+
+pgbouncer['users'] = {
+  'gitlab-consul': {
+    password: '5e0e3263571e3704ad655076301d6ebe'
+  },
+  'pgbouncer': {
+    password: '771a8625958a529132abe6f1a4acb19c'
+  }
+}
+
+consul['watchers'] = %w(postgresql)
+consul['enable'] = true
+consul['configuration'] = {
+  retry_join: %w(10.6.0.11 10.6.0.12 10.6.0.13)
+}
+consul['monitoring_service_discovery'] = true
 ```
 
 [Reconfigure Omnibus GitLab][reconfigure GitLab] for the changes to take effect.
 
+##### Internal load balancer setup
+
+An internal load balancer (TCP) is then required to be set up to serve each PgBouncer node (in this example on the IP of `10.6.0.20`). An example of how to do this can be found in the [PgBouncer Configure Internal Load Balancer](pgbouncer.md#configure-the-internal-load-balancer) section.
+
 ##### Example recommended setup for PostgreSQL servers
 
 ###### Primary node
@@ -589,9 +628,6 @@ postgresql['shared_preload_libraries'] = 'repmgr_funcs'
 
 # Disable automatic database migrations
 gitlab_rails['auto_migrate'] = false
 
-# Configure the Consul agent
-consul['services'] = %w(postgresql)
-
 postgresql['pgbouncer_user_password'] = '771a8625958a529132abe6f1a4acb19c'
 postgresql['sql_user_password'] = '450409b85a0223a214b5fb1484f34d0f'
 postgresql['max_wal_senders'] = 4
@@ -599,9 +635,13 @@ postgresql['max_wal_senders'] = 4
 
 postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.0/16)
 repmgr['trust_auth_cidr_addresses'] = %w(10.6.0.0/16)
 
+# Configure the Consul agent
+consul['services'] = %w(postgresql)
+consul['enable'] = true
 consul['configuration'] = {
   retry_join: %w(10.6.0.11 10.6.0.12 10.6.0.13)
 }
+consul['monitoring_service_discovery'] = true
 ```
 
 [Reconfigure Omnibus GitLab][reconfigure GitLab] for the changes to take effect.
@@ -626,18 +666,15 @@ On the server edit `/etc/gitlab/gitlab.rb`:
 
 ```ruby
 external_url 'http://gitlab.example.com'
 
-gitlab_rails['db_host'] = '127.0.0.1'
+gitlab_rails['db_host'] = '10.6.0.20' # Internal Load Balancer for PgBouncer nodes
 gitlab_rails['db_port'] = 6432
 gitlab_rails['db_password'] = 'toomanysecrets'
 gitlab_rails['auto_migrate'] = false
 
 postgresql['enable'] = false
-pgbouncer['enable'] = true
+pgbouncer['enable'] = false
 consul['enable'] = true
 
-# Configure PgBouncer
-pgbouncer['admin_users'] = %w(pgbouncer gitlab-consul)
-
 # Configure Consul agent
 consul['watchers'] = %w(postgresql)
 
@@ -661,7 +698,7 @@ consul['configuration'] = {
 
 After deploying the configuration follow these steps:
 
-1. On `10.6.0.21`, our primary database
+1. On `10.6.0.31`, our primary database
 
    Enable the `pg_trgm` extension
 
   ```
   CREATE EXTENSION pg_trgm;
   ```
 
-1. On `10.6.0.22`, our first standby database
+1. On `10.6.0.32`, our first standby database
 
   Make this node a standby of the primary
 
   ```
   gitlab-ctl repmgr standby setup 10.6.0.21
   ```
 
-1. On `10.6.0.23`, our second standby database
+1. On `10.6.0.33`, our second standby database
 
   Make this node a standby of the primary
 
   ```
   gitlab-ctl repmgr standby setup 10.6.0.21
   ```
 
-1. On `10.6.0.31`, our application server
+1. On `10.6.0.41`, our application server
 
   Set `gitlab-consul` user's PgBouncer password to `toomanysecrets`
 
@@ -705,7 +742,7 @@ After deploying the configuration follow these steps:
 
 #### Example minimal setup
 
-This example uses 3 PostgreSQL servers, and 1 application node.
+This example uses 3 PostgreSQL servers, and 1 application node (with PgBouncer set up alongside).
 
 It differs from the [recommended setup](#example-recommended-setup)
 by moving the Consul servers into the same servers we use for PostgreSQL.
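The password hashes that appear throughout the examples above (for instance `771a8625958a529132abe6f1a4acb19c` for the `pgbouncer` user) are derived from the example password `toomanysecrets` with `gitlab-ctl pg-password-md5`. PostgreSQL's classic `md5` auth scheme is `md5(password || username)`, hex-encoded; the sketch below assumes the tool follows that scheme and should be verified against the real tool before use.

```ruby
require 'digest'

# Sketch of the PostgreSQL-style md5 password hash that
# `gitlab-ctl pg-password-md5 <username>` is believed to produce:
# the hex md5 digest of password concatenated with username.
# Assumption for illustration only; confirm with the real tool.
def pg_password_md5(password, username)
  Digest::MD5.hexdigest(password + username)
end

# e.g. the hash to store for the `pgbouncer` user in the examples above:
puts pg_password_md5('toomanysecrets', 'pgbouncer')
```

Storing only the derived hash in `/etc/gitlab/gitlab.rb` is what lets the examples avoid writing the cleartext password into configuration files.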
 The trade-off is between reducing server counts, against the increased operational complexity of needing to deal with PostgreSQL [failover](#failover-procedure) and [restore](#restore-procedure) procedures in addition to [Consul outage recovery](consul.md#outage-recovery) on the same set of machines.
diff --git a/doc/administration/high_availability/pgbouncer.md b/doc/administration/high_availability/pgbouncer.md
index 352c300a3d0..09b33c3554a 100644
--- a/doc/administration/high_availability/pgbouncer.md
+++ b/doc/administration/high_availability/pgbouncer.md
@@ -4,13 +4,9 @@ type: reference
 
 # Working with the bundle PgBouncer service
 
-As part of its High Availability stack, GitLab Premium includes a bundled version of [PgBouncer](https://www.pgbouncer.org/) that can be managed through `/etc/gitlab/gitlab.rb`.
+As part of its High Availability stack, GitLab Premium includes a bundled version of [PgBouncer](https://pgbouncer.github.io/) that can be managed through `/etc/gitlab/gitlab.rb`. PgBouncer is used to seamlessly migrate database connections between servers in a failover scenario. Additionally, it can be used in a non-HA setup to pool connections, speeding up response time while reducing resource usage.
 
-In a High Availability setup, PgBouncer is used to seamlessly migrate database connections between servers in a failover scenario.
-
-Additionally, it can be used in a non-HA setup to pool connections, speeding up response time while reducing resource usage.
-
-It is recommended to run PgBouncer alongside the `gitlab-rails` service, or on its own dedicated node in a cluster.
+In an HA setup, it's recommended to run a PgBouncer node separately for each database node, with an internal load balancer (TCP) serving each accordingly.
 
 ## Operations
 
@@ -18,7 +14,7 @@ It is recommended to run PgBouncer alongside the `gitlab-rails` service, or on i
 
 1. Make sure you collect [`CONSUL_SERVER_NODES`](database.md#consul-information), [`CONSUL_PASSWORD_HASH`](database.md#consul-information), and [`PGBOUNCER_PASSWORD_HASH`](database.md#pgbouncer-information) before executing the next step.
 
-1. Edit `/etc/gitlab/gitlab.rb` replacing values noted in the `# START user configuration` section:
+1. On each node, edit the `/etc/gitlab/gitlab.rb` config file and replace values noted in the `# START user configuration` section as below:
 
   ```ruby
   # Disable all components except PgBouncer and Consul agent
@@ -67,7 +63,7 @@ It is recommended to run PgBouncer alongside the `gitlab-rails` service, or on i
 
 #### PgBouncer Checkpoint
 
-1. Ensure the node is talking to the current master:
+1. Ensure each node is talking to the current master:
 
   ```sh
   gitlab-ctl pgb-console # You will be prompted for PGBOUNCER_PASSWORD
   ```
@@ -100,6 +96,41 @@ It is recommended to run PgBouncer alongside the `gitlab-rails` service, or on i
   (2 rows)
   ```
 
+#### Configure the internal load balancer
+
+If you're running more than one PgBouncer node as recommended, then at this time you'll need to set up a TCP internal load balancer to serve each correctly. This can be done with any reputable TCP load balancer.
+
+As an example here's how you could do it with [HAProxy](https://www.haproxy.org/):
+
+```
+global
+    log /dev/log local0
+    log localhost local1 notice
+    log stdout format raw local0
+
+defaults
+    log global
+    default-server inter 10s fall 3 rise 2
+    balance leastconn
+
+frontend internal-pgbouncer-tcp-in
+    bind *:6432
+    mode tcp
+    option tcplog
+
+    default_backend pgbouncer
+
+backend pgbouncer
+    mode tcp
+    option tcp-check
+
+    server pgbouncer1 <ip>:6432 check
+    server pgbouncer2 <ip>:6432 check
+    server pgbouncer3 <ip>:6432 check
+```
+
+Refer to your preferred Load Balancer's documentation for further guidance.
+
 ### Running PgBouncer as part of a non-HA GitLab installation
 
 1. Generate PGBOUNCER_USER_PASSWORD_HASH with the command `gitlab-ctl pg-password-md5 pgbouncer`
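The HAProxy `option tcp-check` directive in the internal load balancer example above marks a PgBouncer backend healthy when a plain TCP connection to port 6432 succeeds. A minimal stand-in for that probe is sketched below; the node IPs are the placeholder addresses from the documentation examples, and the helper is editorial, not a GitLab or HAProxy tool.

```ruby
require 'socket'

# Minimal stand-in for the TCP health probe (`option tcp-check`) that the
# HAProxy example performs against each PgBouncer backend: report whether
# a TCP connection to host:port can be opened within the timeout.
def tcp_check(host, port, timeout = 2.0)
  Socket.tcp(host, port, connect_timeout: timeout) { |_sock| true }
rescue StandardError
  false
end

# Probe each placeholder PgBouncer node from the examples:
%w[10.6.0.21 10.6.0.22 10.6.0.23].each do |node|
  puts "#{node}:6432 #{tcp_check(node, 6432, 0.5) ? 'reachable' : 'unreachable'}"
end
```

A real deployment would rely on the load balancer's own health checks; a script like this is only useful for one-off troubleshooting from an application node.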