Diffstat (limited to 'doc/administration/reference_architectures/10k_users.md')
-rw-r--r-- | doc/administration/reference_architectures/10k_users.md | 271 |
1 files changed, 199 insertions, 72 deletions
diff --git a/doc/administration/reference_architectures/10k_users.md b/doc/administration/reference_architectures/10k_users.md
index a51641f661f..97af1fe8d3c 100644
--- a/doc/administration/reference_architectures/10k_users.md
+++ b/doc/administration/reference_architectures/10k_users.md
@@ -15,25 +15,31 @@ full list of reference architectures, see
 > - **High Availability:** Yes ([Praefect](#configure-praefect-postgresql) needs a third-party PostgreSQL solution for HA)
 > - **Test requests per second (RPS) rates:** API: 200 RPS, Web: 20 RPS, Git (Pull): 20 RPS, Git (Push): 4 RPS

-| Service                                    | Nodes       | Configuration           | GCP             | AWS         | Azure    |
-|--------------------------------------------|-------------|-------------------------|-----------------|-------------|----------|
-| External load balancing node               | 1           | 2 vCPU, 1.8 GB memory   | n1-highcpu-2    | c5.large    | F2s v2   |
-| Consul                                     | 3           | 2 vCPU, 1.8 GB memory   | n1-highcpu-2    | c5.large    | F2s v2   |
-| PostgreSQL                                 | 3           | 8 vCPU, 30 GB memory    | n1-standard-8   | m5.2xlarge  | D8s v3   |
-| PgBouncer                                  | 3           | 2 vCPU, 1.8 GB memory   | n1-highcpu-2    | c5.large    | F2s v2   |
-| Internal load balancing node               | 1           | 2 vCPU, 1.8 GB memory   | n1-highcpu-2    | c5.large    | F2s v2   |
-| Redis - Cache                              | 3           | 4 vCPU, 15 GB memory    | n1-standard-4   | m5.xlarge   | D4s v3   |
-| Redis - Queues / Shared State              | 3           | 4 vCPU, 15 GB memory    | n1-standard-4   | m5.xlarge   | D4s v3   |
-| Redis Sentinel - Cache                     | 3           | 1 vCPU, 1.7 GB memory   | g1-small        | t3.small    | B1MS     |
-| Redis Sentinel - Queues / Shared State     | 3           | 1 vCPU, 1.7 GB memory   | g1-small        | t3.small    | B1MS     |
-| Gitaly                                     | 3           | 16 vCPU, 60 GB memory   | n1-standard-16  | m5.4xlarge  | D16s v3  |
-| Praefect                                   | 3           | 2 vCPU, 1.8 GB memory   | n1-highcpu-2    | c5.large    | F2s v2   |
-| Praefect PostgreSQL                        | 1+*         | 2 vCPU, 1.8 GB memory   | n1-highcpu-2    | c5.large    | F2s v2   |
-| Sidekiq                                    | 4           | 4 vCPU, 15 GB memory    | n1-standard-4   | m5.xlarge   | D4s v3   |
-| GitLab Rails                               | 3           | 32 vCPU, 28.8 GB memory | n1-highcpu-32   | c5.9xlarge  | F32s v2  |
-| Monitoring node                            | 1           | 4 vCPU, 3.6 GB memory   | n1-highcpu-4    | c5.xlarge   | F4s v2   |
-| Object storage                             | n/a         | n/a                     | n/a             | n/a         | n/a      |
-| NFS server                                 | 1           | 4 vCPU, 3.6 GB memory   | n1-highcpu-4    | `c5.xlarge` | F4s v2   |
+| Service                                    | Nodes       | Configuration           | GCP              | AWS          | Azure     |
+|--------------------------------------------|-------------|-------------------------|------------------|--------------|-----------|
+| External load balancing node               | 1           | 2 vCPU, 1.8 GB memory   | `n1-highcpu-2`   | `c5.large`   | `F2s v2`  |
+| Consul*                                    | 3           | 2 vCPU, 1.8 GB memory   | `n1-highcpu-2`   | `c5.large`   | `F2s v2`  |
+| PostgreSQL*                                | 3           | 8 vCPU, 30 GB memory    | `n1-standard-8`  | `m5.2xlarge` | `D8s v3`  |
+| PgBouncer*                                 | 3           | 2 vCPU, 1.8 GB memory   | `n1-highcpu-2`   | `c5.large`   | `F2s v2`  |
+| Internal load balancing node               | 1           | 2 vCPU, 1.8 GB memory   | `n1-highcpu-2`   | `c5.large`   | `F2s v2`  |
+| Redis - Cache**                            | 3           | 4 vCPU, 15 GB memory    | `n1-standard-4`  | `m5.xlarge`  | `D4s v3`  |
+| Redis - Queues / Shared State**            | 3           | 4 vCPU, 15 GB memory    | `n1-standard-4`  | `m5.xlarge`  | `D4s v3`  |
+| Redis Sentinel - Cache**                   | 3           | 1 vCPU, 1.7 GB memory   | `g1-small`       | `t3.small`   | `B1MS`    |
+| Redis Sentinel - Queues / Shared State**   | 3           | 1 vCPU, 1.7 GB memory   | `g1-small`       | `t3.small`   | `B1MS`    |
+| Gitaly                                     | 3           | 16 vCPU, 60 GB memory   | `n1-standard-16` | `m5.4xlarge` | `D16s v3` |
+| Praefect                                   | 3           | 2 vCPU, 1.8 GB memory   | `n1-highcpu-2`   | `c5.large`   | `F2s v2`  |
+| Praefect PostgreSQL*                       | 1+          | 2 vCPU, 1.8 GB memory   | `n1-highcpu-2`   | `c5.large`   | `F2s v2`  |
+| Sidekiq                                    | 4           | 4 vCPU, 15 GB memory    | `n1-standard-4`  | `m5.xlarge`  | `D4s v3`  |
+| GitLab Rails                               | 3           | 32 vCPU, 28.8 GB memory | `n1-highcpu-32`  | `c5.9xlarge` | `F32s v2` |
+| Monitoring node                            | 1           | 4 vCPU, 3.6 GB memory   | `n1-highcpu-4`   | `c5.xlarge`  | `F4s v2`  |
+| Object storage                             | n/a         | n/a                     | n/a              | n/a          | n/a       |
+| NFS server                                 | 1           | 4 vCPU, 3.6 GB memory   | `n1-highcpu-4`   | `c5.xlarge`  | `F4s v2`  |
+
+NOTE:
+Components marked with * can be optionally run on reputable
+third party external PaaS PostgreSQL solutions. Google Cloud SQL and AWS RDS are known to work.
+Components marked with ** can be optionally run on reputable
+third party external PaaS Redis solutions. Google Memorystore and AWS Elasticache are known to work.

 ```plantuml
 @startuml 10k
@@ -210,11 +216,12 @@ The following list includes descriptions of each server and its assigned IP:

 ## Configure the external load balancer

-In an active/active GitLab configuration, you'll need a load balancer to route
+In a multi-node GitLab configuration, you'll need a load balancer to route
 traffic to the application servers. The specifics on which load balancer to use
-or its exact configuration is beyond the scope of GitLab documentation. We hope
+or its exact configuration is beyond the scope of GitLab documentation. We assume
 that if you're managing multi-node systems like GitLab, you already have a load
-balancer of choice. Some load balancer examples include HAProxy (open-source),
+balancer of choice and that the routing methods used are distributing calls evenly
+between all nodes. Some load balancer examples include HAProxy (open-source),
 F5 Big-IP LTM, and Citrix Net Scaler. This documentation outline the ports and
 protocols needed for use with GitLab.

@@ -387,6 +394,8 @@ backend praefect
 ```

 Refer to your preferred Load Balancer's documentation for further guidance.
+Also ensure that the routing methods used are distributing calls evenly across
+all nodes.

 <div align="right">
   <a type="button" class="btn btn-default" href="#setup-components">
@@ -433,7 +442,7 @@ To configure Consul:
    # Set the network addresses that the exporters will listen on
    node_exporter['listen_address'] = '0.0.0.0:9100'

-   # Disable auto migrations
+   # Prevent database migrations from running on upgrade automatically
    gitlab_rails['auto_migrate'] = false
    ```

@@ -557,7 +566,7 @@ in the second step, do not supply the `EXTERNAL_URL` value.
    # Incoming recommended value for max connections is 500. See https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/5691.
    patroni['postgresql']['max_connections'] = 500

-   # Disable automatic database migrations
+   # Prevent database migrations from running on upgrade automatically
    gitlab_rails['auto_migrate'] = false

    # Configure the Consul agent
@@ -853,7 +862,7 @@ a node and change its status from primary to replica (and vice versa).
    node_exporter['listen_address'] = '0.0.0.0:9100'
    redis_exporter['listen_address'] = '0.0.0.0:9121'

-   # Prevent database migrations from running on upgrade
+   # Prevent database migrations from running on upgrade automatically
    gitlab_rails['auto_migrate'] = false
    ```

@@ -920,7 +929,7 @@ You can specify multiple roles, like sentinel and Redis, as:
    node_exporter['listen_address'] = '0.0.0.0:9100'
    redis_exporter['listen_address'] = '0.0.0.0:9121'

-   # Prevent database migrations from running on upgrade
+   # Prevent database migrations from running on upgrade automatically
    gitlab_rails['auto_migrate'] = false
    ```

@@ -1052,7 +1061,7 @@ To configure the Sentinel Cache server:
    node_exporter['listen_address'] = '0.0.0.0:9100'
    redis_exporter['listen_address'] = '0.0.0.0:9121'

-   # Disable auto migrations
+   # Prevent database migrations from running on upgrade automatically
    gitlab_rails['auto_migrate'] = false
    ```

@@ -1117,13 +1126,8 @@ a node and change its status from primary to replica (and vice versa).
    # Set the network addresses that the exporters will listen on
    node_exporter['listen_address'] = '0.0.0.0:9100'
    redis_exporter['listen_address'] = '0.0.0.0:9121'
-   ```

-1. Only the primary GitLab application server should handle migrations. To
-   prevent database migrations from running on upgrade, add the following
-   configuration to your `/etc/gitlab/gitlab.rb` file:
-
-   ```ruby
+   # Prevent database migrations from running on upgrade automatically
    gitlab_rails['auto_migrate'] = false
    ```

@@ -1184,7 +1188,7 @@ You can specify multiple roles, like sentinel and Redis, as:
    node_exporter['listen_address'] = '0.0.0.0:9100'
    redis_exporter['listen_address'] = '0.0.0.0:9121'

-   # Disable auto migrations
+   # Prevent database migrations from running on upgrade automatically
    gitlab_rails['auto_migrate'] = false
    ```

@@ -1316,7 +1320,7 @@ To configure the Sentinel Queues server:
    node_exporter['listen_address'] = '0.0.0.0:9100'
    redis_exporter['listen_address'] = '0.0.0.0:9121'

-   # Disable auto migrations
+   # Prevent database migrations from running on upgrade automatically
    gitlab_rails['auto_migrate'] = false
    ```

@@ -1401,6 +1405,7 @@ in the second step, do not supply the `EXTERNAL_URL` value.
    postgresql['listen_address'] = '0.0.0.0'
    postgresql['max_connections'] = 200

+   # Prevent database migrations from running on upgrade automatically
    gitlab_rails['auto_migrate'] = false

    # Configure the Consul agent
@@ -1546,7 +1551,8 @@ To configure the Praefect nodes, on each one:
    praefect['enable'] = true
    praefect['listen_addr'] = '0.0.0.0:2305'

-   gitlab_rails['rake_cache_clear'] = false
+   # Prevent database migrations from running on upgrade automatically
+   praefect['auto_migrate'] = false
    gitlab_rails['auto_migrate'] = false

    # Configure the Consul agent
@@ -1670,8 +1676,7 @@ On each node:
    alertmanager['enable'] = false
    prometheus['enable'] = false

-   # Prevent database connections during 'gitlab-ctl reconfigure'
-   gitlab_rails['rake_cache_clear'] = false
+   # Prevent database migrations from running on upgrade automatically
    gitlab_rails['auto_migrate'] = false

    # Configure the gitlab-shell API callback URL. Without this, `git push` will
@@ -1905,6 +1910,7 @@ To configure the Sidekiq nodes, on each one:
    gitlab_rails['db_password'] = '<postgresql_user_password>'
    gitlab_rails['db_adapter'] = 'postgresql'
    gitlab_rails['db_encoding'] = 'unicode'
+   # Prevent database migrations from running on upgrade automatically
    gitlab_rails['auto_migrate'] = false

    #######################################
@@ -2015,6 +2021,7 @@ On each node perform the following:
    gitlab_rails['db_host'] = '10.6.0.20' # internal load balancer IP
    gitlab_rails['db_port'] = 6432
    gitlab_rails['db_password'] = '<postgresql_user_password>'
+   # Prevent database migrations from running on upgrade automatically
    gitlab_rails['auto_migrate'] = false

    ## Redis connection details
@@ -2210,7 +2217,6 @@ To configure the Monitoring node:
    external_url 'http://gitlab.example.com'

    # Disable all other services
-   gitlab_rails['auto_migrate'] = false
    alertmanager['enable'] = false
    gitaly['enable'] = false
    gitlab_exporter['enable'] = false
@@ -2244,6 +2250,9 @@ To configure the Monitoring node:
    consul['configuration'] = {
       retry_join: %w(10.6.0.11 10.6.0.12 10.6.0.13)
    }
+
+   # Prevent database migrations from running on upgrade automatically
+   gitlab_rails['auto_migrate'] = false
    ```

 1. Save the file and [reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure).
@@ -2338,10 +2347,10 @@ to use GitLab Pages, this currently [requires NFS](troubleshooting.md#gitlab-pag
 See how to [configure NFS](../nfs.md).

 WARNING:
-From GitLab 13.0, using NFS for Git repositories is deprecated.
-From GitLab 14.0, technical support for NFS for Git repositories
-will no longer be provided. Upgrade to [Gitaly Cluster](../gitaly/praefect.md)
-as soon as possible.
+From GitLab 14.0, enhancements and bug fixes for NFS for Git repositories will no longer be
+considered and customer technical support will be considered out of scope.
+[Read more about Gitaly and NFS](../gitaly/index.md#nfs-deprecation-notice) and
+[the correct mount options to use](../nfs.md#upgrade-to-gitaly-cluster-or-disable-caching-if-experiencing-data-loss).

 <div align="right">
   <a type="button" class="btn btn-default" href="#setup-components">
@@ -2349,29 +2358,145 @@ as soon as possible.
   </a>
 </div>

-## Cloud Native Deployment (optional)
+## Cloud Native Hybrid reference architecture with Helm Charts (alternative)
+
+As an alternative approach, you can also run select components of GitLab as Cloud Native
+in Kubernetes via our official [Helm Charts](https://docs.gitlab.com/charts/).
+In this setup, we support running the equivalent of GitLab Rails and Sidekiq nodes
+in a Kubernetes cluster, named Webservice and Sidekiq respectively. In addition,
+the following other supporting services are supported: NGINX, Task Runner, Migrations,
+Prometheus and Grafana.

 Hybrid installations leverage the benefits of both cloud native and traditional
-deployments. We recommend shifting the Sidekiq and Webservice components into
-Kubernetes to reap cloud native workload management benefits while the others
-are deployed using the traditional server method already described.
+Kubernetes, you can reap certain cloud native workload management benefits while
+the others are deployed in compute VMs with Omnibus as described above in this
+page.

-The following sections detail this hybrid approach.
+NOTE:
+This is an **advanced** setup. Running services in Kubernetes is well known
+to be complex. **This setup is only recommended** if you have strong working
+knowledge and experience in Kubernetes. The rest of this
+section will assume this.

 ### Cluster topology

-The following table provides a starting point for hybrid
-deployment infrastructure. The recommendations use Google Cloud's Kubernetes Engine (GKE)
-and associated machine types, but the memory and CPU requirements should
-translate to most other providers.
+The following tables and diagram details the hybrid environment using the same formats
+as the normal environment above.
+
+First starting with the components that run in Kubernetes. The recommendations at this
+time use Google Cloud’s Kubernetes Engine (GKE) and associated machine types, but the memory
+and CPU requirements should translate to most other providers. We hope to update this in the
+future with further specific cloud provider details.
+
+| Service                                               | Nodes | Configuration           | GCP              | Allocatable CPUs and Memory |
+|-------------------------------------------------------|-------|-------------------------|------------------|-----------------------------|
+| Webservice                                            | 4     | 32 vCPU, 28.8 GB memory | `n1-standard-32` | 127.5 vCPU, 118 GB memory   |
+| Sidekiq                                               | 4     | 4 vCPU, 15 GB memory    | `n1-standard-4`  | 15.5 vCPU, 50 GB memory     |
+| Supporting services such as NGINX, Prometheus, etc... | 2     | 4 vCPU, 15 GB memory    | `n1-standard-4`  | 7.75 vCPU, 25 GB memory     |
+
+Next are the backend components that run on static compute VMs via Omnibus (or External PaaS
+services where applicable):
+
+| Service                                    | Nodes | Configuration           | GCP              |
+|--------------------------------------------|-------|-------------------------|------------------|
+| Consul*                                    | 3     | 2 vCPU, 1.8 GB memory   | `n1-highcpu-2`   |
+| PostgreSQL*                                | 3     | 8 vCPU, 30 GB memory    | `n1-standard-8`  |
+| PgBouncer*                                 | 3     | 2 vCPU, 1.8 GB memory   | `n1-highcpu-2`   |
+| Internal load balancing node               | 1     | 2 vCPU, 1.8 GB memory   | `n1-highcpu-2`   |
+| Redis - Cache**                            | 3     | 4 vCPU, 15 GB memory    | `n1-standard-4`  |
+| Redis - Queues / Shared State**            | 3     | 4 vCPU, 15 GB memory    | `n1-standard-4`  |
+| Redis Sentinel - Cache**                   | 3     | 1 vCPU, 1.7 GB memory   | `g1-small`       |
+| Redis Sentinel - Queues / Shared State**   | 3     | 1 vCPU, 1.7 GB memory   | `g1-small`       |
+| Gitaly                                     | 3     | 16 vCPU, 60 GB memory   | `n1-standard-16` |
+| Praefect                                   | 3     | 2 vCPU, 1.8 GB memory   | `n1-highcpu-2`   |
+| Praefect PostgreSQL*                       | 1+    | 2 vCPU, 1.8 GB memory   | `n1-highcpu-2`   |
+| Object storage                             | n/a   | n/a                     | n/a              |
+
+NOTE:
+Components marked with * can be optionally run on reputable
+third party external PaaS PostgreSQL solutions. Google Cloud SQL and AWS RDS are known to work.
+Components marked with ** can be optionally run on reputable
+third party external PaaS Redis solutions. Google Memorystore and AWS Elasticache are known to work.
+
+```plantuml
+@startuml 10k
+
+card "Kubernetes via Helm Charts" as kubernetes {
+  card "**External Load Balancer**" as elb #6a9be7
+
+  together {
+    collections "**Webservice** x4" as gitlab #32CD32
+    collections "**Sidekiq** x4" as sidekiq #ff8dd1
+  }
+
+  card "**Prometheus + Grafana**" as monitor #7FFFD4
+  card "**Supporting Services**" as support
+}
+
+card "**Internal Load Balancer**" as ilb #9370DB
+collections "**Consul** x3" as consul #e76a9b
+
+card "Gitaly Cluster" as gitaly_cluster {
+  collections "**Praefect** x3" as praefect #FF8C00
+  collections "**Gitaly** x3" as gitaly #FF8C00
+  card "**Praefect PostgreSQL***\n//Non fault-tolerant//" as praefect_postgres #FF8C00
+
+  praefect -[#FF8C00]-> gitaly
+  praefect -[#FF8C00]> praefect_postgres
+}
+
+card "Database" as database {
+  collections "**PGBouncer** x3" as pgbouncer #4EA7FF
+  card "**PostgreSQL** (Primary)" as postgres_primary #4EA7FF
+  collections "**PostgreSQL** (Secondary) x2" as postgres_secondary #4EA7FF
+
+  pgbouncer -[#4EA7FF]-> postgres_primary
+  postgres_primary .[#4EA7FF]> postgres_secondary
+}
+
+card "redis" as redis {
+  collections "**Redis Persistent** x3" as redis_persistent #FF6347
+  collections "**Redis Cache** x3" as redis_cache #FF6347
+  collections "**Redis Persistent Sentinel** x3" as redis_persistent_sentinel #FF6347
+  collections "**Redis Cache Sentinel** x3"as redis_cache_sentinel #FF6347
+
+  redis_persistent <.[#FF6347]- redis_persistent_sentinel
+  redis_cache <.[#FF6347]- redis_cache_sentinel
+}
+
+cloud "**Object Storage**" as object_storage #white
+
+elb -[#6a9be7]-> gitlab
+elb -[#6a9be7]-> monitor
+elb -[hidden]-> support
+
+gitlab -[#32CD32]> sidekiq
+gitlab -[#32CD32]--> ilb
+gitlab -[#32CD32]-> object_storage
+gitlab -[#32CD32]---> redis
+gitlab -[hidden]--> consul
+
+sidekiq -[#ff8dd1]--> ilb
+sidekiq -[#ff8dd1]-> object_storage
+sidekiq -[#ff8dd1]---> redis
+sidekiq -[hidden]--> consul
+
+ilb -[#9370DB]-> gitaly_cluster
+ilb -[#9370DB]-> database
+
+consul .[#e76a9b]-> database
+consul .[#e76a9b]-> gitaly_cluster
+consul .[#e76a9b,norank]--> redis

-Machine count | Machine type | Allocatable vCPUs | Allocatable memory (GB) | Purpose
--|-|-|-|-
-2 | `n1-standard-4` | 7.75 | 25 | Non-GitLab resources, including Grafana, NGINX, and Prometheus
-4 | `n1-standard-4` | 15.5 | 50 | GitLab Sidekiq pods
-4 | `n1-highcpu-32` | 127.5 | 118 | GitLab Webservice pods
+monitor .[#7FFFD4]> consul
+monitor .[#7FFFD4]-> database
+monitor .[#7FFFD4]-> gitaly_cluster
+monitor .[#7FFFD4,norank]--> redis
+monitor .[#7FFFD4]> ilb
+monitor .[#7FFFD4,norank]u--> elb

-"Allocatable" in this table refers to the amount of resources available to workloads deployed in Kubernetes _after_ accounting for the overhead of running Kubernetes itself.
+@enduml
+```

 ### Resource usage settings

@@ -2379,29 +2504,31 @@ The following formulas help when calculating how many pods may be deployed withi
 The [10k reference architecture example values file](https://gitlab.com/gitlab-org/charts/gitlab/-/blob/master/examples/ref/10k.yaml)
 documents how to apply the calculated configuration to the Helm Chart.

+#### Webservice
+
+Webservice pods typically need about 1 vCPU and 1.25 GB of memory _per worker_.
+Each Webservice pod will consume roughly 4 vCPUs and 5 GB of memory using
+the [recommended topology](#cluster-topology) because four worker processes
+are created by default and each pod has other small processes running.
+
+For 10k users we recommend a total Puma worker count of around 80.
+With the [provided recommendations](#cluster-topology) this allows the deployment of up to 20
+Webservice pods with 4 workers per pod and 5 pods per node. Expand available resources using
+the ratio of 1 vCPU to 1.25 GB of memory _per each worker process_ for each additional
+Webservice pod.
+
+For further information on resource usage, see the [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources).
+
 #### Sidekiq

 Sidekiq pods should generally have 1 vCPU and 2 GB of memory.

 [The provided starting point](#cluster-topology) allows the deployment of up to
-16 Sidekiq pods. Expand available resources using the 1vCPU to 2GB memory
+16 Sidekiq pods. Expand available resources using the 1 vCPU to 2GB memory
 ratio for each additional pod.

 For further information on resource usage, see the [Sidekiq resources](https://docs.gitlab.com/charts/charts/gitlab/sidekiq/#resources).

-#### Webservice
-
-Webservice pods typically need about 1 vCPU and 1.25 GB of memory _per worker_.
-Each Webservice pod will consume roughly 2 vCPUs and 2.5 GB of memory using
-the [recommended topology](#cluster-topology) because two worker processes
-are created by default.
-
-The [provided recommendations](#cluster-topology) allow the deployment of up to 28
-Webservice pods. Expand available resources using the ratio of 1 vCPU to 1.25 GB of memory
-_per each worker process_ for each additional Webservice pod.
-
-For further information on resource usage, see the [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources).
-
 <div align="right">
   <a type="button" class="btn btn-default" href="#setup-components">
     Back to setup components <i class="fa fa-angle-double-up" aria-hidden="true"></i>