Diffstat (limited to 'doc/administration/reference_architectures')
-rw-r--r-- doc/administration/reference_architectures/10k_users.md | 39
-rw-r--r-- doc/administration/reference_architectures/25k_users.md | 213
-rw-r--r-- doc/administration/reference_architectures/2k_users.md | 13
-rw-r--r-- doc/administration/reference_architectures/3k_users.md | 32
-rw-r--r-- doc/administration/reference_architectures/50k_users.md | 213
-rw-r--r-- doc/administration/reference_architectures/5k_users.md | 226
-rw-r--r-- doc/administration/reference_architectures/index.md | 28
-rw-r--r-- doc/administration/reference_architectures/troubleshooting.md | 2
8 files changed, 666 insertions, 100 deletions
diff --git a/doc/administration/reference_architectures/10k_users.md b/doc/administration/reference_architectures/10k_users.md
index 4627b27a45e..1fc3483fbd4 100644
--- a/doc/administration/reference_architectures/10k_users.md
+++ b/doc/administration/reference_architectures/10k_users.md
@@ -94,7 +94,6 @@ cloud "**Object Storage**" as object_storage #white
elb -[#6a9be7]-> gitlab
elb -[#6a9be7]--> monitor
-gitlab -[#32CD32]> sidekiq
gitlab -[#32CD32]--> ilb
gitlab -[#32CD32]-> object_storage
gitlab -[#32CD32]---> redis
@@ -598,8 +597,12 @@ in the second step, do not supply the `EXTERNAL_URL` value.
# Replace POSTGRESQL_PASSWORD_HASH with a generated md5 value
postgresql['sql_user_password'] = '<postgresql_password_hash>'
+ # Set up basic authentication for the Patroni API (use the same username/password in all nodes).
+ patroni['username'] = '<patroni_api_username>'
+ patroni['password'] = '<patroni_api_password>'
+
# Replace XXX.XXX.XXX.XXX/YY with Network Address
- postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.0/24)
+ postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.0/24 127.0.0.1/32)
# Set the network addresses that the exporters will listen on for monitoring
node_exporter['listen_address'] = '0.0.0.0:9100'
@@ -802,7 +805,7 @@ Managed Redis from cloud providers (such as AWS ElastiCache) will work. If these
services support high availability, be sure it _isn't_ of the Redis Cluster type.
Redis version 5.0 or higher is required, which is included with Omnibus GitLab
packages starting with GitLab 13.0. Older Redis versions don't support an
-optional count argument to SPOP, which is required for [Merge Trains](../../ci/merge_request_pipelines/pipelines_for_merged_results/merge_trains/index.md).
+optional count argument to SPOP, which is required for [Merge Trains](../../ci/pipelines/merge_trains.md).
Note the Redis node's IP address or hostname, port, and password (if required).
These will be necessary later when configuring the [GitLab application servers](#configure-gitlab-rails).
@@ -1403,7 +1406,7 @@ in the second step, do not supply the `EXTERNAL_URL` value.
postgresql['sql_user_password'] = "<praefect_postgresql_password_hash>"
# Replace XXX.XXX.XXX.XXX/YY with Network Address
- postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.0/24)
+ postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.0/24 127.0.0.1/32)
# Set the network addresses that the exporters will listen on for monitoring
node_exporter['listen_address'] = '0.0.0.0:9100'
@@ -1605,7 +1608,7 @@ To configure the Praefect nodes, on each one:
1. Praefect requires to run some database migrations, much like the main GitLab application. For this
you should select **one Praefect node only to run the migrations**, AKA the _Deploy Node_. This node
must be configured first before the others as follows:
-
+
1. In the `/etc/gitlab/gitlab.rb` file, change the `praefect['auto_migrate']` setting value from `false` to `true`
1. To ensure database migrations are only run during reconfigure and not automatically on upgrade, run:
@@ -1613,7 +1616,7 @@ To configure the Praefect nodes, on each one:
```shell
sudo touch /etc/gitlab/skip-auto-reconfigure
```
-
+
1. [Reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect and
to run the Praefect database migrations.
@@ -1681,7 +1684,7 @@ On each node:
# balancer.
gitlab_rails['internal_api_url'] = 'https://gitlab.example.com'
- # Gitaly
+ # Gitaly
gitaly['enable'] = true
# Make Gitaly accept connections on all network interfaces. You must use
@@ -2344,10 +2347,13 @@ to use GitLab Pages, this currently [requires NFS](troubleshooting.md#gitlab-pag
See how to [configure NFS](../nfs.md).
WARNING:
-From GitLab 14.0, enhancements and bug fixes for NFS for Git repositories will no longer be
-considered and customer technical support will be considered out of scope.
-[Read more about Gitaly and NFS](../gitaly/index.md#nfs-deprecation-notice) and
-[the correct mount options to use](../nfs.md#upgrade-to-gitaly-cluster-or-disable-caching-if-experiencing-data-loss).
+Engineering support for NFS for Git repositories is deprecated. Technical support is planned to be
+unavailable from GitLab 15.0. No further enhancements are planned for this feature.
+
+Read:
+
+- The [Gitaly and NFS deprecation notice](../gitaly/index.md#nfs-deprecation-notice).
+- About the [correct mount options to use](../nfs.md#upgrade-to-gitaly-cluster-or-disable-caching-if-experiencing-data-loss).
<div align="right">
<a type="button" class="btn btn-default" href="#setup-components">
@@ -2365,9 +2371,9 @@ the following other supporting services are supported: NGINX, Task Runner, Migra
Prometheus and Grafana.
Hybrid installations leverage the benefits of both cloud native and traditional
-Kubernetes, you can reap certain cloud native workload management benefits while
-the others are deployed in compute VMs with Omnibus as described above in this
-page.
+compute deployments. With this, _stateless_ components can benefit from cloud native
+workload management benefits while _stateful_ components are deployed in compute VMs
+with Omnibus to benefit from increased permanence.
NOTE:
This is an **advanced** setup. Running services in Kubernetes is well known
@@ -2389,7 +2395,7 @@ future with further specific cloud provider details.
|-------------------------------------------------------|----------|-------------------------|------------------|-----------------------------|
| Webservice | 4 | 32 vCPU, 28.8 GB memory | `n1-highcpu-32` | 127.5 vCPU, 118 GB memory |
| Sidekiq | 4 | 4 vCPU, 15 GB memory | `n1-standard-4` | 15.5 vCPU, 50 GB memory |
-| Supporting services such as NGINX, Prometheus, etc. | 2 | 4 vCPU, 15 GB memory | `n1-standard-4` | 7.75 vCPU, 25 GB memory |
+| Supporting services such as NGINX or Prometheus | 2 | 4 vCPU, 15 GB memory | `n1-standard-4` | 7.75 vCPU, 25 GB memory |
<!-- Disable ordered list rule https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md#md029---ordered-list-item-prefix -->
<!-- markdownlint-disable MD029 -->
@@ -2478,7 +2484,6 @@ elb -[#6a9be7]-> gitlab
elb -[#6a9be7]-> monitor
elb -[hidden]-> support
-gitlab -[#32CD32]> sidekiq
gitlab -[#32CD32]--> ilb
gitlab -[#32CD32]-> object_storage
gitlab -[#32CD32]---> redis
@@ -2532,7 +2537,7 @@ For further information on resource usage, see the [Webservice resources](https:
Sidekiq pods should generally have 1 vCPU and 2 GB of memory.
[The provided starting point](#cluster-topology) allows the deployment of up to
-16 Sidekiq pods. Expand available resources using the 1 vCPU to 2GB memory
+14 Sidekiq pods. Expand available resources using the 1 vCPU to 2GB memory
ratio for each additional pod.
For further information on resource usage, see the [Sidekiq resources](https://docs.gitlab.com/charts/charts/gitlab/sidekiq/#resources).
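
Note on the `postgresql['trust_auth_cidr_addresses']` hunks above: the change appends `127.0.0.1/32` to the existing `10.6.0.0/24` entry. The diff does not say why. A minimal sketch, using Ruby's standard `ipaddr` library, of which client addresses each version of the list would match. The helper `trusted?` and the sample client addresses are hypothetical and are not part of Omnibus GitLab; the sketch only illustrates CIDR matching, not how PostgreSQL evaluates its authentication rules.

```ruby
require 'ipaddr'

# CIDR lists copied from the removed and added lines of the diff.
old_trust_cidrs = %w(10.6.0.0/24)
new_trust_cidrs = %w(10.6.0.0/24 127.0.0.1/32)

# Hypothetical helper: true when client_ip falls inside at least one
# of the CIDR blocks in cidrs.
def trusted?(client_ip, cidrs)
  ip = IPAddr.new(client_ip)
  cidrs.any? { |cidr| IPAddr.new(cidr).include?(ip) }
end

# Sample addresses are made up for illustration.
%w(10.6.0.21 127.0.0.1 10.7.0.5).each do |ip|
  before = trusted?(ip, old_trust_cidrs)
  after  = trusted?(ip, new_trust_cidrs)
  puts "#{ip}: before=#{before} after=#{after}"
end
```

With the new list, only the loopback address changes state: it is matched after the change and was not matched before it.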
diff --git a/doc/administration/reference_architectures/25k_users.md b/doc/administration/reference_architectures/25k_users.md
index 1f72c45c2b7..e45a8f6963c 100644
--- a/doc/administration/reference_architectures/25k_users.md
+++ b/doc/administration/reference_architectures/25k_users.md
@@ -19,7 +19,7 @@ full list of reference architectures, see
|------------------------------------------|-------------|-------------------------|------------------|--------------|-----------|
| External load balancing node(3) | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` | `F4s v2` |
| Consul(1) | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| PostgreSQL(1) | 3 | 16 vCPU, 60 GB memory | `n1-standard-1` | `m5.4xlarge` | `D16s v3` |
+| PostgreSQL(1) | 3 | 16 vCPU, 60 GB memory | `n1-standard-16` | `m5.4xlarge` | `D16s v3` |
| PgBouncer(1) | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
| Internal load balancing node(3) | 1 | 4 vCPU, 3.6GB memory | `n1-highcpu-4` | `c5.large` | `F2s v2` |
| Redis - Cache(2) | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
@@ -94,7 +94,6 @@ cloud "**Object Storage**" as object_storage #white
elb -[#6a9be7]-> gitlab
elb -[#6a9be7]--> monitor
-gitlab -[#32CD32]> sidekiq
gitlab -[#32CD32]--> ilb
gitlab -[#32CD32]-> object_storage
gitlab -[#32CD32]---> redis
@@ -600,8 +599,12 @@ in the second step, do not supply the `EXTERNAL_URL` value.
# Replace POSTGRESQL_PASSWORD_HASH with a generated md5 value
postgresql['sql_user_password'] = '<postgresql_password_hash>'
+ # Set up basic authentication for the Patroni API (use the same username/password in all nodes).
+ patroni['username'] = '<patroni_api_username>'
+ patroni['password'] = '<patroni_api_password>'
+
# Replace XXX.XXX.XXX.XXX/YY with Network Address
- postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.0/24)
+ postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.0/24 127.0.0.1/32)
# Set the network addresses that the exporters will listen on for monitoring
node_exporter['listen_address'] = '0.0.0.0:9100'
@@ -804,7 +807,7 @@ Managed Redis from cloud providers (such as AWS ElastiCache) will work. If these
services support high availability, be sure it _isn't_ of the Redis Cluster type.
Redis version 5.0 or higher is required, which is included with Omnibus GitLab
packages starting with GitLab 13.0. Older Redis versions don't support an
-optional count argument to SPOP, which is required for [Merge Trains](../../ci/merge_request_pipelines/pipelines_for_merged_results/merge_trains/index.md).
+optional count argument to SPOP, which is required for [Merge Trains](../../ci/pipelines/merge_trains.md).
Note the Redis node's IP address or hostname, port, and password (if required).
These will be necessary later when configuring the [GitLab application servers](#configure-gitlab-rails).
@@ -863,7 +866,7 @@ a node and change its status from primary to replica (and vice versa).
redis_exporter['flags'] = {
'redis.addr' => 'redis://10.6.0.51:6379',
'redis.password' => 'redis-password-goes-here',
- }
+ }
# Prevent database migrations from running on upgrade automatically
gitlab_rails['auto_migrate'] = false
@@ -1421,7 +1424,7 @@ in the second step, do not supply the `EXTERNAL_URL` value.
postgresql['sql_user_password'] = "<praefect_postgresql_password_hash>"
# Replace XXX.XXX.XXX.XXX/YY with Network Address
- postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.0/24)
+ postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.0/24 127.0.0.1/32)
# Set the network addresses that the exporters will listen on for monitoring
node_exporter['listen_address'] = '0.0.0.0:9100'
@@ -1623,7 +1626,7 @@ the file of the same name on this server. If this is the first Omnibus node you
1. Praefect requires to run some database migrations, much like the main GitLab application. For this
you should select **one Praefect node only to run the migrations**, AKA the _Deploy Node_. This node
must be configured first before the others as follows:
-
+
1. In the `/etc/gitlab/gitlab.rb` file, change the `praefect['auto_migrate']` setting value from `false` to `true`
1. To ensure database migrations are only run during reconfigure and not automatically on upgrade, run:
@@ -1631,7 +1634,7 @@ the file of the same name on this server. If this is the first Omnibus node you
```shell
sudo touch /etc/gitlab/skip-auto-reconfigure
```
-
+
1. [Reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect and
to run the Praefect database migrations.
@@ -1699,7 +1702,7 @@ On each node:
# balancer.
gitlab_rails['internal_api_url'] = 'https://gitlab.example.com'
- # Gitaly
+ # Gitaly
gitaly['enable'] = true
# Make Gitaly accept connections on all network interfaces. You must use
@@ -2362,10 +2365,194 @@ to use GitLab Pages, this currently [requires NFS](troubleshooting.md#gitlab-pag
See how to [configure NFS](../nfs.md).
WARNING:
-From GitLab 14.0, enhancements and bug fixes for NFS for Git repositories will no longer be
-considered and customer technical support will be considered out of scope.
-[Read more about Gitaly and NFS](../gitaly/index.md#nfs-deprecation-notice) and
-[the correct mount options to use](../nfs.md#upgrade-to-gitaly-cluster-or-disable-caching-if-experiencing-data-loss).
+Engineering support for NFS for Git repositories is deprecated. Technical support is planned to be
+unavailable from GitLab 15.0. No further enhancements are planned for this feature.
+
+Read:
+
+- The [Gitaly and NFS deprecation notice](../gitaly/index.md#nfs-deprecation-notice).
+- About the [correct mount options to use](../nfs.md#upgrade-to-gitaly-cluster-or-disable-caching-if-experiencing-data-loss).
+
+## Cloud Native Hybrid reference architecture with Helm Charts (alternative)
+
+As an alternative approach, you can also run select components of GitLab as Cloud Native
+in Kubernetes via our official [Helm Charts](https://docs.gitlab.com/charts/).
+In this setup, we support running the equivalent of GitLab Rails and Sidekiq nodes
+in a Kubernetes cluster, named Webservice and Sidekiq respectively. In addition,
+the following other supporting services are supported: NGINX, Task Runner, Migrations,
+Prometheus and Grafana.
+
+Hybrid installations leverage the benefits of both cloud native and traditional
+compute deployments. With this, _stateless_ components can benefit from cloud native
+workload management benefits while _stateful_ components are deployed in compute VMs
+with Omnibus to benefit from increased permanence.
+
+NOTE:
+This is an **advanced** setup. Running services in Kubernetes is well known
+to be complex. **This setup is only recommended** if you have strong working
+knowledge and experience in Kubernetes. The rest of this
+section will assume this.
+
+### Cluster topology
+
+The following tables and diagram details the hybrid environment using the same formats
+as the normal environment above.
+
+First starting with the components that run in Kubernetes. The recommendations at this
+time use Google Cloud’s Kubernetes Engine (GKE) and associated machine types, but the memory
+and CPU requirements should translate to most other providers. We hope to update this in the
+future with further specific cloud provider details.
+
+| Service | Nodes(1) | Configuration | GCP | Allocatable CPUs and Memory |
+|-------------------------------------------------------|----------|-------------------------|------------------|-----------------------------|
+| Webservice | 7 | 32 vCPU, 28.8 GB memory | `n1-highcpu-32` | 223 vCPU, 206.5 GB memory |
+| Sidekiq | 4 | 4 vCPU, 15 GB memory | `n1-standard-4` | 15.5 vCPU, 50 GB memory |
+| Supporting services such as NGINX, Prometheus, etc. | 2 | 4 vCPU, 15 GB memory | `n1-standard-4` | 7.75 vCPU, 25 GB memory |
+
+<!-- Disable ordered list rule https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md#md029---ordered-list-item-prefix -->
+<!-- markdownlint-disable MD029 -->
+1. Nodes configuration is shown as it is forced to ensure pod vcpu / memory ratios and avoid scaling during **performance testing**.
+ In production deployments there is no need to assign pods to nodes. A minimum of three nodes in three different availability zones is strongly recommended to align with resilient cloud architecture practices.
+<!-- markdownlint-enable MD029 -->
+
+Next are the backend components that run on static compute VMs via Omnibus (or External PaaS
+services where applicable):
+
+| Service | Nodes | Configuration | GCP |
+|--------------------------------------------|-------|-------------------------|------------------|
+| Consul(1) | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| PostgreSQL(1) | 3 | 16 vCPU, 60 GB memory | `n1-standard-16` |
+| PgBouncer(1) | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| Internal load balancing node(3) | 1 | 4 vCPU, 3.6GB memory | `n1-highcpu-4` |
+| Redis - Cache(2) | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` |
+| Redis - Queues / Shared State(2) | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` |
+| Redis Sentinel - Cache(2) | 3 | 1 vCPU, 3.75 GB memory | `n1-standard-1` |
+| Redis Sentinel - Queues / Shared State(2) | 3 | 1 vCPU, 3.75 GB memory | `n1-standard-1` |
+| Gitaly | 3 | 32 vCPU, 120 GB memory | `n1-standard-32` |
+| Praefect | 3 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` |
+| Praefect PostgreSQL(1) | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| Object storage(4) | n/a | n/a | n/a |
+
+<!-- Disable ordered list rule https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md#md029---ordered-list-item-prefix -->
+<!-- markdownlint-disable MD029 -->
+1. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. Google Cloud SQL and AWS RDS are known to work, however Azure Database for PostgreSQL is [not recommended](https://gitlab.com/gitlab-org/quality/reference-architectures/-/issues/61) due to performance issues. Consul is primarily used for PostgreSQL high availability so can be ignored when using a PostgreSQL PaaS setup. However it is also used optionally by Prometheus for Omnibus auto host discovery.
+2. Can be optionally run on reputable third-party external PaaS Redis solutions. Google Memorystore and AWS Elasticache are known to work.
+3. Can be optionally run on reputable third-party load balancing services (LB PaaS). AWS ELB is known to work.
+4. Should be run on reputable third party object storage (storage PaaS) for cloud implementations. Google Cloud Storage and AWS S3 are known to work.
+<!-- markdownlint-enable MD029 -->
+
+NOTE:
+For all PaaS solutions that involve configuring instances, it is strongly recommended to implement a minimum of three nodes in three different availability zones to align with resilient cloud architecture practices.
+
+```plantuml
+@startuml 25k
+
+card "Kubernetes via Helm Charts" as kubernetes {
+ card "**External Load Balancer**" as elb #6a9be7
+
+ together {
+ collections "**Webservice** x7" as gitlab #32CD32
+ collections "**Sidekiq** x4" as sidekiq #ff8dd1
+ }
+
+ card "**Prometheus + Grafana**" as monitor #7FFFD4
+ card "**Supporting Services**" as support
+}
+
+card "**Internal Load Balancer**" as ilb #9370DB
+collections "**Consul** x3" as consul #e76a9b
+
+card "Gitaly Cluster" as gitaly_cluster {
+ collections "**Praefect** x3" as praefect #FF8C00
+ collections "**Gitaly** x3" as gitaly #FF8C00
+ card "**Praefect PostgreSQL***\n//Non fault-tolerant//" as praefect_postgres #FF8C00
+
+ praefect -[#FF8C00]-> gitaly
+ praefect -[#FF8C00]> praefect_postgres
+}
+
+card "Database" as database {
+ collections "**PGBouncer** x3" as pgbouncer #4EA7FF
+ card "**PostgreSQL** (Primary)" as postgres_primary #4EA7FF
+ collections "**PostgreSQL** (Secondary) x2" as postgres_secondary #4EA7FF
+
+ pgbouncer -[#4EA7FF]-> postgres_primary
+ postgres_primary .[#4EA7FF]> postgres_secondary
+}
+
+card "redis" as redis {
+ collections "**Redis Persistent** x3" as redis_persistent #FF6347
+ collections "**Redis Cache** x3" as redis_cache #FF6347
+ collections "**Redis Persistent Sentinel** x3" as redis_persistent_sentinel #FF6347
+ collections "**Redis Cache Sentinel** x3"as redis_cache_sentinel #FF6347
+
+ redis_persistent <.[#FF6347]- redis_persistent_sentinel
+ redis_cache <.[#FF6347]- redis_cache_sentinel
+}
+
+cloud "**Object Storage**" as object_storage #white
+
+elb -[#6a9be7]-> gitlab
+elb -[#6a9be7]-> monitor
+elb -[hidden]-> support
+
+gitlab -[#32CD32]--> ilb
+gitlab -[#32CD32]-> object_storage
+gitlab -[#32CD32]---> redis
+gitlab -[hidden]--> consul
+
+sidekiq -[#ff8dd1]--> ilb
+sidekiq -[#ff8dd1]-> object_storage
+sidekiq -[#ff8dd1]---> redis
+sidekiq -[hidden]--> consul
+
+ilb -[#9370DB]-> gitaly_cluster
+ilb -[#9370DB]-> database
+
+consul .[#e76a9b]-> database
+consul .[#e76a9b]-> gitaly_cluster
+consul .[#e76a9b,norank]--> redis
+
+monitor .[#7FFFD4]> consul
+monitor .[#7FFFD4]-> database
+monitor .[#7FFFD4]-> gitaly_cluster
+monitor .[#7FFFD4,norank]--> redis
+monitor .[#7FFFD4]> ilb
+monitor .[#7FFFD4,norank]u--> elb
+
+@enduml
+```
+
+### Resource usage settings
+
+The following formulas help when calculating how many pods may be deployed within resource constraints.
+The [25k reference architecture example values file](https://gitlab.com/gitlab-org/charts/gitlab/-/blob/master/examples/ref/25k.yaml)
+documents how to apply the calculated configuration to the Helm Chart.
+
+#### Webservice
+
+Webservice pods typically need about 1 vCPU and 1.25 GB of memory _per worker_.
+Each Webservice pod will consume roughly 4 vCPUs and 5 GB of memory using
+the [recommended topology](#cluster-topology) because four worker processes
+are created by default and each pod has other small processes running.
+
+For 25k users we recommend a total Puma worker count of around 140.
+With the [provided recommendations](#cluster-topology) this allows the deployment of up to 35
+Webservice pods with 4 workers per pod and 5 pods per node. Expand available resources using
+the ratio of 1 vCPU to 1.25 GB of memory _per each worker process_ for each additional
+Webservice pod.
+
+For further information on resource usage, see the [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources).
+
+#### Sidekiq
+
+Sidekiq pods should generally have 1 vCPU and 2 GB of memory.
+
+[The provided starting point](#cluster-topology) allows the deployment of up to
+14 Sidekiq pods. Expand available resources using the 1 vCPU to 2GB memory
+ratio for each additional pod.
+
+For further information on resource usage, see the [Sidekiq resources](https://docs.gitlab.com/charts/charts/gitlab/sidekiq/#resources).
<div align="right">
<a type="button" class="btn btn-default" href="#setup-components">
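
Note on the Webservice figures added to `25k_users.md` above: the pod and worker counts can be cross-checked with simple arithmetic. A minimal sketch, assuming the per-pod figures quoted in the added text (roughly 4 vCPU and 5 GB per pod, 4 workers per pod, 5 pods per node, 7 nodes). The variable names are illustrative only and do not correspond to Helm Chart settings.

```ruby
# Figures taken from the added 25k Cloud Native Hybrid section.
nodes             = 7      # Webservice nodes (n1-highcpu-32)
pods_per_node     = 5
workers_per_pod   = 4      # Puma workers created by default
vcpu_per_pod      = 4.0    # roughly 1 vCPU per worker
memory_gb_per_pod = 5.0    # roughly 1.25 GB per worker
allocatable_vcpu  = 223.0  # from the cluster topology table
allocatable_gb    = 206.5  # from the cluster topology table

pods          = nodes * pods_per_node
workers       = pods * workers_per_pod
vcpu_needed   = pods * vcpu_per_pod
memory_needed = pods * memory_gb_per_pod
fits          = vcpu_needed <= allocatable_vcpu && memory_needed <= allocatable_gb

puts "pods=#{pods} workers=#{workers}"
puts "vCPU #{vcpu_needed} of #{allocatable_vcpu}, memory #{memory_needed} of #{allocatable_gb} GB"
puts "fits=#{fits}"
```

This gives 35 pods and 140 workers, matching the totals stated in the added text, and both resource totals stay under the allocatable values in the table.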
diff --git a/doc/administration/reference_architectures/2k_users.md b/doc/administration/reference_architectures/2k_users.md
index 7db3a343e0b..ff3db877553 100644
--- a/doc/administration/reference_architectures/2k_users.md
+++ b/doc/administration/reference_architectures/2k_users.md
@@ -324,7 +324,7 @@ to be used with GitLab.
Redis version 5.0 or higher is required, as this is what ships with
Omnibus GitLab packages starting with GitLab 13.0. Older Redis versions
do not support an optional count argument to SPOP which is now required for
-[Merge Trains](../../ci/merge_request_pipelines/pipelines_for_merged_results/merge_trains/index.md).
+[Merge Trains](../../ci/pipelines/merge_trains.md).
In addition, GitLab makes use of certain commands like `UNLINK` and `USAGE` which
were introduced only in Redis 4.
@@ -965,10 +965,13 @@ possible. However, if you intend to use GitLab Pages,
See how to [configure NFS](../nfs.md).
WARNING:
-From GitLab 14.0, enhancements and bug fixes for NFS for Git repositories will no longer be
-considered and customer technical support will be considered out of scope.
-[Read more about Gitaly and NFS](../gitaly/index.md#nfs-deprecation-notice) and
-[the correct mount options to use](../nfs.md#upgrade-to-gitaly-cluster-or-disable-caching-if-experiencing-data-loss).
+Engineering support for NFS for Git repositories is deprecated. Technical support is planned to be
+unavailable from GitLab 15.0. No further enhancements are planned for this feature.
+
+Read:
+
+- The [Gitaly and NFS deprecation notice](../gitaly/index.md#nfs-deprecation-notice).
+- About the [correct mount options to use](../nfs.md#upgrade-to-gitaly-cluster-or-disable-caching-if-experiencing-data-loss).
<div align="right">
<a type="button" class="btn btn-default" href="#setup-components">
diff --git a/doc/administration/reference_architectures/3k_users.md b/doc/administration/reference_architectures/3k_users.md
index bca5e4c3dab..ef58e69ee27 100644
--- a/doc/administration/reference_architectures/3k_users.md
+++ b/doc/administration/reference_architectures/3k_users.md
@@ -101,7 +101,6 @@ cloud "**Object Storage**" as object_storage #white
elb -[#6a9be7]-> gitlab
elb -[#6a9be7]--> monitor
-gitlab -[#32CD32]> sidekiq
gitlab -[#32CD32]--> ilb
gitlab -[#32CD32]-> object_storage
gitlab -[#32CD32]---> redis
@@ -440,7 +439,7 @@ services support high availability, be sure it is **not** the Redis Cluster type
Redis version 5.0 or higher is required, as this is what ships with
Omnibus GitLab packages starting with GitLab 13.0. Older Redis versions
do not support an optional count argument to SPOP which is now required for
-[Merge Trains](../../ci/merge_request_pipelines/pipelines_for_merged_results/merge_trains/index.md).
+[Merge Trains](../../ci/pipelines/merge_trains.md).
Note the Redis node's IP address or hostname, port, and password (if required).
These will be necessary when configuring the
@@ -829,7 +828,7 @@ in the second step, do not supply the `EXTERNAL_URL` value.
username of `gitlab_replicator` (recommended). The command will request a password
and a confirmation. Use the value that is output by this command in the next step
as the value of `<postgresql_replication_password_hash>`:
-
+
```shell
sudo gitlab-ctl pg-password-md5 gitlab_replicator
```
@@ -848,7 +847,7 @@ in the second step, do not supply the `EXTERNAL_URL` value.
```ruby
# Disable all components except Patroni and Consul
roles(['patroni_role'])
-
+
# PostgreSQL configuration
postgresql['listen_address'] = '0.0.0.0'
@@ -866,7 +865,7 @@ in the second step, do not supply the `EXTERNAL_URL` value.
# Prevent database migrations from running on upgrade automatically
gitlab_rails['auto_migrate'] = false
-
+
# Configure the Consul agent
consul['services'] = %w(postgresql)
## Enable service discovery for Prometheus
@@ -882,8 +881,12 @@ in the second step, do not supply the `EXTERNAL_URL` value.
# Replace POSTGRESQL_PASSWORD_HASH with a generated md5 value
postgresql['sql_user_password'] = '<postgresql_password_hash>'
+ # Set up basic authentication for the Patroni API (use the same username/password in all nodes).
+ patroni['username'] = '<patroni_api_username>'
+ patroni['password'] = '<patroni_api_password>'
+
# Replace XXX.XXX.XXX.XXX/YY with Network Address
- postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.0/24)
+ postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.0/24 127.0.0.1/32)
# Set the network addresses that the exporters will listen on for monitoring
node_exporter['listen_address'] = '0.0.0.0:9100'
@@ -1127,7 +1130,7 @@ in the second step, do not supply the `EXTERNAL_URL` value.
postgresql['sql_user_password'] = "<praefect_postgresql_password_hash>"
# Replace XXX.XXX.XXX.XXX/YY with Network Address
- postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.0/24)
+ postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.0/24 127.0.0.1/32)
# Set the network addresses that the exporters will listen on for monitoring
node_exporter['listen_address'] = '0.0.0.0:9100'
@@ -1328,7 +1331,7 @@ the file of the same name on this server. If this is the first Omnibus node you
1. Praefect requires to run some database migrations, much like the main GitLab application. For this
you should select **one Praefect node only to run the migrations**, AKA the _Deploy Node_. This node
must be configured first before the others as follows:
-
+
1. In the `/etc/gitlab/gitlab.rb` file, change the `praefect['auto_migrate']` setting value from `false` to `true`
1. To ensure database migrations are only run during reconfigure and not automatically on upgrade, run:
@@ -1336,7 +1339,7 @@ the file of the same name on this server. If this is the first Omnibus node you
```shell
sudo touch /etc/gitlab/skip-auto-reconfigure
```
-
+
1. [Reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect and
to run the Praefect database migrations.
@@ -2062,10 +2065,13 @@ to use GitLab Pages, this currently [requires NFS](troubleshooting.md#gitlab-pag
See how to [configure NFS](../nfs.md).
WARNING:
-From GitLab 14.0, enhancements and bug fixes for NFS for Git repositories will no longer be
-considered and customer technical support will be considered out of scope.
-[Read more about Gitaly and NFS](../gitaly/index.md#nfs-deprecation-notice) and
-[the correct mount options to use](../nfs.md#upgrade-to-gitaly-cluster-or-disable-caching-if-experiencing-data-loss).
+Engineering support for NFS for Git repositories is deprecated. Technical support is planned to be
+unavailable from GitLab 15.0. No further enhancements are planned for this feature.
+
+Read:
+
+- The [Gitaly and NFS deprecation notice](../gitaly/index.md#nfs-deprecation-notice).
+- About the [correct mount options to use](../nfs.md#upgrade-to-gitaly-cluster-or-disable-caching-if-experiencing-data-loss).
## Supported modifications for lower user counts (HA)
diff --git a/doc/administration/reference_architectures/50k_users.md b/doc/administration/reference_architectures/50k_users.md
index b3324cb75fb..766f94f6c53 100644
--- a/doc/administration/reference_architectures/50k_users.md
+++ b/doc/administration/reference_architectures/50k_users.md
@@ -94,7 +94,6 @@ cloud "**Object Storage**" as object_storage #white
elb -[#6a9be7]-> gitlab
elb -[#6a9be7]--> monitor
-gitlab -[#32CD32]> sidekiq
gitlab -[#32CD32]--> ilb
gitlab -[#32CD32]-> object_storage
gitlab -[#32CD32]---> redis
@@ -608,8 +607,12 @@ in the second step, do not supply the `EXTERNAL_URL` value.
# Replace POSTGRESQL_PASSWORD_HASH with a generated md5 value
postgresql['sql_user_password'] = '<postgresql_password_hash>'
+ # Set up basic authentication for the Patroni API (use the same username/password in all nodes).
+ patroni['username'] = '<patroni_api_username>'
+ patroni['password'] = '<patroni_api_password>'
+
# Replace XXX.XXX.XXX.XXX/YY with Network Address
- postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.0/24)
+ postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.0/24 127.0.0.1/32)
# Set the network addresses that the exporters will listen on for monitoring
node_exporter['listen_address'] = '0.0.0.0:9100'
@@ -812,7 +815,7 @@ Managed Redis from cloud providers (such as AWS ElastiCache) will work. If these
services support high availability, be sure it _isn't_ of the Redis Cluster type.
Redis version 5.0 or higher is required, which is included with Omnibus GitLab
packages starting with GitLab 13.0. Older Redis versions don't support an
-optional count argument to SPOP, which is required for [Merge Trains](../../ci/merge_request_pipelines/pipelines_for_merged_results/merge_trains/index.md).
+optional count argument to SPOP, which is required for [Merge Trains](../../ci/pipelines/merge_trains.md).
Note the Redis node's IP address or hostname, port, and password (if required).
These will be necessary later when configuring the [GitLab application servers](#configure-gitlab-rails).
@@ -872,7 +875,7 @@ a node and change its status from primary to replica (and vice versa).
'redis.addr' => 'redis://10.6.0.51:6379',
'redis.password' => 'redis-password-goes-here',
}
-
+
# Prevent database migrations from running on upgrade automatically
gitlab_rails['auto_migrate'] = false
```
@@ -1425,7 +1428,7 @@ in the second step, do not supply the `EXTERNAL_URL` value.
postgresql['sql_user_password'] = "<praefect_postgresql_password_hash>"
# Replace XXX.XXX.XXX.XXX/YY with Network Address
- postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.0/24)
+ postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.0/24 127.0.0.1/32)
# Set the network addresses that the exporters will listen on for monitoring
node_exporter['listen_address'] = '0.0.0.0:9100'
@@ -1627,7 +1630,7 @@ the file of the same name on this server. If this is the first Omnibus node you
1. Praefect needs to run some database migrations, much like the main GitLab application. For this
you should select **one Praefect node only to run the migrations**, AKA the _Deploy Node_. This node
must be configured first before the others as follows:
-
+
1. In the `/etc/gitlab/gitlab.rb` file, change the `praefect['auto_migrate']` setting value from `false` to `true`
1. To ensure database migrations are only run during reconfigure and not automatically on upgrade, run:
@@ -1635,7 +1638,7 @@ the file of the same name on this server. If this is the first Omnibus node you
```shell
sudo touch /etc/gitlab/skip-auto-reconfigure
```
-
+
1. [Reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect and
to run the Praefect database migrations.
@@ -1703,7 +1706,7 @@ On each node:
# balancer.
gitlab_rails['internal_api_url'] = 'https://gitlab.example.com'
- # Gitaly
+ # Gitaly
gitaly['enable'] = true
# Make Gitaly accept connections on all network interfaces. You must use
@@ -1929,7 +1932,7 @@ To configure the Sidekiq nodes, on each one:
## Set number of Sidekiq threads per queue process to the recommended number of 10
sidekiq['max_concurrency'] = 10
- # Monitoring
+ # Monitoring
consul['enable'] = true
consul['monitoring_service_discovery'] = true
@@ -2373,10 +2376,194 @@ to use GitLab Pages, this currently [requires NFS](troubleshooting.md#gitlab-pag
See how to [configure NFS](../nfs.md).
WARNING:
-From GitLab 14.0, enhancements and bug fixes for NFS for Git repositories will no longer be
-considered and customer technical support will be considered out of scope.
-[Read more about Gitaly and NFS](../gitaly/index.md#nfs-deprecation-notice) and
-[the correct mount options to use](../nfs.md#upgrade-to-gitaly-cluster-or-disable-caching-if-experiencing-data-loss).
+Engineering support for NFS for Git repositories is deprecated. Technical support is planned to be
+unavailable from GitLab 15.0. No further enhancements are planned for this feature.
+
+Read:
+
+- The [Gitaly and NFS deprecation notice](../gitaly/index.md#nfs-deprecation-notice).
+- About the [correct mount options to use](../nfs.md#upgrade-to-gitaly-cluster-or-disable-caching-if-experiencing-data-loss).
+
+## Cloud Native Hybrid reference architecture with Helm Charts (alternative)
+
+As an alternative approach, you can also run select components of GitLab as Cloud Native
+in Kubernetes via our official [Helm Charts](https://docs.gitlab.com/charts/).
+In this setup, we support running the equivalent of GitLab Rails and Sidekiq nodes
+in a Kubernetes cluster, named Webservice and Sidekiq respectively. In addition,
+the following supporting services are also supported: NGINX, Task Runner, Migrations,
+Prometheus and Grafana.
+
+Hybrid installations leverage the benefits of both cloud native and traditional
+compute deployments. With this, _stateless_ components can take advantage of cloud native
+workload management while _stateful_ components are deployed in compute VMs
+with Omnibus to benefit from increased permanence.
+
+NOTE:
+This is an **advanced** setup. Running services in Kubernetes is well known
+to be complex. **This setup is only recommended** if you have strong working
+knowledge and experience in Kubernetes. The rest of this
+section will assume this.
+
+### Cluster topology
+
+The following tables and diagram detail the hybrid environment using the same formats
+as the normal environment above.
+
+First are the components that run in Kubernetes. The recommendations at this
+time use Google Cloud’s Kubernetes Engine (GKE) and associated machine types, but the memory
+and CPU requirements should translate to most other providers. We hope to update this in the
+future with further specific cloud provider details.
+
+| Service | Nodes(1) | Configuration | GCP | Allocatable CPUs and Memory |
+|-------------------------------------------------------|----------|-------------------------|------------------|-----------------------------|
+| Webservice | 16 | 32 vCPU, 28.8 GB memory | `n1-highcpu-32` | 510 vCPU, 472 GB memory |
+| Sidekiq | 4 | 4 vCPU, 15 GB memory | `n1-standard-4` | 15.5 vCPU, 50 GB memory |
+| Supporting services such as NGINX, Prometheus, etc. | 2 | 4 vCPU, 15 GB memory | `n1-standard-4` | 7.75 vCPU, 25 GB memory |
+
+<!-- Disable ordered list rule https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md#md029---ordered-list-item-prefix -->
+<!-- markdownlint-disable MD029 -->
+1. Node configuration is shown as it is enforced during **performance testing** to ensure pod vCPU / memory ratios and avoid scaling.
+ In production deployments there is no need to assign pods to nodes. A minimum of three nodes in three different availability zones is strongly recommended to align with resilient cloud architecture practices.
+<!-- markdownlint-enable MD029 -->
+
+Next are the backend components that run on static compute VMs via Omnibus (or External PaaS
+services where applicable):
+
+| Service | Nodes | Configuration | GCP |
+|--------------------------------------------|-------|-------------------------|------------------|
+| Consul(1) | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| PostgreSQL(1) | 3 | 32 vCPU, 120 GB memory | `n1-standard-32` |
+| PgBouncer(1) | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| Internal load balancing node(3) | 1 | 8 vCPU, 7.2 GB memory | `n1-highcpu-8` |
+| Redis - Cache(2) | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` |
+| Redis - Queues / Shared State(2) | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` |
+| Redis Sentinel - Cache(2) | 3 | 1 vCPU, 3.75 GB memory | `n1-standard-1` |
+| Redis Sentinel - Queues / Shared State(2) | 3 | 1 vCPU, 3.75 GB memory | `n1-standard-1` |
+| Gitaly | 3 | 64 vCPU, 240 GB memory | `n1-standard-64` |
+| Praefect | 3 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` |
+| Praefect PostgreSQL(1) | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| Object storage(4) | n/a | n/a | n/a |
+
+<!-- Disable ordered list rule https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md#md029---ordered-list-item-prefix -->
+<!-- markdownlint-disable MD029 -->
+1. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. Google Cloud SQL and AWS RDS are known to work; however, Azure Database for PostgreSQL is [not recommended](https://gitlab.com/gitlab-org/quality/reference-architectures/-/issues/61) due to performance issues. Consul is primarily used for PostgreSQL high availability, so it can be ignored when using a PostgreSQL PaaS setup. However, it is also used optionally by Prometheus for Omnibus auto host discovery.
+2. Can be optionally run on reputable third-party external PaaS Redis solutions. Google Memorystore and AWS ElastiCache are known to work.
+3. Can be optionally run on reputable third-party load balancing services (LB PaaS). AWS ELB is known to work.
+4. Should be run on reputable third-party object storage (storage PaaS) for cloud implementations. Google Cloud Storage and AWS S3 are known to work.
+<!-- markdownlint-enable MD029 -->
+
+NOTE:
+For all PaaS solutions that involve configuring instances, it is strongly recommended to implement a minimum of three nodes in three different availability zones to align with resilient cloud architecture practices.
+
+```plantuml
+@startuml 50k
+
+card "Kubernetes via Helm Charts" as kubernetes {
+ card "**External Load Balancer**" as elb #6a9be7
+
+ together {
+ collections "**Webservice** x16" as gitlab #32CD32
+ collections "**Sidekiq** x4" as sidekiq #ff8dd1
+ }
+
+ card "**Prometheus + Grafana**" as monitor #7FFFD4
+ card "**Supporting Services**" as support
+}
+
+card "**Internal Load Balancer**" as ilb #9370DB
+collections "**Consul** x3" as consul #e76a9b
+
+card "Gitaly Cluster" as gitaly_cluster {
+ collections "**Praefect** x3" as praefect #FF8C00
+ collections "**Gitaly** x3" as gitaly #FF8C00
+ card "**Praefect PostgreSQL***\n//Non fault-tolerant//" as praefect_postgres #FF8C00
+
+ praefect -[#FF8C00]-> gitaly
+ praefect -[#FF8C00]> praefect_postgres
+}
+
+card "Database" as database {
+ collections "**PGBouncer** x3" as pgbouncer #4EA7FF
+ card "**PostgreSQL** (Primary)" as postgres_primary #4EA7FF
+ collections "**PostgreSQL** (Secondary) x2" as postgres_secondary #4EA7FF
+
+ pgbouncer -[#4EA7FF]-> postgres_primary
+ postgres_primary .[#4EA7FF]> postgres_secondary
+}
+
+card "redis" as redis {
+ collections "**Redis Persistent** x3" as redis_persistent #FF6347
+ collections "**Redis Cache** x3" as redis_cache #FF6347
+ collections "**Redis Persistent Sentinel** x3" as redis_persistent_sentinel #FF6347
+ collections "**Redis Cache Sentinel** x3"as redis_cache_sentinel #FF6347
+
+ redis_persistent <.[#FF6347]- redis_persistent_sentinel
+ redis_cache <.[#FF6347]- redis_cache_sentinel
+}
+
+cloud "**Object Storage**" as object_storage #white
+
+elb -[#6a9be7]-> gitlab
+elb -[#6a9be7]-> monitor
+elb -[hidden]-> support
+
+gitlab -[#32CD32]--> ilb
+gitlab -[#32CD32]-> object_storage
+gitlab -[#32CD32]---> redis
+gitlab -[hidden]--> consul
+
+sidekiq -[#ff8dd1]--> ilb
+sidekiq -[#ff8dd1]-> object_storage
+sidekiq -[#ff8dd1]---> redis
+sidekiq -[hidden]--> consul
+
+ilb -[#9370DB]-> gitaly_cluster
+ilb -[#9370DB]-> database
+
+consul .[#e76a9b]-> database
+consul .[#e76a9b]-> gitaly_cluster
+consul .[#e76a9b,norank]--> redis
+
+monitor .[#7FFFD4]> consul
+monitor .[#7FFFD4]-> database
+monitor .[#7FFFD4]-> gitaly_cluster
+monitor .[#7FFFD4,norank]--> redis
+monitor .[#7FFFD4]> ilb
+monitor .[#7FFFD4,norank]u--> elb
+
+@enduml
+```
+
+### Resource usage settings
+
+The following formulas help when calculating how many pods may be deployed within resource constraints.
+The [50k reference architecture example values file](https://gitlab.com/gitlab-org/charts/gitlab/-/blob/master/examples/ref/50k.yaml)
+documents how to apply the calculated configuration to the Helm Chart.
+
+#### Webservice
+
+Webservice pods typically need about 1 vCPU and 1.25 GB of memory _per worker_.
+Each Webservice pod will consume roughly 4 vCPUs and 5 GB of memory using
+the [recommended topology](#cluster-topology) because four worker processes
+are created by default and each pod has other small processes running.
+
+For 50k users we recommend a total Puma worker count of around 320.
+With the [provided recommendations](#cluster-topology) this allows the deployment of up to 80
+Webservice pods with 4 workers per pod and 5 pods per node. Expand available resources using
+the ratio of 1 vCPU to 1.25 GB of memory _per each worker process_ for each additional
+Webservice pod.
+
+For further information on resource usage, see the [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources).
+
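The Webservice figures above can be reproduced with simple arithmetic. The following is a minimal Ruby sketch, not part of the Helm Chart or of GitLab itself; the variable names are illustrative assumptions, and the numbers are the ones quoted in this section and in the cluster topology table.

```ruby
# Illustrative sizing check for the 50k Webservice figures quoted above.
# All variable names are hypothetical; the values come from this section.
total_puma_workers = 320   # recommended total worker count for 50k users
workers_per_pod    = 4     # worker processes created per pod by default
webservice_nodes   = 16    # Webservice node count in the topology table
vcpu_per_worker    = 1.0   # approximate vCPU needed per worker
gb_per_worker      = 1.25  # approximate memory (GB) needed per worker

pods          = total_puma_workers / workers_per_pod  # pods needed overall
pods_per_node = pods / webservice_nodes               # pods placed on each node
pod_vcpu      = workers_per_pod * vcpu_per_worker     # vCPU consumed per pod
pod_memory_gb = workers_per_pod * gb_per_worker       # memory consumed per pod

puts "pods=#{pods} pods_per_node=#{pods_per_node}"
puts "per pod: #{pod_vcpu} vCPU, #{pod_memory_gb} GB"
puts "total: #{pods * pod_vcpu} vCPU, #{pods * pod_memory_gb} GB"
```

Under these assumptions the totals come to 320 vCPU and 400 GB of memory, which sits within the 510 vCPU and 472 GB listed as allocatable for the Webservice nodes.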
+#### Sidekiq
+
+Sidekiq pods should generally have 1 vCPU and 2 GB of memory.
+
+[The provided starting point](#cluster-topology) allows the deployment of up to
+14 Sidekiq pods. Expand available resources using the 1 vCPU to 2 GB memory
+ratio for each additional pod.
+
+For further information on resource usage, see the [Sidekiq resources](https://docs.gitlab.com/charts/charts/gitlab/sidekiq/#resources).
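As a rough cross-check of the Sidekiq numbers, the sketch below compares what 14 pods would request against the allocatable resources listed for the Sidekiq nodes. It is only an illustration in Ruby with hypothetical variable names; it does not reproduce how the recommended pod count was derived.

```ruby
# Illustrative check that the stated Sidekiq pod count fits the node pool.
# Variable names are hypothetical; values come from this section's text and table.
sidekiq_pods      = 14    # "up to 14 Sidekiq pods"
vcpu_per_pod      = 1     # vCPU per Sidekiq pod
memory_gb_per_pod = 2     # memory (GB) per Sidekiq pod
allocatable_vcpu  = 15.5  # allocatable vCPU across the Sidekiq nodes
allocatable_gb    = 50    # allocatable memory (GB) across the Sidekiq nodes

requested_vcpu = sidekiq_pods * vcpu_per_pod
requested_gb   = sidekiq_pods * memory_gb_per_pod

fits = requested_vcpu <= allocatable_vcpu && requested_gb <= allocatable_gb
puts "requested #{requested_vcpu} vCPU / #{requested_gb} GB, fits=#{fits}"
```

With these figures CPU is the tighter constraint (14 of 15.5 vCPU), while memory leaves considerable headroom (28 of 50 GB).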
<div align="right">
<a type="button" class="btn btn-default" href="#setup-components">
diff --git a/doc/administration/reference_architectures/5k_users.md b/doc/administration/reference_architectures/5k_users.md
index 9952df196c9..e57c4545b13 100644
--- a/doc/administration/reference_architectures/5k_users.md
+++ b/doc/administration/reference_architectures/5k_users.md
@@ -60,10 +60,7 @@ together {
collections "**Sidekiq** x4" as sidekiq #ff8dd1
}
-together {
- card "**Prometheus + Grafana**" as monitor #7FFFD4
- collections "**Consul** x3" as consul #e76a9b
-}
+card "**Prometheus + Grafana**" as monitor #7FFFD4
card "Gitaly Cluster" as gitaly_cluster {
collections "**Praefect** x3" as praefect #FF8C00
@@ -83,14 +80,15 @@ card "Database" as database {
postgres_primary .[#4EA7FF]> postgres_secondary
}
-card "redis" as redis {
- collections "**Redis Persistent** x3" as redis_persistent #FF6347
- collections "**Redis Cache** x3" as redis_cache #FF6347
- collections "**Redis Persistent Sentinel** x3" as redis_persistent_sentinel #FF6347
- collections "**Redis Cache Sentinel** x3"as redis_cache_sentinel #FF6347
+node "**Consul + Sentinel** x3" as consul_sentinel {
+ component Consul as consul #e76a9b
+ component Sentinel as sentinel #e6e727
+}
- redis_persistent <.[#FF6347]- redis_persistent_sentinel
- redis_cache <.[#FF6347]- redis_cache_sentinel
+card "Redis" as redis {
+ collections "**Redis** x3" as redis_nodes #FF6347
+
+ redis_nodes <.[#FF6347]- sentinel
}
cloud "**Object Storage**" as object_storage #white
@@ -98,7 +96,6 @@ cloud "**Object Storage**" as object_storage #white
elb -[#6a9be7]-> gitlab
elb -[#6a9be7]--> monitor
-gitlab -[#32CD32]> sidekiq
gitlab -[#32CD32]--> ilb
gitlab -[#32CD32]-> object_storage
gitlab -[#32CD32]---> redis
@@ -432,7 +429,7 @@ services support high availability, be sure it is **not** the Redis Cluster type
Redis version 5.0 or higher is required, as this is what ships with
Omnibus GitLab packages starting with GitLab 13.0. Older Redis versions
do not support an optional count argument to SPOP which is now required for
-[Merge Trains](../../ci/merge_request_pipelines/pipelines_for_merged_results/merge_trains/index.md).
+[Merge Trains](../../ci/pipelines/merge_trains.md).
Note the Redis node's IP address or hostname, port, and password (if required).
These will be necessary when configuring the
@@ -846,7 +843,7 @@ in the second step, do not supply the `EXTERNAL_URL` value.
# Sets `max_replication_slots` to double the number of database nodes.
# Patroni uses one extra slot per node when initiating the replication.
patroni['postgresql']['max_replication_slots'] = 8
-
+
# Set `max_wal_senders` to one more than the number of replication slots in the cluster.
# This is used to prevent replication from using up all of the
# available database connections.
@@ -873,8 +870,12 @@ in the second step, do not supply the `EXTERNAL_URL` value.
# Replace POSTGRESQL_PASSWORD_HASH with a generated md5 value
postgresql['sql_user_password'] = '<postgresql_password_hash>'
+ # Set up basic authentication for the Patroni API (use the same username/password in all nodes).
+ patroni['username'] = '<patroni_api_username>'
+ patroni['password'] = '<patroni_api_password>'
+
# Replace XXX.XXX.XXX.XXX/YY with Network Address
- postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.0/24)
+ postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.0/24 127.0.0.1/32)
# Set the network addresses that the exporters will listen on for monitoring
node_exporter['listen_address'] = '0.0.0.0:9100'
@@ -1118,7 +1119,7 @@ in the second step, do not supply the `EXTERNAL_URL` value.
postgresql['sql_user_password'] = "<praefect_postgresql_password_hash>"
# Replace XXX.XXX.XXX.XXX/YY with Network Address
- postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.0/24)
+ postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.0/24 127.0.0.1/32)
# Set the network addresses that the exporters will listen on for monitoring
node_exporter['listen_address'] = '0.0.0.0:9100'
@@ -1320,7 +1321,7 @@ the file of the same name on this server. If this is the first Omnibus node you
1. Praefect needs to run some database migrations, much like the main GitLab application. For this
you should select **one Praefect node only to run the migrations**, AKA the _Deploy Node_. This node
must be configured first before the others as follows:
-
+
1. In the `/etc/gitlab/gitlab.rb` file, change the `praefect['auto_migrate']` setting value from `false` to `true`
1. To ensure database migrations are only run during reconfigure and not automatically on upgrade, run:
@@ -1328,7 +1329,7 @@ the file of the same name on this server. If this is the first Omnibus node you
```shell
sudo touch /etc/gitlab/skip-auto-reconfigure
```
-
+
1. [Reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect and
to run the Praefect database migrations.
@@ -2056,10 +2057,191 @@ to use GitLab Pages, this currently [requires NFS](troubleshooting.md#gitlab-pag
See how to [configure NFS](../nfs.md).
WARNING:
-From GitLab 14.0, enhancements and bug fixes for NFS for Git repositories will no longer be
-considered and customer technical support will be considered out of scope.
-[Read more about Gitaly and NFS](../gitaly/index.md#nfs-deprecation-notice) and
-[the correct mount options to use](../nfs.md#upgrade-to-gitaly-cluster-or-disable-caching-if-experiencing-data-loss).
+Engineering support for NFS for Git repositories is deprecated. Technical support is planned to be
+unavailable from GitLab 15.0. No further enhancements are planned for this feature.
+
+Read:
+
+- The [Gitaly and NFS deprecation notice](../gitaly/index.md#nfs-deprecation-notice).
+- About the [correct mount options to use](../nfs.md#upgrade-to-gitaly-cluster-or-disable-caching-if-experiencing-data-loss).
+
+## Cloud Native Hybrid reference architecture with Helm Charts (alternative)
+
+As an alternative approach, you can also run select components of GitLab as Cloud Native
+in Kubernetes via our official [Helm Charts](https://docs.gitlab.com/charts/).
+In this setup, we support running the equivalent of GitLab Rails and Sidekiq nodes
+in a Kubernetes cluster, named Webservice and Sidekiq respectively. In addition,
+the following supporting services are also supported: NGINX, Task Runner, Migrations,
+Prometheus and Grafana.
+
+Hybrid installations leverage the benefits of both cloud native and traditional
+compute deployments. With this, _stateless_ components can take advantage of cloud native
+workload management while _stateful_ components are deployed in compute VMs
+with Omnibus to benefit from increased permanence.
+
+NOTE:
+This is an **advanced** setup. Running services in Kubernetes is well known
+to be complex. **This setup is only recommended** if you have strong working
+knowledge and experience in Kubernetes. The rest of this
+section will assume this.
+
+### Cluster topology
+
+The following tables and diagram detail the hybrid environment using the same formats
+as the normal environment above.
+
+First are the components that run in Kubernetes. The recommendations at this
+time use Google Cloud’s Kubernetes Engine (GKE) and associated machine types, but the memory
+and CPU requirements should translate to most other providers. We hope to update this in the
+future with further specific cloud provider details.
+
+| Service | Nodes(1) | Configuration | GCP | Allocatable CPUs and Memory |
+|-------------------------------------------------------|----------|-------------------------|------------------|-----------------------------|
+| Webservice | 5 | 16 vCPU, 14.4 GB memory | `n1-highcpu-16` | 79.5 vCPU, 62 GB memory |
+| Sidekiq | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | 11.8 vCPU, 38.9 GB memory |
+| Supporting services such as NGINX, Prometheus, etc. | 2 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | 3.9 vCPU, 11.8 GB memory |
+
+<!-- Disable ordered list rule https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md#md029---ordered-list-item-prefix -->
+<!-- markdownlint-disable MD029 -->
+1. Node configuration is shown as it is enforced during **performance testing** to ensure pod vCPU / memory ratios and avoid scaling.
+ In production deployments there is no need to assign pods to nodes. A minimum of three nodes in three different availability zones is strongly recommended to align with resilient cloud architecture practices.
+<!-- markdownlint-enable MD029 -->
+
+Next are the backend components that run on static compute VMs via Omnibus (or External PaaS
+services where applicable):
+
+| Service | Nodes | Configuration | GCP |
+|--------------------------------------------|-------|-------------------------|------------------|
+| Redis(2) | 3 | 2 vCPU, 7.5 GB memory | `n1-standard-2` |
+| Consul(1) + Sentinel(2) | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| PostgreSQL(1) | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` |
+| PgBouncer(1) | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| Internal load balancing node(3) | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| Gitaly | 3 | 8 vCPU, 30 GB memory | `n1-standard-8` |
+| Praefect | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| Praefect PostgreSQL(1) | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| Object storage(4) | n/a | n/a | n/a |
+
+<!-- Disable ordered list rule https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md#md029---ordered-list-item-prefix -->
+<!-- markdownlint-disable MD029 -->
+1. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. Google Cloud SQL and AWS RDS are known to work; however, Azure Database for PostgreSQL is [not recommended](https://gitlab.com/gitlab-org/quality/reference-architectures/-/issues/61) due to performance issues. Consul is primarily used for PostgreSQL high availability, so it can be ignored when using a PostgreSQL PaaS setup. However, it is also used optionally by Prometheus for Omnibus auto host discovery.
+2. Can be optionally run on reputable third-party external PaaS Redis solutions. Google Memorystore and AWS ElastiCache are known to work.
+3. Can be optionally run on reputable third-party load balancing services (LB PaaS). AWS ELB is known to work.
+4. Should be run on reputable third-party object storage (storage PaaS) for cloud implementations. Google Cloud Storage and AWS S3 are known to work.
+<!-- markdownlint-enable MD029 -->
+
+NOTE:
+For all PaaS solutions that involve configuring instances, it is strongly recommended to implement a minimum of three nodes in three different availability zones to align with resilient cloud architecture practices.
+
+```plantuml
+@startuml 5k
+
+card "Kubernetes via Helm Charts" as kubernetes {
+ card "**External Load Balancer**" as elb #6a9be7
+
+ together {
+ collections "**Webservice** x5" as gitlab #32CD32
+ collections "**Sidekiq** x3" as sidekiq #ff8dd1
+ }
+
+ card "**Prometheus + Grafana**" as monitor #7FFFD4
+ card "**Supporting Services**" as support
+}
+
+card "**Internal Load Balancer**" as ilb #9370DB
+
+node "**Consul + Sentinel** x3" as consul_sentinel {
+ component Consul as consul #e76a9b
+ component Sentinel as sentinel #e6e727
+}
+
+card "Gitaly Cluster" as gitaly_cluster {
+ collections "**Praefect** x3" as praefect #FF8C00
+ collections "**Gitaly** x3" as gitaly #FF8C00
+ card "**Praefect PostgreSQL***\n//Non fault-tolerant//" as praefect_postgres #FF8C00
+
+ praefect -[#FF8C00]-> gitaly
+ praefect -[#FF8C00]> praefect_postgres
+}
+
+card "Database" as database {
+ collections "**PGBouncer** x3" as pgbouncer #4EA7FF
+ card "**PostgreSQL** (Primary)" as postgres_primary #4EA7FF
+ collections "**PostgreSQL** (Secondary) x2" as postgres_secondary #4EA7FF
+
+ pgbouncer -[#4EA7FF]-> postgres_primary
+ postgres_primary .[#4EA7FF]> postgres_secondary
+}
+
+card "Redis" as redis {
+ collections "**Redis** x3" as redis_nodes #FF6347
+
+ redis_nodes <.[#FF6347]- sentinel
+}
+
+cloud "**Object Storage**" as object_storage #white
+
+elb -[#6a9be7]-> gitlab
+elb -[#6a9be7]-> monitor
+elb -[hidden]-> support
+
+gitlab -[#32CD32]--> ilb
+gitlab -[#32CD32]-> object_storage
+gitlab -[#32CD32]---> redis
+gitlab -[hidden]--> consul
+
+sidekiq -[#ff8dd1]--> ilb
+sidekiq -[#ff8dd1]-> object_storage
+sidekiq -[#ff8dd1]---> redis
+sidekiq -[hidden]--> consul
+
+ilb -[#9370DB]-> gitaly_cluster
+ilb -[#9370DB]-> database
+
+consul .[#e76a9b]-> database
+consul .[#e76a9b]-> gitaly_cluster
+consul .[#e76a9b,norank]--> redis
+
+monitor .[#7FFFD4]> consul
+monitor .[#7FFFD4]-> database
+monitor .[#7FFFD4]-> gitaly_cluster
+monitor .[#7FFFD4,norank]--> redis
+monitor .[#7FFFD4]> ilb
+monitor .[#7FFFD4,norank]u--> elb
+
+@enduml
+```
+
+### Resource usage settings
+
+The following formulas help when calculating how many pods may be deployed within resource constraints.
+The [5k reference architecture example values file](https://gitlab.com/gitlab-org/charts/gitlab/-/blob/master/examples/ref/5k.yaml)
+documents how to apply the calculated configuration to the Helm Chart.
+
+#### Webservice
+
+Webservice pods typically need about 1 vCPU and 1.25 GB of memory _per worker_.
+Each Webservice pod will consume roughly 4 vCPUs and 5 GB of memory using
+the [recommended topology](#cluster-topology) because four worker processes
+are created by default and each pod has other small processes running.
+
+For 5k users we recommend a total Puma worker count of around 40.
+With the [provided recommendations](#cluster-topology) this allows the deployment of up to 10
+Webservice pods with 4 workers per pod and 2 pods per node. Expand available resources using
+the ratio of 1 vCPU to 1.25 GB of memory _per each worker process_ for each additional
+Webservice pod.
+
+For further information on resource usage, see the [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources).
+
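The "expand available resources" guidance above amounts to scaling by the per-worker ratio. The Ruby sketch below illustrates that under stated assumptions: `webservice_resources` is a hypothetical helper written for this example, not a GitLab or Helm Chart API, and the ratio and pod counts are the ones quoted in this section.

```ruby
# Illustrative helper (hypothetical name): resources a number of Webservice pods
# would need, using the 1 vCPU : 1.25 GB per-worker ratio quoted above.
def webservice_resources(pod_count, workers_per_pod: 4)
  workers = pod_count * workers_per_pod
  { vcpu: workers * 1.0, memory_gb: workers * 1.25 }
end

baseline = webservice_resources(10)  # the 10 pods suggested for 5k users
growth   = webservice_resources(2)   # two hypothetical additional pods

puts "baseline: #{baseline[:vcpu]} vCPU, #{baseline[:memory_gb]} GB"
puts "two more pods: #{growth[:vcpu]} vCPU, #{growth[:memory_gb]} GB"
```

On these assumptions the 10-pod baseline needs about 40 vCPU and 50 GB, within the 79.5 vCPU and 62 GB listed as allocatable for the Webservice nodes, and each further pod adds roughly 4 vCPU and 5 GB.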
+#### Sidekiq
+
+Sidekiq pods should generally have 1 vCPU and 2 GB of memory.
+
+[The provided starting point](#cluster-topology) allows the deployment of up to
+8 Sidekiq pods. Expand available resources using the 1 vCPU to 2 GB memory
+ratio for each additional pod.
+
+For further information on resource usage, see the [Sidekiq resources](https://docs.gitlab.com/charts/charts/gitlab/sidekiq/#resources).
<div align="right">
<a type="button" class="btn btn-default" href="#setup-components">
diff --git a/doc/administration/reference_architectures/index.md b/doc/administration/reference_architectures/index.md
index 49024365e30..22871f6ea8d 100644
--- a/doc/administration/reference_architectures/index.md
+++ b/doc/administration/reference_architectures/index.md
@@ -69,6 +69,13 @@ The following reference architectures are available:
- [Up to 25,000 users](25k_users.md)
- [Up to 50,000 users](50k_users.md)
+The following Cloud Native Hybrid reference architectures, where select recommended components can be run in Kubernetes, are available:
+
+- [Up to 5,000 users](5k_users.md#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative)
+- [Up to 10,000 users](10k_users.md#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative)
+- [Up to 25,000 users](25k_users.md#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative)
+- [Up to 50,000 users](50k_users.md#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative)
+
A GitLab [Premium or Ultimate](https://about.gitlab.com/pricing/#self-managed) license is required
to get assistance from Support with troubleshooting the [2,000 users](2k_users.md)
and higher reference architectures.
@@ -163,7 +170,7 @@ a layer of complexity that will add challenges to finding out where potential
issues might lie.
The reference architectures use the official GitLab Linux packages (Omnibus
-GitLab) to install and configure the various components (with one notable exception being the suggested select Cloud Native installation method described below). The components are
+GitLab) or [Helm Charts](https://docs.gitlab.com/charts/) to install and configure the various components. The components are
installed on separate machines (virtualized or bare metal), with machine hardware
requirements listed in the "Configuration" column and equivalent VM standard sizes listed
in GCP/AWS/Azure columns of each [available reference architecture](#available-reference-architectures).
@@ -175,21 +182,10 @@ Other technologies, like [Docker swarm](https://docs.docker.com/engine/swarm/)
are not officially supported, but can be implemented at your own risk. In that
case, GitLab Support will not be able to help you.
-### Configuring select components with Cloud Native Helm
-
-We also provide [Helm charts](https://docs.gitlab.com/charts/) as a Cloud Native installation
-method for GitLab. For the reference architectures, select components can be set up in this
-way as an alternative if so desired.
+## Supported modifications for lower user count HA reference architectures
-For these kind of setups we support using the charts in an [advanced configuration](https://docs.gitlab.com/charts/#advanced-configuration)
-where stateful backend components, such as the database or Gitaly, are run externally - either
-via Omnibus or reputable third party services. Note that we don't currently support running the
-stateful components via Helm _at large scales_.
+The reference architectures for user counts [3,000](3k_users.md) and up support High Availability (HA).
-When designing these environments you should refer to the respective [Reference Architecture](#available-reference-architectures)
-above for guidance on sizing. Components run via Helm would be similarly scaled to their Omnibus
-specs, only translated into Kubernetes resources.
+If you need to achieve HA but have a lower user count, select modifications to the [3,000 user](3k_users.md) architecture are supported.
-For example, if you were to set up a 50k installation with the Rails nodes being run in Helm,
-then the same amount of resources as given for Omnibus should be given to the Kubernetes
-cluster with the Rails nodes broken down into a number of smaller Pods across that cluster.
+For more details, [refer to this section in the architecture's documentation](3k_users.md#supported-modifications-for-lower-user-counts-ha).
diff --git a/doc/administration/reference_architectures/troubleshooting.md b/doc/administration/reference_architectures/troubleshooting.md
index 4b07cff7de2..61d9dfea2a2 100644
--- a/doc/administration/reference_architectures/troubleshooting.md
+++ b/doc/administration/reference_architectures/troubleshooting.md
@@ -207,7 +207,7 @@ To make sure your configuration is correct:
## Troubleshooting Gitaly
For troubleshooting information, see Gitaly and Gitaly Cluster
-[troubleshooting information](../gitaly/index.md).
+[troubleshooting information](../gitaly/troubleshooting.md).
## Troubleshooting the GitLab Rails application