diff options
author | GitLab Bot <gitlab-bot@gitlab.com> | 2020-04-10 15:09:50 +0000 |
---|---|---|
committer | GitLab Bot <gitlab-bot@gitlab.com> | 2020-04-10 15:09:50 +0000 |
commit | de2fb5b82c92c90f90ed67ced45143c04e934fb8 (patch) | |
tree | ff8e5e642580de7bb596d90dd0e7f739f44ca540 /doc/administration/availability | |
parent | c6a33b298229f9e04933be43d6176c476ef03012 (diff) | |
download | gitlab-ce-de2fb5b82c92c90f90ed67ced45143c04e934fb8.tar.gz |
Add latest changes from gitlab-org/gitlab@master
Diffstat (limited to 'doc/administration/availability')
-rw-r--r-- | doc/administration/availability/index.md | 131 |
1 files changed, 131 insertions, 0 deletions
diff --git a/doc/administration/availability/index.md b/doc/administration/availability/index.md new file mode 100644 index 00000000000..90113985ad5 --- /dev/null +++ b/doc/administration/availability/index.md @@ -0,0 +1,131 @@ +--- +type: reference, concepts +--- + +# Availability + +GitLab offers high availability options for organizations that require +the fault tolerance and redundancy necessary to maintain high-uptime operations. + +Please consult our [scaling documentation](../scaling) if you want to resolve +performance bottlenecks you encounter in individual GitLab components without +incurring the additional complexity costs associated with maintaining a +highly-available architecture. + +On this page, we present examples of self-managed instances which demonstrate +how GitLab can be scaled out and made highly available. These examples progress +from simple to complex as scaling or highly-available components are added. + +For larger setups serving 2,000 or more users, we provide +[reference architectures](../scaling/index.md#reference-architectures) based on GitLab's +experience with GitLab.com and internal scale testing that aim to achieve the +right balance of scalability and availability. + +For detailed insight into how GitLab scales and configures GitLab.com, you can +watch [this 1 hour Q&A](https://www.youtube.com/watch?v=uCU8jdYzpac) +with [John Northrup](https://gitlab.com/northrup), and live questions coming +in from some of our customers. + +## High availability + +### Omnibus installation with automatic database failover + +By adding automatic failover for database systems, we can enable higher uptime with an additional layer of complexity. + +- For PostgreSQL, we provide repmgr for server cluster management and failover + and a combination of [PgBouncer](../high_availability/pgbouncer.md) and [Consul](../high_availability/consul.md) for + database client cutover. +- For Redis, we use [Redis Sentinel](../high_availability/redis.md) for server failover and client cutover. + +You can also optionally run [additional Sidekiq processes on dedicated hardware](../high_availability/sidekiq.md) +and configure individual Sidekiq processes to +[process specific background job queues](../operations/extra_sidekiq_processes.md) +if you need to scale out background job processing. + +### GitLab Geo + +GitLab Geo allows you to replicate your GitLab instance to other geographical locations as a read-only fully operational instance that can also be promoted in case of disaster. + +This configuration is supported in [GitLab Premium and Ultimate](https://about.gitlab.com/pricing/). + +References: + +- [Geo Documentation](../geo/replication/index.md) +- [GitLab Geo with a highly available configuration](../geo/replication/high_availability.md) + +## GitLab components and configuration instructions + +The GitLab application depends on the following [components](../../development/architecture.md#component-diagram). +It can also depend on several third party services depending on +your environment setup. Here we'll detail both in the order in which +you would typically configure them along with our recommendations for +their use and configuration. + +### Third party services + +Here's some details of several third party services a typical environment +will depend on. The services can be provided by numerous applications +or providers and further advice can be given on how best to select. +These should be configured first, before the [GitLab components](#gitlab-components). + +| Component | Description | Configuration instructions | +|--------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------| +| [Load Balancer(s)](../high_availability/load_balancer.md)[^6] | Handles load balancing for the GitLab nodes where required | [Load balancer HA configuration](../high_availability/load_balancer.md) | +| [Cloud Object Storage service](../high_availability/object_storage.md)[^4] | Recommended store for shared data objects | [Cloud Object Storage configuration](../high_availability/object_storage.md) | +| [NFS](../high_availability/nfs.md)[^5] [^7] | Shared disk storage service. Can be used as an alternative for Gitaly or Object Storage. Required for GitLab Pages | [NFS configuration](../high_availability/nfs.md) | + +### GitLab components + +Next are all of the components provided directly by GitLab. As mentioned +earlier, they are presented in the typical order you would configure +them. + +| Component | Description | Configuration instructions | +|---------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------|---------------------------------------------------------------| +| [Consul](../../development/architecture.md#consul)[^3] | Service discovery and health checks/failover | [Consul HA configuration](../high_availability/consul.md) **(PREMIUM ONLY)** | +| [PostgreSQL](../../development/architecture.md#postgresql) | Database | [Database HA configuration](../high_availability/database.md) | +| [PgBouncer](../../development/architecture.md#pgbouncer) | Database Pool Manager | [PgBouncer HA configuration](../high_availability/pgbouncer.md) **(PREMIUM ONLY)** | +| [Redis](../../development/architecture.md#redis)[^3] with Redis Sentinel | Key/Value store for shared data with HA watcher service | [Redis HA configuration](../high_availability/redis.md) | +| [Gitaly](../../development/architecture.md#gitaly)[^2] [^5] [^7] | Recommended high-level storage for Git repository data | [Gitaly HA configuration](../high_availability/gitaly.md) | +| [Sidekiq](../../development/architecture.md#sidekiq) | Asynchronous/Background jobs | [Sidekiq configuration](../high_availability/sidekiq.md) | +| [GitLab application nodes](../../development/architecture.md#unicorn)[^1] | (Unicorn / Puma, Workhorse) - Web-requests (UI, API, Git over HTTP) | [GitLab app HA/scaling configuration](../high_availability/gitlab.md) | +| [Prometheus](../../development/architecture.md#prometheus) and [Grafana](../../development/architecture.md#grafana) | GitLab environment monitoring | [Monitoring node for scaling/HA](../high_availability/monitoring_node.md) | + +In some cases, components can be combined on the same nodes to reduce complexity as well. + +[^1]: In our architectures we run each GitLab Rails node using the Puma webserver + and have its number of workers set to 90% of available CPUs along with 4 threads. + +[^2]: Gitaly node requirements are dependent on customer data, specifically the number of + projects and their sizes. We recommend 2 nodes as an absolute minimum for HA environments + and at least 4 nodes should be used when supporting 50,000 or more users. + We also recommend that each Gitaly node should store no more than 5TB of data + and have the number of [`gitaly-ruby` workers](../gitaly/index.md#gitaly-ruby) + set to 20% of available CPUs. Additional nodes should be considered in conjunction + with a review of expected data size and spread based on the recommendations above. + +[^3]: Recommended Redis setup differs depending on the size of the architecture. + For smaller architectures (up to 5,000 users) we suggest one Redis cluster for all + classes and that Redis Sentinel is hosted alongside Consul. + For larger architectures (10,000 users or more) we suggest running a separate + [Redis Cluster](../high_availability/redis.md#running-multiple-redis-clusters) for the Cache class + and another for the Queues and Shared State classes respectively. We also recommend + that you run the Redis Sentinel clusters separately as well for each Redis Cluster. + +[^4]: For data objects such as LFS, Uploads, Artifacts, etc. We recommend a [Cloud Object Storage service](../object_storage.md) + over NFS where possible, due to better performance and availability. + +[^5]: NFS can be used as an alternative for both repository data (replacing Gitaly) and + object storage but this isn't typically recommended for performance reasons. Note however it is required for + [GitLab Pages](https://gitlab.com/gitlab-org/gitlab-pages/issues/196). + +[^6]: Our architectures have been tested and validated with [HAProxy](https://www.haproxy.org/) + as the load balancer. However other reputable load balancers with similar feature sets + should also work instead but be aware these aren't validated. + +[^7]: We strongly recommend that any Gitaly and / or NFS nodes are set up with SSD disks over + HDD with a throughput of at least 8,000 IOPS for read operations and 2,000 IOPS for write + as these components have heavy I/O. These IOPS values are recommended only as a starter + as with time they may be adjusted higher or lower depending on the scale of your + environment's workload. If you're running the environment on a Cloud provider + you may need to refer to their documentation on how configure IOPS correctly. |