1 files changed, 130 insertions, 20 deletions
diff --git a/doc/administration/operations/puma.md b/doc/administration/operations/puma.md
index fffff78b9d6..e8477eaf686 100644
--- a/doc/administration/operations/puma.md
+++ b/doc/administration/operations/puma.md
@@ -4,35 +4,102 @@ group: Distribution
 info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
 ---
 
-# Switching to Puma **(FREE SELF)**
+# Puma **(FREE SELF)**
 
-As of GitLab 12.9, [Puma](https://github.com/puma/puma) has replaced [Unicorn](https://yhbt.net/unicorn/)
-as the default web server. From GitLab 14.0, the following run Puma:
+Puma is a simple, fast, multi-threaded, and highly concurrent HTTP 1.1 server for
+Ruby applications. It's the default GitLab web server since GitLab 13.0
+and has replaced Unicorn. From GitLab 14.0, Unicorn is no longer supported.
 
-- All-in-one package-based installations.
-- Helm chart-based installations.
+NOTE:
+Starting with GitLab 13.0, Puma is the default web server and Unicorn has been disabled.
+In GitLab 14.0, Unicorn was removed from the Linux package and only Puma is available.
 
-## Why switch to Puma?
+## Configure Puma
 
-Puma has a multi-thread architecture which uses less memory than a multi-process
-application server like Unicorn. On GitLab.com, we saw a 40% reduction in memory
-consumption.
+To configure Puma:
 
-Most Rails applications requests normally include a proportion of I/O wait time.
-During I/O wait time MRI Ruby will release the GVL (Global VM Lock) to other threads.
-Multi-threaded Puma can therefore still serve more requests than a single process.
+1. Determine suitable Puma worker and thread [settings](../../install/requirements.md#puma-settings).
+1. If you're switching from Unicorn, [convert any custom settings to Puma](#convert-unicorn-settings-to-puma).
+1. For multi-node deployments, configure the load balancer to use the
+   [readiness check](../load_balancer.md#readiness-check).
+1. Reconfigure GitLab so the above changes take effect:
+
+   ```shell
+   sudo gitlab-ctl reconfigure
+   ```
+
+For Helm-based deployments, see the
+[`webservice` chart documentation](https://docs.gitlab.com/charts/charts/gitlab/webservice/index.html).
+
+For more details about the Puma configuration, see the
+[Puma documentation](https://github.com/puma/puma#configuration).
+
+## Puma Worker Killer
+
+By default:
+
+- The [Puma Worker Killer](https://github.com/schneems/puma_worker_killer) restarts a worker if it
+  exceeds a [memory limit](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/cluster/puma_worker_killer_initializer.rb).
+- Rolling restarts of Puma workers are performed every 12 hours.
+
+To change the memory limit setting:
+
+1. Edit `/etc/gitlab/gitlab.rb`:
+
+   ```ruby
+   puma['per_worker_max_memory_mb'] = 1024
+   ```
+
+1. Reconfigure GitLab for the changes to take effect:
+
+   ```shell
+   sudo gitlab-ctl reconfigure
+   ```
+
+## Worker timeout
+
+A [timeout of 60 seconds](https://gitlab.com/gitlab-org/gitlab/-/blob/master/config/initializers/rack_timeout.rb)
+is used when Puma is enabled.
+
+NOTE:
+Unlike Unicorn, the `puma['worker_timeout']` setting does not set the maximum request duration.
 
-## Configuring Puma to replace Unicorn
+To change the worker timeout:
 
-Beginning with GitLab 13.0, Puma is the default application server. We removed support for
-Unicorn in GitLab 14.0.
+1. Edit `/etc/gitlab/gitlab.rb`:
 
-When switching to Puma, Unicorn server configuration
-will _not_ carry over automatically, due to differences between the two application servers. For Omnibus-based
-deployments, see [Configuring Puma Settings](https://docs.gitlab.com/omnibus/settings/puma.html#configuring-puma-settings).
-For Helm based deployments, see the [`webservice` chart documentation](https://docs.gitlab.com/charts/charts/gitlab/webservice/index.html).
+   ```ruby
+   gitlab_rails['env'] = {
+      'GITLAB_RAILS_RACK_TIMEOUT' => 600
+    }
+   ```
 
-Additionally we strongly recommend that multi-node deployments [configure their load balancers to use the readiness check](../load_balancer.md#readiness-check) due to a difference between Unicorn and Puma in how they handle connections during a restart of the service.
+1. Reconfigure GitLab for the changes to take effect:
+
+   ```shell
+   sudo gitlab-ctl reconfigure
+   ```
+
+## Memory-constrained environments
+
+In a memory-constrained environment with less than 4GB of RAM available, consider disabling Puma
+[Clustered mode](https://github.com/puma/puma#clustered-mode).
+
+Configuring Puma by setting the amount of `workers` to `0` could reduce memory usage by hundreds of MB.
+For details on Puma worker and thread settings, see the [Puma requirements](../../install/requirements.md#puma-settings).
+
+Unlike in a Clustered mode, which is set up by default, only a single Puma process would serve the application.
+
+The downside of running Puma with such configuration is the reduced throughput, which could be
+considered as a fair tradeoff in a memory-constraint environment.
+
+When running Puma in Single mode, some features are not supported:
+
+- Phased restart do not work: [issue](https://gitlab.com/gitlab-org/gitlab/-/issues/300665)
+- [Phased restart](https://gitlab.com/gitlab-org/gitlab/-/issues/300665)
+- [Puma Worker Killer](https://gitlab.com/gitlab-org/gitlab/-/issues/300664)
+
+To learn more, visit [epic 5303](https://gitlab.com/groups/gitlab-org/-/epics/5303).
 
 ## Performance caveat when using Puma with Rugged
 
@@ -66,3 +133,46 @@ optimal configuration:
   Rugged, single-threaded Puma works the same as Unicorn.
 - To force Rugged to be used with multi-threaded Puma, you can use
   [feature flags](../../development/gitaly.md#legacy-rugged-code).
+
+## Convert Unicorn settings to Puma
+
+NOTE:
+Starting with GitLab 13.0, Puma is the default web server and Unicorn has been
+disabled by default. In GitLab 14.0, Unicorn was removed from the Linux package
+and only Puma is available.
+
+Puma has a multi-thread architecture which uses less memory than a multi-process
+application server like Unicorn. On GitLab.com, we saw a 40% reduction in memory
+consumption. Most Rails applications requests normally include a proportion of I/O wait time.
+
+During I/O wait time MRI Ruby releases the GVL (Global VM Lock) to other threads.
+Multi-threaded Puma can therefore still serve more requests than a single process.
+
+When switching to Puma, any Unicorn server configuration will _not_ carry over
+automatically, due to differences between the two application servers.
+
+The table below summarizes which Unicorn configuration keys correspond to those
+in Puma when using the Linux package, and which ones have no corresponding counterpart.
+
+| Unicorn                              | Puma                               |
+| ------------------------------------ | ---------------------------------- |
+| `unicorn['enable']`                  | `puma['enable']`                   |
+| `unicorn['worker_timeout']`          | `puma['worker_timeout']`           |
+| `unicorn['worker_processes']`        | `puma['worker_processes']`         |
+| n/a                                  | `puma['ha']`                       |
+| n/a                                  | `puma['min_threads']`              |
+| n/a                                  | `puma['max_threads']`              |
+| `unicorn['listen']`                  | `puma['listen']`                   |
+| `unicorn['port']`                    | `puma['port']`                     |
+| `unicorn['socket']`                  | `puma['socket']`                   |
+| `unicorn['pidfile']`                 | `puma['pidfile']`                  |
+| `unicorn['tcp_nopush']`              | n/a                                |
+| `unicorn['backlog_socket']`          | n/a                                |
+| `unicorn['somaxconn']`               | `puma['somaxconn']`                |
+| n/a                                  | `puma['state_path']`               |
+| `unicorn['log_directory']`           | `puma['log_directory']`            |
+| `unicorn['worker_memory_limit_min']` | n/a                                |
+| `unicorn['worker_memory_limit_max']` | `puma['per_worker_max_memory_mb']` |
+| `unicorn['exporter_enabled']`        | `puma['exporter_enabled']`         |
+| `unicorn['exporter_address']`        | `puma['exporter_address']`         |
+| `unicorn['exporter_port']`           | `puma['exporter_port']`            |