summaryrefslogtreecommitdiff
path: root/doc/administration/sidekiq/sidekiq_memory_killer.md
diff options
context:
space:
mode:
Diffstat (limited to 'doc/administration/sidekiq/sidekiq_memory_killer.md')
-rw-r--r--doc/administration/sidekiq/sidekiq_memory_killer.md131
1 files changed, 82 insertions, 49 deletions
diff --git a/doc/administration/sidekiq/sidekiq_memory_killer.md b/doc/administration/sidekiq/sidekiq_memory_killer.md
index 0876f98621d..cb27d44a2e6 100644
--- a/doc/administration/sidekiq/sidekiq_memory_killer.md
+++ b/doc/administration/sidekiq/sidekiq_memory_killer.md
@@ -4,22 +4,21 @@ group: Application Performance
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments
---
-# Sidekiq MemoryKiller **(FREE SELF)**
+# Reducing memory use
The GitLab Rails application code suffers from memory leaks. For web requests
-this problem is made manageable using
-[`puma-worker-killer`](https://github.com/schneems/puma_worker_killer) which
-restarts Puma worker processes if it exceeds a memory limit. The Sidekiq
-MemoryKiller applies the same approach to the Sidekiq processes used by GitLab
+this problem is made manageable using a [supervision thread](../operations/puma.md#reducing-memory-use)
+that automatically restarts workers if they exceed a given resident set size (RSS) threshold
+for a certain amount of time.
+We use the same approach to the Sidekiq processes used by GitLab
to process background jobs.
-Unlike puma-worker-killer, which is enabled by default for all GitLab
-installations of GitLab 13.0 and later, the Sidekiq MemoryKiller is enabled by default
-_only_ for Omnibus packages. The reason for this is that the MemoryKiller
-relies on runit to restart Sidekiq after a memory-induced shutdown and GitLab
-installations from source do not all use runit or an equivalent.
+GitLab monitors the available RSS limit by default only for installations using
+the Linux packages (Omnibus) or Docker. The reason for this is that GitLab
+relies on runit to restart Sidekiq after a memory-induced shutdown, and GitLab
+self-compiled or Helm chart based installations don't use runit or an equivalent tool.
-With the default settings, the MemoryKiller causes a Sidekiq restart no
+With the default settings, Sidekiq restarts no
more often than once every 15 minutes, with the restart causing about one
minute of delay for incoming background jobs.
@@ -28,41 +27,75 @@ are cleanly terminated when Sidekiq is restarted, each Sidekiq process should be
run as a process group leader (for example, using `chpst -P`). If using Omnibus or the
`bin/background_jobs` script with `runit` installed, this is handled for you.
-## Configuring the MemoryKiller
-
-The MemoryKiller is controlled using environment variables.
-
-- `SIDEKIQ_MEMORY_KILLER_MAX_RSS` (KB): if this variable is set, and its value is greater
- than 0, the MemoryKiller is enabled. Otherwise the MemoryKiller is disabled.
-
- `SIDEKIQ_MEMORY_KILLER_MAX_RSS` defines the Sidekiq process allowed RSS.
-
- If the Sidekiq process exceeds the allowed RSS for longer than
- `SIDEKIQ_MEMORY_KILLER_GRACE_TIME` the graceful restart is triggered. If the
- Sidekiq process go below the allowed RSS within `SIDEKIQ_MEMORY_KILLER_GRACE_TIME`,
- the restart is aborted.
-
- The default value for Omnibus packages is set
- [in the Omnibus GitLab repository](https://gitlab.com/gitlab-org/omnibus-gitlab/blob/master/files/gitlab-cookbooks/gitlab/attributes/default.rb).
-
-- `SIDEKIQ_MEMORY_KILLER_HARD_LIMIT_RSS` (KB): If the Sidekiq
- process RSS (expressed in kilobytes) exceeds `SIDEKIQ_MEMORY_KILLER_HARD_LIMIT_RSS`,
- an immediate graceful restart of Sidekiq is triggered.
-
-- `SIDEKIQ_MEMORY_KILLER_CHECK_INTERVAL`: Define how
- often to check process RSS, default to 3 seconds.
-
-- `SIDEKIQ_MEMORY_KILLER_GRACE_TIME`: defaults to 900 seconds (15 minutes).
- The usage of this variable is described as part of `SIDEKIQ_MEMORY_KILLER_MAX_RSS`.
-
-- `SIDEKIQ_MEMORY_KILLER_SHUTDOWN_WAIT`: defaults to 30 seconds. This defines the
- maximum time allowed for all Sidekiq jobs to finish. No new jobs are accepted
- during that time, and the process exits as soon as all jobs finish.
-
- If jobs do not finish during that time, the MemoryKiller interrupts all currently
- running jobs by sending `SIGTERM` to the Sidekiq process.
-
- If the process hard shutdown/restart is not performed by Sidekiq,
- the Sidekiq process is forcefully terminated after
- `Sidekiq[:timeout] + 2` seconds. An external supervision mechanism
- (for example, runit) must restart Sidekiq afterwards.
+## Configuring the limits
+
+Sidekiq memory limits are controlled using environment variables.
+
+- `SIDEKIQ_MEMORY_KILLER_MAX_RSS` (KB): defines the Sidekiq process soft limit for allowed RSS.
+ If the Sidekiq process RSS (expressed in kilobytes) exceeds `SIDEKIQ_MEMORY_KILLER_MAX_RSS`,
+ for longer than `SIDEKIQ_MEMORY_KILLER_GRACE_TIME`, the graceful restart is triggered.
+ If `SIDEKIQ_MEMORY_KILLER_MAX_RSS` is not set, or its value is set to 0, the soft limit is not monitored.
+ `SIDEKIQ_MEMORY_KILLER_MAX_RSS` defaults to `2000000`.
+
+- `SIDEKIQ_MEMORY_KILLER_GRACE_TIME`: defines the grace time period in seconds for which the Sidekiq process is allowed to run
+ above the allowed RSS soft limit. If the Sidekiq process goes below the allowed RSS (soft limit)
+ within `SIDEKIQ_MEMORY_KILLER_GRACE_TIME`, the restart is aborted. Default value is 900 seconds (15 minutes).
+
+- `SIDEKIQ_MEMORY_KILLER_HARD_LIMIT_RSS` (KB): defines the Sidekiq process hard limit for allowed RSS.
+ If the Sidekiq process RSS (expressed in kilobytes) exceeds `SIDEKIQ_MEMORY_KILLER_HARD_LIMIT_RSS`,
+ an immediate graceful restart of Sidekiq is triggered. If this value is not set, or set to 0,
+ the hard limit is not be monitored.
+
+- `SIDEKIQ_MEMORY_KILLER_CHECK_INTERVAL`: defines how often to check the process RSS. Defaults to 3 seconds.
+
+- `SIDEKIQ_MEMORY_KILLER_SHUTDOWN_WAIT`: defines the maximum time allowed for all Sidekiq jobs to finish.
+ No new jobs are accepted during that time. Defaults to 30 seconds.
+
+ If the process restart is not performed by Sidekiq, the Sidekiq process is forcefully terminated after
+ [Sidekiq shutdown timeout](https://github.com/mperham/sidekiq/wiki/Signals#term) (defaults to 25 seconds) +2 seconds.
+ If jobs do not finish during that time, all currently running jobs are interrupted with a `SIGTERM` signal
+ sent to the Sidekiq process.
+
+- `GITLAB_MEMORY_WATCHDOG_ENABLED`: enabled by default. Set the `GITLAB_MEMORY_WATCHDOG_ENABLED` to false, to use legacy
+ Daemon Sidekiq Memory Killer implementation used prior GitLab 15.9. Support for setting `GITLAB_MEMORY_WATCHDOG_ENABLED`
+ will be removed in GitLab 16.0.
+
+### Monitor worker restarts
+
+GitLab emits log events if workers are restarted due to high memory usage.
+
+The following is an example of one of these log events in `/var/log/gitlab/gitlab-rails/sidekiq_client.log`:
+
+```json
+{
+ "severity": "WARN",
+ "time": "2023-02-04T09:45:16.173Z",
+ "correlation_id": null,
+ "pid": 2725,
+ "worker_id": "sidekiq_1",
+ "memwd_handler_class": "Gitlab::Memory::Watchdog::SidekiqHandler",
+ "memwd_sleep_time_s": 3,
+ "memwd_rss_bytes": 1079683247,
+ "memwd_max_rss_bytes": 629145600,
+ "memwd_max_strikes": 5,
+ "memwd_cur_strikes": 6,
+ "message": "rss memory limit exceeded",
+ "running_jobs": [
+ {
+ jid: "83efb701c59547ee42ff7068",
+ worker_class: "Ci::DeleteObjectsWorker"
+ },
+ {
+ jid: "c3a74503dc2637f8f9445dd3",
+ worker_class: "Ci::ArchiveTraceWorker"
+ }
+ ]
+}
+```
+
+Where:
+
+- `memwd_rss_bytes` is the actual amount of memory consumed.
+- `memwd_max_rss_bytes` is the RSS limit set through `per_worker_max_memory_mb`.
+- `running jobs` lists the jobs that were running at the time when the process
+ exceeded the RSS limit and started a graceful restart.