Diffstat (limited to 'doc/administration/sidekiq/sidekiq_memory_killer.md')
-rw-r--r-- | doc/administration/sidekiq/sidekiq_memory_killer.md | 131 |
1 files changed, 82 insertions, 49 deletions
diff --git a/doc/administration/sidekiq/sidekiq_memory_killer.md b/doc/administration/sidekiq/sidekiq_memory_killer.md
index 0876f98621d..cb27d44a2e6 100644
--- a/doc/administration/sidekiq/sidekiq_memory_killer.md
+++ b/doc/administration/sidekiq/sidekiq_memory_killer.md
@@ -4,22 +4,21 @@ group: Application Performance
 info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments
 ---
 
-# Sidekiq MemoryKiller **(FREE SELF)**
+# Reducing memory use
 
 The GitLab Rails application code suffers from memory leaks. For web requests
-this problem is made manageable using
-[`puma-worker-killer`](https://github.com/schneems/puma_worker_killer) which
-restarts Puma worker processes if it exceeds a memory limit. The Sidekiq
-MemoryKiller applies the same approach to the Sidekiq processes used by GitLab
+this problem is made manageable using a [supervision thread](../operations/puma.md#reducing-memory-use)
+that automatically restarts workers if they exceed a given resident set size (RSS) threshold
+for a certain amount of time.
+We use the same approach to the Sidekiq processes used by GitLab
 to process background jobs.
 
-Unlike puma-worker-killer, which is enabled by default for all GitLab
-installations of GitLab 13.0 and later, the Sidekiq MemoryKiller is enabled by default
-_only_ for Omnibus packages. The reason for this is that the MemoryKiller
-relies on runit to restart Sidekiq after a memory-induced shutdown and GitLab
-installations from source do not all use runit or an equivalent.
+GitLab monitors the available RSS limit by default only for installations using
+the Linux packages (Omnibus) or Docker. The reason for this is that GitLab
+relies on runit to restart Sidekiq after a memory-induced shutdown, and GitLab
+self-compiled or Helm chart based installations don't use runit or an equivalent tool.
 
-With the default settings, the MemoryKiller causes a Sidekiq restart no
+With the default settings, Sidekiq restarts no
 more often than once every 15 minutes, with the restart causing about one
 minute of delay for incoming background jobs.
 
@@ -28,41 +27,75 @@ are cleanly terminated when Sidekiq is restarted, each Sidekiq process should be
 run as a process group leader (for example, using `chpst -P`). If using Omnibus or the
 `bin/background_jobs` script with `runit` installed, this is handled for you.
 
-## Configuring the MemoryKiller
-
-The MemoryKiller is controlled using environment variables.
-
-- `SIDEKIQ_MEMORY_KILLER_MAX_RSS` (KB): if this variable is set, and its value is greater
-  than 0, the MemoryKiller is enabled. Otherwise the MemoryKiller is disabled.
-
-  `SIDEKIQ_MEMORY_KILLER_MAX_RSS` defines the Sidekiq process allowed RSS.
-
-  If the Sidekiq process exceeds the allowed RSS for longer than
-  `SIDEKIQ_MEMORY_KILLER_GRACE_TIME` the graceful restart is triggered. If the
-  Sidekiq process go below the allowed RSS within `SIDEKIQ_MEMORY_KILLER_GRACE_TIME`,
-  the restart is aborted.
-
-  The default value for Omnibus packages is set
-  [in the Omnibus GitLab repository](https://gitlab.com/gitlab-org/omnibus-gitlab/blob/master/files/gitlab-cookbooks/gitlab/attributes/default.rb).
-
-- `SIDEKIQ_MEMORY_KILLER_HARD_LIMIT_RSS` (KB): If the Sidekiq
-  process RSS (expressed in kilobytes) exceeds `SIDEKIQ_MEMORY_KILLER_HARD_LIMIT_RSS`,
-  an immediate graceful restart of Sidekiq is triggered.
-
-- `SIDEKIQ_MEMORY_KILLER_CHECK_INTERVAL`: Define how
-  often to check process RSS, default to 3 seconds.
-
-- `SIDEKIQ_MEMORY_KILLER_GRACE_TIME`: defaults to 900 seconds (15 minutes).
-  The usage of this variable is described as part of `SIDEKIQ_MEMORY_KILLER_MAX_RSS`.
-
-- `SIDEKIQ_MEMORY_KILLER_SHUTDOWN_WAIT`: defaults to 30 seconds. This defines the
-  maximum time allowed for all Sidekiq jobs to finish. No new jobs are accepted
-  during that time, and the process exits as soon as all jobs finish.
-
-  If jobs do not finish during that time, the MemoryKiller interrupts all currently
-  running jobs by sending `SIGTERM` to the Sidekiq process.
-
-  If the process hard shutdown/restart is not performed by Sidekiq,
-  the Sidekiq process is forcefully terminated after
-  `Sidekiq[:timeout] + 2` seconds. An external supervision mechanism
-  (for example, runit) must restart Sidekiq afterwards.
+## Configuring the limits
+
+Sidekiq memory limits are controlled using environment variables.
+
+- `SIDEKIQ_MEMORY_KILLER_MAX_RSS` (KB): defines the Sidekiq process soft limit for allowed RSS.
+  If the Sidekiq process RSS (expressed in kilobytes) exceeds `SIDEKIQ_MEMORY_KILLER_MAX_RSS`,
+  for longer than `SIDEKIQ_MEMORY_KILLER_GRACE_TIME`, the graceful restart is triggered.
+  If `SIDEKIQ_MEMORY_KILLER_MAX_RSS` is not set, or its value is set to 0, the soft limit is not monitored.
+  `SIDEKIQ_MEMORY_KILLER_MAX_RSS` defaults to `2000000`.
+
+- `SIDEKIQ_MEMORY_KILLER_GRACE_TIME`: defines the grace time period in seconds for which the Sidekiq process is allowed to run
+  above the allowed RSS soft limit. If the Sidekiq process goes below the allowed RSS (soft limit)
+  within `SIDEKIQ_MEMORY_KILLER_GRACE_TIME`, the restart is aborted. Default value is 900 seconds (15 minutes).
+
+- `SIDEKIQ_MEMORY_KILLER_HARD_LIMIT_RSS` (KB): defines the Sidekiq process hard limit for allowed RSS.
+  If the Sidekiq process RSS (expressed in kilobytes) exceeds `SIDEKIQ_MEMORY_KILLER_HARD_LIMIT_RSS`,
+  an immediate graceful restart of Sidekiq is triggered. If this value is not set, or set to 0,
+  the hard limit is not be monitored.
+
+- `SIDEKIQ_MEMORY_KILLER_CHECK_INTERVAL`: defines how often to check the process RSS. Defaults to 3 seconds.
+
+- `SIDEKIQ_MEMORY_KILLER_SHUTDOWN_WAIT`: defines the maximum time allowed for all Sidekiq jobs to finish.
+  No new jobs are accepted during that time. Defaults to 30 seconds.
+
+  If the process restart is not performed by Sidekiq, the Sidekiq process is forcefully terminated after
+  [Sidekiq shutdown timeout](https://github.com/mperham/sidekiq/wiki/Signals#term) (defaults to 25 seconds) +2 seconds.
+  If jobs do not finish during that time, all currently running jobs are interrupted with a `SIGTERM` signal
+  sent to the Sidekiq process.
+
+- `GITLAB_MEMORY_WATCHDOG_ENABLED`: enabled by default. Set the `GITLAB_MEMORY_WATCHDOG_ENABLED` to false, to use legacy
+  Daemon Sidekiq Memory Killer implementation used prior GitLab 15.9. Support for setting `GITLAB_MEMORY_WATCHDOG_ENABLED`
+  will be removed in GitLab 16.0.
+
+### Monitor worker restarts
+
+GitLab emits log events if workers are restarted due to high memory usage.
+
+The following is an example of one of these log events in `/var/log/gitlab/gitlab-rails/sidekiq_client.log`:
+
+```json
+{
+  "severity": "WARN",
+  "time": "2023-02-04T09:45:16.173Z",
+  "correlation_id": null,
+  "pid": 2725,
+  "worker_id": "sidekiq_1",
+  "memwd_handler_class": "Gitlab::Memory::Watchdog::SidekiqHandler",
+  "memwd_sleep_time_s": 3,
+  "memwd_rss_bytes": 1079683247,
+  "memwd_max_rss_bytes": 629145600,
+  "memwd_max_strikes": 5,
+  "memwd_cur_strikes": 6,
+  "message": "rss memory limit exceeded",
+  "running_jobs": [
+    {
+      jid: "83efb701c59547ee42ff7068",
+      worker_class: "Ci::DeleteObjectsWorker"
+    },
+    {
+      jid: "c3a74503dc2637f8f9445dd3",
+      worker_class: "Ci::ArchiveTraceWorker"
+    }
+  ]
+}
+```
+
+Where:
+
+- `memwd_rss_bytes` is the actual amount of memory consumed.
+- `memwd_max_rss_bytes` is the RSS limit set through `per_worker_max_memory_mb`.
+- `running jobs` lists the jobs that were running at the time when the process
+  exceeded the RSS limit and started a graceful restart.
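
Reviewer note: the soft-limit and hard-limit rules described in the added "Configuring the limits" section can be sketched roughly as follows. This is an illustrative Python model only, not GitLab's implementation (which is Ruby). The names `Limits`, `load_limits`, and `should_restart` are hypothetical. The sketch assumes that an unset `SIDEKIQ_MEMORY_KILLER_MAX_RSS` falls back to the documented default of `2000000`, and that an unset `SIDEKIQ_MEMORY_KILLER_HARD_LIMIT_RSS` means the hard limit is not monitored.

```python
import os
from dataclasses import dataclass


@dataclass
class Limits:
    """Hypothetical container for the documented environment variables."""
    max_rss_kb: int         # soft limit in KB; 0 means "not monitored"
    hard_limit_rss_kb: int  # hard limit in KB; 0 means "not monitored"
    grace_time_s: int       # seconds the process may stay above the soft limit
    check_interval_s: int   # seconds between RSS checks


def load_limits(env=os.environ) -> Limits:
    """Read the limits from a mapping of environment variables."""
    def as_int(name, default):
        return int(env.get(name, default))

    return Limits(
        # Assumption: unset soft limit uses the documented default.
        max_rss_kb=as_int("SIDEKIQ_MEMORY_KILLER_MAX_RSS", 2000000),
        # Assumption: unset hard limit is treated as 0 (not monitored).
        hard_limit_rss_kb=as_int("SIDEKIQ_MEMORY_KILLER_HARD_LIMIT_RSS", 0),
        grace_time_s=as_int("SIDEKIQ_MEMORY_KILLER_GRACE_TIME", 900),
        check_interval_s=as_int("SIDEKIQ_MEMORY_KILLER_CHECK_INTERVAL", 3),
    )


def should_restart(rss_kb, seconds_above_soft, limits):
    """Return (restart, reason) for one RSS sample.

    seconds_above_soft is how long the process has continuously been
    above the soft limit; the caller resets it to 0 when RSS drops back.
    """
    # Hard limit: restart immediately, no grace period.
    if limits.hard_limit_rss_kb > 0 and rss_kb > limits.hard_limit_rss_kb:
        return True, "hard limit exceeded"
    # Soft limit: restart only after the grace time has elapsed.
    if (
        limits.max_rss_kb > 0
        and rss_kb > limits.max_rss_kb
        and seconds_above_soft > limits.grace_time_s
    ):
        return True, "soft limit exceeded for longer than grace time"
    return False, "within limits"
```

Under these assumptions, a process that stays above the soft limit for less than the grace time is left alone, which matches the statement that the restart is aborted if RSS drops back within `SIDEKIQ_MEMORY_KILLER_GRACE_TIME`.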