diff options
author | GitLab Bot <gitlab-bot@gitlab.com> | 2021-05-19 15:44:42 +0000 |
---|---|---|
committer | GitLab Bot <gitlab-bot@gitlab.com> | 2021-05-19 15:44:42 +0000 |
commit | 4555e1b21c365ed8303ffb7a3325d773c9b8bf31 (patch) | |
tree | 5423a1c7516cffe36384133ade12572cf709398d /doc/development/sidekiq_style_guide.md | |
parent | e570267f2f6b326480d284e0164a6464ba4081bc (diff) | |
download | gitlab-ce-4555e1b21c365ed8303ffb7a3325d773c9b8bf31.tar.gz |
Add latest changes from gitlab-org/gitlab@13-12-stable-eev13.12.0-rc42
Diffstat (limited to 'doc/development/sidekiq_style_guide.md')
-rw-r--r-- | doc/development/sidekiq_style_guide.md | 47 |
1 files changed, 47 insertions, 0 deletions
diff --git a/doc/development/sidekiq_style_guide.md b/doc/development/sidekiq_style_guide.md index 4bcd5e50fae..82b6a54540f 100644 --- a/doc/development/sidekiq_style_guide.md +++ b/doc/development/sidekiq_style_guide.md @@ -15,6 +15,36 @@ All workers should include `ApplicationWorker` instead of `Sidekiq::Worker`, which adds some convenience methods and automatically sets the queue based on the worker's name. +## Retries + +Sidekiq defaults to using [25 +retries](https://github.com/mperham/sidekiq/wiki/Error-Handling#automatic-job-retry), +with back-off between each retry. 25 retries means that the last retry +would happen around three weeks after the first attempt (assuming all 24 +prior retries failed). + +For most workers - especially [idempotent workers](#idempotent-jobs) - +the default of 25 retries is more than sufficient. Many of our older +workers declare 3 retries, which used to be the default within the +GitLab application. 3 retries happen over the course of a couple of +minutes, so the jobs are prone to failing completely. + +A lower retry count may be applicable if any of the below apply: + +1. The worker contacts an external service and we do not provide + guarantees on delivery. For example, webhooks. +1. The worker is not idempotent and running it multiple times could + leave the system in an inconsistent state. For example, a worker that + posts a system note and then performs an action: if the second step + fails and the worker retries, the system note will be posted again. +1. The worker is a cronjob that runs frequently. For example, if a cron + job runs every hour, then we don't need to retry beyond an hour + because we don't need two of the same job running at once. + +Each retry for a worker is counted as a failure in our metrics. A worker +which always fails 9 times and succeeds on the 10th would have a 90% +error rate. + ## Dedicated Queues All workers should use their own queue, which is automatically set based on the @@ -719,6 +749,23 @@ possible situations: 1. A job is queued by a node running the newer version of the application, but executed on a node running an older version of the application. +### Adding new workers + +On GitLab.com, we [do not currently have a Sidekiq deployment in the +canary stage](https://gitlab.com/gitlab-org/gitlab/-/issues/19239). This +means that a new worker than can be scheduled from an HTTP endpoint may +be scheduled from canary but not run on Sidekiq until the full +production deployment is complete. This can be several hours later than +scheduling the job. For some workers, this will not be a problem. For +others - particularly [latency-sensitive +jobs](#latency-sensitive-jobs) - this will result in a poor user +experience. + +This only applies to new worker classes when they are first introduced. +As we recommend [using feature flags](feature_flags/) as a general +development process, it's best to control the entire change (including +scheduling of the new Sidekiq worker) with a feature flag. + ### Changing the arguments for a worker Jobs need to be backward and forward compatible between consecutive versions |