path: root/doc/development/sidekiq_style_guide.md
author: GitLab Bot <gitlab-bot@gitlab.com> 2021-10-20 08:43:02 +0000
committer: GitLab Bot <gitlab-bot@gitlab.com> 2021-10-20 08:43:02 +0000
commit: d9ab72d6080f594d0b3cae15f14b3ef2c6c638cb
tree: 2341ef426af70ad1e289c38036737e04b0aa5007 /doc/development/sidekiq_style_guide.md
parent: d6e514dd13db8947884cd58fe2a9c2a063400a9b
Add latest changes from gitlab-org/gitlab@14-4-stable-ee (tag: v14.4.0-rc42)
Diffstat (limited to 'doc/development/sidekiq_style_guide.md')
-rw-r--r-- doc/development/sidekiq_style_guide.md | 66
1 files changed, 55 insertions, 11 deletions
diff --git a/doc/development/sidekiq_style_guide.md b/doc/development/sidekiq_style_guide.md
index 04b7e2f5c45..d45e2073fe7 100644
--- a/doc/development/sidekiq_style_guide.md
+++ b/doc/development/sidekiq_style_guide.md
@@ -154,12 +154,6 @@ A good example of that would be a cache expiration worker.
A job scheduled for an idempotent worker is [deduplicated](#deduplication) when
an unstarted job with the same arguments is already in the queue.
-WARNING:
-For [data consistency jobs](#job-data-consistency-strategies), the deduplication is not compatible with the
-`data_consistency` attribute set to `:sticky` or `:delayed`.
-The reason for this is that deduplication always takes into account the latest binary replication pointer into account, not the first one.
-There is an [open issue](https://gitlab.com/gitlab-org/gitlab/-/issues/325291) to improve this.
-
### Ensuring a worker is idempotent
Make sure the worker tests pass using the following shared example:
@@ -285,6 +279,55 @@ module AuthorizedProjectUpdate
end
```
+### Deduplication with load balancing
+
+> [Introduced](https://gitlab.com/groups/gitlab-org/-/epics/6763) in GitLab 14.4.
+
+Jobs that declare either `:sticky` or `:delayed` data consistency
+are eligible for database load balancing.
+In both cases, jobs are [scheduled in the future](#scheduling-jobs-in-the-future) with a short delay (1 second).
+This minimizes the chance of replication lag after a write.
+
+Because these jobs are scheduled in the future, they are not deduplicated by default.
+To deduplicate jobs eligible for load balancing,
+specify the `including_scheduled: true` argument when defining the deduplication strategy:
+
+```ruby
+class DelayedIdempotentWorker
+ include ApplicationWorker
+ data_consistency :delayed
+
+ deduplicate :until_executing, including_scheduled: true
+ idempotent!
+
+ # ...
+end
+```
+
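As a rough illustration of why the extra argument matters, the following sketch models a queue in which delayed jobs only take part in deduplication when they opt in. It is not GitLab's implementation: `DedupQueueSketch`, `push`, and the `delay:` keyword are hypothetical names, and the rule that scheduled jobs are skipped unless `including_scheduled` is set is stated here as an assumption.

```ruby
# Hypothetical sketch: none of these names come from the GitLab codebase.
class DedupQueueSketch
  Job = Struct.new(:worker, :args, :run_at)

  attr_reader :jobs

  def initialize
    @jobs = []
  end

  # Returns true when the job is enqueued, false when it is dropped as a duplicate.
  def push(worker, args, delay: 0, including_scheduled: false, now: Time.now)
    scheduled = delay.positive?
    # Assumption: scheduled jobs take part in deduplication only when opted in.
    deduplicate = !scheduled || including_scheduled

    if deduplicate && @jobs.any? { |job| job.worker == worker && job.args == args }
      return false
    end

    @jobs << Job.new(worker, args, now + delay)
    true
  end
end

queue = DedupQueueSketch.new
# Load-balanced jobs are delayed by 1 second, so they count as scheduled.
queue.push('DelayedIdempotentWorker', [42], delay: 1)
# Same arguments, but scheduled jobs are not deduplicated without opting in.
queue.push('DelayedIdempotentWorker', [42], delay: 1)
# With the opt-in, the pending job with the same arguments is detected.
queue.push('DelayedIdempotentWorker', [42], delay: 1, including_scheduled: true)
```

In this model the first two calls enqueue a job and the third is dropped, which is the behavior the `including_scheduled: true` argument asks for.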
+#### Preserve the latest WAL location for idempotent jobs
+
+> - [Introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/69372) in GitLab 14.3.
+> - [Enabled on GitLab.com](https://gitlab.com/gitlab-org/gitlab/-/issues/338350) in GitLab 14.4.
+
+Deduplication needs to take into account the latest binary replication pointer, not the first one.
+Otherwise, because we drop the same job when it is scheduled a second time, the Write-Ahead Log (WAL) location of that second job is lost.
+The remaining job would then be compared against the old WAL location and could read from a stale replica.
+
+To support both deduplication and data consistency with load balancing,
+we preserve the latest WAL location for idempotent jobs in Redis.
+This way we always compare against the latest binary replication pointer,
+making sure that we read from a replica that is fully caught up.
+
+FLAG:
+On self-managed GitLab, by default this feature is not available.
+To make it available,
+ask an administrator to [enable the `preserve_latest_wal_locations_for_idempotent_jobs` flag](../administration/feature_flags.md).
+This feature flag is related to GitLab development and is not intended to be used by GitLab administrators, though.
+On GitLab.com, this feature is available but can be configured by GitLab.com administrators only.
+
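The idea of keeping the latest WAL location can be sketched as follows. This is a minimal model under stated assumptions, not the code behind the feature flag: a plain `Hash` stands in for Redis, and `LatestWalStore`, `lsn_to_i`, `replica_caught_up?`, and the idempotency key strings are hypothetical names. It relies only on the PostgreSQL convention that a WAL location (LSN) is written as two hexadecimal numbers separated by a slash.

```ruby
# Hypothetical illustration only; a Hash stands in for Redis.

# Convert a PostgreSQL LSN such as "16/B374D848" to an Integer so that
# two WAL locations can be compared numerically.
def lsn_to_i(lsn)
  high, low = lsn.split('/').map { |part| part.to_i(16) }
  (high << 32) + low
end

class LatestWalStore
  def initialize
    @locations = {}
  end

  # Record the WAL location seen when a job (possibly a duplicate that will
  # be dropped) is scheduled, keeping whichever location is furthest ahead.
  def record(idempotency_key, lsn)
    current = @locations[idempotency_key]
    if current.nil? || lsn_to_i(lsn) > lsn_to_i(current)
      @locations[idempotency_key] = lsn
    end
    @locations[idempotency_key]
  end

  def latest(idempotency_key)
    @locations[idempotency_key]
  end
end

# A replica is safe to read from once it has replayed at least the
# location preserved for the job.
def replica_caught_up?(replica_lsn, required_lsn)
  lsn_to_i(replica_lsn) >= lsn_to_i(required_lsn)
end

store = LatestWalStore.new
store.record('worker:args-digest', '0/16B3748') # first job is enqueued
store.record('worker:args-digest', '0/16B4000') # duplicate is dropped, location kept
required = store.latest('worker:args-digest')
replica_caught_up?('0/16B3800', required)       # replica is still behind the later write
```

Keeping only the maximum location per key means the surviving job waits for the write made by the most recent duplicate, which is the property the section describes.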
## Limited capacity worker
It is possible to limit the number of concurrent running jobs for a worker class
@@ -553,11 +596,6 @@ class DelayedWorker
end
```
-For [idempotent jobs](#idempotent-jobs), the deduplication is not compatible with the
-`data_consistency` attribute set to `:sticky` or `:delayed`.
-The reason for this is that deduplication always takes into account the latest binary replication pointer into account, not the first one.
-There is an [open issue](https://gitlab.com/gitlab-org/gitlab/-/issues/325291) to improve this.
-
### `feature_flag` property
The `feature_flag` property allows you to toggle a job's `data_consistency`,
@@ -583,6 +621,12 @@ class DelayedWorker
end
```
+### Data consistency with idempotent jobs
+
+For [idempotent jobs](#idempotent-jobs) that declare either `:sticky` or `:delayed` data consistency, we
+[preserve the latest WAL location](#preserve-the-latest-wal-location-for-idempotent-jobs) while deduplicating,
+ensuring that we read from a replica that is fully caught up.
+
## Jobs with External Dependencies
Most background jobs in the GitLab application communicate with other GitLab