From d9ab72d6080f594d0b3cae15f14b3ef2c6c638cb Mon Sep 17 00:00:00 2001
From: GitLab Bot
Date: Wed, 20 Oct 2021 08:43:02 +0000
Subject: Add latest changes from gitlab-org/gitlab@14-4-stable-ee

---
 doc/development/sidekiq_style_guide.md | 66 ++++++++++++++++++++++++++++------
 1 file changed, 55 insertions(+), 11 deletions(-)

(limited to 'doc/development/sidekiq_style_guide.md')

diff --git a/doc/development/sidekiq_style_guide.md b/doc/development/sidekiq_style_guide.md
index 04b7e2f5c45..d45e2073fe7 100644
--- a/doc/development/sidekiq_style_guide.md
+++ b/doc/development/sidekiq_style_guide.md
@@ -154,12 +154,6 @@ A good example of that would be a cache expiration worker.
 A job scheduled for an idempotent worker is [deduplicated](#deduplication) when
 an unstarted job with the same arguments is already in the queue.
 
-WARNING:
-For [data consistency jobs](#job-data-consistency-strategies), the deduplication is not compatible with the
-`data_consistency` attribute set to `:sticky` or `:delayed`.
-The reason for this is that deduplication always takes into account the latest binary replication pointer into account, not the first one.
-There is an [open issue](https://gitlab.com/gitlab-org/gitlab/-/issues/325291) to improve this.
-
 ### Ensuring a worker is idempotent
 
 Make sure the worker tests pass using the following shared example:
@@ -285,6 +279,55 @@ module AuthorizedProjectUpdate
 end
 ```
 
+### Deduplication with load balancing
+
+> [Introduced](https://gitlab.com/groups/gitlab-org/-/epics/6763) in GitLab 14.4.
+
+Jobs that declare either `:sticky` or `:delayed` data consistency
+are eligible for database load balancing.
+In both cases, jobs are [scheduled in the future](#scheduling-jobs-in-the-future) with a short delay (1 second).
+This minimizes the chance of replication lag after a write.
+
+To deduplicate jobs that are eligible for load balancing,
+specify the `including_scheduled: true` argument when defining the deduplication strategy:
+
+```ruby
+class DelayedIdempotentWorker
+  include ApplicationWorker
+  data_consistency :delayed
+
+  deduplicate :until_executing, including_scheduled: true
+  idempotent!
+
+  # ...
+end
+```
+
+#### Preserve the latest WAL location for idempotent jobs
+
+> - [Introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/69372) in GitLab 14.3.
+> - [Enabled on GitLab.com](https://gitlab.com/gitlab-org/gitlab/-/issues/338350) in GitLab 14.4.
+
+Deduplication should always take into account the latest binary replication pointer, not the first one.
+This matters because, when the same job is scheduled a second time, the duplicate is dropped and its Write-Ahead Log (WAL) location is lost.
+The worker could then compare against the older WAL location and read from a stale replica.
+
+To support both deduplication and data consistency with load balancing,
+we preserve the latest WAL location for idempotent jobs in Redis.
+This way, we always compare against the latest binary replication pointer,
+making sure that we read from a replica that is fully caught up.
+
+FLAG:
+On self-managed GitLab, by default this feature is not available.
+To make it available,
+ask an administrator to [enable the `preserve_latest_wal_locations_for_idempotent_jobs` flag](../administration/feature_flags.md).
+FLAG:
+On self-managed GitLab, by default this feature is not available.
+To make it available,
+ask an administrator to [enable the `preserve_latest_wal_locations_for_idempotent_jobs` flag](../administration/feature_flags.md).
+This feature flag is related to GitLab development and is not intended to be used by GitLab administrators.
+On GitLab.com, this feature is available but can be configured by GitLab.com administrators only.
+
 ## Limited capacity worker
 
 It is possible to limit the number of concurrent running jobs for a worker class
@@ -553,11 +596,6 @@ class DelayedWorker
 end
 ```
 
-For [idempotent jobs](#idempotent-jobs), the deduplication is not compatible with the
-`data_consistency` attribute set to `:sticky` or `:delayed`.
-The reason for this is that deduplication always takes into account the latest binary replication pointer into account, not the first one.
-There is an [open issue](https://gitlab.com/gitlab-org/gitlab/-/issues/325291) to improve this.
-
 ### `feature_flag` property
 
 The `feature_flag` property allows you to toggle a job's `data_consistency`,
@@ -583,6 +621,12 @@ class DelayedWorker
 end
 ```
 
+### Data consistency with idempotent jobs
+
+For [idempotent jobs](#idempotent-jobs) that declare either `:sticky` or `:delayed` data consistency, we
+[preserve the latest WAL location](#preserve-the-latest-wal-location-for-idempotent-jobs) while deduplicating,
+ensuring that we read from a replica that is fully caught up.
+
 ## Jobs with External Dependencies
 
 Most background jobs in the GitLab application communicate with other GitLab
-- 
cgit v1.2.1
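The WAL-preservation behavior described in this patch can be illustrated with a small, self-contained Ruby sketch. It is hypothetical and is not GitLab's implementation. `WalLocationStore`, `preserve`, `fetch`, and `lsn_to_i` are invented names, and a plain `Hash` stands in for Redis. The sketch also assumes that WAL locations are PostgreSQL LSN strings in the `HIGH/LOW` hexadecimal form (for example, `0/16B3748`), tracked per database connection name. When a duplicate job is dropped, the store keeps whichever location is newer, so the surviving job is checked against the latest pointer.

```ruby
# Hypothetical sketch, not GitLab's implementation.
# A Hash stands in for Redis; all names here are invented for illustration.
class WalLocationStore
  def initialize
    # idempotency key => { connection name => LSN string }
    @locations = {}
  end

  # Record the WAL locations seen when a job was scheduled. If a duplicate
  # (same idempotency key) was already recorded, keep the newer location
  # for each connection instead of the first one.
  def preserve(idempotency_key, wal_locations)
    stored = (@locations[idempotency_key] ||= {})
    wal_locations.each do |connection, lsn|
      current = stored[connection]
      if current.nil? || self.class.lsn_to_i(lsn) > self.class.lsn_to_i(current)
        stored[connection] = lsn
      end
    end
    stored
  end

  # Locations the surviving job should wait for; empty if none were recorded.
  def fetch(idempotency_key)
    @locations.fetch(idempotency_key, {})
  end

  # Assumes PostgreSQL's "HIGH/LOW" LSN text form; combine both halves into
  # one 64-bit integer so locations compare numerically, not as strings.
  def self.lsn_to_i(lsn)
    high, low = lsn.split('/').map { |part| Integer(part, 16) }
    (high << 32) | low
  end
end

store = WalLocationStore.new
# First job scheduled right after a write.
store.preserve('job-abc', { main: '0/16B3748' })
# A duplicate is scheduled after a later write: the job is dropped,
# but its newer WAL location is kept.
store.preserve('job-abc', { main: '0/16B37A0' })
puts store.fetch('job-abc')[:main] # prints 0/16B37A0
```

Comparing the parsed integers rather than the raw strings matters: as strings, `0/9` would sort after `0/10`, which would wrongly keep the older location.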