---
stage: Enablement
group: Distribution
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---

# Queue routing rules **(FREE SELF)**

When the number of Sidekiq jobs grows beyond a certain scale, the system runs
into scalability issues. One of them is that queues tend to get longer, so
high-urgency jobs have to wait until less urgent jobs ahead of them
finish. This head-of-line blocking can eventually affect the
responsiveness of the system, especially for critical actions. In another scenario,
the performance of some jobs degrades because of long-running or CPU-intensive jobs
(such as computing or rendering jobs) on the same machine.

To counter these issues, one effective solution is to split
Sidekiq jobs into different queues and assign machines to handle each queue
exclusively. For example, all CPU-intensive jobs could be routed to the
`cpu-bound` queue and handled by a fleet of CPU-optimized instances. The queue
topology differs between companies depending on their workloads and usage
patterns. Therefore, GitLab supports a flexible mechanism for the
administrator to route jobs based on their characteristics.

As an alternative to [Queue selector](extra_sidekiq_processes.md#queue-selector), which
configures a Sidekiq cluster to listen to a specific set of workers or queues,
GitLab also supports routing a job from a worker to the desired queue when it
is scheduled. Sidekiq clients try to match a job against a configured list of
routing rules. Rules are evaluated from first to last, and evaluation stops at the
first rule that matches a given worker (first match wins).
If the worker doesn't match any rule, it falls back to the queue name generated
from the worker name.

By default, if the routing rules are not configured (or are set to an empty
array), all jobs are routed to the queue generated from the worker name.
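
The first-match-wins behavior described above can be sketched in plain Ruby. This is an illustration only, not GitLab's implementation: the `route` method, the `RULES` constant, and the `:generated_queue` key are hypothetical names, and each rule uses a lambda as a stand-in for a parsed query.

```ruby
# Hypothetical sketch of first-match-wins routing (not GitLab's code).
# Each rule pairs a predicate, standing in for a parsed query, with a queue name.
RULES = [
  [->(worker) { worker[:urgency] == 'high' }, 'high-urgency'],
  [->(worker) { worker[:feature_category] == 'import' }, nil],
  [->(_worker) { true }, 'default']
].freeze

def route(worker, rules)
  rules.each do |predicate, queue|
    next unless predicate.call(worker)

    # The first matching rule wins. A nil or empty queue name means
    # "use the queue generated from the worker name".
    return (queue.nil? || queue.empty?) ? worker[:generated_queue] : queue
  end

  # No rule matched (or no rules are configured): use the generated queue.
  worker[:generated_queue]
end
```

With an empty rule list, `route` always returns the generated queue name, which mirrors the default behavior.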

## Example configuration

In `/etc/gitlab/gitlab.rb`:

```ruby
sidekiq['routing_rules'] = [
  # Do not re-route workers that require their own queue
  ['tags=needs_own_queue', nil],
  # Route all non-CPU-bound workers that are high urgency to `high-urgency` queue
  ['resource_boundary!=cpu&urgency=high', 'high-urgency'],
  # Route all database, gitaly and global search workers that are throttled to `throttled` queue
  ['feature_category=database,gitaly,global_search&urgency=throttled', 'throttled'],
  # Route all workers that contact external services to a `network-intensive` queue
  ['has_external_dependencies=true|feature_category=hooks|tags=network', 'network-intensive'],
  # Route all import workers to the queues generated by the worker name, for
  # example, JiraImportWorker to `jira_import`, SVNWorker to `svn_worker`
  ['feature_category=import', nil],
  # Wildcard matching, route the rest to `default` queue
  ['*', 'default']
]
```

The routing rules list is an ordered array of tuples, each pairing a query with
its corresponding queue:

- The query follows the [worker matching query](#worker-matching-query) syntax.
- The `<queue_name>` must be a valid Sidekiq queue name. If the queue name
  is `nil` or an empty string, the worker is routed to the queue generated
  from the name of the worker instead.

The query supports wildcard matching `*`, which matches all workers. As a
result, the wildcard query must stay at the end of the list or the rules after it
are ignored.
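
As a rough aid, a misplaced wildcard can be detected before the configuration is applied. The `unreachable_rules` helper below is hypothetical (it is not part of GitLab). It assumes rules are `[query, queue]` pairs as in the example configuration, and it only recognizes a query that is exactly `*`.

```ruby
# Hypothetical helper: list the rules that can never match because they
# come after a wildcard rule.
def unreachable_rules(rules)
  wildcard_index = rules.index { |query, _queue| query == '*' }
  return [] if wildcard_index.nil?

  # Everything after the first wildcard rule is shadowed by it.
  rules.drop(wildcard_index + 1)
end

rules = [
  ['resource_boundary!=cpu&urgency=high', 'high-urgency'],
  ['*', 'default'],
  ['feature_category=import', nil]
]

shadowed = unreachable_rules(rules)
puts "Unreachable rules: #{shadowed.inspect}" unless shadowed.empty?
```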

NOTE:
Mixing queue routing rules and queue selectors requires care to
ensure that all jobs are scheduled and picked up by the appropriate Sidekiq
workers.

## Worker matching query

GitLab provides a simple query syntax to match a worker based on its
attributes. This query syntax is employed by both [Queue routing
rules](#queue-routing-rules) and [Queue
selector](extra_sidekiq_processes.md#queue-selector). A query includes two
components:

- Attributes that can be selected.
- Operators used to construct a query.

### Available attributes

> [Introduced](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/261) in GitLab 13.1 (`tags`).

The worker matching query operates on the worker attributes described in the
[Sidekiq style guide](../../development/sidekiq/index.md). We support querying
based on a subset of worker attributes:

- `feature_category` - the [GitLab feature
  category](https://about.gitlab.com/direction/maturity/#category-maturity) the
  queue belongs to. For example, the `merge` queue belongs to the
  `source_code_management` category.
- `has_external_dependencies` - whether or not the queue connects to external
  services. For example, all importers have this set to `true`.
- `urgency` - how important it is that this queue's jobs run
  quickly. Can be `high`, `low`, or `throttled`. For example, the
  `authorized_projects` queue is used to refresh user permissions, and
  is `high` urgency.
- `worker_name` - the worker name. The other attributes are typically more useful as
  they are more general, but this is available in case a particular worker needs
  to be selected.
- `name` - the queue name generated from the worker name. The other attributes
  are typically more useful as they are more general, but this is available in
  case a particular queue needs to be selected. Because this is generated from
  the worker name, it does not change based on the result of other routing
  rules.
- `resource_boundary` - if the queue is bound by `cpu`, `memory`, or
  `unknown`. For example, the `ProjectExportWorker` is memory bound as it has
  to load data in memory before saving it for export.
- `tags` - short-lived annotations for queues. These are expected to frequently
  change from release to release, and may be removed entirely.

`has_external_dependencies` is a boolean attribute: only the exact
string `true` is considered true, and everything else is considered
false.

`tags` is a set, which means that `=` checks for intersecting sets, and
`!=` checks for disjoint sets. For example, `tags=a,b` selects queues
that have tags `a`, `b`, or both. `tags!=a,b` selects queues that have
neither of those tags.
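
The set and boolean semantics above can be illustrated with Ruby's `Set`. This is a minimal sketch under the stated rules, not GitLab's code. The method names `tags_match?` and `external_dependencies?` are made up for the example.

```ruby
require 'set'

# `=` is true when the worker's tags and the queried tags intersect;
# `!=` is true when the two sets are disjoint.
def tags_match?(worker_tags, operator, query_values)
  overlap = Set.new(worker_tags).intersect?(Set.new(query_values))
  operator == '=' ? overlap : !overlap
end

# Only the exact string `true` counts as true; anything else is false.
def external_dependencies?(query_value)
  query_value == 'true'
end

tags_match?(%w[a c], '=', %w[a b])  # worker has tag `a`, so `tags=a,b` selects it
tags_match?(%w[c], '!=', %w[a b])   # worker has neither tag, so `tags!=a,b` selects it
external_dependencies?('yes')       # not the exact string `true`, so treated as false
```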

The attributes of each worker are hard-coded in the source code. For
convenience, we generate a [list of all available attributes in
GitLab Community Edition](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/workers/all_queues.yml)
and a [list of all available attributes in
GitLab Enterprise Edition](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/workers/all_queues.yml).

### Available operators

`queue_selector` supports the following operators, listed from highest
to lowest precedence:

- `|` - the logical OR operator. For example, `query_a|query_b` (where `query_a`
  and `query_b` are queries made up of the other operators here) includes
  queues that match either query.
- `&` - the logical AND operator. For example, `query_a&query_b` (where
  `query_a` and `query_b` are queries made up of the other operators here)
  includes only queues that match both queries.
- `!=` - the NOT IN operator. For example, `feature_category!=issue_tracking`
  excludes all queues from the `issue_tracking` feature category.
- `=` - the IN operator. For example, `resource_boundary=cpu` includes all
  queues that are CPU bound.
- `,` - the concatenate set operator. For example,
  `feature_category=continuous_integration,pages` includes all queues from
  either the `continuous_integration` category or the `pages` category. This
  example is also possible using the OR operator, but allows greater brevity, as
  well as being lower precedence.

The operator precedence for this syntax is fixed: it's not possible to make AND
have higher precedence than OR.
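
To make the operators concrete, here is a small matcher sketch. It is not GitLab's implementation. It assumes the operators are applied in the listed order by splitting the query on `|` first, then on `&`, then on `!=` or `=`, and finally on `,`. The `matches?` method and the worker hash layout are hypothetical.

```ruby
# Hypothetical matcher sketch. Assumption: split on `|`, then `&`,
# then `!=`/`=`, then `,`. A lone `*` matches every worker.
def matches?(query, worker)
  return true if query == '*'

  query.split('|').any? do |alternative|
    alternative.split('&').all? do |term|
      attribute, operator, values = term.partition(/!=|=/)
      raise ArgumentError, "invalid term: #{term}" if operator.empty?

      # Treat every attribute as a set of strings so that scalar
      # attributes and `tags` share the same intersect/disjoint logic.
      actual = Array(worker[attribute.to_sym]).map(&:to_s)
      hit = (actual & values.split(',')).any?
      operator == '=' ? hit : !hit
    end
  end
end

worker = {
  feature_category: 'hooks',
  urgency: 'high',
  resource_boundary: 'cpu',
  has_external_dependencies: false,
  tags: ['network']
}

matches?('resource_boundary!=cpu&urgency=high', worker)
matches?('has_external_dependencies=true|feature_category=hooks|tags=network', worker)
```

Under this assumption, `a|b&c` is read as `a`, or both `b` and `c`. The syntax has no way to group `a|b` before applying `&`.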

[In GitLab 12.9](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/26594) and
later, as with the standard queue group syntax, a single `*` as the
entire queue group selects all queues.

### Migration

After the Sidekiq routing rules are changed, administrators must migrate
existing jobs carefully to avoid losing them entirely, especially in a system with
long queues of jobs. To perform the migration, follow the steps in [Sidekiq job
migration](../../raketasks/sidekiq_job_migration.md).

### Workers that cannot be migrated

Some workers cannot share a queue with other workers - typically because
they check the size of their own queue - and so must be excluded from
this process. We recommend excluding these from any further worker
routing by adding a rule to keep them in their own queue, for example:

```ruby
sidekiq['routing_rules'] = [
  ['tags=needs_own_queue', nil],
  # ...
]
```

These queues must also be included in at least one [Sidekiq
queue group](extra_sidekiq_processes.md#start-multiple-processes).

The following table shows the workers that should have their own queue:

| Worker name | Queue name | GitLab issue |
| --- | --- | --- |
| `EmailReceiverWorker` | `email_receiver` | [`gitlab-com/gl-infra/scalability#1263`](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1263) |
| `ServiceDeskEmailReceiverWorker` | `service_desk_email_receiver` | [`gitlab-com/gl-infra/scalability#1263`](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1263) |
| `ProjectImportScheduleWorker` | `project_import_schedule` | [`gitlab-org/gitlab#340630`](https://gitlab.com/gitlab-org/gitlab/-/issues/340630) |
| `HashedStorage::MigratorWorker` | `hashed_storage:hashed_storage_migrator` | [`gitlab-org/gitlab#340629`](https://gitlab.com/gitlab-org/gitlab/-/issues/340629) |
| `HashedStorage::ProjectMigrateWorker` | `hashed_storage:hashed_storage_project_migrate` | [`gitlab-org/gitlab#340629`](https://gitlab.com/gitlab-org/gitlab/-/issues/340629) |
| `HashedStorage::ProjectRollbackWorker` | `hashed_storage:hashed_storage_project_rollback` | [`gitlab-org/gitlab#340629`](https://gitlab.com/gitlab-org/gitlab/-/issues/340629) |
| `HashedStorage::RollbackerWorker` | `hashed_storage:hashed_storage_rollbacker` | [`gitlab-org/gitlab#340629`](https://gitlab.com/gitlab-org/gitlab/-/issues/340629) |
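
One way to confirm that every queue in the table is covered by at least one queue group is a small check like the one below. The `uncovered_queues` helper and the `OWN_QUEUE_NAMES` constant are hypothetical. The sketch assumes each queue group is a string of comma-separated queue names, and that a group consisting of a single `*` selects all queues.

```ruby
# Queue names taken from the table above.
OWN_QUEUE_NAMES = %w[
  email_receiver
  service_desk_email_receiver
  project_import_schedule
  hashed_storage:hashed_storage_migrator
  hashed_storage:hashed_storage_project_migrate
  hashed_storage:hashed_storage_project_rollback
  hashed_storage:hashed_storage_rollbacker
].freeze

# Hypothetical helper: return the required queues that no queue group lists.
def uncovered_queues(queue_groups, required = OWN_QUEUE_NAMES)
  # A group that is exactly `*` listens to every queue.
  return [] if queue_groups.any? { |group| group.strip == '*' }

  listed = queue_groups.flat_map { |group| group.split(',') }.map(&:strip)
  required - listed
end

missing = uncovered_queues(['default,email_receiver', 'project_import_schedule'])
puts "Queues not in any queue group: #{missing.join(', ')}" unless missing.empty?
```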