diff options
Diffstat (limited to 'doc/administration/operations/extra_sidekiq_routing.md')
-rw-r--r-- | doc/administration/operations/extra_sidekiq_routing.md | 164 |
1 files changed, 164 insertions, 0 deletions
diff --git a/doc/administration/operations/extra_sidekiq_routing.md b/doc/administration/operations/extra_sidekiq_routing.md new file mode 100644 index 00000000000..93cf8bd4f43 --- /dev/null +++ b/doc/administration/operations/extra_sidekiq_routing.md @@ -0,0 +1,164 @@ +--- +stage: Enablement +group: Distribution +info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments +--- + +# Queue routing rules **(FREE SELF)** + +When the number of Sidekiq jobs increases to a certain scale, the system faces +some scalability issues. One of them is that the length of the queue tends to get +longer. High-urgency jobs have to wait longer until other less urgent jobs +finish. This head-of-line blocking situation may eventually affect the +responsiveness of the system, especially critical actions. In another scenario, +the performance of some jobs is degraded due to other long running or CPU-intensive jobs +(computing or rendering ones) in the same machine. + +To counter the aforementioned issues, one effective solution is to split +Sidekiq jobs into different queues and assign machines handling each queue +exclusively. For example, all CPU-intensive jobs could be routed to the +`cpu-bound` queue and handled by a fleet of CPU optimized instances. The queue +topology differs between companies depending on the workloads and usage +patterns. Therefore, GitLab supports a flexible mechanism for the +administrator to route the jobs based on their characteristics. + +As an alternative to [Queue selector](extra_sidekiq_processes.md#queue-selector), which +configures Sidekiq cluster to listen to a specific set of workers or queues, +GitLab also supports routing a job from a worker to the desired queue when it +is scheduled. Sidekiq clients try to match a job against a configured list of +routing rules. Rules are evaluated from first to last, and as soon as we find a +match for a given worker we stop processing for that worker (first match wins). +If the worker doesn't match any rule, it falls back to the queue name generated +from the worker name. + +By default, if the routing rules are not configured (or denoted with an empty +array), all the jobs are routed to the queue generated from the worker name. + +## Example configuration + +In `/etc/gitlab/gitlab.rb`: + +```ruby +sidekiq['routing_rules'] = [ + # Route all non-CPU-bound workers that are high urgency to `high-urgency` queue + ['resource_boundary!=cpu&urgency=high', 'high-urgency'], + # Route all database, gitaly and global search workers that are throttled to `throttled` queue + ['feature_category=database,gitaly,global_search&urgency=throttled', 'throttled'], + # Route all workers having contact with outside work to a `network-intenstive` queue + ['has_external_dependencies=true|feature_category=hooks|tags=network', 'network-intensive'], + # Route all import workers to the queues generated by the worker name, for + # example, JiraImportWorker to `jira_import`, SVNWorker to `svn_worker` + ['feature_category=import', nil], + # Wildcard matching, route the rest to `default` queue + ['*', 'default'] +] +``` + +The routing rules list is an order-matter array of tuples of query and +corresponding queue: + +- The query is following a [worker matching query](#worker-matching-query) syntax. +- The `<queue_name>` must be a valid Sidekiq queue name. If the queue name + is `nil`, or an empty string, the worker is routed to the queue generated + by the name of the worker instead. + +The query supports wildcard matching `*`, which matches all workers. As a +result, the wildcard query must stay at the end of the list or the rules after it +are ignored. + +NOTE: +Mixing queue routing rules and queue selectors requires care to +ensure all jobs that are scheduled and picked up by appropriate Sidekiq +workers. + +## Worker matching query + +GitLab provides a simple query syntax to match a worker based on its +attributes. This query syntax is employed by both [Queue routing +rules](#queue-routing-rules) and [Queue +selector](extra_sidekiq_processes.md#queue-selector). A query includes two +components: + +- Attributes that can be selected. +- Operators used to construct a query. + +### Available attributes + +> [Introduced](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/261) in GitLab 13.1 (`tags`). + +Queue matching query works upon the worker attributes, described in [Sidekiq +style guide](../../development/sidekiq_style_guide.md). We support querying +based on a subset of worker attributes: + +- `feature_category` - the [GitLab feature + category](https://about.gitlab.com/direction/maturity/#category-maturity) the + queue belongs to. For example, the `merge` queue belongs to the + `source_code_management` category. +- `has_external_dependencies` - whether or not the queue connects to external + services. For example, all importers have this set to `true`. +- `urgency` - how important it is that this queue's jobs run + quickly. Can be `high`, `low`, or `throttled`. For example, the + `authorized_projects` queue is used to refresh user permissions, and + is high urgency. +- `worker_name` - the worker name. The other attributes are typically more useful as + they are more general, but this is available in case a particular worker needs + to be selected. +- `name` - the queue name. The other attributes are typically more useful as + they are more general, but this is available in case a particular queue needs + to be selected. +- `resource_boundary` - if the queue is bound by `cpu`, `memory`, or + `unknown`. For example, the `ProjectExportWorker` is memory bound as it has + to load data in memory before saving it for export. +- `tags` - short-lived annotations for queues. These are expected to frequently + change from release to release, and may be removed entirely. + +`has_external_dependencies` is a boolean attribute: only the exact +string `true` is considered true, and everything else is considered +false. + +`tags` is a set, which means that `=` checks for intersecting sets, and +`!=` checks for disjoint sets. For example, `tags=a,b` selects queues +that have tags `a`, `b`, or both. `tags!=a,b` selects queues that have +neither of those tags. + +The attributes of each worker are hard-coded in the source code. For +convenience, we generate a [list of all available attributes in +GitLab Community Edition](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/workers/all_queues.yml) +and a [list of all available attributes in +GitLab Enterprise Edition](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/workers/all_queues.yml). + +### Available operators + +`queue_selector` supports the following operators, listed from highest +to lowest precedence: + +- `|` - the logical OR operator. For example, `query_a|query_b` (where `query_a` + and `query_b` are queries made up of the other operators here) will include + queues that match either query. +- `&` - the logical AND operator. For example, `query_a&query_b` (where + `query_a` and `query_b` are queries made up of the other operators here) will + only include queues that match both queries. +- `!=` - the NOT IN operator. For example, `feature_category!=issue_tracking` + excludes all queues from the `issue_tracking` feature category. +- `=` - the IN operator. For example, `resource_boundary=cpu` includes all + queues that are CPU bound. +- `,` - the concatenate set operator. For example, + `feature_category=continuous_integration,pages` includes all queues from + either the `continuous_integration` category or the `pages` category. This + example is also possible using the OR operator, but allows greater brevity, as + well as being lower precedence. + +The operator precedence for this syntax is fixed: it's not possible to make AND +have higher precedence than OR. + +[In GitLab 12.9](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/26594) and +later, as with the standard queue group syntax above, a single `*` as the +entire queue group selects all queues. + +### Migration + +After the Sidekiq routing rules are changed, administrators need to take care +with the migration to avoid losing jobs entirely, especially in a system with +long queues of jobs. The migration can be done by following the migration steps +mentioned in [Sidekiq job +migration](../../raketasks/sidekiq_job_migration.md) |