---
status: ready
creation-date: "2022-09-08"
authors: [ "@grzesiek", "@marshall007", "@fabiopitino", "@hswimelar" ]
coach: "@andrewn"
approvers: [ "@sgoldstein" ]
owning-stage:
participating-stages: []
---

# Next Rate Limiting Architecture

## Summary

Introducing reasonable application limits is an important step in any SaaS
platform scaling strategy. The more users a SaaS platform has, the more
important it is to introduce sensible rate limiting and policy enforcement
that helps achieve availability goals, reduces the noisy neighbour problem
for users, and ensures that they can keep using the platform successfully.

This is especially true for GitLab.com. Our goal is to have a reasonable and
transparent strategy for enforcing application limits, which will become the
definition of responsible usage and help us keep availability and user
satisfaction at the desired level.

We have been introducing various application limits for many years, but we have
never had a consistent strategy for doing so. What we want to build now is a
consistent framework, used by engineers and product managers across the entire
application stack, to define, expose, and enforce limits and policies.

The lack of consistency in defining limits, and the inability to expose them to
our users, support engineers, and satellite services, has a negative impact on
our productivity, makes it difficult to introduce new limits, and ultimately
prevents us from enforcing responsible usage on all layers of our application
stack.

This blueprint has been written to consolidate our limits and to describe the
vision of our next rate limiting and policies enforcement architecture.

## Goals

**Implement the next architecture for rate limiting and policy definition.**

## Challenges

- We have many ways to define application limits, in many different places.
- It is difficult to understand what limits have been applied to a request.
- It is difficult to introduce new limits, and even more difficult to define policies.
- Finding what limits are defined requires performing a codebase audit.
- We don't have a good way to expose limits to satellite services like Registry.
- We enforce a number of different policies via opaque external systems
  (Pipeline Validation Service, Bouncer, Watchtower, Cloudflare, HAProxy).
- There is no standardized way to define policies that is consistent with how we define limits.
- It is difficult to understand when a user is approaching a limit threshold.
- There is no way to automatically notify a user when they are approaching thresholds.
- There is no single way to change limits for a namespace / project / user / customer.
- There is no single way to monitor limits through real-time metrics.
- There is no framework for hierarchical limit configuration (instance / namespace / subgroup / project).
- We allow disabling rate limiting for some marquee SaaS customers, but this
  increases risk for those same customers. We should instead be able to set
  higher limits.

## Opportunity

We want to build a new framework that makes it easier to define limits, quotas,
and policies, and to enforce and adjust them in a controlled way, supported by
robust monitoring capabilities.

<!-- markdownlint-disable MD029 -->

1. Build a framework to define and enforce limits in GitLab Rails.
2. Build an API to consume limits in satellite services and expose them to users.
3. Extract parts of this framework into a dedicated GitLab Limits Service.

<!-- markdownlint-enable MD029 -->

The most important opportunity here is consolidation happening on multiple
levels:

1. Consolidate on the application limits tooling used in GitLab Rails.
1. Consolidate on the process of adding and managing application limits.
1. Consolidate on the behavior of the hierarchical cascade of limits and overrides.
1. Consolidate on the application limits tooling used across the entire application stack.
1. Consolidate on the policy enforcement tooling used across the entire company.

Once we do that, we will unlock another opportunity: shipping the new framework /
tooling as a GitLab feature, bringing these consolidation benefits to our
users, customers, and the wider community.

### Limits, quotas and policies

This document aims to describe our technical vision for building the next rate
limiting architecture for GitLab.com. We refer to this architectural evolution
as "the next rate limiting architecture", but this is a mental shortcut,
because we actually want to build a better framework that will make it easier
for us to manage not only rate limits, but also quotas and policies.

Below are short definitions of what we mean by a limit, a quota, and a policy.

- **Limit:** A constraint on application usage, typically used to mitigate
  risks to performance, stability, and security.
  - _Example:_ API calls per second for a given IP address
  - _Example:_ `git clone` events per minute for a given user
  - _Example:_ maximum artifact upload size of 1 GB
- **Quota:** A global constraint on application usage, aggregated across an
  entire namespace over the duration of its billing cycle.
  - _Example:_ 400 CI/CD minutes per namespace per month
  - _Example:_ 10 GB transfer per namespace per month
- **Policy:** A representation of business logic that is decoupled from application
  code. Decoupled policy definitions allow logic to be shared across multiple services
  and/or "hot-loaded" at runtime without releasing a new version of the application.
  - _Example:_ decode and verify a JWT, determine whether the user has access to the
    given resource based on the JWT scopes and claims
  - _Example:_ deny access based on group-level constraints
    (such as IP allowlist, SSO, and 2FA) across all services

Technically, all of these are limits: rate limiting is still "limiting", a
quota is usually a business limit, and a policy limits what you can do with the
application in order to enforce specific rules. When we refer to a "limit" in
this document, we mean a limit defined to protect the business, availability,
and security.

### Framework to define and enforce limits

First, we want to build a new framework that will allow us to define and enforce
application limits, in the GitLab Rails project context, in a more consistent
and established way. To do that, we will need to build a new abstraction that
tells engineers how to define a limit in a structured way (presumably using the
YAML or Cue format) and how to consume the limit in the application itself.

We already have many limits defined in the application. We can use them to
triangulate a reasonable abstraction that consolidates how we define, use, and
enforce limits.

We envision building a simple Ruby library here (we can add it to LabKit) that
will make it trivial for engineers to check if a certain limit has been
exceeded or not.

```yaml
name: my_limit_name
actors: user
context: project, group, pipeline
type: rate / second
group: pipeline::execution
limits:
  warn: 2B / day
  soft: 100k / s
  hard: 500k / s
```

```ruby
Gitlab::Limits::RateThreshold.enforce(:my_limit_name) do |threshold|
  # The actor and context scope the limit defined in YAML to the
  # user and project handling the current request.
  actor   = current_user
  context = current_project

  threshold.available do |limit|
    # Usage is well below the configured thresholds, proceed normally.
  end

  threshold.approaching do |limit|
    # The warning threshold has been crossed, for example notify the user.
  end

  threshold.exceeded do |limit|
    # A soft / hard threshold has been exceeded, for example throttle or reject.
  end
end
```

In the example above, when `my_limit_name` is defined in YAML, engineers will
be able to check the current state and execute the appropriate code block
depending on past usage / resource consumption.

Things we want to build and support by default:

1. Comprehensive dashboards showing how often limits are being hit.
1. Notifications about the risk of hitting limits.
1. Automation checking that limit definitions are being enforced properly.
1. Different types of limits: time-bound, number per resource, and so on.
1. A panel that makes it easy to override limits per plan / namespace.
1. Logging that exposes the limits applied in Kibana.
1. An automatically generated documentation page describing all the limits (see the sketch below).
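
For illustration only, the sketch below shows how such automation and
documentation could be derived directly from the YAML definitions. The
`config/limits/*.yml` path, the required keys, and the output location are
assumptions made for this example, not an agreed implementation.

```ruby
require 'yaml'

# Hypothetical location and schema for limit definitions, matching the YAML
# example shown earlier in this document.
REQUIRED_KEYS = %w[name actors context type group limits].freeze

definitions = Dir.glob('config/limits/*.yml').map do |path|
  YAML.safe_load(File.read(path))
end

# Automated check: every definition has the structure we expect.
definitions.each do |definition|
  missing = REQUIRED_KEYS - definition.keys
  raise "#{definition['name'] || 'unnamed limit'} is missing keys: #{missing.join(', ')}" unless missing.empty?
end

# Auto-generated documentation: one table row per limit definition.
rows = definitions.map do |d|
  "| `#{d['name']}` | #{d['group']} | #{d['limits']['warn']} | #{d['limits']['soft']} | #{d['limits']['hard']} |"
end

header = ['| Limit | Group | Warn | Soft | Hard |',
          '|-------|-------|------|------|------|']

# Hypothetical output path for the generated documentation page.
File.write('doc/generated/limits.md', (header + rows).join("\n") + "\n")
```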

### API to expose limits and policies

Once we have established a consistent way to define application limits, we can
build a few API endpoints that will allow us to expose them to our users,
customers, and other satellite services that may want to consume them.

Users will be able to ask the API about the limits / thresholds that have been
set for them, how often they are hitting them, and what impact those might have
on their business. This kind of transparency can help them communicate their
needs to the customer success team at GitLab, and we will be able to
communicate how responsible usage is defined at a given moment.

Because of how the GitLab architecture has been built, the GitLab Rails
application in most cases behaves as a central enterprise service bus (ESB),
with a few satellite services communicating with it. Services like Container
Registry, GitLab Runners, Gitaly, Workhorse, and KAS could use the API to
receive the set of application limits they are supposed to enforce, while
still allowing us to define all of them in a single place.
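
As an illustration, such an endpoint could return the limits relevant to a
single satellite service in one response. The route, the
`Gitlab::Limits.for_service` lookup, and the payload shape below are
hypothetical and shown only to make the idea concrete.

```ruby
# Hypothetical Rails endpoint; none of these names are an agreed API.
class LimitsController < ApplicationController
  # GET /api/v4/limits?service=container_registry
  def index
    limits = Gitlab::Limits.for_service(params[:service]) # hypothetical lookup

    render json: limits.map { |limit|
      {
        name: limit.name,             # for example "registry_uploads_per_minute"
        actors: limit.actors,         # for example ["user", "project"]
        thresholds: limit.thresholds  # for example { "soft" => "100/min", "hard" => "500/min" }
      }
    }
  end
end
```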

We should, however, avoid a possible negative feedback loop that would put
additional strain on the Rails application when there is a sudden increase in
usage. This might be a big customer starting a new automation that traverses
our API, or a Denial of Service attack. In such cases, the additional traffic
reaches GitLab Rails and subsequently the other satellite services. The
satellite services may then need to consult Rails again to obtain new
instructions / policies for rate limiting the increased traffic. This can put
additional strain on the Rails application and degrade performance even
further. To avoid this problem, we should extract the API endpoints into a
separate service (see the section below) if the request rate to those
endpoints depends on the volume of incoming traffic. Alternatively, we can keep
those endpoints in Rails if the increased traffic does not translate into an
increased request rate or increased resource consumption on these API
endpoints on the Rails side.
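
One way to keep satellite services from amplifying a traffic spike is to let
them cache the limits they received and refresh them on a fixed interval,
rather than consulting Rails on every decision. The sketch below is a minimal
illustration of that idea; the endpoint and refresh interval are arbitrary
assumptions.

```ruby
require 'net/http'
require 'json'

# Minimal client-side cache a satellite service could use for limits.
class CachedLimits
  REFRESH_INTERVAL = 60 # seconds; slightly stale limits are acceptable here

  def initialize(endpoint)
    @endpoint = URI(endpoint)
    @limits = {}
    @cached_at = nil
  end

  def fetch
    refresh if @cached_at.nil? || Time.now - @cached_at > REFRESH_INTERVAL
    @limits
  rescue StandardError
    @limits # on failure, serve the last known limits instead of retrying Rails
  end

  private

  def refresh
    @limits = JSON.parse(Net::HTTP.get(@endpoint))
    @cached_at = Time.now
  end
end

limits = CachedLimits.new('https://gitlab.example.com/api/v4/limits?service=registry')
limits.fetch # a traffic spike reuses the cached limits instead of re-querying Rails
```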

#### Decoupled Limits Service

At some point we may decide that it is time to extract out of Rails a stateful
backend responsible for storing limit metadata, all the required counters and
state, and for exposing the API.

It is impossible to make a decision about extracting such a decoupled limits
service yet, because we will need to ship more proof-of-concept work and
concrete iterations to better inform us about when and how we should do that.
We will depend on the Evolution Architecture practice to guide us towards
either extracting the Decoupled Limits Service or not doing that at all.

As we evolve this blueprint, we will document our findings and insights about
what this service should look like in this section of the document.

### GitLab Policy Service

_Disclaimer_: Extracting a GitLab Policy Service might be out of scope
of the current workstream organized around implementing this blueprint.

Not all limits can be easily described in YAML. Some more complex policies
require a more sophisticated approach and a declarative programming language to
enforce them. One example of such a language is
[Rego](https://www.openpolicyagent.org/docs/latest/policy-language/),
the standardized way to define policies in
[OPA - Open Policy Agent](https://www.openpolicyagent.org/). At GitLab we are
already using OPA in some departments. We envision the need for additional
consolidation, not only to consolidate on the tooling we are using internally
at GitLab, but also to transform the Next Rate Limiting Architecture into
something we can make a part of the product itself.

Today, we already have a policy service that we use to decide whether a
pipeline can be created. There are many policies defined in
[Pipeline Validation Service](https://gitlab.com/gitlab-org/modelops/anti-abuse/pipeline-validation-service).
There is a significant opportunity here in transforming the Pipeline Validation
Service into a general-purpose GitLab Policy Service / GitLab Policy Agent that
is well integrated into the GitLab product itself.

Generalizing Pipeline Validation Service into GitLab Policy Service can bring a
few interesting benefits:

1. Consolidate on our tooling across the company to improve efficiency.
1. Integrate our GitLab Rails limits framework to resolve policies using the policy service.
1. Avoid struggling to define complex policies in YAML and hacking their evaluation in Ruby.
1. Build a policy for limiting GraphQL queries using query execution cost estimation (see the sketch after this list).
1. Make it easier to resolve policies that do not need the "hierarchical limits" structure.
1. Make GitLab Policy Service part of the product and integrate it into the single application.
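
For the GraphQL item above, graphql-ruby already supports capping queries by
estimated execution cost; a policy-service approach would move such thresholds
out of application code so they can be tuned without a release. The schema
class, the root type, and the numbers below are illustrative assumptions only.

```ruby
require 'graphql'

# Hypothetical schema shown only to illustrate query cost estimation.
class ExampleSchema < GraphQL::Schema
  query Types::QueryType # hypothetical root query type

  max_complexity 250     # reject queries whose estimated execution cost exceeds 250
  max_depth 15           # additionally bound how deeply queries can nest
end
```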

We envision GitLab Policy Service being the place to define policies that do
not require knowing anything about the hierarchical structure of the limits.
There are limits that do not need this, such as IP address allowlists, spam
checks, and configuration validation.

We defined a "policy" as a stateless, functional-style limit. It takes input
arguments and evaluates to either true or false. It should not require a global
counter or any other volatile global state to be evaluated. It may still
require globally defined rules / configuration, but this state is not volatile
in the same way a rate limiting counter is, or the megabytes consumed that are
needed to evaluate a quota.
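
For example, an IP allowlist check fits this definition: it is a pure function
of the request and some non-volatile configuration, with no counters involved.
A minimal sketch, with a hypothetical class name and input shape:

```ruby
require 'ipaddr'

# A stateless, functional-style policy: configuration in, boolean out.
class IpAllowlistPolicy
  def initialize(allowed_cidrs)
    # Globally defined rules / configuration, but not volatile state.
    @allowed = allowed_cidrs.map { |cidr| IPAddr.new(cidr) }
  end

  # Evaluates to true or false; no counters, no clocks, no global mutation.
  def allowed?(remote_ip)
    ip = IPAddr.new(remote_ip)
    @allowed.any? { |cidr| cidr.include?(ip) }
  end
end

policy = IpAllowlistPolicy.new(['10.0.0.0/8', '192.168.1.0/24'])
policy.allowed?('10.1.2.3')    # => true
policy.allowed?('203.0.113.7') # => false
```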

#### Policies used internally and externally

The GitLab Policy Service might be used in two different ways:

1. The Rails limits framework will use it as a source of policies enforced internally.
1. The policy service feature will be used as a backend to store policies defined by users.

These are two slightly different use cases: the first is about using
internally-defined policies to ensure the stability / availability of a GitLab
instance (GitLab.com or a self-managed instance). The second is about making
GitLab Policy Service a feature that users will be able to build on top of.

Both use cases are valid, but we will need to make a technical decision about
how to separate them. Even if we decide to implement them both in a single
service, we will need to draw a strong boundary between the two.

The same principle might apply to the Decoupled Limits Service described in one
of the sections above.

#### The two limits / policy services

It is possible that GitLab Policy Service and the Decoupled Limits Service
could actually be the same thing. This, however, depends on implementation
details that we can't predict yet, and the decision about merging these
services will need to be informed by feedback from subsequent iterations.

## Hierarchical limits

The GitLab application aggregates users, projects, groups, and namespaces in a
hierarchical way. This hierarchical structure has been designed to make it
easier to manage permissions, streamline workflows, and allow users and
customers to store related projects, repositories, and other artifacts
together.

It is important to design the new rate limiting framework so that it is built
on top of this hierarchical structure, and so that engineers, customers, SREs,
and other stakeholders can understand how limits are applied, enforced, and
overridden within the hierarchy of namespaces, groups, and projects.

We want to reduce the cognitive load required to understand how limits are
being managed within the existing permissions structure. We might need to build
a simple and easy-to-understand formula for how our application decides which
limits and thresholds to apply for a given request and a given actor:

> GitLab will read the default limits for every operation and all configured
> overrides, and will choose the limit with the highest precedence. A limit
> precedence needs to be explicitly configured for every override; a default
> limit has precedence 100.
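
Expressed as code, the rule above boils down to picking the candidate with the
highest precedence. The sketch below is a minimal illustration of that
selection; the `Limit` structure and the sample values are hypothetical.

```ruby
# Default limits defined in YAML carry an implicit precedence of 100.
DEFAULT_PRECEDENCE = 100

Limit = Struct.new(:name, :value, :precedence)

def effective_limit(name, defaults:, overrides:)
  candidates = []
  candidates << Limit.new(name, defaults.fetch(name), DEFAULT_PRECEDENCE) if defaults.key?(name)
  candidates += overrides.select { |override| override.name == name } # explicit precedence required

  candidates.max_by(&:precedence) # the highest precedence wins
end

defaults  = { 'pipelines_per_minute' => 25 }
overrides = [Limit.new('pipelines_per_minute', 400, 200)] # for example, a namespace-level override

effective_limit('pipelines_per_minute', defaults: defaults, overrides: overrides).value
# => 400
```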

One way in which we can simplify limits management in general is to:

1. Have default limits / thresholds defined in YAML files with a default precedence of 100.
1. Allow limits to be overridden through the API, and store the overrides in the database.
1. Require every limit / threshold override to have an integer precedence value.
1. Build an API that takes an actor and exposes the limits applicable to it.
1. Build a dashboard showing actors with non-standard limits / overrides.
1. Build observability around this, showing in Kibana when non-standard limits are being used.

The points above represent the idea of using a precedence score (or a Z-Index
for limits), but there may be better solutions, like simply defining a
direction of overrides: a lower limit might always override a limit defined
higher in the hierarchy. Choosing a proper solution will require thoughtful
research.

## Principles

1. Try to avoid building the rate limiting framework in a tightly coupled way.
1. Build the application limits API in a way that allows it to be easily extracted to a separate service.
1. Build application limit definitions in a way that is independent from the Rails application.
1. Build tooling that produces consistent behavior and results across programming languages.
1. Build the new framework in a way that we can extend it to allow self-managed administrators to customize limits.
1. Maintain consistent features and behavior across the SaaS and self-managed codebase.
1. Be mindful of the cognitive load added by hierarchical limits, and aim to reduce it.

## Phases and iterations

**Phase 1**: Compile examples of the current most important application limits - Owning Team

1. Owning Team (in collaboration with Stage Groups) compiles a list of the most important application limits used in Rails today.

**Phase 2**: Implement the Rate Limiting Framework in Rails - Owning Team

1. Triangulate rate limiting abstractions based on the data gathered in Phase 1.
1. Develop the YAML model for limits.
1. Build the Rails SDK.
1. Create examples showcasing usage of the new rate limits SDK.

**Phase 3**: Team fan out of Rails SDK - Stage Groups

1. Individual stage groups begin using the SDK built in Phase 2 for new limits and policies.
1. Stage groups begin replacing historical ad hoc limit implementations with the SDK.
1. Owning Team provides means to monitor and observe the progress of the replacement effort, ideally broken down to the `feature_category` level to drive group-level buy-in.

**Phase 4**: Enable Satellite Services to Use the Rate Limiting Framework - Owning Team

1. Determine whether the goals of Phase 4 are best met by either:
   1. extracting the Rails rate limiting service into a decoupled service, or
   1. implementing a separate Go library that uses the same backend (for example, Redis) for rate limiting.

**Phase 5**: SDK for Satellite Services - Owning Team

1. Build the Golang SDK.
1. Create examples showcasing usage of the new rate limits SDK.

**Phase 6**: Team fan out for Satellite Services - Stage Groups

1. Individual stage groups begin using the SDK built in Phase 5 for new limits and policies.
1. Stage groups begin replacing historical ad hoc limit implementations with the SDK.

## Status

Request For Comments.

## Timeline

- 2022-04-27: [Rate Limit Architecture Working Group](https://about.gitlab.com/company/team/structure/working-groups/rate-limit-architecture/) started.
- 2022-06-07: Working Group members [started submitting technical proposals](https://gitlab.com/gitlab-org/gitlab/-/issues/364524) for the next rate limiting architecture.
- 2022-06-15: We started [scoring proposals](https://docs.google.com/spreadsheets/d/1DFHU1kSdTnpydwM5P2RK8NhVBNWgEHvzT72eOhB8F9E) submitted by Working Group members.
- 2022-07-06: A fourth, [consolidated proposal](https://gitlab.com/gitlab-org/gitlab/-/issues/364524#note_1017640650), has been submitted.
- 2022-07-12: Started working on the design document following [Architecture Evolution Workflow](https://about.gitlab.com/handbook/engineering/architecture/workflow/).
- 2022-09-08: The initial version of the blueprint has been merged.