Diffstat (limited to 'doc/architecture/blueprints/runner_tokens')
-rw-r--r-- | doc/architecture/blueprints/runner_tokens/index.md | 227 |
1 files changed, 227 insertions, 0 deletions
diff --git a/doc/architecture/blueprints/runner_tokens/index.md b/doc/architecture/blueprints/runner_tokens/index.md
new file mode 100644
index 00000000000..3f8a27e503d
--- /dev/null
+++ b/doc/architecture/blueprints/runner_tokens/index.md
@@ -0,0 +1,227 @@
+---
+stage: Verify
+group: Runner
+comments: false
+description: 'Next Runner Token Architecture'
+---
+
+# Next GitLab Runner Token Architecture
+
+## Summary
+
+GitLab Runner is a core component of GitLab CI/CD that runs
+CI/CD jobs in a reliable and concurrent environment. Ever since the beginnings
+of the service as a Ruby program, runners have been registered in a GitLab instance with
+a registration token - a randomly generated string of text. The registration token is unique for its given scope
+(instance, group, or project). The registration token proves that the party that registers the runner has
+administrator access to the instance, group, or project to which the runner is registered.
+
+This approach worked well in the initial years, but some major known issues started to
+become apparent as the target audience grew:
+
+| Problem | Symptoms |
+|---------------------------------------------|----------|
+| Single token per scope | - The registration token is shared by multiple runners: <br/>- Single tokens lower the value of auditing and make traceability almost impossible; <br/>- Copied in many places for [self-registration of runners](https://docs.gitlab.com/runner/install/kubernetes.html#required-configuration); <br/>- Reports of users storing tokens in unsecured locations; <br/>- Makes rotation of tokens costly. <br/>- In the case of a security event affecting the whole instance, rotating tokens requires users to update a table of projects/namespaces, which takes a significant amount of time. |
+| No provision for automatic expiration | Requires manual intervention to change token. Addressed in [#30942](https://gitlab.com/gitlab-org/gitlab/-/issues/30942). |
+| No permissions model | Used to register a runner for protected branches, and for any tags. In this case, the registration token has permission to do everything. Effectively, someone taking possession of a registration token could steal secrets or source code. |
+| No traceability | Given that the token is not created by a user, and is accessible to all administrators, there is no possibility to know the source of a leaked token. |
+| No historical records | When reset, the previous value of the registration token is not stored so there is no historical data to enable deeper auditing and inspection. |
+| Token stored in project/namespace model | Inadvertent disclosure of token is possible. |
+| Too many registered runners | It is too straightforward to register a new runner using a well-known registration token. |
+
+In light of these issues, it is important that we redesign the way in which we connect runners to the GitLab instance so that we can guarantee traceability, security, and performance.
+
+We call this new mechanism the "next GitLab Runner Token architecture".
+
+## Proposal
+
+The proposal addresses the issues of a _single token per scope_ and _token storage_
+by eliminating the need for a registration token. Runner creation happens
+in the GitLab Runners settings page for the given scope, in the context of the logged-in user,
+which provides traceability. The page provides instructions to configure the newly-created
+runner in supported environments.
+
+The runner configuration will be generated through a new `deploy` command, which will leverage
+the `/runners/verify` REST endpoint to ensure the validity of the authentication token.
+The remaining concerns become non-issues due to the elimination of the registration token.
+
+The configuration can be applied across many machines by reusing the same instructions.
+A unique system identifier will be generated automatically if a value is missing from
+the runner entry in the `config.toml` file. This allows differentiating systems sharing the same
+runner token (for example, in auto-scaling scenarios), and is crucial for the proper functioning of our
+long-polling mechanism when the same authentication token is shared across two or more runner managers.
+
+Given that the creation of runners involves user interaction, it should be possible
+to eventually lower the per-plan limit of CI runners that can be registered per scope.
+
+### Auto-scaling scenarios (for example Helm chart)
+
+In the existing model, a new runner is created whenever a new worker is required. This
+has led to many situations where runners are left behind and become stale.
+
+In the proposed model, a `ci_runners` table entry describes a configuration,
+which the runner could reuse across multiple machines. This allows differentiating the context in
+which the runner is being used. In situations where we must differentiate between runners
+that reuse the same configuration, we can use the unique system identifier to track all
+unique "runners" that are executed in the context of a single `ci_runners` model. This unique
+system identifier would be present in the Runner's `config.toml` configuration file and
+initially set when generating the new `[[runners]]` configuration by means of the `deploy` command.
+Legacy files that lack values for unique system identifiers will get rewritten automatically with new values.
+
+### Runner identification in CI jobs
+
+For users to identify the machine where the job was executed, the unique identifier will need to be visible in CI job contexts.
+As a first iteration, GitLab Runner will include the unique system identifier in the build logs,
+wherever it publishes the short token SHA.
+
+Given that the runner will potentially be reused with different unique system identifiers,
+we can store the unique system ID. This ensures the unique system ID maps to a GitLab Runner's `config.toml` entry with
+the runner token. The `ci_runner_machines` table would hold information about each unique runner machine,
+including when the runner last connected and what type of runner it was. The relevant fields
+will be moved from the `ci_runners` table.
+The `ci_builds_runner_session` (or `ci_builds` or `ci_builds_metadata`) will reference
+`ci_runner_machines`.
+We might consider a more efficient way to store `contacted_at` than updating the existing record.
+
+```sql
+CREATE TABLE ci_builds_runner_session (
+    ...
+    runner_machine_id bigint NOT NULL
+);
+
+CREATE TABLE ci_runner_machines (
+    id integer NOT NULL,
+    machine_id character varying UNIQUE NOT NULL,
+    contacted_at timestamp without time zone,
+    version character varying,
+    revision character varying,
+    platform character varying,
+    architecture character varying,
+    ip_address character varying,
+    executor_type smallint
+);
+```
+
+## Advantages
+
+- Easier for users to wrap their minds around the concept: instead of two types of tokens,
+  there is a single type of token - the per-runner authentication token. Having two types of tokens
+  frequently results in misunderstandings when discussing issues;
+- A runner can always be traced back to the user who created it, using the audit log;
+- The claims of a CI runner are known at creation time, and cannot be changed from the runner
+  (for example, changing the `access_level`/`protected` flag). Authenticated users
+  may however still edit these settings through the GitLab UI.
+
+## Details
+
+In the proposed approach, we create a distinct way to configure runners that is usable
+alongside the current registration token method during a transition period. The idea is
+to avoid having the Runner make API calls that allow it to leverage a single "god-like"
+token to register new runners.
+
+The new workflow looks as follows:
+
+  1. The user opens the Runners settings page;
+  1. The user fills in the details regarding the new desired runner, namely description,
+     tags, protected, locked, etc.;
+  1. The user clicks `Create`. That results in the following:
+
+     1. Creates a new runner in the `ci_runners` table (and corresponding authentication token);
+     1. Presents the user with instructions on how to configure this new runner on a machine,
+        with possibilities for different supported deployment scenarios (e.g. shell, `docker-compose`, Helm chart, etc.).
+        This information contains a token which will only be available to the user once, and the UI
+        will make it clear to the user that the value will not be shown again, as registering the same runner multiple times
+        is discouraged (though not impossible).
+
+  1. The user copies and pastes the instructions for the intended deployment scenario (a `deploy` command), leading to the following actions:
+
+     1. Upon executing the new `gitlab-runner deploy` command in the instructions, `gitlab-runner` will perform
+        a call to the `POST /runners/verify` endpoint with the given runner token;
+     1. If the `POST /runners/verify` GitLab endpoint validates the token, the `config.toml` file will be populated with the configuration.
+
+  The `gitlab-runner deploy` command will also accept executor-specific arguments
+  currently present in the `register` command.
+
+As part of the transition period, we will provide admins and top-level group owners with an instance/group-level setting to disable
+the legacy registration token functionality and enforce using only the new workflow.
+Any attempt by a `gitlab-runner register` command to hit the `POST /runners` endpoint to register a new runner
+will result in an `HTTP 410 - Gone` status code. The instance setting is inherited by the groups,
+which means that if the legacy registration method is disabled at the instance level, the descendant groups/projects will also mandatorily
+prevent the legacy registration method.
+
+The registration token workflow is to be deprecated (with a deprecation notice printed by the `gitlab-runner register` command)
+and removed at a future major release after the concept is proven stable and customers have migrated to the new workflow.
+
+### Handling of legacy runners
+
+Legacy versions of GitLab Runner will not send the unique system identifier in their requests, and we
+will not change logic in Workhorse to handle unique system IDs. This can be improved upon in the
+future once the legacy registration system is removed, and runners have been upgraded to newer
+versions.
+
+Not using the unique system ID means that all connected runners with the same token will be
+notified, instead of just the runner matching the exact system identifier. While not ideal, this is
+not an issue per se.
+
+### Helm chart
+
+The `runnerRegistrationToken` entry in the [`values.yaml` file](https://gitlab.com/gitlab-org/charts/gitlab-runner/-/blob/a70bc29a903b79d5675bb0c45d981adf8b7a8659/values.yaml#L52)
+will be retired. The `runnerRegistrationToken` entry will be replaced by the existing `runnerToken` value, which will be passed
+to the new `gitlab-runner deploy` command in [`configmap.yaml`](https://gitlab.com/gitlab-org/charts/gitlab-runner/-/blob/a70bc29a903b79d5675bb0c45d981adf8b7a8659/templates/configmap.yaml#L116).
+
+### Runner creation through API
+
+Automated runner creation may be allowed, although always through authenticated API calls -
+using personal access tokens, for example - such that every runner is associated with an owner.
+
+## Implementation plan
+
+| Component | Milestone | Changes |
+|------------------|-----------|---------|
+| GitLab Rails app | `15.x` (latest at `15.6`) | Deprecate `POST /api/v4/runners` endpoint for `16.0`. This hinges on a [proposal](https://gitlab.com/gitlab-org/gitlab/-/issues/373774) to allow deprecating REST API endpoints for security reasons. |
+| GitLab Runner | `15.x` (latest at `15.8`) | Add deprecation notice for `register` command for `16.0`. |
+| GitLab Runner | `15.x` | Ensure all runner entries in `config.toml` have unique system identifier values assigned. Log new system ID values with `INFO` level as they get created. |
+| GitLab Runner | `15.x` | Start additionally logging unique system ID anywhere we log the runner short SHA. |
+| GitLab Rails app | `15.x` | Create database migrations to add settings to `application_settings` and `namespace_settings` tables. |
+| GitLab Runner | `15.x` | Start sending `unique_id` value in `POST /jobs/request` request and other follow-up requests that require identifying the unique system. |
+| GitLab Runner | `15.x` | Implement new user-authenticated API (REST and GraphQL) to create a new runner. |
+| GitLab Rails app | `15.x` | Implement UI to create new runner. |
+| GitLab Runner | `16.0` | Remove `register` command and support for `POST /runners` endpoint. |
+| GitLab Rails app | `16.0` | Remove legacy UI showing registration with a registration token. |
+| GitLab Rails app | `16.0` | Create database migrations to remove settings from `application_settings` and `namespace_settings` tables. |
+| GitLab Rails app | `16.0` | Make [`POST /api/v4/runners` endpoint](../../../api/runners.md#register-a-new-runner-deprecated) permanently return `410 Gone`. A future v5 version of the API would return `404 Not Found`. |
+| GitLab Rails app | `16.0` | Start refusing job requests that don't include a unique ID. |
+
+## Status
+
+Status: RFC.
+
+## Who
+
+Proposal:
+
+<!-- vale gitlab.Spelling = NO -->
+
+| Role | Who |
+|------------------------------|--------------------------------------------------|
+| Authors | Kamil Trzciński, Tomasz Maczukin, Pedro Pombeiro |
+| Architecture Evolution Coach | Kamil Trzciński |
+| Engineering Leader | Elliot Rushton, Cheryl Li |
+| Product Manager | Darren Eastman, Jackie Porter |
+| Domain Expert / Runner | Tomasz Maczukin |
+
+DRIs:
+
+| Role | Who |
+|------------------------------|---------------------------------|
+| Leadership | Elliot Rushton |
+| Product | Darren Eastman |
+| Engineering | Tomasz Maczukin, Pedro Pombeiro |
+
+Domain experts:
+
+| Area | Who |
+|------------------------------|-----------------|
+| Domain Expert / Runner | Tomasz Maczukin |
+
+<!-- vale gitlab.Spelling = YES -->