doc/development/agent/gitops.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148

---
stage: Configure
group: Configure
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
---

# GitOps with the Kubernetes Agent **(PREMIUM SELF)**

The [GitLab Kubernetes Agent](../../user/clusters/agent/index.md) supports the
[pull-based version](https://www.gitops.tech/#pull-based-deployments) of
[GitOps](https://www.gitops.tech/). To be useful, the feature must be able to perform these tasks:

- Connect one or more Kubernetes clusters to a GitLab project or group.
- Synchronize cluster-wide state from a Git repository.
- Synchronize namespace-scoped state from a Git repository.
- Control the following settings:

  - The kinds of objects an agent can manage.
  - Enabling the namespaced mode of operation for managing objects only in a specific namespace.
  - Enabling the non-namespaced mode of operation for managing objects in any namespace, and
    managing non-namespaced objects.

- Synchronize state from one or more Git repositories into a cluster.
- Configure multiple agents running in different clusters to synchronize state
  from the same repository.

## GitOps architecture

In this architecture, the Kubernetes cluster (`agentk`) periodically fetches
configuration from (`kas`), spawning a goroutine for each configured GitOps
repository. Each goroutine makes a streaming `GetObjectsToSynchronize()` gRPC call.
`kas` accepts these requests, then checks if this agent is authorized to access
this GitLab repository. If authorized, `kas` polls Gitaly for repository updates
and sends the latest manifests to the agent.

Before each poll, `kas` verifies with GitLab that the agent's token is still valid.
When `agentk` receives an updated manifest, it performs a synchronization using
[`gitops-engine`](https://github.com/argoproj/gitops-engine).

If a repository is removed from the list, `agentk` stops the `GetObjectsToSynchronize()`
calls to that repository.

```mermaid
graph TB
  agentk -- fetch configuration --> kas
  agentk -- fetch GitOps manifests --> kas

  subgraph "GitLab"
  kas[kas]
  GitLabRoR[GitLab RoR]
  Gitaly[Gitaly]
  kas -- poll GitOps repositories --> Gitaly
  kas -- authZ for agentk --> GitLabRoR
  kas -- fetch configuration --> Gitaly
  end

  subgraph "Kubernetes cluster"
  agentk[agentk]
  end
```

## Architecture considered but not implemented

As part of the implementation process, this architecture was considered, but ultimately
not implemented.

In this architecture, `agentk` periodically fetches configuration from `kas`. For each
configured GitOps repository, it spawns a goroutine. Each goroutine then spawns a
copy of [`git-sync`](https://github.com/kubernetes/git-sync). It polls a particular
repository and invokes a corresponding webhook on `agentk` when it changes. When that
happens, `agentk` performs a synchronization using
[`gitops-engine`](https://github.com/argoproj/gitops-engine).

For repositories no longer in the list, `agentk` stops corresponding goroutines
and `git-sync` copies, also deleting their cloned repositories from disk:

```mermaid
graph TB
  agentk -- fetch configuration --> kas
  git-sync -- poll GitOps repositories --> GitLabRoR

  subgraph "GitLab"
  kas[kas]
  GitLabRoR[GitLab RoR]
  kas -- authZ for agentk --> GitLabRoR
  kas -- fetch configuration --> Gitaly[Gitaly]
  end

  subgraph "Kubernetes cluster"
  agentk[agentk]
  git-sync[git-sync]
  agentk -- control --> git-sync
  git-sync -- notify about changes --> agentk
  end
```

## Comparing implemented and non-implemented architectures

Both architectures attempt to answer the same question: how to grant an agent
access to a non-public repository?

In the **implemented** architecture:

- Favorable: Fewer moving parts, as `git-sync` and `git` are not used, making this
  design more reliable.
- Favorable: Uses existing connectivity and authentication mechanisms are used (gRPC + `agentk` token).
- Favorable: No polling through external infrastructure. Saves traffic and avoids
  noise in access logs.

In the **unimplemented** architecture:

- Favorable: `agentk` uses `git-sync` to access repositories with standard protocols
  (either HTTPS, or SSH and Git) with accepted authentication and authorization methods.

  - Unfavorable: The user must put credentials into a `secret`. GitLab doesn't have
    a mechanism for per-repository tokens for robots.
  - Unfavorable: Rotating all credentials is more work than rotating a single `agentk` token.

- Unfavorable: A dependency on an external component (`git-sync`) that can be avoided.
- Unfavorable: More network traffic and connections than the implemented design

### Ideas considered for the unimplemented design

As part of the design process, these ideas were considered, and discarded:

- Running `git-sync` and `gitops-engine` as part of `kas`.

  - Favorable: More code and infrastructure under our control for GitLab.com
  - Unfavorable: Running an arbitrary number of `git-sync` processes would require
    an unbounded amount of RAM and disk space.
  - Unfavorable: Unclear which `kas` replica is responsible for which agent and
    repository synchronization. If done as part of `agentk`, leader election can be
    done using [client-go](https://pkg.go.dev/k8s.io/client-go/tools/leaderelection?tab=doc).

- Running `git-sync` and a "`gitops-engine` driver" helper program as a separate
  Kubernetes `Deployment`.

  - Favorable: Better isolation and higher resiliency. For example, if the node
    with `agentk` dies, not all synchronization stops.
  - Favorable: Each deployment has its own memory and disk limits.
  - Favorable: Per-repository synchronization identity (distinct `ServiceAccount`)
    can be implemented.
  - Unfavorable: Time consuming to implement properly:

    - Each `Deployment` needs CRUD (create, update, and delete) permissions.
    - Users may want to customize a `Deployment`, or add and remove satellite objects
      like `PodDisruptionBudget`, `HorizontalPodAutoscaler`, and `PodSecurityPolicy`.
    - Metrics, monitoring, logs for the `Deployment`.