author: GitLab Bot <gitlab-bot@gitlab.com> 2020-12-17 11:59:07 +0000
committer: GitLab Bot <gitlab-bot@gitlab.com> 2020-12-17 11:59:07 +0000
commit: 8b573c94895dc0ac0e1d9d59cf3e8745e8b539ca
tree: 544930fb309b30317ae9797a9683768705d664c4 /doc/architecture
parent: 4b1de649d0168371549608993deac953eb692019
Add latest changes from gitlab-org/gitlab@13-7-stable-ee (v13.7.0-rc42)
Diffstat (limited to 'doc/architecture')
-rw-r--r-- doc/architecture/blueprints/cloud_native_build_logs/index.md | 16
-rw-r--r-- doc/architecture/blueprints/cloud_native_gitlab_pages/index.md | 10
-rw-r--r-- doc/architecture/blueprints/feature_flags_development/index.md | 26
-rw-r--r-- doc/architecture/blueprints/gitlab_to_kubernetes_communication/index.md | 164
-rw-r--r-- doc/architecture/blueprints/image_resizing/index.md | 6
-rw-r--r-- doc/architecture/index.md | 2
6 files changed, 208 insertions, 16 deletions
diff --git a/doc/architecture/blueprints/cloud_native_build_logs/index.md b/doc/architecture/blueprints/cloud_native_build_logs/index.md
index f901a724653..3aba10fc758 100644
--- a/doc/architecture/blueprints/cloud_native_build_logs/index.md
+++ b/doc/architecture/blueprints/cloud_native_build_logs/index.md
@@ -1,7 +1,7 @@
---
stage: none
group: unassigned
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
comments: false
description: 'Next iteration of build logs architecture at GitLab'
---
@@ -24,7 +24,7 @@ output to a file on a disk. This is simple, but this mechanism depends on
shared local storage - the same file needs to be available on every GitLab web
node machine, because GitLab Runner might connect to a different one every time
it performs an API request. Sidekiq also needs access to the file because when
-a job is complete, a trace file contents will be sent to the object store.
+a job is complete, the trace file contents are sent to the object store.
## New architecture
@@ -109,16 +109,22 @@ of complexity, maintenance cost and enormous, negative impact on availability.
1. ✓ Evaluate performance and edge-cases, iterate to improve the new architecture
1. ✓ Design cloud native build logs correctness verification mechanisms
1. ✓ Build observability mechanisms around performance and correctness
-1. Rollout the feature into production environment incrementally
+1. ✓ Rollout the feature into production environment incrementally
The work needed to make the new architecture production ready and enabled on
-GitLab.com is being tracked in [Cloud Native Build Logs on
+GitLab.com had been tracked in [Cloud Native Build Logs on
GitLab.com](https://gitlab.com/groups/gitlab-org/-/epics/4275) epic.
Enabling this feature on GitLab.com is a subtask of [making the new
architecture generally
available](https://gitlab.com/groups/gitlab-org/-/epics/3791) for everyone.
+## Status
+
+This change has been implemented and enabled on GitLab.com.
+
+We are working on [an epic to make this feature more resilient and observable](https://gitlab.com/groups/gitlab-org/-/epics/4860).
+
## Who
Proposal:
@@ -137,7 +143,7 @@ DRIs:
| Role | Who
|------------------------------|------------------------|
-| Product | Jason Yavorska |
+| Product | Thao Yeager |
| Leadership | Darby Frey |
| Engineering | Grzegorz Bizon |
diff --git a/doc/architecture/blueprints/cloud_native_gitlab_pages/index.md b/doc/architecture/blueprints/cloud_native_gitlab_pages/index.md
index 27d2f1362e5..95ffcdd0b39 100644
--- a/doc/architecture/blueprints/cloud_native_gitlab_pages/index.md
+++ b/doc/architecture/blueprints/cloud_native_gitlab_pages/index.md
@@ -1,7 +1,7 @@
---
stage: none
group: unassigned
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
comments: false
description: 'Making GitLab Pages a Cloud Native application - architecture blueprint.'
---
@@ -72,9 +72,9 @@ of complexity, maintenance cost and enormous, negative impact on availability.
## New GitLab Pages Architecture
-- GitLab Pages is going to source domains' configuration from GitLab's internal
+- GitLab Pages sources domains' configuration from the GitLab internal
API, instead of reading `config.json` files from a local shared storage.
-- GitLab Pages is going to serve static content from Object Storage.
+- GitLab Pages serves static content from Object Storage.
```mermaid
graph TD
@@ -90,10 +90,10 @@ too.
## Iterations
-1. ✓ Redesign GitLab Pages configuration source to use GitLab's API
+1. ✓ Redesign GitLab Pages configuration source to use the GitLab API
1. ✓ Evaluate performance and build reliable caching mechanisms
1. ✓ Incrementally rollout the new source on GitLab.com
-1. ✓ Make GitLab Pages API domains config source enabled by default
+1. ✓ Make GitLab Pages API domains configuration source enabled by default
1. Enable experimentation with different servings through feature flags
1. Triangulate object store serving design through meaningful experiments
1. Design pages migration mechanisms that can work incrementally
diff --git a/doc/architecture/blueprints/feature_flags_development/index.md b/doc/architecture/blueprints/feature_flags_development/index.md
index 76fb5f5c7db..c92b4113465 100644
--- a/doc/architecture/blueprints/feature_flags_development/index.md
+++ b/doc/architecture/blueprints/feature_flags_development/index.md
@@ -1,7 +1,7 @@
---
stage: none
group: unassigned
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
comments: false
description: 'Internal usage of Feature Flags for GitLab development'
---
@@ -119,7 +119,9 @@ This work is being done as part of dedicated epic: [Improve internal usage of
Feature Flags](https://gitlab.com/groups/gitlab-org/-/epics/3551). This epic
describes a meta reasons for making these changes.
-## Who
+## [Who](#who)
+
+### Blueprint
Proposal:
@@ -140,4 +142,24 @@ DRIs:
| Leadership | Craig Gomes |
| Engineering | Kamil Trzciński |
+### [Stakeholders](#stakeholders)
+
+| Role | Person | Title
+|--------------------|-----------------------|--------------------------------------------------------------------|
+| Executive Sponsor | Christopher Lefelhocz | Senior Director of Development |
+| Facilitator | Darby Frey | Senior Engineering Manager, Verify |
+| DRI / Leadership | Craig Gomes | Backend Engineering Manager, Memory and Database |
+| DRI / Engineering | Kamil Trzciński | Distinguished Engineer, Ops and Enablement |
+| DRI / Product | Kenny Johnston | Senior Director of Product Management, Ops |
+| Functional Lead | Ricky Wiens | Backend Engineering Manager, Verify:Testing |
+| Functional Lead | Anthony Sandoval | Engineering Manager, Reliability |
+| Functional Lead | James Heimbuck | Senior Product Manager, Verify:Testing |
+| Member | Grzegorz Bizon | Staff Backend Engineer, Verify |
+| Member | Michelle Gill | Engineering Manager, Create:Source Code |
+| Member | Wayne Haber | Director of Engineering, Threat Management |
+| Member | Doug Stull | Senior Fullstack Engineer, Growth:Expansion |
+| Member | Andrew Fontaine | Senior Frontend Engineer, Release |
+| Member | Rémy Coutable | Staff Backend Engineer, Engineering Productivity |
+| Member | Marin Jankovski | Senior Engineering Manager, Infrastructure, Delivery & Scalability |
+
<!-- vale gitlab.Spelling = YES -->
diff --git a/doc/architecture/blueprints/gitlab_to_kubernetes_communication/index.md b/doc/architecture/blueprints/gitlab_to_kubernetes_communication/index.md
new file mode 100644
index 00000000000..6c27ecca284
--- /dev/null
+++ b/doc/architecture/blueprints/gitlab_to_kubernetes_communication/index.md
@@ -0,0 +1,164 @@
+---
+stage: configure
+group: configure
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
+comments: false
+description: 'GitLab to Kubernetes communication'
+---
+
+# GitLab to Kubernetes communication
+
+The goal of this document is to define how GitLab can communicate with Kubernetes
+and in-cluster services through the GitLab Kubernetes Agent.
+
+## Challenges
+
+### Lack of network connectivity
+
+For various features that exist today, GitLab communicates with Kubernetes by directly
+or indirectly calling its API endpoints. This works well, as long as a network
+path from GitLab to the cluster exists, which isn't always the case:
+
+- GitLab.com and a self-managed cluster, where the cluster is not exposed to the Internet.
+- GitLab.com and a cloud-vendor managed cluster, where the cluster is not exposed to the Internet.
+- Self-managed GitLab and a cloud-vendor managed cluster, where the cluster is not
+ exposed to the Internet and there is no private peering between the cloud network
+ and the customer's network.
+
+ This last item is the hardest to address, as something must give to create a network
+ path. This feature gives the customer an extra option (exposing the `gitlab-kas` domain but
+ not the whole GitLab) in addition to the existing options (peering the networks,
+ or exposing one of the two sides).
+
+Even if technically possible, it's almost always undesirable to expose a Kubernetes
+cluster's API to the Internet for security reasons. As a result, our customers
+are reluctant to do so, and are faced with a choice of security versus the features
+GitLab provides for connected clusters.
+
+This choice is true not only for Kubernetes' API, but for all APIs exposed by services
+running on a customer's cluster that GitLab may need to access. For example,
+Prometheus running in a cluster must be exposed for the GitLab integration to access it.
+
+### Cluster-admin permissions
+
+Both current integrations - building your own cluster (certificate-based) and GitLab-managed
+cluster in a cloud - require granting full `cluster-admin` access to GitLab. Credentials
+are stored on the GitLab side and this is yet another security concern for our customers.
+
+For more discussion on these issues, read
+[issue #212810](https://gitlab.com/gitlab-org/gitlab/-/issues/212810).
+
+## GitLab Kubernetes Agent epic
+
+To address these challenges and provide some new features, the Configure group
+is building an active in-cluster component that inverts the
+direction of communication:
+
+1. The customer installs an agent into their cluster.
+1. The agent connects to GitLab.com or their self-managed GitLab instance,
+ receiving commands from it.
+
+The customer does not need to provide any credentials to GitLab, and
+is in full control of what permissions the agent has.
+
+For more information, visit the
+[GitLab Kubernetes Agent repository](https://gitlab.com/gitlab-org/cluster-integration/gitlab-agent) or
+[the epic](https://gitlab.com/groups/gitlab-org/-/epics/3329).
+
+### Request routing
+
+Agents connect to the server-side component called GitLab Kubernetes Agent Server
+(`gitlab-kas`) and keep an open connection that waits for commands. The
+difficulty with the approach is in routing requests from GitLab to the correct agent.
+Each cluster may contain multiple logical agents, and each may be running as multiple
+replicas (`Pod`s), connected to an arbitrary `gitlab-kas` instance.
+
+Existing and new features require real-time access to the APIs of the cluster
+and (optionally) APIs of components, running in the cluster. As a result, it's difficult to pass
+the information back and forth using the more traditional polling approach.
+
+A good example to illustrate the real-time need is Prometheus integration.
+If we wanted to draw real-time graphs, we would need direct access to the Prometheus API
+to make queries and quickly return results. `gitlab-kas` could expose the Prometheus API
+to GitLab, and transparently route traffic to one of the correct agents connected
+at the moment. The agent then would stream the request to Prometheus and stream the response back.
+
+## Proposal
+
+Implement request routing in `gitlab-kas`. Encapsulate and hide all related
+complexity from the main application by providing a clean API to work with Kubernetes
+and the agents.
+
+The above does not necessarily mean proxying Kubernetes' API directly, but that
+is possible should we need it.
+
+What APIs `gitlab-kas` provides depends on the features developed, but first
+we must solve the request routing problem. It blocks any and all features
+that require direct communication with agents, Kubernetes or in-cluster services.
+
+Detailed implementation proposal with all technical details is in
+[`kas_request_routing.md`](https://gitlab.com/gitlab-org/cluster-integration/gitlab-agent/-/blob/master/doc/kas_request_routing.md).
+
+```mermaid
+flowchart LR
+ subgraph "Kubernetes 1"
+ agentk1p1["agentk 1, Pod1"]
+ agentk1p2["agentk 1, Pod2"]
+ end
+
+ subgraph "Kubernetes 2"
+ agentk2p1["agentk 2, Pod1"]
+ end
+
+ subgraph "Kubernetes 3"
+ agentk3p1["agentk 3, Pod1"]
+ end
+
+ subgraph kas
+ kas1["kas 1"]
+ kas2["kas 2"]
+ kas3["kas 3"]
+ end
+
+ GitLab["GitLab Rails"]
+ Redis
+
+ GitLab -- "gRPC to any kas" --> kas
+ kas1 -- register connected agents --> Redis
+ kas2 -- register connected agents --> Redis
+ kas1 -- lookup agent --> Redis
+
+ agentk1p1 -- "gRPC" --> kas1
+ agentk1p2 -- "gRPC" --> kas2
+ agentk2p1 -- "gRPC" --> kas1
+ agentk3p1 -- "gRPC" --> kas2
+```
+
+### Iterations
+
+Iterations are tracked in [the dedicated epic](https://gitlab.com/groups/gitlab-org/-/epics/4591).
+
+## Who
+
+Proposal:
+
+<!-- vale gitlab.Spelling = NO -->
+
+| Role | Who
+|------------------------------|-------------------------|
+| Author | Mikhail Mazurskiy |
+| Architecture Evolution Coach | Andrew Newdigate |
+| Engineering Leader | Nicholas Klick |
+| Domain Expert | Thong Kuah |
+| Domain Expert | Graeme Gillies |
+| Security Expert | Vitor Meireles De Sousa |
+
+DRIs:
+
+| Role | Who
+|------------------------------|------------------------|
+| Product Lead | Viktor Nagy |
+| Engineering Leader | Nicholas Klick |
+| Domain Expert | Mikhail Mazurskiy |
+
+<!-- vale gitlab.Spelling = YES -->
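
Note on the request routing in the new `gitlab_to_kubernetes_communication` blueprint above: its diagram shows each `kas` instance registering its connected agents in Redis, and a `kas` that receives a request from GitLab looking up the agent there. The sketch below illustrates that idea only. It is not the implementation described in `kas_request_routing.md`: the class `AgentRegistry`, the functions `route_request` and `build_example_registry`, and the in-memory dictionary standing in for Redis are all hypothetical names chosen for this example.

```python
import random
from collections import defaultdict


class AgentRegistry:
    """In-memory stand-in for the shared Redis registry (hypothetical)."""

    def __init__(self):
        # Maps an agent ID to the set of (kas instance, Pod) connections.
        self._connections = defaultdict(set)

    def register(self, kas_id, agent_id, pod_name):
        """Called by a kas instance when an agentk Pod connects to it."""
        self._connections[agent_id].add((kas_id, pod_name))

    def unregister(self, kas_id, agent_id, pod_name):
        """Called by a kas instance when an agentk Pod disconnects."""
        self._connections[agent_id].discard((kas_id, pod_name))

    def lookup(self, agent_id):
        """Return all live connections for an agent, in a stable order."""
        return sorted(self._connections.get(agent_id, set()))


def route_request(registry, agent_id, rng=random):
    """Pick one connected replica of the agent to receive the request."""
    connections = registry.lookup(agent_id)
    if not connections:
        raise LookupError(f"no connected agentk for agent {agent_id!r}")
    return rng.choice(connections)


def build_example_registry():
    """Reproduce the connections drawn in the blueprint's diagram."""
    registry = AgentRegistry()
    registry.register("kas 1", "agentk 1", "Pod1")
    registry.register("kas 2", "agentk 1", "Pod2")
    registry.register("kas 1", "agentk 2", "Pod1")
    registry.register("kas 2", "agentk 3", "Pod1")
    return registry


if __name__ == "__main__":
    registry = build_example_registry()
    kas_id, pod_name = route_request(registry, "agentk 1")
    print(f"forward request via {kas_id} to {pod_name}")
```

In this sketch any replica of the agent is an acceptable target, which matches the blueprint's wording that traffic is routed "to one of the correct agents connected at the moment". A real registry would also need to expire entries for `kas` instances that crash without unregistering, which is left out here.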
diff --git a/doc/architecture/blueprints/image_resizing/index.md b/doc/architecture/blueprints/image_resizing/index.md
index 47e2ad24960..9e5c45a715d 100644
--- a/doc/architecture/blueprints/image_resizing/index.md
+++ b/doc/architecture/blueprints/image_resizing/index.md
@@ -1,18 +1,18 @@
---
stage: none
group: unassigned
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
comments: false
description: 'Image Resizing'
---
# Image resizing for avatars and content images
-Currently, we are showing all uploaded images 1:1, which is of course not ideal. To improve performance greatly we will add image resizing to the backend. There are two main areas of image resizing to consider; avatars and content images. The MVC for this implementation will focus on Avatars. Avatars requests consist of approximately 70% of total image requests. There is an identified set of sizes we intend to support which makes the scope of this first MVC very narrow. Content image resizing has many more considerations for size and features. It is entirely possible that we have two separate development efforts with the same goal of increasing performance via image resizing.
+Currently, we are showing all uploaded images 1:1, which is of course not ideal. To improve performance greatly, add image resizing to the backend. There are two main areas of image resizing to consider; avatars and content images. The MVC for this implementation focuses on Avatars. Avatars requests consist of approximately 70% of total image requests. There is an identified set of sizes we intend to support which makes the scope of this first MVC very narrow. Content image resizing has many more considerations for size and features. It is entirely possible that we have two separate development efforts with the same goal of increasing performance via image resizing.
## MVC Avatar Resizing
-We will implement a dynamic image resizing solution. This means image should be resized and optimized on the fly so that if we define new targeted sizes later we can add them dynamically. This would mean a huge improvement in performance as some of the measurements suggest that we can save up to 95% of our current load size. Our initial investigations indicate that we have uploaded approximately 1.65 million avatars totaling approximately 80GB in size and averaging approximately 48kb each. Early measurements indicate we can reduce the most common avatar dimensions to between 1-3kb in size, netting us a greater than 90% size reduction. For the MVC we will not consider application level caching and rely purely on HTTP based caches as implemented in CDNs and browsers, but might revisit this decision later on. To easily mitigate performance issues with avatar resizing, especially in the case of self managed, an operations feature flag will be implemented to disable dynamic image resizing.
+When implementing a dynamic image resizing solution, images should be resized and optimized on the fly so that if we define new targeted sizes later we can add them dynamically. This would mean a huge improvement in performance as some of the measurements suggest that we can save up to 95% of our current load size. Our initial investigations indicate that we have uploaded approximately 1.65 million avatars totaling approximately 80GB in size and averaging approximately 48kb each. Early measurements indicate we can reduce the most common avatar dimensions to between 1-3kb in size, netting us a greater than 90% size reduction. For the MVC we don't consider application level caching and rely purely on HTTP based caches as implemented in CDNs and browsers, but might revisit this decision later on. To easily mitigate performance issues with avatar resizing, especially in the case of self managed, an operations feature flag is implemented to disable dynamic image resizing.
```mermaid
sequenceDiagram
diff --git a/doc/architecture/index.md b/doc/architecture/index.md
index 0cac646ea83..5dee2fd8a80 100644
--- a/doc/architecture/index.md
+++ b/doc/architecture/index.md
@@ -1,7 +1,7 @@
---
stage: none
group: unassigned
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
comments: false
description: 'Architecture Practice at GitLab'
---