Diffstat (limited to 'doc/architecture')

 doc/architecture/blueprints/ci_data_decay/index.md                     | 35
 doc/architecture/blueprints/consolidating_groups_and_projects/index.md |  2
 doc/architecture/blueprints/database_testing/index.md                  |  4
 doc/architecture/blueprints/object_storage/index.md                    |  9
 doc/architecture/blueprints/runner_scaling/index.md                    | 42
 5 files changed, 67 insertions(+), 25 deletions(-)
diff --git a/doc/architecture/blueprints/ci_data_decay/index.md b/doc/architecture/blueprints/ci_data_decay/index.md
index 155c781b04a..cbd9f5dea7a 100644
--- a/doc/architecture/blueprints/ci_data_decay/index.md
+++ b/doc/architecture/blueprints/ci_data_decay/index.md
@@ -74,13 +74,13 @@ we might want to follow these three tracks described below.
<!-- markdownlint-disable MD029 -->
-1. Partition builds queuing tables
-2. Archive CI/CD data into partitioned database schema
-3. Migrate archived builds metadata out of primary database
+1. Partition CI/CD builds queuing database tables
+2. Partition CI/CD pipelines database tables
+3. Reduce the rate of builds metadata table growth
<!-- markdownlint-enable MD029 -->
-### Migrate archived builds metadata out of primary database
+### Reduce the rate of builds metadata table growth
Once a build (or a pipeline) gets archived, it is no longer possible to resume
pipeline processing in such a pipeline. It means that all the metadata we store
@@ -98,15 +98,16 @@ be able to use de-duplication of metadata entries and other normalization
strategies to consume less storage while retaining the ability to query this
dataset. Technical evaluation will be required to find the best solution here.
-Epic: [Migrate archived builds metadata out of primary database](https://gitlab.com/groups/gitlab-org/-/epics/7216).
+Epic: [Reduce the rate of builds metadata table growth](https://gitlab.com/groups/gitlab-org/-/epics/7434).
-### Archive CI/CD data into partitioned database schema
+### Partition CI/CD pipelines database tables
-After we move CI/CD metadata to a different store, the problem of having
-billions of rows describing pipelines, builds and artifacts, remains. We still
-need to keep reference to the metadata we store in object storage and we still
-do need to be able to retrieve this information reliably in bulk (or search
-through it).
+After we move CI/CD metadata to a different store, or reduce the rate of
+metadata growth in a different way, the problem of having billions of rows
+describing pipelines, builds and artifacts remains. We still need to keep a
+reference to the metadata we might store in object storage and we still need
+to be able to retrieve this information reliably in bulk (or search through
+it).
It means that by moving data to object storage we might not be able to reduce
the number of rows in CI/CD tables. Moving data to object storage should help
@@ -132,9 +133,9 @@ partitioning on the application level.
Partitioning rarely accessed data should also follow the policy defined for
builds archival, to make it consistent and reliable.
-Epic: [Archive CI/CD data into partitioned database schema](https://gitlab.com/groups/gitlab-org/-/epics/5417).
+Epic: [Partition CI/CD pipelines database tables](https://gitlab.com/groups/gitlab-org/-/epics/5417).
-### Partition builds queuing tables
+### Partition CI/CD builds queuing database tables
While working on the [CI/CD Scale](../ci_scale/index.md) blueprint, we have
introduced a [new architecture for queuing CI/CD builds](https://gitlab.com/groups/gitlab-org/-/epics/5909#note_680407908)
@@ -156,7 +157,7 @@ for builds archival. Instead we should leverage a long-standing policy saying
that builds created more than 24 hours ago need to be removed from the queue.
This business rule has been present in the product since the inception of GitLab CI.
-Epic: [Partition builds queuing tables](https://gitlab.com/gitlab-org/gitlab/-/issues/347027).
+Epic: [Partition CI/CD builds queuing database tables](https://gitlab.com/groups/gitlab-org/-/epics/7438).
## Principles
@@ -215,9 +216,9 @@ pipelines data, although a user provided partition identifier may be required.
All three tracks can be worked on in parallel:
-1. [Migrate archived build metadata to object storage](https://gitlab.com/groups/gitlab-org/-/epics/7216).
-1. [Partition CI/CD data that have been archived](https://gitlab.com/groups/gitlab-org/-/epics/5417).
-1. [Partition CI/CD queuing tables using list partitioning](https://gitlab.com/gitlab-org/gitlab/-/issues/347027)
+1. [Reduce the rate of builds metadata table growth](https://gitlab.com/groups/gitlab-org/-/epics/7434).
+1. [Partition CI/CD pipelines database tables](https://gitlab.com/groups/gitlab-org/-/epics/5417).
+1. [Partition CI/CD queuing tables using list partitioning](https://gitlab.com/groups/gitlab-org/-/epics/7438)
## Status
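
The list-partitioning approach referenced in the hunks above can be made concrete with a minimal sketch. Everything below is hypothetical: the `ci_pending_builds_p` table, its columns, and the partition values are illustrative placeholders, not the blueprint's final schema, and the Go wrapper merely executes the DDL against a local PostgreSQL instance.

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	// Placeholder connection string; not GitLab's actual configuration.
	db, err := sql.Open("postgres", "postgres://localhost/gitlabhq_development?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	ddl := []string{
		// Parent queuing table, list-partitioned by a partition_id column.
		`CREATE TABLE ci_pending_builds_p (
			id bigint NOT NULL,
			build_id bigint NOT NULL,
			partition_id bigint NOT NULL,
			created_at timestamptz NOT NULL DEFAULT now(),
			PRIMARY KEY (id, partition_id)
		) PARTITION BY LIST (partition_id)`,
		// One partition per logical slice. A slice whose builds exceed
		// the 24-hour queuing policy can be detached and dropped as a
		// whole, instead of deleting rows one by one.
		`CREATE TABLE ci_pending_builds_p_100
			PARTITION OF ci_pending_builds_p FOR VALUES IN (100)`,
	}
	for _, stmt := range ddl {
		if _, err := db.Exec(stmt); err != nil {
			log.Fatal(err)
		}
	}
}
```

The appeal of list partitioning for the queuing tables is the detach-and-drop path: enforcing the 24-hour queuing policy becomes a cheap metadata operation on a whole partition rather than a bulk `DELETE` of individual rows.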
diff --git a/doc/architecture/blueprints/consolidating_groups_and_projects/index.md b/doc/architecture/blueprints/consolidating_groups_and_projects/index.md
index 345160dc77f..6040ac1e50f 100644
--- a/doc/architecture/blueprints/consolidating_groups_and_projects/index.md
+++ b/doc/architecture/blueprints/consolidating_groups_and_projects/index.md
@@ -131,7 +131,7 @@ epic.
The initial iteration will provide a framework to house features under `Namespaces`. Stage groups will eventually need to migrate their own features and functionality over to `Namespaces`. This may impact these features in unexpected ways. Therefore, to minimize UX debt and maintain product consistency, stage groups will have to consider a number of factors when migrating their features over to `Namespaces`:
-1. **Conceptual model**: What are the current and future state conceptual models of these features ([see object modeling for designers](https://hpadkisson.medium.com/object-modeling-for-designers-an-introduction-7871bdcf8baf))? These should be documented in Pajamas (example: [Merge Requests](https://design.gitlab.com/objects/merge-request)).
+1. **Conceptual model**: What are the current and future state conceptual models of these features ([see object modeling for designers](https://hpadkisson.medium.com/object-modeling-for-designers-an-introduction-7871bdcf8baf))? These should be documented in Pajamas (example: [merge requests](https://design.gitlab.com/objects/merge-request)).
1. **Merge conflicts**: What inconsistencies are there across project, group, and admin levels? How might these be addressed? For an example of how we rationalized this for labels, please see [this issue](https://gitlab.com/gitlab-org/gitlab/-/issues/338820).
1. **Inheritance & information flow**: How is information inherited across our container hierarchy currently? How might this be impacted if complying with the new [inheritance behavior](https://gitlab.com/gitlab-org/gitlab/-/issues/343316) framework?
1. **Settings**: Where can settings for this feature be found currently? How will these be impacted by `Namespaces`?
diff --git a/doc/architecture/blueprints/database_testing/index.md b/doc/architecture/blueprints/database_testing/index.md
index 4676caab85d..8c0cb550d61 100644
--- a/doc/architecture/blueprints/database_testing/index.md
+++ b/doc/architecture/blueprints/database_testing/index.md
@@ -100,7 +100,7 @@ The short-term goal is detailed in [this epic](https://gitlab.com/groups/gitlab-
### Mid-term - Improved feedback, query testing and background migration testing
-Mid-term, we plan to expand the level of detail the testing pipeline reports back to the Merge Request and expand its scope to cover query testing, too. By doing so, we use our experience from database code reviews and using thin-clone technology and bring this back closer to the GitLab workflow. Instead of reaching out to different tools (`postgres.ai`, `joe`, Slack, plan visualizations, and so on) we bring this back to GitLab and working directly on the Merge Request.
+Mid-term, we plan to expand the level of detail the testing pipeline reports back to the merge request and expand its scope to cover query testing, too. By doing so, we use our experience from database code reviews and thin-clone technology and bring this back closer to the GitLab workflow. Instead of reaching out to different tools (`postgres.ai`, `joe`, Slack, plan visualizations, and so on) we bring this back to GitLab and work directly on the merge request.
Secondly, we plan to cover background migrations testing, too. These are typically data migrations that are scheduled to run over a long period of time. The success of both the scheduling phase and the job execution phase typically depends a lot on data distribution - which only surfaces when running these migrations on actual production data. In order to become confident about a background migration, we plan to provide the following feedback:
@@ -109,7 +109,7 @@ Secondly, we plan to cover background migrations testing, too. These are typical
### Long-term - incorporate into GitLab product
-There are opportunities to discuss for extracting features from this into GitLab itself. For example, annotating the Merge Request with query examples and attaching feedback gathered from the testing run can become a first-class citizen instead of using Merge Request description and comments for it. We plan to evaluate those ideas as we see those being used in earlier phases and bring our experience back into the product.
+There are opportunities to discuss extracting features from this into GitLab itself. For example, annotating the merge request with query examples and attaching feedback gathered from the testing run could become a first-class feature instead of relying on the merge request description and comments. We plan to evaluate those ideas as we see them being used in earlier phases and bring our experience back into the product.
## An alternative discussed: Anonymization
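
To make the query-testing loop described in the hunk above concrete, here is a minimal sketch of the kind of check a testing pipeline could run against a database clone: execute `EXPLAIN (ANALYZE, BUFFERS)` for a candidate query and capture the plan for posting back to the merge request. The connection string, host, and query are placeholders, not the actual `postgres.ai` or thin-clone integration.

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	// Hypothetical thin-clone endpoint; real integration details differ.
	clone, err := sql.Open("postgres", "postgres://thin-clone.example:6432/gitlabhq?sslmode=require")
	if err != nil {
		log.Fatal(err)
	}
	defer clone.Close()

	// Run the candidate query under EXPLAIN on production-like data,
	// where plan regressions actually surface.
	rows, err := clone.Query(`EXPLAIN (ANALYZE, BUFFERS) SELECT id FROM ci_builds WHERE status = 'pending'`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	// Each row is one line of the query plan; a pipeline could collect
	// these and attach them to the merge request as feedback.
	for rows.Next() {
		var line string
		if err := rows.Scan(&line); err != nil {
			log.Fatal(err)
		}
		fmt.Println(line)
	}
}
```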
diff --git a/doc/architecture/blueprints/object_storage/index.md b/doc/architecture/blueprints/object_storage/index.md
index a79374d60bd..7864c951eca 100644
--- a/doc/architecture/blueprints/object_storage/index.md
+++ b/doc/architecture/blueprints/object_storage/index.md
@@ -89,20 +89,23 @@ replaced by a mock implementation. Furthermore, the presence of a
shared disk, both in CI and in local development, often hides broken
implementations until we deploy on an HA environment.
-Shipping MinIO as part of the product will reduce the differences
+One option to consider is shipping MinIO as part of the product. This could reduce the differences
between a cloud and a local installation, standardizing our file
storage on a single technology.
-The removal of local disk operations will reduce the complexity of
+The removal of local disk operations would reduce the complexity of
development as well as mitigate several security attack vectors as
we no longer write user-provided data on the local storage.
-It will also reduce human errors as we will always run a local object
+It would also reduce human errors as we would always run a local object
storage in development mode and any local file disk access should
raise a red flag during the merge request review.
This effort is described in [this epic](https://gitlab.com/groups/gitlab-org/-/epics/6099).
+Before adopting any specific third-party technology, the
+open source software licensing implications should be considered. As of 23 April 2021, [MinIO is subject to the AGPL v3 license](https://github.com/minio/minio/commit/069432566fcfac1f1053677cc925ddafd750730a). GitLab Legal must be consulted before any decision is taken to ship MinIO as proposed in this blueprint.
+
### Enable direct upload by default on every upload
Because every group of features requires its own bucket, we don't have
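
The direct-upload model this section introduces can be sketched briefly. Assuming an S3-compatible endpoint (such as MinIO in local development), the application hands the client a short-lived pre-signed URL and the payload goes straight to object storage, never touching local disk. The endpoint, bucket, object name, and credentials below are hypothetical placeholders.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

func main() {
	// Placeholder endpoint and credentials for a local MinIO instance.
	client, err := minio.New("minio.example:9000", &minio.Options{
		Creds:  credentials.NewStaticV4("ACCESS_KEY", "SECRET_KEY", ""),
		Secure: false,
	})
	if err != nil {
		log.Fatal(err)
	}

	// Issue a short-lived pre-signed PUT URL for a hypothetical artifact.
	uploadURL, err := client.PresignedPutObject(context.Background(),
		"artifacts", "project/42/artifact.zip", 15*time.Minute)
	if err != nil {
		log.Fatal(err)
	}

	// The client PUTs the file directly to this URL; the application
	// tier never writes the user-provided payload to local disk.
	fmt.Println(uploadURL.String())
}
```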
diff --git a/doc/architecture/blueprints/runner_scaling/index.md b/doc/architecture/blueprints/runner_scaling/index.md
index 8e47b5fda8c..174fe191cc7 100644
--- a/doc/architecture/blueprints/runner_scaling/index.md
+++ b/doc/architecture/blueprints/runner_scaling/index.md
@@ -44,7 +44,7 @@ and the documentation for it has been removed from the official page. This
means that the original reason to use Docker Machine is no longer valid too.
To keep supporting our customers and the wider community we need to design a
-new mechanism for GitLab Runner autoscaling. It not only needs to support
+new mechanism for GitLab Runner auto-scaling. It not only needs to support
auto-scaling, but it also needs to do so in a way that enables us to build on
top of it to improve efficiency, reliability and availability.
@@ -144,7 +144,7 @@ on a single machine bring. It is difficult to predict that, so ideally we
should build a PoC that will help us to better understand what we can expect
from this.
-To run this experiement we most likely we will need to build an experimental
+To run this experiment we will most likely need to build an experimental
plugin that not only allows us to schedule running multiple builds on a single
machine, but also has a set of comprehensive metrics built into it, to make it
easier to understand how it performs.
@@ -204,6 +204,44 @@ document, define requirements and score the solution accordingly. This will
allow us to choose a solution that will work best for us and the wider
community.
+### Design principles
+
+Our goal is to design a GitLab Runner plugin system interface that is flexible
+and simple for the wider community to consume. As we cannot build plugins for
+all cloud platforms, we want to ensure a low entry barrier for anyone who needs
+to develop a plugin. We want to allow everyone to contribute.
+
+To achieve this goal, we will follow a few critical design principles. These
+principles will guide our development process for the new plugin system
+abstraction.
+
+#### General high-level principles
+
+1. Design the new auto-scaling architecture to allow for more choices and
+   flexibility in the future, instead of imposing new constraints.
+1. Design the new auto-scaling architecture to experiment with running multiple
+   jobs in parallel on a single machine.
+1. Design the new provisioning architecture to replace Docker Machine in a way
+ that the wider community can easily build on top of the new abstractions.
+
+#### Principles for the new plugin system
+
+1. Make the entry barrier for writing a new plugin low.
+1. Developing a new plugin should be simple and require only basic knowledge of
+ a programming language and a cloud provider's API.
+1. Strive for a balance between the plugin system's simplicity and flexibility.
+ These are not mutually exclusive.
+1. Abstract away as many technical details as possible but do not hide them completely.
+1. Build an abstraction that serves our community well but allows us to ship it quickly.
+1. Invest in a flexible solution, avoid one-way-door decisions, foster iteration.
+1. When in doubt, err on the side of making things simpler for the wider community.
+
+#### The most important technical details
+
+1. Favor gRPC communication between a plugin and GitLab Runner.
+1. Make it possible to version the communication interface and support multiple versions.
+1. Make Go the primary language for writing plugins, but accept other languages too.
+
## Status
Status: RFC.
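
To ground the plugin-system principles listed in the final hunk, here is a minimal sketch of the small, versioned Go interface a provisioning plugin might implement and expose over gRPC. Every name in it is a hypothetical placeholder; the blueprint intentionally leaves the real abstraction to future design work.

```go
package plugin

import "context"

// InstanceSpec describes the machine a plugin should provision.
// Field names are illustrative, not a final schema.
type InstanceSpec struct {
	Image string            // e.g. a VM image or AMI identifier
	Tags  map[string]string // provider-specific metadata
}

// Instance is a provisioned machine that jobs can be dispatched to.
type Instance struct {
	ID      string
	Address string // where the Runner connects to run builds
}

// Provisioner is the hypothetical contract between GitLab Runner and a
// plugin. Keeping it this small is what keeps the entry barrier low: a
// plugin author needs only basic knowledge of a programming language
// and a cloud provider's API.
type Provisioner interface {
	// Version reports the interface version the plugin speaks, so the
	// Runner can support multiple interface versions side by side.
	Version(ctx context.Context) (string, error)

	// Provision creates a machine capable of running one or more jobs,
	// including multiple jobs in parallel on a single machine.
	Provision(ctx context.Context, spec InstanceSpec) (Instance, error)

	// Release tears the machine down once its jobs are done.
	Release(ctx context.Context, id string) error
}
```

Serving such an interface over gRPC keeps plugins out of the Runner's process, which is also what would allow authors to write plugins in languages other than Go when that is a better fit.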