author GitLab Bot <gitlab-bot@gitlab.com> 2021-12-06 15:14:39 +0000
committer GitLab Bot <gitlab-bot@gitlab.com> 2021-12-06 15:14:39 +0000
commit 55242833f832095a6fcff00b1ccacbc5900ee52a
tree 6e17b16638e60099533473b540fe8f635d2f25da /doc/architecture
parent 7c31b0312ba0eae4e4ebe54125b13aa2ae5f5db4
Add latest changes from gitlab-org/gitlab@master
Diffstat (limited to 'doc/architecture')
-rw-r--r-- doc/architecture/blueprints/ci_scale/index.md 69
1 files changed, 35 insertions, 34 deletions
diff --git a/doc/architecture/blueprints/ci_scale/index.md b/doc/architecture/blueprints/ci_scale/index.md
index 3e9fbc534d5..af1cac42241 100644
--- a/doc/architecture/blueprints/ci_scale/index.md
+++ b/doc/architecture/blueprints/ci_scale/index.md
@@ -5,7 +5,7 @@ comments: false
description: 'Improve scalability of GitLab CI/CD'
---
-# Next CI/CD scale target: 20M builds per day by 2024
+# CI/CD Scaling
## Summary
@@ -20,13 +20,8 @@ store all the builds in PostgreSQL in `ci_builds` table, and because we are
creating more than [2 million builds each day on GitLab.com](https://docs.google.com/spreadsheets/d/17ZdTWQMnTHWbyERlvj1GA7qhw_uIfCoI5Zfrrsh95zU),
we are reaching database limits that are slowing our development velocity down.
-On February 1st, 2021, a billionth CI/CD job was created and the number of
-builds is growing exponentially. We will run out of the available primary keys
-for builds before December 2021 unless we improve the database model used to
-store CI/CD data.
-
-We expect to see 20M builds created daily on GitLab.com in the first half of
-2024.
+On February 1st, 2021, GitLab.com surpassed 1 billion CI/CD builds created and the number of
+builds continues to grow exponentially.
![CI builds cumulative with forecast](ci_builds_cumulative_forecast.png)
@@ -60,8 +55,8 @@ that have the same problem.
Primary keys problem will be tackled by our Database Team.
-Status: As of October 2021 the primary keys in CI tables have been migrated to
-big integers.
+**Status**: As of October 2021 the primary keys in CI tables have been migrated
+to big integers.
### The table is too large
@@ -84,6 +79,14 @@ seem fine in the development environment may not work on GitLab.com. The
difference in the dataset size between the environments makes it difficult to
predict the performance of even the most simple queries.
+Team members and the wider community are struggling to contribute to the
+Verify area, because we restricted the possibility of extending `ci_builds`
+even further. Our static analysis tools prevent adding more columns to this
+table. Adding new queries is unpredictable because of the size of the dataset
+and the number of queries executed using the table. This significantly hinders
+the development velocity and contributes to incidents on the production
+environment.
+
We also expect a significant, exponential growth in the upcoming years.
One of the forecasts done using [Facebook's
@@ -94,6 +97,10 @@ sustain in upcoming years.
![CI builds daily forecast](ci_builds_daily_forecast.png)
+**Status**: As of October 2021 we reduced the growth rate of the `ci_builds` table
+by writing build options and variables to the `ci_builds_metadata` table. We plan
+to ship further improvements that will be described in a separate blueprint.
+
### Queuing mechanisms are using the large table
Because of how large the table is, mechanisms that we use to build queues of
@@ -114,8 +121,8 @@ table that will accelerate SQL queries used to build
queues](https://gitlab.com/gitlab-org/gitlab/-/issues/322766) and we want to
explore them.
-Status: the new architecture [has been implemented on GitLab.com](https://gitlab.com/groups/gitlab-org/-/epics/5909#note_680407908).
-
+**Status**: As of October 2021 the new architecture [has been implemented on
+GitLab.com](https://gitlab.com/groups/gitlab-org/-/epics/5909#note_680407908).
The following epic tracks making it generally available: [Make the new pending
builds architecture generally available](
https://gitlab.com/groups/gitlab-org/-/epics/6954).
@@ -136,17 +143,8 @@ columns, tables, partitions or database shards.
Effort to improve background migrations will be owned by our Database Team.
-Status: In progress.
-
-### Development velocity is negatively affected
-
-Team members and the wider community members are struggling to contribute the
-Verify area, because we restricted the possibility of extending `ci_builds`
-even further. Our static analysis tools prevent adding more columns to this
-table. Adding new queries is unpredictable because of the size of the dataset
-and the amount of queries executed using the table. This significantly hinders
-the development velocity and contributes to incidents on the production
-environment.
+**Status**: In progress. We plan to ship further improvements that will be
+described in a separate architectural blueprint.
## Proposal
@@ -157,32 +155,34 @@ First, we want to focus on things that are urgently needed right now. We need
to fix primary keys overflow risk and unblock other teams that are working on
database partitioning and sharding.
-We want to improve situation around bottlenecks that are known already, like
-queuing mechanisms using the large table and things that are holding other
-teams back.
+We want to address known bottlenecks, like the
+build queuing mechanisms that use the large table, and other things that
+are holding other teams back.
Extending CI/CD metrics is important to get a better sense of how the system
performs and what growth we should expect. This will make it easier for us
to identify bottlenecks and perform more advanced capacity planning.
-As we work on first iterations we expect our Database Sharding team and
-Database Scalability Working Group to make progress on patterns we will be able
-to use to partition the large CI/CD dataset. We consider the strong time-decay
-effect, related to the diminishing importance of pipelines with time, as an
-opportunity we might want to seize.
+The next step is to better understand how we can leverage the strong time-decay
+characteristic of CI/CD data. This might help us partition the CI/CD dataset to
+reduce the size of CI/CD database tables.
## Iterations
Work required to achieve our next CI/CD scaling target is tracked in the
-[GitLab CI/CD 20M builds per day scaling
-target](https://gitlab.com/groups/gitlab-org/-/epics/5745) epic.
+[CI/CD Scaling](https://gitlab.com/groups/gitlab-org/-/epics/5745) epic.
+
+1. ✓ Migrate primary keys to big integers on GitLab.com.
+1. ✓ Implement the new architecture of builds queuing on GitLab.com.
+1. Make the new builds queuing architecture generally available.
+1. Partition CI/CD data using a time-decay pattern.
## Status
|-------------|--------------|
| Created at | 21.01.2021 |
| Approved at | 26.04.2021 |
-| Updated at | 28.10.2021 |
+| Updated at | 06.12.2021 |
Status: In progress.
@@ -215,6 +215,7 @@ Domain experts:
| Area | Who
|------------------------------|------------------------|
| Domain Expert / Verify | Fabio Pitino |
+| Domain Expert / Verify | Marius Bobin |
| Domain Expert / Database | Jose Finotto |
| Domain Expert / PostgreSQL | Nikolay Samokhvalov |