summaryrefslogtreecommitdiff
path: root/doc/architecture/blueprints/ci_data_decay/pipeline_partitioning.md
diff options
context:
space:
mode:
Diffstat (limited to 'doc/architecture/blueprints/ci_data_decay/pipeline_partitioning.md')
-rw-r--r--doc/architecture/blueprints/ci_data_decay/pipeline_partitioning.md85
1 files changed, 79 insertions, 6 deletions
diff --git a/doc/architecture/blueprints/ci_data_decay/pipeline_partitioning.md b/doc/architecture/blueprints/ci_data_decay/pipeline_partitioning.md
index d61412ae1ed..261390d1d14 100644
--- a/doc/architecture/blueprints/ci_data_decay/pipeline_partitioning.md
+++ b/doc/architecture/blueprints/ci_data_decay/pipeline_partitioning.md
@@ -75,7 +75,7 @@ incidents, over the last couple of months, for example:
- S2: 2022-04-12 [Transactions detected that have been running for more than 10m](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6821)
- S2: 2022-04-06 [Database contention plausibly caused by excessive `ci_builds` reads](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6773)
- S2: 2022-03-18 [Unable to remove a foreign key on `ci_builds`](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6642)
-- S2: 2022-10-10 [The queuing_queries_duration SLI apdex violating SLO](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7852#note_1130123525)
+- S2: 2022-10-10 [The `queuing_queries_duration` SLI apdex violating SLO](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7852#note_1130123525)
We have approximately 50 `ci_*` prefixed database tables, and some of them
would benefit from partitioning.
@@ -180,6 +180,49 @@ respective database tables. Using `RANGE` partitioning works similarly to using
of `partition_id` values, using `RANGE` partitioning might be a better
strategy.
+### Multi-project pipelines
+
+Parent-child pipeline will always be part of the same partition because child
+pipelines are considered a resource of the parent pipeline. They can't be
+viewed individually in the project pipeline list page.
+
+On the other hand, multi-project pipelines can be viewed in the pipeline list page.
+They can also be accessed from the pipeline graph as downstream/upstream links
+when created via the `trigger` token or the API using a job token.
+They can also be created from other pipelines by using trigger tokens, but in this
+case we don't store the source pipeline.
+
+While partitioning `ci_builds` we need to update the foreign keys to the
+`ci_sources_pipelines` table:
+
+```plain
+Foreign-key constraints:
+ "fk_be5624bf37" FOREIGN KEY (source_job_id) REFERENCES ci_builds(id) ON DELETE CASCADE
+ "fk_d4e29af7d7" FOREIGN KEY (source_pipeline_id) REFERENCES ci_pipelines(id) ON DELETE CASCADE
+ "fk_e1bad85861" FOREIGN KEY (pipeline_id) REFERENCES ci_pipelines(id) ON DELETE CASCADE
+```
+
+A `ci_sources_pipelines` record references two `ci_pipelines` rows (parent and
+the child). Our usual strategy has been to add a `partition_id` to the
+table, but if we do it here we will force all multi-project
+pipelines to be part of the same partition.
+
+We should add two `partition_id` columns for this table, a
+`partition_id` and a `source_partition_id`:
+
+```plain
+Foreign-key constraints:
+ "fk_be5624bf37" FOREIGN KEY (source_job_id, source_partition_id) REFERENCES ci_builds(id, source_partition_id) ON DELETE CASCADE
+ "fk_d4e29af7d7" FOREIGN KEY (source_pipeline_id, source_partition_id) REFERENCES ci_pipelines(id, source_partition_id) ON DELETE CASCADE
+ "fk_e1bad85861" FOREIGN KEY (pipeline_id, partition_id) REFERENCES ci_pipelines(id, partition_id) ON DELETE CASCADE
+```
+
+This solution is the closest to a two way door decision because:
+
+- We retain the ability to reference pipelines in different partitions.
+- If we later decide that we want to force multi-project pipelines in the same partition
+ we could add a constraint to validate that both columns have the same value.
+
## Why do we want to use explicit logical partition ids?
Partitioning CI/CD data using a logical `partition_id` has several benefits. We
@@ -248,10 +291,9 @@ smart enough to move rows between partitions on its own.
A partitioned table is called a **routing** table and it will use the `p_`
prefix which should help us with building automated tooling for query analysis.
-A table partition will be called **partition** and it can use the a
-physical partition ID as suffix, leaded by a `p` letter, for example
-`ci_builds_p101`. Existing CI tables will become **zero partitions** of the
-new routing tables. Depending on the chosen
+A table partition will be called **partition** and it can use the a physical
+partition ID as suffix, for example `ci_builds_101`. Existing CI tables will
+become **zero partitions** of the new routing tables. Depending on the chosen
[partitioning strategy](#how-do-we-want-to-partition-cicd-data) for a given
table, it is possible to have many logical partitions per one physical partition.
@@ -273,6 +315,37 @@ during a low traffic period([after `00:00 UTC`](https://dashboards.gitlab.net/d/
See an example of this strategy in our [partition tooling](../../../development/database/table_partitioning.md#step-6---create-parent-table-and-attach-existing-table-as-the-initial-partition)).
+### Partitioning steps
+
+The database [partition tooling](../../../development/database/table_partitioning.md#partitioning-a-table-list)
+docs contain a list of steps to partition a table, but the steps are not enough
+for our iterative strategy. As our dataset continues to grow we want to take
+advantage of partitioning performance right away and not wait until all tables
+are partitioned. For example, after partitioning the `ci_builds_metadata` table
+we want to start writing and reading data to/from a new partition. This means
+that we will increase the `partition_id` value from `100`, the default value,
+to `101`. Now all of the new resources for the pipeline hierarchy will be
+persisted with `partition_id = 101`. We can continue following the database
+tooling instructions for the next table that will be partitioned, but we require
+a few extra steps:
+
+- add `partition_id` column for the FK references with default value of `100`
+ since the majority of records should have that value.
+- change application logic to cascade the `partition_id` value
+- correct `partition_id` values for recent records with a post deploy/background
+ migration, similar to this:
+
+ ```sql
+ UPDATE ci_pipeline_metadata
+ SET partition_id = ci_pipelines.partition_id
+ FROM ci_pipelines
+ WHERE ci_pipelines.id = ci_pipeline_metadata.pipeline_id
+ AND ci_pipelines.partition_id in (101, 102);
+ ```
+
+- change the foreign key definitions
+- ...
+
## Storing partitions metadata in the database
To build an efficient mechanism that will be responsible for creating
@@ -297,7 +370,7 @@ system - any letter from `g` to `z` in Latin alphabet, for example `x`. In that
case an example of an URI would look like `1e240x5ba0`. If we decide to update
the primary identifier of a partitioned resource (today it is just a big
integer) it is important to design a system that is resilient to migrating data
-between partitions, to avoid changing idenfiers when rebalancing happens.
+between partitions, to avoid changing identifiers when rebalancing happens.
`ci_partitions` table will store information about a partition identifier,
pipeline ids range it is valid for and whether the partitions have been