summaryrefslogtreecommitdiff
path: root/doc/architecture/blueprints/ci_data_decay/pipeline_partitioning.md
diff options
context:
space:
mode:
Diffstat (limited to 'doc/architecture/blueprints/ci_data_decay/pipeline_partitioning.md')
-rw-r--r--doc/architecture/blueprints/ci_data_decay/pipeline_partitioning.md48
1 files changed, 34 insertions, 14 deletions
diff --git a/doc/architecture/blueprints/ci_data_decay/pipeline_partitioning.md b/doc/architecture/blueprints/ci_data_decay/pipeline_partitioning.md
index 5f907ecdaa4..d61412ae1ed 100644
--- a/doc/architecture/blueprints/ci_data_decay/pipeline_partitioning.md
+++ b/doc/architecture/blueprints/ci_data_decay/pipeline_partitioning.md
@@ -7,18 +7,6 @@ description: 'Pipeline data partitioning design'
# Pipeline data partitioning design
-_Disclaimer: The following contains information related to upcoming products,
-features, and functionality._
-
-_It is important to note that the information presented is for informational
-purposes only. Please do not rely on this information for purchasing or
-planning purposes._
-
-_As with all projects, the items mentioned in this document and linked pages
-are subject to change or delay. The development, release and timing of any
-products, features, or functionality remain at the sole discretion of GitLab
-Inc._
-
## What problem are we trying to solve?
We want to partition the CI/CD dataset, because some of the database tables are
@@ -267,9 +255,27 @@ new routing tables. Depending on the chosen
[partitioning strategy](#how-do-we-want-to-partition-cicd-data) for a given
table, it is possible to have many logical partitions per one physical partition.
+### Attaching first partition and acquiring locks
+
+We learned when [partitioning](https://gitlab.com/gitlab-org/gitlab/-/issues/378644)
+the first table that `PostgreSQL` requires an `AccessExclusiveLock` on the table and
+all of the other tables that it references through foreign keys. This can cause a deadlock
+if the migration tries to acquire the locks in a different order from the application
+business logic.
+
+To solve this problem, we introduced a **priority locking strategy** to avoid
+further deadlock errors. This allows us to define the locking order and
+then try keep retrying aggressively until we acquire the locks or run out of retries.
+This process can take up to 40 minutes.
+
+With this strategy, we successfully acquired a lock on `ci_builds` table after 15 retries
+during a low traffic period([after `00:00 UTC`](https://dashboards.gitlab.net/d/web-main/web-overview?orgId=1&viewPanel=537181794&from=now-2d&to=now)).
+
+See an example of this strategy in our [partition tooling](../../../development/database/table_partitioning.md#step-6---create-parent-table-and-attach-existing-table-as-the-initial-partition)).
+
## Storing partitions metadata in the database
-In order to build an efficient mechanism that will be responsible for creating
+To build an efficient mechanism that will be responsible for creating
new partitions, and to implement time decay we want to introduce a partitioning
metadata table, called `ci_partitions`. In that table we would store metadata
about all the logical partitions, with many pipelines per partition. We may
@@ -366,6 +372,20 @@ scope block takes an argument). Preloading instance dependent scopes is not
supported.
```
+### Query analyzers
+
+We implemented 2 query analyzers to detect queries that need to be fixed so that everything
+keeps working with partitioned tables:
+
+- One analyzer to detect queries not going through a routing table.
+- One analyzer to detect queries that use routing tables without specifying the `partition_id` in the `WHERE` clauses.
+
+We started by enabling our first analyzer in `test` environment to detect existing broken
+queries. It is also enabled on `production` environment, but for a small subset of the traffic (`0.1%`)
+because of scalability concerns.
+
+The second analyzer will be enabled in a future iteration.
+
### Primary key
Primary key must include the partitioning key column to partition the table.
@@ -652,7 +672,7 @@ application-wide outage.
1. Make it possible to create partitions in an automatic way.
1. Deliver the new architecture to self-managed instances.
-The diagram below visualizes this plan on Gantt chart. Please note that dates
+The diagram below visualizes this plan on Gantt chart. The dates
on the chart below are just estimates to visualize the plan better, these are
not deadlines and can change at any time.