Add latest changes from gitlab-org/gitlab@14-5-stable-eev14.5.0-rc42

author: GitLab Bot <gitlab-bot@gitlab.com> 2021-11-18 13:16:36 +0000
committer: GitLab Bot <gitlab-bot@gitlab.com> 2021-11-18 13:16:36 +0000
commit: 311b0269b4eb9839fa63f80c8d7a58f32b8138a0 (patch)
tree: 07e7870bca8aed6d61fdcc810731c50d2c40af47 /doc/development/database
parent: 27909cef6c4170ed9205afa7426b8d3de47cbb0c (diff)
download: gitlab-ce-311b0269b4eb9839fa63f80c8d7a58f32b8138a0.tar.gz
2 files changed, 238 insertions, 10 deletions
diff --git a/doc/development/database/loose_foreign_keys.md b/doc/development/database/loose_foreign_keys.md
new file mode 100644
index 00000000000..157c1284512
--- /dev/null
+++ b/doc/development/database/loose_foreign_keys.md
@@ -0,0 +1,182 @@
+---
+stage: Enablement
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Loose foreign keys
+
+## Problem statement
+
+In relational databases (including PostgreSQL), foreign keys provide a way to link
+two database tables together, and ensure data-consistency between them. In GitLab,
+[foreign keys](../foreign_keys.md) are vital part of the database design process.
+Most of our database tables have foreign keys.
+
+With the ongoing database [decomposition work](https://gitlab.com/groups/gitlab-org/-/epics/6168),
+linked records might be present on two different database servers. Ensuring data consistency
+between two databases is not possible with standard PostgreSQL foreign keys. PostgreSQL
+does not support foreign keys operating within a single database server, defining
+a link between two database tables in two different database servers over the network.
+
+Example:
+
+- Database "Main": `projects` table
+- Database "CI": `ci_pipelines` table
+
+A project can have many pipelines. When a project is deleted, the associated `ci_pipeline` (via the
+`project_id` column) records must be also deleted.
+
+With a multi-database setup, this cannot be achieved with foreign keys.
+
+## Asynchronous approach
+
+Our preferred approach to this problem is eventual consistency. With the loose foreign keys
+feature, we can configure delayed association cleanup without negatively affecting the
+application performance.
+
+### How it works
+
+In the previous example, a record in the `projects` table can have multiple `ci_pipeline`
+records. To keep the cleanup process separate from the actual parent record deletion,
+we can:
+
+1. Create a `DELETE` trigger on the `projects` table.
+   Record the deletions in a separate table (`deleted_records`).
+1. A job checks the `deleted_records` table every 5 minutes.
+1. For each record in the table, delete the associated `ci_pipelines` records
+   using the `project_id` column.
+
+NOTE:
+For this procedure to work, we must register which tables to clean up asynchronously.
+
+## Example migration and configuration
+
+### Configure the model
+
+First, tell the application that the `projects` table has a new loose foreign key.
+You can do this in the `Project` model:
+
+```ruby
+class Project < ApplicationRecord
+  # ...
+
+  include LooseForeignKey
+
+  loose_foreign_key :ci_pipelines, :project_id, on_delete: :async_delete # or async_nullify
+
+  # ...
+end
+```
+
+This instruction ensures the asynchronous cleanup process knows about the association, and the
+how to do the cleanup. In this case, the associated `ci_pipelines` records are deleted.
+
+### Track record changes
+
+To know about deletions in the `projects` table, configure a `DELETE` trigger using a database
+migration (post-migration). The trigger needs to be configured only once. If the model already has
+at least one `loose_foreign_key` definition, then this step can be skipped:
+
+```ruby
+class TrackProjectRecordChanges < Gitlab::Database::Migration[1.0]
+  include Gitlab::Database::MigrationHelpers::LooseForeignKeyHelpers
+
+  enable_lock_retries!
+
+  def up
+    track_record_deletions(:projects)
+  end
+
+  def down
+    untrack_record_deletions(:projects)
+  end
+end
+```
+
+### Remove the foreign key
+
+If there is an existing foreign key, then it can be removed from the database. As of GitLab 14.5,
+the following foreign key describes the link between the `projects` and `ci_pipelines` tables:
+
+```sql
+ALTER TABLE ONLY ci_pipelines
+ADD CONSTRAINT fk_86635dbd80
+FOREIGN KEY (project_id)
+REFERENCES projects(id)
+ON DELETE CASCADE;
+```
+
+The migration should run after the `DELETE` trigger is installed. If the foreign key is deleted
+earlier, there is a good chance of introducing data inconsistency which needs manual cleanup:
+
+```ruby
+class RemoveProjectsCiPipelineFk < Gitlab::Database::Migration[1.0]
+  enable_lock_retries!
+
+  def up
+    remove_foreign_key_if_exists(:ci_pipelines, :projects, name: "fk_86635dbd80")
+  end
+
+  def down
+    add_concurrent_foreign_key(:ci_pipelines, :projects, name: "fk_86635dbd80", column: :project_id, target_column: :id, on_delete: "cascade")
+  end
+end
+```
+
+At this point, the setup phase is concluded. The deleted `projects` records should be automatically
+picked up by the scheduled cleanup worker job.
+
+## Caveats of loose foreign keys
+
+### Record creation
+
+The feature provides an efficient way of cleaning up associated records after the parent record is
+deleted. Without foreign keys, it's the application's responsibility to validate if the parent record
+exists when a new associated record is created.
+
+A bad example: record creation with the given ID (`project_id` comes from user input).
+In this example, nothing prevents us from passing a random project ID:
+
+```ruby
+Ci::Pipeline.create!(project_id: params[:project_id])
+```
+
+A good example: record creation with extra check:
+
+```ruby
+project = Project.find(params[:project_id])
+Ci::Pipeline.create!(project_id: project.id)
+```
+
+### Association lookup
+
+Consider the following HTTP request:
+
+```plaintext
+GET /projects/5/pipelines/100
+```
+
+The controller action ignores the `project_id` parameter and finds the pipeline using the ID:
+
+```ruby
+  def show
+  # bad, avoid it
+  pipeline = Ci::Pipeline.find(params[:id]) # 100
+end
+```
+
+This endpoint still works when the parent `Project` model is deleted. This can be considered a
+a data leak which should not happen under normal circumstances:
+
+```ruby
+def show
+  # good
+  project = Project.find(params[:project_id])
+  pipeline = project.pipelines.find(params[:pipeline_id]) # 100
+end
+```
+
+NOTE:
+This example is unlikely in GitLab, because we usually look up the parent models to perform
+permission checks.
diff --git a/doc/development/database/multiple_databases.md b/doc/development/database/multiple_databases.md
index 0ba752ba3a6..a17ad798305 100644
--- a/doc/development/database/multiple_databases.md
+++ b/doc/development/database/multiple_databases.md
@@ -88,16 +88,6 @@ test: &test
       statement_timeout: 120s
 ```
 
-### Migrations
-
-Place any migrations that affect `Ci::CiDatabaseRecord` models
-and their tables in two directories:
-
-- `db/migrate`
-- `db/ci_migrate`
-
-We aim to keep the schema for both tables the same across both databases.
-
 <!--
 NOTE: The `validate_cross_joins!` method in `spec/support/database/prevent_cross_joins.rb` references
       the following heading in the code, so if you make a change to this heading, make sure to update
@@ -272,6 +262,62 @@ logic to delete these rows if or whenever necessary in your domain.
 Finally, this de-normalization and new query also improves performance because
 it does less joins and needs less filtering.
 
+##### Remove a redundant join
+
+Sometimes there are cases where a query is doing excess (or redundant) joins.
+
+A common example occurs where a query is joining from `A` to `C`, via some
+table with both foreign keys, `B`.
+When you only care about counting how
+many rows there are in `C` and if there are foreign keys and `NOT NULL` constraints
+on the foreign keys in `B`, then it might be enough to count those rows.
+For example, in
+[MR 71811](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/71811), it was
+previously doing `project.runners.count`, which would produce a query like:
+
+```sql
+select count(*) from projects
+inner join ci_runner_projects on ci_runner_projects.project_id = projects.id
+where ci_runner_projects.runner_id IN (1, 2, 3)
+```
+
+This was changed to avoid the cross-join by changing the code to
+`project.runner_projects.count`. It produces the same response with the
+following query:
+
+```sql
+select count(*) from ci_runner_projects
+where ci_runner_projects.runner_id IN (1, 2, 3)
+```
+
+Another common redundant join is joining all the way to another table,
+then filtering by primary key when you could have instead filtered on a foreign
+key. See an example in
+[MR 71614](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/71614). The previous
+code was `joins(scan: :build).where(ci_builds: { id: build_ids })`, which
+generated a query like:
+
+```sql
+select ...
+inner join security_scans
+inner join ci_builds on security_scans.build_id = ci_builds.id
+where ci_builds.id IN (1, 2, 3)
+```
+
+However, as `security_scans` already has a foreign key `build_id`, the code
+can be changed to `joins(:scan).where(security_scans: { build_id: build_ids })`,
+which produces the same response with the following query:
+
+```sql
+select ...
+inner join security_scans
+where security_scans.build_id IN (1, 2, 3)
+```
+
+Both of these examples of removing redundant joins remove the cross-joins,
+but they have the added benefit of producing simpler and faster
+queries.
+
 ##### Use `disable_joins` for `has_one` or `has_many` `through:` relations
 
 Sometimes a join query is caused by using `has_one ... through:` or `has_many
author	GitLab Bot <gitlab-bot@gitlab.com>	2021-11-18 13:16:36 +0000
committer	GitLab Bot <gitlab-bot@gitlab.com>	2021-11-18 13:16:36 +0000
commit	311b0269b4eb9839fa63f80c8d7a58f32b8138a0 (patch)
tree	07e7870bca8aed6d61fdcc810731c50d2c40af47 /doc/development/database
parent	27909cef6c4170ed9205afa7426b8d3de47cbb0c (diff)
download	gitlab-ce-311b0269b4eb9839fa63f80c8d7a58f32b8138a0.tar.gz